Simple item record

Author: Kutlu, Mucahid
Author: McDonnell, Tyler
Author: Barkallah, Yassmine
Author: Elsayed, Tamer
Author: Lease, Matthew
Date available: 2019-09-15T08:00:42Z
Publication date: 2018-06-27
Publication name: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
Identifier: http://dx.doi.org/10.1145/3209978.3210033
Citation: Mucahid Kutlu, Tyler McDonnell, Yassmine Barkallah, Tamer Elsayed, and Matthew Lease. 2018. Crowd vs. expert: What can relevance judgment rationales teach us about assessor disagreement? In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'18). 805–814.
ISBN: 9781450356572
URI: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85051536970&origin=inward
URI: http://hdl.handle.net/10576/11838
Abstract: © 2018 ACM. While crowdsourcing offers a low-cost, scalable way to collect relevance judgments, lack of transparency with remote crowd work has limited understanding of the quality of collected judgments. In prior work, we showed a variety of benefits from asking crowd workers to provide rationales for each relevance judgment [McDonnell et al. 2016]. In this work, we scale up our rationale-based judging design to assess its reliability on the 2014 TREC Web Track, collecting roughly 25K crowd judgments for 5K document-topic pairs. We also study having crowd judges perform topic-focused judging, rather than judging across topics, and find that this improves quality. Overall, we show that crowd judgments can be used to reliably rank IR systems for evaluation. We further explore the potential of rationales to shed new light on reasons for judging disagreement between experts and crowd workers. Our qualitative and quantitative analysis distinguishes subjective vs. objective forms of disagreement, as well as the relative importance of each disagreement cause, and we present a new taxonomy for organizing the different types of disagreement we observe. We show that many crowd disagreements seem valid and plausible, with disagreement in many cases due to judging errors by the original TREC assessors. We also share our WebCrowd25k dataset, including: (1) crowd judgments with rationales, and (2) taxonomy category labels for each judging disagreement analyzed.
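The abstract's claim that crowd judgments can reliably rank IR systems relative to expert (TREC) judgments is usually quantified by scoring each system under both judgment sets and correlating the resulting system rankings, commonly with Kendall's tau. The following Python sketch only illustrates that general check; it is not the paper's code, the system names and effectiveness scores are hypothetical, and SciPy is assumed to be available.

# Minimal sketch (not the paper's code): compare system rankings produced by
# expert qrels and crowd qrels using Kendall's tau. All scores are invented.
from scipy.stats import kendalltau

# Hypothetical per-system effectiveness scores (e.g., MAP or nDCG) computed
# once with expert judgments and once with crowd judgments.
expert_scores = {"sysA": 0.31, "sysB": 0.27, "sysC": 0.22, "sysD": 0.18}
crowd_scores  = {"sysA": 0.29, "sysB": 0.28, "sysC": 0.20, "sysD": 0.19}

systems = sorted(expert_scores)  # fixed system order for a paired comparison
tau, p_value = kendalltau(
    [expert_scores[s] for s in systems],
    [crowd_scores[s] for s in systems],
)
print(f"Kendall's tau between system rankings: {tau:.3f} (p={p_value:.3f})")

A high tau indicates that the two judgment sources order the systems similarly, which is the sense in which crowd judgments can "reliably rank IR systems for evaluation."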
Language: en
Publisher: ACM
Subject: Crowdsourcing
Subject: Disagreement
Subject: Evaluation
Subject: Relevance assessment
Title: Crowd vs. Expert: What can relevance judgment rationales teach us about assessor disagreement?
Type: Conference Paper
Pages: 805-814
dc.accessType: Abstract Only

