Simple item record

Author: Kutlu, Mucahid
Author: McDonnell, Tyler
Author: Barkallah, Yassmine
Author: Elsayed, Tamer
Author: Lease, Matthew
Date available: 2019-09-15T08:00:42Z
Publication date: 2018-06-27
Publication name: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
Identifier: http://dx.doi.org/10.1145/3209978.3210033
Citation: Mucahid Kutlu, Tyler McDonnell, Yassmine Barkallah, Tamer Elsayed, and Matthew Lease. 2018. Crowd vs. expert: What can relevance judgment rationales teach us about assessor disagreement? In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'18). 805–814.
ISBN: 9781450356572
URI: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85051536970&origin=inward
URI: http://hdl.handle.net/10576/11838
Abstract: © 2018 ACM. While crowdsourcing offers a low-cost, scalable way to collect relevance judgments, lack of transparency with remote crowd work has limited understanding of the quality of collected judgments. In prior work, we showed a variety of benefits from asking crowd workers to provide rationales for each relevance judgment [McDonnell et al. 2016]. In this work, we scale up our rationale-based judging design to assess its reliability on the 2014 TREC Web Track, collecting roughly 25K crowd judgments for 5K document-topic pairs. We also study having crowd judges perform topic-focused judging, rather than judging across topics, and find that this improves quality. Overall, we show that crowd judgments can be used to reliably rank IR systems for evaluation. We further explore the potential of rationales to shed new light on reasons for judging disagreement between experts and crowd workers. Our qualitative and quantitative analysis distinguishes subjective vs. objective forms of disagreement, as well as the relative importance of each disagreement cause, and we present a new taxonomy for organizing the different types of disagreement we observe. We show that many crowd disagreements seem valid and plausible, with disagreement in many cases due to judging errors by the original TREC assessors. We also share our WebCrowd25k dataset, including: (1) crowd judgments with rationales, and (2) taxonomy category labels for each judging disagreement analyzed.
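The abstract's claim that crowd judgments can reliably rank IR systems relative to expert (TREC) judgments is usually quantified by scoring each system under both judgment sets and correlating the resulting system rankings, commonly with Kendall's tau. The following Python sketch only illustrates that general check; it is not the paper's code, the system names and effectiveness scores are hypothetical, and SciPy is assumed to be available.

# Minimal sketch (not the paper's code): compare system rankings produced by
# expert qrels and crowd qrels using Kendall's tau. All scores are invented.
from scipy.stats import kendalltau

# Hypothetical per-system effectiveness scores (e.g., MAP or nDCG) computed
# once with expert judgments and once with crowd judgments.
expert_scores = {"sysA": 0.31, "sysB": 0.27, "sysC": 0.22, "sysD": 0.18}
crowd_scores  = {"sysA": 0.29, "sysB": 0.28, "sysC": 0.20, "sysD": 0.19}

systems = sorted(expert_scores)  # fixed system order for a paired comparison
tau, p_value = kendalltau(
    [expert_scores[s] for s in systems],
    [crowd_scores[s] for s in systems],
)
print(f"Kendall's tau between system rankings: {tau:.3f} (p={p_value:.3f})")

A high tau indicates that the two judgment sources order the systems similarly, which is the sense in which crowd judgments can "reliably rank IR systems for evaluation."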
Language: en
Publisher: ACM
Subject: Crowdsourcing
Subject: Disagreement
Subject: Evaluation
Subject: Relevance assessment
Title: Crowd vs. Expert: What can relevance judgment rationales teach us about assessor disagreement?
Type: Conference Paper
Pages: 805-814
dc.accessType: Abstract Only

