Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations
(AAAI Press, 2018, Conference Paper)
While peer-agreement and gold checks are well-established methods for ensuring quality in crowdsourced data collection, we explore a relatively new direction for quality control: estimating work quality directly from ...
The many benefits of annotator rationales for relevance judgments
(International Joint Conferences on Artificial Intelligence, 2017, Conference Paper)
When collecting subjective human ratings of items, it can be difficult to measure and enforce data quality due to task subjectivity and lack of insight into how judges arrive at each rating decision. To address this, we ...
Efficient Test Collection Construction via Active Learning
(Association for Computing Machinery, 2020, Conference Paper)
To create a new IR test collection at low cost, it is valuable to carefully select which documents merit human relevance judgments. Shared task campaigns such as NIST TREC pool document rankings from many participating ...
Overview of the CLEF-2021 CheckThat! Lab Task 1 on check-worthiness estimation in tweets and political debates
(CEUR-WS, 2021, Conference Paper)
We present an overview of Task 1 of the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The task asks to predict which posts in a Twitter stream are worth ...
ArTest: The First Test Collection for Arabic Web Search with Relevance Rationales
(Association for Computing Machinery, Inc, 2020, Conference Paper)
The scarcity of Arabic test collections has long hindered information retrieval (IR) research over the Arabic Web. In this work, we present ArTest, the first large-scale test collection designed for the evaluation of ad-hoc ...