Abstract
Relevance evaluation is an essential part of the development and maintenance of information retrieval systems. Yet traditional evaluation approaches have several limitations; in particular, conducting new editorial evaluations of a search system can be very expensive. We describe a new approach to evaluation called TERC, based on the crowdsourcing paradigm, in which many online users, drawn from a large community, each perform a small evaluation task.
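To make the crowdsourcing idea concrete, the sketch below shows one way many small, redundant judging tasks could be collapsed into consensus relevance labels by majority vote. This is only an illustration under assumed names and a binary label scheme; the data layout, labels, and aggregation rule are hypothetical and are not the TERC protocol described in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical crowd judgments: (query, doc_id, worker_id, label).
# Each worker completes only a handful of small judging tasks.
judgments = [
    ("jaguar speed", "doc1", "w1", "relevant"),
    ("jaguar speed", "doc1", "w2", "relevant"),
    ("jaguar speed", "doc1", "w3", "not relevant"),
    ("jaguar speed", "doc2", "w1", "not relevant"),
    ("jaguar speed", "doc2", "w2", "not relevant"),
]

def aggregate_by_majority(judgments):
    """Collapse redundant crowd judgments into one label per (query, doc) pair."""
    votes = defaultdict(Counter)
    for query, doc_id, _worker, label in judgments:
        votes[(query, doc_id)][label] += 1
    # Ties break arbitrarily here; a real evaluation would need a tie policy
    # and some form of worker quality control.
    return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}

if __name__ == "__main__":
    for (query, doc_id), label in aggregate_by_majority(judgments).items():
        print(f"{query!r} / {doc_id}: {label}")
```

A usage note: redundancy (several workers per query-document pair) is what lets simple aggregation like this absorb individual judging errors, which is the main appeal of distributing evaluation across a large community.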
Index Terms
- Crowdsourcing for relevance evaluation