ABSTRACT
The development of information retrieval systems such as search engines relies on good test collections, including assessments of retrieved content. The widely employed Cranfield paradigm dictates that the information relevant to a topic be encoded at the level of documents, thereby requiring effectively complete document relevance assessments. As this is no longer practical for modern corpora, numerous problems arise, including those of scalability, reusability, and applicability. We propose a new method for relevance assessment based on relevant information rather than relevant documents. Once the relevant 'nuggets' are collected, our matching method can assess any document for relevance with high accuracy, and therefore any retrieved list of documents can be assessed for performance. In this paper we analyze the behavior of the matching function by examining specific cases and by comparing it with other methods. We then show how these inferred relevance assessments can be used to perform IR system evaluation, with particular attention to reusability and scalability. Our main contribution is a methodology for producing test collections that are highly accurate, more complete, scalable, and reusable, that can be generated with effort comparable to existing methods, and that have great potential for future applications.
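The abstract does not specify the matching function itself, so the following is only a minimal illustrative sketch of the general idea of inferring document relevance from collected nuggets. The word-overlap similarity, the 0.6 threshold, and the helper names (`nugget_match_score`, `infer_relevance`) are assumptions introduced here for illustration, not the authors' method.

```python
# Illustrative sketch only (assumed scoring rule and threshold, not the
# paper's matching function): a document is judged relevant to a topic
# if at least one of the topic's relevant "nuggets" is mostly covered
# by the document's text.

import re


def _tokens(text):
    """Lowercase word tokens for a rough bag-of-words comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def nugget_match_score(nugget, document):
    """Fraction of the nugget's tokens that also appear in the document."""
    nugget_toks = _tokens(nugget)
    if not nugget_toks:
        return 0.0
    return len(nugget_toks & _tokens(document)) / len(nugget_toks)


def infer_relevance(nuggets, document, threshold=0.6):
    """Judge a document relevant if any nugget is sufficiently covered."""
    return max(nugget_match_score(n, document) for n in nuggets) >= threshold


# Usage: once nuggets for a topic are collected, any retrieved list can be
# assessed without further manual judging (hypothetical example data).
nuggets = ["the Cranfield paradigm requires complete document judgments"]
ranked_list = [
    "Document text mentioning the Cranfield paradigm and complete document judgments",
    "An unrelated document about web spam filtering",
]
inferred_qrels = [infer_relevance(nuggets, doc) for doc in ranked_list]
print(inferred_qrels)  # [True, False]
```

In this sketch the inferred judgments play the role of qrels, so any ranked list, including one from a system that contributed no documents to the nugget collection, can be scored, which is the reusability property the abstract emphasizes.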