Article

Predicting query performance

Authors:
Steve Cronen-Townsend, University of Massachusetts, Amherst, MA
Yun Zhou, University of Massachusetts, Amherst, MA
W. Bruce Croft, University of Massachusetts, Amherst, MA

SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
August 2002
Pages 299–306
https://doi.org/10.1145/564376.564429

Published: 11 August 2002

ABSTRACT

We develop a method for predicting query performance by computing the relative entropy between a query language model and the corresponding collection language model. The resulting clarity score measures the coherence of the language usage in documents whose models are likely to generate the query. We suggest that clarity scores measure the ambiguity of a query with respect to a collection of documents and show that they correlate positively with average precision in a variety of TREC test sets. Thus, the clarity score may be used to identify ineffective queries, on average, without relevance information. We develop an algorithm for automatically setting the clarity score threshold between predicted poorly-performing queries and acceptable queries and validate it using TREC data. In particular, we compare the automatic thresholds to optimum thresholds and also check how frequently results as good are achieved in sampling experiments that randomly assign queries to the two classes.
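
The relative-entropy computation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how the query language model is estimated, so the linear smoothing weight `lam`, the uniform document prior, the base-2 logarithm, the zero fallback for unseen query terms, and the function name `clarity_score` are all assumptions made for the example. Documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def clarity_score(query, docs, lam=0.6):
    """Relative entropy (in bits) between a query model and the collection model.

    query: list of tokens; docs: list of non-empty token lists.
    lam is an assumed smoothing weight, not a value taken from the abstract.
    """
    doc_counts = [Counter(d) for d in docs]

    # Collection language model: relative frequency over all documents.
    coll_counts = Counter()
    for counts in doc_counts:
        coll_counts.update(counts)
    coll_total = sum(coll_counts.values())
    p_coll = {w: n / coll_total for w, n in coll_counts.items()}

    def p_w_given_d(w, counts):
        # Document model linearly smoothed with the collection model (assumption).
        total = sum(counts.values())
        return lam * counts[w] / total + (1 - lam) * p_coll.get(w, 0.0)

    # Weight each document by how likely its model is to generate the query
    # (uniform prior over documents, assumed for this sketch).
    likelihoods = [
        math.prod(p_w_given_d(q, counts) for q in query) for counts in doc_counts
    ]
    norm = sum(likelihoods)
    if norm == 0:
        # A query term is absent from the collection; returning 0 is a design choice.
        return 0.0
    weights = [l / norm for l in likelihoods]

    # Query model P(w|Q) as a weighted mixture of document models,
    # then KL divergence from the collection model.
    score = 0.0
    for w, pc in p_coll.items():
        pq = sum(wt * p_w_given_d(w, c) for wt, c in zip(weights, doc_counts))
        if pq > 0:
            score += pq * math.log2(pq / pc)
    return score


if __name__ == "__main__":
    docs = [
        ["jaguar", "car", "engine"],
        ["jaguar", "cat", "jungle"],
        ["car", "engine", "road"],
        ["cat", "jungle", "tree"],
    ]
    print(clarity_score(["car", "engine"], docs))
    print(clarity_score(["jaguar"], docs))
```

On this toy collection the query "car engine" draws its weight from documents with similar vocabulary, while "jaguar" splits its weight across two unrelated topics, so the first query receives the higher score under these assumptions.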

Index Terms

Predicting query performance
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
August 2002
478 pages
ISBN: 1581135610
DOI: 10.1145/564376
General Chair: Kalervo Järvelin, University of Tampere, Finland
Program Chairs: Micheline Beaulieu, University of Sheffield, UK; Ricardo Baeza-Yates, University of Chile, Chile; Sung Hyon Myaeng, Chungnam National University, Korea
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ambiguity
clarity
information theory
language models
Qualifiers
- Article
Acceptance Rates
SIGIR '02 Paper Acceptance Rate: 44 of 219 submissions, 20%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
Article Metrics
- 427 Total Citations
- 2,146 Total Downloads
- Downloads (Last 12 months): 64
- Downloads (Last 6 weeks): 7