ABSTRACT
We develop a method for predicting query performance by computing the relative entropy between a query language model and the corresponding collection language model. The resulting clarity score measures the coherence of the language usage in documents whose models are likely to generate the query. We suggest that clarity scores measure the ambiguity of a query with respect to a collection of documents, and we show that they correlate positively with average precision in a variety of TREC test sets. Thus, on average, the clarity score may be used to identify ineffective queries without relevance information. We develop an algorithm for automatically setting the clarity score threshold that separates queries predicted to perform poorly from acceptable queries, and we validate it using TREC data. In particular, we compare the automatic thresholds to optimum thresholds, and we check how often equally good results arise in sampling experiments that randomly assign queries to the two classes.
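To make the central quantity concrete, the following is a minimal sketch of a clarity score computed as the relative entropy, in bits, between a query language model and the collection language model. The abstract does not give the estimation details, so several choices here are assumptions made for illustration: documents are weighted by their smoothed query likelihood under a uniform document prior, document models are smoothed by linear interpolation with the collection model, and the function name `clarity_score`, the parameter `lam`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def clarity_score(query, docs, lam=0.6):
    """Relative entropy (bits) between a query model and the collection model.

    query: list of query terms; docs: list of token lists.
    lam is an assumed interpolation weight, not a value taken from the abstract.
    """
    # Unigram counts per non-empty document and for the whole collection.
    doc_counts = [Counter(d) for d in docs if d]
    doc_lens = [sum(c.values()) for c in doc_counts]
    coll_counts = Counter()
    for counts in doc_counts:
        coll_counts.update(counts)
    coll_len = sum(coll_counts.values())
    p_coll = {w: n / coll_len for w, n in coll_counts.items()}

    def p_w_given_d(w, i):
        # Document model smoothed with the collection model (assumed scheme).
        ml = doc_counts[i][w] / doc_lens[i]
        return lam * ml + (1 - lam) * p_coll.get(w, 0.0)

    # Weight each document by how likely its model is to generate the query.
    likelihoods = [
        math.prod(p_w_given_d(q, i) for q in query)
        for i in range(len(doc_counts))
    ]
    total = sum(likelihoods)
    if total == 0:
        return 0.0  # no document model can generate the query
    weights = [lik / total for lik in likelihoods]

    # Query model: likelihood-weighted mixture of the document models.
    score = 0.0
    for w, pc in p_coll.items():
        pq = sum(wt * p_w_given_d(w, i) for i, wt in enumerate(weights))
        if pq > 0:
            score += pq * math.log2(pq / pc)
    return score


# Toy collection with two topics that share the term "jaguar".
toy_docs = [
    ["jaguar", "car", "engine", "speed"],
    ["jaguar", "cat", "jungle", "prey"],
    ["car", "engine", "fuel", "speed"],
    ["cat", "jungle", "prey", "hunt"],
]
focused = clarity_score(["engine", "fuel"], toy_docs)
ambiguous = clarity_score(["jaguar"], toy_docs)
```

On this toy collection the focused query receives a higher score than the ambiguous one, because its likely documents share vocabulary that departs from the collection as a whole. A production implementation would typically restrict the mixture to top-ranked documents and work in log space to avoid underflow on long queries.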