DOI: 10.1145/564376.564429

Predicting query performance

Published: 11 August 2002

ABSTRACT

We develop a method for predicting query performance by computing the relative entropy between a query language model and the corresponding collection language model. The resulting clarity score measures the coherence of the language usage in documents whose models are likely to generate the query. We suggest that clarity scores measure the ambiguity of a query with respect to a collection of documents and show that they correlate positively with average precision in a variety of TREC test sets. Thus, the clarity score may be used to identify ineffective queries, on average, without relevance information. We develop an algorithm for automatically setting the clarity score threshold between predicted poorly performing queries and acceptable queries and validate it using TREC data. In particular, we compare the automatic thresholds to optimal thresholds and also check how often equally good results are achieved in sampling experiments that randomly assign queries to the two classes.
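The core computation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it estimates a query language model by weighting smoothed document models with their normalized query likelihoods (a uniform document prior is assumed), then computes the relative entropy (KL divergence) to the collection model. The smoothing weight `lam` and the tokenized toy documents are assumptions for the sketch.

```python
import math
from collections import Counter

def clarity_score(query, docs, lam=0.6):
    """Sketch of a clarity score: relative entropy between an estimated
    query language model P(w|Q) and the collection model P(w|C).

    query -- list of query terms
    docs  -- list of documents, each a list of terms
    lam   -- linear-smoothing weight on the document model (assumption)
    """
    # Collection language model P(w|C): term frequencies over all docs.
    coll = Counter()
    for d in docs:
        coll.update(d)
    coll_total = sum(coll.values())
    p_coll = {w: c / coll_total for w, c in coll.items()}

    # Smoothed document model P(w|D): mixture of document and collection.
    def p_doc(w, counts, length):
        return lam * counts[w] / length + (1 - lam) * p_coll.get(w, 0.0)

    # Query likelihood P(Q|D) per document; with a uniform prior these
    # normalize to P(D|Q).
    doc_stats = [(Counter(d), len(d)) for d in docs]
    likelihoods = []
    for counts, length in doc_stats:
        ll = 1.0
        for q in query:
            ll *= p_doc(q, counts, length)
        likelihoods.append(ll)
    norm = sum(likelihoods) or 1.0
    post = [l / norm for l in likelihoods]

    # Query model P(w|Q) = sum over D of P(w|D) * P(D|Q).
    p_query = {w: sum(pd * p_doc(w, counts, length)
                      for pd, (counts, length) in zip(post, doc_stats))
               for w in p_coll}

    # Clarity = KL( P(.|Q) || P(.|C) ), in bits; higher means the query
    # model diverges more from general collection language.
    return sum(p * math.log2(p / p_coll[w])
               for w, p in p_query.items() if p > 0)
```

A sharply focused query concentrates P(w|Q) on topical terms, yielding a high clarity score; an ambiguous query yields a query model close to the collection model and a score near zero.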


Published in

SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
August 2002, 478 pages
ISBN: 1581135610
DOI: 10.1145/564376

      Copyright © 2002 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

SIGIR '02 paper acceptance rate: 44 of 219 submissions (20%). Overall acceptance rate: 792 of 3,983 submissions (20%).
