Measuring ranked list robustness for query performance prediction

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

We introduce the notion of ranking robustness: a property of a ranked list of documents that indicates how stable the ranking is in the presence of uncertainty in the ranked documents. We propose a statistical measure called the robustness score to quantify this notion. Our initial motivation for measuring ranking robustness is to predict topic difficulty for content-based queries in the ad-hoc retrieval task. Our results demonstrate that the robustness score correlates positively and consistently with the average precision of content-based queries across a variety of TREC test collections. Although our focus is on prediction for the ad-hoc retrieval task, we observe an interesting negative correlation with query performance when our technique is applied to named-page finding queries, which are a fundamentally different kind of query. A side effect of this divergent behavior across the two query types is that the robustness score also turns out to be a good feature for query classification.
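
The abstract leaves the computation implicit; the idea can be read as a Monte Carlo procedure: perturb the retrieved documents with random noise, re-rank them, and average Spearman's rank correlation between the original and perturbed rankings. The sketch below illustrates this under stated assumptions; the Dirichlet-smoothed query-likelihood ranker and the Poisson noise model are illustrative stand-ins rather than the authors' exact formulation, and the names ql_score, perturb_counts, and robustness_score are hypothetical.

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)

    def ql_score(query_terms, term_counts, doc_len, mu=2000.0, p_coll=1e-4):
        """Query-likelihood score with Dirichlet smoothing (illustrative ranker)."""
        return sum(
            np.log((term_counts.get(t, 0) + mu * p_coll) / (doc_len + mu))
            for t in query_terms
        )

    def perturb_counts(term_counts):
        """Corrupt a document by resampling each term count from a Poisson
        distribution centered on the observed count. This noise model is an
        assumption made for illustration; it is not the paper's exact model."""
        return {t: int(rng.poisson(c)) for t, c in term_counts.items()}

    def robustness_score(query_terms, docs, doc_lens, n_trials=100):
        """Monte Carlo estimate of the expected Spearman correlation between
        the original ranking of the retrieved documents and the rankings
        induced by their randomly perturbed versions."""
        original = [ql_score(query_terms, d, n) for d, n in zip(docs, doc_lens)]
        rhos = []
        for _ in range(n_trials):
            # Re-score perturbed documents (document lengths kept fixed for simplicity).
            noisy = [ql_score(query_terms, perturb_counts(d), n)
                     for d, n in zip(docs, doc_lens)]
            rho, _ = spearmanr(original, noisy)
            rhos.append(rho)
        return float(np.mean(rhos))

    # Toy example: three documents represented as term-count dictionaries.
    query = ["ranking", "robustness"]
    docs = [
        {"ranking": 5, "robustness": 3, "retrieval": 2},
        {"ranking": 2, "robustness": 1, "noise": 4},
        {"retrieval": 6, "ranking": 1},
    ]
    doc_lens = [200, 150, 300]
    print(robustness_score(query, docs, doc_lens))  # near 1.0 = stable ranking

A score near 1.0 indicates a ranking that barely moves under document noise. Per the abstract, higher robustness tracks higher average precision for content-based ad-hoc queries, while for named-page finding queries the correlation is negative.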

Author information

Corresponding author

Correspondence to Yun Zhou.

About this article

Cite this article

Zhou, Y., Croft, W.B. Measuring ranked list robustness for query performance prediction. Knowl Inf Syst 16, 155–171 (2008). https://doi.org/10.1007/s10115-007-0100-8

