
Tackling Biased Baselines in the Risk-Sensitive Evaluation of Retrieval Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8416)

Abstract

The aim of optimising information retrieval (IR) systems using a risk-sensitive evaluation methodology is to minimise the risk of performing less effectively on any particular topic than a given baseline system. Baseline systems in this context determine the reference effectiveness for each topic, relative to which the effectiveness of a given IR system in minimising risk is measured. However, the comparative risk-sensitive evaluation of a set of diverse IR systems – as attempted by the TREC 2013 Web track – is challenging, as the systems under evaluation may be built upon a variety of different (base) retrieval models, such as learning to rank or language models. Hence, a question arises as to how to properly measure the risk exhibited by each system. In this paper, we argue that no single retrieval model is representative enough to serve as a true reference for the models available in the current state of the art, and demonstrate, using the TREC 2012 Web track data, that as the baseline system changes, the resulting risk-based ranking of the systems changes significantly. Instead of using a particular system's effectiveness as the reference effectiveness for topics, we propose several remedies, including the use of mean within-topic system effectiveness as a baseline, which is shown to enable unbiased measurement of the risk-sensitive effectiveness of IR systems.
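
To make the proposed remedy concrete, the sketch below (Python; not from the paper) scores each system with a URisk-style risk-sensitive measure against a baseline defined as the mean within-topic effectiveness over all evaluated systems. The run names, the per-topic ERR@20 values and the alpha setting are purely illustrative assumptions, not data from the paper.

# Minimal sketch, not the authors' implementation: a URisk-style risk-sensitive
# score computed against a mean within-topic baseline. All names and numbers
# below (runs, ERR@20 values, alpha) are illustrative assumptions.

def mean_within_topic_baseline(per_topic_scores):
    """Baseline effectiveness for each topic = mean score over all systems
    on that topic. per_topic_scores maps system name -> list of per-topic
    scores (e.g. ERR@20), all lists aligned on the same topic order."""
    systems = list(per_topic_scores.values())
    n_topics = len(systems[0])
    return [sum(s[t] for s in systems) / len(systems) for t in range(n_topics)]

def risk_sensitive_score(system_scores, baseline_scores, alpha=1.0):
    """URisk-style trade-off: average per-topic gain over the baseline,
    with losses (topics where the system falls below the baseline)
    weighted by an extra factor of (1 + alpha)."""
    total = 0.0
    for sys_score, base_score in zip(system_scores, baseline_scores):
        delta = sys_score - base_score
        total += delta if delta >= 0 else (1.0 + alpha) * delta
    return total / len(system_scores)

if __name__ == "__main__":
    # Hypothetical per-topic ERR@20 scores for three systems over four topics.
    runs = {
        "ltr_run": [0.40, 0.10, 0.55, 0.30],
        "lm_run":  [0.35, 0.25, 0.20, 0.45],
        "dfr_run": [0.50, 0.05, 0.60, 0.25],
    }
    baseline = mean_within_topic_baseline(runs)
    for name, scores in runs.items():
        print(name, round(risk_sensitive_score(scores, baseline, alpha=5.0), 4))

Because the reference effectiveness here is derived from the whole pool of evaluated systems rather than from any single retrieval model, no individual system's modelling choices dominate the baseline, which is precisely the bias the paper argues arises when one particular system is used as the reference.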





Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Dinçer, B.T., Ounis, I., Macdonald, C. (2014). Tackling Biased Baselines in the Risk-Sensitive Evaluation of Retrieval Systems. In: de Rijke, M., et al. Advances in Information Retrieval. ECIR 2014. Lecture Notes in Computer Science, vol 8416. Springer, Cham. https://doi.org/10.1007/978-3-319-06028-6_3


  • DOI: https://doi.org/10.1007/978-3-319-06028-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06027-9

  • Online ISBN: 978-3-319-06028-6

  • eBook Packages: Computer Science, Computer Science (R0)
