DOI: 10.1145/3539618.3591760
research-article
Open Access

Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization

Published: 18 July 2023

ABSTRACT

Counterfactual learning to rank (CLTR) relies on exposure-based inverse propensity scoring (IPS), an LTR-specific adaptation of IPS that corrects for position bias. While IPS can provide unbiased and consistent estimates, it often suffers from high variance. Especially when little click data is available, this variance can cause CLTR to learn sub-optimal ranking behavior. Consequently, existing CLTR methods carry significant risks, as naively deploying their models can result in very negative user experiences.
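To make the position-bias correction concrete, the following is a minimal sketch of textbook position-based IPS, not the exposure-based estimator of the paper. It assumes a hypothetical click log of (document, rank, clicked) triples and known examination propensities per rank; the function name `ips_relevance` and all values are illustrative only.

```python
from collections import defaultdict


def ips_relevance(log, propensity):
    """Estimate per-document relevance from position-biased clicks.

    log: iterable of (doc_id, rank, clicked) impressions (hypothetical format).
    propensity: dict mapping rank -> assumed probability that the rank is examined.
    Each click is weighted by 1 / propensity of the rank where it occurred.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc_id, rank, clicked in log:
        counts[doc_id] += 1
        if clicked:
            totals[doc_id] += 1.0 / propensity[rank]
    return {doc_id: totals[doc_id] / counts[doc_id] for doc_id in counts}


# Toy log: "a" is always shown at rank 1, "b" always at rank 2.
log = [
    ("a", 1, True), ("a", 1, False),
    ("b", 2, True), ("b", 2, False), ("b", 2, False), ("b", 2, False),
]
propensity = {1: 1.0, 2: 0.5}  # assumed examination probabilities

estimates = ips_relevance(log, propensity)
raw_click_rate_b = 1 / 4  # uncorrected click rate of "b"
```

In this toy log the raw click rate of "b" is half that of "a", yet the IPS-weighted estimates are equal, because "b" was only examined half as often. The same weighting is the source of the variance problem: with a small propensity, a single click contributes a large value to the estimate.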

We introduce a novel risk-aware CLTR method with theoretical guarantees for safe deployment. We apply a novel exposure-based concept of risk regularization to IPS estimation for LTR. Our risk regularization penalizes the mismatch between the ranking behavior of a learned model and a given safe model. Thereby, it ensures that learned ranking models stay close to a trusted model when there is high uncertainty in IPS estimation, which greatly reduces the risks during deployment. Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available, while also maintaining high performance at convergence. For the CLTR field, our novel exposure-based risk minimization method enables practitioners to adopt CLTR methods in a safer manner that mitigates many of the risks attached to previous methods.
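The idea of penalizing mismatch with a safe model can be sketched as follows. This is an illustration under stated assumptions, not the objective defined in the paper: the divergence used here (the sum of safe(d)^2 / new(d) over normalized exposure distributions), the 1/sqrt(N) penalty weight, the `scale` parameter, and all function names and numbers are hypothetical choices made only to show the qualitative behavior described in the abstract.

```python
import math


def exposure_divergence(safe_exposure, new_exposure):
    """Mismatch between two normalized exposure distributions over documents.

    Computes sum_d safe(d)^2 / new(d), which equals 1.0 when the two
    distributions are identical and grows as they diverge.
    """
    return sum(s * s / new_exposure[d] for d, s in safe_exposure.items())


def risk_aware_objective(ips_utility, safe_exposure, new_exposure, n_clicks, scale=1.0):
    """IPS utility minus a mismatch penalty that shrinks as more data arrives.

    The 1/sqrt(n_clicks) weight is an assumed schedule, chosen so that the
    penalty dominates with little data and fades with a lot of data.
    """
    penalty_weight = scale / math.sqrt(n_clicks)
    divergence = exposure_divergence(safe_exposure, new_exposure)
    return ips_utility - penalty_weight * math.sqrt(divergence)


safe = {"a": 0.6, "b": 0.4}       # exposure under the trusted model
candidate = {"a": 0.2, "b": 0.8}  # exposure under a learned model

# Assumed IPS utility estimates: the candidate looks better (0.9 vs 0.7).
few_safe = risk_aware_objective(0.7, safe, safe, n_clicks=4)
few_new = risk_aware_objective(0.9, safe, candidate, n_clicks=4)
many_safe = risk_aware_objective(0.7, safe, safe, n_clicks=10_000)
many_new = risk_aware_objective(0.9, safe, candidate, n_clicks=10_000)
```

With only 4 clicks the penalized objective prefers staying at the safe model, even though the candidate has the higher IPS estimate; with 10,000 clicks the penalty is small and the candidate is preferred. This mirrors the behavior the abstract describes: stay close to the trusted model while the IPS estimate is uncertain, then move away once data accumulates.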

      • Published in

        SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
        July 2023
        3567 pages
        ISBN:9781450394086
        DOI:10.1145/3539618

        Copyright © 2023 Owner/Author

        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 792 of 3,983 submissions, 20%
