Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization

Authors:
Shashank Gupta, University of Amsterdam, Amsterdam, Netherlands (ORCID 0000-0003-1291-7951)
Harrie Oosterhuis, Radboud Universiteit, Nijmegen, Netherlands (ORCID 0000-0002-0458-9233)
Maarten de Rijke, University of Amsterdam, Amsterdam, Netherlands (ORCID 0000-0002-1086-0202)

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2023, Pages 249–258. https://doi.org/10.1145/3539618.3591760

Published: 18 July 2023

ABSTRACT

Counterfactual learning to rank (CLTR) relies on exposure-based inverse propensity scoring (IPS), an LTR-specific adaptation of IPS to correct for position bias. While IPS can provide unbiased and consistent estimates, it often suffers from high variance. Especially when little click data is available, this variance can cause CLTR to learn sub-optimal ranking behavior. Consequently, existing CLTR methods bring significant risks with them, as naively deploying their models can result in very negative user experiences.

We introduce a novel risk-aware CLTR method with theoretical guarantees for safe deployment. We apply a novel exposure-based concept of risk regularization to IPS estimation for LTR. Our risk regularization penalizes the mismatch between the ranking behavior of a learned model and a given safe model. Thereby, it ensures that learned ranking models stay close to a trusted model when there is high uncertainty in IPS estimation, which greatly reduces the risks during deployment. Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available, while also maintaining high performance at convergence. For the CLTR field, our novel exposure-based risk minimization method enables practitioners to adopt CLTR methods in a safer manner that mitigates many of the risks attached to previous methods.
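To make the idea of exposure-based risk regularization concrete, the sketch below combines an IPS utility estimate with a penalty on the mismatch between the exposure a learned model gives to items and the exposure a safe model gives them. It is an illustration under stated assumptions, not the estimator from the paper: the function names, the use of a chi-square divergence over normalized exposure, the `strength` parameter, the choice to treat the safe model as the logging policy, and the 1/sqrt(N) scaling of the penalty are all hypothetical choices made for this example.

```python
import math


def normalize(exposure):
    """Scale per-item exposure so it sums to one (a distribution over items)."""
    total = sum(exposure)
    return [e / total for e in exposure]


def ips_utility(new_exposure, logging_exposure, clicks):
    """Exposure-based IPS estimate: clicks reweighted by the exposure ratio."""
    return sum(
        c * rho / rho0
        for rho, rho0, c in zip(new_exposure, logging_exposure, clicks)
    )


def exposure_divergence(new_exposure, safe_exposure):
    """Chi-square divergence between normalized exposure distributions.

    Assumption: this divergence is chosen for illustration only. It is zero
    when both models distribute exposure identically and grows with mismatch.
    """
    p = normalize(new_exposure)
    q = normalize(safe_exposure)
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def risk_regularized_utility(new_exposure, safe_exposure, clicks,
                             num_interactions, strength=1.0):
    """IPS estimate minus a penalty that shrinks as more data is logged.

    Assumption: the safe model is also the logging policy, so its exposure
    serves as the propensity in the IPS term.
    """
    estimate = ips_utility(new_exposure, safe_exposure, clicks)
    divergence = exposure_divergence(new_exposure, safe_exposure)
    penalty = strength * math.sqrt(divergence / num_interactions)
    return estimate - penalty


if __name__ == "__main__":
    safe = [1.0, 0.5, 0.25]    # hypothetical exposure per item under the safe model
    risky = [0.25, 0.5, 1.0]   # hypothetical exposure under a learned model
    clicks = [0.2, 0.1, 0.4]   # hypothetical average clicks per item
    for n in (10, 1000, 100000):
        print(n, risk_regularized_utility(risky, safe, clicks, n))
```

With few logged interactions the penalty dominates, so a model that departs from the safe exposure scores poorly; as the number of interactions grows the penalty fades and the objective approaches the plain IPS estimate. This mirrors the behavior the abstract describes, without claiming to reproduce the paper's exact objective or guarantees.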

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank
    2. Retrieval tasks and goals
      1. Recommender systems

Published in
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN: 9781450394086
DOI: 10.1145/3539618
General Chairs: Hsin-Hsi Chen (National Taiwan University), Wei-Jou (Edward) Duh (National Taiwan University), Hen-Hsen Huang (Academia Sinica)
Program Chairs: Makoto P. Kato (Spotify), Josiane Mothe (Universite de Toulouse), Barbara Poblete (University of Chile and Amazon Visiting Academic)
Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
counterfactual learning to rank
learning to rank
safety
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%