skip to main content
10.1145/3589334.3645487acmconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
research-article
Open Access

Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach

Published:13 May 2024Publication History

ABSTRACT

Ranking is at the core of many artificial intelligence (AI) applications, including search engines, recommender systems, etc. Modern ranking systems are often constructed with learning-to-rank (LTR) models built from user behavior signals. While previous studies have demonstrated the effectiveness of using user behavior signals (e.g., clicks) as both features and labels of LTR algorithms, we argue that existing LTR algorithms that indiscriminately treat behavior and non-behavior signals in input features could lead to suboptimal performance in practice. Because user behavior signals often have strong correlations with the ranking objective and can only be collected on items that have already been shown to users, directly using behavior signals in LTR could create an exploitation bias that hurts the system performance in the long run.

To address the exploitation bias, we propose an uncertainty-aware empirical Bayes based ranking algorithm, referred to as EBRank. Specifically, EBRank uses a sole non-behavior feature-based prior model to get a prior estimation of relevance. In the dynamic training and serving of ranking systems, EBRank uses the observed user behaviors to update posterior relevance estimation instead of concatenating behaviors as features in ranking models. Besides, EBRank additionally applies an uncertainty-aware exploration strategy to explore actively and collect user behaviors for empirical Bayesian modeling. Experiments on three public datasets show that EBRank is effective, practical and significantly outperforms state-of-the-art ranking algorithms.

Skip Supplemental Material Section

Supplemental Material

rfp1075.mp4

Supplemental video

mp4

6.3 MB

References

  1. Milton Abramowitz and Irene A Stegun. 1964. Handbook of mathematical functions with formulas, graphs, and mathematical tables. Vol. 55. US Government printing office.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Aman Agarwal, Kenta Takatsu, Ivan Zaitsev, and Thorsten Joachims. 2019a. A general framework for counterfactual learning-to-rank. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 5--14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Aman Agarwal, Ivan Zaitsev, Xuanhui Wang, Cheng Li, Marc Najork, and Thorsten Joachims. 2019b. Estimating position bias without intrusive interventions. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 474--482.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Eugene Agichtein, Eric Brill, and Susan Dumais. 2006. Improving web search ranking by incorporating user behavior information. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. 19--26.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Qingyao Ai, Keping Bi, Jiafeng Guo, and W Bruce Croft. 2018a. Learning a deep listwise context model for ranking refinement. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 135--144.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Qingyao Ai, Keping Bi, Cheng Luo, Jiafeng Guo, and W Bruce Croft. 2018b. Unbiased learning to rank with unbiased propensity estimation. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 385--394.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Qingyao Ai, Tao Yang, Huazheng Wang, and Jiaxin Mao. 2021. Unbiased Learning to Rank: Online or Offline? ACM Transactions on Information Systems (TOIS), Vol. 39, 2 (2021), 1--29.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Jessa Bekker, Pieter Robberechts, and Jesse Davis. 2020. Beyond the selected completely at random assumption for learning from positive and unlabeled data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 71--85.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Olivier Chapelle and Yi Chang. 2011. Yahoo! learning to rank challenge overview. In Proceedings of the learning to rank challenge. PMLR, 1--24.Google ScholarGoogle Scholar
  10. Olivier Chapelle, Donald Metlzer, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on Information and knowledge management. 621--630.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Matej Cief, Branislav Kveton, and Michal Kompan. 2022. Pessimistic Off-Policy Optimization for Learning to Rank. arXiv preprint arXiv:2206.02593 (2022).Google ScholarGoogle Scholar
  12. Daniel Cohen, Bhaskar Mitra, Oleg Lesota, Navid Rekabsaz, and Carsten Eickhoff. 2021. Not All Relevance Scores are Equal: Efficient Uncertainty and Calibration Modeling for Deep Retrieval Models. arXiv preprint arXiv:2105.04651 (2021).Google ScholarGoogle Scholar
  13. Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. 2008. An experimental comparison of click position-bias models. In Proceedings of the 2008 international conference on web search and data mining. 87--94.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J Shane Culpepper, Charles LA Clarke, and Jimmy Lin. 2016. Dynamic cutoff prediction in multi-stage retrieval systems. In Proceedings of the 21st Australasian Document Computing Symposium. 17--24.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Domenico Dato, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. 2016. Fast ranking with additive ensembles of oblivious and non-oblivious regression trees. ACM Transactions on Information Systems (TOIS), Vol. 35, 2 (2016), 1--31.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Arthur P Dempster, Nan M Laird, and Donald B Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), Vol. 39, 1 (1977), 1--22.Google ScholarGoogle ScholarCross RefCross Ref
  17. Jianfeng Gao, Wei Yuan, Xiao Li, Kefeng Deng, and Jian-Yun Nie. 2009. Smoothing clickthrough data for web search ranking. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. 355--362.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Parth Gupta, Tommaso Dreossi, Jan Bakus, Yu-Hsiang Lin, and Vamsi Salaka. 2020. Treating Cold Start in Product Search by Priors. In Companion Proceedings of the Web Conference 2020. 77--78.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Cuize Han, Pablo Castells, Parth Gupta, Xu Xu, and Vamsi Salaka. 2022. Addressing Cold Start in Product Search via Empirical Bayes. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (Atlanta, GA, USA) (CIKM '22). Association for Computing Machinery, New York, NY, USA, 3141--3151. https://doi.org/10.1145/3511808.3557066Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Maria Heuss, Daniel Cohen, Masoud Mansoury, Maarten de Rijke, and Carsten Eickhoff. 2023. Predictive Uncertainty-based Bias Mitigation in Ranking. arXiv preprint arXiv:2309.09833 (2023).Google ScholarGoogle Scholar
  21. Kalervo J"arvelin and Jaana Kek"al"ainen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), Vol. 20, 4 (2002), 422--446.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Olivier Jeunen and Bart Goethals. 2021. Pessimistic reward models for off-policy learning in recommendation. In Proceedings of the 15th ACM Conference on Recommender Systems. 63--74.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased learning-to-rank with biased feedback. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 781--789.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Branislav Kveton, Ofer Meshi, Masrour Zoghi, and Zhen Qin. 2022. On the Value of Prior in Online Learning to Rank. In International Conference on Artificial Intelligence and Statistics. PMLR, 6880--6892.Google ScholarGoogle Scholar
  25. Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvári, and Masrour Zoghi. 2020. BubbleRank: Safe online learning to re-rank via implicit click feedback. In Uncertainty in Artificial Intelligence. PMLR, 196--206.Google ScholarGoogle Scholar
  26. Dawen Liang and Nikos Vlassis. 2022. Local Policy Improvement for Recommender Systems. arXiv preprint arXiv:2212.11431 (2022).Google ScholarGoogle Scholar
  27. Yen-Chieh Lien, Daniel Cohen, and W Bruce Croft. 2019. An assumption-free approach to the dynamic truncation of ranked lists. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval. 79--82.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Tie-Yan Liu et al. 2009. Learning to rank for information retrieval. Foundations and Trends® in Information Retrieval, Vol. 3, 3 (2009), 225--331.Google ScholarGoogle Scholar
  29. Craig Macdonald, Rodrygo LT Santos, and Iadh Ounis. 2012. On the usefulness of query features for learning to rank. In Proceedings of the 21st ACM international conference on Information and knowledge management. 2559--2562.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Marco Morik, Ashudeep Singh, Jessica Hong, and Thorsten Joachims. 2020. Controlling Fairness and Bias in Dynamic Learning-to-Rank. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (Virtual Event, China) (SIGIR '20). Association for Computing Machinery, New York, NY, USA, 429--438. https://doi.org/10.1145/3397271.3401100Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Harrie Oosterhuis and Maarten de Rijke. 2018. Differentiable unbiased online learning to rank. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 1293--1302.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Harrie Oosterhuis and Maarten de Rijke. 2020. Policy-aware unbiased learning to rank for top-k rankings. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 489--498.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Harrie Oosterhuis and Maarten de Rijke. 2021a. Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 463--471.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Harrie Oosterhuis and Maarten de de Rijke. 2021b. Robust Generalization and Safe Query-Specializationin Counterfactual Learning to Rank. In Proceedings of the Web Conference 2021. 158--170.Google ScholarGoogle Scholar
  35. Zohreh Ovaisi, Ragib Ahsan, Yifan Zhang, Kathryn Vasilaky, and Elena Zheleva. 2020. Correcting for selection bias in learning-to-rank systems. In Proceedings of The Web Conference 2020. 1863--1873.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Gustavo Penha and Claudia Hauff. 2021. On the Calibration and Uncertainty of Neural Learning to Rank Models for Conversational Search. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 160--170.Google ScholarGoogle ScholarCross RefCross Ref
  37. Tao Qin, Tie-Yan Liu, Jun Xu, and Hang Li. 2010. LETOR: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval, Vol. 13, 4 (2010), 346--374.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Howard Raiffa, Robert Schlaifer, et al. 1961. Applied statistical decision theory. (1961).Google ScholarGoogle Scholar
  39. Haggai Roitman, Shai Erera, and Bar Weiner. 2017. Robust standard deviation estimation for query performance prediction. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval. 245--248.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Yuta Saito, Suguru Yaginuma, Yuta Nishino, Hayato Sakata, and Kazuhide Nakata. 2020. Unbiased recommender learning from missing-not-at-random implicit feedback. In Proceedings of the 13th International Conference on Web Search and Data Mining. 501--509.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Anne Schuth, Katja Hofmann, Shimon Whiteson, and Maarten de Rijke. 2013. Lerot: An online learning to rank framework. In Proceedings of the 2013 workshop on Living labs for information retrieval evaluation. 23--26.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Anne Schuth, Harrie Oosterhuis, Shimon Whiteson, and Maarten de Rijke. 2016. Multileave gradient descent for fast online learning to rank. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 457--466.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Ashudeep Singh and Thorsten Joachims. 2018. Fairness of exposure in rankings. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2219--2228.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Mark D Smucker, James Allan, and Ben Carterette. 2007. A comparison of statistical significance tests for information retrieval evaluation. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management. 623--632.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Anh Tran, Tao Yang, and Qingyao Ai. 2021. ULTRA: An Unbiased Learning To Rank Algorithm Toolbox. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 4613--4622.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Ali Vardasbi, Harrie Oosterhuis, and Maarten de Rijke. 2020. When Inverse Propensity Scoring does not Work: Affine Corrections for Unbiased Learning to Rank. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1475--1484.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu, and Hongning Wang. 2019. Variance reduction in gradient exploration for online learning to rank. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 835--844.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Huazheng Wang, Ramsey Langley, Sonwoo Kim, Eric McCord-Snook, and Hongning Wang. 2018b. Efficient exploration of gradient space for online learning to rank. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 145--154.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Qunbo Wang, Wenjun Wu, Yuxing Qi, and Yongchi Zhao. 2021. Deep bayesian active learning for learning to rank: A case study in answer selection. IEEE Transactions on Knowledge and Data Engineering (2021).Google ScholarGoogle Scholar
  50. Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. 2016. Learning to rank with selection bias in personal search. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 115--124.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Donald Metzler, and Marc Najork. 2018a. Position bias estimation for unbiased learning to rank in personal search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 610--618.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Tao Yang and Qingyao Ai. 2021. Maximizing marginal fairness for dynamic learning to rank. In Proceedings of the Web Conference 2021. 137--145.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Tao Yang, Shikai Fang, Shibo Li, Yulan Wang, and Qingyao Ai. 2020. Analysis of multivariate scoring functions for automatic unbiased learning to rank. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2277--2280.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Tao Yang, Chen Luo, Hanqing Lu, Parth Gupta, Bing Yin, and Qingyao Ai. 2022. Can clicks be both labels and features? Unbiased Behavior Feature Collection and Uncertainty-aware Learning to Rank. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 6--17.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Tao Yang, Zhichao Xu, Zhenduo Wang, Anh Tran, and Qingyao Ai. 2023. Marginal-Certainty-aware Fair Ranking Algorithm. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. 24--32.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Yisong Yue and Thorsten Joachims. 2009. Interactively optimizing information retrieval systems as a dueling bandits problem. In Proceedings of the 26th Annual International Conference on Machine Learning. 1201--1208.Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Jianhan Zhu, Jun Wang, Michael Taylor, and Ingemar J Cox. 2009. Risk-aware information retrieval. In European Conference on Information Retrieval. Springer, 17--28.Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvari, and Zheng Wen. 2017. Online learning to rank in stochastic click models. In International Conference on Machine Learning. PMLR, 4199--4208. ioGoogle ScholarGoogle Scholar

Index Terms

  1. Mitigating Exploitation Bias in Learning to Rank with an Uncertainty-aware Empirical Bayes Approach

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      WWW '24: Proceedings of the ACM on Web Conference 2024
      May 2024
      4826 pages
      ISBN:9798400701719
      DOI:10.1145/3589334

      Copyright © 2024 Owner/Author

      This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 13 May 2024

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,899of8,196submissions,23%
    • Article Metrics

      • Downloads (Last 12 months)18
      • Downloads (Last 6 weeks)18

      Other Metrics

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader