DOI: 10.1145/3604915.3610651
extended-abstract

On the Consistency, Discriminative Power and Robustness of Sampled Metrics in Offline Top-N Recommender System Evaluation

Published: 14 September 2023

ABSTRACT

Negative item sampling in offline top-N recommendation evaluation has become increasingly widespread but remains controversial. While several studies have warned against using sampled evaluation metrics because they poorly approximate the full ranking (i.e., the ranking over all negative items), others have highlighted their improved discriminative power and potential to make evaluation more robust. Unfortunately, empirical studies on negative item sampling are based on relatively few methods (between 3 and 12) and therefore lack the statistical power to assess the impact of negative item sampling in practice.
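To make the distinction between sampled and full evaluation concrete, the following is a minimal sketch of a hit-rate check computed once against all negative items and once against a uniform sample of them. It is an illustration under stated assumptions and not the procedure used in the study: the function names, the toy scores, the uniform sampling strategy, and the values of n and k are all hypothetical.

```python
import random

def rank_of_target(scores, target, candidates):
    """1-based rank of `target` among `candidates` (higher score ranks first)."""
    target_score = scores[target]
    # Count candidates that score strictly higher than the target.
    higher = sum(1 for item in candidates
                 if item != target and scores[item] > target_score)
    return 1 + higher

def hit_at_k(scores, target, negatives, k):
    """1.0 if the target ranks within the top k against `negatives`, else 0.0."""
    return 1.0 if rank_of_target(scores, target, negatives) <= k else 0.0

def sample_negatives(all_items, positives, n, rng):
    """Uniformly sample n items the user has not interacted with (assumed strategy)."""
    pool = sorted(set(all_items) - set(positives))
    return rng.sample(pool, n)

# Toy data (hypothetical): 1000 items, item i has score i, the held-out item is 900.
scores = {i: float(i) for i in range(1000)}
target = 900

# Full evaluation: rank the target against every other item.
negatives_full = [i for i in scores if i != target]
full_rank = rank_of_target(scores, target, negatives_full)
full_hit = hit_at_k(scores, target, negatives_full, k=10)

# Sampled evaluation: rank the target against only n sampled negatives.
rng = random.Random(0)
sampled = sample_negatives(scores.keys(), [target], n=20, rng=rng)
sampled_rank = rank_of_target(scores, target, sampled)
sampled_hit = hit_at_k(scores, target, sampled, k=10)
```

Because the sampled negatives are a subset of all negatives, the target's sampled rank can never be worse than its full rank. This is one reason a sampled metric may overstate performance relative to the full ranking.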

In this article, we present preliminary findings from a comprehensive benchmarking study of negative item sampling based on 52 recommendation algorithms and 3 benchmark data sets. We show how the number of sampled negative items and different sampling strategies affect the consistency and discriminative power of sampled evaluation metrics. Furthermore, we investigate the impact of sparsity bias and popularity bias on the robustness of these metrics. In brief, we show that the optimal parameterizations for negative item sampling are dependent on data set characteristics and the goals of the investigator, suggesting a need for greater transparency in related experimental design decisions.
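The abstract does not state how consistency is measured. One common choice, assumed here purely for illustration, is a rank correlation such as Kendall's tau between the ordering of algorithms under the full metric and their ordering under the sampled metric. The sketch below implements tau-a without tie correction. The metric values are made-up toy numbers, not results from the study.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a between two equally long score lists (no tie correction)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # Positive product: both lists order the pair (i, j) the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical scores for four algorithms (toy values, not measured results).
full_metric = [0.10, 0.20, 0.30, 0.40]      # metric under full ranking
sampled_metric = [0.50, 0.60, 0.80, 0.70]   # metric under sampled negatives

# The sampled metric swaps the order of the last two algorithms.
tau = kendall_tau(full_metric, sampled_metric)
```

A tau of 1 would mean the sampled metric ranks the algorithms exactly as the full metric does. Lower values indicate that sampling changes which algorithm appears better.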


    • Published in

      RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
      September 2023
      1406 pages

      Copyright © 2023 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • extended-abstract
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 254 of 1,295 submissions, 20%
