ABSTRACT
The task of item recommendation requires ranking a large catalogue of items given a context. Item recommendation algorithms are evaluated using ranking metrics that depend on the positions of relevant items. To speed up the computation of metrics, recent work often uses sampled metrics, where only the relevant items and a smaller set of randomly chosen items are ranked. This paper investigates sampled metrics in more detail and shows that they are inconsistent with their exact versions, in the sense that they do not preserve relative statements (e.g., recommender A is better than B), not even in expectation. Moreover, the smaller the sample size, the smaller the differences between metrics, and for very small sample sizes all metrics collapse to the AUC metric. We show that the quality of sampled metrics can be improved by applying a correction, obtained by minimizing different criteria such as bias or mean squared error. We conclude with an empirical evaluation of the naive sampled metrics and their corrected variants. To summarize, our work suggests that sampling should be avoided for metric calculation; however, if an experimental study needs to sample, the proposed corrections can improve the quality of the estimate.
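The sampling procedure described in the abstract can be illustrated with a minimal sketch. It assumes one relevant item per evaluation instance and uniform sampling without replacement of m irrelevant items. The function names (`sampled_rank`, `recall_at_k`, `auc`) are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import random


def sampled_rank(exact_rank, n_items, m_samples, rng):
    """Rank of the relevant item among itself plus m sampled irrelevant items.

    Assumption: the n_items - 1 irrelevant items are indexed so that
    indices below exact_rank - 1 are the ones scored above the relevant item.
    """
    sampled = rng.sample(range(n_items - 1), m_samples)
    higher = sum(1 for j in sampled if j < exact_rank - 1)
    return higher + 1


def recall_at_k(rank, k):
    """1 if the relevant item is within the top k positions, else 0."""
    return 1.0 if rank <= k else 0.0


def auc(rank, n):
    """AUC for a single relevant item at position `rank` in a list of n items."""
    return (n - rank) / (n - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    n, exact = 1000, 101  # hypothetical catalogue size and exact rank
    trials = 20000
    ranks = [sampled_rank(exact, n, 1, rng) for _ in range(trials)]
    # With m = 1 the sampled list has two items, so Recall@1 and AUC coincide.
    mean_recall = sum(recall_at_k(s, 1) for s in ranks) / trials
    mean_auc = sum(auc(s, 2) for s in ranks) / trials
    print("sampled Recall@1:", mean_recall)
    print("sampled AUC:     ", mean_auc)
    print("exact AUC:       ", auc(exact, n))
    print("exact Recall@1:  ", recall_at_k(exact, 1))
```

With a single sampled item, the sampled rank is either 1 or 2, so sampled Recall@1 equals sampled AUC on every draw, and its average approaches the exact AUC rather than the exact Recall@1. This is one concrete instance of the collapse to AUC for very small sample sizes.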