DOI: 10.1145/3543873.3587661
Research Article
Open Access

Investigating Action-Space Generalization in Reinforcement Learning for Recommendation Systems

Published: 30 April 2023

ABSTRACT

Recommender systems suggest items to users based on the users’ preferences. Such systems often deal with massive item sets and extremely sparse user-item interactions, which makes it challenging to generate high-quality personalized recommendations. Reinforcement learning (RL) is a framework for sequential decision making that naturally formulates recommender-system tasks: recommending items as actions in different user and context states so as to maximize long-term user experience. We investigate two RL policy parameterizations that generalize from sparse user-item interactions by leveraging relationships between actions: parameterizing the policy over action features as either a softmax or a Gaussian distribution. Our experiments on synthetic problems suggest that the Gaussian parameterization, which is not commonly used for recommendation tasks, is more robust to the choice of action features than the softmax parameterization. Based on these promising results, we propose a more thorough investigation of the theoretical properties and empirical benefits of the Gaussian parameterization for recommender systems.
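As a rough illustration of the two parameterizations contrasted above, the sketch below defines a policy over item (action) features with a softmax head and with a Gaussian head. All names, dimensions, and the nearest-neighbour mapping from a sampled feature point back to a discrete item are assumptions made for illustration only; they are not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n_actions items, each with a d-dimensional feature
# vector, plus a state embedding of the same dimension (in practice produced
# by some learned state encoder).
n_actions, d = 100, 8
action_features = rng.normal(size=(n_actions, d))  # one row per item
state_embedding = rng.normal(size=d)

# Softmax parameterization over action features:
# pi(a | s) is proportional to exp(phi(s) . x_a), so the logits are inner
# products between the state embedding and each item's feature vector.
logits = action_features @ state_embedding
probs = np.exp(logits - logits.max())
probs /= probs.sum()
softmax_action = rng.choice(n_actions, p=probs)

# Gaussian parameterization over action features:
# the policy outputs a Gaussian in the continuous action-feature space; a
# point is sampled from it and then mapped to the nearest discrete item.
mean = state_embedding  # assumption: mean given directly by the state embedding
std = 0.5               # assumption: fixed standard deviation
sampled_point = rng.normal(loc=mean, scale=std)
distances = np.linalg.norm(action_features - sampled_point, axis=1)
gaussian_action = int(distances.argmin())

print("softmax action:", softmax_action, "| gaussian action:", gaussian_action)
```

Mapping a sampled point back to the closest item is one common way to use a continuous policy over a discrete catalogue; the construction actually studied in the paper may differ.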


  • Published in

    WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023
    April 2023
    1567 pages
    ISBN: 9781450394192
    DOI: 10.1145/3543873

    Copyright © 2023 Owner/Author

    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 30 April 2023


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

