ABSTRACT
Recommender systems suggest items to users based on the users’ preferences. Such systems often deal with massive item sets and extremely sparse user-item interactions, which makes it challenging to generate high-quality personalized recommendations. Reinforcement learning (RL) is a framework for sequential decision making that naturally formulates recommender-system tasks: recommending items as actions in different user and context states to maximize long-term user experience. We investigate two RL policy parameterizations that generalize across sparse user-item interactions by leveraging the relationships between actions: parameterizing the policy over action features as either a softmax or a Gaussian distribution. Our experiments on synthetic problems suggest that the Gaussian parameterization, which is not commonly used in recommendation tasks, is more robust to the choice of action features than the softmax parameterization. Based on these promising results, we propose a more thorough investigation of the theoretical properties and empirical benefits of the Gaussian parameterization for recommender systems.
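As a rough illustration (not taken from the paper), here is a minimal sketch of the two parameterizations, assuming a linear state-to-feature map W and fixed per-item action features: the softmax policy scores every item by the inner product of the mapped state with that item's features, while the Gaussian policy samples a point in action-feature space and recommends the nearest item. All names and dimensions below are illustrative assumptions.

```python
# Illustrative sketch only; names, dimensions, and the linear map W are
# assumptions for exposition, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

n_items, feat_dim, state_dim = 1000, 16, 32
action_feats = rng.normal(size=(n_items, feat_dim))  # x_a for each item
W = 0.01 * rng.normal(size=(state_dim, feat_dim))    # policy parameters

def softmax_policy(state):
    """pi(a|s) proportional to exp(f(s)^T x_a): score every item."""
    scores = action_feats @ (state @ W)               # one logit per item
    probs = np.exp(scores - scores.max())             # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(n_items, p=probs))

def gaussian_policy(state, sigma=0.5):
    """Sample a point in action-feature space from N(f(s), sigma^2 I),
    then recommend the item whose features are nearest to the sample."""
    mu = state @ W                                    # mean in feature space
    sample = mu + sigma * rng.normal(size=feat_dim)
    dists = np.linalg.norm(action_feats - sample, axis=1)
    return int(dists.argmin())

state = rng.normal(size=state_dim)
print(softmax_policy(state), gaussian_policy(state))
```

The sketch also highlights a practical difference: the softmax policy must compute a logit for every item, whereas the Gaussian policy only needs a nearest-neighbor lookup in action-feature space after drawing a sample.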
Recommendations
Learning discriminative recommendation systems with side information
IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence
Top-N recommendation systems are useful in many real world applications such as E-commerce platforms. Most previous methods produce top-N recommendations based on the observed user purchase or recommendation activities. Recently, it has been noticed ...
Investigating serendipity in recommender systems based on real user feedback
SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
Over the past several years, research in recommender systems has emphasized the importance of serendipity, but there is still no consensus on the definition of this concept and whether serendipitous items should be recommended is still not a well-...
New Recommendation Techniques for Multicriteria Rating Systems
Traditional single-rating recommender systems have been successful in a number of personalization applications, but the research area of multicriteria recommender systems has been largely untouched. Taking full advantage of multicriteria ratings in ...