ABSTRACT
A significant challenge in reinforcement learning is quantifying the complex relationship between actions and long-term rewards. The effects of an action may manifest only over a long sequence of subsequent state-action pairs, making them hard to pinpoint. In this paper, we propose a method that links transitions exhibiting significant state deviations to unusually large variations in subsequent rewards. Such transitions are marked as possible causal effects, and the corresponding state-action pairs are added to a separate replay buffer. In addition, we include contrastive samples: transitions from a similar state but with differing actions. Training with this Contrastive Experience Replay (CER) is shown to outperform standard value-based methods on 2D navigation tasks. We believe that CER can be useful for a broad class of learning tasks, including any off-policy reinforcement learning algorithm.
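The buffering scheme sketched in the abstract can be illustrated as follows. This is a minimal sketch, not the paper's exact formulation: the thresholds, the Euclidean similarity metric, and all names (`ContrastiveExperienceReplay`, `add`, `sample`) are assumptions introduced here for illustration.

```python
import numpy as np

class ContrastiveExperienceReplay:
    """Sketch of the CER idea: flag transitions whose state change and
    subsequent reward are both unusually large, keep them in a separate
    buffer, and pair them with contrastive samples (similar state,
    different action). Thresholds and metrics are assumptions."""

    def __init__(self, state_thresh=1.0, reward_thresh=1.0, sim_thresh=0.1):
        self.state_thresh = state_thresh    # min state deviation to flag
        self.reward_thresh = reward_thresh  # min reward magnitude to flag
        self.sim_thresh = sim_thresh        # max distance for a "similar" state
        self.main_buffer = []               # all observed transitions
        self.causal_buffer = []             # flagged + contrastive transitions

    def add(self, state, action, reward, next_state):
        t = (np.asarray(state, float), action, float(reward),
             np.asarray(next_state, float))
        self.main_buffer.append(t)
        # Flag a possible causal effect: large state deviation AND large reward.
        if (np.linalg.norm(t[3] - t[0]) > self.state_thresh
                and abs(t[2]) > self.reward_thresh):
            self.causal_buffer.append(t)
            self._add_contrastive(t)

    def _add_contrastive(self, flagged):
        # Contrastive samples: transitions from a similar state, different action.
        s, a = flagged[0], flagged[1]
        for (s2, a2, r2, ns2) in self.main_buffer:
            if a2 != a and np.linalg.norm(s2 - s) < self.sim_thresh:
                self.causal_buffer.append((s2, a2, r2, ns2))

    def sample(self, batch_size, rng=None):
        # Prefer the flagged/contrastive pool when it is non-empty; any
        # off-policy learner (e.g. DQN) can consume these batches as usual.
        rng = rng or np.random.default_rng()
        pool = self.causal_buffer if self.causal_buffer else self.main_buffer
        idx = rng.integers(0, len(pool), size=batch_size)
        return [pool[i] for i in idx]
```

Because the buffer only filters and stores transitions, it can sit alongside the ordinary replay buffer of any off-policy algorithm, with training batches mixed from both.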
Index Terms
- Using Contrastive Samples for Identifying and Leveraging Possible Causal Relationships in Reinforcement Learning