DOI: 10.1145/3570991.3571031
Short paper

Using Contrastive Samples for Identifying and Leveraging Possible Causal Relationships in Reinforcement Learning

Published: 04 January 2023

ABSTRACT

A significant challenge in reinforcement learning is quantifying the complex relationship between actions and long-term rewards. The effects may manifest over a long sequence of state-action pairs, making them hard to pinpoint. In this paper, we propose a method that links transitions producing significant deviations in state to unusually large variations in subsequent rewards. Such transitions are marked as possible causal effects, and the corresponding state-action pairs are added to a separate replay buffer. In addition, we include contrastive samples: transitions from a similar state but with differing actions. Including this Contrastive Experience Replay (CER) during training is shown to outperform standard value-based methods on 2D navigation tasks. We believe that CER can be useful for a broad class of learning tasks and with any off-policy reinforcement learning algorithm.
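The abstract describes the core mechanics of CER: flag transitions whose state change and subsequent reward deviation are both unusually large, store them in a separate buffer, and pair them with contrastive transitions taken from a similar state but with a different action. The sketch below illustrates one way such a buffer could be organized. It is a minimal sketch only: the class name, the deviation and reward thresholds, the Euclidean similarity measure, and the sampling mix are illustrative assumptions, not the paper's exact formulation.

```python
import random
from collections import deque

import numpy as np


class ContrastiveExperienceReplay:
    """Illustrative sketch of a CER-style buffer (names/thresholds assumed).

    Transitions with a large state change followed by a large reward
    deviation are flagged as possible causal effects and stored in a
    separate buffer, alongside contrastive transitions taken from a
    similar state but with a different (discrete) action.
    """

    def __init__(self, capacity=10_000, state_thresh=1.0, reward_thresh=1.0):
        self.main = deque(maxlen=capacity)    # ordinary replay buffer
        self.causal = deque(maxlen=capacity)  # flagged + contrastive samples
        self.state_thresh = state_thresh      # assumed state-deviation threshold
        self.reward_thresh = reward_thresh    # assumed reward-deviation threshold

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.main.append(transition)

        # Flag transitions whose state change is large and whose reward is
        # unusually large in magnitude (a simple proxy for "unusually large
        # variations in subsequent rewards").
        state_dev = np.linalg.norm(np.asarray(next_state) - np.asarray(state))
        if state_dev > self.state_thresh and abs(reward) > self.reward_thresh:
            self.causal.append(transition)
            self._add_contrastive(state, action)

    def _add_contrastive(self, state, action, k=1):
        # Contrastive samples: stored transitions from a similar state
        # (here, nearby in Euclidean distance) but with a different action.
        similar = [
            t for t in self.main
            if t[1] != action
            and np.linalg.norm(np.asarray(t[0]) - np.asarray(state)) < self.state_thresh
        ]
        self.causal.extend(random.sample(similar, min(k, len(similar))))

    def sample(self, batch_size, causal_fraction=0.25):
        # Mix flagged/contrastive transitions into each training batch.
        n_causal = min(int(batch_size * causal_fraction), len(self.causal))
        batch = random.sample(self.causal, n_causal) if n_causal else []
        batch += random.sample(self.main, min(batch_size - n_causal, len(self.main)))
        return batch
```

In an off-policy learner such as a DQN-style agent, `sample()` would simply replace the usual uniform draw from the replay buffer; the rest of the training loop is unchanged.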


Published in

        CODS-COMAD '23: Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)
        January 2023
        357 pages
ISBN: 9781450397971
DOI: 10.1145/3570991

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • short-paper
        • Research
        • Refereed limited

        Acceptance Rates

Overall acceptance rate: 197 of 680 submissions, 29%