ABSTRACT
A significant challenge in reinforcement learning is quantifying the complex relationship between actions and long-term rewards. The effects of an action may manifest only over a long sequence of subsequent state-action pairs, making them hard to pinpoint. In this paper, we propose a method that links transitions exhibiting significant state deviations to unusually large variations in subsequent rewards. Such transitions are marked as possible causal effects, and the corresponding state-action pairs are added to a separate replay buffer. In addition, we include contrastive samples: transitions from a similar state but with differing actions. Training with this Contrastive Experience Replay (CER) is shown to outperform standard value-based methods on 2D navigation tasks. We believe that CER can be useful for a broad class of learning tasks, including any off-policy reinforcement learning algorithm.
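The buffering scheme sketched in the abstract can be illustrated as follows. This is a minimal sketch, not the paper's exact formulation: the thresholds, the Euclidean similarity metric, and all names (`ContrastiveExperienceReplay`, `add`, `sample`) are assumptions introduced here for illustration.

```python
import numpy as np

class ContrastiveExperienceReplay:
    """Sketch of the CER idea: flag transitions whose state change and
    subsequent reward are both unusually large, keep them in a separate
    buffer, and pair them with contrastive samples (similar state,
    different action). Thresholds and metrics are assumptions."""

    def __init__(self, state_thresh=1.0, reward_thresh=1.0, sim_thresh=0.1):
        self.state_thresh = state_thresh    # min state deviation to flag
        self.reward_thresh = reward_thresh  # min reward magnitude to flag
        self.sim_thresh = sim_thresh        # max distance for a "similar" state
        self.main_buffer = []               # all observed transitions
        self.causal_buffer = []             # flagged + contrastive transitions

    def add(self, state, action, reward, next_state):
        t = (np.asarray(state, float), action, float(reward),
             np.asarray(next_state, float))
        self.main_buffer.append(t)
        # Flag a possible causal effect: large state deviation AND large reward.
        if (np.linalg.norm(t[3] - t[0]) > self.state_thresh
                and abs(t[2]) > self.reward_thresh):
            self.causal_buffer.append(t)
            self._add_contrastive(t)

    def _add_contrastive(self, flagged):
        # Contrastive samples: transitions from a similar state, different action.
        s, a = flagged[0], flagged[1]
        for (s2, a2, r2, ns2) in self.main_buffer:
            if a2 != a and np.linalg.norm(s2 - s) < self.sim_thresh:
                self.causal_buffer.append((s2, a2, r2, ns2))

    def sample(self, batch_size, rng=None):
        # Prefer the flagged/contrastive pool when it is non-empty; any
        # off-policy learner (e.g. DQN) can consume these batches as usual.
        rng = rng or np.random.default_rng()
        pool = self.causal_buffer if self.causal_buffer else self.main_buffer
        idx = rng.integers(0, len(pool), size=batch_size)
        return [pool[i] for i in idx]
```

Because the buffer only filters and stores transitions, it can sit alongside the ordinary replay buffer of any off-policy algorithm, with training batches mixed from both.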
Index Terms
- Using Contrastive Samples for Identifying and Leveraging Possible Causal Relationships in Reinforcement Learning