Abstract and Explore: A Novel Behavioral Metric with Cyclic Dynamics in Reinforcement Learning

Authors

  • Anjie Zhu University of Electronic Science and Technology of China, China
  • Peng-Fei Zhang The University of Queensland, Australia
  • Ruihong Qiu The University of Queensland, Australia
  • Zetao Zheng University of Electronic Science and Technology of China, China
  • Zi Huang The University of Queensland, Australia
  • Jie Shao University of Electronic Science and Technology of China, China

DOI:

https://doi.org/10.1609/aaai.v38i15.29660

Keywords:

ML: Reinforcement Learning, ROB: Behavior Learning & Control, ROB: Localization, Mapping, and Navigation

Abstract

Intrinsic motivation lies at the heart of exploration in reinforcement learning, as it is driven primarily by the agent's inherent satisfaction rather than by external feedback from the environment. However, in recent, more challenging procedurally-generated environments with high stochasticity and uninformative extrinsic rewards, we identify two significant issues in applying intrinsic motivation. (1) State representation collapse: in existing methods, the representations learned within intrinsic motivation are prone to neglecting the distinctions among different states and to being distracted by task-irrelevant information introduced by the stochasticity. (2) Insufficient interrelation among dynamics: the poor guidance provided by uninformative extrinsic rewards renders dynamics learning within intrinsic motivation less effective. In light of these observations, a novel Behavioral metric with Cyclic Dynamics (BCD) is proposed, which considers both cumulative and immediate effects and facilitates both abstraction and exploration by the agent. For the behavioral metric, successor features are utilized to reveal expected future rewards and to alleviate the heavy reliance of previous methods on extrinsic rewards. Moreover, latent-variable and vector quantization techniques are employed to enable an accurate measurement of the transition function in a discrete and interpretable manner. In addition, cyclic dynamics are established to capture the interrelation between state and action, thereby providing a thorough awareness of environmental dynamics. Extensive experiments conducted on procedurally-generated environments demonstrate the state-of-the-art performance of our proposed BCD.
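As a rough illustration of the vector-quantization step mentioned in the abstract, a continuous latent state can be snapped to the nearest entry of a learned codebook, yielding a discrete, interpretable code. This is a minimal sketch of the generic technique, not the authors' implementation; the function and variable names here are assumptions.

```python
import numpy as np

def quantize(latent, codebook):
    """Map a continuous latent vector to its nearest codebook entry.

    latent:   (d,) continuous state embedding
    codebook: (K, d) learned discrete codes
    Returns the index of the nearest code and the quantized vector.
    """
    dists = np.linalg.norm(codebook - latent, axis=1)  # distance to each code
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# Toy example: 4 codes in 2-D; a latent near (0, 1) maps to code 2.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, code = quantize(np.array([0.1, 0.9]), codebook)
# idx == 2, code == [0.0, 1.0]
```

In a full VQ pipeline the codebook itself is learned (e.g. with a commitment loss and straight-through gradients), but the nearest-neighbour lookup above is the core discretization step.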

Published

2024-03-24

How to Cite

Zhu, A., Zhang, P.-F., Qiu, R., Zheng, Z., Huang, Z., & Shao, J. (2024). Abstract and Explore: A Novel Behavioral Metric with Cyclic Dynamics in Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 17150-17158. https://doi.org/10.1609/aaai.v38i15.29660

Section

AAAI Technical Track on Machine Learning VI