Abstract and Explore: A Novel Behavioral Metric with Cyclic Dynamics in Reinforcement Learning

Authors

  • Anjie Zhu University of Electronic Science and Technology of China, China
  • Peng-Fei Zhang The University of Queensland, Australia
  • Ruihong Qiu The University of Queensland, Australia
  • Zetao Zheng University of Electronic Science and Technology of China, China
  • Zi Huang The University of Queensland, Australia
  • Jie Shao University of Electronic Science and Technology of China, China

DOI:

https://doi.org/10.1609/aaai.v38i15.29660

Keywords:

ML: Reinforcement Learning, ROB: Behavior Learning & Control, ROB: Localization, Mapping, and Navigation

Abstract

Intrinsic motivation lies at the heart of exploration in reinforcement learning, as it is driven primarily by the agent's inherent satisfaction rather than by external feedback from the environment. However, in recent, more challenging procedurally-generated environments with high stochasticity and uninformative extrinsic rewards, we identify two significant issues in applying intrinsic motivation. (1) State representation collapse: in existing methods, the representations learned within intrinsic motivation are prone to neglecting the distinctions among different states and to being distracted by task-irrelevant information introduced by the stochasticity. (2) Insufficient interrelation among dynamics: the poor guidance provided by uninformative extrinsic rewards renders dynamics learning within intrinsic motivation less effective. In light of these observations, a novel Behavioral metric with Cyclic Dynamics (BCD) is proposed, which considers both cumulative and immediate effects and facilitates both abstraction and exploration by the agent. For the behavioral metric, successor features are utilized to reveal expected future rewards and to alleviate the heavy reliance of previous methods on extrinsic rewards. Moreover, latent-variable and vector quantization techniques are employed to enable an accurate measurement of the transition function in a discrete and interpretable manner. In addition, cyclic dynamics are established to capture the interrelation between state and action, thereby providing a thorough awareness of environmental dynamics. Extensive experiments conducted on procedurally-generated environments demonstrate the state-of-the-art performance of our proposed BCD.
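As a rough illustration of the vector-quantization step mentioned in the abstract, a continuous latent state can be snapped to the nearest entry of a learned codebook, yielding a discrete, interpretable code. This is a minimal sketch of the generic technique, not the authors' implementation; the function and variable names here are assumptions.

```python
import numpy as np

def quantize(latent, codebook):
    """Map a continuous latent vector to its nearest codebook entry.

    latent:   (d,) continuous state embedding
    codebook: (K, d) learned discrete codes
    Returns the index of the nearest code and the quantized vector.
    """
    dists = np.linalg.norm(codebook - latent, axis=1)  # distance to each code
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# Toy example: 4 codes in 2-D; a latent near (0, 1) maps to code 2.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, code = quantize(np.array([0.1, 0.9]), codebook)
# idx == 2, code == [0.0, 1.0]
```

In a full VQ pipeline the codebook itself is learned (e.g. with a commitment loss and straight-through gradients), but the nearest-neighbour lookup above is the core discretization step.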

Published

2024-03-24

How to Cite

Zhu, A., Zhang, P.-F., Qiu, R., Zheng, Z., Huang, Z., & Shao, J. (2024). Abstract and Explore: A Novel Behavioral Metric with Cyclic Dynamics in Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 17150-17158. https://doi.org/10.1609/aaai.v38i15.29660

Section

AAAI Technical Track on Machine Learning VI