Abstract and Explore: A Novel Behavioral Metric with Cyclic Dynamics in Reinforcement Learning
DOI:
https://doi.org/10.1609/aaai.v38i15.29660
Keywords:
ML: Reinforcement Learning, ROB: Behavior Learning & Control, ROB: Localization, Mapping, and Navigation
Abstract
Intrinsic motivation lies at the heart of exploration in reinforcement learning: it is primarily driven by the agent's inherent satisfaction rather than external feedback from the environment. However, in recent, more challenging procedurally generated environments with high stochasticity and uninformative extrinsic rewards, we identify two significant issues in applying intrinsic motivation. (1) State representation collapse: In existing methods, the representations learned within intrinsic motivation are likely to neglect the distinctions among different states and to be distracted by task-irrelevant information introduced by the stochasticity. (2) Insufficient interrelation among dynamics: The poor guidance provided by uninformative extrinsic rewards makes dynamics learning in intrinsic motivation less effective. In light of these observations, we propose a novel Behavioral metric with Cyclic Dynamics (BCD), which considers both cumulative and immediate effects and facilitates the agent's abstraction and exploration. For the behavioral metric, successor features are used to capture expected future rewards and to alleviate the heavy reliance of previous methods on extrinsic rewards. Moreover, latent-variable and vector quantization techniques are employed to enable an accurate measurement of the transition function in a discrete and interpretable manner. In addition, cyclic dynamics is established to capture the interrelations between state and action, thereby providing a thorough awareness of environmental dynamics. Extensive experiments conducted on procedurally generated environments demonstrate the state-of-the-art performance of the proposed BCD.
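As a rough illustration of the vector quantization idea mentioned in the abstract, the sketch below maps a continuous latent vector to the nearest entry of a small codebook, so that nearby latents share one discrete code. It is a generic nearest-neighbor quantizer, not the BCD implementation: the function names, the 2-D codebook, and the example latents are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return the index and value of the codebook entry nearest to `latent`."""
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(latent, codebook[k]))
    return index, list(codebook[index])


# Hypothetical 2-D codebook with three discrete codes (an assumption,
# not taken from the paper).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Two nearby latents map to the same discrete code; a distant one does not.
code_a, _ = quantize([0.9, 0.1], codebook)
code_b, _ = quantize([1.1, -0.2], codebook)
code_c, _ = quantize([0.1, 0.8], codebook)
```

In a learned model the codebook entries would be trained jointly with the encoder; here they are fixed so that the discretization step can be read in isolation.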
Published
2024-03-24
How to Cite
Zhu, A., Zhang, P.-F., Qiu, R., Zheng, Z., Huang, Z., & Shao, J. (2024). Abstract and Explore: A Novel Behavioral Metric with Cyclic Dynamics in Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 17150-17158. https://doi.org/10.1609/aaai.v38i15.29660
Issue
Section
AAAI Technical Track on Machine Learning VI