OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning
DOI: https://doi.org/10.1609/aaai.v38i14.29520
Keywords: ML: Reinforcement Learning
Abstract
Model-based offline reinforcement learning (RL) algorithms have emerged as a promising paradigm for offline RL. These algorithms usually learn a dynamics model from a static dataset of transitions, use the model to generate synthetic trajectories, and perform conservative policy optimization within these trajectories. However, our observations indicate that the policy optimization methods used in these model-based offline RL algorithms do not explore the learned model effectively and induce biased exploration, which ultimately impairs the performance of the algorithm. To address this issue, we propose Offline Conservative ExplorAtioN (OCEAN), a novel rollout approach for model-based offline RL. Our method incorporates additional exploration techniques and introduces three conservative constraints based on uncertainty estimation to mitigate the potential impact of large dynamics errors arising from exploratory transitions. OCEAN is a plug-in method and can be combined with classical model-based offline RL algorithms such as MOPO, COMBO, and RAMBO. Experimental results on the D4RL MuJoCo benchmark show that OCEAN significantly improves the performance of existing algorithms.
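The abstract's core mechanism, penalizing model-generated transitions by an uncertainty estimate from a dynamics ensemble, can be illustrated with a minimal sketch. The following is not the paper's implementation: the ensemble members, dimensions, and the max-std-dev penalty (a common choice, used e.g. in MOPO) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble of learned dynamics models, each mapping (s, a) -> s'.
# Here fixed linear maps with slightly different weights stand in for neural
# networks trained on the offline dataset (illustrative assumption).
STATE_DIM, ACTION_DIM, ENSEMBLE_SIZE = 3, 2, 5
weights = [
    rng.normal(scale=1.0, size=(STATE_DIM, STATE_DIM + ACTION_DIM))
    for _ in range(ENSEMBLE_SIZE)
]

def predict_next_states(s, a):
    """Stack every ensemble member's next-state prediction: (E, STATE_DIM)."""
    x = np.concatenate([s, a])
    return np.stack([W @ x for W in weights])

def penalized_reward(r, s, a, lam=1.0):
    """Uncertainty-penalized reward for a model rollout step.

    The penalty is the max per-dimension std-dev across ensemble
    predictions; lam trades off conservatism against exploration.
    """
    preds = predict_next_states(s, a)
    uncertainty = preds.std(axis=0).max()
    return r - lam * uncertainty

# During synthetic rollouts, exploratory transitions where the ensemble
# disagrees receive a lower reward, discouraging the policy from
# exploiting regions where the learned model is likely wrong.
s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
print(penalized_reward(1.0, s, a, lam=0.5))
```

Since the penalty is non-negative, the penalized reward never exceeds the raw model reward; with `lam=0` the penalty vanishes and the rollout reduces to plain (non-conservative) model-based RL.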
Published
2024-03-24
How to Cite
Wu, F., Zhang, R., Yi, Q., Gao, Y., Guo, J., Peng, S., Lan, S., Han, H., Pan, Y., Yuan, K., Jin, P., Chen, R., Chen, Y., & Li, L. (2024). OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(14), 15897-15905. https://doi.org/10.1609/aaai.v38i14.29520
Section
AAAI Technical Track on Machine Learning V