Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward

Guo, Yanjiang; Gao, Jingyue; Wu, Zheng; Shi, Chengming; Chen, Jianyu

arXiv:2212.01509 (cs)

[Submitted on 3 Dec 2022 (v1), last revised 8 Mar 2023 (this version, v2)]

Abstract: Reinforcement learning often suffers from the sparse reward issue in real-world robotics problems. Learning from demonstration (LfD) is an effective way to address this problem: it leverages collected expert data to aid online learning. Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task. In this paper, we consider the case where the target task is mismatched from, but similar to, that of the expert. Such a setting can be challenging, and we found that existing LfD methods cannot effectively guide learning in mismatched new tasks with sparse rewards. We propose conservative reward shaping from demonstration (CRSfD), which shapes the sparse rewards using an estimated expert value function. To accelerate the learning process, CRSfD guides the agent to conservatively explore around demonstrations. Experimental results on robot manipulation tasks show that our approach outperforms baseline LfD methods when transferring demonstrations collected in a single task to other different but similar tasks.
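
The idea of shaping a sparse reward with a value function can be illustrated with generic potential-based reward shaping, where the potential is a value estimate. The sketch below is only an illustration under stated assumptions, not the paper's CRSfD formulation: the function names, the toy value function, and the zero potential at terminal states are all hypothetical choices, and the conservative handling of states far from the demonstrations is omitted.

```python
from typing import Callable, Sequence

State = Sequence[float]


def shaped_reward(
    sparse_reward: float,
    state: State,
    next_state: State,
    value_fn: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Generic potential-based shaping: r + gamma * V(s') - V(s).

    value_fn stands in for an estimated expert value function (assumption).
    The potential of a terminal state is taken to be zero (assumption).
    """
    next_potential = 0.0 if done else value_fn(next_state)
    return sparse_reward + gamma * next_potential - value_fn(state)


def toy_value(state: State) -> float:
    # Hypothetical value estimate: larger (less negative) closer to the origin.
    return -sum(abs(x) for x in state)


if __name__ == "__main__":
    # A step toward the origin receives a positive shaped reward
    # even though the sparse environment reward is zero.
    print(shaped_reward(0.0, [2.0], [1.0], toy_value, gamma=1.0))
```

With this form, a transition that moves to a state with a higher value estimate gets a denser learning signal, while the sparse task reward is still passed through unchanged.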

Comments:	11 pages, 5 figures, CoRL 2022
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2212.01509 [cs.RO]
	(or arXiv:2212.01509v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2212.01509

Submission history

From: Yanjiang Guo
[v1] Sat, 3 Dec 2022 02:24:59 UTC (5,748 KB)
[v2] Wed, 8 Mar 2023 15:35:56 UTC (5,748 KB)
