OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Liu, Jinyi; Wang, Zhi; Zheng, Yan; Hao, Jianye; Bai, Chenjia; Ye, Junjie; Wang, Zhen; Piao, Haiyin; Sun, Yang

arXiv:2312.12145 (cs)

[Submitted on 19 Dec 2023 (v1), last revised 20 Dec 2023 (this version, v2)]

Abstract: In reinforcement learning, optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, which are characterized by higher uncertainty. However, in the presence of environmental stochasticity (noise), purely optimistic exploration may lead to excessive probing of high-noise areas, which impedes exploration efficiency. Hence, when exploring noisy environments, optimism-driven exploration should remain the foundation, but it is beneficial to also avoid unnecessary over-exploration of high-noise areas. In this work, we propose the Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control. OVD-Explorer introduces a new measure of the policy's exploration ability that accounts for noise from an optimistic perspective, and uses gradient ascent to drive exploration. In practice, OVD-Explorer can be easily integrated with continuous-control RL algorithms. Extensive evaluations on the MuJoCo and GridChaos tasks demonstrate the superiority of OVD-Explorer in achieving noise-aware optimistic exploration.
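
The concern raised in the abstract can be illustrated with a toy sketch. This is not the measure proposed in the paper. It assumes a hypothetical ensemble of quantile-based return estimates for a single state-action pair, and the function names, the `beta` and `lam` weights, and the scoring rules are all invented for illustration. Disagreement between ensemble members stands in for uncertainty from lack of data, while the spread within each member stands in for environmental noise.

```python
import numpy as np

def uncertainty_split(quantiles):
    """Split uncertainty for one state-action pair.

    quantiles: array of shape (n_members, n_quantiles), a hypothetical
    ensemble of quantile estimates of the return.
    """
    member_means = quantiles.mean(axis=1)
    mean = member_means.mean()
    epistemic = member_means.std()            # disagreement between members
    aleatoric = quantiles.std(axis=1).mean()  # average spread within a member
    return mean, epistemic, aleatoric

def naive_optimistic_score(quantiles, beta=1.0):
    # Optimism bonus that lumps both uncertainty sources together (assumed form).
    mean, epistemic, aleatoric = uncertainty_split(quantiles)
    return mean + beta * (epistemic + aleatoric)

def noise_aware_score(quantiles, beta=1.0, lam=1.0):
    # Stay optimistic about epistemic uncertainty, discount noise (assumed form).
    mean, epistemic, aleatoric = uncertainty_split(quantiles)
    return mean + beta * epistemic - lam * aleatoric

# Members agree with each other, but returns are widely spread: a high-noise action.
noisy_action = np.array([[-2.0, 0.0, 2.0],
                         [-2.0, 0.0, 2.0]])
# Members disagree, but each predicts a narrow return: a less explored action.
unexplored_action = np.array([[-0.5, -0.5, -0.5],
                              [0.5, 0.5, 0.5]])

for name, q in [("noisy", noisy_action), ("unexplored", unexplored_action)]:
    print(name, naive_optimistic_score(q), noise_aware_score(q))
```

Under these assumptions, the lumped bonus ranks the high-noise action above the less explored one, while the noise-aware score reverses that ranking. This mirrors the over-exploration of high-noise areas that the abstract describes.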

Comments:	Accepted by AAAI 2024, with appendix
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2312.12145 [cs.LG]
	(or arXiv:2312.12145v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.12145

Submission history

From: Jinyi Liu
[v1] Tue, 19 Dec 2023 13:28:34 UTC (2,202 KB)
[v2] Wed, 20 Dec 2023 15:16:32 UTC (2,202 KB)
