How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization

Zhang, Hai; Yu, Hang; Zhao, Junqiao; Zhang, Di; Huang, Chang; Zhou, Hongtu; Zhang, Xiao; Ye, Chen

arXiv:2309.12671 (cs)

[Submitted on 22 Sep 2023 (v1), last revised 24 Oct 2023 (this version, v2)]

Abstract: Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly because model learning and policy optimization are tightly coupled. Many prior methods that rely on return discrepancy to guide model learning ignore the impact of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use a performance difference bound to account for model shift explicitly. However, these methods rely on a fixed threshold to constrain model shift, which makes them heavily dependent on the threshold and unable to adapt during training. In this paper, we theoretically derive an optimization objective that unifies model shift and model bias, and then formulate a fine-tuning process. This process adaptively adjusts the model updates to obtain a performance improvement guarantee while avoiding model overfitting. Based on these results, we develop a straightforward algorithm, USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2309.12671 [cs.LG] (or arXiv:2309.12671v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2309.12671

Submission history

From: Hai Zhang
[v1] Fri, 22 Sep 2023 07:27:32 UTC (10,297 KB)
[v2] Tue, 24 Oct 2023 06:09:44 UTC (10,340 KB)
