In hierarchical reinforcement learning (HRL), continuous options provide a knowledge carrier that is better aligned with human behavior, but reliable methods for scheduling them are not yet available. To fill this gap, this paper proposes the hierarchical reinforcement learning with adaptive scheduling (HAS) algorithm, which aims to achieve an adaptive balance between exploration and exploitation during the frequent scheduling of continuous options. Building on multi-step static scheduling, HAS makes switching decisions according to the relative advantages of the previous and the newly estimated options, enabling the agent to focus on different behaviors at different phases of training. The expected t-step distance is applied to demonstrate the superiority of adaptive scheduling in terms of exploration. Furthermore, an annealing-based interruption incentive is proposed to alleviate excessive exploration and accelerate convergence. We develop a comprehensive experimental analysis scheme, and the results demonstrate the high performance and robustness of HAS; moreover, they provide evidence that adaptive scheduling has a positive effect on both the representation and the option policies.
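The switching decision described in the abstract can be sketched as follows. This is a minimal illustrative reading, not the authors' implementation: the function names (`interruption_incentive`, `should_switch`), the exponential annealing schedule, and all constants are our own assumptions, as is the exact sign convention of the incentive.

```python
import math

def interruption_incentive(step, xi0=0.5, decay=1e-3):
    """Hypothetical annealed interruption incentive: large early in training
    (encouraging option switches, i.e. exploration) and decaying toward zero
    so that excessive exploration is curbed later on. The exponential
    schedule and constants are illustrative assumptions."""
    return xi0 * math.exp(-decay * step)

def should_switch(adv_prev, adv_new, step):
    """Adaptive scheduling decision (assumed form): interrupt the previous
    option when the newly estimated option's advantage, boosted by the
    annealed incentive, exceeds the advantage of continuing the previous
    option."""
    return adv_new + interruption_incentive(step) > adv_prev
```

Under this reading, early in training even a slightly worse candidate option can interrupt the current one (exploration), while after the incentive has annealed away the agent switches only when the new option is strictly better (exploitation).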