Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation

Authors

  • Long-Fei Li, Nanjing University
  • Peng Zhao, Nanjing University
  • Zhi-Hua Zhou, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v38i12.29261

Keywords:

ML: Reinforcement Learning, ML: Online Learning & Bandits

Abstract

We study reinforcement learning (RL) in episodic MDPs with adversarial full-information losses and unknown transition. Instead of the classical static regret, we adopt dynamic regret as the performance measure, which benchmarks the learner against a sequence of changing policies and is therefore better suited to non-stationary environments. The primary challenge is to handle two uncertainties simultaneously: the unknown transition and the unknown non-stationarity of the environment. We propose a general framework that decouples the two sources of uncertainty and show that the dynamic regret bound naturally decomposes into two terms: one due to constructing confidence sets to handle the unknown transition, and the other due to choosing sub-optimal policies under the unknown non-stationarity. To this end, we first employ a two-layer online ensemble structure, which is model-agnostic, to handle the adaptation error due to the unknown non-stationarity. Subsequently, we instantiate the framework for three fundamental MDP models (tabular MDPs, linear MDPs, and linear mixture MDPs) and present corresponding approaches to control the exploration error due to the unknown transition. We provide dynamic regret guarantees for each model and show that they are optimal in terms of the number of episodes K and the non-stationarity P̄ᴋ by establishing matching lower bounds. To the best of our knowledge, this is the first work to achieve dynamic regret with optimal dependence on K and P̄ᴋ without prior knowledge of the non-stationarity for adversarial MDPs with unknown transition.
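
The two-layer online ensemble idea mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a plain probability simplex over actions instead of occupancy measures, ignores transitions entirely, and uses Hedge with a small uniform mixing step as the base learner. The names `hedge_step`, `two_layer_ensemble`, `base_lrs`, `meta_lr`, and `mix` are hypothetical and chosen only for illustration. The point it shows is that several base learners run with different learning rates, and a meta learner weights them by observed loss, so no single learning rate has to be tuned to the unknown non-stationarity in advance.

```python
import numpy as np

def hedge_step(weights, losses, lr, mix=0.0):
    """One exponential-weights update, followed by optional uniform mixing.

    The mixing step (an assumption of this sketch) keeps every weight
    bounded away from zero so the learner can recover after a change.
    """
    logits = np.log(np.maximum(weights, 1e-300)) - lr * losses
    logits -= logits.max()              # shift for numerical stability
    w = np.exp(logits)
    w /= w.sum()
    return (1.0 - mix) * w + mix / w.size

def two_layer_ensemble(loss_seq, base_lrs, meta_lr, mix=0.01):
    """Toy two-layer ensemble on full-information loss vectors.

    loss_seq: array of shape (K, A) with losses in [0, 1].
    base_lrs: hypothetical grid of base learning rates.
    Returns the (K, A) array of distributions played in each round.
    """
    loss_seq = np.asarray(loss_seq, dtype=float)
    K, A = loss_seq.shape
    N = len(base_lrs)
    base = np.full((N, A), 1.0 / A)     # one distribution per base learner
    meta = np.full(N, 1.0 / N)          # meta weights over base learners
    played = np.empty((K, A))
    for k in range(K):
        played[k] = meta @ base         # combine the base decisions
        base_losses = base @ loss_seq[k]  # expected loss of each base learner
        meta = hedge_step(meta, base_losses, meta_lr)
        for i, lr in enumerate(base_lrs):
            base[i] = hedge_step(base[i], loss_seq[k], lr, mix)
    return played
```

For instance, on a loss sequence whose best action switches halfway through, calling `two_layer_ensemble(losses, base_lrs=[0.01, 0.1, 1.0], meta_lr=0.5)` lets the meta learner shift weight toward whichever base learner tracks the change fastest.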

Published

2024-03-24

How to Cite

Li, L.-F., Zhao, P., & Zhou, Z.-H. (2024). Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13572-13580. https://doi.org/10.1609/aaai.v38i12.29261

Section

AAAI Technical Track on Machine Learning III