Multiobjective Lipschitz Bandits under Lexicographic Ordering

Authors

  • Bo Xue — City University of Hong Kong, Hong Kong, China; The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
  • Ji Cheng — City University of Hong Kong, Hong Kong, China; The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
  • Fei Liu — City University of Hong Kong, Hong Kong, China; The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China
  • Yimu Wang — University of Waterloo, Waterloo, Canada
  • Qingfu Zhang — City University of Hong Kong, Hong Kong, China; The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China

DOI:

https://doi.org/10.1609/aaai.v38i15.29558

Keywords:

ML: Reinforcement Learning, ML: Online Learning & Bandits

Abstract

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein the learner aims to maximize m objectives simultaneously and hierarchically. The only existing algorithm for this problem considers the multi-armed bandit model, and its regret bound is O((KT)^(2/3)) under a metric called priority-based regret. However, this bound is suboptimal, as the lower bound for single-objective multi-armed bandits is Omega(K log T). Moreover, this bound becomes vacuous when the number of arms K is infinite. To address these limitations, we investigate the multiobjective Lipschitz bandit model, which allows for an infinite arm set. Utilizing a newly designed multi-stage decision-making strategy, we develop an improved algorithm that achieves a general regret bound of O(T^((d_z^i+1)/(d_z^i+2))) for the i-th objective, where d_z^i is the zooming dimension for the i-th objective, with i in {1, 2, ..., m}. This bound matches the lower bound of the single-objective Lipschitz bandit problem in terms of T, indicating that our algorithm is almost optimal. Numerical experiments confirm the effectiveness of our algorithm.
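To make the T^((d+1)/(d+2)) rate concrete, the sketch below runs a standard single-objective Lipschitz bandit baseline on the interval [0, 1]: uniform discretization into roughly T^(1/3) arms followed by UCB, which is known to give O(T^(2/3)) regret in one dimension (the covering-dimension rate; adaptive zooming, as used in the paper, improves the exponent to the zooming dimension d_z). This is an illustrative baseline, not the authors' multi-stage lexicographic algorithm; the reward function and all parameter choices here are hypothetical.

```python
import math
import random

def lipschitz_ucb(reward_fn, T):
    """UCB over a uniform discretization of [0, 1].

    With K ~ T^(1/3) arms, the discretization error plus the UCB regret
    over K arms together yield O(T^(2/3)) regret for 1-Lipschitz rewards.
    """
    K = max(2, int(round(T ** (1.0 / 3.0))))
    arms = [(k + 0.5) / K for k in range(K)]  # grid midpoints
    counts = [0] * K
    sums = [0.0] * K
    total = 0.0
    for t in range(1, T + 1):
        if t <= K:
            # Initialization: play each arm once.
            k = t - 1
        else:
            # Pick the arm with the highest upper confidence bound.
            k = max(range(K), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        r = reward_fn(arms[k])
        counts[k] += 1
        sums[k] += r
        total += r
    return total / T  # average reward collected over the horizon

random.seed(0)
# Hypothetical noisy 1-Lipschitz reward with its peak (mean value 1.0) at x = 0.7.
noisy = lambda x: 1.0 - abs(x - 0.7) + random.gauss(0.0, 0.1)
avg_reward = lipschitz_ucb(noisy, 5000)
```

Under lexicographic ordering, one would run a procedure like this per objective, restricting later objectives to arms that remain near-optimal for earlier ones; that restriction is where the paper's multi-stage strategy and per-objective zooming dimension d_z^i enter.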

Published

2024-03-24

How to Cite

Xue, B., Cheng, J., Liu, F., Wang, Y., & Zhang, Q. (2024). Multiobjective Lipschitz Bandits under Lexicographic Ordering. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 16238-16246. https://doi.org/10.1609/aaai.v38i15.29558

Section

AAAI Technical Track on Machine Learning VI