DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Authors

  • Wentse Chen, Carnegie Mellon University
  • Shiyu Huang, 4Paradigm Inc.
  • Yuan Chiang, Tsinghua University
  • Tim Pearce, Microsoft Research
  • Wei-Wei Tu, 4Paradigm Inc.
  • Ting Chen, Tsinghua University
  • Jun Zhu, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v38i10.29019

Keywords:

ML: Reinforcement Learning, ML: Deep Learning Algorithms

Abstract

Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or to improve the robustness of a policy to an unexpected perturbation. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately imposes constraints on the diversity of the strategies and on the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and using policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards while discovering more diverse strategies, often with better sample efficiency.
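The abstract's core mechanism is an information-theoretic intrinsic reward that keeps the learned strategies mutually distinguishable. As a rough illustration only (not the paper's implementation), the sketch below shows one common way such a reward is instantiated: a discriminator q(z|s) predicts the active latent strategy z from the visited state s, and the intrinsic reward log q(z|s) - log p(z) is high when each strategy visits states that identify it. A toy reward switch mimics the alternating diversity/extrinsic constraints the abstract mentions. All names here (StrategyDiscriminator, diversity_reward, combined_reward) and the threshold logic are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StrategyDiscriminator(nn.Module):
    """Approximates q(z | s): which latent strategy z produced state s."""
    def __init__(self, state_dim: int, n_strategies: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_strategies),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # unnormalized logits over strategies

def diversity_reward(disc: StrategyDiscriminator,
                     state: torch.Tensor,
                     z: torch.Tensor,
                     n_strategies: int) -> torch.Tensor:
    """Intrinsic reward log q(z|s) - log p(z), with p(z) uniform.

    High when the visited state identifies the active strategy,
    i.e. when the strategies are mutually distinguishable.
    """
    log_q = F.log_softmax(disc(state), dim=-1)
    log_q_z = log_q.gather(-1, z.unsqueeze(-1)).squeeze(-1)
    log_p_z = -torch.log(torch.tensor(float(n_strategies)))
    return log_q_z - log_p_z

def combined_reward(r_ext, r_div, div_ok: bool, ret_ok: bool, alpha=0.5):
    """Toy stand-in for the alternating constraints: optimize diversity
    while it is below its threshold, otherwise the extrinsic return."""
    if not div_ok:
        return r_div
    if not ret_ok:
        return r_ext
    return alpha * r_ext + (1 - alpha) * r_div

# Example usage: 4 strategies, 8-dim states, a batch of 5 transitions.
disc = StrategyDiscriminator(state_dim=8, n_strategies=4)
s = torch.randn(5, 8)
z = torch.randint(0, 4, (5,))
r_div = diversity_reward(disc, s, z, n_strategies=4)
```

In practice the discriminator is trained jointly with the policy (e.g., by cross-entropy on the sampled z), so the intrinsic reward and the policy improve together; this sketch omits that training loop.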

Published

2024-03-24

How to Cite

Chen, W., Huang, S., Chiang, Y., Pearce, T., Tu, W.-W., Chen, T., & Zhu, J. (2024). DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10), 11390-11398. https://doi.org/10.1609/aaai.v38i10.29019

Issue

Vol. 38 No. 10 (2024)

Section

AAAI Technical Track on Machine Learning I