Decoding Global Preferences: Temporal and Cooperative Dependency Modeling in Multi-Agent Preference-Based Reinforcement Learning

Authors

  • Tianchen Zhu, School of Computer Science and Engineering, Beihang University
  • Yue Qiu, School of Computer Science and Engineering, Beihang University
  • Haoyi Zhou, Zhongguancun Laboratory, Beijing, China; School of Software, Beihang University
  • Jianxin Li, School of Computer Science and Engineering, Beihang University; Zhongguancun Laboratory, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v38i15.29666

Keywords:

ML: Reinforcement Learning, MAS: Multiagent Learning

Abstract

Designing accurate reward functions for reinforcement learning (RL) has long been challenging. Preference-based RL (PbRL) offers a promising alternative: it trains agents from human preference feedback, eliminating the need for manual reward design. While PbRL has succeeded in single-agent tasks, extending it to complex multi-agent scenarios is nontrivial. Existing PbRL methods cannot comprehensively capture both the temporal and cooperative dependencies in multi-agent trajectories, leading to inadequate reward functions. This work introduces an advanced multi-agent preference learning framework that addresses these limitations. Built on a cascading Transformer architecture, our approach captures both temporal and cooperative dependencies, alleviating issues related to reward uniformity and intricate interactions among agents. Experimental results demonstrate substantial performance improvements in multi-agent cooperative tasks, and the reconstructed reward function closely resembles expert-defined reward functions. The source code is available at https://github.com/catezi/MAPT.
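To make the abstract's "cascading Transformer" idea concrete, the sketch below shows one plausible reading of it: a first Transformer attends over timesteps (temporal dependencies), a second attends over agents at each step (cooperative dependencies), and the resulting per-agent rewards are trained with the standard Bradley-Terry preference loss used throughout PbRL. This is a minimal illustration, not the authors' released MAPT code; names such as CascadedPreferenceModel, obs_dim, and the layer sizes are assumptions, and positional encodings are omitted for brevity.

```python
# Minimal sketch of a cascading Transformer reward model for multi-agent
# PbRL. Illustrative assumption, not the official MAPT implementation.
import torch
import torch.nn as nn

class CascadedPreferenceModel(nn.Module):
    """Predicts per-step, per-agent rewards from a joint trajectory.

    Stage 1 (temporal): self-attention over the T timesteps of each agent.
    Stage 2 (cooperative): self-attention over the N agents at each timestep.
    """

    def __init__(self, obs_dim: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2)
        self.agents = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2)
        self.reward_head = nn.Linear(d_model, 1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, n_agents, obs_dim)
        B, T, N, _ = obs.shape
        x = self.embed(obs)
        # Temporal stage: fold agents into the batch, attend over time.
        x = x.permute(0, 2, 1, 3).reshape(B * N, T, -1)
        x = self.temporal(x)
        # Cooperative stage: fold time into the batch, attend over agents.
        x = x.reshape(B, N, T, -1).permute(0, 2, 1, 3).reshape(B * T, N, -1)
        x = self.agents(x)
        # Per-agent, per-step reward predictions.
        return self.reward_head(x).reshape(B, T, N)

def preference_loss(model, seg_a, seg_b, label):
    """Bradley-Terry loss on a pair of trajectory segments.

    label = 1 if the annotator prefers segment A, 0 if segment B.
    """
    ret_a = model(seg_a).sum(dim=(1, 2))  # predicted return of segment A
    ret_b = model(seg_b).sum(dim=(1, 2))  # predicted return of segment B
    return nn.functional.binary_cross_entropy_with_logits(
        ret_a - ret_b, label.float())
```

The cascade is the key design choice this sketch tries to convey: attending over time and over agents in separate stages lets the learned reward differ across agents and steps, countering the reward-uniformity problem the abstract describes.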

Published

2024-03-24

How to Cite

Zhu, T., Qiu, Y., Zhou, H., & Li, J. (2024). Decoding Global Preferences: Temporal and Cooperative Dependency Modeling in Multi-Agent Preference-Based Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 17202-17210. https://doi.org/10.1609/aaai.v38i15.29666

Issue

Vol. 38 No. 15 (2024): Proceedings of the AAAI Conference on Artificial Intelligence

Section

AAAI Technical Track on Machine Learning VI