Cooperative Multi-agent Reinforcement Learning with Hierarchical Communication Architecture

  • Conference paper
  • First Online:
Artificial Neural Networks and Machine Learning – ICANN 2022 (ICANN 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13530)

Abstract

Communication is essential for coordination in multi-agent systems. By sharing local observations and intentions over a communication channel, agents can better cope with a dynamic environment and thus make better decisions. However, because the channel is limited, agents must convey more informative messages with fewer communication resources. In this article, we propose a two-level hierarchical multi-agent reinforcement learning algorithm in which the two levels operate on different timescales. Communication happens only between the high-level policies, at a coarser timescale, to generate sub-goals that convey each agent's intention to its low level. The low level is responsible for carrying out these sub-goals by selecting primitive actions at every tick of the environment. Sub-goals are the core of this hierarchical communication architecture: the high level must communicate efficiently to produce them, and they in turn guide the low level to coordinate. This architecture has several benefits: 1) it coarsens the granularity of collaboration and reduces the communication requirement, since communication occurs only at the high level and at a coarser timescale; 2) it lets the high level focus on coordinating goals rather than on their implementation, which improves communication efficiency; and 3) it yields better control by dividing a complex multi-agent cooperative task into multiple single-agent tasks. In experiments, we apply our approach to vehicle collision-avoidance tasks and achieve better performance than the baselines.
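To make the two timescales concrete, the following is a minimal sketch of the control loop described above: high-level policies exchange messages and emit sub-goals only every K environment ticks, while low-level policies output primitive actions at every tick conditioned on the current sub-goal. All names, the mean-pooling of messages, the placeholder random policies, and the choice of K are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-timescale hierarchy described in the abstract.
# Everything here (names, mean-pooled messages, placeholder policies, K) is
# an illustrative assumption, not the authors' implementation.
import numpy as np

K = 5          # high-level timescale: communicate and reset sub-goals every K ticks
GOAL_DIM = 2   # dimensionality of a sub-goal
OBS_DIM = 4    # dimensionality of a local observation
rng = np.random.default_rng(0)


def high_level_step(observations):
    """All agents' high levels exchange messages and emit sub-goals.

    A message is simply the local observation here, and sub-goals are a
    noisy function of the pooled messages; a learned communication network
    would replace both.
    """
    messages = [obs.copy() for obs in observations]   # broadcast local observations
    pooled = np.mean(messages, axis=0)                # aggregate received messages
    return [pooled[:GOAL_DIM] + 0.1 * rng.standard_normal(GOAL_DIM)
            for _ in observations]


def low_level_step(obs, subgoal):
    """One agent's low level maps (observation, sub-goal) to a primitive action."""
    # Placeholder policy: move a small step in the direction of the sub-goal.
    return 0.1 * subgoal


def run_episode(n_agents=3, horizon=20):
    observations = [rng.standard_normal(OBS_DIM) for _ in range(n_agents)]
    subgoals = [np.zeros(GOAL_DIM) for _ in range(n_agents)]

    for t in range(horizon):
        if t % K == 0:                                # coarse timescale: communicate
            subgoals = high_level_step(observations)
        actions = [low_level_step(o, g)               # fine timescale: act every tick
                   for o, g in zip(observations, subgoals)]
        # Placeholder environment transition: observations drift with the actions.
        observations = [o + np.pad(a, (0, OBS_DIM - GOAL_DIM))
                        for o, a in zip(observations, actions)]


if __name__ == "__main__":
    run_episode()
```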

This work was supported in part by the Natural Science Foundation of China under Grant 61902035, Grant 61876023, Grant 62001054 and Grant 62102041, and in part by the Fundamental Research Funds for the Central Universities.



Author information


Corresponding author

Correspondence to Shifan Liu.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, S., Yuan, Q., Chen, B., Luo, G., Li, J. (2022). Cooperative Multi-agent Reinforcement Learning with Hierarchical Communication Architecture. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13530. Springer, Cham. https://doi.org/10.1007/978-3-031-15931-2_2

  • DOI: https://doi.org/10.1007/978-3-031-15931-2_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15930-5

  • Online ISBN: 978-3-031-15931-2

  • eBook Packages: Computer Science, Computer Science (R0)
