Towards Interpretable Policies in Multi-agent Reinforcement Learning Tasks

  • Conference paper
  • In: Bioinspired Optimization Methods and Their Applications (BIOMA 2022)

Abstract

Deep Learning (DL) has allowed the field of Multi-Agent Reinforcement Learning (MARL) to make significant advances, speeding up progress in the field. However, agents trained by means of DL in MARL settings have an important drawback: their policies are extremely hard to interpret, not only at the individual agent level, but also (and especially) when one takes into account the interactions across the whole set of agents. In this work, we take a step towards achieving interpretability in MARL tasks. To do so, we present an approach that combines evolutionary computation (namely, grammatical evolution) and reinforcement learning (Q-learning), which allows us to produce agents that are, at least to some extent, understandable. Moreover, unlike the typically centralized DL-based approaches (and thanks to the possibility of using a replay buffer), our method can easily employ Independent Q-learning to train a team of agents, which facilitates robustness and scalability. By evaluating our approach on the Battlefield task from the MAgent implementation in the PettingZoo library, we observe that the evolved team of agents is able to coordinate its actions in a distributed fashion, solving the task effectively.
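To make the recipe concrete, below is a minimal, illustrative sketch (not the authors' code) of the two ingredients combined: a hand-written decision tree stands in for one that grammatical evolution would discover, each of its leaves holds a row of Q-values trained with independent Q-learning, and the resulting team is run in the PettingZoo MAgent Battlefield environment. The module name (battlefield_v3), the observation-channel indices, and all hyper-parameters are assumptions that may need adjusting to your PettingZoo release; paper-era releases return four values from env.last(), newer ones five.

```python
# Illustrative sketch only: a fixed decision tree stands in for one evolved
# by grammatical evolution; each leaf learns Q-values independently.
import numpy as np
from pettingzoo.magent import battlefield_v3  # module name/version assumed

ALPHA, GAMMA, EPS = 0.1, 0.99, 0.05  # assumed hyper-parameters


class QLeafTree:
    """Two hypothetical splits -> 4 leaves; each leaf stores Q-values."""

    def __init__(self, n_actions, n_leaves=4):
        self.q = np.zeros((n_leaves, n_actions))

    def leaf(self, obs):
        # Hypothetical interpretable features; channel semantics assumed,
        # check the Battlefield observation docs for the real layout.
        allies = obs[..., 1].sum()
        enemies = obs[..., 3].sum()
        return 2 * int(allies > enemies) + int(enemies > 0)

    def act(self, obs):
        s = self.leaf(obs)
        if np.random.rand() < EPS:  # epsilon-greedy exploration
            return s, int(np.random.randint(self.q.shape[1]))
        return s, int(np.argmax(self.q[s]))

    def update(self, s, a, r, s_next, done):
        # Standard one-step Q-learning backup at the leaf level.
        target = r if done else r + GAMMA * self.q[s_next].max()
        self.q[s, a] += ALPHA * (target - self.q[s, a])


env = battlefield_v3.env()
env.reset()
n_actions = env.action_space(env.agents[0]).n
trees = {ag: QLeafTree(n_actions) for ag in env.agents}  # independent learners
prev = {ag: None for ag in env.agents}  # pending (leaf, action) per agent

for agent in env.agent_iter():
    obs, reward, done, info = env.last()  # newer releases return 5 values
    tree = trees[agent]
    s = tree.leaf(obs) if obs is not None else None
    if prev[agent] is not None:  # finish the pending transition
        ps, pa = prev[agent]
        tree.update(ps, pa, reward, s, done)
    if done:
        env.step(None)  # AEC API: finished agents must step None
        prev[agent] = None
    else:
        s, a = tree.act(obs)
        prev[agent] = (s, a)
        env.step(a)
env.close()
```

Under these assumptions, each agent's policy reduces to a few threshold tests plus a small Q-table per leaf, which is what makes the evolved controllers inspectable in the sense of Note 1.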

Notes

  1. In the rest of this paper, we define an interpretable system as one that can be understood and inspected by humans [2].

  2. https://www.pettingzoo.ml/magent/battlefield (accessed on 02/02/2022).

References

  1. OroojlooyJadid, A., Hajinezhad, D.: A review of cooperative multi-agent deep reinforcement learning (2020). arXiv:1908.03963

  2. Barredo Arrieta, A., et al.: Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fus. 58, 82–115 (2020)

  3. Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019)

  4. Rudin, C., Radin, J.: Why are we using black box models in AI when we don’t need to? A lesson from an explainable AI competition. Harvard Data Sci. Rev. 1(2) (2019)

  5. Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., Zhong, C.: Interpretable machine learning: fundamental principles and 10 grand challenges, July 2021. arXiv:2103.11251

  6. Custode, L.L., Iacca, G.: Evolutionary learning of interpretable decision trees (2020)

  7. Potter, M.A., De Jong, K.A.: A cooperative coevolutionary approach to function optimization. In: Davidor, Y., Schwefel, H.-P., Männer, R. (eds.) PPSN 1994. LNCS, vol. 866, pp. 249–257. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58484-6_269

  8. Zheng, L., et al.: MAgent: a many-agent reinforcement learning platform for artificial collective intelligence. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, pp. 8222–8223 (2018)

  9. Terry, J.K., et al.: PettingZoo: gym for multi-agent reinforcement learning (2020). arXiv:2009.14471

  10. Busoniu, L., Babuska, R., De Schutter, B.: A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Syst. Man Cybern. Part C (Applications and Reviews) 38(2), 156–172 (2008)

  11. Stone, P., Veloso, M.: Multiagent systems: a survey from a machine learning perspective. Technical report, Defense Technical Information Center, Fort Belvoir, VA (1997)

  12. Yu, C., Liu, J., Nemati, S.: Reinforcement Learning in Healthcare: a survey, April 2020. arXiv:1908.08796

  13. Sandholm, T.W., Crites, R.H.: On multiagent Q-learning in a semi-competitive domain. In: Weiß, G., Sen, S. (eds.) IJCAI 1995. LNCS, vol. 1042, pp. 191–205. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-60923-7_28

  14. Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Machine Learning Proceedings 1994. Morgan Kaufmann, San Francisco (CA), pp. 157–163 (1994)

  15. Haynes, T., Wainwright, R.L., Sen, S., Schoenefeld, D.A.: Strongly typed genetic programming in evolving cooperation strategies. In: International Conference on Genetic Algorithms, San Francisco, CA, USA, pp. 271–278. Morgan Kaufmann Publishers Inc. (July 1995)

  16. Tan, M.: Multi-agent reinforcement learning: independent vs. cooperative agents, pp. 487–494. Morgan Kaufmann Publishers Inc., San Francisco (1997)

  17. Lauer, M., Riedmiller, M.A.: An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: International Conference on Machine Learning, San Francisco, CA, USA, pp. 535–542. Morgan Kaufmann Publishers Inc. (2000)

  18. Fuji, T., Ito, K., Matsumoto, K., Yano, K.: Deep multi-agent reinforcement learning using DNN-weight evolution to optimize supply chain performance. In: Hawaii International Conference on System Sciences (HICSS), Honolulu, HI, USA, pp. 1278–1287 (2018)

  19. Omidshafiei, S., Pazis, J., Amato, C., How, J.P., Vian, J.: Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: International Conference on Machine Learning, Sydney, NSW, Australia, pp. 2681–2690. JMLR.org (2017)

  20. Matignon, L., Laurent, G.J., Le Fort-Piat, N.: Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, New York, NY, USA, pp. 64–69 (2007)

  21. Tampuu, A., et al.: Multiagent cooperation and competition with deep reinforcement learning, November 2015. arXiv:1511.08779

  22. Chu, X., Ye, H.: Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning, October 2017. arXiv:1710.00336

  23. Singh, A., Jain, T., Sukhbaatar, S.: Learning when to communicate at scale in multiagent cooperative and competitive tasks (2018). arXiv:1812.09755

  24. Macua, S.V., et al.: Diff-DAC: distributed actor-critic for average multitask deep reinforcement learning (2019). arXiv:1710.10363

  25. Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning based on team reward. In: International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden, pp. 2085–2087. International Foundation for Autonomous Agents and Multiagent Systems (2018)

  26. Yang, J., Nakhaei, A., Isele, D., Fujimura, K., Zha, H.: CM3: cooperative multi-goal multi-stage multi-agent reinforcement learning, January 2020. arXiv:1809.05188

  27. Virgolin, M., De Lorenzo, A., Medvet, E., Randone, F.: Learning a formula of interpretability to learn interpretable formulas. In: Bäck, T., et al. (eds.) Parallel Problem Solving from Nature, pp. 79–93. Springer International Publishing, Cham (2020)

  28. Barceló, P., Monet, M., Pérez, J., Subercaseaux, B.: Model interpretability through the lens of computational complexity. In: Proceedings of 33rd conference on Advances in Neural Information Processing Systems (2020)

  29. Custode, L.L., Iacca, G.: A co-evolutionary approach to interpretable reinforcement learning in environments with continuous action spaces. In: 2021 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8, December 2021

  30. Ryan, C., Collins, J.J., Neill, M.O.: Grammatical evolution: evolving programs for an arbitrary language. In: Banzhaf, W., Poli, R., Schoenauer, M., Fogarty, T.C. (eds.) EuroGP 1998. LNCS, vol. 1391, pp. 83–96. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0055930

  31. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. A Bradford Book, Cambridge (2018)

  32. Foerster, J., Assael, I.A., de Freitas, N., Whiteson, S.: Learning to communicate with deep multi-agent reinforcement learning. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., Red Hook (2016)

  33. Lotito, Q.F., Custode, L.L., Iacca, G.: A signal-centric perspective on the evolution of symbolic communication. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 120–128. Association for Computing Machinery, New York, NY, USA (2021)

Author information

Corresponding author

Correspondence to Giovanni Iacca.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Crespi, M., Custode, L.L., Iacca, G. (2022). Towards Interpretable Policies in Multi-agent Reinforcement Learning Tasks. In: Mernik, M., Eftimov, T., Črepinšek, M. (eds) Bioinspired Optimization Methods and Their Applications. BIOMA 2022. Lecture Notes in Computer Science, vol 13627. Springer, Cham. https://doi.org/10.1007/978-3-031-21094-5_19

  • DOI: https://doi.org/10.1007/978-3-031-21094-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21093-8

  • Online ISBN: 978-3-031-21094-5

  • eBook Packages: Computer Science (R0)
