Towards Interpretable Policies in Multi-agent Reinforcement Learning Tasks

  • Conference paper
  • In: Bioinspired Optimization Methods and Their Applications (BIOMA 2022)

Abstract

Deep Learning (DL) has allowed the field of Multi-Agent Reinforcement Learning (MARL) to make significant advances, speeding up progress in the field. However, agents trained by means of DL in MARL settings have an important drawback: their policies are extremely hard to interpret, not only at the individual agent level, but also (and especially) when one takes into account the interactions across the whole set of agents. In this work, we take a step towards achieving interpretability in MARL tasks. To do so, we present an approach that combines evolutionary computation (namely, grammatical evolution) and reinforcement learning (Q-learning), which allows us to produce agents that are, at least to some extent, understandable. Moreover, unlike the typically centralized DL-based approaches (and thanks to the possibility of using a replay buffer), our method can easily employ Independent Q-learning to train a team of agents, which facilitates robustness and scalability. By evaluating our approach on the Battlefield task from the MAgent implementation in the PettingZoo library, we observe that the evolved team of agents is able to coordinate its actions in a distributed fashion, solving the task effectively.
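To make the recipe concrete, below is a minimal, illustrative sketch (not the authors' code) of the two ingredients combined: a hand-written decision tree stands in for one that grammatical evolution would discover, each of its leaves holds a row of Q-values trained with independent Q-learning, and the resulting team is run in the PettingZoo MAgent Battlefield environment. The module name (battlefield_v3), the observation-channel indices, and all hyper-parameters are assumptions that may need adjusting to your PettingZoo release; paper-era releases return four values from env.last(), newer ones five.

```python
# Illustrative sketch only: a fixed decision tree stands in for one evolved
# by grammatical evolution; each leaf learns Q-values independently.
import numpy as np
from pettingzoo.magent import battlefield_v3  # module name/version assumed

ALPHA, GAMMA, EPS = 0.1, 0.99, 0.05  # assumed hyper-parameters


class QLeafTree:
    """Two hypothetical splits -> 4 leaves; each leaf stores Q-values."""

    def __init__(self, n_actions, n_leaves=4):
        self.q = np.zeros((n_leaves, n_actions))

    def leaf(self, obs):
        # Hypothetical interpretable features; channel semantics assumed,
        # check the Battlefield observation docs for the real layout.
        allies = obs[..., 1].sum()
        enemies = obs[..., 3].sum()
        return 2 * int(allies > enemies) + int(enemies > 0)

    def act(self, obs):
        s = self.leaf(obs)
        if np.random.rand() < EPS:  # epsilon-greedy exploration
            return s, int(np.random.randint(self.q.shape[1]))
        return s, int(np.argmax(self.q[s]))

    def update(self, s, a, r, s_next, done):
        # Standard one-step Q-learning backup at the leaf level.
        target = r if done else r + GAMMA * self.q[s_next].max()
        self.q[s, a] += ALPHA * (target - self.q[s, a])


env = battlefield_v3.env()
env.reset()
n_actions = env.action_space(env.agents[0]).n
trees = {ag: QLeafTree(n_actions) for ag in env.agents}  # independent learners
prev = {ag: None for ag in env.agents}  # pending (leaf, action) per agent

for agent in env.agent_iter():
    obs, reward, done, info = env.last()  # newer releases return 5 values
    tree = trees[agent]
    s = tree.leaf(obs) if obs is not None else None
    if prev[agent] is not None:  # finish the pending transition
        ps, pa = prev[agent]
        tree.update(ps, pa, reward, s, done)
    if done:
        env.step(None)  # AEC API: finished agents must step None
        prev[agent] = None
    else:
        s, a = tree.act(obs)
        prev[agent] = (s, a)
        env.step(a)
env.close()
```

Under these assumptions, each agent's policy reduces to a few threshold tests plus a small Q-table per leaf, which is what makes the evolved controllers inspectable in the sense of Note 1.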

Notes

  1. In the rest of this paper, we define an interpretable system as one that can be understood and inspected by humans [2].

  2. https://www.pettingzoo.ml/magent/battlefield (accessed on 02/02/2022).

References

  1. OroojlooyJadid, A., Hajinezhad, D.: A review of cooperative multi-agent deep reinforcement learning (2020). arXiv:1908.03963

  2. Barredo Arrieta, A., et al.: Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fus. 58, 82–115 (2020)

  3. Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019)

  4. Rudin, C., Radin, J.: Why are we using black box models in AI when we don’t need to? A lesson from an explainable AI competition. Harvard Data Sci. Rev. 1(2) (2019)

  5. Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., Zhong, C.: Interpretable machine learning: fundamental principles and 10 grand challenges, July 2021. arXiv:2103.11251

  6. Custode, L.L., Iacca, G.: Evolutionary learning of interpretable decision trees (2020)

  7. Potter, M.A., De Jong, K.A.: A cooperative coevolutionary approach to function optimization. In: Davidor, Y., Schwefel, H.-P., Männer, R. (eds.) PPSN 1994. LNCS, vol. 866, pp. 249–257. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58484-6_269

  8. Zheng, L., et al.: MAgent: a many-agent reinforcement learning platform for artificial collective intelligence. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, pp. 8222–8223 (2018)

  9. Terry, J.K., et al.: PettingZoo: gym for multi-agent reinforcement learning (2020). arXiv:2009.14471

  10. Busoniu, L., Babuska, R., De Schutter, B.: A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Syst. Man Cybern. Part C (Applications and Reviews) 38(2), 156–172 (2008)

  11. Stone, P., Veloso, M.: Multiagent systems: a survey from a machine learning perspective. Technical report, Defense Technical Information Center, Fort Belvoir, VA (1997)

  12. Yu, C., Liu, J., Nemati, S.: Reinforcement Learning in Healthcare: a survey, April 2020. arXiv:1908.08796

  13. Sandholm, T.W., Crites, R.H.: On multiagent Q-learning in a semi-competitive domain. In: Weiß, G., Sen, S. (eds.) IJCAI 1995. LNCS, vol. 1042, pp. 191–205. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-60923-7_28

  14. Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Machine Learning Proceedings 1994. Morgan Kaufmann, San Francisco (CA), pp. 157–163 (1994)

  15. Haynes, T., Wainwright, R.L., Sen, S., Schoenefeld, D.A.: Strongly typed genetic programming in evolving cooperation strategies. In: International Conference on Genetic Algorithms, San Francisco, CA, USA, pp. 271–278. Morgan Kaufmann Publishers Inc. (July 1995)

  16. Tan, M.: Multi-agent reinforcement learning: independent vs. cooperative agents, pp. 487–494. Morgan Kaufmann Publishers Inc., San Francisco (1997)

  17. Lauer, M., Riedmiller, M.A.: An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: International Conference on Machine Learning, San Francisco, CA, USA, pp. 535–542. Morgan Kaufmann Publishers Inc. (2000)

  18. Fuji, T., Ito, K., Matsumoto, K., Yano, K.: Deep multi-agent reinforcement learning using DNN-weight evolution to optimize supply chain performance. In: Hawaii International Conference on System Sciences (HICSS), Honolulu, HI, USA, pp. 1278–1287 (2018)

  19. Omidshafiei, S., Pazis, J., Amato, C., How, J.P., Vian, J.: Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: International Conference on Machine Learning, Sydney, NSW, Australia, pp. 2681–2690. JMLR.org (2017)

  20. Matignon, L., Laurent, G.J., Le Fort-Piat, N.: Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, New York, NY, USA, pp. 64–69 (2007)

  21. Tampuu, A., et al.: Multiagent cooperation and competition with deep reinforcement learning, November 2015. arXiv:1511.08779

  22. Chu, X., Ye, H.: Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning, October 2017. arXiv:1710.00336

  23. Singh, A., Jain, T., Sukhbaatar, S.: Learning when to communicate at scale in multiagent cooperative and competitive tasks (2018). arXiv:1812.09755

  24. Macua, S.V., et al.: Diff-DAC: distributed actor-critic for average multitask deep reinforcement learning (2019). arXiv:1710.10363

  25. Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning based on team reward. In: International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden, pp. 2085–2087. International Foundation for Autonomous Agents and Multiagent Systems (2018)

  26. Yang, J., Nakhaei, A., Isele, D., Fujimura, K., Zha, H.: CM3: cooperative multi-goal multi-stage multi-agent reinforcement learning, January 2020. arXiv:1809.05188

  27. Virgolin, M., De Lorenzo, A., Medvet, E., Randone, F.: Learning a formula of interpretability to learn interpretable formulas. In: Bäck, T., et al. (eds.) Parallel Problem Solving from Nature, pp. 79–93. Springer International Publishing, Cham (2020)

  28. Barceló, P., Monet, M., Pérez, J., Subercaseaux, B.: Model interpretability through the lens of computational complexity. In: Proceedings of 33rd conference on Advances in Neural Information Processing Systems (2020)

  29. Custode, L.L., Iacca, G.: A co-evolutionary approach to interpretable reinforcement learning in environments with continuous action spaces. In: 2021 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8, December 2021

  30. Ryan, C., Collins, J.J., Neill, M.O.: Grammatical evolution: evolving programs for an arbitrary language. In: Banzhaf, W., Poli, R., Schoenauer, M., Fogarty, T.C. (eds.) EuroGP 1998. LNCS, vol. 1391, pp. 83–96. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0055930

  31. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. A Bradford Book, Cambridge (2018)

  32. Foerster, J., Assael, I.A., de Freitas, N., Whiteson, S.: Learning to communicate with deep multi-agent reinforcement learning. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., Red Hook (2016)

  33. Lotito, Q.F., Custode, L.L., Iacca, G.: A signal-centric perspective on the evolution of symbolic communication. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 120–128. Association for Computing Machinery, New York, NY, USA (2021)

Author information

Corresponding author

Correspondence to Giovanni Iacca.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Crespi, M., Custode, L.L., Iacca, G. (2022). Towards Interpretable Policies in Multi-agent Reinforcement Learning Tasks. In: Mernik, M., Eftimov, T., Črepinšek, M. (eds) Bioinspired Optimization Methods and Their Applications. BIOMA 2022. Lecture Notes in Computer Science, vol 13627. Springer, Cham. https://doi.org/10.1007/978-3-031-21094-5_19

  • DOI: https://doi.org/10.1007/978-3-031-21094-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21093-8

  • Online ISBN: 978-3-031-21094-5

  • eBook Packages: Computer Science (R0)
