Abstract
Multi-Agent Reinforcement Learning (MARL) has been used to solve sequential decision problems in which a collection of intelligent agents interacts in a shared environment. However, the design complexity of MARL strategies grows with the complexity of the task specifications. In addition, current MARL approaches suffer from slow convergence and reward sparsity when dealing with multi-task specifications. Linear temporal logic is used in software engineering practice to describe non-Markovian task specifications, and the strategies synthesized from such specifications can serve as a priori knowledge for training the agents to interact with the environment more efficiently. In this paper, we consider multiple agents that react to each other under a high-level reactive temporal logic specification called Generalized Reactivity of rank 1 (GR(1)). We first decompose the synthesized GR(1) strategy into a set of potential-based reward machines for individual agents. We prove that the parallel composition of these reward machines forward simulates the original reward machine, which satisfies the GR(1) specification. We then extend the Markov Decision Process (MDP) with the synchronized reward machines. A value-iteration-based approach is developed to compute the potential values of the reward machine based on the strategy structure. We also propose a decentralized Q-learning algorithm to train the agents on the extended MDP. Experiments on multi-agent learning under different reactive temporal logic specifications demonstrate the effectiveness of the proposed method, showing superior learning curves and optimal rewards.
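To make the idea of a potential-based reward machine more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a small hypothetical reward machine given as a transition table, computes a potential for each machine state by value iteration over that table, and applies the standard potential-based shaping form r + γΦ(u') − Φ(u). The state names, labels, rewards, and the function names `compute_potentials` and `shaped_reward` are all illustrative assumptions. Using the machine-level optimal values as potentials is one common choice and may differ from the construction in the paper.

```python
from typing import Dict, Tuple

# Hypothetical reward machine: (state, label) -> (next_state, reward).
Transitions = Dict[Tuple[str, str], Tuple[str, float]]


def compute_potentials(transitions: Transitions, gamma: float = 0.9,
                       tol: float = 1e-9) -> Dict[str, float]:
    """Value iteration over the reward machine's states (an assumed scheme)."""
    states = {u for (u, _) in transitions} | {v for (v, _) in transitions.values()}
    potentials = {u: 0.0 for u in states}
    while True:
        delta = 0.0
        for u in states:
            outgoing = [(v, r) for (s, _), (v, r) in transitions.items() if s == u]
            if not outgoing:
                continue  # states without outgoing edges keep potential 0
            best = max(r + gamma * potentials[v] for v, r in outgoing)
            delta = max(delta, abs(best - potentials[u]))
            potentials[u] = best
        if delta < tol:
            return potentials


def shaped_reward(transitions: Transitions, potentials: Dict[str, float],
                  u: str, label: str, gamma: float = 0.9) -> Tuple[str, float]:
    """Advance the machine on `label` and return (next_state, shaped reward)."""
    v, r = transitions[(u, label)]
    return v, r + gamma * potentials[v] - potentials[u]


# Illustrative two-step task: observe "a", then "b"; "idle" makes no progress.
rm: Transitions = {
    ("u0", "a"): ("u1", 0.0),
    ("u0", "idle"): ("u0", 0.0),
    ("u1", "b"): ("u2", 1.0),
    ("u1", "idle"): ("u1", 0.0),
}
potentials = compute_potentials(rm)
next_state, reward = shaped_reward(rm, potentials, "u0", "idle")
```

In a decentralized setting, each agent could keep its own machine state and feed the shaped reward into its own Q-learning update. With this choice of potentials, transitions that make progress receive a higher shaped reward than idling, which is one way to counter reward sparsity.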
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 62202067) and the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (No. 22KJB520012).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, C., Zhu, J., Cai, Y., Wang, F. (2023). Decomposing Synthesized Strategies for Reactive Multi-agent Reinforcement Learning. In: David, C., Sun, M. (eds) Theoretical Aspects of Software Engineering. TASE 2023. Lecture Notes in Computer Science, vol 13931. Springer, Cham. https://doi.org/10.1007/978-3-031-35257-7_4
DOI: https://doi.org/10.1007/978-3-031-35257-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35256-0
Online ISBN: 978-3-031-35257-7
eBook Packages: Computer Science, Computer Science (R0)