2v2 Close Air Combat Decision-Making Based on Improved MAPPO Algorithm

Yan, Qingzhong; Ren, Jihuan; Liu, Yi; Wu, Xiang

doi:10.1007/978-981-97-1083-6_20

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1171)

Included in the following conference series:

International Conference on Autonomous Unmanned Systems

Abstract

Air combat between aircraft clusters is a complex and challenging scenario. Reinforcement learning is applied to unmanned cluster control because of its strong dynamic decision-making and control capabilities. In this scenario, however, multi-agent reinforcement learning algorithms still suffer from convergence to local optima and long training times. To address these issues, we improve the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm. Specifically, we apply a mechanism that reduces the dimensionality of the actions and their corresponding values, and we design an adaptive reward function that helps the agent maintain a good balance between attack and defense. In addition, we build a 2v2 close air combat simulation scenario to evaluate the algorithm. The experimental results show that models trained with our algorithm make more effective decisions.
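
The abstract does not give the form of the improved objective. For orientation only, the sketch below shows the standard PPO clipped surrogate objective that MAPPO applies to each agent's policy; it is not the authors' modification, and it omits the centralized value function that MAPPO trains alongside the policy. The function name `clipped_surrogate` and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    All names and the default clip_eps are assumptions for illustration.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two candidate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the clip caps how much a larger probability ratio can raise the objective; with a negative advantage, the unclipped term is kept whenever it is the smaller one, so harmful updates are still penalized in full.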

X. Wu: This work was supported by the National Natural Science Foundation of China (62103192), the China Postdoctoral Science Foundation (2021M691597), and the Fundamental Research Funds for the Central Universities (30922010710).

Author information

Authors and Affiliations

School of Automation, Nanjing University of Science and Technology, Nanjing, China
Qingzhong Yan, Jihuan Ren, Yi Liu & Xiang Wu

Corresponding author

Correspondence to Xiang Wu.

Editor information

Editors and Affiliations

Nanjing University of Science and Technology, Nanjing, China
Yi Qu
Beijing HIWING Scientific and Technological Information Institute, Beijing, China
Mancang Gu
College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan, China
Yifeng Niu
Unmanned System Research Institute, Northwestern Polytechnical University, Xi'an, Shaanxi, China
Wenxing Fu

About this paper

Cite this paper

Yan, Q., Ren, J., Liu, Y., Wu, X. (2024). 2v2 Close Air Combat Decision-Making Based on Improved MAPPO Algorithm. In: Qu, Y., Gu, M., Niu, Y., Fu, W. (eds) Proceedings of 3rd 2023 International Conference on Autonomous Unmanned Systems (3rd ICAUS 2023). ICAUS 2023. Lecture Notes in Electrical Engineering, vol 1171. Springer, Singapore. https://doi.org/10.1007/978-981-97-1083-6_20

DOI: https://doi.org/10.1007/978-981-97-1083-6_20
Published: 26 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-1082-9
Online ISBN: 978-981-97-1083-6
eBook Packages: Intelligent Technologies and Robotics (R0)
