Abstract
Air combat between aircraft clusters is a complex and challenging scenario. Reinforcement learning has been applied to unmanned cluster control because of its strong dynamic decision-making and control capabilities. In this scenario, however, multi-agent reinforcement learning algorithms still suffer from problems such as convergence to local optima and long training times. To address these problems, we improve the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm. Specifically, we apply a mechanism that reduces the dimensionality of the actions and their corresponding values, and we design an adaptive reward function that helps the agent balance attack and defense. In addition, we build a 2v2 close air combat simulation scenario to evaluate our algorithm. The experimental results show that models trained with our algorithm make more effective decisions.
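For context, MAPPO builds on the clipped surrogate objective of PPO (Schulman et al., 2017). The following is a minimal sketch of that standard objective only; it is not the improved variant proposed in this work, and the function name `ppo_clip_objective`, its parameters, and the default `clip_eps=0.2` are illustrative assumptions rather than details from the paper.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    All names and the default clip_eps are illustrative assumptions;
    this is not the paper's improved algorithm.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a multi-agent setting such as MAPPO, the advantages passed to this function would typically be estimated with a centralized value function, while each agent's log-probabilities come from its own policy.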
X. Wu: This work was supported by the National Natural Science Foundation of China (62103192), the China Postdoctoral Science Foundation (2021M691597), and the Fundamental Research Funds for the Central Universities (30922010710).
Copyright information
© 2024 Beijing HIWING Scientific and Technological Information Institute
About this paper
Cite this paper
Yan, Q., Ren, J., Liu, Y., Wu, X. (2024). 2v2 Close Air Combat Decision-Making Based on Improved MAPPO Algorithm. In: Qu, Y., Gu, M., Niu, Y., Fu, W. (eds) Proceedings of 3rd 2023 International Conference on Autonomous Unmanned Systems (3rd ICAUS 2023). ICAUS 2023. Lecture Notes in Electrical Engineering, vol 1171. Springer, Singapore. https://doi.org/10.1007/978-981-97-1083-6_20
DOI: https://doi.org/10.1007/978-981-97-1083-6_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-1082-9
Online ISBN: 978-981-97-1083-6
eBook Packages: Intelligent Technologies and Robotics (R0)