
Air combat maneuver decision based on deep reinforcement learning with auxiliary reward

  • Original Article
  • Published in Neural Computing and Applications

Abstract

In air combat maneuver decision-making, the sparse reward encountered when applying deep reinforcement learning limits the exploration efficiency of the agents. To address this challenge, we propose an auxiliary reward function that accounts for the effects of angle, range, and altitude. Furthermore, we investigate how the number of network nodes and layers and the learning rate influence the decision system, and we provide reasonable parameter ranges that can serve as a guideline. Finally, four typical air combat scenarios demonstrate the good adaptability and effectiveness of the proposed scheme: the auxiliary reward significantly improves the learning ability of the deep Q network (DQN) by leading the agents to explore more purposefully. Compared with the original deep deterministic policy gradient (DDPG) and soft actor-critic (SAC) algorithms, the proposed method exhibits superior exploration capability with higher reward, indicating that the trained agent can adapt to different air combat engagements with good performance.
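The full text is paywalled, but the abstract pins down the shape of the auxiliary reward: a dense shaping term combining angle, range, and altitude advantage, added on top of the sparse combat outcome. The sketch below is a minimal illustration of that idea, not the authors' actual formulation; the weights `w_a`, `w_r`, `w_h`, the exponential range falloff, and the optimal altitude advantage `h_opt` are all assumptions for illustration.

```python
import numpy as np

def auxiliary_reward(ata, aa, r, dz,
                     w_a=0.4, w_r=0.4, w_h=0.2,
                     r_w=3000.0, h_opt=1000.0):
    """Dense shaping reward from angle, range, and altitude advantage.

    Illustrative only; weights and functional forms are assumptions.
    ata: antenna train angle (rad)    r:  distance to target (m)
    aa:  aspect angle (rad)           dz: altitude advantage z_U - z_T (m)
    r_w: weapon attack range (m)      h_opt: assumed optimal advantage (m)
    """
    # Angle term: +1 when on the target's tail (ata = aa = 0),
    # -1 when the geometry is fully reversed.
    r_angle = 1.0 - (abs(ata) + abs(aa)) / np.pi
    # Range term: 1 inside the weapon envelope, decaying outside it.
    r_range = np.exp(-max(r - r_w, 0.0) / r_w)
    # Altitude term: peaks at the assumed optimal height advantage.
    r_alt = np.exp(-((dz - h_opt) / h_opt) ** 2)
    return w_a * r_angle + w_r * r_range + w_h * r_alt
```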


Data availability

No datasets were analyzed during the current study. The data generated during the current study are available from the corresponding author on reasonable request.

Abbreviations

UCAV:

Unmanned combat aerial vehicle

DQN:

Deep Q network

DDPG:

Deep deterministic policy gradient

SAC:

Soft actor critic

LOS:

Line of sight

AA:

Aspect angle (rad)

ATA:

Antenna train angle (rad)

HTC:

Heading crossing angle (rad)

R :

Distance between UCAV and target (m)

\(\dot{R}\) :

Change rate of R (m/s)

\({z_\text{U}}\), \({z_\text{T}}\) :

Flight altitude of UCAV and target (m)

\({v_\text{U}}\), \({v_\text{T}}\) :

Flight speed of UCAV and target (m/s)

\(\Delta h\) :

Flight altitude difference (m)

\(\Delta v\) :

Flight speed difference (m/s)

\({R_\text{w}}\) :

Airborne weapon attack range (m)

\({\varphi _\text{w}}\) :

Maximum weapon attack angle (deg)

\({q_\text{w}}\) :

Maximum aspect angle

\({P_\text{U}},{P_\text{T}}\) :

Position vector of UCAV and target
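The quantities above can be computed directly from the UCAV and target position and velocity vectors. A minimal sketch follows, using conventional air-combat definitions (ATA as the angle between the UCAV's velocity and the line of sight, AA as the angle between the target's velocity and the line of sight); the paper's exact sign conventions are not visible from this page and are assumed.

```python
import numpy as np

def engagement_geometry(p_u, v_u, p_t, v_t):
    """Compute the nomenclature quantities from position/velocity 3-vectors.

    Conventional definitions; the paper's sign conventions are assumed.
    p_u, v_u: UCAV position (m) and velocity (m/s).
    p_t, v_t: target position (m) and velocity (m/s).
    """
    los = p_t - p_u                      # line-of-sight vector, UCAV -> target
    r = np.linalg.norm(los)              # distance R (m)
    r_dot = np.dot(v_t - v_u, los) / r   # range rate, the derivative of R (m/s)
    # ATA: angle between the UCAV's velocity vector and the LOS.
    ata = np.arccos(np.clip(np.dot(v_u, los) / (np.linalg.norm(v_u) * r), -1, 1))
    # AA: angle between the target's velocity vector and the LOS.
    aa = np.arccos(np.clip(np.dot(v_t, los) / (np.linalg.norm(v_t) * r), -1, 1))
    dh = p_u[2] - p_t[2]                 # altitude difference (m)
    dv = np.linalg.norm(v_u) - np.linalg.norm(v_t)  # speed difference (m/s)
    return ata, aa, r, r_dot, dh, dv
```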


Acknowledgements

This work was funded by the National Natural Science Foundation of China under Grant Nos. 62073177, 61973175, and 62003351.

Author information

Corresponding author

Correspondence to Yongshuai Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This study concerns the improvement and application of reinforcement learning algorithms and therefore does not raise ethical issues.

Informed consent

All authors are aware of this paper and agree to its submission.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Algorithm 1 Maneuver decision algorithm based on the DQN
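The algorithm itself appears as a figure in the article and is not reproduced here. The sketch below is a generic DQN training step consistent with the abstract's description: transitions are stored with the sparse outcome reward plus the auxiliary shaping term, and the Q-network is updated against a periodically synchronized target network. The environment interface, state/action sizes, network width and depth, and learning rate are placeholders (the paper reports that these hyperparameters materially affect performance and recommends ranges for them).

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 6, 7      # assumed state and maneuver-library sizes
GAMMA, LR = 0.99, 1e-3           # placeholder hyperparameters

def make_qnet():
    # Two hidden layers of 128 nodes; the paper studies how node and layer
    # counts and the learning rate affect the decision system.
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, N_ACTIONS))

q_net, target_net = make_qnet(), make_qnet()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)
replay = deque(maxlen=100_000)   # experience replay buffer

# During interaction, transitions are stored with the shaped reward, e.g.:
#   replay.append((s, a, r_sparse + auxiliary_reward(ata, aa, r, dz), s2, done))

def train_step(batch_size=64):
    """One DQN update over a random minibatch from the replay buffer."""
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = zip(*random.sample(replay, batch_size))
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(s2, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    # TD target from the frozen target network (synchronized periodically).
    with torch.no_grad():
        target = r + GAMMA * (1.0 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```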

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, T., Wang, Y., Sun, M. et al. Air combat maneuver decision based on deep reinforcement learning with auxiliary reward. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09720-z


  • DOI: https://doi.org/10.1007/s00521-024-09720-z
