
Air combat maneuver decision based on deep reinforcement learning with auxiliary reward

  • Original Article
  • Published in Neural Computing and Applications

Abstract

In air combat maneuver decision-making, the sparse reward encountered when applying deep reinforcement learning limits the exploration efficiency of the agents. To address this challenge, we propose an auxiliary reward function that accounts for the effects of angle, range, and altitude. Furthermore, we investigate how the number of network nodes and layers and the learning rate influence the decision system, and we provide reasonable parameter ranges that can serve as a guideline. Finally, four typical air combat scenarios demonstrate the good adaptability and effectiveness of the proposed scheme: the auxiliary reward significantly improves the learning ability of the deep Q network (DQN) by leading the agents to explore more purposefully. Compared with the original deep deterministic policy gradient (DDPG) and soft actor-critic (SAC) algorithms, the proposed method exhibits superior exploration capability with higher reward, indicating that the trained agent can adapt to different air combat engagements with good performance.
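The full text is paywalled, but the abstract pins down the shape of the auxiliary reward: a dense shaping term combining angle, range, and altitude advantage, added on top of the sparse combat outcome. The sketch below is a minimal illustration of that idea, not the authors' actual formulation; the weights `w_a`, `w_r`, `w_h`, the exponential range falloff, and the optimal altitude advantage `h_opt` are all assumptions for illustration.

```python
import numpy as np

def auxiliary_reward(ata, aa, r, dz,
                     w_a=0.4, w_r=0.4, w_h=0.2,
                     r_w=3000.0, h_opt=1000.0):
    """Dense shaping reward from angle, range, and altitude advantage.

    Illustrative only; weights and functional forms are assumptions.
    ata: antenna train angle (rad)    r:  distance to target (m)
    aa:  aspect angle (rad)           dz: altitude advantage z_U - z_T (m)
    r_w: weapon attack range (m)      h_opt: assumed optimal advantage (m)
    """
    # Angle term: +1 when on the target's tail (ata = aa = 0),
    # -1 when the geometry is fully reversed.
    r_angle = 1.0 - (abs(ata) + abs(aa)) / np.pi
    # Range term: 1 inside the weapon envelope, decaying outside it.
    r_range = np.exp(-max(r - r_w, 0.0) / r_w)
    # Altitude term: peaks at the assumed optimal height advantage.
    r_alt = np.exp(-((dz - h_opt) / h_opt) ** 2)
    return w_a * r_angle + w_r * r_range + w_h * r_alt
```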


Data availability

No datasets were analyzed during the current study. The data generated during the current study are available from the corresponding author on reasonable request.

Abbreviations

UCAV:

Unmanned combat aerial vehicle

DQN:

Deep Q network

DDPG:

Deep deterministic policy gradient

SAC:

Soft actor critic

LOS:

Line of sight

AA:

Aspect angle (rad)

ATA:

Antenna train angle (rad)

HTC:

Heading crossing angle (rad)

R :

Distance between UCAV and target (m)

\(\dot{R}\) :

Change rate of R (m/s)

\({z_\text{U}}\), \({z_\text{T}}\) :

Flight altitude of UCAV and target (m)

\({v_\text{U}}\), \({v_\text{T}}\) :

Flight speed of UCAV and target (m/s)

\(\Delta h\) :

Flight altitude difference (m)

\(\Delta v\) :

Flight speed difference (m/s)

\({R_\text{w}}\) :

Airborne weapon attack range (m)

\({\varphi _\text{w}}\) :

Maximum weapon attack angle (deg)

\({q_\text{w}}\) :

Maximum aspect angle

\({P_\text{U}},{P_\text{T}}\) :

Position vector of UCAV and target
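The quantities above can be computed directly from the UCAV and target position and velocity vectors. A minimal sketch follows, using conventional air-combat definitions (ATA as the angle between the UCAV's velocity and the line of sight, AA as the angle between the target's velocity and the line of sight); the paper's exact sign conventions are not visible from this page and are assumed.

```python
import numpy as np

def engagement_geometry(p_u, v_u, p_t, v_t):
    """Compute the nomenclature quantities from position/velocity 3-vectors.

    Conventional definitions; the paper's sign conventions are assumed.
    p_u, v_u: UCAV position (m) and velocity (m/s).
    p_t, v_t: target position (m) and velocity (m/s).
    """
    los = p_t - p_u                      # line-of-sight vector, UCAV -> target
    r = np.linalg.norm(los)              # distance R (m)
    r_dot = np.dot(v_t - v_u, los) / r   # range rate, the derivative of R (m/s)
    # ATA: angle between the UCAV's velocity vector and the LOS.
    ata = np.arccos(np.clip(np.dot(v_u, los) / (np.linalg.norm(v_u) * r), -1, 1))
    # AA: angle between the target's velocity vector and the LOS.
    aa = np.arccos(np.clip(np.dot(v_t, los) / (np.linalg.norm(v_t) * r), -1, 1))
    dh = p_u[2] - p_t[2]                 # altitude difference (m)
    dv = np.linalg.norm(v_u) - np.linalg.norm(v_t)  # speed difference (m/s)
    return ata, aa, r, r_dot, dh, dv
```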


Acknowledgements

This work was funded by the National Natural Science Foundation of China under Grant Nos. 62073177, 61973175, and 62003351.

Author information

Corresponding author

Correspondence to Yongshuai Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This study concerns the improvement and application of reinforcement learning algorithms and therefore does not raise ethical issues.

Informed consent

All authors are aware of this paper and agree to its submission.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Algorithm 1 Maneuver decision algorithm based on the DQN
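The algorithm itself appears as a figure in the article and is not reproduced here. The sketch below is a generic DQN training step consistent with the abstract's description: transitions are stored with the sparse outcome reward plus the auxiliary shaping term, and the Q-network is updated against a periodically synchronized target network. The environment interface, state/action sizes, network width and depth, and learning rate are placeholders (the paper reports that these hyperparameters materially affect performance and recommends ranges for them).

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 6, 7      # assumed state and maneuver-library sizes
GAMMA, LR = 0.99, 1e-3           # placeholder hyperparameters

def make_qnet():
    # Two hidden layers of 128 nodes; the paper studies how node and layer
    # counts and the learning rate affect the decision system.
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, N_ACTIONS))

q_net, target_net = make_qnet(), make_qnet()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)
replay = deque(maxlen=100_000)   # experience replay buffer

# During interaction, transitions are stored with the shaped reward, e.g.:
#   replay.append((s, a, r_sparse + auxiliary_reward(ata, aa, r, dz), s2, done))

def train_step(batch_size=64):
    """One DQN update over a random minibatch from the replay buffer."""
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = zip(*random.sample(replay, batch_size))
    s = torch.tensor(s, dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(s2, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    # TD target from the frozen target network (synchronized periodically).
    with torch.no_grad():
        target = r + GAMMA * (1.0 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```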

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, T., Wang, Y., Sun, M. et al. Air combat maneuver decision based on deep reinforcement learning with auxiliary reward. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09720-z


  • DOI: https://doi.org/10.1007/s00521-024-09720-z
