CAAC: An effective reinforcement learning algorithm for sparse reward in automatic control systems

Published in Applied Intelligence

Abstract

Nowadays, reinforcement learning (RL) is increasingly employed to optimize automatic control systems, allowing them to learn and improve their operation autonomously and to adapt to changing environments and conditions. However, RL relies heavily on reward signals to guide learning, and in practical tasks these signals are often sparse. This sparsity hinders learning because only a small amount of feedback is available. Most existing methods for the sparse reward problem introduce many additional hyperparameters and make relatively poor use of sample data. To address sparse rewards in RL-based automatic control systems, we propose the Cosine Attenuation Monte Carlo Augmented Actor-Critic (CAAC) algorithm. CAAC uses a cosine decay function to adjust the Q-value during training, refining the influence of final rewards and improving RL performance in automatic control systems. We validate the proposed approach in three simulation environments. The results show that CAAC outperforms the baseline algorithms in learning speed and achieves a 10% to 44.3% higher final reward.
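
To make the mechanism concrete, the sketch below shows one way a cosine-attenuated blend between a Monte Carlo return and the critic's bootstrapped estimate could be implemented. Since the paper's full formulation is not reproduced here, the blending rule, the function names (cosine_decay, caac_style_target), and the schedule parameters are illustrative assumptions, not the authors' exact method.

```python
import math

# Hypothetical sketch of a cosine-attenuated Q-target in the spirit of the
# abstract's description (Monte Carlo augmentation with cosine decay).
# The blending rule and all names below are assumptions, not the paper's
# exact formulation.

def cosine_decay(step: int, total_steps: int,
                 w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Weight that falls from w_max to w_min along a half-cosine schedule."""
    frac = min(step / total_steps, 1.0)
    return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * frac))

def caac_style_target(mc_return: float, bootstrap_q: float,
                      step: int, total_steps: int) -> float:
    """Blend a Monte Carlo return with the critic's bootstrapped Q estimate.

    Early in training the sparse-reward Monte Carlo return dominates the
    target; as the cosine weight decays, the target relies increasingly on
    the critic's own estimate.
    """
    w = cosine_decay(step, total_steps)
    return w * mc_return + (1.0 - w) * bootstrap_q

# Example: the target moves from the MC return toward the critic estimate.
print(caac_style_target(mc_return=1.0, bootstrap_q=0.4,
                        step=0, total_steps=100_000))        # -> 1.0
print(caac_style_target(mc_return=1.0, bootstrap_q=0.4,
                        step=100_000, total_steps=100_000))  # -> 0.4
```

Under these assumptions, such a target would stand in for the usual bootstrapped critic target: the decay schedule lets the strong signal from final (sparse) rewards dominate early in training while letting the critic's own estimates take over later.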

Availability of data and materials

The data and materials supporting the findings of this study are available upon request. Researchers interested in accessing the data for further analysis or verification are invited to contact Kun Liu.

Code Availability

The source code used in the implementation of the analyses presented in this paper is available for access and review. Researchers interested in obtaining the code for replicating the study or further exploration are encouraged to contact Kun Liu.

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2022YFB3104500), the National Natural Science Foundation of China (Nos. U20A20177, U22B2022, 62272348), the Wuhan Science and Technology Joint Project for Building a Strong Transportation Country (No. 2023-2-7), and the Open Research Fund from the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) (No. GML-KF-22-07).

Author information

Contributions

Each author contributed significantly to the conception, design, and execution of this study. The first draft was written by Kun Liu, and all authors reviewed and revised previous versions of the manuscript. Each author has thoroughly reviewed and endorsed the final version of the manuscript.

Corresponding authors

Correspondence to Libing Wu or Zhuangzhuang Zhang.

Ethics declarations

Ethics approval

The authors confirm that this research project has undergone rigorous ethical review and has received approval.

Competing interests

The authors declare no conflicts of interest that could influence the research design, data collection, or interpretation of results.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, K., Wu, L., Zhang, Z. et al. CAAC: An effective reinforcement learning algorithm for sparse reward in automatic control systems. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05464-4
