Abstract
Reinforcement learning (RL) is increasingly employed to optimize automatic control systems, enabling them to learn and improve their operation autonomously and to adapt to changing environments and conditions. However, RL relies heavily on reward signals to guide learning, and in practical tasks these signals are often sparse. This sparsity hinders learning because only a small number of feedback signals are available. Most current methods for the sparse-reward problem introduce many hyperparameters and make relatively poor use of sample data. To address sparse rewards in RL-based automatic control systems, we propose the Cosine Attenuation Monte Carlo Augmented Actor-Critic (CAAC) algorithm. CAAC uses a cosine decay function to adjust the Q-value during training, strengthening the effect of final rewards and improving RL performance in automatic control systems. We validate the proposed approach in three simulation environments. The results demonstrate that CAAC outperforms the baseline algorithms in learning speed and achieves a 10% to 44.3% higher final reward.
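The core idea, a cosine decay schedule that gradually shifts the Q-value target away from Monte Carlo returns toward the bootstrapped critic estimate, can be sketched as follows. This is a minimal illustration only; the function names and the particular blending rule are assumptions for exposition, not the exact update used by CAAC.

```python
import math

def cosine_decay(step: int, total_steps: int) -> float:
    """Standard cosine decay from 1.0 at step 0 down to 0.0 at total_steps."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * (1.0 + math.cos(math.pi * progress))

def blended_q_target(q_bootstrap: float, mc_return: float,
                     step: int, total_steps: int) -> float:
    """Illustrative blend (hypothetical): weight the Monte Carlo return by the
    decaying coefficient and the bootstrapped Q-estimate by its complement,
    so early training leans on full-episode returns and later training on the critic."""
    w = cosine_decay(step, total_steps)
    return w * mc_return + (1.0 - w) * q_bootstrap
```

Early in training the Monte Carlo return dominates the target, which helps propagate a sparse terminal reward; as the weight decays, the lower-variance bootstrapped estimate takes over.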
Availability of data and materials
The data and materials supporting the findings of this study are available upon request. Researchers interested in accessing the data for further analysis or verification are invited to contact Kun Liu.
Code Availability
The source code used in the implementation of the analyses presented in this paper is available for access and review. Researchers interested in obtaining the code for replicating the study or further exploration are encouraged to contact Kun Liu.
Acknowledgements
This work was supported by the National Key R&D Program of China (No. 2022YFB3104500), the National Natural Science Foundation of China (No. U20A20177, U22B2022, 62272348), the Wuhan Science and Technology Joint Project for Building a Strong Transportation Country (No. 2023-2-7), and the Open Research Fund from the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) (No. GML-KF-22-07).
Author information
Authors and Affiliations
Contributions
Each author contributed significantly to the conception, design, and execution of this study. The first draft was written by Kun Liu, and all authors reviewed and revised previous versions of the manuscript. Each author has thoroughly reviewed and endorsed the final version of the manuscript.
Corresponding authors
Ethics declarations
Ethics approval
The authors confirm that this research project has undergone rigorous ethical review and has received approval.
Competing interests
The authors declare no conflicts of interest that could influence the research design, data collection, or interpretation of results.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, K., Wu, L., Zhang, Z. et al. CAAC: An effective reinforcement learning algorithm for sparse reward in automatic control systems. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05464-4