Abstract
Reinforcement learning (RL) is increasingly employed to optimize automatic control systems, enabling them to learn and improve their operation autonomously and to adapt to changing environments and conditions. However, RL relies heavily on reward signals to guide learning, and in practical tasks these signals are often sparse. This sparsity hinders learning because only a small number of feedback signals are available. Most current methods for the sparse-reward problem introduce many hyperparameters and make relatively poor use of sample data. To address sparse rewards in RL-based automatic control systems, we propose the Cosine Attenuation Monte Carlo Augmented Actor-Critic (CAAC) algorithm. CAAC uses a cosine decay function to adjust the Q-value during training, strengthening the effect of final rewards and improving RL performance in automatic control systems. We validate the proposed approach in three simulation environments. The results demonstrate that CAAC outperforms the baseline algorithms in learning speed and achieves a 10% to 44.3% higher final reward.
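The core idea, a cosine decay schedule that gradually shifts the Q-value target away from Monte Carlo returns toward the bootstrapped critic estimate, can be sketched as follows. This is a minimal illustration only; the function names and the particular blending rule are assumptions for exposition, not the exact update used by CAAC.

```python
import math

def cosine_decay(step: int, total_steps: int) -> float:
    """Standard cosine decay from 1.0 at step 0 down to 0.0 at total_steps."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * (1.0 + math.cos(math.pi * progress))

def blended_q_target(q_bootstrap: float, mc_return: float,
                     step: int, total_steps: int) -> float:
    """Illustrative blend (hypothetical): weight the Monte Carlo return by the
    decaying coefficient and the bootstrapped Q-estimate by its complement,
    so early training leans on full-episode returns and later training on the critic."""
    w = cosine_decay(step, total_steps)
    return w * mc_return + (1.0 - w) * q_bootstrap
```

Early in training the Monte Carlo return dominates the target, which helps propagate a sparse terminal reward; as the weight decays, the lower-variance bootstrapped estimate takes over.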
Availability of data and materials
The data and materials supporting the findings of this study are available upon request. Researchers interested in accessing the data for further analysis or verification are invited to contact Kun Liu.
Code Availability
The source code used in the implementation of the analyses presented in this paper is available for access and review. Researchers interested in obtaining the code for replicating the study or further exploration are encouraged to contact Kun Liu.
Acknowledgements
This work was supported by the National Key R&D Program of China (No. 2022YFB3104500), the National Natural Science Foundation of China (No. U20A20177, U22B2022, 62272348), the Wuhan Science and Technology Joint Project for Building a Strong Transportation Country (No. 2023-2-7), and the Open Research Fund from the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) (No. GML-KF-22-07).
Author information
Authors and Affiliations
Contributions
Each author contributed significantly to the conception, design, and execution of this study. The first draft was written by Kun Liu, and all authors reviewed and revised previous versions of the manuscript. Each author has thoroughly reviewed and endorsed the final version of the manuscript.
Corresponding authors
Ethics declarations
Ethics approval
The authors confirm that this research project has undergone rigorous ethical review and has received approval.
Competing interests
The authors declare no conflicts of interest that could influence the research design, data collection, or interpretation of results.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, K., Wu, L., Zhang, Z. et al. CAAC: An effective reinforcement learning algorithm for sparse reward in automatic control systems. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05464-4