Abstract
A reinforcement learning (RL) based adaptive dynamic programming (ADP) scheme is developed to learn the approximate optimal stabilization input of servo mechanisms, where the unknown system dynamics are approximated with a three-layer neural network (NN) identifier. First, the servo mechanism model is constructed, and a three-layer NN identifier is used to approximate the unknown servo system; the NN weights of both the hidden layer and the output layer are tuned synchronously with an adaptive gradient law. An RL-based three-layer critic NN is then used to learn the optimal cost function: its first-layer weights are set as constants, while its second-layer weights are updated by minimizing the squared Hamilton-Jacobi-Bellman (HJB) error. The approximate optimal stabilization input of the servo mechanism is obtained from the three-layer NN identifier and the RL-based critic NN, and it drives the motor speed from its initial value to the given reference. Moreover, the convergence of the identifier and the RL-based critic NN is proved, and the stability of the closed-loop system under the proposed optimal input is analyzed. Finally, a servo mechanism model and a complex nonlinear system are provided to verify the correctness of the proposed methods.
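The critic update described above can be illustrated with a minimal sketch. The code below is not the paper's exact algorithm: it assumes a hypothetical quadratic polynomial basis `phi`, a toy two-state linear plant, and a simple stabilizing feedback, and tunes the critic weights by normalized gradient descent on the squared HJB residual.

```python
import numpy as np

# Illustrative sketch (not the paper's exact algorithm): a critic NN
# V(x) ~= W^T phi(x) whose second-layer weights W are tuned by gradient
# descent on the squared HJB residual. Basis, plant, and gains below are
# hypothetical choices for demonstration only.

def phi(x):
    # quadratic polynomial basis for a 2-state system
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

def dphi(x):
    # Jacobian of phi w.r.t. x, shape (3, 2)
    return np.array([[2*x[0], 0.0],
                     [x[1],   x[0]],
                     [0.0,    2*x[1]]])

def critic_step(W, x, u, x_dot, Q, R, alpha=0.5):
    """One normalized-gradient update minimizing the squared HJB error."""
    sigma = dphi(x) @ x_dot                    # regressor: d/dt of phi(x)
    e = W @ sigma + x @ Q @ x + u * R * u      # HJB residual
    W_new = W - alpha * sigma / (1.0 + sigma @ sigma)**2 * e
    return W_new, e

# toy closed-loop data from a stable linear plant x_dot = A x + B u
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([0.0, 1.0])
Q, R = np.eye(2), 1.0
W = np.zeros(3)
x = np.array([1.0, -0.5])
dt = 0.01
for _ in range(2000):
    u = -x[1]                                  # simple stabilizing feedback
    x_dot = A @ x + B * u
    W, e = critic_step(W, x, u, x_dot, Q, R)
    x = x + dt * x_dot                         # forward-Euler integration
print(np.round(W, 3))
```

The normalization term `(1 + sigma^T sigma)^2` keeps the update bounded when the regressor grows large, a standard device in ADP critic tuning laws.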
Author information
This work was supported by the National Natural Science Foundation of China under Grants No. 61433003 and No. 61573174.
Yongfeng Lv received his B.S. and M.S. degrees in mechatronic engineering from the Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming, China, in 2012 and 2016, respectively. He is currently pursuing a Ph.D. degree in control science and engineering with the School of Automation, Beijing Institute of Technology, Beijing, China. His current research interests include adaptive dynamic programming, optimal control, game theory, and multi-input systems.
Xuemei Ren received her B.S. degree in applied mathematics from Shandong University, Shandong, China, in 1989, and her M.S. and Ph.D. degrees in control engineering from Beijing University of Aeronautics and Astronautics, Beijing, China, in 1992 and 1995, respectively. She has been a Professor with the School of Automation, Beijing Institute of Technology, Beijing, China, since 2002. From 2001 to 2002 and in 2005, she visited the Department of Electrical Engineering, Hong Kong Polytechnic University, Hong Kong, China. From 2006 to 2007, she visited the Automation and Robotics Research Institute, University of Texas at Arlington, Arlington, USA, as a Visiting Scholar. She has published over 100 academic papers. Her current research interests include nonlinear systems, intelligent control, neural network control, reinforcement learning, and multi-driven servo systems.
Shuangyi Hu received his B.S. degree from the School of Automation, Beijing Institute of Technology, Beijing, China, in 2017. He is currently pursuing a Ph.D. degree in control science and engineering with the School of Automation, Beijing Institute of Technology, Beijing, China. His current research interests include tracking and synchronization control for multi-motor driving servo systems.
Hao Xu received his B.S. degree from the School of Automation, Beijing Institute of Technology, Beijing, China, in 2018. He is currently working toward an M.S. degree at the Naval University of Engineering. His current research interests include power electronics and the optimization design of phase-shifting transformers.
About this article
Cite this article
Lv, Y., Ren, X., Hu, S. et al. Approximate Optimal Stabilization Control of Servo Mechanisms based on Reinforcement Learning Scheme. Int. J. Control Autom. Syst. 17, 2655–2665 (2019). https://doi.org/10.1007/s12555-018-0551-6