Abstract
In this paper, a potential-based policy iteration method is proposed for the optimal control of a stochastic dynamic system with an average-cost criterion and a parameterized control law. In this method, the potential function and the optimal control parameters are obtained via a least-squares approach. The potential-estimation algorithm is derived from temporal difference learning and can be viewed as a continuous version of the least-squares policy evaluation algorithm. The policy iteration algorithm is validated in simulation on a linear-quadratic-Gaussian (LQG) problem.
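The least-squares, temporal-difference-based potential estimation mentioned in the abstract can be illustrated on a toy finite Markov chain. The chain `P`, the costs `c`, and the tabular basis below are invented for illustration only and are not the paper's parameterized stochastic system: the sketch estimates the average cost from a single sample path and then solves the empirical Poisson equation for the potential (relative value) function by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state Markov chain under a fixed policy (illustrative only).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])   # transition probabilities
c = np.array([1.0, 3.0])     # per-stage costs

# Simulate a single sample path.
T = 100_000
states = np.empty(T + 1, dtype=int)
x = 0
states[0] = x
for t in range(T):
    x = rng.choice(2, p=P[x])
    states[t + 1] = x

costs = c[states[:T]]
eta_hat = costs.mean()        # sample-path estimate of the average cost

# Least-squares (LSTD-style) solve of the empirical Poisson equation
#   sum_t phi(x_t) (phi(x_t) - phi(x_{t+1}))^T w = sum_t phi(x_t) (c_t - eta_hat)
phi = np.eye(2)               # tabular basis functions
A = np.zeros((2, 2))
b = np.zeros(2)
for t in range(T):
    s, s_next = states[t], states[t + 1]
    A += np.outer(phi[s], phi[s] - phi[s_next])
    b += phi[s] * (costs[t] - eta_hat)

# The potential is defined only up to an additive constant; pin g(0) = 0
# and solve for the remaining component by least squares.
g_hat = np.zeros(2)
g_hat[1:], *_ = np.linalg.lstsq(A[:, 1:], b, rcond=None)

print("eta_hat:", eta_hat, "g_hat:", g_hat)
```

For this chain the stationary distribution is (0.8, 0.2), so the true average cost is 1.4 and, with g(0) = 0, the true potential of state 1 is 4; the estimates converge to these values as the path length grows. In the paper the same least-squares idea is applied with parameterized (rather than tabular) function approximation.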
Acknowledgments
The authors would like to thank the editors and the anonymous reviewers for their constructive comments, which improved the manuscript. This work was supported by the National Natural Science Foundation of China under Grant Nos. 60874030 and 61374006, and by the Major Program of the National Natural Science Foundation of China under Grant No. 11190015.
Additional information
Communicated by Qianchuan Zhao.
Cite this article
Cheng, K., Zhang, K., Fei, S. et al. Potential-Based Least-Squares Policy Iteration for a Parameterized Feedback Control System. J Optim Theory Appl 169, 692–704 (2016). https://doi.org/10.1007/s10957-015-0809-6
Keywords
- Stochastic system
- Markov decision processes
- Performance potential
- Least-squares policy evaluation
- Policy iteration