
Potential-Based Least-Squares Policy Iteration for a Parameterized Feedback Control System

Published in: Journal of Optimization Theory and Applications

Abstract

In this paper, a potential-based policy iteration method is proposed for the optimal control of a stochastic dynamic system with an average-cost criterion and a parameterized control law. In this method, the potential function and the optimal control parameters are obtained via a least-squares approach. The potential estimation algorithm is derived from a temporal-difference learning method and can be viewed as a continuous version of the least-squares policy evaluation algorithm. The policy iteration algorithm is validated in simulation by solving a linear-quadratic Gaussian (LQG) problem.
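To make the idea concrete, the following is a minimal illustrative sketch, not the authors' exact algorithm, of potential-based least-squares policy iteration on a scalar LQG problem. The system parameters a, b, sigma, the cost weights q, r, the quadratic feature x^2 for the potential, and the model-based greedy improvement step are all assumptions introduced for this example. The policy u = -K*x is evaluated by a least-squares temporal-difference fit of the potential along a single simulated trajectory, and the gain K is then improved.

```python
# Minimal sketch (illustrative only): potential-based least-squares policy
# iteration for a scalar LQG problem under an average-cost criterion.
#   x_{k+1} = a*x_k + b*u_k + w_k,  w_k ~ N(0, sigma^2)
#   stage cost c(x, u) = q*x^2 + r*u^2,  policy u = -K*x
# The potential (relative value) function is approximated as g(x) ~= theta*x^2
# and estimated by a least-squares TD fit; the improvement step assumes the
# model (a, b) is known.  All numerical values below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma = 0.95, 0.5, 0.1      # illustrative system parameters
q, r = 1.0, 0.2                   # illustrative stage-cost weights

def simulate(K, n_steps=20000, x0=0.0):
    """Run the closed loop u = -K*x; return the state and cost trajectories."""
    xs = np.empty(n_steps + 1)
    cs = np.empty(n_steps)
    xs[0] = x0
    for k in range(n_steps):
        u = -K * xs[k]
        cs[k] = q * xs[k] ** 2 + r * u ** 2
        xs[k + 1] = a * xs[k] + b * u + sigma * rng.standard_normal()
    return xs, cs

def estimate_potential(xs, cs):
    """Least-squares TD estimate of theta in g(x) ~= theta*x^2 (average cost)."""
    eta = cs.mean()                             # estimated average cost
    phi, phi_next = xs[:-1] ** 2, xs[1:] ** 2   # quadratic feature
    # Poisson equation residual: theta*(phi - phi_next) ~= c - eta.
    # Premultiply by phi and sum (LSTD-style normal equation).
    A_mat = np.sum(phi * (phi - phi_next))
    b_vec = np.sum(phi * (cs - eta))
    return b_vec / A_mat, eta

def improve_gain(theta):
    """Greedy improvement with a known model: minimize
    q*x^2 + r*u^2 + theta*E[(a*x + b*u + w)^2] over u = -K*x."""
    return theta * a * b / (r + theta * b ** 2)

K = 0.0                                         # initial stabilizing gain
for it in range(10):
    xs, cs = simulate(K)
    theta, eta = estimate_potential(xs, cs)
    K = improve_gain(theta)
    print(f"iter {it}: K = {K:.4f}, avg cost = {eta:.4f}")
```

Under these assumptions the gain converges to the optimal LQG feedback gain. In the paper's setting, by contrast, both the potential function and the control parameters are obtained through a least-squares-based approach rather than through a model-based improvement step.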



Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their constructive comments, which improved the manuscript. This work was supported by the National Natural Science Foundation of China under Grant Nos. 60874030 and 61374006, and by the Major Program of the National Natural Science Foundation of China under Grant No. 11190015.

Author information


Corresponding author

Correspondence to Kanjian Zhang.

Additional information

Communicated by Qianchuan Zhao.


Cite this article

Cheng, K., Zhang, K., Fei, S. et al. Potential-Based Least-Squares Policy Iteration for a Parameterized Feedback Control System. J Optim Theory Appl 169, 692–704 (2016). https://doi.org/10.1007/s10957-015-0809-6

