Multikernel Recursive Least-Squares Temporal Difference Learning

  • Conference paper
Intelligent Computing Methodologies (ICIC 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9773)

Abstract

Traditional least-squares temporal difference (LSTD) algorithms provide an efficient approach to policy evaluation, but their performance depends heavily on the manual selection of state features, and their approximation ability is often limited. To overcome these problems, we propose a multikernel recursive LSTD algorithm in this paper. Unlike previous kernel-based LSTD algorithms, the proposed algorithm uses the Bellman operator together with a projection operator, and constructs the sparse dictionary online. To avoid caching all history samples and to reduce the computational cost, it employs a sliding-window technique. To avoid overfitting and to reduce the bias introduced by the sliding window, it also incorporates \( L_{2} \) regularization. In particular, to improve the approximation ability, it adopts the multikernel technique, which, to the best of our knowledge, has not previously been applied to value-function prediction. Experimental results on a 50-state chain problem demonstrate the good performance of the proposed algorithm in terms of convergence speed and prediction accuracy.
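The recursive, L2-regularized LSTD core described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the class and parameter names (`MultikernelRLSTD`, `centers`, `widths`, `l2`) are invented for illustration, the kernel dictionary is fixed in advance rather than constructed online, and the sliding-window and projection-operator components are omitted. The multikernel idea is approximated here by stacking Gaussian kernel evaluations of several widths into one feature vector; the recursive update applies the Sherman-Morrison identity to the rank-one update \( A_t = A_{t-1} + \phi_t(\phi_t - \gamma\phi_{t+1})^{\top} \), with \( A_0 = \lambda I \) supplying the \( L_{2} \) regularization.

```python
import numpy as np

class MultikernelRLSTD:
    """Sketch of recursive LSTD with multikernel (multi-width Gaussian) features.

    Simplification of the paper's setting: fixed dictionary, no sliding
    window, no online sparsification. A_0 = l2 * I gives L2 regularization.
    """

    def __init__(self, centers, widths, gamma=0.9, l2=1.0):
        self.centers = np.asarray(centers, dtype=float)
        self.widths = list(widths)
        self.gamma = gamma
        n = len(self.centers) * len(self.widths)
        self.theta = np.zeros(n)       # value-function weights
        self.P = np.eye(n) / l2        # P = A^{-1}, with A initialized to l2*I

    def phi(self, s):
        """Stack Gaussian kernel evaluations for every width (multikernel part)."""
        return np.array([np.exp(-(s - c) ** 2 / (2.0 * w ** 2))
                         for w in self.widths for c in self.centers])

    def update(self, s, r, s_next):
        """One recursive step for A <- A + u v^T, solved via Sherman-Morrison."""
        u = self.phi(s)
        v = u - self.gamma * self.phi(s_next)
        Pu = self.P @ u
        k = Pu / (1.0 + v @ Pu)        # gain vector
        self.theta = self.theta + k * (r - v @ self.theta)
        self.P = self.P - np.outer(k, v @ self.P)

    def value(self, s):
        return float(self.phi(s) @ self.theta)
```

On a toy two-state chain (state 0 yields reward 1 and moves to state 1; state 1 yields 0 and returns to state 0; \( \gamma = 0.9 \)), the learned values approach the analytic fixed point \( V(0) = 1/0.19 \) and \( V(1) = 0.9/0.19 \), since the regularization bias shrinks as samples accumulate.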



Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61300192 and 11261015, the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2014J052, and the Natural Science Foundation of Hainan Province, China, under Grant No. 613153.

Author information

Corresponding author

Correspondence to Chunyuan Zhang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, C., Zhu, Q., Niu, X. (2016). Multikernel Recursive Least-Squares Temporal Difference Learning. In: Huang, D.S., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science, vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_20

  • DOI: https://doi.org/10.1007/978-3-319-42297-8_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42296-1

  • Online ISBN: 978-3-319-42297-8

  • eBook Packages: Computer Science, Computer Science (R0)
