Abstract
Traditional least-squares temporal difference (LSTD) algorithms provide an efficient way to perform policy evaluation, but their performance depends heavily on the manual selection of state features, and their approximation ability is often limited. To overcome these problems, we propose a multikernel recursive LSTD algorithm in this paper. Unlike previous kernel-based LSTD algorithms, the proposed algorithm uses the Bellman operator together with a projection operator, and constructs the sparse dictionary online. To avoid caching all historical samples and to reduce the computational cost, it uses the sliding-window technique. To avoid overfitting and to reduce the bias introduced by the sliding window, it also incorporates \( L_{2} \) regularization. In particular, to improve the approximation ability, it uses the multikernel technique, which, to our knowledge, has not previously been applied to value-function prediction. Experimental results on a 50-state chain problem show the good performance of the proposed algorithm in terms of convergence speed and prediction accuracy.
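To give a concrete sense of the ingredients the abstract names, the sketch below combines a multikernel feature map (Gaussian kernels of several widths, concatenated) with a recursive, \( L_{2} \)-regularized LSTD update via rank-one (Sherman-Morrison) updates of the inverse matrix. This is a minimal illustration under simplifying assumptions: it omits the paper's online sparse-dictionary construction and sliding window, and the function names, kernel centers, and widths are chosen for the example, not taken from the paper.

```python
import numpy as np

def multikernel_features(s, centers, widths):
    """Multikernel feature vector: Gaussian kernels of several widths,
    evaluated at fixed centers and concatenated (illustrative choice)."""
    return np.concatenate(
        [np.exp(-(s - centers) ** 2 / (2.0 * w ** 2)) for w in widths]
    )

def rlstd_evaluate(transitions, centers, widths, gamma=0.9, reg=1e-2):
    """Recursive L2-regularized LSTD for policy evaluation.

    Maintains P = (A + reg*I)^{-1} with rank-one Sherman-Morrison updates,
    where A accumulates phi(s) (phi(s) - gamma*phi(s'))^T over transitions.
    """
    d = len(centers) * len(widths)
    P = np.eye(d) / reg        # inverse of the regularized A matrix
    b = np.zeros(d)            # accumulated reward-weighted features
    theta = np.zeros(d)
    for s, r, s_next in transitions:
        phi = multikernel_features(s, centers, widths)
        phi_next = multikernel_features(s_next, centers, widths)
        delta = phi - gamma * phi_next
        # Sherman-Morrison update for A <- A + phi * delta^T
        P_phi = P @ phi
        P -= np.outer(P_phi, delta @ P) / (1.0 + delta @ P_phi)
        b += r * phi
        theta = P @ b          # current value-function weights
    return theta

# Toy policy evaluation on a 5-state cyclic chain: the agent moves
# s -> (s+1) mod 5 and receives reward 1 only when leaving state 3,
# so the estimated value should be highest at state 3.
centers = np.arange(5.0)
widths = [0.5, 2.0]
transitions = []
for _ in range(50):
    for s in range(5):
        transitions.append((float(s), 1.0 if s == 3 else 0.0, float((s + 1) % 5)))

theta = rlstd_evaluate(transitions, centers, widths)
value = lambda s: multikernel_features(float(s), centers, widths) @ theta
```

A single Gaussian width trades off smoothness against locality; concatenating several widths, as above, lets the least-squares fit pick the mixture itself, which is the intuition behind the multikernel technique.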
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61300192 and 11261015, the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2014J052, and the Natural Science Foundation of Hainan Province, China, under Grant No. 613153.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Zhang, C., Zhu, Q., Niu, X. (2016). Multikernel Recursive Least-Squares Temporal Difference Learning. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science(), vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_20
Print ISBN: 978-3-319-42296-1
Online ISBN: 978-3-319-42297-8