Abstract
Traditional least-squares temporal difference (LSTD) algorithms provide an efficient way to perform policy evaluation, but their performance depends heavily on the manual selection of state features, and their approximation ability is often limited. To overcome these problems, we propose a multikernel recursive LSTD algorithm in this paper. Unlike previous kernel-based LSTD algorithms, the proposed algorithm uses the Bellman operator together with a projection operator, and constructs the sparse dictionary online. To avoid caching all historical samples and to reduce the computational cost, it uses the sliding-window technique. To avoid overfitting and to reduce the bias introduced by the sliding window, it also incorporates \( L_{2} \) regularization. In particular, to improve the approximation ability, it uses the multikernel technique, which, to our knowledge, has not previously been applied to value-function prediction. Experimental results on a 50-state chain problem show the good performance of the proposed algorithm in terms of convergence speed and prediction accuracy.
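To give a concrete sense of the ingredients the abstract names, the sketch below combines a multikernel feature map (Gaussian kernels of several widths, concatenated) with a recursive, \( L_{2} \)-regularized LSTD update via rank-one (Sherman-Morrison) updates of the inverse matrix. This is a minimal illustration under simplifying assumptions: it omits the paper's online sparse-dictionary construction and sliding window, and the function names, kernel centers, and widths are chosen for the example, not taken from the paper.

```python
import numpy as np

def multikernel_features(s, centers, widths):
    """Multikernel feature vector: Gaussian kernels of several widths,
    evaluated at fixed centers and concatenated (illustrative choice)."""
    return np.concatenate(
        [np.exp(-(s - centers) ** 2 / (2.0 * w ** 2)) for w in widths]
    )

def rlstd_evaluate(transitions, centers, widths, gamma=0.9, reg=1e-2):
    """Recursive L2-regularized LSTD for policy evaluation.

    Maintains P = (A + reg*I)^{-1} with rank-one Sherman-Morrison updates,
    where A accumulates phi(s) (phi(s) - gamma*phi(s'))^T over transitions.
    """
    d = len(centers) * len(widths)
    P = np.eye(d) / reg        # inverse of the regularized A matrix
    b = np.zeros(d)            # accumulated reward-weighted features
    theta = np.zeros(d)
    for s, r, s_next in transitions:
        phi = multikernel_features(s, centers, widths)
        phi_next = multikernel_features(s_next, centers, widths)
        delta = phi - gamma * phi_next
        # Sherman-Morrison update for A <- A + phi * delta^T
        P_phi = P @ phi
        P -= np.outer(P_phi, delta @ P) / (1.0 + delta @ P_phi)
        b += r * phi
        theta = P @ b          # current value-function weights
    return theta

# Toy policy evaluation on a 5-state cyclic chain: the agent moves
# s -> (s+1) mod 5 and receives reward 1 only when leaving state 3,
# so the estimated value should be highest at state 3.
centers = np.arange(5.0)
widths = [0.5, 2.0]
transitions = []
for _ in range(50):
    for s in range(5):
        transitions.append((float(s), 1.0 if s == 3 else 0.0, float((s + 1) % 5)))

theta = rlstd_evaluate(transitions, centers, widths)
value = lambda s: multikernel_features(float(s), centers, widths) @ theta
```

A single Gaussian width trades off smoothness against locality; concatenating several widths, as above, lets the least-squares fit pick the mixture itself, which is the intuition behind the multikernel technique.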
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61300192 and 11261015, the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2014J052, and the Natural Science Foundation of Hainan Province, China, under Grant No. 613153.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Zhang, C., Zhu, Q., Niu, X. (2016). Multikernel Recursive Least-Squares Temporal Difference Learning. In: Huang, DS., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2016. Lecture Notes in Computer Science(), vol 9773. Springer, Cham. https://doi.org/10.1007/978-3-319-42297-8_20
Print ISBN: 978-3-319-42296-1
Online ISBN: 978-3-319-42297-8