Computer Science and Information Systems 2017 Volume 14, Issue 3, Pages: 789-804
https://doi.org/10.2298/CSIS170107029Z
A kernel based true online Sarsa(λ) for continuous space control problems
Zhu Fei (Soochow University, School of Computer Science and Technology, Suzhou, Jiangsu, China + Soochow University, Provincial Key Laboratory for Computer Information Processing Technology, Suzhou, Jiangsu, China)
Zhu Haijun (Soochow University, School of Computer Science and Technology, Suzhou, Jiangsu, China)
Fu Yuchen (School of Computer Science and Engineering, Changshu Institute of Technology, China)
Chen Donghuo (Soochow University, School of Computer Science and Technology, Suzhou, Jiangsu, China)
Zhou Xiaoke (University of Basque Country, Spain)
Reinforcement learning is an efficient learning method for control
problems: an agent interacts with the environment to obtain an optimal policy.
However, it also faces challenges such as low convergence accuracy and slow
convergence. Moreover, conventional reinforcement learning algorithms can
hardly solve continuous control problems. Kernel-based methods can
accelerate convergence and improve convergence accuracy, and the
policy gradient method is a good way to deal with continuous space
problems. We propose a Sarsa(λ) version of the true online temporal
difference algorithm, named True Online Sarsa(λ) (TOSarsa(λ)), built on a
clustering-based sample specification method and a selective kernel-based
value function. The TOSarsa(λ) algorithm yields results that are consistent
between the forward view and the backward view, which ensures that an
optimal policy is obtained in less time. We also combined TOSarsa(λ) with
heuristic dynamic programming. Experiments showed that the proposed
algorithm works well on continuous control problems.
Keywords: reinforcement learning, kernel method, true online, policy gradient, Sarsa(λ)
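The forward/backward-view consistency mentioned in the abstract comes from the true online update rule with Dutch eligibility traces. As a minimal sketch only, the following shows one true online Sarsa(λ) step with a plain linear value function; the paper's kernel-based value function and clustering-based sample specification are not reproduced here, and all names and step-size values are illustrative assumptions:

```python
# Sketch of one true online Sarsa(lambda) update with linear function
# approximation. The kernel-based value function of the paper is replaced
# by a fixed feature vector phi for simplicity.

def true_online_sarsa_step(w, e, phi, phi_next, reward, v_old,
                           alpha=0.1, gamma=0.99, lam=0.9):
    """One update; returns the new weights, trace, and bootstrap value."""
    v = sum(wi * pi for wi, pi in zip(w, phi))            # Q(s, a)
    v_next = sum(wi * pi for wi, pi in zip(w, phi_next))  # Q(s', a')
    delta = reward + gamma * v_next - v                   # TD error

    # Dutch eligibility trace: the extra correction term is what makes the
    # backward view exactly match the forward view.
    e_dot_phi = sum(ei * pi for ei, pi in zip(e, phi))
    e = [gamma * lam * ei + pi - alpha * gamma * lam * e_dot_phi * pi
         for ei, pi in zip(e, phi)]

    # Weight update with the (v - v_old) correction of the true online rule.
    w = [wi + alpha * (delta + v - v_old) * ei - alpha * (v - v_old) * pi
         for wi, ei, pi in zip(w, e, phi)]
    return w, e, v_next  # v_next becomes v_old on the next step
```

In an episode loop, `v_old` starts at 0 and is threaded through successive calls, so each step reuses the bootstrap value computed on the previous step.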