Abstract
In this paper, we propose a method for reducing the learning time of Q-learning by combining two techniques: updating the Q-values of unexecuted actions in addition to the executed one, and distributing a terminal reward to Q-values that have not yet been visited. To verify the method, its performance was compared with that of conventional Q-learning. The proposed approach achieved the same performance as conventional Q-learning while requiring only 27% of the learning episodes. Accordingly, we verified that the proposed method reduces learning time by updating more Q-values in the early stage of learning and by distributing the terminal reward across more Q-values.
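The paper's own pseudocode is not reproduced here; as a rough illustration of the two ideas in the abstract, the sketch below applies them to tabular Q-learning on a small deterministic grid world. The environment, hyperparameters, and the exact seeding rule (propagating a discounted terminal reward backward along the episode's trajectory) are assumptions made for illustration, not the authors' algorithm, and updating unexecuted actions as written assumes the agent can query hypothetical transitions from the environment model.

```python
import random
from collections import defaultdict

# Toy deterministic grid world: start at (0, 0), goal at the far corner,
# reward 1.0 on reaching the goal, 0 elsewhere. All details are assumed.
SIZE = 5
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
GOAL = (SIZE - 1, SIZE - 1)

Q = defaultdict(float)  # Q[(state, action)] -> value, implicitly 0.0

def step(state, action):
    """Deterministic transition clamped to the grid; reward only at the goal."""
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (x, y)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def greedy_value(state):
    return max(Q[(state, a)] for a in ACTIONS)

def episode():
    state, trajectory = (0, 0), []
    while state != GOAL:
        if random.random() < EPS:
            action = random.choice(ACTIONS)
        else:  # greedy with random tie-breaking
            action = max(ACTIONS, key=lambda a: (Q[(state, a)], random.random()))
        # Idea (1): update the executed action AND the unexecuted ones,
        # each toward the value of the state it would lead to (this queries
        # the model via step(); an assumption of this sketch).
        for a in ACTIONS:
            hyp, r = step(state, a)
            target = r + GAMMA * greedy_value(hyp)
            Q[(state, a)] += ALPHA * (target - Q[(state, a)])
        trajectory.append((state, action))
        state, _ = step(state, action)
    # Idea (2): distribute the terminal reward backward along the visited
    # trajectory, seeding Q-values that are still at their initial 0.0.
    g = 1.0
    for s, a in reversed(trajectory):
        if Q[(s, a)] == 0.0:
            Q[(s, a)] = g
        g *= GAMMA

random.seed(0)
for _ in range(50):
    episode()
print(greedy_value((0, 0)))  # approaches GAMMA ** 7 ≈ 0.48 for the 8-step path
```

In this toy setting, the trajectory-based seeding plays the role of the terminal-reward distribution described in the abstract: states far from the goal receive a nonzero estimate after the first successful episode, instead of waiting for value to diffuse backward one step per episode.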
Acknowledgments
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2011-0011266).