Enhanced Reinforcement Learning by Recursive Updating of Q-values for Reward Propagation

Conference paper

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 215)

Abstract

In this paper, we propose a method to reduce the learning time of Q-learning by combining two techniques: updating the Q-values of unexecuted actions in addition to the executed one, and adding a terminal reward to unvisited Q-values. To verify the method, its performance was compared with that of conventional Q-learning. The proposed approach achieved the same performance as conventional Q-learning while requiring only 27% of the learning episodes. Accordingly, we verified that the proposed method reduces learning time by updating more Q-values in the early stage of learning and by distributing the terminal reward to more Q-values.
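The abstract names the two mechanisms but not their update equations. As a point of reference, the standard Q-learning update is Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The Python sketch below is only a hedged illustration of the two mechanisms under stated assumptions: the damped update applied to unexecuted actions and the backward discounting of the terminal reward are illustrative choices, not the authors' exact rules.

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.9  # hypothetical learning rate and discount factor

def step_update(Q, s, a, r, s_next):
    """Standard Q-learning update for the executed action, plus a damped
    update toward the same target for the unexecuted actions of the same
    state (the damping by GAMMA is an assumption)."""
    target = r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])
    for b in range(Q.shape[1]):
        if b != a:  # unexecuted actions also receive a (weaker) update
            Q[s, b] += ALPHA * GAMMA * (target - Q[s, b])

def propagate_terminal_reward(Q, trajectory, terminal_reward):
    """Walk the episode backwards from the goal, pushing a discounted share
    of the terminal reward into every (state, action) pair along the way,
    so that more Q-values receive the reward early in learning."""
    g = terminal_reward
    for s, a in reversed(trajectory):
        Q[s, a] += ALPHA * (g - Q[s, a])
        g *= GAMMA  # discount as we move away from the goal

# Usage on a hypothetical 4x4 grid world with 4 actions:
Q = np.zeros((16, 4))
step_update(Q, s=0, a=1, r=0.0, s_next=1)
propagate_terminal_reward(Q, trajectory=[(0, 1), (1, 3)], terminal_reward=1.0)
```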



Acknowledgments

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2011-0011266).

Author information

Correspondence to Kyungeun Cho.


Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Sung, Y., Ahn, E., Cho, K. (2013). Enhanced Reinforcement Learning by Recursive Updating of Q-values for Reward Propagation. In: Kim, K., Chung, KY. (eds) IT Convergence and Security 2012. Lecture Notes in Electrical Engineering, vol 215. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5860-5_121

  • DOI: https://doi.org/10.1007/978-94-007-5860-5_121

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-5859-9

  • Online ISBN: 978-94-007-5860-5

  • eBook Packages: Engineering (R0)
