
SSPQL: Stochastic shortest path-based Q-learning

  • Regular Papers
  • Robotics and Automation
International Journal of Control, Automation and Systems

Abstract

Reinforcement learning (RL) has been widely used as a mechanism for autonomous robots to learn state-action values by interacting with their environment. However, most RL methods suffer from slow convergence when deriving an optimal policy in practical applications. To address this problem, stochastic shortest path-based Q-learning (SSPQL) is proposed, combining a stochastic shortest path-finding method with Q-learning, a well-known model-free RL method. The rationale is that if a robot maintains an internal state-transition model that is learnt incrementally, it can infer a locally optimal policy using a stochastic shortest path-finding method. By increasing the values of the state-action pairs that comprise these locally optimal policies, the robot can reach the goal quickly, which in turn speeds up convergence. Several experimental results are presented to demonstrate the validity of the proposed learning approach.
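For concreteness, the Python sketch below outlines one way the SSPQL idea could be realised in a discrete, tabular setting: ordinary Q-learning updates are interleaved with an incrementally learned transition-count model, and a stochastic shortest path computed on that model (here, Dijkstra's algorithm with negative-log transition-probability edge costs) is used to boost the Q-values of the state-action pairs along the locally optimal path. This is a minimal sketch under stated assumptions, not the authors' implementation; the class name, the boost term, and all parameter values are illustrative.

# Illustrative sketch of the SSPQL idea (not the authors' code): tabular
# Q-learning augmented with a shortest path computed on an incrementally
# learned transition model. States are assumed hashable and comparable
# (e.g., grid cells); all names and parameter values are assumptions.
import heapq
import math
import random
from collections import defaultdict

class SSPQLAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, boost=0.5):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon, self.boost = alpha, gamma, epsilon, boost
        self.Q = defaultdict(float)                           # Q[(state, action)]
        self.counts = defaultdict(lambda: defaultdict(int))   # counts[(s, a)][s'] = n

    def act(self, state):
        # epsilon-greedy action selection over the current Q estimates
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, s, a, r, s_next):
        # standard Q-learning backup plus incremental transition-model learning
        self.counts[(s, a)][s_next] += 1
        best_next = max(self.Q[(s_next, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])

    def stochastic_shortest_path(self, start, goal):
        # Dijkstra's algorithm on the learned model; an edge (s, a) -> s' costs
        # -log p(s' | s, a), so the cheapest path is the most probable one.
        dist, best = {start: 0.0}, {}
        frontier = [(0.0, start)]
        while frontier:
            d, s = heapq.heappop(frontier)
            if s == goal:
                break
            if d > dist.get(s, math.inf):
                continue
            for a in self.actions:
                outcomes = self.counts[(s, a)]
                total = sum(outcomes.values())
                for s_next, n in outcomes.items():
                    nd = d - math.log(n / total)
                    if nd < dist.get(s_next, math.inf):
                        dist[s_next] = nd
                        best[s_next] = (s, a)
                        heapq.heappush(frontier, (nd, s_next))
        # walk back from the goal to recover the (state, action) pairs, if reachable
        path, s = [], goal
        while s in best:
            prev, a = best[s]
            path.append((prev, a))
            s = prev
        return list(reversed(path))

    def boost_path(self, start, goal):
        # raise the Q-values of the state-action pairs on the locally optimal
        # path so that greedy action selection is drawn toward the goal
        for s, a in self.stochastic_shortest_path(start, goal):
            self.Q[(s, a)] += self.boost

In use, such an agent would call update() at every step and boost_path(start, goal) between episodes once the goal has been reached at least once; the boost term plays the role of the value increase along locally optimal policies described in the abstract.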



Author information

Corresponding author

Correspondence to Il Hong Suh.

Additional information

Recommended by Editorial Board member Sooyong Lee under the direction of Editor Jae-Bok Song. This work was supported by the Global Frontier R&D Program on <Human-centered Interaction for Coexistence>, funded by a National Research Foundation of Korea grant from the Korean Government (MEST) (NRFM1AXA003-2010-0029744).

Woo Young Kwon received his B.S. degree in Mechanical Engineering from Hanyang University in 2001 and his M.S. degree from the Department of Information and Communication at the same university in 2003. He is currently a Ph.D. student in the Department of Electronic and Computer Engineering at Hanyang University. His research interests include artificial intelligence, reinforcement learning, and probabilistic modeling and inference.

Il Hong Suh received his Ph.D. degree in Electrical Engineering from the Korea Advanced Institute of Science and Technology (KAIST), Seoul, in 1982. From 1985 to 2000, he was with the Department of EECS, Ansan Campus, Hanyang University, where he was a full professor. Since 2000, he has been with the Department of Computer Science and Engineering, Hanyang University, Seoul, Korea, where he is a full professor. He served as President of the Systems and Control Society of the Korea Institute of Telecommunications Engineers (2005–2007) and as President of the Korea Robotics Society in 2008. He has also served as Editor-in-Chief of the Journal of Intelligent Service Robotics (Springer) and as an Associate Editor of the IEEE Transactions on Robotics. His research interests lie in the area of control and intelligence for robots, including visual perception, ontology-based robot intelligence, learning, and intelligence architecture. He has published more than 170 contributions on robotics and control.

Sanghoon Lee received his B.S. degree from the Department of Mathematics, and his M.S. and Ph.D. degrees in Electronics, Electrical, Control and Instrumentation Engineering from Hanyang University, Korea, in 1994, 1997 and 2006, respectively. Currently, he is a Research Fellow at the Education Center for Network-based Intelligence Robotics, Hanyang University. His research interests include ethology-based action selection architecture, robot vision, and ontology-based robot intelligence.


About this article

Cite this article

Kwon, W.Y., Suh, I.H. & Lee, S. SSPQL: Stochastic shortest path-based Q-learning. Int. J. Control Autom. Syst. 9, 328–338 (2011). https://doi.org/10.1007/s12555-011-0215-2


Keywords

Navigation