
SSPQL: Stochastic shortest path-based Q-learning

  • Regular Papers
  • Robotics and Automation
International Journal of Control, Automation and Systems

Abstract

Reinforcement learning (RL) has been widely used as a mechanism for autonomous robots to learn state-action values by interacting with their environment. However, most RL methods suffer from slow convergence when deriving an optimal policy in practical applications. To address this problem, stochastic shortest path-based Q-learning (SSPQL) is proposed, combining a stochastic shortest path-finding method with Q-learning, a well-known model-free RL method. The rationale is that if a robot maintains an internal state-transition model that is learnt incrementally, it can infer a locally optimal policy using a stochastic shortest path-finding method. By increasing the values of the state-action pairs that comprise these locally optimal policies, the robot can reach the goal quickly, which in turn speeds up convergence. Several experimental results are presented to demonstrate the validity of the proposed learning approach.
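For concreteness, the Python sketch below outlines one way the SSPQL idea could be realised in a discrete, tabular setting: ordinary Q-learning updates are interleaved with an incrementally learned transition-count model, and a stochastic shortest path computed on that model (here, Dijkstra's algorithm with negative-log transition-probability edge costs) is used to boost the Q-values of the state-action pairs along the locally optimal path. This is a minimal sketch under stated assumptions, not the authors' implementation; the class name, the boost term, and all parameter values are illustrative.

# Illustrative sketch of the SSPQL idea (not the authors' code): tabular
# Q-learning augmented with a shortest path computed on an incrementally
# learned transition model. States are assumed hashable and comparable
# (e.g., grid cells); all names and parameter values are assumptions.
import heapq
import math
import random
from collections import defaultdict

class SSPQLAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, boost=0.5):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon, self.boost = alpha, gamma, epsilon, boost
        self.Q = defaultdict(float)                           # Q[(state, action)]
        self.counts = defaultdict(lambda: defaultdict(int))   # counts[(s, a)][s'] = n

    def act(self, state):
        # epsilon-greedy action selection over the current Q estimates
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, s, a, r, s_next):
        # standard Q-learning backup plus incremental transition-model learning
        self.counts[(s, a)][s_next] += 1
        best_next = max(self.Q[(s_next, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])

    def stochastic_shortest_path(self, start, goal):
        # Dijkstra's algorithm on the learned model; an edge (s, a) -> s' costs
        # -log p(s' | s, a), so the cheapest path is the most probable one.
        dist, best = {start: 0.0}, {}
        frontier = [(0.0, start)]
        while frontier:
            d, s = heapq.heappop(frontier)
            if s == goal:
                break
            if d > dist.get(s, math.inf):
                continue
            for a in self.actions:
                outcomes = self.counts[(s, a)]
                total = sum(outcomes.values())
                for s_next, n in outcomes.items():
                    nd = d - math.log(n / total)
                    if nd < dist.get(s_next, math.inf):
                        dist[s_next] = nd
                        best[s_next] = (s, a)
                        heapq.heappush(frontier, (nd, s_next))
        # walk back from the goal to recover the (state, action) pairs, if reachable
        path, s = [], goal
        while s in best:
            prev, a = best[s]
            path.append((prev, a))
            s = prev
        return list(reversed(path))

    def boost_path(self, start, goal):
        # raise the Q-values of the state-action pairs on the locally optimal
        # path so that greedy action selection is drawn toward the goal
        for s, a in self.stochastic_shortest_path(start, goal):
            self.Q[(s, a)] += self.boost

In use, such an agent would call update() at every step and boost_path(start, goal) between episodes once the goal has been reached at least once; the boost term plays the role of the value increase along locally optimal policies described in the abstract.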



Author information

Corresponding author

Correspondence to Il Hong Suh.

Additional information

Recommended by Editorial Board member Sooyong Lee under the direction of Editor Jae-Bok Song. This work was supported by the Global Frontier R&D Program on <Human-centered Interaction for Coexistence>, funded by a National Research Foundation of Korea grant from the Korean Government (MEST) (NRFM1AXA003-2010-0029744).

Woo Young Kwon received his B.S. degree in Mechanical Engineering from Hanyang University in 2001 and his M.S. degree from the Department of Information and Communication at the same university in 2003. He is currently a Ph.D. student in the Department of Electronic and Computer Engineering at Hanyang University. His research interests include artificial intelligence, reinforcement learning, and probabilistic modeling and inference.

Il Hong Suh received his Ph.D. degree in Electrical Engineering from the Korea Advanced Institute of Science and Technology (KAIST), Seoul, in 1982. From 1985 to 2000, he was with the Department of EECS, Ansan Campus, Hanyang University, where he was a full professor. Since 2000, he has been with the Department of Computer Science and Engineering, Hanyang University, Seoul, Korea, where he is a full professor. He served as President of the Systems and Control Society of the Korea Institute of Telecommunications Engineers (2005–2007) and as President of the Korea Robotics Society in 2008. He has also served as Editor-in-Chief of the Journal of Intelligent Service Robotics (Springer) and as an Associate Editor of the IEEE Transactions on Robotics. His research interests lie in the area of control and intelligence for robots, including visual perception, ontology-based robot intelligence, learning, and intelligence architecture. He has published more than 170 contributions on robotics and control.

Sanghoon Lee received his B.S. degree from the Department of Mathematics, and his M.S. and Ph.D. degrees in Electronics, Electrical, Control and Instrumentation Engineering from Hanyang University, Korea, in 1994, 1997 and 2006, respectively. Currently, he is a Research Fellow at the Education Center for Network-based Intelligence Robotics, Hanyang University. His research interests include ethology-based action selection architecture, robot vision, and ontology-based robot intelligence.


About this article

Cite this article

Kwon, W.Y., Suh, I.H. & Lee, S. SSPQL: Stochastic shortest path-based Q-learning. Int. J. Control Autom. Syst. 9, 328–338 (2011). https://doi.org/10.1007/s12555-011-0215-2


Keywords

Navigation