DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning

Mariano, Carlos E.; Morales, Eduardo F.

doi:10.1007/3-540-44795-4_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2167)

Included in the following conference series:

European Conference on Machine Learning

Abstract

In reinforcement learning, an autonomous agent learns an optimal policy while interacting with the environment. In particular, in one-step Q-learning, an agent updates its Q-values after each action, considering immediate rewards. In this paper, a new strategy for updating Q-values is proposed. The strategy, implemented in an algorithm called DQL, uses a set of agents, all searching for the same goal in the same space, to obtain the same optimal policy. While searching for a goal, each agent leaves traces over a copy of the environment (copies of the Q-values). The agents use these copies to decide which actions to take. Once all the agents reach a goal, the original Q-values of the best solution found by all the agents are updated using Watkins’ Q-learning formula. DQL has some similarities with Gambardella’s Ant-Q algorithm [4]; however, it does not require the definition of a domain-dependent heuristic and, consequently, the tuning of additional parameters. Unlike Ant-Q, DQL also does not update the original Q-values with zero reward while the agents are searching. It is shown how DQL’s guided exploration by several agents, combined with selected exploitation (updating only the best solution), produces faster convergence times than Q-learning and Ant-Q on several test bed problems under similar conditions.
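The updating strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the details, so the following are assumptions made only for illustration: the 4x4 grid world, a single copy of the Q-values shared by all agents in an episode, an epsilon-greedy action choice, a zero-reward trace rule on the copy, "best solution" meaning the shortest path, and updating that path from the goal backwards. All function names and parameter values are hypothetical.

```python
import random

SIZE = 4                                      # assumed toy grid world, not from the paper
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action):
    """Deterministic move; walking into a wall leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    return (row, col)


def best_value(q, state):
    """max_a Q(state, a); the goal is terminal, so its value is 0."""
    if state == GOAL:
        return 0.0
    return max(q[(state, a)] for a in ACTIONS)


def choose(q, state, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking (an assumed rule)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    top = best_value(q, state)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == top])


def run_agent(qc, alpha, gamma, epsilon, rng):
    """One agent searches for the goal, leaving traces on the copy qc."""
    state, path = START, []
    while state != GOAL:
        action = choose(qc, state, epsilon, rng)
        nxt = step(state, action)
        # Assumed trace rule: a Q-learning-style update on the copy, no reward.
        qc[(state, action)] += alpha * (gamma * best_value(qc, nxt) - qc[(state, action)])
        path.append((state, action, nxt))
        state = nxt
    return path


def dql(episodes=200, n_agents=5, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {((r, c), a): 0.0 for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}
    for _ in range(episodes):
        qc = dict(q)                          # copy of the Q-values for this episode
        paths = [run_agent(qc, alpha, gamma, epsilon, rng) for _ in range(n_agents)]
        best = min(paths, key=len)            # assumed: best solution = shortest path
        # Only the original Q-values on the best path are updated, using
        # Watkins' rule: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        for state, action, nxt in reversed(best):
            reward = 1.0 if nxt == GOAL else 0.0
            target = reward + gamma * best_value(q, nxt)
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def greedy_path(q):
    """Follow the greedy policy of the learned Q-values from START."""
    state, path = START, [START]
    while state != GOAL and len(path) <= SIZE * SIZE:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state = step(state, action)
        path.append(state)
    return path
```

Calling `greedy_path(dql())` returns the sequence of grid cells visited by the greedy policy after training. Because the agents act only on the copy, the original Q-values change solely through the rewarded update of the best path, which is the separation between guided exploration and selected exploitation that the abstract describes.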

References

1. Boutilier, C.: Sequential Optimality and Coordination in Multiagent Systems, In Proc. of IJCAI-99, Stockholm, Sweden, 1999.
2. Claus, C., Boutilier, C.: The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, In Proc. of AAAI-97 Multiagent Learning Workshop, pp. 13–18, Providence, 1997.
3. Dorigo, M.: Optimization, Learning, and Natural Algorithms, PhD thesis, Politecnico di Milano, Italy, 1992.
4. Gambardella, L.M., Dorigo, M.: Ant-Q: A Reinforcement Learning Approach to the Traveling Salesman Problem, In Proc. 12th Int. Conf. on Machine Learning, pp. 252–260, Morgan Kaufmann, 1995.
5. Hu, J., Wellman, M.: Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, In Proc. 15th Int. Conf. on Machine Learning, pp. 242–250, Morgan Kaufmann, 1998.
6. Littman, M., Boyan, J.: A Distributed Reinforcement Learning Scheme for Network Routing, In Proc. Int. Workshop on Applications of Neural Networks to Telecommunications, pp. 45–51, J. Alspector et al. (eds.), Lawrence Erlbaum, Hillsdale, NJ, 1993.
7. Littman, M.: Markov Games as a Framework for Multiagent Reinforcement Learning, In Proc. 11th Int. Conf. on Machine Learning, pp. 157–163, Morgan Kaufmann, New Brunswick, NJ, 1994.
8. Mariano, C., Morales, E.: A New Distributed Reinforcement Learning Algorithm for the Solution of Multiple Objective Optimization Problems, In O. Cairo et al. (eds.), Lecture Notes in Artificial Intelligence, 1793:212–223, April 2000.
9. Oliver, I., Smith, D., Holland, J.R.: A Study of Permutation Crossover Operators on the Traveling Salesman Problem, In Proc. 2nd Int. Conf. on Genetic Algorithms, pp. 224–230, J.J. Grefenstette (ed.), Lawrence Erlbaum, Hillsdale, NJ, 1987.
10. Price, B., Boutilier, C.: Implicit Imitation in Multiagent Reinforcement Learning, In Proc. 16th Int. Conf. on Machine Learning, pp., 1999.
11. Reinelt, G.: The Traveling Salesman: Computational Solutions for TSP Applications, Springer Verlag, Berlin, 1994.
12. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
13. Tan, M.: Multiagent Reinforcement Learning: Independent vs. Cooperative Agents, In Proc. 10th Int. Conf. on Machine Learning, pp. 330–337, Amherst, MA, 1993.
14. Watkins, C.: Learning from Delayed Rewards, PhD thesis, Cambridge University, Cambridge, UK, 1989.
15. Watkins, C., Dayan, P.: Q-Learning, Machine Learning, 8:279–292, 1992.

Author information

Authors and Affiliations

Instituto Mexicano de Tecnología del Agua, Paseo Cuahunáhuac 8532, Jiutepec, Morelos, 62550, MEXICO
Carlos E. Mariano
ITESM-Campus Cuernavaca, Paseo de la Reforma 182-A, Col. Lomas de Cuernavaca, Temixco, Morelos, 62589, MEXICO
Eduardo F. Morales

Editor information

Editors and Affiliations

Department of Computer Science, Albert-Ludwigs University Freiburg, Georges Köhler-Allee, Geb. 079, 79110, Freiburg, Germany
Luc De Raedt
Department of Computer Science, University of Bristol, Merchant Ventures Bldg., Woodland Road, Bristol, BS8 1UB, UK
Peter Flach

About this paper

Cite this paper

Mariano, C.E., Morales, E.F. (2001). DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. Lecture Notes in Computer Science, vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_28

DOI: https://doi.org/10.1007/3-540-44795-4_28
Published: 30 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5
eBook Packages: Springer Book Archive
