Abstract
In reinforcement learning, an autonomous agent learns an optimal policy while interacting with its environment. In one-step Q-learning in particular, the agent updates its Q-values after each action, taking the immediate reward into account. This paper proposes a new strategy for updating Q-values. The strategy, implemented in an algorithm called DQL, uses a set of agents that all search for the same goal in the same space in order to obtain the same optimal policy. While searching for a goal, each agent leaves traces on a copy of the environment (copies of the Q-values), and the agents use these copies to decide which actions to take. Once all the agents have reached a goal, the original Q-values along the best solution found by any of the agents are updated using Watkins’ Q-learning formula. DQL has some similarities with Gambardella’s Ant-Q algorithm [4]; however, it does not require the definition of a domain-dependent heuristic and consequently avoids the tuning of additional parameters. Unlike Ant-Q, DQL also does not update the original Q-values with zero reward while the agents are searching. It is shown that DQL’s guided exploration by several agents, combined with selective exploitation (updating only the best solution), converges faster than Q-learning and Ant-Q on several test-bed problems under similar conditions.
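The update strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the trace formula, so the sketch assumes a reward-free Q-learning-style update on the copy, a single copy shared by all agents within an episode, and "best solution" meaning the shortest path found. The six-state chain environment and every name used (`step`, `choose`, `dql_episode`, `train`) are hypothetical and serve only as illustration.

```python
import random

N_STATES, GOAL = 6, 5          # hypothetical chain: states 0..5, goal at the right end
ACTIONS = (0, 1)               # 0 = step left, 1 = step right


def step(state, action):
    """Deterministic move on the chain, clipped at both ends."""
    nxt = state + (1 if action == 1 else -1)
    return min(max(nxt, 0), N_STATES - 1)


def choose(q_row, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def dql_episode(q, n_agents, alpha, gamma, epsilon, rng, max_steps=200):
    """Run all agents once, then update q along the shortest path found."""
    q_copy = [row[:] for row in q]      # agents act on, and mark, this copy
    paths = []
    for _ in range(n_agents):
        state, path = 0, []
        while state != GOAL and len(path) < max_steps:
            action = choose(q_copy[state], epsilon, rng)
            nxt = step(state, action)
            # Trace left on the copy only (assumed form: no reward term).
            target = gamma * max(q_copy[nxt])
            q_copy[state][action] += alpha * (target - q_copy[state][action])
            path.append((state, action, nxt))
            state = nxt
        if state == GOAL:
            paths.append(path)
    if not paths:
        return None                     # no agent reached the goal this episode
    best = min(paths, key=len)          # assumption: best solution = shortest path
    for state, action, nxt in best:
        # One-step Q-learning update on the original Q-values.
        reward = 1.0 if nxt == GOAL else 0.0
        target = reward + gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
    return len(best)


def train(episodes=300, n_agents=5, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Repeat DQL-style episodes from a zero-initialised Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        dql_episode(q, n_agents, alpha, gamma, epsilon, rng)
    return q
```

Calling `train()` returns the learned Q-table; the greedy policy can then be read off with `max(ACTIONS, key=lambda a: q[s][a])` for each state `s`. Under the assumed trace form, agents that act later in an episode see lowered values on state-action pairs that earlier agents already tried, which is one plausible way to realise the guided exploration the abstract mentions.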
References
Boutilier, C.: Sequential Optimality and Coordination in Multiagent Systems, In Proc. of IJCAI-99, Stockholm, Sweden, 1999.
Claus, C., Boutilier, C.: The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, In Proc. of AAAI-97 Multiagent Learning Workshop, pp. 13–18, Providence, 1997.
Dorigo, M.: Optimization, Learning, and Natural Algorithms, PhD thesis, Politecnico di Milano, Italy, 1992.
Gambardella, L.M., Dorigo, M.: Ant-Q: A Reinforcement Learning Approach to the Traveling Salesman Problem, In Proceedings of the 12th International Conference on Machine Learning, pp. 252–260, Morgan Kaufmann, 1995.
Hu, J., Wellman, M.: Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, In Proc. 15th Int. Conf. on Machine Learning, pp. 242–250, Morgan Kaufmann, 1998.
Littman, M., Boyan, J.: A Distributed Reinforcement Learning Scheme for Network Routing, In Proc. Int. Workshop on Applications of Neural Networks to Telecommunications, pp. 45–51, J. Alspector, et al., (eds.), Lawrence Erlbaum, Hillsdale, NJ, 1993.
Littman, M.: Markov Games as a Framework for Multiagent Reinforcement Learning, In Proc. 11th Int. Conf. on Machine Learning, pp. 157–163, New Brunswick, NJ, 1994, Morgan Kaufmann.
Mariano, C., Morales, E.: A New Distributed Reinforcement Learning Algorithm for the solution of Multiple Objective Optimization Problems, In O. Cairo et al., eds. Lecture Notes in Artificial Intelligence, 1793:212–223, April 2000.
Oliver, I., Smith, D., Holland, J.R.: A Study of Permutation Crossover Operators on the Traveling Salesman Problem, In Proc. 2nd Int. Conf. on Genetic Algorithms, pp. 224–230, J.J. Grefenstette (ed.), Lawrence Erlbaum, Hillsdale, NJ, 1987.
Price, B., Boutilier, C.: Implicit Imitation in Multiagent Reinforcement Learning, In Proc. 16th Int. Conf. on Machine Learning, pp., 1999.
Reinelt, G.: The Traveling Salesman: Computational Solutions for TSP Applications, Springer Verlag, Berlin, 1994.
Sutton, R., Barto, A.: Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
Tan, M.: Multiagent Reinforcement Learning: Independent vs. Cooperative Agents, In Proc. 10th Int. Conf. on Machine Learning, pp. 330–337, Amherst, MA, 1993.
Watkins, C.: Learning from Delayed Rewards, PhD thesis, Cambridge University, Cambridge, UK, 1989.
Watkins, C., Dayan, P.: Q-Learning, Machine Learning, 3:279–292, 1992.
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mariano, C.E., Morales, E.F. (2001). DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. ECML 2001. Lecture Notes in Computer Science(), vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_28
DOI: https://doi.org/10.1007/3-540-44795-4_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5