Abstract
We formulate automatic strategy acquisition for the multi-agent card game "Hearts" as a reinforcement learning problem, which can be dealt with approximately in the framework of a partially observable Markov decision process (POMDP) for a single-agent system. Hearts is an imperfect-information game, a class that is more difficult to handle than perfect-information games. A POMDP is a decision problem that includes a process for estimating unobservable state variables; by regarding the missing information as unobservable state variables, an imperfect-information game can be formulated as a POMDP. However, Hearts is a realistic problem with a huge number of possible states, even when it is approximated as a single-agent system, so further approximation is necessary to make the strategy acquisition problem tractable. This article presents an approximation method based on estimating the unobservable state variables and predicting the actions of the other agents. Simulation results show that our reinforcement learning method is applicable to such a difficult multi-agent problem.
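The core idea of treating missing information as unobservable state variables can be illustrated with a toy sketch (this is an illustrative simplification, not the authors' actual method): on a hypothetical two-suit, six-card "Hearts-like" deck, the belief state over an opponent's hidden hand is approximated by enumerating all deals consistent with the observations, a fixed rule stands in for the opponent-action predictor, and the agent picks the lead with the lowest expected penalty under that belief. All names (`consistent_deals`, `predict_opponent_card`, etc.) are invented for this example.

```python
import itertools

# Miniature deck: suits H (hearts, penalty cards) and S, ranks 1-3.
DECK = [(suit, rank) for suit in "HS" for rank in range(1, 4)]

def consistent_deals(my_hand, played):
    """Belief state: all opponent hands consistent with what we observed
    (our own hand and the cards already played)."""
    seen = set(my_hand) | set(played)
    unseen = [c for c in DECK if c not in seen]
    # Assume the opponent holds as many cards as we do.
    return list(itertools.combinations(unseen, len(my_hand)))

def predict_opponent_card(opp_hand, lead_suit):
    """Stand-in opponent model: follow suit if possible, playing the
    lowest rank from the legal pool."""
    follow = [c for c in opp_hand if c[0] == lead_suit]
    pool = follow if follow else list(opp_hand)
    return min(pool, key=lambda c: c[1])

def expected_penalty(my_card, my_hand, played):
    """Average penalty of leading my_card over the belief state.
    Penalty: one point per heart in a trick we win; the trick is won
    by the higher rank in the led suit (off-suit cards never win)."""
    deals = consistent_deals(my_hand, played)
    total = 0.0
    for opp_hand in deals:
        opp_card = predict_opponent_card(opp_hand, my_card[0])
        i_win = opp_card[0] != my_card[0] or opp_card[1] < my_card[1]
        if i_win:
            total += (my_card[0] == "H") + (opp_card[0] == "H")
    return total / len(deals)

def choose_lead(my_hand, played):
    """Greedy action selection against the estimated belief state."""
    return min(my_hand, key=lambda c: expected_penalty(c, my_hand, played))
```

In the real game the belief state is far too large to enumerate, which is exactly why the article's approximation (estimating the unobservable variables and predicting the other agents' actions) is needed; this sketch only shows the structure of the decision problem.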
Editor: Risto Miikkulainen
Cite this article
Ishii, S., Fujita, H., Mitsutake, M. et al. A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game. Mach Learn 59, 31–54 (2005). https://doi.org/10.1007/s10994-005-0461-8