Abstract
We formulate automatic strategy acquisition for the multi-agent card game "Hearts" as a reinforcement learning problem, which can be dealt with approximately in the framework of a partially observable Markov decision process (POMDP) for a single-agent system. Hearts is an imperfect-information game, a class that is more difficult to handle than perfect-information games. A POMDP is a decision problem that includes a process for estimating unobservable state variables; by regarding the missing information as unobservable state variables, an imperfect-information game can be formulated as a POMDP. However, Hearts is a realistic problem with a huge number of possible states, even when it is approximated as a single-agent system, so further approximation is necessary to make the strategy acquisition problem tractable. This article presents an approximation method based on estimating the unobservable state variables and predicting the actions of the other agents. Simulation results show that our reinforcement learning method is applicable to such a difficult multi-agent problem.
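The core idea of treating missing information as unobservable state variables can be illustrated with a toy sketch (this is an illustrative simplification, not the authors' actual method): on a hypothetical two-suit, six-card "Hearts-like" deck, the belief state over an opponent's hidden hand is approximated by enumerating all deals consistent with the observations, a fixed rule stands in for the opponent-action predictor, and the agent picks the lead with the lowest expected penalty under that belief. All names (`consistent_deals`, `predict_opponent_card`, etc.) are invented for this example.

```python
import itertools

# Miniature deck: suits H (hearts, penalty cards) and S, ranks 1-3.
DECK = [(suit, rank) for suit in "HS" for rank in range(1, 4)]

def consistent_deals(my_hand, played):
    """Belief state: all opponent hands consistent with what we observed
    (our own hand and the cards already played)."""
    seen = set(my_hand) | set(played)
    unseen = [c for c in DECK if c not in seen]
    # Assume the opponent holds as many cards as we do.
    return list(itertools.combinations(unseen, len(my_hand)))

def predict_opponent_card(opp_hand, lead_suit):
    """Stand-in opponent model: follow suit if possible, playing the
    lowest rank from the legal pool."""
    follow = [c for c in opp_hand if c[0] == lead_suit]
    pool = follow if follow else list(opp_hand)
    return min(pool, key=lambda c: c[1])

def expected_penalty(my_card, my_hand, played):
    """Average penalty of leading my_card over the belief state.
    Penalty: one point per heart in a trick we win; the trick is won
    by the higher rank in the led suit (off-suit cards never win)."""
    deals = consistent_deals(my_hand, played)
    total = 0.0
    for opp_hand in deals:
        opp_card = predict_opponent_card(opp_hand, my_card[0])
        i_win = opp_card[0] != my_card[0] or opp_card[1] < my_card[1]
        if i_win:
            total += (my_card[0] == "H") + (opp_card[0] == "H")
    return total / len(deals)

def choose_lead(my_hand, played):
    """Greedy action selection against the estimated belief state."""
    return min(my_hand, key=lambda c: expected_penalty(c, my_hand, played))
```

In the real game the belief state is far too large to enumerate, which is exactly why the article's approximation (estimating the unobservable variables and predicting the other agents' actions) is needed; this sketch only shows the structure of the decision problem.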
Editor: Risto Miikkulainen
Cite this article
Ishii, S., Fujita, H., Mitsutake, M. et al. A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game. Mach Learn 59, 31–54 (2005). https://doi.org/10.1007/s10994-005-0461-8