Abstract
With the rapid development of artificial intelligence, a growing number of researchers have turned their attention to computer games. In two-player zero-sum extensive games with imperfect information, counterfactual regret minimization (CFR) is one of the most popular methods for computing a Nash equilibrium strategy. We therefore survey and analyze CFR and its improved variants from recent years. In this paper, we first introduce how a Nash equilibrium strategy is solved with CFR, and then review related improved methods. Basic extended experiments are carried out to help researchers understand these methods more easily. We further describe some successful applications and common test platforms. Finally, the paper closes with a summary of CFR-based methods and an outlook on future development.
Acknowledgements
We thank all the researchers in this field. This research is supported by the PINGAN-HITsz Intelligence Finance Research Center, the Key Technology Program of Shenzhen, China (Nos. JSGG20170823152809704 and JSGG20170824163239586), and the Basic Research Project of Shenzhen, China (No. JCYJ20180507183624136).
Ethics declarations
Conflict of interest
We have no conflict of interest to declare.
Cite this article
Li, H., Wang, X., Jia, F. et al. A Survey of Nash Equilibrium Strategy Solving Based on CFR. Arch Computat Methods Eng 28, 2749–2760 (2021). https://doi.org/10.1007/s11831-020-09475-5
DOI: https://doi.org/10.1007/s11831-020-09475-5