A Survey of Nash Equilibrium Strategy Solving Based on CFR

Li, Huale; Wang, Xuan; Jia, Fengwei; Li, Yifan; Chen, Qian

doi:10.1007/s11831-020-09475-5

Original Paper
Published: 17 August 2020

Archives of Computational Methods in Engineering
Volume 28, pages 2749–2760 (2021)

Huale Li¹, Xuan Wang¹,² (ORCID: orcid.org/0000-0002-9168-5038), Fengwei Jia¹, Yifan Li¹ & Qian Chen¹


Abstract

With the rapid development of artificial intelligence, a growing number of researchers have turned their attention to computer games. In two-player zero-sum extensive games with imperfect information, counterfactual regret minimization (CFR) is one of the most popular methods for solving for a Nash equilibrium strategy. We therefore survey and analyze CFR and its improved variants from recent years. In this paper, we first introduce the process of solving for a Nash equilibrium strategy with CFR. We then review related improved CFR methods. Basic extended experiments are carried out to help researchers understand these methods more easily. Further, some successful applications and common test platforms are described. Finally, the paper concludes with a summary of CFR-based methods and an outlook on future development.


Acknowledgements

We thank all the researchers in this field. This research is supported by the PINGAN-HITsz Intelligence Finance Research Center, the Key Technology Program of Shenzhen, China (Nos. JSGG20170823152809704 and JSGG20170824163239586), and the Basic Research Project of Shenzhen, China (No. JCYJ20180507183624136).

Author information

Authors and Affiliations

Computer Application Research Center, Harbin Institute of Technology, Shenzhen, 518055, China
Huale Li, Xuan Wang, Fengwei Jia, Yifan Li & Qian Chen
Peng Cheng Laboratory, Shenzhen, China
Xuan Wang

Corresponding author

Correspondence to Xuan Wang.

Ethics declarations

Conflict of interest

We have no conflict of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, H., Wang, X., Jia, F. et al. A Survey of Nash Equilibrium Strategy Solving Based on CFR. Arch Computat Methods Eng 28, 2749–2760 (2021). https://doi.org/10.1007/s11831-020-09475-5

Received: 17 December 2019
Accepted: 30 July 2020
Published: 17 August 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11831-020-09475-5
