
A Survey of Nash Equilibrium Strategy Solving Based on CFR

  • Original Paper
  • Published in: Archives of Computational Methods in Engineering

Abstract

Recently, with the rapid development of artificial intelligence technology, a growing number of researchers have turned their attention to the field of computer games. In two-player zero-sum extensive-form games with imperfect information, counterfactual regret minimization (CFR) is one of the most popular methods for solving Nash equilibrium strategies. We have therefore carried out a broad survey and analysis of CFR and its improved variants proposed in recent years. In this paper, we first introduce the process of solving Nash equilibrium strategies with the CFR method. Then, related improved variants of CFR are reviewed. Basic extended experiments are carried out to help researchers understand these methods more easily. Further, some successful applications and common test platforms are described. Finally, the paper ends with a conclusion on CFR-based methods and a prediction of future development.
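To make the core mechanism concrete before the survey proper, the sketch below shows regret matching, the per-information-set update rule at the heart of CFR, applied to rock-paper-scissors. It is a minimal illustration, not code from the paper: the game, payoff matrix, and function names are our own assumptions, but the update cycle (accumulate regrets, play in proportion to positive regret, average the strategies over iterations) is the standard one.

```python
import numpy as np

# A minimal regret-matching sketch: the same update rule that CFR applies
# at every information set, shown here on rock-paper-scissors. Illustrative
# only; the payoff matrix and names below are assumptions for this toy.

N_ACTIONS = 3  # rock, paper, scissors
# PAYOFF[a, b]: utility to player 1 for playing a against player 2's b.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)

def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    pos = np.maximum(cum_regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(N_ACTIONS, 1 / 3)

cum_regret = [np.zeros(N_ACTIONS), np.zeros(N_ACTIONS)]
cum_strategy = [np.zeros(N_ACTIONS), np.zeros(N_ACTIONS)]

for _ in range(10000):
    s = [regret_matching(cum_regret[p]) for p in (0, 1)]
    # Per-action expected utilities against the opponent's current mix.
    u = [PAYOFF @ s[1], -(PAYOFF.T @ s[0])]  # zero-sum: u2 = -u1
    for p in (0, 1):
        cum_strategy[p] += s[p]
        cum_regret[p] += u[p] - s[p] @ u[p]  # instantaneous regrets

# The AVERAGE strategy (not the last iterate) approaches the Nash
# equilibrium, here the uniform strategy (1/3, 1/3, 1/3).
for p in (0, 1):
    avg = cum_strategy[p] / cum_strategy[p].sum()
    print(f"player {p + 1} average strategy:", np.round(avg, 3))
```

In full CFR, this same update runs at every information set of the game tree, with the per-action values weighted by counterfactual reach probabilities, and the time-averaged strategy profile converges to an approximate Nash equilibrium in two-player zero-sum games.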



Acknowledgements

We thank all the researchers in this field. This research is supported by the PINGAN-HITsz Intelligence Finance Research Center, the Key Technology Program of Shenzhen, China (Nos. JSGG20170823152809704 and JSGG20170824163239586), and the Basic Research Project of Shenzhen, China (No. JCYJ20180507183624136).

Author information


Corresponding author

Correspondence to Xuan Wang.

Ethics declarations

Conflict of interest

We have no conflict of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, H., Wang, X., Jia, F. et al. A Survey of Nash Equilibrium Strategy Solving Based on CFR. Arch Computat Methods Eng 28, 2749–2760 (2021). https://doi.org/10.1007/s11831-020-09475-5
