
Games and Economic Behavior

Volume 117, September 2019, Pages 479-498

The truth behind the myth of the Folk theorem

https://doi.org/10.1016/j.geb.2019.04.008

Abstract

We study the problem of computing an ϵ-Nash equilibrium in repeated games. Earlier work by Borgs et al. (2010) suggests that this problem is intractable. We show that if we make a slight change to their model—modeling the players as polynomial-time Turing machines that maintain state—and make a standard cryptographic assumption (that public-key cryptography can be carried out), the problem can actually be solved in polynomial time. Our algorithm works not only for games with a finite number of players, but also for constant-degree graphical games (where, roughly speaking, which players' actions a given player's utility depends on are characterized by a graph, typically of bounded degree). As Nash equilibrium is a weak solution concept for extensive-form games, we additionally define and study an appropriate notion of subgame-perfect equilibrium for computationally bounded players, and show how to efficiently find such an equilibrium in repeated games (again, assuming public-key cryptography).

Introduction

The complexity of finding a Nash equilibrium (NE) is a fundamental question at the interface of game theory and computer science. A celebrated sequence of results showed that the complexity of finding a NE in a normal-form game is PPAD-complete [Chen and Deng (2006), Daskalakis et al. (2006)], even for 2-player games. Less restrictive concepts, such as ϵ-NE for an inverse-polynomial ϵ, are just as hard [Chen et al. (2006)]. This suggests that these problems are computationally intractable.

There was some hope that the situation would be better in infinitely-repeated games. The Folk Theorem (see Osborne and Rubinstein (1994) for a review) informally states that, in an infinitely-repeated game G, for any payoff profile that is individually rational (meaning that every player gets more than his minimax payoff, the highest payoff a player can guarantee himself no matter what the other players do) and that is the outcome of some correlated strategy in G, there is a Nash equilibrium of G with this payoff profile. With such a large set of equilibria, the hope was that finding one would be less difficult. Indeed, Littman and Stone (2005) showed that ideas from the proof of the Folk Theorem can be used to design an algorithm for finding a NE in a two-player repeated game. Recently, Andersen and Conitzer (2013) used a folk-theorem-like strategy to design an algorithm that with high probability finds a NE in a restricted family of repeated games with more than two players and with the limit-of-means payoff criterion.
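For intuition about the minimax payoff used in the Folk Theorem, a brute-force sketch over pure strategies is given below. The encoding (a dict from action profiles to payoff tuples) is our own illustration and not from the paper; note that the paper's notion of minimax ranges over mixed and correlated strategies, which this pure-strategy version only approximates.

```python
from itertools import product

def pure_minimax(utilities, num_actions, player):
    """Brute-force pure-strategy minimax payoff for `player`: the
    highest payoff the player can guarantee when the other players
    jointly pick the worst pure action profile for him.
    `utilities[profile]` maps a full action profile to a payoff tuple;
    `num_actions` lists each player's action-set size (hypothetical
    encoding, for illustration only)."""
    others = [range(num_actions[j])
              for j in range(len(num_actions)) if j != player]
    best = float("-inf")
    for a_i in range(num_actions[player]):
        worst = float("inf")
        for a_others in product(*others):
            profile = list(a_others)
            profile.insert(player, a_i)   # re-insert player i's action
            worst = min(worst, utilities[tuple(profile)][player])
        best = max(best, worst)
    return best

# Matching pennies: neither player can guarantee more than -1 in pure strategies.
u = {(0, 0): (1, -1), (0, 1): (-1, 1), (1, 0): (-1, 1), (1, 1): (1, -1)}
print(pure_minimax(u, [2, 2], 0))  # -1
```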

Despite these positive results, the hope of being able to compute a NE efficiently in repeated games was apparently dashed by Borgs et al. (2010) (BC+ from now on), who proved results suggesting that, for more than two players, it was as difficult to find a NE in infinitely-repeated games as it was in a single-shot normal-form game. Before explaining the results of BC+ in more detail and our new results, we need to consider more carefully what it means to compute a NE in a repeated game. In a single-shot normal-form game, a strategy for a player is simply a probability distribution over actions, and thus the algorithm that computes a NE can simply output an appropriate distribution. On the other hand, in a repeated game (or, more generally, in an extensive-form game) a strategy is a function from all histories to probability distributions over actions. A naive description of such a function can be quite large (or even infinite); thus, we can't expect an algorithm to output such a description. We instead need to consider a concise representation of the strategy. For example, Littman and Stone (2005) describe strategies using finite automata; the output of their algorithm is a description of such an automaton.

BC+ represent strategies using what they called strategy machines. A strategy machine is a Turing Machine (TM) that takes as input the history of the game so far and outputs a probability distribution over actions. Such a representation gives rise to another subtlety. If we do not restrict the TM in some way, there is a trivial algorithm that finds strategy machines that are a NE. The algorithm simply outputs a TM for each player that, at each round, first computes a NE of the underlying game and simply plays according to that player's probability distribution in the NE. Each player's TM might take exponential time to compute its next action, but its description is short (polynomial in the description of the underlying game) and can be computed efficiently. Intuitively, such a representation allows the algorithm to transfer the complexity of the problem to the players. This does not seem reasonable. The interest in being able to compute a NE efficiently largely stems from the fact that we think of players as being resource bounded. As pointed out by Roughgarden (2010), “participants in a repeated game face two types of complexity: that of formulating a strategy, and that of executing a strategy”. It seems inappropriate to limit one type of complexity without also limiting the other. We capture the complexity of formulating the strategy by restricting to polynomial-time algorithms for finding a NE. It seems reasonable to further require the strategy machines themselves to be polynomial-time TMs. With this requirement, the strategies that the algorithm outputs are no more “powerful” than the algorithm itself.

This is exactly what BC+ do. They show that the problem of finding a NE (or even an ϵ-NE for an inverse-polynomial ϵ) represented by a polynomial-time TM in an infinitely repeated game with three or more players, where the discount factor is bounded away from 1 by an inverse polynomial, is PPAD-hard. They prove this by showing that, given an arbitrary normal-form game G with c ≥ 2 players, there is a game G′ with c+1 players such that an algorithm can use an ϵ/8c-NE, represented by polynomial-time TMs, for the repeated game based on G′ to compute an ϵ-NE for G.

While their proof is indeed correct, in this paper, we challenge their conclusion. While BC+ restrict to strategies that are polynomial-time TMs, we do not think that this restriction suffices. As is standard when considering NE, BC+ allow players to deviate in arbitrary ways. However, we believe that polynomial-time players should also be restricted to using only polynomial-time deviations. It seems inconsistent to restrict players to using polynomial-time TMs in equilibrium, but then to allow them to deviate in ways that might not be implementable by a polynomial-time TM. Roughly speaking, we show that if we restrict to polynomial-time deviations, then we can compute an ϵ-NE in a repeated game in polynomial time.

Formally, we view players as probabilistic polynomial-time Turing machines (PPT TMs). We differ from BC+ in two key respects. First, as we suggested above, we restrict the deviations that can be made in equilibrium to those that can be computed by a PPT TM. Second, BC+ implicitly assume that players have no memory: they cannot remember computation from earlier rounds. By way of contrast, we allow players to have a bounded (polynomial) amount of memory. This allows players to remember, for example, the results of a few coin tosses from earlier rounds, and correlate in a non-trivial way moves in later rounds with these coin tosses. We call such TMs stateful, and the TMs used by BC+ stateless. We note that if we do not restrict to polynomial-time TMs, then there is no real difference between stateful TMs and stateless TMs in our setting (since a player with unbounded computational power can recreate the necessary state). These two assumptions mean that the players can use some cryptography (making some standard cryptographic assumptions) to coordinate the action of different players. We stress that this coordination happens in the process of the game play, not through communication. That is, there are no side channels; the only form of “communication” is by making moves in the game. With these assumptions (and the remaining assumptions of the BC+ model), we show that in fact an ϵ-NE in an infinitely-repeated game can be found in polynomial time.

Our equilibrium strategy uses threats and punishment in much the same way that they are used in the Folk Theorem. However, since players are computationally bounded, we can use cryptography (we assume the existence of a secure public-key encryption scheme) to secretly correlate the punishing players. This allows us to overcome the difficulties raised by BC+, who show that computing an uncorrelated minimax punishment is NP-hard. We instead use the secret correlation to implement a correlated punishment that is easy to compute. Roughly speaking, the ϵ-NE can be described as proceeding in three stages. In the first stage, the players play a predefined sequence of actions repeatedly. If some player deviates from the sequence, the second stage begins. In this stage, the non-deviating players use their actions to secretly exchange a random seed, through the use of public-key encryption. In the third stage, the players use a correlated minimax strategy to punish the deviator forever. To achieve this correlation, the players use the secret random seed as the seed of a pseudorandom function (i.e., a function that a polynomial-time player cannot distinguish from a truly random function), and use the outputs of the pseudorandom function as the source of randomness for the correlated strategy. Since the existence of public-key encryption implies the existence of pseudorandom functions, the only cryptographic assumption needed is the existence of public-key encryption, one of the most basic cryptographic assumptions.
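The third-stage correlation idea can be sketched as follows: once the punishers share a secret seed, each can derive the same pseudorandom punishment action in every round with no further communication. The sketch below uses HMAC-SHA256 as a stand-in PRF; it is an illustration of the correlation mechanism, not the paper's actual construction.

```python
import hmac
import hashlib

def prf_action(seed: bytes, round_no: int, num_actions: int) -> int:
    """Stand-in PRF (HMAC-SHA256) mapping the shared seed and round
    number to a punishment action in {0, ..., num_actions - 1}.
    Any punisher holding `seed` computes the same value."""
    digest = hmac.new(seed, round_no.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % num_actions

# Hypothetical seed, as if exchanged in stage two via public-key encryption.
seed = b"secretly exchanged random seed"

# Two punishers independently derive identical correlated actions.
moves_p1 = [prf_action(seed, t, 3) for t in range(5)]
moves_p2 = [prf_action(seed, t, 3) for t in range(5)]
assert moves_p1 == moves_p2
```

To a polynomial-time deviator who does not know the seed, the sequence of punishment actions is indistinguishable from truly random draws, which is what makes the correlated minimax punishment effective.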

The use of modern cryptography in game theory goes back to the works of Urbano and Vila (2002, 2004) and Gossner (1998); more recently, it has been used by, for example, Dodis, Halevi, and Rabin (2000). The work of Gossner (1998, 2000) is perhaps the application of cryptography to game theory that is most closely related to ours. Gossner (1998) uses cryptographic techniques to show how any payoff profile that is above the players' correlated minimax value can be achieved in a NE of a repeated game with public communication played by computationally bounded players. In Gossner (2000), a strategy similar to the one that we use is used to prove that, even without communication, the same result holds. Gossner's results apply only to infinitely-repeated games with 3 players and no discounting; Gossner claims that his results do not hold for games with discounting (which is what we consider here). We note that under the limit-of-means criterion there is a trivial solution to our problem. The algorithm can simply output TMs that first spend some rounds computing the NE of the underlying game and, after finding it, play it repeatedly. Since computing the NE can be done in finite time (although perhaps exponential in the size of the game), the players will eventually find it. Under the limit-of-means criterion, their payoff with this strategy is the equilibrium payoff; thus, the players do not care about the rounds wasted in finding the NE. It is worth noting that the same argument applies to the case where the discount factor is bounded away from 1 by only an inverse exponential.

While not the main emphasis of this paper, our results can be seen informally as a folk theorem for computationally bounded players: if there exists a public-key encryption scheme, then any utility profile in which each player gets more than her correlated minimax utility can be supported (up to some inverse polynomial) by a computational ϵ-NE in games with an inverse-polynomial discount factor and a constant number of players. As discussed below, the same holds for subgame-perfect equilibrium and for graphical games (where, roughly speaking, which players' actions a given player's utility depends on are characterized by a graph, typically of bounded degree). This generalizes Gossner's results to these cases.

The idea of using the structure of the game as a means of correlation has also been suggested before. For example, Lehrer (1991) uses this idea to show an equivalence between NE and correlated equilibrium in certain repeated games with nonstandard information structures. In our setting, there is an additional complication: we need the correlation to be hidden from a specific player. The problem of hiding correlation was also considered in two recent papers. Bavly and Neyman (2014) show that concealment is possible if players have bounded recall, and the set of players trying to hide the correlation includes at least one player that has longer recall than any of the opponents. They also show that the same result holds if the players are represented as finite automata, and one of the players trying to hide the correlation is an automaton with more states than any of the opponents. Peretz (2013) shows that, under appropriate conditions, if players have bounded recall, even two weak players can conceal their correlation from stronger players (although Bavly and Peretz (2019) show that this is no longer the case if the stronger players' recall capacity is linearly larger than that of the weaker players). The earlier results do not help in our setting. The assumptions made by Peretz do not hold in our setting, and to prove that a NE exists, it does not help to assume that players have asymmetric abilities; the stronger player could be the one that defects.

In the second part of the paper we show how to extend this result to a more refined solution concept. While NE has some attractive features, it allows some unreasonable solutions. In particular, the equilibrium might be obtained by what are arguably empty threats. This actually happens in our proposed NE (and in the basic version of the folk theorem). Specifically, players are required to punish a deviating player, even though that might hurt their payoff. If a deviation occurs, it might not be the best response of the players to follow their strategy and punish; thus, such a punishment is actually an empty threat.

To deal with this (well known) problem, a number of refinements of NE have been considered. The one typically used in dynamic games of perfect information is subgame-perfect equilibrium, suggested by Selten (1965). A strategy profile is a subgame-perfect equilibrium if it is a NE at every subgame of the original game. Informally, this means that at any history of the game (even those that are not on any equilibrium path), if all the players follow their strategy from that point on, then no player has an incentive to deviate. In the context of repeated games where players' moves are observed (so that it is a game of perfect information), the folk theorem continues to hold even if the solution concept used is subgame-perfect equilibrium [Aumann and Shapley (1994), Fudenberg and Maskin (1986), Rubinstein (1979)].

We define a computational analogue of subgame-perfect equilibrium that we call computational subgame-perfect ϵ-equilibrium, where the strategies involved are polynomial-time, and deviating players are again restricted to using polynomial-time strategies. There are a number of subtleties that arise in defining this notion. While we assume that all actions in the underlying repeated game are observable, we allow our TMs to also have memory, which means that the action of a TM does not depend only on the public history. Like subgame-perfect equilibrium, our computational solution concept is intended to capture the intuition that the strategies are in equilibrium after any possible deviation. This means that in a computational subgame-perfect equilibrium, at each history for player i, player i must make a (possibly approximate) best response, no matter what his and the other players' memory states are.

To compute a computational subgame-perfect ϵ-equilibrium, we use the same basic strategy as for NE, but, as often done to get a subgame-perfect equilibrium (for example see Fudenberg and Maskin (1986)), we limit the punishment phase length, so that it is incentive compatible for players to punish deviations. However, to prove our result, we need to overcome one more significant hurdle. When using cryptographic protocols, it is often the case (and, specifically, is the case in the protocol used for NE) that player i chooses a secret (e.g., a secret key for a public-key encryption scheme) as the result of some randomization, and then releases some public information which is a function of the secret (e.g., a public key). After that public information has been released, another party j typically has a profitable deviation by switching to the TM M that can break the protocol—for every valid public information, there always exists some TM M that has the secret “hardwired” into it (although there may not be an efficient way of finding M given the information). We deal with this problem by doing what is often done in practice: we do not use any key for too long, so that j cannot gain too much by knowing any one key.

A second challenge we face is that in order to prove that our new proposed strategies are even an ϵ-NE, we need to show that the payoff of the best response to this strategy is not much greater than that of playing the strategy. However, since for any polynomial-time TM there is always a better polynomial-time TM that has just a slightly longer running time, this natural approach fails. This instead leads us to characterize a class of TMs that we can analyze, and to show that any other TM can be converted to a TM in this class that has at least the same payoff. While such an argument might seem simple in the traditional setting, in our setting, since we only allow polynomial-time TMs, it turns out to require a surprisingly delicate construction and analysis to make sure that the converted TM does indeed have the correct size and running time.

There are a few recent papers that investigate solution concepts for extensive-form games involving computationally bounded players [Gradwohl et al. (2013), Halpern and Pass (2013), Kol and Naor (2008)]; some of these focus on cryptographic protocols [Gradwohl et al. (2013), Kol and Naor (2008)]. Kol and Naor (2008) discuss refinements of NE in the context of cryptographic protocols, but their solution concept requires only that on each history on the equilibrium path, the strategies from that point on form a NE. Our requirement for the computational subgame-perfect equilibrium is much stronger. Gradwohl, Livne and Rosen (2013) also consider this scenario and offer a solution concept different from ours; they try to define when an empty threat occurs, and look for strategy profiles where no empty threats are made. Again, our solution concept is much stronger.

The rest of this paper is organized as follows. In Section 2, we review the relevant definitions from game theory and cryptography. In Section 3, we define our notion of computational ϵ-NE and show how to find it efficiently for repeated games. In Section 4, we consider computational subgame-perfect ϵ-equilibrium and show that it too can be found efficiently.

Section snippets

One-shot games

We define a game G to be a triple ([c], A, u), where [c] = {1, …, c} is the set of players, Ai is the set of possible actions for player i, A = A1 × ⋯ × Ac is the set of action profiles, and u : A → ℝ^c is the utility function (ui(a) is the utility of player i). A (mixed) strategy σi for player i is a probability distribution over Ai, that is, an element of Δ(Ai) (where, as usual, we denote by Δ(X) the set of probability distributions over the set X). We use the standard notation x−i to denote vector x
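The one-shot game definition above can be made concrete with a small sketch computing a player's expected utility under a mixed-strategy profile. The dict-based encoding is our own illustration, not the paper's formalism:

```python
from itertools import product

def expected_utility(utilities, mixed, player):
    """Expected utility of `player` under a mixed-strategy profile:
    sum over pure action profiles of the product of each player's
    action probability times `player`'s payoff at that profile.
    `utilities[profile]` maps a pure profile to a payoff tuple;
    `mixed[i]` is player i's distribution over actions
    (hypothetical encoding, for illustration only)."""
    total = 0.0
    for profile in product(*(range(len(m)) for m in mixed)):
        prob = 1.0
        for i, a in enumerate(profile):
            prob *= mixed[i][a]          # independent mixing
        total += prob * utilities[profile][player]
    return total

# Matching pennies with both players uniform: expected payoff 0.
u = {(0, 0): (1, -1), (0, 1): (-1, 1), (1, 0): (-1, 1), (1, 1): (1, -1)}
print(expected_utility(u, [[0.5, 0.5], [0.5, 0.5]], 0))  # 0.0
```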

Computing a computational NE

As we discussed in the introduction, we are interested in designing an algorithm that, given a game G, outputs a profile (M1, …, Mn) of (possibly probabilistic) polynomial-time TMs, where Mi computes player i's strategy in G. Each TM Mi gets as input in each round of the game the history of the game so far, and uses that and its internal memory state (recall that the TMs considered in BC+ did not have internal memory) to compute its next action. By polynomial-time TMs we mean that at round t the
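The distinction between stateful and stateless strategy machines can be illustrated with a minimal sketch: a strategy that remembers across rounds whether a deviation has ever occurred, which a stateless machine would have to re-derive from the full history each round. The class below is our own illustration; the names and structure are hypothetical, not the paper's construction.

```python
class StatefulStrategy:
    """Sketch of a stateful strategy machine: unlike the stateless TMs
    of BC+, it carries internal memory between rounds (here, a single
    flag recording whether a deviation from the planned action
    sequence has ever been observed)."""

    def __init__(self, plan, punish_action):
        self.plan = plan                  # intended action sequence, cycled
        self.punish_action = punish_action
        self.deviated = False             # internal state kept across rounds

    def next_action(self, round_no, last_profile, expected_profile):
        """Return this round's action given last round's observed and
        expected action profiles (None in the first round)."""
        if last_profile is not None and last_profile != expected_profile:
            self.deviated = True          # remember the deviation forever
        if self.deviated:
            return self.punish_action     # punish from now on
        return self.plan[round_no % len(self.plan)]

m = StatefulStrategy(plan=[0, 1], punish_action=2)
print(m.next_action(0, None, None))        # 0 (first planned action)
print(m.next_action(1, (0, 0), (0, 0)))    # 1 (no deviation observed)
print(m.next_action(2, (1, 5), (1, 1)))    # 2 (deviation: switch to punishment)
print(m.next_action(3, (2, 2), (2, 2)))    # 2 (punishment persists via state)
```

In the paper's construction the memory also holds cryptographic material such as keys and the shared PRF seed, which is exactly what the stateless BC+ machines cannot retain.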

Motivation and definition

In this section we would like to define a notion similar to subgame-perfect equilibrium, where for all histories h in the game tree (even ones not on the equilibrium path), playing σ restricted to the subtree starting at h forms a NE. This means that a player does not have any incentive to deviate, no matter where he finds himself in the game tree.

As we suggested in the introduction, there are a number of issues that need to be addressed in formalizing this intuition in our computational

Acknowledgments

Joseph Halpern and Lior Seeman are supported in part by NSF grants IIS-0911036 and CCF-1214844, by AFOSR grant FA9550-08-1-0266, by ARO grant W911NF-14-1-0017, and by the Multidisciplinary University Research Initiative (MURI) program administered by the AFOSR under grant FA9550-12-1-0040. Lior Seeman is partially supported by a grant from the Simons Foundation #315783. Rafael Pass is supported in part by an Alfred P. Sloan Fellowship, a Microsoft Research Faculty Fellowship, NSF Awards

References (34)

  • X. Chen et al., Computing Nash equilibria: approximation and smoothed complexity
  • C. Daskalakis et al., The complexity of computing a Nash equilibrium
  • W. Diffie et al., New directions in cryptography, IEEE Trans. Inf. Theory (1976)
  • Y. Dodis et al., A cryptographic solution to a game theoretic problem
  • D. Fudenberg et al., The folk theorem in repeated games with discounting or with incomplete information, Econometrica (1986)
  • O. Goldreich, Foundation of Cryptography. Basic Tools, vol. I (2001)
  • O. Goldreich et al., How to construct random functions, J. ACM (1986)

Preliminary versions of this work appeared in “The truth behind the myth of the folk theorem”, Proceedings of the 5th Conference on Innovations in Theoretical Computer Science (ITCS), 2014, and “Not just an empty threat: subgame-perfect equilibrium in repeated games played by computationally bounded players”, Proceedings of the 10th Conference on Web and Internet Economics (WINE), 2014.
