Game Dynamics and Nash Equilibria

If a game has a unique Nash equilibrium, then this equilibrium is arguably the solution of the game from the point of view of the refinement literature. However, it might be that, for almost all initial conditions, all strategies in the support of this equilibrium are eliminated by both the replicator dynamics and the best-reply dynamics.


Introduction
Evolutionary game dynamics model the evolution of the mean behavior in populations of agents interacting strategically. A most studied topic is the link between the outcome of these dynamics and Nash equilibria. Many positive connections have been found, including convergence to the set of Nash equilibria for many dynamics in special classes of games (Sandholm, 2010). In general though, solutions of evolutionary game dynamics need not converge to the set of Nash equilibria (Hofbauer and Sigmund, 1998, Section 8.6). By contrast with no-regret dynamics (e.g., Hart, 2005, and references therein), replacing Nash equilibria by correlated equilibria and convergence of the solutions by convergence of some time-average hardly helps: for many dynamics, there are examples of games with a unique Nash equilibrium, which is also the unique correlated equilibrium, but such that, for some initial conditions, all strategies in the support of this equilibrium are eliminated (Viossat, 2007, 2008).

* E-mail: viossat ατ ceremade.dauphine.fr. † The author thanks Sylvain Sorin for patient and painful hours spent trying to decipher a first version of this article. The support of the ANR RISK and of the Fondation du Risque (Chaire Groupama) is gratefully acknowledged. All errors are mine.
In these examples, however, the Nash equilibrium is strict, and thus asymptotically stable under reasonable dynamics. This leads to the following question: may it be that, for almost all initial conditions, all strategies in the support of the unique Nash equilibrium are eliminated? We show that the answer is positive, both for the best-reply dynamics (BR) and for the replicator dynamics (REP). Our examples are relatively high dimensional: 6 × 6 games for BR and 7 × 7 for REP. The reason why we need an extra dimension for the replicator dynamics seems purely technical: our examples for the best-reply dynamics should work as well for the replicator dynamics, but this is not so easy to prove, as the replicator dynamics is more difficult to analyze than the best-reply dynamics.
The reason why our games are relatively high dimensional is deeper: first, by the folk theorem of evolutionary game theory (Weibull, 1995, Prop. 4.11), if an interior trajectory of REP or BR converges to a point, then this point is a Nash equilibrium. Thus, we need nonconvergent trajectories along which, asymptotically, only strategies that do not belong to the support of a Nash equilibrium have positive probability. For single-population dynamics, this seems to require at least three strategies not in the support of Nash equilibria.
Moreover, the only way to have a single strategy in the support of at least one Nash equilibrium is to have a unique, pure Nash equilibrium. But such a Nash equilibrium would be strict, hence asymptotically stable:

Proposition 1.1. In a bimatrix game, a unique and pure Nash equilibrium is strict.
Proof. A Nash equilibrium is quasi-strict if each player puts positive weight on each of her pure best-replies. In a bimatrix game, if a Nash equilibrium is unique, then it is quasi-strict (Jansen, 1981; Norde, 1999); if it is unique and pure, it is quasi-strict and pure, hence strict.
We thus need at least two strategies in the support of Nash equilibria. With the three strategies not in the support of equilibria, this makes at least five strategies. Our examples for the best-reply dynamics are 6 × 6 games: there might be room for improvement, but not much.
The remainder of this article is organized as follows: the framework and the notation are introduced below. Section 2 studies the behavior of the best-reply dynamics in a family of 6 × 6 games. Section 3 studies the replicator dynamics in a specific 7 × 7 game. Section 4 concludes. Appendix A shows that the games we study have a unique Nash equilibrium. Appendix B studies the behavior of the best-reply dynamics in the 7 × 7 game of Section 3.
Notation and definitions. We study single-population dynamics in two-player, finite symmetric games. The set of pure strategies is I = {1, 2, ..., N} and the payoff matrix is U = (u_ij)_{1 ≤ i,j ≤ N}. Thus, u_ij is the payoff of an individual playing strategy i against an individual playing strategy j. Let S_N = {x ∈ R^N_+ : Σ_{i ∈ I} x_i = 1} denote the simplex of mixed strategies (henceforth, "the simplex"). Its vertices e_i, 1 ≤ i ≤ N, correspond to the pure strategies of the game. Note that vectors and matrices are denoted by bold characters.
Denote by x_i(t) the proportion of the population playing strategy i at time t and by x(t) = (x_1(t), ..., x_N(t)) ∈ S_N the population profile (or mean strategy).
We often omit time arguments and write x for x(t). We study the evolution of the population profile under the two most studied dynamics: the replicator dynamics and the best-reply dynamics.
The replicator dynamics (Taylor and Jonker, 1978) may be derived by assuming that the per capita growth rate of the total number of individuals playing strategy i is the payoff of the game.^1 For frequencies of strategies, this leads to:

(REP)    ẋ_i = x_i [(Ux)_i − x · Ux],    i ∈ I.

The right-hand side is Lipschitz in x, hence there is a unique solution through each initial condition. This solution is interior if x_i(t) > 0 for all i ∈ I and all t ∈ R. Since the faces of the simplex are invariant under (REP), this boils down to the initial condition being interior; that is, x_i(0) > 0 for all i ∈ I.

The best-reply dynamics (Gilboa and Matsui, 1991; Matsui, 1992) may be derived by assuming that in each small time interval, a fraction of the population revises its strategy and switches (rationally, but myopically) to a best-reply to the current population profile. Since this best-reply need not be unique, this does not lead to a differential equation but to the differential inclusion

(BR)    ẋ ∈ BR(x) − x,

where BR(x) = {y ∈ S_N : y · Ux = max_{z ∈ S_N} z · Ux} denotes the set of mixed best-replies to x. A solution of the best-reply dynamics is an absolutely continuous function x : R_+ → S_N satisfying (BR) for almost every t. Solutions exist for each initial condition, but need not be unique.

^1 Or, up to a change of time, a background fitness plus the payoff of the game.
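To fix ideas, (REP) can be integrated numerically with a simple explicit Euler scheme. The sketch below is our own illustration, not part of the original argument; the payoff matrix is the standard zero-sum Rock-Paper-Scissors game and the step size is an arbitrary choice.

```python
import numpy as np

def replicator_step(x, U, dt):
    """One explicit-Euler step of (REP): x_i' = x_i * ((Ux)_i - x.Ux)."""
    p = U @ x                  # (Ux)_i: payoff of strategy i against x
    x = x + dt * x * (p - x @ p)
    return x / x.sum()         # renormalize to control numerical drift

# Zero-sum Rock-Paper-Scissors (an illustrative choice of payoffs).
U = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

x = np.array([0.5, 0.3, 0.2])
for _ in range(20000):         # integrate up to time t = 20
    x = replicator_step(x, U, 1e-3)
print(x, x.sum())
```

In line with the discussion above, the faces of the simplex are invariant: a coordinate that starts at zero stays at zero, and an interior initial condition yields an interior solution.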
Other definitions. The limit set of a solution x(·) of a given dynamics is the set of accumulation points of x(t) as t → +∞. A pure strategy i belongs to the support of a Nash equilibrium of a symmetric bimatrix game if there is a Nash equilibrium (x, y) such that x_i > 0 (or equivalently, due to the symmetry of the game, a Nash equilibrium (x, y) such that y_i > 0). Finally, the pure strategy i is eliminated (for a given solution x(·) of a given dynamics) if x_i(t) → 0 as t → +∞.

We show that there are games with a unique Nash equilibrium but such that, under the best-reply dynamics and the replicator dynamics, all strategies in the support of this equilibrium are eliminated for almost all initial conditions.
2 Best-reply dynamics

A reminder on Rock-Paper-Scissors
A general Rock-Paper-Scissors game (RPS) is a 3 × 3 symmetric game with payoff matrix

(1)    U = (  0    −b_2   a_3
             a_1    0    −b_3
            −b_1   a_2    0  ),    a_i > 0, b_i > 0.

(As the game is symmetric, we only indicate the payoffs of the row player.) These games have a unique Nash equilibrium. It is symmetric and completely mixed. We say that the game is outward cycling if

(2)    a_1 a_2 a_3 < b_1 b_2 b_3.

In that case, almost all solutions of the best-reply dynamics converge to a triangle, which Gaunersdorfer and Hofbauer (1995) called the Shapley triangle after Shapley (1964). It is defined by ST = {x ∈ S_3 : V(x) = 0}, where

(3)    V(x) = max_{i ∈ I} (Ux)_i.

Proposition 2.1 (Gaunersdorfer and Hofbauer, 1995). In an outward cycling RPS game, for every initial condition different from the equilibrium, the solution of the best-reply dynamics is uniquely defined and its limit set is the Shapley triangle.

In particular, if the game is zero-sum, the Shapley triangle is degenerate and coincides with the equilibrium; if the game is outward cycling, it is nondegenerate and contains the equilibrium in its interior.

An RPS game has cyclic symmetry if the payoffs a_i and b_i are independent of i. The Nash equilibrium is then (1/3, 1/3, 1/3) and, up to a rescaling that affects neither the equilibrium nor the dynamics we study, the payoffs may be taken of the form:

(4)    U = (  0   −α    β
              β    0   −α
             −α    β    0 ),    α > 0, β > 0.

The outward cycling condition (2) then boils down to α > β, and the Shapley triangle is the triangle whose vertices are the three cyclic permutations of

(5)    (β², αβ, α²) / (α² + αβ + β²).

We now describe in detail the behavior of the best-reply dynamics in RPS games and sketch the proof of Proposition 2.1, as this allows us to introduce some crucial tools. The first one is a version of the improvement principle (Monderer and Sela, 1997). It says that when the solution of the best-reply dynamics points towards a pure best-reply i, only certain strategies can become best-replies: those that are better replies to i than i itself.

Lemma 2.2 (improvement principle). Let x(·) be a solution of (BR) and let T < T′ be such that, for all t in [T, T′), the unique best-reply to x(t) is strategy i. If strategy j ≠ i is a best-reply to x(T′), then u_ji ≥ u_ii.

Proof. See Viossat (2008, Lemma 4.2).
Assume for instance that in an RPS game, strategy 1 is currently the unique best-reply to the population profile x(t), so that the solution points towards e_1; that is, ẋ = e_1 − x. Since (e_1, e_1) is not a Nash equilibrium, a new best-reply must arise. By the improvement principle, this can only be strategy 2. The solution then points towards the edge [e_1, e_2]. Since in the game restricted to strategies 1 and 2, strategy 2 strictly dominates strategy 1, strategy 2 immediately becomes the unique best-reply. Therefore the solution points towards e_2, then towards e_3, then towards e_1 again, and so on.
By itself, this cyclic behaviour does not preclude convergence to the equilibrium. However, for outward cycling RPS games and for solutions that do not start at the equilibrium, this cyclic behavior goes on forever. This follows from the following observations, which we do not prove. Below, the function V is defined in (3) and v(t) = V(x(t)).
(i) If the game is outward cycling, then V (x) is zero on the Shapley triangle, positive outside it, and negative inside, with its unique minimum attained at the equilibrium point.
(ii) When the solution points towards a pure strategy (that is, ẋ = e_i − x, with i the unique best-reply), v satisfies v̇ = −v, so that |v| decreases exponentially.

Consider a solution that does not start at the equilibrium. Combining (i), (ii), and the cyclic behavior described above, we get that the solution cannot approach the equilibrium; therefore the times at which the direction changes cannot accumulate, and the cyclic behavior goes on forever. Thus, by (ii), v(t) → 0, hence x(t) → ST. The limit set of the solution is then easily seen to be the whole triangle.
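These observations can be checked numerically. The sketch below is our own illustration: it discretizes the best-reply dynamics in a cyclic RPS game with win payoff β and loss payoff −α (an arbitrary outward cycling choice α = 1, β = 1/4 under the normalization above), records the successive best replies, and checks that they cycle while v(t) = max_i (Ux(t))_i decays towards 0.

```python
import numpy as np

# Cyclic RPS: wins pay beta, losses pay -alpha; alpha > beta makes the
# game outward cycling (illustrative values, not taken from the text).
alpha, beta = 1.0, 0.25
U = np.array([[0.,    -alpha,  beta],
              [beta,   0.,    -alpha],
              [-alpha, beta,   0.]])

def best_reply(x):
    return int(np.argmax(U @ x))   # unique pure best reply (generic x)

x = np.array([0.6, 0.3, 0.1])
dt, T = 1e-3, 60.0
cur = best_reply(x)
switches = []                      # successive new best replies
for _ in range(int(T / dt)):
    e = np.zeros(3); e[cur] = 1.0
    x += dt * (e - x)              # x' = e_i - x while i is the best reply
    if best_reply(x) != cur:
        cur = best_reply(x)
        switches.append(cur)

v = (U @ x).max()                  # v = V(x), should approach 0
print(switches[:6], v)
```

Each new best reply is the strategy that beats the previous one, as forced by the improvement principle, and the max-payoff function decays, so the trajectory approaches the Shapley triangle.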

A 6 × 6 game
Consider the following 6 × 6 symmetric game (6). Let G_123 and G_456 denote the 3 × 3 games obtained from (6) by restricting the players to their three first and to their three last strategies, respectively. Both G_123 and G_456 are outward cycling RPS games with cyclic symmetry. Their unique Nash equilibria correspond in the whole game to, respectively:

n_123 = (1/3, 1/3, 1/3, 0, 0, 0)    and    n_456 = (0, 0, 0, 1/3, 1/3, 1/3).

The payoffs are chosen so that (n_123, n_123) is a Nash equilibrium of (6) but (n_456, n_456) is not.

Proposition 2.3. Game (6) has a unique Nash equilibrium: (n_123, n_123).
Proof. See Appendix A.
Proposition 2.3 states not only that (n_123, n_123) is the unique symmetric Nash equilibrium, but also that there are no asymmetric Nash equilibria.
Nevertheless, from almost all initial conditions, all strategies in its support are eliminated. More precisely, let ST_456 denote the Shapley triangle of G_456, viewed as a subset of S_6 (so that x_1 = x_2 = x_3 = 0 on ST_456).

Proposition 2.4. For almost every mixed strategy x in S_6, there is a unique solution x(·) of (BR) such that x(0) = x, and its limit set is the Shapley triangle ST_456.

Proof. The proof relies on the improvement principle and on the better-reply structure of the game, described in Fig. 1 below.

Figure 1: Better-replies to pure strategies in game (6). An arrow from i to j means that strategy j is a better reply to strategy i than i itself.

Consider a solution x(·) of the best-reply dynamics. We may assume that there is a unique best-reply to x = x(0), since this holds for almost all x in S_6. There are then two cases.
Case 1: the unique best-reply to x(0) is strategy 4, 5 or 6. Assume for concreteness that this is strategy 4. The improvement principle (Lemma 2.2) and the same reasoning as for RPS games imply that the solution first points towards e_4, then towards e_5, then towards e_6, then towards e_4 again, in a cyclic fashion. It may be shown exactly as in Viossat (2008, p. 33) that the times at which the direction of the solution changes do not accumulate. It follows that this cyclic behavior goes on forever. Therefore, strategies 1, 2 and 3 never become best-replies, hence x_1, x_2 and x_3 decrease exponentially to zero. Moreover, when strategy i ∈ {4, 5, 6} is the unique best-reply, the function v associated with G_456 decreases exponentially in absolute value; as in RPS games, it follows that the limit set of the solution is ST_456.

Case 2: the unique best-reply to x(0) is strategy 1, 2 or 3. Assume for concreteness that this is strategy 1. If none of the strategies 4, 5 and 6 ever becomes a best-reply, the solution points towards e_1, then towards e_2, then towards e_3, etc., and by the same reasoning as in Case 1, its limit set will be the Shapley triangle of G_123. This is impossible, because the payoffs are such that at one of the vertices of this triangle, the closest to e_3, strategy 4 is the unique best-reply. This vertex is given by (1/13)(1, 3, 9, 0, 0, 0); see Gaunersdorfer and Hofbauer (1995, Eq. (3.6)). Thus, there exists a first time T > 0 at which one of the strategies 4, 5 and 6 becomes a best-reply. Due to the improvement principle and to the better-reply structure of the game (Fig. 1), this can only be strategy 4, and just before T, the unique best-reply was strategy 3.
There are then two subcases; in both, one checks that the solution eventually cycles among strategies 4, 5 and 6 as in Case 1, so that its limit set is again the Shapley triangle ST_456.

Robustness to perturbations of the payoffs. The above proof uses only strict inequalities, which are unaffected by sufficiently small perturbations of the payoffs (the only modification is that the Shapley triangles and the underlying functions V must be suitably redefined, because the diagonal terms need no longer be zero). Moreover, since the game is a bimatrix game with a unique Nash equilibrium, any game in its neighborhood has a unique Nash equilibrium, with the same support (Jansen, 1981). Therefore:

Proposition 2.5. There exists a neighborhood of game (6) such that, for any symmetric game in this neighborhood, the unique Nash equilibrium has support in {1, 2, 3} × {1, 2, 3}, but for almost all initial conditions, strategies 1, 2 and 3 are eliminated by the best-reply dynamics.

Replicator Dynamics
Up to a further rescaling, the payoff matrix of an outward cycling RPS game with cyclic symmetry (4) may be taken of the form:

(9)    U = (  0   −1    ε
              ε    0   −1
             −1    ε    0 ),    0 < ε < 1.

The behavior of the replicator dynamics in such games is well known. The boundary Γ = {x ∈ S_3 : x_1 x_2 x_3 = 0} forms a heteroclinic cycle, that is, a globally invariant set consisting of saddle rest-points and of saddle orbits connecting these rest-points. Moreover:

Proposition 3.1 (Zeeman, 1980; Gaunersdorfer and Hofbauer, 1995). In game (9), the set Γ is asymptotically stable, all interior solutions that do not start at the equilibrium (1/3, 1/3, 1/3) converge to Γ, and the limit set of their time-average is the Shapley triangle (5).
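Proposition 3.1 can be explored numerically. The sketch below is our own illustration, under our reading of the cyclic normalization (losses −1, wins ε; the entries are not guaranteed to match the paper's displayed matrix): along an interior solution, the product x_1 x_2 x_3, which vanishes exactly on Γ, decreases to 0, while the mean payoff x · Ux stays nonpositive.

```python
import numpy as np

eps = 0.1
# Cyclic RPS with losses -1 and wins eps < 1 (illustrative entries).
U = np.array([[0.,  -1.,  eps],
              [eps,  0., -1.],
              [-1., eps,  0.]])

x = np.array([0.4, 0.35, 0.25])
dt, steps = 1e-3, 200000          # integrate up to time t = 200
prod0 = x.prod()                  # x1*x2*x3 vanishes exactly on Gamma
for _ in range(steps):
    p = U @ x
    assert x @ p <= 1e-12         # mean payoff nonpositive throughout
    x += dt * x * (p - x @ p)
    x = np.maximum(x, 1e-300)     # guard against floating-point underflow
    x /= x.sum()

print(x.prod())                   # much smaller than prod0
```

The boundary heteroclinic cycle attracts the solution, so the product shrinks by many orders of magnitude, in line with the asymptotic stability of Γ.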
(If y(·) is a solution of (REP), its time-average at time t > 0 is (1/t) ∫_0^t y(s) ds.) Two other facts will prove useful. First, in game (9), the mean payoff is always nonpositive:

(10)    x · Ux = (ε − 1)(x_1 x_2 + x_1 x_3 + x_2 x_3) ≤ 0    for all x ∈ S_3.

Second, as computed by Gaunersdorfer and Hofbauer (1995, Eq. (3.6)), the vertex of the Shapley triangle closest to e_3 is given by

(11)    q̄ = (ε², ε, 1) / (1 + ε + ε²).

Consider a solution y(·) of the replicator dynamics that does not start at the equilibrium. Proposition 3.1 implies that q̄ is an accumulation point of the time-average of y(t). Moreover, for ε small enough (ε < 1/4 suffices), 4q̄_3 − 3 > 0; this implies the following result:

Lemma 3.2. Let y(·) be an interior solution of (REP) in game (9) that does not start at the equilibrium. Then lim sup_{t→+∞} (1/t) ∫_0^t (4y_3(s) − 3) ds > 0.

Now consider the following 7 × 7 symmetric game (12), with ε > 0 small enough.^4

^4 In the proofs, for simplicity, we use ε < 1/48, but the results extend easily to ε < 1/6, and probably beyond.

The games obtained by restricting both players to their three first or to their three last strategies are outward cycling Rock-Paper-Scissors games with cyclic symmetry. The Nash equilibria of these restricted games correspond in the whole game to rest points of the replicator dynamics, which we denote by n_123 and n_567:

n_123 = (1/3, 1/3, 1/3, 0, 0, 0, 0);    n_567 = (0, 0, 0, 0, 1/3, 1/3, 1/3).

The heteroclinic cycles of the RPS games correspond to heteroclinic cycles of the whole game, which we denote by Γ_123 and Γ_567: Γ_123 = {x ∈ S_7 : x_1 + x_2 + x_3 = 1 and x_1 x_2 x_3 = 0} and
Γ_567 = {x ∈ S_7 : x_5 + x_6 + x_7 = 1 and x_5 x_6 x_7 = 0}.

The intuition for elimination is as follows: when the population plays mainly strategies 1, 2 and 3, strategy 4 earns more than they do, so its share grows; once x_4 is large, strategies 5, 6 and 7 earn more than strategy 4 and invade. However, x_4 then decreases, which may lead to a come-back of strategies 1, 2, 3, and the whole process might start again. The difficulty is to make sure that, each time this process runs, the solution gets closer to Γ_567.
For the replicator dynamics, this can be shown thanks to the last important property of game (12): against strategies 4 to 7, strategies 1 to 3 have the same payoffs. That is, for any i, i′ in {1, 2, 3} and any j in {4, 5, 6, 7}, u_ij = u_i′j.
Similarly, against strategies 1 to 4, strategies 5 to 7 have the same payoffs.
Due to linearity properties of the replicator dynamics, this implies that the dynamics may be decomposed as we now explain. For i ∈ {1, 2, 3}, define

(13)    x̄_i = x_i / (x_1 + x_2 + x_3),

and let x̄ = (x̄_1, x̄_2, x̄_3). For i ∈ {5, 6, 7}, define similarly x̂_i = x_i / (x_5 + x_6 + x_7), and let x̂ = (x̂_5, x̂_6, x̂_7). Finally, let λ(t) = x_1(t) + x_2(t) + x_3(t) and μ(t) = x_5(t) + x_6(t) + x_7(t) denote respectively the total share of the three first and of the three last strategies at time t. The evolution of x is fully described by the joint evolution of x̄, x̂, λ and μ. The interest of this description is that, up to a change in velocity, x̄ and x̂ follow the replicator dynamics of the Rock-Paper-Scissors game (9).
Formally, let τ̄(t) denote the rescaled time

τ̄(t) = ∫_0^t λ(s) ds.

Let both Ū and Û denote the payoff matrix (9), depending on whether it arises as the top-left or the bottom-right corner of game (12).^6

Lemma 3.5. Let y(·) denote the solution of (REP) in the RPS game (9), with initial condition y(0) = x̄(0). We have:

(16)    (d/dt) x̄_i(t) = λ(t) x̄_i(t) [(Ū x̄(t))_i − x̄(t) · Ū x̄(t)]    for i ∈ {1, 2, 3},

(17)    x̄(t) = y(τ̄(t))    for all t ≥ 0.

Proof. The proof of (16) is the same as the proof of Lemma 5.2 of Viossat (2007). Due to (16), y(τ̄(t)) and x̄(t) are solutions of the same differential equation, which admits a unique solution through each initial condition. This proves (17).
Similarly, if z(·) is the solution of the replicator dynamics in game (9) with initial condition z(0) = x̂(0), and τ̂(t) is the rescaled time

(18)    τ̂(t) = ∫_0^t μ(s) ds,

then

(19)    x̂(t) = z(τ̂(t))    for all t ≥ 0.

We are now ready to prove the main result of this section.

Proposition 3.6. For any interior initial condition x = x(0) such that neither x̄(0) nor x̂(0) is the equilibrium (1/3, 1/3, 1/3), the solution of the replicator dynamics converges to Γ_567. In particular, all pure strategies in the support of the unique equilibrium of game (12) are eliminated.

^6 The top-left and bottom-right RPS games of (12) need not be the same for the results to hold; this choice just minimizes the number of parameters.
Proof. Assume by contradiction that this is not the case. λ(t) > 0 and λ is clearly Lipschitz. Using (REP), an easy computation shows that the relevant inequality follows from Lemma 3.2.
We now conclude. Recall the definition of τ̂ in (18). A corollary of Claim 3.9 is that τ̂(t) → +∞ as t → +∞. By (19) and Proposition 3.1, it follows that x̂ converges to the heteroclinic cycle of game (9). It is easy to check that along this cycle, the mean payoff is always greater than −1/4. Therefore, there exists a time T_1 after which (22) holds. Moreover, (REP) and a somewhat tedious computation show (23). Assuming ε ≤ 1/48, (22), (23) and Lemma 3.2 imply that for t ≥ T_1, the ratio μ/λ increases whenever it is greater than 16 (24). It follows from Claim 3.9 that there exists a time T_2 ≥ T_1 at which the ratio μ/λ is greater than 16. By (24), this ratio then keeps increasing hence, by (24) again, it goes to +∞. By Claim 3.7, this implies that λ goes to zero, a final contradiction.
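The decomposition behind Lemma 3.5 can be sanity-checked numerically. The sketch below uses a hypothetical 7 × 7 payoff matrix built only to satisfy the equal-payoff structural property of game (12) (the off-block entries are placeholders, not the paper's values): it integrates the full 7-strategy replicator dynamics together with the time-rescaled 3-strategy RPS dynamics and verifies that the normalized restriction x̄ tracks y(τ̄(t)).

```python
import numpy as np

eps = 0.1
R = np.array([[0., -1., eps],      # a cyclic RPS block (illustrative)
              [eps, 0., -1.],
              [-1., eps, 0.]])

# Hypothetical 7x7 matrix with the structural property of game (12):
# strategies 1-3 earn identical payoffs against each of 4-7, and 5-7
# against each of 1-4. Off-block entries are arbitrary placeholders.
U = np.zeros((7, 7))
U[:3, :3] = R
U[4:, 4:] = R
U[:3, 3:] = np.array([0.2, -0.1, -0.1, -0.1])   # same row for 1, 2, 3
U[4:, :4] = np.array([-0.2, -0.2, -0.2, 0.3])   # same row for 5, 6, 7
U[3, :3] = 0.1
U[3, 4:] = -0.3

def field(x, M):
    p = M @ x
    return x * (p - x @ p)         # replicator vector field

x = np.array([0.2, 0.15, 0.1, 0.15, 0.15, 0.15, 0.1])
y = x[:3] / x[:3].sum()            # y(0) = xbar(0)
dt = 1e-4
for _ in range(30000):             # integrate up to time t = 3
    lam = x[:3].sum()              # lambda(t) = x1 + x2 + x3
    y += dt * lam * field(y, R)    # RPS replicator run at speed lambda
    x += dt * field(x, U)          # full 7-strategy replicator
    y /= y.sum(); x /= x.sum()

xbar = x[:3] / x[:3].sum()
print(np.abs(xbar - y).max())      # small: xbar tracks y(taubar(t))
```

The agreement only requires that strategies 1 to 3 have identical payoffs against strategies 4 to 7, which is exactly the property used in Lemma 3.5.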
Perturbation of payoffs. As for game (6), any game sufficiently close to game (12) in the payoff space has a unique Nash equilibrium, and its support is {1, 2, 3}×{1, 2, 3}. We conjecture that the result of Proposition 3.6 generalizes to such nearby games. That is, for almost all initial conditions, the solution of the replicator dynamics converges to the boundary of the face spanned by e 5 , e 6 and e 7 , hence all pure strategies in the support of the unique Nash equilibrium are eliminated. Our proof does not go through however, because Lemma 3.5 requires a very specific payoff structure.

Correlated equilibrium.
By contrast with the games of Viossat (2007, 2008), the Nash equilibrium of games (6) and (12) is not the unique correlated equilibrium. Whether reasonable dynamics may eliminate all strategies used in correlated equilibrium for almost all initial conditions is an open question.

Other dynamics.
A variant of Lemma 3.5 holds for the discrete-time replicator dynamics (26). Thus, extending Proposition 3.6 to (26) should be relatively simple. Proposition 3.6 might also extend to some classes of payoff-functional dynamics

ẋ_i = x_i [ f((Ux)_i) − Σ_j x_j f((Ux)_j) ],

with f an increasing and sufficiently smooth function from R to R. This might be hard to prove though, as Lemma 3.5 relies on the linearity properties of the replicator dynamics. Finally, there is a strong link between the best-reply dynamics and the time-average of the replicator dynamics (Gaunersdorfer and Hofbauer, 1995; Hofbauer et al., 2009). For this reason, we conjecture that Proposition 2.4 extends to (REP); that is, in game (6), for almost all initial conditions, all strategies in the support of the unique Nash equilibrium are eliminated by the replicator dynamics. Similarly, Appendix B shows that in the 7 × 7 game (12), almost all solutions of the best-reply dynamics converge to ST_567; this can be done due to a decomposition of the best-reply dynamics similar to Lemma 3.5.
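For concreteness, one common discrete-time replicator map (a standard variant; we do not claim it is the paper's (26)) is x_i ← x_i (c + (Ux)_i) / (c + x · Ux), with c a background fitness making all factors positive. A minimal sketch:

```python
import numpy as np

def discrete_replicator(x, U, c=2.0):
    """One step of a standard discrete-time replicator map,
    x_i <- x_i * (c + (Ux)_i) / (c + x.Ux); the background fitness c
    is chosen large enough that every factor c + (Ux)_i is positive."""
    p = U @ x
    return x * (c + p) / (c + x @ p)

# Outward cycling cyclic RPS (illustrative payoffs): the boundary
# attracts, so x1*x2*x3 should shrink along orbits.
eps = 0.1
U = np.array([[0., -1., eps],
              [eps, 0., -1.],
              [-1., eps, 0.]])

x = np.array([0.4, 0.35, 0.25])
prod0 = x.prod()
for _ in range(2000):
    x = discrete_replicator(x, U)
print(x.prod(), x.sum())
```

The map preserves the simplex exactly (the updated coordinates sum to 1 by construction), which makes it a convenient discrete analogue for experiments.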

Discussion
In game (12), the Nash equilibrium is unique and quasi-strict, and therefore persistent, regular, hence strongly stable, essential, strictly proper, strictly perfect, etc. (van Damme, 1991). Thus, from the traditional, rationalistic point of view, it may be seen as the unambiguous solution of the game. However, under two of the most studied dynamics, all strategies in the support of this Nash equilibrium are eliminated from almost all initial conditions. This indicates an even wider gap between strategic and evolutionary considerations than had been noted before.
We conjecture that elimination of all strategies in the support of Nash equilibria from almost all initial conditions occurs for many other dynamics, including multi-population dynamics. However, this might be hard to prove because this can only arise in relatively large games, in which having a precise understanding of dynamics more complex than the replicator dynamics or the best-reply dynamics might prove difficult. A way forward might be to consider nonlinear games and to replace, in the construction, Rock-Paper-Scissors games by hypnodisk games (Hofbauer and Sandholm, 2011).

A Equilibrium uniqueness
In this section, we show that games (6) and (12) have a unique equilibrium.
We begin with a lemma used in both proofs.
Consider a symmetric bimatrix game with pure strategy set I = {1, 2, ..., N} and payoff matrix U. Let I′ ⊂ I. For any x in S_N, define x(I′) = Σ_{i ∈ I′} x_i, and let x′ denote the vector of R^N with coordinates x′_i = x_i if i ∈ I′ and x′_i = 0 otherwise.

Lemma A.1. Let (x, y) be a Nash equilibrium such that x(I′) y(I′) > 0. Assume that against x − x′ and y − y′, the payoff of a strategy i in I′ is independent of i. That is, for all i and j in I′,

(28)    (U(y − y′))_i = (U(y − y′))_j    and    (U(x − x′))_i = (U(x − x′))_j.

Then (x′, y′) induces an unnormalized Nash equilibrium of the game restricted to I′ × I′. That is, for all i, j in I′:

(29)    x′_i > 0 ⟹ (Uy′)_i ≥ (Uy′)_j    and    y′_i > 0 ⟹ (Ux′)_i ≥ (Ux′)_j.

Proof. Let i ∈ I′. If x′_i > 0 then x_i > 0, hence strategy i is a best-reply to y. Together with (28) this implies that for all j in I′, (Uy′)_i − (Uy′)_j = (Uy)_i − (Uy)_j ≥ 0. This proves the first part of (29). The second part is symmetric.
We first claim that x_4 x_5 x_6 = 0. Indeed, if x_4 x_5 x_6 > 0, then strategies 4, 5 and 6 are all best replies to y, hence so is n_456. This cannot be because, as is easily checked, n_456 is strictly dominated by n_123.
Step 2. y_1 + y_2 + y_3 > 0 and, by symmetry, x_1 + x_2 + x_3 > 0.    (31)

Assume by contradiction that y_1 = y_2 = y_3 = 0.

Step 3. x_1 = x_2 = x_3 and y_1 = y_2 = y_3.

Case 1. If (31) holds. Subcase 1.1. If (32) holds. Then y is a convex combination of n_123 and n_567. Against both of these strategies, the payoff of n_123 is strictly greater than the payoff of strategies 5, 6 and 7. Thus, the latter cannot be best-replies to y, hence x_5 + x_6 + x_7 = 0. This contradicts (32).
Subcase 1.2. If (32) does not hold. Without loss of generality, assume that y_5 + y_6 + y_7 = 0. Since y_4 = 0 and y_1 = y_2 = y_3, this implies that y = n_123. Therefore, as above, none of the strategies 5, 6 and 7 is a best-reply to y. Therefore x_5 + x_6 + x_7 = 0, which by the same argument implies x = n_123.
Case 2. If (31) does not hold. Without loss of generality, assume that x_1 + x_2 + x_3 = 0. This implies that n_567 is a strictly better response to x than strategy 4. Thus, y_4 = 0.
In the latter case, since y_4 = 0, it follows that y has support in {1, 2, 3}, hence n_123 is a strictly better response to y than either 5, 6 or 7; therefore, in any case, x_5 + x_6 + x_7 = 0. Since we assumed x_1 + x_2 + x_3 = 0, it follows that x = e_4. Therefore, y must have support in {5, 6, 7}. It follows that x is not a best-reply to y, a contradiction.
Summing up, only subcase 1.2 is possible, and then x = y = n 123 .
B Best-reply dynamics in the 7 × 7 game

We now conclude. By Lemma B.3, there exists a time T such that none of the strategies 1, 2 and 3 is a best-reply to x(T). Moreover, due to Lemma B.1, for all t ≥ 0, x̄(t) = (1/3, 1/3, 1/3), hence, by a variant of Lemma A.1, strategies 5, 6 and 7 cannot all be best-replies to x(t). Thus, due to the cyclic symmetry of strategies 5 to 7, we may assume that the set of pure best-replies to x(T) is one of the following: a single strategy in {5, 6, 7} (Case 1); strategy 4 together with one strategy in {5, 6, 7} (Case 2); strategy 4 alone (Case 3).

In Case 1, the same arguments as in Proposition 2.4 show that x(t) → ST_567. In Case 2, since strategy 4 is strictly dominated by strategy 6 on the face spanned by e_4, e_5 and e_6, it immediately ceases to be a best-reply; this leads to Case 1. In Case 3, since e_4 is not a best-reply to itself, there exists a first time T′ > T at which e_4 is not the unique best-reply to x(t). Due to the improvement principle (Lemma 2.2), none of the strategies 1, 2 and 3 is a best-reply to x(T′). Thus we are back to Case 2. This concludes the proof.