Evolutionary games on the lattice: best-response dynamics

The best-response dynamics is an example of an evolutionary game where players update their strategy in order to maximize their payoff. The main objective of this paper is to study a stochastic spatial version of this game based on the framework of interacting particle systems in which players are located on an infinite square lattice. In the presence of two strategies, and calling a strategy selfish or altruistic depending on a certain ordering of the coefficients of the underlying payoff matrix, a simple analysis of the non-spatial mean-field approximation of the spatial model shows that a strategy is evolutionary stable if and only if it is selfish, making the system bistable when both strategies are selfish. The spatial and non-spatial models agree when at least one strategy is altruistic. In contrast, we prove that, in the presence of two selfish strategies and in any spatial dimension, only the most selfish strategy remains evolutionary stable. The main ingredients of the proof are monotonicity results and a coupling between the best-response dynamics, properly rescaled in space, and bootstrap percolation, which is used to compare the infinite time limits of both systems.


Introduction
The framework of evolutionary game theory, which describes the dynamics of populations of individuals identified with players, was initiated by the theoretical biologist Maynard Smith and first appeared in his work with Price [7]. Each individual-player is characterized by one of a finite number n of possible strategies and is attributed a payoff that is calculated based on the strategies of the surrounding players and an n × n payoff matrix. The most popular model of evolutionary game is probably the so-called replicator equation reviewed in [3], a system of deterministic differential equations for the frequencies of players holding a given strategy. This paper is a sequel to the second author's work [5], continuing the analytical study of evolutionary games based on the framework of interacting particle systems which, in contrast with the replicator equation, also includes stochasticity and space in the form of local interactions.
Model description - The version of the best-response dynamics we consider in this paper is a continuous-time Markov chain whose state at time t is a spatial configuration $\eta_t : \mathbb{Z}^d \to \{1, 2\}$, where $\{1, 2\}$ is the set of strategies.
In words, each point of the d-dimensional square lattice is occupied by exactly one player who is characterized by her strategy. The spatial structure is included in the form of local interactions, assuming that each player's payoff depends only on the strategies of her 2d neighbors. More precisely, given a two-by-two payoff matrix $A = (a_{ij})$, where $a_{ij}$ is interpreted as the payoff of a player holding strategy i interacting with a player holding strategy j, each configuration is turned into a so-called payoff landscape that attributes a payoff to each vertex as follows:

$$\phi(x, \eta_t) := \big(a_{11} N_1(x, \eta_t) + a_{12} N_2(x, \eta_t)\big)\, \mathbf{1}\{\eta_t(x) = 1\} + \big(a_{21} N_1(x, \eta_t) + a_{22} N_2(x, \eta_t)\big)\, \mathbf{1}\{\eta_t(x) = 2\} \quad \text{for all } x \in \mathbb{Z}^d,$$

where $N_j(x, \eta_t)$ is the number of type j neighbors of vertex x, i.e.,

$$N_j(x, \eta_t) := \operatorname{card}\{y \in \mathbb{Z}^d : y \sim x \text{ and } \eta_t(y) = j\},$$

and where the binary relation $\sim$ indicates that two vertices are neighbors.

In the traditional framework of evolutionary game theory, each strategy is often interpreted as a trait and each payoff defined through the payoff landscape as a fitness or reproductive success. In particular, evolutionary game theory makes the implicit assumption that players are not rational decision-makers who can choose their strategy and that the evolution of the system is driven by births and deaths. In contrast, the best-response dynamics assumes that players are rational decision-makers who change their strategy in order to maximize their payoff. Specifically, we assume that each player updates her strategy at an exponential rate one, choosing to change her strategy if and only if doing so increases her payoff. In particular, in case of a tie, i.e., when the player would not change her payoff by changing her strategy, nothing happens.
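More precisely, letting

$$\phi_1(x, \eta_t) := a_{11} N_1(x, \eta_t) + a_{12} N_2(x, \eta_t) \quad \text{and} \quad \phi_2(x, \eta_t) := a_{21} N_1(x, \eta_t) + a_{22} N_2(x, \eta_t) \qquad (1)$$

be the payoffs that the player at x would receive if she followed strategy 1 and 2, respectively, the best-response dynamics is formally described by the Markov generator

$$L f(\eta_t) = \sum_{x} \mathbf{1}\{\phi_1(x, \eta_t) > \phi_2(x, \eta_t)\}\, [f(\eta_t^{x,1}) - f(\eta_t)] + \sum_{x} \mathbf{1}\{\phi_1(x, \eta_t) < \phi_2(x, \eta_t)\}\, [f(\eta_t^{x,2}) - f(\eta_t)] \qquad (2)$$

where the configuration $\eta_t^{x,i}$ is obtained from $\eta_t$ by setting to i the strategy at x and leaving the strategy at the other vertices unchanged. Note that, for any given vertex x, the difference between the two alternative payoffs in (1) can be written as

$$\phi_1(x, \eta_t) - \phi_2(x, \eta_t) = (a_{11} - a_{21})\, N_1(x, \eta_t) - (a_{22} - a_{12})\, N_2(x, \eta_t).$$

In particular, the dynamics only depends on $a_1 := a_{11} - a_{21}$ and $a_2 := a_{22} - a_{12}$ rather than on all four coefficients of the payoff matrix, so the Markov generator (2) can be written as

$$L f(\eta_t) = \sum_{x} \mathbf{1}\{a_1 N_1(x, \eta_t) > a_2 N_2(x, \eta_t)\}\, [f(\eta_t^{x,1}) - f(\eta_t)] + \sum_{x} \mathbf{1}\{a_1 N_1(x, \eta_t) < a_2 N_2(x, \eta_t)\}\, [f(\eta_t^{x,2}) - f(\eta_t)]. \qquad (3)$$

Since the behavior of the system strongly depends on the signs of $a_1$ and $a_2$, it is convenient to use the terminology introduced in [4,5] by declaring strategy i to be
• altruistic when $a_i < 0$, meaning that a player with strategy i confers a lower payoff to a player following the same strategy than to a player following the other strategy,
• selfish when $a_i > 0$, meaning that a player with strategy i confers a higher payoff to a player following the same strategy than to a player following the other strategy.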
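The reduction from the four payoff coefficients to the two differences a 1 and a 2 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a finite grid with periodic boundary conditions in place of the infinite lattice, and the helper names `neighbor_counts` and `payoff_difference` as well as the payoff matrix are hypothetical choices made only for illustration.

```python
import numpy as np

def neighbor_counts(eta):
    """Return (N1, N2): numbers of type 1 and type 2 neighbors of every site.

    eta is an integer array with entries in {1, 2}. Periodic boundaries are
    an assumption made here to keep the example finite.
    """
    d = eta.ndim
    is_one = (eta == 1).astype(int)
    n1 = np.zeros_like(is_one)
    for axis in range(d):
        # Add the two nearest neighbors along each coordinate axis.
        n1 += np.roll(is_one, 1, axis=axis) + np.roll(is_one, -1, axis=axis)
    return n1, 2 * d - n1

def payoff_difference(eta, A):
    """phi_1 - phi_2 at every site, computed from the full payoff matrix A."""
    n1, n2 = neighbor_counts(eta)
    phi1 = A[0, 0] * n1 + A[0, 1] * n2
    phi2 = A[1, 0] * n1 + A[1, 1] * n2
    return phi1 - phi2

rng = np.random.default_rng(0)
eta = rng.integers(1, 3, size=(6, 6))      # random configuration in {1, 2}
A = np.array([[3.0, 1.0], [2.0, 4.0]])     # illustrative payoff matrix
a1 = A[0, 0] - A[1, 0]                     # a_1 = a_11 - a_21
a2 = A[1, 1] - A[0, 1]                     # a_2 = a_22 - a_12
n1, n2 = neighbor_counts(eta)
diff_full = payoff_difference(eta, A)
diff_reduced = a1 * n1 - a2 * n2           # depends on a_1 and a_2 only
```

Both arrays agree entrywise, which is why only the signs and relative sizes of a 1 and a 2 matter for the update rule.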
Mean-field approximation - To understand the role of space in the long-term behavior of the best-response dynamics, the first step is to look at the deterministic nonspatial version, or mean-field approximation, of the process (3). This mean-field model is obtained under the assumption that the population is well-mixing, and more precisely by looking at the process on the complete graph, in which any two players are neighbors, and then taking the limit as the number of vertices tends to infinity. This results in a system of differential equations for the frequency of players holding strategy i, which we denote by $u_i$. In the absence of a spatial structure, the payoffs that a player would receive if she followed strategy 1 and 2, respectively, are

$$\phi_1(u_1, u_2) = a_{11} u_1 + a_{12} u_2 \quad \text{and} \quad \phi_2(u_1, u_2) = a_{21} u_1 + a_{22} u_2,$$

which can be viewed as the nonspatial analog of (1). Also, under the evolution rules of the best-response dynamics, either each type 1 player or each type 2 player changes her strategy at an exponential rate one depending on whether $\phi_1 - \phi_2$ is negative or positive, respectively. Then, rescaling time by the number of vertices and taking the limit as the number of vertices tends to infinity gives the following differential equation for the frequency of type 1 players:

$$u_1'(t) = u_2\, \mathbf{1}\{\phi_1 > \phi_2\} - u_1\, \mathbf{1}\{\phi_1 < \phi_2\} = (1 - u_1)\, \mathbf{1}\{(a_1 + a_2)\, u_1 > a_2\} - u_1\, \mathbf{1}\{(a_1 + a_2)\, u_1 < a_2\},$$

where we used that $u_1 + u_2 = 1$. Letting $u^* := a_2 (a_1 + a_2)^{-1}$, we have

$$u_1'(t) = + u_2 \quad \text{when } (u_1 > u^* \text{ and } a_1 + a_2 > 0) \text{ or } (u_1 < u^* \text{ and } a_1 + a_2 < 0),$$
$$u_1'(t) = - u_1 \quad \text{when } (u_1 > u^* \text{ and } a_1 + a_2 < 0) \text{ or } (u_1 < u^* \text{ and } a_1 + a_2 > 0),$$

which shows the following four possible regimes:
• when strategy 1 is selfish and strategy 2 altruistic, strategy 1 wins in the sense that, starting from any initial condition, $u_1(t) \to 1$ as $t \to \infty$.
• when strategy 1 is altruistic and strategy 2 selfish, strategy 2 wins in the sense that, starting from any initial condition, $u_1(t) \to 0$ as $t \to \infty$.
• when both strategies are altruistic, coexistence occurs in the sense that, starting from any initial condition, $u_1(t) \to u^* \in (0, 1)$ as $t \to \infty$.
• when both strategies are selfish, the system is bistable: $u_1(t) \to 0$ when $u_1(0) < u^*$ and $u_1(t) \to 1$ when $u_1(0) > u^*$, where $u^* \in (0, 1)$.

In terms of evolutionary stable strategy, this indicates that, for well-mixing populations, a strategy is evolutionary stable if it is selfish but not if it is altruistic. Recall that an evolutionary stable strategy is defined as a strategy which, if adopted by a population, cannot be invaded by any alternative strategy starting at an infinitesimally small frequency.
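The four mean-field regimes can be illustrated with a crude numerical integration. This is only a sketch under stated assumptions: it uses an explicit Euler scheme with a hypothetical step size, applies the rule that u 1 grows at rate u 2 when φ 1 > φ 2 and decays at rate u 1 when φ 1 < φ 2, and the function name `mean_field_limit` and the parameter values are illustrative choices rather than anything taken from the paper.

```python
def mean_field_limit(a1, a2, u0, dt=1e-3, t_max=30.0):
    """Euler approximation of u_1 at time t_max for the mean-field dynamics.

    a1, a2 : payoff differences a_1 and a_2
    u0     : initial frequency of type 1 players
    dt, t_max are illustrative numerical parameters (assumptions).
    """
    u = u0
    for _ in range(int(t_max / dt)):
        # phi_1 - phi_2 = a_1 u_1 - a_2 u_2 with u_2 = 1 - u_1
        diff = a1 * u - a2 * (1.0 - u)
        if diff > 0:
            u += dt * (1.0 - u)   # type 2 players switch to strategy 1
        elif diff < 0:
            u -= dt * u           # type 1 players switch to strategy 2
        # In case of a tie nothing happens.
    return u

selfish_vs_altruistic = mean_field_limit(1.0, -1.0, u0=0.1)   # strategy 1 wins
altruistic_vs_selfish = mean_field_limit(-1.0, 1.0, u0=0.9)   # strategy 2 wins
both_altruistic = mean_field_limit(-1.0, -2.0, u0=0.1)        # near u* = 2/3
bistable_low = mean_field_limit(2.0, 1.0, u0=0.2)             # below u* = 1/3
bistable_high = mean_field_limit(2.0, 1.0, u0=0.5)            # above u* = 1/3
```

In the altruistic-altruistic case the right-hand side is discontinuous at u *, so the Euler iterates only oscillate within a band of width of order `dt` around u * instead of converging exactly.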
Spatial stochastic model - We now return to the spatial model (3), looking at the four parameter regions corresponding to the four possible regimes of the mean-field approximation. Assuming first that strategy 1 is selfish and strategy 2 altruistic, we get

$$a_1 N_1(x, \eta_t) - a_2 N_2(x, \eta_t) > 0$$

for all $x \in \mathbb{Z}^d$ and all configurations $\eta_t$. This shows that each type 2 player changes her strategy at an exponential rate one whereas each type 1 player sticks to her strategy. Therefore strategy 1 wins, just as in the mean-field model, in the sense that, for any initial configuration, [...] By symmetry, strategy 2 wins whenever strategy 1 is altruistic and strategy 2 selfish. Note in particular that the "all 1" and "all 2" configurations are not necessarily absorbing states for the process. This is due to the fact that, though the new strategy is chosen based on the strategies of the neighbors, it is not chosen from the neighborhood.

Looking now at altruistic-altruistic interactions, whenever the player at x and all her neighbors follow the same strategy, we have $a_1 N_1(x, \eta_t) - a_2 N_2(x, \eta_t) = 2d\, a_1 < 0$ when this common strategy is 1 and $a_1 N_1(x, \eta_t) - a_2 N_2(x, \eta_t) = -2d\, a_2 > 0$ when it is 2. In either case, the player at x changes her strategy at an exponential rate one, indicating that, as in the mean-field model, two altruistic strategies coexist in the sense that [...]

We now study the process when both strategies are selfish, a case that is more challenging mathematically and also more interesting, as it shows some important disagreements between the spatial and nonspatial models. To confront our results for the spatial model with the bistability displayed by its nonspatial counterpart, we consider the process starting from the product measure with $P(\eta_0(x) = 1) =: p$ for all $x \in \mathbb{Z}^d$ and compare the models when $p = u_1(0)$. The fact that the inclusion of space in the form of local interactions strongly affects the long-term behavior of the system can be seen in a specific parameter region using a standard coupling with the Richardson model [8]. Indeed, let [...] Then, when [...] almost surely.
These two inequalities imply that the set of type 1 players stochastically dominates the set of infected sites in the Richardson model $\pi_t$ with initial configuration $\pi_0(x) = \mathbf{1}\{\eta_0(x) = 1 \text{ and } \eta_0(y) = 1 \text{ for some } y \sim x\}$ which, in turn, implies that strategy 1 wins whenever $p > 0$. This shows in particular the existence of parameter regions in which, in contrast with the nonspatial model, only the most selfish strategy is evolutionary stable for the spatial model.

Figure 1. Best-response dynamics on a 300 × 300 lattice with periodic boundary conditions starting from a product measure with density p of type 1 players in black. On the left picture, the process hits an absorbing state in which both types are present, whereas on the right picture, which shows a snapshot of the process at time 25, the system is converging to the all black configuration: strategy 1 wins.

Returning to general selfish-selfish interactions, the numerical simulations of the two-dimensional process displayed in Figure 1 suggest that, when $a_1$ is slightly larger than $a_2$ and the initial density $p > 0$ is small, the system fixates to a configuration in which the set of type 1 players consists of a union of disjoint rectangles, indicating that strategy 1 is unable to invade strategy 2. These simulations, however, are misleading due to the finiteness of the graph, and it can be proved that, in any dimension, the most selfish strategy always wins even when starting at a low density. More precisely, we have the following theorem.
Theorem 1 - Assume that $a_1 > a_2 > 0$ and $p > 0$. Then, [...]

In particular, while any selfish strategy is evolutionary stable in the nonspatial model, only the most selfish strategy is evolutionary stable in the spatial model. The result in one dimension directly follows from our coupling with the Richardson model since [...] while the general result relies on a combination of monotonicity results and coupling arguments to compare the best-response dynamics with bootstrap percolation. More precisely, we first prove that, in the presence of selfish-selfish interactions, the best-response dynamics is attractive, which allows us to focus on the process starting from a certain reduced configuration that consists of a union of hyperrectangles. The second ingredient is to show that, for the process starting from this reduced configuration, the set of type 1 players is a pure growth process, just like the Richardson model.
This strong monotonicity result is then applied repeatedly to show that the best-response dynamics properly rescaled in space dominates stochastically bootstrap percolation with parameter d. From this domination and a result due to Schonmann [9, Theorem 3.1], we finally deduce that, unlike what Figure 1 suggests, the most selfish strategy indeed invades the entire lattice.
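Finite-lattice experiments of the kind discussed around Figure 1 can be reproduced with a short simulation. The following is a minimal sketch and not the authors' code: it assumes a two-dimensional torus, realizes the rate-one updates by drawing exponential waiting times with rate equal to the number of sites and picking the updated site uniformly at random, and uses hypothetical names (`simulate`, `side`, `t_max`) and parameter values.

```python
import numpy as np

def simulate(a1, a2, p, side=20, t_max=25.0, seed=0):
    """Best-response dynamics on a side x side torus up to time t_max.

    Each site updates at rate one: it adopts strategy 1 if
    a1 * N1 > a2 * N2, strategy 2 if a1 * N1 < a2 * N2, and keeps its
    strategy in case of a tie. The initial configuration is a product
    measure with density p of type 1 players.
    """
    rng = np.random.default_rng(seed)
    eta = np.where(rng.random((side, side)) < p, 1, 2)
    n_sites = side * side
    t = 0.0
    while True:
        # Superposition of n_sites independent rate-one Poisson clocks.
        t += rng.exponential(1.0 / n_sites)
        if t > t_max:
            break
        i, j = rng.integers(0, side, size=2)
        n1 = sum(int(eta[(i + di) % side, (j + dj) % side] == 1)
                 for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))
        n2 = 4 - n1
        if a1 * n1 > a2 * n2:
            eta[i, j] = 1
        elif a1 * n1 < a2 * n2:
            eta[i, j] = 2
    return eta

# Strategy 1 selfish, strategy 2 altruistic: every update produces a 1.
selfish_altruistic = simulate(2.0, -1.0, p=0.1)
# Selfish-selfish interactions starting from low density (cf. Figure 1).
selfish_selfish = simulate(1.2, 1.0, p=0.1)
density_of_ones = (selfish_selfish == 1).mean()
```

As the text stresses, the outcome of the selfish-selfish run on a finite torus should not be read as evidence about the infinite lattice; the sketch only makes the update rule concrete.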

Some monotonicity results
To avoid cumbersome notation, it is convenient to sometimes think of the state of the process as a subset rather than a function by using the identification $\eta_t \equiv \{x \in \mathbb{Z}^d : \eta_t(x) = 1\}$. One key ingredient is to think of the process as being constructed from a so-called Harris' graphical representation [2] which, in the case of the best-response dynamics, reduces to a collection of independent Poisson processes. More precisely,
• for each $x \in \mathbb{Z}^d$, we let $(N_t(x) : t \geq 0)$ be a rate one Poisson process and
• we denote by $T_n(x) := \inf\{t : N_t(x) = n\}$ its nth arrival time.
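The configuration at time $t := T_n(x)$ is obtained from $\eta_{t-}$ by setting the strategy at x to 1 if $a_1 N_1(x, \eta_{t-}) > a_2 N_2(x, \eta_{t-})$, setting it to 2 if $a_1 N_1(x, \eta_{t-}) < a_2 N_2(x, \eta_{t-})$, and leaving it unchanged in case of a tie, while the strategy at the other vertices is left unchanged. An argument due to Harris [2] implies that the best-response dynamics starting from any initial configuration can indeed be constructed using this rule. The next lemma shows that, in the presence of selfish-selfish interactions, the best-response dynamics is attractive.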
Lemma 2 - The process with $a_1 > 0$ and $a_2 > 0$ is attractive: [...]

Proof. Let $\bar\eta_t \subset \eta_t$. Since $a_1 > 0$ and $a_2 > 0$, [...] Let $c(x, \eta_t)$ be defined as in (5). Using (6), we obtain that, for all $x \in \bar\eta_t$, [...] Similarly, for all $x \notin \eta_t$, we have [...] The inequalities (7)-(8) show that condition (B14) in Liggett [6] is satisfied, which proves that, in the presence of selfish-selfish interactions, the process is attractive.
In addition to attractiveness, a key ingredient to prove our theorem is to replace the initial configuration $\eta_0$ with a specific reduced initial configuration $\bar\eta_0$. To define this new initial configuration, we introduce the following collection of hypercubes: [...] Then, given $\eta_0$, we say that $H_z$ is a type 1 hypercube whenever $H_z \subset \eta_0$ and define

$$\bar\eta_0 := \{x \in \mathbb{Z}^d : x \in H_z \text{ and } H_z \subset \eta_0 \text{ for some } z \in \mathbb{Z}^d\} = \text{the union of all type 1 hypercubes}.$$

Note that $\bar\eta_0 \subset \eta_0$ therefore, according to Lemma 2, [...] In particular, it suffices to prove the theorem for the modified process $\bar\eta_t$, which we call from now on the sparse best-response dynamics. The main reason for working with this process appears in the next lemma, which states that, starting from any configuration that consists of a union of hypercubes, the process can only increase. This somewhat strong result is due in part to the fact that, while the times of the updates are random, the outcome at each update is deterministic.
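Proof. Let $\Phi$ be the function defined on the set of configurations by

$$\Phi(\eta_t) := \{x \in \mathbb{Z}^d : a_1 N_1(x, \eta_t) > a_2 N_2(x, \eta_t) \text{ or } (x \in \eta_t \text{ and } a_1 N_1(x, \eta_t) = a_2 N_2(x, \eta_t))\}.$$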
In words, while $\eta_t$ represents the set of vertices following strategy 1, the configuration $\Phi(\eta_t)$ can be seen as the set of vertices that will become or stay of type 1 at the next update, provided the configuration in their neighborhood does not change by the time of the update. Note that, due to the presence of selfish-selfish interactions, i.e., $a_1 > 0$ and $a_2 > 0$, we have

$$\eta \subset \eta' \quad \text{implies} \quad \Phi(\eta) \subset \Phi(\eta'),$$

indicating that the function $\Phi$ is nondecreasing. In addition, for any configuration $\bar\eta_0$ obtained by reduction of an arbitrary initial configuration using the partition into hypercubes, since each type 1 player has at least d type 1 neighbors and $a_1 > a_2 > 0$, we also have

$$a_1 N_1(x, \bar\eta_0) \geq d\, a_1 > d\, a_2 \geq a_2 N_2(x, \bar\eta_0) \quad \text{for all } x \in \bar\eta_0,$$

indicating that $\bar\eta_0 \subset \Phi(\bar\eta_0)$. Monotonicity (11) and the generalization of (12) to all times are the two main ingredients to establish the lemma, which we prove by induction. Since the lattice is infinite, the time of the first update does not exist. Therefore, in order to prove the result inductively, the next step is to use an idea of Harris [2] to break down the lattice into finite islands that do not interact with each other for a short time. More precisely, we do the following construction:
• we let $\varepsilon > 0$ be small and, for each vertex x such that $T_1(x) < \varepsilon$, draw a line segment between x and each of its 2d nearest neighbors.
This construction naturally induces a partition of the lattice into clusters, where two vertices belong to the same cluster if there is a sequence of line segments connecting them. In addition, since the probability of two neighbors $x \sim y$ being connected by a line segment,

$$P(\text{there is a line segment between } x \text{ and } y) = P(T_1(x) < \varepsilon \text{ or } T_1(y) < \varepsilon) = 1 - e^{-2\varepsilon},$$

can be made arbitrarily small by choosing $\varepsilon > 0$ small, Theorem 1.33 in [1] implies that there exists $\varepsilon > 0$ small, fixed from now on, such that each cluster is almost surely finite. Letting A be an arbitrary, necessarily finite, cluster, we have the following two properties:
(a) the configuration in A at time $\varepsilon$ only depends on the initial configuration of the process and its graphical representation restricted to the cluster A.
(b) [...] where $N_x$ refers to the interaction neighborhood of vertex x, the strategy at x is not updated before time $\varepsilon$.

Now, since A is finite, the number of updates in A up to time $\varepsilon$ is almost surely finite and therefore the updates can be ordered. Let the times of these updates and their corresponding locations be $s_0 := 0 < s_1 < s_2 < \cdots < s_m < \varepsilon$ and $x_1, x_2, \ldots, x_m \in A$.
By (a) and the definition of the function $\Phi$, we have [...] But according to (12), we also have $\bar\eta_0 \subset \Phi(\bar\eta_0)$ therefore [...] This, together with (b) and the monotonicity of $\Phi$ in (11), implies [...] The last inclusion in (14) allows us to repeat the same reasoning to get (13)-(14) at the next update time, and so on up to time $s_m$. Using in addition the fact that the configuration in the cluster A does not change between two consecutive updates, we deduce that the property to be proved holds at all times smaller than $\varepsilon$, so we have [...]

This only proves the result for the process restricted to A and up to time $\varepsilon$. To extend the result across the lattice and for all times, we first use that the set of all the clusters forms a partition of the lattice and sum (15) over all the possible clusters: [...] This first inclusion proves the lemma up to time $\varepsilon$ while the second inclusion can be used, together with the fact that the process is Markov, to restart the argument and extend the result inductively up to time $2\varepsilon$, then $3\varepsilon$, and so on. This proves the result at all times.
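The two properties of Φ used in the proof above, monotonicity and the inclusion of the reduced configuration in its image under Φ, can be checked numerically in two dimensions. The sketch below rests on assumptions that are not in the text: the lattice is replaced by a torus with even side length, the hypercubes are taken to be the 2 × 2 blocks {2i, 2i+1} × {2j, 2j+1} (the paper's exact definition of H z is not reproduced here), and the function names are hypothetical.

```python
import numpy as np

def count_type1_neighbors(occ):
    """Number of type 1 nearest neighbors of each site on a torus."""
    occ_int = occ.astype(int)
    total = np.zeros_like(occ_int)
    for axis in range(occ.ndim):
        total += np.roll(occ_int, 1, axis=axis) + np.roll(occ_int, -1, axis=axis)
    return total

def phi(occ, a1, a2):
    """Sites that become or stay of type 1 at their next update.

    occ is a boolean array: True marks a type 1 player.
    """
    n1 = count_type1_neighbors(occ)
    n2 = 2 * occ.ndim - n1
    return (a1 * n1 > a2 * n2) | (occ & (a1 * n1 == a2 * n2))

def reduce_to_blocks(occ):
    """Keep only the sites lying in a fully occupied 2 x 2 block.

    Assumption: blocks are {2i, 2i+1} x {2j, 2j+1}, two-dimensional case only.
    """
    rows, cols = occ.shape
    full = occ.reshape(rows // 2, 2, cols // 2, 2).all(axis=(1, 3))
    return np.repeat(np.repeat(full, 2, axis=0), 2, axis=1)

rng = np.random.default_rng(1)
a1, a2 = 2.0, 1.0                        # illustrative values with a1 > a2 > 0
occ = rng.random((20, 20)) < 0.7         # initial set of type 1 players
reduced = reduce_to_blocks(occ)          # union of type 1 blocks

reduced_inside_occ = not (reduced & ~occ).any()
reduced_inside_phi = not (reduced & ~phi(reduced, a1, a2)).any()
phi_is_monotone = not (phi(reduced, a1, a2) & ~phi(occ, a1, a2)).any()
```

Each site of a fully occupied block has at least two type 1 neighbors inside its own block, which is the two-dimensional instance of the "at least d type 1 neighbors" argument.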

Coupling with bootstrap percolation
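This section is devoted to the proof of the theorem, which relies on a coupling between bootstrap percolation and the best-response dynamics. Bootstrap percolation with parameter m is the discrete-time process whose state at time t is a spatial configuration $\xi_t : \mathbb{Z}^d \to \{0, 1\}$, where state 0 means empty and state 1 means occupied, and which evolves as follows: an occupied vertex remains occupied forever, and an empty vertex becomes occupied at the next time step if and only if at least m of its 2d neighbors are occupied. In view of Lemma 3 for the sparse best-response dynamics and the evolution rules of bootstrap percolation, both processes are almost surely monotone, therefore the limits

$$\bar\eta_\infty := \lim_{t \to \infty} \bar\eta_t \quad \text{and} \quad \xi_\infty := \lim_{t \to \infty} \xi_t$$

exist.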
Here, we again identify configurations with the set of vertices in state 1. From now on, we call the two limit sets above, the infinite time limits of the sparse best-response dynamics and bootstrap percolation, respectively. To prove the theorem, we first rely on the monotonicity results of the previous section to show that the infinite time limit of the sparse best-response dynamics properly rescaled in space dominates its counterpart for bootstrap percolation. The main ingredient is to couple both systems using the key function introduced in (10). Based on this coupling, we can directly deduce the theorem from its analog for bootstrap percolation on the infinite lattice starting from a product measure, a result due to Schonmann [9, Theorem 3.1].
Proof. We prove the result by induction with respect to n.
Base case - This follows from Lemma 3 which gives [...]

Inductive step - Assume $\Phi^n(\bar\eta_s) \subset \bar\eta_\infty$ and $x \in \Phi^{n+1}(\bar\eta_s) \setminus \Phi^n(\bar\eta_s)$. Then, [...] a.s. for all $y \in \Phi^n(\bar\eta_s)$ and [...] In addition, the choice of x implies that [...] while a new application of Lemma 3 gives [...] for all $t > \tau_x$. Combining (18)-(19) and using that $a_1 > 0$ and $a_2 > 0$, we get [...] It follows that, given that the player at vertex x follows strategy 2 after time $\tau_x$, she switches to strategy 1 at rate one. This together with (17) implies that [...] Finally, using consecutively (11) and (16) and then (20), we deduce that [...] which shows the result at step n + 1 and completes the proof.
We are now ready to prove that the infinite time limit of the best-response dynamics properly rescaled in space dominates the infinite time limit of bootstrap percolation. More precisely, we look at the best-response dynamics viewed at the hypercube level by introducing

$$\zeta_t(z) := \mathbf{1}\{H_z \subset \bar\eta_t\} \quad \text{for all } z \in \mathbb{Z}^d.$$

From now on, we call this process the hypercubic best-response dynamics. Identifying once more configurations with the set of vertices in state 1 and using again the monotonicity of the sparse best-response dynamics given by Lemma 3, we note that [...]

Lemma 5 - Assume that $a_1 > a_2 > 0$ and $m = d$. Then, $\xi_\infty \subset \zeta_\infty$ almost surely whenever $\xi_0 = \zeta_0$.
Recalling (21), this indicates that there are at least $m = d$ hypercubes adjacent to $H_z$ that are completely occupied by players of type 1. Invoking the invariance by symmetry of the best-response dynamics, we may assume without loss of generality that $H_{z - e_j} \subset \bar\eta_s$ for $j = 1, 2, \ldots, d$, where $e_j$ denotes the jth unit vector.

For an illustration in three dimensions, we refer to Figure 2 where the configuration $\bar\eta_s$ consists of the union of three hypercubes. In particular, taking $n = d + 1$ gives [...] Applying Lemma 4, we then obtain $H_z \subset \Phi^{d+1}(\bar\eta_s) \subset \bar\eta_\infty$, therefore $\zeta_t(z) = 1$ for some time $t < \infty$ a.s.
In addition, since the hypercubic process clearly inherits the monotonicity property of the sparse best-response dynamics given by Lemma 3, ζ s (z) = 1 implies that ζ t (z) = 1 for all t > s.
In summary, (27) and the fact that (22) implies (26) indicate that: for the hypercubic process, once a vertex is occupied it remains occupied forever, and if an empty vertex has at least d occupied neighbors then it becomes occupied after an almost surely finite time. Recalling the evolution rules of bootstrap percolation with parameter m = d, the result follows.
Combining the previous lemma with a result of Schonmann [9, Theorem 3.1] on bootstrap percolation on the infinite lattice, we now deduce the theorem.
Whether the set of occupied vertices ultimately covers the entire lattice depends on the initial density, and the fact that bootstrap percolation is clearly attractive motivates the introduction of the following critical value for the initial density: $q_c := \inf\{q \in [0, 1] : P(\xi_\infty = \mathbb{Z}^d) = 1\}$.
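The evolution rules of bootstrap percolation, and the way the final configuration depends on the initial one, can be made concrete with a short sketch. It is only an illustration under stated assumptions: the infinite lattice is replaced by a finite torus, so the limit is reached after finitely many steps and says nothing about the critical value q c itself, and the function name `bootstrap_limit` is hypothetical.

```python
import numpy as np

def bootstrap_limit(xi0, m):
    """Run bootstrap percolation with parameter m on a torus until it stops.

    xi0 is a boolean array: True marks an occupied vertex. Occupied vertices
    stay occupied; an empty vertex becomes occupied as soon as at least m of
    its nearest neighbors are occupied.
    """
    xi = xi0.copy()
    while True:
        occ = xi.astype(int)
        nbrs = np.zeros_like(occ)
        for axis in range(xi.ndim):
            nbrs += np.roll(occ, 1, axis=axis) + np.roll(occ, -1, axis=axis)
        new = xi | (nbrs >= m)
        if (new == xi).all():
            return xi            # fixed point: the infinite time limit
        xi = new

# A single occupied vertex cannot grow when m = d = 2.
single = np.zeros((6, 6), dtype=bool)
single[0, 0] = True
single_limit = bootstrap_limit(single, m=2)

# An occupied diagonal of the torus eventually occupies every vertex.
diagonal_limit = bootstrap_limit(np.eye(6, dtype=bool), m=2)

# Random initial configuration with density q of occupied vertices.
rng = np.random.default_rng(2)
random_start = rng.random((30, 30)) < 0.2
random_limit = bootstrap_limit(random_start, m=2)
```

The third run illustrates monotonicity: the initial set of occupied vertices is always contained in the final one, whatever the initial density.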