On a continuous mixed strategies model for evolutionary game theory

We consider an integro-differential model for evolutionary game theory which describes the evolution of a population adopting mixed strategies. Using a reformulation based on the first moments of the solution, we prove some analytical properties of the model and global estimates. The asymptotic behavior and the stability of solutions in the case of two strategies are analyzed in detail. Numerical schemes for two and three strategies which are able to capture the correct equilibrium states are also proposed, together with several numerical examples.


Introduction
Evolutionary dynamics is based on the ideas of mathematical game theory. In game theory, a player's strategy in a game is a complete plan of action at any stage of the game. A pure strategy defines a specific move or action that a player will follow in every possible attainable situation in a game. A player's strategy set is the set of pure strategies available to that player and defines what strategies are available to play. A mixed strategy is an assignment of a probability to each pure strategy. This allows for a player to select a pure strategy with a given distribution of probability. Since probabilities are continuous, there are infinitely many mixed strategies available to a player, even if their strategy set is finite. Of course, one can regard a pure strategy as a degenerate case of a mixed strategy, in which that particular pure strategy is selected with probability 1 and every other strategy with probability 0.
In any game, an important concept is the payoff, that is, the number that represents the motivations of a player. The exact definition of the payoff depends on the case of interest: the payoff may represent profit, utility, or other continuous measures, or may simply rank the desirability of outcomes. In all cases, the payoffs must reflect the motivations of the players. Following the basic tenet of Darwinism, we may express the success of a player in a game, that is, the player's survival, as the difference between the player's payoff and the average payoff of all players.
Dynamic models for continuous strategy spaces have received considerable attention recently, both in theoretical biology, when considering the evolution of species traits [1,5,9], and in economics, when predicting the rational behavior of individuals whose payoffs are given through game interactions [3,6].
In the present paper we analyze a continuous mixed strategies model for population dynamics based on an integro-differential representation. Analogous models based on the replicator equation with continuous strategy space were recently investigated in [2,4,10,12,13]. In contrast with finite strategy spaces, where the notion of equilibrium is well understood and studied [11,15], the situation of games with infinite strategies is still missing a general theory due to several technical and conceptual difficulties [12].
The model considered here is characterized by a continuous density function $f(t, q)$ of the population adopting the strategy $q \in \mathbb{R}^N$ at time $t$, and it presents some analogies with classical kinetic or mean field approaches. In particular we show that the model, which contains a cubic nonlinearity in $f$, can be reformulated in terms of the first moments of the solution. Such a reformulation is essential in our analysis and in the derivation of numerical approximations.
For the moment based model we prove global existence of solutions and study the asymptotic behavior and stability of solutions in the case of two strategies. Two classes of stationary solutions are found. Continuous stationary solutions are characterized by every density function with a given mean strategy. If we consider more general solutions, so that the probability distributions are no longer absolutely continuous with respect to the Lebesgue measure, another class of stationary solutions is given by concentrated Dirac masses. Numerical schemes for the two and three strategy cases which are able to capture the correct equilibrium states are also proposed, together with several numerical examples.
The rest of the paper is organized as follows. In Section 2, we present the model for N pure strategies and prove a priori estimates and the global existence of solutions. In Section 3, we put the emphasis on the model with two pure strategies, which can be reduced to a 1D model, and study the asymptotic behavior of the solutions and their relation with stationary solutions. Section 4 is dedicated to the numerical approximation of the 1D model and to numerical tests for the Prisoner's Dilemma and for the Hawk or Dove games, with results about the a priori estimate, the asymptotic behavior of the solutions and the stationary solutions. In Sections 5 and 6, we present the 2D model and the numerical tests for the Rock-Scissors-Paper game. Some final considerations are reported in the last section.
An integro-differential model for continuous mixed strategies

Setting of the model
First, we introduce an integro-differential model for continuous mixed strategies. We start from some preliminary concepts and definitions taken from [11]. Assume that we have a game with $N$ pure strategies $R_1$ to $R_N$ and that the players can use mixed strategies: these consist in playing the pure strategies $R_1$ to $R_N$ with probabilities $q_1$ to $q_N$, with $q_i \ge 0$ and $\sum_{i=1}^N q_i = 1$. A strategy corresponds to a point $q$ in the simplex
$$S_N := \Big\{ q = (q_1, \dots, q_N) \in \mathbb{R}^N : \ q_i \ge 0, \ \sum_{i=1}^N q_i = 1 \Big\}.$$
The corners of the simplex are the standard unit vectors $e_i$, whose $i$-th component is 1 and all others are 0; they correspond to the $N$ pure strategies $R_i$, $i = 1, \dots, N$. Let us denote by $a_{ij}$ the payoff for a player using the pure strategy $R_i$ against a player using the pure strategy $R_j$. The $N \times N$ matrix $A = (a_{ij})$ is said to be the payoff matrix. An $R_i$-strategist obtains the expected payoff $(Aq^*)_i = \sum_j a_{ij} q^*_j$ against a $q^*$-strategist, since $q^*_j$ is the probability that he is met with strategy $R_j$. The payoff for a $q$-strategist against a $q^*$-strategist is given by
$$A(q, q^*) := q \cdot A q^* = \sum_{i,j=1}^N a_{ij}\, q_i\, q^*_j.$$
We consider a population of individuals as a player of the game and denote by $f(t, q)$ the density of population adopting the strategy $q$ at time $t$; the evolution in time of $f$, due to the dynamics of the game, is driven by
$$\partial_t f(t, q) = f(t, q) \Big( \int_{S_N} A(q, q^*)\, f(t, q^*)\, dq^* - \int_{S_N}\!\int_{S_N} A(q, q^*)\, f(t, q)\, f(t, q^*)\, dq\, dq^* \Big),$$
where the term
$$\int_{S_N} A(q, q^*)\, f(t, q^*)\, dq^*$$
represents the payoff of the strategy $q$ against all the other strategies, $A(q, q^*)$ being the interaction kernel between the $q$-strategist and the $q^*$-strategist. The last term of equation (7) is defined by
$$\int_{S_N}\!\int_{S_N} A(q, q^*)\, f(t, q)\, f(t, q^*)\, dq\, dq^*$$
and represents the average payoff of the population.
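The expected payoff of one mixed strategy against another is a bilinear form in the two probability vectors. The following minimal sketch evaluates it for a small game; the matrix values and the helper names `expected_payoff` and `is_mixed_strategy` are illustrative assumptions, not part of the model.

```python
def expected_payoff(A, q, q_star):
    """Return q . (A q*) = sum_ij a_ij q_i q*_j for mixed strategies q, q*."""
    n = len(A)
    return sum(q[i] * A[i][j] * q_star[j] for i in range(n) for j in range(n))


def is_mixed_strategy(q, tol=1e-12):
    """Check that q lies in the simplex: q_i >= 0 and sum_i q_i = 1."""
    return all(x >= -tol for x in q) and abs(sum(q) - 1.0) <= tol


# Hypothetical 2x2 payoff matrix, used only for illustration.
A = [[3.0, 0.0],
     [5.0, 1.0]]
e1, e2 = [1.0, 0.0], [0.0, 1.0]   # pure strategies = corners of the simplex
q = [0.25, 0.75]                  # a mixed strategy

print(expected_payoff(A, e1, e2))  # payoff a_12 of R_1 against R_2
print(expected_payoff(A, q, q))    # payoff of q against itself
```

Evaluating the form at two corners of the simplex returns the corresponding matrix entry, which is a quick sanity check on the index convention.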
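Since $\sum_{i=1}^N q_i = 1$, we can reduce the number of variables, considering (3) on the simplex
$$T_{N-1} := \Big\{ p = (p_1, \dots, p_{N-1}) \in \mathbb{R}^{N-1} : \ p_i \ge 0, \ \sum_{i=1}^{N-1} p_i \le 1 \Big\},$$
namely
$$\partial_t f(t, p) = f(t, p) \Big( \int_{T_{N-1}} A(p, p^*)\, f(t, p^*)\, dp^* - \varphi(f)(t) \Big),$$
with $A(p, p^*)$ defined as the payoff $q \cdot A q^*$ evaluated at $q = (p_1, \dots, p_{N-1}, 1 - \sum_i p_i)$ and $q^* = (p^*_1, \dots, p^*_{N-1}, 1 - \sum_i p^*_i)$, and $\varphi$ defined by
$$\varphi(f)(t) := \int_{T_{N-1}}\!\int_{T_{N-1}} A(p, p^*)\, f(t, p)\, f(t, p^*)\, dp\, dp^*.$$

Remark 1. If we take an initial condition with $\int_{T_{N-1}} f_0(p)\,dp = 1$, then it is easy to see that $f \ge 0$ for all $t > 0$, and if $f_0(\bar p) = 0$ for some $\bar p$, then $f(t, \bar p) = 0$ for all $t > 0$. We also have that
$$\int_{T_{N-1}} f(t, p)\, dp = 1 \quad \text{for all } t > 0.$$
This follows from the mass conservation, by integrating equation (7) w.r.t. $p$ and using (9) and (11).

Let us introduce the moments of $f$:
$$M_k(f)(t) := \int_{T_{N-1}} p^k f(t, p)\, dp, \qquad p^k := p_1^{k_1} \cdots p_{N-1}^{k_{N-1}},$$
with $k := (k_1, k_2, \dots, k_{N-1})$. Using $M_k(f)$, the payoff and the average payoff (9) can be expressed through the first moments $M_{e_i}(f)$, where $e_i \in \mathbb{R}^{N-1}$ is the standard unit vector with the $i$-th component equal to 1 and all others equal to 0. In the final form of equation (7), which we will use later in this paper, the only integral terms are the first moments $M_{e_i}$.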

Global existence of the solutions
We consider the Cauchy problem (16)-(10) for $t \ge 0$ and $p \in T_{N-1}$, i.e. problem (17),
with $f_0(p) \ge 0$ and $\int_{T_{N-1}} f_0(p)\,dp = 1$.

Proof. We define a constant $M$, the set $B_R$ for $R \ge M$ and, for all $g \in B_R$, the operator $G(g)$. It is easy to prove that $G(g) \in B_R$ for $t \le T(M)$. The operator $G$ is a contraction on $B_R$ for $t \le T(M)$: for all $g, \tilde g \in B_R$, the difference $G(g) - G(\tilde g)$ is estimated through time integrals of terms of the form $|\vartheta_{i,j}|\,|M_{e_j}(g)|$. Since $G$ is a contraction on $B_R$ for all $t \le T(M)$, problem (17) admits a unique solution. We have thus proved the local existence of a solution in a time interval $(0, T)$, depending on $M$. Now we define $T_{\max}$ as the maximal time up to which this local solution exists.
The solution $f$ of the Cauchy problem (17) satisfies an a priori estimate, whose proof follows easily from the Gronwall inequality.
Lemma 2.1 and Lemma 2.2 together provide the global existence of solutions. Now we present a simple property of the moments that we will use later in the paper to study the asymptotic behavior of the solutions for 2 × 2 games.
The set $S$ is not empty because $f \ge 0$ and its integral over $T_{N-1}$ is equal to 1. We have $f(p) > p_k f(p) > 0$ for all $p \in S$, and the corresponding bounds on the first moments follow, since $\int_{T_{N-1} \setminus S} f(p)\,dp \ge 0$ and (24) holds.

Two strategies games
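Assume there are two different strategies, whose interplay is ruled by the payoff matrix
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
In this case the simplex $T_1$ is just the interval $[0, 1]$, so we have a population where individuals play the first strategy with probability $p \in [0, 1]$ and the second strategy with probability $1 - p$. The payoff (2) is given by
$$A(p, p^*) = \alpha\, p\, p^* + \beta\, p + \gamma\, p^* + \delta, \qquad \alpha := a - b - c + d, \quad \beta := b - d, \quad \gamma := c - d, \quad \delta := d.$$
The one-dimensional Cauchy problem (17) reads
$$\partial_t f(t, p) = f(t, p)\, \big(\alpha M_1(f) + \beta\big)\big(p - M_1(f)\big), \qquad f(0, p) = f_0(p), \qquad M_1(f)(t) := \int_0^1 p\, f(t, p)\, dp,$$
with $f_0(p) \ge 0$ and $\int_0^1 f_0(p)\,dp = 1$.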
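The reduction of a 2 × 2 game to a scalar kernel can be checked numerically. The sketch below assumes that the payoff matrix has rows (a, b) and (c, d), an ordering chosen to agree with the stability conditions b < d and a > c quoted later in the text, and compares the reduced form α p p* + β p + γ p* + δ with the payoff computed directly from the matrix. Function names and the test matrix are hypothetical.

```python
from fractions import Fraction


def kernel_coefficients(a, b, c, d):
    """Coefficients (alpha, beta, gamma, delta) of
    A(p, p*) = alpha*p*p* + beta*p + gamma*p* + delta
    for the payoff matrix [[a, b], [c, d]] (assumed entry ordering)."""
    return a - b - c + d, b - d, c - d, d


def payoff_direct(a, b, c, d, p, p_star):
    """Expected payoff of (p, 1-p) against (p*, 1-p*), from the matrix."""
    return (a * p * p_star + b * p * (1 - p_star)
            + c * (1 - p) * p_star + d * (1 - p) * (1 - p_star))


def payoff_reduced(coeffs, p, p_star):
    """Same payoff, evaluated through the reduced bilinear kernel."""
    alpha, beta, gamma, delta = coeffs
    return alpha * p * p_star + beta * p + gamma * p_star + delta


# Hypothetical matrix entries, exact arithmetic to avoid rounding noise.
entries = (Fraction(3), Fraction(0), Fraction(5), Fraction(1))
coeffs = kernel_coefficients(*entries)
p, p_star = Fraction(1, 4), Fraction(2, 3)
print(payoff_direct(*entries, p, p_star), payoff_reduced(coeffs, p, p_star))
```

With this ordering, α + β = a − c and β = b − d, which is how the two stability conditions for the Dirac masses at p = 1 and p = 0 translate back to the original constants.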

Asymptotic behavior of the solutions
We want to understand what happens asymptotically. We start with a result on the curve where $\partial_t f$ changes sign.
The proof of Proposition 2 is easily obtained by Lemma 2.4.
There are four different possible cases, corresponding to the regions A, B, C and D of the $(\alpha, \beta)$ plane shown in Figure 1.

Case a. Figure 1 shows that it is possible if and only if $(\alpha, \beta) \in A \cup B$. Let us describe in detail the different situations. For $(\alpha, \beta) \in A$ (Figure 2, on the left), $M_1(f)$ is increasing in time and is limited on the right by the curve $\bar M(t) \to 1$.

Case b. Figure 1 shows that it is possible if and only if $(\alpha, \beta) \in C \cup D$. For $(\alpha, \beta) \in C$, Figure 3 (on the left) shows two different situations: by contrast with the previous case, the behavior changes according to the value of the first moment $M_1$ at the initial time. Figure 3 (on the right) shows the two situations for $(\alpha, \beta) \in D$: also in this case, the behavior depends on the value of $M_1(0)$. In any case, the value $-\beta/\alpha$ is the one that dominates in time.

From the behavior described, it is easy to understand what happens in the population. If we are in region A, there is dominance of the first of the two pure strategies that describe the game, because the dynamics favors the state $p = 1$, which corresponds to the first pure strategy. This means that, in the population, those who adopt the first pure strategy survive and the others do not. In B we have the opposite situation: there is dominance of the second pure strategy, and so those who adopt the first pure strategy, or any other mixed strategy, do not survive. In the third region C no mixed strategy dominates, but a priori we cannot say which of the two pure strategies dominates: it all depends on the value of $M_1(0)$. If $M_1(0) > -\beta/\alpha$ the first pure strategy dominates; if $M_1(0) < -\beta/\alpha$ the second pure strategy dominates. In D the situation differs from the previous cases: here there is coexistence between the two pure strategies, and so between the two populations.
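The four regimes described above can be told apart from the signs of β and α + β alone. The sketch below is an inference, not the definition of the regions in Figure 1: it assumes that the first moment M_1 increases when α M_1 + β > 0 and decreases when it is negative, which is consistent with the threshold −β/α and with the stability conditions β < 0 and α + β > 0 stated in the text. The function name and the handling of boundary cases are assumptions.

```python
def classify_regime(alpha, beta):
    """Return 'A', 'B', 'C' or 'D' from the sign of alpha*M + beta on [0, 1].

    A: first pure strategy dominates    (alpha*M + beta >= 0 on [0, 1])
    B: second pure strategy dominates   (alpha*M + beta <= 0 on [0, 1])
    C: bistable, threshold M_1(0) = -beta/alpha
    D: coexistence, M_1 tends to -beta/alpha
    Returns None in the degenerate case alpha = beta = 0.
    """
    at0, at1 = beta, alpha + beta   # values of alpha*M + beta at M = 0, M = 1
    if at0 >= 0 and at1 >= 0 and (at0 > 0 or at1 > 0):
        return "A"
    if at0 <= 0 and at1 <= 0 and (at0 < 0 or at1 < 0):
        return "B"
    if at0 < 0 < at1:
        return "C"
    if at1 < 0 < at0:
        return "D"
    return None


# Prisoner's Dilemma constants quoted in the text: b = 1.1, eps = 0.001.
print(classify_regime(1 - 1.1 + 0.001, -0.001))
# Hawk or Dove constants with b = 1, c = 36/17: alpha = -c/2, beta = 1/2.
print(classify_regime(-18 / 17, 0.5))
```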

Stationary solutions
From the study of the asymptotic behavior, we expect that for $t \to \infty$ the solution of the model (27) tends to a stationary solution. We can find two classes of stationary solutions:

Type I If we are in case b, namely $-\beta/\alpha \in (0, 1)$, then a stationary solution is given by every density function $\bar f(p)$ such that $M_1(\bar f) = -\beta/\alpha$.

Type II If we consider more general solutions, so that the probability distributions are no longer absolutely continuous with respect to the Lebesgue measure, another class of stationary solutions is given by concentrated Dirac masses, i.e. $f_{\bar p}(p) = \delta(p = \bar p)$ with $\bar p \in [0, 1]$. In the following we deal with these generalized solutions in a quite informal way.
More rigorous arguments will be given in a future paper.
Here we just want to remark that formally, since $M_1(f_{\bar p}) = \bar p$, the right-hand side of (27) vanishes for $f = f_{\bar p}$.

Linear stability of stationary solutions
This subsection is dedicated to the study of the linear stability of stationary solutions. We consider the integral operator associated to the replicator equation. Let $\bar f$ be a generalized stationary state. We linearise the operator around the state $\bar f$, so that for every perturbation $g$ with $\int_0^1 g(p)\,dp = 0$ we obtain a linear operator.

Type I We have $-1 < \beta/\alpha < 0$. Using (34) we obtain the linearized equation for a perturbation $g$. If $\int_0^1 g_0(p)\,dp = 0$, the same is true for $g$ for $t > 0$.

Proof. To prove the result it is enough to establish an equality which implies that the variance of the measure $\bar f\,dp$ vanishes, so that the measure has to be a Dirac mass. We take a continuous stationary solution of Type I, namely a positive function $\bar f$ such that its total mass is equal to 1 and $M_1(\bar f) = -\beta/\alpha$. We perturb this state by a function $g$ of zero mass and compute the first moment of the perturbation $g$. The resulting condition for linear stability is an inequality that can be verified only when the equality condition is satisfied, since we already know from (32) that $N_1(\bar f) \ge 0$.
Type II For the concentrated Dirac masses $f_{\bar p}(p) = \delta(p = \bar p)$ we again consider the linearized equation for a perturbation $g$.
Proposition 4. The Dirac mass solutions are linearly stable if a suitable condition holds on the support of $g(p)$. For general perturbations, i.e. with $\operatorname{supp} g \equiv [0, 1]$, we have three cases:
1. the Dirac mass concentrated in $p = 0$ is stable if $\beta < 0$, which means, in the original constants, $b < d$.
2. the Dirac mass concentrated in p = 1 is stable if α + β > 0, which means in the original constants, a > c.
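The role played by the variance in the stability discussion of this subsection can be made explicit with a short computation. It is a sketch that assumes the two-strategy equation in the factorised form $\partial_t f = (\alpha M_1 + \beta)(p - M_1) f$, which follows from the bilinear kernel $A(p, p^*) = \alpha p p^* + \beta p + \gamma p^* + \delta$ together with the unit mass of $f$; the symbol $M_2$ for the second moment is introduced here only for illustration.

```latex
\begin{align*}
\frac{d}{dt} M_1(f)
  &= \int_0^1 p\, \partial_t f \, dp
   = (\alpha M_1 + \beta) \int_0^1 p\,(p - M_1)\, f \, dp \\
  &= (\alpha M_1 + \beta)\,\bigl( M_2(f) - M_1(f)^2 \bigr)
   = (\alpha M_1 + \beta) \int_0^1 (p - M_1)^2 f \, dp .
\end{align*}
```

The first moment is therefore stationary either when $M_1 = -\beta/\alpha$ or when the variance vanishes, that is, when $f$ is a Dirac mass; these are the two classes of stationary solutions discussed above.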

Numerical approximation of the 1D model
Clearly the above discretization conserves the mass in time, provided that the discrete mass is normalized initially. In the case of equally spaced points, $\Delta p = p_{i+1} - p_i$, the above property implies that the discrete values remain bounded, and thus the numerical solution is well-defined even when we approach a Dirac delta at the continuous level. More precisely, both possible steady states are preserved by the numerical method, namely those with $\bar M_1(f) = -\beta/\alpha$ and the discrete Dirac delta.
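A minimal sketch of a scheme of this kind is given below. It is not the authors' discretization: it assumes the semi-discrete form (α M̄_1 + β)(p_i − M̄_1) f_i on equally spaced points, with the first moment M̄_1 computed by the trapezoidal rule, and a classical fourth order Runge-Kutta step in time. All function names are hypothetical. With these choices the discrete mass stays equal to its initial value as long as that value is 1.

```python
def trapezoid_weights(n):
    """Trapezoidal weights on n equally spaced points of [0, 1]."""
    dp = 1.0 / (n - 1)
    w = [dp] * n
    w[0] = w[-1] = dp / 2
    return w


def first_moment(f, p, w):
    """Discrete first moment sum_i w_i p_i f_i."""
    return sum(wi * pi * fi for wi, pi, fi in zip(w, p, f))


def rhs(f, p, w, alpha, beta):
    """Assumed semi-discrete right-hand side (alpha*M1 + beta)*(p_i - M1)*f_i."""
    m1 = first_moment(f, p, w)
    rate = alpha * m1 + beta
    return [rate * (pi - m1) * fi for pi, fi in zip(p, f)]


def rk4_step(f, dt, p, w, alpha, beta):
    """One classical fourth order Runge-Kutta step."""
    def shifted(k, c):
        return [fi + c * ki for fi, ki in zip(f, k)]
    k1 = rhs(f, p, w, alpha, beta)
    k2 = rhs(shifted(k1, dt / 2), p, w, alpha, beta)
    k3 = rhs(shifted(k2, dt / 2), p, w, alpha, beta)
    k4 = rhs(shifted(k3, dt), p, w, alpha, beta)
    return [fi + dt / 6 * (a + 2 * b + 2 * c + d)
            for fi, a, b, c, d in zip(f, k1, k2, k3, k4)]


def simulate(alpha, beta, f0, t_end, dt):
    """Integrate up to t_end; return the final density, its mass and M1."""
    n = len(f0)
    p = [i / (n - 1) for i in range(n)]
    w = trapezoid_weights(n)
    f = list(f0)
    for _ in range(round(t_end / dt)):
        f = rk4_step(f, dt, p, w, alpha, beta)
    mass = sum(wi * fi for wi, fi in zip(w, f))
    return f, mass, first_moment(f, p, w)


# Prisoner's Dilemma constants quoted in the text: b = 1.1, eps = 0.001,
# with the uniform initial datum f0 = 1 on 101 grid points.
b, eps = 1.1, 0.001
f, mass, m1 = simulate(alpha=1 - b + eps, beta=-eps, f0=[1.0] * 101,
                       t_end=50.0, dt=0.1)
print(mass, m1)
```

In this run the first moment decreases from its initial value 1/2, in line with the concentration towards p = 0 expected for the Prisoner's Dilemma.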

Numerical tests: Prisoner's Dilemma game
One interesting example of a game is given by the so-called Prisoner's Dilemma game, in which there are two players and two possible strategies: the players can either cooperate or defect. The payoff matrix, with rows and columns ordered as (cooperate, defect), is the following:
$$\begin{pmatrix} R & S \\ T & P \end{pmatrix}.$$
If both players cooperate, both obtain $R$ fitness units (reward payoff); if both defect, each receives $P$ (punishment payoff); if one player cooperates and the other defects, the cooperator gets $S$ (sucker's payoff) while the defector gets $T$ (temptation payoff). The payoff values are ranked $T > R > P > S$ and $2R > T + S$. From game theory we know that cooperators are always dominated by defectors. One of the main problems has been the possibility of success for cooperation, which is impossible in the pure strategies models: the replicator dynamics of the Prisoner's Dilemma [11] shows that cooperators are extinguished.
For the numerical tests we fix the following normalized payoff matrix:
$$\begin{pmatrix} 1 & 0 \\ b & \varepsilon \end{pmatrix},$$
with $b = 1.1$ and $\varepsilon = 0.001$. In this case we have $\alpha = 1 - b + \varepsilon < 0$ and $\beta = -\varepsilon < 0$, and so $\beta/\alpha > 0$. This means that stationary solutions are expected to be given by concentrated Dirac masses (see Section 3.2). For general perturbations, $\bar p = 0$ is linearly stable.
For the initial datum $f_0(p) = 1$, Figure 4 shows that the density $f$ tends to concentrate at the point $p = 0$, as expected: indeed $(\alpha, \beta) \in B$ (see Figure 1) and we know, from game theory, that the defectors' pure strategy dominates the cooperators' pure strategy. The evolution in time of $M_1(f)$ (Figure 6) is as expected (see Figure 2 (on the right)).
We consider now a quadratic initial datum for the model (27). We have plotted the numerical results in Figure 7. As in the previous case with $f_0(p) = 1$, we see that the density $f$ tends to concentrate at the point $p = 0$, which corresponds to the defectors' strategy.

Test n.2
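Now we want to consider an initial datum $f_0(p)$ with compact support in $[p_1, p_2] \subset [0, 1]$, with $p_1 < p_2$. If we define
$$q := \frac{p - p_1}{p_2 - p_1},$$
we have that $f_0(q)$ has compact support in $[0, 1]$. With respect to $q$, the average payoff (25) has the following form:
$$A(q, q^*) = \bar\alpha\, q\, q^* + \bar\beta\, q + \bar\gamma\, q^* + \bar\delta,$$
with constants $\bar\alpha$, $\bar\beta$, $\bar\gamma$, $\bar\delta$ depending on $\alpha$, $\beta$, $\gamma$, $\delta$, $p_1$ and $p_2$. The quantity $\bar\beta$ is positive if $\beta/\alpha > 0$, as in the Prisoner's Dilemma game. This means that the point $\bar p = p_1$ (corresponding to the point $\bar q = 0$) is stable, as shown in Figure 8, which is related to the Cauchy problem (27) with the following initial datum: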

Numerical tests: Hawk or Dove Game
Another example of a game is given by the so-called Hawk or Dove game, in which there are two pure strategies: hawks (H) and doves (D). While hawks escalate fights, doves retreat when the opponent escalates. The benefit of winning the fight is $b$. The cost of injury is $c$. If two hawks meet, then the expected payoff for each of them is $(b - c)/2$: the fight will escalate, one hawk wins, while the other is injured, and since both hawks are equally strong, the probability of winning or losing is $1/2$. If a hawk meets a dove, the hawk wins and receives payoff $b$, while the dove retreats and receives payoff 0. If two doves meet, there will be no injury and one of them eventually wins, so the expected payoff is $b/2$. Thus the payoff matrix is given by
$$\begin{array}{c|cc} & H & D \\ \hline H & \dfrac{b - c}{2} & b \\ D & 0 & \dfrac{b}{2} \end{array}$$
If $b < c$, then neither pure strategy is a Nash equilibrium. If everybody adopts the first pure strategy (H), it is best to adopt the second pure strategy (D), and vice versa. This means that hawks and doves can coexist. Selection dynamics will lead to a mixed population.
We fix $b = 1$ and look for a suitable value of $c > b$ for the numerical tests. We obtain the matrix
$$\begin{pmatrix} \dfrac{1 - c}{2} & 1 \\ 0 & \dfrac{1}{2} \end{pmatrix}.$$
In this case we have $-\beta/\alpha = 1/c$, and $(\alpha, \beta) \in D$ if $c > 1$ (see Figure 1). The function (47) is an admissible stationary solution for the problem, namely positive, with total mass equal to 1 and with first moment equal to $-\beta/\alpha$, if and only if $\theta = 2/3$ and $c = 36/17$. In Figure 9 we show the numerical results for the Cauchy problem (27) associated to this initial datum. We remark that the numerical scheme preserves the stationary solution. As a consequence of Proposition 3, the stationary solution (47) is not linearly stable. Indeed, even small perturbations of the datum can generate large perturbations of the solution. We consider a perturbation (48) of the function (47) with zero mass, defined for all $p \in [0, 1]$. Figure 10 shows the evolution of $f(t, p)$ for the related Cauchy problem. The perturbed datum (48) causes the loss of the stationary solution, as seen in Figure 10. The solution of the problem evolves (slowly) towards a Dirac mass, and the first moment $M_1$ converges to the value $-\beta/\alpha = 0.4722$ (Figure 11), as expected since $(\alpha, \beta)$ is in the region D.
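The value −β/α = 1/c and the number 0.4722 quoted above can be reproduced with exact arithmetic. The sketch assumes the Hawk or Dove matrix with rows and columns ordered as (H, D) and the coefficients α = a − b − c + d, β = b − d of a generic 2 × 2 matrix with rows (a, b) and (c, d); the function names are hypothetical.

```python
from fractions import Fraction


def hawk_dove_matrix(b, c):
    """Payoff matrix with rows/columns ordered (Hawk, Dove)."""
    return [[(b - c) / 2, b],
            [0, b / 2]]


def interior_mean_strategy(A):
    """Return -beta/alpha, with alpha = a - b - c + d and beta = b - d
    for the 2x2 matrix A = [[a, b], [c, d]] (assumed entry ordering)."""
    (a, b), (c, d) = A
    alpha, beta = a - b - c + d, b - d
    return -beta / alpha


# Benefit b = 1 and cost c = 36/17, the values used in the numerical test.
A = hawk_dove_matrix(Fraction(1), Fraction(36, 17))
m = interior_mean_strategy(A)
print(m, float(m))
```

Here α = −c/2 and β = 1/2, so the stationary mean strategy is 1/c, the fraction of hawks at which both pure strategies earn the same payoff.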

Three strategies games
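Assume there are three different strategies, whose interplay is ruled by the payoff matrix
$$A = \begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ a_7 & a_8 & a_9 \end{pmatrix}.$$
We have a population where individuals play strategy A with probability $p_1$, strategy B with probability $p_2$ and strategy C with probability $1 - p_1 - p_2$, for $(p_1, p_2) \in T_2$, where the simplex $T_2$ is just
$$T_2 = \{ (p_1, p_2) \in \mathbb{R}^2 : \ p_1 \ge 0, \ p_2 \ge 0, \ p_1 + p_2 \le 1 \}.$$
The payoff (2) is given by
$$A(p, p^*) = \alpha\, p_1 p_1^* + \beta\, p_1 p_2^* + \gamma\, p_2 p_1^* + \delta\, p_2 p_2^* + \sigma\, p_1 + \xi\, p_2 + \eta\, p_1^* + \mu\, p_2^* + \iota,$$
with $\alpha := a_1 - a_3 - a_7 + a_9$, $\beta := a_2 - a_3 - a_8 + a_9$, $\gamma := a_4 - a_6 - a_7 + a_9$, $\delta := a_5 - a_6 - a_8 + a_9$, $\sigma := a_3 - a_9$, $\eta := a_7 - a_9$, $\xi := a_6 - a_9$, $\mu := a_8 - a_9$ and $\iota := a_9$. In this case, the problem (17) is
$$\partial_t f = F(f), \qquad f(0, p) = f_0(p),$$
where the source term $F(f)$ is defined as follows:
$$F(f) := f\, \Big[ \big(\alpha M_{(1,0)}(f) + \beta M_{(0,1)}(f) + \sigma\big)\big(p_1 - M_{(1,0)}(f)\big) + \big(\gamma M_{(1,0)}(f) + \delta M_{(0,1)}(f) + \xi\big)\big(p_2 - M_{(0,1)}(f)\big) \Big].$$
We consider the initial datum $f_0(p)$ such that $f_0(p) \ge 0$ and $\int_{T_2} f_0(p)\,dp = 1$.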
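The nine constants defined above come from substituting the third probability, 1 − p_1 − p_2, into the bilinear payoff. The sketch below checks this reduction against a direct evaluation of the 3 × 3 bilinear form; it assumes that a_1, ..., a_9 are the matrix entries read row by row, which is the ordering consistent with the definitions of α, ..., ι in the text. The function names and the test matrix are hypothetical.

```python
from fractions import Fraction


def reduced_coefficients(A):
    """Map a 3x3 payoff matrix (entries a1..a9, row by row) to the nine
    constants of the reduced kernel A(p, p*)."""
    (a1, a2, a3), (a4, a5, a6), (a7, a8, a9) = A
    return {
        "alpha": a1 - a3 - a7 + a9, "beta": a2 - a3 - a8 + a9,
        "gamma": a4 - a6 - a7 + a9, "delta": a5 - a6 - a8 + a9,
        "sigma": a3 - a9, "eta": a7 - a9,
        "xi": a6 - a9, "mu": a8 - a9, "iota": a9,
    }


def payoff_reduced(c, p, ps):
    """Payoff of p = (p1, p2) against ps = (p1*, p2*) via the reduced kernel."""
    (p1, p2), (s1, s2) = p, ps
    return (c["alpha"] * p1 * s1 + c["beta"] * p1 * s2
            + c["gamma"] * p2 * s1 + c["delta"] * p2 * s2
            + c["sigma"] * p1 + c["xi"] * p2
            + c["eta"] * s1 + c["mu"] * s2 + c["iota"])


def payoff_direct(A, p, ps):
    """Same payoff, computed as q . (A q*) with the third component restored."""
    q = [p[0], p[1], 1 - p[0] - p[1]]
    qs = [ps[0], ps[1], 1 - ps[0] - ps[1]]
    return sum(q[i] * A[i][j] * qs[j] for i in range(3) for j in range(3))


# Hypothetical matrix and strategies, in exact arithmetic.
A = [[Fraction(v) for v in row] for row in ([1, 2, 3], [4, 5, 6], [7, 8, 10])]
p, ps = (Fraction(1, 5), Fraction(3, 10)), (Fraction(1, 2), Fraction(1, 4))
print(payoff_direct(A, p, ps) == payoff_reduced(reduced_coefficients(A), p, ps))
```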

Remark 2.
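It is easy to prove that if $\alpha\delta - \beta\gamma \ne 0$, then every distribution function $\bar f(p)$ with
$$M_{(1,0)}(\bar f) = \frac{\beta\xi - \delta\sigma}{\alpha\delta - \beta\gamma}, \qquad M_{(0,1)}(\bar f) = \frac{\gamma\sigma - \alpha\xi}{\alpha\delta - \beta\gamma}$$
is a stationary solution for the problem (51), whenever these values are admissible first moments of a density on $T_2$. Actually, by arguing as in Section 3, also Dirac masses concentrated on points are stationary solutions of these equations.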

A special case: the Rock-Scissors-Paper Game
We consider the Rock-Scissors-Paper game, which is characterized by having three pure strategies such that $R_1$ is beaten by $R_2$, which is beaten by $R_3$, which is beaten by $R_1$. The outcomes of the game are tabulated in a $3 \times 3$ payoff matrix. In the Rock-Scissors-Paper game, the constants that appear in the source term (52) have the following values: $\alpha = 0$, $\beta = 3$, $\sigma = -1$, $\gamma = -3$, $\delta = 0$, $\xi = 1$, and so
$$F(f) = f\, \Big[ \big(3 M_{(0,1)}(f) - 1\big)\big(p_1 - M_{(1,0)}(f)\big) + \big(1 - 3 M_{(1,0)}(f)\big)\big(p_2 - M_{(0,1)}(f)\big) \Big].$$
The initial datum $f_0(p) = 2$ has integral over $T_2$ equal to 1 and moments $M_{(1,0)}(2) = M_{(0,1)}(2) = 1/3$, and so it is a stationary solution for this game. In Section 6 we will present the numerical results related to this stationary solution.
Now we want to present a result about the curve where $\partial_t f$ changes sign. From the form of the source term (52) for this game, the curve $\bar p(t) \subset \mathbb{R}^2$ such that $\partial_t f(\bar p(t)) = 0$ is given by
$$\big(3 M_{(0,1)}(f) - 1\big)\big(\bar p_1 - M_{(1,0)}(f)\big) + \big(1 - 3 M_{(1,0)}(f)\big)\big(\bar p_2 - M_{(0,1)}(f)\big) = 0,$$
that is, the straight line joining the points $(M_{(1,0)}(f), M_{(0,1)}(f))$ and $(1/3, 1/3)$. In Section 6 we will present the evolution over time of this straight line.
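The statement that the sign of ∂_t f changes across the straight line through the first moments and the point (1/3, 1/3) can be verified directly. The sketch assumes that the factor multiplying f in the source term is (α M_(1,0) + β M_(0,1) + σ)(p_1 − M_(1,0)) + (γ M_(1,0) + δ M_(0,1) + ξ)(p_2 − M_(0,1)), a form derived here from the bilinear kernel and unit mass rather than quoted from the text, and it uses the Rock-Scissors-Paper constants given above. The function name and sample moments are hypothetical.

```python
from fractions import Fraction

# Constants quoted in the text for the Rock-Scissors-Paper game.
ALPHA, BETA, SIGMA = 0, 3, -1
GAMMA, DELTA, XI = -3, 0, 1


def growth_rate(p, m):
    """Assumed factor multiplying f in the source term, for a strategy
    p = (p1, p2) and first moments m = (M10, M01)."""
    (p1, p2), (m10, m01) = p, m
    return ((ALPHA * m10 + BETA * m01 + SIGMA) * (p1 - m10)
            + (GAMMA * m10 + DELTA * m01 + XI) * (p2 - m01))


centre = (Fraction(1, 3), Fraction(1, 3))
m = (Fraction(1, 5), Fraction(1, 2))        # hypothetical first moments

# Points on the segment from m to the centre: the rate vanishes there.
for t in (Fraction(0), Fraction(1, 4), Fraction(1)):
    point = (m[0] + t * (centre[0] - m[0]), m[1] + t * (centre[1] - m[1]))
    print(point, growth_rate(point, m))

# A point off the line has a non-zero rate.
print(growth_rate((Fraction(0), Fraction(0)), m))
```

When the first moments themselves equal (1/3, 1/3), the rate vanishes for every strategy, which is why the constant datum f_0 = 2 is stationary.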

Numerical approximation for the 3-strategies model
First we want to construct a numerical method for problem (51). The domain $T_2$ is just the triangle with vertices $(0, 0)$, $(1, 0)$, $(0, 1)$. In order to perform the numerical integrations, we fix a discretization step $\Delta p$ and a uniform triangular grid in $T_2$, as Figure 12 shows. Each point of the grid is $p_{ij} = (p_{1,i}, p_{2,j}) = (i \Delta p, j \Delta p)$, with $I := 1/\Delta p$. We use the notation $g_{i,j} := g(t, p_{1,i}, p_{2,j})$ for all $i = 0, \dots, I$ and $j = 0, \dots, I - i$ to indicate the value of a general function $g(t, p_1, p_2)$ at each grid point $p_{ij}$. In order to discretize the integral of $g$ over the domain $T_2$, we consider each triangle of the grid and indicate its vertices as $(x_s, y_s)$, for $s = 1, 2, 3$. We define the following quantities:
$$\overline{g} := \max_{s = 1, 2, 3} g(x_s, y_s), \qquad \underline{g} := \min_{s = 1, 2, 3} g(x_s, y_s),$$
the maximum and the minimum value of $g$ at the vertices of the triangle. On each triangle of the grid we consider a 2D product formula based on the trapezoidal rule for
$$\int_{T_2} g(t, p_1, p_2)\, dp = \int_0^1 \int_0^{1 - p_1} g(t, p_1, p_2)\, dp_2\, dp_1.$$
The discretization of the first moments $M_{(1,0)}$ and $M_{(0,1)}$ is easily obtained by (59), considering the function $g(t, p_1, p_2) = p_1 f(t, p_1, p_2)$ for $M_{(1,0)}(f)$ and $g(t, p_1, p_2) = p_2 f(t, p_1, p_2)$ for $M_{(0,1)}(f)$.
Similarly to the one-dimensional case, it can be shown that the method preserves the total mass in time, as well as discrete analogues of the steady states. As before, the time discretization is done with a fourth order Runge-Kutta method with constant time stepping.
The graphical results (in Figure 13) show that there is dominance of one of the groups: the initial datum has two areas of concentration, the final configuration shows only one area of concentration, and the total $L^1$ mass remains equal to 1 over time. The dominant group is contained in the region where $\partial_t f$ is positive, as we can see in Figure 14, which shows the contours of $f$ and the numerical results for the straight line $\bar p(t)$ where $\partial_t f$ changes sign (see Subsection 5.1). We also remark that, for all $t > 0$, the support of $f(t)$ is equal to or a subset of the support of the initial datum $f_0$: $\operatorname{supp} f(t) \subseteq \operatorname{supp} f_0$.
The graphical results (Figure 15 and Figure 16) show that the situation is different from the previous test: in this case the initial datum lies between the two regions where $\partial_t f$ is positive and negative, and the straight line $\bar p(t)$ separating these two regions does not change significantly over time. Therefore the configuration of the function $f$ at the final time $T$ is not very different from that at the initial time.
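For illustration, the integral over the triangular grid can be approximated with a simpler rule than the one used in the paper: on each grid triangle, the area times the mean of the three vertex values. This substitute rule is an assumption made for the sketch (the product formula based on the maximum and minimum vertex values is not reproduced here); it is exact for affine functions, so it recovers the mass and the first moments of the constant datum f_0 = 2. The function name is hypothetical.

```python
def integrate_on_simplex(g, I):
    """Approximate the integral of g(p1, p2) over T_2 on a uniform
    triangular grid with step 1/I, using on each cell
    area * (mean of the three vertex values)."""
    dp = 1.0 / I
    area = dp * dp / 2
    total = 0.0
    for i in range(I):
        for j in range(I - i):
            x, y = i * dp, j * dp
            # Lower-left triangle of the cell (always inside T_2).
            total += area * (g(x, y) + g(x + dp, y) + g(x, y + dp)) / 3
            # Upper-right triangle, present only if the cell lies inside T_2.
            if i + j <= I - 2:
                total += area * (g(x + dp, y) + g(x + dp, y + dp)
                                 + g(x, y + dp)) / 3
    return total


# Mass and first moments of the constant datum f0 = 2 on T_2.
mass = integrate_on_simplex(lambda p1, p2: 2.0, 20)
m10 = integrate_on_simplex(lambda p1, p2: 2.0 * p1, 20)
m01 = integrate_on_simplex(lambda p1, p2: 2.0 * p2, 20)
print(mass, m10, m01)
```

The grid has I² triangles of area Δp²/2 each, so the weights add up to the area 1/2 of T_2.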

Test 1.3
We fix $s = 3$, $(p^{0,1}_1, p^{0,1}_2) = (1/3, 1/3)$, $(p^{0,2}_1, p^{0,2}_2) = (3/16, 3/16)$, $(p^{0,3}_1, p^{0,3}_2) = (1/10, 3/5)$, $K_1 = 600$, $K_2 = 300$ and $K_3 = 300$. The graphical results (Figure 17 and Figure 18) show dominance phenomena in the region where $\partial_t f$ is positive. Initially the three areas of concentration are located, almost entirely, in the region where $\partial_t f$ is negative. The time evolution shows that, already at $t = 6$, two of the three areas of concentration lie between the two regions where $\partial_t f$ is negative and positive; at the final time, they are completely in the region where $\partial_t f$ is positive. So it is clear that dominance takes place in these areas.

Conclusions
We have considered a kinetic-like model for the evolution of a continuous mixed strategy game. The model is based on the time evolution of a density function describing the density of population adopting a given strategy. We established several analytical properties and developed some numerical discretizations useful for numerical simulations in the case of two and three strategies. Several explicit examples for two and three strategies games are reported. Of course, when considering more strategies a deterministic approach may result in excessive computational requirements, and stochastic simulation methods should be considered [7].
Let us finally mention that, in the situation considered so far, each player adopts a fixed strategy, and the evolution over time leads to the survival or not of the player. In principle it can be interesting to consider a situation in which each player can change strategy by a random mutation, thus moving through the strategy space. To this end, one can introduce a term in the equation that allows for the random change of strategy, following the ideas presented in [8]. The most natural way to model this phenomenon is to add to equation (16) a variation term $D \Delta_p f$ with respect to the probability $p \in T_{N-1}$, with $D > 0$. The new term $\Delta_p f$ can be interpreted as a diffusion term describing the spreading of the population in the probability space from strategy to strategy, which in evolution models corresponds to a random mutation mechanism, and will be the object of a future work. A similar model has been presented recently in [14].