Evolutionary dynamics of the prisoner's dilemma with expellers

Expulsion refers to the widespread behavior of expelling intruders from the owners’ territories, which has not been considered in current models on the evolutionary dynamics of cooperation so far. In the context of prisoner’s dilemma, we present a simple game-theoretical model of expulsion in which punishing cooperators (i.e. expellers) are able to banish defectors from their own neighborhoods. In the mean-field limit, our theoretical analysis of prisoner’s dilemma with expellers shows that the increment of either vacant sites ratio or time scale parameter between pairwise interaction process and strategy updating process can slow down evolutionary speed though defection is the only stable fixed point anyway. In more realistic spatial settings, we provide both analytical and numerical results for the limiting case where pairwise interaction dynamics proceeds much faster than strategy updating dynamics. Using the extended pair approximation methods and Monte Carlo simulations, we show that the introduction of expellers not only promotes coevolution of expulsion and cooperation by means of both direct and indirect domain competition but also opens the gate to rich dynamical behaviors even if expulsion is costly. Phase diagrams reveal the occurrence of frozen as well as dynamical stationary states, between which continuous or discontinuous phase transition may happen. For intermediate ranges, we investigate numerically the coupled interplay between pairwise interaction dynamics and strategy updating dynamics, and show that the validity of main results for the limiting case can be extended to this general case. Interestingly, there exists an optimal value of time scale parameter that results in the maximum fraction of altruistic players, which resembles the coherence resonance phenomenon in dynamical systems. Our results may provide insights into understanding coevolutionary dynamics of expulsive and cooperative behavior in social dilemma situations.


Introduction
In pairwise social dilemma games, two players have to simultaneously choose between two strategies, i.e. cooperation (C) and defection (D). Mutual cooperation leads to a reward R, whereas mutual defection results in a punishment value P for each individual (R>P). If one player cooperates and the other one defects, then the cooperator obtains a payoff S, i.e. the sucker's payoff, whereas the defector gets a payoff T, which is often described as the temptation to defect (T>S). In the presence of social dilemmas, mutual cooperation should be preferred over unilateral cooperation (i.e. R>S) and over an equal probability of unilateral cooperation and defection (i.e. 2R>T+S). If T>R>P>S is further satisfied, we obtain the famous prisoner's dilemma [1], which is the most challenging pairwise social dilemma game for the evolution of cooperation [2]. Obviously, it is best for a rational individual to defect regardless of the coplayer's decision when facing such a dilemma. Indeed, defection is the only strict Nash equilibrium [3], and also the evolutionarily stable strategy [2,4,5] in neighborhoods with better quality, and are often willing to expel cheaters who accept benefits but do not reciprocate in order to avoid exploitation. Remarkably, it is found that expulsion and cooperation can coevolve in the spatial prisoner's dilemma with expellers as long as population has a sparse structure, though defection is the only evolutionarily stable strategy in the mean-field prisoner's dilemma with expellers.
The paper is organized as follows. Section 2 describes the evolutionary game theoretical model of prisoner's dilemma with expellers in detail. Section 3 presents the main results, including analysis of mean-field dynamics (section 3.1) as well as spatial dynamics (section 3.2). Section 4 gives summarizations and discussions on our findings.

Prisoner's dilemma with expellers
Consider a population of size N structured by an undirected and unweighted graph G V E , ( )where the set of vertices V represents the sites, and the set of edges E denotes the possible pairwise symmetric interactions between individuals. The vertex v i represents the ith site which can be either occupied by an individual or just empty, while the edge e ij denotes the possible pairwise partnership between the individual at site v i and that at site v j .
Initially, a fraction p f ( . In each time step (e.g. t), pairwise interaction dynamics happens with probability 1−w, whereas strategy updating dynamics occurs with probability w.

Pairwise interaction dynamics
A pair of sites linked by an edge, e.g. e ij , is randomly picked up from the graph. If both sites are occupied by individuals, they play the one-shot prisoner's dilemma with each other; otherwise, the current time step ends. The payoff P ij of the individual at site v i and the payoff P ji of the individual at site v j resulted from such an interaction can be respectively expressed as which represents the donor-recipient version of prisoner's dilemma with a single rescaled parameter: the costto-benefit ratio r (0<r<1). Here the cost-to-benefit ratio r measures the strength of dilemma confronted by individuals [77,78]. Note that, in our model, expellers act as cooperators to play the prisoner's dilemma with their coplayers. But different from cooperators, we assume that expellers will unilaterally exclude defectors at a cost c from their current sites into any other vacant sites in the population after they are exploited by defectors in the prisoner's dilemma. The payoff of each individual, e.g. the payoff P i of the individual at site v i , is accumulated during the pairwise interaction process. Because of the randomness in selecting pairs of individuals, the rounds that each individual participates in vary from each other. To decouple the effect of this heterogeneity, the accumulative payoff of each individual, e.g. the one at site v i , is finally normalized according to where z represents the frequency with which the individual at site v i is chosen for interaction with other ones during the pairwise interaction process.

Strategy updating dynamics
Individuals experience strategy updating phase in a synchronous manner. Specifically, each individual at site v i imitates a randomly selected neighbor at site v j , if any, with probability given by the Fermi function [20]: Tr P P P P K where the noise K is the measure of stochastic uncertainties allowing irrational choices. Otherwise, the player at site v i does not change its strategy.
Obviously, the parameter w controls time scale between pairwise interaction process and strategy updating process: When w→0, pairwise interaction dynamics proceeds much faster than strategy updating dynamics. Therefore, the equilibrium of pairwise interaction dynamics determines the local configurations as well as the average payoffs of individuals on a graph (see an analytical method for regular graphs in appendix B). This means that the consequent strategy evolution proceeds as on a graph with no mobility of individuals induced by expulsion events (see a theoretical method for regular graphs in supplementary material available online at stacks.iop.org/JPCO/3/015011/mmedia); when w → 1, one can expect neutral evolution on a graph depending on the initial strategy configuration; whenever 0 < w < 1, evolution is driven by a detailed interplay between these two dynamical processes.

Mean-field dynamics
Now let us consider the deterministic dynamics of prisoner's dilemma with expellers in the mean-field setting, where the state is characterized by density p of players (i.e. p E of expellers, p C of cooperators and p D =1−p E −p C −p f of defectors) (see appendix A). In the limit w→0, the expected values Ex • ( ) of average payoffs P X (X ä E, C, D) for expellers, cooperators and defectors are respectively given by According to the replicator-like equation set describing motion for the concentration of each type of players (equation (A.5)), p E and p C tend to zero for arbitrary value of p f and K as Ex P Ex P Ex P The increment of p f or K merely leads to the slowdown of evolutionary speed (figure 1), but cannot change evolutionary direction of the population as Ex P Ex P Ex P for any composition of the population (equation (5)). In short, expellers and cooperators become extinct in the mean-field limit for w→0 (figure 2(a)). In the opposite limit w → 1, pairwise interaction dynamics almost never occurs. In this case, one can obtain Ex P Ex P Ex P 0 . Then natural selection will not change the composition of the population. Any mixture of expellers, cooperators and defectors in the simplex S 3 is an unstable equilibrium ( figure 2(b)).
In the general case of 0<w<1, we cannot give analytical predictions on the evolutionary outcome of the population since the expressions of Ex P E (¯), Ex P C (¯) and Ex P D (¯) cannot be further simplified (equations (A.1)-(A.3)). Nevertheless, in the mean-field limit, the prisoner's dilemma with expellers can be described by the following transformed 3×3 payoff matrix M': according to equations (A.1)-(A.3) From the above payoff matrix, we can see that the strategy D is completely dominant over C and E, and has a basin of attraction extended to the whole domain of the system. Based on this as a function of (a) ratio of vacant sites p f when noise K=0.1 and (b) noise K when ratio of vacant sites p f =0.5. The time to reach stable fixed point τ i, j is determined by equations (5) and (A.5) the initial condition of which is given by Parameter settings: cost-to-benefit ratio r=0.5, cost of expulsion c=0.5 and T=100.
fact, we predict that the variation of w does not alter the evolutionary outcome that defectors will eventually wipe out expellers and cooperators irrespective of the initial composition of the population. To confirm this prediction, we present phase diagrams of prisoner's dilemma with expellers for w=0.2, w=0.4, w=0.6 and w=0.8 in figure 3. It can be seen that all evolutionary trajectories end up in the pure state of defectors irrespective of the values of w. In sum, defectors are advantageous over expellers as well as cooperators, and can take over the whole population in the mean-field limit.

Spatial dynamics
In realistic populations, unlike the mean-field model, individuals do not interact with all other ones. Let us therefore consider a structured population with individuals arranged on a square lattice with von Neumann neighborhood (i.e. including connections between nearest neighbors) and periodic boundary conditions.  (5) and (A.5)). The arrows indicate the direction of deterministic flows in the simplex, while solid (hollow) points denote stable (unstable) fixed points. Red corresponds to fast dynamics and blue to slow dynamics close to the fixed points of the system. (a) For w→0, there is only one stable fixed point in the defector corner, and any state alone the line EC in the simplex is an unstable fixed point. (b) For w→1, there is no stable fixed point, and any state of the dynamical system is an unstable fixed point. Parameter settings: cost-tobenefit ratio r=0.5, ratio of vacant sites p f =0, cost of expulsion c=0.5 and noise K=0.1. The arrows indicate the direction of deterministic flows in the simplex, while solid (hollow) points denote stable (unstable) fixed points. Red corresponds to fast dynamics and blue to slow dynamics close to the fixed points of the system. Parameter settings: cost-to-benefit ratio r=0.5, ratio of vacant sites p f =0.1, cost of expulsion c=0.5 and noise K=0.1.
Despite their difference from actual social networks [79], lattices provide a convenient entry point for exploring the impacts of network structures on evolutionary dynamics [14,20,80]. Furthermore, the interaction and competition between species in realistic systems, especially in biology and ecology, can be represented adequately by means of lattices [81][82][83][84]. Therefore, we aim to study the evolutionary dynamics of prisoner's dilemma with expellers on spatial networks.
In our computer simulations, each efficient time step (t e ) gives exactly one chance for every individual to change its strategy. Initially, an equal number of expellers, cooperators and defectors are randomly placed on a square lattice of size N, such that the fraction of occupied sites is 1−p f . The stationary densities of expellers (p E ), cooperators (p C ) and defectors (p D ) on square lattices are evaluated over sufficiently long sampling times after the system enters into the equilibrium state. To ensure proper accuracy, final results (i.e. the mean densities of expellers p Ē , cooperators p C and defectors p D ) are averaged over up to 50 independent runs for each set of parameter values. ). In this case, cooperators are able to survive in the population by forming compact clusters as long as the cost-tobenefit ratio r is sufficiently low [23]. However, the mean fraction of cooperators p p 1 C f ( )becomes almost invariant with the cost-to-benefit ratio r when ratio of vacant sites p f is high (figures 4(c) and (d)). In this case, the clusters of cooperators become sparse. Cooperators are vulnerable to the invasion of defectors irrespective of the values of cost-to-benefit ratio r. Therefore, only cooperators isolated from the spatial domain of defectors remain in the population [85].
Let us now study the spatial prisoner's dilemma with expellers when the parameter of time scale w→0. In comparison with those in figure 4, results reported in figure 5 demonstrate that expulsion is able to promote the evolution of cooperation irrespective of the values of cost-to-benefit ratio r, ratio of vacant sites p f and cost of expulsion c. On saturated square lattices (i.e. p f =0), there is no vacant sites available for expellers to banish their defective neighbors. Expellers, like cooperators, persist in the population solely by virtue of network reciprocity. If cost of expulsion c=0, expellers and cooperators become equivalent. Hence the mean fraction of expellers p p 1 )for any value of cost-to-benefit ratio r ( figure 5(a)). However, if cost of expulsion c>0, the evolutionary performance of expellers becomes inferior to that of cooperators (figures 5(e), (i) and (m)). In contrast, on diluted square lattices (i.e. p f >0), expellers perform better than cooperators and can persist (or even prevail) in the population across the whole applicable range of cost-to-benefit ratio r and cost of expulsion c (figure 5). In short, expulsion can coevolve with cooperation in the spatial prisoner's dilemma with expellers in contrast to the evolutionary outcomes in the mean-field limit.
To systematically investigate the effects of cost of expulsion c and ratio of vacant sites p f , we move to study phase diagrams for two representative values of cost-to-benefit ratio r in the spatial prisoner's dilemma with expellers. In the absence of expellers, cooperators survive only if r<0.006, and they are able to defeat defectors completely for r<0 on the saturated square lattice. Taking these thresholds as benchmark values, we focus on r=0.005 and r=0.5. For r=0.005, cooperators are able to coexist with defectors solely on the basis of network reciprocity. This value of r thus yields lenient conditions for the evolution of cooperation. For r=0.5, on the other hand, cooperators are unable to survive. This value of r thus yields adverse conditions for cooperative behavior to emerge. In close vicinity of transition points, we here enlarge the size of system from  where w N denotes the parameter of time scale associated with a system of size N. Figure 6 shows the dynamical behaviors of two systems with different sizes (i.e. N=10 4 and N=9×10 4 ) but the same ratio of time scale (i.e. Nw/(1−w)=5×10 −3 ). Clearly, evolutionary trajectories for both systems are largely coincident with each other. Due to the finite-size effects, the current fractions of expellers p t p 1 )fluctuate around an equilibrium after initial transient times, wherein fluctuations are smaller in the lattice with larger size (figure 6).
The phase diagram for r=0.005 presented in figure 7(c) together with figure 5 suggests that at such a low value of r, expellers rather than cooperators represent the more effective players to outperform defectors except if cost of expulsion c>0 and ratio of vacant sites p f =0. The mixed and dynamical (C+D) D phase in the bottom line of the c−p f plane first gives way to the mixed and dynamical (E+C+D) D phase, subsequently to the mixed and frozen (E+D) F phase and finally to the mixed and frozen (E+C+D) F phase as the ratio of vacant sites p f increases. Here a mixed state is further classified as a frozen or dynamical sub-state. A sub-state is called to be frozen provided two or more of the second order heterogeneous variables (i.e. p EC , p ED or p CD ) of the system are equal to zero in the stationary state. Otherwise, the sub-state is defined to be dynamical. While majority of phase transitions is continuous, the E C D E D D F ) phase transition is discontinuous because of an indirect territorial competition between expellers and cooperators (for details see main text below) [86,87]. Only if the ratio of vacant sites p f is either negligible or large, are cooperators able to survive. When the ratio of vacant sites p f is sufficiently small (e.g. p f =0.005), the clusters of cooperators are compact enough to fight against defectors ( figure 8(a)). Accordingly, cooperators are able to coexist alongside expellers and defectors  in a dynamical manner (see the mixed and dynamical (E+C+D) D phase in lower part of figure 7(c)). When the ratio of vacant sites p f is large (e.g. p f =0.8), the clusters of cooperators are sparse and thus are easily invaded by their defective neighbors. In this case, only cooperators that are isolated from defectors can persist in  (7)). By N=10 4 and N=9×10 4 , the fluctuations of values around an equilibrium are smaller for the lattice with larger size. Indeed, if initial 5×10 3 values are discarded, the standard deviations σ of current fractions with respect to mean fractions are smaller for N=9×10 4 in comparison with that for N=10 4 Parameter settings: cost-to-benefit ratio r=0, ratio of vacant sites p f =0, cost of expulsion c=0 and noise K=0.1. Interestingly, when the ratio of vacant sites p f is moderate (e.g. p f =0.1), cooperators become extinct, and expellers and defectors are frozenly coexistent in the system (see the mixed and frozen (E+D) F phase in middle part of figure 7(c)). To understand why expellers can outperform cooperators despite additional costs of expulsion, it is important to analyze spatiotemporal dynamics of spatial prisoner's dilemma with expellers. Let ) denote the transition probability of Xʼs changing into Yʼs at efficient time step t e .
we get the following equation set [20]:  , the description of mean-field is applicable, and the current fraction of expellers p t p 1 ) decreases with the efficient time t e in an exponential way due to the advantage of defectors and cooperators in payoff (figures 9(a) and (i)). In comparison, the current fraction of cooperators p t p 1 ) does not vary sharply with the efficient time t e : it first increases mildly and then decreases slowly. Although exploited by defectors at this stage, cooperators are compensated at the expense of expellers (e.g. at t e =1, the net inflow of cooperators changing from defectors G t G t 0.042 7; figure 9(i)). The further development is determined by a local equilibrium where strategy updating only occurs at the interfaces between clusters of different strategies (i.e. t 8, In the presence of defectors, the evolutionary fate of expellers is decided not only by a direct competition with cooperators, but also by the success of both cooperative strategies against invasion attempts by defectors (the accumulated net inflow of expellers by the manner of direct domain competition with cooperators G t G t dt 0.104 6; ) denotes the current net inflow of Zʼs from Xʼs via Yʼs at efficient time step t e ; figure 9(d)). If ratio of vacant sites p f is moderate, the clusters of cooperators are not compact enough to defend against defectors in their neighborhood. Furthermore, the spatial configurations of cooperators are constantly subjected to additional destruction due to the attack of migratory defectors induced by the expulsion of expellers (figures 9(f) and (g)). In contrast, expellers are able to continuously banish defectors in their neighborhood, and thus reduce frequency of interaction with defective players (figures 9(f) and (g)).
Consequently, the evolutionary performance of cooperators become inferior to that of expellers, which explains the advantage of expellers over cooperators in the direct domain competition (e.g. at t e =8, the net inflow of expellers changing from cooperators G t G t 0.001 1; and (k)). Figures 9(b) and (c) demonstrate the excellent competitiveness of expellers against defectors in their evolutionary race. Except extremely few efficient time steps, the current net inflow of expellers changing from defectors is always no less than that of defectors changing from expellers after the self-organization process of initial clusters (e.g. at t e =8, the net inflow of expellers changing from defectors G t G t 0.000 2; figures 9(j) and (k)). Indeed, expellers are able to accumulate more payoffs than defectors in the pairwise interaction process, due to evolutionary mechanisms of network reciprocity as well as expulsion. Interestingly, expellers can spread successfully in the presence of defectors that spatial domains lost by cooperators are finally occupied by expellers (figures 9(f)-(h)), which elaborates the advantage of expellers over cooperators in the indirect domain competition. The synergetic combination of direct and indirect territorial competitions ultimately leads to the eventual extinction of cooperators and the final downfall of defectors ( figure 9(a)). The evolutionary competition between expellers and defectors terminates when they are completely separated from each other (figures 9(h) and (l)).
Moreover, if ratio of vacant sites p f is negligible, the increment of cost of expulsion c can lead to phase transition from the mixed and frozen (E+D) F phase to the mixed and dynamical (E+C+D) D phase ( figure 7(c)). Here the emergence of cooperation is due to the increasing cost of expulsion c, which lends more support for cooperators to compete with expellers. The more specific nature of the phase diagram for r=0.005 is illustrated quantitatively in figures 10(a) and (b), which shows a cross section across the ratio of vacant sites p f for cost of expulsion c=2 and that across the cost of expulsion c for ratio of vacant sites p f =0.02, respectively. Interestingly, it can be observed in figure 10(a) that there exists an optimal ratio of vacant sites p * f leading to the maximal overall fraction of altruistic individuals (i.e. expellers and cooperators) in the system. Note that the optimal ratio of vacant sites p * f in present study does not relate to percolation threshold of the host graph, in contrast to the findings in either prisoner's dilemma [85] or public goods game [88]. Here the evolutionary fate of altruistic individuals is determined by both direct and indirect territorial competition between expellers and cooperators. Neither too large nor too small but moderate ratios of vacant sites p * f maximize the positive impact of territorial competition.
If the conditions for the evolution of cooperation become harsh, as is the case for r=0.5, the phase diagram changes qualitatively ( figure 7(d)). It is shown that the mixed and dynamical (C+D) D phase at ratio of vacant sites p f =0 as well as the mixed and dynamical (E+C+D) D phase at low ratio of vacant sites p f transforms into the pure D phase and the mixed and frozen (E+D) F phase, respectively. Here, cooperators vanish because of the increasing cost of cooperation. In addition, the parameter region where expellers, cooperators and defectors can frozenly coexist (i.e. the mixed and frozen (E+C+D) F phase) extends to lower part of the c−p f parameter plane (compare figures 7(d) with (c)). When the cost-to-benefit ratio r is large (e.g. r=0.5), expellers and cooperators can survive only if they are separated from defectors. If the ratio of vacant sites p f is small (e.g. p f =0.1), cooperators would not have enough space to escape from the exploitation by defectors. Thus expellers coexist with defectors in a frozen manner ( figure 11(a)). If the ratio of vacant sites p f is large (e.g. p f =0.8), as is the case for r=0.005, cooperators are able to frozenly coexist with expellers and defectors in the system ( figure 11(b)).   [89]. In principle, more reliable predictions can be made by enlarging the basic cluster from two-point (used by pair approximation or two-point approximation) to n-point (n>2; used by n-point approximation) [20].
In the opposite limit w → 1, the system will stay in the strategy updating process but rarely enter into the pairwise interaction process. In such a case, the whole population is governed by random drift because the payoffs of expellers, cooperators and defectors are approximately equal to zero. In terms of game results, there is no role difference between expellers, cooperators and defectors. Hence, expulsion plays neither positive nor negative role in the evolution of cooperation.  figure 12(b)). Otherwise, defectors dominate expellers as well as cooperators completely (see top row of figure 12). However, on diluted square lattices (i.e. p f >0), expellers can coexist with defectors on the full r−w parameter plane (see middle and bottom rows of figure 12). For relatively low ratio of vacant sites p f (e.g. p f =0.4), expellers perform best when cost-to-benefit ratio r and parameter of time scale w are negligible (figure 12(e)). Meanwhile, cooperators are able to survive only if the parameter of time scale w is moderate ( figure 12(f)). Interestingly, the mean fraction of cooperators p p 1 C f ( )is largely invariant with cost-to-benefit ratio r. For relatively high ratio of vacant sites p f (e.g. p f =0.8), the evolutionary fate of expellers is unaffected by cost-to-benefit ratio r as well as parameter of time scale w (figure 12(i)).As for cooperators, both cost-to-benefit ratio r and parameter of time scale w have similar effects as they have in the case of relatively low ratio of vacant sites p f (compare figures 12(f) with (j)). Intriguingly, it is shown that there exists an optimal as well as a worst value of time scale parameter w + and w − that leads to the highest and lowest level of expulsion and cooperation, respectively (figures 12(c), (g) and (k)). In present study, pairs of individuals are stochastically chosen to play prisoner's dilemma with expellers in the pairwise interaction process. The randomness of this process can be adjusted by the parameter of time scale w. In one limit w→0, the pairwise interaction dynamics is deterministic. In the other limit w → 1, the pairwise interaction dynamics becomes stochastic to the largest extend. Whenever 0<w<1, the pairwise interaction dynamics falls somewhere between these limits. Therefore, the existence of such an optimal phenomenon can attribute to the coherence resonance scenario trigged by the parameter of time scale w which may be considered as the constructive noise [90,91]. In this case, the increment of mean fraction of altruistic players (i.e. expellers and cooperators) p

The general case
) is regarded as an constructive effect, as widespread altruistic players yield a higher total population payoff in comparison with dominant defectors, and thus is favourable for the population. The mean fraction of altruistic players p p p 1 ) determines the constructive effects of noise on the system and thus has the same meaning as the signal-to-noise ratio in classical coherence resonance phenomena observed in dynamical systems [46]. On the other hand, the appearance of the worst value of time scale parameter w − near w=1 is due to the unbalanced interplay between pairwise interaction dynamics and strategy updating dynamics. Figure 13 shows the optimal and worst parameter of time scale w + and w − as a function of ratio of vacant sites p f . Both the optimal and worst parameters of time scale w + and w − are increased with the ratio of vacant sites p f . This result indicates that with increasing vacancy rate the optimal as well as worst trade-offs between pairwise interaction dynamics and strategy updating dynamics translate from deterministic to stochastic manner.

Summary and discussion
The purpose of the present paper is to study the evolutionary dynamics of prisoner's dilemma with expellers, and to determine whether expulsion can provide evolutionary advantages for altruistic individuals (i.e. expellers and cooperators) to compete with defectors in prisoner's dilemma. In the mean-field limit, we have shown that   defection is the only evolutionarily stable strategy in prisoner's dilemma with expellers. The increment of either vacant sites ratio or time scale parameter between pairwise interaction dynamics and strategy updating dynamics merely leads to the extension of mean time to reach the stable equilibrium. However, expulsion is, in general, effective in deterring defection in spatial prisoner's dilemma with expellers as long as the population has a sparse structure. In one limit of time scale parameter w→0, the introduction of expellers can dramatically change the system behavior, wherein we do not find the dominance of defectors as it is in spatial prisoner's dilemma, but do find the maintenance or even prevalence of expellers (compare figures 4 with 5). In the presence of defectors, expellers can take advantages over cooperators by a combination of direct and indirect domain competition. By the evolutionary mechanism of expulsion, expellers are able to decrease frequency of interaction with defectors while increase frequency of interaction between cooperators and defectors (see appendix B). The former factor lowers down the payoffs of defectors in expellers' neighborhood, which helps expellers to reach a superiority over defectors who in turn diminish cooperators: a manner of indirect territorial competition. On the other hand, both factors assist expellers to defeat cooperators in the direct territorial competition. Although there exists a rather constrained and unrealistic parameter region (i.e. cost-to-benefit ratio r → 0, ratio of vacant sites p f =0 and cost of expulsion c>0; figure 5) where cooperators can outperform expellers, these advantages are highly impossible to play a role in reality. The spatial prisoner's dilemma with expellers gives rise to a rich variety of possible dynamics and a number of continuous or discontinuous transitions between qualitatively different system behaviors ( figure 7). For low cost-to-benefit ratio r, we have observed several types of strategy coexistence: ) . Each kind of these states is sustained in a particular way by means of which a portion of altruistic individuals can survive in the presence of defectors. To be specific, cooperators can dynamically coexist with defectors by spatial aggregation in C D D + ( ) phase. In E C D D + + ( ) phase, expellers can dynamically coexist with cooperators and defectors by direct and indirect territorial competition, the latter of which also leads to the discontinuous phase transition between ) and E C D F + + ( ) phases, altruistic individuals can coexist with defectors by spatial isolation in a frozen manner. For high cost-to-benefit ratio r, we have observed more simplified evolutionary outcomes: (1) the pure state D, (2) the mixed and frozen state E D F + ( ) and (3) the mixed and frozen state E C D F + + ( ) . Note that our principal discoveries are not expected to change for relaxing parameter of time scale from the limiting case w→0 to the general case 0<w<1 ( figure 12). Interestingly, we have found that there exist not only an optimal but also a worst value of time scale parameter w + and w − that results in the highest and lowest level of expulsion and cooperation, respectively. By interpreting parameter of time scale w as constructive noise, we can attribute the optimal phenomenon to coherence resonance reported previously in temporal and spatially extended dynamical systems [46,90,91]. On the other hand, the existence of worst time scale parameter w − is due to the unbalanced interplay between pairwise interaction dynamics and strategy updating dynamics. Furthermore, it is well known that mutation weakens the benefits of network reciprocity to cooperation in social dilemma games [92]. Therefore it is worth investigating how the coevolution of expulsion and cooperation affected by mutation. The preliminary simulation results show that the beneficial effects of expulsive behavior on the evolution of cooperation are fairly robust against the variation of mutation. This indicates that the evolutionary mechanism of expulsion on spatial networks is different from network reciprocity, which deserves additional research to clarify.
It was previously found that punishment represents a typical behavior that is able to promote cooperation in certain situations [93,94]. Particularly, costly punishment refers to the special kind of punishment that is manipulated by imposing some fine on defectors but at a cost to punishers [86,[95][96][97][98][99][100][101]. In fact, this kind of costly punishment can be classified as an active one. In this sense, the expulsive behavior in our model can be considered as a kind of passive punishment: expellers tend to 'punish' defectors in a way that they terminates future interactions between them. On the other hand, expulsion is very similar to ostracism, in which defectors lose all their present partners [102,103]. In addition, the evolutionary mechanism that assists expellers to fight against the exploitation of defectors share a similarity with that in optional games of social dilemma: reduction of interaction frequency between cooperative strategies and defective strategies [104]. Nevertheless, different from the loner in optional games, expellers in our model can also win over cooperators via enhancing frequency of interaction between cooperators and defectors though it requires additional supports from network reciprocity as well as dilution of graph. This is the reason that leads to different dynamics between games with loners and games with expellers. Still, spatial prisoners dilemma with expellers can also be considered as a kind of coevolutionary model between individual strategies and network structures as the neighborhoods of individuals are also coevolved with strategies in our model [22]. Recently, the impact of social exclusion on the evolution of cooperation governed by group interaction has been extensively studied in infinite populations, finite populations and spatial networks, respectively [105][106][107]. In their works, excluders are able to deny the rights of defectors to enjoy the benefits of public goods at some cost. In this case, excluders can defeat defectors as long as the net gain of excluders from the public goods game is greater than the cost of exclusion. Thus social exclusion is a very powerful mechanism for the evolution of cooperation even in infinite populations. In contrast, expellers in our model are able to recognize the types of coplayers only after they play games with each other. In other words, expellers need to pay a cost r, which is the cost-to-benefit ratio, to make identification as well as another cost c to execute expulsion so as to expel defectors from their neighborhoods. This is the reason why expellers perform inferior to both cooperators and defectors in the mean-field limit. Finally, cooperators in our model play the role of second-order free riders. To explain this, suppose that there are no defectors in the population. In this case, natural selection cannot distinguish between the players that cooperate and banish defectors (i.e. expellers) and the players that cooperate but do not expel defectors (i.e. cooperators). There is no requirement to expel, so there is no second-order free rider problem. If defectors are present in the population, however, the players that eject defectors must do so at a personal cost. Natural selection will now favor the players that cooperate but do not expel defectors. As a result, cooperators rise at the expense of expellers and eventually introduce the invasion of defectors. However, we have shown that the coevolution of expulsion and cooperation can be promoted even in such an adverse condition.

Acknowledgments
We thank two anonymous referees for their constructive and insightful comments. XFW is supported by Donghua University Startup Funds for Young Faculty. XFW and WJK gratefully acknowledge financial support from the Fundamental Research Funds for the Central Universities

Appendix A. Mean-field dynamics of prisoner's dilemma with expellers
In the mean-field theory, the state of a system is characterized by the density of players. In this situation, the expected values Ex • ( ) of average payoffs P X (X ä E, C, D) for expellers, cooperators and defectors in prisoner's dilemma with expellers are respectively given by     Herein, X and Y stand for the state of a site, which is occupied by either an expeller or a cooperator, or a defector.
Since the present dynamical rule has the form of Fermi function (equation (4)), the motion for concentrations of expellers, cooperators and defectors can be given by the following equation set: which is the master equations governing the evolution of the whole system.

Appendix B. Pairwise interaction dynamics of spatial prisoner's dilemma with expellers
In this appendix, we would like to present an extended pair approximation method for describing pairwise interaction dynamics of prisoner's dilemma with expellers on regular graphs. Let p E , p C , p D and p f denote the density of expellers, cooperators, defectors and vacant sites in a population, respectively. Let p EE , p EC , p CE , p CC , p ED , p DE , p DD , p Ef , p fE , p ff , p CD , p DC , p Cf , p fC , p Df and p fD represent the density of EE, EC, CE, CC, ED, DE, DD, Ef, fE, ff, CD, DC, Cf, fC, Df and fD pairs, respectively. Then q p p X Y XY Y = | specifies the conditional probability that the neighboring site of a site of state Y is in state X. Herein, X and Y stand for the state of a site, which is occupied by either an expeller or a cooperator or a defector, or just vacant. For pair approximation method [15,[108][109][110][111][112], only frequencies of state pairs p XY are tracked. The probabilities of larger configurations are expressed and approximated by the frequencies of pair configurations. Based on the symmetry condition p XY =p YX , the compatibility condition p p X Y XY = å , and the closure conditions, the whole system can be described by the following eight variables in pair approximation: p E , p C , p EE , p EC , p ED , p CC , p CD and p DD .
During the pairwise interaction process, the configuration frequencies do not change except if a defector is ejected by an expeller after they play games with each other. Let us now consider the case that an ED pair is randomly selected for interaction from the population. The expulsion of the chosen expeller to the chosen defector from its current site to another vacant site in the population may lead to the variation of p ED , p CD and p DD . The expelled defector has k E expellers, k C cooperators, k D defectors and k f =k−1−k E −k C −k D vacant sites among the k−1 remaining neighbors on a regular graph with connectivity k. The frequency of such a configuration is The frequency of the configuration, in which a vacant site has k E ¢ expellers, k C ¢ cooperators, k D ¢ defectors and k k k k k The number of CD pairs increases by k k C C ¢ -, and thus p CD increases by k k Prob .