Direct Reciprocity in Spatial Populations Enhances R-Reciprocity As Well As ST-Reciprocity

As is well known, spatial reciprocity plays an important role in facilitating the emergence of cooperative traits, and direct reciprocity is also clearly effective in explaining cooperation dynamics. However, how the combination of these two mechanisms influences cooperation is still unclear. In the present work, we study the evolution of cooperation in 2×2 games by considering both spatially structured populations and direct reciprocity driven by strategies with a memory length of one. Our results show that cooperation is significantly facilitated over the whole parameter plane. In the prisoner's dilemma game, cooperation dominates the system even under strong dilemmas, where maximal social payoff is still realized. In this sense, R-reciprocity forms and is robust even to extremely strong dilemmas. Interestingly, in the chicken game we find that ST-reciprocity is also guaranteed, through which the average social payoff and cooperation are greatly enhanced. This reciprocity mechanism is supported by mean-field analysis and by simulations on different interaction topologies. Thus, our study indicates that direct reciprocity in structured populations can be regarded as a more powerful factor for the sustainability of cooperation.


Introduction
One major question in evolutionary biology and social science is to understand the emergence of cooperative traits and their sustenance under the pressure of free-riders. To explain ubiquitous cooperation, evolutionary game theory has provided a theoretical framework that sheds light on this long-standing problem [1][2][3]. In particular, a simple paradigmatic model, the prisoner's dilemma game (PD), in which two individuals simultaneously decide whether to cooperate (C) or defect (D), has attracted tremendous attention in both theoretical and experimental studies [4,5]. When a population plays the prisoner's dilemma game in the well-mixed case, cooperation cannot be sustained. Over the past decades, a great number of scenarios have been identified that can offset the unfavorable outcome of social dilemmas and lead to the evolution of cooperation [6][7][8][9][10]. Nowak grouped these into five mechanisms: kin selection, direct reciprocity, indirect reciprocity, network reciprocity, and group selection [11], all of which, compared with the so-called well-mixed situation, can be related to a reduction of the opponent's anonymity.
Among the five mechanisms, network reciprocity, in which players are arranged on a spatially structured topology and interact only with their direct neighbors, has attracted the greatest interest [12], because cooperators can survive by forming compact clusters that minimize exploitation by defectors and protect the cooperators located in the interior of such clusters. Following this seminal idea, the role of spatial structure and its many variants in evolutionary games have been intensively explored (see [13,14] for recent reviews). In addition, researchers have found that the strategy updating rule and the dynamics on the spatial topology also have a significant impact on the evolution of cooperation [15][16][17][18][19][20][21][22][23][24][25][26][27][28]. Let us mention a couple of typical examples. In recent works [29][30][31][32][33], where players were allowed to adjust their strategies based on diverse learning abilities or on aspiration to the fittest opponent, the prevalence of cooperative behavior was observed even under large temptation to defect. In [34], it was reported that replicator dynamics could lead to an outbreak of cooperation on complex networks, even if the conditions did not necessarily favor the spreading of cooperators. Furthermore, strategy update rules and update dynamics were found to have a stronger influence on the evolution of cooperation than the network topology alone [35,36]. In [37,38], incorporating weights into the evaluation of individual fitness also largely enhanced cooperation.
Besides the above studies, which mostly focus on the prisoner's dilemma game (PD), other paradigmatic settings have also been explored on spatial topologies [39][40][41][42][43][44]. Of particular renown are investigations of the chicken game (CH) (or snowdrift game (SD)) [45], where an individual's best action depends on the opponent's choice: to defect (cooperate) if the other cooperates (defects). This finally leads to the coexistence of cooperators and defectors, namely, the state of ST-reciprocity [46], which can yield a higher population payoff than R-reciprocity [47,48,49]. To explain social cooperative behavior in this game, many different proposals aimed at sustaining cooperation have been suggested and investigated. Examples include continuous strategies [50], multi-person interactions [51], stochastic noise in the payoffs [52], teaching activity [53,54], mobility [55][56][57], memory [41], and fitness evaluation [58], to name but a few.
Despite the considerable body of work accumulated in past years, studies of mechanisms supporting cooperation are usually confined to either the prisoner's dilemma game (PD) or the chicken game (CH). Resolving the social dilemma in both games remains less explored [59], because an approach that is effective in one game may not allow cooperation to survive in the other. Moreover, in real societies the type of dilemma is variable and more complex, so constructing a universal protocol that facilitates cooperation is highly necessary and meaningful. Inspired by these considerations, in the present work we introduce a mixed strategy with a memory length of one into different classes of spatial games to study the evolution of cooperation, so that both network reciprocity and memory-induced direct reciprocity are at work. We explore whether cooperation is sustained, especially in the prisoner's dilemma game (PD) and the chicken game (CH). Our results show that cooperation is indeed promoted under such a protocol. In the remainder of this paper, we first describe the considered evolutionary games, then present the main results, and finally summarize our conclusions.

Model
We consider 2×2 games as the archetype. To depart from the traditional setup of spatial social dilemma games, we assign each player i a strategy profile $S_i = (p_i, q_i)$, where $p_i \in [0, 1]$ is the probability that player i cooperates with player j if j cooperated in the previous step, while $q_i \in [0, 1]$ is the probability that player i cooperates with player j if j defected in the previous step. Each player thus has a memory of length one in which it stores the opponent's previous action, so the model can, to some extent, be regarded as a mixed-strategy game with one-step memory. In a typical game, two players simultaneously decide whether to cooperate or defect. If both cooperate (defect), they receive the reward R (the punishment P). If, however, one player cooperates while the other defects, the defector receives the temptation T and the cooperator is left with the sucker's payoff S. Following the standard scaled parameterization, we fix R = 1 and P = 0, while the remaining two payoffs vary within $-1 \le S \le 1$ and $0 \le T \le 2$. Thus, T > R > P > S yields the prisoner's dilemma game (PD), T > R > S > P the chicken game (CH) (or snowdrift game (SD)), and R > T > P > S the stag-hunt game (SH), as schematically presented in Fig. 1(a). Without loss of generality, the payoffs can also be expressed in terms of the stag-hunt-type dilemma $D_r = P - S$ and the chicken-type dilemma $D_g = T - R$ as follows [48,60] (rows: own move C, D; columns: opponent's move C, D):
$$\begin{pmatrix} R & S \\ T & P \end{pmatrix} = \begin{pmatrix} 1 & -D_r \\ 1 + D_g & 0 \end{pmatrix}. \tag{1}$$
Correspondingly, we have the prisoner's dilemma game (PD) if $0 \le D_g \le 1$ and $0 \le D_r \le 1$, the chicken game (CH) if $0 \le D_g \le 1$ and $-1 \le D_r \le 0$, and the stag-hunt game (SH) if $-1 \le D_g \le 0$ and $0 \le D_r \le 1$ (see Fig. 1(b)).
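The mapping from the dilemma strengths to the four payoffs is simple arithmetic. The following minimal Python sketch builds the scaled payoff table from $(D_g, D_r)$ and names the game class. The function names `payoff_matrix` and `classify` are hypothetical. Points on the axes, or in the quadrant where both dilemmas are negative (which the text does not discuss), are reported as "other".

```python
def payoff_matrix(Dg, Dr):
    """Focal player's payoff for each (own move, opponent move) pair.

    Scaled parameterization: R = 1, P = 0, S = P - Dr, T = R + Dg.
    """
    R, P = 1.0, 0.0
    S = P - Dr  # from Dr = P - S
    T = R + Dg  # from Dg = T - R
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def classify(Dg, Dr):
    """Name the game class for a point of the Dg-Dr plane (boundaries are 'other')."""
    if Dg > 0 and Dr > 0:
        return "PD"  # T > R > P > S
    if Dg > 0 and Dr < 0:
        return "CH"  # T > R > S > P
    if Dg < 0 and Dr > 0:
        return "SH"  # R > T > P > S
    return "other"  # remaining quadrant or an axis, not discussed in the text


m = payoff_matrix(1.0, 1.0)
print(m[("D", "C")], m[("C", "D")])  # 2.0 -1.0
print(classify(0.5, 0.5), classify(0.5, -0.5), classify(-0.5, 0.5))  # PD CH SH
```

At the strongest PD point $D_g = D_r = 1$, this gives T = 2 and S = -1, the corners of the ranges stated above.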
Throughout this work, each player i is initially designated either as a cooperator (C) or as a defector (D) with equal probability, and is assigned a strategy profile $S_i = (p_i, q_i)$ with $p_i$ and $q_i$ drawn uniformly at random from the interval [0, 1]. This assignment is made irrespective of the player's initial strategy and remains unchanged during the simulations. As the interaction network, we use either an L × L regular square lattice or a random regular graph (RRG) constructed as described in [61]. A Monte Carlo step (MCS) is defined as the time during which, on average, each player has one chance to update its strategy. The updating procedure comprises the following elementary steps. First, a randomly chosen player i earns its payoff $\pi_i$ by playing the game with all four of its neighbors. Then, the payoffs of all the neighbors of player i are evaluated in the same way. Finally, player i adopts the strategy of the selected neighbor j with probability
$$W(s_i \leftarrow s_j) = \frac{1}{1 + \exp\left[(\pi_i - \pi_j)/K\right]}, \tag{2}$$
where K denotes the amplitude of noise [62]. The effect of noise on cooperation in spatial games has been studied in detail in previous work [63]; since this issue goes beyond the purpose of the present work, we simply fix K = 0.5 in all the following studies. The Monte Carlo results presented below are obtained on lattices with $L^2 = 100^2$ individuals. The average fraction of cooperators $\rho_C$, that is, the number of cooperators divided by $L^2$, is determined by averaging over the last 2000 steps of a total of $2 \times 10^5$ MCS. Moreover, since the random distributions of p and q may introduce additional fluctuations, the final results are averaged over up to 100 independent runs for each set of parameter values to ensure suitable accuracy.
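The elementary update can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not pin down: periodic boundaries on the lattice, a neighbor j drawn uniformly at random, one round of the game per neighbor pair per update, and each player's current C/D label serving as the "previous move" that its opponents remember. All function names (`fermi_prob`, `move`, `total_payoff`, `elementary_step`) are hypothetical. Because the text states that the profile (p, q) remains unchanged, it is copied during imitation only if the hypothetical flag `copy_profile=True` is set.

```python
import math
import random

C, D = 1, 0  # action labels: cooperate / defect


def fermi_prob(pi_i, pi_j, K=0.5):
    """W = 1 / (1 + exp((pi_i - pi_j) / K)), written to avoid overflow in exp."""
    z = (pi_i - pi_j) / K
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def neighbors(site, L):
    """Four nearest neighbors on an L x L lattice (periodic boundaries assumed)."""
    x, y = site
    return [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]


def move(profile, opponent_last, rng):
    """Memory-one rule: cooperate w.p. p if the opponent last cooperated, else w.p. q."""
    p, q = profile
    prob = p if opponent_last == C else q
    return C if rng.random() < prob else D


def total_payoff(site, label, profile, L, matrix, rng):
    """Hypothetical: one round against each neighbor; labels act as remembered moves."""
    total = 0.0
    for nb in neighbors(site, L):
        mine = move(profile[site], label[nb], rng)
        theirs = move(profile[nb], label[site], rng)
        total += matrix[(mine, theirs)]  # payoff of `site` for this pair of moves
    return total


def elementary_step(label, profile, L, matrix, K, rng, copy_profile=False):
    """One elementary update; L * L of these form one Monte Carlo step (MCS)."""
    i = (rng.randrange(L), rng.randrange(L))
    j = rng.choice(neighbors(i, L))  # assumption: neighbor chosen uniformly at random
    pi_i = total_payoff(i, label, profile, L, matrix, rng)
    pi_j = total_payoff(j, label, profile, L, matrix, rng)
    if rng.random() < fermi_prob(pi_i, pi_j, K):
        label[i] = label[j]
        if copy_profile:  # assumption: optionally imitate (p, q) as well
            profile[i] = profile[j]


if __name__ == "__main__":
    rng = random.Random(1)
    L = 20
    # Scaled PD payoffs at Dg = Dr = 1: R = 1, S = -1, T = 2, P = 0.
    matrix = {(C, C): 1.0, (C, D): -1.0, (D, C): 2.0, (D, D): 0.0}
    label = {(x, y): rng.choice((C, D)) for x in range(L) for y in range(L)}
    profile = {s: (rng.random(), rng.random()) for s in label}
    for _ in range(100 * L * L):  # 100 MCS
        elementary_step(label, profile, L, matrix, 0.5, rng)
    print("rho_C after 100 MCS:", sum(label.values()) / (L * L))
```

The stable form of the Fermi function matters only for small K or large payoff differences. The rest is a direct transcription of the three elementary steps, with $\rho_C$ read off as the fraction of C labels.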

Results and Discussion
We start by presenting, in Figure 2, color maps of the final fraction of cooperators $\rho_C$ and of the strategy-profile parameters p and q on the $D_g$-$D_r$ parameter plane. Compared with the well-mixed solution shown in Figure 1, cooperative behavior is drastically enhanced in our setting. Even in the case of strong dilemma $D_g = D_r = 1$ (PD region), where mutual defection dominates in the traditional scenario, almost complete cooperation is observed. In this sense, the prevalence of cooperative behavior indicates the formation of R-reciprocity, in which social profit is maximized by all players cooperating and obtaining R in the prisoner's dilemma game [46,48,49]. The strategy-profile parameters are also worth examining. p approaches 1 everywhere except in the top-left corner of the CH region, while q varies with the dilemma strength and gradually approaches 0 once the chicken-type dilemma $D_g$ exceeds 0. Based on these facts, the high level of cooperation can be explained by noting that the strategies defined in our model act as mixed strategies, which effectively help cooperators weaken defector attacks. Naturally, such a feedback mechanism causes the propensity to cooperate with a defector (i.e., the value of q) to decrease quickly. Thus, we argue that when stochasticity is introduced into the decision-making process, cooperation thrives.
To explain the promotive impact of the mixed strategy (enabled by memory) on the evolution of cooperation, we examine the time evolution of the cooperation fraction $\rho_C$ and of the strategy-profile parameters p and q. Figure 3 shows results obtained for $D_g = D_r = 1$, together with the corresponding snapshots of behavior (see Fig. 3(b)-C). As observed in the traditional version [29,30,62,64], in the early stages of the evolutionary process defectors appear to fare better than cooperators. This agrees with what one would expect, given that defectors are, as individuals, more successful than cooperators and are thus more likely to be chosen as strategy donors. At the same time, the values of p and q decrease. However, the tide turns quickly: as the time series show, individuals with high p values start to form compact clusters (see Fig. 3(b)-A), which to a large extent help more agents choose cooperation and resist the unfavorable environment. Driven by this direct reciprocity, the few remaining clusters of cooperators start to regain lost ground against the expanding defectors. More importantly, once formed, these clusters are impervious to further defector attacks, as attested by the extremely low value of q. In a sea of cooperators, it is always another cooperator rather than a defector that tries to penetrate the clusters. This validates our argument that the feedback mechanism driven by direct reciprocity halts the advance of defectors and turns it into an undisputed decline. This newly identified mechanism eventually leads to widespread cooperation that goes beyond what spatial reciprocity alone can warrant [48,49].
Next, we turn to the evolution of cooperation in the chicken game (CH). One notable feature is that the fully cooperative phase (namely, $\rho_C = 1$) is still not observed in the upper half of the CH region (enclosed by the dotted line in Fig. 2(a)), even though both memory and spatial topology are implemented. Why does this happen? To obtain a higher payoff, ST-reciprocity becomes more beneficial than R-reciprocity when the condition $D_g > D_r + 1$ (or $2R < S + T$) is satisfied. In what follows, we systematically examine the validity of this claim.
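In the scaled parameterization (R = 1, P = 0, $D_r = P - S$, $D_g = T - R$), the two forms of the condition are equivalent, as a short check shows:
$$\begin{aligned} S + T &= (P - D_r) + (R + D_g) = 1 + D_g - D_r, \\ S + T > 2R &\iff 1 + D_g - D_r > 2 \iff D_g > D_r + 1. \end{aligned}$$
For instance, at $D_g = -D_r = 1$ (so S = 1 and T = 2), S + T = 3 > 2R = 2, so this point lies in the region where a pair receiving S and T earns more in total than a mutually cooperating pair.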
To quantify the advantage of maintaining ST-reciprocity in the chicken game (CH), we first calculate the expected payoff via the mean-field approximation. Assuming that the equilibrium cooperation fraction is $\rho_C$, the expected payoff of each individual playing with its four neighbors is
$$\langle\pi\rangle = 4\left[\rho_C^2 R + \rho_C(1-\rho_C)(S+T) + (1-\rho_C)^2 P\right] = 4\left[\rho_C^2 + \rho_C(1-\rho_C)(1 + D_g - D_r)\right]. \tag{3}$$
When $S + T > 2R$, the mixed term, in which one player obtains S and the other T (the so-called ST-reciprocity [46]), can lead to a higher payoff than complete cooperation (the so-called R-reciprocity [47]). From the above expression, we further obtain the maximal payoff $\langle\pi\rangle_{\max}$ and the corresponding cooperation fraction $\rho_C^{\max}$ that yields it:
$$\langle\pi\rangle_{\max} = \frac{(1 + D_g - D_r)^2}{D_g - D_r}, \tag{4}$$
$$\rho_C^{\max} = \frac{1 + D_g - D_r}{2(D_g - D_r)}. \tag{5}$$
Figure 4 illustrates how the expected payoff $\langle\pi\rangle$ obtained by playing with four neighbors varies as a function of the cooperation fraction $\rho_C$ for $D_g = -D_r = 1$. Because the expected payoff is a concave quadratic function of the cooperation fraction in this regime, $\rho_C^{\max}$ maximizes the average social payoff in the population. We also confirm that when the spatial structure is introduced, the final distribution of agents' strategies is homogeneous at equilibrium (because continuous values are permitted for the strategy profile). Thus, the mean-field discussion remains valid for the spatial structure. Substituting $D_g = -D_r = x$ into Eqs. (4) and (5), we obtain the maximum payoff and the corresponding cooperation fraction as
$$\langle\pi\rangle_{\max} = \frac{(1 + 2x)^2}{2x}, \qquad \rho_C^{\max} = \frac{1 + 2x}{4x}.$$
We note in particular that these formulas are valid only for $D_g > D_r + 1$ (or $2R < S + T$), because this condition guarantees that ST-reciprocity is more beneficial (yields a higher payoff) than R-reciprocity; indeed, Eq. (5) gives $\rho_C^{\max} < 1$ only when $D_g - D_r > 1$. Figure 5 compares the theoretical analysis with the simulation results: the average payoff obtained in simulations is close to the theoretical maximum. Moreover, given the well-known claim that spatial topology may inhibit the evolution of cooperation in the chicken game (CH) (or snowdrift game (SD)) [45], it is of interest to explore ST-reciprocity in this setting.
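The mean-field maximization can be checked numerically. The following minimal Python sketch evaluates the expected payoff for k neighbors in the scaled parameterization, computes the interior maximum from the stationarity condition $d\langle\pi\rangle/d\rho_C = 0$, and compares it with a brute-force grid search. The function names and the neighborhood-size argument `k` are illustrative assumptions; k = 4 corresponds to the square lattice used in the text.

```python
def expected_payoff(rho, Dg, Dr, k=4):
    """Mean-field payoff for k neighbors, with R = 1, P = 0, S = -Dr, T = 1 + Dg."""
    R, P, S, T = 1.0, 0.0, -Dr, 1.0 + Dg
    return k * (rho**2 * R + rho * (1 - rho) * (S + T) + (1 - rho) ** 2 * P)


def closed_form_max(Dg, Dr, k=4):
    """Maximizer rho_max and maximum payoff; only meaningful when Dg > Dr + 1."""
    if Dg <= Dr + 1:
        raise ValueError("interior maximum requires Dg > Dr + 1")
    a = 1.0 + Dg - Dr                      # a = S + T
    rho_max = a / (2.0 * (a - 1.0))        # root of d<pi>/d(rho) = 0
    pi_max = k * a**2 / (4.0 * (a - 1.0))  # equals (1+Dg-Dr)^2/(Dg-Dr) for k = 4
    return rho_max, pi_max


def grid_max(Dg, Dr, n=10_000, k=4):
    """Brute-force check on a uniform grid of rho in [0, 1]."""
    t = max(range(n + 1), key=lambda s: expected_payoff(s / n, Dg, Dr, k))
    return t / n, expected_payoff(t / n, Dg, Dr, k)


print(closed_form_max(1.0, -1.0))  # (0.75, 4.5)
print(grid_max(1.0, -1.0))         # (0.75, 4.5)
```

At $D_g = -D_r = 1$, both routes give $\rho_C^{\max} = 0.75$ and $\langle\pi\rangle_{\max} = 4.5$, which exceeds the fully cooperative payoff 4R = 4.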
We observe that, under the joint impact of spatial interaction topology and memory-driven direct reciprocity, efficient ST-reciprocity (yielding a higher payoff than R-reciprocity) can be maintained.
An important remaining question is whether the mixed strategy defined by the two parameters p and q works universally across different topologies and neighborhood sizes. Fig. 6 depicts how cooperators and the average payoff fare on the random regular graph (RRG). As in Fig. 2(a), when the condition $D_g > D_r + 1$ is satisfied, cooperators perform significantly better than in the well-mixed case, yet cannot reach complete dominance. Strikingly, the average payoff becomes higher than that of the fully cooperative state, which confirms the existence of ST-reciprocity once again. This is in qualitative agreement with the observations made on the square lattice, indicating that direct reciprocity in spatial populations is universally effective in promoting cooperation and enhancing ST-reciprocity, irrespective of the underlying interaction network. In addition, as the neighborhood size increases, the cooperation fraction decays and the corresponding average payoff becomes lower, which is consistent with the previous mean-field prediction [65]. Lastly, it is instructive to explore how cooperation evolves under extremely strong dilemmas. Figure 7 shows the cooperation level and the strategy-profile parameters p and q as functions of $D_g$. Strikingly, the fully cooperative state is maintained even for $D_g > 1$, which further supports the claim that the newly introduced scenario of direct reciprocity on spatial topologies boosts R-reciprocity and remains valid under strong dilemmas. When $D_g$ is sufficiently large (namely, $D_g > 1.8$), the cooperation level starts to decline slowly, and p and q follow a trend similar to that of $\rho_C$ (note that the drop of q is particularly pronounced in the weak-dilemma region, and its value approaches 0 for extremely strong dilemmas). Thus, direct reciprocity in spatial populations, i.e., the propensity of individuals to cooperate with an opponent according to its previous behavior, can be seen as a universally applicable promoter of cooperation across different dilemma games.
Figure 4. Expected payoff $\langle\pi\rangle$ as a function of the fraction of cooperators $\rho_C$ at $D_g = -D_r = 1$, obtained with the mean-field approximation. The curve is quadratic, so there is a cooperation fraction $\rho_C^{\max}$ that guarantees the highest expected payoff $\langle\pi\rangle_{\max}$; here, the best social payoff $\langle\pi\rangle_{\max} = 4.5$ (exceeding 4R) is attained at $\rho_C^{\max} = 0.75$ (indicated by the dotted line). doi:10.1371/journal.pone.0071961.g004
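The RRG results rely on a random regular graph built as described in [61], and that construction is not detailed here. As a stand-in, and not necessarily the construction of [61], the following minimal Python sketch samples a simple d-regular graph with the standard pairing (configuration) model. Each node receives d "stubs", the stubs are paired uniformly at random, and the whole pairing is redrawn if it produces a self-loop or a repeated edge. Conditioned on success, the result is uniform over simple d-regular graphs. The function name `random_regular_graph` and its parameters are hypothetical.

```python
import random


def random_regular_graph(n, d, rng, max_tries=10_000):
    """Sample a simple d-regular graph on n nodes via the pairing model with rejection.

    Returns an adjacency dict {node: [neighbors]}.
    """
    if (n * d) % 2 != 0 or d >= n:
        raise ValueError("need n * d even and d < n")
    for _attempt in range(max_tries):
        stubs = [v for v in range(n) for _ in range(d)]  # d copies of every node
        rng.shuffle(stubs)
        edges = set()
        ok = True
        for a, b in zip(stubs[::2], stubs[1::2]):  # consecutive stubs form an edge
            e = (min(a, b), max(a, b))
            if a == b or e in edges:  # self-loop or multi-edge: reject this pairing
                ok = False
                break
            edges.add(e)
        if ok:
            adj = {v: [] for v in range(n)}
            for a, b in edges:
                adj[a].append(b)
                adj[b].append(a)
            return adj
    raise RuntimeError("no simple graph found; increase max_tries")


adj = random_regular_graph(100, 4, random.Random(3))
print(all(len(nb) == 4 for nb in adj.values()))  # True
```

Choosing d = 4 would give every player the same number of neighbors as on the square lattice, so a comparison would isolate the effect of the topology. Rejection sampling stays cheap for small d, since the chance that a pairing is simple decays roughly like $e^{-(d^2-1)/4}$.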

Conclusion
We have presented a new framework of direct reciprocity in spatial populations playing 2×2 games, in which two strategy-profile parameters p and q are taken into account. By means of extensive simulations, we have found that, to maximize social efficiency, agents adapt their strategies to the dilemma structure they face, and that this remains effective even under strong dilemmas. Compared with spatial reciprocity alone [36], the fully cooperative phase is maintained up to extremely strong dilemmas in the prisoner's dilemma game (PD). The promotion of cooperation can be attributed to a feedback mechanism: surviving cooperators not only mount a collective resistance against invading defectors but, more importantly, accelerate the formation of extremely robust clusters of cooperators, in which they are more likely to be chosen as strategy donors and are surrounded by more followers. In this sense, the region of R-reciprocity expands considerably (i.e., players still choose mutual cooperation to obtain R under strong dilemmas). Another interesting finding is that although cooperation cannot reach the fully cooperative state in the region $D_g > D_r + 1$ (or $2R < S + T$) of the chicken game (CH), ST-reciprocity is guaranteed (i.e., alternately obtaining S and T is more profitable than mutual cooperation), and this result is robust to the network topology. Through mean-field analysis, we have also shown that the average social payoff attains its maximum value in this region. Therefore, direct reciprocity in spatial populations can be regarded as a universally applicable promoter of cooperation irrespective of the type of evolutionary game. We hope that this work will inspire future studies, especially on resolving realistic social puzzles via co-evolutionary processes [14].