Punish, but not too hard: How costly punishment spreads in the spatial public goods game

We study the evolution of cooperation in spatial public goods games where, besides the classical strategies of cooperation (C) and defection (D), we consider punishing cooperators (PC) or punishing defectors (PD) as an additional strategy. Using a minimalist modeling approach, our goal is to separately clarify and identify the consequences of the two punishing strategies. Since punishment is costly, punishing strategies lose the evolutionary competition in case of well-mixed interactions. When spatial interactions are taken into account, however, the outcome can be strikingly different, and cooperation may spread. The underlying mechanism depends on the character of the punishment strategy. In the case of cooperating punishers, increasing the fine results in a rising cooperation level. In contrast, in the presence of the PD strategy, the phase diagram exhibits a reentrant transition as the fine is increased. Accordingly, the level of cooperation shows a non-monotonic dependence on the fine. Remarkably, punishing strategies can spread in both cases, but based on largely different mechanisms, which depend on the cooperativeness (or not) of punishers.


Introduction
A social dilemma is a situation where actions that ensure or enhance individual prosperity harm the well-being on the collective level [1]. Public goods such as social benefit systems or the environment are particularly prone to exploitation by individuals who want to profit at the expense of others. While collective cooperation would be favorable [2], individual freeriding ('defection') is tempting, which may end in a collapse of solidarity known as 'tragedy of the commons' [3]. While several mechanisms that prevent defection from taking over have been discovered so far [4], the identification of conditions for the survival and spread of cooperation among selfish individuals still remains a grand challenge [5], which is addressed by scientists from various fields of research, including physics [6]- [14]. The puzzle is most frequently tackled within the framework of evolutionary game theory [15]. In contrast to the famous prisoner's dilemma, which studies cooperation (C) and defection (D) in pairwise interactions, the public goods game addresses cooperation and defection within groups. In the latter, cooperators contribute to the public good, while defectors do not. Irrespective of the strategy, all contributions are summed up, multiplied by a factor and then equally divided among all members of the group. Thus, defectors bear no cooperation costs, while enjoying the same benefits as contributors, which makes it profitable to defect and tends to cause a spread of free-riders. Remarkably enough, however, individuals cooperate much more in public goods situations than expected [16]. This requires the identification of mechanisms that can sustain cooperation in public goods games. Punishment has been identified as one possible route to cooperation [17,18], but its effectiveness depends on whether the participation in the public goods game is optional or not [19]. 
Social diversity [20] and volunteering [21] may also promote cooperation in public goods games, as a random exploration of strategies does [22].
In this paper, we investigate the impact of punishment on the evolution of cooperation in structured populations, focusing on the case of a minimal number of pure strategies. Punishment is considered by adding the strategy of punishing cooperators (PC) or, alternatively, of punishing defectors (PD). Both punishing strategies sanction other defectors with a fine at a personal cost. Our main interest is to clarify how the so-called 'institution of punishment' influences the general cooperation level, if it is executed by players who either cooperate or defect. We investigate the possible similarities and differences in the mechanisms leading to the final system states and the underlying dynamics. It turns out that, in the two variants of the model (the one with the additional PC strategy and the one with the PD strategy), punishment promotes cooperation through completely different mechanisms. As a consequence, the impact of punishment in structured populations can be significantly different. While we describe the details of our model in section 2, we discuss the results of computer simulations in section 3 and summarize our findings in section 4.

Public goods game with punishment
The public goods game is played on a periodic square lattice. Each site on the lattice is occupied by one player, represented by the index x. Initially, all three strategies s (C, D and PC or PD) are assumed to have the same frequency, and they are randomly and uniformly distributed over the grid. For the sake of simplicity, every player participates in G = 5 groups (consisting of the focal individual and the 4 nearest neighbors each). We should also note that our results basically remain valid when varying the group size or the interaction network within reasonable limits. The only crucial feature is the limited number of interacting neighbors in the structured population.
In accordance with the standard definition of the public goods game, cooperators (C and PC) contribute an asset $a = 1$ to the public good and defectors (D and PD) contribute nothing. Subsequently, the sum of contributions in a group is multiplied by the 'synergy factor' $r$. The resulting amount is then shared equally among all members of the group, irrespective of their strategy. In this way the defector strategies (D and PD) try to exploit the cooperator strategies (C and PC). Summing up the shares of all groups that a player $x$ belongs to yields the value $P_x^{*}$. This value corresponds to his or her overall payoff, $P_x$, if no punishment is applied. Otherwise, the overall payoff, $P_x$, quantifying the 'fitness' of player $x$ is obtained by subtracting punishment costs and/or punishment fines. If the strategy, $s_x$, of player $x$ is D or PD, player $x$ will be punished with a fine, $f$, resulting in $P_x = P_x^{*} - f \sum \pi_p$, where the sum runs over all the groups containing player $x$. $\pi_p$ is given by the number of punishing players (PC and PD) in each group (not considering player $x$), divided by $G - 1$. Furthermore, if $s_x$ = PC or PD, player $x$ will have to bear the punishment cost, $c$, which reduces the payoff further, $P_x \to P_x - c \sum \pi_d$, where the sum runs again over all the groups containing player $x$. $\pi_d$ is given by the number of defectors around player $x$ in each group, divided by $G - 1$. In other words, the punishing strategies (PC and PD) make an extra contribution to keep the punishment and, as we will see, also cooperation alive. To update the strategy of players, we employ the Monte Carlo simulation procedure. Each elementary step involves the random selection of a focal player, $x$, and of one nearest neighbor, $y$. Following the determination of payoffs $P_x$ and $P_y$, as described above, player $y$ takes over the strategy, $s_x$, of player $x$ with probability
$$q = \frac{1}{1 + \exp[(P_y - P_x)/K]}, \qquad (1)$$
where $K$ denotes the uncertainty of strategy adoption [23].
In the limiting case, $K \to 0$, player $y$ copies the strategy of player $x$ if and only if $P_x > P_y$. For $K > 0$, however, under-performing strategies may also be adopted sometimes; for example, due to errors in the evaluation of payoffs. During one full iteration, the strategy of all players may be copied once on average. The computational results presented below have been obtained for lattices with $L^2$ sites, where $L$ is chosen between 400 and 3000 (large enough to avoid the accidental disappearance of a strategy). The final fractions, $\rho_s$, of all three strategies, $s$, were obtained after up to $10^6$ iterations (depending on how quickly the fractions stabilized). The presented data were averaged over a sufficient number of runs to ensure a low variability of the results (5-30 runs, depending on the system size).
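The payoff rules and the update step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the simulation code used for the results: the function names, the string encoding of strategies, and the list-of-lists lattice are hypothetical; the contribution a = 1 is assumed to be deducted from a contributor's share in every group; 'defectors' in the punishment cost is read as D and PD together; and the adoption probability is assumed to have the Fermi form that is consistent with the K → 0 limit described in the text.

```python
import math
import random

G = 5  # group size: focal site plus its 4 nearest neighbours
CONTRIBUTORS = {"C", "PC"}   # strategies that pay a = 1 into each group
DEFECTORS = {"D", "PD"}      # strategies that can be fined
PUNISHERS = {"PC", "PD"}     # strategies that bear the punishment cost


def group_of(i, j, L):
    """Sites of the group centred on (i, j) on a periodic L x L lattice (L >= 3)."""
    return [(i, j), ((i + 1) % L, j), ((i - 1) % L, j),
            (i, (j + 1) % L), (i, (j - 1) % L)]


def payoff(grid, i, j, r, f, c):
    """Payoff of the player at (i, j), summed over the G groups it belongs to."""
    L = len(grid)
    s = grid[i][j]
    total = 0.0
    # The player belongs to the group centred on itself and on each neighbour.
    for ci, cj in group_of(i, j, L):
        members = group_of(ci, cj, L)
        n_contrib = sum(grid[a][b] in CONTRIBUTORS for a, b in members)
        total += r * n_contrib / G          # equal share of the multiplied pot
        if s in CONTRIBUTORS:
            total -= 1.0                    # assumption: contribution a = 1 is deducted
        others = [grid[a][b] for a, b in members if (a, b) != (i, j)]
        if s in DEFECTORS:                  # fine f times pi_p for this group
            total -= f * sum(o in PUNISHERS for o in others) / (G - 1)
        if s in PUNISHERS:                  # cost c times pi_d for this group
            total -= c * sum(o in DEFECTORS for o in others) / (G - 1)
    return total


def adoption_probability(p_x, p_y, K):
    """Assumed Fermi rule: probability that y adopts the strategy of x."""
    z = (p_y - p_x) / K
    if z > 700.0:                           # guard against overflow in exp
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def monte_carlo_step(grid, r, f, c, K, rng=random):
    """One elementary step: random focal player x, random nearest neighbour y."""
    L = len(grid)
    i, j = rng.randrange(L), rng.randrange(L)
    ni, nj = rng.choice(group_of(i, j, L)[1:])
    q = adoption_probability(payoff(grid, i, j, r, f, c),
                             payoff(grid, ni, nj, r, f, c), K)
    if rng.random() < q:
        grid[ni][nj] = grid[i][j]
```

In this sketch one full iteration would correspond to L × L calls of `monte_carlo_step`. Payoffs are recomputed from scratch at every step, which keeps the code short but would be far too slow for the lattice sizes quoted above.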

Computational results
For well-mixed interactions, when a random sample of G players engages in public goods games with the two strategies, C and D, only, defectors spread and the tragedy of the commons results for r < G. This undesirable outcome does not significantly change by adding punishing strategies (PC and PD), because the latter have to bear additional punishment costs, which reduce their competitiveness. Accordingly, the social dilemma persists in the presence of punishing strategies, and for well-mixed interactions, defectors still spread in the system [24]. It is furthermore worth noting that conventional cooperators (C), who avoid extra costs by punishment efforts, can be considered as 'second-order free-riders', as they exploit the defection-suppressing benefits created by punishers. This is actually the reason why punishing cooperators tend to disappear, which finally weakens the cooperators in their battle against defectors. In other words, the tragedy of the commons results because 'lazy (non-punishing) cooperators' crowd out their 'friends', the punishing cooperators, who are needed for their own survival. As Nowak and May pointed out for the prisoner's dilemma [25], a fixed interaction network in structured populations facilitates network reciprocity, which is beneficial for cooperators. The same mechanism can be found for the two-strategy spatial public goods game as well.
Using the parametrization of our model, cooperators manage to survive if r > 3.74, and crowd out the other strategies if r > 5.49 [26]. The impact of additional punishing strategies (PC and PD) on structured populations was also studied by several research groups [27]- [29]. It turns out that the condition of a fixed and finite interaction neighborhood can resolve the problem of second-order free-riding by allowing punishing cooperators to separate themselves from pure cooperators, thereby escaping direct competition and exploitation. In this paper, we study two minimalist models, where only one type of punishing strategy is considered besides conventional cooperators and defectors. In other words, we explore the possible impact of punishing cooperators and punishing defectors separately. The corresponding models will be called the 'PC model' and the 'PD model', respectively.
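The well-mixed condition r < G quoted at the beginning of this section can be checked with a short calculation. The sketch below compares the payoff of a cooperator and of a defector who face the same co-players in a single group; the function name is hypothetical, and it is assumed (as is conventional) that a cooperator's contribution of 1 is deducted from its share. The difference is r/G − 1, independent of the composition of the group, so defection pays whenever r < G.

```python
def single_group_payoffs(n_other_cooperators, r, G):
    """Payoffs (cooperator, defector) when facing the same G - 1 co-players."""
    # A cooperator adds its own contribution to the pot and pays 1 for it.
    p_coop = r * (n_other_cooperators + 1) / G - 1.0
    # A defector receives the same kind of share without contributing.
    p_def = r * n_other_cooperators / G
    return p_coop, p_def


if __name__ == "__main__":
    G = 5
    for r in (3.8, 6.0):
        for n in range(G):
            p_c, p_d = single_group_payoffs(n, r, G)
            print(f"r={r}, cooperating co-players={n}: "
                  f"P_C - P_D = {p_c - p_d:+.2f}")
```

For r = 3.8 and G = 5 the difference is negative for every group composition, which is the well-mixed social dilemma; only for r > G would cooperation pay in a single isolated group.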

Phase diagrams of the minimalist models with spatial neighborhood interactions
Representative phase diagrams for the two minimalist models are presented in figure 1, using the same value of the synergy factor, r . In both diagrams, each region ('phase') is named after the strategies, which survive over time and contribute to the final strategy distribution. A small value of the punishment fine does not significantly change the behavior of the system, given a finite punishment cost. Generally, however, the system behavior depends in a sensitive way on the actual values of punishment cost and fine. In the case of the PC model, punishing cooperators always prevail for a sufficiently large fine, independently of the cost value. If the cost is lower than a critical value (c ≈ 0.65 for r = 3.8), the application of a sufficiently large fine will drive the system into a state where the punishing strategy replaces its non-punishing counterpart. (As we will see, a similar behavior can be observed for the PD model, but the explanation is completely different.) The critical cost value that limits the existence of a mixed D + PC phase decreases by reducing the synergy factor, r , and the phase disappears completely for sufficiently low values of r . Accordingly, the system turns from D-only to a PC-only phase, similar to what is found in the public goods game with all four strategies (C, D, PC and PD) [29]. The system always leaves the punishment-free state via a discontinuous first-order phase transition, while the transition between the mixed D + PC phase and the PC-only phase is continuous. (The critical behavior of this transition will be discussed in the next subsection.) The global cooperation level, i.e. the sum of fractions of cooperators, ρ C , and punishing cooperators, ρ PC , increases monotonously with the fine, as the inset shows.
In the case of the PD model (right panel of figure 1), the impact of punishment is limited to a finite region of the punishment cost (c < 0.095 for r = 3.8). Below this cost value, the impact of punishment starts similarly to the PC model: when the fine value is increased, a first-order phase transition occurs, which goes along with a considerable increase in the fraction of cooperators (ρ C ). Beyond a certain value, however, a further increase in the fine decreases the level of cooperation, and the system eventually returns to a phase that is characteristic of a system without punishment. As a consequence of the observed reentrant phase transition, there exists an optimal level of the punishment fine, f , for which the fraction of cooperators (ρ C ) becomes maximal. This can be understood based on a pattern formation mechanism described in subsection 3.3. The mentioned critical c-value that limits the emergence of the punishing strategy decreases as we increase the value of the synergy factor, and it disappears around r ≈ 4.7. As we will see, this is closely related to the fact that too large fines do not influence the system behavior.

Characterization of phase transitions and universality class
Figure 2. Top: stationary fractions of all strategies for the PC model (a) and the PD model (b). In the case of the PC model, a discontinuous phase transition and, subsequently, a continuous phase transition occur as the value of the punishment fine is increased. In the case of the PD model, despite the punishment cost, an additional, continuous phase transition appears. In both models, the punishing strategy can replace its non-punishing counterpart at a sufficiently large value of the punishment fine. In the PD model, however, an interesting reentrant phase transition can be observed, and for large punishment fines the system behaves as if the fine was zero (f = 0). Bottom: critical scaling behavior of the order parameter for both models.
To study the phase transitions in more detail, we have plotted the stationary fractions of all strategies for both models in figure 2. In the case of the PC model (figure 2(a)), the fraction of punishing cooperators can increase at the cost of defectors, as soon as cooperators are eliminated from the system. Interestingly, second-order free-riders disappear suddenly, as soon as the punishment fine passes a critical threshold. At this threshold, 'lazy', non-punishing cooperators are essentially replaced by punishing ones. As the punishment fine is further increased, the fraction of defectors ($\rho_D$) decreases gradually and becomes zero above a certain value of the fine. The present nonequilibrium continuous phase transition from the fluctuating D + PC phase to the absorbing PC phase agrees with the directed percolation universality class conjecture [30,31]. Namely, the interactions amongst players are short-ranged, and the order parameter, which is the fraction of defectors ($\rho_D$), becomes zero at the critical value, $f_c$, of the fine, where the system arrives at the single absorbing all-PC state. Accordingly, the (static) exponents of the phase transition are expected to belong to the universality class of directed percolation, for which $\rho_D \propto (f_c - f)^{\beta}$ with $\beta = 0.584(4)$ in two spatial dimensions [32]. In the PD model, the fraction of punishing defectors rises suddenly from zero to a finite value at a critical threshold of the fine value, as in the other minimal model (see figure 2(b)). However, as defectors disappear, punishing defectors only reach half of the fraction that defectors had in the previous C + D phase.
This difference signals already that another type of mechanism must be responsible for the spreading of the punishing strategy in the PD model. It turns out to be crucial that the fraction of punishing defectors decreases as the punishment is increased. This is because punishing defectors (PD) punish not only pure defectors (D) but also each other, a behavior that is called 'hypocritical punishment' [33,34]. Consequently, defectors can spread again above a certain value of the punishment fine. When this happens, the fraction of cooperators starts to fall, while the fraction of punishing defectors decreases further (until it reaches zero). Therefore, for high values of the fine, the system arrives in a state that is identical to the one for negligible fines (f = 0). In other words, the system behavior becomes exactly the same as for the spatial public goods game without punishment.
The critical behavior of the PD model is more interesting than that of the PC model, because two continuous phase transitions can be observed as the fine is increased (for a fixed cost value). In both cases, the system leaves a three-strategy (C + D + PD) phase for a two-strategy (C + PD or C + D) phase when the fine is decreased or increased. As we will see in the next subsection, the mechanisms determining the stationary patterns in the last two phases are significantly different. Despite this, as figure 2(d) illustrates, the exponents of the phase transitions agree within the accuracy of numerical estimates. The value is ≈0.6, which is very close to the previously mentioned directed percolation exponent.
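An exponent of this kind is typically read off from a log-log plot of the order parameter against the distance from the critical point. The sketch below shows one simple way to do this, a least-squares fit of log ρ against log(f_c − f). It is an illustration only: the function name is hypothetical, the critical fine `f_c` is assumed to be known, and the data are synthetic values generated from an exact power law rather than simulation results.

```python
import math


def fit_scaling_exponent(fines, rho, f_c):
    """Least-squares slope of log(rho) versus log(f_c - f)."""
    xs = [math.log(f_c - f) for f in fines]
    ys = [math.log(p) for p in rho]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / s_xx


if __name__ == "__main__":
    # Synthetic illustration: f_c, the prefactor and the distances are made up.
    f_c = 0.5
    distances = [0.1, 0.05, 0.02, 0.01, 0.005]
    fines = [f_c - d for d in distances]
    rho = [0.8 * d ** 0.584 for d in distances]
    print(f"fitted exponent: {fit_scaling_exponent(fines, rho, f_c):.3f}")
```

With real simulation data the estimate depends sensitively on the assumed value of f_c and on how close to the transition the data points lie, which is one reason why exponents are usually quoted only within the accuracy of the numerical estimate.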

Pattern formation mechanisms
To explore the differences between the punishment-promoting mechanisms in the PC model and the PD model, we have plotted the fraction of each strategy as a function of time (see figure 3). The punishment cost, c, and fine, f , were chosen such that the final strategy distribution contained punishing players (D + PC or C + PD, respectively). For the PC model (left), the randomly mixed initial state is particularly beneficial for the exploitation of cooperative strategies by defectors. Accordingly, ρ D rises rapidly, while both ρ C and ρ PC fall. Defectors spread almost everywhere, but a number of islands made up of cooperative strategies can survive, where cooperative behavior is effective, thanks to network reciprocity [4,25].
Figure 3. Fraction of each strategy as a function of time for the PC model (left) and the PD model (right). In the beginning, the mixed initial state is beneficial for the spreading of defectors. Left: in the PC model, an arrow indicates the moment when the surviving cooperative players aggregate in the sea of defectors and their clusters start to grow, thanks to network reciprocity. Since punishing cooperators can fight more efficiently against defectors than cooperators, the fraction of PC players increases faster than the fraction of C players. Right: in the PD model, both defecting strategies can exploit cooperators first, but pure D players (who do not have to bear punishment costs) do it more efficiently. As a consequence, the PD strategy is crowded out. Eventually, however, the C and PD strategies can form an alliance (at the time indicated by the arrow). When eliminating defectors together, their fractions, ρ C and ρ PD , grow jointly with a typical, almost constant ratio, ρ C /ρ PD .
It is important to note that, in the beginning, C and PC players may form mixed cooperative islands together. However, when defectors are absent in the neighborhood, the C and PC strategies result in identical payoffs, and the strategy update dynamics defined by equation (1) result in a voter-model-like logarithmic coarsening within the cooperative islands [35] (since the C and PC strategies are equivalent in the bulk of C + PC domains, where there are no defectors and, accordingly, also no punishment). Although the coarsening dynamics are logarithmically slow, the C and PC strategies in the cooperative islands segregate quickly, as the sizes of these islands are small. After this time period, the end of which is indicated in the left panel of figure 3 by an arrow, homogeneous clusters of cooperators (C) and punishing cooperators (PC) fight separately against defectors (D). When the punishment fine is high enough, punishing cooperators can outcompete defectors, but defectors are superior to cooperators (thanks to the low synergy factor, r ).
Consequently, the fraction of punishing cooperators, ρ PC , increases quickly, and cooperators are eventually crowded out. Finally, cooperators disappear completely and, with them, second-order free-riders. As a conclusion, to get rid of second-order free-riding, the spatial segregation of the C and PC strategies is crucial.
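The voter-model-like behavior inside defector-free C + PC islands follows from the fact that both strategies earn the same payoff there, so neither is preferred when strategies are copied. The sketch below makes this explicit under two assumptions that are not spelled out here: the adoption rule has the Fermi form, and a contributor's payoff per group is its share minus the contribution of 1. All names are hypothetical.

```python
import math


def adoption_probability(p_x, p_y, K):
    """Assumed Fermi rule: probability that y adopts the strategy of x."""
    return 1.0 / (1.0 + math.exp((p_y - p_x) / K))


def bulk_payoff(r, c, punisher, G=5):
    """Payoff of a C or PC player whose G groups contain no defectors."""
    defectors_nearby = 0                        # bulk of a cooperative island
    share = r * G / G - 1.0                     # all G members contribute
    cost = c * defectors_nearby / (G - 1) if punisher else 0.0
    return G * (share - cost)


if __name__ == "__main__":
    r, c = 3.8, 0.3                             # illustrative values only
    p_c = bulk_payoff(r, c, punisher=False)
    p_pc = bulk_payoff(r, c, punisher=True)
    for K in (0.1, 0.5, 2.0):
        print(K, adoption_probability(p_c, p_pc, K))
```

Because the two payoffs coincide, the adoption probability equals 1/2 for every value of K, and the C/PC interfaces inside an island move without any bias, as in the voter model. The symmetry is broken only at the border with defectors, where punishers pay the cost and defectors pay the fine.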

The evolutionary dynamics are significantly different for the PD model (see the right panel of figure 3). Initially, similarly to the PC model, both defecting strategies (D and PD) can benefit from the well-mixed distribution at the beginning. As pure defectors are not burdened by punishment costs, their fraction (ρ D ) further increases with time. After some iterations, however, small cooperative clusters that have survived start growing, thanks to network reciprocity, while the number of defectors is reduced, since they perform poorly in the defecting environments they have created.
When the fraction of PD players (ρ PD ) reaches a certain value, the mixture of C and PD strategies can form an alliance that is beneficial for both strategies. On the one hand, PD players can collect the payoff in the vicinity of cooperators, which allows them to survive despite their costs for punishing defectors. On the other hand, the payoff of cooperators is competitive, because the punishment efforts of PD players keep the fraction of defectors in the neighborhood of cooperators at a low level. Accordingly, both strategies benefit from the alliance, and they can crowd out the D players together.
It is essential that the alliance can only work when the mixture of cooperators and punishing defectors is just right. When crowding out defectors, neither C nor PD players can occupy the gained territory alone. Instead, as soon as the C + PD alliance starts to work, the fractions of both strategies rise simultaneously with an almost constant ratio (as we have checked by complementary evaluations). The start of this phase is marked by an arrow in the right panel of figure 3. It appears that the delicate balance between both members of the alliance is self-organized and self-stabilizing.
For both models, the above-described pattern formation mechanisms can be nicely seen in snapshots of the time evolution. Figure 4 illustrates how the strategy distribution evolves in the cases of the PC model (top) and the PD model (bottom) when the same parameter values are used as in figure 3. The first snapshot for the PC model shows the moment when C and PC players form common islands together, but the segregation of both cooperative strategies is just the beginning. In the second plot, both cooperative strategies have already largely segregated from each other and now mainly struggle with defectors. The third plot shows the nearly final state, where C and PC players still form independent clusters but punishing cooperators have largely replaced cooperators, as they are more successful in the battle with defectors. The final resulting strategy distribution containing only D and PC players is illustrated in the last plot.
For the PD model, the first plot in the bottom of figure 4 shows a state where the alliance of C and PD players is not yet established, so that defectors can spread. However, when the optimal mixture of cooperators and punishing defectors emerges (second plot in the bottom row), the two allied strategies, C and PD, can continuously crowd out defectors (third plot in the bottom row). It can be seen that the ratio of C and PD players stays essentially constant while both strategies spread, which indicates a self-stabilizing mechanism. If only cooperators conquered the territory previously occupied by defectors, the fraction of punishing defectors would locally decrease below a critical level, and cooperators would become vulnerable to exploitation by defectors. On the other hand, if only punishing defectors spread, they would not find enough cooperators to exploit, although they require this for their survival. As a consequence, the ratio of C and PD strategies is maintained at a typical value, which supports the spreading of the alliance best.
The concept of an optimal ratio of alliance members can explain why the phase of C + PD disappears for large fine values or high values of the synergy factor. Too large synergy factors keep defecting strategies at a low level, while too large fines prevent the required fraction of PD players from occurring. This is why the alliance does not work, and D players can spread again. At first sight, the phase diagram of the PD model and the functional dependence of the cooperation level in figure 2 appear to be paradoxical: when the punishment fine is increased (while the punishment cost is fixed, something that can happen in case of escalation), the cooperation level is reduced, although punishment intends quite the opposite. Based on the above-described argument, however, this paradox can be resolved: too big fines prevent the occurrence of the right mixture of the two strategies and, thereby, the emergence of a functioning alliance.
Figure 4. Snapshots of the evolving strategy distribution for the PC model (top row, panels (a)-(d)) and the PD model (bottom row, panels (a)-(d)), for the same parameter values as in figure 3. The PD strategy is shown in yellow. In contrast to the PC model, the punishing PD strategy would disappear without the presence of the C strategy in its vicinity. As soon as the optimal mixture of the C and PD strategies occurs, their alliance can efficiently spread in the whole system.
To support our argument, we have plotted stationary strategy distributions in the PD model for different fine values. As the top panel of figure 5 shows, we have used identical punishment costs to study the effect of the fine. Figure 5(a) illustrates the case where the punishment is too low to eliminate defectors, so that the resulting strategy distribution consists of cooperators and defectors, as in the spatial public goods game without punishment for r = 3.8. When the fine is increased, the alliance of cooperators and punishing defectors can crowd out non-punishing defectors, which enhances the level of cooperation (see figure 5(b)). A new phase, which additionally includes the D strategy, starts when the alliance between the C and PD strategies does not work anymore, because the fine is too large and hence the fraction of PD players is too small (see figure 5(c)). For higher fines, PD players cannot efficiently punish D players anymore, and as the fraction of punishing defectors goes towards zero, the system returns to the state that is typical for the spatial public goods game without punishment (cf figure 5(d) with figure 5(a)). For the PD model, one could, therefore, conclude that the 'institution of punishment' fails when values of the punishment fine are set too high.
Figure 5. Stationary strategy distributions in the PD model for different values of the fine at identical punishment cost. When the fraction of PD players decreases below a critical value for a large fine, the alliance of C and PD strategies does not work anymore. As a consequence, the D strategy spreads again. Upon increasing the value of the fine further towards high values, the outcome becomes identical to the one for the spatial public goods game without punishment.

Summary
In order to explore the impact of punishment in spatial public goods games, we have studied two minimalist models by adding either punishing cooperators (PC) or punishing defectors (PD) as an additional behavioral strategy. We have found that both punishing strategies can promote cooperation for synergy factors for which defectors would spread in the case of well-mixed interactions.
As we have pointed out, punishing strategies can spread in different ways. Punishing cooperators (PC) can crowd out 'lazy', non-punishing cooperators (C) above a certain value of the punishment fine, f . This solves the 'second-order free-rider problem' [36]-[38], i.e. the puzzle of why people perform punishment efforts despite their costs: the cooperation- and punishment-promoting mechanism is based on spatially restricted interactions between players, which supports the survival of non-defecting strategies via clustering and segregation [20], [39]-[43]. Through segregation, punishing cooperators can avoid being exploited by pure cooperators and fight against defectors more efficiently. Accordingly, defectors (conventional free-riders) and non-punishing cooperators (second-order free-riders) disappear eventually if the punishment fine exceeds a certain threshold. Larger punishment fines do not have any positive effects.
In contrast to punishing cooperators (PC), punishing defectors (PD) cannot survive alone. They need the presence of cooperators whom they can exploit, while the cooperators (C) need punishing defectors to punish and contain defection. The functionality of this alliance needs an optimal mixture of strategies to thrive. Once the optimal ratio between the C and PD strategies comes into existence, it is maintained by self-stabilization when conquering the territory of the rival D strategy. If external conditions prevent the establishment of this optimal ratio, the alliance cannot work. This explains the paradoxical re-entrant behavior found in the phase diagram of the PD model, according to which too high punishment fines generate the same results as no punishment at all. While the occurrence of alliances is possible in spatial games with more than two strategies, as is known from spatial population dynamics [44]- [46], here the resulting outcomes and dynamics provide interesting new examples of this fascinating phenomenon.