Stochastic evolutionary voluntary public goods game with punishment in a Quasi-birth-and-death process

Traditional replication dynamic model and the corresponding concept of evolutionary stable strategy (ESS) only takes into account whether the system can return to the equilibrium after being subjected to a small disturbance. In the real world, due to continuous noise, the ESS of the system may not be stochastically stable. In this paper, a model of voluntary public goods game with punishment is studied in a stochastic situation. Unlike the existing model, we describe the evolutionary process of strategies in the population as a generalized quasi-birth-and-death process. And we investigate the stochastic stable equilibrium (SSE) instead. By numerical experiments, we get all possible SSEs of the system for any combination of parameters, and investigate the influence of parameters on the probabilities of the system to select different equilibriums. It is found that in the stochastic situation, the introduction of the punishment and non-participation strategies can change the evolutionary dynamics of the system and equilibrium of the game. There is a large range of parameters that the system selects the cooperative states as its SSE with a high probability. This result provides us an insight and control method for the evolution of cooperation in the public goods game in stochastic situations.

biological populations very long ago. By behavioral experiments, experimental economists have also found that altruistic punishment can significantly improve the level of cooperation in the population 26 . However, the reality that the punishment itself is costly for it consumes time and energy; in addition, it needs to bear the risk of other's retaliation, which will produce the "second-order free ride" behavior. Then who will implement the punishment becomes a second-order dilemma for the problem. As Colman described that under the punishment mechanism, the interpretation of punishment is needed to replace the interpretation of cooperation 27 . Thus, various forms of peer (individual) and pool (institutional) costly punishment have been studied in the evolutionary public goods game [28][29][30][31][32][33][34][35][36] . For example, the introducing of punishing strategies of punishing cooperators and punishing defectors 28,29 ; the introducing of the moralists who cooperate while punishing non-cooperative behavior and immoralists who defect while punishing other non-cooperative behavior 30 ; tolerance-based punishment that individuals have the traits to punish the co-players based on social tolerance 33 ; self-organized punishment that allowing players to adapt their sanctioning efforts in dependence on the success of cooperation 34 ; conditional punishment that based the fine on the number of other punishers 35 ; probabilistic sharing of the responsibility to punish defectors 36 . Chen et al. 37 studied the competition and cooperation among different punishing strategies in the spatial public goods game. Chen and Perc 38 also investigated the optimal distribution of unequal punishment for the public goods game on a scale-free network. Recent studies related to this topic include implicated punishment that punish all the group members once a wrongdoer is caught 39 ; the heterogeneous punishment that divide punishers into categories according to their willing to bear for punishing 40 ; competitions between prosocial punishment and exclusion 41 ; etc. For a more detailed research progress on the punishment mechanism in evolutionary public goods game, the reader can refer to the recent review article of Perc et al. 42 .
The optional participation mechanism was first proposed by Hauert et al. 19 . By introducing a non-participation strategy in the game, the participant is free to choose whether or not to participate in the investment business of the public goods. The individual who does not participate does not contribute to the investment nor obtain the cooperative benefit. It is found that the introduction of non-participation strategy can resolve the cooperation dilemma in the public goods game to a certain extent. In the case of allowing non-participants, cooperation and the defection strategies can coexist. Hauert et al. established the replication dynamic equations of public goods game with punishment and non-participation strategies, and analyzed the evolutionary dynamics of the model 43,44 . The replication dynamic model assumes that the number of individuals in the population is infinite or large enough. It uses differential equations to describe the evolution of different strategies in the system. The evolutionary stable state of the system is analyzed by the stability of the differential equations at the equilibrium point. The stable equilibrium point is the evolutionary stable strategy (ESS) of the game. In order to describe the evolutionary dynamics in a finite size population and with mutation in the evolutionary process, Hauert et al. established an evolutionary public goods game model with punishment and non-participation strategies using the Moran process in the evolutionary biology 20 . Moran process based model uses the fixation probability to analyze the evolutionary dynamics of only two strategies in the system. It is generally necessary to assume weak selection in the analysis.
Because of the strong interpretability, the evolutionary voluntary public goods game based on the model of Hauert et al. has been widely studied by scholars. One of the extensions focuses on the use of different evolutionary dynamics to study the model. Wang and Xu et al. introduced the bounded rationality of the participants, using the approximate optimal reaction dynamic equations to study the evolutionary voluntary Public goods game with punishment 45,46 . Xu et al. introduced a self-adjustment rule for the strategy updating of individuals, and under this rule to study the evolutionary public goods game with non-participation strategy 47 , and evolutionary public goods game with both punishment and non-participation strategies 48 , respectively. Song et al. used the Logit evolutionary dynamics to study the cooperative behavior of groups in the public goods game with non-participation strategy 49 . It should be pointed out that all the above evolutionary dynamics studies are based on differential equations, and all the equilibrium analysis is based on the concept of the ESS.
In addition, the impacts of other factors on the cooperative behavior of the population have also been studied in the voluntary public goods game. For example, Dercole et al. found that under the optional participation mechanism, there was no need for excessive punishment to achieve the evolution of cooperation 50 . Rand et al. studied the impact of anti-social punishment on cooperative behavior of the population in the voluntary public goods game 51 . Zhong et al. studied the impact of the aggregation of cooperation and the liquidity of the group due to pursuit of profit on cooperative behavior of the population in the voluntary public goods game 52 . Nakamaru et al. studied the effects of different permit mechanisms on group cooperation and found that allowing deportation has a better effect than unconditional acceptance in promoting cooperation 53 . Valverde et al. studied the influence of the fluctuation on the voluntary public goods game by introducing a simple stochastic liquidity in the group interaction network 54 .
In contrast to the above studies, this paper uses the stochastic evolutionary model proposed by Amir et al. 55 , and studies the stochastic stable equilibrium (SSE) of voluntary public goods game with punishment in a finite size population. The concept of SSE was first proposed by Young and Foster 56,57 . Unlike ESS, SSE can effectively describe the continuous noise impact on the system during the evolutionary process of the system, which can better analyze the stability of the equilibrium in a stochastic environment. Based on the SSE, in this paper, the evolutionary process of strategies is described as a generalized quasi-birth-and-death process with ergodicity. The stochastic stable state of the system under combinations of different parameters is analyzed by the limit distribution of the stochastic process. By analyzing the SSE of the system, we can reveal the influence of these parameters on the cooperation behavior of the population in a stochastic situation.

Results
Voluntary Public goods game with punishment. We introduce two strategies of punishment and non-participation in the classical public goods game model. The punishment strategy participates in the investment of the public goods and punishes the defection strategies (free rides) randomly. Let α denote the punishment probability. Punishment is costly. Let γ denote the unit cost of punishment for the punisher, β denote the corresponding unit penalty for the individual who is punished ( ) γ β < . Non-participation strategies do not participate in the investment nor do they get the average investment income, but can get a fixed proportion of income.
Assuming that there are four types of strategies in the population, namely cooperation type (denote as C), defection type (denote as D), punishment type (denote as P) and non-participation type (denote as L), respectively. Each time N individuals from the population are randomly selected to participate in the public goods game. Each individual chooses the corresponding strategy according to its type. Let c denote the cost of investment, and rthe return on investment coefficient < < r N (1 ) of the public goods. Cooperation, defection and punishment type are collectively referred to as participation type. Let S (1 ≤ S ≤ N) denote the number of participation type individuals when it selected randomly from the N individuals. And the numbers of cooperation, defection and punishment type are n C , n D and n P respectively. When ; and non-participation type individuals get a fixed income σc ( σ < < − r 0 1 ). When = S 1, there is only one participant, and the game can not happen; we assume that this participant can only be forced to obtain a fixed income as well as the non-participants.
When the population size is M, and the number of cooperation, defection, punishment and non-participation type individuals in the group are i, j, k, l respectively. The expected payoffs of the four types of strategies are respectively: Stochastic Evolutionary dynamics. The number of individuals in each type of strategies will evolve with the amount of their payoffs. In order to describe the evolutionary process, we introduce a stochastic process z(t).
denote the number of cooperation, defection, punishment and non-participation type strategies in the population at time t, and define z(t) as the system state. For convenience, we abbreviate it as z t z t z t ( ( ), ( ), ( )) 1 2 3 . The state space of the system is = ; , , }  , and the number of elements in the state space is S At each time, the individual in the population adjusts its strategy according to its expected payoff, and the adjustment of the individual's strategy leads to the change of the system state. The three assumptions: inertia, myopic and mutation in the literature 55 about the bounded rationality of individuals in the population are used in our model. Due to inertia, we can assume that it is impossible to have more than two individuals to adjust their strategies simultaneously at one time. Myopic refers to the individual when choosing its strategy; it will only consider the current payoff, regardless of the payoff in the future. Mutation refers to the possibility that individuals may choose a non-optimal strategy with a small probability because of the complex decision-making environment and the limited nature of individual cognitive ability.
According to the above assumptions, when the system state z t z t z t ( ( ), ( ), ( )) , the transfer rate of the strategy x towards strategy y can be described as x y , the individual in the i j k ( , , ) state has a more incentive to move from strategy y to strategy x, but because of the mutation, the transfer rate of the strategy x to strategy y is p x y i j k ( , , ) ε = → . Thus, the parameter ε can be seen as the noise intensity in the environment, and κ can be understood as the speed at which the individual responds to the environment.
When i, j, k, l equals to zero respectively, the corresponding C does not make sense. At this point, the payoffs of cooperation, defection, punishment and non-participation type strategies are defined as the average payoff of the population.
denote the transfer probability of the system from state I to state I′ after time t. That is: Stochastic Stable Equilibrium. While ε > 0, this process is ergodic. According to the properties of the stochastic process, when → + ∞ t , the limit of exists and it's independent of the initial state I. Let Then ε ′ v I is the limit distribution of the quasi-birth-and-death process of reaching arbitrary state I′ ′ ∈ I S ( ) when the system noise is ε. According to the limit distribution, it is possible to determine the evolutionary stable state of the system under arbitrary noise intensity. Further, when the noise parameter is gradually reduced to zero, let According to v I′ , we can determine the limit state of the system and its probability distribution when the system noise is small enough. According to Young's description in reference 57 Numerical experiments. The Gauss-Seidel iterative algorithm introduced by Stewart in his monograph 58 can be used to calculate the limit distribution of the above quasi-birth-and-death process. In the following, we will show the SSE of the system in our stochastic evolutionary dynamics model, and investigate the effects of game parameters on the SSE of the system. By a vast of numerical simulation experiments, we found that when ε is positive and small enough, and for any parameters, the system has the limit distribution of more than zero only in the (0, 0, 0), (0, 0, 1), (0, 1, 0), ( be the SSEs of the system. In addition, according to the description of the model, when there is only one participation type individual in the population and whether it is cooperation or defection, this strategy is equivalent to the non-participation strategy. Thus (0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0) states are essentially equivalent to the situation that all individuals choose the non-participation strategy, and we denote these states as the "All L" state. In addition, we denote the state (0, M, 0) as the "All D" state.
… ) indicate the co-existence of the cooperation and punishment strategies; or all the individuals choose the punishment strategy; or all the individuals choose the cooperation strategy, and we denote them as the "C + P" states. In the following, and study the impacts of α, γ, r, σ on the probabilities of the system to choose different stable equilibriums.
By numerical experiments, Fig. 1 shows the SSEs of system and the probabilities for the system to select them when punishment probability and the cost α γ = .
= . 0 1, 03, and parameters r ( , ) σ continuously change in the region σ . ≤ < < < − r N r 1 1 ,0 1 . It can be seen from the figure, when σ is small and r is large, the system selects the "All D" state with probability 1, and only the "All D" state is the stochastic stable state of the system. When σ is small and r is small, the system selects the "All L" state with probability 1, and only "All L" state is the stochastic stable state of the system. When σ is slightly larger and r is small, the system selects the "All L" state with a large probability, and selects "C + P" states with a small probability; "All L" state and "C + P" states are the stochastic stable states of the system. In all these situations, the probability for the system choosing "C + P" states is not large, indicating that when the punishment probability α is small, a small fixed income coefficient σ has a limited effect on the cooperation behavior of the population. Only when σ is not too small, and r is large, the system selects "C + P" states with a large probability. In this situation, the effect for the promotion of cooperation is significant. Interestingly, the parameters range that the system selects different stable states with high probability is not regular. As shown in the figure, the probability that the system chooses different states exhibits rich and nonmonotonic features.
In order to show all the stable states and their limit distributions, we also show the results in the simplex S4 form. Because there are many parameter combinations of (r, σ), and each combination of parameters corresponds to a simplex S4. So we only choose 4 groups for 12 combinations of (r, σ) for fixed α = 0.1, γ = 0.3 to show the results. The results of the four groups are shown in Figs 2-5 respectively. For more combinations of (r, σ) as α γ = .
= . 0 1, 03, the possible stochastic stable states and their limit probabilities can be acquired from the supplementary attachment file named Limit Probability Data.xlsx. For every piece of data in the Excel table, we can easily draw its simplex S4.
We gradually increase the parameter value of punishment probability α. Figure 6 shows when the punishment probability value 0 5 α = . and the punishment cost 0 3 γ = . , the SSEs of the system and the probabilities for the system to select them change with parameters σ r ( , ) ( r N r 1 1 , 0 1 σ . ≤ < < < − ) varying. It can be seen from   , the increase in the punishment probability greatly expands the parameter ranges for the system selecting "C + P" states, which can greatly promote the evolution of cooperation. Except for the following two cases: for small σ and large r, the system selects the "All D" state with a small probability; for small σ and small r, the system selects the "All L" state with a not too large probability. For other parameter ranges, the system selects "C + P" states with a large probability or the probability equals to 1. If further increase the punishment probability α, the conclusions are similar. Figure 7 shows the corresponding case when α = 1 and 0 3 γ = . . Compared with Fig. 6, the parameter range that the system choosing "C + P" states as its stable states has further expanded, or that the probability of choosing "C + P" states is further increased. As can be seen from the figure, for any combination of parameters, the probability of the system choosing "C + P" states is greater than 0.94, and the parameter range of the system to choose the "All D" state or the "All L" state with a small probability is consistent with Fig. 6.   = . .
Finally, it is necessary to analyze the effect of the punishment cost on the system's selection of various types of stable equilibriums and their probabilities. Take parameter values of Fig. 6 in which punishment probability and the cost 0 5, 03 α γ = .
= . as a reference. Figure 8 shows the stochastic stable equilibriums and their probabilities of the system to choose them with the change of σ r ( , ) ( r N r 1 1 ,0 1 σ . ≤ < < < − ) when 0 5 α = . and γ = . 0 8. Compared with Fig. 6, it can be found that the increase of punishment cost narrows the parameter range that system selecting the "C + P" states with high probabilities, and increases the probabilities that system selecting "All D" and "All L" states under the same parameter. For small σ and large r, the system selects the "All D" state with a large probability. For small σ and small r, the system selects the "All L" state with a large probability. It is important to note that, although the probability of punishment is not large and the cost of punishment is large, the system has a wide range of parameters of selecting "C + P" states, which also explains the effectiveness of this mechanism in promoting cooperation.
At the same high punishment cost, we increase the punishment probability. In order to compare the results with that in Figs 3 and 4, we let 1, . And Fig. 9 shows the corresponding stochastic stable equilibriums   . and their probabilities for the system to choose them with the change of σ r ( , ) ( r N r 1 1 ,0 1 σ . ≤ < < < − ). In contrast to Fig. 7, it is found that under the same parameters, the increase in punishment costs narrows the parameter range of the system selecting "C + P" states with a large probability, and greatly increases the probability of the system selecting the "All D" state, and slightly increase the probability of the system selecting the "All L" state. When punishment cost is large, for small σ and large r, the system chooses the "All D" state with a large probability, which cannot occur for small punishment costs. In contrast to Fig. 8, it is found that when the punishment costs is high; the increase in the punishment probability is very effective for suppressing the system to choose the "All L" state; because the maximum value of v AllL is reduced from 0.8 to near 0.14. But the suppression effect for the system to choose the "All D" state is not significant because the maximum value of the system to choose the "All D" state v AllD is almost not changed, nor the parameter range for system to choose this state.
It can be seen from Figs 1-5 that the introduction of the punishment strategy and the non-participation strategy can change the evolutionary dynamics and equilibriums of the system in the public goods game. The system selects the "C + P" states with large probabilities over a large range of parameters. When other parameters are fixed, the influence of the four parameters that are the punishment probability α, the punishment cost γ, the fixed income coefficient of the non-participation strategy σ, the investment income coefficient of the public goods r, on the stochastic stable equilibriums of the system and their probabilities for the system to choose them can be summarized as follows: (i) For small α and small γ, there are parameter regions of r ( , ) σ for the system to choose the three stochastic stable states of the "All L", "All D" and "C + P" states with large probabilities respectively. In general, when σ is small and r is large, the system chooses the "All D" state with a large probability; when σ is small and r is not too large, the system chooses the "All L" state with a large probability; when σ is not too small and r is large, the system chooses the "C + P" states with a large probability. The parameter regions for the system to choose these three stable states with large probabilities show some irregular features. (ii) When γ is small, the increase of α will greatly expand the parameter ranges that system selecting the "C + P" states with large probabilities, which can greatly promote the evolution of cooperation. When α increases to a certain extent (the specific value depends on γ), the system selects the "C + P" states with a probability close to 1 for any parameters of σ r ( , ). (iii) When γ is large, the increase of α can only expand the parameter range for the system choosing "C + P" states with large probabilities to a small degree. The increase in α is very effective for the suppression of system selecting the "All L" state, but the suppression effect of the system to choose the "All D" state is not significant. Even if α is increased to 1, there is still a range where the system chooses the "All D" state with a large probability. (iv) Overall, the increase of γ will reduce the parameter range of the system to choose "C + P" states with a large probability, and greatly increase the probability of the system choosing the "All D" state, and slightly increase the probability of the system choosing the "All L " state under the same combination of parameters.
Discussing. The evolution process of strategies of public goods game in a finite size population is modeled by a generalized quasi-birth-and-death process under the framework of stochastic stable equilibrium in this paper. Both punishment and non-participation strategies are introduced into the traditional public goods game. The stochastic stable state of the evolutionary system is analyzed by the limit distribution of the stochastic process. This paper focuses on the influence of parameters such as the punishment probability, the punishment cost, the fixed income coefficient of non-participation strategy, and the investment income coefficient of the public goods on the system's selection of different equilibriums and their probabilities.
The contribution of this paper is mainly manifested in the following aspects: First, the model of this paper further develops the model of Amir et al. 55 , and generalizes the Markov-based evolutionary process from one-dimension to multidimensional case, while the analysis under multidimensional situation is more difficult than that in one dimension. Secondly, the model of this paper further enriches the concept of SSE. Since the concept of evolutionary games has been put forward, ESS has been regarded as its core equilibrium concept. There is little literature on the analysis of SSE in a stochastic situation. As far as we know, in the recent literature, Quan et al. 59 studied the SSE of 2 × 2 symmetry stochastic evolutionary games in finite populations with non-uniform interaction rate based on a one-dimension Markov process. And other models basically based on the stochastic differential equations for analysis which assume a infinite size population, such as recent literature of Huang et al. 60 and Liang et al. 61 . In this paper, the model is based on the multidimensional stochastic process which can describe the finite size population and the multi-strategy situation. Then, the model of this paper has different characteristics with the most commonly used Moran process model in finite size population. For the Moran process based model, it is generally necessary to assume weak selectivity in the analysis, and the fixation probability method can only analyze the evolutionary dynamic with only two kinds of strategies in the system, which has some limitations. Finally, this paper obtains the range of parameters of all possible stochastic stable equilibriums that the system may choose by numerical experiments. Among the parameters, some may not be changed according to the actual situation, but others can be adjusted within a certain range, such as the punishment probability, the fixed income coefficient. Therefore, by adjusting the corresponding parameter values, we can facilitate the system to select "C + P" states with large probabilities, which provides a feasible method for realizing the evolution and control of cooperation in public goods game. Each time N individuals are randomly selected from the group for the public goods game. In a sample, when the number of participation type is S 0 , for a defection type, the probability of encountering other − S 1 0 ( > S 1 0 ) participation type individuals is:  We do not consider the penalty items in our analysis at first. After this treatment, the strategies of the cooperation type and the punishment type are the same, including their payoffs. When the number of participation type individuals is S 0 , and the total number of cooperation and punishment type individuals is m, then the defection type individual's payoff is rcm S 0 (do not consider the penalty items). Thus, for a defection type individual, when it's in the state that the number of participation type individuals is S 0 > S ( 1) 0 , the expected payoff for it is:  Because when the number of participation type is S 0 , and a participation type individual meets with other S 1 0 − participation individuals, when its type change from the defection to the cooperation, the amount of its payoff reduction is −

Expected
Thus, in the whole population, the expected payoff difference between the defection type individuals and cooperation (or punishment) type individuals is (do not consider the penalty items): Taking into account the corresponding penalty items and penalty cost for each type of strategy and the above-mentioned expected payoffs are corrected as follows.
For defection type individuals, the penalty item brought by the punishment strategies is Where α is the probability of the defection type individuals being punished by the punishment type strategy, β is the penalty intensity coefficient, is the expected number of punishment type individuals in a sample. For punishment type individuals, the punishment cost is αγ Figure 10 shows the possible state transfer processes of the system. Obviously, the evolution of the system is a three-dimensional state-limited and discrete Markov process; more specifically, a generalized quasi-birth-and-death process. According to the system evolutionary rules and the transition rate between different strategies, after a small enough time t, the probabilities of the system transfers from state I i j k ( , , )  can be obtained by equations Q 0 1 = , where 1 is the column vector with all components equal to one, and 0 is the column vector with all components equal to zero.