The dynamics of corruption under an optional external supervision service

External supervision services play an important role in combating corruption by detecting potential collusive bribery. This work aims at studying the dynamics of collusive bribery when participants have the option of engaging the external supervision services. To do so, we construct a basic model where collusive bribery can happen between the defecting participants who aim to escape from a punishment by offering a bribe to rule enforcers who monitor interactions among all participants. Among rule enforcers, only the corrupt ones accept the bribe and ignore the violations. The cooperative participants can engage the external supervision service at a certain cost to avoid the risk of potential collusive bribery. Under the framework of evolutionary game theory, we find that a higher initial fraction of honest enforcers is more likely to lead to a trusting cooperating equilibrium. We also find that, when allowing random exploration of available strategies, increasing the exploration rate of rule enforcers is effective in combating corruption for both infinite and finite populations. Lastly, we find that minimizing the cost of external supervision services is not always good. When the system evolves into a cooperating equilibrium, a low cost of external supervision service induces unnecessary costs of seeking external supervision. When the strategy profiles exhibit stable oscillations, there exists an optimal cost of external supervision, considering the trade-off between minimizing the chance of exposing cooperative participants to collusive bribery and strengthening the punishment on the corrupt enforcers. Premised on the results, we discuss practical management suggestions.


Introduction
From simple groups of individuals to complex alliances of countries, incentives are a common way of constraining behaviors, ensuring interactions are compliant with any specified rules, and a means of promoting cooperative behavior [1]. Incentives are always enforced institutionally by an independent rule enforcer [2] who has complete oversight of the participants and executes punishments to the rule-breakers (resp. defectors), or rewards/compensates the rule-obeyers (resp. cooperators) [4]. However, the effectiveness of incentives can always be undermined by pervasive corruption, as institutions are still operated by individuals with selfish motives [5][6][7][8]. These corrupt authorities might accept bribes from the defecting participants who intend to escape from the punishment [9,10] or to get the reward [11]. Such collusive bribery can encourage non-compliant behaviors, erode trust among the participants, and ultimately lead to the collapse of the social contract [9].
Combating pervasive corruption has thus attracted the attention of scholars from different fields. A consensus has emerged that transparency is critical for mitigating corruption. For example, Muthukrishna et al. demonstrated experimentally the effect of transparency on mitigating corruption in public goods games (PGGs) [5]; Brusca et al., based on survey data, analyzed the positive effect of transparency on fighting corruption [12]. Supervision services are effective and common countermeasures for guaranteeing transparency in the face of corruption: companies can hire independent regulatory authorities to detect potential fraud [13,14]; citizens can pay for certification [15], appraisal, or identification services if they suspect fraudulent behavior, and prepare for further available public activities such as complaints [16,17], whistle-blowing, or reporting [18]. For the sake of brevity, we refer to such services, which are able to survey the process, assess the value, detect fraud, and provide certifications, as external supervision services. These external supervision services provide participants with an opportunity to supervise authorities by reducing the information asymmetry [19], thereby improving the transparency of the system, and ultimately combating collusive corruption.
External supervision services result in extra costs [16,20,21]. Naturally, the level of the cost can influence the participants' willingness to engage such services. If the cost is too high, fewer participants may pay for the service, which leaves collusive bribery hard to discover, and thus corruption can breed. However, not only the cost but also the environment can influence the engagement strategy, since the benefit of the engagement depends on the probability of discovering the collusive bribery. If the probability of being cheated by peers and the corruption level of rule enforcers are high, engaging the service is sensible. But if rule enforcers are honest and the participants are all cooperators, engaging the external supervision service is no longer a sensible strategy. Meanwhile, the participants' engagement with the external supervision service in turn shapes the environment. Rule enforcers have to weigh the extra income from the bribe against the risk of being discovered. As a consequence, when more participants engage the service, rule enforcers tend to be honest.
Accordingly, the introduction of external supervision services can potentially control the corruption of rule enforcers and subsequently decrease the fraction of defectors among the participants. But these effects can be influenced by other factors; for example, the cost of the external supervision service can be decisive. This work investigates how the paid external supervision service influences the evolutionary dynamics of strategies employed by both rule enforcers and participants. This problem is ideally suited for analysis by evolutionary game theory (EGT), an analytical framework widely applied in analyzing and predicting the dynamics of strategy profiles during the evolution process [22][23][24][25], relying on the Darwinian process of natural selection that drives participants toward the optimization of reproductive success [26,27].
Recently, a number of papers have applied EGT to explore the key factors that might influence the corruption level of rule enforcers [28,29], and the evolutionary dynamics of rule enforcers and participants [2,9]. Additionally, EGT has been used to examine the influence of different anti-corruption controls on the corruption level, including social exclusion [30] and government authority's direct investigation [18]. These anti-corruption controls are of zero cost for the participants. However, the case in which the control requires a certain cost from the participants is not yet fully understood [31]. Verma et al. explored how the complaint cost impacts the proliferation of harassment bribes across differently structured populations [16]. They found that with a lower cost of complaining, participants tend to complain, and honest rule enforcers proliferate. In harassment bribes, participants are limited to interacting with the rule enforcers and can choose to either bribe or complain. In contrast, collusive bribes involve interactions among participants [32], whose payoff depends not only on the interactions with the rule enforcers, but also on their interactions with peers, in which they decide whether to collaborate or betray. This is the distinction between the strategy spaces of participants in collusive bribes and in harassment bribes. In addition, a lower complaint cost could narrow the payoff difference between participants who complain and those who do not, which may encourage complaint abuse and eventually increase unnecessary cost among the participants. It is hence arguable whether minimizing the complaint cost is the optimal choice within the collusive bribery environment. In this article, we thereby aim to answer the following questions: 1) to what extent can the external supervision service combat corruption? 2) how do other key factors, such as the service cost, influence its effectiveness in combating corruption? 3) is minimizing the service cost always optimal for the market?
To address these questions, we consider a general market composed of two parties: rule enforcers who ought to enforce the incentives but face the temptation of bribes, and the participants who can play games pairwise. Among the participants, the cooperators are provided with the option of engaging the external supervision service, which can prevent potential loss caused by collusive bribery. To investigate the effect of such external supervision services on combating collusive bribery, we first construct a model to mimic the co-evolution of the strategies in both parties. The evolution of the strategies is then analyzed using replicator dynamics in an infinite and well-mixed market; furthermore, we apply numerical simulation experiments to study the evolution within finite markets of different sizes. Our analysis reveals several interesting results and, premised on them, we propose a number of strategies to help manage such markets.
The remainder of this paper is organized as follows: Section 2 introduces the basic model composed of the bribery game between rule enforcers and the participants, and the dilemma game played by participants. In Section 3, we provide the analytical results of player-enforcer dynamics with differing service cost in an infinite market, and we also provide some robustness analysis when participants can explore different strategies. Section 4 discusses the stochastic dynamics within finite markets of various scales. We summarize the findings and compare them with related work in Section 5.

Table 3. Payoff matrix with corrupt enforcers implementing incentives.

The bribery game model with the external supervision service
Let us first consider a population of participants within a market. We assume interactions within this market operate as a pairwise social dilemma between two participants (also called players). The action space of players is defined as $\{C, D\}$, where $C$ is the strategy of the cooperators who obey the market rules, and $D$ is the strategy of the defectors who break the rules. In Table 1, we present the payoff matrix of pairwise players. It can be regarded as a classic one-shot prisoner's dilemma where $b$ is the payoff for mutual cooperation, and $c$ is the temptation of choosing to defect [33,34]. It can also be considered as a donation game where $c$ is the cost for donation, and $b + c$ is the payoff when free-riding [35].
Incentives are needed to maintain the order of the market. The payoff matrix shown in Table 1 indicates that defecting is the dominant strategy. Without incentives, all self-interested players will choose $D$, which leads to a market of pure defectors in which no one gains any benefit. Accordingly, rule enforcers need to detect and punish the defectors, and protect the interests of the cooperators, to ensure the order of the market. Since the implementation of the incentives usually comes at a cost [3,36], we assume that players need to pay $C_0$ ($C_0 < b$) to rule enforcers as a commission fee for joining this market.
In this model, both negative and positive incentives are considered. Each of the defectors is penalized with a fine $f$, where $f > c$. A share $kf$ ($k \in (0, 1)$) of the fine is used to cover the cost of rule enforcers in monitoring the market and collecting the fine. Each cooperator who was cheated by a defector receives a compensation of $(1 - k)f$ from rule enforcers. Without loss of generality, we set $k = 0.5$. Under this mechanism, the dominant strategy is $C$, as shown in Table 2, and the market evolves into one that contains only cooperators. When information is asymmetric, collusive bribery can happen. First, the defectors are more likely to bribe, as the probability that players discover the defection is low. As long as the cost of the bribe $B$ is less than the profit gained from cheating and less than the fine ($B < c < f$), bribing is preferable for defectors. Second, rule enforcers also tend to be corrupt, driven by the additional income $B$ and by the low likelihood that the collusive bribe is detected. Such hidden bribes and corruption can change the payoff matrix completely. Table 3 describes the scenario in which bribes and corruption happen. Comparing Table 3 with Table 2, it can be observed that the existence of bribes and corruption makes $D$ the dominant strategy.
Hence, the strategy of players fully depends on whether the rule enforcers are corrupt or not; we denote the action space of enforcers as $A_u = \{U_h, U_c\}$. For each pair of players, there is one enforcer in charge of incentive enforcement. If the rule enforcer is honest, $C$ is the dominant strategy; otherwise $D$ is the dominant one. Based on these variables, we represent the level of corruption by the fraction of corrupt enforcers.
If there are no mechanisms to break the information asymmetry, then honest enforcers can become corrupt through social learning or natural selection, because of the additional income from the bribe $B$. Fortunately, the cooperators can protect their interests by engaging external supervision services at cost $a$ to check the interaction, and subsequently be informed about the existence of non-compliant behavior. Once the collusive bribery between the defector and the corrupt enforcer is exposed, the corrupt enforcer needs to not only return the compensation $f/2$ and the commission fee $C_0$ but also cover the cost $a$ for the cooperator. In addition, the defector is fined $f$. Hence, cooperators can choose whether or not to engage the external supervision service. The strategy in which the service is (resp. is not) engaged is denoted as $C_a$ (resp. $C_{\bar a}$), and the corresponding cooperators are called cautious cooperators (resp. trusting cooperators).
The payoff matrix of $C_a$, $C_{\bar a}$ and $D$ facing an honest enforcer is denoted as $A_h$, and that facing a corrupt enforcer is denoted as $A_c$. For the rule enforcers, $U_h$ gets $2C_0$ in all combinations of participants, while $U_c$ gets $C_0 + B - a$ in the event $(C_a, D)$, $2C_0 + B$ in the event $(C_{\bar a}, D)$, and $2C_0 + 2B$ in the event $(D, D)$. The strategy of rule enforcers is thus decided by the fractions of cautious cooperators ($C_a$), trusting cooperators ($C_{\bar a}$) and defectors ($D$) in the population. In the next section, we discuss the strategy dynamics of participants and rule enforcers.
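As an aid to the reader, the rules stated in this section admit the following reading of $A_h$ and $A_c$ (rows: focal strategy $C_a$, $C_{\bar a}$, $D$; columns: co-player strategy in the same order). This is a reconstruction from the prose with $k = 0.5$, not a copy of the original matrices. It reproduces the expressions $(c - B)/f$, $2a/(2a + f + 2C_0)$, $(B - c)/(B - f)$ and $2(B - c)/(2B - 3f)$ that appear later in the text. One entry is an explicit assumption: that a cautious cooperator facing a defector under an honest enforcer is not reimbursed for $a$.

```latex
A_h =
\begin{pmatrix}
b - C_0 - a & b - C_0 - a & -c - C_0 - a + \tfrac{f}{2} \\
b - C_0     & b - C_0     & -c - C_0 + \tfrac{f}{2} \\
b + c - C_0 - f & b + c - C_0 - f & -C_0 - f
\end{pmatrix},
\qquad
A_c =
\begin{pmatrix}
b - C_0 - a & b - C_0 - a & -c + \tfrac{f}{2} \\
b - C_0     & b - C_0     & -c - C_0 \\
b + c - C_0 - B - f & b + c - C_0 - B & -C_0 - B
\end{pmatrix}
```

In $A_c$, the top-right entry collects the refund of $C_0$ and $a$ plus the compensation $f/2$ once the bribery is exposed, and the bottom-left entry adds the fine $f$ to the bribe $B$ already paid by the defector.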

Player-enforcer dynamics in an infinite population
For convenience, we denote the total number of players as $N$; the numbers of players who choose strategies $C_a$, $C_{\bar a}$, and $D$ as $\#C_a$, $\#C_{\bar a}$, and $\#D$; and the fraction of strategy $S_i$ as $\#S_i/N$ ($S_i \in \{C_a, C_{\bar a}, D\}$). The strategy profile is thus denoted as $x = (x_1, x_2, x_3) = (\#C_a/N, \#C_{\bar a}/N, \#D/N)$. Analogously, let $M$ be the total number of rule enforcers, $M = N/2$; the strategy profile of rule enforcers is $y = (y_1, y_2) = (\#U_h/M, \#U_c/M)$. The initial states of the strategy profiles are denoted as $x(0)$ and $y(0)$. $x$ and $y$ evolve within the simplex $S = \Delta_3 \times [0, 1]$ spanned by the six points $(C_a, U_h)$, $(C_{\bar a}, U_h)$, $(D, U_h)$, $(C_a, U_c)$, $(C_{\bar a}, U_c)$, $(D, U_c)$. This section analyzes the dynamics of $x$ and $y$ within an infinite and well-mixed population under the framework of EGT [35,37].
Considering an infinite and well-mixed population, the dynamics of $x$ and $y$ follow the replicator equations (Eq. (1)), where $\pi(U_h)$ and $\pi(U_c)$ are the payoffs of honest enforcers and corrupt enforcers (Eq. (2)). We can thereby analyze the dynamics of $x$ and $y$, and then discuss the robustness of the results under different exploration rates.
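A minimal numerical sketch of such replicator dynamics is given below. It is not the paper's Eq. (1) and Eq. (2) verbatim. The payoff matrices `A_h` and `A_c` are inferred from the rules in Section 2 (the entry for $C_a$ meeting $D$ under an honest enforcer assumes $a$ is not reimbursed), and the enforcer payoff assumes that each enforcer oversees a randomly drawn pair, so that the events $(C_a, D)$, $(C_{\bar a}, D)$ and $(D, D)$ occur with probabilities $2x_1x_3$, $2x_2x_3$ and $x_3^2$. The function names, the Euler step and the renormalization are illustrative choices.

```python
# Parameter values quoted in the text; `a` is the supervision cost.
b, c, C0, B, f, a = 0.5, 0.5, 0.2, 0.2, 2.0, 0.1

# Rows: focal strategy (C_a, C_abar, D); columns: co-player strategy.
# ASSUMPTION: entries are inferred from the prose, not copied from the paper.
A_h = [
    [b - C0 - a, b - C0 - a, -c - C0 - a + f / 2],
    [b - C0, b - C0, -c - C0 + f / 2],
    [b + c - C0 - f, b + c - C0 - f, -C0 - f],
]
A_c = [
    [b - C0 - a, b - C0 - a, -c + f / 2],
    [b - C0, b - C0, -c - C0],
    [b + c - C0 - B - f, b + c - C0 - B, -C0 - B],
]


def player_payoffs(x, y):
    """Expected payoff of C_a, C_abar, D against population x and enforcers y."""
    return [
        sum(x[j] * (y[0] * A_h[i][j] + y[1] * A_c[i][j]) for j in range(3))
        for i in range(3)
    ]


def enforcer_payoffs(x):
    """Expected payoff of U_h and U_c when overseeing a random pair (assumed)."""
    x1, x2, x3 = x
    pi_h = 2 * C0
    pi_c = 2 * C0 + 2 * x1 * x3 * (B - a - C0) + 2 * x2 * x3 * B + 2 * x3 ** 2 * B
    return [pi_h, pi_c]


def replicator_rhs(x, y):
    """Right-hand side of the two coupled replicator equations."""
    px, py = player_payoffs(x, y), enforcer_payoffs(x)
    avg_x = sum(xi * pi for xi, pi in zip(x, px))
    avg_y = sum(yi * pi for yi, pi in zip(y, py))
    dx = [xi * (pi - avg_x) for xi, pi in zip(x, px)]
    dy = [yi * (pi - avg_y) for yi, pi in zip(y, py)]
    return dx, dy


def integrate(x, y, steps=5000, dt=0.01):
    """Forward-Euler integration; renormalizing keeps x and y on their simplices."""
    for _ in range(steps):
        dx, dy = replicator_rhs(x, y)
        x = [xi + dt * d for xi, d in zip(x, dx)]
        y = [yi + dt * d for yi, d in zip(y, dy)]
        sx, sy = sum(x), sum(y)
        x = [xi / sx for xi in x]
        y = [yi / sy for yi in y]
    return x, y


x_end, y_end = integrate([0.25, 0.25, 0.5], [0.1, 0.9])
print(x_end, y_end)
```

The initial state matches the one used for Fig. 1(d). A higher-order integrator would be preferable for studying the oscillation period quantitatively.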

Player-enforcer dynamics without exploration
The simplex in Fig. 1(a) represents the dynamics of $x$ with honest enforcers ($y_1 = 1$), while the simplex in Fig. 1(b) represents the dynamics of $x$ with corrupt enforcers ($y_1 = 0$). The values of the parameters $b$, $c$, $C_0$, $B$, and $f$ have been set to 0.5, 0.5, 0.2, 0.2, and 2, respectively, based on previous studies [2,3,9,36,38]. Changing these parameter values can impact the players' and rule enforcers' payoffs, thereby leading to different results. However, in this study, we have opted to use commonly employed values for these parameters and focus only on varying the level of $a$ and the exploration rates.
In the context of the evolutionary game theory framework, the boundary points that span $S$ are invariant states. Therefore, the vertices of $\Delta_3$ are fixed points [35]. In both Fig. 1(a) and (b), trusting cooperation ($C_{\bar a}$) is the dominant strategy on the edge $C_aC_{\bar a}$; the cautious cooperators thus evolve into trusting ones, and $x = (1, 0, 0)$ is always a saddle point. With pure honest enforcers, $x^* = (0, 1, 0)$ is the only asymptotically stable fixed point.
With pure corrupt enforcers, by contrast, there are no stable fixed points. The growth of $C_{\bar a}$ makes strategy $D$ the dominant one; hence, the players' strategies adapt from $C_{\bar a}$ to $D$. Then, with the increase of $x_3$, cooperators are motivated to be cautious again. That is why the three fixed points located at the vertices of the simplex $\Delta_3$ are unstable, as Fig. 1(b) shows. Apart from the three vertices, there is one interior fixed point $x_1^*$.

Fig. 1. The white nodes are unstable fixed points, gray nodes are saddle points, and black nodes are asymptotically stable points. In the simplex $S = \Delta_3 \times \{1\}$ ($y_1 = 1$), the dominance of pure trusting cooperation ($x^* = (0, 1, 0)$) is the only asymptotically stable point. In the simplex $S = \Delta_3 \times \{0\}$ ($y_1 = 0$), there is an unstable interior fixed point $x_1^*$, and with a lower $a$, $x_1^*$ gets closer to the edge $C_aC_{\bar a}$. $\Delta_3 \times [0, 1]$ in Fig. 1(c) represents the dynamics of $x$ facing mixed rule enforcers. The horizontal evolution directions of fixed points are shown with dashed arrows. Apart from the seven fixed points on the left and right surfaces, all points on the edge $C_aC_a$ (pure cautious cooperators) are saddle points; points on the edge $C_{\bar a}C_{\bar a}$ (pure trusting cooperators) are asymptotically stable when $y_1 > (B - c)/(B - f)$, otherwise they are saddle points. All points on the edge $DD$ (pure defectors) are unstable, and players evolve from $D$ to $C_a$, or to both $C_a$ and $C_{\bar a}$ (if $y_1 > 2(B - c)/(2B - 3f)$); rule enforcers evolve to corrupt ones. Without exploration, $y^*$ is always reachable, and the equilibrium is decided by $y(0)$. Fig. 1(d) and (e) show the player-enforcer dynamics when $y(0) = (0.1, 0.9)$, in which $y^* = (0, 1)$ and $x$ exhibits cyclic dominance. A higher $a$ has a stronger inhibition effect on cautious cooperators and a heavier punishment effect on corrupt enforcers, which induces the difference between the trajectories of $x$ and $y$ in Fig. 1(d) and (e).
We first discuss the existence of $x_1^*$, and then discuss its stability. Since $B < c < f$, it is easy to tell that $(c - B)/f \in (0, 1)$ and $2a/(2a + f + 2C_0) \in (0, 1)$. Thus, when $(c - B)/f + 2a/(2a + f + 2C_0) < 1$, the interior fixed point $x_1^*$ exists. Whether $x_1^*$ is asymptotically stable depends on the real part of the eigenvalues of the Jacobian matrix (Eq. A.2 in Supplementary Material A.1) at $x_1^*$. Fig. 1(b) shows one unstable example with $a = 0.1$. However, when $a = 0$, $x_1^*$ is stable, and it is located on the edge $C_aC_{\bar a}$. A more elaborate stability analysis of $x_1^*$ can be found in Supplementary Material A.1.
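The two ratios above can be read as coordinates of the interior fixed point. The sketch below assumes $x_1^* = \big((c - B)/f,\; 1 - (c - B)/f - 2a/(2a + f + 2C_0),\; 2a/(2a + f + 2C_0)\big)$, which is an inference from those ratios rather than a formula quoted from the paper, and checks that all three strategies earn the same payoff there under corrupt enforcers. The payoff entries are likewise inferred from the rules in Section 2, and the function names are hypothetical.

```python
def interior_fixed_point(a, c=0.5, C0=0.2, B=0.2, f=2.0):
    """Candidate interior fixed point (x1, x2, x3) under corrupt enforcers."""
    x1 = (c - B) / f                    # share of cautious cooperators C_a
    x3 = 2 * a / (2 * a + f + 2 * C0)   # share of defectors D
    x2 = 1 - x1 - x3                    # share of trusting cooperators C_abar
    return x1, x2, x3


def payoffs_vs_corrupt(x, a, b=0.5, c=0.5, C0=0.2, B=0.2, f=2.0):
    """Payoffs of C_a, C_abar, D when every enforcer is corrupt (assumed entries)."""
    x1, x2, x3 = x
    pi_ca = (x1 + x2) * (b - C0 - a) + x3 * (-c + f / 2)
    pi_cabar = (x1 + x2) * (b - C0) + x3 * (-c - C0)
    pi_d = (x1 * (b + c - C0 - B - f)
            + x2 * (b + c - C0 - B)
            + x3 * (-C0 - B))
    return pi_ca, pi_cabar, pi_d


x_star = interior_fixed_point(a=0.1)
print(x_star)                             # coordinates for a = 0.1
print(payoffs_vs_corrupt(x_star, a=0.1))  # three (numerically) equal payoffs
```

With $a = 0$ the defector share vanishes, which agrees with the statement that $x_1^*$ then lies on the edge $C_aC_{\bar a}$.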
Fig. 1(a) and (b) show two extreme scenarios, corresponding to the left and right surfaces of the triangular prism $S = \Delta_3 \times [0, 1]$ in Fig. 1(c). This prism captures the regime with mixed rule enforcers when $a = 0.1$. There are no interior fixed points inside $S = \Delta_3 \times [0, 1]$, nor on the back surface ($x_1 = 0$), the front surface ($x_2 = 0$), or the bottom surface ($x_3 = 0$). However, all the points on the edges $C_aC_a$ and $C_{\bar a}C_{\bar a}$ are fixed points. In the following, we further analyze $S = \Delta_3 \times [0, 1]$ from the bottom surface to the upper edge $DD$.
On the bottom surface where $x_3 = 0$, honest and corrupt strategies perform equally well for rule enforcers facing a homogeneous population of cooperators, so every point on the edges $C_aC_a$ and $C_{\bar a}C_{\bar a}$ is a fixed point. The fixed points on $C_aC_a$ are saddle points and hence unstable, since for players $C_a$ is dominated by $C_{\bar a}$ when facing cooperators. For the fixed points on the edge $C_{\bar a}C_{\bar a}$, stability is determined by the transversal eigenvalue $\lim_{x_3 \to 0} \dot{x}_3 / x_3$. When $y_1 > (B - c)/(B - f)$, the transversal eigenvalue is negative, which indicates that $x^* = (0, 1, 0)$ is asymptotically stable and $D$ cannot invade this trusting cooperative equilibrium. Whereas when $y_1 \le (B - c)/(B - f)$, $x^* = (0, 1, 0)$ is not asymptotically stable and can be invaded by defectors; the fraction of corrupt enforcers then grows, driven by the invading defectors, and the system finally moves towards the right surface of $S = \Delta_3 \times [0, 1]$ (we use dashed arrows to represent the horizontal evolution directions). Accordingly, all the fixed points on the left side of the dark gray triangle are stable, while those on the right side of the triangle are saddle points. It can be inferred that, in an infinite population, the initial strategy profile of rule enforcers is of vital importance to the evolution direction. If the market initially contains more honest enforcers, players have a higher likelihood of evolving into pure trusting cooperators (more analysis can be found in Supplementary Material A.2.1). Otherwise, the system ends up with pure corrupt enforcers, and the strategy profile of players exhibits stable oscillations, as Fig. 1(d) and (e) show.
When $x_3 > 0$, there are no fixed points inside the simplex nor on the edge $DD$. On the edge $DD$, all points are unstable. When the market is completely composed of defectors, they eliminate themselves due to the fixed cost $C_0$, zero gain from the game, and the additional expense of bribing $B$. We refer to this feature of defectors as "self-inhibiting" in the remainder of the paper. The evolution direction at $x = (0, 0, 1)$ depends on $y$. When $y_1 > 2(B - c)/(2B - 3f)$, both $C_{\bar a}$ and $C_a$ can invade; otherwise only $C_a$ can invade. These two parts are separated by the light gray triangle in Fig. 1(c).
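For the parameter values used in this section, the two thresholds on $y_1$ can be evaluated directly. The sketch below computes them from the expressions quoted in the text. The helper `defector_growth_near_trusting` is an added illustration: it assumes that, near the edge $C_{\bar a}C_{\bar a}$, the payoff advantage of a rare defector over a trusting cooperator is $(c - B) - y_1 (f - B)$, which follows from the payoff rules of Section 2 as read here and is not an expression quoted from the paper.

```python
def trusting_stability_threshold(B, c, f):
    """y1 above which pure trusting cooperation resists invasion by D."""
    return (B - c) / (B - f)


def trusting_invasion_threshold(B, c, f):
    """y1 above which C_abar (and not only C_a) can invade pure defection."""
    return 2 * (B - c) / (2 * B - 3 * f)


def defector_growth_near_trusting(y1, B, c, f):
    """Assumed payoff advantage of a rare defector over C_abar (see lead-in)."""
    return (c - B) - y1 * (f - B)


B, c, f = 0.2, 0.5, 2.0
print(trusting_stability_threshold(B, c, f))   # about 0.1667
print(trusting_invasion_threshold(B, c, f))    # about 0.1071
```

Under these values, the initial state $y(0) = (0.1, 0.9)$ used in Fig. 1(d) and (e) lies below the first threshold, which is consistent with the oscillating regime described in the text.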
In order to explore the influence of the cost $a$ of engaging the external supervision service, the player-enforcer dynamics under low cost $a = 0.1$ and high cost $a = 0.5$ are presented in Fig. 1(d) and (e), where $x(0) = (0.25, 0.25, 0.5)$ and $y(0) = (0.1, 0.9)$. Comparing these two subfigures, it can be observed that the value of $a$ does not change the equilibrium, but it can influence the trajectories to the equilibrium.
For the dynamics of players, when $a = 0.5$, the peak of the fraction of $C_a$ ($x_1$) is lower in each cycle, compared to when $a = 0.1$. This result is due to the influence of $a$ on $C_a$. Since $C_a$ is the dominant strategy when there are enough defectors, once the fraction of defectors decreases below the threshold, $\#C_a$ starts decreasing. A higher $a$ increases the threshold and hence makes $x_1$ decrease earlier, which leads to a lower peak. Another feature of $x_1$ is that the absolute value of its negative gradient is larger. That is because $C_{\bar a}$ dominates $C_a$ when cooperators are the majority, and a higher $a$ strengthens this dominance. Players then transform from $C_a$ to $C_{\bar a}$ faster. We summarize these two influences, reducing the peak and accelerating the elimination of $C_a$, as the "inhibition effect" of $a$ on $C_a$.
In fact, it is the inhibition effect that accelerates the oscillations of $x$. The lower peak of $x_1$ leads to more defectors remaining in the market, so the event $(C_{\bar a}, D)$ has a higher chance of happening. Since $\pi(D)$ is the highest in the event $(C_{\bar a}, D)$, this higher chance makes it easier for the defectors to invade when $C_{\bar a}$ is the majority. $x$ hence moves away more easily from its unstable fixed point $x = (0, 1, 0)$, which shortens the time that $x_2$ stays at a high level. In addition, the faster elimination of $C_a$ also shortens the period. Therefore, a high $a$ reduces the period of $x$ through its heavier inhibition effect.
For rule enforcers, the fraction of corrupt enforcers ($y_2$) in the equilibrium is $y_2^* = 1$. Under a low $a$, $y_2$ increases monotonically from $y_2(0) = 0.9$; but when $a = 0.5$, $y_2$ decreases briefly at the beginning. This difference is caused by the influence of $a$ on $U_c$. In both circumstances, the defectors start by being eliminated by $C_a$ in the event $(C_a, D)$; $a$ then turns out to be a punishment for the corrupt enforcers. Further, a high $a$ means a heavier punishment, which weakens the dominance of $U_c$ and decreases its rate. We name this consequence of $a$ on rule enforcers the "punishment effect". Due to this punishment effect, $\#U_h$ increases at the beginning when $a = 0.5$.

In summary, from the analytical results, the dynamics of players facing honest enforcers are as one would expect: since $C_{\bar a}$ is the strictly dominant strategy, $x^* = (0, 1, 0)$ is the only global evolutionarily stable state (ESS). When facing corrupt enforcers, there are no stable fixed points and $x$ exhibits stable oscillations, with their frequency being higher for larger $a$. In the case of mixed rule enforcers, the evolution direction strongly depends on $y(0)$. When $y_1(0) < (B - c)/(B - f)$, $y^* = (0, 1)$, and the dynamics of $x$ are then the same as when facing corrupt enforcers; otherwise the equilibrium is $x^* = (0, 1, 0)$, and the higher $a$ is, the longer it takes to reach $y^*$. Finally, the different levels of $a$ can influence the trajectories of $x$ and $y$ through the inhibition effect on $C_a$ and the punishment effect on $U_c$.

Player-enforcer dynamics with exploration
In this section, we analyze the player-enforcer dynamics when the players and the rule enforcers explore alternative strategies with a certain probability, denoted as the exploration rate or mutation rate. Let $\mu$ and $v$ be the mutation rates of players and rule enforcers. $\mu$ means that, in an exploration step, $\mu x_i$ players from the population of $S_i$ switch to one of the other two strategies, $S_j$ and $S_k$. Meanwhile, $\mu x_j / 2$ players from the population of $S_j$ and $\mu x_k / 2$ players from the population of $S_k$ join the population of $S_i$. Therefore, the change of $x_i$ caused by mutation is $-\mu x_i + \mu x_j / 2 + \mu x_k / 2$. Similarly, for rule enforcers, the change of $y_i$ brought by $v$ is $-v y_i + v (1 - y_i)$. Thus, the replicator equations with random exploration (Eq. (3)) are obtained by adding these mutation terms to the replicator equations. To analyze the influence of the mutation rates on the player-enforcer dynamics, we first define two groups with two different mutation levels to study the influence of the absolute value of the mutation rates. In the low-level group, the mutation rate is 0.001 or 0.005 [39], and in the high-level group, the rate is 0.01 or 0.05 [9]. To better understand the influence of the relative value of $\mu$ and $v$, we design asymmetric mutation rates for players and rule enforcers. We assume that either $v < \mu$ or $v > \mu$. Accordingly, there are 4 combinations of $\mu$ and $v$ in total: {($\mu = 0.001$, $v = 0.005$), ($\mu = 0.005$, $v = 0.001$), ($\mu = 0.01$, $v = 0.05$), ($\mu = 0.05$, $v = 0.01$)}. The other variables are the same as in Section 3.1. We present the results under $a = 0.1$ and $a = 0.5$ in Fig. 2.

Fig. 2. $\mu$ is the mutation rate of players and $v$ that of rule enforcers. $x^*$ and $y^*$ are always reachable, and the equilibrium of the system is robust to the initial states $x(0)$ and $y(0)$. When $v > \mu$, $x^* \approx (0, 1, 0)$ and $y^* \approx (0.5, 0.5)$ (Fig. 2(a)-(d)). A higher $\mu$ leads more $C_{\bar a}$ to explore $C_a$, so $x_1$ increases; this increment of $x_1$ can be offset by the stronger inhibition effect of $a$ on $C_a$: when $a = 0.5$, the fraction of trusting cooperators is increased, which reduces the unnecessary supervision cost. For rule enforcers, a higher $v$ hardly influences $y^*$ because $y^* \approx (0.5, 0.5)$. When $v < \mu$, corrupt enforcers are always the majority, as Fig. 2(e) and (f) show. Hence, a higher $v$ means more $U_c$ explore $U_h$, which raises the fraction of honest enforcers. For players, more honest enforcers correspond to a lower fraction of defectors. Nevertheless, the fraction of cautious cooperators ($x_1$) is not necessarily decreasing, because a higher $\mu$ and a lower $a$ can lift $x_1^*$, as Fig. 2(g) shows.
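The mutation terms described above can be written down directly. The sketch below implements only these two terms, $-\mu x_i + \mu x_j/2 + \mu x_k/2$ for players and $-v y_i + v(1 - y_i)$ for enforcers; the function names are hypothetical, and combining the terms with a selection term is left to whichever replicator implementation is used.

```python
def player_mutation_flow(x, mu):
    """Change of each x_i due to exploration: -mu*x_i + mu*(x_j + x_k)/2."""
    total = sum(x)
    return [-mu * xi + mu * (total - xi) / 2 for xi in x]


def enforcer_mutation_flow(y, v):
    """Change of each y_i due to exploration: -v*y_i + v*(1 - y_i)."""
    return [-v * yi + v * (1 - yi) for yi in y]


# Exploration pushes a pure trusting population towards the other strategies.
print(player_mutation_flow([0.0, 1.0, 0.0], mu=0.01))

# With neutral selection, enforcer exploration alone is at rest at (0.5, 0.5).
print(enforcer_mutation_flow([0.5, 0.5], v=0.05))
```

Both flows sum to zero, so exploration moves mass between strategies without changing the population size. The enforcer term equals $v(1 - 2y_i)$, which vanishes only at $y_i = 0.5$; this is one way to see why $y$ drifts to $(0.5, 0.5)$ whenever the two enforcer strategies earn the same payoff.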
In contrast to the results in Section 3.1, with exploration, $x^*$ and $y^*$ are always reachable. Furthermore, the equilibrium is independent of the initial state (see Supplementary Material A.2.2). The equilibrium of the system as well as the time required to reach it are decided by a combination of the mutation rate and the cost $a$. From Fig. 2, we observe that the relative value of $\mu$ and $v$ is critical for $y^*$. More concretely, when $v > \mu$, $y^* \approx (0.5, 0.5)$ (Fig. 2(a)-(d)); otherwise $U_c$ is the majority (Fig. 2(e), (f)). In the following, we discuss the results under $v > \mu$ and $v < \mu$.

The mutation rate of rule enforcers is higher than that of players
When $v > \mu$, the equilibrium of players is $x^* \approx (0, 1, 0)$, and the equilibrium of rule enforcers is $y^* = (0.5, 0.5)$. In the low mutation group, players always reach equilibrium faster than rule enforcers, as Fig. 2(a) and (b) show. According to the analysis in Section 3.1, when the market is composed of pure trusting cooperators, $y$ has no motivation to move, as strategies $U_c$ and $U_h$ perform equally well; but when $v > 0$, $y$ continues evolving after $x^*$ is already reached. The majority $U_c$ mutates to $U_h$, since $\pi(U_c) \approx \pi(U_h)$ as $x_2 \to 1$, until $y^* \approx (0.5, 0.5)$ is reached.
In the high mutation group, the time required to reach $y^*$ is significantly shortened. This is because a higher $v$ leads to more corrupt enforcers mutating into honest enforcers. The influence of a higher $\mu$ on $x$ is analogous: with a higher $\mu$, the system reaches $x^*$ faster. It is worth noting that the combination of $\mu$ and $a$ decides the fraction of trusting cooperators in the stable state ($x_2^*$). Generally, a higher $\mu$ leads to a lower $x_2^*$, as more $C_{\bar a}$ can explore $C_a$. However, when $a = 0.5$, the strengthened inhibition effect of $a$ on $C_a$ offsets the rise brought by the high $\mu$, and lifts $x_2^*$. Accordingly, $x_2^*$ is higher under ($\mu = 0.01$, $a = 0.5$) than under ($\mu = 0.01$, $a = 0.1$), as Fig. 2(c) and (d) show. The fact that a higher $a$ is associated with more trusting cooperators indicates that a lower $a$ is not always better. A lower $a$ might encourage more cooperators to seek the external supervision service spontaneously, which is unnecessary when facing cooperators; we call this the "unnecessary supervision cost" in the rest of the paper.

The mutation rate of rule enforcers is lower than that of players
When $v < \mu$, players have a higher mutation rate, and both players and rule enforcers reach a mixed strategy equilibrium. Within the low mutation group, $y^* \approx (0, 1)$, and $x^*$ depends on $a$. In Fig. 2(a), the fraction of defectors in the equilibrium is 0.0676925, rather than 0.000852428 (when $v > \mu$). The persistent existence of $D$ stimulates the growth of $U_c$; hence, instead of $(0.5, 0.5)$, $y$ evolves to $y^* = (0.04391494, 0.9560851)$, where $U_c$ is the majority, and the corresponding equilibrium of players is $x^* = (0.1316954, 0.8006121, 0.0676925)$.
In the high mutation group, the fraction of honest enforcers in the equilibrium increases considerably compared to the low mutation group. Because $U_c$ is the absolute majority, with a higher mutation rate, more $U_c$ explore strategy $U_h$, and $y_1^*$ increases. For players, with more honest enforcers in the market, the fraction of defectors is lower. Although the remaining cooperators are surrounded by more $U_h$ and fewer $D$, the fraction of $C_a$ is not necessarily lower. This is because a higher mutation rate also means more $C_{\bar a}$ explore strategy $C_a$, which then raises $x_1^*$. As for the influence of $a$, its inhibition effect on $C_a$ decreases $x_1^*$ in both low and high mutation groups. With less external supervision, the corresponding rates of $U_c$ and $D$ are higher.
In summary, the relative value of the mutation rates determines whether rule enforcers can reach equal dominance ($y^* \approx (0.5, 0.5)$), where players are dominated by trusting cooperation ($x^* \approx (0, 1, 0)$). Only when $v > \mu$ do we obtain $y^* \approx (0.5, 0.5)$ and $x^* \approx (0, 1, 0)$. Under this circumstance, a higher $a$ corresponds to a higher fraction of trusting cooperators, which reduces the unnecessary supervision cost. However, when $v < \mu$, the cost $a$ together with the absolute value of the mutation rate determines $x^*$ and $y^*$; reducing $a$ or increasing the mutation rate is beneficial in reducing the fraction of corrupt enforcers and promoting cooperation. We draw the following conclusions: increasing the mutation rate $v$ of rule enforcers is always favorable for combating bribery corruption; the optimal $a$ depends on whether $x^* \approx (0, 1, 0)$ is reached.

Simulation experiments design
We now discuss the stochastic dynamics in a finite population. We design three sets of experiments for different market sizes: small scale with N = 10, medium scale with N = 100, and large scale with N = 1000. To explore the effect of introducing the external supervision service and the related key factors on the stochastic dynamics, we first vary the level of a from 0.1 to 0.5, where 0.1 indicates low cost and 0.5 indicates high cost. In place of the replicator Eq. (3), which applies to an infinite population, in a finite population we assume players update their strategy from S_i to S_j with the likelihood $[1 + \exp(S(\pi(S_i) - \pi(S_j)))]^{-1}$. Similarly, for rule enforcers, the likelihood of switching from U_h to U_c is $[1 + \exp(S(\pi(U_h) - \pi(U_c)))]^{-1}$, and vice versa. The random exploration strategy is also adopted in the simulation experiments. The complete algorithm of the stochastic dynamics is listed in Supplementary Material B.2. Table 4 shows the complete setup for the experiments. In the simulation experiments, we run for a fixed number of time steps T that varies for different market sizes. We set T = 1000 for N = 10, T = 2000 for N = 100, and T = 5000 for N = 1000. These time steps are adjusted and fixed after trials to ensure that the evolution time is sufficient to reveal the patterns of evolution and provide relatively accurate results in the presence of randomness.
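As a minimal sketch of the update rule described above, the following Python snippet evaluates the pairwise-comparison likelihood and combines it with random exploration. It is an illustration under stated assumptions, not the algorithm of Supplementary Material B.2: the names `switch_probability` and `update_strategy` are hypothetical, S is read as a selection-strength parameter, and exploration is assumed to pick uniformly among the available strategies.

```python
import math
import random


def switch_probability(payoff_current, payoff_target, selection_strength):
    """Likelihood [1 + exp(S * (pi_current - pi_target))]^-1 of adopting the target strategy."""
    exponent = selection_strength * (payoff_current - payoff_target)
    if exponent > 700.0:  # math.exp would overflow; the likelihood is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(exponent))


def update_strategy(current, target, payoffs, strategies,
                    selection_strength, exploration_rate, rng):
    """One strategy update of a single individual (hypothetical helper)."""
    # Assumption: with probability exploration_rate the individual ignores payoffs
    # and explores a strategy drawn uniformly at random.
    if rng.random() < exploration_rate:
        return rng.choice(strategies)
    # Otherwise the individual switches with the pairwise-comparison likelihood.
    p = switch_probability(payoffs[current], payoffs[target], selection_strength)
    return target if rng.random() < p else current
```

With equal payoffs the likelihood is exactly 0.5, and the likelihoods of switching in the two directions always sum to one, so a larger S makes the update closer to deterministic imitation of the better-performing strategy.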

Results of simulation experiments
For the evolution of x and y, two patterns can emerge: 1) the strategy profiles eventually evolve to stable states; 2) the strategy profiles show stable oscillations. Figure 3 (a) shows an example of reaching a stable state, where N = 10. As can be observed, for players, the number of defectors decreases while the number of cooperators increases. The trusting cooperators eliminate cautious ones until they dominate the market (x* = (0, 1, 0)). For rule enforcers, as discussed in the analytical results, when x* = (0, 1, 0), any y can be a fixed point. In Fig. 3 (a), y* = (0.5, 0.5), where the rule enforcers choose U_h or U_c with an equal probability.
Figure 3 (b) shows an example of stable oscillations in a large scale market. This phenomenon essentially comes from players' and rule enforcers' adaptation to the environment. Since the market scale is large, defectors cannot be completely removed. As the fraction of C_ā grows, the occurrence of (D, C_ā) | (U_c) increases. When this event takes place, x_3 rises while x_2 falls since π(D) > π(C_ā). Therefore, x_2 cannot reach 1. However, the increase of x_3 is constrained by the self-inhibiting nature of defectors: the flourishing of defectors triggers the emergence of cautious cooperators, which in turn eliminate defectors. As a result, the fraction of cooperators increases, leading to a renewed rise in x_2 as the cycle repeats. Regarding y, the high fraction of cautious cooperators hampers the growth of U_c by reducing π(U_c) from 2C_0 + B to C_0 + B − a. Consequently, y_2 decreases after x_1 climbs. However, y_2 cannot decrease to 0 because the persistent presence of defectors can stimulate the growth of U_c. The oscillatory pattern thus emerges.
In our simulation experiments, we find that the pattern depends on the market size N, the mutation rates (μ and v), and the cost of the external supervision service a. In a small scale market where N = 10, both x and y can always reach a stable state. In a large scale market where N = 1000, the strategy profiles always show stable oscillations. The most complicated case is the medium scale market where N = 100: the strategy profiles can evolve into either of the two patterns, depending on the mutation rates and the external supervision cost a. For convenience, we report the results of experiments in the order of N = 10, N = 1000 and N = 100. In each set of experiments, we elaborate on the influence of the cost a and the mutation rate on the stochastic dynamics.

Stochastic dynamics in a small scale market: N = 10
When N = 10, since the number of players in the market is small and the low mutation rate does not influence the stochastic dynamics, we only report the results of the high mutation rate group: μ = 0.01, v = 0.05 and μ = 0.05, v = 0.01.
For each set of parameters, the experiments are repeated 500 times. We find that in a small scale market, the strategy profile of rule enforcers eventually evolves to one of the stable states {(0.015, 0.985), (0.065, 0.935), (0.5, 0.5), (0.935, 0.065), (0.985, 0.015)}. The stable states with a small fraction of honest or corrupt enforcers are caused by the mutation rate v (a detailed proof is in Supplementary Material C.1), i.e., y* = (0.015, 0.985) and y* = (0.065, 0.935) are equivalent to y* = (0, 1) when N = 10. In the remainder of the paper, for simplification, if the population evolves to a pure strategy equilibrium, we use y* = (0, 1). Hence, y evolves to one of three equilibria: the dominance of honest enforcers (y* = (1, 0)), equal dominance (y* = (0.5, 0.5)) and the dominance of corrupt enforcers (y* = (0, 1)). The fractions of simulations (of the 500 repetitions) that reach these equilibria are shown in Fig. 4. For players, when μ = 0.01, x* = (0, 1, 0) is always reachable (details can be found in Supplementary Material C.2.2), but when μ = 0.05, x* is only reachable when y* = (1, 0); otherwise x presents stable oscillations. For rule enforcers in an infinite market, we have discussed that y* ≈ (0.5, 0.5) when v > μ; otherwise, rule enforcers evolve into a highly corrupt group where y*_2 > y*_1. Within a small scale market, we also observe from Fig. 4 that when v > μ, the probability of y*_1 ≥ 0.5 is much higher than when v < μ. That is to say, when v is high, the market is more likely to evolve to the state where at least half of the rule enforcers are honest; whereas when v < μ, the probability of evolving into corrupt dominance is greater than 94.8%, although evolving into honest dominance is still possible.
For players, x eventually evolves to a stable state composed of pure trusting cooperators when μ = 0.01. However, when μ = 0.05, x might never reach a stable state. With N = 10, the average number of exploring players is 0.5, which means C_a or D can easily sneak in. Hence, when y* = (0, 1), which is the most frequent outcome when μ = 0.05, x presents a cyclic pattern, as the left panel of Fig. 3 (b) shows.
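The figure of 0.5 exploring players follows from the product of N and μ. The short Python check below makes the arithmetic explicit; it assumes that each player explores independently with probability μ in each time step, which is an assumption about the simulation rather than a detail stated above, and the function names are hypothetical.

```python
def expected_explorers(population_size, exploration_rate):
    """Mean number of exploring individuals per time step: N * mu."""
    return population_size * exploration_rate


def prob_at_least_one_explorer(population_size, exploration_rate):
    """Chance that at least one individual explores in a step, assuming independence."""
    return 1.0 - (1.0 - exploration_rate) ** population_size
```

For N = 10 and μ = 0.05, the first function gives 0.5. Under the independence assumption, the second gives roughly 0.40, so some player explores in about two out of every five time steps, which is frequent enough for C_a or D to re-enter repeatedly.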
The external supervision cost a can influence the value of y*. From Fig. 4, it can be observed that the chance of y*_1 ≥ 0.5 decreases as a increases when v > μ. This phenomenon is caused by the inhibition effect of a on strategy C_a. With a lack of external supervision, the probability of y* = (0, 1) is higher. However, when v < μ, the relative frequency of the extreme fixed point y* = (1, 0) also increases as a increases. This counter-intuitive conclusion, that rule enforcers have a higher probability of evolving to an honest equilibrium as a increases, is owing to the punishment effect of a on U_c. Increasing a induces a heavier punishment on U_c and strengthens the dominance of U_h in the event (C_a, D); therefore, the frequency of y* = (1, 0) increases as a increases.
In summary, in small scale markets, the equilibrium of the rule enforcers' strategy profile is always reachable. In the stable state, enforcers can be composed of pure U_h, half U_c and half U_h, or pure U_c. The probability of a specific equilibrium depends on the relative value of the exploration rates and on a. When v > μ, the probability of y*_1 ≥ 0.5 decreases from 72.4% to 38.4% as a increases from 0.1 to 0.5; when v < μ, the chance of y*_1 = 0 is more than 94.8%. For players, x* = (0, 1, 0) is only reachable when μ = 0.01 (proved in Supplementary Material C.2.2) or when y*_1 ≥ 0.5 (proved in Supplementary Material C.2.1). Otherwise, x exhibits a pattern of cyclic dominance.

Stochastic dynamics in a large scale market: N = 1000
In a large scale market, cyclic dominance emerges among different strategies for both participants and rule enforcers. The strategy profiles x and y can never reach equilibria because the large population makes it more likely for rare mutations to occur and potentially lead to the emergence and spread of less dominant strategies. Nevertheless, the mean value of the fraction of corrupt enforcers or bribery defectors is critical for evaluating the corruption level of the market. Therefore, in this subsection, we discuss the influence of the two key factors, the external supervision cost and the mutation rate, on the mean value of the fraction of strategies, $E(\#S_i/N) = \sum_{t=1}^{5000} (\#S_i/N)(t) / 5000$, $S_i \in \{C_a, C_{\bar{a}}, D, U_h, U_c\}$. In addition, as suggested by the results from the numerical experiments, the oscillation frequency (Fig. 1 (d) and (e)) or the speed of fluctuations (Fig. 2) can also be affected by these two factors. We thus measure the average cycle length of strategies, C(S_i), and analyze the results. We repeat the experiments 200 times under each set of parameters. All the results are averages over the 200 repeated experiments.
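The time-averaged fraction defined above can be computed directly from the recorded strategy counts. The following Python sketch shows one way to do so for a single run and for a set of repeated runs; the function names and the list-based data layout are assumptions made for illustration only.

```python
def mean_fraction(counts, population_size):
    """Time average of #S_i / N over all recorded steps of one run."""
    if not counts:
        raise ValueError("counts must contain at least one time step")
    return sum(c / population_size for c in counts) / len(counts)


def mean_fraction_over_runs(runs, population_size):
    """Average of the per-run time averages over repeated experiments."""
    if not runs:
        raise ValueError("runs must contain at least one experiment")
    return sum(mean_fraction(r, population_size) for r in runs) / len(runs)
```

Here `counts` holds the number of individuals playing one strategy at each time step (5000 entries in the large scale setting), and `runs` holds one such list per repetition.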
Figure 5 shows the mean value of the fraction of specific strategies among the population. The mutation rates are distinguished by the line type, and the strategies are distinguished by the color. From the experiments, we find that the mutation rate does not make much difference to E(#S_i/N); nevertheless, with higher values of μ, E(x_1), E(x_3), and E(y_2) are higher.
The influence of the service cost a is that a higher a induces more trusting cooperators and more honest enforcers, which corresponds to a higher E(x_2) and a higher E(y_1).
When there is no exploration, we have proved that the only equilibrium for players is x* = (0, 1, 0). Although the players tend to evolve to homogeneous trusting cooperators, this equilibrium is not reachable in a finite market because random exploration allows D to invade. The greater μ is, the more likely x_2 is to drop before it approaches one. This leads to the result that E(x_2) | (μ) < E(x_2) | (μ′) when μ > μ′. The fractions of the other two strategies, C_a and D, are more likely to rise before they approach zero, and thus have higher mean values as μ increases.
For rule enforcers, y * exists when the population is infinite, and the equilibrium depends on the relative value of μ and v : when v > μ, y *

Table 5
The relative frequency of y * and x * under the low mutation rate group with N = 100 .
The influence of a on the mean value is as follows: as the external supervision cost a increases, the fraction of C_a decreases because of the stronger inhibition effect of a on C_a; meanwhile, E(y_1) increases because of the heavier punishment effect of a on U_c. Hence, a higher a introduces more honest enforcers and more trusting participants to the market.
To sum up, in a large scale market, the strategy profiles show cyclic dominance patterns. A lower service cost is not necessarily preferable, considering its punishing effect on corrupt enforcers. When a is higher, the punishment effect is strengthened, and the average fractions of honest enforcers and trusting cooperators are higher. But such a result comes at the cost of more C_ā being exposed to and eliminated by defectors. Inspired by the cyclic pattern of strategies in Fig. 3 (b), we further explore the cycle length of a strategy, C(S_i). We define C(S_i) as the number of time steps that it takes to finish one period, during which the fraction of the strategy S_i grows from the bottom, reaches its summit and then drops to the bottom again.
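The definition of C(S_i) can be turned into a measurement in several ways. The Python sketch below uses one simple option, which is an assumption and not necessarily the procedure used for the reported results: it records the time steps at which the fraction crosses the midpoint between its minimum and maximum from below, and takes the mean gap between consecutive crossings as the average cycle length. The function name is hypothetical, and noisy trajectories would likely need smoothing before this is applied.

```python
def average_cycle_length(fractions):
    """Mean number of steps between successive upward crossings of the mid level.

    Returns None when the series is empty, constant, or has fewer than two crossings.
    """
    if not fractions:
        return None
    low, high = min(fractions), max(fractions)
    if high == low:
        return None
    mid = (low + high) / 2.0
    # A crossing happens at step t when the series moves from below mid to mid or above.
    crossings = [t for t in range(1, len(fractions))
                 if fractions[t - 1] < mid <= fractions[t]]
    if len(crossings) < 2:
        return None
    gaps = [b - a for a, b in zip(crossings, crossings[1:])]
    return sum(gaps) / len(gaps)
```

Each gap between two upward crossings spans one full rise to the summit and fall back to the bottom, so it corresponds to one period in the sense defined above.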
The results show that the mutation rates influence the cycle length of strategies significantly: v > μ is always associated with a shorter C(S_i), and when the absolute value of the mutation rate is lower, the corresponding C(S_i) is shorter. In addition to the mutation rate, a also determines the cycle length through its inhibition effect on C_a and punishment effect on D (a concrete analysis is reported in Supplementary Material C.3.1).

Stochastic dynamics in a medium scale market: N = 100
For the medium scale market, we also repeat the experiments 500 times for each set of parameters. The results show that: 1) in the low mutation rate group, y* ∈ {(0, 1), (0.5, 0.5)}, and whether x* is reachable depends on both the mutation rates and a; 2) in the high mutation rate group, both x and y show cyclic dominance patterns.
(1) Stochastic dynamics in the low mutation rate group
Table 5 shows the relative frequency of y* = (0, 1) and y* = (0.5, 0.5) under the low mutation rate. Similar to the results in Section 4.2.1, when v > μ, the probability of evolving into y* = (0.5, 0.5) is greater than when v < μ. When v > μ (μ = 0.001, v = 0.005), with the increase of a, the probability of y* = (0.5, 0.5) decreases from 89.8% to 75%. This result is due to the inhibition effect of a on C_a, which makes C_ā eliminate C_a more thoroughly, followed by the invasion of D. Then the event (C_ā, D) | (U_c) stimulates the growth of U_c and drives y* away from (0.5, 0.5).
However, there are two increases in the relative frequency of y* = (0.5, 0.5): when a increases from 0.1 to 0.2, and from 0.3 to 0.4. The first rise is caused by the stronger punishment effect on U_c, which hinders the growth of U_c and therefore raises the probability of y* = (0.5, 0.5). The second rise is more complicated: when a increases further, both the inhibition effect and the punishment effect are stronger. The former drives y away from (0.5, 0.5), and the latter drives y away from (0, 1), which prolongs the time required to reach the equilibrium (more details can be found in Supplementary Material C.4). When y = (0.5, 0.5), if the event (C_ā, D) | (U_c) never happens until C_ā takes over the market, then y* = (0.5, 0.5) can be reached. Although such a process is unlikely, especially when the fraction of C_ā is increasing, the long dynamic time improves its probability.
When v < μ (μ = 0.005, v = 0.001), the relative frequency of y* = (0, 1) is high, and it increases with a. This result is owing to the stronger inhibition effect of a, the lower mutation rate of rule enforcers, and the higher mutation rate of players. When a increases, the inhibition effect on C_a leaves more space for U_c to grow; at the same time, the lower v makes it harder for U_h to invade. As a result, rule enforcers are less likely to evolve into equal dominance and more likely to evolve into corrupt dominance. Furthermore, unlike the scenario under μ = 0.001, when μ = 0.005, defectors can hardly be excluded permanently from the players. The persistent existence of defectors makes the event (D, D) | (U_c) or (C_ā, D) | (U_c) more likely to happen, which stimulates the growth of U_c and drives y* to (0, 1).
Regarding x, the only reachable equilibrium is a stable state of pure cooperators: x* = (0, 1, 0) | (μ = 0.001) or x* = (0.006, 0.991, 0.003) | (μ = 0.005) (0.006 is caused by μ). The sufficient condition for trusting cooperator dominance is y* = (0.5, 0.5) or a ≤ 0.2. The first sufficient condition is easy to understand, as x* = (0, 1, 0) is the only equilibrium when y* = (0.5, 0.5). The second sufficient condition comes from the inhibition effect of a. When a ≤ 0.2, players eventually evolve into pure cooperators (x*_1 + x*_2 = 0.997), but if a ≥ 0.3, the dominance of C_a is further weakened, which invites the invasion of defectors.
Nevertheless, those two conditions are not necessary. When a = 0.3, there are 28 cases out of the 500 repetitions in which x* = (0, 1, 0) when y* = (0, 1). The average required time for these outliers to reach the stable state is 1123.57, while that
(2) Stochastic dynamics in the high mutation rate group
Under the high mutation rate group, both x and y show stable oscillations. Similar to the analysis in the large scale market, we focus on E(#S_i/N) as well as C(S_i), and track the influence of the mutation rate and the cost of the external service on them.
Figure 6 shows the average fraction of different strategies. We find that, with the increase of a, E(x_1) and E(y_2) decrease monotonically. Compared to the results in a large scale market, the differences are as follows: when a = 0.1, E(y_2) | (N = 100) < E(y_2) | (N = 1000), hence the slope of E(y_2) is more gradual; and when a ≥ 0.2, E(x_1) | (N = 100) < E(x_1) | (N = 1000), hence the slope of E(x_1) is steeper in medium scale markets.
The reason for these results is that when N is smaller, the event (C_a, D) | (U_c) has a higher probability of occurring under the same x and y (a detailed proof is provided in Supplementary Material C.5). The occurrence of the event (C_a, D) | (U_c) makes #U_c more likely to drop from a high level, which reduces E(y_2). Nevertheless, this consequence is offset by the stronger inhibition effect of a on C_a when a increases from 0.2 to 0.5. That is why E(y_2) shows no significant difference between the medium and the large scale market when a ≥ 0.2. Meanwhile, the heavier inhibition effect of a lowers the summit of x_1, which makes E(x_1) | (N = 100) < E(x_1) | (N = 1000). The large decline of Var(C_a) when a ≥ 0.2 also confirms this reasoning (a more detailed analysis is in Supplementary Material C.3.3).
With regard to the cycle length, the results are almost the same as in large scale markets; the only difference is that the event (C_a, D) | (U_c) is more likely to happen in medium scale markets, which amplifies the punishment effect of a on U_c. Since the punishment effect of a accelerates the elimination of U_c and eventually shortens C(U_c), C(U_c) decreases monotonically as a increases (a more detailed analysis is provided in Supplementary Material C.3.2).
In summary, in a medium scale market, when the mutation rates of rule enforcers and players are low, keeping the cost of external supervision services no greater than 0.2 is enough to lead the system to evolve into stable trusting cooperation. Within the high mutation rate group, similar to large scale markets, the strategy profiles exhibit stable oscillations. Moderately increasing the cost is beneficial for improving the average fraction of both trusting cooperators and honest enforcers.
These three groups of simulation experiments conducted in small, medium and large scale markets provide representative evolution patterns and demonstrate the mechanism by which a and the mutation rates influence the evolution. Although the designed experiments have answered the research question, the largest market scale is limited to 1000. Notably, if N → ∞, the evolution of x and y will converge to the analytical results in infinite markets. Detailed proofs can be found in Supplementary Material C.6. Additionally, it is important to acknowledge that the assumption of M = N/2 simplifies the real-world scenario, as it is possible for one rule enforcer to monitor multiple pairs of players. Relaxing this assumption may influence whether strategy profiles can evolve to equilibria, but the conclusions regarding the effects of a and the mutation rates hold. A concrete analysis can be found in Supplementary Material C.7.

Conclusion and discussion
In this study, we explore the effectiveness of introducing an optional external supervision service to cooperators in combating corruption. Considering that the decisions of players and rule enforcers both depend on and determine the environment [40,41], we construct a simple model in which players join pair-wise, and each pair of players is assigned a rule enforcer. In this model, players can choose to be a cautious cooperator (C_a), who engages the external supervision service at the cost a; a trusting cooperator (C_ā), who does not engage the service; or a defector (D), who bribes the enforcer to escape from the punishment. In parallel, rule enforcers can choose to be either honest (U_h), enforcing the incentives, or corrupt (U_c), exonerating defectors for the bribe. The collusive bribery will be discovered in the event where C_a is paired with D and assigned U_c.
To better study the consequence of introducing the external supervision option, we assume there are no additional explicit punishment mechanisms for the corrupt enforcers. The classic explicit punishments for combating corruption include fining corrupt enforcers with a fixed penalty by the right authorities [42,43] or by other honest enforcers [44]; fining not only the bribe-receivers but also the bribe-givers [41]; and social exclusion, which ostracizes the corrupt ones from the system [28,30]. External supervision is different from these, as it is not designed to combat corruption through punishing the corrupt enforcers, but through increasing the transparency of the market, disclosing the collusive bribery, and correcting the participants' payoff to the level they would obtain in a fair market. Hence, we assume that the defectors only need to pay the fine f for their cheating, but will not receive any additional punishment for conducting the bribery; corrupt enforcers only need to pay the commission fee back to the cautious cooperators and cover their cost of the external supervision services, without any extra punishment for committing the corruption.
Providing cooperators with the option of engaging the external supervision service can reverse the outcome and mitigate corruption. However, the effectiveness of this approach depends on several key factors, including the initial strategy profile of rule enforcers ($y^{(0)}$) and the cost of the external supervision service (a). While all the parameters that determine the payoff matrix influence the effectiveness, our specific focus is on a, as it can be regulated and directly affects players' willingness to engage with external supervision services. The remaining parameters are fixed at commonly used values. Note that altering these parameter values may lead to different outcomes.
Under the framework of evolutionary game theory, we find that $y^{(0)} = (y_1^{(0)}, y_2^{(0)})$ plays a decisive role in the game. $y_1^{(0)} > (B - c)/(B - f)$ results in the dominance of trusting cooperation. Otherwise, players are eventually surrounded by corrupt enforcers, and the strategy profile of players (x) exhibits cyclic dominance. This finding is consistent with previous studies [7,38]. Furthermore, when cyclic dominance is observed in x, the corresponding oscillation frequency is positively related to a (Fig. 1 (d), (e)): a higher value of a leads to a higher oscillation frequency.
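The threshold on the initial fraction of honest enforcers can be checked numerically. The Python sketch below evaluates (B − c)/(B − f) and compares it with the initial fraction; B, c and f are the payoff parameters of Table 1, the numerical values used in the usage note are purely illustrative and are not the values of the study, and the function names are hypothetical.

```python
def honest_enforcer_threshold(B, c, f):
    """Threshold (B - c) / (B - f) on the initial fraction of honest enforcers."""
    if B == f:
        raise ValueError("threshold is undefined when B equals f")
    return (B - c) / (B - f)


def reaches_trusting_cooperation(y1_initial, B, c, f):
    """True when the initial fraction of honest enforcers exceeds the threshold."""
    return y1_initial > honest_enforcer_threshold(B, c, f)
```

With the illustrative values B = 1.0, c = 0.5 and f = 0.2, the threshold is 0.625, so an initial honest fraction of 0.7 would lead to the dominance of trusting cooperation, whereas 0.6 would not.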
Other than the mechanism of survival of the fittest, exploring is also a common learning strategy in the real world. Accordingly, we extended the original model by adding asymmetric mutation rates to the players (μ) and rule enforcers (v). We find that with the random exploration mechanism, the strategy profiles of both players and rule enforcers can reach their equilibria x* and y*. The relative value of μ and v changes the equilibrium drastically (Fig. 2). When v > μ, enforcers evolve to equal dominance (half U_h and half U_c, y* = (0.5, 0.5)) and players evolve into the dominance of trusting cooperation; further, this result is robust to the initial state of players or enforcers. Lee et al. also noticed the decisive influence of asymmetric mutation rates on the dynamics [2]; they constructed a harvester-enforcer game in which the enforcer can be honest or corrupt. They pointed out that if the corrupt enforcers have a higher mutation rate than the honest ones, the system is more likely to end up with cooperation dominance. In contrast, in our model, it is the bias of the mutation rate between enforcers and players that determines the equilibrium. Such different results illustrate the subtle influence of the random exploration mechanism on the evolution, which depends on the nature of the specific system.
As for the cost of the external supervision service, it has a critical influence on the player-enforcer dynamics through its double effect: the inhibition effect on cautious cooperators decreases the transparency of the system and breeds corruption, yet the punishment effect on corrupt enforcers deters them and improves the fraction of honest enforcers. Although we exclude extra punishments for rule enforcers who commit corruption, the assumption that the corrupt enforcer has to cover the cost of engaging external supervision for C_a in the event (C_a, D) | (U_c) turns a into a negative incentive for corruption. Due to these two effects of a, different levels of a can change the trajectories of x and y (Fig. 1 (d) and (e)), and determine x* and y* if the equilibrium exists (Fig. 2). It is intuitive that reducing a would be an effective means of reducing corruption [16,45]. However, in our model, we noticed that a lower a is not necessarily better. The results reveal that a lower a is preferable only when v < μ, in which case corrupt enforcers are the majority. Otherwise, when y* ≈ (0.5, 0.5), a lower a decreases the fraction of trusting cooperators and increases that of cautious cooperators. This corresponds to more cooperators engaging in unnecessary external supervision services.
To examine whether the insights on the influence of the mutation rates and the supervision cost a are also valid in a finite population, we apply simulation experiments to explore the stochastic dynamics within markets of different sizes. We find that in a finite market the conclusions are still valid. Increasing v makes the rule enforcers more likely to evolve into equal dominance (Fig. 4 and Table 5). Even when the strategy profiles cannot reach any equilibrium but exhibit cyclic patterns, the average fractions of C_ā (E(C_ā)) and of U_h (E(U_h)) are higher if v > μ (Figs. 5 and 6). Furthermore, in finite markets, decreasing a to the utmost is not always beneficial either. The optimal value of a depends on the scale of the market. Within a small scale market, reducing a is not necessary, since players eventually evolve into homogeneous trusting cooperators. Within a medium scale market, when the equilibrium is reachable, ensuring a ≤ 0.2 is meaningful, as it can guarantee that the market ends up with trusting cooperator dominance. In contrast, when cyclic dominance patterns are observed in x and y, such as in a large scale market, a higher a leads to a higher E(U_h) and E(C_ā) instead (Figs. 5 and 6), as a result of the punishment effect of a. However, this does not necessarily mean that a should be increased as much as possible when strategy profiles cannot reach equilibria. After analyzing the average cycle length of strategies C_a and D, we find that a higher a also leaves C_ā exposed to defectors longer, due to its inhibition effect on C_a. Therefore, considering the trade-off between protecting trusting cooperators and improving the average fraction of honest enforcers, a = 0.2 is the most balanced level. Note that this value is obtained under a set of predefined parameters (Table 4). When changing the assumptions, the optimal a might also be different.
In all, the conclusions drawn from the replicator dynamics imply some practical suggestions for platform management. First, since the initial fraction of honest enforcers is critical, investing in ethical education for new rule enforcers is valuable, as this measure facilitates the establishment of an honest atmosphere from the beginning, which can effectively prevent corruption. Second, increasing the mutation rate of rule enforcers is always beneficial. Although v, as a feature of rule enforcers, is challenging to regulate directly, we can indirectly influence it by replacing some of the rule enforcers with new recruits or through rotation [46]. As long as the new group has a different strategy profile from the original group, this is equivalent to introducing randomly exploring rule enforcers into the system, thereby achieving the effect of increasing v. Third, reducing a is intuitively advantageous; however, this may not always be the case. In our model, we found that the optimal cost depends on the scale of the market and the exploration rates of enforcers and players.
This work still leaves certain possibilities open for future research. In our model, players and rule enforcers are independent and not structured. However, corruption and bribery usually do not happen independently in real life. The structured social network of players or rule enforcers can influence bribery and corruption behavior [16,47]. For example, honest enforcers may turn into corrupt ones under peer pressure [29] or social intimidation [28]. Additionally, in our research, we assume that once the collusive bribery is discovered by the external supervisor, the loss that the cautious cooperators suffer from the interaction can be covered in time. But in real life, there can be a delay [48] or even subsequent losses (such as revenge from the rule enforcers), which can change the payoff matrix and the dynamics substantially. It may be interesting to consider these extensions in future research.

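Fig. 4. The relative frequency of specific equilibria of rule enforcers among the 500 repetitions with N = 10. The combination of the mutation rate and a determines the dynamics of y*: when v > μ (v = 0.05), the probability of at least half of the rule enforcers being honest (y*_1 ≥ 0.5) in the stable state is greater than 38.4%, whereas when v < μ (v = 0.01), evolving into pure corrupt enforcers (y*_2 = 1) is almost certain. In the former situation, the chance of y*_1 ≥ 0.5 decreases with a higher a due to the inhibition effect of a on C_a; in the latter situation, the punishment effect of a on U_c makes the relative frequency of y*_2 = 1 decrease slightly as a increases.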

Fig. 5. The mean value of the fraction of specific strategies (E(#S_i/N)) in large scale markets where N = 1000. The influence of different mutation rates on E(#S_i/N) is as follows: E(x_2) | (μ) < E(x_2) | (μ′) when μ > μ′. A higher mutation rate makes it easier for C_a and D to sneak in before x_2 reaches 1, and hence decreases E(x_2) and increases E(x_1) and E(x_3). E(y_2) depends on both the relative value and the absolute value of μ and v. E(y_2) is higher when v > μ than when v < μ. When the relative value is controlled, a higher mutation rate leads to a larger E(y_2). Additionally, a higher value of a has an inhibiting and punishing effect on C_a and U_c, resulting in more trusting cooperators and honest enforcers.

Fig. 6. The mean value of the fraction of specific strategies (E(#S_i/N)) in medium scale markets where N = 100. Compared to the results obtained in large scale markets, the difference is E(y_2) | (N = 100) < E(y_2) | (N = 1000) when a = 0.1. This difference is caused by the higher probability of the event (C_a, D) | (U_c), which makes it easier for #U_c to decline from a high level and consequently reduces E(y_2). However, this influence is offset by the stronger inhibition effect of a on C_a when a > 0.2. This stronger inhibition also explains the observation that E(x_1) | (N = 100) < E(x_1) | (N = 1000).

Table 1
Payoff matrix for the pairwise game.

Table 4
Simulation experiment setup.