Optimal Tag-Based Cooperation Control for the “Prisoner’s Dilemma”

A long-standing problem in biology, economics, and the social sciences is to understand the conditions required for the emergence and maintenance of cooperation in evolving populations. This paper investigates how to promote the evolution of cooperation in the Prisoner's Dilemma game (PDG). Differing from previous approaches, we not only propose a tag-based control (TBC) mechanism but also examine how TBC can successfully promote the evolution of cooperation. The effect of TBC on the evolutionary process of cooperation shows that it can both reduce the payoff of defectors and inhibit defection; however, when the cooperation rate is high, TBC also reduces the payoff of cooperators unless the identified rate of the TBC is large enough. An optimal timing control (OTC) of switched replicator dynamics is designed to account for the control costs, the cooperation rate at the terminal time, and the cooperator's payoff. The results show that switching control (SC) between an optimal identified rate control of the TBC and no TBC can not only maintain a high cooperation rate but also greatly enhance the payoff of the cooperators. Our results provide valuable insights for clusters such as logistics parks and governments when deciding how to promote cooperation.


Introduction
Animal Dispersion in Relation to Social Behavior, published by Wynne-Edwards, looked specifically at the "evolution of cooperation" [1]. Why should an individual help another who is a potential competitor in the struggle for survival? This question is listed in Science magazine as one of the 25 core problems among the 125 scientific challenges proposed by scientists from all over the world. The evolution of cooperation is an enduring conundrum in biology, mathematics, and the social sciences. Natural selection opposes cooperation unless some mechanism is at work to promote it. The Prisoner's Dilemma game (PDG), which represents an extreme case, has emerged as one of the most promising mathematical frameworks for the study of cooperation.
In the PDG, two players simultaneously decide whether to cooperate (C) or to defect (D). If one player cooperates, the other player can choose between cooperation, which yields $R$ (the reward for mutual cooperation), or defection, which yields $T$ (the temptation to defect). On the other hand, if one player defects, the other player can choose between cooperation, which yields $S$ (the sucker's payoff), or defection, which yields $P$ (the punishment for mutual defection), where $T > R$ and $P > S$. The donor-recipient game (DRG) is a special case of the PDG. In the DRG, a cooperator is someone who pays a cost, $c$, for another individual to receive a benefit, $b$. A defector has no cost and does not deal out benefits. By comparing the PDG and DRG, we obtain $R = b - c$, $T = b$, $S = -c$, and $P = 0$. If the game is played only once, then each player gets a higher payoff from D than from C, regardless of what the other player does. So, natural selection implies competition and, therefore, opposes cooperation unless a specific mechanism is at work.
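The mapping from the DRG to the PDG, and the fact that defection dominates in the one-shot game, can be checked with a few lines of code. This is a minimal sketch: the function names and the values b = 3, c = 1 are illustrative assumptions, not taken from the paper.

```python
def donor_recipient_payoffs(b, c):
    """Map DRG parameters (benefit b, cost c) to the PDG payoffs R, T, S, P."""
    return {"R": b - c, "T": b, "S": -c, "P": 0.0}


def defection_dominates(p):
    """D is the better reply to C when T > R and the better reply to D when P > S."""
    return p["T"] > p["R"] and p["P"] > p["S"]


# Illustrative parameter values (assumed, not from the paper).
payoffs = donor_recipient_payoffs(b=3.0, c=1.0)
print(payoffs, defection_dominates(payoffs))
```

For any b > c > 0 the check returns True, which is the one-shot dilemma described above.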
Direct reciprocity was proposed by Trivers [8] and developed as a mechanism for the evolution of cooperation [9,24]. In the repeated Prisoner's Dilemma, the same two individuals encounter each other repeatedly for some rounds. If one cooperates now, the other may cooperate later. Hence, an individual might choose to cooperate [10]. Axelrod and Hamilton proposed a model of the evolution of cooperation based on the iterated Prisoner's Dilemma [11]. In two computer tournaments, Axelrod discovered that the "winning strategy" was the simplest of all, tit-for-tat (TFT). TFT is a program in which a player uses C on the first move of the game and then plays whatever the other player chose in the previous move. This simple concept captured the fascination of many enthusiasts of the repeated Prisoner's Dilemma, and a number of empirical and theoretical studies were inspired by Axelrod's groundbreaking work [12,13,25]. The presence of a decoy increased the willingness of volunteers to cooperate in the first step of each game, and this willingness subsequently propagated through (noisy) tit-for-tat [25]. Reference [26] pointed out that resilient cooperators could sustain cooperation indefinitely in the finitely repeated Prisoner's Dilemma.
The Prisoner's Dilemma experiment in [27] showed that the termination rules and the (expected) length of the game significantly increased cooperation.
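The TFT rule described above is short enough to write out directly. The sketch below plays it in an iterated Prisoner's Dilemma; the payoff values T = 5, R = 3, P = 1, S = 0 and the helper names are assumptions chosen for illustration only.

```python
def tit_for_tat(my_history, opp_history):
    """Cooperate on the first move, then copy the opponent's previous move."""
    return "C" if not opp_history else opp_history[-1]


def always_defect(my_history, opp_history):
    """Unconditional defection, used here as a comparison strategy."""
    return "D"


def play(strategy_a, strategy_b, rounds, T=5, R=3, P=1, S=0):
    """Play an iterated Prisoner's Dilemma and return both total scores."""
    payoff = {("C", "C"): (R, R), ("C", "D"): (S, T),
              ("D", "C"): (T, S), ("D", "D"): (P, P)}
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pa, pb = payoff[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 10))    # mutual cooperation in every round
print(play(tit_for_tat, always_defect, 10))  # TFT is exploited once, then defects
```

Against a pure defector, TFT loses only the first round, which is why repeated encounters can make cooperation worthwhile.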
Indirect reciprocity does not rely on repeated encounters between the same two individuals. An individual can establish a good reputation by helping someone, which can increase the chance of receiving help from others. Indirect reciprocity has substantial cognitive demands. For example, language is needed to gain the information and spread the gossip associated with indirect reciprocity, so only humans seem to engage in the full complexity of the game. Differing from indirect reciprocity based on reputation, kin can be recognized through familiarity based on environmental or learned cues or through pattern matching based on some inherited trait. If individuals display a heritable marker or a tag, they preferentially cooperate with partners who share their own marker. So, tag-based donation does not have the substantial cognitive demands of a reputation mechanism based on indirect reciprocity. Cooperation can also arise when individuals donate to others who are sufficiently similar to themselves in some arbitrary characteristic. Such a characteristic, or "tag," for example, a green beard [28,29], can be observed by others. Hamilton illustrated how this mechanism worked even when individuals were not genealogical kin [2]. This is known as the green beard effect. Tag-based donation does not require remembering past encounters. In recent years, the green beard effect has become a favorite topic among both evolutionary biologists and sociologists [30-36]. For example, Tian et al. [30,31] proposed Vcash as a reputation framework for identifying denial of traffic service to resolve the trustworthiness problem, where the result of verified traffic event notification acted as a "tag". Taylor and Nowak proposed nonuniform interaction rates (interaction rates dependent on the strategies) that allowed the coexistence of cooperators and defectors in the Prisoner's Dilemma [37].
Analytical models of the evolution of cooperation were given when nonuniform interaction rates were introduced [38]. Both "kin recognition" and "tag-based donation" lead to nonuniform interaction rates.
By combining evolutionary processes with differential equations, the links between stability and optimization have been researched and a mathematical framework has been built [39,40]. In multiagent systems, an individual is regarded as an agent who autonomously regulates his behavior according to his benefit. Agents play games with their neighbors through local interaction. The evolution of cooperation is an adaptive coordinated control process, which is a controllable, intelligent, and autonomous decision process. From the perspective of evolutionary games, the following objectives, inter alia, have been achieved through the design of the control law: optimizing an individual cost function [39], designing consensus control of stochastic multiagent systems [41], and promoting the evolution of cooperation in social dilemmas [23,38].
Motivated by the above discussion, a tag-based control (TBC) is proposed by combining direct reciprocity, indirect reciprocity, tag-based donation, and optimal control theory. The goal of this paper is to study the feasibility of promoting cooperation by TBC and to design an appropriate identified rate for operators. As part of this, we discuss the differences in effectiveness and the drawbacks of identifying cooperation versus identifying defection for promoting the evolution of cooperation. Using optimal control theory, we examine whether an operator should design TBC mechanisms for the evolution of cooperation or give up TBC temporarily to increase the group's payoff. We show that an operator can design an optimal timing control (OTC) to optimize the control costs, the cooperation rate, and the cooperator's payoff.
The main contributions of this paper are as follows. (1) For the first time, a TBC that promotes the evolution of cooperation in the Prisoner's Dilemma is proposed. (2) Advantages and disadvantages of identifying cooperation and identifying defection in TBC are investigated. (3) An OTC is designed to better promote cooperation rates and greatly enhance the group's payoff. (4) From a research standpoint, this work contributes to evolutionary game theory and optimal control theory.
The paper is organized as follows. In Section 2, we obtain the payoff matrix and the replicator dynamics of the Prisoner's Dilemma with TBC. By applying the results of Section 2, we discuss the problem of the evolution of cooperation in the Prisoner's Dilemma with TBC in Section 3. In the next two sections, the optimal TBC is designed. We design the optimal identified rate for promoting cooperation in Section 4. The effectiveness of TBC is illustrated and an OTC is given in Section 5. We provide some concluding remarks and discussion in the last section.

The Replicator Dynamics of the Prisoner's Dilemma with TBC
In the Prisoner's Dilemma game, two individuals can each either cooperate (C) or defect (D). The payoff matrix of the Prisoner's Dilemma, in which each entry is the payoff of the row player, is as follows:

$$\begin{array}{c|cc} & C & D \\ \hline C & R & S \\ D & T & P \end{array} \tag{1}$$

where the payoffs satisfy

$$T > R > P > S, \tag{2}$$

$$2R > T + S. \tag{3}$$

No matter what the other does, the selfish choice of defection yields a higher payoff than cooperation, so a player will choose defection whether his opponent chooses cooperation or defection. However, if both defect, both get $P$ rather than the larger value $R$ that they could have gotten if both had cooperated. In other words, if both defect, both do worse than if both had cooperated. Hence, the game poses a dilemma. In repeated Prisoner's Dilemma games, if constraint (3) is not satisfied, the social efficiency when all players always choose cooperation is lower than when they agree to alternate between cooperation and defection.
Let $\Omega = \{1, 2, \ldots, N\}$ be a group of game participants, where $N$ is a sufficiently large natural number. Consider symmetric static games of complete information between two individuals. The pure-strategy set for each individual in group $\Omega$ is denoted by $\Lambda' = \{C, D\}$. For notational convenience, we label every individual's pure strategies by positive integers. Hence, the pure-strategy set of each individual is written as $\Lambda = \{1, 2\}$. A pure-strategy profile is denoted as $s = (s_1, s_2)$, where $s_i$ is a pure strategy for individual $i$. The pure-strategy space is $\Lambda \times \Lambda$, the Cartesian product of the pure-strategy set $\Lambda$ with itself. A probability distribution $(x, y)$ over the pure-strategy set $\Lambda$ of an individual $i$ is defined as a hybrid strategy for individual $i$, where $x$ and $y$ are the probabilities assigned to the individual's pure strategies C and D, respectively. We denote $z = (x, y)$. The hybrid strategy space of each individual is the simplex $\Delta = \{z \in \mathbb{R}^2_+ : x + y = 1\}$, where $\mathbb{R}^2_+$ is the set of two-dimensional vectors with all elements being positive. At any point of time $t$, let $x(t)$ and $y(t)$ be the rates at which individuals choose strategies C and D in group $\Omega$, respectively. Then, the corresponding group state is $z(t) = (x(t), y(t))$, where $x(t), y(t) \in (0, 1)$ and $x(t) + y(t) = 1$. Therefore, the group state is formally equivalent to the hybrid strategy. Because the state $z(t)$ is completely determined by its component $x(t)$ or $y(t)$, this paper discusses the variable $x(t)$ only.
In the group $\Omega$, the individuals who choose strategy D have a higher average fitness than the others. Therefore, selection acts to increase the relative abundance of strategy D. After some time, cooperation will vanish from the group. So, strategy D is the only evolutionarily stable strategy (ESS) unless a specific mechanism is at work.
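The claim that cooperation vanishes without a supporting mechanism can be illustrated with the standard two-strategy replicator equation $\dot{x} = x(1-x)\,[\pi_C(x) - \pi_D(x)]$ for payoff matrix (1). The sketch below integrates it with a simple Euler scheme; the payoff values, step size, and function names are assumptions for illustration.

```python
def replicator_step(x, dt, T, R, P, S):
    """One Euler step of the replicator equation for the plain Prisoner's Dilemma."""
    payoff_c = x * R + (1 - x) * S   # expected payoff of a cooperator
    payoff_d = x * T + (1 - x) * P   # expected payoff of a defector
    return x + dt * x * (1 - x) * (payoff_c - payoff_d)


def simulate(x0, steps=20000, dt=0.01, T=5.0, R=3.0, P=1.0, S=0.0):
    """Integrate from the initial cooperation rate x0 and return the final rate."""
    x = x0
    for _ in range(steps):
        x = replicator_step(x, dt, T, R, P, S)
    return x


# Even a population that starts with 90% cooperators ends up near x = 0.
print(simulate(0.9))
```

Because $T > R$ and $P > S$, the bracketed payoff difference is negative for every $x \in (0, 1)$, so the trajectory decreases monotonically toward all-defection.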

The Game under Labeling.
This paper studies special mechanisms for the evolution of cooperation in clusters in the economic society, for example, logistics parks, that face the Prisoner's Dilemma. With such a large number of individuals from different sources, it is difficult to build mutual trust because of the high cost of fully understanding and accurately knowing each other's willingness to cooperate. In this case, the conditions required by mechanisms for promoting the evolution of cooperation, such as direct reciprocity, indirect reciprocity, punishment, and nonuniform interaction, are difficult to meet. A third party in the cluster, though, can play the role of collecting information and rewarding or punishing enterprise behavior to promote mutual recognition efficiently. Examples of such third parties are park managers, platform promoters, park management committees, and industry associations.
In view of this, we assume that there exists a controller (an operator) outside of group $\Omega$ who can identify defection and cooperation. The controller labels everyone as a "cooperator" or a "defector". The tags assigned by the controller affect the strategy choice of individuals in the following games until the next identification.
Definition 2. One round game is defined as the set of games that consists of a game identified by the controller and all games tagged by this identification.
Assume the number of rounds is infinite. In one round, every individual plays $m + 1$ period games, where $m$ is a positive integer. In each round, the first period is the one identified by the controller. All individuals are tagged before the second period game begins. In the following $m$ periods, all individuals know their own tags and can distinguish the tags of others.

Assumption 1.
The individuals are boundedly rational, and any individual encounters other individuals randomly with equal probability.
This paper assumes that the individuals (enterprises or persons) are in certain clusters (such as logistics parks) or bilateral platforms in the economic society. Indirect reciprocity has substantial cognitive demands for players. The tag mechanism for the evolution of cooperation demands that individuals own and display a heritable marker or a tag. Due to the large number of individuals in the cluster, the interaction between them is random, and it is difficult to form long-term repeated games. Therefore, it is impossible to establish a direct reciprocity mechanism based on repeated games. In the early stage of the formation of clusters or bilateral platforms, due to the lack of mutual understanding and trust between individuals, the cost of collecting information and displaying individual reputation is very high, so indirect reciprocity and the tag mechanism cannot work unless a person or organization helps the players. We define this person or organization as the controller (Definition 1). The identification implemented by the controller can be regarded as tagging individuals. In the first period of a round game, the controller identifies individuals and displays the results.
Based on Assumption 1, for each individual, the probability of interacting with a cooperator or a defector is $x(t)$ or $1 - x(t)$, respectively. At time $t$, the rate of cooperators in the group $\Omega$ is $x(t)$. So, the state $x(t)$ is defined as the group cooperation rate. Based on Assumption 1, the cooperator defined by Definition 3 can be equated with the individual who always chooses cooperation without control. Obviously, an individual is a defector if he is not a cooperator under Definition 3.

Assumption 2.
Assume that the identification service is not perfect. The probabilities that the cooperators and defectors are correctly identified by the controller are $\beta_1$ and $\beta_2$, respectively, where $\beta_i \in (0, 1)$, $i = 1, 2$.
Based on Assumption 2, $\beta_1$ is called the cooperation identified rate and $\beta_2$ is called the defection identified rate. We denote the tags of "cooperator" and "defector" as $C'$ and $D'$. In one round, $\Omega$ is divided into four parts denoted by $\Omega_i$ ($i = 1, 2, 3, 4$). $\Omega_1$ and $\Omega_2$ are the sets of cooperators and defectors labeled $C'$, respectively. Analogously, $\Omega_3$ and $\Omega_4$ are the sets of cooperators and defectors labeled $D'$, respectively.
Based on Assumption 3, the controller can dominate an individual's strategy choice through tagging individuals. At time $t$, the cooperation rate is $x(t)$ in group $\Omega$. In the first period, the ratio of C to total strategies is equal to the cooperation rate $x(t)$. In the following $m$ periods, an individual's strategy is decided by tags according to Assumption 3, so the ratio of C to total strategies is no longer the cooperation rate $x(t)$. Assumption 3 is reasonable by Assumption 1 and Definition 1. From the above discussion, C denotes both cooperation and a cooperator, and D denotes both defection and a defector.
The numbers of individuals in $\Omega_1$, $\Omega_2$, $\Omega_3$, and $\Omega_4$ are $\beta_1 x N$, $(1 - \beta_2)(1 - x) N$, $(1 - \beta_1) x N$, and $\beta_2 (1 - x) N$, respectively. Based on Assumptions 2 and 3, in the first period, cooperators choose C and defectors choose D. In the following $m$ periods, two individuals both choose C if and only if both are in $\Omega_1$, which forms the strategy pair (C, C). When a player from $\Omega_1$ encounters a player from $\Omega_2$, the cooperator chooses C and the defector chooses D. In all other cases, the strategy pair (D, D) is formed. According to Taylor and Nowak [31] and Dong et al. [32], the probabilities of (C, C) and (D, D) with TBC are $\beta_1^2 x^2$ and $1 - \beta_1^2 x^2 - 2\beta_1(1 - \beta_2)x(1 - x)$, respectively, and the mixed pair (C, D) occurs with probability $2\beta_1(1 - \beta_2)x(1 - x)$. Larger $\beta_1$ and $\beta_2$ mean a higher probability of (C, C) and a lower probability of (C, D). So, larger $\beta_1$ and $\beta_2$ correspond to a better effect in promoting the evolution of cooperation.
But, under uniform interaction rates, the corresponding probabilities are $x^2$ and $(1 - x)^2$, respectively, so an individual's encounter constrained by TBC no longer abides by uniform interaction rates.
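The partition of $\Omega$ and the resulting strategy-pair probabilities follow directly from the tagging rule and random matching. The sketch below computes them as population shares (group sizes divided by $N$); the function names are assumptions, and the pair probabilities are our reading of the rule stated above rather than a formula quoted from the cited references.

```python
def tag_partition(x, beta1, beta2):
    """Population shares of the four tagged groups Omega_1 .. Omega_4."""
    return {
        "omega1": beta1 * x,              # cooperators tagged C'
        "omega2": (1 - beta2) * (1 - x),  # defectors tagged C'
        "omega3": (1 - beta1) * x,        # cooperators tagged D'
        "omega4": beta2 * (1 - x),        # defectors tagged D'
    }


def pair_probabilities(x, beta1, beta2):
    """Probabilities of the strategy pairs in a tagged period under random matching."""
    shares = tag_partition(x, beta1, beta2)
    cc = shares["omega1"] ** 2                   # both players are in Omega_1
    cd = 2 * shares["omega1"] * shares["omega2"]  # Omega_1 meets Omega_2, either order
    return {"CC": cc, "CD": cd, "DD": 1 - cc - cd}


print(tag_partition(0.4, 0.8, 0.7))
print(pair_probabilities(0.4, 0.8, 0.7))
```

Setting beta1 = 1 and beta2 = 0 (everyone tagged $C'$) recovers the uniform-interaction values $x^2$, $2x(1-x)$, and $(1-x)^2$.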
Based on Assumptions 2 and 3 and Definition 4, the cooperation identified rate $\beta_1$ and the defection identified rate $\beta_2$ reflect the accuracy of TBC. We denote by $\beta'$ the probability of another encounter between the same two individuals in the repeated Prisoner's Dilemma and by $\beta''$ the probability of knowing someone's reputation in indirect reciprocity. According to [31], the condition under which direct reciprocity (indirect reciprocity) can lead to the evolution of cooperation depends on the parameters of payoff matrix (1) and on $\beta'$ ($\beta''$). But, by Assumption 1, $\beta'$ is very small because the group size $N$ is large. On the other hand, the high cost of fully understanding and accurately knowing each other's willingness to cooperate means that $\beta''$ is also very small. So, direct reciprocity and indirect reciprocity cannot promote cooperation in the setting that this paper considers. In every round game, the cooperation identified rate $\beta_1$ and the defection identified rate $\beta_2$ can play the role of $\beta'$ in direct reciprocity or $\beta''$ in indirect reciprocity.

Assumption 4.
The evolution of the group state $x$ is carried out according to the replicator. An individual replication occurs after each round game ends and before the next round begins. An individual's payoff refers to the average total payoff in this round game.
Assumption 4 means that the offspring of cooperators are still cooperators and the offspring of defectors are still defectors. So, the offspring of C or D are independent of their parent's tag. By Assumption 3, defectors always choose D, but a cooperator does not always choose C.
The tag defines a mapping $\sigma$ from $\Omega$ to $\{C', D'\}$, i.e., $\sigma: \Omega \to \{C', D'\}$. By calculation, the payoff of a cooperator (defector) in the $p$th period of the $r$th round game is independent of $r$ and is determined by the cooperation rate $x$. Let $v_{p,C}(x)$ ($v_{p,D}(x)$) denote the payoff of a cooperator (defector) in the $p$th period. Let $w_C(x)$ ($w_D(x)$) denote the total payoff of a cooperator (defector) in one round game, i.e., the sum of its payoffs over the $m + 1$ periods. By Assumptions 1 and 2, the probability of anyone encountering a cooperator labeled $C'$ is $\beta_1 x$, and the probability of anyone encountering a cooperator labeled $D'$ is $(1 - \beta_1)x$; the probability of anyone encountering a defector labeled $C'$ is $(1 - \beta_2)(1 - x)$, and the probability of anyone encountering a defector labeled $D'$ is $\beta_2(1 - x)$. Together, (4)-(7) determine $w_C(x)$ and $w_D(x)$ as given in (8). By Assumption 3, an individual's average payoff in the $r$th round game is $x w_C(x) + (1 - x) w_D(x)$. Summing up the above discussion, the total payoff matrix of a round game under TBC is matrix (9). In fact, in the latter $m$ periods, the hybrid strategy and the group state are not equivalent because a cooperator constrained by Assumption 3 does not always choose C. So, the payoff $a_{ij}$ in payoff matrix (9) does not indicate the payoff of individual $i$ encountering strategy $j$, where $i, j = C, D$. However, we still use payoff matrix (9) for the following two reasons: he enters a group of cooperators or is $P$ if he enters a group of defectors.
If $\beta_1 = 1$ and $\beta_2 = 0$, i.e., all individuals are identified as cooperators, the game with payoff matrix (9) is equivalent to a Prisoner's Dilemma repeated $m + 1$ times with payoff matrix (1). By Assumption 3, cooperators always choose C and defectors always choose D. The total number of deaths is
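The equivalence with an $(m+1)$-fold repeated game can be checked on a reconstructed round matrix. The sketch below builds the round totals from one untagged period plus $m$ tagged periods, using the behavior rule described earlier (mutual cooperation only inside $\Omega_1$, exploitation only when $\Omega_1$ meets $\Omega_2$). It is our own reconstruction under those stated rules, not the paper's printed matrix (9), and the function name is hypothetical.

```python
def round_payoff_matrix(m, beta1, beta2, T, R, P, S):
    """Reconstructed total payoffs of one round: 1 untagged + m tagged periods.

    Rows and columns are ordered (C, D); entries are the row type's totals.
    """
    k = beta1 * (1 - beta2)  # chance that a C'-tagged cooperator faces a C'-tagged defector
    a_cc = R + m * (beta1 ** 2 * R + (1 - beta1 ** 2) * P)
    a_cd = S + m * (k * S + (1 - k) * P)
    a_dc = T + m * (k * T + (1 - k) * P)
    a_dd = (m + 1) * P
    return [[a_cc, a_cd], [a_dc, a_dd]]


# With beta1 = 1 and beta2 = 0 every entry is (m + 1) times the one-shot payoff.
print(round_payoff_matrix(m=4, beta1=1, beta2=0, T=5, R=3, P=1, S=0))
```

For large $m$ the first-period terms become negligible, and with $P = 0$ the per-period part reduces to the entries $\beta_1^2 R$, $\beta_1(1-\beta_2)S$, and $\beta_1(1-\beta_2)T$.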

The Replicator
Then, the replicator dynamics under TBC is given by (10).
Assumption 5. The tagged Prisoner's Dilemma is repeated for sufficiently many periods, i.e., $m \gg 1$.

By Assumption 5, the total payoff matrix (9) of a round game is approximated by matrix (11), and the replicator dynamics (10) is approximated by dynamics (12).
Definition 6 (see [43]). A fixed point $(x, y) \in \Delta$ of the replicator dynamics is an evolutionary equilibrium (EE) if it is locally asymptotically stable, i.e., every open neighborhood $K \subset \Delta$ of the point $(x, y)$ has the property that every path starting sufficiently close to $(x, y)$ remains in $K$ and converges asymptotically to $(x, y)$.
Definition 7 (see [43]). The largest open set of points whose evolutionary paths converge to a given EE is called its basin of attraction.
In order to simplify the calculation, $P = 0$ is assumed in the following discussion. Doing so results in a simpler model without significant qualitative or directional changes to our key results about the evolution of cooperation. Obviously, the parameters $T$, $R$, and $S$ satisfy $T > R > 0 > S$. Based on (11), the payoff matrix for a period under TBC is transformed as follows:

$$\begin{array}{c|cc} & C & D \\ \hline C & \beta_1^2 R & \beta_1(1 - \beta_2) S \\ D & \beta_1(1 - \beta_2) T & 0 \end{array} \tag{13}$$

Regardless of time scale, $m$ affects neither the ESS nor the EE, nor even the evolutionary path of the replicator dynamics, so we can rewrite the replicator dynamics (12) as follows:

$$\dot{x} = x(1 - x)\left[\beta_1^2 R x + \beta_1(1 - \beta_2)\bigl(S(1 - x) - T x\bigr)\right]. \tag{14}$$

Two specific instances are given in the following.
Case 1. Only the defectors are identified, and let $p(D'/D) = \beta_2$. $\Omega$ is divided into two parts. One is the set of defectors tagged $D'$. The other is the set of individuals without a tag. The corresponding part of Assumption 3 is modified as follows: a cooperator chooses C if he encounters an individual without a tag but chooses D if he encounters an individual tagged $D'$. Defectors always choose to defect. In this case, the payoff matrix is as follows:

$$\begin{array}{c|cc} & C & D \\ \hline C & R & (1 - \beta_2) S \\ D & (1 - \beta_2) T & 0 \end{array} \tag{15}$$

Case 2. Only the cooperators are identified, and let $p(C'/C) = \beta_1$. $\Omega$ is divided into two parts. One is the set of cooperators tagged $C'$. The other is the set of individuals without a tag. The corresponding part of Assumption 3 is modified as follows: a cooperator chooses C if he encounters an individual with tag $C'$ and chooses D if he encounters an individual without a tag. Defectors always choose to defect. In this case, we obtain the corresponding payoff matrix in the same way.
By comparing matrices (13) and (15), we see that Case 1 is equivalent to the cooperation identified rate $\beta_1 = 1$ in matrix (13), i.e., the controller accurately identifies cooperators. It follows that Case 1 is in accordance with control based on identifying defection. Correspondingly, Case 2 means that the controller identifies defectors accurately.
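The per-period payoff matrix under TBC and the right-hand side of the replicator equation can be written as two small functions. This is a sketch under the assumptions $P = 0$ and $m \gg 1$ used in this section; the matrix entries are reconstructed from the tagging rule described earlier, and the function names are hypothetical.

```python
def tbc_payoff_matrix(beta1, beta2, T, R, S):
    """Per-period payoff matrix under TBC with P = 0 (rows and columns ordered C, D)."""
    k = beta1 * (1 - beta2)
    return [[beta1 ** 2 * R, k * S], [k * T, 0.0]]


def replicator_rhs(x, beta1, beta2, T, R, S):
    """Time derivative of the cooperation rate x for the matrix above."""
    a = tbc_payoff_matrix(beta1, beta2, T, R, S)
    payoff_c = x * a[0][0] + (1 - x) * a[0][1]
    payoff_d = x * a[1][0] + (1 - x) * a[1][1]
    return x * (1 - x) * (payoff_c - payoff_d)


# Case 1 (only defectors identified) corresponds to beta1 = 1.
print(tbc_payoff_matrix(1.0, 0.8, T=5.0, R=3.0, S=-1.0))
print(replicator_rhs(0.5, 1.0, 0.8, T=5.0, R=3.0, S=-1.0))
```

With beta1 = 1 and beta2 = 0 the function returns the untagged matrix, which is a quick sanity check on the reconstruction.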
From the operator's perspective, the relevant decisions are (1) whether to design TBC; (2) which kind of individual is more effective to identify, cooperators or defectors; and (3) how to design the cooperation identified rate $\beta_1$ and the defection identified rate $\beta_2$.

Evolution of Cooperation for the Prisoner's Dilemma with TBC
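We denote by $p(C/C')$ the probability that an individual tagged $C'$ is a cooperator and by $p(D/D')$ the probability that an individual tagged $D'$ is a defector.
Proof of Proposition 1. According to Assumption 2, $p(C'/C) = \beta_1$ and $p(D'/D) = \beta_2$. So, by Bayes' rule, one can obtain the posterior probabilities $p(C/C')$ and $p(D/D')$ as follows:

$$p(C/C') = \frac{\beta_1 x}{\beta_1 x + (1 - \beta_2)(1 - x)}, \qquad p(D/D') = \frac{\beta_2 (1 - x)}{\beta_2 (1 - x) + (1 - \beta_1) x}.$$

□
Proof of Proposition 2. From matrices (1) and (13), one can obtain the expected payoffs of a cooperator and a defector with TBC, $f_C$ and $f_D$, and without TBC, $g_C$ and $g_D$, which are (18) and (19):

$$f_C(x, \beta_1, \beta_2) = \beta_1^2 R x + \beta_1 (1 - \beta_2) S (1 - x), \qquad f_D(x, \beta_1, \beta_2) = \beta_1 (1 - \beta_2) T x,$$

$$g_C(x) = R x + S (1 - x), \qquad g_D(x) = T x.$$

The conclusion follows from (18) and (19). □
Based on Proposition 1, TBC can help cooperators not only identify cooperators but also reduce the risk of unilateral fraud by defectors if $\beta_1 + \beta_2 > 1$. According to Proposition 2, for any cooperation rate $x \in (0, 1)$, TBC satisfying $\beta_1 + \beta_2 > 1$ can inhibit defection. But Propositions 1 and 2 do not mean that TBC can raise a cooperator's payoff, even when $\beta_1 + \beta_2 > 1$. This point can be confusing; Propositions 3 and 4 clarify it.
Proposition 3. We denote $*_{ix} = \partial *_i / \partial x$, where $* = f, g$ and $i = C, D$. As the cooperation rate $x$ increases, the payoffs of cooperators and defectors increase both with and without TBC, but they increase more slowly with TBC.
Proof. From (18) and (19), according to $T > R > 0 > S$, we get $0 < f_{ix} < g_{ix}$ for $i = C, D$. So, with increases in the cooperation rate $x$, the payoff of all individuals with TBC increases less than their payoff without TBC. □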
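The posterior probabilities used in the proof of Proposition 1 are a direct application of Bayes' rule to Assumption 2, so they are easy to check numerically. In the sketch below the function names are assumptions; the sample values are arbitrary.

```python
def posterior_cooperator_given_tag_c(x, beta1, beta2):
    """p(C / C'): probability that an individual tagged C' really is a cooperator."""
    return beta1 * x / (beta1 * x + (1 - beta2) * (1 - x))


def posterior_defector_given_tag_d(x, beta1, beta2):
    """p(D / D'): probability that an individual tagged D' really is a defector."""
    return beta2 * (1 - x) / (beta2 * (1 - x) + (1 - beta1) * x)


x = 0.3
# beta1 + beta2 > 1: the tag is informative, so the posterior exceeds the prior x.
print(posterior_cooperator_given_tag_c(x, 0.8, 0.7))
# beta1 + beta2 = 1: the tag carries no information, so the posterior equals x.
print(posterior_cooperator_given_tag_c(x, 0.6, 0.4))
```

Rearranging $p(C/C') > x$ gives exactly $\beta_1 + \beta_2 > 1$, which is the condition quoted in the discussion of Proposition 1.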
By Proposition 3, the fact that every individual's payoff increases in the cooperation rate $x$ does not depend on whether TBC is applied. But the rate of increase of an individual's payoff with TBC is lower than without TBC. In this sense, TBC reduces group efficiency as the group cooperation rate increases. Why, then, do we design TBC in the Prisoner's Dilemma? In fact, without special mechanisms for promoting the evolution of cooperation, the group cooperation rate will gradually decrease until cooperators are completely expelled from the group. So, the effect of TBC on the payoff of individuals and the evolution of cooperation is complex. The effect of TBC on the cooperator's payoff and on the evolution of cooperation involves $\beta_1$, $\beta_2$, the parameters $R$, $T$, $S$, and the group cooperation rate $x$. We denote the two critical cooperation rates that appear below by $x_1(\beta_1, \beta_2)$ and $x_2(\beta_1, \beta_2)$.
Proposition 4. The effects of TBC on the group cooperation rate $x$ and an individual's payoff are as follows:
(i) For any $x \in (0, 1)$, the defector's payoff with TBC is always lower than its payoff without TBC.
(ii) A cooperator's payoff with TBC is higher than its payoff without TBC when the group cooperation rate $x$ is lower than $x_1(\beta_1, \beta_2)$ but is lower than its payoff without TBC when $x$ is higher than $x_1(\beta_1, \beta_2)$, i.e., $\mathrm{sign}(f_C - g_C) = \mathrm{sign}(x_1 - x)$, where $\mathrm{sign}(\cdot)$ is the sign function.
(iii) For any $x \in (0, 1)$, the cooperator's payoff increases in $\beta_2$, and the defector's payoff increases in $\beta_1$ but decreases in $\beta_2$.
(iv) However, the cooperator's payoff increases in $\beta_1$ when the group cooperation rate $x$ is higher than $x_2(\beta_1, \beta_2)$ but decreases when $x$ is lower than $x_2(\beta_1, \beta_2)$.
Proof. From (18) and (19), for any $\beta_1, \beta_2 \in (0, 1)$ and $x \in (0, 1)$, $f_D(x, \beta_1, \beta_2) < g_D(x)$, and $f_C(x, \beta_1, \beta_2) - g_C(x)$ has the same sign as $x_1(\beta_1, \beta_2) - x$. Conclusions (i) and (ii) are proved. Furthermore, differentiating $f_C$ and $f_D$ with respect to $\beta_1$ and $\beta_2$ yields conclusions (iii) and (iv). □
By applying Proposition 4, for any $\beta_1, \beta_2 \in (0, 1)$, whether the TBC increases or decreases the cooperator's payoff depends on the group cooperation rate and on $x_1(\beta_1, \beta_2)$. Fortunately, the critical points satisfy $x_i(\beta_1, \beta_2) \in (0, 1)$, $i = 1, 2$. Furthermore, we obtain $\lim_{\beta_1 \to 1} x_1(\beta_1, \beta_2) = 1$. So, the TBC can always improve the cooperator's payoff if the cooperation identified rate $\beta_1$ is large enough. Furthermore, for any $x \in (0, 1)$, we can design an appropriate $\beta_1$ such that $x < x_1(\beta_1, \beta_2)$. Applying (i) and (ii) of Proposition 4, an appropriate TBC can raise the cooperator's payoff and reduce the defector's payoff, thus increasing the group cooperation rate rapidly.
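The critical rates in Proposition 4 can be made concrete by solving $f_C = g_C$ for $x$ (which gives $x_1$) and $\partial f_C / \partial \beta_1 = 0$ for $x$ (which gives $x_2$). The closed forms below are our own derivation from the per-period payoffs with $P = 0$, not expressions quoted from the paper, and the function names are hypothetical.

```python
def coop_payoff_with_tbc(x, b1, b2, R, S):
    """Cooperator's expected per-period payoff under TBC (P = 0)."""
    return b1 ** 2 * R * x + b1 * (1 - b2) * S * (1 - x)


def coop_payoff_without_tbc(x, R, S):
    """Cooperator's expected payoff in the plain game (P = 0)."""
    return R * x + S * (1 - x)


def x1_threshold(b1, b2, R, S):
    """Cooperation rate below which TBC raises the cooperator's payoff."""
    avoided_loss = (1 - b1 * (1 - b2)) * (-S)  # sucker's payoffs avoided thanks to tags
    lost_reward = (1 - b1 ** 2) * R            # mutual cooperation lost to misidentification
    return avoided_loss / (avoided_loss + lost_reward)


def x2_threshold(b1, b2, R, S):
    """Cooperation rate above which the cooperator's payoff increases in beta1."""
    risk = (1 - b2) * (-S)
    return risk / (2 * b1 * R + risk)


R, S = 3.0, -1.0
print(x1_threshold(0.8, 0.7, R, S), x2_threshold(0.8, 0.7, R, S))
```

As beta1 approaches 1 the lost-reward term vanishes and x1_threshold tends to 1, matching the limit stated in the text.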
Denoting the group's expected payoff without TBC as $H(x)$ and with TBC as $H'(x, \beta_1, \beta_2)$, we get the following proposition.
Proposition 5. If $S + T > 0$, then, for any given $x \in (0, 1)$ and any $\beta_1, \beta_2 \in (0, 1)$, the group's expected payoff without TBC is higher than that with TBC, i.e., $H(x) > H'(x, \beta_1, \beta_2)$.
Proof. Applying the payoff matrix (13), we get $H(x) - H'(x, \beta_1, \beta_2) = (1 - \beta_1^2) R x^2 + \bigl(1 - \beta_1(1 - \beta_2)\bigr)(S + T)\, x (1 - x) > 0$. This completes the proof. □
Proposition 5 is simple but interesting. It shows that the TBC reduces the group's average payoff when the cooperator's loss $-S$, caused by defection, is less than the defector's gain $T$. Furthermore, this conclusion is not affected by $x$, $R$, $\beta_1$, and $\beta_2$. Furthermore, we obtain a result that is consistent with Proposition 2. By the fact that $\lim_{\beta_2 \to 1} x_3(\beta_1, \beta_2) = 0$, when $\beta_2$ is large enough, increasing $\beta_1$ can improve the group's average payoff, but it is still lower than that without TBC.
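Proposition 5 can be checked numerically by comparing the group's expected payoff with and without TBC on a grid. In the sketch below the expected payoffs are computed as $x \cdot (\text{cooperator payoff}) + (1 - x) \cdot (\text{defector payoff})$ with $P = 0$; the function names, grid, and payoff values are assumptions for illustration.

```python
def group_payoff_without_tbc(x, T, R, S):
    """Average payoff H(x) in the plain game with P = 0."""
    return R * x * x + (S + T) * x * (1 - x)


def group_payoff_with_tbc(x, b1, b2, T, R, S):
    """Average payoff H'(x, beta1, beta2) under TBC with P = 0."""
    return b1 ** 2 * R * x * x + b1 * (1 - b2) * (S + T) * x * (1 - x)


grid = [i / 10 for i in range(1, 10)]

# S + T > 0: TBC lowers the group's average payoff everywhere on the grid.
always_lower = all(
    group_payoff_with_tbc(x, b1, b2, 5.0, 3.0, -1.0) < group_payoff_without_tbc(x, 5.0, 3.0, -1.0)
    for x in grid for b1 in grid for b2 in grid
)
print(always_lower)

# S + T < 0: at a low cooperation rate TBC can raise the group's average payoff.
print(group_payoff_with_tbc(0.1, 0.9, 0.9, 2.0, 1.0, -4.0),
      group_payoff_without_tbc(0.1, 2.0, 1.0, -4.0))
```

The second comparison illustrates the case $S + T < 0$, where avoiding (C, D) encounters is worth more to the group than the cooperation that misidentification destroys.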
We now assume that $S + T < 0$. In other words, in the situation where two players' total payoff with the strategy pair (C, D) is lower than their total payoff with (D, D), the TBC can increase the group's average payoff when the cooperation rate is low.
Remark 1. Propositions 2-5 are all based on the payoff matrix (13), which is obtained under $m \gg 1$ and $P = 0$. The assumption $m \gg 1$ is realistic. For example, a confirmation by the supervisory department of, amongst other things, an enterprise's credit or product quality cannot be eliminated in the short term. The values of $x_i(\beta_1, \beta_2)$, $i = 1, 2, 3, 4$, will be more complicated if $P \neq 0$. But the hypothesis that $P = 0$ does not result in significant qualitative or directional changes to the conclusions of Propositions 2-5. Furthermore, the ESS and EE of the game are not altered by the assumptions $m \gg 1$ and $P = 0$. Propositions 2-5 show that TBC will bring down a group's payoff unless the cooperation identified rate $\beta_1$ and the defection identified rate $\beta_2$ are both large enough. From the point of view of static analysis, it is hard to understand how TBC can promote cooperation. However, from the evolutionary point of view, without TBC, cooperation will fail in evolution and all benefits based on cooperation will disappear. Now, we discuss the ESS and EE of the Prisoner's Dilemma with TBC.
According to Friedman [43], where the Prisoner's Dilemma was discussed as a linear game, the point $(x, y) = (0, 1)$ is an EE of replicator dynamics (14). Based on Proposition 6, a small group of cooperators cannot invade a group of defectors, no matter how high the identified rates $\beta_1$ and $\beta_2$ are.
If the defection identified rate satisfies $\beta_2 > (T - R)/T$, cooperation can evolve successfully. Denoting the corresponding threshold by $x_6(\beta_2)$, the largest basin of attraction of cooperation is $(x_6(\beta_2), 1)$ for a given $\beta_2$. Otherwise, if $\beta_2 < (T - R)/T$, defection is the only ESS. If we consider $(T - R)/T$ as the cost-to-benefit ratio of cooperation, this conclusion is in good agreement with the work of Nowak [44]. In [44], it is shown that direct reciprocity can lead to the evolution of cooperation only if the probability of another encounter between the same two individuals exceeds the cost-to-benefit ratio of cooperation, i.e., $\beta' > (T - R)/T$, and indirect reciprocity can only promote cooperation if the probability of knowing someone's reputation exceeds the cost-to-benefit ratio of cooperation, i.e., $\beta'' > (T - R)/T$. By Proposition 7, when condition (26) is satisfied, cooperation is an ESS of the Prisoner's Dilemma and the average payoff of an individual in a group full of cooperators is $\beta_1^2 R$. But, without TBC, defection is the only ESS and the average payoff of an individual will be 0. To sum up the above discussion, from a dynamic point of view, TBC can raise a cooperator's payoff. Further observation shows that the defection identified rate $\beta_2$ plays a more important role in promoting the evolution of cooperation than the cooperation identified rate $\beta_1$, but $\beta_1$ determines an individual's payoff when the cooperation rate is high. From now on, we discuss only the cooperator's payoff and the evolution of cooperation.
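The threshold behavior described above can be illustrated by simulating the per-period replicator equation with $P = 0$. In the sketch below, `basin_boundary` is the interior rest point for $\beta_1 = 1$, obtained by us from setting the cooperator's and defector's payoffs equal; it is an assumption standing in for $x_6(\beta_2)$, not the paper's printed expression. Payoff values, step size, and function names are likewise illustrative.

```python
def cooperation_threshold(T, R):
    """Cost-to-benefit ratio (T - R) / T that beta2 must exceed."""
    return (T - R) / T


def basin_boundary(beta2, T, R, S):
    """Interior rest point for beta1 = 1 (reconstructed; assumes beta2 above the threshold)."""
    k = 1.0 - beta2
    return -k * S / (R - k * (S + T))


def simulate(x0, beta1, beta2, T=5.0, R=3.0, S=-1.0, t_end=50.0, dt=0.01):
    """Euler integration of the TBC replicator equation from cooperation rate x0."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        drift = beta1 ** 2 * R * x + beta1 * (1 - beta2) * (S * (1 - x) - T * x)
        x += dt * x * (1 - x) * drift
    return x


print(cooperation_threshold(5.0, 3.0), basin_boundary(0.8, 5.0, 3.0, -1.0))
print(simulate(0.5, 1.0, 0.8))   # starts above the boundary: cooperation takes over
print(simulate(0.05, 1.0, 0.8))  # starts below the boundary: defection takes over
print(simulate(0.9, 1.0, 0.2))   # beta2 below the threshold: defection takes over
```

The three runs show the bistability for a sufficiently accurate controller and the collapse of cooperation when the defection identified rate is too low.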
The cooperation identified rate $\beta_1$ and the defection identified rate $\beta_2$ are the inputs of TBC. They are feedback of the cooperation rate $x$. At any time $t$, the cooperation rate is the variable $x(t)$. So, $\beta_1$ and $\beta_2$ are functions of time $t$. In the rest of this paper, we denote by $\beta_1(t)$ and $\beta_2(t)$ the cooperation identified rate and the defection identified rate at time $t$.

The Optimal Identified Rate Model.
We define $u_i(t)$ through transformation (27), $u_i(t) = \ln\big((1 - \beta_i(t))/\beta_i(t)\big)$. Since $\beta_i \in (0, 1)$, one obtains $u_i \in (-\infty, +\infty)$ and $\beta_i(t) = 1/(1 + e^{u_i(t)})$, $i = 1, 2$. The performance index (28) is introduced, where $t_f$ is the terminal time of the replicator dynamic (14), $x(t_f)$ denotes the terminal cooperation rate, $1 - x(t_f)$ is the effective cost of promoting the cooperation rate, and $\int_0^{t_f} \beta_i(t)\,dt$ is the control cost of the entire TBC process. A larger $x(t_f)$ corresponds to a better effect of promoting cooperation and a lower effective cost. The parameter $q_i > 0$ adjusts the difference in dimension and weight between the effective cost and the control cost. Furthermore, $q_i$ reflects the controller's balance between the TBC for cooperation and the TBC for defection.
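The change of variables between the bounded rate $\beta_i \in (0, 1)$ and the unconstrained control $u_i \in (-\infty, +\infty)$ can be illustrated with a short sketch. It implements $\beta_i = 1/(1 + e^{u_i})$ from the text together with its algebraic inverse; the function names and the overflow-safe branching are illustrative choices, not part of the paper.

```python
import math


def beta_from_u(u: float) -> float:
    """Map an unconstrained control u to beta = 1 / (1 + e^u)."""
    if u >= 0:
        # Rewrite as e^(-u) / (1 + e^(-u)) so that exp never overflows.
        e = math.exp(-u)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(u))


def u_from_beta(beta: float) -> float:
    """Inverse map: u = ln((1 - beta) / beta), defined only for 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log((1.0 - beta) / beta)


for beta in (0.1, 0.5, 0.7, 0.9):
    u = u_from_beta(beta)
    print(f"beta = {beta:.1f} -> u = {u:+.4f} -> beta = {beta_from_u(u):.4f}")
```

Large positive $u_i$ drives $\beta_i$ toward 0 and large negative $u_i$ drives it toward 1, so an unconstrained optimizer over $u_i$ never leaves the admissible range of the identified rate.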
We rewrite replicator dynamic (14) and performance index (28) under transformation (27) as (29) and (30), respectively. We suppose that the initial time of replicator dynamic (14) is $t = 0$ and that the initial value of the system is $x(0) = x_0$ (31). The initial value (31) indicates that, at the beginning of the evolution, the population is in a state in which cooperation and defection are mixed. We denote $u(t) = (u_1(t), u_2(t))^T$ and consider $u(t)$ as the control variable of the dynamic (29). The feedback control law of the state $x(t)$ is designed so that the total cost defined by the performance index (30) is minimized.
The optimal identified rate problem can be expressed as follows: under dynamic constraint (29) and initial value constraint (31), the optimal control $u^*(t)$ and the optimal state trajectory $x^*(t)$ are designed to minimize (30).

The Optimal Identified Rate Design Process.
There is no analytic solution for the optimal control problem (29)-(31). We therefore give a numerical algorithm to design an optimal identified rate.

Theorem 1.
Consider the optimal identified rate design described by replicator dynamic (14) with performance index (30). Under (27) and (31), the optimal identified rate $\beta_i^*(t)$ exists and is unique, and $\beta_i^*(t)$ is expressed in terms of $\lambda^{(h,k)}(t)$ and $x^{(h,k)}(t)$, where $\lambda^{(h,k)}(t)$ is the solution of the adjoint variable differential equation, $x^{(h,k)}(t)$ is the solution of the state differential equation, and $\lambda^{(*h)}(t) = \lim_{k \to \infty} \lambda^{(h,k)}(t)$.
Proof. Introduce the adjoint variable $\lambda(t)$ and let $h(x) = (S + T)x - S$. If the optimal control and the optimal state are $u^*(t)$ and $x^*(t)$, then there exists an adjoint variable $\lambda^*(t)$ such that $x^*(0) = x_0$, $\lambda^*(t_f) = 1$, and (36) holds. From (36), $x^*(t)$, $\lambda^*(t)$, and $u_i^*(t)$ satisfy (37) and (38). The adjoint variable and its terminal condition under the optimal control $u^*(t)$ are given by (39). Substituting (37) and (38) into (29) and (39) yields a two-point boundary value (TPBV) problem.
The sequences $x^{(h,k)}(t)$ and $\lambda^{(h,k)}(t)$ converge uniformly to the solution of the TPBV problem (39). This completes the proof. □
The numbers of iterations $X$ and $Y$ can be determined by the accuracy requirement for the performance index (30). A practical algorithm for calculating the $(X, Y)$th suboptimal optimal identified rate is given as Algorithm 1.

Simulation Examples
In the Prisoner's Dilemma with payoff matrix (1), let $t_f = 5$. The replicator dynamic without TBC is given by (49). By simulation, we study the following three problems.

Effect Analysis of TBC.
Let $x_0 = 0.4$, $t \in [0, 5]$, and $\beta = \beta_1(t) = \beta_2(t)$ equal to 0.7 and 0.9, respectively. In the previous discussion, $f_C(x, \beta_1, \beta_2)$ and $f_D(x, \beta_1, \beta_2)$ denote the payoffs of the cooperator and the defector with TBC, $g_C(x)$ and $g_D(x)$ denote the payoffs of the cooperator and the defector without TBC, and $H(x)$ and $H'(x, \beta_1, \beta_2)$ denote the group's expected payoff without and with TBC. In Figure 1, to simplify the legends and discuss the effect of $\beta$ on promoting cooperation, we redefine the symbols as follows. At $t = t_0$, $H'(\beta)$ and $f_i(\beta)$ are the payoffs of the group and of individual $i$ when TBC $\beta$ is applied from $t = 0$ to $t = t_0$; $H(\beta)$ and $g_i(\beta)$ are the payoffs obtained just after TBC has been withdrawn, i.e., TBC $\beta$ is applied from $t = 0$ to $t = t_0 - \tau$, where $\tau$ is positive and small enough; $x(\beta)$ is the cooperation rate with TBC $\beta$; and $g_i$, $H$, and $x$ are the payoff curve of individual $i$, the payoff curve of the group, and the cooperation rate curve without TBC, respectively, where $i = C, D$.
From the curves of $g_C$ and $g_D$ in Figures 1(a) and 1(b), the payoffs of both the cooperator and the defector decrease and cooperation fails to evolve without TBC. Comparing Figures 1(a) and 1(d), we see that cooperators obtain a higher payoff than without TBC, but the cooperation rate $x(0.7)$ still decreases and cooperators are still expelled, because the identified rate $\beta = 0.7$ is too low. With the parameters in this example, $x_0 = 0.4$ lies in the basin of attraction of cooperation only when $\beta > 11/14$. As a result, the payoff of the cooperator decreases and cooperation fails to evolve. When $\beta = 0.9$, the payoff of cooperators increases and cooperation evolves successfully; the payoff of defectors also increases, but defection fails to evolve.
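The no-TBC behavior described above can be reproduced with a small simulation. The sketch below assumes the standard two-strategy replicator form $\dot{x} = x(1 - x)(g_C(x) - g_D(x))$, which may differ in detail from the paper's equation (49), and takes $g_C(x) = 7x - 4$ and $g_D(x) = 5x$ from the total-payoff integrands quoted in this section. The fixed-step Runge-Kutta integrator and all function names are illustrative.

```python
def payoff_gap_without_tbc(x: float) -> float:
    """g_C(x) - g_D(x) with g_C = 7x - 4 and g_D = 5x."""
    return (7.0 * x - 4.0) - 5.0 * x


def replicator_rhs(x: float) -> float:
    """Assumed replicator dynamic without TBC: dx/dt = x (1 - x) (g_C - g_D)."""
    return x * (1.0 - x) * payoff_gap_without_tbc(x)


def simulate_without_tbc(x0: float = 0.4, t_f: float = 5.0, steps: int = 5000):
    """Integrate the cooperation rate on [0, t_f] with classical RK4."""
    dt = t_f / steps
    xs = [x0]
    x = x0
    for _ in range(steps):
        k1 = replicator_rhs(x)
        k2 = replicator_rhs(x + 0.5 * dt * k1)
        k3 = replicator_rhs(x + 0.5 * dt * k2)
        k4 = replicator_rhs(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(x)
    return xs


trajectory = simulate_without_tbc()
for t in (0, 1, 2, 3, 4, 5):
    x_t = trajectory[t * 1000]  # 1000 steps per unit of time
    print(f"t = {t}: x = {x_t:.3e}, g_C = {7 * x_t - 4:.3f}, g_D = {5 * x_t:.3f}")
```

Because $g_C(x) - g_D(x) = 2x - 4$ is negative for every $x \in [0, 1]$, the cooperation rate can only fall under this assumed dynamic, which matches the statement that cooperation fails to evolve without TBC.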
Here, we show what happens if we withdraw TBC for a short time. First, we emphasize the symbols $H'(0.9)$, $H(0.9)$, and $H$ in Figure 1(c). For a given time $t_0$, $H(0.9)$ and $H'(0.9)$ both correspond to the cooperation rate $x(t_0)$ defined by the replicator dynamics (48) with TBC $\beta = 0.9$, whereas $H$ corresponds to the cooperation rate $x'(t_0)$ defined by the replicator dynamics without TBC (49). As shown in Figure 1(b), $g_D(\beta) > f_D(\beta)$ for any identified rate $\beta$, i.e., TBC reduces the defector's payoff. When $\beta = 0.7$, condition (26) is not satisfied and $x = 0.861$, so the point $x_0 = 0.4$ is not in the basin of attraction of cooperation and $f_C(0.7) > g_C(0.7)$. By designing a larger identified rate $\beta = 0.9$, however, the basin of attraction of cooperation becomes $(0.154, 1)$ and $x_1 = 0.861$. When $t > 2.213$, one gets $x > 0.861$ and $f_C(0.9) < g_C(0.9)$, so that TBC inhibits the growth of the cooperator's payoff. If we withdraw TBC for a short time, cooperators will obtain a higher payoff.
Algorithm 1: Finding a suboptimal optimal identified rate.
Input: Given positive real numbers $\delta$ and $\varepsilon$.
Output: $(X, Y)$th suboptimal $\beta$.
Step 1: Initialize $h = 1$, $\lambda^{(*0)}(t) = 1$.
[...]; otherwise, let $k = k + 1$ and jump to Step 3.
[...] Otherwise, let $k = k + 1$ and jump to Step 6.
Combined with the evolution of cooperation, we come to the following conclusion: when the cooperation rate $x(t)$ is low, TBC makes the payoff of all individuals higher than the payoff without TBC; when the cooperation rate $x(t)$ is large enough, the payoff of all individuals is lower than the payoff without TBC under the same $x(t)$. On the one hand, Figures 1(b) and 1(c) show that TBC decreases not only the defector's payoff but also the group's payoff, and the higher the cooperation rate is, the more obvious the inhibition is. On the other hand, we observe the positive effects of TBC on promoting both the cooperation rate and the group's payoff. The cooperation rate $x(t)$ increases only under a sufficiently large $\beta_i(t)$, $i = 1, 2$. For example, although the group payoff $H'(0.9)$ under TBC is always smaller than $H(0.9)$, it is always greater than $H$.
In summary, Figure 1 shows that when the group cooperation rate $x(t)$ defined by (48) reaches a satisfactory value, temporary withdrawal of TBC can increase the payoff of all individuals in the group for a short time. However, if TBC is absent for a long time, the group cooperation rate defined by (49) will drop to a very low level and the payoffs of all individuals will decrease.

Analysis of Rationality of the Optimal Identified Rate.
Letting $x(0) = 0.4$ and $q_1 = q_2 = q$, we obtain the optimal identified rate $\beta_i^*(t)$, $i = 1, 2$, under different $q$ as shown in Figure 2 and the optimal state path as shown in Figure 3.
Given a sufficiently large constant $\Gamma > 0$ and a sufficiently small constant $c > 0$, if $u_i(t) > \Gamma$, we define $\beta_i(t) = 0$, and if $0 < u_i(t) < c$, we define $\beta_i(t) = 1$, where $i = 1, 2$. Figure 2 shows that $\beta_i^*(t)$, $i = 1, 2$, approximates bang-bang control with switching time $t_{is}$: if $t < t_{is}$, then $\beta_i^*(t) = 1$; otherwise, $\beta_i^*(t) = 0$. A smaller $q$ corresponds to a later switching time. If $q_i$ is large enough, then $\beta_i^*(t) = 0$. We now explain $\beta_i^*(t) = 0$. According to the definition of TBC, $\beta_i^*(t) = 0$ means that all cooperators are labeled as "defectors" and all defectors are labeled as "cooperators". This is impossible and inefficient. Here, $\beta_i^*(t) = 0$ can be understood as no effective TBC; all individuals choose the "defection" strategy for self-protection. In this case, the control cost is 0. Under $\beta_i(t) = 0$, the cooperation rate $x(t)$ is constant. By designing the bang-bang control as in Figure 2, we can successfully promote the evolution of cooperation, which meets the requirement of minimizing the performance index (30).
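The bang-bang shape of the optimal identified rate makes its control cost $\int_0^{t_f} \beta_i(t)\,dt$ easy to evaluate: it equals the switching time. The sketch below illustrates this with a midpoint-rule integral; the switching times used are hypothetical values, not results from the paper, and the function names are illustrative.

```python
def bang_bang_rate(t: float, t_switch: float) -> float:
    """Identified rate that is 1 before the switching time and 0 afterwards."""
    return 1.0 if t < t_switch else 0.0


def control_cost(t_switch: float, t_f: float = 5.0, steps: int = 10000) -> float:
    """Midpoint-rule approximation of the integral of beta(t) over [0, t_f]."""
    dt = t_f / steps
    return sum(bang_bang_rate((k + 0.5) * dt, t_switch) for k in range(steps)) * dt


for t_switch in (1.0, 2.5, 4.0):  # hypothetical switching times
    print(f"t_switch = {t_switch}: control cost = {control_cost(t_switch):.3f}")
```

A later switching time therefore buys a longer period of full identification at a proportionally higher control cost, which is the trade-off that the weight $q_i$ tunes.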
This is, however, unreasonable for the group in terms of revenue. A more reasonable approach would probably be to withdraw TBC when the cooperation rate $x(t)$ is satisfactory. We now discuss the TBC design from the perspective of revenue. Figure 3 shows that there is a suitable TBC that makes cooperation evolve successfully. Furthermore, a smaller $q$ yields a better effect of promoting cooperation.
With $\beta_i^*(t)$ designed according to Theorem 1, the total payoffs of the cooperator, defector, and group are given by the corresponding expressions under TBC. If we do not design the TBC, the total payoffs of the cooperator, defector, and group from $t = 0$ to $t = 5$ can be expressed as $\int_0^5 (7x'(t) - 4)\,dt$, $\int_0^5 5x'(t)\,dt$, and $\int_0^5 (2(x'(t))^2 + x'(t))\,dt$, respectively, where $x'(t)$ is described by (49). Let $q_1 = q_2 = 0.1$ and $q_1 = 0.1$, $q_2 = 0.05$, respectively. The calculation results are shown in Table 1. Table 1 shows that $\beta_i^*(t)$ improves the cooperator's payoff. Furthermore, as $q_i$ decreases, the switching time $t_{is}$ becomes later and the cooperator's payoff increases.
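The three no-TBC integrals can be evaluated numerically once a trajectory $x'(t)$ is available. The sketch below is an illustration under stated assumptions: it uses the standard replicator form $\dot{x} = x(1 - x)(g_C - g_D)$ with $g_C = 7x - 4$ and $g_D = 5x$ as a stand-in for equation (49), starts from $x_0 = 0.4$, and makes no claim to reproduce the entries of Table 1. All function names are illustrative.

```python
def rhs(x: float) -> float:
    """Assumed no-TBC replicator dynamic: x (1 - x) ((7x - 4) - 5x)."""
    return x * (1.0 - x) * ((7.0 * x - 4.0) - 5.0 * x)


def no_tbc_path(x0: float, t_f: float, steps: int):
    """Sample x'(t) on a uniform grid using classical RK4."""
    dt = t_f / steps
    xs = [x0]
    x = x0
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * dt * k1)
        k3 = rhs(x + 0.5 * dt * k2)
        k4 = rhs(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(x)
    return xs


def trapezoid(values, dt: float) -> float:
    """Composite trapezoidal rule on uniformly spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def total_payoffs_without_tbc(x0: float = 0.4, t_f: float = 5.0, steps: int = 5000):
    """Return (cooperator, defector, group) total payoffs on [0, t_f]."""
    dt = t_f / steps
    xs = no_tbc_path(x0, t_f, steps)
    cooperator = trapezoid([7.0 * x - 4.0 for x in xs], dt)
    defector = trapezoid([5.0 * x for x in xs], dt)
    group = trapezoid([2.0 * x * x + x for x in xs], dt)
    return cooperator, defector, group


coop_total, defect_total, group_total = total_payoffs_without_tbc()
print(f"cooperator: {coop_total:.3f}, defector: {defect_total:.3f}, group: {group_total:.3f}")
```

The cooperator's total is strongly negative in this sketch because $7x'(t) - 4$ approaches $-4$ as the cooperation rate collapses, which illustrates why withdrawing control for the whole horizon is costly for cooperators.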

Optimal Timing Control (OTC) of Switched Replicator Dynamics.
System (48) and (49) is a switched nonlinear system. In the past decades, switched systems have received significant research attention; see, e.g., [46-48]. In this paper, we design the switching control (SC) as follows: for given $q_1$ and $q_2$, let $t_s = t_{1s} = t_{2s}$, and design TBC with the identified rate
$\beta_i(t) = 1$, $t \in [0, t_s]$, $i = 1, 2$. (50)
When $t > t_s$, we give up the TBC. The replicator dynamics with SC is given by (51). Assuming $x_0 = 0.4$, the switching time $t_s$ is decided by the optimal identified rate problem (29)-(31) according to different weight coefficients $q_i$. Based on the switched system (51), the total payoffs of the cooperator, defector, and group can be expressed accordingly. The simulation results are shown in Table 2, and the state path with SC is shown in Figure 4.
By comparing Tables 1 and 2, we find that the impact of control on the average payoff of a group is complex. Table 1 shows that a smaller coefficient $q_i$ corresponds to a larger group payoff under optimal TBC, but Table 2 shows the opposite result. Combining Figures 3 and 4, the group has a larger average payoff under SC, but the cooperation rate is lower at $t = t_f = 5$.
Therefore, the choice between TBC and SC and the choice of the switching time $t_s$ depend on what we care about most. We now design the cost function $J_{SC}$ with OTC as (52), where $x(5)$ is the terminal cooperation rate, $2t_s$ is the control cost, $\int_0^{t_s} 3x''(t)\,dt + \int_{t_s}^{5} (7x''(t) - 4)\,dt$ is the cooperator's average payoff, and $q > 0$ is its weight coefficient. We search for the optimal switching time $t_s$ that maximizes the function (52), i.e., we design an OTC for promoting the evolution of cooperation in the Prisoner's Dilemma and increasing the cooperator's average payoff.
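The two ingredients of $J_{SC}$ that are spelled out in the text, the control cost $2t_s$ and the cooperator's payoff term, can be computed for any candidate switching time. The sketch below is illustrative: it takes the state path $x''(t)$ as a user-supplied function, since the switched dynamic (51) is not reproduced here, and it does not assemble $J_{SC}$ itself because the weights in (52) are not restated. The constant path in the usage example is purely hypothetical.

```python
from typing import Callable


def _trapezoid(f: Callable[[float], float], a: float, b: float, steps: int = 2000) -> float:
    """Composite trapezoidal rule for f on [a, b]; returns 0 for an empty interval."""
    if b <= a:
        return 0.0
    dt = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * dt)
    return total * dt


def cooperator_payoff_under_sc(x_path: Callable[[float], float],
                               t_s: float, t_f: float = 5.0) -> float:
    """Integral of 3x'' on [0, t_s] plus integral of (7x'' - 4) on [t_s, t_f]."""
    if not 0.0 <= t_s <= t_f:
        raise ValueError("t_s must lie in [0, t_f]")
    with_tbc = _trapezoid(lambda t: 3.0 * x_path(t), 0.0, t_s)
    without_tbc = _trapezoid(lambda t: 7.0 * x_path(t) - 4.0, t_s, t_f)
    return with_tbc + without_tbc


def control_cost_under_sc(t_s: float) -> float:
    """Control cost 2 * t_s: both identified rates equal 1 on [0, t_s]."""
    return 2.0 * t_s


def hypothetical_path(t: float) -> float:
    """Hypothetical constant cooperation rate, used only to exercise the functions."""
    return 0.5


payoff = cooperator_payoff_under_sc(hypothetical_path, t_s=2.0)
print(f"cooperator payoff: {payoff:.3f}, control cost: {control_cost_under_sc(2.0):.1f}")
```

Integrating the two phases separately avoids smearing the jump in the integrand at $t_s$, where the cooperator's payoff switches from $3x''$ to $7x'' - 4$.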
Theorem 2. Consider the OTC described by the switched replicator dynamic (51) with performance index (52). Let $\varphi_1(t_s)$ and $\varphi_2(t, t_s)$ be the solutions to the algebraic equation (53) and the implicit function (54), respectively. The optimal switching time $t_s$ is then decided by $\varphi_1(t_s)$ and $\varphi_2(t, t_s)$.
Letting $q' = 0.1$ and $q = 0$, 0.01, 0.1, and 1, respectively, and applying Theorem 2, we obtain the cooperation rate curves in Figure 5 and the payoffs of the cooperator, defector, and group in Table 3. Figure 5 shows that the cooperation rate $x(t)$ remains very high for every weight coefficient of the cooperator's payoff, $q = 0$, 0.01, 0.1, or 1, although the switching time $t_s$ changes ($t_s$ varies from 2.472 to 2.967). This is because we choose a large coefficient for the cooperation rate at terminal time, $x(5)$. In fact, the main goal of this paper is to promote the evolution of cooperation by designing TBC, so we design the cost function $J_{SC}$ to promote the cooperation rate and to reconcile the control cost with the cooperator's payoff. From Table 3, the switching time $t_s$ increases as the cooperator's payoff weight coefficient $q$ gets larger. Compared with Tables 1 and 2, Table 3 shows that the OTC has the following advantages. Firstly, it has the same advantage as TBC: it restrains the payoffs of defectors. Secondly, it greatly enhances the payoffs of cooperators compared to TBC. Thirdly, it is the optimal way to promote the payoffs not only of cooperators but also of groups. Lastly, it can effectively promote the cooperation rate $x(t)$.

Conclusions
The analysis in this paper provides valuable insights into a number of issues facing clusters in the economic society when they are confronted with the Prisoner's Dilemma. These insights can help operators of clusters decide whether to adopt TBC, whether to identify cooperators or defectors, and how to set the value of the identified rate.
First, operators should judge the harm of defection in the Prisoner's Dilemma. We show that when the cooperation rate is high, a large identified rate is needed to increase it further. Thus, it is difficult to design a TBC that further promotes cooperation and further improves the cooperator's payoff. Furthermore, the cluster must pay for the TBC, so it is not necessary to use TBC for a cluster when the cooperation rate is high. If the operator considers TBC necessary, we advise relying more on cooperation identification.
By contrast, if the cooperation rate is too low, i.e., the group is almost full of defectors, it is difficult to design a TBC that helps cooperators invade the group of defectors successfully. In this case, any TBC can increase the cooperator's payoff, but a large defection identified rate is essential to promote the cooperation rate. It is very important to note that defection identification is more effective than cooperation identification.
We show that switching between adopting TBC and withdrawing TBC is necessary. TBC has an advantage in promoting the evolution of cooperation. However, when the cooperation rate is high, TBC reduces the payoffs of the cooperators unless the identified rate is high enough. This paper gives OTC as a way to design the switching law. The simulation results confirm that OTC not only maintains a high cooperation rate but also increases the payoff of the cooperators.
While this paper is a significant step in the economic analysis of the Prisoner's Dilemma and in the optimal control of promoting cooperation, there are a number of interesting directions for future work, in particular extensions of our model obtained by relaxing some of our assumptions. For example, we assume that the payoff of mutual defection is 0 in this paper, i.e., $P = 0$. In fact, we can assume that an individual is exclusive [49] and that every individual only accepts an encounter tagged "cooperator".
Last but not least, this paper does not really design an SC law for the switched replicator dynamics; the SC designed in this paper is only an OTC with one switching time. Further work on promoting cooperation and increasing the cooperator's average payoff will aim to design an OTC with multiple switching times and a general SC unrestrained by bang-bang control. Furthermore, the optimal tag-based cooperation control proposed in this paper can be applied in management practice, such as in e-commerce platforms. Because of information asymmetry about product quality in e-commerce platforms, the PDG arises in seller groups and consumer groups. A platform manager can design tag-based cooperation control by offering an authentication service. A valuable extension of this research is to empirically examine the feasibility of the TBC proposed in this paper.
Data Availability
The simulation model and data are given in the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.