The intricate geometry of zero-determinant strategies underlying evolutionary adaptation from extortion to generosity

The advent of zero-determinant (ZD) strategies has reshaped the study of reciprocity and cooperation in the iterated prisoner's dilemma. The power of ZD strategies lies in their ability to unilaterally enforce a linear relationship between their own average payoff and that of their co-player. Common practice conveniently represents this relationship by a straight line in the parametric plot of pairwise payoffs. Yet little attention has been paid to the actual geometry of the strategy space of all admissible ZD strategies. Here, we offer intuitive geometric relationships between different classes of ZD strategies as well as nontrivial geometric interpretations of their specific parameterizations. The adaptive dynamics of ZD strategies further reveal an unforeseen connection between general ZD strategies and the so-called equalizers that can set any co-player's payoff to a fixed value. We show that the class of equalizers, which forms a hyperplane, is the critical equilibrium manifold, only part of which is stable. The same hyperplane is also a separatrix of the cooperation-enhancing region, in which the optimum response is to increase cooperation after each of the four payoff outcomes. Our results shed light on the simple but elegant geometry of ZD strategies that has previously been overlooked.


Introduction
The evolution of cooperation is one of the long-standing conundrums facing researchers from diverse fields [1]. Over the past decades, a number of seminal contributions have improved our understanding of cooperation [2]. Among other mechanisms, cooperation can prevail under reciprocal altruism [3], meaning 'you scratch my back and I will scratch yours'. As a repeated two-player game, the iterated prisoner's dilemma (IPD) has been the paradigm for studying direct reciprocity and cooperation [4][5][6][7][8][9][10]. In particular, Axelrod's famous tournaments ushered in an era of studying powerful IPD strategies using computer simulations in combination with analytical approaches [11][12][13].
A plethora of memory-n strategies, including deterministic strategies and their stochastic counterparts, have been thoroughly investigated [8,14]. IPD strategies can be arbitrarily sophisticated, such as finite automata [15], lookup table-based strategies [16], or strategies optimized by techniques like neural networks or particle swarm intelligence [17]. On the other hand, winning IPD strategies can be surprisingly simple yet powerful. The prominent tit-for-tat (TFT), for example, is a 'fair-minded' strategy found to be the backbone of direct reciprocity [6]. In its original version, TFT responds with full cooperation after the co-player's cooperative move but always retaliates with full defection after the co-player's defection. Variants of TFT, often called 'compliers' [18], include the so-called generous TFT (GTFT) [7], which forgives the co-player's defection with a certain probability. These special strategies belong to a larger class of reactive strategies that condition their responses on the co-player's behavioral choice (C versus D) [7,19]. The notion of reactive strategies has enabled analytical insights into the evolution of reciprocity and cooperation in IPD games [20].
Memory-one IPD strategies are further specified by the probability to cooperate in the initial move and the probability to cooperate conditioned on each of the four possible payoff outcomes of a single round, denoted by [q1, q2, q3, q4]. The ordered labels 1, 2, 3, and 4, respectively, refer to the payoff outcome R from the pair of strategy choices (C, C), S from (C, D), T from (D, C), and P from (D, D), as described from the focal row player's perspective via the payoff matrix

[ R  S ]
[ T  P ].

For the prisoner's dilemma (PD), we have T > R > P > S. Conventional PD games also assume 2R > T + S > 2P; that is, mutual cooperation is better than alternating unilateral cooperation, which in turn is better than mutual defection. From the perspective of evolution, another strategy, known as win-stay lose-shift (WSLS), later stands out [7]. As for payoff control and manipulation, the discovery of equalizers is worthy of note [21]. The authors describe a simple strategy, up to a properly chosen normalization factor φ,

[1 − φ(R − O), 1 − φ(T − O), φ(O − S), φ(O − P)],

that is able to set any co-player's average payoff to the exact amount O between P and R. Using an elegant approach of linear algebra [22], Press and Dyson further show that the so-called zero-determinant (ZD) strategies are able to unilaterally enforce a linear relationship between their average payoff s_X and that of the co-player s_Y,

s_X − O = χ(s_Y − O),

where O is the baseline payoff and χ is the extortion factor. Thus, the class of equalizers becomes a limiting subset of ZD strategies as χ → ∞ (that is, unilaterally setting the co-player Y's average payoff s_Y to O). ZD strategies can be categorized by their intended level of generosity O [23]. The class with O = P and χ > 1 is often called extortionate ZD, since these players can always ensure their advantage with an unfair surplus, s_X − P = χ(s_Y − P) ≥ 0, for conventional PD games.
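As a numerical aside (our own illustration, not part of the original derivation), the equalizer property is easy to verify by computing stationary payoffs of the repeated game. The sketch below assumes the standard Press–Dyson form of the equalizer strategy and illustrative payoff values (T, R, P, S) = (5, 3, 1, 0); the helper names are ours.

```python
import numpy as np

T, R, P, S = 5.0, 3.0, 1.0, 0.0  # illustrative conventional PD payoffs, T > R > P > S

def average_payoffs(p, q):
    """Long-run payoffs (s_X, s_Y) when X plays memory-one strategy p and Y
    plays q; the four states are ordered (CC, CD, DC, DD) from X's view."""
    p = np.asarray(p, float)
    qs = np.array([q[0], q[2], q[1], q[3]], float)  # Y's probs in X's state order
    M = np.array([[px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]
                  for px, py in zip(p, qs)])
    # stationary distribution v: v M = v with sum(v) = 1
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    v = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
    return v @ np.array([R, S, T, P]), v @ np.array([R, T, S, P])

def equalizer(O, psi):
    """Equalizer strategy (assumed Press-Dyson form) pinning the co-player's
    payoff to O, for P <= O <= R and psi > 0 small enough to stay in [0, 1]."""
    return [1 - psi * (R - O), 1 - psi * (T - O), psi * (O - S), psi * (O - P)]

# The co-player's payoff equals O = 2 no matter what the co-player does.
for q in ([0.5, 0.5, 0.5, 0.5], [0.8, 0.2, 0.6, 0.4], [0.1, 0.9, 0.3, 0.7]):
    s_x, s_y = average_payoffs(equalizer(2.0, 0.1), q)
    print(round(s_y, 6))  # -> 2.0 each time
```

Here `psi` plays the role of the normalization factor φ; only s_Y is pinned, while s_X still depends on the co-player's strategy.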
The class with O = R and χ > 1 is called generous ZD, ensuring that their own average payoff is never greater than the co-player's, as s_X − R = χ(s_Y − R) ≤ 0. Efforts on various extensions of ZD strategies have proven fruitful [24][25][26][27], including but not limited to multi-person games [28][29][30][31], noise or errors [32,33], and finitely repeated games [34]. The evolution of ZD strategies has been studied in finite populations [18,35] as well as in structured populations [36]. The overall insight is that extortionate ZD strategies are powerful yet not evolutionarily stable, in the sense that they neutralize each other's advantage unless they adapt to become generous [37][38][39][40]. Nevertheless, they can be catalysts that pave the way for the emergence of cooperation (TFT, GTFT, or more generally, generous ZD) [41]. Notably, the dominance and optimality of ZD strategies depend on the underlying payoff structure, specifically on the sign of T + S − 2P [42]. It is found that a seemingly formidable ZD strategy can actually be outperformed, for example, by WSLS if T + S < 2P, and that, when playing against fixed unbending strategies, the best response of ZD players is to offer a fair split by letting χ → 1 [42].
Let us now turn to the superset of ZD strategies, which is a collection of memory-one strategies [q1, q2, q3, q4] with three free parameters (O, χ, φ):

q1 = 1 − φ(χ − 1)(R − O),
q2 = 1 − φ[χ(T − O) + (O − S)],
q3 = φ[χ(O − S) + (T − O)],
q4 = φ(χ − 1)(O − P).    (2)

Since the qi's must be within [0, 1], the admissible ranges of (O, χ, φ) are constrained accordingly. Geometrically, the family of memory-one ZD strategies, formally expressed as a four-dimensional tuple in the hypercube [0, 1]^4, is located on a three-dimensional hyperplane. Instead of treating the components of a ZD strategy as functions of O, χ, and φ, we can also solve for these parameters in reverse, expressing each of them as a function of q1, q2, and q3, and then plug them back into the expression for q4 in equation (2). Without loss of generality, the result can be represented by

q4 = [(T + S − 2P)(1 − q1) + (R − P)(q2 + q3 − 1)]/(2R − T − S).    (3)

Prior work has almost exclusively focused on the resulting linear payoff relationship, made explicit with only the two parameters O and χ, assuming that player X uses a ZD strategy regardless of player Y's chosen strategy. While the ramification of ZD strategies can be conveniently visualized as a straight line in the parametric plot of (s_X, s_Y), with slope 1/χ and invariant point (O, O), the actual geometry of ZD strategies has received little attention. In particular, the impact of the oftentimes neglected parameter φ can hardly be addressed. Yet to study ZD strategies in the context of memory-one strategies, one needs to understand the underlying properties of the qi's explicitly, rather than only considering the linear payoff relation determined by O and χ. The present work therefore sheds new light on the elegance of ZD strategies in relation to their parameterization (O, χ, φ) from the previously overlooked perspective of geometry.
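The enforced payoff line can be checked numerically as well. The following sketch is our construction: the closed-form parameterization used in `zd_strategy` is reconstructed from the standard ZD literature and should be read as an assumption, with illustrative payoff values; it verifies that s_X − O = χ(s_Y − O) holds against arbitrary memory-one co-players.

```python
import numpy as np

T, R, P, S = 5.0, 3.0, 1.0, 0.0  # illustrative payoffs

def zd_strategy(O, chi, phi):
    """ZD strategy for parameters (O, chi, phi) in the standard form that
    enforces s_X - O = chi * (s_Y - O); a reconstruction of equation (2)."""
    return [1 - phi * (chi - 1) * (R - O),
            1 - phi * (chi * (T - O) + (O - S)),
            phi * (chi * (O - S) + (T - O)),
            phi * (chi - 1) * (O - P)]

def average_payoffs(p, q):
    """Stationary payoffs (s_X, s_Y); states ordered (CC, CD, DC, DD) for X."""
    p = np.asarray(p, float)
    qs = np.array([q[0], q[2], q[1], q[3]], float)  # Y's probs in X's state order
    M = np.array([[px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]
                  for px, py in zip(p, qs)])
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    v = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
    return v @ np.array([R, S, T, P]), v @ np.array([R, T, S, P])

# Extortionate (O = P), generous (O = R), and an intermediate choice.
for O, chi, phi in ((1.0, 2.0, 0.1), (3.0, 2.0, 0.1), (2.0, 3.0, 0.05)):
    p = zd_strategy(O, chi, phi)
    assert all(0.0 <= x <= 1.0 for x in p)  # an admissible parameter choice
    for q in ([0.5, 0.5, 0.5, 0.5], [0.9, 0.1, 0.8, 0.2]):
        s_x, s_y = average_payoffs(p, q)
        assert abs((s_x - O) - chi * (s_y - O)) < 1e-9  # the enforced line
```

The invariant point (O, O) is visible here: whenever the co-player's payoff equals O, the ZD player's payoff equals O as well, independently of χ.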

Results
In this work, we will focus on the geometry of (q1, q2, q3) in the cube [0, 1]^3 for specific choices of (O, χ, φ).
Unless noted otherwise, we use the correspondence of Cartesian coordinates (x, y, z) with the ordered triplet (q1, q2, q3). There exists a one-to-one mapping, at least in a small neighborhood of a given ZD strategy, between (q1, q2, q3) and (O, χ, φ), except for φ = 0 or χ = 1, since the determinant of the Jacobian is

det[∂(q1, q2, q3)/∂(O, χ, φ)] = φ^2(χ − 1)(T − S)(2R − T − S),

which vanishes only at φ = 0 or χ = 1. For φ = 0, the corresponding subset of ZD strategies, regardless of the choice of (O, χ), degenerates into the point D(1, 1, 0) (figure 1). Meanwhile, for χ = 1, the corresponding subset of ZD strategies, regardless of the choice of O, forms a line DA connecting (1, 0, 1) and (1, 1, 0) (figures 1(a)-(c)). This line is the common limit subset shared by all ZD strategies as χ → 1.
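A finite-difference check of this degeneracy (our own illustration, assuming the standard ZD parameterization): the determinant of the Jacobian ∂(q1, q2, q3)/∂(O, χ, φ) factors as φ²(χ − 1)(T − S)(2R − T − S), a closed form that is our derivation and should be read as an assumption, vanishing exactly at φ = 0 or χ = 1.

```python
import numpy as np

T, R, P, S = 5.0, 3.0, 1.0, 0.0  # illustrative payoffs

def q123(O, chi, phi):
    """First three components of a ZD strategy (assumed standard form)."""
    return np.array([1 - phi * (chi - 1) * (R - O),
                     1 - phi * (chi * (T - O) + (O - S)),
                     phi * (chi * (O - S) + (T - O))])

def jacobian_det(O, chi, phi, h=1e-6):
    """Determinant of the numerical Jacobian d(q1, q2, q3)/d(O, chi, phi)."""
    J = np.empty((3, 3))
    for j, dv in enumerate(np.eye(3) * h):
        J[:, j] = (q123(O + dv[0], chi + dv[1], phi + dv[2])
                   - q123(O - dv[0], chi - dv[1], phi - dv[2])) / (2 * h)
    return np.linalg.det(J)

# The numerical determinant matches the conjectured closed form on both the
# positive-chi and negative-chi branches.
for O, chi, phi in ((2.0, 3.0, 0.05), (1.5, 2.0, 0.1), (2.5, -2.0, -0.05)):
    closed_form = phi ** 2 * (chi - 1) * (T - S) * (2 * R - T - S)
    assert abs(jacobian_det(O, chi, phi) - closed_form) < 1e-6

print(round(jacobian_det(2.0, 1.0, 0.3), 9))  # -> 0.0: the map degenerates at chi = 1
```

Since each q_i is multilinear in (O, χ, φ), the central differences are exact up to rounding, which is why the tolerances can be tight.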
For ZD strategies not on the line DA, their (O, χ, φ) can be uniquely determined by (q1, q2, q3):

O = [(T + S)(1 − q1) − R(1 − q2 − q3)]/[1 − 2q1 + q2 + q3],
χ = [(T − S)(q1 − q2) − (R − S)(1 − q2 + q3)]/[(T − R)(1 − q2 + q3) − (T − S)(q1 − q2)],    (5)

with φ determined analogously. ZD strategies with the same O all together form a two-dimensional plane (figures 1(a)-(c)), given by

(T + S − 2O)(1 − q1) − (R − O)(1 − q2 − q3) = 0.    (6)

Of particular interest, the 'equalizer' subset of ZD strategies (obtained by letting |χ| → +∞) forms another plane (figure 1(d)), which can be written as

(T − S)(1 − q1) = (R − S)(1 − q2) − (T − R)q3.

Figure 1. Geometry of ZD strategies. Any ZD strategy [q1, q2, q3, q4] is parameterized by three parameters: the baseline payoff O, the extortion factor χ, and the normalization factor φ. Therefore, ZD strategies can be visualized in the cube [0, 1]^3 for the ordered triplet (q1, q2, q3), using the fact that q4 is linearly dependent on q1, q2, and q3, as shown in equation (3).

We note that the equalizer plane equation above is exactly equivalent to setting to zero the denominator of the expression of χ as a function of (q1, q2, q3) given in equation (5). This plane of equalizers intersects the plane of ZD strategies with fixed O, and the intersection line fixes any co-player's payoff to the same level O (figure 1(d)). The angle θ between the vector [q1 − 1, q2 − 1, q3] and [0, −1, 1] is φ-independent and given by

cos θ = (χ + 1)(T − S)/{√2 [(χ − 1)^2(R − O)^2 + (χ(T − O) + O − S)^2 + (χ(O − S) + T − O)^2]^(1/2)}.

Its derivative with respect to χ satisfies d cos θ/dχ < 0, suggesting that, as χ increases, the line formed by ZD strategies with the same O and χ rotates around the point (1, 1, 0) toward the equalizer limit line until χ → +∞. For negative χ, the corresponding line similarly rotates around the point (1, 1, 0) toward the equalizer limit line until χ → −∞. Further, the equalizer plane separates all admissible ZD strategies into two regions: those above the plane have positive χ values, whereas those below the plane have negative χ values (figure 2).
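The separating role of the equalizer plane can be illustrated numerically. The sketch below is our own construction, assuming the standard ZD parameterization and writing the equalizer plane through its normal vector [T − S, −(R − S), −(T − R)]; the sign of the plane residual then distinguishes positive-χ from negative-χ ZD strategies.

```python
T, R, P, S = 5.0, 3.0, 1.0, 0.0  # illustrative payoffs

def zd_strategy(O, chi, phi):
    """Assumed standard ZD parameterization (reconstruction of equation (2))."""
    return [1 - phi * (chi - 1) * (R - O),
            1 - phi * (chi * (T - O) + (O - S)),
            phi * (chi * (O - S) + (T - O)),
            phi * (chi - 1) * (O - P)]

def equalizer_residual(q1, q2, q3):
    """Signed residual of the equalizer plane; zero exactly on the plane,
    with gradient equal to the normal vector [T - S, -(R - S), -(T - R)]."""
    return (T - S) * q1 - (R - S) * q2 - (T - R) * q3 - (T - R)

# An equalizer (the |chi| -> infinity limit) lies exactly on the plane.
eq = [1 - 0.1 * (R - 2.0), 1 - 0.1 * (T - 2.0), 0.1 * (2.0 - S)]
print(abs(equalizer_residual(*eq)) < 1e-12)  # -> True

# Admissible ZD strategies fall on opposite sides according to the sign of chi
# (chi < 0 forces phi < 0 to keep all four probabilities inside [0, 1]).
pos = zd_strategy(2.0, 3.0, 0.05)    # chi > 1, phi > 0
neg = zd_strategy(2.0, -2.0, -0.05)  # chi < 0, phi < 0
print(equalizer_residual(*pos[:3]) > 0, equalizer_residual(*neg[:3]) < 0)  # -> True True
```

With these payoffs the two sample strategies sit at residuals +0.25 and −0.25, symmetric about the plane.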
Most strikingly, we find the nontrivial emergence of 'equalizers' as the equilibrium manifold appearing in the adaptive dynamics of ZD strategies, explicitly in the strategy space (q1, q2, q3) (figure 3(a)).

Figure 3. (a) The vector field of the adaptive dynamics (equation (13)) near the critical equilibrium manifold, which is the plane of equalizer strategies. Part of the equilibrium manifold, indicated by BEDJ, is stable, and the remaining part, CDJ, is unstable. Different colors are used to indicate the sign of the dot product in equation (15): blue for positive and red for negative. The line DJ is the intersection of the plane q1 + q2 + q3 − 2 = 0 and the equalizer plane. The vector field [dq1/dt, dq2/dt, dq3/dt] is orthogonal to the vector [q1 − 1, q2 − 1, q3] that points from (1, 1, 0) toward (q1, q2, q3). (b) The cooperation-enhancing region, where dqi/dt is positive for i = 1, 2, 3, 4 in the adaptive dynamics. In this region, it is optimal for ZD strategies to invariably increase their probabilities to cooperate after each of the four outcomes.
Assume ZD strategies p = [p1, p2, p3, p4] versus q = [q1, q2, q3, q4], both satisfying equation (3). Press and Dyson show that the average payoff π(p, q) of the player using p can be calculated as a ratio of two determinants,

π(p, q) = D(p, q, S_X)/D(p, q, 1),

where S_X = (R, S, T, P) is the payoff vector of the focal player and 1 = (1, 1, 1, 1). The adaptive dynamics of ZD strategies can be obtained by evaluating the selection gradient

dqi/dt = ∂π(p, q)/∂pi at p = q, for i = 1, 2, 3, 4.

After a bit of algebra, we see that the selection gradient vanishes exactly on the equalizer plane, which therefore forms the equilibrium manifold of the corresponding adaptive dynamics (figure 3(a)). A simple calculation further shows that

[dq1/dt, dq2/dt, dq3/dt] · [q1 − 1, q2 − 1, q3] = 0.

Thus, the direction of the vector field [dq1/dt, dq2/dt, dq3/dt] is orthogonal to the vector [q1 − 1, q2 − 1, q3] that points from (1, 1, 0) to (q1, q2, q3). As a matter of fact, the trajectories of the vector field lie on concentric spheres around the corner (1, 1, 0).
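Both ingredients of this section, the determinant payoff formula and the equalizer equilibrium manifold, can be checked numerically. The sketch below is our construction with illustrative payoffs; the selection gradient is obtained by central finite differences of π(p, q) with respect to p at p = q.

```python
import numpy as np

T, R, P, S = 5.0, 3.0, 1.0, 0.0  # illustrative payoffs

def payoff(p, q):
    """pi(p, q): average payoff of the p-player, via the Press-Dyson
    determinant formula pi(p, q) = D(p, q, S_X) / D(p, q, 1)."""
    qt = [q[0], q[2], q[1], q[3]]  # co-player's probs in p's state order
    def D(f):
        return np.linalg.det(np.array([
            [-1 + p[0] * qt[0], -1 + p[0], -1 + qt[0], f[0]],
            [p[1] * qt[1],      -1 + p[1], qt[1],      f[1]],
            [p[2] * qt[2],      p[2],      -1 + qt[2], f[2]],
            [p[3] * qt[3],      p[3],      qt[3],      f[3]]]))
    return D([R, S, T, P]) / D([1.0, 1.0, 1.0, 1.0])

def selection_gradient(q, h=1e-5):
    """d pi(p, q) / d p_i at p = q, by central finite differences."""
    q = np.asarray(q, float)
    g = np.empty(4)
    for i in range(4):
        e = np.zeros(4)
        e[i] = h
        g[i] = (payoff(q + e, q) - payoff(q - e, q)) / (2 * h)
    return g

# Sanity check of the formula: WSLS vs AllD cycles between states CD and DD,
# so the long-run payoff of the WSLS player is (S + P) / 2.
assert abs(payoff([1, 0, 0, 1], [0, 0, 0, 0]) - (S + P) / 2) < 1e-9

# Against an equalizer co-player every p earns exactly O, so the selection
# gradient vanishes: equalizer strategies form the equilibrium manifold.
eq = [1 - 0.1 * (R - 2.0), 1 - 0.1 * (T - 2.0), 0.1 * (2.0 - S), 0.1 * (2.0 - P)]
print(np.max(np.abs(selection_gradient(eq))) < 1e-6)  # -> True

# Off the equalizer plane (here O = 2, chi = 2, phi = 0.05) the gradient is
# generically nonzero; we print its magnitude without asserting a value.
print(np.max(np.abs(selection_gradient([0.95, 0.6, 0.35, 0.05]))))
```

The zero gradient at the equalizer is immediate once one notes that π(p, q) is constant in p whenever q pins the co-player's payoff.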

Stability of the equilibrium manifold
The normal vector of the equalizer plane is [T − S, −(R − S), −(T − R)]. The sign of its dot product with the selection gradient [dq1/dt, dq2/dt, dq3/dt] near the equilibrium manifold can be used to determine the local stability of the manifold. In particular, the plane q1 + q2 + q3 − 2 = 0 intersects the equalizer plane and divides it into a stable and an unstable part (figure 3(a)).

Cooperation-enhancing region
In the full strategy space, the cooperation-enhancing region can be found by requiring dqi/dt > 0 for i = 1, 2, 3, 4. Using equation (3), the sign of dq4/dt is determined by the critical plane

2(R − P)(1 − q1) − (T + S − 2P)(q2 + q3 − 1) = 0.

This plane is in fact the O-fixed plane (see equation (6)) formed by ZD strategies with the specific O value satisfying (T + S)/2 < O < R, namely

O = [(T + S)^2 − 2P(T + S) + 2R(R − P)]/[2(T + S + R − 3P)].

Hence the cooperation-enhancing region (as shown in figure 3(b)) is delimited by the equalizer plane together with this critical plane.
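As a numerical consistency check (our construction; the closed-form baseline payoff `O_star` below is our own derivation and should be read as an assumption), ZD strategies built with this specific O all land on the critical plane, and O_star indeed falls between (T + S)/2 and R.

```python
T, R, P, S = 5.0, 3.0, 1.0, 0.0  # illustrative payoffs

def zd_strategy(O, chi, phi):
    """Assumed standard ZD parameterization (reconstruction of equation (2))."""
    return [1 - phi * (chi - 1) * (R - O),
            1 - phi * (chi * (T - O) + (O - S)),
            phi * (chi * (O - S) + (T - O)),
            phi * (chi - 1) * (O - P)]

def critical_residual(q1, q2, q3):
    """Residual of the critical plane bounding the cooperation-enhancing region."""
    return 2 * (R - P) * (1 - q1) - (T + S - 2 * P) * (q2 + q3 - 1)

u = T + S
O_star = (u * u - 2 * P * u + 2 * R * (R - P)) / (2 * (u + R - 3 * P))
print(O_star)  # -> 2.7 for these payoffs, between (T + S)/2 = 2.5 and R = 3
assert (T + S) / 2 < O_star < R

# Every admissible ZD strategy with baseline payoff O_star lies on the plane.
for chi, phi in ((2.0, 0.02), (3.0, 0.01), (1.5, 0.03)):
    q = zd_strategy(O_star, chi, phi)
    assert abs(critical_residual(*q[:3])) < 1e-9
```

Varying χ and φ sweeps out the whole O-fixed plane, so the vanishing residual for several parameter pairs is a meaningful check rather than a coincidence.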

Equal gains from switching: T + S = R + P
A necessary condition for reactive strategies (q1 = q3 and q2 = q4) to be ZD strategies is that the four payoff values satisfy the so-called 'equal gains from switching': T + S = R + P. By letting q1 = q3 and q2 = q4 in equation (2), we obtain φ = 1/[(R − S)χ + T − R] and φ = 1/[(T − P)χ + P − S], respectively. Moreover, equating these two φ values yields

(χ − 1)(R + P − T − S) = 0.

Hence, for χ ≠ 1, the payoff condition T + S = R + P is required, as desired. To obtain the reactive strategy [q1, q2, q1, q2] properly from the superset of ZD strategies, we also need to employ appropriate combinations of (O, χ) within the admissible ranges. Under this payoff structure condition, another interesting finding is that unconditional strategies (q1 = q2 = q3 = q4) are ZD strategies with negative χ. Using equation (3) and forcing q1 = q2 = q3, we get q4 = [(T + S − 2P)(1 − q1) + (R − P)(2q1 − 1)]/(2R − T − S), which leads to the same 'equal gains from switching' condition, T + S = R + P, in order for q4 = q1. The corresponding choices of O, χ, and φ are now given by χ = −(R − S)/(T − R) < 0 and φ = (T − R)/[(T − R)^2 − (R − S)^2] < 0, with O varying according to the common cooperation probability q1.
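The ZD character of reactive strategies under equal gains from switching can be illustrated by collinearity: a ZD player's payoff pairs (s_X, s_Y) against different co-players all fall on one straight line. The sketch below is our construction, using payoffs (T, R, P, S) = (5, 3, 1, −1), which satisfy T + S = R + P, and a reactive strategy [a, b, a, b].

```python
import numpy as np

T, R, P, S = 5.0, 3.0, 1.0, -1.0  # equal gains from switching: T + S = R + P

def average_payoffs(p, q):
    """Stationary payoffs (s_X, s_Y); states ordered (CC, CD, DC, DD) for X."""
    p = np.asarray(p, float)
    qs = np.array([q[0], q[2], q[1], q[3]], float)
    M = np.array([[px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]
                  for px, py in zip(p, qs)])
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    v = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
    return v @ np.array([R, S, T, P]), v @ np.array([R, T, S, P])

# Reactive strategy [a, b, a, b]: cooperate w.p. a after the co-player's C,
# and w.p. b after the co-player's D.
reactive = [0.9, 0.3, 0.9, 0.3]

# A ZD strategy forces all payoff pairs (s_X, s_Y) onto one line, so three
# pairs obtained against different co-players must be collinear.
pts = [average_payoffs(reactive, q)
       for q in ([0.5, 0.5, 0.5, 0.5], [0.8, 0.2, 0.6, 0.4], [0.3, 0.7, 0.1, 0.9])]
(x1, y1), (x2, y2), (x3, y3) = pts
cross = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
print(abs(cross) < 1e-9)  # -> True

# With T + S = R + P, the hyperplane of equation (3), as reconstructed here,
# reduces to q4 = q2 + q3 - q1, which [a, b, a, b] satisfies automatically.
a, b = reactive[0], reactive[1]
q4 = ((T + S - 2 * P) * (1 - a) + (R - P) * (a + b - 1)) / (2 * R - T - S)
assert abs(q4 - b) < 1e-12
```

Under generic payoffs (T + S ≠ R + P), the same collinearity test fails for reactive strategies, consistent with the necessity of the condition.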

Fun facts about ZD strategies
The term 2q1 − q2 − q3 − 1 is, up to sign, the denominator of the expression of O as a function of (q1, q2, q3) in equation (5). For all admissible ZD strategies, we actually have 2q1 − q2 − q3 − 1 < 0. Algebraically, using equation (2), we get 2q1 − q2 − q3 − 1 = −φ(χ − 1)(2R − T − S) < 0 for any admissible parameter choices. The same inequality can also be shown by utilizing the exquisite geometry of ZD strategies. As O → −∞, the O-fixed plane approaches the limit q3 = 2q1 − q2 − 1, which lies further below the plane for O = P. Likewise, as O → +∞, the O-fixed plane rotates beyond the plane corresponding to O = R (whose equation is q1 = 1) and approaches the same limit from the other direction. Since a ZD strategy only admits P ≤ O ≤ R, it follows that q3 > 2q1 − q2 − 1 for all ZD strategies. Hence we obtain the inequality above, as desired.
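A quick numerical confirmation of this inequality (our construction, assuming the standard parameterization): sampling admissible (O, χ, φ) on both the positive-χ and negative-χ branches, the term stays strictly negative and matches the closed form −φ(χ − 1)(2R − T − S), which is negative whenever φ(χ − 1) > 0, as holds for every admissible ZD strategy off the line DA.

```python
T, R, P, S = 5.0, 3.0, 1.0, 0.0  # illustrative payoffs

def zd_strategy(O, chi, phi):
    """Assumed standard ZD parameterization (reconstruction of equation (2))."""
    return [1 - phi * (chi - 1) * (R - O),
            1 - phi * (chi * (T - O) + (O - S)),
            phi * (chi * (O - S) + (T - O)),
            phi * (chi - 1) * (O - P)]

samples = [(1.0, 2.0, 0.1), (2.0, 3.0, 0.05), (3.0, 2.0, 0.1), (2.0, -2.0, -0.05)]
for O, chi, phi in samples:
    q1, q2, q3, q4 = zd_strategy(O, chi, phi)
    assert all(0.0 <= x <= 1.0 for x in (q1, q2, q3, q4))  # admissible choice
    expr = 2 * q1 - q2 - q3 - 1
    assert expr < 0  # the claimed inequality
    # ... and it matches the closed form -phi * (chi - 1) * (2R - T - S)
    assert abs(expr + phi * (chi - 1) * (2 * R - T - S)) < 1e-9

print("all admissible samples satisfy 2*q1 - q2 - q3 - 1 < 0")
```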

Conclusions
In this paper, we focus on the payoff structure of conventional PD games satisfying T + S > 2P. In some PD games, however, T + S < 2P holds, that is, the polygon connecting (R, R), (S, T), (P, P), and (T, S) is non-convex in the pairwise payoff plot [42]. In this more adversarial circumstance, the geometry of ZD strategies can be studied similarly. It is also worth noting that, for games satisfying 'equal gains from switching', T + S = R + P, reactive strategies (including unconditional strategies) become special cases of ZD strategies. Scrutinizing the geometry of ZD strategies provides useful intuition about, and insights into, how ZD strategies relate to one another. The parameterization (O, χ, φ) of ZD strategies determines the admissible range of q = [q1, q2, q3, q4] that can be realized. Each subset of ZD strategies with a fixed O value contains a common extreme boundary AD (figures 1(a)-(c)), which is a linear interpolation between TFT (q = [1, 0, 1, 0]) and 'AllC or AllD' (q = [1, 1, 0, 0]). The line formed by ZD strategies sharing the same (O, χ) approaches the limit of equalizer strategies, located on the plane BCDE (figure 1(d)), as χ → +∞ or χ → −∞. This equalizer plane in fact separates the strategy space (q1, q2, q3) of admissible ZD strategies into two regions with positive versus negative χ values (figures 2(a) and (b)). We also find that the region of the cube below the plane spanned by ZD strategies with O = P and containing the corner (1, 0, 0) is inadmissible for any ZD strategy (figure 2(c)).
From the perspective of evolution, it is straightforward to use the adaptive dynamics of ZD strategies to identify potentially optimal ZD strategies against one another. We find that the common line AD of all ZD strategies with χ → 1 emerges as a particular set of equilibria. More generally, the plane of equalizers is the nontrivial equilibrium manifold, part of which is stable whereas the remaining part is not (figure 3(a)). The same plane acts as a separatrix of the cooperation-enhancing region, in which increases in cooperation are favored for each qi, i = 1, 2, 3, 4 (figure 3(b)).
In sum, this paper reveals the nontrivial and elegant geometry of ZD strategies that has previously been overlooked. Our work offers a geometric interpretation of the ZD parameters (O, χ, φ) and, in particular, previously uncharted geometric relationships among ZD strategies in the entire space of memory-one IPD strategies q = [q1, q2, q3, q4]. Most interestingly, we find that the subset of equalizer strategies forms a critical plane in the strategy space that emerges both as the critical equilibrium manifold of the adaptive dynamics of ZD strategies and as a separatrix of the cooperation-enhancing region. These results highlight the previously unforeseen connection between equalizers and general ZD strategies.