Mutual cooperation and tolerance to defection in the context of socialization: the theoretical model and experimental evidence

Our understanding of human cooperation still has gaps that need investigation. Previous findings reveal that socialization effectively promotes cooperation in the well-known Prisoner's Dilemma (PD) game. However, theoretical concepts fail to describe the high levels of cooperation (probability higher than 50%) observed empirically. In this paper, we derive a symmetric quantal response equilibrium (QRE) for PD in Markov strategies and test it against experimental data. Our results indicate that for low levels of rationality, QRE describes high cooperation. In contrast, for high rationality, QRE converges to the Nash equilibrium and describes the low-cooperation behavior of participants. In the range of middle rationality, QRE matches the curve that represents the set of Nash equilibria in Markov strategies. Further, according to the experimental data, QRE serves as a dividing line between behavior before and after socialization. Finally, we highlight the theoretically predicted intersection of the set of Nash equilibria in Markov strategies and the QRE curve.


Introduction
Human behavior is still not fully understood, and many aspects of it need further investigation. What we know is that our way of thinking, our actions, and our beliefs depend on many different factors, both internal and external. Human behavior includes the important social ability of cooperation, defined by the Cambridge Dictionary as "the act of working together with someone or doing what they ask you" 1 . Perhaps more importantly, cooperation is about sharing mutual profit, equality, costs, and skills. During the pandemic, we have come to realize how important cooperation is for society: it is not about money, but about human lives. Examples of cooperation include wearing a mask, keeping social distance, and being patient and generous with the public. Thus, studying people's ability to cooperate helps in making beneficial choices during the world pandemic 2 .
Despite the clear advantages of cooperation, it is individually rational to choose defection rather than cooperation when faced with a social dilemma. This is why such situations are called dilemmas, and studying the factors that lead to cooperation is an important step toward understanding this problem of behavioral economics.
Different authors argue about which factors may increase cooperation in social dilemmas 3 : communication 4,5 , socialization [6][7][8][9] , mobility and dynamics 10,11 , connectivity 12 , or aspects of an individual's identity 13,14 . The choice to cooperate is more of an intuitive act than a deliberate one: it is an emotional, quick, automatic operation that does not require effort. To support this claim, the authors of 15 compared the time that participants in their experiments spent choosing between cooperative and non-cooperative strategies; their results indicate that a quick choice could be a predictor of cooperation 15 . Effects of sociality could also lead to increased cooperation 16 . Valerio Capraro even introduced the cooperative equilibrium to explain deviations from Nash equilibrium, based on the idea that people have some tendency to cooperate by default 17,18 .
There are different approaches to shifting strategies from individual to social. The question remains which models can explain irrational cooperation in social dilemmas.
Here we list some concepts that accommodate relatively high cooperation levels: [...]. Previous studies demonstrate that social interaction significantly increases the cooperation level in iterated Prisoner's Dilemma games, from a 20% cooperation rate prior to socialization to 53% after socialization 9,[25][26][27] . Modeling such a high level of cooperative choices requires a specific approach. The paper 26 proposed to consider the Prisoner's Dilemma in Markov strategies and found a symmetric totally mixed Nash equilibrium for this game. However, this equilibrium fits strategies before socialization better than strategies after it. Therefore, we developed a new model that is able to describe high-cooperation strategies.

Prisoner's Dilemma game (PD)
This work is based on the well-known Prisoner's Dilemma game. In this game, two participants choose between two strategies: Left or Right for the first, and High or Low for the second. The choices are simultaneous and independent of each other. Payoffs are given by the payoff matrix in Table 1.

Nash equilibrium for PD
PD has a unique Nash equilibrium: mutual defection, which gives each player a payoff of 1. However, laboratory experiments show that under some conditions people deviate from the Nash equilibrium 9,26 . For example, under social framing, individuals may choose the Cooperate strategy more frequently, a behavior that could be considered irrational.
For this reason, it is of interest to identify a theoretical concept underlying this specific behavior.
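To make the dominance argument concrete, the sketch below enumerates the pure-strategy profiles of a one-shot PD and keeps those in which neither player gains by deviating. Table 1 is not reproduced here, so apart from the mutual-defection payoff of 1 stated above, the payoffs R = 3, S = 0, T = 5 are hypothetical, chosen only to satisfy T > R > P > S. Because Defect then strictly dominates Cooperate, no mixed equilibrium exists either, so the single profile found is the unique equilibrium.

```python
from itertools import product

# HYPOTHETICAL PD payoffs (Table 1 is not reproduced here):
# R = reward for mutual cooperation, S = sucker's payoff,
# T = temptation to defect, P = punishment for mutual defection.
# Only P = 1 comes from the text; R, S, T are illustrative with T > R > P > S.
R, S, T, P = 3, 0, 5, 1
ACTIONS = ("C", "D")


def payoff(own: str, other: str) -> int:
    """Payoff of a player choosing `own` against an opponent choosing `other`."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return table[(own, other)]


def pure_nash_equilibria():
    """Return all pure-strategy profiles where neither player gains by deviating."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        best1 = all(payoff(a1, a2) >= payoff(d, a2) for d in ACTIONS)
        best2 = all(payoff(a2, a1) >= payoff(d, a1) for d in ACTIONS)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


print(pure_nash_equilibria())  # [('D', 'D')]
```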

Nash equilibrium for PD in Markov strategies
The papers 7,9,28,29 argue that for some subjects, social context led to an increase in cooperative choices up to 100%. Thus, behavior in a social context is far from the Nash equilibrium. One way to describe such cooperative behavior is to consider the Prisoner's Dilemma game in Markov strategies.
Consider two participants i ∈ {1, 2}. Let us denote the probability that the first participant cooperates in round t as p_1(t). We describe participants' behavior by means of the following two quantities: (1) γ_i, mutual cooperation (the probability of a cooperative choice in response to the opponent's cooperative choice in the previous round); (2) α_i, tolerance to defection (the probability of a cooperative choice in response to the opponent's defective choice in the previous round). These two variables imply that the choices made at round t − 1 completely determine individuals' behavior at round t. This model will be referred to as PD in Markov strategies 26,30 , and for brevity we will refer to the pair (γ_i, α_i) as Markov strategies.
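Dynamics of participants' actions can be presented as follows:
$$p_1(t) = \gamma_1\, p_2(t-1) + \alpha_1 \bigl(1 - p_2(t-1)\bigr), \qquad p_2(t) = \gamma_2\, p_1(t-1) + \alpha_2 \bigl(1 - p_1(t-1)\bigr).$$
In a stationary state, we have:
$$p_1^* = \gamma_1\, p_2^* + \alpha_1 (1 - p_2^*), \qquad p_2^* = \gamma_2\, p_1^* + \alpha_2 (1 - p_1^*),$$
where p_1^* and p_2^* are the stationary probabilities of cooperation.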
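Solving the stationary conditions explicitly gives closed-form stationary probabilities. The derivation below is our own illustration, assuming the dynamics p_i(t) = α_i + (γ_i − α_i) p_{−i}(t − 1), which is how we read the definitions of γ_i and α_i; it holds whenever (γ_1 − α_1)(γ_2 − α_2) ≠ 1.

$$p_1^* = \alpha_1 + (\gamma_1 - \alpha_1)\, p_2^*, \qquad p_2^* = \alpha_2 + (\gamma_2 - \alpha_2)\, p_1^*$$

$$\Longrightarrow \quad p_1^* = \frac{\alpha_1 + (\gamma_1 - \alpha_1)\,\alpha_2}{1 - (\gamma_1 - \alpha_1)(\gamma_2 - \alpha_2)}, \qquad p_2^* = \frac{\alpha_2 + (\gamma_2 - \alpha_2)\,\alpha_1}{1 - (\gamma_1 - \alpha_1)(\gamma_2 - \alpha_2)}.$$

In the symmetric case γ_1 = γ_2 = γ, α_1 = α_2 = α, both expressions reduce to

$$p^* = \frac{\alpha}{1 - \gamma + \alpha}.$$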
The payoff function for participant 1 has the following form: [...]. The paper 26 found, in explicit form, a symmetric (i.e., γ_1 = γ_2 = γ and α_1 = α_2 = α) totally mixed Nash equilibrium for the Prisoner's Dilemma in Markov strategies. This equilibrium can be represented as the set of points (α, γ) in the unit square that satisfy equation (4) (see Figure 1). Hereafter, we refer to this equilibrium as the Nash equilibrium and to this curve as the Nash equilibrium curve.
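Since the payoff function itself is not reproduced here, the sketch below shows one way to compute a stationary per-round payoff for Markov strategies. It builds the exact 4-state Markov chain of joint actions (player 1 reacts to player 2's previous choice and vice versa), solves for its stationary distribution, and averages player 1's payoffs over it. The payoffs R = 3, S = 0, T = 5 are hypothetical (only P = 1 is stated), the function names are ours, and the paper's own expression may differ, for example by multiplying the stationary marginals instead of using the joint distribution. The marginal cooperation probability of player 1 is checked against the closed-form stationary solution.

```python
import numpy as np

# HYPOTHETICAL payoffs (Table 1 not reproduced); only P = 1 is stated in the text.
R, S, T, P = 3.0, 0.0, 5.0, 1.0
STATES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]  # (player 1, player 2)
PAYOFF_1 = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def transition_matrix(g1, a1, g2, a2):
    """4x4 transition matrix of the joint action chain in Markov strategies.

    Player i cooperates with probability g_i if the opponent cooperated in the
    previous round and with probability a_i if the opponent defected.
    """
    M = np.zeros((4, 4))
    for i, (x1, x2) in enumerate(STATES):
        q1 = g1 if x2 == "C" else a1  # P(player 1 cooperates next round)
        q2 = g2 if x1 == "C" else a2  # P(player 2 cooperates next round)
        for j, (y1, y2) in enumerate(STATES):
            M[i, j] = (q1 if y1 == "C" else 1 - q1) * (q2 if y2 == "C" else 1 - q2)
    return M


def stationary_distribution(M):
    """Solve pi @ M = pi with sum(pi) = 1 via least squares on the stacked system."""
    n = M.shape[0]
    A = np.vstack([M.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def stationary_payoff_1(g1, a1, g2, a2):
    """Expected per-round payoff of player 1 in the stationary state."""
    pi = stationary_distribution(transition_matrix(g1, a1, g2, a2))
    return float(sum(prob * PAYOFF_1[s] for prob, s in zip(pi, STATES)))


def closed_form_p1(g1, a1, g2, a2):
    """Stationary cooperation probability of player 1 from the fixed-point equations."""
    return (a1 + (g1 - a1) * a2) / (1 - (g1 - a1) * (g2 - a2))


pi = stationary_distribution(transition_matrix(0.6, 0.2, 0.6, 0.2))
print(round(pi[0] + pi[1], 6), round(closed_form_p1(0.6, 0.2, 0.6, 0.2), 6))  # both about 1/3
```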

Figure 1. Symmetric totally mixed Nash equilibrium for PD in Markov strategies
It is evident from Figure 1 that curve (4) lies in the region of relatively small values of α (the tolerance to defection does not exceed 0.3). However, experimental results (Section 4) show that tolerance to defection can exceed 0.5.
To reconcile this discrepancy, we derive QRE for PD in Markov strategies in the next section.

Quantal response equilibrium (QRE)
The QRE model was introduced to explain the behavior of participants in laboratory experiments when it differs significantly from the Nash equilibrium 19 . QRE is an internally consistent equilibrium model in the sense that the quantal response functions are based on the equilibrium distribution of the opponents' strategy choices, not simply on arbitrary beliefs that players may have about these probabilities. One feature of the model is that it allows for "players making mistakes". QRE requires that expectations match the equilibrium choice probabilities. However, in contrast to the classical Nash equilibrium, the definition of QRE assumes that participants strive for the best response only in a probabilistic sense: the better the response, the more likely the participant is to choose it 31,32 . When QRE was compared with experimental data, it provided a better fit than the Nash equilibrium 33 . In practice, QRE is built on the logistic distribution. The response σ_i of participant i to the mixed strategy σ_{-i} of the remaining players (the probability of choosing strategy s_i) is expressed through the following formula:
$$\sigma_i(s_i) = \frac{\exp\bigl(\lambda\, U_i(s_i, \sigma_{-i})\bigr)}{\sum_{s_i'} \exp\bigl(\lambda\, U_i(s_i', \sigma_{-i})\bigr)},$$
where λ is the parameter of participant's rationality, and U_i(s_i, σ_{-i}) is the expected payoff of participant i when the strategies σ_{-i} of the other players and the strategy s_i of participant i are given.
Therefore, when λ → 0 (low rationality), all strategies are chosen with equal probability, and when λ → ∞ (high rationality), participants choose the strategies with the highest expected payoff.
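The logit response and its two limits can be checked numerically. The sketch below is a generic implementation of the logit formula for one player facing fixed expected payoffs; the function name and the payoff values in the example are illustrative.

```python
import numpy as np


def logit_response(expected_payoffs, lam):
    """Logit quantal response: P(s) is proportional to exp(lam * U(s))."""
    u = lam * np.asarray(expected_payoffs, dtype=float)
    u -= u.max()  # shift for numerical stability; the probabilities are unchanged
    w = np.exp(u)
    return w / w.sum()


print(logit_response([1.0, 2.0], 0.0))   # [0.5 0.5]: lam = 0 gives uniform choice
print(logit_response([1.0, 2.0], 50.0))  # almost all mass on the better strategy
```

Subtracting the maximum before exponentiating avoids overflow for large λ without changing the ratios between choice probabilities.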
The paper 26 [...]. The symmetric QRE for PD in Markov strategies is given by system (6), where γ ∈ [0, 1], α ∈ [0, 1], U is the payoff function (expressions for U|_{γ=0}, U|_{γ=1}, U|_{α=0}, and U|_{α=1} are given in Appendix 1), and λ ∈ [0, +∞) is fixed. We propose to solve system (6) numerically, reducing it to finding (as far as feasible) the optimal solution of optimization problem (7). Let us first investigate the behavior of its objective function (8). In Fig. 2, we show how the obtained solutions correspond to the contour lines of the objective function for different values of rationality. For values of λ close to zero, the (unique) minimum of the objective function is reached near the point (0.5, 0.5), which agrees with the meaning of λ (under low rationality, individuals act at random). As λ increases, several local minima appear and gradually shift toward the Nash equilibrium curve. When rationality is high, local minima converge to the Nash equilibrium curve on the one hand and to the strategy profile of the standard Nash equilibrium α = 0, γ = 0 (defect/defect) on the other. This result agrees with the theory, as the Nash equilibrium describes the behavior of fully rational actors 19 .
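Because system (6) and objective (8) are not reproduced here, the block below shows one possible form of them. It assumes that each Markov parameter is treated as a binary logit choice between its pure values 0 and 1, which is consistent with the notation U|_{γ=0}, U|_{γ=1}, U|_{α=0}, U|_{α=1}: U|_{γ=x} is participant 1's payoff with own γ set to x, own α and the opponent's (γ, α) held at the symmetric profile. The names Φ_γ, Φ_α and F are our own.

$$\gamma = \frac{e^{\lambda U|_{\gamma=1}}}{e^{\lambda U|_{\gamma=1}} + e^{\lambda U|_{\gamma=0}}} =: \Phi_\gamma(\gamma, \alpha), \qquad \alpha = \frac{e^{\lambda U|_{\alpha=1}}}{e^{\lambda U|_{\alpha=1}} + e^{\lambda U|_{\alpha=0}}} =: \Phi_\alpha(\gamma, \alpha),$$

$$\min_{(\gamma, \alpha) \in [0,1]^2} F(\gamma, \alpha), \qquad F(\gamma, \alpha) = \bigl(\gamma - \Phi_\gamma(\gamma, \alpha)\bigr)^2 + \bigl(\alpha - \Phi_\alpha(\gamma, \alpha)\bigr)^2.$$

Any point with F = 0 solves the assumed system. At λ = 0 both Φ_γ and Φ_α equal 1/2, so the unique minimizer is (1/2, 1/2), in line with the low-rationality behavior described above.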
To solve optimization problem (7), we use the function minimize from the Python package scipy.optimize 34 . Fig. 3 plots the obtained symmetric quantal response equilibria for PD in Markov strategies, which form a nearly smooth curve for small λ (approximately λ < 5). For these values of rationality, the objective function always has a unique global minimum, which the solver reliably finds. In the middle range of λ (approximately λ ∈ [5, 7.08]), the solution of the optimization problem approaches the Nash equilibrium curve. However, at these levels of rationality, the QRE curve loses its smoothness and the solutions "leapfrog" along the Nash equilibrium curve (see blue triangles in Fig. 3). For large values of rationality (λ > 7.08), solutions of (7) converge to the point α = 0, γ = 0, which does not belong to the Nash equilibrium curve in Markov strategies; instead, it marks the strategy profile of the standard Nash equilibrium (defect/defect). According to the theory, intersections of the Nash equilibrium curve and the QRE curve should exist and mark the branch of Nash equilibria in Markov strategies that appears to be especially relevant for describing experimental data 19 . We find only a few intersections of these curves (blue triangles, Fig. 3), and their small number could be due to limitations of the optimization method.
However, it is precisely the "first" intersection (located near α ≈ 0.2, γ ≈ 0.5 and obtained at λ ≈ 4) that fits the experimental data best compared with the other intersections (Section 4).
Figure 3. [...] (6) for different values of rationality. The arrows indicate the direction in which rationality grows. The resulting QRE curve consists of three branches that correspond to different ranges of rationality. The dashed line represents the Nash equilibrium for PD in Markov strategies.
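The numerical approach (reduce the equilibrium system to a bounded minimization and call scipy.optimize.minimize) can be sketched as follows. Several ingredients are assumptions rather than the paper's definitions: the payoffs R = 3, S = 0, T = 5 are hypothetical (only P = 1 is stated); U is taken to be player 1's stationary per-round payoff of the exact joint action chain; system (6) is read as a binary logit for each Markov parameter; and the objective is the squared residual. All function names are ours.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable logistic function 1 / (1 + exp(-x))

# HYPOTHETICAL payoffs: Table 1 is not reproduced here; only P = 1 is stated in the text.
R, S, T, P = 3.0, 0.0, 5.0, 1.0
STATES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]  # (player 1, player 2)
PAYOFF_1 = np.array([R, S, T, P])  # player 1's payoff in each joint state
EPS = 1e-6  # keeps (gamma, alpha) strictly inside (0, 1) so the chain stays irreducible


def payoff_1(g1, a1, g2, a2):
    """ASSUMED payoff U: player 1's stationary per-round payoff in the joint action chain."""
    M = np.zeros((4, 4))
    for i, (x1, x2) in enumerate(STATES):
        q1 = g1 if x2 == "C" else a1  # player 1 reacts to player 2's last choice
        q2 = g2 if x1 == "C" else a2  # player 2 reacts to player 1's last choice
        for j, (y1, y2) in enumerate(STATES):
            M[i, j] = (q1 if y1 == "C" else 1 - q1) * (q2 if y2 == "C" else 1 - q2)
    # Stationary distribution: pi @ M = pi and sum(pi) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ PAYOFF_1)


def residual(params, lam):
    """ASSUMED objective: squared gap between (gamma, alpha) and its logit response."""
    g, a = np.clip(params, EPS, 1 - EPS)
    # Payoff gains of player 1 from the pure values gamma = 1 vs 0 and alpha = 1 vs 0,
    # with the opponent held at the symmetric profile (g, a).
    d_gamma = payoff_1(1.0, a, g, a) - payoff_1(0.0, a, g, a)
    d_alpha = payoff_1(g, 1.0, g, a) - payoff_1(g, 0.0, g, a)
    return (g - expit(lam * d_gamma)) ** 2 + (a - expit(lam * d_alpha)) ** 2


def solve_qre(lam, start=(0.3, 0.3)):
    """Minimize the residual over the unit square from one starting point."""
    res = minimize(residual, x0=np.array(start), args=(lam,),
                   method="L-BFGS-B", bounds=[(0.0, 1.0), (0.0, 1.0)])
    return res.x, res.fun


x, f = solve_qre(0.0)
print(np.round(x, 3), f)  # at lam = 0 the logit response is 1/2, so the minimizer is (0.5, 0.5)
```

For larger λ the objective has several local minima, as described above, so in practice one would call solve_qre from several starting points and keep only solutions whose objective value is close to zero.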

Experimental results and discussion
In this section, we compare the equilibrium found with data from the laboratory experiments reported in 9,25,26 . The general goal of these experiments was to identify the effect of socialization on the level of cooperation in the PD game.
The full description of the experiments can be found in Appendix 2. The following is a schematic representation of the experimental design:
1. 12 recruited participants (all strangers).
2. Participants play iterated PD (Table 1) in a mixed-gender group of 12 people for 11-22 rounds.
3. Socialization of unacquainted members of groups. Division of participants into two groups of 6 people.
4. [...]
5. Participants are compensated for the experiment.
In Table 1 (Appendix 3), we present aggregated results of the experiments. The share of cooperative choices is higher after socialization (58%) than before (22%). We assume that socialization compensates for the irrationality of these choices: although defection yields a higher expected payoff than cooperation, the utility of sociality exceeds the probable losses from choosing cooperation. To compare theoretical results with the experimental data, we computed, for every part of the experiments, the probabilities of mutual cooperation (γ) and tolerance to defection (α) (see Table 1, Appendix 3).
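The text does not spell out how γ and α were computed from the choice data. A natural estimator, sketched below as an assumption, uses conditional frequencies: γ is the share of cooperative choices after the opponent cooperated in the previous round, and α is the share after the opponent defected. It assumes that each participant's choices are paired with those of the same opponent in consecutive rounds; the function name and the example sequences are hypothetical.

```python
from typing import Sequence, Tuple


def estimate_markov_strategy(own: Sequence[str], opponent: Sequence[str]) -> Tuple[float, float]:
    """Estimate (gamma, alpha) for one player from round-by-round choices ("C" or "D").

    gamma: share of cooperative choices in round t after the opponent cooperated in round t-1.
    alpha: share of cooperative choices in round t after the opponent defected in round t-1.
    Returns NaN for a quantity whose conditioning event never occurs.
    """
    after_c = [own[t] == "C" for t in range(1, len(own)) if opponent[t - 1] == "C"]
    after_d = [own[t] == "C" for t in range(1, len(own)) if opponent[t - 1] == "D"]
    gamma = sum(after_c) / len(after_c) if after_c else float("nan")
    alpha = sum(after_d) / len(after_d) if after_d else float("nan")
    return gamma, alpha


own = ["C", "C", "D", "C", "D", "C"]
opp = ["C", "D", "C", "C", "D", "D"]
print(estimate_markov_strategy(own, opp))  # gamma = 2/3, alpha = 0.5
```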
We first analyzed how experimental points correspond to values of the objective function (8) under different levels of rationality (see Fig. 4). Most participants' strategies can be approximated by minima of the objective function for an appropriately chosen level of rationality. More precisely, the behavior of individuals with a high level of cooperation (more than 50%) can be modeled with low values of λ (which was one of our objectives), whereas low-cooperative participants are well approximated by high values of λ. From this perspective, we conclude that socialization reduces the level of rationality. Unfortunately, the strategies of participants located in the upper right zone of the phase plane remain unexplained. The fact that the QRE curve intersects the Nash equilibrium curve indicates that our results are consistent with the theory. Notably, among all Nash equilibria in Markov strategies, the most "successful" one (in fitting the experimental data) is found at the "first" intersection of the QRE curve and the Nash equilibrium curve (at the rationality level λ ≈ 4). Interestingly, the QRE for λ ≲ 4 divides strategies before and after socialization (see Fig. 5), resembling a phase boundary.
Figure 5. [...] participants before socialization, while violet circles signify strategies after socialization. QRE points for λ ≲ 4 serve as a natural border between points before and after socialization.
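One way to quantify the phase-boundary observation is to check, for each experimental point, on which side of the QRE curve it lies. The sketch below assumes that the curve is available as samples (α_k, γ_k) sorted by increasing α and reads "above" as "larger γ at the same α"; the function names and all numbers in the usage example are hypothetical and are not the paper's data.

```python
import numpy as np


def side_of_curve(curve_alpha, curve_gamma, point_alpha, point_gamma):
    """Return +1 if the point lies above the curve, -1 if below, 0 if on it.

    The curve is given by samples (alpha_k, gamma_k) sorted by increasing alpha and is
    linearly interpolated between them (np.interp clamps to the end values outside the range).
    """
    gamma_on_curve = np.interp(point_alpha, curve_alpha, curve_gamma)
    return int(np.sign(point_gamma - gamma_on_curve))


def share_correctly_separated(curve_alpha, curve_gamma, before, after):
    """Share of points with 'before' strategies below the curve and 'after' strategies above it."""
    hits = sum(side_of_curve(curve_alpha, curve_gamma, a, g) < 0 for a, g in before)
    hits += sum(side_of_curve(curve_alpha, curve_gamma, a, g) > 0 for a, g in after)
    return hits / (len(before) + len(after))


# Made-up numbers for illustration only: (alpha, gamma) pairs.
curve_alpha, curve_gamma = [0.2, 0.35, 0.5], [0.5, 0.45, 0.5]
before = [(0.1, 0.2), (0.3, 0.3)]
after = [(0.4, 0.8), (0.6, 0.9)]
print(share_correctly_separated(curve_alpha, curve_gamma, before, after))  # 1.0
```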

Conclusion
In an era that highlights the importance of every individual's choice while mass media promote living for oneself, it is crucial to remember cooperation as an effective mechanism for promoting the well-being of society as a whole. Our paper proposes a new theoretical concept to explain the high levels of cooperation (more than 50%) previously observed in the [...]. When we matched QRE with the experimental data, we observed that the high level of cooperation after socialization can be explained by QRE with low rationality.
Conversely, the low-cooperation results before socialization are well approximated by high values of rationality. We also showed how QRE complements the existing Nash equilibrium for this game.
The intersection between the equilibrium curves at a low rationality parameter (λ ≈ 4) selects a unique Nash equilibrium that fits the experimental data most closely (compared with other Nash equilibria in Markov strategies). Additionally, the QRE curve (for rationality parameter λ ≲ 4) serves as a kind of phase boundary for the experimental data before and after socialization: points before socialization lie below the QRE curve, while points after socialization lie above it. However, we acknowledge that this result may be coincidental. Therefore, one possible continuation of this study is to investigate the possible role of QRE as a phase boundary in two directions: (1) theoretical (by deriving corresponding models) and (2) empirical (by conducting experiments). Finally, we also observed experimental strategies that deviate far from both QRE and Nash equilibrium. This indicates that the search for theoretical concepts explaining highly cooperative behavior is still in progress.