Dynamics of heuristics selection for cooperative behaviour

Situations involving cooperative behaviour are widespread among animals and humans alike. Game theory and evolutionary dynamics have provided the theoretical and computational grounds to understand the mechanisms that allow for such cooperation. Studies in this area usually consider different behavioural strategies and investigate how they can become fixed in the population under evolving rules. However, how those strategies emerged from basic evolutionary mechanisms remains only partially understood. To address this issue, here we study the emergence of cooperative strategies through a model of heuristics selection based on evolutionary algorithms. In the proposed model, agents interact with other players according to a heuristic specified by their genetic code and reproduce -- at a longer time scale -- proportionally to their fitness. We show that the system can evolve to cooperative regimes for low mutation rates through heuristics selection, while increasing the mutation rate decreases the level of cooperation. Our analysis of the emerging strategies shows that reciprocity and punishment are the main ingredients for cooperation to emerge, with conditional cooperation being the most frequent strategy. Additionally, we show that if genetic relatedness is included in addition to behavioural rules, then kinship plays a relevant role. Our results illustrate that our evolutionary heuristics model is a generic and powerful tool to study the evolution of cooperative behaviour.


I. INTRODUCTION
Game theory constitutes a powerful framework for the mathematical study of social dilemmas [1,2]. Within this framework, the most representative and widely used game to model cooperation, the Prisoner's Dilemma, has become a paradigm for modelling the evolution of cooperative behaviour [3]. The Prisoner's Dilemma mimics the worst possible scenario for cooperation, in which selfishness always provides a higher individual benefit than cooperative behaviour. Initial predictions indicated that the social optimum would not be reachable by rational selfish individuals if the temptation to defect (T) exceeded the reward for cooperating (R). Nonetheless, cooperation is pervasive in human and animal societies [4][5][6], and a vast literature has demonstrated how cooperation can thrive in the presence of an appropriate evolutionary process [7][8][9][10][11][12][13][14][15]. The possible situations where cooperation might flourish are endless, and we are just beginning to uncover the ingredients behind the complexity observed in real systems [16,17]. Consequently, theoretical studies usually focus on simplifications, such as individuals behaving according to fixed pure strategies [7,12] or some arbitrary set of them [18,19]. Yet, the reasoning and motivations of humans are more sophisticated than pure strategies, and decisions are usually taken factoring in many ingredients, each weighted differently [20]. In other words, generally speaking, the selection of strategies takes place in complex systems wherein imprecise behaviour and the environment are inputs of each other in a perpetual feedback loop [21].
In this line, behavioural economics has shown that humans respond in unexpected ways [22,23] and often seem to possess hardwired heuristics when acting in experimental situations [24,25]. Experiments have also shown that humans' automatic responses are shaped by daily-life experiences, building heuristics or intuitions which tend to favour cooperation [26,27]. Therefore, it is plausible that cooperative societies are sustained by existing heuristics, maintained by norms [28,29] or biological factors [17,24,30], that have resulted from selection dynamics. It is thus imperative to understand how such heuristics may have evolved, as this would allow explaining the ingrained mechanisms behind the behaviour observed in living beings.
In this paper, we investigate the evolution of cooperative strategies through an agent-based model of heuristics selection inspired by evolutionary algorithms [31]. The ultimate goal is to obtain a description of the evolutionary process that could lead to different strategies. Explicitly, we consider agents composed of a chromosome and a memory to store information on other players' previous actions (Fig. 1a). Their actions are responses, according to what is coded in their genes, to other players' history. The strategy space is thus given by all possible gene combinations. This does not mean that we model behaviours defined by real genomes: decision making, especially in humans, has entangled layers of complexity, and such an approach would be misguided. Rather, we use chromosomes as a tool to model heuristics formed through cultural or biological evolution [21,32].
In our framework, the fitness of agents corresponds to the payoff obtained in iterated games, and it determines the agents' reproduction rates. Offspring inherit their parent's chromosome while being susceptible to mutation.

FIG. 1. a) Agents' memories store their experiences with their neighbours, and their chromosomes determine their responses to the variables stored. b) Agent u cooperates probabilistically with agent a according to what is coded in its genes and the history of agent a. c) Reproduction takes place synchronously at the end of a generation (G): for each site u, a new agent is chosen proportionally to its fitness from the set {u} ∪ N(u) (coloured nodes), wherein N(u) are u's neighbours. In the example, each colour corresponds to a different chromosome and, at generation G + 1, the chromosome of agent u happens to have reproduced in sites u and b, while its other neighbours by chance maintained the same chromosome. d) When an agent reproduces, with probability p_mut a bit will be flipped.

Note that our approach differs from elementary evolutionary algorithms: they optimize functions in a constant fitness landscape, but in evolutionary games changes in the population imply changes in the fitness landscape [33], which can be easily seen in any form of the rock-paper-scissors game [15].
The use of evolutionary algorithms to explore the adaptation of agents is not new [10,11,34], and previous works have studied the evolution of automata-like strategies, though aimed at specific situations [35,36]. In these studies, the equivalent of a chromosome is a tool to encode an extensive set of memory-based strategies used to understand when cooperation may thrive. Unfortunately, these types of strategies are hardly realistic and are not the best model for understanding the mechanisms behind human or animal responses. A model of heuristics should more closely resemble automatic responses based on intuition and past experiences [20], namely, by considering that intuitive responses are no more than stochastic processes which take as inputs the variables observed by the individual.
Here, we develop a modelling approach in which agents can evaluate different variables at the same time, thus resembling real situations wherein different factors interact and affect actions. Agents' decisions are determined by an activation function taking as input their chromosome and the information to which they have access. Given their theoretical and practical importance, we focus on the evolution of cooperation in social dilemmas. For this case, therefore, we selected a set of variables based on the history of the players with whom they are playing. Nonetheless, our modelling framework is generic, and any arbitrary set of variables can be added or removed according to the question of interest. Our results show that the specified heuristics can evolve to cooperative equilibria for low mutation rates. An analysis of agents' chromosomes reveals that cooperation endures through reciprocity, indicating that evolution drives heuristics to reproduce a fundamental mechanism underlying cooperation in nature, especially in humans [37,38]. In this case, emerging strategies of conditional cooperators dominate, permitting cooperation to prosper. Finally, we provide an extension wherein agents can evaluate their genetic relatedness with others. The population in this scenario evolves to similar equilibria. However, the agents' chromosomes differ significantly from the first model. Kin identification becomes the main mechanism of cooperative heuristics. Nonetheless, agents still need to have a memory of their past actions for cooperation to endure.
Undoubtedly, varied environmental or perception variables affect the resulting behaviour in humans and other animals. Unfortunately, it is not straightforward to capture which variables guided evolution to the emerged behaviour in each particular scenario. In this line, our proposal provides a generic approach for the modelling of such processes. In particular, the model presented here also contributes some insightful results with the current specifications. Namely, we observe that cooperation can spread spontaneously when memory is available, and that mutation is essential to ensure this outcome. Moreover, although the same behaviour might be observed in distinct populations, the underlying causes might be significantly different, as we observe with our kin and non-kin models. These insights suggest that our method can be a useful tool to uncover the ultimate causes behind the evolution of pro-social behaviour.

II. MODEL

A. Population Dynamics
We consider a virtual environment inhabited by n haploid agents in a zero population growth condition, each one of them (u) containing a chromosome A_u defining the heuristic which will guide its decisions. Agents interact with each other through links defined by a static contact structure, in which L is the set of edges connecting pairs of agents. In real systems, a generation embodies repeated interactions between individuals, and it is known that fast selection fluctuations can suppress cooperation even in cases in which it is the only rational choice [13]. In our model, each generation comprises a finite number of s = 100 time steps and, therefore, s|L| dyadic interactions take place, i.e., one for each edge at each time step. Thus, at each time step t, connected agents u and v interact in a game and obtain the payoffs π_u^t and π_v^t, respectively. The generation reaches its end after the s time steps, and each agent u will have accumulated a total payoff of Π_u, corresponding to its fitness in a strong selection pressure process [13,39]. Agents reproduce by a localized death-birth process [40]: at the end of each generation, each node u will be replaced by a node u′ in the set N_2(u) = N(u) ∪ {u}, which is composed of the neighbourhood of u (N(u)) and u itself (Fig. 1c). Node u′ is chosen probabilistically according to the fitness (Π_{u′}) of the nodes in N_2(u). Thus, on the one hand, the nodes which accumulate more payoff are more likely to be chosen; on the other hand, the most adapted agents can spread their chromosome to sites at distance one. Finally, some fluctuations might affect offspring. Specifically, there is a probability p_mut of a newborn having a bit flipped in its chromosome (Fig. 1d).
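The localized death-birth update can be sketched as follows. This is a minimal illustration, not the authors' code; all function and variable names are our own, and we assume non-negative accumulated payoffs.

```python
import random

def death_birth_update(fitness, neighbours, rng=random.Random(42)):
    """One synchronous death-birth step: each site u is recolonised by a
    node drawn from N2(u) = N(u) ∪ {u} with probability proportional to
    accumulated payoff Pi.

    fitness    -- dict mapping node -> accumulated payoff Pi_u (assumed >= 0)
    neighbours -- dict mapping node -> list of neighbouring nodes
    Returns a dict mapping each site u to the node whose chromosome
    occupies u in the next generation.
    """
    next_occupant = {}
    for u in fitness:
        candidates = [u] + list(neighbours[u])      # N2(u)
        weights = [fitness[v] for v in candidates]
        if sum(weights) == 0:                       # degenerate case: uniform choice
            next_occupant[u] = rng.choice(candidates)
        else:
            next_occupant[u] = rng.choices(candidates, weights=weights)[0]
    return next_occupant

# Toy example: a 3-node path graph where only node 1 accumulated payoff,
# so node 1's chromosome must colonise every site.
fit = {0: 0.0, 1: 10.0, 2: 0.0}
nbr = {0: [1], 1: [0, 2], 2: [1]}
print(death_birth_update(fit, nbr))   # {0: 1, 1: 1, 2: 1}
```

Note that the update is synchronous, matching the description above: all sites are replaced at once at the end of a generation.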

B. Game
We are interested in the evolution of cooperation in a population of agents facing a social dilemma. Strictly speaking, we want to check whether cooperative heuristics are the most adapted under conditions wherein the pure-strategy equilibrium would be full defection. We consider that at each interaction, agents play a round of a Prisoner's Dilemma (PD) with their neighbours. The PD is a 2x2 game in which only two actions are available to the players: cooperate or defect. If two players cooperate, they both get a reward R; if one cooperates and the other defects, the cooperator earns S and the defector gets a payoff T (the temptation to defect). Finally, if both defect, both of them obtain P. The PD occurs when the elements of the payoff matrix are such that T > R > P > S, which implies that a rational player should defect because, whatever the opponent does, defecting yields the larger payoff. Henceforth, we consider that the values of each entry are a normalized version of the values of Axelrod's tournament [41]. Namely: T = 1/k; R = 0.6/k; P = 0.2/k; S = 0, where k is the degree of the nodes. As mentioned before, for these values, the prediction is that under a replicator dynamics the system ends up in full defection [13]. We also note that small changes in this parametrization would not affect our results, as they are robust for a broad range of the temptation (T) parameter (see Section I of the Supplementary Material).
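The payoff scheme and its dilemma ordering can be written down explicitly. This is an illustrative sketch; the function names and the `k = 4` default (the degree used in the simulations) are our own choices.

```python
def pd_payoffs(k=4):
    """Normalized Axelrod-style payoffs from the text, with k the node degree."""
    T, R, P, S = 1 / k, 0.6 / k, 0.2 / k, 0.0
    assert T > R > P > S, "Prisoner's Dilemma ordering must hold"
    return T, R, P, S

def play_round(a_cooperates, b_cooperates, k=4):
    """Payoffs (pi_a, pi_b) for one dyadic interaction on an edge."""
    T, R, P, S = pd_payoffs(k)
    if a_cooperates and b_cooperates:
        return R, R
    if a_cooperates and not b_cooperates:
        return S, T          # the lone cooperator is exploited
    if not a_cooperates and b_cooperates:
        return T, S
    return P, P

print(play_round(True, False))   # (0.0, 0.25): the defector earns T = 1/4
```

Whatever the opponent plays, switching to defection never lowers a player's payoff here, which is the dilemma the evolutionary dynamics must overcome.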

C. Agents
Agents are hardwired, and their heuristics do not change over the course of one generation, which corresponds to their lifetime. Their heuristics are determined by their chromosomes and constitute a stochastic way to evaluate the variables stored in their memory and decide whether to cooperate or not. Agents' memory stores variables from previous interactions, and we assume their working memory is limited [42]. Hence, agents can only store a finite set of variables from the previous m rounds. Specifically, an agent u with the set of neighbours N(u) has stored in its memory M_u variables for all v ∈ N(u) and for all l ∈ [1, m]. Therefore, M_u is a matrix wherein each row contains the values stored for one neighbour, as shown in Fig. 1a.

TABLE I. Set of variables stored in memory and their corresponding genes (columns: Variable, Gene, Description). (a) Typically, indirect reciprocity is defined by individuals playing a one-shot game in a large well-mixed population [43]. Nonetheless, here R^t_{u,v} does not consider the actions of v with respect to u, which should correspond to an analogous effect.
The heuristics evaluate each stored variable according to a specific gene in the chromosome. Therefore, the expression of each gene is a weight given to a variable containing some information influencing the agent's decision making. The vector E_u carries the responses of an agent u, i.e., its expressed gene values. They are given by a two's-complement representation of the gene bits and are therefore integers from -128 to 127. The vector E_u contains the responses to the variables plus a constant response (β_0). Table I shows the set of variables stored and their corresponding genes. They are a basic set of external characteristics that an elementary agent can observe, and thus constitute a reasonable set of variables to be taken into account by a somewhat minimal heuristic. Finally, whether or not an agent will cooperate is determined by the sigmoid function

ρ_{u,v} = 1 / (1 + exp(−κ E_u · X_{u,v})),    (1)

where ρ_{u,v} corresponds to the probability of agent u cooperating with agent v. X_{u,v} = (1) ⊕ M_{u,v} corresponds to a vector composed of the number 1 in the first position, followed by the memory variables as specified in Table I. κ (henceforth set to 0.05) provides the steepness of the curve, and it is chosen in such a way that if the dot product of the two vectors is greater (resp. smaller) than 100, the probability is approximately 1 (resp. 0), as illustrated by Fig. 1b.
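The gene expression and the activation function of Eq. (1) can be sketched as follows. This is an illustrative reading of the text, with hypothetical helper names; the only fixed ingredients are the two's-complement encoding, the prepended constant 1, and κ = 0.05.

```python
import math

def expressed_genes(chromosome_bytes):
    """Two's-complement interpretation of each 8-bit gene: integers in [-128, 127]."""
    return [b - 256 if b > 127 else b for b in chromosome_bytes]

def coop_probability(E, memory_row, kappa=0.05):
    """Probability that u cooperates with v, per Eq. (1): a sigmoid of E . X,
    where X = (1,) followed by the memory variables, so E[0] is the constant
    response beta_0."""
    X = [1] + list(memory_row)
    dot = sum(e * x for e, x in zip(E, X))
    return 1.0 / (1.0 + math.exp(-kappa * dot))

# With kappa = 0.05, a dot product of +100 already gives a probability of
# about 0.993, consistent with the steepness criterion stated in the text.
print(round(coop_probability([100], []), 3))   # 0.993
```

By symmetry, a dot product of −100 gives a probability of about 0.007, so the sigmoid effectively saturates outside the [−100, 100] band.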

III. RESULTS
We ran simulations for populations of 1024 agents connected on a lattice (LTT) with a von Neumann neighbourhood and on Random Regular Networks (RRN) with the same node degree (k = 4). We evolved the model for 5 · 10^5 generations, each with 100 rounds, for different values of the p_mut parameter. Results for memory between 0 and 5 are shown in Fig. 2. When m = 0, the agents' chromosome is composed of only the constant response (β_0) and strategies are reducible to mixed strategies. In this case, when no mutation is available the system quickly goes to full defection (see Supplementary Fig. S5), as expected, and mutation increases the possibility of adding cooperative strategies by drift. Conversely, when agents have access to memory, cooperation is predominant in the regime of low mutation. Furthermore, cooperation is larger and more resilient to higher mutation when agents have access to a bigger memory. With more memory, agents can construct more complex heuristics, which seem to favour cooperation. When p_mut = 0, the final fraction of cooperative actions is highly dependent on the initial conditions, reaching a multitude of equilibria, some being fully cooperative and others showing a rather small level of cooperation, especially in the RRN network. In the regime of small mutation rates, fluctuations increase significantly. However, for some small values of the mutation rate, all realizations converge to highly cooperative equilibria, as can be seen for p_mut = 0.05. Note, additionally, that as the probability of mutation increases, the fraction of cooperative actions decreases. For the limiting value p_mut = 1, every new player is born with a mutation and the system evolves to a negligible average level of cooperation. Interestingly, this demonstrates that a small amount of noise can foster cooperation in the process of evolution.
With more mutation, it becomes harder for cooperative strategies to prevail and defection tends to increase; however, a sufficiently small mutation probability guarantees that the system evolves to a cooperative equilibrium.

A. Heuristics and Strategies
In this section, we focus on the composition of the populations in the different regimes. It is not straightforward to evaluate how genes and variables interact; hence, it is hard to determine whether agents are going to cooperate or not in a specific situation. A first step is to investigate the gene values in cooperative and non-cooperative equilibria. Fig. 4 shows the distributions of genes for two mutation values, p_mut = 0.05 and p_mut = 1, wherein evolution leads mostly to cooperation and mostly to defection, respectively. Simulations in both LTT and RRN networks yielded similar distributions, indicating the presence of a common evolutionary pattern.
When the majority of the population cooperates (p_mut = 0.05), β_0, C_1, and R_1 are clearly right-modal, with most values being higher than 0. Conversely, D_1 is left-modal with a clear peak at extreme negative values, while P_1 shows a softer trend towards negative values. This implies that when cooperation thrives, agents have a baseline cooperative response and tend to reciprocate cooperation both directly and indirectly. On the other hand, agents punish defectors rigorously and have a mild negative response to other agents' payoffs, probably as a means to punish defectors, as only defectors can attain the highest payoffs. Interestingly, the distributions of β_0 indicate that the emerging strategies are willing to cooperate even in a one-shot game with an unknown player (see Supplementary Fig. S16), albeit this is not the expected behaviour for m = 0. In the other extreme, for p_mut = 1, defection prevails, and the gene values indicate the underpinnings of this trend. All distributions are right-skewed, with β_0 and D_1 having a noticeable peak at the lowest possible values. Thus, when mutations are too frequent, agents are much more likely to exploit and to punish, making defection the default strategy. Too much drift makes it impossible for cooperative heuristics to be selected, and they vanish from the population.
These last results provide a picture of the genotype space. However, there is still the need to identify which strategies have emerged. When studying evolutionary games, it is always challenging to bridge the gap between the genotype and phenotype spaces [33]. In our model, the profile of agents' actions would correspond to observable phenotypes, yet it is not straightforward to specify a method for heuristics classification. An unsupervised procedure would fall into the problem of how to identify the groups encountered, i.e., how to determine to which known strategies they correspond. Therefore, here we adopted an approach that consisted of classifying agents by looking at what would be their responses to the most basic strategies: a pure defector and a pure cooperator. Namely, we looked at whether agents were likely to cooperate or defect with agents having a history corresponding to each of the two pure strategies. For instance, a full defector v would always have defected with u (C^1_{v,u} = 0, D^1_{v,u} = 1) and with its other neighbours (R^1_{v,u} = 0), and would have an expected payoff (π^1_v) corresponding to these actions.

TABLE II. Classification of heuristics according to their responses to the two pure strategies: pure defector and pure cooperator. We consider that agents cooperate (C) or defect (D) if their probability to cooperate is greater than (1 − σ) or smaller than σ, respectively.
The proposed classification is shown in Table II (see the details in Section II of the SI). We considered strategies analogous to known ones, namely: Full Cooperator (FC), cooperates with both pure cooperators and pure defectors; Full Defector (FD), defects with both; Conditional Cooperator (CC), reciprocates cooperation and defects otherwise; Generous Conditional Cooperator (GCC), reciprocates cooperation and can cooperate randomly with defectors; Conditional Defector (CD), cooperates randomly with cooperators and always defects with defectors; Bully, defects with cooperators but cooperates with defectors; Random, behaves randomly with both pure strategies. We labelled agents that could not be classified by this process as Undefined.
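The classification rule can be sketched as follows. This is an illustration; the threshold value `sigma = 0.1` is an assumption for the example, not a value taken from the text.

```python
def classify(rho_C, rho_D, sigma=0.1):
    """Classify a heuristic by its cooperation probabilities against a pure
    cooperator (rho_C) and a pure defector (rho_D). Probabilities above
    1 - sigma count as cooperation (C), below sigma as defection (D), and
    anything in between as random (X)."""
    def label(rho):
        if rho > 1 - sigma:
            return "C"
        if rho < sigma:
            return "D"
        return "X"
    table = {
        ("C", "C"): "FC",     # Full Cooperator
        ("C", "D"): "CC",     # Conditional Cooperator
        ("C", "X"): "GCC",    # Generous Conditional Cooperator
        ("X", "D"): "CD",     # Conditional Defector
        ("D", "D"): "FD",     # Full Defector
        ("D", "C"): "Bully",
        ("X", "X"): "Random",
    }
    return table.get((label(rho_C), label(rho_D)), "Undefined")

print(classify(0.95, 0.02))   # CC: reciprocates cooperation, punishes defection
```

Any combination not listed in the lookup table (e.g. cooperating randomly with cooperators while cooperating surely with defectors) falls into the Undefined class, mirroring the labelling in the text.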
In Fig. 5, we show the frequencies of each strategy from simulations of the heuristics selection dynamics on a lattice (a similar pattern is obtained for RRN networks, see the SI, Fig. S3). When the mutation is low (p mut = 0.05), most of the agents tend to be cooperators or conditional cooperators (mean fraction is 0.9 with a standard deviation of 0.07): CC constitutes most of the strategies, followed by a small fraction of GCC and FC players. In contrast, when mutation is high (p mut = 1), FD and CD constitute the majority (mean=0.66, sd=0.038) of agents. However, a minority of CC players can persist (mean=0.17, sd=0.022), which explains the existence of a small fraction of cooperative actions even in this regime.

IV. EXPLORING KIN DISCRIMINATION: A FIRST EXTENSION.
It is known that cooperative behaviour can emerge and be sustained by factors that do not depend on players' history of decisions. Namely, genetic relatedness or kinship plays a key role in the evolution of cooperation in nature [30,37,38,44]. Kin selection is pervasive [4,5], despite controversies over its role in particular phenomena [45][46][47][48][49][50]. Indeed, these disagreements indicate the need to investigate the role played by genetic relatedness in each specific scenario [48]. Therefore, to address this question, we take such mechanisms into account in the evolutionary dynamics of heuristics selection. Namely, we have extended the previous analysis and considered that agents could evaluate an additional variable that accounts for whom they are interacting with, specifically, genetic proximity, which is one main mechanism ensuring interactions occur among related individuals [51].
We added to the agents' chromosome a gene K to account for genetic relatedness with the interacting agent. Operationally, we consider that this kinship relation is given by the Jaccard index of the pair of agents' chromosomes. Note that we are not specifying a method for kin selection, but allowing the heuristics to take into consideration agents' similarity when deciding to cooperate or not. This enables an estimation of the relevance of genetic relatedness by evaluating the weight organically given to the heuristics' new gene.
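The kinship measure can be illustrated as follows. The exact set construction behind the Jaccard index is not spelled out in the text; treating each chromosome as the set of positions whose bit is 1 is our assumption for this sketch.

```python
def jaccard_kinship(chrom_a, chrom_b):
    """Kinship as the Jaccard index |A ∩ B| / |A ∪ B| of two chromosomes,
    under the assumed reading that each chromosome is represented by the
    set of bit positions set to 1."""
    a = {i for i, bit in enumerate(chrom_a) if bit}
    b = {i for i, bit in enumerate(chrom_b) if bit}
    if not a and not b:
        return 1.0      # two all-zero chromosomes are identical, hence fully related
    return len(a & b) / len(a | b)

# Two chromosomes sharing one of three distinct set bits: kinship 1/3.
print(jaccard_kinship([1, 1, 0, 0], [1, 0, 1, 0]))   # 0.3333...
```

The resulting value in [0, 1] is then fed to the heuristic as one more memory variable, weighted by the expressed value of the K gene.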
Results of simulations on a lattice are presented in Figure 6. Figure 6A shows the fraction of cooperation at the steady state both for our previous model (Non-Kin) and for the extended model (Kin). The evolution leads to similar scenarios in both cases, indicating that the presence of the K gene neither enhances nor undermines cooperation significantly, though there is one exception: for heuristics without memory (m = 0) and low mutation, there is a modest increase in the level of cooperation.
Despite the negligible differences in outcomes, there is a substantial effect on agents' chromosomes. Fig. 6B shows that including the possibility to weigh gene similarity changes the values of all other genes significantly. For m = 1, cooperation is strongly determined by the K gene, and the genes for direct reciprocity and constant response become negative or neutral. The latter implies that most agents will not cooperate in one-shot interactions with unrelated individuals, as shown in Supplementary Fig. S17, demonstrating a significant difference from the agents without the K gene. There is still a mostly positive response for indirect reciprocity and a negative one for punishment, while the weight given to the other participant's payoff inverts. This result conveys a compelling message: when heuristics can evaluate genetic relatedness, the ones that do so will reproduce more, resulting in more adapted heuristics. Nonetheless, information from past interactions is still required, with punishment and reciprocity playing a role.

V. CONCLUSIONS
Natural selection has shaped the evolution of all sorts of life forms. Advantageous strategies endure while others dwindle in a never-ending process of adaptation. Fundamental questions regarding the emergence of cooperative behaviour in social dilemmas have to be studied in the light of evolutionary mechanisms. Undoubtedly, emerging behaviour is intrinsically dependent on the individuals under study; e.g., humans commonly cooperate in large societies composed of unrelated individuals, while groups of animals hardly exceed a few hundred individuals [52]. In particular, variance in humans is especially relevant, as behaviour is deeply affected by the specifics of the interactions and the culture of the individuals [16,53]. Moreover, given that it is an emergent phenomenon, behaviour can be deeply affected by the complex topology of interactions [54]. In an attempt to provide a framework for such scenarios, here we explored a model that allows unravelling what could be the drivers of cooperation through a heuristics selection process.
By exploring heuristics that make use of agents' behavioural information to stochastically determine their decisions in iterated Prisoner's Dilemma games across generations, we have shown that, in a feasible environment, evolution will drive heuristics towards cooperation even when defection is expected for pure strategies. In these scenarios, reciprocity and punishment are the main ingredients of cooperators' decision making, and most strategies follow conditional cooperation. The fraction of cooperative decisions decreases with an increase in the mutation rate; nonetheless, for small mutation rates the system reaches a cooperative equilibrium. Without mutation, the configuration of the initial state is critical and the system can get trapped in equilibria of meagre cooperation. Increasing the memory of individuals also increases the fraction of cooperation, suggesting that heuristics with more resources are more cooperative. These aggregate results are indistinguishable from a version of the model wherein agents have, in addition to behavioural information, access to their similarity with others (which mimics genetic relatedness). For this latter scenario, the level of cooperation at the macroscopic level remains roughly the same. Importantly, however, at the level of individuals, chromosomes change significantly and cooperation arises through a kin identification process.
Therefore, when agents discriminate their kin, reciprocity loses much of its importance, which is especially insightful given the behaviour observed in nature. Kin selection is arguably the most important mechanism behind cooperation in non-human animals, while reciprocity is uncommon [37,38]. Our result suggests that for reciprocity to be dominant, perfect kin discrimination cannot exist, which in turn suggests that figuring out the interplay between both mechanisms is crucial for understanding human evolution. Moreover, agents evolved in each condition presented a different expected response in one-shot games with unrelated individuals: cooperation is likely without the kin discrimination gene, while the majority of agents will defect when they can discriminate their genetic similarity.
To round off, we note that heuristics will adapt according to the information they have access to, and they can change significantly according to the variables available. Surprisingly, despite the change in mechanisms, cooperation remains more likely than exploitation, due either to reciprocity [55,56] or to kin selection [30]. This suggests that even if individuals have limited cognitive capacities (a small memory weighed by a rather inexpensive function), cooperative heuristics can have higher reproduction rates and be pervasive. However, extrapolations have to be made with caution. As is often the case in evolutionary game theory, our model sidesteps important details from biology and the cognitive sciences [57]. Future work should explore the intersection between moral and material values and how it influences heuristics [6], and how selection works in more complex scenarios, for instance, when higher cognition has higher associated costs [58]. Moreover, our approach could be used to understand how cultural characteristics [16,28] drive cooperation in different directions by modelling proper environmental variables, and whether costly punishment could sustain large-scale cooperation [59]. We plan to explore these and similar questions next.
Supplemental Materials: Dynamics of heuristics selection for cooperative behavior

I. OTHER PAYOFF VALUES
To ensure that our results are robust with respect to differences in the payoff values, we ran simulations for different values of the temptation parameter T. To make our results comparable to previous work, we used the one-dimensional parametrization of payoffs used by Nowak et al. [7]. In this version, R = 1, P = ε, S = 0, and T varies from 1 to 2, with ε being a value close to zero. As we consider normalized versions, the payoffs here are defined by T/k; R = 1/k; P = 0.01/k; S = 0, with T varying from 1 to 2. Results for memory 0 and 1 are shown in Fig. S1. The results show that without memory, cooperation is only attainable for T = 1 and low mutation. However, when agents have memory of their last interaction, cooperation endures even when the temptation to defect is around 2. As an illustration, the distributions of gene values for T values of 1.2 and 1.8 are shown in Fig. S2. They follow a pattern close to the one shown in Fig. 4 in the main text, indicating the equivalence of both results. This shows that our results are robust across a broad range of parameter values.

II. HEURISTICS CLASSIFICATION
Heuristics are classified according to two basic strategies: Pure Cooperation and Pure Defection. These two strategies always cooperate and always defect, respectively. Table S1 illustrates the variables contained in the memory of agent u with respect to a player v, corresponding to the two pure strategies for m = 1. All the values are given straightforwardly, except for π^1_v. Payoff values are more complicated, as they depend on the players with whom the agents are playing, which we cannot define a priori. We decided to use the average payoff of individuals which cooperated and defected with all their neighbours for the pure cooperator and pure defector, respectively. Therefore, π_C = ⟨π_i^t⟩ ∀i ∈ R_1, ∀t ∈ [1, 100], and π_D = ⟨π_i^t⟩ ∀i ∈ R_0, ∀t ∈ [1, 100], wherein R_1 (resp. R_0) corresponds to the set of agents which cooperated with all (resp. none) of their neighbours in the last time step.
The activation function (Eq. 1 in the main text) of an agent yields the probability to cooperate with the Pure Cooperator (ρ_C) and the Pure Defector (ρ_D). We then use the threshold σ to divide the plane (ρ_C, ρ_D). Namely, we designate as cooperation when ρ > (1 − σ), as defection when ρ < σ, and as random when σ ≤ ρ ≤ (1 − σ). This process results in the set of strategies given in Table II of the main text. Therefore, a precise version of that table would correspond to the one shown in Table S2.

III. EXTENDED MODEL IN RANDOM REGULAR NETWORKS
In this section, we present the results of the extended model with the kinship parameter run on RRN graphs. At variance with the model on a lattice, when there is no mutation, the fraction of cooperative actions can be different from zero, as is also shown in the time evolution figures. This demonstrates how important the kin identification mechanism can be in an adequate environment. With mutation, the macroscopic results are equivalent to the results on a lattice and on an RRN without the extension. Furthermore, when agents have access to memory the results are equivalent to the ones obtained on a lattice, including the distribution of gene values for m = 1.

IV. ONE-SHOT RESPONSES
Distributions of the probability of cooperating with an unknown agent in a one-shot game are shown in Fig. S16 and Fig. S17 for the non-kin and kin models, respectively.