Cognitive strategies take advantage of the cooperative potential of heterogeneous networks

Understanding the emergence and maintenance of cooperation is one of the most challenging topics of our time. Evolutionary game theory offers a very flexible framework within which to address this challenge. Here we use the prisoner's dilemma game to investigate the performance of individuals who are capable of adopting reactive strategies in communities structurally organized by means of Barabási–Albert scale-free networks. We find that basic cognitive abilities, such as the capability to distinguish their partners and act according to their previous actions, enable cooperation to thrive. This result is particularly significant whenever fear is the leading social tension, as this fosters retaliation, thus enforcing and sustaining cooperation. Being able to simultaneously reward fellow cooperators and punish defectors proves instrumental in achieving cooperation and the welfare of the community. As a result, central individuals can successfully lead the community and turn defective players into cooperative ones. Finally, even when participation costs—known to be detrimental to cooperation in scale-free networks—are explicitly included, we find that basic cognitive abilities have enough potential to help cooperation to prevail.


Introduction
Solving the puzzle of the emergence and maintenance of cooperation among selfish individuals is a challenge that has occupied scientists from many different disciplines for a long time [1][2][3][4][5][6][7][8][9][10][11]. Biologists, economists, mathematicians, physicists, sociologists and many more are trying to provide comprehensive answers using the tools developed in their fields. Game theory (GT) is an interdisciplinary mathematical tool which seems to be able to encompass several relevant features of the problem and, as such, is used in much cooperation-oriented research. A branch of GT, namely evolutionary game theory (EGT) [12][13][14], provides a dynamical framework within which to describe the evolution of different traits/strategies in populations. We shall adopt this framework to study a prisoner's dilemma (PD) [1,6,15] model of cooperation. The PD attracts attention from all fields concerned with analysing the nature of cooperation because it is simple and yet able to encompass the most important features of cooperation. In a one-shot PD, two individuals can choose between two options: to cooperate or to defect. The game configures a dilemma since individually the players are drawn into defection, both by the temptation to defect and by the fear of others defecting, whereas to cooperate would be the most favourable joint decision. Indeed, when two individuals interact, either both cooperate, both defect, or one cooperates while the other defects. Mutual cooperation yields both players the reward R, while mutual defection results in a punishment P for each of them (where P < R). A cooperator facing a defector gets the sucker's payoff S and, at the same time, the defector who successfully exploited her partner receives a payoff value associated with the temptation to defect T.
In the PD game, the payoffs satisfy the relation T > R > P > S, resulting in the dilemma situation in which defection yields the highest individual income independently of the partner's decision. However, if both players think rationally along these lines then they end up with the second worst payoff P instead of the higher reward (R) for mutual cooperation.
Breaking with the traditional approach in EGT of letting individuals adopt unconditional strategies only (to cooperate or to defect), we endow our players with some basic cognitive skills [16][17][18]. Namely, individuals are able to distinguish their partners by remembering their last action and are therefore able to act towards them taking this past action into account. A player's strategy is thus characterized by two parameters: p, the probability that she will cooperate if the partner cooperated in the past, and q, the probability of cooperating after a defection by the partner. In real life scenarios, p could be described as mutualism, (1 − p) as treason, q as forgiveness and (1 − q) as retaliation [17]. The usual unconditional strategies correspond to extreme cases in this strategy space: unconditional cooperators are (p = 1, q = 1) strategists, while unconditional defectors are (0, 0) strategists. An important feature of such strategies is that because of their stochastic nature, they can lead to different decisions (whether to cooperate or to defect) against different opponents who have previously adopted the same behaviour. After each game round, every individual has the opportunity to adapt her values of p and q by imitating successful partners (for details see section 2). This process is possibly inaccurate and the model takes into account the noisy nature of imitation (whether to imitate a partner's strategy with a similar payoff) and of perception. Our strategy parameters can also be understood as a very simple model of reputation: players are able to note the previous actions of their partners (which confer a reputation based on a memory with a single step of depth). At the same time they have the opportunity to use this reputation-information to influence (punish) their opponents [19].
In line with [17], we study the behaviour of incipiently smart players in structured populations defined in terms of an interaction network [20][21][22][23][24]. In this case, the network generates a heterogeneous scenario, in which some individuals engage in more interactions than others and, as a result, may potentially create conditions for a broad distribution of fitness values. We adopt a paradigmatic example of such interaction structures: scale-free networks [25][26][27][28], associated with degree distributions following a power law. This class of heterogeneous networks resembles real-life interaction patterns [25][29][30][31][32][33] more closely than the homogeneous networks already studied [17]. Typically, scale-free networks contain a small number of nodes with many interaction links, called hubs, connecting the majority of nodes, which have fewer neighbours. Networks with a scale-free characteristic can be built in many ways to include or exclude different kinds of correlations. In this paper, we are going to examine populations whose structure was built using the Barabási-Albert (BA) growth and preferential attachment algorithm [25]. For the case of unconditional strategies, scale-free interaction structures were shown to help cooperation to thrive [22,23][34][35][36][37][38][39][40][41][42] when compared with homogeneous interaction structures [23][43][44][45][46][47][48][49], as hubs are quickly taken over by cooperators who can then influence the whole community into cooperating, given the high payoff resulting from their large number of neighbours. This enhancement is grounded on the diverse nature of real interactions [39], being particularly significant in situations where the losses from unilateral cooperation (S, sucker's payoff) are small, as the fear [50][51][52] of being cheated stands as the biggest threat to cooperation in structured populations [51]. Here we will also study the effects of high penalties whenever someone is being cheated.
A further issue we shall take into consideration is related to participation costs. Recent studies have shown [53] that when each interaction involves a certain cost, cooperation is hampered, even on BA scale-free networks. Hence, we shall investigate how incipient cognitive agents deal with interactions involving participation costs.

Model
In the case of the two-person PD, two payoff entries can be fixed without loss of generality by normalizing the difference between returns from mutual cooperation and defection; following the usual parameterization [34,43,51], we set R = 1 and P = 0. A special case of the general PD payoff matrix arises when it is formulated in terms of a single parameter b (1 < b < 2), which characterizes both the benefit obtained from defection, i.e. the temptation to defect (T = b), and the injury from, or fear of, being cheated (S = 1 − b), and thereby describes the strength of the dilemma. In contrast to the case of the weak PD, where S = 0 [34,43], being exploited now lays a burden on the victim of the exploitation, and this burden (S = 1 − b) increases when the dilemma becomes tougher. Hence, in this simplified framework, the payoff matrix takes the following shape using the parameter b (entries are the payoffs of the row player against the column player):

$$\begin{array}{c|cc} & C & D \\ \hline C & 1 & 1-b \\ D & b & 0 \end{array}$$

Players are located on the nodes of a BA scale-free network [25][26][27] and the links between them define the network of contacts, i.e., who can interact with whom. Individuals play single-shot PD games with each of their neighbours in each simulation step and acquire the accumulated payoff from these interactions. It is worth noting that the outcome of evolution may depend on the way fitness is computed, in particular when fitness is artificially divided by the number of interactions in which each player engages. Under these circumstances, the evolution of cooperation is governed by different mechanisms and so it can result in fundamentally different end states [54,55]. In each encounter, players have to choose simultaneously whether to defect or to cooperate. They gain the payoffs defined by the payoff matrix with parameters T > R > P > S given above. When participation costs are included, each interaction entails a cost h, which means that all the payoff matrix elements are uniformly decreased by h.
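The single-parameter payoff structure described above can be written down in a few lines. The following is a minimal sketch, not the authors' code: the function names `payoff_matrix` and `accumulated_payoff`, and the use of the strings "C" and "D" for actions, are illustrative assumptions.

```python
def payoff_matrix(b, h=0.0):
    """Map (my_action, partner_action) to my payoff.

    Uses R = 1, P = 0, T = b, S = 1 - b with 1 < b < 2, and subtracts the
    participation cost h from every entry, as in the model description.
    """
    if not 1.0 < b < 2.0:
        raise ValueError("b must lie strictly between 1 and 2")
    reward, punishment, temptation, sucker = 1.0, 0.0, b, 1.0 - b
    return {
        ("C", "C"): reward - h,
        ("C", "D"): sucker - h,
        ("D", "C"): temptation - h,
        ("D", "D"): punishment - h,
    }


def accumulated_payoff(my_actions, partner_actions, b, h=0.0):
    """Sum the payoffs of one player over all her pairwise games in a step."""
    matrix = payoff_matrix(b, h)
    return sum(matrix[(mine, theirs)]
               for mine, theirs in zip(my_actions, partner_actions))
```

For example, a player with three neighbours who plays C, C, D against partners playing C, D, C collects 1 + (1 − b) + b = 2 for any b when h = 0.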
When reactive players are considered, the strategy space consists of stochastic strategies characterized by two probabilities, the parameters (p, q) with 0 ≤ p, q ≤ 1. A player following the (p, q) strategy cooperates (defects) with respect to a neighbour with probability p (1 − p) if the neighbour cooperated with her in the previous round. Analogously, she cooperates (defects) with probability q (1 − q) if the neighbour defected in the previous round. Resorting to computer simulations, we adopt a discretized strategy space with p = i * 0.01 and q = j * 0.01 (i, j = 0, . . . , 100). Results are obtained by performing extensive computer simulations starting from random initial conditions: the p and q parameters of all players are randomly drawn from a uniform distribution.
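As an illustration of how a reactive (p, q) player decides, here is a minimal sketch. The helper names are hypothetical, and the first-round rule (cooperate with probability (p + q)/2 when nothing is known about the partner) is taken from the model description of the initial simulation step.

```python
import random


def cooperation_probability(p, q, partner_last_action=None):
    """Probability of cooperating with one specific neighbour."""
    if partner_last_action is None:      # no information about this partner yet
        return (p + q) / 2
    return p if partner_last_action == "C" else q


def choose_action(p, q, partner_last_action=None, rng=random):
    """Draw 'C' or 'D' for a single encounter with one neighbour."""
    threshold = cooperation_probability(p, q, partner_last_action)
    return "C" if rng.random() < threshold else "D"


def random_strategy(rng=random):
    """Draw (p, q) uniformly from the discretized grid i/100, j/100."""
    return rng.randint(0, 100) / 100, rng.randint(0, 100) / 100
```

Because the draw is made separately for each neighbour, the same player can cooperate with one partner and defect against another even if both behaved identically in the previous round.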
At the beginning of the simulations, individuals start by cooperating with probability (p + q)/2 or defecting with probability [1 − (p + q)/2], as they do not have any information about their neighbours yet. In an elementary simulation step, we randomly choose two neighbouring individuals (x and y), and calculate their individual payoffs; player x adopts the strategy of player y with a probability depending on their payoff difference, according to the pairwise comparison rule [56,57]:

$$W(x \leftarrow y) = \left[1 + e^{(P_x - P_y)/K}\right]^{-1}.$$

$P_x$ and $P_y$ are the accumulated payoffs of players x and y, respectively, while K represents the errors in decision making. For small K values, the strategies with a higher payoff are most likely to be adopted, while for increasing K, strategies with worse performance can be adopted anyway; in the limit of very large K we approach neutral selection or random drift. This uncertainty can describe a broad spectrum of features: the role of social learning in individual decisions, fluctuations of the payoff parameters and errors in decision due to emotions and/or 'free will'. Additionally, whenever a strategy imitation takes place, the new strategy parameters change as follows: $p_x = p_y + \xi_1(\sigma)$ and $q_x = q_y + \xi_2(\sigma)$, where $\xi_1(\sigma)$ and $\xi_2(\sigma)$ are normally distributed random variables with zero mean and standard deviation σ. This mechanism models a slight blur in perception and is essential in helping to avoid the random extinction of strategies, as the pairwise comparison rule does not introduce new strategies. Another important point is that it ensures a complete exploration of the strategy space.
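The update step described above, pairwise comparison followed by noisy copying, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and clipping the copied values to [0, 1] is an assumption, since the text does not say how values that leave the unit interval are handled.

```python
import math
import random


def adoption_probability(payoff_x, payoff_y, K=0.4):
    """Probability that x adopts y's strategy: 1 / (1 + exp((P_x - P_y) / K))."""
    z = (payoff_x - payoff_y) / K
    if z > 700:                      # exp(z) would overflow; the limit is 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def clip01(value):
    """Keep a probability inside [0, 1] (boundary handling is an assumption)."""
    return min(1.0, max(0.0, value))


def imitate(strategy_y, sigma=0.03, rng=random):
    """Copy y's (p, q) with independent Gaussian blur of standard deviation sigma."""
    p_y, q_y = strategy_y
    return clip01(p_y + rng.gauss(0.0, sigma)), clip01(q_y + rng.gauss(0.0, sigma))
```

With equal payoffs the adoption probability is exactly 1/2, which is why, for any finite K, neighbouring players with identical fitness still exchange (slightly blurred) strategies.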
Scale-free networks are generated using a direct implementation of the Barabási-Albert model, based on growth and preferential attachment [25]. The average degree of the network was four and the total number of nodes 1000. The equilibrium average p and q values and the measure of average cooperation result from averaging over 100 000 generations after a transient period of 10 000 generations for figures 1, 2 and 4 and an average over 10 000 generations after 5000 generations' transient for figure 3. Each point corresponds to an average over 100 runs and 10 networks. The results are independent of the type of updating (synchronous versus asynchronous). The K parameter of the strategy update was 0.4; the results are qualitatively the same for positive K values as long as they are not too high (the specific value depends on the size (or maximum degree) of the network). The standard deviation of the copying accuracy was taken to be σ = 0.03. The effects of changing σ are discussed in section 3.

Figure 1 shows the results for both unconditional and reactive strategies when the community structure is described by BA scale-free networks. Unconditional strategies fare well as long as the penalty for being cheated is below ∼0.3 (b ∼ 1.3 in figure 1), in which case cooperation dominates. As noted previously, the key for this success of cooperation resides in the hubs [40,54]. If a hub is occupied by a defector, then initially she will be successful because of the high number of her neighbours among which, due to the random initial conditions, there will be some cooperators to be exploited. However, because of her success, the defector hub will influence her neighbours, who will start to imitate her strategy, thereby turning into defectors and, as such, contributing to decrease the fitness of the hub, rendering her an easy prey to be invaded by a nearby cooperator. In short, defectors in central positions become victims of their own success.
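The hub argument relies on the degree heterogeneity produced by growth with preferential attachment. A minimal, dependency-free sketch of such a generator is given below. It is not the authors' implementation: the function name, the fully connected seed of m + 1 nodes, and the choice m = 2 (which gives an average degree close to four for 1000 nodes) are assumptions made for illustration.

```python
import random


def barabasi_albert(n=1000, m=2, rng=random):
    """Return {node: set(neighbours)} for a BA-style graph.

    Each new node attaches to m distinct existing nodes chosen with
    probability proportional to their current degree.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    adjacency = {node: set() for node in range(n)}
    link_ends = []   # every node appears once per link end, i.e. 'degree' times

    def connect(u, v):
        adjacency[u].add(v)
        adjacency[v].add(u)
        link_ends.extend((u, v))

    # Assumed seed: a fully connected core of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u):
            connect(u, v)

    # Growth: sampling from link_ends is degree-proportional sampling.
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(link_ends))
        for target in targets:
            connect(new_node, target)
    return adjacency
```

Calling `barabasi_albert(1000, 2)` yields 1997 links, so the mean degree is 3.994; the few early nodes typically end up as hubs with degrees far above this mean.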

Dynamics of mutualism and retaliation in heterogeneous interaction structures
The situation of a cooperator in a hub is strikingly different: when she is imitated by one of her neighbours, this increases the hub's fitness, giving rise to a positive feedback mechanism which reinforces the position of the hub-cooperator, eventually leading to a cooperative community.
Clearly, for stricter dilemma situations this mechanism is not sufficient to ensure cooperator dominance [51] (see black squares in figure 1). Indeed, for large penalties, that is, strongly negative values of the sucker's payoff S (= 1 − b), the advantage of highly connected cooperators will be undermined by the high cost paid when being exploited. As a result, the central positions will become prone to invasion by defectors and the population may finish in an overall defector state.
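The feedback mechanisms discussed for unconditional strategies can be made concrete with a small payoff calculation. The sketch below uses the single-parameter payoffs (R = 1, P = 0, T = b, S = 1 − b) without participation costs; the function name and the example numbers are illustrative assumptions, not values from the paper.

```python
def hub_payoff(hub_action, degree, cooperating_neighbours, b):
    """Accumulated payoff of an unconditional hub player in one step."""
    defecting_neighbours = degree - cooperating_neighbours
    if hub_action == "D":
        # T = b against cooperators, P = 0 against defectors
        return cooperating_neighbours * b
    # R = 1 against cooperators, S = 1 - b against defectors
    return cooperating_neighbours * 1.0 + defecting_neighbours * (1.0 - b)
```

For a hub of degree 100 at b = 1.5, a defector's payoff falls from 75 to 15 as the number of cooperating neighbours drops from 50 to 10, whereas a cooperator's payoff rises from 25 to 85 as that number grows from 50 to 90. Raising b towards 2 shrinks the cooperating hub's payoff for any fixed mix of neighbours, which is the weakness of unconditional cooperation in harsh dilemmas.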
The situation changes considerably in the case of reactive strategies: players can react differently to different partners, allowing them to escape the trap discussed above in connection with unconditional strategies. Indeed, cooperative hubs can benefit from mutual cooperation while punishing or retaliating against defecting players. In fact, among reactive strategies, pure cooperators and pure defectors are rather rare, prompting a change in terminology: players with a high p value become the 'new' cooperators as they mostly cooperate after a cooperative act. Their q value, in turn, gives information about how generous they are, as it defines how easily they forgive a defective act. Individuals with low p and q values become the 'new' defectors, as they mostly defect independently of the partner's past decision.
It is easy to see that the hubs maintain a leading, central role in what concerns the evolutionary dynamics. However, and unlike the unconditional strategy case, now both cooperators and defectors, once occupying a hub, are able to protect themselves from invasion by other strategies. Indeed, the non-zero p and q values significantly increase their resilience to invasion compared to the p = q = 0 scenario characteristic of unconditional defectors. Due to the stochastic nature of reactive strategies and a hub's large number of partners, a mostly defecting player can still accumulate a considerable fitness even when most of her partners imitate her strategy. Overall, the message is that, in heterogeneous environments, the emergence of cooperation can be more troublesome. On the other hand, once established, cooperation is extremely robust and takes over the whole population (see below). The accuracy with which individuals copy the behaviour of their neighbours plays a very interesting role in the process of strategy update in a hub. This uncertainty in the perception is modelled by a Gaussian centred on the real (p, q) parameters of the imitated neighbour and so the accuracy is embedded in the Gaussian width parameter σ (for the definition, see section 2). The higher the value of σ , the more inaccurate is the capacity of individuals to imitate those that perform better. For small values of σ , and despite the fact that most of the times evolution leads to cooperative scenarios, the influence of the initial strategy of those individuals occupying the hubs is so dramatic that it may dictate, to a large extent, the outcome of the evolutionary dynamics in a single evolutionary run. Hence, fully cooperating as well as fully defecting populations may emerge from random initial conditions associated with faithful copying of better strategies. 
However, as soon as σ increases above a critical value (which depends on the degree of heterogeneity of the network, on the population size and on the average connectivity), the window of opportunity for defectors no longer exists and cooperators dominate unconditionally, as shown in figure 1. The reason lies again in the particular behaviour of highly connected nodes. Due to their initial success, hub-players with low p and q will influence their peers to adopt a similar behaviour. However, in such an environment the fitness of players with different degrees will become comparable. Increasing the inaccuracy of copying (σ) results in a wider exploration of the space of p and q values, and so higher values can also emerge. Whenever a highly connected node adopts a higher p, due to her high number of connections she will be able not only to maintain such a strategy for a long period, but also to reward a higher number of other cooperative actions. Neighbours copying this new behaviour will increase their own and the hub-player's fitness, too. In other words, the positive feedback mechanism acts here towards increasing the cooperative nature of individuals. As a result, such behaviour will spread in the population, as hubs are frequently imitated. This said, the increase of the blur in perception has an important side effect: cooperation cannot reach 100% in this population as the strategy parameters are spread around the average value. This is the reason why the measure of cooperative acts does not reach 100% for the cognitive strategies in figure 1.
Another interesting stochastic parameter in the strategy adoption process is the parameter K, which describes the errors in decision making; that is, how frequently less successful players are imitated. Alternatively, it is often associated with the so-called intensity of selection. For low K values, selection is strong and the more successful strategies are almost always imitated. With increasing K, the importance of fitness decreases, while for K → ∞ the evolution becomes neutral and fitness plays no role in the imitation process. Consequently, for finite K the strategy of individuals with identical or lesser fitness may be adopted with a finite probability. As a result, the main role of K is to catalyse cooperation by tipping out the hub-players from absolute defecting behaviour. For positive K, in a cluster of (0,0) strategists, players will try to adopt each other's strategies despite their identical fitness, and due to σ their p and q values will begin to move away from the defective corner of the strategy phase space. As figure 1 shows, cooperation emerges for any temptation value. Surprisingly, the measure of cooperation is higher for larger b (and larger c).

Games of fear and greed
Up to now we have limited our study to the prisoner's dilemma formulated in terms of a single parameter b. However, one may enhance the strength of the dilemma by varying T and S independently, as each of these parameters defines a different social tension [50,51]. While S can be associated with the fear of being cheated (as it represents the disadvantage of a cooperator when facing a defector), T may stand as a measure of greed (as it represents the temptation to defect).
In figure 2 we disentangle these social tensions, showing that S provides the main influencing factor in the evolution of p and q. Increasing the fear of being exploited increases both the strength of retaliation (lower average q) and positive responses to cooperative acts (higher average p), driving the whole population into mutual cooperation. This surprising conclusion differs from that obtained with unconditional strategies [23,51], highlighting the importance of simple cognitive features under severe cooperation dilemmas. Fear stands as the most significant pressure in the evolution of individuals' reactions, whereas greed (T) does not significantly influence the emergence of cooperation as long as fear is strong enough, as shown in figure 2.

Robustness against invasion
An important aspect to be investigated is how robust cooperation is against invasion by defectors. To this end we follow the procedure adopted in [17], adapted to the heterogeneous nature of scale-free graphs. Due to the different neighbourhood structure of the nodes in a heterogeneous network, robustness is not straightforward to define. Replacing a cooperative player (with high p and low q values) in a hub with an unconditional defector will have an impact that is very different from doing the same on an edge node with few neighbours (leaf). Two mechanisms contribute to this effect: firstly, the hub-player interacts with more players than a leaf player, and so she can directly influence a larger fraction of the community; secondly, due to the high number of neighbours, her accumulated payoff can become potentially higher (as defector) compared to that of her neighbours, so she can spread her strategy successfully until, possibly via retaliation, her payoff eventually drops.
Given these scenarios, we tested the robustness of cooperation in two different situations. In the first one, we replaced a given fraction (µ) of the population with (0,0) strategists in every simulation step independently of the degree of the node, while in the second we only allowed the injection of defectors on nodes with a degree smaller than 1/5 of that of the biggest hub in the network. That is, in the second case we artificially protected the influential players in the network. For smaller percentages of injection of pure defectors, cooperation was robust for any temptation value in both of the above cases. However, for more frequent invasions cooperation survives only in the case where the biggest hubs are protected from direct occupation (second scenario).
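The two injection scenarios can be sketched as a single routine. This is only one possible reading of the procedure: the function name, the interpretation of µ as a fraction of the whole population, and the random choice among eligible nodes are assumptions.

```python
import random


def inject_defectors(strategies, adjacency, mu, protect_hubs=False, rng=random):
    """Overwrite randomly chosen players with (0, 0) strategists, in place.

    strategies: {node: (p, q)}; adjacency: {node: set(neighbours)}.
    With protect_hubs=True, only nodes whose degree is smaller than 1/5 of
    the largest degree in the network may be replaced.
    """
    eligible = list(adjacency)
    if protect_hubs:
        max_degree = max(len(adjacency[node]) for node in eligible)
        eligible = [node for node in eligible
                    if len(adjacency[node]) < max_degree / 5]
    # Assumption: mu is measured against the full population size.
    count = min(len(eligible), round(mu * len(adjacency)))
    replaced = rng.sample(eligible, count)
    for node in replaced:
        strategies[node] = (0.0, 0.0)
    return replaced
```

In a simulation loop this routine would be called once per simulation step, before or after the strategy update; the order is not specified in the text.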

Participation costs
So far, cognitive strategies have been able to promote the emergence of cooperation on heterogeneous networks which, once installed, proves robust against invasion. Prompted by such success, we now study the impact of participation costs in the situations studied.
Keeping track of a large number of contacts requires considerably more energy/resources than remembering the actions of just a few acquaintances [53]. Alternatively, participation costs may be considered as the price of having a larger memory capacity, being as such proportional to the number of neighbours. Such costs can be introduced into the payoff matrix by subtracting the value h (h > 0) from every payoff entry. From a purely mathematical perspective, the prisoner's dilemma payoff ranking stays unaffected when the interaction cost h is applied to every player independently of her decision. It is also worth noting that the introduction of h does not change the results in a homogeneous environment where every individual has the same number of neighbours, as in this case it decreases the payoff of all players by the same value, so strategy update processes based on payoff differences lead to the same evolutionary dynamics. This said, however, it is also important to note that in our model players are expected to play with each and every neighbour they have. This may be reasonable whenever interacting has no cost. With participation costs, however, it is only reasonable to expect that individuals should be able to opt not to engage in all their possible interactions. It was shown that the introduction of voluntary participation can significantly change the outcome of the evolution in structured populations [58][59][60]. The issue becomes particularly relevant whenever the cost of each interaction is higher than the payoff resulting from both mutual defection and mutual cooperation. In this case, neither opportunistic nor altruistic behaviour pays, compared to interaction avoidance, and the strategy space of our model becomes incomplete. Hence, for the PD under study here where 1 = R > P = 0, h should satisfy h ≤ R.
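Why a uniform cost h matters only on heterogeneous networks can be seen from the payoff difference that enters the strategy update. The sketch below is illustrative: the function names and the example numbers are assumptions, and "gross payoff" denotes the accumulated payoff before costs are subtracted.

```python
def net_payoff(gross_payoff, degree, h):
    """Accumulated payoff after paying the cost h once per interaction."""
    return gross_payoff - h * degree


def payoff_gap(gross_x, degree_x, gross_y, degree_y, h):
    """Net payoff difference P_x - P_y used by payoff-based update rules.

    Algebraically this equals (gross_x - gross_y) - h * (degree_x - degree_y),
    so the cost cancels whenever both players have the same degree.
    """
    return net_payoff(gross_x, degree_x, h) - net_payoff(gross_y, degree_y, h)
```

For two players of degree 4 the gap is the same with or without costs. For a hub with gross payoff 80 and degree 100 compared with a leaf with gross payoff 2 and degree 2, the gap shrinks from 78 at h = 0 to 29 at h = 0.5 and turns negative at h = 0.9, illustrating how costs erode the advantage of highly connected players.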
The effects of participation costs are shown in figure 4 for both unconditional and reactive strategies as a function of the temptation (b) and participation cost (h). Here we adopted the payoff matrix characterized by T = b and S = 1 − b, in order to reduce the number of free parameters to b and h, while still being able to study harsh dilemma situations. As the figure shows, as long as h < R (more precisely, when 0 ≤ h ≤ 0.9) cooperation thrives, and defection will appear only residually due to the stochastic nature of the strategies. The key for this result is that, in this regime, the reward for mutual cooperation (1 − h) is positive; that is, if cooperative hub-players can turn most of their 'followers' into cooperators, then, due to their superior payoff, they can have the same stable, leading role as in the absence of participation costs.
Consequently, as figure 4(a) shows, the disadvantage that participation costs impose on players with a higher number of neighbours is not enough to suppress the success of reactive strategies. In fact, the basic mechanisms described above remain valid, leading to a high prevalence of cooperative actions for the entire b range. This contrasts with the scenario of unconditional cooperators and defectors, shown in figure 4(b), where cooperation is rapidly reduced by the existence of participation costs.
The dominance of cooperation shows a weak dependence on b: with increasing temptation values (and penalties) it becomes harder for cooperative individuals to take over the hubs as an occasional, successful defection against them lowers their income by a greater amount, with an associated impact which is higher and, hence, harder to counteract. Consequently, for higher b values, cooperation can only win if the participation costs are slightly lower.
In the transitional region, that is, whenever 0.9 ≤ h ≤ 1.0, the shift of the leading role from the hubs to the smallest degree players happens gradually with the increase of h. At around h = 0.9 and smaller b values, the difference in strategic success and the burden of participation costs from different numbers of interactions make the payoffs of players with different degrees comparable. Such a homogeneous and leaderless state results in a slight drop in overall cooperation. A further increase in h makes the behaviour of low degree nodes more influential, although the payoffs remain comparable. Players in cooperative clusters, which are formed due to the randomness of the initial conditions and consist of low degree individuals, can pass their strategy to other low degree individuals through higher degree nodes. These intermediate nodes are taxed by the higher participation cost, but with support from the cooperative cluster they can still influence lower degree players in a defecting environment. Thus information can flow in the population and cooperation can spread and reach its full potential. For higher b values, the probability of cooperative clusters forming at the start of the evolution decreases, as a successful defection against a cooperative player sets the individual back much more in this case.
As expected, whenever the participation costs satisfy h > R, the evolutionary outcome resembles that of the costly participation region in [53], since highly connected individuals are so overburdened that they cannot influence the rest of the population. Such a scenario leads to a fragmented community of isolated, small degree individuals.
In summary, reactive strategies, in contrast to unconditional strategies, grant cooperation a leading role in the evolutionarily interesting range of participation costs (where actual strategy evolution happens) and for all temptation values.

Conclusion
We have studied the evolution of cooperation amongst individuals with incipient cognitive capabilities in structured, BA scale-free populations. Individuals were able to distinguish their neighbours and to remember the last action performed by each neighbour in respect of them, although they were not able to anticipate the present or future actions of their neighbours. They could choose different actions towards different neighbours depending on their own strategy and their neighbours' past actions. We found that these incipient cognitive abilities can establish high levels of cooperation. Compared to unconditional strategies, incipient cognition facilitates the emergence of cooperation. Individuals rapidly adopt a positive reaction towards cooperative actions, which ultimately defines the equilibrium fraction of cooperative actions. Therefore, the conditions for the emergence of cooperation are essentially created by responses to cooperative acts, as reactions towards selfish actions only become relevant for fierce social dilemmas. As previously shown [51,52], fear (S) of being cheated plays a central role in the outcome of evolution in heterogeneous structured populations. However, while in the evolution of unconditional strategies fear undermines the chances of cooperation, incipient cognitive skills evolve under fear by increasing the capacity for retaliation which, in turn, promotes overall cooperation. In this sense, cognitive skills may appear as an evolutionary response to the nature of the dilemmas individuals face, in which case fear is more important than greed in promoting retaliation.
It is worth pointing out that a slight blur in perception and in copying the behaviour of the partners proves instrumental in turning the most influential people-located in the hubs of the network-into cooperators. As soon as the hub players act cooperatively, they easily lead the whole community into cooperation. Due to the possibility of reacting differently to the actions of the partners, established cooperation is very robust against defector invasion. On heterogeneous networks, defector invaders are efficiently isolated and converted into cooperators.
Finally, reactive strategies are able to efficiently prevent the prevalence of defectors, even in the presence of participation costs. In fact, the effectiveness of reactive strategies is even enhanced in the presence of participation costs, when compared to the case of unconditional strategies.