The self-organizing impact of averaged payoffs on the evolution of cooperation

According to the fundamental principle of evolutionary game theory, the more successful strategy in a population should spread. Hence, during a strategy imitation process a player compares its payoff value to the payoff value held by a competing strategy. But this information is not always accurate. To avoid ambiguity, a learner may therefore decide to collect more reliable statistics by averaging the payoff values of its opponents in the neighborhood, and make a decision afterwards. This simple alteration of the standard microscopic protocol significantly improves the cooperation level in a population. Furthermore, the positive impact can be strengthened by increasing the role of the environment and the size of the evaluation circle. The mechanism that explains this improvement is based on a self-organizing process which reveals the detrimental consequence of defector aggregation that remains partly hidden during face-to-face comparisons. Notably, the reported phenomenon is not limited to lattice populations but remains valid also for systems described by irregular interaction networks.


Introduction
The presence or absence of cooperation has huge consequences in various fields of life, therefore it is of paramount importance to identify which conditions help and which ones block the spreading of altruistic behavior in a complex population [1,2]. Interestingly, some universal mechanisms were identified in the last two decades which remain valid not only in microbiological systems, but also in human societies where interacting agents have significant cognitive skills to adjust their behavior for a higher individual income [3,4,5,6,7].
Without exaggeration, hundreds of research papers have been published by scientists with a background in biology, economics, applied mathematics, or statistical physics, in which they proposed different microscopic models to increase the general willingness of actors to cooperate with their partners [8,9,10,11,12]. In some cases the desired evolutionary outcome is expected, for example when defection is punished or cooperation is rewarded by individuals or by a governing institution [13,14,15,16,17,18,19]. In these cases, however, the proper question is how to avoid the so-called second-order free-riding, when a cooperator player is reluctant to contribute to and maintain the mentioned cooperation supporting institution or behavior [20,21,22,23]. Intellectually it is more challenging to identify those mechanisms or conditions which do not directly support one of the competing strategies. More precisely, in the latter cases it is not obvious in advance why they result in a higher cooperation level. In these so-called strategy-neutral alterations of the traditional models, it is a common feature that the cooperator supporting effect emerges just as a secondary or indirect consequence of a fair and democratic rule. One of the very first and most celebrated examples was the observation that a heterogeneous population could be a cooperator supporting environment [24]. The heterogeneity may originate from an irregular interaction network where some players have significantly more neighbors than others, hence they can collect a higher payoff [25,26,27]. Diversity may also originate from different individual skills, like strategy teaching capacity or other social status, which could result in a similar effect [28,29,30,31,32]. The common feature behind these models is a kind of matching process in which players locally coordinate their strategies, which reveals the advantage of cooperators.
For completeness we note that coordination can be reached directly via a sort of conformity attitude [33,34,35], but this is beyond the scope of our present work. A similar impact can also be reached when players treat their neighbors differently, via a weighted interaction graph, or support their neighbors in an unequal way [36,37,38,39,40,41]. Interestingly, an intervention into the microscopic dynamical process can also be helpful for cooperation. Introducing inertia into the decision making, or hindering too fast individual strategy change, could also be beneficial for cooperation [42,43,44]. The related model studies pointed out that the mentioned dynamical change has an asymmetric consequence on the invasion process of different strategies. The intervention does not relevantly modify the slow and balanced propagation of the cooperator state, which results in smooth interfaces separating competing domains in a spatial system. On the other hand, the resulting dynamical change significantly blocks the rapid progress of the defection state, which would otherwise lead to irregular interfaces and an easy individual victory of defection. We can also mention memory effects, when accumulated success from past interactions can reveal that defection can only be successful in the short term because neighbors who follow this behavior eliminate the potential prey of further exploitation [45,46,47,48].
In this work we consider an alternative "strategy neutral" modification of the traditional model where the positive consequence on the evolutionary outcome is not straightforward. In particular, we focus on the strategy imitation process when a learner player analyzes the payoff value of the model player who represents a tempting strategy. Evidently, the decision whether or not to adopt an alternative strategy is based on the information that the learner player collects from the partner. But this information could be inaccurate [49,50,51]. Principally we are not talking about a perception error, which can be handled by a noise parameter introduced in the strategy learning probability. Instead, we focus on the deceptive behavior of the model player. Such deception is rather frequent in the animal kingdom and basically it serves to avoid conflicts or to gain mating advantage [52]. But of course, Homo sapiens is the best liar and we can easily give examples when someone's dress or lifestyle shows more than her/his proper success [53]. This experience makes a general learner more careful, who may try to collect additional information to evaluate an alternative strategy more accurately. In this way the learner's decision is not based solely on the success of a particular player but on more reliable averaged statistics obtained from the neighborhood. Here the key question is how to weight the directly observed local payoff value and the average payoff value obtained from the learner's environment. Naturally, the size of the environment from which the learner collects information could also be a crucial detail. To explore the possible consequences of extra information we study not just different sizes of the perception environment of the learner player, but also check cases when the mentioned environment is not stable, but potential model players are chosen randomly from the population.
In the rest of this paper we propose a very simple model to explore how averaged payoff values change the learning process and reveal that it has a significant cooperator supporting consequence. We not only report this phenomenon, but also give a plausible explanation of what is behind it. Furthermore, we emphasize that the simple extension we propose results in a universally valid effect that could be observed in populations characterized not only by regular, but also by irregular interaction graphs. But we first define our extended model, and then proceed with the results and a discussion of their implications for a more sophisticated and effective learning process.

Evaluating the complete neighborhood
We start from the traditional version of the spatial prisoner's dilemma game model where players are distributed on a graph and interact with their neighbors. The players represent either the cooperator (C) or the defector (D) strategy, and these strategies are distributed randomly in the initial stage. We first define our proposed model for a square grid, but the extension to other graphs is straightforward. For simplicity, but without jeopardizing the essence of the conflict of interests, we use the so-called weak prisoner's dilemma game parametrization where the only control parameter is T, the temptation to defect, which characterizes a defector's income against a cooperator partner. The latter player gets nothing in the mentioned interaction, similarly to the case when two defector players meet. In the last case, when two cooperators meet, both collect R = 1 payoff value.
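As a minimal sketch of the payoff definition above, the following Python snippet computes the total payoff of a player on an L × L square grid with four nearest neighbors. The function names, the 0/1 encoding of the D and C strategies, the illustrative value of T, and the periodic boundary conditions are assumptions made for this example and are not taken from the text.

```python
R = 1.0   # payoff of mutual cooperation, as in the weak prisoner's dilemma
T = 1.1   # temptation to defect; this particular value is only illustrative

def pair_payoff(s_focal, s_other, T=T):
    """Payoff of the focal player in one pairwise game (1 = C, 0 = D)."""
    if s_other == 0:
        return 0.0            # nobody earns anything against a defector
    return R if s_focal == 1 else T

def neighbors(i, j, L):
    """Four nearest neighbors on an L x L grid (periodic boundaries assumed)."""
    return [((i + 1) % L, j), ((i - 1) % L, j),
            (i, (j + 1) % L), (i, (j - 1) % L)]

def total_payoff(grid, i, j, T=T):
    """Sum of the payoffs player (i, j) collects from all of its neighbors."""
    L = len(grid)
    return sum(pair_payoff(grid[i][j], grid[a][b], T)
               for a, b in neighbors(i, j, L))
```

For instance, a single defector placed in a fully cooperating grid collects 4T, while each of its cooperating neighbors collects 3.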
According to the standard simulation protocol, in an elementary step a randomly chosen player x, who has strategy $s_x$, plays the game with her neighbors and collects altogether $\Pi_x$ payoff from these interactions. Similarly, a neighboring player y, who has the opposite strategy $s_y$, collects $\Pi_y$ payoff from the games played with the corresponding neighbors. In the usual strategy imitation rule the payoff difference $\Pi_y - \Pi_x$ has a crucial role in how likely player x adopts the strategy $s_y$ of the model player y. This likelihood is defined by the well-known Fermi function [54]
$$W(s_x \to s_y) = \frac{1}{1 + \exp[(\Pi_x - \Pi_y)/K]}, \tag{1}$$
where K denotes the noise factor which collects different sources of errors, like the possibility of a bad decision based on the available information. Our present work focuses on the reliability of information that can be collected from other partners. Of course, there are several ways to deceive others for a particular reason. For instance, a player may try to show a different strategy to the neighborhood from the one she actually applies. But here we concentrate on the possibility that the payoff value we collect from a potential model actor is not accurate. Needless to say, such an ambiguity could be frustrating for the learner player because her decision about strategy change is based on this payoff value, as summarized by Eq. 1.
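The imitation rule can be illustrated with a few lines of Python. This is a sketch assuming the commonly used form of the Fermi rule, W = 1 / (1 + exp[(Π_x − Π_y)/K]); the function name and the overflow guard are choices made for this example only.

```python
import math

def imitation_probability(payoff_x, payoff_y, K=0.1):
    """Probability that learner x adopts the strategy of model player y.

    Assumes the standard Fermi form 1 / (1 + exp((payoff_x - payoff_y) / K)).
    """
    z = (payoff_x - payoff_y) / K
    if z > 700.0:
        # exp(z) would overflow a float; the probability is numerically zero
        return 0.0
    return 1.0 / (1.0 + math.exp(z))
```

With equal payoffs the function returns 0.5, and for small K it approaches a step function: a model player with a clearly higher payoff is imitated almost surely, one with a clearly lower payoff almost never.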
To minimize the possible error of evaluating the competitor's payoff value, our learner player may want to collect alternative information about the potentially tempting strategy. More precisely, player x makes a survey in the available neighborhood and checks the payoff values of all players who practice the alternative strategy $s_y$. If player x averages the related values then she has more reliable information about the general success of the strategy she wants to adopt. Here we have two fundamental aspects to be contemplated. The first one is how strongly to consider the additional information collected from the neighborhood. This can be done by replacing $\Pi_y$ in Eq. 1 by a weighted value $\Pi_w$, which is the combination of the original payoff value $\Pi_y$ of model player y and the averaged value $\Pi_{av}$ obtained from akin players in the neighborhood:
$$\Pi_w = (1 - q)\,\Pi_y + q\,\Pi_{av}.$$
Here q is the control parameter determining how strongly our learner player trusts the alternative source of information about the success of the tempting strategy. Accordingly, if q = 0 then we get back the traditional spatial prisoner's dilemma game, while in the q = 1 limit the adoption probability is based exclusively on the averaged value collected from the neighborhood. We must stress that the average is obtained by summing not over all payoff values detected in the neighborhood, but only over those values which are achieved by players using the same strategy as model player y. This way of averaging is in stark contrast to previously applied averaging methods [55,56], because our learner player does not want to explore the general wellness of the neighborhood, but focuses on the success of a specific strategy. The averaging is restricted to the payoff values of the alternative strategy because the learner's goal is to gain more accurate information for a potential strategy update. This protocol is also in stark contrast to the general averaging process applied in mean-field calculations and in some previous spatial models [57,58]. Evidently, collecting additional information from the neighborhood requires a high cognitive skill from a learner, which was detected in previous human experiments [59,60,61,62].
Figure 1. As a learner, a cooperator (blue) player x tries to imitate the strategy of the defector (red) neighbor player y. To calculate the imitation probability player x considers not only the payoff $\Pi_y$ of player y, but also the averaged payoff $\Pi_{av}$ of all other defector players who are within the evaluation circle. The latter players are marked by yellow background while the border of the evaluation circle around player x is marked by a dashed diamond. In this particular case $l_e = 2$ is applied, which means that all players whose distance from player x is not larger than 2 may contribute to $\Pi_{av}$, hence providing more accurate statistics about the general success of strategy $s_y$.
The other main ingredient of our model is the definition of the neighborhood which is accessible to a learner player to collect more accurate information. A natural choice is to assume that all players within $l_e$ steps from player x are checked, hence they are within the evaluation circle. To clarify this, we present a case in Fig. 1 where a dashed diamond-shaped line surrounds those players who are within the $l_e = 2$ evaluation circle of the learner player x. Naturally, the value of $l_e$ can be increased gradually from 1 toward higher values and we can monitor how the information obtained from a larger and larger set influences the evolution of cooperation. Importantly, as we already stressed, the $\Pi_{av}$ value is calculated from the values of those players who represent the same strategy as the model player y. In the above specified case they are marked by yellow background in Fig. 1.
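The weighting described above can be sketched in Python as follows. The example assumes a square grid with periodic boundaries, measures the evaluation circle by the Manhattan distance from the learner (the diamond of Fig. 1), excludes the model player y itself from the average (one possible reading of "all other" players in the figure caption), and falls back to Π_y when no other player with strategy s_y is found. All names are hypothetical, and the payoff and strategy grids are taken as given inputs.

```python
def evaluation_circle(i, j, L, l_e):
    """Sites within Manhattan distance l_e of (i, j), excluding (i, j).

    Periodic boundaries are assumed; L > 2 * l_e avoids counting a site twice.
    """
    sites = []
    for di in range(-l_e, l_e + 1):
        rem = l_e - abs(di)
        for dj in range(-rem, rem + 1):
            if di == 0 and dj == 0:
                continue
            sites.append(((i + di) % L, (j + dj) % L))
    return sites

def weighted_payoff(payoffs, strategies, x, y, l_e, q):
    """Pi_w = (1 - q) * Pi_y + q * Pi_av for learner x and model player y."""
    L = len(strategies)
    s_y = strategies[y[0]][y[1]]
    pi_y = payoffs[y[0]][y[1]]
    # Payoffs of the other players in x's evaluation circle who share s_y
    same = [payoffs[a][b]
            for a, b in evaluation_circle(x[0], x[1], L, l_e)
            if strategies[a][b] == s_y and (a, b) != y]
    pi_av = sum(same) / len(same) if same else pi_y   # fallback is an assumption
    return (1.0 - q) * pi_y + q * pi_av
```

Setting q = 0 returns Π_y unchanged, which recovers the traditional rule, while q = 1 returns the neighborhood average alone.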
In our simulations we studied populations containing up to N = 160000 players.
According to the standard protocol, in an elementary step a randomly chosen player has a chance to change her strategy by adopting the alternative strategy of a randomly chosen neighbor. Repeating this elementary step N times defines the natural unit of simulation time, called 1 Monte Carlo (MC) step. In this work we applied at most 50000 MC relaxation steps to reach the stationary state, where the fraction of cooperators was measured and time averaged for another 100000 MC steps. The applied system size and the sufficiently long simulation time allowed us to obtain results which are independent of the system size, hence finite-size effects can be excluded. In this work we used K = 0.1 noise level to allow comparison with results of the traditional model, but we stress that our qualitative observations remain intact if we use other K < 2 values of the noise parameter. Of course, in the high K parameter region the strategy imitation process becomes completely random and independent of the payoff difference. Besides the mentioned square grid topology we also used a random network as interaction graph, where the degree of every node remained k = 4. In this way we could check the consequence of using an irregular topology without introducing additional effects originating from the heterogeneity of players. Last, we mention that we also studied a case when the "evaluation circle" was not selected from players around the learner player, but its members were chosen randomly from the whole population. The details of this modified protocol will be given in the next section, where we present its consequences by comparing with the results of the originally defined model. Our first observations are summarized in Fig. 2 where we plotted the stationary cooperation level in dependence of the temptation value for different values of the weight factor q.
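To make the simulation protocol concrete, the following Python sketch performs one full MC step (N = L × L elementary steps) of the modified model on a small square grid. It is an illustration under stated assumptions rather than the code used for the reported results: periodic boundaries, the 1 = C and 0 = D encoding, the standard Fermi form of the imitation probability, excluding the model player from the average, and the fallback to Π_y when no other player with the model's strategy lies in the evaluation circle are all choices made here.

```python
import math
import random

def nbrs(i, j, L):
    """Four nearest neighbors with periodic boundaries (assumed)."""
    return [((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L)]

def payoff(s, i, j, T):
    """Weak prisoner's dilemma: only cooperating neighbors yield income."""
    n_c = sum(s[a][b] for a, b in nbrs(i, j, len(s)))
    return n_c * (1.0 if s[i][j] == 1 else T)

def circle(i, j, L, l_e):
    """Sites within Manhattan distance l_e of (i, j); requires L > 2 * l_e."""
    return [((i + di) % L, (j + dj) % L)
            for di in range(-l_e, l_e + 1)
            for dj in range(-(l_e - abs(di)), l_e - abs(di) + 1)
            if (di, dj) != (0, 0)]

def mc_step(s, T, q, l_e, K=0.1, rng=random):
    """One Monte Carlo step: L * L elementary imitation attempts, in place."""
    L = len(s)
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)      # learner x
        a, b = rng.choice(nbrs(i, j, L))               # model player y
        if s[a][b] == s[i][j]:
            continue                                   # nothing to imitate
        pi_x, pi_y = payoff(s, i, j, T), payoff(s, a, b, T)
        others = [payoff(s, c, d, T) for c, d in circle(i, j, L, l_e)
                  if s[c][d] == s[a][b] and (c, d) != (a, b)]
        pi_av = sum(others) / len(others) if others else pi_y
        pi_w = (1.0 - q) * pi_y + q * pi_av            # weighted payoff
        z = (pi_x - pi_w) / K
        w = 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))
        if rng.random() < w:
            s[i][j] = s[a][b]
    return s
```

Payoffs are recomputed on demand here for brevity. For system sizes like those quoted above, caching the payoffs and updating them only around a changed site would be the more practical design.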
In the case presented in Fig. 2 we used an $l_e = 2$ evaluation circle, but qualitatively similar behavior can be found when the size of the neighborhood used to collect extra information is different. As we noted, q = 0 is equivalent to the traditional spatial model, which suggests a $T_c = 1.03576$ critical temptation value for the used K = 0.1 noise level [63]. But as we enlarge q, hence the learner players give larger credit to the alternative information obtained from the neighborhood, the chance of cooperators to survive is improved significantly. Furthermore, when q is close to 1, hence the additional information becomes dominant in the decision about the strategy change, then only T > 1.5 temptation values can provide a full defection state. It is worth noting that $l_e$ is relatively small in the presented case, which practically means that typically not more than half a dozen other players are checked to gain valuable extra information. Still, the improvement is remarkable.
Next we illustrate how the size of the neighborhood, from which the extra information is gained, influences the cooperation level. A representative plot is shown in Fig. 3 where the weight factor is fixed at q = 0.2. These curves highlight that the cooperation level can be improved if learners can collect information from a larger neighborhood. This effect, however, cannot be enlarged endlessly, because after a certain level the enhancement saturates. For example, collecting data from a 180-member set at $l_e = 9$ gives almost as good information as the neighborhood of more than 3000 neighbors which is obtained for $l_e = 39$. But the tendency is clear. One may note that the improvement of the critical temptation value characterizing the border of the mixed state is not really large. But this is the consequence of the relatively small weight factor q, which gives modest credit to external information.
We stress, however, that even at this q the threshold $T_c$ can be doubled if the neighborhood size is large enough.
Figure 4. Pattern formation in the traditional model (top row, panels (b) to (d)) and the modified model (panels (e) to (g)) where a learner considers a more accurate payoff value about the success of the alternative strategy during the decision making. In both cases we applied the same T = 1.1 temptation value and launched the simulation from an identical state, shown in panel (a), where a red defector belt is surrounded by a blue cooperator domain. In the traditional case, which can be considered as the q = 0 case of the modified model, single defectors can invade deep into the bulk of the cooperator domain thanks to the high T value which provides them high individual success. As a result, the interface separating the two main domains becomes irregular, which makes the situation of cooperators more difficult. The moving front leaves behind small cooperator islands, shown in panel (c), but they cannot survive long and finally the system terminates in a full defector state. If q = 1, shown in the bottom row, a defector's payoff at the front becomes less tempting, which results in the shrinking of the red belt. The propagating front leaves behind small patches of defectors, shown in panel (e), who can maintain a competitive payoff value, hence this phase eventually spreads in the whole system. In both cases we applied the same L = 200 linear size.
To understand the cooperator supporting mechanism more deeply, in the following we present a comparison of the pattern formations obtained in the traditional and in the modified models. Figure 4 shows the significantly different evolutionary paths when we launch the simulations from the same initial state, shown in panel (a), where a red defector island is surrounded by a blue cooperator domain. This setup, where different players meet along two domain walls, helps us to reveal the characteristic movement of propagation fronts more easily. For a proper comparison we applied the same T = 1.1 temptation value for both cases. In the top row, containing panels (b) to (d), we show the evolution in the traditional model. Notably, this can be considered as the q = 0 extreme case of the modified model. Here the q = 0 weight factor ensures that a learner player x estimates the success of the alternative strategy based exclusively on the payoff value of her neighboring model player y. As a consequence, shown in panels (b) and (c), the original straight front line starts roughening because a neighboring defector can collect a high individual payoff value due to the relatively high value of temptation. We note here that the threshold temptation value of the traditional model is well below the presently applied T value. The rough propagation front results in even more difficult circumstances for cooperators because it destroys their original phalanx and network reciprocity can hardly work anymore. Only small islands of cooperators remain when the front passes. They are marked by white circles in panel (c). However, they are unable to survive long because of the high T value and the system eventually evolves to a full defector state, shown in panel (d).
As a comparison, in the bottom row we present the evolutionary path in the other extreme case, when the success of the alternative strategy is estimated from the information collected from the neighborhood. Despite the fact that we applied a relatively small $l_e = 3$ radius, the mentioned trajectory is significantly different. Here the direction of the invasion is reversed while a not too noisy front line is maintained. Behind these lines, however, a small fraction of defector players remain alive, as marked by white ellipses in panel (d). The reversed direction of propagation tells us that a bulk of defectors cannot provide a large average payoff to its members because there is no one to exploit. Furthermore, a pure cooperator domain provides a robust average payoff value for a cooperating neighborhood, hence the adoption of the cooperator strategy by a defector player becomes a frequent process in the initial stage. But if the density of defectors becomes low, as in the case marked by the ellipse, then they can collect a competitive average payoff value again and form a stable coexistence with their rivals. Evidently, a pure cooperator neighborhood offers a high average payoff, therefore the spreading of the mentioned mixed state is a slow process. For comparison, for L = 200 linear system size the traditional evolution terminates in the full defector state typically within 300 MC steps, while at least 1000 MC steps are needed to reach the stationary state in the modified case shown in panel (g). Based on the above argument we can also understand why the $l_e$ value influences the stationary concentration of defector players. The larger the value of $l_e$, the smaller the fraction of defectors who can survive permanently, as we observed in Fig. 3.
If their concentration exceeds a threshold value then their average payoff becomes less attractive, which provides a feedback mechanism to maintain a significant cooperation level in spite of a relatively high temptation value. In this way the average information about the competing strategy maintains a self-organizing pattern of a mixed state where a considerable cooperation level can be reached even for a high temptation value. This mechanism also explains our observations summarized in Fig. 2 because the effect becomes stronger as we give higher credit to the neighborhood by using a larger weight factor q.
One may ask what happens if the additional information is not collected from a local neighborhood, but originates from a random sampling where the target could be the whole population. In this modified model a learner player x calculates the crucial average payoff value $\Pi_{av}$ by selecting m other players randomly from the complete population. As previously, if the strategy of a selected player i agrees with the strategy of the model player y then we consider the related payoff value $\Pi_i$ of player i when the mentioned average is calculated. Naturally, for a proper comparison with the previously defined model extension, the value of m should agree with the size of the neighborhood defined by the radius $l_e$. For example, if $l_e = 1$ then m should be 4, for $l_e = 2$ the corresponding value is m = 12, etc. The largest sampling set we used contains m = 3120 members, which is equivalent to a neighborhood around x for $l_e = 39$ radius. Importantly, the former m = 3120 sub-population contains randomly chosen players from the whole population who are not necessarily neighbors of each other.
Figure 5. Cooperation level in dependence of temptation value on a square lattice for different sizes of the sample set obtained at q = 0.3 weight factor. Players belonging to the sample set are chosen randomly for every learning process. The legend shows how many players are selected to gain information about the general success of the alternative strategy. The comparison of curves highlights that the size of the sample set has no particular importance for the evolutionary outcome if we apply random sampling: the information gained from half a dozen other players could be as valuable as the one gained from a much larger group. Here the most relevant parameter is the weight factor q, which practically determines the critical temptation value up to which cooperators may survive. This value, however, is significantly larger than the one that can be reached when the information is collected from those players who are related to the learner player topologically.
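The correspondence between $l_e$ and m quoted above follows from counting the sites of a diamond-shaped neighborhood on the square grid: a diamond of radius $l_e$ contains $2 l_e (l_e + 1)$ sites besides its center. The Python sketch below checks this against the quoted values and illustrates one way to compute the average from a random sample. Drawing the m players without replacement, the flat-list representation of the population, and the fallback value used when no sampled player has the model's strategy are assumptions of this example.

```python
import random

def circle_size(l_e):
    """Number of sites within Manhattan distance l_e on a square grid,
    not counting the central site."""
    return 2 * l_e * (l_e + 1)

def random_sample_average(payoffs, strategies, s_y, m, fallback, rng=random):
    """Average payoff of players with strategy s_y among m randomly chosen ones.

    payoffs and strategies are flat lists indexed by player; sampling without
    replacement and returning `fallback` for an empty match are assumptions.
    """
    picked = rng.sample(range(len(payoffs)), m)
    same = [payoffs[i] for i in picked if strategies[i] == s_y]
    return sum(same) / len(same) if same else fallback

# The sizes quoted in the text are reproduced by the counting formula:
# l_e = 1 -> 4, l_e = 2 -> 12, l_e = 9 -> 180, l_e = 39 -> 3120
```

Unlike the neighborhood-based average, the sampled players here carry no spatial correlation with the learner.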
As previously, we still have two parameters, q and m, but there roles are different from the one we previously observed for q and l e . A typical behavior is summarized in Fig. 5. The first conspicuous feature is that size of the sampling set has no relevant role on the stationary value of cooperation level. Roughly speaking, it is enough to collect additional information from a small random sample because it gives no relevant advantage if a learner player bothers too much by checking too many players about the expected success of the alternative strategy. Our second observation is the general improvement of cooperation level comparing to the case when fixed and connected neighbors are used as a source of additional information. This fact can be seen easily if we check Fig. 2 where even a higher q = 0.4 value is still unable to provide as high portion of cooperators as we see in Fig. 5 for random sampling.
The above mentioned superiority of random sampling is valid for all related parameter pairs. Next we give some inspirations to understand its origin. To illustrate and understand the difference between random sampling and collecting data from a compact neighborhood in Fig. 6 we show how they drive the pattern formation at similar conditions. Importantly, we not simply apply equally strong temptation value and weighting factor, but also use equal size for the sampling population. Indeed, the latter has no decisive importance for random sampling, but it could be an essential factor for neighbor-based sampling, as it was illustrated in Fig. 3. Accordingly, l e = 2 radius around the learner player is equivalent to check m = 12 randomly chosen players. In contrast to Fig. 4 we here use an alternative common initial state, shown in panel (a), where both a homogeneous cooperator and defector domains meet with a phase where strategies are mixed randomly. In the top row, where additional information is collected from the neighborhood, the fastest change can be observed in the mixed phase. This is a general phenomenon that can be seen even for spatially structured populations, because cooperators can only protect themselves if they are organized. In our case, despite of the relatively high q = 0.5 weighting factor, network reciprocity alone is incapable to block the spreading of defection. It is because mixed environment can always give a decent payoff advantage for other defector players, too. At the same time a fully homogeneous and compact cooperator domain cannot really resist the invasion of defectors who are wrapped in a supporting cooperator shell. Interestingly, the homogeneous defector domain is not sensitive and cooperator player never enters into the down-left quadrant. At such a q value the temptation is too high to replace defection by cooperation.
We, however, observe a strikingly different evolutionary trajectory when a learner collects information from randomly selected players. In this case the mix phase is table, albeit the actual ratio of defectors and cooperators is adjusted to the value of T and q. On the other hand, the stability of homogeneous domains is proved to be the opposite we detected previously. Firstly, the shrink of the fully cooperator island is slower because in the average payoff of defectors may not be tempting: it can easily happen that we sample defector players from deep of the full-D domain where they get nothing. But our argument is also valid for the opposite case. In the bottom row the homogeneous defector island becomes unstable and disappears very fast. In this case the strategy of cooperators standing at the front may become attractive because their average payoff value may be increased significantly by the contributions of their akin fellows who are sitting safe in the middle of a fully cooperator patch.
But we should stress that both dynamical process we discussed about the stability of homogeneous spots are just temporary because the distant information collected by Figure 6. Comparison of pattern formation for the case when additional information is collected from the learner's neighborhood (top row, panels (b) to (d)) and for the case when sampling players are chosen randomly (panels (e) to (g)). For proper comparison we applied the same T = 1.3 temptation value, q = 0.5 weighting factor and used equally large sampling population as a source of additional information about the success of alternative strategy. The simulations were launched from identical starting state, shown in panel (a). Initially we have a cooperator domain in the top-right corner and a homogeneous defector domain in the down-left corner. Their neighbor is a domain where strategies are distributed randomly. The hight temptation value cannot prevent the final victory of defectors in the fixed neighborhood sampling, while random sampling can provide a stable cooperation level shown in panel (g). Further discussion can be found in the main text. random sampling drives the system eventually toward a uniform state where the fraction of C and D strategies is the same everywhere. In this stationary state, however, the previously mentioned self-organizing mechanism still works, which prevents defectors to grow too large homogeneous spot. Admittedly, this information gathering via random sampling also hinders cooperators to grow too large homogeneous domains because they cannot really utilize their high cooperation thanks to the smaller contributions to Π av from other C players. Nevertheless, from cooperator's viewpoint the situation is fine because they can reach a decent fraction even at high temptation value if q is large enough.
Finally we briefly note that our observations about the positive consequence of considering additional information is not restricted to lattice-type populations, but remains valid when the interaction graph is not ordered. Having discussed the very positive consequence of random sampling, maybe this fact is not really surprising because our argument did not utilize the translation invariance of the interaction graph. But for completeness we also checked our results by using random topology where players have similar degree distribution as for square grid. Therefore we can check the consequence of randomness exclusively without bothering other effects due to the degree change of the topology. The essence of our findings are summarized in Fig. 7 where we plot the results obtained for neighboring-based and random sampling based additional information gathering.  Figure 7. Cooperation level in dependence of temptation value on a random graph for different values of q weight factor as indicated in the legend. For a proper comparison to previous results obtained for square grid, the degree distribution k = 4 remained unchanged. On the left panel we show the case when the evaluation circle contains 12 nearest neighbor players of the actual learning player. On the right panel we show the results of a similar size of sampling set, but here these players are chosen randomly from the whole population. The comparison of the panels suggests that it has no significant importance whether the sampling set contains topologically related players or randomly chosen fellows. The only crucial factor is how large credit was given to the information collected from the external group when decision was made.
In agreement with our earlier results, giving larger credit to averaged payoff values by increasing q improves the cooperation level. However, the clear consequence of the random interaction topology is that this effect becomes really strong, and cooperators may survive even at T = 2. We stress that the random interaction topology alone would not be enough to produce such an improvement, because in the traditional model the critical temptation value remains close to T = 1. Our other main observation is based on the comparison of the panels of Fig. 7, where similar curves are detected for neighborhood-based and random sampling. This agreement suggests that the original randomness of the topology already serves as an information mixing tool. Therefore, in sharp contrast to lattice-type topologies like the square grid, in a randomized graph collecting extra information via random sampling of distant players has no additional value. But the positive consequence remains intact, and is more pronounced for irregular topologies.
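The text does not specify how the random graph with unchanged degree k = 4 was generated. One common way, given here only as a minimal sketch under that assumption, is to start from the periodic square lattice and repeatedly swap the endpoints of two randomly chosen links, which randomizes the topology while leaving every player's degree intact. The function names `square_lattice_edges`, `rewire_preserving_degree` and `degrees` are hypothetical and chosen for illustration.

```python
import random


def square_lattice_edges(L):
    """Links of an L x L square lattice with periodic boundaries (use L >= 3)."""
    edges = set()
    for x in range(L):
        for y in range(L):
            i = x * L + y
            right = ((x + 1) % L) * L + y
            up = x * L + (y + 1) % L
            edges.add(frozenset((i, right)))
            edges.add(frozenset((i, up)))
    return edges


def rewire_preserving_degree(edges, n_swaps, rng):
    """Randomize links by double-edge swaps; every node keeps its degree.

    Assumption: this is one standard randomization scheme, not necessarily
    the one used for the reported results. Connectivity is not checked.
    """
    edge_list = [tuple(e) for e in edges]
    edge_set = set(edges)  # work on a copy, the input is left untouched
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible swap orientations
            c, d = d, c
        if len({a, b, c, d}) < 4:  # would create a self-loop
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:  # would create a double link
            continue
        edge_set.discard(frozenset((a, b)))
        edge_set.discard(frozenset((c, d)))
        edge_set.add(new1)
        edge_set.add(new2)
        edge_list[i] = (a, d)
        edge_list[j] = (c, b)
        done += 1
    return edge_set


def degrees(edges, n):
    """Degree of each of the n nodes."""
    deg = [0] * n
    for e in edges:
        for v in e:
            deg[v] += 1
    return deg
```

Because each swap removes one link from, and adds one link to, each of the four nodes involved, the degree stays k = 4 for every player, so any change in the cooperation level can be attributed to the randomness of the links alone.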

Conclusion
Deciding which behavior to follow is a crucial act not only at the personal but also at the collective level. It is easy to see that the dominance of hasty or careless adoption choices can drive the whole society toward an undesired destination. Therefore, huge intellectual efforts have been devoted to the delicate task of finding methods which are in agreement with the fundamental Darwinian selection rule favoring the more successful strategy, but which at the same time help to block the obvious advantage of defection. For example, recording and accumulating the previous success of a strategy, or introducing inertia so that a strategy becomes more valuable the longer it has survived, could be a cooperator-supporting modification of the simplest "imitating the more successful strategy" protocol. But, of course, there are alternative methods, and we refer the interested reader to related review papers.
In our present work we suggested a very simple modification of the traditional model, in which a learner player is more careful and does not accept the information about the model player unreservedly. Instead, the learner tries to collect information about the success of the competing behavior from an alternative source. This could be the neighborhood of the learner player or a randomly selected group of other players from the whole population. No matter which source is used, a population whose members give higher credit to averaged information about the success of an alternative strategy can reach a higher cooperation level. The larger the weight of this additional information in the decision making, the more significant the improvement that can be achieved.
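The weighting described above can be illustrated by a minimal sketch. It rests on assumptions that this section does not itself confirm: that the learner blends the model player's payoff with the averaged payoff Π_av linearly through q, that Π_av is the mean payoff of sampled players holding the model's strategy, and that adoption follows a Fermi-type probability with noise parameter K. The function names and the default K = 0.1 are hypothetical.

```python
import math


def averaged_payoff(model_strategy, sample, fallback):
    """Mean payoff of sampled players who hold the model's strategy.

    sample: list of (strategy, payoff) pairs from the neighborhood or from
    a random set of players. fallback is returned when no sampled player
    holds that strategy (an assumption made for this sketch).
    """
    values = [payoff for strategy, payoff in sample if strategy == model_strategy]
    return sum(values) / len(values) if values else fallback


def adoption_probability(payoff_learner, payoff_model, payoff_avg, q, K=0.1):
    """Probability that the learner adopts the model's strategy.

    Assumed form: effective payoff (1 - q) * payoff_model + q * payoff_avg,
    compared to the learner's own payoff through a Fermi function.
    """
    effective = (1.0 - q) * payoff_model + q * payoff_avg
    return 1.0 / (1.0 + math.exp((payoff_learner - effective) / K))


# A lone defector exploiting four cooperators at T = 1.3 collects 5.2,
# while defectors inside a defector cluster collect little.
sample = [("D", 0.0), ("D", 1.3), ("C", 3.0), ("D", 0.0)]
pi_av = averaged_payoff("D", sample, fallback=5.2)
for q in (0.0, 0.5, 1.0):
    print(q, adoption_probability(3.0, 5.2, pi_av, q))
```

Under these assumptions, q = 0 recovers the traditional face-to-face comparison, whereas larger q lets the poor payoffs of aggregated defectors lower the apparent success of the model defector.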
The main mechanism responsible for this positive consequence is based on a self-organizing pattern formation of the spatial population. More precisely, considering averaged information instead of unconditionally accepting the success of a particular case prevents the condensation of defectors, and hence maintains an acceptable cooperation level even at high temptation values. This mechanism works not only in populations having lattice-type interaction graphs, but also for irregular topologies.
It is worth noting that the observed cooperator-supporting mechanism fits nicely with those where the introduced strategy-neutral rule has a biased impact on the invasion of competing strategies [42]; hence they provide an alternative way to understand the original enigma of why cooperation may prevail among selfish agents. This research direction could be promising for a broader application of evolutionary game theory beyond human societies. In such systems, for example microbiological populations, participants do not necessarily have cognitive skills; therefore the related theories should not rely on additional assumptions about moral issues, like reputation [64,65], or on preliminary judgments about strategies, which are the source of punishing or rewarding mechanisms in advanced populations [66,67,68,69,70].