Small groups and long memories promote cooperation

Complex social behaviors lie at the heart of many of the challenges facing evolutionary biology, sociology, economics, and beyond. For evolutionary biologists the question is often how group behaviors such as collective action, or decision making that accounts for memories of past experience, can emerge and persist in an evolving system. Evolutionary game theory provides a framework for formalizing these questions and admitting them to rigorous study. Here we develop such a framework to study the evolution of sustained collective action in multi-player public-goods games, in which players have arbitrarily long memories of prior rounds of play and can react to their experience in an arbitrary way. We construct a coordinate system for memory-m strategies in iterated n-player games that permits us to characterize all cooperative strategies that resist invasion by any mutant strategy, and stabilize cooperative behavior. We show that, especially when groups are small, longer-memory strategies make cooperation easier to evolve, by increasing the number of ways to stabilize cooperation. We also explore the co-evolution of behavior and memory. We find that even when memory has a cost, longer-memory strategies often evolve, which in turn drives the evolution of cooperation, even when the benefits for cooperation are low.


Overview of Supporting Information
In this supplement we detail our analysis of iterated n-player games in which players have two choices in each round and can remember the outcomes of the previous m rounds. We identify the strategies that are able to resist selective invasion by any other strategy in an evolving population of players. Such strategies are called "evolutionary robust", as defined formally below. An iterated n-player game consists of an infinite series of "rounds", in each of which each player chooses to either "cooperate" (c) or "defect" (d). A memory-m strategy stipulates that the probability of cooperation in the current round depends on the outcomes of the preceding m rounds. The full space of memory-m strategies in such an n-player game thus has dimension $2^{n\times m}$. To identify strategies that are evolutionary robust across such a large space we first introduce a convenient coordinate transform for the space of memory-m strategies, which generalizes that introduced to study memory-1 strategies in iterated 2-player games [1][2][3]. This coordinate transformation enables us to identify sets of memory-m strategies that are robust to invasion by any other strategy in an evolving population. We apply this method to analyse evolutionary robustness in various n-player iterated public goods games.

Iterated n-player games
We consider an iterated game with an infinite number of successive rounds between a player $X_0$ and her opponents $X_1, X_2, \ldots, X_{n-1}$. We study games for which, in each round, each player has two choices, denoted cooperate (c) and defect (d). The payoff in a given round to the focal player $X_0$ is given by $R_{c,l-1}$ if she cooperates along with $l-1$ of her opponents, and by $R_{d,l}$ if she defects while $l$ of her opponents cooperate.
We will focus on public goods-type games, for which by definition in each round:
• $R_{d,l} > R_{c,l-1}$, so that, given $l$ players cooperating in total, those who defected receive a higher payoff than those who cooperated;
• $R_{c,l} \ge R_{c,l-1}$ and $R_{d,l} \ge R_{d,l-1}$, so that, typically, the more of her opponents cooperate, the higher the payoff a cooperative focal player receives.
We will focus in particular on the most typical type of public goods game, for which $R_{c,l-1} = B\frac{l}{n} - C$ and $R_{d,l} = B\frac{l}{n}$, where $B > C$.
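As a concrete check of these payoffs, here is a minimal sketch in Python; the function name and signature are ours, not from the paper:

```python
def payoff(cooperated: bool, other_cooperators: int, n: int, B: float, C: float) -> float:
    """One-round payoff to a focal player in the standard public goods game:
    a cooperator among l total cooperators receives R_{c,l-1} = B*l/n - C,
    while a defector alongside l cooperating opponents receives R_{d,l} = B*l/n."""
    l = other_cooperators + (1 if cooperated else 0)  # total cooperators this round
    return B * l / n - (C if cooperated else 0.0)
```

Note that among a group with $l$ cooperators in total, each defector earns exactly $C$ more than each cooperator, which is the first defining inequality above.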

Memory-m strategies
A memory-m strategy takes account of the outcomes of the preceding m rounds of play among all players.
As such, in any given round there are $n \times m$ plays taken into account, and the strategy space therefore has dimension $2^{n\times m}$; that is, a player's strategy consists of $2^{n\times m}$ probabilities for cooperation. First we develop notation to describe the probability that a focal player will cooperate in a focal round, given the plays made by all n players over the preceding m rounds. We denote by $\sigma^i$ the sequence of plays of the $i$th player over the preceding m rounds, with elements $\sigma^i_k$ denoting the play of player $i$, $k$ steps in the past, where $i = 0 \ldots n-1$ and $k = 1 \ldots m$. Thus $\sigma^i_k = c$ if player $i$ cooperated and $\sigma^i_k = d$ if she defected $k$ steps in the past. We then write the probability of cooperation for a particular history of play in its most general form as $p_{\sigma^0,\sigma^1,\sigma^2,\ldots,\sigma^{n-1}} \in [0,1]$.
In order to determine the robustness of such strategies, it will be convenient to introduce the operator $\theta$, which returns $\theta_c = 1$ and $\theta_d = 0$; for simplicity we will often write $\theta^i_k$ in place of $\theta_{\sigma^i_k}$ for the play of the $i$th player $k$ steps back in time. The number of times player $i$ cooperated within memory is thus $\sum_{k=1}^m \theta^i_k$, and the number of players who cooperated in the immediately preceding round is $\sum_{i=0}^{n-1} \theta^i_1$.
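The bookkeeping above can be sketched as follows; the encoding of histories as tuples and the helper names are our own:

```python
def theta(play: str) -> int:
    """The operator theta: 1 if the play was c, 0 if it was d."""
    return 1 if play == 'c' else 0

def times_cooperated(sigma_i) -> int:
    """sum_{k=1}^{m} theta^i_k: how often player i cooperated within memory.
    sigma_i[k-1] is the play of player i, k steps in the past."""
    return sum(theta(p) for p in sigma_i)

def cooperators_last_round(sigma) -> int:
    """sum_{i=0}^{n-1} theta^i_1: cooperators in the immediately preceding round.
    sigma[i] is the memory sequence of player i."""
    return sum(theta(sigma_i[0]) for sigma_i in sigma)
```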

Equilibrium payoffs in Iterated Games
The long-term scores received by n memory-m players in an infinitely iterated game are calculated from the equilibrium rates of the different plays. These can be determined from the stationary distribution of a Markov chain on $2^{n\times m}$ states, which correspond to the possible histories of play across the preceding m rounds.
In order to do this we write the equilibrium rate of a particular history of plays as $v_{\sigma^0,\sigma^1,\sigma^2,\ldots,\sigma^{n-1}}$.
The essential trick we use to analyze equilibrium payoffs in multi-player games among players with long-memory strategies is to reduce the problem to an equivalent problem involving more players, each using only a memory-1 strategy. The advantage of the memory-1 setting is that it allows us to express equilibrium payoffs in the framework of determinants developed by Press & Dyson and others [1,4,5]. In particular, given a game among n players who use memory-m strategies, we construct an equivalent $n\times m$-player game in which players use only memory-1 strategies. Of these $n\times m$ players, n are "real" players, and each uses a memory-1 strategy that corresponds precisely to a memory-m strategy in the original long-memory game. In order to allow the "real" memory-1 players to effectively react to the entire history of plays across m prior rounds, we construct $m-1$ "shadow" players for each real player, who encode the information of earlier rounds. At each round, the shadow player with index $k > 1$ deterministically executes the play of its associated real player, $k$ rounds in the past.
Given an n-player game among memory-m strategies, we encode the equivalent $n\times m$-player game among memory-1 strategies by writing $p^{i,k}$ for the vector of all $2^{n\times m}$ probabilities of cooperation for the $i$th player, $k$ steps in the past. If we order the players so that they are indexed by $j = 1 \ldots n\times m$, then the index of player $(i,k)$ is given by $j = i\times m + k$. In this labelling system, $p^{i,1}$ is the strategy of the $i$th real player, and $v$ is the corresponding stationary vector of equilibrium rates of play. Finally we must encode the "strategy" vectors of the shadow players, which describe how a player updates her memory each round. This is simple to do. We write $p^{i,k}$ for the "strategy" vector which updates the memory of player $i$, $k$ steps in the past (see Figure S1 for an illustration). This vector has entry 1 where $\theta^i_{k-1} = 1$ and entry 0 where $\theta^i_{k-1} = 0$. Thus the real strategy of player $i$ consists of probabilities $p^{i,1} \in [0,1]^{2^{n\times m}}$, whereas a shadow strategy, for which $k > 1$, consists of deterministic quantities $p^{i,k} \in \{0,1\}^{2^{n\times m}}$.
The equilibrium score of player $X_0$ against players $X_1, X_2, \ldots, X_{n-1}$ is calculated according to a particular form of determinant $D$, defined below:

$$S^0 = \frac{D(p^{0,1}, p^{0,2}, \ldots, p^{n-1,m}, R^{0,1})}{D(p^{0,1}, p^{0,2}, \ldots, p^{n-1,m}, I)} \qquad (1)$$

Note that in this expression we have used the notation of the associated game with $n\times m$ memory-1 players, n of which correspond to the players in the original n-player, memory-m game. In this equation $I$ denotes the identity vector of size $2^{n\times m}$, for which all elements are 1, and $R^{0,1}$ denotes the payoff vector of player $X_0$. The payoff to player 0 in a given round depends only on her own play and the plays of the $n-1$ other players in that round. In general, the payoff received by player $i$ in the round that occurred $k$ steps previously is determined from the payoff vector $R^{i,k}$, which has $2^{n\times m}$ elements $R^{i,k}_{\sigma^0,\sigma^1,\sigma^2,\ldots,\sigma^{n-1}}$, each giving the one-round payoff determined by the plays $\sigma^0_k, \ldots, \sigma^{n-1}_k$; for the standard public goods game these payoffs are $R_{c,l-1} = B\frac{l}{n} - C$ and $R_{d,l} = B\frac{l}{n}$. In general, the determinant $D(p^{0,1}, p^{0,2}, \ldots, p^{0,m}, p^{1,1}, p^{1,2}, \ldots, p^{1,m}, \ldots, p^{n-1,1}, p^{n-1,2}, \ldots, p^{n-1,m}, f)$ arises from a generalization of the results of Press & Dyson [1] for two-player games, and of [4,5] for multi-player games, and gives the dot product between the stationary vector $v$ and an arbitrary vector $f$ with elements $f_{\sigma^0,\sigma^1,\sigma^2,\ldots,\sigma^{n-1}}$. In the example of a three-player game with memory-1 strategies between players $X_0$, $X_1$ and $X_2$, with strategies $p$, $q$ and $r$, the determinant can be written out explicitly. Eq. 1 can be used to calculate the scores received by n memory-1 players in a given game. However, there are certain cases in which the Markov chain describing the iterated game has multiple absorbing states, and the denominator of Eq. 1 goes to zero. The scores in these cases can be calculated by assuming that players execute their strategy with some small "error rate" $\epsilon$ [6], so that the probability of cooperation is at most $1-\epsilon$ and at least $\epsilon$. Assuming this, and taking the limit $\epsilon \to 0$, then gives the players' scores in the cases where multiple absorbing states exist.
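Rather than evaluating the determinant $D$, the equilibrium rates can also be checked numerically for small memory-1 games by iterating the Markov chain directly, with the error rate $\epsilon$ making the chain ergodic. This is a sketch under our own naming conventions, not the paper's implementation:

```python
from itertools import product

def stationary(strategies, eps=1e-3, iters=500):
    """Stationary distribution v over last-round states for memory-1 players.
    strategies[i] maps a state (tuple of last-round plays) to player i's
    probability of cooperating; eps clips probabilities into [eps, 1-eps]."""
    n = len(strategies)
    states = list(product('cd', repeat=n))
    v = {s: 1.0 / len(states) for s in states}   # uniform initial distribution
    for _ in range(iters):
        w = {s: 0.0 for s in states}
        for s, mass in v.items():
            # each player's (noisy) cooperation probability given state s
            probs = [min(1 - eps, max(eps, f(s))) for f in strategies]
            for t in states:
                p = mass
                for i, play in enumerate(t):
                    p *= probs[i] if play == 'c' else 1 - probs[i]
                w[t] += p
        v = w
    return v

# Example: two unconditional defectors; the equilibrium mass concentrates
# on the all-defect state (d, d) up to the eps noise.
v = stationary([lambda s: 0.0, lambda s: 0.0])
```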

Evolution in a population of players
We study the evolution of memory-m strategies in a population of N individuals playing an iterated n-player game, with $N \ge n$. In each generation, all subsets of n players in the population engage in the iterated game, and each player in the population receives a total score across all the $\binom{N-1}{n-1}$ games in which she participates. We assume that the population is well-mixed, so that the makeup of strategies in these games depends only upon the frequencies of strategies in the population. We focus on evolution under weak mutation, in which a strategy X is resident in the population; a single mutant strategy Y arises through mutation; and Y is subsequently either lost or goes to fixation in the population, before another mutant arises. We always use X to denote the resident, and Y the mutant, strategy. Under this weak-mutation assumption there are at most two strategy types present in the population at any time.
We use the notation $S^X_a$ to denote the payoff to strategy X in a single iterated game involving $a$ players of type Y and $n-a$ players of type X, and the notation $S^Y_a$ to denote the payoff to strategy Y in the same game. When the population as a whole contains $b$ players of type Y and $N-b$ players of type X, the total score to a player of type X, denoted $T^X(b)$, is given by

$$T^X(b) = \sum_{a=0}^{n-1} \binom{b}{a}\binom{N-1-b}{n-1-a} S^X_a$$

where the sum over $a$ runs over the different numbers of opponents of type Y that X may face in the n-player games she plays in a single generation. The total score to a mutant Y in such a population, denoted $T^Y(b)$, can be calculated in the same way.
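The sum over opponent compositions can be made concrete as follows. The binomial weights are our reconstruction of the counting argument (a focal X player faces exactly $a$ mutants in $\binom{b}{a}\binom{N-1-b}{n-1-a}$ of her games), and the function names are ours:

```python
from math import comb

def total_score_X(b, N, n, S_X):
    """Total score T_X(b) to a resident X when b of N players are mutants.
    S_X[a] = payoff to X in a single game containing a mutants."""
    return sum(comb(b, a) * comb(N - 1 - b, n - 1 - a) * S_X[a]
               for a in range(n))

def total_score_Y(b, N, n, S_Y):
    """Total score T_Y(b) to a mutant Y; every game she plays contains
    herself plus a-1 other mutants. S_Y[a] = payoff to Y with a mutants."""
    return sum(comb(b - 1, a - 1) * comb(N - b, n - a) * S_Y[a]
               for a in range(1, n + 1))
```

As a sanity check, with all single-game payoffs equal to 1 the total score is simply the number of games played, $\binom{N-1}{n-1}$, by the Vandermonde identity.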
We model evolution according to the copying process [7], in which pairs of players are drawn at random from the population, and the first player switches her strategy to that of the second with a probability that increases with the difference between their total scores, at a rate set by a parameter s denoting the strength of selection.
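The text leaves the exact switching probability implicit; a standard concrete choice for the copying process is the Fermi rule, sketched here as an assumption rather than the paper's precise form:

```python
from math import exp

def switch_probability(T_X: float, T_Y: float, s: float) -> float:
    """Fermi imitation rule: probability that an X player copies Y.
    Increases with the score difference T_Y - T_X; s is the selection strength."""
    return 1.0 / (1.0 + exp(-s * (T_Y - T_X)))
```

In the limit $s \to 0$ copying is neutral (probability 1/2 regardless of scores), while for large $s$ the higher-scoring strategy is copied almost deterministically.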
The "strong-selection" regime of this process occurs when N s 1. Under this regime selection is sufficiently strong that an invading mutant is extremely unlikely to reach high frequency in the populaiton, unless it has a selecive advantage (or is neutral) against the resident strategy in the population. Thus under strong selection, resident strategies that can resist invasion by all other mutants are evolutionary robust. Alternatively, the "weak-selection" limit arises when N s 1 in which case even deleterious strategies may reach high frequency through genetic drift. We focus on the regime of strong selection in our analysis below.

Evolutionary robustness
The concept of evolutionary robustness [8] is similar to the notion of evolutionary stability [9,10], but more useful for studying evolution in large strategy spaces, in which an ESS strategy typically does not exist [3,8]. In general, a strategy is defined to be evolutionary robust if, when resident in a population, there is no mutant that is favored to spread by natural selection when rare [8].
More precisely, under strong selection a resident strategy X is evolutionary robust iff $T^Y(1) \le T^X(1)$ for all strategies Y. This condition for evolutionary robustness under strong selection is identical to that of a Nash equilibrium in the limit $N \to \infty$.

Coordinate Transform
In two-player games, the work of Press & Dyson [1] and Akin [2] allows us to identify a coordinate transform for the full space of memory-1 strategies. This coordinate transformation permits a simple closed-form expression relating the scores of two players in a game, which has enabled us to identify all evolutionary robust memory-1 strategies, under both strong and weak selection [3,8]. Here we extend this line of analysis to multi-player games with memory-m strategies. We begin by identifying an analogous coordinate transform for the $2^{n\times m}$-dimensional space of memory-m strategies in an n-player game.
To define the desired coordinate transform we must identify $2^{n\times m}$ vectors that form a basis of $\mathbb{R}^{2^{n\times m}}$ and that allow us to write down a simple, closed-form relationship between the players' scores in a given game.
These vectors consist firstly of the $n\times m$ vectors $R^{i,k}$ for each player's payoff in the $k$th preceding round, along with the identity vector $I$, with entry 1 in all positions, giving $n\times m + 1$ vectors in total. The second set of vectors in the coordinate transform consists of the $n\times m - 1$ vectors denoted $L^l$, where $L^l$ has entry 1 when $l$ cooperation events occurred across the previous m rounds and entry 0 otherwise, regardless of which players cooperated or when. Note that this excludes the case where all plays within memory were cooperation and the case where all were defection, which are accounted for by the identity vector.
The final $2^{n\times m} - 2n\times m$ vectors required for the coordinate transform account for the degeneracy that arises due to the number of ways in which $l$ cooperation events can occur in the preceding m rounds. Given $l$, if the focal player cooperates $l_p$ times over m rounds, and her opponents therefore cooperate $l_o = l - l_p$ times, she will receive the same total payoff over those m rounds in a standard public goods game, regardless of which players cooperated or when. In the most general case a player may nonetheless distinguish between the play of each player (including herself) in each of the preceding m rounds.
We already have $2n\times m$ vectors, as described above. The simplest way to account for the remaining dimensions (required for players to distinguish between all possible outcomes) is simply to add vectors $G^{l_o,l_p}_{\sigma^0,\sigma^1,\sigma^2,\ldots,\sigma^{n-1}}$, which have entry 1 for a single set of plays $(\sigma^0, \sigma^1, \sigma^2, \ldots, \sigma^{n-1})$, for which $\sum_{i=1}^{n-1}\sum_{k=1}^m \theta^i_k = l_o$ is the total number of times her opponents have cooperated in the last m rounds and $\sum_{k=1}^m \theta^0_k = l_p$ is the number of times the focal player has cooperated in the last m rounds. Note that we have written these vectors in terms of $l_p$ and $l_o$ in order to aid later analysis.
We adopt the convention that we do not add a vector $G^{l_o,l_p}_{\sigma^0,\sigma^1,\sigma^2,\ldots,\sigma^{n-1}}$ for the set of opponent plays ordered cccc...dddd across the ordered history of all players, together with the set of focal-player plays ordered either cccc...dddd or dccc...dddd. In summary, the $2^{n\times m}$ vectors for the coordinate transform consist of:
• the $n\times m$ vectors $R^{i,k}$ for the players' payoffs, and the vector $I$;
• the $n\times m - 1$ vectors $L^l$, with entry 1 when $l$ cooperation events occurred within memory;
• the $2^{n\times m} - 2n\times m$ vectors $G$, each with a single entry 1, to account for the degeneracy which arises when different combinations of players cooperate.
Although this is a somewhat complex transformation, we shall see the utility of working this way in what follows.
For clarity's sake we can write down this coordinate system explicitly. For the case of n = 3 players with memory-1, the new coordinate system forms a basis of $\mathbb{R}^8$; similarly, for the case of n = 2 players with memory-2, it forms a basis of $\mathbb{R}^{16}$.
Using the results of Press & Dyson, generalised to multi-player games [1,4,5], the strategy of the focal player in this new coordinate system, to whom for convenience we assign index $i = 0$, is given by a vector whose coefficients involve the quantities $S^i$, $v_{l_o+l_p}$ and $v^{l_o,l_p}_\sigma$, where $S^i$ denotes the equilibrium score of player $i$ in the current game, $v_{l_o+l_p}$ denotes the rate at which $l_o + l_p$ players cooperate, and $v^{l_o,l_p}_\sigma$ is the rate at which the focal player cooperates $l_p$ times, along with $l_o$ of her opponents, with the sequence of plays following the ordering $\sigma = (\sigma^0, \sigma^1, \sigma^2, \ldots, \sigma^{n-1})$. Note that the equilibrium score of player $i$ is independent of $k$ in Eq. 7.
We now additionally define the parameters $\chi^0$. In this new parameterization we can re-write the relationship among the players' scores as Eq. 8, which gives the most general form for the relationship between players' scores in an n-player game with memory m. Henceforth we will restrict our analysis to a focal strategy in which a memory-m player does not distinguish between her opponents, and does not pay attention to the order of cooperation events. As such we consider a focal player who keeps track of two quantities: (i) the total number of times her opponents cooperated in the last m rounds, and (ii) the total number of times she cooperated in the last m rounds.

Strategies that track cooperation frequency
If a focal player tracks only the number of times she cooperated in the last m rounds, and the total number of times her opponents cooperated in the last m rounds, then her memory-m strategy consists of $((n-1)m+1)\times(m+1)$ probabilities of cooperation, since her $n-1$ opponents can cooperate anywhere between 0 and $(n-1)m$ times in m rounds, and she can cooperate anywhere between 0 and m times. We will henceforth explicitly adopt a standard public goods payoff structure, with $R_{c,l} = B\frac{l}{n} - C$ and $R_{d,l} = B\frac{l}{n}$. Eq. 6 encodes a strategy with $2^{n\times m}$ probabilities of cooperation, many of which are redundant in our reduced strategy space. Let the focal player cooperate $l_p$ times and her opponents cooperate $l_o$ times in m rounds.
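The dimension counting above can be sanity-checked with a short sketch (helper names are ours):

```python
def reduced_dimension(n: int, m: int) -> int:
    """Number of probabilities for a frequency-tracking memory-m strategy:
    the player conditions only on (l_o, l_p), with l_o in 0..(n-1)m and
    l_p in 0..m."""
    return ((n - 1) * m + 1) * (m + 1)

def full_dimension(n: int, m: int) -> int:
    """Number of probabilities in the full memory-m strategy space."""
    return 2 ** (n * m)

# For n = 3, m = 2 the reduction shrinks the space from 2^6 = 64
# probabilities to (2*2 + 1) * 3 = 15.
```

The reduced space grows only polynomially in n and m, while the full space grows exponentially, which is what makes the frequency-tracking restriction analytically tractable.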
Starting from Eq. 8 we are now able to make two observations, which together yield the expression for the probability of cooperation given that the focal player cooperated $l_p$ times and her opponents $l_o$ times in the last m rounds, assuming she only tracks cooperation frequency.

Boundary conditions
If we recall our convention that the equation lacking a $\gamma$ term is the one ordered as cccc...dddd, we then have the following boundary conditions. Similarly, the term with dccc...dddd lacks a $\gamma$ term. First, combining the two expressions for $p_{0,l_p}$ gives $\phi C \chi^0_{l_p+1} = 1 + \phi C \chi^0_1$, and since this must hold for all $l_p$ we have that $\chi^0_{l_p+1} = \chi^0_m$ is constant. Substituting these into the general expression for $p_{0,l_p}$, we therefore have $((n-1)\times m+1)(m+1) - (n\times m+2)$ parameters $\gamma_{l_o,l_p}$, plus $n\times m - 1$ parameters $\lambda_{l_o+l_p}$, plus 3 parameters $\chi$, $\phi$ and $\kappa$, to give a total of $((n-1)\times m+1)(m+1)$ parameters, as required. We can use these boundary conditions for $\gamma$ to construct the inverse coordinate transform. We arrive at three simultaneous equations which can be solved for $\kappa$, $\chi$ and $\phi$, with the remaining terms $\Lambda_{l_o,l_p}$ being determined by these three parameters plus $p_{l_o,l_p}$. Finally, we set $\Lambda_{l_o,l_p} = \lambda_{l_o+l_p} + \gamma_{l_o,l_p}$ to define a coordinate system characterized by a vector of $((n-1)\times m+1)(m+1)$ numbers, $(\kappa, \chi, \phi, \Lambda_{0,0}, \ldots, \Lambda_{(n-1)\times m,m})$, where we have the conditions $\Lambda_{0,0} = \Lambda_{(n-1)\times m,m} = 0$ and a third linear condition as described above.

Strategies and payoffs in a public goods game
We can now write the relationship between the players' scores when players do not pay attention to the identity of their opponents. In the case when one player uses strategy Y and the rest use strategy X we then have a corresponding relationship between their scores, and the strategy of a focal player in a public goods game can be written accordingly. Since a viable strategy must have $0 \le p_{l_o,l_p} \le 1$, we see by looking at $p_{0,0}$ and $p_{(n-1)\times m,m}$ that $0 \le \kappa \le B - C$, with additional constraints on the other parameters. This in turn implies that a cooperator, for which $p_{(n-1)\times m,m} = 1$, necessitates $\kappa = B - C$, and a defector, for which $p_{0,0} = 0$, necessitates $\kappa = 0$.

Equilibrium rates of play
We now derive some inequalities that, in combination with Eq. 11, will allow us to identify the strategies that are evolutionary robust in n-player games. In general, we can write the score of a focal player with resident strategy X, and similarly the score of an opponent with a strategy Y, in terms of the equilibrium rates $w_{l_o,l_p}$, where we assume $w_{l_o+l_p-l_p',\,l_p'} = 0$ for the unphysical case $l_p' > l_o$. This allows us to write the score of X in terms of $w_{l_o,l_p}$. We will now use these results to explore two special cases of interest: (i) the effect of increasing the size of the game n with fixed memory, and (ii) the effect of increasing memory size m with fixed game size.

Bounds on players' scores
We can now use Eq. 14 to find upper and lower bounds on the difference and the sum of players' scores, in the case that the game contains a single player using a strategy Y and n − 1 players using a strategy X.
We can now write the difference between the scores of X and Y in terms of $w_{l_o,l_p}$, which enables us to identify upper and lower bounds on the difference between two players' scores: one becomes an equality when Y always defects at equilibrium, and the other becomes an equality when Y never defects at equilibrium. We can similarly write the sum of the players' scores. This gives an upper bound on the sum which becomes an equality when $w_{0,0} = 0$ at equilibrium (i.e. it is never the case that all players defect).
Finally, we have a lower bound on the sum which becomes an equality when $w_{(n-1)\times m,m} = 0$ (i.e. it is never the case that all players cooperate at equilibrium).
It is also convenient to rewrite Eq. 12 for the relationship between two players' scores in terms of $w$. We can now use Eqs. 16-20 to identify the strategies that are evolutionary robust, under strong selection, in multi-player games.

Robust strategies
We focus here on the prospects for cooperation in iterated games. In particular, we identify strategies which, when used by all players in a game, ensure that all players cooperate. This is achieved quite simply by setting $p_{(n-1)\times m,m} = 1$, so that if all players cooperated throughout memory, all players assuredly cooperate in the following round. We call these strategies the cooperators, and we calculate the robustness of these strategies by determining the proportion of cooperators that can resist invasion by all other strategies. We contrast these with the defectors: strategies with $p_{0,0} = 0$, such that if all players defected throughout memory, all players assuredly defect in the following round. The importance of these two strategy classes in two-player public goods games has been established already [3], making it natural to generalise their study to games with multiple players and long memory.
To determine whether a strategy is robust we use the condition given previously for an evolving population of N players in a multi-player game. Given a resident strategy X in a population, selection acts against a new mutant Y provided $T^X(1) > T^Y(1)$, as described above. This can be written explicitly in terms of players' scores, where $S^X_0$ is the score received by X in a game with no mutants, $S^X_1$ is the score received by X in a game with one mutant player Y, and $S^Y_1$ is the score received by Y in a game with no other mutants.
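This robustness condition can be checked numerically. The game counts below ($\binom{N-2}{n-1}$ resident-only games and $\binom{N-2}{n-2}$ games containing the mutant, out of $\binom{N-1}{n-1}$ total per player) follow from the well-mixed assumption with a single mutant; the function name is ours:

```python
from math import comb

def resists_invasion(N, n, S_X0, S_X1, S_Y1):
    """True if selection does not favor a lone mutant Y (T_X(1) >= T_Y(1)).
    S_X0: X's score in a game with no mutant; S_X1: X's score in a game
    containing the mutant; S_Y1: the mutant's score in any of her games."""
    T_X = comb(N - 2, n - 1) * S_X0 + comb(N - 2, n - 2) * S_X1
    T_Y = comb(N - 1, n - 1) * S_Y1
    return T_X >= T_Y
```

Note that the two binomial counts for X sum to $\binom{N-1}{n-1}$, so every player plays the same number of games.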

Robust cooperating strategies under strong selection
We first identify the cooperating strategies that are robust under strong selection. As defined above, a cooperating strategy X is one such that, if all players use the strategy, all players cooperate in every round at equilibrium. Such strategies must have $\kappa = B - C$.
A mutant strategy Y can selectively invade a cooperating strategy under strong selection iff $T^Y(1) > T^X(1)$. The long-term payoffs must additionally satisfy Eqs. 17-21. We can therefore identify strategies X which cannot be selectively invaded by any mutant Y. Case I: Robustness when $\chi \le \frac{N(n-2)+1}{(N-1)(n-1)}\phi$. Using Eq. 20 we can write the condition for invasion by a mutant strategy Y; combining Eq. 16 with Eq. 19 then gives a necessary condition for invasion.
Thus, in summary, the set of robust cooperating strategies in an n-player game with memory m under strong selection, which we denote $C^{n,m}_s$, is given by:

Robust defecting strategies under strong selection
We now identify the defecting strategies that are robust under strong selection. As defined above, a defecting strategy X is one such that, if all players adopt the strategy, all players defect in every round at equilibrium. Such strategies must have $\kappa = 0$.
A mutant strategy Y can selectively invade a defecting strategy under strong selection iff $T^Y(1) > T^X(1)$. Using Eq. 20 this can be re-written in terms of the equilibrium rates. Following the same procedure as for the cooperators above, we find that the set of robust defecting strategies in an n-player game with memory m under strong selection, which we denote $D^{n,m}_s$, consists of strategies $(\chi, \phi, \kappa, \Lambda_{0,1}, \ldots, \Lambda_{(n-1)\times m,m-1})$ with $\kappa = 0$. Notice that in the special case $n = N$, in which all members of a population play the same public goods game together, the conditions for robust cooperation are independent of $\Lambda_{l_o,l_p}$ and can never be satisfied.

Calculating robust volumes
We have now derived necessary and sufficient conditions for cooperators and defectors to be robust in n-player public goods games. However, in contrast to the case of two-player games, these conditions depend explicitly on the equilibrium play $w_{l_o,l_p}$ of an invading mutant. We can nonetheless easily construct strategies that are assuredly robust, by using the fact that $w_{l_o,l_p} \le 1$ for all possible mutants. Similarly we can construct strategies that are assuredly invadable. However, this leaves a large subset of strategies whose robustness depends on the actual values of $w_{l_o,l_p}$. Nonetheless, we can still determine their robustness by using the fact that the bounds on players' scores render the conditions Eqs. 26 and 27 most stringent at the extremes of the mutant's equilibrium play, for example when the mutant never cooperates at equilibrium (Eq. 16). See Figure S1 for the effect of population size N on the volume of robust strategies. As discussed in the main text, larger populations lead to larger volumes of robust cooperators and smaller volumes of robust defectors.

The impact of memory on robustness
As discussed in the main text, the impact of memory on robustness arises because memory increases the capacity for contingent punishment, as expressed through the parameters $\Lambda_{l_o,l_p}$. The way in which this occurs is most clearly understood by looking at the expectation of $\Lambda_{l_o,l_p}$ for a randomly drawn strategy. For a randomly drawn cooperating or defecting strategy, the expectations of $\chi$ and $\phi$ are related, which determines the expectation of $\Lambda_{l_o,l_p}$ for a randomly drawn cooperator and, similarly, for a randomly drawn defector, along with the average of $\Lambda$ across all $l_o$, $l_p$. If we now use Eq. 22 to determine the average $\Lambda_{l_o,l_p}$ for cooperators and defectors faced with a mutant who cooperated $l_p$ times within their memory, we recover Fig. S2. We see that, as memory increases, a randomly drawn cooperator tends to become more successful at punishing a given mutant, while a randomly drawn defector tends to become less successful.

Invasability and cost of memory
Our evolutionary simulations (Figure 4) show that, in addition to increasing the overall robustness of cooperation, memory capacity m tends to increase in small games. To understand this we must look at the average fixation probability of mutations that increase memory by 1, versus those that decrease memory by 1. This is shown in Figure S3c. We see that mutations that increase memory capacity are more likely to fix than mutations that decrease memory capacity, regardless of the current resident memory capacity. Thus longer memories will tend to evolve on average. If we introduce a cost for memory, so that a player's overall payoff is reduced by $c_m \times m$, we see (Figure S3d) that mutations that increase memory eventually become worse invaders than mutations that decrease memory. In such cases an intermediate memory length evolves. Thus the evolution of memory depends on the costs associated with longer memories, as well as the size of the game being played. As we see in Figure S3a, shorter memories evolve, and much more slowly, when memory comes at a cost. Correspondingly (Figure S3b), evolving longer memories then has a much weaker effect on the evolution of cooperation, although the general trend of reduced defection and increased cooperation is maintained.

Robustness and the dimension of strategy space
As discussed in the main text and shown in Figure 4, as memory increases the overall frequency of robust cooperators and defectors that evolve tends to decline. This reflects the fact that the absolute volume of robust strategies tends to decline as the dimension of strategy space increases: the probability of randomly drawing a strategy from the n-dimensional unit cube that also lies within a robust volume with sides of fixed length declines as a power of 1/n. This decline in the robust volumes of strategies with the dimension of strategy space (both game size n and memory length m) is shown in Figure S4.

Figure S1: The impact of population size on cooperation. We calculated the relative volumes of robust cooperation (that is, the absolute volume of robust cooperative strategies divided by the total volume of robust cooperators and defectors) and compared this to the relative volume of defectors (solid lines) using Eqs. 2-3. We also verified these analytic results by randomly drawing 10^6 strategies and determining their success at resisting invasion from 10^5 random mutants (points). We calculated players' payoffs by simulating 2 × 10^3 rounds of a public-goods game. We then plotted the relative volumes of robust cooperators and robust defectors as a function of population size N with fixed game size n = 2 and memory length m = 1 (left) and m = 10 (right). In both cases the effect of increasing population size is to increase the relative volume of cooperators and decrease that of defectors. In all calculations and simulations we used cost C = 1 and benefit B as indicated in the figure.

Figure S2: Average punishment. $\Lambda_{l_o,l_p}$ for the average punishment of a mutant who defected $l_p$ times within the memory of the resident strategy, for both cooperators (blue) and defectors (red). As memory becomes longer, the average punishment increases for cooperators, making strategies more likely to be robust, and decreases for defectors, making strategies less likely to be robust.
Figure S3: Invasibility of memory. We simulated co-evolution of memory and strategies as described in Figure 4 of the main text, with an additional cost to memory which reduces a player's payoff by $c_m \times m$. We see that (a) much shorter memories evolve for $c_m = 0.1$ compared to $c_m = 0$, and (b) a correspondingly smaller amount of cooperation evolves. In order to understand why longer-memory strategies evolve in small games, we looked at the average fixation probability of mutations that increase or decrease memory, when played against a randomly drawn resident strategy. We drew 10^6 resident strategies for each memory length m ∈ {1, 2, 3, ..., 10}, and for each drew 10^5 mutants that increase memory length by 1 and 10^5 mutants that decrease memory length by 1. We assumed that a mutation that increased memory length by 1 did not change the probabilities $p_{l_o,l_p}$ of the player's strategy; where mutations increased memory length, we randomly drew the new probabilities $p_{(n-1)(m+1),l_p}$ and $p_{l_o,m+1}$. (c) Plotted are the average fixation probabilities for mutations that increase (black dots) or decrease (gray dots) memory by 1. Each point shows the probability of the in versus out transition for the state k (i.e. mutations that increase memory from k to k+1 and mutations that decrease memory from k+1 to k). When there is no cost to memory, mutations that increase memory length are always better invaders, for games of size n = 2. (d) When there is a cost to memory, mutations that decrease memory length do relatively better, and mutations that increase memory length do relatively worse. As a result we expect to see intermediate memory lengths evolve in the presence of costs.

Figure S4: Absolute volumes of robust strategies. Here we show the same plot as in Figure 3 of the main text, using absolute rather than relative volumes.
As is clear, the absolute volumes of both cooperators and defectors tend to decline as the dimension of strategy space increases. However, this occurs at different rates for the different strategy types, depending on whether game size (left) or memory (right) is increasing.