Introduction

Cooperation can be conceptualized as an individually costly behavior that creates a benefit to others1. Such cooperative behaviors have evolved in many species, from uni-cellular organisms to mammals2. Yet they are arguably most abundant and complex in humans, where they form the very basis of families, institutions, and society3,4. Humans often support cooperation through direct reciprocity5. Here, people preferentially help those who have been helpful in the past6. Such forms of direct reciprocity naturally emerge when groups are stable, and when cooperation yields substantial returns7. In that case, individuals readily learn to engage in conditional cooperation, using strategies like Tit-for-tat8,9,10,11 (TFT), Win-Stay Lose-Shift12,13 (WSLS), or multiplayer variants thereof14,15,16. When everyone adopts these strategies, groups can sustain cooperation despite any short-run incentives to free ride17,18.

To describe direct reciprocity formally, traditional models of cooperation consider individuals who face the same strategic interaction (game) over and over again. The most prominent model of this kind is the iterated prisoner’s dilemma8. In this game, two individuals (players) repeatedly decide whether to cooperate or defect. While the players’ decisions may change from one round to the next, the feasible payoffs remain constant. Models based on iterated games have become fundamental for our understanding of reciprocity. However, they presume that interactions take place in a constant social and natural environment. Individual actions in one round have no effect on the exact game being played in future rounds. In contrast, in many applications, the environment is adaptive, such as when populations aim to control an epidemic19,20,21, manage natural resources22,23,24, or mitigate climate change25,26,27. Changing environments in turn often bring about a change in the exact game being played. Such applications are therefore best described with models in which there is a feedback between behavior and environment. In the context of direct reciprocity, such feedbacks can be incorporated within the framework of stochastic games28,29,30.

In stochastic games, individuals interact over multiple time periods. Each period, the players’ environment is in one of several possible states. This state can change from one period to the next, depending on the current state, the players’ actions, and on chance. Changes of the state affect the players’ available strategies and their feasible payoffs. In this way, stochastic games are better able to describe social dilemmas in which individual actions affect the nature of a group’s future interactions. Yet previous evolutionary models of stochastic games presume that individuals are perfectly aware of the current state31,32,33. This allows individuals to coordinate on appropriate responses once the state has changed. In contrast, in many applications, any knowledge about the state of the environment is at best incomplete. Such uncertainties can in turn have dramatic effects on human behavior34,35,36,37. Understanding the impact of information on decision-making has been a rich field of study in economics. Corresponding studies suggest that the effect of information is often positive, even though there are situations in which it has adverse effects38,39,40. Additionally, studies of partially observable stochastic games suggest that settings with incomplete information can benefit decision-makers41,42.

In the following, we explore how state uncertainty in stochastic games shapes the evolution of cooperation. To this end, we compare two scenarios. First, we consider the case when individuals are able to learn the state of their environment and condition their decisions on the current state. We will refer to this case as the ‘full-information setting’. In the second case, individuals may be aware that they are engaged in a stochastic game, but they either ignore or are unable to obtain information about the current state. As a result, their decisions are independent of their environment. We refer to this case as the ‘no-information setting’. To compare these two settings we focus on the simplest possible case, where two players may experience two possible states. Already this elementary setup gives rise to an extremely rich family of models with many different possible dynamics, and already here we observe that conditioning strategies on state information can have drastic effects on how people cooperate.

To quantify the importance of state information, we introduce a measure to which we refer as the ‘value of information’. This value reflects how much the cooperation rate in a population changes when the population gains access to information about the present state. When this value is positive, access to information makes the population more cooperative. In that case, we speak of a ‘benefit of information’. In general, it is also possible to observe negative values, in which case we speak of a ‘benefit of ignorance’. With analytical methods for the important limit of weak selection43,44,45, and with numerical computations for arbitrary selection strengths, we compare the value of information across many stochastic games. We identify settings where receiving information is better, neutral, or worse for the evolution of cooperation. Most often, information is highly beneficial. However, there are also a few notable exceptions in which populations can achieve more cooperation when they are ignorant of their state. In the following, we describe and characterize these cases in detail.

Results

Stochastic games with and without state information

To explore the dynamics of cooperation in variable environments, we consider stochastic games31,32,33. We introduce our framework for the simplest setup, in which the game takes place among two players who interact for infinitely many rounds, without discounting of their future payoffs. In each round, players find themselves in one of two possible states, S = {s1, s2}. Depending on the state, players engage in one of two possible prisoner’s dilemma games. In either game, they can either cooperate (C) or defect (D). Cooperation means paying a cost c so that the other player receives a benefit bi. The cost of cooperation is fixed, but the benefit bi depends on the present state si (Fig. 1a). Without loss of generality, we assume that the first state is more profitable, such that b1 ≥ b2 > c. However, states can change from one round to the next, depending on the game’s transition vector

$$\mathbf{q}=\left(q_{CC}^{1},\,q_{CD}^{1},\,q_{DD}^{1};\;q_{CC}^{2},\,q_{CD}^{2},\,q_{DD}^{2}\right).$$
(1)

Here, each entry \(q_{a\tilde{a}}^{i}\in [0,1]\) is the probability that players find themselves in the more profitable state s1 in the next round. This probability depends on the previous state si and on the players’ previous actions a and \(\tilde{a}\). For example, the transition vector q = (1, 0, 0; 1, 0, 0) corresponds to a game in which players are only in the more profitable state if they both cooperated in the previous round. Note that we assume the transition vector to be symmetric. That is, transition probabilities depend on the number of cooperators, but they are independent of who cooperated (\(q_{CD}^{i}=q_{DC}^{i}\) for all i). We say a transition vector is deterministic if each entry \(q_{a\tilde{a}}^{i}\) is either zero or one (Fig. 1b). Even for deterministic vectors we speak of a ‘stochastic game’, because games with deterministic transitions represent a special case of our framework. Based on Eq. (1), there are 2⁶ = 64 deterministic transition vectors in total. We call a transition vector single-stochastic if there is exactly one entry that is strictly between zero and one. Games with single-stochastic transitions can serve as the most elementary example of an interaction for which the environment depends on chance events.
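
For concreteness, a transition vector can be encoded as a flat 6-tuple. The short sketch below is our own illustrative code (not part of the original study); it enumerates the 2⁶ deterministic vectors and checks the two properties defined above.

```python
from itertools import product

# A transition vector q = (q1_CC, q1_CD, q1_DD, q2_CC, q2_CD, q2_DD), where each
# entry is the probability of being in the profitable state s1 in the next round.
def is_deterministic(q):
    return all(entry in (0.0, 1.0) for entry in q)

def is_single_stochastic(q):
    return sum(1 for entry in q if 0.0 < entry < 1.0) == 1

# All 2^6 = 64 deterministic transition vectors.
deterministic_vectors = list(product((0.0, 1.0), repeat=6))
assert len(deterministic_vectors) == 64

# Example from the text: players reach the profitable state only after mutual cooperation.
q_example = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
print(is_deterministic(q_example), is_single_stochastic(q_example))  # True False
```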

Fig. 1: Stochastic games with full and no information.
figure 1

a We study 2-state stochastic games where transitions between the states depend on the players' actions. In each state, players engage in a prisoner's dilemma with benefit b1 (or b2) and cost c. The two benefit parameters b1 and b2 might reflect the group’s environmental conditions. Without loss of generality, we assume b1 ≥ b2. b Transitions between the states can be either completely determined by the players' current actions (deterministic game transitions), or they may additionally depend on chance events (stochastic game transitions). In the two cases depicted here, environmental conditions worsen when players defect, reducing the players' possible benefits. Once players resume mutual cooperation, they may return to the more profitable first state. We note that in the bottom case, only a single transition depends on chance; in that case, we speak of a single-stochastic transition vector. c In this work, we compare two possible scenarios, depending on whether or not players are able to condition their behavior on the current state (`full information' versus `no information'). d With full information, individuals can react differently to their opponent, depending on the current state. As a result, they can choose among 2⁸ = 256 deterministic memory-one strategies. Without information, players need to act in the same way in each of the two states. Hence there are only 2⁴ = 16 deterministic memory-one strategies. The acronyms ALLC, ALLD, TFT, WSLS refer to unconditional cooperation, unconditional defection, tit-for-tat, and win-stay lose-shift, respectively.

To explore how often players cooperate depending on the information they have, we compare two settings (Fig. 1c). In the full-information setting, players learn the present state before making decisions. Thus, their strategies may depend on both the present state and on the players’ actions in the previous rounds. Herein, we assume that players make decisions based on memory-1 strategies. Such strategies only take into account the outcome of the last round46 (extensions to more complex strategies47,48,49,50,51,52 are possible, but for simplicity we do not explore them here). In the full information setting, memory-1 strategies take the form of an 8-tuple,

$$\mathbf{p}_{F}=\left(p_{CC}^{1},\,p_{CD}^{1},\,p_{DC}^{1},\,p_{DD}^{1};\;p_{CC}^{2},\,p_{CD}^{2},\,p_{DC}^{2},\,p_{DD}^{2}\right).$$
(2)

Here, \(p_{a\tilde{a}}^{i}\) is the player’s probability of cooperating in state si, given the focal player’s and the co-player’s previous actions a and \(\tilde{a}\), respectively. We compare this full-information setting with a no-information setting, in which individuals are unable to condition their behavior on the current state. In that case, strategies are 4-tuples

$$\mathbf{p}_{N}=\left(p_{CC},\,p_{CD},\,p_{DC},\,p_{DD}\right).$$
(3)

We note that the set of no-information strategies is a strict subset of the full-information strategies (they correspond to those pF for which \(p_{a\tilde{a}}^{1}=p_{a\tilde{a}}^{2}\) for all actions a and \(\tilde{a}\)). For simplicity, we assume in the following that the players’ strategies are deterministic, such that each entry is either zero or one. For full information, there are 2⁸ = 256 deterministic strategies. For no information, there are 2⁴ = 16 deterministic strategies. Some results for stochastic strategies are shown in Fig. S1a, b.
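
The two strategy spaces are easy to enumerate explicitly. The sketch below (illustrative code with our own naming, not taken from the paper) lists the deterministic strategies of both settings and embeds a no-information strategy into the full-information space.

```python
from itertools import product

# Full information: 8 entries (p1_CC, p1_CD, p1_DC, p1_DD, p2_CC, p2_CD, p2_DC, p2_DD).
full_info = list(product((0, 1), repeat=8))    # 2^8 = 256 deterministic strategies
# No information: 4 entries (p_CC, p_CD, p_DC, p_DD).
no_info = list(product((0, 1), repeat=4))      # 2^4 = 16 deterministic strategies
assert len(full_info) == 256 and len(no_info) == 16

def embed(p_no_info):
    """A no-information strategy, viewed as a full-information strategy that ignores the state."""
    return tuple(p_no_info) * 2

wsls = (1, 0, 0, 1)        # win-stay lose-shift; by construction state-independent
print(embed(wsls))         # (1, 0, 0, 1, 1, 0, 0, 1)
```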

The players’ strategies may be subject to errors with some small probability ε. This model parameter reflects the assumption that people may occasionally make mistakes when engaging in reciprocity53,54. In that case, an intended cooperation may be misimplemented as a defection (and vice versa). Games with errors have the useful technical property that the long-run dynamics is independent of the players’ initial moves46. For ε > 0, a player with strategy p effectively implements the strategy (1 − ε)p + ε(1 − p). In particular, even when the original strategy p is deterministic, the effective strategy is stochastic. Given the error probability, the players’ strategies, and the game’s transition vector, we can compute how often players cooperate on average and which payoffs they get (see Methods).

Because we are interested in how cooperation evolves, we do not consider players with fixed strategies. Rather, players can change their strategies over time, depending on the payoffs these strategies yield. To describe this evolutionary dynamics, we use a pairwise comparison process55. This process considers populations of fixed size N. Players receive payoffs by interacting with all other population members. At regular time intervals, one player is randomly chosen and given the opportunity to revise its strategy. The player may do so in two ways. With probability μ, the player switches to a random deterministic memory-1 strategy (similar to a mutation in biological models of evolution). Otherwise, with probability 1 − μ, the focal player compares its own payoff π to the payoff \(\tilde{\pi }\) of a random role model. The player switches to the role model’s strategy with probability \({(1+\exp [-\beta (\tilde{\pi }-\pi )])}^{-1}\). The parameter β > 0 is the strength of selection. The higher this parameter, the more individuals are prone to imitate only those role models with a high payoff. Overall, these assumptions define a stochastic process on the space of all possible population compositions. For finite β, evolutionary trajectories do not converge to any particular outcome because no population composition is absorbing. However, because the process is ergodic, time averages converge and are described by the process’s invariant distribution. This invariant distribution describes how often the population has a given composition in the long run (see Methods).
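
As a minimal sketch of one revision event in this process (our own function names; the random-strategy generator is a placeholder to be supplied by the caller), the Fermi-type imitation rule reads:

```python
import math
import random

def imitation_probability(pi_focal, pi_role_model, beta):
    """Probability that the focal player adopts the role model's strategy."""
    return 1.0 / (1.0 + math.exp(-beta * (pi_role_model - pi_focal)))

def revision_event(focal, role_model, pi_focal, pi_role_model, beta, mu, draw_random_strategy):
    """One update of the pairwise comparison process with mutation probability mu."""
    if random.random() < mu:
        return draw_random_strategy()          # mutation: random memory-1 strategy
    if random.random() < imitation_probability(pi_focal, pi_role_model, beta):
        return role_model                      # imitation of the role model
    return focal                               # keep the current strategy
```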

We study this evolutionary process analytically when mutations are rare and selection is weak (that is, when μ, β → 0). In addition, we numerically explore the process for arbitrary selection strengths. In either case, we compute which payoffs players receive on average and how likely they are to cooperate over time. By comparing the cooperation rates \(\hat{\gamma}^{F}\) and \(\hat{\gamma}^{N}\) for populations with full and no information, respectively, we quantify how favorable information is for the evolution of cooperation. We refer to the difference \(V_{\beta}(\mathbf{q}):=\hat{\gamma}^{F}-\hat{\gamma}^{N}\) as the value of (state) information. In general, this value depends on the game’s transition vector q, as well as on the strength of selection β. When this value is positive, populations achieve more cooperation when they learn the present state of the stochastic game.

In the following, we describe the results of this baseline model in detail. In the SI, we provide further results on the impact of different game parameters (Fig. S1), other strategy spaces (Fig. S2), and alternative learning rules (Fig. S3).

The effect of state information in two examples

To begin with, we illustrate the effect of state information by exploring the dynamics of two examples. Both examples are variants of models that have been previously used to highlight the importance of stochastic games for the evolution of cooperation31. In the first example (Fig. 2a), players only remain in the more profitable first state if they both cooperate. If either of them defects, they transition to the inferior second state. Once there, they transition back to the more profitable state after one round, irrespective of the players’ actions. The second state may thus be interpreted as a ‘time-out’31. For numerical results, we assume that cooperation yields an intermediate benefit in the more profitable state and a low benefit in the inferior state (b1 = 1.8, b2 = 1.3).

Fig. 2: A comparison of the value of information in two games.
figure 2

a, e As an example, we consider the dynamics of two games with deterministic transitions. We refer to the first game with transition vector q1 = (1, 0, 0; 1, 1, 1) as a game with timeout. The second game with transition vector q2 = (1, 0, 0; 1, 1, 0) corresponds to a timeout game with conditional return. b, f To illustrate the evolutionary dynamics, we simulate the pairwise comparison process for both settings (full and no information). Full information yields more cooperation in the first game but less cooperation in the second. c, g To explore the impact of information in each game, we numerically compute the abundance of all strategies, according to the invariant distribution of the process (see Methods). In Fig. 3, we describe these abundances in more detail. d, h By simultaneously varying the benefit b1 in the more profitable state and the selection strength β, we explore for which parameters there is a benefit of ignorance. Colors represent the value of information Vβ(q) according to the invariant distribution of the process. Default parameters: b1 = 1.8, b2 = 1.3, c = 1, population size N = 100, error rate ε = 0.01, and selection strength β = 10.

When we simulate the evolutionary dynamics of this stochastic game, we observe that individuals consistently learn to cooperate when they have full information. In contrast, without information, they mostly defect (Fig. 2b). To explain this result, we numerically compute which strategies are most likely to evolve according to the process’s invariant distribution, for each of the two cases (Fig. 2c). In the full-information setting, individuals predominantly adopt a strategy pF = (1, 0, 0, 0; x, 0, 0, 1), where x ∈ {0, 1} is arbitrary. This strategy may be considered a variant of the WSLS rule that has been successful in the traditional prisoner’s dilemma12. In particular, it is fully cooperative with itself. We prove in Supplementary Note 3 that this strategy forms a subgame perfect (Nash) equilibrium if 2b1 − b2 ≥ 2c, which is satisfied for the parameters we use (see also Fig. 3a). On the other hand, in the no-information setting, this strategy is no longer available. Instead, players can only sustain cooperation with the traditional WSLS rule pN = (1, 0, 0, 1). This strategy is only an equilibrium under the more stringent condition b1 > 2c. Because our parameters do not satisfy this condition, cooperation does not evolve in the no-information setting (Fig. 3b). To explore how these results depend on the benefit of cooperation b1 and on the selection strength β, Fig. 2d shows further simulations where we systematically vary both parameters. In all considered cases, state information is beneficial because it allows individuals to give more nuanced responses.
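
With the default parameters, a quick check (a sketch of ours, not part of the paper's code) makes explicit why the two settings differ:

```python
b1, b2, c = 1.8, 1.3, 1.0

# Full information: the WSLS-like strategy is subgame perfect if 2*b1 - b2 >= 2*c.
print(2 * b1 - b2 >= 2 * c)   # True  (2.3 >= 2.0): cooperation can be sustained
# No information: classical WSLS requires the stricter condition b1 > 2*c.
print(b1 > 2 * c)             # False (1.8 <= 2.0): cooperation does not evolve
```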

Fig. 3: Strategy invasion analysis for the two timeout games.
figure 3

Here we analyze the invasion dynamics between different resident populations for the two examples considered in Fig. 2. Every circle represents a possible resident strategy. The frequency underneath indicates how often the respective resident population is observed over the course of the evolutionary process according to the invariant distribution. Strategies that have 100% self-cooperation rate (in the limit of rare errors) are highlighted with a green ring. Lines between the strategies represent the direction of selection. Solid lines indicate that the respective fixation probability is larger than 1/N. Dotted lines indicate that the fixation probability is smaller than 1/N but greater than 1/(10N); in that case we speak of `almost-neutral drift'. a, b In the timeout game with full information, there are several highly cooperative strategies that are fairly robust against invasions. In contrast, for no information, players can only maintain cooperation with WSLS, which is unstable for the given parameter values. c, d The picture changes in the timeout game with conditional return. Here, WSLS is stable in the game with no information. In contrast, when there is full information, WSLS can be invaded through almost-neutral drift. Parameters are the same as in Fig. 2.

The second example has a transition vector similar to the first, with a single modification. This time, the inferior state is only left if at least one of the two players cooperates (Fig. 2e). Although this modification may appear minor, the resulting dynamics is strikingly different. We observe that with and without state information, individuals are now largely cooperative. However, they are most cooperative when they do not condition their strategies on state information (Fig. 2f). For this stochastic game, we show in Supplementary Note 3 that the traditional WSLS rule is already subgame perfect for 2b1 − b2 ≥ 2c. As a result, WSLS is predominant in the no-information setting (Fig. 3d). In contrast, in the full-information setting, WSLS is subject to almost-neutral drift by strategies that differ from WSLS in only a few bits (Fig. 3c). These other strategies may in turn give rise to the occasional invasion of defectors. Overall, we find that this stochastic game exhibits a benefit of ignorance when selection is sufficiently strong, and when cooperation is particularly valuable in the more profitable state (i.e., in the upper right corner of Fig. 2h).

These examples highlight three observations. First, just as there are instances in which state information is beneficial, there are also instances in which state information can reduce how much cooperation players achieve. Second, the stochastic games (transition vectors) for which state information is beneficial may only differ marginally from games with a benefit of ignorance. Finally, even if a stochastic game admits a benefit of ignorance, this benefit may not be present for all parameter values. Taken together, these observations suggest that in general, the effect of state information can be non-trivial and requires further investigation.

A systematic analysis of the weak-selection limit

To explore more systematically in which cases there is a benefit of information (or ignorance), we study the class of all games with deterministic transition vectors. We first consider the limit of weak selection (β → 0). Here, game payoffs only weakly influence how individuals adopt new strategies. While a vanishingly small selection strength is a mathematical idealization, this limit plays an important role in evolutionary game theory43,44,45. It often permits researchers to derive explicit solutions when analytical results are difficult to obtain otherwise. In our case, the limit of weak selection is particularly convenient, because it allows us to exploit certain symmetries between the two possible states s1 and s2, and between the two possible actions C and D, see Supplementary Note 1. As a result, we show that instead of 64 stochastic games, we only need to analyze 24. For each of these 24 transition vectors q, we explore whether information is beneficial, detrimental, or neutral (i.e., whether V0(q) is positive, negative, or zero).

First, we prove that half of the 64 stochastic games are neutral. In these games, the full-information and the no-information setting yield the same average cooperation rate in the limit of weak selection. Among the neutral games, we identify three (overlapping) subclasses. (i) The first subclass consists of those games that have an absorbing state (15 cases). Here, either the first or the second state can no longer be left once it is reached, because \({q}_{a\tilde{a}}^{1}=1\) or \({q}_{a\tilde{a}}^{2}=0\) for all a and \(\tilde{a}\). For these games, state information is neutral because players can be sure they are in the absorbing state eventually. (ii) In the second subclass, transitions are state-independent31, which means \({q}_{a\tilde{a}}^{1}={q}_{a\tilde{a}}^{2}\) for all a and \(\tilde{a}\) (6 additional cases). For deterministic transitions, state-independence implies that the current state can be directly inferred from the players’ previous actions, even without obtaining explicit state information. (iii) In the third subclass, neutrality arises because of more abstract symmetry arguments, described in detail in Supplementary Note 1. In particular, while the games in the first two subclasses are neutral for all selection strengths, the games in the third subclass only become neutral for vanishing selection. One particular example of this last subclass is the game with transition vector q = (1, 0, 0; 1, 1, 0), which we studied in the previous section (Figs. 2e–h and 3c, d). There, we observed that this game can give rise to a benefit of ignorance when selection is intermediate or strong. Here, we conclude that this benefit disappears completely for vanishing selection (see also the lower boundary of Fig. 2h).

For the remaining 32 non-neutral cases, we identify a simple proxy variable that indicates whether or not the respective game exhibits a benefit of information for weak selection (Fig. 4a). Specifically, in a non-neutral game, information is beneficial if and only if X > 0, with X being

$$X=\left(\mathbb{1}_{q_{CC}^{1}=1}+\mathbb{1}_{q_{CC}^{2}=0}\right)-\left(\mathbb{1}_{q_{DD}^{1}=1}+\mathbb{1}_{q_{DD}^{2}=0}\right).$$
(4)

Here, \({{\mathbb{1}}}_{A}\) is an indicator function that is one if assertion A is true and zero otherwise. One can interpret the variable X as a measure for how easily the game can be absorbed in mutual cooperation (X ≥ 0) or mutual defection (X ≤ 0). For example, if a game has a transition vector with \({q}_{CC}^{1}=1\), groups can easily implement indefinite cooperation by choosing strategies with \({p}_{CC}^{1}=1\). By doing so, players ensure they remain in the first state, in which they again would continue to cooperate. Using the proxy variable X, we can conclude that there are two properties of transition vectors that make state information beneficial in the limit of weak selection. The transition vector either needs to allow players to coordinate on mutual cooperation in a stable environment (\({q}_{CC}^{1}=1\), \({q}_{CC}^{2}=0\)); or it needs to prevent players from coordinating on mutual defection in a stable environment (\({q}_{DD}^{1} \, \ne \, 1\), \({q}_{DD}^{2} \, \ne \, 0\)). Again by symmetry considerations, we find that there are as many games with a benefit of information as there are games with a benefit of ignorance (16 cases each, see Fig. 4a).
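
The proxy variable of Eq. (4) can be evaluated in a few lines. The sketch below (our own notation and function names) classifies a deterministic game by the sign of X; it is only meaningful for non-neutral games.

```python
def proxy_X(q):
    """Eq. (4); q = (q1_CC, q1_CD, q1_DD, q2_CC, q2_CD, q2_DD) with 0/1 entries."""
    q1_CC, _, q1_DD, q2_CC, _, q2_DD = q
    return (int(q1_CC == 1) + int(q2_CC == 0)) - (int(q1_DD == 1) + int(q2_DD == 0))

def weak_selection_effect(q):
    """Classification by the sign of X; neutral games require the separate criteria above."""
    x = proxy_X(q)
    return "benefit of information" if x > 0 else ("benefit of ignorance" if x < 0 else "X = 0")

print(weak_selection_effect((1, 0, 0, 1, 1, 1)))   # timeout game of Fig. 2a: X = 1 > 0
print(weak_selection_effect((1, 0, 0, 1, 1, 0)))   # conditional-return game of Fig. 2e: X = 0
```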

Fig. 4: Classification of games with deterministic transitions.
figure 4

a In the limit of weak selection (β → 0), we can define a simple proxy variable X by Eq. (4) that indicates whether information is better, neutral, or worse for games with deterministic transitions. When X = 0 or when the stochastic game has an absorbing state, the game is neutral. In all other cases, there is either a benefit of information (when X > 0) or a benefit of ignorance (when X < 0). The bar diagram depicts the respective value of information for each of the 64 possible cases. The panel is symmetric; for each game with a benefit of information, there is a corresponding game with the same benefit of ignorance. b Once we increase the selection strength, games with a benefit of information become predominant. c We also study the effect of b1 (the benefit of cooperation in the more profitable state) under strong selection, β = 10. Again, most games are either neutral or show a benefit of information. Unless explicitly noted otherwise, we use the same parameters as before.

Exploring the impact of other game parameters

After characterizing the case of weak selection, we next explore the dynamics under strictly positive selection. To this end, we numerically compute the population’s average cooperation rate with and without state information, for each of the 64 stochastic games considered previously. To explore the impact of different game parameters, we systematically vary the strength of selection (Figs. 4b and S4), the benefit of cooperation (Figs. 4c and S5), and the error rate (Fig. S6). For 21 games, the evolving cooperation rates are the same with and without information. These games are neutral either because there is an absorbing state, or because transitions are state-independent (as described earlier). For the remaining cases, we find that a clear majority of them result in a benefit of information (Fig. 4b, c).

In the few cases with a consistent benefit of ignorance (the red squares in Figs. S4–S6), there is overall very little cooperation. As a result, the magnitude of this benefit is often negligible. Only in two cases can one find parameter combinations that lead to a sizeable benefit of ignorance. The first case is the stochastic game considered in Fig. 2e–h with transition vector q = (1, 0, 0; 1, 1, 0). The other case is a slight modification of the first, having the transition vector q = (1, 0, 1; 1, 1, 0). In both cases mutual cooperation leads to the more profitable first state. Moreover, in both cases, players can use WSLS to sustain cooperation even without state information, provided that 2b1 − b2 ≥ 2c. But even when this condition holds, the benefit of ignorance is limited, because even fully informed populations tend to achieve substantial cooperation rates (Figs. S4–S6). Overall, these results suggest that for positive selection strengths, a sizeable benefit of ignorance is rare. Moreover, there seems to be no simple rule that predicts for which stochastic games we can expect a benefit of ignorance (see Supplementary Note 3, Section 3.3 for a more detailed discussion).

The effect of environmental stochasticity

In our analysis so far, we assumed that the environment changes deterministically. Individuals who know the present state and the players’ actions can therefore anticipate the game’s next state. This form of predictability may overall diminish the impact of explicit state information because it reduces uncertainty. In the following, we extend our analysis to allow for stochasticity in the game’s transitions. To gain some intuition, we start with a simple example taken from the previous literature31 (see Fig. 5a for a depiction). According to the game’s transition vector, q = (1, 0, 0; q, 0, 0), players always find themselves in the less profitable second state if one or both players defect. If both players cooperate, however, they either remain in the first state (if they are already there), or they transition to the first state with probability q (if they start out in the second state). This stochastic game represents a scenario in which an environment deteriorates immediately once players defect. If players resume cooperating, it may take several rounds for the environment to recover.

Fig. 5: Benefit of ignorance in a game with a single-stochastic transition.
figure 5

a We consider a stochastic game in which defection by any player leads to the inferior second state. From there, players return to the more profitable first state after mutual cooperation with probability q. b–f We compute numerically exact cooperation rates for the stochastic game with no information and with full information, for different values of the transition probability q and selection strength β. For no and weak selection, there is a benefit of ignorance for all values of q ∈ (0, 1). For intermediate and strong selection, a benefit of ignorance persists when the transition probability q is sufficiently small. g, h We plot how often each strategy is played for q = 0.2 and strong selection. Because any defection leads to state 2, we can use a simplified notation for full-information strategies, \(\mathbf{p}=(p_{CC}^{1};\,p_{CC}^{2},\,p_{CD}^{2},\,p_{DC}^{2},\,p_{DD}^{2})\); the remaining three entries \(p_{CD}^{1}\), \(p_{DC}^{1}\), \(p_{DD}^{1}\) are irrelevant (Section 3.4 in Supplementary Note 3). We observe that when there is no information, most players adopt WSLS. With full information, there is no clearly winning strategy. Baseline parameters are the same as before. For no, weak, intermediate and strong selection we use β = 0, β = 0.001, β = 1, and β = 10, respectively.

For this example, we find that the value of information varies non-trivially, depending on the transition probability q and the strength of selection β (Fig. 5b–e). Overall, parameter regions with a benefit of ignorance seem to prevail (Fig. 5f). To obtain analytical results, we again study the game for weak selection (β → 0). In that case, the value of information can be computed explicitly, as \(V_{0}(\mathbf{q})=-\frac{3q(1-q)}{64(1+q)}\). In particular, there is a benefit of ignorance for all intermediate values q ∈ (0, 1). This benefit becomes most pronounced for \(q=\sqrt{2}-1\) (for more details, see Supplementary Note 3, Section 3.4). As we increase the selection strength, however, the dynamics can change, depending on q. For small q, we continue to observe a benefit of ignorance, whereas for larger q information tends to become beneficial (Fig. 5f).
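
The closed-form expression can be checked numerically; the grid search below (illustrative only, our own naming) recovers the maximizer q = √2 − 1 of the benefit of ignorance.

```python
import math

def value_of_information_weak(q):
    """Weak-selection value of information for the game of Fig. 5: V0 = -3q(1-q) / (64(1+q))."""
    return -3.0 * q * (1.0 - q) / (64.0 * (1.0 + q))

grid = [i / 1000 for i in range(1, 1000)]
q_star = min(grid, key=value_of_information_weak)        # most negative V0 = largest benefit of ignorance
print(round(q_star, 3), round(math.sqrt(2) - 1, 3))      # 0.414 0.414
print(all(value_of_information_weak(q) < 0 for q in grid))   # True: ignorance helps for all q in (0, 1)
```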

To explore the scenarios with a benefit of ignorance, we record which strategies players adopt for q = 0.2. Without state information, we find that players adopt WSLS almost all of the time (Fig. 5g). In contrast, when players condition their strategies on state information, WSLS is risk-dominated by a strategy that has been termed Ambitious WSLS31 (AWSLS). AWSLS differs from WSLS after mutual cooperation, in which case AWSLS only cooperates when players are in the first state (i.e., \(p_{CC}^{1}=1\) but \(p_{CC}^{2}=0\)). Once AWSLS is common in the population, it opens up opportunities for less cooperative strategies to invade. In particular, non-cooperative strategies like Always Defect (ALLD) are also adopted for a non-negligible fraction of the time (Fig. 5h). Overall, we find that predicting the effect of information is non-trivial. While some parameter combinations favor populations with full information, we also observe a benefit of ignorance for a significant portion of the parameter space.

To obtain a more comprehensive picture, we numerically analyze all stochastic games with single-stochastic transition vectors. Because the corresponding transition vectors have exactly one entry q between 0 and 1, there are 6 · 2⁵ = 192 cases in total. We find several regularities. First, similarly to games with deterministic transitions, we find that there are 24 transition vectors for which the game is neutral. In all of these games, one of the two states is absorbing. Second, we analyze the remaining cases in the limit of vanishing selection (Fig. S7). Most of these games follow the rule defined by the proxy variable X in Eq. (4), with some exceptions discussed in detail in Supplementary Note 2. Finally, for positive selection strengths we can again compute the players’ average cooperation rates numerically. We do this for all 192 families of games for weak (Fig. S8), intermediate (Fig. S9), and strong selection (Fig. S10). Similar to the case of deterministic transitions, state information is beneficial in an absolute majority of cases (Fig. S11). However, exceptions can and do occur. A notable benefit of ignorance arises most frequently when mutual cooperation in the more beneficial state leads the players to remain in that state, and when mutual defection in any state is punished with deteriorating environmental conditions.

Our computational methods are not limited to games with deterministic or single-stochastic transitions. To obtain a comprehensive understanding of the general effect of state information, we systematically explore the space of all stochastic transition vectors. To make this analysis feasible, we assume the entries of q are taken from a finite grid \(q_{ij}^{k}\in \{0,\,0.2,\,0.4,\,0.6,\,0.8,\,1.0\}\), leading to 6⁶ = 46,656 possible cases. Our numerical results again confirm that for the majority of these cases, environmental information is beneficial (Fig. S12b). Although there is also a non-negligible number of games for which populations are better off without information, the respective benefit of ignorance is often small (Fig. S12a).

Discussion

When people interact in a social dilemma, their actions often have spillovers to their social, natural, and economic environment56,57,58,59. Changes in the environment may in turn modulate the characteristics of the social dilemma. One important example of such a feedback loop is the ‘tragedy of the commons’60. Here, groups with little cooperation may deteriorate their environment, thereby restricting their own feasible long-run payoffs.

Such spillovers between the groups’ behavior and their environment can be formalized as a stochastic game28. In stochastic games, individuals interact for many time periods. In each period, they may face a different kind of social dilemma (state). The way they act in one state may affect the state they experience next. Recently, stochastic games have become a valuable model for the evolution of cooperation, because changing environments can reinforce reciprocity31,32,33. In particular, the evolution of cooperation may be favored in stochastic games even if cooperation is disfavored in each individual state31, see also Fig. S2a, b. However, implicit in these studies is the assumption that individuals are perfectly aware of the state they are in. Here, we systematically explore the implications of this assumption. We study to which extent individuals learn to cooperate, depending on whether or not they know the present state of their environment. We say the stochastic game shows a benefit of information if well-informed groups tend to be more cooperative. Otherwise, we speak of a benefit of ignorance.

Already for the most basic instantiation of a stochastic game, with two individuals and two states, we find that the impact of information is non-trivial. All three cases are possible: state information can be beneficial, neutral, or detrimental for cooperation. To explore this complex dynamics, we employ a mixture of analytical techniques and numerical approaches. Analytical results are feasible in the limiting case of weak selection43,44,45. Here, we observe an interesting symmetry. For every stochastic game in which there is a benefit of information, there is a corresponding game with a benefit of ignorance. This symmetry breaks down for positive selection. As selection increases, we observe more and more cases in which state information becomes beneficial. Moreover, in those few cases in which a benefit of ignorance persists, this benefit tends to be small. These results highlight the importance of accurate state information for responsible decision making.

However, our research also highlights a few notable exceptions. We identify several ecologically plausible scenarios where individuals cooperate more when they ignore their environment’s state. One example is the game displayed in Fig. 2e–h. Here, players only remain in the profitable state when they both cooperate. Once they defect, they transition to the inferior state. From there, they can only escape if at least one player cooperates. This game reflects a scenario where the group’s environment reinforces cooperation. Cooperative groups are rewarded by maintaining access to the more profitable state. Non-cooperative groups are punished by transitioning to an inferior state. For this kind of environmental feedback it was previously observed that the simple WSLS strategy can sustain cooperation easily31,32,33. WSLS can be instantiated without any state information. Once a population settles at WSLS, providing state information can even be harmful; in that case, individuals may deviate towards more nuanced strategies, which in turn can destabilize cooperation. In this sense, our study mirrors previous results suggesting that richer strategy spaces can sometimes reduce a population’s potential to cooperate61.

To allow for a systematic treatment, we focus on comparably simple games. Nevertheless, the number of games we consider is huge. For example, if all transitions between states are assumed to be deterministic (independent of chance), there are 64 cases to consider (Figs. S4–S6). If all but one transition are deterministic, we obtain 192 families of games (each having a free parameter q ∈ [0, 1], Figs. S7–S10). In addition, we also systematically explore the set of fully stochastic transition functions, by considering 46,656 different cases (Fig. S12). In all these instances, we observe that seemingly innocent changes in the environmental feedback or in the game parameters can lead to complex changes in the dynamics. In particular, games with a benefit of information may turn into games with a benefit of ignorance. As shown in Fig. S13, we observe a similar sensitivity in games with more than two players. These observations suggest that there may be no simple rule that predicts the impact of state information. These difficulties are likely to further increase as we extend the model to more complex strategies47,48,49,50,51,52, or environments with multiple states31.

Overall, we believe our work makes at least two contributions. First, we introduce a simple and easily generalizable framework to explore how state information (or the lack thereof) affects the evolution of cooperation. This framework can be generalized in various directions. For example, in our model we compare two limiting cases: we either consider a population in which no one knows the state of the environment, or one in which everyone gets precise information about the environment’s state. There are many interesting cases in between. In some applications, population members may only obtain an imperfect signal of the environment’s true state42. Alternatively, one may adapt our model to explore games with information asymmetries. As one instance of such a model extension, individuals may choose to acquire state information at a small cost. Such a model would allow researchers to explore whether individuals acquire information exactly in those games for which we find a benefit of information.

As our second contribution, our results illustrate the intricate dynamics that arise in the presence of environmental, informational, and behavioral feedbacks. By exploring these feedbacks in elementary stochastic games, we can better understand the more complex dynamics of the socio-ecological systems around us.

Methods

Calculation of payoffs in stochastic games

In this study, we compare the evolutionary dynamics for two strategy sets. The first set \(\mathcal{S}_{F}\) is the set of all memory-one strategies for the full-information setting. The second set \(\mathcal{S}_{N}\) consists of all memory-one strategies for the no-information setting. Equivalently, we can define \(\mathcal{S}_{N}\) as the set of all full-information strategies that do not condition their behavior on the current state,

$$\mathcal{S}_{N}=\left\{\,\mathbf{p}\in \mathcal{S}_{F}\;\middle|\;p_{a\tilde{a}}^{1}=p_{a\tilde{a}}^{2}\,\,\forall a,\tilde{a}\in \{C,\,D\}\,\right\}.$$
(5)

We denote by \(\mathcal{P}_{F}\) and \(\mathcal{P}_{N}\) the respective sets of deterministic strategies, for which all entries are required to be either zero or one. In the following, we describe how to calculate payoffs when players have full information. Since any strategy for the case of no information can be associated with a full-information strategy, the same method also applies to the case of no information.

As our baseline, we consider games that are infinitely repeated and in which there is no discounting of the future. Given player 1’s effective memory-1 strategy p and player 2’s effective strategy \(\tilde{\mathbf{p}}\), such games can be described as a Markov chain. The states of this Markov chain correspond to the eight possible outcomes \(\omega=({s}_{i},\,a,\,\tilde{a})\) of a given round. Here, si ∈ {s1, s2} reflects the environmental state, and \(a,\,\tilde{a}\in \{C,\,D\}\) are player 1’s and player 2’s actions, respectively. The transition probability to move from state \(\omega=({s}_{i},\,a,\,\tilde{a})\) in one round to \(\omega'=({s}_{i}',\,a',\,\tilde{a}')\) in the next round is a product of three factors,

$${m}_{\omega,\omega'}=x\cdot y\cdot \tilde{y}.$$
(6)

The first factor

$$x=\begin{cases}q_{a\tilde{a}}^{i} & \text{if } s_{i}'=s_{1}\\ 1-q_{a\tilde{a}}^{i} & \text{if } s_{i}'=s_{2}\end{cases}$$
(7)

reflects the probability to move from environmental state si to \(s_{i}'\), given the players’ previous actions. Since the game is symmetric, we note that \(q_{DC}^{i}\) is defined to be equal to \(q_{CD}^{i}\). The other two factors are

$$y=\begin{cases}p_{a\tilde{a}}^{i'} & \text{if } a'=C\\ 1-p_{a\tilde{a}}^{i'} & \text{if } a'=D,\end{cases}$$
(8)
$$\tilde{y}=\begin{cases}\tilde{p}_{\tilde{a}a}^{i'} & \text{if } \tilde{a}'=C\\ 1-\tilde{p}_{\tilde{a}a}^{i'} & \text{if } \tilde{a}'=D.\end{cases}$$
(9)

They correspond to the conditional probability that each of the two players chooses the action prescribed in \(\omega'\). By collecting all these transition probabilities, we obtain an 8 × 8 transition matrix \(M=({m}_{\omega,\omega'})\). Assuming that players are subject to errors and that the game’s transition vector satisfies q ≠ (1, 1, 1; 0, 0, 0), this transition matrix has a unique left eigenvector v. The entries \(v_{a\tilde{a}}^{i}\) of this eigenvector give the frequency with which players observe the outcome \(\omega=({s}_{i},\,a,\,\tilde{a})\) over the course of the game. For a given transition vector q, we can thus compute the first player’s expected payoff as

$$\pi (\mathbf{p},\,\tilde{\mathbf{p}})={b}_{1}\left(v_{CC}^{1}+v_{DC}^{1}\right)+{b}_{2}\left(v_{CC}^{2}+v_{DC}^{2}\right)-c\left(v_{CC}^{1}+v_{CD}^{1}+v_{CC}^{2}+v_{CD}^{2}\right).$$
(10)

The second player’s payoff can be computed analogously. Similarly, the average cooperation rate of the two players can be defined as follows.

$$\gamma (\mathbf{p},\,\tilde{\mathbf{p}})=\left(v_{CC}^{1}+\frac{v_{CD}^{1}+v_{DC}^{1}}{2}\right)+\left(v_{CC}^{2}+\frac{v_{CD}^{2}+v_{DC}^{2}}{2}\right).$$
(11)
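
The following sketch implements Eqs. (6)–(11) under our own ordering conventions (strategies as 8-tuples, transition vectors as 6-tuples, states indexed 0 and 1); it is an illustration of the method, not the authors' code, and the example strategy and parameter values at the end are taken from the main text.

```python
import numpy as np

C, D = 0, 1                      # action labels
STATES = (0, 1)                  # index 0 = profitable state s1, index 1 = state s2
OUTCOMES = [(s, a, b) for s in STATES for a in (C, D) for b in (C, D)]   # the 8 outcomes (s_i, a, a~)

def coop_prob(p, state, own_prev, opp_prev, eps):
    """Effective cooperation probability of a memory-one strategy p with error rate eps.
    p is an 8-tuple ordered as (p1_CC, p1_CD, p1_DC, p1_DD, p2_CC, p2_CD, p2_DC, p2_DD)."""
    raw = p[4 * state + 2 * own_prev + opp_prev]
    return (1 - eps) * raw + eps * (1 - raw)

def trans_prob(q, state, a, b):
    """Probability of moving to s1; q = (q1_CC, q1_CD, q1_DD, q2_CC, q2_CD, q2_DD), with q_CD = q_DC."""
    return q[3 * state + a + b]          # a + b counts defectors: 0 -> CC, 1 -> CD/DC, 2 -> DD

def stationary_distribution(p, p_tilde, q, eps):
    """Invariant distribution v over the 8 outcomes, built from Eqs. (6)-(9)."""
    M = np.zeros((8, 8))
    for i, (s, a, b) in enumerate(OUTCOMES):
        for j, (s_next, a_next, b_next) in enumerate(OUTCOMES):
            x = trans_prob(q, s, a, b) if s_next == 0 else 1 - trans_prob(q, s, a, b)
            y = coop_prob(p, s_next, a, b, eps)
            y = y if a_next == C else 1 - y
            y_tilde = coop_prob(p_tilde, s_next, b, a, eps)
            y_tilde = y_tilde if b_next == C else 1 - y_tilde
            M[i, j] = x * y * y_tilde
    eigvals, eigvecs = np.linalg.eig(M.T)                 # left eigenvector of M for eigenvalue 1
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
    return v / v.sum()

def payoff_and_cooperation(p, p_tilde, q, b1, b2, c, eps=0.01):
    """Player 1's expected payoff, Eq. (10), and the average cooperation rate, Eq. (11)."""
    v = dict(zip(OUTCOMES, stationary_distribution(p, p_tilde, q, eps)))
    pi = (b1 * (v[(0, C, C)] + v[(0, D, C)]) + b2 * (v[(1, C, C)] + v[(1, D, C)])
          - c * (v[(0, C, C)] + v[(0, C, D)] + v[(1, C, C)] + v[(1, C, D)]))
    gamma = sum(v[(s, C, C)] + 0.5 * (v[(s, C, D)] + v[(s, D, C)]) for s in STATES)
    return pi, gamma

# Example: two WSLS players in the timeout game with conditional return (Fig. 2e).
wsls_full = (1, 0, 0, 1, 1, 0, 0, 1)
print(payoff_and_cooperation(wsls_full, wsls_full, (1, 0, 0, 1, 1, 0), b1=1.8, b2=1.3, c=1.0))
```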

In this work, we focus on games without discounting. However, similar methods can be applied to games in which future payoffs are discounted by a factor of δ (or equivalently, to games with a continuation probability δ). For δ < 1, instead of computing the left eigenvector of the transition matrix, we define v to be the vector

$$\mathbf{v}=(1-\delta )\,\mathbf{v}_{0}\sum_{t=0}^{\infty }{(\delta M)}^{t}=(1-\delta )\,\mathbf{v}_{0}\,{({I}_{8}-\delta M)}^{-1}.$$
(12)

In this expression, v0 is the vector that contains the probabilities to observe each of the eight possible states ω in the very first round. Moreover, I8 is the 8 × 8 identity matrix. Similar to before, the entries of \(\mathbf{v}=(v_{a\tilde{a}}^{i})\) represent the weighted average that describes how often the two players visit the state ω over the course of the game62. The payoffs and the average cooperation rate can then again be computed with the formulas in (10) and (11). We use this approach when we explore the impact of the continuation probability δ on the robustness of our results in Fig. S1e, f.

Evolutionary dynamics

To model how players learn to adopt new strategies over time, we study a pairwise comparison process55 in the limit of rare mutations63,64,65,66. We consider a population of fixed size N. Initially, all players adopt the same resident strategy pR = ALLD. Then one of the players switches to a randomly chosen alternative strategy pM. This mutant strategy may either go extinct or reach fixation, depending on which payoff it yields compared to the resident strategy. If the number of players adopting the mutant strategy is given by k, the expected payoffs of the two strategies are

$${\pi }_{R}(k)=\frac{N-k-1}{N-1}\cdot \pi (\mathbf{p}_{R},\,\mathbf{p}_{R})+\frac{k}{N-1}\cdot \pi (\mathbf{p}_{R},\,\mathbf{p}_{M}),$$
(13)
$${\pi }_{M}(k)=\frac{N-k}{N-1}\cdot \pi (\mathbf{p}_{M},\,\mathbf{p}_{R})+\frac{k-1}{N-1}\cdot \pi (\mathbf{p}_{M},\,\mathbf{p}_{M}).$$
(14)

Based on these payoffs, the fixation probability of the mutant strategy can be computed explicitly43,67,

$$\rho (\mathbf{p}_{R},\,\mathbf{p}_{M})=\frac{1}{1+\sum_{i=1}^{N-1}\prod_{k=1}^{i}\exp \left[-\beta \left({\pi }_{M}(k)-{\pi }_{R}(k)\right)\right]}.$$
(15)

As the selection strength parameter β approaches zero, this fixation probability approaches the neutral probability 1/N, as one may expect. As β increases, the fixation probability is increasingly biased in favor of mutant strategies with a high relative payoff.
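
A direct transcription of Eqs. (13)–(15) into code might look as follows (a sketch with our own function names; pi is any pairwise payoff function returning player 1's expected payoff, for instance the payoff computed in the sketch above).

```python
import math

def mixed_population_payoffs(pi, p_R, p_M, N):
    """Eqs. (13)-(14): expected payoffs when k of N players use the mutant strategy p_M."""
    rr, rm, mr, mm = pi(p_R, p_R), pi(p_R, p_M), pi(p_M, p_R), pi(p_M, p_M)
    pi_R = lambda k: ((N - k - 1) * rr + k * rm) / (N - 1)
    pi_M = lambda k: ((N - k) * mr + (k - 1) * mm) / (N - 1)
    return pi_R, pi_M

def fixation_probability(pi, p_R, p_M, N, beta):
    """Eq. (15): probability that a single p_M mutant takes over a p_R resident population."""
    pi_R, pi_M = mixed_population_payoffs(pi, p_R, p_M, N)
    total = 1.0
    for i in range(1, N):
        prod = 1.0
        for k in range(1, i + 1):
            prod *= math.exp(-beta * (pi_M(k) - pi_R(k)))
        total += prod
    return 1.0 / total      # equals 1/N in the neutral limit beta -> 0
```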

If the mutant fixes, it becomes the new resident strategy. Then another mutant strategy is introduced and either fixes or goes extinct. By iterating this basic process for τ time steps, we obtain a sequence (p0, p1, p2, …, pτ) where pt is the resident strategy present in the population after t mutant strategies have been introduced. Based on this sequence, we can calculate the population’s average cooperation rate and payoff as

$$\hat{\pi }=\lim_{\tau \to \infty }\frac{1}{\tau+1}\sum_{t=0}^{\tau }\pi (\mathbf{p}_{t},\,\mathbf{p}_{t}),$$
(16)
$$\hat{\gamma }=\lim_{\tau \to \infty }\frac{1}{\tau+1}\sum_{t=0}^{\tau }\gamma (\mathbf{p}_{t},\,\mathbf{p}_{t}).$$
(17)

Because the evolutionary process is ergodic for any finite β, these time averages exist and are independent of the population’s initial composition.

If players have infinitely many strategies, the payoff and cooperation averages in (16) and (17) can only be approximated, by simulating the process described above for a sufficiently long time τ. However, when strategies are taken from a finite set \(\mathcal{P}\), these quantities can be computed exactly. In that case, the evolutionary dynamics can again be described as a Markov chain63. Each state of this Markov chain corresponds to one possible resident population \(\mathbf{p}\in \mathcal{P}\). Given that the current resident population uses p, the probability that the next resident population uses strategy \(\tilde{\mathbf{p}}\ne \mathbf{p}\) is given by \(\rho (\mathbf{p},\tilde{\mathbf{p}})/|\mathcal{P}|\). By calculating the invariant distribution w = (wp) of this Markov chain, we can compute the average cooperation rates and payoffs according to Eqs. (16) and (17) by evaluating

$$\hat{\pi }=\sum_{\mathbf{p}\in \mathcal{P}}{w}_{\mathbf{p}}\cdot \pi (\mathbf{p},\,\mathbf{p}),$$
(18)
$$\hat{\gamma }=\sum_{\mathbf{p}\in \mathcal{P}}{w}_{\mathbf{p}}\cdot \gamma (\mathbf{p},\,\mathbf{p}).$$
(19)
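
In the same spirit, a sketch of the rare-mutation calculation (our own interface; rho and gamma_self would be built from the routines above) is:

```python
import numpy as np

def rare_mutation_average(strategies, rho, gamma_self):
    """Average cooperation rate in the rare-mutation limit, Eqs. (18)-(19).

    strategies    -- the finite strategy set P (a list)
    rho(p, p~)    -- fixation probability of a p~ mutant in a p resident population
    gamma_self(p) -- self-cooperation rate gamma(p, p) of a homogeneous population
    """
    n = len(strategies)
    T = np.zeros((n, n))
    for i, p in enumerate(strategies):
        for j, p_tilde in enumerate(strategies):
            if i != j:
                T[i, j] = rho(p, p_tilde) / n       # mutant drawn uniformly from P
        T[i, i] = 1.0 - T[i].sum()                  # stay put if the mutant goes extinct
    eigvals, eigvecs = np.linalg.eig(T.T)           # invariant distribution w with w = w T
    w = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
    w = w / w.sum()
    return float(sum(w[i] * gamma_self(p) for i, p in enumerate(strategies)))
```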

Herein, we perform these calculations for the specific strategy sets for full information and no information, \(\mathcal{P}_{F}\) and \(\mathcal{P}_{N}\), respectively. By comparing the respective averages \(\hat{\gamma}^{F}\) and \(\hat{\gamma}^{N}\), we characterize for which stochastic games there is a benefit of information, by computing \(V_{\beta }(\mathbf{q})=\hat{\gamma}^{F}-\hat{\gamma}^{N}\).

We use this process based on deterministic strategies, pairwise comparisons, and rare mutations for all of our main text figures. As robustness checks, we present several variations of this model in the SI. For example, in Fig. S1a, b, we show simulation results for players with stochastic memory-1 strategies. To this end, we assume that mutant strategies are randomly drawn from the spaces \(\mathcal{S}_{F}\) and \(\mathcal{S}_{N}\). To make sure that strategies close to the corners get sufficient weight, the entries \(p_{a\tilde{a}}^{i}\) are sampled according to an arcsine distribution, as for example in Nowak and Sigmund12. Similarly, in Fig. S1h, i, we show simulations for positive mutation rates. In Fig. S2a, b, we compare the results from Fig. 2 to a setup in which players only engage in the game in the first state (without any transitions), or in which they only engage in the game in the second state. In addition, in Fig. S2c, d, we run simulations when players are unable to condition their behavior on the outcome of the previous round. Finally, to explore whether our qualitative results depend on the specific learning process we use, we have also implemented simulations with an alternative learning process, introspection dynamics68,69,70. The respective results are shown in Fig. S3.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.