The Cultural Brain Hypothesis: How culture drives brain expansion, sociality, and life history

In the last few million years, the hominin brain more than tripled in size. Comparisons across evolutionary lineages suggest that this expansion may be part of a broader trend toward larger, more complex brains in many taxa. Efforts to understand the evolutionary forces driving brain expansion have focused on climatic, ecological, and social factors. Here, building on existing research on learning, we analytically and computationally model the predictions of two closely related hypotheses: The Cultural Brain Hypothesis and the Cumulative Cultural Brain Hypothesis. The Cultural Brain Hypothesis posits that brains have been selected for their ability to store and manage information, acquired through asocial or social learning. The model of the Cultural Brain Hypothesis reveals relationships between brain size, group size, innovation, social learning, mating structures, and the length of the juvenile period that are supported by the existing empirical literature. From this model, we derive a set of predictions—the Cumulative Cultural Brain Hypothesis—for the conditions that favor an autocatalytic take-off characteristic of human evolution. This narrow evolutionary pathway, created by cumulative cultural evolution, may help explain the rapid expansion of human brains and other aspects of our species’ life history and psychology.

To explore the evolutionary adaptive dynamics of the CBH, we begin with individuals represented by three continuous variables: brain size $B$, adaptive knowledge $A$, and reliance on social learning (over asocial learning; e.g. time spent), $s$. We will initially ignore the evolution of oblique learning, learning biases, and population structure, and assume that individuals using social learning use oblique learning and learning biases to home in on the target individual with the most adaptive knowledge. We will relax this assumption in our simulation and allow oblique learning, learning biases, life history, and population structure to evolve endogenously. Table A is a key to the variables in our analytic model.

Table A. Variables in the analytic model.
Symbol | Description | Range
$\gamma$ | Scaling coefficient relating maximal adaptive knowledge to brain size | ≥ 0
$c$ | Scaling constant relating brain size to death rate (similar to its counterpart in the simulation) | ≥ 0
$\psi$ | Death rate mitigation (e.g. richness of ecology), scaling the effect of $\bar{A}$ in reducing $d$ | ≥ 0
$dN/dt$ | Increase in population size | [−∞, ∞]
Individual $i$ has two routes to acquire adaptive knowledge $A_i$: (1) through asocial (individual) learning as a function of their own brain size $B_i$, and (2) through social learning as a function of the target model from whom they learn ($\hat{A}$). The proportion of time or propensity to use social over asocial learning is given by $s_i$. Thus adaptive knowledge is given by:
$$A_i = s_i\,\beta\,\hat{A} + (1 - s_i)\,\alpha\,B_i \qquad (1)$$
where $\beta$ is transmission fidelity (how well an individual can learn from a model), $\alpha$ is asocial learning efficacy (how effectively an individual can use their brain to figure things out), and $\hat{A}$ is the adaptive knowledge possessed by the individual in the parent generation from whom they are learning (e.g. the model with maximal $A$, the model with average $A$, etc.). The parameters $\beta$ and $\alpha$ in Equation 1 are abstractions of more complicated details covered in other work. By outsourcing the evolution of these features to other models, we can focus on the core of the CBH argument, i.e. how learning, brain size, knowledge, sociality, and life history are interconnected. Examples of this earlier work include Lewis and Laland's [1] model of the relationship between transmission fidelity and the rate of trait loss, which shows that sufficiently high transmission fidelity is necessary for cumulative culture, even more so than novel invention, incremental improvement, and recombination. Relatedly, building on work by Henrich [2], Mesoudi [3] models how, as cumulative culture grows (driven, for example, by sociality), it becomes more difficult for each generation to acquire. Thus, selection favors mechanisms that increase transmission fidelity. Muthukrishna and Henrich [4] discuss the many mechanisms that increase transmission fidelity as adaptive knowledge accumulates. Mechanisms such as explicit teaching may not be required in a small-scale society, but in a large-scale society not only explicit teaching but also formal institutionalized schooling from a variety of teachers is required.
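As a concrete illustration, here is a minimal Python sketch of Equation 1. It assumes the equation takes the form $A_i = s_i\beta\hat{A} + (1 - s_i)\alpha B_i$ and that all quantities are plain scalars. The function name `adaptive_knowledge` and the argument name `A_model` (standing in for $\hat{A}$) are hypothetical. The sketch makes explicit that $s$ interpolates linearly between pure asocial learning ($\alpha B$) and pure social learning ($\beta\hat{A}$).

```python
def adaptive_knowledge(s: float, beta: float, A_model: float,
                       alpha: float, B: float) -> float:
    """Sketch of Equation 1: A = s*beta*A_model + (1 - s)*alpha*B.

    s       -- reliance on social learning, in [0, 1]
    beta    -- transmission fidelity
    A_model -- adaptive knowledge of the model being learned from (A-hat)
    alpha   -- asocial learning efficacy
    B       -- the learner's brain size
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    social = s * beta * A_model        # knowledge copied, with fidelity beta, from the model
    asocial = (1.0 - s) * alpha * B    # knowledge figured out with one's own brain
    return social + asocial


if __name__ == "__main__":
    # Illustrative values only: beta = 0.9, A_model = 10, alpha = 0.5, B = 4
    for s in (0.0, 0.5, 1.0):
        print(f"s = {s}: A = {adaptive_knowledge(s, 0.9, 10.0, 0.5, 4.0):.2f}")
```

With these illustrative values, $A$ rises from 2.00 at $s = 0$ to 9.00 at $s = 1$.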
Transmission fidelity $\beta$ could thus include individuals' cognitive abilities [themselves increased by culture; see 4], but also greater social tolerance, more interactions or opportunities for interaction, some passive or active teaching by models, and so on [for more examples, see 4,5,6]. Transmission fidelity could be broken down into constraints and endogenous state variables for genetic, cultural, and social factors, as well as interactions between these (e.g. genes for sociality), but for the purposes of expressing our argument, here we capture all of this with $\beta$. Similarly, our model relies on the idea that "bigger" brains will be better at solving novel problems and figuring things out [7,8]. As Deaner et al.'s [7] analyses reveal, at least in primates, the best predictor of cognitive ability is overall brain size. But, as with transmission fidelity, many factors will influence individuals' ability to use their brains, such as constraints on time (for trial-and-error learning) or energy. These constraints are captured by $\alpha$. We take an evolutionary adaptive dynamics approach to find the evolutionarily stable strategies (ESS) in our model. This approach involves assuming a monomorphic population and then examining whether that population can be invaded by a mutant with slightly different values of the variables of interest. Appropriate to the dynamics we are interested in, this analytic method assumes mutations are small (i.e. we are not exploring competition between two vastly different groups).
Social Learning. To determine the average adaptive knowledge in a population that is monomorphic for resident genotype ($B$, $s$), we'll initially assume that genotype is fixed over the course of learning.
We'll assume that the learning process leads to a distribution of adaptive knowledge values in the population and that individuals using social learning select a model using payoff-biased learning, choosing to learn from the model with the maximal possible value of adaptive knowledge, $\hat{A}$ (i.e., they learn from the rare individual who has attained the maximal value). In the simulation, we will relax this assumption and allow oblique learning (learning from non-genetic parents) and learning biases to evolve. Assuming individuals do learn from the best model when learning socially, the mean adaptive knowledge in the population is given by:
$$\bar{A} = s\,\beta\,\hat{A} + (1 - s)\,\alpha\,B \qquad (2)$$
We further assume that the maximal adaptive knowledge is constrained by the brain size of the learner, such that $\hat{A} = \gamma B$, where $\gamma > 0$ is some scaling parameter. As we shall see, the insights of the model are independent of the specific value of the scaling. Thus, Equation 2 becomes:
$$\bar{A} = s\,\beta\gamma B + (1 - s)\,\alpha B = \left[\alpha + s(\beta\gamma - \alpha)\right]B$$
We can now easily understand the adaptive dynamics of the social learning trait ($s$), assuming more adaptive knowledge has a higher payoff. Because $\bar{A}$ is linear in $s$, for a given brain size ($B$) we simply compare $\beta\gamma$ and $\alpha$. If $\beta\gamma > \alpha$, then it pays to increase $s$ as much as possible to maximize adaptive knowledge (i.e. $s \to 1$). Conversely, if $\beta\gamma < \alpha$, then it pays to decrease $s$ as much as possible to maximize adaptive knowledge (i.e. $s \to 0$). This will be true as long as individuals have access to a range of models and are learning from the model with the greatest adaptive knowledge. Given these conditions, the key to reliance on social learning is the ability to learn with high fidelity, and the key to reliance on asocial learning is the ability to use one's brain efficiently to learn by oneself.
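The boundary argument for $s$ can be checked numerically. This sketch assumes mean knowledge takes the form $\bar{A} = [s\beta\gamma + (1 - s)\alpha]B$. It does a grid search over $s$ for a fixed $B$ and compares the result with the $\beta\gamma$ versus $\alpha$ rule. The helper names and parameter values are hypothetical illustrations.

```python
def mean_knowledge(s, B, alpha, beta, gamma):
    """Mean adaptive knowledge when the best model holds gamma * B."""
    return (s * beta * gamma + (1.0 - s) * alpha) * B


def best_s_on_grid(B, alpha, beta, gamma, steps=101):
    """Grid search over s in [0, 1] for the value maximizing mean knowledge."""
    grid = [i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda s: mean_knowledge(s, B, alpha, beta, gamma))


def predicted_s(alpha, beta, gamma):
    """Boundary prediction: s -> 1 if beta*gamma > alpha, s -> 0 if beta*gamma < alpha."""
    if beta * gamma > alpha:
        return 1.0
    if beta * gamma < alpha:
        return 0.0
    return None  # beta*gamma == alpha: mean knowledge does not depend on s


if __name__ == "__main__":
    # Illustrative (alpha, beta, gamma) triples with B = 4
    for alpha, beta, gamma in ((0.5, 0.9, 1.0), (0.95, 0.9, 1.0)):
        print(alpha, beta * gamma,
              best_s_on_grid(4.0, alpha, beta, gamma), predicted_s(alpha, beta, gamma))
```

Because $\bar{A}$ is linear in $s$, the grid maximum always falls on an endpoint. The only exception is the knife-edge case $\beta\gamma = \alpha$, where every $s$ is equally good.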
Further, there may be some limitation on accessing the model with the maximal adaptive knowledge. Payoff-biased learning may be ineffective, making it difficult to identify who has the most adaptive knowledge, or the population may be too small or disconnected for at least one individual to reach this maximal value consistently every generation. In that case, the evolution of social learning will also depend on the maximal adaptive knowledge learners have access to. We explore these dynamics in the simulation model.
Brain Size. To determine the adaptive dynamics of brain size, we need an ecological model for monomorphic populations (i.e. for populations that consist of a single resident type ($B$, $s$)). To do this, we need to specify how the various traits affect the birth and death rates in the model. We use a logistic ecological model:
$$\frac{dN}{dt} = N\left[b(N, \bar{A}) - d(B, \bar{A})\right] \qquad (3)$$
Here $N$ is population density, $b$ is the per capita birth rate of the resident and $d$ is the per capita death rate of the resident. Next, we specify the per capita birth rate ($b$) and death rate ($d$). We assume the birth rate decreases with population size (density dependence influencing carrying capacity), but that the decrease is slower with increased adaptive knowledge (e.g. allowing individuals to support more offspring or outcompete competitors in access to mating opportunities). The birth rate ($b$) is given by:
$$b(N, \bar{A}) = b_0 - \frac{N}{b_1 + \bar{A}} \qquad (4)$$
where $b_0$ is the maximal birth rate, and density dependence produces a linear decrease in the birth rate, given by the second term of Equation 4. This linear decrease is assumed to be influenced by the mean adaptive knowledge ($\bar{A}$): more adaptive knowledge leads to a larger denominator, slowing the decrease with density (allowing for a higher effective carrying capacity). $b_0$ and $b_1$ are positive parameters, which we set to 1, without loss of generality, in the following analyses.
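A minimal sketch of the birth rate, assuming Equation 4 has the form $b(N, \bar{A}) = b_0 - N/(b_1 + \bar{A})$ with $b_0 = b_1 = 1$ as in the analyses. The function names are hypothetical. At a fixed density, more mean adaptive knowledge keeps the birth rate higher. The density at which births fall to zero, $b_0(b_1 + \bar{A})$, also rises with $\bar{A}$, which is the higher effective carrying capacity described above.

```python
def birth_rate(N, A_bar, b0=1.0, b1=1.0):
    """Assumed form of Equation 4: per capita birth rate, linear in density N.

    More mean adaptive knowledge A_bar enlarges the denominator and so
    slows the density-dependent decline.
    """
    return b0 - N / (b1 + A_bar)


def zero_birth_density(A_bar, b0=1.0, b1=1.0):
    """Density at which the birth rate reaches zero: N = b0 * (b1 + A_bar)."""
    return b0 * (b1 + A_bar)


if __name__ == "__main__":
    N = 2.0  # illustrative density
    for A_bar in (0.0, 1.0, 3.0):
        print(A_bar, birth_rate(N, A_bar), zero_birth_density(A_bar))
```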
We assume that a larger brain is more costly than a smaller brain in terms of death rate (e.g. higher calorie requirements), but that more adaptive knowledge lowers the death rate (e.g. finding food or evading predators). The death rate ($d$) is given by:
$$d(B, \bar{A}) = c\,B^{n}\,e^{-\psi \bar{A}/B} \qquad (5)$$
This function assumes that the cost of brains scales up in a polynomial fashion (e.g. $n = 2$), but that the reduction in the death rate through adaptive knowledge is an exponential decay, where adaptive knowledge is bounded by brain size (i.e. $\bar{A} \le B$). Here $c$ scales the maximum brain size and $\psi$ scales the death-rate-reducing payoff to adaptive knowledge. The degree to which adaptive knowledge can offset brain size is determined by $\psi$ and by the ratio of adaptive knowledge to brain size. Adaptive knowledge is constrained by brain size regardless of learning mechanism, and as brains grow, more knowledge is required to provide an equivalent offset. The parameter $\psi$ allows us to adjust the extent to which adaptive knowledge can offset the costs of brain size: $\psi = 0$ indicates no offset, and increasing $\psi$ increases the probability of survival for a given adaptive knowledge and brain size. The parameter $\psi$ can be interpreted, inversely, as how much adaptive knowledge one requires to unlock the fitness-enhancing advantages: the higher $\psi$, the less knowledge is needed. For example, in a calorie-rich environment where only a little skill or knowledge is required to access calories (e.g. simply remembering food locations), $\psi$ would be high: a little bit of knowledge gives a large return. Conversely, in a calorie-poor environment where a lot of skill or knowledge is required to access fewer calories (e.g. food needs significant preparation before safe consumption), $\psi$ would be low. Note that this is a mechanical relationship between adaptive knowledge and probability of survival.
Calorie availability is a potentially useful metaphor for thinking about the model in concrete terms, but it is not the only interpretation of $\psi$. It could be influenced by any number of factors, including the knowledge required to evade predators or avoid environmental hazards. We are also not directly saying anything about selection on cognition, brain size, or information use [9]. For example, high $\psi$ might allow for larger brains due to greater food availability for a given level of food-finding knowledge. But equally, when $\psi$ is low, there may be selection pressure for larger brains, with more knowledge needed to acquire the more difficult-to-access food. In the analytic model, the decrease in the death rate through adaptive knowledge becomes a constant, since adaptive knowledge is a function of brain size (and of the parameters affecting learning efficiency). Although this will not affect the dynamics of the model, it will affect the final brain sizes. We fully explore this in the simulation. Given a resident ($B$, $s$), the equilibrium population size of the resident is determined by the nonzero solution to Equation 3, where $b = d$:
$$N^* = (b_1 + \bar{A})\left[b_0 - d(B, \bar{A})\right] = (1 + \bar{A})\left(1 - c\,B^{n}e^{-\psi\bar{A}/B}\right) \qquad (6)$$
Since we know that $s \to 0$ when $\beta\gamma < \alpha$ and $s \to 1$ when $\beta\gamma > \alpha$, we can consider these two cases, asocial learners and social learners, separately and then compare the outcomes of these two regimes.
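The sketch below puts the ecological pieces together. It assumes the death rate has the form $d(B, \bar{A}) = c B^n e^{-\psi\bar{A}/B}$ and the birth rate $b = 1 - N/(1 + \bar{A})$, so the resident equilibrium is $N^* = (1 + \bar{A})(1 - d)$. It checks two things: births exactly balance deaths at $N^*$, and $\psi = 0$ removes the knowledge offset. Function names and parameter values are hypothetical illustrations.

```python
import math


def death_rate(B, A_bar, c, psi, n=2):
    """Assumed form of Equation 5: d = c * B**n * exp(-psi * A_bar / B)."""
    return c * B**n * math.exp(-psi * A_bar / B)


def birth_rate(N, A_bar, b0=1.0, b1=1.0):
    """Assumed form of Equation 4 with b0 = b1 = 1 by default."""
    return b0 - N / (b1 + A_bar)


def equilibrium_density(B, A_bar, c, psi, n=2, b0=1.0, b1=1.0):
    """Nonzero root of dN/dt = N (b - d): N* = (b1 + A_bar) * (b0 - d)."""
    return (b1 + A_bar) * (b0 - death_rate(B, A_bar, c, psi, n))


if __name__ == "__main__":
    B, alpha, c, psi = 2.0, 0.5, 0.1, 1.0   # illustrative asocial resident
    A_bar = alpha * B                        # A = alpha * B when s = 0
    N_star = equilibrium_density(B, A_bar, c, psi)
    # At N*, births exactly balance deaths (difference should be ~0)
    print(N_star, birth_rate(N_star, A_bar) - death_rate(B, A_bar, c, psi))
    # psi = 0 removes the knowledge offset; larger psi lowers the death rate
    print([death_rate(B, A_bar, c, p) for p in (0.0, 1.0, 2.0)])
```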

Asocial learners ($A = \alpha B$)
To determine the adaptive dynamics of brain size, consider a mutant (designated by subscript "m") with brain size $B_m$. Based on Equation 1, this mutant's adaptive knowledge will be $A_m = \alpha B_m$, since $s = 0$. We use the same ecological assumptions as before for a mutant type $B_m$, and assume the mutant is rare and (initially) growing in a resident population that is at its ecological equilibrium $N^*$. The per capita growth rate of the mutant, its invasion fitness, is then:
$$f(B_{res}, B_m) = b(N^*, A_m) - d(B_m, A_m) = 1 - \frac{N^*}{1 + \alpha B_m} - c\,B_m^{n}e^{-\psi\alpha} \qquad (7)$$
where $N^* = (1 + \alpha B_{res})(1 - c\,B_{res}^{n}e^{-\psi\alpha})$ is the resident's equilibrium density. To examine the adaptive dynamics of brain size, we need to calculate the selection gradient by taking the derivative of the invasion fitness with respect to the mutant trait $B_m$ and evaluating this derivative at the resident value $B_{res}$. To determine whether the resulting equilibria are stable, we calculate the second derivative. If the second derivative is negative, then the value is a convergent stable ESS. For those unfamiliar with this approach, it may help to use a physical analog: distance, speed, and acceleration (or more accurately, displacement, velocity, and acceleration). The derivative of distance (metres) with respect to time is speed (metres per second). The second derivative (the derivative of speed) is acceleration (metres per second per second). The adaptive dynamics approach is the equivalent of looking for where an object is stationary (i.e. speed, the derivative of distance, is 0). It then confirms that these stationary points are convergent by checking that objects decelerate around them (i.e. acceleration, the second derivative, is negative). If the second derivative were positive, objects would speed up and move away from this stationary point. In the present case, there would be positive selection for mutants away from this equilibrium. Let us calculate the selection gradient for brain size:
$$\left.\frac{\partial f}{\partial B_m}\right|_{B_m = B_{res} = B} = \frac{\alpha\left(1 - c\,B^{n}e^{-\psi\alpha}\right)}{1 + \alpha B} - n\,c\,B^{n-1}e^{-\psi\alpha} \qquad (8)$$
From Equation 8 we can see that if $n > 1$, then $\partial f/\partial B_m < 0$ for large $B$ and $\partial f/\partial B_m > 0$ for small $B$. This suggests that there is some intermediate ESS value for brain size ($B^*$).
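The selection gradient can be cross-checked numerically. This sketch assumes the asocial invasion fitness $f(B_{res}, B_m) = 1 - N^*/(1 + \alpha B_m) - cB_m^n e^{-\psi\alpha}$, with $N^* = (1 + \alpha B_{res})(1 - cB_{res}^n e^{-\psi\alpha})$ and $b_0 = b_1 = 1$. It compares the analytic gradient with a central finite difference in the mutant trait. In the physical analogy, this measures the "speed" at each resident value: positive for small brains, negative for large ones. Function names and parameter values are hypothetical.

```python
import math


def resident_density(B, alpha, c, psi, n=2):
    """Equilibrium density of an asocial resident (A = alpha * B, b0 = b1 = 1)."""
    return (1.0 + alpha * B) * (1.0 - c * B**n * math.exp(-psi * alpha))


def invasion_fitness(B_res, B_m, alpha, c, psi, n=2):
    """Per capita growth rate of a rare mutant B_m at the resident's N*."""
    N_star = resident_density(B_res, alpha, c, psi, n)
    birth = 1.0 - N_star / (1.0 + alpha * B_m)
    death = c * B_m**n * math.exp(-psi * alpha)
    return birth - death


def gradient_analytic(B, alpha, c, psi, n=2):
    """Analytic d f / d B_m evaluated at B_m = B_res = B."""
    E = math.exp(-psi * alpha)
    return alpha * (1.0 - c * B**n * E) / (1.0 + alpha * B) - n * c * B**(n - 1) * E


def gradient_numeric(B, alpha, c, psi, n=2, h=1e-5):
    """Central finite difference in the mutant trait, resident held at B."""
    f_plus = invasion_fitness(B, B + h, alpha, c, psi, n)
    f_minus = invasion_fitness(B, B - h, alpha, c, psi, n)
    return (f_plus - f_minus) / (2.0 * h)


if __name__ == "__main__":
    alpha, c, psi = 0.5, 0.1, 1.0   # illustrative values, n = 2
    for B in (0.5, 1.0, 2.0, 3.0):
        print(B, gradient_analytic(B, alpha, c, psi), gradient_numeric(B, alpha, c, psi))
```

A resident is always selectively neutral against itself ($f(B, B) = 0$). The sign change in the gradient between $B = 0.5$ and $B = 3$ brackets the intermediate singular point.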
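It is straightforward to check that the second derivative of the invasion fitness with respect to the mutant trait, evaluated at the resident trait,
$$\left.\frac{\partial^2 f}{\partial B_m^2}\right|_{B_m = B_{res} = B} = -\frac{2\alpha^2 N^*}{(1 + \alpha B)^3} - n(n-1)\,c\,B^{n-2}e^{-\psi\alpha},$$
is negative for any viable resident population ($N^* > 0$) when $n > 1$. Because the selection gradient (Equation 8) also decreases through this point, the singular strategy $B^*$ is a CSS (i.e., a convergent stable ESS). This equilibrium brain size (i.e. where $\partial f/\partial B_m = 0$) is difficult to solve for with a generic polynomial exponent $n$. To calculate a solution, we can select a reasonable polynomial (e.g. $n = 2$, which we use in the simulation) and solve $\partial f/\partial B_m = 0$. For positive brain sizes and $n > 1$, the relationship between brain size and the death rate is superlinear and monotonic, so our qualitative results should be robust to the specific polynomial used. With $n = 2$, setting Equation 8 to zero gives the quadratic $3\alpha c\,e^{-\psi\alpha}B^2 + 2c\,e^{-\psi\alpha}B - \alpha = 0$, whose positive root is the equilibrium brain size:
$$B^* = \frac{-c + \sqrt{c^2 + 3\alpha^2 c\,e^{\psi\alpha}}}{3\alpha c} \qquad (9)$$
We need to compare the equilibrium brain size among asocial learners expressed in Equation 9 with the equilibrium brain size among social learners, so let's now calculate the dynamics for social learners.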
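As a numerical check on the CSS claim, this sketch evaluates the $n = 2$ closed form $B^* = (-c + \sqrt{c^2 + 3\alpha^2 c\,e^{\psi\alpha}})/(3\alpha c)$ for asocial learners. It then confirms three things at $B^*$: the selection gradient vanishes, the second derivative in the mutant trait is negative, and the gradient is positive just below $B^*$ and negative just above it. It assumes the same invasion fitness as above with $b_0 = b_1 = 1$. Function names and parameter values are hypothetical.

```python
import math


def selection_gradient(B, alpha, c, psi, n=2):
    """Selection gradient for asocial learners at B_m = B_res = B (b0 = b1 = 1)."""
    E = math.exp(-psi * alpha)
    return alpha * (1.0 - c * B**n * E) / (1.0 + alpha * B) - n * c * B**(n - 1) * E


def second_derivative(B, alpha, c, psi, n=2):
    """d^2 f / d B_m^2 at B_m = B_res = B; negative whenever N* > 0 and n >= 1."""
    E = math.exp(-psi * alpha)
    N_star = (1.0 + alpha * B) * (1.0 - c * B**n * E)
    return -2.0 * alpha**2 * N_star / (1.0 + alpha * B)**3 - n * (n - 1) * c * B**(n - 2) * E


def equilibrium_brain_size(alpha, c, psi):
    """Positive root of 3*alpha*c*E*B^2 + 2*c*E*B - alpha = 0 (n = 2), E = exp(-psi*alpha)."""
    return (-c + math.sqrt(c**2 + 3.0 * alpha**2 * c * math.exp(psi * alpha))) / (3.0 * alpha * c)


if __name__ == "__main__":
    alpha, c, psi = 0.5, 0.1, 1.0  # illustrative values
    B_star = equilibrium_brain_size(alpha, c, psi)
    print("B* =", B_star)
    print("gradient at B*:", selection_gradient(B_star, alpha, c, psi))
    print("second derivative at B*:", second_derivative(B_star, alpha, c, psi))
    # Convergence: selection pushes B up below B* and down above it
    print(selection_gradient(0.9 * B_star, alpha, c, psi) > 0 > selection_gradient(1.1 * B_star, alpha, c, psi))
```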

Social learners ($A = \beta\gamma B$)
To determine the adaptive dynamics of brain size, consider a mutant (designated by subscript "m") with brain size $B_m$. Based on Equation 1, this mutant's adaptive knowledge will be $A_m = \beta\gamma B_m$, since $s = 1$ and $\hat{A} = \gamma B_m$. We use the same ecological assumptions as before for a mutant type $B_m$, and assume the mutant is rare and (initially) growing in a resident population that is at its ecological equilibrium $N^*$. The per capita growth rate of the mutant, its invasion fitness, is then:
$$f(B_{res}, B_m) = b(N^*, A_m) - d(B_m, A_m) = 1 - \frac{N^*}{1 + \beta\gamma B_m} - c\,B_m^{n}e^{-\psi\beta\gamma} \qquad (10)$$
where $N^* = (1 + \beta\gamma B_{res})(1 - c\,B_{res}^{n}e^{-\psi\beta\gamma})$ is the resident's equilibrium density. As before, to examine the adaptive dynamics of brain size, we calculate the selection gradient by taking the derivative of the invasion fitness with respect to the mutant trait $B_m$ and evaluating this derivative at the resident value $B_{res}$. To determine whether the resulting equilibria are stable, we calculate the second derivative. If the second derivative is negative, then the value is a convergent stable ESS. Let us calculate the selection gradient for the brain size of social learners:
$$\left.\frac{\partial f}{\partial B_m}\right|_{B_m = B_{res} = B} = \frac{\beta\gamma\left(1 - c\,B^{n}e^{-\psi\beta\gamma}\right)}{1 + \beta\gamma B} - n\,c\,B^{n-1}e^{-\psi\beta\gamma} \qquad (11)$$
As with asocial learners, Equation 11 shows that if $n > 1$, then $\partial f/\partial B_m < 0$ for large $B$ and $\partial f/\partial B_m > 0$ for small $B$. This suggests that there is some intermediate ESS value for brain size ($B^*$). As before, it is straightforward to check that the second derivative of the invasion fitness with respect to the mutant trait, evaluated at the resident trait, is negative for any viable resident population. Therefore the singular strategy $B^*$ is a CSS (i.e., a convergent stable ESS). We can set $n = 2$ and calculate this equilibrium brain size (i.e. where $\partial f/\partial B_m = 0$):
$$B^* = \frac{-c + \sqrt{c^2 + 3\beta^2\gamma^2 c\,e^{\psi\beta\gamma}}}{3\beta\gamma c} \qquad (12)$$
Equation 12 has the same form as Equation 9, with $\alpha$ replaced by $\beta\gamma$, but the equilibrium brain sizes for asocial and social learners will differ. $B^*$ increases with this coefficient, and entering the realm of social learning requires $\beta\gamma > \alpha$. Social learners will therefore, ceteris paribus, have larger equilibrium brain sizes than asocial learners. Note that this prediction, that social learners will have larger brains than asocial learners, is an outcome of the model, not an assumption.
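To illustrate the asocial versus social comparison, this sketch assumes a single $n = 2$ closed form, $B^* = (-c + \sqrt{c^2 + 3x^2 c\,e^{\psi x}})/(3xc)$. It uses $x = \alpha$ for asocial learners and $x = \beta\gamma$ for social learners. It also checks that the social learners' selection gradient vanishes at their $B^*$. Parameter values are illustrative and chosen so that $\beta\gamma > \alpha$. Function names are hypothetical.

```python
import math


def equilibrium_brain_size(x, c, psi):
    """B* for n = 2; x = alpha (asocial learners) or x = beta * gamma (social learners)."""
    return (-c + math.sqrt(c**2 + 3.0 * x**2 * c * math.exp(psi * x))) / (3.0 * x * c)


def social_gradient(B, beta, gamma, c, psi, n=2):
    """Selection gradient for social learners (s = 1, A_m = beta*gamma*B_m) at B_m = B_res = B."""
    x = beta * gamma
    E = math.exp(-psi * x)
    return x * (1.0 - c * B**n * E) / (1.0 + x * B) - n * c * B**(n - 1) * E


if __name__ == "__main__":
    # Illustrative values with beta * gamma > alpha, i.e. the social-learning regime
    alpha, beta, gamma, c, psi = 0.5, 0.9, 1.0, 0.1, 1.0
    B_asocial = equilibrium_brain_size(alpha, c, psi)
    B_social = equilibrium_brain_size(beta * gamma, c, psi)
    print(f"asocial B* = {B_asocial:.3f}, social B* = {B_social:.3f}")
    print("social gradient at its B*:", social_gradient(B_social, beta, gamma, c, psi))
```

With these values the equilibrium brain size is roughly 1.77 for asocial learners and 2.52 for social learners.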
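Transmission fidelity, asocial learning efficacy, and the payoff for adaptive knowledge (e.g. richness of the environment) also affect the equilibrium brain size. From Equations 9 and 12, $B^*$ increases with $\alpha$ among asocial learners, with $\beta$ among social learners, and with $\psi$ in both regimes.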
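A quick numerical sweep illustrates these dependencies. It assumes the $n = 2$ closed form $B^* = (-c + \sqrt{c^2 + 3x^2 c\,e^{\psi x}})/(3xc)$, with $x = \alpha$ for asocial learners and $x = \beta\gamma$ for social learners. For $c > 0$, $\psi \ge 0$ and $x > 0$, implicit differentiation of the underlying quadratic shows that $B^*$ increases in both $x$ and $\psi$. The monotone pattern below is therefore not an artifact of the chosen grid. Parameter values and helper names are hypothetical.

```python
import math


def equilibrium_brain_size(x, c, psi):
    """B* for n = 2 (Equations 9 and 12), with x = alpha or x = beta * gamma."""
    return (-c + math.sqrt(c**2 + 3.0 * x**2 * c * math.exp(psi * x))) / (3.0 * x * c)


def is_increasing(values):
    """True if every element is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    c, gamma, psi = 0.1, 1.0, 1.0  # illustrative values
    sweeps = {
        # transmission fidelity among social learners (x = beta * gamma)
        "beta": [equilibrium_brain_size(b * gamma, c, psi) for b in (0.6, 0.7, 0.8, 0.9)],
        # asocial learning efficacy among asocial learners (x = alpha)
        "alpha": [equilibrium_brain_size(a, c, psi) for a in (0.2, 0.3, 0.4, 0.5)],
        # payoff to adaptive knowledge, holding x = alpha = 0.5
        "psi": [equilibrium_brain_size(0.5, c, p) for p in (0.0, 0.5, 1.0, 2.0)],
    }
    for name, values in sweeps.items():
        print(name, [round(v, 3) for v in values], is_increasing(values))
```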