Impact of crowding on the diversity of expanding populations

Significance Growing cell populations become densely packed as cells proliferate and fill space. Crowding prevents spatial mixing of individuals, significantly altering the evolutionary outcome from established results for well-mixed populations. Despite the fundamental differences between spatial and well-mixed populations, little is known about the impact of crowding on genetic diversity. With microbial colonies on plates, we show that the allele frequency spectrum is characterized by a power law for low frequencies. Using cell-based simulations and microfluidic experiments, we identify the origin of this distribution in the volume-exclusion interactions within the crowded cellular environment, enabling us to extend these findings to a broad range of dense populations. This study highlights the importance of cellular crowding for the emergence of rare genetic variants.

In the limit where ∆ 1/2 (or equivalently n λ) This relationship between position at birth and final clone size is consistent with what we predict in the main text for 30 infinitesimal clones (Eq. 1 in the main text) and find in cell-based simulations (Fig. 3b). 31 This approximation underestimates the final clone size, with the largest errors corresponding to mutations that occur 32 closest to the front (∆ λ). However, the approximation works very well even for the largest possibly non-surfing clones 33 that originate at ∆ = σ0, corresponding to a relative error of |(nexact − napprox)/nexact| = 0.1. Clones born closer to the front 34 (∆ < σ0) tend to surf (Fig. 3c), leading to a qualitative change in the clone size distribution that is highly dependent on the 35 granular nature of the cell colony. 36 The relationship between position at birth and final clone size, which holds for each clone individually, translates into a 37 prediction for the global clone size distribution when combined with the probability of observing a mutation at distance ∆. We 38 assume that the mutation rate is proportional to the growth rate (no death), meaning that mutations occur with a certain 39 probability only when a new cell is born. Since growth is constant within the growth layer, the probability that a mutation will 40 occur at ∆ (for 0 < ∆ < λ) is P (∆) = λ −1 . By inverting Eq. 2 in order to calculate d∆/dn 41 ∆ = 1 2 e n/λ + 1 e n/λ − 1 , we can obtain the probability of observing a clone of size n 43 P (n) = P (∆) d∆ dn = e n/λ λ 2 (e n/λ − 1) 2 , [5]

Extension to non-uniform growth layer
We built an ODE model able to explain the form of the clone size distribution we observe for an arbitrary one-dimensional 48 growth profile k(z). At time t = 0, a mutant cell is born a distance ∆ behind the front. We assume that the growth rate k(z) 49 depends only on the z position of the cell measured as distance from the front in units of cell widths and that k(z) is constant 50 over the length of one cell. The clone size n grows according to where z(t) is mutant clone's position at time t and where we have taken the limit that the clone size is smaller than the 53 lengthscale on which k decays (the growth layer size). Formally, the final size of the clone will be:

58
A key assumption here is that the growth profile is not time varying in the frame moving with the front. Now, we can write 59 the asymptotic clone size as: Note that in order for the asymptotic area to be well-defined, the integral over z must be finite. We make the following change 62 of variables: Therefore, where we have dropped the subscript on n. We see that κ∞ must be a finite constant to have a finite asymptotic area, so this 68 further constrains our choice of k(z).

69
The clone size distribution P (n) can be written as and from the relationship above we have that If we assume that the probability of mutating is proportional to the growth rate, then P (∆) ∼ k(∆). It thus follows that The result holds for exponential growth profiles, power law profile with small and large z cutoffs, and Monod type profiles of 76 the form e −z 1+e −z . We explicitly test this prediction for mechanical cell-based simulations with exponential profiles in Fig. S9.

77
Another interesting aspect of this analysis is that the final bubble area depends on its position at birth through the term 78 κ ∆ . We see that κ ∆ is a measure of the total amount of available biomass between the bubble's birth position and the front.

79
In other words, the bubble size is dictated by global properties of the nutrient profile. If the mutations are not neutral, the mutant population will grow at a different rate compared to the WT. We assume that this 82 difference is given by a multiplicative constant, so that if the WT grows according to k(z), the mutant grows according to 83 (1 + s)k(z) where s > −1 is the "fitness difference" between the two strains.

84
Using the same analytical derivation as in the previous section, we find that . 86 Note that for neutral mutations (s = 0) we recover the old result. We now get for the probability distribution (conditioned on it becomes more broad as we get to larger positive fitness effects.

Selection: distribution of fitness effects
where n s s is related to the generating function of the distribution of fitness effects, e zs s, evaluated at z = log n. 99 Let's assume s ∼ N (0, σ), so we have: where we have dropped any constant factors. We can complete the square and perform the integral to get: since σ is assumed to be small. to size σ f = σ0λ/(∆ + σ0/2). Inverting this relationship gives ∆ = λ/n − 1/2, where the length scale is rescaled by σ0. The 111 result slightly deviates from Eq. 1 in the main text, but P (n) ∝ d∆/dn ∝ n −2 holds in this case as well. The clone size distribution for non-surfing clones behaves like n −2 when assuming that the probability P (∆) for a mutation to appear at position ∆ is proportional to the net growth rate k(∆) in such position. However, this assumption might break under certain conditions, for instance if a non-homogeneous death rate is present. In the general case, it still holds that the probability of observing a mutant of size n is and the relationship between final size n and position at birth ∆ remains where k(z) is the net growth rate at position z. However, further simplifications cannot be made, leading to the general expression where the notation ∆(n) highlights that ∆ is a function of n. The functional form of the clone size distribution P (n) will then 114 in general depend on the specific form of P (∆) and k(∆). 115 For illustration, we report here an example in which we define the net growth rate k(z) = exp(−z/λ) = α(z) − β(z), where α(z) represents the birth rate and is proportional to the mutation rate P (z), while β(z) is the death rate. In this case, n = [1 − e −∆/λ ] −1 and .
If α(z) is uniform along z and β(z) = 1 − e −z/λ , then the clone size distribution P (n) ∝ 1 n(n−1) , which tends to n −2 for large 116 n, but deviates from it at small n. This would correspond to the case in which replication rate is not affected by position, but 117 death rate increases as we move deeper inside the colony, for instance because of the accumulation of toxic waste.
118 7. Inferring mutation rate from non-surfing clones distribution and localized deep sequencing related quantity to P (Clone size > n) that can be observed experimentally is the number of mutations M (Clone size > x) that 122 are carried by at least a frequency x of the population (Fig. 4). Because the total number of mutations in a population of final 123 size N is approximately µN , where µ is the mutation rate per replication, it follows that M (Clone size > x) = µN P (Clone size 124 > n) = µN/n = µ/x. It is therefore possible to estimate the mutation rate µ from the prefactor of the non-surfing clone regime 125 of M (Clone size > x) (low-frequency range of the black line in Fig. 4).

126
If the population is too large or sequencing coverage is too low to observe the non-surfing clone regime, we find that localized 127 bulk deep sequencing can be used (cyan line in Fig. 4). In this case, the observed frequencyx of a mutation represents the 128 frequency in the sequenced sub-populationN < N . However, also the total number of mutations in the sub-population scales (b) Clone size distribution rescaled by λ shows that the n −1 regime extends over the range n = 1 to n = λ. For n > λ, the clone size distribution is dominated by surfing bubbles (Fig. 3a).   S3. A schematic of the engineered S. cerevisiae strain yJK10 that stochastically switches the color from RFP (yEmRFP) to GFP (yEGFP). The switching rate is tunable with β-estradiol, and the fitness advantage/disadvantage of switched cells can be tuned with drugs, hygromycin B and cycloheximide due to an additional cycloheximide resistance allele cyh2∆::cyh2r (1). The genotype of the strain is as follows: W303 MATa cyh2∆::cyh2-Q37E-cs hmlα2∆::R ho∆::prSCW11-cre-EBD78-natMX ura3∆::prGPD-loxP-yEmRFP-tCYC1-CYH2-hygMX-loxP-yEGFP-tADH3.  S4. (a) A snapshot from the particle image velocimetry analysis (2, 3). Each arrow shows the parallel component of the displacement of a 20x20 µm 2 region (32x32-pixel) during one time frame (10 minutes). For the sake of visibility, the length of arrows is rescaled by factor of 2, and the number of arrows is reduced from 37x37 to 13x13. (b) Under our usual experimental conditions (nutrient-rich condition, 2%-glucose YPD media), the velocity field is linear along the growth direction, showing that all cells grow at the same rate. To make sure this method would capture a drop in growth rate, we replicated this experiment under nutrient-poor condition (0.01%-glucose YPD media). As expected, the overall velocity is reduced (slower growth rate) and heterogeneous along the growth direction (see inset), showing a slow down in the middle of the chamber due to nutrient depletion. The error shows the standard deviation of the statistics across horizontal positions and over 100 (nutrient-rich) and 140 (nutrient-poor) time points.  The ellipse-shaped cells in these simulations have aspect ratio = 1 at birth and grow to aspect ratio = 2. These simulations use conjugate gradient energy minimization (5) rather than overdamped molecular dynamics as used in the main text (Fig. 3). Ellipse data is shown in cyan, budding data is shown in red, and budding data from the main text (Fig. 3b)