An exactly solvable, spatial model of mutation accumulation in cancer

One of the hallmarks of cancer is the accumulation of driver mutations which increase the net reproductive rate of cancer cells and allow them to spread. This process has been studied in mathematical models of well mixed populations, and in computer simulations of three-dimensional spatial models. But the computational complexity of these more realistic, spatial models makes it difficult to simulate realistically large and clinically detectable solid tumours. Here we describe an exactly solvable mathematical model of a tumour featuring replication, mutation and local migration of cancer cells. The model predicts a quasi-exponential growth of large tumours, even if different fragments of the tumour grow sub-exponentially due to nutrient and space limitations. The model reproduces clinically observed tumour growth times using biologically plausible rates for cell birth, death, and migration rates. We also show that the expected number of accumulated driver mutations increases exponentially in time if the average fitness gain per driver is constant, and that it reaches a plateau if the gains decrease over time. We discuss the realism of the underlying assumptions and possible extensions of the model.


The Model
We model the tumour as a collection of microlesions made of cancer cells (Fig. 1A). Microlesions do not interact with one another; this is crucial for the analytic solubility of the model. This rather strong assumption could be justified by interpreting microlesions as spatially-separated metastases: however, even if microlesions are fragments of the same primary tumour, numerical simulations (ref. 12, Extended Data) indicate that as long as microlesions are separated by normal tissue, their growth rate is only minimally affected.
Each microlesion is fully characterized by two numbers: its age a and its type n which describes the genetic makeup of the lesion. For simplicity, we take n to be the number of driver mutations in the cell which initiated the lesion. This convention does not need all microlesions to be genetically the same, but only for their evolution to be completely determined by the two variables {n, a}. In what follows we shall assume that lesion {n, a} contains at most two different genotypes, n and n + 1, with relative abundances r n (a) and 1 − r n (a), respectively. Note that we do not model individual cells explicitly: rather, we model only the behaviour of microscopic lesions which contain thousands or even millions of cells.
The dynamics of the model is governed by three processes: growth, the emergence of new drivers caused by mutations, and the creation of new microlesions by migration.

Growth.
A microlesion {n, a} grows deterministically and its volume at age a is given by V n (a). The proportion of the n + 1-th genotype also grows deterministically and equals 1 − r n (a). The functions V n (a) and r n (a) depend on the growth model of a single, isolated lesion and we shall specify them later (Results, "Multiple driver mutations", equation (29) onwards). The spatial character of the model manifests itself in our choice of these functions; for example, V n (a) is very different for three-dimensional growth in which replication occurs only on the surface, and "well-mixed" growth in which growth occurs in the whole volume and all cells replicate with identical rates. Driver accumulation. Cells in microlesion {n, a} can gain a new driver n + 1 with some small but non-zero probability. This represents the occurrence of genetic mutations and is accounted for in the form of r n (a), the fraction of cells that do not contain the new n + 1-th driver. Therefore, a lesion changes its genetic composition in a deterministic way in this model.

Migration.
A lesion {n, a} seeds new microlesions with rate φ n (a). A new lesion has age a = 0 and, with probability r n (a), it inherits the same type n as the parent lesion, or with probability 1 − r n (a) it is assigned type n + 1. This accounts for what would have happened if we modelled individual cells within the lesion, and cells of different types migrated from the lesion with probabilities proportional to their abundances. For example, the probability that a cell of type n + 1 migrated and established a new lesion of type n + 1 would be equal to the fraction 1 − r n (a) of cells of type n + 1 in the lesion. This is the only other component of the model affected by spatial arrangement of cells.
The model defined by the above postulates is stochastic: while growth and mutation are deterministic, migration (creation of new lesions) is a random process. In what follows we shall focus on the analysis of the mean behaviour of the model, averaged over many realisations of the stochastic process, and we shall therefore ignore fluctuations: the role of fluctuations will be discussed in Section Comparison to simulations. Let f n (a, t) be the expected number of microlesions of age a at time t. The time evolution of f n (a, t) is governed by the following two equations, t n a n with the initial condition f n (a, 0) = δ n,1 δ(a) in which δ n,1 is the Kronecker delta and δ(a) the Dirac delta function. Equation (1) describes the shift of the distribution in time -the "ageing" of the microlesions. Equation (2) describes the birth of new lesions (through migration) and can be thought of as a boundary condition for Eq. (1). The function V n (a) which defines the growth of a single lesion does not appear in these equations, but it appears in the equation for the total volume of the tumour, n n n tot 1 0 Although equations (1 and 2) are quite general, their hierarchical structure allows an exact formal solution (Methods, "General solution with multiple types of microlesions", equation (52) onwards). To model cancer in this framework, we must define the functions r n (a), V n (a), φ n (a) such that they describe the growth of individual microlesions.

Results
Only one driver. We first consider three special cases of the model with only one type of microlesions. This will form a set of "null models" against which we will compare the multiple-driver models from the next section.
General solution of Eqs (1 and 2). In the case of one type of cells (no new driver mutations), r 1 (a) = 1 and r n (a) = 0 for n > 1, and Eqs (1 and 2) reduce to the McKendrick-von Foerster equation without removal 38 : t a with the boundary condition 0 and the initial condition f(a, 0) = δ(a). Note we have dropped the index "1" in f 1 (a, t) and φ 1 (a) for brevity.
is some non-negative continuous function, Θ (z) is the Heaviside step function, and the Dirac-delta term δ(t − a) accounts for the initial condition. The boundary condition (6) implies that t 0 Laplace-transforming the above equation gives where the constant term 1 is due to the singular initial condition, and the functions marked with tildes are Laplace transforms: Equation (8) enables us to calculate the distribution f(a, t), provided we can invert the Laplace transform. We shall see that this is possible in some important special cases (Results, "Multiple driver mutations"). However, even if the full distribution cannot be obtained, Eq. (8) provides us means to infer large-t behaviour of f(a, t). We note that  F s ( ) from Eq. (8) becomes singular when φ =  s ( ) 1 and behaves as ∼ −  F s s G ( ) 1/( ) near any of the roots G of the equation The inverse transform F(z) of  F s ( ) will thus contain terms ~exp(Gz). One of these terms corresponding to the largest G dominates the long-time behaviour, and the distribution in this limit reads where A is some positive constant. Equation (9) is called the Euler-Lotka equation 37,38 and its largest root G gives the (exponential) growth rate of the number of microlesions N(t) in our model as follows from Gt 0 Similarly, the total volume (3) also increases exponentially, Gt tot 0 These results are well known in the literature of mathematical ecology. Here they were re-derived using a different approach which can be adapted to more complicated cases with many cell types (new driver mutations). We shall now use these general results to investigate three special cases of cancer growth.
Surface growth. We assume that microlesions are spherically symmetric, cells replicate only on the surface, and that the centre of the lesion is static. Let the volume (number of cells) of the lesion increase with its age a as 3 where v is the speed (in cells/day) with which the radius of the ball of cells expands in time. We further assume that cells can migrate only from the surface, thus 3 2 where M is the dimensionless migration probability representing the fraction of cells on the surface which escape and go on to eventually seed new microlesions. Taking the largest root of Eq. (9) we find the asymptotic growth rate G to be 1/3 1/3 In fact, for this simple model we can find the exact expression for the number of microlesions N(t) and the total volume V tot (t). The Laplace transform of φ(a) reads 8πMv 3 /s 3 and from Eq. (8) we obtain which can be inverted: Here G is the growth rate from Eq. (15) and Θ (z) the Heaviside step. We thus obtain that the total number of microlesions increases in time as tot Gt Gt 2 which also increases exponentially with time. Figure  Surface growth with decreasing replication rate. In reality, a lesion cannot be expected to grow continuously. Growth eventually slows down due to spatial constrains and limited nutrient supply which is why many tumours never reach detectable sizes 40,41 . Let us consider a simple case of an exponential slow-down: where λ > 0 is the characteristic time scale of growth decrease. As before, we assume that proliferation occurs only on the surface, and therefore This function has a sigmoid shape and saturates at V(∞ ) = 8πv 3 /λ 3 . The formulas for φ(a), V(a) reduce to Eqs (13 and 14) when λ = 0 i.e. when there is no slow-down. We can now calculate  F s ( ): and, upon inverting the Laplace transform, we have 1/3 is the exponential growth rate of the tumour. We note that G > 0 only for sufficiently large M, thus the tumour made of microlesions whose growth slows down can grow exponentially as a whole only if sufficient migration is present. We can also calculate N(t) and the total volume exactly but since both expressions are rather lengthy and not particularly illuminating (SI Mathematica notebook), we quote here only the leading terms: Gt tot 3 2 Figure 2C shows that the volume increases exponentially only if λ is small enough, otherwise growth becomes arrested after some time.
Volumetric growth. Finally, we shall consider the case in which all cells from the lesion are able to replicate and migrate. If cells replicate with rate b and migrate with rate M, we obtain thus individual lesions grow exponentially in time. As expected, Eq. (9) predicts that the whole tumour grows exponentially with rate G = b i.e. the same rate as the birth rate of individual cells.

Multiple driver mutations.
We shall now extend these results to the case in which new drivers can emerge during tumour growth. Each new driver mutation speeds up growth as a result of increased proliferation/ decreased death of cells.
Surface growth. We assume that each new driver mutation increases the expansion speed of microlesions. This can be motivated as follows: replication and death balance exactly for normal cells to ensure homoeostasis. A driver mutation increases the birth rate or decreases the death rate and hence the net growth rate becomes positive 42 . In the case of surface growth of a solid lesion considered here this net growth rate is assumed to translate into a radial expansion of the lesion with speed v n .
We first take the case v n = nv 1 , i.e., each new driver contributes equally to v n . We have So far we have not specified r n (a), the fraction of cells with n drivers in the lesion of type n. To simplify calculations, we shall take n with some positive constant μ. The above equation can be rationalized if we assume that cells on the surface of the lesion mutate from n to n + 1 drivers with a very small rate μ and that their selective advantage can be neglected. The dynamics of type n + 1 in a lesion of type n is thus effectively neutral. A mutant sector that arose at time a 0 will then have approx. (a/a 0 ) 2 cells on the surface. The fraction 1 − r n (a) of mutant cells obeys then the following equation: which can be solved yielding Eq. (31). However, for the assumed v n = vn 1 the selective advantage of type n + 1 over type n is (n + 1)/n = 1/n > 0 and therefore our assumption about the population dynamics being neutral does not hold for small n. The dynamics of non-neutral mutant sectors is quite complex 22,43 and leads to formulas for φ n (a)r n (a) which cannot be Laplace-transformed analytically. We therefore stick to a much simpler but perhaps less realistic Eq. (31). In Section "More realistic surface fractions r n (a)" we shall discuss a different form for r n (a) that is more biologically plausible whilst still resulting in analytically tractable models.
In Methods ("Surface growth with many drivers"), it is shown that the exponential growth rate of the number of microlesions of type n is The growth rate can thus become negative for sufficiently large mutation rates μ. This effect is similar to what was found in the preceding subsection ("Surface growth with decreasing replication rate"). However, in contrast to this earlier result in which replication slowed down over time, resulting in fewer cells able to migrate and establish new microlesions, here the cells only change their type while their replication and migration is not affected. The total number of cells thus increases over time; we shall show this explicitly later in this section. A negative G n only means that a subpopulation of type n cannot grow exponentially at long times: once a strain with G n > 0 emerges, it rapidly dominates the population. The volumes of cells of type n, V tot,n (t), as well as the total volume V tot (t) can be obtained analytically (Methods, "Surface growth with many drivers" and Supplemental Material). Figure 3A shows example curves V tot,n (t). The asymptotic behaviour of each subpopulation V tot,n ~ exp(G n t) is clearly visible. Since G n increases with n, microlesions containing more driver mutations grow faster and their population eventually overtakes that of microlesions with fewer drivers. For this reason, the total volume V tot (t) obtained by summing up all V tot,n (Eq. (3)) increases faster than exponentially, as is plainly visible in the plots (Fig. 3A). In the large-t limit, we can show (Methods) that the volume behaves asymptotically as We can also calculate the average number of drivers (SI Mathematica notebook). Figure 3G shows that 〈 n(t)〉 is small until late times when it rapidly increases. In fact, in the limit t → ∞ it can be shown (Methods) that (G-I) The average number of drivers 〈 n(t)〉 : exact calculation (black) and asymptotic formula (dashed blue). The columns are as follows. Panels A,D,G: from the section on the "Surface growth" model with parameters v 1 = 0.0475, M = 10 −6 , μ = 2 · 10 −5 (all rates in units day −1 ). The parameters v 1 , M have been chosen such that the tumour reaches about 10 11 cells in 15 years (~5500 days) and accumulates 3-4 drivers, μ was assumed to be the same as in ref. 8. Black dashed curve in A is the total volume V tot derived from Eqs (3) and (57) with F n (z) given by (61). Solid black curve in G is the exact analytical calculation from Eqs (4) and (61), blue curve the asymptotic approximation (35). Panels B,E,H: corresponding to the section on the "Surface growth with decreasing replication rate" model for v 1 = 0.053, M = 10 −6 , μ = 2 × 10 −5 , λ = 10 −3 . Blue curve in panel H is the asymptotic approximation (37). Panels C,F,I: volumetric growth for b = 0.0026, M = 10 −5 , μ = 2 · 10 −5 . Solid black curve is the exact analytical calculation derived from Eqs (3) and (57) with F n (z) given by (77). Blue curve is the asymptotic approximation derived from (41).
G t 1 1 2 3 1 so the number of drivers accumulates exponentially over time. Note that the asymptotic formulas derived here are quite accurate, even for finite times relevant to tumour growth.
Slow-down surface growth. If growth of individual lesions slows down over time (as in "Surface growth with decreasing replication rate"), it can be easily obtained that Note that the differences G n+1 − G n do not depend on λ and are the same as in the case without slow down (Results, "Surface growth"). We shall see that this causes driver mutations to accumulate at the same rate as before. The average number of drivers reads (Methods) which turns out to be identical to Eq. (35) when we observe that G 1 from Eqs (33) and (36) differ only by λ. This means that the average number of drivers increases approximately exponentially over time with a rate that is not affected by growth of individual microlesions slowing down with age, and is the same for all λ. Example growth curves V tot,n (t) and the average number of drivers are presented in Fig. 3B,E,H.
Volumetric growth. In the case of volumetric growth i.e. cells replicating in the whole volume of the lesion, we have n n where b n = nb is the net replication rate of cells with n drivers. We assume that r n (a) = exp(− μa) exactly as for the surface growth models (31). This formula correctly describes the proportion of cells of type n only if type n + 1 does not have a selective advantage. In the case of selective advantage we are dealing with here, the formula is only approximately correct, as detailed in Section "More realistic surface fractions r n (a)". In the limit of long time we obtain (Methods) that the total volume and the average number of drivers behave asymptotically as bt/2 so that the total volume increases faster than exponentially, and drivers accumulate exponentially, similar to the surface growth models from previous sections. We can also compute exact expressions for V tot (t) and 〈 n(t)〉 (Methods). Figure 3C,F,I shows the volume and the number of driver mutations as a function of time. Note that to reach both the same size as in the surface growth model (of 10 11 cells) and a reasonable number of drivers (more than two), the net increment b of the growth rate per driver has to be very small.
Driver mutations with decreasing selective advantage. In all previous cases, virtually all drivers accumulated at a late stage during the growth of the tumour. In this section we shall study the case when the selective advantage of the first three drivers is much bigger than that of all subsequent drivers. Statistical analysis of cancer incidence rates suggests that this is indeed the case for some lung and colorectal cancers 6 . We take the surface growth model with the following v n : n 1 with some v 1 and a small  1  . The expansion speed increases fast for n = 2, 3 and slow for n ≥ 4 drivers. Figure 4 shows that now most drivers accumulate earlier, before the tumour reaches a detectable size (V tot < 10 9 ). This causes the final tumour to become much less heterogeneous because almost all cells have the same driver mutations. Moreover, the total volume grows approximately exponentially for long times, rather than super-exponentially as in the previous models. These results are in agreement with recent experimental evidence on intra-tumour heterogeneity in colorectal cancer 44 .
Scientific RepoRts | 6:39511 | DOI: 10.1038/srep39511 More realistic surface fractions r n (a). So far we have assumed the fraction of cells with n drivers to be r n (a) = exp(− μa) (see Eq. (31)). This simplifies calculations, but as already mentioned is not very realistic. In particular, as it is independent of n, Eq. (31) cannot take into account the expansion of the subpopulation of cells with n + 1 drivers due to their faster growth. We will now derive a more realistic form of r n (a) which is still simple enough to lead to analytic results for the total volume and the average number of drivers.
Recall that r n (a) is interpreted as the fraction of cells of genotype n in a type-n lesion that are able to migrate and establish new lesions. In the volumetric growth model, this would correspond to all cells in the lesion that do not have the n + 1 driver, while in the surface growth model only cells present on the surface contribute to r n (a).
Let us first consider the volumetric model. If b n is the growth rate of type-n, the fraction 1 − r n (a) of new mutant cells obeys the following equation: n n n n n 1 The first term (b n+1 − b n ) (1 − r n (a))r n (a) corresponds to the rate with which the population with n + 1 drivers increase due to selective advantage, and the second term μr n (a) accounts for new mutations. Equation (45) is of logistic type and can be solved yielding We can see that for small a the above expansion agrees with that of Eq. (31) which is the approximation we used before: the two formulae for r n (a) can therefore be expected to agree for a < μ −1 . Although the exact form of r n (a) from the above formula leads to a Laplace transform of F n (t) (Methods) that cannot be inverted analytically, the Taylor expansion can be used instead; in the Supplementary Mathematica notebook we show example formulas for V tot (t) for this case.
We will now consider the surface growth model. Note that we have an additional complication here: only cells that are on the surface are able to replicate and migrate, and if these processes are stochastic, it is possible that a random fluctuation causes a population of mutant cells to disappear from the surface. For example, if the surface is rough and fluctuating, a mutant strain can be "swallowed" by fluctuations in the growing surface if it does not "surf " on the expanding frontier of the lesion and fails to establish a macroscopic sector fast enough 45,46 . We therefore need to consider the "surfing probability" P surf that if a mutant was generated at time (lesion age) a 0 , it is still on the surface at time a.
Let us focus on a specific example of surface growth: Eden-like growth from ref. 12. It has been shown 43 that the probability P surf that a mutant strain remains on the surface is approximately P surf ≈ c/a 0 , i.e., inversely proportional to the age at which the first mutant cell has been created. This inverse proportionality law is likely to be true for more general models that fall into the KPZ universality class 47 . The proportionality constant c can be thought of as a "surfing time", and while we are not aware of an ab initio estimate, our simulations indicate that it is on the order of 1/v n days for the model considered in ref. 12. Since this constant depends on the microscopic details of the model, for the sake of generality we shall not ascribe a specific value in the formulas below. New mutants emerge in this model with a rate equal to the mutation probability p μ times the rate of expansion of the surface, which for sufficiently large (and old) lesions of age a will scale as a 2 . Following ref. 22, we calculate the probability r n that a randomly selected surface cell is non-mutant at radius ρ as (c.f. equations (19), (23), and (25) in ref. 22). The formula (47) gives the fraction of type-n cells, but as a function of the radius rather than the age a. Equation (47) also accounts for mutant cells that remained on the surface through the factor P surf . By inserting ρ = v n a and P surf = c/a we easily obtain that Equation (50) is quadratic in a and leads to an analytically solvable model, see the Supplementary Mathematica notebook. Notice that the fraction r n (a) can become negative for η > a 1/ n which limits the applicability of (50) to times shorter than η 1/ n . This problem can be partially alleviated by including higher order terms in the Taylor expansion of Eq. (48) at the expense of increasing calculation time.
Comparison to simulations. We compared the results obtained in previous sections with numerical simulations of two models: our model as defined in section 1 in which growth and mutations are deterministic but migration of cells is a stochastic process, and the Eden-like lattice model 12 in which all these processes are stochastic.
Computer simulations of the original model. We performed numerical simulations of the surface growth model. We treated individual lesions as balls of volume π v a (4 /3)( ) n 3 , where a was the age of the lesion. New lesions were established with rate φ π = a M v a ( ) 4 n n 3 2 . A new lesion was assigned type n with probability r n (a) = exp(− μa), and type n + 1 with probability 1 − r n (a). We assumed that mutations had the same effects on growth as in Driver mutations with decreasing selective advantage: three strong drivers and a small fitness advantage  for each additional driver after the first three. The expansion speeds v n were thus given by Eqs (42)(43)(44).
In each simulation we measured the total volume, the average number of drivers per cell 〈 n〉 , and the proportion of cells with a specific number of mutations n, and then averaged those quantities over 500 such simulations. Figure 4 shows an excellent agreement between simulations and the analytic formulae at early times; a small deviation can be seen for longer times. This deviation is caused by the analytic calculations neglecting stochastic effects and reproducing only "mean" behaviour.
The impact of stochastic fluctuations can be best seen in Fig. 4C which shows the average number of drivers 〈 n〉 as a function of time t. If we first calculate the average number of drivers in each replicate simulation, and then average over simulations ("Method 1"), the obtained 〈 n〉 deviates significantly from the analytical curve (purple points in Fig. 4C). However, if we calculate volumes of cells of type V n for each simulation, average this over replicates, and then calculate = ∑ n n V V / n n tot ("Method 2"), we obtain a much better agreement (black points in Fig. 4C). The second method corresponds to what we do in the analytic calculations -we average out randomness already in Eqs (1 and 2) and use the average volumes to find 〈 n〉 . The discrepancy between Method 1 and the analytical calculations is caused by a broad and highly skewed distribution of the times by which lesions with new drivers arise in the stochastic simulation. Tumours of different sizes are weighted equally in Method 1, and hence the contribution to 〈 n〉 from tumours in which drivers arose late is significant, even though such tumours are much smaller. These tumours will however decrease 〈 n〉 which is exactly what we see in Fig. 4C.
Comparison to the Eden lattice model. In the lattice model 12 , each site is either unoccupied (which may be interpreted as being occupied by a normal cell) or occupied by a cancer cell. Each cell carries some number of driver mutations n ≥ 1. The simulation begins with a single cell with one driver mutation situated at the origin. Each time step, cells attempt to divide and produce one additional cell at a random lattice site which neighbours their own with rate b n , but the attempt is only successful if the random lattice site is empty. During replication cells can gain additional drivers with probability p μ per daughter cell. Cells also migrate and create new microlesions with rate M. Distinct microlesions are approximately spherical, and do not interact with one another.
In that the resulting cloud of microlesions are asymptotically linearly expanding spheres and migration and mutation are both stochastic, the model is qualitatively similar to our mathematical model, but has much more complicated microscopic dynamics. Moreover, in contrast to the original model considered in this work, Scientific RepoRts | 6:39511 | DOI: 10.1038/srep39511 replication and mutation are also stochastic processes. It is thus not obvious a priori that the analytical model should be a good approximation to the lattice model.
We first tested whether the fraction of mutated cells 1 − r n (a) agrees with Eq. (50) from Section More realistic surface fractions r n (a). Figure 5 shows that the average 1 − r 1 (a) obtained from computer simulations is very close to η 1 a 2 , with η 1 fitted to data from simulations of a single ball of cells. The distribution of 1 − r 1 (a) for a given a is however very broad. We thus conclude that our analytic model should be able to reproduce the average V n (t) from computer simulations, although stochastic effects may be visible in 〈 n〉 .
We then compared the computer model with the analytical model. We assumed (similarly to Eqs (42-44)) only three drivers: in the computer Eden-like model and, accordingly, equations (42)(43)(44) with v 1 ≈ 0.53b 1 [sites/day] in the mathematical model. The proportionality constant 0.53 was obtained by fitting V tot (Eq. (19) in the analytical model) to the simulation data for p μ = 0, using G 1 as the single fitting parameter (see supplementary Mathematica notebook for further details).
We finally simulated the model with p μ > 0, see Fig. 6. Since we did not know the exact relationship between the mutation rate and the selective advantage in the Eden model, and the corresponding parameters p μ ,  in the analytical model, we fixed η n in Eq. (50) by fitting the product cp μ from Eq. (49) to the numerical simulation data for the total volume V tot (t) (Fig. 6A). We then used the fitted parameters to calculate the average number of drivers. Figure 6B shows that the agreement between the two models is quite good even though the analytical model has vastly simpler dynamics than the Eden model. The analytical model slightly underestimates the number of driver mutations, and we attribute this to a broad distribution of times at which drivers first appear, as in Sec. Computer simulations of the original model.

Discussion
In this paper we study a model of cancer based on the following three processes: replication of cancer cells, mutations endowing cells with fitness advantage, and migration that causes cells to disperse. The latter process causes the tumour to become a conglomerate of microlesions. Cellular dispersal has been recently recognized as a method by which tumours 28,48 or, more generally, populations of motile cells 49 can speed up their growth. Our model applies both to local migration as well as long-range invasion involved in metastasis and is one of hallmarks of cancer 50 . In fact, the aforementioned conglomerate of lesions can also include metastatic lesions.
The strength of the model presented in this work lies in the relative ease with which its average behaviour can be obtained analytically. Perhaps surprisingly, including dispersal of cells in the model makes it easier to solve than most other spatial (non-well mixed) models, because migration effectively "smears out" spatial structure, bringing the model's qualitative behaviour closer to well-mixed models. Analytical solubility means that the model works for tumours of any size, including large masses that need to be surgically removed, and it can be thus Tumour growth. In the absence of new driver mutations and assuming sufficient migration, our model predicts that long-time growth is exponential. This is also true when individual microlesions grow sub-exponentially, and even if their growth slows down over time. Given that most tumours contain avascular areas where the lack of oxygen and glucose inhibits proliferation 51,52 , our model provides a plausible explanation how the growth of an entire tumour can still be exponential, as often observed experimentally for intermediate-size tumours. This phenomenon does not require postulating any previously unknown mechanisms, but it relies on short-range migration of cancer cells, a process that certainly occurs in nature and has been actively researched recently 53 . Figure 7A shows a region in the space of parameters M, v 1 (v 1 ≡ v in this case) for which the model predicts the total size to fall between 10 10 and 10 12 cells after 10 to 20 years -typical sizes of cancerous tumours after that time as explained earlier. Generally, the slower individual microlesions grow, the larger the migration probability needs to be to achieve the typical size, and the region of "good" parameters is approximately a straight line on the log-log plot of M versus v: M ~ v −3 . This follows from how these parameters determine the growth rate G ~ (Mv 3 ) 1/3 and is related to the assumed surface growth of tumour microlesions in 3d space. If growth slows down over time (as studied in detail in "Surface growth with decreasing replication rate"), the "good" region moves up to higher M and v 1 but it also becomes thinner. Assuming that growth occurs in a layer that is about 10 cell thick 51 , and that cells replicate with rate b = 1 day −1 , we can estimate that a selective advantage of the initial driver (b − d)/b = 0.001, … , 0.01 8 correspond to v 1 ≈ 10(b − d) = 0.01, … , 0.1. From Fig. 7A we can then deduce that the required migration rate M ≈ 10 −6 -10 −3 -a rather modest number given that neoplastic cells can be highly motile 54,55 . Note that the model would not be appropriate for very large selective advantages (~50% has been claimed for the first driver in colorectal cancer 56 ) because M would have to be less than 10 −10 in which case migration would be negligible even for macroscopic lesions. Our model is thus applicable to clonal expansions occurring after the initial driver has been acquired, because subsequent driver mutations are likely to be less potent.
If new driver mutations, each with a non-zero fitness advantage, steadily accumulate during tumour growth, our model predicts faster-than-exponential growth. While this speed-up does not have to be large and may arise only in tumours that are too big to be clinically significant, we are not aware of any evidence that super-exponential growth has been observed in human cancer. On the other hand, if only a few drivers can occur, growth will be exponential even at large times, which is consistent with the experimental evidence. We shall come back to this problem when discussing tumour heterogeneity.
Exponential or faster growth is obviously unrealistic for very large tumours for which spatial constraints become important. Many models of tumour growth have been proposed in the past 57 that account for the experimentally observed sigmoidal growth curve of many tumours. Many of these models are however phenomenological and are not based on the microscopic dynamics of tumour cells, in contrast to the model studied here. It would be interesting to learn what minimal changes our model would require to reproduce sigmoidal growth.
Genetic heterogeneity of tumours. Experimental evidence gathered over the last 6 years 10,58,59 strongly suggest that cancerous masses are genetically heterogeneous, although the exact level of heterogeneity and its importance are still under debate. In this work we explored this heterogeneity by calculating the size of clonal subpopulations of cells with different numbers of driver mutations. We showed that these subpopulations increase exponentially in size and that only one driver is initially dominant, until finally becoming replaced by a mixture of clones with two, three, and more drivers. The coexistence of multiple clones means that the time to n drivers cannot be simply calculated as the sum of the times between consecutive driver mutations. We have also shown that if each new driver increases the selective advantage of cancer cells in comparison to normal cells, the average number of driver mutations predicted by our model increases exponentially in time for all considered scenarios. This means that most drivers would accumulate late during cancer progression. There is limited evidence 44,59 that this is not true. On the other hand, if only a few first drivers have significant fitness advantage, these drivers will accumulate early during growth and the tumour will become much more homogeneous. An interesting application of our model would be to predict how strong the selective advantage of new drivers can be that would still be consistent with recently postulated neutral evolution in tumours 59 .
We can also use the model to infer the relationship between the number of drivers and the parameters of the model for "typical" tumours. Figure 7B, shows a "good" region in (M, v 1 ) which produce "typical" tumour sizes. Points in the region have been colour-coded depending on how many drivers the tumour has for a given pair (M, v 1 ). The plot shows that, unless individual microlesions grow slowly enough (small v 1 ), the number of new drivers is close to zero (only one, initial driver present). Since v 1 is related to the selective advantage of a driver, this is consistent with previous research 8 showing that significant accumulation of new drivers occurs only for small selective advantages.
Our model can be extended in several ways. For example, we have not analysed spatial distribution of drivers. While it may not be possible to do this analytically, the model proposed here can easily be simulated on a computer (as demonstrated in Section "Comparison to simulations") and perhaps extended to include spatial locations of microlesions. Another possible extension would be to consider that only a fraction of cells in the tumour ("cancer stem cells") can replicate forever, whereas the majority of cells can undergo only a few rounds of replication, as e.g. in ref. 60. This assumption would affect the growth rate and the rate at which mutations accumulate in a single lesion. Finally, since separability of the coupled systems of structured population equations (1 and 2) only depends on the validity of the infinite genome approximation, it is likely that our approach is extensible to evolution in more complex landscapes than the linear chain of mutations we have studied. In particular, the model could be extended to epistatic interactions between drivers and passangers 61 or non-equal, random fitness increments 62 .

Methods
General solution with multiple types of microlesions. We first consider the general case (1,2). Given the initial condition f n (a, 0) = δ n,1 δ(a) i.e. that there is only one lesion of type n = 1 and age zero at time t = 0, Eq. (1) implies that f n (a, t) takes the form The boundary condition (2) can be rewritten as which upon a Laplace transform and little algebra gives  where … s [ ]( )  denotes the Laplace transform of the function in the square brackets. We can now write a formal solution of our coupled system of equations for n > 1: which is the Euler-Lotka equation for each lesion type m. In the large-t limit, the number of microlesions of type n will thus grow exponentially, and the rate of this exponential growth will be the largest of all dominant roots G j for 1 ≤ j ≤ n. In particular, if each new driver increases the product r m (a)φ m (a) (as if increasing the replication rate of cells in the lesion), the number of microlesions of type n will increase exponentially with rate G n , where G n is the largest root of Eq. (58).
Surface growth with many drivers. The required Laplace transforms read From Eqs (58) and (59) we then find the exponential growth rate of the number of microlesions of type n (Eq. (33)).
Number of lesions of type n. Upon inserting Eqs (59 and 60) into Eq. (57) we obtain an expression for the Laplace transform  F s ( ) n . This expression can be analytically inverted to find F n (z) for n = 1, 2, 3, … , although the formulas for even the lowest n are quite complicated (SI Mathematica notebook). Nevertheless, all F n (z) take the form Similarly, the total volume (number of cells) of type n is In the biologically-relevant limit, the mutation rate μ is considerably less than the net growth rate G 1 . Taking the limit μ/G 1 → 0 we obtain and finally, using Eq. (66), we obtain Eq. (35). We can also use Eq. (70) to calculate the asymptotic volume of the tumour, Following the same procedure as in the earlier section "Surface growth", we can approximate A n in the limit μ/G 1 → 0 as