Rigorous results for a population model with selection II: genealogy of the population

We consider a model of a population of fixed size $N$ undergoing selection. Each individual acquires beneficial mutations at rate $\mu_N$, and each beneficial mutation increases the individual's fitness by $s_N$. Each individual dies at rate one, and when a death occurs, an individual is chosen with probability proportional to the individual's fitness to give birth. Under certain conditions on the parameters $\mu_N$ and $s_N$, we show that the genealogy of the population can be described by the Bolthausen-Sznitman coalescent. This result confirms predictions of Desai, Walczak, and Fisher (2013), and Neher and Hallatschek (2013).


Introduction
We consider the following model of a population undergoing selection. We assume there are exactly $N$ individuals in the population at all times. Each individual independently acquires mutations at times of a Poisson process with rate $\mu_N$, and all mutations are assumed to be beneficial. Each individual is assigned a fitness, which depends on how many mutations the individual has acquired relative to the mean of the population. More precisely, let $X_j(t)$ be the number of individuals with $j$ mutations at time $t$, and let $$M(t) = \frac{1}{N} \sum_{j=0}^{\infty} j X_j(t)$$ be the average number of mutations for the $N$ individuals in the population at time $t$. Then the fitness of an individual with $j$ mutations at time $t$ is $$\max\{0, \, 1 + s_N(j - M(t))\}. \qquad (1.1)$$ Each individual independently lives for a time which is exponentially distributed with mean one, then dies and gets replaced by a new individual. The parent of the new individual is chosen at random from the population, and the probability that a particular individual is chosen as the parent is proportional to that individual's fitness. The new individual inherits all of its parent's mutations. Note that this model includes two parameters: the mutation rate $\mu_N$ and the selection parameter $s_N$.
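For readers who want to experiment with the dynamics just described, the following is a minimal Gillespie-style simulation sketch of the model. The function name `simulate` and all parameter values are our own illustrative choices, not taken from the paper.

```python
import random

def simulate(N=500, s=0.05, mu=0.01, t_max=30.0, seed=1):
    """Minimal sketch of the model: N individuals, each dying at rate 1 and
    mutating at rate mu; the replacement's parent is chosen with probability
    proportional to the fitness max(0, 1 + s*(j - M(t)))."""
    rng = random.Random(seed)
    muts = [0] * N                    # mutation counts of the N individuals
    t = 0.0
    while t < t_max:
        total_rate = N * (1.0 + mu)   # N death events + N*mu mutation events
        t += rng.expovariate(total_rate)
        if rng.random() < mu / (1.0 + mu):
            muts[rng.randrange(N)] += 1              # a beneficial mutation
        else:
            M = sum(muts) / N                        # mean number of mutations
            weights = [max(0.0, 1.0 + s * (j - M)) for j in muts]
            parent = rng.choices(range(N), weights=weights)[0]
            muts[rng.randrange(N)] = muts[parent]    # replace a dead individual
    return muts

muts = simulate()
M = sum(muts) / len(muts)             # mean number of mutations in the population
Q = max(muts) - M                     # fittest minus mean, the quantity Q(t)
```

With these illustrative parameters a run takes a few seconds, and the gap Q between the fittest individual and the mean stays modest, in line with the picture developed later in the paper.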
This model is of interest mostly because it is essentially the simplest possible model that allows for repeated beneficial mutations. The model has appeared previously in the literature; see, for example, [3,4]. An alternative to (1.1), which was considered, for example, in [2,8,19], is to assign a fitness of $(1+s_N)^j$ to an individual with $j$ mutations. However, assumption A3 below will ensure that for the range of parameters that we will consider, $s_N(j - M(t))$ is small and therefore the approximation $1 + s_N(j - M(t)) \approx (1+s_N)^{j-M(t)}$ is valid. Consequently, the distinction between these two models is not important for our purposes. A limitation of our model is that the selective advantage $s_N$ is assumed to be the same for every beneficial mutation. Some authors have considered models in which the selective advantage resulting from a mutation is random (see [7,11,17,18,23]), but we do not consider this complication here.
Here we will be interested in determining how rapidly the population acquires beneficial mutations, that is, how fast M (t) grows as a function of t. This growth rate is sometimes called the rate of adaptation or the speed of evolution. Also, we will be interested in understanding the distribution of the fitnesses of individuals in the population at a given time.

Previous work
The behavior of the population in this model can vary considerably depending on the values of the parameters $\mu_N$ and $s_N$. The simplest case to handle is when the mutation rate $\mu_N$ is small enough that there is only one beneficial mutation in the population at a time. This occurs, for example, when $s_N = s > 0$ is a fixed constant and $\lim_{N \to \infty} \mu_N N \log N = 0$. In this case, there is approximately an exponentially distributed waiting time until there is a so-called selective sweep, in which a beneficial mutation appears on one individual and then spreads to the entire population, followed by another exponentially distributed waiting time until another selective sweep occurs, and so on. See, for example, Chapter 6 of [6] for details. However, the process becomes much more complicated as soon as mutations occur rapidly enough that there can be more than one beneficial mutation in the population at a time.
Another case that has been studied in detail is when N µ N → α ∈ (0, ∞) and N s N → γ ∈ (0, ∞) as N → ∞. That is, the mutation rate µ N and the selection parameter s N are both of the order 1/N . In this case, one can describe the process using diffusion theory. For a summary of results in this direction, see sections 7.2 and 8.1 of [6] and chapter 10 of [9].
An important paper which establishes rigorous results is the work of Durrett and Mayberry [8], who were motivated by cancer modeling. They considered the variation of the model in which the fitness of an individual with $j$ mutations is given by $(1+s)^j$, where $s$ is a fixed constant not depending on $N$. They also assumed that $\mu_N \sim N^{-\alpha}$, where $0 < \alpha < 1$. They showed that if $T_j = \min\{t : X_j(t) \geq 1\}$ is the first time when an individual gets $j$ mutations, then $T_j/t_j \to_p 1$ for a certain deterministic sequence of constants $(t_j)_{j=1}^{\infty}$, where $\to_p$ denotes convergence in probability as $N \to \infty$. They also obtained more precise results describing how the number of type $j$ individuals evolves over time.
Yu, Etheridge, and Cuthbertson [24] considered very fast mutation rates, where $\mu_N = \mu > 0$ and $s_N = s > 0$ for all $N$. That is, neither the mutation rate nor the selection parameter depends on $N$. The model they considered is slightly different from the one presented here in that an individual's fitness affects its death rate as well as its birth rate. They observed that the process that keeps track of the differences between the fitness of the individuals and the mean fitness of the population has a stationary distribution. They proved that if the process starts from this stationary distribution, then for all $\delta > 0$, the rate of adaptation is at least $(\log N)^{1-\delta}$ if $N$ is sufficiently large, thus establishing a lower bound of $(\log N)^{1-\delta}$ on the rate of adaptation. Kelly [14] considered the same model and obtained a corresponding upper bound by showing that if at time zero there are no mutations in the population, then the rate of adaptation is at most $C \log N$ for $t \geq \log \log N$, where $C$ is a positive constant. However, up to now, the precise asymptotic rate of adaptation has not been calculated rigorously in this case.
Although there are only a few rigorous results available for this model, there has been a considerable amount of previous nonrigorous work on this model and closely related models, mostly appearing in the Biology literature. Of particular relevance for the present paper is the work of Desai and Fisher [4], who carried out a precise and detailed analysis of this model. They found, under certain conditions on the parameters $s_N$ and $\mu_N$, that the difference in the number of mutations between the fittest individual in the population and a typical individual in the population is approximately $$\frac{2 \log(N s_N)}{\log(s_N/\mu_N)} \qquad (1.2)$$ and that in the long run, the number of mutations carried by a typical individual in the population increases at the rate of approximately $$\frac{2 s_N \log(N s_N)}{[\log(s_N/\mu_N)]^2} \qquad (1.3)$$ per unit time. See the discussion around equations (4) and (5) on p. 1765 of [4] for a brief explanation, and see the discussion around (40) and (41) on p. 1774 of [4] for a more detailed analysis. See also Brunet, Rouzine, and Wilke [3] for further analysis of these results. The heuristic arguments in [4] are discussed in more detail in section 2 below, and are largely the basis for the rigorous results proved in this paper. Rouzine, Brunet, and Wilke [19] studied the same problem using a different approach, building on earlier work of Rouzine, Wakeley, and Coffin [20], and obtained an estimate (1.4) for the rate of increase in the number of mutations carried by a typical individual in the population, which matches (1.3) asymptotically as long as the extra factors inside the logarithms can be ignored. See equation (53) in [19], and see section A.1 of [19] for a discussion of the assumptions required for (1.4) to be valid.
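The predictions (1.2) and (1.3) are easy to evaluate numerically. The sketch below plugs illustrative parameter values (our choice, not taken from [4]) into the two formulas:

```python
import math

def lead_prediction(N, s, mu):
    """Prediction (1.2): lead of the fittest class over a typical individual."""
    return 2 * math.log(N * s) / math.log(s / mu)

def rate_prediction(N, s, mu):
    """Prediction (1.3): long-run rate of adaptation per unit time."""
    return 2 * s * math.log(N * s) / math.log(s / mu) ** 2

N, s, mu = 1e9, 1e-2, 1e-5      # illustrative values with mu << s << 1
q_lead = lead_prediction(N, s, mu)   # about 4.67 extra mutations
v_rate = rate_prediction(N, s, mu)   # about 0.0068 mutations per unit time
```

Note that the lead grows only logarithmically in the population size, which is why very large populations still carry only a handful of "extra" mutations in their fittest individuals.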
In addition to obtaining the estimates (1.3) and (1.4) on the speed of evolution, these and other authors have considered the distribution of fitnesses of individuals in the population at a given time, coming to the conclusion that this distribution should be approximately Gaussian. See, for example, the discussion at the top of p. 1775 in [4], the mathematical appendix in [20], and the discussion around (11) of [19]. Other heuristic arguments for why the distribution of fitnesses should be approximately Gaussian are given in section 3 of [24] and in the supporting information to [2]. Because the mean of this Gaussian distribution is increasing in time as the population evolves, the evolution of the fitness distribution in the population can be modeled as a Gaussian traveling wave. This point of view is emphasized in [2] and can be traced back at least to [22]. It should be noted that Durrett and Mayberry [8] rigorously obtained traveling wave behavior in their model. However, for the low mutation rates that they considered, the number of values of j for which X j (t) > 0 does not tend to infinity as N → ∞. Consequently, they did not observe a traveling wave with a Gaussian shape, and indeed the Gaussian traveling wave picture has not been established rigorously for any range of parameter values.
The goal of this paper is to carry out a detailed, mathematically rigorous analysis of the model described above. Under certain conditions on s N and µ N , we are able to confirm several of the most important nonrigorous predictions about the model. We obtain rigorous results concerning the speed of evolution and the distribution of fitnesses of individuals in the population at a given time. We present our assumptions in section 1.2 and our main results in section 1.3. In section 2, we explain the heuristics behind the results, most of which are adapted from the previous nonrigorous work mentioned above. The rest of the paper is devoted to proving the main results. This is the first in a series of two papers devoted to the study of this model. In the follow-up paper [21], we show that the genealogy of the population can be described by a process called the Bolthausen-Sznitman coalescent, confirming predictions of Desai, Walczak, and Fisher [5] and Neher and Hallatschek [16]. The paper [21] uses extensively the results and techniques developed here.

Assumptions on the parameters
For deterministic sequences (x N ) ∞ N =1 and (y N ) ∞ N =1 depending on the population size N , we write x N ∼ y N if lim N →∞ x N /y N = 1. We write x N ≪ y N if lim N →∞ x N /y N = 0 and x N ≫ y N if lim N →∞ x N /y N = ∞.
For our main results, we will need the following assumptions on the parameters $s_N$ and $\mu_N$:

A1: $\displaystyle\lim_{N \to \infty} \frac{\log N}{\log(s_N/\mu_N)} = \infty$. (1.6)

A2: $\displaystyle\lim_{N \to \infty} \frac{(\log N)(\log \log N)\log(1/s_N)}{[\log(s_N/\mu_N)]^2} = 0$.

A3: $\displaystyle\lim_{N \to \infty} \frac{s_N (\log N) \log(1/s_N)}{\log(s_N/\mu_N)} = 0$.

The biological meaning of these assumptions, and the reason why they are needed for the main results, will be described later in section 2.3. Here we mention some of their consequences. Dividing A3 by A1, we see that the assumptions imply that $\lim_{N \to \infty} s_N \log(1/s_N) = 0$. Also, A2 implies that $$\lim_{N \to \infty} \frac{(\log N)\log(1/s_N)}{[\log(s_N/\mu_N)]^2} = 0. \qquad (1.7)$$ Dividing (1.7) by A1, we get $\lim_{N \to \infty} \log(1/s_N)/\log(s_N/\mu_N) = 0$. Thus, $\log(1/\mu_N) \gg \log(1/s_N)$, which means that for all $a > 0$, we have $$\lim_{N \to \infty} \frac{\mu_N}{s_N^a} = 0. \qquad (1.8)$$ That is, the mutation rate $\mu_N$ tends to zero faster than any power of $s_N$. Another consequence of the fact that $\lim_{N \to \infty} \log(1/s_N)/\log(s_N/\mu_N) = 0$ is that $\log(s_N/\mu_N) \sim \log(1/\mu_N)$. In particular, (1.6) implies that $\log N \gg \log(1/\mu_N)$, which means that for all $a > 0$, we have $$\lim_{N \to \infty} N^a \mu_N = \infty. \qquad (1.9)$$ That is, the mutation rate tends to zero more slowly than any power of $1/N$. Also, note that because $\log(s_N/\mu_N) \sim \log(1/\mu_N)$, the expression $\log(s_N/\mu_N)$ could be replaced by $\log(1/\mu_N)$ in any of the conditions A1, A2, and A3. We state the conditions in their current form because $\log(s_N/\mu_N)$ arises more naturally, as we will see later. We will always assume $N$ is large enough that $\mu_N < s_N$, so $\log(s_N/\mu_N) > 0$.
To illustrate how these assumptions can be satisfied, we observe that if $1/2 < b < 1$ and $0 < a < 1 - b$, there are choices of the parameters $s_N$ and $\mu_N$, depending on $a$ and $b$, for which assumptions A1-A3 hold.
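As a purely illustrative check, the sketch below uses a hypothetical parameter family of our own (it is not the example from the text): $s_N = e^{-(\log N)^{0.2}}$ and $\mu_N = e^{-(\log N)^{0.7}}$. For this family, both parameters tend to zero more slowly than any power of $1/N$, $\mu_N$ tends to zero faster than any power of $s_N$, the quantity $k_N = \log N/\log(s_N/\mu_N)$ grows, and $\log(s_N/\mu_N)/\log(1/\mu_N)$ approaches 1:

```python
import math

def params(N, a=0.2, b=0.7):
    """Hypothetical parameter family (illustration only, not the paper's
    example): s_N and mu_N decay more slowly than any power of 1/N, and
    mu_N decays faster than any power of s_N."""
    s = math.exp(-math.log(N) ** a)
    mu = math.exp(-math.log(N) ** b)
    return s, mu

ks, ratios = [], []
for N in [1e6, 1e12, 1e24, 1e48]:
    s, mu = params(N)
    ks.append(math.log(N) / math.log(s / mu))          # k_N, should increase
    ratios.append(math.log(s / mu) / math.log(1 / mu)) # should approach 1
```

The growth of $k_N$ here is very slow (it is a ratio of logarithms), which is consistent with the asymptotic nature of the results in the paper.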

Main results
Let $$a_N = \frac{1}{s_N} \log(s_N/\mu_N). \qquad (1.10)$$ We will see later that, as was observed in [5], the quantity $a_N$ is approximately the amount of time between when the first individual with $j$ mutations appears and when individuals in the population have $j$ mutations on average. This is the time scale on which we will study the process. Also, define $$k_N = \frac{\log N}{\log(s_N/\mu_N)}, \qquad (1.11)$$ which we will see is the natural scale on which to consider the number of mutations. For $t \geq 0$, let $$Q(t) = \max\{j : X_j(t) > 0\} - M(t) \qquad (1.12)$$ be the difference between the number of mutations carried by the fittest individual in the population and the mean number of mutations in the population. Our first theorem is an asymptotic result for this quantity. Here and throughout the paper, the notation $\to_p$ denotes convergence in probability as $N \to \infty$.
Note that Theorem 1.1 implies that for large $t$, we have the approximation (1.16), which is consistent with Desai and Fisher's prediction (1.2) because $|\log s_N| \ll \log N$ when A1-A3 hold. Note also that the function $q$ is discontinuous at 1, which is why we can not expect uniform convergence to hold over intervals containing 1. The next result is our main theorem concerning the speed of evolution. It shows how the mean number of mutations in the population changes over time. Note that the function $m$ is discontinuous at 1, so Theorem 1.2 implies that the average number of mutations of individuals in the population stays close to zero until time $a_N$, then rapidly increases to approximately $k_N$. To see the long-run rate at which the population acquires beneficial mutations, note that (1.15) implies that $$\lim_{t \to \infty} \frac{m(t)}{t} = 2. \qquad (1.19)$$ Therefore, for large $t$, $$\frac{M(a_N t)}{a_N t} \approx \frac{2 k_N}{a_N} = \frac{2 s_N \log N}{[\log(s_N/\mu_N)]^2}. \qquad (1.20)$$ The right-hand side of (1.20) can be viewed as the rate of adaptation, or the rate per unit time at which new mutations take hold in the population. Because $|\log s_N| \ll \log N$ and $\log(1/\mu_N) \ll \log N$ when A1-A3 hold, as can be seen from (1.8) and (1.9), and $\log \log N \ll \log(s_N/\mu_N)$ by (1.7), this result is consistent with the predictions (1.3) and (1.4).
Remark 1.3. The functions $q$ and $m$ have a renewal theory interpretation, which helps to explain (1.15) and (1.19). Consider a renewal process in which the distribution of the time between renewals is uniform on $(0,1)$. Let $N(t)$ be the number of renewals by time $t$, and let $U(t) = E[N(t)]$. The renewal equation gives $$U(t) = \min\{t, 1\} + \int_{\max\{t-1,\,0\}}^{t} U(u) \, du.$$ Let $U'$ denote the right derivative of $U$. If $0 \leq t < 1$, then $U'(t) = 1 + U(t)$, and since $U(0) = 0$, it follows that $U(t) = e^t - 1$ and thus $U'(t) = e^t$. If $t \geq 1$, then $U'(t) = U(t) - U(t-1) = \int_{t-1}^{t} U'(u) \, du$. It follows that $U'$ satisfies (1.13), so $U'(t) = q(t)$ for all $t$. Also, for $t \geq 1$, we have $m(t) = 1 + U(t-1)$. For large $t$, because the uniform distribution on $(0,1)$ has mean $1/2$, we have $U(t) \approx 2(t-1)$, which explains (1.19).
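The renewal description in Remark 1.3 can be checked numerically. The sketch below integrates on a grid the relation satisfied by $q$: it equals $e^t$ on $[0,1)$, and equals the integral of $q$ over $(t-1, t]$ for $t \geq 1$; by the renewal theorem, $q(t)$ should approach $1/(1/2) = 2$. The grid size and time horizon are arbitrary choices:

```python
import math

n = 1000                                   # grid points per unit of time
h = 1.0 / n
q = [math.exp(i * h) for i in range(n)]    # q(t) = e^t on [0, 1)
window = sum(q) * h                        # approximates the integral of q over (0, 1)
for i in range(n, 8 * n):                  # advance up to t = 8
    q.append(window)                       # q(t) = integral of q over (t-1, t]
    window += (q[-1] - q[i - n]) * h       # slide the window forward by h
jump_down = q[n - 1] - q[n]                # drop at t = 1, about e - (e - 1) = 1
tail = q[-1]                               # should be close to 2 for large t
```

The computed downward jump at $t = 1$, from about $e$ to about $e - 1$, matches the discontinuity of $q$ at 1 noted after Theorem 1.1.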
Next we state our main result for the distribution of fitnesses of individuals in the population at a given time. Let $\tau_0 = 0$ and for $j \in \mathbb{N}$, let $$\tau_j = \min\{t : X_{j-1}(t) \geq s_N/\mu_N\}, \qquad (1.21)$$ which we will see later is approximately the time when some individuals with $j - 1$ mutations start to acquire a $j$th mutation. Also, let $$\gamma_j = \tau_j + a_N. \qquad (1.22)$$ We will see later that most individuals in the population between times $\gamma_j$ and $\gamma_{j+1}$ will have $j$ mutations. For $t > 1$, let $j(t) = \max\{j : \gamma_j \leq a_N t\}$.
Theorem 1.4 compares the number of individuals with $j(t)$ mutations to the number of individuals with $j(t) + \ell$ mutations at time $a_N t$. To see why this result is consistent with the conjecture from section 1.1 that the distribution of fitnesses is Gaussian, note that if $Z$ is a random variable having a Gaussian distribution with mean $x + d$ and variance $\sigma^2$, and $f$ is the probability density function of $Z$, then $$\frac{f(x + \ell)}{f(x)} = \exp\left(-\frac{\ell^2 - 2 \ell d}{2 \sigma^2}\right).$$ Therefore, the result of Theorem 1.4 suggests that, in some sense, the distribution of the fitnesses of individuals in the population at time $a_N t$ is approximately Gaussian with a mean of $j(t) + d(t)$ and a variance of $\sigma_N^2(t)$. It should be noted, however, that (1.7) implies that $\lim_{N \to \infty} \sigma_N^2(t) = 0$. Consequently, the distribution of fitnesses of individuals in the population at time $a_N t$ does not actually converge to a Gaussian distribution as $N \to \infty$. Rather, the fraction of individuals in the population with exactly $j(t)$ mutations will be close to 1, unless $|d(t)|$ is very close to $1/2$. Nevertheless, the appearance of $\ell^2 - 2 \ell d(t)$ in Theorem 1.4 demonstrates Gaussian-like tail behavior.
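The algebra behind this Gaussian comparison can be verified directly: for a Gaussian density $f$ with mean $x + d$ and variance $\sigma^2$, the ratio $f(x+\ell)/f(x)$ reduces to an exponential of $-(\ell^2 - 2\ell d)/(2\sigma^2)$, which is where the expression $\ell^2 - 2\ell d(t)$ comes from. A quick numeric check, with illustrative values:

```python
import math

def gauss_pdf(z, mean, var):
    """Density of a Gaussian with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x, d, var = 3.0, 0.2, 1.7          # illustrative values
pairs = []
for ell in range(1, 4):
    lhs = gauss_pdf(x + ell, x + d, var) / gauss_pdf(x, x + d, var)
    rhs = math.exp(-(ell ** 2 - 2 * ell * d) / (2 * var))
    pairs.append((lhs, rhs))       # the two sides agree for every ell
```

Since the exponent is quadratic in $\ell$, the ratio decays like a Gaussian tail as $\ell$ grows, which is the "Gaussian-like tail behavior" referred to above.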

Notation
We collect here for the convenience of the reader some of the most important notation used throughout the paper. Because most of this notation has not yet been introduced, the reader is encouraged to skip this section for now and refer back to it as needed.

$b$: Defined in (3.14), used to determine which mutations are "early"
$B_j(t)$: Birth rate of a type $j$ individual at time $t$, see (4.1)
$D_j(t)$: Death rate of a type $j$ individual at time $t$, see (4.2)
$(\mathcal{F}_t)_{t \geq 0}$: Natural filtration of the process
$G_j(t)$: $s(j - M(t)) - \mu$, growth rate of the type $j$ population at time $t$
$j(t)$: $\max\{j : \gamma_j \leq a_N t\}$, corresponds to most common type at time $t$
$J$: $3 k_N T + k^* + 1$, bound on number of types likely to appear by time $a_N T$
$k_N$: $\log N/\log(s/\mu)$, natural scale for the number of mutations
$k_N^-$, $k_N^+$: numbers slightly smaller and larger than $k_N$, see (3.2) and (3.3)
$k^*$: largest integer less than $k_N^+$
$M(t)$: Mean number of mutations in the population at time $t$
$\overline{M}(t)$: Approximation to the mean number of mutations at time $t$, defined in (7.16)
$N$: Population size
$Q(t)$: Difference in number of mutations between fittest individual and average
$q(t)$: Scaling limit of $(Q(t), t \geq 0)$, defined in (1.13)
$q_j$: Approximately the value of $Q(\tau_j)$, see (3.15)
$R(t)$: Number of $\tau_j$ between $t - a_N$ and $t$, see (3.23)
$s = s_N$: Selective advantage resulting from a mutation
$S_j(t)$: Number of individuals with $j$ or fewer mutations
$t^*$: Time before which individuals of types up to $k_N$ appear, defined in (3.6)
$T$: Large positive number; the process is studied up to time $a_N T$
$x_j(t)$: Approximation to the number of individuals with $j$ mutations at time $t$ for $t \leq t^*$
$X_j(t)$: Number of individuals with $j$ mutations at time $t$
$X_{j,1}(t)$: Number of type $j$ individuals at time $t$ descended from mutations before $\xi_j$
$X_{j,\cdot}$: Martingale associated with the evolution of the type $j$ individuals, see Proposition 4.1

Heuristics
In this section, we discuss the key ideas behind the main results in the paper. The goal is to explain to the reader, in just a few pages of calculations, why the main results are true. Most of these heuristics have already appeared in the Biology literature, particularly in the work of Desai and Fisher [4]. We postpone rigorous proofs of the results, and justification for the approximations used, until later sections, and in this section we assign no precise meaning to the approximation symbol ≈. Here and throughout the rest of the paper, to lighten notation we write µ and s in place of µ N and s N respectively, even though these parameters depend on the population size N .

The initial stage
Consider first the initial stage of the process, when the average number of mutations in the population is close to zero. For times $t$ in this range, we have $X_0(t) \approx N$ and $M(t) \approx 0$. During this stage, we can approximate the process by a multitype branching process in which a type $j$ individual dies at rate 1, gives birth to another type $j$ individual at rate $1 + sj$, and mutates to type $j + 1$ at rate $\mu$. This means that the total rate at which type $j$ individuals appear due to mutations is $\mu X_{j-1}(t)$, and if such a mutation appears at time $u < t$, the expected number of descendants of this individual in the population at time $t$ is $e^{(sj - \mu)(t - u)} \approx e^{sj(t - u)}$, where the approximation is valid because $\mu$ is much smaller than $s$. This leads to the approximation $$E[X_1(t)] \approx \int_0^t \mu N e^{s(t - u)} \, du = \frac{\mu N}{s} (e^{st} - 1). \qquad (2.1)$$ Then an inductive argument gives $$E[X_j(t)] \approx \frac{N}{j!} \left(\frac{\mu}{s}\right)^j (e^{st} - 1)^j. \qquad (2.2)$$ The approximation (2.2) only holds when the mean number of mutations is close to zero, which can be true only when $X_1(t)$ is much smaller than $N$. From (2.1), we see that $X_1(t)$ will be of order $N$ when $e^{st}$ is comparable to $s/\mu$, which happens near the time $$a_N = \frac{1}{s} \log(s/\mu).$$ Before time $a_N$, the average number of mutations in the population will be close to zero, and the approximation (2.2) will be valid. For the approximation (2.2) to be useful for understanding the evolution of the number of type $j$ individuals, we need to know that $X_j(t) \approx E[X_j(t)]$. We will calculate, using a second moment argument, that this approximation holds for small times $t$ when $j \leq k_N$. This is true essentially because, for $j \leq k_N$, type $j$ individuals appear in the population very quickly. For larger values of $j$, however, it is not true that $X_j(t) \approx E[X_j(t)]$. Rather, the expectation is dominated by rare events in which an individual acquires a $j$th mutation much earlier than usual, causing the number of type $j$ individuals at later times to be unusually large. Therefore, for $j > k_N$, we can not approximate $X_j(t)$ by its expectation, and we need a different technique to understand the process $(X_j(t), t \geq 0)$.
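The closed forms (2.1) and (2.2) can be sanity-checked numerically. Under the branching approximation, the expectations $x_j(t) = E[X_j(t)]$ should solve $x_j' = s j x_j + \mu x_{j-1}$ once the small $\mu$ correction to the growth rate is dropped; the sketch below checks this with a finite difference, and checks that $X_1$ reaches order $N$ near $a_N = (1/s)\log(s/\mu)$. Parameter values are illustrative:

```python
import math

def x_mean(j, t, N, s, mu):
    """Approximation (2.2): E[X_j(t)] = (N / j!) * (mu/s)^j * (e^{s t} - 1)^j."""
    return N / math.factorial(j) * (mu / s) ** j * (math.exp(s * t) - 1) ** j

N, s, mu = 1e8, 1e-2, 1e-6       # illustrative values with mu << s
a_N = math.log(s / mu) / s       # e^{s a_N} = s/mu, so X_1 reaches order N
ratio = x_mean(1, a_N, N, s, mu) / N          # equals 1 - mu/s, close to 1

# (2.2) should satisfy x_j' = s*j*x_j + mu*x_{j-1}; check the residual at j = 2:
j, t, dt = 2, 500.0, 1e-4
deriv = (x_mean(j, t + dt, N, s, mu) - x_mean(j, t - dt, N, s, mu)) / (2 * dt)
expected = s * j * x_mean(j, t, N, s, mu) + mu * x_mean(j - 1, t, N, s, mu)
```

The finite-difference residual is tiny, which is a direct check of the inductive step leading from (2.1) to (2.2).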

Evolution of the number of type j individuals
We now consider the evolution of type j individuals when j > k N . The key idea is to break the process into two stages: an initial stage in which the type j population becomes established as a result of mutations experienced by type j − 1 individuals, and a second stage in which these mutations are no longer important and the type j population evolves essentially in a deterministic way. This idea has been used in previous work on this model, and in particular many of the calculations in this section strongly resemble those in [4]. Recall that τ j is the first time when there are at least s/µ individuals of type j−1 in the population. We will show using a first moment argument that with high probability, no type j individuals will appear before time τ j . The type j population becomes established during the interval [τ j , τ j+1 ], then evolves approximately deterministically after time τ j+1 .
After time $\tau_{j+1}$, we will see that mutations from type $j-1$ to type $j$ no longer have a significant impact on the size of the type $j$ population. Consequently, at a time $u \geq \tau_{j+1}$, the number of type $j$ individuals will be growing approximately deterministically at the rate $s(j - M(u))$, which is the size of the selective advantage that a type $j$ individual has over an individual of average fitness. That is, for $t \geq \tau_{j+1}$, we have $$X_j(t) \approx X_j(\tau_{j+1}) \exp\left(\int_{\tau_{j+1}}^{t} s(j - M(u)) \, du\right). \qquad (2.3)$$ Consider next what happens between times $\tau_j$ and $\tau_{j+1}$, when the type $j$ population gets established. We can use (2.3) to approximate the number of type $j-1$ individuals shortly after time $\tau_j$. As long as no type $j$ individual appears before time $\tau_j$, we have $(j-1) - M(\tau_j) = Q(\tau_j)$, so (2.3) suggests the approximation $$X_{j-1}(\tau_j + u) \approx \frac{s}{\mu} e^{s Q(\tau_j) u}. \qquad (2.4)$$ As long as the average fitness of the population does not change much shortly after time $\tau_j$, a new type $j$ individual that appears because of a mutation at time $u$ will have on average $e^{s(Q(\tau_j)+1)(t-u)}$ descendants at time $t$. Thus, we have the approximation $$X_j(t) \approx \int_{\tau_j}^{t} \mu X_{j-1}(u) e^{s(Q(\tau_j)+1)(t-u)} \, du \approx e^{s(Q(\tau_j)+1)(t-\tau_j)} \left(1 - e^{-s(t-\tau_j)}\right) \approx e^{s(Q(\tau_j)+1)(t-\tau_j)}, \qquad (2.5)$$ where the last approximation requires $t - \tau_j \gg 1/s$. Therefore, $\tau_{j+1}$ should occur approximately when the expression in (2.5) equals $s/\mu$, which leads to $$\tau_{j+1} \approx \tau_j + \frac{\log(s/\mu)}{s(Q(\tau_j) + 1)}. \qquad (2.6)$$
To estimate $Q(\tau_j)$, note that (2.4) and (2.5) lead to $$\frac{X_j(\tau_j + u)}{X_{j-1}(\tau_j + u)} \approx \frac{\mu}{s} e^{su} \left(1 - e^{-su}\right),$$ which equals one when $$u \approx \frac{1}{s} \log(s/\mu) = a_N.$$ That is, the number of type $j$ individuals surpasses the number of type $j-1$ individuals approximately $a_N$ time units after type $j$ individuals first appear. Around that time, there will be more type $j$ individuals than individuals of any other type, and the mean number of mutations in the population will be approximately $j$. It follows that $M(\tau_j)$ will be approximately the type that first appeared roughly $a_N$ time units in the past, and $Q(\tau_j)$ will be approximately the number of new types that have appeared in the last $a_N$ time units. Because the rate per unit time at which new types are appearing can be approximated by the reciprocal of the expression in (2.6), we obtain for $t > 1$ the approximation $$Q(a_N t) \approx k_N q(t).$$ For $t < 1$, we know from the discussion in section 2.1 that $M(a_N t) \approx 0$, so $Q(a_N t)$ is approximately the number of types that have originated before time $a_N t$. Since we know from the discussion in section 2.1 that $k_N$ types appear at very small times, we have for $t < 1$ the approximation $$Q(a_N t) \approx k_N e^t = k_N q(t).$$ To understand Theorem 1.2, recall again that $M(a_N t) \approx 0$ for $t < 1$. For $t > 1$, we know from the discussion in the previous paragraph that $M(a_N t)$ is approximately the number of types that appear before time $a_N(t-1)$. Because $k_N$ types appear near time zero and the rate at which new types appear can be approximated by the reciprocal of the expression in (2.6), we get for $t > 1$ the approximation $$M(a_N t) \approx k_N m(t),$$ which leads to Theorem 1.2. To obtain the result of Theorem 1.4, we use the approximation (2.3) to compare $X_{j(t)+\ell}$ and $X_{j(t)}$. We refer the reader to the proof of Theorem 1.4 in subsection 9.3 for the details of this calculation. Although the main ideas discussed in this section come from [4], it has been assumed in most previous work on this model, such as [4,19], that the population is already in equilibrium. Then one can argue that this equilibrium is only possible when (1.16) and (1.20) hold.
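The claim that type $j$ overtakes type $j-1$ about $a_N$ time units after appearing can be checked numerically from the approximate population sizes described above for $X_{j-1}$ and $X_j$. The functional forms and parameter values below are assumptions made for illustration only:

```python
import math

s, mu, Q = 1e-2, 1e-6, 20.0       # illustrative values; Q plays Q(tau_j)

def x_prev(u):
    """Type j-1 population u time units after tau_j: (s/mu) * e^{s Q u}."""
    return (s / mu) * math.exp(s * Q * u)

def x_new(u):
    """Type j population grown from mutations off type j-1:
    e^{s (Q+1) u} * (1 - e^{-s u})."""
    return math.exp(s * (Q + 1) * u) * (1 - math.exp(-s * u))

a_N = math.log(s / mu) / s
ratio_half = x_new(a_N / 2) / x_prev(a_N / 2)   # well below 1 before a_N
ratio_aN = x_new(a_N) / x_prev(a_N)             # about 1 at u = a_N
```

Setting the ratio equal to one exactly gives $u = (1/s)\log(1 + s/\mu)$, which agrees with $a_N$ up to a negligible correction since $\mu \ll s$.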
One of the contributions of the present work is to show how the process arrives at such a state, beginning from a population in which no mutations are present.

Meaning of the assumptions
We briefly discuss here the assumptions required for these results to be valid. Note that (1.6) is equivalent to the condition $\lim_{N \to \infty} k_N = \infty$. Since $Q(a_N t)$ is of the order $k_N$, assumption A1 implies that the number of different types in the population at a given time tends to infinity as $N \to \infty$. This condition is not satisfied in the parameter regime considered by Durrett and Mayberry [8]. Assumption A1 also ensures $s_N$ is large enough for mutations to take hold in the population in the manner described above. For the heuristics described in section 2.2 to be valid, the type $j$ population must be growing approximately exponentially after time $\tau_{j+1}$, which will happen as long as additional mutations from type $j-1$ to type $j$ are no longer having a significant impact on the population size. The contribution to the type $j$ population from mutations at different times can be seen from the integral in the second line of (2.5). The primary contribution to this integral comes when $u$ is comparable to $1/s$. Consequently, we need $\tau_{j+1} - \tau_j \gg 1/s$ for the number of type $j$ individuals to be growing exponentially after time $\tau_{j+1}$. In view of (2.6) and the fact that $Q(\tau_j)$ is the same order of magnitude as $k_N$, this is equivalent to the condition $$\lim_{N \to \infty} \frac{k_N}{\log(s_N/\mu_N)} = 0,$$ which corresponds to (1.7). Thus, the role of assumption A2 is to ensure that the mutation rate $\mu$ is slow enough that we can ignore mutations from type $j-1$ to type $j$ after time $\tau_{j+1}$. For technical reasons, assumption A2 is slightly stronger than (1.7), but we conjecture that the main results of the paper are still true if assumption A2 is replaced by (1.7). Because the difference in fitness between the fittest individual and an individual of average fitness is of the order $s k_N$, assumption A3 implies that we are not considering very strong selection.

Structure of the Proofs
In this section, we state some intermediate results that will lead to the proofs of the main results. Some of these intermediate results may also be of independent interest, as they provide some insight into how the number of individuals with j mutations evolves over time. Throughout the section, we will fix three positive numbers: ε, δ, and T . We will use ε ∈ (0, 1) for the maximum allowable probability of some "bad" event and for the maximum allowable error in certain approximations. We will study the process up to time a N T , where T > 1. Throughout the paper, we will introduce some positive constants C n . These constants may depend on the three parameters ε, δ, and T , even though this dependence will not be specifically mentioned each time.
On numerous occasions throughout the paper, we will assert that a statement holds "for sufficiently large N ". This means that there exists a positive integer N 0 , depending on ε, δ, and T , such that the statement in question holds for N ≥ N 0 . Often the statement in question will somehow involve the evolution of type j individuals, where j could take values in a certain range, typically 0 ≤ j ≤ k * or k * + 1 ≤ j ≤ J, where k * and J are defined below. The statement may also involve the time t, which may be permitted to take values in a certain range. In such cases, the value of N 0 may not depend on j or t. That is, the same N 0 must work for all j and t in the indicated ranges.

The process until time t *
We begin by considering the initial stage of the process. Recall from subsection 2.1 that for $j \leq k_N = \log N/\log(s/\mu)$, we expect individuals of type $j$ to appear in the population very early, and we expect the number of type $j$ individuals to be well approximated by the right-hand side of (2.2). To state a precise result, define numbers $k_N^-$ and $k_N^+$, slightly smaller and larger than $k_N$, as in (3.2) and (3.3), and let $k^*$ be the largest integer less than $k_N^+$. Assumption A2 implies that $\lim_{N \to \infty} (k_N^+ - k_N^-) = 0$, so for sufficiently large $N$, the number of integers $j$ such that $k_N^- < j < k_N^+$ must be either zero or one. Define the time $t^*$ as in (3.6). The following proposition, which we prove in section 5, describes how the process evolves before time $t^*$.
Proposition 3.1. For all nonnegative integers j and all t ≥ 0, define Then there exist positive constants C 1 and C 2 such that for sufficiently large N , the following four statements all hold with probability at least 1 − ε/2: where −1 < b j < 2, and let d j = max{0, b j }. Then

Evolution of type j individuals
In this subsection, we consider how the population evolves after time $t^*$. Recall the definitions of $\tau_j$ and $\gamma_j$ from (1.21) and (1.22). For nonnegative integers $j$ and times $t \geq 0$, define also $$G_j(t) = s(j - M(t)) - \mu,$$ which we can interpret as the rate of growth for the number of type $j$ individuals at time $t$. We will also define the integer $J = 3 k_N T + k^* + 1$, which bounds the number of types likely to appear by time $a_N T$, along with the integers $K$ and $L$ appearing in the results below. The next proposition describes the evolution, after time $t^*$, of the number of individuals with $k^*$ or fewer mutations. The first part of the proposition controls the evolution of the type $j$ individuals after time $t^*$. The second and third parts provide upper bounds on the number of type $j$ individuals as these individuals get close to extinction.
For sufficiently large $N$, the following statements all hold with probability at least $1 - \varepsilon$:
1. For all $j \leq k^*$ and $t \in [t^*, \gamma_{k^*+K}]$, we have
2. For all $j \leq k^*$ and $t \in [\gamma_{k^*+K}, a_N T]$, we have
3. On the event that $\gamma_{k^*+L} \leq a_N T$, we have $X_j(t) = 0$ for all $j \leq k^*$ and $t \geq \gamma_{k^*+L}$.
We next consider the individuals of type $j$ for $j \geq k^* + 1$. By part 4 of Proposition 3.1, individuals of these types typically do not appear until after time $t^*$, so we need to consider how these types originate. Define the positive number $$b = \log\left(\frac{24000 \, T}{\delta^2 \varepsilon}\right). \qquad (3.14)$$ For $j \geq k^* + 1$, define a time $\xi_j$ with $\xi_j \geq \tau_j$. When an individual with $j - 1$ mutations gets an additional mutation, we call this a type $j$ mutation. Each type $j$ individual in the population at time $t$ has an ancestor that got a type $j$ mutation at some earlier time. We call the individual an early type $j$ individual if this type $j$ mutation happened at or before time $\xi_j$. Let $X_{j,1}(t)$ be the number of early type $j$ individuals at time $t$, and let $X_{j,2}(t)$ be the number of other type $j$ individuals at time $t$. This means, of course, that $X_j(t) = X_{j,1}(t) + X_{j,2}(t)$.
Also, define the time $\tau_j^*$. The result below describes the evolution of the type $j$ individuals for $j \geq k^* + 1$. The first two parts of the proposition concern the evolution of the type $j$ individuals up to time $\tau_{j+1}$ and require classifying the type $j$ individuals as being early or not early. The remaining three parts parallel the three parts of Proposition 3.2. Proposition 3.3. There exists a positive constant $C_3$ such that for sufficiently large $N$, the following statements all hold with probability at least $1 - \varepsilon$: Also, $X_{j,1}(t) \leq s/(2\mu)$ for all $t \leq \tau_j^* \wedge a_N T$, and no early type $j$ individual acquires a type $j+1$ mutation until after time $\tau_{j+1} \wedge a_N T$.
5. For all j ≥ k * + 1 such that γ j+L ≤ a N T , we have X j (t) = 0 for all t ≥ γ j+L .
Remark 3.4. When the statement of part 1 of Proposition 3.3 holds, the number of early type j individuals cannot reach s/µ until after time τ j ∧ a N T , and because ξ j ≥ τ j by definition, no other type j individuals appear until after time τ j . It follows that if j ≥ k * + 1, then τ j+1 ≥ τ j ∧ a N T .
The next proposition shows how the mean number of mutations in the population evolves over time. Note that the mean number of mutations in the population is near zero before time a N and is near j during the time interval [γ j , γ j+1 ).
Proposition 3.5. There exist positive constants C 4 and C 5 such that for sufficiently large N , the following statements all hold with probability at least 1 − ε: 2. For all t ∈ (a N , γ k * +1 ), we have M (t) < k N + C 4 .
Next, we state a result concerning the differences τ j+1 − τ j . Here q is the function defined in (1.13).

where #S denotes the cardinality of a set S. For sufficiently large N , the following statements all hold with probability at least 1 − ε: 3. For all j ≥ k * + 1 such that either If the statement of part 3 of Proposition 3.6 holds, then (3.26) implies that Assuming, in addition, that the last statement of part 1 of Proposition 3.3 holds, it follows that no individual of type J + 1 or higher can appear until after time a N T . Consequently, throughout the paper, it will usually only be necessary to consider individuals of type j for 0 ≤ j ≤ J.

Waiting for the time ζ
Although Proposition 3.1 is proved in section 5 independently of the other results in this section, it does not seem to be possible to prove Propositions 3.2, 3.3, 3.5, and 3.6 sequentially. Proving Propositions 3.2 and 3.3 requires that we have some control over the quantities M (t) and τ j+1 −τ j , which are established in Propositions 3.5 and 3.6. On the other hand, to prove Propositions 3.5 and 3.6, it will be necessary to have control over the quantities X j (t), as established by Propositions 3.2 and 3.3. Consequently, we will prove these propositions simultaneously by defining a random time ζ which will be the first time that one of the statements in the above propositions fails. We will then show that ζ > a N T with high probability.
Choose constants C 1 and C 2 as in Proposition 3.1. Let Note that ζ 0 = ∞ if the four statements of Proposition 3.1 all hold.
Next, for all nonnegative integers j, we will define a random time ζ 1,j , which is essentially the first time that the behavior of the type j individuals violates the conditions of Proposition 3.2 or Proposition 3.3. First consider j ≤ k * . For t ∈ [t * , γ k * +K ], let A j (t) be the event that (3.12) fails to hold. For t ∈ (γ k * +K , γ k * +L ), let A j (t) be the event that (3.13) fails to hold. For t ≥ γ k * +L , let A j (t) be the event that X j (t) > 0. Now consider j ≥ k * + 1. Choose a constant C 3 as in Proposition 3.3. For t ≥ t * , we say that A j (t) occurs if t ∈ [τ * j , τ j+1 ] and (3.18) or (3.19) fails to hold, if t ∈ [ξ j , τ j+1 ] and the upper bound in (3.19) fails to hold, if t ≤ τ * j and X j,1 (t) > s/(2µ), if t ≤ τ j+1 and an early type j individual acquires a type j + 1 mutation at time t, if t ∈ [τ j+1 , γ j+K ] and (3.20) fails to hold, if t ≥ γ j+K and (3.21) fails to hold, or if t ≥ γ j+L and X j (t) > 0. Then let ζ 1,j = inf{t : A i (t) occurs for some i ≤ j} and ζ 1 = inf{ζ 1,j : 0 ≤ j ≤ J}.
Next, we will define ζ 2 to be the first time when the result of Proposition 3.5 fails. More precisely, choose C 4 and C 5 as in Proposition 3.5, and let ζ 2 = inf{t : for some j ≥ k * + 1 we have t ∈ [γ j , γ j+1 ) but (3.22) fails to hold}. Also, let ζ 3 = inf{t : there exists j ≥ k * + 1 such that τ j+1 ≤ t but (3.24), (3.25), or (3.26) fails to hold, or there exists j ≥ k * + 1 such that τ j+1 > t and t = τ j + 2a N /k N }, which can be interpreted as the first time when Proposition 3.6 fails. Finally, let ζ = min{ζ 0 , ζ 1 , ζ 2 , ζ 3 }. Note that ζ depends on δ and depends also on ε and T through the choice of b in (3.14). Also, ζ depends on the constants C 1 , . . . , C 5 . The constants C 1 and C 2 are chosen independently of the others in Proposition 5.9 below. The constant C 3 is specified below in (8.54). The constants C 4 and C 5 , which depend on C 3 , are obtained below in Propositions 6.5 and 6.8 respectively.
We prove parts 1, 2, and 3 of Proposition 3.8 in sections 6, 7, and 8 respectively. Here we show how Proposition 3.8, along with Proposition 3.1, implies Propositions 3.2, 3.3, 3.5, and 3.6. Essentially, parts 1, 2, and 3 of Proposition 3.8 show that ζ 2 , ζ 3 , and ζ 1 respectively are unlikely to be the first of these three times to occur. This forces ζ to be pushed beyond time a N T with high probability. Note that a consequence of this result is that for sufficiently large N , the conclusions of Propositions 3.1, 3.2, 3.3, 3.5, and 3.6 simultaneously hold with probability at least 1 − ε.
Hence, for sufficiently large N , we have ζ > a N T on the event in (3.30). Thus, by (3.30), for such N we have (3.31). Propositions 3.2, 3.3, 3.5, and 3.6 follow from (3.31). Note that Remark 3.7 implies that on {ζ > a N T }, no individual of type J + 1 or higher appears until after time a N T , which is why it is only necessary to consider ζ 1,j for 0 ≤ j ≤ J.

A useful martingale
In this section, we introduce a martingale which will be useful throughout the paper for controlling the fluctuations of the number of type j individuals in the population.

Constructing the martingale
We first record the birth and death rates for different types of individuals. Let F j (t) be the fitness of a type j individual at time t, which is max{0, 1 + s(j − M (t))}, divided by the sum of the fitnesses of the N individuals in the population. Note that, if there is a birth event at time t, then F j (t−) is the probability that a particular type j individual is the one chosen to give birth. As long as every individual's fitness is strictly positive, the sum of the fitnesses of the N individuals in the population is N , because M (t) is the mean number of mutations and so the terms s(j − M (t)) cancel in the sum; in this case F j (t) = (1 + s(j − M (t)))/N . There are three ways that the number of type j individuals could change at time t: 1. If j ≥ 1, a type j − 1 individual could acquire a jth mutation at time t. This event happens at rate µX j−1 (t−). So that our formulas hold also when j = 0, we adopt the convention that X −1 (t) = 0 for all t ≥ 0.
2. The number of type j individuals could increase by one because of a birth. This happens if one of the N − X j (t−) individuals that is not type j dies at time t, and the new individual born has type j. Because each individual dies at rate 1, and when a death occurs, the probability that a type j individual is born is X j (t−)F j (t−), the rate at which new type j individuals are born is X j (t−)B j (t−), where B j (t−) = F j (t−)(N − X j (t−)), which can be interpreted as the rate at which a particular type j individual gives birth following the death of an individual with a different type.
3. The number of type j individuals could decrease by one because of a mutation or death.
The rate at which one of the type j individuals acquires a (j + 1)st mutation is µX j (t−).
The rate at which the number of type j individuals decreases due to a death is given by X j (t−)(1 − X j (t−)F j (t−)) because there are X j (t−) type j individuals each dying at rate one, and when a death occurs, the probability that the new individual born is not a type j individual is 1 − X j (t−)F j (t−). Thus, the total rate of events that reduce the number of type j individuals is µX j (t−) + X j (t−)(1 − X j (t−)F j (t−)) = X j (t−)D j (t−), where D j (t−) = µ + 1 − X j (t−)F j (t−), which can be interpreted as the rate at which a particular type j individual either acquires a mutation or dies and gets replaced by an individual with a different type.
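The event rates derived above determine a simple Gillespie-type simulation of the model. The following sketch is illustrative only (the function and parameter names are ours, not the paper's): each individual mutates at rate µ and dies at rate 1, and a death is followed by a birth whose parent is chosen with probability proportional to the fitness max{0, 1 + s(j − M)}.

```python
import random

def step(pop, mu, s, rng):
    """Advance the population by one event; return the elapsed time.

    pop is a list of mutation counts, one entry per individual.  Each
    individual mutates at rate mu and dies at rate 1; on a death, the
    replacement's parent is chosen with probability proportional to the
    fitness max(0, 1 + s*(j - M)), where M is the mean mutation count.
    """
    N = len(pop)
    total_rate = N * (1.0 + mu)      # N death clocks (rate 1) + N mutation clocks (rate mu)
    dt = rng.expovariate(total_rate)
    if rng.random() < mu / (1.0 + mu):
        # a uniformly chosen individual acquires one more mutation
        pop[rng.randrange(N)] += 1
    else:
        # a uniformly chosen individual dies and is replaced
        i = rng.randrange(N)
        M = sum(pop) / N             # mean number of mutations
        weights = [max(0.0, 1.0 + s * (j - M)) for j in pop]
        parent = rng.choices(range(N), weights=weights)[0]
        pop[i] = pop[parent]         # the child inherits all of its parent's mutations
    return dt

rng = random.Random(0)
pop = [0] * 100                      # mutation counts; population size N = 100
t = 0.0
while t < 5.0:
    t += step(pop, mu=0.02, s=0.05, rng=rng)
```

Tracking the counts X j (t) in such a simulation is a direct way to visualize the wave of fitness classes whose behavior the propositions above describe.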
Let X b j (t) be the number of times in [0, t] that the number of type j individuals increases by one. Let X d j (t) be the number of times in [0, t] that the number of type j individuals decreases by one.
Then X j (t) = X j (0) + X b j (t) − X d j (t) for all j ∈ N and t ≥ 0. From the rates obtained above, we see that if we define W b j (t) = X b j (t) − ∫ 0 t (µX j−1 (u) + X j (u)B j (u)) du (4.1) and W d j (t) = X d j (t) − ∫ 0 t X j (u)D j (u) du, (4.2) then the processes (W b j (t), t ≥ 0) and (W d j (t), t ≥ 0) are martingales for all j ∈ Z + . Therefore, if we define W j (t) = W b j (t) − W d j (t) for all t ≥ 0, then the process (W j (t), t ≥ 0) is a martingale for all j ∈ Z + . Let ∆W j (t) = W j (t) − W j (t−). Because the process W j is locally of bounded variation, the quadratic variation is given by (see (8.19) of [15]). Because W b j + W d j , being the sum of two martingales, is a martingale, we get (see Definition 8.22 of [15]) (4.5) We will work primarily with a different martingale. For all t ≥ 0 and j ∈ Z + , let G * j (t) = B j (t) − D j (t). As long as every individual's fitness is strictly positive, we have G * j (t) = G j (t), where G j (t) was defined in (3.11). We interpret G * j (t) as the growth rate of the type j population at time t. In Proposition 4.1 below, we define a martingale that will be very useful for studying how the number of type j individuals evolves over time. This is similar to the martingale studied in section 4 of [8].
Then (Z j (t), t ≥ 0) is a mean zero martingale with Proof. For t ≥ 0 and j ∈ Z + , define The processes X j and I j are both semimartingales, so the Integration by Parts Formula (see Corollary 8.7 of [15]) gives Because the processes X j and I j are locally of bounded variation, and the process I j has continuous paths, we have (see (8.19) of [15]) [X j , I j ] t = 0 for all t a.s. (4.10) and I j (t) is a continuous function of t, we get Combining (4.9), (4.10), (4.11), and (4.12) and using that I j (0) = 1, we get Therefore, in view of (4.7) and (4.8), we have Note that D j (t) ≤ 1 + µ for all t. Also, because 0 ≤ F j (t) ≤ 1 for all t, we have B j (t) ≤ N for all t, and so the process (G * j (t), t ≥ 0) is bounded. Therefore, using (4.5), for each fixed t > 0, we have Therefore (see Theorem 8.32 of [15]), the process (Z j (t), t ≥ 0) is a square integrable martingale and Because Z j (0) = 0, the process (Z j (t), t ≥ 0) is a mean zero martingale. Finally, because Var(Z j (t)) = E[Z 2 j (t)] = E[ Z j (t)] (see Corollary 8.25 of [15]), the result follows.
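The displays belonging to Proposition 4.1 are not reproduced above. For orientation, the integration-by-parts argument in the proof, with I j (t) = e^{−∫ 0 t G * j (v) dv}, suggests that the martingale has the following shape (a hedged reconstruction from the surrounding proof, not a verbatim restatement of (4.7)):

```latex
% Hedged reconstruction of the martingale in Proposition 4.1, based on the
% integration-by-parts argument in the proof; not a verbatim restatement.
Z_j(t) \;=\; X_j(t)\, e^{-\int_0^t G_j^*(v)\,dv} \;-\; X_j(0)
   \;-\; \int_0^t \mu X_{j-1}(u)\, e^{-\int_0^u G_j^*(v)\,dv}\,du .
```

Indeed, the drift of X j is µX j−1 (t) + G * j (t)X j (t), so the product X j I j has drift µX j−1 I j ; taking expectations of such an identity is what produces the first-moment bounds used later, for instance in Lemma 5.1.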

Generalizations
It will often be useful to consider the martingale of Proposition 4.1 started or stopped at a stopping time. Let (F t ) t≥0 be the natural filtration of the process ((X 0 (t), X 1 (t), . . . ), t ≥ 0). Let τ be a stopping time with respect to (F t ) t≥0 . Let X τ j (t) = X j (t ∧ τ ) and Z τ j (t) = Z j (t ∧ τ ) for all t ≥ 0. Then the process ((X τ 0 (t), X τ 1 (t), X τ 2 (t), . . . ), t ≥ 0) represents the population modified so that it does not change after time τ . Because stopped martingales are martingales, the process (Z τ j (t), t ≥ 0) is a martingale, and we have the following corollary.
Corollary 4.2. Let τ be a stopping time, and let Z τ Also, the process ((X 0 (t), .
Also, we will sometimes need to consider the type j individuals that are descended from an individual that gets its jth mutation during some time interval. The following result is established in the same way as Proposition 4.1 and Corollary 4.3, except that the mutation rate is set to zero outside of the time interval (κ, γ]. Corollary 4.4. Let X (κ,γ] j (t) be the number of type j individuals in the population at time t that are descended from individuals that acquired a jth mutation during the time interval (κ, γ], with the expressions on the right-hand sides of (4.1) and (4.2) modified accordingly. Remark 4.5. By the Strong Markov Property, the result of Corollary 4.4 holds even if j is random, as long as j is F κ -measurable.

A related supermartingale
We will also need to consider a supermartingale that involves not just the individuals of type j but the individuals of all types less than or equal to j. For j ∈ Z + and t ≥ 0, let S j (t) = X 0 (t) + · · · + X j (t) be the number of individuals with j or fewer mutations at time t, and let V j (t) = N (X 0 (t)F 0 (t) + · · · + X j (t)F j (t)) − S j (t) − µX j (t). There are two ways that the number S j (t) could change: 1. The number of individuals with j or fewer mutations could increase by one because of a birth. This happens when one of the N − S j (t−) individuals with more than j mutations dies and is replaced by an individual with j or fewer mutations. Because each individual dies at rate 1, and when a death occurs at time t, the probability that a type ℓ individual is born is X ℓ (t−)F ℓ (t−), the rate at which this occurs is (N − S j (t−))(X 0 (t−)F 0 (t−) + · · · + X j (t−)F j (t−)). (4.14) 2. The number of individuals with j or fewer mutations could decrease by one because of a mutation or death. The rate at which one of the type j individuals acquires a (j + 1)st mutation is µX j (t−). There are S j (t−) individuals with j or fewer mutations that could die, and when a death occurs, the probability that the new individual born has more than j mutations is 1 − (X 0 (t−)F 0 (t−) + · · · + X j (t−)F j (t−)). Therefore, the total rate of events that reduce the number of individuals with j or fewer mutations is µX j (t−) + S j (t−)(1 − X 0 (t−)F 0 (t−) − · · · − X j (t−)F j (t−)), (4.15) and note that the difference between the expressions in (4.14) and (4.15) is V j (t−). Thus, reasoning as in the argument following (4.3) and (4.4), the process (S j (t) − ∫ 0 t V j (u) du, t ≥ 0) is a martingale. This leads to the following proposition.
Proposition 4.6. For all j ∈ Z + and t ≥ 0, let Proof. Lemma 3.2 in Chapter 4 of [9] states that if (X(t), t ≥ 0) is a process which takes its values in a complete separable metric space E and is adapted to (F t ) t≥0 , and if f : , then the process whose value at time t is is a martingale with respect to (F t ) t≥0 . We can apply this result with Letting Because S j (t) = 0 whenever S j (v) = 0 for some v < t, the indicators on both sides of (4.17) can be removed.
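The lemma quoted from [9] is a form of Dynkin's martingale. In the notation used above, it says that for a suitable bounded function f, the process

```latex
% Generator martingale (the statement quoted from Lemma 3.2 in Chapter 4 of [9]),
% where \mathcal{G} denotes the generator of the Markov process X:
f(X(t)) \;-\; f(X(0)) \;-\; \int_0^t (\mathcal{G} f)(X(u))\,du , \qquad t \ge 0,
```

is a martingale with respect to (F t ) t≥0 ; the particular choice of f applied in the proof of Proposition 4.6 appears in the omitted display.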

Proof of Proposition 3.1
In this section, we study the behavior of the process before the time t * defined in (3.6). We prove Proposition 3.1. Recall the definitions of k N , k − N , k + N , and k * from (1.11), (3.2), (3.3), and (3.4). Part 1 of Proposition 3.1 says that for j ≤ k − N , the number of type j individuals at time t ∈ [0, t * ] is well approximated by x j (t), which is defined in (3.7). Part 2 handles the delicate case in which there is an integer j in the interval (k − N , k + N ). Parts 3 and 4 say that for j ≥ k + N , no type j individuals appear before time t * , and there are fewer than s/µ individuals of type k * through time t * .

Bounding the mean number of mutations
Before time t * , the mean number of mutations in the population is close to zero. Accordingly, let η = µk 5 N /s, and define the stopping time Recall the definition of the martingale (Z j (t), t ≥ 0) from (4.7). We will consider the processes From assumption A3 and (1.8), we see that for all a > 0, so in particular η → 0 and sη → 0 as N → ∞. Therefore, we may and will assume throughout this section that N is large enough that sη < 1. This implies that the fitness of every individual is strictly positive before time τ , which means G * j (t) = G j (t) = s(j − M (t)) − µ for all j ∈ Z + and t < τ . Our first goal is to show that with high probability, we have τ > t * , and so stopping the process at time τ does not change the behavior of the process before time t * . To do this, we need the upper bound on E[X τ j (t)] provided by the following lemma. This lemma will also be useful for first moment estimates later in the proof.
(t) be the number of type j individuals at time t that are descended from individuals that acquired their jth mutation during the time interval (t i , t i+1 ]. The process (Z and then letting m → ∞ gives the result follows by induction.
By Lemma 5.1, Therefore, in view of (5.1), we have y → 0 as N → ∞, and thus ye y ≤ 2y for sufficiently large N . Using (1.6), for sufficiently large N , as claimed.

Controlling the fluctuations in X j
Our goal in this subsection is to obtain sharp bounds on the fluctuations of the number of type j individuals before time t * . Because the randomness can be expressed in terms of the martingales Z j , the key result is the next lemma, which will provide control on the value of |Z j (t)|. Before stating this lemma, we establish a simple bound on the birth and death rates that will be useful throughout the paper. Note that for all t such that all individuals at time t have a strictly positive fitness, and in particular for all t < τ , we have Because sk + N → 0 as N → ∞ by (3.5) and assumption A3, and µ → 0 as N → ∞, we have (5.5). For future reference, note that (5.5) also holds for all j ≤ J = 3k N T + k * + 1.
The next lemma shows that when the processes Z j are bounded as indicated in Lemma 5.3, the processes X j will stay fairly close to the deterministic functions x j defined in (3.7). Because the difference between X j and x j depends in part on the difference between X j−1 and x j−1 , the proof proceeds by induction. Rather precise bounds are needed to prevent the errors from accumulating too rapidly during the induction process, so some technical work is required to obtain sufficiently sharp estimates.
Lemma 5.4. On the event that t * < τ and Proof. Throughout the proof, we will assume that t * < τ and that (5.10) holds. This implies We will first show by induction that for ℓ ∈ {0, 1, . . . , ⌊k + N ⌋} and t ∈ [0, t * ], we have dv ≤ e sℓt and we are assuming that (5.10) holds, Combining (5.14), (5.15), (5.16), and (5.17) leads to Using the induction hypothesis to bound the integral, we get The first term t 0 µe sℓ(t−u) H ℓ−1 (u) du in the integral on the right-hand side of (5.18) matches the j = ℓ − 1 term on the right-hand side of (5.13). For j ∈ {0, 1, . . . , ℓ − 2}, the term corresponding to j in the sum on the right-hand side of (5.18) can be expressed as which matches the term corresponding to j on the right-hand side of (5.13) because the substi- Thus, by induction, (5.13) holds for all ℓ ∈ {0, 1, . . . , ⌊k + N ⌋} and t ∈ [0, t * ]. Next we will obtain (5.11) from (5.13). For j ∈ {0, 1, . . . , ℓ − 1}, the term corresponding to j in the sum in (5.13) can be written as The first of the two terms in this expression is bounded above by By making the substitution x = e su and y = e st and then applying the result (3.199) of [12], we see that so upper bound on the first term in (5.19) becomes The second term in (5.19) equals which matches the term corresponding to j in (5.11). Furthermore, we have (5.22) and the second term matches the j = ℓ term in (5.11). Combining the bound in (5.20) with the results in (5.21) and (5.22) gives the bound in (5.11).

Proof of part 1 of Proposition 3.1
Here we show how the results in the previous section can be used to obtain the desired control on the difference between X j and x j up to time t * for j ≤ k − N . The result (5.24) below is essentially a restatement of part 1 of Proposition 3.1.
Proof. It follows from Lemmas 5.2 and 5.3 that the probability that t * < τ and (5.10) holds is at least 1 − ε/12 for sufficiently large N . Thus, the proposition will follow from Lemma 5.4 provided that for sufficiently large N , we have (5.25). It will suffice to show that the two terms on the left-hand side of (5.25) each tend to zero as N → ∞ uniformly in ℓ ≤ k − N and t ∈ [t 0 , t * ]. The first term tends to zero by the reasoning in (5.7), so it remains to consider the second term. For for sufficiently large N . To show that this expression tends to zero as N → ∞, we use o(k N ) to denote a term which, when divided by k N , tends to zero as N → ∞, and O(1) to denote a term that stays bounded as N → ∞. Because n! ∼ √(2π) n^(n+1/2) e^(−n) by Stirling's Formula, we have and because assumption A1 implies that . Therefore, the logarithm on the left-hand side of (5.26) is which tends to −∞ as N → ∞. The result follows.
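The version of Stirling's Formula used in the proof, n! ∼ √(2π) n^(n+1/2) e^(−n), is easy to check numerically; the following snippet is only a sanity check of the asymptotics, not part of the argument.

```python
import math

def stirling(n):
    """Stirling's approximation: n! ~ sqrt(2*pi) * n**(n + 1/2) * exp(-n)."""
    return math.sqrt(2.0 * math.pi) * n ** (n + 0.5) * math.exp(-n)

# The ratio n!/stirling(n) equals 1 + 1/(12n) + O(1/n^2), so it decreases to 1.
ratios = [math.factorial(n) / stirling(n) for n in (5, 20, 80)]
```
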

Proof of part 2 of Proposition 3.1
In this subsection, we consider the case in which there is an integer j ∈ (k − N , k + N ). As noted before the statement of Proposition 3.1, for sufficiently large N there can be at most one such integer, so we will assume that N is large enough to ensure this. Also, such a j may not exist for every N , so in this subsection asymptotic statements as N → ∞ should be understood to mean that we consider a subsequence of integers (N i ) ∞ i=1 tending to infinity such that there is an integer in (k − N i , k + N i ) for all i. Recall that we can write j as in (3.9), with −1 < b j < 2, and d j = max{0, b j }. Recall also that when such a j exists, we have t * = (4/s) log k N . In this case, we cannot use the same argument as in the proof of part 1 of Proposition 3.1 because the expression in (5.29) does not tend to −∞ as N → ∞ if k − N is replaced by k + N . Instead, we will break the type j individuals into three subpopulations. Define the times Note that 0 ≤ r 1 < r 2 < t * . For each type j individual in the population, we can consider the time when this individual or its ancestor acquired its jth mutation. For t ∈ [0, t * ], using the notation of Corollary 4.4, we can write Here we are dividing the type j population into three groups, depending on whether the jth mutation occurred before time r 1 , between times r 1 and r 2 , or after time r 2 . We will consider these three subpopulations separately in the next three lemmas.
Lemma 5.6. We have Proof. Clearly X [0,r 1 ] j (t) = 0 for all t ∈ [0, t * ] when r 1 = 0, so we will assume that r 1 > 0. Each type j − 1 individual is acquiring mutations at rate µ. Therefore, by Lemma 5.1, the expected number of times, before time r 1 ∧ τ , that a type j − 1 individual acquires a jth mutation is at most We have Therefore, for sufficiently large N , the expression in (5.31) is bounded above by By Markov's Inequality, this expression also gives an upper bound for the probability that at least one type j − 1 individual acquires a jth mutation by time r 1 ∧ τ . Using (3.9), the reasoning in (5.27), and the fact that which tends to −∞ as N → ∞. Thus, the probability that some individual acquires a jth mutation by time r 1 ∧ τ tends to zero as N → ∞. Combining this observation with Lemma 5.2 gives the result.
Lemma 5.7. For sufficiently large N , Proof. First, suppose b j > 0. By applying the argument that leads to (5.3) followed by the result of Lemma 5.1 and then (5.4), we get The result now follows from Lemma 5.2.
Lemma 5.8. There exist positive constants c and c ′ , not depending on ε, such that for sufficiently large N , Note that r 2 ≥ (1/s) log k N . Assume for now that τ > t * and that the event in (5.23) holds so that, in particular, We need to consider the asymptotic behavior of Therefore, using ∼ to denote that the ratio of the two sides tends to one as N → ∞, we have In view of (3.1), the constants c 1 and c 2 can be chosen so that the equation holds for all allowable values of δ. Also, using (5.38) and then reasoning as in (5.16), we get , it follows from (5.42) and (5.43) that there are positive constants c 3 and c 4 such that for sufficiently large N , We still need to control the second term on the right-hand side of (5.37), which requires bounding Z We now take expectations of both sides of this equation. Using that X (u) ≤ 3 by the reasoning that leads to (5.5), and that Lemma 5.1 holds, we get for sufficiently large N , Reasoning as in the derivation of (5.8) from (5.6), we have e −s(j+1)u sj+3e −sju ≤ e −sjr 2 (e −su sj+3) for u ≥ r 2 , so for sufficiently large N , Note that if Y is a random variable and G is a σ-field such that E[Y |G] = 0, then by the conditional Chebyshev's Inequality, Therefore, (5.45) implies In view of (5.35), when the event in (5.46) holds, for sufficiently large N we have Combining this result with (5.37) and (5.44), we get that when equation (5.38) and the event in (5.46) hold and when τ > t * , we have for sufficiently large N , where In view of Lemma 5.2, Proposition 5.5, and equation (5.46), the result will follow if we can show that y N → 0 as N → ∞. To show this, we make a calculation similar to the calculation in the proof of Part 1 of Proposition 3.1. Noting that e −sjr 2 = e −(d j +1)j log k N and using (5.27), (5.28), and (5.32), we get which tends to −∞ as N → ∞. Thus, y N → 0 as N → ∞, which completes the proof.
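The conditional form of Chebyshev's Inequality invoked in the proof of Lemma 5.8 (its display is omitted above) is the standard bound: if E[Y | G] = 0, then for every a > 0,

```latex
% Conditional Chebyshev inequality used in the proof of Lemma 5.8:
P\bigl(|Y| \ge a \,\big|\, \mathcal{G}\bigr)
   \;\le\; \frac{E[\,Y^2 \mid \mathcal{G}\,]}{a^2},
```

which follows by applying Markov's inequality to Y 2 under the conditional expectation.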
Combining Lemmas 5.6, 5.7, and 5.8 and using (5.30), we arrive immediately at the following result, which is essentially part 2 of Proposition 3.1.
Proposition 5.9. There exist positive constants C 1 and C 2 such that for sufficiently large N , we have, for all j ∈ (k − N , k + N ),

Proof of parts 3 and 4 of Proposition 3.1
In this subsection, we complete the proof of Proposition 3.1. We will need the following lemma. Recall from (3.4) that k * = max{j ∈ N : j < k + N }.
Proof. We have, using the reasoning in (5.27), We consider two cases. First, suppose k * + 1 − k N ≥ 1/2. It follows from assumption A2 that (k N log k N )/ log(s/µ) → 0 as N → ∞. Therefore, the first term dominates the expression in (5.48), so the expression tends to −∞ as N → ∞. On the other hand, suppose k * + 1 − k N < 1/2. Then k * < k N − 1/2, which for sufficiently large N implies that k * < k − N by (3.5). It follows that there are no integers in the interval (k − N , k + N ), which means t * = (2/s) log k N . Because The results below establish parts 3 and 4 of Proposition 3.1. Proposition 3.1 follows immediately from Propositions 5.5, 5.9, 5.11, and 5.12.
Proposition 5.11. For sufficiently large N , Proof. Since the rate of events that increase the number of type k * individuals by one is always greater than the rate of events that decrease the number of type k * individuals by one, the process (X τ k * (t), t ≥ 0) is a submartingale. By Doob's Maximal Inequality and Lemma 5.1, for sufficiently large N , This expression tends to zero as N → ∞ by Lemma 5.10, which, in view of Lemma 5.2, implies the result.
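For reference, the form of Doob's Maximal Inequality used here is the standard one for a nonnegative submartingale (X(t), t ≥ 0): for every a > 0 and t ≥ 0,

```latex
% Doob's Maximal Inequality for a nonnegative submartingale:
P\Bigl( \max_{0 \le u \le t} X(u) \ge a \Bigr) \;\le\; \frac{E[X(t)]}{a}.
```

In the proof above it is applied to the stopped process X τ k * , whose expectation is controlled by Lemma 5.1.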
Proposition 5.12. For sufficiently large N , Proof. Each individual of type k * acquires mutations at rate µ. Therefore, by Lemma 5.1, the expected number of times, before time t * ∧ τ , that a type k * individual acquires a (k * + 1)st mutation is at most This expression tends to zero as N → ∞ by Lemma 5.10 and the fact that k * → ∞ as N → ∞.
The result now follows from Markov's Inequality and Lemma 5.2.
Proof of part 1 of Proposition 3.8

Recall that Proposition 3.5 states that M (t) is close to zero for t ≤ a N and close to j during the time interval [γ j , γ j+1 ). The time ζ 2 can be interpreted as the first time at which the approximation to M (t) given in Proposition 3.5 fails to hold. Part 1 of Proposition 3.8 stipulates that, during the time interval [t * , a N T ], the time ζ 2 cannot occur until either ζ 1 or ζ 3 has occurred. That is, as long as the behavior of the type j individuals follows the description in Propositions 3.2, 3.3, and 3.6, the mean number of mutations in the population must satisfy the approximation in Proposition 3.5. Note that part 1 of Proposition 3.8 is a deterministic statement. To prove it, we will assume that ζ 0 = ∞, meaning that until time t * the population behaves according to Proposition 3.1. We will show that if t ∈ (t * , a N T ] and ζ 1 ∧ ζ 3 > t, then the approximation in Proposition 3.5 is valid up through time t. We begin with two lemmas. The first one gives a useful bound that follows from (3.18) and (3.19), and the second one shows that if t < τ j+1 , then type j individuals contribute little to the mean number of mutations at time t. Lemma 6.1. Let C 6 = C 3 + 1 + 4δ, where C 3 comes from (3.18). Suppose j ≥ k * + 1. Suppose τ j+1 < ζ 1,j and τ j+1 ≤ a N T . Suppose also that either τ j+1 < ζ 3 or j ≤ J. Then for sufficiently large N , Also, suppose k * + 1 ≤ j ≤ J, τ j ≤ t < ζ 1,j , and t ≤ a N T . Then for sufficiently large N , if
Lemma 6.2. If t ∈ (t * , a N T ] and ζ 1 ∧ ζ 3 > t, then Proof. Suppose j ≥ k * + 1. If the statement of part 1 of Proposition 3.3 holds, then no early type j individual can get a type j + 1 mutation until after time τ j+1 ∧ a N T . No other type j individual appears until after time ξ j ≥ τ j . Thus, we have X j+1 = 0 for t ≤ τ j+1 ∧ τ j ∧ a N T . Also, τ j+1 ≥ τ j ∧ a N T , as noted in Remark 3.4. Thus, since we are assuming that ζ 1 ∧ ζ 3 > t, we have X j+1 (t) = 0 on the event that t ≤ τ j . Therefore, when t ∈ (t * , a N T ] and ζ 1 ∧ ζ 3 > t, there can be at most one value of j for which τ j+1 > t but X j (t) > 0. Because ζ 3 > t, the calculation in (3.28) implies that τ J > t and thus X j (t) = 0 for all j > J. Because X j (t) ≤ s/µ when t < τ j+1 , the result follows.
The approximation in Proposition 3.5 has four parts. The first part pertains to the case t ≤ a N , the second part pertains to the case t ∈ [a N , γ k * +1 ), and the third part pertains to the case in which t ∈ [γ j , γ j+1 ) for some j ≥ k * + 1. The fourth part will be a consequence of the first three. Proposition 6.3 below handles the case of t ≤ a N . Proposition 6.3. For sufficiently large N , on the event that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t, we have for all t ∈ (t * , a N ], Proof. Fix t ∈ (t * , a N ]. We will assume throughout the proof that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t. Suppose first that 0 ≤ j ≤ k − N . Write α = (1 + δ) 2 /(1 − δ) 2 . By equations (3.12) and (3.8) and the fact that G j (t) − G 0 (t) = sj for all t ≥ 0, we have Because e −s(a N −t) ≤ 1 and thus e (µ/s)e st ≤ e, it follows that . Then, using (3.10) instead of (3.8), the same reasoning used in (6.3) gives that for some positive constant C 7 , For sufficiently large N , there will be at most one integer in the interval (k − N , k + N ). In this case, using (6.4) and then using that (µ/s)e st ≤ 1 for the last inequality, we get Consider now the case in which j ≥ k * + 1 and τ j+1 ≤ t. Then by (3.20), The assumption that t < ζ 1 entails that t * < τ k * +1 ≤ τ j+1 in view of part 3 of Proposition 3.1 and Remark 3.4, so using (3.12), we get Therefore, another application of (3.12) leads to for all v ≥ 0, combining (6.7) and (6.8) leads to By Lemma 6.1, for sufficiently large N we have Combining these observations gives that for sufficiently large N , , it follows that for sufficiently large N , Thus, making the substitution ℓ = j − k * , for sufficiently large N , In view of (6.4), we see that C 6 µe st /sk 2 N → 0 as N → ∞, and therefore the infinite sum on the right-hand side of (6.9) is dominated by the leading term when N is large. 
Therefore, for sufficiently large N , using (6.4) again, It remains only to consider the case in which j ≥ k * + 1 and τ j+1 > t, for which the necessary bound is given in Lemma 6.2. Combining (6.5), (6.6), (6.10), and Lemma 6.2, we get that for sufficiently large N , As N → ∞, clearly C 7 /⌈k − N ⌉! → 0 and 2αC 6 k * /k 2 N → 0. As for the fourth term, we have e s(a N −t) ≤ e sa N = s/µ, which means We next consider the case in which t ∈ (a N , γ k * +1 ). During this period of time, the mean number of mutations in the population increases rapidly from near zero at time a N to near k * at time γ k * +1 . The upper bound on the mean number of mutations given by Proposition 6.5 below will be sufficient for our purposes. Before stating this proposition, we prove a lemma which will also be useful in studying the population at later times. Lemma 6.4. Suppose j and ℓ are positive integers with j ≥ k * . Let Suppose also that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t. Then for sufficiently large N , Proof. Assume for now that j ≥ k * + 1. Then because ζ 1 > t, the bounds in (3.20), combined with the facts that γ j+K < γ j+ℓ+K and τ j+1 < τ j+ℓ+1 by Remark 3.4, give If instead j = k * , then we use (3.12), as in (6.8), rather than (3.20) to get the lower bound on X j (t), and we again obtain (6.12). In both cases, We now apply Lemma 6.1 and (3.26) to get that for sufficiently large N , Because e sa N = s/µ, combining this inequality with (6.12) gives the result.
Proposition 6.5. There is a positive constant C 4 such that if N is sufficiently large, then for all t ∈ (a N , γ k * +1 ), on the event that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t we have Proof. Suppose t ∈ (a N , γ k * +1 ). Suppose also that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t. Note that ℓX k * +ℓ (t). (6.13) By Lemma 6.4, for sufficiently large N , Because t − τ k * +1 ≤ γ k * +1 − τ k * +1 = a N and e sℓa N = (s/µ) ℓ , we have for sufficiently large N , (6.14) If r ℓ denotes the ℓth term in the sum on the right-hand side of (6.14), then r 1 = C 6 and for ℓ ≥ 1, which tends to zero as N → ∞ because log µ s which tends to −∞ as N → ∞ by (1.7). Therefore, the first term dominates the sum on the right-hand side of (6.14) for sufficiently large N , so for sufficiently large N we have Because k * − k N ≤ k + N − k N → 0 as N → ∞ by (3.5), the result follows from (6.13), (6.16), and (6.17).
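The "first term dominates" step in the proof above (and again in Proposition 6.8) rests on an elementary geometric-series bound: if the terms r ℓ satisfy r ℓ+1 /r ℓ ≤ q for all ℓ ≥ 1, where q < 1, then

```latex
% Elementary bound behind the "first term dominates" arguments:
\sum_{\ell \ge 1} r_\ell \;\le\; r_1 \sum_{\ell \ge 0} q^{\ell}
   \;=\; \frac{r_1}{1-q},
```

so when the consecutive-term ratios tend to zero as N → ∞, the sum is (1 + o(1)) r 1 for large N.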
It remains to consider the case in which t ∈ [γ j , γ j+1 ) for some j ≥ k * + 1. In this case, we will need to consider carefully the contributions to M (t) not just from individuals with an unusually large number of mutations, as in the proofs of Propositions 6.3 and 6.5, but also from individuals with an unusually small number of mutations. Therefore, we will use the following two lemmas, which parallel Lemma 6.4. Lemma 6.6. Suppose j and ℓ are positive integers such that j − ℓ ≥ k * + 1.
Lemma 6.7. Suppose i and j are integers such that 0 ≤ i ≤ k * and j ≥ k * + 1. Let κ(t) = 1 + δ if t ≤ γ k * +K , and let κ(t) = k 2 N if t > γ k * +K . There is a positive constant C 8 such that if N is sufficiently large, then for all t ∈ [γ j , γ j+K ] ∩ [0, a N T ], on the event that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t we have Proof. Because ζ 1 > t, equations (3.12) and (3.13) give Because we are working on the event that ζ 0 = ∞, we can use the bounds on X i (t * ) and X k * (t * ) from (3.8) and (3.10). Recall that for sufficiently large N , there is at most one integer j such that k − N < j < k + N , which then must be k * . Let Because d j ≥ 0, it follows from Proposition 3.1 that for sufficiently large N , so for sufficiently large N , Also, equation (3.12) implies that for sufficiently large N , and therefore, Combining (6.22), (6.23), and (6.24), we get that for sufficiently large N , By (3.20), for sufficiently large N , Combining this result with (6.25) gives that for sufficiently large N , Note that the ratio of exponentials on the right-hand side of (6.26) is the same as the ratio of exponentials on the right-hand side of (6.18) with j − k * in place of ℓ. Consequently, the argument used to prove Lemma 6.6 gives Putting this result together with (6.26) gives that for sufficiently large N , Because ζ 0 = ∞, we have τ j ≥ t * = (θ/s) log k N , where θ = 4 if k − N < k * < k + N and θ = 2 otherwise. Therefore, recalling (1.22), Recall that λ i = 1 when i = k * . Also, for i < k * , we have λ i = (1 + δ)/(1 − δ) when θ = 2 and λ i = (1 + δ)k 2 N /C 1 when θ = 4. It follows that for sufficiently large N , Combining (6.27), (6.28), and (6.29) gives the result.
Proposition 6.8. There exists a positive constant C 5 such that for sufficiently large N , if t ∈ [γ j , γ j+1 ) ∩ [0, a N T ] for some j ≥ k * + 1, then on the event that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t, we have Proof. Throughout the proof, we work on the event that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t. We also assume that t ∈ [γ j , γ j+1 ). Note that The argument for bounding the first term is similar to that in the proof of Proposition 6.5. By Lemma 6.4, for sufficiently large N , Since e sℓa N = (s/µ) ℓ , it follows that e sℓ(t−τ j+1 ) = (s/µ) ℓ e −sℓ(γ j+1 −t) and therefore Let r ℓ be the ℓth term in the sum on the right-hand side of (6.31). Then r 1 = C 6 e −s(γ j+1 −t) and for ℓ ≥ 1, which goes to zero as N → ∞ by the argument following (6.15). Therefore, the first term dominates the sum on the right-hand side of (6.31), so for sufficiently large N we have Also, by Lemma 6.2, as N → ∞ by (6.11). Combining this result with (6.32), we get that for sufficiently large N , Consider now the second term in (6.30). Suppose ℓ ≤ j − k * − 1, so that j − ℓ ≥ k * + 1. As in Lemma 6.6, write α ℓ (t) = (1 + δ)/(1 − δ) if t ≤ γ j−ℓ+K and α ℓ (t) = k 2 N /(1 − δ) if t > γ j−ℓ+K . Then Lemma 6.6 implies that for sufficiently large N , Because γ j − τ j = a N and e sa N = s/µ, we have Therefore, for sufficiently large N , Let v ℓ denote the ℓth term on the right-hand side of (6.36). Note that t < γ j+1 ≤ γ j−1+K as long as N is large enough that K ≥ 2. Therefore, v 1 = 2((1 + δ)/(1 − δ))e −s(t−γ j ) and for ℓ ≥ 1, To see that this expression tends to zero as N → ∞, note that which tends to −∞ as N → ∞ by assumption A2. Therefore, the first term dominates the sum on the right-hand side of (6.36) when N is large. For sufficiently large N , we therefore have Finally, we consider the third term in (6.30). Suppose 0 ≤ i ≤ k * . Define κ(t) as in the statement of Lemma 6.7. By Lemma 6.7, for sufficiently large N , . . , k * }. 
Therefore, for sufficiently large N , the sum of v i over i ∈ {0, . . . , k * } is dominated by the i = k * term, and we get for sufficiently large N . If j = k * + 1 and N is sufficiently large, then κ(t) = 1 + δ, and so Then, for ℓ ≥ 2, we have w ℓ+1 /w ℓ ≤ 3(µ/s) 2/3k N , which tends to zero as N → ∞ by the argument following (6.15). Therefore, for sufficiently large N , the ℓ = 2 term is largest, so if j ≥ k * + 2, then which tends to zero as N → ∞ by the argument around (6.37). Combining (6.39) with the bounds in (6.40) and (6.41) gives that for sufficiently large N , The result now follows from (6.30), (6.34), (6.38), and (6.42).
Remark 6.9. If t ∈ [t * , γ j+1 ] ∩ [0, a N T ], then on the event that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t, it follows from (6.32) and (6.33) that where we get s/N µ in place of Js/N µ for the second term from the argument in the proof of Lemma 6.2 that there can be at most one value of i for which this occurs. If t ∈ [γ j , γ j+1 ) ∩ [0, a N T ], then on the event that ζ 0 = ∞ and ζ 1 ∧ ζ 3 > t, equations (6.38) and (6.42) imply that In particular, for t ∈ [γ j , γ j+1 ), unless t is close to γ j or γ j+1 , nearly all individuals in the population at time t will be of type j.
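The picture in Remark 6.9 can be made concrete with a small simulation of the model described in the introduction. This is only an illustrative sketch: the function name and all parameter values below are arbitrary choices, and no attempt is made to match the asymptotic regime of assumptions A1–A3.

```python
import random

def simulate(N=200, mu=0.02, s=0.05, t_max=100.0, seed=1):
    # Gillespie-style simulation: each individual dies at rate one and is
    # replaced by the child of a parent chosen with probability proportional
    # to the fitness max(0, 1 + s * (j - mean)); in addition, each individual
    # acquires a beneficial mutation at rate mu.
    rng = random.Random(seed)
    pop = [0] * N  # mutation count of each individual
    t = 0.0
    while t < t_max:
        t += rng.expovariate(N * (1.0 + mu))  # time of next event
        if rng.random() < mu / (1.0 + mu):
            # mutation event: a uniformly chosen individual gains a mutation
            pop[rng.randrange(N)] += 1
        else:
            # death/replacement event
            mean = sum(pop) / N
            weights = [max(0.0, 1.0 + s * (j - mean)) for j in pop]
            parent = rng.choices(range(N), weights=weights)[0]
            pop[rng.randrange(N)] = pop[parent]
    return pop

pop = simulate()
assert len(pop) == 200 and all(j >= 0 for j in pop)
```

Tabulating how many individuals carry each mutation count at a fixed large time typically shows a single dominant type, in line with the remark, though this sketch makes no quantitative claim.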

Proof of part 2 of Proposition 3.8
Recall that Proposition 3.6 consists of three parts. The first part simply bounds τ k * +1 . The second part is concerned with R(t), which can be interpreted as the number of new types that have emerged between times a N (t − 1) and a N t. The third part pertains to the spacings between the times τ j . The time ζ 3 is the first time at which one of the statements of Proposition 3.6 fails to hold. Part 2 of Proposition 3.8 states that ζ 3 cannot occur until either ζ 1 or ζ 2 has occurred. That is, as long as the behavior of the type j individuals follows the description in Propositions 3.1 and 3.2, and the mean number of mutations in the population behaves as described in Proposition 3.5, the results of Proposition 3.6 must continue to hold. Part 2 of Proposition 3.8, like part 1, is a deterministic statement. To prove it, we will assume that ζ 0 = ∞. We will fix a time t ∈ [t * , a N T ] and show that if ζ 1 > t and ζ 2 ≥ t, then ζ 3 > t, which means that the conclusions of Proposition 3.6 are valid through time t.

An upper bound on τ k * +1
In this subsection, we establish the following result, which gives part 1 of Proposition 3.6.
Proposition 7.1. For sufficiently large N , on the event that ζ 0 = ∞, ζ 1 > 2a N /k N , and (v) dv ≤ 3/s by part 1 of Proposition 3.5. Therefore, since 2µa N /k N → 0 as N → ∞ by (1.8), for sufficiently large N we have Also, by Proposition 3.1, if we set d = 0 when k * ≤ k − N and d = d k * when k * > k − N , we get Combining (7.1), (7.2), and (7.3), we see that there is a constant c > 0 such that Arguing as in (5.48), we get Because k * /k N → 1 as N → ∞, and k N − k * ≥ k N − k + N → 0 as N → ∞ by (3.5), the first term on the right-hand side of (7.5) is at least (1/2) log(s/µ) for sufficiently large N . Because (k N log k N )/ log(s/µ) → 0 as N → ∞ by assumption A2, it follows that the first term dominates the right-hand side of (7.5), and thus the expression in (7.5) tends to infinity as N → ∞. Hence, (7.4) holds, which completes the proof.

Approximating R(a N t)/k N by q(t)
In this subsection, we establish the second part of Proposition 3.6, which states that R(a N t)/k N can be well approximated by q(t), where q is the function defined in (1.13). The first lemma, Lemma 7.2, collects some properties of the function q. Proof. Note that (1.13) is equivalent to the renewal equation where f (u) = g(u) = 1 {0≤u<1} . That this equation has a unique solution which is nonnegative and bounded on every finite interval is a consequence of Theorem 2 in [10]. Another consequence of Theorem 2 in [10] is that the function t → q(t) − g(t) is continuous, which implies that q is right continuous on [0, ∞) and continuous on [0, 1) ∪ (1, ∞).
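Since the text identifies (1.13) with the renewal equation q = g + f ∗ q, where f (u) = g(u) = 1 {0≤u<1} , the function q can be approximated numerically. The sketch below assumes only that reading of the renewal equation, namely q(t) = 1 {0≤t<1} plus the integral of q(u) over (max(t − 1, 0), t); the step size and time horizon are arbitrary choices.

```python
import math

def solve_q(t_max=3.0, h=1e-3):
    # Discretize q(t) = 1_{0 <= t < 1} + integral of q(u) du over
    # (max(t - 1, 0), t), using a left Riemann sum over a sliding window.
    n = int(round(t_max / h))
    m = int(round(1.0 / h))  # grid points in a window of length one
    q = [0.0] * (n + 1)
    window = 0.0  # running sum of q over the current length-one window
    for k in range(n + 1):
        g = 1.0 if k * h < 1.0 else 0.0
        q[k] = g + h * window
        window += q[k]
        if k >= m:
            window -= q[k - m]  # drop the point that left the window
    return q

q = solve_q()
# On [0, 1) the equation reduces to q' = q with q(0) = 1, so q(t) = e^t
# there, and q(1) = e - 1, matching the value quoted later in the text.
assert abs(q[500] - math.exp(0.5)) < 1e-2    # q(0.5), grid step h = 1e-3
assert abs(q[1000] - (math.e - 1.0)) < 1e-2  # q(1) = e - 1
```

The computed values also stay within the bounds 1 ≤ q(t) ≤ e quoted from Lemma 7.2 later in this subsection.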
Consider next the case in which t * < t < a N . Suppose also that ζ 0 = ∞, ζ 1 > t, and ζ 2 ≥ t. Let θ > 0. If k * + 1 ≤ ℓ ≤ J and τ ℓ+1 ≤ t, then Lemma 6.1 implies that for sufficiently large N , Note that τ ℓ ≥ t * by parts 3 and 4 of Proposition 3.1. Therefore, because ζ 2 ≥ t, part 1 of Proposition 3.5 implies that Since 2e 3 < 41 and µa N → 0 as N → ∞, it follows from (7.9), (7.10), and (7.11) that for sufficiently large N , Therefore, for sufficiently large N , Furthermore, by repeating the above argument with t in place of τ j+1 , we see that for sufficiently large N , in which case the last statement of Lemma 6.1 implies that τ ℓ+1 ≤ t. Therefore, if k * + 1 ≤ j ≤ J and τ j ≤ t, then (7.12) implies that for sufficiently large N , and rearranging this equation gives j ≤ (k * + 1)e t/[a N (1−θ)] . In view of (3.23), it follows that for sufficiently large N , Likewise, equation (7.12) and Proposition 7.1 imply that if k * + 1 ≤ j ≤ J and τ j ≤ t, then for sufficiently large N , and the observation following (7.13) thus implies that if for sufficiently large N . Because k N → ∞ and k * /k N → 0 as N → ∞, we can see from (7.14) and (7.15) that for sufficiently large N , equation (7.8) holds for all t ∈ (t * , a N ) as long as θ is chosen to be sufficiently small relative to η.
We next consider the value of R(t) for t ∈ [a N , a N T ]. We will find it useful to introduce the following notation. For t ∈ [0, ζ 1 ∧ a N T ), let Note that M (t) is well-defined because, by Remark 3.4, we have τ j < τ j+1 , and therefore γ j < γ j+1 , whenever τ j < ζ 1 . As long as the conclusions of Proposition 3.5 hold, M (t) is a good approximation to the mean number of mutations in the population at time t.
The other possibility is that t < γ k * +1 . Because t − a N < τ k * +1 , the times τ k * +1 , . . . , τ j occur in the interval (t − a N , t]. Therefore R(t) = j − k * if t ≥ a N and R(t) = j if t < a N , which again matches the conclusion of the lemma in view of (7.16).
The lemma below is the key to obtaining the integral equation for the limit function q.
The following deterministic result will help us to obtain the second part of Proposition 3.6 from Lemmas 7.3 and 7.6.
We claim that r 1 (t) < r(t) < r 2 (t) and r 1 (t) < q(t) < r 2 (t) for all t ∈ [0, T ]. To see this, let u = inf{t : r(t) ≥ r 2 (t)}. Seeking a contradiction, suppose u ≤ T . Clearly u ≥ 1, and so r 2 (u) − r(u) ≥ (1 + η) ∫ u u−1 (r 2 (t) − r(t)) dt > 0, which contradicts the right continuity of r and r 2 . Therefore, r(t) < r 2 (t) for all t ∈ [0, T ]. A parallel argument gives r(t) > r 1 (t) for all t ∈ [0, T ]. The result for q is a special case of the result for r, which completes the proof of the claim.

The spacings between τ j and τ j+1
The third part of Proposition 3.6 primarily pertains to the spacings between τ j and τ j+1 . The proposition below establishes the necessary relationship between the times τ j and the function q, and leads easily to the main result (3.26).
Let η > 0. From (7.19) we see that if N is sufficiently large, then Let L j be the number of integers ℓ ≥ k * + 1 such that γ ℓ ∈ [τ j , τ j+1 ). Then the interval [τ j , τ j+1 ) intersects at most L j + 1 intervals of the form [γ ℓ−1 , γ ℓ ), so by Lemma 7.4, if N is sufficiently large, then By the induction hypothesis, (7.32) holds if j is replaced by ℓ ∈ {k * + 1, . . . , j − 1}. By Lemma 7.2, we have q(u) ≤ e for all u ≥ 0. Also, γ k * +1 /a N = 1 + τ k * +1 /a N ≤ 1 + 2/k N for sufficiently large N by Lemma 7.1. Since q is right continuous and q(1) = e − 1, it follows that for sufficiently large N , we have sup Thus, using (3.1) and (7.32), for sufficiently large N , It follows that L j ≤ 1 + (3k N /a N )(τ j+1 − τ j ). Combining this observation with (7.36) gives Write C = 3 + 4C 5 . Because k N /(a N s) → 0 as N → ∞ by (1.7) and µ/s → 0 as N → ∞, it follows that for sufficiently large N , Combining (7.38) with (7.34) and (7.35), we get that for sufficiently large N , To simplify notation, write Make the substitution u = v/a N and divide both sides by a N k N to get for sufficiently large N . By Proposition 7.8, we have |R(a N u)/k N − q(u)| < δ for u < τ j+1 /a N , so for sufficiently large N , We now pursue the upper and lower bounds separately. In view of part 2 of Proposition 3.5, because Combining this result with (7.39) yields Since sa N → ∞, we have C/(sa N ) < η for sufficiently large N . Therefore, bringing the last two terms on the right-hand side of (7.40) to the left-hand side, we get for sufficiently large N . Also, since q(u) ≥ 1 for all u ≥ 0 by Lemma 7.2, we have q(u)(1 − α) ≤ q(u) − α for all u ≥ 0 and α > 0. Therefore, for sufficiently large N , The upper bound (7.31) follows as long as η is chosen to be small enough relative to δ.
To obtain (7.32), note that h(v) ≤ k * for all v ∈ [a N , γ k * +1 ) ∩ [τ j , τ j+1 ). Therefore, for sufficiently large N , Combining this result with (7.39) and using that sa N → ∞ as N → ∞, we get for sufficiently large N , If x ≥ 1 and α > 0, then x(1 + α) ≥ x + α. Therefore, for sufficiently large N , The lower bound (7.32) follows as long as η is chosen to be small enough relative to δ. It remains to prove the last statement of the proposition. Suppose now that ζ 0 = ∞, ζ 1 > t, ζ 2 ≥ t, t ≤ a N T , and (7.33) holds. We need to show that τ j+1 ≤ t. By Lemma 6.1, if N is large enough, it suffices to show that Therefore, it suffices to show that for sufficiently large N , (7.41) Using (7.35), the bound in part 2 of Proposition 3.5, and the reasoning leading to (7.38) with t in place of τ j+1 , we get for sufficiently large N , for sufficiently large N . Combining this bound with (7.42) and (7.43), and then using (7.33), we get that for sufficiently large N , which implies (7.41) as long as η is chosen to be small enough relative to δ, in view of the fact that sa N → ∞ as N → ∞.

Proof of part 3 of Proposition 3.8
To prove part 3 of Proposition 3.8, we need to show that with high probability, the results of Propositions 3.2 and 3.3 hold as long as the results of Propositions 3.5 and 3.6 hold. Propositions 3.2 and 3.3 describe the behavior of the number of type j individuals. The proof proceeds by induction on j, in the sense that to show that the number of type j individuals behaves as predicted, we will need to know that the number of type j − 1 individuals does so. Define the stopping time We then need to show that Essentially, this means that the number of type j individuals behaves as expected with high probability until after time ρ j . Note that if t < ρ j , then the reasoning in Remark 3.7 implies that no individual of type J + 1 or higher can appear until after time t. Because assumption A3 implies that sk N → 0, we have sJ ≤ 1 for sufficiently large N . It follows that 1 + s(j − M (t)) ≥ 0 for all j ≥ 0, and therefore G * j (t) = G j (t) for all j ≥ 0 as noted in (4.6). Throughout this section, we will assume that N is large enough that sJ ≤ 1, which will make it possible to ignore the distinction between G * j (t) and G j (t).
where T j,1 (t), T j,2 (t), and T j,3 (t) denote the three terms in the previous line. To establish the result of part 1 of Proposition 3.2, we need to show that |T j,2 (t) + T j,3 (t)|/T j,1 (t) < δ with high probability for t ∈ [t * , ρ * j ]. We first bound T j,2 (t)/T j,1 (t).
Because the right-hand sides of (8.4) and (8.5) tend to zero as N → ∞, the result follows.
To bound T j,3 (t)/T j,1 (t), we will need to control the fluctuations of the process (Z ′ j (t), t ≥ t * ). The following preliminary bound will be useful.
Proof. Note that In view of parts 1 and 3 of Proposition 3.6, we have µ(u−t * ) ≤ µγ k * +K ≤ µ(a N +2Ka N /k N ) → 0 as N → ∞. If u ≤ a N , then u t * M (v) dv ≤ 3/s by Lemma 7.4 and therefore, for sufficiently large N , Suppose instead a N < u ≤ γ k * +K . By the results of Propositions 3.5 and 3.6, .
Since the other terms are of a smaller order of magnitude for large N , it follows that there is a positive constant c < 1/3 such that 3k N /(2sa N ) + (k N + C 4 ) + (K − 1)(k * + K + 2C 5 ) < ck 2 N for sufficiently large N . Hence, for sufficiently large N , The result follows from (8.7), (8.8), and (8.9).
Proof. The process (Z ′ j (t), t ≥ t * ) is a mean zero martingale. By Corollary 4.3 and (5.5), for t ≥ t * , For u < ρ * j , the conclusion of part 1 of Proposition 3.2 holds for j − 1 through time u, and so Plugging this result and the result of Lemma 8.3 into (8.10), and then bringing the conditional expectation inside the integral, we get for t > t * , Using this fact along with (8.2) followed by (8.11), we get for u > t * , Thus, for t > t * , By Lemma 8.1, for sufficiently large N we have µ(1 + δ)X j−1 (t * )(1 + 3/s) ≤ X j (t * ) on {ζ 0 = ∞}. Therefore, for t > t * , if N is sufficiently large, then on {ζ 0 = ∞} ∈ F t * , When j = 0, we take N large enough that (s/µ) 2k N /3 ≥ 21, and then, using the bound that e −sj(u−t * ) ≤ 1, equations (8.12) and (8.6) imply that on {ζ 0 = ∞}, When 1 ≤ j ≤ k * , we break the integral in (8.12) into two pieces and use (8.6) to get that on {ζ 0 = ∞}, . (8.14) Parts 1 and 3 of Proposition 3.6 imply that if γ k * +K ≤ ζ 3 , then In particular, we must have ρ * j ≤ 3a N /2. Combining this observation with the L 2 Maximum Inequality, we get that on {ζ 0 = ∞}, If we can show that, on {ζ 0 = ∞}, the second factor on the right-hand side of (8.16) is less than one for sufficiently large N , the result will follow by taking expectations of both sides in (8.16). We will assume that ζ 0 = ∞ and show that this factor tends to zero as N → ∞, uniformly in j.
We consider separately the cases j = 0 and 1 ≤ j ≤ k * . Suppose first that j = 0. We have X 0 (t * ) ≥ (1 − δ)N by Proposition 3.1, so using (8.13), which tends to −∞ as N → ∞ because (log k N )/(log N ) → 0 as N → ∞ and because, by assumption A1, we have log(1/s)/ log N → 0 and (log log(s/µ))/ log N → 0 as N → ∞. It follows that the expression in (8.17) tends to zero as N → ∞. Next, suppose 1 ≤ j ≤ k * . Then, using (8.14), We will show that the two terms on the right-hand side of (8.18) each go to zero as N → ∞. For the first term, we use Proposition 3.1, equation (5.27), and the fact that log(1/s)/k N → 0 by assumption A1 to get If j ≤ k − N , then − log N + j log(s/µ) ≤ 0, and sjt * ≥ 2j log k N . Therefore, the expression in (8.19) tends to −∞ as N → ∞. If instead j ∈ (k − N , k + N ), then t * = (4/s) log k N , and we can write j as in (3.9) to get which tends to −∞ as N → ∞ because b j < 2 and (5.32) holds. Thus, the first term on the right-hand side of (8.18) tends to zero as N → ∞. To bound the second term, we use (8.19) to get which tends to −∞ as N → ∞ because (k N log k N )/ log N → 0 as N → ∞. It follows that the right-hand side of (8.18) tends to zero as N → ∞.
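The L 2 Maximum Inequality used above is Doob's inequality E[max u≤t M (u) 2 ] ≤ 4E[M (t) 2 ] for a square-integrable martingale M . As a quick numerical sanity check, independent of the particular martingales Z ′ j in the proof, here is a Monte Carlo illustration with a simple symmetric random walk (the sample sizes are arbitrary):

```python
import random

# Monte Carlo illustration of Doob's L^2 maximal inequality
#   E[ max_{k <= n} S_k^2 ] <= 4 E[ S_n^2 ]
# for the simple symmetric random walk martingale S_k.
rng = random.Random(0)
n, runs = 400, 3000
max_sq_sum = 0.0
end_sq_sum = 0.0
for _ in range(runs):
    s, best = 0, 0
    for _ in range(n):
        s += 1 if rng.random() < 0.5 else -1
        if s * s > best:
            best = s * s
    max_sq_sum += best
    end_sq_sum += s * s

lhs = max_sq_sum / runs        # estimate of E[max_k S_k^2]
rhs = 4.0 * end_sq_sum / runs  # estimate of 4 E[S_n^2] = 4n
assert 0 < lhs <= rhs
```

In the proofs above, the same inequality converts the second moment bound at the terminal time into control of the running maximum of the martingale.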
To get (8.22), observe that when the complement of the event in (8.27) holds and ρ j ≥ γ k * +L , we have For v ∈ [γ k * +K , ρ j ), the result of Proposition 3.5 implies that for sufficiently large N , Also, the result of Proposition 3.6 implies that if ρ j ≥ γ k * +L then for sufficiently large N , Also, S j (γ k * +K ) ≤ N , so combining (8.29), (8.30), and (8.31), we get that when the complement of the event in (8.27) holds and ρ j ≥ γ k * +L , for sufficiently large N , The logarithm of the right-hand side of (8.32) is which tends to −∞ as N → ∞. Thus, the right-hand side of (8.32) tends to zero as N → ∞ and is therefore guaranteed to be less than one if N is sufficiently large. Since S j (γ k * +L ) is a nonnegative integer, it must be zero. Furthermore, if S j (γ k * +L ) = 0, then S j (t) = 0 for all t ≥ γ k * +L , which implies that X j (t) = 0 for all t ≥ γ k * +L . We can now conclude (8.22).
Remark 8.7. It follows immediately from Propositions 8.5 and 8.6 that if 0 ≤ j ≤ k * , then for sufficiently large N ,

Other type j individuals before time τ j+1
For the rest of section 8, we assume that j ∈ {k * + 1, . . . , J}. In this subsection, we focus on type j individuals that are not early, meaning they are descended from type j mutations that occurred after the time ξ j defined in (3.16). We will show that the claim in part 2 of Proposition 3.3 holds with high probability. We will begin with three preliminary lemmas. Define the random set Recall from (3.15) that as long as q j > 1, we have q j = j − k N if j ∈ Θ and q j = j − M (τ j ) if j / ∈ Θ. When j ∈ Θ, it will be difficult to bound X j (t) as tightly as when j / ∈ Θ, so we will structure the proof so that we can allow a larger probability of ζ 1,j ≤ ρ j when j ∈ Θ. Because the times τ i are spaced at least a N /3k N apart until time ζ 3 by Proposition 3.6, there can be at most 12 values of j for which τ j < ρ j and j ∈ Θ. Lemma 8.8. There is a positive constant C 9 such that for sufficiently large N , the following hold: 1. If j / ∈ Θ and t ∈ [τ j , τ j+1 ∧ ρ j ), then s(q j − C 9 ) ≤ G j (t) ≤ s(q j + C 9 ).
Recall that X j,2 (t) denotes the number of type j individuals at time t descended from an individual that acquired a type j mutation after time ξ j . Then, using the notation of Corollary 4.4, for t ∈ [ξ j , τ j+1 ∧ ρ j ], we have with the convention that Z ′ j (t) = 0 if ρ̃ j ≤ ξ j . Then for t ≥ ξ j , we have We will separately consider the two terms on the right-hand side of (8.42). Lemma 8.11 below gives the required bounds on the first term.
Lemma 8.11. For sufficiently large N , we have Proof. Suppose t ∈ [ξ j , ρ̃ j ]. By Lemma 8.10, for sufficiently large N , Because ξ j ≥ τ j , we have e −s(ξ j −τ j ) − e −s(t−τ j ) ≤ 1, which gives the upper bound in the lemma. Now suppose t ∈ [τ * j , ρ̃ j ]. The same argument that yields (8.43) implies that for sufficiently large N , For sufficiently large N , part 3 of Lemma 8.8 gives q j ≥ (1 − 2δ)k N when τ j < ρ j , which by assumption A1 implies that s(ξ j −τ j ) → 0 and therefore e −s(ξ j −τ j ) → 1 uniformly in j as N → ∞. Consequently, the lower bound in the lemma follows from (8.44).
It remains to show that the second term on the right-hand side of (8.42) is small. We know from Corollary 4.4 that the process (Z ′ j (ξ j + t), t ≥ 0) is a mean zero martingale, so the problem is to control the fluctuations of this process. The next result gives the key second moment estimate.
Lemma 8.12. For sufficiently large N , we have, for all t ≥ 0, Proof. By Corollary 4.4, we have We now can use the reasoning leading to (5.5) to get Using Lemma 8.10, we get that if u <ρ j , then Also, although Z ′ j (u) can be negative, it can be seen from (8.46) that the integrand in (8.49) must be nonnegative so, in particular, for u ∈ [ξ j ,ρ j ). Because s < 1 for sufficiently large N , we see that se −s(u−τ j ) − e −s(u−τ j ) is an increasing function of u. Also, Z ′ j (u) = Z ′ j (ρ j ) for all u ≥ ρ j . Therefore, (8.51) holds for all u ≥ ξ j . Thus, combining (8.49) and (8.50) gives Every expression in the integrand in (8.52) is F ξ j -measurable except Z ′ j (u). Since (Z ′ (ξ j +t), t ≥ 0) is a mean zero martingale by Corollary 4.4, we can apply Fubini's Theorem and then evaluate the conditional expectation in (8.52) to get Now for all u ≥ ξ j , so for sufficiently large N , as claimed.
Proof. Suppose ξ j < ρ j . Consider first the case in which j / ∈ Θ. Then for sufficiently large N , and the result follows because s(ξ j − τ j ) → 0 as N → ∞ by the argument following (8.45).
Lemma 8.14. For sufficiently large N , Proof. By the L 2 Maximum Inequality and Lemma 8.12, Plugging this result into (8.53), then taking expectations and using (3.14) and the fact that J ≤ 4T k N for sufficiently large N , we get that for sufficiently large N , .
The lemma follows.
Combining (8.42) with Lemmas 8.11 and 8.14 and then summing over j immediately yields the following corollary, which shows that the result of part 2 of Proposition 3.3 holds with high probability.

Early type j individuals before time τ j+1
In this subsection, we continue to assume j ∈ {k * +1, . . . , J}. We consider early type j individuals, which are descended from type j mutations that occur at or before the time ξ j . We will show that the claims of part 1 of Proposition 3.3 hold with high probability. Note that (3.18) involves a constant C 3 , which we will define to be We will assume throughout this section that N is large enough that the conclusions of Lemma 8.9 hold. From part 1 of Proposition 3.3, we know that if j ≥ k * + 2, then no early type j − 1 individual acquires a jth mutation until time τ j ∧ ρ j ∧ a N T . In particular, no type j individual can appear until time ξ j−1 ∧ ρ j . This result is also true when j = k * + 1 if we define ξ k * = t * because, according to Proposition 3.1, on {ζ 0 = ∞}, no individuals of type k * + 1 appear until after time t * . Therefore, using the notation from Corollary 4.4 in which X [u,v] j (t) denotes the number of type j individuals at time t descended from individuals that acquired a jth mutation during the time interval (u, v], as long as ξ j−1 < ρ j , we have (8.55) We will consider these three processes separately.
Lemma 8.16. Let (Z(t), t ≥ 0) be a continuous-time birth and death process in which each individual independently dies at rate ν > 0 and gives birth to a new individual at rate λ > ν. Assume that Z(0) = 1. Then Proof. It is well-known (see Section 5 of Chapter III in [1]) that the generating function for this process is Because P (Z(t) > 0) = 1 − F (0, t), the result (8.56) follows after some algebra. Also, at any given time, the probability that the next event is a birth is λ/(λ + ν), while the probability that the next event is a death is ν/(λ + ν). Therefore, (8.57) follows from well-known results for asymmetric random walks (see, for example, Section 3 of Chapter 3 in [13]).
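For a linear birth and death process with these rates, the probability in (8.56) has the textbook closed form P (Z(t) > 0) = (λ − ν)/(λ − νe −(λ−ν)t ), whose t → ∞ limit 1 − ν/λ matches the asymmetric random walk result behind (8.57). The sketch below checks this closed form by simulation; the parameter values are arbitrary, and the formula is quoted as the standard result for this process rather than reconstructed from the elided display (8.56).

```python
import math
import random

def survival_exact(lam, nu, t):
    # Standard closed form for P(Z(t) > 0) when Z is a linear birth-death
    # process with Z(0) = 1, birth rate lam, death rate nu (lam != nu).
    r = lam - nu
    return r / (lam - nu * math.exp(-r * t))

def survival_mc(lam, nu, t, runs, seed=0):
    # Gillespie simulation of the same process; the embedded jump chain
    # moves up with probability lam / (lam + nu), down otherwise.
    rng = random.Random(seed)
    alive = 0
    for _ in range(runs):
        z, clock = 1, 0.0
        while z > 0:
            clock += rng.expovariate(z * (lam + nu))
            if clock >= t:
                alive += 1
                break
            z += 1 if rng.random() < lam / (lam + nu) else -1
    return alive / runs

est = survival_mc(2.0, 1.0, 1.0, runs=20000)
assert abs(est - survival_exact(2.0, 1.0, 1.0)) < 0.02
```

As t → ∞ the exact formula tends to 1 − ν/λ, the eventual survival probability given by the random walk computation cited in the proof.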
Lemma 8.17. Suppose κ is an (F t ) t≥0 stopping time such that ξ j−1 ≤ κ ≤ ξ j and, with positive probability, a type j mutation occurs at time κ. For sufficiently large N , the following hold: 1. Given that a type j mutation occurs at time κ, the probability that the number of type j descendants of this mutation exceeds (s/µ) 1−δ before time ρ j is at most 3sk N .
2. Given that a type j mutation occurs at time κ, the probability that κ + a N /8T k N < ρ j and at least one type j individual descended from this mutation is alive at time κ + a N /8T k N is at most 3sk N .
Proof. Suppose a type j mutation occurs at time κ. By the reasoning leading to (4.1), each type j descendant of the individual that gets this mutation gives birth at rate less than or equal to 1 + s(j − M (t)). Since s(j − M (t)) = G j (t) + µ, it follows from Lemma 8.8 that until time ρ j , the birth rate is at most λ = 1 + (e + 2δ)sk N . As long as the number of type j individuals descended from this mutation is less than (s/µ) 1−δ , the reasoning leading to (4.2) implies that the rate at which each such individual either acquires a mutation or dies and gets replaced by an individual that is not a type j individual descended from this mutation is at least µ + 1 − (s/µ) 1−δ (1 + s(j − M (t)))/N . Using Lemma 8.8 and (1.9), we see that for sufficiently large N , this quantity is at least ν = µ + 1 − δsk N until time ρ j . Therefore, until time ρ j occurs or the number of type j individuals descended from this mutation reaches (s/µ) 1−δ , the number of such individuals is dominated by a continuous-time branching process in which each individual gives birth at rate λ and dies at rate ν. By Lemma 8.16, the probability that the number of type j individuals descended from this mutation exceeds (s/µ) 1−δ before time ρ j is at most Likewise, the probability that κ + a N /8T k N < ρ j and at least one type j individual descended from this mutation is alive at time κ + a N /8T k N is less than or equal to We must show that the expressions in (8.58) and (8.59) are bounded above by 3sk N for sufficiently large N . We have Because e + 3δ < 3 by (3.1), it remains only to show that the denominators of the expressions in (8.58) and (8.59) tend to one as N → ∞. If N is large enough that ν < 1, then we have (ν/λ) (s/µ) 1−δ ≤ (1 + (e + 2δ)sk N ) −(s/µ) 1−δ , which tends to zero as N → ∞ because The result follows. Lemmas 8.18, 8.19, and 8.20 below give us the bounds that we will need to establish that the result of part 1 of Proposition 3.3 holds with high probability.
We will use the notation o(k −1 N ) for a collection of probabilities p j,N such that Lemma 8.18 shows that it is highly unlikely that any type j mutations appearing before time τ j will have descendants alive in the population after time τ * j . As a result, it will be possible essentially to ignore such mutations.
Proof. Write ρ̃ j = τ j ∧ ρ j . Suppose first that j ≥ k * + 2. Because ρ̃ j ≤ ζ 1,j−1 , the result of part 2 of Proposition 3.3 holds for type j − 1 individuals up to time ρ̃ j , which means Also, since ρ̃ j ≤ ζ 1,j−1 , Lemma 6.1 implies that for sufficiently large N , which leads to Now suppose instead that j = k * + 1, and recall that ξ k * = t * by definition. Then because ρ̃ j ≤ ζ 1,j−1 , the result of part 1 of Proposition 3.2 gives Reasoning as in the proof of Lemma 6.1 but using (3.12), we get that for sufficiently large N , so (8.62) holds in this case as well. Therefore, combining (8.62) with part 2 of Lemma 8.8 and writing C 10 = 2(1 + 4δ)/(1 − 2δ), we get Because ρ̃ j ≤ ζ 1,j−1 , the last statement of part 1 of Proposition 3.3 implies that no early type j − 1 individual acquires a jth mutation before time ρ̃ j . Because each type j − 1 individual acquires mutations at rate µ, the number of times that type j − 1 individuals that are not early acquire a jth mutation between the times ξ j−1 and inf u : is Poisson with mean C 10 /k N . In particular, the probability that at least one such mutation occurs during this time period is at most C 10 /k N , and the probability that two or more such mutations occur during this time period is at most C 2 10 /k 2 N . If such a mutation occurs before time ρ̃ j , then by Lemma 8.17, the probability that the number of type j descendants of this mutation exceeds (s/µ) 1−δ before time ρ j is at most 3sk N . Likewise, the probability that some type j descendant of this individual is still alive at time τ * j ∧ ρ j is at most 3sk N . Thus, the probabilities of the events in (8.60) and (8.61) are both bounded above by This expression is o(k −1 N ) because sk N → 0 as N → ∞ by assumption A3.
Lemma 8.19 bounds the probability that, when j / ∈ Θ, we have an early type j mutation with descendants alive after time τ * j . This bound is given in (8.64) below. A sharper bound is given in (8.63) for the probability that such a mutation occurs before time ξ − j .
Lemma 8.19. For sufficiently large N , we have (8.63) and (8.64). Proof. By Lemma 8.10 and part 1 of Lemma 8.8, on the event {j / ∈ Θ}, we have By part 3 of Lemma 8.8, we have (1 + 3δ)/(q j + C 9 ) ≤ 2/k N for sufficiently large N . Also, recalling (8.39) and observing that log(1/sq j )/q j → 0 as N → ∞ on {τ j < ρ j } by assumption A1 and part 3 of Lemma 8.8, we get that for sufficiently large N , on {τ j < ρ j }, Therefore, on the event {j / ∈ Θ}, we have Likewise, if we replace ξ − j by ξ j in (8.65), (8.66), and (8.67), we get that on the event {j / ∈ Θ}, Let Γ 1 be the number of type j mutations between times τ j and ξ − j ∧ ρ j , and let Γ 2 be the number of type j mutations between times τ j and ξ j ∧ ρ j . Because each type j − 1 individual acquires mutations at rate µ, equations (8.67) and (8.68) provide bounds on the means of Γ 1 and Γ 2 . Let A i be the event that τ * j ≤ ρ j and the individual that gets the ith type j mutation between times τ j and ξ j has type j descendants alive at time τ * j . By Lemma 8.9, this individual must have type j descendants alive for at least a time a N /8T k N after the time of the mutation. Therefore, by Lemma 8.17, we have P (A i |Γ ≥ i) ≤ 3sk N . Using part 3 of Lemma 8.8, equation (3.1), and the fact that J/k N ≤ 4T for sufficiently large N , we get Equation (8.63) follows because e −b < ε/832T by (3.14). Likewise, which implies (8.64).
Lemma 8.20. For sufficiently large N , on the event {ρ j > τ j }, we have .
Furthermore, (8.69) holds even if j is random, as long as τ j is a stopping time.
Proof. Let ρ̃ j = τ j+1 ∧ ρ j . Using the notation of Corollary 4.4, if t ≥ τ j , then If Y (t) is defined in this way for all t ≥ τ j , then the process (Y (t), t ≥ τ j ), having been expressed in (8.72) as the sum of an increasing process and a martingale, is a submartingale. By Doob's Maximal Inequality, By (8.72) and Lemma 8.10, for some t ∈ [τ j , τ * j ∧ ρ j ].
Write θ = 1 − δ − (e + 2δ)/4T , which is positive by (3.1). Arguing as in the derivations of (8.73) and (8.74) but using (s/µ) θ in place of C 3 and τ * j ∧ ρ j in place of ρ̃ j , we get The result (8.71) follows because (s/µ) −θ k N → 0 as N → ∞, as can be seen by taking the logarithm and using (1.7). The argument for (8.70) is similar to the argument for (8.69). Again using Corollary 4.4, we have By (8.75), the process (W (ξ − j + t), t ≥ 0) is a submartingale. By Doob's Maximal Inequality, By (8.75) and Lemma 8.10, Since q j ≥ (1 − 2δ)k N on {τ j < ρ j } for sufficiently large N by part 3 of Lemma 8.8, it follows that for sufficiently large N , we have, on {τ j < ρ j }, Therefore, recalling (8.54) and noting that J ≤ 4T k N for sufficiently large N , we get .
Taking conditional expectations of both sides with respect to $\mathcal{F}_{\tau_j}$ yields (8.70).
Proof. We first bound the probability that an early type $j$ individual gets a $(j+1)$st mutation between times $\xi_{j-1}$ and $\tau_j^*$. When $E_1$ occurs, we have, using (3.26), as $N \to \infty$, . Because each type $j$ individual acquires mutations at rate $\mu$, the expression on the right-hand side of (8.80) bounds the probability that $E_1$ occurs and, for some $j \in \{k^* + 1, \ldots, J\}$, an early type $j$ individual gets another mutation between times $\xi_{j-1}$ and $\tau_j^*$. Consider next the possibility that such a mutation occurs between times $\tau_j^*$ and $\tau_{j+1} \wedge \rho_j$. In view of (8.64) and the fact that there are at most 12 values of $j$ for which $\tau_j < \rho_j$ and $j \in \Theta$, the probability that there are fewer than $k_N^{1/2}$ values of $j$ for which $X_{j,1}(t) > 0$ for some $t \in [\tau_j^*, \rho_j]$ tends to one as $N \to \infty$. Suppose there are indeed fewer than $k_N^{1/2}$ such values of $j$, and suppose $E_2$ and $E_3$ occur. Then, for sufficiently large $N$, . Therefore, using part 2 of Lemma 8.8, as $N \to \infty$, . The expression on the right-hand side of (8.81) bounds the probability that for some $j \in \{k^* + 1, \ldots, J\}$, an early type $j$ individual gets another mutation between times $\tau_j^*$ and $\bar\rho_j$. Equations (8.80) and (8.81) thus imply that the probability that $E_1$, $E_2$, and $E_3$ occur but $A_j$ also occurs for some $j \in \{k^* + 1, \ldots, J\}$ tends to zero as $N \to \infty$. The result follows.
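The steps of the form "because each type $j$ individual acquires mutations at rate $\mu$" all use the same elementary bound: the counting process of mutations to type $j$ individuals has intensity $\mu X_j(t)$, so taking expectations of its compensator and applying Markov's inequality gives

```latex
% N_j[t_1, t_2] = number of mutations to type j individuals during [t_1, t_2];
% its compensator is \mu \int_{t_1}^{t_2} X_j(t) \, dt, so
\[
P\bigl( N_j[t_1, t_2] \ge 1 \bigr) \;\le\; E\bigl[ N_j[t_1, t_2] \bigr]
\;=\; \mu \, E\!\left[ \int_{t_1}^{t_2} X_j(t) \, dt \right].
\]
```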

Type $j$ individuals between times $\tau_{j+1}$ and $\gamma_{j+K}$
In this subsection, we show that the number of type $j$ individuals behaves quite predictably between times $\tau_{j+1}$ and $\gamma_{j+K}$. In particular, we show that the result of part 3 of Proposition 3.3 holds with high probability. The key to the argument will be showing that the fluctuations in $X_j(t)$ are small. We assume throughout the subsection that $j \in \{k^* + 1, \ldots, J\}$. Let . We apply Corollary 4.3 with $\tau_{j+1}$ in place of $\kappa$ and $\rho'_j$ in place of $\tau$ to get that for $t \ge \tau_{j+1}$, . To lighten notation, we will set , and then the process $(Z''_j(\tau_{j+1} + t), t \ge 0)$ is a mean-zero martingale. By definition, we have $s/\mu \le X_j(\tau_{j+1}) \le 1 + s/\mu$, so the first term in (8.82) is very close to the expression in (3.20). Therefore, to show that (3.20) holds with high probability, we need to show that the second and third terms in (8.82) are small relative to the first term with high probability. We begin with a result similar to Lemma 8.10 that holds between times $\gamma_{j-1+K}$ and $\gamma_{j+K}$.
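The assertion that $(Z''_j(\tau_{j+1} + t), t \ge 0)$ is a mean-zero martingale is an instance of the usual compensation of a Markov jump process by its drift (the decomposition behind Corollary 4.3); schematically, with $b(t)$ denoting the instantaneous drift rate:

```latex
% If X is a jump process whose drift satisfies E[dX(t) | F_t] = b(t) dt,
% then subtracting the accumulated drift leaves a mean-zero martingale:
\[
Z(t) \;=\; X(t) - X(0) - \int_0^t b(u)\,du .
\]
```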
It remains to bound the third term on the right-hand side of (8.82). To bound this term, we will need to control the fluctuations of the martingale $(Z''_j(\tau_{j+1} + t), t \ge 0)$. Lemma 8.27 below gives the required second moment bound. Before stating this lemma, we provide some estimates on $G_j(v)$ in the following two lemmas.
Proof. We will use the results of Proposition 3.5, which by definition hold up to time $\rho'_j$. Also, recall that $K = \lfloor k_N/4 \rfloor$. First, suppose $j \ge k^* + 1 + K$ and $t < \rho'_j$. If $t \le a_N$, then by part 1 of Proposition 3.5, for sufficiently large $N$, . If $t \in (a_N, \gamma_{k^*+1})$, then by part 2 of Proposition 3.5, for sufficiently large $N$, . If $t \in [\gamma_{k^*+1}, \gamma_{j-K})$, then by part 3 of Proposition 3.5, for sufficiently large $N$, , which leads to (8.90). Next, suppose $k^* + 1 \le j \le k^* + K$. If $t < a_N \wedge \rho'_j$, then (8.91) holds as before, which again yields (8.90).
, and note from the definition of $\rho'_j$ that this means $u < \gamma_{j+K}$. Now . Because positive terms can be bounded below by zero, we have, using (3.26), . Using (3.26) and (7.17) and the fact that there are at most $2K$ terms in the sum, we get . It remains to consider the integral between times $\tau_{j+1}$ and $\gamma^*$. Suppose first that $j \ge k^* + 1 + K$. In view of part 3 of Proposition 3.6, for sufficiently large $N$, as long as $\gamma^* < \rho'_j$, we have . Thus, assuming that $\gamma^* < \rho'_j$, Lemma 8.25 implies that for sufficiently large $N$, . Suppose next that $k^* + 1 \le j \le k^* + K$. Then, as long as $a_N < \rho'_j$, parts 1 and 3 of Proposition 3.6 imply that for sufficiently large $N$, , and the same reasoning that yields (8.98) gives . Therefore, since $(\gamma_{k^*+1} \wedge \rho'_j) - a_N \le 2a_N/k_N$ by part 1 of Proposition 3.6, for sufficiently large $N$, . (8.100) By combining (8.97) and (8.98) when $j \ge k^* + 1 + K$ and $u \in [\gamma_{j-K}, \rho'_j)$, and by combining (8.97), (8.99), and (8.100) when $k^* + 1 \le j \le k^* + K$ and $u \in [a_N, \rho'_j)$, we obtain for sufficiently large $N$ in both cases, . The result of the lemma follows.
Taking expectations of both sides yields the result.

Type $j$ individuals after time $\gamma_{j+K}$
In this subsection, we show that the number of type $j$ individuals decreases rapidly after time $\gamma_{j+K}$. More specifically, we show that the results of parts 4 and 5 of Proposition 3.3 hold with high probability. We will consider the event . By Corollary 8.29 with $t = \gamma_{j+K}$, with probability at least $1 - \varepsilon/24$, for all $j \in \{k^* + 1, \ldots, J\}$ either $H_j$ occurs or $\gamma_{j+K} > \rho_j$. Recall also the definition of the event $F_j$ from (8.83). on the event $F_j \cap H_j \cap \{\gamma_{j+K} < \rho_j\}$.

(8.116)
Also, on the complement of the event in (8.115), if $\rho_j \ge \gamma_{j+L}$ then
$$S_j(\gamma_{j+L}) \le \frac{k_N^2}{2} \, e^{\int_{\gamma_{j+K}}^{\gamma_{j+L}} G_j(v) \, dv} \, S_j(\gamma_{j+K}).$$
Reasoning exactly as in (8.29), (8.30), (8.31), and (8.32) but with $j$ in place of $k^*$, we get that on the complement of the event in (8.115), if $\rho_j \ge \gamma_{j+L}$ then for sufficiently large $N$, . As in the discussion following (8.32), we see that the right-hand side tends to zero as $N \to \infty$ and thus must be less than one if $N$ is large enough. Because $S_j(\gamma_{j+L})$ is an integer, it follows that $S_j(\gamma_{j+L}) = 0$, and therefore that $X_j(t) = S_j(t) = 0$ for all $t \ge \gamma_{j+L}$. Combining this observation with (8.116), we get that the sum of the probabilities in (8.114) is bounded above by
$$\sum_{j=k^*+1}^{J} \left( \frac{2}{k_N^2} + P(F_j \cup H_j) \right).$$
9 Proof of Theorems 1.1, 1.2, and 1.4

With Proposition 3.8 having been established, in this section we use this result to prove Theorems 1.1, 1.2, and 1.4. All of these theorems follow rather directly from Propositions 3.2, 3.3, 3.5, and 3.6, which, as noted in section 3, all follow from Proposition 3.8. We prove Theorem 1.1 in section 9.1, Theorem 1.2 in section 9.2, and Theorem 1.4 in section 9.3.

The selective advantage of the fittest individuals
Recall that $Q(t)$, defined in (1.12), is the difference between the number of mutations carried by the fittest individual and the mean number of mutations in the population. Consequently, it measures the selective advantage that the fittest individuals have over typical individuals. Theorem 1.1 describes the asymptotic behavior of the process $(Q(t), t \ge 0)$ as the population size tends to infinity.
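For intuition only, $Q(t)$ can be observed directly in a small simulation of the model from the introduction. The sketch below is not part of the paper: the function name, the discrete-event scheme, and all parameter values are our own illustrative choices.

```python
import random

def simulate_q(N=1000, mu=0.01, s=0.02, t_max=50.0, seed=1):
    """Simulate the fixed-size model from the introduction and track
    Q(t) = (max mutation count) - (mean mutation count).

    Each individual carries a mutation count; beneficial mutations arrive
    at rate mu per individual, deaths at rate 1 per individual, and the
    replacing parent is chosen with probability proportional to the fitness
    max(0, 1 + s * (j - mean)).  All parameter values are illustrative.
    """
    rng = random.Random(seed)
    counts = [0] * N              # mutation count of each individual
    t = 0.0
    q_trace = []
    total_rate = N * (1.0 + mu)   # total death rate N plus mutation rate N*mu
    while t < t_max:
        t += rng.expovariate(total_rate)
        if rng.random() < mu / (1.0 + mu):
            # a uniformly chosen individual acquires a beneficial mutation
            counts[rng.randrange(N)] += 1
        else:
            # a uniformly chosen individual dies ...
            i = rng.randrange(N)
            mean = sum(counts) / N
            # ... and is replaced by the child of a fitness-biased parent
            weights = [max(0.0, 1.0 + s * (j - mean)) for j in counts]
            parent = rng.choices(range(N), weights=weights)[0]
            counts[i] = counts[parent]
        q_trace.append((t, max(counts) - sum(counts) / N))
    return q_trace
```

Since the fittest individual always carries at least the mean number of mutations, every recorded value of $Q$ is nonnegative.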
Proof of Theorem 1.1. We assume that $N$ is large enough that the conclusions of Propositions 3.2, 3.3, 3.5, and 3.6 hold with probability at least $1 - \varepsilon$, and we work on the event that the conclusions of these propositions hold. Suppose first that $0 < u < v < 1$. By Proposition 3.5, we have
$$\sup_{t \in S} M(a_N t) \le 3 \qquad (9.2)$$
Also, note that $a_N u > t^*$ for sufficiently large $N$, as can be seen from (1.7). Therefore, by part 1 of Proposition 3.6, for sufficiently large $N$ we have $\tau_{k^*+1} \le a_N u$. Then, recalling (3.23) and using either part 3 of Proposition 3.6 or Remark 3.4, for sufficiently large $N$ we have $R(a_N t) = \max\{j : \tau_j \le a_N t\}$ for all $t \in S$. Suppose $R(a_N t) = i$, so $\tau_i \le a_N t < \tau_{i+1}$. By part 1 of Proposition 3.3, no type $i + 2$ individual can appear before time $\tau_{i+1}$, which implies that $\max\{j : X_j(a_N t) > 0\} \le i + 1$. By part 3 of Proposition 3.3, we have $X_{i-1}(a_N t) > 0$. Therefore, for all $t \in S$,
$$R(a_N t) - 1 \le \max\{j : X_j(a_N t) > 0\} \le R(a_N t) + 1.$$
Now suppose instead $1 < u < v < \infty$. Recall from (1.23) that $j(t) = \max\{j : \gamma_j \le a_N t\}$, and write $h(t) = \max\{j : \tau_j \le a_N t\}$. Then, again recalling (3.23), for all $t \in S$ we have $R(a_N t) = h(t) - j(t)$. By following again the derivation of (9.3), we get, for all $t \in S$,
$$h(t) - 1 \le \max\{j : X_j(a_N t) > 0\} \le h(t) + 1.$$

The speed of evolution
Here we obtain Theorem 1.2, which gives the asymptotic behavior of the mean number of mutations in the population and therefore determines the speed of evolution.
Proof of Theorem 1.2. It suffices to prove (1.18) when $S = [u, v]$, where either $0 \le u < v < 1$ or $1 < u < v < \infty$. Suppose first that $0 \le u < v < 1$. Then $m(t) = 0$ for all $t \in S$. By part 1 of Proposition 3.5, for all $\varepsilon > 0$, we have . Suppose instead that $1 < u < v < \infty$. We fix $\varepsilon > 0$, $\delta > 0$, and $T > \max\{1, v\}$. We assume for now that $N$ is large enough that the conclusions of Propositions 3.5 and 3.6 hold with probability at least $1 - \varepsilon$, and we work on the event that the conclusions of these propositions hold. Recall that $j(t) = \max\{j : \gamma_j \le a_N t\} = \max\{j : \tau_j \le a_N(t - 1)\}$, so $a_N t \in [\gamma_{j(t)}, \gamma_{j(t)+1})$ for all $t \in S$. By part 1 of Proposition 3.6 we have $j(t) \ge k^* + 1$ for all $t \in S$ if $N$ is sufficiently large, so it follows from Proposition 3.5 that $|M(a_N t) - j(t)| \le 2C_5$ for all $t \in S$. We now obtain upper and lower bounds on the expression in (9.9). For the upper bound, we use (3.24) along with part 1 of Proposition 3.6 and the fact that $q(t) \le e$ for all $t$ by Lemma 7.2 to get . For the lower bound, we use (3.25) and the fact that $(\gamma_{k^*+1}/a_N) - 1 \le 2/k_N$ by part 1 of Proposition 3.6 to get . It follows from (9.8), (9.10), and (9.11) that there exists a positive constant $C$, depending on $\varepsilon$, $\delta$, and $T$, such that . Because Proposition 3.6 implies that for all $t \in S$,
$$j(t) \le k^* + \frac{3k_N}{a_N} \cdot a_N v \le k^* + 3vk_N,$$
and, by our assumptions, the event in (9.12) holds with probability at least $1 - \varepsilon$ for sufficiently large $N$, the result (1.18) follows.

The distribution of fitnesses in the population
In this subsection, we prove Theorem 1.4, which describes the distribution of the fitnesses of individuals in the population at time $a_N t$. We begin with a lemma concerning the differences $\tau_{j+1} - \tau_j$. Recall again the definition of $j(t)$ from (1.23).
Proof of Theorem 1.4. Let $\eta > 0$ and $t \in (1, 2) \cup (2, \infty)$. Choose $\theta = \theta(\eta, t)$ such that $0 < \theta < 1/4$ and the three conditions at the beginning of the proof of Lemma 9.1 are satisfied. As in the proof of Lemma 9.1, choose $\varepsilon > 0$, $\delta \in (0, \eta/14)$, and $T > t$. We may assume that $N$ is large enough that the conclusions of Propositions 3.3 and 3.6 hold with probability at least $1 - \varepsilon$. For now we will suppose that the conclusions of Propositions 3.3 and 3.6 hold.