Dynamics of the evolving Bolthausen-Sznitman coalescent

Consider a population of fixed size that evolves over time. At each time, the genealogical structure of the population can be described by a coalescent tree whose branches are traced back to the most recent common ancestor of the population. As time goes forward, the genealogy of the population evolves, leading to what is known as an evolving coalescent. We will study the evolving coalescent for populations whose genealogy can be described by the Bolthausen-Sznitman coalescent. We obtain the limiting behavior of the evolution of the time back to the most recent common ancestor and the total length of the branches in the tree. By similar methods, we also obtain a new result concerning the number of blocks in the Bolthausen-Sznitman coalescent.


Introduction
Consider a haploid population of fixed size n that evolves over time. The genealogy of the population at time t can be represented by a coalescent process (Π(s), s ≥ 0) taking its values in the set of partitions of {1, . . . , n}, which is defined so that integers i and j are in the same block of Π(s) if and only if the ith and jth individuals in the population at time t have the same ancestor at time t−s. The genealogical structure encoded by the coalescent process can also be represented as a tree T n (t). The shape of this tree changes over time as the population evolves, leading to what was called in [23] an evolving coalescent. The associated tree-valued stochastic process, for infinite as well as finite populations, was constructed and studied by Greven, Pfaffelhuber, and Winter [19]. Depperschmidt, Greven, and Pfaffelhuber [11] incorporated mutation and selection into the model.
Rather than studying the full tree-valued process, one can follow the evolution of certain properties of the tree that are of interest. One such property is the time back to the most recent common ancestor (MRCA) of the population. This evolution can be described by a process (A n (t), t ≥ 0), where t − A n (t) = sup{s : one individual at time s is the ancestor of all individuals at time t}.
Note that A n (t) is the height of the tree T n (t). The process (A n (t), t ≥ 0) increases linearly at speed one between jumps, and jumps downward when one of the two oldest families in the population dies out, causing a new MRCA to be established. One can also consider the process (L n (t), t ≥ 0), where L n (t) denotes the sum of the lengths of all branches in the tree T n (t). This process is of interest because, assuming that mutations occur at a constant rate θ along each branch of the coalescent tree, L n (t) should be roughly proportional to the number of distinct mutations observed in the population at time t.
A natural population model to consider is the Moran model [22]. In this model, the population size stays fixed at n. Changes in the population occur at times of a homogeneous Poisson process, and we will scale time so that these changes occur at rate n(n − 1)/2. At the time of each such change, one of the n individuals is chosen at random to give birth to a new offspring, and independently one of the other n − 1 individuals is chosen at random to be killed. For any fixed t ∈ R, the genealogy of the population follows Kingman's coalescent [20], meaning that each pair of lineages merges at rate one and no other transitions are possible. An analogous construction for infinite populations can be carried out using the lookdown construction of Donnelly and Kurtz [14]. The associated evolving coalescent was studied by Pfaffelhuber and Wakolbinger [23]. They showed that the jumps of the process (A(t), t ≥ 0) that follows the time back to the MRCA occur at times of a homogeneous Poisson process, but that the process (A(t), t ≥ 0) is not Markov. They also calculated the distributions of some other quantities, such as the number of individuals in the population at time t that will have descendants in the population when the next MRCA is established and the number of individuals in the population that will become the MRCA of the population in the future. Delmas, Dhersin, and Siri-Jegousse [12] extended these results by considering also the distribution of the sizes of the two oldest families at time t. Simon and Derrida [32] did some further work related to the evolution of the MRCA for populations with genealogies governed by Kingman's coalescent, and considered correlations between the time back to the MRCA and a measure of genetic diversity. Pfaffelhuber, Wakolbinger, and Weisshaupt [24] studied the evolution of the total branch length. 
They showed that the sequence of processes (L n (t) − 2 log n, t ∈ R) converges as n → ∞ in the Skorohod topology to a limit process which is a stationary process with infinite infinitesimal variance.
Evans and Ralph [17] studied the dynamics of the time back to the MRCA in a population in which a single "immortal particle" produces offspring at times of a Poisson process, and descendants of the offspring eventually die out. In this setting, the process (A(t), t ≥ 0) is a Markov process whose jump rates and stationary distribution can be calculated explicitly. An example of a process that fits into this framework is the α-stable continuous-state branching process conditioned on nonextinction with 1 < α ≤ 2.
The goal of the present paper is to determine the dynamics of the time back to the MRCA and the total branch length for populations whose genealogy is given by the Bolthausen-Sznitman coalescent. The Bolthausen-Sznitman coalescent, which was introduced in [6], is an example of a coalescent with multiple mergers [25,28], in which it is possible for many lineages to merge at once. More precisely, the Bolthausen-Sznitman coalescent started with n blocks is a continuous-time Markov chain taking its values in the set of partitions of {1, . . . , n} such that whenever the partition has b blocks, each transition that involves the merger of k blocks into one happens at rate
$$\lambda_{b,k} = \int_0^1 x^{k-2}(1-x)^{b-k}\,dx = \frac{(k-2)!\,(b-k)!}{(b-1)!}. \qquad (1)$$
This means that the total rate of all transitions when there are b blocks is
$$\lambda_b = \sum_{k=2}^{b} \binom{b}{k}\lambda_{b,k} = \sum_{k=2}^{b} \frac{b}{k(k-1)} = b - 1.$$
It is well-known (see, for example, [25]) that this process has the property of sampling consistency, meaning that if m < n, then the process restricted to the integers {1, . . . , m} is a Bolthausen-Sznitman coalescent started with m blocks.
The reason for focusing on the Bolthausen-Sznitman coalescent is that this coalescent process has arisen in a wide variety of settings. The Bolthausen-Sznitman coalescent has been shown recently to describe the genealogy of certain populations undergoing selection [3,8]. The Bolthausen-Sznitman coalescent also describes the genealogical structure of Neveu's continuous-state branching process [5], certain Galton-Watson processes with heavy-tailed offspring distributions [31], and Derrida's generalized random energy model [6,7].
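As a sanity check on the merger rates, the following Python sketch evaluates them with exact rational arithmetic. It assumes the standard closed form $\lambda_{b,k} = (k-2)!\,(b-k)!/(b-1)!$ for the Bolthausen-Sznitman coalescent; the function names are illustrative and are not taken from the paper. The last function tests the usual consistency identity $\lambda_{b,k} = \lambda_{b+1,k} + \lambda_{b+1,k+1}$, which is the rate-level counterpart of sampling consistency.

```python
from fractions import Fraction
from math import comb, factorial


def merger_rate(b, k):
    """Rate at which a given set of k out of b blocks merges (assumed closed form)."""
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))


def total_rate(b):
    """Total transition rate with b blocks: sum over k of C(b, k) * lambda_{b,k}."""
    return sum(comb(b, k) * merger_rate(b, k) for k in range(2, b + 1))


def is_sampling_consistent(b, k):
    """Check lambda_{b,k} = lambda_{b+1,k} + lambda_{b+1,k+1} exactly."""
    return merger_rate(b, k) == merger_rate(b + 1, k) + merger_rate(b + 1, k + 1)


if __name__ == "__main__":
    for b in (2, 5, 20):
        # Each line should show the total rate equal to b - 1.
        print(b, total_rate(b))
```

Using `Fraction` rather than floating point keeps the identities exact, so equality tests are meaningful even for moderately large b.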
We describe the population model that we will study in subsection 1.1. In subsection 1.2, we state our main result concerning the dynamics of the time back to the MRCA. In subsection 1.3, we state our main result concerning the total branch length. By similar methods, we also obtain a new result concerning the number of blocks of the Bolthausen-Sznitman coalescent. We state this result in subsection 1.4. The rest of the paper is devoted to proofs.

A population model
We now define more precisely the population model that we will study in this paper. We assume that for all times $t \in \mathbb{R}$, there are exactly n individuals in the population, labeled by the integers 1, . . . , n. Changes in the population occur at times of a homogeneous Poisson process on $\mathbb{R}$ with rate n − 1. At the time of such a change, one particle is chosen at random to give birth to a random number ξ of new offspring, with
$$P(\xi = k) = \frac{n}{n-1}\cdot\frac{1}{k(k+1)}, \qquad k = 1, 2, \dots, n-1.$$
Then ξ of the n − 1 individuals who did not give birth are chosen at random to be killed, and the new individuals take over the labels of the individuals who were killed. We now give a representation of the genealogy of this population. For each $s \in \mathbb{R}$ and $t \ge 0$, let $\Pi_n(s, t)$ be the partition of {1, . . . , n} such that i and j are in the same block of $\Pi_n(s, t)$ if and only if the individuals at time s labeled i and j are descended from the same ancestor immediately before time s − t. We consider the population immediately before time s − t, rather than exactly at time s − t, to ensure that for each $s \in \mathbb{R}$, the process $(\Pi_n(s, t), t \ge 0)$ is right continuous. As long as there is no change in the population at time s, the partition $\Pi_n(s, 0)$ consists of n singletons. However, if k new individuals are born at time s, then $\Pi_n(s, 0)$ will consist of one block of size k + 1 and n − k − 1 singleton blocks.
Proposition 1. Fix $s \in \mathbb{R}$. Then $(\Pi_n(s, t), t \ge 0)$ is the Bolthausen-Sznitman coalescent started with n blocks.
Proof. With probability one, there is no change in the population at time s, and $\Pi_n(s, 0)$ consists of n singletons. When ancestral lines are followed backwards in time, an event in which k − 1 individuals are born becomes an event in which k randomly chosen lineages merge, because the ancestral lines of the k − 1 children merge with that of the parent. The rate of events in which k − 1 individuals are born is
$$(n-1)P(\xi = k-1) = \frac{n}{k(k-1)} = \binom{n}{k}\lambda_{n,k},$$
where $\lambda_{n,k}$ comes from (1). Therefore, $(\Pi_n(s, t), t \ge 0)$ follows the dynamics of the Bolthausen-Sznitman coalescent up to the time of the first merger. That the process continues to follow the dynamics of the Bolthausen-Sznitman coalescent after this time is a consequence of the exchangeability of the population model and the sampling consistency of the Bolthausen-Sznitman coalescent.
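The two facts used in this proof, that the offspring distribution is a genuine probability distribution and that events with k − 1 births occur at rate n/(k(k − 1)), can be checked mechanically. The sketch below does so in Python; the helper names and the sampler built on `random.choices` are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
import random


def offspring_pmf(n):
    """Return {k: P(xi = k)} for k = 1..n-1, with P(xi = k) = n / ((n-1) k (k+1))."""
    return {k: Fraction(n, (n - 1) * k * (k + 1)) for k in range(1, n)}


def birth_event_rate(n, k):
    """Rate of population changes in which k-1 individuals are born."""
    return (n - 1) * offspring_pmf(n)[k - 1]


def sample_offspring(n, rng):
    """Draw one value of xi (hypothetical helper, weights converted to floats)."""
    pmf = offspring_pmf(n)
    ks = list(pmf)
    return rng.choices(ks, weights=[float(pmf[k]) for k in ks])[0]


if __name__ == "__main__":
    n = 10
    print(sum(offspring_pmf(n).values()))      # total probability mass
    print(birth_event_rate(n, 3), Fraction(n, 3 * 2))
    print(sample_offspring(n, random.Random(0)))
```

The normalization works because the sum of 1/(k(k + 1)) over k = 1, . . . , n − 1 telescopes to (n − 1)/n.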
For $s \in \mathbb{R}$ and $t \ge 0$, let $N_n(s, t)$ be the number of blocks of the partition $\Pi_n(s, t)$. Let $A_n(s) = \inf\{t : N_n(s, t) = 1\}$, which is the time back to the MRCA of the population and corresponds to the height of the tree $T_n(s)$ that represents the genealogy of the population at time s. Let
$$L_n(s) = \int_0^{A_n(s)} N_n(s, t)\,dt,$$
which is the sum of the lengths of all branches in the tree $T_n(s)$.
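For a single fixed time s, both the tree height and the total branch length depend on the genealogy only through the number of blocks. The following Python sketch simulates that block-counting chain, assuming (as in the proof of Proposition 1) that with b blocks the total transition rate is b − 1 and that a merger of k blocks occurs at rate b/(k(k − 1)). The function name and structure are illustrative, and the sketch does not simulate the evolution in s.

```python
import random


def simulate_height_and_length(n, rng):
    """Return (height, total branch length) of one coalescent tree with n leaves."""
    blocks, height, length = n, 0.0, 0.0
    while blocks > 1:
        hold = rng.expovariate(blocks - 1)      # total rate with `blocks` blocks
        height += hold
        length += blocks * hold                 # every current branch grows by `hold`
        sizes = list(range(2, blocks + 1))
        weights = [blocks / (k * (k - 1)) for k in sizes]   # rate of k-mergers
        k = rng.choices(sizes, weights=weights)[0]
        blocks -= k - 1                         # k blocks merge into one
    return height, length


if __name__ == "__main__":
    rng = random.Random(2024)
    height, length = simulate_height_and_length(1000, rng)
    print(height, length)
```

Since the number of blocks stays between 2 and n until the MRCA is reached, every simulated tree satisfies 2 · height ≤ length ≤ n · height, which is a convenient invariant for testing.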

Time back to the MRCA
We consider here how the time back to the MRCA of the population evolves over time. Proposition 3.4 of [18] states that if Y has the exponential distribution with mean 1, then for each fixed $t \in \mathbb{R}$,
$$A_n(t) - \log\log n \Rightarrow -\log Y, \qquad (4)$$
where ⇒ denotes convergence in distribution as n → ∞. We are interested here in finding the limit of the stochastic processes $(A_n(t), t \ge 0)$ as n → ∞.
We now construct the limit process $(A(t), t \in \mathbb{R})$ and its time-reversal $(R(t), t \in \mathbb{R})$ from a Poisson point process. The construction is similar to constructions in [16] and [17]. Let N be a Poisson process on $\mathbb{R}^2$ with intensity λ × ν, where λ is Lebesgue measure and $\nu(dy) = e^{-y}\,dy$. Let M consist of the points {(t, y) : (−t, y) is a point of N}. Note that M is also a Poisson process on $\mathbb{R}^2$ with the same intensity as N. For each $(t, x) \in \mathbb{R}^2$, define the wedges
$$W(t, x) = \{(s, y) : s \le t,\ y \ge x + (t - s)\}, \qquad W'(t, x) = \{(s, y) : s > t,\ y \ge x + (s - t)\}.$$
Then let
$$R(t) = \sup\{x : N \text{ has a point in } W(t, x)\}, \qquad A(t) = \sup\{x : M \text{ has a point in } W'(t, x)\}.$$
Note that with this construction, both $(R(t), t \in \mathbb{R})$ and $(A(t), t \in \mathbb{R})$ are right continuous, and because of the relationship between N and M, we have A(t) = R(−t) for all t such that there is no point in M whose first coordinate is t.
The figure below shows how (R(t), t ∈ R) and (A(t), t ∈ R) are constructed from the Poisson point process. The process (R(t), t ∈ R) decreases linearly at speed one between jumps but jumps up to the level of any point of N that appears above it. That is, if (t, y) is a point of N and R(t−) < y, then R(t) = y. The process (A(t), t ∈ R) increases linearly at speed one between jumps. When the trajectory of the process encounters a point of M at time t, the process jumps downward to the highest level y such that there is a point of M on the diagonal half-line starting at (t, y) and extending upward and to the right.
The following theorem is our main result concerning the time back to the MRCA for the evolving Bolthausen-Sznitman coalescent. This result is proved in Section 2 by considering the processes in reversed time and establishing convergence to $(R(t), t \ge 0)$. Note that unlike in the case of Kingman's coalescent, the limit process is Markov.
Theorem 2. As n → ∞, the sequence of processes $(A_n(t) - \log\log n, t \ge 0)$ converges in the Skorohod topology to $(A(t), t \ge 0)$.

Remark 3.
For each fixed t ≥ 0, the event that $R(t) \le y$ differs from the event that there is no point of N in $W(t, y)$ only on a set of probability zero. The probability that there is no point of N in $W(t, y)$ is
$$\exp\Big(-\int_{-\infty}^{t}\int_{y+t-s}^{\infty} e^{-z}\,dz\,ds\Big) = \exp(-e^{-y}). \qquad (5)$$
If Y has the exponential distribution with mean one, then $P(-\log Y \le y) = P(Y \ge e^{-y}) = \exp(-e^{-y})$. Thus, we see from the above construction that the processes $(R(t), t \in \mathbb{R})$ and $(A(t), t \in \mathbb{R})$ are stationary processes whose stationary distribution is the same as the distribution of −log Y. This result also follows from Theorem 2 and equation (4).
Remark 4. The processes $(R(t), t \in \mathbb{R})$ and $(A(t), t \in \mathbb{R})$ are both examples of piecewise deterministic Markov processes, a class of processes whose theory was developed by Davis [9,10]. These processes are characterized by their deterministic behavior between jump times, which is linear drift for the processes $(R(t), t \in \mathbb{R})$ and $(A(t), t \in \mathbb{R})$, and their jump rates. The jump rates for $(R(t), t \in \mathbb{R})$ can easily be read from the Poisson process N. The process $(R(t), t \in \mathbb{R})$ jumps away from x at rate $e^{-x}$, and when it jumps away from x, the distribution of the location to which it jumps has density $e^{x-y}\mathbf{1}_{\{y \ge x\}}$. This means that the rate of jumps from x to y is given by $q(x, y) = e^{-y}\mathbf{1}_{\{y \ge x\}}$.
To obtain the jump rates for $(A(t), t \in \mathbb{R})$, note that if A(t) = x then there is a point of M at (t + s, x + s) for some s ≥ 0 but no points above this diagonal line. Because the density of the intensity measure of M at (t + s, x + s) is $e^{-(x+s)}$, we see that conditional on A(t) = x, the distribution of the time before the next jump is exponential with mean 1. That is, for all $x \in \mathbb{R}$, the process $(A(t), t \in \mathbb{R})$ jumps away from x at rate one, which implies that the jump times of $(A(t), t \in \mathbb{R})$ form a homogeneous Poisson process of rate one on $\mathbb{R}$. If the process jumps away from x at time t, then the probability that it jumps below y is the probability that there is no point of M in the trapezoidal region $\{(s, z) : s \ge t,\ y + (s - t) \le z \le x + (s - t)\}$, which is
$$\exp\Big(-\int_0^{\infty} \big(e^{-(y+u)} - e^{-(x+u)}\big)\,du\Big) = \exp(e^{-x} - e^{-y}).$$
Differentiating with respect to y, we see that the rate of jumps from x to y is given by $r(x, y) = \exp(e^{-x} - e^{-y} - y)\mathbf{1}_{\{y \le x\}}$.
As a check on these formulas, let $\pi(y) = e^{-y}e^{-e^{-y}}$ for $y \in \mathbb{R}$, which is the density of the stationary distribution, obtained by differentiating the right-hand side of (5). Then note that $\pi(x)q(x, y) = \pi(y)r(y, x)$ for all $x, y \in \mathbb{R}$, as expected given that $(R(t), t \in \mathbb{R})$ and $(A(t), t \in \mathbb{R})$ are related by time reversal.
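The time-reversal identity can also be confirmed numerically. The Python sketch below evaluates both sides of $\pi(x)q(x, y) = \pi(y)r(y, x)$ at given points and integrates the jump kernels with a simple midpoint rule to recover the total jump rates, namely $e^{-x}$ for R and one for A. The integration helper, the truncation limits, and all function names are illustrative choices rather than anything from the paper.

```python
import math


def stationary_density(y):
    """pi(y) = exp(-y) * exp(-exp(-y))."""
    return math.exp(-y - math.exp(-y))


def q(x, y):
    """Jump rate of R from x to y."""
    return math.exp(-y) if y >= x else 0.0


def r(x, y):
    """Jump rate of A from x to y."""
    return math.exp(math.exp(-x) - math.exp(-y) - y) if y <= x else 0.0


def midpoint_integral(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def balance_gap(x, y):
    """Difference between the two sides of the detailed balance identity."""
    return stationary_density(x) * q(x, y) - stationary_density(y) * r(y, x)


if __name__ == "__main__":
    x = 0.5
    print(balance_gap(0.3, 1.1))
    print(midpoint_integral(lambda y: q(x, y), x, x + 40.0), math.exp(-x))
    print(midpoint_integral(lambda y: r(x, y), -6.0, x))
```

The lower limit −6 is enough in practice because $r(x, y)$ decays doubly exponentially as y → −∞.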

Total branch length
Theorem 5.2 of [13] establishes that for each fixed t ∈ R, where, using γ to denote Euler's constant, We consider here the stochastic process $L_n = (L_n(t), t \ge 0)$. If there are no changes in the population between times t and t + s, then the tree $T_n(t + s)$ is obtained by starting with the tree $T_n(t)$ and then adding a segment of length s to each of the n branches. Consequently, the process $L_n$ increases at speed n between jumps. However, if k individuals die at time t, then the tree $T_n(t)$ is obtained from the tree $T_n(t-)$ by removing k of the branches, causing a downward jump in the process $L_n$. Our main result concerning the dynamics of the total branch length for the Bolthausen-Sznitman coalescent is that the processes $(L_n(t), t \ge 0)$, properly centered and scaled, converge to a stable process of Ornstein-Uhlenbeck type.
Theorem 5. Let ν be the measure on R whose density with respect to Lebesgue measure is given by be a stationary process of Ornstein-Uhlenbeck type generated by (2 − γ, 0, ν, 1). As n → ∞, the sequence of processes converges in the Skorohod topology to (L(t), t ≥ 0).

Number of blocks
The techniques used to establish Theorem 5 can also be used to prove a new result about how the number of blocks of the Bolthausen-Sznitman coalescent changes over time. Because the number of blocks for other coalescents with multiple mergers has been studied in some depth (see, for example, [1,2]), we believe that this result may be of independent interest.
Theorem 7. Let $(\Pi_n(t), t \ge 0)$ be a Bolthausen-Sznitman coalescent started with n blocks. Let $N_n(t)$ be the number of blocks of $\Pi_n(t)$, and let
$$X_n(t) = \frac{\log n}{n}\left(N_n\!\left(\frac{t}{\log n}\right) - n e^{-t} - \frac{n t e^{-t}\log\log n}{\log n}\right).$$
Let (S(t), t ≥ 0) be a stable Lévy process satisfying As n → ∞, the sequence of processes (X n (t), t ≥ 0) converges in the Skorohod topology to

Proof of Theorem 2

Construction from random recursive trees
Our proof of Theorem 2 makes use of a connection between the Bolthausen-Sznitman coalescent and random recursive trees that was discovered by Goldschmidt and Martin [18]. Suppose $\ell_1, \dots, \ell_n$ are disjoint subsets of $\mathbb{N}$ such that if $1 \le i < j \le n$, then the smallest element of $\ell_i$ is less than the smallest element of $\ell_j$. A random recursive tree with vertices labeled $\ell_1, \dots, \ell_n$ can be constructed inductively as follows. The vertex labeled $\ell_1$ is the root. For k ≥ 2, once the vertices labeled $\ell_1, \dots, \ell_{k-1}$ have been placed in the tree, the vertex labeled $\ell_k$ is attached to a vertex chosen uniformly at random from those labeled $\ell_1, \dots, \ell_{k-1}$.
Suppose e is an edge in the tree connecting vertices x and y, where x is closer to the root than y. We can cut the tree at the edge e by deleting the edge from the tree as well as the entire subtree below e. That is, we remove all vertices z such that the path from the root to z goes through the edge e. All integers that are in labels of vertices that are removed from the tree are then added to the label of x. When a random recursive tree is cut at a randomly chosen edge, the remaining tree is a random recursive tree on the new set of labels (see Proposition 2.1 of [18]).
To establish the connection with the Bolthausen-Sznitman coalescent, start with a random recursive tree on n vertices, labeled with the integers 1, . . . , n. Then to each edge, add an independent exponential random variable with mean 1, whose value gives the time at which the edge is cut. For all t ≥ 0, let Π n (t) denote the partition of {1, . . . , n} such that i and j are in the same block of Π n (t) if and only if the integers i and j are in the same vertex label at time t. Then (Π n (t), t ≥ 0) is the Bolthausen-Sznitman coalescent started with n blocks (see Proposition 2.2 of [18]). Because the last transition always involves deleting an edge adjacent to the root, the time back to the MRCA for this Bolthausen-Sznitman coalescent is the maximum of the exponential random variables assigned to the edges adjacent to the root. In [18], Goldschmidt and Martin used this fact to prove (4).
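The construction in the preceding two paragraphs can be sketched in a few lines of Python. The code builds a random recursive tree on the labels 1, . . . , n, assigns an exponential cut time with mean 1 to each edge, and reads off the partition at time t. It uses the observation that an integer i ends up in the label of the upper endpoint of the cut edge closest to the root on the path from the root to i. All function names and the array layout (`parent[v]`, `cut_time[v]` for the edge above vertex v) are illustrative assumptions, not notation from [18].

```python
import random


def random_recursive_tree(n, rng):
    """parent[k] is uniform on {1, ..., k-1} for k = 2..n; vertex 1 is the root."""
    parent = [0] * (n + 1)
    for k in range(2, n + 1):
        parent[k] = rng.randint(1, k - 1)
    return parent


def blocks_at_time(parent, cut_time, t):
    """Return rep, where rep[i-1] is the vertex whose label contains i at time t."""
    n = len(parent) - 1
    rep = list(range(n + 1))
    for v in range(2, n + 1):            # parents always carry smaller labels
        p = parent[v]
        if rep[p] != p:                  # p already absorbed by a cut higher up
            rep[v] = rep[p]
        elif cut_time[v] <= t:           # the edge (p, v) itself has been cut
            rep[v] = p
    return rep[1:]


def time_to_mrca(parent, cut_time):
    """Maximum cut time over the edges adjacent to the root."""
    return max(cut_time[v] for v in range(2, len(parent)) if parent[v] == 1)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    parent = random_recursive_tree(n, rng)
    cut_time = [0.0, 0.0] + [rng.expovariate(1.0) for _ in range(2, n + 1)]
    t_mrca = time_to_mrca(parent, cut_time)
    print(t_mrca, len(set(blocks_at_time(parent, cut_time, t_mrca / 2))))
```

Processing the vertices in increasing order works because every vertex is attached to one with a smaller label, so the representative of the parent is always known before the child is handled.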
We now use recursive trees to construct the evolving Bolthausen-Sznitman coalescent in reversed time. Note that the dynamics of an evolving Bolthausen-Sznitman coalescent in reversed time are the same as the dynamics of an ordinary Bolthausen-Sznitman coalescent, except that whenever k lineages are lost due to a merger, these lineages are replaced by k new lineages. For the purposes of studying the time back to the MRCA, the labeling of the lineages by the integers 1, . . . , n is unimportant, so we will use a different vertex labeling scheme in the recursive tree construction.
To carry out this construction, begin with a random recursive tree having n vertices constructed as above, and give every vertex the label zero. Add an independent exponential random variable with mean 1 to each edge to obtain the tree at time zero. The process evolves in time as follows. The edge labels decrease linearly at speed one. When an edge label hits zero at, say, time t, this edge is cut from the tree and all vertices below the edge are removed. If this cut removes k vertices, then we replace these vertices by adding k new vertices to the tree, one at a time, to randomly chosen vertices of the existing tree. The k new edges are assigned independent exponential random variables. The k new vertices are given the label t, corresponding to the time when they are added to the tree. Then the process continues to evolve according to the same rules.
By the result of Goldschmidt and Martin, we know that this process follows the dynamics of the Bolthausen-Sznitman coalescent up to the time of the first merger, in the sense that the time to the first transition is exponential with rate $n - 1 = \lambda_n$ and the probability that the first transition eliminates k vertices is $\binom{n}{k+1}\lambda_{n,k+1}/\lambda_n$. To see that the same dynamics continue after the first merger, note that if the first cut causes k vertices to be removed, the remaining tree has the shape of a random recursive tree on n − k vertices, while the edge lengths remain exponential random variables with mean 1 by the memoryless property of the exponential distribution. Consequently, when k more vertices are added according to the recursive procedure, the resulting tree has the shape of a random recursive tree on n vertices, and all of the random variables attached to the edges have the exponential distribution with mean 1. Thus, this process follows the same dynamics as the population process followed backwards in time, with each set of k lineages merging at rate $\lambda_{n,k}$.
Let M n (t) denote the maximum of the exponential random variables assigned to the edges that are adjacent to the root at time t. Then t+M n (t) is the first time at which every vertex other than the root has a label greater than t, and M n (t) corresponds to the time back to the MRCA of the population at time −t. Let R n (t) = M n (t) − log log n. Then, we see that (R n (t), t ≥ 0) has the same finite-dimensional distributions as (A n (−t) − log log n, t ≥ 0), and the two processes would have the same law if the process (A n (−t), t ≥ 0) were modified at the jump times to make it right-continuous rather than left-continuous. Consequently, in view of the stationarity of the population process, to prove Theorem 2, it suffices to show that the processes (R n (t), t ≥ 0) converge in the Skorohod topology to (R(t), t ≥ 0), and it is this result that we will show.

A heuristic argument
To understand heuristically why Theorem 2 is true, note that if $R_n(t) \le z$, then the process $R_n$ jumps above z only when a new vertex is attached to the root, and the random variable assigned to the new edge is greater than log log n + z. Because the number of blocks in the Bolthausen-Sznitman coalescent decreases by k − 1 whenever k blocks merge into one, the rate at which blocks are being lost, and thus new vertices are being added to the tree, is
$$\sum_{k=2}^{n} (k-1)\binom{n}{k}\lambda_{n,k} = \sum_{k=2}^{n} \frac{n}{k} \approx n\log n.$$
As long as not too many vertices are cut away from the tree at once, the probability that a new vertex attaches to the root is approximately 1/n. The probability that the exponential random variable assigned to the new edge is greater than log log n + z is $e^{-z}/\log n$. Hence, the rate at which the process $R_n$ jumps above z is approximately
$$n\log n\cdot\frac{1}{n}\cdot\frac{e^{-z}}{\log n} = e^{-z},$$
in agreement with the dynamics of the process $(R(t), t \ge 0)$. The rest of the proof consists of making these ideas rigorous.
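The three factors in this heuristic are easy to evaluate for a concrete n. The Python sketch below computes the exact rate at which blocks are lost, using the identity that k-mergers occur at total rate n/(k(k − 1)), and multiplies it by the approximate attachment probability 1/n and the tail probability $e^{-z}/\log n$. The function names are illustrative, and the product is only the heuristic approximation described above, not a rigorous bound.

```python
import math


def block_loss_rate(n):
    """Sum over k of (k-1) * n / (k(k-1)), which equals n * sum_{k=2}^n 1/k."""
    return n * sum(1.0 / k for k in range(2, n + 1))


def heuristic_jump_rate_above(n, z):
    """Approximate rate at which R_n jumps above z, following the heuristic."""
    attach_to_root = 1.0 / n                     # approximate, per new vertex
    label_exceeds = math.exp(-z) / math.log(n)   # P(Exp(1) > log log n + z)
    return block_loss_rate(n) * attach_to_root * label_exceeds


if __name__ == "__main__":
    z = 0.5
    for n in (10**2, 10**4, 10**6):
        # Ratio of the heuristic rate to the limiting rate exp(-z).
        print(n, heuristic_jump_rate_above(n, z) / math.exp(-z))
```

The ratio equals $(H_n - 1)/\log n$, where $H_n$ is the n-th harmonic number, so it approaches one only at a logarithmic speed.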

Lemmas pertaining to random recursive trees
We prove here two lemmas related to random recursive trees that will be used later in the proof of Theorem 2.
Proof. The rate at which blocks are being lost in the Bolthausen-Sznitman coalescent is given by The rate of transitions in which at least half the blocks are lost is so the probability that such a transition occurs by time ε is bounded by ε. By the construction, for each block that is lost due to a merger in the Bolthausen-Sznitman coalescent, a new vertex is added to the tree. As long as the merger causes at most half the blocks to disappear, the probability that each new vertex attaches to the root is bounded by 2/n. Furthermore, when a new vertex attaches to the root, the probability that it causes the process (R n (t), t ≥ 0) to jump above z is the same as the probability that an exponential random variable with mean 1 is greater than log log n + z, which is e −(log log n+z) = e −z / log n. Consequently, in view of (10), the probability that the process (R n (t), t ≥ 0) jumps above z before time ε is bounded by which implies the result.
Lemma 9. Consider a random recursive tree with vertices labeled 1, . . . , n. Let d k be the depth of the vertex labeled k, which is the number of edges on the path from the root to k, and let D n = d 1 + · · · + d n . Then Also, there exists a positive constant C such that Proof. We first prove (11) by induction. Because the vertex labeled 1 is the root vertex, which has depth zero, clearly E[D 1 ] = 0, verifying (11). Suppose (11) holds for n = m−1, where m ≥ 2. Let F m−1 = σ(d 1 , . . . , d m−1 ). Recall that the random recursive tree can be constructed so that the vertex labeled m is attached to one of the previous m − 1 vertices at random. Consequently, the level of the vertex labeled m is one greater than the level of a randomly chosen previous vertex, so and thus Therefore, using the induction hypothesis, which implies that (11) holds for all n ∈ N. We next show by induction that The result is clear when n = 1 because Var(D 1 ) = 0. Suppose the claim holds for , so using the induction hypothesis,

Now (14) follows by induction.
It remains to bound E[d 2 k ]. Suppose k ≥ 2. If d k = j ≥ 2, then there is a sequence of numbers 1 = i 0 < i 1 < · · · < i j = k such that during the construction of the random recursive tree, vertex i ℓ attaches to vertex i ℓ−1 for ℓ = 1, . . . , j. Because the vertex i ℓ has a choice of i ℓ − 1 vertices to which it can attach, the probability of this event is 1/ Combining this bound with the trivial bound that P (d k = j) ≤ 1 for j = 1, 2 gives for some positive constant C. Combining this bound with (14) gives (12).
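The induction step for the expected total depth rests on the observation that the depth of vertex m is one more than the depth of a uniformly chosen earlier vertex. The Python sketch below iterates that recursion directly and compares the result with n log n, the bound on the expected total depth that is used later in the proof. The function names are illustrative; the closed form mentioned in the final comment is a standard consequence of the recursion and is stated here as a check, not as a quotation of (11).

```python
import math


def expected_depths(n):
    """Return e with e[k] = E[d_k], using E[d_k] = 1 + average of E[d_1..d_{k-1}]."""
    e = [0.0] * (n + 1)          # e[0] unused, e[1] = 0 for the root
    running = 0.0
    for k in range(2, n + 1):
        running += e[k - 1]
        e[k] = 1.0 + running / (k - 1)
    return e


def expected_total_depth(n):
    """E[D_n] = E[d_1] + ... + E[d_n]."""
    return sum(expected_depths(n)[1:])


if __name__ == "__main__":
    for n in (10, 100, 1000):
        harmonic = sum(1.0 / k for k in range(1, n + 1))
        # The recursion gives E[d_k] = 1 + 1/2 + ... + 1/(k-1), hence n * (H_n - 1).
        print(n, expected_total_depth(n), n * (harmonic - 1.0), n * math.log(n))
```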

Generator of the limit process
From the Poisson process construction described in the introduction, it is clear that $(R(t), t \ge 0)$ is a Markov process. Furthermore, it is easy to describe the transition semigroup of the Markov process. Suppose R(0) = x. Then for y > x − t, we have $R(t) \le y$ when there are no points in the Poisson process N above the line segment from (0, y + t) to (t, y). It follows that
$$P(R(t) \le y \mid R(0) = x) = \exp\Big(-\int_0^t e^{-(y+t-s)}\,ds\Big) = \exp\big(-e^{-y}(1 - e^{-t})\big), \qquad y > x - t.$$
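Integrating the intensity $e^{-z}$ over the region above the segment from (0, y + t) to (t, y) gives $e^{-y}(1 - e^{-t})$, so the transition probability is $\exp(-e^{-y}(1 - e^{-t}))$ for y > x − t and zero for y < x − t. The Python sketch below uses this to check numerically that the distribution with distribution function $\exp(-e^{-y})$ is stationary: starting from that law, $R(t) \le y$ requires both $R(0) \le y + t$ and no Poisson point above the segment, and these two events are independent. The function names are illustrative.

```python
import math


def stationary_cdf(y):
    """Distribution function exp(-exp(-y)) of -log Y for Y exponential with mean 1."""
    return math.exp(-math.exp(-y))


def transition_cdf(x, t, y):
    """P(R(t) <= y | R(0) = x), derived from the Poisson construction."""
    if y < x - t:
        return 0.0
    return math.exp(-math.exp(-y) * (1.0 - math.exp(-t)))


def cdf_after_time(t, y):
    """P(R(t) <= y) when R(0) has the stationary law."""
    # R(0) <= y + t is needed for the drift alone to bring the process below y;
    # given that, the transition probability does not depend on the value of R(0).
    return stationary_cdf(y + t) * math.exp(-math.exp(-y) * (1.0 - math.exp(-t)))


if __name__ == "__main__":
    for t, y in ((0.1, -1.0), (1.0, 0.0), (5.0, 2.0)):
        print(t, y, cdf_after_time(t, y), stationary_cdf(y))
```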
Let $E = [-\infty, \infty)$, and for $x, y \in E$, let $d(x, y) = |e^x - e^y|$. Then (E, d) is a complete separable metric space. Let $C_0(E)$ be the set of continuous real-valued functions on [−∞, ∞) that vanish at infinity, with the norm $\|f\| = \sup_{x \in E} |f(x)|$. Note that if $f \in C_0(E)$, then $\lim_{x \to -\infty} f(x)$ exists and equals f(−∞), and $\lim_{x \to \infty} f(x) = 0$. If $f \in C_0(E)$ and $x \in E$, define
$$P_t f(x) = \int_{x-t}^{\infty} f(y)\,(1 - e^{-t})\,e^{-y}\exp\big(-e^{-y}(1 - e^{-t})\big)\,dy + f(x - t)\exp\big(-e^{-x}(e^{t} - 1)\big). \qquad (16)$$
Note that the definition of $P_t f(x)$ makes sense when x = −∞, in which case the second term is zero. It is easily checked that $P_t f \in C_0(E)$ and that $P_t f \to f$ as t → 0. Consequently, $(P_t)_{t \ge 0}$ is a Feller semigroup on $C_0(E)$, and $(R(t), t \ge 0)$ is a Feller process with semigroup $(P_t)_{t \ge 0}$. The following result characterizes the infinitesimal generator of the process $(R(t), t \ge 0)$ and describes a core for the generator. We see from the form of the generator in (17) that for y > x, the process jumps from x to y at rate $e^{-y}$. See also chapter 26 of [10] for a full characterization of the domain of the extended generator.
Furthermore, C is a core for A.
Proof. Fix f ∈ C. Choose a real number z such that f is constant on [−∞, z]. By (16), for all x ∈ E, We need to show that the right-hand side of (18) converges to the right-hand side of (17) uniformly in which tends to zero uniformly in x as t → 0. Because which tends to zero uniformly in x as t → 0. Likewise, tends to zero uniformly in x as t → 0. Therefore, uniformly in x as t → 0. Equation (17) follows. It remains to show that C is a core for A. It is easy to see that C is dense in Thus, P t f is constant on [−∞, z + t]. By differentiating the right-hand side of (16), we see that the first and second derivatives of P t f are continuous and vanish at infinity. Thus, P t f ∈ C. It follows from Proposition 3.3 in Chapter 1 of [15] that C is a core for A.

Convergence of finite-dimensional distributions
We will show here that the finite-dimensional distributions of the processes (R n (t), t ≥ 0) defined from random recursive trees in section 2.1 converge as n → ∞ to the finite-dimensional distributions of (R(t), t ≥ 0).
Let G n (t) be the σ-field generated by the shape of the random recursive tree on n vertices at time t and the exponential random variables attached to the edges adjacent to the root. That is, G n (t) includes all the information about the tree at time t except for the values of the exponential random variables on the edges that are not adjacent to the root.
Fix a function f ∈ C. Let $\varepsilon_n = n^{-4}$ for all n ∈ N. Let By (4), $R_n(0)$ converges in distribution to R(0) as n → ∞. By Theorem 8.2 in chapter 4 of [15] (see also parts (a) and (b) of Remark 8.3), to show that the finite-dimensional distributions of $(R_n(t), t \ge 0)$ converge to those of $(R(t), t \ge 0)$, it suffices to show that the following hold for all t ≥ 0: Note that (19) is obvious because $|\xi_n(s)| \le \|f\|$ for all s ≥ 0. To show (20), choose z > −∞ such that f is constant on [−∞, z]. For s ≥ 0, let $J_s$ be the event that there exists a time $u \in [s, s + \varepsilon_n]$ such that $R_n(u) \ne R_n(u-)$ and $R_n(u) > z$. Because the process $(R_n(s), s \ge 0)$ decreases at speed one between jumps, we have which proves (20). Next, observe that Thus, by Lemma 8, taking expectations of both sides gives as n → ∞, which gives (21). It remains only to show (22). When the tree is cut, we call the event a small cut if fewer than $n/(\log n)^{1/2}$ vertices are removed as a result of the cut, and a large cut otherwise. We define the following four events. Recall that z has been chosen so that f is constant on [−∞, z].
• Let A 1 be the event that between times t and t + ε n , there is a small cut at some edge not adjacent to the root, one of the new edges attaches to the root and is assigned a label greater than log log n + z, and this is the only edge that attaches to the root between times t and t + ε n and gets a label greater than log log n + z.
• Let A 2 be the event that between times t and t + ε n , there is a large cut during which one of the new edges attaches to the root and is assigned a label greater than log log n + z.
• Let A 3 be the event that between times t and t + ε n , there is an event in which the tree is cut at some edge adjacent to the root, and one of the new edges attaches to the root and is assigned a label greater than log log n + z.
• Let A 4 be the event that between times t and t + ε n , two or more new edges attach to the root and are assigned labels greater than log log n + z.
The next lemma shows that A 2 , A 3 , and A 4 are unlikely to occur, which means that jumps of the process R n between times t and t + ε n will occur primarily on the event A 1 .
Proof. To bound $P(A_2)$, note that an event during which k − 1 vertices are removed from the tree corresponds to a transition in the Bolthausen-Sznitman coalescent in which k blocks merge into one. Such events happen at rate $\binom{n}{k}\lambda_{n,k} = n/[k(k-1)]$ by (3). When such an event occurs, the expected number of vertices that reattach to the root is 1/(n − k + 1) + · · · + 1/(n − 1). When a vertex reattaches to the root, the probability that its label exceeds log log n + z is $e^{-(\log\log n + z)} = e^{-z}/\log n$. Thus, for sufficiently large n,
To bound $P(A_3)$, note that an event in which the tree is cut at some edge adjacent to the root corresponds to a merger in the Bolthausen-Sznitman coalescent that involves the block containing the integer 1. By the sampling consistency of the Bolthausen-Sznitman coalescent, every other block merges with this block at rate $\lambda_2 = 1$. Consequently, the expected number of blocks that are removed from the tree between times t and $t + \varepsilon_n$ when the tree is cut at an edge adjacent to the root is $\varepsilon_n(n - 1)$. Provided that fewer than $n/(\log n)^{1/2}$ vertices are removed as a result of the cut, the probability that a given vertex reattaches to the root is at most 2/n. The probability that the label on the new edge exceeds log log n + z is $e^{-z}/\log n$ as before. Thus, for sufficiently large n
To bound $P(A_4)$, note that there are two ways that $A_4$ can occur. Either the tree can be cut twice between times t and $t + \varepsilon_n$, or two or more edges can reattach to the root after a single cut. Because cuts of the tree happen at times of a Poisson process of rate $\lambda_n = n - 1$, the probability that two or more cuts happen between times t and $t + \varepsilon_n$ is at most $\varepsilon_n^2 (n - 1)^2$. If there is an event in which k − 1 vertices are removed from the tree following a cut, there are $\binom{k-1}{2}$ pairs of vertices that could reattach to the root. On $A_2^c$, for sufficiently large n, the chance that two given vertices reattach to the root is at most $4/n^2$, and each new edge independently has probability $e^{-z}/\log n$ of having a label greater than log log n + z. Thus, The result follows from (23), (24), and (25).
Proof. Recall that in our construction using trees, we labeled the vertices by the time in which they were added rather than using the vertex labels 1, . . . , n. However, for the purposes of this proof, we will arbitrarily number the vertices at time t by the integers 1, . . . , n, with the root being vertex 1. For k ≥ 2, let d k be the depth of vertex k, which is the number of edges on the path from the root to k. Let v k be the number of edges along the path from the root to k that are not adjacent to the root but that have at least n/(log n) 1/2 vertices below them. That is, v k is the number of edges e along this path not adjacent to the root such that if we cut the tree at the edge e, it would be classified as a large cut. Define D n (t) = d 2 + · · · + d n and V n (t) = v 2 + · · · + v n . For each k = 2, . . . , n, we will separately bound the probability that A 1 occurs and that k is the vertex that reattaches to the root with a label of at least log log n + z.
Note that there are d k − 1 − v k edges not adjacent to the root such that, if the tree were cut at that edge, the vertex labeled k would be removed from the tree and this would be a small cut.
The probability that one of these edges is cut before time The probability that the vertex labeled k reattaches to the root is between 1/n and 1/(n − n/ √ log n), and the probability that the new edge label is at least log log n + z is e −z / log n. Therefore, Because E[D n (t)] ≤ n log n by Lemma 9, it follows that P (A 1 ) ≤ 2e −z ε n for all n large enough that √ log n/( √ log n − 1) ≤ 2. Suppose the vertex k is cut from the tree. The probability that vertex k and some other vertex both reattach to the root with new edge labels greater than log log n + z is at most for sufficiently large n. Also, the probability that there are two cuts to the tree before time t + ε n is at most (n − 1) 2 ε 2 n . Combining these observations, we get Combining (26) and (27) gives We need to show that the six terms on the right-hand side of (28) tend to zero in expectation as n → ∞. By Lemma 9 and the Cauchy-Schwarz Inequality, is at most n times the rate of transitions in the Bolthausen-Sznitman coalescent that cause at least n/(log n) 1/2 blocks to be lost, we have from which it follows that the expected value of the second term on the right-hand side of (28) tends to zero as n → ∞. Using Lemma 9 and the fact that ε n = n −4 , it is easily checked that the expectations of the last four terms on the right-hand side of (28) tend to zero as n → ∞.
Proof. Recall that it remains only to show (22), which is equivalent to showing that By Lemma 11, . Therefore, using Lemmas 11 and 12, t + ε n ] such that at time τ , a new edge attaches to the root and is assigned a label greater than log log n + z. Denote by K the value of this label minus log log n, and let J = max{K, R n (t)}. Conditional on A 1 and F n (t), the distribution of K has a density given by k(y) = e z−y 1 {y>z} . Therefore,

It follows that
It now follows from Lemma 12 that the right-hand side of (32) tends to zero as n → ∞. The result now follows by combining this observation with (30) and (31).

Tightness
Here we show that the sequence of processes (R n ) ∞ n=1 is relatively compact. By Theorem 7.8 in Chapter 3 of [15], this result in combination with Proposition 13 implies that the processes (R n (t), t ≥ 0) converge in the Skorohod topology to (R(t), t ≥ 0).
Let 0 < τ 1,n < τ 2,n < . . . denote the jump times of (R n (s), s ≥ 0). As long as τ j,n − τ j−1,n > δ for all j such that τ j,n ≤ t and there are no jump times in [0, δ] or [t − δ, t], it is easy to choose the times t 0 , . . . , t n such that δ < t i − t i−1 < 2δ for i = 1, . . . , n and for all j such that τ j,n ≤ t, we have τ j,n = t i for some i. That is, there is one of the t i at every jump time of the process. In this case, whenever r, s ∈ [t i−1 , t i ), we have |R n (r) − R n (s)| = |r − s| ≤ 2δ < ε. By Lemmas 11 and 12 with δ in place of ε n and y in place of z, we have
Note that if τ j,n − τ j−1,n ≤ δ for some j such that τ j,n ≤ t, then there exists a nonnegative integer k ≤ t/δ − 1 such that kδ ≤ τ j−1,n < τ j,n ≤ min{t, (k + 2)δ}. For two jumps of the process (R n (s), s ≥ 0) to fall within the interval [kδ, min{t, (k + 2)δ}], one of the following four events must occur:
• We have R n (s) ≤ y for some s ∈ [0, t].
• Between times kδ and (k + 2)δ, there is a large cut, and one of the new edges attaches to the root and is assigned a label greater than log log n + y.
• Between times kδ and (k + 2)δ, more than 3δn log n vertices are removed from the tree during small cuts.
• Of the first ⌊3δn log n⌋ vertices, after time kδ, that are removed from the tree during small cuts, two or more reattach to the root with new edge labels greater than log log n + y.
We have already bounded the probability of the first event by ε/6. The other three events depend on k. The probability of the second event tends to zero as n → ∞ by (23) with 2δ in place of ε n and y in place of z. To bound the probability of the third event, note that mergers in the Bolthausen-Sznitman coalescent in which k − 1 blocks are lost occur at rate $\binom{n}{k} \lambda_{n,k} = n/(k(k-1))$. Therefore, if N denotes the number of vertices removed during small cuts between times kδ and (k + 2)δ, we have Therefore, by Chebyshev's Inequality, as n → ∞. Finally, concerning the fourth event, note that when a vertex is reattached after being removed during a small cut, the probability that it reattaches to the root is at most 2/n, and the probability that it is assigned a label greater than log log n + y is $e^{-y}/\log n$. Therefore, since there are at most $(3\delta n \log n)^2/2$ pairs of vertices to consider, the probability of the fourth event for a particular k is at most
$$\frac{(3\delta n \log n)^2}{2} \cdot \frac{4e^{-2y}}{(n \log n)^2} \le 18\delta^2 e^{-2y}.$$
Since there are at most t/δ possible values of k to consider, the probability that the fourth event occurs for some k is at most 18tδe −2y ≤ ε/2. Therefore, lim sup n→∞ P (τ j,n − τ j−1,n ≤ δ for some j such that τ j, Combining this result with (34) gives (33) and completes the proof of Theorem 2.
We obtain Theorems 5 and 7 using a very different approach. Rather than using recursive trees, we couple the population model with a family of stable processes by constructing both from a Poisson process. We describe this construction in section 3.1. We then prove Theorem 7 in section 3.4, and we prove Theorem 5 in section 4. Throughout the rest of the paper, T > 0 will be an arbitrary positive constant, and T n = 2 log log n.
Also, → p will denote convergence in probability as n → ∞.

A Poisson process construction
Fix a positive integer n. Let Ψ be a Poisson point process on R × (0, ∞) whose intensity measure is given by dt × y −2 dy. Then define Θ to be the image of Ψ under the map (t, y) → (−t/ log n, y/ log n), restricted to R × (0, 1]. That is, if (t, y) is a point of Ψ with y ≤ log n, then (−t/ log n, y/ log n) is a point of Θ. Note that Θ is a Poisson point process on R × (0, 1] with intensity measure dt × y −2 dy. We now construct a population model consisting of n individuals labeled 1, . . . , n. To each point (t i , y i ) of Θ, we attach independent random variables U i,1 , . . . , U i,n , each having the uniform distribution on (0, 1). If zero or one of the random variables U i,1 , . . . , U i,n is less than y i , then there is no change in the population at time t i . However, if k ≥ 2 of the random variables U i,1 , . . . , U i,n are less than y i and the k smallest of these random variables are U i,j 1 < · · · < U i,j k , then at time t i , the individuals labeled j 2 , . . . , j k are killed, and the individual labeled j 1 gives birth to k − 1 new offspring, which assume the labels j 2 , . . . , j k .
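The update rule attached to a single point (t i , y i ) of Θ can be sketched in Python as follows. This is only an illustration of the rule stated above; the function name `reproduction_event` and its return convention are assumptions made for this sketch and do not come from the paper.

```python
import random


def reproduction_event(y, uniforms):
    """Apply one point (t_i, y_i) of Theta to a population labeled 1..n.

    uniforms[j-1] plays the role of U_{i,j}. Returns (parent, replaced):
    parent is the label j_1 whose uniform is the smallest one below y, and
    replaced lists j_2, ..., j_k ordered by their uniforms. If fewer than
    two uniforms fall below y, the population does not change and the
    function returns (None, []).
    """
    below = sorted((u, j) for j, u in enumerate(uniforms, start=1) if u < y)
    if len(below) < 2:
        return None, []
    parent = below[0][1]
    replaced = [j for _, j in below[1:]]
    return parent, replaced


# Hypothetical usage: n = 10 individuals and a point with y_i = 0.3.
rng = random.Random(0)
uniforms = [rng.random() for _ in range(10)]
parent, replaced = reproduction_event(0.3, uniforms)
```

Each individual takes part in the event independently with probability y i , which is why the genealogy traced backward through such events is a coalescent with multiple mergers.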
To see that this is equivalent to the population model described in the introduction, note that if (t i , y i ) is a point of Θ, the probability that exactly k new offspring are born at time t i is $\binom{n}{k+1} y_i^{k+1} (1 - y_i)^{n-k-1}$. Thus, the rate of events in which exactly k new offspring are born is
$$\int_0^1 \binom{n}{k+1} y^{k+1} (1-y)^{n-k-1} \, y^{-2} \, dy = \frac{n}{k(k+1)},$$
which matches (2) because changes in the population occur at rate n − 1. For s ∈ R and t ≥ 0, let N n (s, t) denote the number of individuals in the population immediately before time s − t who have a descendant alive in the population at time s. That is, N n (s, t) is the number of ancestral lines remaining after time t if we trace back the ancestral lines of the individuals in the population at time s. Note that because we consider the population immediately before time s − t rather than exactly at time s − t when defining N n (s, t), the process (N n (s, t), t ≥ 0) is right continuous. Also, note that N n (s, 0) = n as long as there is no change in the population at time s, but if k individuals are killed at time s and replaced by new offspring, then N n (s, 0) = n − k. Let N n (t) = N n (0, t). Because the genealogy of this population is given by the Bolthausen-Sznitman coalescent started with n blocks, the process (N n (t), t ≥ 0) has the same law as the process defined in the statement of Theorem 7. Let
$$L_n(s) = \int_0^\infty N_n(s,t) \, 1_{\{N_n(s,t) > 1\}} \, dt.$$
Then L n (s) is the total branch length for the coalescent tree representing the genealogy of the population at time s, so (L n (s), s ≥ 0) has the same law as the process considered in Theorem 5.
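The rate computation above can be checked with exact rational arithmetic. The sketch below integrates the binomial probability against the intensity y −2 dy using the Beta integral and confirms the closed form n/(k(k + 1)), which agrees with the rate n/[k(k − 1)] for mergers of k blocks quoted earlier from (3). It also confirms that the rates sum to n − 1. The helper names `beta_integral` and `offspring_rate` are assumptions of this sketch.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact integral of y^a (1-y)^b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def offspring_rate(n, k):
    """Rate of events with exactly k new offspring among n individuals.

    Integrates binom(n, k+1) y^(k+1) (1-y)^(n-k-1) against y^(-2) dy,
    that is, binom(n, k+1) times the integral of y^(k-1) (1-y)^(n-k-1).
    """
    return comb(n, k + 1) * beta_integral(k - 1, n - k - 1)


n = 12
rates = {k: offspring_rate(n, k) for k in range(1, n)}
total_rate = sum(rates.values())
```

Since n/(k(k + 1)) = n(1/k − 1/(k + 1)), the sum over k = 1, . . . , n − 1 telescopes to n − 1, the total rate of changes in the population.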
Next, we use the Poisson process Ψ to construct, for each s ∈ R, a stable process (S(s, t), t ≥ 0) with characteristic exponent where γ denotes Euler's constant. Because Ψ has only countably many points, we can enumerate the points of Ψ as (s j , x j ) ∞ j=1 . If s j > −s, then the process (S(s, t), t ≥ 0) will have a jump of size −x j at time s j + s. To make this construction precise, we use a standard approximation procedure that is described, for example, in Section I.1 of [4]. Define Let R n (s, t) be the set of all j such that −s < s j ≤ −s + t and x j > ε n . For all s ∈ R and t ≥ 0, let Note that 0 −∞ x1 {εn<|x|≤1} x −2 dx = log ε n . Therefore, for any fixed T > 0 and any integers m, n > N , the proof of Theorem 1 in Section I.1 of [4] (see the bottom of p. 14) gives It follows that for each s ∈ R, there is a limit process (S(s, t), t ≥ 0) satisfying (36) such that for each fixed T > 0, as n → ∞. Let S n (t) = S n (0, t) and S(t) = S(0, t) for all t ≥ 0.

Bounds on stable processes
The three lemmas below collect some bounds on these stable processes that will be needed later. Therefore, Since e −t t 2 ≤ 1 for all t ≥ 0, it follows that sup 0≤s≤T sup t≥0 e −t |S(s, t)| ≤ 2M + L + 1.

Rate of decrease in the number of blocks
We record here some results about the rate at which the number of blocks decreases in the Bolthausen-Sznitman coalescent. Recall that the process (N n (t), t ≥ 0) tracks the evolution of the number of blocks over time. Because the number of blocks decreases by k − 1 whenever k blocks merge into one, the rate at which the number of blocks is decreasing when there are b blocks is Considering the process from the perspective of the construction in subsection 3.1, suppose there is a point (−t, y) in the Poisson process Θ. If N n (t−) = b, then N n (t−) − N n (t) is one less than the number out of b independent uniformly distributed random variables that are less than or equal to y, unless all of the random variables are greater than y in which case N n (t−)−N n (t) = 0. Therefore, the expected decrease in the process N n at time t is by − 1 + (1 − y) b . It follows that We will be interested also in the process obtained by removing from Θ the points whose second coordinate exceeds ε n / log n. In this case, the genealogy of the population is given not by the Bolthausen-Sznitman coalescent but by a different coalescent process in which the largest merger events are suppressed. In particular, when there are b blocks, the rate at which k blocks merge into one is given by and the rate at which the number of blocks is decreasing is given by Lemma 17. For all positive integers b and n with 2 ≤ b ≤ n, we have where γ denotes Euler's constant. Also, Proof. Using (48) and (3), we get for all b. It follows that Because the last integral is bounded by ∞ εn/ log n y −2 dy = (log n)/ε n , the result (51) follows by combining (53) and (54).
To prove (52), note that if (−t, y) is a point of Θ and N n (t−) = b, then because $(k-1)^2 \le 2\binom{k}{2}$ for k ≥ 2, the expected square of the decrease in the process N n at time t is at most twice the expected number of pairs out of b independent uniformly distributed random variables having the property that both random variables are less than or equal to y, which is $2\binom{b}{2} y^2$. Therefore, as claimed.
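Both one-step computations used in this subsection can be verified exactly for a binomial count. The sketch below, with hypothetical helper names, checks that the expected decrease equals by − 1 + (1 − y)^b and that the expected squared decrease is at most 2·C(b, 2)·y^2, for one illustrative choice of b and y.

```python
from fractions import Fraction
from math import comb


def binom_pmf(b, y, j):
    """P(B = j) for B binomial with parameters b and y."""
    return comb(b, j) * y**j * (1 - y) ** (b - j)


def expected_decrease(b, y):
    """E[max(B - 1, 0)]: blocks lost when B of the b lineages take part."""
    return sum(max(j - 1, 0) * binom_pmf(b, y, j) for j in range(b + 1))


def expected_squared_decrease(b, y):
    """E[max(B - 1, 0)^2]."""
    return sum(max(j - 1, 0) ** 2 * binom_pmf(b, y, j) for j in range(b + 1))


b, y = 7, Fraction(1, 5)  # illustrative values only
first_moment = expected_decrease(b, y)
second_moment = expected_squared_decrease(b, y)
pair_bound = 2 * comb(b, 2) * y**2
```

The first identity is E[B] − 1 + P(B = 0), and the second bound uses (j − 1)^2 ≤ j(j − 1) for j ≥ 1.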

The coupling
In this section, we prove Theorem 7. We use the construction and notation of section 3.1. As in the statement of Theorem 7, let for all t ≥ 0. Also, let Y n (t) = e −t S n (t) + e −t t 2 2 and We need to show that the processes (X n (t), t ≥ 0) converge to (Y (t), t ≥ 0) in the Skorohod topology. Because In fact, to prove Theorem 7, it would suffice to show (56) with the arbitrary fixed constant T in place of T n , but it will be helpful for the proof of Theorem 5 to control the difference between X n and Y n up to time T n . Let 0 < τ 1 < · · · < τ Jn < T n be the jump times of the process (S n (t), t ≥ 0) before time T n . Let τ 0 = 0 and τ Jn+1 = T n . Note that the τ i depend on n even though we do not record this dependence in the notation. This means that there are points (τ i , y i ) in Ψ with y i ≥ ε n for i = 1, . . . , J n . Also, the process Θ, which is used to construct the population process, contains the points (−τ i / log n, y i / log n) but Θ contains no points in the regions (−τ i+1 / log n, −τ i / log n) × [ε n / log n, 1]. Therefore, conditional on τ 1 , . . . , τ Jn , between times τ i / log n and τ i+1 / log n, the process N n follows the dynamics of the number of blocks in a coalescent process with transition rates given by (49).
For i = 0, 1, . . . , J n and t ∈ [0, τ i+1 − τ i ), let By standard results about compensators for Markov jump processes (see, for example, Theorem 9.15 in [21]), the process (M i,n (t), 0 ≤ t < τ i+1 − τ i ) is a martingale. Note also that τ i+1 − τ i is exponentially distributed with mean ε n and is independent of the evolution of the process M i,n before time τ i+1 − τ i . Next, for τ j ≤ t < τ j+1 , let Then the process (M n (t), 0 ≤ t < T n ) is a martingale.
By the L 2 Maximum Inequality for martingales, Thus, by Markov's Inequality,
$$P\left( \frac{\log n}{n} \sup_{0 \le t \le T_n} |M_n(t)| > \frac{1}{\log n} \right) = P\left( \sup_{0 \le t \le T_n} M_n(t)^2 > \frac{n^2}{(\log n)^4} \right) \le 4 (\log n)^2 T_n \varepsilon_n,$$
as claimed. In the next lemma, we show that several events hold with high probability. This result will allow us to assume that these events hold throughout much of the rest of the paper.
Lemma 19. Let A 1,n be the event that $J_n^T \le n^{1/4}$. Let A 2,n be the event that Let A 3,n be the event that $\tau_{i+1}^T - \tau_i^T \le \varepsilon_n \log n$ for $i = 0, 1, \ldots, J_n^T$. Let A n = A 1,n ∩ A 2,n ∩ A 3,n . Then lim n→∞ P (A n ) = 1.
To estimate P (A 3,n ), let a k = −T + (kε n log n)/2 for k = 0, 1, . . . , and let K = min{k : a k > T n }. Note that if τ T i+1 − τ T i > ε n log n, then some interval of the form [a k−1 , a k ] with 1 ≤ k ≤ K must not contain any of the points τ T i . The probability that [a k−1 , a k ] does not contain any of It follows that for sufficiently large n, as n → ∞. Therefore, lim n→∞ P (A 3,n ) = 1, which completes the proof.
The next lemma shows that the processes Y n and Z n typically jump by approximately the same amount at the jump times τ i .

Lemma 20. We have
for sufficiently large n.
Proof. Recall that (τ i , y i ) is a point of Ψ for i = 1, . . . , J n . Therefore, Let τ i = T n and y i = 0 on {i > J n }, and let G i = σ(τ i , N n ((τ i / log n)−), y i ). Then on {i ≤ J n }, where, conditional on G i , the distribution of B i is binomial with parameters N n ((τ i / log n)−) and y i / log n. Here B i represents the number of lineages out of N n ((τ i / log n)−) that merge at time τ i / log n. We have Var(B i |G i ) ≤ ny i / log n on {i ≤ J n }, so by the Cauchy-Schwarz Inequality, Multiplying both sides by 1 {i≤Jn} 1 {y i ≤log n} , which is G i -measurable, and taking expectations gives Summing over i and observing that A 2,n ∩ {i ≤ J n } ⊂ {y i ≤ log n} for all i, we get Thus, Also, which on the event {|X n (τ i −)| ≤ log log n} is bounded by 2y i (log log n)/ log n. Thus, on A 2,n , By (61), and so on A 1,n , Combining (62), (63), and (64), we get The result follows by combining this result with (60) and using that 1 (log n) 1/2 + 2 log log n (log n) 1/4 + log n n 3/4 ≤ 1 (log n) 1/8 for sufficiently large n.
Lemmas 21 and 23 below pertain to the behavior of the processes Y n and Z n in between the jump times τ i . Lemma 21. If 0 ≤ h < τ i+1 − τ i ≤ ε n log n and n is sufficiently large, then on {i ≤ J n }, we have Proof. By the construction of the process S n , we have Using O(h 2 ) to denote an expression whose absolute value is at most h 2 , Taylor's Theorem gives On the event that S n (τ i ) ≤ 1 4 log n, for sufficiently large n, the absolute value of the sum of the first two terms is bounded above by 1 2 h log n, while the absolute value of the sum of the last three terms is bounded above by 1 2 h 2 log n. Therefore, for sufficiently large n, on the event that S n (τ i ) ≤ 1 4 log n, we have Also, for sufficiently large n, if 0 ≤ s ≤ h, then |Y n (τ i + s) − Y n (τ i )| ≤ 1 2 s log n + 1 2 s 2 log n ≤ s log n on the event that S n (τ i ) ≤ 1 4 log n because h ≤ ε n log n ≤ 1. Therefore, By combining (66) and (67), we arrive at the statement of the lemma.

Lemma 22.
Suppose 0 ≤ h < τ i+1 − τ i ≤ ε n log n and τ i ≤ s ≤ τ i + h. If n is sufficiently large and |X n (s)| ≤ log log n, then Proof. In view of (55), if τ i ≤ s ≤ τ i + h, then, using that h ≤ ε n log n ≤ (log log n)/ log n for sufficiently large n and using the assumption that |X n (s)| ≤ log log n, we get ≤ nh + n log log n log n + n log log n log n ≤ 3n log log n log n for sufficiently large n. We consider two cases. First, suppose ne −τ i ≥ (4n log log n)/ log n. Because (69) implies that ne −τ i ≥ 1 4 N n (s/ log n), we get, using (69) and the fact that d Next, suppose that ne −τ i ≤ (4n log log n)/ log n. To determine the value of N n (s/(log n)) that maximizes the left-hand side of (68), we consider the function f (x) = x(log x − log a), where a > 0 is a constant. Note that f ′ (x) = 1 + log(x/a), which is positive when x > a/e and negative when x < a/e. Thus, f (x) is negative when x < a and reaches its minimum when x = a/e, and f (x) is positive and increasing for x > a. We conclude that (68) will hold in general provided that it holds if ne −τ i /e or ne −τ i + (3n log log n)/ log n is plugged in for N n (s/(log n)). Note that by (69), we need not consider larger values. In the former case, the expression that we get is In the latter case, since ne −τ i ≥ ne −Tn = n/(log n) 2 , the expression that we get is bounded above by 7n log log n log n log 7n log log n log n − log n (log n) 2 ≤ 7n log log n log n log 7 + log log log n + log log n .
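The shape of the auxiliary function f(x) = x(log x − log a) used in the second case can be confirmed numerically. The following sketch, with an arbitrary illustrative value of a, evaluates f and its derivative at the critical point x = a/e and at the zero x = a.

```python
import math


def f(x, a):
    """f(x) = x (log x - log a) for x > 0 and a > 0."""
    return x * (math.log(x) - math.log(a))


def f_prime(x, a):
    """f'(x) = 1 + log(x / a)."""
    return 1.0 + math.log(x / a)


a = 5.0                      # illustrative constant, not from the paper
x_min = a / math.e           # critical point where f' vanishes
minimum_value = f(x_min, a)  # equals -a/e up to rounding
```

Because f is decreasing on (0, a/e) and increasing on (a/e, ∞), a quantity of the form |f(x)| over an interval of x values is maximized either at x = a/e or at an endpoint, which is the reduction used in the proof.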
Proof. Throughout the proof, we will work on the event that i ≤ J n and |X n (s)| ≤ log log n for all s ∈ [τ i , τ i + h]. By the definition (59) of Z n , Using the definition of M n in (57) and (58), we get To estimate the integral, we need to estimate η * (N n (t)). Equation (51), which holds for sufficiently large n even when b = 1 if we define η * (1) = 0, gives η * (N n (t)) − N n (t)(log N n (t) − log log n + log ε n + γ − 1) ≤ log n ε n + 1.
Proof. Note that B 1,n is the complement of the event on the right-hand side of (40) with 1 2 log log n in place of K, and B 2,n is the complement of the event in (41). Therefore, by Lemma 14, we have lim n→∞ P (B 1,n ) = lim n→∞ P (B 2,n ) = 1. We also have lim n→∞ P (B 3,n ) = 1 by Lemma 15. Recall that the event A n was defined in Lemma 19. The next lemma shows that when A n and B n occur, it is unlikely that the processes X n and Y n are ever far apart before time T n . In view of Lemmas 19 and 24, this result implies (56), and therefore Theorem 7.
Proof. We claim that on the event we have |Z n (t)− Y n (t)| ≤ 3/(log n) 1/16 for all t ≤ T n , and therefore |X n (t)− Y n (t)| ≤ 4/(log n) 1/16 for all t ≤ T n by (59). Thus, by Lemmas 18 and 20, this claim implies the result.
To prove the claim, let ζ = inf{t ≤ T n : |Z n (t) − Y n (t)| > 3/(log n) 1/16 }. We assume that we are working on the event in (74), and we must show that ζ is the infimum of the empty set, and thus that ζ = ∞. On B 1,n ∩ B 3,n , we have for all t ≤ T n . Therefore, if t < ζ and n is sufficiently large, then Note also that S n (t) ≤ 1 4 log n for all t ≤ T n on B 2,n ∩ B 3,n for sufficiently large n, and τ i+1 − τ i ≤ ε n log n for i = 0, 1, . . . , J n on A 3,n . Therefore, Lemmas 21 and 23 imply that if τ i + h < ζ and h < τ i+1 − τ i , then for sufficiently large n. Define the sets Clearly, we have I 1 ∪ I 2 ∪ I 3 = {0, 1, . . . , J n + 1}. By right continuity and the fact that Z n (0) = Y n (0) = 0, we have ζ > 0 = τ 0 . Suppose that ζ ∈ (τ i , τ i+1 ) for some i ∈ I 1 . Then, by (77) and the fact that τ i+1 − τ i ≤ ε n log n, ≤ 1 (log n) 1/16 + ε n log n · 3 (log n) 1/16 + 4ε n (log n) 1/2 log log n, a contradiction for sufficiently large n because |Z n (ζ) − Y n (ζ)| ≥ 3/(log n) 1/16 by right continuity. Thus, if i ∈ I 1 , then ζ ≥ τ i+1 . Therefore, if i ∈ I 1 , then reasoning as in (78) and using the fact that we are working on the event (74), we get for sufficiently large n, which implies that ζ = τ i+1 . Next, suppose that i ∈ I 2 . Let ρ = inf{t > τ i : Z n (t) ≤ Y n (t)}. Suppose ρ ≤ min{ζ, τ i+1 }. Then 0 ≤ Z n (s) − Y n (s) ≤ 3/(log n) 1/16 for all s < ρ. Therefore by (77), which is positive for sufficiently large n, a contradiction. Therefore, for sufficiently large n, if i ∈ I 2 , then Z n must stay greater than Y n from time τ i until after time min{ζ, τ i+1 }. In particular, if i ∈ I 2 and ζ ≥ τ i+1 , then i + 1 ∈ I 1 ∪ I 2 . By the same argument with the roles of Y n and Z n reversed, if i ∈ I 3 and ζ ≥ τ i+1 , then i + 1 ∈ I 1 ∪ I 3 . It follows from these observations and (79) that the only way we could have ζ ∈ (τ i , τ i+1 ] with i ∈ I 2 is if there exists j < i such that j ∈ I 1 , j + 1, j + 2, . . . 
, i ∈ I 2 , and However, if this is true, then Z n (t) > Y n (t) for all t ∈ [τ j+1 , ζ]. Therefore, using (77) and the fact that we are working on the event in (74), we have For sufficiently large n, the sum of the last two terms on the right-hand side is less than 1/(log n) 1/16 , a contradiction. Hence, we cannot have ζ ∈ (τ i , τ i+1 ] for i ∈ I 2 , and the same argument with the roles of Y n and Z n reversed establishes that we cannot have ζ ∈ (τ i , τ i+1 ] with i ∈ I 3 . We conclude that ζ = ∞, which completes the proof.

Extension to arbitrary starting times
The result (56) pertains to the evolution of the number of lineages when we trace back the ancestral lines of the individuals in the population at time zero. Of course, by stationarity, the analogous result holds if we instead trace back the ancestral lines of the individuals in the population at some other time s ≥ 0. However, to help with the proof of Theorem 5 in the next section, we will need a stronger version of this result that will make it possible to show that the approximation works well simultaneously at many times. The key to this result is that the events A n and B n were defined so that the bounds that hold on these events are valid simultaneously for all s ∈ [0, T ].
Recall that N n (s, t) denotes the number of individuals in the population immediately before time s − t with a descendant alive in the population at time s. Let Ñ n (s, t) = N n (s, t)1 {N n (s,t)>1} , which will be convenient because Note that X n (0, t) = X n (t) for all t ≤ (log n) inf{s : N n (s) = 1} and Y n (0, t) = Y n (t) for all t ≥ 0. We have the following extension of Lemma 25. The result follows from the same argument that gives Lemma 25. We have replaced 4/(log n) 1/16 by 5/(log n) 1/16 to account for the error in replacing N n (s, t) by Ñ n (s, t). Indeed, it would be possible to use 4/(log n) 1/16 + (log n)/n because |N n (s, t) − Ñ n (s, t)| ≤ 1 for all s and t. Also, because an interval of time T n + s rather than T n must be considered when adapting the proofs of Lemmas 18 and 20, the bounds involving T n have been replaced by bounds involving T n + T .
Lemma 26. For sufficiently large n, we have

Proof of Theorem 5
Recall that for each s ∈ R, a process (S(s, t), t ≥ 0) was defined in section 3.
The next result shows that (L(s), s ≥ 0) has the same law as the process defined in the statement of Theorem 5.
Proposition 27. Let ν be the measure on R whose density with respect to Lebesgue measure is given by The process (L(s), s ≥ 0) is a process of Ornstein-Uhlenbeck type generated by (2 − γ, 0, ν, 1).
We use the notation ∞ 0 f (x) dS t (x) to denote the stochastic integral of f (x) with respect to the stable process (S(t, x), x ≥ 0). If 0 ≤ s ≤ t, then where in the next-to-last step we used Fubini's Theorem for general stochastic integrals (see, for example, Theorem 45 in Part IV of [27] and the remark on p. 161 in [27] that the measure µ in that theorem can be taken to be σ-finite rather than finite). Note that J(0) = 0 almost surely, and almost surely t 0 J(s) ds = 0 for all t. Therefore, using Fubini's Theorem again and (82), we get for all t ≥ 0 almost surely. Therefore, if we define Z(t) = S(t, t) + t − J(t) for all t ≥ 0, then which is exactly (8) with c = 1 and L in place of X.
By the symmetry of the construction in subsection 3.1, the processes (S(0, t), t ≥ 0) and (S(t, t) − J(t), t ≥ 0) have the same law. Indeed, if one reflects the points of the Poisson process about the vertical axis, so that a point at (t, x) is moved to (−t, x), and then follows the procedure used to construct (S(0, t), t ≥ 0), one obtains the process (S(t, t) − J(t), t ≥ 0) provided that J(0) = 0. Therefore, the process (S(t, t) − J(t), t ≥ 0) is a stable process whose characteristic exponent is given by the expression in (36). Since (Z(t), t ≥ 0) differs from (S(t, t) − J(t), t ≥ 0) only by the addition of a linear drift of rate 1, it follows that (Z(t), t ≥ 0) is a stable process whose characteristic exponent is obtained by replacing 1 − γ with 2 − γ on the right-hand side of (36). This observation, combined with (84), implies the result.
Remark 28. The process (L(s), s ≥ 0) is clearly stationary by construction. Thus, it follows from Proposition 27 and the theory of processes of Ornstein-Uhlenbeck type reviewed in the introduction that the distribution of L(0), and therefore the distribution of L(s) for any fixed s ≥ 0, has a characteristic function given by the right-hand side of (7). To observe this result more directly, use the Integration by Parts Formula (see, for example, Corollary 8.7 of [21]) to write that almost surely The distribution of the stable integral on the right-hand side of (85) can be evaluated using the theory developed in Chapter 3 of [29]. In this case, we apply Proposition 3.4.1 of [29] with m being π/2 times Lebesgue measure, E = [0, ∞), σ = π/2, µ = 0, β = −1, and f (t) = e −t . We get σ f = π/2, β f = −1, and µ f = − Consequently, the following proposition will imply Theorem 5.
To prove Proposition 29, we need to compare the integrals ∞ 0 X n (s, t) dt and ∞ 0 Y (s, t) dt. Lemmas 30 and 32 below will show that it suffices to consider the integrals of X n and Y up to time T n + s, and Lemma 33 will allow us to replace Y by Y n . The result is immediate.
Lemma 31. For all ε > 0, there exists a positive constant C such that Proof. Let (Π n (t), t ≥ 0) be a Bolthausen-Sznitman coalescent started with n blocks. There is a well-known method for constructing a random partition having the same distribution as Π n (t). Let α = e −t . Then let J 1 ≥ J 2 ≥ . . . denote the points of a Poisson point process on (0, ∞) whose intensity measure is given by  [25]. Therefore, N n (t) has the same distribution as the number of blocks of Π n (t). Thus, by (3.13) of [26], Therefore, by Stirling's Formula, there exist positive constants C 1 and C 2 such that for all n ≥ 2 and t ≥ 0, Now, take t = T n /(log n), so that α = e −Tn/(log n) . Because we have n α ≤ ne −Tn+T 2 n /2 log n ≤ C 3 n (log n) 2 (87) for some positive constant C 3 . Combining (86) and (87), we get that E N n T n log n ≤ C 4 n (log n) 2 for some positive constant C 4 . The result now follows from Markov's Inequality.
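The elementary estimate behind (87) can be checked numerically. The sketch below works on the logarithmic scale with T n = 2 log log n and α = e^{−T n / log n}, comparing log(n^α) with log n − T n + T n^2/(2 log n) and measuring the excess of the latter over log(n/(log n)^2). The function name and the sample values of n are assumptions made for this illustration.

```python
import math


def alpha_bound_terms(n):
    """Return (log n^alpha, log of the middle bound in (87), excess).

    excess is the amount by which the middle bound exceeds
    log(n / (log n)^2); it equals T_n^2 / (2 log n).
    """
    log_n = math.log(n)
    T_n = 2.0 * math.log(log_n)      # T_n = 2 log log n
    alpha = math.exp(-T_n / log_n)   # alpha = e^{-t} with t = T_n / log n
    log_n_alpha = alpha * log_n      # log(n^alpha)
    log_middle = log_n - T_n + T_n**2 / (2.0 * log_n)
    excess = log_middle - (log_n - 2.0 * math.log(log_n))
    return log_n_alpha, log_middle, excess


results = {n: alpha_bound_terms(n) for n in (10, 10**3, 10**6, 10**12)}
```

The first comparison is the inequality e^{−x} ≤ 1 − x + x^2/2 for x ≥ 0 applied with x = T n / log n. The excess 2(log log n)^2 / log n is maximized when log n = e^2, so under this sketch any constant C 3 of at least e^{8/e^2}, roughly 2.95, would suffice for n ≥ 3.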
Proof. It is easy to check that Conditional on N n (0, T n / log n) = m ≥ 2, the distribution of ∞ Tn/ log nÑ n (0, t) dt is the same as the distribution of L m (0). Let ε > 0. By Lemma 31, there is a positive constant C such that P (N n (0, T n / log n) ≤ Cn/(log n) 2 ) > 1 − ε for sufficiently large n. Conditional on the event that N n (0, T n / log n) ≤ Cn/(log n) 2 , the distribution of ∞ Tn/ log nÑ n (0, t)dt is stochastically dominated by the distribution of L m (0) for m = ⌊Cn/(log n) 2 ⌋. Thus, by (6), there is a positive constant K such that P ∞ Tn/ log n N n (0, t) dt ≤ Kn (log n) 3 > 1 − 2ε for sufficiently large n. Hence, the right-hand side of (88) converges in probability to zero, which gives the result. Choose fixed times 0 = s 0 < s 1 < · · · < s m = T such that 1/(log n) 2 ≤ s i+1 − s i ≤ 2/(log n) 2 for i = 0, 1, . . . , m − 1. This is clearly possible for sufficiently large n, and m ≤ T (log n) 2 . Let ε > 0, and let (−u 1 , y 1 ), . . . , (−u k , y k ) denote the points of Ψ in the region [−T, 0] × [ε 3 , ∞). Note that with probability one, there are only finitely many points of Ψ in this region. For i = 0, 1, . . . , m − 1, let G i be the event that none of the u j falls in [s i , s i+1 ], and let H i be the event that exactly one of the u j falls in [s i , s i+1 ] and that this point u j is in (s i , s i+1 ). Note that almost surely none of the u j land on one of the points s 0 , . . . , s m , and with probability tending to one as n → ∞, no two of the points u j fall in the same interval [s i , s i+1 ]. Consequently, we have On H i , let j(i) be the value of j such that u j ∈ (s i , s i+1 ).
Lemma 34. We have Proof. If s i ≤ s ≤ s i+1 and t ≥ 0, then Therefore, Tn+s e −t t 2 2 dt.
We must bound the last four terms on the right-hand side of (91). Now e s i+1 −s i − 1 ≤ 3/(log n) 2 for sufficiently large n, so Also, e s i+1 −s ≤ 2 for sufficiently large n, which means Likewise, Finally,

$$\int_{T_n+s}^{T_n+s_{i+1}} e^{-t} \, \frac{t^2}{2} \, dt \le \frac{1}{(\log n)^2} \qquad (95)$$
for sufficiently large n. Combining (92), (93), (94), and (95) with (91), we get that for sufficiently large n, Note that sup 0≤s≤T |J(s)| is almost surely finite by properties of the Poisson process Ψ, and sup 0≤s≤T |L(s)| is almost surely finite by Proposition 27. Combining these observations with Lemmas 30 and 33, we see that the probability that the third term on the right-hand side of (96) is greater than ε tends to zero as n → ∞. Clearly 4/(log n) 2 < ε for sufficiently large n. To control the second term, apply Lemma 16 with δ n = s i+1 − s i and θ = ε 3 to get P sup and thus Combining these bounds with (96) gives the result.
Lemma 35. We have Proof. Note that (91) still holds in this setting. Consequently, if s i ≤ s ≤ s i+1 , then on the event The bounds (92), (93), and (95) also hold. In place of (94), observe that if s i ≤ s ≤ s i+1 , then on the event H i , By Lemmas 30 and 33, and the fact that sup 0≤s≤T |L(s)| and sup 0≤s≤T |J(s)| are finite almost surely, the probability that the first, second, or fourth term on the right-hand side of (100) is greater than ε tends to zero as n → ∞. To bound the third term, we apply Lemma 16. Observe that the conditional distribution of given H i is the same as the conditional distribution of sup 0≤t≤δn |S n (s, t)| given A(θ) in Lemma 16 if we take δ n = s i+1 − s i and θ = ε 3 . Because it follows from Lemma 16 that which tends to zero as n → ∞. The result follows.
Lemma 36. If $s \ge 0$ and $h \ge 0$ with $0 \le s + h \le T$, then

Because $N_n(s,t) \le n$ for all $s, t \ge 0$, the integral in the third term on the right-hand side of (104) is bounded by $hn$, so the term is bounded by $h \log n$. Also, for all $s, t, h \ge 0$, we have $N_n(s+h, t+h) \le N_n(s,t)$ because the number of individuals in the population immediately before time $(s+h) - (t+h) = s - t$ with descendants alive in the population at time $s + h$ is less than or equal to the number of individuals in the population immediately before time $s - t$ with descendants alive in the population at time $s$. Therefore, the fourth term on the right-hand side of (104) is nonpositive. Equation (102) follows.

Next, we claim that for all $s, t, h \ge 0$, we have $N_n(s,t) - N_n(s+h, t+h) \le n - N_n(s+h, h)$. To see this, note that $N_n(s,t) - N_n(s+h, t+h)$ is the number of individuals in the population immediately before time $s - t$ with descendants alive in the population at time $s$ but not at time $s + h$. This is at most the number of individuals in the population at time $s$ that do not have descendants alive in the population at time $s + h$, which is at most $n - N_n(s+h, h)$. Therefore, $N_n(s,t) - \tilde N_n(s+h, t+h) \le n - N_n(s+h, h) + 1$. The result (103)

and therefore

That is, $R_i$ is the number of individuals in the population immediately before time $s_i/\log n$ with descendants in the population at least until immediately before time $u_{j(i)}/\log n$. Note that $N_n(s/\log n, (s - s_i)/\log n) \ge R_i$ for all $s \in [s_i, u_{j(i)})$. Therefore, by (103), for $s \in [s_i, u_{j(i)})$ we have
$$\int_0^{T_n+s} X_n(s,t)\,dt \ge \int_0^{T_n+s_i} X_n(s_i,t)\,dt - 2(s - s_i)\log n - \frac{(\log n)(T_n + T)(n - R_i + 1)}{n}.$$
Combining this result with (107) and the fact that $s_{i+1} - s_i \le 2/\log n$ gives
$$\int_0^{T_n+s} X_n(s,t)\,dt - \int_0^{T_n+s_i} X_n(s_i,t)\,dt \le \frac{2}{\log n} + \frac{4}{(\log n)^3} + \frac{(\log n)(T_n + T)(n - R_i + 1)}{n}.$$
It follows that if $s \in [s_i, u_{j(i)})$, then

Likewise, let

and note that $N_n(s_{i+1}/\log n, (s_{i+1} - s)/\log n) \ge S_i$ for all $s \in (u_{j(i)}, s_{i+1}]$. Therefore, if $s \in (u_{j(i)}, s_{i+1}]$, then (103) gives

We claim that the conditional distributions of $n - R_i$ and $n - S_i$ given $H_i$ are each stochastically dominated by the distribution of $n - N_n(0, 2/(\log n)^3)$, which is the number of blocks lost by time $2/(\log n)^3$ in a Bolthausen-Sznitman coalescent started with $n$ blocks. To see this, note that $n - S_i$ is the number of blocks lost by time $(s_{i+1} - u_{j(i)})/\log n \le 2/(\log n)^3$ in a Bolthausen-Sznitman coalescent. Likewise, $n - R_i$ is the number of blocks lost by time $(u_{j(i)} - s_i)/\log n \le 2/(\log n)^3$ in a Bolthausen-Sznitman coalescent started with $n$ lineages, if we disallow the instantaneous merger caused by the birth event at time $u_{j(i)}$. Because $H_i$ requires that exactly one of the $u_k$ falls in $(s_i, s_{i+1})$, the effect of conditioning on $H_i$ is the same as suppressing all mergers in which each lineage participates with probability greater than $\varepsilon^3$. This can only reduce the number of blocks lost. Thus, we have

The result (105) follows from (111), (116), and the fact that, by (90), with probability tending to 1 as $n \to \infty$, the event $G_i \cup H_i$ occurs for all $i = 0, 1, \dots, m-1$.
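The dominating quantity $n - N_n(0, 2/(\log n)^3)$ in the argument above, the number of blocks lost by a small time in a Bolthausen-Sznitman coalescent started with $n$ blocks, can be explored numerically. The following is a rough sketch rather than the paper's construction. It relies on the standard fact that, with $b$ blocks present, the total rate of mergers involving exactly $k$ blocks is $b/(k(k-1))$ for $2 \le k \le b$, so the total merger rate is $b - 1$. The function names `merger_size_probs` and `blocks_lost` are hypothetical.

```python
import math
import random


def merger_size_probs(b):
    """P(next merger involves exactly k blocks), k = 2..b, given b blocks.

    Uses the rate b / (k (k - 1)) for k-mergers and total rate b - 1.
    """
    return {k: b / ((b - 1) * k * (k - 1)) for k in range(2, b + 1)}


def blocks_lost(n, t, rng):
    """Simulate the block-counting process; return n minus the block count at time t."""
    b = n
    elapsed = 0.0
    while b > 1:
        # Holding time with b blocks is exponential with rate b - 1.
        elapsed += rng.expovariate(b - 1)
        if elapsed > t:
            break
        # Inverse-CDF sampling: P(K <= k) = (b / (b - 1)) * (1 - 1/k).
        u = rng.random()
        k = math.ceil(1.0 / (1.0 - u * (b - 1) / b))
        k = max(2, min(b, k))  # guard against floating-point edge cases
        b -= k - 1  # a merger of k blocks removes k - 1 blocks
    return n - b


# Example: average number of blocks lost by time 2 / (log n)^3.
n = 1000
t = 2 / math.log(n) ** 3
rng = random.Random(0)
samples = [blocks_lost(n, t, rng) for _ in range(200)]
print(sum(samples) / len(samples))
```

Only the block-counting process is simulated here, which is enough for the number of blocks lost. The conditioning on $H_i$, which suppresses certain mergers, is not modeled.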