RANDOM RECURSIVE TREES AND THE BOLTHAUSEN-SZNITMAN COALESCENT

Abstract. We describe a representation of the Bolthausen-Sznitman coalescent in terms of the cutting of random recursive trees. Using this representation, we prove results concerning the final collision of the coalescent restricted to [n]: we show that the distribution of the number of blocks involved in the final collision converges as n → ∞, and obtain a scaling law for the sizes of these blocks. We also consider the discrete-time Markov chain giving the number of blocks after each collision of the coalescent restricted to [n]; we show that the transition probabilities of the time-reversal of this Markov chain have limits as n → ∞. These results can be interpreted as describing a "post-gelation" phase of the Bolthausen-Sznitman coalescent, in which a giant cluster containing almost all of the mass has already formed and the remaining small blocks are being absorbed.


Introduction
The Bolthausen-Sznitman coalescent, (Π(t), t ≥ 0), is a Markov process which takes its values in the set of partitions of N. It is most easily defined via its restriction (Π^[n](t), t ≥ 0) to the set [n] := {1, 2, . . . , n}, for n ≥ 1. Denote by #Π^[n](t) the number of blocks of Π^[n](t). Then (Π^[n](t), t ≥ 0) is a continuous-time Markov chain whose transition rates are as follows: if #Π^[n](t) = b, then any k of the blocks present coalesce at rate

λ_{b,k} = (k − 2)!(b − k)!/(b − 1)!, 2 ≤ k ≤ b.   (1.1)

It is usual to start the coalescent from the partition into singletons, and in this case we say that the coalescent is standard.
The Bolthausen-Sznitman coalescent was first introduced in [5], in the context of the Sherrington-Kirkpatrick model for spin glasses. In [21], Pitman demonstrated a great number of its properties. He introduced the class of coalescents with multiple collisions (also known as Λ-coalescents), gave a construction of them based on Poisson random measures and studied the Bolthausen-Sznitman coalescent as a member of this class. Bertoin and Le Gall [4] give an alternative derivation in terms of the genealogy of a continuous-state branching process. Marchal [15] gives a construction via regenerative sets.
In this paper, we describe a new representation for the Bolthausen-Sznitman coalescent in terms of the cutting of random recursive trees.
We then use this representation to prove results about the last collision of the coalescent restricted to [n], as n → ∞. We obtain scaling laws for the sizes of the blocks involved in the final collision; essentially, this collision involves one large block and one or more smaller blocks, whose combined size behaves like n^U, where U has the uniform distribution on [0, 1]. We also show that the distribution of the number of blocks involved in the final collision converges as n → ∞ (for example, the probability that exactly two blocks are involved converges to log 2).
More generally, we can also consider the discrete-time Markov chain giving the number of blocks after each collision of the coalescent restricted to [n]. We show that the transition probabilities of the time-reversal of this Markov chain have limits as n → ∞, which we make explicit. (We observe in passing that the form of these limiting probabilities yields certain infinite product expansions of powers of e, which appear to be new).
These results can be interpreted as describing a "post-gelation" phase of the Bolthausen-Sznitman coalescent, in which a giant cluster containing almost all of the mass has already formed and the very small left-over blocks are being absorbed. We also note that the tree representation has an intrinsic asymmetry which contrasts strongly with the exchangeability properties of the coalescent itself (for example, the root of the tree always represents the block containing 1). This makes it possible to read off properties concerning a tagged particle in the coalescent process directly from the tree representation.
2 Random recursive trees

The representation
A tree on n vertices labelled 1, 2, . . . , n is called a recursive tree if the vertex labelled 1 is the root and, for all 2 ≤ k ≤ n, the sequence of vertex labels in the path from the root to k is increasing (Stanley [26] calls this an unoriented increasing tree). See Figure 1 for an example of a recursive tree. Call a random recursive tree a tree chosen uniformly at random from the (n − 1)! possible recursive trees on n vertices. A random recursive tree can also be constructed as follows. The vertex 1 is distinguished as the root. We imagine the vertices arriving one by one. For k ≥ 2, vertex k attaches itself to a vertex chosen uniformly at random from 1, 2, . . . , k − 1. For a detailed survey of results on recursive trees, see Smythe and Mahmoud [24]. We can represent an infinite tree by the infinite sequence of integers {p_i}_{i≥2}, where p_i is the parent of node i. (The condition for this to be a recursive tree is 1 ≤ p_i < i for i ≥ 2.) For the purposes of this article, it will be convenient also to define a random recursive tree on a label set {l_1, l_2, . . . , l_b} where l_1, l_2, . . . , l_b are the blocks of a partition of [n] for some n, listed in increasing order of least elements. (That is, we require that l_1 < l_2 < . . . < l_b where "<" is the total order induced by the least elements of the blocks.) The tree is constructed in the obvious way: l_1 labels the root and l_k is attached to a vertex chosen uniformly at random from those labelled l_1, l_2, . . . , l_{k−1}. Call the weight of a label the number of integers it contains and let L_n be the set of partitions of [n].
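As an illustration, the sequential construction of a random recursive tree can be sketched in a few lines of Python (the function name is our own):

```python
import random

def random_recursive_tree(n, seed=None):
    """Parent array of a uniform random recursive tree on {1, ..., n}:
    vertex 1 is the root and, for k >= 2, vertex k attaches to a vertex
    chosen uniformly at random from {1, ..., k - 1}."""
    rng = random.Random(seed)
    return {k: rng.randrange(1, k) for k in range(2, n + 1)}

# The condition 1 <= p_k < k holds automatically, so labels increase
# along every path away from the root; since vertex k has k - 1 choices,
# each of the (n-1)! recursive trees is equally likely.
tree = random_recursive_tree(10, seed=1)
```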
Meir and Moon [17] define a cutting procedure which they apply to random recursive trees. They take a random recursive tree on [n] and pick an edge e uniformly at random from the n − 1 present. This edge is deleted along with the entire subtree below it. These operations are then repeated until the root is isolated. The idea of cutting combinatorial trees in this way appears to have been introduced in Meir and Moon [16] and remains a current research topic. Interest has tended to focus on the number of cuts required to isolate the root in different types of trees. Recent references include Janson [11,12], Fill, Kapur and Panholzer [9] and Panholzer [18,19]. In particular, [19] treats the case of random recursive trees; the author's presentation of this work at the MathInfo 2004 conference in Vienna stimulated the present work. A variant of the cutting procedure will be the basis of our representation of the Bolthausen-Sznitman coalescent. Suppose that instead of throwing the subtree below e away, we add its labels to those of the vertex above e. We repeat this procedure until only the root remains, labelled by [n]. See Figure 2 for an example.
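The label-merging variant of the cutting procedure can be sketched as follows (all names are ours); the partition after each cut is recorded so that the sequence of merges can be inspected:

```python
import random

def in_subtree(parent, u, v):
    """True if vertex u lies in the subtree rooted at v; parent pointers
    of the original tree remain valid for all surviving vertices."""
    while u != 1:
        if u == v:
            return True
        u = parent[u]
    return u == v

def cut_and_merge(n, seed=None):
    """Cut a random recursive tree on [n] one uniform edge at a time,
    adding the labels below each cut to the vertex above it; return the
    list of partitions of [n] produced after each cut."""
    rng = random.Random(seed)
    parent = {k: rng.randrange(1, k) for k in range(2, n + 1)}
    labels = {v: {v} for v in range(1, n + 1)}   # surviving vertex -> label
    history = []
    while len(labels) > 1:
        # each surviving non-root vertex stands for the edge above it
        v = rng.choice([u for u in labels if u != 1])
        below = [u for u in labels if in_subtree(parent, u, v)]
        merged = set().union(*(labels[u] for u in below))
        for u in below:
            del labels[u]
        labels[parent[v]] |= merged              # vertex above absorbs labels
        history.append(sorted(tuple(sorted(b)) for b in labels.values()))
    return history

parts = cut_and_merge(10, seed=7)
```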
Proposition 2.1. Suppose T is a random recursive tree on L = {l 1 , l 2 , . . . , l b } ∈ L n . Pick an edge at random, cut it and add the labels below the cut to the label above. Then the resulting tree is a random recursive tree on the new label-set.
Proof. (Essentially due to Meir and Moon [17].) The resulting tree is clearly recursive because its labels still increase along all paths away from the root. So we need to show that it is chosen uniformly from the set of all recursive trees with the same label-set. Put another way, we will show that each of the (b − 1)!(b − 1) recursive trees on the label-set L with a single marked edge corresponds to a tree constructed as follows: for some 2 ≤ k ≤ b, pick k of the labels, say l_{i_1}, l_{i_2}, . . . , l_{i_k} (taken to be in increasing order), make a recursive tree on L \ {l_{i_2}, . . . , l_{i_k}}, make another recursive tree on {l_{i_2}, . . . , l_{i_k}} and then join them together with an edge between the vertices labelled l_{i_1} and l_{i_2}.
There are \binom{b}{k} ways of picking the k labels l_{i_1}, . . . , l_{i_k}. There are (k − 2)! ways of arranging the k − 1 largest into a recursive tree rooted at l_{i_2}. There are (b − k)! ways of arranging the b − k + 1 other labels into a recursive tree. Clearly each of the trees constructed in this way is distinct and also a recursive tree. The number which can be constructed is

∑_{k=2}^{b} \binom{b}{k} (k − 2)!(b − k)! = b! ∑_{k=2}^{b} 1/(k(k − 1)) = b!(1 − 1/b) = (b − 1)!(b − 1).

Hence, the claimed correspondence holds.
Proposition 2.2. Start with a random recursive tree on [n] and associate an independent exponential random variable with mean 1 to each edge. This exponential time is the time at which the edge is deleted, at which point the labels in the subtree below it are instantaneously added to the label of the vertex above the edge. Then the labels of the tree at time t form a partition of [n] which evolves as the Bolthausen-Sznitman coalescent restricted to [n].

Proof. We need to show that the rate of coalescence of any set of k of the labels is λ_{b,k} whenever there are b vertices in the tree, for λ_{b,k} defined as at (1.1). The total rate of events when there are b vertices is b − 1. The probability that the next event will coalesce a particular k-set is worked out in the same way as in the proof of Proposition 2.1. Suppose we start with label-set L = {l_1, l_2, . . . , l_b} and we want the probability that the next event is the coalescence of {l_{i_1}, . . . , l_{i_k}}. There are (k − 2)! ways of making a recursive tree on {l_{i_2}, . . . , l_{i_k}}; there are (b − k)! ways of making a recursive tree on the remaining labels. There are (b − 1)!(b − 1) recursive trees on a label-set of size b with a single marked edge and so the probability that we coalesce {l_{i_1}, . . . , l_{i_k}} at the next event is

(k − 2)!(b − k)!/((b − 1)!(b − 1)).
Hence, the rate at which we coalesce any given k-set is

(b − 1) · (k − 2)!(b − k)!/((b − 1)!(b − 1)) = (k − 2)!(b − k)!/(b − 1)! = λ_{b,k}.

The evolution is Markovian because, by Proposition 2.1, the resulting tree is another random recursive tree, this time on b − k + 1 labels.
Because of the recursive way in which the original tree is built, the representations are consistent for different n: if Π^[n](t) is the partition given by the tree representation on [n] at time t then

Π^[n](t) = Π^[n+1](t)|_[n] for all t ≥ 0 and all n ∈ N,

where |_[n] denotes restriction of a partition to [n]. Thus we are able to define (Π(t), t ≥ 0), the Bolthausen-Sznitman coalescent on the whole of N, by means of the cutting procedure applied to an infinite random recursive tree (indexed by N).
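A continuous-time sketch of the representation in Proposition 2.2, with an independent Exp(1) clock on each edge (all names are ours); a clock that rings after its subtree has already been removed is simply ignored:

```python
import random

def in_subtree(parent, u, v):
    """True if vertex u lies in the subtree rooted at v."""
    while u != 1:
        if u == v:
            return True
        u = parent[u]
    return u == v

def coalescent_times(n, seed=None):
    """Collision times and block counts of the Bolthausen-Sznitman
    coalescent on [n], read off by cutting a random recursive tree: the
    edge above vertex k is deleted at an independent Exp(1) time, and the
    labels below it are absorbed by the vertex above."""
    rng = random.Random(seed)
    parent = {k: rng.randrange(1, k) for k in range(2, n + 1)}
    clock = {k: rng.expovariate(1.0) for k in range(2, n + 1)}
    labels = {v: {v} for v in range(1, n + 1)}
    events = []
    for v in sorted(clock, key=clock.get):       # process rings in time order
        if v not in labels:
            continue                             # subtree already removed
        below = [u for u in labels if in_subtree(parent, u, v)]
        merged = set().union(*(labels[u] for u in below))
        for u in below:
            del labels[u]
        labels[parent[v]] |= merged
        events.append((clock[v], len(labels)))   # (time, number of blocks)
    return events
```

Consistency for different n comes for free here: the same parent and clock variables, restricted to the first n vertices, drive every restriction of the coalescent.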

Random recursive trees, the Chinese restaurant process and random permutations
There is a well-known correspondence between random recursive trees and uniform random permutations. We find it most convenient to demonstrate this connection via the Chinese restaurant process of Dubins and Pitman (see Chapter 3 of Pitman [22]), which we now introduce.
For 0 ≤ α < 1 and θ > −α, the Chinese restaurant process with parameters α and θ (referred to hereafter as the (α, θ)-Chinese restaurant process) is constructed as follows. Person 1 sits at the first table. Suppose that persons 1, 2, . . . , m are seated at k tables, with n_i people at table i. Then person m + 1 sits at table i with probability (n_i − α)/(m + θ) and starts a new table with probability (θ + kα)/(m + θ). The tables give the blocks of a partition of [m]; each block possesses an asymptotic frequency, the limit as m → ∞ of the proportion of [m] that it contains. The existence of this limit for blocks generated by the Chinese restaurant process is guaranteed by Kingman's theory of exchangeable random partitions [13,14]. Moreover, if (F_1, F_2, . . .) is the vector of asymptotic frequencies for an (α, θ)-Chinese restaurant process in the order of the tables then (F_1, F_2, . . .) has the Griffiths-Engen-McCloskey GEM(α, θ) distribution. If this vector is put into decreasing order of size then it has the Poisson-Dirichlet PD(α, θ) distribution. (See Pitman [22] or Arratia, Barbour and Tavaré [1] for more details.) For α = 0 and θ > 0, there is a simpler construction. Person 1 sits at table 1. For m ≥ 1, person m + 1 starts a new table with probability θ/(θ + m) and sits to the left of each of the m existing customers with probability 1/(θ + m). In this case, the blocks created contain more information than is needed to just give a partition of [n]. In fact, they are the cycles of a permutation of [n], written in standard cycle notation (i.e. the permutation maps i to the customer seated to the left of i). The special case θ = 1 gives a uniform random permutation of [n] (that is, a permutation chosen uniformly at random from all the possible permutations of [n]). Moreover, the permutations are consistent for different n and, when n → ∞, we obtain a uniform random permutation of N.
We can find a (0, 1)-Chinese restaurant process in the construction of the random recursive tree on [n + 1]. For this purpose, it is easier to imagine the random recursive tree labelled by the set {0, 1, . . . , n} rather than {1, 2, . . . , n + 1}. The root (labelled 0) is fixed and does not appear in the permutation. Vertices attached to the root correspond to individuals who start a new table. Thus, individual 1 necessarily starts a new table. If vertex k arrives and attaches to a vertex other than the root (say j) then individual k sits directly to the left of individual j. As vertex k is equally likely to attach to each of the vertices labelled 0, 1, . . . , k − 1, individual k is equally likely to sit to the left of any of the individuals 1, 2, . . . , k − 1 or to form a new table.
Thus, a random recursive tree on [n + 1] corresponds to a random permutation of [n].
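The correspondence can be made completely explicit: reading a recursive tree on {0, 1, . . . , n} as seating instructions produces the permutation σ mapping each i to the customer seated to i's left. A sketch (names ours), together with a check that for n = 3 the six recursive trees map bijectively onto the six permutations of [3]:

```python
def tree_to_permutation(parent, n):
    """Read a recursive tree on {0, 1, ..., n} (parent[k] given for
    1 <= k <= n, root 0) as a permutation of [n]: k attaching to the root
    starts a new cycle; k attaching to j >= 1 sits directly to j's left."""
    sigma = {}
    for k in range(1, n + 1):
        j = parent[k]
        if j == 0:
            sigma[k] = k            # new table: k is its own left neighbour
        else:
            sigma[k] = sigma[j]     # k inherits j's old left neighbour ...
            sigma[j] = k            # ... and sits directly to j's left
    return sigma

# enumerate all recursive trees on {0, 1, 2, 3}: parent[1] = 0,
# parent[2] in {0, 1}, parent[3] in {0, 1, 2}
perms = set()
for p2 in (0, 1):
    for p3 in (0, 1, 2):
        s = tree_to_permutation({1: 0, 2: p2, 3: p3}, 3)
        perms.add((s[1], s[2], s[3]))
```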
Just to give some interpretation to the cutting procedure in the Chinese restaurant, we embellish the usual model as follows. Each customer present in the restaurant has friends arrive at rate 1 and there is also a rate 1 stream of customers who know no-one in the restaurant at their time of arrival. Customers who arrive and have a friend present always sit to the left of that friend. Friendless customers sit at new tables. Stop the process when there have been n arrivals. A meal in the restaurant costs one euro. At rate 1, each customer decides to leave and go home. Whenever a customer k leaves, any of his friends who arrived after him want to leave too (and their friends, and so on). (In the random recursive tree, it is the subtree rooted at k which departs.) Anyone leaving gives the money for his meal to the person to whose left he sat down on entering the restaurant, so that that person can pay for him when he, in turn, leaves. If k was, in fact, the first person to have sat at that table then he collects all the money for the whole table (plus, of course, the price of his own meal), takes it to the cashier and departs. Note that at any time t, the amount of money in the cash register is the same as the weight of the label at the root in the random recursive tree (i.e. the size of the block containing 1 in the Bolthausen-Sznitman coalescent). In this paper, we will answer questions such as: how much does the last customer to leave pay the cashier?

Using the construction
The construction of Subsection 2.1 gives an oddly asymmetrical way of looking at the Bolthausen-Sznitman coalescent. Rather than the usual exchangeable-partitions view, we have a size-biased view, because the blocks are automatically ordered according to their least elements (smaller numbers are more likely to be the smallest number in their blocks). Moreover, a particular realisation of a random recursive tree corresponds to a particular conditioning of the coalescent (e.g. if 2 and 5 are both children of 1 then we are conditioning 2 and 5 to be in the same block only once they have both coalesced with 1). Some facts are very easy to read off from the tree and we describe some examples here.
In Subsection 2.4 of [21], Pitman discusses the size of the block containing 1 at time t.
In terms of the random recursive tree, in order to find the size of the block containing 1 at time t, it suffices to know only the sizes of the descendencies of all the children of the root, plus the times at which the edges from the root to those children are severed. By the connection with the Chinese restaurant process, it is clear that in the limit as n → ∞ the vector of asymptotic frequencies of the descendencies of the children of the root has the GEM(0, 1) distribution (i.e. the PD(0, 1) distribution with the blocks in size-biased rather than decreasing order). Moreover, by construction, the times at which these descendencies are added to the root are independent and identically distributed standard exponential random variables. Thus, we have here given a very straightforward proof of Pitman's Corollary 16, reproduced below.

Corollary 2.3 (Pitman). Let f_1(t) be the frequency of the block containing 1 at time t in a standard Bolthausen-Sznitman coalescent. Then
(i) f_1(t) has the beta(1 − e^{−t}, e^{−t}) distribution;
(ii) the process (− log(1 − f_1(t)), t ≥ 0) has non-stationary independent increments;
(iii) let J_1 ≥ J_2 ≥ . . . be the ranked magnitudes of jumps of the process (f_1(t), t ≥ 0), and let T_i be the time when the jump of magnitude J_i occurs. Then the distribution of the sequence (J_1, J_2, . . .) is PD(0, 1), and this sequence is independent of the T_i, which are independent with standard exponential distribution.
((i), (ii) and (iii) are equivalent by properties of the Dirichlet random measure; (iii) is immediate from the random recursive tree approach.) In Bolthausen and Sznitman [5] and Pitman [21], it is shown that the marginal distribution of the asymptotic frequencies of the blocks of the coalescent (ranked in decreasing order), at fixed time t > 0, is PD(e^{−t}, 0). Jason Schweinsberg has observed that the recursive tree provides a simple way of proving this fact. We work via the Chinese restaurant process. Let E_1, E_2, . . . be independent and identically distributed standard exponential random variables. Fix a time t and imagine constructing the random recursive tree in such a way that the vertex i arrives with the edge above it marked if E_i < t and unmarked otherwise, for all i ≥ 2. (These marks are the cuts, but we will find it convenient here not to collapse the tree at the cuts but rather just to keep track of where they are.) Then vertex i has the smallest label in its block if there are no marks in the path from the root to i. Otherwise, i is in the same block as the closest vertex to i in the path from the root to i which has no cuts above it. Suppose now that vertices 1, 2, . . . , n have arrived (possibly with marks) in the construction of the random recursive tree. Suppose also that they form k blocks (after cutting) of sizes n_1, . . . , n_k (where ∑_{i=1}^{k} n_i = n). Then vertex n + 1 has n possible places to attach. It creates a new block if (i) it attaches to the smallest element of one of the k blocks and (ii) it doesn't have a mark. This happens with probability ke^{−t}/n. It adds to the ith block (of size n_i) if it attaches to one of the n_i − 1 non-smallest elements of the block or if it attaches to the smallest element and arrives with a mark. This happens with probability (n_i − e^{−t})/n.
But these are exactly the probabilities in the (e −t , 0)-Chinese restaurant process and so we can conclude that the full asymptotic frequencies in decreasing order must have PD(e −t , 0) distribution.
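This sequential description is easy to test numerically: the marked-tree construction just described and the (e^{−t}, 0)-Chinese restaurant process should produce the same law for the number of blocks of the partition of [n]. A Monte Carlo sketch (all names ours):

```python
import math
import random

def blocks_marked_tree(n, t, rng):
    """Number of blocks at time t via the marked random recursive tree:
    vertex i arrives with a mark (a cut) on the edge above it with
    probability 1 - e^{-t}; its block representative is the last vertex
    on its root path with a mark-free path back to the root."""
    alpha = math.exp(-t)
    clean = {1: True}          # clean[v]: no marks between the root and v
    rep = {1: 1}               # rep[v]: smallest element of v's block
    for i in range(2, n + 1):
        j = rng.randrange(1, i)
        marked = rng.random() > alpha
        if clean[j] and not marked:
            clean[i], rep[i] = True, i       # i starts a new block
        elif clean[j]:
            clean[i], rep[i] = False, j      # first mark is the edge above i
        else:
            clean[i], rep[i] = False, rep[j]
    return len(set(rep.values()))

def blocks_crp(n, t, rng):
    """Number of tables after n customers in the (e^{-t}, 0)-Chinese
    restaurant process."""
    alpha = math.exp(-t)
    tables = [1]
    for m in range(1, n):
        r = rng.random() * m
        if r < len(tables) * alpha:
            tables.append(1)                 # new table, prob k*alpha/m
        else:
            r -= len(tables) * alpha
            for i, sz in enumerate(tables):
                r -= sz - alpha              # join table i, prob (n_i - alpha)/m
                if r < 0:
                    tables[i] += 1
                    break
            else:
                tables[-1] += 1              # guard against round-off
    return len(tables)

rng = random.Random(0)
N, n, t = 20000, 25, 1.0
m1 = sum(blocks_marked_tree(n, t, rng) for _ in range(N)) / N
m2 = sum(blocks_crp(n, t, rng) for _ in range(N)) / N
```

The two empirical means of the number of blocks agree up to Monte Carlo error, as the equality of the sequential transition probabilities predicts.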

Translating results for trees into results for the coalescent
As mentioned above, Panholzer [19] has studied the number of cuts necessary to isolate the root in a random recursive tree on [n]. In the context of the Bolthausen-Sznitman coalescent, this corresponds to the number of collision events that take place until there is just a single block. Let J n be this number of collisions.
Theorem 2.4 (Panholzer). For each k ∈ N, the moments of J_n satisfy

E[J_n^k] ∼ n^k/(log n)^k as n → ∞,

while the centred moments E[(J_n − E[J_n])^k] are of order n^k/(log n)^{k+1} as n → ∞.
Panholzer notes that it is not possible to obtain a limiting distribution of a centred, scaled version of J n by the method of moments.
3 The last collision of the Bolthausen-Sznitman coalescent

Main results
Let M_n be the sum of the sizes (or masses) of the blocks not containing the integer 1 in the last collision of the Bolthausen-Sznitman coalescent restricted to [n], and let B_n be the number of blocks involved in the final collision. For example, in Figure 2 we have n = 10, M_n = 5 and B_n = 3.

Theorem 3.1. As n → ∞,

(log M_n/log n, B_n) →d (U, 1 + Y(U E)),

where (Y(t))_{t≥0} is a standard Yule process, U is uniform on [0, 1], E is standard exponential and (Y(t))_{t≥0}, U and E are independent.
The proof of this theorem is quite long and so we defer it to Section 3.4. Theorem 3.1 tells us that the final collision of the coalescent restricted to [n] is between a very large block containing almost all of the mass and a collection of very small blocks whose total mass is roughly n^U. We can make the distribution of the number of the small blocks in the last collision explicit as follows.
Proposition 3.2. For each m ≥ 1, as n → ∞,

P(B_n = m + 1) → P(Y(U E) = m) = ∑_{j=0}^{m−1} (−1)^j \binom{m−1}{j} log(j+2)/(j+1).

This distribution has infinite mean.
Proof. Let q_m(t) = P(Y(t) = m) for m ≥ 1. It is easily shown (for example, from the forward equations) that

q_m(t) = e^{−t}(1 − e^{−t})^{m−1},

and so, by Proposition 3.15 (see Section 3.5), we have

P(Y(U E) = m) = E[e^{−UE}(1 − e^{−UE})^{m−1}] = ∑_{j=0}^{m−1} (−1)^j \binom{m−1}{j} E[e^{−(j+1)UE}] = ∑_{j=0}^{m−1} (−1)^j \binom{m−1}{j} log(j+2)/(j+1),

since E[e^{−aUE}] = ∫_0^1 du/(1 + au) = log(1 + a)/a. In particular, P(Y(U E) = 1) = log 2 and so, putting m = 1, P(B_n = 2) → log 2 as n → ∞. The joint convergence given in Theorem 3.1 makes it possible to condition on the value of Y(U E) to obtain conditional limit results for the sizes of the blocks involved in the final collision.
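With the geometric law q_m(t) = e^{−t}(1 − e^{−t})^{m−1} of the Yule process, the limiting probabilities P(Y(UE) = m) can be checked numerically; the alternating-sum evaluation below is our own, obtained by expanding (1 − e^{−UE})^{m−1} and using E[e^{−aUE}] = log(1 + a)/a:

```python
import math
import random

def p_limit(m):
    """Candidate limit P(Y(UE) = m): an alternating binomial sum of
    logarithms, derived from q_m(t) = e^{-t}(1 - e^{-t})^{m-1} and
    E[e^{-a U E}] = log(1 + a)/a."""
    return sum((-1) ** j * math.comb(m - 1, j) * math.log(j + 2) / (j + 1)
               for j in range(m))

# Monte Carlo cross-check: sample Y(U E) directly, using that Y(t) is
# geometric with success probability e^{-t}
rng = random.Random(0)
N, hits = 200_000, 0
for _ in range(N):
    t = rng.random() * rng.expovariate(1.0)      # t = U * E
    q = math.exp(-t)
    if q >= 1.0:
        y = 1
    else:
        v = 1.0 - rng.random()                   # uniform in (0, 1]
        y = 1 + int(math.log(v) / math.log(1.0 - q))
    hits += (y == 1)
est = hits / N                                   # should be close to log 2
```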

Remarks. (a)
We have already mentioned the connection between random recursive trees and uniform random permutations. In a permutation, M_n represents the length of a uniformly-chosen cycle. In fact, considerably stronger results hold on the distribution of the cycle lengths. Let K_n(t) be the number of cycles of length less than or equal to n^t. Then the functional central limit theorem of DeLaurentis and Pittel [6] states that

((K_n(t) − t log n)/(log n)^{1/2}, 0 ≤ t ≤ 1)

(a random element of D[0, 1]) converges weakly to Brownian motion on [0, 1]. (This theorem was extended by Hansen [10] to the case of a permutation with distribution given by the Ewens sampling formula i.e. a permutation generated by the (0, θ)-Chinese restaurant process. An alternative proof of her more general theorem, which is more in the style of this paper, was given by Donnelly, Kurtz and Tavaré [7]. See also Feng and Hoppe [8] for a path-level large deviations principle for the Ewens sampling formula.) (b) Take any rooted tree T, random or deterministic. Suppose that each edge e has a random variable λ_e attached to it, where the values λ_e are independent and identically distributed with a continuous distribution. Say that λ_e is a record if it is the largest value in the path from the root to e. Janson [11] shows that the distribution of the number of records in T is the same as that of the number of random cuts required to isolate the root. To see this, each time cut the edge with the largest λ_e amongst those remaining. Then by symmetry, we always cut a uniformly-chosen edge, and so we get the cutting procedure. Moreover, λ_e is a record if and only if its edge is cut; thus, there must be the same number of records as of cut edges. In the records setting, B_n has the distribution of the size of the maximal subtree of a random recursive tree on [n] containing the smallest record λ* and only edges e with λ_e < λ*.

Proposition 3.3. Let A_n be the time at which the last collision occurs in the Bolthausen-Sznitman coalescent restricted to [n]. Then, as n → ∞,

A_n − log log n →d − log E,

where E has a standard exponential distribution.
Proof. This is a corollary of Lemma 3.8 in Section 3.4 (to connect the notation, we have A_n = E*).
This proposition is straightforward, but it nicely illustrates the idea that the Bolthausen-Sznitman coalescent on N "only just fails" to come down from infinity (see Pitman [21] and Schweinsberg [23]).
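Janson's coupling in Remark (b) above is a statement that holds realisation by realisation, which makes it easy to verify by simulation: if one always cuts the surviving edge with the largest attached value, the edges cut are exactly the record edges. A sketch of ours:

```python
import random

def is_ancestor(parent, a, u):
    """True if a == u or a is an ancestor of u in the original tree."""
    while u != 1:
        if u == a:
            return True
        u = parent[u]
    return u == a

def cuts_choosing_max(parent, lam, n):
    """Repeatedly cut the surviving edge with the largest value lam
    (the edge above vertex v carries lam[v]), discarding the subtree
    below it, until the root is isolated; return the number of cuts."""
    alive = set(range(1, n + 1))
    cuts = 0
    while len(alive) > 1:
        v = max((u for u in alive if u != 1), key=lam.get)
        alive -= {u for u in alive if is_ancestor(parent, v, u)}
        cuts += 1
    return cuts

def records(parent, lam, n):
    """Number of edges whose value is largest on the path from the root
    down to that edge (ties have probability zero for random floats)."""
    count = 0
    for v in range(2, n + 1):
        u, best = parent[v], lam[v]
        while u != 1 and lam[u] < best:
            u = parent[u]
        count += (u == 1)
    return count

rng = random.Random(5)
parent = {k: rng.randrange(1, k) for k in range(2, 40)}
lam = {k: rng.random() for k in range(2, 40)}
```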

The coalescent in reversed time
We can regard Theorem 3.1 as a statement about the first event in a time-reversed version of the Bolthausen-Sznitman coalescent. More precisely, let X = (X_0, X_1, . . .) be the discrete-time Markov chain describing the evolution of the number of blocks in the coalescent restricted to [n], so that X_i is the number of blocks after i collisions, for i ≥ 0. (We can do this because the Bolthausen-Sznitman coalescent is homogeneous, in the sense that the sizes of the blocks do not influence the dynamics of the process.) X has transition probabilities

p_{n,n−k+1} = n/((n − 1)k(k − 1)), 2 ≤ k ≤ n,

and is clearly decreasing. Started from X_0 = n, for any finite n, the chain hits 1 almost surely. Let X̂ be the time-reversal of X, where X̂ starts from 1. We show that the reversed-time transition probabilities

p̂_{l,m} := lim_{n→∞} P(X enters l from m | X_0 = n and X hits l) = lim_{n→∞} P(X̂ jumps from l to m | X̂_0 = 1 and X̂ hits l and n)   (3.3)

exist and can be made explicit.
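Since a given k-set of the b current blocks coalesces at rate λ_{b,k} = (k−2)!(b−k)!/(b−1)! and the total collision rate is b − 1, the embedded jump chain moves from b to b − k + 1 with probability b/((b−1)k(k−1)). A quick sketch (names ours) checks the normalisation exactly and simulates X:

```python
import random
from fractions import Fraction

def step_probs(b):
    """P(X jumps from b to b - k + 1) = b / ((b-1) k (k-1)), k = 2, ..., b."""
    return {k: Fraction(b, (b - 1) * k * (k - 1)) for k in range(2, b + 1)}

# exact normalisation: the sum telescopes, since 1/(k(k-1)) = 1/(k-1) - 1/k
assert sum(step_probs(50).values()) == 1

def block_chain(n, seed=None):
    """Path of the number-of-blocks chain X from n down to 1."""
    rng = random.Random(seed)
    path = [n]
    while path[-1] > 1:
        b = path[-1]
        r, acc = rng.random(), 0.0
        nxt = 1                      # k = b (merge everything) as round-off guard
        for k, p in step_probs(b).items():
            acc += float(p)
            if r < acc:
                nxt = b - k + 1
                break
        path.append(nxt)
    return path

path = block_chain(100, seed=2)    # len(path) - 1 collisions, i.e. J_100
```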
Theorem 3.5. The limits p̂_{l,m} defined above exist for all m > l ≥ 1 and can be given explicitly; in particular, for m ≥ 2,

p̂_{1,m} = ∑_{j=0}^{m−2} (−1)^j \binom{m−2}{j} log(j+2)/(j+1).   (3.4)

The l = 1 case of this theorem is a direct consequence of Theorem 3.1 and Proposition 3.2.
In principle, one could extend the probabilistic proof that we give of this case to study individual cases of higher l; however, we have not been able to find such a proof for the general case, and so the self-contained proof given in Section 3.5 is via the analysis of certain recurrence formulae.

Number theoretic results
Before proceeding to the proofs of Theorems 3.1 and 3.5, we observe in passing that Theorem 3.5 entails various infinite product expansions of powers of e, which seem not to have been previously known.
For example, the r = 1 case put more clearly reads

e = (2/1)^{1/1} (2^2/(1·3))^{1/2} (2^3·4/(1·3^3))^{1/3} (2^4·4^4/(1·3^6·5))^{1/4} · · ·

and the r = 2 case gives an analogous product expansion. The r = 1 case of this theorem was first proved by Guillera and Sondow (cited in Sondow [25]) and bears a certain resemblance to Pippenger's product [20] for e.

Proof. Consider first the r = 1 case. In Theorem 3.5, we find the distribution (p̂_{1,n+1}, n ≥ 1). Summing over n, we obtain

∑_{n=1}^∞ ∑_{j=0}^{n−1} (−1)^j \binom{n−1}{j} log(j+2)/(j+1) = 1.

Hence, collecting together the coefficients of each logarithm and taking the exponential of both sides gives the result.

Proof of Theorem 3.1
It is convenient to think of the formation of the random recursive tree as occurring in continuous time, according to a linear birth process with immigration. (This is the approach used by Donnelly, Kurtz and Tavaré [7] when dealing with random permutations.) More specifically, we imagine that the root is present at time 0 and that it has a special rôle. Children of the root "immigrate" at rate 1 and each immigrant initiates a family which evolves as a standard Yule process. Let R_i be the time of the ith immigration and let P(t) be the number of immigrations up to time t (this is just a standard Poisson process). Let Y_i(t) be the size of the ith family at time R_i + t and let T_n be the time at which the total population present (including the root) reaches n. (Of course, the total population process is itself just a standard Yule process, but in what follows we find it helpful to distinguish the root.) Finally, let C_n = P(T_n), the number of immigrations up to (and including) time T_n. Note that P and Y_1, Y_2, . . . are independent.
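The growth phase can be simulated directly from this description; note that the total population (root plus families) branches at total rate 1 + ∑_i Y_i = population size, i.e. it is itself a standard Yule process. A sketch (names ours):

```python
import random

def grow_to_n(n, seed=None):
    """Children of the root immigrate at rate 1; each immigrant founds a
    family in which every individual branches at rate 1. Stop when the
    total population (including the root) reaches n. Returns the stopping
    time T_n, the family sizes, and the immigration times R_i."""
    rng = random.Random(seed)
    t, families, arrivals = 0.0, [], []
    while 1 + sum(families) < n:
        k = sum(families)                    # total event rate = 1 + k
        t += rng.expovariate(1.0 + k)
        if rng.random() * (1 + k) < 1:
            families.append(1)               # new immigrant: a new family
            arrivals.append(t)
        else:
            # a uniform individual branches, i.e. a size-biased family grows
            i = rng.choices(range(len(families)), weights=families)[0]
            families[i] += 1
    return t, families, arrivals
```

Since the total population is a standard Yule process started from one individual, E[T_n] = ∑_{m=1}^{n−1} 1/m, which the simulation reproduces.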
The theorem concerns what happens when we cut the tree so much that all of the children of the root become severed. We will think of the cutting as occurring in continuous time (although subsequent to the formation of the tree). To this end, we associate an independent standard exponential lifetime to each individual, which gives the time at which the edge above it is cut.
For each i ≥ 1 we define a family of processes, {(Y_i^{thinned(u)}(s), s ≥ 0), u ≥ 0}, in the following way. The tree representing the ith Yule process at time s has size Y_i(s). From this tree, we remove all the edges whose cutting time is less than or equal to u (and all the subtrees below such edges). Let Y_i^{thinned(u)}(s) be the size of the tree that remains. Note that for any u, the process (Y_i^{thinned(u)}(s), s ≥ 0) has the distribution of (Y(e^{−u}s), s ≥ 0), where Y is a standard Yule process.
For 1 ≤ i ≤ C_n, let E_i be the exponential lifetime associated with the ith child of the root. The last child to be cut is severed at time

E* := max_{1≤i≤C_n} E_i

and is the C*th child, where C* := arg max_{1≤i≤C_n} E_i. Thus E* is the time at which the root is isolated, and is therefore the time of the last event in the coalescent process. Let R* = R_{C*}, the time at which the C*th child arrived in the formation of the tree.
In this framework, we have that M_n = Y_{C*}(T_n − R*), the total family size (in the unthinned tree) of individual C* (which is the last child of the root to be cut). Similarly, B_n − 1 = Y_{C*}^{thinned(E*−)}(T_n − R*), the number of nodes still remaining in the (thinned) family of C* at the moment when it is cut.

Now define

t_n^− = log n − log log n, t_n^+ = log n + log log n,

and let

E*^− = max{E_i : R_i ≤ t_n^−}, C*^− = arg max{E_i : R_i ≤ t_n^−}, R*^− = R_{C*^−},

and set G_n = {T_n ∈ (t_n^−, t_n^+)} ∩ {C* = C*^−}.
On the "good set" G_n, we have E* = E*^− and R* = R*^− almost surely. Also, R*^− ∼ U[0, t_n^−] and is independent of E*^− and of the Yule processes Y_i for i ≥ 1. We will prove the following three lemmas.

Lemma 3.7. As n → ∞,

log Y_{C*^−}(t_n^− − R*^−)/(t_n^− − R*^−) → 1 and log Y_{C*^−}(t_n^+ − R*^−)/(t_n^+ − R*^−) → 1

in probability, and R*^−/t_n^+ converges in distribution to U, where U is uniform on [0, 1].

Lemma 3.8. For each n, t_n^− e^{−E*^−} has the same distribution as E ∧ t_n^−, where E is a standard exponential random variable; moreover, P(C* = C*^− | T_n ∈ (t_n^−, t_n^+)) → 1 and Y_{C*^−}^{thinned(E*^−)}(T_n − R*^−) →d Y(U E) as n → ∞.

Lemma 3.9. As n → ∞, P(G_n) → 1.
To show how the lemmas give the theorem, we proceed as follows. On G_n, we have

Y_{C*^−}(t_n^− − R*^−) ≤ M_n ≤ Y_{C*^−}(t_n^+ − R*^−) and B_n = 1 + Y_{C*^−}^{thinned(E*^−)}(T_n − R*^−).

So to show that (log M_n/log n, B_n) →d (U, 1 + Y(U E)), it suffices to show that

(log Y_{C*^−}(t_n^− − R*^−)/log n, log Y_{C*^−}(t_n^+ − R*^−)/log n, Y_{C*^−}^{thinned(E*^−)}(T_n − R*^−)) →d (U, U, Y(U E)),   (3.7)

where U is uniform on [0, 1], E is standard exponential, Y is a standard Yule process, and U, E and Y_1, Y_2, . . . are independent.
Using Lemma 3.7 and the fact that t_n^−/log n → 1, we can rewrite the left-hand side of (3.7) as

(A_n (t_n^− − R*^−)/t_n^−, A'_n (t_n^+ − R*^−)/t_n^+, Y_{C*^−}^{thinned(E*^−)}(T_n − R*^−)),

where A_n →d 1 and A'_n →d 1 as n → ∞. Thus, using Slutsky's Lemma, it is enough to show that

((t_n^− − R*^−)/t_n^−, (t_n^+ − R*^−)/t_n^+, Y_{C*^−}^{thinned(E*^−)}(T_n − R*^−)) →d (U, U, Y(U E)).   (3.8)

Recall that we can replace the process (Y_{C*^−}^{thinned(u)}(s), s ≥ 0) by (Y(e^{−u}s), s ≥ 0). Since R*^− ∼ U[0, t_n^−], we can put t_n^− − R*^− = t_n^− U; by Lemma 3.8, we can put t_n^− e^{−E*^−} = E ∧ t_n^−. Then the left-hand side of (3.8) converges to (U, U, Y(U(E ∧ t_n^−))) up to factors tending to 1. Since t_n^− → ∞, this indeed converges in distribution to (U, U, Y(U E)), as desired.
It remains to prove the three lemmas.
Proof of Lemma 3.7. For a standard Yule process Y, P(Y(t) > m) = (1 − e^{−t})^m for m ≥ 0, and so, for any 0 < ε < 1,

P(e^{(1−ε)t} < Y(t) ≤ e^{(1+ε)t}) = (1 − e^{−t})^{⌊e^{(1−ε)t}⌋} − (1 − e^{−t})^{⌊e^{(1+ε)t}⌋},

which converges to 1 as t → ∞. Thus,

log Y(t)/t → 1 in probability

as t → ∞. Hence, if V_n is a sequence of random variables, independent of Y, such that V_n →d ∞ as n → ∞ (in the sense that P(V_n > x) → 1 as n → ∞, for all x), then also

log Y(V_n)/V_n → 1 in probability

as n → ∞.
The first two parts of the lemma follow, since R*^− ∼ U[0, t_n^−], and so t_n^− − R*^− and t_n^+ − R*^− both converge to infinity in distribution.
The last part is immediate from the facts that R*^− ∼ U[0, t_n^−] and that t_n^−/t_n^+ → 1.
Proof of Lemma 3.8. Think of P now as a marked Poisson process: points arrive at rate 1 and the ith point carries the mark E_i, where the E_i are independent and identically distributed with standard exponential distribution. So, for s > 0, points with mark greater than or equal to s arrive as a Poisson process of rate e^{−s}. The quantity E*^− is the largest mark out of all those points arriving in the interval [0, t_n^−]. So

P(E*^− < s) = exp(−t_n^− e^{−s}).

So for x < t_n^−, we have

P(t_n^− e^{−E*^−} > x) = P(E*^− < log(t_n^−/x)) = e^{−x}.

Also, E*^− ≥ 0 with probability 1, so t_n^− e^{−E*^−} ≤ t_n^− with probability 1. Thus, indeed, t_n^− e^{−E*^−} has the distribution of E ∧ t_n^−, as required.
Suppose now that T_n ∈ (t_n^−, t_n^+) but that C* ≠ C*^−. This implies the event that, out of all the children of the root arriving in [0, t_n^+], the one with the largest exponential lifetime arrived in (t_n^−, t_n^+]. The probability of this event is simply the ratio (t_n^+ − t_n^−)/t_n^+, which converges to 0 as n → ∞. Thus, we have

P(C* = C*^− | T_n ∈ (t_n^−, t_n^+)) → 1

as n → ∞.
Finally, we need to show that

Y_{C*^−}^{thinned(E*^−)}(T_n − R*^−) →d Y(U E) as n → ∞.

As observed earlier, we can replace the process Y_{C*^−}^{thinned(u)}(s) by Y(e^{−u}s) and so we need to show that

Y(e^{−E*^−}(T_n − R*^−)) →d Y(U E).

As before, we can put t_n^− − R*^− = t_n^− U and e^{−E*^−} = (E ∧ t_n^−)/t_n^− to rewrite this as

Y((E ∧ t_n^−) U κ_n), where κ_n = (T_n − R*^−)/(t_n^− − R*^−).

On G_n we have κ_n ≥ 1, and κ_n →d 1 as n → ∞ (by Lemma 3.7), so it will be enough to show that P(a jump of Y occurs in the interval (U E, (1 + ε)U E)) → 0, as ε ↓ 0. This is easily seen to be true (for example, by integrating over U and E), since E[Y(t)] is continuous in t.

Proof of Theorem 3.5
The self-contained proof of Theorem 3.5 which we give here mostly consists of the analysis of certain recurrence formulas via generating functions (see Wilf [28] for an excellent introduction to this method). We will start by proving the l = 1 case and will then see that this enables us to prove the rest of the theorem. Let

x_n^{(m)} = P(X enters 1 from m | X_0 = n)

for m ≥ 2 (we do not need to condition on X ever entering 1 as this occurs almost surely).
We can re-phrase the l = 1 case of Theorem 3.5 as follows.

Theorem 3.10. For each m ≥ 2,

x_n^{(m)} → ∑_{j=0}^{m−2} (−1)^j \binom{m−2}{j} log(j+2)/(j+1) as n → ∞.

The generating function of the sequence (x_n^{(m)}, n ≥ m) we will use is defined as

X^{(m)}(t) = ∑_{n=m}^∞ x_n^{(m)} t^n

for t ∈ [0, 1).
The key to proving Theorem 3.10 is the following result on the convergence of power series.
Proposition 3.11. Suppose that f(t) = ∑_{n=1}^∞ f_n t^n is a power series with positive real coefficients such that

|f_n − f_{n−1}| ≤ C/n   (3.15)

for some constant C ∈ (0, ∞) and all n ≥ 2. Suppose that lim_{t↑1} (1 − t)f(t) = f*. Then lim_{n→∞} f_n = f*.

Proof. Note that

(1 − t)f(t) = ∑_{n=1}^∞ (f_n − f_{n−1}) t^n,

where we take f_0 = 0 for convenience. Then by Littlewood's Tauberian theorem (see, for example, Theorem 1.1 of Chapter 2 in van de Lune [27]),

∑_{n=1}^∞ (f_n − f_{n−1}) = f*.

This sum telescopes and so we have lim_{n→∞} f_n = f*, as required.
In view of Proposition 3.11, we would like to prove a condition like (3.15) for the sequence (x_n^{(m)}, n ≥ m). It turns out to be easier to prove this in a more general situation. Let the sequence (x_n, n ≥ m) be defined as follows for fixed m ≥ 2: for some x_m ∈ (0, 1],

x_n = ∑_{k=2}^{n−m+1} (a_{n,k}/b_n) x_{n−k+1}, n ≥ m + 1,   (3.17)

where the non-negative coefficients (a_{n,k})_{2≤k≤n} satisfy

a_{n,k} = ((n−k+1)/(n+1)) a_{n+1,k} + ((k+1)/(n+1)) a_{n+1,k+1}, 2 ≤ k ≤ n,   (3.18)

and

b_n = a_{n,2} + a_{n,3} + · · · + a_{n,n}, n ≥ 2.   (3.19)

Conditions (3.18) and (3.19) are exactly those satisfied by the rates in a general Λ-coalescent and (3.17) is the recurrence that arises for its reversed-time transition probabilities. In Pitman's notation of [21], a_{n,k} = \binom{n}{k} λ_{n,k} and b_n = ∑_{k=2}^n \binom{n}{k} λ_{n,k}, where λ_{n,k} = ∫_0^1 x^{k−2}(1−x)^{n−k} Λ(dx) and Λ is any finite measure. In the Bolthausen-Sznitman case, a_{n,k} = n/(k(k−1)) and b_n = n − 1.
Lemma 3.14. For n ≥ m + 1, we have

|x_n − x_{n−1}| ≤ m/n.   (3.21)

Proof. We proceed by induction. For n = m + 1, by (3.17) we have

x_{m+1} = (a_{m+1,2}/b_{m+1}) x_m,

and one checks directly that |x_{m+1} − x_m| ≤ m/(m+1). Now suppose that (3.21) holds up to n − 1. From (3.20) and this induction hypothesis we have

|x_n − x_{n−1}| ≤ (1/b_n) (∑_{k=2}^{n−m} ((n−k)/n) a_{n,k} · m/(n−k) + ((m−1)/n) a_{n,n−m+1})
= (1/b_n) ((m/n)(a_{n,2} + · · · + a_{n,n−m}) + ((m−1)/n) a_{n,n−m+1})
≤ m/n.

Hence result.
Thus, we know that for the sequences (x_n^{(m)}, n ≥ m) associated with the Bolthausen-Sznitman coalescent, |x_n^{(m)} − x_{n−1}^{(m)}| ≤ m/n for all n ≥ m + 1, so that condition (3.15) of Proposition 3.11 is satisfied. As a final preparation for the proof of Theorem 3.10, we mention a simple identity (Proposition 3.15), which is proved by induction on m.
We are now ready to prove Theorem 3.10.
Proof of Theorem 3.10. Recall that for m ≥ 2 and t ∈ [0, 1),

X^{(m)}(t) = ∑_{n=m}^∞ x_n^{(m)} t^n.

Applying the recurrence (3.17) with a_{n,k} = n/(k(k−1)) and b_n = n − 1 yields a differential equation for X^{(m)}, from which one computes lim_{t↑1} (1 − t)X^{(m)}(t); by Lemma 3.14 and Proposition 3.11, this limit is lim_{n→∞} x_n^{(m)}, which gives the theorem.

It remains only to show how the rest of Theorem 3.5 follows from Theorem 3.10. We first note that

P(X enters l from m | X_0 = n and X hits l) = P(X enters l from m | X_0 = n)/P(X hits l | X_0 = n).
Let y_n^{(l,m)} = P(X enters l from m | X_0 = n) and y_n^{(l)} = P(X hits l | X_0 = n). It is clear from our argument that p̂_{l,m} must be non-negative for all m > l ≥ 1. However, by Fatou's lemma, we know only that ∑_{m=l+1}^∞ p̂_{l,m} ≤ 1.
We would like to show that (p̂_{l,m}, m ≥ l + 1) is really a distribution (i.e. that this inequality is in fact an equality). The only thing that can go wrong is if mass "escapes to infinity". To show that this does not happen, we first make clear the probability space on which we are working. For each state n, sample d_n independently from the distribution (p_{n,k}, 1 ≤ k ≤ n − 1). Then for any n, the path of X started with X_0 = n is n, d_n, d_{d_n}, . . .. Thus, we have coupled the paths of the Markov chain X started from all different possible states. Now, for fixed l ≥ 1, one considers suitable events in this coupling in order to rule out any escape of mass.

Remark. It does not appear that the methods of this section can be extended to a more general Λ-coalescent, despite the encouraging fact that Lemma 3.14 holds for all Λ-coalescents. The generating function methods used rely crucially on special structure of the Bolthausen-Sznitman case, in particular the fact that p_{n,n−k+1} decomposes neatly into two simple factors, one involving only n and the other involving only k.