External branch lengths of $\Lambda$-coalescents without a dust component

$\Lambda$-coalescents model genealogies of samples of individuals from a large population, and the individuals' time durations up to some common ancestor are given by the external branch lengths of the family tree. We consider typical external branches under the minimal assumption that the coalescent has no dust component, and maximal external branches under further regularity assumptions. As it turns out, the crucial characteristic is the coalescent's rate of decrease $\mu(b)$, $b\geq 2$. The magnitude of a typical external branch is asymptotically given by $n/\mu(n)$, where $n$ denotes the sample size. This result and also asymptotic independence of several typical external lengths hold in full generality, while convergence in distribution of the scaled external lengths requires that $\mu(n)$ is regularly varying at infinity. For the maximal lengths, we distinguish two cases. First, we analyze a class of $\Lambda$-coalescents coming down from infinity and with regularly varying $\mu$. Here the scaled external lengths behave as the maximal values of $n$ i.i.d. random variables, and their limit is captured by a Poisson point process on the positive real line. Second, we turn to the Bolthausen-Sznitman coalescent, where the picture changes. Now the limiting behavior of the normalized external lengths is given by a Cox point process, which can be expressed by a randomly shifted Poisson point process.


1 Introduction and main results
In population genetics, the family tree of a sample from a large population is nowadays modeled by a coalescent. The prominent Kingman coalescent [21] has found widespread application in biology. More recently, the Bolthausen-Sznitman coalescent, originating from statistical mechanics [3], has gained importance in the analysis of genealogies of populations undergoing selection [4,7,25,31]. Unlike Kingman's coalescent, the Bolthausen-Sznitman coalescent allows multiple mergers. The larger class of Beta-coalescents has found interest, e.g., in the study of marine species [33,26]. All these instances are covered by the notion of Λ-coalescents as introduced by Pitman [27] and Sagitov [29] in 1999. Today the general properties of this broad class are becoming more transparent [19,11].
In this paper, we deal with the lengths of external branches of Λ-coalescents under the minimal assumption that the coalescent has no dust component, which applies to all cases mentioned above. We shall treat external branches of typical and, under additional regularity assumptions, of maximal length. For the total external length see the publications [23,16,6,17,10].
In this paper, the crucial characteristic of Λ-coalescents is their rate of decrease $\mu = (\mu(b))_{b \ge 2}$, defined as
$$\mu(b) := \sum_{k=2}^{b} (k-1) \binom{b}{k} \lambda_{b,k}, \qquad \lambda_{b,k} := \int_{[0,1]} p^{k-2}(1-p)^{b-k}\,\Lambda(dp).$$
This notion captures that a merger of k blocks corresponds to a decline of k − 1 blocks. Its importance has also become apparent in other publications [30,22,11]. In particular, the absence of a dust component may be expressed in these terms. Originally characterized by the condition $\int_{[0,1]} p^{-1}\,\Lambda(dp) = \infty$ (see [27]), it can be equivalently specified by the requirement $\mu(n)/n \to \infty$ as n → ∞ (see Lemma 1 (iii) of [11]).
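As a numerical illustration (ours, not part of the paper), the rate of decrease of a Beta(2 − α, α)-coalescent can be evaluated from the standard rates $\lambda_{b,k} = B(k-\alpha,\, b-k+\alpha)/B(2-\alpha, \alpha)$ via $\mu(b) = \sum_{k=2}^{b}(k-1)\binom{b}{k}\lambda_{b,k}$; α = 1 is the Bolthausen-Sznitman coalescent, where the sum evaluates to $b(H_b - 1)$ with $H_b$ the b-th harmonic number:

```python
from math import lgamma, exp

def log_binom(n, k):
    # log of the binomial coefficient C(n, k) via log-gamma
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def mu(b, alpha):
    """Rate of decrease mu(b) = sum_{k=2}^b (k-1) C(b,k) lambda_{b,k}
    for the Beta(2-alpha, alpha)-coalescent, 1 <= alpha < 2
    (alpha = 1 is Bolthausen-Sznitman).  Computed in log space to
    avoid overflow of the binomial coefficients."""
    log_B0 = lgamma(2 - alpha) + lgamma(alpha) - lgamma(2)  # log Beta(2-a, a)
    total = 0.0
    for k in range(2, b + 1):
        # log lambda_{b,k} = log Beta(k - alpha, b - k + alpha) - log B0;
        # the two Beta arguments sum to b, hence the single lgamma(b) term
        log_lam = lgamma(k - alpha) + lgamma(b - k + alpha) - lgamma(b) - log_B0
        total += (k - 1) * exp(log_binom(b, k) + log_lam)
    return total
```

For 1 < α < 2, one can check numerically that µ(2n)/µ(n) approaches $2^\alpha$, in line with regular variation with exponent α.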
An n-coalescent can be thought of as a random rooted tree with n labeled leaves representing the individuals of a sample. Its branches specify ancestral lineages of the individuals or their ancestors. The branch lengths give the time spans until the occurrence of new common ancestors.
Branches ending in a leaf are called external branches. If mutations under the infinite sites model [20] are added to these considerations, the importance of external branches is revealed: mutations on external branches affect only a single individual of the sample.
Longer external branches thereby result in an excess of singleton polymorphisms [35] and are known to be characteristic of trees with multiple mergers [12]; e.g., external branch lengths have been used to discriminate between different coalescents in the context of HIV trees [36] (see also [34]).
Now we turn to the main results of this paper. For 1 ≤ i ≤ n, the length $T^n_i$ of the external branch ending in leaf i within an n-coalescent is defined as the time until the lineage of individual i first takes part in a merging event. In the first theorem, we consider the length $T^n$ of a randomly chosen external branch. By exchangeability, $T^n$ is equal in distribution to $T^n_i$ for 1 ≤ i ≤ n. The result clarifies the magnitude of $T^n$ in full generality as n → ∞.
Among other things, this theorem excludes the possibility that $T^n$ converges in probability to a positive constant. In [18] the order of $T^n$ was interpreted as the duration of a generation, namely the time at which a specific lineage out of the n present ones takes part in a merging event. In that paper, only Beta(2 − α, α)-coalescents with 1 < α < 2 were considered, and the duration was given as $n^{1-\alpha}$. Our theorem shows that for this heuristic, n/µ(n) is a suitable choice in general.
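For Kingman's coalescent, the rate of decrease is $\mu(n) = \binom{n}{2}$, so $n/\mu(n) = 2/(n-1)$: a typical external branch is of order 1/n, and the rescaled length $nT^n$ is known to converge to the law with tail $4/(2+t)^2$. The following Monte Carlo sketch (our own illustration, not taken from the paper) checks this tail at t = 2:

```python
import random

def kingman_external_length(n, rng):
    """Time until the lineage of a fixed leaf first merges in Kingman's
    n-coalescent: with b blocks present, some pair merges at rate
    b(b-1)/2, and the tracked lineage takes part with probability 2/b."""
    t, b = 0.0, n
    while b > 1:
        t += rng.expovariate(b * (b - 1) / 2.0)  # waiting time at b blocks
        if rng.random() < 2.0 / b:               # our leaf joins the merger
            return t
        b -= 1
    return t

rng = random.Random(1)
n, runs = 200, 20000
frac = sum(n * kingman_external_length(n, rng) > 2.0 for _ in range(runs)) / runs
# limit tail: P(n * T^n > 2) -> 4 / (2 + 2)^2 = 0.25
```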
Asymptotic independence of the external branch lengths holds as well in full generality for dustless coalescents.
Theorem 1.2. For a Λ-coalescent without a dust component, for fixed k ∈ N and for any sequence of numbers $t^n_1, \dots, t^n_k \ge 0$, n ≥ 2, we have In light of the waiting times that the different external branches have in common, this might be an unexpected result. We point out that, for each coalescent with a dust component, Möhle showed that this asymptotic independence does not hold (see equation (11) of [23]).
In order to achieve convergence in distribution of the scaled lengths, stronger assumptions are required on the rate of decrease, namely that µ is a regularly varying sequence. A characterization of this property is given in Proposition 2.2 below. Let δ 0 denote the Dirac measure at zero.
Theorem 1.3. For a Λ-coalescent without a dust component, there is a sequence $(\gamma_n)_{n\in\mathbb{N}}$ such that $\gamma_n T^n$ converges in distribution to a probability measure different from $\delta_0$ as n → ∞ if and only if µ is regularly varying at infinity. In this case, its exponent α of regular variation fulfills 1 ≤ α ≤ 2, and we have as n → ∞.
In particular, this theorem contains the special cases known from the literature. Blum [14] showed asymptotic exponentiality of the external branch length. This result was somewhat generalized by Yuan [37]. A class of coalescents containing the Beta(2 − α, α)-coalescent with 1 < α < 2 was analyzed by Dhersin et al. [8].
Combining Theorems 1.2 and 1.3 yields the following corollary.

Corollary 1.4. Suppose that the Λ-coalescent lacks a dust component and has a regularly varying rate of decrease µ with exponent α ∈ [1, 2]. Then for fixed k ∈ N, we have as n → ∞, where $T_1, \dots, T_k$ are i.i.d. random variables, each having the density
$$f(t) = \alpha\,\bigl(1+(\alpha-1)t\bigr)^{-(2\alpha-1)/(\alpha-1)}, \qquad t \ge 0, \tag{1.1}$$
for 1 < α ≤ 2, and a standard exponential distribution for α = 1.
For Beta-coalescents, this corollary reads as follows.

Example. For k ∈ N, let $T_1, \dots, T_k$ be the i.i.d. random variables from Corollary 1.4.
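The limit law of Corollary 1.4 can be sampled exactly by inversion. The sketch below is our own illustration and assumes that the density in (1.1) has the form $f(t) = \alpha(1+(\alpha-1)t)^{-(2\alpha-1)/(\alpha-1)}$, $t \ge 0$ — an assumed form, chosen to be consistent with a mean of one and with the standard exponential limit as α → 1. Its survival function $(1+(\alpha-1)t)^{-\alpha/(\alpha-1)}$ then inverts in closed form:

```python
import random

def sample_T(alpha, rng):
    """Inverse-CDF sampler for the assumed limit density
    f(t) = alpha (1 + (alpha-1) t)^(-(2 alpha - 1)/(alpha - 1)), t >= 0,
    with survival function (1 + (alpha-1) t)^(-alpha/(alpha-1)),
    1 < alpha <= 2.  (The exact form of (1.1) is an assumption here.)"""
    u = 1.0 - rng.random()   # uniform on (0, 1], avoids u = 0
    return (u ** (-(alpha - 1.0) / alpha) - 1.0) / (alpha - 1.0)

rng = random.Random(7)
alpha = 1.5
xs = [sample_T(alpha, rng) for _ in range(200000)]
mean = sum(xs) / len(xs)   # the assumed density has mean one
```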
In the second part of this paper, we change perspective and examine the external branch lengths ordered by size, downwards from the maximal value. In this context, an approach via a point process description is appropriate. Here we consider Λ-coalescents having a regularly varying rate of decrease µ, in addition to the absence of a dust component. It turns out that two cases have to be distinguished.
First, we treat the case of µ being regularly varying with exponent α ∈ (1, 2] (implying that the coalescent comes down from infinity). We introduce the sequence (s n ) n≥2 given by Note that µ(n)/n is a strictly increasing and, in the dustless case, diverging sequence (see Lemma 2.1 (ii) and (iv) below), which directly transfers to the sequence (s n ) n≥2 . Also note in view of as n → ∞.
Theorem 1.5. Assume that the Λ-coalescent has a regularly varying rate of decrease µ with exponent α ∈ (1, 2]. Then, as n → ∞, the point process $\Phi_n$ converges in distribution to a Poisson point process Φ on (0, ∞) with intensity measure $\varphi(x)\,dx$.

Note that $\int_0^1 \varphi(x)\,dx = \infty$, which means that the points of the limit Φ accumulate at the origin. On the other hand, we have $\int_1^\infty \varphi(x)\,dx < \infty$, saying that the points can be arranged in decreasing order. Thus, the theorem focuses on the maximal external lengths, showing that the longest external branches differ from a typical one in order of magnitude by the factor $s_n$ (see Corollary 1.4). For Kingman's coalescent, this result was obtained by Janson and Kersting [16] using a different method.

This heuristic fails for the Bolthausen-Sznitman coalescent, which we address now. For n ∈ N, define the quantity
$$t_n := \log\log n - \log\log\log n + \frac{\log\log\log n}{\log\log n},$$
where we put $t_n := 0$ if the right-hand side is negative or not well-defined. Here we consider the point processes $\Psi_n$ on the whole real line given by for Borel sets B ⊂ R. As before, we focus on the maximal values of $\Psi_n$.
Theorem 1.6. For the Bolthausen-Sznitman coalescent, the point process $\Psi_n$ converges in distribution as n → ∞ to a Cox point process Ψ on R directed by the random measure $E\,e^{-x}\,dx$, where E denotes a standard exponential random variable.
Observe that this random density may be rewritten as $E\,e^{-x} = e^{-(x-\log E)}$. This means that the limiting point process can also be considered as a Poisson point process with intensity measure $e^{-x}\,dx$, shifted by the independent amount log E. This alternative representation will be used in the theorem's proof (see Theorem 8.1 below). Recall that G := − log E has a standard Gumbel distribution.
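The equivalence of the two descriptions is an instance of the mapping theorem for Poisson processes; a short check (our sketch, taking $E e^{-x}\,dx$ as the directing measure):

```latex
% Conditionally on E, the shift x -> x + log(E) maps a Poisson point
% process with intensity e^{-x} dx to a Poisson point process whose
% intensity is the push-forward of e^{-x} dx under this shift:
\[
  e^{-(x - \log E)}\,dx \;=\; E\, e^{-x}\,dx .
\]
% Unconditionally, the shifted process is therefore a Cox process
% directed by the random measure E e^{-x} dx; here G := -log E is
% standard Gumbel distributed.
```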
We point out that the limiting point process Ψ no longer coincides with the limiting Poisson point process as obtained for the maximal values of n independent exponential random variables.
The same turns out to be true for the scaling sequences. In order to explain these findings, let us consider the external branch with maximal length $T^n_1$. Then Theorem 1.6 implies that In particular, $T^n_1 \to \infty$ in probability. Hence, with this theorem we pass to the situation where very large mergers affect the maximal external lengths. Then circumstances change and new techniques are required. For this reason, we have to confine ourselves to the Bolthausen-Sznitman coalescent in the case of regularly varying µ with exponent α = 1.
It is interesting to note that an asymptotic shift by a Gumbel distributed variable also shows up in the absorption time $\tau_n$ (the moment of the most recent common ancestor) of the Bolthausen-Sznitman coalescent: $\tau_n - \log\log n$ converges in distribution to a standard Gumbel random variable (Goldschmidt and Martin [15]). However, this shift remains unscaled. Apparently, the two Gumbel distributed variables under consideration build up within different parts of the coalescent tree.
Before closing this introduction, we give some hints concerning the proofs. For the first three theorems, we make use of an asymptotic representation of the tail probabilities of the external branch lengths, which is given in Theorem 3.1 below. Remarkably, this representation involves solely the rate of decrease µ. This fact largely relies on different approximation formulas derived in [11].
The proofs of the last two theorems incorporate Corollary 1.4 as one ingredient. The idea is to implement stopping times $\rho_{c,n}$ with the property that at that moment a positive number of external branches is still extant, a number which remains bounded as n → ∞. To these remaining branches, the results of Corollary 1.4 are applied, taking the strong Markov property into account. More precisely, let $N_n = (N_n(t))_{t\ge 0}$ be the block counting process of the n-coalescent, where $N_n(t) := \#\Pi_n(t)$ states the number of lineages present at time t ≥ 0. For definiteness, we put $N_n(t) = 1$ for $t > \tau_n$. In the case of 1 < α ≤ 2, we set $\rho_{c,n} := \inf\{t \ge 0 : N_n(t) \le c s_n\}$ with some c > 0. Next, we split the external lengths $T^n_i$ into the times $\check T^n_i$ up to the moment $\rho_{c,n}$ and the residual times $\hat T^n_i$. Formally, we have $\check T^n_i := T^n_i \wedge \rho_{c,n}$ and $\hat T^n_i := T^n_i - \check T^n_i$. The approach for the Bolthausen-Sznitman coalescent is essentially the same. Here the role of $\rho_{c,n}$ is taken by $t_{c,n} \wedge \tau_n$, where $t_{c,n} := t_n - \frac{\log c}{\log\log n}$ for some c > 1. Thus, for the Bolthausen-Sznitman coalescent, the external lengths $T^n_i$ are split accordingly.

The paper is organized as follows: Section 2 summarizes several properties of the rate of decrease.
The fundamental asymptotic expression for the tail probabilities of the external branch lengths is developed in Section 3. Sections 4 and 5 contain the proofs of Theorems 1.1 to 1.3. In Section 6, we prepare the proofs of the remaining theorems by establishing a formula for the factorial moments of the number of external branches. Sections 7 and 8 contain the proofs of Theorems 1.5 and 1.6.

2 Properties of the rate of decrease
We now have a closer look at the rate of decrease µ introduced in the first section. Defining
$$\mu(x) := \int_{[0,1]} \bigl(xp - 1 + (1-p)^x\bigr)\, p^{-2}\, \Lambda(dp),$$
we extend µ to all real values x ≥ 1, where the integrand's value at p = 0 is understood to be its limit $x(x-1)/2$. The next lemma summarizes some required properties of µ.
Lemma 2.1. The rate of decrease and its derivatives have the following properties: (i) µ(x) has derivatives of any order with finite values, also at x = 1. Moreover, µ and µ′ are both non-negative and strictly increasing, while µ″ is a non-negative and decreasing function.
(iv) In the dustless case, which is a C ∞ -function for x > 0. Set Note that the second integral in the first is finite and non-negative just as its integrand. Then we have From these formulas our claim follows.
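For integer arguments, the extended µ coincides with the discrete rate of decrease, since $\sum_{k=2}^{b}(k-1)\binom{b}{k}p^k(1-p)^{b-k} = bp - 1 + (1-p)^b$. The following cross-check is our own and treats the Bolthausen-Sznitman case Λ = uniform on [0, 1], where the discrete sum has the closed form $b(H_b - 1)$ with $H_b$ the b-th harmonic number:

```python
def mu_integral(b, n_grid=100000):
    """mu(b) = int_0^1 (b p - 1 + (1-p)^b) p^(-2) dp for the
    Bolthausen-Sznitman coalescent (Lambda = uniform on [0, 1]),
    evaluated by the midpoint rule; the integrand stays bounded,
    tending to b(b-1)/2 as p -> 0."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        p = (i + 0.5) * h
        total += (b * p - 1.0 + (1.0 - p) ** b) / (p * p)
    return total * h

b = 15
harmonic = sum(1.0 / k for k in range(1, b + 1))
exact = b * (harmonic - 1.0)   # closed form of the discrete sum for this Lambda
approx = mu_integral(b)
```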
In order to characterize regular variation of µ, we introduce the function H. Note that H is a finite function.

Proposition 2.2. For a Λ-coalescent without a dust component, the following statements hold: µ is regularly varying at infinity if and only if H(u) is regularly varying at the origin.
Then µ has an exponent α ∈ [1, 2], and we have µ is regularly varying at infinity with some exponent α ∈ (1, 2) if and only if the function $\int_{(y,1]} p^{-2}\,\Lambda(dp)$ is regularly varying at the origin with an exponent α ∈ (1, 2). Then we have The last statement brings the regular variation of µ together with the notion of regularly varying Λ-coalescents as introduced in [11].
For the proof of this proposition, we apply the following characterization of regular variation.

3 The length of a random external branch
Recall that $T^n$ denotes the length of an external branch picked at random. The following result on its distribution function not only plays a decisive role in the proofs of Theorems 1.1 and 1.3 but is also of interest in its own right. It shows that the distribution of $T^n$ is primarily determined by the rate of decrease µ.
Theorem 3.1. For a Λ-coalescent without a dust component and a sequence (r n ) n∈N satisfying 1 < r n ≤ n for all n ∈ N, we have as n → ∞. Moreover, as n → ∞.
For the proof, we introduce some notation. Recall from the introduction that the continuous-time Markov process $N_n$ denotes the block counting process. For the embedded discrete-time Markov chain, we use the notation $X = (X_j)_{j\in\mathbb{N}_0}$, where $X_j$ denotes the number of branches after j merging events. In particular, we have $X_0 = n$, and we set $X_j = 1$ for $j \ge \tau_n$, where $\tau_n$ here denotes the total number of merging events. (For convenience, we suppress n in the notation of X.) The waiting time of the process $N_n$ in state $X_j$ is denoted by $W_j$ for $0 \le j \le \tau_n - 1$.
The number of merging events until the external branch ending in leaf i ∈ {1, . . . , n} coalesces is denoted by $\zeta^n_i$. Similarly, $\zeta_n$ denotes the corresponding number for a random external branch with length $T^n$.
Proof of Theorem 3.1. For later purposes, we show the stronger statement as n → ∞. It implies (3.1) by taking expectations and using dominated convergence. In order to prove (3.3), note that, by the standard subsubsequence argument and the metrizability of convergence in probability, we can assume that $r_n/n$ converges to some value q with 0 ≤ q ≤ 1. We distinguish three different cases of asymptotic behavior of the sequence $r_n/n$: (a) We begin with the case $r_n \sim qn$ as n → ∞, where 0 < q < 1. Then there exist $q_1, q_2 \in (0, 1)$ such that $q_1 n \le r_n \le q_2 n$ for all but finitely many n ∈ N.
From Lemma 4 of [11], we know that as n → ∞.
Now we have a closer look at the stopping times and their analogues From Lemma 2.1 (i) and (iii), we know that the function µ(x) is increasing in x and that x/µ(x) converges in the dustless case to 0 as x → ∞. In view of r n ≥ q 1 n, we therefore have Thus, Lemma 3 of [11] implies the convergence of ρ rn in probability to 0 and X ρr n = r n + O P ∆X ρr n = r n + O P (X ρr n ).
Hence, we may apply Proposition 3 of [11], yielding Using Lemma 2.1 (ii) once more, equation (3.4) implies In order to transfer this equality to the continuous-time setting, we first show that for each ε ∈ (0, 1) there is a δ > 0 such that for large n ∈ N. For the proof of the left-hand inequality, note that due to Lemma 2.1 (ii) we have, implying with $q_1 n \le r_n$ that These inequalities show how to choose δ > 0. The right-hand inequality in (3.6) follows along the same lines.
Finally, the choice f ≡ 1 in Proposition 2 of [11] provides, for sufficiently small ε > 0, as n → ∞. Combining (3.5) to (3.7) yields where we used Lemma 2.1 (ii) for the last inequality. With this estimate holding for all ε > 0, we end up with as n → ∞. The reverse inequality can be shown in the same way, so that we obtain equation (3.3).

(b) Now we turn to the two remaining cases $r_n \sim n$ and $r_n = o(n)$. In view of Lemma 2.1 (ii), the asymptotics $r_n \sim n$ implies $\mu(r_n) \sim \mu(n)$, i.e., the right-hand side of (3.3) converges to 1.
With respect to Lemma 2.1 (ii), part (a) therefore entails for all q ∈ (0, 1), as n → ∞. Hence, the left-hand side of (3.3) also converges to 1 in probability. Similarly, the convergence of both sides of (3.3) to 0 can be shown for $r_n = o(n)$.
Proof of Theorem 1.2. Similarly to the proof of Theorem 3.1, we first consider the discrete versions $\zeta^n_i$ of $T^n_i$ for 1 ≤ i ≤ k to prove as n → ∞, where $0 =: I^n_0 \le I^n_1 \le \dots \le I^n_k$ are random variables measurable with respect to the σ-field $\sigma(N_n)$. Denote by $\zeta_A$ the number of mergers until some external branch out of the set A ⊆ {1, . . . , n} coalesces, and let a := #A. Given $\Delta X_j$, the j-th merging amounts to choosing $\Delta X_j + 1$ branches uniformly at random out of the $X_j$ present ones, implying for m ≥ 1 (for details see (28)) The Markov property and (4.4) provide as n → ∞, where the rightmost O(·) term in the first line stems from the fact that $X_{I^n_i} < X_j$ for all $j < I^n_i$. Further, from (4.4) with A = {i}, we know that so that we obtain equation (4.3).
Now, based on exchangeability, it is no loss of generality to assume that $0 \le t^n_1 \le \dots \le t^n_k$. So inserting as n → ∞. For 1 ≤ i ≤ k, let $1 < r^n_i \le n$ be defined implicitly via From Lemma 2.1 (iii) we know that $\int_1^n \frac{dx}{\mu(x)} = \infty$; therefore, $r^n_i$ is well-defined. Thus, we may apply formula (3.3) to obtain as n → ∞. Taking expectations in this equation yields, via dominated convergence, the theorem's claim.

5 Proof of Theorem 1.3
(a) First suppose that $\gamma_n T^n$ converges for some positive sequence $(\gamma_n)_{n\in\mathbb{N}}$ in distribution as n → ∞ to a probability measure different from $\delta_0$, with cumulative distribution function $F = 1 - \bar F$, i.e., where D denotes the set of discontinuities of $\bar F$. We first show that $\gamma_n \sim c\mu'(n)$ for some c > 0. From Theorem 1.1 it follows that there exist $0 < c_1 \le c_2 < \infty$ with and that $0 < \bar F(t) < 1$ for all t > 0. Similarly as in the proof of Theorem 1.2, define $r_n(t)$ for Applying formula (3.3) and (5.1), we obtain for all t ≥ 0, t ∉ D, as n → ∞. Differentiating both sides of (5.3) with respect to t and using Lemma 2.1 (i) yields $\frac{\gamma_n r_n(t)}{\mu(n)} = \frac{\mu(r_n(t))}{\mu(n)} \le 1$.
Moreover, equation (5.5) together with (5.2) yields $r_n(t) - 1 \ge n/2 + o(n)$ for t sufficiently small. Taking (5.2) once more into account, we obtain that for given ε > 0 and t sufficiently small, or equivalently, for t > 0, The right-hand quotient is finite and positive for all t > 0, which implies our claim $\gamma_n \sim c\mu'(n)$ for some c > 0.
Without loss of generality, we now set $\gamma_n = \mu'(n)$. With this choice, inserting (5.5) in (5.4) yields µ(n) $\bar F(t)$ (1 + o(1)) = µ(r_n(t)) = µ(n − µ(n)/µ′(n)) as n → ∞. In view of the monotonicity properties of µ and µ′ due to Lemma 2.1 (i), we may proceed to for suitable $0 < d_1 \le d_2 < \infty$. From Lemma 2.1 (i) we know that µ(x) has an inverse ν(y). Let us now show that ν is regularly varying. For the inverse, the last equation translates into Applying ν to equation (5.7), both inside and outside, we get as y → ∞. This equation immediately implies that $\bar F(v) < \bar F(u)$ for all u < v. It also shows that $\bar F$ has no jump discontinuities, i.e., D = ∅. Indeed, by the mean value theorem and because $\nu'(y) = 1/\mu'(\nu(y))$ is decreasing due to Lemma 2.1 (i), we have for 0 ≤ u < v, Thus, also assuming u, v ∉ D, (5.9) yields which implies D = ∅. By a Taylor expansion, we get where $\bar F(v)y \le \xi_y \le \bar F(u)y$. Dividing this equation by $y\nu'(y)$, using formula (5.9) and rearranging, Note that from Lemma 2.1 (iii) we have Hence, together with (5.
for each ε > 0 and for v close enough to u, or equivalently Again, since the right-hand quotient is finite and positive for all u < v, this estimate implies that $\nu'(y)/\nu'(\bar F(u)y)$ has a positive finite limit as y → ∞.

(b) Now suppose that µ(x) is regularly varying with exponent α ∈ [1, 2], i.e., we have $\mu(x) = x^{\alpha}L(x)$, where L is a slowly varying function. Let $r_n := qn$ with 0 < q ≤ 1. The statement of Theorem 3.1 then boils down to as n → ∞. From (5.10) we obtain as n → ∞. Thus, choosing, for given t ≥ 0, in equation (5.11) yields the claim.

6 Moment calculations for external branches of Λ-coalescents
In this section, we consider the number $Y_j$ of external branches after j merging events. In particular, we set $Y_0 = n$ and $Y_j = 0$ for $j > \tau_n$. (Again, we suppress n in the notation for convenience.) We provide a representation of the conditional moments of the number of external branches for general Λ-coalescents (also covering coalescents with a dust component).
For this purpose, we use the notation $(x)_r := x(x-1)\cdots(x-r+1)$ for falling factorials, with x ∈ R and r ∈ N. Recall that $\tau_n$ is the total number of merging events.
Lemma 6.1. (i) For a natural number r, the r-th factorial moment, given $N_n$, can be expressed as (ii) For the conditional variance, the following inequality holds: Proof. (i) First, we recall a link between the external branches and the hypergeometric distribution, based on the Markov property and exchangeability properties of the Λ-coalescent, as already described for Beta-coalescents in [6]: Given $N_n$ and $Y_0, \dots, Y_{\rho-1}$, the $\Delta X_\rho + 1$ lineages coalescing at the ρ-th merging event are chosen uniformly at random among the $X_{\rho-1}$ present ones. For the external branches, this means that, given $N_n$ and $Y_0, \dots, Y_{\rho-1}$, the decrement $\Delta Y_\rho := Y_{\rho-1} - Y_\rho$ has a hypergeometric distribution with parameters $X_{\rho-1}$, $Y_{\rho-1}$ and $\Delta X_\rho + 1$. In view of the formula for the i-th factorial moment of a hypergeometrically distributed random variable, we obtain Next, we look closer at the falling factorials. We have the following binomial identity for a, b ∈ R and r ∈ N. It follows from the Chu–Vandermonde identity Returning to the number of external branches, we get from the identity (6.2) that With equation (6.1), we arrive at Furthermore, combining the binomial identity (6.2) with the definition of $\Delta X_\rho$, we have Thus, and finally The proof now finishes by iteration, taking $E[Y_0 \mid N_n] = Y_0 = X_0$ into account.
(ii) The inequality for the conditional variance follows from the representation in (i) with r = 1 and r = 2: This finishes the proof.
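Both combinatorial ingredients of the proof can be verified in exact arithmetic: the Chu–Vandermonde identity and the factorial-moment formula $E[(Y)_i] = (K)_i (n)_i / (N)_i$ for a hypergeometric variable Y counting successes in n draws without replacement from N items, K of which are successes (a standard formula; the concrete parameter values below are ours):

```python
from math import comb
from fractions import Fraction

def falling(x, r):
    # falling factorial (x)_r = x (x-1) ... (x-r+1)
    out = 1
    for i in range(r):
        out *= x - i
    return out

# Chu-Vandermonde identity: sum_k C(a,k) C(b,r-k) = C(a+b,r)
a, b, r = 7, 5, 6
lhs = sum(comb(a, k) * comb(b, r - k) for k in range(r + 1))
assert lhs == comb(a + b, r)

# factorial moment of Hypergeometric(N, K, n) by exact enumeration
N, K, n, i = 12, 5, 6, 2
moment = sum(Fraction(comb(K, y) * comb(N - K, n - y), comb(N, n)) * falling(y, i)
             for y in range(0, min(K, n) + 1))
assert moment == Fraction(falling(K, i) * falling(n, i), falling(N, i))
```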
7 Proof of Theorem 1.5

In order to study Λ-coalescents having a regularly varying rate of decrease µ with exponent α ∈ (1, 2], we put $\beta := (\alpha-1)/\alpha$ for convenience. For k ∈ N and for real-valued random variables $Z_1, \dots, Z_k$, denote the reversed order statistics by We now prove the following theorem, which is equivalent to Theorem 1.5:

Theorem 7.1. Suppose that the Λ-coalescent has a regularly varying rate µ with exponent 1 < α ≤ 2, and fix ℓ ∈ N. Then, as n → ∞, the following convergence holds: where $U_1 > \dots > U_\ell$ are the points in decreasing order of a Poisson point process Φ on (0, ∞) for 0 < ε < α − 1, because µ is regularly varying with exponent α.
The next lemma deals with properties of the stopping times from (7.1) and (7.2). In particular, it reveals that external branches are still present up to the times ρ c,n .
Then we have: (ii) For each c > 0, as n → ∞, X ρc,n = cs n + O P (s n ).
(iii) We first prove that as n → ∞.
as n → ∞. Combining statement (ii) with Lemma 2.1 (ii), we therefore arrive at so that the regular variation of µ and the definition of $s_n$ imply (7.4). Thus, in the upper bound with ε > 0, the first right-hand probability converges to 0. For the second one, Chebyshev's inequality and Lemma 6.1 (ii) imply that From (7.4) and dominated convergence, we conclude as n → ∞, which gives the claim.

Lemma 7.3. Then for ℓ, y ∈ N, there exist random variables $U_{1,y} \ge \dots \ge U_{\ell,y}$ such that the following convergence results hold: (i) For fixed y ≥ ℓ, as n → ∞, $\mathcal{L}\bigl(\kappa(cs_n)(T^n_1, \dots, T^n_\ell) \mid Y_{\rho_{c,n}} = y, X_{\rho_{c,n}}\bigr) \longrightarrow \mathcal{L}(U_{1,y}, \dots, U_{\ell,y})$ in probability, where convergence takes place in the space of probability measures on $\mathbb{R}^\ell$.
(ii) For fixed ℓ ∈ N, as y → ∞, where $U_1 > \dots > U_\ell$ are the points of the Poisson point process of Theorem 7.1.
Proof. (i) Observe that, due to the strong Markov property, given $X_{\rho_{c,n}}$ and the event $Y_{\rho_{c,n}} = y$, the y remaining external branches evolve as y ordinary external branches out of a sample of $X_{\rho_{c,n}}$ many individuals. From these y external branches, we now consider the ℓ largest ones.
In view of Lemma 7.2 (i), we have $X_{\rho_{c,n}} \to \infty$ in probability as n → ∞. Hence, Corollary 1.4, together with established formulas for the order statistics of i.i.d. random variables, yields that with $u_1 \ge \dots \ge u_\ell \ge 0$, is the limit of the conditional distributions of $\kappa(X_{\rho_{c,n}})(T^n_1, \dots, T^n_\ell)$ as n → ∞, where f is the density from formula (1.1) and F its cumulative distribution function.
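The standard order-statistics formula invoked here is the following: if $Z_1 \ge \dots \ge Z_y$ are the reversed order statistics of y i.i.d. random variables with density f and distribution function F, then the joint density of the ℓ largest values is

```latex
\[
  \frac{y!}{(y-\ell)!}\; f(u_1)\cdots f(u_\ell)\, F(u_\ell)^{\,y-\ell},
  \qquad u_1 \ge \cdots \ge u_\ell .
\]
```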
Our claim now follows due to Lemma 7.2 (ii) and Lemma 2.1 (ii).
(ii) Note that being the density of $y^{-\beta}(U_{1,y}, \dots, U_{\ell,y})$, has the limit as y → ∞. Indeed, this is the joint density of the rightmost points $U_1 > \dots > U_\ell$ of the Poisson point process given in Theorem 7.1.
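The rightmost points of such a Poisson point process can be generated by the standard transformation $U_k = m^{-1}(\Gamma_k)$, where $\Gamma_1 < \Gamma_2 < \dots$ are partial sums of i.i.d. standard exponentials and $m(x)$ denotes the intensity mass of $(x, \infty)$. The sketch below (our own illustration) uses the power-law tail $m(x) = x^{-\alpha/(\alpha-1)}$ as a stand-in; the exact constant in the intensity of Theorem 7.1 is not reproduced here.

```python
import random

def ppp_points_above(alpha, x0, rng):
    """Decreasing points U_1 > U_2 > ... of a Poisson point process on
    (0, inf) whose intensity has tail mass m(x) = x^(-alpha/(alpha-1))
    (illustrative choice), generated down to level x0 via
    U_k = m^{-1}(Gamma_k)."""
    beta = alpha / (alpha - 1.0)
    points, gamma = [], 0.0
    while True:
        gamma += rng.expovariate(1.0)   # Gamma_k = E_1 + ... + E_k
        u = gamma ** (-1.0 / beta)      # solves m(u) = Gamma_k
        if u <= x0:
            return points
        points.append(u)

rng = random.Random(3)
alpha, x0 = 1.5, 0.5
counts = [len(ppp_points_above(alpha, x0, rng)) for _ in range(4000)]
mean_count = sum(counts) / len(counts)
# the number of points in (x0, inf) is Poisson with mean m(x0) = x0^(-3) = 8
```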
Proof of Theorem 7.1. For convenience, we use in this proof the notation $V_{c,n} := \kappa(cs_n)\,(T^n_1, \dots, T^n_\ell)$.
Let $g : \mathbb{R}^\ell \to \mathbb{R}$ be a continuous function, and assume that max |g| ≤ 1. For c > 0, we obtain via the law of total expectation and Lemma 7.3 (i) that as n → ∞. Without loss of generality, we may assume that the $O_P(\cdot)$ term is bounded by 1. Hence, taking expectations, applying Jensen's inequality to the left-hand side and using dominated convergence, we obtain After this preparatory work, we now additionally assume that g is a Lipschitz continuous function with Lipschitz constant 1 (in each coordinate) and prove that which implies the theorem's statement. For ε > 0, we have We now use (7.7) for the first two right-hand terms and Lemma 7.2 (iii) for the first probability, taking $\kappa(cs_n)/\kappa(s_n) \sim c^{\alpha-1} = c^{\alpha\beta}$ into account. To the other probability, we apply Lemma 7.2 (i). Hence, passing to the limit as c → ∞ yields
$$\limsup_{n\to\infty}\, \Bigl| E\,g\bigl(\kappa(s_n) T^n_1, \dots, \kappa(s_n) T^n_\ell\bigr) - E\,g(U_1, \dots, U_\ell) \Bigr| \le E\bigl[(\varepsilon U_1) \wedge 2\bigr] + \varepsilon.$$
Finally, taking the limit ε → 0 and using dominated convergence gives the claim.
8 Proof of Theorem 1.6

Recall the notation $Z_1 \ge Z_2 \ge \dots$ for the reversed order statistics of real-valued random variables, as introduced in the previous section, and the definition
$$t_n := \log\log n - \log\log\log n + \frac{\log\log\log n}{\log\log n}.$$
In this section, we prove the following equivalent version of Theorem 1.6.

Theorem 8.1. For the Bolthausen-Sznitman coalescent, as n → ∞, where $U_1 > \dots > U_\ell$ are the maximal points in decreasing order of a Poisson point process on R with intensity measure $e^{-x}\,dx$, and G is an independent standard Gumbel distributed random variable.
Recall, for c > 1, the notation $t_{c,n} := t_n - \frac{\log c}{\log\log n}$.
Furthermore, because of Thus, (8.1) transfers to as n → ∞, and our claim follows by the method of moments.
The following lemma provides the asymptotic behavior of the joint probability distribution of the lengths of the longest external branches starting at time $t_{c,n}$.

Proof. (i) We proceed in the same vein as in the proof of Lemma 7.3 (i). The strong Markov property, Corollary 1.4 (see also formula (1.2) in the first example) and Lemma 8.2 yield that the random vector $\log N_n(t_{c,n})\,(T^n_1, \dots, T^n_\ell)$, given $N_n(t_{c,n})$ and the event $M_{t_{c,n}} = m$, has a limiting distribution as n → ∞ with density for $u_1 \ge \dots \ge u_\ell$. Moreover, from Lemma 8.2, we obtain $\log N_n(t_{c,n}) = t_{c,n} + O_P(1) = \log\log n + O_P(\log\log\log n)$ as n → ∞. This implies our claim.
(ii) Shifting the distribution from (8.2) by log m, we arrive at the densities $e^{-u_i}\,du_1 \cdots du_\ell$ and its limit $e^{-u_i}\,du_i$ as m → ∞, which is the joint density of $U_1, \dots, U_\ell$. This finishes the proof.
Next, we introduce the notation $\rho_{c,n} := \min\bigl\{k \ge 1 : \textstyle\sum_{j=0}^{k-1} W_j > t_{c,n} \wedge \tau_n\bigr\}$.
It is important to note that in the case of the Bolthausen-Sznitman coalescent (7.3) is no longer valid, and consequently we may not simply apply (7.5). As a substitute, we shall use the following lemma.

Proof. Let $F_k := \sigma(X, W_0, \dots, W_{k-1})$ and In particular, we have $Z_0 = 0$. Given $F_j$ and $X_j = b$ with b ≥ 2, the waiting time $W_j$ in the Bolthausen-Sznitman coalescent is exponential with rate parameter b − 1 (see (47) in [27]).
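The rate parameter b − 1 can be checked directly: for the Bolthausen-Sznitman coalescent, $\lambda_{b,k} = \int_0^1 p^{k-2}(1-p)^{b-k}\,dp = B(k-1,\, b-k+1)$, and the total merger rate $\sum_{k=2}^{b}\binom{b}{k}\lambda_{b,k}$ telescopes to b − 1 (our own verification, in exact arithmetic):

```python
from math import comb, factorial
from fractions import Fraction

def total_merger_rate(b):
    """sum_{k=2}^b C(b,k) * Beta(k-1, b-k+1) for Lambda = uniform on [0,1];
    for integers i, j >= 1, Beta(i, j) = (i-1)! (j-1)! / (i+j-1)!."""
    rate = Fraction(0)
    for k in range(2, b + 1):
        beta = Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))
        rate += comb(b, k) * beta
    return rate

# each term C(b,k) Beta(k-1, b-k+1) equals b/(k(k-1)), so the sum telescopes:
# b * sum_{k=2}^b (1/(k-1) - 1/k) = b (1 - 1/b) = b - 1
```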
Thus, $(Z_k)_{k\in\mathbb{N}}$ is a martingale with respect to the filtration $(F_k)_{k\in\mathbb{N}}$ with (predictable) quadratic variation Applying Doob's optional sampling theorem to the martingale $Z_k^2 - \langle Z \rangle_k$ yields and therefore, because of $X_{\rho_{c,n}-1} = N_n(t_{c,n})$ a.s., By Lemma 8.2 and dominated convergence, the right-hand term converges to 0 as n → ∞, implying the desired convergence as n → ∞. Finally, the quantity $\sum_{j=0}^{\rho_{c,n}-1} W_j - t_{c,n}$ is the residual time the process $N_n$ spends in the state $N_n(t_{c,n})$. Due to the property that exponential times lack memory, the residual time is exponential with parameter $N_n(t_{c,n}) - 1$. Thus, in view of Lemma 8.2, the residual time converges to 0 in probability. This finishes the proof.

The first right-hand term converges to 0 as c → ∞. Also, as we may assume ε < 1/2, the second term goes to 0 in view of the first claim of part (ii).
With these preparations, we now turn to Theorem 8.1.