The evolving beta coalescent

In mathematical population genetics, it is well known that one can represent the genealogy of a population by a tree, which indicates how the ancestral lines of individuals in the population coalesce as they are traced back in time. As the population evolves over time, the tree that represents the genealogy of the population also changes, leading to a tree-valued stochastic process known as the evolving coalescent. Here we will consider the evolving coalescent for populations whose genealogy can be described by a beta coalescent, which is known to give the genealogy of populations with very large family sizes. We show that as the size of the population tends to infinity, the evolution of certain functionals of the beta coalescent, such as the total number of mergers, the total branch length, and the total length of external branches, converges to a stationary stable process. Our methods also lead to new proofs of known asymptotic results for certain functionals of the non-evolving beta coalescent.


Introduction
In 1999, Pitman [14] and Sagitov [15] introduced coalescents with multiple mergers, also known as Λ-coalescents. These processes are continuous-time Markov processes taking values in the set of partitions of N, and as time goes forward, blocks of the partition merge together. For any finite measure Λ on [0, 1], the Λ-coalescent is defined by the property that whenever the restriction of the process to {1, . . . , n} has b blocks, each possible transition that involves k of the b blocks merging into one happens at rate

λ_{b,k} = ∫_0^1 p^{k−2} (1 − p)^{b−k} Λ(dp), 2 ≤ k ≤ b,

and these are the only transitions that occur. This means that when there are b blocks, the total rate of all mergers is

λ_b = ∑_{k=2}^{b} (b choose k) λ_{b,k}.

When Λ is a unit mass at zero, only two blocks ever merge at a time, and each transition that involves two blocks merging into one happens at rate 1, so we get the celebrated Kingman's coalescent [10]. When Λ is the uniform distribution on [0, 1], the Λ-coalescent is known as the Bolthausen-Sznitman coalescent [4].

Coalescent processes arise naturally in population genetics, where they are used to model the genealogy of populations. The genealogy of a population of size n can be modeled by a process (Π_n(r), r ≥ 0) taking its values in the set of partitions of {1, . . . , n}. The integers i and j are in the same block of the partition Π_n(r) if and only if the ith and jth individuals in the population have the same ancestor r units back in time. Typically, Kingman's coalescent is used to model the genealogy of populations. However, in some circumstances, other Λ-coalescents could describe the genealogy of a population. For example, it was shown in [17] that if the probability of an individual having k or more offspring is proportional to k^{−α}, where 1 < α < 2, then the genealogy of the population is best described by the Λ-coalescent in which Λ is the Beta(2 − α, α) distribution. This process will hereafter be called the Beta(2 − α, α)-coalescent, or simply the beta coalescent.
Beta coalescents are thus natural models for populations with large family sizes, and predictions from beta coalescents were shown in [19] to fit genetic data from some marine species.
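When Λ is the Beta(2 − α, α) distribution, the transition rates above have a closed form, λ_{b,k} = B(k − α, b − k + α)/B(2 − α, α), where B is the Euler beta function. The following sketch (function names are ours, not from the paper) computes these rates with the Python standard library:

```python
from math import gamma, comb

def beta_fn(x, y):
    # Euler beta function B(x, y) = Gamma(x) Gamma(y) / Gamma(x + y)
    return gamma(x) * gamma(y) / gamma(x + y)

def merger_rate(b, k, alpha):
    # lambda_{b,k} = int_0^1 p^{k-2} (1-p)^{b-k} Lambda(dp) for
    # Lambda = Beta(2 - alpha, alpha), which evaluates to
    # B(k - alpha, b - k + alpha) / B(2 - alpha, alpha).
    return beta_fn(k - alpha, b - k + alpha) / beta_fn(2 - alpha, alpha)

def total_rate(b, alpha):
    # Total merger rate lambda_b = sum_{k=2}^{b} C(b, k) lambda_{b,k}
    # when b blocks are present.
    return sum(comb(b, k) * merger_rate(b, k, alpha) for k in range(2, b + 1))
```

For instance, merger_rate(2, 2, alpha) = 1 for every α, reflecting that a fixed pair of blocks merges at rate Λ([0, 1]) = 1.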
There has been recent interest in describing not only the genealogy of a population at a fixed time but also how the genealogy of a population changes over time. At any given time, the genealogical structure described by the coalescent process can also be represented by a tree. The shape of this tree changes over time, and the associated tree-valued process (T n (t), t ∈ R) is known as an evolving coalescent. For populations whose genealogy at a fixed time is described by Kingman's coalescent, the associated evolving coalescent was studied in [12,13,8]. The evolving Bolthausen-Sznitman coalescent was studied in [18].
In the present paper, we study the evolving coalescent for populations whose genealogy is given by the Beta(2 − α, α)-coalescent, where 1 < α < 2. To this end we consider a time rescaling depending on n, the scaled time being s = n^{α−1} t. On the original time scale, merging events occur at a rate of order n^α; see formula (23) below. Therefore the number of coalescent events in one unit of the scaled time is of order n, meaning that the rate at which a specific one of the n lineages takes part in a merging event is of order 1. Thus, while the original time t captures evolutionary time, the scaled time s plays the role of a generation time.
We show that as n → ∞, the distribution of certain functionals of the beta coalescent converges to a stable distribution of index α. For the evolving beta coalescent, the distribution of these functionals evaluated at times s_1 < s_2 < · · · < s_d converges as n → ∞ to a multivariate stable distribution of index α. Examples of functionals that fit into this framework include the total number of merger events before all of the lineages have coalesced into a single lineage, the total length of all branches in the tree, and the total length of all external branches in the tree.
As a typical result, which follows by combining Example 7 and Corollary 5 below, we state the following theorem, which shows how the total branch length of the coalescent tree evolves over time.
Theorem 1. Let L_n(t) be the total branch length of T_n(t), t ∈ R. Then for 1 < α < (1 + √5)/2 the sequence of processes obtained from (L_n(n^{1−α} s))_{−∞<s<∞} by centering and rescaling as in Example 7 converges in finite-dimensional distributions as n → ∞ to the moving average process (1), where

g(r) = (α − 1) (αΓ(α))^{1/(α−1)} (r + αΓ(α))^{−(2−α)/(α−1)}

and (L_s)_{−∞<s<∞} is a mean zero Lévy process with L_0 = 0 and Lévy measure given by (2). The time reversal in the integrator is explained in more detail in Section 2.5. The stochastic integral is well-defined for α < (1 + √5)/2. Above this threshold, the statement fails to be true even for the fixed value s = 0, as shown in [9]. Note that the function g(r) decreases as a power of r, and consequently the moving average process defined in (1) is not Markovian. The exponent −(2 − α)/(α − 1) tends to −∞ as α → 1. In the limiting case α = 1, which corresponds to the Bolthausen-Sznitman coalescent, it is known that the evolution of the total branch length converges to a moving average process defined as in (1), but with g(r) = e^{−r}. In this case the limit process is Markovian. For details, see [18].
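The threshold (1 + √5)/2 in Theorem 1 can be checked numerically: g(r)^α decays like r^{−α(2−α)/(α−1)}, and the α-stable integral of g is well-defined exactly when this exponent exceeds 1, i.e. when α² − α − 1 < 0. A minimal sketch (function names are ours):

```python
from math import gamma as Gamma

def g(r, alpha):
    # Kernel from Theorem 1:
    # g(r) = (alpha - 1) (alpha Gamma(alpha))^{1/(alpha-1)}
    #        * (r + alpha Gamma(alpha))^{-(2-alpha)/(alpha-1)}
    c = alpha * Gamma(alpha)
    return (alpha - 1) * c ** (1 / (alpha - 1)) * (r + c) ** (-(2 - alpha) / (alpha - 1))

def tail_exponent(alpha):
    # g(r)^alpha decays like r^{-alpha(2-alpha)/(alpha-1)}; integrability
    # requires this exponent to exceed 1, which is equivalent to
    # alpha^2 - alpha - 1 < 0, i.e. alpha < (1 + sqrt(5))/2.
    return alpha * (2 - alpha) / (alpha - 1)
```

The function tail_exponent crosses the value 1 precisely at the golden ratio (1 + √5)/2 ≈ 1.618.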
Our approach consists of deriving asymptotic expansions for suitable functionals by means of Poisson integrals. Thereby we rediscover some known results for functionals of the static (non-evolving) beta coalescent, e.g. for its total length L_n and total external length ℓ_n. Moreover, by means of the Poisson integral representations we also get hold of their (properly scaled) joint distributions, which are asymptotically multivariate stable. This allows us to calculate, for example, the asymptotic distribution of ℓ_n/L_n (see Example 9 in Section 2.6).
We give a precise construction of the evolving beta coalescent, as well as a precise statement of the main results and a few examples, in Section 2. Proofs are given in Section 3.

2 Framework, main results, and examples

2.1 Construction of the evolving beta coalescent
We give here a precise construction of the evolving Λ-coalescent, which is modeled after the Poisson process construction of the Λ-coalescent given in [14] and is similar to the construction of the evolving Bolthausen-Sznitman coalescent in [18]. Let Λ be a finite measure on (0, 1]. We will construct a population of fixed size n defined for all times t ∈ R whose genealogy is given by the Λ-coalescent. Individuals in the population will be labeled 1, . . . , n. Let Υ = Υ_n be a Poisson point process on R × (0, 1] × [0, 1]^n with intensity measure dt × p^{−2} Λ(dp) × dv_1 · · · dv_n.
Suppose (t, p, v_1, . . . , v_n) is a point of Υ. If zero or one of the points v_1, . . . , v_n is less than p, then no change in the population occurs at time t. However, if k ≥ 2 of these points are less than p, say v_{i_1} < · · · < v_{i_k} < p, then at time t, the individuals labeled i_2, . . . , i_k all die, and the individual labeled i_1 gives birth to k − 1 new individuals who take over the labels i_2, . . . , i_k. This implies that if we are following the genealogy of the population backwards in time, the lineages labeled i_1, . . . , i_k will all coalesce at time t. To see that the Λ-coalescent describes the genealogy of this population, note that the rate of events that cause the lineages i_1, . . . , i_k to coalesce is

∫_0^1 p^k (1 − p)^{n−k} p^{−2} Λ(dp) = λ_{n,k}.

This construction is well-defined because the rate of changes in the population is bounded above by

∫_0^1 (n choose 2) p^2 · p^{−2} Λ(dp) < ∞.
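The update rule at a single point of Υ can be sketched as follows (a hypothetical helper, not from the paper): among the individuals i with v_i < p, the one with the smallest value v_i becomes the parent and the others are replaced by its offspring.

```python
def reproduction_event(p, v):
    # One point (t, p, v_1, ..., v_n) of the Poisson process Upsilon:
    # among the individuals i with v_i < p, the one with the smallest v_i
    # becomes the parent; the others die and are replaced by its offspring.
    # Returns (parent label, labels replaced), or None if fewer than two
    # of the v_i fall below p (no change in the population).
    hit = sorted((vi, i) for i, vi in enumerate(v, start=1) if vi < p)
    if len(hit) < 2:
        return None
    parent = hit[0][1]
    replaced = [i for _, i in hit[1:]]
    return parent, replaced
```

Note that the labels are ordered by the values v_i, not by the indices i, which is exactly the point made in the remark below about the lack of strong consistency in n.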
Note that, since we are ordering the genealogy with respect to the v i , and not in a lookdown manner with respect to the indices i, we do not have strong consistency in n.
Although this construction works whenever Λ({0}) = 0, we will hereafter restrict ourselves to the case in which Λ is the Beta(2 − α, α) distribution. For each t ∈ R, there will be a different realization of the beta coalescent which describes the genealogy of the population at time t. We denote the corresponding coalescent tree (read off from the genealogy backwards from time t) by T n (t) and call the process (T n (t), t ∈ R) the evolving beta coalescent.

2.2 Two Poisson processes
The evolving beta coalescent is constructed from a Poisson process Υ_n on R × (0, 1] × [0, 1]^n with intensity measure dt × p^{−2} Λ(dp) × dv_1 · · · dv_n. In this section, we will construct two other Poisson processes, denoted by Ψ_n and Θ_n, that will be useful for analyzing the evolving beta coalescent and the static (non-evolving) beta coalescent back from time 0, respectively. First, we obtain a Poisson process Υ′ on R × R_+ in two steps: we discard all but the first two coordinates of the points of Υ_n, and then augment these points with the points of an independent Poisson process so that Υ′ has intensity

dt × p^{−1−α} dp, p > 0.     (3)

From Υ′, we can then obtain a Poisson process Ψ_n via the mapping (t, p) → (s, u) := (n^{α−1} t, n^{1−1/α} p).

Figure 1: The process Υ′ contains the points (marked by •) from the first two coordinates of the points of Υ, plus additional points (marked by ◦) that make up for the difference between the intensities p^{−1−α} (1 − p)^{α−1} 1_{p≤1} and p^{−1−α}, p > 0. The point process Ψ_n arises from Υ′ through the transformation (t, p) → (s, u) = (n^{α−1} t, n^{1−1/α} p).
That is, if (t, p) is a point of Υ′, then (n^{α−1} t, n^{1−1/α} p) is a point of Ψ_n. It is straightforward to verify that the intensity of Ψ_n is also given by (3), now with (t, p) replaced by (s, u).
The reason for considering the rescaled Poisson process Ψ_n is that the fluctuations in the behavior of the beta coalescent that are important for studying the evolving coalescent are those that happen after the coalescent has evolved for a time of order n^{1−α}. Also, the largest mergers on this time scale affect approximately a fraction of order n^{−1+1/α} of the blocks, in other words on the order of n^{1/α} out of the n initial blocks. If (s, u) is a point of Ψ_n that corresponds to a point of Υ_n (i.e. unless it corresponds to a point of Υ′ that does not belong to Υ_n), then at time n^{1−α} s there is an event during which approximately a fraction n^{−1+1/α} u of the blocks merge together.
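One can check numerically that the intensity dt × p^{−1−α} dp is invariant under the mapping (t, p) → (n^{α−1} t, n^{1−1/α} p), by comparing the expected number of points in a box with the count in its preimage (function names are ours):

```python
def expected_points(t_len, p_lo, p_hi, alpha):
    # Expected number of points of a Poisson process with intensity
    # dt x p^{-1-alpha} dp in the box [0, t_len] x [p_lo, p_hi]:
    # t_len * (p_lo^{-alpha} - p_hi^{-alpha}) / alpha.
    return t_len * (p_lo ** -alpha - p_hi ** -alpha) / alpha

def expected_points_preimage(s_len, u_lo, u_hi, alpha, n):
    # Pull the box [0, s_len] x [u_lo, u_hi] back through the mapping
    # (t, p) -> (s, u) = (n^{alpha-1} t, n^{1-1/alpha} p) and count there.
    scale_t = n ** (alpha - 1)
    scale_p = n ** (1 - 1 / alpha)
    return expected_points(s_len / scale_t, u_lo / scale_p, u_hi / scale_p, alpha)
```

Both counts agree for every n, since the Jacobian factor n^{α−1} n^{1−1/α} exactly cancels the factor picked up by p^{−1−α} under the rescaling of p.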
We now define another Poisson process Θ_n on (0, 1] × R_+ which is useful for studying the static beta coalescent that describes the genealogy of the population at time 0. Writing r = −s for the reverse time, which is the coalescence time direction, we obtain Θ_n by first restricting Ψ_n to all points (s, u) with s ≤ 0 and then applying the mapping

(s, u) → (x, y) := (m(r), u m(r)),     (4)

to the remaining points, where, for r ≥ 0,

m(r) := (1 + r/(αΓ(α)))^{−1/(α−1)}.     (5)

This quantity is the asymptotic proportion of the number of blocks that have not yet coalesced by the reverse time r; see Lemma 13 below. To calculate the intensity of Θ_n, we invert the mapping given by (4) to get r = αΓ(α)(x^{−(α−1)} − 1) and u = y/x. It follows that the intensity of Θ_n is homogeneous in x and proportional to dx × y^{−1−α} dy.
Suppose (x, y) is a point of Θ_n. Because the number of blocks in a beta coalescent at reverse time r is approximately m(r) n, the merger corresponding to (x, y) occurs when approximately xn blocks remain, and the number of blocks that merge is approximately n^{1/α−1} y · n = n^{1/α} y. Therefore, whereas the second coordinate of Ψ_n approximates the fraction of blocks lost due to a merger, the second coordinate of Θ_n represents the number of blocks lost due to a merger. The fact that the Poisson process Θ_n is homogeneous in time reflects the fact that the distribution of the number of blocks lost in the first merger tends to a limit as the number of blocks at time zero tends to infinity; see Section 3.1.
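For concreteness, the sketch below assumes the explicit form m(r) = (1 + r/(αΓ(α)))^{−1/(α−1)} for the quantity in (5); this is our reconstruction, consistent with m(0) = 1 and with the almost sure limit from Theorem 1.1 of [2] quoted in Section 3.1:

```python
from math import gamma as Gamma

def m(r, alpha):
    # Assumed form of (5): m(r) = (1 + r/(alpha Gamma(alpha)))^{-1/(alpha-1)}.
    # It satisfies m(0) = 1 and
    # m(r) ~ (alpha Gamma(alpha))^{1/(alpha-1)} r^{-1/(alpha-1)} as r -> infinity,
    # matching the a.s. limit of Theorem 1.1 of [2].
    c = alpha * Gamma(alpha)
    return (1 + r / c) ** (-1 / (alpha - 1))
```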

2.3 Review of results on stable laws and Poisson integrals
To state and prove our main results, we will need to review some results on both univariate and multivariate stable distributions. We will restrict ourselves to the stable distributions of index α, where 1 < α < 2. Following the notation of [16], we write in the univariate case Z ∼ S_α(σ, β, µ) if the characteristic function of the random variable Z is given by

E[exp(iθZ)] = exp( −σ^α |θ|^α (1 − iβ sgn(θ) tan(πα/2)) + iµθ ),

with location parameter µ (which equals E(Z) for 1 < α < 2), scale parameter σ > 0, and skewness parameter β ∈ [−1, 1]. Here sgn(θ) = 1 if θ > 0, sgn(θ) = −1 if θ < 0, and sgn(0) = 0. We only deal with the case µ = 0. When µ = 0 and β = 1, the characteristic function of Z can also be written in the Lévy-Khinchine form

E[exp(iθZ)] = exp( ∫_0^∞ (e^{iθy} − 1 − iθy) ν(dy) ) with ν(dy) = b y^{−1−α} dy, y > 0,     (6)

(since α > 1, we may and do avoid here the customary truncation within the integrand), where

σ^α = b Γ(2 − α) |cos(πα/2)| / (α(α − 1)).     (7)

One can then construct Z in the following way. Consider a Poisson point process ∑_{i≥1} δ_{y_i} on (0, ∞) with intensity b y^{−1−α} dy, and for ε > 0 define Z_ε := ∑_{i: y_i > ε} y_i − b ∫_ε^∞ y · y^{−1−α} dy. Then the limit

Z := lim_{ε→0} Z_ε     (8)

exists a.s. and obeys Z ∼ S_α(σ, 1, 0) with σ as in (7). To facilitate comparisons with results in [5, 9], we note that if c > 0 and if Z has a stable distribution such that

P(Z > z) ∼ z^{−α}     (9)

as z → ∞, then cZ ∼ S_α(σ, 1, 0) with σ given by (7) with b = αc^α, which means

σ^α = c^α Γ(2 − α) |cos(πα/2)| / (α − 1).     (10)

Note that the results in [5, 9] are stated with −Z in place of Z.
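Totally skewed stable variates as above can be sampled by the classical Chambers-Mallows-Stuck method; this is a standard recipe, not taken from the paper, and the function name is ours:

```python
import math
import random

def sample_stable(alpha, sigma=1.0, beta=1.0, rng=random):
    # One draw from S_alpha(sigma, beta, 0), 1 < alpha < 2 (mean zero),
    # by the Chambers-Mallows-Stuck method: combine a uniform angle on
    # (-pi/2, pi/2) with an independent standard exponential variate.
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    t = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha
    s = (1 + t * t) ** (1 / (2 * alpha))
    x = (s * math.sin(alpha * (u + b)) / math.cos(u) ** (1 / alpha)
         * (math.cos(u - alpha * (u + b)) / w) ** ((1 - alpha) / alpha))
    return sigma * x
```

For β = 1 the samples are totally skewed to the right while still having mean zero, as in the construction (8).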
Next we define the notion of an α-stable random measure, following sections 3.3 and 3.12 of [16], again only for β = 1. Let (E, E, ρ) be a measure space, and let E_0 = {A ∈ E : ρ(A) < ∞}. Then an α-stable random measure with control measure ρ is defined as a countably additive function M which assigns a random variable to each set A ∈ E_0, satisfying the following properties: the random variables M(A_1), M(A_2), . . . are independent whenever the sets A_1, A_2, . . . ∈ E_0 are disjoint, and M(A) ∼ S_α(σ_A, 1, 0) with σ_A^α proportional to ρ(A). Such a random measure can be obtained from a Poisson point process Ξ on E × (0, ∞) with intensity

a ρ(dx) × y^{−1−α} dy, a > 0.     (11)

Then the second coordinates of those points of Ξ that fall in A × R_+ form a Poisson point process on R_+ with intensity aρ(A) y^{−1−α} dy. If one constructs M(A) just as Z is obtained in (8) with b = aρ(A), then M = M_Ξ is an α-stable random measure with control measure ρ (and β = 1) obtained from Ξ. As noted in section 3.4 of [16], if f : E → R is a function such that ∫_E |f(x)|^α ρ(dx) < ∞, then one can define the integral

I(f) := ∫_E f(x) M(dx)     (12)

by approximating f with simple functions. It is shown in [16] that

I(f) ∼ S_α(σ_f, β_f, 0),     (13)

where σ_f^α is proportional to ∫_E |f(x)|^α ρ(dx) and β_f = ∫_E sgn(f(x)) |f(x)|^α ρ(dx) / ∫_E |f(x)|^α ρ(dx). In particular it follows for any linear combination that θ_1 I(f_1) + · · · + θ_d I(f_d) = I(θ_1 f_1 + · · · + θ_d f_d) is again α-stable. In the case α > 1 this implies that the joint distribution of (I(f_1), . . . , I(f_d)) is multivariate α-stable [16], characterized by the so-called spectral measure Γ of the α-stable random vector (I(f_1), . . . , I(f_d)). In our case E will be either R or the interval (0, 1]. Then the above Poisson integrals can also be viewed as stochastic integrals using a Lévy process L = L_Ξ with mean zero, constructed from the Poisson point process Ξ in the usual manner via compensation. Thus,

I(f) = ∫_E f(x) dL_x.     (14)

2.4 An asymptotic expansion for the static case
Consider a beta coalescent back from time 0, and let N_n(r) be the number of blocks in the partition at time r, so in particular N_n(0) = n. Let R_0 = 0, and for k ≥ 1, let

R_k := inf{r > R_{k−1} : N_n(r) < N_n(R_{k−1})},

with the convention inf ∅ = ∞. Let τ_n = max{k : R_k < ∞}. Thus, R_1 < · · · < R_{τ_n} are the times at which mergers occur, and τ_n is the number of mergers before only one block remains. For 0 ≤ k ≤ τ_n, let

X_k := N_n(R_k),

and let X_k = 1 for k > τ_n, which means n = X_0 > X_1 > · · · > X_{τ_n} = 1. The process (X_k)_{k=0}^{τ_n} is called the block-counting process associated with the beta coalescent.
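The block-counting chain can be simulated directly from the merger rates λ_{b,k} = B(k − α, b − k + α)/B(2 − α, α) of the beta coalescent (a sketch with our own function names; only the relative weights of the possible merger sizes matter for the embedded jump chain):

```python
from math import gamma, comb
import random

def simulate_block_counts(n, alpha, rng=random):
    # Block-counting chain n = X_0 > X_1 > ... > X_{tau_n} = 1 of the
    # Beta(2 - alpha, alpha)-coalescent. With b blocks, k of them merge
    # with probability proportional to C(b, k) B(k - alpha, b - k + alpha);
    # the common factor 1/B(2 - alpha, alpha) cancels in the weights.
    B = lambda x, y: gamma(x) * gamma(y) / gamma(x + y)
    xs = [n]
    b = n
    while b > 1:
        weights = [comb(b, k) * B(k - alpha, b - k + alpha) for k in range(2, b + 1)]
        k = rng.choices(range(2, b + 1), weights=weights)[0]
        b -= k - 1          # a k-merger removes k - 1 blocks
        xs.append(b)
    return xs
```

The length of the returned list minus one is a realization of τ_n.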
The result below shows that certain functionals of the block-counting process have an asymptotic stable law as n → ∞. For this purpose, we consider the beta coalescent, constructed from Υ_n as in Section 2.1, and the α-stable random measures M_n = M_{Θ_n} obtained from Poisson point processes Ξ_n as described in Section 2.3, with E = (0, 1], intensity (6) and Ξ_n = Θ_n (arising from Υ_n as described in Section 2.2). In view of (11) the control measure of M_n is given by (15). We denote by F the set of all differentiable functions f which for some c > 0 and 0 < ζ < 1/α satisfy

|f′(x)| ≤ c x^{−ζ−1}     (16)

for all x ∈ (0, 1].

Theorem 2. Let (X_k)_{k=0}^{τ_n} be the block-counting process associated with the beta coalescent, and let f ∈ F. Then as n → ∞ the asymptotic expansion (17) holds, where M_n is an α-stable random measure with control measure (15).
The statement of Theorem 2 is an asymptotic version of the identity (18). The proof consists of showing that the sums appearing on the right-hand side of (18) may asymptotically be replaced by the corresponding Poisson integrals. Since the control measure, and hence the distribution, of M_n does not depend on n, the asymptotic expansion (17) in Theorem 2 directly implies convergence in distribution not only for a single f ∈ F, but also for finitely many f_1, . . . , f_d. To ease notation, let us define, for f ∈ F and n ∈ N, the random variable J_n(f) by (19); then J_n(f) ⇒ I(f) ∼ S_α(σ_f, β_f, 0), where σ_f, β_f are as in (13), I(f) is as in (12) with an α-stable random measure M with control measure (15), and ⇒ denotes convergence in distribution as n → ∞.

2.5 The evolving beta coalescent
Now we transform the asymptotic expansion from Theorem 2 to the evolving coalescent. Here it is convenient to use Lévy processes. Let (L_{n,s})_{s∈R} and (L′_{n,x})_{x≥0} be the mean zero stable Lévy processes with L_{n,0} = L′_{n,0} = 0 a.s. and with jumps ΔL_{n,s} = u and ΔL′_{n,x} = y for all points (s, u) from Ψ_n and (x, y) from Θ_n, respectively. For the time reversal we write L̂_{n,r} := L_{n,s−}, s = −r, r ≥ 0. Then the mapping from (4) translates into the following lemma.

Proof.
The assumption on f guarantees that the integrals are well-defined. The processes (L̂_{n,r})_{r≥0} and (L′_{n,m(r)})_{r≥0} are mean zero Lévy processes, and hence martingales with respect to the natural filtration; consequently, the difference process J is a local martingale. Jumps can only occur at points (r, u) in Ψ_n; because of (4) they vanish. Thus, J is a.s. continuous. Moreover, since the underlying processes are Lévy processes without a Brownian component, the quadratic variation of J vanishes. From (20) and (14) it follows that M_n = M_{Θ_n} satisfies (21). Now we apply Lemma 4 to get the desired representation. Let us now proceed to consider the evolving coalescent (T_n(t), t ∈ R) described in Section 2.1. For each s ∈ R and n ∈ N, we denote the block-counting process of the coalescent tree T_n(n^{1−α} s) by (X^s_k)_{k=0}^{τ^s_n}. By shifting the origin of the scaled time to the time point s and re-centering the process L at this new time origin (which does not affect its increments), we can apply Theorem 2 together with (21). Writing J_{n,s}(f) for the random variable (19) with (X_k) = (X^0_k) replaced by (X^s_k), we thus obtain the analogous expansion at time s. Since the distribution of the Poisson point process Ψ_n, and hence also that of the Lévy process L_n = L_{Ψ_n}, does not depend on n, we obtain the following result for the evolving beta coalescent.
Corollary 5. For f ∈ F and s ∈ R, let J_{n,s}(f) be as in (19), but now evaluated at the coalescent tree T_n(n^{1−α} s) instead of T_n(0). Then the sequence of stationary processes (J_{n,s}(f))_{−∞<s<∞}, n ≥ 1, converges as n → ∞ in finite-dimensional distributions to a moving average process of the form (1), where m is given by (5), the kernel g is determined by f and m, and (L_s)_{−∞<s<∞} is a mean zero Lévy process with L_0 = 0 and Lévy measure given by (2).
To understand better how these functionals of the beta coalescent evolve over time, note that the stable random variable ∫_{(0,1]} f(x) M_n(dx) from Theorem 2, which gives the limit of the functional J_{n,0}(f), is a function of the Poisson process Θ_n, and is therefore also a function of the Poisson process Ψ_n. Likewise, the stable random variable that gives the limit of J_{n,s}(f) is a function of the Poisson process Ψ_n shifted by s in its first coordinate, so that the joint distribution of the limits at different times can be read off from the single Poisson process Ψ_n.

2.6 Functionals of the beta coalescent
In this section, we consider three functionals of the beta coalescent: the number of collisions, the total branch length, and the total length of external branches. We observe how Theorem 2 allows us to recover known results for the asymptotic distributions of these quantities for the static beta coalescent. Then Corollary 5 allows us to describe how these functionals behave over time in the evolving beta coalescent. We also obtain a new result about the ratio of the external branch length to the total branch length, which could be of interest for biological applications.
Example 6. Consider the number τ_n of collisions before just a single block remains. Because τ_n = ∑_{k<τ_n} 1, we can apply directly the result of Theorem 2 with f(x) = α − 1 for all x ∈ (0, 1]. We get that for 1 < α < 2 the convergence (22) holds. This agrees with the result of Lemma 4 in [5], where the limit on the right-hand side of (22) is expressed as c_1 Z for −Z satisfying (9) and c_1 = (α − 1)^{1+1/α}/Γ(2 − α)^{1/α}. This result had also been shown in [6, 7, 11], and the equivalence between the two ways of expressing the limit can be seen from (10). Because we use this result in our proof of Theorem 2, we have not obtained here another independent proof of this result. The benefit is that our approach allows us to examine the common distribution of τ_n and other functionals. Also, Corollary 5 with f(x) = α − 1 allows us to understand how the total number of collisions changes over time for the evolving beta coalescent. In particular, we see that the limit process is a stationary stable process that can be expressed, in a relatively simple way, as a moving average process.
Example 7. Consider next the total length L_n of all branches in the coalescent tree. This quantity is of interest in biology because the total branch length should be approximately proportional to the number of mutations observed in a sample of n individuals. Note that Lemma 2.2 in [6] implies the asymptotics (23) of the rates λ_m as m → ∞. Therefore, there is a constant c > 0 such that the approximation (24) holds. Also, conditional on σ(X) = σ(X_0, X_1, . . . , X_{τ_n}), the distribution of X_k(R_{k+1} − R_k) is exponential with rate parameter λ_{X_k}/X_k. It follows that the conditional mean and variance of L″_n are given by (25) and (26). It now follows from (24), (25), (26), and Chebyshev's inequality that if 1 < α < (1 + √5)/2, so that 1 + α − α² > 0, we have n^{α−1−1/α}(L_n − L″_n) ⇒ 0. Therefore, we may replace L_n by L″_n in asymptotic calculations. The resulting convergence agrees with part (i) of Theorem 1 in [9]. It also follows from Corollary 5 that for the evolving beta coalescent, the evolution of the total branch length, scaled as above, converges in the sense of finite-dimensional distributions to a stationary stable process.
Example 8. Consider also the total length ℓ_n of all external branches in the tree. This quantity is also of interest in biology, as it should be approximately proportional to the number of mutations that appear on just one individual in a sample of n individuals. It is shown in the proof of Theorem 1 in [5] that ℓ_n can be approximated in the required way, so for 1 < α < 2 we can apply Theorem 2 with f(x) = α(α − 1)(2 − α)Γ(α) to get a stable limit, in agreement with Theorem 1 of [5].
Example 9. Finally, we consider the quantity ℓ_n/L_n, which in the biological setting should be approximately equal to the proportion of mutations that appear on only one individual. This ratio is potentially useful for drawing inferences about the genealogy of a population from data, in part because the value that we expect for this ratio does not depend on the mutation rate, which is often unknown. Indeed, it follows from results in [1] that the parameter α in the beta coalescent can be consistently estimated by the quantity 2 − ℓ_n/L_n. Assume that 1 < α < (1 + √5)/2. From the discussion in Examples 7 and 8, we obtain the joint asymptotics of ℓ_n and L_n, and hence the asymptotic distribution of their ratio. Using the substitution y = x^{α−1}, the resulting integral transforms to a beta integral. If (1 + √5)/2 ≤ α < 2, then the fluctuations in L_n are of a higher order of magnitude than the fluctuations of ℓ_n, so the asymptotic distribution of ℓ_n/L_n is determined by the asymptotics of L_n given in Theorem 2 of [5]. In particular, when (1 + √5)/2 < α < 2, the asymptotic distribution of ℓ_n/L_n is no longer a stable law.
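The consistency of the estimator 2 − ℓ_n/L_n from [1] can be illustrated by simulation. The sketch below (our own code, not the paper's) builds the coalescent with exponential holding times, tracks the external branches through the singleton blocks, and lets one compare ℓ_n/L_n with 2 − α.

```python
from math import gamma, comb
import random

def branch_lengths(n, alpha, rng):
    # Simulate one Beta(2-alpha, alpha)-coalescent tree with n leaves;
    # return (total branch length L_n, total external branch length ell_n).
    # External branches are the branches above the singleton blocks.
    B = lambda x, y: gamma(x) * gamma(y) / gamma(x + y)
    blocks = [1] * n
    L = ell = 0.0
    while len(blocks) > 1:
        b = len(blocks)
        rates = [comb(b, k) * B(k - alpha, b - k + alpha) / B(2 - alpha, alpha)
                 for k in range(2, b + 1)]
        dt = rng.expovariate(sum(rates))      # holding time with b blocks
        L += b * dt
        ell += sum(1 for s in blocks if s == 1) * dt
        k = rng.choices(range(2, b + 1), weights=rates)[0]
        idx = set(rng.sample(range(b), k))    # the merging blocks are exchangeable
        merged = sum(blocks[i] for i in idx)
        blocks = [s for i, s in enumerate(blocks) if i not in idx] + [merged]
    return L, ell
```

Averaged over a few runs, ℓ_n/L_n should be close to 2 − α (about 0.5 for α = 1.5), so 2 − ℓ_n/L_n estimates α.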

3 Proofs
Let us remark in advance that for f satisfying (16) with ζ < 2, by linearity we may and will assume the following properties: f(x) ≥ 1 for all x ∈ (0, 1], f(x) is monotonically decreasing, and x² f(x) is monotonically increasing.
Indeed, any f satisfying the condition (16) can be written as f = f_1 − f_2, where f_1 and f_2 fulfil these three requirements. This gives the assertion.
Note also that for f satisfying (16), there is a positive constant c such that |f(x)| ≤ c x^{−ζ} for all x ∈ (0, 1], and therefore ∫_0^1 |f(x)|^α dx < ∞.
To facilitate notation we adopt, here and throughout the rest of the paper, the convention that c > 0 denotes a constant, only dependent on α, which may change its value from term to term.

3.1 The number of blocks for the beta coalescent
We assemble here some results about the evolution of the number of blocks for the beta coalescent. We adopt the notation of Section 2.4, so that τ_n is the total number of mergers and (X_k)_{k=0}^{τ_n} is the block-counting process. Let

Y_k := X_k − X_{k+1}, 0 ≤ k < τ_n,

which are the numbers of blocks lost during the mergers. Define the weights q_i as in (28). The numbers q_i are the weights of a probability distribution on N (see [9]). From Stirling's formula one obtains the asymptotic behavior of the q_i, where the last equality is formula (5) in [9]. See also [3, 6] for a discussion of this probability distribution. It has been known since the work of Bertoin and Le Gall [3] that the distribution of the random variables Y_0, Y_1, . . . is well approximated by (q_i)_{i=1}^∞. The next result gives a bound on the accuracy of this approximation.

Lemma 10.
There is a number c < ∞ such that the bound (29) holds for j ≥ 2 and 1 ≤ i < j. Proof. In the proof of Lemma 3 in [9] (see there the two displayed formulas before (9)) it is shown that there are real numbers b_j such that the corresponding expansion holds for 1 ≤ i < j, and so the claim follows for all sufficiently large j. The other finitely many cases are covered too, if we choose c sufficiently large.
The next two lemmas contain our first applications of these estimates.
From Lemma 10 and (28) we obtain a bound, and consequently, using that τ_n ≤ n − 1, a corresponding estimate. Since α > 1, the series is convergent. Also, as a → ∞ we have η → 1, and the claim follows.
Therefore, using the strong Markov property, we obtain which gives the claim.
The next lemma pertains to the evolution of the number of blocks in continuous time and follows fairly directly from results in [2], where the number of blocks was studied for the beta coalescent started with infinitely many blocks. Recall the definition of m(r) from (5) and let, as above, N_n(r) denote the number of blocks at time r. Proof. Consider a beta coalescent started with infinitely many blocks at time zero, and let N(r) = N_∞(r) denote the number of blocks at time r. Theorem 1.1 of [2] states that lim_{r↓0} r^{1/(α−1)} N(r) = (αΓ(α))^{1/(α−1)} a.s.

3.2 Functionals of the block counting process
We again use the notation of Section 2.4, so that n = X_0 > X_1 > · · · > X_{τ_n} = 1 is the block-counting process of the beta n-coalescent and Y_k = X_k − X_{k+1}. Because of Lemma 10, equation (28), and the fact that α < 2, for k < τ_n we get the required estimate, if only ζ is chosen sufficiently close to 1/α, such that the right-hand series is convergent. Altogether, in view of Lemma 11, we obtain the first assertion. Finally, since ζ < 1/α < 1, it follows from (27) that the remaining bound holds. This gives the second assertion.
Lemma 15. Let f ∈ F. Then for any η > 0 there is an ε > 0 such that the estimate (35) holds for all n ≥ 1. Proof. Suppressing the dependence on n in the notation, note that since (X_k) is a Markov chain and τ_n is a stopping time, the random variables appearing in the sum have zero mean and are uncorrelated. Therefore their variances add. From Lemma 10 and (28) we can bound each variance. Because ζ < 1/α, we have, using (27), a bound on the resulting sum, and it follows by Chebyshev's inequality that if ε is sufficiently small, the desired estimate holds. It remains to replace γ(X_k) by γ in this formula. We get from Lemma 10 and (28) an estimate that is uniform in n and k, and from (28) and (29) another one, where the o(1) goes to 0 as n goes to infinity, uniformly in j. Putting these formulas together we arrive at (36). Because |d/ds f(s)^α| = α f(s)^{α−1} |f′(s)| ≤ c s^{−ζ(α−1)} s^{−ζ−1} = c s^{−αζ−1} and αζ < 1, we may estimate the middle term on the right-hand side of (37) by applying the first statement of Proposition 14.
This implies a corresponding estimate which, combined with (36), implies the result.

3.3 Proof of Theorem 2
Let f ∈ F . We also assume that f (x) is decreasing, x 2 f (x) is increasing and f (x) ≥ 1 for all x; see the remark at the beginning of Section 3.
We have to show the convergence stated in Theorem 2. In view of Proposition 14 it suffices to show (39). Enumerate the points of Θ_n as (x_i, y_i)_{i=1}^∞, and for ε > 0 let S_n(ε) be as in (40). First let us check that S_n(ε) − E(S_n(ε)) → Z_n in probability as ε → 0. For this purpose note from (6) that the relevant integral is finite by our assumptions on f. Thus the sum in (40) has a.s. finitely many summands. Also, for η > 0, letting η → 0 we obtain a bound which goes to 0 as ε → 0. It now follows from Chebyshev's inequality and the fact that E[Z] = 0 that for all η > 0 and sufficiently small ε, P(|Z_n − (S_n(ε) − E(S_n(ε)))| > η) < η.
Because E(S_n(ε)) equals the constant d in (35), it follows from Lemma 15 that for sufficiently small ε the corresponding bound holds for sufficiently large n. Thus, to show (39), it suffices to show the analogous statement for all ε > 0 and sufficiently large n. For this, we will use the following lemmas. Again let Θ_n be the Poisson point process constructed in Section 2.2 from the Poisson point process Ψ_n. Denote the points of Θ_n by (x_i, y_i), where (s_i, u_i) are the points of Ψ_n and r_i = −s_i. Lemma 16. Let δ > 0, ε > 0. With probability tending to 1 as n → ∞, for all i such that f(x_i) y_i > ε, there exists a positive integer k_i such that the following hold: (i) R_{k_i} = n^{1−α} r_i, i.e. at time n^{1−α} r_i there is a merger in the coalescent back from time 0.
(ii) The block size X_{k_i} = N_n(n^{1−α} r_i−) and the merger's size Y_{k_i} = N_n(n^{1−α} r_i−) − N_n(n^{1−α} r_i) fulfil

|X_{k_i} − n x_i| ≤ δ n x_i and |Y_{k_i} − n^{1/α} y_i| ≤ δ n^{1/α} y_i.

Proof. The points (n^{1−α} s_i, n^{−1+1/α} u_i) are points of Υ′. Consider those points which in addition fulfil f(x_i) y_i > ε. First we verify that the probability that some of these points do not belong to Υ_n is asymptotically vanishing. Note that 1 − q(u) ≤ cu for all u ≥ 0. Using this bound when r ≤ n^γ and the bound 1 − q(u) ≤ 1 when r ≥ n^γ, we see that the expected number of indices i with f(x_i) y_i > ε whose points do not belong to Υ_n can be bounded accordingly. Making the substitution x = m(r) and using (27) again, we get that this expression tends to zero as n → ∞ if γ > 0 is sufficiently small. By Markov's inequality, the probability that (n^{1−α} s_i, n^{−1+1/α} u_i) is a point of Υ_n for all i with f(x_i) y_i > ε tends to 1 as n → ∞. Second, we show that the probability that no more than one of the remaining N_n(n^{1−α} r_i−) blocks takes part in a merging event at time n^{1−α} r_i is asymptotically vanishing. We choose a function h : N → (0, ∞) such that lim_{n→∞} h(n) = ∞ and h(n) = o(n^{α−1}). The expected number of indices i such that r_i > h(n) and f(x_i) y_i > ε tends to zero as n → ∞. Thus, we may assume r_i ≤ h(n); this implies, by our assumptions on ζ and h(n), the required bound. Also, because E(S_n(ε)) does not depend on n and is finite, it suffices to show that points (x_i, y_i) with f(x_i) y_i > ε and r_i ≤ h(n) lead to mergers fulfilling condition (ii) with high probability for large n. At time n^{1−α} r_i, the number of blocks of the beta coalescent is reduced by (A_i − 1) ∨ 0, where A_i has a binomial distribution with parameters n_i = N_n(n^{1−α} r_i−) and p_i = n^{−1+1/α} u_i. By Chebyshev's inequality, we obtain the bound (44). Let θ = δ/3.
Lemma 13 implies that, with probability tending to 1 as n → ∞, the block counts are close to their deterministic approximation, and on this event the right-hand side of (44) is, in view of (43), bounded above by a quantity which, due to our assumption on ζ, tends to zero as n → ∞. Taking expectations in (44) gives P(|A_i − n_i p_i| > θ n^{1/α} y_i) = o(1).
Thus, with probability tending to one as n → ∞, the beta coalescent must have a merger at time n^{1−α} r_i, which means n^{1−α} r_i = R_{k_i} for some positive integer k_i. That is, condition (i) in the statement of the lemma holds. Now, because δ = 3θ, condition (ii) is a consequence of Lemma 13 and (46).
Lemma 17. Let δ > 0, ε > 0. With probability going to 1 as n → ∞, the two formulas (47) hold for all 0 ≤ k < τ_n with f(X_k/n) Y_k > ε n^{1/α}, where i_k is determined by X_k = N_n(n^{1−α} r_{i_k}−) or Y_k = N_n(n^{1−α} r_{i_k}−) − N_n(n^{1−α} r_{i_k}).
Proof. First we estimate the expectation of

S_n(ε, γ) := n^{−1/α} ∑_{k<τ_n} f(X_k/n) Y_k 1_{{f(X_k/n) Y_k > ε n^{1/α}}} 1_{{X_k ≤ γn}}

with 0 < γ ≤ 1. From Lemma 10 and (28), and since f is assumed to be decreasing, E(S_n(ε, γ)) can be bounded by an integral expression. By the assumptions on f, the integral is finite. For γ = 1 we see that E(S_n(ε, γ)) is uniformly bounded in n. Since each positive summand of S_n(ε, γ) is bigger than ε, it follows that the number of positive summands is stochastically bounded. Therefore it is sufficient to verify that for any 0 ≤ k < τ_n with f(X_k/n) Y_k > ε n^{1/α} the two formulas (47) hold with probability going to 1, uniformly in k.
Let θ > 0. Then there is a γ > 0 such that E(S_n(ε, γ)) ≤ θε. Therefore, with probability at least 1 − θ, we have X_k ≥ γn for all k such that f(X_k/n) Y_k > ε n^{1/α}. In view of Lemma 13, since N_n(r) is decreasing, this implies that with probability going to 1 we have (1 + δ) m(r_{i_k}) ≥ γ, which implies that r_{i_k} ≤ T for some fixed constant T < ∞. Furthermore, X_k ≥ γn and f(X_k/n) Y_k > ε n^{1/α} imply Y_k > c n^{1/α} with c = ε/ sup_{x≥γ} f(x) > 0. Let n_k = N_n(n^{1−α} r_{i_k}−). Since r_{i_k} ≤ T, by Lemma 13 with probability going to 1,

|n_k p_{i_k} − n^{1/α} y_{i_k}| = |n^{−1} N_n(n^{1−α} r_{i_k}−) − m(r_{i_k})| n^{1/α} u_{i_k} ≤ (δ/3) n^{1/α} y_{i_k}.
Since Y_k = (A_k − 1) ∨ 0, where A_k is binomial with parameters n_k and p_{i_k}, it follows that

P(Y_k > c n^{1/α}, |Y_k − n^{1/α} y_{i_k}| > δ y_{i_k} n^{1/α} | n_k, p_{i_k}, y_{i_k}) ≤ P(Y_k > 0, |A_k − n_k p_{i_k}| > (δ/2) y_{i_k} n^{1/α} | n_k, p_{i_k}, y_{i_k}) 1_{{y_{i_k} > n^{−η}}} + P(A_k > c n^{1/α} | n_k, p_{i_k}, y_{i_k}) 1_{{y_{i_k} ≤ n^{−η}}}.
Since θ was arbitrary, this gives our assertions.