On the total external length of the Kingman coalescent

In this paper we prove asymptotic normality of the total length of external branches in Kingman's coalescent. The proof uses an embedded Markov chain, which can be described as follows: Take an urn with n black balls. Empty it in n steps according to the rule: In each step remove a randomly chosen pair of balls and replace it by one red ball. Finally remove the last remaining ball. Then the numbers $U_k$, $0 \le k \le n$, of red balls after $k$ steps exhibit an unexpected property: $(U_0, \ldots, U_n)$ and $(U_n, \ldots, U_0)$ are equal in distribution.


Introduction and results
Our main result in this paper is that the total length $L_n$ of all external branches in Kingman's coalescent with $n$ external branches is asymptotically normal as $n \to \infty$.
As is customary the coalescent can be represented by a tree with $n$ leaves labelled from 1 to $n$. Each of these leaves corresponds to an external branch of the tree. The other node of the branch with label $i$ is located at level $\rho(i) := \max\{k \ge 1 : \{i\} \notin \pi_k\}$ within the coalescent, and the length of this branch is $T_{\rho(i)}$, where $T_k$ denotes the time elapsed until only $k$ branches are left. The total external length of the coalescent is given by $$L_n = \sum_{i=1}^{n} T_{\rho(i)}.$$ This quantity is of a certain statistical interest. Coalescent trees have been introduced by Kingman (1982) as a model for the genealogy of $n$ individuals, down to their most recent common ancestor. Mutations can be located everywhere on the branches; mutations on external branches then affect only single individuals. This fact was used by Fu and Li (1993) in designing their D-statistic, which provides a test of whether or not data fit Kingman's coalescent.
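Although the paper contains no code, the objects just defined are easy to simulate. The following sketch is our own (not from the paper): it generates realizations of $L_n$ by running the coalescent level by level and compares the sample mean with the known value $E(L_n) = 2$ from Fu and Li's formula. Since the branches are exchangeable, it suffices to keep a count of external branches and treat the indices $0, \ldots, \mathrm{ext}-1$ as external when drawing the coalescing pair.

```python
import random

def total_external_length(n, rng):
    """One realization of L_n for Kingman's coalescent with n leaves."""
    ext = n          # number of external branches among the current branches
    t = 0.0          # time elapsed since the leaves (level n)
    length = 0.0
    for k in range(n, 1, -1):                    # k = current number of branches
        t += rng.expovariate(k * (k - 1) / 2)    # exponential waiting time at level k
        i, j = rng.sample(range(k), 2)           # the pair chosen to coalesce
        merged_ext = (i < ext) + (j < ext)       # external branches in the pair
        length += merged_ext * t                 # each ending external branch has length t
        ext -= merged_ext                        # the merged pair becomes one internal branch
    return length

rng = random.Random(1)
sims = [total_external_length(50, rng) for _ in range(4000)]
print(sum(sims) / len(sims))   # sample mean, close to E(L_n) = 2
```

An external branch that coalesces when $k-1$ branches remain has age exactly $t = T_{k-1}$, which is why the elapsed time can be added directly.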
Elsewhere the total external length of coalescents has been studied by Möhle (2010). He obtained results on the asymptotic distribution for a class of coalescents which differ substantially from Kingman's coalescent; it includes the so-called Beta$(2-\alpha, \alpha)$-coalescents with $0 < \alpha < 1$. For $1 < \alpha < 2$ Berestycki et al. (2006) proved a law of large numbers (see the quantity $M_1(n)$ in their Theorem 9); a more general result is contained in Berestycki et al. (2011). Apart from this, single external branches have been investigated in the literature. The asymptotic distribution of $T_{\rho(i)}$ has been obtained by Caliebe et al. (2007), using a representation of its Laplace transform due to Blum and François (2005). Here is our main result.

Theorem 1.
As $n \to \infty$, $$\sqrt{\frac{n}{\log n}}\,\bigl(L_n - 2\bigr) \xrightarrow{d} N(0, 4).$$

Here $\xrightarrow{d}$ denotes convergence in distribution. The proof will show that the limiting normal distribution originates from the random partitions and not from the exponential waiting times.
A second glance at this result reveals a peculiarity: the normalization of $L_n$ is carried out using its expectation, but only half of its variance. These two quantities have been determined by Fu and Li (1993) (with a correction given by Durrett (2002)). They obtained $E(L_n) = 2$ and an exact expression for $\operatorname{Var}(L_n)$ in terms of $h_n := 1 + \frac12 + \cdots + \frac1n$, the $n$-th harmonic number, with the asymptotics $\operatorname{Var}(L_n) \sim 8 \log n / n$. Below we derive a more general result. To uncover the peculiarity we shall study the external lengths in more detail. First we look at the point processes $\eta_n$ on $(0, \infty)$, given by $$\eta_n(B) := \#\bigl\{1 \le i \le n : \sqrt{n}\,T_{\rho(i)} \in B\bigr\}$$ for Borel sets $B \subseteq (0, \infty)$.

Theorem 2.
As $n \to \infty$ the point process $\eta_n$ converges in distribution, as a point process on $(0, \infty]$, to a Poisson point process $\eta$ with intensity measure $\lambda(dx) = 8x^{-3}\,dx$.

We use $(0, \infty]$ in the statement of Theorem 2 instead of $(0, \infty)$ since this is the stronger assertion, including for example sets of the form $(a, \infty]$. The significance is that, as $n \to \infty$, there will be points clustering at 0 but not at $\infty$. (Below in the proof we recall the definition of convergence in distribution of point processes.) It is not evident whether there exists a connection to the Poisson point processes introduced by Pitman (1999) for the construction of coalescent processes.
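Theorem 2 can be probed by simulation. The sketch below is our own code (the intensity $8x^{-3}$ is the one appearing in the proof in Section 4): it collects the points $\sqrt{n}\,T_{\rho(i)}$ of $\eta_n$ and compares the mean number of points in $[a, b)$ with $\lambda([a, b)) = \int_a^b 8x^{-3}\,dx = 4(a^{-2} - b^{-2})$. For moderate $n$ a visible finite-size bias remains, so the agreement is only approximate.

```python
import random

def eta_points(n, rng):
    """Points sqrt(n) * T_rho(i) of the point process eta_n, for one coalescent."""
    ext, t, pts = n, 0.0, []
    s = n ** 0.5
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
        i, j = rng.sample(range(k), 2)
        merged_ext = (i < ext) + (j < ext)
        pts.extend([s * t] * merged_ext)   # each merged external branch has length t
        ext -= merged_ext
    return pts

def intensity(a, b):
    """lambda([a, b)) for the limiting Poisson process, lambda(dx) = 8 x^(-3) dx."""
    return 4.0 * (a ** -2 - b ** -2)

rng = random.Random(2)
a, b, n, reps = 1.0, 2.0, 2000, 300
mean_count = sum(sum(1 for x in eta_points(n, rng) if a <= x < b)
                 for _ in range(reps)) / reps
print(mean_count, intensity(a, b))   # empirical mean vs. limiting value 3
```

The names `eta_points` and `intensity` are ours; the simulation itself only uses the coalescent dynamics described in the introduction.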
Theorem 2 permits a first orientation. Since $\sqrt{n}\,L_n = \int x\,\eta_n(dx)$, one is tempted to resort to infinitely divisible distributions. However, the intensity measure $\lambda(dx)$ is slightly outside the range of the Lévy–Khinchin formula. Roughly speaking this means that the small points of $\eta_n$ have a dominant influence on the distribution of $L_n$, and we are within the domain of the normal distribution.
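The sense in which $\lambda$ falls outside the Lévy–Khinchin framework can be made explicit by one line of calculus (our remark, using the intensity $\lambda(dx) = 8x^{-3}\,dx$ from Section 4): the integrability condition on the small points fails, but only logarithmically,

```latex
\int_0^1 x^2\,\lambda(dx) \;=\; \int_0^1 x^2 \cdot 8x^{-3}\,dx \;=\; 8\int_0^1 \frac{dx}{x} \;=\; \infty .
```

It is precisely this logarithmic divergence that produces the $\log n$ in the normalization of $L_n$.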
Thus let us look in more detail at the external lengths and focus on $$L_n^{\alpha,\beta} := \sum_{i\,:\,n^{\alpha} \le \rho(i) < n^{\beta}} T_{\rho(i)}, \qquad 0 \le \alpha < \beta \le 1,$$ which is the total length of those external branches having their internal nodes between level $n^{\alpha}$ and $n^{\beta}$ within the coalescent. Obviously $L_n = L_n^{0,1}$.
In this decomposition the summands are of order $1/n$ and $\log n/n$, such that in the limit the second, asymptotically normal component dominates. To this end, however, $n$ has to become exponentially large; otherwise the few long branches, which make up $L_n^{0,1/2}$, cannot be neglected and may produce extraordinarily large values of $L_n$. Thus the normal approximation for the distribution of $L_n$ seems of little use for practical purposes. One expects a fat right tail compared to the normal distribution. Indeed $\int_2^{\infty} x\,\eta(dx)$ has finite mean but infinite variance. This is illustrated by the following two histograms of 10000 simulated values of $L_n$, where the length of the horizontal axis to the right indicates the range of the values. The heavy tails to the right are clearly visible. Also very large outliers appear: for $n = 50$ the simulated values of $L_n$ range from 0.685 to 8.38, and for $n = 1000$ from 1.57 to 7.87.
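The statement about $\int_2^\infty x\,\eta(dx)$ follows from Campbell's theorem for Poisson processes (a two-line check, again with the intensity $\lambda(dx) = 8x^{-3}\,dx$ from Section 4):

```latex
E\Bigl(\int_2^\infty x\,\eta(dx)\Bigr) = \int_2^\infty x \cdot 8x^{-3}\,dx = 4 ,
\qquad
\operatorname{Var}\Bigl(\int_2^\infty x\,\eta(dx)\Bigr) = \int_2^\infty x^2 \cdot 8x^{-3}\,dx = 8\int_2^\infty \frac{dx}{x} = \infty .
```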
It also turns out that the approximation of the variance in Proposition 3 is good only for very large $n$. This can be seen already from the formula of Fu and Li. To get an exact formula for the variance we look at a somewhat different quantity, namely $\hat L_n^{\alpha,\beta}$ with $0 \le \alpha < \beta \le 1$, the portion of the external length between level $n^{\alpha}$ and $n^{\beta}$ within the coalescent.
For $\alpha = 0$ we recover the formula of Fu and Li. A similar expression holds for $\hat L_n^{\alpha,\beta}$. Proposition 3 and Theorem 4 carry over to $\hat L_n^{\alpha,\beta}$, up to a change in expectation and in the limit. The following histogram of a random sample of size 10000 shows that already for $n = 50$ the distribution of $\hat L_n^{1/2,1}$ fits the normal distribution well, when using the values for expectation and variance given in Proposition 5.

Our main tool for the proofs is a representation of $L_n$ by means of an imbedded Markov chain $U_0, U_1, \ldots, U_n$, which is of interest in its own right. We shall introduce it as an urn model. The relevant fact is that this model possesses an unexpected hidden symmetry, namely it is reversible in time. This is our second main result. For the proof we use another urn model, which allows reversal of time in a simple manner.
The urn models are introduced and studied in Section 2. Proposition 3 is proven in Section 3, Theorems 2 and 4 are derived in Section 4 and Proposition 5 in Section 5.

The urn models
Take an urn with $n$ black balls. Empty it in $n$ steps according to the rule: In each step remove a randomly chosen pair of balls and replace it by one red ball. In the last step remove the last remaining ball. Let $U_k :=$ number of red balls in the urn after $k$ steps.
Then $U_0, U_1, \ldots, U_n$ is a time-inhomogeneous Markov chain: before step $k+1$ the urn contains $n-k$ balls, $U_k$ of them red, so that for $0 \le k \le n-2$ $$P(U_{k+1} = u+1 \mid U_k = u) = \frac{(n-k-u)(n-k-u-1)}{(n-k)(n-k-1)}, \qquad P(U_{k+1} = u-1 \mid U_k = u) = \frac{u(u-1)}{(n-k)(n-k-1)},$$ and $P(U_{k+1} = u \mid U_k = u) = \frac{2u(n-k-u)}{(n-k)(n-k-1)}$, while $U_n = 0$. We begin our study of the model by calculating expectations and covariances.
Proof. Imagine that the black balls are numbered from 1 to $n$, and let $Z_{ik}$ be the indicator variable of the event that the black ball with number $i$ is not yet removed after $k$ steps. Since in step $m$ a random pair out of the $n-m+1$ remaining balls is removed, $$E(Z_{ik}) = \prod_{m=1}^{k} \Bigl(1 - \frac{2}{n-m+1}\Bigr) = \frac{(n-k)(n-k-1)}{n(n-1)}.$$ Our claim now follows by careful calculation.
Note that these expressions for expectations and covariances are invariant under the transformation $k \to n-k$, $l \to n-l$. This is not by coincidence: it turns out that for this process one can specify different dynamics, which are more lucid and amenable to reversing time.
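These invariances can be checked mechanically. The following sketch (our code) computes the exact distribution of the whole path $(U_0, \ldots, U_n)$ for a small $n$ with rational arithmetic, and verifies the expectation formula $E(U_k) = k(n-k)/(n-1)$ (equivalent to the identity $k - E(V_k) = k(k-1)/(n-1)$ used in Section 4), the invariance of the covariances under $k \to n-k$, $l \to n-l$, and the time reversibility stated in the abstract.

```python
from fractions import Fraction
from collections import defaultdict

def step_probs(n, k, u):
    """Distribution of U_{k+1} given U_k = u; before step k+1 the urn holds n-k balls."""
    balls = n - k
    black = balls - u
    denom = balls * (balls - 1)
    return {u + 1: Fraction(black * (black - 1), denom),   # a black pair is removed
            u:     Fraction(2 * u * black, denom),         # a mixed pair is removed
            u - 1: Fraction(u * (u - 1), denom)}           # a red pair is removed

def path_distribution(n):
    """Exact joint distribution of (U_0, ..., U_n)."""
    dist = {(0,): Fraction(1)}
    for k in range(n - 1):                       # steps 1, ..., n-1 each remove a pair
        new = defaultdict(Fraction)
        for path, p in dist.items():
            for u, q in step_probs(n, k, path[-1]).items():
                if q:
                    new[path + (u,)] += p * q
        dist = dict(new)
    # the last step removes the final (necessarily red) ball, so U_n = 0
    return {path + (0,): p for path, p in dist.items()}

n = 6
dist = path_distribution(n)

def mean(k):
    return sum(p * path[k] for path, p in dist.items())

def cov(k, l):
    m = sum(p * path[k] * path[l] for path, p in dist.items())
    return m - mean(k) * mean(l)

assert all(mean(k) == Fraction(k * (n - k), n - 1) for k in range(n + 1))
reversed_dist = defaultdict(Fraction)
for path, p in dist.items():
    reversed_dist[path[::-1]] += p
assert dict(reversed_dist) == dist          # (U_0,...,U_n) =d (U_n,...,U_0)
assert all(cov(k, l) == cov(n - l, n - k) for k in range(n + 1) for l in range(n + 1))
print("expectations, covariances and reversibility confirmed for n =", n)
```

Note that the final ball is necessarily red, since black balls only disappear in the pair removals, so $U_{n-1} = 1$ and $U_n = 0$ deterministically.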
Consider the following alternative box scheme: There are two boxes A and B. At the beginning A contains n − 1 black balls whereas B is empty. The balls are converted in 2n − 2 steps into n − 1 red balls lying in B. Namely, in steps number 1, 3, . . . , 2n − 3 a randomly drawn ball from A is shifted to B and in steps number 2, 4, . . . , 2n − 2 a randomly chosen black ball (whether from A or B) is recolored to a red ball. These 2n − 2 operations are carried out independently.
For $1 \le k \le n-1$ let $\bar U_k :=$ number of red balls in box A after $2k-1$ steps, that is, at the moment after the $k$-th move and before the $k$-th recoloring. Obviously the sequence is a Markov chain, with $\bar U_1 = 0$.
As to the transition probabilities, note that after $2k-1$ steps there are $n-k$ black balls in all and $n-k-1$ balls in A. Thus given $\bar U_k = r$ there are $r$ red and $n-k-r-1$ black balls in A, and the remaining $r+1$ black balls belong to B. Then $\bar U_{k+1} = r+1$ occurs only if in the next step the ball recolored from black to red belongs to A and subsequently the ball shifted from A to B is black. Thus $$P(\bar U_{k+1} = r+1 \mid \bar U_k = r) = \frac{(n-k-r-1)(n-k-r-2)}{(n-k)(n-k-1)}.$$ Similarly $\bar U_{k+1} = r-1$ occurs if the recolored ball belongs to B and next the ball shifted from A to B is red. The corresponding probability is $$P(\bar U_{k+1} = r-1 \mid \bar U_k = r) = \frac{r(r+1)}{(n-k)(n-k-1)}.$$ Since $U_1 = 1 = \bar U_1 + 1$, and in view of the transition probabilities of $(U_k)$ and $(\bar U_k)$, we see that $(U_1, \ldots, U_{n-1})$ and $(\bar U_1 + 1, \ldots, \bar U_{n-1} + 1)$ indeed coincide in distribution.
Next note that $\bar U_{n-1} = 0$. Therefore $\bar U_k$ can be considered as a function not only of the first $2k-1$ but also of the last $2n-2k-1$ shifting and recoloring steps. Since the steps are independent, the process backwards is equally easy to handle. Taking into account that backwards the order of moving and recoloring balls is interchanged, one may just repeat the calculations above to obtain reversibility.
But this repetition can be avoided as well. Let us put our model more formally: label the balls from 1 to $n-1$ and write the state space as $\{A, B\}^{n-1} \times \{b, r\}^{n-1}$, with elements $((L_1, \ldots, L_{n-1}), (c_1, \ldots, c_{n-1}))$, where $L_i$ is the location of ball $i$ and $c_i$ its color. Then in our model the first and the second coordinates are changed in turn from A to B and from b to r. This is done completely at random, starting with the first coordinates. Clearly we may interchange the roles of the first and the second coordinates. Thus our box model is equivalent to the following version: again initially A contains $n-1$ black balls whereas B is empty. Now in steps number $1, 3, \ldots, 2n-3$ a randomly chosen black ball is recolored to a red ball, and in steps number $2, 4, \ldots, 2n-2$ a randomly drawn ball from A is shifted to B. Again these $2n-2$ operations are carried out independently. Here we consider $\tilde U_k :=$ number of black balls in box B after $2k-1$ steps.
Then from the observed symmetry it is clear that $(\bar U_1, \ldots, \bar U_{n-1})$ and $(\tilde U_1, \ldots, \tilde U_{n-1})$ are equal in distribution.
If we finally interchange both colors and boxes as well, then we arrive at the dynamics of the backward process. This finishes the proof.
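The equivalence of the urn and the box scheme asserted in the proof can also be verified numerically. The sketch below (our code) propagates the exact distribution of the box contents (only the four counts black/red in A/B matter, which keeps the state space tiny), reads off the law of the number of red balls in A after $2k-1$ steps, shifts it by one, and compares it with the law of $U_k$ in the urn.

```python
from fractions import Fraction
from collections import defaultdict

n = 6  # urn with n balls; the box scheme uses n - 1 balls

def urn_marginals(n):
    """out[k] = exact law of U_k in the urn, for 0 <= k <= n-1."""
    dist = {0: Fraction(1)}
    out = [dist]
    for k in range(n - 1):
        balls = n - k
        new = defaultdict(Fraction)
        for u, p in dist.items():
            black = balls - u
            denom = balls * (balls - 1)
            for v, w in ((u + 1, black * (black - 1)),
                         (u, 2 * u * black),
                         (u - 1, u * (u - 1))):
                if w:
                    new[v] += p * Fraction(w, denom)
        dist = dict(new)
        out.append(dist)
    return out

def box_marginals(n):
    """laws[k] = exact law of the red balls in box A after 2k-1 steps."""
    state = {(n - 1, 0, 0, 0): Fraction(1)}   # (black in A, red in A, black in B, red in B)
    laws = [None]
    for k in range(1, n):
        # odd step 2k-1: shift a uniformly chosen ball from A to B
        new = defaultdict(Fraction)
        for (ba, ra, bb, rb), p in state.items():
            tot = ba + ra
            if ba:
                new[(ba - 1, ra, bb + 1, rb)] += p * Fraction(ba, tot)
            if ra:
                new[(ba, ra - 1, bb, rb + 1)] += p * Fraction(ra, tot)
        state = dict(new)
        law = defaultdict(Fraction)
        for (ba, ra, bb, rb), p in state.items():
            law[ra] += p
        laws.append(dict(law))
        if k == n - 1:
            break
        # even step 2k: recolor a uniformly chosen black ball (in A or B)
        new = defaultdict(Fraction)
        for (ba, ra, bb, rb), p in state.items():
            blacks = ba + bb
            if ba:
                new[(ba - 1, ra + 1, bb, rb)] += p * Fraction(ba, blacks)
            if bb:
                new[(ba, ra, bb - 1, rb + 1)] += p * Fraction(bb, blacks)
        state = dict(new)
    return laws

urn = urn_marginals(n)
box = box_marginals(n)
for k in range(1, n):
    shifted = {u + 1: p for u, p in box[k].items()}
    assert shifted == urn[k], k
print("box scheme (shifted by one) matches the urn for n =", n)
```

This checks the one-dimensional marginals; the joint equality in distribution is the content of the proof above.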
There is a variant of our proof which makes the reversibility of $(\bar U_k)$ manifest in a different manner. Let again the balls be labelled from 1 to $n-1$, and denote by $\nu_m$ the instance between 1 and $n-1$ at which ball $m$ is colored red, and by $\sigma_m$ the instance between 1 and $n-1$ at which ball $m$ is shifted to box B.
Then from our construction it is clear that $\nu = (\nu_m)$ and $\sigma = (\sigma_m)$ are two independent random permutations of the numbers $\{1, \ldots, n-1\}$. Moreover, at instance $k$ (i.e. after $2k-1$ steps) ball number $m$ is red and belongs to box A if it was colored before and shifted afterwards, i.e. $\nu_m < k < \sigma_m$. Thus we obtain the formula $$\bar U_k = \#\{1 \le m \le n-1 : \nu_m < k < \sigma_m\},$$ and we may conclude the following result.
Certainly this representation implies Theorem 7 again. It also contains additional information: for example, it is immediate that $U_k - 1$ has a hypergeometric distribution with parameters $n-1$, $k-1$ and $n-k-1$.
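The hypergeometric claim is easy to confirm by brute force for a small case (our code; it enumerates all pairs of permutations $\nu, \sigma$ of $\{1, \ldots, n-1\}$ and tallies $\#\{m : \nu_m < k < \sigma_m\}$):

```python
from itertools import permutations
from math import comb
from fractions import Fraction
from collections import Counter

n, k = 6, 3
N = n - 1                        # number of balls in the box scheme
perms = list(permutations(range(1, N + 1)))
counts = Counter()
for nu in perms:                 # nu[m]: instance at which ball m is colored red
    for sigma in perms:          # sigma[m]: instance at which ball m is shifted to B
        counts[sum(nu[m] < k < sigma[m] for m in range(N))] += 1
total = len(perms) ** 2          # all (nu, sigma) pairs are equally likely

# hypergeometric: population N = n-1, K = k-1 balls colored before instance k,
# sample size n-k-1 balls shifted after instance k
support = range(min(k - 1, n - k - 1) + 1)
for u in support:
    pmf = Fraction(comb(k - 1, u) * comb(N - (k - 1), (n - k - 1) - u),
                   comb(N, n - k - 1))
    assert Fraction(counts[u], total) == pmf
assert set(counts) <= set(support)
print("hypergeometric(n-1, k-1, n-k-1) confirmed for n =", n, "and k =", k)
```

The check works because the $k-1$ balls colored before instance $k$ form a uniform subset, independent of the uniform subset of $n-k-1$ balls shifted after instance $k$, and the intersection size of two independent uniform subsets is hypergeometric.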
One might think of applying similar diminishing urn schemes to other coalescent processes. However, reversibility will hardly be preserved. For related urn models compare the sock-sorting process studied in Steinsaltz (1999) and Janson (2009), Section 8.
We conclude this section by imbedding our urn model into the coalescent. Let $$V_k := k - \#\{1 \le i \le n : \rho(i) < k\}$$ and $U_k := V_{n-k}$, $0 \le k \le n$. Thus $V_k$ is the number of internal branches among the $k$ branches after the $(n-k)$-th coalescing event, and $U_k$ is the number of internal branches among the $n-k$ branches after the $k$-th coalescing event. The coalescing mechanism takes two random branches and combines them into one internal branch. If we code the external branches by black balls and the internal branches by red ones, this completely conforms to our urn model; thus $(U_0, \ldots, U_n)$ is distributed as above. By Theorem 7, $(V_0, \ldots, V_n)$ has the same distribution as $(U_0, \ldots, U_n)$. In the next sections we make use of the Markov chain $V_0, \ldots, V_n$ and its properties.

Proof of Proposition 3
We use the representation $$L_n = \sum_{k=1}^{n-1} X_k T_k, \qquad X_k := \#\{1 \le i \le n : \rho(i) = k\}.$$ In view of the coalescing procedure $X_k$ takes only the values 0, 1, 2, and from the definition $X_k = (k+1-V_{k+1}) - (k - V_k) = V_k - V_{k+1} + 1$. From (4), the identity $V_k = U_{n-k}$ and Proposition 6 we obtain after simple calculations explicit expressions for the means, and for $k < l$ covariance bounds with a suitable $c > 0$, independent of $n$.
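This representation can be sanity-checked directly: taking expectations and using the independence of the partition chain and the waiting times gives $E(L_n) = \sum_k E(X_k)\,E(T_k)$, with $E(X_k) = E(V_k) - E(V_{k+1}) + 1 = 2k/(n-1)$ from Proposition 6, and $E(T_k) = 2(1/k - 1/n)$ under the convention that $T_k$ is the time until only $k$ branches are left. The small script below (our code) confirms that this sum collapses to the Fu–Li value $E(L_n) = 2$ exactly, for every $n$:

```python
from fractions import Fraction

def expected_total_external_length(n):
    """E(L_n) = sum_k E(X_k) E(T_k), computed exactly with rationals."""
    total = Fraction(0)
    for k in range(1, n):
        e_x = Fraction(2 * k, n - 1)                   # expected number of external
                                                       # branches ending at level k
        e_t = 2 * (Fraction(1, k) - Fraction(1, n))    # expected length of such a branch
        total += e_x * e_t
    return total

for n in (2, 3, 10, 137):
    assert expected_total_external_length(n) == 2
print("E(L_n) = 2, exactly, for all tested n")
```

The cancellation is clean because $e_x \cdot e_t = \frac{4}{n-1}\bigl(1 - \frac{k}{n}\bigr)$ sums telescopically to 2.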
Thus, using the independence of the coalescing events and the waiting times, the first claim follows by a simple computation.

Further, by the same independence, the variance decomposes accordingly. Using (5)–(7) we obtain bounds for the covariance terms with $k < l$, and it follows that their contribution is of lower order. Consequently (8) yields, using again (5)–(7), the asymptotics of the variance. It remains to control the resulting remainder term, which gives our claim.

Proof of Theorems 2 and 4
In this section we use Theorem 7. Namely, $V_0, \ldots, V_n$ is a Markov chain with transition probabilities which can be expressed by means of $X_1, \ldots, X_{n-1}$. We would like to couple these random variables with suitable independent random variables taking values 0 or 1. Note that $V_k$ takes only values $v \le k$; thus for $k \le n/3$ the relevant conditional probabilities do not exceed 1, and we may enlarge our model by means of random variables $Y_k$, $k \le n/3$, with $P(Y_k = 1) = 2k/(n-k)$, such that for $P(X_k = x \mid V_k = v)$ this gives the above formula, whereas the 0/1-valued random variables $Y_k$, $k \le n/3$, are independent. For convenience we put $Y_k = 0$ for $k > n/3$. A straightforward computation gives an estimate of $P(X_k \ne Y_k)$ for $k \le n/3$; since $k - E(V_k) = k(k-1)/(n-1)$ from Proposition 6, the coupling error is under control.

Proof of Theorem 2. Recall that, by (1) and (4), $L_n = \sum_k X_k T_k$, and recall also the definition of $\eta_n$. Let $\bar\eta_n$ denote the point process obtained by replacing the $X_k$ by the $Y_k$ and the points $\sqrt{n}\,T_k$ by $2\sqrt{n}/k$. For fixed $0 < a < b \le \infty$ we obtain from standard results on sums of independent 0/1-valued random variables that $\bar\eta_n([a, b))$ has asymptotically a Poisson distribution; also $\bar\eta_n(B_1), \ldots, \bar\eta_n(B_i)$ are independent for disjoint $B_1, \ldots, B_i$. Therefore we obtain from standard results on point processes (for example Kallenberg (2002), Proposition 16.17) weak convergence of $\bar\eta_n$ to the Poisson point process $\eta$ on $(0, \infty]$ with intensity $8x^{-3}\,dx$. Next we prove that for all $0 < a < b \le \infty$, $\eta_n([a, b)) - \bar\eta_n([a, b)) \to 0$ in probability. To this end note from (12) that $P(X_k = Y_k$ for all $k \le 2\sqrt{n}/a) \to 1$. Therefore we may well replace $Y_k$ by $X_k$ in $\bar\eta_n([a, b))$.
Also, by (7), $\sqrt{n}\,T_k - 2\sqrt{n}/k = \sqrt{n}\,(T_k - E(T_k)) - 2/\sqrt{n}$. From (7) and Doob's inequality it follows for any $\epsilon > 0$ that the maximum of these fluctuations over the relevant range of $k$ exceeds $\epsilon$ only with vanishing probability. Since $P(Y_k = 0$ for all $k < n^{2/5}) \to 1$, we may as well also replace $2\sqrt{n}/k$ by $\sqrt{n}\,T_k$ in $\bar\eta_n$, which yields $\eta_n$ by (13) and (14); for $\beta < 1/2$ the remaining error term converges to zero, which gives the first claim of the theorem.
For the next claim we use that, because of (7), $\sqrt{n}\,T_{n^{1/2}}$ has expectation $2 + O(n^{-1/2})$ and variance of order $n^{-1/2}$. Thus $P(2 - \epsilon < \sqrt{n}\,T_{n^{1/2}} < 2 + \epsilon) \to 1$ for all $\epsilon > 0$. This implies that the probability of the event in question goes to 1. Also, for $a > 0$, Theorem 2 identifies the limit, which gives our second claim.
As to the last claim of Theorem 4, we note from (9) that the remainder term is of order $O(n^{-1/2})$ in the $L^1$-norm. In this representation we would like to replace $X_k$ by $Y_k$. We assume first $\beta < 1$. For $\beta < 1$, in view of (7) and (12), and from (10), (7) and Proposition 6, the variances of the replacement errors are negligible, and (15) yields the desired approximation. Also $\operatorname{Var}\bigl(\frac{1}{n} \sum_{n^{\alpha} \le k < n^{\beta}} Y_k\bigr) \le n^{-2} \sum_{n^{\alpha} \le k < n^{\beta}} 2k/(n-k) = O(n^{-1})$, and because of (7) we end up with a representation of the external length by a sum of independent random variables. Now $\operatorname{Var}(Y_k) = \frac{2k}{n-k} - \frac{4k^2}{(n-k)^2}$; summing these variances for $\beta < 1$, and then specializing to $\alpha \ge 1/2$, gives the asserted normal limit. This finishes the proof in the case $\beta < 1$, using Proposition 3.
The last claim on asymptotic independence follows from (16), too.

Proof of Proposition 5
Let $0 \le \alpha \le 1$ and $m = n^{\alpha}$. Since $k - V_k = \#\{i : \rho(i) < k\}$ is the number of external branches found between level $k-1$ and $k$, the expectation can be computed levelwise; from the independence of the partition chain and the waiting times this gives the first claim. Next, the second moment can be decomposed in the same manner, and combining our formulas the result follows.