Some large deviations in Kingman's coalescent

Kingman's coalescent is a random tree that arises from classical population genetic models such as the Moran model. The individuals alive in these models correspond to the leaves in the tree, and the following two laws of large numbers concerning the structure of the tree-top are well-known: (i) The (shortest) distance, denoted by $T_n$, from the tree-top to the level at which there are $n$ lines in the tree satisfies $nT_n \xrightarrow{n\to\infty} 2$ almost surely; (ii) At time $T_n$, the population is naturally partitioned into exactly $n$ families, where individuals belong to the same family if they have a common ancestor at time $T_n$ in the past. If $F_{i,n}$ denotes the size of the $i$th family, then $n(F_{1,n}^2 + \cdots + F_{n,n}^2) \xrightarrow{n\to \infty}2$ almost surely. For both laws of large numbers we prove corresponding large deviations results. For (i), the rate of the large deviations is $n$ and we give the rate function explicitly. For (ii), the rate is $n$ for downward deviations and $\sqrt n$ for upward deviations. In both cases we give the exact rate function.


Introduction
Kingman's coalescent is a random tree introduced by Kingman (1982) as the genealogy arising in large population genetic models. It has infinitely many leaves and is usually constructed from the leaves to the root as follows: given that there are $k$ lines in the tree, after an exponential time with rate $\binom{k}{2}$, two lines are chosen uniformly at random and merged into one line, leaving the tree with $k-1$ lines. Due to the quadratic rate $\binom{k}{2}$ the tree immediately comes down from infinitely many to finitely many leaves (Donnelly, 1991). Since the seminal paper by Pitman (1999) this random tree has been generalized to other infinite trees arising in population genetics models.
For the Kingman coalescent some laws of large numbers and central limit theorems have been proved. They are nicely summarized in Aldous (1999), Chapter 4.2; see also Proposition 2.1 below. For $\varepsilon > 0$ let $N_\varepsilon$ denote the number of lines at time $\varepsilon$ in the past. Then, since the Kingman coalescent immediately comes down from infinity, $N_\varepsilon$ is finite. Furthermore, it is approximately $2/\varepsilon$. Equivalently, the time $T_n$ it takes the coalescent to come down from infinitely many lines to $n$ lines is approximately $2/n$ for large $n$. Turning to the fine structure, at time $T_n$ the infinite population is decomposed into $n$ families (whose joint distribution is exchangeable): every leaf in the tree belongs to exactly one of the $n$ families, whose frequencies are denoted by $F_{1,n}, \ldots, F_{n,n}$. It is known that for large $n$ a randomly chosen $F_{i,n}$ is approximately exponentially distributed with mean $1/n$. This translates into several laws of large numbers; see e.g. (35) in Aldous (1999).
In particular, the probability of picking (from the initial infinite population) two leaves that belong to the same family, given by $F_{1,n}^2 + \cdots + F_{n,n}^2$, is approximately $2/n$.
The main goal of the present paper is to study the corresponding large deviations results. To the best of our knowledge, except for Angel et al. (2012), cf. Remark 2.6, results in this direction are not present in the literature. We formulate our results in the next section. Theorem 1 gives a full large deviation principle for the distributions of $nT_n$. The proof, given in Section 3, is an application of the Gärtner-Ellis Theorem. As a byproduct, we derive a large deviation principle for the distributions of $\varepsilon N_\varepsilon$ in Corollary 2.4. Large deviations of $n(F_{1,n}^2 + \cdots + F_{n,n}^2)$ are considered in Theorem 2, and exact rate functions for downward and upward deviations are given. The proof is given in Section 4.2. For the upward deviations we use a variant of Cramér's theorem for heavy-tailed random variables; see e.g. Gantert et al. (2014). For the downward deviations we use a connection to self-normalized large deviations; see Shao (1997). This connection was pointed out to us by Alain Rouault and Nina Gantert. Since the rate function for downward deviations is hard to treat analytically, we provide in Theorem 3 a simple lower bound. The proof of that bound is given in Section 4.3.

Main results
The Kingman coalescent can be seen as a discrete graph, more precisely a discrete tree with infinitely many leaves. Let $S_2, S_3, \ldots$ be independent exponentially distributed random variables with mean 1. Then the Kingman coalescent tree can be constructed from the root to the leaves as follows.
1. Start the tree with two lines from the root.
2. For $k \ge 2$ the tree stays with $k$ lines for the amount of time $S_k/\binom{k}{2}$. After that time one of the $k$ lines is chosen uniformly at random. This line splits into two, so that the number of lines jumps from $k$ to $k+1$.
3. Stop upon reaching infinitely many lines, which happens after the almost surely finite time $\sum_{k=2}^{\infty} S_k/\binom{k}{2}$.

The random variable $T_1$ is the total tree height. Alternatively, $T_1$ is the time to the most recent common ancestor (MRCA) of the infinite population (of leaves). Counted from the top of the tree, at time $\varepsilon > 0$ a random number $N_\varepsilon$ of active lines is present in the Kingman tree, i.e.
$$T_n := \sum_{k=n+1}^{\infty} \frac{S_k}{\binom{k}{2}}, \qquad N_\varepsilon := \inf\{n : T_n < \varepsilon\}. \qquad (2.1)$$
At time $T_n$ every leaf belongs to one of $n$ disjoint families, and all members of each such family stem from the same line at time $T_n$. Let us denote the frequencies of these families (which exist due to exchangeability by de Finetti's Theorem) by $F_{1,n}, \ldots, F_{n,n}$.
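As a quick numerical illustration (not part of the original argument), the series $T_n = \sum_{k>n} S_k/\binom k2$ can be simulated by truncating it; the sample mean of $nT_n$ should then be close to 2, in line with the law of large numbers below. The function names, truncation level and sample sizes are our own choices.

```python
import random

def sample_nTn(n, trunc, rng):
    """One sample of n*T_n, where T_n = sum_{k=n+1}^{trunc} S_k / C(k,2)
    with S_k ~ Exp(1).  The neglected tail k > trunc has mean 2/trunc."""
    t = 0.0
    for k in range(n + 1, trunc + 1):
        rate = k * (k - 1) / 2.0        # the coalescence rate C(k,2)
        t += rng.expovariate(rate)      # S_k / C(k,2) is Exp(C(k,2))-distributed
    return n * t

def mean_nTn(n, trunc=2000, reps=500, seed=1):
    rng = random.Random(seed)
    return sum(sample_nTn(n, trunc, rng) for _ in range(reps)) / reps
```

By the telescoping identity $\sum_{k>n}\frac{1}{k(k-1)} = \frac1n$, the truncated expectation equals $2 - 2n/\mathrm{trunc}$, so for a truncation level much larger than $n$ the simulated mean of $nT_n$ is close to 2.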

Proposition 2.1 (Laws of large numbers).
Let $(T_n)_{n=1,2,\ldots}$, $(N_\varepsilon)_{\varepsilon>0}$ and $(F_{1,n},\ldots,F_{n,n})_{n=1,2,\ldots}$ be as above. Then, almost surely,
$$nT_n \xrightarrow{n\to\infty} 2, \qquad (2.2)$$
$$\varepsilon N_\varepsilon \xrightarrow{\varepsilon\to 0} 2, \qquad (2.3)$$
$$n\sum_{k=1}^n F_{k,n}^2 \xrightarrow{n\to\infty} 2. \qquad (2.4)$$

Remark 2.2 (Interpretation of (2.4)). We note that the left hand side of (2.4) has the interpretation of a homozygosity by descent in the following sense: when picking two leaves from the tree at time 0, the probability that both share a common ancestor at time $T_n$ is $\sum_{k=1}^n F_{k,n}^2$. Then, the law of large numbers states that the homozygosity by descent at time $T_n$ is approximately $2/n$ for large $n$.
In the present paper we are interested in large deviations results corresponding to the statements of Proposition 2.1. We start with large deviations connected with (2.2). First we introduce some notation. For $n = 1, 2, \ldots$ let $\mu_n$ denote the distribution of $nT_n$, i.e. $\mu_n(\cdot) = P(nT_n \in \cdot)$. Furthermore, we denote by $\mathcal B(\mathbb R)$ the Borel $\sigma$-algebra on $\mathbb R$, and for $\Gamma \in \mathcal B(\mathbb R)$ we denote by $\Gamma^\circ$ the interior and by $\bar\Gamma$ the closure of $\Gamma$. For $x > 0$, let $t_x < 1$ be the unique solution of the equation $x = f(t)$, where the continuous and increasing function $f : (-\infty, 1) \to (0, \infty)$ is defined by (see Figure 1 for a plot)
$$f(t) = \begin{cases} \dfrac{2}{\sqrt t}\,\operatorname{artanh}\sqrt t & : 0 < t < 1, \\ 2 & : t = 0, \\ \dfrac{2}{\sqrt{|t|}}\arctan\sqrt{|t|} & : t < 0. \end{cases} \qquad (2.5)$$
The proof of the following theorem is given in Section 3.1.
Theorem 1 (LDP for $(\mu_n)_{n=1,2,\ldots}$). The sequence $(\mu_n)_{n=1,2,\ldots}$ satisfies a large deviation principle with scale $n$ and good rate function $I$ given by
$$I(x) = \begin{cases} -\log(1-t_x) - \dfrac{x\,t_x}{2} & : x > 0, \\ \infty & : x \le 0. \end{cases} \qquad (2.6)$$
In other words, for any $\Gamma \in \mathcal B(\mathbb R)$ we have
$$-\inf_{x\in\Gamma^\circ} I(x) \;\le\; \liminf_{n\to\infty} \frac1n \log \mu_n(\Gamma) \;\le\; \limsup_{n\to\infty} \frac1n \log \mu_n(\Gamma) \;\le\; -\inf_{x\in\bar\Gamma} I(x).$$
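The function $f$ and the solution $t_x$ of $f(t) = x$ can be evaluated numerically. The piecewise artanh/arctan form used below is our reading of the partially garbled display (2.5) (an assumption, chosen so that $f$ is continuous and increasing from $(-\infty,1)$ onto $(0,\infty)$ with $f(0) = 2$); the solver is a plain bisection.

```python
import math

def f(t):
    """Assumed form of f in (2.5): 2*artanh(sqrt(t))/sqrt(t) for 0 < t < 1,
    f(0) = 2, and 2*arctan(sqrt(|t|))/sqrt(|t|) for t < 0."""
    if t == 0.0:
        return 2.0
    if t > 0.0:
        s = math.sqrt(t)
        return 2.0 * math.atanh(s) / s
    s = math.sqrt(-t)
    return 2.0 * math.atan(s) / s

def t_x(x, lo=-1e8, hi=1.0 - 1e-12, iters=200):
    """Solve f(t) = x for x > 0 by bisection; f is increasing on (-inf, 1)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Note that $t_2 = 0$, matching the fact that the rate function vanishes at the law-of-large-numbers value $x = 2$.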

Remark 2.3 (Interpretation).
Both the function $f$ from (2.5) and the rate function $I$ from (2.6) are plotted in Figure 1. The minimum of the rate function is attained at $x = 2$. This fact is clear from the law of large numbers (2.2). In addition, $I(x) = \infty$ for $x \le 0$ because $nT_n > 0$ almost surely.
Let us now have a closer look at the behaviour of I(x) for x near 0 and for large x.
In the neighbourhood of 2 the last inequality translates easily into a bound for the rate function $I$ from (2.8); see Figure 2. Namely, for $x \in (1.5, 2.5)$ we have

Next, we state some large deviations results connected to (2.4). For $W_n := n\sum_{k=1}^n F_{k,n}^2$ we know from (2.4) that $W_n \xrightarrow{n\to\infty} 2$ holds almost surely. The proof of this result is based on the well-known fact (see e.g. Section 5 in Kingman (1982)) that the distribution of $W_n$ can be derived using uniform order statistics: Let $U_1, \ldots, U_{n-1}$ be independent and uniformly distributed on $[0,1]$, and $0 < U_{(1)} < \cdots < U_{(n-1)} < 1$ be their order statistics. Additionally, let $R_1, \ldots, R_n$ be independent exponentially distributed random variables with mean 1. Then,
$$(F_{1,n},\ldots,F_{n,n}) \stackrel{d}{=} \bigl(U_{(1)},\, U_{(2)}-U_{(1)},\, \ldots,\, 1-U_{(n-1)}\bigr) \stackrel{d}{=} \Bigl(\frac{R_1}{\sum_{k=1}^n R_k}, \ldots, \frac{R_n}{\sum_{k=1}^n R_k}\Bigr). \qquad (2.9)$$
Here the second equality in distribution is one of the well-known representations of uniform spacings; see e.g. Section 4.1 in Pyke (1965). It follows that
$$W_n \stackrel{d}{=} \frac{n\sum_{k=1}^n R_k^2}{\bigl(\sum_{k=1}^n R_k\bigr)^2}. \qquad (2.10)$$
We will use this representation to obtain large deviations results for $W_n$. In particular, we show that upward large deviations of $W_n$ are on the scale $\sqrt n$ while downward large deviations are on the scale $n$. The proof is given in Section 4.2.
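Both distributional representations of $W_n$ described above can be checked by simulation. The following sketch (our own illustration, not part of the paper) computes $W_n$ once from uniform spacings and once from normalized exponentials; for large $n$ both sample means should be close to 2.

```python
import random

def w_uniform_spacings(n, rng):
    """W_n from the spacings of n-1 iid uniforms on [0, 1]."""
    pts = [0.0] + sorted(rng.random() for _ in range(n - 1)) + [1.0]
    freqs = [b - a for a, b in zip(pts, pts[1:])]
    return n * sum(f * f for f in freqs)

def w_exponentials(n, rng):
    """W_n = n * sum R_k^2 / (sum R_k)^2 for iid Exp(1) variables R_k."""
    r = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(r)
    return n * sum(v * v for v in r) / (s * s)

def mc_means(n=400, reps=200, seed=7):
    rng = random.Random(seed)
    m1 = sum(w_uniform_spacings(n, rng) for _ in range(reps)) / reps
    m2 = sum(w_exponentials(n, rng) for _ in range(reps)) / reps
    return m1, m2
```

Both estimators target the same distribution, so the two sample means agree up to Monte Carlo error, illustrating the equality in distribution behind (2.10).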
Theorem 2 (Large deviations for $W_n$). For each $x > 2$ we have
$$\lim_{n\to\infty} \frac{1}{\sqrt n}\log P(W_n \ge x) = -\sqrt{x-2}. \qquad (2.11)$$
Furthermore, $P(W_n < 1) = 0$ and for each $1 < x < 2$ we have
$$\lim_{n\to\infty} \frac{1}{n}\log P(W_n \le x) = -I(x). \qquad (2.12)$$
The function $I(x)$ is positive for $1 < x < 2$ and is given by
$$I(x) = -\log \sup_{c\ge 0}\inf_{t> 0}\; \sqrt{\frac{2\pi\sqrt x}{t}}\; \exp\Bigl(\frac{\sqrt x\,(tc-1)^2}{2t} - \frac{tc^2}{2\sqrt x}\Bigr)\, \Phi\Bigl(\frac{(tc-1)\,x^{1/4}}{\sqrt t}\Bigr), \qquad (2.13)$$
where $\Phi$ denotes the distribution function of the one-dimensional standard Gaussian distribution.
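The claim $P(W_n < 1) = 0$ is just the Cauchy-Schwarz inequality $(\sum_k R_k)^2 \le n \sum_k R_k^2$ in disguise. A quick numerical check (our own sketch):

```python
import random

def w_n(r):
    """n * sum r_k^2 / (sum r_k)^2; by Cauchy-Schwarz this is always >= 1,
    with equality iff all r_k are equal."""
    n = len(r)
    s = sum(r)
    return n * sum(v * v for v in r) / (s * s)

rng = random.Random(3)
samples = [w_n([rng.expovariate(1.0) for _ in range(50)]) for _ in range(1000)]
```

Every sampled value lies in $[1, \infty)$, and the sample mean is close to the law-of-large-numbers value 2.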
Though the rate function in (2.12) is exact, it is hard to treat analytically. For this reason we provide in Theorem 3 a much simpler lower bound for downward large deviations of $W_n$. For the proof we use the following lemma, which provides another representation of $W_n$ in terms of exponential random variables (see Section 4 for proofs).
Lemma 2.7 (Representation of $W_n$). Let $R_1, \ldots, R_n$ be independent exponentially distributed random variables with mean 1. Then,
$$W_n \stackrel{d}{=} \frac{n\sum_{l=1}^n \Bigl(\sum_{k=n-l+1}^{n} \frac{R_k}{k}\Bigr)^2}{\Bigl(\sum_{k=1}^n R_k\Bigr)^2}. \qquad (2.14)$$

Theorem 3 (Lower bound on downward large deviations of $W_n$). For $1 < x < 2$ we have
$$\liminf_{n\to\infty} \frac1n \log P(W_n \le x) \ge 1 - \frac{1}{\sqrt{x-1}}. \qquad (2.15)$$

Remark 2.8 (Rationale and use of the representation in Lemma 2.7). The main point in the proof of Lemma 2.7 is that $W_n$ does not depend on the order of the $R_k$ and hence we can as well order them according to their size.
Let us briefly explain how we will use (2.14) in the proof of (2.15). Since $W_n$ is minimal if $R_1 = \cdots = R_n$ (whence $W_n = 1$), we have to look for possibilities that all $R_k$'s are of about the same size in order to obtain a large deviations result for $W_n$. Let $R_{(1)}, \ldots, R_{(n)}$ denote the above exponential random variables ordered increasingly, i.e. $R_{(i)}$ is the $i$th smallest value. Using "competing exponential clocks" arguments (see also the proof of the lemma) one can see that $R_{(i)} - R_{(i-1)}$ is exponentially distributed with mean $1/(n-i+1)$. Hence, one way of obtaining similar values for all $R_k$'s arises if $R_{(1)}$ is particularly large, which then leads to a large deviations result for $W_n$.

1. For the upward deviations in (2.11) we have to ask ourselves about the easiest way $W_n$ becomes too large. From (2.9), we see that this is the case if one of the $R_k$'s is too large, making this kind of deviation a local property in the sense that only a single one of the $R_k$'s has to show some untypical behavior. This is different when looking at (2.12), i.e. too small values of $W_n$. First, observe that $W_n$ is small only if all (or many) families have about equal sizes (the extreme case $F_{1,n} = \cdots = F_{n,n} = \frac1n$ gives the minimal value $W_n = 1$). Hence, such downward deviations require the study of a global property of the random variable $W_n$, which is significantly harder. For the proof of (2.12) we will interpret $W_n$ as a self-normalized sum and use a result from Shao (1997) on large deviations for such sums.
2. From (2.9) we see that $W_n$ is in fact a function of uniform order statistics, which, for instance, have been studied in detail (although no large deviations results were given) in Pyke (1965). Hence, Theorem 2 may as well be interpreted as a large deviations result for uniform order statistics.
3. As stated in Remark 2.2, $W_n/n$ can be interpreted as the homozygosity at time $T_n$. Using a Poisson process along the tree with intensity $\theta/2$, we can ask for the probability of picking two leaves from the tree which are not separated by a Poisson mark; this is the homozygosity in state, abbreviated by $H_\theta/\theta$. This quantity is closely related to the Poisson-Dirichlet distribution, and some large deviations (in the limit of large $\theta$) were derived in Dawson and Feng (2006). It is shown there in Theorem 5.1 that $H_\theta/\theta$ satisfies a large deviation principle with rate function $I(x) = -\log(1-x)$. However, a large deviation principle for the quantity $H_\theta$ (noting that $H_\theta \xrightarrow{\theta\to\infty} 1$), which would correspond to the results from Theorem 2, could not be obtained by Dawson and Feng (2006). At least, it was shown that its scale cannot be larger than $\sqrt\theta$.

Proof of Theorem 1 and Corollary 2.4

Proof of Theorem 1
The proof of Theorem 1 is an application of the Gärtner-Ellis theorem; see for instance Section 2.3 in Dembo and Zeitouni (2010).
Let $\Lambda_n(t) := \log E[e^{t n T_n}]$ and $\mu_n(\cdot) = P(nT_n \in \cdot)$. To show that the sequence $\mu_1, \mu_2, \ldots$ satisfies a large deviation principle with scale $n$ and a good rate function we need to check the following three conditions.

GE1 $\Lambda(t) := \lim_{n\to\infty} \frac1n \Lambda_n(nt)$ exists for all $t$ as a limit in $[-\infty,\infty]$, and the origin belongs to the interior of $\mathcal D_\Lambda := \{t : \Lambda(t) < \infty\}$.

GE2 $\Lambda$ is differentiable in the interior of $\mathcal D_\Lambda$.

GE3 $\Lambda$ is steep, i.e. $|\Lambda'(t_n)| \to \infty$ for every sequence $t_1, t_2, \ldots$ in the interior of $\mathcal D_\Lambda$ converging to a boundary point of $\mathcal D_\Lambda$.

Then the good rate function is given by the Fenchel-Legendre transform of $\Lambda$,
$$I(x) = \sup_{t\in\mathbb R}\,\bigl(tx - \Lambda(t)\bigr). \qquad (3.1)$$
We proceed in three steps. First, we compute $\Lambda(t) := \lim_{n\to\infty}\frac1n\Lambda_n(nt)$. Second, we check the further assumptions of the Gärtner-Ellis theorem and obtain $I$ as the Fenchel-Legendre transform of $\Lambda$. In the third step, for the rate function $I$ from (3.1) we obtain its simplified form given in Theorem 1.
Step 1. The limit of $\frac1n\Lambda_n(nt)$: We will show that
$$\Lambda(t) = \begin{cases} -\displaystyle\int_1^\infty \log\Bigl(1 - \frac{2t}{x^2}\Bigr)\,dx & : t \le \frac12, \\ \infty & : t > \frac12. \end{cases}$$
For this, recall from (2.1) that $T_n = \sum_{k=n+1}^\infty S_k/\binom{k}{2}$, where $S_k/\binom{k}{2}$ is exponentially distributed with rate $\binom{k}{2}$ and independent of $S_\ell$ for all $\ell \ne k$. Furthermore, recall that the moment generating function of an exponentially distributed random variable $R$ with rate $\lambda > 0$ is given by $E[e^{tR}] = \lambda/(\lambda - t)$ for $t < \lambda$ and $E[e^{tR}] = \infty$ otherwise. Hence, for each $n \in \mathbb N$ and $t \in \mathbb R$ we obtain by the monotone convergence theorem
$$\frac1n \Lambda_n(nt) = -\frac1n \sum_{k=n+1}^\infty \log\Bigl(1 - \frac{n^2 t}{\binom k2}\Bigr),$$
with the convention that the right hand side equals $\infty$ as soon as one of the summands does. We have to consider the two cases $t > \frac12$ and $t \le \frac12$ separately. First suppose that $t > \frac12$.
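The limit claimed in Step 1 can be checked numerically. The closed form used below, $\Lambda(t) = \log(1-2t) + 2\sqrt{2t}\,\operatorname{artanh}\sqrt{2t}$ for $0 < t < \frac12$ (with the analogous $\arctan$ expression for $t < 0$), is our own evaluation of the integral $-\int_1^\infty \log(1 - 2t/x^2)\,dx$; the sketch compares it with a truncated version of the series defining $\frac1n\Lambda_n(nt)$.

```python
import math

def lam_series(t, n=400, trunc=120000):
    """-(1/n) * sum_{k=n+1}^{trunc} log(1 - n^2 t / C(k,2)),
    a truncated version of (1/n) * Lambda_n(n t)."""
    total = 0.0
    for k in range(n + 1, trunc + 1):
        total -= math.log(1.0 - 2.0 * n * n * t / (k * (k - 1)))
    return total / n

def lam_closed(t):
    """Closed form of the limit Lambda(t) for t < 1/2 (our evaluation)."""
    a = 2.0 * t
    if a == 0.0:
        return 0.0
    if a > 0.0:
        s = math.sqrt(a)
        return math.log(1.0 - a) + 2.0 * s * math.atanh(s)
    s = math.sqrt(-a)
    return math.log(1.0 - a) - 2.0 * s * math.atan(s)
```

The agreement for moderate $n$ (up to truncation and discretization error of order $1/n$) illustrates the Riemann-sum approximation $k \approx nx$ behind the limit.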
Step 2. Further assumptions of the Gärtner-Ellis theorem: We proceed by checking the assumptions GE2 and GE3. For differentiability of $\Lambda$ for $t < \frac12$ consider, for $r < t < s < \frac12$, the function $f(x,t) := -\log\bigl(1 - \frac{2t}{x^2}\bigr)$ on $(1,\infty)$. We have $\int_1^\infty |f(x,t)|\,dx < \infty$ for $t \in (r,s)$, and the derivative $\frac{d}{dt} f(x,t) = \frac{2}{x^2 - 2t}$ exists for each $x \in (1,\infty)$ and is continuous in $t$. Hence, we can interchange differentiation and integration and obtain
$$\Lambda'(t) = \int_1^\infty \frac{2}{x^2 - 2t}\,dx.$$
Furthermore, for a sequence $t_1, t_2, \ldots$ with $t_n \uparrow \frac12$ we have $\Lambda'(t_n) \to \infty$, i.e. condition GE3 is also satisfied.
Step 3. Properties of $I$: Applying the Gärtner-Ellis theorem reveals that the sequence of distributions of $nT_n$, $n = 1, 2, \ldots$, satisfies a large deviation principle with good rate function
$$I(x) = \sup_{t\in\mathbb R}\,\bigl(tx - \Lambda(t)\bigr).$$
In order to compute that supremum, we note that
$$\frac{\partial}{\partial t}\bigl(tx - \Lambda(t)\bigr) = x - \Lambda'(t) = x - f(2t),$$
with $f$ as in (2.5). It is easy to see that the second derivative is negative throughout, so that the supremum is attained at $t = t_x/2$, where $t_x$ is given by the solution of $f(t_x) = x$. Finally we note that elementary integration gives, for $0 < t_x < 1$,
$$\Lambda\Bigl(\frac{t_x}{2}\Bigr) = \log(1-t_x) + 2\sqrt{t_x}\,\operatorname{artanh}\sqrt{t_x} = \log(1-t_x) + t_x x,$$
where the last equality holds by definition of $t_x$ (for $t_x < 0$ the analogous computation with $\arctan$ applies). Hence
$$I(x) = \frac{x t_x}{2} - \Lambda\Bigl(\frac{t_x}{2}\Bigr) = -\log(1-t_x) - \frac{x t_x}{2},$$
and the rate function $I$ is of the form given in (2.6).
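As a consistency check (our own sketch, using the closed form of $\Lambda$ evaluated in Step 1 and the piecewise form of $f$ assumed here), the supremum $\sup_t (tx - \Lambda(t))$ computed on a grid should agree with the closed-form value $-\log(1-t_x) - x t_x/2$.

```python
import math

def lam(t):
    """Closed form of Lambda(t) for t < 1/2 (our evaluation of the integral)."""
    a = 2.0 * t
    if a == 0.0:
        return 0.0
    s = math.sqrt(abs(a))
    if a > 0.0:
        return math.log(1.0 - a) + 2.0 * s * math.atanh(s)
    return math.log(1.0 - a) - 2.0 * s * math.atan(s)

def f(t):
    """f from (2.5), in the piecewise artanh/arctan form assumed here."""
    if t == 0.0:
        return 2.0
    s = math.sqrt(abs(t))
    return 2.0 * math.atanh(s) / s if t > 0.0 else 2.0 * math.atan(s) / s

def I_grid(x, t_min=-50.0, t_max=0.49999, steps=200000):
    """Brute-force Fenchel-Legendre transform sup_t (t*x - Lambda(t))."""
    best = -float("inf")
    for i in range(steps + 1):
        t = t_min + (t_max - t_min) * i / steps
        best = max(best, t * x - lam(t))
    return best

def I_closed(x):
    """-log(1 - t_x) - x*t_x/2 with t_x solving f(t_x) = x (bisection)."""
    lo, hi = -1e8, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < x:
            lo = mid
        else:
            hi = mid
    tx = 0.5 * (lo + hi)
    return -math.log(1.0 - tx) - x * tx / 2.0
```

The two computations agree because $\Lambda'(t) = f(2t)$, so the supremum sits at $t = t_x/2$.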

Proof of Corollary 2.4
The proof is based on the fact that $\{T_n \ge \varepsilon\} = \{N_\varepsilon \ge n\}$. Thus, for $x \ge 2$ we have and for $0 < x \le 2$ The value $I(0)$ follows from (2.7). Since the rate function $I$ attains its minimum at $x = 2$, is decreasing below 2 and increasing above 2, the result follows.

Proof of Lemma 2.7
When looking at (2.10), note that $W_n$ does not depend on the order of the $R_k$'s. Therefore, it is possible to order them according to their size. Precisely, let $0 < R_{(1)} < \cdots < R_{(n)}$ be their order statistics. Then it is well-known that
$$\bigl(R_{(1)}, R_{(2)}, \ldots, R_{(n)}\bigr) \stackrel{d}{=} \Bigl(\frac{R_n}{n},\; \frac{R_n}{n} + \frac{R_{n-1}}{n-1},\; \ldots,\; \sum_{k=1}^n \frac{R_k}{k}\Bigr).$$
Indeed, the smallest of $n$ independent exponentially distributed mean-1 random variables is exponentially distributed with mean $\frac1n$ (as is $\frac{R_n}{n}$), and the second smallest then has the same distribution as $\frac{R_n}{n} + \frac{R_{n-1}}{n-1}$, etc. Now, we obtain (2.14) as follows:
$$W_n \stackrel{d}{=} \frac{n\sum_{l=1}^n R_{(l)}^2}{\bigl(\sum_{l=1}^n R_{(l)}\bigr)^2} \stackrel{d}{=} \frac{n\sum_{l=1}^n \bigl(\sum_{k=n-l+1}^n \frac{R_k}{k}\bigr)^2}{\bigl(\sum_{k=1}^n R_k\bigr)^2},$$
where in the last step we used that $\sum_{l=1}^n \sum_{k=n-l+1}^n \frac{R_k}{k} = \sum_{k=1}^n R_k$.
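The order-statistics identity used in this proof can be illustrated numerically (our own sketch): the componentwise means of the order statistics of $n$ iid Exp(1) variables equal the partial sums $\sum_{k=n-l+1}^n \frac1k$ of the harmonic series, and the same holds for the partial-sum representation.

```python
import random

def ordered_exp(n, rng):
    """Order statistics of n iid Exp(1) random variables."""
    return sorted(rng.expovariate(1.0) for _ in range(n))

def partial_sums(n, rng):
    """(R_n/n, R_n/n + R_{n-1}/(n-1), ..., sum_k R_k/k) for iid Exp(1) R_k."""
    r = [rng.expovariate(1.0) for _ in range(n)]
    out, acc = [], 0.0
    for l in range(1, n + 1):
        k = n - l + 1            # add R_k/k for k = n, n-1, ..., 1
        acc += r[k - 1] / k
        out.append(acc)
    return out

def mc_component_means(sampler, n=5, reps=20000, seed=11):
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(reps):
        for i, v in enumerate(sampler(n, rng)):
            sums[i] += v
    return [s / reps for s in sums]
```

For $n = 5$, the exact means are $\frac15, \frac15+\frac14, \ldots, \frac15+\frac14+\frac13+\frac12+1$, and both samplers reproduce them up to Monte Carlo error.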

Proof of Theorem 2
We start by proving (2.11). Let $x \ge 2$ and let $R_1, R_2, \ldots$ be independent exponential random variables with mean 1. In what follows we set
$$X_n := \frac1n\sum_{k=1}^n R_k \quad\text{and}\quad Z_n := \frac1n\sum_{k=1}^n R_k^2. \qquad (4.1)$$
According to (2.10), it suffices to show that
$$\lim_{n\to\infty} \frac{1}{\sqrt n}\log P\Bigl(\frac{Z_n}{X_n^2} \ge x\Bigr) = -\sqrt{x - 2}. \qquad (4.2)$$
To this end we will show that for all $0 < \varepsilon < 1$ the upper bound (4.3) as well as the lower bound (4.4) hold, and obtain (4.2) by letting $\varepsilon \to 0$. For (4.3) we use the decomposition (4.5). We consider the two terms on the right hand side of (4.5) separately and start with the first one. Observe that $E[e^{\lambda R_1^2}] = \infty$ for $\lambda > 0$, $E[R_1^2] = 2$ and $P(R_1^2 > t) = e^{-\sqrt t}$ for $t \ge 0$. We use a variant of Cramér's theorem for heavy-tailed random variables from Gantert et al. (2014). In particular, we refer to the statement around equation (1.2) there (the assumption there is fulfilled with $X_1$ replaced by $R_1^2$ and $r = \frac12$, $m = 2$ and $c = 1$). We obtain (4.6) as $n \to \infty$. For the second term on the right hand side of (4.5) we obtain, by the (classical) Cramér theorem, the bound (4.7) as $n \to \infty$, whose exponent is given by the Fenchel-Legendre transform of the function $t \mapsto \log E[e^{t R_1}]$. Now, using (4.5), (4.6) and (4.7) we obtain the desired limsup bound, which shows (4.3). For the proof of (4.4) we write (4.9). Again we consider both terms in the last line separately. For the first term, as in (4.6) we obtain (4.10) as $n \to \infty$. For the second term, we use the same argument as for (4.7) and get (4.11) as $n \to \infty$. Combining (4.10) and (4.11) with (4.9) now gives (4.4), which proves (2.11).
Since the minimum of $W_n$ is 1 (attained when $F_{k,n} = \frac1n$ for all $k$), the assertion $P(W_n < 1) = 0$ is clear. It remains to prove (2.12), show that the rate function is of the form (2.13), and justify the positivity of $I(x)$ for $x \in (1,2)$. For $x \in (1,2)$, using (2.10) we obtain
$$P(W_n \le x) = P\Bigl(\frac{\sum_{k=1}^n R_k}{\sqrt{n\sum_{k=1}^n R_k^2}} \ge \frac{1}{\sqrt x}\Bigr). \qquad (4.12)$$
Furthermore, for $x \in (1,2)$ we have $\frac{1}{\sqrt x} > \frac{1}{\sqrt 2} = \frac{E[R_1]}{\sqrt{E[R_1^2]}}$. Thus, we can use Theorem 1.1 from Shao (1997) and obtain
$$\lim_{n\to\infty}\frac1n \log P(W_n \le x) = \log \sup_{c\ge 0}\inf_{t\ge 0} E\bigl[e^{t\,h(R_1,c)}\bigr], \qquad (4.13)$$
where we write $h(r,c) := cr - \frac{1}{2\sqrt x}(r^2 + c^2)$. Now we have
$$E\bigl[e^{t\,h(R_1,c)}\bigr] = \int_0^\infty e^{-y}\, \exp\Bigl(t\Bigl(cy - \frac{1}{2\sqrt x}\bigl(y^2 + c^2\bigr)\Bigr)\Bigr)\,dy,$$
and elementary integration yields the explicit form, where $\Phi$ denotes the distribution function of the one-dimensional standard Gaussian distribution. Taking the logarithm of the last term we obtain (2.13). Now we fix $x \in (1,2)$ and show that $I(x)$ is positive.
We have $\inf_{t\ge 0} E\bigl[e^{t\,h(R_1,c)}\bigr] < 1$ for every $c \ge 0$, since
$$E[h(R_1,c)] = c - \frac{2 + c^2}{2\sqrt x} = -\frac{(c - \sqrt x)^2 + 2 - x}{2\sqrt x} < 0 \quad\text{for } x < 2,$$
and the supremum over $c \ge 0$ remains strictly below 1. Hence $-\log\sup_{c\ge 0}\inf_{t\ge 0} E\bigl[e^{t\,h(R_1,c)}\bigr]$ (and therefore also $I(x)$) is positive for $x \in (1,2)$. Thus, the proof of Theorem 2 is concluded.

Proof of Theorem 3
We prove the inequality (2.15) using Lemma 2.7. Let $1 < x < 2$ and set $y = \frac{1}{\sqrt{x-1}} - 1$. For $\varepsilon > 0$ we have
$$\frac1n\log P(W_n \le x + \varepsilon) \ge \frac1n\log P\Biggl(\frac{n\sum_{l=1}^n\bigl(\sum_{k=n-l+1}^n \frac{R_k}{k}\bigr)^2}{\bigl(\sum_{k=1}^n R_k\bigr)^2} \le x + \varepsilon,\; R_n > ny\Biggr)$$
$$= \frac1n\log P(R_n > ny) + \frac1n\log P\Biggl(\frac{n\sum_{l=1}^n\bigl(\sum_{k=n-l+1}^n \frac{R_k}{k}\bigr)^2}{\bigl(\sum_{k=1}^n R_k\bigr)^2} \le x + \varepsilon \;\Bigg|\; R_n > ny\Biggr).$$
Now $\frac1n\log P(R_n > ny) = -y$, and the conditioning in the second term can be removed by using the fact that, conditioned on $R_n > ny$, the exponential random variable $R_n$ has the same distribution as $ny + R_n$. Since $\frac{R_n}{n}$ appears in each of the inner sums, after some elementary calculations we see that the last line of the above display equals $-y$ plus $\frac1n$ times the logarithm of the probability that
$$\frac{\frac1n\sum_{k=1}^n R_k^2 + 2y\,\frac1n\sum_{k=1}^n R_k + y^2}{\bigl(\frac1n\sum_{k=1}^n R_k\bigr)^2 + 2y\,\frac1n\sum_{k=1}^n R_k + y^2} \le x + \varepsilon,$$
and by the law of large numbers the left hand side converges almost surely to
$$\frac{2 + 2y + y^2}{1 + 2y + y^2} = x.$$
Since this limit is strictly smaller than $x + \varepsilon$, the probability in the last display tends to 1, and the rest follows by letting $\varepsilon \downarrow 0$.
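The algebra at the end of the proof is easy to verify directly (our own check): with $y = \frac{1}{\sqrt{x-1}} - 1$ one has $1 + 2y + y^2 = (1+y)^2 = \frac{1}{x-1}$ and hence $\frac{2+2y+y^2}{1+2y+y^2} = 1 + (x-1) = x$.

```python
import math

def limit_ratio(x):
    """(2 + 2y + y^2) / (1 + 2y + y^2) for y = 1/sqrt(x-1) - 1.

    Since (1+y)^2 = 1/(x-1), the ratio equals 1 + (x-1) = x."""
    y = 1.0 / math.sqrt(x - 1.0) - 1.0
    return (2.0 + 2.0 * y + y * y) / (1.0 + 2.0 * y + y * y)
```

This confirms that the choice of $y$ in the proof is exactly the one that makes the conditioned quantity concentrate at the target value $x$.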