The genealogy of Galton-Watson trees

Take a continuous-time Galton-Watson tree and pick $k$ distinct particles uniformly from those alive at a time $T$. What does their genealogical tree look like? The case $k=2$ has been studied by several authors, and the near-critical asymptotics for general $k$ appear in Harris, Johnston and Roberts (2018). Here we give the full picture.


Introduction
Let $L$ be a random variable taking values in $\mathbb{Z}_+ = \{0, 1, 2, \dots\}$, and let $f(s) = E[s^L]$ be its generating function. Consider a continuous-time Galton-Watson tree starting with one initial particle, branching at rate 1 and with offspring distribution $L$. Let $N_t$ be the number of particles alive at time $t$, and let $F_t(s) = E[s^{N_t}]$ be the generating function of the process.
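As a numerical illustration of the generating function $F_t(s)$, the sketch below integrates the Kolmogorov backward equation $\partial_t F_t(s) = f(F_t(s)) - F_t(s)$, $F_0(s) = s$. This equation is a standard fact for rate-1 branching and is an assumption here, since it is not stated in the text. The helper names `pgf_flow` and `yule_pgf` are hypothetical. For $f(s) = s^2$ the result is compared with the closed form $F_t(s) = s e^{-t} / (1 - s(1 - e^{-t}))$, and the semigroup identity $F_t \circ F_u = F_{t+u}$ is checked numerically.

```python
import math

def pgf_flow(f, s, t, steps=2000):
    """Approximate F_t(s) by classical RK4 on dF/dt = f(F) - F, F_0 = s."""
    h = t / steps
    y = s
    rhs = lambda z: f(z) - z
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y

def yule_pgf(s, t):
    """Closed form for f(s) = s^2: N_t is geometric with parameter e^{-t}."""
    q = math.exp(-t)
    return s * q / (1 - s * (1 - q))

yule = lambda z: z * z          # offspring generating function f(s) = s^2
s, t, u = 0.4, 1.5, 0.7

numeric = pgf_flow(yule, s, t)
closed = yule_pgf(s, t)
# Semigroup identity: F_t(F_u(s)) should agree with F_{t+u}(s).
semigroup_gap = abs(pgf_flow(yule, pgf_flow(yule, s, u), t)
                    - pgf_flow(yule, s, t + u))
print(numeric, closed, semigroup_gap)
```

Any other offspring generating function can be passed as `f`; only the closed-form comparison is specific to the Yule case.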
For $T > 0$ we condition on the event $\{N_T \ge k\}$, and pick $k$ particles alive at time $T$ uniformly and without replacement. Label these particles $1, 2, \dots, k$, and consider their ancestral tree. In this direction, the following result, which we will generalise significantly later, was proved by Lambert [14] in discrete time. See also Le [16], who gave the continuous-time analogue, and Grosjean and Huillet [7], who prove an extension we mention below.
Lemma 1.1 (Lambert [14], Corollary 1). Conditioned on {N T ≥ 2}, pick uniformly and without replacement two particles of those alive at time T , and label the time at which they shared a common ancestor τ L,T . Then where we recall F t (s) = E[s Nt ] is the generating function for the number alive in the process.
Although equation (1) gives a powerful implicit characterisation of the distribution of τ L,T , it is difficult to infer qualitative properties of this random variable directly from the formula above. When T → ∞, however, it is possible to gain a more intuitive insight. Unsurprisingly, different qualitative behaviours arise depending on whether the underlying Galton-Watson process is supercritical, critical, or subcritical. Letting m := E[L] denote the mean number of offspring, these cases correspond to m > 1, m = 1, and m < 1 respectively.
Buhler first observed in [4] that when two particles are picked from a supercritical tree at a large time, their most recent common ancestor was a member of one of the generations near the start. Indeed, Athreya [1] showed (albeit in discrete time) that when $1 < E[L] < \infty$, the time $\tau_{L,T}$ remains near the beginning of the interval $[0, T]$ even when $T$ is very large, and consequently we have the distributional convergence $\tau_{L,T} \xrightarrow{D} \bar{\tau}_L \in [0, \infty)$ for some random variable $\bar{\tau}_L$ depending on the offspring distribution. The special case $k = 2$ of our results on the limiting coalescent structure of supercritical trees allows us to characterise the law of $\bar{\tau}_L$ under an $L \log^+ L$ condition. and the limit variable $\bar{\tau}_L$ has probability density given by the integral representation where $\phi_j(v) = E[W^j e^{-vW}]$, $W = \lim_{t \to \infty} N_t e^{-(m-1)t}$ is the martingale limit, and $(1 - \phi_0(\infty))$ is the survival probability.
In the interest of generating concrete examples, the following equation by Harris [8] allows us to extract martingale transforms ϕ(v) from offspring generating functions f (s): A look at even the simplest supercritical Galton-Watson tree reveals surprisingly detailed structure.
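Example 1.1. We call our process a standard Yule tree if $f(s) = s^2$, and by (3), the key transforms associated with the process are $\phi_j(v) = j!\,(1+v)^{-(j+1)}$ for $j \ge 0$ (the martingale limit $W$ of the standard Yule tree is a standard exponential random variable). Inserting these into (2), a calculation shows that the limit variable $\bar{\tau}_{\mathrm{Yule}}$ has an interesting and nontrivial probability density function
$$P(\bar{\tau}_{\mathrm{Yule}} \in dt) = \frac{2e^t}{(e^t - 1)^3}\Big((t - 2)e^t + (t + 2)\Big)\,dt, \qquad t \in [0, \infty).$$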
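As a sanity check on the Yule example, the following sketch integrates the density $\frac{2e^t}{(e^t-1)^3}\big((t-2)e^t + (t+2)\big)$ numerically and confirms that it has total mass close to 1. The function names are hypothetical. Near $t = 0$ the bracket suffers from cancellation, so the code uses the series $(t-2)e^t + t + 2 = \sum_{n \ge 3} (n-2)t^n/n!$, which follows from expanding $e^t$; this also shows that the density tends to $1/3$ as $t \downarrow 0$.

```python
import math

def yule_density(t):
    """Density of the limiting coalescence time for the standard Yule tree (t > 0)."""
    if t < 1e-2:
        # Series for (t - 2)e^t + t + 2, avoiding catastrophic cancellation.
        bracket = sum((n - 2) * t**n / math.factorial(n) for n in range(3, 15))
    else:
        bracket = (t - 2) * math.exp(t) + t + 2
    return 2 * math.exp(t) * bracket / math.expm1(t) ** 3

def midpoint_integral(g, a, b, n):
    """Composite midpoint rule; never evaluates g at the endpoints."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# The tail beyond t = 60 is of order t * e^{-t} and is negligible.
total_mass = midpoint_integral(yule_density, 0.0, 60.0, 60_000)
print(total_mass, yule_density(1e-6))
```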
Moving on to the critical case, which has received a lot of attention from different authors, the following result shows that whenever $E[L^2] < \infty$, the variable $\tau_{L,T}$ scales with $T$ as $T \to \infty$: Then as $T \to \infty$, $\tau_{L,T}/T$ converges in distribution to a continuous $[0, 1]$-valued limit variable $\bar{\tau}_{\mathrm{Crit}}$, universal in all critical $L$ with $E[L^2] < \infty$. Furthermore, the limit variable $\bar{\tau}_{\mathrm{Crit}}$ has probability density function This result is known. See Durrett [6], O'Connell [17], Lambert [14], Athreya [2], and Harris, Johnston, and Roberts [10].
Finally, we look at the subcritical case. On the overwhelmingly rare event that a subcritical tree manages to survive until a large time $T$, the number of particles alive converges to a quasi-stationary limit (see [3, Chapter I, Section 8]). Furthermore, every particle alive is descended from a single ancestral lineage that survived for the majority of $[0, T]$. Whereas in the supercritical case the common ancestor of two particles chosen at a large time $T$ last existed near the beginning of time, in the subcritical case the common ancestor last existed very near $T$.
Indeed, Lambert showed in [14] (and also Athreya in [2]) that the difference $\upsilon_{L,T} := T - \tau_{L,T}$ satisfies the distributional convergence $\upsilon_{L,T} \xrightarrow{D} \bar{\upsilon}_L$ for some limit variable $\bar{\upsilon}_L$ depending on the offspring distribution. Lambert also gave an implicit formula for the distribution of the limit variable $\bar{\upsilon}_L$, which Le [16] inverted to give the formula where $W$ is the quasi-stationary limit variable: and $B(s) = E[s^W]$ is the generating function of the quasi-stationary limit. The results (7) and (8) correspond to the special case $k = 2$ of Theorem 2.7, which characterises the limiting coalescent structures of subcritical trees.
We now move on to results which concern general k. With the exception of a special case of Theorem 2.1 that appears in [7], the following work is original.

Overview
Let us now give a brief overview of our main results, which will be stated formally in the next section. First we will study the process (π T t ) := (π k,L,T t ) t∈[0,T ] for fixed (finite) T . Theorems 2.3 and 2.4 characterise the law of this process in two different ways, first in terms of its finite dimensional distributions and then in terms of the random splitting times of blocks in (π T t ). In both cases, explicit formulas are obtained, each in the form of an integral equation involving various generating functions associated with the process.
We then send the picking time T → ∞, and study the asymptotic behaviour of the process (π T t ). As in the case k = 2 discussed in the introduction, we will see analogous qualitative differences depending on the mean of the offspring distribution. Namely, in the supercritical case the common ancestors of a sample of k particles picked at a large time T last existed near the start of [0, T ], and in the subcritical case, the common ancestors last existed near the end of [0, T ]. The critical case has already been covered in [10], but in the interest of completeness we mention very briefly the results from this paper below.
First we will consider the supercritical case, where m := E[L] > 1. Under the Kesten-Stigum condition, E[L log + L] < ∞, we will see in Theorem 2.5 that the process (π k,L,T t ) t∈[0,T ] satisfies the distributional convergence and we will characterise the law of the limit process (π k,L t ) t∈[0,∞) in terms of the generating function of the martingale limit In the subcritical case, we will see that the coalescent process satisfies and we will characterise the law of the limit process (ρ k,L t ) t∈[0,∞) in terms of the quasi-stationary generating function The rest of the paper is structured as follows. In section 2, we outline our main results. In section 3, we introduce the idea of multiple spines, privileged lines of descent that flow through the tree. We then create a change of measure which forces the spines to represent a uniform sample of k particles at time T . In section 4, we prove the formula of section 2.1 characterising the distribution of (π T t ). In section 5, we then give proofs of our asymptotic formulas, proving the results of section 2.2.

Main Results
Before stating our main results, we need to introduce some more notation and two key hypotheses. We start by giving a brief formal description of the continuous-time Galton-Watson tree. Let $L$ be a $\{0, 1, 2, \dots\}$-valued random variable. Under the probability measure $P$, we start at time 0 with one particle which we call $\emptyset$. The particle $\emptyset$ lives for an exponentially distributed, rate 1, length of time $\tau_\emptyset$ until it dies, and is replaced by a random number of offspring with labels $1, 2, \dots, L_\emptyset$, where $L_\emptyset$ is distributed like $L$ and is independent of $\tau_\emptyset$. These offspring then independently repeat this behaviour. That is, each particle $u$ lives for a length of time $\tau_u$ distributed like $\tau_\emptyset$ and at death is replaced by offspring with labels $u1, u2, \dots, uL_u$, where $L_u$ is distributed like $L$. Here, $\tau_u$ and $L_u$ are independent of each other and of the past. We write $\mathcal{N}_t$ for the set of particles alive at time $t$ and $N_t = |\mathcal{N}_t|$ for the number of particles alive. We also write $u < v$ if $u$ is an ancestor of $v$, and $u \le v$ if $u < v$ or $u = v$.
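The construction above can be simulated directly when only the population size is of interest. The following is a minimal sketch, not the paper's method: with $n$ particles alive, each branching at rate 1, the next death occurs after an exponential time of rate $n$, and the dying particle is replaced by an independent copy of $L$ offspring. The function name `simulate_population` and the callable `offspring` are hypothetical. The check uses the standard fact $E[N_T] = e^{(m-1)T}$, which for binary branching at $T = 1$ equals $e$.

```python
import math
import random

def simulate_population(offspring, T, rng):
    """Return N_T for one tree; offspring(rng) samples a copy of L."""
    n, t = 1, 0.0
    while n > 0:
        t += rng.expovariate(n)   # time to the next death among n particles
        if t > T:
            break
        n += offspring(rng) - 1   # one particle dies, L offspring replace it
    return n

rng = random.Random(0)            # fixed seed so the run is reproducible
binary = lambda r: 2              # f(s) = s^2, the standard Yule tree
samples = [simulate_population(binary, 1.0, rng) for _ in range(20_000)]
mean_yule = sum(samples) / len(samples)
print(mean_yule, math.e)
```

Tracking genealogy (which particle is an ancestor of which) would require storing labels such as $u1, u2, \dots$; the count-only version is enough to check moments of $N_T$.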
A partition $\gamma$ of $\{1, \dots, k\}$ is a collection of disjoint non-empty subsets $\Gamma_1, \dots, \Gamma_m$ of $\{1, \dots, k\}$, or 'blocks', whose union is $\{1, \dots, k\}$. Equivalently, a partition is an equivalence relation on $\{1, \dots, k\}$. We will always order the blocks $\Gamma_i$ of a partition $\gamma$ by order of least element. We write $P_k$ for the collection of partitions of $\{1, \dots, k\}$. For $\gamma = [\Gamma_1, \dots, \Gamma_p] \in P_k$, we write $|\gamma| = p$ for the number of blocks in the partition. For partitions $\alpha, \beta$, we say $\alpha$ can break into $\beta$, written $\alpha \prec \beta$, if $\beta$ can be obtained by breaking up the blocks of $\alpha$. A chain of partitions of internal length $n$ is a sequence $(\gamma_0, \gamma_1, \dots, \gamma_n, \gamma_{n+1})$ in $P_k$, running from the one-block partition $\gamma_0 = [\{1, \dots, k\}]$ to the partition $\gamma_{n+1}$ of $\{1, \dots, k\}$ into singletons, with the property that $\gamma_i \prec \gamma_{i+1}$ for every $i$. We emphasise that consecutive terms in a chain may be equal, a departure from the usual terminology in the theory of partially ordered sets. Let $CP^n_k$ be the set of chains $(\gamma_0, \gamma_1, \dots, \gamma_n, \gamma_{n+1})$ with entries in $P_k$ of internal length $n$.
For a chain (γ 0 , γ 1 , ..., γ n , γ n+1 ) ∈ CP n k , we will write Γ i,j for the jth block (ordered by least element) of γ i . By definition, each Γ i,j is the union of some classes in γ i+1 . With this in mind, we write γ i,j for the restriction of γ i+1 to Γ i,j . We call b i,j = |γ i,j | the breakage numbers of the partition sequence. That is, b i,j ≥ 1 is the number of blocks the jth block of γ i needs to break into to form some new blocks in γ i+1 .
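To make the breakage numbers concrete, here is a small sketch that computes $b_{i,j}$ for a chain of partitions represented as lists of Python sets. The representation and the function names are assumptions made for illustration. Blocks are ordered by least element, as in the text, and the code raises an error if a term of the chain does not refine its predecessor.

```python
def ordered_blocks(partition):
    """Blocks as sorted tuples, ordered by least element."""
    return sorted((tuple(sorted(b)) for b in partition), key=lambda b: b[0])

def breakage_numbers(chain):
    """b[i][j] = number of blocks of chain[i+1] contained in block j of chain[i]."""
    result = []
    for coarse, fine in zip(chain, chain[1:]):
        fine_blocks = ordered_blocks(fine)
        row = []
        for block in ordered_blocks(coarse):
            pieces = [f for f in fine_blocks if set(f) <= set(block)]
            # The pieces must tile the block exactly, otherwise fine is not a refinement.
            if sorted(x for f in pieces for x in f) != list(block):
                raise ValueError("chain[i+1] does not refine chain[i]")
            row.append(len(pieces))
        result.append(row)
    return result

# A chain of internal length 2 for k = 4; the first step splits {1,2,3,4} in two.
chain = [
    [{1, 2, 3, 4}],
    [{1, 2}, {3, 4}],
    [{1}, {2}, {3, 4}],
    [{1}, {2}, {3}, {4}],
]
b = breakage_numbers(chain)
print(b)
```

In this example the excesses $b_{i,j} - 1$ sum to $k - 1 = 3$, reflecting that the number of blocks grows from 1 to $k$ along the chain.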
For $k$ distinct particles alive in a branching process at time $T$ with labels $1, \dots, k$, let $\pi_t$ be the partition generated by the equivalence relation $i \sim_{\pi_t} j$ if and only if $i$ and $j$ share a common ancestor alive at time $t$. Note that $(\pi_t)_{t \in [0,T]}$ is then a right-continuous process taking values in $P_k$, with $\pi_s \prec \pi_t$ for $s \le t$, and in particular, $\pi_0 = [\{1, \dots, k\}]$ is the one-block partition and $\pi_T$ is the partition into singletons.

Finally, we need to ensure that there actually are at least $k$ particles alive at time $T$ with positive probability, and that we can choose uniformly from them. To be more precise, we must ensure both that $P(N_T \ge k) > 0$ and that $P(N_T < \infty) = 1$. The inequality $P(N_T \ge k) > 0$ is guaranteed to hold by virtue of our first hypothesis (11), which states that $P(L \ge 2) > 0$. In addition to (11), we insist that the following non-explosion hypothesis (12) holds: $\int_{1-\epsilon}^{1} \frac{ds}{|f(s) - s|} = \infty$ for every sufficiently small $\epsilon > 0$. This condition is equivalent to our second requirement that $P(N_t < \infty) = 1$ for every $t$, and is weaker than $E[L] < \infty$. See [9, Chapter II, Theorem 9.1] for details. We emphasize that both hypotheses (11) and (12) are in force in the remainder of this paper.
We are now ready to state our main results, which we split into two sections. The results in section 2.1 concern fixed and finite T . The results in section 2.2 concern the asymptotic regime in which T is sent to ∞.

Law of the partition process
On the event $\{N_T \ge k\}$, pick uniformly $k$ distinct particles alive at time $T$ and label them $1, \dots, k$. Let $(\pi^T_t) = (\pi^{k,L,T}_t)_{t \in [0,T]}$ be their partition process. Our main theorems in this section describe the law of the process $(\pi^T_t)$ in terms of the generating functions $F_t(s) = E[s^{N_t}]$ and $f(s) = E[s^L]$. Below we will write $F^j_t(s)$ for the $j$th derivative of $F_t(s)$ with respect to $s$.
We start by calculating the one-dimensional distributions of the process $(\pi^{k,L,T}_t)_{t \in [0,T]}$. That is, for fixed $t \in [0, T]$ we give the law of the $P_k$-valued random variable $\pi^{k,L,T}_t$. Theorem 2.1. For a partition $\gamma$ of $\{1, \dots, k\}$ into $p$ blocks $\Gamma_1, \dots, \Gamma_p$ of sizes $k_1 + \dots + k_p = k$, By plugging $\gamma = \gamma_0$ into equation (13), and using the semigroup identity $F_t \circ F_u(s) = F_{t+u}(s)$ to note that , we recover a result by Grosjean and Huillet [7], giving the time to most recent common ancestor of $k$ uniformly chosen particles.
For a fascinating investigation of the coalescent processes associated with discrete-time Galton-Watson forests, with an array of interesting computations involving explicit branching mechanisms, we refer the reader to [7].
In fact, Theorem 2.1 is the special case n = 1 of the following more general theorem, giving the finite dimensional distributions of (π k,L,T t ) t∈[0,T ] .
be a chain of partitions with breakage numbers $b_{ij}$. Then An alternate way to characterise the law of $(\pi^T_t)$ is in terms of the 'split times', the times $t$ at which be a chain of partitions that is maximal in the sense that $\eta_i$ is created from $\eta_{i-1}$ by breaking precisely one block of $\eta_{i-1}$ into $c_i \ge 2$ blocks in $\eta_i$. Note that $1 \le n \le k - 1$, and that the multiplicities $\{c_i : i = 1, \dots, n\}$ satisfy the branch equation $\sum_{i=1}^{n} (c_i - 1) = k - 1$, since at each split time the number of blocks increases by $c_i - 1$, and over the period $[0, T]$ the number of blocks increases by $k - 1$.
for the event that for each i = 1, ..., n, the value of (π T t ) changes from The following theorem gives the density of the event (17), thereby providing an alternate characterisation of the law of (π T t ).
Theorem 2.4 (Split time representation for the law of (π T t )).
Note that for an integer j, P(L ≥ j) = 0 implies the j th -derivative of the offspring generating function f j = 0, and hence almost surely we have c i < j for every i.
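We end this section by shedding light on the combinatorial structure of (15), by making reference to Faà di Bruno's formula. This formula states that for $k$-times differentiable functions $f$ and $g$,
$$(f \circ g)^{(k)}(x) = \sum_{\gamma \in P_k} f^{(|\gamma|)}(g(x)) \prod_{\Gamma \in \gamma} g^{(|\Gamma|)}(x),$$
where $\Gamma \in \gamma$ refers to the blocks $\Gamma$ in the partition $\gamma$, $|\Gamma|$ is the size of a block, and $|\gamma|$ is the number of blocks in $\gamma$. By setting $f = f_0$ and $g = f_1 \circ \dots \circ f_n$, we can use induction to generalise this result to an $(n + 1)$-fold composition:
$$(f_0 \circ f_1 \circ \dots \circ f_n)^{(k)}(x) = \sum_{(\gamma_0, \dots, \gamma_{n+1}) \in CP^n_k} \prod_{i=0}^{n} \prod_{j=1}^{|\gamma_i|} f_i^{(b_{ij})}\big(f_{i+1} \circ \dots \circ f_n(x)\big), \tag{19}$$
where $b_{ij}$ are the breakage numbers of the chain $\gamma_0, \dots, \gamma_{n+1}$, and the empty composition appearing at $i = n$ is read as the identity.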

Now suppose we have a one-parameter collection of smooth transformations $\{G_t : t \ge 0\}$ satisfying the semigroup property
$$G_t \circ G_u = G_{t+u}, \qquad t, u \ge 0. \tag{20}$$
Then for any sequence $0 = t_0 < t_1 < \dots < t_n < t_{n+1} = T$, writing $\Delta t_i = t_{i+1} - t_i$, we have $G_{T - t_i} = G_{\Delta t_i} \circ G_{T - t_{i+1}}$ for every $i = 0, 1, \dots, n$, and hence by setting $f_i = G_{\Delta t_i}$ in (19) we can characterise the $s$-derivatives of $G_T$ through the equation
$$G_T^{(k)}(s) = \sum_{(\gamma_0, \dots, \gamma_{n+1}) \in CP^n_k} \prod_{i=0}^{n} \prod_{j=1}^{|\gamma_i|} G_{\Delta t_i}^{(b_{ij})}\big(G_{T - t_{i+1}}(s)\big).$$
By the branching identity, the collection of smooth functions $\{F_t : t \ge 0\}$ given by $F_t(s) = E[s^{N_t}]$ obeys the semigroup structure in (20). Noting that $F^j_t(s) \ge 0$ for every $(j, t, s)$, it can be seen that $K(\cdot \mid s)$, defined by the finite dimensional distributions forms a probability measure on the product space P Furthermore, we prove in Lemma 4.2 that $M$ given by is a probability measure. It follows that (15) may be rewritten as the mixture That is, $(\pi^T_t)$ may be interpreted as a mixture (23) of stochastic processes with finite dimensional distributions given by the Faà di Bruno quotients (22).
Alternatively, the conditional measure K(·|s) can be understood through the split time formula

Asymptotic results
We now move on to our main results concerning the asymptotic behaviour of the partition process as T → ∞. Below, we say a collection of P k -valued processes if for any fixed times t 1 < .... < t n and partition chain

The Supercritical Case
We start with the supercritical regime $E[L] > 1$, and further suppose that the Kesten-Stigum condition $E[L \log^+ L] < \infty$ holds. If we pick $k$ particles at a large time $T$ from a supercritical tree, it turns out that the common ancestors of these $k$ particles last existed near the start of $[0, T]$. In this case the martingale $N_t e^{-(m-1)t}$ converges to a non-degenerate limit $W$. See [3, Chapter III, Section 7] for details. In light of this almost-sure convergence, we may manipulate the limit in the split time representation, equation (18), to prove the following.
and the limit process has law given by , and the g ci (·|v) are non-unit densities given by This theorem is proved in section 5. The main idea of the proof is to change variable s(v) = e −ve −r(m−1)T to plug (27) into (18).

The Critical Case
Let us briefly mention here the following result from [10]. Suppose we are in the critical case with E[L 2 ] < ∞. We find the ancestral partition process (π T t ) gets scaled with the interval [0, T ] as T → ∞.
and the limit process (π k,Crit is a binary tree topologically equivalent to a Kingman coalescent [13], but with more complicated merger times. For further details on this fascinating link and on other properties of the process (π k,Crit t ), see [10].

The Subcritical Case
As mentioned earlier, on the rare event that a subcritical process survives until time $T$, it is overwhelmingly likely that just one lineage did the surviving for the majority of $[0, T]$. Hence, we expect the most recent common ancestor of $k$ particles chosen at a large time $T$ to be close to the time $T$. For a subcritical process, the conditional limit law states that the conditional random variable $N_T \mid \{N_T \ge 1\}$ has a quasi-stationary limit. Namely, for each $j \in \{1, 2, \dots\}$ the limit $a_j := \lim_{T \to \infty} P(N_T = j \mid N_T \ge 1)$ exists, and $\sum_{j=1}^{\infty} a_j = 1$. See [3, Chapter III, Section 7] for details. Write $W$ for a random variable with distribution $P(W = j) = a_j$, and $B(s) = E[s^W] = \sum_{n \ge 1} a_n s^n$ for the associated quasi-stationary generating function.
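The quasi-stationary generating function $B(s)$ can be illustrated numerically in a case where it is known in closed form. The sketch below takes the subcritical offspring law $f(s) = p_0 + p_2 s^2$ with $p_0 = 2/3$, $p_2 = 1/3$ (an arbitrary choice), integrates the standard backward equation $\partial_t F_t = f(F_t) - F_t$ (an assumption, not stated in the text), and compares the conditional generating function $(F_T(s) - F_T(0))/(1 - F_T(0))$ at a large $T$ with the geometric generating function $(1-\rho)s/(1-\rho s)$, $\rho = p_2/p_0$. The geometric form is a standard fact about linear birth-death processes rather than a result of this paper, and the function names are hypothetical.

```python
def pgf_flow(f, s, t, steps=6000):
    """Approximate F_t(s) by classical RK4 on dF/dt = f(F) - F, F_0 = s."""
    h = t / steps
    y = s
    rhs = lambda z: f(z) - z
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y

p0, p2 = 2 / 3, 1 / 3                    # mean offspring m = 2 * p2 < 1
f = lambda z: p0 + p2 * z * z
T = 30.0
extinct = pgf_flow(f, 0.0, T)            # F_T(0) = P(N_T = 0)

def conditional_pgf(s):
    """E[s^{N_T} | N_T >= 1] expressed through F_T."""
    return (pgf_flow(f, s, T) - extinct) / (1 - extinct)

rho = p2 / p0

def geometric_pgf(s):
    """Generating function of a_j = (1 - rho) * rho^(j - 1), j >= 1."""
    return (1 - rho) * s / (1 - rho * s)

gap = max(abs(conditional_pgf(s) - geometric_pgf(s)) for s in (0.2, 0.5, 0.9))
print(gap)
```

The remaining gap is of order $e^{-(1-m)T}$, so increasing `T` tightens the agreement until floating-point cancellation in `1 - extinct` becomes the limiting factor.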
with the property that η i is obtained from η i−1 by c i blocks of η i−1 merging to form precisely one block in η i . For a coalescent process (ρ t ), write and the limit process has law given by merger times We prove this in section 5. The main idea is to use the conditional limit law in (18).

Spines partitions and changes of measure
In this section we introduce spines, our tool for calculating the distributions of ancestral partition processes of uniformly chosen particles. For each n ∈ N, we associate a line of descent (ξ n t ) t≥0 that flows through the tree forward in time, choosing uniformly a branch to follow next at branching points. Our idea is to create a change of measure under which the first k spines n = 1, ..., k flow through a tree forward in time in such a way that at time T , the particles carrying spines 1, ..., k are distinct, and represent a uniform sample of k particles from the N T alive.
During section 3 we will assume that for every k, E[L k ] < ∞ and hence E[N k t ] < ∞ for all t. A result in section 5 shows we can safely remove this assumption.

N spines
Suppose under a measure $P$ we have a continuous-time Galton-Watson branching process where particles live for an exponentially distributed, rate 1, amount of time, and upon death are replaced by a random number of particles distributed like $L$. Recall we write $\mathcal{N}_t$ for the set of particles alive at time $t$, and $N_t = |\mathcal{N}_t|$. For technical reasons, we append a cemetery state $\Delta$ to the state space, and write $\hat{\mathcal{N}}_t := \mathcal{N}_t \cup \{\Delta\}$. Additionally under $P$, for each $n \in \mathbb{N}$, there is a right-continuous stochastic process $(\xi^n_t)_{t \ge 0}$ called the $n$-spine defined as follows.
• At time $t$ the $n$-spine takes values in $\hat{\mathcal{N}}_t$. That is, for some $u \in \hat{\mathcal{N}}_t$, $\xi^n_t = u$. We say that the $n$-spine is following $u$, and that $u$ is carrying the $n$-spine.
• If a particle carrying the n-spine just before time t dies at time t and is replaced by p ≥ 1 particles v 1 , ..., v p , then the n-spine chooses uniformly among the p offspring a particle to follow next. If the particle carrying the n-spine dies and is replaced by no offspring, we send the n-spine to the cemetery state ∆ for the remainder of time. That is, ξ n s = ∆ for all s ≥ t.
• The n-spines don't affect the behaviour of the particles they are following. That is, if particle u is carrying the n-spine at time t, then this particle still branches at rate 1 and has offspring distributed like L.
• The set of n-spines {(ξ n t ) t≥0 : n ∈ N} are independent of one another -that is, if a particle carrying some spines dies and is replaced by p offspring, each of these spines chooses uniformly an offspring to follow next, independently of the others.
So in essence, the $n$-spines are simply a set of labels that flow forward in time through a continuous-time Galton-Watson tree without affecting the law of the underlying tree. We will write $P[\cdot]$ rather than $E[\cdot]$ for the expectation operator associated with $P$.
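The rules above imply that a spine is not uniformly distributed over the particles alive: reaching a particle requires the correct uniform choice at each ancestor, so the probability is the product of the reciprocal birth sizes along the ancestral line. The sketch below works this out exactly on one small fixed genealogy, which is a hypothetical example chosen for illustration: the root has two children, and child `1` in turn has three children alive at the sampling time.

```python
from fractions import Fraction

# Hypothetical fixed genealogy: each dead particle maps to its list of children.
# Labels follow the convention u1, u2, ... with the root written as "".
children = {"": ["1", "2"], "1": ["11", "12", "13"]}
alive = ["11", "12", "13", "2"]

def spine_probability(u):
    """Probability that a single spine follows u: product of 1/L_v over ancestors v < u."""
    prob = Fraction(1)
    for depth in range(len(u)):
        ancestor = u[:depth]                 # strict ancestors of u, root first
        prob /= len(children[ancestor])
    return prob

probs = {u: spine_probability(u) for u in alive}
# Two independent spines land on the same particle with probability sum of squares.
same_particle = sum(p * p for p in probs.values())
print(probs, same_particle)
```

Here particle `2` carries a given spine with probability $1/2$, while each of `11`, `12`, `13` carries it with probability $1/6$, whereas a uniform choice would give $1/4$ each. This is the bias that a change of measure has to correct.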
For any set $S$ and $k \ge 0$, let $S^{(k)}$ be the set of distinct $k$-tuples from $S$, and for $n \ge 0$, write $n^{(k)} := n(n-1)\cdots(n-k+1)$. Note that $|S^{(k)}| = |S|^{(k)}$. Let $F^{\emptyset}_t$ be the σ-algebra containing all the information about the tree up until time $t$, but without any knowledge of which particles the spines are following. For a subset $A \subseteq \mathbb{N}$, we call the set of processes $\{(\xi^a_t)_{t \ge 0} : a \in A\}$ the $A$-spines. Let $(F^A_t)_{t \ge 0}$ be the filtration containing all the information about the entire tree and who the $A$-spines have been following until time $t$. Formally, We now examine the probabilities, conditional on $F^{\emptyset}_T$-knowledge, that a given spine is following a given particle in $\mathcal{N}_T$. For a particle $u \in \mathcal{N}_T$, let $Q(u) := \prod_{v < u} L_v$ be the product of birth sizes of ancestors of $u$.
Note that for spine $n$ to be following particle $u \in \mathcal{N}_T$, for each ancestor $v$ of $u$, spine $n$ must have chosen the 'correct' offspring of the $L_v$ offspring of $v$ to continue following. Hence, $P(\xi^n_T = u \mid F^{\emptyset}_T) = Q(u)^{-1}$ for all $n \in \mathbb{N}$. Since the spines behave independently of one another, the probability that the $A$-spines are following a list $(u_a : a \in A)$ of (possibly non-distinct) members of $\mathcal{N}_t$ is $\prod_{a \in A} Q(u_a)^{-1}$. Since, in general, the quantities $Q(u)^{-1}$ in (31) vary for different $u \in \mathcal{N}_T$, under $P$ the spines are more likely to be following some particles than others. This can be seen in the following example. For a fixed $T$, we will construct a change of measure $Q^{\alpha,T}$ on $F^A_T$ under which we have the following:
• There are always at least $k$ particles alive at time $T$, and the $A$-spines are assigned to particles in $\mathcal{N}_T$ in such a way that the following holds: That is, $a \sim_\alpha b$ if and only if spines $a$ and $b$ are following the same particle at time $T$.
• Given F ∅ T , the k-tuple of particles which the spines are following at time T is equally likely to be any k-tuple of those alive. That is, for a particular k-tuple (u 1 , ..., u k ) ∈ N (k) We For A ⊆ N, let π A t be the partition of A defined by the equivalence relation that is, a and b are related in π A t if the a-spine and the b-spine are following the same particle at time t. Then (33) reads We define, for t ≥ 0, the F A t measurable random variables Proof. Note that every combination of k distinct particles u = (u 1 , ..., u k ) ∈ N (k) t corresponds to a way in which π A t = α, by setting ξ a t = u i , ∀a ∈ A i , ∀ i = 1, ..., k. Thus Finally, Q(u i ) ∈ F ∅ t , so for every a, i, by (31), and hence Finally, Let A ⊆ N be finite, α be a partition of A into k blocks, and T be a fixed time. By the previous T ], so the random variable has unit mean.
Define a new probability measure Q α,T on F A T by setting Let so that, again by Lemma 3.1, The remainder of section 3 is dedicated to studying the properties of Q α,T , and how it affects the behaviour of the A-spines. In section 3.3, we show that the measure Q α,T has a nice uniformity property, namely that at time T , the A-spines are equally likely to be any k-tuple of particles alive at a time T . In section 3.4, we see the change of measure Q α,T has a key symmetry property.

Uniformity properties of Q α,T
Let α be a partition of A into k blocks. The goal of section 3.3 is to prove that under Q α,T , conditional on F ∅ T , the particles the A-spines are following at the time T are equally likely to be any k-tuple alive. Proof. For any A ∈ G, Since Zµ[X|G] is G-measurable, it therefore satisfies the definition of conditional expectation of XY with respect to G under ν.
Applying this result using (36) and (37), we find that for any non-negative F A T -measurable random variable X, on the event {Z k,T > 0}, For a subinterval I ⊆ [0, T ], let G A I be the σ-algebra with knowledge of who the A-spines have been following during I, and the offspring sizes of ancestors of the spines during this interval, that is and set Using (36) and the fact that ζ α,T is G A T measurable, we deduce that Applying Lemma 3.2, this time using (36) and (40), we deduce that for any F A T measurable random variable X, on the event {ζ α,T > 0}; in the second equation we have used the G A T -measurability of ζ α,T .
In particular, (41) implies that Q α,T and P coincide on the collection of F A T -measurable events that are independent of G A T . In other words, particles not carrying any of the A-spines behave under Q α,T exactly as they do under P: they branch at unit rate and have offspring distribution L.
Note by the definition of ζ α,T and Lemma 3.1 that Now if |α| = k, there must be at least k distinct particles alive at time T for the spines to follow. The previous display therefore yields In summary, under Q α,T the A-spines at time T are distributed across k different particles in N T and induce the partition α of A at time T . The following lemma tells us that these k particles are chosen uniformly without replacement from those alive at time T . Lemma 3.3. When |α| = k, the Q α,T -probability that the spines are following a particular k-tuple Proof. Note that if N T ≥ k then Z k,T > 0. Then by (38), for any u ∈ N (k) where the third equality follows from (35).

The law of (π A t ) t∈[0,T ] under Q α,T and symmetric properties
Let α be a partition of a finite subset A ⊆ N, and let [A] = γ 0 ≺ ... ≺ γ n ≺ γ n+1 = α be a chain of partitions.
As we have seen, under Q α,T the partition process (π A t ) t∈[0,T ] starts as the one block partition [A] at time 0 and finishes at the partition α at time T . In this section, we will prove a theorem that describes the finite dimensional distributions of this partition process on [0, T ] under the change of measure Q α,T . Furthermore, we give an essential symmetry property of the measure Q α,T , which is the crux of the entire paper.
In order to understand this symmetry property, recall from the above that G B In terms of shift operators, this says that for any event W ∈ F B (s,t)

Conditioned on the event {π
To see that this implies the result for $n = 1$, the first part follows by plugging in $\mathbb{1}_X = 1$ and $\mathbb{1}_{Y_j} = 1$ for all $j$. Dividing through by (42) obtains the second part.
To prove (43), use (36) to write Hence, Now here is the key observation. By the definition (40), $H_0 = \zeta_{\gamma,t} \mathbb{1}_X$, and hence Now given that the $\Gamma_j$ spines are following the same particle $u \in \mathcal{N}_t$ at time $t$, by the Markovian nature of the branching process, the subtree generated by descendants of $u$ looks like a tree started at time 0. Furthermore, the random variable $Q(\xi^a_T)/Q(\xi^a_t)$ is a product of births experienced by the $a$-spine during the time period $(t, T]$, that is, like a copy of $Q(\xi^a_{T-t})$. It follows that conditionally on $\{\pi^{\Gamma_j}_t = [\Gamma_j]\}$, we have the following equivalence in law under $P$: which implies Finally, on the event $\{\pi_t = \gamma\}$ clearly the collection of spine groups $\Gamma_j$, $j = 1, 2, \dots, p$, are independent of one another during $[t, T]$ and of the $A$-spines during $[0, t]$, hence and this establishes the result for $n = 1$.
It is straightforward to see the result holds for n > 1 by induction. Suppose the result holds for all m = 1, ..., n. Let A ≺ γ 1 ≺ ...γ n ≺γ ≺ α be a partition chain of internal length n + 1, and let t 1 < ... < t n <t ∈ [0, T ]. Let the sets inγ have sizes k 1 + ...k p = k. Then by the case n = 1, where we note that the terms p j=1 P[N Our goal now is to use the symmetry theorem to calculate the joint generating function First we calculate the conditional generating function At time T , there are N T particles alive, k of whom are spines. Thus N T − k is equal to the number of particles at time T not carrying a k-spine. We call these the non-carriers. For a non-carrier alive at time T , let r u := sup{r ∈ [0, T ] : ∃l = 1, ..., k, ξ l r < u} be the time to most recent spine ancestor of u. On the event B, for each non-carrier u ∈ N T , there is a unique time interval [t i , t i+1 ) containing r u , and a unique j = 1, ..., |γ i | such that a Γ ij -spine comes arbitrarily close to this supremum. That is, a Γ ij spine was the most recent spine ancestor of u during the time interval [t i , t i+1 ). With this in mind, on the event B, let M Γi,j T be the set of non-carriers whose most recent k-spine ancestor of u was Γ i,j -spine during the interval [t i , t i+1 ). It follows that we may write Γi,j T |. By Theorem 3.4, the behaviour of the spines in Γ i,j during the period [t i , t i+1 ) looks like Q γij ,∆ti , independently of other (i, j), and hence the terms of the sum are conditionally independent on B.
Furthermore, again conditional on B, the number of particles at time t i+1 who were born off a spine in Γ i,j during the period [t i , t i+1 ) is distributed like N ∆ti − b ij under Q γij,∆ti , where we recall b ij = |γ ij |. Each one of these time t i+1 particles goes on to grow as an independent population governed by P over the time period [t i+1 , T ]. Write N
Using the fact that N T −t are distributed like independent copies of N T −t in the second equality below, the conditional generating function for M and using the conditional independence of the M Γij T we have For some γ, t with |γ| = k consider the quantity ∆ti ] and thus Multiplying this equation by (42), we obtain

Coalescent formula for uniformly chosen particles and proof of our main theorem

Expectation of functionals of uniformly chosen particles
Assume P[L k ] < ∞. Suppose, conditional on {N T ≥ k}, we pick uniformly k distinct particles and label them U 1 , ..., U k , writingŪ = (U 1 , ..., U k ). For a nonnegative functional f , we want to calculate the expectation The following lemma permits us to give a value for this quantity in terms of the Q k,T -behaviour of the spines {1, ..., k}.
as required. Proof. Recall the definition of the beta function: for positive integers $a$ and $b$, $B(a, b) := \int_0^1 s^{a-1}(1-s)^{b-1}\,ds = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}$. It follows that for $n \ge k$, we have the identity $\frac{1}{n^{(k)}} = \frac{1}{(k-1)!} \int_0^1 (1-s)^{k-1} s^{n-k}\,ds$. By Fubini's theorem, Proof. Recall that $Q^{k,T}(N_T \ge k) = 1$. Hence using Lemma 4.1 in the first equality below and the identity (46) in the second, The partition process $(\pi^{k,L,T}_t)_{t \in [0,T]}$ is only defined on the event $\{N_T \ge k\}$. In the interest of making formulas lighter, we return to our previous convention of making a slight abuse of notation and writing $P(\pi^{k,L,T}_t \in \cdot)$ rather than $P(\pi^{k,L,T}_t \in \cdot \mid N_T \ge k)$ for events relating to the process $(\pi^{k,L,T}_t)_{t \in [0,T]}$. We wrap together our work to prove our formula for the finite dimensional distributions of $(\pi^T_t)$, under the assumption $P[L^k] < \infty$. Namely for any partition chain $\gamma_i$, and times $t_i$ we now weave things together to show that $P(\pi_{t_1} = \gamma_1, \dots, \pi_{t_n} = \gamma_n) =$ Proof of Theorem 2.3 under $k$th-moment assumption. By Theorem 3.5, for the functional $f(\xi) =$ Now plug equation (49) $\int_{1-\epsilon}^{1} \frac{ds}{|f(s) - s|} = \infty$ holding for every $\epsilon$ near 0 is necessary and sufficient to ensure $P(N_t < \infty) = 1$ for every $t$.
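The beta-function identity invoked in the proofs above can be verified in exact arithmetic. In the form assumed here, it reads $\int_0^1 (1-s)^{k-1} s^{n-k}\,ds = (k-1)!/n^{(k)}$ for $n \ge k \ge 1$, where $n^{(k)} = n(n-1)\cdots(n-k+1)$ is the number of distinct $k$-tuples from $n$ particles. This is the step that converts the uniform-sampling weight $1/N_T^{(k)}$ into an integral against powers of $s$, and hence into generating functions. The helper names below are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(k, n):
    """Exact integral of (1 - s)^(k-1) * s^(n-k) over [0, 1], for n >= k >= 1."""
    # Expand (1 - s)^(k-1) binomially and integrate each monomial s^(n-k+i).
    return sum(
        Fraction((-1) ** i * comb(k - 1, i), n - k + i + 1) for i in range(k)
    )

def falling_factorial(n, k):
    """n^(k) = n (n-1) ... (n-k+1)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out

checks = [
    beta_integral(k, n) == Fraction(factorial(k - 1), falling_factorial(n, k))
    for k in range(1, 7)
    for n in range(k, 12)
]
print(all(checks))
```

Summing the identity against $P(N_T = n)\,n^{(k)}$ shows that $\int_0^1 (1-s)^{k-1} F^k_T(s)\,ds = (k-1)!\,P(N_T \ge k)$, which is a direct consequence of the identity and explains why a factor of this form can serve as a normalising constant.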
In order to safely use a coupling argument in Lemma 4.6, we require the following result. Proof. Suppose we have a continuous-time Galton-Watson process $N := (N_t)_{t \ge 0}$ with offspring generating function $f(s) = E[s^L] = p_0 + p_1 s + s^2 g(s)$ satisfying the non-explosion hypothesis. Couple $N$ with another process $M := (M_t)_{t \ge 0}$ with generating function $f^*(s) = (p_0 + p_1 + g(s))s^2$ as follows. Every time a particle in the process $N$ has 0 or 1 children, the corresponding particle in the process $M$ has 2 children. Writing $\hat{M}_t$ for the number who have ever lived until $t$ in the $M$-process, and $\hat{N}_t$ for the corresponding number in the $N$-process, clearly $P(\hat{N}_t \le \hat{M}_t) = 1$, and it is straightforward to verify that $f^*(s)$ also satisfies the non-explosion hypothesis, and hence $M_t$ is almost surely finite.
Consider in the process $M$ that every particle is replaced by at least two particles upon death, and hence there can have been at most $\tfrac{1}{2} M_t$ parents of particles alive at time $t$. A similar argument says that there can have been at most $\tfrac{1}{4} M_t$ grandparents, and so forth. It follows that we can bound above the number who have ever lived: $\hat{M}_t \le M_t \sum_{j \ge 0} 2^{-j} = 2 M_t$. Since $2M_t \ge \hat{M}_t \ge \hat{N}_t$, the latter quantity is almost surely finite.
The following lemma, a variant of the dominated convergence theorem, will be used in the proofs of Lemma 4.6, Theorem 2.5 and Theorem 2.7.
Lemma 4.5. Let $g$, $(g_n)$, and $h$, $(h_n)$ be measurable functions on a probability space $(\Omega, \mathcal{A}, \mu)$, with $|g_n| \leq h_n$ for all $n$, and such that $g_n \to g$, $h_n \to h$, and $\mu h_n \to \mu h$. Then $\mu g_n \to \mu g$.
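One standard route to a statement of this type goes through Fatou's lemma. The sketch below assumes, beyond what is written in the lemma, that the convergences $g_n \to g$ and $h_n \to h$ hold $\mu$-almost everywhere and that $\mu h < \infty$. It is offered as a plausible argument, not necessarily the one intended here.

```latex
% Sketch only. Assumptions: g_n -> g and h_n -> h mu-a.e., and mu h < infinity.
Since $|g_n| \le h_n$, the functions $h_n + g_n$ and $h_n - g_n$ are non-negative,
and $|g| \le h$ in the limit, so $\mu g$ is well defined and finite.
Applying Fatou's lemma to each sequence and using $\mu h_n \to \mu h$ gives
\begin{align*}
  \mu h + \mu g = \mu(h + g)
    &\le \liminf_{n} \mu(h_n + g_n) = \mu h + \liminf_{n} \mu g_n, \\
  \mu h - \mu g = \mu(h - g)
    &\le \liminf_{n} \mu(h_n - g_n) = \mu h - \limsup_{n} \mu g_n.
\end{align*}
Subtracting the finite quantity $\mu h$ from both lines yields
\[
  \limsup_{n} \mu g_n \;\le\; \mu g \;\le\; \liminf_{n} \mu g_n,
\]
and hence $\mu g_n \to \mu g$.
```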
Lemma 4.6. Theorem 2.3 holds for every offspring distribution $L$ such that $\mathbb{P}(L \geq 2) > 0$ and which satisfies the non-explosion hypothesis.
Proof. Our proof idea is as follows. To calculate the distribution of $(\pi^T_t)_{t \in [0,T]}$, we first calculate $(\pi^T_t)_{t \in [0,T]}$ from a tree where branch sizes are bounded by $n$, so that the moments are finite and the formula (15) applies. Then we send $n \to \infty$, showing that the formula (15) converges suitably.
Let $L$ be a random variable, and let $\mathrm{Tree}_T$ be the continuous-time Galton-Watson tree run until time $T$ and $(N_t)_{t \in [0,T]}$ be the corresponding process. We couple the tree $\mathrm{Tree}_T$ with a tree with bounded branching as follows. Let $\mathrm{Tree}_{T,n}$ be the tree with offspring distribution $L \mathbb{1}_{L \leq n}$ obtained by replacing any birth of size greater than $n$ in $\mathrm{Tree}_T$ with a birth of size zero, and let $(N_{n,t})_{t \in [0,T]}$ be the associated process.
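By Lemma 4.4, the number $\bar{N}_T$ of particles that have ever lived up to time $T$ satisfies $\mathbb{P}(\bar{N}_T < \infty) = 1$, and thus $\mathbb{P}(\bar{N}_T \leq n) \uparrow 1$ as $n \to \infty$. Note that $\{\bar{N}_T \leq n\}$ ensures $\{\mathrm{Tree}_{T,n} = \mathrm{Tree}_T\}$, since if at most $n$ particles have ever lived, no particle ever had more than $n$ offspring. It follows that $\mathbb{P}(\mathrm{Tree}_{T,n} = \mathrm{Tree}_T) \uparrow 1$ as $n \to \infty$. In particular, $N_{n,t} \to N_t$ almost surely for each $t \in [0,T]$, and hence $\mathbb{P}(N_{n,T} \geq k) \to \mathbb{P}(N_T \geq k)$.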
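The truncation coupling just described can be illustrated with a short simulation. This is a sketch under stated assumptions, not the construction used in the proof: the names `sample_offspring`, `simulate_tree` and `truncate` are hypothetical, and the geometric offspring law with mean 1 is chosen only because it is unbounded yet easy to sample. The code builds a tree up to a fixed time, then derives the truncated tree from the same randomness by turning every birth of size greater than `n` into a birth of size zero.

```python
import random


def sample_offspring(rng):
    """Hypothetical offspring law: P(L = j) = 2^-(j+1) on {0, 1, 2, ...}, mean 1."""
    count = 0
    while rng.random() < 0.5:
        count += 1
    return count


def simulate_tree(t_max, rng):
    """Simulate a rate-1 continuous-time Galton-Watson tree up to time t_max.

    Returns a list indexed by particle; entry i is (parent_index, children),
    where children is None if particle i is still alive at t_max.
    Parents always receive a smaller index than their children.
    """
    particles = []
    pending = [(-1, 0.0)]  # (parent index, birth time); the root has parent -1
    while pending:
        parent, birth = pending.pop()
        index = len(particles)
        death = birth + rng.expovariate(1.0)
        if death > t_max:
            particles.append((parent, None))
            continue
        children = sample_offspring(rng)
        particles.append((parent, children))
        pending.extend((index, death) for _ in range(children))
    return particles


def truncate(particles, n):
    """Replace each birth of size > n by a birth of size zero.

    Returns a dict mapping the index of each surviving particle to its
    (parent_index, children) entry in the truncated tree.
    """
    kept = {}
    no_descendants = set()  # particles whose children are absent after truncation
    for index, (parent, children) in enumerate(particles):
        if parent in no_descendants:
            no_descendants.add(index)  # removed, and so are its own descendants
            continue
        if children is not None and children > n:
            kept[index] = (parent, 0)  # oversized birth becomes a birth of size zero
            no_descendants.add(index)
        else:
            kept[index] = (parent, children)
    return kept


if __name__ == "__main__":
    rng = random.Random(7)
    trees = [simulate_tree(3.0, rng) for _ in range(2000)]
    for n in (1, 2, 4, 8, 16):
        same = sum(truncate(t, n) == dict(enumerate(t)) for t in trees)
        print(n, same / len(trees))
```

Because the truncated tree is derived from the original one and not sampled afresh, the two trees agree exactly whenever the total number of particles is at most `n`, and the fraction of agreeing runs is non-decreasing in `n`.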
If we pick $k$ particles from $\mathrm{Tree}_{T,n}$ and call the resulting partition process $(\pi^{n,T}_t)_{t \in [0,T]}$, it follows that $\mathbb{P}\big( (\pi^{n,T}_t)_{t \in [0,T]} = (\pi^T_t)_{t \in [0,T]} \big) \to 1$ as $n \to \infty$, since the partition processes correspond to subtrees of the trees $\mathrm{Tree}_{T,n}$ and $\mathrm{Tree}_T$ respectively.
It remains to check that for a process $(N_{n,t})_{t \geq 0}$ with offspring distribution $L \mathbb{1}_{L \leq n}$ and generating function $F_{n,t}(s)$, as $n \uparrow \infty$,

First, let us establish that for all $(j, t, s)$, as $n \to \infty$, $F^j_{n,t}(s) \to F^j_t(s)$.
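If $s = 1$, then $F^j_{n,t}(1) = \mathbb{E}[N_{n,t}^{(j)}] \to \mathbb{E}[N_t^{(j)}] = F^j_t(1)$ by the monotone convergence theorem.
If $s < 1$, then the function $m \mapsto m^{(j)} s^{m-j}$ is bounded for $m \in \{0, 1, 2, \ldots\}$ by a constant $M > 0$. Now note that $N_{n,t} \to N_t$ implies $M \geq N_{n,t}^{(j)} s^{N_{n,t}-j} \to N_t^{(j)} s^{N_t-j}$, and hence $F^j_{n,t}(s) = \mathbb{E}[N_{n,t}^{(j)} s^{N_{n,t}-j}] \to \mathbb{E}[N_t^{(j)} s^{N_t-j}] = F^j_t(s)$ by the bounded convergence theorem.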
To see the convergence of (50) to (51), recall that the Faà di Bruno formula for smooth function semigroups, equation (21), states that

Since $F^j_t(s) \geq 0$ for every $(j, t, s)$, every term in the sum above is non-negative, and we have the domination relation

Similarly,

By our assumption, $\mathbb{P}(N_T \geq k) > 0$, and hence by Lemma 4.2, for every $n$,

Proof of the mixture representation
Here we prove our remark at the end of Section 2.1 that $(\pi^{k,L,T}_t)_{t \in [0,T]}$ is a mixture of stochastic processes constructed as follows. Sample a random variable $S$ from the probability density

conditional on $S = s$, $(\pi^{k,L,T}_t)_{t \in [0,T]}$ has finite dimensional distributions given by

Proof. Clearly, with the definitions (55) and (56) above, we may rewrite (15):

It remains to see that by Lemma 4.2, $M^{k,L,T}$ is a probability measure on $[0, 1]$, and that $K$ is a probability measure on the product space P

Remark. In the proof of the split time representation, we will use the mixture representation to sidestep technical issues of integral convergence.

Proof of the split time representation
Recall that we say $[\{1, \ldots, k\}] = \eta_0 \prec \eta_1 \prec \ldots \prec \eta_n = [\{1\}, \ldots, \{k\}]$ is a maximal chain of partitions if $\eta_{i+1}$ is created from $\eta_i$ by breaking precisely one block of $\eta_i$ into $c_i \geq 2$ blocks in $\eta_{i+1}$, and we write

Let $(\eta_i)_{i=0,\ldots,n}$ be a maximal chain in which a block breaks into $c_i$ blocks at each split. Now we prove the split time representation, which states that

Proof of Theorem 2.4. By the mixture representation, $(\pi^T_t)$ is a mixture of stochastic processes with finite dimensional distributions given by

We calculate the densities for the split times under the measure $K(\cdot \mid s)$ for $s \in [0, 1)$. Namely, for $0 = t_0 < \ldots < t_n < t_{n+1} = T$, we use (15) to give probabilities that the process $(\pi^T_t)_{t \in [0,T]} \mid S = s$ first visits the state $\eta_i$ in the time period $(t_i, t_i + dt_i] =: dt_i$. That is, we calculate

Figure 9. The joint event that for each $i$, $(\pi_t)$ jumps from $\eta_{i-1}$ to $\eta_i$ in a small interval near $t_i$.
It follows that the numerator of (59) is given by

Now, it is a straightforward calculation that the generating function $F_t(s) = \mathbb{E}[s^{N_t}]$ satisfies

where $f^n(s) = \mathbb{E}[L^{(n)} s^{L-n}]$. Modulo lower order terms, we can simplify (61):

Letting $c_0 := 2$ for convenience, we can collect the $F'$ terms and write

Now by the functional identity

and the chain rule, we have

It then follows that

Proofs of the supercritical and subcritical asymptotics

A remark on convergence
Suppose we have a collection of monotone (in terms of partial ordering) and right-continuous partition processes whose finite dimensional distributions converge as some parameter tends to ∞.
The following lemma shows that the collection has a right-continuous and monotone limit.
Lemma 5.1. Let $(P, \prec)$ be a finite partially ordered set, and let $\{\lambda^T = (\lambda^T_t)_{t \in [0,T]} : T > 0\}$ be a collection of right-continuous processes taking values in $P$ that are $(P, \prec)$-increasing in the sense that $\lambda^T_s \prec \lambda^T_t$ for each $s \leq t \in [0, T]$. Suppose further that the finite dimensional distributions of $\lambda^T$ converge, that is, for all $t_1, \ldots, t_n \in [0, \infty)$ and $\gamma_1, \ldots, \gamma_n \in P$, the limit $$\mu_{t_1, \ldots, t_n}(\gamma_1, \ldots, \gamma_n) := \lim_{T \to \infty} \mathbb{P}(\lambda^T_{t_i} = \gamma_i \ \forall i)$$ (taken for $T \geq \max\{t_1, \ldots, t_n\}$) exists. Then there exists a right-continuous and increasing limit process $(\lambda_t)_{t \in [0,\infty)}$ such that $$\mathbb{P}(\lambda_{t_i} = \gamma_i \ \forall i) = \mu_{t_1, \ldots, t_n}(\gamma_1, \ldots, \gamma_n). \tag{63}$$

Proof. For each $T$, $\lambda^T$ has a natural right-continuous extension to all of $[0, \infty)$ constructed by setting $\lambda^T_t = \lambda^T_T$ for each $t \geq T$. The existence of a right-continuous limit process $(\lambda_t)_{t \in [0,\infty)}$ satisfying (63) is a consequence of the right-continuity of the $\lambda^T$ and the finiteness (and hence separability) of the space $P$. See [5, Chapter III, Theorem 7.8b] for details.
By the previous lemma, to establish convergence to suitable limit processes in the supercritical and subcritical cases, it suffices to prove that the finite dimensional distributions of the respective processes $(\pi^{k,L,T}_t)_{t \in [0,T]}$ and $(\rho^{k,L,T}_t)_{t \in [0,T]}$ converge.

Proof of the subcritical asymptotics
To see that the integrals converge, note that by the mixture representation, we have the domination relation and apply the argument in the final paragraph of the proof of Lemma 4.6.