A 2-spine Decomposition of the Critical Galton-Watson Tree and a Probabilistic Proof of Yaglom's Theorem

In this note we propose a two-spine decomposition of the critical Galton-Watson tree and use this decomposition to give a probabilistic proof of Yaglom's theorem.

In [7], Lyons, Pemantle and Peres gave a probabilistic proof of Theorem 1.1 using the so-called size-biased µ-Galton-Watson tree. In this note, by size-biased transform we mean the following: Let X be a random variable and g(X) be a Borel function of X with P(g(X) ≥ 0) = 1 and E[g(X)] ∈ (0, ∞). We say a random variable W is a g(X)-size-biased transform (or simply g(X)-transform) of X if, for each positive Borel function f,

E[f(W)] = E[g(X) f(X)] / E[g(X)].

An X-transform of X is sometimes called a size-biased transform of X. We now recall the size-biased µ-Galton-Watson tree introduced in [7]. Let L be a random variable with distribution µ. Denote by L̇ an L-transform of L. The celebrated size-biased µ-Galton-Watson tree is then constructed as follows:
• There is an initial particle which is marked.
• Any marked particle gives birth independently to a random number of children according to L̇. Pick one of those children randomly as the new marked particle while leaving the other children as unmarked particles.
• Any unmarked particle gives birth independently to a random number of unmarked children according to L.
• The evolution goes on.
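To make the construction concrete, here is a minimal simulation sketch (our illustration, not part of the original argument). It assumes a critical Poisson(1) offspring law, for which the L-transform L̇ is distributed as 1 + Poisson(1), and it numerically checks the identity E[1/Ż_n] = P(Z_n > 0), which follows from the size-biased relation E[f(Ż_n)] = E[Z_n f(Z_n)] with f(z) = (1/z)1_{z>0}.

```python
import numpy as np

rng = np.random.default_rng(0)

def sb_population(n, samples):
    """Simulate Ż_n, the generation-n population of the size-biased
    µ-Galton-Watson tree, for µ = Poisson(1) (critical, σ² = 1).
    For Poisson(1), the L-transform L̇ is 1 + Poisson(1), so the spine
    particle contributes one spine child plus Poisson(1) unmarked
    children; a sum of k independent Poisson(1) offspring counts is
    Poisson(k), which lets us evolve a whole generation at once."""
    z = np.ones(samples, dtype=np.int64)        # Ż_0 = 1 (the marked root)
    for _ in range(n):
        unmarked = z - 1                        # current unmarked particles
        z = 1 + rng.poisson(1.0, samples) + rng.poisson(unmarked)
    return z

def survival_prob(n):
    """p_n = 1 - f^{(n)}(0) for f(s) = e^{s-1}, the Poisson(1) p.g.f."""
    q = 0.0
    for _ in range(n):
        q = np.exp(q - 1.0)
    return 1.0 - q

n = 10
zdot = sb_population(n, 200_000)
mc = np.mean(1.0 / zdot)          # Monte Carlo estimate of E[1/Ż_n]
exact = survival_prob(n)          # P(Z_n > 0)
print(mc, exact)
```

The spine guarantees Ż_n ≥ 1, so the tree never dies out; that the reweighted mean of 1/Ż_n recovers the (exponentially computed) survival probability is exactly the one-spine transform identity at work.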
Notice that the marked particles form a descending family line which will be referred to as the spine. Define Ż_n as the population of the nth generation in the size-biased tree. It is proved in [7] that the process (Ż_n)_{n≥0} is a martingale transform of the process (Z_n)_{n≥0} via the martingale (Z_n)_{n≥0}. That is, for any generation number n and any bounded Borel function g on N_0^n,

E[g(Ż_1, . . . , Ż_n)] = E[Z_n g(Z_1, . . . , Z_n)].

It is natural to consider probabilistic proofs of results analogous to Theorem 1.1 for more general critical branching processes. Vatutin and Dyakonova [9] gave a probabilistic proof of Theorem 1.1(1) for multitype critical branching processes. As far as we know, there is no probabilistic proof of Yaglom's theorem for multitype critical branching processes. It seems difficult to adapt the probabilistic proofs in [3] and [7] for monotype branching processes to more general models, such as multitype branching processes, branching Hunt processes and superprocesses.
Our approach rests on characterizing the exponential distribution through an x²-type size-biased distributional equation. An intuitive explanation of our method, and a comparison with the methods of [3] and [7], is given in the next subsection. We think this new point of view on convergence to the exponential law provides alternative insight into Yaglom's classical theorem.
We now give a formal construction of our k(k − 1)-type size-biased µ-Galton-Watson tree. Denote by L̇ an L-transform of L, and by L̈ an L(L − 1)-transform of L. Fix a generation number n and pick a random generation number K_n uniformly among {0, . . . , n − 1}. The k(k − 1)-type size-biased µ-Galton-Watson tree with height n is then defined as a particle system such that:
• There is an initial particle which is marked.
• At any generation before or after K_n, any marked particle gives birth independently to a random number of children according to L̇. Pick one of those children randomly as the new marked particle while leaving the other children as unmarked particles.
• The marked particle at generation K_n, however, gives birth, independently of the other particles, to a random number of children according to L̈. Pick two different particles randomly among those children as the new marked particles while leaving the other children as unmarked particles.
• Any unmarked particle gives birth independently to a random number of unmarked children according to L.
• The system stops at generation n.
If we track all the marked particles, it is clear that they form a two-spine skeleton with K n being the last generation where those two spines are together. It would be helpful to consider this skeleton as two disjoint spines, where the longer spine is a family line from generation 0 to n and the shorter spine is a family line from generation K n + 1 to n.
For any 0 ≤ m ≤ n, denote by Z̈_m^{(n)} the population of the mth generation in the k(k − 1)-type size-biased µ-Galton-Watson tree with height n. The main reason for proposing such a model is that the process (Z̈_m^{(n)})_{0≤m≤n} can be viewed as a Z_n(Z_n − 1)-transform of the process (Z_m)_{0≤m≤n}. This is made precise in the result below, which will be proved in Section 2.1.

Theorem 1.2. Let (Z_m)_{m≥0} be a µ-Galton-Watson process and (Z̈_m^{(n)})_{0≤m≤n} be the population of a k(k − 1)-type size-biased µ-Galton-Watson tree with height n. Suppose that µ satisfies (1.1) and (1.2). Then, for any bounded Borel function g on N_0^n,

E[g(Z̈_1^{(n)}, . . . , Z̈_n^{(n)})] = (1/(nσ²)) E[Z_n(Z_n − 1) g(Z_1, . . . , Z_n)].

The idea of considering a branching particle system with more than one spine is not new. A particle system with k spines was constructed in [4] and used in the many-to-few formula for branching Markov processes and branching random walks. Inspired by [4], we use a two-spine model to characterize the k(k − 1)-type size-biased branching process.
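The two-spine construction can also be simulated generation by generation. The sketch below (our illustration, under an assumed Poisson(1) offspring law, for which σ² = 1, L̇ is 1 + Poisson(1) and the L(L − 1)-transform L̈ is 2 + Poisson(1)) checks the Z_n(Z_n − 1)-transform property stated above with g(z_1, . . . , z_n) = 1/(z_n(z_n − 1))1_{z_n≥2}, for which the right-hand side reduces to P(Z_n ≥ 2)/(nσ²).

```python
import numpy as np

rng = np.random.default_rng(1)

def two_spine_population(n, samples):
    """Simulate Z̈_n^{(n)} for µ = Poisson(1).  K_n is uniform on
    {0, ..., n-1}; at generation K_n the single marked particle has
    L̈ = 2 + Poisson(1) children, two of which become spines.  At every
    other generation each spine particle has L̇ = 1 + Poisson(1)
    children, so it contributes Poisson(1) unmarked children either way."""
    K = rng.integers(0, n, samples)
    spines = np.ones(samples, dtype=np.int64)
    unmarked = np.zeros(samples, dtype=np.int64)
    for t in range(n):
        new_unmarked = rng.poisson(unmarked)    # offspring of unmarked particles
        new_unmarked += rng.poisson(spines)     # unmarked children of the spines
        spines = np.where(K == t, 2, spines)    # split into two spines at K_n
        unmarked = new_unmarked
    return spines + unmarked

def p_zn_ge_2(n):
    """P(Z_n >= 2) = 1 - q_n - P(Z_n = 1), where q_k = f^{(k)}(0) and
    P(Z_n = 1) = prod_{k=1}^{n} q_k for f(s) = e^{s-1}."""
    q, prod = 0.0, 1.0
    for _ in range(n):
        q = np.exp(q - 1.0)
        prod *= q
    return 1.0 - q - prod

n = 8
z = two_spine_population(n, 200_000)
mc = np.mean(1.0 / (z * (z - 1)))    # E[1/(Z̈_n(Z̈_n - 1))]
exact = p_zn_ge_2(n) / n             # P(Z_n >= 2)/(n σ²) with σ² = 1
print(mc, exact)
```

Since both spines survive to generation n, Z̈_n^{(n)} ≥ 2 always, so the test function is well defined.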

Methods
Suppose that X is a non-negative random variable with E[X] ∈ (0, ∞). Then its distribution conditioned on {X > 0} can be characterized by its conditional expectation E[X|X > 0] and its size-biased transform Ẋ. In fact, for each λ ≥ 0,

E[e^{−λX} | X > 0] = 1 − E[X | X > 0] ∫_0^λ E[e^{−θẊ}] dθ.

This observation will be used in our proof of Theorem 1.1(2). In Section 3, for completeness, we will also simplify the argument of [2] and [9] and give a proof of Theorem 1.1(1). Our method of proving (1.6) takes advantage of the fact that the exponential distribution is characterized by an x²-type size-biased distributional equation. This is made precise in the following lemma, which will be proved in Section 3.

Lemma 1.3. Let Y be a strictly positive random variable with finite second moment. Then Y is exponentially distributed if and only if

Ÿ d = Ẏ + UẎ′, (1.7)

where Ẏ and Ẏ′ are independent Y-transforms of Y, Ÿ is a Y²-transform of Y, and U is a uniform random variable on [0, 1] independent of (Ẏ, Ẏ′).

With this lemma and Theorem 1.2, we can give an intuitive explanation of our proof of Yaglom's theorem. The population Z̈_n^{(n)} of the nth generation of the two-spine tree can be separated into two parts: descendants from the longer spine and descendants from the shorter spine. Due to their construction, the first part, the descendants from the longer spine at generation n, is distributed approximately like Ż_n, while the second part, the descendants from the shorter spine at generation n, is distributed approximately like Ż′_{U·n}, with U uniform on [0, 1]. Those two parts are approximately independent of each other. So, after a renormalization, we have roughly that

Z̈_n^{(n)} ≈ Ż_n + Ż′_{U·n} in distribution, (1.8)

where the process (Ż′_m) is an independent copy of (Ż_m). Suppose that Ż_n/n converges weakly to a random variable Ẏ, and Z̈_n^{(n)}/n converges weakly to a random variable Ÿ. Then, letting n → ∞ in (1.8), (1.7) should hold, which, by Lemma 1.3, suggests the exponential limit law in Yaglom's theorem.
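The transforms in the x²-type equation Ÿ d = Ẏ + UẎ′ are explicit for the exponential law: if Y is exponential with mean m, then a Y-transform is Gamma(2) with scale m and a Y²-transform is Gamma(3) with scale m (standard facts, used here only for an added numerical illustration). The following sketch checks the equation by comparing Laplace transforms:

```python
import numpy as np

rng = np.random.default_rng(2)
m, lam, N = 1.0, 1.0, 400_000

# Left side: Ÿ, a Y²-transform of an exponential Y with mean m,
# i.e. Gamma(3) with scale m.
lhs = rng.gamma(3.0, m, N)

# Right side: Ẏ + U·Ẏ', with Ẏ, Ẏ' independent Y-transforms
# (Gamma(2, m)) and U uniform on [0, 1], independent of both.
rhs = rng.gamma(2.0, m, N) + rng.uniform(0, 1, N) * rng.gamma(2.0, m, N)

# Compare empirical Laplace transforms at λ = 1 with the exact value
# E[e^{-λŸ}] = (1 + mλ)^{-3} = 1/8.
lt_lhs = np.mean(np.exp(-lam * lhs))
lt_rhs = np.mean(np.exp(-lam * rhs))
print(lt_lhs, lt_rhs)
```

The agreement of both empirical transforms with (1 + mλ)^{-3} reflects the identity ∫_0^1 (1 + mλu)^{-2} du = (1 + mλ)^{-1}, which is the analytic content of the equation.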
It is interesting to compare this method of proving exponential convergence with the methods used in [3] and [7]. In [7], Lyons, Pemantle and Peres characterize the exponential distribution by a different but well-known x-type size-biased distributional equation: a nonnegative random variable Y with positive finite mean is exponentially distributed if and only if

Y d = UẎ, (1.9)

where Ẏ is a Y-transform of Y, and U is a uniform random variable on [0, 1] which is independent of Ẏ. With the help of the size-biased tree, they then show that UŻ_n is distributed approximately like Z_n conditioned on {Z_n > 0}. So, after a renormalization, they have roughly that

{Z_n ; P(·|Z_n > 0)} ≈ UŻ_n in distribution. (1.10)

Suppose that {Z_n/n ; P(·|Z_n > 0)} converges weakly to a random variable Y, and Ż_n/n converges weakly to a random variable Ẏ. Then, according to [7, Lemma 4.3], Ẏ is the size-biased transform of Y. Therefore, letting n → ∞ in (1.10), Y should satisfy (1.9), which suggests that Y is exponentially distributed.
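The x-type equation admits the same kind of quick numerical check (again our hedged illustration, using the explicit Gamma(2) form of the size-biased exponential):

```python
import numpy as np

rng = np.random.default_rng(3)
m, N = 1.0, 400_000

# U·Ẏ with Ẏ a Y-transform (Gamma(2, m)) of an exponential Y with
# mean m, and U uniform on [0, 1] independent of Ẏ.
sample = rng.uniform(0, 1, N) * rng.gamma(2.0, m, N)

# An exponential with mean m has Laplace transform 1/(1 + mλ); check at λ = 1.
lt = np.mean(np.exp(-sample))
print(lt, 1.0 / (1.0 + m))
```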
In [3], Geiger characterizes the exponential distribution by another distributional equation: if Y^{(1)} and Y^{(2)} are independent copies of a random variable Y with positive finite variance, and U is an independent uniform random variable on [0, 1], then Y is exponentially distributed if and only if

Y d = U(Y^{(1)} + Y^{(2)}). (1.11)

Geiger then shows that for (Z_n), conditioned on non-extinction at generation n, the distribution of the generation of the most recent common ancestor (MRCA) of the particles at generation n is asymptotically uniform on {0, 1, . . . , n} (a result due to [11], see also [2]), and that asymptotically the MRCA has two children, each with at least one descendant in generation n. After a renormalization, roughly speaking, Geiger has that

{Z_n ; P(·|Z_n > 0)} ≈ Z^{(1)}_{U·n} + Z^{(2)}_{U·n} in distribution, (1.12)

where for each m, Z^{(1)}_m and Z^{(2)}_m are independent copies of {Z_m ; P(·|Z_m > 0)}. Therefore, if {Z_n/n ; P(·|Z_n > 0)} converges weakly to a random variable Y, then Y should satisfy (1.11), which suggests that Y is exponentially distributed.
From this comparison, we see that all the methods mentioned above share one similarity: they all establish the exponential convergence via some particular distributional equation. However, since the equations (1.7), (1.9) and (1.11) are different, the actual way of proving the convergence varies. In [7], an elegant tightness argument is made along with (1.10). However, this tightness argument does not seem suitable for (1.12), because the conditional convergence along some subsequence Z_{n_k}/n_k implies the convergence of UŻ_{n_k}/n_k, but does not imply the convergence of Z^{(i)}_{U·n_k}/(U·n_k), i = 1, 2. Instead, a contraction-type argument in the L²-Wasserstein metric is used in [3].
For similar reasons, in this note, some effort is also needed to actually prove the exponential convergence using (1.8) and (1.7). We observe that the distributional equation (1.8) admits a so-called size-biased add-on structure, which is related to Lévy's theory of infinitely divisible distributions: suppose that X is a nonnegative random variable with a := E[X] ∈ (0, ∞); then X is infinitely divisible if and only if there exists a nonnegative random variable A, independent of X, such that Ẋ d = X + A. In fact, the Laplace exponent of X can then be expressed as

−ln E[e^{−λX}] = a ∫_0^λ E[e^{−θA}] dθ.

Comparing distributions through their add-on structures leads to Yaglom's theorem. This is made precise in Section 3. A similar type of argument is also used in our follow-up paper [8] for critical superprocesses.
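A concrete instance of the add-on structure (our illustration, using the standard fact that size-biasing a Poisson variable shifts it by one): for X ~ Poisson(λ) the add-on is A ≡ 1, and the Laplace-exponent formula −ln E[e^{−θX}] = a ∫_0^θ E[e^{−sA}] ds, with a = E[X] = λ, can be verified directly.

```python
import math

lam = 2.0

def pois_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Size-biased (X-transform) pmf: P(Ẋ = k) = k·µ(k)/E[X].  For Poisson,
# this equals the pmf of 1 + Poisson(lam), so Ẋ = X + A with A ≡ 1.
for k in range(1, 15):
    assert abs(k * pois_pmf(k, lam) / lam - pois_pmf(k - 1, lam)) < 1e-12

# Laplace exponent of Poisson(lam): -ln E[e^{-θX}] = lam(1 - e^{-θ}).
theta = 1.3
lhs = lam * (1.0 - math.exp(-theta))

# a ∫_0^θ E[e^{-sA}] ds with A ≡ 1, evaluated by the midpoint rule.
steps = 100_000
ds = theta / steps
rhs = lam * sum(math.exp(-(i + 0.5) * ds) for i in range(steps)) * ds
print(lhs, rhs)
```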

Trees and their decompositions

2.1 Spaces and measures
In this subsection, we give a proof of Theorem 1.2. Consider particles as elements of the space

U := ⋃_{n≥0} N^n,

where N := {1, 2, . . .} and N^0 := {∅}. Therefore elements of U are of the form 213, which we read as: the individual which is the 3rd child of the 1st child of the 2nd child of the initial ancestor ∅. For two particles u = u_1 . . . u_n, v = v_1 . . . v_m ∈ U, uv denotes the concatenated particle uv := u_1 . . . u_n v_1 . . . v_m. We use the conventions u∅ = ∅u = u and u_1 . . . u_n = ∅ if n = 0. For any particle u := u_1 . . . u_{n−1}u_n, we define its generation as |u| := n and its parent particle as ←u := u_1 . . . u_{n−1}. For any particle u ∈ U and any subset a ⊂ U, we define the number of children of u in a as l_u(a) := #{α ∈ a : ←α = u}. We also define the height of a as |a| := sup_{α∈a} |α| and its population in the nth generation as X_n(a) := #{u ∈ a : |u| = n}. A tree t is defined as a subset of U such that there exists an N_0-valued sequence (l_u)_{u∈U}, indexed by U, satisfying t = {u_1 . . . u_m ∈ U : m ≥ 0, u_j ≤ l_{u_1...u_{j−1}}, ∀j = 1, . . . , m}.
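These Ulam-Harris conventions are easy to make executable (a hypothetical encoding, introduced only for illustration: particles as tuples of positive integers, with the root ∅ as the empty tuple).

```python
# Particles as tuples over N = {1, 2, ...}; the root ∅ is the empty tuple.
def parent(u):
    """The parent particle: drop the last letter."""
    return u[:-1]

def generation(u):
    """|u|: the generation of particle u."""
    return len(u)

def children_count(u, a):
    """l_u(a): number of children of u inside the set a."""
    return sum(1 for v in a if len(v) == len(u) + 1 and v[:-1] == u)

def population(n, a):
    """X_n(a): population of the nth generation of a."""
    return sum(1 for v in a if len(v) == n)

# The particle "213": the 3rd child of the 1st child of the 2nd child of ∅.
u = (2, 1, 3)
# A valid tree: the root has 2 children and the 2nd child has 1 child.
t = {(), (1,), (2,), (2, 1)}
print(generation(u), parent(u), children_count((2,), t), population(2, t))
```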
Fix a generation number n ∈ N. Define the following spaces: • The space of trees with height no more than n, T ≤n := {t : t is a tree with |t| ≤ n}.
• The space of n-height trees with one distinguishable spine, T n := {(t, v) : t is a tree with |t| = n, v is a spine on t}.
• The space of n-height trees with two different distinguishable spines, T̈_n := {(t, v, v′) : t is a tree with |t| = n, v and v′ are two different spines on t}.

Let (L_u)_{u∈U} be a collection of independent random variables with law µ, indexed by U. Denote by T the random tree defined by T := {u_1 . . . u_m ∈ U : 0 ≤ m ≤ n, u_j ≤ L_{u_1...u_{j−1}}, ∀j = 1, . . . , m}.
We refer to T as a µ-Galton-Watson tree with height no more than n, since its population (X_m(T))_{0≤m≤n} is a µ-Galton-Watson process stopped at generation n. Define the µ-Galton-Watson measure G_n on T_≤n as the law of the random tree T. That is, for any t ∈ T_≤n,

G_n(t) := P(T = t) = P(L_u = l_u(t) for every u ∈ t with |u| < n) = ∏_{u∈t:|u|<n} µ(l_u(t)).
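The product formula for G_n(t) can be checked on a small example (our sketch, under an assumed Poisson(1) offspring law, with a Monte Carlo cross-check):

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def mu(k, lam=1.0):
    """Poisson offspring law µ(k), with mean lam = 1 (critical)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# A height-2 plane tree t = {∅, 1, 2, 11}: the root has 2 children,
# the 1st child has 1 child, the 2nd child has none.  Then
# G_2(t) = ∏_{u ∈ t, |u| < 2} µ(l_u(t)) = µ(2)·µ(1)·µ(0) = e^{-3}/2.
exact = mu(2) * mu(1) * mu(0)

# Monte Carlo: grow a µ-Galton-Watson tree to height 2 and record how
# often it equals t, i.e. L_∅ = 2, L_1 = 1 and L_2 = 0.
N = 200_000
L_root = rng.poisson(1.0, N)
L_c1 = rng.poisson(1.0, N)    # offspring of the 1st child (when it exists)
L_c2 = rng.poisson(1.0, N)    # offspring of the 2nd child (when it exists)
freq = np.mean((L_root == 2) & (L_c1 == 1) & (L_c2 == 0))
print(freq, exact)
```

Note that trees in the Ulam-Harris labelling are plane (ordered) trees, so {∅, 1, 2, 11} and {∅, 1, 2, 21} are distinct and each carries the same G_2-mass.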
Recall that L̇ is an L-transform of L. Define Ċ as a random number which, conditioned on L̇, is uniformly distributed on {1, . . . , L̇}. Independent of (L_u)_{u∈U}, let (L̇_u, Ċ_u)_{u∈U} be a collection of independent copies of (L̇, Ċ), indexed by U. We then use (L_u)_{u∈U} and (L̇_u, Ċ_u)_{u∈U} as the building blocks to construct the size-biased µ-Galton-Watson tree Ṫ and its distinguishable spine V̇, following the steps described in Section 1.1: we use L_u as the number of children of particle u if u is unmarked, and use L̇_u if u is marked; in the latter case, we always set the Ċ_u-th child of u, i.e. particle uĊ_u, as the new marked particle. For convenience, we stop the system at generation n. To be precise, the random spine V̇ is defined by

V̇ := {u_1 . . . u_m ∈ U : 0 ≤ m ≤ n, u_j = Ċ_{u_1...u_{j−1}}, ∀j = 1, . . . , m},

and the random tree Ṫ is defined by

Ṫ := {u_1 . . . u_m ∈ U : 0 ≤ m ≤ n, u_j ≤ L̃_{u_1...u_{j−1}}, ∀j = 1, . . . , m},

where, for any u ∈ U, L̃_u := L_u 1_{u∉V̇} + L̇_u 1_{u∈V̇}.
Recall that K_n is a random generation number uniformly distributed on {0, . . . , n − 1}, and that L̈ is an L(L − 1)-transform of L. Define (C̈, C̈′) as a random vector which, conditioned on L̈, is uniformly distributed on {(i, j) ∈ N² : 1 ≤ i ≠ j ≤ L̈}. Suppose that (L_u)_{u∈U}, (L̇_u, Ċ_u)_{u∈U}, (L̈, C̈, C̈′) and K_n are independent of each other. We now use these elements to build the k(k − 1)-type size-biased µ-Galton-Watson tree T̈ and its two different distinguishable spines V̈ and V̈′, following the steps described in Section 1.1. Write

C_u := Ċ_u 1_{|u|≠K_n} + C̈ 1_{|u|=K_n} and C′_u := Ċ_u 1_{|u|≠K_n} + C̈′ 1_{|u|=K_n}.

We define the random spines V̈ and V̈′ as

V̈ := {u_1 . . . u_m ∈ U : 0 ≤ m ≤ n, u_j = C_{u_1...u_{j−1}}, ∀j = 1, . . . , m},
V̈′ := {u_1 . . . u_m ∈ U : 0 ≤ m ≤ n, u_j = C′_{u_1...u_{j−1}}, ∀j = 1, . . . , m},

and the random tree T̈ as

T̈ := {u_1 . . . u_m ∈ U : 0 ≤ m ≤ n, u_j ≤ L̃_{u_1...u_{j−1}}, ∀j = 1, . . . , m},

where, for any u ∈ U, L̃_u := L_u 1_{u∉V̈∪V̈′} + L̇_u 1_{u∈V̈∪V̈′, |u|≠K_n} + L̈ 1_{u∈V̈∪V̈′, |u|=K_n}.
We now consider the distribution of (T̈, V̈, V̈′). For any (t, v, v′) ∈ T̈_n, the event {(T̈, V̈, V̈′) = (t, v, v′)} occurs if and only if:
• K_n equals the generation of the last common particle of v and v′;
• at this last common particle u, we have L̈ = l_u(t), and (C̈, C̈′) picks out the two children of u lying on v and v′;
• at every other particle u on v or v′ with |u| < n, we have L̇_u = l_u(t), and Ċ_u picks out the child of u lying on the spine;
• at every remaining particle u ∈ t with |u| < n, we have L_u = l_u(t).
Using this analysis, we get that

P((T̈, V̈, V̈′) = (t, v, v′)) = G_n(t)/(nσ²), (2.4)

where we used that E[L(L − 1)] = σ², by criticality. The k(k − 1)-type size-biased µ-Galton-Watson measure G̈_n on T_≤n is then defined as the law of the random element T̈. That is, for any t ∈ T_≤n,

G̈_n(t) := P(T̈ = t) = (X_n(t)(X_n(t) − 1)/(nσ²)) G_n(t), (2.5)

since each ordered pair of distinct particles in the nth generation of t corresponds to exactly one pair of spines (v, v′). We note in passing that, because of the way they are constructed, the measures (G̈_n)_{n≥1} are not consistent; that is, the measure G̈_n is not the restriction of G̈_{n+1}. Summing over trees, for any bounded Borel function g on N_0^n,

E[g(Z̈_1^{(n)}, . . . , Z̈_n^{(n)})] = Σ_{t∈T_≤n} G̈_n(t) g(X_1(t), . . . , X_n(t)) = (1/(nσ²)) E[Z_n(Z_n − 1) g(Z_1, . . . , Z_n)],

which proves Theorem 1.2.

Spine decompositions
Using the notation introduced in the previous subsection, we are now ready to give a precise meaning to (1.8). For 1 ≤ k ≤ n, write Ȧ_k for the set of particles in Ṫ which descend from the unmarked children of the spine particle at generation k − 1. Then the nth generation of Ṫ consists of the spine particle at generation n together with the generation-n particles of Ȧ_1, . . . , Ȧ_n, so that

Ż_n = 1 + Σ_{k=1}^{n} X_n(Ȧ_k). (2.6)

Notice that the right side of the above equation is a sum of independent random variables; and from their construction, we see that X_n(Ȧ_k) is distributed as the population, after n − k further generations, of the µ-Galton-Watson trees generated by the unmarked children of the spine particle at generation k − 1.

Proofs
Proof of Theorem 1.1(1). Denote by B^j_n the event that the Galton-Watson process (Z_n)_{n≥0} survives up to generation n and that the left-most particle in the nth generation is a descendant of the jth particle of the first generation. Write q_n := P(Z_n = 0) = f^{(n)}(0) and p_n := 1 − q_n, where f is the probability generating function of the offspring distribution µ and f^{(n)} is its nth iterate.
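Theorem 1.1(1) is Kolmogorov's estimate, n p_n → 2/σ². Before turning to the proof, here is a numerical illustration (our addition, under an assumed Poisson(1) offspring law, for which f(s) = e^{s−1} and σ² = 1) obtained by iterating the generating function:

```python
import math

def survival_probs(n_max):
    """Iterate q_n = f(q_{n-1}) with f(s) = e^{s-1}, the p.g.f. of the
    critical Poisson(1) offspring law, starting from q_0 = 0, and
    return the survival probabilities p_n = 1 - q_n for n = 1..n_max."""
    q, out = 0.0, []
    for _ in range(n_max):
        q = math.exp(q - 1.0)
        out.append(1.0 - q)
    return out

p = survival_probs(1_000_000)
# Kolmogorov's estimate: n·p_n → 2/σ² = 2 as n → ∞.
for n in (10, 1000, 1_000_000):
    print(n, n * p[n - 1])
```

The convergence is slow (the second-order correction to 1/p_n is logarithmic in n), which is visible in the printed values.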
The criticality implies that q_n ↑ 1 as n → ∞, and that

p_n = Σ_{j≥1} P(B^j_n) = p_{n−1} E[Σ_{j=1}^{L} q_{n−1}^{j−1}]. (3.1)

By the monotone convergence theorem,

E[Σ_{j=1}^{L} q_{n−1}^{j−1}] ↑ E[L] = 1 as n → ∞.

Now combining (3.1) with the above, we get p_n/p_{n−1} → 1 as n → ∞. In order to compare distributions using their size-biased add-on structures, we need the following lemma.

Lemma 3.1. Let X_0 and X_1 be two non-negative random variables with the same mean a = E[X_0] = E[X_1] ∈ (0, ∞). Let F_0 be defined by E[e^{−λẊ_0}] = E[e^{−λX_0}] F_0(λ), where Ẋ_0 is an X_0-transform of X_0, and let F_1 be defined by E[e^{−λẊ_1}] = E[e^{−λX_1}] F_1(λ), where Ẋ_1 is an X_1-transform of X_1. Then, for each λ ≥ 0,

|E[e^{−λX_0}] − E[e^{−λX_1}]| ≤ a ∫_0^λ |F_0(θ) − F_1(θ)| dθ.

Proof. Note that ∂_λ(−ln E[e^{−λX_0}]) = E[X_0 e^{−λX_0}]/E[e^{−λX_0}] = aF_0(λ). Similarly, ∂_λ(−ln E[e^{−λX_1}]) = aF_1(λ). Therefore, since x − ln x is decreasing on (0, 1],

|E[e^{−λX_0}] − E[e^{−λX_1}]| ≤ |ln E[e^{−λX_0}] − ln E[e^{−λX_1}]| ≤ a ∫_0^λ |F_0(θ) − F_1(θ)| dθ.

We are now ready to prove Lemma 1.3. It is elementary to verify that if Y is exponentially distributed, then it satisfies (1.7). So we only need to show that if Y is a strictly positive random variable with finite second moment, then (1.7) implies that it is exponentially distributed. The following lemma will be used to prove this.

Lemma 3.2. Let F be a bounded Borel function on [0, ∞) such that, for some constant c ≥ 0 and each λ ≥ 0, |F(λ)| ≤ c ∫_0^λ |F(θ)| dθ. Then F ≡ 0.
Proof of Lemma 1.3. Suppose that Y is a strictly positive random variable with finite second moment, and that (1.7) is true. Define a := E[Ẏ] ∈ (0, ∞). Consider an exponential random variable e with mean a/2. It is elementary to verify that e satisfies (1.7), in the sense that ë d = ė + Uė′, where ė and ė′ are both e-transforms of e, ë is an e²-transform of e, U is a uniform random variable on [0, 1], and ė, ė′ and U are independent. Notice that E[ė] = a; therefore we can compare the distribution of Ẏ with that of ė using Lemma 3.1.

Proof of Theorem 1.1(2). Consider an exponential random variable Y with mean σ²/2. Let Ẏ be a Y-transform of Y. As in Section 1.2, we only need to prove that Ż_n/n converges weakly to Ẏ. From Proposition 2.1, we know that

E[e^{−λZ̈_n^{(n)}}] = E[e^{−λŻ_n}] E[g(λ, U_n) e^{−λŻ_{U_n}}],