A note on the probability of cutting a Galton-Watson tree

The structure of Galton-Watson trees conditioned to be of a given size is well-understood. We provide yet another embedding theorem that permits us to obtain asymptotic probabilities of events that are determined by what happens near the root of these trees. As an example, we derive the probability that a Galton-Watson tree is cut when each node is independently removed with probability p, where by cutting a tree we mean that every path from root to leaf must have at least one removed node.


The probabilistic model
A set of nodes of a tree is a cut if every path from the root to a leaf contains at least one node of the set. Given a tree t, and a fixed parameter p ∈ (0, 1), we remove each node independently with probability p. Let ρ(t) denote the probability that this removal procedure cuts the tree. The main purpose of this paper is to study ρ(t) for a large class of trees, including random Galton-Watson trees conditioned on their size. A number of comparisons are carried out, including for complete d-ary trees, uniform random binary trees (Catalan trees), random Cayley trees, and random binary search trees, and we give the appropriate limits for conditional Galton-Watson trees, which are also called simply generated trees in the combinatorial literature; see, e.g., Moon (1970).
The paper is motivated by various other pieces of work, notably the destruction of terrorist cells (Farley, 2003, 2007), and the failure of a broadcast in tree-shaped networks. The phrase "cutting trees" has also been used in another model, which was studied, e.g., by Janson (2004) for complete binary trees, by Janson (2006a) for random trees including Galton-Watson trees, and by Holmgren (2010) for random split trees. In that model, nodes are removed sequentially, independently and uniformly from the remaining tree. After a node is removed, its entire subtree is disconnected. The parameter of interest in that model is the number of nodes that have to be removed until the tree is entirely gone.
It is clear that if $t_1, \dots, t_k$ are the subtrees rooted at the $k \ge 1$ children of the root of $t$, then

$$\rho(t) = p + (1-p) \prod_{i=1}^{k} \rho(t_i),$$

while $\rho(t) = p$ if $t$ consists of a single node, since the only path is then the root itself. This recursive property can be used to compute $\rho(t)$ for many models of trees. Clearly,

$$p + (1-p)\,p^{|t|-1} \le \rho(t) \le 1 - (1-p)^{|t|},$$

where the lower bound is reached for a star, i.e., a root with $|t| - 1$ children, and the upper bound is reached for a chain. If nodes have at most $d > 1$ children, and $|t| - 1$ is a multiple of $d$, then the tree that minimizes $\rho(t)$ over all trees of the same size is the fishbone (Campos et al., 2011), where a fishbone is a tree in which the root has $d$ children, of which $d - 1$ are leaves. The sole non-leaf child again has $d$ children, of which $d - 1$ are leaves, and so forth. A simple computation shows that if we define $\rho_n = \rho(t)$ for the fishbone tree on $n$ nodes (where $n - 1$ is a multiple of $d$), then

$$\rho_n = p + (1-p)\,p^{d-1} \rho_{n-d}, \qquad \rho_1 = p,$$

and the solution of this recursion is

$$\rho_n = p \cdot \frac{1 - \left((1-p)p^{d-1}\right)^{(n-1)/d + 1}}{1 - (1-p)p^{d-1}}.$$

Note that if we keep $p$ fixed and let $n \to \infty$, then

$$\rho_n \to \frac{p}{1 - (1-p)p^{d-1}}.$$

To compare the performance of other trees with that of the optimal fishbones, it is instructive to compare limits as tree sizes diverge.
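The root recursion is easy to evaluate mechanically. The following minimal sketch assumes a tree is encoded as a nested list of child subtrees (an empty list is a leaf); the encoding and the helper names `cut_probability`, `star`, `chain` and `fishbone` are ours and serve only as an illustration. A leaf is cut exactly when it is itself removed.

```python
from math import prod


def cut_probability(tree, p):
    """Probability that independent node removal (prob. p) cuts the tree.

    `tree` is a nested list of child subtrees; [] denotes a leaf.
    """
    if not tree:  # a leaf is cut only if it is removed
        return p
    # Either the root is removed, or every child subtree is cut.
    return p + (1 - p) * prod(cut_probability(child, p) for child in tree)


def star(n):
    """Root with n - 1 leaf children."""
    return [[] for _ in range(n - 1)]


def chain(n):
    """Path on n nodes."""
    tree = []
    for _ in range(n - 1):
        tree = [tree]
    return tree


def fishbone(n, d):
    """Fishbone on n nodes; n - 1 is assumed to be a multiple of d."""
    tree = []
    for _ in range((n - 1) // d):
        tree = [tree] + [[] for _ in range(d - 1)]
    return tree
```

For instance, `cut_probability(chain(7), 0.3)` evaluates the chain of size 7, and `cut_probability(fishbone(7, 2), 0.3)` the binary fishbone of the same size; the recursion depth equals the height of the tree, so very long chains would need an iterative variant.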
Example 1: Complete d-ary trees. If we define $t_\ell$ as the complete $d$-ary tree with $\ell$ full levels, then $|t_\ell| = n = 1 + d + \cdots + d^{\ell-1}$. A simple recursive argument yields

$$\rho(t_1) = p, \qquad \rho(t_{\ell+1}) = p + (1-p)\,\rho(t_\ell)^d.$$

Elementary calculus shows that $\lim_{\ell \to \infty} \rho(t_\ell) = \rho^*$, where for $p \in (0, 1 - 1/d)$, $\rho^*$ is the unique solution in $(0, 1)$ of

$$\rho = p + (1-p)\,\rho^d.$$

For $p \in [1 - 1/d, 1]$, we have $\rho^* = 1$. For example, when $d = 2$, we obtain

$$\rho^* = \min\left(\frac{p}{1-p},\, 1\right).$$

Example 2: Families of d-ary trees. For a tree $t$, let $\ell$ be the number of full levels, and let $h$ be the number of occupied levels (also called the height). We have But both upper and lower bound tend to the same limit as $\ell \to \infty$. Thus, for any sequence of $d$-ary trees with full level number $\ell \to \infty$, the probability of cutting the trees tends to $\rho^*$. A typical example in this class is the random binary search tree $T_n$ on $n$ nodes. It is known that the (random) full level number $\ell = \ell(T_n)$ tends to infinity in probability (Devroye, 1987), and thus, we can conclude that $\rho(T_n) \to \rho^*$ in probability and $E\{\rho(T_n)\} \to \rho^*$ as $n \to \infty$, where $E$ denotes expected value. The same property holds for a large class of random trees, called split trees, introduced by Devroye (1999).
In what follows, we are mainly interested in critical Galton-Watson trees (i.e., those having $E\{\xi\} = 1$ and $P\{\xi = 1\} < 1$). For those, $\rho(T)$ is given by the unique solution of (2). Moon (1970) and Meir and Moon (1978) defined the simply generated trees as ordered labelled trees of size $n$ that are all equally likely given a certain pattern of labeling for each node of a given degree. The most important examples include the Catalan trees (equiprobable binary trees), random planted plane trees (equiprobable trees of unlimited degrees) and Cayley trees (equiprobable unordered rooted trees).
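The limit for complete $d$-ary trees can be checked numerically by iterating the level recursion. The sketch below assumes that recursion has the form $\rho(t_1) = p$, $\rho(t_{\ell+1}) = p + (1-p)\rho(t_\ell)^d$, which is what the root recursion gives when all $d$ subtrees are identical; the function name is ours.

```python
def complete_tree_cut_probability(p, d, levels):
    """Cut probability of the complete d-ary tree with `levels` full levels.

    Assumes the level recursion rho_1 = p, rho_{l+1} = p + (1 - p) * rho_l ** d.
    """
    rho = p  # a single level is a lone root, cut only if removed
    for _ in range(levels - 1):
        rho = p + (1 - p) * rho ** d
    return rho
```

For $d = 2$, the fixed-point equation is quadratic with roots $1$ and $p/(1-p)$, so the iterates should approach $p/(1-p)$ when $p < 1/2$ and $1$ when $p > 1/2$. Convergence is geometric away from the threshold $p = 1 - 1/d$ and slow at the threshold itself.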
Let $T_n$ be a Galton-Watson tree conditional on its size being $n$. It is well-known (see, e.g., Kennedy, 1975, or Kolchin, 1980, 1986) that most uniform random trees correspond to conditional Galton-Watson trees for particular choices of the offspring distribution $\xi$. For example, when $\xi$ is 0 or 2 with probability 1/2 each, we have a uniform full binary tree. When $\xi$ is 0 or 2 with probability 1/4 each and 1 with probability 1/2, we obtain the uniform binary (Catalan) tree. A uniformly random $d$-ary tree has its offspring distributed as a binomial $(d, 1/d)$ random variable. A uniform planted plane tree is obtained for the geometric law $P\{\xi = i\} = 1/2^{i+1}$, $i \ge 0$. When $\xi$ is Poisson of parameter 1, one obtains a random rooted labeled (or Cayley) tree. For $\xi$ uniform on $\{0, 1, 2, \dots, k\}$, $T_n$ is like a uniform ordered tree with a maximal degree of $k$. All such trees can be dealt with at once.
Assume $E\{\xi\} < \infty$. Find a $\theta > 0$ such that the random variable $\xi^*$ with $P\{\xi^* = i\} = c\,\theta^i P\{\xi = i\}$ (where $c$ is a normalization constant) has $E\{\xi^*\} = 1$. Kennedy (1975) showed that the distribution of the conditional Galton-Watson tree $T_n$ does not depend upon the value of $\theta$. Thus, without loss of generality, we can normalize and assume that $E\{\xi\} = 1$. Note, however, that this construction does not work in all cases: there are heavy-tailed distributions with $E\{\xi\} < 1$ for which no $\theta$ yields $E\{\xi^*\} = 1$. We use the notation $p_i = P\{\xi = i\}$ and $g(s) = E\{s^{\xi}\}$. We assume throughout that $p_1 < 1$. Note that $\{i p_i, i \ge 1\}$ is a probability distribution on the positive integers, since $\sum_i i p_i = E\{\xi\} = 1$. Its generating function is $s g'(s)$. Let $\xi'$ be a random variable with this distribution. Define the span $d$ as the greatest common divisor of all $i$ for which $p_i > 0$. A Galton-Watson tree with span $d$ can only have sizes that are 1 mod $d$.
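The normalization by $\theta$ can be carried out numerically. The sketch below is restricted, as an assumption of ours, to offspring laws with finite support, $P\{\xi = 0\} > 0$ and $P\{\xi \ge 2\} > 0$, in which case the mean of the tilted law increases continuously from 0 to the largest support point and bisection finds $\theta$. The function name `tilt_to_critical` is hypothetical.

```python
def tilt_to_critical(probs, tol=1e-12):
    """Find theta with E{xi*} = 1, where P{xi* = i} = c * theta**i * probs[i].

    probs[i] = P{xi = i}; finite support is assumed.
    Returns (theta, tilted distribution).
    """
    if probs[0] <= 0 or not any(q > 0 for q in probs[2:]):
        raise ValueError("need P{xi = 0} > 0 and P{xi >= 2} > 0")

    def tilted(theta):
        weights = [theta ** i * q for i, q in enumerate(probs)]
        total = sum(weights)  # 1 / c, the normalization constant
        return [w / total for w in weights]

    def mean(dist):
        return sum(i * q for i, q in enumerate(dist))

    lo, hi = 0.0, 1.0
    while mean(tilted(hi)) < 1.0:  # bracket the root from above
        hi *= 2.0
    while hi - lo > tol:  # the tilted mean is increasing in theta
        mid = (lo + hi) / 2.0
        if mean(tilted(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2.0
    return theta, tilted(theta)
```

For the subcritical law `[0.5, 0.3, 0.2]`, the condition $E\{\xi^*\} = 1$ reduces to $0.2\,\theta^2 = 0.5$. For heavy-tailed laws with infinite support no such $\theta$ need exist, as noted above, and this sketch does not apply.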
The main result of this paper is Theorem 1.
The proof, which is given in a later section, uses two steps. For an unconditional Galton-Watson tree $T$, we have a root level recursion, with $\xi$ still denoting the number of children of the root and $T_1, \dots, T_\xi$ the subtrees rooted at these children:

$$\rho(T) = p + (1-p)\,\mathbf{1}_{[\xi > 0]} \prod_{i=1}^{\xi} \rho(T_i).$$

Taking expectations and using the independence of the subtrees gives $E\{\rho(T)\} = p + (1-p)\left(g(E\{\rho(T)\}) - p_0\right)$. Thus, indeed, $r = E\{\rho(T)\}$. In the second step, one notes (see, e.g., Kolchin, 1986) that the root of $T_n$ has a number of children that as $n \to \infty$ tends in distribution to $\xi'$. As $n$ becomes large, all but one of these children are roots of unconditional Galton-Watson trees (for the original $\xi$) and one is the root of a conditional Galton-Watson tree whose size is $n - 1$ minus the sizes of the unconditional Galton-Watson trees. This structural view is asymptotically precise, and will be made rigorous in a lemma below. This suggests that if $\rho_\infty$ is the limit of $E\{\rho(T_n)\}$, then

$$\rho_\infty = p + (1-p)\,E\{r^{\xi'-1}\}\,\rho_\infty = p + (1-p)\,g'(r)\,\rho_\infty.$$

Example 4: The uniform full binary tree. For the uniform full binary tree, we have $g(s) = (1 + s^2)/2$, $g'(s) = s$, This can be verified independently by analytic methods based on generating functions and singularity analysis (for a good account of singularity analysis, see Flajolet and Sedgewick, 2008).
Example 5: The Catalan tree. For the Catalan tree, we have $g(s) = (1+s)^2/4$, $g'(s) = (1+s)/2$, This yields

Example 6: The random rooted Cayley tree. A more advanced example is the random rooted Cayley tree, i.e., a uniform random rooted labeled tree. As pointed out above, this corresponds to $T_n$ with $\xi$ Poisson (1). We have $g(s) = g'(s) = \exp(s - 1)$. The constant $r$ is the unique solution on $(0, 1)$ of
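The two-step heuristic can be explored numerically. The sketch below rests on two assumptions of ours about the form of the recursions: that $r = E\{\rho(T)\}$ solves $r = p + (1-p)(g(r) - p_0)$ (a leaf is cut only if it is itself removed), and that the limit satisfies $\rho_\infty = p + (1-p)\,g'(r)\,\rho_\infty$ because $g'$ is the generating function of $\xi' - 1$. The function name and its parameters are hypothetical.

```python
def limit_cut_probability(p, g, g_prime, p0, iterations=10_000):
    """Return (r, rho_inf) under the assumed recursions.

    r       : fixed point of r = p + (1 - p) * (g(r) - p0), found by iteration
    rho_inf : solution of rho = p + (1 - p) * g_prime(r) * rho
    """
    r = p  # starting below the fixed point, the iterates increase to it
    for _ in range(iterations):
        r = p + (1 - p) * (g(r) - p0)
    rho_inf = p / (1 - (1 - p) * g_prime(r))
    return r, rho_inf


# Uniform full binary tree: xi is 0 or 2 with probability 1/2 each.
def full_binary_g(s):
    return (1 + s * s) / 2


def full_binary_g_prime(s):
    return s


r, rho_inf = limit_cut_probability(0.3, full_binary_g, full_binary_g_prime, 0.5)
```

The map $r \mapsto p + (1-p)(g(r) - p_0)$ is increasing and convex on $[0, 1]$, exceeds $r$ at $0$ and falls below it at $1$, so the iteration converges geometrically to the unique fixed point in $(0, 1)$. Other offspring laws can be tried by passing their own `g`, `g_prime` and `p0`.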

Preparing the proof: random walks
Let $\xi$ be a random variable representing the number of children of the root in a critical Galton-Watson tree: $E\{\xi\} = 1$, $P\{\xi = 0\} > 0$. Set $X = \xi - 1$, and let $X_1, X_2, \dots$ be a sequence of i.i.d. random variables distributed as $X$. Define the partial sums $S_0 = 1$, $S_n = 1 + \sum_{i=1}^{n} X_i$. There is a well-known depth first or preorder construction of a random Galton-Watson tree which lends itself well to the study of all properties of randomly selected nodes (see, e.g., Le Gall, 1989, or Aldous, 1991). Nodes in an ordered tree can be encoded with a vector of child numbers. The root corresponds to the empty vector. Its children have encodings $1, 2, \dots, \xi$. The children of the $j$-th child of the root are encoded by $(j, 1), (j, 2), \dots$. A preorder listing of the nodes is nothing but a lexicographic listing of the node vectors.
We can traverse a random Galton-Watson tree by visiting nodes in preorder, starting at the root. To do so, a list $L$ of nodes to be visited is kept, which is initially of size one (containing the root). When node $u$ is visited, we consider $\xi_u$, the number of children of $u$, remove $u$ from $L$, and add the $\xi_u$ children of $u$ to $L$. Thus, the size of $L$ changes by $\xi_u - 1$. The next node in lexicographic order is taken from $L$, and the process continues until $L$ is empty.
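The traversal can be sketched as follows, assuming nodes are encoded as tuples of child numbers (the root being the empty tuple) and that the hypothetical callback `sample_offspring(rng)` returns one draw of $\xi$. The list $L$ is kept as a stack whose top is always the lexicographically smallest pending node. Since critical trees have infinite expected size, the sketch stops after `max_nodes` nodes.

```python
import random


def explore(sample_offspring, rng, max_nodes=100_000):
    """Preorder exploration of a Galton-Watson tree.

    Returns (order, sizes): `order` lists the visited nodes as tuples of child
    numbers, and sizes[n] is the size of the list L after n nodes have been
    processed, so sizes[0] = 1.
    """
    pending = [()]  # the list L; the root is the empty vector
    order, sizes = [], [1]
    while pending and len(order) < max_nodes:
        u = pending.pop()  # lexicographically smallest pending node
        k = sample_offspring(rng)  # number of children of u
        order.append(u)
        # Push children in reverse so that child 1 is on top of the stack.
        pending.extend(u + (j,) for j in range(k, 0, -1))
        sizes.append(len(pending))
    return order, sizes


def catalan_offspring(rng):
    """xi is 0 or 2 with probability 1/4 each and 1 with probability 1/2."""
    return rng.choices([0, 1, 2], weights=[1, 2, 1])[0]


order, sizes = explore(catalan_offspring, random.Random(2011))
```

Each step changes the list size by the offspring count minus one, so `sizes` is exactly a path of the random walk started at 1, and the tree is complete when the walk first hits zero.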
We denote by $N$ the size of a random Galton-Watson tree. The size of $L$ after $n$ nodes have been processed is denoted by $S_n$. Thus, $S_0 = 1$, and $S_n = 1 + X_1 + \cdots + X_n$, where $X_n = \xi_n - 1$, and indexing of the nodes is by their lexicographic rank. The root has rank one, for example. We have the identity

$$N = \min\{n > 0 : S_n = 0\}.$$

The standard circular symmetry argument for random walks (see, e.g., Dwass, 1968) shows that

$$P\{N = n\} = \frac{1}{n}\, P\{S_n = 0\}.$$

The asymptotics for this probability distribution are well-known. Let $E\{\xi^2\} < \infty$, $\sigma^2 = V\{\xi\} > 0$ (which implies $P\{\xi = 0\} > 0$), and let the span $d$ be the greatest common divisor of all $i$ for which $P\{\xi = i\} > 0$. A Galton-Watson tree with span $d$ can only have sizes that are 1 mod $d$. From Petrov (1975, p. 197) or Kolchin (1986, p. 16), we see that when $d = 1$,

$$P\{S_n = 0\} \sim \frac{1}{\sigma\sqrt{2\pi n}}.$$

Thus, for $d = 1$,

$$P\{N = n\} \sim \frac{1}{\sigma\sqrt{2\pi}\, n^{3/2}}.$$

In fact, if $n \to \infty$ such that $n = 1$ mod $d$, and $d \ge 1$, then

$$P\{N = n\} \sim \frac{d}{\sigma\sqrt{2\pi}\, n^{3/2}}$$

(Kolchin, 1986, p. 105).
Remark (infinite variance). If the variance is infinite, one can replace the estimate (0) by one that involves stable laws. This will not be pursued in this paper.
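As a sanity check on the size asymptotics, consider the Catalan offspring law ($\xi$ binomial $(2, 1/2)$, so $\sigma^2 = 1/2$ and span 1). The sketch below assumes the identity $P\{N = n\} = \frac{1}{n} P\{\xi_1 + \cdots + \xi_n = n - 1\}$ and the estimate $P\{N = n\} \sim d/(\sigma\sqrt{2\pi}\, n^{3/2})$; the function names are ours. Here $\xi_1 + \cdots + \xi_n$ is binomial $(2n, 1/2)$, so the exact value has a closed form.

```python
from math import comb, pi, sqrt


def size_probability_exact(n):
    """P{N = n} for xi ~ Binomial(2, 1/2), via the cycle-lemma identity.

    The sum of n offspring counts is Binomial(2n, 1/2), so
    P{xi_1 + ... + xi_n = n - 1} = C(2n, n - 1) / 4**n.
    """
    return comb(2 * n, n - 1) / (n * 4 ** n)


def size_probability_asymptotic(n, sigma2=0.5, span=1):
    """Leading-order estimate span / (sigma * sqrt(2 pi) * n**1.5)."""
    return span / (sqrt(2 * pi * sigma2) * n ** 1.5)


ratios = [size_probability_exact(n) / size_probability_asymptotic(n)
          for n in (10, 100, 1000)]
```

The ratios should approach 1 as $n$ grows. Small cases can be checked by hand: the only tree of size 1 is a leaf, with probability $1/4$, and the only tree of size 2 is a root with one leaf child, with probability $1/8$.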

A structure theorem for Galton-Watson trees
Various ways have been suggested for describing the structure of a random Galton-Watson tree conditioned on its size. In particular, the idea of a spine or marked (infinite) path, or size-biasing, already present in the work of Rouault (1981), has been made prominent by Lyons, Pemantle and Peres (1995), who used it to give a novel proof of the Kesten-Stigum theorem. It is useful for studying subcritical, critical and supercritical trees alike. Many proofs are based on this view of marking one special node at each level to form a spine; see, e.g., Aldous and Pitman (1998), Geiger and Kaufmann (2004) and Duquesne (2009). Our intent is to give a simple lemma that is useful for proving properties that are heavily influenced by the shape of the tree near the root.
We provide a bound on the total variation distance between the top few levels of two related random trees. In this section, T is a Galton-Watson tree with typical litter size ξ, where it is assumed that E{ξ} = 1, p 1 < 1, and V{ξ} = σ 2 < ∞. Let T n be T conditional on |T | = n.
The second tree, in which the size-biasing is made explicit, is denoted by $T'_n$. We construct $T'_n$ by first generating $\xi'$ children of the root according to the probability law

$$P\{\xi' = i\} = i\,p_i, \qquad i \ge 1.$$

Among the $\xi'$ children, choose one uniformly at random, and denote its index by $Z$. Each child, $Z$ excepted, is the root of an independent unconditional Galton-Watson tree drawn from the distribution of $T$. Denote the size of the tree of child $i$ by $N'_i$. If $N \stackrel{\rm def}{=} 1 + \sum_{i : i \ne Z} N'_i \ge n$, then remove child $Z$ (and thus, $\xi'$ is reduced by one). Otherwise, make child $Z$ the root of a tree that is distributed as $T_{n-N}$.
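The first level of this construction can be sampled directly. The sketch below returns only the vector of root subtree sizes, assuming a finite-support offspring law given as a list `offspring` with `offspring[i]` $= p_i$; the names `gw_size` and `root_subtree_sizes` are ours. Unconditional trees are explored only until they exceed $n$ nodes, which is all that the test $N \ge n$ requires.

```python
import random


def gw_size(offspring, rng, cap):
    """Size of an unconditional Galton-Watson tree, truncated at cap + 1.

    A return value of cap + 1 means "at least cap + 1 nodes".
    """
    pending, size = 1, 0
    while pending > 0 and size <= cap:
        size += 1
        pending += rng.choices(range(len(offspring)), weights=offspring)[0] - 1
    return size


def root_subtree_sizes(offspring, n, rng):
    """Sample the subtree sizes of the root's children in the size-biased tree."""
    biased = [i * q for i, q in enumerate(offspring)]  # P{xi' = i} = i * p_i
    k = rng.choices(range(len(offspring)), weights=biased)[0]
    z = rng.randrange(k)  # index of the marked child Z (0-based here)
    sizes = [gw_size(offspring, rng, n) if i != z else 0 for i in range(k)]
    total = 1 + sum(sizes)  # root plus the unconditional subtrees
    if total >= n:
        del sizes[z]  # no room left: child Z is removed
    else:
        sizes[z] = n - total  # child Z carries a conditional tree T_{n - N}
    return sizes


sample = root_subtree_sizes([0.25, 0.5, 0.25], 50, random.Random(7))
```

When child $Z$ is kept, the sizes add up to $n - 1$; when it is removed, the truncated sizes add up to at least $n - 1$. Sampling the full tree would additionally require a generator for the conditional trees $T_m$, which is outside the scope of this sketch.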
One can apply the same construction recursively to child Z, and this leads to the well-known "spine" of the conditional Galton-Watson trees. However, our first result is about the total variation distance between the probability measures of the vectors of subtree sizes. In T n , the ξ children of the root have subtree sizes N = (N 1 , N 2 , . . . , N ξ ). We pad this vector with zeroes to the right. In T ′ n , we denote the subtree sizes by N ′ = (N ′ 1 , . . .), and pad with zeroes to the right. Denote the total variation distance between the probability measures of these vectors by d (N , N ′ ).
By Doeblin's coupling lemma, we can couple the trees in such a way that with probability 1 − d(N , N ′ ), both are identical. Indeed, if the number of children matches, and the subtree sizes match, then each subtree is by construction a Galton-Watson tree conditioned by its size. Let us use such a coupling, and let us also maximally couple the nodes marked for removal (say, by depth first order). Clearly, we have This shows the usefulness of the construction and Lemma 1.
Then, for all $(n_1, \dots, n_k) \in W_k$, Next, we have, denoting by $T(\ell)$, $\ell \ge 1$, independent copies of $T$, Summing over all $k$ and over all vectors $(n_1, n_2, \dots, n_k) \in W_k$, and using a well-known property of the total variation distance, and using the notation $u_+ = \max(u, 0)$, we have Here $K$ is a fixed large integer, and $\xi$ is the number of children of the root in $T_n$. It is understood here that there is no problem of division by zero, as zero factors will occur simultaneously in the numerator and denominator. We will need only one term in the last sum of the upper bound. We also note that the right-hand side is zero when $k = 1$. Incorporating this, we have

Consider the subspace $W'_k$ consisting of all vectors in $W_k$ with $n_1 = \max_i n_i$. By the symmetry in our upper bound, we have If $n_1$ is fixed, then define $W''_{k-1}$ as the space of positive $(k-1)$-vectors of sum $n - 1 - n_1$. Thus, The last sum is of interest. It is, in fact, nothing but the probability that a forest of $k - 1$ random Galton-Watson trees has total size $n - 1 - n_1$. It is well-known that this is

$$\frac{k-1}{n-1-n_1}\, P\{S_{n-1-n_1} = n - n_1 - k\},$$

where $\xi(1), \xi(2), \dots$ are i.i.d. and distributed as $\xi$ (see, e.g., Kolchin, 1986, p. 104), and $S_n = \sum_{i=1}^{n} \xi(i)$. We now have Recalling also that we rewrite the inequality as follows: By (0), assuming that the span of $\xi$ is 1, we have a uniform approximation that immediately yields the following, with all the $o(\cdot)$ terms depending upon the distribution of $\xi$ and $n$ only, but not on $k$:

We show that the upper bound in (4) tends to zero. Note that Fix $k$ and, discarding constants, look at the sums We first show that for any fixed $K$, Using (4), we then have for any fixed $K$, We conclude by showing that The proof of (5) makes use of the fact that if $f$ is a monotone decreasing function on the reals, then We assume throughout that $n$ is large enough.
Clearly, we have for some constants $c, c'$ not depending upon $k$ or $n$,

$$B_k \le c \sum_{n_1 : (n-1)/k \le n_1 \le n-k} \frac{\sqrt{n}}{n_1^{3/2} \sqrt{n - 1 - n_1}} \le \frac{c' k^{3/2}}{n} \sum_{n_1 : (n-1)/k \le n_1 \le n-k} \frac{1}{\sqrt{n - 1 - n_1}},$$

where the second inequality uses $n_1 \ge (n-1)/k$. Very similar calculations can be done for $C_k$: Thus, we only need to show (6).
Fix $k$. Then, using Kolchin's formula again, Thus, from which our result follows since we assumed that $\sum_k k p_k = 1$. The above derivation assumes that the span of $\xi$ is 1. It is easy to verify that the results remain valid for $d > 1$.
We will show that lim i.e., Before we continue, we note that g ′ is the generating function of ξ ′ − 1. Thus we have .
for some constant c not depending upon ξ ′ or n. Thus, This concludes the proof.

Other applications
Lemma 1 renders many proofs simple. For example, assuming the conditions of the Lemma, we see that if $Z_{n,1}$ is the size of the first generation in the conditional Galton-Watson tree $T_n$, then

$$Z_{n,1} \stackrel{\mathcal{L}}{\to} \xi',$$

where $\xi'$ has the size-biased distribution $\{i p_i, i \ge 1\}$, and $\stackrel{\mathcal{L}}{\to}$ denotes convergence in distribution. This known fact follows immediately from the inequality on total variation distances: All other properties of $N$ and $N'$ are shared. For example, if $M_n = \max_i N_i$ is the maximal size of a subtree of the root of $T_n$, then, under the conditions of Lemma 1, for every fixed $\ell \ge 0$,
If in a Galton-Watson tree $T$, we only consider levels $0, k, 2k, 3k, \dots$, then we obtain a new Galton-Watson tree $T^*$. If $|T| = n$, then $|T^*|$ is random. Under reasonable conditions, we expect $|T^*| \approx n/k$ if $k$ is fixed. In any case, $|T^*| \ge H_n/k$, where $H_n$ is the height of $T_n$. Consider the population size $Z_{n,k}$ of the $k$-th generation in $T_n$, which is the size of the first generation $Z^*_{|T^*|,1}$ in the corresponding $T^*$. Thus, where $\xi'_k$ has the size-biased distribution of the population in an unconditional Galton-Watson tree, i.e., if $Z_k$ is the size of the $k$-th generation in an ordinary critical Galton-Watson tree, then $P\{\xi'_k = i\} = i P\{Z_k = i\}$, $i \ge 1$.
Since $M$ is arbitrary and $H_n/\sqrt{n} \stackrel{\mathcal{L}}{\to} H$, where $H$ is a strictly positive random variable with the theta distribution (see, e.g., Flajolet and Odlyzko, 1982), we see that Results of this nature have been around in various forms. For example, Janson (2006b) showed that $\lim_{n \to \infty} E\{Z_{n,k}\} = 1 + k\sigma^2 = E\{\xi'_k\}$, where $\sigma^2 = V\{\xi\}$.
Consider sums of the form

$$V_n = \sum_{k=0}^{\infty} \varphi(k)\, \psi(Z_{n,k}),$$

where $\varphi(k)$ is summable in $k$ and $\psi$ is a bounded function. Our results imply that

$$\lim_{n \to \infty} E\{V_n\} = \sum_{k=0}^{\infty} \varphi(k)\, E\{\psi(\xi'_k)\}.$$
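The identity $E\{\xi'_k\} = 1 + k\sigma^2$ can be verified for a concrete offspring law. Since $P\{\xi'_k = i\} = i P\{Z_k = i\}$, we have $E\{\xi'_k\} = E\{Z_k^2\}$. The sketch below, with helper names of our own, computes the law of $Z_k$ by composing the offspring generating function with itself $k$ times, treating it as a polynomial; finite support is assumed.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def compose(outer, inner):
    """Coefficients of outer(inner(s)), by Horner's rule."""
    result = [0.0]
    for c in reversed(outer):
        result = poly_mul(result, inner)
        result[0] += c
    return result


def generation_law(offspring, k):
    """law[i] = P{Z_k = i} for an unconditional tree with Z_0 = 1."""
    law = [0.0, 1.0]  # generating function s of Z_0
    for _ in range(k):
        law = compose(law, offspring)  # g_{j+1}(s) = g_j(g(s))
    return law


def size_biased_mean(offspring, k):
    """E{xi'_k} = sum_i i * (i * P{Z_k = i}) = E{Z_k ** 2}."""
    return sum(i * i * q for i, q in enumerate(generation_law(offspring, k)))


catalan = [0.25, 0.5, 0.25]  # critical, sigma^2 = 1/2
means = [size_biased_mean(catalan, k) for k in range(5)]
```

For the Catalan law, the entries of `means` should equal $1 + k/2$ up to rounding. The degree of the polynomial doubles with each composition, so this direct approach is only practical for small $k$.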
By Fatou's lemma, it is easy to see that the right-hand side is a lower bound. Furthermore, the upper bound follows from the summability of ϕ(k) and the fact that for each fixed k, d(Z n,k , ξ ′ k ) = o(1).
Lemma 1 is severely limited: it can only handle questions about the behavior of the top of $T_n$. In contrast, Aldous's fringe result (1991), which states that the subtree rooted at a uniform random node in $T_n$ tends in distribution to the unconditional Galton-Watson tree $T$, is useful for global parameters far away from the root.