Invasion Percolation on Galton-Watson Trees

We consider invasion percolation on Galton-Watson trees. On almost every Galton-Watson tree, the invasion cluster almost surely contains only one infinite path. This means that for almost every Galton-Watson tree, invasion percolation induces a probability measure on infinite paths from the root. We show that under certain conditions of the progeny distribution, this measure is absolutely continuous with respect to the limit uniform measure. This confirms that invasion percolation, an efficient self-tuning algorithm, may be used to sample approximately from the limit uniform distribution. Additionally, we analyze the forward maximal weights along the backbone of the invasion cluster and prove a limit law for the process.


Introduction
Given an infinite rooted tree, how might one sample, nearly uniformly, from the set of paths from the root to infinity? One motive for this question is that nearly uniform sampling leads to good estimates on the growth rate [JS89]. One might be trying to estimate the size of a search tree, or, in the case of [RS00], to determine the growth rate of the number of self-avoiding paths.
A number of methods have been studied. One is to do a random walk on the tree, with a "homesickness" parameter determining how much steps back toward the root are favored [LPP96]. The parameter needs to be tuned near criticality: too much homesickness and the walk gets stuck near the root; too little homesickness and the walk goes to infinity without taking the time to ensure that the path is well randomized. Randall and Sinclair [RS00] solve this by estimating the critical parameter as the walk progresses, re-tuning the homesickness to lie above this by an amount decreasing at an appropriate rate.
Another approach is to use percolation. One conditions the percolation cluster to survive to level N; as the percolation parameter decreases to criticality and N is taken to infinity, the law of this cluster approaches the law of the incipient infinite cluster (IIC). For many graphs (e.g. regular or Galton-Watson trees) the IIC almost surely contains a unique infinite path, thereby giving a mechanism for sampling such a path. In practice, the same considerations arise as with homesick random walks: tuning the percolation parameter too low yields too little likelihood of survival and too great a time cost to rejection sampling; too great a percolation parameter results in too many surviving paths and a selection problem which leads to poor randomization.
Invasion percolation was introduced in [WW83] as a model for how viscous fluid creeps through an environment. Each site is given an independent U[0, 1] random variable, representing how great the percolation probability would have to be before the site would be open. The cluster then grows by adding, at each time step, the site with the least U-value among sites neighboring the cluster but not in the cluster. It is not hard to see that the lim sup of the U-values of bonds chosen is equal to the critical percolation parameter $p_c$. In other words, instead of running percolation at $p_c$ and conditioning to survive, one allows slightly supercritical bonds, but less and less so as the cluster grows. As is the case for the IIC, the invasion cluster almost surely contains only one infinite path in the case of regular or Galton-Watson trees, and thus gives a different mechanism for sampling paths. Unlike the IIC and homesick random walk, invasion percolation requires no tuning to criticality and is an instance of self-organized criticality.
The invasion cluster has some properties in common with the IIC but not all. For example, results of Kesten [Kes86] and Zhang [Zha95] show that the growth exponents of the two are equal on the two-dimensional lattice; however, the measures of the two clusters are mutually singular on the lattice [DSV09] as well as on a regular tree [AGdHS08]. Our focus is the comparison of the laws induced on paths by both the IIC and invasion percolation.
On a Galton-Watson tree $T$, there is a natural measure on paths, the limit-uniform measure $\mu_T$, which, although it does not restrict precisely to the uniform measure on each generation, approximates this as closely as possible. There is not, however, a fast algorithm for sampling from it. Rules such as "split equally at each node" lead to rapid sampling but the wrong entropy; in other words, the Radon-Nikodym derivative with respect to $\mu_T$ on generation $N$ will be exponential in $N$. It is not hard to show that on almost every Galton-Watson tree (assuming a $Z \log Z$ moment for the offspring distribution), the unique path in the IIC has law $\mu_T$. Since sampling from the IIC is problematic, it is therefore natural to ask how close the law $\nu_T$ of the path chosen by the invasion cluster is to $\mu_T$. Showing that it has the right entropy is not too involved; this is Theorem 2.8 below. It is also easy to see that the two laws are typically not equal. As an example, consider the set of trees with first three generations given by [figure: the first three generations of the tree]. When averaged over the remaining generations with the Galton-Watson measure (or equivalently, placing independent Galton-Watson trees at the terminal nodes), the limit-uniform measure splits equally at the root, while the invasion measure favors the left subtree regardless of the offspring distribution.
The best comparison one might hope for is that ν T be absolutely continuous with respect to µ T , perhaps even with Radon-Nikodym derivative in L p . Our main result is as follows.
The condition in Theorem 1.1 is a trade-off between $p_1$ and $p$. In the case of $p = \infty$, the condition becomes $p_1 < 1/\mu^{(3+\sqrt{17})/2}$. In the case of $p_1 = 0$, the condition is $p > (11+\sqrt{105})/2$. An outline of the argument behind Theorem 1.1 is as follows. Let $X_n$ be the KL-distance between the way that $\mu_T$ and $\nu_T$ split at the $n$th step $\gamma_n$ of a path chosen from $\nu_T$. A sufficient condition for absolute continuity is that $\sum_{n=1}^{\infty} E X_n < \infty$. A precise statement is given in Lemma 2.11 below. A more detailed outline of why this condition should hold is given at the beginning of Section 5.
The reason we have a hope of estimating $X_n$ is that there is a backbone decomposition for invasion percolation. Define the backbone to be the almost surely unique non-backtracking path $\gamma = (0, \gamma_1, \gamma_2, \ldots)$ from the root to infinity in the invasion cluster. For any vertex $v$ define the pivot value at $v$, denoted $\beta(v)$, to be the least $p$ such that there is a path from $v$ to infinity in the subtree at $v$ with all U variables (not including the one at $v$) at most $p$. On a regular tree, invasion percolation was studied in [AGdHS08, AGdHS13]. For the purposes of studying $\nu_T$, the regular tree is a degenerate case, because $\mu_T$ and $\nu_T$ are equal to each other and to the equally splitting measure. However, their results on backbones and pivots extend in a useful way to the Galton-Watson setting.
Conditioning on T, the way the invasion measure splits at v depends on the whole tree. However, if one also conditions on the pivot at v, then the way the invasion measure splits at v becomes independent of everything outside of the subtree at v. A similar statement is true if one conditions on the pivot of v being less than or equal to a certain value; these are the Markov properties of Propositions 4.4 and 4.6. The limiting behavior of these values is given in Theorem 4.9 and Corollary 4.10. Further, Lemma 5.2 shows that this conditioned splitting measure is close to a ratio of survival probabilities under supercritical Bernoulli percolation. The problem is thus reduced to proving estimates of the survival probabilities of Galton-Watson trees under supercritical Bernoulli percolation, as in Section 5.2.
The remainder of the paper is organized as follows. Section 2 sets up the notation and gives some preliminary results. Some care is required to set up the probability space so that we can easily speak of the random measures $\mu_T$ and $\nu_T$, which are conditional on the Galton-Watson tree. Section 2 culminates in Lemma 2.11 and Corollary 2.12. Section 3 gives an upper bound on survival probabilities and squeezes survival probabilities close to an easier-to-analyze averaged version with high probability. Section 4 proves two Markov properties for the subtree from $\gamma_n$ together with $\beta(\gamma_n)$. The remainder of the section extends the work of [AGdHS08] by proving a limit law for $\beta(\gamma_n)$ which then implies an upper bound on the rate at which $\beta(\gamma_n) \downarrow p_c$. In particular, Corollary 4.10 shows convergence to the Poisson lower envelope process, as in [AGdHS08]. Section 5 proves Theorem 1.1 by comparing the conditional invasion measure to the ratio of survival probabilities and by carefully estimating survival probabilities near criticality.
A glossary of notation by page of reference is included after the references.

Galton-Watson trees
We begin with some notation we use for all trees, random or not. Let $\mathcal{U}$ be the canonical Ulam-Harris tree [ABF13]. The vertex set of $\mathcal{U}$ is the set $V := \bigcup_{n=1}^{\infty} \mathbb{N}^n$, with the empty sequence $0 := \emptyset$ as the root. There is an edge from any sequence $a = (a_1, \ldots, a_n)$ to any extension $a \sqcup j := (a_1, \ldots, a_n, j)$. The depth of a vertex $v$ is the graph distance between $v$ and $0$ and is denoted $|v|$. We work with trees $T$ that are locally finite rooted subtrees of $\mathcal{U}$. The usual notations are in force: $T_n$ denotes the set of vertices at depth $n$; $T(v)$ is the subtree of $T$ at $v$, canonically identified with a rooted subtree of $\mathcal{U}$, in other words the vertex set of $T(v)$ is $\{w : v \sqcup w \in V(T)\}$; $\partial T$ denotes the set of infinite non-backtracking paths from the root; if $\gamma \in \partial T$ then $\gamma_n$ ($n \ge 0$) denotes the $n$th vertex in $\gamma$; the last common ancestor of $v$ and $w$ is denoted $v \wedge w$ and the last common vertex of $\gamma$ and $\gamma'$ is denoted $\gamma \wedge \gamma'$; $\overleftarrow{v}$ denotes the parent of $v$. Let $\mu_T^n$ denote the uniform measure on the $n$th generation of $T$. In some cases, for example for almost every Galton-Watson tree, the limit $\mu_T := \lim_{n \to \infty} \mu_T^n$ exists and is called the limit-uniform measure.
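As a small illustration of the uniform measure on the $n$th generation, the following sketch computes how that measure splits among the children of the root of a finite tree and compares it with the "split equally at each node" rule. The nested-list encoding of a tree and the function names are assumptions made only for this example; they do not appear in the text.

```python
from fractions import Fraction

def generation_size(tree, n):
    """Number of depth-n vertices of `tree`, encoded as a nested list of child subtrees."""
    if n == 0:
        return 1
    return sum(generation_size(child, n - 1) for child in tree)

def uniform_split_at_root(tree, n):
    """Mass that the uniform measure on generation n (n >= 1) gives to each child of the root."""
    counts = [generation_size(child, n - 1) for child in tree]
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

# Hypothetical example: the root has two children; the left child has 1 child,
# the right child has 3 children.
example = [[[]], [[], [], []]]
print(uniform_split_at_root(example, 2))            # split induced by generation 2
print([Fraction(1, len(example))] * len(example))   # equal splitting at the root
```

On this example the two rules disagree already at the root, which is the kind of discrepancy that compounds over generations when equal splitting is used in place of the limit-uniform measure.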
Turning now to Galton-Watson trees, let $\phi(z) := \sum_{n=1}^{\infty} p_n z^n$ be the ordinary generating function for a supercritical branching process with no death, i.e., $\phi(0) = 0$. We recall that $\phi'(1) = EZ =: \mu$ and $\phi''(1) = E[Z(Z-1)]$, where $Z$ is a random variable with probability generating function $\phi$. Throughout, we assume $E[Z^2] < \infty$; in particular, this also means that $\phi''(1) < \infty$. Moreover, since our focus is on $\partial T$, the assumption of $\phi(0) = 0$ can be made without loss of generality by considering the reduced tree, as in [AN72, Chapter I.12].
We will work on the canonical probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = (\mathbb{N} \times [0, 1])^V$, $\mathcal{F}$ is the product Borel σ-field, and $P$ is the probability measure making the coordinate functions $\omega_v = (\deg_v, U_v)$ IID with the law of $(Z, U)$, where $U$ is uniform on $[0, 1]$ and independent of $Z$. The variables $\{\deg_v\}$, where $\deg_v$ is interpreted as the number of children of vertex $v$, will construct the Galton-Watson tree, while the variables $\{U_v\}$ will be used later for percolation. Let $\mathbf{T}$ be the random rooted subtree of $\mathcal{U}$ which is the connected component containing the root of the set of vertices that are either the root or are of the form $v \sqcup j$ such that $0 \le j < \deg_v$. This is a Galton-Watson tree with ordinary generating function $\phi$.
As is usual for Galton-Watson branching processes, we denote $Z_n := |\mathbf{T}_n|$. Extend this by letting $Z_n(v)$ denote the number of offspring of $v$ in generation $|v| + n$; similarly, extend the notation for the usual martingale $W_n := \mu^{-n} Z_n$ by letting $W_n(v) := \mu^{-n} Z_n(v)$. We know that $W_n(v) \to W(v)$ for all $v$, almost surely and in $L^q$ if the offspring distribution has $q$ moments. This is stated without proof for integer values of $q \ge 2$ in [Har63, p. 16] and [AN72, p. 33, Remark 3]; for a proof for all $q > 1$, see [BD74, Theorems 0 and 5]. Further extend this notation by letting $v^{(i)}$ denote the $i$th child of $v$, letting $Z_n^{(i)}(v)$ denote the number of $n$th generation descendants of $v$ whose ancestral line passes through $v^{(i)}$, and defining $W_n^{(i)}(v)$ analogously. For convenience, define $p_c := 1/\mu$ and recall that $p_c$ is almost surely the critical percolation parameter for $\mathbf{T}$ [Lyo90].
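The normalization $W_n = \mu^{-n} Z_n$ can be illustrated with a short simulation. This is only a sketch: the function names and the example offspring law (one or three children with equal probability, so that the mean is 2 and there is no death) are assumptions chosen for the example.

```python
import random

def simulate_W(offspring, mu, generations, rng):
    """Return [W_0, ..., W_n] with W_k = Z_k / mu**k for one Galton-Watson sample.

    `offspring(rng)` draws the number of children of one vertex.
    """
    z = 1                      # Z_0 = 1: the root
    ws = [1.0]
    for k in range(1, generations + 1):
        z = sum(offspring(rng) for _ in range(z))   # size of generation k
        ws.append(z / mu**k)
    return ws

def one_or_three(rng):
    """Hypothetical offspring law: 1 or 3 children with equal probability (mean 2)."""
    return rng.choice((1, 3))

rng = random.Random(0)
print(simulate_W(one_or_three, 2.0, 12, rng))
```

For a deterministic offspring law with exactly two children the sequence is constant and equal to 1, which corresponds to the regular tree.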

Bernoulli and Invasion Percolation
In this subsection we give the formal construction of Bernoulli percolation and of invasion percolation on random trees. Our approach is to define a simultaneous coupling of invasion percolations on all subtrees $T$ of $\mathcal{U}$ via the U variables, then specialize to the random tree $\mathbf{T}$. Let $\mathcal{T} := \sigma(\{\deg_v : v \in V\})$ denote the σ-field generated by the tree variables. Because $\mathcal{T}$ is independent of the U variables, this means we have constructed a process whose law, conditional on $\mathcal{T}$, is invasion percolation on $\mathbf{T}$. We use the notation E* to denote $E[\cdot \mid \mathcal{T}]$; similarly P*[·] := $P[\cdot \mid \mathcal{T}]$. Moreover, we use GW := $P|_{\mathcal{T}}$ to denote the Galton-Watson measure on trees.
We begin with a similar construction for ordinary percolation. For $0 < p < 1$, simultaneously define Bernoulli($p$) percolations on rooted subtrees $T$ of $\mathcal{U}$ by taking the percolation clusters to be the connected component containing $0$ of the induced subtrees of $T$ on all vertices $v$ such that $U_v \le p$. Let $\mathcal{F}_n$ be the σ-field generated by the variables $\{U_v, \deg_v : |v| < n\}$. Let $p_c = 1/\mu = 1/\phi'(1)$ denote the critical probability for percolation. Write $v \leftrightarrow_{T,p} w$ if $U_u \le p$ for all $u$ on the geodesic from $v$ to $w$ in $T$. Informally, $v \leftrightarrow_{T,p} w$ iff $v$ and $w$ are both in $T$ and are connected in the $p$-percolation. The event of successful $p$-percolation on $T$ is $H_T(p) := \{0 \leftrightarrow_{T,p} \infty\}$, and the event of successful $p$-percolation on the random tree $\mathbf{T}$ is denoted $H_{\mathbf{T}}(p)$ or simply $H(p)$. Let $g(T, p) := P[H_T(p)]$ denote the probability of $p$-percolation on $T$. The conditional probability P*[H(p)] is measurable with respect to $\mathcal{T}$ and we may define $g(\mathbf{T}, p)$ := P*[H(p)]. Furthermore, we may define $g(p) := P[H(p)] = E g(\mathbf{T}, p)$. Since $p_c = 1/\mu$ is the critical percolation parameter for a.e. $\mathbf{T}$, note that $g(\mathbf{T}, p) = 0$ for all $p \in [0, p_c]$.
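On a fixed finite tree, the probability that the root is joined to a given depth by $p$-percolation satisfies a simple recursion over the children, which gives a finite-depth stand-in for $g(T, p)$. The sketch below assumes the convention that the U variable at the root itself is not counted (as in the definition of pivots), and uses a nested-list tree encoding and a function name that are assumptions for this example only.

```python
def reach_probability(tree, p, n):
    """Probability that the root is joined to depth n by vertices with U_v <= p.

    `tree` is a nested list of child subtrees. The root's own U variable is
    ignored (an assumed convention), so each child is kept independently with
    probability p and must then itself reach depth n - 1.
    """
    if n == 0:
        return 1.0
    fail = 1.0
    for child in tree:
        fail *= 1.0 - p * reach_probability(child, p, n - 1)
    return 1.0 - fail

# A path of length 2 and a binary tree of depth 1 (hypothetical examples).
path = [[[]]]
binary = [[], []]
print(reach_probability(path, 0.5, 2), reach_probability(binary, 0.5, 1))
```

As the depth grows these probabilities decrease to the probability of reaching infinity, so the recursion gives upper bounds on the quenched survival probability of any tree extending the finite one.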
Before defining invasion percolation, we record some basic properties of g.
Proposition 2.1. The derivative from the right $K := g'(p_c)$ exists and is given by $K = \frac{2}{p_c^3 \, \phi''(1)}$. (2.1)
Proof: Let $\phi_p(z) := \phi(1 - p + pz)$ be the offspring generating function for the Galton-Watson tree thinned by $p$-percolation for $p \in (p_c, 1)$. The fixed point of $\phi_p$ is $1 - g(p)$. In other words, $g(p)$ is the unique $s \in (0, 1)$ for which $1 - \phi_p(1 - s) = s$. The first two derivatives of $\phi_p$ at 1 are given by $\phi_p'(1) = p\mu$ and $\phi_p''(1) = p^2 \phi''(1)$. By a Taylor expansion, this leads to $1 - \phi_p(1 - s) = p\mu s - \frac{1}{2} p^2 \phi''(1) s^2 + o(s^2)$ as $p \downarrow p_c$ and $s \downarrow 0$. Setting this equal to $s$ and solving for $s \in (0, 1)$ yields the conclusion of the proposition.
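The fixed-point characterization of $g(p)$ can be checked numerically. The sketch below iterates $s \mapsto 1 - \phi(1 - ps)$ starting from $s = 1$, and compares $g(p_c + \varepsilon)/\varepsilon$ with the constant $2\mu^3/\phi''(1)$ obtained by solving the second-order Taylor expansion for $s$; that closed form is my own derivation from the expansion and should be read as an assumption, as should the function names and the choice of the binary offspring law.

```python
def phi(probs, z):
    """Offspring generating function, with probs[k] = P(Z = k)."""
    return sum(pk * z**k for k, pk in enumerate(probs))

def survival_probability(probs, p, tol=1e-13, max_iter=1_000_000):
    """Largest s in [0, 1] with 1 - phi(1 - p*s) = s, found by monotone iteration from s = 1."""
    s = 1.0
    for _ in range(max_iter):
        s_next = 1.0 - phi(probs, 1.0 - p * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

def slope_at_criticality(probs):
    """The constant 2 * mu**3 / phi''(1) (assumed form of the right derivative at p_c)."""
    mu = sum(k * pk for k, pk in enumerate(probs))
    phi2 = sum(k * (k - 1) * pk for k, pk in enumerate(probs))
    return 2.0 * mu**3 / phi2

binary = [0.0, 0.0, 1.0]          # every vertex has exactly two children, p_c = 1/2
eps = 1e-3
print(survival_probability(binary, 0.5 + eps) / eps, slope_at_criticality(binary))
```

For the binary tree the fixed-point equation can be solved by hand, giving $g(p) = (2p - 1)/p^2$, which makes this a convenient sanity check; the iteration converges slowly close to $p_c$, which is why the iteration cap is generous.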
Proof (of Corollary 2.2): The existence of $g'(p)$ on $(p_c, 1)$ follows from the implicit function theorem. To obtain an expression for $g'(p)$, we differentiate both sides of the expression $\phi(1 - p \, g(p)) = 1 - g(p)$ with respect to $p$, which gives $\phi'(1 - p \, g(p)) \, (g(p) + p \, g'(p)) = g'(p)$. Rearranging this expression to isolate $g'(p)$ gives $g'(p) = \frac{g(p) \, \phi'(1 - p \, g(p))}{1 - p \, \phi'(1 - p \, g(p))}$, and the conclusion follows using Proposition 2.1.
Define invasion percolation on an arbitrary tree $T$ as follows. Start with $I_0^T = \{0\}$, where we recall that $0$ is the root of $T$. Inductively define $I_{n+1}^T$ to consist of $I_n^T$ along with the vertex of minimal weight $U_v$ adjacent to $I_n^T$. The invasion percolation cluster is defined as $I^T := \bigcup_n I_n^T$. Note that $I^T$ is measurable with respect to the U variables. Let $I := I^{\mathbf{T}}$ denote the invasion cluster of the random tree $\mathbf{T}$. By independence of the U variables and $\mathcal{T}$, the conditional distribution of $I$ given $\mathcal{T}$ agrees with that of invasion percolation.
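The inductive definition of the invasion cluster translates directly into a priority-queue algorithm. The following is a minimal sketch, assuming an offspring law with no death (so the frontier is never empty) and generating the tree lazily as vertices are invaded; the function name, the integer vertex labels and the returned dictionaries are assumptions made for this example.

```python
import heapq
import random

def invasion_percolation(offspring, steps, rng):
    """Run `steps` steps of invasion percolation on a lazily generated Galton-Watson tree.

    `offspring(rng)` must return at least 1 (no death). Returns the invasion
    order together with the parent and weight of every revealed vertex.
    """
    parent = {0: None}
    weight = {0: 0.0}          # the root carries no weight; 0.0 is a placeholder
    frontier = []              # min-heap of (U_v, v) over vertices adjacent to the cluster
    order = [0]

    def reveal_children(v):
        for _ in range(offspring(rng)):
            child = len(parent)            # fresh integer label
            parent[child] = v
            weight[child] = rng.random()   # independent uniform weight
            heapq.heappush(frontier, (weight[child], child))

    reveal_children(0)
    for _ in range(steps):
        _, v = heapq.heappop(frontier)     # vertex of minimal weight adjacent to the cluster
        order.append(v)
        reveal_children(v)
    return order, parent, weight

rng = random.Random(1)
order, parent, weight = invasion_percolation(lambda r: 2, 5000, rng)
print(max(weight[v] for v in order[2500:]))   # largest weight accepted in the second half
```

Each step costs logarithmic time in the size of the frontier, and no percolation parameter is supplied anywhere; the weights accepted late in the run can be inspected to see how close to $p_c$ the process has settled.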
Proposition 2.3. For any p > p c , I almost surely reaches some vertex v such that v ↔ p ∞ in T(v).
Proof: We consider the following coupling that generates I at the same time as T: begin with the root, and generate children according to Z, giving each new edge a (0, 1) weight uniformly and independently. If there exists a weight below p, take the edge of minimal weight among all boundary edges and generate children according to Z, giving each new edge a weight as before. Continue this process; if it terminates, then choose the minimal weight overall (which will necessarily be greater than p), generate children and assign weights as before. Repeat this process.
Each event measurable with respect to the law described above has the same probability as with respect to P. At the step after each termination, when all available weights are greater than p, the event that the newly generated vertex has an infinite subtree is independent from all that came before and has positive probability g(p). By the second Borel-Cantelli lemma, there will necessarily be an invaded edge with an infinite subtree below it with weights less than p.
Corollary 2.4. For any p > p c , the number of edges in I with weight greater than p is almost surely finite.
This was proven for a large class of graphs by Häggström, Peres and Schonmann [HPS99], but this class doesn't cover the case of Galton-Watson trees conditioned on survival; they exploit quite a bit of symmetry that does not occur in the Galton-Watson case.
Proof: Let x be the first invaded vertex with an infinite subtree below with weights less than p. Then after x is invaded, no edges of weight larger than p will be invaded.
Corollary 2.5. There is almost surely only one infinite non-backtracking path from 0 in I. Equivalently, T is almost surely in the set of trees T such that I T contains almost surely a unique infinite non-backtracking path from 0.
Proof: Suppose that there are two distinct paths to infinity in I; by Corollary 2.4, there exist maximal weights M 1 and M 2 along these paths after they split, P-almost surely. If M 1 > M 2 , the second infinite path would be invaded before the edge containing M 1 . Similarly, we cannot have M 2 > M 1 . Finally, M 1 = M 2 has P-probability 0, completing the proof.
This proof is stated for invasion percolation on regular trees in [AGdHS08], but is identical for Galton-Watson trees once Corollary 2.4 is in place; the unique path guaranteed by Corollary 2.5 is typically called the backbone of I, and we continue this convention. Note that a regular tree is simply a Galton-Watson tree with Z ≡ b.
Definition 2.6 (the invasion path γ). Let $\gamma^T := (0, \gamma^T_1, \gamma^T_2, \ldots)$ be the random sequence whose $n$th element is the unique $v$ with $|v| = n$ such that $v \leftrightarrow \infty$ via a downward path in the invasion cluster $I^T$. Let $\nu_T$ denote the law of $\gamma^T$ given $T$. Let $\nu_{\mathbf{T}}$ denote the random measure on the random space $(\mathbf{T}, \partial \mathbf{T})$ induced by the $\gamma^T$. In other words, for measurable $A \subseteq \partial \mathcal{U}$, $\nu_{\mathbf{T}}(A, \omega) = P[\gamma^T \in A]$ evaluated at $T = \mathbf{T}(\omega)$. By Corollary 2.5, this is a well defined probability measure for almost every $\omega$.

Preliminary comparison of limit-uniform and invasion measures
Our main goal is to see whether ν T is almost surely absolutely continuous with respect to µ T . The following easy first result shows that ν T has the same entropy as µ T .
Definition 2.7. Let $T$ be a rooted tree with branching number $\mu$ and $S \subset \partial T$. For each $n$, define $S_n$ to be the set of vertices in paths in $S$ at height $n$. If there exists an $\varepsilon > 0$ so that $|S_n| \le (\mu - \varepsilon)^n$, then we say that $S$ is exponentially small. If every exponentially small set $S$ has $\nu_T(S) = 0$, then we say that the limit uniform and invasion measures on $T$ have the same entropy.
Theorem 2.8. For GW-a.e. T, the limit uniform and invasion measures on T have the same entropy.
Proof: Condition on T and fix some exponentially small S ⊂ ∂T. Let p ∈ ( 1 µ , 1 µ−ε ); then since p > p c , Proposition 2.3 implies that there exists an invaded vertex with an infinite subtree of weights less than p below it. Let v p be the first such invaded vertex. Then the backbone is contained in the subtree below v p since the parental edge of v p together with edges of weight larger than p separate the root from infinity. Moreover, all invaded vertices below v p have parental edge weight at most p, implying that the backbone is contained in the set of open edges below v p with weight at most p; call this subtree C p . Note that the law of the subtree below v p (unconditioned on T ) with weight at most p has the law of a Galton-Watson tree with progeny distribution Bin(Z, p) conditioned on v p being contained in an infinite open cluster. Thus, conditioned on T and v p , the law of this subtree is the law of the open cluster containing v p of Bernoulli(p) percolation on the subtree below v p conditioned on this cluster being infinite. Note that since T is Galton-Watson, the critical parameter for percolation on the subtree below v p is p c ; in particular, if we call this cluster C p , we have that Condition on T and condition further on v p ; note that for x below v p we have . (2.2) Thus for any n, we have where the last inequality is via a union bound over S n . By the definition of p, |S n |p n → 0; therefore, for each v p , we have Since there are P-a.s. only countably many vertices in T, the above gives P-a.s. thereby completing the proof.
In the remainder of this section, we give the summability criterion that establishes a sufficient condition for absolute continuity in terms of the KL-divergence of the two measures along a ray chosen from ν T .
Definition 2.9 (the splits p and q at children of u, and their difference, X). Let $v$ be a vertex of $T$ and let $u$ be the parent of $v$. Define $p(v) := \mu_T(v)/\mu_T(u)$, $q(v) := \nu_T(v)/\nu_T(u)$ and $X(v) := \sum_w q(w) \log \frac{q(w)}{p(w)}$, where the sum is over all children $w$ of $u$ and $\nu_T(v) = \nu_T(\{\gamma : v \in \gamma\})$ and $\mu_T(v)$ is defined similarly. The quantity $X$ is known as KL-divergence. The KL-divergence $K(\rho, \rho')$ is defined between any two probability measures $\rho$ and $\rho'$ on a finite set $\{1, \ldots, k\}$ by the formula $K(\rho, \rho') := \sum_{i=1}^{k} \rho(i) \log \frac{\rho(i)}{\rho'(i)}$. It is a measure of the difference between the two distributions. It is always non-negative but not symmetric.
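The KL-divergence between two distributions on a finite set is straightforward to compute, and a small example makes the lack of symmetry concrete. This is a sketch using the standard formula $\sum_i \rho(i) \log(\rho(i)/\rho'(i))$ with natural logarithms; the function name and the convention of returning infinity when $\rho'$ vanishes on the support of $\rho$ are assumptions for this example.

```python
import math

def kl_divergence(rho, rho_prime):
    """K(rho, rho') = sum_i rho(i) * log(rho(i) / rho'(i)), with 0 * log 0 := 0."""
    total = 0.0
    for a, b in zip(rho, rho_prime):
        if a > 0:
            if b == 0:
                return math.inf    # rho is not absolutely continuous w.r.t. rho'
            total += a * math.log(a / b)
    return total

q = [0.75, 0.25]   # hypothetical split of one measure among two children
p = [0.5, 0.5]     # hypothetical split of the other measure
print(kl_divergence(q, p), kl_divergence(p, q))
```

The two printed values differ, illustrating that the order of the arguments matters.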
The following inequality shows that K behaves like quadratic distance away from ρ = 0.
Applying Proposition 2.10 to $\rho = q$ and $\rho' = p$ gives
Proof: On the measure space $(\partial T, \mathcal{B})$, define a filtration $\{\mathcal{G}_n\}$ by letting $\mathcal{G}_n$ denote the σ-field generated by the sets $\{\gamma : \gamma_n = v\}$. The Borel σ-field $\mathcal{B}$ is the increasing limit $\sigma(\bigcup_n \mathcal{G}_n)$. Let $M_n(\gamma) := \nu_T(\gamma_n)/\mu_T(\gamma_n)$, and let $M := \limsup_{n \to \infty} M_n$. The Radon-Nikodym martingale theorem [Dur10, Theorem 5.3.3] says that $\{M_n\}$ is a martingale with respect to $(\partial T, \mathcal{B}, \mu_T, \{\mathcal{G}_n\})$ and that $\nu_T \ll \mu_T$ is equivalent to $\nu_T(\{M = \infty\}) = 0$. This is equivalent to $\nu_T(\{M' = 0\}) = 0$ where $M' := 1/M = \liminf_n 1/M_n$. The sequence $\{1/M_n\}$ is a $\nu_T$-martingale, therefore $\{\log(1/M_n)\}$ is a $\nu_T$-supermartingale, and to conclude that it $\nu_T$-a.s. does not go to negative infinity, it suffices to show that its expectation is bounded from below.
Corollary 2.12. Recall that γ denotes the invasion path on T and let X n denote X(γ n ).
(ii) Define the filtration {G n } on (Ω, F) by letting G n be the σ-field generated by T together with γ 1 , . . . , γ n . Proof: , whence (2.6) holds for GW-almost every T, implying almost sure absolute continuity of µ T with respect to ν T .
(ii) The argument used to prove Lemma 2.11 may be adapted as follows. Let M n := dν T dµ T G n , a version of which is the function taking the value supermartingale which we need to show converges almost surely. The sequence is a convergent supermartingale because its expected increments are either 0 or −Y (γ n ); convergence of the unconditional expectations EY (γ n ) implies almost sure convergence of the expected increments, implying almost sure convergence of the supermartingale {S M }. The hypotheses of (ii) imply that the increments of S M differ from the increments of log(1/M n ) finitely often almost surely, implying convergence of the supermartingale log(1/M n ) and hence ν T µ T .

Survival function conditioned on the tree
This section is concerned with estimating $g(\mathbf{T}, p)$, a random variable measurable with respect to $\mathcal{T}$. We first prove an upper bound on $g$ which gives a uniform bound on the $L^q$ norm of $g$. Additionally, we show that conditioning on only the first $n$ levels gives a random variable exponentially close to $g$. Estimating this averaged random variable is a key element in the proof of Theorem 1.1, and is the content of Section 5.2.
The following result from [LP17] will be useful.
Theorem 3.1 ([LP17, Theorem 5.24]). For independent percolation, we have where R(0 ↔ ∞) denotes the effective resistance from 0 to infinity when each edge e with percolation parameter p e is given resistance From this, we deduce: Proposition 3.2. For any ε ∈ (0, 1 − p c ) and GW-almost surely, where W := sup n W n (T) is almost surely finite because lim n→∞ W n exists almost surely.
Proof: To get an upper bound on $g$, we need a lower bound on the resistance. For each height $n$, short together all nodes at this height. For every $p = p_c + \varepsilon$ this gives a lower bound of Using Theorem 3.1, we get
Proposition 3.3 (uniform $L^q$ bound). Suppose the offspring distribution has a finite $q$th moment. Then for any $\delta > 0$, there is a constant $c_q$ such that for all $\varepsilon \in (0, 1 - p_c - \delta)$,
Proof: First recall that if the offspring distribution has a finite $q$th moment, then $M_q := E W^q$ is finite as well. By the $L^q$ maximal inequality (e.g., [Dur10, Theorem 5.4.3]), we have that In particular, for any $\varepsilon < 1 - p_c - \delta$, this implies
Let $\mathcal{T}_n$ denote $\sigma(\deg_v : |v| \le n)$. Because $\mathcal{T}_n \uparrow \mathcal{T}$ and $g$ is bounded, we know that $E[g(\mathbf{T}, p) \mid \mathcal{T}_n] \to g(\mathbf{T}, p)$ almost surely and in $L^1$. It turns out that $g_n := E[g(\mathbf{T}, p) \mid \mathcal{T}_n]$ is much easier to estimate than $g$ itself. Our strategy is to show this convergence is exponentially rapid, transferring the work from estimation of $g$ to estimation of $g_n$.
Proof: Define a random set S = S(n, p) to be the set of vertices v ∈ T n such that 0 ↔ p v. Let π T denote the law of the random variable S, an atomic probability measure on the subsets of the random set T n . Using where F n be the σ-field generated by F n and T , we obtain the explicit representation Order the vertices in T n arbitrarily and define the revealed martingale {M k } by Arguing as in (3.3), we obtain the explicit representation where for a given set S ⊂ T n , S ≤k denotes the vertices in S indexed ≤ k and S >k denotes the set indexed > k.
We claim the increments of $\{M_k\}$ are bounded by $p^n$. Indeed, (3.5) implies Now the lemma without the factor of $\varepsilon$ on the right-hand side follows from Azuma's inequality [Azu67]: the bounded differences yield Gaussian tails on $|M_{|T_n|} - M_0|$ with variance $|T_n| p^{2n}$, which is exponentially small in $n$ when $\mu p^2 < 1$.
Pivot Sequence on the Backbone

Markov property
Define the shift function $\theta : \Omega \to \Omega$ by $(\theta(\omega))_v := \omega_{\gamma_1 \sqcup v}$. (4.1) Informally, $\theta$ shifts the values of random variables at nodes $\gamma_1 \sqcup v$ in $T(\gamma_1)$ back to node $v$; these values populate the whole Ulam tree; values of variables not in $T(\gamma_1)$ are discarded; this is a tree-indexed version of the shift for an ordinary Markov chain. The $n$-fold shift $\theta^n$ shifts $n$ steps down the backbone. This subsection is devoted to several versions of the Markov property involving the shift $\theta$ as well as the limiting behavior of $\beta_n := \beta(\gamma_n)$.
Definition 4.1 (dual trees and pivots). Recall that $T(v)$ denotes the subtree from $v$, moved to the root. Let $T^*(v)$ denote the rooted subtree induced on all vertices $w \notin T(v)$, and let $\beta^*_{v,w}$ represent the pivot of the vertex $w$ on $T^*(v)$, that is, the least $x$ such that $w$ is connected to infinity without going through $v$. The dual pivot $\beta^*_v$ is defined to be $\min_{w < v} \beta^*_{v,w}$. In keeping with the notation for pivots, we denote $\beta^*_n := \beta^*_{\gamma_n}$.
Definition 4.2. We define the following σ-fields.
to be the σ-field generated by all the other data: U w and deg w for all w ∈ T * (v), along with U v .
(ii) For n ≥ 1, let B * n denote the σ-field containing γ n and all sets of the form n is generated by γ n and B * γn .
(iii) Let C n be the σ-field generated by θ n ω; in other words it contains deg(γ n ) and all pairs (deg γn x , U γn x ). It is not important, but this definition does not allow C n to know the identity of γ n .
It is elementary that {B * n } is a filtration, that B * n ∩ C n is trivial, and that B * n ∨ C n = F.
Definition 4.3. We define the following conditioned measures.
A common null set for all the conditioned measures is the set where either the invasion ray is not well defined or β(v) = β * v for some v. Statements such as (4.3) below are always interpreted as holding modulo this null set.
(ii) More generally, if 0 < y ≤ 1 then for any A ∈ F, (iii) Under P, the sequence {β * n } is a time homogeneous Markov chain adapted to B * n with transition kernel p(x, S) = Q x [β * 1 ∧ x ∈ S] and initial distribution δ 1 .
Proof: (i) By definition of conditional probability, the conclusion is equivalent to P[θ n ω ∈ A; G] = G Q β * n [A] dP for all G ∈ B * n . Writing G as the countable union of B * n -measurable sets v (G ∩ {γ n = v}), it suffices to verify the previous identity for each piece G ∩ {γ n = v}. By the definition of B * n , each of these may be written as {γ n = v} ∩ B * for B * ∈ B * v . Thus, it suffices to prove Fixing v, identify (Ω, F, P) as a product space (Ω 1 , . Let π i : Ω → Ω i denote the coordinate maps; then π 2 is a measure preserving map on (Ω, B * v , P) and π 1 is a measure preserving map on (Ω, C v , P). In particular, Working on the left-hand side of (4.2), observe that β • π 2 = β * v and hence, if we let H represent the event that the invasion percolation ever gets to v, then we have Using this, we obtain where H 2 above denotes π 2 (H). The integral over Ω 1 is equal to g(β * v (ω 2 ))1 H2 (ω 2 ) so we may simplify and continue. Writing g A (x) as Ω1 1 A∩{β0<x} (ω) dP 1 (ω) in the second line, (4.2) is finished as follows.
(ii) Begin with the observation that As before, letting G = {γ n = v} ∩ π −1 2 (B), we aim to prove the second identity in the first being definitional. Also by definition, Q y [·] = (1/g(y))P[· ∩ {β 0 < y}], whence, using (4.4), the left-hand side of (4.5) becomes Integrating over Ω 1 turns the last two indicator functions into g(β * v (ω 2 )∧y), again canceling the denominator and yielding 1 Rewriting g A (x) as Ω1 1 A∩{β0<x} dP 1 , this becomes Observing that the first three indicator functions define G, this simplifies to where the last inequality follows from the fact that γ n = v on G, which implies π 1 ω = θ n ω. Hence, that completes the proof of (ii).
(iii) Begin by observing that γ n+1 = v j if and only if γ n = v and β(v j) < β * 1 • π 1 . In other words, given that the backbone contains v, the next backbone vertex depends only on θ n ω and is chosen in the same way the first backbone vertex of ω was chosen. From this it follows that Therefore, Remark. Note that the final equality in the proof of (iii) does not immediately follow from the proof of (i) since {ω : β * 1 ω ∧β * n ω ∈ S} is not a fixed set, but rather depends on β * n . Nevertheless, the proof of this equality follows from a slight modification of the proof of (i), where we simply replace the expressions It is immediate that Q x P for all x. The following more quantitative statement will be useful.
Proposition 4.5. Let q > 1 and suppose that the offspring distribution has a finite q-moment. Then there exists a constant C q such that for all A ∈ T and for all δ > 0 and all x ∈ (p c , 1), Proof: On T , the density of Q x with respect to P is given by Combining Corollary 2.2, which implies g(x) ∼ Kx, with Proposition 3.3, which shows g(T, p c +ε) q dGW(T ) ≤ c q ε q provided p c + is bounded away from 1, we see that for some constant c q and all x ∈ (0, 1). Applying Hölder's inequality with 1/p + 1/q = 1 then gives The measures P x are in some sense more difficult to compute with than Q x because of the conditioning on measure zero sets. Relations such as the Markov property, however, are conceptually somewhat simpler. The following statement of the Markov property generalizes what was proved in [AGdHS08, Theorem 1.2 and Proposition 3.1], with B + n representing the σ-field generated by B * n together with β n . Proposition 4.6 (Markov property for pivots). For any A ∈ F, on (Ω, F, P x ).
Proof: By definition of conditional probability, the conclusion is equivalent to holding for all B ∈ B + n and A ∈ F. It is enough to prove (4.6) for sets that are subsets of {γ n = v} for some v: if it holds for sets of this form, then We now fix v and assume without loss of generality that B ⊆ {γ n = v}. The identity (4.6) we need to prove now reduces to and we need to show it holds for all B ∈ B + v where we recall B + v denotes the σ-field generated by B * v together with β v and σ v denotes shifting to v.
We claim it is enough to prove (4.7) for sets $B$ of the form $B_1 \times B_2$ where $B_2 = \{\beta(v) \le b\}$ and $B_1$ is an element of $\mathcal{B}^*_v$ contained in the event $\{\beta^*_v \ge a\}$ for real numbers $0 < b < a < 1$. To see why this is enough, observe that the set of all $B$ for which (4.7) holds is a λ-system, meaning it is closed under increasing union and set-theoretic difference of nested sets. The class of sets of the form $B_1 \times B_2$ above is closed under intersection, whence by Dynkin's Theorem [Dur10, Theorem 2.1.2], if (4.7) holds when $B$ is in this class, then it holds for all $B$ in the σ-field generated by this class, which is $\mathcal{B}^+_v$.
Working on the left-hand side of (4.7), Here, the first equality is definitional, the second uses independence of $\mathcal{B}^*_v$ and $\beta(v)$, the third uses that $\beta(v)(\omega) \le b$ if and only if $\beta(\mathbf{0})(\sigma_v \omega) \le b$, and the last holds because $\sigma_v$ preserves the measure $P$.
Proposition 4.7. The sequence $\{\beta_n, \beta^*_n\}$ is a time-homogeneous Markov chain adapted to $\{\mathcal{B}^+_n\}$ with initial distribution $L \times \delta_1$.

Analyzing the decay rate of {h n }
This section will be devoted to describing the limiting behaviour of the process {h n }.
(i) The sequence $\{\beta_n := \beta(\gamma_n)\}$ is a time-homogeneous Markov chain adapted to $\{\mathcal{B}^+_n\}$ with initial distribution $L$. and $C_a = 1 - \phi'\bigl(1 - (p_c + a)\, g(p_c + a)\bigr) \frac{g(p_c + a)}{g'(p_c + a)}$.
Proof: Conclusion (i) follows from the recursion $\beta_{n+1} = \beta_n \circ \theta$ by applying the Markov property with $n = 1$ and $A$ of the form $\{\beta_1 \in S\}$. Conclusion (ii) follows from the recursive description of $\beta_n$ as the minimum of $\max\{U(v), \beta(v)\}$ over children $v$ of $\gamma_{n-1}$, with $\gamma_n$ the argmin. More specifically, fix $0 < x < a$; then We note $P[\beta_0 \in [p_c + a, p_c + a + da]] = g'(p_c + a)\,da + o(da)$. To calculate the numerator, note that in order for this event to occur, up to a term of $O(dx^2)$, only one child of the root may have pivot in $[p_c + x, p_c + x + dx]$. This child $v$ must have $U_v \in [p_c + a, p_c + a + da]$ and all other children must have pivot above $p_c + a + da$. This gives Combining the two and taking $da \to 0^+$ completes the proof.
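The recursive description used in this proof (the pivot at a vertex is the minimum of $\max\{U(w), \beta(w)\}$ over its children $w$, and the backbone follows the argmin) can be illustrated on a finite truncation. The following is a minimal sketch only, not the construction used in the paper: the offspring law (uniform on {1, 2, 3}), the cut-off depth, and the convention that vertices at the cut-off level have pivot 0 are all assumptions made for illustration, and the function names are hypothetical.

```python
import random

def sample_tree(depth, rng, offspring=(1, 2, 3)):
    """Sample a Galton-Watson tree truncated at `depth`.

    Each vertex is a dict holding its weight U and its children.
    The offspring law (uniform on {1, 2, 3}, so p_0 = 0) is an assumption.
    """
    node = {"U": rng.random(), "children": []}
    if depth > 0:
        node["children"] = [sample_tree(depth - 1, rng, offspring)
                            for _ in range(rng.choice(offspring))]
    return node

def annotate_pivots(node):
    """Store the truncated pivot beta(v) on every vertex and return beta(node).

    beta(v) = min over children w of max(U(w), beta(w)); vertices at the
    cut-off level get beta = 0 (an assumed stand-in for "reaches infinity").
    """
    if not node["children"]:
        node["beta"] = 0.0
    else:
        node["beta"] = min(max(c["U"], annotate_pivots(c))
                           for c in node["children"])
    return node["beta"]

def backbone(root):
    """Follow the argmin child at every level, as in the recursive description."""
    path = [root]
    while path[-1]["children"]:
        path.append(min(path[-1]["children"],
                        key=lambda c: max(c["U"], c["beta"])))
    return path

rng = random.Random(1)
tree = sample_tree(10, rng)
annotate_pivots(tree)
pivots = [v["beta"] for v in backbone(tree)]
print([round(b, 3) for b in pivots])
```

In this truncated model the printed pivots are non-increasing along the backbone, since $\beta(v) = \max\{U(\gamma'), \beta(\gamma')\} \ge \beta(\gamma')$ for the argmin child $\gamma'$.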
Theorem 4.9. Let $U_0, U_1, \ldots$ be a sequence of IID random variables each uniformly distributed on $(0, 1)$, and let $M_n = \min\{U_0, U_1, \ldots, U_n\}$. For each $C_1, C_2$ such that $0 < C_1 < p_c < C_2$, the process $\{h_n\}$ can be coupled with the process $\{M_n\}$ so that, with probability 1, $h_n$ eventually (meaning for all sufficiently large $n$) satisfies $C_1 \cdot M_n \le h_n \le C_2 \cdot M_n$.
Proof: We start by looking at the function otherwise .
Writing $u$ as $u = r \cdot a$ (for $r \in [0, 1)$) and using Corollary 2.2, we find that with the convergence clearly being uniform with respect to $r$. Turning now to the process $\{M_n\}$, if we define Note that (4.9) implies there must exist $\delta > 0$ such that for $a < \delta$ we have $\frac{1}{C_2} < f_a(u) < \frac{1}{C_1}$ on $(0, a)$. Define $N_\delta := \min\{n : h_n < \delta\}$ and note that since $h_n \to 0$ a.s., it follows that $N_\delta < \infty$ a.s. Define the family of functions $Q_r$ for $r \in [0, \delta)$ where $Q_r : [0, r) \to \mathbb{R}$ is defined as where $q := C_1/C_2$. Observe that because $\frac{1}{C_2} < f_a(u) < \frac{1}{C_1}$ on $(0, a)$ for $a < \delta$, it follows that and that In addition, note that it follows from (4.10) that we have (4.13)

We'll now use the family of functions $Q_r$, along with the process $\{h_n\}$ and the sequence $\{U_k\}$ defined in the statement of the theorem, to define a new sequence $\{V_k\}$.
, then with probability $Q_{L_{j-1}}(C_1 \cdot U_{N_\delta + j})$ set $V_j$ equal to $C_1 \cdot U_{N_\delta + j}$, and with probability $1 - Q_{L_{j-1}}(C_1 \cdot U_{N_\delta + j})$ set $V_j$ equal to $C_2 \cdot U_{N_\delta + j}$. Next we define the process $\tilde h_n$ as Observe that in order to show that $\{\tilde h_n\}$ has the same joint distribution as $\{h_n\}$, it will suffice to establish that for any $n > 0$ and $0 < x < y < r$ (4.14) In the case where $r \ge \delta$ we see that (4.14) follows immediately from the definition of $\{\tilde h_n\}$. Alternatively, if $r < \delta$ then it will follow from the definition of $\{\tilde h_n\}$ that

Defining the times $\tau_1 = \min\{n : V_n < V_0\}$, $\tau_2 = \min\{n : U_{N_\delta + n} < M_{N_\delta}\}$, and $\tau = \max\{\tau_1, \tau_2\}$, we see that $\tau_1 < \infty$ a.s. due to the fact that $\tilde h_n \to 0$ a.s., since $\{\tilde h_n\}$ has the same joint distribution as $\{h_n\}$, and $\tau_2 < \infty$ a.s. due to the $U_j$'s being IID uniform on $(0, 1)$. Since $\tilde h_n = \min\{V_1, V_2, \ldots, V_{n - N_\delta}\}$ for $n \ge N_\delta + \tau$, $M_n = \min\{U_{N_\delta + 1}, U_{N_\delta + 2}, \ldots, U_n\}$ for $n \ge N_\delta + \tau$, and $C_1 U_{N_\delta + j} \le V_j \le C_2 U_{N_\delta + j}$ for all $j \ge 1$, it can be concluded that $C_1 M_n \le \tilde h_n \le C_2 M_n$ for all $n \ge N_\delta + \tau$, thus establishing that $(\tilde h_n, M_n)$ gives us our desired coupling.
This coupling is enough to prove convergence on the level of paths to the Poisson lower envelope process. Let $P$ be an intensity 1 Poisson point process on the upper half-plane; define the Poisson lower envelope process by $L(t) := \min\{y > 0 : (x, y) \in P \text{ for some } x \in [0, t]\}$.

Then we have
to be the Poissonized version of the min-uniform process defined by $M_n = \min\{U_1, \ldots, U_n\}$ for $n \ge 1$ and $M_0 = 1$. Then note that both $\tilde M(t)$ and $L(t)$ are continuous-time Markov processes that jump from height $z$ to height $z \cdot U[0, 1]$ at exponential rate $z$. Moreover, the processes $L_1(t) := 1 \wedge L(t)$ and $\tilde M(t)$ have the same starting value and jump from $z$ to $z \cdot U[0, 1]$ at exponential rate $z$. Using the same exponential clock and uniforms for both processes gives for all $t \ge 0$. Since $L(t)$ is eventually less than 1, we have that there exists an almost-surely finite time $\tau$ so that Thus, for all $k$, we have since for all $k$ the process $(kL(kt))_{t \ge 0}$ has the same law as $(L(t))_{t \ge 0}$.
By the strong law of large numbers, for any fixed ε > 0 and γ > 0, there exists an almost-surely finite random variable K so that is decreasing, this implies for all k ≥ K and uniformly in t ≥ ε we have Combining this with equation (4.17) gives where L is a different version of L.
By Theorem 4.9, for any $\delta > 0$, there is a coupling so that where $n_0$ is an almost-surely finite stopping time. Combining (4.18) and (4.19) gives for some almost-surely finite $\tilde K$. Taking $\delta, \gamma \downarrow 0$ completes the proof.
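The two comparison processes in this argument are easy to simulate, which gives a quick numerical sanity check of the scaling. The sketch below is an illustration under stated assumptions, not part of the proof: it samples the min-uniform process $M_n = \min\{U_1, \ldots, U_n\}$ and the jump process started at height 1 that moves from $z$ to $z \cdot U[0,1]$ at exponential rate $z$ (the description of $1 \wedge L(t)$ given above), and compares the tail of $n$ times each value with $e^{-x}$. It does not simulate $h_n$ itself; the function names, sample sizes, and seed are hypothetical choices.

```python
import math
import random

def min_uniform(n, rng):
    """M_n = min(U_1, ..., U_n) for IID Uniform(0, 1) variables."""
    return min(rng.random() for _ in range(n))

def capped_envelope(t, rng):
    """Value at time t of the jump process started at 1 that moves
    from z to z * Uniform(0, 1] at exponential rate z."""
    z, clock = 1.0, 0.0
    while True:
        clock += rng.expovariate(z)      # waiting time with rate z
        if clock > t:
            return z
        z *= 1.0 - rng.random()          # factor in (0, 1], so z stays positive

rng = random.Random(0)
n, x, trials = 100, 1.0, 10000

# Monte Carlo estimates of P[n * M_n > x] and P[n * (1 ^ L(n)) > x].
discrete = sum(n * min_uniform(n, rng) > x for _ in range(trials)) / trials
continuous = sum(n * capped_envelope(n, rng) > x for _ in range(trials)) / trials

exact_discrete = (1 - x / n) ** n        # exact value of P[n * M_n > x]
limit = math.exp(-x)                     # Exp(1) tail
print(discrete, continuous, exact_discrete, limit)
```

Both estimates should be close to $e^{-x}$. Combined with the sandwich $C_1 M_n \le h_n \le C_2 M_n$ of Theorem 4.9, this is the mechanism that places the tail of $n \cdot h_n$ between $e^{-x/C_1}$ and $e^{-x/C_2}$.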
Corollary 4.11. The sequence $n \cdot h_n$ converges in distribution to $p_c \cdot \exp(1)$, where $\exp(1)$ is an exponential random variable with mean 1.
Proof: It suffices to show that for every $x \in (0, \infty)$, we have $\lim_{n \to \infty} P[n \cdot h_n > x] = e^{-\mu x}$. We know from Theorem 4.9 that (4.20) Taking the lim inf of the expressions on the right and left sides in (4.20), while recalling that $N_\delta + \tau < \infty$ a.s., we find that $\liminf_{n \to \infty} P[n \cdot h_n > x] \ge e^{-x/C_1}$. Since $C_1 < p_c$ is arbitrary, it then follows that $\liminf_{n \to \infty} P[n \cdot h_n > x] \ge e^{-\mu x}$. Conversely, Theorem 4.9 also implies that (4.21) Taking the lim sup of the expressions on the right and left sides in (4.21) then gives $\limsup_{n \to \infty} P[n \cdot h_n > x] \le e^{-x/C_2}$ which, since $C_2 > p_c$ is arbitrary, implies $\limsup_{n \to \infty} P[n \cdot h_n > x] \le e^{-\mu x}$.

Proof: (i) Since $h^*_{n+1} = \min\{\beta^*_n - p_c, \beta^*_{\gamma_{n+1}, \gamma_n} - p_c\}$, it follows that (ii) For $a < x < b$ we have (iii) For $z < a < b$ we have (iv) For $z < a < x < b$, we have Noting that $h_n < h^*_n$, it follows from (i), (ii), (iii), and (iv) that Hence, we see that $p(\{a, b\}, \cdot) = \mu_a \times \tilde\mu_{a,b}$ where Noting that $\mu_a((0, a]) = \frac{g(p_c + a)}{g'(p_c + a)} + (p_c + a) = \frac{1}{f(a)}$ and $\tilde\mu_{a,b}((a, b]) = f(a)$, we define probability measures $\nu_a = f(a)\mu_a$ and $\tilde\nu_{a,b} = \frac{1}{f(a)}\tilde\mu_{a,b}$, and we see that $\nu_a$ and $\tilde\nu_{a,b}$ satisfy the statement in the proposition. (4.23) Now using (4.23), along with the results from the previous paragraph, we find that if n −t 2 < r and n −t < , is $O(e^{-Cn^{1-t}})$, thus completing the proof.

Proof of Theorem 1.1
This section is devoted to the proof of Theorem 1.1. A high-level summary follows.

Outline 5.1.

1. To prove Theorem 1.1, we utilize Corollary 2.12 for a suitable choice of $Y(v)$: where $A_v := \{|q(w)/p(w) - 1| < M n^{-t} \text{ for each child } w \text{ of } v\}$ for some $t > 1/2$ and $M > 0$. Then by Proposition 2.10, $Y(\gamma_n)$ is summable, so by Corollary 2.12 it suffices to show that, with probability 1, $|q(w)/p(w) - 1| \ge M n^{-t}$ for only finitely many $w$ that are children of $\gamma_n$.

2. For a vertex $v \ne \mathbf{0}$ and $p > p_c$, define $\tilde q$ In words, $\tilde q(v, p)$ considers the tree rooted at $v$ and finds the probability that $v$ is in the backbone conditioned on the root having pivot at most $p$. We then have where $\beta^*_n$ is as defined in Definition 4.1 and $E^{(n)}_* := E[\,\cdot \mid T, \gamma_n]$.

3. Show that $\tilde q$ with high probability (where the sum in the denominator is over all children of $\gamma_n$ including $v$).

4. Bound with high probability.

Comparing $\tilde q$ and the ratio of survival functions
The goal of this section is to accomplish step 3 of Outline 5.1. This takes the form of

Lemma 5.2. Let $\{w_k\}_{k=1}^d$ be an enumeration of the children of $v$. Then for any $p > p_c$ and $j$, .
For each $j$, we observe that The upper bound is the probability that $U_{w_j} \vee \beta(w_j) \le p$, while the lower bound is the probability that $U_{w_j} \vee \beta(w_j) \le p$ and that this does not hold for any of the siblings of $w_j$.
This gives the bounds .

(5.4)
Sandwich bounds on the difference with survival ratios follow: . (5.5) Finally, the simple bound of allows us to rewrite equation (5.5) as . (5.6)

Bounds on E(v, ε)
Before completing the final step in the proof of Theorem 1.1, we require estimates on g. For a fixed vertex v in a tree T define E(v, ε) by Proposition 5.3. Suppose the offspring distribution of Z has p ≥ 2 moments. Then for any δ, for which both 0 < δ < 1 and 0 < < 1 2 , there exist constants C i > 0 so that for all ε sufficiently small, a fresh Galton-Watson tree rooted at v satisfies with probability at least 1 − C 3 ε p −δ .
By [AN72, Chapter I.13], we have that By the law of total variance, this implies that Chebyshev's inequality then gives Since µ −m/3 ≤ µ −ε −δ /3 ≤ C 2 e −c1/ε c 2 for some positive constants C 2 and c 1 , c 2 , we have that By computing the lower probabilities of W again, recall that there exist constants C 1 and c 2 so that This implies that C 2 e −c1/ε c 2 < C 1 W (v)ε 1−δ with probability at least 1 − Ce −c 2 c1/ε c 2 . Relabeling constants, this means that for sufficiently small ε, we can upgrade (5.10) to with probability at least 1 − e −c1/ε c 2 .
The last piece is to bound S.O. m (v, ε)/g(p c + ε). By Fubini's theorem, where the second inequality is from the bound (1 + ε pc ) 2m ≤ 2 for sufficiently small ε.
Note that for each j the innermost sum is a sum of IID random variables. We utilize the Fuk-Nagaev inequality from [FN71] which states Applying this bound for t = EW 2 m−j Z j ε −2 gives for some choice of C p > C p . By applying this bound and a union bound, we get with probability at least 1 − C p mε p for some new choice of C. This means that for a fresh Galton-Watson tree, Recalling that g(p c + ε) = Θ(ε) now gives with probability at least 1 − Cε p −δ for some new C. Along with equations (5.8) and (5.11), this now implies the proposition. is greater than 1 2 . Then there exists a constant C > 0 such that for all ε > 0 sufficiently small for the root and its children with probability at least 1 − Cε δ for δ = min p − δ, log(1/p1) log(µ) dδ .
Proof: The first term in equation (5.7) is always eventually smaller than W (v)ε α since the exponent on ε is larger. The final term in equation (5.7) can now be dealt with separately.
By [BD74, Theorems 0 and 5], if $Z$ is in $L^p$, then W k Conditioning on $Z_1$, applying a union bound, and taking expectation implies that for the root and all of its children with probability at least $1 - C(1 + \mu)\varepsilon^{p - \delta}$. Recalling $m = \varepsilon^{-\delta}$ and applying this to the latter term in equation (5.7) gives In the case where $p_1 = 0$, the lower tails on $W$ provided by [Dub71] show that for any $r_1, r_2 > 0$ we have $P[W(v) < \varepsilon^{r_1}] = o(\varepsilon^{r_2})$, thereby showing $W(v) < \varepsilon^{r_1}$ with probability less than $\varepsilon^{r_2}$ for $\varepsilon$ sufficiently small. Setting $r_1 = d\delta$ and $r_2 = p - \delta$ completes the proof when $p_1 = 0$.
When p 1 > 0, there exists a constant C so that for all a ∈ (0, 1) This implies that for α as in (5.12), (5.14) Performing a union bound for the root and all of its children again completes the proof.
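The lower-tail estimates on $W$ used in this proof can be explored numerically. The sketch below assumes that $W$ denotes the usual Galton-Watson martingale limit, $W = \lim_n Z_n / \mu^n$, and uses the finite-generation proxy $W_n = Z_n / \mu^n$; the offspring law (uniform on {1, 2, 3}, so $p_1 = 1/3$ and $\mu = 2$), the number of generations, and the sample size are illustrative assumptions, and the names are hypothetical. It reports the empirical frequency of $\{W_n < a\}$ for a few small $a$, which is the kind of quantity bounded here when $p_1 > 0$.

```python
import random

OFFSPRING = (1, 2, 3)                    # assumed offspring law: uniform, p_1 = 1/3
MU = sum(OFFSPRING) / len(OFFSPRING)     # mean offspring number, here 2.0

def normalized_size(n, rng):
    """Return W_n = Z_n / mu^n for one Galton-Watson process run n generations."""
    z = 1
    for _ in range(n):
        # Every individual of the current generation reproduces independently.
        z = sum(rng.choice(OFFSPRING) for _ in range(z))
    return z / MU ** n

rng = random.Random(0)
samples = [normalized_size(8, rng) for _ in range(2000)]

mean_w = sum(samples) / len(samples)     # E[W_n] = 1 by the martingale property
print("mean of W_n:", mean_w)

# Empirical lower tail P[W_n < a] for a few thresholds a.
for a in (0.5, 0.25, 0.125):
    frac = sum(w < a for w in samples) / len(samples)
    print(a, frac)
```

Because $p_0 = 0$ in this assumed law, $W_n$ is bounded below by $\mu^{-n}$, so small values arise only when early generations contain few individuals; this is why the lower tail is sensitive to $p_1$.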

Completing the Argument
With Corollary 5.4 in place, we're ready to complete the proof of Theorem 1.1. Recalling step 1 of Outline 5.1, it will be sufficient to establish the following claim.
Claim: With probability 1, $P^{(n)}_*[A^c_n] \max\{\frac{1}{p(v)}, 1\} > n^{-t}$ for only finitely many children $v$ of the backbone.

Proof of Claim:
We begin with the observation that (5.16) Now using Corollary 5.4, we see that if we start with a fresh Galton-Watson tree then for every child v of the root, with probability at least 1 − Cn − t a δ . If we now condition on h * n ≤ n − t a and combine (5.17) with Proposition 4.5 we find that for every v such that v = γ n with probability at least 1 − Cn − t For the next step we recall that equation (5.6) tells us that Using (5.18), we see that when we condition on h * n ≤ n − t a , the latter fraction in (5.19) is bounded by, say 2, for every child of γ n , with probability at least 1 − Cn − t a (1− 1 p )δ . For the former fraction, we note that because g(T(v), p c + ε) ≤ C εW (v) for all ε bounded uniformly away from 1 − p c (see Proposition 3.2), it follows that for a fresh Galton-Watson tree and for s bounded uniformly away from 0, we have Now setting s = t a and combining the above string of inequalities with Proposition 4.5 we find that Combining this with what we determined about the second fraction in (5.19), it now follows that if we condition on h * n ≤ n − t a , then for every child of γ n , with probability at least 1−C n − t a (1− 1 p )δ (where we're using the fact that 1 a (1− 1 p )δ ≤ (p − 1)( 1 a − 1)). Finally, putting (5.20) together with (5.18) and Theorem 4.13, and defining t := t a (1 − 1 p )δ , we get that E P From this last result involving E P (n) * [A c n ] , we know that for any constant C 1 such that 0 < C 1 < 1 we have P P For the next step, we note that for any t > 0 and any constant C 2 with 0 < C 2 < 1, the probability 1 p(v) > n t for any child of γ n , is bounded by , as discussed at the end of section 5.2.
To finish establishing the claim, we now need to show that t, δ, d, , t , and C 1 can be chosen so that