Infection spread for the frog model on trees

The frog model is an infection process in which dormant particles begin moving and infecting others once they become infected. We show that on the rooted $d$-ary tree with particle density $\Omega(d^2)$, the set of visited sites contains a linearly expanding ball and the number of visits to the root grows linearly with high probability.


Introduction
The frog model is a system of interacting walks on a rooted graph and is one of the most studied examples of the class of A + B → 2B models from statistical physics [TW99,KZ17,AMP02,RS04,HJJ17,Her18]. Particles in this class of models have one of two states A and B. A particle in state A changes to state B on encountering a state B particle. Once a particle has state B, it keeps this state for all time. These models are conservative, meaning that particles are never created nor destroyed.
In the frog model, the particles in state A do not move, while the particles in state B perform independent simple random walks in discrete time. Thus, we refer to the state A particles as asleep and the state B particles as awake or active. The initial conditions consist of one awake particle at the root and some number of sleeping particles at every other vertex. The particles are traditionally referred to as frogs, a practice we continue.
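To make the dynamics concrete, here is a minimal simulation sketch of these rules (our own illustration, not a construction used in the proofs): the discrete-time frog model with one sleeping frog per site, run on a d-ary tree truncated at a finite depth. The function name and the truncation are our choices.

```python
import random

def frog_model(d=2, depth=8, steps=30, seed=1):
    """Discrete-time frog model on the d-ary tree truncated at `depth`,
    with one sleeping frog per nonroot site.  Vertices are tuples of
    child indices; () is the root.  Returns (visits to root, sites visited)."""
    rng = random.Random(seed)

    def neighbors(v):
        nbrs = [v[:-1]] if v else []              # parent, unless at the root
        if len(v) < depth:
            nbrs += [v + (i,) for i in range(d)]  # children
        return nbrs

    awake = [()]            # the single initially awake frog sits at the root
    visited = {()}          # sites whose sleeping frog has been woken
    root_visits = 0
    for _ in range(steps):
        awake = [rng.choice(neighbors(v)) for v in awake]  # all frogs step
        woken = []
        for v in awake:
            root_visits += (v == ())
            if v not in visited:                 # a first visit wakes the frog
                visited.add(v)
                woken.append(v)
        awake += woken
    return root_visits, len(visited)
```

Since the first move of the root frog is necessarily to a child, one step always yields exactly two visited sites.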
The class of A + B → 2B models arose from the study of the spread of an infection or rumor through a network. Spitzer asked how fast the infection region (the set of sites visited by a B particle) grows on Z^d. In a series of papers, Kesten and Sidoravicius gave a partial answer [KS05, KS06, KS08]. Under the assumption that A and B particles move at the same rate in continuous time, they proved a shape theorem for the infection region that shows it grows linearly in time. A similar shape theorem for the frog model was proven by Alves, Machado, and Popov in discrete time [AMP02, AMPR01] and by Ramírez and Sidoravicius in continuous time [RS04].
We prove here that if the density of particles is sufficiently large, the frog model on the infinite d-ary tree has a linearly growing infection region. If the density of particles is small, then no such shape theorem exists, as some sites remain uninfected forever [HJJ16, Proposition 15].
In [HJJ18], we consider the related question of how long the frog model takes to visit all sites on a finite tree, which we call the cover time. We show the existence of two regimes for the cover time on the full d-ary tree of height n: when the particle density is Ω(d^2), the cover time is Θ(n log n); when the particle density is O(d), the cover time is exp(Θ(√n)). See [HJJ18, Theorem 1.1] for a more complete statement. This paper's results on the infinite tree are an essential part of our proof of the fast cover time regime on the finite tree. We apply them to show linear growth of the infected region on finite trees, far away from the leaves. With a more elaborate argument, we then show that after an additional O(n log n) time, the leaves are visited as well.
Notation. Given a graph G and starting vertex, we describe a frog model as a pair (η, S). For each vertex v other than the starting one, η(v) gives the number of frogs initially sleeping at v. The random variable S = (S_•(v, i))_{v∈G, i≥1} is a collection of walks satisfying S_0(v, i) = v. The ith particle sleeping at v on waking follows the path S_•(v, i). For the full construction of the frog model along these lines, see [KZ17]. Typically, S is a collection of simple random walks independent of each other and of η, and (η(v))_v are either i.i.d. or are deterministically equal to some constant, most often 1. When we discuss the frog model on a given graph with, say, i.i.d.-Poi(µ) initial conditions, unless we say otherwise we assume that the paths are simple random walks and that these independence assumptions are in effect. For the frog model on a tree, the root is assumed to be the starting vertex unless stated otherwise. The frog model evolves in discrete time, though it is easy to show that the results of this paper hold in continuous time as well (see Remark 2.3). A realization of the frog model is called either transient or recurrent depending on whether the starting vertex is visited infinitely often by frogs. Typically, transience and recurrence are almost-sure properties; see [KZ17].
We use T_d to refer to the infinite rooted d-ary tree, in which the root has degree d and all other vertices have degree d + 1. We refer to the vertices at distance k from the root as level k of the tree. We define T_d^n as the full d-ary tree of height n, which is the subset of T_d made up of levels 0 to n. For any rooted tree T and vertex v ∈ T, we denote the subtree of T made up of v and its descendants by T(v). We use ∅ to refer to the root of whichever tree we are discussing.
The probability measure Geo(p) is the distribution on {0, 1, . . .} of the number of failures before the first success in independent trials that succeed with probability p. We also refer to the geometric distribution on {1, 2, . . .} with parameter p, which is the same distribution shifted. In a slight abuse, we will sometimes use notation like Geo(p) or Bin(n, p) to refer not to a probability distribution but to a random variable with the given distribution, as in a statement like P[Geo(p) ≥ k] = (1 − p)^k.
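The tail identity P[Geo(p) ≥ k] = (1 − p)^k is easy to check empirically; the following short sketch (our own illustration) samples Geo(p) as failures-before-success and compares the empirical tail to the formula.

```python
import random

def geo(p, rng):
    """Sample Geo(p): the number of failures before the first success."""
    n = 0
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(0)
p, k, trials = 0.3, 3, 200_000
emp = sum(geo(p, rng) >= k for _ in range(trials)) / trials
exact = (1 - p) ** k     # P[Geo(p) >= k] = (1 - p)^k = 0.343 here
```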
Results. The first result on the frog model was that with one sleeping frog per site on Z^d, the model is recurrent with probability one for all d ≥ 1 [TW99]. Next to be pursued were shape theorems for the process on Z^d demonstrating linear growth in time for the diameter of the infected region [AMP02, AMPR01, RS04]. Variations on the frog model with drift are considered in [GS09, GNR17, DP14, Ros17c, Ros17a]. The recent article [DGH+17] establishes a phase transition between transience and recurrence for the one-per-site frog model on Z^d for d ≥ 2 as the drift is varied. In [BDD+17], a shape theorem is proven for a Brownian frog model in Euclidean space. The authors observe a phase transition to superlinear infection spread at the critical threshold for continuum percolation.
The structure of T_d induces a natural drift away from the root. Unlike on the lattice, this drift is counterbalanced by the exponential increase in the volume of the tree. We have shown that the frog model exhibits a phase transition between transience and recurrence as d or the initial configuration is changed. With one sleeping frog per site, the frog model on a d-ary tree is recurrent for d = 2 and transient for d ≥ 5, with the behavior for d = 3, 4 still open [HJJ17]. Using similar techniques, Rosenberg proves the process is recurrent on the tree whose levels alternate between two and three children per vertex [Ros17b]. For any d ≥ 2, the frog model on a d-ary tree with i.i.d.-Poi(µ) frogs per site is transient or recurrent depending on whether µ is smaller or larger than a critical value µ_c(d) > 0 [HJJ16], and the critical value satisfies µ_c(d) = Θ(d) [JJ16].
Our main result is that if µ = Ω(d^2), the infected region of the frog model on T_d grows linearly, similar to its behavior on Z^d.

Theorem 1.1. Consider the frog model on T_d with i.i.d.-Poi(µ) initial conditions, where µ ≥ 5d^2. Let D_t be the highest level of the tree at which all vertices have been visited at time t. Then for some c, γ > 0 depending only on d,

  P[D_t ≥ ct] ≥ 1 − e^{−γt}  for all t ≥ 0.

We also show that the model is strongly recurrent, in the sense that the number of visits to the root grows linearly:

Theorem 1.2. In the setting of Theorem 1.1, let V_t be the number of visits to the root by time t. Then for some c, γ > 0 depending only on d,

  P[V_t ≥ ct] ≥ 1 − e^{−γt}  for all t ≥ 0.  (1)

While shape theorems for A + B → 2B models on lattices use the classical technique of applying the subadditive ergodic theorem, one needs a different approach on trees. Here, our version of a shape theorem follows without too much difficulty from our statement of strong recurrence, Theorem 1.2, and its proof is the bulk of the work. All proofs of recurrence for frog models on trees have used a bootstrap approach. Essentially, a lower bound on return counts applied to subtrees is leveraged to obtain an improved lower bound on return counts in the entire tree. This improved bound for the tree is then applied again to the subtrees to yield a further improved bound on the tree, and so on. The argument takes the form of producing an operator A acting on probability distributions such that if π is the distribution of the return count for each subtree, then Aπ is the distribution of the return count for the entire tree. One then argues that A^n π → ∞. This approach was used in [HJJ17, HJJ16, JJ16].
Our proof of strong recurrence in this paper adopts the same approach in spirit, even using the same operator notation, but its implementation is completely different from previous arguments. Rather than acting on return counts, our operator acts on point processes that represent the time of each return, which are considerably more difficult to work with. Using stochastic inequalities for point processes, we reduce the problem to showing that a certain deterministic sequence defined by a recurrence relation is bounded away from zero. This turns out to be surprisingly difficult and technical (see Lemma 3.13). The proof works only when µ = Ω(d^2), which leaves a gap in our understanding, as the frog model is transient only when µ = O(d). We would be very interested to see a more probabilistic version of this technical section of the proof. It seems plausible that such an argument could be sharpened to show strong recurrence for a smaller value of µ.
Questions. The most pressing question raised in this paper is the behavior of the frog model when d ≪ µ ≪ d^2. As we mentioned above, we know that the frog model on the infinite tree T_d is transient for µ = O(d) [HJJ16, Proposition 15] and recurrent for µ = Ω(d) [JJ16, Theorem 1], and in this paper we show it strongly recurrent when µ = Ω(d^2).

Question 1.3. Is there a weak recurrence phase for the frog model on T_d, where the root is occupied for a vanishing fraction of all time and the infected region grows sublinearly?
If such a phase exists, it is unclear whether it would occur for an interval of µ or just at a single critical value. In [HJJ18], we ask whether there are more than the two known regimes for the cover time of finite trees, and whether there are sharp phase transitions between regimes. We suspect that Question 1.3 would be the first step toward resolving those questions.

Modified frog models
As we make our argument, we will usually work with variants of the frog model where frogs have nonbacktracking paths. We describe these processes here and relate them back to the usual frog model.

2.1.
The self-similar frog model. The self-similar frog model on the infinite tree T_d is as defined in [JJ16]. Put succinctly, it is a modified version of the frog model with nonbacktracking paths in which all but the first frog to enter any given subtree are killed. For a more complete definition, we first define a nonbacktracking random walk (or a uniform nonbacktracking random walk if we wish to distinguish it from other nonbacktracking walks) from a vertex v_0 on any graph with minimum vertex degree 2. The walk starts at v_0. Its first step is to a uniformly random neighbor of v_0. On all subsequent steps, it moves to a vertex chosen uniformly from all its neighbors except the one it just arrived from.
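This walk is straightforward to sample; the sketch below (our own illustration, with names of our choosing) takes a neighbor function for an arbitrary graph of minimum degree 2 and produces a uniform nonbacktracking path.

```python
import random

def nonbacktracking_walk(neighbors, v0, steps, rng):
    """Uniform nonbacktracking walk: the first step goes to a uniform
    neighbor; every later step is uniform over the current vertex's
    neighbors except the one just arrived from."""
    path = [v0]
    prev = None
    for _ in range(steps):
        options = [u for u in neighbors(path[-1]) if u != prev]
        prev = path[-1]
        path.append(rng.choice(options))
    return path

# Example neighbor function: the infinite 3-ary tree, vertices as tuples
# of child indices; each vertex has its parent (if any) plus 3 children.
def tree_neighbors(v):
    return ([v[:-1]] if v else []) + [v + (i,) for i in range(3)]
```

By construction, the path never repeats a vertex two steps later, i.e. path[i] ≠ path[i − 2].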
We now define the self-similar frog model in two steps. First, for each v ∈ T_d and i ≥ 1, let S_•(v, i) = (S_j(v, i))_{j≥0} be a random nonbacktracking walk on T_d starting from v, killed on arrival to ∅ at times 1 and beyond. Let these walks be independent for all v and i, and let S = (S_•(v, i))_{v∈T_d, i≥1}. Let η = (η(v), v ∈ T_d) be a given collection of sleeping frog counts. Next, we modify the frog model (η, S) by killing additional frogs as follows. As the frog model (η, S) runs, at each step consider all vertices visited for the first time at that step. Suppose that v ∈ T_d \ {∅} is one of these vertices, and call its parent v′. On this step, v is necessarily visited by one or more frogs from v′. Kill all but one of these frogs on arrival to v, choosing arbitrarily which one survives. At subsequent steps, kill all frogs on arrival to v. The effect is that if a nonroot vertex v is ever visited, then it is visited only once by a frog originating outside of T_d(v), and the frog model on T_d(v) appears identical in law to the original one:

Lemma 2.1. Let v be a nonroot vertex of T_d with parent v′, and let ∅′ be the child of the root visited at time 1. Conditional on v being visited, the self-similar frog model restricted to {v′} ∪ T_d(v) from the time of this visit on is distributed as the self-similar frog model restricted to {∅} ∪ T_d(∅′) from time 1 on.

2.2. The nonbacktracking frog model. First, define a root-biased nonbacktracking random walk from v_0 on T_d as a walk distributed as follows. We set X_0 = v_0, and then we choose X_1 uniformly from the neighbors of X_0. Conditionally on X_0, . . . , X_i, we choose X_{i+1} as follows: If X_i = ∅, choose X_{i+1} to be X_{i−1} with probability 1/d^2 and to be each of the other children of the root with probability (d + 1)/d^2. Otherwise, choose X_{i+1} uniformly from the neighbors of X_i other than X_{i−1}. It turns out that this describes the behavior of the walk that results from deleting all excursions of a simple random walk away from its eventual path (see Appendix A). The odd behavior at the root results from the asymmetry of the tree there. We then define the nonbacktracking frog model on T_d as the frog model whose paths are independent root-biased nonbacktracking random walks on T_d.
Recalling that T_d^n consists of levels 0, . . . , n of T_d, define a root-biased nonbacktracking random walk from v_0 on T_d^n just as above, except that when X_i is a leaf of T_d^n, we define X_{i+1} to be the parent of X_i. Define the nonbacktracking frog model on T_d^n to have these walks as its paths.
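One step of the root-biased walk on T_d^n can be sketched as follows (our own illustration; the function name and vertex encoding are our choices). Note that the root probabilities sum to 1/d^2 + (d − 1)(d + 1)/d^2 = 1.

```python
import random

def root_biased_step(d, n, cur, prev, rng):
    """One step of the root-biased nonbacktracking walk on the height-n
    d-ary tree T_d^n.  Vertices are tuples of child indices; () is the
    root; prev is the previous vertex (None before the first step)."""
    children = [cur + (i,) for i in range(d)] if len(cur) < n else []
    parent = cur[:-1] if cur else None
    if not children:                 # leaf of T_d^n: reflect to the parent
        return parent
    if prev is None:                 # first step: uniform over all neighbors
        return rng.choice(children + ([parent] if cur else []))
    if cur == ():                    # biased rule at the root
        # back to prev w.p. 1/d^2, each other child w.p. (d+1)/d^2
        weights = [1 / d**2 if c == prev else (d + 1) / d**2 for c in children]
        return rng.choices(children, weights=weights, k=1)[0]
    # elsewhere: uniform over neighbors other than prev
    return rng.choice([u for u in [parent] + children if u != prev])
```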
The following result, which we prove in Appendix A, demonstrates that the time change for the underlying random walks has little effect. This allows us to work with nonbacktracking frog models when we prove Theorem 1.1.
Proposition 2.2. Let (η, S) and (η, S′) be respectively the usual and the nonbacktracking frog models, both on T_d or both on T_d^n, with arbitrary initial configuration η. There exists a coupling of the frog models (η, S) and (η, S′) such that the following holds: For any b > log d, there exists C = C(b) such that all vertices visited in (η, S′) by time t are visited in (η, S) by time Ct with probability 1 − e^{−bt}.
Remark 2.3. To extend the results of this paper to continuous time, one could prove a version of this proposition in which a continuous-time frog model is coupled with a discrete-time nonbacktracking frog model.
We mention that frog models on the finite tree T n d do not come up in this paper. We include such models in Proposition 2.2 because the proof is nearly identical for them, and we need the result for our paper [HJJ18] on cover times for the frog model.

Strong recurrence for the self-similar model
In this section, we show that in the self-similar frog model on T d with sufficiently large initial density of frogs, the visits to the root are bounded from below by a Poisson process. Our main results, Theorems 1.1 and 1.2, will follow in the next section as easy corollaries.
To state the result, define the return process of a given frog model to be a point process consisting of a point at t for each frog occupying the root at time t ≥ 1. Note that the return process will be supported on the positive even integers, as frogs never visit the root at odd times by periodicity. For point processes ξ_1 and ξ_2, we say that ξ_1 is stochastically dominated by ξ_2 (denoted ξ_1 ⪯ ξ_2) if there exists a coupling of ξ_1 and ξ_2 such that every point of ξ_1 is contained in ξ_2. See Section 3.1 for more background material on stochastic dominance between point processes.
Theorem 3.1. Consider the self-similar frog model on T_d with i.i.d.-Poi(µ) initial conditions. For any d ≥ 2, α > 0, and µ ≥ 3d(d + 1) + α(d + 1), the return process stochastically dominates a Poisson point process with intensity measure ∑_{k=1}^∞ α δ_{2k}.

Ramírez and Sidoravicius prove that the field of site occupation counts for the one-per-site frog model on Z^d in continuous time converges to independent Poisson random variables with mean one [RS04, Theorem 1.2]. Compared to this result, ours provides useful information across times, but it only gives lower bounds. Though our result is stated for occupation times at the root only, by Lemma 2.1 it can be applied to any site starting at the time of its first visit.
The proof of Theorem 3.1 is in three steps, Lemmas 3.6, 3.11, and 3.13. Let θ be the return process of the frog model in Theorem 3.1. First, we show that θ ⪰ A^n 0 for all n, where A is a certain operator acting on the laws of point processes and 0 is the empty point process. Second, we prove by induction that A^n 0 dominates a Poisson point process with intensity measure ∑_{k=1}^n λ_k δ_{2k} for an explicit sequence (λ_k)_{k≥1} depending on µ. The final step is to show that λ_k remains bounded away from zero as k → ∞. Though λ_k can be explicitly computed given µ, its formula is a complicated expression involving λ_1, . . . , λ_{k−1}, and it takes some effort to show that λ_k does not decay to zero as k tends to infinity. As we mentioned in the introduction, this argument is a sort of bootstrap. Every time we iterate the operator A, the self-similarity of the frog model together with a lower bound on the return process yields an improved lower bound on the return process.
3.1. Stochastic dominance for point processes. Let X and Y be random variables. We say that X is stochastically dominated by Y, denoted X ⪯ Y, if P[X > x] ≤ P[Y > x] for all x ∈ R. This is equivalent to the existence of a coupling of X and Y where X ≤ Y a.s.
If X ⪯ Y, then P[X = 0] ≥ P[Y = 0] by the definition of stochastic dominance. If X is Poisson and Y is a mixture of Poissons, then the converse is true as well:

Lemma 3.2 ([MSH03]). Let X ∼ Poi(λ), and let Y ∼ Poi(U) for some nonnegative random variable U. Then the following are equivalent:
(i) X ⪯ Y;
(ii) P[X = 0] ≥ P[Y = 0].

See also [JJ16, Section 2] for a quick presentation of the proof from [MSH03].
A Cox process is a mixture of Poisson point processes. The next result is a sufficient condition for stochastic dominance of a Poisson process by a Cox process in the same spirit as the previous lemma. Notationally, we view a point process ξ as a nonnegative integer-valued random measure on R. For U ⊆ R, we denote the restriction of the point process to U by ξ|_U. We write the number of atoms of ξ found in U as ξ(U), abbreviating ξ({x_1, . . . , x_n}) to ξ{x_1, . . . , x_n}.

Lemma 3.3. Let ξ be a Cox process supported on a countable set {x_1 < x_2 < · · · } ⊆ R, and let (λ_k)_{k≥1} be nonnegative numbers. Suppose that for each k, conditional on any point configuration of ξ on {x_1, . . . , x_{k−1}}, the probability that ξ{x_k} = 0 is at most e^{−λ_k}. Then ξ stochastically dominates a Poisson point process with intensity measure ∑_k λ_k δ_{x_k}.

Proof. This follows from [SS07, Theorem 6.B.3] combined with Lemma 3.2. The argument amounts to coupling ξ{x_1} with a Poisson random variable, then conditioning on ξ{x_1} and coupling ξ{x_2} with an independent Poisson random variable, then conditioning on ξ{x_1} and ξ{x_2} and coupling ξ{x_3} with an independent Poisson, and so on.
3.2. The operator A and its connection to the frog model. Our first task is to define an operator A = A_{d,µ} acting on distributions of point processes. For a point process ξ with distribution ν, we will abuse notation slightly and write Aξ rather than Aν.
Let us first explain the idea behind A. The initial frog in the self-similar frog model moves from the root ∅ down the tree, first to a child ∅′, and then to one of its children, v_1. Let v_2, . . . , v_d be the other children of ∅′. The basic idea is to imagine the frog model as occurring only on these vertices. When a frog moves to v_i, we close our eyes to the subtree T_d(v_i), imagining it as a black box that will occasionally emit frogs going back towards the root. We then define Aξ as the return process of this frog model if we assume that each black box emits frogs at times given by an independent copy of ξ, shifted by the time of activation.

Figure 1. The point process Aξ records the visit times to ρ in a particle system that behaves somewhat like the frog model. The difference is that when u_i is first visited, particles are released not immediately but at the times in ξ^{(i)}, which is an independent copy of the point process ξ. One should think of this system as a frog model where we ignore the activity past level 2 of the tree, paying attention only to when particles emerge back to level 1.

To formally define Aξ, consider a modified frog process taking place on a star graph with center ρ′ and leaf vertices ρ, u_1, . . . , u_d (see Figure 1). One should think of ρ and ρ′ as paralleling ∅ and ∅′, and of u_1, . . . , u_d as paralleling v_1, . . . , v_d. We define Aξ as the distribution of the point process of times of visits to ρ in the process defined as follows.
(i) At time 0, there is a single particle awake at ρ and Poi(µ) sleeping particles at ρ′. There are also particles asleep on each of u_1, . . . , u_d, as will be described in (iv).
(ii) The initially active particle moves from ρ to ρ′ to u_1 and then halts.
(iii) The particles at ρ′ are woken at time 1 by the initial particle. In the next step, each particle moves independently to a neighbor chosen uniformly at random. All particles halt after this step.
(iv) When site u_i is visited, the particles there undergo a delayed activation as follows. For each i, let ξ^{(i)} be an independent copy of ξ. For each atom in ξ^{(i)}, there is a sleeping particle on u_i. If the atom is at k, then this sleeping particle is awoken k − 2 steps after u_i is first visited, and it makes its first move (necessarily to ρ′) one time step after this. In its next step, it chooses uniformly at random from the neighbors of ρ′ except for u_i (i.e., it takes a random nonbacktracking step). It then halts.

The next lemma gives the connection between this operator A and the frog model. The proof is the same in spirit as [JJ16, Lemma 3.5].

Lemma 3.4. Let θ be the return process of the self-similar frog model on T_d with i.i.d.-Poi(µ) initial conditions. Then Aθ =_d θ.

Proof. We couple the self-similar frog model with the particle system defining Aθ as follows. Couple the number of frogs at ρ′ in the particle system with the number of frogs initially at ∅′, and couple the first step of each particle with its corresponding frog. Also suppose without loss of generality that the initial frog moves to v_1 after ∅′. At time 2, then, a frog moves to v_1 and a particle moves to u_1. By Lemma 2.1, if v_i is visited, then the self-similar frog model from this time on restricted to ∅′ ∪ T_d(v_i) is distributed as the original frog model on ∅ ∪ T_d(∅′). Thus, the visit times to ∅′ by frogs originating within T_d(v_1) are distributed as θ, shifted forward in time by one step (the first possible visit to ∅′ is at time 3, while the first possible atom of θ is at 2). This matches the definition of Aθ, and so we can couple these return times to the times of visits of particles to ρ′ from u_1 in the particle system. We also couple their next steps. We then continue in this way, coupling return times of frogs to independent shifted copies of θ whenever one of the sites v_2, . . . , v_d is visited. Ultimately, the return times to ∅ in the frog model are identical to the visits to ρ in the particle system, proving that θ =_d Aθ.
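The particle system defining Aξ lends itself to simulation. The following Monte Carlo sketch is our own illustration (all names are ours, and `sample_xi` is a hypothetical stand-in for drawing an independent copy of ξ as a list of atom times); it produces one realization of the visit times to ρ, including the delayed activations of (iv).

```python
import heapq
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler (adequate for moderate lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def apply_A(sample_xi, d, mu, rng):
    """One sample of the return process defining A(xi) on the star graph
    with center rho' and leaves rho, u_1, ..., u_d.  Returns the sorted
    visit times to rho."""
    visits = []
    activated = set()
    events = []                          # heap of (activation time, leaf i)
    # Poi(mu) sleepers at rho' wake at time 1 and step at time 2 to a
    # uniform neighbor: rho (target 0) or u_1, ..., u_d (targets 1..d).
    for _ in range(poisson(mu, rng)):
        target = rng.randrange(d + 1)
        if target == 0:
            visits.append(2)
        else:
            heapq.heappush(events, (2, target))
    heapq.heappush(events, (2, 1))       # the initial particle reaches u_1
    while events:
        t, i = heapq.heappop(events)
        if i in activated:               # only the first visit activates u_i
            continue
        activated.add(i)
        for k in sample_xi(rng):         # atom at k: back at rho' at t+k-1,
            target = rng.randrange(d)    # then a nonbacktracking step at t+k
            if target == 0:
                visits.append(t + k)     # the particle lands on rho
            else:                        # it lands on some u_j with j != i
                j = target if target < i else target + 1
                heapq.heappush(events, (t + k, j))
    return sorted(visits)
```

With ξ empty, every visit to ρ comes from the initial wave at ρ′ and lands at time 2, which makes a convenient sanity check.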
Lemma 3.5. If ξ ⪯ ξ′, then Aξ ⪯ Aξ′.

Proof. Couple the number and paths of the particles initially at ρ′ in the particle systems defining Aξ and Aξ′. Couple ξ^{(i)} and ξ′^{(i)} for i = 1, . . . , d so that every point in ξ^{(i)} is contained in ξ′^{(i)}, and couple the paths of the corresponding particles to be identical. Now, every visit to ρ at time k in the particle system defining Aξ also occurs in the particle system defining Aξ′, demonstrating that Aξ ⪯ Aξ′.
Lemma 3.6. Let θ be the return process in Theorem 3.1. For all n ≥ 0, we have θ ⪰ A^n 0.
Proof. We prove this by induction on n. Trivially, the lemma holds when n = 0. Now, assuming that θ ⪰ A^n 0, we apply Lemmas 3.4 and 3.5 to deduce that θ =_d Aθ ⪰ A(A^n 0) = A^{n+1} 0.

If ξ is a Poisson point process, we can take advantage of Poisson thinning to give a tractable lower bound on Aξ. Given a point process ξ, let τξ denote the 1/d-thinning of ξ, a point process that includes each atom of ξ independently with probability 1/d. Let σ_t ξ denote the result of shifting each atom in ξ by t (for example, σ_t(δ_2 + δ_5) = δ_{2+t} + δ_{5+t}). Given a sequence of nonnegative numbers (λ_k)_{k≥1}, define S to be the number of failures until a success in mixed Bernoulli trials where the first trial has success probability 1 − e^{−µ/(d+1)}, and the (k + 1)th has success probability 1 − e^{−λ_k/d} for k ≥ 1. Alternatively, we can write S as the random variable on {0, 1, . . .} ∪ {∞} with distribution given by

  P[S ≥ k] = exp(−µ/(d + 1) − (1/d) ∑_{j=1}^{k−1} λ_j)  for k ≥ 1.  (2)

We will typically have λ_k = 0 for k > n. Under this condition, S is supported on {0, . . . , n} ∪ {∞}.
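The operators τ and σ_t and the random variable S are all easy to sample; the following sketch (our own illustration, with point processes represented as lists of atom times and S restricted to finitely many trials) mirrors the definitions above.

```python
import math
import random

def thin(xi, d, rng):
    """The 1/d-thinning tau: keep each atom independently with prob 1/d."""
    return [x for x in xi if rng.random() < 1 / d]

def shift(xi, t):
    """The shift sigma_t: translate every atom of xi by t."""
    return [x + t for x in xi]

def sample_S(mu, d, lam, rng):
    """Failures before the first success: trial 1 succeeds with probability
    1 - exp(-mu/(d+1)), and trial k+1 with probability 1 - exp(-lam[k-1]/d).
    With the finitely many trials given, S = infinity if all of them fail,
    matching the case lam_k = 0 beyond the list."""
    probs = [1 - math.exp(-mu / (d + 1))] + [1 - math.exp(-l / d) for l in lam]
    for k, p in enumerate(probs):
        if rng.random() < p:
            return k
    return math.inf
```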
Proposition 3.7. Suppose that ξ is a Poisson point process with intensity measure ∑_{k=1}^n λ_k δ_{2k}. Let S^{(2)}, . . . , S^{(d)} be independent copies of S defined above, let ξ^{(1)}, . . . , ξ^{(d)} be independent copies of ξ, and let Z ∼ Poi(µ/(d + 1)) be independent of all of these. Then

  Aξ ⪰ Zδ_2 + σ_2 τξ^{(1)} + ∑_{i=2}^d σ_{2+2S^{(i)}} τξ^{(i)},  (3)

where τ and σ_t are the thinning and shift operators defined above.

Proof. Consider the particle system defining Aξ. Suppose that we disallow particles starting at vertices u_2, . . . , u_d from waking up any of sites u_2, . . . , u_d. Since there are stochastically fewer particles, the process of return times to ρ in this system is stochastically dominated by Aξ. We now show that these return times are distributed as the right-hand side of (3).
The first term in (3) counts the particles initially sleeping at ρ′ that move to ρ on step 2. The second term counts the particles that emerge from u_1 and move to ρ′ and then ρ. The other terms count the particles emerging from u_2, . . . , u_d and moving to ρ′ and then ρ. We now explain why these terms are as in (3) and why they are independent.
There are initially Poi(µ) particles on vertex ρ′ in the particle system. These particles are woken on step 1 and then move to random neighbors in step 2. This is the source of the Zδ_2 term. By definition of the particle system, particles from u_i arrive at ρ′ at times k − 1 after u_i is activated, for each atom k in ξ^{(i)}. Each of these particles moves next to ρ with probability 1/d. Hence, ρ is visited at times τξ^{(i)} after activation of u_i. In particular, this explains the form of the second term on the right-hand side of (3), as u_1 is activated at time 2.
Last, we consider the time of activation of each of u_2, . . . , u_d. These sites can be activated either by particles initially at ρ′ or by particles emerging from u_1. For i = 2, . . . , d, let Z_i be the number of particles initially at ρ′ that move to u_i. Similarly, let τ_i ξ^{(1)} denote the point process that keeps the points of ξ^{(1)} corresponding to particles that move from ρ′ to u_i. By Poisson thinning, Z, Z_2, . . . , Z_d are independent, and τξ^{(1)}, τ_2 ξ^{(1)}, . . . , τ_d ξ^{(1)} are independent, and the two collections are independent of each other as well. This explains the independence of the terms on the right-hand side of (3). The first visit to u_i is the first point of Z_i δ_2 + σ_2 τ_i ξ^{(1)}, and it is easily seen that this is distributed as 2 + 2S. As ρ is visited by particles from u_i at times τξ^{(i)} after activation of u_i, this explains the terms in the summand on the right-hand side of (3).
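The Poisson thinning used here, splitting the Poi(µ) crowd at ρ′ into independent Poi(µ/(d + 1)) crowds headed to each neighbor, can be checked numerically with a short sketch (our own illustration):

```python
import math
import random

def split_poisson(mu, d, trials, seed=0):
    """Thin a Poi(mu) crowd at rho' uniformly over its d+1 neighbors and
    return the empirical mean count in each direction; each should be
    close to mu/(d+1), illustrating Poisson thinning."""
    rng = random.Random(seed)

    def poisson(l):
        L, k, p = math.exp(-l), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    sums = [0] * (d + 1)
    for _ in range(trials):
        for _ in range(poisson(mu)):
            sums[rng.randrange(d + 1)] += 1
    return [s / trials for s in sums]
```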
3.3. Iterating A to prove strong recurrence. By Lemma 3.6, a lower bound on A^n 0 is a lower bound on the return process of the self-similar frog model. Thus, all it takes to prove Theorem 3.1 is a suitable lower bound on A^n 0. We provide one inductively, by applying Proposition 3.7 and Lemma 3.3 to extend a lower bound on A^n 0 to A^{n+1} 0. This argument deals only with point processes; one can completely forget about the frog model for the remainder of the section.
For a fixed choice of µ and d, we will define a sequence (λ_n)_{n≥1}. We then define a point process χ_n with intensity measure ∑_{k=1}^n λ_k δ_{2k}, which will serve as the lower bound for A^n 0 (see Lemma 3.11). Our definition of (λ_n) is in terms of another sequence (P_n)_{n≥1} defined by a recurrence relation. As we will prove immediately after giving the definitions, the two sequences (P_n) and (λ_n) are related by

  P_n = ∏_{k=1}^n e^{−λ_k/d}, or equivalently λ_n = −d log(P_n/P_{n−1}).  (4)

Thus, for any n ≤ m, we think of λ_n as the intensity of the Poisson point process χ_m at the point 2n, and we think of P_n as the probability that a 1/d-thinned version of χ_m contains no points at {2, 4, . . . , 2n}. The reason that the definitions are given in terms of (P_n) is to set us up to apply Lemma 3.3.
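The thinning interpretation of (P_n) is easy to verify numerically: a 1/d-thinned Poisson process puts Poi(λ_k/d) points at 2k, so the probability of seeing no points in {2, . . . , 2n} is the product of the factors e^{−λ_k/d}. The Monte Carlo sketch below (our own illustration, with an arbitrary test sequence in place of (λ_k)) checks this.

```python
import math
import random

def p_empty_thinned(lam, d, trials=200_000, seed=0):
    """Monte Carlo estimate of the probability that a 1/d-thinning of a
    Poisson process with intensity lam[k-1] at the point 2k has no
    points in {2, 4, ..., 2n}."""
    rng = random.Random(seed)

    def poisson(l):
        L, k, p = math.exp(-l), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    empty = 0
    for _ in range(trials):
        if all(sum(rng.random() < 1 / d for _ in range(poisson(l))) == 0
               for l in lam):
            empty += 1
    return empty / trials

lam, d = [0.5, 0.3, 0.2], 2
exact = math.exp(-sum(lam) / d)     # product of the factors exp(-lam_k/d)
```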
The inductive hypothesis shows that all factors are nonnegative, proving P_{n+1} ≤ P_n. Hence, (P_n)_{n≥1} is decreasing. Thus, all terms in (7) are nonnegative, showing that P_n ≥ 0 for all n. Combined with (8), this proves that λ_n ≥ 0.
Proposition 3.10. The sequence (λ_n)_{n≥1} is decreasing.

The proof of this is a lengthy and unilluminating sequence of algebraic manipulations that we have left to Appendix B. We are left with the feeling that there must be a probabilistic explanation, but we could not come up with it.
In the next lemma, we see that χ_n gives us a lower bound on A^n 0 and hence on the return process of the self-similar frog model. We will see that the definition of (P_n) is tailored to this lemma.
Lemma 3.11. For all n ≥ 0, it holds that A^n 0 ⪰ χ_n.
Proof. We will show that Aχ_n ⪰ χ_{n+1} for all n ≥ 0. The proposition then follows by repeatedly applying Lemma 3.5. By Proposition 3.7, it suffices to show that

  Zδ_2 + σ_2 τχ_n^{(1)} + ∑_{i=2}^d σ_{2+2S^{(i)}} τχ_n^{(i)} ⪰ χ_{n+1}.  (9)

Here Z ∼ Poi(µ/(d + 1)), χ_n^{(i)} is distributed as χ_n, S^{(i)} is distributed according to (2), and all are mutually independent. Thus, all terms on the left-hand side of (9) are independent. The terms Zδ_2 and σ_2 τχ_n^{(1)} are straightforward to handle; the crux of the matter is the following claim: for each i, the process σ_{2+2S^{(i)}} τχ_n^{(i)} stochastically dominates a Poisson point process with intensity measure ∑_{k=2}^{n+1} (λ_k/d) δ_{2k}.

Proof of claim. For the sake of brevity, let χ = σ_{2+2S^{(i)}} τχ_n^{(i)}. As mentioned, χ is a Cox process, since it is Poisson conditional on S. Our goal is to apply Lemma 3.3 to it. With this in mind, for 2 ≤ k ≤ n we compute P[χ{4, 6, . . . , 2k} = 0]. Expanding this using the distribution of S^{(i)} from (2), applying (4), and then applying (7) bounds the resulting conditional probability; the same calculation applies in the k = 1 case, and the k = 0 case is trivial. Hence, for 1 ≤ k ≤ n,

  P[χ{2k + 2} = 0 | χ{4, . . . , 2k} = 0] ≤ e^{−λ_{k+1}/d}.  (10)

We claim that this bound on the probability of χ{2k + 2} = 0 holds conditional on any point configuration for χ on {4, . . . , 2k}. That is, for all 1 ≤ k ≤ n,

  P[χ{2k + 2} = 0 | χ|_{{4,...,2k}}] ≤ e^{−λ_{k+1}/d}  a.s.  (11)

To prove this, suppose that χ|_{{4,...,2k}} contains any points. This guarantees that S^{(i)} ≤ k − 2. For s ≤ k − 2 (or indeed even for s ≤ k − 1), we have P[χ{2k + 2} = 0 | S^{(i)} = s] = e^{−λ_{k−s}/d} ≤ e^{−λ_{k+1}/d}, since the sequence (λ_k) is decreasing by Proposition 3.10. This shows that conditional on χ|_{{4,...,2k}} being equal to any nonempty collection of points, the probability that χ{2k + 2} = 0 is bounded by e^{−λ_{k+1}/d}. Together with (10), this confirms (11). By Lemma 3.3, this proves the claim.
To finish the proof of Lemma 3.11, let ξ^{(1)}, ξ^{(2)}, . . . denote Poisson point processes with intensity measures ∑_{k=2}^{n+1} (λ_k/d) δ_{2k}, independent of each other and all else. By the claim,

  Zδ_2 + σ_2 τχ_n^{(1)} + ∑_{i=2}^d σ_{2+2S^{(i)}} τχ_n^{(i)} ⪰ Zδ_2 + σ_2 τχ_n^{(1)} + ∑_{i=2}^d ξ^{(i)}.  (12)

The right-hand side is Poisson with intensity measure

  (µ/(d + 1)) δ_2 + ∑_{k=2}^{n+1} ((λ_{k−1} + (d − 1)λ_k)/d) δ_{2k}.

Since λ_{k−1} ≥ λ_k by Proposition 3.10, the right-hand side of (12) stochastically dominates χ_{n+1}. This confirms (9) and completes the proof.
With Lemmas 3.6 and 3.11 established, all that remains is to show that χ_n dominates a Poisson point process with intensity measure ∑_{k=1}^n α δ_{2k} for some α > 0. This amounts to showing that λ_k ≥ α > 0 for all k ≥ 1. We give a technical lemma and then the proof.
By Proposition 3.9, the sequence (λ_k) is nonnegative, and by Proposition 3.10, it is decreasing. Thus it has a limit. Suppose the limit is strictly smaller than dγ. Recalling that λ_k = −d log(P_k/P_{k−1}), we then have P_k/P_{k−1} ≥ e^{−γ+ǫ} for all sufficiently large k and some ǫ > 0. Thus,

  lim inf_{k→∞} (1/k) log P_k ≥ −γ + ǫ.

But this contradicts (13). Therefore lim_{k→∞} λ_k ≥ dγ.
Proof of Theorem 3.1. Let ξ_n be a Poisson point process with intensity measure ∑_{k=1}^n α δ_{2k}, for n ∈ N ∪ {∞}. By Lemmas 3.6 and 3.11, we have θ ⪰ χ_n for all n ∈ N. Applying Lemma 3.13 with γ = α/d, we have χ_n ⪰ ξ_n. Thus θ ⪰ ξ_n for all n ∈ N. By [SS07, Theorem 6.B.30], which asserts the equivalence of our definition of stochastic dominance to stochastic dominance of all finite-dimensional distributions, we have θ ⪰ ξ_∞.

Proofs of the main results
Theorem 3.1 and Proposition 2.2 combine for a quick proof of Theorem 1.2.

Proof of Theorem 1.2. Let U_t be the number of visits to the root by time t in the self-similar frog model on T_d with i.i.d.-Poi(µ) initial conditions. It follows from µ ≥ 5d^2 and d ≥ 2 that µ ≥ 3d(d + 1) + (2/3)(d + 1). By Theorem 3.1, U_t ⪰ Poi((2/3)⌊t/2⌋). Applying Proposition C.1, or even just Chebyshev's inequality, P[U_t ≤ ct] decays exponentially in t for any c < 1/3. Choosing c appropriately in (1), this completes the proof.
Next, we build toward our shape result. The following lemma is also helpful for our cover time results in [HJJ18]. Proof. It is a consequence of Lemma 2.1 that (τ i ) i≥1 are i.i.d. For the tail bound, we first observe that τ 1 = 1 if a frog at ∅ ′ moves immediately to v 1 . The initial frog does this with probability 1/d, and one of the frogs starting at ∅ ′ does so with probability 1 − e −µ/(d+1) . Hence, which proves (16) when t = 1. If neither of these events occurs and τ 1 > 1, then some sibling v ′ 1 of v 1 is visited one step after ∅ ′ is visited. By Lemma 2.1 and Theorem 3.1, the number of visits from v ′ 1 to ∅ ′ in the first 2t steps after activation of ∅ ′ is stochastically larger than Poi(βdt). Each of these frogs moves next to v 1 with probability 1/d. By Poisson thinning, the number of visits to v 1 in the first 2t + 1 steps after activation of ∅ ′ is stochastically larger than Poi(βt). Thus, Equations (17) and (18) show that P[τ 1 > 2t + 1] ≤ e −β(t+1) , proving (16) for t ≥ 2. This lemma shows that the time to wake up a given vertex at level k is something like the sum of k geometric random variables. Thus, by a union bound, all vertices at level k are likely to be visited in time O(k). We now make this formal to prove our main result.
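The two events behind P[τ 1 = 1] rest on Poisson thinning: each of the Poi(µ) frogs awakened at ∅′ steps to a uniform neighbor, of which there are d + 1, so the number stepping to v 1 is Poi(µ/(d + 1)) and at least one does so with probability 1 − e^{−µ/(d+1)}. A Monte Carlo sketch of this identity (the parameter values d = 2, µ = 4 are ours, for illustration):

```python
import math
import random

random.seed(0)

def poisson_sample(lam):
    """Knuth's method: count uniform draws until their product drops below e^{-lam}."""
    k, p, target = 0, 1.0, math.exp(-lam)
    while p > target:
        p *= random.random()
        k += 1
    return k - 1

# Poi(mu) frogs sit at the sibling vertex; each independently steps to v_1
# with probability 1/(d+1).  By Poisson thinning, the number that do so is
# Poi(mu/(d+1)), so P[at least one does] = 1 - exp(-mu/(d+1)).
d, mu, trials = 2, 4.0, 200_000
hits = sum(
    any(random.random() < 1 / (d + 1) for _ in range(poisson_sample(mu)))
    for _ in range(trials)
)
exact = 1 - math.exp(-mu / (d + 1))
assert abs(hits / trials - exact) < 0.01
```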
Start with a self-similar frog model on T d , and let ∅ ′ be the child of the root visited on the first step. We now make a change to this process to allow vertices outside of T d (∅ ′ ) to be visited. Rather than killing all frogs at the root from time 1 on, allow them to continue moving as root-biased nonbacktracking walks (see Section 2.2), reflecting with probability 1/d 2 and moving to each of the other children of the root with probability (d + 1)/d 2 . Since this process continues to follow the rule that only a single frog is allowed to enter any subtree, a frog at the root is still stopped if its next move is to a previously visited child of the root. The resulting frog model is then a stopped version of the nonbacktracking frog model on T d , which we can relate back to the usual frog model via Proposition 2.2.
Consider an arbitrary path ∅, v 0 , . . . , v k−1 from the root outward in T d . Define τ 0 as the time that v 0 is first visited, and then for i ≥ 1 define τ i as the number of steps it takes to visit v i after v i−1 is first visited. The time to visit v k−1 is then τ 0 + · · · + τ k−1 . The time τ 0 does not fit the criteria of Lemma 4.1 exactly, because a frog that moves from a sibling of v 0 back to ∅ moves next to v 0 with probability (d + 1)/d 2 rather than 1/d as in Lemma 4.1. But this only makes it more likely to visit v 0 , and so it still satisfies the bound that τ i does in (16), by the same argument. Regardless of how long it takes to visit v 0 , once it is visited, the process restricted to T d (v 0 ) is identical in law. Thus, τ 0 is independent of τ 1 , . . . , τ k−1 .
From the time that v 0 is visited, the model restricted to {∅} ∪ T d (v 0 ) with frogs stopped at ∅ is a self-similar frog model. With the annoyance of dealing with the asymmetry of the root behind us, we apply Lemma 4.1 to conclude that the random variables (τ i ) are i.i.d. and have exponential tails whose decay is a fixed constant (since β = 1/3). By Proposition C.2, vertex v k−1 is unvisited at time Ck with probability at most d −3k , for C depending only on d.
We now take a union bound over all d k choices of v k−1 at level k, showing that there is an unvisited level k vertex at time Ck with probability at most d −2k . By Proposition 2.2, it holds with probability at least 1 − d −2k that all vertices visited in our process by time Ck are also visited in the usual frog model on T d within time C ′ k, for a large enough choice of C ′ depending on d. Thus, Setting k = ⌈t/C ′ ⌉ completes the proof.
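The arithmetic behind the union bound, combining the per-vertex estimate from Proposition C.2 with the count of level-k vertices, is simply:

```latex
\Pr\bigl[\text{some vertex at level } k \text{ is unvisited at time } Ck\bigr]
  \;\le\; d^{k} \cdot d^{-3k} \;=\; d^{-2k}.
```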

Appendix A. Excursion decomposition of random walks on trees
The goal of this section is to prove Proposition 2.2 by breaking down a random walk on a tree into a loop-erased portion plus excursions. We carry this out first for a random walk on T hom d , which denotes the (d + 1)-homogeneous tree, the infinite tree in which all vertices have degree d + 1. The decomposition for random walks on the less symmetric tree T d will follow as a corollary. Though we do not need it for this paper, we also work out the decomposition for walks on the finite tree T n d , which we use in [HJJ18]. Given neighboring vertices u, v ∈ T hom d , we define a T hom d -excursion from u with first step to v as a random walk on T hom d defined as follows. The walker begins at u and takes its first step to v. On subsequent steps before returning to u, move towards u with probability d/(d + 1), and move to each of the d other possible neighbors with probability 1/(d(d + 1)). The T hom d -excursion ends when u is reached, which will occur almost surely. Our next proposition decomposes a simple random walk on T hom d into a nonbacktracking random walk spine (recall the definition from Section 2.1) with independent T hom d -excursions off of it. We believe this decomposition must be known in some form, but we have not managed to find a reference to it. Given two paths x 0 , . . . , x a and y 0 , . . . , y b such that x a = y 0 , we define the concatenation of the first and second paths as x 0 , . . . , x a , y 1 , . . . , y b . Note that the concatenation of the two walks x 0 , . . . , x a and x a leaves the first walk unaffected.
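The claim that an excursion ends almost surely follows from the drift of the distance-to-u process. A simulation sketch; the expected-length formula 2d/(d − 1) (first step plus a biased-walk hitting time of (d + 1)/(d − 1)) is our own computation, not from the text:

```python
import random

random.seed(1)

def excursion_length(d):
    """Length of a T^hom_d-excursion, tracked via the distance from the start u.
    The first step moves to distance 1; afterwards the walker moves toward u
    with probability d/(d+1) and away with probability 1/(d+1)."""
    dist, steps = 1, 1
    while dist > 0:
        if random.random() < d / (d + 1):
            dist -= 1
        else:
            dist += 1
        steps += 1
    return steps

# The distance walk drifts toward u, so the excursion ends almost surely.
# Its expected length is 1 + (d+1)/(d-1) = 2d/(d-1); for d = 2 this is 4.
trials = 100_000
mean_len = sum(excursion_length(2) for _ in range(trials)) / trials
assert abs(mean_len - 4) < 0.2
```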
are independent for different j and distributed as follows: First step: Let G ∼ Geo (d − 1)/d be independent of all else. Define (X 0 , E 0 1 , . . . , E 0 ℓ0 ) to be the concatenation of G independent T hom d -excursions from X 0 with first step chosen uniformly from the neighbors of y 0 . Subsequent steps: Let G ∼ Geo d/(d + 1) be independent of all else. For all j ≥ 1, define (X j , E j 1 , . . . , E j ℓj ) to be the concatenation of G independent T hom d -excursions from X j with first step chosen uniformly from the neighbors of X j other than X j−1 .
Note that ℓ j = 0 if the random variable G used to construct it is zero.
Then (Y i ) i≥0 is simple random walk on T hom d . In other words, Z J is the farthest toward y 0 the walk will reach from time n onward. We claim that for any nearest-neighbor walk y 0 , . . . , y n , To prove this, let z 0 , . . . , z k be the geodesic from y 0 to y n . Given J = j and (Y 0 , . . . , Y n ) = (y 0 , . . . , y n ), we can classify the steps of the walk into three categories: Permanently forward steps: The step from y i to y i+1 is permanently forward if it moves away from y 0 , the distance from y i to y 0 is greater than j, and the vertex y i does not appear again in y i+1 , . . . , y n . This implies that (y i , y i+1 ) = (z i ′ , z i ′ +1 ) for some i ′ > j and that the walk will never revisit y i . A permanently forward step from y i = y 0 to y i+1 occurs with probability (d − 1)/d(d + 1), since the walk has probability (d − 1)/d of taking a permanently forward step and probability 1/(d + 1) of taking this step to y i+1 . A permanently forward step from y i ≠ y 0 to y i+1 has probability 1/(d + 1), since the walk has probability d/(d + 1) of taking a permanently forward step and probability 1/d of taking this step to y i+1 .

Excursion-forward steps:
The step from y i to y i+1 is excursion-forward if it moves away from y 0 and is not permanently forward (i.e., either y i appears in y i+1 , . . . , y n or the distance from y i to y 0 is j or less). This implies that the step is part of an excursion. Each excursion-forward step occurs with probability 1/d(d + 1). To prove this, suppose the step is from y i to y i+1 . If the walk has taken an excursion-forward step prior to step i and has not yet finished the T hom d -excursion, then this holds by the dynamics of a T hom d -excursion. If y i = y 0 , then the walk has probability 1/d of starting a T hom d -excursion and probability 1/(d + 1) of taking its next step to y i+1 . If y i = z i ′ for some i ′ > 0 and it is not in the midst of a T hom d -excursion, then the walk has probability 1/(d + 1) of starting a T hom d -excursion and probability 1/d of taking its next step to y i+1 . Backward steps: The step from y i to y i+1 is backward if it is toward y 0 . Every backward step occurs in the midst of a T hom d -excursion and has probability d/(d + 1).
We now use this decomposition of the walk to prove (19). Let f and b be the number of forward and backward steps, respectively, in y 0 , . . . , y n . Since f − b = k and f + b = n, we have f = (n + k)/2 and b = (n − k)/2. If j = 0, then all forward steps are excursion-forward, and If j > 0, then there are j permanently forward steps, one of which is from y 0 . The rest of the forward steps are excursion-forward. Thus, P J = j and (Y 0 , . . . , Y n ) = (y 0 , . . . , y n ) This establishes (19). Therefore, Thus, the law of (Y 0 , . . . , Y n ) is uniform over all nearest-neighbor paths from y 0 .
Next, we obtain similar decompositions of random walks on the infinite tree T d and on the finite tree T n d . These decompositions are slightly more complicated as a result of the asymmetry of the trees. First, we embed T d into T hom d as follows. Designate one vertex in T hom d as the root ∅. Specify one of its neighbors as its parent and the other d as its children. Then consider T d to consist of ∅ and all of its descendants. We also embed T n d into T d in the obvious way. Now, consider a walk (X i ) i≥0 in T hom d . We define a new walk as follows. Delete all portions of (X i ) that lie outside of T d . This might result in the walk remaining at the root for consecutive steps, in which case we delete all but one of these steps to keep it a nearest-neighbor walk. We call this the restriction of (X i ) i≥0 to T d . Completely analogously, we define the restriction of (X i ) i≥0 to T n d . Observe that the restriction of (X i ) to T d will be a finite path if (X i ) escapes to infinity in the direction of the parent of the root in T hom d . The restriction of (X i ) to T n d is always finite, so long as (X i ) escapes to infinity. Given neighboring vertices u, v ∈ T d , we define a T d -excursion from u with first step to v as the restriction to T d of a T hom d -excursion from u with first step to v. Similarly, for any neighboring vertices u, v ∈ T n d , a T n d -excursion from u with first step to v is the restriction of a T d -excursion from u with first step to v.
When we talk of a random walk randomly stopped at a given vertex with probability p, we mean that on each visit to the vertex, the walk is stopped with probability p independent of all else.
be the restrictions of these to T d . Then the following holds: (i) The walk (X i ) S i=0 is a uniform random nonbacktracking walk stopped at the root randomly with probability 1/(d + 1) at time 0 and with probability 1/d on subsequent visits to the root.
(ii) The walk (Y i ) T i=0 is a simple random walk on T d stopped at the root randomly with probability (d − 1)/(d 2 + d − 1), and when S = ∞, Conditional on (X i ) S i=0 and on S, the walks (E j i ) ℓj i=1 are independent and distributed as follows for 0 ≤ j ≤ S: (a) If X 0 = ∅, then (X 0 , E 0 1 , . . . , E 0 ℓ0 ) is the concatenation of Geo (d 2 − 1)/(d 2 + d − 1) many independent T d -excursions from ∅ with first step uniformly chosen from the children of ∅. (b) If X 0 ≠ ∅, then (X 0 , E 0 1 , . . . , E 0 ℓ0 ) is the concatenation of Geo (d − 1)/d many independent T d -excursions from X 0 with first step chosen uniformly from the neighbors of X 0 . (c) If X j = ∅ for j ≥ 1, then (X j , E j 1 , . . . , E j ℓj ) is the concatenation of Geo d 2 /(d 2 + d − 1) many independent T d -excursions from ∅ with first step chosen uniformly from the children of ∅ other than X j−1 . (d) If X j ≠ ∅ for j ≥ 1, then (X j , E j 1 , . . . , E j ℓj ) is the concatenation of Geo d/(d + 1) many independent T d -excursions from X j with first step chosen uniformly from the neighbors of X j other than X j−1 .
Proof. By Proposition A.1, (X i ) ∞ i=0 is a uniform nonbacktracking random walk on T hom d . The walk (X i ) S i=0 follows the same path, except that if (X i ) moves outside of T d , then (X i ) S i=0 is stopped on the last step before it does so. This has a 1/(d + 1) chance of occurring on the first step if the walk starts at the root, and it has a 1/d chance of occurring on subsequent visits to the root. This proves (i).
Similarly, (Y i ) ∞ i=0 is a simple random walk on T hom d . Thus, (Y i ) T i=0 is a simple random walk on T d , except that it is stopped if (Y i ) is at ∅ and will never visit any children of ∅ afterwards. (Note that it is not necessarily the case that (Y i ) is stopped on the last visit of (Y i ) to T d , as one might expect.) A short calculation (done easily with electrical networks) shows that whenever (Y i ) is at ∅, the chance that it never visits any of the children of ∅ is (d − 1)/(d 2 + d − 1). To prove (iii), we note that (X j , E j 1 , . . . , E j ℓj ) is the concatenation of either Geo (d − 1)/d or Geo d/(d + 1) many T hom d -excursions with uniformly selected first step, restricted to T d . Since a T hom d -excursion restricted to T d is a T d -excursion, the only complication in finding the distribution of (X j , E j 1 , . . . , E j ℓj ) is that if the T hom d -excursion is from ∅ with first step to the parent of ∅, then its restriction to T d has length 1 and is effectively deleted. When X j ≠ ∅, this does not come up, taking care of cases (b) and (d). When X j = ∅, the number of excursions is thinned by d/(d + 1) when j = 0 or by (d − 1)/d when j ≥ 1, since excursions to the parent of ∅ are ignored. Thinning Geo(p) by q yields the distribution Geo p/(p + (1 − p)q) , and applying this formula proves (a) and (c).
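The "short calculation" can be reconstructed by first-step analysis; the equations below are our reconstruction (using the standard fact that simple random walk on the (d + 1)-regular tree hits a fixed neighbor of its starting point with probability 1/d), and the resulting value matches the root-stopping probability (d − 1)/(d 2 + d − 1) appearing in the T n d version below:

```python
# Let q be the probability that simple random walk started at the root of
# T^hom_d never visits the children of the root, and r the same probability
# started from the root's parent.  First-step analysis (our reconstruction):
#     q = r/(d+1)            (the first step must go to the parent)
#     r = (d-1)/d + q/d      (the walk re-hits the root with probability 1/d)
# Solving gives q = (d-1)/(d^2 + d - 1); here we verify by fixed-point iteration.
for d in range(2, 10):
    q = 0.0
    for _ in range(200):  # the iteration is a contraction with factor 1/(d(d+1))
        q = ((d - 1) / d + q / d) / (d + 1)
    assert abs(q - (d - 1) / (d * d + d - 1)) < 1e-12
```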
The analogous lemma for T n d -excursions is nearly identical, and we omit its proof.
be the restrictions of these to T n d . Then the following holds: (i) The walk (X i ) S i=0 is a uniform random nonbacktracking walk on T n d stopped at the root randomly with probability 1/(d + 1) at time 0 and with probability 1/d at all times past this, and stopped at the leaves randomly with probability d/(d + 1) at time 0 and with probability one at all times past this.
(ii) The walk (Y i ) T i=0 is a simple random walk on T n d stopped randomly at the root with probability (d − 1)/(d 2 + d − 1) and at the leaves with probability where conditional on (X i ) S i=0 and on S, the walks (E j i ) ℓj i=1 are independent and distributed as follows for 0 ≤ j ≤ S: (a) If X 0 = ∅, then (X 0 , E 0 1 , . . . , E 0 ℓ0 ) is the concatenation of Geo (d 2 − 1)/(d 2 + d − 1) many independent T n d -excursions from ∅ with first step uniformly chosen from the children of ∅. (b) If X 0 is a leaf of T n d , then (X 0 , E 0 1 , . . . , E 0 ℓ0 ) is the concatenation of Geo (d 2 − 1)/d 2 many independent T n d -excursions from X 0 with first step to the parent of X 0 . (c) If X 0 is neither the root nor a leaf, then (X 0 , E 0 1 , . . . , E 0 ℓ0 ) is the concatenation of Geo (d − 1)/d many independent T n d -excursions from X 0 with first step chosen uniformly from the neighbors of X 0 . (d) If X j = ∅ for j ≥ 1, then (X j , E j 1 , . . . , E j ℓj ) is the concatenation of Geo d 2 /(d 2 + d − 1) many independent T n d -excursions from ∅ with first step chosen uniformly from the children of ∅ other than X j−1 . (e) For j ≥ 1, if X j is a leaf of T n d , then ℓ j = 0. (f) For j ≥ 1, if X j is neither the root nor a leaf, then (X j , E j 1 , . . . , E j ℓj ) is the concatenation of Geo d/(d + 1) many independent T n d -excursions from X j with first step chosen uniformly from the neighbors of X j other than X j−1 .
We are nearly ready to give the decompositions of random walks on T d and on T n d . Recall the definitions of root-biased nonbacktracking walks on T d and T n d from Section 2.2. Proposition A.4. Let (X i ) i≥0 be a root-biased nonbacktracking random walk from v 0 on T d , and define (Y i ) i≥0 on T d by (20), where conditionally on (X i ) i≥0 , the paths (E j i ) ℓj i=1 are independent for different j and defined as follows: (i) Let G ∼ Geo (d − 1)/d be independent of all else. Define (X 0 , E 0 1 , . . . , E 0 ℓ0 ) to be the concatenation of G independent T d -excursions from X 0 with first step chosen uniformly from the neighbors of X 0 in T d . (ii) Let G ∼ Geo d/(d + 1) be independent of all else. For all j ≥ 1 where X j ≠ ∅, define (X j , E j 1 , . . . , E j ℓj ) to be the concatenation of G independent T d -excursions from X j with first step chosen uniformly from the neighbors of X j other than X j−1 . (iii) Let G 1 ∼ Geo d 2 /(d 2 + d − 1) and G 2 ∼ Geo (d − 1)/d be independent of each other and all else. For all j ≥ 1 where X j = ∅ and X j−1 ≠ X j+1 , define (E j i ) ℓj i=1 as follows. With probability 1/(d + 1), let it be the concatenation of G 1 independent T d -excursions with first step chosen uniformly from the children of the root other than X j−1 followed by G 2 independent T d -excursions with first step chosen uniformly from all children of the root. With probability d/(d + 1), let it just be the concatenation of G 1 independent T d -excursions with first step chosen uniformly from the children of the root other than X j−1 . (iv) Let G 1 ∼ Geo d 2 /(d 2 + d − 1) and G 2 ∼ Geo (d − 1)/d be independent of each other and all else. For all j ≥ 1 where X j = ∅ and X j−1 = X j+1 , define (E j i ) ℓj i=1 as the concatenation of G 1 independent T d -excursions with first step chosen uniformly from the children of the root other than X j−1 followed by G 2 independent T d -excursions with first step chosen uniformly from all children of the root.
Proof. We stitch together the walks (Y (j) i ) Tj i=0 for j = 0, . . . , J to form (Y i ) i≥0 . (This is a slight abuse of notation, as (X i ) i≥0 and (Y i ) i≥0 are not the walks from Proposition A.1.) We just need to show that these stitched-together walks fit the description of the statement of this proposition. We start by showing that (X i ) is a root-biased nonbacktracking walk. From the description of (X (j) i ) i≥0 as a randomly stopped nonbacktracking walk given in Lemma A.2, we can describe (X i ) as follows. Starting at v 0 , it moves as a uniform nonbacktracking random walk. If it arrives at the root from a nonroot vertex, it is stopped with probability 1/d. If this occurs, it is reset; it forgets its previous step and moves to a uniform child of the root. It might be stopped repeatedly before it manages to make this move (that is, the underlying walks (X (j) i ) may include several with H j = 0), but nonetheless the next step of (X i ) after being reset in this way is to a uniform child of the root.
From this description, it is clear that X 1 is uniform on the neighbors of v 0 , and that conditional on (X 0 , . . . , X i ) for i ≥ 1, the distribution of X i+1 is uniform on the neighbors of X i other than X i−1 whenever X i ≠ ∅. The only question is as to the distribution of X i+1 when X i = ∅. From our description, in this case, with probability (d − 1)/d the distribution of X i+1 is uniform on the children of the root except for X i−1 , and with probability 1/d it is uniform over all children of the root. This mixture matches our definition of a root-biased nonbacktracking walk.
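That this two-case mixture reproduces the root-biased step probabilities used in Section 4 (reflect with probability 1/d 2 , each other child with probability (d + 1)/d 2 ) can be checked with exact arithmetic; a small sketch of that cross-check:

```python
from fractions import Fraction

# With probability (d-1)/d the next step is uniform on the d-1 children other
# than the previous one; with probability 1/d it is uniform on all d children.
# The resulting step distribution should reflect (backtrack) with probability
# 1/d^2 and hit each other fixed child with probability (d+1)/d^2.
for d in range(2, 8):
    back = Fraction(1, d) * Fraction(1, d)                 # backtrack to X_{i-1}
    other = (Fraction(d - 1, d) * Fraction(1, d - 1)
             + Fraction(1, d) * Fraction(1, d))            # a fixed other child
    assert back == Fraction(1, d * d)
    assert other == Fraction(d + 1, d * d)
    assert back + (d - 1) * other == 1                     # total mass checks out
```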
The proof that (Y i ) is simple random walk is similar but simpler. From Lemma A.2, the walk (Y i ) is the concatenation of a sequence of independent randomly stopped simple random walks, which is a simple random walk.
It now remains to describe the distribution of the excursions. This gives us a decomposition of (Y i ) as (20) with the paths (E j i ) ℓj i=1 conditionally independent given (X i ), and each made up of geometrically many independent T d -excursions. We just need to show that the parameters of the geometric distributions and the distributions of their first steps match the statement of this lemma. When X j ≠ ∅, the path (E j 1 , . . . , E j ℓj ) is taken from a single (Y (j) i ), and from Lemma A.2(iii), the path is distributed as given in (i) or (ii), depending on whether j = 0.
When X j = ∅, we first consider the case that j = 0. Then the path (E 0 1 , . . . , E 0 ℓ0 ) is the concatenation of the excursions in (Y (0) i ), . . . , (Y (J) i ), where J is the smallest value of j so that H j ≥ 1. Note that J − 1 is then distributed as Geo d/(d + 1) , since for each j, the events H j ≥ 1 occur independently with probability d/(d + 1). Each set of excursions is the concatenation of Geo (d 2 − 1)/(d 2 + d − 1) many independent T d -excursions with first step chosen uniformly from the children of the root. Thus, the total number of excursions is distributed as the sum of 1 + Geo d/(d + 1) many independent Geo (d 2 − 1)/(d 2 + d − 1) -distributed random variables, which is Geo (d − 1)/d . (Here we use the general fact that a sum of 1 + Geo(p) many independent Geo(q) random variables is distributed as Geo pq/(1 − q + pq) .) This confirms that (E 0 1 , . . . , E 0 ℓ0 ) is distributed as (i) when X 0 = ∅. If X j = ∅ for j ≥ 1 and X j−1 = X j+1 , then the underlying system of stopped and restarted uniform nonbacktracking random walks moved from X j−1 to X j , was stopped, and then moved to X j−1 when restarted. Thus, (E j i ) ℓj i=1 is the concatenation of two collections of excursions: first, Geo d 2 /(d 2 + d − 1) many independent T d -excursions from ∅ with first step chosen uniformly from the children of ∅ other than X j−1 , by Lemma A.2(iii)(c); second, as in the case j = 0, the concatenation of 1 + Geo d/(d + 1) many independent batches of Geo (d 2 − 1)/(d 2 + d − 1) many T d -excursions with first step chosen uniformly from the children of the root. Together, this matches the description of (iv) from this proposition.
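The parenthetical fact about geometric sums can be verified by composing probability generating functions; a numeric sketch (our verification, under the convention that Geo(p) is supported on {0, 1, 2, . . .}):

```python
def geo_pgf(p, s):
    """pgf of Geo(p) on {0, 1, 2, ...}: E[s^G] = p/(1 - (1-p)s)."""
    return p / (1 - (1 - p) * s)

# Claim: a sum of 1 + Geo(p) many independent Geo(q) variables is
# Geo(pq/(1 - q + pq)).  The count 1 + Geo(p) has pgf t*p/(1 - (1-p)t),
# so the random sum's pgf is that expression evaluated at t = pgf of Geo(q).
for p, q in [(0.3, 0.7), (0.5, 0.5), (2 / 3, 0.25)]:
    r = p * q / (1 - q + p * q)
    for s in [0.0, 0.2, 0.5, 0.9]:
        t = geo_pgf(q, s)                      # pgf of one Geo(q) summand
        compound = t * p / (1 - (1 - p) * t)   # compose with the count's pgf
        assert abs(compound - geo_pgf(r, s)) < 1e-12
```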
Last, suppose that X j = ∅ for j ≥ 1 and X j−1 ≠ X j+1 . Now, it is possible that the underlying system of walks was restarted at the root or not. Conditioning on X j−1 ≠ X j+1 makes the probability that we were stopped 1/(d + 1). If so, then the path (E j i ) ℓj i=1 is distributed as in (iv). Otherwise, it is the concatenation of Geo d 2 /(d 2 + d − 1) many independent T d -excursions from ∅ with first step chosen uniformly from the children of ∅ other than X j−1 . This shows that (E j i ) ℓj i=1 is distributed as given in (iii).
Last, we give the decomposition for walks on T n d . The proof is very similar to that of Proposition A.4, and we omit it.
Proposition A.5. Let (X i ) i≥0 be a root-biased nonbacktracking random walk from u 0 on T n d , and define another path where conditionally on (X i ) i≥0 , the paths (E j i ) ℓj i=1 are independent for different j and defined as in Proposition A.4, except that all T d -excursions are replaced by T n d -excursions, and when X j is a leaf, (E j 1 , . . . , E j ℓj ) is distributed as the concatenation of Geo (d − 1)/d many independent T n d -excursions with first step to the parent of X j . Then (Y i ) i≥0 is a simple random walk on T n d from u 0 .
Corollary A.6. Let (X i ) i≥0 be a root-biased nonbacktracking random walk from x 0 , and let (Y i ) i≥0 be a simple random walk from x 0 , both on T d or both on T n d . Suppose that they are coupled as in Proposition A.4 or A.5. Recall that ℓ j is the length of the path inserted in between X j and X j+1 to form (Y i ) i≥0 . Then the random variables (ℓ i ) i≥0 are independent conditional on (X i ) i≥0 , and it holds for some absolute constant c > 0 and all real numbers Proof. Let U be distributed as the length of a T hom d -excursion, which stochastically dominates the length of a T d - or T n d -excursion. We can characterize U as follows. Let (S k ) k≥1 be a biased random walk that moves left with probability d/(d + 1) and right with probability 1/(d + 1), with S 1 = 1. Then U is the first time that (S k ) hits 0, which we note is always even. If U ≥ 2 + 2u, then at least u of the first 2u steps are to the right, which by Proposition C.1 has probability bounded by Hence U/2 is stochastically dominated by 1 + Geo 1 − e −1/14 . By Proposition A.4 or A.5, conditional on (X i ) i≥0 the path X j , E j 1 , . . . , E j ℓj is made up of stochastically at most Geo (d − 1)/d concatenated T d - or T n d -excursions. We work conditionally on (X i ) i≥0 for the rest of the proof. The random variable ℓ j is stochastically dominated by twice a sum of 1 + Geo (d − 1)/d many independent random variables distributed as 1 + Geo 1 − e −1/14 . As the sum of 1 + Geo(p) many independent random variables distributed as 1 + Geo(q) is distributed as 1 + Geo(pq), This shows that for all t ≥ 0 . Hence (21) is satisfied. The conditional independence of (ℓ i ) i≥0 follows directly from Proposition A.4 or A.5.
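The geometric identity used in the last step, that a sum of 1 + Geo(p) many independent copies of 1 + Geo(q) is distributed as 1 + Geo(pq), can also be checked by composing probability generating functions; a quick numeric sketch (our verification, not from the paper):

```python
def shifted_geo_pgf(p, s):
    """pgf of 1 + Geo(p), with Geo(p) on {0, 1, ...}: E[s^{1+G}] = s*p/(1-(1-p)s)."""
    return s * p / (1 - (1 - p) * s)

# A random sum of N = 1 + Geo(p) many copies of 1 + Geo(q) has pgf equal to
# the count's pgf evaluated at the summand's pgf; it should equal that of
# 1 + Geo(pq).
for p, q in [(0.5, 0.5), (0.9, 0.2), (1 / 3, 0.75)]:
    for s in [0.0, 0.3, 0.6, 0.95]:
        compound = shifted_geo_pgf(p, shifted_geo_pgf(q, s))
        assert abs(compound - shifted_geo_pgf(p * q, s)) < 1e-12
```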
Proof of Proposition 2.2. Let V be the set of vertices visited in the nonbacktracking model (η, S ′ ) by time t. Suppose v ∈ V. This means that there exists some sequence of frogs in (η, S ′ ) starting with the initial one such that each visits the next and the last visits v. Consider the path of length at most t formed by pasting together the portion of each frog's walk up until it hits the next frog or hits v. The same frogs take these same steps in the model (η, S), but with excursions inserted between each step. By Corollary A.6, conditional on (η, S ′ ), these excursions are independent and their lengths have an exponential tail. Choosing C 0 depending on b and applying Proposition C.2, conditional on (η, S ′ ), the combined length of these excursions is at most C 0 t with probability at least 1 − e −2bt . Let C = 1 + C 0 . We have shown that if E(v) is the event that v is unvisited in (η, S) at time Ct, then for any v visited in (η, S ′ ) by time t.
Observe that |V| ≤ d t , since no vertex beyond level t can be visited by time t. By a union bound, since b ≥ log d. Taking expectations completes the proof.
Appendix B. Proof of Proposition 3.10 In this appendix, we prove Proposition 3.10. First, we give some notation representing weighted averages that we will use throughout. Given nonnegative weights w 1 , . . . , w n , not all zero, and quantities a 1 , . . . , a n , we define WA(w 1 , a 1 | . . . | w n , a n ) = (w 1 a 1 + · · · + w n a n )/(w 1 + · · · + w n ), the weighted average of a 1 , . . . , a n with weights proportional to w 1 , . . . , w n . The following lemma is easy to check by direct calculation.
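As a sanity check, WA is straightforward to implement and test with exact rationals; this tiny sketch is ours, not from the paper:

```python
from fractions import Fraction

def weighted_average(*pairs):
    """WA(w1, a1 | ... | wn, an) = (w1*a1 + ... + wn*an)/(w1 + ... + wn)."""
    num = sum(w * a for w, a in pairs)
    den = sum(w for w, _ in pairs)
    return Fraction(num, den)

# A weighted average always lies between the smallest and largest entry.
wa = weighted_average((2, 1), (1, 4), (3, 2))
assert wa == Fraction(12, 6)  # (2*1 + 1*4 + 3*2)/(2 + 1 + 3) = 2
assert min(1, 4, 2) <= wa <= max(1, 4, 2)
```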
Our goal now is to show that (23) holds. In (24), we will express p n as a weighted average. Using Lemma B.5, we then change this expression one term at a time to create a chain of inequalities that ends with p n+1 . We now define expressions for forming this chain. Definition B.3. For fixed n, we define functions u i (x 1 , . . . , x n−2 ) and related quantities u j i . Let u i (x 1 , . . . , x n−2 ) = x 1 · · · x i−1 (1 − x i )P n−1−i for 1 ≤ i ≤ n − 2, and u n−1 (x 1 , . . . , x n−2 ) = x 1 · · · x n−2 .

Appendix C. Miscellaneous concentration inequalities
Proposition C.1. Let Y be a random variable with EY = λ, and suppose either that Y is Poisson or that Y is a sum of independent random variables supported on [0, 1]. For any 0 < α < 1, and for any α > 1, Proof. These inequalities are well known consequences of the Cramér-Chernoff method of bounding the moment generating function and applying Markov's inequality (see [BLM13, Section 2.2]), though it is difficult to find these exact bounds in the literature. For a convenient statement of these inequalities obtained by other means, apply [CGJ18, Theorem 3.3] with c = p = 1.
Proposition C.2. Let (X i ) n i=1 be a collection of independent random variables satisfying P[X i ≥ ℓ] ≤ Ce −bℓ for some C and b > 0 and all ℓ ≥ 1. Then for any b ′ > 0, there exists C ′ depending on C, b, and b ′ such that We can take C ′ = 2(b ′ + C)/b.
Proof. This is another consequence of the Cramér-Chernoff method. Observe that
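The computation presumably continues along the standard Cramér-Chernoff lines; the following is our sketch of that continuation (the paper's constants and bookkeeping may differ). For 0 < s < b, bounding e^{sX_i} on each interval [\ell, \ell+1) and applying Markov's inequality to the sum gives

```latex
\mathbb{E}\, e^{sX_i}
  \;\le\; e^{s} + \sum_{\ell \ge 1} e^{s(\ell+1)} \Pr[X_i \ge \ell]
  \;\le\; e^{s}\Bigl(1 + \frac{C}{e^{\,b-s} - 1}\Bigr),
\qquad
\Pr\Bigl[\sum_{i=1}^n X_i \ge C' n\Bigr]
  \;\le\; e^{-sC'n} \bigl(\mathbb{E}\, e^{sX_1}\bigr)^{n},
```

and choosing s close enough to b and C' large enough then yields the claimed exponential decay.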