Long paths in first passage percolation on the complete graph I. Local PWIT dynamics

We study the random geometry of first passage percolation on the complete graph equipped with independent and identically distributed edge weights, continuing the program initiated by Bhamidi and van der Hofstad [9]. We describe our results in terms of a sequence of parameters $(s_n)_{n\geq 1}$ that quantifies the extreme-value behavior of small weights, and that describes different universality classes for first passage percolation on the complete graph. We consider both $n$-independent as well as $n$-dependent edge weights. The simplest example consists of edge weights of the form $E^{s_n}$, where $E$ is an exponential random variable with mean 1. In this paper, we investigate the case where $s_n\rightarrow \infty$, and focus on the local neighborhood of a vertex. We establish that the smallest-weight tree of a vertex locally converges to the invasion percolation cluster on the Poisson weighted infinite tree. In addition, we identify the scaling limit of the weight of the smallest-weight path between two uniform vertices.


Model and results
In this paper, we continue the program of studying first passage percolation on the complete graph initiated in [9]. We start by introducing first passage percolation (FPP). Given a graph $G = (V(G), E(G))$, let $(Y_e^{(G)})_{e \in E(G)}$ denote a collection of positive edge weights. Thinking of $Y_e^{(G)}$ as the cost of crossing an edge $e$, we can define a metric on $V(G)$ by setting
$$d_{G, Y^{(G)}}(i,j) = \min_{\pi \colon i \to j} \sum_{e \in \pi} Y_e^{(G)}, \qquad (1.1)$$
where the minimum runs over all paths $\pi$ in $G$ from $i$ to $j$. The quantities of interest are:
(a) The distance $W_n = d_{G, Y^{(G)}}(i,j)$ – the total edge cost of the optimal path $\pi_{i,j}$;
(b) The hopcount $H_n$ – the number of edges in the optimal path $\pi_{i,j}$;
(c) The topological structure – the shape of the random neighborhood of a point.
In this paper, we consider FPP on the complete graph and focus on problem (c). In the companion paper [11], we will use these results to investigate problems (a) and (b). We will often refer to results in [11], and write, e.g., [Part II, Section 5.2] to refer to [11,Section 5.2]. We also refer to [Part II, Section 1.4] for an extended discussion of the results in these two papers and their relations to the literature.
In [9], the question was raised of what the universality classes for this model are. We take this discussion substantially further by describing a way to distinguish several universality classes and by identifying the limiting behavior of first passage percolation in one of these classes. The cost regime introduced in (1.1) uses the information from all edges along the path and is known as the weak disorder regime. By contrast, in the strong disorder regime the cost of a path $\pi$ is given by $\max_{e \in \pi} Y_e^{(G)}$. We establish a firm connection between the weak and strong disorder regimes in first passage percolation. Interestingly, this connection also establishes a strong relation to invasion percolation (IP) on the Poisson-weighted infinite tree (PWIT), which is the scaling limit of IP on the complete graph. This process also arises as the scaling limit of the minimal spanning tree on $K_n$.
Our main interest is in the case $G = K_n$, the complete graph on $n$ vertices $V(K_n) = [n] := \{1, \dots, n\}$, equipped with independent and identically distributed (i.i.d.) edge weights $(Y_e^{(K_n)})_{e \in E(K_n)}$.
We write $Y$ for a random variable with $Y \stackrel{d}{=} Y_e^{(G)}$, and assume that the distribution function $F_Y$ of $Y$ is continuous. For definiteness, we study the optimal path $\pi_{1,2}$ between vertices 1 and 2; by exchangeability, $\pi_{1,2}$ has the same law as $\pi_{u,v}$ for any other $u, v \in [n]$, $u \neq v$. In [9] and [12] this setup was studied for the case that $Y_e^{(K_n)} \stackrel{d}{=} E^s$, where $E$ is an exponential random variable with mean 1, and $s > 0$ is constant or $s = s_n > 0$ is a null sequence, respectively. We start by stating our main theorem for the situation where $s = s_n$ tends to infinity. First, we introduce some notation:
Notation. All limits in this paper are taken as $n$ tends to infinity unless stated otherwise. A sequence of events $(A_n)_n$ happens with high probability (whp) if $P(A_n) \to 1$. For random variables $(X_n)_n$, $X$, we write $X_n \xrightarrow{d} X$, $X_n \xrightarrow{P} X$ and $X_n \xrightarrow{a.s.} X$ to denote convergence in distribution, in probability and almost surely, respectively. For real-valued sequences $(a_n)_n$, $(b_n)_n$, we write $a_n = O(b_n)$ if the sequence $(a_n/b_n)_n$ is bounded; $a_n = o(b_n)$ if $a_n/b_n \to 0$; $a_n = \Theta(b_n)$ if the sequences $(a_n/b_n)_n$ and $(b_n/a_n)_n$ are both bounded; and $a_n \sim b_n$ if $a_n/b_n \to 1$. Similarly, for sequences $(X_n)_n$, $(Y_n)_n$ of random variables, we write $X_n = O_P(Y_n)$ if the sequence $(X_n/Y_n)_n$ is tight; $X_n = o_P(Y_n)$ if $X_n/Y_n \xrightarrow{P} 0$; and $X_n = \Theta_P(Y_n)$ if the sequences $(X_n/Y_n)_n$ and $(Y_n/X_n)_n$ are both tight. Moreover, $E$ always denotes an exponentially distributed random variable with mean 1.

First passage percolation with n-dependent edge weights
We start by investigating the case where $Y \stackrel{d}{=} E^{s_n}$ with $s_n \to \infty$:
Theorem 1.1. Let $Y_e^{(K_n)} \stackrel{d}{=} E^{s_n}$, where $(s_n)_n$ is a positive sequence with $s_n \to \infty$. Let $M^{(1)}, M^{(2)}$ be i.i.d. random variables for which $P(M^{(j)} \leq x)$ is the survival probability of a Poisson Galton–Watson branching process with mean $x$. Then
$$n F_Y(W_n) \xrightarrow{d} M^{(1)} \vee M^{(2)}. \qquad (1.2)$$
When $s_n \to \infty$, the values of the random weights $E_e^{s_n}$ depend strongly on the disorder $(E_e)_{e \in E(G)}$, making small values increasingly more favorable and large values increasingly less favorable; thus the weak disorder problem with weights $E_e^{s_n}$ approaches the strong disorder problem. Mathematically, the elementary limit
$$\lim_{s \to \infty} \big( y_1^s + \dots + y_k^s \big)^{1/s} = \max\{y_1, \dots, y_k\} \qquad (1.3)$$
expresses the convergence of the $\ell^s$ norm towards the $\ell^\infty$ norm and establishes a relationship between the weak disorder regime and the strong disorder regime of FPP. We continue by discussing the result in Theorem 1.1 in more detail. Any sequence $(u_n(x))_n$ for which $nF_Y(u_n(x)) \to x$ is such that, for i.i.d. random variables $(Y_i)_{i \in \mathbb{N}}$ with distribution function $F_Y$,
$$P\big( \min_{i \in [n]} Y_i > u_n(x) \big) = \big(1 - F_Y(u_n(x))\big)^n \to e^{-x}. \qquad (1.4)$$
As the distribution function $F_Y$ is continuous, we will choose $u_n(x) = F_Y^{-1}(x/n)$, so that $nF_Y(u_n(x)) = x$. The value $u_n(1)$ is denoted by $u_n$. In view of (1.4), the family $(u_n(x))_{x \in (0,\infty)}$ are the characteristic values for $\min_{i \in [n]} Y_i$. See [13] for a detailed discussion of extreme value theory. (In the strong disorder regime, $u_n(x)$ varies heavily in $x$, so the phrase characteristic values can be misleading.) In the setting of Theorem 1.1, $F_Y(y) = 1 - e^{-y^{1/s_n}} \approx y^{1/s_n}$ when $y = y_n$ tends to zero fast enough, so that $u_n(x)$ can be taken as $u_n(x) \approx (x/n)^{s_n}$ (where $\approx$ indicates approximation with uncontrolled error). Then we see in (1.2) that $W_n \approx u_n(M^{(1)} \vee M^{(2)})$, which means that the random fluctuations of the weight of the smallest-weight path are of the same order of magnitude as some of the typical values for the minimal edge weight adjacent to vertices 1 and 2.
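The elementary $\ell^s$-to-$\ell^\infty$ limit invoked here is easy to check numerically. The following minimal sketch (plain Python; the weight vector is an arbitrary illustration, not from the paper) shows the $\ell^s$ norm collapsing onto the maximal entry as $s$ grows:

```python
def lp_norm(xs, s):
    """The ell^s 'norm' (x_1^s + ... + x_k^s)^(1/s) of positive weights."""
    return sum(x ** s for x in xs) ** (1.0 / s)

# An arbitrary weight vector; its maximal entry is 1.7.
weights = [0.3, 1.7, 0.9, 1.2]
for s in (1, 4, 16, 64, 256):
    print(s, lp_norm(weights, s))  # decreases towards max(weights) = 1.7
```

Already for moderate $s$ the value is indistinguishable from the maximum, since the contribution of every non-maximal entry is exponentially suppressed in $s$.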
To explain the appearance of the random variables $M^{(1)}$ and $M^{(2)}$, we now informally state that the local neighborhoods of the smallest-weight tree for FPP from a single source converge to invasion percolation on the so-called Poisson-weighted infinite tree. We start by introducing these notions informally; for more details see Section 2.1. In invasion percolation (IP) on a weighted graph, we grow the invasion percolation cluster by starting from a single vertex and sequentially adding the edge attached to the cluster with minimal edge weight. The Poisson-weighted infinite tree (PWIT), which serves as the large-$n$ limit of the complete graph with i.i.d. exponential edge weights, is the infinite weighted tree for which the edge weight between a vertex and its $i$th child, jointly in $i$, has the same distribution as $E_1 + \dots + E_i$, where $(E_i)_{i \geq 1}$ are i.i.d. exponential random variables with mean 1. The weights of edges between different vertices and their children in the tree are independent. When performing IP on the PWIT, the largest edge weight ever to be accepted has the same distribution as $M^{(1)}$ in Theorem 1.1. When performing IP on the complete graph started from the two sources 1 and 2, the largest edge weight accepted converges in distribution to $M^{(1)} \vee M^{(2)}$. This explains how Theorem 1.1 can be interpreted in terms of IP on the PWIT and on the complete graph. The next theorem shows that this relation also holds for local neighborhoods of vertices:
Theorem 1.2. Let $Y_e^{(K_n)} \stackrel{d}{=} E^{s_n}$, where $(s_n)_n$ is a positive sequence with $s_n \to \infty$. For each fixed $m \in \mathbb{N}$, the topology of the FPP smallest-weight tree to the nearest $m$ vertices, as well as the weights along its edges, converge in distribution to invasion percolation on the Poisson-weighted infinite tree.
In Section 2.2, all notions needed in Theorem 1.2 are formally defined. Moreover, the evolution of two smallest-weight trees started from vertices 1 and 2 is explored in detail.

The universal picture
Our results are applicable not only to the edge-weight distribution $Y_e^{(K_n)} \stackrel{d}{=} E^{s_n}$ with $s_n \to \infty$, but to a large family of distributions which we now characterize. Interestingly, this family includes edge weights whose distribution is independent of $n$. For fixed $n$, the edge weights $(Y_e^{(K_n)})_{e \in E(K_n)}$ are independent for different $e$. However, there is no requirement that they are independent over $n$, and in fact in Section 4 we will produce $Y_e^{(K_n)}$ using a fixed source of randomness not depending on $n$. Therefore, it will be useful to describe the randomness on the edge weights $((Y_e^{(K_n)})_{e \in E(K_n)} : n \in \mathbb{N})$ uniformly across the sequence. It will be most useful to give this description in terms of exponential random variables. Thus we write
$$Y_e^{(K_n)} \stackrel{d}{=} g(E_e), \qquad (1.6)$$
where $g$ is an increasing function and $(E_e)_e$ are i.i.d. exponential random variables with mean 1. The relations between these parametrizations are given by
$$g(u) = F_Y^{-1}(1 - e^{-u}), \qquad F_Y(y) = 1 - e^{-g^{-1}(y)}. \qquad (1.7)$$
We define
$$f_n(x) = g(x/n) = F_Y^{-1}(1 - e^{-x/n}). \qquad (1.8)$$
Because of this convenient relation between the edge weights $Y_e^{(K_n)}$ and exponential random variables, we will express our hypotheses about the distribution of the edge weights in terms of conditions on the functions $f_n(x)$ as $n \to \infty$. In Section 1.3, we use this formulation (for increasing $g$; the decreasing case is analogous) to explore the universality class in which our results hold in more detail.
Consider first the case $Y \stackrel{d}{=} E^{s_n}$. Then $F_Y^{-1}(u) = (-\log(1-u))^{s_n}$, so that
$$f_n(x) = (x/n)^{s_n} = f_n(1)\, x^{s_n} \qquad (1.9)$$
and
$$s_n = \frac{f_n'(1)}{f_n(1)}. \qquad (1.10)$$
Thus, (1.9)–(1.10) show that the parameter $s_n$ measures the relative sensitivity of $\min_{i \in [n]} Y_i$ to fluctuations in the variable $E$. In general, we will have $f_n(x) \approx f_n(1) x^{s_n}$ if $x$ is appropriately close to 1 and $s_n \approx f_n'(1)/f_n(1)$. These observations motivate the following conditions on the functions $(f_n)_n$, which we will use to relate the distributions of the edge weights $Y_e^{(K_n)}$, $n \in \mathbb{N}$, to a sequence $(s_n)_n$:
Condition 1.3 (Scaling of $f_n$). For every $x \geq 0$,
$$\frac{f_n(x^{1/s_n})}{f_n(1)} \to x. \qquad (1.11)$$
Condition 1.4 (Density bound for small weights). There exist $\varepsilon_0 > 0$, $\delta_0 \in (0,1]$ and $n_0 \in \mathbb{N}$ such that, for every $n \geq n_0$ and every $x \in [1 - \delta_0, 1]$,
$$\varepsilon_0 s_n \leq \frac{x f_n'(x)}{f_n(x)}. \qquad (1.12)$$
Condition 1.5 (Density bound for large weights).
(a) For all $R > 1$, there exist $\varepsilon_1 > 0$ and $n_1 \in \mathbb{N}$ such that, for every $1 \leq x \leq R$ and $n \geq n_1$,
$$\varepsilon_1 s_n \leq \frac{x f_n'(x)}{f_n(x)}. \qquad (1.13)$$
(b) For all $C > 1$, there exist $\varepsilon_1 > 0$ and $n_1 \in \mathbb{N}$ such that (1.13) holds for every $n \geq n_1$ and every $x \geq 1$ satisfying $f_n(x) \leq C f_n(1) \log n$.
Notice that Condition 1.3 implies that $f_n(1) \sim u_n$ whenever $s_n = o(n)$. Indeed, by (1.8) we can write $u_n = f_n(x_n^{1/s_n})$ for $x_n = (-n \log(1 - 1/n))^{s_n}$. Since $s_n = o(n)$, we have $x_n = 1 + o(1)$, and Condition 1.3 together with the monotonicity of $f_n$ implies that $f_n(x_n^{1/s_n}) \sim f_n(1)$.
In Section 1.3 the universality class is explored in more detail, and several examples that satisfy Conditions 1.3–1.5 are discussed. The results from Section 1.1 generalize as follows:
Theorem 1.6. Suppose that Condition 1.5 (a) is satisfied for a positive sequence $(s_n)_n$ with $s_n/\log\log n \to \infty$. Let $M^{(1)}, M^{(2)}$ be i.i.d. random variables for which $P(M^{(j)} \leq x)$ is the survival probability of a Poisson Galton–Watson branching process with mean $x$. Then
$$n F_Y(W_n) \xrightarrow{d} M^{(1)} \vee M^{(2)}. \qquad (1.14)$$
Theorem 1.6 is proved in Section 7.
Theorem 1.7. Let $(s_n)_n$ be a positive sequence with $s_n \to \infty$. Suppose that Condition 1.5 (a) holds and that, in addition, Condition 1.4 holds for every value $\delta_0 \in (0,1)$. Then, for each fixed $m \in \mathbb{N}$, the topology of the FPP smallest-weight tree to the nearest $m$ vertices, as well as the weights along its edges, converges in distribution to invasion percolation on the Poisson-weighted infinite tree.

Regularity: Sufficient condition for the universality class

Theorems 1.6 and 1.7 apply also to $n$-independent distributions. The following example collects some edge-weight distributions that satisfy Conditions 1.3–1.5: and define $s_n = \rho n^\alpha$.
where $t \mapsto L(t)$ is slowly varying as $t \to \infty$. The following proposition shows that edge weights of this type satisfy our conditions with $s_n = n^\alpha L(n)$: is regularly varying with index $-\alpha$ as $x \downarrow 0$. If either of these equivalent conditions holds then, writing for all $x > 0$, $u \in (0,1)$, Furthermore, if either of the asymptotically equivalent sequences satisfies $s_n \to \infty$, then Conditions 1.3, 1.4 and 1.5 (a) hold, while if in addition $s_n/\log\log n \to \infty$ then Condition 1.5 (b) holds as well.
Note that replacing $(s_n)_n$ by an asymptotically equivalent sequence makes no difference in Conditions 1.3–1.5. Moreover, every sequence $(s_n)_n$ of the form $s_n = n^\alpha L(n)$, for $\alpha \geq 0$ and $L$ slowly varying at infinity, can be obtained from an $n$-independent distribution by taking $\log F_Y^{-1}(u) = \int u^{-1-\alpha} L(1/u)\, du$, i.e., the indefinite integral of the function $u \mapsto u^{-1-\alpha} L(1/u)$. For a given $n$-independent distribution, Proposition 1.9 allows us to define the sequence $s_n$ using either $F_Y$ or $g$, whichever is more convenient. See the proof of Corollary 1.10 for details.
Proof of Proposition 1.9. The equivalence follows from (1.7) and (1.18), so that $L(t)$ is slowly varying as $t \to \infty$ if and only if $\tilde{L}(t)$ is. Conditions 1.4 and 1.5 (a) follow from the observation that (recall (1.8)) and the definition of slowly varying, while Condition 1.3 follows from the computation, since $\tilde{L}(n/u^{1/s_n})/\tilde{L}(n) \to 1$ and $u^{-\alpha/s_n} \to 1$, uniformly over $u$ in a compact subset of $(0,\infty)$. Finally, assume $s_n/\log\log n \to \infty$ and fix any $C, R \in (1,\infty)$. By Condition 1.5 (a), there is $\varepsilon_1 = \varepsilon(R) > 0$ such that, for sufficiently large $n$, $\log(f_n(R)/f_n(1)) \geq \varepsilon_1 s_n \int_1^R dx/x$. Therefore, for sufficiently large $n$, $f_n(R) \geq f_n(1) R^{\varepsilon_1 s_n} \geq C f_n(1) \log n$ by Condition 1.5 (a), and the inequality (1.13) holds for every $x \geq 1$ such that $f_n(x) \leq C f_n(1) \log n$, as required by Condition 1.5 (b).
Proof of Corollary 1.10. For Example 1.8 (a), the claim is immediate. For Example 1.8 (d), Proposition 1.9 allows us to derive the sequence $s_n$. In this case it is more convenient to use the second equation in (1.18) with $g(x) = \exp(-\rho x^{-\alpha}/\alpha)$; we conclude that Conditions 1.3, 1.4 and 1.5 (a) hold with $s_n = \rho n^\alpha$. Since $s_n/\log\log n \to \infty$, Condition 1.5 (b) holds as well. For Example 1.8 (c) it is more convenient to use the first equation in (1.18). By Proposition 1.9, we can set $s_n = \rho n^\alpha$, and Conditions 1.3–1.5 hold since $s_n/\log\log n \to \infty$. In this case it is more convenient to use the second equation in (1.18), so that $L(n) \sim \frac{1}{\kappa \rho^{1/\kappa}} (\log n)^{1/\kappa - 1}$. Asymptotic equivalence does not affect Conditions 1.3–1.5, so we can set $s_n = \frac{(\log n)^{1/\kappa - 1}}{\kappa \rho^{1/\kappa}}$. By Proposition 1.9 and the observation that $s_n/\log\log n \to \infty$ since $\kappa < 1$, Conditions 1.3–1.5 hold.
An example of an edge-weight distribution that is $n$-dependent but not of the form $Y_e \stackrel{d}{=} E^{s_n}$ is the following:
Example 1.11. Let $(s_n)_n$ be a positive sequence with $s_n \to \infty$, $s_n = o(n^{1/3})$. Let $U$ be a positive, continuous random variable with distribution function $G$. Take $Y_e^{(K_n)} \stackrel{d}{=} U^{s_n}$, i.e., $F_Y(y) = G(y^{1/s_n})$. Examples for $G$ are the uniform distribution on an interval $(0, b)$, for any $b > 0$, or the exponential distribution with any parameter.
Lemma 1.12. The edge weights of Example 1.11 satisfy Conditions 1.4–1.5, and Condition 1.3 when $s_n/\log\log n \to \infty$.

Fine results on the IP part of FPP

In this section, we argue that the optimal path between two vertices can be divided into two parts: the local neighborhoods of the two endpoints, which follow IP dynamics by Theorem 1.7, and the main part of the path, which is characterized in terms of a branching process. The main results of this paper connect the maximal weight $M^{(1)}$ in IP to the transition time between these two regimes, and give a detailed description of the topology of the neighborhood contained in the IP part.

Coupling FPP to a continuous-time branching process
To understand the random neighborhood of a vertex in the complete graph, we study the first passage exploration process. Recall from (1.1) that $d_{K_n, Y^{(K_n)}}(i,j)$ denotes the total cost of the optimal path $\pi_{i,j}$ between vertices $i$ and $j$. For a vertex $j \in V(K_n)$, let the smallest-weight tree ${\rm SWT}^{(j)}_t$ be the connected subgraph of $K_n$ defined by
$$V({\rm SWT}^{(j)}_t) = \big\{ i \in V(K_n) : d_{K_n, Y^{(K_n)}}(i,j) \leq t \big\},$$
$$E({\rm SWT}^{(j)}_t) = \big\{ e \in E(K_n) : e \in \pi_{j,i} \text{ for some } i \in V({\rm SWT}^{(j)}_t) \big\}. \qquad (2.1)$$
Note that ${\rm SWT}^{(j)}_t$ is indeed a tree: if two optimal paths $\pi_{j,k}, \pi_{j,k'}$ pass through a common vertex $i$, both paths must contain $\pi_{j,i}$, since the minimizers of (1.1) are unique.
To visualize the process (SWT (j) t ) t≥0 , think of the edge weight Y (Kn) e as the time required for fluid to flow across the edge e. Place a source of fluid at j and allow it to spread through the graph. Then V (SWT (j) t ) is precisely the set of vertices that have been wetted by time t, while E(SWT (j) t ) is the set of edges along which, at any time up to t, fluid has flowed from a wet vertex to a previously dry vertex. Equivalently, an edge is added to SWT (j) t whenever it becomes completely wet, with the additional rule that an edge is not added if it would create a cycle.
Because fluid begins to flow across an edge only after one of its endpoints has been wetted, the age of a vertex -the length of time that a vertex has been wet -determines how far fluid has traveled along the adjoining edges. Given SWT (j) t , the future of the exploration process will therefore be influenced by the current ages of vertices in SWT (j) t , and the nature of this effect depends on the probability law of the edge weights (Y (Kn) e ) e . In the sequel, for a subgraph G = (V (G), E(G)) of K n , we write G instead of V (G) for the vertex set when there is no risk of ambiguity.
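The fluid dynamics described above are exactly Dijkstra's algorithm run from the source: vertices are wetted in order of their FPP distance, and each newly wetted vertex records the edge along which fluid first reached it, which is the edge added to the smallest-weight tree. A minimal sketch (plain Python; the dictionary-based graph encoding is our own illustrative choice, not from the paper):

```python
import heapq

def smallest_weight_tree(weights, source):
    """Dijkstra-style growth of the smallest-weight tree. `weights` is a
    dict {(i, j): Y_ij} with i < j, one key per edge of the complete
    graph on vertices 0..n-1; returns parent pointers and FPP distances."""
    n = max(max(e) for e in weights) + 1
    dist = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist.get(i, float("inf")):
            continue  # stale heap entry; vertex already wetted earlier
        for j in range(n):
            if j == i:
                continue
            w = weights[(min(i, j), max(i, j))]
            if d + w < dist.get(j, float("inf")):
                dist[j] = d + w   # fluid reaches j sooner along this edge
                parent[j] = i     # tree edge along which j is first wetted
                heapq.heappush(heap, (d + w, j))
    return parent, dist
```

For instance, with weights {(0,1): 1.0, (0,2): 5.0, (1,2): 1.0}, vertex 2 is wetted at time 2 through vertex 1 rather than at time 5 along the direct edge, so the tree contains the edge {1,2} instead of {0,2}.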
To study the smallest-weight tree from a vertex, say vertex 1, let us consider the time until the first vertex is added. By construction, $\min_{i \in [n] \setminus \{1\}} Y^{(K_n)}_{\{1,i\}} \stackrel{d}{=} f_n\big(\tfrac{n}{n-1} E\big)$, where $E$ is an exponential random variable of mean 1. We next extend this to describe the distribution of the order statistics of the weights of edges from vertex 1 to all other vertices.
Indeed, the order statistics $Y_{(1)} < \dots < Y_{(n-1)}$ of these weights satisfy $(Y_{(k)})_{k=1}^{n-1} \stackrel{d}{=} (f_n(S_{k,n}))_{k=1}^{n-1}$, where $S_{k,n} = \sum_{j=1}^k \frac{n}{n-j} E_j$ and $(E_j)_{j \in [n-1]}$ are i.i.d. exponential random variables with mean 1. The fact that the distribution of $S_{k,n}$ depends on $n$ is awkward, and can be avoided by using a thinned Poisson point process. Let $X_1 < X_2 < \cdots$ be the points of a Poisson point process with intensity 1, so that $X_k \stackrel{d}{=} \sum_{j=1}^k E_j = \lim_{n\to\infty} S_{k,n}$. To each $k \in \mathbb{N}$, we associate a mark $M_k$ which is chosen uniformly at random from $[n]$, different marks being independent. We thin a point $X_k$ when $M_k = 1$ (since 1 is the initial vertex) or when $M_k = M_{k'}$ for some $k' < k$. Then the unthinned points, in increasing order, are distributed as $(S_{k,n})_{k \geq 1}$. (2.2)
In the next step, we extend this result to the smallest-weight tree ${\rm SWT}^{(1)}$ using a relation to FPP on the Poisson-weighted infinite tree. Before giving the definitions, we recall the Ulam–Harris notation for describing trees. Define the tree $\mathcal{T}^{(1)}$ as follows. The vertices of $\mathcal{T}^{(1)}$ are given by finite sequences of natural numbers headed by the symbol $\emptyset_1$, which we write as $\emptyset_1 j_1 j_2 \cdots j_k$. The sequence $\emptyset_1$ denotes the root vertex of $\mathcal{T}^{(1)}$. We concatenate sequences $v = \emptyset_1 i_1 \cdots i_k$ and $w = \emptyset_1 j_1 \cdots j_m$ to form the sequence $vw = \emptyset_1 i_1 \cdots i_k j_1 \cdots j_m$ of length $|vw| = |v| + |w| = k + m$. Identifying a natural number $j$ with the corresponding sequence of length 1, the $j$th child of a vertex $v$ is $vj$, and we say that $v$ is the parent of $vj$. Write $p(v)$ for the (unique) parent of $v \neq \emptyset_1$, and $p^k(v)$ for the ancestor $k$ generations before, for $k \leq |v|$.
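The mark-and-thin construction can be simulated directly. The sketch below (plain Python; the values of $n$ and the number of kept points are arbitrary illustrative choices) generates Poisson points with uniform marks and discards a point when its mark is 1 or repeats an earlier mark:

```python
import random

def thinned_points(n, k):
    """First k unthinned points of a rate-1 Poisson point process whose
    points carry i.i.d. uniform marks on {1, ..., n}; a point is thinned
    if its mark is 1 (the source vertex) or repeats an earlier mark."""
    x, kept, seen = 0.0, [], {1}
    while len(kept) < k:
        x += random.expovariate(1.0)   # gap to the next Poisson point
        mark = random.randint(1, n)    # uniform mark on [n]
        if mark not in seen:           # thin repeated marks and mark 1
            seen.add(mark)
            kept.append((x, mark))
    return kept

random.seed(1)
pts = thinned_points(n=50, k=10)
```

Conditionally on having kept $k-1$ points, the next Poisson point survives with probability $(n-k)/n$, so the gap to the $k$th kept point is exponential with mean $n/(n-k)$; this is exactly how the sum $S_{k,n}$ arises.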
We can place an edge (which we could consider to be directed) between every $v \neq \emptyset_1$ and its parent; this turns $\mathcal{T}^{(1)}$ into a tree with root $\emptyset_1$. With a slight abuse of notation, we will use $\mathcal{T}^{(1)}$ to mean both the set of vertices and the associated graph, with the edges given implicitly according to the above discussion, and we will extend this convention to any subset $\tau \subset \mathcal{T}^{(1)}$. We also write $\partial\tau = \{ v \notin \tau : p(v) \in \tau \}$ for the set of children one generation away from $\tau$. The Poisson-weighted infinite tree is an infinite edge-weighted tree in which every vertex has infinitely many (ordered) children. To describe it formally, we associate weights to the edges of $\mathcal{T}^{(1)}$. By construction, we can index these edge weights by non-root vertices, writing the weights as $X = (X_v)_{v \neq \emptyset_1}$, where the weight $X_v$ is associated to the edge between $v$ and its parent $p(v)$. We make the convention that $X_{v0} = 0$.
Definition 2.1 (Poisson-weighted infinite tree). The Poisson-weighted infinite tree (PWIT) is the random tree $(\mathcal{T}^{(1)}, X)$ for which $X_{vk} - X_{v(k-1)}$ is exponentially distributed with mean 1, independently for each $v \in \mathcal{T}^{(1)}$ and each $k \in \mathbb{N}$. Equivalently, the weights $(X_{v1}, X_{v2}, \dots)$ are the (ordered) points of a Poisson point process of intensity 1 on $(0, \infty)$, independently for each $v$.
Motivated by (2.2), we study FPP on $\mathcal{T}^{(1)}$ with edge weights $(f_n(X_v))_v$:
Definition 2.2 (First passage percolation on the Poisson-weighted infinite tree). For FPP on $\mathcal{T}^{(1)}$ with edge weights $(f_n(X_v))_v$, let the FPP edge weight between $v \in \mathcal{T}^{(1)} \setminus \{\emptyset_1\}$ and $p(v)$ be $f_n(X_v)$. The FPP distance from $\emptyset_1$ to $v \in \mathcal{T}^{(1)}$ is
$$\sum_{k=0}^{|v|-1} f_n(X_{p^k(v)}),$$
and the FPP exploration process ${\rm BP}^{(1)}_t$ is the set of vertices whose FPP distance from the root is at most $t$.
Note that the FPP edge weights $(f_n(X_{vk}))_{k \in \mathbb{N}}$ are themselves the points of a Poisson point process on $(0, \infty)$, independently for each $v \in \mathcal{T}^{(1)}$. The intensity measure of this Poisson point process, which we denote by $\mu_n$, is the image of Lebesgue measure on $(0, \infty)$ under $f_n$. Since $f_n$ is strictly increasing by assumption, $\mu_n$ has no atoms and we may abbreviate $\mu_n((a,b])$ as $\mu_n(a,b)$ for simplicity. Thus $\mu_n$ is characterized by
$$\mu_n(a,b) = f_n^{-1}(b) - f_n^{-1}(a), \qquad 0 \leq a < b.$$
Clearly, and as suggested by the notation, the FPP exploration process ${\rm BP}^{(1)}$ is a continuous-time branching process. Similarly to the analysis of the weights of the edges containing vertex 1, we now introduce a thinning procedure. Define $M_{\emptyset_1} = 1$, and to each other $v \in \mathcal{T}^{(1)} \setminus \{\emptyset_1\}$ associate a mark $M_v$ chosen independently and uniformly from $[n]$.
Note that whether or not a vertex $v$ is thinned can be assessed recursively in terms of earlier-born vertices, and therefore Definition 2.4 is not circular.
Write $\widetilde{\rm BP}{}^{(1)}_t$ for the subgraph of ${\rm BP}^{(1)}_t$ consisting of unthinned vertices. If a vertex $v \in \mathcal{T}^{(1)}$ is thinned, then so are all of its descendants, and this implies that $\widetilde{\rm BP}{}^{(1)}_t$ is a tree for all $t$. Note that if the marks $(M_v)_{v \in \tau}$ are distinct, then $\pi_M(\tau)$ and $\tau$ are isomorphic graphs. The following theorem establishes a close connection between FPP on $K_n$ and FPP on the PWIT with edge weights $(f_n(X_v))_v$:
Theorem 2.6 (Coupling to FPP on the PWIT). The law of $({\rm SWT}^{(1)}_t)_{t \geq 0}$ is the same as the law of $\big( \pi_M\big( \widetilde{\rm BP}{}^{(1)}_t \big) \big)_{t \geq 0}$.
Theorem 2.6 is based on an explicit coupling between the edge weights $(Y_e^{(K_n)})_e$ on $K_n$ and $(X_v)_v$ on $\mathcal{T}^{(1)}$. A general form of these couplings and the proof of Theorem 2.6 are given in Section 4.

Relation to invasion percolation on the PWIT
Under our scaling assumptions, FPP on the PWIT is closely related to invasion percolation (IP) on the PWIT, which is defined as follows. Set ${\rm IP}^{(1)}(0)$ to be the subgraph consisting of $\emptyset_1$ only. For $k \in \mathbb{N}$, form ${\rm IP}^{(1)}(k)$ inductively by adjoining to ${\rm IP}^{(1)}(k-1)$ the boundary vertex $v \in \partial {\rm IP}^{(1)}(k-1)$ of minimal edge weight. We note that, since we consider only the relative ordering of the various edge weights, we can use either the PWIT edge weights $(X_v)_v$ or the FPP edge weights $(f_n(X_v))_v$.
Write ${\rm IP}^{(1)}(\infty) = \bigcup_{k=1}^\infty {\rm IP}^{(1)}(k)$ for the limiting subgraph. We remark that ${\rm IP}^{(1)}(\infty)$ is a strict subgraph of $\mathcal{T}^{(1)}$ a.s. (in contrast to FPP, which eventually explores every vertex). Indeed, define
$$M^{(1)} = \sup\big\{ X_v : v \in {\rm IP}^{(1)}(\infty) \setminus \{\emptyset_1\} \big\}, \qquad (2.5)$$
the largest weight of an invaded edge. Then $P(M^{(1)} < x)$ is the survival probability of a Poisson Galton–Watson branching process with mean $x$, as in Theorem 1.6. Indeed, the event $\{M^{(1)} < x\}$ is the event that, if we remove from $\mathcal{T}^{(1)}$ all edges of weight $X_{vk} > x$, the component of $\emptyset_1$ in the resulting subgraph is infinite. We remark that, a.s., the supremum in (2.5) is attained, and the unique edge of weight $M^{(1)}$ is invaded after a finite number of steps (see Proposition 6.2 below).
The value $x = 1$ acts as a critical value for the PWIT. Indeed, if we remove all edges of weight $X_{vk} > x$, then the subtree containing the root is a branching process with ${\rm Poi}(x)$ offspring distribution. Hence for $x \leq 1$ the tree is finite a.s., while for $x > 1$ the tree is infinite with positive probability. As a result, IP on the PWIT will have to accept edges of weight $X_{vk} > 1$ infinitely often, and we have $M^{(1)} > 1$ a.s. We next explain the connection between FPP and IP on the PWIT, under our scaling assumptions on the edge weights. We emphasize that FPP depends on $n$ via the edge weights $(f_n(X_v))_v$, whereas IP is independent of $n$.
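IP on the PWIT is straightforward to simulate once each vertex's infinitely many children are truncated. The sketch below (plain Python; the truncation level and step count are arbitrary choices, and the truncation rests on the assumption that late, large-weight children are never invaded) tracks the running maximum of the invaded weights, a finite-step approximation of $M^{(1)}$:

```python
import heapq
import random

def invade_pwit(steps=1000, max_children=30):
    """Invasion percolation on a truncated PWIT: repeatedly invade the
    boundary edge of minimal weight. Each vertex's children carry edge
    weights E_1, E_1 + E_2, ... (points of a rate-1 Poisson process);
    only the first max_children children per vertex are generated."""
    boundary = []                      # min-heap of boundary edge weights

    def expose():
        w = 0.0
        for _ in range(max_children):
            w += random.expovariate(1.0)
            heapq.heappush(boundary, w)

    expose()                           # children of the root
    largest = 0.0
    for _ in range(steps):
        w = heapq.heappop(boundary)    # invade the minimal boundary edge
        largest = max(largest, w)      # running maximum approximates M^(1)
        expose()                       # newly invaded vertex's children
    return largest
```

In line with the discussion of the critical value, the running maximum typically lies only slightly above 1, reflecting that $x = 1$ is critical and large weights are invaded only rarely.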
By Condition 1.3, we can approximate
$$f_n(X_v) \approx f_n(1)\,(X_v)^{s_n}. \qquad (2.6)$$
(Note that for the FPP exploration process, the only effect of multiplying by $f_n(1)$ is to rescale time.) Since the $\ell^s$ norm converges towards the $\ell^\infty$ norm (recall (1.3)), we see that, when $s_n \to \infty$, a small edge weight is almost negligible when added to a larger edge weight. Therefore, the sequence of edges added under the FPP dynamics with weights $f_n(X_v) \approx f_n(1)(X_v)^{s_n}$ can be well approximated by adding the boundary edge having the smallest weight, that is, by the IP dynamics. Moreover, the time for the FPP exploration is dominated by the time $f_n(M^{(1)})$ spent exploring the edge of largest weight, and until this edge has been explored only a finite number of other edges will be explored.
To formalize this discussion, consider the smallest-weight tree ${\rm SWT}^{(1)}$ on $K_n$ started from vertex 1, as defined in (2.1). Write $\tau_k = \inf\{ t \geq 0 : |E({\rm SWT}^{(1)}_t)| = k \}$ for the time when the $k$th edge is added to ${\rm SWT}^{(1)}$. Then we have the following local weak convergence result, formalizing Theorems 1.2 and 1.7:
Theorem 2.7 (Coupling to IP on the PWIT). Suppose $\lim_{n\to\infty} f_n(x+\delta)/f_n(x) = \infty$ for each $x, \delta > 0$. Then the smallest-weight tree ${\rm SWT}^{(1)}$ on $K_n$ can be coupled to invasion percolation ${\rm IP}^{(1)}$ on one copy of the PWIT such that, for any fixed $m \in \mathbb{N}$, whp ${\rm SWT}^{(1)}_{\tau_m}$ coincides with the image of ${\rm IP}^{(1)}(m)$ under the coupling.
Theorem 2.7 is proved in Section 4.4. The convergence in Theorem 2.7 is local weak convergence in the sense of Benjamini and Schramm [8] for appropriately chosen metrics. Theorem 1.7 follows from Theorem 2.7 because, given $x, \delta > 0$, we can apply Condition 1.4 (with $1 - \delta_0 = x$, if $x < 1$) or Condition 1.5 (a) (with $R = x + \delta$, if $x + \delta > 1$) to find a value $\varepsilon > 0$ such that, for sufficiently large $n$,
$$\log\big( f_n(x+\delta)/f_n(x) \big) \geq \varepsilon s_n.$$
Applying the exponential function on both sides, $f_n(x+\delta)/f_n(x) \to \infty$ follows from $s_n \to \infty$.
The heuristic comparison between FPP and IP ceases to be valid when a smaller edge weight $f_n(X_v \wedge X_w)$ is no longer negligible when added to a larger edge weight $f_n(X_v \vee X_w)$. By (2.6), this requires $|X_v - X_w| = \Theta(1/s_n)$. By our discussion of the critical value, only edge weights $X_v \approx 1$ will be relevant to the large-scale behavior. It follows that once edge weights belonging to a critical window $[1 - \Theta(1/s_n), 1 + \Theta(1/s_n)]$ become numerous, the heuristic fails and the connection to IP on the PWIT ceases to hold. Since we are mainly interested in the case that $1/s_n \gg n^{-1/3}$ (see in particular [11]), the critical window observed here is wider than the critical window for the Erdős–Rényi random graph; cf. [6].
For IP on the PWIT, the weight of the maximal-weight edge that IP uses after time $k^2 t$ scales as $1 + U_t/k$, where $(U_t)_{t \geq 0}$ is a limiting stochastic process (see [7, Proposition 3.3 and Theorem 1.6] and the remarks following [3, Theorem 31]). In particular, these weights become of order $1 + \Theta(1/s_n)$ when the size of the IP cluster is $\Theta(s_n^2)$. This suggests that the maximal size of the smallest-weight tree that allows its dynamics to be coupled to IP on the PWIT is $o(s_n^2)$. However, we do not need and will not prove such a strong result.

Exploration from two sources
So far we have studied the smallest-weight tree ${\rm SWT}^{(1)}$ from one vertex and its coupling to a suitably thinned version of a CTBP starting from one root. To study the optimal path between vertices 1 and 2, a standard approach would be to place sources of fluid on both vertices and wait for the two smallest-weight trees to merge. Appealing to the coupling in Theorem 2.6, we could equally study the evolution of two independent CTBPs ${\rm BP}^{(1)}$ and ${\rm BP}^{(2)}$ with original ancestors $\emptyset_1$ and $\emptyset_2$. For this method it is important that the two trees grow in a similar fashion. However, the heuristics in Section 2.2 show that ${\rm BP}^{(j)}$ spends a considerable amount of time, namely $f_n(M^{(j)}) \gg f_n(1)$, waiting for a single edge to be explored, during which time only finitely many other edges are explored. Since $M^{(1)} \neq M^{(2)}$ a.s., our scaling assumptions on $f_n$ mean that the times $f_n(M^{(1)})$ and $f_n(M^{(2)})$ will be quite different.
For this reason, we will not grow the two CTBPs at the same speed. When one of them becomes large enough (what this means precisely will be explained later), it has to wait for the other one to catch up. We call this procedure freezing.
To formalize this, let $\mathcal{T}$ be the disjoint union of two independent copies $(\mathcal{T}^{(j)}, X^{(j)})$, $j \in \{1,2\}$, of the PWIT. We shall assume that the copies $\mathcal{T}^{(j)}$ are vertex-disjoint, with roots $\emptyset_j$, so that we can unambiguously write $X_v$ instead of $X_v^{(j)}$ for $v \in \mathcal{T}^{(j)}$, $v \neq \emptyset_j$. The notation introduced for $\mathcal{T}^{(1)}$ is used verbatim on $\mathcal{T}$. For example, for any subset $\tau \subseteq \mathcal{T}$, we write $\partial\tau = \{ v \notin \tau : p(v) \in \tau \}$ for the boundary vertices of $\tau$.
The FPP process on $\mathcal{T}$ with edge weights $(f_n(X_v))_v$ starting from $\emptyset_1$ and $\emptyset_2$ is equivalent to the union ${\rm BP} = {\rm BP}^{(1)} \cup {\rm BP}^{(2)}$ of two CTBPs. Let $T_{\rm fr}^{(j)}$ be a stopping time with respect to the filtration induced by ${\rm BP}^{(j)}$, $j \in \{1,2\}$. We call $T_{\rm fr}^{(1)}$ and $T_{\rm fr}^{(2)}$ freezing times and run ${\rm BP}$ until $T_{\rm fr}^{(1)} \wedge T_{\rm fr}^{(2)}$, the time when one of the two CTBPs is large enough (see Definition 2.12 for the precise definition of what large enough means). Then we freeze the larger CTBP and allow the smaller one to evolve normally until it is large enough, at time $T_{\rm fr}^{(1)} \vee T_{\rm fr}^{(2)}$. At this time, which we call the unfreezing time $T_{\rm unfr} = T_{\rm fr}^{(1)} \vee T_{\rm fr}^{(2)}$, both CTBPs resume their usual evolution. We denote by $R_j(t)$ the on-off processes describing this behavior: that is, for $j \in \{1,2\}$,
$$R_j(t) = \big(t \wedge T_{\rm fr}^{(j)}\big) + \big((t - T_{\rm unfr}) \vee 0\big). \qquad (2.9)$$
The version $\mathcal{B} = (\mathcal{B}_t)_{t \geq 0}$ of ${\rm BP}$ including freezing is then given by
$$\mathcal{B}_t = {\rm BP}^{(1)}_{R_1(t)} \cup {\rm BP}^{(2)}_{R_2(t)} \qquad \text{for all } t \geq 0. \qquad (2.10)$$
As with $\mathcal{T}$, we can consider $\mathcal{B}_t$ to be the union of two trees by placing an edge between each non-root vertex $v \notin \{\emptyset_1, \emptyset_2\}$ and its parent.

Freezing a CTBP
For very fine results about the branching process and the freezing times, we strengthen Condition 1.5 to the following condition:
Condition 2.8 (Density bound for large weights). There exist $\varepsilon_1 > 0$ and $n_1 \in \mathbb{N}$ such that (1.13) holds for every $n \geq n_1$ and every $x \geq 1$.
As explained, the purpose of the freezing is to guarantee comparable growth of the two CTBPs. One requirement is therefore that the edge of weight $f_n(M^{(j)})$ is explored before freezing, for each $j = 1, 2$ (and they are instantaneously unfrozen after the last of these times). The second requirement is that the two CTBPs exhibit typical branching process dynamics. To make this precise, recall that a typical branching process grows exponentially, where the growth rate is given by its Malthusian parameter $\lambda_n$: writing $\hat\mu_n(\lambda) = \int e^{-\lambda y}\, d\mu_n(y)$ for the Laplace transform of the intensity measure $\mu_n$, the Malthusian parameter $\lambda_n > 0$ is the unique solution to
$$\hat\mu_n(\lambda_n) = 1. \qquad (2.13)$$
Asymptotically, $\lambda_n$ scales like $1/f_n(1)$:
Lemma 2.9. Suppose Conditions 1.3, 1.4 and 2.8 hold for a positive sequence $(s_n)_n$ with $s_n \to \infty$.
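For the concrete weights $Y \stackrel{d}{=} E^s$ of Section 1.1, one finds $\mu_n((0,y]) = f_n^{-1}(y) = n y^{1/s}$, so $\hat\mu_n(\lambda) = n\,\Gamma(1 + 1/s)\,\lambda^{-1/s}$ and (2.13) is solved in closed form by $\lambda_n = (n\,\Gamma(1 + 1/s))^s$. The sketch below (plain Python; the bisection bracket and iteration count are arbitrary choices) recovers this root numerically:

```python
import math

def laplace_mu(lam, n, s):
    """Laplace transform of mu_n for Y = E^s, where mu_n((0,y]) = n*y**(1/s):
    in closed form, n * Gamma(1 + 1/s) * lam**(-1/s)."""
    return n * math.gamma(1.0 + 1.0 / s) * lam ** (-1.0 / s)

def malthusian(n, s, lo=1e-12, hi=1e30):
    """Solve laplace_mu(lam) = 1 by bisection on a log scale;
    the transform is strictly decreasing in lam."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if laplace_mu(mid, n, s) > 1.0:
            lo = mid          # transform still too large: increase lam
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

Here $\lambda_n f_n(1) = \Gamma(1 + 1/s)^s$ remains bounded as $s \to \infty$, consistent with $\lambda_n$ scaling like $1/f_n(1)$.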
Lemma 2.9 is proved in Section 3. The same reasoning is used to prove a more general statement in [Part II, Theorem 5.3] (see [Part II, Sections 5.2 and 5.3]) but we include the proof here to avoid circularity.
Hence, we expect ${\rm BP}^{(j)} = ({\rm BP}^{(j)}_t)_{t \geq 0}$ to grow exponentially at rate $\lambda_n$ for large times $t$. However, initially ${\rm BP}^{(j)}$ does not grow in this way. The reason is that the exponential growth typical of a branching process arises primarily from rare individuals that have an unusually large number of offspring in a very short time. Formally, for $v \in \mathcal{T}^{(1)}$, write ${\rm BP}^{(v)}$ for the branching process of descendants of $v$, re-rooted and time-shifted to start at $t = 0$. Informally, an $R$-lucky vertex is one that has $R s_n^2$ descendants by the time it reaches age $f_n(1)$. The following proposition states that this happens with probability at least a constant times $1/s_n$:
Proposition 2.11. Suppose Conditions 1.3, 1.4 and 2.8 hold for a positive sequence $(s_n)_n$ with $s_n \to \infty$. Fix $R \in (0,\infty)$. There exists $\delta > 0$ such that for every $r \in (0,R]$ there is some $n_0 \in \mathbb{N}$ such that $P(v \text{ is } r\text{-lucky}) \geq \delta/(s_n \sqrt{r})$ for all $v \in \mathcal{T}$ and all $n \geq n_0$.
Proposition 2.11 is proved in Section 5.3.
Once an $R$-lucky vertex $v$ is born (for some $R > 0$), another lucky vertex is likely to be born soon thereafter. Indeed, Condition 1.3 implies that between ages $f_n(1)$ and $2f_n(1)$, the number of new children of $v$ will be Poisson with mean of order $1/s_n$. The same is true for the order $s_n^2$ initial descendants of $v$, so that a total of order $s_n$ children can be expected to be born during this time. Among these, of order 1 can be expected to repeat the unlikely event achieved by $v$ and thereby perpetuate the growth.
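On the PWIT the children of a vertex carry the points $X_1 < X_2 < \cdots$ of a rate-1 Poisson process, and the child with weight $X$ is born when its parent has age $f_n(X)$; the number of children born in an age window $(u, v]$ is therefore Poisson with mean $f_n^{-1}(v) - f_n^{-1}(u)$. The sketch below quantifies the "order $1/s_n$" claim above in the illustrative case $f_n(x) = x^{s_n}$ (our assumption, matching the example weights $E^{s_n}$), where the mean is $2^{1/s_n} - 1 \approx \log 2 / s_n$:

```python
import math

def mean_children(s: float, a: float = 1.0, b: float = 2.0) -> float:
    """Expected number of children born while the parent's age is in
    (a * f_n(1), b * f_n(1)], for f_n(x) = x^s (so f_n(1) = 1).

    The child with PWIT weight X is born at parental age f_n(X), so the
    count in this window is Poisson with mean f_n^{-1}(b) - f_n^{-1}(a),
    where f_n^{-1}(y) = y^(1/s).
    """
    return b ** (1.0 / s) - a ** (1.0 / s)

for s in [10, 100, 1000]:
    print(f"s_n = {s:5d}: mean = {mean_children(s):.6f}  ~  log(2)/s_n = {math.log(2)/s:.6f}")
```

The mean is indeed of order $1/s_n$, with constant $\log 2$ for this choice of $f_n$.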
We therefore expect that BP (j) will exhibit typical branching dynamics, with exponential growth on the time scale f n (1), only once BP (j) is large enough so that the intensity of new births, in the time scale f n (1), is at least s n .
This motivates the following definition of the freezing times:

Definition 2.12 (Freezing). Define, for $j = 1, 2$, the freezing times
$$T^{(j)}_{\mathrm{fr}} = \inf\Big\{ t \geq 0 \colon \sum_{v \in BP^{(j)}_t} \int_{t - T_v}^{\infty} e^{-\lambda_n (y - (t - T_v))}\, d\mu_n(y) \geq s_n \Big\}, \qquad (2.15)$$
and the unfreezing time $T_{\mathrm{unfr}} = T^{(1)}_{\mathrm{fr}} \vee T^{(2)}_{\mathrm{fr}}$. The frozen cluster is given by $\mathcal{B}_{\mathrm{fr}} = \mathcal{B}^{(1)}_{\mathrm{fr}} \cup \mathcal{B}^{(2)}_{\mathrm{fr}}$, where $\mathcal{B}^{(j)}_{\mathrm{fr}}$ denotes the configuration of the $j$th process at time $T^{(j)}_{\mathrm{fr}}$.

The random variable $\int_{t - T_v}^{\infty} e^{-\lambda_n (y - (t - T_v))}\, d\mu_n(y)$ represents the expected number of future offspring of vertex $v \in BP^{(j)}_t$, exponentially time-discounted at rate $\lambda_n$. Recall from (2.9) that $R_j(t) = (t \wedge T^{(j)}_{\mathrm{fr}}) + ((t - T_{\mathrm{unfr}}) \vee 0)$. Thus, each CTBP evolves at rate 1 until its expected number of future offspring, exponentially time-discounted at rate $\lambda_n$, first exceeds $s_n$. At that time the configuration is "frozen" and ceases to evolve until both sides have been frozen. The two sides, which are now of a comparably large size, are then simultaneously unfrozen and thereafter evolve at rate 1. Henceforth we will always assume this choice of $T^{(1)}_{\mathrm{fr}}, T^{(2)}_{\mathrm{fr}}$.

We next investigate the asymptotics of the freezing times. Theorem 2.13, which is proved in Section 6.2, confirms that the two CTBPs $BP^{(1)}$ and $BP^{(2)}$ are indeed large enough relatively soon after exploring the edges of weight $f_n(M^{(1)})$ and $f_n(M^{(2)})$, respectively.
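The discounting in the definition (2.15) of the freezing time can be made concrete in the illustrative case $f_n(x) = x^{s_n}$, where $\mu_n(0,y] = y^{1/s_n}$ (an assumption matching the example weights $E^{s_n}$; the code below is our own sketch, not the paper's construction). A newborn vertex contributes $\int_0^\infty e^{-\lambda_n y}\, d\mu_n(y) = \hat\mu_n(\lambda_n) = 1$, and an older vertex contributes less, so at least $s_n$ individuals must be present before freezing can occur:

```python
import math

def discounted_offspring(s: float, lam: float, age: float) -> float:
    """Expected future offspring of a vertex of the given age, discounted at
    rate lam, for the intensity measure mu_n(0,y] = y^(1/s).

    Computes  int_age^inf e^{-lam*(y - age)} dmu_n(y)  via the substitution
    y = x^s, under which dmu_n(y) becomes dx (simple trapezoid rule).
    """
    x0 = age ** (1.0 / s)
    x_max, n_steps = 4.0, 100_000  # the integrand is ~0 well before x = 4
    h = (x_max - x0) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = x0 + i * h
        w = 0.5 if i in (0, n_steps) else 1.0
        total += w * math.exp(-lam * (x ** s - age))
    return h * total

s = 20
lam = math.gamma(1.0 + 1.0 / s) ** s   # Malthusian parameter: mu_hat(lam) = 1
print(discounted_offspring(s, lam, age=0.0))   # contribution of a newborn: ~1
print(discounted_offspring(s, lam, age=0.5))   # an older vertex contributes less
```

Since each summand in (2.15) is at most 1, the sum can only reach $s_n$ once at least $s_n$ vertices have been born, which is the content of Lemma 5.6 below.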
The next result states that soon after unfreezing, an R-lucky vertex is born.
Lemma 2.14. Suppose Conditions 1.3, 1.4 and 2.8 hold for a positive sequence $(s_n)_n$ with $s_n \to \infty$. Fix $R < \infty$ and let $v_{j,R}$ denote the first $R$-lucky vertex in $\mathcal{B}^{(j)}$ born after time $T_{\mathrm{unfr}}$. Then $T_{v_{j,R}} - T_{\mathrm{unfr}} = O_{\mathbb{P}}(f_n(1))$.

Lemma 2.14 is proved in Section 5.4.

Decomposing FPP into IP and branching dynamics
Similarly to the definition of the CTBPs with freezing B, we can study the smallest-weight trees on K n with freezing. However, in contrast to the definition of B, there is competition between the two clusters and we have to actively forbid their merger. We do not need or prove any statements about the smallest-weight trees from two sources in this paper, and, therefore, give only the following informal definition. We refer to Part II for extensive results in this direction.
The two smallest-weight trees with freezing on $K_n$ started from vertices 1 and 2 are two disjoint trees $\mathcal{S}^{(j)} = (\mathcal{S}^{(j)}_t)_{t \geq 0}$, $j \in \{1, 2\}$, on $K_n$ with roots 1 and 2, respectively. Initially $\mathcal{S}^{(j)}$ contains only vertex $j$, which can be viewed as the source of a fluid that differs from the fluid originating at $j'$, where $\{j, j'\} = \{1, 2\}$. When a vertex becomes wet, it is added to the tree of the corresponding fluid, and all edges between the new vertex and the other tree can no longer transport fluid. Moreover, fluid originating from $j$ cannot travel between times $T^{(j)}_{\mathrm{fr}}$ and $T_{\mathrm{unfr}}$. We write $\mathcal{S}_t = \mathcal{S}^{(1)}_t \cup \mathcal{S}^{(2)}_t$ for all $t \geq 0$ and call $\mathcal{S}_t$ the smallest-weight tree from two sources.

Theorems 2.6 and 2.13 imply that the smallest-weight trees without freezing, i.e., the choice $R_1(t) = R_2(t) = t$, would behave almost like the asymmetric choices $R_1(t) = t$, $R_2(t) = 0$ or vice versa. Write $\mathcal{S}^{(\mathrm{id})}$ for the smallest-weight tree starting from vertices 1 and 2, constructed as $\mathcal{S} = (\mathcal{S}_t)_{t \geq 0}$ but with $R_j$ replaced by the identity map, for $j \in \{1, 2\}$ (that is, $\mathcal{S}^{(\mathrm{id})}$ is the smallest-weight tree from two sources without freezing). Denote by $\mathcal{S}^{(\mathrm{id},1)}$ the connected component of $\mathcal{S}^{(\mathrm{id})}$ containing vertex 1. Our results suggest that the asymptotic proportion of vertices belonging to the cluster of vertex 1 is $\theta_{\mathrm{id}}$ without freezing and $\theta_{\mathrm{fr}}$ with freezing, where $\theta_{\mathrm{id}}$ and $\theta_{\mathrm{fr}}$ are random variables with $\mathbb{P}(\theta_{\mathrm{id}} = 1) = \mathbb{P}(\theta_{\mathrm{id}} = 0) = 1/2$ and $\mathbb{P}(\theta_{\mathrm{fr}} \in (0, 1)) = 1$. This means that freezing is necessary to guarantee that both trees are asymptotically of comparable size.
Similarly to Theorem 2.6, we can couple $\mathcal{B}_t$ and $\mathcal{S}_t$. To this end, we introduce a thinning procedure for $\mathcal{B} = (\mathcal{B}_t)_{t \geq 0}$ analogous to Definition 2.4. Define $M_{\emptyset_j} = j$, for $j = 1, 2$. To every other vertex $v \in \mathcal{T} \setminus \{\emptyset_1, \emptyset_2\}$, we associate a mark $M_v$ chosen uniformly and independently from $[n]$.
As in the remarks below Definition 2.4, this definition is not circular, and we write B t for the subgraph of B t consisting of unthinned vertices.
Recall the subgraph $\pi_M(\tau)$ of $K_n$ introduced in Definition 2.5, which we extend to the case where $\tau \subset \mathcal{T}$.

Theorem 2.16. There exists a coupling such that $\mathcal{S}^{(1)}_t \cup \mathcal{S}^{(2)}_t = \pi_M(\widetilde{\mathcal{B}}_t)$ for all $t \geq 0$ almost surely.
Theorem 2.16 is formalized and proved as [Part II, Theorem 2.15]. We will not use the result of Theorem 2.16 in this paper. Nevertheless, Theorem 2.16 motivates our study of B and freezing times because it relates FPP on the complete graph (n-independent dynamics run on an n-dependent weighted graph) with an exploration defined in terms of a pair of Poisson-weighted infinite trees (n-dependent dynamics run on an n-independent weighted graph). By analyzing the dynamics of B when n and s n are large, we obtain a fruitful dual picture: a static approximation by IP, valid when the number of explored vertices is small and independent of n; followed by a dynamic rescaled branching process approximation, valid when the number of explored vertices is large, that is also essentially independent of n.
An important goal of this paper is to understand the dynamics of the IP part, which is formalized as the frozen cluster. Our main results are collected in the following theorem:

Theorem 2.17. (a) The volume $|\mathcal{B}_{\mathrm{fr}}|$ of the frozen cluster is $O_{\mathbb{P}}(s_n^2)$.
Theorem 2.17 (a) and (b) are proved in Sections 5.3 and 6.3, respectively. Theorem 2.17 will allow us to ignore the contributions coming from the frozen cluster in the analysis of the smallest-weight path. For instance, part (b) will be used to show that path lengths within the frozen cluster are negligible compared to overall path lengths. We believe that the diameter $\max\{|v| \colon v \in \mathcal{B}_{\mathrm{fr}}\}$ of the frozen cluster really is of order $s_n$, but we will not need this and therefore do not prove it. Since $s_n \to \infty$, this intuitively explains the 'long paths' in our title. This analysis will be carried out in [11], where we are interested in global properties of the FPP process. There, the title will also be substantiated: it will be shown that $H_n$ is of order $s_n \log(n/s_n^3)$ when $s_n = o(n^{1/3})$.

Discussion of our results
In this section we briefly discuss our results and state open problems. For a more detailed discussion of the results in this paper and in our companion paper [11], as well as an extensive discussion of the relations to the literature, we refer to [Part II, Section 1.4]. First passage percolation (FPP) on the complete graph is closely approximated by invasion percolation (IP) on the Poisson-weighted infinite tree (PWIT), studied in [3], whenever s n → ∞. See Theorem 1.7 and the discussion in Section 2.1. However, this relationship is a local one, and the scaling of s n relative to n controls whether the two objects are globally comparable. Theorem 1.6 shows that the weights are globally comparable provided s n / log log n → ∞. For the hopcount, the appropriate comparison is to the minimal spanning tree (MST) on the complete graph, obtained from running IP with a simple no-loops constraint. Path lengths in the MST scale as n 1/3 (see [2] and [1]). We conjecture that, for s 3 n /n → ∞, FPP on the complete graph is in the same universality class as IP. It would be of great interest to make this connection precise by showing, for example, that H n /n 1/3 converges in distribution, and that the scaling limit of H n is the same as the scaling limit of the graph distance between two uniform vertices in the MST.
The local graph convergence from Theorem 1.7 and the weight convergence from Theorem 1.6 are the first two in a hierarchy of possible comparisons between FPP and the MST. Strengthening the previous statement about the scaling limit of $H_n$, we can ask whether the optimal path between vertices $i, j \in [n]$ equals (under a suitable coupling) the unique path in the MST from $i$ to $j$; whether the union of the optimal paths from vertex $i$ to every other vertex $j \neq i$ equals the entire MST; and whether these unions agree simultaneously for every $i \in [n]$. Assuming hypotheses similar to Conditions 1.3-1.5, it would be of interest to know how $s_n$ must grow relative to $n$ in order for each of these events to occur.
3 Growth and density bounds for $f_n$ and $\mu_n$

Throughout this section, we assume Conditions 1.4 and 2.8. Further assumptions will be stated explicitly. We will reserve the notation $\varepsilon_0, \delta_0$ for some fixed choice of the constants in Conditions 1.4 and 2.8, with $\varepsilon_0$ chosen small enough to satisfy both conditions.
The aim of the section is to explore the key implications of Conditions 1.4 and 2.8 on f n and on the intensity measure µ n .
Proof. Divide (1.12) or (2.12) by $x$ and integrate between $x$ and $x'$ to obtain the claimed logarithmic bound.

We call Condition 2.8 a density bound because it implies the following lemma: for $n$ sufficiently large, on the interval $(f_n(1 - \delta_0), \infty)$, the measure $\mu_n$ is absolutely continuous with respect to Lebesgue measure.

Proof. By Conditions 1.4 and 2.8, $f_n$ is strictly increasing on $(1 - \delta_0, \infty)$, so $y = f_n(\mu_n(0, y))$ for $y > f_n(1 - \delta_0)$. Differentiating and again applying Conditions 1.4 and 2.8, we find that, for $n$ sufficiently large, the density of $\mu_n$ with respect to Lebesgue measure is at most $1/(\varepsilon_0 s_n f_n(1))$ on the interval $(f_n(1), \infty)$.

Proof. By Lemma 3.3, for large $n$, the density of $\mu_n$ with respect to Lebesgue measure is bounded from above by $1/(\varepsilon_0 s_n f_n(1))$ on $(f_n(1), \infty)$; the bound then follows for $K > 1$.

We are now in a position to prove Lemma 2.9. Recall from the discussion around (2.13) that $\hat{\mu}_n(\lambda) = \int e^{-\lambda y}\, d\mu_n(y)$ denotes the Laplace transform of $\mu_n$.
To conclude Lemma 2.9 from (3.4), we use the monotonicity of $\hat{\mu}_n$. Taking $a > 1$ shows that $\hat{\mu}_n\big(a e^{-\gamma}/f_n(1)\big) < 1$ for all $n$ large enough, implying $\lambda_n < a e^{-\gamma}/f_n(1)$ and $\limsup_{n \to \infty} \lambda_n f_n(1) \leq e^{-\gamma}$. A lower bound holds similarly.

Lemma 3.5. Suppose Condition 1.3 holds. Given $K < \infty$, there exist $\varepsilon_K > 0$ and $n_0 \in \mathbb{N}$ such that, for $0 \leq t \leq K f_n(1)$ and $n \geq n_0$, $\int_0^\infty e^{-\lambda_n y}\, \mu_n(t + dy) \geq \varepsilon_K / s_n$.

4 Coupling $K_n$ and the PWIT

In Theorem 2.16, we indicated that two random processes, the first passage exploration processes $\mathcal{S}$ and $\mathcal{B}$ on $K_n$ and $\mathcal{T}$, respectively, can be coupled. In Section 2.1 we intuitively described the coupling between FPP on $K_n$ and on the PWIT. In this section we explain how this coupling arises as a special case of a general family of couplings between $K_n$, understood as a random edge-weighted graph with i.i.d. exponential edge weights, and the PWIT. In Section 4.2 we define minimal-rule exploration processes and thinning in this context. In Sections 4.3 and 4.4 we prove Theorems 2.6 and 2.7, respectively.

Exploration processes and the definition of the coupling
As in Sections 2.1 and 2.5, we define M ∅ j = j, for j = 1, 2, and to each other v ∈ T \ {∅ 1 , ∅ 2 }, we associate a mark M v chosen uniformly and independently from [n]. We next define what an exploration process is: Definition 4.1 (Exploration process on two PWITs). Let F 0 be a σ-field containing all null sets, and let (T , X) be independent of F 0 . We call a sequence E = (E k ) k∈N 0 of subsets of T an exploration process if, with probability 1, E 0 = {∅ 1 , ∅ 2 } and, for every k ∈ N, either E k = E k−1 or else E k is formed by adjoining to E k−1 a previously unexplored child v k ∈ ∂E k−1 , where the choice of v k depends only on the weights X w and marks M w for vertices w ∈ E k−1 ∪ ∂E k−1 and on events in F 0 .
Examples of exploration processes are given by FPP and IP on $\mathcal{T}$. For FPP, as defined in Definition 2.2, it is necessary to convert to discrete time by observing the branching process at those moments when a new vertex is added, similarly to Theorem 2.7. The standard IP on $\mathcal{T}$ is defined as follows. Set $\mathrm{IP}(0) = \{\emptyset_1, \emptyset_2\}$. For $k \in \mathbb{N}$, form $\mathrm{IP}(k)$ inductively by adjoining to $\mathrm{IP}(k-1)$ the boundary vertex $v \in \partial \mathrm{IP}(k-1)$ of minimal weight. However, an exploration process is also obtained when we specify at each step (in any suitably measurable way) whether to perform an invasion step in $\mathcal{T}^{(1)}$ or in $\mathcal{T}^{(2)}$.
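The invasion step just described is easy to simulate. The sketch below is our own illustration: it grows the invasion percolation cluster on a single PWIT copy with a priority queue over the boundary, truncating each vertex's Poisson process of child weights at a fixed cutoff (a practical assumption, justified because IP only rarely invades edges of large weight).

```python
import heapq
import random

def ip_on_pwit(steps: int, cutoff: float = 2.0, seed: int = 0):
    """Invasion percolation on a single (truncated) copy of the PWIT.

    The children of each vertex carry the points of a rate-1 Poisson process
    as edge weights; children with weight >= cutoff are never generated
    (truncation).  Each step invades the boundary vertex of minimal edge
    weight.  Vertices are tuples of child indices; () is the root.
    """
    rng = random.Random(seed)

    def children(v):
        out, x = [], rng.expovariate(1.0)
        while x < cutoff:
            out.append((x, v + (len(out),)))
            x += rng.expovariate(1.0)  # next point of the Poisson process
        return out

    boundary = children(())          # min-heap of (weight, vertex)
    heapq.heapify(boundary)
    invaded, weights = [()], []
    for _ in range(steps):
        if not boundary:             # the truncated tree died out
            break
        w, v = heapq.heappop(boundary)
        invaded.append(v)
        weights.append(w)
        for item in children(v):
            heapq.heappush(boundary, item)
    return invaded, weights

invaded, weights = ip_on_pwit(500)
print(len(invaded), max(weights) if weights else None)
```

Starting from two roots $\emptyset_1, \emptyset_2$, as in the definition above, amounts to merging two such heaps and choosing on which side to invade.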
For k ∈ N, let F k be the σ-field generated by F 0 together with the weights X w and marks M w for vertices w ∈ E k−1 ∪ ∂E k−1 . Note that the requirement on the choice of v k in Definition 4.1 can be expressed as the requirement that E is (F k ) k -adapted.
For $v \in \mathcal{T}$, define the exploration time of $v$ by $\inf\{k \in \mathbb{N}_0 \colon v \in \mathcal{E}_k\}$. Write $\widetilde{\mathcal{E}}_k$ for the subgraph of $\mathcal{E}_k$ consisting of unthinned vertices.
Recall the remark below Definition 2.4 that explains why the definition above is not circular. We define the stopping times $N(i)$ at which $i \in [n]$ first appears as a mark in the unthinned exploration process. Note that, on the event $\{N(i) < \infty\}$, $\widetilde{\mathcal{E}}_k$ contains a unique vertex in $\mathcal{T}$ whose mark is $i$, for any $k \geq N(i)$; call that vertex $V(i)$.

We define, for an edge $\{i, i'\} \in E(K_n)$, the coupled edge weight as in (4.4), where $(E_e)_{e \in E(K_n)}$ are exponential variables with mean 1, independent of each other and of $(X_v)_v$. The $E_{\{i,j\}}$ are exponential, independently of everything else, so we may perform the integration over these variables separately, and we conclude that (4.6) holds. We claim that, for all $\ell_0 \leq n - 2$, the identity (4.8) holds. When $\ell = n - 2$, by convention, the second sum vanishes, while in the first sum $\mathbf{i}$ is the empty sequence. The case $\ell_0 = 1$ reduces to (4.6): then $S_{\ell_0, \mathbf{i}}$ contains only one element, which we called $i_n$ but which is in fact uniquely determined by the values $i_3, \ldots, i_{n-1}$, and the product $\prod_{\{i,j\} \subset S_{\ell_0, \mathbf{i}}} h_{\{i,j\}}$ is empty. This initializes the induction hypothesis.
We remark that in the right-hand sides of (4.6) and (4.8), the indicators already allow us to determine which of the three cases from (4.4) occurs. For notational simplicity, we will introduce this information gradually as we proceed. Now suppose (4.8) has been proved for a given $\ell_0 < n - 2$. In the first summand of the right-hand side of (4.8), we condition on $\mathscr{F}_{N(i_{n-\ell_0})}$. By Lemma 4.3 and the presence of the indicators, each factor $h_{\{i,j\}}(X^{(K_n)}_{\{i,j\}})$ is equal to a factor $h_{\{i,j\}}(\frac{1}{n} X(i, j))$ (or $h_{\{i,j\}}(\frac{1}{n} X(j, i))$, if $N(j) < N(i)$) that is $\mathscr{F}_{N(i_{n-\ell_0})}$-measurable, with the exception of the factors $h_{\{i_{n-\ell_0}, j\}}(\frac{1}{n} X(i_{n-\ell_0}, j))$ for $j \in S_{\ell_0, \mathbf{i}}$, which are conditionally independent given $\mathscr{F}_{N(i_{n-\ell_0})}$, again by Lemma 4.3. This yields (4.9). Leaving $i_3, \ldots, i_{n-\ell_0-1}$ fixed, we now sum (4.9) over all $i_{n-\ell_0} \in [n] \setminus \{i_1, \ldots, i_{n-\ell_0-1}\}$. This rewrites the first sum in (4.8) as (4.10). The term (4.10) combines with the summand $\ell = \ell_0 + 1$ from the second sum in (4.8) to produce the first sum in (4.8) with $\ell_0$ replaced by $\ell_0 + 1$. This advances the induction hypothesis, and thus completes the proof of (4.8) for all $\ell_0 \leq n - 2$.
We therefore conclude that (4.8) holds when $\ell_0 = n - 2$. In this case the second sum vanishes, while in the first sum $\mathbf{i}$ is the empty sequence, $S_{n-2, \mathbf{i}} = [n] \setminus \{1, 2\}$ and the events $A_{n-2,\mathbf{i}}, B_{n-2,\mathbf{i}}$ always occur (note that $N(i_2) = 0$). By Lemma 4.3, $(\frac{1}{n} X(1, i))_{i \geq 3}$ and $(\frac{1}{n} X(2, i))_{i \geq 3}$ are each families of independent exponential random variables with mean 1. Moreover, they are mutually independent, since they are determined by the independent Poisson point processes of edge weights corresponding to $\emptyset_1$ and $\emptyset_2$, respectively. Since furthermore $E_{\{1,2\}}$ is independent of everything else, we conclude that (4.5) holds.

Minimal-rule exploration processes
An important class of exploration processes, which includes both FPP and IP, consists of those exploration processes determined by a minimal rule in the following sense:

Definition 4.5. A minimal rule for an exploration process $\mathcal{E}$ on $\mathcal{T}$ is an $(\mathscr{F}_k)_k$-adapted sequence $(S_k, \prec_k)_{k=1}^{\infty}$, where $S_k \subset \partial \mathcal{E}_{k-1}$ is a (possibly empty) subset of the boundary vertices of $\mathcal{E}_{k-1}$ and $\prec_k$ is a strict total ordering of the elements of $S_k$ (if any) such that the implication (4.12) holds. An exploration process is determined by the minimal rule $(S_k, \prec_k)_{k=1}^{\infty}$ if $\mathcal{E}_k = \mathcal{E}_{k-1}$ whenever $S_k = \emptyset$, and otherwise $\mathcal{E}_k$ is formed by adjoining to $\mathcal{E}_{k-1}$ the unique vertex $v_k \in S_k$ that is minimal with respect to $\prec_k$.
In words, in every step k there is a set of boundary vertices S k from which we can select for the next exploration step. The content of (4.12) is that, whenever a vertex w ∈ S k is available for selection, then all siblings of w with the same mark but smaller weight are also available for selection and are preferred over w.
For FPP without freezing on T with edge weights f n (X v ), we take v ≺ k w if and only if T v < T w (recall (2.3)) and take S k = ∂E k−1 . For IP on T , we have v ≺ k w if and only if X v < X w ; the choice of subset S k can be used to enforce, for instance, whether the k th step is taken in T (1) or T (2) .
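The two IP orderings can be compared directly in code. The sketch below (our own illustration, on a finite random tree rather than the PWIT) runs a generic minimal-rule exploration twice: once with the standard IP rule ($v \prec w$ iff $X_v < X_w$) and once with the lexicographic rule on the descending-sorted path weights $O(v)$ used in Section 4.4; with distinct weights the two rules invade vertices in the same order.

```python
import random

def build_tree(rng, depth, max_children=3):
    """A finite random tree: vertices are tuples of child indices, the root
    is (), and every non-root vertex carries an independent U(0,1) weight."""
    weight = {}
    def grow(v, d):
        if d == 0:
            return
        n_children = rng.randint(1, max_children) if v == () else rng.randint(0, max_children)
        for i in range(n_children):
            c = v + (i,)
            weight[c] = rng.random()
            grow(c, d - 1)
    grow((), depth)
    return weight

def explore(weight, steps, key):
    """Generic minimal-rule exploration: repeatedly invade the boundary
    vertex that is minimal for `key`."""
    invaded, order = [()], []
    for _ in range(steps):
        inv = set(invaded)
        boundary = [v for v in weight if v[:-1] in inv and v not in inv]
        if not boundary:
            break
        v = min(boundary, key=key)
        invaded.append(v)
        order.append(v)
    return order

rng = random.Random(7)
weight = build_tree(rng, depth=6)

def O(v):
    """Edge weights along the path from the root to v, largest first."""
    return tuple(sorted((weight[v[:k]] for k in range(1, len(v) + 1)), reverse=True))

ip_order = explore(weight, 30, key=lambda v: weight[v])
lex_order = explore(weight, 30, key=O)
print(ip_order == lex_order, len(ip_order))
```

The equality of the two orders is the "elementary exercise" invoked in the proof of Theorem 2.7 below.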
Recall the subtree $\widetilde{\mathcal{E}}_k$ of unthinned vertices from Definition 4.2 and the subgraph $\pi_M(\widetilde{\mathcal{E}}_k)$ from Definition 2.5. That is, $\pi_M(\widetilde{\mathcal{E}}_k)$ is the union of two trees with roots 1 and 2, respectively, and for $v \in \widetilde{\mathcal{E}}_k \setminus \{\emptyset_1, \emptyset_2\}$, $\pi_M(\widetilde{\mathcal{E}}_k)$ contains the vertices $M_v$ and $M_{p(v)}$ and the edge $\{M_v, M_{p(v)}\}$. (4.13)

The following lemma shows that, for an exploration process determined by a minimal rule, unthinned vertices must have the form $V(i, i')$:

Lemma 4.6. Suppose $\mathcal{E}$ is an exploration process determined by a minimal rule $(S_k, \prec_k)_{k=1}^{\infty}$. Then unthinned vertices have the form $V(i, i')$: otherwise it follows that $X_{V(i_k, i'_k)} < X_{v_k}$, and (4.12) yields $V(i_k, i'_k) \in S_k$ and $V(i_k, i'_k) \prec_k v_k$, a contradiction since $v_k$ must be minimal for $\prec_k$.

If $\mathcal{E}$ is an exploration process determined by a minimal rule, then we define $S^{(K_n)}_k$ and a corresponding ordering as in (4.14) and (4.15), where $e_j = \{i_j, i'_j\}$ and $i_j \in \pi_M(\widetilde{\mathcal{E}}_{k-1})$, $i'_j \notin \pi_M(\widetilde{\mathcal{E}}_{k-1})$ as in (4.14).

Proposition 4.7 (Thinned minimal rule). Suppose $\mathcal{E}$ is an exploration process determined by a minimal rule $(S_k, \prec_k)_{k=1}^{\infty}$. Then, under the edge-weight coupling (4.4), the edge weights of $\pi_M(\widetilde{\mathcal{E}}_k)$ are determined by the rescaled PWIT edge weights. Moreover, for any $k \in \mathbb{N}$ for which $\widetilde{\mathcal{E}}_k \neq \widetilde{\mathcal{E}}_{k-1}$, $\pi_M(\widetilde{\mathcal{E}}_k)$ is formed by adjoining to $\pi_M(\widetilde{\mathcal{E}}_{k-1})$ the unique edge $e_k \in S^{(K_n)}_k$ that is minimal with respect to $\prec^{(K_n)}_k$.

Proposition 4.7 asserts that the subgraph $\pi_M(\widetilde{\mathcal{E}}_k)$ of $K_n$, equipped with the edge weights $(X^{(K_n)}_e)_{e \in E(\pi_M(\widetilde{\mathcal{E}}_k))}$, is isomorphic as an edge-weighted graph to the subgraph $\widetilde{\mathcal{E}}_k$ of $\mathcal{T}$, equipped with the rescaled edge weights $(\frac{1}{n} X_v)_{v \in \widetilde{\mathcal{E}}_k \setminus \{\emptyset_1, \emptyset_2\}}$. Furthermore, the subgraphs $\pi_M(\widetilde{\mathcal{E}}_k)$ can be grown by an inductive rule. Thus the induced subgraphs $(\pi_M(\widetilde{\mathcal{E}}_k))_{k=0}^{\infty}$ themselves form a minimal-rule exploration process on $K_n$, with a minimal rule derived from that of $\mathcal{E}$, with the caveat that $\prec^{(K_n)}_k$ may depend on edge weights from $\mathcal{E}_{k-1} \setminus \widetilde{\mathcal{E}}_{k-1}$ as well as from $\pi_M(\widetilde{\mathcal{E}}_{k-1})$.
By the definition of a minimal rule, the vertex $v_k$ belongs to $S_k$ and is minimal for $\prec_k$. It follows from the definitions (4.14)-(4.15) that $e_k \in S^{(K_n)}_k$ is minimal for $\prec^{(K_n)}_k$.

Coupling SWT (1) and BP (1) : Proof of Theorem 2.6
In this section, we prove Theorem 2.6: that is, we couple the smallest-weight tree SWT (1) on K n to a single branching process BP (1) on T (1) . Since this statement is concerned with processes starting from only one source, we use exploration processes on T (1) instead of T . All results from Sections 4.1 and 4.2 carry over up to obvious changes (indeed, the results hold for any finite number of copies of the PWIT).
It is easy to verify by induction that the edge weights are determined, for any $i \in \mathrm{SWT}^{(1)}_{\tau_{k'-1}}$ and for $e$ belonging to the unique path in $\mathrm{SWT}^{(1)}_{\tau_{k'-1}}$ from 1 to $i$. Define the edge weights on $K_n$ according to (4.4) using the exploration process $\mathcal{E} = \mathcal{E}^{(\mathrm{FPP},1)}$. By Theorem 4.4, the edge weights $X^{(K_n)}_e$ are i.i.d. exponential with mean 1. Hence, as discussed in Section 1.2, the edge weights $Y^{(K_n)}_e = g(X^{(K_n)}_e)$ have the distribution function $F_Y$, so that $\mathrm{SWT}^{(1)}$ has the correct law under this coupling.

Comparing FPP and IP: Proof of Theorem 2.7
In this section, we prove Theorem 2.7 by comparing the FPP and IP dynamics on the PWIT.
Proof of Theorem 2.7. It is easy to see that $\mathrm{IP}^{(1)}$ is an exploration process determined by a minimal rule. For instance, we may take $S_k = \partial \mathrm{IP}^{(1)}_{k-1}$ and $v \prec_k w$ if and only if $X_v < X_w$. In fact, it will be more convenient to use a different characterization of $\mathrm{IP}^{(1)}$. Write $O(v) = (X_{(v,1)}, \ldots, X_{(v,|v|)})$ for the vector of edge weights $X_{v'}$ along the path from $\emptyset_1$ to $v$, ordered from largest to smallest. Set $v \prec^{\mathrm{IP}}_k w$ if and only if $O(v)$ is lexicographically smaller than $O(w)$. It is an elementary exercise that this minimal rule $(S_k, \prec^{\mathrm{IP}}_k)_{k=1}^{\infty}$ also determines $\mathrm{IP}^{(1)}$.

Couple the edge weights on $K_n$ according to (4.4), where the exploration process is $\mathcal{E}^{(\mathrm{FPP},1)}$ from the proof of Theorem 2.6. Fix $m \in \mathbb{N}$. With high probability, none of the first $m$ vertices explored by $\mathcal{E}^{(\mathrm{FPP},1)}$ is thinned. By Theorem 2.6, it therefore suffices to show that $(\mathcal{E}^{(\mathrm{FPP},1)}_k)_{k=1}^m = (\mathrm{IP}^{(1)}(k))_{k=1}^m$ with high probability.

Let $\varepsilon > 0$ be given. Write $B_m$ for the collection of all vertices of the form $\emptyset_1 j_1 \cdots j_r$ with $1 \leq r \leq m$ and $j_1, \ldots, j_r \leq m$. (That is, $B_m$ consists of all vertices in $\mathcal{T}^{(1)}$ within $m$ generations for which each ancestor is at most the $m$th child of its parent.) Note that the first $m$ explored vertices $v_1, \ldots, v_m$ necessarily belong to $B_m$, for both $\mathcal{E}^{(\mathrm{FPP},1)}$ and $\mathrm{IP}^{(1)}$. Let $\delta > 0$ and write $A_\delta$ for the event that the relevant edge weights are well separated. We may choose $\delta > 0$ sufficiently small that $\mathbb{P}(A_\delta) \geq 1 - \varepsilon$.
Choose $x_0 < x_1 < \cdots < x_N$ such that $x_0 = \delta$, $x_N = 1/\delta$ and $x_j - x_{j-1} \leq \delta/2$ for all $j \in [N]$. By assumption, there is an $n_0 \in \mathbb{N}$ such that $f_n(x_j)/f_n(x_{j-1}) > m$ for all $j \in [N]$ and $n \geq n_0$. Hence, for any $x, x' \in [\delta, 1/\delta]$ with $x' \geq x + \delta$, the monotonicity of $f_n$ implies $f_n(x')/f_n(x) > m$. From now on assume $n \geq n_0$. Consider any $v, w \in B_m$ such that $v \neq w$ and neither $v$ nor $w$ is an ancestor of the other. Then there is a smallest index $j$ with $X_{(v,j)} \neq X_{(w,j)}$. If $X_{(v,j)} < X_{(w,j)}$ then, on $A_\delta$, (4.19) holds, and similarly if $X_{(v,j)} > X_{(w,j)}$. Hence $v \prec^{\mathrm{FPP}}_k w$ if and only if $v \prec^{\mathrm{IP}}_k w$. Since $\mathcal{E}^{(\mathrm{FPP},1)}_0 = \mathrm{IP}^{(1)}(0)$, it follows that, on $A_\delta$ and for $n$ sufficiently large, we have $(\mathcal{E}^{(\mathrm{FPP},1)}_k)_{k=1}^m = (\mathrm{IP}^{(1)}(k))_{k=1}^m$, and since $\mathbb{P}(A_\delta) \geq 1 - \varepsilon$ with $\varepsilon > 0$ arbitrary, this completes the proof.

Poisson Galton-Watson trees, lucky vertices and the emergence of the frozen cluster
In this section, we prove Proposition 2.11, Lemma 2.14 and Theorem 2.13. We begin with preliminary results on Poisson Galton-Watson trees. Throughout this section, we assume Conditions 1.3, 1.4 and 2.8. We will reserve the notation ε 0 , δ 0 for some fixed choice of the constants in Conditions 1.4 and 2.8, with ε 0 chosen small enough to satisfy both conditions.
(e) Choose $n_0$ so large that the $1 + o(1)$ in (5.4) is bounded from below by $1/2$, and that $-K/s_n - \log(1 - K/s_n) \leq K^2/s_n^2$ and $\lceil r s_n^2 \rceil \leq \lfloor 2 r s_n^2 \rfloor$ for $n \geq n_0$. According to (d), (5.9) follows.

(f) Denote by $G_i$ the uniform labelled rooted tree on $i$ nodes after the labels of the children have been discarded, and by $h(G_i)$ the height of $G_i$. Given $d \in (0, \infty)$ and $\varepsilon > 0$, we write $G^{(\leq d)}_i$ for the subtree of $G_i$ consisting of vertices within distance $d$ from the root, and $\varepsilon G_i$ for $G_i$ with distances rescaled by $\varepsilon$. For all $m > 0$ and $i \in \mathbb{N}$, the Poisson Galton-Watson tree with offspring mean $m$ conditioned on having $i$ nodes has the same distribution as $G_i$. Moreover, $h(G_i)/\sqrt{i}$ converges, as $i \to \infty$, to the maximum of $2B$, where $B = (B_t)_{t \in [0,1]}$ is a standard Brownian excursion [4,5]. We deduce (5.10). The first factor on the right-hand side of (5.10) can be estimated, for sufficiently large $n$, by taking the minimum over $r s_n^2 \leq i \leq 2 r s_n^2$. The claim now follows from a standard result on critical Galton-Watson processes (see for example [15, Lemma I.10.1]).
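The total-progeny estimates used in this section can be explored numerically via the classical random-walk (Otter-Dwass) representation: the size of a Poisson($m$) Galton-Watson tree is the first hitting time of 0 by a walk started at 1 with Poisson($m$) $-\,1$ increments. The sketch below is our own illustration, not part of the proof; for a subcritical mean $m$, mimicking the offspring mean $1 - K/s_n$ above, the mean total progeny is $1/(1-m)$.

```python
import random

def pgw_total_progeny(m: float, rng: random.Random, cap: int = 10**6) -> int:
    """Total progeny |T| of a Poisson(m) Galton-Watson tree, sampled via the
    Otter-Dwass random walk: S_0 = 1, S_k = S_{k-1} + Poisson(m) - 1, and
    |T| = min{k : S_k = 0}  (capped at `cap` for safety)."""
    alive, size = 1, 0
    while alive > 0 and size < cap:
        # Poisson(m) sampled by counting rate-1 exponential arrivals before time m
        k, t = 0, rng.expovariate(1.0)
        while t < m:
            k += 1
            t += rng.expovariate(1.0)
        alive += k - 1
        size += 1
    return size

rng = random.Random(1)
m = 0.9  # subcritical, mimicking offspring mean 1 - K/s_n
sizes = [pgw_total_progeny(m, rng) for _ in range(50000)]
print(sum(sizes) / len(sizes))  # close to 1/(1 - m) = 10
```

The heavy polynomial tail of $|T|$ near criticality is what produces the $1/(s_n \sqrt r)$ probability in Proposition 2.11.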

The probability of luckiness: proof of Proposition 2.11
In the proof of Proposition 2.11 we use the following estimates.

Lemma 5.3. There exists $n_0 \in \mathbb{N}$ such that for any $m \in (0, \infty)$, $a \in (0, 1]$ and $n \geq n_0$, the corresponding bound holds.

Proof. By Lemma 3.1, this holds for sufficiently large $n$.

Proof of Proposition 2.11. Applying Lemma 5.3 with $m = 1$ and $a = 1 - K/s_n$, we find that $K$ can be chosen such that the required bound holds for sufficiently large $n$. Let $\tau$ be the subtree of $\mathcal{T}$ consisting of $v$ and those descendants joined to $v$ by a path whose edge weights $X_w$ all satisfy $X_w \leq 1 - K/s_n$. We consider $\tau$ as a rooted labelled tree equipped with edge weights, but with the vertex labels from the PWIT forgotten. Then, ignoring weights, $\tau$ is equal in distribution to a Poisson Galton-Watson tree with offspring mean $1 - K/s_n$, and by Proposition 5.1 (e) there is some $c > 0$ independent of $r \in (0, R]$ such that for every $r \in (0, R]$ there is $n_0 \in \mathbb{N}$ with $\mathbb{P}(4 r s_n^2 \leq |\tau| \leq 8 r s_n^2) \geq \frac{c}{s_n \sqrt{r}}$ for all $n \geq n_0$. Write $\tau'$ for the subtree of $\tau$ consisting of vertices within distance $s_n$ from the root. By Proposition 5.1 (f), there is some $\delta_1 > 0$ independent of $r \in (0, R]$ such that, with $\delta = \delta_1 c / 4$, $\mathbb{P}(|\tau'| \geq 4 r s_n^2) \geq \frac{4\delta}{s_n \sqrt{r}}$ for $n$ sufficiently large. Conditional on $\tau'$, the PWIT edge weights $X_{wk}$, $wk \in \tau'$, are uniformly distributed on $[0, 1 - K/s_n]$, and therefore the first passage edge weights $Y_{wk} = f_n(X_{wk})$ satisfy the corresponding bound. By the definition of $\tau'$, the graph distance between $u \in \tau'$ and $v$ is at most $s_n$. Hence, (5.15) and (5.14) give the claim.

In the proof of Theorem 2.17 (a), we will bound the number of vertices of large age in the frozen cluster. To do so, we will argue that for each vertex of large age, there is an independent chance to have an $R$-lucky child $v$, where $R$ is chosen large enough that $v$ and its descendants are on their own sufficient to bring about the freezing time. This motivates the following definition:

Definition 5.4. Let $\varepsilon_1$ denote the constant from Lemma 3.5 for $K = 1$.
Call a vertex $v \in \mathcal{T}^{(j)}$ lucky if it is $(1/\varepsilon_1)$-lucky, and let $T^{(j)}_{\mathrm{lucky}}$ denote the first time that a lucky vertex is born to a parent of age greater than $f_n(1)$. In view of Definition 2.12 and Lemma 3.5, a lucky vertex has enough descendants in time $f_n(1)$ that the integral in the definition (2.15) of the freezing time must be at least $s_n$.
Lemma 5.5. The sum in (5.18) is exponential with rate $\mathbb{P}(v \text{ is lucky})$.
The condition that a vertex $v$ is born to a parent of age greater than $f_n(1)$ is equivalent to $X_v > 1$. On the other hand, the event $\{v \text{ is lucky}\}$ depends only on the evolution of $BP^{(v)}$ until time $f_n(1)$ and is therefore determined by those descendants $v'$ of $v$ for which $X_{v'} \leq 1$. Because these two conditions concern disjoint sets of edge weights, it will follow that $T^{(j)}_{\mathrm{lucky}}$ is the first arrival time of a certain Cox process. We now formalize this intuition, which requires some care.
Proof of Lemma 5.5. To avoid complications arising from our Ulam-Harris notation, we consider instead of a vertex v = ∅ j k 1 k 2 . . . k r the modified vertex w(v) = ∅ j X k 1 X k 1 k 2 . . . X k 1 k 2 ...kr formed out of the edge weights along the path from ∅ j to v. We can extend our usual notation for parents, length, concatenation, edge weight, and birth times to vertices of the form w = ∅ j x 1 · · · x r , x i ∈ (0, ∞): for instance, |w| = r, X w = x r and T w = f n (x 1 ) + · · · + f n (x r ).
Form the point measure $M$ from these modified vertices and their edge weights. Given $M$, we can recover the PWIT $(\mathcal{T}, X)$. The point measure $M$ has the advantage that a value such as $M(\{\emptyset_j\} \times (a, b))$ (the number of children of $\emptyset_j$ with edge weights in the interval $(a, b)$) does not reveal information about the number of sibling edges of smaller edge weight.
The Poisson property of the PWIT can be expressed in terms of $M$ by saying that, conditional on the restriction of $M$ to the first $r$ generations, the $(r+1)$st generation forms a Poisson point process. Define $A^{(j)}$ to be the collection of vertices $w(v)$, $v \in \mathcal{T}^{(j)}$, that are born before time $T^{(j)}_{\mathrm{lucky}}$: that is, no non-root ancestor $v'$ satisfies both $X_{v'} > 1$ and that $M^{(w(v'))}_{\leq}$ is lucky. To study $A^{(j)}$, we decompose the PWIT according to vertices $v'$ that are born late (i.e., $T_{v'} > T_{p(v')} + f_n(1)$) and keep track of their early explored descendants (i.e., $M^{(w(v'))}_{\leq}$). For $w = \emptyset_j x_1 \ldots x_r$, let $i_1 < \cdots < i_k$ denote those indices (if any) for which $x_i > 1$, and write $w_\ell = \emptyset_j x_1 \ldots x_{i_\ell}$, $\ell = 1, \ldots, k$. Set $q(w, M)$ to be the corresponding sequence of point measures $m_0, \ldots, m_k$, and let $R$ denote the resulting point measure on pairs. (That is, $R$ is a point measure on pairs $(w, q)$ such that $w$ satisfies $w \in \{\emptyset_1, \emptyset_2\}$ or $X_w > 1$, and $q$ is a sequence of measures on $\cup_{r=0}^{\infty} (0,1]^r$. Considering $R$ instead of $M$ corresponds to partitioning vertices according to their most recent ancestor (if any) having edge weight greater than 1.) The Poisson property of the PWIT implies that, conditional on the restriction $R|_{\{(w,q) \colon |w| \leq r\}}$, the restriction $R|_{\{(w,q) \colon |w| = r+1\}}$ forms a Cox process with intensity measure given in (5.20), where $\delta_{ww'} \otimes \mathbb{1}_{\{x > 1\}}\, dx$ means the image of Lebesgue measure on $(1, \infty)$ under the concatenation mapping $x \mapsto ww'x$. The formula (5.20) expresses the fact that every vertex in the $(r+1)$st generation has a parent uniquely written as $ww'$, with $(w, q(w, M))$ corresponding to a point mass in $R$, $w'$ corresponding to a point mass in the last entry $M^{(w)}_{\leq}$ of the sequence $q(w, M)$, and $r = |w| + |w'|$. Now, it is easy to verify that $A^{(j)}$ is measurable with respect to the restriction of $R$ to pairs $(w, q)$ such that $q = m_0 \ldots m_k$ with $m_\ell$ not lucky for each $\ell \neq 0$.
Because of the Poisson property of the PWIT, as expressed via $R$ in (5.20), it follows that, conditional on $A^{(j)}$, the point measure $L = \sum_{w'' \notin A^{(j)},\, p(w'') \in A^{(j)}} \delta_{T_{w''}}$ forms a Cox process on $(0, \infty)$. Furthermore, the first point of $L$ is precisely $T^{(j)}_{\mathrm{lucky}}$. To determine the intensity measure of $L$, we note that for a vertex $ww' \in A^{(j)}$, $w'' = ww'x$ satisfies $w'' \notin A^{(j)}$ if and only if $X_{w''} > 1$ and $M^{(w'')}_{\leq}$ is lucky. Using (5.20), it follows that the cumulative intensity measure of $L$ on $(0, t]$ is given by an integral against $dR(w, q)$ over pairs $(w, q)$ with $q = m_0 \ldots m_k$ and $m_\ell$ not lucky for each $\ell \neq 0$, weighted by $\mathbb{P}(v \text{ is lucky})$. (5.21)
The vertices $ww'$ from the integral in (5.21) are in one-to-one correspondence with the vertices $u \in A^{(j)}$ satisfying $T_u \leq t$. Consequently we may rewrite the cumulative intensity as in (5.22). Finally, we note that $\mathbb{P}(v \text{ is lucky})$ times the sum in (5.18) is exactly the sum in (5.22) evaluated at $t = T^{(j)}_{\mathrm{lucky}}$. (The vertex $v$ for which $T_v = T^{(j)}_{\mathrm{lucky}}$ does not contribute to (5.18).) The cumulative intensity in (5.22) is a.s. continuous as a function of $t$ (since $f_n^{-1}$ is continuous and the jumps at the times $T_u$ are zero). But for any Cox process with continuous cumulative intensity function and infinite total intensity, it is elementary to verify that when the cumulative intensity is evaluated at the first point of the Cox process, the result is exponential with mean 1. This completes the proof.

Volume of the frozen cluster: proof of Theorem 2.17 (a)
To study the frozen cluster, we introduce the frozen intensity measures $\mu^{(j)}_{n,\mathrm{fr}}$ in (5.23). Recall from Section 3 that the notation $\mu(t_0 + dy)$ denotes the translation of the measure $\mu$ by $t_0$; thus (5.23) expresses the integral of a test function $h \geq 0$ against $\mu^{(j)}_{n,\mathrm{fr}}$ as a sum of translated integrals against $\mu_n$.

Lemma 5.6. Almost surely, for $j = 1, 2$,
$$s_n \leq \int e^{-\lambda_n y}\, d\mu^{(j)}_{n,\mathrm{fr}}(y) \leq s_n + 1.$$

Proof. Consider the process in (5.26). Since $\mu_n$ has no atoms, this process is continuous in $t$ except for jumps at the birth times, and since the birth times are distinct a.s., the corresponding jump has size $\int_0^\infty e^{-\lambda_n y}\, d\mu_n(y) = 1$. By definition, $T^{(j)}_{\mathrm{fr}}$ is the first time the process in (5.26) exceeds $s_n$, so it can have value at most $s_n + 1$ at that time.
To prove Theorem 2.17 (a), we will separate vertices v ∈ B (j) fr according to whether their age at freezing, T (j) fr − T v , is large or small. We then use Lemma 5.5 and Lemma 5.6, respectively, to bound the number of such vertices.
Proof of Theorem 2.17 (a). Lemma 5.5 and Proposition 2.11 imply that the sum in (5.18) is O P (s n ). By (5.17), any vertex v with T (j) fr − T v ≥ f n (1) + f n (1 + 1/s n ) satisfies T (j) lucky − T v ≥ f n (1 + 1/s n ) and therefore must contribute at least 1/s n to the sum in (5.18). Consequently there can be at most O P (s 2 n ) vertices of age at least f n (1) + f n (1 + 1/s n ) in B (j) fr . For the vertices of small age, recall the definition of µ (j) n,fr from (5.23) and that e −λny dµ (j) n,fr (y) ≤ s n + 1 by Lemma 5.6. Apply Lemma 3.5 with K chosen large enough that Kf n (1) ≥ f n (1 + 1/s n ) + f n (1) for each n (such a K exists since lim n→∞ f n (1 + 1/s n )/f n (1) = e), to see that summands corresponding to vertices of age at most Kf n (1) in B (j) fr contribute at least ε K /s n to the integral in Lemma 5.6. Hence, there can be at most s n (s n + 1)/ε K such vertices.
For future reference, we now state a lemma showing that most of the mass of the frozen intensity measures $\mu^{(j)}_{n,\mathrm{fr}}$ comes from small times:

Lemma 5.7. Let $\delta, \delta' > 0$ be given. Then there exist $K < \infty$ and $n_0 \in \mathbb{N}$ such that, for all $n \ge n_0$,
$$\mathbb{P}\Bigl( \int e^{-\lambda_n y}\, \mathbb{1}_{\{\lambda_n y \ge K\}}\, d\mu^{(j)}_{n,\mathrm{fr}}(y) > \delta s_n \Bigr) \le \delta'.$$
Proof. Let $\varepsilon = e^{-\gamma}/2$, where $\gamma$ denotes Euler's constant. Using the definition of $\mu^{(j)}_{n,\mathrm{fr}}$ from (5.23) and Lemma 2.9, we obtain $n_0 \in \mathbb{N}$ such that the bound (5.28) holds for all $K < \infty$ and $n \ge n_0$. According to Lemma 3.4, for any $\varepsilon' > 0$ we can choose some $K < \infty$ such that, after possibly increasing $n_0$, the right-hand side of (5.28) is bounded from above by $|\mathcal{B}^{(j)}_{\mathrm{fr}}|\, \varepsilon'/s_n$. Since $|\mathcal{B}^{(j)}_{\mathrm{fr}}| = O_P(s_n^2)$ by Theorem 2.17 (a), the proof is complete.

Emergence of R-lucky vertices: Proof of Lemma 2.14
In this section we show how to express $\mathcal{B}_t \setminus \mathcal{B}_{\mathrm{fr}}$, $t \ge T_{\mathrm{unfr}}$, as a suitable union of branching processes. This representation will be useful in the proof of Lemma 2.14.
Consider the immediate children $v \in \partial\mathcal{B}_{\mathrm{fr}}$ of individuals in the frozen cluster $\mathcal{B}_{\mathrm{fr}}$. Then, for $t \ge T_{\mathrm{unfr}}$, the set $\mathcal{B}_t \setminus \mathcal{B}_{\mathrm{fr}}$ is the union, over $v \in \partial\mathcal{B}_{\mathrm{fr}}$ with $T_v \le t$, of the branching processes $\mathsf{BP}^{(v)}$, where $\mathsf{BP}^{(v)}$ denotes the branching process of descendants of $v$, re-rooted and time-shifted as in (2.14). Furthermore, conditional on $\mathcal{B}_{\mathrm{fr}}$, the children $v \in \partial\mathcal{B}_{\mathrm{fr}}$ appear according to a Cox process. Formally, the point measures $\mathcal{P}^{(j)}_{n,\mathrm{unfr}} = \sum_{v \in \partial\mathcal{B}^{(j)}_{\mathrm{fr}}} \delta_{(T_v, \mathsf{BP}^{(v)})}$ form Cox processes with intensities $d\mu^{(j)}_{n,\mathrm{fr}} \otimes d\mathbb{P}(\mathsf{BP}^{(1)} \in \cdot)$, $j = 1, 2$, where the frozen intensity measures $\mu^{(j)}_{n,\mathrm{fr}}$ were introduced in (5.23).

Proof of Lemma 2.14. Let $\varepsilon > 0$ be given. We first claim that there exist $r > 0$ and $C < \infty$ such that some vertex $v \in \partial\mathcal{B}^{(j)}_{\mathrm{fr}}$ is $r$-lucky and satisfies $T_v \le T_{\mathrm{unfr}} + C f_n(1)$, with probability at least $1 - \varepsilon$ for $n$ sufficiently large. To prove this, apply Lemma 5.6 and Lemma 5.7 (with $\delta = \tfrac12$, $\delta' = \varepsilon/2$) to find $K < \infty$ such that
$$\int e^{-\lambda_n y}\, \mathbb{1}_{\{\lambda_n y \le K\}}\, d\mu^{(j)}_{n,\mathrm{fr}}(y) \ge \tfrac12 s_n \qquad (5.31)$$
with probability at least $1 - \tfrac12\varepsilon$ for $n$ sufficiently large. Use Lemma 2.9 to choose $C < \infty$ such that $K/\lambda_n \le C f_n(1)$ for $n$ large enough. Conditional on $\mathcal{B}_{\mathrm{fr}}$, each vertex $v \in \partial\mathcal{B}^{(j)}_{\mathrm{fr}}$ has an independent chance of being $r$-lucky, so the number of $r$-lucky vertices $v \in \partial\mathcal{B}^{(j)}_{\mathrm{fr}}$ with $T_v \in (T_{\mathrm{unfr}}, T_{\mathrm{unfr}} + C f_n(1))$ is Poisson with mean $\mu^{(j)}_{n,\mathrm{fr}}(0, C f_n(1))\, \mathbb{P}(v \text{ is } r\text{-lucky})$. By taking $r$ sufficiently small, Proposition 2.11 allows us to make $(\tfrac12 s_n)\, \mathbb{P}(v \text{ is } r\text{-lucky})$ large enough (uniformly in $n$ sufficiently large) so that, by (5.31), the number of such vertices $v$ will be positive with probability at least $1 - \varepsilon$. This proves the claim.

It now suffices to show that an $R$-lucky vertex emerges within a further time of order $f_n(1)$. The argument is very similar to the previous one, except that now we have access to at least $r s_n^2$ vertices whose ages are known to be small: $|\mathcal{L}^{(j)}_r| \ge r s_n^2$ by definition. Conditional on $\mathcal{L}^{(j)}_r$, for every $K' \in (0, \infty)$, the number of children $w$ is Poisson with mean at least $(r s_n^2)(K'/s_n)$. Given $\varepsilon > 0$, Proposition 2.11 allows us to choose $K'$ large enough so that at least one such vertex will be $R$-lucky with probability at least $1 - \varepsilon - o(1)$.
By Condition 1.3, $f_n(1 + K'/s_n) = O(f_n(1))$, and this completes the proof.
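The counting step in the proof above rests on the standard thinning property of Poisson random variables: if a Poisson number of points arrive and each is independently retained ("lucky") with probability $p$, the retained count is again Poisson, with the mean multiplied by $p$. A minimal seeded simulation illustrates this; the rate, retention probability and trial count below are arbitrary illustrative choices, not quantities from the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, p, trials = 40.0, 0.1, 200_000  # illustrative rate and retention probability

# Draw the number of points per trial, then thin each point independently.
counts = rng.poisson(lam, size=trials)
retained = rng.binomial(counts, p)

# Thinning property: `retained` is Poisson with mean lam * p = 4.0,
# so its sample mean and sample variance should both be close to 4.0.
print(retained.mean(), retained.var())
```

The same mechanism underlies the Cox-process version used in the text: conditional on the intensity, the marked points form a Poisson process whose mean is the intensity mass times the marking probability.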

IP and the geometry of the frozen cluster
In this section, we compare the frozen cluster to the IP cluster $\mathsf{IP}^{(j)}(\infty)$, the set of all vertices ever invaded in the IP process. The structure of the IP cluster is encoded in a single infinite backbone and an associated process of maximum weights, with off-backbone branches expressed in terms of Poisson Galton-Watson trees. See Proposition 6.2 below.
The proofs in this section rely on detailed comparisons between the frozen cluster and the part of the IP cluster within distance of order $s_n$ from the root. Specifically, we show that (a) the freezing time can be effectively bounded by the time $T_{V^{\mathrm{BB},j}_{\lfloor K s_n \rfloor}}$ when the first passage exploration process first explores to distance $K s_n$ along the IP backbone (Lemma 6.6); (b) the time to complete this exploration is comparable to the largest first passage weight along the path (Lemma 6.7); and (c) the likelihood of exploring very long paths that do not belong to the IP cluster is moderate. Assertions (a) and (b) will allow us to prove Theorem 2.13, and assertion (c) will be made precise in Lemma 6.8 and Lemma 6.9 and the proof of Theorem 2.17 (b).
For the proof of Theorems 2.13 and 2.17 (b), we will obtain bounds on $\mathsf{BP}^{(j)}_T$ for certain random times $T$ satisfying $T \ge T^{(j)}_{\mathrm{fr}}$ with large probability. Since $\mathcal{B}^{(j)}$ and $\mathsf{BP}^{(j)}$ evolve in the same way until the time $T^{(j)}_{\mathrm{fr}}$, such bounds will apply a fortiori to $\mathcal{B}^{(j)}_{\mathrm{fr}}$, and we write $\mathsf{BP}^{(j)}_{\mathrm{fr}} = \mathsf{BP}^{(j)}_{T^{(j)}_{\mathrm{fr}}}$. Throughout this section, we assume Conditions 1.3, 1.4 and 2.8. We will reserve the notation $\varepsilon_0, \delta_0$ for some fixed choice of the constants in Conditions 1.4 and 2.8, with $\varepsilon_0$ chosen small enough to satisfy both conditions.

Structure and scaling of the IP cluster
Our description of $\mathsf{IP}^{(j)}(\infty)$ is based on [3], which examines the structure of the IP cluster on the PWIT, and on the scaling limit results in [7], which establishes similar results for regular trees. As remarked in [3], the scaling limit results of [7] can be transferred to the PWIT without difficulty.
To describe the structure of the IP cluster, we first define the backbone (Definition 6.1). The off-backbone branch at height $k$ is the subtree of $\mathsf{IP}^{(j)}(\infty)$ consisting of vertices that are descendants of $V^{\mathrm{BB},j}_k$ but not descendants of $V^{\mathrm{BB},j}_{k+1}$, and is denoted by $\tau_k$. We consider $\tau_k$ as a rooted labelled tree, but with the edge weights and vertex labels from the PWIT forgotten.⁵
In the notation of Definition 6.1, the maximum invaded edge weight $M^{(j)}$ from (2.5) is now $M^{(j)}_0$. (This amounts to the observation, elementary to verify, that the largest edge weight $M^{(j)}$ must occur as one of the backbone edge weights $X^{\mathrm{BB},j}_k$.)

Proposition 6.2 ([3, 7]). The backbone is well-defined, and $M^{(j)}_k > 1$ for each $k$, a.s. Furthermore:
(a) The maximum in (6.1) is attained uniquely, for each $k$, a.s. Writing $I_k$ for the random height at which the maximum in (6.1) is attained, it holds that $I_k = O_P(k)$ for each $k \ge 1$.
(b) $(M^{(j)}_k)_{k \ge 0}$ is non-increasing and forms a Markov chain with initial distribution $\mathbb{P}(M^{(j)}_0 \le m) = \theta(m)$ and transition mechanism (6.2).
(c) $M^{(j)}_k = 1 + \Theta_P(1/k)$, and indeed $k(M^{(j)}_k - 1)$ converges weakly to an exponential distribution with mean 1 as $k \to \infty$.

Proof. The backbone is well-defined by Corollary 22 in [3]. The same paper proves (a) in Theorems 21 and 30, (b) in Section 3.3, (d) in Theorem 31, and (e) in Theorem 3. It has been observed on the top of page 954 in [3] that the methodology of [7] can be applied to show that [7, Proposition 3.3] holds for the PWIT, proving (c).

⁵ As in the proof of Proposition 2.11, we should consider instead of $\tau_k$ the set $\tilde\tau_k$ where we replace each vertex $v \in \tau_k \setminus \{V^{\mathrm{BB},j}_k\}$ by an arbitrary label $\ell(v)$ drawn independently from some continuous distribution. By a slight abuse of notation, we will refer to $\tau_k$ and $X_v$, $v \in \tau_k \setminus \{V^{\mathrm{BB},j}_k\}$, instead of $\tilde\tau_k$ and $X_{\ell^{-1}(v)}$, $v \in \tilde\tau_k$. This procedure avoids the complication, implicit in our Ulam-Harris notation, that the vertex $v = \emptyset\, j\, k_1 k_2 \ldots k_r \in \mathcal{T}^{(j)}$ automatically gives information about the number of its siblings with smaller edge weights.
For parts (f) and (g), notice that the event that $\tau_k$ equals a particular finite tree $\tau$ requires that the children of $V^{\mathrm{BB},j}_k$ consist of (i) the child $V^{\mathrm{BB},j}_{k+1}$, with edge weight consistent with the process $M^{(j)}$; and (ii) other children, and their descendants, joined to $V^{\mathrm{BB},j}_k$ by edges of weight less than $M^{(j)}_k$, in numbers corresponding to the structure specified by $\tau$. However, conditioning on $(M^{(j)}_k)_{k=0}^{\infty}$ and $(\tau_k)_{k=0}^{\infty}$ does not impose any constraint on the precise values of the edge weights less than $M^{(j)}_k$, nor on the uninvaded edge weights that exceed $M^{(j)}_k$. Parts (f) and (g) therefore follow from properties of Poisson point processes.
For (h) it suffices to notice that (d) implies that the diameter of $\bigcup_{j=0}^{k-1} \tau_j$ is stochastically dominated by $k$ plus the maximum of $k$ extinction times from $k$ independent critical Poisson Galton-Watson branching processes. Since the probability that a critical Poisson Galton-Watson branching process lives to generation $\ell$ is $O(1/\ell)$, the claim follows.
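The $O(1/\ell)$ survival estimate is Kolmogorov's asymptotic $\mathbb{P}(Z_\ell > 0) \sim 2/(\sigma^2 \ell)$, which for Poisson(1) offspring (variance $\sigma^2 = 1$) gives $\sim 2/\ell$. This is easy to check by simulation; the generation count and trial number below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
trials, gens = 200_000, 50

# Z holds the current generation sizes of `trials` independent critical
# Poisson(1) Galton-Watson processes. A sum of Z i.i.d. Poisson(1)
# variables is Poisson(Z), so each generation is one vectorized draw
# (and Poisson(0) = 0, so extinct processes stay extinct).
Z = np.ones(trials, dtype=np.int64)
for _ in range(gens):
    Z = rng.poisson(Z)

survival = (Z > 0).mean()
print(survival)  # close to 2 / gens = 0.04
```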
We next give two lemmas that we will use in the following section to bound expectations of functions of the backbone edge weights $X^{\mathrm{BB},j}_k$.

Lemma 6.3. There is a constant $K < \infty$ such that, for any non-negative measurable function $h$ and any $k \in \mathbb{N}$,

Proof. This is immediate from Proposition 6.2 (e) together with (6.2) and Proposition 5.1 (b).

Lemma 6.4. Given $m_0 \in (1, \infty)$, there is a constant $K < \infty$ such that, for any $k, k_0 \in \mathbb{N}_0$ with $k > k_0$ and for all $m \in (1, m_0]$, the law of $M^{(j)}_k$ conditional on $M^{(j)}_{k_0} = m$ is stochastically dominated by $m \wedge (1 + \frac{K}{k - k_0} E)$, where $E$ is an exponential random variable with mean 1.

Proof. Use (6.2) and (5.2) to obtain $K \in (0, \infty)$ such that the corresponding one-step bound holds for all $k \in \mathbb{N}_0$ and $1 < m' \le m \le m_0$; this proves the statement of the lemma for $k = k_0 + 1$. To establish the result for general $k > k_0$, we use induction over $k$. According to (6.6) and the Markov property of $(M^{(j)}_k)_k$ from Proposition 6.2 (b), we can couple $M^{(j)}_{k+1}$ given $M^{(j)}_k$ to an exponential random variable $E'$ of mean 1 that is independent of $M^{(j)}_k$ and satisfies $M^{(j)}_{k+1} \le M^{(j)}_k \wedge (1 + K E')$ whenever $M^{(j)}_k \le m_0$. By the induction hypothesis, the distribution of $M^{(j)}_k$ conditional on $M^{(j)}_{k_0} = m$ is stochastically dominated by $m \wedge (1 + \frac{K}{k - k_0} E'')$ for an exponential random variable $E''$ of mean 1, which can be chosen independent of $E'$. Since $x \mapsto \mathbb{P}(x \wedge (1 + K E') < m')$ is non-increasing, we may combine the two dominations. Since $E'$ and $E''$ are independent, $(1 + \frac{K}{k - k_0} E'') \wedge (1 + K E') \overset{d}{=} 1 + \frac{K}{k + 1 - k_0} E$ for an exponential random variable $E$ of mean 1, and the proof is complete.
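The distributional identity in the last step is the standard minimum-of-exponentials computation: for independent mean-one exponential random variables $E', E''$ and constants $a, b > 0$,

```latex
\mathbb{P}\bigl(aE'' \wedge bE' > x\bigr)
  = e^{-x/a}\, e^{-x/b}
  = \exp\Bigl(-x\,\tfrac{a+b}{ab}\Bigr),
\qquad\text{so}\qquad
aE'' \wedge bE' \overset{d}{=} \frac{ab}{a+b}\, E
```

for a mean-one exponential $E$. Taking $a = K/(k - k_0)$ and $b = K$ gives $\frac{ab}{a+b} = \frac{K}{k + 1 - k_0}$, as used above.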

First passage times and the IP cluster: Proof of Theorem 2.13
For notational convenience, given a constant $K < \infty$ to be fixed below, we will abbreviate
$$T = T_{V^{\mathrm{BB},j}_{\lfloor K s_n \rfloor}} \qquad (6.9)$$
for the remainder of this section. We begin with the following preliminary lemma:

Lemma 6.5. Let $(U_i)_{i \in \mathbb{N}}$ be an i.i.d. sequence of uniform random variables on $(0,1)$. There exist $\delta > 0$ and $n_0 \in \mathbb{N}$ such that $\mathbb{P}\bigl(\sum_{i=1}^{\lfloor s_n \rfloor} f_n(m U_i) \le f_n(m)\bigr) \ge \delta$ for all $m \ge 1$ and $n \ge n_0$.

Proof. Choose $u \in (0,1)$ such that $u^{\varepsilon_0} \le \tfrac14 \varepsilon_0$, where $\varepsilon_0$ was fixed below Condition 2.8. Using the inequality $\mathbb{P}(U_i \le u^{1/s_n}\ \forall i \le \lfloor s_n \rfloor) \ge u$, we can bound
$$\mathbb{P}\Bigl( \sum_{i=1}^{\lfloor s_n \rfloor} f_n(m U_i) \le f_n(m) \Bigr) \ge u\, \mathbb{P}\Bigl( \sum_{i=1}^{\lfloor s_n \rfloor} f_n(m U_i) \le f_n(m) \,\Big|\, U_i \le u^{1/s_n}\ \forall i \le \lfloor s_n \rfloor \Bigr).$$
By Lemma 5.3, the conditional probability on the right-hand side is at least $\tfrac12$ for all $m \ge 1$ and all $n$ sufficiently large. We may therefore take $\delta = u/2$.

Lemma 6.6. Given $\delta > 0$, there exist $K < \infty$ and $n_0 \in \mathbb{N}$ such that $T^{(j)}_{\mathrm{fr}} \le T_{V^{\mathrm{BB},j}_{\lfloor K s_n \rfloor}}$ with probability at least $1 - \delta$ for all $n \ge n_0$.
Proof. By Theorem 2.17 (a) there exists $K_1$ such that $\mathbb{P}\bigl(|\mathsf{BP}^{(j)}_{\mathrm{fr}}| > K_1 s_n^2\bigr) < \delta/4$ for large $n$. Recall from (6.9) that we write $T = T_{V^{\mathrm{BB},j}_{\lfloor K s_n \rfloor}}$. We now show that there is a $K < \infty$ such that $\mathsf{BP}^{(j)}_T$ contains more than $K_1 s_n^2$ vertices with probability at least $1 - 3\delta/4$ for large $n$; this will then prove the lemma. Set
$$A = \bigl\{ M^{(j)}_{\lfloor s_n \rfloor} \le 1 + K_2/s_n \bigr\}, \qquad (6.12)$$
where $K_2 < \infty$ is chosen using Proposition 6.2 (c) so that $\mathbb{P}(A) > 1 - \delta/4$ for large $n$. By monotonicity, on $A$, $M^{(j)}_k \le 1 + K_2/s_n$ for all $k \ge \lfloor s_n \rfloor$. Let $k \ge \lfloor s_n \rfloor$ and write $\tau'_k \subset \tau_k$ for those vertices in the $k$th off-backbone branch that lie within distance $s_n$ of $V^{\mathrm{BB},j}_k$; choosing $K_1 < \infty$ and then applying Lemma 5.3 and Lemma 3.1 together with $M^{(j)}_{k-1} \le M^{(j)}_{k_0}$, we estimate uniformly in $m_0$.

Proof of Theorem 2.13. On the event $\{T^{(j)}_{\mathrm{fr}} < f_n(M^{(j)}_0)\}$, the frozen cluster $\mathcal{B}^{(j)}_{\mathrm{fr}}$ is a subgraph of the set of vertices connected to the root by paths that use only edges of PWIT edge weight strictly less than $M^{(j)}_0$. This latter subgraph is a random subgraph of $\mathcal{T}^{(j)}$ that is finite a.s. (by Proposition 6.2 (a) and (d)) and independent of $n$. In particular, to show that $T^{(j)}_{\mathrm{fr}} \ge f_n(M^{(j)}_0)$ whp, it therefore suffices to show that $|\mathcal{B}^{(j)}_{\mathrm{fr}}| \overset{\mathbb{P}}{\longrightarrow} \infty$. Indeed, we shall show that $\mathcal{B}^{(j)}_{\mathrm{fr}}$ must contain at least of order $s_n$ vertices. To this end, it suffices by Lemma 5.6 to show that any vertex $v \in \mathcal{B}^{(j)}_{\mathrm{fr}}$ can contribute at most $O(1)$ to $\int e^{-\lambda_n y}\, d\mu^{(j)}_{n,\mathrm{fr}}(y)$; we use Lemma 3.3 to obtain such an $O(1)$ bound on $\int e^{-\lambda_n y}\, \mu_n(t - T_v + dy)$. We have therefore shown that $f_n^{-1}(T^{(j)}_{\mathrm{fr}}) \ge M^{(j)}_0$ whp.

For the corresponding upper bound, let $\varepsilon > 0$ be given and choose $m_0 < \infty$ large enough that $\mathbb{P}(M^{(j)}_0 > m_0) < \varepsilon$. By Lemma 6.6 it is enough to prove the corresponding statement at time $T$ for any fixed $K < \infty$; use Lemma 6.7 with $k_0 = 0$ together with Markov's inequality to find a constant $K' < \infty$ with the required property.

We next give an upper bound on the number of boundary edges of the IP cluster for which a distant descendant is explored before the freezing time. Here we say that a vertex $v$ is explored by time $t$ if $v \in \mathsf{BP}_t$. Vertices that belong to the IP cluster $\mathsf{IP}(\infty)$ are called invaded.
Write $N(k, K)$ for the number of vertices of $\partial\tau_k$ for which some descendant at least $\lceil s_n \rceil$ generations away belongs to $\mathsf{BP}^{(j)}_T$. (Recall from (6.9) the abbreviation $T = T_{V^{\mathrm{BB},j}_{\lfloor K s_n \rfloor}}$, and recall also that $\tau_k$ denotes the tree consisting of vertices that are descendants of $V^{\mathrm{BB},j}_k$ but not descendants of $V^{\mathrm{BB},j}_{k+1}$.)

Proof. Consider $v \in \tau_k$ and its children, which we write as $vi$, $i \in \mathbb{N}$. Recall from Proposition 6.2 (g) that, conditional on the invasion cluster, the PWIT weights $\{X_{vi} : vi \notin \tau_k\}$ (i.e., the weights corresponding to children that were not invaded) form a Poisson point process on $(M^{(j)}_k, \infty)$ with intensity 1.
A child $vi$ of $v$ will be explored by the FPP process by time $T$ if and only if $T_v + f_n(X_{vi}) \le T$; as an upper bound, $vi$ can only be explored if $X_{vi} \le \bar X$. In particular, the conditional expectation of the number of uninvaded children of $v$ that are explored by the FPP process by time $T$ obeys the bound (6.21). For each such child $vi$, we may bound the probability that a descendant of $vi$ at least $\lceil s_n \rceil$ generations away is explored by time $T$ by the probability that a Poisson Galton-Watson branching process with mean $\bar X$ survives for at least $s_n$ generations. Because of (6.21), and since in (6.20) $M^{(j)}_k \ge 1 + \delta/s_n$, we may assume that $\bar X \ge m \ge 1 + \delta/s_n$. By Proposition 5.1 (g) and (b), this probability is $O(\bar X - 1)$.
On the other hand, conditional on $M^{(j)}_k = m$, the expected size of $\tau_k$ is $O(1/(m-1))$, since, by Proposition 6.2 (d) and (5.3), $\tau_k$ is distributed as a Poisson Galton-Watson tree with subcritical parameter $\hat m < 1$ and $1 - \hat m \sim m - 1$. We conclude that (6.22) holds uniformly in $m \in [1 + \delta/s_n, \infty)$. In order to bound the right-hand side of (6.22), we first use Lemma 3.1 and Lemma 6.7 and Markov's inequality to obtain a constant $K' < \infty$ such that (6.23) holds for all $m \in [1, m_0]$ and $x \ge 0$. Thus (6.24) holds for $1 \le m \le m_0$ and $p \in \{1, 2\}$. For $m \ge 1 + \delta/s_n$, we have $1/(m-1) \le O(s_n)$, and we obtain (6.25) uniformly for $m \in [1 + \delta/s_n, m_0]$. Combining (6.22) with (6.25) completes the proof.

Lemma 6.8 will allow us to control the effect of branches $\tau_k$ for small $k$. For larger $k$, the following estimate will allow us to bound the probability that a very long path is explored between time $T_{V^{\mathrm{BB},j}_k}$ and the freezing time. Let $v \in \tau_k$. Write $D_{v,r,C,\ell}$ for the collection of descendants of $v$ connected to $v$ by exactly $r$ edges of weight at most $1 + C/s_n$, at most $\ell$ of which have weight between $1 - 1/s_n$ and $1 + C/s_n$.

Lemma 6.9. Given $C, \ell < \infty$, there is $C' < \infty$ such that $\mathbb{P}(D_{v,r,C,\ell} \ne \emptyset) \le C'/r$ for all $r \in \mathbb{N}$, $v \in \mathcal{T}$ and $s_n \ge 1$.
Proof. On the event $\{D_{v,r,C,\ell} \ne \emptyset\}$, let $W$ be chosen uniformly from $D_{v,r,C,\ell}$, and let $L$ denote the collection of vertices $wi$ along the path from $v$ to $W$ for which $1 - 1/s_n \le X_{wi} \le 1 + C/s_n$. By the Poisson point process property, conditional on the occurrence of $\{D_{v,r,C,\ell} \ne \emptyset\}$ and the values of $W$ and $L$, the edge weights $X_{wi}$, $wi \in L$, are uniformly distributed on $[1 - 1/s_n, 1 + C/s_n]$. In particular, $1 - 1/s_n \le X_{wi} \le 1$ for each $wi \in L$ with conditional probability $(1/(C+1))^{|L|}$, which is at least $(1/(C+1))^{\ell}$ by construction. If this occurs then $v$ is connected by edges of weight at most 1 to a descendant at generation $r$. Using the notation from Proposition 5.1 and appealing to standard properties of critical Galton-Watson processes (see for example Lemma I.1.10 in [15]),
$$O(1/r) = \mathbb{P}_1\bigl(|\tau^{(\ge r)}| \ge 1\bigr) \ge \mathbb{P}(D_{v,r,C,\ell} \ne \emptyset) \cdot (1/(C+1))^{\ell}, \qquad (6.26)$$
completing the proof.
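The expected-size bound $\mathbb{E}[|\tau_k| \mid M^{(j)}_k = m] = O(1/(m-1))$ used above is the standard generation-wise computation for a subcritical Galton-Watson tree with mean offspring $\hat m < 1$: writing $Z_r$ for the size of generation $r$,

```latex
\mathbb{E}\,|\tau_k| \;=\; \sum_{r=0}^{\infty} \mathbb{E}[Z_r]
  \;=\; \sum_{r=0}^{\infty} \hat m^{\,r} \;=\; \frac{1}{1 - \hat m},
```

and since $1 - \hat m \sim m - 1$, this is $O(1/(m-1))$.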
With these preliminaries, we can now complete the proof. Assume that all the above events occur, making an error of at most $6\varepsilon$. By Proposition 6.2 (h), the part of the IP cluster that does not descend from $V^{\mathrm{BB},j}_{\lfloor K s_n \rfloor}$ has diameter $O_P(s_n)$. Hence it suffices to show that vertices of $\mathsf{BP}^{(j)}_T$ are within distance $O_P(s_n)$ from the IP cluster. The assumption $N(k, K) = 0$ for $k \le \eta s_n$ verifies this for the beginning of the backbone.
For $k \ge \eta s_n$, (6.21), (6.27) and the bound $\mathbb{E}[|\tau_k| \mid M^{(j)}_k] = O(1/(M^{(j)}_k - 1)) = O(s_n)$ imply that the (conditionally) expected number of vertices $vi \in \mathsf{BP}^{(j)}_T \setminus \tau_k$ with $v \in \tau_k$, for some $\eta s_n \le k \le K s_n$, is at most $(C/s_n) \cdot O(s_n) \cdot (K s_n) = O(s_n)$. If such a vertex $vi$ has a descendant $w \in \mathsf{BP}^{(j)}_T$ at distance $r$, then by (6.27) the path from $vi$ to $w$ can contain no edges of PWIT weight greater than $1 + C/s_n$ and at most $\ell$ edges of PWIT weight greater than $1 - 1/s_n$. In other words, necessarily $D_{vi,r,C,\ell} \ne \emptyset$. This event is conditionally independent of the possible $vi$, so by Lemma 6.9 the conditional probability of any $D_{vi,r,C,\ell}$ occurring is at most $O(s_n/r)$. This can be made smaller than $\varepsilon$ by taking $r = C''' s_n$ for $C'''$ large enough, and this completes the proof.

Proof of Theorem 1.6. The proof of (1.14) proceeds via stochastic upper and lower bounds on $W_n$, based on couplings with IP in which we use two different exploration processes. For the lower bound, consider the minimal-rule exploration process $\mathcal{E} = (\mathcal{E}_k)_{k \in \mathbb{N}_0}$ given by IP on $\mathcal{T}$ that alternates between an invasion step in $\mathcal{T}^{(1)}$ and an invasion step in $\mathcal{T}^{(2)}$, as explained below Definition 4.5. By Theorem 4.4 the edge weights $Y^{(K_n)}_e = f_n(X^{(K_n)}_e)$ derived from this exploration process as in (4.4) are i.i.d. with distribution function $F_Y$, and it suffices to consider the FPP problem on $K_n$ with these edge weights. Let $V^{(j)}$ denote the set of vertices in the invasion cluster on $\mathcal{T}^{(j)}$ explored before the edge of weight $M^{(j)}_0$ is invaded. (In the language of [3, 10, 14, 17], $V^{(j)}$ is the first pond, not including the first outlet.) Let $V^{(j)}_+$ consist of $V^{(j)}$ together with all adjacent vertices connected by an edge of weight at most $M^{(1)}_0 \vee M^{(2)}_0$. Let $\mathcal{A}_n$ be the event that none of the vertices in $V^{(1)}_+ \cup V^{(2)}_+$ are thinned, and that the exponential variable $X^{(K_n)}_{\{1,2\}} = E_{\{1,2\}}$ from (4.4) satisfies $X^{(K_n)}_{\{1,2\}} \ge \tfrac{1}{n}(M^{(1)}_0 \vee M^{(2)}_0)$.
Since $V^{(j)}_+$ is finite and the vertices in $V^{(j)}$ are explored after a finite number of steps (not depending on $n$), $\mathcal{A}_n$ holds with high probability.
For the upper bound, let $\varepsilon > 0$ and let $N, N_1 \in \mathbb{N}$ denote constants to be chosen later. Modify the minimal-rule exploration process above by stopping after $N$ steps in each subtree $\mathcal{T}^{(j)}$, i.e., set $\mathcal{E}'_k = \mathcal{E}_{k \wedge 2N}$, and couple the edge weights according to (4.4). Denote by $X^{(j)}_{\max}$ the largest edge weight in $\mathcal{T}^{(j)}$ so invaded, so that $X^{(j)}_{\max} \le M^{(j)}_0$ by definition. Let $U^{(j)} = \{v \in \partial\mathcal{E}'_{2N} \cap \mathcal{T}^{(j)} : X^{(j)}_{\max} < X_v < X^{(j)}_{\max} + \varepsilon\}$ denote the collection of boundary vertices joined to invaded vertices by an edge of weight at most $X^{(j)}_{\max} + \varepsilon$. Conditional on $X^{(1)}_{\max}$, $X^{(2)}_{\max}$ and $\mathcal{E}'$, the number $|U^{(j)}|$ of such boundary vertices is Poisson with mean $\varepsilon N$, independently for $j \in \{1, 2\}$. (This holds because the event that the exploration process $\mathcal{E}'$ explores a given sequence of vertices $v_1, \ldots, v_{2N}$ can be expressed solely in terms of the numbers $|\{vw \in \partial\mathcal{E}'_k : X_{vw} < X_{v_i}\}|$ of boundary edges of smaller weight, over all $k$ and $i = 1, \ldots, 2N$.) Let $\mathcal{A}'_n$ denote the event that no two of the vertices in $\mathcal{E}'_{2N} \cup U^{(1)} \cup U^{(2)}$ have the same mark. Condition on the occurrence of $\mathcal{A}'_n$ and on the disjoint vertex sets $\pi_M(\mathcal{E}'_{2N})$, $\pi_M(U^{(1)})$, $\pi_M(U^{(2)})$. Consider the induced subgraph $K'_{n-2N-2}$ of $K_n$ obtained by excluding the $2N + 2$ explored vertices in $\pi_M(\mathcal{E}'_{2N})$. Since no other vertices are explored, the edge weights in this induced subgraph are the independent exponential random variables $E_e$ from (4.4).
The random subgraph $G'_{n-2N-2} = \{e \in E(K'_{n-2N-2}) : E_e \le \tfrac{1}{n}(1+\varepsilon)\}$ has the (conditional) law of the Erdős-Rényi random graph $G(n - 2N - 2, p)$ with $p = \mathbb{P}(E_e \le \tfrac{1}{n}(1+\varepsilon)) \sim \tfrac{1}{n}(1+\varepsilon)$ as $n \to \infty$. As is well known (see e.g. [?]), in the supercritical regime the giant component has diameter $O_P(\log n)$ and contains a positive asymptotic fraction of the vertices. Suppose $U^{(1)}, U^{(2)}$ are two disjoint subsets of vertices in $G'_{n-2N-2}$ (possibly random, but independent of the randomness in $G'_{n-2N-2}$). If $U^{(1)}$ and $U^{(2)}$ are sufficiently large, each of them is likely to contain at least one vertex of the giant component. Hence we may choose $N_1 \in \mathbb{N}$ such that, given the event $\{|U^{(1)}|, |U^{(2)}| \ge N_1\}$, there exists a pair of vertices $u_1 \in U^{(1)}$, $u_2 \in U^{(2)}$ connected by a path in $G'_{n-2N-2}$ of length at most $N_1 \log n$, with conditional probability at least $1 - \varepsilon$ for $n$ sufficiently large.
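The supercritical behavior invoked here, a unique giant component containing a positive fraction of the vertices, is easy to observe numerically. In the sketch below the graph size and mean degree are arbitrary illustrative choices; for mean degree $c = 1.5$ the limiting giant-component fraction is the root of $\rho = 1 - e^{-c\rho} \approx 0.583$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 2000, 1.5                     # vertices and mean degree; p = c/n is supercritical
i, j = np.triu_indices(n, k=1)
mask = rng.random(i.size) < c / n    # keep each potential edge independently with prob. p

# Union-find (with path halving) to extract connected components.
parent = list(range(n))
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

for a, b in zip(i[mask], j[mask]):
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

sizes = np.bincount([find(v) for v in range(n)])
giant_fraction = sizes.max() / n
print(giant_fraction)  # close to 0.583 for large n
```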
Since the sizes $|U^{(j)}|$ are independent Poisson random variables with mean $\varepsilon N$, we may choose $N$ large enough that $|U^{(j)}| \ge N_1$ with probability at least $1 - \varepsilon$. Moreover, since $\mathcal{E}'_{2N}$, $U^{(1)}$, $U^{(2)}$ are finite and do not depend on $n$, it follows that $\mathcal{A}'_n$ occurs with high probability, and we can choose $N_2$ large enough that the diameters of $\mathcal{E}'_{2N}$, $U^{(1)}$, $U^{(2)}$ are at most $N_2$ with probability at least $1 - \varepsilon$.

(7.2)
Since s n / log log n → ∞, the right-hand side of (7.2) diverges to infinity and therefore exceeds 1 for n sufficiently large.
• $\mathcal{B}_t$ is the union of two CTBPs, with the appropriate freezing of one of them and the resulting thinning. Thus, $\mathcal{B}_t$ has the same law as the frozen $\mathcal{S}_t$.
• $f_n$ is the function with $Y^{(K_n)}_e \overset{d}{=} f_n(nE)$, where $E$ is exponential with mean 1.
• $\mu_n$ is the image of the Lebesgue measure on $(0, \infty)$ under $f_n$.
• $\lambda_n$ is the exponential growth rate of the CTBP, cf. (2.13).