University of Birmingham Heavy subtrees of Galton-Watson trees with an application to Apollonian networks

We study heavy subtrees of conditional Galton-Watson trees. In a standard Galton-Watson tree conditional on its size being n, we order all children by their subtree sizes, from large (heavy) to small. A node is marked if it is among the k heaviest nodes among its siblings. Unmarked nodes and their subtrees are removed, leaving only a tree of marked nodes, which we call the k-heavy tree. We study various properties of these trees, including their size and the maximal distance from any original node to the k-heavy tree. In particular, under some moment condition, the 2-heavy tree is with high probability larger than cn for some constant c > 0, and the maximal distance from the k-heavy tree is O(n^{1/(k+1)}) in probability. As a consequence, for uniformly random Apollonian networks of size n, the expected size of the longest simple path is Ω(n). We also show that the length of the heavy path (that is, k = 1) converges (after rescaling) to the corresponding object in Aldous' Brownian continuum random tree.


Introduction
We study Galton-Watson trees of size n. More precisely, we have a generic random variable ξ defined by P(ξ = i) = p_i, i ≥ 0, where (p_i)_{i≥0} is a fixed probability distribution. Throughout the paper, we assume that
    E[ξ] = 1 and 0 < σ^2 := E[ξ^2] − 1 < ∞.    (1.1)
The random variable ξ is used to define a critical Galton-Watson process (see, e.g., [11]).
In a standard construction, we label the nodes of the Galton-Watson tree in preorder, that is, in the order in which they are first visited by a depth-first traversal. See Figure 1 for an example. If ξ_1, ξ_2, . . . are independent copies of ξ, then we assign ξ_i children to node i.
This is a Galton-Watson tree. Given |T| = n, T is a conditional Galton-Watson tree.
The associated random walk (S_i)_{0≤i≤n} with S_n = −1 and S_i ≥ 0 for all 0 ≤ i < n is called the Łukasiewicz path. (We extend this walk to a continuous function S_t by linear interpolation. See Figure 1.)
The family of conditional Galton-Watson trees has gained importance in the literature because it encompasses the simply-generated trees introduced by Meir and Moon [47], which are basically ordered rooted trees (of a given size) that are uniformly chosen from a class of trees. For example, when p_0 = p_2 = 1/4 and p_1 = 1/2, the conditional Galton-Watson tree corresponds to a binary tree of size n chosen uniformly at random. When (p_i)_{i≥0} is Poisson(1), we obtain a random labeled rooted tree, also called a Cayley tree.
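The preorder construction and its Łukasiewicz path can be illustrated with a short Python sketch. It assumes the convention S_0 = 0 and S_i = S_{i−1} + ξ_i − 1, which is consistent with the conditions S_n = −1 and S_i ≥ 0 for i < n stated above. The function names are hypothetical, and the sampler uses the classical cycle lemma (rotation of an i.i.d. sequence conditioned on its sum), which is a standard technique and not necessarily the construction used in the paper.

```python
import random

def lukasiewicz_path(offspring):
    """Return [S_0, ..., S_n] with S_0 = 0 and S_i = S_{i-1} + offspring[i-1] - 1."""
    path = [0]
    for xi in offspring:
        path.append(path[-1] + xi - 1)
    return path

def is_tree_sequence(offspring):
    """True if `offspring` is the preorder degree sequence of a finite ordered tree."""
    path = lukasiewicz_path(offspring)
    return path[-1] == -1 and all(s >= 0 for s in path[:-1])

def rotate_to_tree(offspring):
    """Cycle lemma: if the degrees sum to n - 1, exactly one cyclic shift is a tree."""
    n = len(offspring)
    if sum(offspring) != n - 1:
        raise ValueError("degrees must sum to n - 1")
    path = lukasiewicz_path(offspring)
    # First time the walk attains its overall minimum (min returns the first minimiser).
    m = min(range(1, n + 1), key=lambda i: path[i]) % n
    return offspring[m:] + offspring[:m]

def sample_conditional_gw(p, n, rng):
    """Sample the preorder degree sequence of a Galton-Watson tree conditioned on size n."""
    while True:
        seq = rng.choices(range(len(p)), weights=p, k=n)
        if sum(seq) == n - 1:  # rejection step: requires P(S_n = -1) > 0
            return rotate_to_tree(seq)
```

For instance, `sample_conditional_gw([0.25, 0.5, 0.25], 9, random.Random(0))` returns a degree sequence of length 9 whose Łukasiewicz path first hits −1 at time 9. The rejection step is only practical for moderate n, since P(S_n = −1) is of order n^{−1/2}.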

The asymptotic behaviour of Galton-Watson trees
In order to put the results of this paper into perspective, we briefly discuss the two main approaches towards limit theorems on conditional Galton-Watson trees with respect to their global and local behavior.
First, thanks to Aldous' groundbreaking work [5,6,7], it is well-known that conditional Galton-Watson trees converge in distribution (in a suitable sense and as random metric spaces endowed with the graph distance), after rescaling of edge-lengths by √n, to the Brownian continuum random tree. In this context, see also the work of Le Gall [43] and Marckert and Mokkadem [46]. We review these results in more detail in Section 6.
Second, as n grows large, the Galton-Watson tree in the vicinity of the root is described by the so-called size-biased Galton-Watson tree in the sense of Aldous-Steele (or Benjamini-Schramm [13]) convergence [9]. This infinite (but locally finite) random tree was introduced by Kesten [40] and is related to the so-called spine decomposition of the Galton-Watson tree. Compare Lyons, Pemantle and Peres [44], Lyons and Peres [45, Chapter 12], Aldous and Pitman [8, Section 2.5] and Janson [36, Section 7]. More details and precise statements in this context are presented in Section 3.
The present paper looks at a less natural decomposition of the conditional Galton-Watson tree, but one that has far-reaching applications in computer science and the study of random networks, more precisely, random Apollonian networks.

Heavy subtrees and main results
One can reorder all sets of siblings by subtree size, from large to small, where ties are broken by considering the preorder index. For a node v in the (conditional or not) Galton-Watson tree distinct from its root, we denote by ρ_v its rank in this ordering (for example, ρ_v = 1 means that v has the largest subtree among its siblings). No rank is defined for the root. Let A_v = (v_1, . . . , v_d = v) be the sequence of ancestors of v if v is at distance d ≥ 1 from the root. (The root does not appear in this sequence, and the node v_i, 1 ≤ i ≤ d, has distance i from the root.) We define the maximal rank ρ*_v = max(ρ_{v_1}, . . . , ρ_{v_d}).
For a fixed integer k, we define the k-heavy Galton-Watson tree as the tree formed by the root and all nodes v in the conditional Galton-Watson tree with ρ*_v ≤ k. In particular, as nodes in the k-heavy tree have rank at most k, they have out-degree k or less. For k = 1, we obtain a path, which we call the heavy path: just follow the path from the root down, always going to the largest subtree, taking the oldest branch in case of a tie.
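The definitions of the rank ρ_v, the maximal rank ρ*_v and the k-heavy tree translate directly into a small computation. The following Python sketch is an illustration only: it assumes the tree is given by its preorder degree sequence (node i has `offspring[i]` children), and the function names are hypothetical.

```python
def children_from_preorder(offspring):
    """Children lists of the ordered tree whose preorder degree sequence is `offspring`."""
    children = [[] for _ in offspring]
    stack = []  # entries [node, number of children still to be attached]
    for v, deg in enumerate(offspring):
        if stack:
            parent = stack[-1]
            children[parent[0]].append(v)
            parent[1] -= 1
            if parent[1] == 0:
                stack.pop()
        if deg > 0:
            stack.append([v, deg])
    return children

def k_heavy_nodes(offspring, k):
    """Preorder labels of the nodes v with maximal rank at most k (the root included)."""
    children = children_from_preorder(offspring)
    n = len(offspring)
    size = [1] * n
    for v in reversed(range(n)):  # children have larger preorder labels
        size[v] += sum(size[c] for c in children[v])
    max_rank = [0] * n  # the root has no rank; 0 acts as a neutral value
    for v in range(n):
        # Siblings ordered by decreasing subtree size, ties broken by preorder index.
        ordered = sorted(children[v], key=lambda c: (-size[c], c))
        for rank, c in enumerate(ordered, start=1):
            max_rank[c] = max(max_rank[v], rank)
    return [v for v in range(n) if max_rank[v] <= k]
```

With `offspring = [3, 0, 2, 0, 0, 1, 0]`, the root has children 1, 2 and 5 with subtree sizes 1, 3 and 2, so the heavy path (k = 1) consists of nodes 0, 2 and 3, and the 2-heavy tree omits only the leaf 1.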
Our main interest is the study of the case k = 2, the 2-heavy Galton-Watson tree. We show that it captures a huge chunk of the Galton-Watson tree by proving the following result:
Theorem 1.1. Consider a Galton-Watson tree whose offspring distribution satisfies (1.1) with E[ξ^5] < ∞, conditional on having size n, where P(S_n = −1) > 0.
Since the number of nodes of degree i in a conditional Galton-Watson tree is in probability asymptotic to n p_i, it is easy to see that the size of the 2-heavy tree cannot be more than so that there is no hope of replacing κn by n − o(n) in (1.3). In fact, we believe that the size of the 2-heavy tree satisfies a law of large numbers when rescaled by n^{−1} as n → ∞, with a limiting constant depending on the distribution of ξ. The condition E[ξ^5] < ∞ is of a technical nature, and we believe that the statement holds under the finite variance assumption (1.1). A related interesting statistic is the maximal size of any binary subtree of the conditional Galton-Watson tree. The lower bounds in Theorem 1.1 and the upper bound (1.4) also apply to this quantity, which we think deserves further study.
We also study the maximal distance to the k-heavy trees. For a non-empty (connected or not) subgraph A of the conditional Galton-Watson tree, we call
    d_max(A) := max_v min_{u ∈ A} dist(u, v),
where the maximum is over all nodes v of the tree, the maximal distance to A. Here, dist(•, •) refers to path distance between vertices in the conditional Galton-Watson tree. (This definition makes sense in any finite ordered tree.) The maximal distance to the k-heavy tree measures to some extent how pervasive the k-heavy tree is. In the next theorem, we let H_k denote the k-heavy subtree. Further, we write A_k for the set of all subtrees of the conditional Galton-Watson tree in which every node has at most k children. Observe that A_k is in general much larger than the collection of all subtrees of H_k.
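The maximal distance to a set of nodes can be computed by a breadth-first search started simultaneously from all nodes of the set. The sketch below is a minimal illustration under the reading that d_max(A) is the largest graph distance from any node of the tree to its nearest node in A; the tree is assumed to be given as children lists with root 0, and `max_distance_to` is a hypothetical name.

```python
from collections import deque

def max_distance_to(children, subset):
    """Largest distance from any node of the tree to its nearest node in `subset`."""
    n = len(children)
    # Undirected adjacency lists: children plus parent.
    adjacency = [list(cs) for cs in children]
    for v, cs in enumerate(children):
        for c in cs:
            adjacency[c].append(v)
    dist = [None] * n
    queue = deque()
    for u in subset:  # multi-source BFS: all nodes of the subset start at distance 0
        if dist[u] is None:
            dist[u] = 0
            queue.append(u)
    if not queue:
        raise ValueError("subset must be non-empty")
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if dist[w] is None:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist)
```

For a path with four nodes rooted at one end, the maximal distance to the root alone is 3, while the maximal distance to the full vertex set is 0.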
(i) If E[ξ^{k+3}] < ∞, then, for any ε > 0, there exists a constant C* > 0 such that (1.5)
(ii) If E[ξ^{k+2}] < ∞ and Σ_{ℓ≥k+1} p_ℓ > 0, then, for any ε > 0, there exists c* > 0 such that P inf (1.6)
The theorem shows that, under appropriate moment conditions on ξ, the k-heavy subtree exhausts the entire tree asymptotically optimally since every k-ary subtree leaves out nodes at distance of order n^{1/(k+1)}. (Here, the choice of a k-ary subtree can even depend on the realization of the conditional Galton-Watson tree.) In particular, under the fifth moment condition from Theorem 1.1, the maximal distance from the 2-heavy tree is Θ(n^{1/3}), a result that cannot possibly be deduced from standard continuum random tree results for conditional Galton-Watson trees [5,6,7,43]. In Proposition 5.2 in Section 5 we give some results on necessary moment conditions on ξ to guarantee tightness of the sequence n^{−1/(k+1)} inf_{T ∈ A_k} d_max(T), n ≥ 1.
We finally study the length L_n of the heavy path.
(i) There exists a non-negative random variable L_∞ such that, as n → ∞, in distribution and with convergence of all moments,
(ii) For k ≥ 0, let P_n(k) be the size of the subtree rooted at the node on level k on the heavy path. There exists a random decreasing process P_∞(t), t ∈ [0, 1], with càdlàg paths such that, in distribution, in the Skorokhod topology on the set of càdlàg functions,
] such that, in distribution, on the space of continuous functions on [0, 1],
In Section 6, we discuss more detailed properties of L_∞ including the existence of a density (see (6.13)), a characterization of its distribution by a stochastic fixed-point equation (see (6.15)) and its (negative) moments (see (6.11) and (6.12)). Theorem 6.12 contains a more precise statement of Theorem 1.3 identifying the limiting random variables as functionals of a Brownian excursion. In particular, and as opposed to the k-heavy trees for k ≥ 2, the heavy path can be studied using the global picture sketched in Section 1.1 above, and the distributions of the scaling limits L_∞, P_∞ and Q_∞ depend only on σ. The proof of Theorem 1.3 further reveals that the convergences in (i), (ii) and (iii) are joint and that the limiting objects are natural statistics in the continuum random tree. In this context, we draw connections to self-similar fragmentation processes studied by Bertoin [15,16] and exploit results from his work.
Further, we study the tail behaviour of L_∞ near 0 and ∞ in more detail. In particular, we note that, at 0, it grows more slowly than any polynomial but much faster than the theta law which is the scaling limit of the tree height H_n, see (5.3) in Section 5. (H_n is equal to the maximal distance of a node from the root in τ_n.) Thus, the obvious inequality L_n ≤ H_n is loose. This is formulated in Proposition 6.2.

Apollonian networks
In 1930, Birkhoff [20] introduced a model of a planar graph that became known as an Apollonian network, a name coined by Andrade et al. [10] in 2005. Suggested as toy models of social and physical networks with remarkable properties, these networks are recursively defined by starting with three vertices that form a triangle in the plane. Given a collection of triangles in a triangulation, choose one (either at random, or following an algorithm), place a new vertex in its center, and connect it with the three vertices of the triangle. So, in each step, we create three new edges, one new point, and three new triangles (which replace an old one). After n steps, we have 3 + n vertices and 3 + 3n edges in the graph. This is an Apollonian network. One can also define a corresponding evolutionary tree: start with the original triangle as the root of a tree. In a typical step, select a leaf node of the tree (which corresponds to a triangle) and attach to it three children. This tree has a one-to-one relationship with the Apollonian network. It has 1 + 2n leaves (after n steps) and 1 + 3n vertices. (In particular, the n non-leaves in the tree correspond to the nodes in the Apollonian network lying strictly inside the initial triangle.) See Figure 2 for an illustration.
Random Apollonian networks. The most frequently studied random Apollonian network (see Zhou, Yan and Wang [53]) is one in which each triangle in the network (equivalently, each leaf in the tree) is chosen uniformly at random for splitting, leading to a so-called split tree [28]. Asymptotically, its height after n steps is bounded almost surely by c log n for a suitable constant c > 0 [22]. Typical distances, the diameter and node degrees in the network have recently been studied in a number of papers using probabilistic, combinatorial and analytic methods [3,31,42,34,26].
EJP 24 (2019), paper 2.
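The recursive construction described above, with a uniformly chosen triangle split in each step, can be simulated in a few lines. The following Python sketch is an illustration with hypothetical names; it tracks only the combinatorial structure (no planar coordinates) and lets one check the vertex, edge and leaf counts stated in the text.

```python
import random

def random_apollonian(n, rng):
    """Run n uniform splitting steps; return (edges, triangles, tree_size).

    Vertices are labelled 0, 1, 2 (initial triangle) and 3, ..., n + 2.
    `triangles` lists the current faces, i.e. the leaves of the evolutionary tree.
    """
    edges = {(0, 1), (0, 2), (1, 2)}
    triangles = [(0, 1, 2)]
    tree_size = 1  # the root of the evolutionary tree is the initial triangle
    for step in range(n):
        index = rng.randrange(len(triangles))  # uniformly chosen triangle
        a, b, c = triangles[index]
        v = 3 + step  # new vertex placed inside the chosen triangle
        edges.update({(a, v), (b, v), (c, v)})
        # The chosen triangle is replaced by three new ones.
        triangles[index] = (a, b, v)
        triangles.append((a, c, v))
        triangles.append((b, c, v))
        tree_size += 3  # the split leaf receives three children
    return edges, triangles, tree_size
```

After n = 10 steps, the network has 13 vertices and 33 edges, and the evolutionary tree has 21 leaves among 31 nodes, in agreement with the formulas 3 + n, 3 + 3n, 1 + 2n and 1 + 3n.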
For this paper, the work on the longest simple path in the Apollonian network is most relevant. The asymptotic behavior of its length L_n^{rec} is still not well understood today. A series of papers in recent years including [34] and [31] have culminated in the work of Collevecchio, Mehrabian and Wormald [25], who showed that L_n^{rec} is with high probability at most of order n^{1−ε}, where ε can be chosen as 4 · 10^{−8}.
Our main motivation to study k-heavy trees was to understand the length of the longest simple path L_n^{unif} in the probabilistic model where we generate a random ordered tree of size 1 + 3n in which each non-leaf node has three children, such that all trees are equally likely. This corresponds to a conditional Galton-Watson tree (of size 1 + 3n) with p_0 = 2/3 and p_3 = 1/3. We call the random network with this underlying evolutionary tree the uniform Apollonian network. With methods from analytic combinatorics, Darrasse and Soria [27] studied the degree distribution in this network. With similar techniques, Bodini, Darrasse and Soria [21] investigated typical distances. Relying on more probabilistic arguments, Albenque and Marckert [3] proved that a uniform Apollonian network possesses the same scaling limit as its evolutionary tree, namely the Brownian continuum random tree. In particular, typical distances and the diameter of the graph grow proportionally to the square root of the number of nodes. As a result, uniform Apollonian networks exhibit strikingly different behaviour from random recursive ones.
The length of the longest simple path L_n^{unif} in a uniform Apollonian network is bounded from below by the size of any binary subtree embedded in the evolutionary Galton-Watson tree divided by two. (In fact, this is a deterministic bound valid in any Apollonian network, and the argument is essentially given in [31, Section 4], albeit in a different language. For the reader's convenience, we reproduce the proof in Appendix A.) In particular, L_n^{unif} is larger than half the size of the 2-heavy tree. Therefore, by Theorem 1.1, and contrary to the situation in recursive Apollonian networks, the length of the longest simple path is not sublinear: there exists c > 0 such that
Similarly to (1.4), any c satisfying the last display is bounded away from 1. This follows from Lemma 3.1 in [31], which states that any simple path in an Apollonian network visits at most eight grandchildren of any vertex in the evolutionary tree, and from the fact that a positive proportion of nodes in a conditional Galton-Watson tree with p_0 = 2/3 and p_3 = 1/3 have nine grandchildren.

Notation
Throughout the paper, we use
Here, for a set of integers J, gcd(J) denotes the greatest common divisor of all elements in J. From Bézout's lemma, it follows that I = (Nh + 1) \ A for some finite set A ⊆ N. We write
• T for a generic realization of the unconditional Galton-Watson tree,
• T_1, T_2, . . . for a sequence of independent copies of T,
• τ_n, n ∈ I, for T conditional on having size n.
T and τ_n are considered as ordered rooted trees. For v ∈ τ_n, we let
• ξ(v) be the number of children of v,
• N(v) be the size of the subtree rooted at v,
• H(v) be the height of the subtree rooted at v,
• N_k(v) be the size of the k-th largest subtree rooted at a child of v, abbreviating
We write ∅ for the root and abbreviate ξ and N_{k+} = N_{k+}(∅). (To increase readability, we suppress the dependence on n of these quantities in the notation.) For n ∈ I, if the context requires the indication of the size of the tree, we also write ξ_∅(n), N(n), H(n), N_k(n) and N_{k+}(n) for the corresponding quantities in τ_n when referring to the root ∅. Finally, we let
In all sections of this work with the exception of Section 6.2, all constants except c, c_1, c_2, . . . , C, C_1, C_2, . . . carry fixed values. The values of constants used multiple times may vary between two results or proofs but not within. Here, constants C, C_1, C_2, . . . > 0 are meant to carry large values, whereas c, c_1, c_2, . . . > 0 are typically small.

Outline
The paper is organized as follows: first, in Section 2, we recall standard material on the size of the Galton-Watson tree T as well as recent results about the number of fringe trees in τ_n due to Janson [37]. We then state some related preliminary bounds in Lemma 2.2 and Corollary 2.3 for later purposes. In Section 3 we study the distribution of the subtree sizes of the children of the root in the conditional Galton-Watson tree. Most notably, we provide bounds on the corresponding distribution functions in Theorem 3.4. Apart from applying these bounds in subsequent sections, we think they are of independent interest. In Section 4 we study the 2-heavy tree and prove Theorem 1.1. Here, the proof of (1.3), the main part of the work, relies on a second moment argument. Section 5 is devoted to the proof of Theorem 1.2. While the upper bound in part (i) follows rather straightforwardly from our tools derived in earlier sections, the lower bound in (ii) relies on deeper results on the concentration of the number of fringe trees in [37]. Finally, in Section 6 we study the heavy path. The techniques used in this section differ substantially from the remaining content of the paper. In particular, Section 6.2 can be read independently of the remainder of this work.

Preliminary results and fringe trees
Let us start by recalling some classical results which have proved fruitful in the analysis of conditional Galton-Watson trees. Throughout this section we use the notation α, h, I, I_n, T, τ_n and Y_k as introduced in Section 1.4. Recall the following well-known identity going back to Dwass [30] (compare also Janson [36, Theorem 15.5] and the discussion therein), (2.1)
More generally, for independent copies T_1, T_2, . . . of T,
Similarly, as n → ∞, n ∈ Nh + 1, (2.5)
By summation, using (2.1) and (2.5), as t → ∞, (2.6)
For n ∈ I, 1 ≤ k ≤ n, the study of Y_k is closely related to the analysis of a random fringe subtree τ*_n of the conditional Galton-Watson tree τ_n, that is, a subtree of τ_n rooted at a uniformly chosen node. For example, we have
The study of random fringe subtrees was initiated by Aldous [4], who showed that, under assumption (1.1), for all finite ordered rooted trees t. In particular, P(|τ
For generalizations of Aldous' results, see Bennies and Kersting [14] and Janson [36]. More recently, Janson [37, Theorem 1.5] obtained finer results on subtree counts, in particular estimates and asymptotic expansions for the variance and a central limit theorem. We summarize special cases of his results in the following proposition. The exact expressions for mean and variance are contained in [37, Lemma 5.1] and [37, Lemma 6.1]. The uniform estimate on the variance (2.9) follows from [37, Theorem 6.7].
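As a numerical sanity check, the classical form of the identity attributed to Dwass, P(|T| = n) = n^{−1} P(S_n = −1) with S_n = ξ_1 + . . . + ξ_n − n, can be verified exactly for small n. That this is the form of (2.1) is an assumption of the sketch below, as are all function names. The tree-size distribution is computed independently of the identity, by iterating the generating function equation T(z) = z Φ(T(z)), where Φ(z) = Σ_i p_i z^i.

```python
from fractions import Fraction

def series_mul(a, b, n_max):
    """Product of two power series (coefficient lists), truncated after z^n_max."""
    out = [Fraction(0)] * (n_max + 1)
    for i, a_i in enumerate(a):
        if a_i == 0:
            continue
        for j, b_j in enumerate(b):
            if i + j > n_max:
                break
            out[i + j] += a_i * b_j
    return out

def tree_size_pmf(p, n_max):
    """Exact P(|T| = n) for n = 0..n_max, by fixed-point iteration of T = z * Phi(T)."""
    t = [Fraction(0)] * (n_max + 1)
    for _ in range(n_max):  # each iteration fixes at least one more coefficient
        phi_t = [Fraction(0)] * (n_max + 1)
        power = [Fraction(1)] + [Fraction(0)] * n_max  # t^0
        for p_i in p:
            for j in range(n_max + 1):
                phi_t[j] += p_i * power[j]
            power = series_mul(power, t, n_max)
        t = [Fraction(0)] + phi_t[:n_max]  # multiply by z
    return t

def walk_at_minus_one(p, n):
    """Exact P(S_n = -1) = P(xi_1 + ... + xi_n = n - 1) by repeated convolution."""
    dist = [Fraction(1)]
    for _ in range(n):
        dist = series_mul(dist, p, n)
    return dist[n - 1]
```

For the offspring distribution p_0 = p_2 = 1/4, p_1 = 1/2, both sides agree for every n that was checked, for example P(|T| = 3) = 5/64.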

Lemma 2.2. There exists a constant n_0 ≥ 1 such that, for all n ≥ n_0, n ∈ I, k
Similarly, there exist constants n_1 ≥ 1 and ς > 0, such that, for all n ≥ n_1, n ∈ I, k ∈ I_n ∩ I,
Proof. By an application of (2.4) and (2.5) to (2.8), there exists n_0 ≥ 1, such that, for all
Proof. By applications of the upper bound in the previous theorem, we have
The claim follows by summing the three terms.

Subtrees of the root: local convergence
We want to understand the properties of the subtree sizes of a node in a Galton-Watson tree conditional on having size n when these trees are ordered from large to small. This section contains key inequalities that will be needed throughout the paper.
Let us give more details on the size-biased Galton-Watson tree mentioned in Section 1.1. Its construction goes as follows: Let ζ_1, ζ_2, . . . be an infinite sequence of independent random variables drawn from the size-biased distribution (i p_i)_{i≥0}. Associate ζ_i with the i-th node on a one-sided infinite path (the spine). To every node i on the path assign (ζ_i − 1) children off the path, and make each child the root of an independent (unconditional) Galton-Watson tree. The ordered infinite size-biased Galton-Watson tree is obtained by choosing a uniform ordering on the children of every node on the infinite spine. A formulation of the local convergence result discussed in Section 1.1 is given in the following well-known proposition, which is equivalent to Lemma 1 in Devroye [29] and closely related to Lemma 1.14 in Kesten [40]. (The convergence of ξ_∅ had already been obtained by Kennedy [39].) Here, by S^↓, we denote the set of non-negative integer valued sequences x_1, x_2, . . . with x_1 ≥ x_2 ≥ . . . and only finitely many non-zero elements.
For k ≥ 1 and 1 ≤ i ≤ k, and real-valued random variables X_1, . . . , X_k, denote by (For random trees T_1, . . . , T_k, we simplify the notation and write |T_{(i:k)}| for the size of the i-th largest tree.)
where T_1, T_2, . . . , ζ are independent and T_1, T_2, . . . are copies of the unconditional Galton-Watson tree T. In distribution, ξ_∅ → ζ, where ξ_∅ is the number of children of the root of τ_n.
We are interested in tail bounds on N_k, k ≥ 2. The order is suggested by the behaviour of the limiting random variable. Note that, in point (ii) of the following proposition, we write f(t) = Ω(g(t)) as t → ∞ for functions f : R → R and g : R → (0, ∞), meaning that there exists a constant c > 0 such that, for all t sufficiently large, |f(t)| ≥ c g(t).
Proof. We have
By (2.6), the right-hand side is asymptotically equivalent to
Again, the right-hand side is of order t^{−k/2}. For (iii), since E[ξ^{k+1}] = ∞, for any C > 0, find K sufficiently large such that
As t → ∞, using (2.6), the right-hand side is equivalent to C (2α h^{−1})^k t^{−k/2}. As C was chosen arbitrarily, the final assertion of the proposition follows.
The next theorem gives corresponding results for the conditional Galton-Watson tree. In this context, recall that we write ξ_∅ for the number of children of the root in τ_n.
(i) If E[ξ^{k+1}] < ∞, then there exists a constant β_k > 0, such that, for all t ≥ 1, n ∈ I, (3.1)
(ii) If Σ_{ℓ≥k} p_ℓ > 0, then, for any 0 < ε < 1, there exist constants β* > 0 and n_2 ≥ 1, both depending only on k and ε, such that, for all n ≥ n_2, n ∈ I, and
(iii) Finally, if k ≥ 3 and E[ξ^k] = ∞, then, for any sequence ω_n tending to infinity and ε < 1/(k + 1), we have
Remark 3.5. The proof of Theorem 3.4 (i) shows the following stronger result: for k ≥ 2, there exists a constant C > 0 such that, for all n ∈ I, ℓ ≥ k and t ≥ 1,
Lemma 5.3 is the only result in this work that requires this stronger bound.
and the moment condition on this random variable in order to have tails decaying as in Proposition 3.3 (i) is tight for k ≥ 3, it is reasonable to conjecture that a tail bound such as (3.1) holds if and only if E[ξ^k] < ∞. Statement (3.2) shows that the latter is indeed necessary.
From Theorem 3.4 we deduce the following corollary using the well-known formula
The remainder of this section is devoted to the proof of Theorem 3.4. In this context the following two observations are useful. From (2.2) and (2.3), it follows that there exists ω_1 > 0 such that sup k>0,k∈Nh−n
Similarly, there exist n_5 ∈ N and ω_2 > 0 such that, for all n ≥ n_5 and k ≤ √n with n − k ∈ Nh, (3.5)
Lemma 3.8. Let T_1, T_2, . . . be independent realizations of the Galton-Watson tree T. For all ℓ, t, n ≥ 1 and 1 ≤ k < ℓ,
Lemma 3.9. Let T_1, T_2, . . . be independent realizations of the Galton-Watson tree T. Let 2 ≤ ℓ < m and 0 < ε < 1/2.
(i) There exist n_6, n_7 ≥ 1 and c_1 > 0 depending only on ℓ and ε, such that, for n ≥ n_6, n − ℓ ∈ Nh and n_7 ≤ t ≤ (1 − ε)n/ℓ, we have
(ii) There exist n_8, n_9 ≥ 1 and c_2 > 0 depending only on ℓ, m and ε, such that, for n ≥ n_8, n − m ∈ Nh and n_9 ≤ t ≤ (1 − ε)n/ℓ, we have
The two lemmas rely on the following simple results.
the sum is bounded from above by
If min(a, b) = 1, then the sum on the right-hand side is equal to ζ(3/2) < √8, which shows the claim. Otherwise, using the monotonicity of x ↦ x^{−3/2}, we have the bound dx which is easily seen to be bounded by 8/ min(a, b). This concludes the proof.
(ii) Since n − b ≥ a + εn, the sum is bounded from below by where we used a
To show the second inequality, note that
Upon increasing C if necessary, the sum is bounded from below by a term of the order n^{−3/2}. This concludes the proof.
Proof of Theorem 3.4. (i) We may assume n ∈ I and t ≥ 1. First,
By Lemma 3.8,
Since E[ξ^{k+1}] < ∞, the second factor in this display is bounded. Inequality (3.1) now follows by approximating P(|T| = n) with the help of (2.1) and (2.5).
To move from N_k to N_{k+}, note that, for non-negative numbers u_1, . . . , u_n, t, in order to have u_1 + . . . + u_n ≥ t, we need to have max(u_1, . . . , u_n) ≥ t/n. Thus, P(N
The second summand is bounded from above by
Since E[ξ^{(3k+1)/2}] < ∞, using the same ideas as above, the last term is at most of order t^{(1−k)/2}. Further, by Markov's inequality, using Proposition 3.1,
(iii) Let K ≥ k + 2 be an integer. We suppose that h = 1 for the sake of presentation. Using the first statement in Lemma 3.9, there exists c > 0 depending only on the offspring distribution and k, ε but not on n or K, such that, for all sufficiently large n ∈ I and ω_n ≤ t ≤ εn, we have
Using the asymptotic expansion of P(|T| = n) stemming from (2.1) and (2.5), it follows that lim inf
The assertion (3.2) follows since the right-hand side becomes arbitrarily large as K → ∞.

The 2-heavy tree
Let T be a fixed finite ordered rooted tree whose root shall be labeled ∅. As in Section 1.2, to each node v ≠ ∅, we assign the rank ρ_v, where ρ_v = i if its subtree is the i-th largest among all the subtrees rooted at its siblings. Ties are broken by the original order in the tree.
be the nodes on the path connecting the root to v where v_i has depth i. The path from ∅ to v has nodes of indices ρ_{v_1}, . . . , ρ_{v_k} = ρ_v. It is called the index sequence of v and denoted by R(v). We define R(∅) = ∅ as the empty word. For a set A of words of finite lengths over the alphabet N, we set V(A) = {v ∈ T : R(v) ∈ A}. Further, for B ⊆ N, we write B* for the set of all finite length (even 0) words with symbols drawn from B. For example, V({1}*) is the set of nodes in T all of whose ancestors, as well as the node itself, have index 1, together with the root. Of course, these nodes form the heavy path. Furthermore, we recover the k-heavy tree V({1, . . . , k}*) of T by removing from T all nodes of index strictly larger than k and their subtrees. For k = 2, we obtain the 2-heavy tree. The 2-heavy Galton-Watson tree is denoted by 𝓑_n and its size by B_n.
It is tempting to think that B_n is increasing in probability or, at least, in mean. The following example shows that this is not the case. Let p_0, p_2, p_5 > 0 with p_0 + p_2 + p_5 = 1. Then, on the one hand, almost surely, τ_5 is binary and B_5 = 5. On the other hand, almost surely, τ_6 consists of the root with five children. Thus B_6 = 3. Note that this issue cannot be avoided by imposing an aperiodicity condition such as p_i > 0 for all i.
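The index sequences R(v) and the sets V({1, . . . , k}*) can be computed explicitly, which also makes it easy to check the example above. The Python sketch below is illustrative only: trees are assumed to be given as children lists with root 0 (children listed in their original order), words are represented as tuples, and the function names are hypothetical.

```python
def index_sequences(children):
    """Map each node v to its index sequence R(v), represented as a tuple of ranks."""
    n = len(children)
    order, stack = [], [0]
    while stack:  # depth-first traversal: every parent is listed before its children
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    size = [0] * n
    for v in reversed(order):
        size[v] = 1 + sum(size[c] for c in children[v])
    words = {0: ()}  # R(root) is the empty word
    for v in order:
        # Rank siblings by decreasing subtree size, ties broken by original order.
        positions = sorted(range(len(children[v])),
                           key=lambda i: (-size[children[v][i]], i))
        for rank, i in enumerate(positions, start=1):
            words[children[v][i]] = words[v] + (rank,)
    return words

def heavy_tree_size(children, k):
    """Number of nodes whose index sequence is a word over {1, ..., k}."""
    words = index_sequences(children)
    return sum(1 for word in words.values() if all(r <= k for r in word))
```

For the star with five leaves (the only possible shape of τ_6 in the example), the 2-heavy tree has size 3, whereas both binary trees with five nodes coincide with their 2-heavy trees.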
Proof of Theorem 1.1 (ii). We show by induction that there exist ν_1, ν_2 > 0 such that for all n ∈ I. First, since B_i = i for i ∈ {0, 1, 2, 3} ∩ I, we need to have
Here, in the last step, we have used that
From here, the claim b(n
The last expression and all inequalities in (4.2) can simultaneously be satisfied by .
To prove the first part of Theorem 1.1, we return to the setting of a fixed ordered rooted tree T. For a node v ∈ T, denote by n(v) the size of the subtree rooted at v. (We use n(v) rather than N(v) to emphasize that we work in a fixed tree T.) Let B = V({1, 2}*) denote the 2-heavy tree in T. For M ≥ 2, let T_1 be the binary subtree of B containing all nodes with subtree sizes (with respect to the original tree T) at least M. That is, the vertex set of T_1 is {v ∈ B : n(v) ≥ M}. Then, let V_2 be the set of nodes in B with graph distance 1 from T_1. By construction, n(v) ≤ M − 1 for v ∈ V_2. Furthermore, let V_4 be the subset of nodes v ∈ T which are in a subtree rooted at a node in V_2. (In particular, V_2 ⊆ V_4.) Next, let V_3 be the set of nodes in T which are neither in T_1 nor in V
We arrive at the useful inequality, noting that the set V_3 and its size depend on the choice of M.
Proof of Theorem 1.1 (i). Let T_1(τ_n) (V_3(τ_n), respectively) be T_1 (V_3, respectively) in the conditional Galton-Watson tree τ_n and recall that these quantities depend on M. We show that, for some (in fact, all sufficiently) large M, there exists 0
If p_0 + p_1 + p_2 = 1, then B_n = τ_n, and the statement of the theorem is trivial. Thus, we shall assume p_3 + p_4 + . . . > 0. Recall that, for a node v ∈ τ_n, we write N(v) for its subtree size, N_k(v) for the size of the k-th largest subtree rooted at a child of v and
It is crucial to note that, given the random set {v ∈ τ_n : N(v) = k}, the terms in this sum are independent and distributed like N_{3+} in the tree τ_k. By construction,
Let ω_n = n^{1/5}. (Any integer sequence tending to infinity which is o(n^{1/4}) would work.) For n ≥ 0, denote by N_{3+}(n) a generic random variable with the distribution of N_{3+} in τ_n. (We set N_{3+}(n) := 0 for n ∉ I.) By Markov's inequality and Corollary 3.7 (iii), there exists C_1 > 0 such that, for all n ∈ I and γ > 0,
By Corollary 2.3, there exists a constant C_2 > 0 such that, for all n sufficiently large, we have for all γ > 0. For all γ > 0, the right-hand side of this display tends to zero as n → ∞.
Let Z_1, Z_2, . . . be independent copies of N_{3+}(k) and
Further, for any 1
We use Chebyshev's inequality to bound both summands in the last expression. Applying (2.3) to (2.8) and using (2.9) shows the existence of a constant C_3 > 0 such that, for all n ∈ I,
By similar arguments also relying on Corollary 3.7 (iv), there exists C_4 > 0 such that, for all n ∈ I,
Here, we have also used the fact that lim inf_{n→∞, n∈Nh+1} E[N_{3+}(n)] > 0. Hence, the second summand in (4.4) converges to zero as n → ∞. By Corollaries 2.3 and 3.7, there exists a constant C_5 > 0 (depending on the offspring distribution but not on M or n) with ) is identically 0, and we deduce for any γ ∈ (c_5, 1).

Distances
The aim of this section is to prove Theorem 1.2. The following result is closely related to this theorem.
(i) If E[ξ^{k+1}] < ∞, then, for any ε > 0, there exists C_1 > 0 such that, for all n ∈ I, (5.1)
Σ_{ℓ≥k} p_ℓ > 0, then, for any ε > 0, there exists c_1 > 0 such that, for all n ∈ I,
Let us briefly discuss this result and Theorem 1.2. First of all, the lower bounds (1.6) and (5.2) are much harder to obtain than the upper bounds (1.5) and (5.1), where (1.6) follows very easily from (5.2) using known tail bounds on the height of τ_n (see (5.4)).
Second, in light of Theorem 3.4 (ii), the moment conditions imposed in (1.6) and (5.2) are somewhat unexpected. Indeed, we believe that these results are valid under the finite variance assumption on the offspring distribution in (1.1). However, since our proof uses the second moment method and involves suitable bounds on variances which crucially rely on the estimates in Theorem 3.4 (i), we cannot remove these conditions. Third, similarly to the statement of Theorem 3.4 (iii), we can make the following two statements about the necessity of moment conditions in order to have tightness of the sequence
These claims lead to the following proposition accompanying Theorem 1.2, where we recall that A_k stands for the set of subtrees of τ_n in which every node has at most k children.
Proposition 5.2.Consider a Galton-Watson tree whose offspring distribution satisfies (1.1) conditional on having size n.
At this point, it is necessary to discuss results on the height $H_n$ of $\tau_n$, that is, the maximal distance of a node from the root, in more detail. In accordance with Aldous' theory on conditional Galton-Watson trees, the scaling limit of $H_n$ is given by the maximum of a Brownian excursion. More precisely, where $H_\infty$ has the theta distribution. That is, (5.3) In this generality, this limit theorem goes back to Kolchin [41, Theorem 2.4.3]. In the case of Cayley trees, (5.3) had already been discovered by Rényi and Szekeres [50] and, for full binary trees, that is, $p_0 = p_2 = 1/2$, by Flajolet and Odlyzko [33].
The rest of this section is devoted to the proofs of Theorems 1.2 and 5.1, Proposition 5.2 and the two statements (i), (ii) above.

Upper bounds
Recall the definition of the index sequences $R(v)$, $v \in \tau_n$, from Section 4, that $H(v)$ denotes the height of the subtree rooted at $v$ in $\tau_n$, and that we write $B^*$ for the set of all finite-length words with symbols from a set $B \subseteq \mathbb{N}$. As $v \mapsto R(v)$ maps from $\tau_n$ to $\mathbb{N}^*$, we define families of random variables $H^*(y), N^*(y)$, $y \in \mathbb{N}^*$, by $H^*(R(v)) := H(v)$ and $N^*(R(v)) := N(v)$ for $v \in \tau_n$, and $H^*(y) = N^*(y) = 0$ if $y \notin \{R(v) : v \in \tau_n\}$. In particular, for $\ell \ge 1$, $H^*(\ell)$ describes the height of the subtree rooted at the child with rank $\ell$ of the root in $\tau_n$. Lemma 5.3. Let $k \ge 2$ and $E[\xi^{k+2}] < \infty$. Then, there exists a constant $C > 0$ such that, for all $n \in I$ and $t \ge 1$, Proof. Let $\{H_i(n) : n \in I, i \ge 1\}$ be a family of independent random variables where each $H_i(n)$ is distributed like the height of $\tau_n$. Furthermore, assume that the family is independent of $\tau_n$. Using (5.4), we have By inequality (3.3) in Remark 3.5, there exists $C_1 > 0$ such that the right-hand side of the last display is bounded from above by Here, $\Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\,dt$ denotes the Gamma function. This concludes the proof.
The proposition immediately yields statement (5.1) in Theorem 5.1.
Proof.The left hand side is zero for t ≥ n/2 .Thus, we assume t ≤ n/2 − 1.Note that, for all nodes v ∈ τ n with N (v) ≥ n/2 , we must have v ∈ V({1} * ).Hence, there are at most |V({1} * )| of them in the tree.Thus, by Theorem 3.4 (i), writing w 1 , . . ., w n for the nodes of τ n listed in preorder, where C 1 can be chosen independently of t and n by Lemma 2.2 and the fact that The same argument applies to N k+ (v).
In order to transfer the result to distances, we need a tighter bound when restricting to nodes on the heavy path. Recall that, for a finite or infinite deterministic set of words $A$ over the alphabet $\mathbb{N}$, we write $V(A)$ for the random set of nodes $v \in \tau_n$ with $R(v) \in A$.
As the average height of τ n is well-known to be of order √ n, we deduce the following corollary: Corollary 5.6.Let k ≥ 2 and E ξ k+1 < ∞.Then, there exists a constant C 1 > 0 such that P max If E ξ (3k+1)/2 < ∞, then the same results hold with N k (v) replaced by N k+ (v) upon possibly increasing C 1 .Finally, if E ξ k+2 < ∞, then there exists C 2 > 0, such that Proof of Lemma 5.5.For ≥ 0, let A ,n be the subset of A of vectors of length where each entry is bounded from above by n.We have (5.5) We denote the elements of A ,n by y 1 , . . ., y K , K = K( ) ≤ n .Let {N (i) k (j) : i ≥ 1, j ∈ I} be a family of independent random variables where each N (i) k (j) is distributed like N k in the tree τ j .Then, using (3.1),P max Plugging the bound into (5.5)gives The same proof works for N k+ (v).Similarly, one obtains the result for the heights upon replacing N k (v) by max ≥k H * (R(v) ) and using Lemma 5.3.
Proof.We may assume t ≥ n 0 with n 0 as in Lemma 2.2.For k ≥ 1, n ∈ I, let (H(n), N k (n), ξ(n)) be distributed like (H, N k , ξ ∅ ) in τ n .Using (5.4), we have The expectation in the last display is bounded by By Theorem 3.4 (i), there exists Here, C 2 > 1 denotes a constant which is independent of m, t and n.Summarizing and using Lemma 2.2, we obtain for some C 3 > 0. Together with Corollary 5.6 for the maximum over nodes on the heavy path, this concludes the proof.

Lower bounds
Our lower bounds rely on a variant of the second moment method which requires sufficiently tight upper bounds on variances (or second moments).To this end, we use Lemma 6.1 in Janson [37] and introduce the notation used in this work.Denote by T the set of all ordered rooted trees.For a function f : T → R, let F be defined by Here T v denotes the subtree in T rooted at v. For k ≥ 1, we abbreviate f k (T) := f (T)1 |T|=k .Note that F (f k , τ n ) = Y k for f = 1, where 1 denotes the function mapping every tree to 1.Then, for 1 ≤ m ≤ k ≤ n/2, P (S n−m = 0) P (S n = −1) , and Note that, by the crucial Lemma 6.2 in [37], cancellation effects in I 2 (f, k, m) cause this term to be of the order n (for m, k fixed), rather than n 2 .Below, we only need upper bounds on the variance which allows us to neglect I 3 (f, k, m).For i = 1, 2, we set From Lemma 2.2, we know that there exists a constant K 1 > 0 depending only on the offspring distribution such that, for all 1 ≤ t ≤ n/4, we have (5.6) Proposition 5.8.There exists a constant C > 0, such that, for all 1 ≤ t ≤ (n − 1)/4, t ∈ N and n ∈ I, we have Var(Y t ) ≤ Cn.
In particular, for any sequence t = t(n) = o(n), we have, as n → ∞, in probability, Proof.We use the notation introduced above with the function f = 1.Obviously, In the following, C i , i ≥ 1, denote constants independent of k, m, t and n, whose precise values are of no relevance.For m ≤ k, by the local limit theorem (2.3), we have By Lemma 6.2 in [37], for t ≤ m ≤ k ≤ 2t, (5.7) Hence, This finishes the proof.
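The quantity $Y_k$ above counts the nodes whose subtree has exactly $k$ nodes. As a minimal sketch (not taken from the paper), it can be computed for a concrete tree as follows. The representation is an assumption made for illustration: a tree is a list `children`, where `children[v]` lists the children of node `v`, and nodes are labelled in preorder, so every child has a larger label than its parent. The function names are hypothetical.

```python
from collections import Counter


def subtree_sizes(children):
    """Return size[v], the number of nodes in the subtree rooted at v.

    Assumes preorder labels: every child has a larger label than its parent,
    so a single backward sweep sees all children before their parent.
    """
    size = [1] * len(children)
    for v in reversed(range(len(children))):
        for c in children[v]:
            size[v] += size[c]
    return size


def fringe_counts(children):
    """Return a Counter mapping k to Y_k, the number of subtrees of size k."""
    return Counter(subtree_sizes(children))


# Illustrative tree with 7 nodes (an assumption, not the tree of Figure 1).
children = [[1, 4], [2, 3], [], [], [5], [6], []]
counts = fringe_counts(children)
print(counts[1], counts[2], counts[3], counts[7])  # prints "3 1 2 1"
```

Since every node contributes to exactly one $Y_k$, the counts sum to the size of the tree.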
For $\ell \ge 2$, let $n_\ell(T)$ denote the size of the $\ell$-th largest subtree of a child of the root in $T$. For example, in our notation, we have $N_\ell = n_\ell(\tau_n)$. For $t > 0$, let $g_\ell(T) = \mathbf{1}_{n_\ell(T) \ge t}$. (We suppress $t$ in the notation.) For $t > 0$, let $t_\ell = (\ell + 2)t$ and define Then, where, as before, we write $N_\ell^{(i)}$ for a random variable distributed like $N_\ell$ in $\tau_i$. Proposition 5.9. Let $\ell \ge 2$.
(i) If $E[\xi^{\ell+1}] < \infty$, then there exists a constant $C_1 > 0$ such that, for $n \in I$ sufficiently large and $t \le n/4$, (ii) If $\sum_{m \ge \ell} p_m > 0$, then there exist constants $C_2, K_2 > 0$ such that, for $n \in I$ sufficiently large and $C_2 \le t \le$ n/( 4), (iii) If $E[\xi^{\ell+1}] < \infty$, then there exists a constant $K_3 > 0$ such that, for all $n \in I$, $1 \le t < (n - 1)/(4(\ell + 2))$, we have Proof. The bounds on the mean in (i) and (ii) immediately follow from (5.8) and the bounds in (5.6) using the tail bounds in Theorem 3.4 (i). In (iii), we may assume $\sum_{m \ge \ell} p_m > 0$, since, otherwise, $V_t = 0$ almost surely. We then have where and where the tilde on the right-hand side indicates that the quantities are considered in the tree $\tau_k$. Combining the bounds in Theorem 3.4 (i) and (5.7), there exists $C_4 > 0$ such that Next, again using Theorem 3.4 (i), Here, we have used $(\tau_n)_v$ for the subtree in $\tau_n$ rooted at $v$. Finally, . This concludes the proof.
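To make the rank-based quantities concrete, the following sketch computes the size of the $\ell$-th largest child subtree at a node and counts the nodes where this size is at least $t$. It is an illustration under assumptions: the tree is given as a list `children` with preorder labels, the value is taken to be 0 when a node has fewer than $\ell$ children (matching the convention for $N^*$ used earlier), and all function names are hypothetical.

```python
def subtree_sizes(children):
    """size[v] = number of nodes in the subtree rooted at v (preorder labels assumed)."""
    size = [1] * len(children)
    for v in reversed(range(len(children))):
        for c in children[v]:
            size[v] += size[c]
    return size


def ranked_child_size(children, size, v, ell):
    """Size of the ell-th largest subtree rooted at a child of v, or 0 if v has fewer than ell children."""
    ranked = sorted((size[c] for c in children[v]), reverse=True)
    return ranked[ell - 1] if ell <= len(ranked) else 0


def count_nodes_with_large_rank(children, ell, t):
    """Number of nodes v whose ell-th largest child subtree has at least t nodes."""
    size = subtree_sizes(children)
    return sum(
        1 for v in range(len(children))
        if ranked_child_size(children, size, v, ell) >= t
    )


# Illustrative tree with 7 nodes (an assumption made for this example).
children = [[1, 4], [2, 3], [], [], [5], [6], []]
print(count_nodes_with_large_rank(children, ell=2, t=1))  # prints 2
print(count_nodes_with_large_rank(children, ell=2, t=2))  # prints 1
```

Here the root has two child subtrees of size 3 each, and node 1 has two leaf children, so only these two nodes have a second-largest child subtree at all.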
Proofs of Theorem 1.2 and Theorem 5.1.As already indicated, the upper bounds (5.1) and (1.5) follow immediately from Propositions 5.4 and 5.7.For the lower bound in (5.2), let ≥ 3, and note that, by Chebyshev's inequality, using the bounds in Proposition 5.9, for t and n sufficiently large with t ≤ (n − 1)/(4( + 2)), Now, (5.2) follows from Proposition 5.4 (iii) upon choosing t = cn 2/ with c > 0 sufficiently small.For the lower bound in (1.6) note that, for ε > 0, there exists n 3 > 0 such that, for all n ≥ n 3 , we have Hence, the lower bound in (1.6) follows from the lower bound in (5.2) upon choosing m = c 1 n 2/ in the last display with c 1 > 0 sufficiently small.
Proof of Proposition 5.2 and the preceding claims (i), (ii).We start with claim (i) and let 2), for any C 1 > 0 and all n ∈ I sufficiently large, we have for all i = t , . . ., 2t .Using this bound, (5.8) and (5.6), for all n large enough, we obtain As K 1 and C are fixed and C 1 was chosen arbitrarily, we deduce the assertion E [V t ] → ∞.
To show claim (ii) again set t = t(n) = Cn 2/k .By following the steps in the proof of Proposition 5.9, there exists a constant C 1 > 0 independent of n and C such that 2 ).This concludes the proof by the second moment argument in the proof of Theorem 5.1.
The arguments to deduce Proposition 5.2 are very similar to those necessary to obtain Theorem 1.2 from Theorem 5.1 and therefore only sketched.First of all, to show part (ii) note that the subtrees rooted at the children of rank 1, . . ., k of a node v with N k (v) ≥ Cn 2/k all have heights at least Cεn 1/k for some small ε with high probability depending only on ε and not on C. Since for any large C such nodes v exist with high probability by claim (ii), for any T ∈ A k , we find nodes at least Cεn 1/k away from T .In fact, this argument also yields at least Cεn 1/k /2 − 1 many nodes which have graph distance Cεn 1/k /2 − 1 from T .This simple observation explains why claim (i) implies Proposition 5.2 (i): in fact, lim inf n→∞ E |{v ∈ τ n : N k (v) ≥ Cn 2/k }| > 0 for all C > 0 would be sufficient in this context.

The heavy path
In this section, we study the heavy path $V(\{1\}^*)$ in the conditional Galton-Watson tree $\tau_n$. We set $L_n = |V(\{1\}^*)| - 1$ as in the introduction. Recall from Section 1.1 that the scaling limit of conditional Galton-Watson trees is Aldous' continuum random tree.
More precisely, define the depth-first process (or contour function), where $f(i)$, $0 \le i \le 2n - 2$, denotes the node visited in the $i$-th step of the depth-first traversal, and $d(v)$ measures the distance of a node $v$ from the root. We extend the process to a continuous function on $[0, 2n - 2]$ by linear interpolation. Endowing the space of continuous functions with the supremum norm, we have, where $e$ is a standard Brownian excursion. This is Aldous's Theorem 2 [6]. As already indicated in the introduction, the heavy path can be defined in the continuum random tree making use of its definition based on a Brownian excursion. Therefore, using (6.1), convergence of $L_n / \sqrt{n}$ boils down to an application of the continuous mapping theorem.
The technical steps in this context leading to Theorem 1.3 are intricate and of an entirely different flavour than the arguments in the rest of the paper. Therefore, we defer the proof of this theorem to Section 6.2.
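As an illustration of the depth-first process described above, the sketch below lists the depth of the node visited at each of the $2n - 1$ steps $i = 0, \dots, 2n - 2$ of the depth-first traversal. The list-of-children representation and the function name are assumptions made for this example; the recursion is only suitable for small trees.

```python
def contour_depths(children, root=0):
    """Depths d(f(i)) for i = 0, ..., 2n - 2 along the depth-first traversal.

    Each edge is walked once downwards and once upwards, so a tree with
    n nodes yields 2(n - 1) + 1 recorded depths.
    """
    depths = [0]

    def visit(v, depth):
        for c in children[v]:
            depths.append(depth + 1)  # step down to the child
            visit(c, depth + 1)
            depths.append(depth)      # step back up to v

    visit(root, 0)
    return depths


# Illustrative tree with 7 nodes (an assumption made for this example).
children = [[1, 4], [2, 3], [], [], [5], [6], []]
print(contour_depths(children))
# prints [0, 1, 2, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0]
```

The maximum of the sequence is the height of the tree, which is 3 for this example.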
It turns out that $L_\infty$ can be represented as an exponential functional of a subordinator $\xi(t)$, $t \ge 0$, that is, $L_\infty = 2\sigma^{-1} \int_0^\infty e^{-\frac{1}{2}\xi(t)}\,dt$. Such quantities have applications in various fields such as self-similar Markov processes and mathematical finance. We refer to Bertoin and Yor [18] for a survey. In particular, as worked out in detail in Section 6.2, the existence of a density for $L_\infty$ as well as the formula for the moments follow from general results on exponential functionals due to Carmona, Petit and Yor [24].
Remark 6.1.As stated in Theorem 1.3, we also prove functional limit theorems (after rescaling) for the quantities ) and (6.7) in Theorem 6.12.The limiting random variables can be expressed in terms of the subordinator ξ involving a random time-change.
It is natural to compare L n to the height H n .In particular, since L n ≤ H n , the bound (5.4) on the tail of H n also applies to the right tail of L n .For the limiting behaviour, we have Our next result shows that the decay of the distribution function of T ∞ is considerably slower at 0. Still, all its derivatives vanish at 0. Proposition 6.2.We have The proof of the first part of the proposition relies on sandwiching the random variable L n / √ n between two quantities admitting series representations of the form ∞ i=0 ρ i Z i for some 0 < ρ < 1 and a sequence of independent and identically distributed random variables Z 1 , Z 2 , . . .It is presented in Section 6.1.The second claim shown in Section 6.2 uses a result due to Rivero [51] on the right tail decay of exponential functionals of subordinators (see (6.14)).(There are also general results on the left tail decay of exponential functionals.Compare, e.g.Pardo, Rivero and van Schaik [48] and the references given therein.We did however not find any result in the literature covering our case.)
Proof of Proposition 6.2 (Lower bound).Assume h = 1 for the sake of presentation.Fix 0 < δ < 1/2 such that (1 − δ) i n / ∈ N for all i, n ≥ 1 and set c := 1/(1 − δ).For i ≥ 0, let e i ∈ {1} * be the vector of 1 s of length i and σ δ and β * be as in Theorem 3.4 (ii) with k = 2 and the chosen ε.The crucial observation is that there exist C 1 , C 2 > 0 such that, for all n ≥ C 1 and j ≤ log c n − C 2 , we have, stochastically, where G 1 , G 2 , . . . is a sequence of independent geometrically distributed random variables on {1, 2, . ..} and G i has success parameter β * / δ(1 − δ) i−1 n.Taking (6.2) for granted, we obtain, in a stochastic sense, A simple direct computation using nothing but 1 + x ≤ e x , x ∈ R, shows that a geometrically distributed random variable with success probability 0 < p < 1 is stochastically smaller than 1 + E/p where E has the standard exponential distribution.It follows that, in probability, where E 1 , E 2 , . . . is a sequence of independent random variables each of which having the standard exponential distribution.Hence, in probability, From here, the lower bound on the limit inferior follows from Lemma 6.3 since we can choose δ arbitrarily close to 1/2.
It remains to prove the bound (6.2).Let t ∈ N, j ≥ 0 and n ∈ I.Then, Now, we specify C 1 , C 2 as follows: first, let n 2 ≥ n 2 with n 2 as in Theorem 3.4 (ii) (with k = 2 and the chosen ε) such that p ∈ I for all p ≥ n 2 .Then, let C 2 be large enough such that . Thus, by Theorem 3.4 (ii), the right hand side of the last display is bounded from below by Hence, P (σ j+1 ≥ t) ≤ P (σ j + G j+1 ≥ t) where σ j and G j+1 are independent.Iterating the argument concludes the proof.
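The lower-bound proof uses the fact that a geometric random variable $G$ on $\{1, 2, \dots\}$ with success probability $p$ is stochastically smaller than $1 + E/p$, where $E$ is standard exponential. For $t > 1$ this amounts to $(1-p)^{\lceil t \rceil - 1} \le (1-p)^{t-1} \le e^{-p(t-1)}$, where the last step is $1 + x \le e^x$ with $x = -p$. The following sketch checks the tail inequality numerically on a grid; the function names and the chosen grid are assumptions made for illustration.

```python
import math


def geometric_tail(p, t):
    """P(G >= t) for G geometric on {1, 2, ...} with success probability p."""
    return (1.0 - p) ** max(math.ceil(t) - 1, 0)


def shifted_exponential_tail(p, t):
    """P(1 + E/p >= t) for a standard exponential random variable E."""
    return math.exp(-p * max(t - 1.0, 0.0))


def dominated(p, ts, tol=1e-15):
    """Check P(G >= t) <= P(1 + E/p >= t) for every t in ts, up to rounding."""
    return all(
        geometric_tail(p, t) <= shifted_exponential_tail(p, t) + tol
        for t in ts
    )


grid = [0.5 + 0.25 * i for i in range(200)]
print(all(dominated(p, grid) for p in (0.01, 0.3, 0.9)))  # prints True
```

A numerical check of this kind only illustrates the inequality; the argument in the text is what establishes it for all $t$.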
Proof of Proposition 6.2 (Upper bound).First of all, as it will become clear in the formulation of Theorem 6.12 in the next section, the scaling limit σL ∞ /2 does not depend on the offspring distribution.Hence, we may assume that p 0 = p 2 = 1/2.In particular, σ = 1.Next, let {U i,j : i, j ≥ 1} be a family of independent random variables with the uniform distribution on [0, 1].Let 2 < a < a be non-algebraic.For i ≥ 1, define Fix k ∈ N (large).We will show that for all n sufficiently large, stochastically, For now, let us use this bound to conclude the proof of the proposition.Note that the random variable U −2 1,1 is in the domain of attraction of a non-negative stable distribution with index 1/2.More precisely, for some c > 0, The limit law is the Levy distribution with density c/(2π)x −3/2 e −c/(2x) on [0, ∞).A straightforward computation shows that S −1/2 is distributed like c −1/2 |N |, where N has the standard normal distribution.In particular, for any x > 0, as n → ∞, It follows that, for x > 0, where N 1 , N 2 , . . .are independent standard normal random variables.Since the left hand side does not depend on k, we may substitute k = ∞ on the right hand side.Lemma 6.3 concludes the proof since we can choose a > 2 arbitrarily.It remains to prove (6.3).To this end, for i ≥ 1, define P i = max{N (j) : N (j) ∈ [nm i , na −i+1 ]}.Subsequently, assume that n ≥ 4a k a /(a − 2).Then, since for all nonleaves v ∈ τ n , we have N * (R(v)1) ≥ (N (v) − 1)/2, a simple computation shows that the quantities P 1 , . . ., P k are well-defined.Let t > 0.Then, Observe that, conditionally on P k = x, the random variables (Q 1 , . . 
., Q k−1 ), Q k are independent.Hence, The crucial observation is that, conditionally on P k = x, the random variable Q k is stochastically larger than R k .To see this, note that, by Theorem 3.4 (i), we know that N 2 ≥ β 2 2 U −2 1,1 in probability.Hence, for any nm k ≤ x ≤ na −k+1 and y ≥ 1, using the notation from the previous proof, we deduce We conclude Iterating gives the desired claim and finishes the proof.

Proof of Theorem 1.3 and further results
To keep this section self-contained, let us recall some definitions. For a discrete ordered rooted tree $T$, the heavy path is defined as the unique path from the root to a leaf which always continues in the largest subtree. Here, ties are broken considering the preorder index. It is easy to read off the length of the heavy path from the depth-first search process encoding $T$ since each excursion above a level corresponds to a subtree. Thus, starting with the interval $I_0 := [0, 2|T| - 2]$ at time 0, given the interval $I_i$ at time $i \ge 0$, $I_{i+1}$ is chosen as the largest subinterval of $I_i$ corresponding to an excursion above level $i + 1$. We now extend the concept to arbitrary continuous excursions. To this end, let We always consider $C_{ex}$ endowed with the topology induced by the supremum norm $\|f\| = \sup_{t \in [0,1]} |f(t)|$.
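The discrete heavy path described above can be sketched in a few lines. The code below walks from the root to a leaf, always moving to the child with the largest subtree. Two points are assumptions made for illustration and are not fixed by the text: the tree is stored as a list `children` with preorder labels, and ties are resolved in favour of the smaller preorder index.

```python
def subtree_sizes(children):
    """size[v] = number of nodes in the subtree rooted at v (preorder labels assumed)."""
    size = [1] * len(children)
    for v in reversed(range(len(children))):
        for c in children[v]:
            size[v] += size[c]
    return size


def heavy_path(children, root=0):
    """Nodes on the heavy path from the root to a leaf.

    Ties between equally large child subtrees go to the child with the
    smaller preorder index (an assumed convention).
    """
    size = subtree_sizes(children)
    path = [root]
    v = root
    while children[v]:
        v = max(children[v], key=lambda c: (size[c], -c))
        path.append(v)
    return path


# Illustrative tree with 7 nodes (an assumption made for this example).
children = [[1, 4], [2, 3], [], [], [5], [6], []]
path = heavy_path(children)
print(path, len(path) - 1)  # prints "[0, 1, 2] 2"
```

The length of the heavy path is the number of edges, `len(path) - 1`. In the example, the heavy path has length 2 although the tree has height 3, which shows that the heavy path need not reach a deepest leaf.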
Superlevel sets for excursions.Let V be the space of open subsets of [0,1], where open refers to the subspace topology of [0,1]  For a function f ∈ C ex and t ≥ 0, the superlevel set P f (t) = {s ∈ [0, 1] : f (s) > t} is open.The V-valued process P f := P f (t), t ≥ 0 has the following properties: (i) P f (t) ⊆ P f (s) for 0 ≤ s ≤ t, (ii) P f is right-continuous, that is, P f (t) = lim s↓t P f (s) for all t ≥ 0, (iii) P f (t) = ∅ for all t large enough, and ∈ ∂P f (s) for all 0 ≤ t < s.
Here Letting W be the set of V-valued processes satisfying (i)-(iv), the map f → P f is a bijection between C ex and W.
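For a piecewise-linear excursion, the superlevel set $P_f(t) = \{s : f(s) > t\}$ is a finite union of open intervals that can be computed directly. The sketch below assumes that $f$ is given by its values on a uniform grid of $[0, 1]$ with $f(0) = f(1) = 0$ and that $t \ge 0$; the function name is hypothetical.

```python
def superlevel_intervals(values, t):
    """Open intervals making up {s in [0, 1] : f(s) > t}.

    f is the piecewise-linear interpolation of `values` on a uniform grid,
    assumed to satisfy f(0) = f(1) = 0, and t is assumed non-negative.
    """
    n = len(values) - 1
    intervals = []
    start = None
    for i in range(n):
        a, b = values[i], values[i + 1]
        x0, h = i / n, 1.0 / n
        if a <= t < b:
            # f rises through level t on this segment: an interval opens.
            start = x0 + (t - a) / (b - a) * h
        elif a > t >= b:
            # f falls back to level t on this segment: the interval closes.
            intervals.append((start, x0 + (a - t) / (a - b) * h))
            start = None
    return intervals


values = [0, 1, 2, 1, 2, 3, 2, 1, 0]
print(superlevel_intervals(values, 0.5))  # prints [(0.0625, 0.9375)]
print(superlevel_intervals(values, 1.5))  # prints [(0.1875, 0.3125), (0.4375, 0.8125)]
```

The output illustrates property (i): the intervals at level 1.5 are contained in the interval at level 0.5.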
The heavy path construction.For O ∈ V, let m(O) denote the interval with largest length in O.In case several intervals qualify, we choose the smallest of them with respect to the order defined for intervals I, I by For a V-valued process P, we define a process P * t , t ≥ 0 with P * t ⊆ P t for all t ≥ 0 as follows: set P * 0 = P 0 and T 0 = 0.Then, inductively, for n ≥ 0, given T n and P * t for all t ≤ T n , let T n+1 = inf{t > T n : m(P * Tn ∩ P t ) ≤ 2 −(n+1) }, P * t = m(P * Tn ∩ P t ), T n < t < T n+1 , T ∞ := lim n→∞ T n is finite and bounded by inf{t ≥ 0 : P t = ∅}.For t ≥ T ∞ , we set P * t = ∅.Then, P * ∈ W and P * t is an interval for all t ≥ 0. We also define t * = lim n→∞ inf P * Tn and t * = lim n→∞ sup P * Tn .We call P trivial if P t = ∅ for all t ≥ 0. For a non-trivial process P t , t ≥ 0, two scenarios are possible: (i) T n < T ∞ for all n ≥ 1.Then, P * t is continuous at T ∞ and t * = t * .(ii) T n = T ∞ for some n ≥ 1.Then, P * t is discontinuous at T ∞ and t * < t * .
For f ∈ C ex , write P * f for P * and T f ∞ for T ∞ when P = P f .If f is the depth-first search process of a discrete ordered rooted tree rescaled on the unit interval then T f ∞ is the length of the corresponding heavy path.For a discussion of the heavy path in a general real tree, see the end of this section.Remark 6.4.The sequence T n , n ≥ 0 arising in the heavy path construction plays no role in the sequel.We could replace the sequence 2 −(n+1) , n ≥ 0 in its definition by any monotonically decreasing sequence α n , n ≥ 0 with α n → 0 and α n ≥ 2 −(n+1) .This leaves P * and T ∞ invariant.In fact, we could also let α n depend on P by setting α n = 1 2 λ(P * Tn ).O is unique.For any fixed t ≥ 0, the map f → P f (t) is not continuous on C ex .The set W is not closed when endowing the set of all V-valued processes with the topology of uniform convergence on compact sets.The following important lemma contains a positive result in the converse direction.Here and subsequently, we recall the definition of the modulus of continuity of a continuous function f on [0, 1]: By the Arzela-Ascoli theorem, for a family of continuous functions (f i ) on [0, 1], we have sup i ω fi (ε) → 0 as ε → 0 if (f i ) is relatively compact.(In other words, the family is uniformly equicontinuous.)Lemma 6.5.Let f, f n , n ≥ 1 be continuous excursions.Suppose that, uniformly on compact sets, we have d((P fn (t), P f (t)) → 0.Then, f n − f → 0.
The Skorokhod space. Let $(S, d)$ be a Polish space. By $D_S$ we denote the set of càdlàg functions with values in $S$. A function $f : [0, \infty) \to S$ is called càdlàg if, for all $t \ge 0$, it is right-continuous at $t$ and, for all $t > 0$, the left limit $f(t-) := \lim_{s \uparrow t} f(s)$ exists. $D_S$ is endowed with the Skorokhod topology: a sequence $f_n$, $n \ge 1$, converges to a function $f$ if and only if there exists a sequence of strictly increasing continuous functions $\lambda_n : [0, \infty) \to [0, \infty)$ such that $\lambda_n \to \mathrm{id}$ uniformly on $[0, \infty)$ and $f_n \circ \lambda_n \to f$ uniformly on compact sets. For details on $D_S$, we refer to Billingsley's book [19]. Again, one can easily check that $f \mapsto P_f$ is not continuous on $C_{ex}$. Further, $W \subseteq D_V$ is not closed.
The following lemma is crucial. Lemma 6.6. The set $W \subseteq D_V$ endowed with its relative topology is Polish. $W$ is measurable with respect to the Borel-σ-algebra on $D_V$. Further, the map $f \mapsto P_f$ from $C_{ex}$ to $D_V$ is measurable.
Proof.Let us first show that P → f P is continuous regarded as map W → C ex .To this end, let P, P n , n ≥ 1 be elements in W with P n → P in the Skorokhod topology.Choose a sequence λ n , n ≥ 1 of strictly increasing continuous bijections on [0, ∞) with λ n → id uniformly on [0, ∞) and P n • λ n → P uniformly on compact sets.By Lemma 6.5, f Pn•λn − f P → 0. Hence, it remains to show that f Pn•λn − f Pn → 0. But for any P ∈ W and any strictly increasing bijection λ, we have f P Clearly, if f ∈ C (1) ex , analogously for C (2) ex .
In the following lemma, recall that, for a càdlàg function f with values in a Polish space and t > 0, we have set f (t−) := lim s↑t f (s). ) Proof.It is easy to see that, for any r, s ∈ [0, 1] and f, g ∈ C ex , we have Hence, the final claim of the lemma is a direct implication of the remaining statements.If m f is continuous at ζ f (r), then we can simply choose r n = r.In this case, if r > m f (1/2), the assertions (6.4), (6.5) even hold for general f ∈ C ex .The interesting case is when m f is discontinuous at ζ f (r) which we assume from now on.Let α = inf P f (ζ f (r)−) and β = sup P f (ζ f (r)−).Since f ∈ C * * ex there exists a unique strict minimum x of f on (α, β) such that, either, i) Then, for all n sufficiently large, there exist α n < x n < β n such that P fn (s n −) = (α n , β n ) and, for i), P fn (s n ) = (α n , x n ) while, for ii), P fn (s n ) = (x n , β n ).We also have α n → α, β n → β and x n → x.All statements follow readily.Proposition 6.9.The map f → P * f is continuous at every f ∈ C * ex .
Proof.Let ε > 0 be small.Let f (0) = f and, recursively, (ii) For k ≥ 0, let P n (k) be the size of the subtree rooted at the node on level k on the heavy path.In distribution, in the Skorokhod topology on D [0,∞) , Then, in distribution, on the space of continuous functions on [0, 1], The heavy path in the Brownian Continuum tree.Interval decompositions governed by a Brownian excursion can be studied with the help of self-similar fragmentations introduced by Bertoin [16].We recall a version of Definition 2 in this work: a V-valued process F (t), t ≥ 0 with càdlàg paths is called self-similar with index α ∈ R, if (1) F (0) = [0, 1], F (t) ⊆ F (s) for all t ≥ s ≥ 0; (2) F (t) is continuous in probability at every t ≥ 0; further, given F (t) = ∪I j for t ≥ 0 and disjoint open intervals I 1 , I 2 , . .., (3) the processes (F (t + s) ∩ I j ) s≥0 , j ≥ 1 are stochastically independent; (4) for all j ≥ 1, F (t + s) ∩ I j , s ≥ 0 is distributed like F (|I j | α s), s ≥ 0 rescaled to fit on I j .
Bertoin [16] observes that P e (t), t ≥ 0 is a self-similar fragmentation process with α = −1/2.Hence, the process P * e (t), t ≥ 0 is also a self-similar process with α = −1/2.For t ≥ 0, let It follows from [16,Theorem 2] that the V-valued càdlàg process H(•) := P e ( 1 (•)) is a homogeneous interval fragmentation, that is, a self-similar fragmentation process with index α = 0. (Here, and subsequently, we abbreviate P e (∞) = H(∞) = ∅.) Homogeneous fragmentation processes were studied in detail in another work of Bertoin [15].In particular, by exploiting the connection between interval fragmentations and exchangeable partitions of the natural numbers [16, Lemmas 5 and 6], the arguments in the proof of Theorem 3 in [15] relying on a Poisson point process construction reveal that ξ(•) := − log λ(H(•)) is a subordinator, that is, an increasing non-negative càdlàg process with stationary and independent increments.By [15, Theorem 2], (the distribution) of a homogeneous fragmentation process is characterized by a unique exchangeable partition measure which is determined by an erosion coefficient c ≥ 0 and a Lévy measure ν on S * := {x ∈ R N : x 1 ≥ x 2 ≥ . . .≥ 0, i≥1 x i ≤ 1} \ {(1, 0, . ..)} with the property that S * (1 − x 1 )dν(x) < ∞.We refer to [15] for a detailed discussion of this characterization and only use the following two results: first, by the arguments in [16, Section 4], for P * e , we have c = 0, and ν is concentrated on {(x, 0, . ..) : 1/2 ≤ x ≤ 1} where the projection on the first component denoted by ν 1 satisfies Second, by the arguments in the proof of Theorem 3 in [15] the Laplace transform E [exp(−qξ(t))] , t, q ≥ 0 is given by exp(−tΦ(q)) with where 2 F 1 denotes the standard hypergeometric function.In particular, The definition and properties of Φ extend to q < 0. 
In particular, Φ is infinitely often differentiable on R and  For an overview of results on exponential functionals of Lévy processes we refer to Bertoin and Yor's survey [18].In particular, by results going back to Carmona, Petit and Yor [24] (see also [18,Theorem 2]), for k ≥ 1, .   Finally, from (6.9), using the substitution t = v/(q − 3/2), it is straightforward to show that Φ(q) ∼ √ 8q as q → ∞.Hence, the decay of the right tail of the corresponding distribution is given by [51, Proposition 2]: where e * is an independent copy of e.In particular, T e * ∞ , (ζ e (r), m e (r)) are independent while ζ e (r), m e (r) are defined using the same Brownian excursion e.Hence, T e ∞ is characterized by a family of perpetuities, one for each value of r.For more background on stochastic fixed-point equations of perpetuity type and a proof for the fact that (6.15) indeed determines the distribution of T e ∞ , we refer to Vervaat [52].For all 0 < r < 1, e (r), . . .are independent copies of ζ e (r).Similarly, in the proof of Proposition 6.2, we have shown that there exists a constant C > 0 and, for all a > 2 a constant c > 0 such that, stochastically, where N 1 , N 2 are independent standard normal random variables and E 1 , E 2 , . . ., are independent random variables with the standard exponential distribution.In fact, our proofs also revealed that, with the same constants c, C, a, in probability, ca −1/2 |N 1 | ≤ ζ e (1/2) ≤ C2 −1/2 E 1 . (6.18) Note that the lower bound in (6.17) does not follow from (6.16) and (6.18) due to the factor 1/2 in (6.16).Hence, the tail bound deduced from the discrete-time approach is stronger than the bound we could show relying only on the perpetuity (6.15).
Heavy trees in real trees. In the final paragraph, we give an outlook of the theory of heavy trees and the heavy path in the framework of real trees. We remain brief, as a full discussion of the topic would go far beyond the scope of this work. For background on real trees, we refer to Evans' book [32] and Le Gall's survey [43].
A metric space $(T, d)$ is called a real tree if it satisfies the following two points: (Again, this definition is reminiscent of the definition of the graph distance between two vertices in a discrete tree.) With $T_f = [0, 1]/\sim$ where $x \sim y$ if and only if $d_f(x, y) = 0$, $\mu_f = f_*(\mathrm{Leb})$ (the pushforward measure) and $\rho_f = f(0)$, the tuple $(T_f, d_f, \mu_f, \rho_f)$ is well known to be a compact rooted measured real tree [32, Chapter 3]. For a given compact rooted measured real tree $(T, d, \mu, \rho)$ and $x \in T$, we call the number of connected components of $T \setminus \{x\}$ the degree of $x$. (This number is at most countably infinite.) We call a point $x \in T$ a leaf if its degree is one, and write $L$ for the set of leaves. Of particular interest in the theory of random trees are those satisfying $\mathrm{supp}(\mu) = T$, and we only consider these cases from now on.
Let B be the set of branching points of T , that is, points with degree at least three.Set B * = B ∪ {ρ}.As all branching points in the Brownian continuum tree have degree three, the 2-heavy tree is equal to the entire tree).We can generalize the definition of the functions in this section to the level of real trees.For example, for 0 < t ≤ d(ρ, x), we let p y ∈ [ρ, x] be the unique element for which d(ρ, p y ) = y.Then, we set m T (t) = µ(C y 1 ).
(If y / ∈ B * , then C y 1 is to be understood as the unique component of T \ {y} which does not contain ρ.)The definition of the corresponding inverse ζ T remains unchanged: for t ∈ [0, 1], we set ζ T (t) = inf{s > 0 : m T (s) ≤ t}.A discussion of continuity of these functions is more involved.First of all, it is necessary to change perspective and consider isometry classes of real trees (or metric spaces) with respect to the so-called Gromov-Hausdorff-Prokhorov distance.(For details and definitions, see [1] and [23].)Next, it is important to observe that the function m T is invariant under isometries and can therefore be defined for isometry classes.As for continuous functions, the maps m T , ζ T are not continuous on the entire space of (equivalence classes) of real trees.Indeed, continuity of these functions can only be expected at (equivalence classes of) real trees

A Appendix
For the sake of completeness, we state and prove the lemma connecting the length of the longest simple path in the Apollonian network and the size of the largest binary subtree in the underlying evolutionary tree. Essentially, this is a reproduction of the proof of Theorem 1.2 (a) in [31].
Lemma A.1. Let $G$ be an arbitrary Apollonian network with $3 + n$, $n \ge 0$, vertices (outer vertices included) and $1 + 2n$ faces. Denote by $L$ the number of vertices on the longest simple path in $G$. Let $T$ be the corresponding evolutionary tree with $n$ non-leaves and $1 + 2n$ leaves. Then, for any binary subtree $B$ of $T$, we have $L \ge (|B| + 5)/2$.

Figure 1: A finite rooted tree of size 7 with labels given by the preorder with associated Łukasiewicz path.

Figure 2: Apollonian network of size 3 with evolutionary tree. Leaves are drawn as boxes. Note that the outer three vertices in the network have no counterparts in the tree.

Figure 3: Instance of the construction underlying the proof of Theorem 1.1.Black-filled nodes form T 1 , non-filled nodes constitute V 2 , dashed subtrees indicate V 3 , and V 4 is represented by the solid subtrees merged with V 2 .
where $d_H$ denotes the Hausdorff distance. For $O \in V$ and a $V$-valued sequence $O_n$, $n \ge 0$, we have $d(O_n, O) \to 0$ if and only if $\lambda(O_n \Delta O) \to 0$, where $A \Delta B := (A \setminus B) \cup (B \setminus A)$ and $\lambda$ denotes the Lebesgue measure on $[0, 1]$. $(V, d)$ is a compact metric space (hence Polish). Every element of $V$ uniquely decomposes into at most countably many disjoint open intervals.

Lemma 6.8.
Let f n , n ≥ 1 be a sequence of continuous excursions and f ∈ C * ex .Suppose that 0 for a sequence of continuous excursions f n , n ≥ 1. Denote by r (0) n the sequence from Lemma 6.8 with r = m f (1 − ε).Set s (0) n := ζ fn (r (0) n ) and f (0) n

• for every pair of points $a, b \in T$ there exists a unique isometry $\varphi_{a,b} : [0, d(a, b)] \to T$ for which $\varphi_{a,b}(0) = a$ and $\varphi_{a,b}(d(a, b)) = b$,
• if $q : [0, 1] \to T$ is a continuous and injective map with $q(0) = a$, $q(1) = b$, then $q([0, 1]) = \varphi_{a,b}([0, d(a, b)])$.
In words, $(T, d)$ is geodesic and loop-free and therefore generalizes the concept of a discrete tree to a continuous level. We use the shorthand notation $[a, b] := \varphi_{a,b}([0, d(a, b)])$ for the path between $a$ and $b$ in $T$. Augmenting $(T, d)$ by a probability measure $\mu$ on the Borel-σ-field on $T$ and a unique vertex $\rho$ (the root), the quadruple $(T, d, \mu, \rho)$ becomes a rooted measured real tree. In the remainder we are only interested in cases when the spaces are compact. An important construction of such spaces is via continuous excursions $f \in C_{ex}$ using the pseudometric $d_f(a, b) := f(a) + f(b) - 2 \inf\{f(s) : a \wedge b \le s \le a \vee b\}$.
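The pseudometric $d_f$ has a direct discrete counterpart when $f$ is the sequence of depths recorded along the depth-first traversal of a finite tree: the value at two time indices equals the graph distance between the nodes visited at those times. The sketch below evaluates it on such a sequence; the sample sequence and the function name are assumptions made for illustration.

```python
def excursion_distance(values, i, j):
    """d_f(i, j) = f(i) + f(j) - 2 * min of f over the indices between i and j."""
    lo, hi = min(i, j), max(i, j)
    return values[i] + values[j] - 2 * min(values[lo:hi + 1])


# Depths along the depth-first traversal of an illustrative tree with 7 nodes.
depths = [0, 1, 2, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0]

print(excursion_distance(depths, 2, 4))  # prints 2 (two siblings at depth 2)
print(excursion_distance(depths, 2, 9))  # prints 5 (path passes through the root)
print(excursion_distance(depths, 1, 3))  # prints 0 (the same node visited twice)
```

The last value shows why $d_f$ is only a pseudometric: distinct times at which the traversal visits the same node are at distance zero and are identified in the quotient $[0, 1]/\sim$.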
For each $b \in B^*$ we may order the connected components $C^b_1, C^b_2, \ldots$ of $T \setminus \{b\}$ which do not contain the root $\rho$ according to their $\mu$-masses. Note that these masses are non-zero as $\mu$ has full support. (Since real trees are not ordered, a discussion of ties is technical and omitted here.) Let $B^*_x = B^* \cap [\rho, x]$. Clearly, there exists a unique leaf $x$ such that, for all $b \in B^*_x$, we have $x \in C^b_1$. The path (T ) = $[\rho, x]$ is the heavy path in $T$ and $d(\rho, x)$ is its length. Similarly, the $k$-heavy tree can be defined as $\bigcup_{x \in L_k} [\rho, x]$, where $L_k$ is the set of leaves $x$ such that, for all $b \in B^*_x$, we have $x \in \bigcup_{i=1}^{k} C^b_i$. (Starting with a Brownian excursion e, we have T e ∞ = d e (ρ e , x).
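For intuition, the discrete analogue of the heavy path (the case $k = 1$ described in the abstract) can be sketched as follows: starting at the root, repeatedly step to the child with the largest subtree. The dictionary-of-children representation, the function names, and the tie-breaking rule (the first child among those of maximal size) are assumptions of this example, not choices made in the paper.

```python
from typing import Dict, Hashable, List


def subtree_sizes(children: Dict[Hashable, List[Hashable]], root: Hashable) -> Dict[Hashable, int]:
    """Number of nodes in the subtree rooted at each node (iterative post-order)."""
    sizes: Dict[Hashable, int] = {}
    stack = [(root, False)]
    while stack:
        node, visited = stack.pop()
        kids = children.get(node, [])
        if visited:
            # All children have been processed, so their sizes are available.
            sizes[node] = 1 + sum(sizes[c] for c in kids)
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in kids)
    return sizes


def heavy_path(children: Dict[Hashable, List[Hashable]], root: Hashable) -> List[Hashable]:
    """Path from the root to a leaf that always follows a heaviest child."""
    sizes = subtree_sizes(children, root)
    path = [root]
    while children.get(path[-1]):
        # max() returns the first maximal element, which fixes the tie-breaking.
        path.append(max(children[path[-1]], key=lambda c: sizes[c]))
    return path
```

Keeping the $k$ heaviest children at every node instead of only the heaviest one would give the $k$-heavy tree in the same way.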
Then, by part (i) of the lemma, the first factor in the sum is bounded from below by c 1 ( , ε/2)t , and subsequently,
∂O denotes the boundary of an open set $O \subseteq [0, 1]$. Conversely, for every $V$-valued process $P_t$, $t \ge 0$, satisfying (i)-(iii), we can define $f_P(t) = \sup\{s \ge 0 : t \in P_s\}$ and observe that $P_t = P_{f_P}(t)$ for all $t \ge 0$. Note that $f_P$ is lower semi-continuous. (A non-negative function $f$ on $[0, 1]$ is lower semi-continuous if and only if $P_f(t)$ is open for all $t \ge 0$.) Further, for all $f_P(t) \le s < f_P(t-)$, we have $t \in \partial P_f(s)$. In particular, it easily follows that $f_P \in C_{ex}$ if and only if $P_t$, $t \ge 0$, satisfies (iv).
Unfortunately, some technical issues arise in this construction. The map $O \mapsto \lambda(m(O))$ is continuous, and so is $(O, O') \mapsto O \cap O'$. Similarly, the map $O \mapsto \inf O$ ($O \mapsto \sup O$, respectively) is measurable and continuous at $O \in V$ if and only if $0 \in O$ ($1 \in O$, respectively). The map $O \mapsto m(O)$ is measurable and continuous at $O \in V$ if and only if the largest interval in
In a Brownian excursion, all local minima are strict and pairwise distinct. Hence, for all $x \ge 0$, the set $M_f(x)$ contains at most two elements and e ∈ C
The map $t \mapsto \zeta_f(t)$ is continuous.
Every point of discontinuity of P *
This shows the claimed continuity.
In view of Lemma 6.5, for $P, P' \in W$, define $d^*(P, P') = \|f_P - f_{P'}\| + d_{sk}(P, P')$, f (or, equivalently, of