Voronoi cells in random split trees

We study the sizes of the Voronoi cells of $k$ uniformly chosen vertices in a random split tree of size $n$. We prove that, for $n$ large, the largest of these $k$ Voronoi cells contains most of the vertices, while the sizes of the remaining ones are essentially all of order $n\exp(-\mathrm{const}\sqrt{\log n})$. This discrepancy persists if we modify the definition of the Voronoi cells by (a) introducing random edge lengths (with suitable moment assumptions), and (b) assigning different "influence" parameters (called "speeds" in the paper) to each of the $k$ vertices. Our findings are in contrast to corresponding results on random uniform trees and on the continuum random tree, where it is known that the vector of the relative sizes of the $k$ Voronoi cells is asymptotically uniformly distributed on the $(k-1)$-dimensional simplex.


Introduction
Voronoi cells. Consider a large graph $G$, from which we choose $k$ vertices uniformly at random, and denote them by $U_1, \dots, U_k$. The Voronoi cell $\mathrm{Vor}(U_j)$ of $U_j$ consists of those vertices that are closer in graph distance to $U_j$ than to any of the other chosen vertices $\{U_i : i = 1, \dots, k;\ i \neq j\}$, with an arbitrary rule to break ties. We are studying the vector of proportional sizes $\big(|\mathrm{Vor}(U_1)|/n, \dots, |\mathrm{Vor}(U_k)|/n\big)$. In recent work, Addario-Berry et al. [AAC+18] investigated this question for the case that $G$ is a uniform tree, and proved that the limiting vector is uniform on the $(k-1)$-dimensional simplex. Indeed, they showed much more, namely that this is even true in a limiting sense on the Brownian continuum random tree (henceforth CRT), and thus for all graph models that converge to the Brownian CRT in the Gromov-Hausdorff-Prokhorov topology: rooted plane trees, rooted unembedded binary trees, stacked triangulations, and others. Guitter [Gui17] proved the same uniform limit for the case that $G$ is a random planar map of genus 0 and $k = 2$. Chapuy [Cha19] made the far-reaching conjecture that the uniform limit is true for all random embedded graphs of fixed genus.
Our first contribution. In this paper, we look at the distribution of the Voronoi cells of $k$ uniform nodes in a random split tree. Split trees are a family of rooted trees introduced by Devroye [Dev98] and later extended by Janson [Jan19], who allowed trees of unbounded degrees: this family includes classical random trees such as the binary search tree, the random recursive tree, and the preferential attachment tree (also called PORT, for "plane-oriented recursive tree"). In our first main result, we prove that the largest of the Voronoi cells of $k$ uniform nodes in an $n$-node split tree contains a proportion tending to 1 of all nodes.
We are also able to prove that the second, third, ..., $k$-th largest Voronoi cells each contain of order $n\exp(-\mathrm{const}\sqrt{\log n})$ of the vertices. We show that this result also holds when the edges of the tree are given i.i.d. random lengths (of finite variance, or heavy-tailed but with finite mean), and the Voronoi cells are defined with respect to the distance induced by these edge lengths instead of the graph distance.
This result is in contrast with the findings of [AAC+18] for the uniform random tree equipped with the graph distance: the distribution of the sizes of the Voronoi cells is balanced in the case of the uniform random tree (and other trees whose scaling limit is the CRT), while we show a "winner takes it all" behaviour in the case of split trees. This difference in behaviour should not be surprising: it is well known that split trees have a very different shape from uniform random trees (and other random trees whose scaling limit is the CRT): for example, the typical height of an $n$-node split tree is of order $\log n$, while the typical height of the uniform random tree is of order $\sqrt{n}$. In that sense, split trees belong to another universality class of random trees (as opposed to trees whose scaling limit is the CRT), and our first main result corresponds to the findings of [AAC+18] for this second universality class. Similarly to [AAC+18], who conjecture that their result generalises to maps that scale to the Brownian map, one might expect that the behaviour we prove for random split trees is also exhibited by other graphs such as preferential attachment graphs and other scale-free models such as the configuration model. However, our proofs cannot be straightforwardly generalised.
Extension to a competition/epidemics model. The Voronoi cells can be seen as the result of a competition model where $k$ agents claim territory at uniform speed until they reach vertices that are already claimed by another agent. This procedure stops when all vertices are claimed by some agent; the final territories are the same as the Voronoi cells. As discussed above, our first result is that, unlike in the case of uniform trees, the final territories are rather unbalanced: while one agent claims almost the entire tree, the others have to live on rather small territories. (This behaviour also persists when we introduce random edge lengths.) One can also see this competition model as a competition between $k$ mutually exclusive epidemics, which are started at $k$ uniform vertices of a split tree, and which all spread at constant and equal speed. Our second main result is that, if the speed of transmission varies among the different epidemics, then the fastest epidemic spreads over order $n$ of the vertices. We are also able to estimate precisely the number of nodes that get infected by each of the slower epidemics.
Note that this "winner takes it all" effect has already been observed in a competing first-passage percolation model on the configuration model with tail exponent $\tau \in (2, 3)$ (see [DH16]). The main difference with our model is that epidemics spread deterministically in our model, and randomly at a given rate in the competing first-passage percolation model. In both models, one epidemic eventually occupies almost all of the available territory. In the case of different speeds, this is the fastest one, but in the case of equal speeds it is determined by the initial positions (see [BHK15, HK15], where this is proved for the competing first-passage percolation model). The competing first-passage percolation model on random regular graphs exhibits similar behaviour [ADMP17]. This suggests that the uniform limiting proportion of Voronoi cells does not hold on complex networks. Instead, our results support the belief that, for competing epidemics on networks with small distances ("small-world graphs"), there is one dominating epidemic.
Information on the typical shape of a random split tree. The asymptotic sizes of the Voronoi cells (or territories) of $k$ nodes chosen uniformly at random in a tree give information on the typical shape of the tree. In fact, to prove our main result, we prove two results that may be of independent interest because they give information about the typical shape of a random split tree: (1) in Proposition 1.9, we show convergence in probability of the "profile" of a random split tree, and (2) in Proposition 2.6, we prove asymptotic results for the size of a typical "extended" fringe tree in a random split tree.
(1) The profile of a random tree is the distribution of the height (distance to the root) of a node taken uniformly at random in the tree. If the tree is random, then its profile is a random measure. In Proposition 1.9, we show that the profile of a random split tree behaves asymptotically (in probability) as a Gaussian centred around $\mathrm{const}\cdot\log n$ and of standard deviation $\mathrm{const}\cdot\sqrt{\log n}$. Our framework includes the cases of the random binary and $m$-ary search trees, the random recursive tree and the preferential attachment trees, for which convergence of the profile is already known in the almost sure sense (see [CDJH01], [MM17], and [Kat05], respectively).
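For intuition, this Gaussian profile shape can be checked in a small simulation. The sketch below is ours, not from the paper; it assumes the binary search tree case, where the split vector is $(U, 1-U)$ and the typical depth of a uniform node is known to be $2\log n$ (the name `bst_depths` is our own).

```python
import math
import random

def bst_depths(n, rng):
    """Depths of the n nodes of a random binary search tree
    (a split tree with split vector (U, 1 - U), U uniform)."""
    root = None
    depths = []
    for _ in range(n):
        key = rng.random()
        node, parent, side, d = root, None, None, 0
        while node is not None:            # standard BST insertion walk
            parent = node
            side = 0 if key < node[0] else 1
            node = node[1][side]
            d += 1
        new = [key, [None, None]]
        if parent is None:
            root = new
        else:
            parent[1][side] = new
        depths.append(d)                   # final depth of the inserted node
    return depths

rng = random.Random(7)
n = 4000
ds = bst_depths(n, rng)
mean = sum(ds) / n
var = sum((d - mean) ** 2 for d in ds) / n
# the empirical profile concentrates around 2*log(n),
# with spread of order sqrt(log n)
```

Plotting a histogram of `ds` shows the bell shape around $2\log n$ directly.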
(2) Fringe trees are subtrees that are rooted at an ancestor of a node taken uniformly at random in the tree (or at the uniform node itself). Oftentimes, this ancestor is chosen to be at constant distance from the uniform node (see, e.g., [HJ17] and the references therein). In Proposition 2.6, we extend this definition to allow the ancestor to be at a distance from the uniform node that tends to infinity with $n$, the number of nodes in the whole split tree.
The main technical obstacles in our proofs come from the three levels of randomness: (a) the trees we consider are random split trees, (b) we then sample i.i.d. random edge lengths, and (c) we finally sample $k$ nodes uniformly at random in the tree. The advantage of our approach is that the framework we consider is very wide: the random split trees we consider include, among others, the binary and $m$-ary search trees, the random recursive tree, and the preferential attachment tree; our edge-length distribution can be of finite variance, or heavy-tailed with finite mean; and we allow the different epidemics to have identical or different speeds.
In the rest of this section, we define our model (Section 1.1) and state our main results (Section 1.2).

Trees and random split trees
In this paper, we use the Ulam-Harris definition of $m$-ary trees: let $m \in \mathbb{N}$ and let $D_m$ be the set of all finite words on the alphabet $\{1, 2, \dots, m\}$. We further consider the case of infinitary trees, where $m = \infty$ and $D_\infty$ is the set of all finite words on the alphabet $\mathbb{N}^* = \{1, 2, \dots\}$. We henceforth formulate our results for finite and infinite $m$ in a unified fashion (unless stated explicitly); finite tuples, such as in (1.1) below, should be interpreted as infinite sequences whenever $m = \infty$.
Definition 1.1. An $m$-ary tree is a subset $t$ of $D_m$ such that for all $w = w_1 \cdots w_\ell \in t$, all the prefixes of $w$ are in $t$, i.e. for all $i \in \{0, \dots, \ell\}$ one has $w_1 \cdots w_i \in t$. See Figure 1 for an example of a 3-ary tree. In the following, we collect some standard vocabulary and notations; they reflect the fact that a tree is often seen as a genealogical structure:
• words are called "nodes";
• the prefixes of a word are its "ancestors": we write $v \prec w$ if $v$ is an ancestor of $w$, and $v \preceq w$ if $v$ is $w$ or an ancestor of $w$;
• the longest of the (strict) prefixes of a word $w$ is its "parent", which we denote by $\overleftarrow{w}$;
• a node is a "child" of its parent, and it is a "descendant" of each of its ancestors;
• the "siblings" of a node $v$ are all those nodes different from $v$ that share the same parent with $v$; its "left-siblings" (resp. "right-siblings") are all its siblings that are smaller (resp. larger) in the lexicographic order;
• the word $\emptyset$ is the "root" of the tree;
• the "height" of a node is the number of letters in the word (the root is at height 0);
• the "last common ancestor" of two nodes is the longest prefix shared by the two nodes: we write $u \wedge v$ for the last common ancestor of nodes $u$ and $v$.
In particular, the definition of a tree can be immediately rephrased using this new vocabulary reflecting the genealogical point of view: a tree is a set of nodes such that if a node is in the tree, then all its ancestors must also be in the tree. We now define a probability distribution on the set of $m$-ary trees: it is the distribution of "split trees" first introduced by Devroye [Dev98], but generalised to possibly infinite arity as in [Jan19]. Let $\nu$ be a probability distribution on the simplex $\Sigma_m = \{(y_1, \dots, y_m) \in [0,1]^m : \sum_{i=1}^m y_i = 1\}$, and let $(Y(w))_{w \in D_m}$ be a family of i.i.d. $\nu$-distributed random vectors. For each node $w = w_1 \cdots w_\ell \in D_m$ with $\ell \geq 1$, we set $Z_w = Y_{w_\ell}(\overleftarrow{w})$, with $Y_{w_\ell}(\overleftarrow{w})$ denoting the $w_\ell$-th coordinate of the vector $Y(\overleftarrow{w})$ (see Figure 2 for an example: $Y(3) = (.1, .4, .5)$ and thus $Z_{32} = .4$). We also let $(X_n)_{n \geq 0}$ be a sequence of i.i.d. random variables uniformly distributed on $[0,1]$, and independent of the sequence $(Y(w))_{w \in D_m}$.
We need one last definition to define our sequence of random split trees: given a tree $t$, we denote by $\partial t$ the set of nodes of $D_m$ that are not in $t$ but whose parent is in $t$, and we call the elements of this set the "leaves" of $t$. It is not hard to see that if $t$ has $n$ nodes, then $\partial t$ has cardinality $(m-1)n + 1$ (see Figure 2).
We can now define the sequence $(\tau_n)_{n \geq 1}$ of random trees recursively as follows.
• We set $\tau_1 = \{\emptyset\}$, the tree consisting of the root only.
• For $n \geq 1$ arbitrary, given $\tau_n$, we define $\tau_{n+1}$ as the tree obtained by adding one node to $\tau_n$ as follows:
- We subdivide the interval $[0,1]$ into subintervals indexed by $\partial\tau_n$, of respective lengths $\prod_{\emptyset \neq v \preceq w} Z_v$, for all $w \in \partial\tau_n$. (Note that, by definition, $\sum_{w \in \partial\tau_n} \prod_{\emptyset \neq v \preceq w} Z_v = 1$; see Figure 2 for an example, and observe that some points form part of several intervals.)
- We set $\xi(n+1) = w$ if $X_{n+1} \in [0,1]$ belongs to the part indexed by $w$ of this partition of $[0,1]$, and finally set $\tau_{n+1} = \tau_n \cup \{\xi(n+1)\}$; note that this is well-defined almost surely.
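To make the construction concrete, here is a small simulation sketch (ours, not the paper's; the names `grow_split_tree` and `bst_split` are our own). Instead of materialising the interval partition, it uses the equivalent step-by-step description: the new node is found by walking down from the root, at each step choosing a child with probability given by its $Z$-value, until the walk leaves the current tree.

```python
import random

def grow_split_tree(n, split_sampler, rng):
    """Grow an n-node split tree; nodes are Ulam-Harris words (tuples),
    the root is the empty tuple.  split_sampler(rng) returns one split
    vector Y, a probability vector over the child slots."""
    Y = {}              # node -> its split vector Y(node)
    tree = {()}         # tau_1 = {root}
    for _ in range(n - 1):
        w = ()
        while w in tree:                        # descend until leaving the tree
            if w not in Y:
                Y[w] = split_sampler(rng)
            y, u = Y[w], rng.random()
            i, acc = 0, y[0]
            while acc < u and i < len(y) - 1:   # child i+1 is picked with prob. y[i]
                i += 1
                acc += y[i]
            w = w + (i + 1,)                    # child indices start at 1
        tree.add(w)                             # this is the new node xi(n+1)
    return tree

def bst_split(rng):
    """Binary-search-tree case: Y = (U, 1 - U) with U uniform on [0, 1]."""
    u = rng.random()
    return (u, 1.0 - u)

tau = grow_split_tree(300, bst_split, random.Random(1))
```

Swapping `bst_split` for another sampler on the simplex yields the other split trees mentioned above.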
The sequence of random trees $(\tau_n)_{n \geq 1}$ is called the random split tree of split distribution $\nu$ (which, we recall, is the distribution of the $Y(w)$'s). This definition incorporates a variety of different random trees that are classical in the literature: Figure 2: A realisation of the 3-ary split tree $\tau_2$; here we have $\tau_2 = \{\emptyset, 3\}$. The labels on the edges represent the values of $(Y(w))_{w \in \tau_2}$: for example, $Y(\emptyset) = (.65, .15, .2)$. The value of $Z_w$ is thus the label on the edge from $w$ to its parent: for example, $Z_{31} = .1$. The nodes that are marked by a square are the elements of $\partial\tau_2$; underneath each leaf is written the corresponding part in the partition used to build $\tau_3$. For example, the part corresponding to 32 is of length $Z_3 Z_{32} = .2 \times .4 = .08$.

Voronoi cells and final territories
In this paper, our aim is to investigate the sizes of the Voronoi cells corresponding to $k$ nodes taken uniformly at random in the $n$-node random split tree $\tau_n$ defined in Section 1.1. In this context, we also accommodate the setting of random edge lengths between the nodes: let $\lambda$ be a probability distribution on $(0, \infty)$, let $(L_w)_{w \in D_m}$ be a sequence of i.i.d. random variables of distribution $\lambda$, and define the distance between two nodes as the sum of the lengths of the edges on the unique shortest path between them; see Figure 3 for an illustration. This definition holds for any fixed sequence of edge lengths: all along the paper, we use the distance $d_L$, where $L = (L_w)_{w \in D_m}$ is the sequence of i.i.d. random edge lengths. Also, note that if $L_w = 1$ for all $w \in D_m$, then $d_L$ corresponds to the graph distance in the graph whose nodes are all elements of $D_m$, and where there is an edge between two nodes if and only if one is the parent of the other.
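The distance $d_L$ is easy to compute on Ulam-Harris words, since the shortest path between $u$ and $v$ passes through $u \wedge v$, their longest common prefix. A minimal sketch (ours; `dist_L` is our name), with a toy check on the nodes $3$, $31$, $32$ of Figure 2 using unit edge lengths:

```python
def dist_L(u, v, lengths):
    """L-distance between Ulam-Harris words u, v (tuples): sum of the
    edge lengths on the unique path through u ∧ v."""
    j = 0                                 # u ∧ v = longest common prefix
    while j < min(len(u), len(v)) and u[j] == v[j]:
        j += 1
    d = 0.0
    for w in (u, v):                      # climb from u and from v up to u ∧ v
        for h in range(len(w), j, -1):
            d += lengths[w[:h]]           # lengths[x] = L_x, edge from x to its parent
    return d

# toy check with unit lengths: d_L then agrees with the graph distance
L = {(3,): 1.0, (3, 1): 1.0, (3, 2): 1.0}
print(dist_L((3, 1), (3, 2), L))          # siblings 31 and 32
print(dist_L((), (3, 1), L))              # root to 31
```

With unit lengths both printed distances equal 2, as expected for the graph distance.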
Definition 1.4. Let $u_1, \dots, u_k$ be $k$ nodes in an $m$-ary tree $t$, and let $d$ be a distance on $D_m$. We define the Voronoi cells of $u_1, \dots, u_k$ as follows: for all $1 \leq i \leq k$,
$$\mathrm{Vor}^i_{t,d}(u_1, \dots, u_k) = \big\{v \in t : d(v, u_i) < d(v, u_j)\ \forall j < i \text{ and } d(v, u_i) \leq d(v, u_j)\ \forall j > i\big\}.$$
We say that $\mathrm{Vor}^i_{t,d}(u_1, \dots, u_k)$ is the Voronoi cell of $u_i$ (with respect to $u_1, \dots, u_k$). Remark 1.5. The idea of Definition 1.4 is that $\mathrm{Vor}^i_{t,d}(u_1, \dots, u_k)$ contains all the nodes that are closer to $u_i$ than to any of the other $u_j$'s for the distance $d$ on $t$. The difference between '<' and '≤' induces a simple rule to break ties (in case of equal distances, the vertex with smaller index is preferred). However, since the number of boundary vertices is of constant order, the choice we make about how to break ties has no impact on our results.
A possible interpretation of Voronoi cells is in terms of epidemics: imagine that $k$ competing epidemics start spreading at speed one from, respectively, $u_1, \dots, u_k$, and that once a node is infected by an epidemic, it becomes immune to all others. If two or more epidemics reach one node at the same time, then the node gets infected with the epidemic that started at the $u_i$ with smallest index. In this context, the Voronoi cells are the final territories of the $k$ infections, that is, the Voronoi cell of $u_i$ contains all the nodes that got infected by the epidemic that started at node $u_i$. From this point of view, it is natural to consider the case when the epidemics spread at different speeds: Definition 1.6. Let $t$ be an $m$-ary tree and let $d$ be a distance on $D_m$. Furthermore, let $u_1, \dots, u_k$ be nodes in $t$ and let $s_1, \dots, s_k \in (0, +\infty)$ be the 'speeds of the epidemics'. We define the final territories of $(u_1, s_1), \dots, (u_k, s_k)$ as, for all $1 \leq i \leq k$,
$$\mathrm{Ter}^i_{t,d}\big((u_1, s_1), \dots, (u_k, s_k)\big) = \Big\{v \in t : \tfrac{d(v, u_i)}{s_i} < \tfrac{d(v, u_j)}{s_j}\ \forall j < i \text{ and } \tfrac{d(v, u_i)}{s_i} \leq \tfrac{d(v, u_j)}{s_j}\ \forall j > i\Big\}.$$
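The tie-breaking rule of Remark 1.5 and the speeds of Definition 1.6 can both be encoded in a few lines. The sketch below is ours (the names `word_dist` and `territories` are our own): each node is claimed by the source with the smallest arrival time, distance divided by speed, ties going to the smallest index; with all speeds equal this reduces to the Voronoi cells of Definition 1.4.

```python
def word_dist(u, v):
    """Graph distance between two Ulam-Harris words (tuples)."""
    j = 0
    while j < min(len(u), len(v)) and u[j] == v[j]:
        j += 1
    return (len(u) - j) + (len(v) - j)

def territories(tree, sources, speeds, dist=word_dist):
    """Final territories in the spirit of Definition 1.6: node v is
    claimed by the source u_i minimising dist(v, u_i) / s_i, with ties
    broken in favour of the smallest index."""
    cells = [set() for _ in sources]
    for v in tree:
        times = [dist(v, u) / s for u, s in zip(sources, speeds)]
        winner = min(range(len(sources)), key=lambda i: (times[i], i))
        cells[winner].add(v)
    return cells

# small 2-ary tree; sources at the nodes 1 and 21
tree = [(), (1,), (2,), (1, 1), (1, 2), (2, 1)]
equal = territories(tree, [(1,), (2, 1)], [1.0, 1.0])
fast2 = territories(tree, [(1,), (2, 1)], [1.0, 3.0])
```

On this toy tree the two equal-speed cells have sizes 4 and 2; raising the second speed to 3 lets the second epidemic also capture the root, giving sizes 3 and 3.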

Main results
Our main results provide asymptotic statements on the sizes of the territories of $k$ competing epidemics. We first state the result in the simpler case when all epidemics have the same speed (Theorem 1.7) and then extend it to the setting where different speeds are admissible (Theorem 1.8).
Both theorems apply to finite ($m \in \{2, 3, \dots\}$) as well as infinite ($m = \infty$) arity. They hold under the following hypotheses on the split-vector distribution $\nu$ and the edge-length distribution $\lambda$: (A1) (i) If $\mathrm{Supp}(\nu)$ denotes the support of the probability distribution $\nu$, and $e_i$ is the $m$-dimensional vector whose coordinates are all equal to 0 except the $i$-th coordinate, which is equal to one, then $\mathrm{Supp}(\nu) \not\subseteq \{e_1, \dots, e_m\}$. (ii) If $\bar{Y}$ is the size-biased version of the marginals of $\nu$, then $\mu := \mathbb{E}[\log(1/\bar{Y})] > 0$ and $\sigma^2 := \mathrm{Var}(\log \bar{Y}) < +\infty$. (A2) The edge-length distribution $\lambda$ either has finite variance, or is heavy-tailed with tail index $\alpha \in (1, 2)$ (in particular, it has finite mean).
Assumption (A1-i) just excludes the trivial case when the $n$-node split tree is almost surely equal to a line of $n$ nodes hanging under each other below the root. Assumptions (A1-ii) and (A2) give some control over the moments of, respectively, the split vectors and the edge lengths: these assumptions will be used when applying laws of large numbers and of the iterated logarithm, as well as central limit theorems, to sums of independent copies of these random variables.
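For concreteness, here is a worked example of ours (not taken from the paper): in the binary search tree case, $\nu$ is the law of $(U, 1-U)$ with $U$ uniform on $[0,1]$, so the size-biased marginal $\bar Y$ has density $2y$ on $[0,1]$, and

```latex
\mu \;=\; \mathbb{E}\!\left[\log\frac{1}{\bar Y}\right]
     \;=\; \int_0^1 2y\,\log\frac{1}{y}\,dy \;=\; \frac{1}{2},
\qquad
\sigma^2 \;=\; \operatorname{Var}(\log \bar Y)
     \;=\; \int_0^1 2y\,(\log y)^2\,dy \;-\; \Big(\int_0^1 2y\,\log y\,dy\Big)^{2}
     \;=\; \frac{1}{2} - \frac{1}{4} \;=\; \frac{1}{4}.
```

This is consistent with the classical fact that the depth of a uniform node in a binary search tree is of order $\frac{1}{\mu}\log n = 2\log n$.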
Theorem 1.7. Let $\lambda$ be a probability distribution on $(0, +\infty)$, and let $\nu$ be a probability distribution on $\Sigma_m$. Let $(\tau_n)_{n \geq 1}$ be the random split tree of split distribution $\nu$, and let $L = (L_w)_{w \in D_m}$ be a sequence of i.i.d. random variables of distribution $\lambda$, independent of $(\tau_n)_{n \geq 1}$.
For each $n \geq 1$, let $U_1(n), \dots, U_k(n)$ be $k$ nodes taken uniformly at random among the $n$ nodes of $\tau_n$; we let $V^{(1)}(n) \geq \dots \geq V^{(k)}(n)$ be the sizes of their Voronoi cells in $\tau_n$ with respect to the distance $d_L$, ordered in decreasing order.
Under Assumptions (A1) and (A2), the convergences (1.3) and (1.4) hold in distribution when $n \to +\infty$, where the limiting fluctuations are Gaussian if $\mathrm{Var}(L) < +\infty$, and follow an $\alpha$-stable distribution otherwise. In words, the above amounts to the fact that the second, third, ..., $k$-th largest components each occupy a proportion of roughly $\exp\{-\Psi(\log n)^{1/\alpha}\}$ of the vertices, where $\Psi$ is some explicit positive random variable. This implies that, asymptotically and in distribution, the entire mass is allocated to the largest component (which, by construction, belongs to the vertex closest to the root). The allocation for split trees is therefore qualitatively very different from the allocation in the universality class of the continuum random tree, where the limit of the proportions of the masses is known to be uniform [AAC+18].
We now extend the results of the previous theorem to the case of different speeds at which the uniformly chosen vertices claim territory (we use the same notation as in Theorem 1.7).
Note that, given that the slower epidemics all have very small territories (cf. (1.6)), the $j$ fastest territories behave as in Theorem 1.7, which, at least heuristically, entails (1.5).
It is also interesting to note that, in their first asymptotic order given by (1.6), the sizes of the slow epidemics do not depend on the edge-length distribution. An intuitive indication of this fact is that replacing $L$ by $cL$ for a positive constant $c$ does not change the sizes of the territories. In a similar vein, the right-hand sides of (1.3) and (1.5) also remain unchanged upon replacing $L$ by $cL$, as expected.
As a by-product of our proof of Theorems 1.7 and 1.8, we get the following result on the convergence of the profile of random split trees, which, as far as we are aware, is a new result in the context of split trees: Proposition 1.9. Let $(\tau_n)_{n \geq 1}$ be the random split tree of split distribution $\nu$, and let, for all integers $n$, $\pi_n = \frac{1}{n}\sum_{i=1}^n \delta_{|\xi(i)|}$ be the random profile of $\tau_n$, where we recall that $|\xi(i)|$ is the height of the node inserted at time $i$ in $(\tau_n)_{n \geq 1}$. If $\nu$ satisfies Assumption (A1), then the convergence (1.7) holds in probability as $n \to +\infty$, on the space of probability measures on $\mathbb{R}$ equipped with the topology of weak convergence.
Stronger results are already known for certain cases of split trees: in particular, it is known that (1.7) holds almost surely in the case of the binary search tree [CDJH01], the random recursive tree [MM17], and the preferential attachment tree [Kat05]. The profile of the uniform random tree (considered by [AAC+18] in the context of Voronoi cells) converges in distribution to the local time of a Brownian excursion (see [DG97]).
Remark 1.10. Note that Theorem 1.7 holds in an averaged sense (or with respect to the joint law). One could imagine two quenched versions, obtained by (i) conditioning on the random split tree $(\tau_n)_{n \geq 1}$, or (ii) conditioning additionally on the sequence of edge lengths $L$. Since, in our proof, we use the central limit theorem for the sequence $L$, our current methods do not provide a version of Theorem 1.7 quenched with respect to $L$. However, for the split distributions $\nu$ for which (1.7) holds almost surely, Theorem 1.7 would hold almost surely given $(\tau_n)_{n \geq 1}$.
The remainder of the paper is organised as follows. In Section 2, we establish a central limit theorem for the joint law of the heights of uniform vertices and derive Proposition 1.9; furthermore, we prove Theorem 1.7. In Section 3, we extend these arguments to the case of different speeds, thereby proving Theorem 1.8.
Proof of Theorem 1.7

In this section, we use the same notation, and place ourselves under the assumptions of Theorem 1.7. The idea of the proof is as follows: if $\lambda = \delta_1$ (i.e. all edge lengths are equal to 1 almost surely, i.e. $L \equiv 1$), then, among the nodes $U_1(n), \dots, U_k(n)$, the one closest to the root belongs to the Voronoi cell containing the root, and this Voronoi cell typically is the largest of all Voronoi cells. As a consequence, it is important to understand the heights of $k$ uniform nodes in a random split tree.

Lemma 2.1. Recall that for a node $v \in \tau_n$, we write $|v|$ for the graph distance between the root $\emptyset$ and $v$. Then, as $n \to +\infty$, the heights $|U_1(n)|, \dots, |U_k(n)|$, centred by $\frac{1}{\mu}\log n$ and rescaled by a factor of order $\sqrt{\log n}$, converge jointly in distribution to $(\Lambda_1, \dots, \Lambda_k)$, where the $\Lambda_1, \dots, \Lambda_k$ are independent centred Gaussian random variables of variance $\sigma^2$.
This lemma straightforwardly implies Proposition 1.9.
Proof of Proposition 1.9. We use [MM17, Lemma 3.1], which states that for a sequence of random measures $(\pi_n)_{n \geq 0}$ to converge in probability to a deterministic limiting measure $\pi_\infty$, it is enough to show, for two random variables $A_n$ and $B_n$ sampled independently according to the random measure $\pi_n$, that $(A_n, B_n) \to (A, B)$ in distribution, where $A$ and $B$ are independent and $\pi_\infty$-distributed. (Note that, on the left-hand side, $A_n$ and $B_n$ are independent conditionally on $\pi_n$, but not without this conditioning.) The required convergence condition of [MM17, Lemma 3.1] is provided by Lemma 2.1 in the particular case $k = 2$, which concludes the proof.
To prove Lemma 2.1, we first prove convergence of the marginals and then derive asymptotic independence; the latter is a consequence of the following lemma: Lemma 2.2. For all $h \in \mathbb{N}$ and $1 \leq i \leq k$, denote by $U_i^{(h)}(n)$ the ancestor of $U_i(n)$ at height $h$. Let $H_n$ be the height of the most recent common ancestor of $U_1(n), \dots, U_k(n)$, and let $S_1(n), \dots, S_k(n)$ be the sizes of the subtrees of $\tau_n$ rooted at $U_1^{(H_n)}(n), \dots, U_k^{(H_n)}(n)$, respectively. In distribution when $n \to +\infty$,
$$\Big(H_n, \tfrac{S_1(n)}{n}, \dots, \tfrac{S_k(n)}{n}\Big) \to (H, \alpha_1, \dots, \alpha_k),$$
where $H$ is an almost surely finite random variable, and $\alpha_1, \dots, \alpha_k$ are almost surely positive random variables.
Proof. We first look at the last common ancestor of $U_1(n)$ and $U_2(n)$: for all words $w \in D_m$, the probability that $w \preceq U_1(n) \wedge U_2(n)$ can be expressed in terms of $s_v(n)$, where $s_v(n)$ is the size of the subtree of $\tau_n$ rooted at $v$ (in particular, this is equal to zero if $v \notin \tau_n$). By the definition of the model and the strong law of large numbers we know that, conditionally on the sequence $Y = (Y(v))_{v \in D_m}$, for all $v \in D_m$, almost surely when $n \to +\infty$, $s_v(n)/n \to \prod_{\emptyset \neq u \preceq v} Z_u$, where we recall that $Z_{w\ell} = Y_\ell(w)$ for all $w \in D_m$ and $\ell \in \{1, \dots, m\}$. Therefore, conditionally on $Y$ and almost surely when $n \to +\infty$, these probabilities converge, which implies, using dominated convergence, convergence of the unconditional probabilities. To prove that this implies convergence in distribution of $U_1(n) \wedge U_2(n)$ to an almost surely finite random variable $K_{1,2} \in D_m$, we need to show the tightness estimate (2.1). In order to prove (2.1), we first note that, by independence of the $Z_u$'s (except among siblings), for all $w \in D_m$, the relevant quantities can be controlled via the shorthand $\beta := \mathbb{E}\big[\sum_{i=1}^m Y_i^2\big]$, with $Y$ a random vector of distribution $\nu$ (note that, by Assumption (A1-i), $\beta < 1$). This entails a bound depending only on the height $|w|$ of $w$. For all $h \geq 0$, using again the independence of the $Z_u$'s, we infer that $\mathbb{E}\big[\sum_{|w| = h} \prod_{\emptyset \neq u \preceq w} Z_u^2\big] = \beta^h$, by definition of $\beta$. Plugging the last equality into (2.3), iterating, and using (2.2) together with the fact that $\beta < 1$ by Assumption (A1-i), we get the geometric decay needed for (2.1). This concludes the proof of (2.1), and thus of the fact that $U_1(n) \wedge U_2(n)$ converges in distribution to an almost surely finite random variable $K_{1,2}$. Consequently, the pairwise last common ancestors converge jointly, where each of the $K_{i,j}$ (which are not independent) has the same distribution as $K_{1,2}$. The random variable $H$ is almost surely finite since all the $K_{i,j}$'s are.
Finally, for all $n \geq 1$, $x_1, \dots, x_k \in [0, \infty)$ and $h \in \{0, 1, \dots\}$, we decompose the joint law of $(H_n, S_1(n), \dots, S_k(n))$ according to the positions of the ancestors at height $h$, which range over $D_m^{(h)}$, the set of all distinct $w_1, \dots, w_k \in \{1, \dots, m\}^h$ such that the cardinality of the set $\{\overleftarrow{w_1}, \dots, \overleftarrow{w_k}\}$ is at most $k - 1$ (where we recall that $\overleftarrow{w}$ denotes the parent of a node $w$). By the strong law of large numbers, we thus get convergence of the conditional probabilities, and by dominated convergence we conclude the proof.
To see that $\alpha_j > 0$ almost surely for all $j \in \{1, \dots, k\}$, note that if $\prod_{\emptyset \neq u \preceq w_j} Z_u = 0$, then $\prod_{i=1}^k \prod_{\emptyset \neq u \preceq w_i} Z_u = 0$, an event of probability zero. Since this is true for all $1 \leq j \leq k$, we indeed have that the $\alpha_j$'s are almost surely positive.
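As a numerical sanity check of Lemma 2.2 (a sketch of ours, again assuming the binary-search-tree split law; `grow` and `lca_height` are our names), one can verify that the height of the last common ancestor of two uniform nodes stays bounded even in a fairly large tree:

```python
import random

def grow(n, rng):
    """Random binary split tree with split law (U, 1 - U); returns the
    list of nodes (Ulam-Harris tuples) in insertion order."""
    Y, tree, nodes = {}, {()}, [()]
    for _ in range(n - 1):
        w = ()
        while w in tree:                 # descend until leaving the tree
            if w not in Y:
                u = rng.random()
                Y[w] = (u, 1.0 - u)
            w = w + (1 if rng.random() < Y[w][0] else 2,)
        tree.add(w)
        nodes.append(w)
    return nodes

def lca_height(u, v):
    """Height |u ∧ v| of the last common ancestor of two words."""
    j = 0
    while j < min(len(u), len(v)) and u[j] == v[j]:
        j += 1
    return j

rng = random.Random(3)
nodes = grow(3000, rng)
samples = [lca_height(rng.choice(nodes), rng.choice(nodes))
           for _ in range(300)]
avg = sum(samples) / len(samples)        # stays O(1) as n grows
```

Repeating the experiment for larger $n$ leaves `avg` essentially unchanged, in line with the almost sure finiteness of $H$.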
Proof of Lemma 2.1. We first prove the convergence of the marginals: let $k_n$ be an integer chosen uniformly at random in $\{1, \dots, n\}$; then $\xi(k_n) = U_1(n)$ in distribution (recall that, by definition, for all $n \geq 1$, $\xi(n)$ is the unique node that belongs to $\tau_n$ but not to $\tau_{n-1}$). By [Dev98, Th. 2], we know that the convergence (2.4) holds in distribution when $n \to +\infty$. Therefore, since $\log k_n = \log n + O_P(1)$ when $n \to +\infty$, we immediately obtain the convergence of the marginals to the desired limit. To show that the limits are independent, we use Lemma 2.2, and the fact that, by definition of the model, given $H_n, S_1(n), \dots, S_k(n)$, the trees rooted at $U_1^{(H_n)}(n), \dots, U_k^{(H_n)}(n)$ are independent split trees of split distribution $\nu$ and of respective sizes $S_1(n), \dots, S_k(n)$. Moreover, for all $1 \leq i \leq k$, the node $U_i(n)$ is distributed uniformly at random among the nodes of the split tree rooted at $U_i^{(H_n)}(n)$. Therefore, given $H_n, S_1(n), \dots, S_k(n)$, we have, in distribution and jointly for all $1 \leq i \leq k$, $U_i(n) = U^{(i)}(S_i(n))$, where the $U^{(i)}$'s are independent, and, for all $i$, $U^{(i)}(S_i(n))$ is a node taken uniformly at random in a split tree of size $S_i(n)$. As a consequence, applying (2.4) to each of the $k$ independent split trees, we get that, in distribution and jointly for all $1 \leq i \leq k$, the rescaled heights converge to $(\Lambda_1, \dots, \Lambda_k)$, $k$ independent centred Gaussians of variance $\sigma^2$; here we have used the fact that, by Lemma 2.2, $\log S_i(n) = \log n + O_P(1)$ when $n \to +\infty$.
Applying the law of large numbers to the i.i.d. edge lengths, and using the fact that, by Lemma 2.2, the height $H_n$ of the last common ancestor of $U_1(n), \dots, U_k(n)$ converges in distribution to an almost surely finite random variable, Lemma 2.1 entails a similar result for the distances of $U_1(n), \dots, U_k(n)$ to the root (for the distance $d_L$). In the following, $L$ denotes a random variable distributed according to $\lambda$. Lemma 2.3 (CLT for distances to the root). Under the assumptions of Theorem 1.7, if $\mathrm{Var}(L) < +\infty$, then, as $n \to +\infty$, the distances $d_L(\emptyset, U_1(n)), \dots, d_L(\emptyset, U_k(n))$, centred by $\frac{\mathbb{E}L}{\mu}\log n$ and rescaled by a factor of order $\sqrt{\log n}$, converge jointly in distribution to $(\Xi_1, \dots, \Xi_k)$, where the $\Xi_1, \dots, \Xi_k$ are independent centred Gaussian random variables of variance $\mathrm{Var}(L) + \sigma^2 (\mathbb{E}L)^2$.
Proof. In this proof, we set $U_i[H_n] = U_i^{(H_n)}(n)$, i.e. the ancestor of $U_i(n)$ at height $H_n$, where $H_n$ is defined in Lemma 2.2. For all $1 \leq i \leq k$, we decompose $d_L(\emptyset, U_i(n))$ into the contribution of the path from the root to $U_i[H_n]$ and a sum $A_i(n)$ of edge lengths below $U_i[H_n]$, where the $L^{(i)} = (L^{(i)}_u)_{u \in D_m}$ are i.i.d. copies of $L$. Since, by Lemma 2.2, $H_n$ converges in distribution to an almost surely finite random variable $H$, the first contribution is $O_P(1)$ when $n \to +\infty$. Note that, given $H_n$, the random variables $A_1(n), \dots, A_k(n)$ are independent, because the $L_u$'s are independent, and the sums in $A_1(n), \dots, A_k(n)$ that involve the sequence $L$ (as opposed to its i.i.d. copies $L^{(1)}, \dots, L^{(k)}$) range over distinct nodes $u$. Therefore, in distribution, we have, jointly for all $1 \leq i \leq k$, $A_i(n) = \sum_{j=1}^{|U_i(n)| - H_n} \widetilde{L}^{(i)}_j$, where $\widetilde{L}^{(i)} = (\widetilde{L}^{(i)}_j)_{j \geq 1}$ is a sequence of i.i.d. copies of the $L^{(i)}_u$'s, and the $k$ sequences $(\widetilde{L}^{(i)})_{1 \leq i \leq k}$ are independent of each other. By the central limit theorem, we have, jointly for all $1 \leq i \leq k$, convergence in distribution, as $n \to +\infty$, of the rescaled sums to $\Theta_1, \dots, \Theta_k$, independent standard Gaussians. Since $|U_i(n)| \to +\infty$ in probability when $n \to +\infty$, and since $(|U_i(n)|)_{1 \leq i \leq k}$ is independent of $(\widetilde{L}^{(i)}_j : j \geq 1)_{1 \leq i \leq k}$, this implies the joint convergence (2.7). Indeed, for all $u_1, \dots, u_k \in \mathbb{R}$ and $\varepsilon > 0$, there exists $m_0 > 0$ such that the approximation holds for all $m_1, \dots, m_k \geq m_0$. Because $|U_i(n)| \to +\infty$ in probability, there exists $n_0$ such that, for all $n \geq n_0$, $\mathbb{P}\big(\inf_{1 \leq i \leq k} |U_i(n)| < m_0\big) \leq \varepsilon$. Therefore, the claim follows for all $u_1, \dots, u_k \in \mathbb{R}$, $\varepsilon > 0$, and $n \geq n_0$, where, in the second inequality, we have conditioned on the different possible values of $\inf_{1 \leq i \leq k} |U_i(n)|$, and used the triangle inequality. This concludes the proof of (2.7). We thus get the joint convergence for all $1 \leq i \leq k$, where we have used Lemma 2.1 (where $\Lambda_1, \dots, \Lambda_k$ are defined). Note that, by definition, $\Theta_i$ is independent of $\Lambda_i$ for all $1 \leq i \leq k$; hence each limit $\Xi_i$ is a centred Gaussian of variance $\mathrm{Var}(L) + \sigma^2(\mathbb{E}L)^2$, and $\Xi_1, \dots, \Xi_k$ are independent, as claimed.
Lemma 2.4 (Stable limit for distances to the root). Under the assumptions of Theorem 1.7, if $L$ is heavy-tailed with tail index $\alpha \in (1, 2)$, then the distances $d_L(\emptyset, U_1(n)), \dots, d_L(\emptyset, U_k(n))$, centred by $\frac{\mathbb{E}L}{\mu}\log n$ and rescaled by a factor of order $(\log n)^{1/\alpha}$, converge jointly in distribution to independent copies of an $\alpha$-stable random variable.

Proof. We proceed as in the proof of Lemma 2.3 and employ the same notation: in particular, using that our assumptions on $L$ entail that its expectation is finite, (2.5) and (2.6) give the analogous decomposition in distribution when $n \to +\infty$ (2.8). Using the functional limit theorem for sums of i.i.d. heavy-tailed random variables (see, e.g., Theorem 2 in [GK54, § 35]), we obtain the convergence of the rescaled sums to $\Upsilon_1(\alpha), \dots, \Upsilon_k(\alpha)$, i.i.d. copies of an $\alpha$-stable random variable. We thus get the claimed convergence in distribution when $n \to +\infty$, where we have used (2.8) and the fact that, by Lemma 2.1, $\frac{|U_i(n)|}{(\log n)/\mu} \to 1$ in probability when $n \to +\infty$. We conclude that the joint convergence holds for all $1 \leq i \leq k$, in distribution when $n \to +\infty$; here, we have used Lemma 2.1 and the fact that $\alpha < 2$, which implies $1/\alpha > 1/2$ (and $\alpha > 1$ again to get the finiteness of $\mathbb{E}L$).
Remark 2.5. Note that, in the $\alpha$-stable case, the second summand on the right-hand side of (2.9) is negligible compared to the first summand, i.e. the fluctuations coming from the height of $U_i(n)$ are asymptotically negligible compared to the fluctuations coming from the edge lengths.
Next, we control the sizes of subtrees rooted at certain nodes within the tree. For this purpose, imagine that $U_1(n)$ is the closest to the root (in graph distance) among $U_1(n), \dots, U_k(n)$. Then the Voronoi cell of $U_2(n)$ is the subtree rooted at the ancestor of $U_2(n)$ lying just below the midpoint of the path between $U_1(n)$ and $U_2(n)$. From Lemma 2.2, we already know that $|U_1(n) \wedge U_2(n)|$ converges in distribution to an almost surely finite random variable. The following proposition gives a limiting result for the size of the subtree rooted at an ancestor of $U_2(n)$ at height $h(n)$, for some function $h(n)$ tending to infinity with $n$; more precisely, we consider the size of the subtree of $\tau_n$ rooted at the ancestor of $U_i(n)$ closest to the root whose $L$-distance to the root is at least the threshold (2.10), which is well defined by Lemmas 2.3 and 2.4. Proposition 2.6 (Convergence of (extended) fringe trees). Let $f : \mathbb{N} \to \mathbb{N}$ be a function such that $\lim_{n \to +\infty} f(n) = +\infty$, and let $x > 0$. We assume that either $f(n) = o(\log n)$ when $n \to +\infty$, or $f(n) = \log n$ for all $n \geq 1$. Then, under the assumptions of Theorem 1.7, the convergence (2.11) holds. This proposition is at the heart of the proof of Theorem 1.7: it establishes a law of large numbers for the logarithm of the size of fringe trees. A fringe tree, as defined in [HJ17] (see also the references therein for a literature review on the subject), is the subtree rooted at a node taken uniformly at random among the $n$ nodes of a random tree (in our case, the $n$-node split tree of split distribution $\nu$). An extended fringe tree (still following [HJ17]) is the subtree rooted at one of the ancestors of this randomly chosen node, under the assumption that this ancestor is at a fixed graph distance from the randomly chosen node. In Proposition 2.6, however, we also consider subtrees rooted at an ancestor of a node $U(n) = U_i(n)$ chosen uniformly at random in our tree, but this ancestor can be at a distance that grows with $n$. Therefore, we get results that are weaker than the result stated in [HJ17] in the case of the binary search tree and the random recursive tree (which, we recall, are both split trees).
Proof of Proposition 2.6. We defined the split tree $(\tau_n)_{n \ge 1}$ as a sequence of random trees, with $\xi(n+1)$ denoting the unique node that is in $\tau_{n+1}$ but not in $\tau_n$. Now let $U$ be a uniform random variable on $[0,1]$, set $k_n = \lceil U n \rceil \in \{1, \dots, n\}$ for all $n \ge 1$, and note that $n - k_n \to +\infty$ almost surely as $n \to +\infty$. (2.12) We fix $i \in \{1, \dots, k\}$ throughout the proof. Letting $U_i(n)$ be the node of index $k_n$, we observe that (2.13) holds by the law of large numbers.
Recall that $U^{(xh(n))}_i(n)$ denotes the ancestor of $U_i(n)$ closest to the root whose height is at least $x h(n)$; we denote by $k_n(x)$ the integer such that $U^{(xh(n))}_i(n) = \xi(k_n(x))$. By definition, we have $k_n(x) \le k_n$. We next derive a law of large numbers for the width of the split interval associated to the node of index $k_n(x)$. Recall that, by definition of $(\tau_n)_{n \ge 1}$, to each node $w \in \tau_n$ (among which $\xi(k_n)$) is associated a sub-interval of $[0,1]$, whose length is given by $\prod_{\varnothing \prec u \preceq w} Z_u$ (2.14). We let $Q_n(x)$ be the length of the interval associated to the node $\xi(k_n(x))$. We claim the law of large numbers (2.15) for $Q_n(x)$. Indeed, first note that the random variables $Z_u$ are size-biased, since we condition on the event $u \preceq \xi(k_n)$; more precisely, we condition the intervals associated to the nodes $u$ occurring in the product of (2.14) to contain $X_{k_n}$. In other words, conditionally on $u \preceq \xi(k_n)$, we have $Z_u = \bar{Y}$ in distribution, with $\bar{Y}$ as in (1.2) and $Y = (Y_1, \dots, Y_m) \sim \nu$. Therefore, by the law of large numbers, since $h(n) \to +\infty$ in probability (cf. (2.13)), we get the claimed convergence in probability as $n \to +\infty$, which concludes the proof of (2.15).
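To make the size-biasing concrete, here is a minimal numerical sketch (our own illustration, not from the paper) for the binary search tree case, where the split vector is $Y = (U, 1-U)$ with $U$ uniform on $[0,1]$: the size-biased factor $\bar Y$ then has density $2y$ on $[0,1]$ (the law of the maximum of two independent uniforms), so $\mathbb{E}[\log \bar Y] = -1/2$, and the normalized sum of $h$ i.i.d. copies of $\log \bar Y$ concentrates around that value, in the spirit of (2.15).

```python
import math
import random

random.seed(0)

# Binary search tree case (chosen here for illustration): split vector
# Y = (U, 1 - U) with U uniform.  The size-biased factor Ybar has density
# 2y on [0, 1], i.e. the law of max(U1, U2) for independent uniforms.
# Then E[log Ybar] = -1/2, and the normalized sum of h i.i.d. copies of
# log Ybar concentrates around this value (law of large numbers).
h = 200_000  # plays the role of the number of generations h(n)
log_Q = sum(math.log(max(random.random(), random.random())) for _ in range(h))
print(log_Q / h)  # close to E[log Ybar] = -0.5
```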
The main step now is to show (2.16). At the end of the proof, we show that this implies (2.11).
To prove (2.16), we rewrite its left-hand side as a sum of several terms, to which we apply various concentration inequalities. First note that, for all $x \in [a,b]$ and all $n \ge 1$, the decomposition (2.17) holds. We show that the right-hand side of (2.17) is $o_P(1)$ as $n \to +\infty$, uniformly for $x \in [a,b]$, by treating each of the three summands separately.
We start with the second term, which is the easiest. Recall that $k_n(x) \le k_n$ and that $n - k_n \to +\infty$ almost surely as $n \to +\infty$ (see (2.12)).
For the third term on the right-hand side of (2.17), we proceed as follows. In distribution, the relevant sequence equals $(\bar{Z}_\ell)_{\ell \ge 1}$, where $(\bar{Z}_\ell)_{\ell \ge 1}$ is a sequence of i.i.d. copies of $\bar{Y}$. We apply the law of the iterated logarithm to this sequence of i.i.d. random variables. This implies in particular that there exists an almost surely finite random number $m_0$ such that the corresponding bound holds almost surely. We fix $\varepsilon > 0$ and choose $m_1 \ge m_0$ such that $2\sigma\sqrt{\log\log(m_1)/m_1} \le \varepsilon$. We then have, almost surely, the stated bound on the supremum. Consequently, we have for all $n \ge 1$ the estimate involving the indicator $\mathbf{1}_{X_\ell < Q_n(x)}$.
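The choice of $m_1$ above is always possible because the bound $2\sigma\sqrt{\log\log(m)/m}$ (our reading of the flattened expression) decreases to $0$; a small sketch of our own locating the smallest admissible $m_1$ by direct search:

```python
import math

def smallest_m1(sigma: float, eps: float) -> int:
    """Smallest m >= 3 with 2*sigma*sqrt(log log m / m) <= eps
    (our reading of the threshold used in the proof)."""
    m = 3  # need m >= 3 so that log log m > 0
    while 2 * sigma * math.sqrt(math.log(math.log(m)) / m) > eps:
        m += 1
    return m

m1 = smallest_m1(sigma=1.0, eps=0.1)
print(m1)  # the bound first drops below eps somewhere in the hundreds
```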
We now recall that (see (2.13)) $h(n) \sim F_i(n)/\mathbb{E}L$ in probability as $n \to +\infty$. In the case when $f(n) = o(\log n)$, since $|U_i(n)|_L/\log n$ converges to $\mu/\mathbb{E}L$ in probability as $n \to +\infty$, we obtain the stated bound, in which we have set $c = 1/\mathbb{E}L$. Because of (2.23), the second summand converges to $0$ as $n \to +\infty$. Note that, as $x$ increases between $a$ and $b$, $D^{(i)}_n(xf)/\big((n - k_n(x)) Q_n(x)\big)$ only changes value when $x h(n)$ crosses an integer value. Thus, the event inside the first probability on the right-hand side implies that such a crossing occurs, and a union bound then gives the corresponding estimate. Using the law of large numbers in (2.15), we obtain the stated convergence in probability as $n \to \infty$, where we have used (2.16) in the first equality and (2.23) in the second one. By (2.16) and the triangle inequality, this last supremum goes to zero in probability as $n \to +\infty$, which concludes the proof.
We are now ready to prove Theorem 1.7.
Proof of Theorem 1.7. We let $U^{(1)}(n), \dots, U^{(k)}(n)$ be the nodes $U_1(n), \dots, U_k(n)$ ordered in increasing $L$-distance to the root, and we let $V_1(n), \dots, V_k(n)$ be the sizes of their respective Voronoi cells (in that order, i.e. the Voronoi cell of $U^{(i)}$ has size $V_i(n)$). We set $m = \mathbb{E}[\log \bar{Y}]/\mathbb{E}L$ and start by showing that, in distribution as $n \to +\infty$, the convergence (2.28) holds, where $(\Psi_{(1)}, \dots, \Psi_{(k)})$ is the increasing order statistics of $\Psi_1, \dots, \Psi_k$, and where we recall that $\alpha := 2$ when $\mathrm{Var}(L) < +\infty$.

We now show that (2.28) implies (1.3): the only difference between the two is that the entries on the left-hand side of (2.28) are ordered by increasing distance of the respective $U_i(n)$'s to the root, while those on the left-hand side of (1.3) are ordered by decreasing sizes of the Voronoi cells. However, the convergence in (2.28) implies in particular (2.26), where we recall that, by definition, $V^{(i)}(n)$ is the $i$-th largest of the $k$ Voronoi cells. We now let $C_n$ denote the relevant event; since $\mathbb{P}(C_n) \to 1$ as $n \to +\infty$, (2.26) entails (1.3), and it is thus sufficient to establish (2.28). For this purpose, for each $n \ge 1$, set $K_n$ as in (2.29); for all $1 \le i \ne j \le k$, the $L$-distance to the root of the point where the Voronoi cells of $U_i(n)$ and $U_j(n)$ would meet, if we ignored all other $k-2$ points, would be at least $K_n$. Thus, for all $i \ne \ell := \arg\min_{1 \le j \le k} |U_j(n)|_L$, the Voronoi cell of $U_i(n)$ meets the Voronoi cell of $U_\ell(n)$ at $L$-distance to the root exceeding $K_n$, implying that, for all $i \ne \ell$, the Voronoi cell of $U_i(n)$ is the subtree rooted at the ancestor of $U_i(n)$ closest to the root among all ancestors of $U_i(n)$ whose $L$-distance to the root exceeds the threshold in (2.31).
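The dominance of the largest Voronoi cell can be observed in simulation. Below is a hedged sketch of our own (not the paper's code): we build a random recursive tree (one of the split trees mentioned above), pick $k$ uniform vertices, and assign every vertex to its closest chosen vertex by multi-source BFS, breaking ties by an arbitrary fixed rule, as allowed in the introduction. All function names are ours.

```python
import random
from collections import deque

random.seed(1)

def random_recursive_tree(n):
    """Adjacency lists of a random recursive tree on vertices 0..n-1:
    vertex i attaches to a uniformly chosen earlier vertex."""
    adj = [[] for _ in range(n)]
    for i in range(1, n):
        p = random.randrange(i)
        adj[p].append(i)
        adj[i].append(p)
    return adj

def voronoi_sizes(adj, sources):
    """Multi-source BFS: each vertex is assigned to a closest source,
    ties broken by traversal order (an arbitrary fixed rule)."""
    owner = [-1] * len(adj)
    q = deque()
    for j, s in enumerate(sources):
        owner[s] = j
        q.append(s)
    while q:
        v = q.popleft()
        for w in adj[v]:
            if owner[w] == -1:
                owner[w] = owner[v]
                q.append(w)
    sizes = [0] * len(sources)
    for v in range(len(adj)):
        sizes[owner[v]] += 1
    return sizes

n, k = 100_000, 5
adj = random_recursive_tree(n)
sources = random.sample(range(n), k)
sizes = sorted(voronoi_sizes(adj, sources), reverse=True)
print(sizes)  # typically one dominant cell, in line with Theorem 1.7
```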
We thus get that, on $G_n$, $\sum_u \log Z_u$ is a sum of $x_n h(n)$ i.i.d. random variables with finite variance, since $\mathrm{Var}(\log \bar{Y}) < +\infty$ by assumption. By definition, for all $y > 0$, the node $\xi(k_n(y))$ is the ancestor of $U_i(n)$ closest to the root whose $L$-distance to the root is at least $y h(n)$. Therefore, almost surely, for all $y \le z$, $\xi(k_n(y))$ is an ancestor of $\xi(k_n(z))$ (including the case when the two nodes are equal). In other words, as $y$ increases from $x - \varepsilon$ to $x + \varepsilon$, $\xi(k_n(y))$ runs through the ancestors of $U_i(n)$ at $L$-distance to the root between $(x-\varepsilon)h(n)$ and $(x+\varepsilon)h(n)$, in that order. Therefore, in distribution, we have, jointly for all $y \in [x - \varepsilon, x + \varepsilon]$, the corresponding convergence. Note that, by the independence of the sequence $L$ from the rest of the process, the two limits in (3.2) and (3.3) hold jointly, and the two Gaussian limits are independent. Combining (3.1) to (3.3) yields the claimed convergence. In (2.22) we proved that $F_i(n) \sim f(n)$ in probability as $n \to +\infty$; in fact, this statement can be strengthened: $F_i(n) = f(n)$ as soon as $|U_i(n)|_L \ge x f(n)$, an event whose probability tends to 1 with $n$, because either $f(n) = o(\log n)$, or $f(n) = \log n$ and $x < \mathbb{E}L/\mu$. This implies (i).
(ii) Under the assumption that $\mathbb{P}(L \ge x) = \ell(x)\,x^{-\alpha}$ with $\alpha \in (1,2)$ (and $\ell$ slowly varying), the limit in (3.3) does not hold; instead, we have the analogous convergence in which $\Upsilon(\alpha)$ has an $\alpha$-stable distribution. Thus $\log(D^{(i)}_n(xf))$ satisfies the analogue of the expansion above, in which the second summand now vanishes in the limit. This concludes the proof of (ii), because $\mu = \mathbb{E}[\log(1/\bar{Y})]$ and because $F_i(n) = f(n)$ with probability tending to one as $n$ tends to infinity.
Proof of Theorem 1.8. For this proof, we consider that the infections "creep along edges" between the times at which they infect vertices: if two infections of respective speeds $s$ and $s'$ start at two neighbouring vertices $v$ and $v'$ (respectively), then they meet at distance from $v$ proportional to $\frac{s}{s+s'}$. For any two infections $i$ and $\ell$, the $L$-distance between $U_i(n)$ and $U_\ell(n)$ equals $\Delta_{i,\ell}$. Therefore, the time $t_{i,\ell}$ at which epidemics $i$ and $\ell$ would meet, if there were no other infection at play, is equal to the time it would take an infection of speed $s_i + s_\ell$ to cross a distance $\Delta_{i,\ell}$, i.e. $t_{i,\ell} = \Delta_{i,\ell}/(s_i + s_\ell)$.
Therefore, in the absence of the other $k-2$ infections, the $i$-th and $\ell$-th infections would meet at $L$-distance to the root equal to the maximum of the two corresponding quantities. Thus, on the event $A_n$ (recall $K_n$ from (2.29)), for all $1 \le i < \ell \le k$, if we ignored all other $k-2$ epidemics, the epidemics started respectively at $U_i(n)$ and $U_\ell(n)$ would meet at $L$-distance to the root at least $K_n$. Also note that the probability of $A_n$ tends to one as $n \to +\infty$ by Lemmas 2.2 and 2.3. Thus it is enough to restrict ourselves to the event where $A_n$ holds. We let $\kappa = \kappa(n) = \arg\min_{1 \le \ell \le k} |U_\ell(n)|_L$. On the event $A_n$, for all $i \ne \kappa$, the territory of the $i$-th infection neighbours a unique other territory, namely the territory of the $\kappa$-th epidemic. We let $d_i(n)$ denote the $L$-distance from the root to the point where they meet (this point can be in the middle of an edge). On $A_n$, the territory of the $i$-th infection is the subtree of $\tau_n$ rooted at the ancestor of $U_i(n)$ closest to the root whose $L$-distance to the root is at least $d_i(n)$.
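The elementary computation behind $t_{i,\ell}$ can be checked directly. In this small sketch (our own, with made-up numbers), two infections at $L$-distance $\Delta$ with speeds $s$ and $s'$ meet at time $\Delta/(s+s')$, at distance $\Delta\,s/(s+s')$ from the first vertex, matching the "creep along edges" description.

```python
def meeting(delta, s, s_prime):
    """Time and location at which two infections, started at L-distance
    delta apart with speeds s and s_prime, meet (cf. proof of Thm 1.8)."""
    t = delta / (s + s_prime)  # t_{i,l}: together the fronts cover delta
    from_first = s * t         # distance covered by the first infection
    return t, from_first

t, d = meeting(delta=6.0, s=2.0, s_prime=1.0)
print(t, d)  # 2.0 4.0: they meet after time 2, at distance 4 from the first vertex
```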

See Figure 3 for an example.

Definition 1.3. For all families $L = (L_w)_{w \in \mathcal{D}_m}$ of positive random variables, we define a distance $d_L$ on $\mathcal{D}_m$ as follows: for all pairs of nodes $u$ and $v$ in $\mathcal{D}_m$ (for all $m \ge 2$), let $d_L(u,v)$ be the sum of the lengths along the paths joining $u$ and $v$ to their last common ancestor. For all nodes $w \in \mathcal{D}_m$, we denote $|w|_L := d_L(\varnothing, w)$.

Figure 3: A binary tree. The distance between the nodes 112 and 12 (marked as squares in the picture) with respect to the sequence $L$ is $L_{11} + L_{112} + L_{12}$ (the sum of the lengths of the bold edges), because their last common ancestor is 1.
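Definition 1.3 and the example of Figure 3 can be checked mechanically. In the sketch below (our own encoding, with made-up edge lengths), nodes of $\mathcal{D}_m$ are written as words over $\{1,\dots,m\}$ with one character per letter (so $m \le 9$), the root is the empty word, $L_w$ is the length of the edge between $w$ and its parent, and the last common ancestor is the longest common prefix.

```python
def d_L(u: str, v: str, L: dict) -> float:
    """L-distance between nodes u, v encoded as words (root = "").
    L[w] is the length of the edge between w and its parent."""
    # last common ancestor = longest common prefix of the two words
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    lca = u[:i]
    dist = 0.0
    for w in (u, v):
        while w != lca:       # walk up to the lca, summing edge lengths
            dist += L[w]
            w = w[:-1]
    return dist

# Figure 3 example: nodes 112 and 12 have last common ancestor 1,
# so d_L(112, 12) = L_11 + L_112 + L_12.  Edge lengths are made up.
L = {"1": 1.0, "2": 1.0, "11": 0.5, "12": 0.25, "112": 0.75}
print(d_L("112", "12", L))  # 0.5 + 0.75 + 0.25 = 1.5
```

Note that $|w|_L = d_L(\varnothing, w)$ is recovered as `d_L("", w, L)`.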
as $n \to +\infty$, because $h(n) \to +\infty$ in probability with $n$. In other words, for $x \in [a,b]$, conditionally on $k_n(x)$ and $Z = (Z_u)_{u \in \mathcal{D}_m}$, we have the stated convergence in distribution, in probability as $n \to +\infty$. (2.22) In the case when $f(n) = \log n$, since $|U_i(n)|_L/\log n$ converges to $\mu/\mathbb{E}L$ in probability as $n \to +\infty$, and since $x \le b < \mu/\mathbb{E}L$, we get $F_i(n) \sim \log n$ in probability as $n \to +\infty$. In both cases, the same conclusion holds. For all $\delta > 0$ and all $\eta > 0$ small enough that $\sup_{x \in [1-\eta, 1+\eta]} |\log x| \le \delta$, we have the stated bound. This implies that the right-hand side of (2.25) converges to zero as $n \to +\infty$. Using again that $h(n) \sim f(n)$ in probability as $n \to +\infty$, we get that the right-hand side of (2.24) also tends to zero with $n$, and thus (2.20) holds, which concludes the proof of (2.16).
Indeed, first note that, by (3.4) and (3.5), under $A_n$, the $i$-th and $\kappa$-th infections meet at $L$-distance to the root equal to $|U_i(n)|_L - s_i t_{i,\kappa}$, and thus $d_i(n) = |U_i(n)|_L - s_i t_{i,\kappa}$. By Lemma 2.2, and using the notation $H_n = \max_{1 \le i < k \le n} |U_i(n) \wedge U_k(n)|$, we get $0 \le |U_i(n) \wedge U_\kappa(n)|_L/\log n \le H_n/\log n \to 0$ in probability as $n \to +\infty$. (3.7)