The Brownian continuum random tree as the unique solution to a fixed point equation

In this note, we provide a new characterization of Aldous' Brownian continuum random tree as the unique fixed point of a certain natural operation on continuum trees (which gives rise to a recursive distributional equation). We also show that this fixed point is attractive.


Introduction
The Brownian continuum random tree (BCRT), which was introduced and first studied by Aldous [3,4,5], is the prototypical example of a random R-tree/continuum random tree. Its importance derives from the fact that it is the scaling limit of a large class of discrete trees including: all critical Galton-Watson trees with finite offspring variance [3,5], unordered binary trees [17], uniform unordered trees [14], uniform unlabelled unrooted trees [23], critical multitype Galton-Watson trees [18], random trees with a prescribed degree sequence satisfying certain conditions [8] and random dissections [10].
Many of these convergence results are proved using some sort of functional coding. However, particularly in the case of unordered trees, a natural functional coding whose distributional properties are easily understood is not always available. In such settings, an alternative approach is desirable. By a recursive distributional equation for a random variable X taking values in some Polish space S, we mean an equation of the form

X =_d f((ξ_i)_{i≥1}; X_1, X_2, . . .),

where =_d denotes equality in distribution, X_1, X_2, . . . are i.i.d. copies of X, independent of the family (ξ_i)_{i≥1}, and f is a suitable S-valued function. We can, of course, think of this equation in terms of probability distributions: if µ is the distribution of X and F(µ) is the distribution of the right-hand side, then µ is a fixed point of the operator F.
For families of random variables which satisfy a natural recursive distributional equation, the so-called contraction method has been demonstrated to be a powerful tool for proving convergence results. Suppose that (M n ) is a sequence of distributions for which we wish to prove that there exists a limit M . The basic idea is as follows. Suppose that M n can be described recursively in terms of M m for m < n. This equation often allows one to guess a limiting version, in which M is described in terms of itself. In other words, M should be the fixed point of some operator F. Suppose that, in addition, F is a contraction in a suitable metric on the space of probability measures. Then Banach's fixed point theorem tells us that there exists a fixed point and that M n → M as n → ∞ in the sense of that metric. This is straightforward in principle, but usually the recursive equation for M n does not have precisely the same form as the limiting operator. Moreover, finding a metric in which F is a contraction (but which also yields weak convergence) is often highly non-trivial. In practice, this method has been applied very successfully for sequences of random variables (see, for example, [21,22,19]), but so far there is only one result for the more complicated setting of convergence of stochastic processes [20].
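To make the iteration concrete, here is a toy illustration of the contraction method (ours, not from the paper, and far simpler than the tree setting): the operator taking the law of X to the law of (X_1 + X_2)/2 preserves the mean, is a contraction with factor 1/√2 in the Wasserstein-2 distance, and so its iterates converge to the point mass at the mean. A Monte Carlo sketch:

```python
import random

random.seed(1)

def apply_F(sample):
    """One step of the toy operator F(mu) = law of (X1 + X2)/2,
    approximated on an empirical sample by pairing bootstrap draws."""
    return [(random.choice(sample) + random.choice(sample)) / 2
            for _ in range(len(sample))]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Start from Exponential(1) (mean 1); the fixed point is the point mass at 1.
sample = [random.expovariate(1.0) for _ in range(50000)]
vars_seen = [variance(sample)]
for _ in range(8):
    sample = apply_F(sample)
    vars_seen.append(variance(sample))

# The variance shrinks by roughly 1/2 per iteration, as the W2-contraction predicts,
# while the mean stays (up to Monte Carlo noise) at 1.
print(vars_seen[0], vars_seen[-1], sum(sample) / len(sample))
```

In the paper's setting the state space is continuum trees rather than real laws, which is exactly where the choice of metric becomes delicate.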
It is often the case that families of discrete trees have a recursive definition or description. Aldous [6] proved that the BCRT is a fixed point for a natural operation on continuum trees. With these two facts in mind, it is natural to ask if a contraction method can be established for random trees. This seems an ambitious aim, and there are several technical issues to be overcome (not least the choice of metric). But our original motivation stems from the fact that, if such a principle were to be established, then the characterization of possible limits should be the first step. In this article, we prove that the BCRT is the unique fixed point of an appropriate operator, and that this fixed point is attractive for a certain natural class of measures on continuum trees.
The rest of this note is organised as follows. In Section 1, we provide an overview of the various definitions of the BCRT which already exist in the literature. This also enables us to introduce various concepts we will need in the sequel. We then set up our fixed point equation. In Section 2, we prove that it has a unique solution. In Section 3, we show that repeatedly applying the fixed point operator to any suitable law on continuum trees gives convergence to the law of the BCRT in the sense of the Gromov-Prokhorov topology. Section 4 contains some concluding remarks.

1. Overview of definitions of the BCRT
We begin by introducing the notion of an R-tree.
An R-tree is a metric space (T, d) such that, for every pair x, y ∈ T:
• there exists a unique geodesic from x to y, i.e. there exists a unique isometry f_{x,y} : [0, d(x, y)] → T such that f_{x,y}(0) = x and f_{x,y}(d(x, y)) = y; the image of f_{x,y} is denoted [[x, y]];
• the only non-self-intersecting path from x to y is [[x, y]], i.e. if q : [0, 1] → T is continuous and injective and such that q(0) = x and q(1) = y, then q([0, 1]) = [[x, y]].
An element x ∈ T is called a vertex. A rooted R-tree is an R-tree (T, d) with a distinguished vertex ρ called the root. The height of a vertex x is d(ρ, x). The degree deg(x) of a vertex x is the number of connected components of T \ {x}. By a leaf, we mean a vertex of degree 1; write L(T ) for the set of leaves of T . The tree T is leaf-dense if T is the closure of L(T ). We will often want to endow an R-tree with a probability measure (µ, say), which allows us to pick random points in the tree.
A measured metric space (X, d, µ) is a complete metric space (X, d) equipped with a Borel probability measure µ on (X, d). Define a first equivalence relation by declaring two such spaces (X, d, µ) and (X′, d′, µ′) to be GHP-equivalent if there exists an isometry f : X → X′ such that the image of µ under f is µ′. Let S denote the space of GHP-equivalence classes of compact measured metric spaces. Then S is Polish when endowed with the Gromov-Hausdorff-Prokhorov topology [1]. Define a second equivalence relation by declaring (X, d, µ) and (X′, d′, µ′) to be GP-equivalent if there exists an isometry g : supp(µ) → X′ such that the image of µ under g is µ′, where supp(µ) denotes the topological support of µ. Let S′ denote the space of GP-equivalence classes of compact measured metric spaces. Then S′ is Polish when endowed with the Gromov-Prokhorov topology [12].

1.1. The BCRT via the Brownian excursion. Let h : [0, σ] → R_+ be a continuous function with h(0) = h(σ) = 0. For s, t ∈ [0, σ], set

d_h(s, t) := h(s) + h(t) − 2 min_{s∧t ≤ u ≤ s∨t} h(u),

which defines a pseudo-distance on [0, σ]. Let τ : [0, σ] → T be the canonical projection onto the quotient space T := [0, σ]/{d_h = 0}, which can then be shown to be an R-tree (see Le Gall [15]). The tree T can be naturally rooted at ρ = τ(0), the equivalence class of 0, and we will sometimes think of it as a rooted object and sometimes not. There is a natural measure µ on T given by the push-forward of the uniform distribution on [0, σ] under the projection τ.
We define the BCRT (T, d) to be the R-tree encoded by (2e(t), 0 ≤ t ≤ 1), where (e(t), 0 ≤ t ≤ 1) is a standard Brownian excursion. We usually endow (T, d) with the probability measure m which is the push-forward of the Lebesgue measure on [0, 1].
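To illustrate the encoding, the following sketch (ours, using a small deterministic discrete excursion rather than a Brownian one) computes the tree distance d(s, t) = h(s) + h(t) − 2 min_{s∧t ≤ u ≤ s∨t} h(u) directly from the encoding function:

```python
def tree_distance(h, s, t):
    """Distance in the R-tree encoded by the excursion h (a list of heights
    at integer times): d(s, t) = h[s] + h[t] - 2 * min(h[u], u between s and t)."""
    lo, hi = min(s, t), max(s, t)
    return h[s] + h[t] - 2 * min(h[lo:hi + 1])

# A small deterministic excursion with heights 0,1,2,1,2,1,0.
h = [0, 1, 2, 1, 2, 1, 0]
print(tree_distance(h, 2, 4))  # the two local maxima sit at distance 2
print(tree_distance(h, 1, 5))  # distance 0: these two times are identified in the quotient
print(tree_distance(h, 0, 6))  # distance 0: both endpoints project to the root
```

Times at tree distance 0 are glued together by the quotient, which is how the projection τ produces a tree from an interval.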
1.2. The BCRT as a limit of discrete trees. Let T_n be the ordered rooted tree representing the genealogy of a Galton-Watson branching process with offspring distribution having mean 1 and variance σ² ∈ (0, ∞), conditioned to have total progeny n. Think of T_n as a metric space by endowing it with the graph distance d_gr (which puts neighbouring vertices at distance 1). Let µ_n be the uniform measure on the vertices of T_n. Then, as n → ∞,

(T_n, σ n^{−1/2} d_gr, µ_n) → (T, d, m)

in the Gromov-Hausdorff-Prokhorov sense. (This convergence in distribution is originally due to Aldous [3], although this formulation is closer to that of Le Gall [16].)

1.3. The BCRT via random finite-dimensional distributions. We may also characterize the BCRT as the unique continuum random tree having certain distributional properties. We must first introduce properly what we mean by a continuum tree.
A continuum tree is a triple (T, d, µ), where (T, d) is an (unrooted) R-tree and µ is a probability measure on T which is non-atomic and satisfies
• µ(L(T)) = 1;
• for every x ∈ T of degree at least 2, each connected component of T \ {x} has strictly positive µ-measure.
A continuum random tree (CRT) is a random variable taking values in the set of continuum trees.
(In [5], Aldous makes slightly different definitions of these quantities which, in particular, use rooted trees.) It will be important in the sequel to observe that, if we consider the BCRT to be rooted at the equivalence class of 0 in the Brownian excursion construction, then the root has the same distribution as a uniform pick from m on T.
Given a CRT (T, d, µ), let V_1, V_2, . . . be i.i.d. samples from the measure µ. For m ≥ 2, define the reduced tree R(m) to be the subtree of T spanned by V_1, V_2, . . . , V_m. For every m ≥ 2, R(m) is a discrete tree with edge-lengths and labelled leaves, and so its distribution is specified by its tree-shape t, an unrooted tree with m labelled leaves, and its edge-lengths. Note that, given the leaf labels, the edges of t can be uniquely identified (and so can also be given labels). The reduced trees are clearly consistent, in that R(m) is a subtree of R(m + 1).
The reduced trees of the BCRT are binary almost surely. This entails that R(m) has 2m − 2 vertices and 2m − 3 edges. Let its tree-shape be t and its edge-lengths be x_1, x_2, . . . , x_{2m−3}. Then R(m) has density

p(t; x_1, . . . , x_{2m−3}) = s exp(−s²/2), where s = x_1 + · · · + x_{2m−3}.

Note that this implies that the tree-shape is, in fact, uniform on the set of binary tree-shapes with m leaves, and that the edge-lengths have an exchangeable distribution. We observe, for future reference, that the distance between two uniformly chosen points of the BCRT has the Rayleigh distribution, with density x e^{−x²/2} and expectation √(π/2).
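As a quick numerical sanity check (ours, not part of the argument), the Rayleigh density x e^{−x²/2} integrates to 1 and has mean √(π/2) ≈ 1.2533:

```python
import math

# Left Riemann sum on (0, 20] checking that the Rayleigh density
# x * exp(-x^2 / 2) integrates to 1 and has mean sqrt(pi / 2).
dx = 1e-4
total = 0.0
mean = 0.0
for i in range(1, 200001):
    x = i * dx
    w = x * math.exp(-x * x / 2) * dx  # density mass on this slice
    total += w
    mean += x * w
print(total, mean, math.sqrt(math.pi / 2))
```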
Theorem 1.4 (Aldous [5]). The law of a continuum random tree is uniquely determined by the distributions of its reduced trees (R(m), m ≥ 2).

(Note that in [5], Aldous restricts his discussion to binary trees, but the theory is easily extended; see Haas and Miermont [13].)

1.4. The BCRT as a fixed point. The principal contribution of this paper is a characterization of the BCRT as the unique fixed point of a certain operation on CRT's. We need a couple of notational ingredients. We first recall the definition of the Dirichlet distribution.
A random vector (X_1, X_2, . . . , X_n) taking values in the simplex {(x_1, . . . , x_n) : x_i ≥ 0, Σ_i x_i = 1} has the Dirichlet distribution with parameters (α_1, α_2, . . . , α_n) (written Dir(α_1, α_2, . . . , α_n)) if (X_1, . . . , X_{n−1}) has density

(Γ(α_1 + · · · + α_n) / (Γ(α_1) · · · Γ(α_n))) x_1^{α_1−1} · · · x_{n−1}^{α_{n−1}−1} (1 − x_1 − · · · − x_{n−1})^{α_n−1}.

Let M be the set of probability distributions on (GHP-equivalence classes of) measured R-trees. We define F : M → M as follows: for M ∈ M,
• Take (T_1, d_1, µ_1), (T_2, d_2, µ_2) and (T_3, d_3, µ_3) independent with common distribution M, and sample (∆_1, ∆_2, ∆_3) ~ Dir(1/2, 1/2, 1/2) independently;
• For i = 1, 2, 3, pick a point X_i in T_i according to µ_i, and rescale the metric d_i by √∆_i and the measure µ_i by ∆_i;
• Identify the vertices X_1, X_2 and X_3 in the rescaled trees to obtain a single larger tree (T•, d) with a marked branch-point; the three measures ∆_1µ_1, ∆_2µ_2 and ∆_3µ_3 naturally give rise to a (probability) measure µ on T•;
• Forget the marked branch-point in order to obtain (T, d, µ);
F(M) is the distribution of (T, d, µ).
The operation on trees given by the function F was first described by Aldous [6]. Let M be the law of the BCRT. Theorem 2 of [6] implies, when rephrased in our terms, that M is a solution of M = F(M). Actually, what is shown in [6] is the following statement of the reverse of this construction: take a BCRT (T, d, m) and pick three points independently according to m; the paths between pairs of these points intersect in a unique branch-point. Splitting at this branch-point then gives three BCRT's, which have been randomly rescaled by (∆_1, ∆_2, ∆_3) ~ Dir(1/2, 1/2, 1/2) and depend on one another only through this rescaling. Moreover, the former branch-point yields a point chosen independently from the mass measure of each of the three subtrees.
(An expanded proof of Aldous' Theorem 2 may be found in [2].) We will comment on this reversed perspective at the end of the paper.
Let M be a solution to the fixed point equation. Write (Ω, F, P) for the probability space on which all the forthcoming random objects are defined. In particular, under P, let (T, d, µ) be a continuum random tree sampled from the distribution M. Let α > 0 be such that α√(π/2) is the expected distance between two points sampled independently from T according to µ.
The first main result of this article is the following theorem, which is proved in the next section.

Theorem 1.6. Suppose that M is a law on continuum trees which solves the fixed point equation M = F(M) and is such that α ∈ (0, ∞). Then (T, α^{−1}d, µ) is distributed as the BCRT.

The restriction to continuum trees cannot simply be dropped: consider, for example, a BCRT decorated by attaching, at each point of an independent Poisson point process on the tree, an additional branch carrying no mass, with suitably decaying lengths. As there are almost surely only finitely many of these branches having length longer than any ε > 0, this construction yields a compact metric space and, therefore, induces a probability distribution on the set of measured R-trees. A simple computation shows that this distribution is a solution of the fixed point equation which is clearly not isometric to the BCRT. However, it seems reasonable to want to exclude such non-continuum-tree-valued solutions.
Our second main result is as follows.

Theorem 1.7. Let M_0 be a law on continuum trees such that, for (T, d, µ) sampled from M_0 and V_1, V_2 two points of T sampled independently according to µ, E[d(V_1, V_2)] = √(π/2). Then F^n(M_0) converges weakly, as n → ∞, to the law of the BCRT, in the sense of the Gromov-Prokhorov topology.

Note that if E[d(V_1, V_2)] = α√(π/2) for some α ≠ 1 then the same result holds on rescaling the metric d by α^{−1}. Theorem 1.7 is proved in Section 3.

2. Uniqueness of the fixed point: proof of Theorem 1.6

We will prove Theorem 1.6 via random finite-dimensional distributions and Theorem 1.4. We start by thinking about the distance between two uniformly chosen points. Throughout this section, we suppose that M is a measure on continuum trees which is a fixed point of F, and that (T, d, µ) is sampled from M. We write S(m), m ≥ 2, for the reduced trees of (T, α^{−1}d, µ).

2.1. The distance between two uniform points.
Proposition 2.1. The distance between two points of (T, α −1 d) sampled independently according to µ has the Rayleigh distribution.
Proof. Suppose that (T_1, d_1, µ_1), (T_2, d_2, µ_2) and (T_3, d_3, µ_3) are sampled independently from M. Apply F with ∆ = (∆_1, ∆_2, ∆_3) to obtain a new tree (T, d, µ) ~ M. Suppose now that we sample two points independently according to µ. Let P_1, P_2 and P_3 be the number of points falling in the subtrees of T corresponding to T_1, T_2 and T_3 respectively. Then, conditional on ∆, we have (P_1, P_2, P_3) ~ Multinomial(2; ∆_1, ∆_2, ∆_3). Let I_k = 1_{P_k > 0}, k = 1, 2, 3, and let D be the distance between the two points. Then

D =_d √∆_1 I_1 D_1 + √∆_2 I_2 D_2 + √∆_3 I_3 D_3,    (3)

where D_1, D_2 and D_3 are three independent copies of D, independent of everything else on the right-hand side, corresponding to the distances between two uniformly chosen points in each of the three subtrees. Let W_k = √∆_k I_k, k = 1, 2, 3. Then this is precisely the setting of the smoothing transform studied by Durrett and Liggett [11]. In that paper, it is shown that the nature of the family of solutions to such distributional fixed point equations depends on the analytic properties of a certain function depending on the moments of W_1, W_2, W_3: for s ≥ 0, let

ν(s) := log E[Σ_{k=1}^{3} W_k^s].

By symmetry, ν(s) = log(3 E[W_1^s]). Hence,

ν(s) = log( 3(s + 7) / ((s + 3)(s + 5)) )
which is finite for all s ≥ 0 and has its unique zero in s ≥ 0 at s = 1. Moreover, ν′(1) = −7/24 < 0. Theorems 1 and 2 of [11] then entail that the equation (3) has a unique fixed point, up to a constant scaling factor. Finally, the distance between two uniformly chosen points in a BCRT has the Rayleigh distribution, and that must be a solution to (3). Since E[D] = α√(π/2) and the Rayleigh distribution has mean √(π/2), the scaling factor must be α.
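The computation of ν can be double-checked in exact rational arithmetic: the marginal ∆_1 of Dir(1/2, 1/2, 1/2) is Beta(1/2, 1), so E[∆_1^a] = 1/(1 + 2a), and conditioning on ∆ gives E[W_1^s] = E[∆_1^{s/2}(2∆_1 − ∆_1²)] = 2/(s + 3) − 1/(s + 5). A verification sketch (ours):

```python
from fractions import Fraction

def moment(a):
    """E[Delta^a] for Delta ~ Beta(1/2, 1), valid for rational a >= 0."""
    return Fraction(1) / (1 + 2 * a)

def three_E_W1s(s):
    """3 * E[W_1^s] with W_1 = sqrt(Delta_1) * 1{P_1 > 0}:
    E[W_1^s] = E[Delta^{s/2} * (2*Delta - Delta^2)] = 2/(s+3) - 1/(s+5)."""
    half_s = Fraction(s, 2)
    return 3 * (2 * moment(half_s + 1) - moment(half_s + 2))

# Compare against the closed form 3(s+7)/((s+3)(s+5)) for several s.
for s in range(6):
    assert three_E_W1s(s) == Fraction(3 * (s + 7), (s + 3) * (s + 5))

# The unique zero of nu(s) = log(3 E[W_1^s]) in s >= 0 is at s = 1:
print(three_E_W1s(1))  # -> 1, so nu(1) = 0
# nu'(1) = 1/(s+7) - 1/(s+3) - 1/(s+5) at s = 1:
print(Fraction(1, 8) - Fraction(1, 4) - Fraction(1, 6))  # -> -7/24
```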
For future reference, we write F_sm for the operator which takes the law of a non-negative real-valued random variable D′ and associates to it the law of

√∆_1 I_1 D′_1 + √∆_2 I_2 D′_2 + √∆_3 I_3 D′_3,

where D′_1, D′_2 and D′_3 are three independent copies of D′, independent of everything else on the right-hand side, and where I_1, I_2, I_3 are exactly as above.
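A Monte Carlo sketch of F_sm (ours; variable names are illustrative): sample (∆_1, ∆_2, ∆_3) ~ Dir(1/2, 1/2, 1/2) via normalized Gamma(1/2, 1) variables, throw two points into the subtrees to obtain the indicators I_k, and combine rescaled copies. Since 3E[W_1] = 1, the mean √(π/2) is preserved by each iteration:

```python
import math
import random

random.seed(2)

def dirichlet_half():
    """Sample Dir(1/2, 1/2, 1/2) as normalized independent Gamma(1/2, 1)s."""
    g = [random.gammavariate(0.5, 1.0) for _ in range(3)]
    s = sum(g)
    return [x / s for x in g]

def smoothing_step(sample):
    """One application of F_sm: D' = sum_k sqrt(Delta_k) * I_k * D_k."""
    out = []
    for _ in range(len(sample)):
        delta = dirichlet_half()
        # Throw 2 points into the three subtrees with probabilities delta.
        counts = [0, 0, 0]
        for _ in range(2):
            u, c = random.random(), 0.0
            for k in range(3):
                c += delta[k]
                if u < c:
                    counts[k] += 1
                    break
        out.append(sum(math.sqrt(delta[k]) * random.choice(sample)
                       for k in range(3) if counts[k] > 0))
    return out

# Start from the point mass at sqrt(pi/2) and iterate.
sample = [math.sqrt(math.pi / 2)] * 20000
for _ in range(6):
    sample = smoothing_step(sample)
mean = sum(sample) / len(sample)
print(mean)  # stays close to sqrt(pi/2) ~ 1.2533
```

The iterates spread out from the point mass toward a non-degenerate law; Theorem 2(b) of [11] identifies the limit as the Rayleigh distribution.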

2.2. A coupling.
Having determined the distribution of S(2) (which, of course, has trivial tree-shape), we now want to determine the distribution of the reduced trees S(m), m ≥ 3. In order to do so, we proceed by coupling a tree T distributed according to M and a realisation T̂ of the BCRT, using the operator F. Before we can describe this coupling, we need to establish some notation. Let Σ = ∪_{i=0}^∞ {1, 2, 3}^i be the set of words on the alphabet {1, 2, 3} where, by convention, {1, 2, 3}^0 = {∅} consists of the empty word. Let Σ_n = ∪_{i=0}^n {1, 2, 3}^i be the set of words with at most n letters. For i ∈ Σ, write |i| for the length of the word i. Fix n ≥ 0 and start from a family (T_i)_{i ∈ {1,2,3}^{n+1}} of 3^{n+1} independent copies of T, and a family (T̂_i)_{i ∈ {1,2,3}^{n+1}} of 3^{n+1} independent copies of the BCRT. We will refer to these as the input trees and will use them and successive applications of F in order to build the trees T = T_∅ and T̂ = T̂_∅. At each application of F, we will use the same scaling factors and glue together subtrees with the same labels. More precisely, let (∆^{(i)})_{i ∈ Σ_n} and (U^{(i)})_{i ∈ Σ_n} be independent families of independent random variables where, for each i ∈ Σ_n, ∆^{(i)} = (∆^{(i)}_1, ∆^{(i)}_2, ∆^{(i)}_3) ~ Dir(1/2, 1/2, 1/2) and U^{(i)} is uniform on [0, 1]. The families (T_i)_{i ∈ Σ_n} and (T̂_i)_{i ∈ Σ_n} are constructed recursively as follows. The tree T_i (resp. T̂_i) is constructed by applying F to T_{i1}, T_{i2} and T_{i3} (resp. T̂_{i1}, T̂_{i2} and T̂_{i3}) with scaling factors ∆^{(i)}_1, ∆^{(i)}_2, ∆^{(i)}_3, where we emphasize that the same scaling factors are used to construct both families. In each of the trees T_{i1}, T_{i2} and T_{i3} (resp. T̂_{i1}, T̂_{i2} and T̂_{i3}), we need to pick a uniform point which tells us where to glue them together (once rescaled) to form T_i (resp. T̂_i). But if |i| < n, we will also want to keep track of where these uniform points sit in the trees at level n + 1. We can split this problem into two parts: first finding the label of the subtree at level n + 1 in which a particular uniform point lies, and then finding where precisely within that subtree it sits. We will use the random variables (U^{(i)})_{i ∈ Σ_n} to determine the label of the subtree, and the exact location of the point is then a uniform pick from that subtree. We will use the same labels in T and T̂ but independent picks from the respective subtrees chosen.
For a word i = i_1 i_2 · · · i_m ∈ Σ, write ∆_i := ∏_{k=1}^{m} ∆^{(i_1 · · · i_{k−1})}_{i_k}. In addition, for j ∈ Σ, write ∆^{(i)}_j := ∆_{ij}/∆_i. For i ∈ Σ_n, let j ∈ {1, 2, 3}^{n−|i|+1}. Then the probability that a uniform point in T_i belongs to the subtree T_{ij} is equal to ∆^{(i)}_j. For i ∈ Σ_n and 1 ≤ k ≤ 3, we define a word L^{(i,k)} of length n − |i| such that ikL^{(i,k)} represents the label of the input tree at level n + 1 in which the uniform point picked in T_{ik} sits. The word is built letter by letter: the first letter of L^{(i,k)} is the index ℓ ∈ {1, 2, 3} of the subinterval of [0, 1] with lengths (∆^{(ik)}_1, ∆^{(ik)}_2, ∆^{(ik)}_3) into which U^{(ik)} falls; recursively, if the word built so far is j, the next letter is the index of the subinterval of [0, 1] with lengths (∆^{(ikj)}_1, ∆^{(ikj)}_2, ∆^{(ikj)}_3) into which U^{(ikj)} falls. Observe that the definition of L^{(i,k)} depends only on (∆^{(i)})_{i ∈ Σ_n} and (U^{(i)})_{i ∈ Σ_n}.
So, finally, when we sample the uniform point in T_{ik} (resp. in T̂_{ik}) needed to create T_i (resp. T̂_i), the value of L^{(i,k)} gives the index of the input tree in which the point sits. Then, conditionally on this choice, we sample the point uniformly from T_{ikL^{(i,k)}} (resp. T̂_{ikL^{(i,k)}}).
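The letter-selection rule (a uniform variable falling into subintervals of [0, 1] with lengths given by the relative masses) can be sketched as follows; the function name is ours:

```python
def select_letter(weights, u):
    """Return the index k (1-based, playing the role of a letter in {1,2,3})
    such that u falls in the k-th subinterval of [0,1] with lengths `weights`."""
    c = 0.0
    for k, w in enumerate(weights, start=1):
        c += w
        if u < c:
            return k
    return len(weights)  # guard against floating-point round-off at u ~ 1

# With relative masses (0.2, 0.3, 0.5):
print(select_letter([0.2, 0.3, 0.5], 0.10))  # -> 1
print(select_letter([0.2, 0.3, 0.5], 0.45))  # -> 2
print(select_letter([0.2, 0.3, 0.5], 0.60))  # -> 3
```

Iterating this selection down the levels, with the weights at each level given by the nested Dirichlet masses, produces the word L^{(i,k)}.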

2.3. The reduced trees. Now, consider S(3). Again, in this case, the tree-shape is deterministic. We will show that the lengths of the three branches can each be expressed as sums of rescaled distances between pairs of uniform points. We sample three new independent uniform points from our tree T. If, when we decompose this tree into its three subtrees T_1, T_2 and T_3, the three points all happen to fall into different subtrees, then we have determined the branch-point between them (which it is natural to label by ∅). However, if at least two points fall into the same subtree, we must then further decompose that subtree in order to determine the location of the branch-point. Let N_3 ≥ 1 be the recursion depth we have to look at in order to determine the branch-point between our three points. More generally, let N_m be the depth to which we have to look in order to determine all of the branch-points between m uniform points.

Lemma 2.2. For every m ≥ 3, N_m is almost surely finite.

Proof. We proceed by induction on m and start with the case m = 3. There are three possibilities for the way in which the three points are distributed amongst the subtrees T_1, T_2 and T_3.
(1) The three points fall in different subtrees.
(2) All three points fall in the same subtree.
(3) Two points fall in the same subtree and the remaining point falls in a different subtree.
In case (1), as observed above, the branch-point is necessarily ∅. In case (2), we have a new independent copy of the original problem of finding the branch-point between three points chosen uniformly from a copy of T. In case (3), the branch-point we seek is the same as that between ∅ and the two uniform points which fell in the same subtree. But ∅ is also a uniformly chosen point in that subtree. So again, it remains to find the branch-point between three points chosen uniformly from a copy of T. Indeed, unless case (1) occurs, we recursively obtain a new (independent) copy of the original problem. (See Figure 3 for an illustration.) It follows that N_3 has a geometric distribution: the probability that the three points fall in different subtrees at any step is given by 6E[∆_1∆_2∆_3] = 2/35.

Figure 3. Finding the branch-point between three uniform points. One point falls in T_2 and two fall in T_3, so we must further decompose T_3. We have an independent copy of the original problem in T_3 (where one of the points considered is now ∅). One of the points falls in T_33 and the other two in T_31, so we must further decompose T_31. One of the points now falls in T_313 but the two others are in T_311, so we repeat in T_311. Finally, the three points fall in different subtrees of T_311 and so we obtain N_3 = 4.

For m ≥ 4, there are again three possibilities for the distribution of the m points amongst the subtrees T_1, T_2 and T_3:
(1) At least two points fall in different subtrees from the rest.
(2) All m points fall in the same subtree.
(3) m − 1 points fall in the same subtree and the remaining point falls in a different subtree.
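For the record, the success probability of each step in the m = 3 search above, 6E[∆_1∆_2∆_3] with (∆_1, ∆_2, ∆_3) ~ Dir(1/2, 1/2, 1/2), can be computed exactly from the standard Dirichlet moment formula E[x_1 · · · x_n] = (∏_i α_i)/(α_0(α_0 + 1) · · · (α_0 + n − 1)), where α_0 = Σ_i α_i. A rational-arithmetic check (ours):

```python
from fractions import Fraction

def dirichlet_mixed_moment(alphas):
    """E[x_1 * x_2 * ... * x_n] for (x_1, ..., x_n) ~ Dir(alphas):
    prod(alpha_i) / (a0 * (a0 + 1) * ... * (a0 + n - 1)), a0 = sum(alphas)."""
    a0 = sum(alphas)
    num = Fraction(1)
    for a in alphas:
        num *= a
    den = Fraction(1)
    for j in range(len(alphas)):
        den *= a0 + j
    return num / den

alphas = [Fraction(1, 2)] * 3
p = 6 * dirichlet_mixed_moment(alphas)  # 3! orderings of the three points
print(p)  # -> 2/35
```

So N_3 is geometric with success probability 2/35, hence E[N_3] = 35/2.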
In cases (2) and (3), we again obtain a new copy of the same problem. In case (1), we get two or three independent copies of a problem of strictly smaller size. Since case (1) occurs with positive probability, the result follows by (strong) induction.

3. Attractiveness of the fixed point: proof of Theorem 1.7

Let M_0 be a law on continuum trees satisfying the hypotheses of Theorem 1.7, let (T_0, d_0, µ_0) be sampled from M_0, and write M_n := F^n(M_0) for n ≥ 1. Let S_n(m) denote the reduced tree spanned by m independent uniform points in a tree with law M_n. To prove convergence in distribution in the Gromov-Prokhorov topology, it suffices to prove convergence in distribution of S_n(m) to R(m) for each m ≥ 2 (see Greven, Pfaffelhuber and Winter [12], or the introduction to Bertoin and Miermont [7]). We will again use the coupling of Subsection 2.2 to prove this. Indeed, for fixed m, we must look to recursion depth N_m in order to separate our m uniform points. Work on the event {N_m = k}, where k < n. Then in order to obtain coupled trees distributed as M and M_n respectively, we need to "plug in" 3^k input trees, sampled according to M or M_{n−k}, respectively. In order to determine the distances in the reduced tree, we in fact only need to know about the distance between two uniform points in each of the input trees. In the input trees sampled according to M_{n−k}, this distance has law µ̄_{n−k}, the (n − k)-fold iterate of the smoothing transform F_sm applied to the law µ̄_0 of the distance between two uniformly sampled points of T_0. But, by assumption, ∫_0^∞ x µ̄_0(dx) = √(π/2) and so, for fixed k, µ̄_{n−k} = F_sm^{n−k} µ̄_0 → m̄ as n → ∞, where m̄ denotes the Rayleigh distribution, by Theorem 2(b) of Durrett and Liggett [11] (that all that is required here is that the law µ̄_0 has the same mean as m̄ is a consequence of the facts, already observed in the proof of Proposition 2.1, that the function ν has its unique zero in s ≥ 0 at s = 1 and that ν′(1) < 0). The edge-lengths in S_n(m) can then be described as sums of randomly rescaled independent random variables sampled from µ̄_{n−k}. It is clear (since we use the same random scaling factors in order to construct both) that the edge-lengths of S_n(m) converge in distribution to those of R(m) on the event {N_m = k} for any fixed k ≥ 1. In order to conclude the proof of Theorem 1.7, we observe as before that P(N_m > K) → 0 as K → ∞, since N_m < ∞ almost surely.

4. Concluding remarks
4.1. Related work. As mentioned in Subsection 1.4, Aldous [6] shows that, in a sense, we can "reverse" the operator F. Indeed, we can decompose a BCRT by picking three uniform points and splitting at the branch-point between them; we obtain three independent BCRT's, Brownian-rescaled by (∆_1, ∆_2, ∆_3) ~ Dir(1/2, 1/2, 1/2). Each of these subtrees is doubly marked, one mark being the original uniform point and the other being the former branch-point. Moreover, these two marks are independent uniform picks from the relevant subtree. This operation is used recursively by Croydon and Hambly [9] to prove that the BCRT is homeomorphic to a certain deterministic fractal with a random self-similar metric, along with the naturally associated measure. In the course of their proof, they show (Lemma 10(d) of [9]) that all of the randomness in the BCRT is contained in an i.i.d. family of Dir(1/2, 1/2, 1/2) scaling factors (∆^{(i)}, i ∈ Σ).
Although we have referred to this decomposition of the BCRT as the reverse of our operator F, there is, in fact, a rather subtle difference concerning marking and labelling. The forward version of Croydon and Hambly's splitting operator acts on doubly uniformly marked trees and can be paraphrased as follows: take three independent BCRT's, T 1 , T 2 , T 3 , each with two independent uniform points, labelled 1 and 2. Rescale these trees according to the appropriate Dirichlet random vector and glue them together at the points labelled 1. Now relabel the point labelled 2 in T 1 by 1, keep the point labelled 2 in T 2 and forget the point labelled 2 in T 3 as well as the branch-point just created. Then this is again a doubly uniformly marked BCRT. This seems to us a much less natural "forward" operation on continuum trees than the one pursued in this paper, but it has the advantage that the recursive decomposition obtained by going backwards does not have any of the labelling issues encountered in Section 2.2. Indeed, in this version there is no randomness in which subtree attaches to which other subtree.

4.2. Convergence. The distributional convergence in Theorem 1.7 is in the sense of the Gromov-Prokhorov distance which, for example, does not distinguish between the BCRT and the BCRT decorated by the independent PPP discussed after Theorem 1.6. In particular, this convergence is equivalent to the convergence in distribution of the random finite-dimensional distributions. It would be interesting to find conditions under which the convergence holds instead in the stronger Gromov-Hausdorff-Prokhorov sense; in particular, we would need a certain tightness condition to hold (see Corollary 19 of [5]).

Acknowledgements. We would like to thank Louigi Addario-Berry and Luc Devroye for inviting us to the Fifth Annual Workshop on Probabilistic Combinatorics and WVD at the Bellairs Institute of McGill University, Barbados, where we began thinking about this problem, and the Isaac Newton Institute in Cambridge for its invitation in March-April 2015 which enabled us to complete the paper. We would also like to thank David Croydon for detailed discussions relating to the paper [9] and Ralph Neininger for discussions about the contraction method. C.G.'s research was partially funded by EPSRC Postdoctoral Fellowship EP/D065755/1. M.A. acknowledges the support of the ERC under the agreement "ERC StG 208471 - ExploreMap" and the ANR under the agreement "ANR 12-JS02-001-01 - Cartaplus".