Reversing the cut tree of the Brownian continuum random tree

Consider the Aldous--Pitman fragmentation process [Ann Probab, 26(4):1703--1726, 1998] of a Brownian continuum random tree ${\cal T}^{\mathrm{br}}$. The associated cut tree cut$({\cal T}^{\mathrm{br}})$, introduced by Bertoin and Miermont [Ann Appl Probab, 23:1469--1493, 2013], is defined in a measurable way from the fragmentation process, describing the genealogy of the fragmentation, and is itself distributed as a Brownian CRT. In this work, we introduce a shuffle transform, which can be considered as the reverse of the map taking ${\cal T}^{\mathrm{br}}$ to cut$({\cal T}^{\mathrm{br}})$.


Introduction
Let $\mathcal T$ be Aldous' Brownian continuum random tree (CRT) [2]. To the logging process of $\mathcal T$ introduced by Aldous and Pitman [4], one can associate another continuum random tree $\mathrm{cut}(\mathcal T)$, which describes the genealogical structure of this fragmentation process (see [10] and [13]). Moreover, for a Brownian CRT, this associated tree $\mathrm{cut}(\mathcal T)$ is also distributed as a Brownian CRT. One of the main questions [Miermont, Pers. Comm.] is then whether the transformation from $\mathcal T$ to the genealogy $\mathrm{cut}(\mathcal T)$ of the fragmentation is "reversible". Of course, some information about the initial tree $\mathcal T$ has been lost, and one must first understand whether it is possible to resample this information, and then study whether one can construct a tree $\tilde{\mathcal T}$ whose conditional distribution given $\mathrm{cut}(\mathcal T)$ is that of $\mathcal T$ given $\mathrm{cut}(\mathcal T)$.
Let $\mathcal H$ be another Brownian CRT (which should informally be thought of as $\mathrm{cut}(\mathcal T)$). We define below a continuum tree $\mathrm{shuff}(\mathcal H)$, which is random given $\mathcal H$, and such that the following identity in distribution holds:
$$(\mathrm{shuff}(\mathcal H), \mathcal H) \overset{d}{=} (\mathcal T, \mathrm{cut}(\mathcal T)). \qquad (1)$$
The construction of the tree $\mathrm{shuff}(\mathcal H)$ from $\mathcal H$ is the main objective of the present document, and can be described as follows. Let $\mathrm{Br}(\mathcal H)$ denote the set of branch points of $\mathcal H$. Start by assigning independently to every branch point $x \in \mathrm{Br}(\mathcal H)$ a random point $A_x$ sampled according to the mass measure $\nu$ restricted to $\mathrm{Sub}(\mathcal H, x)$, the subtree of $\mathcal H$ above $x$. For each such $x$, the choice of $A_x$ induces a choice among the subtrees of $\mathrm{Sub}(\mathcal H, x)$ rooted at $x$: let the fringe $\mathrm{Fr}(\mathcal H, x, A_x)$ be the set of points $y \in \mathrm{Sub}(\mathcal H, x)$ whose closest common ancestor with $A_x$ is $y \wedge A_x = x$. Then, informally, $\mathrm{shuff}(\mathcal H)$ is obtained by detaching $\mathrm{Fr}(\mathcal H, x, A_x)$ and reattaching it at $A_x$, for every branch point $x$ of $\mathcal H$ (see Figure 1); the points of the skeleton that are not branch points are not used. It is a priori unclear whether this definition makes sense, let alone whether the resulting metric space is a real tree or has the correct distribution. Indeed, it seems that we discard all the length of $\mathcal H$ by leaving the skeleton behind. The remainder of the document is devoted to making this construction rigorous and to proving that the tree $\mathrm{shuff}(\mathcal H)$ satisfies (1).
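The fringe is a purely genealogical notion, so it has an exact analogue on finite rooted trees. The Python sketch below is only that discrete analogue, not the continuum construction: the tree is encoded by a hypothetical parent map (`None` for the root), and `fringe(parent, x, a)` returns the vertices $y$ of the subtree above $x$ whose most recent common ancestor with $a$ is $x$, for a vertex $a$ strictly above $x$. All names are illustrative.

```python
def ancestors(parent, v):
    """Return [v, parent(v), ..., root] for a tree encoded by a parent map."""
    path = [v]
    while parent[v] is not None:
        v = parent[v]
        path.append(v)
    return path


def mrca(parent, u, v):
    """Most recent common ancestor u ∧ v."""
    anc_u = set(ancestors(parent, u))
    for w in ancestors(parent, v):
        if w in anc_u:
            return w
    raise ValueError("u and v are not in the same tree")


def subtree_above(parent, x):
    """Sub(x): all vertices having x as an ancestor (x included)."""
    return {y for y in parent if x in ancestors(parent, y)}


def fringe(parent, x, a):
    """Fr(x, a): the vertices y of Sub(x) with y ∧ a = x, for a strictly above x."""
    if a == x or x not in ancestors(parent, a):
        raise ValueError("a must lie strictly above x")
    return {y for y in subtree_above(parent, x) if mrca(parent, y, a) == x}


# Vertex 1 has three children (2, 3, 4); vertex 5 sits above 2.
EXAMPLE = {0: None, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2}
```

In the example tree, `fringe(EXAMPLE, 1, 5)` returns `{1, 3, 4}`: the branch of vertex 1 leading to 5 is excluded, and $x = 1$ itself belongs to its fringe, in line with the definition. For a vertex with a single child, such as 0, the fringe reduces to `{0}`, mirroring the fact that fringes are nontrivial only at branch points.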
The continuous problem at hand is connected to a rather large body of work on the destruction of random (discrete) trees by sampling random nodes or edges, initiated by Meir and Moon [24]. There is no significant difference between sampling nodes or edges, and we present here a version that samples nodes and proceeds as follows: sample a random node in some rooted tree (random or not), discard the portion that is now disconnected from the root, and keep going until the root is finally picked (the process then stops). The main question addressed by Meir and Moon [24], and by many of the researchers after them, concerns the number of steps, or cuts, needed for the process to terminate. This problem has been considered for a number of classical models of trees, including random binary search trees [19,20], random recursive trees [5,8,14,21] and the family trees of Galton-Watson processes conditioned on their total progeny [1,16,22,25]. Janson [22] was the first to realize, through moment calculations, that when the tree is a Galton-Watson tree there should be nice constructions of the limit random variables directly in terms of a continuous cutting of the Brownian CRT. In some sense, the continuous cutting alluded to in [22] is just a version of the logging of the CRT in which only the cuts affecting the size of the connected component containing the root are retained. The constructions in [1], [7] and [10] all encode the "number of cuts" affecting the connected components containing some given points as the total length of some distinguished subtrees. The construction of the genealogy as a compact tree is due to Bertoin and Miermont [10].
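The node-cutting procedure described above is straightforward to simulate on a finite rooted tree. The sketch below is a minimal Python illustration under our own encoding assumptions (a hypothetical parent map with comparable labels, and a hypothetical function name); it returns the number of cuts performed until the root is picked, and is not meant to reproduce any specific result from the works cited.

```python
import random


def count_cuts(parent, rng=None):
    """Node cutting on a rooted tree given by a parent map (root has parent None).

    Repeatedly pick a uniform node among those still connected to the root,
    discard it together with everything above it, and stop once the root is
    picked. Returns the number of cuts performed.
    """
    if rng is None:
        rng = random.Random()
    children = {v: [] for v in parent}
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)
    alive = set(parent)
    cuts = 0
    while True:
        v = rng.choice(sorted(alive))  # uniform among the remaining nodes
        cuts += 1
        if v == root:
            return cuts
        stack = [v]  # discard the subtree rooted at v
        while stack:
            w = stack.pop()
            if w in alive:
                alive.remove(w)
                stack.extend(children[w])
```

For instance, `count_cuts({0: None, 1: 0, 2: 1}, random.Random(0))` performs between one and three cuts on a path with three nodes, depending on the order in which nodes are picked.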
Let us now describe our approach to the definition of $\mathrm{shuff}(\mathcal H)$. The idea is to specify an order in which the fringes are sent to their new attachment points. This ordering yields a tree-valued Markov chain, and we formally define $\mathrm{shuff}(\mathcal H)$ as its almost sure limit. More precisely, a construction of the first element of this Markov chain appears in [1] and has been formally justified in [13]: there, the only subtrees to be reattached are those lying along the path between the root and a distinguished random leaf $U_1$. In the following, we refer to this transformation as the one-path reversal. The Markov chain we have in mind consists in iteratively reattaching the subtrees lying on the paths to an i.i.d. sequence of leaves $(U_i)_{i\ge1}$ in $\mathcal H$; we later refer to these as the $i$-path reversals, or $i$-reversals for short. However, not just any sequence $(U_i)_{i\ge1}$ will do. Indeed, although it is very close to the one we are after, the one-path reversal requires the subtrees to be detached to be precisely $\mathrm{Fr}(\mathcal H, x, U_1)$, for the branch points $x$ on the path to $U_1$. In particular, they are defined in terms of $U_1$ only, and the choices $A_x$ are then, in a sense, conditioned to be consistent with the constraint
$$\mathrm{Fr}(\mathcal H, x, A_x) = \mathrm{Fr}(\mathcal H, x, U_1). \qquad (2)$$
It follows that if we want to use the results of [1,13], then the sequence $(U_i)_{i\ge1}$ must be constructed from $(A_x, x \in \mathrm{Br}(\mathcal H))$ in such a way that, for all the branch points $x$ on the path to $U_i$, the constraint (2) is satisfied with $U_i$ instead of $U_1$ on the right-hand side.
Plan of the paper. The route we use here to define $\mathrm{shuff}(\mathcal H)$ relies on a careful understanding of the cutting procedure and of the genealogy induced by finitely many random points only. In Section 2, we introduce the relevant background on cut trees and their reversals, and we prove a few results that have not appeared elsewhere. Section 3 is devoted to proving that the sequence of $k$-path reversals converges as $k \to \infty$ in the Gromov-Prokhorov sense. At that point, the shuffle operation is justified only as the limit of a refining sequence of $k$-reversals. The direct construction presented above is then justified in Section 4, where we prove that one can construct a sequence of leaves such that the shuffle tree coincides with the limit of the $k$-reversals with respect to this sequence of leaves. Some auxiliary results about the Brownian CRT for which we did not find a reference are proved in Appendix A.

Preliminaries on cut trees and shuffle trees
In this section, we recall the previous results in [13] on the cut trees and the shuffle trees of the Brownian continuum random tree.

Notations and background on continuum random trees
We only give a short overview here; the interested reader may consult [3], [23], or [15] for more details.
A real tree is a geodesic metric space without loops. The real trees we are interested in are compact. A continuum random tree $\mathcal T$ is a random (rooted) real tree equipped with a probability measure, often referred to as the mass measure or the uniform measure. The Brownian continuum random tree is a special continuum random tree introduced by Aldous [2] as the scaling limit of uniformly random trees. One way to define the Brownian CRT starts from a standard normalized Brownian excursion of unit length $e = (e_s, 0 \le s \le 1)$. For any $s, t \in [0, 1]$, let
$$d(s, t) = e_s + e_t - 2 \min_{s \wedge t \le r \le s \vee t} e_r, \qquad (3)$$
and define $s \sim t$ if $d(s, t) = 0$. Then $d$ induces a metric on the quotient space $[0, 1]/\sim$. Moreover, this metric space is a real tree: it is the Brownian CRT, which we denote by $(\mathcal T, d_{\mathcal T})$ in the following. A Brownian CRT $\mathcal T$ also comes with a mass measure $\mu_{\mathcal T}$, which is the push-forward of Lebesgue measure by the canonical projection $p : [0, 1] \to \mathcal T$. A point sampled according to $\mu_{\mathcal T}$ is called here a $\mu_{\mathcal T}$-point. The Brownian CRT $\mathcal T$ is rooted at the point $p(0)$. For $a > 0$, let $e^{(a)} = (e^{(a)}_s, 0 \le s \le a)$ denote the Brownian excursion of length $a$. We can associate with $e^{(a)}$ a random real tree, denoted by $\mathcal T^{(a)}$, by replacing $e$ with $e^{(a)}$ in (3). For $s > 0$, we denote by $s\mathcal T$ the metric space in which the distance is $s\, d_{\mathcal T}$. Then Brownian scaling implies that (see also Appendix A)
$$\mathcal T^{(a)} \overset{d}{=} \sqrt{a}\, \mathcal T. \qquad (4)$$
Clearly, the mass measure of $\mathcal T^{(a)}$, which is the push-forward of Lebesgue measure on $[0, a]$, has total mass $a$. In what follows, we sometimes refer to $\mathcal T$ as the standard Brownian CRT.
For $u, v \in \mathcal T$, we denote by $[\![u, v]\!]$ and $]\!]u, v[\![$ the closed and open paths between $u$ and $v$ in $\mathcal T$, respectively. For $u \in \mathcal T$, the degree of $u$ in $\mathcal T$, denoted by $\deg(u, \mathcal T)$, is the number of connected components of $\mathcal T \setminus \{u\}$. We also denote by $\mathrm{Lf}(\mathcal T) = \{u \in \mathcal T : \deg(u, \mathcal T) = 1\}$ and $\mathrm{Br}(\mathcal T) = \{u \in \mathcal T : \deg(u, \mathcal T) \ge 3\}$ the sets of leaves and of branch points of $\mathcal T$, respectively.
Almost surely, these two sets are everywhere dense in $\mathcal T$, though $\mathrm{Lf}(\mathcal T)$ is uncountable and $\mathrm{Br}(\mathcal T)$ is countable. The skeleton of $\mathcal T$ is the complement of $\mathrm{Lf}(\mathcal T)$ in $\mathcal T$, denoted by $\mathrm{Sk}(\mathcal T)$. The skeleton is the union
$$\mathrm{Sk}(\mathcal T) = \bigcup_{u, v \in \mathcal T} ]\!]u, v[\![.$$
If $\rho$ is the root of $\mathcal T$, for $u \in \mathcal T$, the subtree above $u$, denoted by $\mathrm{Sub}(\mathcal T, u)$, is defined to be the set $\{v \in \mathcal T : u \in [\![\rho, v]\!]\}$. If $v \in \mathrm{Sub}(\mathcal T, u)$ is distinct from $u$, we denote by $\mathrm{Fr}(\mathcal T, u, v)$ the fringe tree hung from $[\![u, v]\!]$, which is the set $\{w \in \mathrm{Sub}(\mathcal T, u) : [\![w, u]\!] \cap [\![u, v]\!] = \{u\}\}$. It is nontrivial only if $u \in \mathrm{Br}(\mathcal T)$. There also exists a unique $\sigma$-finite measure $\ell$ concentrated on $\mathrm{Sk}(\mathcal T)$ such that $\ell([\![u, v]\!]) = d_{\mathcal T}(u, v)$ for any two points $u, v \in \mathcal T$; $\ell$ is called the length measure. If $\rho$ denotes the root of $\mathcal T$ and $v_1, \dots, v_k$ are $k$ points of $\mathcal T$, we write
$$\mathrm{Span}(\mathcal T; v_1, \dots, v_k) = \bigcup_{1 \le i \le k} [\![\rho, v_i]\!]$$
for the subtree spanned by the root and these points.

The state space of interest is the set of metric spaces that are pointed (that is, with a distinguished point that we call the root) and equipped with a probability measure. More precisely, it is the set of equivalence classes induced by measure-preserving isometries (on the support of the probability measure). When equipped with the Gromov-Prokhorov (GP) distance, this yields a Polish space. Convergence in the GP topology is equivalent to convergence in distribution of the matrices whose entries are the distances between pairs of points sampled from the probability measure. This is discussed at length in [13], and we also refer the reader to [18] and [17] for more information.
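Formula (3) and the Gromov-Prokhorov point of view both come down, in practice, to distance matrices between sampled points. The Python sketch below evaluates $d(s,t) = e_s + e_t - 2\min_{[s\wedge t, s\vee t]} e$ on a discretized excursion and builds the matrix of distances between a few sampled times. As a discrete stand-in for $e$, it draws a uniform Dyck path via the cycle lemma (the contour function of a uniform plane tree, on which $d$ is the graph distance between the vertices visited at the two times); the rescaling constants in Aldous' convergence theorem are deliberately left out, and all function names are hypothetical.

```python
import random


def crt_distance(e, i, j):
    """d(i, j) = e_i + e_j - 2 min over [i∧j, i∨j] of e, for a discretized excursion e."""
    lo, hi = min(i, j), max(i, j)
    return e[i] + e[j] - 2 * min(e[lo:hi + 1])


def distance_matrix(e, times):
    """Matrix of d-distances between the points coded by the given times."""
    return [[crt_distance(e, s, t) for t in times] for s in times]


def random_dyck_path(n, rng):
    """Uniform Dyck path with 2n steps (heights h_0, ..., h_2n) via the cycle lemma."""
    steps = [1] * n + [-1] * (n + 1)
    rng.shuffle(steps)
    partial, s = [], 0
    for x in steps:
        s += x
        partial.append(s)
    m = partial.index(min(partial))  # first time the walk reaches its minimum
    rotated = steps[m + 1:] + steps[:m + 1]  # stays >= 0 until its final -1 step
    heights = [0]
    for x in rotated[:-1]:
        heights.append(heights[-1] + x)
    return heights


# Usage: distances between four uniformly chosen times of a random contour.
rng = random.Random(0)
h = random_dyck_path(50, rng)
idx = [rng.randrange(len(h)) for _ in range(4)]
D = distance_matrix(h, idx)
```

On the toy excursion `[0, 1, 2, 1, 2, 1, 0]`, times 1 and 5 are identified ($d = 0$), as in the quotient $[0,1]/\sim$.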

The cutting procedure on a Brownian CRT
Let $\mathcal T$ be a Brownian CRT to be cut down, and let $\mathcal P$ be a Poisson point process on $\mathbb R_+ \times \mathcal T$ with intensity measure $dt \otimes \ell(dx)$. Every point $(t, x) \in \mathcal P$ is seen as a cut on $\mathcal T$ at location $x$, which arrives at time $t$. Then $\mathcal P$ defines a Poisson rain of cuts that split $\mathcal T$ into smaller and smaller connected components as time goes on. More precisely, let $(V_i)_{i\ge1}$ be a sequence of independent points sampled according to $\mu$; then $\mathcal P$ induces a nested process of exchangeable partitions of $\mathbb N$ in the following way. For each $t \ge 0$, the blocks are the equivalence classes of the relation $\sim_t$ defined by
$$i \sim_t j \iff \mathcal P \cap \big([0, t] \times [\![V_i, V_j]\!]\big) = \emptyset.$$
Let $\mathcal T_i(t)$ be the set of those points of $\mathcal T$ which are still connected to $V_i$ at time $t$, that is
$$\mathcal T_i(t) = \big\{x \in \mathcal T : \mathcal P \cap \big([0, t] \times [\![V_i, x]\!]\big) = \emptyset\big\}.$$
Then it is easy to see that $\mathcal T_i(t)$ is a connected subspace of $\mathcal T$, that is, a subtree of $\mathcal T$. Furthermore, we have $\mathcal T_i(t) \subseteq \mathcal T_i(s)$ if $s \le t$, and $\bigcap_{t \ge 0} \mathcal T_i(t) = \{V_i\}$ almost surely, since with probability one the atoms of $\mathcal P$ are everywhere dense in $\mathcal T$ and $V_i$ is not among them.
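On a finite tree whose marked points are vertices, a Poisson rain with intensity $dt \otimes \ell(dx)$ only matters through the time of the first cut on each edge, which is exponential with parameter equal to the edge length. The Python sketch below (a discrete illustration with hypothetical names, not part of the construction in the text) uses this to compute the blocks of $\sim_t$: two marked vertices are equivalent at time $t$ exactly when no edge of the path between them has been cut by time $t$, that is, when they share the same highest ancestor reachable through uncut edges.

```python
import random


def first_cut_times(length, rng):
    """For each edge (identified by its lower endpoint v), the first atom of a
    Poisson rain of rate 1 per unit length and unit time is Exp(length[v])."""
    return {v: rng.expovariate(l) for v, l in length.items()}


def blocks_at_time(parent, cut_time, marked, t):
    """Blocks of ~_t, as sorted lists of indices into `marked`."""
    def top(v):
        # climb while the edge from v to its parent is still uncut at time t
        while parent[v] is not None and cut_time[v] > t:
            v = parent[v]
        return v

    blocks = {}
    for i, v in enumerate(marked):
        blocks.setdefault(top(v), []).append(i)
    return sorted(blocks.values())


# Usage: a cherry (root 0 with leaves 1 and 2) with edge lengths 1 and 1/2.
PARENT = {0: None, 1: 0, 2: 0}
TIMES = first_cut_times({1: 1.0, 2: 0.5}, random.Random(0))
```

With deterministic first-cut times `{1: 0.5, 2: 2.0}` and marked vertices `[1, 2, 0]`, the partition is `[[0, 1, 2]]` at time 0.1, `[[0], [1, 2]]` at time 1 and `[[0], [1], [2]]` at time 3, illustrating the nested refinement.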

The k-cut tree
The main point of the definition of a cut tree is to obtain a representation of the genealogy of the fragmentation induced by $\mathcal P$ as a compact real tree. A first step consists in focusing on the genealogy of the fragmentation induced on $[k] = \{1, 2, \dots, k\}$, for some $k \ge 1$. So at this point, we only keep track of the evolution of the connected components containing the points $V_1, V_2, \dots, V_k$ and ignore all the other ones.
For each $t \ge 0$, let us write $\pi_k(t)$ for the partition of $[k]$ induced by $\sim_t$. Then, for any $t \ge s$, $\pi_k(t)$ is a refinement of $\pi_k(s)$. We encode the family $(\pi_k(t))_{t\ge0}$ by a rooted tree $S_k$ with $k$ leaves. Each equivalence class induced on $[k]$ by some $\sim_t$, $t \ge 0$, is represented by a node of $S_k$. It is also convenient to add an additional node $r$, which we see as the root of $S_k$. From the root $r$ there is a unique edge, which connects $r$ to the node labelled by $[k]$. Let $t_{[k]} := \sup\{t \ge 0 : \pi_k(t) = \{[k]\}\}$ be the time when $[k]$ disappears from $(\pi_k(t))_{t\ge0}$. Note that $t_{[k]}$ is the first moment when there is some point $x_{[k]} \in \bigcup_{1 \le i, j \le k} [\![V_i, V_j]\!]$ such that $(t_{[k]}, x_{[k]}) \in \mathcal P$. Observe that almost surely $x_{[k]}$ has degree two in $\mathcal T$, so that $\pi_k(t_{[k]})$ consists of exactly two blocks $E_1$ and $E_2$ with probability one. This is represented in $S_k$ by the fact that the node labelled $[k]$ has two children, labelled respectively by $E_1$ and $E_2$. One then proceeds recursively to define the subtrees induced on the sets of leaves $E_1$ and $E_2$, respectively. We obtain a binary tree on $k$ leaves labelled by $\{1\}, \dots, \{k\}$ (see Figure 2). We now endow $S_k$ with a distance $d_{S_k}$; more precisely, we define a binary real tree that has the same tree structure as $S_k$. For this, we set, for $1 \le i \le k$ and $t \in [0, \infty]$,
$$L_i(t) = \int_0^t \mu(\mathcal T_i(s))\, ds.$$
Then $L_i(\infty)$ is finite almost surely (this is shown for instance in [1]). For every $i \in [k]$, we want to identify the unique path of $S_k$ from the root $r$ to the leaf $\{i\}$ with the finite interval $\{L_i(t) : t \in [0, \infty]\}$ in such a way that a node labelled by $E$, for some $E \subseteq [k]$, is at distance $L_i(t_E)$ from the root, where $t_E = \sup\{t \ge 0 : E \in \pi_k(t)\}$. Doing so does not cause any ambiguity, since if $i, j \in E$ then $\mathcal T_j(s) = \mathcal T_i(s)$ for any $s \le t_E$. We thus obtain a compact real tree which consists of $k$ paths of respective lengths $L_i(\infty)$, $1 \le i \le k$. By a slight abuse of notation, we still write $S_k$ for the real tree $(S_k, d_{S_k})$.
In other words, if we write $E_i(0) = r, E_i(1) = [k], E_i(2), \dots, E_i(h_i) = \{i\}$ for the sequence of nodes on the path from the root to $\{i\}$ in $S_k$, the real tree $(S_k, d_{S_k})$ is the tree $S_k$ in which the edges have been replaced by the $2k - 1$ intervals of lengths
$$L_i(t_{E_i(m)}) - L_i(t_{E_i(m-1)}), \qquad 1 \le m \le h_i,\ 1 \le i \le k,$$
with the convention $t_{E_i(0)} := 0$.
We now move on to the definition of the $k$-cut tree. The real tree $S_k$ provides the backbone of the $k$-cut tree. We define the $k$-cut tree $\mathrm{cut}(\mathcal T, V_1, \dots, V_k)$ as the real tree obtained by grafting on the backbone $S_k$ the subtrees discarded during the cutting procedure. Let $C_k$ be the set of those $t \ge 0$ for which $\mu(\mathcal T_i(t-) \setminus \mathcal T_i(t)) > 0$ for some $i \in [k]$; for $t \in C_k$, let $i_t$ be the smallest such index and let $\Delta\mathcal T_{i_t}(t) := \mathcal T_{i_t}(t-) \setminus \mathcal T_{i_t}(t)$. Because of the holes left by the previous cuts, $\Delta\mathcal T_{i_t}(t)$ is a connected but not complete subspace of $\mathcal T$. We let $\Delta^k_t$ be the completion of $\Delta\mathcal T_{i_t}(t)$. Almost surely, for each $t \in C_k$, there exists a unique $x \in \mathcal T$ such that $(t, x) \in \mathcal P$. Note that $x \in \Delta\mathcal T_{i_t}(t)$. We denote by $x^\ast \in \Delta^k_t$ the image of $x$ under the canonical injection from $\Delta\mathcal T_{i_t}(t)$ into $\Delta^k_t$, and we think of $\Delta^k_t$ as rooted at $x^\ast$. Then, for each $t \in C_k$, we graft $\Delta^k_t$ by its root on the path of $S_k$ connecting the root to the leaf $\{i_t\}$, at distance $L_{i_t}(t)$ from the root. We denote by $G_k = \mathrm{cut}(\mathcal T, V_1, \dots, V_k)$ the resulting metric space. The tree $G_k$ also bears a mass measure inherited from that of $\mathcal T$; the set of points that have been added (either in the backbone $S_k$ or by completion) is assigned mass 0. The new mass measure is still denoted by $\mu$. An alternative way to define $G_k$ (which is the one we used in [13]) is to graft $\Delta\mathcal T_{i_t}(t)$ (rather than $\Delta^k_t$) on $S_k$, and then to complete the metric space. One easily checks that these two definitions coincide.
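Between cut times, $s \mapsto \mu(\mathcal T_i(s))$ is constant, so when it has finitely many jumps, $L_i$ is an elementary integral. The Python sketch below assumes a hypothetical input format, namely the sorted list of jump times together with the mass just after each jump, starting from $\mu(\mathcal T_i(0)) = 1$; it evaluates $L_i(t)$, including $t = \infty$ when the mass eventually vanishes, and the distances $L_i(t_E)$ from the root of the successive nodes $E$ on the path to the leaf $\{i\}$. Function names are illustrative.

```python
import math


def L(jumps, t):
    """Integral from 0 to t of m(s) ds, where m = 1 on [0, s_1) and m = m_j on
    [s_j, s_{j+1}); `jumps` is the sorted list [(s_1, m_1), (s_2, m_2), ...]."""
    total, prev, mass = 0.0, 0.0, 1.0
    for s, m in jumps:
        if s >= t:
            break
        total += mass * (s - prev)
        prev, mass = s, m
    if math.isinf(t):
        # the integral up to infinity is finite only if the mass has vanished
        return total if mass == 0 else math.inf
    return total + mass * (t - prev)


def path_node_distances(jumps, split_times):
    """Distances L_i(t_E) from the root r of the successive nodes E on the path
    to the leaf {i}; `split_times` lists the t_E (the last one, for {i}, is inf)."""
    return [L(jumps, t) for t in split_times]
```

For example, if the mass of the component containing $V_i$ drops to $1/2$ at time 1 and to 0 at time 3, then $L_i(\infty) = 1 + \tfrac12 \cdot 2 = 2$; consecutive differences of `path_node_distances` give the edge lengths along the path.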
Remark. There are a number of different mass measures that we need to consider here. To clarify the discussion and keep the notation under control, we keep using the same name for the mass measure whenever only a set of measure zero is modified by the transformation, either by removal of countably many points, by (countable) completion, or by the addition of a backbone. For instance, we think of the tree $G_k$ as still carrying the mass measure $\mu$ of $\mathcal T$.
Proposition 1 (Distribution of the k-cut tree). If $\mathcal T$ is the Brownian CRT and $(V_i)_{i\ge1}$ is a sequence of i.i.d. points of $\mathcal T$ with common distribution $\mu$, then for each $k \ge 1$, we have …
The case $k = 1$ is a special case of Theorem 3.2 in [13]. The general case follows from the analogous result for discrete trees (see Section 4, Lemma 4.5 there) and the same weak convergence argument as in [13]; we omit the details.

The "complete" cut tree
By construction, since the definition of $\mathcal T_i(t)$ does not depend on $k$, we have $S_k \subset S_{k+1}$ for each $k \ge 1$.
Proposition 2 (Complete cut tree, [10,13]). Let $\mathrm{cut}(\mathcal T)$ be the limit metric space of $(S_k)_{k\ge1}$, namely the completion of $\bigcup_{k\ge1} S_k$. If $\mathcal T$ is the Brownian CRT, then almost surely $\mathrm{cut}(\mathcal T)$ is a compact real tree, and it is distributed as $\mathcal T$.
The construction of $G_k$ described above yields the following recurrence relation between $G_{k+1}$ and $G_k$ (see Figure 3). For every $k \ge 1$, the collection $(\Delta^k_t, t \in C_k)$ has full mass, so a uniform point $V$ falls with probability one in $\Delta^k_t$ for some $t \in C_k$. If we let $m_k := \mu(\Delta^k_t)$ for this $t$, then $m_k^{-1/2} \Delta^k_t$ is distributed as a standard Brownian CRT. As a consequence, the 1-cut tree $\mathrm{cut}(\Delta^k_t, V)$ is well-defined.
The cut tree $\mathrm{cut}(\mathcal T)$ may then be seen as the limit of the $k$-cut trees. The notion of $\mathrm{cut}(\mathcal T)$ here coincides with that in [8].

Figure 3: That subtree stopped being transformed at time $\tau_k$, since it did not contain any of $V_1, \dots, V_{k-1}$, and it should now be replaced by …

Proposition 4 (Convergence of k-cut trees). Let $\mathcal T$ be the Brownian CRT. As $k \to \infty$, $G_k \to \mathrm{cut}(\mathcal T)$ almost surely.
Proof. If $(T, d)$ is a compact real tree rooted at $\rho$, we let $\mathrm{ht}(T) = \sup_{u \in T} d(u, \rho)$ and $\mathrm{diam}(T) = \sup_{u, v \in T} d(u, v)$ denote its height and diameter, respectively. By the triangle inequality, $\mathrm{ht}(T) \le \mathrm{diam}(T) \le 2\,\mathrm{ht}(T)$. On the one hand, we deduce from Proposition 3 that … is a non-increasing sequence, so it converges almost surely to some random variable that we denote by $\Upsilon_\infty$. On the other hand, if we write $\delta_H$ for the Hausdorff distance (on the compact subsets of $G_k$), then it follows from the construction of $G_k$ that …
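The inequality $\mathrm{ht}(T) \le \mathrm{diam}(T) \le 2\,\mathrm{ht}(T)$ can be checked on finite trees with edge lengths. The Python sketch below assumes a hypothetical adjacency-list encoding `{vertex: [(neighbour, length), ...]}`; the diameter is computed by the classical double sweep (a vertex farthest from any starting vertex is an endpoint of a longest path), which is valid for trees with nonnegative edge lengths.

```python
def farthest(adj, src):
    """Return (vertex, distance) for a vertex farthest from src in a weighted tree."""
    dist = {src: 0.0}
    stack = [src]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    far = max(dist, key=dist.get)
    return far, dist[far]


def height(adj, root):
    return farthest(adj, root)[1]


def diameter(adj):
    # double sweep: a vertex farthest from any start is an endpoint of a diameter
    a, _ = farthest(adj, next(iter(adj)))
    return farthest(adj, a)[1]


STAR = {0: [(1, 1.0), (2, 2.0), (3, 3.0)], 1: [(0, 1.0)], 2: [(0, 2.0)], 3: [(0, 3.0)]}
```

On the star `STAR` with edge lengths 1, 2 and 3, rooted at its centre, the height is 3 and the diameter is 5, consistent with both inequalities; rooted at the leaf 1, the height becomes 4.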

Distribution of a uniform path
One of the main ingredients of our construction of the shuffle tree in Section 3 consists in understanding how the path between two points is transformed as the approximation $G_k$ of the cut tree $\mathrm{cut}(\mathcal T)$ is refined. More precisely, let $\xi_1, \xi_2$ be two independent $\mu$-points of $\mathcal T$ and write $p := ]\!]\xi_1, \xi_2[\![$ for the open path between them in $\mathcal T$. Initially, in $G_0 := \mathcal T$, $p$ is indeed a path. But later on, as $k$ increases, this path gets cut into pieces, each contained in one of the $\Delta^k_t$ that are grafted onto the backbone $S_k$. More formally, for each $k \ge 0$, there exists an injective map $\varphi^\bullet_k : \bigcup_{t \in C_k} \Delta^k_t \to G_k$ whose restriction to every $\mathrm{Sk}(\Delta^k_t)$ … By the recursive construction in Proposition 3, understanding how the path $p$ is mapped into $G_k$ by $\varphi_k$ reduces to understanding one step of the transformation, that is, how $p$ is mapped into $G_1 = \mathrm{cut}(\mathcal T, V_1)$ by $\varphi_1$. The following result for $k = 1$ was proved in [13]. It is the basis of the one-path reversal of the next section, and is used in Section 3.5 to derive the distribution of $p_k = \varphi_k(p)$.
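Proposition 5 (Distribution of $p_1$, [13]). Almost surely, there exist $M_1, M_2 \ge 0$ and two finite sequences of elements of $C_1$, $(t_{1,m})_{0 \le m \le M_1}$ and $(t_{2,m})_{0 \le m \le M_2}$, which are all distinct except that $t_{1,M_1} = t_{2,M_2}$, and there exist two sequences of points $(a_1(m))_{0\le m\le M_1}$ and $(a_2(m))_{0\le m\le M_2}$ satisfying $a_i(0) = \xi_i$ and $a_i(m) \in \Delta^1_{t_{i,m}}$ for $0 \le m \le M_i$, such that …

Let $\mathcal H$ be a Brownian CRT with mass measure $\nu$, and let $U_1$ be a random point of $\mathcal H$ sampled according to the mass measure. In [13], we constructed a real tree $Q_1 = \mathrm{shuff}(\mathcal H, U_1)$ such that the following identity in distribution holds:
$$(\mathrm{shuff}(\mathcal H, U_1), \mathcal H) \overset{d}{=} (\mathcal T, \mathrm{cut}(\mathcal T, V_1)). \qquad (6)$$
In particular, (6) implies that $Q_1$ and $\mathcal T$ have the same distribution.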
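Let us now briefly recall the construction of $\mathrm{shuff}(\mathcal H, U_1)$. Let $B$ be the set of branch points of $\mathcal H$ on the path $[\![\rho, U_1]\!]$ from the root $\rho$ of $\mathcal H$ to $U_1$, and for $x \in B$ let $F_x := \mathrm{Fr}(\mathcal H, x, U_1)$. Then, with each $x \in B$ we associate an attaching point $A_x$, sampled independently according to the restriction of $\nu$ to $\bigcup_{x' \in B,\, x \prec x'} F_{x'}$, where $x \prec x'$ means that $x \in [\![\rho, x'[\![$.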
By Proposition 5, in a space where we would have $\mathcal H = \mathrm{cut}(\mathcal T, V_1)$ and where $[\![\rho, U_1]\!]$ would be the distinguished path created, it would be possible to couple these choices with the cutting procedure in such a way that each $A_x$ corresponds to the location in the initial tree $\mathcal T$ from which $F_x$ was detached. Informally, if we were to glue these $F_x$ back at $A_x$, we should obtain $\mathcal T$ back. So these choices are the correct ones; nevertheless, unlike its discrete counterpart (see [13, Section 3]), this transformation is a priori not well-defined.
The formal justification of this reverse transformation first requires verifying that the distance between two independent $\nu$-points is a.s. well-defined (here only finitely many reattaching operations are needed), and then constructing $\mathrm{shuff}(\mathcal H, U_1)$ as the continuum random tree corresponding to the matrix of distances between a sequence of i.i.d. $\nu$-points. In other words, unlike $\mathrm{cut}(\mathcal T, V_1)$, we do not construct $\mathrm{shuff}(\mathcal H, U_1)$ by actually reassembling pieces of $\mathcal H$; instead, we construct a tree that has the same metric properties.
Proposition 6 (Construction of $\mathrm{shuff}(\mathcal H, U_1)$, [13]). Let $(\eta_i)_{i\ge1}$ be a sequence of independent points of $\mathcal H$ sampled according to the mass measure $\nu$. For each $i \ge 1$, let $(a_i(m))_{m\ge0}$ be the sequence of points obtained as follows: $a_i(0) = \eta_i$; inductively, for $m \ge 1$, let $x_i(m-1)$ be the element of $B$ such that $a_i(m-1) \in F_{x_i(m-1)}$, and set $a_i(m) = A_{x_i(m-1)}$. Then, for each pair $i \ne j$, the quantity
$$I(i, j) := \inf\{m \ge 0 : x_i(m) = x_j(m') \text{ for some } m' \ge 0\}$$
is almost surely finite, and we let
$$\gamma(i, j) := \sum_{0 \le m < I(i,j)} d_{\mathcal H}(a_i(m), x_i(m)) + d_{\mathcal H}\big(a_i(I(i,j)), a_j(I(j,i))\big) + \sum_{0 \le m < I(j,i)} d_{\mathcal H}(a_j(m), x_j(m)). \qquad (7)$$
We think of the matrix $(\gamma(i, j))_{i,j\ge1}$ as the distance matrix between the points $(\eta_i)_{i\ge1}$ in a new metric space. Then $(\gamma(i, j))_{i,j\ge1}$ defines a CRT, which we root at $\eta_1$ and denote by $\mathrm{shuff}(\mathcal H, U_1)$.
Note that only the points $(\eta_i)_{i\ge1}$ are kept from $\mathcal H$; the other ones are constructed by the metric space completion. Furthermore, the very construction of the tree $Q_1 = \mathrm{shuff}(\mathcal H, U_1)$ implies that, if one denotes by $\nu_1$ its mass measure, then $(\eta_i)_{i\ge1}$ is a sequence of i.i.d. $\nu_1$-points in $Q_1$. Observe also that $\gamma(i, j)$ corresponds to the distance between $\eta_i$ and $\eta_j$ after grafting all the $F_{x_i(\ell)}$ at $a_i(\ell + 1)$ for $\ell < I(i, j)$ and all the $F_{x_j(m)}$ at $a_j(m + 1)$ for $m < I(j, i)$ (see Figure 4). Again, one may think of a coupling in which the points $\eta_i$, $i \ge 1$, would be chosen by sampling $(\xi_i)_{i\ge1}$ in $\mathcal T$. Almost surely, all these points are still in $\mathrm{cut}(\mathcal T, V_1)$.
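The distance $\gamma(i,j)$ is obtained by following the two chains $a_i(m), x_i(m)$ and $a_j(m), x_j(m)$ until they first reach a common fringe, then adding up the distances travelled inside each fringe, which amounts to measuring the distance after grafting each $F_{x_i(\ell)}$ at $a_i(\ell+1)$, and likewise for $j$. The Python sketch below reproduces this bookkeeping on a finite toy example, and everything in it is hypothetical: the tree is a small weighted tree given by a parent map, `FRINGE_OF[p]` names the branch point $x$ with $p \in F_x$, `A[x]` is the attaching point of $F_x$, and a chain simply stops at a fringe with no attaching point (in $\mathcal H$ the chains are infinite, and only the first common fringe matters).

```python
def tree_dist(parent, length):
    """Distance function of a finite tree (length[v] = length of edge v -> parent[v])."""
    def up(v):
        out, acc = {}, 0.0
        while True:
            out[v] = acc
            if parent[v] is None:
                return out
            acc += length[v]
            v = parent[v]

    def dist(u, v):
        up_u, acc = up(u), 0.0
        while v not in up_u:  # climb from v to the first common ancestor with u
            acc += length[v]
            v = parent[v]
        return acc + up_u[v]

    return dist


def chain(eta, fringe_of, A):
    """a(0) = eta, a(m+1) = A[x(m)], where x(m) is the fringe containing a(m).
    Assumes each attaching point lies strictly above its branch point, so it stops."""
    pts, xs = [eta], [fringe_of[eta]]
    while xs[-1] in A:
        pts.append(A[xs[-1]])
        xs.append(fringe_of[pts[-1]])
    return pts, xs


def gamma(eta_i, eta_j, fringe_of, A, dist):
    pi, xi = chain(eta_i, fringe_of, A)
    pj, xj = chain(eta_j, fringe_of, A)
    index_j = {x: m for m, x in enumerate(xj)}
    I_ij = next(m for m, x in enumerate(xi) if x in index_j)  # first shared fringe
    I_ji = index_j[xi[I_ij]]
    return (sum(dist(pi[m], xi[m]) for m in range(I_ij))
            + dist(pi[I_ij], pj[I_ji])
            + sum(dist(pj[m], xj[m]) for m in range(I_ji)))


# Toy tree: spine r - b1 - b2 - b3, with leaves c1, c2, c3 hanging from b1, b2, b3.
PARENT = {"r": None, "b1": "r", "b2": "b1", "b3": "b2", "c1": "b1", "c2": "b2", "c3": "b3"}
LENGTH = {"b1": 1.0, "b2": 1.0, "b3": 1.0, "c1": 2.0, "c2": 3.0, "c3": 1.0}
FRINGE_OF = {"c1": "b1", "c2": "b2", "c3": "b3"}
A = {"b1": "c2", "b2": "c3"}
```

In the toy example, $F_{b_1}$ is reattached at $c_2$ and $F_{b_2}$ at $c_3$, so `gamma("c1", "c3", FRINGE_OF, A, tree_dist(PARENT, LENGTH))` returns $2 + 3 = 5$, the length of the path from $c_1$ to $b_1 \equiv c_2$ and then to $b_2 \equiv c_3$ after regrafting.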

Multiple-paths reversal and the k-shuffle tree
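Once $\mathrm{shuff}(\mathcal H, U_1)$ has been properly defined, the $k$-shuffle tree $Q_k = \mathrm{shuff}(\mathcal H, U_1, \dots, U_k)$ is defined by induction. Suppose that we have constructed $Q_{k-1}$ from $\mathcal H$ for some $k \ge 2$. Let $\tilde T^\circ_k$ be the component of $\mathcal H \setminus \mathrm{Span}(\mathcal H; U_1, \dots, U_{k-1})$ containing $U_k$, and let $\tilde T_k$ be the completion of $\tilde T^\circ_k$. If we write $\tilde m_k := \nu(\tilde T_k)$, then $\tilde m_k^{-1/2} \tilde T_k$ is distributed as a standard Brownian CRT, and $\mathrm{shuff}(\tilde T_k, U_k)$ is thus well-defined by Proposition 6. Now let $\tilde{\mathcal H}_k$ be the tree obtained from $\mathcal H$ by replacing $\tilde T_k$ with $\mathrm{shuff}(\tilde T_k, U_k)$. Then we define $Q_k := \mathrm{shuff}(\tilde{\mathcal H}_k, U_1, \dots, U_{k-1})$. The following is a continuous analogue of Proposition 4.8 in [13].
Proposition 7. For each $k \ge 1$, we have
$$(Q_k, \mathcal H) \overset{d}{=} (\mathcal T, G_k). \qquad (9)$$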
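Proof. We proceed by induction on $k \ge 1$. The base case $k = 1$ is (8) from Proposition 6. Assume now that (9) holds for all natural numbers up to $k - 1 \ge 1$. By the scaling property, $\tilde m_k^{-1/2} \tilde T_k$, equipped with the normalized restriction of $\nu$ to $\tilde T_k$, is distributed as $\mathcal H$, and is independent of $\mathcal H \setminus \tilde T_k$. Thus, we can apply the induction hypothesis to find that
$$(\mathrm{shuff}(\tilde T_k, U_k), \tilde T_k) \overset{d}{=} (\Delta^k_{\tau_k}, \mathrm{cut}(\Delta^k_{\tau_k}, V_k)), \qquad (10)$$
as $\Delta^k_{\tau_k}$ has the same distribution as $\tilde T_k$. In particular, all four trees in (10) are (rescaled) Brownian CRTs, $\mathrm{shuff}(\tilde T_k, U_k)$ and $\tilde T_k$ have the same distribution, and we deduce from the definition of $\tilde{\mathcal H}_k$ that … (11)
It follows that $\tilde{\mathcal H}_k$ and $\mathcal H$ have the same distribution. Then, by the induction hypothesis, we have
$$(Q_k, \tilde{\mathcal H}_k) \overset{d}{=} (\mathcal T, G_{k-1}). \qquad (12)$$
Now $\Delta^k_{\tau_k}$ is the connected component of $G_{k-1} \setminus S_{k-1}$ containing the leaf labelled $k$. It is thus obtained from $G_{k-1}$ in the same way as $\tilde T_k$ is obtained from $\mathcal H$. As a consequence, $(\mathcal H \setminus \tilde T_k, \tilde T_k)$ and $(G_{k-1} \setminus \Delta^k_{\tau_k}, \Delta^k_{\tau_k})$ have the same distribution. Combining this with (11) and (12), we obtain … The transformation from $\tilde T_k$ to $\mathrm{shuff}(\tilde T_k, U_k)$ only involves sampling random points in $\tilde T_k$, and is therefore independent of the transformation from $\tilde{\mathcal H}_k$ to $Q_k$. Similarly, the transformation from $\mathcal T$ to $G_{k-1}$ and the one from $\Delta^k_{\tau_k}$ to $\mathrm{cut}(\Delta^k_{\tau_k}, V_k)$ are independent. Then it follows from (10) that … which entails that (9) holds for $k$, and completes the proof.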
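In a finite tree, the component of the complement of $\mathrm{Span}(\mathcal H; U_1, \dots, U_{k-1})$ that contains $U_k$ is the subtree above the highest ancestor of $U_k$ not lying on the span, because the span is closed under taking ancestors. A discrete Python sketch with hypothetical names (the tree is given by a parent map, and the root is always counted in the span):

```python
def span_vertices(parent, points):
    """Vertices of Span(T; points): the root together with all root-to-point paths."""
    root = next(v for v, p in parent.items() if p is None)
    span = {root}
    for v in points:
        while v not in span:
            span.add(v)
            v = parent[v]
    return span


def component_containing(parent, points, u):
    """Component of T minus Span(T; points) containing u, and the span vertex it hangs from."""
    span = span_vertices(parent, points)
    if u in span:
        raise ValueError("u lies on the span")
    top = u
    while parent[top] not in span:  # climb to the highest ancestor of u off the span
        top = parent[top]
    children = {}
    for v, p in parent.items():
        children.setdefault(p, []).append(v)
    component, stack = set(), [top]
    while stack:  # the component is the whole subtree above `top`
        w = stack.pop()
        component.add(w)
        stack.extend(children.get(w, []))
    return component, parent[top]


EXAMPLE = {0: None, 1: 0, 2: 1, 3: 1, 4: 3, 5: 0}
```

The second return value is the point of the span at which the component hangs; in the continuum setting, it corresponds to the root of the completed component.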
3 Convergence of k-shuffle trees and the shuffle tree

The shuffle tree
In this section, we prove the following result, which lays the foundation for the formal definition of the shuffle tree.
Theorem 8. …
The sequence of leaves $(U_i)_{i\ge1}$ that is used influences the limit: in particular, it determines which subtrees are fringes and where they are sent. Still, in the same way that $\mathrm{cut}(\mathcal T)$ depends on the cutting procedure, we denote the limit by $\mathrm{shuff}(\mathcal H)$, although it depends on the sequence $(U_i)_{i\ge1}$. However, $(U_i)_{i\ge1}$ does not contain all the randomness used in the construction. In Section 4, we construct a specific sequence of leaves which makes explicit the randomness hidden in the construction, justifying the claim in the introduction that the family $(A_x, x \in \mathrm{Br}(\mathcal H))$ is all that one needs.
The following is a direct consequence of Theorem 8, and justifies the claim that shuff( · ) is indeed the reverse transformation of cut( · ).
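Corollary 9. We have the following identity in distribution:
$$(\mathrm{shuff}(\mathcal H), \mathcal H) \overset{d}{=} (\mathcal T, \mathrm{cut}(\mathcal T)). \qquad (13)$$
Proof. Let $f$ and $g$ be two bounded real-valued functions that are continuous in the Gromov-Prokhorov topology. Recall the notation $G_k = \mathrm{cut}(\mathcal T, V_1, \dots, V_k)$. By (9), we have
$$\mathbb E[f(Q_k) \cdot g(\mathcal H)] = \mathbb E[f(\mathcal T) \cdot g(G_k)].$$
By Proposition 4 and the dominated convergence theorem, the right-hand side above converges to $\mathbb E[f(\mathcal T) \cdot g(\mathrm{cut}(\mathcal T))]$ as $k \to \infty$. Similarly, by Theorem 8, the left-hand side converges to $\mathbb E[f(\mathrm{shuff}(\mathcal H)) \cdot g(\mathcal H)]$. Therefore,
$$\mathbb E[f(\mathrm{shuff}(\mathcal H)) \cdot g(\mathcal H)] = \mathbb E[f(\mathcal T) \cdot g(\mathrm{cut}(\mathcal T))].$$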
Since f and g were arbitrary, this entails (13).
Let $(\eta_i)_{i\ge1}$ be the sequence of independent points of $\mathcal H$ in Proposition 6, which is independent of the sequence $(U_i)_{i\ge1}$. Recall the random variable $\gamma(i, j)$, which is the distance between $\eta_i$ and $\eta_j$ in $Q_1$. Part of the statement of Proposition 6 says that $(\eta_i)_{i\ge1}$ is a family of i.i.d. uniform points in $Q_1$. Because of the inductive definition of $Q_k$, the sequence $(\eta_i)_{i\ge1}$ remains an i.i.d. uniform family in each $Q_k$. For $k \ge 1$, let us denote by $\gamma_k(i, j)$ the distance between $\eta_i$ and $\eta_j$ in $Q_k$. The main tool towards Theorem 8 is the following proposition.
Proposition 10. …
Let us first explain why this entails Theorem 8.
Proof of Theorem 8. Observe that, by Proposition 7, for each $k \ge 1$, $Q_k$ is distributed as the Brownian CRT $\mathcal H$. Thus, $(\gamma_k(i, j))_{i,j\ge1}$ and $(d_{\mathcal H}(\eta_i, \eta_j))_{i,j\ge1}$ have the same distribution for each $k \ge 1$. It follows from Proposition 10 that the limit matrix $(\gamma_\infty(i, j))_{i,j\ge1}$ is also distributed as $(d_{\mathcal H}(\eta_i, \eta_j))_{i,j\ge1}$. In other words, $(\gamma_\infty(i, j))_{i,j\ge1}$ has the distribution of the distance matrix of the Brownian CRT. In particular, for each $n \ge 1$, $(\gamma_\infty(i, j))_{1\le i,j\le n}$ defines an $n$-leaf real tree $R_n$, and the family $(R_n)_{n\ge1}$ is consistent and leaf-tight (see [3]), which means that $(R_n)_{n\ge1}$ admits a representation as a continuum random tree.
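Observe that all these properties still hold a.s. conditionally on $\mathcal H$. More precisely, by Proposition 10, for $\mathbb P$-almost every $\mathcal H$, as $k \to \infty$, for each $n \ge 1$,
$$(\gamma_k(i, j))_{1\le i,j\le n} \to (\gamma_\infty(i, j))_{1\le i,j\le n}, \qquad (14)$$
conditionally on $\mathcal H$. For those $\mathcal H$ for which (14) holds, the family $(R_n)_{n\ge1}$ is a.s. consistent and leaf-tight, conditionally on $\mathcal H$. Let $R_\infty(\mathcal H)$ be the CRT representation of this family. Then, by definition, (14) entails the Gromov-Prokhorov convergence of $\mathrm{shuff}(\mathcal H, U_1, \dots, U_k)$ to $R_\infty(\mathcal H)$.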
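Whether a finite matrix such as $(\gamma_\infty(i,j))_{1\le i,j\le n}$ is the distance matrix of points in a real tree is characterised by the classical four-point condition: for any four indices, the two largest of the sums $D_{ij} + D_{kl}$, $D_{ik} + D_{jl}$ and $D_{il} + D_{jk}$ coincide. The Python check below assumes the input is already a metric (symmetric, zero diagonal, triangle inequality), in which case quadruples of distinct indices suffice; the function name and the tolerance are arbitrary choices for floating-point data.

```python
from itertools import combinations


def is_tree_metric(D, tol=1e-9):
    """Four-point condition for a finite metric given as a square matrix D."""
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
        if sums[2] - sums[1] > tol:  # the two largest sums must agree
            return False
    return True
```

Distances between points on a line pass the test, whereas the graph metric of a 4-cycle (adjacent vertices at distance 1, opposite ones at distance 2) fails it.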
The remainder of the section is devoted to proving Proposition 10. Since $(\eta_i)_{i\ge1}$ is an i.i.d. sequence, it suffices to consider the case $i = 1$, $j = 2$.

A series representation for γ k (1, 2)
The idea behind the formal definition in Proposition 6 is to leverage Proposition 5 as follows: if $\mathcal H$ were $\mathrm{cut}(\mathcal T, V_1)$ for some $\mathcal T$ and $V_1$, with the distinguished path being the one between the root and $U_1 \in \mathcal H$, then the image of the path between two points $\xi_1$ and $\xi_2$ of $\mathcal T$ would go through a number of subtrees of $\mathcal H \setminus \mathrm{Span}(\mathcal H; U_1)$, and in every such subtree it would go between two uniform points (or between a uniform point and the root). We now go further and give such a representation of $\gamma_k(1, 2)$ as a sum, in which we specify the distributions of the trees and points involved.
The masses of these trees are of prime importance, and we let
$$\mathcal S^{\downarrow} = \Big\{ \mathbf x = (x_1, x_2, \dots) : x_1 \ge x_2 \ge \dots \ge 0,\ \sum_{i\ge1} x_i \le 1 \Big\}$$
be the space of mass partitions, equipped with the usual $\ell^1$-norm $\|\cdot\|_1$. If $\mathbf x = (x_1, x_2, \dots) \in \mathcal S^\downarrow$, the length of $\mathbf x$ is defined to be the number of its nonzero terms, which may well be infinite, and we denote by $\mathcal S^\downarrow_f$ the subset of $\mathcal S^\downarrow$ consisting of the elements of finite length. Recall the definition of $\gamma_1(1, 2)$ in (7). The trees involved there are the components $F_{x_i(n)}$ rooted at $x_i(n)$, for $n \ge 0$ and $i = 1, 2$. Let $\varpi$ denote the distribution of the rearrangement in decreasing order of
$$\{\nu(F_{x_1(n)}), 0 \le n \le I(1, 2)\} \cup \{\nu(F_{x_2(n)}), 0 \le n \le I(2, 1) - 1\}.$$
Then $\varpi$ is a probability measure supported on $\mathcal S^\downarrow_f$.
Lemma 11 (Representation of $\gamma_k(1, 2)$). For each $k \ge 1$, there exist a finite sub-collection $\mathbf m_k = (m_{k,n})_{0\le n\le N_k} \in \mathcal S^\downarrow_f$ of the masses of the components of $\mathcal H \setminus \mathrm{Span}(\mathcal H; U_1, \dots, U_k)$, and a sequence of positive real numbers $(R^k_n, 0 \le n \le N_k)$, such that
$$\gamma_k(1, 2) = \sum_{0 \le n \le N_k} \sqrt{m_{k,n}}\, R^k_n, \qquad (15)$$
where $(R^k_n)_{n\ge0}$ is a sequence of i.i.d. copies of a Rayleigh random variable, independent of $\mathbf m_k$ and $N_k$. Moreover, $(\mathbf m_k)_{k\ge1}$ is a Markov chain with initial law $\varpi$ and the following transitions:
• with probability $1 - \|\mathbf m_k\|_1$, $\mathbf m_{k+1} = \mathbf m_k$, and
• for $0 \le n \le N_k$, with probability $m_{k,n}$, $\mathbf m_{k+1}$ is obtained by replacing in $\mathbf m_k$ the element $m_{k,n}$ by $m_{k,n} \cdot \mathbf m$, where $\mathbf m$ has distribution $\varpi$, and then re-sorting the sequence in decreasing order.
Then $m_{1,n} > 0$ for $0 \le n \le N_1$, and we set $R^1_n := D^1_n\, m_{1,n}^{-1/2}$. By (7), this definition immediately yields
$$\gamma_1(1, 2) = \sum_{0 \le n \le N_1} \sqrt{m_{1,n}}\, R^1_n. \qquad (16)$$
This is (15) for $k = 1$. For the distribution of $(R^1_n)_{n\ge0}$, we need the following fact, whose proof is given in Appendix A.
Lemma 12 (Scaling property). For $0 \le n \le N_1$, let $T^*_n$ be the rescaled metric space $m_{1,n}^{-1/2} T_{1,n}$, equipped with the restriction of $\nu$ to $T_{1,n}$, rescaled to a probability measure. Then, for each $j \ge 0$, on the event $\{N_1 = j\}$, $(T^*_n)_{0\le n\le N_1}$ is a sequence of $j + 1$ independent copies of a Brownian CRT.
Let us recall that, by definition, each $a_i(m)$ is distributed as $\nu$ restricted to $F_{x_i(m)}$. We also recall that if $\eta, \eta'$ are two independent points of $\mathcal H$ sampled according to $\nu$, then $d_{\mathcal H}(\eta, \eta')$ (resp. the distance from $\eta$ to the root) is Rayleigh distributed. It then follows from Lemma 12 that $(R^1_n)_{0\le n\le N_1}$ is a sequence of i.i.d. Rayleigh random variables. This proves the statement for $k = 1$, which is our base case. We now proceed to the induction step, and assume that we almost surely have the desired representation for all natural numbers up to $k \ge 1$. In particular, there exists a sequence $\mathcal F_k = (T_{k,n})_{0\le n\le N_k}$, a finite sub-collection of the connected components of $\mathcal H \setminus \mathrm{Span}(\mathcal H; U_1, \dots, U_k)$, such that $\mathbf m_k := (\nu(T_{k,n}))_{0\le n\le N_k}$ is non-increasing. Moreover, we suppose that (15) holds for $k$, where for each $0 \le n \le N_k$, $\sqrt{m_{k,n}}\, R^k_n$ is either the distance between two independent $\nu$-points of $T_{k,n}$ or the distance between a $\nu$-point and the root of $T_{k,n}$.
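Recall that $Q_{k+1}$ is defined to be $\mathrm{shuff}(\tilde{\mathcal H}_{k+1}, U_1, \dots, U_k)$, where $\tilde{\mathcal H}_{k+1}$ is obtained from $\mathcal H$ by replacing the connected component $\tilde T_{k+1}$ of $\mathcal H \setminus \mathrm{Span}(\mathcal H; U_1, \dots, U_k)$ which contains $U_{k+1}$ by $\tilde S := \mathrm{shuff}(\tilde T_{k+1}, U_{k+1})$. The real tree $\tilde{\mathcal H}_{k+1}$ is a Brownian CRT with mass measure $\tilde\nu_{k+1}$, and, by the induction hypothesis, there exists a sequence $\tilde{\mathcal F}_k = (\tilde T_{k,n}, 0 \le n \le \tilde N_k)$, consisting of a finite sub-collection of the components of $\tilde{\mathcal H}_{k+1} \setminus \mathrm{Span}(\tilde{\mathcal H}_{k+1}; U_1, \dots, U_k)$ rearranged in decreasing order of their masses, such that
$$\gamma_{k+1}(1, 2) = \sum_{0 \le n \le \tilde N_k} \sqrt{\tilde m_{k,n}}\, \tilde R^k_n, \qquad (17)$$
where, for each $0 \le n \le \tilde N_k$, $\tilde m_{k,n} = \tilde\nu_{k+1}(\tilde T_{k,n})$ and $\sqrt{\tilde m_{k,n}}\, \tilde R^k_n$ is either the distance between two uniform independent points of $\tilde T_{k,n}$ or the distance between a uniform point and the root of $\tilde T_{k,n}$. However, we are after a representation in terms of the connected components of $\mathcal H \setminus \mathrm{Span}(\mathcal H; U_1, \dots, U_{k+1})$.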
Note that $\tilde S$ is the only component of $\tilde H_{k+1} \setminus \mathrm{Span}(\tilde H_{k+1}; U_1, \dots, U_k)$ that is not a component of $H \setminus \mathrm{Span}(H; U_1, \dots, U_{k+1})$. So if $\tilde S$ does not appear in $\tilde F_k$, then by construction $\tilde F_k = F_k$. Here $F_k = (T_{k,n})_{0\le n\le N_k}$ is a finite sub-collection of the connected components of $H \setminus \mathrm{Span}(H; U_1, \dots, U_k)$ such that $m_k := (\nu(T_{k,n}))_{0\le n\le N_k}$ is non-increasing and (15) holds, with the additional distributional properties we are after. Furthermore, in that case, $\gamma_{k+1}(1,2) = \gamma_k(1,2)$. It thus suffices to take $F_{k+1} = F_k$ and $R^{k+1}_n = R^k_n$ for each n. This case occurs precisely when $U_{k+1}$ does not fall in any of the subtrees of $F_k$. Since $U_{k+1}$ is ν-distributed, this happens with probability $1 - \|m_k\|_1$.
If, on the other hand, $\tilde S = \tilde T_{k,n_0}$ for some $0 \le n_0 \le \tilde N_k$, then the representation in (17) needs to be modified. Still, since $\tilde m_{k,n_0} = \tilde\nu_{k+1}(\tilde S) = \nu(\tilde T_{k+1}) = m_{k,n_0}$, the masses are correct and $\tilde m_k = m_k$. In particular, this case occurs with probability $m_{k,n_0}$. Note also that, by definition of $\tilde H_{k+1}$, where here $(m_{k,n_0})^{1/2}\tilde R^k_{n_0}$ is the distance in $\tilde S$ either between two independent $\tilde\nu_{k+1}$-points or between a $\tilde\nu_{k+1}$-point and the root. Recall that $\tilde S = \mathrm{shuff}(\tilde T_{k+1}, U_{k+1})$ is rooted at a $\tilde\nu_{k+1}$-point. Note also that, by the scaling property, $(m_{k,n_0})^{-1/2}\tilde T_{k+1}$ is a Brownian CRT, and is thus distributed as H.

Therefore, we may use the induction hypothesis with k = 1. This gives a sequence $\check F = (\check T_n, 0 \le n \le \check N)$ consisting of a sub-collection of the connected components of $\tilde T_{k+1} \setminus \mathrm{Span}(\tilde T_{k+1}; U_{k+1})$, arranged in decreasing order of their masses, such that where for each $0 \le n \le \check N$, $\nu(\check T_n)^{1/2}\check R_n$ is the distance in $\check T_n$ either between two independent ν-points or between a ν-point and the root. In particular, $(\check R_n)_{n\ge0}$ is a sequence of i.i.d. Rayleigh random variables. Furthermore, $(\nu(\check T_n)/m_{k,n_0}, 0 \le n \le \check N)$ is an independent copy of $m_1$.

We then let $F_{k+1} = (T_{k+1,n}, 0 \le n \le N_{k+1})$ be the rearrangement of the collection $\{\tilde T_{k,n} : 0 \le n \le \tilde N_k,\ n \ne n_0\} \cup \{\check T_n : 0 \le n \le \check N\}$ such that $m_{k+1} := (\nu(T_{k+1,n}))_{0\le n\le N_{k+1}}$ is non-increasing. Note that $F_{k+1}$ is a finite sub-collection of components of $H \setminus \mathrm{Span}(H; U_1, \dots, U_{k+1})$. Finally, inserting (19) into (17) and comparing with (18), we obtain (15) for k + 1, which completes the proof.
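The induction step changes the mass vector in only one place. If $U_{k+1}$ falls in the tracked component of mass $m_{k,n_0}$, which happens with probability $m_{k,n_0}$, that entry is replaced by $m_{k,n_0}$ times an independent copy of $m_1$. Otherwise $m_{k+1} = m_k$.

The Python sketch below implements this single step. It rests on two assumptions. First, the sampler `sample_m1` is a hypothetical user-supplied stand-in, since the law of $m_1$ is not written out in this section. Second, the position of $U_{k+1}$ is encoded by a number `u` that is uniform on [0, 1) and compared against the cumulative masses.

```python
import bisect
from itertools import accumulate
from typing import Callable, List

# Hypothetical stand-in for a sampler of the law of m_1: returns a finite
# list of positive masses whose total is < 1.
MassSampler = Callable[[], List[float]]


def update_masses(m: List[float], u: float, sample_m1: MassSampler) -> List[float]:
    """One step m_k -> m_{k+1} of the mass recursion.

    `m` is non-increasing. `u` in [0, 1) encodes U_{k+1}: it lands in the
    tracked component n0 with probability m[n0], and outside all tracked
    components with probability 1 - sum(m).
    """
    cumulative = list(accumulate(m))
    n0 = bisect.bisect_right(cumulative, u)
    if n0 == len(m):
        # U_{k+1} misses every tracked component, so F_{k+1} = F_k.
        return list(m)
    # Replace m[n0] by m[n0] times an independent copy of m_1, then re-sort.
    refined = [m[n0] * x for x in sample_m1()]
    rest = m[:n0] + m[n0 + 1:]
    return sorted(rest + refined, reverse=True)
```

To simulate the chain $(m_k)$ under any chosen law for $m_1$, iterate `m = update_masses(m, rng.random(), sample_m1)`. This is the discrete-time chain that the proof of Lemma 13 compares with a fragmentation chain.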
Proving Proposition 10 now reduces to showing that the series representation in (15) converges almost surely. First observe that, conditionally on $m_k$, $\gamma_k(1,2)$ is a sum of independent random variables with Gaussian tails (see below for details). Classical concentration-of-measure results then show that $\gamma_k(1,2)$ is concentrated around its conditional mean $E[\gamma_k(1,2) \mid m_k]$. Furthermore, the width of the concentration window is controlled by the conditional variance, which here is $O(\|m_k\|_1)$. The following lemmas control the distance between $\gamma_k(1,2)$ and $E[\gamma_k(1,2) \mid m_k]$.
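The variance claim can be made concrete. The sketch below assumes that (15) expresses $\gamma_k(1,2)$ as $\sum_n \sqrt{m_{k,n}}\,R^k_n$ with i.i.d. standard Rayleigh $R^k_n$ that are independent of $m_k$. Under that assumption the conditional mean is $\sqrt{\pi/2}\sum_n \sqrt{m_{k,n}}$ and the conditional variance is $\frac{4-\pi}{2}\,\|m_k\|_1$.

This minimal Python sketch computes both quantities and draws samples for a Monte Carlo comparison. It uses the fact that $\sqrt{2E}$ is standard Rayleigh when $E \sim \mathrm{Exp}(1)$.

```python
import math
import random
from typing import List, Tuple

# Moments of a standard Rayleigh variable (density x e^{-x^2/2} on R_+).
RAYLEIGH_MEAN = math.sqrt(math.pi / 2)
RAYLEIGH_VAR = (4 - math.pi) / 2


def conditional_moments(m: List[float]) -> Tuple[float, float]:
    """E[gamma | m] and Var(gamma | m) for gamma = sum_n sqrt(m_n) R_n."""
    mean = RAYLEIGH_MEAN * sum(math.sqrt(x) for x in m)
    var = RAYLEIGH_VAR * sum(m)  # Var(R) * ||m||_1
    return mean, var


def sample_gamma(m: List[float], rng: random.Random) -> float:
    """One draw of sum_n sqrt(m_n) R_n with i.i.d. standard Rayleigh R_n."""
    # If E ~ Exp(1), then P(sqrt(2E) > x) = exp(-x^2 / 2).
    return sum(math.sqrt(x) * math.sqrt(2 * rng.expovariate(1.0)) for x in m)
```

For example, with `m = [0.4, 0.2, 0.1]` the empirical mean and variance of 20,000 draws from `sample_gamma` should be close to the two values returned by `conditional_moments(m)`. The variance depends on $m$ only through $\|m\|_1$.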
The proofs of these lemmas rely on standard facts about fragmentation chains and concentration inequalities, and are presented in Sections 3.3 and 3.4. From there, the last step consists in proving that $E[\gamma_k(1,2) \mid m_k]$ also converges almost surely.
Our approach to Lemma 15 relies on a coupling between the cutting and shuffling procedures, and is given in Section 3.5.

Proof of Lemma 13: polynomial decay of the self-similar fragmentation chain
The dynamics of $(m_k)_{k\ge0}$ are quite similar to those of a self-similar fragmentation chain, and the proof of the lemma relies on classical results on the asymptotic behaviour of fragmentation processes. Suppose we counted only the actual jumps of the process, that is, only the indices i such that $U_i$ does affect the collection of masses. We would then obtain exactly a fragmentation chain observed at its jump times. Counting also the indices i that do not affect the chain only induces a time change, which is easily controlled.
Recall that denotes the law of $m_1$. Let $X(t) = (X_i(t))_{i\ge1}$ be a self-similar fragmentation chain with index of self-similarity 1 and dislocation measure starting from the initial state $X(0) = (1, 0, 0, \dots)$, as introduced in [6, Chapter 1]. Then X(t) jumps at rate $\|X(t)\|_1 = \sum_{i\ge1} X_i(t)$. This chain is non-conservative, since $P(0 < \|m_1\|_1 < 1) = 1$: indeed, $m_1$ is an a.s. finite and non-empty collection of the masses of the components of $H \setminus \mathrm{Span}(H; U_1)$. To compensate for the loss of mass (the indices i that do not modify $(m_k)_{k\ge0}$), consider another Poisson point process Γ on $[0,\infty)$ with rate $1 - \|X(t)\|_1$ at time t ≥ 0, and let θ(t) denote the number of jumps before time t in X and Γ combined. Then θ(t) is the number of jumps before time t of a Poisson process with rate one, and if we let $\theta^{-1}(k) := \inf\{t \ge 0 : \theta(t) \ge k\}$ be the time of the k-th point, then we have From now on, we work on a probability space on which these are coupled to be equal with probability one.

Let $p^* \in (0,1)$ be the critical exponent such that $E[\sum_{i\ge1} m_{1,i}^{p^*}] = 1$. Then is a positive martingale which is uniformly integrable. Denote by $M(\infty)$ its a.s. limit. By Theorem 1 of [9], for every k ≥ 1, there exists a constant $C_k \in (0,\infty)$ such that from which it follows immediately that $\sup_{t\ge0} t^k\, E[X_1(t)^{p^*+k}] \le C_k$. Now, for any δ ∈ (0,1) and ε > 0, by Markov's inequality, we obtain at time n ≥ 1 By choosing k large enough that $k > (1+\delta p^*)/(1-\delta)$, this implies that $\sum_{n\ge1} P(X_1(n) \ge \varepsilon n^{-\delta}) < \infty$. Hence, by the Borel-Cantelli lemma, $\limsup_{n\to\infty} n^{\delta} X_1(n) \le \varepsilon$ almost surely. Letting ε → 0 along a sequence, we obtain that $n^\delta X_1(n) \to 0$ a.s., and by monotonicity $t^\delta X_1(t) \to 0$ a.s. as well. Now notice that for any t ≥ 0, Then, for any δ ∈ (0,1), since $p^* \in (0,1)$ as the chain is non-conservative.
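The omitted displays in this argument can plausibly be filled in as follows. This is a sketch, and it assumes that the martingale above is $M(t) = \sum_{i\ge1} X_i(t)^{p^*}$, the usual intrinsic martingale of a fragmentation chain with Malthusian exponent $p^*$, because its defining display is missing here.

- The first line combines Markov's inequality with $\sup_{t\ge0} t^k\,E[X_1(t)^{p^*+k}] \le C_k$.
- The second line explains the threshold on k.
- The last line uses $X_i(t) \le X_1(t)$ and $1 - p^* > 0$ to turn the decay of the largest fragment into decay of the total mass.

```latex
\begin{align*}
\mathbb P\bigl(X_1(n)\ge \varepsilon n^{-\delta}\bigr)
  &\le \frac{\mathbb E\bigl[X_1(n)^{p^*+k}\bigr]}{(\varepsilon n^{-\delta})^{p^*+k}}
   \le C_k\,\varepsilon^{-(p^*+k)}\,n^{\delta(p^*+k)-k},\\
\delta(p^*+k)-k<-1
  &\iff k>\frac{1+\delta p^*}{1-\delta},\\
t^{\delta(1-p^*)}\,\|X(t)\|_1
  = t^{\delta(1-p^*)}\sum_{i\ge1}X_i(t)^{1-p^*}X_i(t)^{p^*}
  &\le \bigl(t^{\delta}X_1(t)\bigr)^{1-p^*}M(t)\xrightarrow[t\to\infty]{}0 .
\end{align*}
```

The limit in the last line holds because $t^\delta X_1(t) \to 0$ a.s. and $M(t) \to M(\infty) < \infty$ a.s.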
Finally, we return to $(m_k)_{k\ge1}$ using (20). By the strong law of large numbers, $\theta(t)/t \to 1$ almost surely, and we easily deduce that $k/\theta^{-1}(k) \to 1$ almost surely as k → ∞. Using this fact together with (20) and (21), we obtain that for any δ ∈ (0,1), which completes the proof of Lemma 13.

Proof of Lemma 14: concentration of the Rayleigh variable
Let R be a random variable with the Rayleigh distribution, that is, with density $x e^{-x^2/2}$ on $\mathbb R_+$. One easily verifies that R − E[R] is sub-Gaussian, in the sense that there exists a constant v such that for every λ ∈ ℝ, one has (See [12, Theorem 2.1, p. 25].) We may thus apply concentration results for sub-Gaussian random variables such as those presented in Section 2.3 of [12]. For each k ≥ 1, we have by (15), where according to Lemma 11, $(R^k_n, 0 \le n \le N_k)$ is a sequence of i.i.d. copies of a Rayleigh random variable. Therefore If $(A_k, B_k, C_k)_{k\ge1}$ are sequences of events such that $A_k \subset B_k \cup C_k$ for each k ≥ 1, then it is elementary that $P(\limsup_k A_k) \le P(\limsup_k B_k) + P(\limsup_k C_k)$. Here, we take with the same α as in Lemma 13. Then $P(\limsup_k C_k) = 0$ by Lemma 13. On the other hand, we deduce from (22) that which entails that $P(\limsup_k B_k) = 0$ by the Borel-Cantelli lemma. Hence $P(\limsup_k A_k) = 0$, which means that $\limsup_k |\sigma_k| \le \varepsilon$ almost surely. Since ε > 0 was arbitrary, the proof of Lemma 14 is complete.
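One way to write out the concentration step rests on assumptions that the surrounding text suggests but the missing displays do not confirm:

- $\sigma_k = \gamma_k(1,2) - E[\gamma_k(1,2)\mid m_k] = \sum_n \sqrt{m_{k,n}}\,(R^k_n - E[R])$;
- $A_k = \{|\sigma_k| > \varepsilon\}$, $B_k = \{|\sigma_k|>\varepsilon,\ \|m_k\|_1\le k^{-\alpha}\}$ and $C_k = \{\|m_k\|_1 > k^{-\alpha}\}$, with α > 0 from Lemma 13.

Under these assumptions, the first line below uses independence together with the sub-Gaussian bound $\log E[e^{\lambda(R-E[R])}] \le \lambda^2 v/2$. The second line is the Chernoff bound, and the third makes the Borel-Cantelli sum explicit.

```latex
\begin{align*}
\log \mathbb E\bigl[e^{\lambda\sigma_k}\,\big|\,m_k\bigr]
  &= \sum_{n} \log \mathbb E\bigl[e^{\lambda\sqrt{m_{k,n}}\,(R-\mathbb E[R])}\bigr]
  \le \frac{\lambda^2 v}{2}\sum_n m_{k,n} = \frac{\lambda^2 v\,\|m_k\|_1}{2},\\
\mathbb P\bigl(|\sigma_k|>\varepsilon \,\big|\, m_k\bigr)
  &\le 2\exp\Bigl(-\frac{\varepsilon^2}{2v\|m_k\|_1}\Bigr),\\
\mathbb P\bigl(|\sigma_k|>\varepsilon,\ \|m_k\|_1\le k^{-\alpha}\bigr)
  &\le 2\exp\Bigl(-\frac{\varepsilon^2 k^{\alpha}}{2v}\Bigr),
  \qquad \sum_{k\ge1} 2\exp\Bigl(-\frac{\varepsilon^2 k^{\alpha}}{2v}\Bigr)<\infty .
\end{align*}
```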

Proof of Lemma 15: a coupling via cut trees
Let us recall the notation introduced before Proposition 5. There are two µ-points $\xi_1, \xi_2$ in the Brownian CRT $\mathcal T$, and $p := [\![\xi_1, \xi_2]\!]$ is the path in $\mathcal T$ between these two points. We denote by $D := d_{\mathcal T}(\xi_1, \xi_2)$ the length of this path. Let $G_k = \mathrm{cut}(\mathcal T, V_1, \dots, V_k)$ be the k-cut tree. Recall that, up to the finitely many cut points that are lost, $p_k$ is the image of p under the canonical embedding $\varphi_k$ from $\bigcup_{t\in C_k}\Delta^k_t$ into $G_k$. We have the following representation of the distance D, which is an analogue of Lemma 11.

Lemma 16. For each k ≥ 1, there exists some $m'_k = (m'_{k,n})_{0\le n\le N'_k} \in S^{\downarrow}_f$, which is a sub-collection of the masses of $\{\Delta^k_t, t \in C_k\}$, such that where $(B^k_n)_{n\ge0}$ is an i.i.d. sequence of Rayleigh random variables, independent of $m'_k$ and $N'_k$. Moreover, $(m'_k)_{k\ge1}$ has the same distribution as $(m_k)_{k\ge1}$.
Proof. For each k ≥ 1, the injection $\varphi_k$ is an isometry on each $\mathrm{Sk}(\Delta^k_t)$. Thus, for each k ≥ 1, D equals the length of $p_k$. Let us show that the length of $p_k$ can be written as the right-hand side of (23).
We proceed by induction on k ≥ 1. The base case k = 1 is a consequence of Proposition 5. Let $F'_1 = (T'_{1,n}, 0 \le n \le N'_1)$ be the vector consisting of the elements of the collection arranged in decreasing order of their masses $m'_{1,n} := \mu(T'_{1,n})$. Comparing Propositions 5 and 6 shows that $F'_1$ has the same distribution as $F_1$. Indeed, $G_1$ is a Brownian CRT (Proposition 1), and given $G_1$ the sequences $(a_i(m))_{m\ge0}$, i ≥ 1, are sampled in the same way as their counterparts given H. As a consequence, $m'_1 := (m'_{1,n}, 0 \le n \le N'_1)$ has the same distribution as $m_1$. Moreover, by Lemma 12, each $(m'_{1,n})^{-1/2} T'_{1,n}$ is an independent copy of a Brownian CRT. Define if $T'_{1,n} = \Delta^1_{t_{2,m}}$ for $m = 0, 1, \dots, M_2 - 1$; $d_{\mathcal T}(a_1(M_1), a_2(M_2))$, otherwise, and set $B^1_n := D^1_n/(m'_{1,n})^{1/2}$. Then the variables $B^1_n$, $0 \le n \le N'_1$, are independent. Also, it follows from (5) that

Suppose now that, for all natural numbers up to some k ≥ 1, there exists some $F'_k = (T'_{k,n}, 0 \le n \le N'_k)$, whose elements form a sub-collection of $(\Delta^k_t, t \in C_k)$, such that $m'_k = (m'_{k,n})_{0\le n\le N'_k} := (\mu(T'_{k,n}))_{0\le n\le N'_k}$ is non-increasing and that where $(m'_{k,n})^{1/2} B^k_n$ is the distance between two points of $T'_{k,n}$, say $u^k_n$ and $v^k_n$. Here $u^k_n$ is a µ-point in $T'_{k,n}$, while $v^k_n$ is either the root of $T'_{k,n}$ or another µ-point independent of $u^k_n$. Recall that, by Proposition 3, $G_{k+1}$ can be obtained from $G_k$ by replacing $\Delta^k_{\tau_{k+1}}$ with $\mathrm{cut}(\Delta^k_{\tau_{k+1}}, V_{k+1})$. Suppose first that $\Delta^k_{\tau_{k+1}}$ does not appear in $F'_k$, which happens with probability $1 - \|m'_k\|_1$. Then all the components of $F'_k$ are actually elements of $\{\Delta^{k+1}_t, t \in C_{k+1}\}$ (see Figure 3), and it suffices to take $F'_{k+1} = F'_k$ and $B^{k+1}_n = B^k_n$ for each n. So in this case, the representation (23) for k + 1 follows trivially from (24).
Suppose now that $\Delta^k_{\tau_{k+1}} = T'_{k,n_0}$ for some $0 \le n_0 \le N'_k$, which occurs with probability $m'_{k,n_0}$. In this case, we have where $\tilde p$ Observe that the root behaves as a uniform point in our cutting procedure, and that the rescaled tree $(m'_{k,n_0})^{-1/2}\Delta^k_{\tau_{k+1}}$ is a standard Brownian CRT. Thus the induction hypothesis for k = 1 applies. With probability one, there exists a sequence $(\tilde T_n, 0 \le n \le \tilde N)$, arranged in decreasing order of their masses, that is a sub-collection of those $\Delta^{k+1}_t$, $t \in C_{k+1}$, which are subsets of $\mathrm{cut}(\Delta^k_{\tau_{k+1}}, V_{k+1})$ (see Figure 3), such that where $\mu(\tilde T_n)^{1/2}\tilde R_n$ is either the distance between two uniform independent points of $\tilde T_n$ or the distance between a uniform point and the root of $\tilde T_n$. Consequently, $(\tilde R_n)$ is an i.i.d. family of Rayleigh random variables. Furthermore, $(\mu(\tilde T_n)/m'_{k,n_0}, 0 \le n \le \tilde N)$ is an independent copy of $m_1$. We then let $F'_{k+1} = (T'_{k+1,n}, 0 \le n \le N'_{k+1})$ be the rearrangement of the collection $\{T'_{k,n} : 0 \le n \le N'_k,\ n \ne n_0\} \cup \{\tilde T_n : 0 \le n \le \tilde N\}$ such that $m'_{k+1} := (\mu(T'_{k+1,n}), 0 \le n \le N'_{k+1})$ is non-increasing. Inserting (25) and (26) into (24) yields the representation (23) for k + 1, which completes the proof of the induction step.
As $(m'_k)_{k\ge1}$ has the same distribution as $(m_k)_{k\ge1}$, Lemma 13 also holds for $(m'_k)_{k\ge1}$. Furthermore, the concentration arguments already used in the proof of Lemma 14 imply that, a.s., Since D does not vary with k, this implies that $(E[D \mid m'_k])_{k\ge1}$ converges almost surely. Since the sequence $(E[\gamma_k(1,2) \mid m_k])_{k\ge1}$ has the same distribution, it also converges almost surely, and the proof of Lemma 15 is complete.

Direct construction of the complete reversal shuff(H)
In this section, we finally prove that the operation described in the introduction as the dual of the complete cutting procedure makes sense, and is indeed the desired dual. This reduces to making the link between the collection of random variables $(A_x, x \in \mathrm{Br}(H))$ and the iterative reversal of paths to the random leaves $(U_i)_{i\ge1}$. We prove here that the sequence of random leaves can be constructed as a measurable function of the family $(A_x, x \in \mathrm{Br}(H))$.

Construction of one consistent leaf
Recall that H is a Brownian CRT rooted at ρ and with mass measure ν. Let $\{A_x, x \in \mathrm{Br}(H)\}$ be a family of independent random variables such that, for every x, the point $A_x \in \mathrm{Sub}(H, x)$ is chosen according to the restriction of the mass measure ν to Sub(H, x). One of the main constraints on the family $(U_i)_{i\ge1}$ is that the path to $U_1$ should be the first path to be reversed. Because of this, every branch point x on the path between the root and $U_1$ should have a choice $A_x$ that is consistent with the reversal of the path to $U_1$. That is, for every branch point x on $[\![\rho, U_1]\!]$, one should have $[\![x, U_1]\!] \cap [\![x, A_x]\!] \neq \{x\}$.

Think of the discrete setting. Let T be a rooted tree and $(A_u, u \in T)$ a family of independent nodes such that $A_u$ is distributed uniformly in the subtree above u (u included). It is then easy to construct a uniformly random node $U_1$ such that, for every node u other than $U_1$ on the path between $U_1$ and the root, $[\![u, U_1]\!] \cap [\![u, A_u]\!] \setminus \{u\} \ne \emptyset$. Equivalently, the points $U_1$ and $A_u$ both lie in the same subtree of T rooted at one of the children of u. To do this, one simply builds the path from the root to $U_1$ by iteratively adding nodes, as follows. Start from the root. At a step where the current node is v, if $A_v \ne v$, move to the first node in the direction of $A_v$; otherwise $A_v = v$, and set $U_1 = v$. The node $U_1$ is constructed so that the choices on the path $[\![\rho, U_1]\!]$ are consistent, and one easily verifies that $U_1$ is a uniformly random node. The idea is to adapt this technique to the continuous setting. To this aim, it suffices to verify that we make positive progress when constructing a consistent path from the root.
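The discrete construction can be checked directly on a small example. The Python sketch below uses hypothetical names chosen for illustration. It represents a rooted tree by child lists, draws each $A_u$ uniformly from the subtree above u (u included), and performs the root-to-node walk described above. It also computes the exact law of the output with rational arithmetic.

Uniformity comes from a telescoping product. The probability of reaching v is $|T_v|/|T|$, and the probability of stopping there is $1/|T_v|$.

```python
import random
from fractions import Fraction
from typing import Dict, List, Set

Tree = Dict[int, List[int]]  # node -> list of children; node 0 is the root


def subtrees(children: Tree, root: int = 0) -> Dict[int, Set[int]]:
    """Map every node u to the set of nodes in the subtree above u (u included)."""
    sub: Dict[int, Set[int]] = {}

    def visit(u: int) -> Set[int]:
        s = {u}
        for c in children.get(u, []):
            s |= visit(c)
        sub[u] = s
        return s

    visit(root)
    return sub


def consistent_node(children: Tree, sub: Dict[int, Set[int]],
                    A: Dict[int, int], root: int = 0) -> int:
    """Start at the root; while A_v != v, step to the child whose subtree contains A_v."""
    v = root
    while A[v] != v:
        v = next(c for c in children[v] if A[v] in sub[c])
    return v


def exact_law(children: Tree, sub: Dict[int, Set[int]], root: int = 0) -> Dict[int, Fraction]:
    """Exact law of consistent_node when the A_u are independent and uniform on sub[u]."""
    law: Dict[int, Fraction] = {}

    def visit(u: int, p_reach: Fraction) -> None:
        law[u] = p_reach / len(sub[u])  # stop at u: P(A_u = u) = 1/|sub[u]|
        for c in children.get(u, []):
            visit(c, p_reach * Fraction(len(sub[c]), len(sub[u])))

    visit(root, Fraction(1))
    return law


children = {0: [1, 2], 1: [3, 4], 2: [5], 5: [6, 7]}
sub = subtrees(children)
rng = random.Random(0)
A = {u: rng.choice(sorted(sub[u])) for u in sub}
U1 = consistent_node(children, sub, A)
```

On this 8-node tree, `exact_law(children, sub)` assigns probability 1/8 to every node. For any realisation of `A`, the returned node satisfies `A[U1] == U1`, the stopping rule of the walk.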
Let $v \in H \setminus \{\rho\}$. We say that a branch point $x \in [\![\rho, v]\!] \cap \mathrm{Br}(H)$ is a turning point between the root and v if $A_x \in \mathrm{Fr}(H, x, v)$. We denote by $N_v$ the set of those x that are turning points between the root and v.
Lemma 17. Let H be the Brownian CRT. For each y ∈ Sk(H), with probability one, N y has at most finitely many elements.
Proof. Let us write $\nu_y = \nu(\mathrm{Sub}(H, y))$. As H is a Brownian CRT, $\nu_y > 0$ almost surely for every $y \in \mathrm{Sk}(H)$. Let us write $P_H$ for the probability measure conditionally on H. Since $A_x$ is distributed according to the restriction of ν to Sub(H, x), we have for each $x \in [\![\rho, y]\!] \cap \mathrm{Br}(H)$, Then, we have

Now let us consider the following iterative process, which constructs refining approximations $Y_k$, k ≥ 0, of U. Start with $Y_0 = \rho$ and $Z_0$, a leaf with distribution ν. Suppose now that we have defined $Y_k$ and $Z_k$ for some k ≥ 0. If $N_{Z_k} = \emptyset$, then we stop the process and set $U = Z_k$. Otherwise, there must exist some $y \in [\![\rho, Z_k]\!] \cap \mathrm{Sk}(H)$ such that $N_y \ne \emptyset$. By Lemma 17, $N_y$ is finite a.s., so there is an $x_0 \in N_y$ that is closest to the root. Then we set $Y_{k+1} = x_0$ and

Proof. Let us first show that $Z_k$ converges to some point U. If the process has stopped at some finite time, this is obvious. Otherwise, note that $(\mathrm{Sub}(H, Y_k))_{k\ge1}$ is a decreasing sequence of sets, and define U to be the point of $\bigcap_{k\ge0} \mathrm{Sub}(H, Y_k)$ that minimizes the distance to the root. (So $\bigcap_{k\ge0} \mathrm{Sub}(H, Y_k) = \mathrm{Sub}(H, U)$ and $Y_k \to U$.) Now, we claim that U is a leaf, so that $Z_k \to U$ as k → ∞. To see this, suppose for a contradiction that U is not a leaf. Then $U \in \mathrm{Sk}(H)$ and, with probability one, $\nu_U := \nu(\mathrm{Sub}(H, U)) > 0$. Note that by construction, we have $Z_k \in \mathrm{Fr}(H, Y_{k+1}, Z_k) \supset \mathrm{Sub}(H, Y_{k+2})$, thus $Z_k \in \mathrm{Sub}(H, U)$ for each k ≥ 1. However, for every k, as k → ∞. As a consequence, almost surely $\nu_U = \nu(\mathrm{Sub}(H, U)) = 0$, so that U is a.s. a leaf.
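The two displays left implicit in the proof of Lemma 17 can be reconstructed from the definitions. The version below is a hedged reconstruction, not a quotation. It uses two facts:

- $\mathrm{Sub}(H,y) \subseteq \mathrm{Sub}(H,x)$ for $x \in [\![\rho,y]\!]$;
- the fringes $\mathrm{Fr}(H,x,y)$ of distinct branch points x on $[\![\rho,y]\!]$ are disjoint.

A finite conditional expectation then forces $N_y$ to be finite $P_H$-almost surely.

```latex
\begin{align*}
\mathbb P_H\bigl(x\in N_y\bigr) &= \mathbb P_H\bigl(A_x\in \mathrm{Fr}(H,x,y)\bigr)
 = \frac{\nu(\mathrm{Fr}(H,x,y))}{\nu(\mathrm{Sub}(H,x))}\le \frac{\nu(\mathrm{Fr}(H,x,y))}{\nu_y},\\
\mathbb E_H\bigl[\# N_y\bigr] &= \sum_{x\in[\![\rho,y]\!]\cap \mathrm{Br}(H)} \mathbb P_H(x\in N_y)
 \le \frac{1}{\nu_y}\sum_{x\in[\![\rho,y]\!]\cap \mathrm{Br}(H)} \nu(\mathrm{Fr}(H,x,y)) \le \frac{1}{\nu_y}<\infty .
\end{align*}
```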
It now remains to prove that U is indeed ν-distributed. For this, it suffices to show that for every x ∈ H, $P_H(U \in \mathrm{Sub}(H, x)) = \nu(\mathrm{Sub}(H, x))$. Note that since U is a leaf, we have $Z_k \to U$ as k → ∞. We claim that for every k ≥ 0, the leaf $Z_k$ is ν-distributed. Clearly, this would complete the proof. We proceed by induction on k ≥ 0. For k = 0, $Z_0$ has distribution ν and the result is immediate. It will be useful to also prove the result for k = 1.
For a point $s \in [\![Y_0, Z_0]\!]$, we write $F_s = \mathrm{Fr}(H, s, Z_0)$. For any $x \in [\![Y_0, Z_0]\!]$, since the branch points are countable, we have since the choices of all the points are independent. There are only finitely many $s \in [\![Y_0, x]\!]$ for which $\nu(F_s) > \nu(\mathrm{Sub}(H, x))/2$. For the others, $\nu(F_s)/\nu(\mathrm{Sub}(H, s)) \le 1/2$ and .
It follows that the infinite product in (27) is absolutely convergent, since $\sum_{s\in[\![Y_0,x]\!]} \nu(F_s) \le 1$. Therefore, Note that, from our definition of $F_s$, this is indeed a probability distribution, since $\sum_{s\in[\![Y_0,Z_0]\!]} \nu(F_s) = 1$. Now, for any z ∈ H, we have which implies that, almost surely, $P_H(Z_1 \in \mathrm{Sub}(H, z)) = \nu(\mathrm{Sub}(H, z))$, so that $Z_1$ has distribution ν. For the induction step, suppose now that $Z_k$ has distribution ν. Conditionally on $Y_k$ and $Z_{k-1}$, the point $Z_k$ is distributed according to the restriction of ν to the set $\mathrm{Fr}(H, Y_k, Z_{k-1})$.
Applying the result for k = 1 to the subtree $\mathrm{Fr}(H, Y_k, Z_{k-1})$, we see that $Z_{k+1}$ is also distributed according to the restriction of ν to $\mathrm{Fr}(H, Y_k, Z_{k-1})$. So $Z_{k+1}$ and $Z_k$ have the same conditional distribution, and it follows that $Z_{k+1}$ is ν-distributed. Finally, since $Z_k \to U$ almost surely and $Z_k$ is ν-distributed for every k, it follows that U is a ν-distributed leaf, and the proof is complete.
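The absolute convergence of the infinite product in (27) can be made explicit. This sketch assumes, as the surrounding text suggests, that the factors of (27) have the form $1 - \nu(F_s)/\nu(\mathrm{Sub}(H,s))$ over the branch points $s \in [\![Y_0, x]\!]$, with x in the skeleton so that $\nu(\mathrm{Sub}(H,x)) > 0$.

- The first line compares each ratio with a common denominator.
- The second is the elementary bound used on the factors with ratio at most 1/2.
- The third shows that the resulting series is finite.

```latex
\begin{align*}
&s\in[\![Y_0,x]\!]:\quad \mathrm{Sub}(H,x)\subseteq \mathrm{Sub}(H,s)
 \ \Longrightarrow\ \frac{\nu(F_s)}{\nu(\mathrm{Sub}(H,s))}\le \frac{\nu(F_s)}{\nu(\mathrm{Sub}(H,x))},\\
&0\le t\le \tfrac12 \ \Longrightarrow\ -\log(1-t)\le \frac{t}{1-t}\le 2t,\\
&\sum_{s\in[\![Y_0,x]\!]}\frac{\nu(F_s)}{\nu(\mathrm{Sub}(H,s))}
 \le \frac{1}{\nu(\mathrm{Sub}(H,x))}\sum_{s\in[\![Y_0,x]\!]}\nu(F_s)\le \frac{1}{\nu(\mathrm{Sub}(H,x))}<\infty .
\end{align*}
```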
Finally, we prove that U does not depend on any of the "auxiliary" randomness used in the construction, in the following sense. Lemma 19. Conditionally on H, the random leaf U is a measurable function of $(A_x, x \in \mathrm{Br}(H))$.
Proof. It is clear from the construction that U is a measurable function of $(A_x, x \in \mathrm{Br}(H))$ and $Z_0$. It thus suffices to show that U is independent of $Z_0$. To see this, consider an independent copy $Z'_0$ of $Z_0$, and let $(Y'_k, Z'_k)_{k\ge1}$ be the sequence of random variables obtained from this initial choice. By Lemma 18, $Z'_k$ converges almost surely to a leaf, which we denote by U′. We now show that U′ = U a.s.
In this direction, we prove by induction that for every k ≥ 1 we have $Y'_k \in [\![\rho, U]\!]$ and $Z'_k \in \mathrm{Fr}(H, Y'_k, U)$. To see that this is the case, observe that with probability one $Z'_0 \wedge U \in \mathrm{Sk}(H)$ is a turning point for $Z'_0$. It is also the turning point closest to the root, and thus $Y'_1 = Z'_0 \wedge U$. Moreover, since $(Y_k)_{k\ge0}$ is a path to U that is consistent with $(A_x, x \in \mathrm{Br}(H))$, by construction $U \in \mathrm{Sub}(H, Y'_1) \setminus \mathrm{Fr}(H, Y'_1, Z'_1)$. In other words, $Z'_1 \in \mathrm{Fr}(H, Y'_1, U)$. Supposing now that the claim holds for all integers up to k ≥ 1, we see that $Z'_k \wedge U \in [\![Y'_k, U]\!]$, since H is a.s. binary. Again, $Z'_k \wedge U$ is a turning point, and there is no other such point on $[\![Y'_k, Z'_k \wedge U]\!]$, so that $Y'_{k+1} = Z'_k \wedge U$. As before, $Y'_{k+1} \in [\![Y_r, Y_{r+1}]\!]$ for some r ≥ 0. Because the path to U is consistent, it must be the case that $Z'_{k+1} \in \mathrm{Fr}(H, Y'_{k+1}, U)$.
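Finally, recall that we proved in Lemma 18 that $Y'_k$ also converges almost surely to U′. Since $Y'_k \in [\![\rho, U]\!]$, that is, $U \in \mathrm{Sub}(H, Y'_k)$, for each k ≥ 1, the limit U′ lies on $[\![\rho, U]\!]$. As U′ is a leaf, we conclude that U′ = U, and the proof is complete.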
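Lemma 19 has a transparent discrete analogue on a finite rooted tree, using the same hypothetical conventions as a discrete walk sketch: child lists, with $A_u$ uniform on the subtree above u, u included. Starting from an arbitrary node z, which plays the role of $Z_0$, the procedure repeatedly locates the inconsistent node closest to the root and jumps to the choice made there.

- A node x strictly below z on its path is inconsistent if $A_x$ is not in the subtree through which the path continues.
- z itself is inconsistent if $A_z \ne z$.

Taking the new point to be $A_x$ itself, which lies in the relevant fringe, is an assumption made for this illustration; it is not the continuous construction. The sketch checks that the final node does not depend on the starting point and coincides with the direct root-to-node walk.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Tree = Dict[int, List[int]]  # node -> children; node 0 is the root


def build(children: Tree, root: int = 0) -> Tuple[Dict[int, Optional[int]], Dict[int, Set[int]]]:
    """Return the parent map and, for each node, the set of nodes in its subtree."""
    parent: Dict[int, Optional[int]] = {root: None}
    sub: Dict[int, Set[int]] = {}

    def visit(u: int) -> Set[int]:
        s = {u}
        for c in children.get(u, []):
            parent[c] = u
            s |= visit(c)
        sub[u] = s
        return s

    visit(root)
    return parent, sub


def path_from_root(parent: Dict[int, Optional[int]], v: int) -> List[int]:
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def first_turning_point(parent, sub, A, z) -> Optional[int]:
    """Closest-to-root node on the path to z whose choice A is inconsistent with z."""
    path = path_from_root(parent, z)
    for x, nxt in zip(path, path[1:]):
        if A[x] not in sub[nxt]:  # A_x does not point towards z
            return x
    return None if A[z] == z else z


def refine(parent, sub, A, z0: int) -> int:
    """Jump to A_x at the first turning point x until no turning point is left."""
    z = z0
    while True:
        x = first_turning_point(parent, sub, A, z)
        if x is None:
            return z
        z = A[x]  # hypothetical discrete choice of the next point (it lies in the fringe at x)


def direct_walk(children: Tree, sub, A, root: int = 0) -> int:
    """The root-to-node walk: follow A_v until A_v = v."""
    v = root
    while A[v] != v:
        v = next(c for c in children[v] if A[v] in sub[c])
    return v
```

Each jump strictly deepens the first inconsistent node, so `refine` terminates. Its output has no inconsistent node, and on a finite tree that node is unique: it is the output of `direct_walk`.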

The direct shuffle as the limit of k-reversals
It is now easy to use Lemma 18 to construct a sequence of i.i.d. leaves $(U_i)_{i\ge1}$ that are distributed according to the mass measure ν and consistent with the collection $(A_x, x \in \mathrm{Br}(H))$. We proceed inductively, as follows. First, let $U_1$ be the ν-distributed leaf whose existence is guaranteed by Lemma 18. Then, assume that we have defined $(U_i)_{1\le i\le k}$, and set $S_k = \bigcup_{1\le i\le k} [\![\rho, U_i]\!]$. Let $U^\circ_{k+1}$ be an independent ν-leaf in H. With probability one $U^\circ_{k+1} \notin S_k$, so $R_{k+1} := \{s \in H : [\![s, U^\circ_{k+1}]\!] \cap S_k = \emptyset\}$ is a non-empty subtree of H that also has positive mass. Then $U^\circ_{k+1}$ is distributed according to the restriction of ν to $R_{k+1}$, but it may not be consistent with $(A_x, x \in \mathrm{Br}(R_{k+1}))$ in that subtree. By Lemma 18, there exists a random leaf $U_{k+1}$, distributed according to ν restricted to $R_{k+1}$, that is consistent with the collection $(A_x, x \in \mathrm{Br}(R_{k+1}))$. One easily verifies that the collection $(U_i)_{i\ge1}$ has the required properties.
Finally, the following result justifies the definition of shuff(H) that we gave in the introduction.
Proposition 20. Let H be a Brownian CRT with mass measure ν. Let $(A_x, x \in \mathrm{Br}(H))$ be a family of independent random variables such that $A_x$ is distributed according to the restriction of ν to Sub(H, x). Let $(U_i)_{i\ge1}$ be a family of random leaves consistent with $(A_x, x \in \mathrm{Br}(H))$. Then the sequence $(\mathrm{shuff}(H; U_1, U_2, \dots, U_k))_{k\ge1}$ converges a.s. as k → ∞ in the sense of Gromov-Prokhorov, and we define shuff(H) to be the limit.
Note that this definition is indeed consistent with the algorithm given in the introduction. For every fixed k ≥ 1, the branch points on $\mathrm{Span}(H; U_1, \dots, U_k)$ are used to form $\mathrm{shuff}(H; U_1, \dots, U_k)$. Furthermore, every branch point $x \in \mathrm{Br}(H)$ lies in $\mathrm{Span}(H; U_1, \dots, U_k)$ for k large enough, so all the branch points are used to form the limit of $(\mathrm{shuff}(H; U_1, \dots, U_k))_{k\ge1}$.

A Some facts about the Brownian CRT
In this section, we provide a proof of Lemma 12, which states that the (mass-rescaled) real trees encountered on the path between two random points in the k-reversal are independent Brownian CRTs. The proof is based on the scaling property of the Brownian excursion, Bismut's path decomposition of a Brownian excursion [11], and the size-biased sampling of a Poisson point process due to Perman et al. [26]. For the sake of precision, the arguments are phrased in terms of excursions rather than real trees, but we also give the tree-based intuition.
BISMUT'S DECOMPOSITION. Let $B = (B_t)_{t\ge0}$ be a standard Brownian motion. Let N be the Itô measure for the excursions of |B| away from 0, which is a σ-finite measure on the space $C(\mathbb R_+, \mathbb R_+)$ of non-negative continuous paths. Let $w = (w_s)_{s\ge0}$ be the coordinate process. In particular, if we denote by $\zeta := \inf\{s > 0 : w_s = 0\}$ the lifetime of an excursion, then $N(\zeta \in dr) = dr/\sqrt{2\pi r^3}$. We denote by $N^{(1)}$ the law of the normalized excursion. For r > 0, we let $N^{(r)}$ be the distribution of the rescaled process $(\sqrt{r}\, w_{s/r})_{0\le s\le r}$ under $N^{(1)}$. It then follows from the scaling property of Brownian motion that $N^{(r)}$ is the law of w under $N(\,\cdot \mid \zeta = r)$, and we have
$$N(\cdot) = \int_0^\infty N(\zeta \in dr)\, N^{(r)}(\cdot). \qquad (28)$$
If we let $S(w) \in C(\mathbb R_+, \mathbb R_+)$ be the process $(S(w))(s) = \zeta^{-1/2} w_{s\zeta}$ for s ≥ 0, then it follows from (28) that, under N, S(w) is independent of ζ and is distributed as the normalized excursion. Let $(Z_k)_{k\ge0}$ be a sequence of independent variables uniformly distributed on [0, ζ]. If we see w as encoding a Brownian CRT, then $(Z_k)_{k\ge0}$ is a sequence of leaves sampled according to the mass measure. The CRT is decomposed along the path leading to the leaf corresponding to $Z_0$: set $K := w_{Z_0}$ and let We set Q to be the sum of the point measures $\mathcal P(\overrightarrow{w}_\bullet)$ and $\mathcal P(\overleftarrow{w}_\bullet)$ on $\mathbb R_+ \times C(\mathbb R_+, \mathbb R_+)$: where the last expression above serves as the definition of $s_j$ and $w_j$, for j ≥ 1.
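The Rayleigh facts used earlier can be checked numerically from the normalized excursion $N^{(1)}$; one example is that the distance from the root to a point sampled from the mass measure is Rayleigh distributed. The sketch below is only an illustration, and it rests on two assumptions not stated in this section:

- The Brownian CRT is coded by $2e$ with e a normalized excursion (Aldous's normalization). Under this normalization the distance from the root to a uniform point is $2e_U$, with density $x e^{-x^2/2}$.
- The excursion is produced by the Vervaat transform of a discretized Brownian bridge, that is, by cyclically shifting the bridge so that it starts at its minimum.

Because the shift maps a uniform time to a uniform time, $e_U$ is simply the bridge value at a uniform grid time minus the bridge minimum. Discretization introduces a small downward bias.

```python
import math
import random


def root_distance_sample(n: int, rng: random.Random) -> float:
    """Approximate 2 * e_U for a normalized excursion e and an independent uniform U.

    A Gaussian random walk with n steps of variance 1/n is turned into a bridge;
    by the Vervaat transform, e at a uniform time equals the bridge at a uniform
    grid time minus the bridge minimum.
    """
    walk = [0.0]
    for _ in range(n):
        walk.append(walk[-1] + rng.gauss(0.0, 1.0 / math.sqrt(n)))
    total = walk[-1]
    bridge = [w - (j / n) * total for j, w in enumerate(walk[:-1])]
    return 2.0 * (rng.choice(bridge) - min(bridge))
```

With, say, 800 samples at `n = 1000`, the empirical mean should be close to $\sqrt{\pi/2} \approx 1.2533$ and the empirical second moment close to 2, the first two moments of the standard Rayleigh law.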
SIZE-BIASED ORDERING OF THE EXCURSIONS. For each j ≥ 1, we denote by $I_j$ the excursion interval $(l_j, r_j)$ associated with $w_j$. We write $\zeta_j$ for the length of $I_j$. Notice that N-a.s. $\sum_{j\ge1} \zeta_j = \zeta$. Let $(\kappa_i)_{i\ge1}$ be the permutation of ℕ induced by $(Z_n)_{n\ge1}$ as follows. Let $\kappa_1$ be the index such that $Z_1 \in I_{\kappa_1}$, then