Trickle-down processes and their boundaries

It is possible to represent each of a number of Markov chains as an evolving sequence of connected subsets of a directed acyclic graph that grow in the following way: initially, all vertices of the graph are unoccupied, particles are fed in one-by-one at a distinguished source vertex, successive particles proceed along directed edges according to an appropriate stochastic mechanism, and each particle comes to rest once it encounters an unoccupied vertex. Examples include the binary and digital search tree processes, the random recursive tree process and generalizations of it arising from nested instances of Pitman's two-parameter Chinese restaurant process, tree-growth models associated with Mallows' phi model of random permutations and with Schuetzenberger's non-commutative q-binomial theorem, and a construction due to Luczak and Winkler that grows uniform random binary trees in a Markovian manner. We introduce a framework that encompasses such Markov chains, and we characterize their asymptotic behavior by analyzing in detail their Doob-Martin compactifications, Poisson boundaries and tail sigma-fields.

Several stochastic processes appearing in applied probability may be viewed as growing connected subsets of a directed acyclic graph that evolve according to the following dynamics: initially, all vertices of the graph are unoccupied, particles are fed in one-by-one at a distinguished source vertex, successive particles proceed along directed edges according to an appropriate stochastic mechanism, and each particle comes to rest once it encounters an unoccupied vertex. If we picture the source vertex as being at the "top" of the graph, then successive particles "trickle down" the graph until they find a vacant vertex that they can occupy.
We are interested in the question: "What is the asymptotic behavior of such a (highly transient) set-valued Markov chain?" For several of the models we consider, any finite neighborhood of the source vertex will, with probability one, be eventually occupied by a particle, and so a rather unilluminating answer to our question is to say in such cases that the sequence of sets converges to the entire vertex set V. Implicit in the use of the term "converges" in this statement is a particular topology on the collection of subsets of V; we are embedding the space of finite subsets of V into the Cartesian product {0, 1}^V and equipping the product space with the usual product topology. A quest for more informative answers can therefore be thought of as a search for an embedding of the state space of the chain into a topological space with a richer class of possible limits.
An ideal embedding would be one such that the chain converged almost surely to a limit and the σ-field generated by the limit coincided with the tail σ-field of the chain up to null events. For trickle-down processes, the Doob-Martin compactification provides such an embedding, and so our aim is to develop a body of theory that enables us to identify the compactification for at least some interesting examples. Moreover, a knowledge of the Doob-Martin compactification allows us to determine, via the Doob h-transform construction, all the ways in which it is possible, loosely speaking, to condition the Markov chain to behave for large times. This allows us to construct interesting new processes from existing ones or recognize that two familiar processes are related by such a conditioning.
A prime example of a Markov chain that fits into the trickle-down framework is the binary search tree (BST) process, and so we spend some time describing the BST process in order to give the reader some concrete motivation for the definitions we introduce later. The BST process and the related digital search tree (DST) processes that we consider in Section 5 arise from considering the behavior of tree-based searching and sorting algorithms. The trickle-down mechanism is at the heart of both algorithms: the vertices of the complete rooted binary tree are regarded as potential locations for the storage of data values x_1, x_2, . . . that arrive sequentially in time. We interpret these values as labels of particles. The particles are fed in at the root vertex, which receives x_1, and they are routed through the tree until a free vertex is found. How we travel onwards from an occupied vertex depends on the algorithm: in the BST case we assume that the input stream consists of real numbers and we compare the value x to be inserted with the content y of the occupied vertex, moving to the left or right depending on whether x < y or x > y, whereas in the DST case the inputs x_i are taken to be infinite 0-1 sequences, and we move from an occupied vertex of depth k to its left or right child if the k-th component of x_i is 0 or 1, respectively. If the input is random and we ignore the labeling of the vertices by elements of the input data sequence, then we obtain a sequence of subtrees of the complete binary tree; the n-th element of the sequence is the subtree consisting of the vertices occupied by the first n particles.
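To make the two routing rules concrete, here is a small illustrative sketch (the 0-1-word coding of vertices and the function names are ours, not the paper's; finite tuples stand in for the infinite 0-1 input sequences of the DST):

```python
def bst_insert(tree, x):
    """BST rule: route x by comparisons until a vacant vertex is found."""
    u = ""                                  # the empty word is the root
    while u in tree:                        # vertex occupied: move onwards
        u += "0" if x < tree[u] else "1"    # left if smaller, right if larger
    tree[u] = x                             # come to rest at the vacant vertex

def dst_insert(tree, x):
    """DST rule: route the 0-1 sequence x by its successive components."""
    u = ""
    while u in tree:
        u += str(x[len(u)])                 # component at the current depth decides
    tree[u] = x

bst = {}
for val in (0.8, 0.7, 0.9, 0.4):
    bst_insert(bst, val)
# occupied vertices: "" (0.8), "0" (0.7), "1" (0.9), "00" (0.4)

dst = {}
for seq in ((1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)):
    dst_insert(dst, seq)
# occupied vertices: "", "1", "0", "10"
```

Ignoring the stored values and keeping only the occupied vertices yields the growing sequence of subtrees described above.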
Binary trees in general and their role in the theory and practice of computer science are discussed in [Knu69]. Several tree-based sorting and searching algorithms are described in [Knu73]. In particular, a class of trees (generalizing binary search trees as well as digital search trees) with a construction similar to our trickle-down process is introduced in [Dev99]. An introduction to the literature on tree-valued stochastic processes arising in this connection is [Mah92]. Historically, real-valued functionals such as the path length or the insertion depth of the next item were investigated first, with an emphasis on the expected value for random input as a function of the amount of stored data (that is, of the number of vertices in the tree). In recent years, several infinite-dimensional random quantities related to the shape of the trees, such as the node depth profile [CDJH01, DJN08], the subtree size profile [DG10, Fuc08] and the silhouette [Grü09], have been studied.
In the present paper we develop a framework for trickle-down processes that contains the BST and DST processes as special cases. As a consequence, we obtain limit results for the sequence of random trees themselves, using a topology on the space of finite binary trees that is dictated by the underlying stochastic mechanism. We also establish distributional relationships; for example, we show that the Markov chains generated by the BST and the DST algorithms are related via h-transforms; see Theorem 5.1.
In order to motivate our later formal definition of trickle-down processes, we now reconsider the BST process from a slightly different point of view by moving away somewhat from the search tree application and starting with a bijection from classical enumerative combinatorics (see, for example, [Sta97]) between permutations of the finite set [n] := {1, 2, . . . , n} and certain trees with n vertices labeled by [n].
Denote by {0, 1}* := ⊔_{k=0}^∞ {0, 1}^k the set of finite tuples or words drawn from the alphabet {0, 1} (with the empty word ∅ allowed); the symbol ⊔ emphasizes that this is a disjoint union. Write an ℓ-tuple (v_1, . . . , v_ℓ) ∈ {0, 1}^ℓ more simply as v_1 . . . v_ℓ. Define a directed graph with vertex set {0, 1}* by declaring that if u = u_1 . . . u_k and v = v_1 . . . v_ℓ are two words, then (u, v) is a directed edge (that is, u → v) if and only if ℓ = k + 1 and u_i = v_i for i = 1, . . . , k. Call this directed graph the complete rooted binary tree. Say that u < v for two words u = u_1 . . . u_k and v = v_1 . . . v_ℓ if k < ℓ and u_1 . . . u_k = v_1 . . . v_k; that is, u < v if there exist words w_0, w_1, . . . , w_{ℓ−k} with u = w_0 → w_1 → . . . → w_{ℓ−k} = v.
A finite rooted binary tree is a non-empty subset t of {0, 1}* with the property that if v ∈ t and u ∈ {0, 1}* is such that u → v, then u ∈ t. The vertex ∅ (that is, the empty word) belongs to any such tree t and is the root of t. See Figure 1.

Suppose that r(1), . . . , r(n) is an ordered listing of [n]. Define a permutation π of [n] by π^{−1}(k) = r(k), k ∈ [n]. There is a unique pair (t, φ), where t is a finite rooted binary tree with #t = n and φ is a labeling of t by [n], such that
• φ(∅) = 1,
• if u, v ∈ t and u < v, then φ(u) < φ(v).
The labeling may be constructed inductively as follows. If n = 1, then we just have the tree consisting of the root ∅ labeled with 1. For n > 1 we first remove n from the list r(1), . . . , r(n) and build the labeled tree (s, ψ) for the resulting listing of [n − 1]. The labeled tree for r(1), . . . , r(n) is of the form (t, φ), where t = s ∪ {u} for u ∉ s, φ(u) = n, φ restricted to s is ψ, and, setting u = u_1 . . . u_ℓ,

u_ℓ = 0, if π ∘ ψ(u_1 . . . u_{ℓ−1}) < π(n),
u_ℓ = 1, if π ∘ ψ(u_1 . . . u_{ℓ−1}) > π(n).
As illustrated in Figure 2, the label 1 is inserted at the root, the label 2 trickles down to the vertex 1, the label 3 trickles down to the vertex 10, the label 4 trickles down to the vertex 0, and so on until the label 9 trickles down to the vertex 001. Now let (U_n)_{n∈N} be a sequence of independent identically distributed random variables that each have the uniform distribution on the interval [0, 1]. For each positive integer n define a uniformly distributed random permutation Π_n of [n] by requiring that Π_n(i) < Π_n(j) if and only if U_i < U_j for 1 ≤ i, j ≤ n. That is, Π_n(k) = #{1 ≤ ℓ ≤ n : U_ℓ ≤ U_k}, and the corresponding ordered list R_n(k) := Π_n^{−1}(k), 1 ≤ k ≤ n, is such that U_{R_n(1)} < U_{R_n(2)} < . . . < U_{R_n(n)}. The corresponding ordered list for Π_{n+1} is thus obtained by inserting n + 1 into one of the n − 1 "slots" between the successive elements of the existing list or into one of the two "slots" at the beginning and end of the list, with all n + 1 possibilities being equally likely.
Applying the procedure above for building labeled rooted binary trees to the successive permutations Π_1, Π_2, . . . produces a sequence of labeled trees (L_n)_{n∈N}, where L_n has n vertices labeled by [n]. This sequence is a Markov chain that evolves as follows. Given L_n, there are n + 1 words of the form v = v_1 . . . v_ℓ such that v is not a vertex of the tree L_n but the word v_1 . . . v_{ℓ−1} is. Pick such a word uniformly at random and adjoin it (with the label n + 1 attached) to produce the labeled tree L_{n+1}.
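This transition mechanism admits a minimal simulation (our own sketch, with vertices coded as 0-1 words; the function names are not the paper's): at each step, pick one of the n + 1 available words uniformly at random and adjoin it.

```python
import random

def external_vertices(tree):
    """The words v = v_1...v_l not in the tree whose prefix v_1...v_{l-1} is."""
    return [u + c for u in tree for c in ("0", "1") if u + c not in tree]

def grow(tree, rng):
    """One transition of the chain: adjoin a uniformly chosen external vertex."""
    tree.add(rng.choice(sorted(external_vertices(tree))))

rng = random.Random(1)
tree = {""}                        # L_1: the root alone
for _ in range(8):
    grow(tree, rng)
# a binary tree with n vertices always has n + 1 external vertices
```

Since the adjoined word's parent is always in the tree, the occupied set stays a rooted subtree at every step.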
If we remove the labels from each tree L_n, then the resulting random sequence of unlabeled trees is also a Markov chain that has the same distribution as the sequence of trees generated by the BST algorithm when the input stream consists of independent random variables that all have the same continuous distribution function. In essence, at step n + 1 of the BST algorithm there are n + 1 vertices that can be added to the existing tree and the rank of the input value x_{n+1} within x_1, . . . , x_n, x_{n+1} determines the choice of this "external vertex": for i.i.d. continuously distributed random input, this rank is uniformly distributed on {1, . . . , n + 1}, resulting in a uniform pick from the external vertices (see also the discussion following (4.2)). See Figure 2.

[Figure 2. The labeled tree for the listing r(1), . . . , r(9) = 8, 7, 9, 4, 1, 3, 5, 2, 6. For the sake of clarity, the coding (see Figure 1) of the vertices as elements of {0, 1}* is not shown. The correspondence between the labeling by the set [9] and the vertices as elements of {0, 1}* is 1 ↔ ∅, 2 ↔ 1, 3 ↔ 10, 4 ↔ 0, 5 ↔ 101, 6 ↔ 11, 7 ↔ 00, 8 ↔ 000, 9 ↔ 001.]
From now on we will refer to any Markov chain on the space of finite rooted binary trees with this transition mechanism as "the" BST process and denote it by (T_n)_{n∈N}.
We note in passing that the labeled permutation trees L_1, . . . , L_{n−1} can be reconstructed from L_n, but a similar reconstruction of the history of the process from its current value is not possible if we consider the sequence of labeled trees obtained by labeling the vertices of the tree in the binary search tree algorithm with the input values x_1, . . . , x_n that created the tree.
Write G_n (respectively, D_n) for the number of vertices in T_n of the form 0v_2 . . . v_ℓ (resp. 1w_2 . . . w_m). That is, G_n and D_n are the sizes of the "left" and "right" subtrees in T_n below the root ∅. Then, G_n + 1 and D_n + 1 are, respectively, the number of "slots" to the left and to the right of 1 in the collection of n + 1 slots between successive elements or at either end of the ordered list Π_n^{−1}(1), . . . , Π_n^{−1}(n). It follows that the sequence of pairs (G_n + 1, D_n + 1), n ∈ N, is itself a Markov chain that evolves as the numbers of black and white balls in a classical Pólya urn (that is, as the process describing the successive compositions of an urn that initially contains one black and one white ball and at each stage a ball is drawn uniformly at random and replaced along with a new ball of the same color). More precisely, conditional on the past up to time n, if (G_n + 1, D_n + 1) = (b, w), then (G_{n+1} + 1, D_{n+1} + 1) takes the values (b + 1, w) and (b, w + 1) with respective conditional probabilities b/(b + w) and w/(b + w).

[Figure 3. A finite rooted binary tree, the tree with 9 vertices connected by the solid edges, and its 10 external vertices, the vertices connected to the tree by dashed edges. For simplicity, the coding of the vertices as elements of {0, 1}* is not shown.]

More generally, suppose for a fixed vertex u = u_1 . . . u_k ∈ {0, 1}* that we write G^u_n (respectively, D^u_n) for the number of vertices in T_n of the form u_1 . . . u_k 0v_2 . . . v_ℓ (resp. u_1 . . . u_k 1w_2 . . . w_m). That is, G^u_n and D^u_n are the sizes of the "left" and "right" subtrees in T_n below the vertex u. Put C^u_n := #{v ∈ T_n : u ≤ v} and S^u_r := inf{s ∈ N : C^u_s = r} for r ∈ N; that is, S^u_r is the first time that the subtree of T_n rooted at u has r vertices. Then, the sequence (G^u_{S^u_r}, D^u_{S^u_r}), r ∈ N, obtained by time-changing the sequence (G^u_n, D^u_n), n ∈ N, so that we only observe it when it changes state, is a Markov chain with the same distribution as (G_n, D_n), n ∈ N.
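The urn dynamics can be checked exactly by a short computation (our own sketch, using exact rational arithmetic): starting from one black and one white ball, the composition after n draws is a classical fact uniform over the n + 1 possibilities.

```python
from fractions import Fraction

def polya_distribution(n):
    """Exact law of the (black, white) counts after n draws from a (1, 1) start:
    a ball is drawn uniformly and replaced together with one of the same color."""
    dist = {(1, 1): Fraction(1)}
    for _ in range(n):
        new = {}
        for (b, w), p in dist.items():
            new[(b + 1, w)] = new.get((b + 1, w), Fraction(0)) + p * Fraction(b, b + w)
            new[(b, w + 1)] = new.get((b, w + 1), Fraction(0)) + p * Fraction(w, b + w)
        dist = new
    return dist

dist = polya_distribution(5)
# the composition after n draws is uniform over the n + 1 reachable states
assert all(p == Fraction(1, 6) for p in dist.values())
```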
It follows from this observation that we may construct the tree-valued process (T_n)_{n∈N} from an infinite collection of independent, identically distributed Pólya urns, with one urn for each vertex of the complete binary tree {0, 1}*, by running the urn for each vertex according to a clock that depends on the evolution of the urns associated with vertices that are on the path from the root to the vertex.
More specifically, we first equip each vertex u ∈ {0, 1}* with an associated independent N_0 × N_0-valued routing instruction process (Y^u_n)_{n∈N_0} such that (Y^u_n + (1, 1))_{n∈N_0} evolves like the pair of counts in a Pólya urn with an initial composition of one black and one white ball. Then, at each point in time we feed in a new particle at the root ∅. At time 0 the particle simply comes to rest at ∅. At time 1 the root is occupied and so the particle must be routed to either the vertex 0 or the vertex 1, where it comes to rest, depending on whether the value of Y^∅_1 is (1, 0) or (0, 1). We then continue on in this way: at time n ≥ 2 we feed a particle in at the root ∅, it is routed to the vertex 0 or the vertex 1 depending on whether the value of Y^∅_n − Y^∅_{n−1} is (1, 0) or (0, 1), and the particle then trickles down through the tree until it reaches an unoccupied vertex. At each stage of the trickle-down, if the particle is routed to a vertex u that is already occupied, then it moves on to the vertex u0 or the vertex u1 depending on whether the value of Y^u_k − Y^u_{k−1} is (1, 0) or (0, 1), where k is the number of particles that have passed through vertex u and been routed onwards by time n. The resulting sequence of trees is indexed by N_0 rather than N, and if we shift the indices by one we obtain a sequence indexed by N that has the same distribution as (T_n)_{n∈N}.
It is well-known (see [BK64]) that the Doob-Martin compactification of the state space N^2 of the classical Pólya urn results in a Doob-Martin boundary that is homeomorphic to the unit interval [0, 1]: a sequence of pairs ((b_n, w_n))_{n∈N} from N^2 converges to a point in the boundary if and only if b_n + w_n → ∞ and w_n/(b_n + w_n) → z for some z ∈ [0, 1]. We can, of course, identify [0, 1] with the space of probability measures on a set with two points, say {0, 1}, by identifying z ∈ [0, 1] with the probability measure that assigns mass z to the point 1.
It is a consequence of results we prove in Section 4 that this result "lifts" to the binary search tree process: the Doob-Martin boundary is homeomorphic to the space of probability measures on {0, 1}^∞ equipped with the weak topology corresponding to the product topology on {0, 1}^∞, and a sequence (t_n)_{n∈N} of finite rooted binary trees converges to the boundary point identified with the probability measure μ if and only if #t_n → ∞ and, for each u ∈ {0, 1}*, the proportion of vertices of t_n that lie in the subtree rooted at u converges to the μ-measure of the set of infinite sequences that extend u.

An outline of the remainder of the paper is the following. In Section 2 we give a general version of the trickle-down construction in which the complete rooted binary tree {0, 1}* is expanded to a broad class of directed acyclic graphs with a unique "root" vertex and the independent Pólya urns at each vertex are replaced by independent Markov chains that keep a running total of how many particles have been routed onwards to each of the immediate successors of the vertex. For example, we could take the graph to be N_0^2 with directed edges of the form ((i, j), (i + 1, j)) and ((i, j), (i, j + 1)) (so that the root is (0, 0)) and take the Markov chain at vertex (i, j) to correspond to successive particles being routed independently with equal probability to either (i + 1, j) or (i, j + 1). This gives a process somewhat reminiscent of Sir Francis Galton's quincunx, a device used to illustrate the binomial distribution and central limit theorem in which successive balls are dropped onto a vertical board with interleaved rows of horizontal pins that send a ball striking them downwards to the left or right "at random". We illustrate the first few steps in the evolution of the set of occupied vertices in Figure 4.
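A short simulation of this quincunx-like special case makes the dynamics concrete (an illustrative sketch; the names are ours):

```python
import random

def grid_trickle_down(n_particles, rng):
    """Feed particles in at (0, 0); at each occupied vertex a particle moves to
    (i + 1, j) or (i, j + 1) with equal probability, and it comes to rest at
    the first unoccupied vertex it encounters."""
    occupied = set()
    for _ in range(n_particles):
        u = (0, 0)
        while u in occupied:
            i, j = u
            u = (i + 1, j) if rng.random() < 0.5 else (i, j + 1)
        occupied.add(u)
    return occupied

occ = grid_trickle_down(5, random.Random(0))
# the occupied set is always a connected subset of the grid containing the root
assert (0, 0) in occ
assert all(u == (0, 0) or (u[0] - 1, u[1]) in occ or (u[0], u[1] - 1) in occ
           for u in occ)
```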
We give a brief overview of the theory of Doob-Martin compactifications in Section 3.

[Figure 4. The first five steps in the trickle-down process for the directed acyclic graph N_0^2 with directed edges of the form ((i, j), (i + 1, j)) and ((i, j), (i, j + 1)). The root (0, 0) is drawn at the top. Dashed lines show the paths taken by successive particles as they pass through occupied vertices until they come to rest at the first unoccupied vertex they encounter.]

We present our main result, a generalization of the facts about the
Doob-Martin boundary of the binary search tree process that we have stated above, in Section 4. It says that, for a large class of trickle-down processes, if convergence of a sequence to a point in the Doob-Martin boundary of each of the component Markov chains is determined by the convergence of the proportions of particles that are routed to each of the immediate successors, then the Doob-Martin boundary of the trickle-down process is homeomorphic to a space of probability measures on a set of directed paths from the root that either have infinite length or are "killed" at some finite time. We then consider special cases of this general result in Section 5, where we investigate the binary and digital search tree processes, and in Section 6, where we study random recursive tree processes that are related to a hierarchy of Chinese restaurant processes.
More specifically, we show in Section 5 that, as we already noted above, the Doob-Martin boundary of the BST process may be identified with the space of probability measures on {0, 1}^∞ equipped with the weak topology corresponding to the product topology on {0, 1}^∞, that every boundary point is extremal, that the digital search tree process is a Doob h-transform of the BST process with respect to the extremal harmonic function corresponding to the fair coin-tossing measure on {0, 1}^∞, and that an arbitrary Doob h-transform may be constructed from a suitable "trickle-up" procedure in which particles come in successively from the "leaves at infinity" of the complete rooted binary tree {0, 1}* (that is, from {0, 1}^∞) and work their way up the tree until they can move no further because their path is blocked by an earlier particle.
We observe in Section 6 that the random recursive tree (RRT) process (see [SM94] for a review) can be built from the above sequence (Π_n)_{n∈N} of uniform permutations in a manner analogous to the construction of the BST process by using a different bijection between permutations and trees. The RRT process is also a trickle-down process similar to the BST process, with the tree {0, 1}* replaced by the tree N* and the Pólya urn routing instructions replaced by the Markov chain that gives the block sizes in the simplest Chinese restaurant process model of growing random partitions. We extend this construction to incorporate Pitman's two-parameter family of Chinese restaurant processes and then investigate the associated Doob-Martin compactification. We identify the Doob-Martin boundary as a suitable space of probability measures, show that all boundary points are extremal, demonstrate that h-transform processes may be constructed via a "trickle-up" procedure similar to that described above for the BST process, and relate the limit distribution to the Griffiths-Engen-McCloskey (GEM) distributions. Similar nested hierarchies of Chinese restaurant processes appear in [DGM06, PW09] and in [TJBB06, BGJ10] in the statistical context of mixture models, hierarchical models, and nonparametric Bayesian inference.
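For orientation, the block-size chain of the simplest Chinese restaurant process can be sketched as follows (our illustrative code, not the paper's notation): the (n + 1)-st customer joins an existing block of size b with probability b/(n + 1) and opens a new block with probability 1/(n + 1).

```python
import random

def crp_step(blocks, rng):
    """One step of the simplest Chinese restaurant process: the new customer
    joins a block with probability proportional to its size, or starts a new
    block with probability 1/(n + 1)."""
    n = sum(blocks)
    j = rng.randrange(n + 1)          # uniform over the n + 1 "slots"
    for i, b in enumerate(blocks):
        if j < b:
            blocks[i] += 1            # join block i (probability b/(n + 1))
            return
        j -= b
    blocks.append(1)                  # open a new block (probability 1/(n + 1))

rng = random.Random(7)
blocks = [1]                          # one customer at one table
for _ in range(99):
    crp_step(blocks, rng)
assert sum(blocks) == 100
```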
A commonly used probability distribution on the set of permutations of a finite set is the Mallows φ model (see [Mal57, Cri85, FV86, Dia88, CFV91, Mar95]), for which the uniform distribution is a limiting case. This distribution extends naturally to the set of permutations of N, and applying the obvious generalization of the above bijection between finite permutations and labeled finite rooted subtrees of the complete rooted binary tree {0, 1}* leads to an interesting probability distribution on infinite rooted subtrees of {0, 1}*. In Section 7 we relate this distribution to yet another model for growing random finite trees that we call the Mallows tree process. We show that the Doob-Martin boundary of this Markov chain is a suitable space of infinite rooted subtrees of {0, 1}*. We outline a parallel analysis in Section 8 for a somewhat similar process that is related to Schützenberger's non-commutative q-binomial theorem and its connection to weighted enumerations of "north-east" lattice paths.
The routing instruction processes that appear in the trickle-down construction of the Mallows tree process have the feature that if we know the state of the chain at some time, then we know the whole path of the process up to that time. We observe in Section 9 that such processes may be thought of as Markov chains on a rooted tree with transitions that always go to states that are one step further from the root. As one might expect, the Doob-Martin compactification in this case is homeomorphic to the usual end compactification of the tree. We use this observation to describe the Doob-Martin compactification of a certain Markov chain that takes values in the set of compositions of the integers and whose value at time n is uniformly distributed over the compositions of n.
As we have already remarked, our principal reason for studying the Doob-Martin compactification of a trickle-down chain is to determine the chain's tail σ-field. The Doob-Martin compactification gives even more information about the asymptotic behavior of the chain, but it is not always easy to compute. We describe another approach to determining the tail σ-field of certain trickle-down chains in Section 10. That result applies to the Mallows tree process and the model related to the non-commutative q-binomial theorem. We also apply it in Section 11 to yet another Markov chain model of growing random trees from [LW04]. The latter model, which turns out to be of the trickle-down type, has as its state space the set of finite rooted binary trees and is such that if it is started at time 0 in the trivial tree {∅}, then the value of the process at time n is equally likely to be any of the C_n rooted binary trees with n vertices, where C_n := (2n choose n)/(n + 1) is the n-th Catalan number. Even though we cannot determine the Doob-Martin compactification of this chain, we are able to show that its tail σ-field is generated by the random infinite rooted subtree of the complete binary tree that is the (increasing) union of the successive values of the chain. Also, knowing the tail σ-field allows us to identify the Poisson boundary; see Section 3 for a definition of this object.
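The Catalan count is easy to verify by brute force for small n (an illustrative sketch in our 0-1-word coding; the empty set below is only a device for the recursion, not a tree in the sense of this paper):

```python
from math import comb

def binary_trees(n):
    """All rooted binary trees with n vertices, coded as frozensets of 0-1
    words with the empty word as root; built from left/right subtree pairs."""
    if n == 0:
        return [frozenset()]
    trees = []
    for k in range(n):                              # k vertices on the left
        for left in binary_trees(k):
            for right in binary_trees(n - 1 - k):
                trees.append(frozenset({""})
                             | frozenset("0" + u for u in left)
                             | frozenset("1" + u for u in right))
    return trees

catalan = lambda n: comb(2 * n, n) // (n + 1)
assert [len(binary_trees(n)) for n in range(1, 7)] == [catalan(n) for n in range(1, 7)]
```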
We observe that there is some similarity between the trickle-down description of the binary search tree process and the internal diffusion limited aggregation model that was first named as such in [LBG92] after it was introduced in [DF91]. There, particles are fed successively into a fixed state of some Markov chain and they then execute independent copies of the chain until they come to rest at the first unoccupied state they encounter. The digital search tree process that we discuss in Section 5 turns out to be the internal diffusion limited aggregation model for the Markov chain on the complete rooted binary tree that from the state u moves to the states u0 and u1 with equal probability.
Finally, we note that there are a number of other papers that investigate the Doob-Martin boundary of Markov chains on various combinatorial structures such as Young diagrams and partitions -see, for example, [PW94, KOO98, GK00, GP05, GO06b, GO06a].
2. The trickle-down construction

2.1. Routing instructions and clocks. We begin by introducing a class of directed graphs with features generalizing those of the complete binary tree {0, 1}* considered in the Introduction.
Let I be a countable directed acyclic graph. With a slight abuse of notation, write u ∈ I to indicate that u is a vertex of I. Given two vertices u, v ∈ I, write u → v to indicate that (u, v) is a directed edge of I. Suppose that there is a unique vertex 0̂ such that for any other vertex u there is at least one finite directed path 0̂ → · · · → u. Note that 0̂ is the unique minimal element of I in the partial order ≤ defined by declaring that u ≤ v if u = v or there is a directed path from u to v. Suppose further that the number of directed paths between any two vertices is finite: this is equivalent to supposing that the number of directed paths between 0̂ and any vertex is finite. For u ∈ I, write α(u) := {v ∈ I : v → u} and β(u) := {v ∈ I : u → v}; that is, α(u) and β(u) are, respectively, the immediate predecessors and the immediate successors of u. Suppose that β(u) is non-empty for all u ∈ I. Thus, any particle that arrives at an occupied vertex always has at least one vertex to which it can be routed onwards.

We next introduce the notion of routing instructions that underlies the construction of a sequence of connected subsets of I via a trickle-down mechanism analogous to that described in the Introduction for the BST: at each point in time a particle is fed into 0̂ and trickles down through I according to the routing instructions at the occupied vertices it encounters until it finds a vacant vertex to occupy.
Let (N_0)^{β(u)} be the space of functions on the set of successors of u ∈ I that take values in the non-negative integers. Let e_v, v ∈ β(u), be the function that takes the value 1 at v and 0 elsewhere. That is, if we regard e_v as a vector indexed by β(u), then e_v has 1 in the v-th coordinate and 0 elsewhere. Formally, a routing instruction for the vertex u ∈ I is a sequence (σ^u_n)_{n∈N_0} of elements of (N_0)^{β(u)} with the properties:
• σ^u_0 = (0, 0, . . .),
• for each n ≥ 1, σ^u_n = σ^u_{n−1} + e_{v_n} for some v_n ∈ β(u).
The interpretation of such a sequence is that, for each v ∈ β(u), the component (σ^u_n)_v counts the number of particles out of the first n to pass through the vertex u that are routed onwards to vertex v ∈ β(u). The equation σ^u_n = σ^u_{n−1} + e_{v_n} indicates that the n-th such particle is routed onwards to the vertex v_n ∈ β(u). For s ∈ (N_0)^{β(u)}, write |s| := Σ_{v∈β(u)} s_v. Note that a routing instruction (σ^u_n)_{n∈N_0} for the vertex u satisfies |σ^u_n| = n for all n ∈ N_0.
For each vertex u ∈ I, suppose that we have a non-empty set Σ^u of routing instructions for u. Put Σ := ∏_{u∈I} Σ^u. Depending on convenience, we write a generic element of Σ in the form ((σ^u_n)_{n∈N_0})_{u∈I} or the form ((σ^u(n))_{n∈N_0})_{u∈I}. Recall that σ^u_n = σ^u(n) is an element of (N_0)^{β(u)}, and so it has coordinates (σ^u_n)_w = (σ^u(n))_w for w ∈ β(u).
Given σ ∈ Σ, each vertex u of I has an associated clock (a^u_n(σ))_{n∈N_0} such that a^u_n(σ) counts the number of particles that have passed through u by time n and been routed onwards to some vertex in β(u). For each n ∈ N and σ ∈ Σ the integers a^u_n(σ), u ∈ I, are defined recursively (with respect to the partial order on I) as follows:

(a) a^{0̂}_n(σ) := n;
(b) a^u_n(σ) := ( Σ_{v∈α(u)} (σ^v(a^v_n(σ)))_u − 1 )^+ for u ≠ 0̂,

where x^+ := max{x, 0}. In particular, a_0(σ) = (0, 0, . . .) for all σ ∈ Σ. The equation in (b) simply says that the number of particles that have been routed onwards from the vertex u by time n is equal to the number of particles that have passed through vertices v with v → u and have been routed in the direction of u, excluding the first particle that reached the vertex u and occupied it.
We say that the sequence (x_n)_{n∈N_0} = ((x^u_n)_{u∈I})_{n∈N_0} given by

(2.2)    x^u_n := σ^u(a^u_n(σ))

is the result of the trickle-down construction for the routing instruction σ ∈ Σ.
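The construction can be rendered in a few lines of code (a sketch under our own naming; routing instructions are given as explicit lists of successor choices, i.e. the increments e_{v_n}, and the clocks a^u_n are maintained as running counters):

```python
def trickle_down(root, route, n):
    """Run the construction for times 0, ..., n: feed one particle per time
    step in at the root; an occupied vertex u routes its k-th onward particle
    to route[u][k].  Returns the occupied set and the counts x[u][v]."""
    occupied = set()
    clock = {}                       # clock[u] = a^u: particles routed onwards from u
    x = {}                           # x[u][v] = (sigma^u(a^u))_v
    for _ in range(n + 1):
        u = root
        while u in occupied:
            k = clock.get(u, 0)
            v = route[u][k]          # routing instruction at local time k
            clock[u] = k + 1
            x.setdefault(u, {})[v] = x.get(u, {}).get(v, 0) + 1
            u = v
        occupied.add(u)              # come to rest at the first vacant vertex
    return occupied, x

# Hypothetical routing on the grid N_0^2: each vertex alternates between its
# two immediate successors (i + 1, j) and (i, j + 1).
route = {(i, j): [((i + 1, j) if k % 2 == 0 else (i, j + 1)) for k in range(20)]
         for i in range(6) for j in range(6)}
occ, x = trickle_down((0, 0), route, 4)
assert occ == {(0, 0), (1, 0), (0, 1), (2, 0), (1, 1)}
assert x[(0, 0)] == {(1, 0): 2, (0, 1): 2}
```

Feeding particles in one-by-one and updating the counters along the way is equivalent to evaluating (2.2) vertex by vertex.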
Example 2.1. (a) Figure 5 shows the state at time n = 12 (that is, the values of x^u_12 for u = (i, j) ∈ I = N_0^2) generated by routing instructions whose initial pieces are given there, when the states (i + 1, j) and (i, j + 1) that comprise β(u), the immediate successors of u, are taken in that order. (b) The clock a^{(0,1)}, which translates from "real time" to the "local time" at the vertex (0, 1) ∈ I = N_0^2 by counting the particles that pass through this vertex, has a corresponding sequence of states; for example, x^{(0,1)}_5 = (2, 0) indicates that by time 5 the vertex (0, 1) has been occupied, 2 particles have been sent onwards to the vertex (1, 1), and 0 particles have been sent onwards to the other immediate successor (0, 2). (d) Looking at the state x^u_12, u ∈ I, at time n = 12 we cannot reconstruct the relevant initial segments of the routing instructions, but we can see, for example, that
- 13 particles have been fed into the root (0, 0): the first of these stayed at the root, 6 of the remainder were routed onwards to (1, 0) and the other 6 were routed onwards to (0, 1) (that is, a^{(0,0)}_12(σ) = 12 and σ^{(0,0)}_12 = (6, 6));
- of the 6 particles routed from the root towards (1, 0), the first stayed there, 2 of the remainder were routed onwards to (2, 0) and the other 3 were routed onwards to (1, 1) (that is, a^{(1,0)}_12(σ) = 5 and σ^{(1,0)}_5 = (2, 3));
- of the 6 particles routed from the root towards (0, 1), the first stayed there, 3 of the remainder were routed onwards to (1, 1) and the other 2 were routed onwards to (0, 2) (that is, a^{(0,1)}_12(σ) = 5 and σ^{(0,1)}_5 = (3, 2)).
For each vertex u ∈ I, write S^u ⊆ (N_0)^{β(u)} for the set of vectors that can appear as an entry in an element of Σ^u. That is, s ∈ S^u if and only if s = σ_m for some sequence (σ_n)_{n∈N_0} ∈ Σ^u, where, of course, m = |s|. Note that the set S^u is countable.
Let S denote the subset of ∏_{u∈I} S^u consisting of points x = (x^u)_{u∈I} that can be constructed as (x^u)_{u∈I} = (σ^u(a^u_m(σ)))_{u∈I} for some m ∈ N_0 and some σ = ((σ^v_n)_{n∈N_0})_{v∈I} ∈ Σ; that is, x appears as the value at time m in the result of the trickle-down construction for the routing instruction σ. Clearly, if a sequence (x^u)_{u∈I} ∈ ∏_{u∈I} S^u belongs to S, then all but finitely many of the coordinates x^u are equal to the zero vector. Given two points x, y ∈ S, say that x ⪯ y if for some m, n ∈ N_0 with m ≤ n and some σ ∈ Σ we have x^u = σ^u(a^u_m(σ)) and y^u = σ^u(a^u_n(σ)) for all u ∈ I.

Remark 2.2. Note that if x ⪯ y, then (x^u)_v ≤ (y^u)_v for all u ∈ I and v ∈ β(u). Moreover, if x ⪯ y, then the set

{σ ∈ Σ : (σ^u(a^u_m(σ)))_{u∈I} = x and (σ^u(a^u_n(σ)))_{u∈I} = y for some m ≤ n ∈ N_0}

is non-empty.

Example 2.3. Suppose that I is a tree. This amounts to imposing the extra condition that for each vertex u ∈ I there is a unique directed path from 0̂ to u. For each u ∈ I take Σ^u to be the set of all allowable routing instructions for u, so that the corresponding set S^u is (N_0)^{β(u)}. In this case, there is a bijection between S and finite subtrees of I that contain the root 0̂. An element x ∈ S determines a finite rooted subtree t by

t := {0̂} ∪ {v ∈ I : (x^u)_v ≥ 1, where u is the unique vertex with u → v}.

In other words, the tree t consists of those vertices of I that are occupied by the first 1 + Σ_{v∈β(0̂)} (x^{0̂})_v particles. Conversely, if t is a finite subtree of I that contains 0̂, then the corresponding element x of S is given by

(x^u)_v := #{w ∈ t : v ≤ w}, u ∈ I, v ∈ β(u);

that is, x appears as the result of the trickle-down construction at some time n, and for each pair of vertices u ∈ I and v ∈ β(u) the integer #{w ∈ t : v ≤ w} gives the number of particles that have been routed onwards from vertex u ∈ I to vertex v ∈ β(u) by time n. The partial order ⪯ on S is equivalent to containment of the associated subtrees. From now on, when I is a tree we sometimes do not mention this bijection explicitly and abuse terminology slightly by speaking of S as the set of finite subtrees of I that contain the root 0̂.
Example 2.4. In Example 2.3, the set S_u of states for the routing instructions at any vertex u ∈ I is all of (N_0)^{β(u)}. At the other extreme we have what we call the single trail routing: as always, the first item is put into the root, but now, in the step from n to n + 1, the new item follows the trail u_0, …, u_{n−1} left by the last one and then chooses u_n from β(u_{n−1}). In this case, S_u = {0} ∪ ⋃_{v∈β(u)} N e_v, where 0 is the zero vector in (N_0)^{β(u)} and e_v is the coordinate vector corresponding to v. Examples of this type appear in Section 9.
Remark 2.5. In the setting of Example 2.3, the sequence (x_n)_{n∈N_0} in S constructed by setting x^u_n = σ^u(a^u_n(σ)) for some σ ∈ Σ corresponds to a sequence of growing subtrees that begins with the trivial tree {0} and successively adds a single vertex connected by a directed edge to a vertex present in the current subtree, and this correspondence is bijective. In Example 2.4, a sequence (x_n)_{n∈N_0} in S corresponds to the sequence of initial segments of some infinite directed path 0 = u_0 → u_1 → u_2 → ⋯ through I, and this correspondence is also bijective.
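In concrete terms, the trickle-down dynamics described above can be sketched as follows. This is an illustration only, not part of the formal development: the encoding of vertices of the complete binary tree as tuples, the helper names, and the deterministic alternating instruction are all our own choices of one allowable routing instruction.

```python
# Sketch of the trickle-down construction: particles enter at the root and
# are routed along directed edges until they reach an unoccupied vertex.
def trickle_down(n_particles, route):
    """route(u, k): the immediate successor to which the k-th particle
    routed onward from vertex u is sent (u's routing instruction)."""
    occupied, routed = set(), {}   # routed[u] = number routed onward from u
    for _ in range(n_particles):
        u = ()                     # the root, encoded as the empty tuple
        while u in occupied:
            k = routed.get(u, 0)
            routed[u] = k + 1
            u = route(u, k)        # follow u's routing instruction
        occupied.add(u)            # the particle comes to rest
    return occupied, routed

# Complete binary tree: children of u are u + (0,) and u + (1,).
# A deterministic instruction that alternates between the two children.
route = lambda u, k: u + (k % 2,)
occ, routed = trickle_down(4, route)
```

With four particles and this alternating instruction, the occupied set is {(), (0,), (1,), (0,0)} and the root has routed three particles onward; the dictionary `routed` plays the role of the clocks a^u_n.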
2.2. Trickle-down chains. We now choose the routing instructions randomly in order to produce an S-valued stochastic process.
For each u ∈ I, let Q^u be a transition matrix whose rows and columns are indexed by some subset R_u of (N_0)^{β(u)} containing the zero vector, with Q^u(ξ, ζ) > 0 only if ζ − ξ = e_v for some v ∈ β(u), and let Σ^u be the set of sequences (σ_n)_{n∈N_0} that start at the zero vector and satisfy Q^u(σ_n, σ_{n+1}) > 0 for all n ∈ N_0. Then Σ^u is a set of routing instructions for the vertex u. Define, as in the previous subsection, S_u to be the set of elements of (N_0)^{β(u)} that can appear as an entry in an element of Σ^u. Note that S_u ⊆ R_u: the set S_u consists of the states that are reachable by a Markov chain with transition matrix Q^u started from the state (0, 0, …). We will suppose from now on that R_u = S_u.
Write (Y^u_n)_{n∈N_0} for the corresponding S_u-valued Markov chain with its associated collection of probability measures Q^{u,ξ}, ξ ∈ S_u. A realization of the process Y^u starting from the zero vector in (N_0)^{β(u)} will serve as the routing instruction for the vertex u; that is, the n-th particle that trickles down to u and finds u occupied will be routed onward to the immediate successor v ∈ β(u) specified by e_v = Y^u_n − Y^u_{n−1}. By assumption, and with 0 the zero vector in (N_0)^{β(u)}, Y^u has positive probability under Q^{u,0} of hitting any given state in S_u. We will refer to Y^u as the routing chain for the vertex u. Let Y := (Y^u)_{u∈I}, where the component processes Y^u are independent and have distribution Q^{u,0}.
With a_0, a_1, … the clocks defined in Section 2.1, set A_n := (a^u_n(Y))_{u∈I} for n ∈ N_0. Thus, (A_n)_{n∈N_0} is an (N_0)^I-valued stochastic process with non-decreasing paths and initial value (0, 0, …): the non-negative integer A^u_n records the number of particles that have trickled down to the vertex u by time n, found u already occupied, and have been routed onwards. Define Z^u_n := Y^u_{A^u_n} and Z_n := (Z^u_n)_{u∈I}. Then (Z_n)_{n∈N_0} is a Markov chain on the countable state space S under the probability measure ⊗_{u∈I} Q^{u,0}. The paths of Z start from the state (0, 0, …) and increase strictly in the natural partial order ⪯ on S. The random vector Z^u_n gives for each immediate successor v ∈ β(u) of u the number of particles that have trickled down to u by time n, found u already occupied, and have been routed onwards towards v.
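For illustration, the following sketch (our own; the Pólya-urn routing anticipates Example 4.3 and is only one possible choice of the matrices Q^u) runs the chain by generating the routing chains Y^u lazily, and checks the bookkeeping identity that the number of particles routed onwards from u equals the number of occupied vertices strictly below u.

```python
import random

def trickle_down_chain(n_particles, rng):
    """Trickle-down chain on the binary tree with Polya-urn routing:
    from u, route to child u0 with probability (g+1)/(g+d+2), where
    (g, d) counts particles already routed from u towards u0 and u1."""
    occupied, counts = set(), {}
    for _ in range(n_particles):
        u = ""                                    # root as the empty word
        while u in occupied:
            g, d = counts.setdefault(u, [0, 0])
            v = 0 if rng.random() < (g + 1) / (g + d + 2) else 1
            counts[u][v] += 1
            u += "01"[v]
        occupied.add(u)
    return occupied, counts

occ, counts = trickle_down_chain(50, random.Random(7))
# Every particle routed onward from u comes to rest strictly below u.
for u, (g, d) in counts.items():
    assert g + d == sum(1 for w in occ if w != u and w.startswith(u))
```

The dictionary `counts` is the state Z_n of the chain: for each occupied vertex u it records how many particles u has routed towards each of its two children.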
By standard arguments, we can construct a measurable space (Ω, F), a family of probability measures (P x ) x∈S and an S-valued stochastic process X = (X n ) n∈N0 such that X under P x is a Markov chain with X 0 = x and the same transition mechanism as Z.
Remark 2.6. Note that if J is a subset of I with the property that {v ∈ I : v ≤ u} ⊆ J for all u ∈ J, then ((X u n ) u∈J ) n∈N0 is a Markov chain under P x . Moreover, the law of the latter process under P x agrees with its law under P y for any y ∈ S with x u = y u for all u ∈ J.

Doob-Martin compactification background
We restrict the following sketch of Doob-Martin compactification theory for discrete time Markov chains to the situation of interest in the present paper. The primary reference is [Doo59]; useful reviews may be found in several of the standard texts. Suppose that (X_n)_{n∈N_0} is a discrete time Markov chain with countable state space E and transition matrix P. Define the Green kernel or potential kernel G of P by G(i, j) := Σ_{n=0}^∞ P^n(i, j) for i, j ∈ E, and assume that there is a reference state e ∈ E such that 0 < G(e, j) < ∞ for all j ∈ E. This implies that any state can be reached from e and that every state is transient. For the chains to which we apply the theory, the state space E is a partially ordered set with unique minimal element e, and the transition matrix P is such that P(k, ℓ) = 0 unless k < ℓ, so that the sample paths of the chain are strictly increasing. A non-negative function f on E is called excessive if Pf ≤ f and regular if Pf = f. Excessive functions are also called non-negative superharmonic functions; similarly, regular functions are also called non-negative harmonic functions. Given a finite measure µ on E, define a function Gµ : E → [0, ∞] by Gµ(i) := Σ_{j∈E} G(i, j) µ({j}). The function Gµ is excessive and is called the potential of the measure µ. The Riesz decomposition says that any excessive function f has a unique decomposition f = h + p, where h is regular and p = Gν is the potential of a unique measure ν.
Note for any excessive function f that f(e) ≥ sup_{n∈N_0} P^n(e, j) f(j) for every j ∈ E, and so f(e) = 0 implies that f = 0. Therefore, any excessive function is a constant multiple of an element of the set S of excessive functions that take the value 1 at e. The set S is a compact convex metrizable subset of the locally convex topological vector space R^E equipped with the topology of pointwise convergence. The Martin kernel with reference state e is given by K(i, j) := G(i, j)/G(e, j); that is, K(·, j) is the potential of the unit point mass at j normalized to have value 1 at the point e ∈ E. For each j ∈ E the function K(·, j) belongs to S and is non-regular. Moreover, K(·, j) is an extreme point of S, and any extreme point of S that is not of the form K(·, j) for some j ∈ E is regular. It also follows from the Riesz decomposition that the map φ : E → S given by φ(j) := K(·, j) is injective. Therefore, we can identify E with its image φ(E) ⊂ S, which sits densely inside the compact closure F of φ(E) in S. With the usual slight abuse of terminology, we treat E as a subset of F and use the alternative notation Ē for F. The construction of the compact metrizable space Ē from E using the transition matrix P and the reference state e is the Doob-Martin compactification of E, and the set ∂E := Ē \ E is the Doob-Martin boundary. By definition, a sequence (j_n)_{n∈N} in E converges to a point in Ē if and only if the sequence of real numbers (K(i, j_n))_{n∈N} converges for all i ∈ E. Each function K(i, ·) extends continuously to Ē and we call the resulting function K : E × Ē → R the extended Martin kernel.
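As a toy illustration of these definitions (our own example, not from the paper): for the nearest-neighbour random walk on Z that steps up with probability p > 1/2, the two harmonic functions 1 and i ↦ (q/p)^i correspond to the two boundary points j → +∞ and j → −∞, and the Martin kernel can be approximated numerically from the Green kernel.

```python
import numpy as np

# Biased walk on the integers, truncated to -M..M (absorbing endpoints).
p, q, M, steps = 0.7, 0.3, 60, 4000
size = 2 * M + 1
P = np.zeros((size, size))
for s in range(1, size - 1):
    P[s, s + 1], P[s, s - 1] = p, q

def green_row(i):
    """Approximate G(i, .) = sum_n P^n(i, .) by iterating the law of X_n."""
    mu = np.zeros(size)
    mu[i + M] = 1.0
    g = mu.copy()                  # the n = 0 term
    for _ in range(steps):
        mu = mu @ P
        g += mu
    return g

g0, g2 = green_row(0), green_row(2)
j = -20 + M                        # a site far below both starting states
K = g2[j] / g0[j]                  # Martin kernel K(2, -20), reference e = 0
# As j -> -infinity, K(i, j) -> (q/p)^i; here (q/p)^2 = 0.1836...
```

The truncation and the finite number of summed steps make this a numerical sketch only, but the computed ratio is already close to (q/p)^2, exhibiting the boundary point "at −∞".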
The set of extreme points F_ex of the convex set F is a G_δ subset of F, and any regular function h ∈ S (that is, any regular function h with h(e) = 1) has the representation h = ∫ K(·, y) µ(dy) for some unique probability measure µ on F that assigns all of its mass to F_ex ∩ E^c ⊆ ∂E.
The primary probabilistic consequence of the Doob-Martin compactification is that for any initial state i the limit X ∞ := lim n→∞ X n exists P i -almost surely in the topology of F and the limit belongs to F ex , P i -almost surely.
If h is a regular function (not identically 0), then the corresponding Doob h-transform is the Markov chain (X^{(h)}_n)_{n∈N_0} with state space E_h := {i ∈ E : h(i) > 0} and transition matrix P^{(h)}(i, j) := h(i)^{−1} P(i, j) h(j) for i, j ∈ E_h. When h is strictly positive, the Doob-Martin compactification of E and its set of extreme points are the same for P and P^{(h)}.
The regular function h is extremal if and only if the limit lim_{n→∞} X^{(h)}_n is almost surely equal to a single point y ∈ F, in which case y ∈ F_ex ∩ E^c and h = K(·, y). In particular, h is extremal if and only if the tail σ-field of (X^{(h)}_n)_{n∈N_0} is trivial. In this case, the transformed chain (X^{(h)}_n)_{n∈N_0} may be thought of as the original chain (X_n)_{n∈N_0} conditioned to converge to y. The original chain is a mixture of such conditioned chains, where the mixing measure is the unique probability measure representing the constant regular function 1; that is, the distribution of X_∞ under P_e. The Doob-Martin boundary thus provides a representation of the non-negative harmonic functions. We close this review section with a brief discussion of a measure theoretic boundary concept that has a more direct relation to tail σ-fields in the trickle-down case.
The set H of all bounded harmonic functions is a linear space, and indeed a Banach space when endowed with the supremum norm. The Poisson boundary is a measure space (M, A, µ) with the property that L^∞(M, A, µ) and H are isomorphic as Banach spaces. The Doob-Martin boundary ∂E, together with its Borel σ-field and the distribution ν of X_∞ under P_e, provides such a measure space.
Our models have the specific feature that, loosely speaking, "time is a function of space": the state space E of a trickle-down chain (X_n)_{n∈N_0} may be written as the disjoint union of sets E_n, n ∈ N_0, where E_n is the collection of states that the chain can occupy at time n. Let T be the tail σ-field of the chain. Consider now the map that takes a bounded, T-measurable random variable Z to the function h : E → R defined, on each E_n separately, by h(x) := E_e[Z | X_n = x] for x ∈ E_n. Note that h(X_n) = E_e[Z | X_n]. Using martingale convergence and the Markov property, it follows that this map is a Banach space isomorphism between L^∞(Ω, T, P_e) and H. For any embedding in which the chain converges to a limit X_∞, this limit is T-measurable. The limit in the Doob-Martin compactification of a transient chain generates the invariant σ-field up to null sets, where for a chain (X_n)_{n∈N_0} with state space E, an event A is invariant if there is a product measurable subset B ⊆ E^{N_0} such that for all n ∈ N_0 the symmetric difference A △ {(X_n, X_{n+1}, …) ∈ B} has zero probability. In our models, the limit X_∞ in the Doob-Martin compactification generates the tail σ-field, because it is possible to reconstruct the value of the time parameter from the state of the process at an unspecified time. Conversely, from the tail σ-field we may obtain the Poisson boundary but not, in general, the Doob-Martin boundary.

Compactification for trickle-down processes
For each u ∈ I, let Q u be a transition matrix on S u ⊆ N β(u) 0 with the properties described in Section 2.2. The following result is immediate from the construction of the trickle-down chain X and Remark 2.2.
Lemma 4.1. For x, y ∈ S, P_x{X_n = y for some n ∈ N_0} = ∏_{u∈I} Q^{u,x^u}{Y^u_m = y^u for some m ∈ N_0}. The product is zero unless x ⪯ y (equivalently, x^u ≤ y^u for all u ∈ I). Only finitely many terms in the product differ from 1, because x^u = y^u = (0, 0, …) for all but finitely many values of u ∈ I.
Corollary 4.2. The Martin kernel of the Markov chain X with respect to the reference state 0 is given by K(x, y) = ∏_{u∈I} K^u(x^u, y^u), where K^u is the Martin kernel of the Markov chain Y^u with respect to the reference state (0, 0, …) ∈ S_u. The product is zero unless x ⪯ y (equivalently, x^u ≤ y^u for all u ∈ I). Only finitely many terms in the product differ from 1, because x^u = (0, 0, …) for all but finitely many values of u ∈ I.
Proof. It suffices to note that, because the paths of X are strictly increasing, G(x, y) = P_x{X_n = y for some n ∈ N_0}, and similarly for each routing chain Y^u, and then apply Lemma 4.1.
Example 4.3. Consider the BST process from the Introduction. Recall that in this case the directed graph I is the complete binary tree {0,1}* and each of the processes (Y^u_n + (1, 1))_{n∈N_0} is the classical Pólya urn: we have an urn consisting of black and white balls, we draw a ball uniformly at random at each step and replace it along with one of the same color, and we record the number of black and white balls present in the urn at each step. Note that if we start the Pólya urn with b black and w white balls, then the probability that we ever see B black balls and W white balls is the probability that after (B + W) − (b + w) steps we have added B − b black balls and W − w white balls. The probability of adding the extra balls in a particular specified order is

b(b+1)⋯(B−1) · w(w+1)⋯(W−1) / [(b+w)(b+w+1)⋯(B+W−1)]

(the fact that this probability is the same for all orders is the fundamental exchangeability fact regarding the Pólya urn). The probability of adding the required extra balls of each color in some order is therefore this quantity multiplied by the binomial coefficient counting the possible orders. Hence, the Martin kernels K^u may be computed explicitly as ratios of such hitting probabilities.

Suppose that x, y ∈ S with x ⪯ y, and recall from Example 2.3 that we may associate x and y with two subtrees s ⊆ t. It follows from Corollary 4.2, together with the identity ∏_{u∈I} ((y^u)_{u0} + (y^u)_{u1} + 1) = ∏_{u∈t} #t(u) and the analogous relation for x and s, that we arrive at the simple formula

(4.1) K(x, y) = [#s! (#t − #s)! / #t!] ∏_{u∈s} #t(u).

This formula may also be obtained without using Corollary 4.2, as follows. With a slight abuse of notation, we think of the process (X_n)_{n∈N_0} as taking values in the set of finite subtrees of {0,1}* containing the root ∅. We first want a formula for P_s{X hits t} when s and t are two such trees with s ⊆ t. For ease of notation, set k := #s and n := #t. It is known (see, for example, [SF96, p. 316]) that the probability that the BST process passes through a given tree t is ∏_{u∈t} 1/#t(u). Write v_1, …, v_{k+1} for the "external vertices" of s; that is, the elements of {0,1}* that are connected to a vertex of s by a directed edge, but are not vertices of s themselves (recall Figure 3). Denote by t(v_j), j = 1, …, k + 1, the subtrees of t that are rooted at these vertices; that is, the t(v_j) are the connected components of t \ s. In order for the BST process to pass from s to t it needs to place the correct number n_j := #t(v_j) of vertices into each of these subtrees and, moreover, the subtrees have to be equal to t(v_j), for j = 1, …, k + 1. The process that tracks the number of vertices in each subtree is, after we add the vector (1, …, 1), a multivariate Pólya urn model starting with k + 1 balls, all of different colors. Thus, the probability that each subtree has the correct number of vertices is k!(n−k)!/n!, using a standard argument for the Pólya urn [JK77, Chapter 4.5]. Moreover, it is apparent from the recursive structure of the BST process that, conditional on the k + 1 subtrees receiving the correct numbers of vertices, the probability that the subtrees are equal to t(v_1), …, t(v_{k+1}) is ∏_{w∈t∖s} 1/#t(w). Thus,

P_s{X hits t} = [k!(n−k)!/n!] ∏_{w∈t∖s} 1/#t(w),

and (4.1) follows upon taking the appropriate ratio.
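The displayed hitting-probability formula can be checked exactly for small trees. The sketch below is our own: trees are encoded as sets of 0-1 words with root the empty word, and P_s{X hits t} is computed by dynamic programming over the external slots of the current tree.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial

def externals(s):
    """External vertices: children of vertices of s not themselves in s."""
    return [u + c for u in s for c in "01" if u + c not in s]

def hit_prob(s, t):
    """Exact P_s{X hits t}: the next particle lands in each of the
    #cur + 1 external slots of the current tree with equal probability."""
    t = frozenset(t)
    @lru_cache(maxsize=None)
    def rec(cur):
        if cur == t:
            return Fraction(1)
        ext = externals(cur)
        return sum((Fraction(1, len(ext)) * rec(cur | {v})
                    for v in ext if v in t), Fraction(0))
    return rec(frozenset(s))

def formula(s, t):
    """k!(n-k)!/n! times the product over w in t minus s of 1/#t(w)."""
    k, n = len(s), len(t)
    val = Fraction(factorial(k) * factorial(n - k), factorial(n))
    for w in set(t) - set(s):
        val /= sum(1 for x in t if x.startswith(w))
    return val

t = {"", "0", "1", "00", "01", "10"}
s = {"", "0"}
assert hit_prob(s, t) == formula(s, t)          # both equal 1/30 here
assert hit_prob({""}, t) == formula({""}, t)    # recovers prod 1/#t(u)
```

Taking s to be the trivial tree recovers the cited fact that the BST process passes through t with probability ∏_{u∈t} 1/#t(u).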
With Example 4.3 in mind, we now begin to build a general framework for characterizing the Doob-Martin compactification of a trickle-down chain in terms of the compactifications of each of the routing chains.
Proposition 4.4. Suppose (y_n)_{n∈N_0} is a sequence in S such that y^u_∞ := lim_{n→∞} y^u_n exists in the Doob-Martin topology of S̄_u for each u ∈ I. Then (y_n)_{n∈N_0} converges in the Doob-Martin topology of S̄ to a limit y_∞, and the value of the extended Martin kernel at x ∈ S is K(x, y_∞) = ∏_{u∈I} K^u(x^u, y^u_∞).

Proof. The assumption that y^u_∞ := lim_{n→∞} y^u_n exists in the Doob-Martin topology of S̄_u for each u ∈ I implies that lim_{n→∞} K^u(ξ, y^u_n) exists for each u ∈ I and ξ ∈ S_u. This limit is, by definition, the value K^u(ξ, y^u_∞) of the extended Martin kernel. We need to show for all x ∈ S that lim_{n→∞} K(x, y_n) exists and is given by ∏_{u∈I} K^u(x^u, y^u_∞). It follows from Corollary 4.2 that K(x, y_n) = ∏_{u∈I} K^u(x^u, y^u_n). We also know from that result that we may restrict the product to the fixed, finite set of u for which x^u ≠ (0, 0, …), and hence we may interchange the limit and the product.
Remark 4.5. Proposition 4.4 shows that if the sequence (y n ) n∈N0 in S is such that for each u ∈ I the component sequence (y u n ) n∈N0 converges in the Doob-Martin compactification of S u , then (y n ) n∈N0 converges in the Doob-Martin compactification of S.
Establishing results in the converse direction is somewhat tricky, since it may happen that K^v(x^v, y^v_n) converges to 0 as n → ∞ for some particular v ∈ I, in which case the product can converge even though we are not able to conclude that K^u(x^u, y^u_n) converges for all u ∈ I. Instances of this possibility appear in Section 7 and Section 8.
The following set of hypotheses gives one quite general setting in which it is possible to characterize the Doob-Martin compactification of S in terms of the compactifications of the component spaces S u . These hypotheses are satisfied by a number of interesting examples such as the binary search tree and the random recursive tree processes (see Example 4.7 and Example 4.8 below as well as Section 5 and Section 6). The key condition is part (iii) of the following set of hypotheses: it requires that the Doob-Martin boundary of the routing chain for the vertex u may be thought of as a set of subprobability measures on β(u) that arise as the vector of limiting proportions of particles that have been routed onward to the various elements of β(u).
Hypothesis 4.6. Suppose that the following hold for all u ∈ I.
(i) Writing |ξ| = Σ_{v∈β(u)} ξ_v for ξ ∈ S_u, the sets {ξ ∈ S_u : |ξ| = m} are finite for all m ∈ N_0, so that if (ζ_n)_{n∈N_0} is a sequence from S_u, then the two conditions
(4.4) #{n ∈ N_0 : ζ_n = ζ} < ∞ for all ζ ∈ S_u
and
(4.5) lim_{n→∞} |ζ_n| = ∞
are equivalent.
(ii) In order that K^u(ξ, ζ_n) converges as n → ∞ for all ξ ∈ S_u, it is necessary and sufficient that either #{n ∈ N_0 : ζ_n ≠ ζ} < ∞ for some ζ ∈ S_u, or that the equivalent conditions (4.4) and (4.5) hold and, in addition,
(4.6) lim_{n→∞} (ζ_n)_v / |ζ_n| exists for all v ∈ β(u).
(iii) If (ζ_n)_{n∈N_0} and (ζ'_n)_{n∈N_0} are two sequences from S_u such that #{n ∈ N_0 : ζ_n = ζ} < ∞ and #{n ∈ N_0 : ζ'_n = ζ} < ∞ for all ζ ∈ S_u, and both K^u(ξ, ζ_n) and K^u(ξ, ζ'_n) converge for all ξ ∈ S_u, then the limits agree for all ξ ∈ S_u if and only if the limiting proportions in (4.6) agree. It follows that there is a natural bijection between ∂S_u := S̄_u \ S_u, where S̄_u is the Doob-Martin compactification of S_u, and the set Ŝ_u of subprobability measures on β(u) that are limits in the vague topology of probability measures of the form ζ_n/|ζ_n|, where (ζ_n)_{n∈N_0} is a sequence from S_u that satisfies (4.4).
(iv) The bijection between ∂S_u and Ŝ_u is a homeomorphism if the former set is equipped with the trace of the Doob-Martin topology and the latter set is equipped with the trace of the vague topology.
(v) There is a subset R_u ⊆ S_u with the property that if (ζ_n)_{n∈N_0} and (ζ'_n)_{n∈N_0} are two sequences from S_u that both satisfy (4.4) and are such that lim_{n→∞} K^u(η, ζ_n) and lim_{n→∞} K^u(η, ζ'_n) exist and agree for all η ∈ R_u, then lim_{n→∞} K^u(ξ, ζ_n) and lim_{n→∞} K^u(ξ, ζ'_n) exist and agree for all ξ ∈ S_u.
(vi) Suppose that (ζ_n)_{n∈N_0} is a sequence from S_u such that (4.4) holds and K^u(ξ, ζ_n) converges as n → ∞ for all ξ ∈ S_u. Let ρ = (ρ_v)_{v∈β(u)} be the subprobability vector of limiting proportions defined by (4.6). The extended Martin kernel is such that lim_{n→∞} K^u(ξ, ζ_n) = K^u(ξ, ρ), and K^u(ξ, ρ) > 0 whenever ξ ∈ S_u satisfies ξ_v = 0 for every v ∈ β(u) with ρ_v = 0.
(vii) For each ρ ∈ Ŝ_u there is a sequence (σ_n)_{n∈N_0} ∈ Σ^u such that lim_{n→∞} σ_n/|σ_n| = ρ.

Example 4.7. Hypothesis 4.6 holds if #β(u) = 2 for all u ∈ I (for example, if I is the complete binary tree) and the Markov chains Y^u = (Y^u_n)_{n∈N_0} are such that the (Y^u_n + (1, 1))_{n∈N_0} are all Pólya urns starting with one black ball and one white ball. This is a consequence of the results in [BK64]. Indeed, the same is true if for arbitrary I with β(u) finite for all u ∈ I we take S_u = (N_0)^{β(u)} and let Y^u be an urn scheme of the sort considered in [BM73], where there is a (not necessarily integer-valued) finite measure ν^u on β(u) that describes the initial composition of an urn with balls whose "colors" are identified with the elements of β(u), balls are drawn at random and replaced along with a new ball of the same color, and Y^u_n records the number of balls of the various colors that have been drawn by time n.
In this general case, the extended Martin kernel is given by

K^u(ξ, ρ) = [Γ(ν^u(β(u)) + |ξ|)/Γ(ν^u(β(u)))] ∏_{v∈β(u)} [Γ(ν^u({v}))/Γ(ν^u({v}) + ξ_v)] ρ_v^{ξ_v},

where ρ_v^{ξ_v} denotes the value ρ_v that the probability measure ρ assigns to {v} raised to the power ξ_v. We may take the set R_u in this case to be the set of coordinate vectors e_v, v ∈ β(u), where e_v has a single 1 in the v-th component and 0 elsewhere. The set Ŝ_u consists of all the probability measures on the finite set β(u).
Example 4.8. Hypothesis 4.6 also holds if the set β(u) is finite for all u ∈ I, S_u = (N_0)^{β(u)}, and the routing chain Y^u is given by Y^u_n = W^u_1 + ⋯ + W^u_n, where the W^u_k are independent, identically distributed S_u-valued random variables with distribution supported on the set of coordinate vectors. If p^u_v is the probability that the common distribution of the W^u_k assigns to the coordinate vector e_v, then the extended Martin kernel is given by K^u(ξ, ρ) = ∏_{v∈β(u)} (ρ_v / p^u_v)^{ξ_v}. Results of this type go back to [Wat60] and are described in [KSK76]. Once again, we may take R_u to be the set of coordinate vectors, and once again Ŝ_u consists of all the probability measures on the finite set β(u).
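The stated form of the kernel can be checked numerically. The sketch below uses assumed step probabilities p = (0.3, 0.7) on two successors; the Martin kernel of the resulting multinomial walk is computed from exact hitting probabilities and compared with ∏_v (ρ_v/p_v)^{ξ_v} along states ζ whose proportions approximate ρ = (1/3, 2/3).

```python
from math import comb

p = (0.3, 0.7)                    # common distribution of the steps W_k
xi = (2, 1)                       # a fixed state of the routing chain

def kernel(xi, zeta):
    """K(xi, zeta) = P_xi{hit zeta} / P_0{hit zeta}: from xi the walk must
    make zeta - xi further steps with the right multinomial split."""
    a, b = zeta[0] - xi[0], zeta[1] - xi[1]
    ratio = comb(a + b, a) / comb(zeta[0] + zeta[1], zeta[0])
    return ratio / (p[0] ** xi[0] * p[1] ** xi[1])

rho = (1 / 3, 2 / 3)
target = (rho[0] / p[0]) ** xi[0] * (rho[1] / p[1]) ** xi[1]
approx = kernel(xi, (5000, 10000))   # zeta with proportions rho
```

The binomial-coefficient ratio tends to ∏_v ρ_v^{ξ_v} as |ζ| → ∞ with ζ/|ζ| → ρ, which is exactly the mechanism behind the boundary description in this example.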
In order to state a broadly applicable result in the converse direction of Proposition 4.4 we first need to develop some more notation and collect together some auxiliary results.
Adjoin a point † to I and write I_∞ for the set of sequences (u_n)_{n∈N_0} where either u_n ∈ I for all n ∈ N_0 and 0 = u_0 → u_1 → ⋯, or, for some N ∈ N_0, u_n ∈ I for n ≤ N with 0 = u_0 → ⋯ → u_N, and u_n = † for n > N. We think of I_∞ as the space of directed paths through I that start at 0 and are possibly "killed" at some time and sent to the "cemetery" †.
Write C_∞ for the countable collection of cylinder subsets of I_∞ of the form {(u_n)_{n∈N_0} ∈ I_∞ : u_0 = v_0, …, u_k = v_k} for k ∈ N_0 and v_0, …, v_k ∈ I ∪ {†}. Denote by ℐ_∞ the σ-field generated by C_∞. The following result is elementary and we leave its proof to the reader.
Lemma 4.9. Any probability measure on the measurable space (I_∞, ℐ_∞) is specified by its values on the sets in C_∞. The space of such probability measures, equipped with the coarsest topology that makes each of the maps µ ↦ µ(C), C ∈ C_∞, continuous, is compact and metrizable.
Consider the case of Lemma 4.9 in which the measure µ describes the dynamics of a Markov process. That is, for each u ∈ I there is a subprobability measure r_u on β(u) such that if the process is in state u, then the next step is to v with probability (r_u)_v, and to † with probability 1 − Σ_{v∈β(u)} (r_u)_v.
Label u ∈ I with ↓ if u is reachable from 0 (in the classical sense of Markov chains), and with † otherwise. Denote by J↓ and J† the sets of vertices labeled with ↓ and †, respectively.
Clearly, in order to specify the distribution µ of the Markovian path starting from 0, it suffices to have the subprobability measures r_u only for u ∈ J↓. Note that the labeling (J↓, J†) has the two properties:
• the vertex 0 is labeled with ↓;
• if for some v ≠ 0 every vertex u ∈ α(v) is labeled with †, then v is also labeled with †.
Let us now switch perspectives and start from a labeling instead of a collection of subprobability measures.
Definition 4.10. Say that a labeling of I with the symbols ↓ and † is admissible if it satisfies the above two properties. Write I ↓ (resp. I † ) for the subset of vertices labeled with ↓ (resp. †).
Note that if (I↓, I†) is an admissible labeling of I and (u_n)_{n∈N_0} is a directed path in I with u_0 = 0, and we define a sequence (ũ_n)_{n∈N_0} in I ∪ {†} by setting ũ_n := u_n if u_m ∈ I↓ for all m ≤ n, and ũ_n := † otherwise, then (ũ_n)_{n∈N_0} is an element of I_∞.
Definition 4.11. Given an admissible labeling (I↓, I†) of I, say that a collection (r_u)_{u∈I↓}, with each r_u a subprobability measure on β(u), is compatible with the labeling if for every v ≠ 0 the vertex v belongs to I† if and only if every vertex u ∈ α(v) belongs to I† or (r_u)_v = 0 for every vertex u ∈ α(v) ∩ I↓.

The assertions (i), (ii) and (iii) in the following lemma, with J↓ and J† instead of I↓ and I†, are obvious. The proof of the lemma is then clear from the previous remark.
Lemma 4.13. Consider an admissible labeling of I with the symbols ↓ and † and a compatible collection of subprobability measures (r u ) u∈I ↓ .
(i) There is a unique probability measure µ on (I_∞, ℐ_∞) for which the mass assigned to the set {(w_n)_{n∈N_0} ∈ I_∞ : w_0 = u_0, …, w_k = u_k} is (r_{u_0})_{u_1} ⋯ (r_{u_{k−1}})_{u_k} when 0 = u_0 → ⋯ → u_k is a directed path with every u_j ∈ I↓, and zero otherwise.
(ii) The vertex u belongs to I↓ if and only if µ assigns positive mass to some set in C_∞ determined by a directed path 0 = u_0 → ⋯ → u_n = u.
(iii) For u ∈ I↓ and v ∈ β(u),
(r_u)_v = µ{(w_m)_{m∈N_0} : w_0 = u_0, …, w_n = u, w_{n+1} = v} / µ{(w_m)_{m∈N_0} : w_0 = u_0, …, w_n = u}
for any choice of 0 = u_0 → ⋯ → u_n = u → u_{n+1} = v such that the denominator is positive. In particular, it is possible to recover the labeling and the collection (r_u)_{u∈I↓} from the probability measure µ.

Write R_∞ for the set of pairs ((I↓, I†), (r_u)_{u∈I↓}) consisting of an admissible labeling of I and a compatible collection of subprobability measures. Our next result relates R_∞ to the Doob-Martin boundary of S.

(i) Every sequence (y_n)_{n∈N_0} in S that converges to a point of ∂S determines an element ((I↓, I†), (r_u)_{u∈I↓}) of R_∞ satisfying (4.7). Moreover, if two such sequences converge to the same point, then the corresponding elements of R_∞ coincide. (ii) Conversely, for each element of R_∞ there is a sequence (y_n)_{n∈N_0} in S that converges to a point in the Doob-Martin boundary ∂S = S̄ \ S and satisfies (4.7); any two such sequences converge to the same point, establishing a bijection between R_∞ and ∂S. (iii) For x ∈ S and ((I↓, I†), (r_u)_{u∈I↓}) ∈ R_∞ ≅ ∂S, the value of the extended Martin kernel is ∏_{u∈I↓} K^u(x^u, r_u) if x^u = (0, 0, …) for all u ∈ I†, and zero otherwise.
(iv) Let P_∞ be the set of probability measures on I_∞ constructed from elements of R_∞ via the bijection of Lemma 4.13. Equip P_∞ with the trace of the metrizable topology introduced in Lemma 4.9. The composition of the bijection between P_∞ and R_∞ and the bijection between R_∞ and ∂S is a homeomorphism between P_∞ and ∂S.
Proof. Consider part (i). Suppose that the sequence (y_n)_{n∈N_0} converges to a point in ∂S; that is,
(4.8) lim_{n→∞} K(x, y_n) exists for all x ∈ S,
and no subsequence converges in the discrete topology on S to a point of S. Thus,
(4.9) #{n ∈ N_0 : y_n = y} < ∞ for any y ∈ S.
Consequently, in order to understand what further constraints are placed on the sequence (y_n)_{n∈N_0} by the assumption that (4.8) holds, first take x ∈ S with x^w = 0 for all w ≠ 0, so that K(x, y_n) = K^0(x^0, y^0_n); by (4.8) and Hypothesis 4.6(ii), the limit r_0 := lim_{n→∞} y^0_n/|y^0_n| ∈ Ŝ_0 exists. Note from the consistency condition (2.3) that we need only consider further choices of x ∈ S with x^w = 0 for all w ∈ I such that every directed path from 0 to w passes through a vertex v ∈ β(0) with (r_0)_v = 0. Suppose that r_0 ≠ 0. Fix a vertex u ∈ β(0) such that (r_0)_u > 0 and η ∈ R_u. From Hypothesis 4.6(vi), there exists θ ∈ S_0 such that θ_u = |η| + 1 and θ_w ∈ {0, 1} for w ≠ u. Define x ∈ S by setting x^0 = θ and x^u = η. By the consistency condition (2.3), this completely specifies x. Note that x^w = 0 if w ∉ {0, u}. By Corollary 4.2, K(x, y_n) = K^0(θ, y^0_n) K^u(η, y^u_n), and, by the choice of θ, K^0(θ, y^0_n) converges to a non-zero value as n → ∞. Therefore, lim_{n→∞} K^u(η, y^u_n) exists. Since this is true for all η ∈ R_u, it follows from Hypothesis 4.6(v) that lim_{n→∞} K^u(ξ, y^u_n) exists for all ξ ∈ S_u. Hence, by Hypothesis 4.6(ii), lim_{n→∞} (y^u_n)_v/|y^u_n| exists for all v ∈ β(u). Write r_u ∈ Ŝ_u for the resulting subprobability measure.
Continuing in this way, we see that, under the assumption (4.9), if (4.8) holds then there is a labeling of I with the symbols ↓ and † such that the following are true:
• the vertex 0 is in I↓;
• if a vertex u is in I↓, then the limiting subprobability measure lim_{n→∞} y^u_n/|y^u_n| =: r_u ∈ Ŝ_u exists;
• a vertex v ≠ 0 belongs to I† if and only if every vertex u ∈ α(v) belongs to I† or (r_u)_v = 0 for every vertex u ∈ α(v) ∩ I↓.
Thus, the labeling (I↓, I†) is admissible and the collection (r_u)_{u∈I↓} ∈ ∏_{u∈I↓} Ŝ_u is compatible, so ((I↓, I†), (r_u)_{u∈I↓}) is an element of R_∞.
Suppose that (y_n)_{n∈N_0} and (z_n)_{n∈N_0} are two sequences from S that converge to the same point in ∂S. Then |y^0_n| → ∞ and |z^0_n| → ∞ as n → ∞, and it is clear that the vertices of I that are labeled with the symbol ↓ (resp. †) for the sequence (y_n)_{n∈N_0} must coincide with the vertices of I that are labeled with the symbol ↓ (resp. †) for the sequence (z_n)_{n∈N_0}, and that the corresponding collections of subprobability measures coincide. Moreover, it follows from what we have just done that if x ∈ S and the convergent sequence (y_n)_{n∈N_0} is associated with ((I↓, I†), (r_u)_{u∈I↓}), then lim_{n→∞} K(x, y_n) = ∏_{u∈I↓} K^u(x^u, r_u) if x^u = (0, 0, …) for all u ∈ I†, and zero otherwise.
This establishes part (iii) once we show part (ii). Now consider part (ii). Fix ((I↓, I†), (r_u)_{u∈I↓}) ∈ R_∞. By Hypothesis 4.6(vii), for each u ∈ I↓ there is a sequence (σ^u_n)_{n∈N_0} ∈ Σ^u such that lim_{n→∞} σ^u_n/|σ^u_n| = r_u. Choose sequences (σ^u_n)_{n∈N_0} ∈ Σ^u for u ∉ I↓ arbitrarily and set σ = (σ^u)_{u∈I} ∈ Σ. Define a sequence (y_n)_{n∈N_0} from S by setting y^u_n = σ^u(a^u_n(σ)) for n ∈ N_0 and u ∈ I. It is clear from the arguments for part (i) that (y_n)_{n∈N_0} converges to a point in ∂S and that (4.7) holds. Moreover, it follows from the same arguments that any two convergent sequences satisfying (4.7) must converge to the same point. This establishes (ii).
The proof of (iv) is straightforward and we omit it.

Binary search tree and digital search tree processes
Recall the binary search tree (BST) process from the Introduction. We observed in Example 4.7 that Hypothesis 4.6 holds for the BST process. Recall from Example 2.3 that we can identify S in this case with the set of finite subtrees of the complete binary tree {0,1}* that contain the root ∅. Moreover, it follows from the discussion in Section 4 that ∂S is homeomorphic to the set of probability measures on {0,1}^∞ equipped with the weak topology corresponding to the usual product topology on {0,1}^∞.
We therefore abuse notation slightly and take S to be the set of finite subtrees of {0,1}* rooted at ∅ and take ∂S to be the set of probability measures on {0,1}^∞.
With this identification the partial order on S is just subset containment and the Martin kernel is given by
K(s, t) = [#s! (#t − #s)! / #t!] ∏_{u∈s} #t(u),
where we recall from Example 4.3 that #t(u) = #{v ∈ t : u ≤ v}. For µ ∈ ∂S and u ∈ {0,1}*, write µ_u for the mass assigned by µ to the set of infinite paths in the complete binary tree that begin at the root and pass through the vertex u. The extended Martin kernel is given by
K(t, µ) = #t! ∏_{u∈t} µ_u.
Note from the construction of the BST process that its transition matrix is
P(t, t ∪ {v}) = 1/(#t + 1)
for each external vertex v of t (this is also apparent from (4.3)). Set h_µ := K(·, µ) for µ ∈ ∂S. The Doob h-transform process corresponding to the regular function h_µ has state space {t ∈ S : µ_u > 0 for all u ∈ t} and transition matrix
P^{(h_µ)}(t, t ∪ {v}) = µ_v
for each external vertex v of t with µ_v > 0. It follows that the h-transformed process results from a trickle-down construction. For simplicity, we only verify this in the case when µ_u > 0 for all u ∈ {0,1}* = I, so that the state space of the h-transformed process is all of S, and leave the formulation of the general case to the reader. The routing chain on S_u = (N_0)^{{u0,u1}} has transition matrix Q^u given by Q^u((m, n), (m + 1, n)) = µ_{u0}/µ_u and Q^u((m, n), (m, n + 1)) = µ_{u1}/µ_u.
In other words, we can regard the routing chain as the space-time chain corresponding to the one-dimensional simple random walk that has probability µ_{u0}/µ_u of making a −1 step and probability µ_{u1}/µ_u of making a +1 step. We have the following "trickle-up" construction of the h-transformed process. Suppose on some probability space that there is a sequence of independent identically distributed {0,1}^∞-valued random variables (V^n)_{n∈N} with common distribution µ. For an initial finite rooted subtree w in the state space of the h-transformed process, define a sequence (W_n)_{n∈N_0} of random finite subsets of {0,1}* inductively by setting W_0 := w and
W_{n+1} := W_n ∪ {V^{n+1}_1 ⋯ V^{n+1}_{H(n+1)+1}}, n ≥ 0,
where H(n+1) := max{l ∈ N : V^{n+1}_1 ⋯ V^{n+1}_l ∈ W_n}, with the convention max ∅ = 0. That is, at each point in time we start a particle at a "leaf" of the complete binary tree {0,1}* picked according to µ and then let that particle trickle up the tree until it can go no further because its path is blocked by previous particles that have come to rest. It is clear that (W_n)_{n∈N_0} is a Markov chain with state space the appropriate set of finite rooted subtrees of {0,1}*, initial state w, and transition matrix P^{(h_µ)}.
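The trickle-up mechanism is easy to express in code. In the sketch below (our own encoding), each V^n is represented by a sufficiently long finite prefix, which is all that a single step ever examines.

```python
def trickle_up(w, vs):
    """One run of the trickle-up construction: w is the initial finite
    subtree (a set of 0-1 words containing the root ""), and each v in vs
    is a sufficiently long prefix of an infinite 0-1 sequence."""
    W = set(w)
    for v in vs:
        h = 0                      # H = length of longest prefix of v in W
        while v[:h + 1] in W:
            h += 1
        W.add(v[:h + 1])           # the particle comes to rest here
    return W

# Particles started at "leaves" 0000..., 0000..., 1000..., 0100...
W = trickle_up({""}, ["0000", "0000", "1000", "0100"])
```

Each step adds the shortest prefix of V^{n+1} not already present, i.e. the external vertex of W_n lying on the path V^{n+1}; under µ this external vertex v is chosen with probability µ_v, matching the transition matrix P^{(h_µ)} above.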
It follows from the trickle-up construction and Kolmogorov's zero-one law that the tail σ-field of the h-transformed process is trivial, and hence µ is an extreme point of S̄. Alternatively, µ is extremal because it is clear from the strong law of large numbers that the h-transformed process converges to µ.
Consider the special case of the h-transform construction when the boundary point µ is the "uniform" or "fair coin-tossing" measure on {0,1}^∞; that is, µ is the infinite product of copies of the measure on {0,1} that assigns mass 1/2 to each of the subsets {0} and {1}. In this case, the transition matrix of the h-transformed process is
P^{(h_µ)}(t, t ∪ {v}) = 2^{−|v|},
where we write |u| for the length of the word u; that is, |u| = k when u = u_1 … u_k. This transition mechanism is that of the digital search tree (DST) process. We have therefore established the following result.

Theorem 5.1. The DST process is the Doob h-transform of the BST process associated with the boundary point µ given by the fair coin-tossing measure on {0,1}^∞.

The process in Theorem 5.1 appears as the output of the DST algorithm if the input is a sequence of independent and identically distributed random 0-1 sequences with distribution µ, where µ is the fair coin-tossing measure. In the literature this assumption is also known as the symmetric Bernoulli model; in the general Bernoulli model the probability 1/2 for an individual digit 1 is replaced by an arbitrary p ∈ (0, 1). In our approach we do not need any assumptions on the internal structure of the random 0-1 sequences and we can work with a general distribution µ on {0,1}^∞. Any such DST process "driven by µ" is an h-transform of the BST process, provided that µ_u > 0 for all u ∈ I, and the trickle-up construction shows that the conditional distribution of the BST process, given that its limit is µ, is the same as the distribution of the DST process driven by µ.
In the symmetric Bernoulli model, the sample paths of the DST process converge almost surely to the single boundary point µ in the Doob-Martin topology, where µ is the uniform measure on {0, 1} ∞ . We now investigate the distribution of the limit of the sample paths of the BST process. There are several routes we could take.
Recall that the routing chains for the BST process are essentially Pólya urns; that is, the routing chain Y^u = ((Y^u)_{u0}, (Y^u)_{u1}) for the vertex u ∈ {0,1}* makes the transition (g, d) → (g + 1, d) with probability (g + 1)/(g + d + 2) and the transition (g, d) → (g, d + 1) with probability (d + 1)/(g + d + 2). It is a well-known fact about the Pólya urn that, when ((Y^u_0)_{u0}, (Y^u_0)_{u1}) = (0, 0), the sequence ((Y^u_n)_{u0} + (Y^u_n)_{u1})^{−1} ((Y^u_n)_{u0}, (Y^u_n)_{u1}), n ∈ N_0, converges almost surely to a random variable of the form (U, 1 − U), where U is uniformly distributed on [0, 1]. It follows that if we write (T_n)_{n∈N} for the BST process, then almost surely the proportions of vertices of T_n lying in the subtrees rooted at u0 and u1 converge to U_{u0} and U_{u1}, where the pairs (U_{u0}, U_{u1}), u ∈ {0,1}*, are independent, the random variables U_{u0} and U_{u1} are uniformly distributed on [0, 1], and U_{u0} + U_{u1} = 1. Thus, the limit of the BST chain is the random measure M given by M_u = ∏_{∅<v≤u} U_v. Another approach is to observe that, from the trickle-up description of the h-transformed processes described above and the extremality of all the boundary points, we only need to find a random measure on {0,1}^∞ such that if we perform the trickle-up construction from a realization of the random measure, then we produce the BST process. It follows from the main result of [BM73] that the random measure M has the correct properties.
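The almost sure convergence quoted here rests on the fact that the urn proportion is a bounded martingale; the one-step identity behind this can be verified in exact arithmetic (a quick check of our own, with frac(g, d) = (g + 1)/(g + d + 2) the urn proportion in routing state (g, d)):

```python
from fractions import Fraction

def frac(g, d):
    """Urn proportion of the first color when the routing state is (g, d)."""
    return Fraction(g + 1, g + d + 2)

# E[proportion after one step | current state (g, d)] = current proportion.
for g in range(8):
    for d in range(8):
        p_left = Fraction(g + 1, g + d + 2)   # transition (g,d) -> (g+1,d)
        expected = p_left * frac(g + 1, d) + (1 - p_left) * frac(g, d + 1)
        assert expected == frac(g, d)
```

By the martingale convergence theorem the proportion therefore converges almost surely; that the limit is uniform on [0, 1] is the classical Pólya urn result used above.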
Yet another perspective is to observe that, by the general theory outlined in Section 3, the distribution of the limit is the unique probability measure M on ∂S for which the corresponding mixture representation of the Martin kernel holds. In the present situation the right hand side of that representation can be written in terms of a random measure M' on {0,1}^∞ with distribution M. Rather than simply verify that taking M' = M, where M_u = ∏_{∅<v≤u} U_v as above, has the requisite property, we consider a more extensive class of random probability measures with similar structure, compute the corresponding regular functions, and identify the transition matrices of the resulting h-transform processes.
Let the pairs (R_{u0}, R_{u1}), u ∈ {0,1}^*, be independent and take values in the set {(a, b) : a, b ≥ 0, a + b = 1}. Define a random probability measure N on {0,1}^∞ by setting N_u := ∏_{∅<v≤u} R_v for all u ∈ {0,1}^*. The corresponding regular function is a product, taken over u ∈ {0,1}^*, of factors of the form A_u(j, k) := E[R_{u0}^j R_{u1}^k].
Remark 5.3. The chain with θ_u = η_u = ℓ for some fixed ℓ ∈ N appears in connection with the median-of-(2ℓ−1) version of the algorithms Quicksort and Quickselect (Find); see [Grü99].
Proof. Note that (D_1, D_2, D_3, D_4) has the same distribution as

(G_1/S, G_2/S, G_3/S, G_4/S), where S := G_1 + G_2 + G_3 + G_4,

the G_1, …, G_4 are independent, and G_i has the Gamma distribution with parameters (α_i, 1). Moreover, the latter random vector is independent of the sum S. By the standard aggregation property of independent Gamma random variables, the vectors

(G_1/(G_1+G_2), G_2/(G_1+G_2)) and (G_3/(G_3+G_4), G_4/(G_3+G_4))

are independent of each other and of ((G_1+G_2)/S, (G_3+G_4)/S), and the claim follows by combining these observations.

6. Random recursive trees and nested Chinese restaurant processes

6.1. Random recursive trees from another encoding of permutations. Recall from the Introduction how the binary search tree process arises from a classical bijection between permutations of [n] := {1, 2, …, n} and a suitable class of labeled rooted trees. The random recursive tree process arises from a similar, but slightly less well-known, bijection that we now describe.
We begin with a definition similar to that of the complete binary tree in the Introduction. Denote by N^* := ⋃_{k=0}^∞ N^k the set of finite tuples or words drawn from the alphabet N (with the empty word ∅ allowed). Write an ℓ-tuple (v_1, …, v_ℓ) ∈ N^ℓ more simply as v_1 … v_ℓ. Define a directed graph with vertex set N^* by declaring that if u = u_1 … u_k and v = v_1 … v_ℓ are two words, then (u, v) is a directed edge (that is, u → v) if and only if ℓ = k + 1 and u_i = v_i for i = 1, …, k. Call this directed graph the complete Harris-Ulam tree. A finite rooted Harris-Ulam tree is a subset t of N^* with the properties:
• ∅ ∈ t,
• if v = u_1 … u_k ∈ t, then u_1 … u_j ∈ t for 1 ≤ j ≤ k−1 and u_1 … u_{k−1} m ∈ t for 1 ≤ m ≤ u_k − 1.
As in the binary case there is a canonical way to draw a finite rooted Harris-Ulam tree in the plane, see Figure 6 for an example. Further, we can similarly define a vertex u ∈ N^* to be an external vertex of the tree t if u ∉ t and if t ∪ {u} is again a Harris-Ulam tree. Note that, in contrast to the binary case, external vertices are now specified by their immediate predecessor; in particular, a Harris-Ulam tree with n vertices has n external vertices. Given a permutation π of [n], set r(i) = π^{-1}(i) for 1 ≤ i ≤ n. Construct a finite rooted Harris-Ulam tree with n + 1 vertices labeled by [n] ∪ {0} from r(1), …, r(n) recursively, as follows. Denote by t_0 the tree consisting of just the root ∅ labeled with 0. Suppose for 1 ≤ i ≤ n that a tree t_{i−1} with i vertices labeled by {0, …, i − 1} has already been defined. Assume that i = r(ℓ). If {j : 1 ≤ j < ℓ, r(j) < i} = ∅, set s := 0. Otherwise, set s := r(k), where k := max{j : 1 ≤ j < ℓ, r(j) < i}. Let u be the vertex in t_{i−1} labeled by s. Put q := max({p ∈ N : up ∈ t_{i−1}} ∪ {0}) + 1, adjoin the vertex uq to t_{i−1} to create the tree t_i, and label this new vertex with i.
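The recursive bijection just described translates directly into code; the following sketch (our naming, with vertices as tuples of positive integers) follows the steps above literally:

```python
def harris_ulam_tree(pi):
    """Build the labeled rooted Harris-Ulam tree from a permutation pi of
    {1,..,n}, given as a list with pi[i-1] = pi(i).  Vertices are int tuples;
    returns a dict mapping vertex -> label, with the root () labeled 0."""
    n = len(pi)
    r = [0] * (n + 1)
    for i in range(1, n + 1):
        r[pi[i - 1]] = i            # r(i) = pi^{-1}(i)
    vertex_of = {0: ()}             # label -> vertex
    n_children = {(): 0}
    for i in range(1, n + 1):
        l = pi[i - 1]               # the position l with r(l) = i
        prev = [r[j] for j in range(1, l) if r[j] < i]
        s = prev[-1] if prev else 0 # label of the attachment vertex
        u = vertex_of[s]
        n_children[u] += 1
        v = u + (n_children[u],)    # next free child slot of u
        vertex_of[i] = v
        n_children[v] = 0
    return {v: lab for lab, v in vertex_of.items()}

# the identity permutation produces a path: each i is a child of i - 1
assert harris_ulam_tree([1, 2, 3]) == {(): 0, (1,): 1, (1, 1): 2, (1, 1, 1): 3}
```

Consistent with the discussion in the text, 1 is always the first child of 0, and 2 becomes a child of 1 exactly when 2 appears after 1 in the list r(1), …, r(n).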
For example, 1 is always the first child of 0 (occupying the vertex 1 in the complete Harris-Ulam tree) and 2 is either the second child of 0 (occupying the vertex 2 in the complete Harris-Ulam tree) or the first child of 1 (occupying the vertex 11 in the complete Harris-Ulam tree), depending on whether 2 appears before or after 1 in the list r(1), . . . , r(n). See Figure 7 for an instance of the construction with n = 9.
As in the Introduction, given a sequence (U_n)_{n∈N} of independent identically distributed random variables that are uniform on the unit interval [0, 1], define a random permutation Π_n of [n] for each positive integer n by setting Π_n(k) = #{1 ≤ ℓ ≤ n : U_ℓ ≤ U_k}. Applying the bijection to Π_n, we obtain a random labeled rooted tree and a corresponding unlabeled rooted tree that we again denote by L_n and T_n, respectively. Both of these processes are Markov chains with simple transition probabilities. For example, given T_n we pick one of its n + 1 vertices uniformly at random and connect a new vertex to it to form T_{n+1}. Thus, (T_n)_{n∈N} is the simplest random recursive tree process (see, for example, [SM94] for a survey of such models).
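The transition rule for (T_n) — choose an occupied vertex uniformly at random and attach a new child to it — can be sketched as follows (helper names and the tuple encoding are ours):

```python
import random

def rrt_grow(n, rng=random):
    """Grow a random recursive tree with n + 1 vertices (root plus n
    additions); vertices are int tuples as in the complete Harris-Ulam tree."""
    tree = {()}
    for _ in range(n):
        u = rng.choice(sorted(tree))        # uniform over current vertices
        kids = [v[-1] for v in tree if v != () and v[:-1] == u]
        tree.add(u + (max(kids, default=0) + 1,))  # next free child slot
    return tree

tree = rrt_grow(50, random.Random(1))
assert len(tree) == 51
assert all(v[:-1] in tree for v in tree if v != ())   # connected through the root
```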
As with the BST and DST processes, we think of building the sequence (T_n)_{n∈N} by first building a growing sequence of finite rooted Harris-Ulam trees labeled with the values of the input sequence U_1, U_2, …, and then ignoring the labels. The transition rule for the richer process takes a simple form: attach a new vertex labeled with U_{n+1} to the root if U_{n+1} is smaller than each of the previous variables U_1, …, U_n; if not, then attach a new vertex labeled with U_{n+1} to the existing vertex that is labeled with the rightmost of the smaller values. In contrast to the binary search tree situation, the labeled versions of the trees T_1, …, T_{n−1} can now be determined from the labeled version of T_n. However, if we remove the labels then we are in the same situation as in the BST case: the next tree is obtained by choosing an external vertex of the current tree uniformly at random and adjoining it to the current tree.
6.2. Chinese restaurant processes. Suppose that in the tree T_n the root has k offspring. Let n_1, …, n_k denote the number of vertices in the subtrees rooted at each of these offspring, so that n_1 + ⋯ + n_k = n. Note that in constructing T_{n+1} from T_n, either a new vertex is attached to the jth subtree with probability n_j/(n + 1) or it is attached to the root and begins a new subtree with probability 1/(n + 1). Thus, the manner in which the number and sizes of subtrees rooted at offspring of the root evolve is given by the number and sizes of tables in the simplest Chinese restaurant process: the nth customer to enter the restaurant finds k tables in use with respective numbers of occupants n_1, …, n_k and the customer either sits at the jth table with probability n_j/(n + 1) or starts a new table with probability 1/(n + 1).
It is clear from the construction of (T n ) n∈N that if we begin observing the subtree below one of the offspring of the root at the time the offspring first appears and only record the state of the subtree at each time it grows, then the resulting tree-valued process has the same dynamics as (T n ) n∈N . Iterating this observation, we see that we may think of (T n ) n∈N as an infinite collection of hierarchically nested Chinese restaurant processes and, in particular, that (T n ) n∈N arises as an instance of the trickle-down construction.
Rather than just investigate the Doob-Martin compactification of (T_n)_{n∈N}, we first recall the definition of Pitman's two-parameter family of processes to which the simple Chinese restaurant process belongs; see [Pit06] for background and an extensive treatment of the properties of these processes. We then apply the trickle-down construction to build a tree-valued Markov chain that uses these more general processes as routing instructions. Analogous finitely nested Chinese restaurant processes have been used in hierarchical Bayesian inference [TJBB06].
A member of the family of Chinese restaurant processes is specified by two parameters α and θ that satisfy the constraints: either α < 0 and θ = −Mα for some M ∈ N, or 0 ≤ α < 1 and θ > −α.
Note that if α < 0 and θ = −Mα for some M ∈ N, then, with probability one, the number of blocks in the partition is always at most M. We are only interested in the process that records the number and size of the blocks. This process is also Markov. The probability that the random partition at time q has block sizes b_1, b_2, …, b_n is

[(θ + α)(θ + 2α) ⋯ (θ + (n−1)α) / ((θ + 1)(θ + 2) ⋯ (θ + q − 1))] ∏_{j=1}^{n} (1 − α)(2 − α) ⋯ (b_j − 1 − α).

The ordering of the blocks in this formula is their order of appearance: b_1 is the size of the initial table, b_2 is the size of the table that began receiving customers next, and so on. More generally, the probability that we go from the partition A = {A_1, …, A_m} at time p to the partition B = {B_1, …, B_n} at time q > p involves the factor

(θ + mα)(θ + (m+1)α) ⋯ (θ + (n−1)α) / ((θ + p)(θ + p + 1) ⋯ (θ + q − 1)),

and the corresponding probability K(a, b) that we go from a partition with block sizes a_1, …, a_m at time p to one with block sizes b_1, …, b_n at time q > p has a similar product form. We can think of the block size process as a Markov chain on a state space E of block size sequences. The transition probability can be rearranged so that the factor

(θ + 1)(θ + 2) ⋯ (θ + p − 1) / ((θ + α)(θ + 2α) ⋯ (θ + (m−1)α))

depends only on the starting configuration. Note that lim_{N→∞} K(a, b^N) exists for all a ∈ E if and only if the limit exists for all a ∈ E of the form (1, …, 1, 0, 0, …) (that is, for all a ∈ E with entries in {0, 1}). Note also that the extended Martin kernel has the property that K(a, ρ) = 0 if and only if a_k ≥ 1 for some k with ∑_{j=1}^{k−1} ρ_j = 1, or a_k ≥ 2 for some k with ρ_k = 0.
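The seating rule behind these formulas is the standard (α, θ) Chinese restaurant rule: the customer entering when m customers are already seated joins a table of size n_j with probability (n_j − α)/(m + θ) and opens a new table with probability (θ + kα)/(m + θ), where k is the current number of tables. A simulation sketch (our naming, stating that rule as an assumption matching Pitman's model), which also illustrates the at-most-M-blocks remark:

```python
import random

def crp_block_sizes(n, alpha, theta, rng=random):
    """Seat n customers by the (alpha, theta) Chinese restaurant rule:
    customer m+1 joins the table with n_j occupants with probability
    (n_j - alpha)/(m + theta) and opens a new table with probability
    (theta + k*alpha)/(m + theta), where k is the current number of tables."""
    sizes = []
    for m in range(n):
        x = rng.random() * (m + theta)
        for j, nj in enumerate(sizes):
            if x < nj - alpha:
                sizes[j] += 1
                break
            x -= nj - alpha
        else:
            sizes.append(1)          # remaining mass is theta + k*alpha
    return sizes

rng = random.Random(2)
# with alpha = -0.5 and theta = -M*alpha = 1.5, at most M = 3 blocks ever appear
for _ in range(200):
    assert len(crp_block_sizes(40, -0.5, 1.5, rng)) <= 3
```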
Note that the parameters α and θ do not appear in this expression for the transition probabilities. It follows that for a given M the block size chains all arise as Doob h-transforms of each other.
We can build a Markov chain (W_n)_{n∈N_0} with transition matrix P^{h_ρ} and initial state c as follows. Let (V_n)_{n∈N} be a sequence of independent identically distributed random variables taking values in [M] ∪ {∞} with P{V_n = k} = ρ_k for k ∈ [M] and P{V_n = ∞} = 1 − ∑_{k∈[M]} ρ_k (the latter probability is always 0 when M is finite). Define (W_n)_{n∈N_0} inductively by setting W_0 = c and, writing N_n := inf{j ∈ [M] : (W_n)_j = 0} with the usual convention that inf ∅ = ∞, letting the (n+1)st customer join the table indexed by V_{n+1} if that table is already occupied and otherwise start the next new table, for n ≥ 0. It is clear from this construction and Kolmogorov's zero-one law that the tail σ-field of the chain is trivial, and so the regular function h_ρ is extremal.

The set I = [M]^* is a tree rooted at ∅ in which we may identify β(u), the set of offspring of vertex u ∈ I, with [M] for every vertex u. With this identification, we take the routing chain for every vertex to be the Chinese restaurant block size process with parameters α and θ.
We may think of the state space S of the trickle-down chain (X_n)_{n∈N_0} as the set of finite subsets t of I with the property that if a word v = v_1 … v_ℓ ∈ t, then v_1 … v_{ℓ−1} ∈ t and v_1 … v_{ℓ−1} k ∈ t for 1 ≤ k < v_ℓ. That is, when [M] = N we may think of S as the set of finite rooted Harris-Ulam trees from Subsection 6.1, and when M is finite we get an analogous collection in which each individual has at most M offspring.
The partial order on I = [M]^* is the one we get by declaring that u ≤ v for two words u, v ∈ I if and only if u = u_1 … u_k and v = v_1 … v_ℓ with k ≤ ℓ and u_i = v_i for 1 ≤ i ≤ k, just as for the complete binary tree. By analogy with the notation introduced in Example 4.3 for finite rooted binary trees, write #t(u) := #{v ∈ t : u ≤ v} for t ∈ S and u ∈ [M]^*.
It follows from the discussion in Subsection 6.2 that Hypothesis 4.6 holds. For each vertex u ∈ I the collection S_u consists of all probability measures on β(u) when M is finite and all subprobability measures on β(u) when M = ∞. We may therefore identify ∂S with the probability measures on I^∞ that assign all of their mass to [M]^∞ when M is finite and with the set of all probability measures on I^∞ when M = ∞, and we may extend the partial order accordingly. The following result summarizes the salient conclusions of the above discussion. The transition probabilities associated with a given µ ∈ ∂S are straightforward but notationally somewhat cumbersome, so we omit them. They show that there is the following "trickle-up" construction of a Markov chain (W_n)_{n∈N_0} with initial state w ∈ S and the h-transform transition probabilities (compare the analogous construction for the Chinese restaurant process itself in Subsection 6.2).
Let (V_n)_{n∈N} be a sequence of independent, identically distributed I^∞-valued random variables with common distribution µ. Suppose that S-valued random variables w =: W_0 ⊂ ⋯ ⊂ W_n have already been defined, and let W_{n+1} be obtained from W_n by adjoining the external vertex of W_n determined by the path V_{n+1}, for n ≥ 1.
It is clear from the Kolmogorov zero-one law that the tail σ-field of (W n ) n∈N0 is trivial for any µ, and so any µ is extremal.
Remark 6.4. By analogy with the definition of the BST process in Section 5, we define T_n to be the set of vertices occupied by time n (so that T_0 = {∅}). Put, for each vertex u, T_n(u) := {v ∈ T_n : u ≤ v}. The distribution of the random probability measure R on [M]^∞ defined by R{w ∈ [M]^∞ : u < w} := lim_{n→∞} #T_n(u)/#T_n, u ∈ [M]^*, may be derived from known properties of the two-parameter Chinese restaurant process (see, for example, Theorem 3.2 of [Pit06]).

7. Mallows trees

7.1. The φ model. The φ model of Mallows [Mal57] produces a random permutation of the set [n] for some integer n ∈ N. One way to describe the model is the following.
We place the elements of [n] successively into n initially vacant "slots" labeled by [n] to obtain a permutation of [n] (if the number i goes into slot j, then the permutation sends i to j). To begin with, each slot is equipped with a Bernoulli random variable. These random variables are obtained by taking n independent Bernoulli random variables with common success probability 0 < p < 1 and conditioning on there being at least 1 success. The number 1 is placed in the first slot for which the associated Bernoulli random variable is a success. Thus, the probability that there are k vacant slots to the left of 1 is

(1 − p)^k p / (1 − (1 − p)^n), 0 ≤ k ≤ n − 1.

Now equip the remaining n − 1 vacant slots (that is, every slot except the one in which 1 was placed) with a set of Bernoulli random variables that is independent of the first set. These random variables are obtained by taking n − 1 independent Bernoulli random variables with common success probability p and conditioning on there being at least 1 success. Place the number 2 in the first vacant slot for which the associated Bernoulli random variable is a success. The probability that there are k vacant slots to the left of 2 is

(1 − p)^k p / (1 − (1 − p)^{n−1}), 0 ≤ k ≤ n − 2.

Continue in this fashion until all the slots have been filled.
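The slot scheme translates directly into a sampler; a sketch under our naming, using the truncated geometric probabilities described above:

```python
import random

def mallows_permutation(n, p, rng=random):
    """Generate a Mallows(phi) permutation of {1,..,n} by the slot scheme:
    element m skips k vacant slots with probability proportional to
    (1-p)^k * p, truncated to the current number of vacant slots."""
    vacant = list(range(1, n + 1))
    sigma = {}
    for m in range(1, n + 1):
        # truncated geometric: P(k skips) ~ (1-p)^k * p, 0 <= k < len(vacant)
        x = rng.random() * (1 - (1 - p) ** len(vacant))
        k = 0
        while k < len(vacant) - 1 and x >= (1 - p) ** k * p:
            x -= (1 - p) ** k * p
            k += 1
        sigma[m] = vacant.pop(k)    # m goes into this slot: permutation sends m there
    return [sigma[m] for m in range(1, n + 1)]

perm = mallows_permutation(8, 0.4, random.Random(3))
assert sorted(perm) == list(range(1, 9))     # a genuine permutation
```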
The analogous procedure can be used to produce a permutation of N. Now the procedure begins with infinitely many slots labeled by N, and at each stage there is no need to condition on the almost sure event that there is at least one success. After each m ∈ N is inserted, the current number of vacant slots to the left of the slot in which m is placed is distributed as the number of failures before the first success in independent Bernoulli trials with common success probability p, and these random variables are independent. We note that this distribution on permutations of N appears in [GO10] in connection with q-analogues of de Finetti's theorem.
The following lemma is immediate from the construction of the Mallows model. Recall from the description of the BST process in the Introduction how it is possible to construct from a permutation π of [n] a subtree of the complete binary tree {0,1}^* that contains the root ∅ and has n vertices. The procedure actually produces a tree labeled with the elements of [n], but we are only interested in the underlying unlabeled tree. Essentially the same construction produces an infinite rooted binary tree labeled with N from a permutation π of N. This tree has the property that if a vertex u = u_1 … u_k belongs to the tree, then there are only finitely many vertices v such that u_1 … u_k 0 ≤ v.
The following result is immediate from Lemma 7.1 and the recursive nature of the procedure that produces a rooted subtree of {0,1}^* from a permutation.
We may regard (X_n)_{n∈N_0} as a Markov chain taking values in the set of finite subtrees of {0,1}^* that contain the root ∅, in which case {∅} = X_0 ⊆ X_1 ⊆ ⋯ and X_∞ := ⋃_{n∈N_0} X_n is an infinite subtree of {0,1}^* that contains ∅. Then, X_∞ has the same distribution as the random tree constructed from a random permutation of N that is distributed according to the Mallows model with parameter p.
We call the Markov chain (X n ) n∈N0 of Proposition 7.2 the Mallows tree process.

7.2. Mallows urns.
Consider the Markov chain on N_0 × N_0 with transition matrix Q introduced in Proposition 7.2. We call this chain the Mallows urn, because its role as a routing chain for the Mallows tree process is similar to that played by the Pólya urn in the construction of the BST process. When started from (0, 0), a sample path of the Mallows urn process looks like (0, 0), (1, 0), …, (K, 0), (K, 1), (K, 2), …, where K has the geometric distribution P{K = k} = (1 − p)^k p, k ∈ N_0. The probability that the Mallows urn process visits the state (k, ℓ) starting from the state (i, j) is positive only if either j = 0 and i ≤ k, or i = k and j ≤ ℓ, and is 0 otherwise.
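The sample-path description — first-coordinate steps until a first "success", then second-coordinate steps forever — can be simulated; the transition probabilities below are our reading of that description (chosen so that K is geometric(p)), not quoted from the text:

```python
import random

def mallows_urn_path(p, n_steps, rng=random):
    """Sample a path matching the sample-path description of the Mallows urn:
    from (i, 0) step to (i+1, 0) with probability 1 - p and to (i, 1) with
    probability p; once the second coordinate is positive it increases by 1
    deterministically.  (These probabilities are an assumption consistent
    with K being geometric(p).)"""
    path = [(0, 0)]
    i, j = 0, 0
    for _ in range(n_steps):
        if j == 0 and rng.random() < 1 - p:
            i += 1
        else:
            j += 1
        path.append((i, j))
    return path

path = mallows_urn_path(0.3, 20, random.Random(4))
# first coordinate freezes at some K, then the second coordinate counts up
K = path[-1][0]
assert all(y == 0 for (x, y) in path if x < K)
assert [y for (x, y) in path if x == K] == list(range(len(path) - K))
```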
In particular, the probability that the process visits (k, ℓ) starting from (0, 0) is (1 − p)^k for ℓ = 0 and (1 − p)^k p for ℓ ≥ 1. Taking, as usual, (0, 0) as the reference state, the Martin kernel for the Mallows urn process is thus

K((i, j), (k, ℓ)) = (1 − p)^{−i} if i ≤ k and j = 0,
K((i, j), (k, ℓ)) = (1 − p)^{−k} p^{−1} if i = k and 1 ≤ j ≤ ℓ,

and K((i, j), (k, ℓ)) = 0 otherwise.
It follows that if ((k_n, ℓ_n))_{n∈N_0} is a sequence for which k_n + ℓ_n → ∞ then, in order for the sequence (K((i, j), (k_n, ℓ_n)))_{n∈N_0} to converge, it must either be that k_n = k_∞ for some k_∞ for all n sufficiently large and ℓ_n → ∞, in which case the limit is (1 − p)^{−i} for j = 0 and i ≤ k_∞, is (1 − p)^{−k_∞} p^{−1} for i = k_∞ and j ≥ 1, and is 0 otherwise; or that k_n → ∞ with no restriction on ℓ_n, in which case the limit is (1 − p)^{−i} for j = 0 and 0 otherwise. Consequently, the Doob-Martin compactification of the state space N_0 × N_0 of the Mallows urn process is such that the Doob-Martin boundary ∂(N_0 × N_0) may be identified with N_0 ∪ {∞}. With this identification, the state space of the h-transformed process corresponding to the boundary point k ∈ N_0 is {(0, 0), (1, 0), …, (k, 0)} ∪ {(k, 1), (k, 2), …}, and these states are visited deterministically in that order.

We now turn to the Mallows tree process itself. Note that if s ⊆ t_n and #s(u0) = #t_n(u0), then {v ∈ s : u0 ≤ v} = {v ∈ t_n : u0 ≤ v}. Therefore, when s ⊆ t_n, M(s, t_n) counts the number of vertices of the form u0 such that the subtree below u0 in s is the same as the subtree below u0 in t_n and u1 ∈ s. Similarly, I(s, t_n) = 1 if and only if for all vertices of the form u0, the subtree below u0 in s is the same as the subtree below u0 in t_n whenever u1 ∈ s. Hence, if s ⊆ t_n, then K(s, t_n) can be written in terms of M(s, t_n) and I(s, t_n). Suppose that #t_n(0) → ∞. For any s such that 1 ∈ s, I(s, t_n) must be 0 for all n sufficiently large, because the subtree below 0 in s cannot equal the subtree below 0 in t_n for all n.
On the other hand, if 1 ∉ s, then K(s, t_n) = K(s, t'_n), where t'_n is the tree obtained from t_n by deleting all vertices v with 1 ≤ v. Consequently, if #t_n(0) → ∞, then in order to check whether K(s, t_n) converges for all s ∈ S, it suffices to replace t_n by t'_n and restrict consideration to s such that 1 ∉ s. Moreover, the limits of K(s, t_n) and K(s, t'_n) are the same, so the sequences (t_n)_{n∈N} and (t'_n)_{n∈N} correspond to the same point in the Doob-Martin compactification. Now suppose that #t_n(0) does not tend to infinity (so that #t_n(1) → ∞ must hold). It is clear that if K(s, t_n) converges for all s ∈ S with 1 ∉ s, then the sets {v ∈ t_n : 0 ≤ v} are equal for all n sufficiently large.
Let t'_m be the subtree of t_m obtained by deleting from t_m any vertex v such that u1 ≤ v for some u with #t_n(u0) → ∞ as n → ∞. Applying the above arguments recursively, a necessary and sufficient condition for the sequence (t_n)_{n∈N_0} to converge to a point in the Doob-Martin compactification is that whenever #t_n(u0) does not tend to infinity for some u, the sets {v ∈ t'_n : u0 ≤ v} are equal for all n sufficiently large. Moreover, the sequences (t_n)_{n∈N_0} and (t'_n)_{n∈N_0} converge to the same limit point.
Suppose that (t_n)_{n∈N_0}, and hence (t'_n)_{n∈N_0}, converges in the Doob-Martin compactification. Set t_∞ := ⋃_{m∈N_0} ⋂_{n≥m} t'_n.
We then have a formula for the limit of K(s, t_n) in terms of I(s, t_∞), where I(s, t_∞) is defined to be 1 or 0 depending on whether or not for all vertices of the form u0 with u1 ∈ s the subtree below u0 in s is the same as the subtree below u0 in t_∞.
Recall that we write |u| for the length of a word u ∈ {0,1}^*; that is, |u| = k when u = u_1 … u_k. Note that if t ∈ T, then the sequence (t_n)_{n∈N_0} in S defined by t_n := {u ∈ t : |u| ≤ n} converges in the Doob-Martin compactification of S and the tree t_∞ constructed from this sequence is just t.
Finally, observe that if we extend K(s, t) in this way to s ∈ S and t ∈ T (note that every tree in T has the property that any vertex with a "left" child having infinitely many descendants has no "right" child), then for any distinct t', t'' ∈ T there exists s ∈ S such that K(s, t') ≠ K(s, t''). The important elements of the above discussion are contained in the following result.
Theorem 7.3. Consider the Mallows tree chain with state space S consisting of the set of finite rooted binary trees. Let T be the set of infinite rooted binary trees t such that u1 ∈ t for some u ∈ {0,1}^* implies #t(u0) < ∞. Equip S ∪ T with the topology generated by the maps Π_n : S ∪ T → S, n ∈ N_0, defined by Π_n(t) := {u ∈ t : |u| ≤ n}, where on the right we equip the countable set S with the discrete topology. The Doob-Martin compactification of S is homeomorphic to S ∪ T, and this homeomorphism identifies the Doob-Martin boundary ∂S with T.
Remark 7.4. The limit in the Doob-Martin topology of the Mallows tree chain (X_n)_{n∈N_0} started from the trivial tree {∅} is just the T-valued random variable X_∞ := ⋃_{n∈N_0} X_n introduced in Proposition 7.2. Almost surely, the spine of X_∞ (that is, the unique infinite path from the root ∅) is equal to the rightmost path ∅ → 1 → 11 → 111 → ⋯ in the complete infinite binary tree.
Remark 7.5. It is straightforward to check that each of the harmonic functions K(·, t), t ∈ T is extremal. If we order the alphabet {0, 1} so that 0 comes before 1 and equip the set of words {0, 1} with the corresponding lexicographic order, then the state space of the h-transformed process corresponding to an infinite tree t ∈ T is the set of finite subtrees s of t such that if u ∈ s, then every predecessor of u in the lexicographic order also belongs to s. A realization of the h-transformed process started from ∅ is the deterministic path that adds the vertices of t one at a time in increasing lexicographic order.
Remark 7.6. As in the BST and DST cases, the Mallows tree process can be regarded as a Markov chain which moves from a tree t to a tree s of the form s = t ∪ {v}, where the new vertex v is an external vertex of t (see the discussion following (4.2)). This implies that the transition probabilities can be coded by a function p that maps pairs (t, v), with t ∈ S and v an external vertex of t, to the probability p(v|t) that the chain moves from t to t ∪ {v}.
In the BST case one of the |t| + 1 external vertices of t is chosen uniformly at random, that is, p(v|t) = 1/(|t| + 1), whereas we have p(v|t) = 2^{−|v|} in the DST case. For Mallows trees, we have the following stochastic mechanism. Let u be the vertex of t that is greatest in the lexicographic order. Denote by i_1 < ⋯ < i_ℓ the indices at which the corresponding entry of u is a 0 (we set ℓ = 0 if every entry of u is a 1). Write v_j, 1 ≤ j ≤ ℓ, for the external vertices of t that arise if the 0 in position i_j is changed to 1. Put v_{ℓ+1} := u1 and v_{ℓ+2} := u0. Then, we choose v_j with probability p_{i_j}, j = 1, …, ℓ, and v_{ℓ+1} and v_{ℓ+2} with probabilities rp and r(1 − p) respectively, where r := 1 − ∑_{j=1}^{ℓ} p_{i_j}.
Note that not all Markov chains of the vertex-adding type can be represented as trickle-down processes. Indeed, a distinguishing feature of the trickle-down chains within this larger class is the fact that the restriction of the function v → p(v|t) to the external vertices of the left subtree of t depends on t only via the number of vertices in the right subtree of t. Similar restrictions hold with left and right interchanged, and also for the subtrees of non-root vertices.
8. q-binomial chains

8.1. q-binomial urns. Fix parameters 0 < q < 1 and 0 < r < 1, and define a transition matrix Q for the state space N_0 × N_0 by Q((i, j), (i + 1, j)) = rq^j and Q((i, j), (i, j + 1)) = 1 − rq^j for (i, j) ∈ N_0 × N_0. We note that this 2-parameter family of processes is a special case of the 3-parameter family studied in [CS97], where it is shown to have a number of interesting connections with graph theory. In the next subsection, we use Markov chains with the transition matrix Q as the routing chains for a trickle-down process on I = {0,1}^* in the same way that we have used the Pólya and Mallows urn processes. Note that, by a simple Borel-Cantelli argument, almost surely any sample path of a Markov chain (Y_n)_{n∈N_0} = ((Y'_n, Y''_n))_{n∈N_0} with transition matrix Q is such that Y'_n is eventually constant and Y''_n → ∞. We want to compute the probability that the chain goes from (i, j) to (k, ℓ) for i ≤ k and j ≤ ℓ.
Observe that the probability the chain goes from (i, j) to (k, ℓ) via (k, j) is

(rq^j)^{k−i} ∏_{m=j}^{ℓ−1} (1 − rq^m).

Observe also that if S(i, j) is the probability the chain goes from (i, j) to (i + 1, j + 1) via (i + 1, j) and T(i, j) is the probability the chain goes from (i, j) to (i + 1, j + 1) via (i, j + 1), then T(i, j) = qS(i, j). It follows by repeated applications of this observation that the probability the chain goes from (i, j) to (k, ℓ) along some "north-east" lattice path σ is

r^{k−i} q^{j(k−i)} q^{A(σ)} ∏_{m=j}^{ℓ−1} (1 − rq^m),

where A(σ) is the area in the plane above the line segment [i, k] × {j} and below the curve obtained by a piecewise linear interpolation of σ. Hence, the probability that the chain hits (k, ℓ) starting from (i, j) is

r^{k−i} q^{j(k−i)} ∏_{m=j}^{ℓ−1} (1 − rq^m) ∑_σ q^{A(σ)},

where the sum is over all "north-east" lattice paths σ from (i, j) to (k, ℓ).
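The path-by-path computation can be checked exactly with rational arithmetic. In the sketch below (naming ours), the hitting probability is computed by direct recursion over the two possible moves and compared against our reconstruction of the closed form, in which the Gaussian binomial coefficient evaluates the path sum:

```python
from fractions import Fraction

q, r = Fraction(1, 2), Fraction(1, 3)

def hit_prob(i, j, k, l):
    """Probability that the q-binomial urn started at (i, j) visits (k, l),
    by recursing over the east move (prob r*q^j) and the north move."""
    if i > k or j > l:
        return Fraction(0)
    if (i, j) == (k, l):
        return Fraction(1)
    east = r * q ** j          # (i, j) -> (i+1, j)
    return east * hit_prob(i + 1, j, k, l) + (1 - east) * hit_prob(i, j + 1, k, l)

def gauss_binom(a, b):
    """Gaussian binomial coefficient [(a+b) choose a]_q via the product formula."""
    out = Fraction(1)
    for s in range(1, a + 1):
        out *= (1 - q ** (b + s)) / (1 - q ** s)
    return out

def closed_form(i, j, k, l):
    """Our reconstruction: r^(k-i) q^(j(k-i)) prod(1 - r q^m) [..]_q."""
    prod = Fraction(1)
    for m in range(j, l):
        prod *= 1 - r * q ** m
    return r ** (k - i) * q ** (j * (k - i)) * prod * gauss_binom(k - i, l - j)

for (i, j, k, l) in [(0, 0, 2, 3), (1, 1, 3, 2), (0, 2, 2, 2)]:
    assert hit_prob(i, j, k, l) == closed_form(i, j, k, l)
```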
As explained in [AAR99, Chapter 10], the evaluation of the sum is a consequence of the non-commutative q-binomial theorem of [Sch53] (see also [Pól69]), and yields

∑_σ q^{A(σ)} = [(k−i) + (ℓ−j) choose k−i]_q = ∏_{s=1}^{k−i} (1 − q^{ℓ−j+s})/(1 − q^s),

the Gaussian binomial coefficient.
Taking, as usual, (0, 0) as the reference state, the Martin kernel for the chain is thus

K((i, j), (k, ℓ)) = r^{−i} q^{j(k−i)} [∏_{m=0}^{j−1} (1 − rq^m)]^{−1} [(k−i)+(ℓ−j) choose k−i]_q / [(k+ℓ) choose k]_q

for i ≤ k and j ≤ ℓ (and 0 otherwise). The Doob-Martin compactification of a chain with transition matrix Q is identified in [GO09, Section 4], but for the sake of completeness we present the straightforward computations. If ((k_n, ℓ_n))_{n∈N_0} is a sequence such that k_n + ℓ_n → ∞, then, in order for K((i, j), (k_n, ℓ_n)) to converge, we must have either that k_n = k_∞ for some k_∞ for all n sufficiently large and ℓ_n → ∞, in which case the limit is

r^{−i} q^{j(k_∞−i)} [∏_{m=0}^{j−1} (1 − rq^m)]^{−1} ∏_{s=k_∞−i+1}^{k_∞} (1 − q^s)

for i ≤ k_∞ (and 0 otherwise), or that k_n → ∞ with no restriction on ℓ_n, in which case the limit is r^{−i} for j = 0 (and 0 otherwise). Consequently, the Doob-Martin compactification of the state space N_0 × N_0 is such that the boundary may be identified with N_0 ∪ {∞}. With this identification, the h-transformed process corresponding to the boundary point k ∈ N_0 has state space {0, …, k} × N_0, and transition probabilities Q_h((i, j), (i + 1, j)) = 1 − q^{k−i}, i < k, Q_h((i, j), (i, j + 1)) = q^{k−i}, i < k, and Q_h((k, j), (k, j + 1)) = 1.
8.2. q-binomial trees. Suppose that we apply the trickle-down construction with I = {0,1}^* and all of the routing chains given by the q-binomial urn of Subsection 8.1, in the same manner that the BST process and the Mallows tree process were built from the Pólya urn and the Mallows urn, respectively. Just as for the latter two processes, we may identify the state space S with the set of finite subtrees of {0,1}^* that contain the root ∅. We call the resulting tree-valued Markov chain the q-binomial tree process. Recalling Theorem 7.3 and comparing the conclusions of Subsection 8.1 with those of Subsection 7.2, the following result should come as no surprise. We leave the details to the reader.
Theorem 8.1. Consider the q-binomial tree chain with state space S consisting of the set of finite rooted binary trees. Let T be the set of infinite rooted binary trees t such that u1 ∈ t for some u ∈ {0,1}^* implies #t(u0) < ∞. Equip S ∪ T with the topology generated by the maps Π_n : S ∪ T → S, n ∈ N_0, defined by Π_n(t) := {u ∈ t : |u| ≤ n}, where on the right we equip the countable set S with the discrete topology. The Doob-Martin compactification of S is homeomorphic to S ∪ T, and this homeomorphism identifies the Doob-Martin boundary ∂S with T. Moreover, each boundary point is extremal.

9. Chains with perfect memory
Recall the Mallows urn model of Subsection 7.2 and the q-binomial urn model of Subsection 8.1. These Markov chains have the interesting feature that if we know the state of the chain at some time, then we know the whole path of the process up to that time. In this section we examine the Doob-Martin compactifications of such chains with a view towards re-deriving the results of Subsection 7.2 and Subsection 8.1 in a general context. We also analyze a trickle-down process resulting from a composition-valued Markov chain. We return to the notation of Section 3: X = (X n ) n∈N0 is a transient Markov chain with countable state space E, transition matrix P and reference state e ∈ E such that ρ(j) := P e {X hits j} > 0, for all j ∈ E. We suppose that the chain X has perfect memory, by which we mean that the sets E n := {j ∈ E : P e {X n = j} > 0}, n ∈ N 0 , are disjoint, and that there is a map f : E \ {e} → E with the property that P e {f (X n ) = X n−1 } = 1, for all n ∈ N. Note that this implies that the tail σ-field associated with the process X is the same as the σ-field σ({X n : n ∈ N 0 }) generated by the full collection of variables of the process.
Suppose that we construct a directed graph T that has E as its set of vertices and contains a directed edge (i, j) if and only if P(i, j) > 0. By the assumption on e, for any j ∈ E_n, n ∈ N, there is a directed path e = i_0 → ⋯ → i_n = j. Also, it follows from the perfect memory assumption that a directed edge (i, j) must have i ∈ E_n and j ∈ E_{n+1} for some n. Moreover, if (i, j) is such a directed edge, then there is no h ∈ E_n for which (h, j) is also a directed edge. Combining these observations, we see that the directed graph T is a rooted tree with root e. The function f is simply the map that assigns to any vertex j ∈ E \ {e} its parent. For j ∈ E_n, n ∈ N, the unique directed path from e to j is e = f^n(j) → f^{n−1}(j) → ⋯ → f(j) → j.
Suppose from now on that the tree T is locally finite; that is, for each i ∈ E, there are only finitely many j ∈ E with P (i, j) > 0.
As usual, we define a partial order ≤ on T (= E) by declaring that i ≤ j if i appears on the unique directed path from the root e to j.
We now recall the definition of the end compactification of T. This object can be defined in a manner reminiscent of the definition of the Doob-Martin compactification as follows. We map T injectively into the space R^T of real-valued functions on T via the map that takes j ∈ T to the indicator function of the set {i ∈ T : i ≤ j}. The closure of the image of T is a compact subset of R^T. We identify T with its image and write T̄ for the closure. The compact space T̄ is metrizable and a sequence (j_n)_{n∈N} from T converges in T̄ if and only if 1_{{i≤j_n}} converges for all i ∈ T, where 1_{{i≤·}} is the indicator function of the set {j ∈ T : i ≤ j}. The boundary ∂T := T̄ \ T can be identified with the set of infinite directed paths from the root e. We can extend the function 1_{{i≤·}} continuously to T̄. We can also extend the partial order ≤ to T̄ by declaring that, for ξ, ζ ∈ ∂T, ξ ≤ ζ if and only if ξ = ζ, and that i ≤ ξ for i ∈ T and ξ ∈ ∂T if and only if 1_{{i≤ξ}} = 1.
Theorem 9.1. Let X be a chain with state space E, reference state e, perfect memory, and locally finite associated tree T. Then, the associated Martin kernel is given by K(i, j) = ρ(i)^{−1} 1_{{i≤j}}. The Doob-Martin compactification of E is homeomorphic to the end compactification of T, and the extended Martin kernel is given by K(i, ξ) = ρ(i)^{−1} 1_{{i≤ξ}} for ξ in the boundary.

Proof. By definition, K(i, j) = P_i{X hits j} / P_e{X hits j}. By assumption, the numerator is 0 unless i ≤ j. If i ≤ j, then the denominator is P_e{X hits j} = P_e{X hits i} P_i{X hits j}, and the claimed formula for the Doob-Martin kernel follows. The remainder of the proof is immediate from the observation that the manner in which the end compactification is constructed from the functions 1_{{i≤·}}, i ∈ E, is identical to the manner in which the Doob-Martin compactification is constructed from the functions K(i, ·) = ρ(i)^{−1} 1_{{i≤·}}, i ∈ E.
Example 9.2. The Mallows urns process satisfies the conditions of Theorem 9.1. The tree T has $\mathbb{N}_0^2$ as its set of vertices, and directed edges of the form $((i,0),(i+1,0))$ and $((i,j),(i,j+1))$, $i, j \in \mathbb{N}_0$. The perfect memory property survives the lift from urn to tree. The "parenthood" function f takes a tree t in the state space of the Mallows tree process and simply removes the vertex of t that is greatest in the lexicographic order.
This description of the state space of the Mallows tree process as a "tree-of-trees" also makes its Doob-Martin compactification easier to understand. We know from Section 7.3 that points in the Doob-Martin boundary can be identified with rooted binary trees that have a single infinite path (the "spine") with nothing dangling off to the right of the spine. It is, of course, easy to construct a sequence of finite rooted binary trees that tries to grow more than one infinite path: for example, let $t_n$ be the tree that consists of the two vertices $00\ldots0, 11\ldots1 \in \{0,1\}^n$ and the vertices in $\{0,1\}^*$ on the directed paths connecting them to the root ∅. The sequence $(t_n)_{n\in\mathbb{N}}$ must have a subsequence with a limit point in the compact space $\bar{S}$ or, equivalently, it must have a subsequence that converges to a limit in the end compactification $\bar{T}$ of the tree T. From the above description of the parenthood function f, we see for a tree $s \in T$ that $s \le t_n$ if and only if one of the following three conditions holds: •

We note that the sequence $(t_n)_{n\in\mathbb{N}}$ of finite rooted binary trees converges even in the Doob-Martin compactification of the binary search tree process to a point in the boundary. Indeed (see the first paragraph of Section 5), we can identify this latter point with the probability measure on $\{0,1\}^\infty$ that puts mass $\frac{1}{2}$ at each of the points $00\ldots$ and $11\ldots$.

Example 9.3. A composition of an integer $n \in \mathbb{N}$ is an element $c = (c_1, \ldots, c_k)$ of $\mathbb{N}^k$ with the property that $\sum_{i=1}^k c_i = n$. We recall the standard proof of the fact that there are $2^{n-1}$ such compositions for a given n: one thinks of placing n balls on a string and defines a composition by placing separators into some of the $n-1$ gaps between the balls. A combinatorially equivalent bijection arises from deleting the last of these balls, labeling the balls to the left of each separator by 1 and labeling the remaining balls by 0.
We can now construct a Markov chain $(X_n)_{n\in\mathbb{N}}$ such that $X_n$ is uniformly distributed on the set of compositions of n and $X_n$ is a prefix of $X_{n+1}$ for all $n \in \mathbb{N}$: the state space is $E = \{0,1\}^*$, and the allowed transitions are of the form $(u_1, \ldots, u_{n-1}) \to (u_1, \ldots, u_{n-1}, 1)$ and $(u_1, \ldots, u_{n-1}) \to (u_1, \ldots, u_{n-1}, 0)$, both with probability 1/2. Here $X_1 = \emptyset$ represents the unique composition 1 = 1 of n = 1. Attaching the digit 1 to the state representing a composition of n means that the new composition, now of n+1, has an additional summand of size 1 at the end, whereas appending 0 corresponds to increasing the last summand of the old composition by 1. A construction of this type, which relates random compositions to samples from a geometric distribution, has been used in [HL01]; see also the references given there.
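The correspondence between 0-1 words of length n-1 and compositions of n can be checked by brute force. The sketch below (function name ours) decodes words by the transition rule just described and confirms that the map is a bijection onto the $2^{n-1}$ compositions of n:

```python
from itertools import product

def to_composition(word) -> tuple:
    """Decode a 0-1 word u_1 ... u_{n-1} into a composition of n.

    Following the transition rule in the text: a 1 starts a new summand
    of size 1, a 0 increases the last summand by 1.
    """
    comp = [1]  # the empty word represents the unique composition of 1
    for bit in word:
        if bit == 1:
            comp.append(1)
        else:
            comp[-1] += 1
    return tuple(comp)

n = 6
comps = {to_composition(w) for w in product((0, 1), repeat=n - 1)}
assert len(comps) == 2 ** (n - 1)       # the map is injective
assert all(sum(c) == n for c in comps)  # each image is a composition of n
print(len(comps))  # 32
```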
The chain $(X_n)_{n\in\mathbb{N}}$ certainly has the perfect memory property, and the associated tree T is just the complete rooted binary tree structure on $\{0,1\}^*$ from the Introduction. It follows from Theorem 9.1 that the Doob-Martin compactification is homeomorphic to $\{0,1\}^* \cup \{0,1\}^\infty$, the end compactification of $\{0,1\}^*$. Note that we can also think of the chain $(X_n)_{n\in\mathbb{N}}$ as the result of a trickle-down construction in which the underlying directed acyclic graph I is the complete rooted binary tree and the routing instruction chains are those of the single trail type described in Example 2.4. For processes of this type there are usually several possibilities for the underlying directed graph; here we may take $I = \mathbb{N}_0 \times \mathbb{N}_0$ instead of the complete rooted binary tree if we interpret appending 0 as a move to the right and appending 1 as a move up.
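The reinterpretation on $\mathbb{N}_0 \times \mathbb{N}_0$ amounts to reading each state $u \in \{0,1\}^*$ as a staircase path in the lattice. A small sketch of this word-to-trail translation (our naming):

```python
def lattice_path(word: str):
    """The trail in N_0 x N_0 traced out by a state u in {0,1}^*.

    Appending 0 (extend the last summand) is a step to the right;
    appending 1 (start a new summand) is a step up.
    """
    x = y = 0
    path = [(x, y)]
    for bit in word:
        if bit == "0":
            x += 1
        else:
            y += 1
        path.append((x, y))
    return path

print(lattice_path("01"))  # [(0, 0), (1, 0), (1, 1)]
```

Since the trail records every step, distinct words give distinct trails, so no information is lost in passing to the lattice picture.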
Remark 9.4. For several of the chains $(X_n)_{n\in\mathbb{N}_0}$ that we have considered in the previous sections there is a "background chain" $(\hat{X}_n)_{n\in\mathbb{N}_0}$ with the perfect memory property, in the sense that there is a function $\Psi : \hat{S} \to S$ with $X_n = \Psi(\hat{X}_n)$ for all $n \in \mathbb{N}_0$, where S and $\hat{S}$ are the respective state spaces. For example, random recursive trees are often considered together with their labels and are then of the perfect memory type; see Figure 7. Conversely, we can always extend the state space S of a given chain by including the previous states, taking the new state space $\hat{S}$ to be the set of words from the alphabet S, to obtain a background chain of the perfect memory type. For example, the Pólya urn then leads to a single trail chain in the sense of Example 2.4, with underlying directed graph $\mathbb{N} \times \mathbb{N}$ and transitions $Q((i,j),(i+1,j)) = i/(i+j)$ and $Q((i,j),(i,j+1)) = j/(i+j)$.

Another approach to tail σ-fields
As mentioned in the Introduction, our initial motivation for studying the Doob-Martin compactifications of various trickle-down chains was to understand the chains' tail σ-fields. Determining the compactification requires a certain amount of knowledge about the hitting probabilities of a chain, and this information may not always be easy to come by. In this section we consider a family of trickle-down chains for which it is possible to describe the tail σ-field directly, without recourse to the more extensive information provided by the Doob-Martin compactification. The class of processes to which this approach applies includes the Mallows tree and q-binomial tree processes that we have already analyzed, as well as the Catalan tree process of Section 11 below that we are unable to treat with Doob-Martin compactification methods.
We begin with a lemma that complements a result from [vW83] on exchanging the order of taking suprema and intersections of σ-fields.

Lemma 10.1. Then, the two sub-σ-fields $\bigvee_{m\in\mathbb{N}_0} \bigcap_{n\in\mathbb{N}_0} \mathcal{G}_{m,n}$ and $\bigcap_{n\in\mathbb{N}_0} \bigvee_{m\in\mathbb{N}_0} \mathcal{G}_{m,n}$ are equal up to null sets.
Proof. We first establish that $\bigvee_{m\in\mathbb{N}_0} \bigcap_{n\in\mathbb{N}_0} \mathcal{G}_{m,n} \subseteq \bigcap_{n\in\mathbb{N}_0} \bigvee_{m\in\mathbb{N}_0} \mathcal{G}_{m,n}$. We now verify that $\bigvee_{m\in\mathbb{N}_0} \bigcap_{n\in\mathbb{N}_0} \mathcal{G}_{m,n} \supseteq \bigcap_{n\in\mathbb{N}_0} \bigvee_{m\in\mathbb{N}_0} \mathcal{G}_{m,n}$ up to null sets. For this it suffices to show that any bounded random variable Z that is measurable with respect to $\bigcap_{n\in\mathbb{N}_0} \bigvee_{m\in\mathbb{N}_0} \mathcal{G}_{m,n}$ satisfies the equality $E\bigl[Z \mid \bigvee_{m\in\mathbb{N}_0} \bigcap_{n\in\mathbb{N}_0} \mathcal{G}_{m,n}\bigr] = Z$ a.s.
By a monotone class argument, we may further suppose that Z is measurable with respect to $\bigvee_{m=0}^{M} \bigvee_{n\in\mathbb{N}_0} \mathcal{G}_{m,n} = \bigvee_{n\in\mathbb{N}_0} \mathcal{G}_{M,n}$ for some $M \in \mathbb{N}_0$. Our assumptions guarantee that, for all $n \in \mathbb{N}_0$ and $m > M$,
$$\mathcal{G}_{m,n} \subseteq \mathcal{G}_{M,n} \vee \mathcal{H}_{M+1} \vee \cdots \vee \mathcal{H}_m \quad\text{and}\quad \mathcal{G}_{M,n} \subseteq \mathcal{H}_0 \vee \cdots \vee \mathcal{H}_M.$$
From these inclusions, the backwards and forwards martingale convergence theorems, and the assumed independence of the $\mathcal{H}_j$, $j = 0, 1, \ldots$, we see that the required equality holds.

By the assumptions of the trickle-down construction, $((Y^u_n)_v)_{n\in\mathbb{N}_0}$ is nondecreasing $Q_{u,\xi}$-almost surely for every $u \in I$, $v \in \beta(u)$ and $\xi \in S_u$. Therefore, $(Y^u_\infty)_v := \lim_{n\to\infty} (Y^u_n)_v$ exists $Q_{u,\xi}$-almost surely in the usual one-point compactification $\mathbb{N}_0 \cup \{\infty\}$ of $\mathbb{N}_0$.
Recall for the Mallows tree and q-binomial tree processes that $I = \{0,1\}^*$ and that the routing chains in both cases all had the property that $(Y^u_\infty)_{u0} < \infty$ and $(Y^u_\infty)_{u1} = \infty$, $Q_{u,\xi}$-almost surely. We see from the following result that it is straightforward to identify the tail σ-field of a trickle-down process if all of its routing chains exhibit this kind of behavior. Another example is the Catalan tree process defined in Section 11 below; see Proposition 11.1.
Proposition 10.2. Suppose that β(u) is finite for all $u \in I$. Fix $x \in S$. Suppose that
$$\#\{v \in \beta(u) : (Y^u_\infty)_v = \infty\} = 1, \qquad Q_{u,x^u}\text{-a.s.},$$
for all $u \in I$. Then, the tail σ-field $\bigcap_{m\in\mathbb{N}_0} \sigma\{X_n : n \ge m\}$ is generated by $X_\infty := (X^u_\infty)_{u\in I}$ up to $P_x$-null sets.

Proof. By the standing hypotheses on I and the assumption that β(u) is finite for all $u \in I$, we can list I as $(u_p)_{p\in\mathbb{N}_0}$ in such a way that $u_p \le u_q$ implies $p \le q$ (that is, we can put a total order on I that refines the partial order ≤ in such a way that the resulting totally ordered set has the same order type as $\mathbb{N}_0$). For each $p \in \mathbb{N}_0$, put $J_p := \{u_0, \ldots, u_p\}$. By Remark 2.6, each process $((X^u_n)_{u\in J_p})_{n\in\mathbb{N}_0}$ is a Markov chain. Now,
$$\bigcap_{m\in\mathbb{N}_0} \sigma\{X_n : n \ge m\} = \bigcap_{m\in\mathbb{N}_0} \bigvee_{p\in\mathbb{N}_0} \sigma\{X^u_n : u \in J_p,\ n \ge m\}.$$
Thus, by Lemma 10.1,
$$\bigcap_{m\in\mathbb{N}_0} \sigma\{X_n : n \ge m\} = \bigvee_{p\in\mathbb{N}_0} \bigcap_{m\in\mathbb{N}_0} \sigma\{X^u_n : u \in J_p,\ n \ge m\}$$
up to $P_x$-null sets. To establish the claimed assertion, it thus suffices to check that, for all $p \in \mathbb{N}_0$,
$$\bigcap_{m\in\mathbb{N}_0} \sigma\{X^u_n : u \in J_p,\ n \ge m\} = \sigma\{X^u_\infty : u \in J_p\}.$$
We establish this via induction as follows.
For brevity we suppose that $x^u = (0, 0, \ldots)$ for all $u \in I$. In this way we avoid the straightforward but somewhat tedious notational complications of the general case.
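The listing $(u_p)_{p\in\mathbb{N}_0}$ used at the start of the proof can be made concrete when $I = \{0,1\}^*$: length-then-lexicographic order is a linear extension of the prefix partial order with order type $\mathbb{N}_0$, since every level is finite. A quick sketch (our function names):

```python
from itertools import product

def length_lex(max_len: int):
    """List the 0-1 words of length <= max_len in length-lex order.

    This refines the prefix partial order: if u is a prefix of v,
    then u appears no later than v in the listing.
    """
    words = [""]
    for n in range(1, max_len + 1):
        words.extend("".join(w) for w in product("01", repeat=n))
    return words

listing = length_lex(4)
index = {w: p for p, w in enumerate(listing)}
assert all(
    index[v[:k]] <= index[v]  # every prefix of v appears no later than v
    for v in listing
    for k in range(len(v) + 1)
)
print(listing[:5])  # ['', '0', '1', '00', '01']
```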
Remark 10.3. When I is a tree and we are in the situation of Proposition 10.2, $X_\infty$ may be thought of as an infinite rooted subtree of I with a single infinite directed path from the root. Regarding $(X_n)_{n\in\mathbb{N}_0}$ as a tree-valued process, we have $X_\infty = \bigcup_{n\in\mathbb{N}_0} X_n$. Equivalently, $X_\infty$ is the limit of the finite subsets $X_n$ of I if we identify the subsets of I with points of the Cartesian product $\{0,1\}^I$ in the usual way and equip the latter space with the product topology.

The Catalan tree process
Let $S_n$ denote the set of subtrees of the complete rooted binary tree $\{0,1\}^*$ that contain the root ∅ and have n vertices. The set $S_n$ has cardinality $C_n$, where
$$C_n := \frac{1}{n+1}\binom{2n}{n}$$
is the n-th Catalan number. A special case of a construction in [LW04] gives a Markov chain $(X_n)_{n\in\mathbb{N}_0}$ with state space the set of finite rooted subtrees of $\{0,1\}^*$ such that
$$(11.1)\qquad P_{\{\emptyset\}}\{X_n = t\} = C_{n+1}^{-1}, \qquad t \in S_{n+1};$$
that is, if the chain begins in the trivial tree {∅}, then its value at time n is uniformly distributed on $S_{n+1}$. Moreover, the construction in [LW04] is an instance of the trickle-down construction in which $I = \{0,1\}^*$ and all of the routing chains have the same dynamics.
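The count $\#S_n = C_n$ can be verified by brute-force enumeration: a rooted subtree with n vertices is the root together with a left subtree of some size k and a right subtree of size n-1-k. A sketch (our function name):

```python
from math import comb

def enumerate_trees(n: int, root: str = ""):
    """Yield each rooted subtree of {0,1}^* (below `root`) with n vertices,
    encoded as a frozenset of 0-1 words."""
    if n == 0:
        yield frozenset()
        return
    for k in range(n):  # k vertices go left, n-1-k go right
        for left in enumerate_trees(k, root + "0"):
            for right in enumerate_trees(n - 1 - k, root + "1"):
                yield frozenset({root}) | left | right

for n in range(1, 8):
    assert len(set(enumerate_trees(n))) == comb(2 * n, n) // (n + 1)
print(len(set(enumerate_trees(5))))  # 42, the 5th Catalan number
```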
Combining these observations, we can calculate the entries of the transition matrix Q iteratively and, as observed in [LW04], the entries of Q are non-negative and the rows of Q sum to one. We refer to the resulting Markov chain as the Catalan urn process. Note that if the random tree T is uniformly distributed on $S_{n+1}$, then, conditional on the event $\{\#T(0) = k,\ \#T(1) = n-k\}$, the random trees $\{u \in \{0,1\}^* : 0u \in T\}$ and $\{u \in \{0,1\}^* : 1u \in T\}$ are independent and uniformly distributed on $S_k$ and $S_{n-k}$, respectively. Thus, a trickle-down construction with each routing chain given by the Catalan urn process does indeed give a tree-valued chain satisfying (11.1).
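Underlying the consistency of this conditioning is the Catalan recurrence: for T uniform on $S_{n+1}$, the event $\{\#T(0)=k, \#T(1)=n-k\}$ has probability $C_k C_{n-k}/C_{n+1}$, and these probabilities sum to one because $C_{n+1} = \sum_{k=0}^{n} C_k C_{n-k}$. A quick numerical check (the explicit entries of Q are not reproduced here):

```python
from math import comb

def catalan(n: int) -> int:
    """The n-th Catalan number C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

# The split probabilities C_k C_{n-k} / C_{n+1} sum to one by the
# Catalan recurrence C_{n+1} = sum_{k=0}^{n} C_k C_{n-k}.
for n in range(10):
    assert sum(catalan(k) * catalan(n - k) for k in range(n + 1)) == catalan(n + 1)
print(catalan(6))  # 132
```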
Proposition 11.1. The tail σ-field of the Catalan tree process $(X_n)_{n\in\mathbb{N}_0}$ is generated up to null sets by the infinite random tree $X_\infty := \bigcup_{n\in\mathbb{N}_0} X_n$ under $P_{\{\emptyset\}}$.
As we noted in Remark 10.3, the tree $X_\infty$ has a single infinite path from the root ∅. Denote this path by $\emptyset = U_0 \to U_1 \to \cdots$. For $n \in \mathbb{N}$, define $W_n \in \{0,1\}$ by $U_n = W_1 \cdots W_n$. It is apparent from the trickle-down construction and the discussion above that the sequence $(W_n)_{n\in\mathbb{N}}$ is i.i.d. with $P\{W_n = 0\} = P\{W_n = 1\} = \frac{1}{2}$. Moreover, if we set $\bar{W}_n = 1 - W_n$ and put $T_n := \{u \in \{0,1\}^* : W_1 \cdots W_{n-1}\bar{W}_n u \in X_\infty\}$, so that $T_n$ is either empty or a subtree of $\{0,1\}^*$ rooted at ∅, then the sequence $(T_n)_{n\in\mathbb{N}}$ is i.i.d. and independent of $(W_n)_{n\in\mathbb{N}}$ with
$$P\{\#T_n = k\} = 2 \times 4^{-(k+1)} C_k, \qquad k \in \mathbb{N}_0,$$
and
$$P\{T_n = t \mid \#T_n = k\} = \frac{1}{C_k}, \qquad t \in S_k,\ k \in \mathbb{N}.$$
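One can check numerically that the size distribution $P\{\#T_n = k\} = 2 \times 4^{-(k+1)} C_k$ is indeed a probability distribution; since $C_k 4^{-k} \sim k^{-3/2}/\sqrt{\pi}$, the partial sums approach 1 with a tail of order $K^{-1/2}$. A sketch (our function name), tracking the ratio $C_k/4^k$ in floating point:

```python
def partial_sum(K: int) -> float:
    """Sum of 2 * 4^{-(k+1)} * C_k over k = 0, ..., K."""
    total = 0.0
    r = 1.0  # r = C_k / 4^k, starting from C_0 = 1
    for k in range(K + 1):
        total += r / 2.0  # 2 * 4^{-(k+1)} * C_k = (C_k / 4^k) / 2
        r *= (2 * k + 1) / (2 * (k + 2))  # C_{k+1}/4^{k+1} from C_k/4^k
    return total

print(partial_sum(100))    # already close to 1, tail ~ 1/(sqrt(pi * 100))
print(partial_sum(20000))  # closer still
```

The slow $K^{-1/2}$ approach to 1 reflects the heavy tail of the subtree-size distribution: $\#T_n$ has infinite mean.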
Note that if $(S_n)_{n\in\mathbb{N}_0}$ is any sequence of random subtrees of $\{0,1\}^*$ such that $S_n$ is uniformly distributed on $S_{n+1}$ for all $n \in \mathbb{N}_0$, then $S_n$ converges in distribution to a random tree that has the same distribution as $X_\infty$, where the notion of convergence in distribution is the one that comes from thinking of subtrees of $\{0,1\}^*$ as elements of the Cartesian product $\{0,1\}^{\{0,1\}^*}$ equipped with the product topology; see Remark 10.3. The convergence in distribution of such a sequence $(S_n)_{n\in\mathbb{N}_0}$ and the above description of the limit distribution have already been obtained in [Jan02] using different methods. For a similar weak convergence result for uniform random trees, see [Gri81] and the survey [AS04, Section 2.5]. Also, if we define rooted finite d-ary trees for d > 2 as suitable subsets of $\{0, 1, \ldots, d-1\}^*$ in a manner analogous to the way we have defined rooted finite binary trees, then it is shown in [LW04] that it is possible to construct a Markov chain that grows by one vertex at each step and is uniformly distributed on the set of d-ary trees with n vertices at step n; in particular, there is an almost sure (and hence distributional) limit as $n \to \infty$ in the same sense as we just observed for the uniform binary trees. We have not investigated whether this process is the result of a trickle-down construction. Lastly, we note that there are interesting ensembles of trees that cannot be embedded into a trickle-down construction or, indeed, into any Markovian construction in which a single vertex is added at each step; for example, it is shown in [Jan06] that this is not possible for the ensemble obtained by taking a certain critical Galton-Watson tree with offspring distribution supported on {0, 1, 2} and conditioning the total number of vertices to be $n \in \mathbb{N}$.