Radix sort trees in the large

The trie-based radix sort algorithm stores pairwise different infinite binary strings in the leaves of a binary tree in such a way that the Ulam--Harris coding of each leaf equals a prefix (that is, an initial segment) of the corresponding string, with the prefixes being of minimal length so that they are pairwise different. We investigate the {\em radix sort tree chains} -- the tree-valued Markov chains that arise when successively storing infinite binary strings $Z_1,\ldots, Z_n$, $n=1,2,\ldots$ according to the trie-based radix sort algorithm, where the source strings $Z_1, Z_2,\ldots$ are independent and identically distributed. We establish a bijective correspondence between the full Doob--Martin boundary of the radix sort tree chain with a {\em symmetric Bernoulli source} (that is, each $Z_k$ is a fair coin-tossing sequence) and the family of radix sort tree chains for which the common distribution of the $Z_k$ is a diffuse probability measure on $\{0,1\}^\infty$. In essence, our result characterizes all the ways that it is possible to condition such a chain of radix sort trees consistently on its behavior ``in the large''.


Introduction
Various sorting algorithms proceed by storing the data in the leaves of a tree. If the data are infinite binary strings $z_1, \ldots, z_n \in \{0,1\}^\infty$, then a natural choice for the tree is the rooted binary tree with $n$ leaves chosen such that the Ulam--Harris coding of each of the leaves coincides with a finite initial segment (otherwise called a prefix or left factor) of one of the $z_j$, and such that these initial segments are pairwise different and have minimal length (see below for a fuller description). This data structure is the basis of the Radix Sort algorithm. The tree $R(z_1, \ldots, z_n)$ in whose leaves the $n$ strings are stored is sometimes called a {\em trie}, alluding to the word retrieval.
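The minimal distinguishing prefixes can be computed directly from this description. The following sketch (illustrative names such as `radix_sort_tree`; not part of the paper) represents each infinite string by a finite truncation long enough that the strings are pairwise distinguishable, and returns the vertex and leaf sets of $R(z_1, \ldots, z_n)$ as sets of $0/1$ words, with the empty word as the root.

```python
def radix_sort_tree(strings):
    """Return (vertices, leaves) of R(z_1, ..., z_n) as sets of 0/1 words.

    Each element of `strings` is a finite 0/1 word standing in for an
    infinite string; the words must be long enough to be pairwise
    distinguishable.  The leaves are the minimal pairwise-different
    prefixes; the tree is the smallest rooted binary tree containing
    them, i.e. the set of all of their prefixes (the empty word "" is
    the root).
    """
    leaves = set()
    for i, z in enumerate(strings):
        others = strings[:i] + strings[i + 1:]
        for length in range(1, len(z) + 1):
            p = z[:length]
            if not any(o.startswith(p) for o in others):
                leaves.add(p)   # shortest prefix of z shared by no other string
                break
    vertices = {leaf[:k] for leaf in leaves for k in range(len(leaf) + 1)}
    return vertices, leaves
```

For instance, for truncations of the three strings $0000\ldots$, $0010\ldots$, $0100\ldots$ the leaves come out as $\{000, 001, 01\}$: the first two strings must be followed to depth $3$ before they separate, while the third is already distinguished by its first two bits.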
When the $n$ strings are random, drawn i.i.d.\ from a diffuse probability distribution $\nu$ on $\{0,1\}^\infty$, this construction gives rise to a random tree ${}^\nu R_n := R(Z_1, \ldots, Z_n)$. In order to obtain a probabilistic analysis of the Radix Sort algorithm, asymptotic properties of these random trees as $n \to \infty$ have been considered for the symmetric Bernoulli or unbiased memoryless source model, where $\nu$ is the fair coin-tossing measure, e.g.\ in [Mah92] ch.~5 and [Knu98] \S 5.2.2, and for more general inputs of random strings in [Szp01]. The density model, where $\nu$ is the image under the binary expansion of an absolutely continuous probability measure on $[0,1]$, was considered in [Dev92]. Dynamical sources appear in [CFV01]; these include Markovian inputs, where $\nu$ is the shift-invariant distribution of a Markov chain, see [SJ91], [LNS15].
In this paper we analyze the tree-valued Markov chains $({}^\nu R_n)_{n\in\mathbb{N}}$ from a more synoptic point of view. We show that any such chain is a harmonic transform of the Markov chain $({}^\gamma R_n)_{n\in\mathbb{N}}$, with $\gamma$ the fair coin-tossing measure, and we prove that the family of chains $({}^\nu R_n)_{n\in\mathbb{N}}$, as $\nu$ varies over the diffuse probability measures, constitutes the full Doob--Martin boundary of $({}^\gamma R_n)_{n\in\mathbb{N}}$. Loosely speaking, this means that all consistent ways of conditioning a chain of radix sort trees ``in the large'' are described by precisely this family.

In order to state our main result more formally, we first fix some notation. Denote by $\{0,1\}^\star := \bigsqcup_{k=0}^\infty \{0,1\}^k$ the set of finite tuples or words drawn from the alphabet $\{0,1\}$ (with the empty word $\emptyset$ allowed) -- the symbol $\bigsqcup$ emphasizes that this is a disjoint union. Write an $\ell$-tuple $v = (v_1, \ldots, v_\ell) \in \{0,1\}^\star$ more simply as $v_1 \ldots v_\ell$ and set $|v| = \ell$. Define a directed graph with vertex set $\{0,1\}^\star$ by declaring that if $u = u_1 \ldots u_k$ and $v = v_1 \ldots v_\ell$ are two words, then $(u,v)$ is a directed edge (that is, $u \to v$) if and only if $\ell = k+1$ and $u_i = v_i$ for $i = 1, \ldots, k$. Call this directed graph the complete rooted binary tree. Say that $u < v$ for two words $u$ and $v$ if $u$ is a proper initial segment of $v$; this partial order extends in the obvious way to pairs consisting of a word and an infinite string (note that no two elements of $\{0,1\}^\infty$ are comparable). It will be convenient to introduce the notation $\tau(y) := \{z \in \{0,1\}^\infty : y < z\}$ for $y \in \{0,1\}^\star$.
A finite rooted binary tree is a non-empty subset $t$ of $\{0,1\}^\star$ with the property that if $v \in t$ and $u \in \{0,1\}^\star$ is such that $u \to v$, then $u \in t$. The vertex $\emptyset$ (that is, the empty word) belongs to any such tree $t$ and is the root of $t$. The leaves of $t$ are the elements $v \in t$ such that if $v \to w$, then $w \notin t$, and we use the notation $L(t)$ for the leaves of $t$. A finite rooted binary tree is uniquely determined by its leaves: it is the smallest rooted binary tree that contains the set of leaves and it consists of the leaves and the points $u \in \{0,1\}^\star$ such that $u < v$ for some leaf $v$. In general, write $T(y_1, \ldots, y_m)$ for the smallest finite rooted binary tree containing $y_1, \ldots, y_m \in \{0,1\}^\star$; the leaves of this tree form a subset of $\{y_1, \ldots, y_m\}$ and this subset is proper if and only if $y_i < y_j$ for some pair $1 \leq i \neq j \leq m$.
Let $Z_1, Z_2, \ldots$ be i.i.d.\ $\{0,1\}^\infty$-valued random variables with common distribution some diffuse probability measure $\nu$. Then $Z_1, Z_2, \ldots$ are a.s.\ pairwise distinct, and on this event we set ${}^\nu R_n := R(Z_1, \ldots, Z_n)$. When $\nu$ is the fair coin-tossing measure $\gamma$ (that is, $\gamma$ is the infinite product of the uniform measure on $\{0,1\}$), we drop the $\nu$ and simply write $R_n$ for ${}^\gamma R_n$. It is not hard to see that $({}^\nu R_n)_{n\in\mathbb{N}}$ is a Markov chain; we call it a {\em radix sort tree chain}.
Note for $y \in \{0,1\}^\star$ and $n \geq k \geq 2$ that, with probability one, ${}^\nu R_n$ has at least $k$ leaves $v$ with $y \leq v$ if and only if at least $k$ of the strings $Z_1, \ldots, Z_n$ lie in $\tau(y)$. By the strong law of large numbers, $\nu$ can therefore be recovered almost surely from the tail $\sigma$-field of $({}^\nu R_n)_{n\in\mathbb{N}}$. It follows from this observation and the Hewitt--Savage zero-one law that the tail $\sigma$-field of $({}^\nu R_n)_{n\in\mathbb{N}}$ is $\mathbb{P}$-a.s.\ trivial.
In order to describe our results, we need some notions and facts from Doob--Martin boundary theory. A quick summary, tailored to the sort of setting we are in of a process which ``goes off to infinity'' and never revisits states, may be found in [EGW12, EGW15], where there are also references to expositions of the general theory for arbitrary transient Markov chains following on from the seminal paper [Doo59]. Analyses of binary-search-tree and digital-search-tree chains from the Doob--Martin point of view are presented in [EGW12].
Let $S_n$ be the set of trees that can arise as $R(z_1, \ldots, z_n)$ for some choice of $z_1, \ldots, z_n$ and set $S := \bigsqcup_{n\in\mathbb{N}} S_n$. Of course, $S_1 = \{\{\emptyset\}\}$. For $n \geq 2$, a finite rooted binary tree $t$ with $n$ leaves belongs to $S_n$ if and only if whenever $u_1 u_2 \ldots u_{m-1} u_m \in L(t)$, then $u_1 u_2 \ldots u_{m-1} \bar u_m \in t$, where $\bar 0 := 1$ and $\bar 1 := 0$.
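The leaf-sibling criterion above is easy to check mechanically. A small sketch (helper names invented, not from the paper): a tree is represented by its leaf set, its vertex set is the set of all prefixes of leaves, and membership in $S_n$ amounts to every leaf's sibling word being a vertex.

```python
def flip(bit):
    """The bar operation: 0 -> 1 and 1 -> 0."""
    return "1" if bit == "0" else "0"

def is_radix_sort_tree(leaves):
    """Check whether the tree spanned by `leaves` (0/1 words) lies in S_n."""
    if leaves == {""}:            # the trivial tree, the single element of S_1
        return True
    vertices = {leaf[:k] for leaf in leaves for k in range(len(leaf) + 1)}
    # every leaf u_1...u_m must have its sibling u_1...u_{m-1} u_m-bar in t
    return all(leaf and leaf[:-1] + flip(leaf[-1]) in vertices
               for leaf in leaves)
```

For example, the leaf set $\{000, 001, 01\}$ passes (the sibling of $01$ is the internal vertex $00$), while $\{00, 1\}$ fails, since the sibling word $01$ of the leaf $00$ is not a vertex; indeed, two strings with prefixes $00$ and $1$ would already be distinguished by their first bits.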
Given a binary tree $t \in S$ with $M(t)$ leaves, consider the {\em bridge} process obtained by conditioning $R_1, \ldots, R_{M(t)}$ on the event $\{R_{M(t)} = t\}$. This Markov chain has the same backward transition probabilities as $(R_n)_{n\in\mathbb{N}}$. Call a Markov chain $(R^\infty_n)_{n\in\mathbb{N}}$ with $R^\infty_n \in S_n$ for all $n$ an {\em infinite bridge} if it also has the same backward transition probabilities as $(R_n)_{n\in\mathbb{N}}$. We show in Section 5 that each chain $({}^\nu R_n)_{n\in\mathbb{N}}$ is an infinite bridge arising as a Doob $h$-transform of $(R_n)_{n\in\mathbb{N}}$, where the nonnegative function $h$ is given up to a constant multiple by the function ${}^\nu h$ of (4.2). Conversely, any Markov chain with initial state the trivial tree $\{\emptyset\}$ and transition probabilities that arise from those of $(R_n)_{n\in\mathbb{N}}$ through the $h$-transform construction for some nonnegative harmonic function $h$ (normalized, without loss of generality, so that $h(\{\emptyset\}) = 1$) is an infinite bridge.
The distribution of an infinite bridge is a mixture of distributions of infinite bridges with almost surely trivial tail $\sigma$-fields. Equivalently, the collection of nonnegative harmonic functions $h$ with $h(\{\emptyset\}) = 1$ is a compact convex set (for the product topology on $\mathbb{R}^S_+$) and any such function is a unique convex combination of the extreme points of this set. In particular, there is a bijective correspondence between the extreme points of these two sets; that is, between the set of infinite bridges with trivial tail $\sigma$-fields and the set of extremal normalized nonnegative harmonic functions.
One way to construct infinite bridges is to look for sequences $(t_k)_{k\in\mathbb{N}}$ in $S$ with $M(t_k) \to \infty$ such that the corresponding bridges converge in distribution as $k \to \infty$; the limit is then an infinite bridge. A necessary condition for an infinite bridge to have an almost surely trivial tail $\sigma$-field is that it arises from such a construction.
The nonnegative harmonic function corresponding to an infinite bridge constructed in this way (normalized to have $h(\{\emptyset\}) = 1$) is $s \mapsto \lim_{k\to\infty} K(s, t_k)$, where $K$ is the Doob--Martin kernel. A necessary condition for a normalized nonnegative harmonic function to be an extreme point is that it arises as such a limit. The following is our main result characterizing all the ways that it is possible to condition the radix sort tree chain with inputs distributed according to fair coin-tossing measure. We prove this result in Section 7.

Theorem 1.1. An infinite bridge for the radix sort tree chain with inputs distributed according to fair coin-tossing measure on $\{0,1\}^\infty$ has an almost surely trivial tail $\sigma$-field if and only if it is a Markov chain with the same distribution as the radix sort tree chain with inputs distributed according to some diffuse probability measure on $\{0,1\}^\infty$. Consequently, the distribution of an infinite bridge for the radix sort tree chain with inputs distributed according to fair coin-tossing measure is a unique mixture of distributions of radix sort tree chains with inputs distributed according to diffuse probability measures on $\{0,1\}^\infty$. Moreover, an infinite bridge $(R^\infty_n)_{n\in\mathbb{N}}$ has an almost surely trivial tail $\sigma$-field if and only if there is a sequence $(t_k)_{k\in\mathbb{N}}$ with $M(t_k) \to \infty$ as $k \to \infty$ such that $(R^\infty_n)_{n\in\mathbb{N}}$ is the limit in distribution of the corresponding bridges.

The structure of the remainder of the paper is as follows. In Sections 2, 3, and 4 we obtain the forward transition probabilities, backward transition probabilities, and Doob--Martin kernels of the radix sort tree chains. In Section 5 we show that each radix sort tree chain $({}^\nu R_n)_{n\in\mathbb{N}}$ is an infinite bridge. We consider infinite bridges for the Markov chain $(R_n)_{n\in\mathbb{N}}$ in Section 6 and introduce an auxiliary consistent labeling of the leaves of the state of the bridge at each time $n$ by $[n] := \{1, \ldots, n\}$ such that, intuitively, these labelings determine a labeling of the limit of the bridge at time $\infty$ and the whole bridge path can be recovered from the limit and its labeling. We prove two results, Theorem 7.1 and Corollary 7.2, in Section 7 that together establish Theorem 1.1.

Forward transition probabilities
Recall that $S_n$ is the set of trees that can arise as $R(z_1, \ldots, z_n)$ for some choice of distinct $z_1, \ldots, z_n \in \{0,1\}^\infty$. It is clear that $R(z_1, \ldots, z_n)$ is the unique finite rooted binary tree $t \in S_n$ with the following property: if $L(t) = \{y_1, \ldots, y_n\}$, then there is a permutation $\sigma$ of $[n]$ such that $y_{\sigma(j)} < z_j$ for $1 \leq j \leq n$.

For $n \in \mathbb{N}$, the distribution of ${}^\nu R_n$ is specified by $\mathbb{P}\{{}^\nu R_1 = \{\emptyset\}\} = 1$ and, for $n \geq 2$ and $t \in S_n$ with $\{y_1, \ldots, y_n\} = L(t)$,
$$\mathbb{P}\{{}^\nu R_n = t\} = n! \prod_{j=1}^n \nu(\tau(y_j)).$$
In particular,
$$\mathbb{P}\{R_n = t\} = n! \prod_{j=1}^n 2^{-|y_j|}.$$
The radix sort chain $({}^\nu R_n)_{n\in\mathbb{N}}$ has the following forward transition dynamics. Consider $s \in S_n$. There are two classes of trees $t \in S_{n+1}$ such that $\mathbb{P}\{{}^\nu R_{n+1} = t \mid {}^\nu R_n = s\} > 0$.

Case I. Here $t \in S_{n+1}$ is a tree with $L(t) = L(s) \sqcup \{w\}$, where $w = x \bar u_m$ for some $x = u_1 u_2 \ldots u_{m-1}$ with $x u_m \in s \setminus L(s)$. In this case,
$$\mathbb{P}\{{}^\nu R_{n+1} = t \mid {}^\nu R_n = s\} = \nu(\tau(w)).$$
In particular, $\mathbb{P}\{R_{n+1} = t \mid R_n = s\} = 2^{-|w|}$.

Case II. Here $t \in S_{n+1}$ is a tree with $L(t) = (L(s) \setminus \{y\}) \sqcup \{y v_1 \ldots v_p,\ y v_1 \ldots v_{p-1} \bar v_p\}$ for some leaf $y \in L(s)$, some $p \geq 1$, and $v_1, \ldots, v_p \in \{0,1\}$. In this case,
$$\mathbb{P}\{{}^\nu R_{n+1} = t \mid {}^\nu R_n = s\} = \frac{2\, \nu(\tau(y v_1 \ldots v_p))\, \nu(\tau(y v_1 \ldots v_{p-1} \bar v_p))}{\nu(\tau(y))}.$$
In particular, $\mathbb{P}\{R_{n+1} = t \mid R_n = s\} = 2^{1 - |y| - 2p}$. For later use we note that, with $d := T(v_1 \ldots v_p, v_1 \ldots v_{p-1} \bar v_p)$, this last probability may be written as $2^{-|y|}\, \mathbb{P}\{R_2 = d\}$.
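As a sanity check on the Case II dynamics, one can compare a Monte Carlo estimate of $\mathbb{P}\{R_3 = t \mid R_2 = s\}$, for $s$ with leaves $\{0,1\}$ and $t$ with leaves $\{0, 10, 11\}$, against the value $2\,\gamma(\tau(10))\,\gamma(\tau(11))/\gamma(\tau(1)) = 1/4$ that a direct computation with an independent fair-coin input string gives. The following sketch (names invented; strings truncated to finite depth) is an illustration under those assumptions, not code from the paper.

```python
import random

def trie_leaves(strings):
    """Minimal pairwise-different prefixes of distinct 0/1 words."""
    out = set()
    for i, z in enumerate(strings):
        others = strings[:i] + strings[i + 1:]
        for length in range(1, len(z) + 1):
            if not any(o.startswith(z[:length]) for o in others):
                out.add(z[:length])
                break
    return out

random.seed(0)
TRIALS, DEPTH = 100_000, 12
hits = moves = 0
for _ in range(TRIALS):
    # three i.i.d. fair-coin strings, truncated to DEPTH bits
    zs = ["".join(random.choice("01") for _ in range(DEPTH))
          for _ in range(3)]
    if trie_leaves(zs[:2]) == {"0", "1"}:         # condition on R_2 = s
        hits += 1
        if trie_leaves(zs) == {"0", "10", "11"}:  # Case II move to t
            moves += 1
estimate = moves / hits                            # should be close to 1/4
```

Here the event $\{R_2 = s\}$ occurs exactly when the first two strings differ in their first bit (probability $1/2$), and the move to $t$ then requires the third string to begin with $1$ and to separate from the existing $\tau(1)$-string at the second bit.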

Backward transition probabilities
Note that if $s \in S_n$ and $t \in S_{n+1}$ are such that $\mathbb{P}\{{}^\nu R_{n+1} = t \mid {}^\nu R_n = s\} > 0$, then the leaf set of $s$ is obtained either by removing a leaf from the leaf set of $t$ that has a sibling which is not a leaf (corresponding to Case I above), in which case (1.2) implies that
$$\mathbb{P}\{{}^\nu R_n = s \mid {}^\nu R_{n+1} = t\} = \frac{1}{n+1},$$
or by removing two sibling leaves from the leaf set of $t$ and replacing them by a single new leaf positioned at the start of the path that led from the rest of $t$ to their common parent (corresponding to Case II above), in which case (1.2) implies that
$$\mathbb{P}\{{}^\nu R_n = s \mid {}^\nu R_{n+1} = t\} = \frac{2}{n+1}.$$
In particular, the backward transition probabilities do not depend on $\nu$. These backward transition probabilities can also be obtained directly. Again write $L(s) = \{y_1, \ldots, y_n\}$. In Case I (using the notation that was introduced to first describe this case), by exchangeability the string stored at the leaf $w$ is equally likely to be any one of the $n+1$ strings, and removing $w$ corresponds to the event that it is the last arrival. In Case II (also using the notation that was introduced to first describe this case), each of the two strings stored at the sibling leaves $y v_1 \ldots v_p$ and $y v_1 \ldots v_{p-1} \bar v_p$ is equally likely to be the last arrival. The above observations are summarized in the following Definition and Remark.
Definition 3.1. Suppose that $t \in S_{n+1}$ and $v = v_1 \ldots v_m$ is a leaf of $t$. If $v_1 \ldots v_{m-1} \bar v_m$ is not a leaf of $t$, let $\kappa(t, v) \in S_n$ be the tree $t \setminus \{v\}$ (that is, $\kappa(t,v)$ is the tree with the same leaf set as $t$ except that $v$ has been removed). If $v_1 \ldots v_{m-1} \bar v_m$ is also a leaf of $t$, then there is a largest $\ell < m$ such that $v_1 \ldots v_{\ell-1} v_\ell$ and $v_1 \ldots v_{\ell-1} \bar v_\ell$ are both vertices of $t$ (with $\ell = 0$, corresponding to the root, when no such $\ell \geq 1$ exists), and in this case let $\kappa(t, v) \in S_n$ be the tree $t \setminus (\{v_1 \ldots v_p : \ell < p \leq m\} \cup \{v_1 \ldots v_{m-1} \bar v_m\})$ (that is, $\kappa(t,v)$ is the tree with the same leaf set as $t$ except that $v$ and its sibling leaf $v_1 \ldots v_{m-1} \bar v_m$ have both been removed and replaced by the single leaf $v_1 \ldots v_\ell$).
Remark 3.2. Using Definition 3.1, we can describe the backward evolution of $({}^\nu R_n)_{n\in\mathbb{N}}$ by saying that, conditional on $\{{}^\nu R_{n+1}, {}^\nu R_{n+2}, \ldots\}$, one of the $n+1$ leaves of ${}^\nu R_{n+1}$ is chosen uniformly at random and, denoting this leaf by $V_{n+1}$, the random tree ${}^\nu R_n$ is constructed as $\kappa({}^\nu R_{n+1}, V_{n+1})$.
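Definition 3.1 translates directly into code. The sketch below (helper names invented, not from the paper) represents a tree by its leaf set and returns the leaf set of $\kappa(t, v)$, using the reading of the branch-point index adopted above; combining it with a uniformly chosen leaf gives one backward step as in Remark 3.2.

```python
def flip(bit):
    """The bar operation: 0 -> 1 and 1 -> 0."""
    return "1" if bit == "0" else "0"

def kappa(leaves, v):
    """Leaf set of kappa(t, v), where t is the tree spanned by `leaves`."""
    vertices = {leaf[:k] for leaf in leaves for k in range(len(leaf) + 1)}
    sibling = v[:-1] + flip(v[-1])
    if sibling not in leaves:        # sibling is internal: just drop the leaf v
        return leaves - {v}
    # sibling is also a leaf: prune the pair back to the deepest branch point
    ell = max((k for k in range(1, len(v))
               if v[:k - 1] + flip(v[k - 1]) in vertices),
              default=0)             # default 0: prune all the way to the root
    return (leaves - {v, sibling}) | {v[:ell]}
```

For example, pruning the leaf $001$ from the tree with leaves $\{000, 001, 01\}$ removes the sibling pair $\{000, 001\}$ and creates the new leaf $00$, matching the trie of the two remaining strings; pruning the leaf $01$ (whose sibling $00$ is internal) simply deletes it.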

The Doob-Martin kernel
Suppose that $s \in S_m$ and $t \in S_{m+n}$ are such that $\mathbb{P}\{R_{m+n} = t \mid R_m = s\} > 0$, a state of affairs which we denote by $s \lhd t$. Write $x_1, \ldots, x_p$ for the vertices of $s$ that have degree $2$ and $y_1, \ldots, y_q$ for the leaves of $s$. Of course, $q = m$, but it will be clearer to use this alternative notation. Then $t$ is obtained from $s$ by attaching subtrees to some of the vertices $\{x_1, \ldots, x_p\} \cup \{y_1, \ldots, y_q\}$. More precisely, $t \setminus s = (\bigsqcup_{i=1}^p a_i) \sqcup (\bigsqcup_{j=1}^q b_j)$, where the subtrees $a_i$ and $b_j$ are as follows. For each $i$, either $a_i = \emptyset$ (that is, no subtree is attached to $x_i$, in which case we set $\alpha_i = 0$), or there is a direction $u_i \in \{0,1\}$ with $x_i u_i \notin s$, an $\alpha_i \geq 1$, and $c_i \in S_{\alpha_i}$ such that $a_i = \{x_i u_i w : w \in c_i\}$. Suppose that $y_j = y_{j1} \ldots y_{j g_j}$; then either $b_j = \emptyset$ (that is, no subtree is attached to $y_j$, in which case we set $\beta_j = 0$) or there is a $\beta_j \geq 1$ and $d_j \in S_{\beta_j + 1}$ such that $b_j = \{y_j w : w \in d_j\} \setminus \{y_j\}$. We have $n = \sum_i \alpha_i + \sum_j \beta_j$. Given a tree $r \in S_h$ for some $h \in \mathbb{N}$, set $M(r) = h$ (so that $M(r)$ is the number of leaves of $r$) and $\pi(r) := \mathbb{P}\{R_h = r\}$.
Then, by iterating the arguments that lead to (2.2) and (2.4), one obtains an explicit product formula for $\mathbb{P}\{R_{m+n} = t \mid R_m = s\}$. Also, because of (2.1), $\mathbb{P}\{R_{m+n} = t\} = (m+n)! \prod_{u \in L(t)} 2^{-|u|}$. Note also that $\mathbb{P}\{R_m = s\} = m! \prod_{j=1}^q 2^{-|y_j|}$. Therefore, the Doob--Martin kernel
$$K(s, t) := \frac{\mathbb{P}\{R_{m+n} = t \mid R_m = s\}}{\mathbb{P}\{R_{m+n} = t\}}$$
can be written as an explicit ratio of these expressions.
Remark 4.1. It follows that, for $s \in S_m$, $m \in \mathbb{N}$, with leaves $L(s) = \{y_1, \ldots, y_m\}$ and a sequence $(t_n)_{n\in\mathbb{N}}$ with $\lim_{n\to\infty} M(t_n) = \infty$, the sequence $K(s, t_n)$ converges as $n \to \infty$ if and only if the limit of the corresponding expression (4.1) exists, in which case the limits coincide. Recall that for $y \in \{0,1\}^\star$ the cardinality $\#\{1 \leq j \leq n : y \leq \zeta_{n,j}(z_1, \ldots, z_n)\}$ (where $\zeta_{n,j}(z_1, \ldots, z_n)$ denotes the leaf of $R(z_1, \ldots, z_n)$ that stores $z_j$) equals $\#\{1 \leq j \leq n : y \leq z_j\}$ if the latter cardinality is at least two, and it is zero otherwise. Hence a sufficient condition for the limit as $n \to \infty$ of $K(s, t_n)$ (equivalently, of (4.1)) to exist for all $s \in S$ is that $t_n = R(z_1, \ldots, z_n)$ for a sequence $(z_n)_{n\in\mathbb{N}}$ of distinct elements of $\{0,1\}^\infty$ such that for some probability measure $\nu$ on $\{0,1\}^\infty$ we have $\lim_{n\to\infty} \frac{1}{n} \#\{1 \leq j \leq n : z_j \in \tau(y)\} = \nu(\tau(y))$ for all $y \in \{0,1\}^\star$; that is, the sequence of empirical probability distributions $(\frac{1}{n} \sum_{j=1}^n \delta_{z_j})_{n\in\mathbb{N}}$ converges weakly to $\nu$ (where we put the usual topology on $\{0,1\}^\infty$, for which the sets $\tau(y)$ are both closed and open). In this case the limit is
$$ {}^\nu h(s) := \prod_{j=1}^m 2^{|y_j|}\, \nu(\tau(y_j)). \qquad (4.2)$$
The function ${}^\nu h$ is excessive as a pointwise limit of excessive functions. Moreover, if $\nu$ is diffuse, then ${}^\nu h$ is harmonic; that is, $\sum_t \mathbb{P}\{R_{m+1} = t \mid R_m = s\}\, {}^\nu h(t) = {}^\nu h(s)$ for all $s \in S$.

Examples of harmonic functions
It is immediate from the expressions for the forward transition probabilities derived in Section 2 that
$$\mathbb{P}\{{}^\nu R_{n+1} = t \mid {}^\nu R_n = s\} = \frac{{}^\nu h(t)}{{}^\nu h(s)}\, \mathbb{P}\{R_{n+1} = t \mid R_n = s\}, \qquad s \in S_n,\ t \in S_{n+1},$$
where the function ${}^\nu h$ was defined in (4.2).
Thus the nonnegative function ${}^\nu h$ is harmonic, the Markov chain $({}^\nu R_n)_{n\in\mathbb{N}}$ is the $h$-transform of $(R_n)_{n\in\mathbb{N}}$ with the harmonic function ${}^\nu h$, and hence $({}^\nu R_n)_{n\in\mathbb{N}}$ is an infinite bridge for $(R_n)_{n\in\mathbb{N}}$. Recall that the tail $\sigma$-field of $({}^\nu R_n)_{n\in\mathbb{N}}$ is $\mathbb{P}$-a.s.\ trivial. It follows that the normalized nonnegative harmonic function ${}^\nu h$ is extremal. We show in Theorem 7.1 and Corollary 7.2 that the extremal normalized nonnegative harmonic functions are precisely those of this form and that they are, in turn, precisely the harmonic functions that arise as a limit of the form $r \mapsto \lim_{k\to\infty} K(r, t_k)$, where $(t_k)_{k\in\mathbb{N}}$ is such that $M(t_k) \to \infty$ as $k \to \infty$. In the language of Doob--Martin theory, this shows that the minimal Doob--Martin boundary of the radix sort tree chain $(R_n)_{n\in\mathbb{N}}$ coincides with the full Doob--Martin boundary. It may be feasible to prove this fact ``bare-hands'', but the simpler indirect route we take is, we believe, more informative.
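To make the $h$-transform relation concrete: assuming the product form ${}^\nu h(s) = \prod_{y \in L(s)} 2^{|y|}\,\nu(\tau(y))$ (normalized so that ${}^\nu h(\{\emptyset\}) = 1$) together with the Case I and Case II transition probabilities that a direct computation with an independent $\nu$-distributed input string gives, the identity $\mathbb{P}\{{}^\nu R_{n+1} = t \mid {}^\nu R_n = s\} = \frac{{}^\nu h(t)}{{}^\nu h(s)} \mathbb{P}\{R_{n+1} = t \mid R_n = s\}$ can be verified exactly, in rational arithmetic, for a memoryless source with bit distribution $(q, 1-q)$. The sketch below is an illustration under those assumptions, not code from the paper.

```python
from fractions import Fraction

def nu_tau(word, q):
    """nu(tau(word)) for a memoryless source with P(bit = 0) = q."""
    out = Fraction(1)
    for b in word:
        out *= q if b == "0" else 1 - q
    return out

def nu_h(leaves, q):
    """Candidate harmonic function: product over leaves y of 2^|y| nu(tau(y))."""
    out = Fraction(1)
    for y in leaves:
        out *= Fraction(2) ** len(y) * nu_tau(y, q)
    return out

q = Fraction(3, 10)

# Case I move: s has leaves {000, 001, 01}; the new leaf is w = 1,
# the sibling of the internal vertex 0.
s1, t1 = {"000", "001", "01"}, {"000", "001", "01", "1"}
gamma_step1 = Fraction(1, 2) ** 1            # 2^{-|w|}
nu_step1 = nu_tau("1", q)                    # nu(tau(w))
assert nu_step1 == nu_h(t1, q) / nu_h(s1, q) * gamma_step1

# Case II move: s has leaves {0, 1}; the leaf y = 1 is replaced by
# the sibling pair {10, 11} (so p = 1).
s2, t2 = {"0", "1"}, {"0", "10", "11"}
gamma_step2 = Fraction(2) ** (1 - 1 - 2)     # 2^{1 - |y| - 2p}
nu_step2 = 2 * nu_tau("10", q) * nu_tau("11", q) / nu_tau("1", q)
assert nu_step2 == nu_h(t2, q) / nu_h(s2, q) * gamma_step2
```

Exact `Fraction` arithmetic is used so the two sides of the identity can be compared with `==` rather than up to floating-point error.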

Labeled infinite bridges
Recall that the backward transition dynamics of any finite bridge $(R^t_n)_{n=1}^{M(t)}$ and any infinite bridge $(R^\infty_n)_{n\in\mathbb{N}}$ may be described in terms of the ``pruning'' operation $\kappa$ from Definition 3.1 and Remark 3.2:

• Suppose that the value of the process at time $n+1$ is $t \in S_{n+1}$.
• Pick a leaf v uniformly at random.
• Replace t by κ(t, v) ∈ S n to produce the value of the process at time n.
Consider a binary tree $t'' \in S_{n+1}$. Label the $n+1$ leaves of $t''$ with $[n+1]$ uniformly at random (that is, all $(n+1)!$ labelings are equally likely). Let $V$ be the leaf labeled $n+1$, and set $t' := \kappa(t'', V)$. If the sibling of $V$ was not a leaf in $t''$, then the leaves of $t'$ were also leaves of $t''$ and we maintain their labels. If the sibling of $V$ was also a leaf of $t''$, labeled, say, $k \in [n]$, then in passing from $t''$ to $t'$ we remove $V$ and its sibling along with some vertices on the path leading to their parent, thereby creating a new leaf which we label $k$, while leaving the labels of the remaining leaves (which are common to both $t''$ and $t'$) unchanged. The distribution of $t'$ is that arising from one step, starting from $t''$, of the backward radix sort dynamics (that is, the common backward dynamics of all infinite bridges). Moreover, the labeling of $t'$ by $[n]$ is uniformly distributed over the $n!$ possible labelings.

Now suppose that $(R^\infty_n)_{n\in\mathbb{N}}$ is an infinite bridge. For some $N$, let $S_N$ be a random binary tree with the same distribution as $R^\infty_N$. Label $S_N$ uniformly at random with $[N]$ to produce a leaf-labeled binary tree $\tilde S_N$. The pruning procedure described above is deterministic once the labeling is given, and applying it successively for $n = N-1, \ldots, 1$ produces leaf-labeled binary trees $\tilde S_{N-1}, \ldots, \tilde S_1$, where $\tilde S_n$ has $n$ leaves labeled by $[n]$ for $1 \leq n \leq N-1$. Write $S_n$ for the underlying binary tree obtained by removing the labels of $\tilde S_n$. It follows from the observations above that the sequence $(S_1, \ldots, S_N)$ has the same joint distribution as $(R^\infty_1, \ldots, R^\infty_N)$. Note that the joint distribution of the sequence $(\tilde S_1, \ldots, \tilde S_N)$ is uniquely determined by the distribution of $R^\infty_N$ and hence, a fortiori, by the joint distribution of $(R^\infty_n)_{n\in\mathbb{N}}$.
convex combinations, and hence preserves extremality. Therefore the tail $\sigma$-field of the infinite bridge $(R^\infty_n)_{n\in\mathbb{N}}$ is $\mathbb{P}$-a.s.\ trivial if and only if the exchangeable sequence $(\Xi_i)_{i\in\mathbb{N}}$ is ergodic. (This situation closely parallels one appearing in the analysis of R\'emy's tree growth chain in [EGW15], and we refer to the more detailed argument in Proposition 5.19 (see also the subsequent Remark 5.20) of [EGW15].) Finally, a well-known consequence of de Finetti's theorem is that an exchangeable sequence is ergodic if and only if it is independent and identically distributed.

(c) For any $u \in \{0,1\}^\star$, the sequence $(\mathbf{1}\{u = \Xi_k\})_{k\in\mathbb{N}}$ is independent and identically distributed, and hence $\#\{k \in \mathbb{N} : u = \Xi_k\} = 0$ $\mathbb{P}$-a.s.\ or $\#\{k \in \mathbb{N} : u = \Xi_k\} = \infty$ $\mathbb{P}$-a.s. Now, if $\mathbb{P}\{\Xi_i \in \{0,1\}^\star\} > 0$, there would be a $u \in \{0,1\}^\star$ such that with positive probability $\Xi^n_i = \Xi_i = u$ for all $n$ sufficiently large. Then, on the event $\{\Xi_i = u\}$ we would have $\#\{k \in \mathbb{N} : \Xi_k = u\} = 1$, since it follows from the construction in Definition 6.1 that $\Xi_j \neq \Xi_i$ for $j \neq i$ when $\Xi_i \in \{0,1\}^\star$. This shows that $\mathbb{P}\{\Xi_i \in \{0,1\}^\star\} = 0$.
We therefore have that $(\Xi_k)_{k\in\mathbb{N}}$ is an independent, identically distributed sequence of $\{0,1\}^\infty$-valued random variables. Because $\Xi_i \wedge \Xi_j = \Xi^n_i \wedge \Xi^n_j \in \{0,1\}^\star$ for all $n \geq i \vee j$ $\mathbb{P}$-a.s.\ when $i \neq j$ (where $\wedge$ denotes the longest common prefix), it follows that $\Xi_i \neq \Xi_j$ $\mathbb{P}$-a.s.\ for $i \neq j$, and the common distribution of $(\Xi_k)_{k\in\mathbb{N}}$ is diffuse.

(d) We have already seen that when $\nu$ is a diffuse probability measure on $\{0,1\}^\infty$ the process $({}^\nu R_n)_{n\in\mathbb{N}}$ is an infinite bridge which, by the Hewitt--Savage zero-one law, has a trivial tail $\sigma$-field.
Conversely, suppose that the infinite bridge $(R^\infty_n)_{n\in\mathbb{N}}$ has a trivial tail $\sigma$-field. Let $\nu$ be the common diffuse distribution of the independent, identically distributed sequence of $\{0,1\}^\infty$-valued random variables $(\Xi_i)_{i\in\mathbb{N}}$. In the notation of the Introduction, it is clear that $R^\infty_n = R(\Xi_1, \ldots, \Xi_n)$, $n \in \mathbb{N}$, and so $(R^\infty_n)_{n\in\mathbb{N}}$ has the same distribution as $({}^\nu R_n)_{n\in\mathbb{N}}$.
Corollary 7.2. The extremal normalized nonnegative harmonic functions are precisely those that arise as $s \mapsto \lim_{k\to\infty} K(s, t_k)$ for a sequence $(t_k)_{k\in\mathbb{N}}$ with $M(t_k) \to \infty$ as $k \to \infty$. There is a bijective correspondence between diffuse probability measures on $\{0,1\}^\infty$ and such functions: the measure $\nu$ corresponds to the normalized nonnegative harmonic function ${}^\nu h$ of (4.2) and, conversely, if $h$ is an extremal normalized nonnegative harmonic function and $(R^\infty_n)_{n\in\mathbb{N}}$ is the infinite bridge constructed as the Doob $h$-transform of $(R_n)_{n\in\mathbb{N}}$ using the function $h$, then $h = {}^\nu h$, where $\nu$ is the common distribution of the independent, identically distributed sequence $(\Xi_i)_{i\in\mathbb{N}}$ associated with the labeled infinite bridge $(\tilde R^\infty_n)_{n\in\mathbb{N}}$.
Proof. We know from Theorem 7.1 that the extremal normalized nonnegative harmonic functions correspond to infinite bridges of the form $({}^\nu R_n)_{n\in\mathbb{N}}$, where $\nu$ is a diffuse probability measure on $\{0,1\}^\infty$, and hence they are the harmonic functions ${}^\nu h$. In order to see that the correspondence between $\nu$ and the distribution of $({}^\nu R_n)_{n\in\mathbb{N}}$ is bijective, we observe that $\nu$ is determined uniquely by the distribution of the labeled version of $({}^\nu R_n)_{n\in\mathbb{N}}$ and hence by the distribution of $({}^\nu R_n)_{n\in\mathbb{N}}$ itself. It remains to check that if the normalized nonnegative harmonic function $h$ is given by $h(s) = \lim_{k\to\infty} K(s, t_k)$ for a sequence $(t_k)_{k\in\mathbb{N}}$ with $M(t_k) \to \infty$ as $k \to \infty$, then $h$ is extremal. We will follow an argument similar to the proof of Corollary 5.21 in [EGW15]. Writing $(R^\infty_n)_{n\in\mathbb{N}}$ for the infinite bridge given by the Doob $h$-transform of $(R_n)_{n\in\mathbb{N}}$ associated with $h$, we recall that extremality of $h$ is equivalent to the tail $\sigma$-field of $(R^\infty_n)_{n\in\mathbb{N}}$ being $\mathbb{P}$-a.s.\ trivial. By Theorem 7.1, this