An ergodic theorem for partially exchangeable random partitions

We consider shifts $\Pi_{n,m}$ of a partially exchangeable random partition $\Pi_\infty$ of $\mathbb{N}$ obtained by restricting $\Pi_\infty$ to $\{n+1,n+2,\dots, n+m\}$ and then subtracting $n$ from each element to get a partition of $[m]:= \{1, \ldots, m \}$. We show that for each fixed $m$ the distribution of $\Pi_{n,m}$ converges to the distribution of the restriction to $[m]$ of the exchangeable random partition of $\mathbb{N}$ with the same ranked frequencies as $\Pi_\infty$. As a consequence, the partially exchangeable random partition $\Pi_\infty$ is exchangeable if and only if $\Pi_\infty$ is stationary in the sense that for each fixed $m$ the distribution of $\Pi_{n,m}$ on partitions of $[m]$ is the same for all $n$. We also describe the evolution of the frequencies of a partially exchangeable random partition under the shift transformation. For an exchangeable random partition with proper frequencies, the time reversal of this evolution is the heaps process studied by Donnelly and others.


Introduction
A random partition $\Pi_\infty$ of the set $\mathbb{N}$ of positive integers arises naturally in a number of different contexts. The fields of application include population genetics [6] [14], statistical physics [11], Bayesian nonparametric statistics [7] and many others. The subject is also of purely mathematical interest. We refer to [19] for various results on random partitions.
There are two convenient ways to encode a random partition $\Pi_\infty$ of the set $\mathbb{N}$ as a sequence whose $n$th term ranges over a finite set of possible values. One way is to identify $\Pi_\infty$ with its sequence of restrictions $\Pi_n$ to the sets $[n] := \{1, \ldots, n\}$, say $\Pi_\infty = (\Pi_n)$, where $n$ will always range over $\mathbb{N}$. Another encoding is provided by the allocation sequence $(A_n)$ with $A_n = j$ iff $n \in C_j$, where $\Pi_\infty = \{C_1, C_2, \ldots\}$ with the clusters $C_j$ of $\Pi_\infty$ listed in increasing order of their least elements, also called order of appearance. Commonly, random partitions $\Pi_\infty$ of $\mathbb{N}$ are generated by some sequence of random variables $(X_n)$, meaning that $(C_j)$ is the collection of equivalence classes for the random equivalence relation $m \sim n$ iff $X_m = X_n$. Every random partition of $\mathbb{N}$ is generated in this way by its own allocation sequence. If $F$ is a random probability distribution, given $F$ the sequence $(X_n)$ is i.i.d. according to $F$, and $\Pi_\infty$ is generated by $(X_n)$, say $\Pi_\infty$ is generated by sampling from $F$. Kingman [14] [15] developed a theory of random partitions that are exchangeable in the sense that for each $n$ the distribution of $\Pi_n$ on the set of partitions of $[n]$ is invariant under the natural action on these partitions by permutations of $[n]$. Kingman's main results can be summarized as follows:
• every exchangeable random partition $\Pi_\infty$ of $\mathbb{N}$ has the same distribution as one generated by sampling from some random distribution $F$ on the real line;
• the distribution of $\Pi_\infty$ generated by sampling from $F$ depends only on the joint distribution of the list $(P^\downarrow_j)$ of sizes of atoms of the discrete component of $F$, in weakly decreasing order.
Two immediate consequences of these results are:
• every cluster $C_j$ of an exchangeable random partition $\Pi_\infty$ of $\mathbb{N}$ has an almost sure limiting relative frequency $P_j$;
• ranking those limiting relative frequencies gives the distribution of ranked atoms $(P^\downarrow_j)$ required to replicate the distribution of $\Pi_\infty$ by random sampling from an $F$ with those ranked atom sizes.
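The allocation-sequence encoding described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names `allocation_sequence` and `clusters`, and the sample values, are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def allocation_sequence(xs):
    """Return [A_1, ..., A_n], where A_i = j iff i lies in the j-th cluster
    in order of appearance of the partition generated by the values xs."""
    label_of = {}                      # value X_i -> cluster index j
    alloc = []
    for x in xs:
        if x not in label_of:
            label_of[x] = len(label_of) + 1   # a new cluster appears
        alloc.append(label_of[x])
    return alloc

def clusters(alloc):
    """Recover the clusters C_1, C_2, ... as sorted lists of 1-based indices."""
    blocks = defaultdict(list)
    for i, j in enumerate(alloc, start=1):
        blocks[j].append(i)
    return [blocks[j] for j in sorted(blocks)]

sample = ["b", "a", "b", "c", "a"]     # hypothetical observed values X_1..X_5
alloc = allocation_sequence(sample)
print(alloc)                           # [1, 2, 1, 3, 2]
print(clusters(alloc))                 # [[1, 3], [2, 5], [4]]
```

Applying `allocation_sequence` to its own output returns the same list, which is the finite version of the remark that every partition is generated by its own allocation sequence.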
Kingman's method of analysis of exchangeable random partitions of $\mathbb{N}$, by working with the distribution of its ranked frequencies $(P^\downarrow_j)$, continues to be used in the study of partition-valued stochastic processes [16]. But well known examples, such as the random partition of $\mathbb{N}$ whose distribution of $\Pi_n$ is given by the Ewens sampling formula [5] [21], show it is often more convenient to encode the distribution of an exchangeable random partition of $\mathbb{N}$ by the distribution of its frequencies of clusters $(P_j)$ in their order of appearance, rather than in weakly decreasing order. This idea was developed in Pitman [17], together with a more convenient encoding of the distribution of $\Pi_n$. Call $\Pi_\infty$ a partially exchangeable partition (PEP) of $\mathbb{N}$ if for each fixed $n$ the distribution of $\Pi_n$ is given by the formula
$$\mathbb{P}(\Pi_n = \{C_1, \ldots, C_k\}) = p(\#C_1, \ldots, \#C_k) \tag{1.1}$$
for each particular partition $\{C_1, \ldots, C_k\}$ of $[n]$ with $k$ clusters $C_1, \ldots, C_k$ in order of appearance, of sizes $\#C_1, \ldots, \#C_k$, for some function $p(n_1, \ldots, n_k)$ of compositions of $n$, meaning sequences of positive integers $(n_1, \ldots, n_k)$ with $\sum_{i=1}^{k} n_i = n$ for some $1 \le k \le n$. The main results of [17] can be summarized as follows. See also [19, Chapters 2, 3].
• There is a one-to-one correspondence between distributions of partially exchangeable partitions of $\mathbb{N}$ and non-negative functions $p$ of compositions of positive integers subject to the normalization condition $p(1) = 1$ and the sequence of addition rules
$$p(n) = p(n+1) + p(n,1), \qquad p(n_1,n_2) = p(n_1+1,n_2) + p(n_1,n_2+1) + p(n_1,n_2,1) \tag{1.2}$$
and so on. This function $p$ associated with $\Pi_\infty$ is called its partially exchangeable partition probability function (PEPPF).
• $\Pi_\infty$ is exchangeable iff $\Pi_\infty$ is partially exchangeable with a $p(n_1, \ldots, n_k)$ that is for each fixed $k$ a symmetric function of its $k$ arguments.
• Every cluster $C_j$ of a partially exchangeable random partition $\Pi_\infty$ of $\mathbb{N}$ has an almost sure limiting relative frequency $P_j$, with $P_j = 0$ iff $C_j$ is a singleton, meaning $\#C_j = 1$.
• The distribution of the sequence of cluster frequencies $(P_j)$ and the PEPPF $p$ determine each other by the product moment formula
$$p(n_1, \ldots, n_k) = \mathbb{E}\left[\prod_{i=1}^{k} P_i^{\,n_i - 1} \prod_{i=1}^{k-1} (1 - R_i)\right] \tag{1.3}$$
where $R_i := P_1 + \cdots + P_i$ is the cumulative frequency of the first $i$ clusters of $\Pi_\infty$.
• The set of all PEPPFs p is a convex set in the space of bounded real-valued functions of compositions of positive integers, compact in the topology of pointwise convergence.
• The extreme points of this convex compact set of PEPPFs are given by the formula (1.3) for non-random sequences of sub-probability cluster frequencies $(P_j)$, meaning that $P_j \ge 0$ and $\sum_j P_j \le 1$.
• The formula (1.3), for a random sub-probability distribution (P j ), provides the unique representation of a general PEPPF as an integral mixture of these extreme PEPPFs.
• The family of distributions of partitions of $\mathbb{N}$ with PEPPFs (1.3) for a fixed sequence $(P_j)$, in which case the expectation $\mathbb{E}$ can be omitted, as $(P_j)$ varies over all sub-probability distributions, provides for every exchangeable or partially exchangeable random partition $\Pi_\infty$ of $\mathbb{N}$ a regular conditional distribution of $\Pi_\infty$ given its cluster frequencies $(P_j)$ in order of appearance.
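As a concrete check of the product moment formula for fixed frequencies, the following Python sketch evaluates $p(n_1,\ldots,n_k) = \prod_{i=1}^{k} P_i^{n_i-1} \prod_{i=1}^{k-1}(1-R_i)$ with exact rational arithmetic and verifies the addition rules on all compositions of $n \le 4$. The frequency vector and the helper names (`extreme_peppf`, `compositions`, `addition_rule_holds`) are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def extreme_peppf(freqs, comp):
    """Product formula for fixed sub-probability frequencies freqs = (P_1, P_2, ...).
    Clusters beyond the listed frequencies are given frequency zero."""
    k = len(comp)
    P = list(freqs) + [Fraction(0)] * max(0, k - len(freqs))
    value, R = Fraction(1), Fraction(0)
    for i, n_i in enumerate(comp):
        value *= P[i] ** (n_i - 1)     # n_i - 1 later arrivals join cluster i
        R += P[i]                      # cumulative frequency R_i
        if i < k - 1:
            value *= 1 - R             # a new cluster opens after i clusters
    return value

def compositions(n):
    """Yield all compositions of n as tuples of positive integers."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def addition_rule_holds(p, comp):
    """Check p(comp) = sum_j p(comp with n_j + 1) + p(comp, 1)."""
    grown = sum(p(comp[:j] + (comp[j] + 1,) + comp[j + 1:]) for j in range(len(comp)))
    return p(comp) == grown + p(comp + (1,))

FREQS = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 12))   # hypothetical, sums to 11/12
p = lambda comp: extreme_peppf(FREQS, comp)

print(p((1,)))                                               # 1
print(all(addition_rule_holds(p, c) for n in range(1, 5) for c in compositions(n)))
print(p((2, 1)), p((1, 2)))                                  # 1/4 1/6
```

The last line shows that this $p$ is not symmetric, so the partition with these fixed frequencies in order of appearance is partially exchangeable but not exchangeable.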
These results provide a theory of partially exchangeable random partitions of $\mathbb{N}$ that is both simpler and more general than the theory of exchangeable random partitions. The structure of partially exchangeable random partitions of $\mathbb{N}$ is nonetheless very closely tied to that of exchangeable random partitions, due to the last point above. Starting from the simplest exchangeable random partition of $\mathbb{N}$ with an infinite number of clusters, whose cumulative frequencies $(R_k)$ have the same distribution as the sequence of record values of an i.i.d. uniform $[0,1]$ sequence, given by the stick-breaking representation
$$R_k = 1 - \prod_{i=1}^{k} (1 - H_i)$$
for $H_i$ a sequence of i.i.d. uniform $[0,1]$ variables, the most general extreme partially exchangeable random partition $\Pi_\infty$ of $\mathbb{N}$ with fixed cluster frequencies $(P_j)$ may be regarded as derived from this record model by conditioning its cluster frequencies. See [13] for further development of this point. Less formally, a PEP is as exchangeable as it possibly can be, given that its distribution of cluster frequencies $(P_j)$ in appearance order has been altered beyond the constraints on the frequencies of an exchangeable random partition of $\mathbb{N}$. For proper frequencies $(P_j)$, with $\sum_j P_j = 1$ almost surely, those constraints are that $(P_j)$ has the same distribution as $(P^*_j)$, where $(P^*_j)$ is a size-biased random permutation of $(P_j)$. Then $(P_j)$ is said to be in size-biased random order or invariant under size-biased random permutation [3] [18].
For a partially exchangeable random partition $\Pi_\infty$, consider for each $n = 0, 1, 2, \ldots$ the random partition $\Pi^{(n)}_\infty$ of $\mathbb{N}$ defined by first restricting $\Pi_\infty$ to $\{n+1, n+2, \ldots\}$, then shifting indices back by $n$ to make a random partition of $\{1, 2, \ldots\}$ instead of $\{n+1, n+2, \ldots\}$. This procedure appeared in our paper [20] as discussed further in Section 3 below. If $\Pi_\infty$ is exchangeable, then obviously so is $\Pi^{(n)}_\infty$ for every $n$. Moreover, the sequence of random partitions $\Pi^{(n)}_\infty$, $n = 0, 1, \ldots$ is a stationary random process to which the ergodic theorem can be immediately applied. According to Kingman's representation, this process of shifts of $\Pi_\infty$ is ergodic iff the ranked frequencies of $\Pi_\infty$ are constant almost surely. For more general models with random ranked frequencies, the asymptotic behavior of functionals of $\Pi^{(n)}_\infty$ can be read from the ergodic case by conditioning on the ranked frequencies.
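If $\Pi_\infty$ is only partially exchangeable, it is easily shown that $\Pi^{(n)}_\infty$ is also partially exchangeable for every $n$. The PEPPF $p^{(n)}$ of $\Pi^{(n)}_\infty$ is obtained by repeated application of the following simple transformation from the PEPPF $p$ of $\Pi_\infty$ to the PEPPF $p^{(1)}$ of $\Pi^{(1)}_\infty$:
$$p^{(1)}(n) = p(n+1) + p(1,n), \qquad p^{(1)}(n_1,n_2) = p(n_1+1,n_2) + p(n_2+1,n_1) + p(1,n_1,n_2)$$
and so on, in parallel to the basic consistency relations (1.2) for a PEPPF. If $\Pi_\infty$ is partially exchangeable with $p^{(1)} = p$, which is the system of equations (1.5) obtained by writing $p$ in place of $p^{(1)}$ on the left side of the transformation above, then the sequence $\Pi^{(n)}_\infty$, $n = 0, 1, \ldots$ is a stationary random process to which the ergodic theorem can be applied. That raises two questions:
(i) Are there any partially exchangeable random partitions of $\mathbb{N}$ which are stationary but not exchangeable?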
(ii) If a partially exchangeable random partition of $\mathbb{N}$ is not stationary, what can be provided as an ergodic theorem governing the long run behavior of its sequence of shifts?
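The shift transformation on PEPPFs can be explored numerically. In the sketch below, deleting element 1 either removes a singleton cluster or removes one element from the first cluster in order of appearance, which gives $p^{(1)}(n_1,\ldots,n_k) = p(1,n_1,\ldots,n_k) + \sum_j p(n_j+1, n_1, \ldots, n_{j-1}, n_{j+1}, \ldots, n_k)$. Two illustrative inputs are assumed, neither specified in the paper: the Ewens partition probability function with parameter $\theta = 1$, namely $\prod_i (n_i-1)!/n!$, and an extreme PEPPF with hypothetical fixed frequencies $(1/2, 1/3, 1/6)$. The first is left unchanged by the shift on the compositions tested, the second is not.

```python
from fractions import Fraction
from math import factorial

def shift(p):
    """Return the PEPPF of the partition obtained by deleting element 1."""
    def p1(comp):
        total = p((1,) + comp)                         # element 1 was a singleton
        for j, n_j in enumerate(comp):                 # element 1 sat in cluster j
            total += p((n_j + 1,) + comp[:j] + comp[j + 1:])
        return total
    return p1

def ewens_eppf(comp):
    """Ewens partition probability function with theta = 1 (illustrative choice)."""
    num = 1
    for n_i in comp:
        num *= factorial(n_i - 1)
    return Fraction(num, factorial(sum(comp)))

FREQS = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))   # hypothetical fixed frequencies

def extreme_peppf(comp):
    """Product formula for the fixed frequencies FREQS in order of appearance."""
    value, R = Fraction(1), Fraction(0)
    for i, n_i in enumerate(comp):
        P_i = FREQS[i] if i < len(FREQS) else Fraction(0)
        value *= P_i ** (n_i - 1)
        R += P_i
        if i < len(comp) - 1:
            value *= 1 - R
    return value

COMPS = [(1,), (2,), (1, 1), (3,), (2, 1), (1, 2), (1, 1, 1)]
print(all(shift(ewens_eppf)(c) == ewens_eppf(c) for c in COMPS))   # True
print(shift(extreme_peppf)((2,)), extreme_peppf((2,)))             # 5/12 1/2
```

Iterating `shift` gives $p^{(n)}$, so the same helper can be used to watch the values $p^{(n)}(c)$ evolve for small compositions $c$, at a cost that grows quickly with $n$ because each application multiplies the number of evaluations.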
Since $\Pi_\infty$ is exchangeable iff $p$ is symmetric, the answer to question (i) is "no" if and only if
every function $p$ of compositions that is bounded between 0 and 1 and satisfies both systems of equations (1.2) and (1.5) is a symmetric function of its arguments. (1.6)
So it seems the matter should be resolved by analysis of the combined system of equations. Surprisingly, this does not seem to be easy. Still, we claim that every partially exchangeable and stationary random partition of $\mathbb{N}$ is in fact exchangeable, so (1.6) is true. We do not know how to prove this without dealing with question (ii) first. But that question is of some independent interest, so we formulate the following theorem:
Theorem 1.1. Let $\Pi_\infty$ be a partially exchangeable random partition of positive integers with ranked frequencies $(P^\downarrow_j)$, and let $(P^{(n)}_j)$ be the frequencies of clusters, in order of appearance, of the random partition $\Pi^{(n)}_\infty$ obtained from $\Pi_\infty$ restricted to $\{n+1, n+2, \ldots\}$. Then:
• As $n \to \infty$, the distribution of $\Pi^{(n)}_\infty$ converges weakly to that of the exchangeable random partition $\widetilde{\Pi}_\infty$ of $\mathbb{N}$ with ranked frequencies $(P^\downarrow_j)$, meaning that the PEPPF $p^{(n)}(\cdots)$ of $\Pi^{(n)}_\infty$ converges pointwise to the EPPF $p^{(\infty)}(\cdots)$ of $\widetilde{\Pi}_\infty$.
• As $n \to \infty$, the finite dimensional distributions of $(P^{(n)}_j)$ converge weakly to those of $(\widetilde{P}_j)$, the list of frequencies in order of appearance of an exchangeable random partition of $\mathbb{N}$ with ranked frequencies $(P^\downarrow_j)$, which for proper $(P^\downarrow_j)$ with $\sum_j P^\downarrow_j = 1$ is a size-biased random permutation of $(P^\downarrow_j)$, or of $(P^{(n)}_j)$ for any fixed $n$.
In view of the one-to-one correspondence between the law of a partially exchangeable partition $\Pi_\infty$ and the law of its frequencies of clusters in order of appearance, this theorem has the following corollary:
Corollary 1.2. In the setting of the previous theorem, with $(P^{(n)}_j)$ for each $n = 0, 1, \ldots$ the frequencies of a partially exchangeable partition $\Pi_\infty$ in their order of appearance when $\Pi_\infty$ is restricted to $\{n+1, n+2, \ldots\}$,
• the sequence $(P^{(n)}_j)$, $n = 0, 1, \ldots$ is a Markov chain with stationary transition probabilities on the space of sub-probability distributions of $\mathbb{N}$; for $n \ge 1$ the forwards transition mechanism from $(P^{(n-1)}_j)$ to $(P^{(n)}_j)$ is by a top to random move, whereby given $(P^{(n-1)}_j) = (P_1, P_2, \ldots)$,
$$(P^{(n)}_j) = (P_2, \ldots, P_X, P_1, P_{X+1}, \ldots)$$
for some random position $X \in \mathbb{N}$ with the proper conditional distribution given $(P_j)$, as described in Proposition 3.1, and $(P^{(n)}_j) = (P_2, P_3, \ldots)$ if $P_1 = 0$;
• $\Pi_\infty$ is exchangeable if and only if the Markov chain $(P^{(n)}_j)$, $n = 0, 1, \ldots$ is stationary;
• if $\Pi_\infty$ is exchangeable the reversed transition mechanism from $(P^{(n)}_j)$ to $(P^{(n-1)}_j)$ is by a random to top move.
According to the last part of the Corollary, when $\Pi_\infty$ is exchangeable, with proper frequencies, the time-reversed random-to-top evolution of the cluster frequencies of $\Pi_\infty$ is the mechanism of the heaps process studied by Donnelly [2], also called a move-to-front rule. The mechanism of this chain has been extensively studied, mostly in the case of a finite number of nonzero frequencies, due to its interest in computer science [8] [1]. Donnelly's result that proper frequencies $(P_j)$ are in a size-biased order iff the distribution of $(P_j)$ is invariant under this transition mechanism is an immediate consequence of the above corollary. It seems surprising, but nowhere in Donnelly's article, or elsewhere in the literature we are aware of, is it mentioned that the random-to-top rule is the universal time-reversed evolution of cluster frequencies in order of appearance for shifts of any exchangeable random partition of $\mathbb{N}$ with proper frequencies. We are also unaware of any previous description of the time-forwards evolution of these cluster frequencies, as detailed in the corollary.
The rest of this article is organized as follows. Theorem 1.1 is proved in Section 2. In Section 3 we first recall an idea from [20] which led us to develop the results of this article. This leads to a proposition which we combine with Theorem 1.1 to obtain Corollary 1.2. Finally, Section 4 provides some references to related literature.

Proof of Theorem 1.1
Proof. The convergence in distribution of $\Pi^{(n)}_\infty$ to $\widetilde{\Pi}_\infty$ is obtained by a coupling argument. Given the frequencies $(P^{(0)}_j)$ of $\Pi_\infty$ and an independent i.i.d. sequence $(U_n)$ of uniform $[0,1]$ random variables, let us construct a partially exchangeable random partition $\widehat{\Pi}_\infty$ distributed as $\Pi_\infty$, and an exchangeable random partition $\widetilde{\Pi}_\infty$, such that the convergence of $\widehat{\Pi}^{(n)}_\infty$ to $\widetilde{\Pi}_\infty$ holds almost surely. Set $R_k := \sum_{i=1}^{k} P^{(0)}_i$ and construct $\widehat{\Pi}_\infty$ as the partition generated by values of the table allocation process $(A_n)$ defined by $A_1 := 1$ and, given that $A_1, \ldots, A_n$ have been assigned with $K_n := \max_{1 \le i \le n} A_i = k$ distinct tables, $A_{n+1} = j$ if $U_{n+1} \in (R_{j-1}, R_j]$ for some $1 \le j \le k$ and $A_{n+1} = k+1$ if $U_{n+1} \in (R_k, 1]$. The limiting exchangeable random partition $\widetilde{\Pi}_\infty$ is conveniently defined on the same probability space to be the random partition of $\mathbb{N}$ whose list of clusters with strictly positive frequencies is $\widetilde{C}_k := \{n : U_n \in (R_{k-1}, R_k]\}$ for $k$ with $R_{k-1} < R_k$, and with each remaining element of $\mathbb{N}$ a singleton cluster. Let $\widehat{C}_k$ be the $k$th cluster of $\widehat{\Pi}_\infty$ in order of appearance. The key observation is that for each $n \ge 1$ the intersections of $\widehat{C}_k$ and $\widetilde{C}_k$ with $[n+1, \infty)$ are identical on the event $(K_n \ge k)$. In more detail, if say $K_n = k$, then for all $i > n$
• if $U_i \le R_k$ then almost surely both $i \in \widehat{C}_j$ and $i \in \widetilde{C}_j$ for some $1 \le j \le k$;
• if $U_i > R_\infty := \lim_{m \to \infty} R_m$ then $i \in \widehat{C}_j$ for some $j > k$, while $\{i\}$ is a singleton cluster of $\widetilde{\Pi}_\infty$;
• if $U_i \in (R_k, R_\infty]$ then $i \in \widehat{C}_j$ and $i \in \widetilde{C}_\ell$ for some $j > k$ and $\ell > k$.
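The coupling can be simulated directly. The sketch below assumes a finite list of fixed frequencies (the values `[0.4, 0.3, 0.2]` are hypothetical and leave mass 0.1 for singletons), drives both constructions with the same uniform variables, and checks the key observation: after time $n$, an element is allocated to one of the first $K_n$ tables exactly when it falls in the paintbox cluster with the same index. The names `coupled_allocations` and `agree_after` are illustrative.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def coupled_allocations(freqs, n_steps, seed=0):
    """Return (alloc, paint, K): table allocations A_i, paintbox cluster indices
    (None for a paintbox singleton), and running table counts K_i."""
    rng = random.Random(seed)
    R = list(accumulate(freqs))                 # cumulative frequencies R_1, R_2, ...
    alloc, paint, K = [], [], []
    k = 0
    for i in range(n_steps):
        u = rng.random()
        j = bisect_left(R, u) + 1               # u in (R_{j-1}, R_j]; j > len(R) if u > R_inf
        paint.append(j if j <= len(R) else None)
        if i == 0:
            a = 1                               # A_1 := 1 whatever U_1 is
        elif j <= k and j <= len(R):
            a = j                               # join an existing table
        else:
            a = k + 1                           # open a new table
        k = max(k, a)
        alloc.append(a)
        K.append(k)
    return alloc, paint, K

def agree_after(alloc, paint, K, n):
    """Check that clusters with index <= K_n coincide on {n+1, n+2, ...}."""
    k = K[n - 1]
    for a, b in zip(alloc[n:], paint[n:]):
        if a <= k or (b is not None and b <= k):
            if a != b:
                return False
    return True

alloc, paint, K = coupled_allocations([0.4, 0.3, 0.2], n_steps=500)
print(all(agree_after(alloc, paint, K, n) for n in range(1, 200)))
```

Disagreements between the two partitions are confined to elements seen before the relevant table has been opened, which is why the shifted partitions merge as $n$ grows.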

Sampling frequencies in size-biased order
Let $C_1, C_2, \ldots$ be the list of clusters of an exchangeable random partition $\Pi_\infty$, in the appearance order of their least elements. Let $M_{i,1} := \min C_i$, and assuming that $C_i$ is infinite let $M_{i,1} < M_{i,2} < \cdots$ be the elements of $C_i$ listed in increasing order. So in particular $1 = M_{1,1} < M_{2,1} < \cdots$ is the list of least elements of clusters $C_1, C_2, \ldots$. Observe that the number of clusters $K_n$ of $\Pi_n$ is $K_n = \sum_{i=1}^{n} 1(M_{i,1} \le n)$. Let $X$ be the number of distinct clusters of $\Pi_\infty$, including the first cluster, which appear before the second element of the first cluster appears at time $M_{1,2}$. That is, with $K(n)$ instead of $K_n$ for ease of reading:
$$X := K(M_{1,2} - 1). \tag{3.1}$$
As explained below, if $\Pi_\infty$ is exchangeable with proper random frequencies $(P_j)$ in size-biased order, then $X$ has the same distribution as a size-biased pick from $(P_j)$:
$$\mathbb{P}(X = j) = \mathbb{E} P_j \qquad (j = 1, 2, \ldots). \tag{3.2}$$
An extended form of this identity in distribution, giving an explicit construction from $\Pi_\infty$ of an i.i.d. sample of arbitrary size from the frequencies of $\Pi_\infty$ in size-biased order, played a key role in [20]. If $(P^{(1)}_j)$ is defined by the limiting cluster frequencies of $\Pi_\infty$ in their order of discovery in the restriction of $\Pi_\infty$ to $\{2, 3, \ldots\}$, then it is easily seen from Kingman's paintbox construction of $\Pi_\infty$ that $X$ really is a size-biased pick from $(P^{(1)}_j)$:
$$\mathbb{P}(X = j \mid (P^{(1)}_i)) = P^{(1)}_j \qquad (j = 1, 2, \ldots), \tag{3.3}$$
from which (3.2) follows by taking expectations, since by exchangeability $(P^{(1)}_j)$ has the same distribution as $(P_j)$. But if $(P_j)$ is taken to be the frequencies of clusters of $\Pi_\infty$ in their order of discovery in $\{1, 2, 3, \ldots\}$, then (3.3) with $(P_j)$ in place of $(P^{(1)}_j)$ is typically false, which makes (3.2) much less obvious. This gives the identity (3.2) a "now you see it, now you don't" quality. You see it by conditioning on the frequencies of clusters of $\Pi_\infty \cap [2, \infty)$ in their order of appearance, but you don't see it by conditioning on the frequencies of $\Pi_\infty$ in their usual order of least elements. It is instructive to see exactly what is the conditional distribution of $X$ given $(P_j)$, for $(P_j)$ the original frequencies of $\Pi_\infty$ in order of appearance.
To deal with non-proper frequencies let us extend the definition (3.1) by setting $X = \infty$ if the cluster $C_1 = \{1\}$ in $\Pi_\infty$. As indicated in the Introduction, the conditional distribution of any exchangeable random partition $\Pi_\infty$ of $\mathbb{N}$, given its list of cluster frequencies $(P_j)$ in order of appearance, is that of the extreme partially exchangeable random partition of $\mathbb{N}$ with the given cluster frequencies $(P_j)$. Regarding $(P_j)$ as a list of fixed frequencies $P_j \ge 0$ with $\sum_j P_j \le 1$, the distribution of this random partition $\Pi_\infty$ is described by the extreme CRP with fixed frequencies $(P_j)$. In terms of the Chinese Restaurant metaphor in this model [19, Section 3.1], customer 1 sits at table 1; thereafter, given $k$ tables are occupied and there are $n_i$ customers at table $i$ for $1 \le i \le k$ with $n_1 + \cdots + n_k = n$, customer $n+1$ sits at table $i$ with probability $P_i$ for $1 \le i \le k$, and at the new table $k+1$ with probability $1 - P_1 - \cdots - P_k$. (3.4)
Formally, "customer $i$ sits at table $j$" means in present notation that $i \in C_j$. The identity (3.2) now becomes the special case when $\Pi_\infty$ is fully exchangeable of the following description of the law of $X$ given $(P_j)$ for any partially exchangeable random partition $\Pi_\infty$ of $\mathbb{N}$ with limit frequencies $(P_j)$:
Proposition 3.1. Let $\Pi_\infty$ be a partially exchangeable random partition of $\mathbb{N}$ with limit frequencies $(P_j)$, and let $X$ be defined as above by (3.1), with $X = \infty$ if $C_1 = \{1\}$. Then
• the event $(X < \infty)$ equals the event $(P_1 > 0)$;
• the conditional distribution of $X$ given $(P_j)$ is defined by the stick-breaking formula
$$\mathbb{P}(X = j \mid (P_i)) = H_j \prod_{i=1}^{j-1} (1 - H_i) \qquad (j = 1, 2, \ldots) \tag{3.5}$$
with
$$H_1 := P_1 \quad \text{and} \quad H_j := \frac{P_1}{1 - P_2 - P_3 - \cdots - P_j} \quad \text{for } j = 2, 3, \ldots; \tag{3.6}$$
• the unconditional probability $\mathbb{P}(X = j)$ is the expected value of the product in (3.5);
• if $\Pi_\infty$ is exchangeable then $\mathbb{P}(X = j) = \mathbb{E} P_j$ for all $j = 1, 2, \ldots$, meaning that $X$ has the same distribution as a size-biased pick from $(P_j)$.
Proof. The first claim follows directly from the definitions, since $X = \infty$ iff $\{1\}$ is a singleton cluster of $\Pi_\infty$, which is equivalent to $P_1 = 0$. By the general theory of exchangeable and partially exchangeable random partitions recalled in the Introduction, it is enough to prove the formula (3.5) for an arbitrary fixed sequence of frequencies $(P_j)$. The case $j = 1$, with $\mathbb{P}[X = 1] = P_1$, is obvious from the extreme CRP (3.4), because the event $(X = 1)$ is identical to the event $(M_{1,2} = 2)$ that the second customer is seated at table 1. Consider next the event $(X = 2) = (2 = M_{2,1} < M_{1,2} < M_{3,1})$. Conditioning on the value $\ell$ of $M_{1,2} - M_{2,1} - 1$ on this event, the extreme CRP description (3.4) gives
$$\mathbb{P}[X = 2] = (1 - P_1) \sum_{\ell=0}^{\infty} P_2^{\ell} P_1 = (1 - P_1) \frac{P_1}{1 - P_2} = (1 - H_1) H_2$$
for $H_j$ as in (3.6). Similarly,
$$\mathbb{P}[X = 3] = (1 - P_1) \sum_{\ell=0}^{\infty} P_2^{\ell} (1 - P_1 - P_2) \sum_{m=0}^{\infty} (P_2 + P_3)^{m} P_1 = (1 - H_1)(1 - H_2) H_3,$$
and so on. This gives the stick-breaking formula (3.5). Taking expectations gives the unconditional distribution of $X$.
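The stick-breaking law of $X$ can be checked exactly for a small example. The sketch below takes the formula $\mathbb{P}(X = j \mid (P_i)) = H_j \prod_{i<j}(1-H_i)$ with $H_1 = P_1$ and $H_j = P_1/(1 - P_2 - \cdots - P_j)$, assumes a finite list of proper frequencies with $P_1 > 0$, and averages over all size-biased orderings of the hypothetical ranked frequencies $(1/2, 1/3, 1/6)$ to compare $\mathbb{P}(X = j)$ with $\mathbb{E} P_j$. Function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def law_of_X(P):
    """[P(X = 1), ..., P(X = len(P))] for fixed frequencies P with P[0] > 0."""
    law, survive, tail = [], Fraction(1), Fraction(0)
    for j, P_j in enumerate(P, start=1):
        if j == 1:
            H_j = P[0]
        else:
            tail += P_j                     # P_2 + ... + P_j
            H_j = P[0] / (1 - tail)
        law.append(survive * H_j)           # H_j * prod_{i<j} (1 - H_i)
        survive *= 1 - H_j
    return law

def size_biased_weight(order):
    """Probability that a size-biased permutation produces this ordering."""
    weight, used = Fraction(1), Fraction(0)
    for q in order:
        weight *= q / (1 - used)
        used += q
    return weight

RANKED = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))   # hypothetical ranked frequencies
orders = list(permutations(RANKED))
mean_law = [sum(size_biased_weight(o) * law_of_X(o)[j] for o in orders) for j in range(3)]
mean_freq = [sum(size_biased_weight(o) * o[j] for o in orders) for j in range(3)]

print(law_of_X(RANKED))        # conditional law for one fixed ordering
print(mean_law == mean_freq)   # P(X = j) equals E[P_j] in the exchangeable case
```

For a single fixed ordering the law of $X$ generally differs from the frequencies themselves; the agreement appears only after averaging over the size-biased order.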
Proof of Corollary 1.2. The fact that $(P^{(n)}_j)$, $n = 0, 1, \ldots$ is a Markov chain with stationary transition probabilities as indicated follows easily from the description (3.4) of the extreme CRP. If the cluster $C$ containing $n$ is a singleton in $\Pi_\infty$, then $P_1 = P^{(n-1)}_1 = 0$ and the frequencies $(P^{(n)}_j)$ of $\Pi_\infty$ restricted to $\{n+1, n+2, \ldots\}$ are $(P_2, P_3, \ldots)$. Otherwise the cluster $C$ is infinite and it obtains a new place as in Proposition 3.1. The rest of the corollary follows easily from the theorem and the general theory of partially exchangeable random partitions of $\mathbb{N}$ presented in the Introduction.
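The forward transition can be illustrated on a finite state space. The sketch below assumes three proper frequencies, all positive (hypothetical values), and reads the top to random move as sending $(P_1, P_2, \ldots)$ to $(P_2, \ldots, P_X, P_1, P_{X+1}, \ldots)$ with $X$ drawn from the stick-breaking law of Proposition 3.1. It then checks with exact arithmetic that the law of a size-biased permutation is invariant under one step. Helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

def law_of_X(P):
    """Stick-breaking law: H_1 = P_1, H_j = P_1 / (1 - P_2 - ... - P_j)."""
    law, survive, tail = [], Fraction(1), Fraction(0)
    for j in range(1, len(P) + 1):
        if j == 1:
            H_j = P[0]
        else:
            tail += P[j - 1]
            H_j = P[0] / (1 - tail)
        law.append(survive * H_j)
        survive *= 1 - H_j
    return law

def top_to_random(P, x):
    """Move the first frequency to position x: (P_2, ..., P_x, P_1, P_{x+1}, ...)."""
    return P[1:x] + (P[0],) + P[x:]

def size_biased_law(ranked):
    """Law of a size-biased permutation of the given distinct frequencies."""
    law = {}
    for order in permutations(ranked):
        weight, used = Fraction(1), Fraction(0)
        for q in order:
            weight *= q / (1 - used)
            used += q
        law[order] = weight
    return law

def one_step(law):
    """Push a law on orderings forward through one top to random move."""
    new = defaultdict(Fraction)
    for order, weight in law.items():
        for x, prob in enumerate(law_of_X(order), start=1):
            new[top_to_random(order, x)] += weight * prob
    return dict(new)

RANKED = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))   # hypothetical frequencies
stationary = size_biased_law(RANKED)
print(one_step(stationary) == stationary)    # invariance of the size-biased order
```

Starting instead from a point mass on a single ordering, `one_step` spreads the mass over several orderings, so that law is not invariant.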

Related literature
There is a substantial literature on various models of partial exchangeability for sequences and arrays of random variables, which has been surveyed in [12]. The article [4, §6.2] places the theory of partially exchangeable partitions of $\mathbb{N}$ in a larger context of boundary theory for Markov chains evolving as a sequence of connected subsets of a directed acyclic graph that grow in the following way: initially, all vertices of the graph are unoccupied, particles are fed in one-by-one at a distinguished source vertex, successive particles proceed along directed edges according to an appropriate stochastic mechanism, and each particle comes to rest once it encounters an unoccupied vertex. The article [9] discusses questions related to the size of the first cluster in a PEP, and its interaction with other clusters. Gnedin [10] indicates an application of PEPs to records in a partially ordered set.