Convergence Rates of Markov Chains on Spaces of Partitions

We study the convergence rate to stationarity for a class of exchangeable partition-valued Markov chains called cut-and-paste chains. The law governing the transitions of a cut-and-paste chain is determined by products of i.i.d. stochastic matrices, which describe the chain induced on the simplex by taking asymptotic frequencies. Using this representation, we establish upper bounds for the mixing times of ergodic cut-and-paste chains, and under certain conditions on the distribution of the governing random matrices we show that the "cutoff phenomenon" holds.


INTRODUCTION
A Markov chain {X_t}_{t=0,1,2,...} on the space [k]^N of k-colorings of the positive integers N is said to be exchangeable if its transition law is equivariant with respect to finite permutations of N (that is, permutations that fix all but finitely many elements of N). Exchangeability does not imply that the Markov chain has the Feller property (relative to the product topology on [k]^N), but if a Markov chain is both exchangeable and Feller then it has a simple paintbox representation, as proved by Crane [3]. In particular, there exists a sequence {S_t}_{t≥1} of i.i.d. k × k random column-stochastic matrices (the paintbox sequence) such that, conditional on the entire sequence {S_t}_{t≥1} and on X_0, X_1, . . . , X_m, the coordinate random variables {X^i_{m+1}}_{i∈[n]} are independent, and X^i_{m+1} has the multinomial distribution specified by the X^i_m-th column of S_{m+1}. Equivalently (see Proposition 3.3 in section 3.3), conditional on the paintbox sequence, the coordinate sequences {X^i_m}_{m≥0} are independent, time-inhomogeneous Markov chains on the state space [k] with one-step transition probability matrices S_1, S_2, . . . . This implies that for any integer n ≥ 1 the restriction X^[n]_t of X_t to the space [k]^[n] is itself a Markov chain. We shall refer to such Markov chains X_t and X^[n]_t as exchangeable Feller cut-and-paste chains, or EFCP chains for short. Under mild hypotheses on the paintbox distribution (see the discussion in section 5) the restrictions of EFCP chains X^[n]_t to the finite configuration spaces [k]^[n] are ergodic. The main results of this paper, Theorems 1.1 and 1.2, relate the convergence rates of these chains to properties of the paintbox process S_1, S_2, . . . .

Theorem 1.1. Assume that for some m ≥ 1 there is positive probability that all entries of the matrix product S_m S_{m−1} · · · S_1 are nonzero. Then the EFCP chain X^[n] is ergodic, and it mixes in O(log n) steps.
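The paintbox representation lends itself to direct simulation. The following sketch (Python with NumPy) runs an EFCP chain on [k]^[n] by drawing a fresh paintbox matrix at each step and recoloring every site independently from the column indexed by its current color; taking Σ to be the law of a matrix with i.i.d. uniform Dirichlet columns is an illustrative assumption, not a choice made in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_column_stochastic(k, rng):
    """Draw a k x k column-stochastic matrix with i.i.d. uniform (Dirichlet(1,...,1)) columns."""
    return rng.dirichlet(np.ones(k), size=k).T   # transpose so that columns sum to 1

def paintbox_step(x, S, rng):
    """One EFCP step on [k]^[n]: site i with color c receives a new color drawn
    from the multinomial distribution given by column c of the paintbox matrix S."""
    k = S.shape[0]
    return np.array([rng.choice(k, p=S[:, c]) for c in x])

n, k = 30, 3
x = rng.integers(0, k, size=n)        # arbitrary initial coloring X_0
for _ in range(10):                    # ten steps of the chain
    x = paintbox_step(x, random_column_stochastic(k, rng), rng)
```

Conditional on the sequence of matrices drawn, the n coordinates evolve as independent time-inhomogeneous chains, exactly as in the representation above.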

Theorem 1.2.
Assume that the distribution of S_1 is absolutely continuous relative to Lebesgue measure on the space of k × k column-stochastic matrices, with density of class L^p for some p > 1. Then the associated EFCP chains X^[n] exhibit the cutoff phenomenon: there exists a positive constant θ such that for all sufficiently small δ, ε > 0 the (total variation) mixing times satisfy

(1)    (θ − δ) log n ≤ t^(n)_MIX(ε) ≤ t^(n)_MIX(1 − ε) ≤ (θ + δ) log n

for all sufficiently large n.
Formal statements of these theorems will be given in due course (see Theorems 5.4 and 5.7 in section 5), where less stringent hypotheses for the O(log n) convergence rate will also be given. In the special case k = 2 the results are related to some classical results for random walks on the hypercube, e.g. the Ehrenfest chain on {0, 1}^n: see Example 5.9.
The key to both results is that the relative frequencies of the different colors are determined by the random matrix products S_t S_{t−1} · · · S_1 (see Proposition 3.3). The hypotheses of Theorem 1.1 ensure that these matrix products contract the k-simplex to a point at least exponentially rapidly. The stronger hypotheses of Theorem 1.2 prevent the simplex from collapsing at a faster than exponential rate.
The paper is organized as follows. In section 2 we record some elementary facts about total variation distance, and in section 3 we define cut-and-paste Markov chains formally and establish the basic relation with the paintbox sequence (Proposition 3.3). In section 4 we discuss the contractivity properties of products of random stochastic matrices. In section 5 we prove the main results concerning ergodicity and mixing rates of cut-and-paste chains, and in section 5.3 we discuss some examples of cut-and-paste chains not covered by our main theorems. Finally, in section 6 we deduce mixing rate and cutoff for projections of the cut-and-paste chain into the space of ordinary set partitions.

PRELIMINARIES: TOTAL VARIATION DISTANCE
Since the state spaces of interest in our main results are finite, it is natural to use the total variation metric to measure the distance between the law D(X_m) of the chain X at time m ≥ 1 and its stationary distribution π. The total variation distance ‖µ − ν‖_TV between two probability measures µ, ν on a finite or countable set X is defined by

(2)    ‖µ − ν‖_TV = (1/2) Σ_{x∈X} |µ(x) − ν(x)| = max_{B⊂X} (ν(B) − µ(B)).
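As a concrete check of (2), the following snippet (Python/NumPy, with illustrative probability vectors) computes the total variation distance both as half the ℓ1 distance and as the maximal discrepancy ν(B) − µ(B), which is attained at the set B* = {x : ν(x) ≥ µ(x)}.

```python
import numpy as np

def tv_distance(mu, nu):
    """Total variation distance: half the l1 distance between probability vectors."""
    return 0.5 * np.abs(mu - nu).sum()

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.25, 0.25, 0.5])

# The maximum in (2) is attained at B* = {x : nu(x) >= mu(x)}.
Bstar = nu >= mu
assert np.isclose(tv_distance(mu, nu), nu[Bstar].sum() - mu[Bstar].sum())
```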
The maximum is attained at B* = {x : ν(x) ≥ µ(x)} and, since the indicator 1_{B*} is a function only of the likelihood ratio dν/dµ, the total variation distance ‖µ − ν‖_TV is the same as the total variation distance between the µ- and ν-distributions of any sufficient statistic. In particular, if Y = Y(x) is a random variable such that dν/dµ is a function of Y, then

(3)    ‖µ − ν‖_TV = (1/2) Σ_y |µ{Y = y} − ν{Y = y}|,

where the sum is over all possible values of Y(x). Likelihood ratios provide a useful means for showing that two probability measures are close in total variation distance.

Lemma 2.1. Fix ε > 0, and define B_ε := {x ∈ X : |dµ/dν(x) − 1| ≥ ε}. If ν(B_ε) < ε, then ‖µ − ν‖_TV < 2ε.
Proof. By the definition of B_ε, the likelihood ratio dµ/dν lies within ε of 1 on B_ε^c; splitting the sum in (2) over B_ε and B_ε^c gives the bound.

The convergence rates of EFCP chains will be (in the ergodic cases) determined by the contractivity properties of products of random stochastic k × k matrices on the (k − 1)-dimensional simplex

∆_k := {s = (s_1, . . . , s_k) ∈ R^k : s_i ≥ 0 for all i and Σ_i s_i = 1}.

We now record some preliminary lemmas about convergence of probability measures on ∆_k that we will need later. For each n ∈ N and each element s ∈ ∆_k define a probability measure ̺^n_s, the product multinomial-s measure on [k]^n, by

(5)    ̺^n_s(x_1 x_2 · · · x_n) := Π_{i=1}^n s_{x_i}.

Observe that the vector m(x) := (m_1, . . . , m_k) of cell counts defined by m_j := #{i ∈ [n] : x_i = j} is a sufficient statistic for the family {̺^n_s}_{s∈∆_k}.

Lemma 2.2. Let s_n, s'_n ∈ ∆_k be sequences whose entries are bounded away from 0, and suppose that √n ‖s_n − s'_n‖ → 0 as n → ∞. Then ‖̺^n_{s_n} − ̺^n_{s'_n}‖_TV → 0.

Proof. This is a routine consequence of Lemma 2.1, as the hypotheses ensure that the likelihood ratio d̺^n_{s_n}/d̺^n_{s'_n} is uniformly close to 1 with probability approaching 1 as n → ∞.
A similar argument can be used to establish the following generalization, which is needed in the case of partitions with k ≥ 3 classes. For s_1, . . . , s_k ∈ ∆_k, let ̺^{n_1}_{s_1} ⊗ · · · ⊗ ̺^{n_k}_{s_k} denote the product measure on [k]^{n_1+···+n_k} under which the first n_1 coordinates are i.i.d. multinomial-s_1, the next n_2 are i.i.d. multinomial-s_2, and so on.
In dealing with probability measures that are defined as mixtures, the following simple tool for bounding total variation distance is useful.

Lemma 2.4. Let µ, ν be probability measures on a finite or countable space X that are both mixtures with respect to a common mixing probability measure λ(dθ), that is, such that there are probability measures µ_θ and ν_θ for which µ = ∫ µ_θ dλ(θ) and ν = ∫ ν_θ dλ(θ). Then ‖µ − ν‖_TV ≤ ∫ ‖µ_θ − ν_θ‖_TV dλ(θ).
Lower bounds on the total variation distance between two probabilities µ, ν are often easier to establish than upper bounds, because for this one need only find a particular set B such that µ(B) − ν(B) is large. By (3), it suffices to look at sets of the form B = {Y ∈ F}, where Y is a sufficient statistic. The following lemma for product Bernoulli measures illustrates this strategy. For α ∈ [0, 1], we write ν^n_α := ̺^n_s, where s := (α, 1 − α) ∈ ∆_2, to denote the product Bernoulli measure determined by α.
These statements follow directly from Lemma 2.5 by projection on the appropriate coordinate variable.
Thus, the multinomial-s measure ̺^n_s defined in the previous section induces a measure on L_{[n]:k}, which we will also denote by ̺^n_s. There is an obvious and natural projection Π_n : L_{[n]:k} → P_{[n]:k}. This mapping coincides with the natural projection of L_{[n]:k} onto the quotient space L_{[n]:k}/∼, where ∼ is the equivalence relation l_1 l_2 · · · l_n ∼ l*_1 l*_2 · · · l*_n if and only if there exists a permutation σ of [k] such that l*_i = σ(l_i) for each i ∈ [n]. Some of the Markov chains on L_{[n]:k} considered below have transition laws invariant under such permutations σ of the labels [k], and in such cases the Markov chain projects via Π_n to a Markov chain on the state space P_{[n]:k}. This is discussed further in section 6 below.
We write M_{[n]:k} to denote the space of k × k partition matrices of [n]. Observe that the matrix product defined by (7) makes sense for matrices with entries in any distributive lattice, provided ∪, ∩ are replaced by the lattice operations.
The proof is elementary and follows mostly from the definition (7) (the semigroup identity is the matrix whose diagonal entries are all [n] and whose off-diagonal entries are ∅). We now describe the role of the semigroup (M_{[n]:k}, *) in describing the transitions of the cut-and-paste Markov chain.
3.3. Cut-and-paste Markov chains. Fix n, k ∈ N, let µ be a probability measure on M_{[n]:k}, and let ̺_0 be a probability measure on L_{[n]:k}. The cut-and-paste Markov chain X = (X_m)_{m≥0} on L_{[n]:k} with initial distribution ̺_0 and directing measure µ is constructed as follows. Let X_0 ∼ ̺_0 and, independently of X_0, let M_1, M_2, . . . be i.i.d. according to µ. Define

(8)    X_m := M_m * X_{m−1},    m ≥ 1,

where * denotes the natural action of the semigroup M_{[n]:k} on L_{[n]:k}. We call any Markov chain with the above dynamics a CP_n(µ; ̺_0) chain, or simply a CP_n(µ) chain if the initial distribution is unspecified. Henceforth we will use the notation X^i_m to denote the ith coordinate variable in X_m (that is, X^i_m is the color of the site i ∈ [n] when X_m is viewed as an element of [k]^[n]).
Our main results concern the class of cut-and-paste chains whose directing measures µ = µ_Σ are mixtures of product multinomial measures µ_S, where S ranges over the set ∆_k^k of k × k column-stochastic matrices. For any S ∈ ∆_k^k, the product multinomial measure µ_S is defined by

(9)    µ_S({M}) := Π_{j=1}^k Π_{i=1}^n S(M_j(i), j),

where M_j(i) = Σ_r r 1{i ∈ M^r_j} denotes the index r of the row such that i is an element of M^r_j.
(In other words, the columns of M ∼ µ_S are independent labeled k-ary partitions, and in each column M_j the elements i ∈ [n] are independently assigned to rows r ∈ [k] according to draws from the multinomial distribution determined by the jth column of S.) For any Borel probability measure Σ on ∆_k^k, we write µ_Σ to denote the Σ-mixture of the measures µ_S on M_{[n]:k}, that is,

(10)    µ_Σ := ∫_{∆_k^k} µ_S Σ(dS).

Crane [3] has shown that every exchangeable, Feller Markov chain on the space [k]^N of k-colorings of the positive integers is a cut-and-paste chain with directing measure of the form (10), and so henceforth we shall refer to such chains as exchangeable Feller cut-and-paste chains, or EFCP chains for short. An EFCP chain on [k]^[n] (or [k]^N) with directing measure µ = µ_Σ can be constructed in two steps, as follows. First, choose i.i.d. stochastic matrices S_1, S_2, . . . with law Σ, all independent of X_0; second, given X_0, S_1, S_2, . . ., let M_1, M_2, . . . be conditionally independent k-ary partition matrices with laws M_i ∼ µ_{S_i} for each i = 1, 2, . . ., and define the cut-and-paste chain X_m by equation (8). This construction is fundamental to our arguments, and so henceforth, when considering an EFCP chain with directing measure µ_Σ, we shall assume that it is defined on a probability space together with a paintbox sequence S_1, S_2, . . . . For each m ∈ N, set

(11)    Q_m := S_m S_{m−1} · · · S_1.

Note that Q_m is itself a stochastic matrix. Denote by S the σ-algebra generated by the paintbox sequence S_1, S_2, . . . .
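The two-step construction can be sketched directly (Python/NumPy). Storing the partition matrix M as, for each column j, the array of row indices M_j(i) is an implementation convenience rather than notation from the text, and the Dirichlet-column law for Σ is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 12

def sample_partition_matrix(S, n, rng):
    """M ~ mu_S: in column j, each i in [n] is assigned independently to row r
    with probability S[r, j].  Entry rows[j, i] is the row index M_j(i)."""
    k = S.shape[0]
    return np.vstack([rng.choice(k, size=n, p=S[:, j]) for j in range(k)])

def cut_and_paste_step(x, rows):
    """Update (8): the new color of site i is the row of column x[i] that contains i."""
    return rows[x, np.arange(len(x))]

S = rng.dirichlet(np.ones(k), size=k).T   # paintbox matrix S_1 (illustrative Sigma)
x0 = rng.integers(0, k, size=n)           # initial labeled partition X_0, as a coloring
x1 = cut_and_paste_step(x0, sample_partition_matrix(S, n, rng))
```

Conditional on S, the coordinates of x1 are independent, with site i distributed according to column x0[i] of S, in line with Proposition 3.3 below.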

Proposition 3.3. Given the σ-algebra S generated by the paintbox sequence, the coordinate processes {X^i_m}_{m≥0}, for i ∈ [n], are conditionally independent versions of a time-inhomogeneous Markov chain on [k] with one-step transition probability matrices S_1, S_2, . . . . Thus, in particular, for each m ≥ 1,

(12)    P(X_m = x_1 x_2 · · · x_n | S ∨ σ(X_0)) = Π_{i=1}^n Q_m(x_i, X^i_0).

Proof. We prove that the Markov property holds by induction on m. The case m = 1 follows directly by (9), as this implies that, conditional on S, the coordinate random variables X^i_1 are independent, with multinomial marginal conditional distributions given by the columns of S_1. Assume, then, that the assertion is true for some m ≥ 1. Let F_m be the σ-algebra generated by S and the random matrices M_1, M_2, . . . , M_m. Since the specification (8) expresses X_m as a function of X_0, M_1, M_2, . . . , M_m, the random variables X^i_t, where t ≤ m, are measurable with respect to F_m. Moreover, given S the random matrix M_{m+1} is conditionally independent of F_m, with conditional distribution (9) where S = S_{m+1}. Equation (9) implies that, conditional on S, the columns M^c_{m+1} of M_{m+1} are independent k-ary partitions obtained by independent multinomial-S^c sampling. Consequently, the conditional independence of the coordinate processes extends to time m + 1: the second equality follows by the induction hypothesis and the third by definition of the probability measure µ_{S_{m+1}}. This proves the first assertion of the proposition. The equation (12) follows directly.

Proposition 3.3 shows that for any n ≥ 1 a version of the EFCP on [k]^[n] can be constructed by first generating a paintbox sequence S_m and then, conditional on S, running independent, time-inhomogeneous Markov chains X^i_m with one-step transition probability matrices S_m. From this construction it is evident that a version of the EFCP on the infinite state space [k]^N can be constructed by running countably many conditionally independent Markov chains X^i_m, and that for any n ∈ N the projection of this chain to the first n coordinates is a version of the EFCP on [k]^[n].

RANDOM STOCHASTIC MATRIX PRODUCTS
For any EFCP chain {X_m}_{m≥0}, Proposition 3.3 directly relates the conditional distribution of X_m to the product Q_m = S_m S_{m−1} · · · S_1 of i.i.d. random stochastic matrices. Thus, the rates of convergence of these chains are at least implicitly determined by the contractivity properties of the random matrix products Q_m. The asymptotic behavior of i.i.d. random matrix products has been thoroughly investigated, beginning with the seminal paper of Furstenberg and Kesten [4]: see [1] and [5] for extensive reviews. However, the random matrices S_i that occur in the paintbox representation of the CP_n(µ_Σ) chain are not necessarily invertible, so much of the theory developed in [1] and [5] does not apply. On the other hand, the random matrices S_t are column-stochastic, and so the deeper results of [1] and [5] are not needed here. In this section we collect the results concerning the contraction rates of the products Q_m needed for the study of the EFCP chains, and give elementary proofs of these results.
Throughout this section assume that {S_i}_{i≥1} is a sequence of independent, identically distributed k × k random column-stochastic matrices, with common distribution Σ, and let Q_m := S_m S_{m−1} · · · S_1.

4.1. Asymptotic Collapse of the Simplex.
In the theory of random matrix products, a central role is played by the induced action on projective space. In the theory of products of random stochastic matrices an analogous role is played by the action of the matrices on the simplex ∆_k. By definition, the simplex ∆_k consists of all convex combinations of the unit vectors e_1, e_2, . . . , e_k of R^k; since each column of a k × k column-stochastic matrix S ∈ ∆_k^k lies in ∆_k, the mapping v → Sv preserves ∆_k. This mapping is contractive in the sense that it is Lipschitz (relative to the usual Euclidean metric on R^k) with Lipschitz constant ≤ 1.
The simplex ∆_k is contained in a translate of the (k − 1)-dimensional vector subspace V = V_k of R^k consisting of all vectors orthogonal to the vector 1 = (1, 1, . . . , 1)^T (equivalently, the subspace with basis e_i − e_{i+1}, where 1 ≤ i ≤ k − 1). Any stochastic matrix A leaves the subspace V invariant, and hence induces a linear transformation A|V : V → V. Since this transformation is contractive, its singular values are all between 0 and 1. (Recall that the singular values of a d × d matrix S are the square roots of the eigenvalues of the nonnegative definite matrix S^T S; equivalently, they are the lengths of the principal axes of the ellipsoid obtained as the image of the unit ball under S.) Denote by λ_{n,1} ≥ λ_{n,2} ≥ · · · ≥ λ_{n,k−1} the singular values of Q_n|V. Because the induced mapping Q_n : ∆_k → ∆_k is affine, its Lipschitz constant is just the largest singular value λ_{n,1}.

Proposition 4.1. Let (S_i)_{i≥1} be independent, identically distributed k × k column-stochastic random matrices, and let Q_m = S_m S_{m−1} · · · S_1. Then the asymptotic collapse property

(14)    lim_{m→∞} diameter(Q_m(∆_k)) = 0 almost surely

holds if and only if there exists m ≥ 1 such that with positive probability the largest singular value λ_{m,1} of Q_m|V is strictly less than 1. In this case, the diameter of Q_m(∆_k) in fact decays exponentially in m.

Proof. In order that the asymptotic collapse property (14) holds it is necessary that for some m the largest singular value of Q_m|V be less than one. (If not, then for each m there would exist points u_m, v_m ∈ ∆_k such that the length of Q_m(u_m − v_m) is at least the length of u_m − v_m; but this would contradict (14).) Conversely, if for some ε > 0 the largest singular value of Q_m|V is less than 1 − ε with positive probability, then with probability 1 infinitely many of the matrix products S_{mn+m} S_{mn+m−1} · · · S_{mn+1} have largest singular value less than 1 − ε. Hence, the Lipschitz constant of the mapping on ∆_k induced by Q_{mn} must converge to 0 as n → ∞. In fact even more is true: the asymptotic fraction as n → ∞ of blocks where S_{mn+m} S_{mn+m−1} · · · S_{mn+1} has largest singular value < 1 − ε is positive, by the strong law of large numbers, and so the Lipschitz constant of Q_{mn} : ∆_k → ∆_k decays exponentially.

Corollary 4.2. If for some m ≥ 1 the product Q_m has all entries strictly positive with positive probability, then the asymptotic collapse property (14) holds.

Proof. It is well known that if a stochastic matrix has all entries strictly positive then its only eigenvalue of modulus 1 is 1, and this eigenvalue is simple (see, for instance, the discussion of the Perron-Frobenius theorem in the appendix of [7]). Consequently, if Q_m has all entries positive then λ_{m,1} < 1.
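The restriction to V can be computed numerically by fixing an orthonormal basis of V. The following sketch (Python/NumPy, with an illustrative all-positive matrix) computes the singular values of S|V and checks that the largest one is strictly less than 1:

```python
import numpy as np

def singular_values_on_V(S):
    """Singular values of S|V, where V is the hyperplane orthogonal to (1,...,1)^T.
    A column-stochastic S maps V into V because 1^T S = 1^T."""
    k = S.shape[0]
    # Orthonormal basis of V: complete 1/sqrt(k)*(1,...,1) to an orthonormal basis via QR.
    Q, _ = np.linalg.qr(np.column_stack([np.ones(k), np.eye(k)[:, :k - 1]]))
    B = Q[:, 1:]                                    # k x (k-1); columns span V
    assert np.allclose(S @ B, B @ (B.T @ S @ B))    # S really does preserve V
    return np.linalg.svd(B.T @ S @ B, compute_uv=False)

S = np.array([[0.6, 0.2, 0.3],
              [0.2, 0.5, 0.3],
              [0.2, 0.3, 0.4]])    # column-stochastic, all entries positive
svals = singular_values_on_V(S)
```

The intermediate assertion verifies that S B lies in V (so the (k−1) × (k−1) matrix B^T S B faithfully represents S|V), and the returned values are the λ's discussed above.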

4.2.
The induced Markov chain on the simplex. The sequence of random matrix products (Q_m)_{m≥1} induces a Markov chain on the simplex ∆_k in the obvious way: for any initial state Y_0 = y ∈ ∆_k, set Y_m := S_m Y_{m−1} = Q_m y for m ≥ 1. That the sequence {Y_m}_{m≥0} is a Markov chain follows from the assumption that the matrices S_i are i.i.d. Since matrix multiplication is continuous, the induced Markov chain is Feller (relative to the usual topology on ∆_k). Consequently, since ∆_k is compact, the induced chain has a stationary distribution, by the usual Bogoliubov-Krylov argument (see, e.g., [10]).
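Under a paintbox law with a density (Dirichlet columns below, an illustrative assumption), two copies of the induced chain driven by the same matrices coalesce, reflecting the collapse property (14); a minimal simulation:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 3
y = np.array([1.0, 0.0, 0.0])     # two initial points of the simplex: vertices e_1 and e_3
z = np.array([0.0, 0.0, 1.0])
for _ in range(200):
    S = rng.dirichlet(np.ones(k), size=k).T   # S_m, i.i.d. column-stochastic
    y, z = S @ y, S @ z                       # Y_m = S_m Y_{m-1}, SAME matrices for both copies
# Both iterates stay on the simplex, and the common matrix products force them together.
assert np.isclose(y.sum(), 1.0) and np.isclose(z.sum(), 1.0)
assert np.linalg.norm(y - z) < 1e-3
```

Since the two trajectories start at vertices of ∆_k, the final gap bounds the diameter of Q_m(∆_k) along this edge, giving a direct numerical glimpse of the exponential collapse.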

Proposition 4.4. The stationary distribution of the induced Markov chain on the simplex is unique if and only if the asymptotic collapse property (14) holds.
Proof of sufficiency. Let π be a stationary distribution, let Y_0 ∼ π, and let Ỹ_0 be an arbitrary random initial state, so that Y_m := Q_m Y_0 and Ỹ_m := Q_m Ỹ_0 are versions of the induced chain; since the distribution of Y_0 is stationary, Y_m ∼ π for every m ≥ 0. But the asymptotic collapse property (14) implies that ‖Y_m − Ỹ_m‖ → 0 almost surely as m → ∞, so the distribution of Ỹ_m approaches π weakly as m → ∞.
The converse is somewhat more subtle. Recall that the linear subspace V = V_k orthogonal to the vector 1 is invariant under multiplication by any stochastic matrix. Define U ⊂ V to be the set of unit vectors u in V such that ‖Q_m u‖ = 1 almost surely for every m ≥ 1. Clearly, the set U is a closed subset of the unit sphere in V, and it is also invariant, that is, Q_m(U) ⊂ U almost surely.

Lemma 4.5. The asymptotic collapse property (14) holds if and only if the set U is empty.

Proof. If (14) holds then lim_{m→∞} λ_{m,1} = 0, and so ‖Q_m u‖ → 0 a.s. for every unit vector u ∈ V.
To prove the converse statement, assume that the asymptotic collapse property (14) fails. Then by Proposition 4.1, for each m ≥ 1 the largest singular value of Q_m|V is λ_{m,1} = 1, and consequently there exist (possibly random) unit vectors v_m ∈ V such that ‖Q_m v_m‖ = 1. Since each matrix S_i is contractive, it follows that ‖Q_m v_{m+n}‖ = 1 for all m, n ≥ 1. Hence, by the compactness of the unit sphere and the continuity of the maps Q_m|V, there exists a possibly random unit vector u such that ‖Q_m u‖ = 1 for every m ≥ 1.
We will now show that there exists a non-random unit vector u such that ‖Q_m u‖ = 1 for every m, almost surely. Suppose to the contrary that there were no such u. For each unit vector u, let p_m(u) be the probability that ‖Q_m u‖ < 1. Since the matrices S_m are weakly contractive, for any unit vector u the events {‖Q_m u‖ = 1} are decreasing in m, and so p_m(u) is non-decreasing. Hence, by a subsequence argument, if for every m ≥ 1 there were a unit vector u_m such that p_m(u_m) = 0, then there would be a unit vector u such that p_m(u) = 0 for every m. But by assumption there is no such u; consequently, there must be some finite m ≥ 1 such that p_m(u) > 0 for every unit vector.
For each fixed m, the function p_m(u) is lower semi-continuous (by the continuity of matrix multiplication), and therefore attains a minimum on the unit sphere of V. Since p_m is strictly positive, it follows that there exists δ > 0 such that p_m(u) ≥ δ for every unit vector u. But if this is the case then there can be no random unit vector u such that ‖Q_m u‖ = 1 for every m ≥ 1, because for each m the event that ‖Q_{m+1} u‖ < ‖Q_m u‖ would have conditional probability (given S_1, S_2, . . . , S_m) at least δ.
Proof of necessity in Proposition 4.4. If the asymptotic collapse property (14) fails, then by Lemma 4.5 there exists a unit vector u ∈ V such that ‖Q_m u‖ = 1 for all m ≥ 1, almost surely. Hence, since ∆_k is contained in a translate of V, there exist distinct µ, ν ∈ ∆_k such that ‖Q_m(µ − ν)‖ = ‖µ − ν‖ for all m ≥ 1, a.s. By compactness, there exists such a pair (µ, ν) ∈ ∆_k^2 for which ‖µ − ν‖ is maximal. Fix such a pair (µ, ν), and let A ⊂ ∆_k^2 be the set of all pairs (y, z) such that ‖y − z‖ = ‖µ − ν‖ and ‖Q_m(y − z)‖ = ‖y − z‖ for all m ≥ 1, almost surely. Note that the set A is closed, and consequently compact. Furthermore, because µ, ν have been chosen so that ‖µ − ν‖ is maximal, for any pair (y, z) ∈ A the points y and z must both lie in the boundary ∂∆_k of the simplex.
Now fix (Y_0, Z_0) ∈ A, set R_0 := (Y_0 + Z_0)/2, and let (Y_m, Z_m, R_m) := (Q_m Y_0, Q_m Z_0, Q_m R_0). This is a ∆_k^3-valued Markov chain, each of whose projections on ∆_k is a version of the induced chain. Since ∆_k^3 is compact, the Bogoliubov-Krylov argument implies that the Markov chain (Y_m, Z_m, R_m) has a stationary distribution λ whose projection λ_{Y,Z} on the first two coordinates is supported by A. Each of the marginal distributions λ_Y, λ_Z, and λ_R is obviously stationary for the induced chain on the simplex, and both λ_Y and λ_Z have supports contained in ∂∆_k. Clearly, if (Y, Z, R) ∼ λ then R = (Y + Z)/2.
We may assume that λ_Y = λ_Z, for otherwise there is nothing to prove. We claim that λ_R ≠ λ_Y; granting the claim, the induced chain has at least two distinct stationary distributions. To see this, let D be the minimal integer such that λ_Y is supported by the union of the D-dimensional faces of ∂∆_k.

The k-fold product of Lebesgue measure on ∆_k will be referred to as Lebesgue measure on ∆_k^k.

Hypothesis 4.8. The distribution Σ of the random stochastic matrix S_1 is absolutely continuous with respect to Lebesgue measure on ∆_k^k and has a density of class L^p for some p > 1.
Hypothesis 4.8 implies that the conditional distribution of the ith column of S_1, given the other k − 1 columns, is absolutely continuous relative to Lebesgue measure on ∆_k. Consequently, the conditional probability that it is a linear combination of the other k − 1 columns is 0. Therefore, the matrices S_t are almost surely nonsingular, and so the products Q_n are almost surely nonsingular as well.

Proposition 4.9. Under Hypothesis 4.8,

(17)    E|log |det S_1|| < ∞,

and consequently

(18)    lim_{n→∞} |det Q_n|^{1/n} exists almost surely, and the limit is constant and positive.

Proof. The assertion (18) follows from (17), by the strong law of large numbers, since the determinant is multiplicative. It remains to prove (17). Fix ε > 0, and consider the event |det S_1| < ε. This event can occur only if the smallest singular value of S_1 is less than ε^{1/k}, and this can happen only if one of the vectors S_1 e_i lies within distance ε^{1/k} (or so) of a convex linear combination of the remaining S_1 e_j. The vectors S_1 e_i, where i ∈ [k], are the columns of S_1, whose distribution is assumed to have an L^p density f(M) with respect to Lebesgue measure dM on ∆_k^k. Fix an integer m ≥ 1, and consider the subset B_m of ∆_k^k consisting of all k × k stochastic matrices M such that the ith column Me_i lies within distance e^{−m} of the set of all convex combinations of the remaining columns Me_j. Elementary geometry shows that the set B_m has Lebesgue measure ≤ Ce^{−m}, for some constant C = C_k depending on the dimension but not on m or i. Consequently, by the Hölder inequality, P(B_m) ≤ C′e^{−m/q} for a suitable constant C′ < ∞, where q is the exponent conjugate to p; summing over m yields (17). In fact, this also shows that log |det S_1| has finite moments of all orders, and even a finite moment generating function in a neighborhood of 0.
Proposition 4.11. Under Hypothesis 4.8, the limit λ_1 := lim_{n→∞} λ_{n,1}^{1/n} exists a.s. Moreover, the limit λ_1 is constant and satisfies 0 < λ_1 < 1.

Proof of Proposition 4.11. The almost sure convergence follows from the Furstenberg-Kesten theorem [4] (or alternatively, Kingman's subadditive ergodic theorem [8]), because the largest singular value of Q_n|V is the matrix norm of Q_n|V, and the matrix norm is submultiplicative. That the limit λ_1 is constant follows from the Kolmogorov 0-1 law, because if the matrices S_j are nonsingular (as they are under the hypotheses on the distribution of S_1) the value of λ_1 will not depend on any initial segment S_m S_{m−1} · · · S_1 of the matrix products.
Finally, the assertion that λ_1 > 0 follows from Proposition 4.9, because for any stochastic matrix each singular value is bounded below by the determinant.

Corollary 4.12. Under Hypothesis 4.8,

(20)    lim_{n→∞} max_{i,j} ‖Q_n e_i − Q_n e_j‖^{1/n} = λ_1 almost surely.

Proof. The lim sup of the maximum cannot be greater than λ_1, because for each n the singular value λ_{n,1} of Q_n|V is just the matrix norm ‖Q_n|V‖. To prove the reverse inequality, assume the contrary. Then there is a subsequence n = n_m → ∞ along which lim sup_{m→∞} max_{i,j} ‖Q_n e_i − Q_n e_j‖^{1/n} < λ_1 − ε for some ε > 0. Denote by u = u_n ∈ V the unit vector that maximizes ‖Q_n u‖. Because the vectors e_i − e_{i+1} form a basis of V, for each n the vector u_n is a linear combination u_n = Σ_i a_{ni}(e_i − e_{i+1}), and because each u_n is a unit vector, the coefficients a_{ni} are uniformly bounded by (say) C in magnitude. Consequently, ‖Q_n u_n‖ ≤ C Σ_{i=1}^{k−1} ‖Q_n e_i − Q_n e_{i+1}‖. This implies that along the subsequence n = n_m we have lim sup_{m→∞} ‖Q_n u_n‖^{1/n} < λ_1 − ε.
But this contradicts the fact that ‖Q_n|V‖^{1/n} → λ_1 from Proposition 4.11.

Hypothesis 4.8 guarantees that the sequence ‖Q_n e_i − Q_n e_j‖^{1/n} has a limit, and that the limit is positive; this, however, will not be needed for the results of section 5. When Hypothesis 4.8 fails, the convergence in (20) can be super-exponential (i.e., the lim sup in (20) can be 0). For instance, this is the case if for some rank-1 stochastic matrix A with all entries positive there is positive probability that S_1 = A.
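The top Lyapunov exponent λ_1 of Proposition 4.11 can be estimated by Monte Carlo, renormalizing the product at each step so the accumulated logarithms telescope to log ‖Q_n|V‖ without underflow. The Dirichlet-column law for Σ is again an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n_steps = 3, 2000

# Orthonormal basis B of V, the hyperplane orthogonal to (1,...,1)^T.
Q, _ = np.linalg.qr(np.column_stack([np.ones(k), np.eye(k)[:, :k - 1]]))
B = Q[:, 1:]

M, total_log = np.eye(k - 1), 0.0
for _ in range(n_steps):
    S = rng.dirichlet(np.ones(k), size=k).T   # S_m ~ Sigma (illustrative choice)
    M = (B.T @ S @ B) @ M                      # restriction of Q_m to V
    nrm = np.linalg.norm(M, 2)                 # spectral norm = largest singular value
    total_log += np.log(nrm)
    M /= nrm                                   # renormalize; the logs telescope exactly

lam1_est = np.exp(total_log / n_steps)         # estimate of lambda_1 = lim lambda_{n,1}^{1/n}
```

Because dividing by the norm at each step only rescales the product, exp(total_log) equals ‖Q_n|V‖ exactly, so lam1_est is the n-th root appearing in Proposition 4.11; it should fall strictly between 0 and 1 for an absolutely continuous Σ.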

CONVERGENCE TO STATIONARITY OF EFCP CHAINS
Assume throughout this section that {X_m}_{m≥0} is an EFCP chain on [k]^[n] or [k]^N with directing measure µ_Σ, as defined by (10). Let S_1, S_2, . . . be the associated paintbox sequence: these are i.i.d. random column-stochastic matrices with distribution Σ. Proposition 3.3 shows that the joint distribution of the coordinate variables X^i_m of an EFCP chain with paintbox sequence {S_i}_{i≥1} is controlled by the random matrix products Q_m = S_m S_{m−1} · · · S_1. In this section we use this fact together with the results concerning random matrix products recounted in section 4 to determine the mixing rates of the restrictions {X^[n]_m}_{m≥0} of EFCP chains to the finite configuration spaces [k]^[n].

5.1.
Ergodicity. An EFCP chain need not be ergodic: for instance, if each S_i is the identity matrix then every state is absorbing and X^i_m = X^i_0 for every m ≥ 1 and every i ∈ N. More generally, if the random matrices S_i are all permutation matrices then the unlabeled partitions of N induced by the labeled partitions X_m do not change with m, and so the restrictions X^[n]_m cannot be ergodic. The failure of ergodicity in these examples stems from the fact that the matrix products Q_m do not contract the simplex ∆_k.

Recall the product multinomial measures ̺^n_s defined in (5); for a probability measure λ on ∆_k, the λ-mixture is defined to be the average

̺^n_λ := ∫_{∆_k} ̺^n_s λ(ds).

Thus, a random configuration X ∈ [k]^[n] with distribution ̺^n_λ can be obtained by first choosing s ∼ λ, then, conditional on s, independently assigning colors to the coordinates i ∈ [n] by sampling from the ̺_s distribution.

Proposition 5.1. If λ is a stationary distribution for the induced Markov chain on the simplex, then the mixture ̺^n_λ is a stationary distribution for the EFCP chain on [k]^[n].
Proof. This is an immediate consequence of Proposition 3.3.

Proposition 5.3.
Assume that with probability one the random matrix products Q_m asymptotically collapse the simplex ∆_k, that is,

(21)    lim_{m→∞} diameter(Q_m(∆_k)) = 0 almost surely.

Then for every n ≥ 1 the EFCP chain on [k]^[n] is ergodic: it has a unique stationary distribution π, and the distribution of X_m converges to π as m → ∞. Conversely, if (21) fails, then the EFCP chain has more than one stationary distribution.

Proof. Fix n ≥ 1. By Propositions 4.4 and 5.1, there exists at least one stationary distribution π. Let {X_m}_{m≥0} and {X̃_m}_{m≥0} be conditionally independent versions of the EFCP given the (same) paintbox sequence (S_i)_{i≥1}, with X̃_0 ∼ π and X_0 ∼ ν arbitrary. Then for any time m ≥ 1 the conditional distributions of X_m and X̃_m given the paintbox sequence can be recovered from the formula (12) by integrating out over the distributions of X_0 and X̃_0, respectively. But under the hypothesis (21), for large m the columns of Q_m are, with high probability, nearly identical, and so for large m the products will be very nearly the same. It follows, by integrating over all paintbox sequences, that the unconditional distributions of X_m and X̃_m will be nearly the same when m is large. This proves that the stationary distribution π is unique and that as m → ∞ the distribution of X_m converges to π. By Proposition 4.4, if the asymptotic collapse property (21) fails then the induced Markov chain on the simplex has at least two distinct stationary distributions µ, ν. By Proposition 5.1, these correspond to different stationary distributions for the EFCP.

5.2. Mixing rate and cutoff for EFCP chains.
We measure distance to stationarity using the total variation metric (2). Write D(X_m) to denote the distribution of X_m. In general, the distance ‖D(X_m) − π‖_TV will depend on the distribution of the initial state X_0. The ε-mixing time is defined to be the number of steps needed to bring the total variation distance between D(X_m) and π below ε for all initial states x_0:

(22)    t^(n)_MIX(ε) := min{m ≥ 0 : max_{x_0} ‖D(X_m) − π‖_TV ≤ ε}.

Theorem 5.4. Assume that with probability one the random matrix products Q_m = S_m S_{m−1} · · · S_1 asymptotically collapse the simplex ∆_k, that is, relation (14) holds. Then for each ε > 0 there exists K = K_ε < ∞ such that for all sufficiently large n,

(23)    t^(n)_MIX(ε) ≤ K log n.

Remark 5.5. In some cases the mixing times will be of smaller order of magnitude than log n. Suppose, for instance, that for some m ≥ 1 the event that the matrix Q_m is of rank 1 has positive probability. (This would be the case, for instance, if the columns of S_1 were independently chosen from a probability distribution on ∆_k with an atom.) Let T be the least m for which Q_m has rank 1; then T < ∞ almost surely, since matrix rank is submultiplicative, and Q_m(∆_k) is a singleton for any m ≥ T. Consequently, if {X_m}_{m≥0} and {X̃_m}_{m≥0} are versions of the EFCP with different initial conditions X_0 and X̃_0, but with the same paintbox sequence S_m, then by Proposition 3.3, X_m and X̃_m have the same conditional distribution, given σ((S_i)_{i≥1}), on the event T ≤ m. It follows that the total variation distance between the unconditional distributions of X_m and X̃_m is no greater than P{T > m}. Thus, for any n ∈ N, the EFCP mixes in O(1) steps, that is, for any ε > 0 there exists K_ε < ∞ such that for all n, t^(n)_MIX(ε) ≤ K_ε.

Proof of Theorem 5.4. (A) Consider first the special case where for some δ > 0 every entry of S_1 is at least δ, with probability one. It then follows that no entry of Q_m is smaller than δ.
By Proposition 4.1, if (14) holds then the diameters of the sets Q_m(∆_k) shrink exponentially fast: in particular, for some (nonrandom) ̺ < 1,

(24)    diameter(Q_m(∆_k)) < ̺^m

eventually, with probability 1. Let {X_m}_{m≥0} and {X̃_m}_{m≥0} be versions of the EFCP on [k]^[n] with different initial conditions X_0 and X̃_0, but with the same paintbox sequence S_m. By Proposition 3.3, the conditional distributions of X_m and X̃_m given the paintbox sequence are product-multinomials:

(25)    P(X_m = x_1 x_2 · · · x_n | S) = Π_{i=1}^n Q_m(x_i, X^i_0),

and similarly for X̃_m. Since the multinomial distributions Q_m(·, ·) assign probability at least δ > 0 to every color j ∈ [k], Corollary 2.3 implies that for any ε > 0, if m = K log n, where K > −1/(2 log ̺), then for all sufficiently large n the total variation distance between the conditional distributions of X_m and X̃_m will be less than ε on the event that (24) holds. Since (24) holds eventually, with probability one, the inequality (23) now follows by Lemma 2.4.
(B) The general case requires a bit more care, because if the entries of the matrices Q_m are not bounded below then the product-multinomial distributions (25) will not be bounded away from ∂∆_k, as required by Corollary 2.3.
Assume first that for some m ≥ 1 there is positive probability that Q_m(∆_k) is contained in the interior of ∆_k. Then for some δ > 0 there is probability at least δ that every entry of Q_m is at least δ. Consequently, for any α > 0 and any K > 0, with probability converging to one as n → ∞, there will exist a (possibly random) m ∈ [K log n, K(1 + α) log n] such that every entry of Q_m is at least δ. By (24), the probability that the diameter of Q_m(∆_k) is less than ̺^m converges to 1 as m → ∞. It then follows from Corollary 2.3, by the same argument as in (A), that if K > −1/(2 log ̺) then the total variation distance between the conditional distributions of X_m and X̃_m will be vanishingly small. Since total variation distance decreases with time, the total variation distance between the conditional distributions of X_{K(1+α) log n} and X̃_{K(1+α) log n} is also vanishingly small. Consequently, the distance between the unconditional distributions is small as well, and (23) follows from Lemma 2.4.
(C) Finally, consider the case where Q_m(∆_k) intersects ∂∆_k for every m, with probability one. Recall (Proposition 5.1) that if the asymptotic collapse property (14) holds then the induced Markov chain Y_m on the simplex has a unique stationary distribution ν. If there is no m ∈ N such that Q_m(∆_k) is contained in the interior of ∆_k, then the support of ν must be contained in the boundary ∂∆_k. Fix a support point v, and let m be sufficiently large that (24) holds. Since Q_m(∆_k) must intersect ∂∆_k, it follows that for any coordinate a ∈ [k] with v_a = 0 (there must be at least one such a, because v ∈ ∂∆_k), the a-th coordinate (Q_m y)_a of any point in the image Q_m(∆_k) must be smaller than ̺^m. If K is chosen sufficiently large and m ≥ K log n, then ̺^m < n^{−2}; hence, by Proposition 3.3, P(X^i_m = a for some i ∈ [n] | σ(S_l)_{l≥1}) ≤ n · n^{−2} = n^{−1} → 0, and similarly for X̃_m. Therefore, the contribution to the total variation distance between the conditional distributions of X_m and X̃_m from states x_1 x_2 ⋯ x_n in which the color a appears at least once is vanishingly small. For those states in which no such color appears, the factors Q_m(a, b) in (25) are bounded below by the minimum nonzero entry of v, and the result follows by a routine modification of the argument in (B) above.
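The mechanism behind the O(log n) bound can be checked numerically. Conditional on the paintbox, a coordinate started in color a has law equal to the a-th column of Q_m, so a union bound over the n coordinates controls the conditional total variation distance between two coupled chains by n times the maximal distance between columns of Q_m. The sketch below is our own illustration and again assumes Dirichlet(1, . . . , 1) columns, which is not a hypothesis of the paper.

```python
import numpy as np

def steps_until_mixed(n, eps=0.01, k=3, seed=2, max_m=500):
    """Smallest m with n * (max column TV distance of Q_m) < eps.  By a union
    bound over coordinates, this bounds the conditional TV distance between
    two EFCP chains coupled through the same paintbox sequence."""
    rng = np.random.default_rng(seed)  # same paintbox sequence for every n
    Q = np.eye(k)
    for m in range(1, max_m + 1):
        Q = rng.dirichlet(np.ones(k), size=k).T @ Q
        col_tv = max(0.5 * np.abs(Q[:, a] - Q[:, b]).sum()
                     for a in range(k) for b in range(k))
        if n * col_tv < eps:
            return m
    return max_m

steps = [steps_until_mixed(n) for n in (10, 100, 1000, 10000)]
# steps grows roughly linearly in log n, matching the K log n bound of Theorem 5.4
```

Fixing the seed gives every n the same paintbox sequence, so the required number of steps is necessarily nondecreasing in n.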
Parts (A)-(B) of the foregoing proof provide an explicit bound in the special case where Q_m(∆_k) is contained in the interior of ∆_k with positive probability.

Corollary 5.6. Assume that with probability one the random matrix products Q_m = S_m S_{m−1} ⋯ S_1 asymptotically collapse the simplex ∆_k, so that for some 0 < ̺ < 1, diameter(Q_m(∆_k)) < ̺^m for all sufficiently large m, with probability 1. Assume also that with positive probability Q_m(∆_k) is contained in the interior of ∆_k, for some m ≥ 1. Then for any K > −1/(2 log ̺) the bound (23) holds for all sufficiently large n.

Theorem 5.7. Assume that the paintbox distribution Σ satisfies Hypothesis 4.8. Then the corresponding EFCP chains exhibit the cutoff phenomenon: for all ε, δ ∈ (0, 1/2), if n is sufficiently large, then
(θ − δ) log n ≤ t^(n)_MIX(ε) ≤ (θ + δ) log n,
where θ = −1/(2 log λ_1) and λ_1 is the second Lyapunov exponent of the sequence Q_m, as in Proposition 4.11.
Proof of the Upper Bound t_MIX(ε) ≤ (θ + δ) log n. Because the distribution of S_1 is absolutely continuous with respect to Lebesgue measure, there is positive probability that all entries of S_1 = Q_1 are positive, and hence positive probability that Q_1(∆_k) is contained in the interior of ∆_k. Therefore, Corollary 5.6 applies. Moreover, Proposition 4.11 and Corollary 4.14 imply that, under Hypothesis 4.8, ̺ = λ_1.
Proof of the Lower Bound t_MIX(ε) ≥ (θ − δ) log n. It suffices to show that there exist initial states x_0, x̃_0 such that if {X_t}_{t≥0} and {X̃_t}_{t≥0} are versions of the EFCP with initial states X_0 = x_0 and X̃_0 = x̃_0, respectively, then the distributions of X_m and X̃_m have total variation distance near 1 when m ≤ (θ − δ) log n. The proof will rely on Corollary 4.14, according to which there is a (possibly random) pair of distinct indices i, j for which (28) holds.

Consider first, to fix ideas, the special case k = 2. In this case (28) holds with i = 1 and j = 2. Assume that n = 2n′ is even (if n is odd, project onto the first n − 1 coordinates), and let x_0 = 11⋯1 and x̃_0 = 1⋯12⋯2 be the elements of [k]^n such that x_0 has all coordinates colored 1, while x̃_0 has its first n′ coordinates colored 1 and its second n′ coordinates colored 2. We will show that the distributions of X_m and X̃_m remain at large total variation distance at time m = (θ − δ) log n. Without loss of generality, assume that both of the chains {X_t}_{t≥0} and {X̃_t}_{t≥0} have the same paintbox sequence S_1, S_2, . . . . Then by Proposition 3.3, the conditional distributions of X_m and X̃_m given S = σ(S_t)_{t≥1} are product-multinomials whose coordinate marginals are the columns of Q_m indexed by the initial colors. But relation (28) implies that, for some α = α(δ) > 0, if m = (θ − δ) log n then the ℓ^∞-distance between the i-th and j-th columns of Q_m is at least n^{−1/2+α}, with probability approaching 1 as n → ∞. Consequently, the first n′ and second n′ coordinates of X̃_m are (conditional on S) independent samples from Bernoulli distributions whose parameters differ by at least n^{−1/2+α}, whereas the 2n′ coordinates of X_m are (conditional on S) a single sample from one Bernoulli distribution.
It follows, by Lemma 2.5 (see Remark 2.6, statement (B)), that the unconditional distributions of X_m and X̃_m are at large total variation distance, because in X̃_m the first and second blocks of n′ coordinates are distinguishable whereas in X_m they are not. Thus, if m = (θ − δ) log n then the total variation distance between the distributions of X_m and X̃_m tends to 1 as n → ∞.

The general case is proved by a similar argument. Let n = 2k(k − 1)n′ be an integer multiple of 2k(k − 1). Break the coordinate set [n] into k(k − 1) non-overlapping blocks of size 2n′, one for each ordered pair (i, j) of distinct colors. In the block indexed by (i, j), let x_0 take the value i, and let x̃_0 take the value i in the first half of the block and the value j in the second half. Let {X_t}_{t≥0} and {X̃_t}_{t≥0} be versions of the EFCP with initial states x_0 and x̃_0, respectively. Then, by an argument similar to that used in the binary case k = 2, if m = (θ − δ) log n then for large n, in some block (i, j) the first n′ and second n′ coordinates of X̃_m will be distinguishable, but the corresponding coordinates of X_m will not. Therefore, the unconditional distributions of X_m and X̃_m will be at total variation distance near 1.
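The statistical phenomenon underlying the lower bound is that two blocks of n′ i.i.d. Bernoulli samples are distinguishable precisely when their parameters differ by much more than n′^{−1/2}. The following simulation is our own sketch of that phenomenon (not of Lemma 2.5 itself); the threshold test and all parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_prime, alpha = 50_000, 0.2
delta = n_prime ** (-0.5 + alpha)   # gap of order n'^(-1/2+alpha), as in (28)
p, reps = 0.5, 200

def block_mean_gap(p1, p2):
    # |difference of empirical means| of two blocks of n' Bernoulli samples
    return abs(rng.binomial(n_prime, p1) - rng.binomial(n_prime, p2)) / n_prime

thresh = delta / 2   # declare the blocks "different" when the gap exceeds this
rate_tilde = np.mean([block_mean_gap(p, p + delta) > thresh for _ in range(reps)])
rate_plain = np.mean([block_mean_gap(p, p) > thresh for _ in range(reps)])
# rate_tilde is near 1: blocks with parameters p and p + delta are detected;
# rate_plain is near 0: two blocks from the same Bernoulli(p) law are not
```

Here delta is many standard deviations of the empirical-mean gap (which is of order n′^{−1/2}), so the test succeeds with probability close to 1; with a gap of order n′^{−1/2} or smaller it would fail, which is why the lower bound holds up to time (θ − δ) log n.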
Example 5.8 (Self-similar cut-and-paste chains). Self-similar cut-and-paste chains were introduced in [2]. These are EFCP chains for which the paintbox measure Σ = Σ ν is such that if S 1 ∼ Σ then the columns of S 1 are i.i.d. with common distribution ν, for some probability distribution ν on ∆ k . If S 1 , S 2 , . . . are i.i.d. with distribution Σ ν then the random matrix products Q m = S m S m−1 · · · S 1 asymptotically collapse the simplex provided the measure ν is nontrivial (i.e., not a point mass), and so Theorem 5.4 applies. If in addition the measure ν has a density of class L p relative to Lebesgue measure on ∆ k , then Theorem 5.7 applies.
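Remark 5.5 and Example 5.8 combine neatly: if the column distribution ν has an atom, then with positive probability per step all k columns of S_m coincide, the product Q_m becomes rank 1 at a random time T with geometric tails, and the chain is exactly coupled from T on. The simulation below is a hypothetical illustration; the atom v and the mixture weight are invented, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
k, atom_p = 3, 0.5
v = np.array([0.2, 0.3, 0.5])   # the atom of nu (hypothetical)

def sample_S():
    # each column independently: the atom v with prob atom_p, else Dirichlet(1,..,1)
    cols = [v if rng.random() < atom_p else rng.dirichlet(np.ones(k))
            for _ in range(k)]
    return np.column_stack(cols)

def first_rank_one(max_m=1000):
    Q = np.eye(k)
    for m in range(1, max_m + 1):
        Q = sample_S() @ Q
        if np.linalg.matrix_rank(Q, tol=1e-10) == 1:
            return m   # from this time on, Q_m(Delta_k) is a singleton
    return max_m

T_samples = [first_rank_one() for _ in range(200)]
# T is stochastically dominated by a geometric(atom_p**k) time, since all k
# columns equal v simultaneously with probability atom_p**k at each step
```

Once every column of S_m equals v, the product satisfies Q_m = v 1^T exactly, and the rank can never increase afterwards; this is the O(1)-mixing mechanism of Remark 5.5.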
We call X an Ehrenfest(α) chain. Define the coupling time T := min{t ≥ 1 : ⋃_{j=1}^t A_j = [n]}; any two chains X and X′ constructed from the same sequence A will be coupled by time T. An upper bound on the distance to stationarity of the general Ehrenfest(α) chain is obtained from standard properties of the hypergeometric distribution. In particular, let R_t := #([n] \ ⋃_{j=1}^t A_j) be the number of indices that have not appeared in any of A_1, . . . , A_t. By definition, {T ≤ t} = {R_t = 0}, and standard calculations give
(29) ∥D(X_t) − π∥_TV ≤ n(1 − ⌊αn⌋/n)^t ≤ n exp{−⌊αn⌋t/n}.
For fixed α ∈ (0, 1), this bounds the ε-mixing time by a constant multiple of log n: it follows immediately that for β > 0 and t = (3n/(2⌊αn⌋)) log n + βn/⌊αn⌋,
∥D(X_t) − π∥_TV ≤ n^{−1/2} exp(−β) → 0 as β → ∞.
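The coupon-collector bound (29) is straightforward to check by simulation. The sketch below is our own illustration (the parameters n, α, t are arbitrary): it estimates P{T > t} = P{R_t > 0} by Monte Carlo and compares it with the bound n(1 − ⌊αn⌋/n)^t.

```python
import numpy as np

rng = np.random.default_rng(5)
n, alpha, t, reps = 200, 0.1, 90, 1000
s = int(alpha * n)                       # |A_j| = floor(alpha * n)

exceed = 0
for _ in range(reps):
    uncovered = np.ones(n, dtype=bool)
    for _ in range(t):
        uncovered[rng.choice(n, size=s, replace=False)] = False
    exceed += uncovered.any()            # event {T > t} = {R_t > 0}

empirical = exceed / reps
bound = n * (1 - s / n) ** t             # the union bound in (29)
# 'bound' is about 0.015 here; 'empirical' agrees with the true probability
# P{R_t > 0} <= bound up to Monte Carlo error
```

The union bound is nearly tight in this regime, since the events {i uncovered} are close to independent across indices i.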
In general, we can obtain an upper bound of (1 + β)f(n), where f(n) is a function of n ∈ N, by an analogous relation.

The space [k]^n is a group under componentwise addition modulo k. Write N^n_k to denote the group [k]^n together with the operation +, defined by componentwise addition modulo k of the coordinates: for any x, x′ ∈ [k]^n,
(x + x′)_i = ((x_i + x′_i − 2) mod k) + 1.
This operation makes the space L_[n]:k into a group, with a corresponding action • that can also be represented by the left action of a partition matrix, as follows. If we regard L, L′ ∈ L_[n]:k as elements of the group (N^n_k, +), then we define the group action L • L′ ≡ L + L′ in the obvious way. Alternatively, for each L ∈ L_[n]:k, define M_L ∈ M_[n]:k as the k × k matrix whose j-th column is the j-th cyclic shift of the classes of L; then, for every L, L′ ∈ L_[∞]:k, we have L • L′ := M_L L′.
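As a concrete check of the group structure, here is a minimal implementation (ours) of the componentwise operation on colors in {1, . . . , k}. Note that the all-1s coloring is the identity element, and adding the constant coloring 2 effects a cyclic shift of the colors.

```python
def add_mod_k(x, xp, k):
    """Componentwise group operation on [k]^n with colors in {1, ..., k}:
    (x + x')_i = ((x_i + x'_i - 2) mod k) + 1."""
    return [((a + b - 2) % k) + 1 for a, b in zip(x, xp)]

# the all-1s coloring is the identity element
assert add_mod_k([1, 2, 3], [1, 1, 1], 3) == [1, 2, 3]
# adding the constant coloring 2 cyclically shifts every color
assert add_mod_k([1, 2, 3], [2, 2, 2], 3) == [2, 3, 1]
# every color c has an inverse: 1 if c == 1, else k + 2 - c
assert add_mod_k([3], [2], 3) == [1]
```

The offset by 2 in the formula simply converts between the color alphabet {1, . . . , k} and the usual residues {0, . . . , k − 1}, where the operation is ordinary addition mod k.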
Example 5.12. For n ∈ N, let ̺_n be a probability measure on L_[n]:k and let L_0 ∈ L_[n]:k. A CP_n(̺_n) chain X with initial state X_0 = L_0 can be constructed as follows. First, generate L_1, L_2, . . . i.i.d. from ̺_n. Conditional on L_1, L_2, . . . , put X_m = L_m • ⋯ • L_1 • X_0. Under the definition (30) this is a cut-and-paste chain; however, the columns of each matrix are deterministic functions of one another and are not conditionally independent (as in the previous examples).
Consider the case where ̺_n is the n-fold product of a probability measure λ on [k] that is symmetric, i.e., λ(j) = λ(k − j + 1) > 0 for j = 1, . . . , k. In this case, it is easy to see that the CP_n(̺_n) chain is reversible and hence has the uniform distribution as its unique stationary distribution.
For this construction of X, the directing measure µ on M_[n]:k induced by λ is neither row-column exchangeable (RCE) nor representable as µ_Σ for some measure Σ on S_k. Nonetheless, the mixing time of X is bounded above by K log n for some constant K ≤ 2/min_j λ(j) < ∞.

PROJECTED CUT-AND-PASTE CHAINS
Recall that there is a natural projection Π_n : L_[n]:k → P_[n]:k from the set L_[n]:k of labeled partitions of [n] to the set P_[n]:k of unlabeled partitions. If {X_m}_{m≥0} is a Markov chain on the set [k]^n ≅ L_[n]:k whose transition probability matrix is invariant under permutations of the labels [k], then the projection {Π_n(X_m)}_{m≥0} is also a Markov chain. Assume henceforth that this is the case.
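The projection Π_n and its label-invariance can be sketched in a few lines (our own illustration): a coloring x ∈ [k]^n is sent to the set partition whose blocks are the color classes, and relabeling the colors leaves the image unchanged.

```python
from itertools import permutations

def project(x):
    """Pi_n: send a coloring x in [k]^n to the induced unlabeled set
    partition of {0, ..., n-1} (empty color classes are discarded)."""
    blocks = {}
    for i, c in enumerate(x):
        blocks.setdefault(c, set()).add(i)
    return frozenset(frozenset(b) for b in blocks.values())

x, k = [1, 2, 1, 3, 2], 3
# Pi_n is invariant under every permutation of the label set [k]
for perm in permutations(range(1, k + 1)):
    y = [perm[c - 1] for c in x]   # relabel color c as perm[c - 1]
    assert project(y) == project(x)
```

This label-invariance is exactly why a transition law that is invariant under permutations of [k] pushes forward to a Markov transition law on P_[n]:k.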
Following is a simple sufficient condition for the law of an EFCP chain to be invariant under permutations of the label set [k]. Say that a probability measure Σ on the space ∆_k^k of column-stochastic matrices is row-column exchangeable (RCE) if the distribution of S_1 ∼ Σ is invariant under permutations of the rows and under permutations of the columns. Following Crane [3], we call the induced chain Π := Π_∞(X) of an EFCP chain with RCE directing measure Σ a homogeneous cut-and-paste chain.
If the chain {X_m}_{m≥0} is ergodic, then its unique stationary distribution is invariant under permutations of [k], since its transition probability matrix is, and therefore projects via Π_n to a stationary distribution for the projected chain {Π_n(X_m)}_{m≥0}. The sufficiency principle (equation (3)) for total variation distance (see also Lemma 7.9 of [9]) implies that the rate of convergence of the projected chain {Π_n(X_m)}_{m≥0} is bounded by that of the original chain {X_m}_{m≥0}. Theorem 5.4 provides a bound on this convergence rate when the chain {X_m}_{m≥0} is an EFCP.

Corollary 6.2. Assume that {X_m = X^[n]_m}_{m≥0} is an EFCP chain on [k]^[n] whose paintbox measure Σ is RCE and satisfies the hypothesis of Theorem 5.4 (in particular, the random matrix products Q_m asymptotically collapse the simplex ∆_k). Then for a suitable constant K = K_Σ < ∞ depending only on the distribution Σ of S_1, and for any ε > 0, the mixing times t^(n)_MIX(ε) of the projected chain {Π_n(X_m)}_{m≥0} satisfy t^(n)_MIX(ε) ≤ K log n for all sufficiently large n.

Theorem 6.3. Suppose Σ is a row-column exchangeable probability measure on S_k. Let X be a CP_n(µ_Σ) chain and let Y = Π_n(X) be its projection into P_[n]:k. Let t_X(ε) and t_Y(ε) denote the ε-mixing times of X and Y, respectively. Then t_X(ε) = t_Y(ε).
In particular, if l(ε, n) ≤ t_X(ε) ≤ L(ε, n) are lower and upper bounds on the ε-mixing time of X, then l(ε, n) ≤ t_Y(ε) ≤ L(ε, n), and vice versa. Moreover, X exhibits the cutoff phenomenon if and only if Y exhibits the cutoff phenomenon.
Proof. If π is the stationary distribution of X, then πΠ_n^{−1} is the stationary distribution of Y. The rest follows from the preceding discussion regarding sufficiency of Π_n(X) and the sufficiency principle (3).

Corollary 6.4. Assume that the paintbox measure Σ is row-column exchangeable and satisfies Hypothesis 4.8, and let {X_m}_{m≥0} be the EFCP chain on [k]^[n] with associated paintbox measure Σ. Then the projected CP_n(µ_Σ) chain Π_n(X) exhibits the cutoff phenomenon at time θ log n, where θ = −1/(2 log λ_1).