MIXING TIMES FOR MARKOV CHAINS ON WREATH PRODUCTS AND RELATED HOMOGENEOUS SPACES

We develop a method for analyzing the mixing times for a quite general class of Markov chains on the complete monomial group G ≀ S_n and a quite general class of Markov chains on the homogeneous space (G ≀ S_n)/(S_r × S_{n−r}). We derive an exact formula for the L² distance in terms of the L² distances to uniformity for closely related random walks on the symmetric groups S_j for 1 ≤ j ≤ n or for closely related Markov chains on the homogeneous spaces S_{i+j}/(S_i × S_j) for various values of i and j, respectively. Our results are consistent with those previously known, but our method is considerably simpler and more general.


Introduction and Summary.
In the proofs of many of the results of Schoolfield (2001a), the L² distance to uniformity for the random walk (on the so-called wreath product of a group G with the symmetric group S_n) being analyzed is often found to be expressible in terms of the L² distances to uniformity for related random walks on the symmetric groups S_j with 1 ≤ j ≤ n. Similarly, in the proofs of many of the results of Schoolfield (2001b), the L² distance to stationarity for the Markov chain being analyzed is often found to be expressible in terms of the L² distances to stationarity of related Markov chains on the homogeneous spaces S_{i+j}/(S_i × S_j) for various values of i and j. It is from this observation that the results of this paper have evolved. We develop a method, with broad applications, for bounding the rate of convergence to stationarity for a general class of random walks and Markov chains in terms of closely related chains on the symmetric groups and related homogeneous spaces. Certain specialized problems of this sort were previously analyzed with the use of group representation theory. Our analysis is more directly probabilistic and yields some insight into the basic structure of the random walks and Markov chains being analyzed.

Markov Chains on G ≀ S_n.
We now describe one of the two basic set-ups we will be considering [namely, the one corresponding to the results in Schoolfield (2001a)]. Let n be a positive integer and let P be a probability measure defined on a finite set G (= {1, . . . , m}, say). Imagine n cards, labeled 1 through n on their fronts, arranged on a table in sequential order. Write the number 1 on the back of each card. Now repeatedly permute the cards and rewrite the numbers on their backs, as follows. For each independent repetition, begin by choosing positions i and j independently and uniformly at random from {1, . . . , n}.
If i ≠ j, transpose the cards in positions i and j. Then, (probabilistically) independently of the choice of i and j, replace the numbers on the backs of the two transposed cards with two numbers chosen independently from G according to P.
If i = j (which occurs with probability 1/n), leave all cards in their current positions. Then, again independently of the choice of j, replace the number on the back of the card in position j by a number chosen according to P.
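In simulation form, one step of this chain can be sketched as follows. This is a minimal Python sketch under our reading of the description above (positions chosen independently and uniformly); the function and variable names are ours, not from the paper.

```python
import random

def chain_step(perm, backs, p, rng=random):
    """One step of the card chain (1.1): perm[i] is the label of the
    card in position i, backs[i] is its back number, and p[g] is the
    probability of g in G = {0, ..., m-1}."""
    n = len(perm)
    m = len(p)
    draw = lambda: rng.choices(range(m), weights=p)[0]
    i, j = rng.randrange(n), rng.randrange(n)
    if i != j:
        # Transpose the cards in positions i and j, then refresh both
        # back numbers independently from P.
        perm[i], perm[j] = perm[j], perm[i]
        backs[i], backs[j] = draw(), draw()
    else:
        # i = j (probability 1/n): positions unchanged; refresh only
        # the back number of the card in position j.
        backs[j] = draw()
```

Iterating `chain_step` simulates the chain described above; the m = 1 case (a single back number) reduces to random transpositions on S_n.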
When m = 1, i.e., when the aspect of back-number labeling is ignored, the state space of the chain can be identified with the symmetric group S_n, and the mixing time can be bounded as in the following classical result, which is Theorem 1 of Diaconis and Shahshahani (1981) and was later included in Diaconis (1988) as Theorem 5 in Section D of Chapter 3. The total variation norm (‖·‖_TV) and the L² norm (‖·‖₂) will be reviewed in Section 1.3.

Theorem 1.2 Let ν^{*k} denote the distribution at time k for the random transpositions chain (1.1) when m = 1, and let U be the uniform distribution on S_n. Let k = (1/2) n log n + cn. Then there exists a universal constant a > 0 such that

  ‖ν^{*k} − U‖_TV ≤ (1/2) ‖ν^{*k} − U‖₂ ≤ a e^{−2c}.

Without reviewing the precise details, we remark that this bound is sharp, in that there is a matching lower bound for total variation (and hence also for L²). Thus, roughly put, (1/2) n log n + cn steps are necessary and sufficient for approximate stationarity.

Now consider the chain (1.1) for general m ≥ 2, but restrict attention to the case that P is uniform on G. An elementary approach to bounding the mixing time is to combine the mixing time result of Theorem 1.2 (which measures how quickly the cards get mixed up) with a coupon collector's analysis (which measures how quickly their back-numbers become random). This approach is carried out in Theorem 3.6.5 of Schoolfield (2001a), but gives an upper bound only on total variation distance. If we are to use the chain's mixing-time analysis in conjunction with the powerful comparison technique of Diaconis and Saloff-Coste (1993a, 1993b) to bound mixing times for other more complicated chains, as is done for example in Chapter 9 of Schoolfield (1998), we need an upper bound on L² distance.
Such a bound can be obtained using group representation theory. Indeed, the Markov chain we have described is a random walk on the complete monomial group G ≀ S_n, which is the wreath product of the group G with S_n; see Schoolfield (2001a) for further background and discussion. The following result is Theorem 3.1.3 of Schoolfield (2001a).

Theorem 1.3 Let ν^{*k} denote the distribution at time k for the random transpositions chain (1.1) when P is uniform on G (with |G| ≥ 2). Let k = (1/2) n log n + (1/4) n log(|G| − 1) + cn. Then there exists a universal constant b > 0 such that

  ‖ν^{*k} − U‖_TV ≤ (1/2) ‖ν^{*k} − U‖₂ ≤ b e^{−2c}.

For L² distance (but not for total variation distance), the presence of the additional term (1/4) n log(|G| − 1) in the expression for k is necessary. The group-representation approach becomes substantially more difficult to carry out when the card-rearrangement scheme is something other than random transpositions, and prohibitively so if the resulting step-distribution on S_n is not constant on conjugacy classes. Moreover, there is no possibility whatsoever of using this approach when P is non-uniform, since then we are no longer dealing with random walk on a group.
In Section 2 we provide an L²-analysis of our chain for completely general shuffles Q of the sort we have described. More specifically, in Theorem 2.3 we derive an exact formula for the L² distance to stationarity in terms of the L² distances for closely related random walks on the symmetric groups S_j for 1 ≤ j ≤ n. Subsequent corollaries establish more easily applied results in special cases. In particular, Corollary 2.8 extends Theorem 1.3 to handle non-uniform P.
Our new method does have its limitations. The back-number randomizations must not depend on the current back numbers (but rather must be chosen afresh from P), and they must be independent and identically distributed from card to card. So, for example, we do not know how to adapt our method to analyze the "paired-shuffles" random walk of Section 5.7 in Schoolfield (1998).

Markov Chains on (G ≀ S_n)/(S_r × S_{n−r}).
We now turn to our second basic set-up [namely, the one corresponding to the results in Schoolfield (2001b)]. Again, let n be a positive integer and let P be a probability measure defined on a finite set G = {1, . . . , m}.
Imagine two racks, the first with positions labeled 1 through r and the second with positions labeled r + 1 through n. Without loss of generality, we assume that 1 ≤ r ≤ n/2. Suppose that there are n balls, labeled with serial numbers 1 through n, each initially placed at its corresponding rack position. On each ball is written the number 1, which we shall call its G-number. Now repeatedly rearrange the balls and rewrite their G-numbers, as follows.
Consider any Q as in Section 1.1. At each time step, choose π̃ = (π, J) according to Q and then (a) permute the balls by multiplying the current permutation of serial numbers by π; (b) independently, replace the G-numbers of all balls whose positions have changed as a result of the permutation, and also of every ball whose (necessarily unchanged) position belongs to J, by numbers chosen independently from P; and (c) rearrange the balls on each of the two racks so that their serial numbers are in increasing order.
Notice that steps (a)-(b) are carried out in precisely the same way as steps (a)-(b) in Section 1.1. The state of the system is completely determined, at each step, by the ordered n-tuple of G-numbers of the n balls 1, 2, . . . , n and the unordered set of serial numbers of balls on the first rack. We have thus described a Markov chain on the set of all |G|^n · (n choose r) ordered pairs of n-tuples of elements of G and r-element subsets of a set with n elements.
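As a quick sanity check on this count, one can enumerate the state space for small parameters. This Python sketch uses a representation of our own choosing: a G-number tuple indexed by serial number, together with the set of serial numbers on the first rack.

```python
from itertools import combinations, product
from math import comb

def state_space(n, r, m):
    """All states of the two-rack chain: an n-tuple of G-numbers
    (one per ball serial number) paired with the r-element set of
    serial numbers currently on the first rack."""
    return [(gs, frozenset(rack1))
            for gs in product(range(m), repeat=n)
            for rack1 in combinations(range(1, n + 1), r)]

n, r, m = 4, 2, 2
count = len(state_space(n, r, m))
```

For n = 4, r = 2, m = 2 the enumeration yields 2^4 · C(4, 2) = 96 states, matching the count |G|^n · (n choose r) above.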
In our present setting, the transpositions example (1.1) fits the more general description, taking Q to be defined as at (1.4). Again there are matching lower bounds, for r not too far from n/2, so this Markov chain is twice as fast to converge as the random walk of Theorem 1.2.
The following analogue, for the special case m = 2, of Theorem 1.3 is stated as Theorem 1.6. Notice that Theorem 1.6 provides (essentially) the same mixing time bound as that found in Theorem 1.5. Again there are matching lower bounds, for r not too far from n/2, so this Markov chain is twice as fast to converge as the random walk of Theorem 1.3 in the special case m = 2.
In Section 3, we provide a general L²-analysis of our chain, which has state space equal to the homogeneous space (G ≀ S_n)/(S_r × S_{n−r}). More specifically, in Theorem 3.3 we derive an exact formula for the L² distance to stationarity in terms of the L² distances for closely related Markov chains on the homogeneous spaces S_{i+j}/(S_i × S_j) for various values of i and j. Subsequent corollaries establish more easily applied results in special cases. In particular, Corollary 3.8 extends Theorem 1.6 to handle non-uniform P.
Again, our method does have its limitations. For example, we do not know how to adapt our method to analyze the "paired-flips" Markov chain of Section 7.4 in Schoolfield (1998).

Distances Between Probability Measures.
We now review several ways of measuring distances between probability measures on a finite set G. Let R be a fixed reference probability measure on G with R(g) > 0 for all g ∈ G. As discussed in Aldous and Fill (200x), for each 1 ≤ p < ∞ define the L^p norm ‖ν‖_p of any signed measure ν on G (with respect to R) by

  ‖ν‖_p := [ Σ_{g ∈ G} |ν(g)/R(g)|^p R(g) ]^{1/p}.

Thus the L^p distance between any two probability measures P and Q on G (with respect to R) is ‖P − Q‖_p. In our applications we will always take Q = R (and R will always be the stationary distribution of the Markov chain under consideration at that time). In that case, when U is the uniform distribution on G,

  ‖P − U‖₂² = |G| Σ_{g ∈ G} ( P(g) − 1/|G| )².

The total variation distance between P and Q is defined by

  ‖P − Q‖_TV := max_{A ⊆ G} |P(A) − Q(A)| = (1/2) Σ_{g ∈ G} |P(g) − Q(g)|.

If P(·, ·) is a reversible transition matrix on G with stationary distribution R = P^∞(·), then, for any g₀ ∈ G,

  ‖P^k(g₀, ·) − P^∞(·)‖₂² = P^{2k}(g₀, g₀)/P^∞(g₀) − 1.

All of the distances we have discussed here are indeed metrics on the space of probability measures on G.
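These definitions, and the identity for reversible chains, are easy to verify numerically. Here is a small Python/NumPy sketch; the example transition matrix is ours, chosen to be a reversible birth-and-death chain.

```python
import numpy as np

def lp_dist(P, Q, R, p=2):
    """L^p distance between P and Q with respect to the reference
    measure R: ( sum_g |(P(g) - Q(g)) / R(g)|^p R(g) )^(1/p)."""
    return np.sum(np.abs((P - Q) / R) ** p * R) ** (1.0 / p)

def tv_dist(P, Q):
    """Total variation distance: half the unweighted L^1 distance."""
    return 0.5 * np.sum(np.abs(P - Q))

# A small reversible (birth-and-death) chain and its stationary law.
T = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])

# Check || P^k(g0, .) - pi ||_2^2 = P^{2k}(g0, g0) / pi(g0) - 1.
k, g0 = 3, 0
lhs = lp_dist(np.linalg.matrix_power(T, k)[g0], pi, pi, p=2) ** 2
rhs = np.linalg.matrix_power(T, 2 * k)[g0, g0] / pi[g0] - 1
```

The identity holds for any reversible chain; here `lhs` and `rhs` agree to machine precision.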

Markov Chains on G ≀ S_n.
We now analyze a very general Markov chain on the complete monomial group G ≀ S_n. It should be noted that, in the results which follow, there is no essential use of the group structure of G, so the results of this section extend simply; in general, the Markov chain of interest is on the set G^n × S_n.

A Class of Chains on G ≀ S_n.
We introduce a generalization of permutations π ∈ S_n which will provide an extra level of generality in the results that follow. Recall that any permutation π ∈ S_n can be written as the product of disjoint cyclic factors, say

  π = (i₁ · · · i_{k₁}) (i_{k₁+1} · · · i_{k₁+k₂}) · · · (i_{K−k_ℓ+1} · · · i_K),

where the K := k₁ + · · · + k_ℓ numbers i₁, . . . , i_K are distinct elements of {1, . . . , n}; the n − K indices not appearing in any cyclic factor are the fixed points of π, whose set we denote by F(π). We refer to the ordered pair of a permutation π ∈ S_n and a subset J of F(π) as an augmented permutation. We denote the set of all such ordered pairs π̃ = (π, J), with π ∈ S_n and J ⊆ F(π), by S̃_n. For example, π̃ ∈ S̃_10 given by π̃ = ((12)(34)(567), {8, 10}) is the augmentation of the permutation π = (12)(34)(567) ∈ S_10 by the subset {8, 10} of F(π) = {8, 9, 10}. Notice that any given π̃ ∈ S̃_n corresponds to a unique permutation π ∈ S_n; denote the mapping π̃ ↦ π by T. For π̃ = (π, J) ∈ S̃_n, define I(π̃) to be the set of indices i included in π̃, in the sense that either i is not a fixed point of π or i ∈ J; for our example, I(π̃) = {1, 2, 3, 4, 5, 6, 7, 8, 10}.
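The bookkeeping for F(π) and I(π̃) is straightforward; the following Python sketch (helper names are ours) reproduces the example above.

```python
def fixed_points(perm):
    """F(pi) for a permutation of {1, ..., n} given in one-line
    notation: perm[i-1] is the image of i."""
    return {i for i in range(1, len(perm) + 1) if perm[i - 1] == i}

def included_indices(perm, J):
    """I(pi~) for the augmented permutation pi~ = (pi, J): the indices
    moved by pi together with the singled-out fixed points in J."""
    F = fixed_points(perm)
    assert set(J) <= F, "J must be a subset of F(pi)"
    return (set(range(1, len(perm) + 1)) - F) | set(J)

# pi = (12)(34)(567) in S_10, augmented by J = {8, 10}.
pi = [2, 1, 4, 3, 6, 7, 5, 8, 9, 10]
```

For this π we recover F(π) = {8, 9, 10} and I(π̃) = {1, 2, 3, 4, 5, 6, 7, 8, 10}, as in the text.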
Let Q̃ be a probability measure on S̃_n such that

  Q̃((π, J)) = Q̃((π^{−1}, J)) for all (π, J) ∈ S̃_n.   (2.0)

[Note that F(π^{−1}) = F(π), so this condition makes sense.] We refer to this property as augmented symmetry. This terminology is (in part) justified by the fact that if Q̃ is augmented symmetric, then the measure Q on S_n induced by T is given by

  Q(π) = Σ_{J ⊆ F(π)} Q̃((π, J))

and so is symmetric in the usual sense. We assume that Q is not concentrated on a subgroup of S_n or a coset thereof. Thus Q^{*k} approaches the uniform distribution U on S_n for large k.
Suppose that G is a finite group. Label the elements of G as g₁, g₂, . . . , g_{|G|}. Let P be a probability measure defined on G. Define p_i := P(g_i) for 1 ≤ i ≤ |G|. To avoid trivialities, we suppose p_min := min_{1 ≤ i ≤ |G|} p_i > 0. Let ξ̃₁, ξ̃₂, . . . be a sequence of independent augmented permutations, each distributed according to Q̃. These correspond uniquely to a sequence ξ₁, ξ₂, . . . of permutations, each distributed according to Q.
Define Y := (Y₀, Y₁, Y₂, . . .) to be the random walk on S_n with Y₀ := e and Y_k := ξ_k ξ_{k−1} · · · ξ₁ for all k ≥ 1. (There is no loss of generality in defining Y₀ := e, as any other π ∈ S_n can be transformed to the identity by a permutation of the labels.) Define X := (X₀, X₁, X₂, . . .) to be the Markov chain on G^n such that X₀ := x̄₀ = (χ₁, . . . , χ_n) with χ_i ∈ G for 1 ≤ i ≤ n and, at each step k for k ≥ 1, the entries of X_{k−1} whose positions are included in I(ξ̃_k) are independently changed to an element of G distributed according to P. Define W := (W₀, W₁, W₂, . . .) to be the Markov chain on G ≀ S_n such that W_k := (X_k; Y_k) for all k ≥ 0.
Notice that the random walk on G ≀ S_n analyzed in Theorem 1.3 is a special case of W, with P being the uniform distribution and Q being defined as at (1.1). Let P(·, ·) be the transition matrix for W and let P^∞(·) be the stationary distribution for W.

Notice that the stationary distribution is given by

  P^∞((x̄; π)) = (1/n!) ∏_{i=1}^n p_{x_i}

for any (x̄; π) ∈ G ≀ S_n. Thus, using the augmented symmetry of Q̃, one verifies that

  P^∞((x̄; π)) P((x̄; π), (ȳ; σ)) = P^∞((ȳ; σ)) P((ȳ; σ), (x̄; π))

for any (x̄; π), (ȳ; σ) ∈ G ≀ S_n. Therefore, P is reversible, which is a necessary condition in order to apply the comparison technique of Diaconis and Saloff-Coste (1993a).
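Reversibility can also be checked numerically in small cases. The sketch below (state encoding and names ours) builds the full transition matrix for n = 2 cards, G = {0, 1}, and a non-uniform P under the transposition dynamics (1.1), and verifies detailed balance against the product-times-uniform stationary law.

```python
import numpy as np
from itertools import product

n = 2
p = np.array([0.3, 0.7])                  # non-uniform P on G = {0, 1}
perms = [(0, 1), (1, 0)]                  # S_2 in one-line notation
states = [(x, s) for x in product(range(2), repeat=n) for s in perms]

def step_prob(a, b):
    """Transition probability of the n = 2 chain (1.1) from a to b."""
    (x, s), (y, t) = a, b
    pr = 0.0
    # i != j (probability 1/2): transpose, refresh both back numbers.
    if t == (s[1], s[0]):
        pr += 0.5 * p[y[0]] * p[y[1]]
    # i = j (probability 1/4 each for j = 0, 1): refresh back number j.
    if t == s:
        for j in range(2):
            if y[1 - j] == x[1 - j]:
                pr += 0.25 * p[y[j]]
    return pr

P = np.array([[step_prob(a, b) for b in states] for a in states])
# Stationary law: product measure on back numbers, uniform on S_2.
pi = np.array([p[x[0]] * p[x[1]] / 2 for (x, s) in states])
```

Detailed balance holds entrywise, so this 8-state instance of the chain is reversible with respect to pi.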

Convergence to Stationarity: Main Result.
For notational purposes, let μ_n(J) and d_k(J) be defined as at (2.1) and (2.2), respectively.

Example Let Q̃ be defined as at (1.1). Then Q̃ satisfies the augmented symmetry property (2.0). In Corollary 2.8 we will be using Q̃ to define a random walk on G ≀ S_n which is precisely the random walk analyzed in Theorem 1.3.
For now, however, we will be satisfied to determine Q_{S(J)}. The following result establishes an upper bound on the total variation distance by deriving an exact formula for ‖P^k((x̄₀, e), ·) − P^∞(·)‖₂².

Theorem 2.3
Let W be the Markov chain on the complete monomial group G ≀ S_n defined in Section 2.1. Then ‖P^k((x̄₀, e), ·) − P^∞(·)‖₂² is given exactly in terms of μ_n(J) and d_k(J), which are defined at (2.1) and (2.2), respectively.
Before proceeding to the proof, we note the following. In the present setting, the argument used to prove Theorem 3.6.5 of Schoolfield (2001a) gives an upper bound on total variation distance only. In the proof, combining these results gives an expression for P(W_k = (x̄; π)) as a signed sum over subsets J ⊆ [n]; conditionally on {H_k ⊆ J}, the law L((Y₀, Y₁, . . . , Y_k)) is that of a random walk on S_n (through step k) with step distribution Q_{S(J)}. Thus, using the reversibility of P and the symmetry of Q_{S(J)}, the desired result follows.

Corollaries.
We now establish several corollaries to our main result. Each follows readily from Theorem 2.3.
Proof It follows from Corollary 2.4 that the upper bound (2.6) holds. If we let i = n − j, then the upper bound becomes a sum over i. Since c > 0, we have exp(e^{−2c}) < e, from which the desired result follows.
The result thus follows from Corollary 2.5, with m = 1.
Theorem 2.3, and its subsequent corollaries, can be used to bound the distance to stationarity of many different Markov chains W on G ≀ S_n for which bounds on the L² distance to uniformity for the related random walks on S_j for 1 ≤ j ≤ n are known. Theorem 1.2 provides such bounds for random walks generated by random transpositions, showing that (1/2) j log j steps are sufficient. Roussel (2000) has studied random walks on S_n generated by permutations with n − m fixed points for m = 3, 4, 5, and 6. She has shown that (1/m) n log n steps are both necessary and sufficient.
Using Theorem 1.2, the following result establishes an upper bound on both the total variation distance and ‖P^k((x̄₀, e), ·) − P^∞(·)‖₂ in the special case when Q is defined by (1.1). Analogous results could be established using bounds for random walks generated by random m-cycles. When P is the uniform distribution on G, the result reduces to Theorem 1.3.
where Q_{S_j} is the measure on S_j induced by (1.1) and U_{S_j} is the uniform distribution on S_j.
It then follows from Theorem 1.2 that there exists a universal constant a > 0 such that D_k(j) ≤ 4a² e^{−2c} for each 1 ≤ j ≤ n, when k ≥ (1/2) j log j + (1/2) cj. Since n ≥ j and p_min ≤ 1/2, this is also true when k = (1/2) n log n + (1/4) n log(1/p_min − 1) + (1/2) cn. The desired result then follows from Corollary 2.5, with m = 2.
Corollary 2.8 shows that k = (1/2) n log n + (1/4) n log(1/p_min − 1) + (1/2) cn steps are sufficient for the L² distance, and hence also the total variation distance, to become small. A lower bound in the L² distance can also be derived by examining

  n² (1/p_min − 1) (1 − 1/n)^{4k},

which is the contribution, when j = n − 1 and m = 2, to the second summation of (2.6) from the proof of Corollary 2.5. In the present context, the second summation of (2.6) is the second summation in the statement of Theorem 2.3 with μ_n(J) = (|J|/n)². Notice that k = (1/2) n log n + (1/4) n log(1/p_min − 1) − (1/2) cn steps are necessary for just this term to become small.
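The role of this term in locating the cutoff can be seen numerically. In the following sketch (parameter values ours), the term is large (1/2) c n steps before the claimed threshold and small (1/2) c n steps after it.

```python
from math import log

def lower_bound_term(n, p_min, k):
    """n^2 (1/p_min - 1) (1 - 1/n)^{4k}: the j = n - 1, m = 2
    contribution discussed above."""
    return n**2 * (1.0 / p_min - 1.0) * (1.0 - 1.0 / n) ** (4 * k)

n, p_min, c = 500, 0.25, 2.0
threshold = 0.5 * n * log(n) + 0.25 * n * log(1.0 / p_min - 1.0)
before = lower_bound_term(n, p_min, threshold - 0.5 * c * n)
after = lower_bound_term(n, p_min, threshold + 0.5 * c * n)
```

Since (1 − 1/n)^{4k} ≈ e^{−4k/n}, the term is roughly e^{2c} before the threshold and e^{−2c} after it, so it crosses 1 within an O(n)-step window around (1/2) n log n + (1/4) n log(1/p_min − 1).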

A Class of Chains on (G ≀ S_n)/(S_r × S_{n−r}).
We now modify the concept of augmented permutation introduced in Section 2.1. Rather than the ordered pair of a permutation π ∈ S_n and a subset J of F(π), we now take an augmented permutation to be the ordered pair of a permutation π ∈ S_n and a subset J of F(R(π)). [In the above example, F(R(π)) = F(π) = {2, 5, 6, 7}.] The necessity of this subtle difference will become apparent when defining Q̃. For π̃ = (π, J) ∈ S̃_n (defined in Section 2.1), define I(π̃) := I(R(π), J) = I(R(T(π̃)), J).
Thus I(π̃) is the union of the set of indices deranged by R(T(π̃)) and the subset J of the fixed points of R(T(π̃)).
Let Q̃ be a probability measure on the augmented permutations S̃_n satisfying the augmented symmetry property (2.0). Let Q be the induced measure on S_n, as described in Section 2.1.
Let P be a probability measure defined on a finite group G and let p_i for 1 ≤ i ≤ |G| and p_min > 0 be defined as in Section 2.1. Define X := (X₀, X₁, X₂, . . .) to be the Markov chain on G^n such that X₀ := x̄₀ = (χ₁, . . . , χ_n) with χ_i ∈ G for 1 ≤ i ≤ n and, at each step k for k ≥ 1, the entries of X_{k−1} whose positions are included in I(ξ̃_k) are independently changed to an element of G distributed according to P.
Define W := (W₀, W₁, W₂, . . .) to be the Markov chain on (G ≀ S_n)/(S_r × S_{n−r}) such that W_k := (X_k; Y_k) for all k ≥ 0. Notice that the signed generalization of the classical Bernoulli-Laplace diffusion model analyzed in Theorem 1.6 is a special case of W, with P being the uniform distribution on Z₂ and Q being defined as at (1.4).
Therefore, P is reversible, which is a necessary condition in order to apply the comparison technique of Diaconis and Saloff-Coste (1993b).

Convergence to Stationarity: Main Result.
For notational purposes, let Q_{X(J)} be the probability measure on X(J) induced [as described in Section 2.3 of Schoolfield (1998)] by Q_{S(J)}, and let U_{X(J)} be the uniform measure on X(J).

Example Let Q̃ be defined as at (1.4). Then Q̃ satisfies the augmented symmetry property (2.0). In the Bernoulli-Laplace framework, the elements Q̃(κ, {j}) and Q̃(κ, {i, j}) leave the balls on their current racks, but single out one or two of them, respectively; the element Q̃(τκ, ∅) switches two balls between the racks. In Corollary 3.8 we will be using Q̃ to define a Markov chain on (G ≀ S_n)/(S_r × S_{n−r}) which is a generalization of the Markov chain analyzed in Theorem 1.6.
It is also easy to verify that Q_{S(J)} is the probability measure defined previously. The following result establishes an upper bound on the total variation distance by deriving an exact formula for ‖P^k((x̄₀; ẽ), ·) − P^∞(·)‖₂². The proof continues exactly as in the proof of Theorem 2.3 to determine that P(W_k = (x̄; π̃)) is a signed sum of terms (−1)^{|J|} μ̃ over subsets J with B ⊆ J ⊆ [n]. Notice that {H_k ⊆ J} = { ⋃_{ℓ=1}^{k} I(ξ̃_ℓ) ⊆ J } for any k and J. So L((Y₀, Y₁, . . . , Y_k) | H_k ⊆ J) is the law of a Markov chain on S_n/(S_r × S_{n−r}) (through step k) with step distribution Q_{X(J)}. Thus, using the reversibility of P and the symmetry of Q_{X(J)}, the desired result follows.