Mixing time bounds for overlapping cycles shuffles

Consider a deck of n cards. Let p_1, p_2, ..., p_n be a probability vector and consider the mixing time of the card shuffle which at each step picks a position according to the p_i's and moves the card in that position to the top. This setup was introduced in [5], where a few special cases were studied. In particular the case p_{n-k} = p_n = 1/2, k = Θ(n), turned out to be challenging, and only a few lower bounds were produced. These were improved in [1], where it was shown that the relaxation time for the motion of a single card is Θ(n^2) when k/n approaches a rational number. In this paper we give the first upper bounds. We focus on the case m := n − k = ⌊n/2⌋. It is shown that for the additive symmetrization as well as the lazy version of the shuffle, the mixing time is O(n^3 log n). We then consider two other modifications of the shuffle. The first one is the case p_{n-k} = p_{n-k+1} = 1/4 and p_n = 1/2. Using the entropy technique developed by Morris [7], we show that the mixing time is O(n^2 log^3 n) for the shuffle itself as well as for its symmetrization. The second modification is a variant of the first, where the moves are made in pairs, so that if the first move involves position n, then the second move must be taken from positions m or m+1, and vice versa. Interestingly, this shuffle is much slower; the mixing time is at least of order n^3 log n and at most of order n^3 log^3 n. It is also observed that the results of [1] can be modified to improve lower bounds for some k = o(n).


Introduction
How many times does one need to shuffle a deck of n cards to properly randomize it? This intuitively appealing question has turned out to provide one of the most important playgrounds for the development of the more general field of mixing times for Markov chains. The topic dates back to the early 20th century and has been very lively for the last thirty years or so.
In [5], we introduced the class of "GR-shuffles" as a generalization of the (inverse) Rudvalis shuffle, for which at each step of time, either the bottom or the second bottom card is moved to the top, each with probability 1/2. For a GR-shuffle, we have a probability vector p_1, p_2, ..., p_n, and at each step, we pick position i with probability p_i and move the card at that position to the top. In this general form, the problem turns out to be very difficult to analyze, so the focus in [5] was on two special cases. The first of these was the case p_{n-k+1} = ... = p_n = 1/k, for some k = k(n), the "bottom-to-top shuffle". Here it was shown, via coupling on one hand and a variant of Wilson's technique on the other, that the mixing time is Θ((n^3/k^2) log n). The second special case, the topic of this paper, was p_{n-k} = p_n = 1/2, k = k(n) (and k odd to avoid parity problems). An application of Wilson's technique gave a lower bound of order (n^3/k^2) log n, the same as for the bottom-to-top shuffle. This bound, however, is not tight, at least not for k of large order. Indeed, it was observed that for k = n/2, the mixing time for single card motion is Ω(n^2). Angel, Peres and Wilson [1] developed this observation via a careful analysis of the spectrum of the single card chain. The authors observed how the eigenvalues form two cycles close to the boundary of the unit disc in the complex plane. This, combined with the structure of the shuffle itself, inspired them to propose the name "overlapping cycles shuffle". In particular, they found that for any rational α ∈ (0, 1) and k = αn, the relaxation time (i.e. the inverse spectral gap) is Θ(n^2). Perhaps surprisingly, they also showed that for a.e. α, the relaxation time is in fact of a different order, namely Θ(n^{3/2}). Neither [5] nor [1] gave any upper bounds on the mixing time.
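To fix ideas, a GR-shuffle step is easy to simulate. The sketch below is our own illustration (not code from the paper); it implements the general move-to-top dynamics and the special case p_{n−k} = p_n = 1/2, with position 1 denoting the top of the deck.

```python
import random

def move_to_top(deck, i):
    """Move the card at position i (1-indexed from the top) to the top."""
    return [deck[i - 1]] + deck[:i - 1] + deck[i:]

def gr_shuffle_step(deck, p):
    """One step of a GR-shuffle: pick position i with probability p[i-1]
    and move the card there to the top."""
    i = random.choices(range(1, len(deck) + 1), weights=p)[0]
    return move_to_top(deck, i)

def overlapping_cycles_step(deck, k):
    """The special case p_{n-k} = p_n = 1/2 ("overlapping cycles shuffle")."""
    n = len(deck)
    i = n - k if random.random() < 0.5 else n
    return move_to_top(deck, i)

deck = list(range(1, 9))                    # cards labelled 1..8
deck = overlapping_cycles_step(deck, k=3)   # one shuffle step (k odd)
assert sorted(deck) == list(range(1, 9))    # still a permutation
```

Each step only ever moves one card, which is why single-card motion, studied below, carries so much of the spectral information.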
Here we will make progress of various sorts. First, after the necessary preliminaries in Section 2, it will be observed in Section 3 that the results of Angel, Peres and Wilson generalize to give a relaxation time of Θ(nk) for the motion of a single card when k | n − k. When k = o(n), this can then be used together with Wilson's technique (which does not seem to add any extra information for k = Θ(n)) to give a lower bound of order nk log(n/k), an improvement over [5] for k = Ω(n^{2/3}).
From Section 4 onwards, we will focus solely on the case k = n/2 and variants of this shuffle, even though most of the results can obviously be generalized to k = αn, α rational. The variant we shall mainly focus on is the one with p_m = p_{m+1} = 1/4 and p_n = 1/2, n = 2m; let us here call this the "triple shuffle". A variant of the triple shuffle will also be considered. For this variant the shuffles are made in pairs: if the first shuffle in the pair takes the card in position n to the top, then the next move must pick a card from positions m or m + 1, and vice versa. This shuffle will be called the "equalized shuffle". Due to the more deterministic motion of the cards on the lower half of the deck, the equalized shuffle turns out to be slower and more amenable to analysis than the original shuffle. In Section 4 we show that its mixing time is Ω(n^3 log n). (Note that if one equalizes the original overlapping cycles shuffle in the same way, the resulting shuffle will not mix, due to parity problems.) From Section 5, we turn to upper bounds. The short fifth section is concerned with upper bounds in L^2. These are derived via a standard application of the comparison technique of Diaconis and Saloff-Coste [2]. This, combined with a simple path-counting argument, yields upper bounds in L^2 of O(n^3 log n) for the additive symmetrization and the lazy version of the original overlapping cycles shuffle as well as of the triple shuffle, and O(n^5 log n) for the symmetrized or lazy equalized shuffle.
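The two modified shuffles can be sketched in the same style as above (again our own illustration, with `move_to_top` taking a 1-indexed position and the names m, m + 1, n as in the text):

```python
import random

def move_to_top(deck, i):
    """Move the card at position i (1-indexed from the top) to the top."""
    return [deck[i - 1]] + deck[:i - 1] + deck[i:]

def triple_step(deck):
    """Triple shuffle: p_m = p_{m+1} = 1/4, p_n = 1/2, with n = 2m."""
    n = len(deck)
    m = n // 2
    i = random.choices([m, m + 1, n], weights=[1, 1, 2])[0]
    return move_to_top(deck, i)

def equalized_step(deck):
    """Equalized shuffle: one step is a pair of moves, exactly one of which
    is from position n; the other is from position m or m + 1."""
    n = len(deck)
    m = n // 2
    middle = random.choice([m, m + 1])
    if random.random() < 0.5:                     # position n first
        return move_to_top(move_to_top(deck, n), middle)
    else:                                         # position n second
        return move_to_top(move_to_top(deck, middle), n)
```

The pairing in `equalized_step` is exactly what makes the motion on the lower half of the deck nearly deterministic: every pair-step removes one card from the bottom half and inserts one near the middle.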
In Section 6, we turn back to upper bounds in total variation. For the triple shuffle and the equalized shuffle, the results of Section 5 are improved using Morris' entropy technique. The equalized shuffle is shown to have mixing time O(n^3 log^3 n), whereas the triple shuffle mixes in time O(n^2 log^3 n). We also show that this bound is valid for the symmetrized triple shuffle as well.

Basics
Let S be a finite set and let π be a probability measure on S. For a signed measure ν on S with ν(S) = 0 and p ∈ [1, ∞), the L^p-norm of ν with respect to π is given by

‖ν‖_p := ( Σ_{s∈S} |ν(s)/π(s)|^p π(s) )^{1/p}.

The total variation norm of ν is given by

‖ν‖_TV := max_{A⊆S} |ν(A)|.

Obviously ‖ν‖_1 = 2‖ν‖_TV. By Jensen's inequality (Cauchy–Schwarz for p = 1, q = 2), ‖ν‖_p ≤ ‖ν‖_q whenever p ≤ q.
Let {X_t}_{t=0}^∞ be an aperiodic irreducible Markov chain on S with stationary distribution π. It is common to measure the distance between the distribution of X_t and the stationary distribution by some L^p-norm or, most commonly, the total variation norm of their difference. The mixing time of the chain is defined as

τ_mix := min{ t : ‖P(X_t ∈ ·) − π‖_TV ≤ 1/4 }.

The convergence time in L^2 is given by

τ̂ := min{ t : ‖P(X_t ∈ ·) − π‖_2 ≤ 1/2 }.
By the above, τ_mix ≤ τ̂. The relaxation time is defined as

τ_rel := max_λ (1 − |λ|)^{−1},

where the maximum is taken over the eigenvalues λ ≠ 1 of the transition matrix. In this presentation, the state space will always be the symmetric group on n cards, S = S_n, and the Markov chains will be random walks on this group, so π will be uniform: π(σ) = 1/n! for every σ ∈ S_n.
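For a tiny deck these definitions can be checked directly by evolving the exact distribution over S_n. The sketch below is our own illustration: it computes ‖P(X_t ∈ ·) − π‖_TV for the overlapping cycles shuffle with n = 4 and k = 1 (k odd, as required) and reads off τ_mix with the threshold 1/4.

```python
from itertools import permutations

def move_to_top(deck, i):
    """Move the card at position i (1-indexed) to the top of a tuple-deck."""
    return (deck[i - 1],) + deck[:i - 1] + deck[i:]

def tv_mixing_time(n, k, threshold=0.25, tmax=500):
    """Exact distribution evolution of the shuffle p_{n-k} = p_n = 1/2."""
    states = list(permutations(range(1, n + 1)))
    uniform = 1.0 / len(states)
    dist = {s: 0.0 for s in states}
    dist[tuple(range(1, n + 1))] = 1.0          # start from the identity
    for t in range(1, tmax + 1):
        new = {s: 0.0 for s in states}
        for s, mass in dist.items():
            if mass == 0.0:
                continue
            new[move_to_top(s, n - k)] += mass / 2
            new[move_to_top(s, n)] += mass / 2
        dist = new
        tv = 0.5 * sum(abs(mass - uniform) for mass in dist.values())
        if tv <= threshold:
            return t, tv
    return None

t, tv = tv_mixing_time(4, 1)
```

Of course, this brute-force approach is hopeless beyond very small n (|S_n| = n!); the point of the paper's techniques is precisely to avoid it.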

Wilson's technique
Let P be the transition matrix of {X_t} and let ((1 − γ)e^{iθ}, φ) be an eigenvalue/eigenvector pair for P, and suppose that γ ≤ 1/2. Let R be an upper bound on E[|φ(X_{t+1}) − (1 − γ)e^{iθ}φ(X_t)|^2 | X_t], valid for all t. Then (one variant of) Wilson's technique states that:

Lemma 2.1. With the above notation,

τ_mix ≥ (1/(2γ)) ( log( γ|φ(X_0)|^2 / R ) − O(1) ).

The technique was introduced in [9]. A proof of this particular version is found in [5]. The idea of the proof is to use the eigenvalue property E[φ(X_{t+1}) | X_t] = λφ(X_t) together with R to bound the variance of φ(X_T). Chebyshev's inequality then shows that φ(X_T) with high probability has a value far from what it would have, had X_T been stationary.
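The key eigenvalue property is easy to illustrate on a toy chain (our own example, not one from the paper): for the lazy simple random walk on Z_n, the function φ(x) = cos(2πx/n) is an eigenfunction with eigenvalue 1 − γ, where γ = (1 − cos(2π/n))/2, and the one-step increments of φ are uniformly O(1/n), which is exactly the kind of bound that feeds into R.

```python
import math

n = 24
theta = 2 * math.pi / n
lam = (1 + math.cos(theta)) / 2          # eigenvalue of the lazy walk on Z_n

def phi(x):
    return math.cos(theta * x)

# Check the eigenfunction property E[phi(X_{t+1}) | X_t = x] = lam * phi(x):
for x in range(n):
    expected = 0.5 * phi(x) + 0.25 * phi(x + 1) + 0.25 * phi(x - 1)
    assert abs(expected - lam * phi(x)) < 1e-12

# One-step fluctuations of phi are at most theta, so R = O(theta^2) works:
R = max(max(abs(phi(x + d) - phi(x)) for d in (-1, 0, 1))
        for x in range(n)) ** 2
```

Plugging |φ(X_0)| = Θ(1), γ = Θ(1/n^2) and R = O(1/n^2) into the lemma recovers the familiar Ω(n^2 log n)-type lower bound for the cycle, which is the template for the card-shuffle applications below.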

Relaxation times and lower bounds for k = o(n)
Let P_1 be the transition matrix for the motion of a single card, and write m := n − k. The eigenvalue/eigenvector equation λξ = P_1ξ, with the normalization ξ(1) = 1, leads to

ξ(j) = λ^{j−1} for 1 ≤ j ≤ m,   ξ(j) = (2λ − 1)^{j−m−1}(2λ^m − 1) for m < j ≤ n,

and the characteristic equation

f(λ) := (2λ − 1)^k (2λ^{n−k} − 1) − 1 = 0.

Assume that k | n − k and let λ_0 := (1 − γ_0)e^{iθ_0} be a suitably chosen trial point with θ_0 = (1 + o(1))2π/k and γ_0 = Θ(1/(nk)). Some algebraic manipulation shows that f(λ_0) = O(k^{−2}). Since, as is readily seen, f′(λ_0) = Θ(n), there is a zero of f within distance O(n^{−1}k^{−2}) of λ_0. This follows e.g. from Theorem 4.2 of [5].
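The recursion and characteristic equation can be verified mechanically; the following check is our own (not code from the paper). For any λ, the vector ξ built from the recursion ξ(j) = λ^{j−1} (j ≤ m), ξ(j+1) = (2λ−1)ξ(j) (m < j < n) satisfies (P_1ξ)(j) = λξ(j) exactly in rows 1, ..., n − 1, while the defect in the last row equals −f(λ)/2 with f(λ) = (2λ−1)^k(2λ^{n−k} − 1) − 1; so λ is an eigenvalue precisely when f(λ) = 0.

```python
import cmath

def xi_vector(lam, n, k):
    """Candidate eigenvector from the recursion, normalized so xi(1) = 1."""
    m = n - k
    xi = [lam ** j for j in range(m)]        # xi(j) = lam^{j-1}, j = 1..m
    val = 2 * lam ** m - 1                   # xi(m+1) = 2 lam^m - 1
    for _ in range(k):
        xi.append(val)
        val *= 2 * lam - 1                   # xi(j+1) = (2 lam - 1) xi(j), j > m
    return xi

def apply_P1(xi, n, k):
    """One step of the single-card chain of the shuffle p_{n-k} = p_n = 1/2."""
    m = n - k
    out = []
    for j in range(1, n + 1):
        if j < m:
            out.append(xi[j])                          # deterministic move to j+1
        elif j == m:
            out.append(0.5 * xi[0] + 0.5 * xi[m])      # to top, or shift to m+1
        elif j < n:
            out.append(0.5 * xi[j - 1] + 0.5 * xi[j])  # hold, or shift to j+1
        else:
            out.append(0.5 * xi[0] + 0.5 * xi[n - 1])  # to top, or hold
    return out

def f(lam, n, k):
    return (2 * lam - 1) ** k * (2 * lam ** (n - k) - 1) - 1

n, k = 12, 4                                  # here k | n - k
lam = 0.9 * cmath.exp(0.5j)                   # an arbitrary test point
xi = xi_vector(lam, n, k)
residual = [p - lam * x for p, x in zip(apply_P1(xi, n, k), xi)]
assert all(abs(r) < 1e-9 for r in residual[:-1])     # rows 1..n-1 are exact
assert abs(residual[-1] + f(lam, n, k) / 2) < 1e-9   # last-row defect = -f/2
```

Note that λ = 1 is always a root of f, corresponding to the constant eigenvector; the nontrivial roots near the unit circle are the ones governing the relaxation time.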
Hence there is an eigenvalue of the form (1 − γ)e^{iθ} with γ = Θ(1/(nk)) and θ = (1 + o(1))2π/k. If k is also such that 2k | n − k, then this can be strengthened a bit further, since one can then take γ = (1 + o(1))π^2/(2nk) with θ = (1 + o(1))π/k. In summary:

Theorem 3.1. For the overlapping cycles shuffle with k | n − k, the relaxation time for the motion of a single card is Θ(nk).

Minor adjustments of the above proof also lead to

Theorem 3.2. For the triple shuffle, i.e. p_{n−k} = p_{n−k+1} = 1/4 and p_n = 1/2, with k | n − k, the relaxation time for the motion of a single card is also Θ(nk).

Next we plug this into Wilson's technique under the assumption k = o(n). This will lead to the following result.

Theorem 3.3. For the overlapping cycles shuffle with k | n − k and k = o(n), the mixing time is Ω(nk log(n/k)). Moreover, if 2k | n − k, then the implied constant can be sharpened.

For the proof, take φ(X_t) := Σ_i ξ(Y^i_t), where Y^i_t is the position of card i at time t and the sum is over those i for which ℜξ(i) ≥ 0. Then |φ(X_0)| = Θ(n). To bound R, use the triangle inequality together with the relations between the ξ(j)'s above. We note that Theorems 3.1 and 3.3 improve over [5] when k is of larger order than n^{2/3}.

Lower bound for the equalized shuffle
Recall that for the equalized shuffle, the moves are made in pairs, and that the second move in each pair takes the card in position n to the top if and only if the first move does not. To be more precise, counting each pair of moves in this sense as one step, the equalized shuffle is the random walk on S_n generated by the step distribution that gives probability 1/4 to each of the four permutations obtained by composing a move-to-top from position n with a move-to-top from position m or m + 1, in either order. Recall also that n = 2m.
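Concretely, the four generators can be listed and checked mechanically (our own illustration; `move_to_top` as before, decks written top to bottom):

```python
def move_to_top(deck, i):
    """Move the card at position i (1-indexed from the top) to the top."""
    return [deck[i - 1]] + deck[:i - 1] + deck[i:]

def equalized_generators(n):
    """The four equally likely pair-moves of the equalized shuffle (n = 2m):
    a move from position n composed with a move from m or m + 1,
    in either order."""
    m = n // 2
    deck = list(range(1, n + 1))
    gens = []
    for middle in (m, m + 1):
        gens.append(move_to_top(move_to_top(deck, n), middle))   # n first
        gens.append(move_to_top(move_to_top(deck, middle), n))   # n second
    return gens

gens = equalized_generators(6)
# each generator is a genuine permutation of the deck
assert all(sorted(g) == list(range(1, 7)) for g in gens)
```

For n = 6 the four generators, as orderings of the identity deck, are [2,6,1,3,4,5], [6,3,1,2,4,5], [3,6,1,2,4,5] and [6,4,1,2,3,5]; note that each of them moves exactly one card out of the bottom half, which is the near-determinism exploited below.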

Remark.
We use the convention that when π_1 and π_2 are two permutations, we write π_1π_2 for π_2 ∘ π_1. Also, when we use a permutation π to represent the order of a deck of cards, π(i) is the position of card i, so that π^{−1}(k) is the label of the card in position k.
Unlike in the unequalized case, the cards on the lower half of the deck also move in a mainly deterministic manner. We shall see that this makes the single card chain considerably slower and allows Wilson's technique to work smoothly.
As before, the eigenvector equations for the single card chain give ξ(j + 1) = λξ(j) away from the special positions, and the equations for positions m and m + 1 give, after some algebra, a relation determining the remaining values of ξ. After some cleaning up, we have the characteristic equation g(λ) = 0. Rewrite this, letting s := r − 1/3, let w := 2π/s and λ_0 := (1 − cw^2/s)e^{iw}. Then some algebra using Taylor's formula quickly reveals that ℑg(λ_0) = O(n^{−3}). Take, as in the previous section, φ(X_t) = Σ_i ξ(Y^i_t), where Y^i_t is the position of card i at time t and the sum is over those i for which ℜξ(i) > 0. We want to apply Wilson's technique, so let us estimate R. Cards in positions 1, . . . , m − 2 and m + 1, . . . , n − 1 move deterministically to a position whose contribution to φ differs by a factor λ from the previous contribution. Hence each such card contributes a change of order O(n^{−3}), and hence together at most O(n^{−2}). From the above relations between the ξ(j)'s, it is easily seen that the remaining cards contribute a change of at most the same order. Lemma 2.1 now gives a lower bound of (135/(36544π^2)) n^3 log n. We summarize the results of the present section in the following theorem.

Theorem 4.1. The mixing time of the equalized shuffle is Ω(n^3 log n).

Upper bounds in L^2
In this section we will utilize the comparison technique of Diaconis and Saloff-Coste [2]. Let {X_t} and {Y_t} be two random walks on S_n generated by the symmetric probability measures µ and ν respectively. Write E and F for the supports of µ and ν respectively. For each element y ∈ F, choose a representation y = x_1 x_2 ⋯ x_k, where x_j ∈ E and k is odd, and write |y| = k. Let N(x, y) denote the number of times x occurs in the chosen representation of y, and let

A* := max_{x∈E} (1/µ(x)) Σ_{y∈F} |y| N(x, y) ν(y).

Then it is shown in [2] that A* is an upper bound for the ratio of the Dirichlet forms of ν and µ. For our purposes it suffices to know the following consequence.

Lemma 5.1. For every t,

‖P(X_{A*t} ∈ ·) − π‖_2^2 ≤ n! e^{−t} + ‖P(Y_{⌈t/2⌉} ∈ ·) − π‖_2^2.
Lemma 5.1 is a special case of Lemma 5 of [2].
Since the comparison technique is restricted to symmetric random walks, we also need a result from Saloff-Coste [8] (Theorem 10.2), which states that if H := µ(id) is significant, then a random walk generated by µ cannot be much slower than its symmetrized version. (Recall that the additive symmetrization of the walk generated by µ is defined as the walk generated by µ^s := (µ + µ̂)/2, where µ̂(x) := µ(x^{−1}).)

Lemma 5.2. Let {X_t} be the walk generated by µ, let H := µ(id) and let {X^s_t} be the additive symmetrization. Then

‖P(X_t ∈ ·) − π‖_2^2 ≤ n! e^{−Ht/2} + ‖P(X^s_{Ht/4} ∈ ·) − π‖_2^2.
The most common benchmark walk {Y_t} to use for comparison is the random transpositions shuffle, i.e. the random walk generated by ν(id) = 1/n and ν((i j)) = 2/n^2, 1 ≤ i < j ≤ n. The random transpositions shuffle is very well understood. In particular the next result, due to Diaconis and Shahshahani [3], will be of use here.

Lemma 5.3.
Let {Y_t}_{t=0}^∞ be the random transpositions shuffle. There exists a constant C such that for t = (1/2)n(log n + c), c > 0,

‖P(Y_t ∈ ·) − π‖_2 ≤ Ce^{−c}.

We are now ready to prove the main result of this section: for the additive symmetrization and the lazy version of the overlapping cycles shuffle with k = n/2, and likewise for the triple shuffle, τ̂ = O(n^3 log n). Represent a transposition y = (i, j), i < j ≤ m, by a word of the form v u v′ in the generators, where v brings the two cards to the special positions, u transposes them there and v′ undoes v. Note that this representation of y has odd length. Since |v| = m + 2, we have |y| = 2n + 5. If j > m, we add a prefix to v making the necessary moves to take both i and j to the upper half of the deck; this takes a prefix of length at most m + 1. Hence in general |v| ≤ 2m + 3, and so |y| is still odd and |y| ≤ 3m + 7 ≤ 4n. Now apply Lemma 5.1. For {X^s_t} we get A* = 64n^2, so Lemma 5.1 and Lemma 5.3 give

‖P(X^s_{128n^3 log n} ∈ ·) − π‖_2^2 ≤ n! e^{−2n log n} + ‖P(Y_{n log n} ∈ ·) − π‖_2^2 = o(1).
Finally, Lemma 5.2 (with H = 1/2 for the lazy walk) entails ‖P(L_{1024n^3 log n} ∈ ·) − π‖_2^2 = o(1). Since the same bounds on |y| hold for the triple shuffle, the exact same argument goes through with only an adjustment of time by a factor 2.
Next we turn to the equalized shuffle. This is very similar to the above, so we will be a bit sketchy. Consider the lazy additive symmetrization of the equalized shuffle. It is an easy exercise to show that a transposition of positions m − 1 and m can be made in two moves. A transposition of positions m and m + 1 can be brought about in four moves. Now fix two positions i < j. Unless j = i + 1, a round of n moves can always be made to bring the two cards one step closer together. From this it is easily seen that n^2 moves suffice to bring cards i and j to positions m − 1 and m, or m and m + 1, where they can be transposed, whereupon the moves bringing them together can be reversed. To make the number of moves odd, add an extra lazy move. Hence |(i, j)| = O(n^2), and so in Lemma 5.1, A* = O(n^4). An argument analogous to the above now gives τ̂ = O(n^5 log n). Finally, an application of Lemma 5.2 takes care of the non-symmetrized case.
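The short representations of the middle transpositions can be verified by brute force over words in the pair-move generators and their inverses (our own check for n = 6; here a "word letter" is one pair-step of the symmetrized walk):

```python
from itertools import product

def move_to_top(deck, i):
    return (deck[i - 1],) + deck[:i - 1] + deck[i:]

def pair_move(deck, middle, n_first):
    """One equalized pair-step: position n and position `middle` (m or m+1)."""
    n = len(deck)
    if n_first:
        return move_to_top(move_to_top(deck, n), middle)
    return move_to_top(move_to_top(deck, middle), n)

def apply_gen(g, d):
    """Apply a generator (given by its action on the identity deck) to d."""
    return tuple(d[g[p] - 1] for p in range(len(d)))

def inverse(g):
    inv = [0] * len(g)
    for p, src in enumerate(g):
        inv[src - 1] = p + 1
    return tuple(inv)

n, m = 6, 3
identity = tuple(range(1, n + 1))
gens = [pair_move(identity, middle, n_first)
        for middle in (m, m + 1) for n_first in (True, False)]
gens += [inverse(g) for g in gens]       # symmetrization allows inverses

def min_word_length(target, max_len):
    """Shortest product of generators equal to `target`, or None."""
    for length in range(1, max_len + 1):
        for word in product(gens, repeat=length):
            d = identity
            for g in word:
                d = apply_gen(g, d)
            if d == target:
                return length
    return None

swap_m1_m = (1, 3, 2, 4, 5, 6)   # transposition of positions m-1 and m
swap_m_m1 = (1, 2, 4, 3, 5, 6)   # transposition of positions m and m+1
assert min_word_length(swap_m1_m, 4) == 2
assert min_word_length(swap_m_m1, 4) == 2
```

For n = 6 both middle transpositions already arise from two pair-steps, comfortably within the move counts used in the text.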

Upper bounds in total variation
We will use the main theorem of Morris [7]. First we extract what we need from [7]. Let µ and ν be two probability measures on S. The relative entropy of µ with respect to ν is given by

ENT(µ‖ν) := Σ_{s∈S} µ(s) log( µ(s)/ν(s) ).

An equivalent expression is ENT(µ‖ν) = E_µ[log(µ(X)/ν(X))]. By Jensen's inequality it follows that ENT(µ‖ν) ≥ 0, with equality if and only if µ = ν.
When the measure ν is suppressed from the notation, it will be understood that ν is uniform. In this case we will simply speak of the relative entropy of µ. We note that

ENT(µ) = log |S| − H(µ),

where H(µ) is the usual absolute entropy of µ. Relative entropy relates to the total variation norm in the following way.
Lemma 6.1. Let π be the uniform probability measure on S. Then

‖µ − π‖_TV ≤ ( ENT(µ)/2 )^{1/2}.

Proof. Since ‖µ − π‖_TV = (1/2)‖µ − π‖_1, by the Cauchy–Schwarz inequality it suffices to show that ‖µ − π‖_1^2 ≤ 2 ENT(µ). This is a standard optimization problem over the µ(s)'s.
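Both facts are easy to sanity-check numerically (our own illustration): the relative entropy to the uniform measure equals log|S| − H(µ), and the bound ‖µ − π‖_TV ≤ √(ENT(µ)/2) holds for arbitrary µ.

```python
import math
import random

def ent_relative_to_uniform(mu):
    """ENT(mu) = sum_s mu(s) log(mu(s) |S|), with 0 log 0 = 0."""
    size = len(mu)
    return sum(p * math.log(p * size) for p in mu if p > 0)

def absolute_entropy(mu):
    return -sum(p * math.log(p) for p in mu if p > 0)

def tv_to_uniform(mu):
    size = len(mu)
    return 0.5 * sum(abs(p - 1.0 / size) for p in mu)

random.seed(1)
for _ in range(100):
    weights = [random.random() for _ in range(10)]
    total = sum(weights)
    mu = [w / total for w in weights]
    ent = ent_relative_to_uniform(mu)
    # ENT(mu) = log|S| - H(mu)
    assert abs(ent - (math.log(10) - absolute_entropy(mu))) < 1e-12
    # Lemma 6.1: ||mu - pi||_TV <= sqrt(ENT(mu)/2)
    assert tv_to_uniform(mu) <= math.sqrt(ent / 2) + 1e-12
    # Jensen: relative entropy is nonnegative
    assert ent >= -1e-12
```

The extreme case µ = δ_id, for which ENT(µ) = log|S|, is the starting point of the inductive entropy argument below.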
When X is a random variable, write for simplicity ENT(X) for ENT(L(X)), where L(X) is the law of X. For two random variables X and Y, ENT(X | Y = y) and the random variable ENT(X | Y) then have the obvious interpretations.
From now on we take S = S_n. Let Z be a random permutation. Some algebraic manipulation and induction leads to the well-known chain rule for entropies.

Lemma 6.2. For 1 ≤ j ≤ n, let F_j := σ(Z^{−1}(j), Z^{−1}(j + 1), . . . , Z^{−1}(n)). Then for any 1 ≤ i ≤ n,

ENT(Z) = Σ_{k=i}^{n} E[ENT(Z^{−1}(k) | F_{k+1})] + E[ENT(Z | F_i)].

In particular, with i = 1 and E_k := E[ENT(Z^{−1}(k) | F_{k+1})], we get ENT(Z) = Σ_k E_k.

Let c(i, j) denote the random permutation that equals id with probability 1/2 and the transposition (i j) with probability 1/2. A permutation of this kind will be called a collision of the positions i and j. Assume that a random permutation Y is expressed in the form Y = Y_0 c(a_1, b_1) ⋯ c(a_r, b_r), where the a_j's and b_j's are distinct and the collisions are independent given Y_0 (but where the number of collisions and the identities of the a_j's and b_j's may depend on Y_0). Fix a positive integer t and let Y_1, Y_2, . . . , Y_t be iid copies of Y. Morris' main estimate, here referred to as Lemma 6.3, then bounds the entropy ENT(Z Y_1 ⋯ Y_t) in terms of the E_k's, where the E_k's refer to Z.
Next we apply this to the triple shuffle. Write one step Y of the triple shuffle in the above form by first letting Y_0 be a step of the original overlapping cycles shuffle p_m = p_n = 1/2, and then, on the event that Y_0 picks position m, following it by the collision c(1, m + 1). It is easy to see, as above, that there are constants 0 < c < C < ∞ such that c/(jn) < E I_j ≤ C/(jn). Hence EN = Θ(k/n), and it is easy to see, also as above, that the expected contribution is of the right order for a j for which I_j = 1 (provided that τ_j ≤ T_1 − m, which holds with probability 1 − o(e^{−n})). Since this happens with probability Θ(n/k^2), we have P(M(k) = i) = Θ(1/k).

Proof. Let X_t be t steps of the triple shuffle. We want to apply Lemma 6.3. By the chain rule we have ENT(X_t) = Σ_k E_k. Partition the indices k into blocks I_0, I_1, . . . , I_{log_2 n}, where I_j := [n] ∩ {2^j, . . . , 2^{j+1} − 1}. Then, since there are no more than log_2 n + 2 < 2 log n blocks,

Σ_k E_k ≤ 2 log n Σ_{k∈I_{j*}} E_k,

where j* is the index j that maximizes the sum Σ_{k∈I_j} E_k. Write k* = max I_{j*}. By Lemma 6.4 applied to k ∈ I_{j*} and T as in the lemma, taken together with Lemma 6.3 with the same T, the last two observations give an entropy decay estimate in which t ≤ 20(k* ∨ n^{1/2})^2 and C′ = aC/4. Using this inductively yields, for r = 1, 2, . . ., a geometric decay of the entropy. Taking Z = id, noting that ENT(id) = log(n!) < n log n, and r = 2((k* ∨ n^{1/2})/k*) n^{1/2} log^3 n then gives

ENT(Z X_{40((k* ∨ n^{1/2})^3/k*) log^3 n}) ≤ (C′ log n)/n.
Next we turn to the symmetrized shuffle. The following analogue of Lemma 6.4 turns out to be neater in its formulation, but a bit trickier to prove.

Lemma 6.5. Fix k ∈ [n], let l := 2^{⌈log_2 k⌉}, T = 10l^2 and t = 100n^2. Then there is a constant a > 0, independent of n, k and i, such that P(M(k) = i) ≥ a/k.

Proof. The intuition behind this is simple: with high probability, cards k and i will spend a significant proportion of the time up to T on different halves of the deck. During this time they will diffuse with respect to each other a distance which is typically of order k. This will therefore cancel their starting distance with probability of order 1/k. Doing this properly, however, takes some work. When U_s ∈ {1, . . . , 2m − 1}, the walk behaves like SRW with holding probability 1/2; indeed, in the former case U_s moves according to F_s, and in the latter it moves along with L_s. When at 0, i.e. when card k is in E, the random walk behaves differently. Let B be the event that none of the walks spends more than time CT up to time T at 0. Then P(B) is at least, say, 9/10, once C is properly chosen.
Whenever card k enters E, it will exit E in the upper half of the deck with a probability depending on where it hits E. This probability is readily computed to be 192/253 from 0, 200/253 from m, 12/23 from m + 1 and 96/253 from 2m. The particular numbers are not so important; we just note that they are all in the interval [1/5, 4/5]. In short, each time U_s hits 0, it may change state, from going along with F_s to following L_s or vice versa, and a change takes place with probability at least 1/5. Of course, all this goes for V_s too.
Taken together, U_s and V_s make the exact same moves, except when the cards k and i are on different halves of the deck, in which case the walk corresponding to the card in the lower half only makes the move with probability 1/2 and instead holds with probability 1/2. We also note that if cards i and k are in E at the same time, the conditional probability that i goes to the upper/lower half of the deck, given that k goes to the upper/lower half, is at least 1/5. Now consider for a while conditioning on the set of time points at which one of the cards k and i hits E, and the position in which this happens. For each such time point s_j, let A_j be the event that the card in question exits on a different half than the other card, and that an independent coin flip results in heads. We let this coin be biased in such a way that A_j happens with probability 1/5. Let A(p) be the event that the proportion of time off 0 between s_1 and some stopping time, such that A_j occurred for the latest s_j, is at least p. Then, regardless of the structure of the set of s_j's, the conditional probability of A(1/5) is at least 1/5. Let A be A(1/5) intersected with an extra independent coin flip, biased in such a way that the conditional probability of A is exactly 1/5.
The point of the extra coin flips is that A carries no information on the structure of the s_j's, and since F_s and L_s are independent of which half i and k are on, conditioning on A leaves no information on {D_s}. Note that s ≤ 2T/3 + CT for all s counted in X. Let Y be the distance, modulo m in the above sense, at time T between cards k and i caused by k − i, the rest of the D_s's and the moves made when one of the cards is in E. Since P(|Y| ≤ 10k^2) is bounded away from 0 and X is independent of Y, A, B and {s_1 > T/3}, the local CLT implies that there exists a constant b > 0 such that P(Y − X = 1) ≥ b/k. Also, given this, there is clearly a probability bounded away from 0 that either i or k hits 0 in the time interval [2T/3, T] and that, at the last time before T this happened, i and k both went to the upper half of the deck. If this happens, then at time T, cards k and i are next to each other on the upper half of the deck with k on top of i.
Finally, given all this, there is a probability of at least 1/16 that M(k) = i; this happens e.g. if the first two moves following the first time after T that card i hits position m are favorable. This completes the proof.

The mixing time bound O(n^2 log^3 n) now extends to the symmetrized triple shuffle.

Proof. Copying the proof of Theorem 6.1 word for word, but using t = 100n^2 and Lemma 6.5 instead of Lemma 6.4, shows that the mixing time is bounded by 200n^2 log^3 n.
Some further small adjustments also lead to the following upper bound for the equalized shuffle, only a factor log^2 n off from the lower bound in Section 4.

Proof. An analogue of Lemma 6.5 goes through, but with t = 3n^3 and T = 2nl^2. The proof is a copy of the proof of Lemma 6.5, with the difference (and simplification) that cards i and k move non-deterministically with respect to each other only when one of them is at position m, m + 1 or n.
Mimicking the proof of Theorem 6.1 once again now gives a mixing time of O(n^3 log^3 n).