Improved bounds for the mixing time of the random-to-random shuffle

We prove an upper bound of 1.5321 n log n for the mixing time of the random-to-random insertion shuffle, improving on the best known upper bound of 2n log n. Our proof is based on the analysis of a non-Markovian coupling.


Introduction
How many shuffles does it take to mix up a deck of cards? Mathematicians have long been attracted to card shuffling problems, partly because of their natural beauty, and partly because they provide a testing ground for the more general problem of finding the mixing time of a Markov chain, which has applications in computer science, statistical physics and optimization. Let X_t be a Markov chain on a finite state space V that converges to the uniform distribution. For probability measures µ and ν on V, define the total variation distance ||µ − ν|| = (1/2) Σ_{v∈V} |µ(v) − ν(v)|, and define the ε-mixing time of the chain as the first time t at which the total variation distance between the law of X_t and the uniform distribution is at most ε, for every starting state. The best known upper bound for the mixing time of the random-to-random insertion shuffle is 2n log n; the contribution of this paper is to improve the constant in the upper bound to 1.5321. We achieve this via a non-Markovian coupling that reduces the problem of bounding the mixing time to finding the second largest eigenvalue of a certain Markov chain on 10 states. We also use the technique of path coupling (see [1]).
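As an illustration (ours, not part of the paper), the total variation distance can be computed directly from this definition for two distributions given as dictionaries:

```python
def total_variation(mu, nu):
    """Total variation distance ||mu - nu|| = (1/2) * sum_v |mu(v) - nu(v)|."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(v, 0.0) - nu.get(v, 0.0)) for v in support)

# A biased distribution versus the uniform distribution on {0, 1, 2, 3}.
mu = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
uniform = {v: 0.25 for v in range(4)}
print(round(total_variation(mu, uniform), 10))  # 0.2
```

As a distribution approaches uniform, this quantity tends to 0; the mixing time records how many steps that takes.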

Main result
For sequences a_n and b_n, we write a_n ∼ b_n if lim_{n→∞} a_n/b_n = 1, and a_n ≲ b_n if lim sup_{n→∞} a_n/b_n ≤ 1. Let P be the transition matrix of the random-to-random insertion shuffle. Define d(t) = max_{x∈S_n} ||P^t(x, ·) − U||, where U is the uniform distribution on S_n. When the number of cards is n, we write d_n(t) for the value of d(t), and T^{(n)}_mix(ε) for the ε-mixing time of the random-to-random insertion shuffle. Our main result is the following upper bound on T^{(n)}_mix(ε).

Theorem 2.1. For any ε ∈ (0, 1) we have T^{(n)}_mix(ε) ≲ 1.5321 n log n.

We think of a permutation π in S_n as representing the order of a deck of n cards, with π(i) = position of card i. Say x and x′ are adjacent, and write x ≈ x′, if x′ = (i, j)x for a transposition (i, j). We prove Theorem 2.1 using a path coupling argument (see [1]) and the following lemma.

Lemma 2.2.
There exist positive constants c and α such that if n is sufficiently large and x and x′ are adjacent permutations in S_n, then ||P^t(x, ·) − P^t(x′, ·)|| ≤ c/n^{1+α} for all t > 1.5321 n log n.
The proof of Lemma 2.2, which uses a non-Markovian coupling, is deferred to Section 3.
Proof of Theorem 2.1. Suppose that t > 1.5321 n log n. By convexity of the l^1-norm, and since U = (1/n!) Σ_{z∈S_n} P^t(z, ·), it follows that for any state y we have

||P^t(y, ·) − U|| ≤ max_z ||P^t(y, ·) − P^t(z, ·)||.    (2.1)

Since any permutation in S_n can be written as a product of at most n − 1 transpositions, by the triangle inequality and Lemma 2.2 the quantity on the right-hand side of (2.1) is at most (n − 1) · c/n^{1+α} ≤ c n^{−α}, which tends to zero as n → ∞.

Proof of Lemma 2.2
Recall that we think of a permutation π in S_n as representing the order of a deck of n cards, with π(i) = position of card i. Let M_{i,j} : S_n → S_n be the operation on permutations that removes the card of label i from the deck and re-inserts it to the right of the card of label j if i ≠ j, and re-inserts it at the front of the deck if i = j. We call such operations shuffles. If M_1, . . . , M_k is a sequence of shuffles, we write xM_1 · · · M_k for the permutation obtained by applying them in order to x. The transition rule for the random-to-random insertion shuffle can now be stated as follows. If the current state is x, choose a shuffle M uniformly at random (that is, choose a and b uniformly at random and let M = M_{a,b}) and move to xM.
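To make the transition rule concrete, here is a small simulation sketch (ours, not the paper's). It represents a deck as a list of card labels read left to right, and assumes the convention that M_{i,i} re-inserts card i at the front:

```python
import random

def shuffle_move(deck, i, j):
    """Apply M_{i,j}: remove card i and re-insert it to the right of card j;
    if i == j, re-insert it at the front of the deck (assumed convention)."""
    deck = [c for c in deck if c != i]
    if i == j:
        return [i] + deck
    pos = deck.index(j)
    return deck[:pos + 1] + [i] + deck[pos + 1:]

def random_to_random_step(deck, rng):
    """One step of the random-to-random insertion shuffle: a and b uniform."""
    n = len(deck)
    a = rng.randrange(1, n + 1)
    b = rng.randrange(1, n + 1)
    return shuffle_move(deck, a, b)

rng = random.Random(0)
deck = list(range(1, 11))  # cards 1..10 in order
for _ in range(50):
    deck = random_to_random_step(deck, rng)
print(deck)  # some permutation of 1..10
```

Note that a single step touches only one card, which is why adjacent permutations (differing by a transposition) are the natural unit for the path coupling argument.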
We call the numbers in {1, . . . , n} cards. If a shuffle M removes card c from the deck and then re-inserts it, we call M a c-move.
If P = M 1 , M 2 , . . . is a sequence of shuffles, we write (Px) t for the permutation xM 1 · · · M t . Note that if P is a sequence of independent uniform random shuffles, then {(Px) t : t ≥ 0} is the random-to-random insertion shuffle started at x.

The Non-Markovian coupling
Fix a permutation x and i, j ∈ {1, 2, . . . , n}. The aim of this subsection is to define a coupling of the random-to-random insertion shuffle starting from x and (i, j)x, respectively. Suppose that we couple the processes so that the same labels are chosen for each shuffle. Note that if there is an i-move (respectively, j-move) followed at some point by a j-move (respectively, i-move), then the processes will couple at the time of the j-move (respectively, i-move) provided that any cards placed to the right of card j (respectively, i) at any intermediate time (and any cards placed to the right of those cards, and so on) were subsequently removed. We keep track of these "problematic" cards using a process we call the queue.
For positive integers k we will call a sequence M_1, . . . , M_k of shuffles a k-path. For a k-path P, define the P-queue (or, simply, the queue) as the following Markov chain {Q_t : t = 0, . . . , k} on subsets of cards. Initially, we have Q_0 = ∅. If the queue at time t is Q_t, and the shuffle at time t + 1 is M_{a,b}, the next queue Q_{t+1} is {i, j} \ {a} if a ∈ {i, j}; Q_t ∪ {a} if a ∉ {i, j} and b ∈ Q_t; Q_t \ {a} if a ∉ {i, j}, a ∈ Q_t and b ∉ Q_t; and Q_t otherwise. We call a shuffle an i-or-j move if it is an i-move or a j-move. Note that at any time after the first i-or-j move the queue contains exactly one card from {i, j}.

Let P = M_1, . . . , M_k be a k-path. For t < k, we say that t is a good time of P if

1. M_t is an i-or-j move;

2. there is a time t′ ∈ {t + 1, . . . , k} such that M_{t′} is a c-move with Q_{t′−1} = {c} for some c ∈ {i, j};

and we call T the last good time of P (with T = ∞ if P has no good time). Let θ_{i,j}P be the k-path obtained from P by reversing the roles of i and j in each shuffle before time T (that is, by replacing shuffle M_{a,b} with M_{π(a),π(b)}, where π is the transposition of i and j). Note that θ_{i,j}P has i-or-j moves at the same times as P. Furthermore, since the queue is reset at the times of i-or-j moves, the θ_{i,j}P-queue will have the same values as the P-queue at all times t ≥ T. It follows that the last good time of θ_{i,j}P is the same as the last good time of P, and hence θ_{i,j}(θ_{i,j}(P)) = P. Since θ_{i,j} is its own inverse, it is a bijection, and hence if P is a uniform random k-path, then so is θ_{i,j}P.

Let x′ = (i, j)x. Let P_k be a uniform random k-path, and let T_k be the last good time of P_k. Note that T_k < k or T_k = ∞. For t with 0 ≤ t ≤ k, define x_t = (P_k x)_t and x′_t = ((θ_{i,j}P_k)x′)_t. It is clear that x_t and x′_t have distributions P^t(x, ·) and P^t(x′, ·), respectively, for all t ≤ k.

ECP 22 (2017), paper 22.
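The queue dynamics can be sketched in code. The update rule below is our reading of the construction (an i-or-j move resets the queue to the other card of the pair; a card inserted to the right of a queue card joins the queue; a queue card re-inserted elsewhere leaves it), and should be checked against the paper's display:

```python
def queue_step(Q, a, b, i, j):
    """One queue update under the shuffle M_{a,b}, for the fixed pair (i, j).
    Assumed rule: an i-or-j move resets the queue to the other card of the
    pair; otherwise a card inserted to the right of a queue card joins the
    queue, and a queue card re-inserted elsewhere leaves it."""
    if a == i:
        return {j}
    if a == j:
        return {i}
    if b in Q:
        return Q | {a}
    if a in Q:
        return Q - {a}
    return Q

# Trace the queue along a short 5-shuffle path for (i, j) = (1, 2).
Q = set()
for (a, b) in [(3, 4), (1, 5), (6, 2), (6, 3), (2, 4)]:
    Q = queue_step(Q, a, b, 1, 2)
    print(Q)
```

In the trace, the 1-move resets the queue to {2}, card 6 joins when it is placed to the right of card 2 and leaves when it is moved elsewhere, and the final 2-move occurs while the queue is exactly {2}, which is the event that makes a time good.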
Lemma 3.1. If T_k < k, then x_k = x′_k.

Proof. Assume that T_k < k. Note that at any time t < T_k, the permutation (P_k x)_t can be obtained from ((θ_{i,j}P_k)x′)_t by interchanging the cards i and j. Suppose that the next i-or-j move after time T_k occurs at time T′_k. Without loss of generality, there is an i-move at time T_k and a j-move at time T′_k. We claim that for times t with T_k ≤ t < T′_k, the permutation x′_t can be obtained from x_t by moving only the cards in Q_t, as shown in the diagram below. (In the diagram, the m-th X in the top row represents the same card as the m-th X in the bottom row, and Q represents all the cards in Q_t.)

x_t :  X X X X X X Q X X X
x′_t : X X X Q X X X X X X

To see this, note that it holds at time T_k, when the queue is the singleton {j} (since at this time card i occupies the same position in both permutations), and the transition rule for the queue process ensures that if it holds at time t then it also holds at time t + 1. The claim thus follows by induction. This means that at time T′_k − 1 the permutations differ only in the location of card j. That is, they are of the form:

x_{T′_k − 1} :  X X X X X X j X X X
x′_{T′_k − 1} : X X X j X X X X X X

Thus at time T′_k, when card j is removed and then re-inserted into the deck, the two permutations become identical, and they remain identical until time k.

Tail estimate of the coupling time
Recall that T k is the last good time of a uniform random k-path.

Lemma 3.2.
Suppose that k > 1.5321 n log n. Then there exist positive constants c and α such that P(T_k = ∞) ≤ c/n^{1+α} for sufficiently large n.
Proof. Consider a process Y_t ∈ {0, 1, . . .} ∪ {∞} that is defined as follows. The process starts in state ∞ and remains there until the first i-or-j move. From this point on, the value of Y_t is the size of the queue, until the first time that either 1. card i is moved when the queue is {i}, or 2. card j is moved when the queue is {j}.
At this point Y_t moves to state 0, which is an absorbing state. Note that T_k = ∞ exactly when Y_k > 0. It is easy to check that Y_t is a Markov chain; write K for its transition matrix. The transition rule is as follows. If the current state is 0, the next state is 0. If the current state is ∞, the next state is 1 with probability 2/n and ∞ with probability (n − 2)/n.
If the current state is l ∈ {1, 2, . . .}, the next state is l + 1 with probability l(n − l − 1)/n²; l − 1 with probability (l − 1)(n − l)/n²; 1 with probability 2/n if l ≥ 2, while if l = 1 the next state is 0 with probability 1/n and 1 with probability 1/n; and otherwise the state is unchanged. The possible transitions of Y_t and Ỹ_t are indicated by the graph in Figure 1. We claim that if we start with Ỹ_0 = Y_0 = ∞, then the distribution of Ỹ_t stochastically dominates the distribution of Y_t for all t. To see this, note that Y_t changes state with probability less than 1/2 at each step, and when it changes state, it either makes a ±1 move or it transitions to 1. Since for m ∈ {1, 2, . . .} ∪ {∞} the transition probability K(m, 1) is decreasing in m, it follows that Y_t is a monotone chain. (That is, K(x, ·) is stochastically increasing in x; see [3].) The claim follows since Ỹ_t is obtained from Y_t by replacing moves to 9 with moves to the (larger) state ∞.
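Monotonicity of a kernel on an ordered state space can be checked mechanically: K is monotone exactly when, for each threshold z, the tail probability Σ_{y ≥ z} K(x, y) is nondecreasing in x. The following check (illustrative only, applied to a small made-up birth-and-death kernel rather than the matrix K above) implements that criterion:

```python
def is_monotone(K):
    """Check that a transition matrix K on ordered states 0..m-1 is monotone:
    for each threshold z, the tail sum of K(x, y) over y >= z is
    nondecreasing in x (rows are stochastically increasing)."""
    m = len(K)
    for z in range(m):
        tails = [sum(row[z:]) for row in K]
        if any(tails[x + 1] < tails[x] - 1e-12 for x in range(m - 1)):
            return False
    return True

# A lazy birth-and-death chain on {0, 1, 2, 3}: step up or down w.p. 1/4 each.
K = [
    [0.75, 0.25, 0.0, 0.0],
    [0.25, 0.5, 0.25, 0.0],
    [0.0, 0.25, 0.5, 0.25],
    [0.0, 0.0, 0.25, 0.75],
]
print(is_monotone(K))  # True
```

For a monotone chain, pushing some transitions to a larger state (as in passing from Y_t to Ỹ_t) can only increase the chain in the stochastic order, which is the domination used in the proof.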
Let K̃ denote the transition matrix of Ỹ_t, let K̃_n be its value when the number of cards is n, and let K̂_n be the matrix obtained by deleting the first row and the first column of K̃_n (that is, by removing the absorbing state 0). If we write A_n → A for a sequence of matrices A_n and a fixed matrix A, it means that A_n converges to A component-wise as n → ∞.
Define C_n := n(K̂_n − I), where I is the identity matrix. A straightforward calculation shows that C_n → C for an explicit 9 × 9 matrix C, and that the eigenvalues of C are real and distinct (and hence C is diagonalizable), and negative. Denote the largest eigenvalue of C by −λ, where λ = 0.652703 . . . (We can improve the eigenvalue marginally by considering a Markov chain with more than 10 states. For example, with 35 states we get an eigenvalue of −0.6527363 . . . However, we cannot improve on this by more than 10^{−7} even if we use up to 100 states. Therefore, for simplicity we shall stick to our 10-state chain as a reasonable approximation to Y_t.) Since C is diagonalizable, there exists an invertible 9 × 9 matrix Q such that Q^{−1}CQ = D, where D is a diagonal matrix whose diagonal entries are the eigenvalues of C. Let D_n = Q^{−1}C_nQ, and note that D_n → D. For matrices A, let ||A|| denote the matrix norm induced by the l^1 norm on vectors. By continuity of the matrix exponential function and of the matrix norm, we have lim_{n→∞} ||e^{D_n}|| = ||e^D|| = e^{−λ}. Since λ > 0.6527, it follows that ||e^{D_n}|| ≤ e^{−0.6527} for sufficiently large n. Since k/n > 1.5321 log n, submultiplicativity of operator norms implies that for sufficiently large n we have ||e^{(k/n)D_n}|| ≤ e^{−0.6527 × 1.5321 log n} ≤ 1/n^{1+α} for some α > 0.
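The largest eigenvalue of a limiting matrix such as C can be found numerically without any linear algebra library: when the off-diagonal entries are nonnegative (as for a generator of the form n(K̂_n − I)), power iteration on a shifted copy C + sI converges to the Perron eigenvalue, and subtracting s recovers the top eigenvalue of C. The 2 × 2 matrix below is a made-up toy with top eigenvalue (−5 + √5)/2, not the paper's 9 × 9 matrix C:

```python
def top_eigenvalue(C, shift=None, iters=1000):
    """Largest eigenvalue of a matrix C with real eigenvalues and nonnegative
    off-diagonal entries, via power iteration on the shifted nonnegative
    matrix A = C + shift*I."""
    m = len(C)
    if shift is None:
        shift = max(-C[x][x] for x in range(m)) + 1.0  # makes A nonnegative
    A = [[C[x][y] + (shift if x == y else 0.0) for y in range(m)] for x in range(m)]
    v = [1.0] * m
    mu = shift
    for _ in range(iters):
        w = [sum(A[x][y] * v[y] for y in range(m)) for x in range(m)]
        mu = max(abs(c) for c in w)          # current eigenvalue estimate of A
        v = [c / mu for c in w]              # renormalize the iterate
    return mu - shift

# Toy generator with eigenvalues (-5 +/- sqrt(5)) / 2.
C = [[-2.0, 1.0], [1.0, -3.0]]
print(round(top_eigenvalue(C), 6))  # -1.381966
```

Applied to the 9 × 9 limit matrix of the proof, the same recipe would produce the value −λ = −0.652703 . . . quoted above.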