The mixing time of the fifteen puzzle

We show that there are universal positive constants c and C such that the mixing time T_{mix} for the fifteen puzzle on an n by n torus satisfies c n^4 log n < T_{mix} < C n^4 log n.


Introduction
The fifteen puzzle, often credited to Sam Loyd, was a craze in the 1880s. The game consists of a 4 × 4 grid with fifteen tiles, labeled 1, 2, . . . , 15, and an empty space (the "hole"). In a move, the player pushes a tile into the hole. The tiles start in "mixed up" order and the goal is to sort the tiles and move the hole to the lower right corner, as shown in Figure 1. There are also 3 × 3 and 2 × 4 versions of the game. In this paper we study the problem, posed by Diaconis [1], of finding the mixing time of the fifteen puzzle: starting from a solved game, how many steps are required to "mix up" the tiles again, if at each step we choose a move uniformly at random? (See Section 2 for a precise definition of the mixing time.)
We can define the fifteen puzzle on any finite graph G as follows. In a configuration, the tiles and hole occupy the vertices of G. In a move, the hole is interchanged with a tile in an adjacent vertex. If G is bipartite, then there are some configurations that are not reachable from a given starting state. To see this, note that if G is bipartite then we can define a parity for each vertex in G. Thus, if we view configurations as permutations π on the vertex set of G, and define Ω = {π : parity(π) = parity(position of the hole)}, then only configurations in Ω are reachable by legal moves; in this case we say the game is solvable if every configuration in Ω is reachable. If G is not bipartite, we say the game is solvable if every configuration is reachable by legal moves. The fifteen puzzle is known to be solvable on most graphs (see [15]); in particular, it is solvable on an m × n grid provided that m and n are both at least 2 (see [7]).
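For illustration (our own sketch, not part of the paper), the parity invariant can be checked by simulation on the 4 × 4 grid: every legal move composes the configuration with a transposition and simultaneously moves the hole across the bipartition, so parity(π) + parity(hole position) is preserved mod 2:

```python
import random

N = 4  # 4 x 4 grid

def neighbors(v):
    x, y = divmod(v, N)
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < N and 0 <= ny < N:
            yield nx * N + ny

def perm_parity(p):
    # parity (0 = even, 1 = odd) via cycle decomposition
    seen, parity = [False] * len(p), 0
    for i in range(len(p)):
        if not seen[i]:
            j, clen = i, 0
            while not seen[j]:
                seen[j] = True
                j = p[j]
                clen += 1
            parity ^= (clen - 1) & 1
    return parity

def vertex_parity(v):
    x, y = divmod(v, N)
    return (x + y) & 1

random.seed(0)
config = list(range(N * N))  # config[v] = label at vertex v; label 15 is the hole
hole = N * N - 1             # the hole starts at the last vertex
invariant = (perm_parity(config) + vertex_parity(hole)) & 1
for _ in range(10_000):
    nb = random.choice(list(neighbors(hole)))
    config[hole], config[nb] = config[nb], config[hole]
    hole = nb
    # each move is a transposition (flips parity(pi)) that also moves the
    # hole to the opposite colour class, so the sum of parities is invariant
    assert (perm_parity(config) + vertex_parity(hole)) & 1 == invariant
```

Starting from the identity configuration the invariant equals 0, and it never changes along the trajectory.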
In the present paper, we analyze the fifteen puzzle on the n × n torus G_n := Z_n^2. We consider the Markov chain, which we call the Loyd process, in which at each step a uniform random move is made. (We actually consider the lazy version of the Loyd process, where we add a holding probability of 1/2 to each state to avoid periodicity.) The Loyd process is related to the interchange process on G_n, which is defined as follows.
In a configuration, each vertex of G_n is occupied by a particle. At each step, choose a pair of neighboring particles uniformly at random and then interchange them. Yau [16] famously showed that the log-Sobolev constant for the interchange process is of order n^{-4}, which implies that the mixing time is O(n^4 log n), and there is a matching lower bound [11]. The Loyd process can be viewed as a variant of the interchange process in which there is a special particle (the hole) that is conditioned to be involved in every step.
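The two processes can be sketched in a few lines (a toy implementation with our own conventions):

```python
import random

n = 5
V = [(x, y) for x in range(n) for y in range(n)]
DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def add(v, d):
    return ((v[0] + d[0]) % n, (v[1] + d[1]) % n)

def interchange_step(config, rng):
    # swap the particles across a uniformly chosen (vertex, direction) pair
    v = rng.choice(V)
    w = add(v, rng.choice(DIRS))
    config[v], config[w] = config[w], config[v]

def loyd_step(config, hole, rng):
    # the hole is conditioned to be involved: swap it with a random neighbour
    w = add(hole, rng.choice(DIRS))
    config[hole], config[w] = config[w], config[hole]
    return w  # new hole position

rng = random.Random(1)
inter = {v: v for v in V}   # particle labels = starting vertices
for _ in range(1000):
    interchange_step(inter, rng)

loyd = {v: v for v in V}
hole = (0, 0)               # the particle labelled (0, 0) plays the hole
for _ in range(1000):
    hole = loyd_step(loyd, hole, rng)

assert loyd[hole] == (0, 0)          # the hole's label travels with it
assert sorted(inter.values()) == V   # both dynamics just permute the labels
assert sorted(loyd.values()) == V
```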
Our main result determines the mixing time of the Loyd process to within constant factors. Let T_V be the mixing time in total variation for the Loyd process (see Section 2 for a precise definition). We show that there are universal constants c > 0 and C > 0 such that c n^4 log n ≤ T_V ≤ C n^4 log n.
For the upper bound, we use the comparison techniques for random walks on groups developed in [2], which allow us to bound the mixing time for the Loyd process using known bounds for shuffling by random transpositions. A difficulty that arises here is that G n is bipartite when n is even, which implies that there is a restricted state space.
To handle this, we use a method to compare mixing times across different state spaces.
To compare our chain with shuffling by random transpositions, we introduce three intermediate chains and then make a total of four comparisons.
For the lower bound, we use a variant of Wilson's method [14], in which the distinguishing statistic has the form Σ_p f(position of particle p), where the sum is over a certain set of particles, and f is an eigenfunction for the motion of a single particle. In a typical application of Wilson's method, only a bounded number of particles are involved in each move, and hence the distinguishing statistic is slowly decaying. However, in the Loyd process, each move of the hole affects the distribution of the final position of every tile, which makes the "Wilson statistic" hard to analyze. Fortunately, by making use of some surprising cancellations we are able to prove a lower bound of the correct order cn^4 log n. Note that our proof, although based on Wilson's method, is self-contained and does not rely on any of the results in [14].

2 ‖p^t(x, ·) − π‖_TV ≤ χ(p^t(x, ·), π). (2.2)

Define the ε-mixing time in L² by

T_mix(ε) = min{t : χ(p^t(x, ·), π) ≤ ε for all x ∈ S}. (2.3)

The mixing time in L² is T_mix = T_mix(1/4). By (2.2), for any ε > 0 we have

T_V(ε) ≤ T_mix(2ε). (2.4)

For S′ ⊂ S, let τ_1 < τ_2 < · · · be the times when the chain is in S′. The restriction of the Markov chain to S′ is the new Markov chain (X_{τ_1}, X_{τ_2}, . . . ). For f : S′ → R, the harmonic extension of f to S is the function f̃ that agrees with f on S′ and is harmonic on S \ S′; it can be defined by f̃(x) = E_x f(X_{T_{S′}}), where E_x(·) := E(· | X_0 = x) and T_{S′} = min{t ≥ 0 : X_t ∈ S′} is the hitting time of S′. If P is the transition matrix of the chain, we consider the continuous-time Markov process that moves at rate 1 according to P. Thus, for i ≠ j the continuous-time chain moves from i to j at rate p(i, j). Let H_t be the transition probabilities for the continuous-time chain, defined by H_t(x, y) = Σ_{k≥0} e^{−t} (t^k / k!) p^k(x, y). Define

τ_mix(ε) = min{t : χ(H_t(x, ·), π) ≤ ε for all x ∈ S}, (2.5)

and the mixing time in L² for the continuous-time chain is τ_mix = τ_mix(1/4).
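As a concrete illustration (our own toy example, not from the paper), the following computes the L² mixing time T_mix(1/4) for the lazy simple random walk on a cycle of length 6, and checks that twice the total variation distance is at most the χ distance:

```python
import math

n = 6
pi = [1.0 / n] * n

def step_dist(mu):
    # one step of the lazy walk: hold w.p. 1/2, else move to a uniform neighbour
    return [0.5 * mu[i] + 0.25 * mu[(i - 1) % n] + 0.25 * mu[(i + 1) % n]
            for i in range(n)]

def chi(mu):
    # chi-square (L^2) distance between mu and the uniform distribution pi
    return math.sqrt(sum(pi[x] * (mu[x] / pi[x] - 1.0) ** 2 for x in range(n)))

def tv(mu):
    return 0.5 * sum(abs(mu[x] - pi[x]) for x in range(n))

def t_mix(eps):
    # by symmetry of the cycle it suffices to start from a single state
    mu, t = [1.0] + [0.0] * (n - 1), 0
    while chi(mu) > eps:
        mu, t = step_dist(mu), t + 1
    return t

t = t_mix(0.25)
mu = [1.0] + [0.0] * (n - 1)
for _ in range(t):
    mu = step_dist(mu)
assert 2 * tv(mu) <= chi(mu) + 1e-12  # the inequality behind (2.2)
```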

Random walks on groups and comparison techniques
Let G be a finite group and let p be a probability measure supported on a set of generators of G. The random walk on G driven by p is the Markov chain with the following transition rule. If the current state is x, choose y at random according to p, and then move to xy.
In the present paper we shall use a slightly more general definition of a random walk on a group. For a finite group G, we write G* for the set of strings over G, that is, finite sequences of elements of G. If g_1 g_2 · · · g_k ∈ G*, we define its evaluation as the group element g_1 · g_2 · · · g_k (where · is the group operation). As an abuse of notation, we use the string itself as notation for its evaluation. (Thus there may exist strings y and y′ such that y ≠ y′ in G*, but y = y′ in G.) If two strings evaluate to the same group element, we say that one is a representation of the other.
Let H be a subgroup of G, let p be a probability measure on G * , and suppose that {g ∈ G : g is the evaluation of a string in the support of p} is a generating set for H. The random walk on H driven by p is the Markov chain with the following transition rule. If the current state is x ∈ H: 1. choose the string y at random according to p; 2. move to xy.
We call strings in the support of p moves. For x and y in G*, we write xy for the concatenation of x and y. Note that since the random walk on H driven by p is doubly stochastic, the stationary distribution is uniform over H.

Comparison techniques
We say that p is symmetric if p(g_1 · · · g_k) = p(g_k^{−1} · · · g_1^{−1}) for every g_1 · · · g_k ∈ G*. Let p and p̃ be symmetric probability measures on G* that drive random walks on a subgroup H of G. Think of p̃ as driving a known chain and p as driving an unknown chain. Let E be the support of p. For each y in the support of p̃, we give a random representation of y of the form Z_1 Z_2 · · · Z_K, where K is possibly random, and each Z_i is a random element of E. Given such a representation, we write |y| for the value of K. For z ∈ E, let N(z, y) be the number of times that z occurs in the representation of y.
Define A = max_{z∈E} (1/p(z)) E( |Y| N(z, Y) ), where Y is chosen at random according to p̃.

Remark 3.3. In the original formulation in [4] the representations are deterministic. In this case the quantity A can be written as A = max_{z∈E} (1/p(z)) Σ_y p̃(y) |y| N(z, y).
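To make this concrete, here is a toy computation of A (our own example, not a pair of chains used in the paper): the group is Z_9, the unknown chain p is driven by the generators ±1, the known chain p̃ is uniform on the nonzero elements, and each y is represented deterministically by repeating the nearer generator |y| = min(y, 9 − y) times.

```python
n = 9
p = {1: 0.5, n - 1: 0.5}                          # generators +1 and -1 mod n
ptilde = {y: 1.0 / (n - 1) for y in range(1, n)}  # uniform on nonzero elements

def representation(y):
    # shortest representation: repeat +1 if y is small, else repeat -1
    return [1] * y if y <= n // 2 else [n - 1] * (n - y)

A = 0.0
for z in p:
    total = sum(ptilde[y] * len(representation(y)) * representation(y).count(z)
                for y in ptilde)
    A = max(A, total / p[z])
# each generator collects sum_{y<=4} y^2 / 8 = 30/8 weight, divided by p(z) = 1/2
```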

Mixing time upper bound: main theorem
Before stating the mixing time upper bound, we give a more formal description of the Loyd chain, and we also describe some other chains that are used in comparisons.
Suppose n ≥ 2 and let V_n be the vertex set of the n × n torus G_n. Note that if we give each tile and the hole a unique label in V_n, then we can view configurations as permutations on V_n. For reasons that will become clear later, we give the hole the label h := (0, 0). For y = (y_1, y_2) ∈ V_n, call y even if y_1 + y_2 is even, and define Ω and Ω^c as in equations (1.1) and (1.2). Since the fifteen puzzle is solvable on a grid of size 2 × 2 or larger, any pair of states in Ω (respectively, Ω^c) communicate. Furthermore, there are transitions between Ω and Ω^c if and only if n is odd. It follows that the state space is restricted to half of the permutations exactly when n is even. If we start from a configuration in Ω, then the state space is Ω if n is even, and all permutations on V_n if n is odd.
As stated in the Introduction, we prove the upper bound by comparing the Loyd chain with shuffling by random transpositions, using a number of intermediate chains.
For easy reference we give a short description of each of these chains below. We shall describe each chain in discrete time, but in fact our comparisons will involve the mixing times of the continuous-time versions of each chain.
1. Loyd chain: interchange the hole with one of four adjacent tiles, chosen uniformly at random.
2. Hole-conditioned chain (HC): interchange the hole with a tile chosen uniformly at random.
3. Shuffling by random transpositions (RT): choose two particles uniformly at random and then swap them. (Here particle refers to both the tiles and the hole.)

The following two chains are defined when n is even.

4. Parity-conditioned chain (PC): choose a tile whose position has opposite parity to that of the hole, uniformly at random, and then interchange it with the hole.

5. Ω-restricted chain (OR): the hole-conditioned chain, restricted to Ω. That is, if T_1 < T_2 < · · · are the times when the hole-conditioned chain X_t is in Ω, then the Ω-restricted chain is {X_{T_j} : j ≥ 1}.
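The move distributions above can be sketched as follows (a hypothetical mini-implementation; the convention of describing moves by the hole's displacement is ours):

```python
import random

n = 4  # n even, so the parity-conditioned chain is well defined
rng = random.Random(0)
V = [(x, y) for x in range(n) for y in range(n)]

def loyd_move():
    # 1. displacement to one of the hole's four neighbours
    return rng.choice([(1, 0), (n - 1, 0), (0, 1), (0, n - 1)])

def hc_move():
    # 2. displacement to a uniformly chosen tile (any nonzero vertex)
    return rng.choice([v for v in V if v != (0, 0)])

def rt_move():
    # 3. a uniformly chosen pair of distinct particles to swap
    return tuple(rng.sample(V, 2))

def pc_move():
    # 4. displacement with odd coordinate sum, i.e. to a tile whose
    #    position has parity opposite to the hole's
    return rng.choice([v for v in V if (v[0] + v[1]) % 2 == 1])

assert hc_move() != (0, 0)
a, b = rt_move()
assert a != b
moves = [loyd_move() for _ in range(100)] + [pc_move() for _ in range(100)]
assert all((y[0] + y[1]) % 2 == 1 for y in moves)  # these displacements are all odd
```

Note that for even n every Loyd displacement has odd coordinate sum, which is the observation behind the restricted state space.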
Our main theorem concerns the lazy version of the Loyd process, which is obtained from the Loyd process by adding a holding probability of 1/2. That is, at each step we do nothing with probability 1/2; otherwise we move as in the Loyd process.

Theorem 4.1. There is a universal constant C > 0 such that the total variation mixing time of the lazy Loyd process satisfies T_V ≤ C n^4 log n.

Proof. By (2.4) and Theorem 20.3 of [9], it is sufficient to show that the bound holds for the L² mixing time of the continuous-time Loyd chain. Hereafter, we will simply write mixing time for the L² mixing time of a continuous-time chain.
For the case when n is odd, Theorem 4.1 follows from a comparison of the hole-conditioned chain with shuffling by random transpositions and a comparison of the Loyd chain with the hole-conditioned chain, which we prove below as Lemmas 5.1 and 6.5, respectively. Here, α denotes a universal constant. The proof of Lemma 5.1 can be found in Section 5. The proofs of Lemmas 6.2, 6.3, 6.4 and 6.5 can be found in Section 6.

Comparison of hole-conditioned chain with random transpositions
Lemma 5.1. The mixing times τ_mixRT and τ_mixHC satisfy τ_mixHC(ε) ≤ α τ_mixRT(ε) for a universal constant α.

Proof. Let G be the symmetric group on V_n, with the group operation given by composition of permutations. For permutations π on V_n, if we think of π(j) as representing the label of the particle in position j, then we can view shuffling by random transpositions (respectively, the hole-conditioned chain) as the random walk on G driven by p̃ (respectively, p), where

p̃ = uniform distribution on transpositions (i, j) with i ≠ j and i, j ∈ V_n;
p = uniform distribution on transpositions (h, i) with i ≠ h and i ∈ V_n.

Comparisons involving the remaining chains
The subsequent chains that we analyze are random walks on a different group. Note that the hole-conditioned chain, Loyd chain, and parity-conditioned chain all can be described as follows. At each step: 1. choose y according to some distribution on V n ; 2. if the hole is in position x, interchange it with the tile in position x + y.
To see that these are random walks on a group, let V_n′ = V_n \ {(0, 0)} and note that a configuration can be specified by an ordered pair (x, f), where x ∈ V_n is the position of the hole, and f : V_n′ → V_n′ is the permutation defined by f(z) = (position of tile z) − x.
(Thus f gives the positions of the tiles relative to the hole; note that f maps tiles to positions, whereas for the permutations in Section 5 it was the other way around.) Let G be the group whose elements are {(x, f) : x ∈ V_n, f is a permutation on V_n′} and with the group operation (x, f) · (y, g) = (x + y, f g). Thus G is the direct product of V_n and the symmetric group on V_n′. For y ∈ V_n, the transition that translates the hole by y is right multiplication by the group element (y, π_y), where π_y is the permutation defined by

π_y(z) = z − y for z ∈ V_n′ with z ≠ y, and π_y(y) = −y. (6.1)

As an abuse of notation, we write y for the move (y, π_y). We write ↑, ↓, →, and ← for the moves (0, 1), (0, −1), (1, 0), and (−1, 0), respectively.
The Ω-restricted chain. We write 0 for the identity element of G. If n is even, define Ω = {(x, f) ∈ G : f is an even permutation if and only if x is an even vertex}; then Ω is the set of states reachable from 0 in the Loyd chain. Note that Ω is closed under products and inverses and hence is a subgroup of G. It is not hard to show that the permutation π_y defined in (6.1) is odd unless y = 0. This implies that the move y is in Ω if and only if y is odd or 0. We shall call such moves good and the other moves bad. Note that the product of moves y_1 y_2 · · · y_m is in Ω if and only if an even number of the y_i are bad.
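A quick computational check of the claim that π_y is odd for every y ≠ 0, under our reading of (6.1) (π_y(z) = z − y for z ≠ y, and π_y(y) = −y), on the 4 × 4 torus:

```python
n = 4
V = [(a, b) for a in range(n) for b in range(n)]
Vp = [v for v in V if v != (0, 0)]

def sign(perm):
    # sign of a permutation given as a dict on Vp, via cycle lengths
    seen, sgn = set(), 1
    for v in Vp:
        if v not in seen:
            w, clen = v, 0
            while w not in seen:
                seen.add(w)
                w = perm[w]
                clen += 1
            if clen % 2 == 0:  # even-length cycle flips the sign
                sgn = -sgn
    return sgn

for y in Vp:
    pi_y = {z: ((z[0] - y[0]) % n, (z[1] - y[1]) % n) for z in Vp if z != y}
    pi_y[y] = ((-y[0]) % n, (-y[1]) % n)
    assert sorted(pi_y.values()) == sorted(Vp)  # pi_y is a permutation of Vp
    assert sign(pi_y) == -1                     # and it is odd
```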
The Ω-restricted chain is a random walk on Ω where each move is generated as follows: 1. Let y 1 , y 2 , . . . be i.i.d. moves of the hole-conditioned chain, and let T = min{m ≥ 1 : an even number of the moves y 1 , . . . , y m are bad}; 2. Let the move be y 1 y 2 · · · y T .
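The move generator above can be sketched as follows (our own sketch; here "bad" means an even, nonzero displacement):

```python
import random

n = 4
rng = random.Random(2)
Vp = [(a, b) for a in range(n) for b in range(n) if (a, b) != (0, 0)]

def is_bad(y):
    return (y[0] + y[1]) % 2 == 0  # even, nonzero displacements are bad

def or_move():
    # draw i.i.d. HC moves until the number of bad ones is even, then
    # concatenate them; this is one move of the Omega-restricted chain
    ys, bad = [], 0
    while True:
        y = rng.choice(Vp)
        ys.append(y)
        bad += is_bad(y)
        if bad % 2 == 0:  # the stopping time T
            return ys

for _ in range(1000):
    ys = or_move()
    assert sum(map(is_bad, ys)) % 2 == 0          # an even number of bad moves
    assert (len(ys) == 1) == (not is_bad(ys[0]))  # T = 1 iff y_1 is good
```

Note that every move produced this way is of the form g, b_1 b_2, or b_1 g_1 · · · g_k b_2, which is used in the comparison below.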

Comparison of hole-conditioned chain with Ω-restricted chain
Note that the Ω-restricted chain is a "sped up" version of the hole-conditioned chain; this suggests that its mixing time is at most of the same order. In this section we show that this is indeed the case. The key step in the proof is to show that the time the hole-conditioned chain spends in Ω over the interval [0, t] is tightly concentrated around t/2. More precisely, let X̃_t be the continuous-time version of the hole-conditioned chain and define the occupation measure L_t(Ω) = ∫_0^t 1(X̃_s ∈ Ω) ds. For t ≥ 0, define T_t = inf{s : L_s(Ω) > t} and define X_t = X̃_{T_t}. Note that X_t is the continuous-time version of the Ω-restricted chain.

EJP 22 (2017), paper 9.
When t is large, the random variable T_t is typically about 2t. In the following lemma we give some large-deviation bounds. Note that there are n²/2 − 1 bad vertices and n²/2 good vertices in V_n′.

Lemma 6.1. Let λ = (n² − 2)/(2(n² − 1)) be the proportion of vertices in V_n′ that are bad. For any a > 0 there exist positive constants α_a and β_a such that the tail bounds (6.2) and (6.3) hold.

Proof. We shall prove that the lemma holds with explicit choices of α_a and β_a. The indicator process 1(X̃_t ∈ Ω) changes state precisely when a bad move is made, so its second eigenvalue is 1 − 2λ. Thus, Theorem 3.4 of [10] implies the upper-tail bound (6.4) for any h > t/2, and the lower-tail bound (6.5) for any h < t/2. Hence, for any s > 0 we have (6.6) and

√(2t) ∫_a^∞ exp( −s²λt / (6(2 + s)) ) ds. (6.7)

First we show that (6.2) follows from (6.4). Using the identity E(X) = t ∫_0^∞ P(X > tu) du, valid for any t > 0 and non-negative random variable X, gives a bound by the quantity (6.7), where we have made the change of variable s = u + a, and the inequality holds by (6.4). Since s/(2 + s) ≥ a/(2 + a) whenever s ≥ a, the quantity (6.7) is at most

√(2t) ∫_a^∞ exp( −asλt / (6(2 + a)) ) ds = (6√2 (2 + a))/(aλ) · exp( −a²λt / (6(2 + a)) ), (6.8)

verifying (6.2). One can similarly show that (6.3) follows from (6.5). (Note that the bound in (6.5) is stronger than the bound in (6.4), but the right-hand sides of (6.2) and (6.3) are the same.) This completes the proof.
Lemma 6.2. Suppose that n is even. For any ε > 0 there is a C_ε > 0 such that the mixing times of the Ω-restricted and hole-conditioned chains satisfy τ_mixOR(2ε) ≤ max{2τ_mixHC(ε), C_ε n² log n}.

Proof. Let p and p̃ be the transition probabilities for the Ω-restricted and hole-conditioned chains, respectively, and let π and π̃ be the corresponding stationary distributions. Let d_2(t) denote the L² distance to stationarity at time t for the Ω-restricted chain, with a similar definition for d̃_2(t). Let x be a state in Ω. Since p is symmetric, conditioning on X_t gives p^{2t}(x, x) = Σ_{y∈Ω} p^t(x, y)², and hence (6.10) holds. (The factor 2 is present because the state space for the hole-conditioned chain is twice as large as that of the Ω-restricted chain.) Note that replacing t by t/2 in (6.10) gives a bound in terms of 1 + d̃_2²(t/2). (6.11) The integral in (6.12) can be interpreted as the expected amount of time that the Ω-restricted chain spends at x between times t and 2t. This is equal to the expected amount of time that the hole-conditioned chain spends at x between times T_t and T_{2t}, which is at most the quantity (6.14), and in turn at most (6.15), where B and D are constants that depend only on ε. The first term in (6.15) bounds the integral in (6.14) because the integrand is bounded above by p̃^t(x, x) (since a ≤ 1/2), and the second term in (6.15) bounds the expectations in (6.14) by Lemma 6.1. (Note that the proportion λ of vertices in V_n′ that are bad is at least 1/3 for n ≥ 2.) Now, note that whenever t ≥ 2τ_mixHC(ε) we have d̃_2²(t/2) ≤ ε², and hence we obtain an upper bound on the integral in (6.12) by (6.9). Combining this with (6.13) gives d_2(t) ≤ 2ε whenever t ≥ C_ε n² log n. It follows that d_2(t) ≤ 2ε whenever t ≥ 2τ_mixHC(ε) and t ≥ C_ε n² log n, proving the lemma.

Comparison of parity-conditioned chain to Ω-restricted chain

Lemma 6.3. Suppose that n is even. Then the mixing times τ_mixPC and τ_mixOR satisfy τ_mixPC(ε) ≤ α τ_mixOR(ε) for a universal constant α.

Proof. In order to compare the Ω-restricted chain with the parity-conditioned chain we introduce an intermediate chain, which we denote BGB. A move of the BGB chain is a concatenation of between 1 and 3 moves of the HC chain, generated as follows. Let b_1 and b_2 be uniform random bad moves and let g be a uniform random good move in V_n′. The BGB move is one of g, b_1 b_2, or b_1 g b_2. We shall use Theorem 3.1 twice, first to compare BGB with the Ω-restricted chain, then to compare PC with BGB.
Comparison of BGB with the Ω-restricted chain. We need to show how to represent moves of the Ω-restricted chain using BGB moves. Consider a move y of the Ω-restricted chain. Then y is of the form g, b_1 b_2, or b_1 g_1 g_2 · · · g_k b_2, where we write b's for bad moves and g's for good moves. If y = g (respectively, y = b_1 b_2) then we can represent it as g (respectively, b_1 b_2), since this is also a BGB move. Suppose now that y = b_1 g_1 g_2 · · · g_k b_2.
In this case we represent it as z_1 · · · z_k, where the z_i are defined by z_1 = b_1 g_1 B_1^{−1}, z_i = B_{i−1} g_i B_i^{−1} for 1 < i < k, and z_k = B_{k−1} g_k b_2, for uniform random bad moves B_1, . . . , B_{k−1}. (Note that the inverse of a bad move is again a bad move, so each z_i is a BGB move.)
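As a sanity check of the telescoping, the following verifies, with concrete permutations standing in for the moves, that strings of the assumed form z_1 = b_1 g_1 B_1^{−1}, z_i = B_{i−1} g_i B_i^{−1} (1 < i < k), z_k = B_{k−1} g_k b_2 (our reading of the construction) multiply out to b_1 g_1 · · · g_k b_2:

```python
import random

def compose(p, q):
    # evaluation of the string "p then q": (p q)(i) = q(p(i))
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    out = [0] * len(p)
    for i, v in enumerate(p):
        out[v] = i
    return tuple(out)

rng = random.Random(3)
m, k = 6, 5
randperm = lambda: tuple(rng.sample(range(m), m))

b1, b2 = randperm(), randperm()
gs = [randperm() for _ in range(k)]
Bs = [randperm() for _ in range(k - 1)]

lefts = [b1] + Bs                         # leading bad move of each z_i
rights = [inverse(B) for B in Bs] + [b2]  # trailing bad move of each z_i
zs = [compose(compose(lefts[i], gs[i]), rights[i]) for i in range(k)]

target = b1
for g in gs:
    target = compose(target, g)
target = compose(target, b2)

prod = zs[0]
for z in zs[1:]:
    prod = compose(prod, z)
assert prod == target  # z_1 ... z_k evaluates to b_1 g_1 ... g_k b_2
```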
We apply Theorem 3.1, letting p̃ = measure corresponding to the Ω-restricted chain; p = measure corresponding to the BGB chain.
We need to bound the quantity A = max_z A(z), where A(z) = (1/p(z)) E(|Y| N(z, Y)). Recall that there are n²/2 − 1 bad vertices and n²/2 good vertices in V_n′. If z = g then z is used only in the representation of g itself, and since p(g) is comparable to p̃(g), it follows that A(z) is bounded by a universal constant. It remains to check the case when z is of the form b_1 g b_2. Note that P(K = k) = 2^{−(k+2)} for k ≥ 1. Furthermore, conditional on K = k, the distributions of Z_1, . . . , Z_k are uniform over moves of the form b_1 g b_2. It follows that E(N(z, Y) | K = k) = k/|S|, where S is the set of moves of the form b_1 g b_2, and hence that A(z) is bounded by a universal constant in this case as well. Thus Theorem 3.1 gives

τ_mixBGB(ε) ≤ α τ_mixOR(ε). (6.17)

Comparison of PC with BGB. We need to represent each BGB move using PC moves. If y = g then y is odd, hence a PC move, and we represent it as g itself. To handle moves of the form b_1 b_2 and b_1 g b_2, we first note that if e_1, e_2 ∈ V_n′ are even and o ∈ V_n′ is odd, then we can represent the BGB move e_1 o e_2 as in (6.18). Note that the moves in (6.18) are moves of the PC chain, since the corresponding elements of V_n are odd. If y is of the form b_1 g b_2, we can represent it with PC moves using (6.18). If y is of the form b_1 b_2, we first give it the intermediate representation (b_1 G B)(B^{−1} G^{−1} b_2), where B and G are uniform random bad and good moves, respectively, and then represent both b_1 G B and B^{−1} G^{−1} b_2 using (6.18). Note that the maximum length of the representation of any y is 14. We apply Theorem 3.1 again, this time letting p̃ (respectively, p) be the measure corresponding to the BGB chain (respectively, the PC chain). We need to bound the corresponding quantity A. Note that for each k ∈ {1, 7, 14}, the conditional distribution of Z_1, . . . , Z_k given |Y| = k is uniform over the set of PC moves. It follows that for every PC move z we have E(N(z, Y) | |Y| = k) = k/|PC|, where we write |PC| for the number of PC moves. It follows that, for any PC move z, we have E(|Y| N(z, Y)) ≤ 14²/|PC|, where the last bound holds because |Y| ≤ 14. Since p is the uniform distribution over PC moves, we have p(z) = 1/|PC|, and hence Theorem 3.1 implies that τ_mixPC(ε) ≤ 822 τ_mixBGB(ε). (6.19) Combining this with (6.17) yields the lemma.

Comparisons of parity-conditioned and hole-conditioned chains with Loyd chain
Lemma 6.4. Suppose that n is even. Then the mixing times τ_mixLoyd(ε) and τ_mixPC(ε) satisfy τ_mixLoyd(ε) ≤ α n² τ_mixPC(ε) for a universal constant α.
Proof. In order to apply Theorem 3.1, we need to show how to represent any move of the PC chain using moves of the Loyd chain. We will actually show how to represent PC moves using a different Markov chain, which we call near Loyd (NL). In the NL chain, each move is a move y of the PC chain conditioned to satisfy |y_1| + |y_2| ∈ {1, 3}, where for u ∈ Z_n we define |u| = min(u, n − u). That is, each step of the NL chain swaps the hole with a tile at L¹-distance 1 or 3 from it. A representation using NL moves is sufficient because any NL move can be represented using a bounded number of Loyd moves: if the L¹-distance between the hole and tile T is at most 3, then there is a 3 × 3 square grid that contains both the hole and tile T, and the fifteen puzzle is solvable in a 3 × 3 grid.
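The solvability fact invoked here can be verified directly by breadth-first search over the 3 × 3 puzzle (our own check): exactly half of the 9! configurations are reachable from the solved one, and the configuration that swaps the hole with a single tile is reachable precisely when that tile is at odd L¹-distance (here, 1 or 3) from the hole:

```python
from collections import deque

def hole_neighbors(state):
    # state[c] = label at cell c of the 3 x 3 grid; label 8 is the hole
    s = list(state)
    h = s.index(8)
    hx, hy = divmod(h, 3)
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = hx + dx, hy + dy
        if 0 <= nx < 3 and 0 <= ny < 3:
            j = nx * 3 + ny
            s[h], s[j] = s[j], s[h]
            yield tuple(s)
            s[h], s[j] = s[j], s[h]

start = tuple(range(9))      # solved configuration, hole in cell 8
seen = {start}
queue = deque([start])
while queue:
    for nb in hole_neighbors(queue.popleft()):
        if nb not in seen:
            seen.add(nb)
            queue.append(nb)

assert len(seen) == 181440   # half of 9! = 362880 configurations are reachable

# swapping the hole with the tile in cell j (all else fixed) is reachable
# exactly when cell j is at odd L1-distance from the hole's cell (2, 2)
for j in range(8):
    target = list(start)
    target[8], target[j] = target[j], target[8]
    jx, jy = divmod(j, 3)
    dist = abs(2 - jx) + abs(2 - jy)
    assert (tuple(target) in seen) == (dist % 2 == 1)
```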
We now show how to represent a PC move with NL moves. There are three cases to consider.
Case 1: swapping the hole with a tile one row higher.
We first consider the case where the move y = (y 1 , y 2 ) is such that y 2 = 1. That is, the move swaps the hole with a tile one row higher.
Suppose that tile T is located in the row immediately above the hole. To swap the hole with tile T, leaving everything else the same, perform the following algorithm: 1. repeat ↑, →, ↓, →, until the hole is swapped with T. Note that each move here is actually a move of the Loyd chain. To see why these moves swap the hole and tile T, leaving everything else unchanged, suppose that we label the tiles so that the numbers increase in the pattern between the hole and tile T. If k is the label of tile T, then for i with 1 ≤ i ≤ k, we write position i for the initial position of the tile with label i, and position 0 for the initial position of the hole. We can represent each intermediate configuration of tiles as a (k + 1)-dimensional vector, writing (u_0, u_1, . . . , u_k) for the configuration in which tile u_i is located in position i for 0 ≤ i ≤ k. Note that under this convention the initial configuration is (H, 1, 2, . . . , k).

Analysis of the algorithm. Step 1 moves tile i to position i − 1 for 1 ≤ i ≤ k, and moves the hole to position k. Hence after step 1 the configuration is (1, 2, . . . , k, H). (This is shown in Figure 3 in the case where k = 9.) Step 2 moves only the rightmost four tiles, moving tiles k − 1, k − 2, k − 3 and the hole to positions k − 2, k − 3, k and k − 1, respectively. The resulting configuration is (1, 2, . . . , k − 1, k, H, k − 2) (see Figure 4). In step 3, tiles k − 1 and k − 2 (initially the two rightmost tiles on the top row) each move only once and go to positions k and k − 1, respectively. Tile 2 (initially the leftmost tile on the top row) moves only once and goes to position 2. Every other tile on the top row is moved twice, first to the right then down, increasing its position by 2 units. Tile 1 (initially the leftmost tile of the bottom row) moves only once and goes to position 1. Every other tile on the bottom row is moved twice, first up and then to the right, increasing its position by 2 units.
The resulting configuration is (k, 1, 2, H, 3, 4, . . . , k − 2, k − 1) (see Figure 5). Finally, step 4 moves the hole to position k, and tiles 3, 4, . . . , k − 1 down by one position each, resulting in configuration (k, 1, 2, . . . , k − 1, H) (see Figure 6).
Case 2: swapping the hole with a tile on the same row. Let C be a configuration in which the hole and tile T are on the same row. To swap the hole with tile T: choose a tile T′ in the row one step higher such that T and T′ share an edge. Let C′ be the configuration obtained from C by interchanging T and T′. Let f be the permutation on V_n that transposes the positions of tiles T and T′ in configuration C. Since in C′ tile T is one row higher than the hole, we can use the algorithm for Case 1 to swap the hole and T starting from configuration C′. Let l_k be the label of the tile swapped with the hole in the kth step of this algorithm. To swap the hole with tile T starting from configuration C, we use the sequence of moves defined by the same label sequence (l_1, l_2, . . . ). Note that if a tile is in position x after k steps of the algorithm starting from C′, then it is in position f(x) after k steps of the algorithm starting from C. Since the algorithm for Case 1 performs only Loyd moves, the resulting algorithm for C swaps the hole with tiles at distance either 1 or 3 from it; that is, it performs only NL moves.

Case 3: the remaining cases. Now we consider the situations not covered in Case 1 or Case 2. The cases where tile T is in the column immediately to the right of the hole, or in the same column as the hole, are similar to the above, so assume that neither of these situations holds, as in Figure 7. Let C be the configuration shown in Figure 7 and let C′ be the configuration shown in Figure 8. Let f be the bijection from locations in C to locations in C′ that leaves the horizontal part unchanged and rotates and inverts the vertical part (which consists of the locations in the column of T and in the column one unit to the left of T), so that the location of tile T is sent to the row second from the bottom. Since in C′ tile T is in the row second from the bottom, we can use the algorithm for Case 1 to swap the hole with tile T, using only Loyd moves, starting from configuration C′.
As before, we can use the labels of the tiles moved at each step to define an algorithm starting from configuration C. Note that if positions x and y are adjacent in C′ then f^{−1}(x) and f^{−1}(y) are at distance 1 or 3 from each other in C. It follows that the algorithm for configuration C swaps the hole with tiles at distance 1 or 3 from it; that is, it performs only NL moves.
Note that the maximum length of the representation of a PC move using NL moves is at most Bn, for a universal constant B. This also applies to the resulting representation using Loyd moves.
We apply Theorem 3.1 again, this time letting p̃ (respectively, p) be the measure corresponding to the PC chain (respectively, the Loyd chain), and bound the corresponding quantity A, leading to (6.20).

Lemma 6.5. Suppose that n is odd. Then the mixing times τ_mixLoyd(ε) and τ_mixHC(ε) satisfy τ_mixLoyd(ε) ≤ α n² τ_mixHC(ε) for a universal constant α.

Proof. The proof follows the proof of Lemma 6.4 closely. We will show how to represent any HC move using Loyd moves. Consider a move y = (y_1, y_2) of the HC chain. If y is odd then it is also a PC move, and hence we can represent it using Loyd moves via the algorithm from the proof of Lemma 6.4. If y is even, then (−y_1, y_2) is odd, and we can represent y using Loyd moves as follows: we perform the algorithm from the proof of Lemma 6.4 to swap the hole with the tile in position (−y_1, y_2), but with the roles of the ← and → moves interchanged. The resulting algorithm swaps the hole with the tile in position (y_1, y_2) = y. We have shown that any HC move can be represented by O(n) Loyd moves, so the lemma follows by calculations similar to those leading up to equation (6.20).

Lower bound
In this section we prove a lower bound of order n^4 log n for the mixing time of the Loyd chain. A key fact is that if we observe a tile at the times when the hole is immediately to its right, its x-coordinate performs a random walk on Z_n. More precisely, let {L_t : t = 0, 1, 2, . . . } be a (discrete-time) lazy Loyd process. We write L_t(u) for the position of tile u at time t. For a configuration L and tile u, let X(L, u) denote the x-coordinate of tile u in configuration L, and define X_t(u) := X(L_t, u). Define τ_1(u), τ_2(u), . . . inductively as follows. Let τ_1(u) be the first time t such that the hole is immediately to the right of tile u at time t, and for k > 1, let τ_k(u) be the first time t > τ_{k−1}(u) such that the hole is immediately to the right of tile u at time t. The process {X_{τ_k(u)}(u) : k ≥ 1} is a symmetric random walk on Z_n, which we shall call the u random walk. To see this, note that if m_1 m_2 · · · m_l is a sequence of moves between times τ_1(u) and τ_2(u) that changes X_t(u) from x to x + 1 (mod n), then the sequence of moves m_l^{−1}, m_{l−1}^{−1}, . . . , m_1^{−1}, which occurs with the same probability, would change x to x − 1 (mod n) over the same time interval. Note that each step of the u random walk has a positive holding probability, namely the probability that between times τ_k(u) and τ_{k+1}(u) the value of X_t(u) does not change.
Recall that for the simple symmetric random walk on a cycle of length n, f(x) = cos(2πx/n) is an eigenfunction with corresponding eigenvalue cos(2π/n). Thus f is an eigenfunction for the u random walk as well. Since the u random walk has a positive holding probability, the corresponding eigenvalue satisfies λ > cos(2π/n). The rough idea behind the lower bound is to show that the tiles that start with an x-coordinate close to 0 will tend to stay that way if the number of random walk steps is too low. Let S be the set of tiles u such that f(X_0(u)) > 1/2, and suppose that the hole
is not initially adjacent to any tile in S. Let µ be large enough that

cos(2π/n)^{n²} ≥ e^{−µ} (7.1)

for all n ≥ 5. (Such a µ exists because cos x has the power series expansion 1 − x²/2! + x⁴/4! − · · · .) Next, define

ε = (1/8) µ^{−1};  T′ = 1 + ⌊ε n² log n⌋;  T = (n² − 1) T′. (7.2)

Since there are n² − 1 tiles, we can think of the quantity T′ as the typical number of times that the hole has been to the immediate right of any given tile when the Loyd process has made T steps. We shall bound the mixing time from below by T, which is of order n^4 log n. We accomplish this using as a distinguishing statistic the random variable

W_dist = Σ_{u∈S} f(X_T(u)). (7.3)

Let k = |S| and let W be the sum of k samples without replacement from a population consisting of the values of cos(2πx/n) for vertices (x, y) ∈ V_n. The lower bound follows from Lemmas A and B below, which together imply that the total variation distance between the laws of W_dist and W tends to 1 as n → ∞. In the statements of Lemmas A and B, the random variables depend implicitly on the parameter n of the Loyd process.

Theorem 7.1. Let L_t be the Loyd process on G_n, and let π be the stationary distribution. There is a universal constant c > 0 such that for any δ > 0, when n is sufficiently large, we have T_mix(δ) > c n^4 log n.
Proof. Lemmas A and B together imply that the total variation distance between the laws of W_dist and W tends to 1 as n → ∞. This implies the theorem, since W_dist is measurable with respect to L_T and T ≥ c n^4 log n for a universal constant c > 0.
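The eigenfunction fact used in the lower bound can be checked numerically (a sketch with an assumed holding probability h):

```python
import math

# for the walk on Z_n that holds with probability h and otherwise moves
# +-1 with probability (1 - h)/2 each, f(x) = cos(2*pi*x/n) satisfies
# (Pf)(x) = lam * f(x) with lam = h + (1 - h) * cos(2*pi/n) > cos(2*pi/n)
n, h = 12, 0.3
f = [math.cos(2 * math.pi * x / n) for x in range(n)]
lam = h + (1 - h) * math.cos(2 * math.pi / n)
for x in range(n):
    Pf = h * f[x] + (1 - h) / 2 * (f[(x - 1) % n] + f[(x + 1) % n])
    assert abs(Pf - lam * f[x]) < 1e-12
assert lam > math.cos(2 * math.pi / n)
```

The identity behind the check is cos(a − b) + cos(a + b) = 2 cos a cos b.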
We prove Lemma A in Subsection 7.1. Lemma B is a straightforward consequence of Hoeffding's bounds for sampling without replacement in [6], which we recall now.

Theorem 7.2. Let X_1, . . . , X_k be samples, without replacement, from a population whose values lie in the interval [a, b], and suppose that the population mean satisfies E(X_1) = 0. Then for all t > 0,

P(X_1 + · · · + X_k ≥ t) ≤ exp( −2t² / (k(b − a)²) ).
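For a tiny exhaustive illustration of the theorem (with the standard conclusion P(X_1 + · · · + X_k ≥ t) ≤ exp(−2t²/(k(b − a)²)); the population here is our own example):

```python
import itertools, math

population = [-2.0, -1.0, 0.0, 1.0, 2.0]  # mean 0, values in [a, b] = [-2, 2]
a, b, k = -2.0, 2.0, 3
# the sum of k samples without replacement is distributed as the sum of a
# uniformly chosen k-subset, so we can compute the tail exactly
subsets = list(itertools.combinations(population, k))
for t in (0.5, 1.0, 2.0, 3.0):
    prob = sum(sum(s) >= t for s in subsets) / len(subsets)
    bound = math.exp(-2 * t ** 2 / (k * (b - a) ** 2))
    assert prob <= bound, (t, prob, bound)
```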

Proof of Lemma A
For u ∈ S, let N_t(u) be the number of times that the hole has been to the immediate right of tile u up to time t. Note that for all t, if N_t(u) > 0 then τ_{N_t(u)}(u) ≤ t < τ_{N_t(u)+1}(u).
Recall that f(x) = cos(2πx/n). It follows that f′(x) = −(2π/n) sin(2πx/n), and hence |f′(x)| ≤ 2π/n for all x. Thus the mean value theorem implies that for every x and y we have

|f(x) − f(y)| ≤ (2π/n) |x − y|. (7.5)

We will prove Lemma A by approximating W_dist by the random variable Z := Σ_{u∈S} f(X_{τ_{T′}(u)}(u)). The random variable Z is easier to analyze than W_dist (but could not be used as a distinguishing statistic itself, because it is not measurable with respect to L_t for any t). For the proof of Lemma A we will need the following propositions.

Proof of Lemma A. Recall that W_dist = Σ_{u∈S} f(X_T(u)). Since for any tile u ∈ S we have |X_{τ_{N_T(u)}(u)}(u) − X_T(u)| ≤ 1, it follows by (7.5) that |f(X_{τ_{N_T(u)}(u)}(u)) − f(X_T(u))| ≤ 2π/n. Thus

|W_dist − Σ_{u∈S} f(X_{τ_{N_T(u)}(u)}(u))| ≤ (2π/n) |S| (7.6)
≤ 2πn, (7.7)

where the last line holds because |S| ≤ n². The main remaining step of the proof is to compute E(Z). We claim that E(Z) ≥ c n^{15/8} for a universal constant c. Incorporating an extra factor of 1/2 into the constant c then yields Lemma A. So it remains only to verify that E(Z) ≥ c n^{15/8} for a universal constant c. Recall that τ_k(u) denotes the kth time that the hole is to the right of tile u, and that (X_{τ_1(u)}(u), X_{τ_2(u)}(u), . . . ) is a simple symmetric random walk on Z_n with a holding probability. Since the second eigenvalue λ for this walk satisfies λ > cos(2π/n), it follows that for all t we have

E( f(X_{τ_t(u)}(u)) | X_{τ_1(u)}(u) ) ≥ f(X_{τ_1(u)}(u)) λ^{t−1},

and since f(X_{τ_1(u)}(u)) ≥ f(X_0(u)) − 2π/n, we obtain a corresponding lower bound on E( f(X_{τ_t(u)}(u)) ).

Substituting t = T and summing over u ∈ S gives

E(Z) ≥ [ Σ_{u∈S} (f(X_0(u)) − 2π/n) ] λ^{T−1}.

The expression in square brackets can be bounded below by cn^2 for a universal constant c, since for every u ∈ S we have f(X_0(u)) ≥ 1/2. Furthermore, since T − 1 ≤ n^2 log n by (7.2) and λ^{n^2} ≥ e^{−µ} by (7.1), it follows that

E(Z) ≥ cn^2 exp(−µ log n) = cn^{15/8}.
(Recall that µ = 1/8.) This verifies the claim and hence proves the lemma.
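The eigenvalue step above rests on the fact that f(x) = cos(2πx/n) is an eigenfunction of the simple random walk on Z_n with eigenvalue cos(2π/n); a holding probability p shifts the eigenvalue to p + (1 − p)cos(2π/n). A small numerical check of this fact (not from the paper):

```python
import numpy as np

n = 12
P = np.zeros((n, n))
for x in range(n):
    P[x, (x - 1) % n] = 0.5   # simple random walk on Z_n
    P[x, (x + 1) % n] = 0.5

f = np.cos(2 * np.pi * np.arange(n) / n)
lam = np.cos(2 * np.pi / n)
# f is an eigenfunction of P with eigenvalue cos(2*pi/n)
assert np.allclose(P @ f, lam * f)

# With holding probability p, the eigenvalue becomes p + (1-p)*cos(2*pi/n),
# so conditional expectations of f(X_t) decay geometrically in t.
p = 0.5
P_lazy = p * np.eye(n) + (1 - p) * P
lam_p = p + (1 - p) * lam
assert np.allclose(P_lazy @ f, lam_p * f)
print("eigenvalue:", lam)
```

The check is exact up to floating-point roundoff; it uses the trigonometric identity cos(θ + φ) + cos(θ − φ) = 2 cos θ cos φ.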

Proofs of Propositions 7.3 and 7.4
It remains to prove Propositions 7.3 and 7.4, which were used in the proof of Lemma A. This is done in Subsections 7.2.1 and 7.2.2, respectively.

Proof of Proposition 7.3
Recall that N_t(u) denotes the number of times the hole has been to the immediate right of tile u, up to time t. The main step in the proof of Proposition 7.3 is to show that N_t(u) is well approximated by t(n^2 − 1)^{−1}. We accomplish this using the second moment method.
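In this setting the second moment method amounts to Chebyshev's inequality applied to N_t(u) (stated here for completeness; it is standard):

```latex
\[
  \mathbb{P}\bigl( \lvert N_t(u) - \mathbb{E}\,N_t(u) \rvert \ge a \bigr)
    \;\le\; \frac{\operatorname{var}\bigl(N_t(u)\bigr)}{a^2},
  \qquad a > 0 .
\]
```

Thus a mean close to t(n^2 − 1)^{−1} (Lemma 7.5) together with a variance bound (Lemma 7.6) pins N_t(u) near t(n^2 − 1)^{−1} with high probability.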
In order to bound the mean and variance of N_t(u), we use the fact that the position of the hole relative to tile u (that is, the position of the hole minus the position of tile u) behaves like a random walk on a certain graph. Let G̃_n be the graph obtained from G_n by deleting the origin and adding an edge from (−1, 0) to (1, 0) and an edge from (0, 1) to (0, −1). (Figure 9 shows G̃_n when n = 5.) Note that if H_t denotes the position of the hole at time t in the Loyd chain, then H_t − L_t(u) is the same random process as a random walk on G̃_n. The times τ_k(u) coincide with the times when the random walk on G̃_n is at the vertex (1, 0). In Lemmas 7.5 and 7.6 below, we use the connection to the random walk on G̃_n to bound the mean and variance of N_t(u).

Lemma 7.5. There is a universal constant A such that for any tile u and time t we have

|E(N_t(u)) − t(n^2 − 1)^{−1}| ≤ A log t. (7.8)

Proof. Let {p_t(x, y)} be the transition probabilities for the random walk on G̃_n. Lemma 8.2 in Appendix A states that there is a universal constant A > 0 such that for all t ≥ 1,

|p_t(x, y) − π(y)| ≤ A/t, (7.9)

where π(y) is the stationary probability (n^2 − 1)^{−1}. Since the hole is not initially to the immediate right of tile u, using (7.9) with x = H_0 − L_0(u) and y = (1, 0) gives

|E(N_t(u)) − tπ(y)| ≤ Σ_{k=1}^{t} A/k (7.10)
≤ A log t.

Lemma 7.6. There is a universal constant C such that for any tile u and any t with n^2 log n ≤ t ≤ n^5 we have

var(N_t(u)) ≤ Cn^{−2} t log t. (7.11)

Proof. Write N_t(u) = Σ_{i=1}^{t} I_i, where I_i is the indicator of the event that the hole is to the immediate right of tile u at time i. Then

var(N_t(u)) = Σ_{i=1}^{t} var(I_i) + 2 Σ_{i<j} cov(I_i, I_j). (7.12)

The first term is at most E(N_t(u)) (since for each i we have var(I_i) ≤ E(I_i^2) ≤ E(I_i)), and recall that Lemma 7.5 implies that E(N_t(u)) is at most t(n^2 − 1)^{−1} + A log t. To bound the second term in (7.12), note that for each i and j with i < j we have

cov(I_i, I_j) = E(I_i I_j) − E(I_i)E(I_j) ≤ (π(y) + A/i)(A/(j − i) + A/j),

where in the last line we used Lemma 8.2 to bound each expectation. Expanding each product and then collecting terms gives

cov(I_i, I_j) ≤ Aπ(y)(1/(j − i) + 1/j) + (A^2/i)(1/(j − i) + 1/j).

If we sum this over j with i < j ≤ t, then the result is at most

2Aπ(y) log t + (2A^2 log t)/i.
If we sum this over i with 1 ≤ i ≤ t, then the result is at most

2Aπ(y) t log t + 4A^2 log^2 t, (7.13)

which is of the form O(n^{−2} t log t) + O(log^2 n). (Note that since t ≤ n^5, we have log^2 t = O(log^2 n).) The result follows if we note that log n = n^{−2}(n^2 log n), which is at most n^{−2} t whenever n^2 log n ≤ t.
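The mechanism behind Lemma 7.5 is that the expected number of visits E(N_t(u)) equals the sum of the transition probabilities p_s(x, y) for s ≤ t, which is close to tπ(y). A numerical sanity check of this (not from the paper, and using the plain n × n torus as a stand-in for the modified graph G̃_n):

```python
import numpy as np

# For a lazy random walk, E N_t = sum_{s<=t} p_s(x, y), which should be
# close to t * pi(y) once the walk has mixed. We check this on the plain
# 5 x 5 torus (a hedged stand-in for G~_n, which behaves comparably).

n = 5
N = n * n
P = np.zeros((N, N))
for x in range(n):
    for y in range(n):
        i = x * n + y
        P[i, i] = 0.5                      # lazy: hold with probability 1/2
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            j = ((x + dx) % n) * n + (y + dy) % n
            P[i, j] += 0.125
pi = 1.0 / N                               # uniform stationary distribution

start, target = 0, 1                       # x = (0, 0), y = (0, 1)
t = 500
dist = np.zeros(N)
dist[start] = 1.0
expected_visits = 0.0
for _ in range(t):
    dist = dist @ P
    expected_visits += dist[target]        # accumulates sum of p_s(x, y)

print(expected_visits / t, pi)
assert abs(expected_visits / t - pi) < 0.02
```

The computation is by deterministic matrix-vector iteration, so the only deviation from tπ(y) is the genuine pre-mixing correction that Lemma 7.5 bounds by O(log t).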
We will need one more lemma before proving Proposition 7.3, but first we recall Hoeffding's bounds for sums of independent random variables.

Theorem 7.7 (Hoeffding). Let Y_1, ..., Y_k be independent random variables with E(Y_i) = 0 and |Y_i| ≤ 1 for all i, and let S_k = Y_1 + ··· + Y_k. Then for every α > 0,

P(|S_k| ≥ α) ≤ 2 exp(−α^2/(2k)).

Lemma 7.8. Let Y_1, Y_2, ... be i.i.d. random variables with P(Y_i = 1) = P(Y_i = −1) = 1/2, let S_k = Y_1 + ··· + Y_k, and fix β ∈ (1/2, 3/4). Define

M_n := max |S_s − S_t| / |s − t|^β,

where the maximum is over pairs (s, t) of integers with

0 ≤ s ≤ Cn^4 log n; 0 ≤ t ≤ Cn^4 log n; |s − t| ≥ √n. (7.14)

Then for every p > 1 there is a constant C_p, which depends only on p, such that E(M_n^p) ≤ C_p.

Proof. Since each M_n is bounded, it is enough to show that lim sup_{n→∞} E(M_n^p) < ∞. If |s − t| ≥ √n, then applying Hoeffding's bounds with α = c|t − s|^β gives

P(|S_s − S_t| ≥ c|t − s|^β) ≤ 2 exp(−c^2|t − s|^{2β−1}/2) ≤ 2 exp(−c^2 n^{β−1/2}/2).

Define p_n(c) := P(M_n > c). There are at most C^2 n^{10} pairs (s, t) that satisfy the conditions in (7.14). Thus if n is large enough so that for all c ≥ 1 we have

2C^2 n^{10} exp(−c^2 n^{β−1/2}/2) ≤ e^{−c^2},

a union bound implies that for all c ≥ 1 we have p_n(c) ≤ e^{−c^2}, and hence

E(M_n^p) ≤ 1 + ∫_1^∞ p c^{p−1} e^{−c^2} dc < ∞.

Now that we have Lemmas 7.5, 7.6 and 7.8, we are ready to prove Proposition 7.3.

Proof of Proposition 7.3. Since T is O(n^4 log n), applying Lemma 7.6 with t = T implies that when n is sufficiently large, we have var(N_T(u)) ≤ Cn^2 log^2 n. It follows that

E|N_T(u) − T| ≤ E|N_T(u) − E(N_T(u))| + |E(N_T(u)) − T|
≤ sd(N_T(u)) + A log T,

where in the second line we have used the inequality E|X − EX| ≤ sd(X), valid for all random variables X, to bound the first term and Lemma 7.5 to bound the second term.
It follows that

E|N_T(u) − T| ≤ Bn log n, (7.16)

for a universal constant B.
Define S_k := X_{τ_k(u)}(u) − X_{τ_1(u)}(u); that is, S_k is the change of the u random walk after k − 1 steps. Note that we can write S_k as Y_1 + Y_2 + ··· + Y_{k−1}, where the Y_i are i.i.d. ±1 random variables. Fix β ∈ (1/2, 3/4). Since |S_{N_T(u)} − S_T| ≤ |N_T(u) − T|, and since N_T(u) and T can both be bounded above by Cn^4 log n for a universal constant C, it follows that if M_n is defined as in the statement of Lemma 7.8, then

|S_{N_T(u)} − S_T| ≤ M_n |N_T(u) − T|^β + √n, (7.17)

where the √n term covers the case |N_T(u) − T| < √n, in which the trivial bound |S_{N_T(u)} − S_T| ≤ |N_T(u) − T| applies.
Taking expectations in (7.17) and applying Hölder's inequality with exponents p = 1/(1 − β) and q = 1/β gives

E|S_{N_T(u)} − S_T| ≤ E(M_n^p)^{1/p} (E|N_T(u) − T|)^β + √n,

where we use Lemma 7.8 to bound E(M_n^p)^{1/p} and (7.16) to bound E|N_T(u) − T|. Now, let γ ∈ (β, 3/4). Then there is a constant B such that

E|S_{N_T(u)} − S_T| ≤ Bn^γ,

and hence, by (7.5),

E|f(X_{τ_{N_T(u)}(u)}(u)) − f(X_{τ_T(u)}(u))| ≤ B′n^{γ−1} (7.20)

for a constant B′. Define X_final(u) = X_{τ_T(u)}(u). Summing (7.20) over u ∈ S gives

E( Σ_{u∈S} |f(X_{τ_{N_T(u)}(u)}(u)) − f(X_final(u))| ) ≤ B′n^{1+γ}.

Combining this with Markov's inequality yields the proposition, since γ < 3/4.

Proof of Proposition 7.4
We prove Proposition 7.4 using the method of bounded differences. The main step is to show that each step of the Loyd process has a small effect on the conditional expectation of Z, which we prove via Lemma 7.9 below. Define f_final(u) := f(X_final(u)), so that we can write Z as Σ_{u∈S} f_final(u).
Let H_t = (L_0, L_1, ..., L_t) be the history of the Loyd process up to time t. We call the Markov chain (H_t : t ≥ 0) the history process. If H = (L_0, ..., L_k) is a state of the history process, we write L(H) for the Loyd configuration L_k. Let H → H̃ be a possible transition of the history process. We aim to compare the distribution of Z when the history process starts at H versus when it starts from H̃. We shall refer to the history process started from H (respectively, H̃) as the primary (respectively, secondary) history process.
Convention. If a random variable W is defined in terms of the primary process, we write W̃ for the corresponding random variable defined in terms of the secondary process, and similarly for events.

Lemma 7.9. We have

|E(Z) − E(Z̃)| ≤ (D log n)/n

for a universal constant D.
Proof. Our main tool is coupling. Note that to demonstrate a coupling of the primary and secondary history processes, it is sufficient to demonstrate a coupling of the Loyd process started from L := L(H) and the Loyd process started from L̃ := L(H̃). We call these processes the primary and secondary Loyd processes, respectively.
We start by bounding |E(f_final(u)) − E(f̃_final(u))| for the case when u is the tile swapped with the hole in the transition from L to L̃. We can couple the secondary Loyd process with the primary Loyd process so that the way that the hole moves after the first time it is to the immediate right of tile u is the same in both processes. Since with this coupling we have |X_final(u) − X̃_final(u)| ≤ 1, it follows from (7.5) that

|E(f_final(u)) − E(f̃_final(u))| ≤ 2π/n. (7.21)

Let S′ be the set of tiles in S that are not swapped with the hole in the transition from L to L̃. We now consider the tiles in S′. It will be convenient to group the tiles in columns (i.e., group them according to their x-coordinates) and then consider the columns one at a time.
Let H_t be the location of the hole at time t in the primary Loyd process, and suppose that H_0 = (h_x, h_y). Let C be a column in V_n, that is, a set of the form {(j, k) : k ∈ Z_n} for some j ∈ Z_n, and suppose that |h_x − j| = d (that is, the hole is initially at distance d from C), where d ∈ {0, 1, 2, ...}. We claim that there is a universal constant D such that

Σ_{u∈S′∩C} |E(f_final(u)) − E(f̃_final(u))| ≤ D/(n(d + 1)). (7.22)

Summing this over columns C and combining this with (7.21) proves the Lemma.
We now prove the claim. We verify (7.22) by constructing a coupling of the primary Loyd process and the secondary Loyd process. The coupling is designed so that if the hole is initially far away from column C, then H̃_t is likely to couple with H_t before it gets close to column C.
Let C_L and C_R be the columns to the immediate left and right, respectively, of C. We now give a rough description of the coupling. The nature of the coupling will depend on whether the hole moves horizontally or vertically in the transition from L to L̃. If the hole moves horizontally (respectively, vertically), then the trajectory of H̃_t is the reflection of the trajectory of H_t about a vertical (respectively, horizontal) axis, up until the time when either the holes have coupled or one of them has reached column C, C_R or C_L. We now give a more formal description in the case where H̃_0 = (h_x + 1, h_y). (Note that if the x-coordinate of H_t takes the value h_x + 1 before either H_t or H̃_t hits C, C_R or C_L, then the holes couple before either of them affects the tiles in column C.) Let T be the first time either H_t or H̃_t hits column C, C_L or C_R. Let E be the event that the holes have not coupled before time T. We claim that there is a universal constant K > 0 such that

P(E) ≤ K/(d + 1). (7.23)

Recall that d is the initial distance between the hole and column C. We may assume that d > 0, since otherwise the bound is trivial provided that K ≥ 1.
It is enough to verify (7.23) in two cases, since we can always reduce to one of these cases by interchanging the roles of H_t and H̃_t if necessary. Let T_C be the first time that the hole is in column C. For tiles u ∈ S that are initially in column C, let T_R(u) (respectively, T_L(u)) be the first time that the hole is to the immediate right (respectively, left) of tile u. Let R_u be the event that T_R(u) = min(T_R(u), T_L(u), T_C) and let L_u be the event that T_L(u) = min(T_R(u), T_L(u), T_C). Define

z_R := E(f_final(u) | R_u), z_L := E(f_final(u) | L_u), and ∆ := z_R − z_L.

Note that (7.5) implies that

|z_R − z_L| ≤ 2π/n. (7.24)

We say that the hole is beside a tile if it is to its immediate right or immediate left.
Note that if the hole starts in the same column as tile u, then the next time the hole is beside tile u it is equally likely to be to its right as to its left. It follows that E(f_final(u)) = z_R P(R_u) + z_L P(L_u) + (1/2)(z_R + z_L)(1 − P(R_u) − P(L_u)). Rearranging terms gives

E(f_final(u)) = (1/2)(z_R + z_L) + (1/2)∆(P(R_u) − P(L_u)). (7.25)

Similarly, we also have

E(f̃_final(u)) = (1/2)(z_R + z_L) + (1/2)∆(P(R̃_u) − P(L̃_u)). (7.26)

Replacing each probability in (7.25) and (7.26) with the expectation of an appropriate indicator random variable, and then subtracting (7.26) from (7.25), gives

E(f_final(u)) − E(f̃_final(u)) = (1/2)∆ E(1_{R_u} − 1_{R̃_u} − 1_{L_u} + 1_{L̃_u}). (7.27)

Note that 1_{R_u} − 1_{R̃_u} and 1_{L_u} − 1_{L̃_u} are both 0 on the event that the holes couple before either one hits C_R or C_L. It follows that

|E(f_final(u)) − E(f̃_final(u))| ≤ (1/2)|∆| E((Y(u) + Ỹ(u)) 1_E), (7.28)

where Y(u) is the indicator of the event that the hole is beside tile u before time T_C. Let Y = Σ_{u∈C} Y(u) be the total number of positions in columns C_L and C_R visited before time T_C. Summing over u ∈ C gives

Σ_{u∈C} |E(f_final(u)) − E(f̃_final(u))| ≤ (1/2)|∆| E((Y + Ỹ) 1_E). (7.29)

Note that Y and Ỹ are both 0 unless the event E occurs, and recall that (7.23) gives P(E) ≤ K/(d + 1). Furthermore, the conditional distribution of both Y and Ỹ given E is geometric(1/4), since each time the hole is in column C_R or C_L, it moves to column C in the next step with probability 1/4. It follows that

E((Y + Ỹ) 1_E) ≤ 8 P(E) ≤ 8K/(d + 1). (7.31)

Finally, recall that ∆ = z_R − z_L and hence |∆| ≤ 2π/n by (7.24). Combining this with (7.29) and (7.31) verifies (7.22), which proves the lemma.

Now that we know there are bounded differences, we are ready to prove Proposition 7.4.

Proof of Proposition 7.4. We need to show that for any b > 0 we have P(|Z − E(Z)| > bn^{7/4}) → 0 as n → ∞, where Z = Σ_{u∈S} f(X_final(u)). Recall that τ_k(u) is the kth time at which the hole is to the immediate right of tile u. Define τ := max_{u∈S} τ_T(u). Let F_t = σ(L_1, ..., L_t) and consider the Doob martingale M_t := E(Z | F_t).
The idea of the proof will be to evaluate the martingale at a suitably chosen time K. The value of K will be chosen to be large enough so that τ ≤ K with high probability, but small enough so that the Azuma–Hoeffding inequality will give a good large deviation bound for M_K. To these ends, we choose K = n^5. Note that Z is determined by time τ. Hence M_K = Z unless τ > K. Furthermore, we have E(M_K) = E(Z). It follows that

P(|Z − E(Z)| > bn^{7/4}) ≤ P(|M_K − E(M_K)| > bn^{7/4}) + P(τ > K). (7.32)

We now bound each term on the right-hand side of (7.32). We start with the first term. Lemma 7.9 implies that |M_t − M_{t−1}| ≤ (D log n)/n for t with 1 ≤ t ≤ K. Thus the Azuma–Hoeffding bound gives

P(|M_K − E(M_K)| ≥ x) ≤ 2 exp(−x^2/(2Kα^2)), (7.33)

where α = (D log n)/n. Substituting x = bn^{7/4} and K = n^5 into (7.33) gives

P(|M_K − E(M_K)| ≥ bn^{7/4}) ≤ 2 exp(−b^2 n^{7/2}/(2n^3 D^2 log^2 n)) (7.34)
= 2 exp(−b^2 n^{1/2}/(2D^2 log^2 n)), (7.35)

which converges to 0 as n → ∞. Next, we bound P(τ > K). Note that τ_T(u) ≤ K whenever N_K(u) ≥ T. Furthermore, since K = n^5, Lemmas 7.5 and 7.6 imply that for sufficiently large n we have

E(N_K(u)) ≥ n^3 − O(log n); var(N_K(u)) = O(n^3 log n).
Note also that T is o(n^3). Thus Chebyshev's inequality implies that P(N_K(u) < T) is O(n^{−3} log n), and hence P(τ_T(u) > K) is O(n^{−3} log n). Thus a union bound implies that P(τ > K) is O(n^{−1} log n), and hence converges to 0 as n → ∞. This completes the proof.
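As a sanity check on the Azuma–Hoeffding step (not from the paper), one can compare the bound 2 exp(−x^2/(2Kα^2)) with the exact tail of the simplest bounded-increment martingale, a ±1 random walk (so α = 1):

```python
from math import comb, exp

# Exact tail of S_K = sum of K i.i.d. +-1 steps (a martingale with
# increments bounded by alpha = 1), versus the Azuma-Hoeffding bound
#   P(|S_K| >= x) <= 2 * exp(-x**2 / (2 * K * alpha**2)).

K = 20

def exact_tail(x):
    # S_K = 2*H - K with H ~ Binomial(K, 1/2), so |S_K| >= x  <=>  |2H - K| >= x
    return sum(comb(K, h) for h in range(K + 1) if abs(2 * h - K) >= x) / 2**K

for x in (4, 8, 12):
    bound = 2 * exp(-x**2 / (2 * K))
    assert exact_tail(x) <= bound
    print(x, exact_tail(x), bound)
```

The binomial computation is exact, so the assertions verify the inequality without any simulation.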

Appendix A: Probability bounds for random walk on G̃_n
In this section we derive bounds on transition probabilities for the random walk on G̃_n.
First, we give some definitions and extract some necessary results from [12].

Theorem 8.1 ([12]). Let p_t denote the transition probabilities of a lazy Markov chain with stationary distribution π and conductance profile Φ. Fix ε > 0. If

t ≥ 1 + ∫_{4(π(x)∧π(y))}^{4/ε} 4/(uΦ(u)^2) du,

then |p_t(x, y) − π(y)| ≤ επ(y).

Lemma 8.2. There is a universal constant A > 0 such that for the random walk on G̃_n, for all t ≥ 1 and all vertices x, y, we have |p_t(x, y) − π(y)| ≤ A/t.
Proof of Lemma 8.2. Recall that G_n denotes the n × n torus Z_n^2. We write Φ (respectively, Φ̃) for the conductance profile for the lazy random walk on G_n (respectively, G̃_n). It is well known that Φ satisfies

Φ(u) ≥ C/(n√u), (8.5)

for a universal constant C > 0. Let Ṽ_n be the vertex set of G̃_n. Since for S ⊂ Ṽ_n, the boundary size and stationary probability of S, with respect to random walk on G̃_n, are within constant factors of the corresponding quantities with respect to random walk on G_n, it follows that the conductance profile Φ̃ for random walk on G̃_n satisfies the similar inequality

Φ̃(u) ≥ C/(n√u), (8.6)

for a universal constant C > 0. Fix 0 < α < 1. Using Theorem 8.1 with ε = α/π(y) shows that |p_t(x, y) − π(y)| ≤ α whenever

t ≥ 1 + ∫_0^{4/ε} 4C^{−2}n^2 du = 1 + 16C^{−2}π(y)n^2/α,

and since π(y) is O(n^{−2}), the right-hand side is at most 1 + A/α for a universal constant A > 0. It follows (after increasing A if necessary) that

|p_t(x, y) − π(y)| ≤ A/t, (8.9)

for all t ≥ 1, and the proof is complete.
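A quick numerical illustration (not from the paper, and on the plain torus G_n rather than G̃_n) of the kind of convergence to stationarity that (8.9) quantifies:

```python
import numpy as np

# Convergence of the lazy walk's transition probabilities on the n x n
# torus to the uniform distribution pi(y) = 1/n^2. (8.9) asserts a O(1/t)
# rate for the modified graph G~_n; the plain torus behaves comparably.

n = 5
N = n * n
P = np.zeros((N, N))
for x in range(n):
    for y in range(n):
        i = x * n + y
        P[i, i] = 0.5                      # lazy: hold with probability 1/2
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            P[i, ((x + dx) % n) * n + (y + dy) % n] += 0.125

pi = 1.0 / N
dev10 = np.max(np.abs(np.linalg.matrix_power(P, 10) - pi))
dev100 = np.max(np.abs(np.linalg.matrix_power(P, 100) - pi))
print(dev10, dev100)
assert dev100 < dev10          # deviations decrease with t (laziness)
assert dev100 < 1e-3           # essentially mixed well before t = 100
```

Laziness makes all eigenvalues of P nonnegative, so the worst-case deviation from π decreases monotonically in t, which is what the first assertion checks.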

Appendix B
Lemma 9.1. Let W_t = (X_t, Y_t) be a simple random walk on Z^2, started at (0, 1). Fix a positive integer k and let L_1, L_2 and L_3 be the lines y = 0, y = k and |x| = k, respectively. Let T_2 and T_3 be the hitting times of L_1 ∪ L_2 and L_1 ∪ L_3, respectively.
(i) P(W_{T_2} ∈ L_2) = 1/k.
(ii) P(W_{T_3} ∈ L_3) ≤ 2/k.

Proof. (i) This is immediate by the optional stopping theorem, because Y_t is a bounded martingale up to time T_2 and T_2 is a stopping time.
(ii) Let T = min(T_2, T_3). Note that T_3 and T are stopping times. A routine calculation shows that Y_t^2 − X_t^2 is a martingale. It follows that Y_{t∧T}^2 − X_{t∧T}^2 is a bounded martingale. Thus the optional stopping theorem implies that

E(Y_T^2 − X_T^2) = Y_0^2 − X_0^2 = 1. (9.1)

But since Y_{t∧T}^2 is a bounded submartingale and T ≤ T_2, we have

E(Y_T^2) ≤ E(Y_{T_2}^2) = k^2 P(Y_{T_2} = k) = k,

where the last line holds because P(Y_{T_2} = k) = 1/k by part (i) of the lemma. Combining this with (9.1) gives

E(X_T^2 + Y_T^2) < 2k. (9.2)

It follows that

P(X_T^2 + Y_T^2 ≥ k^2) ≤ E(X_T^2 + Y_T^2)/k^2 < 2/k,

where the first inequality is Markov's and the second follows from (9.2). This verifies (ii), because W_T ∈ L_2 ∪ L_3 whenever W_{T_3} ∈ L_3, and in that case X_T^2 + Y_T^2 ≥ k^2.
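Part (i) is the classical gambler's ruin computation: h(i) := P(hit level k before level 0 | Y_0 = i) is harmonic with boundary values h(0) = 0 and h(k) = 1, so h(i) = i/k. A minimal numerical confirmation (not from the paper), solving the harmonicity equations directly:

```python
import numpy as np

# Gambler's ruin for the y-coordinate: h(i) = P(hit k before 0 | start i)
# satisfies h(0) = 0, h(k) = 1, and h(i) = (h(i-1) + h(i+1)) / 2 for
# 0 < i < k. (The lazy horizontal moves of W_t do not change these
# hitting probabilities.) Solving the linear system recovers h(i) = i/k.

k = 7
A = np.zeros((k + 1, k + 1))
b = np.zeros(k + 1)
A[0, 0] = 1.0                          # boundary: h(0) = 0
A[k, k] = 1.0; b[k] = 1.0              # boundary: h(k) = 1
for i in range(1, k):
    A[i, i] = 1.0
    A[i, i - 1] = A[i, i + 1] = -0.5   # h(i) - (h(i-1) + h(i+1))/2 = 0

h = np.linalg.solve(A, b)
print(h)
assert np.allclose(h, np.arange(k + 1) / k)   # in particular h(1) = 1/k
```

Starting from Y_0 = 1 as in the lemma, the solved system gives h(1) = 1/k, which is exactly part (i).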