Perfect shuffling by lazy swaps

We characterize the minimum-length sequences of independent lazy simple transpositions whose composition is a uniformly random permutation. For every reduced word of the reverse permutation there is exactly one valid way to assign probabilities to the transpositions. It is an open problem to determine the minimum length of such a sequence when the simplicity condition is dropped.


Introduction
Let $S_n$ be the symmetric group of all permutations of $1, \dots, n$, with composition given by $(\sigma\tau)(i) := \sigma(\tau(i))$ for $1 \le i \le n$. A lazy transposition with parameters $(a, b, p)$ is a random permutation $T$ that with probability $p$ equals the transposition (or swap) $t(a, b) := (a\ b) \in S_n$ exchanging the elements in positions $a$ and $b$, and otherwise equals the identity $\mathrm{id} \in S_n$. Given a sequence of parameters $S = (a_i, b_i, p_i)_{i=1}^{\ell}$, let $T_1, \dots, T_\ell$ be independent lazy transpositions, where $T_i$ has parameters $(a_i, b_i, p_i)$. We say that $S$ is a (perfect) transposition shuffle (of order $n$ and length $\ell$) if the composition $T_1 \cdots T_\ell$ of these random permutations is uniformly distributed on $S_n$. We pose the following apparently unsolved question.

Question 1. What is the minimum possible length $L_n$ of a transposition shuffle of order $n$? Is it the case that $L_n = \binom{n}{2}$ for all $n$?

The best bounds we know for general $n$ are $\log_2 n! \le L_n \le \binom{n}{2}$.
The lower bound (which is of course asymptotic to $n \log_2 n$ as $n \to \infty$) follows by the obvious counting argument: a composition of $\ell$ lazy transpositions can take at most $2^\ell$ possible values, while $\#S_n = n!$. In the other direction, we have several distinct constructions of transposition shuffles of length exactly $\binom{n}{2}$ (for all $n$), and none shorter (for any $n$). It can be verified by case analysis that $L_n = \binom{n}{2}$ for $n \le 4$. Computer experiments by Viktor Kiss (personal communication) suggest that the same holds for $n = 5$ also.
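Such small cases can be checked mechanically. The following sketch (the helper names `shuffle_law` and `is_perfect` are our own) computes the exact law of the composition $T_1 \cdots T_\ell$ by dynamic programming with rational arithmetic, and verifies one order-3 shuffle of length $3 = \binom{3}{2}$ (its probabilities $\frac12, \frac23, \frac12$ arise from Construction 1 below):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def shuffle_law(n, seq):
    """Exact law of the composition T_1 ... T_l of independent lazy
    transpositions, where seq = [(a, b, p), ...] with 1-based positions."""
    law = {tuple(range(1, n + 1)): Fraction(1)}
    for a, b, p in seq:
        new = {}
        for perm, pr in law.items():
            s = list(perm)
            s[a - 1], s[b - 1] = s[b - 1], s[a - 1]  # compose with t(a, b)
            for target, weight in ((tuple(s), pr * p), (perm, pr * (1 - p))):
                new[target] = new.get(target, Fraction(0)) + weight
        law = new
    return law

def is_perfect(n, seq):
    """True iff the composition is uniform on all n! permutations."""
    law = shuffle_law(n, seq)
    target = Fraction(1, factorial(n))
    return all(law.get(perm, 0) == target for perm in permutations(range(1, n + 1)))

# An order-3 shuffle of length 3 with probabilities 1/2, 2/3, 1/2:
seq3 = [(1, 2, Fraction(1, 2)), (2, 3, Fraction(2, 3)), (1, 2, Fraction(1, 2))]
print(is_perfect(3, seq3))  # True
```

Replacing `seq3` by any candidate sequence gives an exact verdict; exhaustive search over short candidate sequences is how the smallest cases can be settled.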
Our main result addresses the special case of simple transposition shuffles, by which we mean those that transpose only adjacent pairs: $b_i = a_i + 1$ for all $i$. We will characterize the simple transposition shuffles of minimum length.
For $1 \le a < n$ denote by $t(a) = t(a, a+1) \in S_n$ the simple transposition. We call a sequence $(a_i)_{i=1}^{\ell}$ a reduced word of order $n$ if $\ell = \binom{n}{2}$ and if the (deterministic) composition $t(a_1) \cdots t(a_\ell)$ equals the reverse permutation $\rho := [n, \dots, 1]$. (It is easily verified that $\binom{n}{2}$ is the minimum number of simple transpositions whose composition is $\rho$, and that $\rho$ is the unique permutation in $S_n$ for which this minimum is largest. Reduced words are extensively studied; see the later background discussion.) We construct a simple transposition shuffle from each reduced word as follows.
Construction 1 (Simple transposition shuffles). Let $(a_i)_{i=1}^{\ell}$ be any reduced word. Write $\sigma_j := t(a_1) \cdots t(a_j)$ for the composition of the first $j$ transpositions, and let
$$p_i := \frac{d_i}{d_i + 1}, \qquad \text{where } d_i := \sigma_{i-1}(a_i+1) - \sigma_{i-1}(a_i),$$
and set $S := (a_i, a_i+1, p_i)_{i=1}^{\ell}$. ♦

Theorem 1. The minimum length of a simple transposition shuffle of order $n$ is $\ell = \binom{n}{2}$. If $(a_i)_{i=1}^{\ell}$ is any reduced word of order $n$, then $S$ as defined in Construction 1 above is a simple transposition shuffle. Moreover, every simple transposition shuffle of minimum length arises in this way.
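Construction 1 is a few lines of code. In the sketch below (our own naming), we take $p_i = d_i/(d_i+1)$ where $d_i$ is the difference of the two elements swapped at step $i$ of the deterministic word; this reading is an assumption on our part, but it is consistent with the probabilities $\frac12, \frac23, \frac12$ for order 3 and with the braid-move calculation later in the paper.

```python
from fractions import Fraction

def construction1(word, n):
    """Triples (a_i, a_i + 1, p_i) for a reduced word (a_i), taking
    p_i = d_i / (d_i + 1) with d_i the difference of the two elements
    swapped at step i (our reading of the construction)."""
    sigma = list(range(1, n + 1))  # sigma_0 = identity, as a value sequence
    triples = []
    for a in word:
        d = sigma[a] - sigma[a - 1]  # elements in positions a + 1 and a
        triples.append((a, a + 1, Fraction(d, d + 1)))
        sigma[a - 1], sigma[a] = sigma[a], sigma[a - 1]  # sigma_i = sigma_{i-1} t(a_i)
    assert sigma == list(range(n, 0, -1)), "input was not a reduced word"
    return triples

print(construction1([1, 2, 1], 3))
# [(1, 2, Fraction(1, 2)), (2, 3, Fraction(2, 3)), (1, 2, Fraction(1, 2))]
```

For the order-4 reduced word $(1,2,3,1,2,1)$ this yields probabilities $\frac12, \frac23, \frac34, \frac12, \frac23, \frac12$, matching the sweep-by-sweep pattern discussed below.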
Turning to general (non-simple) transposition shuffles, we will describe several other constructions below, all of length exactly $\binom{n}{2}$, including some with rational probabilities $p_i$ that are not of the form $d/(d+1)$ for integer $d$, and others with irrational probabilities.
For $n \ge 3$ it is not possible for all the probabilities $p_i$ to equal $\frac12$, since then the probability of each permutation would be a dyadic rational rather than $1/n!$. However, we will show that $\frac12$ must appear rather frequently.

Theorem 2. In any transposition shuffle $S = (a_i, b_i, p_i)_{i=1}^{\ell}$ of order $n$, we have $\#\{i : p_i = \frac12\} \ge n - 1$. If the length $\ell$ equals the (in general unknown) minimum $L_n$ then $p_1 = p_\ell = \frac12$.

Theorem 1 implies that in a simple transposition shuffle of minimum length, the sequence of probabilities $(p_i)_{i=1}^{\ell}$ cannot be altered to give another transposition shuffle. In the general case the following weaker statement holds.
Proposition 3. In a transposition shuffle $S = (a_i, b_i, p_i)_{i=1}^{\ell}$ of order $n$ and length $L_n$, the probabilities are rigid in the sense that no single $p_i$ may be altered to give another transposition shuffle.
Additional constructions. Next we describe the promised further constructions, together with brief explanations of their correctness and properties. Also see Figure 1.

Construction 2 (Sweeping). We first note the following obvious inductive scheme for constructing transposition shuffles. Fix $n$. Call a sequence of parameters $(a_i, b_i, p_i)_{i=1}^{k}$ a sweep (of order $n$ and length $k$) if the composition of independent lazy transpositions of $S_n$ with these parameters, $\pi$ say, has the property that its last element $\pi(n)$ is uniformly distributed on $1, \dots, n$. The concatenation of any sweep of order $n$ followed by any transposition shuffle of order $n-1$ clearly gives a transposition shuffle of order $n$: the sweep randomizes the final element, then the shuffle shuffles the other elements.
A simple example of a sweep of order $n$ is the sequence with $i$th parameters $(i, i+1, \frac{i}{i+1})$ for $1 \le i \le n-1$. Applying the inductive construction of the previous paragraph to this example gives a simple transposition shuffle which is a special case of Construction 1. Another straightforward sweep is $(1, n, \frac12), (2, n, \frac13), \dots, (n-1, n, \frac1n)$.
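Both sweeps can be verified exactly by tracking the single point $n$ backwards through the composition, since $\pi(n) = T_1(T_2(\cdots T_k(n)))$. A sketch (helper name our own):

```python
from fractions import Fraction

def last_value_law(n, seq):
    """Exact law of pi(n), where pi = T_1 ... T_k: evaluate T_1(T_2(...T_k(n)))
    by pushing the point n backwards through the sequence."""
    law = {n: Fraction(1)}
    for a, b, p in reversed(seq):
        new = {}
        for x, pr in law.items():
            if x in (a, b):
                y = b if x == a else a
                new[y] = new.get(y, Fraction(0)) + pr * p        # swap fires
                new[x] = new.get(x, Fraction(0)) + pr * (1 - p)  # swap is lazy
            else:
                new[x] = new.get(x, Fraction(0)) + pr
        law = new
    return law

n = 4
adjacent = [(i, i + 1, Fraction(i, i + 1)) for i in range(1, n)]  # first sweep
star = [(i, n, Fraction(1, i + 1)) for i in range(1, n)]          # second sweep
uniform = {x: Fraction(1, n) for x in range(1, n + 1)}
print(last_value_law(n, adjacent) == uniform, last_value_law(n, star) == uniform)  # True True
```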
Here is an inductive construction of sweeps, also of length $n-1$, generalizing the last example. Fix a partition of $\{1, \dots, n-1\}$ into non-empty sets $D_1, \dots, D_r$, denote their sizes $m_j := \#D_j$, and fix an element $d_j \in D_j$ of each. Apply any sweep of order $m_1$ to $D_1$ in such a way that element $\pi(d_1)$ of the resulting permutation $\pi$ is the one that is uniform on $D_1$. (Formally, modify the sweep by mapping the parameters $(a_i, b_i, p_i)$ to $(f(a_i), f(b_i), p_i)$, where $f$ is a bijection from $\{1, \dots, m_1\}$ to $D_1$ with $f(m_1) = d_1$.) Then do similarly for each of $D_2, \dots, D_r$. Finally apply a sequence of lazy transpositions with parameters
$$\Bigl(d_1, n, \tfrac{m_1}{1+m_1}\Bigr), \Bigl(d_2, n, \tfrac{m_2}{1+m_1+m_2}\Bigr), \Bigl(d_3, n, \tfrac{m_3}{1+m_1+m_2+m_3}\Bigr), \dots, \Bigl(d_r, n, \tfrac{m_r}{n}\Bigr).$$
This ensures that the probability that some element of $D_j$ ends up in location $n$ is $m_j/n$, as required. ♦

Since the sweeps constructed above all have length $n-1$, the resulting shuffles have length $(n-1) + (n-2) + \cdots + 1 = \binom{n}{2}$. The length of a sweep of order $n$ must be at least $n-1$, since the graph on vertices $1, \dots, n$ with edges $\{(a_i, b_i) : i = 1, \dots, k\}$ needs to be connected. So the construction cannot help us to get transposition shuffles of length less than $\binom{n}{2}$. The probabilities $p_i$ that result from Construction 2 are all rational, but (unlike those of Construction 1) need not be of the form $d/(d+1)$ for integer $d$. The construction can also give shuffles with $\#\{i : p_i = \frac12\}$ strictly greater than $n-1$.

Construction 3 (Divide and conquer). Fix $n$ and let $h = \lfloor n/2 \rfloor$. Call the integers $1, \dots, h$ light and $h+1, \dots, n$ heavy. First apply any transposition shuffle of order $h$, to shuffle the light elements. Follow this with any transposition shuffle of order $n - h = \lceil n/2 \rceil$ on the heavy positions, to shuffle the heavy elements. (Formally, modify each lazy transposition by replacing parameters $(a_i, b_i, p_i)$ with $(a_i + h, b_i + h, p_i)$, and append these triples to the previous list.) Now apply a sequence of lazy transpositions with parameters
$$(1, h+1, q_1), (2, h+2, q_2), \dots, (h, 2h, q_h),$$
where the probabilities $q_j$ are chosen so that the sum of $h$ independent Bernoulli random variables with parameters $q_1, \dots$
, $q_h$ is equal in law to $\#\{i > h : \pi(i) \le h\}$, where $\pi$ is a uniformly random permutation of $S_n$: this is a hypergeometric distribution. The fact that this is possible is proved in [12]. (Indeed, the analogous fact holds for a general hypergeometric distribution. This amounts to the fact that hypergeometric distributions are strongly Rayleigh; see [4].) At this point, the light and heavy elements are both shuffled, and the number of light elements in heavy positions has the correct distribution. To complete the construction, we again apply any transposition shuffle to the light positions and any transposition shuffle to the heavy positions (as at the start). This ensures that the locations of the light and heavy elements are shuffled. ♦

If the order-$h$ and order-$(n-h)$ shuffles used in Construction 3 have lengths $\ell_h$ and $\ell_{n-h}$ respectively, then the resulting transposition shuffle has length $2\ell_h + 2\ell_{n-h} + h$. In particular, if $\ell_h = \binom{h}{2}$ and $\ell_{n-h} = \binom{n-h}{2}$ then this is exactly $\binom{n}{2}$, so again the construction is of no help in beating this threshold. On the other hand, if it were known that $L_n < \binom{n}{2}$ for some fixed $n$, then using Constructions 2 and 3 we could deduce that $L_n \le (1-\epsilon)\binom{n}{2}$ for some $\epsilon > 0$ and all sufficiently large $n$. The probabilities $q_j$ in Construction 3 are in general irrational (but algebraic). The construction also gives examples of transposition shuffles of length $\binom{n}{2}$ in which two of the probabilities $p_i$ may be simultaneously altered to give another transposition shuffle (compare Theorem 1 and Proposition 3). For example, two distinct $q_j$ can be exchanged.
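As a small numerical sketch of the Bernoulli decomposition (illustrative only, for $n = 4$, $h = 2$): the hypergeometric law $(\frac16, \frac23, \frac16)$ factors into two Bernoulli parameters, recovered here from the roots of its probability generating function, and the resulting $q_j$ are indeed irrational algebraic numbers.

```python
import math
from itertools import permutations

# Exact law of #{i > h : pi(i) <= h} for uniform pi, with n = 4, h = 2.
n, h = 4, 2
counts = [0] * (h + 1)
for pi in permutations(range(1, n + 1)):
    counts[sum(1 for i in range(h, n) if pi[i] <= h)] += 1
pmf = [c / math.factorial(n) for c in counts]  # hypergeometric: [1/6, 2/3, 1/6]

# Factor the generating function pmf[0] + pmf[1] z + pmf[2] z^2 as
# (1 - q1 + q1 z)(1 - q2 + q2 z): its roots z_j are real, and q_j = 1/(1 - z_j).
a, b, c = pmf[2], pmf[1], pmf[0]
disc = math.sqrt(b * b - 4 * a * c)
q = [1 / (1 - z) for z in ((-b - disc) / (2 * a), (-b + disc) / (2 * a))]

# The sum of independent Bernoulli(q1) and Bernoulli(q2) then has law pmf:
bernoulli_sum = [(1 - q[0]) * (1 - q[1]),
                 q[0] * (1 - q[1]) + q[1] * (1 - q[0]),
                 q[0] * q[1]]
print(max(abs(x - y) for x, y in zip(bernoulli_sum, pmf)) < 1e-12)  # True
print(q)  # irrational values 1/(3 + sqrt 3) and 1/(3 - sqrt 3)
```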
Background. Reduced words have been studied in depth. For example, it is known [15] that the number of reduced words of order $n$ is
$$\frac{\binom{n}{2}!}{1^{n-1}\, 3^{n-2}\, 5^{n-3} \cdots (2n-3)^1},$$
and that they are in bijection with Young tableaux in a certain class [8, 10]. The uniformly random reduced word of order $n$ has remarkable structure and properties [2, 5, 6].
The term reduced word typically refers to a minimum-length sequence of simple transpositions whose composition is an arbitrary specified permutation [9] (not just ρ), and the concept can be extended to general Coxeter (and other) groups. In [2] and elsewhere reduced words are referred to as sorting networks (see below).
Closely related to transposition shuffles are permutation networks and sorting networks. A sequence $(a_i, b_i)_{i=1}^{\ell}$ is a permutation network of order $n$ if for every permutation $\pi \in S_n$ there is some subsequence $j(1), \dots, j(r)$ of $1, \dots, \ell$ such that $t(a_{j(1)}, b_{j(1)}) \cdots t(a_{j(r)}, b_{j(r)}) = \pi$. Clearly if $(a_i, b_i, p_i)_{i=1}^{\ell}$ is a transposition shuffle then $(a_i, b_i)_{i=1}^{\ell}$ must be a permutation network. Define the sort operator $s(a, b)$ by $x \cdot s(a, b) := x'$, where for a sequence $x = (x_1, \dots, x_n)$ the sequence $x'$ agrees with $x$ except that $x'_a = \min(x_a, x_b)$ and $x'_b = \max(x_a, x_b)$. The sequence $(a_i, b_i)_{i=1}^{\ell}$ is a sorting network if for every permutation $\pi \in S_n$ we have $\pi \cdot s(a_1, b_1) \cdots s(a_\ell, b_\ell) = \mathrm{id}$. Every sorting network is a permutation network.
There are permutation networks of order $n$ and length asymptotic to $n \log_2 n$ as $n \to \infty$ [17], asymptotically matching the obvious lower bound $\lceil \log_2 n! \rceil \sim n \log_2 n$. There are sorting networks of length $O(n \log n)$ [1], but known constructions are quite indirect and complex, with impractically large constants in the $O$ notation; on the other hand there are straightforward constructions of length $O(n \log^2 n)$ with reasonable constants [3]. Can these networks be turned into transposition shuffles?
Restricting attention to simple transpositions, $(a_i, a_i+1)_{i=1}^{\ell}$ is a sorting network if and only if it is a permutation network, and moreover this is equivalent to the condition $t(a_1) \cdots t(a_\ell) = \rho$; see e.g. [13, 5.3.4]. Thus, the minimum length of a simple permutation network (or sorting network) is $\binom{n}{2}$, and the minimal examples coincide with reduced words as defined earlier.
Sorting networks have applications in distributed or hardware-optimized systems such as graphics processing units. Transposition shuffles also appear natural for applications, since the ability to permute objects uniformly is useful for privacy or security as well as for games of chance.
To our knowledge transposition shuffles have not been considered before. The problem of "square roots" of uniform measure addressed in [7] is somewhat related, while asymptotic shuffling under various random transposition models has been studied extensively -see e.g. [14] for a comprehensive treatment and [11] for a specific model close to the one considered here.

Simple transpositions
We divide the proof of Theorem 1 into two parts. First we show that Construction 1 works; then we show that it exhausts the possibilities. We will use several standard properties of reduced words. For a reduced word $(a_i)_{i=1}^{\ell}$, recall that we write $\sigma_j = t(a_1) \cdots t(a_j)$ for the (deterministic) permutation after $j$ steps, so that in particular $\sigma_0 = \mathrm{id}$ and $\sigma_\ell = \rho$. On the other hand we write $\pi_j = T_1 \cdots T_j$ for the (random) composition of the first $j$ lazy transpositions, so that in a transposition shuffle $\pi_\ell$ is uniform on $S_n$.
A reduced word $(a_i)_{i=1}^{\ell}$ may be transformed into another via moves of the following types:
(i) a commuting move replaces two consecutive entries $(j, k)$ with $(k, j)$, where $|j - k| \ge 2$;
(ii) a braid move replaces three consecutive entries $(k, k+1, k)$ with $(k+1, k, k+1)$, or vice versa.

Proposition 4 (Tits, [16]). Any reduced word may be transformed into any other via a sequence of moves of types (i) and (ii).
Proposition 4 is a special case of a more general result [16], which applies to reduced words of an arbitrary permutation (not just ρ), and to general Coxeter groups. We remark that the result would be essentially obvious if we in addition allowed moves of the form (k, k) ↔ (), which change the length of the word.
We next address how to transform probabilities under braid moves. See Figure 2 for an example.

Lemma 5 (Braid moves). Fix $1 \le k \le n-2$, and let $T_1, T_2, T_3$ be independent lazy transpositions with respective parameters $(k, k+1, p)$, $(k+1, k+2, q)$, $(k, k+1, r)$, and $T'_1, T'_2, T'_3$ independent lazy transpositions with respective parameters $(k+1, k+2, p')$, $(k, k+1, q')$, $(k+1, k+2, r')$, where all six probabilities lie in $[0, 1)$. Write $P := p/(1-p)$, and define $Q, R, P', Q', R'$ similarly. Then $T_1 T_2 T_3$ and $T'_1 T'_2 T'_3$ are equal in law whenever the parameters satisfy the conditions $P + R = Q$ and $(P', Q', R') = (R, Q, P)$.
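The braid-move conditions $P + R = Q$ and $(P', Q', R') = (R, Q, P)$ can be sanity-checked by brute force. Taking $P = 1$ and $R = 2$, so $Q = 3$, i.e. $(p, q, r) = (\frac12, \frac34, \frac23)$, the two compositions have identical laws; a sketch with exact arithmetic (helper name our own):

```python
from fractions import Fraction

def law(n, seq):
    """Exact law of the composition T_1 ... T_k of independent lazy
    transpositions given as (a, b, p) triples with 1-based positions."""
    dist = {tuple(range(1, n + 1)): Fraction(1)}
    for a, b, p in seq:
        new = {}
        for perm, pr in dist.items():
            s = list(perm)
            s[a - 1], s[b - 1] = s[b - 1], s[a - 1]
            for tgt, w in ((tuple(s), pr * p), (perm, pr * (1 - p))):
                new[tgt] = new.get(tgt, Fraction(0)) + w
        dist = new
    return dist

# P = 1, R = 2, hence Q = P + R = 3, giving p = 1/2, q = 3/4, r = 2/3.
p, q, r = Fraction(1, 2), Fraction(3, 4), Fraction(2, 3)
before = [(1, 2, p), (2, 3, q), (1, 2, r)]   # word (k, k+1, k) with k = 1
after = [(2, 3, r), (1, 2, q), (2, 3, p)]    # (P', Q', R') = (R, Q, P)
print(law(3, before) == law(3, after))  # True
```

Dropping the condition $P + R = Q$ breaks the equality: for instance $p = q = r = \frac12$ gives $P + R = 2 \ne 1 = Q$, and the two laws differ.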
Proof of Theorem 1 (sufficiency). Consider first the particular reduced word $(1, 2, \dots, n-1,\ 1, 2, \dots, n-2,\ \dots,\ 1, 2,\ 1)$. The corresponding probabilities according to Construction 1 are $(\frac12, \frac23, \dots, \frac{n-1}{n},\ \frac12, \frac23, \dots, \frac{n-2}{n-1},\ \dots,\ \frac12)$, and the resulting $S$ is a transposition shuffle. Indeed, this is a special case of Construction 2: the first $n-1$ steps form a sweep, so the permutation $\pi_{n-1}$ has uniformly random last element $\pi_{n-1}(n)$. The remaining sequence of parameters agrees with the entire sequence for $n-1$, so by induction, their composition is a uniformly random permutation of elements $1, \dots, n-1$, concluding the argument.

Now we apply Proposition 4. Suppose that $(a_i)_{i=1}^{\ell}$ and $(a'_i)_{i=1}^{\ell}$ are reduced words that are related by a single move, and let $(p_i)_{i=1}^{\ell}$ and $(p'_i)_{i=1}^{\ell}$ be the corresponding probabilities given by Construction 1 in each case. It suffices to show that if $(a_i, a_i+1, p_i)_{i=1}^{\ell}$ is a transposition shuffle then so is $(a'_i, a'_i+1, p'_i)_{i=1}^{\ell}$. This clearly holds in the case of a commuting move. For a braid move we will use Lemma 5. Suppose without loss of generality that the move replaces $(a_j, a_{j+1}, a_{j+2}) = (k, k+1, k)$ with $(a'_j, a'_{j+1}, a'_{j+2}) = (k+1, k, k+1)$. Write $(u, v, w) = (\sigma_{j-1}(k), \sigma_{j-1}(k+1), \sigma_{j-1}(k+2))$ for the three (deterministic) elements involved, which satisfy $u < v < w$. Let $(p, q, r) = (p_j, p_{j+1}, p_{j+2})$ and $(p', q', r') = (p'_j, p'_{j+1}, p'_{j+2})$ be the three probabilities before and after the move, and write $P = p/(1-p)$ so that $p = P/(1+P)$, etc. Then the formula for the probabilities in Construction 1 gives
$$(P, Q, R) = (v-u,\ w-u,\ w-v) \quad \text{and} \quad (P', Q', R') = (w-v,\ w-u,\ v-u).$$
These values satisfy the conditions of Lemma 5.

We now prepare for the uniqueness part of the proof of Theorem 1. We parameterize the probability space as follows: let $\omega = (\omega_i)_{i=1}^{\ell} \in \{0,1\}^{\ell}$, where $\omega_i = 1$ if the lazy transposition $T_i$ performs its swap and $\omega_i = 0$ otherwise, so that the coordinates $\omega_i$ are independent with $\mathbb{P}(\omega_i = 1) = p_i$.
The next lemma characterizes the ways in which element 1 can reach position $n$ in the final permutation $\pi_\ell$. See Figure 3. Let $H := \{h : \sigma_{h-1}^{-1}(1) = a_h\}$ be the set of times at which element 1 swaps in the deterministic sequence $\sigma_0, \dots, \sigma_\ell$.

Lemma 6. We have $\pi_\ell^{-1}(1) = n$ if and only if $\omega_h = 1$ for all $h \in H$. Moreover, in that case the trajectory of element 1 satisfies (and is determined by): $\pi_i^{-1}(1) = \sigma_i^{-1}(1)$ for all $i$.

Proof. First note that in the deterministic sequence of permutations $\sigma_0, \dots, \sigma_\ell$, each of the $\binom{n}{2}$ pairs of elements of $\{1, \dots, n\}$ swaps exactly once. In particular, element 1 swaps with every other element exactly once, and it must move from left to right when it does so; thus $H$ is precisely the set of the $n-1$ times when these swaps occur. Note that no other transpositions are incident to the trajectory of 1; that is,
$$\sigma_{i-1}^{-1}(1) \notin \{a_i, a_i+1\} \quad \text{for all } i \notin H. \tag{1}$$
It follows immediately that $\pi_i^{-1}(1) = \sigma_i^{-1}(1)$ for all $i$ if and only if $\omega_h = 1$ for all $h \in H$. It remains to show that there is no other possible trajectory via which element 1 can end at position $n$. Suppose on the contrary that $\omega$ is such that $\pi_\ell^{-1}(1) = n$ but $\pi_i^{-1}(1) \ne \sigma_i^{-1}(1)$ for some $i$. If $\pi_i^{-1}(1) < \sigma_i^{-1}(1)$ for some $i$, consider the largest $i$ for which this holds. Then we must have $\pi_{i+1}^{-1}(1) = \sigma_{i+1}^{-1}(1) = \sigma_i^{-1}(1) = k$, say. But this implies that $a_{i+1} = k-1$ (and $\omega_{i+1} = 1$), giving a contradiction to (1) and the definition of $H$. On the other hand, if $\pi_i^{-1}(1) > \sigma_i^{-1}(1)$ for some $i$, considering the smallest such $i$ leads similarly to a contradiction.
Proof of Theorem 1 -uniqueness. It is clear that no simple transposition shuffle can have length less than n 2 , since it would be incapable of producing the reverse permutation ρ (in which every pair of elements is reversed).
It remains to show uniqueness: for any reduced word $(a_i)_{i=1}^{\ell}$ there is at most one sequence of probabilities $(p_i)_{i=1}^{\ell}$ for which $(a_i, a_i+1, p_i)_{i=1}^{\ell}$ is a transposition shuffle. We prove this statement by induction on the order $n$. It is clearly true for $n \le 2$. Suppose then that $(a_i, a_i+1, p_i)_{i=1}^{\ell}$ is a transposition shuffle of order $n$, and let $H$ be the set of times at which element 1 swaps in the deterministic sequence. By the lemma above, conditional on the event $\pi_\ell(n) = 1$ we have $\omega_h = 1$ for all $h \in H$, while the remaining coordinates $(\omega_i)_{i \notin H}$ remain independent with parameters $(p_i)_{i \notin H}$.
To make use of this last fact we delete the trajectory of element 1 from the reduced word to get a lower-order word. More precisely, define $(c_i)_{i=1}^{\ell}$ by
$$c_i := \begin{cases} \infty & \text{if } i \in H, \\ a_i & \text{if } a_i < \sigma_{i-1}^{-1}(1), \\ a_i - 1 & \text{if } a_i > \sigma_{i-1}^{-1}(1), \end{cases}$$
where we use $\infty$ as a dummy symbol. Then the subsequence $(a'_i)_{i=1}^{\ell'} := (c_i : i \notin H)$ obtained by deleting all occurrences of $\infty$ is a reduced word of order $n-1$ (and length $\ell' := \binom{n}{2} - (n-1) = \binom{n-1}{2}$). See Figure 3. Now consider any $\omega \in \{0,1\}^{\ell}$ that satisfies $\omega_h = 1$ for all $h \in H$, and define the subsequence $\omega' := (\omega_i : i \notin H) \in \{0,1\}^{\ell'}$. Let $\pi'_{\ell'}$ be the final permutation of an order-$(n-1)$ simple transposition shuffle with word $(a'_i)_{i=1}^{\ell'}$ at the element $\omega'$ of its probability space. Then the final permutation under the original shuffle at $\omega$ is $\pi_\ell = [\pi'_{\ell'}(1) + 1, \dots, \pi'_{\ell'}(n-1) + 1, 1]$. Therefore, by the induction hypothesis, there is at most one possible choice of the vector of probabilities $(p_i)_{i \notin H}$ that results in the correct conditional law of the permutation $\pi_\ell$ given $\pi_\ell(n) = 1$. Now, by symmetry, we can apply the same argument to the set $\widetilde{H} := \{h : \sigma_{h-1}^{-1}(n) = a_h + 1\}$ of times when element $n$ moves, to deduce that there is also at most one choice for the vector of probabilities $(p_i)_{i \notin \widetilde{H}}$. Now, $H \cap \widetilde{H}$ has exactly one element: it is the unique time $k$ at which elements 1 and $n$ swap in $\sigma_0, \dots, \sigma_\ell$. Hence there is at most one choice for $(p_i)_{i \ne k}$. But since $\prod_{h \in H} p_h = \mathbb{P}(\pi_\ell(n) = 1) = 1/n$ is determined, there is at most one choice for $p_k$ also.
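The deletion of the trajectory of element 1 is easy to carry out mechanically; the sketch below (our own naming) computes both the set $H$ and the resulting order-$(n-1)$ word:

```python
def delete_trajectory(word, n):
    """Delete the trajectory of element 1 from a reduced word of order n,
    returning the order-(n-1) word (c_i restricted to i not in H) and the
    set H of times at which element 1 swaps."""
    sigma = list(range(1, n + 1))
    reduced, H = [], []
    for i, a in enumerate(word, start=1):
        pos1 = sigma.index(1) + 1        # current position of element 1
        if a == pos1:
            H.append(i)                  # element 1 swaps and moves right
        elif a < pos1:
            reduced.append(a)            # swap is to the left of element 1
        else:                            # (a + 1 = pos1 cannot occur in a
            reduced.append(a - 1)        # reduced word) to the right: shift
        sigma[a - 1], sigma[a] = sigma[a], sigma[a - 1]
    assert sigma == list(range(n, 0, -1)), "input was not a reduced word"
    return reduced, H

# The order-4 reduced word (1,2,3,1,2,1) leaves the order-3 word (1,2,1):
print(delete_trajectory([1, 2, 3, 1, 2, 1], 4))  # ([1, 2, 1], [1, 2, 3])
```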

General transpositions
Proof of Proposition 3. We fix the parameters $(a_i, b_i)_{i=1}^{\ell}$ and consider the dependence of the law of the final permutation $\pi_\ell$ on the probabilities $(p_i)_{i=1}^{\ell}$. For any given permutation $\alpha \in S_n$ we have
$$\mathbb{P}(\pi_\ell = \alpha) = \sum_{\omega \in S_\alpha} \prod_{i=1}^{\ell} p_i^{\omega_i} (1-p_i)^{1-\omega_i}$$
for some set $S_\alpha \subseteq \{0,1\}^{\ell}$. Suppose we vary one probability $p_j$ while fixing the others. Then the dependence is affine:
$$\mathbb{P}(\pi_\ell = \alpha) = A + B p_j,$$
where the constants $A$ and $B$ depend on $j$, $\alpha$ and $(p_i : i \ne j)$. Suppose that the choice of probabilities $(p_i)_{i=1}^{\ell}$ gives a transposition shuffle, and so also do the probabilities obtained by altering $p_j$ (only) to a different value $p'_j \ne p_j$. Then in particular $A + Bp_j = A + Bp'_j = 1/n!$, so $B = 0$, hence any choice of $p''_j \in [0,1]$ will also give $\mathbb{P}(\pi_\ell = \alpha) = 1/n!$. The same argument applies for every permutation $\alpha$, so any choice of $p''_j$ gives a transposition shuffle. But in particular we can take $p''_j = 0$ and remove the $j$th lazy transposition altogether, so $\ell$ was not minimal.
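The affine dependence on a single $p_j$ can be illustrated numerically: the law at $p_j = \frac12$ must be the exact average of the laws at $p_j = 0$ and $p_j = 1$. A sketch with arbitrary fixed values for the other parameters (the example triples are our own choices):

```python
from fractions import Fraction

def law(n, seq):
    """Exact law of the composition T_1 ... T_k of lazy transpositions."""
    dist = {tuple(range(1, n + 1)): Fraction(1)}
    for a, b, p in seq:
        new = {}
        for perm, pr in dist.items():
            s = list(perm)
            s[a - 1], s[b - 1] = s[b - 1], s[a - 1]
            for tgt, w in ((tuple(s), pr * p), (perm, pr * (1 - p))):
                new[tgt] = new.get(tgt, Fraction(0)) + w
        dist = new
    return dist

base = [(1, 2, Fraction(1, 3)), (2, 3, Fraction(1, 5)), (1, 3, Fraction(2, 7))]
j = 1  # vary the second probability, holding the others fixed

def law_with_pj(p):
    seq = list(base)
    a, b, _ = seq[j]
    seq[j] = (a, b, p)
    return law(3, seq)

lo, mid, hi = (law_with_pj(Fraction(v, 2)) for v in (0, 1, 2))
support = set(lo) | set(mid) | set(hi)
# Affine dependence on p_j: the law at p_j = 1/2 is the exact average of the
# laws at p_j = 0 and p_j = 1, for every permutation simultaneously.
print(all(mid.get(t, 0) == (lo.get(t, 0) + hi.get(t, 0)) / 2 for t in support))  # True
```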
Proof of Theorem 2. We first prove the statement about the first and last probabilities. Suppose $S = (a_i, b_i, p_i)_{i=1}^{\ell}$ is a minimum-length transposition shuffle. Appending a lazy transposition with parameters $(a_\ell, b_\ell, \frac12)$ to the sequence clearly gives another transposition shuffle. But now the last two lazy transpositions can be replaced by a single one with parameters $(a_\ell, b_\ell, \frac12)$: the composition performs the swap $t(a_\ell, b_\ell)$ precisely when exactly one of the two lazy transpositions does, which has probability $p_\ell \cdot \frac12 + (1 - p_\ell) \cdot \frac12 = \frac12$. This contradicts rigidity, Proposition 3, unless $p_\ell = \frac12$. Symmetry gives $p_1 = \frac12$ also.

Now we turn to the claim about the number of occurrences of $\frac12$. To any random permutation $\pi$ of $S_n$ we can associate the $n \times n$ matrix $M(\pi)$ with entries $M(\pi)_{i,j} = \mathbb{P}(\pi(i) = j)$.
In a transposition shuffle, $M(\pi_\ell)$ is the matrix with all entries $1/n$, which has rank 1. Moreover, since the $T_i$ are independent, $M(\pi_\ell) = M(T_1 \cdots T_\ell) = M(T_\ell) \cdots M(T_1)$. On the other hand, the matrix $M(T)$ of the lazy transposition with parameters $(a, b, p)$ agrees with the identity except in the intersection of rows $a$ and $b$ with columns $a$ and $b$, where it has the form
$$\begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.$$
Thus $M(T)$ has rank $n$ if $p \ne \frac12$ and rank $n-1$ if $p = \frac12$. Sylvester's rank inequality states that $n - \mathrm{rank}(AB) \le (n - \mathrm{rank}(A)) + (n - \mathrm{rank}(B))$ for $n \times n$ matrices $A, B$. Applying it repeatedly to the product $M(T_\ell) \cdots M(T_1) = M(\pi_\ell)$, which has rank 1, we deduce that $\#\{i : p_i = \frac12\} \ge n - 1$, as required.
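The rank argument can be illustrated for the order-3 shuffle with probabilities $\frac12, \frac23, \frac12$: the factors have ranks 2, 3, 2, while their product is the all-$\frac13$ matrix of rank 1, so Sylvester's inequality $3 - 1 \le (3-2) + (3-3) + (3-2)$ is tight. A sketch with exact arithmetic (helper names our own):

```python
from fractions import Fraction

def lazy_matrix(n, a, b, p):
    """M(T)_{i,j} = P(T(i) = j) for the lazy transposition (a, b, p)."""
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    M[a - 1][a - 1] = M[b - 1][b - 1] = 1 - p
    M[a - 1][b - 1] = M[b - 1][a - 1] = p
    return M

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rank(mat):
    """Rank over the rationals, by Gaussian elimination."""
    m = [row[:] for row in mat]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

# The order-3 shuffle with probabilities 1/2, 2/3, 1/2:
n = 3
seq = [(1, 2, Fraction(1, 2)), (2, 3, Fraction(2, 3)), (1, 2, Fraction(1, 2))]
Ms = [lazy_matrix(n, a, b, p) for a, b, p in seq]
prod = Ms[-1]
for M in reversed(Ms[:-1]):   # M(T_1 T_2 T_3) = M(T_3) M(T_2) M(T_1)
    prod = matmul(prod, M)
print([rank(M) for M in Ms], rank(prod))  # [2, 3, 2] 1
```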