Total Variation and Separation Cutoffs are not equivalent and neither one implies the other

The cutoff phenomenon describes the case when an abrupt transition occurs in the convergence of a Markov chain to its equilibrium measure. There are various metrics which can be used to measure the distance to equilibrium, each of which corresponds to a different notion of cutoff. The most commonly used are the total-variation and the separation distances. In this note we prove that cutoff for these two distances is not equivalent, by constructing several counterexamples which display cutoff in total-variation but not in separation, and others with the opposite behavior, including lazy simple random walk on a sequence of uniformly bounded degree expander graphs. These examples give a negative answer to a question of Ding, Lubetzky and Peres.


Introduction
Consider an irreducible discrete-time Markov chain $X = (X_t)_{t \ge 0}$, defined on a finite state space $\Omega$ (we call a chain finite if $\Omega$ is finite). We let $P$ denote its transition matrix. We further assume that $X$ is reversible, that is, that there exists a probability measure $\pi$ which satisfies the detailed balance equation $$\forall x, y \in \Omega, \quad \pi(x) P(x, y) = \pi(y) P(y, x).$$
This measure is unique because of irreducibility. Let us assume furthermore that our Markov chain is lazy, meaning that $$\forall x \in \Omega, \quad P(x, x) \ge 1/2. \tag{1.1}$$ A particularly important case of such a Markov chain is lazy simple random walk (SRW) on a simple graph $G = (V, E)$, in which case $\Omega = V$, $P(x, y) = \frac{1}{2\deg(x)}$ for $\{x, y\} \in E$ and $\pi(x) = \frac{\deg(x)}{2|E|}$, where $\deg(x) := |\{y : \{x, y\} \in E\}|$ and $|\cdot|$ denotes the cardinality of a set. It is a classical result of probability theory that for any initial condition the distribution of $X_t$ converges to $\pi$ when $t$ tends to infinity. The object of the theory of mixing times of Markov chains is to study the characteristics of this convergence (see [16] for a self-contained introduction to the subject).
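We denote by $P^t_x$ ($P_x$) the distribution of $X_t$ (resp. $(X_t)_{t \ge 0}$), given that $X_0 = x$. For any two distributions $\mu, \nu$ on $\Omega$, their total-variation distance is defined to be $$\|\mu - \nu\|_{TV} := \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|.$$ Writing $d(t) := \max_{x \in \Omega} \|P^t_x - \pi\|_{TV}$ for the worst-case distance at time $t$, the (total-variation) $\varepsilon$-mixing-time is defined as $$t_{mix}(\varepsilon) := \inf\{t : d(t) \le \varepsilon\}.$$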
When ε = 1/4 we omit it from the above notation.
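The definitions above can be made concrete on a small graph. The following is a minimal sketch, not part of the original note: it builds the lazy SRW on a path with five vertices (an arbitrary illustrative choice), and computes the worst-case total-variation profile and the mixing time for ε = 1/4. The function names are hypothetical.

```python
def lazy_srw(adj):
    """Transition matrix of the lazy SRW on a simple graph given by adjacency lists."""
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for x, nbrs in enumerate(adj):
        P[x][x] = 0.5                           # laziness: P(x, x) = 1/2
        for y in nbrs:
            P[x][y] = 1.0 / (2 * len(nbrs))     # P(x, y) = 1 / (2 deg(x))
    return P


def stationary(adj):
    """pi(x) = deg(x) / (2|E|)."""
    two_e = sum(len(nbrs) for nbrs in adj)
    return [len(nbrs) / two_e for nbrs in adj]


def tv_distance(mu, nu):
    return 0.5 * sum(abs(p - q) for p, q in zip(mu, nu))


def tv_profile(P, pi, t_max):
    """Return [d(0), ..., d(t_max)], where d(t) = max_x ||P^t_x - pi||_TV."""
    n = len(P)
    M = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]  # P^0
    profile = []
    for _ in range(t_max + 1):
        profile.append(max(tv_distance(row, pi) for row in M))
        M = [[sum(row[k] * P[k][y] for k in range(n)) for y in range(n)] for row in M]
    return profile


def mixing_time(profile, eps=0.25):
    """Smallest t with d(t) <= eps, or None if it is not reached in the profile."""
    for t, value in enumerate(profile):
        if value <= eps:
            return t
    return None


# Illustrative assumption: a path with 5 vertices.
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
P = lazy_srw(adj)
pi = stationary(adj)
d = tv_profile(P, pi, 200)
t_mix = mixing_time(d)
print("t_mix(1/4) =", t_mix)
```

Dense matrix products are used only for readability; for larger state spaces one would propagate each row through a sparse representation instead.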
Theorem A (Chen and Saloff-Coste [7]). Let (Ω n , P n , π n ) be a sequence of reversible lazy Markov chains. Let λ (n) 2 be the second largest eigenvalue of P n . Then the following assertions are equivalent • The sequence exhibits ℓ p -cutoff for some 1 < p ≤ ∞.
Observe that under reversibility (for any fixed chain) (1.5) expresses an equivalence between the separation and the total-variation mixing times, parallel to the one, expressed in (1.6), holding between the different ℓ p mixing times for p ∈ (1, ∞]. Hence a natural question (in light of Theorem A) is whether (under reversibility) there is cutoff in total-variation if and only if there is cutoff in separation. This is Question 5.1 in [10], where an affirmative answer was given for the class of birth and death chains (which are Markov chains for which the set of edges (x, y) with P (x, y) > 0 forms a segment). In fact, both cutoffs were shown to be equivalent to the product condition (3.2).
In this note we give a negative answer to that question in general by constructing counter-examples. Theorem 1.1.
(i) Total-variation and separation cutoff are not equivalent for lazy reversible Markov chains and neither one implies the other.
(ii) The above statement remains true within the class of lazy simple random walks on graphs of maximal degree at most 7.
Remark 1.2. We can also produce non-reversible or non-lazy counter-examples by performing artificial modifications in our chains, but this is not a very important point. Non-lazy or non-reversible chains can have very pathological behavior and we want to underline that we are not using "unfair tricks" to produce our counter-examples.
Of course a full proof of this statement only requires two counter-examples, as (ii) is a stronger statement than (i). However, we have chosen to include also examples that are not simple random walks because they are much simpler. We present a total of five counter-examples. Apart from the first one, they are all lazy (weighted nearest-neighbor) random walks on bounded degree graphs, with transition rates which are bounded away from zero. The last two examples, which are a bit more technical to analyze, are lazy SRWs on a sequence of bounded degree graphs G n := (V n , E n ) (i.e. sup n max v∈Vn deg(v) < ∞).
Note that for all our counter-examples the graph supporting the transitions contains some cycles. An interesting open problem is to determine whether Theorem B can be extended to the case of lazy weighted nearest-neighbor random walks on trees, for which it is already known (cf. [5]) that separation cutoff implies total-variation cutoff.
A sequence of Markov chains is said to display pre-cutoff (in total-variation resp. separation) if
The proof of (1.8) involves more computation than that of (1.7); we present a complete proof of it in Appendix A.2. These two inequalities imply that the notion of pre-cutoff is equivalent for the two distances and the pre-cutoff ratio of one is at most twice that of the other. In particular, cutoff in one distance implies pre-cutoff with ratio at most 2 in the other. With our examples, we shall show that this is in fact sharp in some cases: there exists a sequence of lazy reversible Markov chains for which we have cutoff in total-variation and only pre-cutoff with ratio 2 in separation, and vice-versa.
Our last point of comparison between total-variation mixing and separation mixing is related to the width of the cutoff window. We say that a sequence of chains exhibits total-variation (resp. separation) cutoff with a cutoff window w n if w n = o(t (n) mix ) and for all 0 < ε ≤ 1/4 there exists some constant C ε > 0 (depending only on ε) such that . Note that the window defined in this manner is not unique, but informally "the" cutoff window is given by the "smallest such w n ". Our examples demonstrate that the cutoff windows for total-variation and separation do not have the same behavior.
The following result is due to Chen and Saloff-Coste [6,Theorem 3.4]. We present a much simpler proof in the Appendix.
Theorem C. Let $(\Omega_n, P_n, \pi_n)$ be a sequence of lazy irreducible finite chains which exhibits total-variation cutoff with a cutoff window $w_n$. Then $w_n = \Omega\big(\sqrt{t^{(n)}_{mix}}\big)$.
The bound given by Theorem C is obviously sharp for the biased random walk on a segment. Conversely, some very standard Markov chains like the lazy SRW on the $n$-dimensional hypercube have a cutoff window $w_n \gg \sqrt{t^{(n)}_{mix}}$ (here $w_n = n$ and $t^{(n)}_{mix} = (\frac{1}{2} \pm o(1)) n \log n$). As indicated in Remark 1.6, the laziness assumption in Theorem C can be replaced by the assumption that $\inf_n \min_{x \in \Omega_n} P_n^2(x, x) > 0$ (as is the case for simple random walk on a sequence of bounded degree graphs).
In light of Theorem C one might expect that whenever separation cutoff occurs for a sequence of discrete-time lazy chains, the width of the separation cutoff window is $\Omega\big(\sqrt{t^{(n)}_{sep}}\big)$. We are unaware of any previously analyzed example in which this fails. We find it remarkable that, as the following remark asserts, the width of the separation cutoff window for a sequence of discrete-time lazy SRWs on a sequence of bounded degree graphs can in fact be a constant! This, or more precisely, the mechanism that allows such behavior (see § 2.4 for more on this point), demonstrates that the separation distance can exhibit profoundly different behaviors than the total-variation distance.
Our counter-examples show that the cutoff window in one distance can be as small as allowed even if there is no cutoff for the other distance:
Remark 1.4. We will construct sequences of bounded degree graphs such that the corresponding sequences of lazy SRWs exhibit the following behaviors (resp.)
(i) There is no separation cutoff but there is total-variation cutoff with window $\sqrt{t^{(n)}_{mix}}$.
(ii) There is no total-variation cutoff but there is separation cutoff with window 1. In § 2.4 we refine the statement of (ii) and describe further surprising properties of the relevant example for (ii) above (listed in § 2.4 as properties (i)-(v)).
Remark 1.5. Let $\delta_n \in (0, 1)$. We call a sequence of discrete-time chains $(\Omega_n, P_n, \pi_n)$ $\delta_n$-lazy if for all $n$, $P_n(x, x) \ge \delta_n$ for all $x \in \Omega_n$. It is not hard to extend the proof of Theorem C and show that if a sequence of $\delta_n$-lazy chains exhibits total-variation cutoff with a window $w_n$, then $w_n = \Omega\big(\sqrt{\delta_n (1 - \delta_n) t^{(n)}_{mix}}\big)$. Theorem C can also be extended to the continuous-time setup, with the additional assumption that the sum of the transition rates from any given state is bounded above by 1 (or by some absolute constant).
Remark 1.6. Let $G_n = (V_n, E_n)$ be a sequence of connected non-bipartite simple graphs of maximal degree $d_n$. Consider the sequence of (non-lazy) SRWs on $G_n$. Then $P_n^2(v, v) \ge 1/d_n$ for every $v \in V_n$. By considering $P^2$ rather than $P$, it follows from the previous remark that if the sequence exhibits total-variation cutoff with a window $w_n$, then $w_n = \Omega\big(\sqrt{t^{(n)}_{mix}/d_n}\big)$. This is in fact sharp, as can be seen by considering a sequence of random $d_n$-regular graphs of size $n$ for some $d_n$ such that $\lim_{n \to \infty} d_n = \infty$ and $d_n = o\big(\frac{\log n}{\log \log n}\big)$ [17, Theorem 3].
1.1. Organization of the note.
In § 2 we describe the construction of our examples and our general strategy. We also describe relevant examples due to Aldous and Pak.
In § 3 we introduce a general framework which, under a certain condition, allows one to reduce the study of the mixing-time to the study of the hitting time of a special point.
In § 4 we describe two examples of sequences of Markov chains which exhibit total-variation cutoff but do not exhibit separation cutoff. The first example, Example 1, demonstrates that (1.7) may be sharp (even when the r.h.s. of (1.7) equals 1). The second example, Example 2, is a weighted nearest neighbor random walk on a bounded degree graph with transition probabilities which are bounded away from 0 and 1.
In § 5 we construct an example of a sequence of Markov chains that exhibits separation cutoff but no total-variation cutoff (Example 3).
Finally, in § 6 we transform Examples 2 and 3 into examples of sequences of lazy SRWs on bounded degree expander graphs. The reason we first describe Examples 2 and 3 is that the key ideas of our constructions are more transparent in these examples.
2. An overview of the main ideas of our constructions
2.1. A very basic chain with different cutoff times for separation and total variation.
In this section we settle for a high-level description of some key ideas. Let us first present a very simple Markov chain which exhibits cutoff in both distances (see Figure 1) but for which the mixing-time in separation is twice as large as that in total variation. Consider a random walk on a segment of length 2n with end-points a and b, which has a constant bias towards the middle point, which we call z. Most of the equilibrium measure is concentrated on a small neighborhood of z and for this reason (cf. Proposition 3.3) the total-variation mixing-time corresponds to the time which is needed to hit z (starting from either of the end-points). The system displays cutoff because this hitting time is concentrated around its mean.
Figure 1. A very simple chain for which the separation mixing-time is twice as large as the total-variation mixing-time (12n and 6n, respectively). The transition rates (apart from at the special states a, b and z) are 1/3 in the z direction and 1/6 in the opposite one (the holding probability is 1/2), making the chain travel at speed 1/6 towards z.
The separation mixing-time on the other hand is twice as large. Roughly speaking, this is because for P t (a, b) to come close to its equilibrium value, "information" has to pass from one end to the other. The time required for this to occur corresponds more or less to the sum of the times needed to reach z from a and b, respectively (see Proposition 3.8).
This scheme with two extremal opposite initial conditions, though not ubiquitous among Markov chains, appears in many natural examples for which cutoff has been proved: e.g. the lazy SRW on the hyper-cube (see [16,Theorem 18.3]), the Ising model at high temperature [19] or the adjacent-transposition shuffle on the segment [15].
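The factor of two between the two mixing times can be observed numerically on the chain of Figure 1. The sketch below is illustrative only: the source does not specify the transition probabilities at the special states a, b and z, so the values used there (move inward with probability 1/2 at the end-points, move to either side with probability 1/4 at z) are assumptions, as are all function names. For moderate n the ratio of the two mixing times is only roughly 2, and one would expect it to approach 2 as n grows.

```python
def segment_chain(n):
    """Biased lazy walk on {0, ..., 2n} with center z = n, stored as P[x] = {y: prob}."""
    size, z = 2 * n + 1, n
    P = [dict() for _ in range(size)]
    for x in range(size):
        P[x][x] = 0.5                      # holding probability 1/2
        if x == z:                         # assumption: symmetric exit from z
            P[x][x - 1] = P[x][x + 1] = 0.25
        elif x in (0, size - 1):           # assumption: end-points move inward
            P[x][x + 1 if x < z else x - 1] = 0.5
        else:
            step = 1 if x < z else -1
            P[x][x + step] = 1 / 3         # towards z
            P[x][x - step] = 1 / 6         # away from z
    return P


def stationary(P):
    """Stationary law of a birth-and-death chain via detailed balance."""
    w = [1.0]
    for x in range(len(P) - 1):
        w.append(w[-1] * P[x][x + 1] / P[x + 1][x])
    total = sum(w)
    return [v / total for v in w]


def distance_profiles(P, pi, t_max):
    """Worst-case total-variation and separation distances for t = 0..t_max."""
    size = len(P)
    M = [[1.0 if x == y else 0.0 for y in range(size)] for x in range(size)]
    d_tv, d_sep = [], []
    for _ in range(t_max + 1):
        d_tv.append(max(0.5 * sum(abs(p - q) for p, q in zip(row, pi)) for row in M))
        d_sep.append(max(1 - row[y] / pi[y] for row in M for y in range(size)))
        new = [[0.0] * size for _ in range(size)]
        for x in range(size):
            for k, mass in enumerate(M[x]):
                if mass:
                    for y, p in P[k].items():
                        new[x][y] += mass * p
        M = new
    return d_tv, d_sep


def first_below(profile, eps=0.25):
    return next(t for t, value in enumerate(profile) if value <= eps)


n = 10
P = segment_chain(n)
pi = stationary(P)
d_tv, d_sep = distance_profiles(P, pi, 60 * n)
t_tv, t_sep = first_below(d_tv), first_below(d_sep)
print("t_mix =", t_tv, " t_sep =", t_sep, " ratio =", t_sep / t_tv)
```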

2.2.
An idea to avoid cutoff in separation while keeping that in total-variation.
Our idea to produce counter-examples with total-variation cutoff but only pre-cutoff in separation is to modify the structure (state space and transition rates) of the simple chain above (Figure 1), only on one side (say, the side of b), to break the symmetry. To be precise, in Example 2 we first set the holding probabilities on both sides to be 3/4 (and consider the obtained chain as the "original chain", as opposed to Example 1, for which the chain in Figure 1 serves as the "original chain") before modifying the b-side. We want to perform our modifications in the following manner:
• We want to keep the property that every path from a to b goes through z, which shall still bear a positive proportion of the equilibrium mass.
• We want a to remain the initial condition from which it takes the longest time to reach equilibrium (equivalently, to hit z). More precisely, we want that also after the modification, the distribution of the hitting time of z, T z := inf{t : X t = z}, starting from a still stochastically dominates the distribution of T z starting from any other initial state. Moreover, we want the hitting time distribution of z, starting from any state between a and z (including a), to remain unchanged.
• We want the hitting time of z from initial state b to become non-concentrated, and to remain of the same order of magnitude as the mixing-time of the whole chain. Moreover, we want this hitting time to remain (stochastically) larger than the hitting time of z starting from any other state which lies between b and z, and to become stochastically dominated by the hitting time distribution of z (from b) in the original chain (which equals the hitting time distribution from a in the modified chain).
In this manner, the hitting time distribution of z under P a remains unchanged (and in particular, remains concentrated).
Moreover, after the modification it is still the case that d(t) ≈ P a [T z > t], and thus by the aforementioned concentration there is still cutoff in total-variation (see Proposition 3.3). Using Proposition 3.8, we deduce that d sep (t mix +t) ≈ P b [T z > t] and so there is no cutoff in separation as the hitting time distribution of z under P b in the modified chain is no longer concentrated.
To perform such a modification, we borrow ideas from previous constructions of Pak (for Example 1) and Aldous (for Example 2), which we present now.
2.3. Related Constructions. When the product condition (Definition 3.1) was shown to be a necessary condition for cutoff, it was conjectured that it should also be a sufficient one for "nice" chains. However, two counter-examples constructed, respectively by Aldous and Pak (see [5,Example 8.1], [7] and [16,Chapter 18] for a more detailed description and analysis), show that in general the product condition does not imply cutoff. The mechanisms used to prevent cutoff in those two constructions are of different nature.
• Aldous' example (Figure 2) locally looks like a biased random walk on a segment, so that most of the equilibrium measure is concentrated on a small neighborhood of the end-point towards which the walk is biased (we call this end of the segment z and the opposite one b). To avoid cutoff, the half of the segment closer to z is split into two distinct parallel branches. The transition rates on these branches are tuned so that there is still a bias towards z but such that one path is slower than the other. Starting furthest away from equilibrium (i.e. at state b) we have two possible scenarios to reach z, given by the two distinct branches, and the probability of each is bounded away from 0 and 1. As the speed along the two branches is different, the CDF of the hitting time distribution of z starting from b has two abrupt jumps. Consequently, d (n) (t) exhibits two distinct abrupt drops and there is no cutoff.
• Pak's idea is to start with a sequence of chains which exhibits cutoff and to modify it by adding transitions such that, with a constant rate (which is chosen to be somewhere between the spectral gap and the inverse of the mixing-time of the original chain, say their geometric mean), the system is brought to equilibrium at once. For the modified Markov chain, the total-variation distance decays (up to a negligible error) exponentially with the rate of the newly added transitions and hence cutoff does not occur, and neither does pre-cutoff.
Figure 2. A version of Aldous' example. The walk is always biased towards z but the speed of the walk depends on the branch. On the top branch, as well as on the rest of the segment, the transition rates are 1/6 in the z direction and 1/12 in the opposite one (the holding probability is 3/4), whereas on the bottom branch the (exit) rates are twice as large (and the holding probability is 1/2), resulting in a larger speed. As a result, two transitions occur for the total-variation distance at times 9n and 12n respectively, where n denotes the total distance from z to b and the length of each of the two parallel branches is ⌈n/2⌉ (in the figure, n = 14). The rates at b, z and at the branching point are not very relevant but we display them for the sake of concreteness.
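The two transition times 9n and 12n quoted in the caption of Figure 2 follow from a leading-order speed computation: a stretch of length L with rate p towards z and q away from z is crossed in time about L/(p − q). The following sketch reproduces this arithmetic; it ignores the rounding in ⌈n/2⌉ and the behavior at b, z and the branching point, and the function names are hypothetical.

```python
from fractions import Fraction


def travel_time(length, toward, away):
    """Leading-order time to cross `length` sites at drift (toward - away) per step."""
    return length / (Fraction(toward) - Fraction(away))


def aldous_transition_times(n):
    """Times at which z is reached from b via the fast and via the slow branch."""
    half = Fraction(n, 2)
    slow = (Fraction(1, 6), Fraction(1, 12))   # holding probability 3/4, speed 1/12
    fast = (Fraction(1, 3), Fraction(1, 6))    # holding probability 1/2, speed 1/6
    common = travel_time(half, *slow)          # from b to the branching point
    via_fast = common + travel_time(half, *fast)
    via_slow = common + travel_time(half, *slow)
    return via_fast, via_slow


print(aldous_transition_times(14))  # the value n = 14 of Figure 2
```

With n = 14 the two times are 126 = 9n and 168 = 12n, matching the caption.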
In our Example 1 (see Figure 3), we adapt Pak's idea: on the b-side (of the chain from Figure 1) we add transitions from states on the b-side to the center of mass z, and we choose the inverse of the rate to be of the same order as the mixing-time (which is of order of the length of the segment: n). This makes the hitting time of z started from b non-concentrated and (stochastically) smaller than started from a. Moreover, after this modification, all of the properties described in the beginning of § 2.2 are satisfied.
In our Example 2, (see Figure 4), we simply replace the b-side by Aldous' construction, and set the holding probability on the a-side to be 3/4 (which is the holding probability of the slow branch of the b-side). After this modification, all of the properties described in the beginning of § 2.2 are satisfied.
2.4. An idea to keep cutoff in separation while avoiding that in total-variation. For this part we must rely on a different idea. What we want to alter in our chain is the way the separation distance shrinks to zero. Loosely speaking, in the original chain on the segment, the separation mixing-time is determined by the sum of the hitting times of z from a and b since z is the only channel of communication between the two extremities.
Our construction (Example 3) relies on the following idea (see Figure 5). We take the length of the line segment to be 2(M + 1)n for some large (fixed) integer M .
• We connect the two sides of the segment at a second point z ′ which is far from the center of mass z. We do so by merging the two states which are of distance n from z (one on the a-side and one on the b-side) into a single state z ′ . This connection maintains the cutoff in separation. However, it has the effect of shortening the separation cutoff time by some constant factor, while, as we now describe, drastically altering the nature of the abrupt transition of d sep (t) around the (separation) cutoff time. It follows from our analysis of Example 3 and the refined analysis of Example 5 in § 6.5, that provided that M is taken to be sufficiently large: (i) Also after creating the connection at z ′ we have that lim n→∞ sup t |d (n) sep (t) − max(0, 1 − P t n (a, b)/π n (b))| = 0.
(ii) Due to the connection of A and B at point z ′ , up to negligible terms, around the separation cutoff time, P t n (a, b) is supported by trajectories which never get much closer to z than z ′ is, and so are contained in a set whose stationary probability is exponentially small in n. (iii) Let T a,b z ′ (Definition 3.4) be a random variable distributed as a convolution of the hitting time distribution of z ′ started from a with that started from b (in this case the two distributions are identical). Around the (separation) cutoff time, P t n (a, b)/π n (b) can be understood in terms of the behavior of T a,b z ′ in the large deviation regime (namely, the cutoff occurs around the time t for which sep (and decays exponentially for t < t (n) sep ) and continues to do so for Θ(n) steps around t (n) sep (in particular, shortly after t (n) sep , (a, b) no longer minimizes P t n (x, y)/π n (y)). By (i), it follows that w n = 1 is a (separation) cutoff window (and we can take C ε = C| log ε|, for some absolute constant C, for all ε ∈ (0, 1/4]).
(v) sup t P t n (a, b)/π n (b) = Θ(max t P[T a,b z ′ = t]/π n (z ′ )) = Θ(2 n /n) → ∞ as n → ∞. This behavior (namely, on the one hand having property (i) and on the other having properties (ii), (iv) and (v)) is atypical and quite surprising at first sight.
We are not done yet, as after creating the connection at z ′ , there are two symmetric parallel distinct branches from z ′ to the center of mass z, resulting in the hitting time of z from either a or b being concentrated. Consequently, there is still cutoff in total-variation (as by Proposition • We break the symmetry (between the two branches, but not between a and b) in order to "destroy" the cutoff in total-variation by making the speed along the two paths which link z ′ to z different as in Aldous' example ( Figure 2). Observe that as opposed to Examples 1-2, here a and b play symmetric roles (the chain looks the same starting from either one of them). As one should expect from property (ii) above (provided that M is sufficiently large), breaking the symmetry as described above does not influence the asymptotic pattern of convergence in separation, and (i)-(v) above remain valid. However the quantitative analysis of this example turns out to be more intricate than that of the first two.

2.5.
Constructing counter-examples which are lazy SRW on bounded degree graphs. It was observed by Peres and Wilson that the sequence of chains in Aldous' example could be modified into a sequence of lazy SRWs on bounded degree expander graphs (see Definition 3.6). In [18] Lubetzky and Sly constructed explicit 3-regular expanders with total-variation cutoff.
We use similar ideas to transform our Examples 2-3 into SRWs on bounded degree graphs (Examples 4-5). Our constructions include one new idea: by introducing a sufficient amount of symmetry, (roughly speaking) we are able to reduce the analysis of Examples 4-5 to that of Examples 2-3. Consequently, the analysis of the asymptotic convergence profile of d n (t) is simpler than in [18] (at the cost of having maximal degree ≤ 7 rather than 3).

Preliminaries
The aim of this section is to introduce some general theory which shall reduce the analysis of our Examples 1-3 to the analysis of hitting time distributions of a specific state. The results appearing in this section are later generalized in § 6.1 (these generalizations reduce the analysis of Examples 4-5 to the analysis of hitting time distributions of a specific set). All proofs are deferred to the appendix. As we shall only prove the more general versions, we now describe the correspondence between the results of this section to the ones from § 6.1: Proposition 6.4 corresponds to Proposition 3.3, Lemma 6.3 to Lemma 3.5 and Proposition 6.5 to Proposition 3.8.
Definition 3.1. We say that a family of reversible Markov chains satisfies the product condition if
Because of the following well-known fact (e.g. [16, Proposition 18.4]), all our counter-examples satisfy the product condition.
mix }, if the sequence exhibits a pre-cutoff (either in total-variation or separation) and lim n→∞ t
Given z ∈ Ω we let T z := inf{t : X t = z} denote the hitting time of z. The following result allows us to characterize the mixing-time of the chain in terms of the hitting time of a given point which carries a positive proportion of the mass. As hitting times are sometimes easier to control than mixing-times, it will assist us in determining the total-variation profile of convergence to equilibrium in Examples 1-3.
Proposition 3.3. Let (Ω n , P n , π n ) be a sequence of lazy reversible irreducible finite Markov chains which satisfies the product condition. Let us furthermore assume that there exists z n ∈ Ω n such that inf Then setting
Note that in particular the result shows that total-variation cutoff occurs if and only if τ n (·) displays the following abrupt transition
To characterize the separation time, we introduce a notion of "double-hitting time".
Definition 3.4. Given x, y and z in Ω, we let T x,y z denote a random variable obtained by taking the sum of two independent realizations of T z , once under P x and once under P y . That is,
In particular, if every path from x to y goes through z, then for all t ≥ 0 (3.9)
All our examples are sequences of chains whose spectral gaps are uniformly bounded away from zero, that is, ones satisfying
Although this is not necessary, working with such chains substantially simplifies the analysis of our examples. To check this condition, we use the notion of the Cheeger constant and the well-known discrete analog of Cheeger's inequality (3.10) [3,4,21] (the proof can also be found in [16, Theorem 13.14]). We define the Cheeger constant of the chain to be
We call a sequence of chains (Ω n , P n , π n ) an expander family if inf n Φ n > 0.
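As an illustration of how the Cheeger constant controls the spectral gap, the sketch below computes it by brute force for the lazy SRW on a cycle with 8 vertices. Since the displayed definition is not reproduced above, the code assumes the standard one, Φ = min over sets S with 0 < π(S) ≤ 1/2 of Σ_{x∈S, y∉S} π(x)P(x, y) / π(S), and the standard form Φ²/2 ≤ 1 − λ₂ ≤ 2Φ of the discrete Cheeger inequality. The choice of the cycle and the function names are illustrative assumptions.

```python
from itertools import combinations
from math import cos, pi as PI


def cheeger_constant(P, pi):
    """Brute-force bottleneck ratio over all S with 0 < pi(S) <= 1/2 (small chains only)."""
    n = len(P)
    best = float("inf")
    for r in range(1, n):
        for S in combinations(range(n), r):
            mass = sum(pi[x] for x in S)
            if mass > 0.5 + 1e-12:
                continue
            inside = set(S)
            flow = sum(pi[x] * P[x][y] for x in S for y in range(n) if y not in inside)
            best = min(best, flow / mass)
    return best


def lazy_cycle(n):
    """Lazy SRW on the cycle with n vertices."""
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        P[x][x] = 0.5
        P[x][(x + 1) % n] += 0.25
        P[x][(x - 1) % n] += 0.25
    return P


n = 8
P = lazy_cycle(n)
pi = [1.0 / n] * n
phi = cheeger_constant(P, pi)

# The eigenvalues of the lazy cycle are (1 + cos(2*pi*k/n)) / 2, so the
# spectral gap 1 - lambda_2 equals (1 - cos(2*pi/n)) / 2.
gap = (1 - cos(2 * PI / n)) / 2
print(phi, gap, phi ** 2 / 2 <= gap <= 2 * phi)
```

The enumeration is exponential in the number of states, so this is only meant as a sanity check on very small chains; in the examples of this note the bound on Φ is obtained by inspection.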
The following result implies that a sequence of reversible chains satisfies (⋆) if and only if it is an expander family.
Theorem 3.7. Let λ 2 be the second largest eigenvalue of a reversible transition matrix on a finite state space. Let Φ be as in Definition 3.6. Then It is rather straightforward to check in all of our examples that the Cheeger constant is bounded away from zero. Proposition 3.8. Let (Ω n , P n , π n ) be a sequence of lazy reversible irreducible finite Markov chains which satisfies (⋆). Let us furthermore assume that there exist z n ∈ Ω n , sets A n , B n ⊂ Ω n , with A n ∪ B n = Ω n \ {z n } and a n ∈ A n , b n ∈ B n , such that (i) inf n π n (z n ) > 0.
(ii) For any x ∈ A n and y ∈ B n , P x [T zn < T y ] = 1.
Then lim
Proof. We want to show that P t n (x, y)/π n (y) achieves its smallest value for (x, y) = (a n , b n ) up to a negligible correction. According to (iv) we do not need to worry about the case when both x and y lie in A n . For the other cases, condition (iii) combined with Lemma 3.5 guarantees that (3.14) Finally, applying Lemma 3.5 again yields that This allows us to conclude the proof by noticing that the right-hand side of (3.15) is o(1) (using (i) and (v)).
Remark 3.9. We note that for lazy chains condition (v) in Proposition 3.8 follows from the condition lim n→∞ dist(a n , z) = ∞ (which is satisfied in Examples 1-3), where dist(a n , z) is the minimal k such that P k (a n , z) > 0. To see this, consider the non-lazy path the chain performed from a n to z by time T z , γ = (γ 0 = a n , γ 1 , . . . , γ ℓ = z) (i.e. for all i < ℓ, γ i+1 ≠ γ i , and possibly after spending some time at γ i the chain moved to γ i+1 ). The conditional law of T z , given γ, is that of a sum of ℓ independent geometric random variables with parameter 1/2, and so by the local CLT the probability of its mode is at most $C/\sqrt{\ell} \le C/\sqrt{\mathrm{dist}(a_n, z)}$. Finally, note that the largest point mass of a mixture is at most the largest point mass of a distribution in the mixture.
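The local CLT bound used in Remark 3.9 can be checked directly, since a sum of ℓ independent geometric variables with parameter 1/2 (supported on {1, 2, ...}) has the explicit law P[S = k] = C(k−1, ℓ−1) 2^{−k}. The sketch below, with hypothetical function names, verifies that the largest point mass times √ℓ stays bounded; the constant 1/2 used in the check is an observation for the tested values, not a claim from the source.

```python
from math import comb, sqrt


def geometric_sum_pmf(ell, k):
    """P[G_1 + ... + G_ell = k] for i.i.d. geometric(1/2) variables on {1, 2, ...}."""
    return comb(k - 1, ell - 1) / 2 ** k if k >= ell else 0.0


def max_point_mass(ell):
    # The pmf increases up to k = 2*ell - 2 and decreases afterwards,
    # so scanning [ell, 4*ell] is enough to find the maximum.
    return max(geometric_sum_pmf(ell, k) for k in range(ell, 4 * ell + 1))


for ell in (1, 4, 16, 64, 256):
    print(ell, max_point_mass(ell) * sqrt(ell))
```

The scaled values approach 1/√(4π) ≈ 0.282, in line with a Gaussian approximation with variance 2ℓ.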

Total-variation cutoff without separation cutoff examples
In this section we describe two similar examples of sequences of reversible chains which exhibit total-variation cutoff but no separation cutoff. The analysis of both examples is extremely similar. We present both examples since, while the first demonstrates that (1.7) is indeed sharp, it is much harder to transform it into an example of lazy SRWs on bounded degree expander graphs.
Example 1. Given n ≥ 2, set Ω n := A ∪ {z} ∪ B where A = A n := {a = a n , a n−1 , . . . , a 1 } and B = B n := {b 1 , b 2 , . . . , b n−1 , b n = b}. For notational convenience we write a 0 := z =: b 0 . The matrix P n has positive transition rates on the set of (un-oriented) edges
With a small abuse of notation we define e L 1 and e B 1 to be two distinct parallel edges. To each of these edges, we associate conductances (or weights), as follows
, and w n (e L n ) = 2 −n (n−1) . We let P n be the transition matrix of the (1/2)-lazy random walk on the graph (Ω n , E) with conductances w n , i.e. we set
where w n (x) := Σ y∈Ωn w n (x, y) with the convention that w n (z, b 1 ) = w n (e L 1 ) + w n (e B 1 ). This Markov chain is reversible with respect to
Figure 3. A schematic representation of the transition rates for Example 1. On the segments A and B the transition rates away from and towards the center of mass z are equal respectively to 1/6 and 1/3 (on the A side) and (n − 1)/6n and (n − 1)/3n (on the B side). The rate for using a green edge to land on z is equal to 1/2n. The rates for using green edges in the other direction have a more complicated expression prescribed by reversibility. These rates are described below despite the fact that they play no role in our analysis.
A simple calculation shows that
which implies lim n→∞ π n (z) = 1/4. The transition matrix obtained from w n is
• P n (x, x) = 1/2, for all x ∈ Ω n .
Note that for this chain, condition (⋆) is easily verified using Theorem 3.7. Since under P an , T z is concentrated around time 6n, to prove total-variation cutoff around time 6n for this sequence of chains (using Proposition 3.3), we only need to verify that a n is the initial state from which T z is (stochastically) the largest. A crucial fact which shall assist us in this task is that for all i ∈ [n] and all t (4.4)
The reason for this identity is the following. We couple X A and X B starting from a i and b i (resp.) in the following manner: with probability 1/2 both stay put; with probability (1/2 − 1/(2n)), X A and X B make "the same move", namely +1 (away from z) or −1 (towards z) with (conditional) probability 1/3 and 2/3, resp. (unless the current position of the chain is either a n or b n , in which case the move has to be −1); and with probability 1/(2n), X B is sent directly to z while X A moves away from/towards z with probability 1/3 and 2/3 (unless it is located at a n ). We do not need to specify how the coupling is defined after X B has hit z.
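A consequence of the coupling just described is that, for every index i and every time t, the hitting time of z from b i is stochastically dominated by the hitting time of z from a i . The sketch below checks this numerically by propagating the sub-probability mass of the walk killed at z. It uses only the one-step probabilities stated in the coupling (holding 1/2, a direct jump to z with probability 1/(2n) on the B side, and a forced inward move at the end-point); the function name and the parametrization by the distance to z are illustrative assumptions.

```python
def survival(n, start, t_max, kill):
    """Return [P(T_z > t) for t = 0..t_max] for the walk started at distance `start` from z.

    `kill` is the per-step probability of jumping straight to z:
    0 on the A side and 1/(2n) on the B side.
    """
    move = 0.5 - kill                  # probability of a nearest-neighbor move
    mass = [0.0] * (n + 1)             # mass[i]: at distance i and z not yet hit
    mass[start] = 1.0
    out = []
    for _ in range(t_max + 1):
        out.append(sum(mass[1:]))
        new = [0.0] * (n + 1)
        for i in range(1, n + 1):
            m = mass[i]
            if m == 0.0:
                continue
            new[i] += 0.5 * m                      # lazy step
            if i == n:
                new[i - 1] += move * m             # end-point: forced move towards z
            else:
                new[i - 1] += move * (2 / 3) * m   # towards z
                new[i + 1] += move * (1 / 3) * m   # away from z
            # the remaining mass kill * m jumps to z and is absorbed
        new[0] = 0.0                               # discard the mass absorbed at z
        mass = new
    return out


n, t_max = 10, 300
tail_a = {i: survival(n, i, t_max, 0.0) for i in range(1, n + 1)}
tail_b = {i: survival(n, i, t_max, 1 / (2 * n)) for i in range(1, n + 1)}
dominated = all(tb <= ta + 1e-12
                for i in range(1, n + 1)
                for ta, tb in zip(tail_a[i], tail_b[i]))
print("T_z from b_i dominated by T_z from a_i:", dominated)
```

The same tails also show why the hitting time from b n is not concentrated: since a direct jump to z is available at every step, P[T z > t] under P bn is at most (1 − 1/(2n))^t.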
A way to describe X t starting from B before it hits z is the following: at each step it is killed (hits z) with rate 1/(2n) and conditionally on not being killed, it performs "the same" random walk as that on A (in terms of the index of its current position) but with holding probability n/(2n
Moreover, it follows from the above discussion that
We now turn to the task of verifying that there is no cutoff in separation. Note that conditions (i)-(ii) of Proposition 3.8 hold by construction, condition (v) holds by Remark 3.9, while condition (iv) holds by (3.8). Lastly, condition (iii) of Proposition 3.8 follows from
We now describe a variant of the previous example which is a nearest neighbor lazy weighted random walk on a bounded degree graph with bounded transition probabilities.
For notational convenience we write a 0 := z =: b 0 = c 0 and c n = b n . Consider the following transition matrix
• P n (x, x) = 3/4 for all x ∈ Ω n \ C and P n (c i , c i ) = 1/2 for all i ∈ {1, . . . , n − 1},
• P n (z, a 1 ) = P n (z, b 1 ) = P n (z, c 1 ) = 1/12.
When at a state of degree two or three (other than z), conditioned on making a non-lazy step, the chain moves away from (resp. towards) z with conditional probability 1/3 (resp. 2/3). For vertices of degree 2: along the green edges, rates away from and towards the center of mass z are equal respectively to 1/12 and 1/6 and along the red edges they are equal to 1/6 and 1/3, respectively. The transitions away from vertices of degree 1 and 3 are given on the figure.
States a 2n , b 2n and z play here the same respective roles as a n , b n and z in the previous example. A simple calculation (similar to (4.3)) yields that lim n→∞ π n (z) = 2/7.
We argue that for all t ≥ 0 and i ∈ [2n] In particular, Since the hitting time of z under P a 2n is concentrated around time t = 24n, by Proposition 3.3 the sequence exhibits total-variation cutoff around time 24n. The last inequality in (4.12) is trivial. For the first one we consider the case where P n is replaced by P ′ n which satisfies 2P ′ n (c i , c i+1 ) = 1/3 = P ′ n (c i , c i−1 ) and P ′ n (c i , c i ) = 1/2 for 1 ≤ i ≤ n − 1 and P ′ (x, y) = P (x, y) elsewhere. As adding extra laziness increases stochastically the hitting time T z (as in Remark 3.9 consider the law of γ, the non-lazy path performed by the chain by time T z ; Clearly it is invariant under this transformation, while the conditional law of T z , given γ, can only increase, stochastically), (where P ′ denotes the distribution of the modified chain with the increased holding probability on C n ) and the same holds when b i is replaced by c i .
To prove that $b_{2n}$ is the vertex from which the hitting time of z is the largest, we need to prove the following two inequalities, valid for i ∈ {1, . . . , n} (4.15) Both can be proved by coupling arguments. For the first one, we can couple the non-lazy paths of the chains starting from $b_i$ and $c_i$ until they reach either $b_n$ or z (the second being at position $c_j$ when the first is at position $b_j$), and then, in the case they reach $b_n = c_n$, let them evolve together until they reach z. The larger laziness on the path starting from $c_i$ until the merging time implies stochastic domination. For the second inequality, the case i = n follows from the fact that starting from $b_{2n}$ the chain has to go through $b_n$ before reaching z. For i < n, we can couple the chains starting from $b_i$ and $b_{i+n}$ until the pair of chains reaches either $(b_n, b_{2n})$ or $(z, b_n)$ (the second chain being at position $b_{j+n}$ when the first is at position $b_j$), and conclude using the case i = n. As in the previous example, we can apply Proposition 3.8. The reason why separation cutoff does not occur is that when starting from $b_{2n}$, the hitting time $T_z$ is not concentrated. Indeed it is concentrated around 18n under the conditioned probability measure P this yields  While this result is rather elementary (we use some surgery to compare $T_z$ with a sum of independent variables, and then the law of large numbers for this sequence), the proof in full detail is long to write out (cf. [5, Example 8.1]) and we choose to leave it as an exercise. Applying Proposition 3.8 for an adequate choice of sets and states (here $(a_{2n}, b_{2n}, A_n, B_n ∪ C_n)$ plays the role of $(a_n, b_n, A_n, B_n)$ from Proposition 3.8) yields (4.17) In particular, there is no cutoff in separation.

Separation cutoff without total variation cutoff example
In the following example the analysis of the sharp transition of d sep (t) is reduced to the analysis of the behavior of sum of i.i.d. random variables in the large deviation regime. The analysis below is too coarse for the purpose of determining the width of the cutoff window. We later present a refined analysis for Example 5 (which is the bounded degree un-weighted version of Example 3) in § 6.5, which shows that in fact t for some absolute constant C > 0. The analysis in § 6.5 is built upon the analysis of Example 3 below, as it relies (in a non-quantitative manner) on the fact that certain large deviation estimates hold uniformly over compact sets (the identity of the large deviation rate function is not important for the analysis in § 6.5).
Example 3. Let M ≥ 10 be a fixed integer whose exact value shall be determined later. Consider the state space
Figure 5. A schematic representation of the transition rates for Example 3. When at a state of degree two or four (other than z), conditioned on making a non-lazy step, the chain moves away from (resp. towards) z with conditional probability 1/3 (resp. 2/3). The transition rates away from and towards the center of mass z, from degree two states, are equal respectively to 1/6 and 1/3, except on the segment C, due to increased holding probability. The transition rates away from the rest of the states are specified in the figure.
This chain is a modification of Aldous' example (which was discussed in § 2). The difference lies in the introduction of an additional branch B to the graph. This branch has no effect on the total-variation profile of the convergence to equilibrium, but crucially modifies the separation profile, as P t n (a, b)/π n (a) (recall a := a nM and b := b nM ) is the quantity that takes the longest time to reach equilibrium (i.e. up to negligible correction (x, y) = (a, b) maximizes 1 − P t n (x, y)/π n (y) for all relevant t). A standard calculation yields that lim n→∞ π n (z) = 2/11 and lim n→∞ 2 n π n (z ′ ) = 6/11. (5.1) By symmetry, the law of T z starting, resp., from a i and b i is identical for all i and by the Markov property, it is stochastically increasing in i (for i > j, to reach z from a i (resp. b i ) the chain must first hit a j (resp. b j )). Only minor efforts are necessary to prove rigorously that a and b are the points in A ∪ B ∪ D for which the hitting time T z is stochastically the largest (the coupling arguments are similar to the one developed in the previous section), while for any choice of M > 1, Due to the different holding probabilities along the two branches, C, D, the distribution of T z under P a is not concentrated around its mean. Thus, by Proposition 3.3, there is no total-variation cutoff, and the total-variation asymptotic profile is given by To show that there is separation cutoff, it suffices to prove that lim inf n→∞ inf t min x,y∈Ωn P t n (x, y)/π n (y) − min 1, P t n (a, b)/π n (b) = 0, (5.3) and to show that min(1, P t n (a, b)/π n (b)) displays an abrupt transition. Let us start with the second point. According to Lemma 3.5 (first inequality of (3.7)), we have By definition T a,b z ′ is the sum of two independent hitting times of a biased random walk on a segment of length M n (from one end-point towards the one towards which there is a bias). We make some efforts to compute the large deviation behavior of this sum.
Lemma 5.1. Consider a lazy random walk (Z t ) t≥0 on Z + with rates p(x, x + 1) = 1/3, p(x + 1, x) = 1/6, x ∈ Z + . Let T N be the first hitting time of N . We have
Proof. Let X′ be the random walk with the same rates on Z, and $T'_N$ be the first hitting time of N for this walk. By the Markov property $T'_N$ is the sum of N i.i.d. copies of $T'_1$ and hence we can use Cramér's Theorem (see e.g. [8, Chapter 2]) to obtain the large deviation for $T'_N$ below its mean. If one decomposes according to the value of $X'_1$ we notice that the Laplace transform f(λ) and we deduce the right value for f(λ) from this relation (the fact that f(0) = 1 and continuity of f indicates which root to choose in (5.7)). Note that the derivative of log f(λ) at zero is equal to 6, which implies that Ψ(6) = 0 (alternatively, $E[T'_1] = 6$, hence by Cramér's Theorem it must be the case that Ψ(6) = 0). As Ψ is non-negative (since log f(0) = 0), it must be the case that it attains a global minimum at 6, which implies that Ψ′(6) = 0 and Ψ′′(6) > 0. Now, note that the increments $T_i − T_{i−1}$ are independent variables, which are dominated by $T'_1$ and which converge (when i tends to infinity) to $T'_1$ in law. In particular, by dominated convergence (and Cesàro's Theorem) we have that for any λ ∈ (−∞, log 6 and thus in that case the result follows from the Gärtner-Ellis Theorem [8]. Finally, the local large deviation estimate (the result on $P[T_N = ⌊sN⌋]$) can be deduced from the large deviation principle using the fact that due to laziness We leave it as an exercise. Note moreover that the convergence (5.5) holds uniformly on s ∈ K for any compact K (it can be deduced e.g. from (5.9)).
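The first-step decomposition used in the proof can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's own computation: it takes $f(λ) = E[e^{λ T'_1}]$ (this sign convention is an assumption), uses the rates of Lemma 5.1 (up 1/3, down 1/6, hold 1/2), and decomposes on the first step to get $f = e^{λ}(\tfrac12 f + \tfrac13 + \tfrac16 f^2)$. The root with f(0) = 1 is selected, and the Legendre transform is approximated on a crude grid. The names `laplace_T1`, `mean_T1` and `rate` are hypothetical.

```python
import math

# Rates of the lazy walk in Lemma 5.1: up 1/3, down 1/6, hold 1/2.
UP, DOWN, HOLD = 1 / 3, 1 / 6, 1 / 2
# The quadratic below has real roots only for lam <= LAMBDA_MAX.
LAMBDA_MAX = math.log(6 * (3 - 2 * math.sqrt(2)))


def laplace_T1(lam):
    """f(lam) = E[exp(lam * T'_1)] (assumed convention), from
    f = e^lam * (HOLD * f + UP + DOWN * f^2), root with f(0) = 1."""
    if lam >= LAMBDA_MAX:
        raise ValueError("lam is outside the domain where f is finite")
    e = math.exp(lam)
    b = 1 - HOLD * e
    disc = b * b - 4 * UP * DOWN * e * e
    return (b - math.sqrt(disc)) / (2 * DOWN * e)


def mean_T1(h=1e-5):
    """Central-difference derivative of log f at 0, i.e. E[T'_1]."""
    return (math.log(laplace_T1(h)) - math.log(laplace_T1(-h))) / (2 * h)


def rate(s, step=1e-3, lam_min=-5.0):
    """Grid approximation of sup over lam of (lam * s - log f(lam))."""
    k_min = int(round(lam_min / step))
    k_max = int(LAMBDA_MAX / step)  # grid stays strictly below LAMBDA_MAX
    return max(k * step * s - math.log(laplace_T1(k * step))
               for k in range(k_min, k_max))


if __name__ == "__main__":
    print(laplace_T1(0.0), mean_T1(), rate(6.0), rate(4.0))
```

The derivative of log f at zero comes out as 6 and the grid Legendre transform vanishes at s = 6, in line with the statement Ψ(6) = 0 in the proof.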
A consequence of (5.4) and the previous lemma in conjunction with (5.1) and Lemma 3.5 is that if s M is given by 2M s * , where s * is the unique solution in ( In what follows we let s ∈ (s M , 12M ] be fixed. We first use Lemma 3.5 to reduce to the case of x = a i , y = b j , i, j ≥ M n/2. Set E := {a i : i ≥ M n 2 } ∪ {b i : i ≥ M n 2 }. By (3.7) for any x ∈ Ω n and y ∈ Ω n \ E we have P ⌈sn⌉ n (x, y) π n (y) Finally, to treat the case x = a i , y = b j (the cases (a i , a j ) or (b i , b j ) are treated in the same manner), i, j ≥ M n/2 , we use again Lemma 3.5 which asserts that is (cf. the proof of Lemma 5.1) a sum of i+j independent random variables (not identically distributed) and that lim n→∞ sup i,j≥M n/2 One deduces from Gärdner Ellis Theorem [8] and the following consequence of laziness Note that the l.h.s. in the second line satisfies As 2M Ψ s 2M < log 2 (since s ∈ (s M , 12M ]) and δ can be chosen arbitrarily small, (5.18) (second line) and (5.19) imply that for sufficiently large n, for any i, j satisfying i + j ≥ sn 6 + ηn, we have Combining this with (5.18) (first line) and (5.14) we can conclude that (5.21) 5.1. Concerning Remark 1.3. Note that by performing a minor modification in the above construction we can bring the pre-cutoff ratio for total-variation to the largest possible value: 2. A way to achieve this is to make one of the branches linking z ′ to z much faster than the other (instead of only twice faster as in Example 3, we want the ratio of speeds to tend to infinity).
What we can do is to make these branches of length ⌈√n⌉ while A and B are of length n. Furthermore, we choose the speed on one branch to be 1/6 and that on the other to be 1/(6√n), by increasing the holding probability on this branch (see Figure 6). Using similar reasoning as in the analysis of Example 3 one can show that for this construction there is separation cutoff around time 12n (note that here $− \log π_n(z') = Θ(\sqrt{n})$, which by (5.4) implies that for $t_n := ⌈(12 − ε)n⌉$, P tn We can also find a similar example with transition rates bounded uniformly from zero by considering two branches of different lengths, but in that case the analysis turns out to be more intricate.
which coincides with Definition 3.6 (see e.g. [16,Remark 7.2]). We say that G is a c-lazy expander if ch Lazy (G) > c. We say that a sequence of finite graphs (G n ) n≥1 is a family of c-lazy expanders if inf n ch Lazy (G n ) > c.
In our new context, the center of mass is rather a set which contains a positive fraction of the vertices. We shall relate the mixing-time of the chain to the hitting time of this set. Mutatis mutandis, the results of Section 3 and in particular Lemma 3.5 can be adapted to this new context, but only if the set and the starting point satisfy a special relation: Definition 6.2 (Balanced sets). For any Z ⊂ Ω we denote the hitting time of Z by T Z := inf{t : X t ∈ Z}.
• We say that Z is balanced seen from x ∈ Ω if for all t such that P where π Z (·) = 1 ·∈Z π(·) π(Z) is π conditioned on the set Z. • We say that Z is balanced seen from the set A if it is balanced seen from x for all x ∈ A. • We define T x,y Z to be a random variable distributed like the sum of two independent realizations of T Z , once under P x and once under P y . That is, for all t ≥ 0, Note that sets are not likely to be balanced by "pure luck" and we will be careful to introduce a sufficient amount of symmetry when constructing our graphs, so that our center of mass will be balanced seen from many starting points. However, this property cannot be satisfied for all starting points and we will have to deal with the remaining initial vertices separately (and show that they are irrelevant for determining the worstcase total-variation and separation distances), by using a crude ℓ 2 estimate (Lemma 6.8).
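The random variable $T^{x,y}_Z$ defined above is just the convolution of the two hitting-time laws. The following minimal sketch computes it exactly on a toy chain. The lazy walk on a 5-vertex path with Z = {2} is an illustrative choice (not one of the paper's examples), and the helper names are hypothetical.

```python
from fractions import Fraction


def lazy_path_walk(n):
    """Transition matrix of lazy simple random walk on the path 0..n-1."""
    P = [[Fraction(0)] * n for _ in range(n)]
    for u in range(n):
        nbrs = [v for v in (u - 1, u + 1) if 0 <= v < n]
        P[u][u] = Fraction(1, 2)
        for v in nbrs:
            P[u][v] = Fraction(1, 2 * len(nbrs))
    return P


def hitting_time_pmf(P, Z, x, t_max):
    """pmf[t] = P_x[T_Z = t] for t = 0..t_max, T_Z = inf{t : X_t in Z}."""
    n = len(P)
    pmf = [Fraction(0)] * (t_max + 1)
    if x in Z:
        pmf[0] = Fraction(1)
        return pmf
    alive = [Fraction(0)] * n  # mass that has not hit Z yet
    alive[x] = Fraction(1)
    for t in range(1, t_max + 1):
        step = [sum(alive[u] * P[u][v] for u in range(n)) for v in range(n)]
        pmf[t] = sum(step[v] for v in Z)
        alive = [Fraction(0) if v in Z else step[v] for v in range(n)]
    return pmf


def sum_of_independent(p, q, t_max):
    """Law of the sum of two independent variables with pmfs p and q."""
    return [sum(p[s] * q[t - s] for s in range(t + 1))
            for t in range(t_max + 1)]


if __name__ == "__main__":
    P = lazy_path_walk(5)
    px = hitting_time_pmf(P, {2}, 0, 20)
    py = hitting_time_pmf(P, {2}, 4, 20)
    print(sum_of_independent(px, py, 20)[:6])
```

Exact rational arithmetic is used so that identities such as $P[T^{x,y}_Z = 4] = P_x[T_Z = 2]\,P_y[T_Z = 2]$ can be checked without rounding.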
Lemma 6.3. Let (Ω, P, π) be a finite irreducible lazy reversible Markov chain and consider x, y ∈ Ω, and Z ∈ Ω which is balanced seen from both x and y (i) For all t ≥ 0 we have if every path from x to y goes through the set Z) then for all t ≥ 0 we have that (1 − π(Z))/π(Z).

(6.4)
We use this result directly but also to prove the following key propositions whose aim is to replace Propositions 3.3 and 3.8. Proposition 6.4. Let (Ω n , P n , π n ) be a sequence of lazy reversible irreducible finite chains which satisfies the product condition. Assume that for each n there exist sequences of sets and vertices I n , Z n ⊂ Ω n , a = a(n) ∈ Ω n which satisfy (i) inf n π n (Z n ) > 0.
(ii) Z n is balanced seen from I n for all n.
Then lim

In particular, there is separation cutoff if and only if $T^{a_n,b_n}_{Z_n}$ is concentrated around its median.
Remark 6.6. Note that the results presented above are generalizations of those presented in Section 3. Hence we shall only prove the more general versions in the Appendix.
6.2. Building blocks of our constructions. Let us now describe the building blocks of our constructions. We assume for simplicity that n is an even integer. To produce the analog of a biased nearest-neighbor random walk, our constructions must include structures which look like regular trees (for which the SRW has a bias towards the leaves).
We must also take care to add some extra connections, to avoid producing dead-ends on the leaves (which could lead to a small Cheeger constant). Moreover, we must introduce extra symmetries to ensure that the center of mass is balanced seen from all vertices which are sufficiently far from it. Finally, we "stretch" the edges which are far away from the center of mass (that is, replace each such edge by a path of length L, for some fixed large constant L), to ensure that the worst-case total-variation and separation distances are obtained by vertices which are far away from the center of mass (which is balanced, seen from those vertices).
Step 1: Let T a = (V a , E a ) be a binary tree of depth n rooted at a (in the rest of the construction, we keep calling a the root, even though the graph will no longer be a tree). Replace each edge between a pair of vertices belonging to the first n/2 generations of T a by a path of L edges, where L is an integer which does not depend on n. As L shall remain fixed we omit the dependence in L from our notation. In the course of the proof we will have to require L to be sufficiently large for the purpose of applying a certain crude ℓ 2 estimate. We call the obtained graph H 1 n . It is a tree rooted at a and we denote its set of leaves by L n := (u 1 , . . . , u 2 n ), (L n stands for the n-th generation of T a ), where the labels are chosen in an arbitrary fashion.
On $H^1_n$ the walker starting from a will have a bias towards the set of leaves, which can be considered as the center of mass of this graph, since it contains a positive proportion of the vertices. The parameter L here is present only to make the walk slower (the expected number of steps to cross an L-path is $2L^2$, i.e. if $v ∈ H^1_n$ is either the root a or a vertex of degree 3 adjacent to three degree 2 vertices, then $E_v[\inf\{t : D(X_t, v) = L\}] = 2L^2$, where D denotes the graph distance). This shall assist us in verifying that the worst-case total-variation and separation distances are obtained by vertices which are far away from the center of mass.
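The value $2L^2$ quoted above can be verified by first-step analysis. The sketch below assumes that, until distance L is reached, the distance $D(X_t, v)$ of the lazy walk behaves as a birth and death chain on {0, …, L} which holds with probability 1/2, moves from 0 to 1 with probability 1/2, and moves from an interior point to each neighbor with probability 1/4. The function name is illustrative.

```python
from fractions import Fraction


def expected_crossing_time(L):
    """Expected time for the lazy distance process to go from 0 to L.

    Solves (I - P) h = 1 on {0, ..., L-1} with h[L] = 0, where h[k] is the
    expected hitting time of L from k, using the tridiagonal (Thomas)
    algorithm in exact rational arithmetic.
    """
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    sub = [Fraction(0)] + [-quarter] * (L - 1)   # coefficient of h[k-1]
    diag = [half] * L                            # coefficient of h[k]
    sup = [-half] + [-quarter] * (L - 1)         # coefficient of h[k+1]
    rhs = [Fraction(1)] * L

    c = [Fraction(0)] * L
    d = [Fraction(0)] * L
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for k in range(1, L):                        # forward elimination
        denom = diag[k] - sub[k] * c[k - 1]
        c[k] = sup[k] / denom
        d[k] = (rhs[k] - sub[k] * d[k - 1]) / denom

    h = [Fraction(0)] * (L + 1)                  # h[L] = 0 is the boundary value
    for k in range(L - 1, -1, -1):               # back substitution
        h[k] = d[k] - c[k] * h[k + 1]
    return h[0]


if __name__ == "__main__":
    for L in (1, 2, 5, 10):
        print(L, expected_crossing_time(L), 2 * L * L)
```

Laziness doubles the classical value $L^2$ for the reflected non-lazy walk, which is where the factor 2 comes from.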
The problem of this construction is that seen from a vertex which is not a the set of leaves is not balanced. To cope with this defect, we add n extra "generations" of vertices, which make the center of mass balanced from "many" starting points.
Step 2: For all 1 ≤ m ≤ n we label the vertices of the "(n + m)-th generation" (they are at distance (L + 1)n/2 + m from a) as follows: $\{u^{k}_{i_1,\dots,i_m} : i_1,\dots,i_m ∈ [4],\ k ∈ [2^{n−m}]\}$, and we connect them to generation n + m − 1 using the following scheme: for all $k ∈ [2^{n−m}]$, the vertices $u^{k}_{i_1,\dots,i_{m−1},1}$, $u^{k}_{i_1,\dots,i_{m−1},2}$, $u^{k}_{i_1,\dots,i_{m−1},3}$, $u^{k}_{i_1,\dots,i_{m−1},4}$ are connected to $u^{2k−1}_{i_1,\dots,i_{m−1}}$ and $u^{2k}_{i_1,\dots,i_{m−1}}$. We call the obtained graph $H^2_n$. The "center of mass" of $H^2_n$ is the set $L_{2n}$ (it bears roughly half of the total mass of $H^2_n$), which is balanced seen from any vertex in $H^1_n$.
Step 3.1 and 3.2: We now want to plug (attach) to the leaf set of H 1 n "two paths" with different speeds (to have something similar to the structures present in Examples 2 and 3). The construction is the following (see Figure 7):
(i) We start with a rooted binary tree T of depth n (assume n ≥ 4). Let us call 1 and 2 the two neighbors of the root and $T_1$ and $T_2$ the subtrees rooted at 1 and 2, respectively.
(ii) In $T_1$ we add edges between any pair of vertices which have a common ancestor and are not leaves.
(iii) Finally we assign labels to the leaf sets of $T_1$ and $T_2$ in a way that the two labeled trees (prior to step (ii), that is) are isomorphic (see e.g. Figure 7) and we merge each leaf of $T_1$ with the leaf of $T_2$ with the same label. We let $T_n$ denote the obtained graph.
(iv) We let $T'_n$ denote the graph which is obtained by the same construction, in which we also add edges within $T_2$ in step (ii), using the same rule as for $T_1$ (see Figure 7).
To each vertex $v ∈ L_{2n}$, we glue a copy of $T_n$ (v is merged with the root of $T_n$) and we obtain $H^{3,1}_n$. If we glue a copy of $T'_n$ (to each $v ∈ L_{2n}$) instead of $T_n$, we obtain $H^{3,2}_n$. For both graphs we call $L_{3n}$ the set of vertices at distance (L + 5)n/2 (i.e. maximal distance) from a.
Figure 7. Representations of $T_n$ (on the left) and $T'_n$ (on the right) for n = 4. The red edges are those added in step (ii). On step (iv) leaves with the same label are merged.
Finally we want to link together all the vertices of L 3n in order to avoid dead-ends in the graph. We choose to link them together using an explicit expander (see e.g. [1,20] for examples of explicit construction of expanders) so that (total-variation) mixing occurs rapidly once L 3n is reached.
Step 4: We let $F_n = (V_n, E_n)$ be a family of explicit 3-regular c-lazy expanders with $V_n = [2^{3n−1}]$. We glue together $F_n$ and $H^{3,i}_n$ (i = 1, 2) without adding vertices by identifying $V_n$ with $L_{3n}$. More precisely, we start with a copy of $H^{3,i}_n$ with root a. We label the vertices of $L_{3n}$ by $z_1, \dots, z_{2^{3n−1}}$ (the labeling is arbitrary). We then connect $z_i$ with $z_j$ if and only if $\{i, j\} ∈ E_n$. We call the final result of our construction $H^{4,i}_n$ (i = 1, 2), and we call a the root of $H^{4,i}_n$. With some effort, and using the tools developed in the following sections, the reader can check that the lazy SRW on $H^{4,1}_n$ exhibits pre-cutoff but not cutoff in total-variation. This is a SRW version of Aldous' counter-example.
6.3. A sequence of Lazy SRW on bounded degree expanders with total-variation cutoff and no separation cutoff. The following is a modification of Example 2 into a sequence of lazy SRWs on a sequence of bounded degree graphs.
Example 4. Take a copy of H 3,1 n with root b and a copy of H 3,2 n with root a. We glue together the two by merging the vertices of L 3n (of both graphs): we give labels z 1 , . . . , z 2 3n−1 to the vertices lying in L 3n of each of the two graphs, and then merge each pair of vertices who share the same label. Finally, we build extra-connections between z 1 , . . . , z 2 3n−1 using an expander graph F n with 2 3n−1 vertices, like in Step 4. We let G 1 n := (V 1 n , E 1 n ) denote the obtained graph. In order to apply Propositions 6.4 and 6.5, we need to identify which vertices and sets will play which role.
• The center of mass Z n is given by the 2 3n−1 vertices which are linked by the expander. • a is the vertex which maximizes (stochastically) the hitting time of Z n .
• The pair of vertices (x, y) which (up to negligible terms) attains the minimum of $P^t_n(x, y)/π_n(y)$ (for all t ≥ 0) is given by (a, b).
• The sets $A_n$ and $B_n$ are chosen to be the largest sets of points around a and b (resp.) such that $Z_n$ is balanced seen from $I_n := A_n ∪ B_n$. Namely, these are the vertices within respective distance (L + 1)n/2 from a and b (the vertices of $H^1_n$ in both $H^{3,1}_n$ and $H^{3,2}_n$). Indeed, due to step 2 of the construction, the set $L_{2n}$ of $H^{3,1}_n$, respectively $H^{3,2}_n$ (i.e. the collection of vertices whose distance from a (resp. b) is (L + 3)n/2), is balanced seen from $A_n$, resp. $B_n$. This implies that the distribution of $X_{T_{Z_n}}$ is uniform on $Z_n$.
Step (iv) of the construction of $T_n$ is there to guarantee that $T_{Z_n}$ and $X_{T_{Z_n}}$ are independent (and hence that $Z_n$ is balanced seen from $A_n$ and $B_n$). It is then not difficult to check (cf. Figure 8) from the construction that assumptions (i)-(iii), resp. (i)-(v), of Propositions 6.4 and 6.5 are satisfied.
Moreover, the hitting time of Z n from a is concentrated around (17 + 3L 2 )n, while from b it satisfies that We want to prove that the system displays cutoff in total-variation around time (17+3L 2 )n, and that the asymptotic behavior for the separation distance is given by 12) The only thing we have to do to prove these statements is to verify condition (iv) in Proposition 6.4 and condition (vi) of Proposition 6.5 (resp.). The only delicate point is to show that for starting points outside of I n the walk mixes rapidly. I.e. that there exists an absolute constant C > 0, which does not depend on L, such that Before proving (6.13) let us explain how we use it to verify the remaining conditions. Note that if L is chosen to be sufficiently large (i.e. such that (17 + 3L 2 ) > C) then (6.13) implies condition (iv) of Proposition 6.4.
For condition (vi) of Proposition 6.5, for the case x ∈ Ω n y / ∈ I n , we use Lemma 6.8 and the total-variation cutoff result to show that for t ≥ (18 + 3L 2 + C) which is uniformly close to one. This yields the right condition provided 32 + 6L 2 > 18 + 3L 2 + C (which can obviously be fulfilled by picking L to be sufficiently large). We now treat the case where both x and y lie in A n (whose analysis does not rely on (6.13)). We use Lemma 6.3 with Z = Z ′ n chosen to be the set of vertices within distance (L + 3)/2n from a (corresponding to L 2n in the copy of H 3,2 n ). Recall that by construction this set is balanced seen from A n . By (6.3) we have that P t n (x, y)/π n (y) ≥ P T x,y Z ′ n ≤ t . P T x,y Z ′ n ≤ (6L 2 + 18 + ε)n = 1 (6.16) and this suffices to conclude that condition (vi) of Proposition 6.5 indeed holds. Now let us prove (6.13). We want to use a simple ℓ 2 bound using the Poincaré inequality (see Lemma A.1). The issue is that the spectral gap of our graph is rather small (of order L −2 ) due to the presence of stretched edges. However starting outside of I n the walk has a very small chance to visit the part of the graph where the edges are stretched, before the walk is already extremely mixed. Hence our idea is to apply the ℓ 2 bound for the walk on a smaller graph which corresponds to the vertices which are likely to be visited. This graph will have no stretched edges and a spectral gap which is bounded away from zero and does not depend on L.
We let $\widetilde G^1_n = (\widetilde V_n, \widetilde E_n)$ denote the graph which is obtained from $G^1_n$ when all the vertices within distance Ln/2 + 1 from a and b have been deleted, together with all edges connected to them. First we observe that the Cheeger constant associated to $\widetilde G^1_n$ is large (i.e. it is bounded from below by some positive absolute constant, which is also independent of L); see e.g. Lemma 2.1 in [18] for a proof. Proposition 6.9. Let $κ := (\min(c/3, 1/18))^2/2$. Then Consequently, the relaxation-time of the lazy SRW on $\widetilde G^1_n$, $t_{\mathrm{rel}}(n)$, satisfies If we let $\widetilde P^t_x$ and $\widetilde π_n$ refer to the distribution at time t and at equilibrium for the walk on $\widetilde G^1_n$, this implies (by Lemma A.1) that for $x ∈ \widetilde V_n$, for all $t ≥ nκ^{−1} \log 9$,
P t x − π n TV ≤ 1 min y π n (y) e −κt ≤ max v∈ Vn deg v | V n |9 −n ≤ 6(8/9) n . (6.19) What remains to be proven is that if one considers V 1 n as a subset of V 1 n , then for any x ∈ V 1 n \ I n , the distances P t x − π n TV and P t x − π n TV are very close. Note that P t x − π n TV ≤ P t x − P t x TV + P t x − π n TV + π n − π n TV , (6.20) The term π n − π n TV is exponentially small in n because only an exponentially small fraction of the vertices of G 1 n lie outside of G 1 n . Now if one lets T ∂ V 1 n denote the hitting time of n is the vertex set of G 1 n ) we have (by a standard coupling argument) that Now if x ∈ V 1 n \ I n , it lies at distance of at least n/2 from ∂ V 1 n and has to overcome a drift to reach it. For this reason it should take time which is exponentially large in n. More rigorously, we let Ω x be the set of vertices y ∈ V 1 n such that there exists a graph automorphism of G 1 n preserving a and b which maps x to y (in most cases it is just a pedantic manner to describe the set of points at a fixed distance from a, but we have to introduce this definition due to the lack of symmetry of the b-side). Note that |Ω x |/|∂ V 1 n | ≥ 2 n/2 if x / ∈ I n . Hence we have for all i > 0 and x / ∈ I n that where in the first inequality we have used the stationarity of π n , y∈V 1 n π n (y)P i y ∂ V 1 n = π n (∂ V 1 n ).
Plugging this in (6.21) we obtain (6.13) or more precisely: Example 5. Take a copy of H 4,1 n with root a and a copy of H 1 n with root b. We glue them together as follows: we give labels in [2 2n ] to the vertices in L 2n in the two graphs and merge the vertices which share the same labels. We denote the set of merged vertices by Z ′ n (this is the set of vertices of distance (L + 3)n/2 from a and b). Let G 2 n denote the obtained graph. However, as in Example 3, the separation mixing-time is determined by the behavior of T a,b Z ′ in the large deviation regime. Note that Z ′ is a set of small equilibrium measure (it has 4 n vertices whereas the full graph has order 8 n vertices).
The reader can easily check that here a and b play symmetric roles. We let A n and B n denote the vertices within distance (L + 1)n/2 from a and b, respectively. Moreover, • The center of mass Z n is given by the 2 3n−1 vertices which are linked by the expander (which are the vertices belonging to L 3n of H 4,1 n ). • Z n is balanced seen from A n ∪ B n .
• a and b maximize (stochastically) the hitting time of Z n .
It is then not difficult to check (see Figure 9) from the construction that assumptions (i)-(iii) of Proposition 6.4 are satisfied. Assumption (iv) can be shown to be satisfied as in the previous example, by using an ℓ 2 bound for the graph in which points within distance Ln/2 of a and b have been deleted.
The asymptotic behavior of the hitting time of Z n from a (or b) is once again given by (6.11) and hence the system does not display cutoff in total-variation.
For cutoff in separation, we cannot use Proposition 6.5. We use instead Lemma 6.3, and the relevant set to hit is $Z'_n$. This set is balanced seen from $I_n := A_n ∪ B_n$ and thus is the relevant one for the purpose of computing the separation mixing time. An analysis analogous to the one performed for Example 3 does the job. To control the quantity $P^t_n(x, y)/π_n(y)$ when one of x and y (or both) does not belong to $A_n ∪ B_n$ we use an ℓ 2 estimate (in conjunction with Lemma 6.8) for the subgraph $\widetilde G^2_n$ obtained by deleting the stretched edges in $G^2_n$, similarly to what we have done in the analysis of Example 4.
6.5. Proof of Remark 1.4. Part (i) follows from the analysis of Example 4. We shall prove now that part (ii) is satisfied by Example 5. We denote by π Z ′ the distribution of π n conditioned on Z ′ (suppressing the dependence on n). By (6.4) we have that for all t and every x ∈ A n and y ∈ B n P t n (x, y)/π n (y) = We know from the previous analysis of Example 5 that for the separation distance to equilibrium only (x, y) ∈ A n × B n matter, or more precisely lim n→∞ sup t≥0 |d (n) sep (t) − max(0, 1 − min (x,y)∈An×Bn P t n (x, y)/π n (y))| = 0. (6.25) Hence setting t n η (x, y) := min{t : P t n (x, y)/π n (y) ≥ 1 − η} we prove that cutoff window is constant by proving that, for all ε > 0, there exist some n ε ∈ N and some absolute constant C 2 such that for all n ≥ n ε and all (x, y) ∈ A n × B n t n ε (x, y) − t n 1−ε (x, y) ≤ C 2 | log ε|. (6.26) ∀t ≥ t ε (x, y), P t n (x, y)/π n (y) ≥ 1 − ε. (6.27) In what follows for simplicity we drop the dependence in n in the notation t η (x, y). Although this is not used in the analysis below (and hence not proven), we can identify t 1/4 (x, y) for all (x, y) ∈ A n × B n as follows: where t ′ (x, y) := inf{t : P[T x,y Z ′ ≤ t] ≥ π n (Z ′ )} andt(x, y) := inf{t : P[T x,y Z ′ = t] ≥ π n (Z ′ )}. This follows from the analysis below, together with (6.24) and the exponential decay of P t π Z ′ (Z ′ )) − π n (Z ′ ) as a function of t.
Fact 6.12. The family of Geometric distributions is log-concave.
Fact 6.13. The family of log-concave distributions over Z is closed under convolutions.
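Facts 6.12 and 6.13 can be illustrated on a small exact computation. The sketch below convolves two truncated geometric laws and checks the inequality $p_k^2 ≥ p_{k−1}p_{k+1}$. The truncation level, the parameters 1/2 and 1/4, and the helper names are illustrative assumptions. Truncating to an interval preserves log-concavity, so the check applies to the truncated sequences as well.

```python
from fractions import Fraction


def geometric_pmf(q, n_terms):
    """Truncated geometric law on {1, ..., n_terms}: p[k] = q (1-q)^(k-1)."""
    return [Fraction(0)] + [q * (1 - q) ** (k - 1)
                            for k in range(1, n_terms + 1)]


def convolve(p, r):
    """Sequence of the sum of two independent variables with weights p, r."""
    out = [Fraction(0)] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(r):
            out[i + j] += a * b
    return out


def is_log_concave(p):
    """True if the support is an interval and p[k]^2 >= p[k-1] * p[k+1]."""
    support = [k for k, v in enumerate(p) if v > 0]
    if support != list(range(support[0], support[-1] + 1)):
        return False
    return all(p[k] ** 2 >= p[k - 1] * p[k + 1]
               for k in range(1, len(p) - 1))


if __name__ == "__main__":
    g1 = geometric_pmf(Fraction(1, 2), 30)
    g2 = geometric_pmf(Fraction(1, 4), 30)
    print(is_log_concave(g1), is_log_concave(g2),
          is_log_concave(convolve(g1, g2)))
```

A sequence with a gap in its support, such as (1/2, 0, 1/2), fails the check, which is why the support condition is tested separately.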
The following representation of hitting times in birth and death chains is due to Karlin and McGregor [13,Equation (45)]. It was later rediscovered by Keilson [14]. The discrete time case of this result was given by Fill [11,Theorem 1.2].
We are now ready to prove (6.26) and (6.27). For clarity of exposition, we first present our analysis for the special case x = a, y = b. Consider the sequence of graphs $G^2_n$ from Example 5. Let $G^3_n$ be the subgraph of $G^2_n$ whose set of vertices is given by $V^3_n := \{v : \mathrm{dist}(v, \{a, b\}) ≤ (L + 3)n/2\}$, and whose edges are those of $E^2_n$ for which both ends are in $G^3_n$ (note that this graph is connected and includes Z′ but not any point further away from {a, b}). Let $(Y_t)_{t∈Z_+}$ be lazy SRW on $G^3_n$. Consider the projection $\bar Y_t := 1 + \mathrm{dist}(Y_t, \{a, b\})$. Our construction implies that the projection is Markovian and thus $(\bar Y_t)_{t∈Z_+}$ is a lazy birth and death chain on [1 + (L + 3)n/2]. For any $v ∈ V^3_n$ the distribution of $T_{Z'}$, given that $Y_0 = v$, is the same as that of $T_{1+(L+3)n/2}$ (for the chain $(\bar Y_t)$), given that $\bar Y_0 = 1 + \mathrm{dist}(v, \{a, b\})$. Consequently, by Theorem 6.14 and Facts 6.12-6.13, the law of $T^{a,b}_{Z'}$, which is a sum of independent hitting times and thus of independent geometric variables, is log-concave. Let $z_*$ be the mode of $T^{a,b}_{Z'}$. A standard computation is sufficient to show that (in fact, the first inequality follows from unimodality). Fix some δ > 0 sufficiently small such that $P[T^{a,b}_{Z'} ≤ z_* − δn] ≫ 2^{−n}$ ($2^{−n}$ is the order of magnitude of $π_n(Z')$). By a large-deviation estimate and log-concavity there is some α > 1 such that for all sufficiently large n we have that hence, again by log-concavity, Consequently, by (6.24) π n (Z ′ ) = α P t n (a, b) π n (b) . (6.30) As $T^{a,b}_{Z'}$ is log-concave and hence by Fact 6.11 also unimodal, (6.24) also yields that and that there exist some absolute constants $c, C_6 > 0$, β ∈ (1, 2) such that This concludes the proof of the case (x, y) = (a, b), as (6.30) implies (6.26) with $C_2 := (\log α)^{−1}$ and (6.27) can be deduced from the four other equations.
For general (x, y) ∈ A n × B n we decompose T x,y Z ′ into a convolution of a log-concave distribution and some other negligible term. Let (X x t ) t and (X y t ) t be independent realizations of the random walk, started from respective initial vertex x and y, defined on the same probability space. Let T x Z ′ := inf{t : X x t ∈ Z ′ } and T y Z ′ := inf{t : X y t ∈ Z ′ }. We define T ′ x (and T ′ y in an analogous manner, using (X y t ) and T y Z ′ ) as follows (with the convention sup ∅ = 0) By Theorem 6.14 and Facts 6.12-6.13 the laws of T x Z ′ − T ′ x and T y Z ′ − T ′ y are log-concave (by a similar argument to the one used before using a projection to a birth and death chain), and so T 1 is also log-concave (by Fact 6.13). Observe that T 1 + T 2 has the same law as T x,y Z ′ . Denote the mode of T 1 by z * = z * (x, y). Fix some δ > 0 sufficiently small such that min (x,y)∈An×Bn P[T 1 (x, y) ≤ z * (x, y) − δn] ≫ 2 −n . Imitating the proof of (6.30), using a large-deviation estimate on P[T 1 (x,y)=z * (x,y)] P[T 1 (x,y)=z * (x,y)−⌊δn⌋] which is uniform in (x, y) (the existence of such a uniform large-deviation estimate follows from the analysis of Example 3, or alternatively, by [5, Lemma 6.2]), together with log-concavity, we get that if α > 1 is chosen sufficiently small, then (6.29) remains valid simultaneously for all choices of x, y, if one replaces T a,b Z ′ by T 1 (x, y) (and z * with z * (x, y)). We argue that (6.28)-(6.33) can be extended (excluding the middle terms) to all (x, y) ∈ A n × B n (in the role of (a, b)), with the same choice of constants for all (x, y) ∈ A n × B n . To extend (6.30) and (6.31), note that after conditioning on T 2 we can imitate the above proofs and so the extensions are obtained by averaging over T 2 . For (6.32), note that by unimodality P[T x,y Z ′ = z * (x, y)+⌈n 2/3 ⌉]/π n (Z ′ ) ≥ c 1 2 n P[T 2 (x, y) ≤ ⌈n 2/3 ⌉]P[T 1 (x, y) = z * (x, y)+⌈n 2/3 ⌉]. 
It is not hard to show that there exist some $\gamma < 2$ and $c_2, C_6 > 0$ such that $P[T_1(x, y) = z^*(x, y) + \lceil n^{2/3} \rceil] \ge c_2 \gamma^{-n}$ and $P[T_2(x, y) \le \lceil n^{2/3} \rceil] \ge 1 - C_6 n^{-2/3}$ for all $(x, y) \in A_n \times B_n$ (by Markov's inequality and the fact that $\max_{(x,y) \in A_n \times B_n} E[T_2(x, y)] = O(1)$). For (6.28) use unimodality (first inequality) to show that for all $(x, y) \in A_n \times B_n$, $|z^*(x, y) - E[T_1(x, y)]| \le C_4 \sqrt{\mathrm{Var}(T_1(x, y))} \le C_4 \sqrt{\mathrm{Var}(T^{a,b}_{Z'})} \le C_5 \sqrt{n}$.
A.3. Proof of Lemma 6.3. By decomposing over the possible values of $T_Z$, using the assumption that $Z$ is balanced seen from $x$ and reversibility (which implies that $P^s_{\pi_Z}(y)/\pi(y) = P^s_y(Z)/\pi(Z)$ for all $s$), we get that $\frac{P^t(x, y)}{\pi(y)} = \sum_{k_1 \le t} P_x[T_Z = k_1] \, \frac{P^{t-k_1}_{\pi_Z}(y)}{\pi(y)} + \frac{P_x[X_t = y \text{ and } T_Z > t]}{\pi(y)}$ $\frac{P^{t-k}_{\pi_Z}(Z)}{\pi(Z)}$.
The first inequality in (6.4) is obtained by plugging the last estimate into the second term of (6.4). The second inequality in (6.4) follows from the estimate
A.4. Proof of Proposition 6.4. The result is mostly a consequence of the following result, which relates the mixing time starting from $x$ to the hitting time of a set $Z$ balanced seen from $x$.
Let $s' := \max(t_{x,Z}(p) - s_\varepsilon, 0)$. Then we have Moreover if $Z$ is balanced seen from $x$ then we also have that
Proof. The first result is proved by coupling the chain with initial distribution $P^{k-s_\varepsilon}_x$ with the stationary chain ($k \ge s_\varepsilon$ to be determined soon). We have $P_x[T_Z \ge k] \le \|P^{k-s_\varepsilon}_x - \pi\|_{\mathrm{TV}} + P_\pi[T_Z \ge s_\varepsilon] \le \|P^{k-s_\varepsilon}_x - \pi\|_{\mathrm{TV}} + \varepsilon$,
(A.16) where the last inequality is a consequence of (A.2) and the choice of $s_\varepsilon$. Setting $k = t_{x,Z}(p)$ we obtain the result (since if $s' = 0$ there is nothing to prove). We now prove (A.15). By the assumption that $Z$ is balanced seen from $x$, for all $\ell \le t$ (A.17) By the triangle inequality and the fact that the distance to $\pi$ decreases in time, we obtain $\|P^t_x - \pi\|_{\mathrm{TV}} \le P_x[T_Z > \ell] + \|P^{t-\ell}_{\pi_Z} - \pi\|_{\mathrm{TV}}$. Using this inequality for $\ell := t_{x,Z}(p)$ (and so $t - \ell = r_\varepsilon$) we only have to show that $\|P^{t-\ell}_{\pi_Z} - \pi\|_{\mathrm{TV}} = \|P^{r_\varepsilon}_{\pi_Z} - \pi\|_{\mathrm{TV}} \le \varepsilon$. Combining (A.1) with the definition of $r_\varepsilon$, we have that $\|P^{r_\varepsilon}_{\pi_Z} - \pi\|_{\mathrm{TV}} \le \lambda_2^{r_\varepsilon} \, \pi(Z^c)/\pi(Z) \le \varepsilon$.
(A.18) We can now proceed to the proof of Proposition 6.4. With our assumptions on $t_{\mathrm{rel}}$ and $Z_n$, Lemma A.3 allows us to show that the mixing time starting from $x$ and $t_{x,Z_n}(p)$ are equivalent when $Z_n$ is balanced seen from $x$ (i.e. for $x \in I_n$). Assumption (iv) ensures that what occurs for other initial conditions does not matter, and Assumption (iii) establishes that $a$ is the worst initial condition.
A.5. Proof of Proposition 6.5. From Lemma 6.3 and assumptions (i), (iii) and (v) we know that $P^t_n(x, y)/\pi_n(y)$ and $P[T^{x,y}_{Z_n} \le t]$ differ only by a negligible amount, provided that $x \in A_n$ and $y \in B_n$. Assumption (iv) then ensures that $\liminf_{n \to \infty} \inf_{t \ge 0} \min_{(x,y) \in A_n \times B_n} \left( \frac{P^t_n(x, y)}{\pi_n(y)} - \frac{P^t_n(a_n, b_n)}{\pi_n(b_n)} \right) = 0$. (A.19) We are left with checking the other cases. Assumption (vi) takes care of most of them, and leaves the case where $(x, y) \in B_n \times B_n$, for which Lemma 6.3 implies that $P[T^{x,y}_{Z_n} \le t]$ is a lower bound for $P^t_n(x, y)/\pi_n(y)$. Hence the conclusion follows by assumption (v) again.
A.6. A short alternative proof of Theorem C. We are going to show that there exists an absolute constant $c > 0$ such that for any lazy chain Indeed set $t := t_{\mathrm{mix}}(1/4)$ and $s := \lfloor c \sqrt{t} \rfloor$. A sample of the distribution of the lazy chain at time $t$ can be generated by running the non-lazy version of the chain for $\xi_t$ steps, where $\xi_t \sim \mathrm{Bin}(t, 1/2)$ is independent of the non-lazy version of the chain. By the triangle inequality (first inequality) and a standard coupling argument (second inequality) we have Moreover, if $c$ is chosen well, we have for every $t \ge 0$ that $\|\xi_t - \xi_{t+\lfloor c \sqrt{t} \rfloor}\|_{\mathrm{TV}} \le 1/2$.
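The two probabilistic facts used in this argument can be illustrated numerically. The Python sketch below checks on a toy chain that the $t$-step lazy kernel is the $\mathrm{Bin}(t, 1/2)$ mixture of powers of the non-lazy kernel, and evaluates the total-variation distance between $\mathrm{Bin}(t, 1/2)$ and $\mathrm{Bin}(t + \lfloor c\sqrt{t} \rfloor, 1/2)$. The value $c = 1/2$, the 5-cycle and all function names are assumptions chosen for illustration; the text above does not specify $c$.

```python
import numpy as np
from math import comb, floor, sqrt

def binom_half_pmf(t):
    """pmf of Bin(t, 1/2) on {0, ..., t}."""
    return np.array([comb(t, k) / 2 ** t for k in range(t + 1)])

def tv_distance(p, q):
    """Total-variation distance between two pmfs on {0, 1, ...} (zero-padded)."""
    n = max(len(p), len(q))
    p = np.pad(p, (0, n - len(p)))
    q = np.pad(q, (0, n - len(q)))
    return 0.5 * float(np.abs(p - q).sum())

def lazy_kernel_via_binomial(P, t):
    """sum_k P[Bin(t, 1/2) = k] * P^k, i.e. the non-lazy chain run for xi_t steps."""
    w = binom_half_pmf(t)
    out = np.zeros_like(P)
    Pk = np.eye(P.shape[0])
    for k in range(t + 1):
        out += w[k] * Pk
        Pk = Pk @ P
    return out

# Non-lazy simple random walk on a 5-cycle (toy example, assumed for illustration).
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, (i + 1) % n] = 0.5
    P[i, (i - 1) % n] = 0.5
P_lazy = 0.5 * (np.eye(n) + P)

t = 20
same = np.allclose(np.linalg.matrix_power(P_lazy, t), lazy_kernel_via_binomial(P, t))

c = 0.5  # assumed value of the absolute constant, not taken from the text
worst = max(
    tv_distance(binom_half_pmf(u), binom_half_pmf(u + floor(c * sqrt(u))))
    for u in range(1, 401)
)
print(same, worst <= 0.5)
```

The first check is an instance of the binomial theorem applied to $((I + P)/2)^t$; the second only covers $1 \le t \le 400$ and is a finite check, not a proof of the bound for every $t$.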