On sensitivity of mixing times and cutoff

A sequence of chains exhibits (total-variation) cutoff (resp., pre-cutoff) if for all $0<\epsilon<1/2$, the ratio $t_{\mathrm{mix}}^{(n)}(\epsilon)/t_{\mathrm{mix}}^{(n)}(1-\epsilon)$ tends to 1 as $n \to \infty$ (resp., the $\limsup$ of this ratio is bounded uniformly in $\epsilon$), where $t_{\mathrm{mix}}^{(n)}(\epsilon)$ is the $\epsilon$-total-variation mixing time of the $n$th chain in the sequence. We construct a sequence of bounded degree graphs $G_n$ such that the lazy simple random walks (LSRW) on $G_n$ satisfy the "product condition" $\mathrm{gap}(G_n)\, t_{\mathrm{mix}}^{(n)}(\epsilon) \to \infty$ as $n \to \infty$, where $\mathrm{gap}(G_n)$ is the spectral gap of the LSRW on $G_n$ (a known necessary condition for pre-cutoff that is often sufficient for cutoff), yet this sequence does not exhibit pre-cutoff. Recently, Chen and Saloff-Coste showed that total-variation cutoff is equivalent for the sequences of continuous-time and lazy versions of some given sequence of chains. Surprisingly, we show that this is false when considering separation cutoff. We also construct a sequence of bounded degree graphs $G_n=(V_{n},E_{n})$ that does not exhibit cutoff, for which a certain bounded perturbation of the edge weights leads to cutoff and increases the order of the mixing time by an optimal factor of $\Theta (\log |V_n|)$. Similarly, we also show that "lumping" states together may increase the order of the mixing time by an optimal factor of $\Theta (\log |V_n|)$. This gives a negative answer to a question asked by Aldous and Fill.


Introduction
Consider a reversible irreducible lazy discrete-time Markov chain $X = (X_t)_{t \ge 0}$, defined on a finite state space $\Omega$ (we call a chain finite if $\Omega$ is finite). Let $P$ and $\pi$ denote the transition matrix and the unique reversible probability measure associated with $X$, respectively (we denote such a chain by $(\Omega, P, \pi)$). In particular, the laziness and reversibility assumptions are (resp.) that $P(x,x) \ge 1/2$ and $\pi(x)P(x,y) = \pi(y)P(y,x)$ for all $x, y \in \Omega$. To avoid periodicity and near-periodicity issues, one often considers the lazy version of a discrete-time Markov chain, $(X_t^L)_{t=0}^{\infty}$, obtained by replacing $P$ with $P_L := \frac{1}{2}(I + P)$. Periodicity issues can also be avoided by considering the continuous-time version of the chain, $(X_t^c)_{t \ge 0}$. This is a continuous-time Markov chain whose heat kernel is defined by $H_t(x,y) := \sum_{k=0}^{\infty} e^{-t}\frac{t^k}{k!} P^k(x,y)$. It is a classic result of probability theory that for any initial condition the distributions of both $X_t^L$ and $X_t^c$ converge to $\pi$ as $t$ tends to infinity. The object of the theory of mixing times of Markov chains is to study the characteristics of this convergence (see [16] for a self-contained introduction to the subject).
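As a quick sanity check of these definitions, the lazy version $P_L$ and the heat kernel $H_t$ of a toy chain can be computed directly. The following minimal Python sketch uses the lazy SRW on a 3-vertex path; this tiny chain is purely illustrative and is not one of the constructions considered in this paper.

```python
import numpy as np

# Non-lazy SRW on the path 0-1-2 (a toy chain, not a construction from the paper).
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
PL = 0.5 * (np.eye(3) + P)            # lazy version P_L = (I + P)/2
pi = np.array([0.25, 0.5, 0.25])      # stationary measure, pi(x) proportional to deg(x)

def heat_kernel(P, t, terms=400):
    """H_t = sum_k e^{-t} (t^k / k!) P^k, with the Poisson weights
    computed iteratively to avoid overflow."""
    H = np.zeros_like(P)
    Pk = np.eye(len(P))
    w = np.exp(-t)                    # Poisson(t) weight at k = 0
    for k in range(terms):
        H += w * Pk
        Pk = Pk @ P
        w *= t / (k + 1)
    return H

# The non-lazy chain is periodic (the path is bipartite), but both the lazy
# and the continuous-time versions converge to pi from any starting state:
print(np.linalg.matrix_power(PL, 100)[0])   # close to pi
print(heat_kernel(P, 100.0)[0])             # close to pi
```

The truncated Poisson series agrees with the matrix exponential $e^{t(P-I)}$ once enough terms are included.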
For any two distributions $\mu, \nu$ on $\Omega$, their total variation distance is defined as $\|\mu - \nu\|_{\mathrm{TV}} := \frac{1}{2}\sum_{x \in \Omega} |\mu(x) - \nu(x)|$. The worst-case total variation distance at time $t$ is defined as $d(t) := \max_{x \in \Omega} \|P_x^t - \pi\|_{\mathrm{TV}}$, where we denote by $P_x^t$ (resp. $P_x$) the distribution of $X_t$ (resp. $(X_t)_{t \ge 0}$), given that $X_0 = x$. The total variation $\epsilon$-mixing time is defined as $t_{\mathrm{mix}}(\epsilon) := \min\{t : d(t) \le \epsilon\}$. When $\epsilon = 1/4$ we omit it from the above notation.
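For small chains, $d(t)$ and $t_{\mathrm{mix}}(\epsilon)$ can be computed by brute force from the transition matrix. A Python sketch (reusing a toy lazy SRW on a 3-vertex path, again only for illustration):

```python
import numpy as np

def tv_distance(mu, nu):
    """||mu - nu||_TV = (1/2) * sum_x |mu(x) - nu(x)|."""
    return 0.5 * np.abs(mu - nu).sum()

def t_mix(PL, pi, eps=0.25, t_max=10_000):
    """Smallest t such that d(t) = max_x ||P_x^t - pi||_TV <= eps."""
    Pt = np.eye(len(pi))
    for t in range(t_max + 1):
        if max(tv_distance(row, pi) for row in Pt) <= eps:
            return t
        Pt = Pt @ PL
    raise RuntimeError("chain did not mix within t_max steps")

# Lazy SRW on the path 0-1-2 (toy example):
PL = 0.5 * (np.eye(3) + np.array([[0.0, 1.0, 0.0],
                                  [0.5, 0.0, 0.5],
                                  [0.0, 1.0, 0.0]]))
pi = np.array([0.25, 0.5, 0.25])
print(t_mix(PL, pi))        # the (1/4)-mixing time of this tiny chain
```

Since $d(t)$ is non-increasing, the first $t$ with $d(t) \le \epsilon$ is indeed $t_{\mathrm{mix}}(\epsilon)$.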
In this work we consider lazy simple random walks (LSRW) on a sequence of finite (uniformly) bounded degree connected graphs, G n := (V n , E n ), whose sizes tend to infinity and also lazy random walks on a sequence of networks obtained from them via a bounded perturbation of their edge weights (we defer the formal definition to § 1.3).
Following [10], where Ding and the second author showed that a bounded perturbation can increase the order of the total variation mixing time (we state their result in more detail in § 2), we study (by constructing relevant examples) the possible effects of such bounded perturbations on the convergence to the stationary distribution (of the corresponding lazy random walks on the perturbed networks, compared to the original LSRWs). In particular, our Theorem 3 asserts that such bounded perturbations can increase the order of the total variation mixing-times by an optimal (as explained in (1.6) below) factor of $\Theta(\log |V_n|)$.
While the aforementioned result is merely an improvement and a simplification of [10, Theorem 1.1], various aspects related to sensitivity of mixing times and the cutoff phenomenon (of LSRW on a sequence of uniformly bounded degree graphs G n = (V n , E n )) are considered in this work for the first time: (1) We consider o(1)-perturbations (in which the weight of each edge may increase only by a 1 + o(1) multiplicative factor) and show that they may increase the order of the mixing time by a factor whose order is arbitrarily close to log |V n | (part (b) of Theorem 3).
(2) We consider sensitivity of mixing-times under lumping (Definition 1.5). We show that (even in the bounded degree unweighted setup) lumping together pairs of vertices may increase the order of the mixing time by an optimal factor of Θ(log |V n |) (part (c) of Theorem 3). This provides a negative answer to a question by Aldous and Fill [2,Problem 4.45] (Problem 1.6 here).
(3) We show that (in the above setup) the mixing time of the lazy non-backtracking random walk may be larger than that of the LSRW by a factor of Θ(log |V n |) (Remark 4.6). A similar example (in which the ratio of the mixing times is o(log |V n |)) was recently constructed by Hubert Lacoin et al. during an AIM workshop on mixing times of Markov chains.
(4) We show that the occurrence/non-occurrence of cutoff/pre-cutoff (see (1.1)-(1.2)) is sensitive under o(1) perturbations of the edge weights (Theorems 2 and 3). We also show that even in the above setup, the product condition (1.3) need not imply pre-cutoff (Theorem 2).
(5) Perhaps our most surprising result (Theorem 1) is that the occurrence of separation cutoff (1.1) for the sequence of discrete-time lazy chains does not imply the same for the associated sequence of continuous-time chains (this can be interpreted as a "sensitivity" result w.r.t. the choice of discrete/continuous time¹). This is in contrast with the case of total variation cutoff [7], due to Chen and Saloff-Coste.
In [13] the first author constructed a sequence of pairs of 2-roughly isometric graphs $G_n = (V_n, E_n)$, $G_n' = (V_n, E_n')$ of uniformly bounded degree with $|V_n| \to \infty$, whose $\ell^\infty$-mixing times differ by an optimal factor of $\Theta(\log \log |V_n|)$. In this paper we study the convergence to the stationary distribution only w.r.t. the total variation distance and the separation distance.
1.1. The general moral of our results. We now discuss the moral of our results. An important question is whether mixing times are robust. A related question is whether they can be characterized (perhaps only up to some universal constants and only under reversibility) using a geometric quantity which is robust. Different variants of this question were asked by various authors such as Pittet and Saloff-Coste [22], Kozma [15, p. 4], Diaconis and Saloff-Coste [8, p. 720] and Aldous and Fill [2, Open Problem 8.23] (Kozma conjectured that the $\ell^\infty$-mixing time is robust under rough-isometries for LSRWs on bounded degree graphs, and the last two references ask for an extremal characterization of the $\ell^\infty$-mixing time in terms of the Dirichlet form).
There are numerous works aiming at sharp geometric bounds on mixing-times, such as the Fountoulakis-Reed bound [11] (recently refined by Addario-Berry and Roberts [1]) on the total variation mixing time and Morris and Peres' evolving sets bound [19] on the $\ell^\infty$-mixing time, both expressed in terms of the expansion profile of the graph. The sharpest geometric bounds on the $\ell^\infty$-mixing time are given by the spectral profile bound, due to Goel et al. [12], and by the Log-Sobolev bound [8, Corollary 3.11], due to Diaconis and Saloff-Coste. Because these bounds involve geometric quantities, they are robust under small changes to the geometry, like bounded perturbations of the edge weights or (in the bounded degree unweighted setup) under rough-isometries.
Our results strengthen the cautionary note of Ding and Peres [10] (and also of [13]) on the possibility of developing a sharp geometric bound on mixing-times. Indeed, any sharp bound would have to distinguish in some cases between the LSRW on a graph and the walk obtained by some o(1)-perturbation.
Although receiving much attention, the investigation of the cutoff phenomenon has progressed mostly through the study of examples (or of certain classes of chains) rather than by developing general theory. Our results concerning sensitivity of the cutoff phenomenon (namely, Theorem 1 and parts of Theorems 2 and 3) demonstrate some difficulties in developing such general theory.
Theorem 1 demonstrates that despite the fact that the separation and the total variation distances are intimately related to one another (e.g. [14,Eq. (1.5), (1.7) and (1.8)]), the former may exhibit surprising behaviors which the latter cannot exhibit. For more on this point see [14,Remark 1.4 and § 2.4].

1.2. Cutoff and pre-cutoff. Before stating our results concerning the cutoff phenomenon we must first give a few definitions. Consider a sequence of chains, $((\Omega_n, P_n, \pi_n) : n \in \mathbb{N})$, each with its corresponding worst-case distances from stationarity $d^{(n)}(t)$, $d_{\mathrm{sep}}^{(n)}(t)$, mixing times $t_{\mathrm{mix}}^{(n)}(\epsilon)$, $t_{\mathrm{sep}}^{(n)}(\epsilon)$, etc. Loosely speaking, the total variation (resp. separation) cutoff phenomenon is said to occur when, over a negligible period of time known as the cutoff window, the worst-case total variation distance (resp. separation distance) drops abruptly from a value close to 1 to near 0. In other words, one should run the $n$-th chain until time $(1 - o(1)) t_{\mathrm{mix}}^{(n)}$ (resp. $(1 - o(1)) t_{\mathrm{sep}}^{(n)}$) for it to even slightly mix in total variation (resp. separation), whereas running it any further after time $(1 + o(1)) t_{\mathrm{mix}}^{(n)}$ (resp. $(1 + o(1)) t_{\mathrm{sep}}^{(n)}$) is essentially redundant. Formally, we say that the sequence exhibits a total variation cutoff (resp. separation cutoff) if the following sharp transition in its convergence to stationarity occurs: for all $\epsilon \in (0, 1/2)$,
$$\lim_{n \to \infty} t_{\mathrm{mix}}^{(n)}(\epsilon)/t_{\mathrm{mix}}^{(n)}(1-\epsilon) = 1 \quad \Big(\text{resp. } \lim_{n \to \infty} t_{\mathrm{sep}}^{(n)}(\epsilon)/t_{\mathrm{sep}}^{(n)}(1-\epsilon) = 1\Big). \qquad (1.1)$$
We say that the sequence exhibits a (total variation) pre-cutoff if
$$\sup_{0 < \epsilon < 1/2}\, \limsup_{n \to \infty} t_{\mathrm{mix}}^{(n)}(\epsilon)/t_{\mathrm{mix}}^{(n)}(1-\epsilon) < \infty. \qquad (1.2)$$
The notions of total variation and separation cutoff for the corresponding sequence of continuous-time chains are defined in an analogous manner (e.g., separation cutoff means $\lim_{n \to \infty} t_{\mathrm{sep},c}^{(n)}(\epsilon)/t_{\mathrm{sep},c}^{(n)}(1-\epsilon) = 1$ for all $\epsilon \in (0, 1/2)$). One can also consider the mixing time and separation time for the sequence of associated lazy chains and define the two notions of cutoff for it. Recently, Chen and Saloff-Coste [7] showed that if the continuous-time mixing times satisfy $t_{\mathrm{mix}}^{(n),c} \to \infty$, then the sequence of the associated continuous-time chains exhibits total variation cutoff iff the same holds for the sequence of the associated lazy chains. A natural question (L. Saloff-Coste, private communication) is whether the same is true for cutoff in separation. Surprisingly, this turns out to be false.

Theorem 1. There exists a sequence of reversible chains so that the lazy chains exhibit separation cutoff but the associated continuous-time chains do not exhibit cutoff.
Remark 1.1. The example we give for Theorem 1 is intimately related to Example 3 in [14] and could be transformed into an example involving simple random walks on a sequence of bounded degree graphs using a similar construction as Example 5 in [14].
Remark 1.2. The δ-lazy version of a chain with transition matrix $P$ is obtained by replacing $P$ with $\delta I + (1-\delta)P$ (where $I$ is the identity matrix). Chen and Saloff-Coste [7] showed that given a sequence of chains, the corresponding sequence of 1/2-lazy chains exhibits total variation cutoff iff the same holds for the corresponding sequence of $p$-lazy chains, for all $p \in (0,1)$. One can use the idea behind the construction from the proof of Theorem 1 in order to construct a family of reversible examples demonstrating that for all $p \ne q \in (0,1)$ it is possible that the sequence of $q$-lazy versions of a certain sequence of chains exhibits separation cutoff, while the sequence of $p$-lazy versions does not exhibit separation cutoff, or vice-versa². The necessary adaptations are described in § 3.3.

Question 1.3. Is it the case that separation cutoff for the sequence of continuous-time chains implies the same for the sequence of lazy chains?
In 2004 [20], during an AIM workshop on the cutoff phenomenon, the second author introduced the so-called product condition: $\mathrm{gap}^{(n)}\, t_{\mathrm{mix}}^{(n)} \to \infty$ as $n \to \infty$ (1.3). This condition is necessary for pre-cutoff, but examples of Aldous and of Pak (discussed below) show that in general it does not imply cutoff. This left open the problem of identifying general classes of chains for which the product condition is indeed sufficient for cutoff. This was verified e.g. for lazy birth and death chains [9] and recently for lazy weighted walks on trees [5].
In Aldous' example the graph supporting the transitions is of bounded degree and contains only a single cycle; however, the ratio of the maximal and minimal edge weights is exponentially large in the size of the state space. As noted in [17], Aldous' example, which exhibits pre-cutoff, can be transformed into a sequence of LSRWs on bounded degree graphs (with pre-cutoff). Explicit constructions of such graphs (which were constructed as examples demonstrating that, in general, neither total variation cutoff nor separation cutoff implies the other) can be found in [14].
Until now, as in Pak's example, every known example in which the product condition does not imply pre-cutoff had unbounded degrees. These examples all share the following behavior. The chain mixes (in some sense) "at once" due to the occurrence of a certain rare event, which occurs before the chain has enough time to get even slightly mixed otherwise. It is plausible that (some concrete formulation of) the aforementioned behavior is necessary in order for pre-cutoff to fail when the product condition holds. Moreover, a-priori, it is not clear whether the mechanism that allows such behavior can be produced in the bounded degree (unweighted) setup.
A question, presented to us by E. Lubetzky (private communication), which naturally arises in light of the above discussion, is whether the product condition is also a sufficient condition for pre-cutoff for a sequence of LSRWs on bounded degree graphs $\{G_n\}_{n \in \mathbb{N}}$. One case in which this holds (as a simple consequence of $\ell^2$-contraction; see Lemma 6.2) is when $\{G_n\}_{n \in \mathbb{N}}$ is a family of bounded degree expanders (that is, $t_{\mathrm{rel}}(G_n) = \Theta(1)$).

Problem 1.4 (E. Lubetzky (private communication)). Let $\{G_n\}_{n \in \mathbb{N}}$ be a sequence of finite (uniformly) bounded degree graphs satisfying the product condition. Is it always the case that the sequence of lazy simple random walks on $\{G_n\}_{n \in \mathbb{N}}$ exhibits a pre-cutoff?

² This requires a detailed comparison of certain two large deviation rate functions, where for our analysis it suffices to use the fact that the two rate functions are not identical.
Our Theorem 2 provides a negative answer to this question. Our construction may be viewed as a bounded degree (unweighted) version of the aforementioned Pak's example (see § 2 and Remark 5.1 for further details concerning this point).
1.3. Perturbations of edge weights and lumping. Let $G = (V, E)$ be a finite connected simple graph. Given a (weighted) network $(G, (c_e)_{e \in E})$, where $c_e > 0$ for all $e \in E$, a lazy random walk on $G = (V, E)$ repeatedly does the following: when the current state is $v \in V$, the random walk will stay at $v$ with probability 1/2 and move to vertex $u$ (such that $\{u, v\} \in E$) with probability $c_{u,v}/(2 c_v)$, where $c_v := \sum_{w : \{v,w\} \in E} c_{v,w}$. The default choice for $c_{u,v}$ is 1 (in which case we say that the random walk is unweighted), which corresponds to lazy simple random walk on $G$ (in which at each step the walk with equal probability either stays put or moves to a new vertex, chosen from the uniform distribution over the neighbors of the current position). Its stationary distribution is given by $\pi(x) := c_x/c_V$, where $c_V := \sum_{v \in V} c_v = 2\sum_{e \in E} c_e$. We denote by $t_{\mathrm{mix},G,(c_e)}(\epsilon)$ the total variation $\epsilon$-mixing time of the lazy random walk on the network induced on the graph $G$ by the edge weights $(c_e)_{e \in E}$. When the walk is unweighted we omit $(c_e)_{e \in E}$ from our notation. As always, when $\epsilon = 1/4$ we omit it from the notation. In the unweighted setup we write $t_{\mathrm{mix}}(G)$ for the (1/4)-mixing time.
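In this notation, a lazy weighted walk and its stationary measure can be written down directly. A Python sketch on a toy weighted triangle (the edge weights are arbitrary illustrative values, not from the paper's constructions):

```python
import numpy as np

# Toy weighted triangle (illustrative edge weights).
c = np.zeros((3, 3))
c[0, 1] = c[1, 0] = 1.0       # default (unweighted) edge, c_e = 1
c[1, 2] = c[2, 1] = 2.0       # perturbed edges
c[0, 2] = c[2, 0] = 1.5

c_v = c.sum(axis=1)                              # c_v = sum_w c_{v,w}
PL = 0.5 * np.eye(3) + 0.5 * c / c_v[:, None]    # stay w.p. 1/2, else move prop. to c_{u,v}
pi = c_v / c_v.sum()                             # pi(x) = c_x / c_V

# Detailed balance: pi(x) P_L(x,y) = pi(y) P_L(y,x), so the walk is reversible.
F = pi[:, None] * PL
assert np.allclose(F, F.T)
print(pi)                                        # [2.5, 3, 3.5] / 9
```

Here $\pi(x) \, P_L(x,y) = c_{x,y}/(2 c_V)$ is symmetric in $x, y$, which is exactly the reversibility claimed above.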
Definition 1.5 (Lumping). Given a partition $A_1, \ldots, A_k$ of the state space of a network, the induced network, obtained by lumping together the states belonging to $A_i$ for all $1 \le i \le k$, is a network on $[k] := \{1, \ldots, k\}$ with transition probabilities $p_{i,j} := \mathbb{P}_\pi[X_1 \in A_j \mid X_0 \in A_i]$. It can be obtained by collapsing each $A_i$ into a single state $i$ and setting the edge weight of the edge connecting $i$ and $j$ to be $c_{i,j} := \sum_{a_i \in A_i,\, a_j \in A_j} c_{a_i, a_j}$ for all $(i,j) \in [k] \times [k]$ (in particular, the lumped chain is reversible).
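For concreteness, here is the collapsing operation on a toy network (Python; the 4-cycle and the partition below are hypothetical choices for illustration only):

```python
import numpy as np

# Lumping: collapse a partition {A_i} of the vertices of a weighted graph
# into single states, summing edge weights (toy 4-cycle).
c = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    c[u, v] = c[v, u] = 1.0
pi = c.sum(axis=1) / c.sum()             # stationary measure of the walk on c

blocks = [[0, 1], [2], [3]]              # partition A_1 = {0,1}, A_2 = {2}, A_3 = {3}
k = len(blocks)
c_lump = np.zeros((k, k))
for i, Ai in enumerate(blocks):
    for j, Aj in enumerate(blocks):
        # c_{i,j} = sum over a in A_i, b in A_j of c_{a,b}; for i = j this
        # counts each internal edge twice, i.e. as a self-loop weight,
        # which keeps c_i equal to the sum of c_a over a in A_i.
        c_lump[i, j] = sum(c[a, b] for a in Ai for b in Aj)

c_i = c_lump.sum(axis=1)
pi_lump = c_i / c_i.sum()

# The lumped stationary measure is pi summed over the blocks:
assert np.allclose(pi_lump, [pi[A].sum() for A in blocks])
```

The final assertion checks the compatibility of the collapsed weights with the stationary measure, which is what makes the lumped chain reversible.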
Proposition 4.44 in [2] asserts that several natural parameters of a reversible Markov chain, like the inverses of its spectral gap and Cheeger constant, can only decrease as a result of lumping states together. This motivates the following question, asked by Aldous and Fill [2, Open Problem 4.45].

Problem 1.6. Is it the case that lumping states together can increase the total variation mixing time of a reversible Markov chain by at most some constant factor $K$? Can one take $K = 1$?
Our Theorem 3 (part (c)) gives a negative answer to Problem 1.6. We say that $(w_e^{(n)})_{e \in E_n}$ is a sequence of bounded perturbations of the edge weights of $\{G_n\}_{n \in \mathbb{N}}$ if there exists a constant $M = M((w_e^{(n)}))$ such that $1 \le w_e^{(n)} \le M$ for all $e \in E_n$ and all $n$.³ We say that $\{G_n\}_{n \in \mathbb{N}}$ is robust if performing any sequence of bounded perturbations $(w_e^{(n)})_{e \in E_n}$ preserves the mixing-times up to a constant factor $K$ (independent of $n$, but which may depend on $M((w_e^{(n)}))$). We say that $\{G_n\}_{n \in \mathbb{N}}$ is sensitive if it is not robust, and that it is $o(1)$-sensitive if there exists a sequence of $o(1)$-perturbations (in which $1 \le w_e^{(n)} \le 1 + o(1)$ uniformly in $e \in E_n$) which either increases or decreases the order of the mixing-times.

³ Note that there is no loss of generality in the requirement that $w_e^{(n)} \ge 1$ for all $e \in E_n$, since multiplying all of the edge weights by the same constant has no effect on the distribution of the walk.
The girth of a graph $G$ is defined as the minimal length of a cycle in $G$. In general (even in the weighted case) $\mathrm{girth}(G) \le 2\,\mathrm{diameter}(G) + 1 \le C\, t_{\mathrm{mix}}(G)$.
Theorem 2. For every $f(n) = o(\log n/\log\log n)$ such that $\lim_{n\to\infty} f(n) = \infty$, there exists a sequence of bounded degree graphs $G_n = (V_n, E_n)$ satisfying
(i) $\mathrm{girth}(G_n) = \Theta(t_{\mathrm{mix}}(G_n))$;
(ii) the corresponding sequence of LSRWs satisfies the product condition but does not exhibit pre-cutoff.
The following remark explains the significance of the large girth condition.
Remark 1.7. In [5] it was shown that a sequence of lazy random walks on weighted trees exhibits cutoff iff it satisfies the product condition. In [21] it was proved that the mixing time of (possibly weighted) nearest-neighbor lazy walks on trees is robust (see [1, Theorem 1] for a recent extension of this result). Combining the two results, it follows that for lazy weighted walks on trees the property of exhibiting cutoff is robust. Theorem 2 asserts that the tree assumption in these two results cannot be relaxed to the condition that $\mathrm{girth}(G_n) = \Theta(t_{\mathrm{mix}}(G_n))$ (even in the unweighted setup, and even when considering only pre-cutoff instead of cutoff).

Theorem 3.
(a) There exists a sequence of bounded degree graphs $G_n = (V_n, E_n)$ satisfying $t_{\mathrm{rel}}(G_n) = \Theta(t_{\mathrm{mix}}(G_n))$ (thus lacking pre-cutoff) such that for every $\epsilon > 0$, increasing the edge weight of some of the edges of $G_n$ to $1 + \epsilon$ increases the mixing time by a factor of $c \log |V_n|$, for some constant $c > 0$ depending only on $\epsilon$. Moreover, the sequence of walks on the perturbed networks exhibits a cutoff.
(b) For every $f(n) = o(\log n)$ such that $\lim_{n\to\infty} f(n) = \infty$ there exists a sequence of graphs $G_n = (V_n, E_n)$ satisfying $t_{\mathrm{rel}}(G_n) = \Theta(t_{\mathrm{mix}}(G_n))$ for which there exists a sequence of $o(1)$-perturbations which increases the mixing-times by a factor of $\Theta(f(|V_n|))$. Moreover, the sequence of lazy walks on the perturbed networks exhibits a cutoff.
(c) There exists a sequence of bounded degree graphs $G_n = (V_n, E_n)$ such that lumping together some pairs of vertices of $G_n$ increases the order of the total variation mixing time by a factor of $\Theta(\log |V_n|)$.
Remark 1.8. We note that the $\log |V_n|$ factor in part (a) of Theorem 3 is optimal. Indeed, the following general relation holds for lazy reversible chains (e.g. [16, Theorems 12.3-12.4]):
$$(t_{\mathrm{rel}} - 1)\log 2 \le t_{\mathrm{mix}} \le t_{\mathrm{rel}}\, \log\big(4/\min_x \pi(x)\big). \qquad (1.5)$$
Since a bounded perturbation of the edge weights of a bounded degree graph changes $t_{\mathrm{rel}}$ only by a constant factor, and since in this setup $\min_x \pi(x) \ge c_{D,K}/|V_n|$, the mixing time of the perturbed network is at most
$$C_{D,K}\, t_{\mathrm{mix}}(G_n) \log |V_n|, \qquad (1.6)$$
for some constant $C_{D,K}$ depending only on the maximal degree $D$ and the bound $K$ on the perturbed edge weights. Similarly, since as mentioned earlier lumping does not increase $t_{\mathrm{rel}}$, part (c) is also optimal, up to a constant factor.

Remark 1.9. We note that in part (a) of Theorem 3 we have that $t_{\mathrm{mix}}(G_n) = \Theta(\log |V_n|)$, and so the mixing time of the perturbed network is $\Theta([t_{\mathrm{mix}}(G_n)]^2)$. This is also optimal by (1.5) and the bound $\mathrm{diameter}(G_n) \ge c \log |V_n|$ (for some $c > 0$ depending only on the maximal degree).
If the function $f$ in part (b) above is taken to tend to infinity sufficiently slowly, we can arrange that the $o(1)$-perturbation from part (b) increases the edge weights by a factor of $1 + \delta_n$ for some $\delta_n = o(1)$ such that $\delta_n t_{\mathrm{mix}}(G_n)$ tends to infinity arbitrarily slowly. An interesting problem is to determine how small $\delta_n$ can be taken in terms of $t_{\mathrm{mix}}(G_n)$ (or to construct such an example with $\delta_n = o([t_{\mathrm{mix}}(G_n)]^{-\alpha})$ for some $\alpha > 1/2$). Similarly, one can consider robustness w.r.t. rough isometries.
1.5. Notation. For every $n \in \mathbb{N}$ we denote $[n] := \{1, 2, \ldots, n\}$. For any $a, b \in \mathbb{R}$ we write $a \vee b := \max(a, b)$ and $a \wedge b := \min(a, b)$. Throughout, we use $C, C', C_0, C_1, \ldots$ and $c, c', c_0, c_1, \ldots$ to denote positive absolute constants that may be different from place to place. Given some parameter, say $\epsilon$, we write $C_\epsilon$ and $c_\epsilon$ for positive constants which depend only on $\epsilon$. Upper (resp. lower) case letters will be used to denote sufficiently large (resp. small) constants.

Related Constructions
As mentioned earlier, Ding and Peres have already constructed a sequence of sensitive bounded degree graphs [10]. More precisely, for all $j \in \mathbb{N}$ they constructed a sequence of bounded degree graphs $G_n = (V_n, E_n)$ for which, if some of the edge weights are doubled, then the order of the mixing times increases by a multiplicative factor of order $\log |V_n| / \log^{(j)} |V_n|$, where $\log^{(j)}$ is the iterated logarithm of order $j$ (see [10, Remark 2]). Our constructions use a key observation from [10]. Namely, we use the fact that the harmonic measure is sensitive under perturbations and that (as explained below) this can lead to sensitivity of mixing-times. This idea was originally used by Benjamini [6] to study instability of the Liouville property. We note that our construction in the proof of Theorem 2 was greatly influenced by Ding and Peres' construction and is intimately related also to the construction from [13] of a sequence of graphs of uniformly bounded degree whose $\ell^\infty$-mixing time is sensitive under bounded perturbations.
The first construction of a sequence of finite irreducible lazy reversible chains satisfying the product condition which does not exhibit pre-cutoff is due to Pak (private communication through Persi Diaconis; see [16, Example 18.7]). Pak's construction gives a general scheme for constructing such sequences of Markov chains. Start with a sequence of lazy reversible chains $(\Omega_n, P_n, \pi_n)$ which exhibits cutoff ($\pi_n P_n = \pi_n$). Let $\Pi_n$ be the transition matrix whose rows all equal $\pi_n$. Denote $L_n := \sqrt{t_{\mathrm{rel}}^{(n)} t_{\mathrm{mix}}^{(n)}}$. Then the sequence $(\Omega_n, (1 - \frac{1}{L_n})P_n + \frac{1}{L_n}\Pi_n, \pi_n)$ satisfies the product condition but does not exhibit a pre-cutoff.
Loosely speaking, the chain mixes "at once", at a random time having a Geometric distribution with mean L n , due to the occurrence of one rare event (moving according to Π n ) which (with high probability) occurs before the chain has enough time to get even slightly mixed otherwise. At first sight, it is surprising that it is possible to construct an example of bounded degree graphs so that the corresponding sequence of LSRWs imitates this behavior. In our examples, mixing occurs quickly once the chain reaches its "center of mass", which is an expander. This allows us to reduce the analysis of the mixing time and the occurrence/non-occurrence of cutoff to the easier problem of analyzing the distribution of the hitting time of the center of mass (namely, the mixing time is roughly equal to its mean, and total variation cutoff is equivalent to it being concentrated around its mean).
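The "mixing at once" mechanism in Pak's scheme is transparent algebraically: since $\Pi_n P_n = P_n \Pi_n = \Pi_n$ and $\Pi_n^2 = \Pi_n$, one gets $\big((1 - \tfrac{1}{L})P + \tfrac{1}{L}\Pi\big)^t = (1 - \tfrac{1}{L})^t P^t + \big(1 - (1 - \tfrac{1}{L})^t\big)\Pi$, so the distance to stationarity of the modified chain is that of the original chain damped by the geometric factor $(1 - \tfrac{1}{L})^t$. A quick numerical check of this identity (Python, on a toy symmetric chain chosen purely for illustration):

```python
import numpy as np

# Toy reversible chain: symmetric P, so pi is uniform.
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.5, 0.2],
              [0.2, 0.2, 0.6]])
Pi = np.full((3, 3), 1/3)        # every row equals pi
L, t = 10.0, 7

P_mod = (1 - 1/L) * P + (1/L) * Pi
lhs = np.linalg.matrix_power(P_mod, t)
rhs = (1 - 1/L)**t * np.linalg.matrix_power(P, t) + (1 - (1 - 1/L)**t) * Pi
assert np.allclose(lhs, rhs)     # P_mod^t = (1-1/L)^t P^t + (1-(1-1/L)^t) Pi
```

Because the damping factor decays geometrically in $t$, the modified chain mixes at a roughly geometric random time, which is exactly the behavior that rules out pre-cutoff.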
In the construction from the proof of Theorem 2, the distribution of the hitting time of the "center of mass" (starting from the worst starting state) would be roughly a geometric distribution. Loosely speaking, starting from the worst starting state, until the time the center of mass is reached, the chain looks like a LSRW on a regular tree whose edges were stretched (i.e. replaced by a long path whose length tends to infinity), with some "shortcuts" to the center of mass. The amount and positions of these shortcuts can be chosen so that the center of mass is reached (with high probability) through one of these shortcuts at a random time having roughly a geometric distribution. Moreover, this can be done so that, under a certain perturbation of the edge weights, the harmonic measure is changed in a manner which makes these shortcuts "invisible" to the walk. Hence after the perturbation, the walk (starting from the root) is "trapped" in the tree of stretched edges for a much longer period of time, which results in an increased mixing time.
The idea behind the construction of the graphs $G_n$ from Theorem 2 is simple. Start with an arbitrary sequence of constant degree expanders $H_n = (V(H_n), E(H_n))$ (with $|V(H_n)| \to \infty$) whose girth is $\ell_n := \Theta(\log |V(H_n)|)$ (see e.g. [18] for the existence of such graphs). We pick some vertex $o \in V(H_n)$ and choose a certain collection of vertices $D$, all within distance $< \ell_n/2$ from $o$. Finally, for all $d \in D$ we replace each edge along the shortest path between $o$ and $d$ by a path of length $s_n$, where $s_n \to \infty$ and $s_n^2 = o(\ell_n/\log \ell_n)$. With some care, the set $D$ can be chosen (in a canonical manner) so that:
• The asymptotic profile of convergence in total variation of the walk can be understood in terms of the distribution (under $P_o$) of the escape time from the collection of vertices which are incident to the stretched edges.
• This escape time distribution is "close" to the Geometric distribution with mean $\Theta(\ell_n)$. However, under a certain (canonical) $o(1)$-perturbation, this escape time distribution becomes concentrated around some time $t_n = \Theta(s_n^2 \ell_n)$.
In [17] Lubetzky and Sly gave an explicit construction of 3-regular expanders, $G_n$, such that the sequence of lazy simple random walks on $G_n$ exhibits total variation cutoff. Our construction in the proof of Theorem 2 resembles their construction (namely, in [17] they also "stretch" some of the edges of an expander).

Proof of Theorem 1
3.1. Preliminaries. Throughout this section we denote the distribution of the associated lazy (resp. continuous-time) chain started from $x$ by $P_x$ (resp. $H_x$). We denote the transition matrix of the non-lazy (resp. lazy) version of the chain by $P$ (resp. $P_L$). We denote the separation distance at time $t$ of the continuous-time and lazy chains by $d_{\mathrm{sep},c}(t)$ and $d_{\mathrm{sep},L}(t)$, respectively.
Before presenting the construction for Theorem 1 we first provide some technical machinery, borrowed from [14], which shall assist us in analyzing the asymptotic profile of convergence in separation. To characterize the separation time, we introduce a notion of a "double hitting time".

Definition 3.1. Given $x$, $y$ and $z$ in $\Omega$, we let $T_z^{x,y}$ (resp. $\tau_z^{x,y}$) denote a random variable obtained by taking the sum of two independent realizations of $T_z := \inf\{t : X_t = z\}$, once under $P_x$ and once under $P_y$ (resp. $H_x$ and $H_y$). More explicitly, we consider realizations of the lazy and continuous-time chains started from $x$ and $y$, denoted resp. by $(X_t^x)$ and $(X_t^{x,c})$ (resp. $(X_t^y)$ and $(X_t^{y,c})$), defined on the same probability space, so that $(X_t^x)$ and $(X_t^y)$, and also $(X_t^{x,c})$ and $(X_t^{y,c})$, are independent. Denote $T_u^x := \inf\{t : X_t^x = u\}$ and $\tau_u^x := \inf\{t : X_t^{x,c} = u\}$. Define $T_u^y$ and $\tau_u^y$ in an analogous manner. We set $T_z^{x,y} := T_z^x + T_z^y$ and $\tau_z^{x,y} := \tau_z^x + \tau_z^y$. Finally, we denote the density function of $\tau_z^{x,y}$ by $f_z^{x,y}$, and similarly for the densities of the relevant sub-distributions.

The following lemma is a slight variation of Lemma 3.5 from [14]. We present its proof in § 6.4 for the sake of completeness.
Lemma 3.2. Let $(\Omega, P, \pi)$ be a finite reversible Markov chain and consider $x, y, z \in \Omega$. Then (3.1) holds. In particular, if every path from $x$ to $y$ goes through $z$, then (3.2) holds for all $t \ge 0$; and if $f_y^x$ is the density of the hitting time of $y$ from $x$, then (3.3) holds.

The following example, Example 3.3 (a birth and death chain of size $2n + 1$ with a fixed bias towards its middle point), will serve as a gadget in the construction for Theorem 1. In Example 3.3 the state $z$ serves as the center of mass and has a $\Theta(1)$ stationary probability. The chain from Example 3.3 exhibits cutoff in separation around the expectation of the "double hitting time" of $z$ from $a$ and $b$.
We now briefly explain how Example 3.3 will be used as a building block in the proof of Theorem 1. As noted in [14, Example 3], by attaching two birth and death chains ("branches") of length $\Theta(n)$ to $z$, both having the same end-points $z$, $z'$, with a bias towards $z'$, we can tune the stationary measure of $z$ to become exponentially small in $n$.⁶ We pick one of the branches to have a larger average speed than the other. In the example from the proof of Theorem 1 the slower branch is $C$ and the faster is $D$. For technical reasons, in our construction the end-points of the branches $C$ and $D$ are $z$ and $\bar z$, where $\bar z$ is connected to $z'$ through a biased birth and death chain ($E$) with a fixed bias towards $z'$ (see Figure 2). Using ideas from [14] we can tune simultaneously⁷ both the stationary measure of $z$ (we will take it to be $\Theta(2^{-\delta n})$, for some $\delta \in (0, 1/8)$) and the time it takes to reach $z'$ from $z$ along each branch. For each branch, if we condition on reaching $z'$ through that branch, the (conditional) distribution of the hitting time of $z'$ becomes concentrated.
If the time it takes the chain to reach $z'$ from $z$ along the slow branch is sufficiently small, then the lazy and continuous-time chains would exhibit separation cutoff around the time $t_n$ (resp. $\tau_n$) at which $\mathbb{P}[T_z^{a,b} \le t_n] = \Theta(\pi(z))$ (resp. $\mathbb{P}[\tau_z^{a,b} \le \tau_n] = \Theta(\pi(z))$). Indeed this is proven in [14, Example 3], by exploiting the equality in (3.3) above and analyzing the distribution of $T_z^{a,b}$ in the large deviation regime. Note that because we will have that $\pi(z) = \Theta(2^{-\delta n})$, the times $t_n$ and $\tau_n$ coincide (up to smaller order terms) with $t_\delta$ and $\tau_\delta$ (respectively) from (3.5).
The significance of (3.6) is that after modifying Example 3.3 as described above so that $\pi(z) = \Theta(2^{-\delta n})$, we get by (3.6) that $\liminf_{n \to \infty} t_n/\tau_n = \liminf_{n \to \infty} t_\delta/\tau_\delta > 2$ (where $t_n$ and $\tau_n$ are defined in the previous paragraph and $t_\delta$ and $\tau_\delta$ in (3.5)). This allows us to ruin separation cutoff for the continuous-time chain while maintaining it for the lazy chain, by picking the average speed along the two branches wisely, so that the hitting time of $z'$ (for the continuous-time chain), starting from $a$, is with a constant probability between $\tau_n + \epsilon n$ and $t_n/2$ (for some $\epsilon \in (0, \frac{1}{n}(\frac{t_n}{2} - \tau_n))$) and with the complementary probability (up to negligible terms) is smaller than $\tau_n$. In other words, we exploit (3.6) to construct a Markov chain such that the following hold:

⁵ The term $e^{-n}$ can be replaced by any other term which is $o(n)$. Effectively, it is as if the transition towards $z$ equals 1, but if we would have defined it to equal 1 then the chain would be reducible.
⁶ We note that in [14, Example 3] the roles of $z$ and $z'$ are interchanged compared to their role here.
⁷ The fact that we can tune them simultaneously is subtle and crucial.
(1) In terms of the separation distance it suffices to consider the case that the initial state is $a$. (2) The state $y$ which up to $o(1)$ terms minimizes $H_t(a,y)/\pi(y)$ and $P_L^{2t}(a,y)/\pi(y)$ is different for the two chains, for $t$ lying in some (sufficiently large) time interval.
(3) For the continuous-time chain the worst state (i.e. the minimizer from (2)) is $z$ for every $t$ in the aforementioned time interval. For the sake of simplicity let us assume that the aforementioned time interval is $(0, \infty)$. In this case, the existence of two parallel branches of different average speeds through which $z$ can be accessed prevents cutoff for the sequence of continuous-time chains. (4) For the discrete-time lazy chain we have that $\min_y P_L^t(a,y)/\pi(y) = P_L^t(a,b)/\pi(b) + o(1)$ for all $t$. Moreover, we will show that $P_L^t(a,b)/\pi(b)$ exhibits an abrupt transition from $o(1)$ to at least $1 - o(1)$ around time $t_\delta$, and hence the sequence of lazy chains exhibits cutoff around time $t_\delta$. (5) The mechanism which allows us to construct an example with such behavior is that, while the lazy chain run for $2t$ steps and the continuous-time chain run for time $t$ are typically comparable, in the large deviation regime the corresponding probabilities need not agree up to $1 + o(1)$ factors.

3.2.
Proof of Theorem 1. Below we intentionally omit all ceiling and floor signs and suppress the dependence on n of some quantities, for notational convenience. Fix some 0 < ε < δ < 1/8 and 2 ≤ s ∈ N so that (3.7) holds for every sufficiently large n, where t_δ and τ_δ are as in (3.5). Note that by (3.6) we can find such ε = ε(δ) and s = s(δ), provided that δ is sufficiently small. We also note that the leftmost inequality in (3.7) is not as important as the other two. 8 Take Ω := A ∪ B ∪ C ∪ D ∪ E ∪ {z, z̃, z′}, where A := {a_1, . . . , a_n = a}, B := {b_1, b_2, . . . , b_n = b}, D := {d_1, . . . , d_{δn/2}}, E := {e_1, . . . , e_{δn/2}} and C := {c_{i,j} : 1 ≤ i ≤ δn/2, 1 ≤ j ≤ s} (see Figure 2). Before specifying the transition probabilities we specify some properties that we want the construction to satisfy. The restriction of the chain to A ∪ B ∪ {z} is precisely the chain from Example 3.3, where also here the states a and b are the end-points. The state z′ serves as the center of mass of the chain (i.e. π(z′) = Θ(1)). Below we essentially show that for both the continuous-time and the discrete-time chains it suffices to consider the case that the initial state is a, and that started from a the only two other relevant states for mixing in separation are b and z. More precisely, below (combining (3.15)-(3.16) with the analysis of the four cases in the proof of (3.15)) we show that, uniformly in t, P_L^t(a,b)/π(b) (resp. H_t(a,b)/π(b)) exhibits an abrupt transition from o(1) to at least 1 − o(1) around time t_δ (resp. τ_δ). Conversely, due to the existence of the two parallel branches C and D through which z can be accessed, neither P_a(T_z > t) nor H_a(T_z > t) exhibits an abrupt transition as a function of t. Namely, they decrease from 1 − o(1) to o(1) via two drops, occurring for the continuous-time chain around times 3(1 + δ)n and 3(1 + (δ/2)(1 + s²))n (and for the lazy chain around times 6(1 + δ)n and 6(1 + (δ/2)(1 + s²))n).
The sets C and D serve as two parallel branches with different average speeds (as in the discussion following Remark 3.4) connecting z to z̃, which in turn is connected to the center of mass, z′, via the segment E (see Figure 2). In order to ensure that the average speed along C is slower than along D, we subdivide its edges into paths of length s (see Figure 2). 9 The term 3(1 + (δ/2)(1 + s²))n in (3.7) corresponds to the time around which the hitting time of z (for the continuous-time chain, started from either a or b) is concentrated, given that z is reached through the slow branch C (this is explained in more detail below). The aforementioned roles of the terms in (3.7) (described over the last two paragraphs, other than that of the term 3(1 + (δ/2)s²)n, whose significance can be seen from (3.10) below), together with (3.8)-(3.9), motivate (3.7).
Indeed, by (3.7) (and the aforementioned role of the term 3(1 + (δ/2)(1 + s²))n) we have P_a(T_z > t_δ − 2εn) = o(1), and so by (3.8) we have that d which (as will be made clear) by construction is bounded away from 0 and 1 (where we say that z is reached through C if the last state visited before T_z̃ is in C, see Figure 2). Hence by (3.9) and the comment following it, for t ∈ [τ_δ + εn, τ_δ + 2εn] we have that d The lengths of the branches C and D (and also of the segment E) are chosen so that π(z) = Θ(2^{−δn}). Note that this means that t_δ and τ_δ agree (up to negligible terms) with t_n and τ_n, respectively, from the discussion following Remark 3.4. In light of the discussion following Remark 3.4, this explains why (as mentioned above) P_L^t(a,b)/π(b) (resp. H_t(a,b)/π(b)) exhibits an abrupt transition around t_δ (resp. τ_δ). 10 We now specify the edge weights 11 and introduce some additional notation (for a schematic representation of the transition probabilities see Figure 2).
• a := a n , b := b n (the states a and b have symmetric roles in our construction).
• c_{1,0} := z̃ =: e_{1+δn/2} = d_0 and e_0 := z′. Consider the following (symmetric) edge weights: 9 We could have simply increased the holding probability along the vertices of the branch C. However, in order to make the details behind Remark 1.1 more transparent, we choose to subdivide its edges instead. 10 While the abrupt transition of P_L^t(a,b)/π(b) around time t_n essentially follows from the analysis in [14, Examples 3 and 5], we will prove it, along with the abrupt transition of H_t(a,b)/π(b) around time τ_n, for the sake of completeness. 11 We write the edge weights (instead of the transition probabilities, which are given in Figure 2) in order to demonstrate that the chain is indeed reversible and to facilitate the calculation of π(z). Note that the restriction of the chain to D is a birth and death chain with a bias towards z and an average speed of 1/3 (1/6 for the lazy chain), while its restriction to C can be described as follows: first take the same birth and death chain as on D, and then "stretch" each edge (c_{i+1}, c_i) by a factor s, replacing it by a path of s edges, (c_{i,s}, c_{i,s−1}), . . . , (c_{i+1,1}, c_{i,0}), all of the same weight as (c_{i+1}, c_i). It is not hard to see that this results in an average speed of 1/(3s²) along C towards z̃ (for the lazy chain the speed is 1/(6s²)).
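The speed claims above lend themselves to a quick numerical check. The following sketch (Python, standard library only; the sizes n = 30 and s = 4 are arbitrary choices, not from the paper) computes the expected hitting time of the far endpoint of a non-lazy birth and death chain whose edge weights double towards the target, and of its s-stretched version, via the standard formula E_j[T_{j+1}] = (1/c_j) Σ_{i≤j} (c_{i−1} + c_i) for a chain with edge conductances c_j. The traversal time grows by a factor of roughly s², matching the drop in average speed from 1/3 to 1/(3s²); laziness doubles both times and cancels in the ratio.

```python
from fractions import Fraction

def expected_hit_time(c):
    """Expected hitting time of state N from state 0 for the (non-lazy)
    birth and death chain with edge conductances c[0..N-1].
    Uses E_j[T_{j+1}] = (1/c_j) * sum_{i<=j} (c_{i-1} + c_i), with c_{-1} = 0."""
    total = Fraction(0)
    prefix = 0  # running sum of vertex weights m_i = c_{i-1} + c_i
    for j, cj in enumerate(c):
        prev = c[j - 1] if j > 0 else 0
        prefix += prev + cj
        total += Fraction(prefix, 1) / cj
    return total

n, s = 30, 4
plain = [2 ** j for j in range(n)]                        # weights double towards the target
stretched = [2 ** j for j in range(n) for _ in range(s)]  # each edge becomes a path of s equal-weight edges

t1 = expected_hit_time(plain)      # approximately 3n      (speed 1/3)
t2 = expected_hit_time(stretched)  # approximately 3*s^2*n (speed 1/(3 s^2))
print(float(t1), float(t2), float(t2 / t1))
```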
Note that started from a, the chain may reach z̃ (and thus also z′) either through the branch C or through the branch D (i.e. the last state visited prior to T_z̃ may be either in D or in C). Started from a, conditioned on taking the branch C (in the above sense), the hitting times of z̃ and z (for the continuous-time chain) are concentrated around 3(1 + s²δ/2)n and 3(1 + (δ/2)(1 + s²))n, resp., while conditioned on taking the branch D, they are concentrated around 3(1 + δ/2)n and 3(1 + δ)n, resp.. From this, we get that there exists some constant 12 c = c(s) = Θ(1/s) such that for all sufficiently large n Note that (since s is fixed) π(z) = Θ(2^{−δn}).
(3.14) We now argue that (3.15) and (3.16) hold. These equations are essentially borrowed from the analysis of Examples 3 and 5 in [14].
For the sake of completeness we present a sketch of their proofs (all of the missing steps can be found in [14]). We now prove (3.15). We only prove the first line of (3.15), as the second line is proved in a similar fashion. The proof of (3.16) is contained within the analysis of Case 1 below. By (3.4) with (x, y) = (a, z′) (and noting that t For Case 2, if (x, y) = (a_i, a_j) we may assume w.l.o.g. that i > j (since H_t(x, y)/π(y) = H_t(y, x)/π(x)). Using (3.2) we get that for all t ≥ 3n + n^{2/3} we have that 1 − H_t(x, y)/π(y) ≤ P_x[T_y > t] = o(1), while by (3.17) we have that H_{(3+δ/2)n}(a, z′)/π(z′) = o(1) (hence we may neglect Case 2). 13 For Case 4 it is not hard to see that for j(n) := (δ/2)(s² + 1)n + n^{2/3} we have We now consider Case 3. For all x ∈ A, if y ∈ C ∪ D then by (3.1) we have that the y minimizing min{H_t(x, y)/π(y), 1} (for all t, up to o(1) additive terms) is y = z, and that for each fixed t the worst x ∈ A w.r.t. y = z (at least up to o(1) additive terms) is x = a. Finally, for Case 1, note that by (3.1) we have that Using the fact that the law of τ_z^{x,y} is a convolution of Exponential distributions and hence log-concave (cf. the analysis of Example 5 in [14, Section 6.5]), it follows that for all (x, y) ∈ A × B, H_t(x, y)/π(y) exhibits the following behavior around t_{x,y} := inf{t : there exists some constant C_α > 0 such that (uniformly in all (x, y) as in Case 1) (this uses the aforementioned log-concavity of τ_z^{x,y} (cf.
[14, Section 6.5]) and so by (3.18) (and the fact that t_{x,y} < (1 − α)E[τ_z^{x,y}] for some α > 0 for all sufficiently large n) there exists a constant C > 0 such that Again, using log-concavity and the fact that t_{x,y} by definition is in the large deviation regime of τ_z^{x,y}, for some constants c, C > 0, for each fixed k ∈ Z we have that e^{ck} ≤ This in particular establishes the first line of (3.16) (the proof of the second line is analogous, with the Exponential distributions above replaced by Geometric distributions (see [14, Section 6.5])). Moreover, the same reasoning as in [14, Section 6.5] yields that H_t(x, y)/π(y) is increasing on [0, t*(x, y)], for some t*(x, y) satisfying |t*(x, y) − E[τ_z^{x,y}]| ≤ C_2√n, and that for all t ≥ t*(x, y) we have that H_t(x, y)/π(y) ≥ 1 − o(1) (uniformly in (x, y) ∈ A × B and t ≥ t*(x, y)). Finally, to conclude the analysis of Case 1, we note that max_{(x,y)∈A×B} t*(x, y) = t*(a, b).
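The log-concavity invoked above can be illustrated concretely. The sketch below (Python, standard library only; the rates λ = 1, μ = 2 are hypothetical, any distinct positive rates work) evaluates the density f(t) = (λμ/(μ − λ))(e^{−λt} − e^{−μt}) of a convolution of two Exponential distributions with distinct rates, and checks that log f has non-positive discrete second differences on a grid, i.e. that the law is log-concave.

```python
import math

lam, mu = 1.0, 2.0  # hypothetical rates; any distinct positive rates work

def density(t):
    # density of Exp(lam) + Exp(mu) with lam != mu (hypoexponential distribution)
    return (lam * mu / (mu - lam)) * (math.exp(-lam * t) - math.exp(-mu * t))

h = 0.01
ts = [0.05 + h * i for i in range(1000)]
logf = [math.log(density(t)) for t in ts]
# discrete second differences of log f; log-concavity means they are all <= 0
second_diffs = [logf[i - 1] - 2 * logf[i] + logf[i + 1] for i in range(1, len(logf) - 1)]
print(max(second_diffs))
```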

Proof of Remark 1.2.
We now explain the necessary adaptations for the proof of the assertion of Remark 1.2. The only adjustments are in the choices of δ and s (and possibly, one has to contract some of the s-paths along the slow branch C to a single edge in order to adjust the expected time it takes the walk to cross the slow branch). Fix α ∈ (0, 1). Consider the α-lazy version of the chain. Denote by P_x^{(α)} the law of the α-lazy version of the chain and by T_z^{x,y}(α) the version of T_z^{x,y} corresponding to holding probability α. Note that under the aforementioned modifications to the chain (i.e. adjusting δ and s) the transition probabilities along the A and B segments are unaffected, and so (for each fixed α) the law of T_z^{a,b}(α) is also unaffected. Let δ > 0 be determined later. The aforementioned modifications can be made so that we still have that π(z) = Θ(2^{−δn}). Denote the separation distance at time t of the nth α-lazy chain by d_{sep,α}^{(n)}(t). Let κn be the expected hitting time of z from z̃ for the non-lazy chain, conditioned on taking the slow branch C. Similarly to the analysis in the proof of Theorem 1, as long as δ is taken to be sufficiently small and κ is chosen in an appropriate manner, the quantity 1 − P^{(α)} exhibits a sharp transition around t_δ^{(α)}. Using the aforementioned modifications we can ensure that Ψ_α(r) := sup_λ [λr − log f_α(λ)], (3.19) where for Δ_α(λ) := (e^{−λ} − α)² − 4(1−α)/3 and λ_α, the smaller solution to Δ_α(λ) = 0, we have that Fix p ≠ q ∈ (0, 1). Using the fact that the Legendre transform of a strictly convex smooth function is itself smooth, and that the restriction of the Legendre transform to this class of functions is invertible, it is not hard to verify that for every ε ∈ (0, 1/100) there is some r ∈ (3 − ε, 3) such that Ψ_p(r/(1−p)) = Ψ_q(r/(1−q)).
15 This implies that for all p ≠ q ∈ (0, 1), we can find δ = δ_{p,q} ∈ (0, 1/10) such that for some δ′ = δ′_{p,q} > 0 either (Case 1: (1)) for all n, or (Case 2: (1)) for all n (namely, fixing such r we can pick δ to satisfy 2^{−nδ} = e^{−2nΨ_p(r/(1−p))}; Case 1 corresponds to Ψ_p(r/(1−p)) > Ψ_q(r/(1−q)) and Case 2 to Ψ_p(r/(1−p)) < Ψ_q(r/(1−q))). This implies 14 More precisely, this is the case whenever 3(1 + max(δ, κ))n is defined in the following paragraph. We leave this as an exercise. 15 The choice of the constant 1/100 is arbitrary and is made in order to ensure that 3 − r is small. that we can tune κ such that in Case 1 we have t_δ^{(p)} > s^{(p)} and lim sup_{n→∞} t_δ^{(q)}/s^{(q)} < 1, while in Case 2 we have t_δ^{(q)} > s^{(q)} and lim sup_{n→∞} t_δ^{(p)}/s^{(p)} < 1. Thus in the first case the p-lazy chains exhibit separation cutoff while the q-lazy chains do not, and in the second case the q-lazy chains exhibit separation cutoff while the p-lazy chains do not. Note that in the above argument the formulas for Ψ_p and Ψ_q played no role. However, they can be used to distinguish between Cases 1 and 2 for every fixed p ≠ q.

4. Proof of Theorem 3
4.1. Preliminaries. Before proving Theorem 3 we make several general comments regarding a principle which shall be utilized repeatedly below. We summarize a few different variations of this principle in Fact 4.1 (whose proof is deferred to the appendix, § 6.3).
Let T = (V, E) be an infinite binary tree rooted at o (in practice, we shall work with finite trees; however, it is not hard to show that the "boundary effect" coming from the finiteness of the trees is negligible for our purposes). For any vertex u we distinguish its two children as its left and right child. We denote the collection of all left (resp. right) children in T by L (resp. R). Denote by L_n the collection of vertices whose distance from o is n. For any vertex u let L(u) (resp. R(u)) be the number of left (resp. right) children along the path from u to the root. Denote g(u) = L(u) − R(u). Let τ_k := sup{t : X_t ∈ L_k}. (2a) Fix some ε > 0. Consider the network obtained by increasing the edge weight between every u and v such that v is a left child of u to 1 + ε. Then (g(X_{τ_k}))_{k≥0} is distributed like a biased nearest-neighbor random walk on Z, (S̃_k)_{k∈Z_+}, satisfying (2b) Let n ∈ N and D ⊂ L_n, and consider the event that the walk visits the set D at least once. There exists an absolute constant c (which is independent also of ε) such that also in the perturbed network we have that (4.1) holds. 4.2. Proof of part (a) of Theorem 3. Fix some (large) k ∈ N. We suppress the dependence on k from the notation (below, o(·), O(·), Θ(·) and Ω(·) are taken w.r.t. k).
Denote s := k³. We now construct a graph G = (V, E) in three steps (see Figure 3).
Step 1: Start with a binary tree T = (V (T ), E(T )) of depth s with root o.
Step 2: Connect the leaves of T using an expander. Figure 3. A schematic representation of the graph G from part (a) of Theorem 3. We start with a binary tree rooted at o of depth s := k³. We then connect its leaves using an expander. We define a set D, contained in the first s/2 levels of the tree, which is the collection of "unbalanced" vertices, in the sense that the number of left/right turns from them to the root violates the Law of the Iterated Logarithm in some strong sense. Finally, each d ∈ D is decorated by a k × k × k torus, represented by a square.
Step 3: We pick a set D ⊂ V strategically as follows, and decorate each of its vertices by a 3D torus of side length k (and so of size s): Denote the collection of vertices belonging to the i-th level of T by L_i(T). For any vertex u which is not a leaf of T, we distinguish its two children as its left and right child. We denote the collection of all left (resp. right) children in T by L (resp. R). Fix some large integer C to be determined later. We note that one can set C = 1, but the analysis is somewhat smoother when C is taken to be large. Including the constant C in the construction shall benefit us in the proof of part (b). We denote by D_i the collection of all vertices u belonging to L_{Ci}(T) such that if γ_u = (v_0 = u, v_1, . . . , v_{Ci} = o) is the path from u to o in T, then for all 1 ≤ j ≤ i, |{v_ℓ : 0 ≤ ℓ ≤ Cj} ∩ L| − |{v_ℓ : 0 ≤ ℓ ≤ Cj} ∩ R| ≥ 3√(Cj log log(Cj)).
Crucially, above the "base point" v 0 of γ u was taken to be u itself (rather than o).
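The sparsity of D can be computed exactly by dynamic programming over the prefix sums of ±1 turns, pruning at the checkpoints Cj. The following is an illustrative sketch (Python, standard library only; the value C = 16 is chosen here purely to make the computation non-degenerate, whereas the construction takes C to be a large constant):

```python
import math
from collections import defaultdict

C = 16  # hypothetical choice for illustration

def unbalanced_fraction(i):
    """Fraction of vertices u at level C*i whose path to the root satisfies
    sum of (+1 for left, -1 for right) over the first C*j steps
    >= 3*sqrt(C*j*log log(C*j)) for every 1 <= j <= i."""
    counts = defaultdict(int)
    counts[0] = 1  # prefix sum g = 0 at u itself
    for step in range(1, C * i + 1):
        nxt = defaultdict(int)
        for g, c in counts.items():
            nxt[g + 1] += c  # left turn
            nxt[g - 1] += c  # right turn
        counts = nxt
        if step % C == 0:  # checkpoint: prune paths violating the condition
            j = step // C
            thr = 3 * math.sqrt(C * j * math.log(math.log(C * j)))
            counts = defaultdict(int, {g: c for g, c in counts.items() if g >= thr})
    return sum(counts.values()) / 2 ** (C * i)

fracs = [unbalanced_fraction(i) for i in (1, 2, 3)]
print(fracs)  # rapidly decaying: D occupies a tiny fraction of each level
```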
That is, for each v ∈ D we attach to v a three-dimensional torus, W_v, of side length k, having v as one of its vertices, while the rest of its vertices are disjoint from T (where W_v and W_u are taken to be disjoint if v ≠ u). Call the resulting graph G = (V, E). We argue that t_mix(G) = Θ(s) and also t_rel(G) = Θ(s). By Lemma 6.7, t_rel(G) = Ω(s). We now explain why indeed t_mix(G) = Θ(s) (by (1.5) this implies that t_rel(G) = Θ(s)). Let A := ∪_{j≥3s/4} L_j(T). The set A is sufficiently far from the set D so that starting from any u ∈ A the walk mixes in Θ(s) steps (as if the vertices in D were not decorated by tori). That is, there exists an absolute constant C_1 ∈ N such that for every u ∈ A, This can be deduced formally using Proposition 6.6. Proposition 6.6 applies because for all a ∈ A, by the tree structure, P_a[T_D ≤ C_1 s] can be bounded from above by the probability that a LSRW on a binary tree of depth s/4, started from some leaf, reaches the root by time C_1 s (which occurs with probability at most Ce^{−cs}). Consequently, by Proposition 6.6, there exists some absolute constant C_2 such that for every t ≥ 0 Let T_1 (resp. T_2) be the total amount of time, prior to time T_A, that the walk spends in ∪_{v∈D} W_v (resp. V \ ∪_{v∈D} W_v). Let T_3 be the total number of times the set D was visited prior to time T_A by crossing some edge belonging to T. We argue that there exist absolute constants C_2, C_3, β > 0 such that (3) for every m, r ∈ N we have that Combining (1)-(3) with (4.5) concludes the proof of the fact that t_mix(G) = Θ(s). To see why (1) holds, use part (1a) of Fact 4.1, in conjunction with the law of the iterated logarithm and the fact that the distribution of the number of visits by time T_A to each L_i(T), denoted by N_i, has an exponential tail (along with the fact that Cov(N_i, N_{i+j}) decays exponentially in j for all i, j). We leave the details as an exercise.
For (2), note that the distance from the root of a LSRW on a binary tree behaves like a biased nearest-neighbor walk whose average speed is 1/6.
It is not hard to show that there exist some C_4, C_5, β > 0 such that for all v ∈ D, max_{u∈W_v} P_u[T_{V\W_v} > ms] ≤ C_4 e^{−βm} for every m ∈ N.
Thus (3) is obtained as a large deviation estimate.
The bounded perturbation from the assertion of part (a) of Theorem 3 is obtained by increasing the edge weight of the edge between any v ∈ ∪_{0≤i≤s/2} L_i(T) and its left child to 1 + ε, for some constant ε > 0.
It is easy to show that (4.5) remains valid also in the perturbed network, and that (by symmetry) the maximum (in the LHSs of (4.5)) can (still) be taken over the set W_o. To distinguish between the LSRW on G and the lazy random walk on the perturbed network, we adopt the convention that when referring to the perturbed network we write P̃_u and Ẽ_u instead of P_u and E_u.
Let T_1, T_2, T_3 be as above. Using part (2) of Fact 4.1 it is not hard to verify that for any u ∈ W_o we have that E_u[T_3] ≥ c_6(ε)s and Var_u[T_3] ≤ C_7(ε)s, for some constants c_6(ε), C_7(ε) > 0 depending on ε. The fact that E_u[T_3] ≥ c_6(ε)s (for some c_6(ε)) is clear from part (2a) of Fact 4.1. To see that Var_u[T_3] ≤ C_7(ε)s, use part (2) of Fact 4.1 to deduce that the correlation between the contributions to T_3 from vertices belonging to D_i and D_{i+j}, respectively, decays exponentially in j.
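The drift mechanism behind these estimates can be made concrete in a toy computation (a sketch under simplifying assumptions, not the actual proof of Fact 4.1): at a vertex deep in the tree whose child edges have weights 1 + ε (left) and 1 (right), a step that moves to a child goes left with probability (1 + ε)/(2 + ε) > 1/2, so g picks up a drift of ε/(2 + ε) per level. The hypothetical snippet below computes this drift and finds the first level k at which the accumulated drift exceeds the LIL-type threshold 3√(k log log k) used to define D:

```python
import math

def left_bias(eps):
    # conditioned on moving to a child, probability of taking the (1+eps)-weighted left edge
    return (1 + eps) / (2 + eps)

def first_level_beating_lil(eps):
    # smallest k >= 3 with drift*k >= 3*sqrt(k*log log k), i.e. the level at which
    # the walk's typical g-value leaves the window allowed by the LIL
    drift = 2 * left_bias(eps) - 1  # = eps/(2+eps)
    k = 3
    while drift * k < 3 * math.sqrt(k * math.log(math.log(k))):
        k += 1
    return k

for eps in (0.5, 0.1, 0.02):
    print(eps, first_level_beating_lil(eps))
```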
Using these estimates, we now show that the order of the mixing time of the walk on the perturbed network is Θ(s²) and that a sequence of walks on such perturbed networks with k → ∞ exhibits cutoff.
Using (4.6)-(4.7) it is not hard to show that for any u ∈ W_o we have that E_u[T_1] ≥ c_8 s² and that Var_u[T_1] ≤ C_9 s³ = o((E_u[T_1])²), for some constants c_8, C_9 > 0 depending on ε. Moreover, for every u ∈ W_o, E_u[T_o] ≤ C_5 s. Consequently, it follows from Chebyshev's inequality that starting from every u ∈ W_o, T_A is concentrated around its mean.
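The concentration step is just Chebyshev arithmetic: the mean of T_1 is of order s² while its standard deviation is only O(s^{3/2}), so the width of the non-concentration window is o(mean), which is exactly a cutoff-type statement. A numeric sketch (Python; the constants c_8 = C_9 = 1 are assumed purely for illustration):

```python
import math

def chebyshev_window_ratio(s, c8=1.0, C9=1.0):
    # ratio of the Chebyshev fluctuation scale to the mean of T_1
    mean = c8 * s ** 2
    sd = math.sqrt(C9 * s ** 3)
    return sd / mean  # = sqrt(C9)/(c8*sqrt(s)) -> 0, i.e. cutoff-type concentration

ratios = [chebyshev_window_ratio(s) for s in (10 ** 2, 10 ** 4, 10 ** 6)]
print(ratios)
```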

4.3. Proof of part (b) of Theorem 3. We present two different constructions.
First construction: The first construction is obtained from the construction in the proof of part (a) by replacing the constant C (from Step 3) by some C(k) tending to infinity as k → ∞ such that C(k) = o(s) (where, as above, s = k³). Call the obtained network G = (V, E). Clearly (4.3) remains valid. This follows from the fact that (4.5) remains valid, and that for all t > 0 it is still true that max_{u∈V} P_u[T_A > t] = max_{u∈W_o} P_u[T_A > t].
Consider a perturbation of the same collection of edges which were perturbed in the proof of part (a), only that now we increase their weights to 1 + K√(log log(C(k))/C(k)), where K > 4 is some sufficiently large absolute constant to be determined shortly. Similar reasoning as in the proof of part (a) (using part (2a) of Fact 4.1, together with the law of the iterated logarithm, here for a biased random walk on Z with a fixed bias) shows that if K is sufficiently large, this perturbation increases the order of the mixing time by a factor of order s/C(k) = Θ((log |V|)/C(k)), and that a sequence of random walks on the perturbed networks with k → ∞ exhibits cutoff. By taking C(k) to tend to infinity arbitrarily slowly we can increase the mixing time by any factor f_k → ∞ such that f_k = o(log |V|).
Second construction: We now present the second construction. Take sequences of integers k_n, r_n, ℓ_n, m_n tending to infinity as n → ∞ such that s_n := k_n³ = ℓ_n r_n, m_n ≤ ℓ_n^{1/4} and r_n = Θ(m_n e^{m_n²/2}). The first two steps of the construction are taken as in the proof of part (a), with k_n in the role of k. We modify Step 3 by changing the definition of the set D as follows.
Step 3': Set D_0 := {o}. For every 1 ≤ i ≤ s_n/(2ℓ_n) = r_n/2, we set D_i to be the collection of all vertices u ∈ L_{iℓ_n}(T) such that if (v_0 = u, v_1, . . . , v_{ℓ_n}) is the path in T between u and its ℓ_n-th ancestor v_{ℓ_n} ∈ L_{(i−1)ℓ_n}(T), then |{v_j : 0 ≤ j ≤ ℓ_n} ∩ L| − |{v_j : 0 ≤ j ≤ ℓ_n} ∩ R| ≥ m_n√ℓ_n.
By the local CLT (4.2) and our choice of m_n and r_n we have that |D_1| 2^{−ℓ_n} = Θ(r_n^{−1}). Consequently, for every i ≤ r_n/2 − 1, for all u ∈ L_{(i−1)ℓ_n}(T) we have that We define D := ∪_{0≤i≤r_n/2} D_i. As before, we decorate each v ∈ D by a 3D k_n × k_n × k_n torus, W_v. Call the obtained graph G_n = (V_n, E_n). As before, let A = A_n := ∪_{j≥3s_n/4} L_j(T) and define the T_i in a manner analogous to the way they were defined in the proof of part (a) (i = 1, 2, 3). By (4.8) we have that E[T_3] = Θ(1) (using similar reasoning as in part (1b) of Fact 4.1). Using this fact it is not hard to verify that (4.3) remains valid also here (with G_n and s_n in the roles of G and s). This follows from the fact that (4.5) remains valid, and that for all t > 0 it is still true that max_{u∈V_n} P_u[T_A > t] = max_{u∈W_o} P_u[T_A > t]. We now describe the perturbation of the edge weights described in part (b) of the theorem. Let δ_n := K m_n/√ℓ_n, for some sufficiently large absolute constant K to be determined later. Perturb the same edges as before by increasing their weights to 1 + δ_n.
Much as before, by symmetry, it suffices to consider the case that the initial state of the walk on the perturbed network belongs to W_o. Since δ_n ℓ_n = K m_n √ℓ_n, if K is taken to be sufficiently large, then by part (2) of Fact 4.1, for every i ≤ r_n/2 − 1 and all u ∈ L_{(i−1)ℓ_n}(T) we have that P_u[T_{D_i} = T_{L_{iℓ_n}(T)}] = 1 − o(1). This implies that if K is taken to be sufficiently large, then in the perturbed network on G_n, starting from any u ∈ W_o, T_3 is concentrated around some t_n = Θ(r_n). Consequently, T_A is concentrated around some t′_n = Θ(r_n k_n³), which, as before, implies that the walk on the perturbed network exhibits cutoff around time t′_n. In particular, the order of the mixing-times increased by a factor of Θ(r_n). Setting m_n = ℓ_n^{1/4}, we get that r_n = Θ(log |V_n|/ log log |V_n|). Note however that by taking m_n to tend to infinity arbitrarily slowly, we get that s_n/ℓ_n tends to infinity arbitrarily slowly, and thus so does δ_n t_mix(G_n).

4.4.
Proof of part (c) of Theorem 3. Consider the graph G from part (a). Now stretch all of its edges by a factor of 3. The tree T is replaced by a tree T′ with stretched edges. However, we think of T as being contained in T′, with each pair of neighbors in T being separated by a path of length 3 in T′. Denote the obtained graph by G′. It is not hard to see that the mixing time of G′ can differ from that of G only by a constant factor (the expectation of the hitting time of the leaf set of T is delayed by a factor of 3²). Now, for each path of length 3, (u, w, w′, v), connecting some vertex u of T and its left child v, lump together the pair of its internal vertices w, w′. This has the same effect as replacing the path (u, w, w′, v) by a path (u, z_{u,v}, v) with a self-loop of weight 2 at z_{u,v}. It is easy to see that this results in a bias towards the left children of T (the bias is the same as when we remove the self-loops at the z_{u,v}'s; after removing the self-loops, a standard network reduction, similar to the one in the proof of Fact 4.1, can be used to establish the existence of the aforementioned bias). The analysis can be concluded in a manner similar to the analysis of the perturbed network in part (a).
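The bias created by the lumping can be quantified in a toy computation. After removing the self-loop, from u the left child sits at the end of a path of length 2 and the right child at the end of a path of length 3, so by the series-resistance network reduction the walk, conditioned on next reaching one of the two children, reaches the left one first with probability (1/2)/(1/2 + 1/3) = 3/5. The sketch below (Python; a hypothetical four-vertex mini-network, not the full construction) verifies this by solving the harmonic equations on the two joined paths via exact Gauss-Seidel iteration:

```python
from fractions import Fraction

# h = P(hit left child vL before right child vR), on the network
# u - a - vL (left path, length 2) and u - b1 - b2 - vR (right path, length 3)
# unknowns: h(u), h(a), h(b1), h(b2); boundary values: h(vL) = 1, h(vR) = 0
hu = ha = hb1 = hb2 = Fraction(0)
for _ in range(200):  # Gauss-Seidel sweeps; the update is a contraction
    ha = (hu + 1) / 2
    hb2 = hb1 / 2
    hb1 = (hu + hb2) / 2
    hb2 = hb1 / 2
    hu = (ha + hb1) / 2
print(float(hu))  # converges to 3/5
```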
Remark 4.4. It is possible to modify the examples from Theorem 3 so that they also satisfy girth(G_n) = Θ(t_mix(G_n)). Namely, Steps 1-2 of the construction can be replaced by starting with some regular expander of logarithmic girth [18] as the base graph, as in the construction of Theorem 2. Then, instead of decorating the set D by tori of size Θ(log |V_n|), we could decorate its vertices by binary trees of that size. Remark 4.5. Both Ding and Peres' example [10] and the example from Theorem 2 satisfy the product condition. It is thus natural to ask whether any sequence of bounded degree graphs satisfying t_rel(G_n) = Θ(t_mix(G_n)) must be robust (equivalently, whether the condition t_rel(G_n) = Θ(t_mix(G_n)) is robust). Theorem 3 demonstrates that in fact the condition t_rel(G_n) = Θ(t_mix(G_n)) may be o(1)-sensitive.
Remark 4.6. A non-backtracking random walk (NBRW) on a simple graph G = (V, E) evolves as follows: when at vertex u at some time t, if at time t − 1 the walk was at vertex v, the next position of the NBRW is chosen uniformly from {x ∈ V \ {v} : {u, x} ∈ E}. We may also consider the lazy version of a NBRW.
A small variation of the example from part (a) of Theorem 3 shows that for a graph G of bounded degree (with no degree-1 vertices) the total-variation mixing time of the lazy NBRW may be larger than that of the LSRW by a factor of Θ(log |V|). Namely, one can stretch each edge between a vertex in T and its right child by a factor of 2. As in part (c) of Theorem 3, we think of T as being contained in the modified tree. One can then define D_i to be the collection of all vertices u belonging to L_{Ci}(T) such that if for some small ε > 0 and, as before, set D := ∪_{i=0}^{s/(2C)} D_i. Finally, as before, decorate each d ∈ D by a 3D torus of side length k and connect the leaves of T using an expander H. It is not hard to see that if ε is taken to be sufficiently small, then due to the bias towards the left children of T resulting from stretching the "right edges", w.h.p. the LSRW will visit only a constant number of tori before reaching the 3s/4 level of T. However, the harmonic measure of the lazy NBRW is unaffected by the stretched edges (meaning that if v and v′ are the children of u in T, then also in the modified graph, when the lazy NBRW is at u, it has the same probability of reaching either of v and v′ before the other), which means that w.h.p. it will visit Θ(k³) tori before reaching the 3s/4 level of T. Since the lazy NBRW also spends on average Θ(k³) steps at each torus, the hitting time of the 3s/4 level of T is Ω(k⁶), which, as before, implies that the mixing time of the lazy NBRW is Ω(k⁶).

5. Proof of Theorem 2
Our construction is obtained by stretching some of the edges of a Ramanujan Cayley graph. Let H be a group and S ⊂ H a finite symmetric (i.e. S = S^{−1} := {s^{−1} : s ∈ S}) set of generators of H (i.e. every h ∈ H can be written as a finite product of the form s_1 s_2 ··· s_k, where s_i ∈ S for all 1 ≤ i ≤ k). The Cayley graph of H w.r.t. S is defined to be the graph whose vertex set is H and whose edge set is {{h, hs} : h ∈ H, s ∈ S}.
Let G be a d-regular connected graph of size n. Denote the transition matrix of simple random walk on G by P, and denote the eigenvalues of P by λ_n ≤ ··· ≤ λ_2 < λ_1 = 1. We say that G is a Ramanujan graph if |λ_i| ∈ [0, 2√(d − 1)/d] ∪ {1} for all i ≤ n. Let p, q be two distinct primes congruent to 1 modulo 4 such that q > √p and q ≡ a² modulo p for some integer a. Then there exists a (p + 1)-regular Ramanujan Cayley graph G_{p,q} of size q(q² − 1)/2 whose girth is at least 2 log_p q [18]. We fix p = 5, take an increasing sequence (q_n)_{n∈N} of such primes, and consider H_n = G_{5,q_n} = (V_n, E_n). Fix some vertex o ∈ V_n. Note that up to distance log_5 q_n from o the graph H_n looks like a 6-regular tree.
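For concreteness, the definitions above can be exercised on a small abelian toy example (not an LPS Ramanujan graph): the Cayley graph of Z_n w.r.t. a symmetric set S is |S|-regular, and, being a circulant, its SRW eigenvalues are (1/|S|) Σ_{s∈S} cos(2πjs/n). The sketch below (Python; the choice n = 101, S = {±1, ±10} is arbitrary) verifies the symmetry of S and the trivial eigenvalue, and tests the Ramanujan bound 2√(d − 1)/d, which such a circulant fails:

```python
import math

n, S = 101, [1, -1, 10, -10]  # arbitrary symmetric generating set of Z_n
d = len(S)
assert sorted(s % n for s in S) == sorted((-s) % n for s in S)  # S = S^{-1}

# SRW eigenvalues of the circulant Cayley graph of Z_n w.r.t. S
eigs = [sum(math.cos(2 * math.pi * j * s / n) for s in S) / d for j in range(n)]
bound = 2 * math.sqrt(d - 1) / d
nontrivial = [abs(l) for j, l in enumerate(eigs) if j != 0]
is_ramanujan = max(nontrivial) <= bound + 1e-12
print(eigs[0], bound, max(nontrivial), is_ramanujan)
```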
We take sequences of integers s_n, m_n, b_n tending to infinity such that: • (log_5 q_n)/4 ≤ s_n² m_n b_n ≤ (log_5 q_n)/2. • e^{m_n/b_n} = Θ(s_n² b_n). We think of b_n as tending to infinity arbitrarily slowly (compared to s_n). As e^{m_n/b_n} = Θ(s_n² b_n), we think also of m_n/log s_n and (log q_n)/(s_n² log log q_n) as tending to infinity arbitrarily slowly. Denote the ball of radius s_n² m_n b_n around o by T_n = (V(T_n), E(T_n)). We think of T_n as a 6-regular tree rooted at o. We shall construct a sequence of graphs G_n by stretching some of the edges of H_n by a factor s_n as follows. We shall pick a certain subtree T_n′ ⊂ T_n (which is also rooted at o) and replace each edge {v, u} of T_n′ by a path γ_{u,v} of length s_n whose end-points are u and v. We call the resulting tree T̃_n = (V(T̃_n), E(T̃_n)). We identify each vertex of T̃_n with the corresponding vertex of T_n′.
The stretched edges have the effect of significantly "slowing down" the walk while it is confined to T̃_n. With some care, we shall choose T_n′ so that: • |T̃_n|/|V_n| = o(1/q_n²), and so the walk must escape T̃_n before mixing.
• The distribution of the escape time from T̃_n, starting from the root, is stochastically the largest (compared to all other starting points).
• The distribution of T_{V(T̃_n)^c}/(3 s_n² m_n b_n), starting from o, is "close" to the Exponential distribution with some constant mean (see (5.2) for a precise statement).
• Once the walk escapes T̃_n, it has a negligible chance of crossing any stretched edge by the time it is already extremely mixed. Consequently, for every ε ∈ (0, 1/2], the additional amount of time required for the walk to become ε + o(1) mixed, beyond the time required for it to escape T̃_n with probability at least 1 − ε, can be bounded above by t_{mix,H_n}(o(1)) ≤ C log q_n = Θ(s_n² m_n b_n) (more precisely, we derive such a bound using Proposition 6.6). Putting all this together, it follows that t_{mix,G_n}(ε)/|log ε| = Θ(s_n² m_n b_n) = Θ(log q_n), uniformly for every 0 < ε ≤ 1/2.
Hence there is no pre-cutoff, although the product condition holds (by Lemma 6.7, t_rel(G_n) = O(s_n²) = o(log q_n)). • Under a certain o(1)-perturbation of some of the edges of T̃_n, w.h.p. the walk remains "trapped" in T̃_n until escaping it through the collection of its leaves which have maximal distance from the root. Consequently, as opposed to the situation in the original graph, the escape time from T̃_n (starting from the root) is concentrated, but around a time of strictly larger order than log q_n, namely 3 s_n⁴ m_n b_n = Θ(s_n² log q_n). Thus the walk on the perturbed network exhibits cutoff around time 3 s_n⁴ m_n b_n.
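The failure of pre-cutoff is elementary arithmetic on the last estimate: if t_mix(ε) is proportional to |log ε|, then t_mix(ε)/t_mix(1 − ε) = |log ε|/|log(1 − ε)| is unbounded as ε ↓ 0, so no bound uniform in ε, as required by pre-cutoff, can hold (while for a sequence with cutoff this ratio tends to 1). A small numeric check (Python; the Θ(log q_n) factor cancels in the ratio):

```python
import math

def mix_ratio(eps):
    # t_mix(eps)/t_mix(1-eps) when t_mix(x) is proportional to |log x|
    return abs(math.log(eps)) / abs(math.log(1 - eps))

ratios = [mix_ratio(10.0 ** -k) for k in range(1, 6)]
print(ratios)  # unbounded as eps -> 0: pre-cutoff fails
```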
We denote the internal vertex boundary of T̃_n w.r.t. G_n and that of T_n′ w.r.t. T_n by ∂T̃_n and ∂T_n′, respectively. By construction ∂T̃_n = ∂T_n′. As T̃_n is rooted at o and T_n′ ⊂ T_n, in order to define T_n′ it suffices to specify its collection of leaves, which is precisely the set ∂T̃_n. Namely, V(T̃_n) is determined by ∂T̃_n as the union of the vertices along all paths in T̃_n from the root to ∂T̃_n. We now describe our procedure for choosing the set ∂T̃_n (this concludes the construction).
(1) Denote the k-th level of T_n by L_k(T_n) (this level contains the ks_n-th level of T̃_n). We shall construct ∂T̃_n so that ∂T̃_n ⊂ ∪_{k=2}^{s_n²b_n} L_{km_n}(T_n). We shall define D_k := ∂T̃_n ∩ L_{km_n}(T_n) recursively, starting from k = 2 (i.e. we set ∂T̃_n := ∪_{k=2}^{s_n²b_n} D_k). Having defined D_2 = ∂T̃_n ∩ L_{2m_n}(T_n), . . . , D_k = ∂T̃_n ∩ L_{km_n}(T_n), we define A_{k+1} to be the set of vertices in L_{(k+1)m_n}(T_n) such that the path from them to o in T_n does not go through any vertex in ∪_{i=2}^{k} D_i. The set D_{k+1} shall be defined as a certain subset of the vertices in L_{(k+1)m_n}(T_n) which have a vertex in A_k as an ancestor (this is described in (2)-(3) below). To start the construction we set A_1 := L_{m_n}(T_n), and to conclude it we define D_{s_n²b_n} = ∂T̃_n ∩ L_{s_n²b_n m_n}(T_n) to equal A_{s_n²b_n} (making the last level of T̃_n different). We now specify how D_{k+1} is defined in terms of A_k (first qualitatively in (2), and then more concretely in (3)).
Figure 6. A 5-ary tree T_T of depth m_n. The leaves of T_T are partitioned into two parts, A and D, where the set D consists of leaves which are in some sense "unbalanced". The set D belongs to ∂T̃_n, while for every a ∈ A another copy of T_T is contained in T̃_n. We repeat this procedure for s_n − 1 iterations (see Figure 7), where in every iteration the partition of the leaves into A and D is identical.
Figure 7. A schematic representation of the recursive construction of $\widehat{T}_n$: the root $o$ sits atop $m_n$ levels, every triangle represents a copy of $\widetilde{T}$, and the construction consists of $s_n - 1$ blocks, each of $m_n - 1$ levels, with a copy of $\widetilde{T}$ attached to each $v$ in the $m_n$-th level. Every $d \in D$ belongs to $\partial \widehat{T}_n$, while for every $a \in A$ another copy of $\widetilde{T}$ is contained in $\widehat{T}_n$. We repeat this procedure for $s_n - 1$ iterations, where in every iteration the partition of the leaves into $A$ and $D$ is identical. The leaves of the copies of $\widetilde{T}$ from the last iteration all belong to $\partial \widehat{T}_n$. Note that the first iteration is different, in that all of the leaves of the first copy of $\widetilde{T}$ (the one rooted at $o$) are in $A$.
(2) For every $k \le s_n^2 b_n - 1$ and every $v \in L_{km_n}(T_n)$ we denote $T_v := \{u \in \bigcup_{i=0}^{m_n} L_{km_n+i}(T_n) : \text{the path in } T_n \text{ between } u \text{ and } o \text{ goes through } v\}$.
Let $B_v$ be the set of leaves of $T_v$. For all $2 \le k \le s_n^2 b_n - 1$ we will define $D_k$ to be a subset of $\bigcup_{v \in A_{k-1}} B_v$. We shall define $\widehat{T}_n$ so that for every $1 \le k_1, k_2 \le s_n^2 b_n - 2$ and every $v_i \in A_{k_i}$ ($i = 1, 2$), the trivial isomorphism of $T_{v_1}$ and $T_{v_2}$ is a bijection from $B_{v_1} \cap \partial \widehat{T}_n$ onto $B_{v_2} \cap \partial \widehat{T}_n$. In particular, $|B_v \cap \partial \widehat{T}_n| / |B_v|$ is some fixed number, which we shall pick to be between $b_n^{-1}$ and $2b_n^{-1}$. (3) We now define the sets $D_2, \ldots, D_{s_n^2 b_n}$. For every vertex $v \in A_k$ for some $k < s_n^2 b_n - 1$ and every $u \in T_v \setminus B_v$, we distinguish one of the children of $u$ (w.r.t. the tree $T_v$, viewed as a rooted tree with root $v$) as a left child. For every $u \in B_v$ let $f(u)$ be the number of left children along the path from $u$ to $v$ in $T_v$. We define $F_v = B_v \cap \partial \widehat{T}_n$ to be the collection of all $u \in B_v$ such that $f(u) \le m_n/5 - g_n(m_n)$, where $g_n$ (which may depend on $n$ and $(b_n, m_n)$) is chosen so that $1/b_n \le |B_v \cap \partial \widehat{T}_n| / |B_v| \le 2/b_n$ (for all sufficiently large $n$). Finally, we set $D_{k+1} := \bigcup_{v \in A_k} F_v$.
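The role of $g_n$ can be sketched with a standard binomial-tail computation (the asymptotics below are our illustration, not taken from the source). Since every internal vertex of $T_v$ has exactly five children, exactly one of which is marked left, for a uniformly chosen leaf $u \in B_v$ the statistic $f(u)$ has the Binomial$(m_n, 1/5)$ distribution, so

```latex
\frac{|F_v|}{|B_v|} \;=\; \Pr\Big[\operatorname{Bin}\big(m_n, \tfrac{1}{5}\big) \le \tfrac{m_n}{5} - g_n(m_n)\Big],
```

and since the variance is $\tfrac{4}{25}m_n$, a choice of $g_n(m_n) = \Theta\big(\sqrt{m_n \log b_n}\big)$ makes this tail probability of order $b_n^{-1}$, so a suitable such choice places it in $[b_n^{-1}, 2b_n^{-1}]$.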
We start by describing four properties of $G_n$ which do not depend on the particular choice of $\widehat{T}_n$.
(d) There exist absolute constants $C_1, C_2 > 0$ such that for every vertex $v \in V_n$ whose distance from $T_n$ is at least $C_1 \log\log q_n$ we have that $\|P_v^{C_2 \log q_n} - \pi_n\|_{\mathrm{TV}} = o(1)$.
(e) Let $C_1$ be as in (d). Denote by $J_n$ the collection of vertices whose distance from $T_n$ is at least $C_1 \log\log q_n$. Then for every $u \in \partial T_n$ we have that $P_u[T_{J_n} > 5C_1 \log\log q_n] = o(1)$.
(a) and (b) are trivial. For (d), note that for any $u$ in the exterior vertex boundary of $T_n$, the intersection of the ball of radius $C_1 \log\log q_n$ centered at $u$ with $V_n \setminus V(T_n)$ is a 5-ary tree. Hence (d) follows from Proposition 6.6.
For (e), observe that for any $v \in \partial T_n$ the probability that a lazy random walk started from $v$ reaches its parent $u$ w.r.t. $T_n$ (by crossing the path $\gamma_{u,v}$) before reaching $J_n$ is $o(1)$. Moreover, the probability that the walk spends at least $C_1 \log\log q_n$ steps in $\gamma_{u,v}$ before reaching $J_n$ is $o(1)$. Finally, conditioned on hitting $J_n$ before returning to $v$, the conditional distribution of $T_{J_n}$ is concentrated around $3C_1 \log\log q_n$.
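The first assertion is a gambler's-ruin estimate; as a sketch (assuming, consistently with the construction, that the stretched path $\gamma_{u,v}$ has length $s_n$):

```latex
\Pr_{1}\big[\text{SRW on } \{0, 1, \ldots, s_n\} \text{ hits } s_n \text{ before } 0\big] \;=\; \frac{1}{s_n},
```

so each excursion of the walk into $\gamma_{u,v}$ from the $v$-end reaches $u$ before returning to $v$ with probability $1/s_n = o(1)$.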
(g) There exists an absolute constant $c_1 > 0$ such that for any $2 \le k \le s_n^2 b_n$.
It is easy to see how (f) follows from the symmetry of the construction, together with the fact that the construction of the sets $D_i$ starts only from $k = 2$.
We now explain (g). The function $f$ (from (3) above) can be extended to $V(T_n)$ by contracting the stretched edges into single edges (where internal vertices of a stretched edge are assigned the same value as one of the end-points of that stretched edge). Fix some $1 \le k \le s_n^2 b_n - 2$ and some $v \in A_k$. Let $Y$ be the last vertex in $L_{(k+1)m_n}(\widehat{T}_n)$ visited by the walk prior to $T_{\partial T_n}$. Then starting from $v$, conditioned on $T_{\partial T_n} < T_{L_{km_n-1}(\widehat{T}_n)}$, the (conditional) law of $f(Y)$ w.r.t. the walk on $\widehat{T}_n$ is the same as its (conditional) law w.r.t. the walk on $T_n$. This law can be approximated well by that of a sum of $m_n$ i.i.d. Bernoulli$(1/5)$ random variables. Similarly to (4.1), the conditional probability that $X_{T_{\partial T_n}} \in L_{(k+1)m_n}(\widehat{T}_n)$ is at most $c^{-1} P_v[Y \in \partial T_n \mid T_{\partial T_n} < T_{L_{km_n-1}(\widehat{T}_n)}]$. Using this observation, it is easy to see that the probability that the walk reached $L_{km_n}(\widehat{T}_n) = L_{ks_n^2 m_n}(T_n)$ without first hitting $\partial T_n$ is bounded from above and below accordingly. Whence (g) follows from the fact that, starting from $o$, the hitting time of $L_{km_n}(T_n)$ (conditioned on it being hit before $\partial T_n$) is concentrated around $3s_n^2 m_n k$. To see this, first consider the hitting time of $L_{km_n}(T_n)$ with respect to the non-lazy version of the induced chain on $\widehat{T}_n$ (starting from $o$), and note that it is concentrated around $\frac{3}{2} m_n k$ (the distance of the walk from $o$, w.r.t. $\widehat{T}_n$, is distributed like a biased nearest-neighbor random walk with a fixed bias). Finally, for the lazy SRW on $T_n$ the corresponding hitting time can be written as $\sum_{i=1}^{T} W_i$, where $T$ is the hitting time for the aforementioned non-lazy walk on $\widehat{T}_n$ and $W_1, W_2, \ldots$ are i.i.d. random variables of mean $2s_n^2$ and variance $O(s_n^4)$ (which are also independent of $T$). Hence (g) follows from the CLT in conjunction with the aforementioned concentration of $T$ around time $\frac{3}{2} m_n k$.
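The concentration step at the end can be summarized as follows (a sketch via Wald's identity and the CLT, under the stated assumptions on $T$ and the $W_i$):

```latex
\mathbb{E}\Big[\sum_{i=1}^{T} W_i\Big] \;=\; \mathbb{E}[T]\,\mathbb{E}[W_1]
\;\approx\; \tfrac{3}{2} m_n k \cdot 2 s_n^2 \;=\; 3 s_n^2 m_n k,
\qquad \text{with fluctuations } O\big(s_n^2 \sqrt{m_n k}\big) = o\big(s_n^2 m_n k\big),
```

so the hitting time for the lazy walk inherits the concentration of $T$ around $\tfrac{3}{2} m_n k$.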
We now describe the $o(1)$-perturbation described in the assertion of the theorem. For every vertex $v \in A_k$ for some $k < s_n^2 b_n - 1$, we increase the edge weight of every edge belonging to some $\gamma_{u,w}$, such that $w$ is a left child of $u$ in $T_v$, to $1 + 1/b_n^{1/3}$. We consider also the perturbed network on $\widehat{T}_n$ in which we increase the edge weight between each vertex and its left child to $1 + 1/b_n^{1/3}$. Let $k$, $v$ and $Y$ be as in the paragraph following (5.2). A simple network reduction shows that starting from $v$, conditioned on $T_{\partial T_n} < T_{L_{km_n-1}(\widehat{T}_n)}$, the (conditional) law of $f(Y)$ w.r.t. the walk on the perturbed network on $\widehat{T}_n$ is the same as its (conditional) law w.r.t. the perturbed walk on $T_n$. Similarly to part (2a) of Fact 4.1, this law can be approximated well by that of a sum of $m_n$ i.i.d. Bernoulli$(1/5 + O(1/b_n^{1/3}))$ random variables. As before, up to constants, we may consider $P_v[Y \in \partial T_n \mid T_{\partial T_n} < T_{L_{km_n-1}(\widehat{T}_n)}]$ rather than $P_v[X_{T_{\partial T_n}} \in L_{(k+1)m_n}(\widehat{T}_n) \mid T_{\partial T_n} < T_{L_{km_n-1}(\widehat{T}_n)}]$ (here both probabilities are considered w.r.t. the perturbed network). Note that the former event requires a deviation of order $m_n/b_n^{1/3}$ from the mean of $f(Y)$, and thus its conditional probability can be bounded from above by $\exp(-c_2 m_n b_n^{-2/3})$ for some constant $c_2 > 0$. By the above discussion, it is easy to see that also after this perturbation (d), (e) and (f) remain valid, and that the corresponding bound holds for all $k < s_n^2 b_n$. In particular, since we took $e^{m_n/b_n} = \Theta(s_n^2 b_n)$, we get that $P_o[T_{\partial T_n} < T_{L_{s_n^2 m_n b_n}(T_n)}] = o(1)$.
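The displayed tail bound is an instance of Hoeffding's inequality; as a sketch (the constant in the exponent is illustrative):

```latex
\Pr\Big[\operatorname{Bin}\big(m_n, \tfrac{1}{5} + O(b_n^{-1/3})\big) \le \tfrac{m_n}{5} - g_n(m_n)\Big]
\;\le\; \exp\Big(-\frac{2\,\Theta(m_n b_n^{-1/3})^2}{m_n}\Big)
\;=\; \exp\big(-c_2\, m_n b_n^{-2/3}\big),
```

and a union bound over the $\Theta(s_n^2 b_n)$ blocks is still $o(1)$, since $e^{m_n/b_n} = \Theta(s_n^2 b_n)$ gives $\log(s_n^2 b_n) = O(m_n/b_n) = o\big(m_n b_n^{-2/3}\big)$.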
As before, it follows that $T_{\partial T_n}$ is concentrated around $3s_n^4 m_n b_n$. By (d) and (e) this implies that after the perturbation the random walk exhibits cutoff around this time. In particular, the order of the mixing time increased by a factor of $s_n^2$. Observe that from our assumption that $e^{m_n/b_n} = \Theta(s_n^2 b_n)$ we get that $\log |V_n| / \log\log |V_n| = O((s_n b_n)^2)$. Note that by taking $b_n$ to tend to infinity arbitrarily slowly, we can increase the order of the mixing time by a factor arbitrarily close to $\frac{\log |V_n|}{\log\log |V_n|}$ (as long as it is $o(\log |V_n| / \log\log |V_n|)$). Remark 5.1. Lack of pre-cutoff implies that (along a certain subsequence) the chains mix "very gradually". We note that the graphs $G_n$ from Theorem 2 exhibit, in some sense, the most gradual mixing possible. In general, $t_{\mathrm{mix}}(\epsilon) \le 2|\log_2 \epsilon| \, t_{\mathrm{mix}}$, for all $0 < \epsilon < 1/4$. Using the results in [5], it is not hard to show that for reversible chains, under the product condition, there is some $o(1)$ term (depending only on $\epsilon$) such that (5.3) holds with $t_{\mathrm{mix}}(1/2)$, for all $0 < \epsilon < 1/2$ and all $n$. We note that (by stretching the stretched edges in the construction by a slightly larger factor) we could have constructed the graphs $G_n$ from Theorem 2 so that $t_{\mathrm{mix}}(G_n)/\mathrm{girth}(G_n)$ tends to infinity arbitrarily slowly (but still $\mathrm{girth}(G_n) = \Theta(\mathrm{diameter}(G_n))$), and so that (5.3) is essentially sharp. The probabilistic interpretation of (5.3) is that, loosely speaking, there is some random time $\tau_n$ (which can be taken to be a certain hitting time), having roughly a geometric distribution, such that the chain is extremely mixed at time $\tau_n + o(E[\tau_n])$ (but its distance from $\pi$ is $1 - o(1)$ at time $\tau_n - o(E[\tau_n])$).
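The general bound $t_{\mathrm{mix}}(\epsilon) \le 2|\log_2 \epsilon|\, t_{\mathrm{mix}}$ quoted in Remark 5.1 follows from the standard submultiplicativity of the (worst-pair) total-variation distance:

```latex
\bar{d}(s+t) \le \bar{d}(s)\,\bar{d}(t), \qquad d(t) \le \bar{d}(t) \le 2\,d(t),
```

so $d(\ell\, t_{\mathrm{mix}}) \le \big(2\,d(t_{\mathrm{mix}})\big)^{\ell} \le 2^{-\ell}$, and taking $\ell = \lceil \log_2(1/\epsilon) \rceil$ yields $t_{\mathrm{mix}}(\epsilon) \le \lceil \log_2(1/\epsilon) \rceil\, t_{\mathrm{mix}} \le 2|\log_2 \epsilon|\, t_{\mathrm{mix}}$ for all $0 < \epsilon < 1/4$.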
6. Appendix 6.1. Connection between hitting times and mixing times. The aim of this section is to introduce some general theory which shall reduce the analysis of our examples to the analysis of hitting time distributions of certain sets.
The following lemma is standard.
which coincides with Definition 6.3 (see e.g. [16, Remark 7.2]). We say that $G$ is a $c$-lazy expander if $\mathrm{ch}_L(G) > c$. We say that a sequence of finite graphs $(G_n)_{n \ge 1}$ is a family of $c$-lazy expanders if $\inf_n \mathrm{ch}_L(G_n) > c$.
The following theorem is the well-known discrete analogue of Cheeger's inequality [3, 4, 23] (the proof can also be found in [16, Theorem 13.14]). Theorem 6.5. Let $\lambda_2$ be the second largest eigenvalue of a reversible transition matrix on a finite state space. Let $\Phi$ be as in Definition 6.3. Then $\Phi^2/2 \le 1 - \lambda_2 \le 2\Phi$. (6.2) The following proposition enables us to reduce the problem of bounding $d(t)$ from above, in the proofs of Theorems 2 and 3, to the problem of estimating the probability that a certain large set $A$ ("the center of mass of the chain") was not hit by time $t$. Proposition 6.6. Let $G = (V, E)$ be a finite connected graph. Fix some edge weights $(c_e)_{e \in E}$. Assume that $1 \le \sum_u c_{v,u} \le D$ for all $v \in V$. Let $(V, P, \pi)$ be the lazy random walk on the corresponding network.
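As a concrete sanity check of Theorem 6.5 (our illustration, not part of the source; the 8-cycle and all identifiers are arbitrary choices), the following computes both sides of (6.2) for the lazy simple random walk on a cycle, with $\Phi$ obtained by brute force over sets and $\lambda_2$ by power iteration:

```python
# Sanity check of the discrete Cheeger inequality (6.2):
#   Phi^2 / 2 <= 1 - lambda_2 <= 2 * Phi
# for the lazy SRW on the n-cycle (illustrative example).
from itertools import combinations

n = 8
# Lazy SRW on the n-cycle: hold w.p. 1/2, step to each neighbour w.p. 1/4.
P = [[0.0] * n for _ in range(n)]
for v in range(n):
    P[v][v] = 0.5
    P[v][(v + 1) % n] = 0.25
    P[v][(v - 1) % n] = 0.25
pi = [1.0 / n] * n  # uniform stationary distribution (the graph is regular)

def bottleneck_ratio():
    """Phi = min over S with pi(S) <= 1/2 of Q(S, S^c) / pi(S)."""
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            inside = set(S)
            Q = sum(pi[x] * P[x][y] for x in S for y in range(n) if y not in inside)
            best = min(best, Q / sum(pi[x] for x in S))
    return best

def second_eigenvalue(iters=500):
    """lambda_2 via power iteration orthogonal to the constant vector
    (P here is symmetric, and lazy chains have nonnegative spectrum)."""
    x = [1.0] + [0.0] * (n - 1)
    lam = 0.0
    for _ in range(iters):
        m = sum(x) / n
        x = [xi - m for xi in x]  # project out the eigenvector of eigenvalue 1
        y = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(yi * xi for yi, xi in zip(y, x)) / sum(xi * xi for xi in x)
        s = sum(abs(v) for v in y) or 1.0
        x = [v / s for v in y]
    return lam

Phi = bottleneck_ratio()
gap = 1.0 - second_eigenvalue()
assert Phi ** 2 / 2 <= gap <= 2 * Phi  # the two sides of (6.2)
```

For the 8-cycle this gives $\Phi = 1/8$ and $1 - \lambda_2 = \frac{1}{2}(1 - \cos(\pi/4)) \approx 0.146$, comfortably between $\Phi^2/2$ and $2\Phi$.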
We also have that $\|\widehat{\pi} - \pi\|_{\mathrm{TV}} \le 1 - \pi(\widehat{A}) \le \epsilon/3$. By a straightforward coupling argument, $\|\widehat{P}^r_x(\cdot) - P^r(x, \cdot)\|_{\mathrm{TV}} \le P_x[T_{\partial A} < r] \le \epsilon/3$. Finally, by the triangle inequality we get that $\|P^r_x - \pi\|_{\mathrm{TV}} \le 3 \cdot \frac{\epsilon}{3} = \epsilon$. 6.2. A useful lemma for bounding the relaxation-time. The following lemma allows us to easily bound the relaxation-time of our examples. Lemma 6.7. Let $G_n = (V_n, E_n)$ be a family of $c$-lazy expanders.
(i) Let $H_n$ be a sequence of graphs obtained by stretching some of the edges in $G_n$ by a factor of $s_n$. Then $t_{\mathrm{rel}}(H_n) = O(s_n^2)$. (ii) Let $F_n = (W_n, U_n)$ be a sequence of graphs obtained by decorating some of the vertices of $G_n$ with a 3-dimensional torus of side length $k_n$. Then $t_{\mathrm{rel}}(F_n) = \Omega(k_n^3)$. Proof. Since $G_n$ is an expander and $H_n$ is obtained from it by stretching some of the edges by a factor of $s_n$, the Cheeger constant of $H_n$ is at least of order $s_n^{-1}$ (e.g. [13, Proposition 2.3]). Hence $t_{\mathrm{rel}}(H_n) = O(s_n^2)$ by (6.2). Part (ii) is also obtained from (6.2).
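The lower bound in (ii) can be sketched by testing the upper bound of (6.2) on a single decoration (assuming, as the construction suggests, that each torus is attached to the rest of $F_n$ through boundedly many edges and that $k_n^3 = o(|W_n|)$). Taking $S$ to be the vertex set of one decorating torus,

```latex
\pi(S) = \Theta\Big(\frac{k_n^3}{|W_n|}\Big), \qquad
Q(S, S^{c}) = O\Big(\frac{1}{|W_n|}\Big), \qquad
\Phi(F_n) \le \frac{Q(S, S^{c})}{\pi(S)} = O\big(k_n^{-3}\big),
```

so by (6.2), $1 - \lambda_2 \le 2\Phi(F_n) = O(k_n^{-3})$, i.e. $t_{\mathrm{rel}}(F_n) = \Omega(k_n^3)$.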
6.3. Proof of Fact 4.1. The proof of part (1a) is easy and hence omitted. We now prove (1b). The upper bounds in (4.1) are trivial. We now prove that $P_o[X_{\tau_n} \in D], P_o[X_{T_{L_n}} \in D] \ge c\,P_o[D]$ (for some $0 < c < 1$, independent of $D$ and $n$). This follows from the fact that for all $d \in D$, the probability that $D$ was visited and that $d$ is the last (resp. first) vertex in $D$ to be visited is at least $c\,P_o[X_{\tau_n} = d]$ (resp. $c\,P_o[X_{T_{L_n}} = d]$). We leave the details to the reader. The proof of (2b) is analogous and hence omitted. We now prove (2a). Denote the left and right children of $o$ by $u$ and $v$, respectively. Let $T_v$ and $T_u$ be the subtrees rooted at $v$ and $u$, respectively. We define the left and right trees rooted at $o$ to be the trees obtained by deleting $T_v$ and $T_u$, respectively. Write $w$, $w_L$ and $w_R$ for the conductance from the root to infinity in the original tree, the left tree and the right tree, respectively. Let $A$ be the event that the last vertex of $\{v, u\}$ which was visited by the walk was $u$ (i.e. the walk got absorbed in $T_u$). Using a standard network reduction and the fact that $T_v$ and $T_u$ are identical to $T$, we get the following relations: $w = w_L + w_R$, $\frac{1}{w_L} = \frac{1}{1+\epsilon} + \frac{1}{w}$, $\frac{1}{w_R} = 1 + \frac{1}{w}$ and $P_o[A] = \frac{w_L}{w_L + w_R}$. Solving this system of equations yields that $P_o[A] = \frac{\sqrt{1+\epsilon}}{1+\sqrt{1+\epsilon}}$. Using again the fact that $T_v$ and $T_u$ are identical to $T$ allows us to repeat the argument, and the claim now follows by induction.
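For concreteness, the displayed system can be solved explicitly (a short verification of the stated formula). The two series laws and the harmonic-measure identity give

```latex
w_L = \frac{(1+\epsilon)\,w}{w+1+\epsilon}, \qquad
w_R = \frac{w}{w+1}, \qquad
w = w_L + w_R \;\Longrightarrow\; (w+1+\epsilon)(w+1) = (1+\epsilon)(w+1) + (w+1+\epsilon),
```

which simplifies to $w^2 = 1+\epsilon$, i.e. $w = \sqrt{1+\epsilon}$; hence $P_o[A] = \frac{w_L}{w} = \frac{1+\epsilon}{w+1+\epsilon} = \frac{1+\epsilon}{\sqrt{1+\epsilon}\,(1+\sqrt{1+\epsilon})} = \frac{\sqrt{1+\epsilon}}{1+\sqrt{1+\epsilon}}$.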
6.4. Proof of Lemma 3.2. We first prove (3.1). We only prove the discrete-time lazy case, as the continuous-time case is analogous. By reversibility and the Markov property w.r.t. $\min(T_z, T_y)$,
$$\frac{P_L^t(x,y)}{\pi(y)} = \sum_{k_1=0}^{t} P_x[T_z = k_1 < T_y]\,\frac{P_L^{t-k_1}(z,y)}{\pi(y)} + \sum_{k_1=0}^{t} P_x[T_y = k_1 < T_z]\,\frac{P_L^{t-k_1}(y,y)}{\pi(y)} = \sum_{k=0}^{t} P[T^x_{z,y} = k,\, T^x_y > T^x_z]\,\frac{P_L^{t-k}(y,z)}{\pi(z)} + \sum_{k=0}^{t} P[T^x_{z,y} = k,\, T^x_y \le T^x_z]\,\frac{P_L^{t-k}(y,y)}{\pi(y)},$$
where the second equality uses reversibility in the first sum ($P_L^{s}(z,y)/\pi(y) = P_L^{s}(y,z)/\pi(z)$) together with $\{T_z = k < T_y\} = \{T^x_{z,y} = k,\, T^x_y > T^x_z\}$. Substituting this in (6.3) yields the equality in (3.1). The inequality follows from the fact that for all $a \in \Omega$, $P_L^s(a,a)/\pi(a)$ is decreasing in $s$ and tends to 1 as $s \to \infty$, which follows from the spectral decomposition and the non-negativity of the eigenvalues of $P_L = \frac{1}{2}(I + P)$. For (3.2) use (3.1) with $z = y$. We now prove (3.3). Here we only prove the continuous-time case.
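The monotonicity fact used above can be spelled out via the spectral decomposition: if $f_1 \equiv 1, f_2, \ldots, f_{|\Omega|}$ is an orthonormal basis of $L^2(\pi)$ consisting of eigenfunctions of $P_L$ with eigenvalues $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge 0$, then

```latex
\frac{P_L^s(a,a)}{\pi(a)} \;=\; \sum_{i=1}^{|\Omega|} \lambda_i^s\, f_i(a)^2
\;=\; 1 + \sum_{i \ge 2} \lambda_i^s\, f_i(a)^2,
```

which is non-increasing in $s$ (each $\lambda_i^s$ is, since $\lambda_i \in [0,1]$) and tends to $1$ as $s \to \infty$, because irreducibility gives $\lambda_i < 1$ for all $i \ge 2$.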
Denote the densities of $T_z$ under $H_x$ and $H_y$ by $f^x_z$ and $f^y_z$, respectively. Conditioning on $T_z$ (which is deterministically smaller than $T_y$ under $H_x$ in our current setup), then using reversibility, and finally conditioning on $T_z$ again (now under $H_y$), we get that