Cutoff for Ramanujan graphs via degree inflation

Recently Lubetzky and Peres showed that simple random walks on a sequence of $d$-regular Ramanujan graphs $G_n=(V_n,E_n)$ of increasing sizes exhibit cutoff in total variation around the diameter lower bound $\frac{d}{d-2}\log_{d-1}|V_n| $. We provide a different argument under the assumption that for some $r(n) \gg 1$ the maximal number of simple cycles in a ball of radius $r(n)$ in $G_n$ is uniformly bounded in $n$.


Introduction
Generically, we denote the stationary distribution of an ergodic Markov chain $(X_t)_{t\ge 0}$ by $\pi$, its state space by $\Omega$ and its transition matrix by $P$. We denote by $P^t_x$ (resp. $\mathrm{P}_x$) the distribution of $X_t$ (resp. $(X_t)_{t\ge 0}$), given that the initial state is $x$. The total variation distance of two distributions on $\Omega$ is $\|\mu-\nu\|_{\mathrm{TV}} = \frac12\sum_y|\mu(y)-\nu(y)|$. The total variation $\varepsilon$-mixing time is $t_{\mathrm{mix}}(\varepsilon) := \inf\{t : \max_x \|P^t_x-\pi\|_{\mathrm{TV}} \le \varepsilon\}$. Next, consider a sequence of chains $((\Omega_n,P_n,\pi_n))_{n\in\mathbb{N}}$, each with its mixing time $t^{(n)}_{\mathrm{mix}}(\cdot)$. We say that the sequence exhibits a cutoff if the following sharp transition in its convergence to stationarity occurs: $\forall\varepsilon\in(0,1/2]$, $\lim_{n\to\infty} t^{(n)}_{\mathrm{mix}}(\varepsilon)/t^{(n)}_{\mathrm{mix}}(1-\varepsilon) = 1$. … where $\rho_d := 2\sqrt{d-1}/d$ is the spectral radius of SRW on the infinite $d$-regular tree $T_d$. Lubotzky, Phillips and Sarnak [6], Margulis [8] and Morgenstern [9] constructed $d$-regular Ramanujan graphs for all $d$ of the form $d = p^m+1$, where $p$ is a prime number. Recently, Marcus, Spielman and Srivastava [7] proved the existence of bipartite $d$-regular Ramanujan graphs for all $d\ge 3$. In light of the Alon-Boppana bound [10], Ramanujan graphs are "optimal expanders", as they asymptotically have the largest spectral gap.
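The definitions above can be made concrete with a small brute-force computation. The following is a minimal sketch of our own (not code from the paper): it computes $\|P^t_x-\pi\|_{\mathrm{TV}}$ and $t_{\mathrm{mix}}(\varepsilon)$ by iterating matrix powers, and checks the standard Ramanujan condition (every non-trivial adjacency eigenvalue has modulus at most $2\sqrt{d-1}$, equivalently every non-trivial eigenvalue of $P$ has modulus at most $\rho_d$). It assumes a connected, non-bipartite regular graph; the Petersen graph and all helper names are hypothetical choices for illustration.

```python
import numpy as np

def petersen_adjacency():
    """Adjacency matrix of the Petersen graph: 3-regular, 10 vertices, non-bipartite."""
    A = np.zeros((10, 10))
    edges = [(i, (i + 1) % 5) for i in range(5)]            # outer 5-cycle
    edges += [(5 + i, 5 + (i + 2) % 5) for i in range(5)]   # inner pentagram
    edges += [(i, 5 + i) for i in range(5)]                 # spokes
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

def tv_distance(mu, nu):
    """||mu - nu||_TV = (1/2) * sum_y |mu(y) - nu(y)|."""
    return 0.5 * np.abs(mu - nu).sum()

def mixing_time(P, pi, eps, t_max=10_000):
    """t_mix(eps) = min{t : max_x ||P^t_x - pi||_TV <= eps}, by brute force."""
    Pt = np.eye(P.shape[0])                 # row x of Pt is the law P^t_x
    for t in range(t_max + 1):
        worst = max(tv_distance(Pt[x], pi) for x in range(P.shape[0]))
        if worst <= eps:
            return t
        Pt = Pt @ P
    raise ValueError("chain did not mix within t_max steps")

def is_ramanujan(A, d):
    """Standard test, assuming a connected non-bipartite d-regular graph:
    every adjacency eigenvalue other than d has modulus <= 2 * sqrt(d - 1)."""
    eig = np.sort(np.linalg.eigvalsh(A))
    nontrivial = eig[:-1]                   # drop the simple top eigenvalue d
    return bool(np.max(np.abs(nontrivial)) <= 2 * np.sqrt(d - 1) + 1e-9)

A = petersen_adjacency()
P = A / 3                                   # SRW transition matrix
pi = np.full(10, 0.1)                       # uniform law is stationary on a regular graph
print(mixing_time(P, pi, 0.25), is_ramanujan(A, 3))
```

Brute force is only sensible for tiny graphs; it is meant to make the definitions tangible, not to probe the asymptotic regime of the theorem.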
Remark 1.1. Our definition of asymptotically Ramanujan graphs is not the standard one. The more standard definition is that $\max\{|\lambda_i(n)| : \ldots$ … It is elementary to show that for every $n$-vertex $d$-regular graph, the $(1-\varepsilon)$ total variation mixing time of the SRW is at least $t_{d,\varepsilon,n} := \frac{d}{d-2}\log_{d-1} n - C\sqrt{\log_{d-1} n}\,|\log\varepsilon|/d$, for some constant $C>0$.¹ The following precise formulation of this fact is due to Lubetzky and Peres [4].
… $(d-2)^{3/2}$ and $\Phi^{-1}$ be the inverse function of the CDF of the standard normal distribution. Then SRW on $G$ satisfies …

Recently, Lubetzky and Peres [4] showed that simple random walks on a sequence of non-bipartite $d_n$-regular Ramanujan graphs $G_n=(V_n,E_n)$ of increasing sizes exhibit cutoff around the diameter lower bound $\frac{d_n}{d_n-2}\log_{d_n-1}|V_n|$. In this work we present an alternative argument and prove the same result under the following assumption:

Assumption 1: There exists a diverging sequence $r_n$ such that the maximal number of simple cycles in a ball of radius $r_n$ in $G_n$ is uniformly bounded in $n$.

Theorem 1.3. Let $G_n=(V_n,E_n)$ be a sequence of non-bipartite, finite, connected, $d_n$-regular asymptotically one-sided Ramanujan graphs.
(i) If $d_n = d$ for all $n$ and Assumption 1 holds, then the corresponding sequence of simple random walks exhibits cutoff around time $\frac{d}{d-2}\log_{d-1}|V_n|$.
(ii) If $d_n$ diverges and $\log d_n = o(\log_{d_n}|V_n|)$, then the corresponding sequence of simple random walks exhibits cutoff around time $\log_{d_n}|V_n|$.
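As a quick arithmetic illustration (our own, not from the paper), the two cutoff locations in Theorem 1.3 agree to first order when the degree grows, since $\frac{d}{d-2}\to 1$ and $\log(d-1)/\log d\to 1$. The helper names below are hypothetical.

```python
import math

def cutoff_location_fixed_degree(d, n):
    """Part (i): d/(d-2) * log_{d-1}(n), for a fixed degree d >= 3."""
    return d / (d - 2) * math.log(n) / math.log(d - 1)

def cutoff_location_diverging_degree(d, n):
    """Part (ii): log_d(n)."""
    return math.log(n) / math.log(d)

# d = 3, n = 2**20: 3 * log_2(2**20) = 60 steps.
t_fixed = cutoff_location_fixed_degree(3, 2**20)

# For a huge degree the two expressions are nearly identical.
d, n = 10**6, 10**60
ratio = cutoff_location_fixed_degree(d, n) / cutoff_location_diverging_degree(d, n)
print(t_fixed, ratio)
```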
Remark 1.4. If there is no cutoff, then cutoff must fail on some subsequence $(n_k)$ such that either $\lim_{k\to\infty} d_{n_k} = \infty$ or $d_{n_k} = d$ for all $k$, for some fixed $d\ge 3$. Thus there is no loss of generality in assuming that either $\lim_{n\to\infty} d_n = \infty$ or $d_n = d$ for all $n$.
¹ This can be derived from the fact that $C$ can be chosen so that the distance of the walk at time $t_{d,\varepsilon,n}$ from its starting point is at least $\lfloor\log_{d-1}(\frac14\varepsilon n)\rfloor$ with probability at most $\varepsilon/2$ (together with the fact that a ball of radius $\lfloor\log_{d-1}(\frac14\varepsilon n)\rfloor$ contains at most $\frac12\varepsilon n$ vertices).
Assumption 1 is rather mild, as it is quite difficult to construct a family of asymptotically one-sided Ramanujan graphs violating it. In particular, it is satisfied w.h.p. by a sequence of random $d$-regular graphs of increasing sizes [5]. It follows from [1, Theorem 1] that if $G_n$ is a sequence of $d$-regular transitive asymptotically Ramanujan graphs of increasing sizes, then $\lim_{n\to\infty}\mathrm{girth}(G_n)=\infty$, where for a graph $G$, $\mathrm{girth}(G)$ denotes its girth² (and so Assumption 1 holds).
The argument of Lubetzky and Peres [4] does not require Assumption 1 (nor the assumption $\log d_n = o(\log_{d_n}|V_n|)$). They studied the Jordan decomposition of the transition matrix of the non-backtracking walk³ and used it to derive cutoff for the non-backtracking walk, which for a regular graph implies cutoff also for the SRW. In this note we study the SRW by observing it only at the times at which it reaches distance $k$ from its previously observed position, for some large $k$.
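For readers who want to experiment with the non-backtracking walk, here is a minimal sketch (our illustration, not the construction used in [4]) of its transition matrix on directed edges, using the standard rule: from $(x,y)$ move to $(y,w)$, with $w$ chosen uniformly among the neighbours of $y$ other than $x$. It assumes a simple graph given as an adjacency dict with minimum degree at least 2; `nonbacktracking_matrix` is a hypothetical helper name.

```python
import numpy as np

def nonbacktracking_matrix(adj):
    """Transition matrix of the non-backtracking walk on directed edges (darts).

    P_NB((x, y), (y, w)) = 1 / (deg(y) - 1) for every neighbour w != x of y,
    and all other entries are 0. Assumes a simple graph with min degree >= 2.
    """
    darts = [(x, y) for x in adj for y in adj[x]]
    index = {e: i for i, e in enumerate(darts)}
    P = np.zeros((len(darts), len(darts)))
    for x, y in darts:
        choices = [w for w in adj[y] if w != x]     # forbid stepping straight back to x
        for w in choices:
            P[index[(x, y)], index[(y, w)]] = 1.0 / len(choices)
    return darts, P

# Complete graph K_4 (3-regular): 12 darts.
K4 = {v: [u for u in range(4) if u != v] for v in range(4)}
darts, P_nb = nonbacktracking_matrix(K4)
```

For a regular graph each column of `P_nb` also sums to 1, so the uniform distribution on darts is stationary for the non-backtracking walk.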

Organization of this note
In § 2, as a warm up, we present an extremely simple and short proof for the occurrence of cutoff for SRW on a sequence of asymptotically Ramanujan graphs of diverging degree. In § 3 we present some machinery for bounding mixing times using hitting times. We then apply this machinery to prove Part (ii) of Theorem 1.3. In § 4 we give an overview of the proof of Part (i) of Theorem 1.3. In § 5 we prove two auxiliary results. Finally, in § 6 we conclude the proof of Theorem 1.3.

A warm up
It turns out that for a sequence of asymptotically Ramanujan graphs of diverging degree the trivial diameter lower bound (of Lemma 1.2) is matched by the trivial spectral-gap upper bound on the $L^2$ mixing time obtained via the Poincaré inequality. As a warm-up, and as motivation for what follows, we now prove the following theorem.
Theorem 2.1. Let $G_n=(V_n,E_n)$ be a sequence of non-bipartite, finite, connected, $d_n$-regular asymptotically Ramanujan graphs with $d_n\to\infty$. Then the corresponding sequence of simple random walks exhibits cutoff around time $\log_{d_n}|V_n|$.
Note that in Part (ii) of Theorem 1.3 the graphs are assumed to be only asymptotically one-sided Ramanujan. Before proving Theorem 2.1 we need a few basic definitions and facts. Let $\lambda := \max\{|a| : a\neq 1,\ a \text{ is an eigenvalue of } P\}$ and $t_{\mathrm{rel}} := 1/(1-\lambda)$. The $L^2$ distance of $P^t_x$ from $\pi$ is defined by $\|P^t_x-\pi\|_{2,\pi}^2 = \sum_y \pi(y)\big(P^t(x,y)/\pi(y)\big)^2 - 1$.
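² The girth of a graph $G$ is the length of the shortest cycle in $G$.
³ This is a random walk on the directed edges of the graph, with transition matrix $P_{\mathrm{NB}}((x,y),(z,w)) = \mathbf{1}\{z=y,\ w\neq x\}/(\deg(y)-1)$.

By Jensen's and the Poincaré inequalities, for all $t$ and $x$ we have that
$4\|P^t_x-\pi\|_{\mathrm{TV}}^2 \le \|P^t_x-\pi\|_{2,\pi}^2 \le \lambda^{2t}\big(\frac{1}{\pi(x)}-1\big)$.
Hence for SRW on an $n$-vertex regular graph we have for all $t$ and $x$ that
$4\|P^t_x-\pi\|_{\mathrm{TV}}^2 \le n\lambda^{2t}$.

Proof of Theorem 2.1: By assumption $\lambda = \rho$ …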
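The chain of inequalities $4\|P^t_x-\pi\|_{\mathrm{TV}}^2 \le \|P^t_x-\pi\|_{2,\pi}^2 \le \lambda^{2t}(1/\pi(x)-1)$, the standard Jensen (Cauchy-Schwarz) plus Poincaré bound for reversible chains, can be checked numerically. The sketch below is our own illustration on SRW on an odd cycle (2-regular, connected, non-bipartite, so the eigenvalue 1 is simple and $-1$ is not an eigenvalue); the function names are hypothetical.

```python
import numpy as np

def odd_cycle_srw(n):
    """SRW on the cycle Z_n; for odd n this is 2-regular, connected and non-bipartite."""
    P = np.zeros((n, n))
    for x in range(n):
        P[x, (x + 1) % n] += 0.5
        P[x, (x - 1) % n] += 0.5
    return P

def l2_chain_of_bounds(P, t, x):
    """Return (4*TV^2, L2^2, lambda^(2t) * (1/pi(x) - 1)) for symmetric P, uniform pi."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(P)))
    lam = abs_eigs[-2]            # max |a| over eigenvalues a != 1 (1 is simple here)
    mu = np.linalg.matrix_power(P, t)[x]
    tv = 0.5 * np.abs(mu - pi).sum()
    l2_sq = np.sum(pi * (mu / pi) ** 2) - 1.0
    return 4 * tv ** 2, l2_sq, lam ** (2 * t) * (1.0 / pi[x] - 1.0)

P = odd_cycle_srw(7)
for t in (0, 5, 20):
    print(t, l2_chain_of_bounds(P, t, 0))
```

At $t=0$ the second inequality is an equality ($\|\delta_x-\pi\|_{2,\pi}^2 = 1/\pi(x)-1$), which is why the $n\lambda^{2t}$ bound is loose only by the spectral decay.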

Replacing the Poincaré inequality by its hitting time analog
In the proof of Theorem 1.3 we exploit the general connection between mixing times and escape times from small sets, established in [2] (Corollary 3.1, eq. (3.2)): there exists some absolute constant $C>0$ such that for every reversible chain (with a finite state space) …

In the proof of Theorem 1.3 we replace the naive $L^2$ bound used in the proof of Theorem 2.1 by its hitting time counterpart: under reversibility, for all $A\subsetneq\Omega$, $a\in A$ and $t\ge 0$ … where $\pi_A$ is $\pi$ conditioned on $A$, $P_A$ is the restriction of the transition matrix $P$ to $A$ (this is the transition matrix of the chain which is "killed" upon escaping $A$), $\|f\|_{2,A}^2 := \sum_{b\in A}\pi_A(b)f^2(b)$ for $f\in\mathbb{R}^A$, and $\lambda(A)$ is the largest eigenvalue of $P_A$. The following proposition relates $\lambda(A)$ to $\lambda_2$, the second largest eigenvalue of $P$.
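The displayed hitting-time bound is not legible in this copy. One standard spectral estimate of this type, which we assume here purely for illustration, is $\mathrm{P}_a[T_{A^c}>t] = (P_A^t\mathbf{1})(a) \le \lambda(A)^t/\sqrt{\pi_A(a)}$; it follows from Cauchy-Schwarz in $L^2(\pi_A)$, since $P_A$ is self-adjoint there and, by Perron-Frobenius, its norm equals $\lambda(A)$. The sketch below (hypothetical helper names) compares the exact survival probability of the killed chain with this bound for an arc $A$ of an odd cycle.

```python
import numpy as np

def odd_cycle_srw(n):
    """SRW on Z_n (symmetric, so the uniform law is stationary)."""
    P = np.zeros((n, n))
    for x in range(n):
        P[x, (x + 1) % n] += 0.5
        P[x, (x - 1) % n] += 0.5
    return P

def survival_and_bound(P, A, a, t):
    """Exact P_a[T_{A^c} > t] and the assumed bound lambda(A)^t / sqrt(pi_A(a)).

    Assumes P is symmetric, so pi is uniform and pi_A is uniform on A.
    """
    A = list(A)
    P_A = P[np.ix_(A, A)]                   # killed chain: mass leaving A is lost
    lam_A = np.linalg.eigvalsh(P_A).max()   # Perron root = largest eigenvalue of P_A
    survival = np.linalg.matrix_power(P_A, t) @ np.ones(len(A))
    pi_A_a = 1.0 / len(A)
    return survival[A.index(a)], lam_A ** t / np.sqrt(pi_A_a)

P = odd_cycle_srw(9)
A = [0, 1, 2, 3]
for t in (1, 10, 50):
    print(t, survival_and_bound(P, A, 1, t))
```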
Similarly to (2.1), by (3.1)-(3.3) we have for every reversible chain on a finite state space with $\lambda_2<1/2$ and every $\alpha\in(0,\lambda_2]$ that … We are now in a position to give a short proof of Part (ii) of Theorem 1.3.
Proof. Let $G_n=(V_n,E_n)$ be a sequence of non-bipartite, finite, connected, $d_n$-regular asymptotically one-sided Ramanujan graphs. Assume that $d_n$ diverges and $\log d_n = o(\log_{d_n}|V_n|)$.
Let $\lambda_2 = \lambda_2(n)$ be the second largest eigenvalue of the transition matrix of SRW on $G_n$. By our assumptions $2\lambda_2 = d_n^{-1/2+o(1)}$, and so by (3.4) we have that … The proof is concluded using Lemma 1.2.

Degree inflation
The simple proof of Part (ii) of Theorem 1.3 motivates looking at the following graph. … Let $G=(V,E)$ be a $d$-regular finite Ramanujan graph. Assume that Assumption 1 holds. Let $r=r_n$ be as in Assumption 1. Fix some $k=k_n$ such that $1\ll k\ll\sqrt{r}$.
Remark 4.4. Let $K$, $W$ and $T_i$ be as in Definitions 4.1 and 4.2. By Assumption 1, for every $x,y\in V$ of distance $k$ from one another … In fact, Assumption 1 could have been replaced by the assumption that $\max\{W(x,y),\ldots(1))$ and that $T_1$ is concentrated around $\frac{dk}{d-2}$ (uniformly for all initial states).

An overview of the proof of Part (i) of Theorem 1.3
Let $G$, $k$ and $r$ be as above. Intuitively, if either the SRW on $G(k)$ or the chain $Y$ (from Definitions 4.1 and 4.2) exhibits an abrupt convergence to stationarity around time $t=t_n$, then the SRW on $G$ should also exhibit an abrupt convergence to stationarity around time $t\cdot\frac{d}{d-2}k$. The term $\frac{d}{d-2}k$ comes from the fact that (by Assumption 1) the expected time it takes the walk on $G$ to reach distance $k$ from its current position is $\frac{d}{d-2}k(1+o(1))$. While the chain $Y$ is more directly related to the SRW on $G$, it is harder to analyze directly, since it need not be reversible and a priori it is not clear that its stationary distribution is close to the uniform distribution. Instead, we analyze the walk on $G(k)$, use it to learn about $Y$, and then in turn about the walk on $G$.
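The factor $\frac{d}{d-2}k$ can be seen exactly on the tree $T_d$, which is what a ball of $G$ looks like when it contains no cycles. The distance from the starting point moves $0\to1$ surely and, from $r\ge1$, up with probability $\frac{d-1}{d}$ and down with probability $\frac1d$. If $h_r$ is the expected time to go from $r$ to $r+1$, then $h_0=1$ and $h_r = 1+\frac1d(h_{r-1}+h_r)$, i.e. $h_r=(d+h_{r-1})/(d-1)$, which converges geometrically to $\frac{d}{d-2}$. The sketch below (our own; it models only the tree, not the role of Assumption 1) sums these terms.

```python
def expected_time_to_distance(d, k):
    """Exact expected time for SRW on the d-regular tree T_d, started at the root,
    to first reach distance k.

    h_r = expected time to go from distance r to r + 1:
    h_0 = 1 and h_r = (d + h_{r-1}) / (d - 1) for r >= 1.
    """
    h, total = 1.0, 0.0
    for _ in range(k):
        total += h                  # add h_r for r = 0, ..., k - 1
        h = (d + h) / (d - 1)       # h_{r+1}
    return total

for d in (3, 4, 10):
    k = 500
    print(d, expected_time_to_distance(d, k) / (d * k / (d - 2)))
```

Since $h_r-\frac{d}{d-2} = (h_{r-1}-\frac{d}{d-2})/(d-1)$, summing the geometric series gives exactly $\frac{dk}{d-2}-\frac{2(d-1)}{(d-2)^2}\big(1-(d-1)^{-k}\big)$, so the correction to $\frac{d}{d-2}k$ is $O(1)$.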
In light of Part (ii) of Theorem 1.3 (which has already been proven), a natural strategy for proving Part (i) of Theorem 1.3 is to show that $\lambda_2(K) = \rho_D^{1+o(1)}$, where $D$ is the maximal degree in $G(k)$, $K$ is the transition matrix of SRW on $G(k)$ and $\lambda_2(K)$ is its second largest eigenvalue. Unfortunately, we do not know how to show this (see the first paragraph of § 5). Instead, we obtain such an estimate for $\lambda_K(A)$, the largest eigenvalue of $K_A$, the restriction of $K$ to $A$, for any "small" set $A$. By small we mean that its stationary probability is at most $\alpha := (d-1)^{-3k^2}$. Indeed, the key to the proof of Part (i) of Theorem … (1)) for every small set $A$. Using (3.2) we get for the walk on … (1)). We then show that the same holds for $Y$ (this is obvious when $2k<\mathrm{girth}(G)$; the general case is derived using the fact that, as mentioned in Remark 4.4, $cW(x,y)\le K(x,y)\le CW(x,y)$ for all $x,y$). Finally, using an obvious coupling between $Y$ and the SRW on $G$, after multiplying by $\frac{d}{d-2}k(1+o(1))$ the last bound is transformed into a bound on $\mathrm{hit}_{1-\alpha}(o(1))$ for SRW on $G$ (for some $o(1)$ terms).

Auxiliary results
In order to control $\lambda_K(A)$ (for small $A$), apart from Proposition 3.1 we need the following comparison result. While there are similar comparison techniques for the spectral gap, we are not aware of a comparison technique which allows one to argue that $\lambda_2$ (the second largest eigenvalue of the transition matrix) of one chain is close to 0 (say, $\lambda_2=o(1)$) if that of another chain is close to 0.
Proof: Denote $\langle f,g\rangle_{\pi^{(i)}}$ … By the Perron-Frobenius Theorem …

Before proving Theorem 1.3 we need one more lemma. For any $s\ge 0$ there exist some constant $C(s,d)>0$ and $k_s$ such that if $k\ge k_s$, $t(B_k)\le s$ and $D_k=\emptyset$, then … This follows from a standard argument involving the covering tree of $G$. A non-backtracking path of length $\ell$ is a sequence of vertices $(v_0,v_1,\ldots,v_\ell)$ such that $\{v_i,v_{i-1}\}\in E$ and $v_{i+2}\neq v_i$ for all $i$. Let $\mathcal{P}_\ell$ be the collection of all non-backtracking paths of length $\ell$ starting from $v$. Let $T_d$ be the (infinite) $d$-regular tree. We may label the $\ell$th level of $T_d$ by the set $\mathcal{P}_\ell$ (in a bijective manner) such that the children of $(v,v_1,\ldots$ … $)_{n=0}^\infty$ is a SRW on $T_d$ (labeled as above) started from $(v)$ (which is the root), then $(\phi(S_n))_{n=0}^\infty$ is a SRW on $G$ started from $v$. Denote the law of $(S_n)_{n=0}^\infty$ by $\mathrm{P}_v$.
We prove this by induction on $s$. The base case $t(B_k)=0$ is trivial (it holds with $C(0,d)=1$). Now consider the case that $i$ … We now show that there is some constant $K(s,d)$ and an edge $e=\{x,y\}\in E$ belonging to some cycle in $B_k$ such that $x\in B_k$, $y\in B_{k-1}$ and … Once this is established, invoking the induction hypothesis concludes the induction step.
Consider an arbitrary cycle in $B_k$ with at most one vertex in $D_k$. Let $x$ be the vertex of the cycle which maximizes $\mathrm{P}_x[T_{D_k}=T_z]$. Let $e=\{x,y\}$, $e'=\{x,y'\}$ be the two edges of the cycle which are incident to $x$. Without loss of generality, let $e$ be the one through which $x$ is less likely to be reached. More precisely, assume that … Also, by the choice of $x$ we have that … Now consider the case that $x\notin D_k$. Denote $T_{x,y} := \min\{T_x,T_y\}$ and $T^+_x := \inf\{t>0 : X_t = x\}$. Thus, in order to conclude the proof of (5.2), it remains only to show that …

By (5.3) we have that
Hence, there exists some constant $M(s,d)$ such that … where in the second inequality we have used the fact that $\mathrm{P}_x[\min\{T^+_x,T_y\}>T_{D_k}]\ge c(s,d)$ for some constant $c(s,d)>0$⁵, and that by the choice of $x$ (namely, by (5.4)) we have that … We leave the missing details as an exercise. Finally, combining (5.5) and (5.6) yields (5.2).
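The covering-tree labeling used in this section rests on a counting fact: in a $d$-regular graph there are exactly $d(d-1)^{\ell-1}$ non-backtracking paths of length $\ell\ge1$ from any vertex $v$, the same as the number of vertices in level $\ell$ of $T_d$. The sketch below (our own illustration; the Petersen graph and the helper name are hypothetical choices) enumerates such paths directly.

```python
def nonbacktracking_paths(adj, v, length):
    """All non-backtracking paths (v_0 = v, ..., v_length) in a simple graph:
    consecutive vertices are adjacent and v_{i+2} != v_i for all i."""
    paths = [(v,)]
    for _ in range(length):
        extended = []
        for p in paths:
            for w in adj[p[-1]]:
                if len(p) >= 2 and w == p[-2]:
                    continue                    # this step would backtrack
                extended.append(p + (w,))
        paths = extended
    return paths

# Petersen graph (3-regular): outer 5-cycle, inner pentagram, spokes.
petersen = {v: [] for v in range(10)}
for u, w in ([(i, (i + 1) % 5) for i in range(5)]
             + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
             + [(i, 5 + i) for i in range(5)]):
    petersen[u].append(w)
    petersen[w].append(u)

for ell in range(1, 6):
    print(ell, len(nonbacktracking_paths(petersen, 0, ell)), 3 * 2 ** (ell - 1))
```

Mapping each path to its endpoint is the covering map $\phi$: distinct tree vertices may have the same image once $\ell$ exceeds half the girth, which is exactly where cycles in $B_k$ enter the argument.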
Proof of Theorem 1.3

Part (ii) was proven in § 3. Let $G_n=(V_n,E_n)$ be a sequence of non-bipartite, finite, connected, $d$-regular asymptotically one-sided Ramanujan graphs satisfying Assumption 1. Let $r_n\to\infty$ be as in Assumption 1. Pick some $k=k_n\to\infty$ such that $k_n^2=o(r_n)$. From this point on we often suppress the dependence on $n$ from our notation. Denote the transition matrix of SRW on $G$ (resp. $G(k)$) by $P$ (resp. $K$) and its stationary distribution by $\pi$ (resp. $\pi_{G(k)}$). Let $A$ be an arbitrary set such that $\pi(A)\le\alpha=\alpha_n:=d^{-3k^2}$. Denote $Q := P^{k+2k^2}$.
Before proceeding with the proof, we explain the choice of $k+2k^2$ in the definition of $Q$. In order to obtain an upper bound on $\lambda_K(A)$ we shall apply Proposition 5.1 with $P^t$ (for some $t$) and $K$ in the roles of $P^{(2)}$ and $P^{(1)}$ (respectively) from Proposition 5.1. The obtained estimate is useful only when $t\ge ck^2$. Heuristically, this is related to the fact that a SRW on a $d$-regular tree is much more likely to be at time $t$ at some given vertex of distance $O(\sqrt{t})$ from its starting point than at some other given vertex at distance $\gg\sqrt{t}$ from its starting point (and we want $k=O(\sqrt{t})$).
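Both tree facts used in this section (the distance $\bar S_t$ concentrates around $\frac{d-2}{d}t$ in a window of order $\sqrt t$, and a fixed vertex at distance $O(\sqrt t)$ is far likelier to be occupied than one at distance $\gg\sqrt t$) can be observed with an exact dynamic program over the law of the distance process. This is our own illustration, with hypothetical helper names; all vertices of a given level are equally likely by symmetry, so the per-vertex probability is the level probability divided by $|L_r| = d(d-1)^{r-1}$.

```python
import numpy as np

def distance_law(d, t):
    """Law of R_t = dist(S_t, o) for SRW (S_t) on T_d started at the root o."""
    p = np.zeros(t + 1)
    p[0] = 1.0
    for _ in range(t):
        q = np.zeros(t + 1)
        q[1] += p[0]                        # every neighbour of the root is farther away
        q[2:] += p[1:-1] * (d - 1) / d      # r -> r + 1 for r >= 1
        q[:-1] += p[1:] / d                 # r -> r - 1 for r >= 1
        p = q
    return p

def vertex_probability(d, t, r):
    """P(S_t = v) for a fixed vertex v at distance r from the root."""
    level_size = 1 if r == 0 else d * (d - 1) ** (r - 1)
    return float(distance_law(d, t)[r]) / float(level_size)

t = 400
law = distance_law(3, t)
radii = np.arange(t + 1)
mean = float((radii * law).sum())
std = float(np.sqrt(((radii - mean) ** 2 * law).sum()))
print(mean, t / 3, std, np.sqrt(t))
print(vertex_probability(3, t, 20), vertex_probability(3, t, 200))
```

Only radii with the parity of $t$ are reachable, so the comparison uses even radii for even $t$.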
Recall that $\rho_d := 2\sqrt{d-1}/d$. Let $\lambda_2$ and $\lambda_2'$ be the second largest eigenvalues of $P$ and $Q$, respectively. Since …, by decreasing $k$ if necessary, we may assume that … By Proposition 3.1 (using the notation from there) and our choice of $\alpha$, …

Let $(S_t)_{t=0}^\infty$ be SRW on $T_d$, the infinite $d$-regular tree rooted at $o$. Denote its transition kernel by $P_{T_d}$. Denote the $i$th level of $T_d$ by $L_i$. Let $\bar S_t$ be the level $S_t$ belongs to. Let $v\in L_k$. Let $T^+_0 := \inf\{t>0 : \bar S_t = 0\}$. Then by Lemma 6.1 (second inequality) … Let $x,y$ be a pair of adjacent vertices in $G(k)$. It is standard that $P^t(x,y)\ge P^t_{T_d}(o,v)$ for all $t$ (where $v$ is as above), and so by (6.2) …

By Proposition 5.1 (and borrowing the notation from there) in conjunction with (6.1), (6.3) and Assumption 1 (which implies that there exists some constant $C_0=C_0(d)>0$ such that $\max_y\deg_{G(k)}(y)/\min_y\deg_{G(k)}(y)\le C_0$ and that if $x,y$ are of distance $k$ in $G$ then $K(x,y)\le C_0(d-1)^{-k}$), we have that … (1)). Denote the probability w.r.t. SRW on $G(k)$ by $\mathbb{P}$. By (3.2) we have for all $t$ (uniformly) that $\max_{(a,A):a\in A,\,\pi(A)\le\alpha}$ … where we have used the fact that $\max_{x\in V}\pi_{G(k)}(x)/\pi(x)\le C_0$, where $C_0$ is as above.
Consider SRW on $G$, $(X_t)_{t=0}^\infty$. Let $T_0 := 0$ and inductively, $T_{i+1} := \inf\{t>T_i : \mathrm{dist}_G(X_t,X_{T_i}) = k\}$. Let $W$ be the transition matrix of $(X_{T_i})_{i\ge 0}$. By Assumption 1 and Lemma 5.2 there exists some constant $C=C(d)$ such that for all $x,y\in V$ of distance $k$ from one another (in $G$), … To conclude the proof (using (3.1) in conjunction with Lemma 1.2), we now show that … (for some $o(1)$ terms). Substituting above $t=\lceil(1+o(1))\frac{d}{d-2}\log_{d-1}|V|\rceil$ and $s=t/\sqrt{k}+t^{2/3}$ (the value $2/3$ in the exponent can be replaced by any number in $(1/2,1)$) yields $\max_{(a,A):a\in A,\,\pi(A)\le\alpha}\mathrm{P}_a[T_{A^c}>t+s]=o(1)$. By (6.6) it suffices to show that for this choice of $s$ and $t$ we have that $\max_{a\in V}\mathrm{P}_a[T_{\tau(t)}>t+s]=o(1)$.
Fix $s$ and $t$ as above. We say that time $j$ is good if $X_j$ has $d-1$ neighbors of greater distance from $X_{T_{i(j)}}$, where $i(j)$ is the index for which $j\in[T_{i(j)},T_{i(j)+1})$. Let $U_i := |\{t\in[T_i,T_{i+1}) : t \text{ is not good}\}|$ and $U := \ldots$ By Assumption 1 we have that $\max_v\mathrm{P}_v[U_0>\ell]\le C'e^{-c\ell}$ for all $\ell$, for some constants $c,C'>0$ (this is left as an exercise). By the Markov property, it follows that … Consider a coupling of the SRW on $G$, $(X_j)_{j=0}^\infty$, with the SRW on $T_d$ started from its root $o$, $(S_j)_{j=0}^\infty$, in which if $j$ is the $\ell$th good time, then $\mathrm{dist}_G(X_{j+1},X_{T_{i(j)}})<\mathrm{dist}_G(X_j,X_{T_{i(j)}})$ iff $\mathrm{dist}_{T_d}(S_{\ell+1},o)<\mathrm{dist}_{T_d}(S_\ell,o)$ (unless $S_\ell=o$, but there is no harm in neglecting this possibility, as the number of returns to $o$ has a Geometric distribution). Using this coupling we get that for all $a\in V$ … To see that $\max_{0\le j\le\lceil t/\sqrt{k}\rceil}\mathrm{P}_o[S_{t+s-j}\in\cup_{i=0}^{\tau(t)+j}L_i]=o(1)$, use the fact that the distance of $S_{t+s-j}$ from $o$ is concentrated around $\frac{d-2}{d}(t+s-j)$ within a window whose length is of order $\sqrt{t}$ (cf. [4] (2.2)-(2.3), p. 9) and that by our choice of $s$ we have that $\frac{d-2}{d}(t+s-j)-(\tau(t)+j)\gg\sqrt{t}$ for all $0\le j\le\lceil t/\sqrt{k}\rceil$.

Lemma 6.1. Let $M$ be the number of paths of length $k+2k^2$ in $\mathbb{Z}$, starting from 0, which end at $k$ and do not return to 0. Then $M\ge c_0\,2^{k+2k^2}/k^2$.
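Lemma 6.1 can be sanity-checked by exact counting. A path from 0 to $k\ge1$ that never revisits 0 must start with a $+1$ step and then stay at heights $\ge1$; by the ballot theorem the number of such paths of length $N$ is $\frac{k}{N}\binom{N}{(N+k)/2}$. The sketch below (our own illustration, hypothetical helper names) counts them both by dynamic programming and by the ballot formula, and tracks $Mk^2/2^{N}$ with $N=k+2k^2$.

```python
from math import comb

def paths_avoiding_zero(N, k):
    """Count +-1 paths of length N in Z from 0 to k >= 1 that never revisit 0.

    Such a path must start with a +1 step and then stay >= 1, so we run a
    dynamic program over the heights 1, 2, ..., N.
    """
    if N < 1 or k < 1 or k > N:
        return 0
    counts = [0] * (N + 2)
    counts[1] = 1                             # after the forced first step
    for _ in range(N - 1):
        new = [0] * (N + 2)
        for x in range(1, N + 1):
            if counts[x]:
                new[x + 1] += counts[x]       # step up
                if x > 1:
                    new[x - 1] += counts[x]   # step down, but never to 0
        counts = new
    return counts[k]

def ballot_count(N, k):
    """Ballot theorem: the same count equals (k / N) * C(N, (N + k) / 2)."""
    if (N + k) % 2:
        return 0
    return k * comb(N, (N + k) // 2) // N

for k in range(1, 8):
    N = k + 2 * k * k
    M = paths_avoiding_zero(N, k)
    print(k, M, M * k * k / 2 ** N)          # Lemma 6.1: bounded below by some c_0 > 0
```

By the local central limit theorem one can check that the ratio $Mk^2/2^{k+2k^2}$ tends to $e^{-1/4}/(2\sqrt{\pi})\approx 0.22$, consistent with the lemma.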