Cutoff for lamplighter chains on fractals

We show that the total-variation mixing time of the lamplighter random walk on fractal graphs exhibits sharp cutoff when the underlying graph is transient (namely, of spectral dimension greater than two). In contrast, we show that such cutoff cannot occur for strongly recurrent underlying graphs (i.e. of spectral dimension less than two).


Introduction
Markov chain mixing rate is an active subject of study in probability theory (see [21,26] and the references therein). Mixing is usually measured in terms of total variation distance, which for probability measures $\mu, \nu$ on a countable set $H$ is $\|\mu - \nu\|_{TV} := \frac{1}{2} \sum_{x \in H} |\mu(x) - \nu(x)|$. Specifically, the ($\epsilon$-)total variation mixing time of a Markov chain $Y = \{Y_t\}_{t \ge 0}$ on the set of vertices of a finite graph $G = (V, E)$, having the invariant distribution $\pi$, is $T_{mix}(\epsilon; G) := \min\{ t \ge 0 : \max_{x \in V(G)} \|P_x(Y_t = \cdot) - \pi\|_{TV} \le \epsilon \}$.
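The two definitions above can be evaluated by brute force on a small chain. The following is a minimal sketch only: the helper names (`tv_distance`, `push_forward`, `mixing_time`, `lazy_cycle`) and the toy example of a lazy walk on an 8-cycle are illustrative assumptions, not objects taken from the paper.

```python
def tv_distance(mu, nu):
    """Total variation distance between two laws given as dicts state -> probability."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)


def push_forward(law, kernel):
    """Law of the chain one step later, for kernel[x][y] = P(x, y)."""
    out = {}
    for x, px in law.items():
        for y, pxy in kernel[x].items():
            out[y] = out.get(y, 0.0) + px * pxy
    return out


def mixing_time(kernel, pi, eps, t_max=100_000):
    """Smallest t with max_x ||P_x(Y_t = .) - pi||_TV <= eps, by direct iteration."""
    laws = {x: {x: 1.0} for x in kernel}  # one point mass per starting state
    for t in range(t_max + 1):
        if max(tv_distance(law, pi) for law in laws.values()) <= eps:
            return t
        laws = {x: push_forward(law, kernel) for x, law in laws.items()}
    raise RuntimeError("no mixing within t_max steps")


def lazy_cycle(n):
    """Lazy simple random walk on the n-cycle (hypothetical toy example, needs n >= 3)."""
    return {x: {x: 0.5, (x + 1) % n: 0.25, (x - 1) % n: 0.25} for x in range(n)}


n = 8
uniform = {x: 1.0 / n for x in range(n)}  # invariant law of the walk on the cycle
t_quarter = mixing_time(lazy_cycle(n), uniform, 0.25)
t_tenth = mixing_time(lazy_cycle(n), uniform, 0.10)
```

The cost grows like the square of the number of states per time step, so this is only practical for toy examples; it is meant to make the definition concrete, not to compute mixing times on wreath products.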
One of the interesting topics in the study of Markov chains is the cutoff phenomenon, mainly for the total variation mixing time (see e.g. [21,Chapter 18]). The study of the cutoff phenomenon for Markov chains was initiated by Aldous, Diaconis and their collaborators in the early 1980s, and there has been extensive work on it in the decades since. Specifically, a sequence of Markov chains $\{Y^{(N)}\}_{N \ge 1}$ on the vertices of finite graphs $\{G^{(N)}\}_{N \ge 1}$ has cutoff with threshold $\{a_N\}_{N \ge 1}$ iff $\lim_{N \to \infty} a_N^{-1} T_{mix}(\epsilon; G^{(N)}) = 1$ for all $\epsilon \in (0, 1)$.
In the (switch-walk-switch) lamplighter Markov chains, each vertex of a locally connected, countable (or finite) graph $G = (V, E)$ is equipped with a lamp (from $\mathbb{Z}_2 = \{0, 1\}$), and a move consists of three steps:
(a). The walker turns on/off the lamp at the vertex where he/she is, uniformly at random.
(b). The walker either stays at the same vertex, or moves to a randomly chosen nearest neighbor vertex.
(c). The walker turns on/off the lamp at the vertex where he/she is, uniformly at random.
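Steps (a) to (c) can be sketched as a short simulation. This is an illustration under stated assumptions rather than the paper's construction: the walker moves with probability 1/2 to a uniformly chosen neighbor (that is, a lazy walk with unit conductances), and the names `lamplighter_step`, `simulate` and the 12-cycle example are hypothetical.

```python
import random


def lamplighter_step(lamps, pos, neighbors, rng):
    """One switch-walk-switch move; returns the new (lamps, position)."""
    lamps = dict(lamps)                      # copy, so the input state is untouched
    lamps[pos] = rng.randint(0, 1)           # (a) refresh the lamp at the current vertex
    if rng.random() < 0.5:                   # (b) assumed laziness: move with probability 1/2
        pos = rng.choice(neighbors[pos])     #     to a uniformly chosen neighbor
    lamps[pos] = rng.randint(0, 1)           # (c) refresh the lamp at the arrival vertex
    return lamps, pos


def simulate(neighbors, start, steps, seed=0):
    """Run the chain from the all-off configuration, recording the visited vertices."""
    rng = random.Random(seed)
    lamps = {v: 0 for v in neighbors}
    pos, visited = start, {start}
    for _ in range(steps):
        lamps, pos = lamplighter_step(lamps, pos, neighbors, rng)
        visited.add(pos)
    return lamps, pos, visited


# Hypothetical example: the cycle on 12 vertices.
cycle = {v: [(v - 1) % 12, (v + 1) % 12] for v in range(12)}
lamps, pos, visited = simulate(cycle, start=0, steps=30)
```

Lamps at vertices the walker has not yet visited are still off, which is the basic reason the mixing time of the lamplighter chain is tied to the cover time of the underlying graph.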
Such a lamplighter chain on the graph $G$ is precisely the random walk on the corresponding wreath product $G^* = \mathbb{Z}_2 \wr G$ (see Section 1.1 for the precise definitions), and the total variation mixing time of a lamplighter chain is closely related to the expected cover time of the underlying graph $G$, denoted hereafter by $T_{cov}(G)$. The study of cutoff for lamplighter chains goes back to Häggström and Jonasson [15], who showed that cutoff does not occur for the chain on one-dimensional tori, whereas for lamplighter chains on complete graphs it occurs at the threshold $a_N = \frac{1}{2} T_{cov}(G^{(N)})$. Peres and Revelle [25] further explore the relation between the mixing time of the lamplighter chain on $G^{(N)}$ and $T_{cov}(G^{(N)})$, showing that, under suitable assumptions, $(\frac{1}{2} + o(1)) T_{cov}(G^{(N)}) \le T_{mix}(\epsilon; \mathbb{Z}_2 \wr G^{(N)}) \le (1 + o(1)) T_{cov}(G^{(N)})$. (1.1) The bounds of (1.1) cannot be improved in general, as the lower and the upper bounds are achieved for complete graphs and two-dimensional tori, respectively. The same bounds apply for any Markov chain on $X \wr G^{(N)}$, where in steps (a) and (c) the walker independently chooses the element from the finite set $X$ according to some fixed strictly positive law. Indeed, for such chains the total variation mixing time has mostly to do with the geometry of late points of $G$, namely those reached by the walker much later than most points. In particular, the lhs of (1.1) represents the need to visit all but $O(\sqrt{\sharp V(G)})$ points before mixing of the lamps can occur, and the rhs reflects having the lamps at the invariant product measure once all vertices have been visited. Miller and Peres [22] provide a large class of graphs for which the lhs of (1.1) is sharp, with cutoff at $\frac{1}{2} T_{cov}(G^{(N)})$. Among those are lazy simple random walks on $d$-dimensional tori, any $d \ge 3$, for which [24] further examines the total-variation distance between the law of late points and i.i.d. Bernoulli points (cf. [24,Section 1] and the references therein).
Finally, the analysis of effective resistance on $G^{(N)} = \mathbb{Z}_N^2 \times \mathbb{Z}_{[h \log N]}$ plays a key role in [11], where it is shown that the threshold $a(h) T_{cov}(G^{(N)})$ for mixing time cutoff of the lamplighter chain on such graphs continuously interpolates between $a(0) = 1$ and $a(\infty) = \frac{1}{2}$.
Another topic of much current interest is the long time asymptotic behavior of random walks $\{X_t\}$ on (infinite) fractal graphs (see [1,18,19] and the references therein). Such random walks are typically anomalous and sub-diffusive, so generically $E_x[d(X_0, X_t)] \asymp t^{1/d_w}$ and the walk dimension $d_w$ exceeds two for many fractal graphs, in contrast to the srw on $\mathbb{Z}^d$ for which $d_w = 2$ (the notation $a_t \asymp b_t$ is used hereafter whenever $c^{-1} a_t \le b_t \le c a_t$ for some $c < \infty$). A related important parameter is the volume growth exponent $d_f$ such that $\sharp B(x, r) \asymp r^{d_f}$, where $\sharp B(x, r)$ counts the number of vertices whose graph distance from $x$ is at most $r$. The growth of the eigenvalues of the corresponding generator is then measured by the spectral dimension $d_s := 2 d_f / d_w$, with the Markov chain $\{X_t\}$ strongly recurrent when $d_s < 2$ and transient when $d_s > 2$ (while $d_f = d_s = d$ for the srw on $\mathbb{Z}^d$).
We study here the cutoff for total variation mixing time of the lamplighter chain when $G^{(N)}$ are increasing finite subsets of a fractal graph. While gaining important insights on the geometry of late points for the corresponding walks, our main result (see Theorem 1.4) is the following dichotomy:
• When $d_s < 2$ there is no cutoff for the corresponding lamplighter chain, whereas
• if $d_s > 2$, such cutoff occurs at the threshold $a_N = \frac{1}{2} T_{cov}(G^{(N)})$.
If the behavior of the lamplighter chain in the critical case $d_s = 2$ is likewise universal, then it should have mixing cutoff at $a_N = T_{cov}(G^{(N)})$ (as in the two-dimensional tori example from [25]).

Framework and main results
Given a countable, locally finite and connected graph $G = (V(G), E(G))$, denote by $d(\cdot, \cdot) = d_G(\cdot, \cdot)$ the graph distance (with $d(x, y)$ the length of the shortest path between $x$ and $y$), and by $B(x, r) = B_G(x, r) := \{y \in V(G) \mid d(x, y) \le r\}$ the corresponding ball of radius $r$ centered at $x$. A weighted graph is a pair $(G, \mu)$ with $\mu : V(G) \times V(G) \to [0, \infty)$ a conductance, namely a function $(x, y) \mapsto \mu_{xy}$ such that $\mu_{xy} = \mu_{yx}$ and $\mu_{xy} > 0$ if and only if $xy \in E(G)$. We use the notation $V(x, r) := \mu(B(x, r))$ and more generally $\mu(A) := \sum_{x \in A} \mu_x$ for $A \subset V(G)$, where $\mu_x := \sum_{y \in V(G)} \mu_{xy}$. The discrete time random walk $X = \{X_t\}_{t \ge 0}$ associated with the weighted graph $(G, \mu)$ is the Markov chain on $V(G)$ having the transition probability $P(x, y) := \mu_{xy} / \mu_x$. Let $P_t(x, y) = P_t(x, y; G) := P_x(X_t = y)$ denote the distribution of $X_t$, with the corresponding heat kernel $p_t(x, y) := P_t(x, y) / \mu_y$ and Dirichlet form $\mathcal{E}(f, g) := \frac{1}{2} \sum_{x, y \in V(G)} (f(x) - f(y))(g(x) - g(y)) \mu_{xy}$. The corresponding effective resistance $R_{eff}(\cdot, \cdot)$ is given by $R_{eff}(A, B)^{-1} := \inf\{ \mathcal{E}(f, f) : f|_A = 1, f|_B = 0 \}$. We also consider the lazy random walk $\tilde{X} = \{\tilde{X}_t\}_{t \ge 0}$ on $(G, \mu)$, having the transition probability $\tilde{P}(x, y) := \frac{1}{2} (P(x, y) + \delta_x(y))$. The Dirichlet form and heat kernel of $\tilde{X}$ are then, respectively, $\tilde{\mathcal{E}}(f, f) = \frac{1}{2} \mathcal{E}(f, f)$ and $\tilde{p}_t(x, y) := \tilde{P}_t(x, y) / \mu_y$.
We consider finite weighted graphs $\{(G^{(N)}, \mu^{(N)})\}_{N \ge 1}$ with $\sharp V(G^{(N)}) \to \infty$. Using hereafter $\cdot^{(N)}$ for objects on $(G^{(N)}, \mu^{(N)})$ (e.g. denoting by $R^{(N)}_{eff}(\cdot, \cdot)$ the effective resistance on $(G^{(N)}, \mu^{(N)})$), we make the following assumptions, which are standard in the study of sub-Gaussian heat kernel estimates (sub-ghke) (cf. [2,19]).
for some Together with the uniform ellipticity, this implies that for somec < ∞ and therebyc To any finite underlying graph G = (V, E) corresponds the wreath product G * = Z 2 ≀ G such that , (g, y)} | f = g and xy ∈ E, or x = y and f (v) = g(v) for v = x} and we adopt throughout the convention of using y = (f, y) for the vertices of Z 2 ≀ G. The lazy random walkX on (G, µ) induces the switch-walk-switch lamplighter chain, namely the random walk Y = {Y t = (f t ,X t )} t≥0 on Z 2 ≀ G whose transition probability is One way to describe the moves of the Markov chain Y is as done before: first Y switches the lamp of the current position, then moves on G according toP , and finally switches the lamp on vertex on which it landed. We denote by )} t≥0 the lamplighter chain on weighted graphs (G (N ) , µ (N ) ), using P * (·, ·; G) whenever we wish to emphasize its underlying graph. The invariant (reversible) distribution of each X (N ) , and its lazy versionX (N ) , is clearly We next state our main result. In Section 3, we adapt to the setting of large finite weighted graphs, certain consequences of Assumptions 1.1 and 1.2 which are standard for infinite graphs. In case d f < d w , the relevant time scale for the cover time τ cov (G (N ) ) is shown there to be Applying in Section 4 results from Section 3 that apply for d f < d w , we derive the following uniform exponential tail decay for τ cov (G (N ) )/T N , which is of independent interest.
Starting with all lamps off, namely at Y 0 = x := (0, x), on the event {sup 0≤s≤t d(X 0 ,X s ) ≤ 1 4 R N }, all lamps outside B (N ) (x, 1 4 R N ) are off at time t. Hence, then P * t (x, ·; G (N ) ) − π * (·; G (N ) ) TV is still far from 0. Using this observation, we prove in Section 5 the following uniform lower bound on the lamplighter chain distance from equilibrium at time t ≍ T N . Proposition 1.6. If Assumptions 1.1, 1.2 hold, then for some finite c 1 , N 1 , any t and N ≥ N 1 , (1.8) In Proposition 5.1 we bound the lhs of (1.8) by max x P x (τ cov (G (N ) ) > t) provided t/S N is large (for S N of (3.10)). Since S N ≍ T N when d f < d w , contrasting Propositions 1.5 and 1.6 yields Theorem ) and lack of concentration of τ cov (G (N ) )/T N ). Propositions 1.6 and 5.1 apply also when d w < d f , but in that case τ cov (G (N ) ) ≥ ♯V (G (N ) ) ≫ T N , and the proof of Theorem 1.4(b), provided in Section 5.2, amounts to verifying the sufficient conditions of [22,Theorem 1.5] for cutoff at 1 2 T cov (G (N ) ). Indeed, the required uniform Harnack inequality follows from the uphi of Assumption 1.2, which as we see in Section 2 is more amenable to analytic manipulations than the Harnack inequality. (Yet, we note that quite recently stability of the (elliptic) Harnack inequalities is proved in [8].)

Cutoff in fractal graphs
We provide here a few examples for which Theorem 1.4 applies, starting with the following.
, is called the Sierpinski gasket graph. It is easy to confirm that if Assumption 1.1(a) holds for weight µ (N ) on G (N ) then such µ (N ) satisfies also Assumption 1.1(b) and Assumption 1.1(c) for d f = log 3/ log 2.
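To make the value $d_f = \log 3/\log 2$ concrete, here is a minimal sketch that builds gasket graphs by the usual recursive rule (the level $n+1$ graph is the union of three translated copies of the level $n$ graph, glued at corner vertices) and counts vertices and edges. The skewed integer coordinates, the function names `gasket_edges` and `vertices`, and the indexing of levels from a single triangle at level 0 are assumptions of this sketch and need not match the paper's normalization of $G^{(N)}$.

```python
def gasket_edges(level):
    """Edge set of a level-`level` Sierpinski gasket graph in skewed integer coordinates."""
    # Level 0: a single triangle with corners (0,0), (1,0), (0,1).
    edges = {
        frozenset({(0, 0), (1, 0)}),
        frozenset({(0, 0), (0, 1)}),
        frozenset({(1, 0), (0, 1)}),
    }
    for n in range(level):
        side = 2 ** n  # side length of the current graph
        copies = set()
        for dx, dy in ((side, 0), (0, side)):  # the two translated copies
            for edge in edges:
                copies.add(frozenset((x + dx, y + dy) for x, y in edge))
        edges |= copies  # the three copies share corner vertices only
    return edges


def vertices(edges):
    """Vertex set spanned by a collection of edges."""
    return {v for edge in edges for v in edge}


counts = [len(vertices(gasket_edges(level))) for level in range(6)]
```

Under this construction the diameter doubles while the number of vertices roughly triples from one level to the next, which is the counting behind $\sharp V(G^{(N)}) \asymp (2^N)^{\log 3/\log 2}$.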
We further prove in Section 2.2 the following. In view of Proposition 2.2 and having d f < d w , we deduce from Theorem 1.4(a) that the total variation mixing time of the lamplighter chains of Example 2.1, admits no cutoff.
Sierpinski gasket graph is similarly defined, and by the same reasoning the corresponding lamplighter chains admit no mixing cutoff. In fact, one can deduce for a more general family of nested fractal graphs (see for instance [16,Section 2] for definition), that no cutoff applies.
We call $F$ the generalized Sierpinski carpet if the following four conditions hold: is connected, and contains a path connecting the hyperplanes $\{x_1 = 0\}$ and For a generalized Sierpinski carpet, let $V^{(0)}$ and $E^{(0)}$ denote the $2^d$ corners of $H_0$ and the $d 2^{d-1}$ edges on the boundary of $H_0$, respectively, with Once again, it is easy to check that if Assumption 1.1(a) holds for the weight $\mu^{(N)}$ on $G^{(N)}$, then such $\mu^{(N)}$ also satisfies Assumptions 1.1(b) and 1.1(c) for $d_f = \log K/\log L$.
We prove in Section 2.2 the following. Whereas directly verifying Assumption 1.2 is often difficult, as shown in Section 2.1, certain conditions from the research on sub-ghke are equivalent to phi and more robust. Indeed those equivalent conditions are key to our proof of Propositions 2.2 and 2.5.
In the context of Example 2.4, for carpets with central block of [4, lhs of (5.9)]), hence by Theorem 1.4(a) no cutoff for the corresponding lamplighter chain. In contrast, from [4, rhs of (5.9)] we know that ρ < 1 for high-dimensional carpets of small central hole (specifically, whenever b d−1 < L d−1 − L), so by Theorem 1.4(b) the corresponding lamplighter chains then admit cutoff at a N = 1 2 T cov (G (N ) ).
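The role of $\rho$ in this dichotomy can be spelled out in one line. The computation below only combines the definition $d_s = 2 d_f/d_w$ from the Introduction with the carpet exponents $d_f = \log K/\log L$ and $d_w = \log(\rho K)/\log L$ quoted in the proof of Proposition 2.5; it uses $\rho K > 1$, which holds because $d_w > 0$.

```latex
d_s \;=\; \frac{2 d_f}{d_w}
    \;=\; \frac{2 \log K / \log L}{\log(\rho K) / \log L}
    \;=\; \frac{2 \log K}{\log K + \log \rho} ,
\qquad\text{so}\qquad
d_s < 2 \iff \rho > 1 ,
\quad
d_s > 2 \iff \rho < 1 .
```

Equivalently, $d_f < d_w$ exactly when $\rho > 1$, which is the strongly recurrent regime of Theorem 1.4(a), while $\rho < 1$ gives the transient regime of Theorem 1.4(b).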

Stability of heat kernel estimates and parabolic Harnack inequality
We recall here various stability results for Heat Kernel Estimates (hke) and Parabolic Harnack Inequalities (phi), in the case of a countably infinite weighted graph $(G, \mu)$. To this end, we assume
• Uniform ellipticity: $c_e^{-1} \le \mu_{xy} \le c_e$ for some $c_e < \infty$ and all $xy \in E(G)$,
• $p_0$-condition: $\mu_{xy}/\mu_x \ge p_0$ for some $p_0 > 0$ and all $xy \in E(G)$,
and recall a few relevant properties of such $(G, \mu)$.
Definition 2.6. Consider the following properties for d w ≥ 2 and d f ≥ 1: for all x ∈ V (G) and r ≥ 1.
Note that in each implication of Theorem 2.7 the resulting values of $(C_D, C_{PI}, \theta, C_{CS})$, $C_{HK}$ and $C_{PHI}$ depend only on $p_0$, $c_e$, $d_w$ and the assumed constants. For example, in (PHI($d_w$)) ⇒ (HKE($d_w$)), the value of $C_{HK}$ depends only on $C_{PHI}$, $p_0$, $c_e$ and $d_w$.
Combining Theorems 2.7 and 2.9 we have the following useful corollary.
Proceeding to construct for each N ≥ 1 a new weighted graph (G, µ ′(N ) ), recall that G (N +1) consists of three copies G (N,i) N ) ) and x ′ are symmetric w.r.t. ℓ N,1 or ℓ N,2 , 0, otherwise. (2.1) , hence from our construction of µ ′(N ) it follows thatũ (N ) (t, x) satisfy the heat equation corresponding to (G, µ ′(N ) ) on the time-space cylinder defined by (y 0 , R, T ). Since G has uniformly bounded degrees, the weighted graphs {(G, µ ′(N ) )} N satisfy a p ′ 0condition (for some p ′ 0 > 0 independent of N ). Further, {(G, µ ′(N ) )} N are uniformly rough isometric to (G, µ) (thanks to the uniform ellipticity of µ (N ) ). Hence, by Corollary 2.10, for some C ′ PHI < ∞, which does not depend on N , nor on the specific choice of y 0 , R and T , 2) by u (N ) and B (N ) (y 0 , R), respectively, may only decrease its lhs and increase its rhs. That is, (2.2) applies also for u (N ) (·, ·) and B (N ) (y 0 , R). This holds for all N and any of the preceding choices of y 0 , R, T , yielding Assumption 1.2, as stated.
Proof of Proposition 2.5. Consider the random walk, namely µ xy ≡ 1, on a limiting graph G that corresponds to a generalized Sierpinski carpet, as in Example 2.4. Clearly, (G, µ) is uniformly elliptic and of uniformly bounded degrees (so p 0 -condition holds as well). Further, such random walk has properties (V(d f )) and (HKE(d w )), with d f = log K/ log L ≥ 1 and d w = log(ρK)/ log L (see [3]). In particular, by Theorem 2.7 (G, µ) satisfies (PHI(d w )). With G (N +1) consisting of K copies of G (N ) , we extend the given weight µ (N ) on G (N ) to a weight µ ′(N ) on G. Specifically, the weight on the edges of the reflected part of G (N ) , as in Figure 4, is µ where K e ∈ [1, K] is the number of overlaps of e, and e ′ is the edge which moves to e by the reflection (so in Figure 4, we set µ .

(3.2)
Proof. (Sketch:) This is a finite graph analogue of (PHI(d w ))⇒ (HKE(d w )) of Theorem 2.7, which is standard for a countably infinite weighted graph (see [ Proof. Using the same arguments as in the proof of [20,  confined to certain balls, having our sub-hke restricted to t ≤ ηT N is immaterial here. Another consequence of (3.1) is the following upper bound on uniform mixing times.
For the proof of Proposition 3.3, consider the normalized Dirichlet forms ofX (N ) and X (N ) , N ) ) and define the spectral quantities Recall the following upper bound on uniform mixing times in terms of the corresponding spectral profile. Then, for any ǫ > 0 and all N , .
Our next lemma controls the spectral profiles on the rhs of (3.5) en-route to Proposition 3.3.
We conclude with a very useful covering property.
Proposition 3.6. Assumption 1.1 implies that for any η ∈ (0, 1], there exist N ) ). Proof. Covering V (G (N ) ) by a single ball of radius R N , thanks to (1.4) and the assumed d f -set con- N ) ) and we conclude that L ≤ (c c v ) 2 (2/η) d f for all N , as claimed.

Strongly recurrent case: d f < d w
A consequence of Assumptions 1.1, 1.2 for d f < d w is the following relation between the resistance metric and the graph distance.  The following corollary of Proposition 3.7 is immediate. Then, for some finite c ⋆ s (x, y) with q t (s) the probability that a Binomial(t, 1/2) equals s. Consequently,g (N ) (x, y) ≤ 2g (N ) (x, y) (since t q t (s) = 2). We further replace T U mix (ǫ; G (N ) ) in (3.11) by ηT N , for η := c(ǫ) of Proposition 3.3. Hence, from (3.1) for some c HK = c HK (η), all N and x = y, .
Since $d_f/d_w > 1$, the series on the rhs converges (even when $d^{(N)}(x, y) = 0$), and it is easy to further bound it by $c'_g \, d^{(N)}(x, y)^{d_w - d_f}$ for some finite $c'_g = c'_g(c_{HK})$, as we claim in (3.11).
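The identity $\sum_t q_t(s) = 2$ invoked in the preceding argument, where $q_t(s) = \binom{t}{s} 2^{-t}$ is the probability that a Binomial$(t, 1/2)$ variable equals $s$, is easy to check numerically. The sketch below does so with exact rational arithmetic; the function names `q` and `partial_sum` and the truncation level are choices made for this illustration only.

```python
from fractions import Fraction
from math import comb


def q(t, s):
    """Probability that a Binomial(t, 1/2) random variable equals s."""
    return Fraction(comb(t, s), 2 ** t)


def partial_sum(s, t_max):
    """Exact value of sum_{t=s}^{t_max} q_t(s); increases to 2 as t_max grows."""
    return sum(q(t, s) for t in range(s, t_max + 1))


gaps = [float(2 - partial_sum(s, 400)) for s in range(6)]
```

In words, the lazy walk at time $t$ has made a Binomial$(t, 1/2)$ number of genuine steps, so summing its transition probabilities over $t$ counts each genuine step of the non-lazy walk with total weight two.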
Cover time: Proof of Proposition 1.5
We recall $S_N$, $r(G^{(N)})$ of (3.10) and use the following notations for $x, y \in V(G^{(N)})$, $r \in [0, 1]$, We show in Lemma 4.1 that for some $\epsilon' > 0$, with positive probability, during its first $S_N$ steps, a random walk on $G^{(N)}$ makes at least $\epsilon' r(G^{(N)})$ visits to the starting point.
. Proposition 1.5 then follows by using this fact, the Markov property and having S N ≍ T N (see Corollary 3.8).
We now implement the details of the preceding proof strategy. , form a partial sum, whose i.i.d. N-valued increments {η (N ) )] we thus have by Markov's inequality that With our graphs having uniform volume growth, [10, Theorem 1.4] applies here, giving the following modulus of continuity result. Proof.
Step 2. Turning to prove (4.2) when z = x, let τ = x} denote the first hitting time of x ∈ V (G (N ) ) by the random walk. Recall the commute time identity (see [21,Proposition 10.6]), that for any N and x = z in V (G (N ) ), Hence, so by the strong Markov property at τ , we see that for any z ∈ V (G (N ) ), Proof. Taking κ > 0 as in Proposition 4.3, we have that for all N and x, z ∈ V (G (N ) ), Applying the Markov property at times {4iS N } for i = 1, . . . , k − 1, it follows that is non-decreasing.

Lamplighter mixing: Theorem 1.4 and Proposition 1.6
Proof of Proposition 1.6. wlog we may and do assume that x 0 = (0, x 0 ) for some x 0 ∈ V (G (N ) ). Let where taking r N := ⌈(2d fc c v log 2 R N ) 1/d f ⌉ we have thanks to (1.4) and the d f -set condition, that By the same reasoning ♯V (G (N ) ) ≤c c v (R N ) d f , so for the invariant distribution π * (·; G (N ) ) of the lamplighter chain Y (N ) on Z 2 ≀ G (N ) Part of our d f -set condition is having R N → ∞, so there exists N 1 finite such that R 0 ≤ r N ≤ 1 4 R N for R 0 of Corollary 3.2 and any N ≥ N 1 . Since max y {d (N ) (x 0 , y)} ≥ 1 2 R N for any x 0 ∈ V (G (N ) ), whenever N ≥ N 1 the eventΓ Consequently, for any such N we have by (5.1) that which together with (5.2) completes the proof.
As shown next, at t ≫ S N the lazy walk is near equilibrium (in total variation), and the total variation distance of P * t (x, ·; G (N ) ) from its equilibrium law is then controlled by the tail probabilities of τ cov (G (N ) ).
Remark 5.2. In view of Proposition 1.5, here $T_{mix}(\epsilon; \mathbb{Z}_2 \wr G^{(N)})/T_{cov}(G^{(N)}) \gg 1$ for small $\epsilon$. From Section 4 we also learn that, when $d_f < d_w$, the lamplighter chains have no mixing cutoff mainly because the laws of $\tau_{cov}(G^{(N)})/T_{cov}(G^{(N)})$ do not concentrate as $N \to \infty$ (unlike the transient case of $d_f > d_w$).