Quenched invariance principle for a class of random conductance models with long-range jumps

We study random walks on $\mathbb{Z}^d$ (with $d\ge 2$) among stationary ergodic random conductances $\{C_{x,y}:x,y\in\mathbb{Z}^d\}$ that permit jumps of arbitrary length. Our focus is on the quenched invariance principle (QIP), which we establish by a combination of corrector methods, functional inequalities and heat-kernel technology, assuming that the $p$-th moment of $\sum_{x\in\mathbb{Z}^d}C_{0,x}|x|^2$ and the $q$-th moment of $1/C_{0,x}$ for $x$ neighboring the origin are finite for some $p,q\ge 1$ with $p^{-1}+q^{-1}<2/d$. In particular, a QIP thus holds for random walks on long-range percolation graphs with connectivity exponents larger than $2d$ in all $d\ge 2$, provided all the nearest-neighbor edges are present. Although still limited by moment conditions, our method of proof is novel in that it avoids proving everywhere-sublinearity of the corrector. This is relevant because we show that, for long-range percolation with exponents between $d+2$ and $2d$, the corrector exists but fails to be sublinear everywhere. Similar examples are also constructed for nearest-neighbor, ergodic conductances in $d\ge 3$ under the conditions complementary to those of the recent work of Bella and Schäffner (Ann Probab 48(1):296–316, 2020). These examples elucidate the limitations of elliptic-regularity techniques that underlie much of the recent progress on these problems.


INTRODUCTION
Random walks among random conductances have seen much interest in recent years. The term "random walk" actually refers to a Markov chain whose states will be confined, for the purpose of the present paper, to the $d$-dimensional hypercubic lattice $\mathbb{Z}^d$ and whose transition probabilities $P(x,y)$ are determined by a collection $\{C_{x,y} : x,y\in\mathbb{Z}^d\}$ of non-negative numbers via
$$P(x,y) := \frac{C_{x,y}}{\pi(x)}, \qquad \pi(x) := \sum_{y\in\mathbb{Z}^d} C_{x,y}, \tag{1.1}$$
where $\pi(x)\in(0,\infty)$ is assumed for all $x\in\mathbb{Z}^d$. The symmetry condition
$$C_{x,y} = C_{y,x}, \qquad x,y\in\mathbb{Z}^d, \tag{1.2}$$
is imposed and the common value is called the conductance of the unordered edge $\langle x,y\rangle$. As is easily checked, $\pi$ is then a reversible measure for the chain. The setting naturally includes the cases when only nearest-neighbor jumps occur, i.e., those for which $C_{x,y} := 0$ whenever $x$ and $y$ are not nearest neighbors in $\mathbb{Z}^d$ (this includes $x=y$).
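To make the construction concrete, the following minimal sketch builds the kernel from a symmetric family of conductances and checks reversibility numerically. It assumes the standard normalization P(x, y) = C_{x,y}/π(x) with π(x) = ∑_y C_{x,y}, and it replaces Z^d by a small discrete torus so that all sums are finite; the helper names (`torus_sites`, `transition_kernel`, `torus_dist`) and the power-law choice of conductances are hypothetical and serve only as an illustration.

```python
from itertools import product

def torus_sites(n, d):
    """All sites of the discrete torus (Z/nZ)^d, a finite stand-in for Z^d."""
    return list(product(range(n), repeat=d))

def torus_dist(x, y, n):
    """Euclidean length of the shortest wrap-around displacement between x and y."""
    return sum(min((a - b) % n, (b - a) % n) ** 2 for a, b in zip(x, y)) ** 0.5

def transition_kernel(conductance, sites):
    """Return (pi, P) with pi(x) = sum_y C(x, y) and P(x, y) = C(x, y) / pi(x).

    `conductance(x, y)` must be symmetric and non-negative, and pi(x) must be
    positive at every site (both are assumptions of this sketch).
    """
    pi = {x: sum(conductance(x, y) for y in sites) for x in sites}
    P = {x: {y: conductance(x, y) / pi[x]
             for y in sites if conductance(x, y) > 0}
         for x in sites}
    return pi, P

# Illustrative (deterministic) choice: C(x, y) = |x - y|^{-(d + alpha)}, C(x, x) = 0.
n, d, alpha = 5, 2, 1.5
sites = torus_sites(n, d)

def conductance(x, y):
    r = torus_dist(x, y, n)
    return 0.0 if r == 0 else r ** (-(d + alpha))

pi, P = transition_kernel(conductance, sites)

# Detailed balance pi(x) P(x, y) = pi(y) P(y, x) is what makes pi reversible.
max_defect = max(abs(pi[x] * P[x][y] - pi[y] * P[y][x])
                 for x in sites for y in P[x])
```

Symmetry of the conductances is the only input to the detailed-balance check: π(x)P(x, y) equals C_{x,y} by construction, so the defect vanishes up to rounding.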
© 2020 M. Biskup, X. Chen, T. Kumagai and J. Wang. Reproduction, by any means, of the entire article for non-commercial purposes is permitted without charge.
Many "ordinary" random walks are naturally covered by the above setting; notably, the simple random walk when C x,y is set to one for nearest neighbors x and y and zero otherwise, or random walks with α-stable tail when C x,y := |x−y| −(d+α) , where α ∈ (0, 2) and |x| denotes the Euclidean norm of x. Our interest here is in the situation when {C x,y : x, y ∈ Z d } is itself random. Writing P for the law of the conductances and E for its expectation, we impose: Assumption 1.1 Throughout we assume: (1) P is stationary and jointly ergodic with respect to the shifts of Z d .
In this framework we then ask what conditions guarantee various properties known for the "ordinary" random walks with symmetric jumps, e.g., lack of speed, recurrence/transience, etc. Here we will focus on the validity of an Invariance Principle, i.e., convergence of the path law to Brownian motion under the diffusive scaling of space and time.
The so-called Annealed (or Averaged) Invariance Principle (AIP) has been known since the late 1980s (Kipnis and Varadhan [35], De Masi, Ferrari, Goldstein and Wick [30,31]). The adjective "annealed" refers to the convergence taking place for a joint law of the chain and the environment. With Assumption 1.1 in force, this convergence was shown under the moment conditions (1.3). These are directly linked to the limiting Brownian motion having finite and positive variance and so, in this sense, can be regarded as optimal.
Much effort in the past 15 years went to derivations of Individual or Quenched Invariance Principle (QIP) where the convergence to Brownian motion takes place for a.e. sample of the random conductances. The influential initial study by Sidoravicius and Sznitman [43], where a QIP was proved for all uniformly-elliptic nearest-neighbor conductances, elucidated the need for additional ingredients compared to AIP; namely, estimates on the heat kernel. Analyses of the simple random walk on supercritical percolation clusters (Sidoravicius and Sznitman [43], Berger and Biskup [14], Mathieu and Piatnitski [41]) then paved the way to a complete resolution of all i.i.d. nearest-neighbor random conductance models (Mathieu [40], Biskup and Prescott [23], Barlow and Deuschel [11] and Andres, Barlow, Deuschel and Hambly [3]).
Compared to the i.i.d. cases, our understanding of general non-uniformly elliptic conductances remains only partial and often restricted to special cases. For nearest-neighbor models satisfying Assumption 1.1, the restriction may come as a limitation on the spatial dimension: Indeed, as shown in Biskup [19, Exercise 4.4 and Theorem 4.7], a QIP holds true in d = 1, 2 whenever the moment conditions (1.4) hold. These are deemed sharp in light of (1.3), although examples violating (1.4) exist for which QIP fails yet AIP holds (Barlow, Burdzy and Timár [10]). Another way to limit the form of the distribution is through decay of correlations. Indeed, Procaccia, Rosenthal and Sapozhnikov [42] proved a QIP in correlated percolation models subject to technical conditions on correlation decay.
A third type of restriction comes via moment conditions on individual (nearest-neighbor) conductances. These can be expressed by means of numbers $p,q\ge 1$ such that
$$|x-y|=1 \;\Rightarrow\; C_{x,y}\in L^p(P) \ \text{ and } \ \frac{1}{C_{x,y}}\in L^q(P). \tag{1.5}$$
For these Andres, Deuschel and Slowik [5] proved a QIP under the condition $1/p+1/q<2/d$. This extends to a local QIP [6] and, under different moment conditions, also to the control of the heat kernel [7]. Bella and Schäffner [13] recently improved the methods of [5] and gave a proof of a QIP under a slightly weaker condition
$$\frac1p+\frac1q<\frac{2}{d-1}. \tag{1.6}$$
We will show that this is, in fact, infinitesimally close to sharp, at least for the method of proof used (in the nearest-neighbor conductance models).
The main goal of the present paper is to push the control of a QIP to include models with arbitrarily large jumps. We will work under the following moment assumption:
Assumption 1.2 Assume $d\ge 2$ and that there are $p,q\in(1,\infty)$ satisfying
$$\frac1p+\frac1q<\frac2d \tag{1.7}$$
such that
$$\sum_{x\in\mathbb{Z}^d} C_{0,x}|x|^2\in L^p(P) \tag{1.8}$$
and
$$\frac{1}{C_{0,x}}\in L^q(P) \quad\text{whenever } |x|=1. \tag{1.9}$$
In particular, $C_{0,x}>0$ for all $|x|=1$ $P$-a.s.
Assumption 1.2 is a direct extension of the conditions from the work [5], which is thus subsumed by the present paper (albeit with rather different proofs). We note that $C_{x,y}>0$ for all nearest neighbors $x$ and $y$ ensures that the underlying Markov chain is irreducible.

MAIN RESULTS
We will invariably work with random collections of conductances {C x,y = C y,x : x, y ∈ Z d } such that π(x) ∈ (0, ∞) for a.e. sample from P. This ensures that the transition probability in (1.1) is well defined almost surely in all cases of interest. We will write Z := {Z n : n ≥ 0} for the paths of the associated discrete-time Markov chain and use P x to denote its law subject to the initial condition P x (Z 0 = x) = 1.

QIP for general conductances.
As noted above, our main point of interest is the validity of the Quenched Invariance Principle, or QIP for short, which we formalize as follows: We will say that a QIP holds if for each T > 0 and P-a.e. realization of the conductances, the law of B (n) induced on C([0, T ]) by P 0 tends weakly, as n → ∞, to that of a Brownian motion whose covariance is non-degenerate and constant a.s.
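The diffusive scaling behind this notion can be illustrated by a small helper. The sketch below assumes the usual convention that B^(n)(t) is the path of the discrete-time chain Z, linearly interpolated between integer times, evaluated at time nt and divided by √n; the function name `rescaled_path` and the list-of-tuples representation of the path are hypothetical.

```python
import math

def rescaled_path(Z, n, t):
    """Evaluate the diffusively rescaled, linearly interpolated path at time t.

    Z[k] is the lattice position (a tuple) after k steps. With k = floor(n*t)
    the returned point is (Z[k] + (n*t - k) * (Z[k+1] - Z[k])) / sqrt(n).
    This interpolation convention is an assumption of the sketch.
    """
    s = n * t
    k = math.floor(s)
    frac = s - k
    last_needed = k + (1 if frac > 0 else 0)
    if k < 0 or last_needed >= len(Z):
        raise ValueError("path too short for the requested time")
    nxt = Z[k + 1] if frac > 0 else Z[k]
    return tuple((a + frac * (b - a)) / math.sqrt(n) for a, b in zip(Z[k], nxt))

# A short nearest-neighbor path in Z^2, rescaled with n = 4.
Z = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
midpoint = rescaled_path(Z, 4, 0.375)   # halfway between Z[1] and Z[2], divided by 2
endpoint = rescaled_path(Z, 4, 1.0)     # Z[4] divided by 2
```

Linear interpolation is what places the rescaled path in C([0, T]) rather than in a space of right-continuous paths.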
Our main result is then: Theorem 2.2 A QIP holds under Assumptions 1.1 and 1.2.
As noted earlier, for nearest neighbor conductances that are bounded from below, our results degenerate to those of [5]. An interesting corollary arises in the context of random walks on a family of long-range percolation graphs. These graphs are obtained from a "nice" underlying graph, in our case Z d , by adding edges independently with probability that depends only on the displacement between the endpoints. While this probability is typically assumed to decay as a power of the distance, our formulation only requires a summability condition.
consider a random graph with vertices Z d and an (unoriented) edge between x and y present with probability p(y − x), independently of all other edges. Then a QIP holds for the simple random walk on this graph.
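A finite-box version of this random graph is easy to sample, which can help build intuition for the connectivity and degree properties discussed next. The sketch below is only illustrative: the function name `sample_lrp_edges`, the restriction to the box {0, ..., n−1}^d and the specific choice p(z) = min(1, |z|^{−s}) for |z| > 1 are assumptions, not the setting of the corollary itself.

```python
import random
from itertools import product

def sample_lrp_edges(n, d, s, rng):
    """Sample long-range percolation edges on the box {0, ..., n-1}^d.

    Hypothetical connection probability: p(z) = 1 when |z| = 1 and
    p(z) = min(1, |z|^{-s}) otherwise, with |z| the Euclidean norm.
    Each unordered pair is examined once, independently of all others.
    Returns (sites, edges), where edges is a set of frozensets {x, y}.
    """
    sites = list(product(range(n), repeat=d))
    edges = set()
    for i, x in enumerate(sites):
        for y in sites[i + 1:]:
            r = sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
            prob = 1.0 if r == 1 else min(1.0, r ** (-s))
            if rng.random() < prob:
                edges.add(frozenset((x, y)))
    return sites, edges

rng = random.Random(0)
sites, edges = sample_lrp_edges(4, 2, 5.0, rng)

# Degree of each site in the sampled graph.
degree = {x: sum(1 for e in edges if x in e) for x in sites}
```

Because nearest-neighbor pairs are connected with probability one, every sample contains the lattice edges of the box and is therefore connected.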
The graphs in Corollary 2.3 are automatically connected (because p(x) = 1 for |x| = 1) and of finite degree at every vertex (as ensured by ∑_{x∈Z^d} p(x) < ∞). Restricting attention to power-law decaying connection probabilities with decay exponent s, our results show that a QIP holds if s > 2d. In the regime s ∈ (d, d + 2), the walk on the long-range percolation graph is supposed to scale to a stable process with index α := s − d. This was proved for α ∈ (0, 1) in all d ≥ 1 by Crawford and Sly [26,27] under the L^r-space topology for any r ∈ [1, ∞) (which is weaker than the Skorohod topology). In d = 1 the regime when a QIP holds extends to all α > 1, i.e., even beyond the summability of ∑_{x∈Z^d} |x|^2 p(x), cf. [27, Theorem 1.2] (see also Kumagai and Misumi [37, Theorem 2.2] concerning heat kernels). This is due to the absence of percolation and the existence of cut-points. A corrector-based approach exists as well (Zhang and Zhang [44]).
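The threshold 2d for power-law connection probabilities can be traced back to the moment condition (1.8) by a short computation. The following sketch assumes $p(x)\asymp|x|^{-s}$ for large $|x|$ and uses a Rosenthal-type moment bound for sums of independent non-negative random variables (with a constant $c_p$ depending only on $p$); it is a heuristic consistency check rather than the argument of the paper.

```latex
% Conductances: C_{0,x} are independent Bernoulli(p(x)) with C_{0,x} = 1 for |x| = 1.
% Hence 1/C_{0,x} = 1 for |x| = 1, so (1.9) holds for every q, and taking q large
% reduces the requirement 1/p + 1/q < 2/d to p > d/2.
\[
  S := \sum_{x\in\mathbb{Z}^d} C_{0,x}\,|x|^2,
  \qquad
  \sum_{x} p(x)\,|x|^{2p}
  \;\le\; E\bigl(S^{p}\bigr)
  \;\le\; c_p \max\Bigl\{\Bigl(\sum_{x} p(x)\,|x|^{2}\Bigr)^{p},\;
                         \sum_{x} p(x)\,|x|^{2p}\Bigr\}.
\]
% The lower bound uses S^p >= sum_x C_{0,x} |x|^{2p}, valid for p >= 1 and C_{0,x} in {0,1}.
% With p(x) \asymp |x|^{-s}:
\[
  \sum_{x\in\mathbb{Z}^d} |x|^{2p-s} < \infty
  \quad\Longleftrightarrow\quad
  s > d + 2p .
\]
% Given s > 2d, any p with d/2 < p < (s-d)/2 therefore yields S \in L^p(P).
```

Conversely, for s ≤ 2d no admissible p > d/2 satisfies s > d + 2p, which is consistent with the regime s ∈ (d + 2, 2d] being left open.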
We note that, in the regime s ∈ (d, 2d), the long-range percolation graph is rather different from Z d . Indeed, as shown by Biskup [17,18] and Biskup and Lin [21], the graph distances grow polylogarithmically with the Euclidean distance and balls in the intrinsic metric thus exhibit stretched-exponential volume growth. When s = 2d, the scaling of intrinsic metric relative to Euclidean one is only polynomial, but with exponents strictly less than one (Ding and Sly [32]). Notwithstanding, for s > d + 2, these do not seem to affect the asymptotic of the random walk, at least at the level of AIP.

Lack of everywhere sublinearity.
Our second set of results addresses limitations of the techniques presently used for proofs of the QIP. This requires introduction of the basic object of stochastic homogenization, the so-called corrector χ. Consider the generator L := P − id associated with the discrete-time Markov kernel P. Explicitly, L acts on finitely-supported f : Z d → R via (2.4). Under Assumption 1.1, the conditions (1.3) permit the construction of a random function χ : Z d → R d which is characterized by the following properties: (1) normalization χ(0) = 0, (2) stationarity of increments under the shifts of Z d , together with harmonicity of the function Ψ(x) := x + χ(x) (2.6) in the sense that LΨ = 0. (For this reason, Ψ is sometimes referred to as "harmonic coordinate.") We refer to, e.g., Biskup [19, Proposition 3.7] for a detailed exposition and proofs of this otherwise completely classical material.

Remark 2.4
In all QIPs discussed in this paper, the covariance matrix Σ = (Σ i j ) of the limiting Brownian motion is related to the corrector via Then there is a law P on nearest-neighbor conductances satisfying Assumption 1.1(1) and |x| = 1 ⇒ C 0,x ∈ L p (P) and for which the corrector is (well defined yet) not sublinear everywhere.
Modulo a boundary case, condition (2.11) is complementary to (1.6) under which Bella and Schäffner [13] proved that the corrector is sublinear everywhere and thus a QIP holds. Theorem 2.6 thus makes it unlikely that the elliptic-regularity methods underlying [5,13] would yield a proof of a QIP under the presumably optimal conditions (1.4).

Main ideas.
Although our proofs are based on a combination of the corrector method with heat-kernel technology, our strategy is somewhat different from that used in proofs of QIPs so far. Through the use of functional inequalities we first control the first exit times of the walk from large balls. These are used to prove tightness of diffusively-scaled Markov-chain paths. The proof of a QIP then boils down to the proof of a quenched CLT. For this we use the corrector method but, since this is "just" a CLT, with everywhere sublinearity replaced by sublinearity on average.
As usual, we work primarily with continuous-time versions of our random walk. A key innovation is the use of ν(x) := ∑_{y∈Z^d} C_{x,y} |x − y|^2 (2.13) as the time-change measure for the walk. The need for this particular normalization was discovered in the derivation of off-diagonal heat-kernel estimates using the so-called Davies method; cf. the proof of Proposition 3.7. This measure also appears naturally in estimates on Dirichlet forms of spatially-mollified functions; see the proof of Proposition 3.3.
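For a concrete feel for the measure in (2.13), the sketch below evaluates π(x) and ν(x) at a single site from a list of its positive conductances. The function name `pi_and_nu` and the dictionary layout of the input are hypothetical choices made for illustration.

```python
def pi_and_nu(conductances, x):
    """Return (pi(x), nu(x)) for a site x.

    `conductances` maps a site x (a tuple) to a dict {y: C_xy} listing its
    positive conductances; this data layout is an assumption of the sketch.
    pi(x) = sum_y C_xy and nu(x) = sum_y C_xy * |x - y|^2 (Euclidean norm).
    """
    pi = sum(conductances[x].values())
    nu = sum(c * sum((a - b) ** 2 for a, b in zip(x, y))
             for y, c in conductances[x].items())
    return pi, nu

# One-dimensional example: two unit nearest-neighbor edges and one long edge.
conductances = {(0,): {(1,): 1.0, (-1,): 1.0, (3,): 0.5}}
pi0, nu0 = pi_and_nu(conductances, (0,))   # pi0 = 2.5, nu0 = 1 + 1 + 0.5 * 9 = 6.5
```

For purely nearest-neighbor conductances the two quantities coincide, so ν differs from π only through the weight |x − y|^2 that long jumps carry.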
Unlike the recent work [5,6] on QIPs under moment conditions, in order to control the heat kernel and exit probabilities we do not use complicated inductive schemes such as Moser or De Giorgi iterations. Instead, we base our argument on localization, which amounts to restricting jumps larger than unity to only a finite "active" ball, and truncation, by which we discard jumps larger than a constant (called κ below) multiple of the "active" ball radius. The localization helps us control "small" jumps using Sobolev inequalities and standard techniques from heat kernel theory. The contribution of "large" jumps is managed with the help of the so-called Meyer's construction. While part of the ideas are drawn from a recent paper by some of the authors [25], their extension to the present context requires new ideas and non-trivial generalizations.
Although we do not address everywhere sublinearity of the corrector in our parameter regime, we suspect that it does hold under Assumptions 1.1 and 1.2. The counterexamples in the nearest-neighbor case are strongly inspired by analogous examples of i.i.d. nearest-neighbor conductances (Mathieu and Remy [29], Berger, Biskup, Hoffman and Kozma [15], Biskup and Boukhadra [20]) for which the return probabilities exhibit strongly subdiffusive decay while the path distribution still scales to a non-degenerate Brownian motion. The key mechanism there is trapping.

Open problems.
We finish by stating a few open problems that naturally build on the results of the present note. As a starter, we pose:

(2.14)
where k 1 is the probability density of a centered normal with covariance (2.8).
We believe that this holds under the same conditions as a CLT by analogy with the nearest-neighbor situations in [5,6]. Andres, Chiarini and Slowik [4] recently extended the iteration methods underlying [5,6] to non-elliptic situations.
Another extension that we believe should be possible by a reasonably straightforward extension of the methods of the present work is the content of: 2) with s > 2d and p(x) := p when |x| = 1 for some p ∈ [0, 1). Assume that the random graph with vertex set Z d and an edge between x and y present with probability p(y − x), independently of other edges, contains an infinite connected component C ∞ a.s. Prove that the simple random walk on C ∞ obeys a QIP.
Here the key challenge is the potential absence (as even p = 0 is allowed) of nearest-neighbor edges in the computations involving Dirichlet forms in our proofs. Barlow [9] and the recent work of Flegel, Heida and Slowik [33] provide good possible starting points.
In light of Theorems 2.5 and 2.6, a different strategy than used so far is needed to get a QIP beyond the regime marked by (1.6) or (1.8) and, in particular, for long-range percolation graphs with decay exponents s ∈ (d + 2, 2d]. Here we propose to start with: Problem 2.9 Consider the long-range percolation graph with exponents s ∈ (d + 2, 2d] and p(x) = 1 for |x| = 1. Prove a QIP.
The requirement p(x) = 1 ensures that the underlying graph is connected. A key obstacle is thus the lack of the Sobolev inequalities underlying our proofs. Although the corrector fails to be everywhere sublinear in these cases, this is not an obstacle for our approach, for which sublinearity on average is sufficient. We find it worthwhile to start by addressing the non-percolating regime, i.e., the situations when, upon removal of the nearest-neighbor edges, the graph does not contain an infinite connected component a.s.
A considerably more robust way to go beyond the p, q-conditions would be to prove corrector sublinearity along typical paths of the Markov chain. This is, in fact, what underlies the known proofs of the AIP under the optimal conditions (1.3) and even the present paper goes part of the way along this line. Ba and Mathieu [8] have been able to utilize this strategy to prove a QIP for a continuum diffusion in a random environment subject to a periodicity requirement. Our last question, which is undoubtedly the most ambitious one, concerns the random walk on one-dimensional long-range percolation graphs (i.e., the setting of Corollary 2.3, (1,2) and (2.2)). Indeed, as noted above, there we get (s − 1)-stable process convergence when s ∈ (1, 2) and a Brownian limit when s > 2.
Problem 2.10 Prove (quenched or annealed) convergence for suitably scaled random walk on one-dimensional long-range percolation graphs for s = 2.
We conjecture that all the α-stable limits with α ∈ (1, 2) somehow appear for s = 2. If so, we would expect that the index of stability depends on the precise asymptotic of the connection probabilities; i.e., on β := lim |x|→∞ |x| 2 p(x). Since the 1/r 2 -percolation model exhibits several phase transitions (cf Aizenman and Newman [1], Imbrie and Newman [34]), the dependence of α on β may even undergo interesting phase transitions as well.

FUNCTIONAL INEQUALITIES AND HEAT-KERNEL ESTIMATES
We will now move to the exposition of the proofs. In this section, we develop the main technical ingredients underlying the proof of QIP in Theorem 2.2. We start by introducing continuous-time versions of our discrete-time Markov chains.

Continuous time processes.
Recall that Z := {Z n : n ≥ 0} denotes the discrete-time process on Z d with transition probabilities P(x, y) and associated stationary measure π as defined in (1.1). We will consider two continuous-time variants of Z. The first one is the canonical variable-speed chain X := {X t : t ≥ 0} (the VSRW), obtained from Z by taking jumps at independent exponential times whose parameter at x is π(x). The process X is then a continuous-time Markov chain on Z d with the generator which, we note, coincides with that in (2.4). The counting measure µ(x) := 1 on Z d is stationary and reversible for X . Hence, the Dirichlet form (D, F ) associated with the process X is given by Here, for any p ∈ [1, ∞) and any measure λ on Z d , let ℓ^p(λ) denote the space of p-integrable functions f : Z d → R and denote by ‖f‖_{ℓ^p(λ)} the corresponding ℓ^p-norm.
Our second, and more important, continuous-time chain Y := {Y t : t ≥ 0} will be a time-change of the process X defined as follows: and Y is thus reversible with respect to ν. (Alternatively, Y can be defined directly from Z and independent exponentials that at x have parameter π(x)/ν(x).) In particular, the Dirichlet form ( D, F ) associated with the process Y is given by (3.5) We will henceforth think of the chains Z, X and Y as defined on the same probability space, and write P x for the joint law of their paths where (each) chain is at x at time zero a.s. We will use E x to denote expectation with respect to P x . The random processes X , Y and Z on Z d naturally induce corresponding random processes on the space of random environments, via the "point of view of the particle." These are stationary and reversible with respect to the measures Q X , Q Y and Q Z , respectively, defined by where ω denotes a generic element from the sample space carrying the conductance law P. Thanks to our assumptions, all three measures are mutually absolutely continuous with respect to P. This structure ensures absence of finite-time blow-ups: Lemma 3.1 Suppose Assumptions 1.1 and 1.2 hold. Then both X and Y are conservative under P x , for all x ∈ Z d and P-a.e. sample of the conductances.
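For orientation, the objects introduced in this subsection take the following standard form for a reversible chain driven by symmetric conductances. This is a sketch derived from the verbal definitions above (jump rate π(x) for X, jump rate π(x)/ν(x) for Y, and the point of view of the particle); the symbols $L_X$, $L_Y$, $A_t$ and the precise normalizations are assumptions of the sketch rather than quotations of the paper's displays.

```latex
% Generators: X jumps from x to y at rate C_{x,y}; Y is slowed down by the factor nu(x).
\[
  (L_X f)(x) = \sum_{y\in\mathbb{Z}^d} C_{x,y}\,\bigl[f(y)-f(x)\bigr],
  \qquad
  (L_Y f)(x) = \frac{1}{\nu(x)} \sum_{y\in\mathbb{Z}^d} C_{x,y}\,\bigl[f(y)-f(x)\bigr].
\]
% Time change realizing Y from X:
\[
  A_t := \int_0^t \nu(X_s)\,ds, \qquad Y_t := X_{A^{-1}_t}.
\]
% Both chains share the same quadratic form, on \ell^2(\mu) for X and on \ell^2(\nu) for Y:
\[
  -\langle f, L_X f\rangle_{\ell^2(\mu)}
  = -\langle f, L_Y f\rangle_{\ell^2(\nu)}
  = \frac12 \sum_{x,y\in\mathbb{Z}^d} C_{x,y}\,\bigl[f(y)-f(x)\bigr]^2 .
\]
% Stationary laws for the induced environment processes:
\[
  Q_X(d\omega) = P(d\omega), \qquad
  Q_Y(d\omega) = \frac{\nu(0)}{E\,\nu(0)}\,P(d\omega), \qquad
  Q_Z(d\omega) = \frac{\pi(0)}{E\,\pi(0)}\,P(d\omega).
\]
```

In this form the mutual absolute continuity of the three measures with respect to P amounts to π(0) and ν(0) being positive and integrable.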
Proof. We will invoke a standard criterion (see, e.g., Liggett [39, Chapter 2]) plus some stationarity and the fact that $X$ and $Y$ are derived from the discrete-time Markov chain $Z$. Focusing on $X$ first, we have $X_t = Z_{N_t}$ for $N_t := \sup\{n\ge 0 : T_1+\dots+T_n\le t\}$ where, conditional on $Z$, the random times $\{T_k : k\ge 1\}$ are independent exponentials with $T_k$ having parameter $\pi(Z_{k-1})$. Thanks to the first and second Borel-Cantelli lemmas,
$$\sum_{k\ge 1} T_k = \infty \quad\Longleftrightarrow\quad \sum_{k\ge 1} E^x(T_k\,|\,Z) = \infty \qquad\text{a.s.},$$
so no blow-ups occur if and only if the sum on the right diverges a.s. Now $E^x(T_k\,|\,Z) = 1/\pi(Z_{k-1})$ and so we need $\sum_{k\ge 0} 1/\pi(Z_k) = \infty$ a.s. The stationarity and ergodicity of $Q_Z$ for the process on environments induced by $Z$ imply
$$\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\frac{1}{\pi(Z_k)} = E_{Q_Z}\Bigl(\frac{1}{\pi(0)}\Bigr) = \frac{1}{E\pi(0)} \qquad\text{a.s.}$$
The limit is positive since $E\pi(0)<\infty$ by Assumption 1.2. In particular, $\sum_{k\ge 0} 1/\pi(Z_k) = \infty$ a.s. The argument for the process $Y$ is completely analogous, except that $T_{k+1}$ is now (conditionally on $Z$) exponential with parameter $\pi(Z_k)/\nu(Z_k)$. Here we also need $0 < E\nu(0) < \infty$ as implied by Assumption 1.2.
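The representation of the continuous-time chains through the discrete-time path and independent exponential holding times translates directly into a simulation recipe. The sketch below is a minimal illustration; the function name `continuous_time_path` and the callable `rate` are hypothetical, with `rate` standing for π in the case of X and for π/ν in the case of Y.

```python
import random

def continuous_time_path(Z, rate, t_max, rng):
    """Build a continuous-time path from a discrete-time path Z.

    The holding time at the k-th visited site Z[k] is exponential with
    parameter rate(Z[k]). Returns a list of (jump_time, site) pairs that
    starts with (0.0, Z[0]) and stops at t_max or when Z is exhausted.
    """
    t, path = 0.0, [(0.0, Z[0])]
    for k in range(1, len(Z)):
        t += rng.expovariate(rate(Z[k - 1]))   # holding time at the site being left
        if t > t_max:
            break
        path.append((t, Z[k]))
    return path

# Example: a short path on Z^1 with a constant (hypothetical) jump rate 2.
Z = [(0,), (1,), (0,), (1,), (2,)]
rng = random.Random(1)
path = continuous_time_path(Z, lambda x: 2.0, 100.0, rng)
```

In this picture an explosion would mean that infinitely many holding times have a finite sum, which is exactly what divergence of the series of conditional means rules out.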

Localization and truncation.
Our proof focuses on the process Y . The main challenge is to control the contribution of large jumps. As noted earlier, we do this by way of localization, which is a change of the environment that limits all the complexity to a finite ball, and truncation, where we remove jumps larger than a certain cutoff from the environment. We remark that the idea of considering localized modifications of non-local Dirichlet forms has appeared in [25, Section 2.2], but here the construction is more delicate as we need to modify both the conductances and the reference measure.
We start by localization. Denote For any integer R ≥ 1, let and define a symmetric regular Dirichlet form ( D R , F R ) by This form corresponds to the localized version of our process.
For all κ ∈ (0, 1] and all R ≥ 1 satisfying κR ≥ 1 we now define a truncated, localized Dirichlet The proof is based on two lemmas. Consider the Dirichlet form associated with an auxiliary collection {C x,y =C y,x : |x − y| = 1} of nearest-neighbor conductances. We then have: holds with α L := 1 Let s > 1 be the index Hölder conjugate to p. Then by our restriction on the support of f , Next define r by and note that, since s > 1 and ε > 0, we have r > 1. Using that where c is a d-dependent constant (which is directly related to the isoperimetric constant on Z d ).
For the next lemma, let denote the Dirichlet form associated with the simple random walk. Then we have: Proof. Let ε ∈ (0, 4/(d−2)) and set Then r d/(d−1) > 2 and so there exist unique β , γ ∈ R such that We claim that β ∈ (0, 1] and γ ∈ [0, 1/2). Indeed, β > 0 and γ ≥ 0 are immediate from (3.41) and r ≥ 2. The inequality β ≤ 1 is equivalent to r ≥ ((d−1)/d)(2 + ε), which holds for our choice of r in (3.38), while γ < 1/2 is equivalent to r < 2(d−1)/(d−2). This requires 2 + ε < 2d/(d−2), which holds thanks to ε < 4/(d−2). Using (3.39) and Hölder's inequality we have Furthermore, by the ℓ^1-Sobolev inequality on Z d , as in (3.30) we obtain where c ∈ (0, ∞) depends only on the spatial dimension d and where we relied on (3.40) to get the second inequality. Hence we get and, after a short calculation, also Raising both sides to the power 2/(2+ε) and using the definition of r, the conclusion follows. We remark that an alternative proof of Lemma 3.5 can be devised based on estimates for the transition probabilities of the simple random walk. In particular, (3.37) is true even when d = 1 with ε ∈ (0, ∞). On the other hand, the proof of Lemmas 3.4 and 3.5 becomes considerably easier in d ≥ 3 where one can rely on the ℓ^2-Sobolev inequality.
With the above lemmas in hand, we are ready to give: Proof of Proposition 3.3. Let d ≥ 2 and suppose Assumption 1.2 holds. Since the inequality in (1.7) is strict, we may assume that both indices are finite, i.e., p, q ∈ (d/2, ∞). Under Assumption 1.1, the Spatial Ergodic Theorem yields the existence of a constant c 0 ∈ (0, ∞) and a random variable R 0 = R 0 (ω) (which may depend on p, q) with P(1 ≤ R 0 < ∞) = 1 such that The definitions (3.10-3.11) of C R x,y and ν R then give for some c 1 ∈ (0, ∞) that also may depend on p and q. Let ε ∈ (0, 4/(d−2)) solve (3.24) and fix κ ∈ (0, 1]. Lemma 3.4 (with ν(x) := ν R (x), C x,y := C R x,y , L := 8R) along with (3.49) shows the existence of a constant c 2 ∈ (0, ∞) that depends only on d, p, ε and c 1 above such that holds for all R ≥ R 0 with κR ≥ 1 and all f : due to κR ≥ 1 and the choice C x,y := C R x,y . Next we invoke Lemma 3.5 along with the fact that, for some constant c > 0, is valid for all a, b, r > 0, to get the existence of c 3 ∈ (0, ∞) such that, for all f : (3.52) Here we used ν R (x) = 1 for all x ∈ B(0, 4R) c and the definitions (3.10-3.11) along with κR ≥ 1 to ensure that the conductance C R x,y is no smaller than that of the simple random walk whenever x or y is in supp( f ).
Consider a mollifier φ R : and where c 4 := 2 max{c 2 , c 3 }. For the sum of the two Dirichlet forms we then get where (3.13) was used in the last inequality. Plugging this in (3.55), the claim follows.
The above proof highlights the need for ν as a reference measure and its modification ν R .

Heat-kernel estimates.
We will now apply the above functional inequalities to estimates of the heat kernels. Denote by Y R := {Y R t : t ≥ 0} the Hunt process associated with ( D R , F R ) and let p R (t, x, y) be the associated transition probabilities. Similarly, write Y R,κ := {Y R,κ t : t ≥ 0} for the Hunt process associated with ( D R,κ , F R ) and let p R,κ (t, x, y) be the corresponding transition probabilities.
We start with a simple consequence of Proposition 3.3: Lemma 3.6 Suppose that Assumptions 1.1 and 1.2 hold, and let ε ∈ (0, 4/(d−2)) and the random variable R 0 := R 0 (ω) be as in Proposition 3.3. Then there exists a constant c > 0 such that (3.57) holds for all κ ∈ (0, 1), all R ≥ R 0 with κR ≥ 1, all t > 0 and all x, y ∈ Z d .
The inequality (3.57) is particularly useful when t and R are related by diffusive scaling and it provides a version of a uniform, a.k.a. diagonal, heat-kernel upper bound. For the off-diagonal estimate, we have to work somewhat harder: Proposition 3.7 Suppose Assumptions 1.1 and 1.2 hold, and let ε ∈ (0, 4/(d−2)) and the random variable R 0 := R 0 (ω) be as in Proposition 3.3. For every κ ∈ (0, 1], there is a constant c ∈ (0, ∞) such that for all x, y ∈ Z d , all R ≥ R 0 (ω) with κR ≥ 1 and all 0 < t ≤ R^2, the bound (3.61) holds. Proof. We will invoke an argument from Carlen, Kusuoka and Stroock [24] (based on an earlier argument of Davies [28]) for obtaining off-diagonal heat-kernel bounds from the Nash inequality (3.59). For that we first introduce the auxiliary objects In particular, we can take ψ with bounded support. Carlen, Kusuoka and Stroock [24, Theorem (3.25)] then shows that there is a constant c 0 > 0 such that for all κ ∈ (0, 1], all R ≥ R 0 (ω) with κR ≥ 1, all t > 0 and all x, y ∈ Z d , which, we note, refines the estimate from Lemma 3.6. In order to bring (3.63) into the desired form, it thus suffices to supply a good lower bound on E R (2t, x, y). Fix x 0 , y 0 ∈ Z d , let λ ≥ 0 and consider the test function The triangle inequality gives |ψ(x) − ψ(y)| ≤ λ |x − y|. According to the elementary inequalities |e^t − 1|^2 ≤ t^2 e^{2|t|} and t^2 e^{−|t|} ≤ 2 for t ≥ 0, we then get from (3.13) that (3.65) Since the same bound applies to Γ R (−ψ) as well, the fact that ψ( Suppose that 0 < t ≤ R^2 and set Denoting c := e 1 2 +2(1+2d)κ −2 c 0 , the claim now follows from (3.63).
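The two elementary inequalities invoked in the proof above follow from one-line calculus arguments, recorded here as a quick check:

```latex
% First inequality: integrate the exponential and bound the integrand by e^{|t|}.
\[
  |e^{t}-1| = \Bigl|\int_0^{t} e^{s}\,ds\Bigr| \le |t|\,e^{|t|}
  \quad\Longrightarrow\quad
  |e^{t}-1|^{2} \le t^{2}e^{2|t|}, \qquad t\in\mathbb{R}.
\]
% Second inequality: t^2 e^{-t} has derivative t(2-t)e^{-t}, so its maximum on [0, infinity) is at t = 2.
\[
  \sup_{t\ge 0}\, t^{2}e^{-t} = 4e^{-2} \approx 0.54 \le 2 .
\]
```

The constant 2 in the second bound is thus far from optimal; only its finiteness matters for the argument.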

Exit time estimates.
The uniform estimate on the transition probabilities of the truncated, localized process Y R,κ permits us to control the tails of the exit times thereof. This can then be extended to the process Y as well. Indeed, given A ⊆ Z d , define the first exit time from A by We then have: The proof is based on a comparison with the corresponding exit problems for the walks Y R and Y R,κ . For all R ≥ 1, all κ ∈ (0, 1], all x ∈ Z d and all r ≥ 1, let We then have: Lemma 3.9 Suppose Assumptions 1.1 and 1.2 hold and let R 0 := R 0 (ω) be as in Proposition 3.3. There is κ ∈ (0, 1] and, for each δ ∈ (0, 1], also c > 0 such that for all x ∈ Z d , all R ≥ max{16δ −1 R 0 , (κδ ) −1 } and all t > 0, Proof. Let ε ∈ (0, 4 d−2 ) and R 0 be as in Proposition 3.3 and let κ ∈ (0, 1] be such that Fix δ ∈ (0, 1]. Since (3.61) applies with κ replaced by κδ for all R ≥ R 0 satisfying κδ R ≥ 1, all 0 < t ≤ R 2 and all x ∈ Z d , we get where c 0 depends on κ and δ . Assuming in addition that t ≤ R 2 /3 (which ensures log(R 2 /t) ≥ 1) and noting that the condition δ R ≥ 16R 0 (ω) enables us to apply (3.48) and (3.49), the two sums on the right are now bounded by for some c 2 depending on κ and δ . Combining (3.74-3.75), from (3.73) we obtain The strong Markov property at the first exit time from B(x, δ R) shows (3.77) Invoking (3.76), we get the claim for all t < R 2 /6. Adjusting the constant c if necessary, the claim holds trivially for t ≥ R 2 /6.
Then we get the following deterministic estimate: Proof. This is proved by following the argument of [25,Lemma 3.1], which is itself based on the Meyer's construction of Y R (see [12,Section 3.1]).
We are ready to give: to the constant in Lemma 3.9. Since the processes Y and Y R "see" the same conductances in B(0, 2R), for all R ≥ r ≥ 1, all t > 0 and all Lemma 3.10 along with (3.14) then show Proof. Denote by p R,B(0,R) the transition probabilities of the process Y R killed upon exiting the ball B(0, R). Then for all x, y ∈ B(0, R) and all t > 0, Since, trivially, it suffices to prove the desired bound for the transition probabilities p R (t, x, y) of the process Y R .
Here we note that the associated Dirichlet forms obey D R,κ ( f , f ) ≤ D R ( f , f ) and so the Nash inequality (3.59) applies for the Dirichlet form ( D R , F R ) as well. Since ν R = ν on B(0, R), the argument from the proof of Lemma 3.6 then gives the claim.

PROOF OF QUENCHED INVARIANCE PRINCIPLE
Having established the needed bounds on the transition probabilities and exit times, we proceed to the proof of the quenched invariance principle.

Tightness.
We start with the proof of tightness of the diffusively-scaled process Y . Our aim is to apply the criterion for tightness from Aldous [2].
in probability.
We will apply this to the choice to get: Proof. Let R 1 = R 1 (ω) be as in Proposition 3.8. We will check that the above conditions (1-2) from Aldous [2] hold on the set {R 1 < ∞}. To distinguish various processes, let us write τ B (X ) for the first exit time of the process X from set B. For (1) we note that, by (3.70) in Proposition 3.8, when r √ n ≥ R 1 , This implies condition (1) above on {R 1 < ∞}. Next, pick T > 0 and η > 0, let τ n be stopping times bounded by T and choose δ n with δ n ↓ 0. For any r > 0, the strong Markov property gives Using (4.3), the first quantity on the right is at most c 1 T /r 2 whenever r √ n ≥ R 1 . For the second quantity Proposition 3.8 with δ := η/r gives max z∈B(0,r √ n) for min{r √ n, η √ n} ≥ R 1 . While c 2 depends on the ratio η/r, for η and r fixed the right-hand side tends to zero in light of δ n → 0. Thus we get lim sup for some constant c 3 ∈ (0, ∞) regardless of η or the choice of stopping times τ n (as long as τ n ≤ T ). But the left-hand side does not depend on r and so taking r → ∞, we obtain condition (2)

Proof of a QIP.
Having proved tightness, our proof of a QIP is now reduced to the convergence of finite- where Σ is the matrix with entries as in (2.8). Then we have: (4.8) Proof. One of the main issues in the proof is a proper demonstration of the set of conductances of full P-measure on which (4.8) holds for all k-tuples (t 1 , . . . ,t k ) with the stated properties. We will therefore keep careful track of all requisite events. Let Ψ(x) denote the "harmonic coordinate" function from (2.6); this is defined (and depends on) conductances in a measurable set Ω 1 with P(Ω 1 ) = 1. Given a realization of the conductances and a path Z := {Z n : n ≥ 1} of the discrete-time Markov chain, consider the random variables {Ψ(Z n )}. A classical argument (cf, e.g., Corollary 3.10 of Biskup [19]) based on the fact that Ψ(Z n ) is a martingale implies that, under our standing assumptions, there is a measurable set Ω 2 ⊆ Ω 1 with P(Ω 2 ) = 1 such that for each realization of conductances in Ω 2 , the law of induced by P 0 on D([0, T ]) -in fact, even on C([0, T ]), provided we interpolate values linearly -tends to Brownian motion with mean zero and covariance Σ. Next we will prove a similar statement for t → 1 √ n Ψ(Y nt ) but for that we have to control the time change that takes Z into Y . To that end, conditionally on Z, let T 0 , T 1 , . . . denote independent exponentials with parameters π(Z 0 )/ν(Z 0 ), π(Z 1 )/ν(Z 1 ), . . . , respectively. Then { Y t : t ≥ 0}, defined by Y t := Z N t for N t := max{k ≥ 0 : T 1 + · · · + T k ≤ t}, (4.10) has the law of {Y t : t ≥ 0}. Letting Ω 3 ⊆ Ω 2 be the subset of conductances on which , P 0 -a.s., (4.11) and ∀ε > 0 : lim The stationarity and ergodicity of Q Z with respect to the chain on environments induced by Z guarantees (via the Pointwise Ergodic Theorem) that P(Ω 3 ) = 1. Invoking the Weak Law of Large Numbers (with a simple truncation step enabled by (4.12)) and a renewal argument, we then have for all conductances from Ω 3 . 
In light of the monotonicity of t → N t , this gives locally uniform closeness of s → N ts /t to a linear function. By the definition of the Skorohod topology, the identification $\widetilde Y \overset{\mathrm{law}}{=} Y$ now shows that also the law induced by P 0 on D([0, T ]) tends to that of a Brownian motion with mean zero and covariance (Eπ(0)/Eν(0))Σ, for every realization of conductances in Ω 3 .
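The factor Eπ(0)/Eν(0) can be traced through the time change by a short heuristic computation. The sketch below assumes, as is standard but not spelled out above, that Q Z has density π(0)/Eπ(0) with respect to P, and that the holding time at a vertex z is exponential with mean ν(z)/π(z); indices are written generically.

```latex
% Assumption (not stated in the text above): dQ_Z/dP = \pi(0)/E\pi(0).
\frac1k\sum_{j=0}^{k-1}\frac{\nu(Z_j)}{\pi(Z_j)}
\;\xrightarrow[k\to\infty]{}\;
E_{Q_Z}\!\left[\frac{\nu(0)}{\pi(0)}\right]
=\frac{E\bigl[\pi(0)\,\nu(0)/\pi(0)\bigr]}{E\pi(0)}
=\frac{E\nu(0)}{E\pi(0)},
\qquad\text{hence}\qquad
\frac{N_t}{t}\;\xrightarrow[t\to\infty]{}\;\frac{E\pi(0)}{E\nu(0)} .
```

Under these assumptions the chain Z makes about (Eπ(0)/Eν(0)) nt steps by time nt, which is why the covariance Σ of the discrete-time limit is multiplied by Eπ(0)/Eν(0).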
Since convergence on D([0, T ]) to a process with continuous paths implies convergence of finite-dimensional distributions, to get (4.8) it now suffices to identify a measurable set Ω ⋆ ⊆ Ω 3 of conductances with P(Ω ⋆ ) = 1 such that holds on Ω ⋆ for each t ≥ 0. For this we argue as follows. For any η > 0, Assume that η is so small that t ≤ η −2 . By Proposition 3.8, for some c 1 independent of n, η and t, where R 1 = R 1 (ω) is as in Proposition 3.8. The contribution of this term to (4.16) thus vanishes as n → ∞ followed by η ↓ 0. Let R 0 = R 0 (ω) be as in Proposition 3.3. Proposition 3.11 in turn gives where c 2 is independent of n, x, η and t. Using Hölder's inequality with p as Assumption 1.2, the sum on the right of (4.16) is thus bounded by a constant times The p-integrability of ν ensures that the term in the first large parentheses is bounded uniformly in n ≥ 1, P-a.s. Thanks to corrector sublinearity on average (2.9), the term in the second large parentheses, and thus the whole expression, tends to zero P-a.s. as n → ∞. This proves (4.15) and thus the whole claim.
Let us now see how the above propositions imply our main result: Proof of Theorem 2.2. Fix T > 0. Proposition 4.1 tells us that the laws of Y (n) are tight on D([0, T ]). By Proposition 4.2 we then conclude that Y (n) converges in law to B while the time-change argument in (4.10) and (4.13) then shows that t → Z ⌊tn⌋ / √n, as an element of D([0, T ]), tends in law to a centered Brownian motion with covariance Σ. As the limit process has continuous paths, this implies the convergence of the linear interpolation B n of Z-values from (2.1) in the space C([0, T ]).

Assumption 1.2 for long-range percolation.
To complete our results concerning QIPs, it remains to verify the conditions on the long-range percolation model that ensure convergence of the random walk to Brownian motion. Proof of Corollary 2.3. Let p > d/2 be as in the statement. Fix any total order ⪯ on Z d and let {C x,y : x, y ∈ Z d , x ⪯ y} be independent, zero-one valued random variables with P(C x,y = 1) = p(x − y), where p is as in the statement. Identify C x,y = C y,x to get symmetric conductances. Given n 0 ≥ 1 to be determined later, let The Burkholder-Gundy-Davis inequality thus shows, for any p ≥ 1, Furthermore, according to [38,Theorem 1], for every n ≥ 1, We now have to estimate the infimum.
for all r ≥ 1. By (1 + x) r ≤ (1 + ax1 {r≥1} + bx r ) valid with r-dependent a, b > 0 for all x > 0, we then get that for p ≥ 1, (4.25) where in the second inequality we also used the fact that ln(1 + x) ≤ x for all x > 0, and the last inequality is due to |x| 4 1 {p/2≥1} ≤ |x| 2p for |x| ≥ 1. By our assumption, the sum on the right of (4.24) is bounded by uniformly in n ≥ 1. For a given t > 0, say t := 1, this can be made smaller than p/2 by choosing n 0 sufficiently large. The infimum (4.24) is then bounded by one and so sup n≥1 E[|M n | p ] ≤ c 1 c 2 .
With the help of the Monotone Convergence Theorem we then get ν(0) ∈ L p (P). The QIP then follows from Theorem 2.2.
As noted earlier, Corollary 2.3 readily deals with the cases when {C x,y = C y,x } x,y∈Z d are independent, zero-one valued random variables with (assuming x ≠ y) A QIP is then inferred for all s > 2d. Another example is motivated by long-range stable-like random conductance models studied in [25]. There one takes (assuming again x ≠ y) C x,y := ξ x,y |x − y| −(d+s) (4.28) where {ξ x,y = ξ y,x } x,y∈Z d are i.i.d. Bernoulli random variables except for |x − y| = 1 where we set ξ x,y := 1. In this case the conditions of Corollary 2.3 are met for all s > 2.
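The thresholds on s in the two examples can be checked by a simple exponent count. The sketch below assumes that the summability condition of Corollary 2.3 reads ∑ x p(x)|x| 2p < ∞ for some p > d/2 (the statement is not reproduced in this section), that p(x) ≍ |x| −s with o(1) corrections in the exponent ignored, and that ν(0) = ∑ x C 0,x |x| 2 .

```latex
% Example 1 (assumed form of the condition in Corollary 2.3):
\sum_{x\neq 0}|x|^{-s}\,|x|^{2p}
\;\asymp\;\sum_{r\ge 1} r^{d-1}\,r^{2p-s}<\infty
\iff s>d+2p,
\qquad\text{and some } p>\tfrac d2 \text{ with } s>d+2p \text{ exists}
\iff s>2d .

% Example 2, using 0 \le \xi_{x,y} \le 1 and C_{0,x}=\xi_{0,x}|x|^{-(d+s)}:
\nu(0)=\sum_{x\neq 0}\xi_{0,x}\,|x|^{2-d-s}
\;\le\;\sum_{x\neq 0}|x|^{2-d-s}
\;\asymp\;\sum_{r\ge 1} r^{1-s}<\infty
\iff s>2 .
```

In the second example the bound is deterministic, so under the stated assumptions ν(0) is bounded and hence lies in L p (P) for every p ≥ 1 as soon as s > 2.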

FAILURES OF EVERYWHERE SUBLINEARITY
In this section we provide the promised counterexamples to everywhere sublinearity of the corrector and thus prove Theorems 2.5 and 2.6. We begin with the counterexample arising in the context of long-range percolation.

Long-range percolation.
Consider long-range percolation with the connection probability p(x) having the asymptotic (2.2) with exponent s ∈ (d + 2, 2d), which is non-vacuous only when d ≥ 3. We will assume p(0) = 0, p(x) = 1 for x with |x| = 1 and p(x) < 1 for all x with |x| > 1. The conductances then obey (1) C x,x = 0 for all x a.s., (2) C x,y = 1 whenever |x − y| = 1 a.s., (3) P(C x,y = 1) = p(y − x) whenever |x − y| > 1. As already mentioned, a key point is the proof of the existence of a "long" edge of length of order n emanating from an o(n)-neighborhood of the origin. This alone would be easy to guarantee; what makes it harder is that our arguments also need the "far away" endpoint of the "long" edge to be incident to no edges other than the nearest-neighbor ones. The exact statement is the subject of: occurs for infinitely many n a.s.
Proof. Instead of (5.1) consider the event whose advantage over A(x, y) is that the two events on the right are now independent as soon as x and y are as in the union in (5.2). Set A(x, y). (5.4) Obviously, A n ⊂ A n . Moreover, for n so large that n γ < n − 1 (note that γ < 1), on A c n A c n there is an edge between some x with |x| ≤ n γ and some y with n ≤ |y| ≤ 2n so that y has another edge to some x ′ with |x ′ | ≤ n γ . Defining, also for later use, (5.6) We will now proceed to estimate probabilities of two events on the right-hand side.
For the probability of B n , we invoke a straightforward union bound. Let Ξ n denote the set of all quadruples (x, x ′ , y, y ′ ) that satisfy the geometrical conditions in event B n . Then, for some constants c, c ′ < ∞, where we first used that both p(y − x) and p(y ′ − x ′ ) are at most n −s+o(1) , then carried out the sums over x and x ′ to get a constant times n dγ from each and, finally, applied that z → p(z) is summable because s > d. Noting that, in light of our choice of γ, the final exponent in (5.7) is negative, we get that B 2 n occurs only for finitely many n, a.s. Concerning the first event in (5.6), let N denote the number of edges between some x with |x| ≤ n γ and some y with n ≤ |y| ≤ 2n and let {(x i , y i ) : i = 1, . . . , N} list the corresponding pairs of vertices connected by these edges. On A c n ∩ B c n we then know that (once N > 1) all y i are distinct and each y i must have at least one non-nearest neighbor edge to a vertex z with |z| > n γ and z ∉ {y 1 , . . . , y N }. Conditioning on F n := σ (C x,y : |x| ≤ n γ , n ≤ |y| ≤ 2n), we thus have (5.8) where N and (y 1 , . . . , y N ) are as specified above. The product is bounded from below by which is positive by the summability of p and our assumption that p(z) < 1 once |z| > 1. Hence, holds true for any δ > 0. To estimate P(N ≤ n δ ), let q n := min |x|≤n γ min n≤|y|≤2n p(y − x) (5.11) and let V n be the number of pairs (x, y) with |x| ≤ n γ and n ≤ |y| ≤ 2n. Then N is stochastically dominated from below by a binomial random variable with parameters V n and q n . As V n q n = n d(1+γ)−s+o(1) with d(1 + γ) − s > 0 by our assumptions about γ, the probability P(N ≤ n δ ) decays, for δ positive but small, exponentially in a power of n. Using this in (5.10), the Borel-Cantelli lemma implies that A c n ∩ B c n occurs only finitely often a.s.
With the existence of the desired "long" edge established, we can move to the construction of a counterexample to everywhere sublinearity of the corrector.
Proof of Theorem 2.5. Consider the long-range percolation setting as specified above. The asymptotic (2.2) with s > d + 2 implies E(∑ x∈Z d C 0,x |x| 2 ) < ∞ and so the corrector can be defined by any of the standard methods (see, e.g., Biskup [19, Section 3] for a discussion of these). In fact, by (2) above, the corrector is sublinear on average (cf. [19,Proposition 4.15]), meaning that {x : |χ(x)| > ε|x|} is, for each ε > 0, a set of zero density in Z d .
To show that χ is not sublinear everywhere in the sense of (2.10) we will assume, for the sake of contradiction, that for each ε > 0 there is a (random) K < ∞ such that (This is equivalent to (2.10).) Suppose that A n occurs and let x and y be the endpoints of an edge that makes A(x, y) in the definition of A n occur. The harmonicity condition (2.7) for Ψ from (2.6) at point y then reads where we noted that C x ′ y = 1 when x ′ = x or x ′ is a nearest neighbor of y; otherwise C x ′ y = 0. Applying (5.12) and the fact that |x|, |y|, |y + z| ≤ 2n + 1 for all z with |z| = 1 yields |y − x| ≤ (2 + 4d)K + 2d + ε(2 + 4d)(2n + 1). (5.14) For ε small this contradicts |y − x| > n − n γ . Hence (5.12) cannot hold on A n for n large enough and, since A n occurs for infinitely many n a.s. by Lemma 5.1, (5.12) fails a.s.
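For orientation, the bound (5.14) can be recovered as follows. This is a sketch under assumptions about displays not reproduced above: that Ψ(x) = x + χ(x), that (2.7) states ∑ x ′ C y,x ′ [Ψ(x ′ ) − Ψ(y)] = 0, and that (5.12) reads |χ(x)| ≤ K + ε|x| for all x.

```latex
% Harmonicity at y, whose only neighbors are x and the 2d vertices y+z, |z|=1:
0=\bigl[\Psi(x)-\Psi(y)\bigr]+\sum_{|z|=1}\bigl[\Psi(y+z)-\Psi(y)\bigr]
\;\Longrightarrow\;
y-x=\chi(x)-\chi(y)+\sum_{|z|=1}\bigl[z+\chi(y+z)-\chi(y)\bigr].

% The right-hand side contains 2+4d evaluations of \chi at points of norm
% at most 2n+1, and 2d unit vectors z:
|y-x|\;\le\;(2+4d)\bigl(K+\varepsilon(2n+1)\bigr)+2d .

% Comparing with |y-x|>n-n^{\gamma}, dividing by n and letting n\to\infty
% (recall \gamma<1) gives 1\le 2\varepsilon(2+4d), which is false once
\varepsilon<\frac{1}{2(2+4d)} .
```

The term 2d comes from bounding each |z| by one without using the cancellation ∑ |z|=1 z = 0; it does not affect the contradiction.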

Nearest-neighbor conductances.
Next we move to the context underlying Theorem 2.6. We start by defining some auxiliary processes that will be used later to construct the desired environment law P. As all of these live on the same probability space, we will keep using the same P throughout. In the construction we assume that d ≥ 2 although the ultimate conclusion will be restricted to d ≥ 3.
The set on the right is non-empty a.s. as ξ 1 (x) = 1 a.s.) Thus, assuming henceforth L −d k to be summable, let ℓ(x) denote the maximal k with ξ L k (x) = 1.
Next let ê 1 , . . . , ê d be the unit vectors in the coordinate directions and let us regard Z d−1 as the integer span of {ê 2 , . . . , ê d }. Denote by the set consisting of 6L vertices in the first coordinate direction and centered at, but not containing, the origin along with all of their nearest neighbors in the other coordinate directions. Note that By the monotonicity of k → L k , we have ∑ j≥1 L j ∑ k≥ j L −d k = ∑ k≥1 ∑ k j=1 L j L −d k ≤ ∑ k≥1 kL 1−d k , so another use of the Borel-Cantelli lemma gives Assuming henceforth kL 1−d k to be summable, let m(x) denote the maximal k in this set for the given x. As {L k : k ≥ 1} is increasing, we get Obviously, the collection {(ℓ(x), m(x)) : x ∈ Z d } is stationary. Moreover, as Λ L does not contain the origin, m(x) is independent of ℓ(x) for each x. We now observe: By |Λ L | = O(L), the fact that L 2 > 1 and the summability of kL 1−d k , both terms in the parentheses are bounded from below by a positive constant uniformly in k ≥ 2. Since m(x) and ℓ(x) are independent we get the lower bound in (5.21) as well.
(5.25) We observe that, by (5.20), we have G k (y, j) ∩ G k (y, j ′ ) = ∅ as long as 0 < | j − j ′ | ≤ 3L k . Hence, invoking also the lower bound in (5.21), we get for some c ′ independent of k. Now observe that the union in (5.27) is a subset of the event (5.22). Also note that, as soon as we have L k+1 > 2L k , the unions in (5.27) use, for distinct k's, disjoint sets of underlying coordinates {ξ L (x) : L ≥ 1, x ∈ Z d } and are thus independent of one another. By the second Borel-Cantelli lemma, the event in (5.22) occurs for infinitely many k a.s.
Proof of Theorem 2.6. Let p, q ≥ 1 be numbers such that (2.11) holds and let p ′ > p and q ′ > q be such that we still have 1 Consider the construction given above with {L k : k ≥ 1} such that L k+1 > 2L k and L 1 := 1 so that all objects ℓ(x), m(x) and κ(x) are well defined. Given an x with κ(x) = 1, denote k := ℓ(x) and consider the set of edges E L k (x) incident with at least one vertex in {x + jê 1 : j = 0, . . . , L k }. Set the conductance to b L k on edges with both endpoints in this set and to a L k on those with only one endpoint in this set. Thanks to Lemma 5.3, the conductance of each edge is set at most once so no conflict can arise. We set the conductance on edges not in {E L ℓ(x) : κ(x) = 1} to one. The resulting configuration of conductances is a measurable function of {ξ L (x) : L ≥ 1, x ∈ Z d } and, since this family is stationary and ergodic with respect to shifts, so is the induced conductance law. Let us check that the integrability conditions (2.12) hold. Fix any x with |x| = 1.
Noting that E L k (z) contains L k edges of conductance b L k and R k := 2 + (2d − 2)(L k + 1) edges of conductance a L k , we have (5.32) Plugging in (5.31), invoking that p ′ > p and q ′ > q and using that {L k } grows exponentially, we get C 0,x ∈ L p (P) as desired. Similarly, which is again finite by (5.31), our choices of p ′ and q ′ and the exponential growth of {L k }. Now let us move to the violation of sublinearity of the corrector. Suppose the event (5.22) occurs at some x with L k ≤ |x| ∞ ≤ 2L k . The conductances C yz on edges y, z ∈ E L k (x) then take values a L k and b L k as specified above. Denote by D := {x + jê 1 : j = 0, . . . , L k } the corresponding set of vertices (which depends on x) and let We now derive bounds on E D (Ψ). To get a lower bound, we fix the values at x and x + L k ê 1 and set all conductances on edges with only one endpoint in D to zero. Optimizing the remaining values is now a one-dimensional problem whose simple solution yields But that contradicts the fact, implied by (5.30), that a L k L 2 k |∂ D| ≪ b L k L k once k is sufficiently large. Hence (5.12) cannot hold while the event in (5.22) occurs for k large. Lemma 5.2 implies that (5.12) fails for all ε > 0 and all K < ∞ a.s.