Gaussian upper bounds for heat kernels of continuous time simple random walks

We consider continuous time simple random walks with arbitrary speed measure $\theta$ on infinite weighted graphs. Write $p_t(x,y)$ for the heat kernel of this process. Given on-diagonal upper bounds for the heat kernel at two points $x_1,x_2$, we obtain a Gaussian upper bound for $p_t(x_1,x_2)$. The distance function which appears in this estimate is not in general the graph metric, but a new metric which is adapted to the random walk. Long-range non-Gaussian bounds in this new metric are also established. Applications to heat kernel bounds for various models of random walks in random environments are discussed.


Introduction
Let Γ = (G, E) be an unoriented graph. We assume that Γ is connected, contains neither loops nor multiple edges, is locally finite, and countably infinite. Let d be the usual graph metric; given x, y ∈ G, d(x, y) is equal to the number of edges in the shortest (geodesic) path between x and y. We write B(x, r) := {y ∈ G : d(x, y) ≤ r} for the closed ball of radius r in the metric d.
We assume that Γ is a weighted graph, so that associated with each $(x, y) \in G \times G$ is a nonnegative edge weight $\pi_{xy}$ which is symmetric ($\pi_{xy} = \pi_{yx}$ for $x, y \in G$) and satisfies $\pi_{xy} > 0$ if and only if $\{x, y\} \in E$. The edge weights can be extended to a measure on G by setting $\pi_x := \pi(\{x\}) := \sum_{y \in G} \pi_{xy}$ for $x \in G$, and this extends to all subsets of G by countable additivity.
Let $(\theta_x)_{x \in G}$ be an arbitrary collection of positive vertex weights. We consider the continuous-time simple random walk $(X_t)_{t \ge 0}$, which has generator $L_\theta$, given by
$$(L_\theta f)(x) := \frac{1}{\theta_x} \sum_{y \sim x} \pi_{xy} \, (f(y) - f(x)).$$
Regardless of the choice of (θ x ) x∈G , the jump probabilities of these processes are P (x, y) = π xy /π x ; the various walks corresponding to different choices of (θ x ) x∈G will be time-changes of each other.
Two specific choices of the vertex weights (θ x ) x∈G arise frequently. The first is the choice θ x := π x , which yields a process called the constant-speed continuous time simple random walk (CSRW). The CSRW may also be constructed by taking a discrete-time simple random walk on (Γ, π), which we denote by (X n ) n∈Z + , together with an independent rate 1 Poisson process (N t ) t≥0 ; the CSRW is the process Y t := X Nt .
The second choice, $\theta_x \equiv 1$, yields a stochastic process referred to as the variable-speed continuous time simple random walk (VSRW). This walk has the same jump probabilities as the CSRW, but instead of waiting for an exponentially distributed time with mean 1 at a vertex x before jumping, the VSRW waits for an exponentially distributed time with mean $\pi_x^{-1}$. As discussed in [4], the VSRW may explode in finite time.
Associated with the process $(X_t)_{t \ge 0}$ is a semigroup $(P_t)_{t \ge 0}$ defined by $(P_t f)(x) := E^x f(X_t)$, and which possesses a density $p_t(x, y)$ with respect to the measure θ, defined by
$$p_t(x, y) := \frac{1}{\theta_y} P^x(X_t = y).$$
This function is also called the heat kernel of the process (X t ) t≥0 .
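The definition of $p_t(x, y)$ can be explored numerically on a small finite weighted graph. The following is a minimal sketch, not part of the paper's argument: it assumes a finite graph (the paper works on infinite graphs), and the function name `heat_kernel`, the example weights, and the use of uniformization to evaluate $e^{tL_\theta}$ are choices made only for this illustration.

```python
import math

def heat_kernel(pi, theta, t, terms=200):
    """Return the matrix p_t(x, y) = P^x(X_t = y) / theta[y] on a finite graph.

    pi    : symmetric matrix of edge weights with zero diagonal
    theta : list of positive vertex weights (the speed measure)
    Uses uniformization: P_t = sum_k Poisson(q t; k) * M^k with M = I + L_theta / q.
    """
    n = len(pi)
    rate = [sum(pi[x]) / theta[x] for x in range(n)]  # jump rate pi_x / theta_x
    q = max(rate)                                     # uniformization rate
    M = [[pi[x][y] / (theta[x] * q) if x != y else 1.0 - rate[x] / q
          for y in range(n)] for x in range(n)]
    power = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]
    P = [[0.0] * n for _ in range(n)]
    weight = math.exp(-q * t)                         # Poisson(q t; 0)
    for k in range(terms):
        for x in range(n):
            for y in range(n):
                P[x][y] += weight * power[x][y]
        power = [[sum(power[x][z] * M[z][y] for z in range(n))
                  for y in range(n)] for x in range(n)]
        weight *= q * t / (k + 1)                     # Poisson(q t; k + 1)
    return [[P[x][y] / theta[y] for y in range(n)] for x in range(n)]

# Hypothetical example: a path on 4 vertices with edge weights 1, 2, 1.
pi = [[0, 1, 0, 0], [1, 0, 2, 0], [0, 2, 0, 1], [0, 0, 1, 0]]
theta_csrw = [float(sum(row)) for row in pi]  # theta_x = pi_x
theta_vsrw = [1.0] * 4                        # theta_x = 1
p_csrw = heat_kernel(pi, theta_csrw, 1.5)
p_vsrw = heat_kernel(pi, theta_vsrw, 1.5)
```

On a finite graph both kernels are symmetric in (x, y) and integrate to 1 against θ, which is a convenient sanity check for any implementation.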
We discuss here an alternative construction of the heat kernel which will be used in Section 3; this closely follows the discussion in [25]. Let $(G_n)_{n \in \mathbb{Z}_+}$ be an increasing sequence of finite connected subsets of G whose limit is G. We denote the exterior boundary of a connected set U ⊂ G by $\partial U := \{y \in G \setminus U : \text{there exists } x \in U \text{ with } x \sim y\}$.
On each G n we define the killed heat kernel p where given V ⊂ G, T V := inf{s ≥ 0 : X s ∈ V } is the first hitting time of V .
This object satisfies the following conditions: for all x, y ∈ G.
Furthermore, we have that for all x, y ∈ G and t > 0 and $n \in \mathbb{Z}_+$,

We will also need a distance function on G × G which is adapted to the vertex weights $(\theta_x)_{x \in G}$; this will be the metric which appears in our heat kernel estimates. In general, Gaussian upper bounds for the heat kernel do not hold if one only considers the graph metric; see Remark 6.6 of [4] for an example. Let $d_\theta(\cdot, \cdot)$ be a metric which satisfies
$$\begin{cases} \dfrac{1}{\theta_x} \displaystyle\sum_{y \sim x} \pi_{xy} \, d_\theta^2(x, y) \le 1 & \text{for all } x \in G, \\ d_\theta(x, y) \le 1 & \text{whenever } x, y \in G \text{ and } x \sim y. \end{cases} \tag{1.1}$$
It is not difficult to verify that such metrics always exist. We write $B_\theta(x, r) := \{y \in G : d_\theta(x, y) \le r\}$ for the closed ball of radius r in the metric $d_\theta$; it should be noted that $B_\theta(x, r)$ may contain infinitely many points for some choices of x ∈ G and r > 0, or, equivalently, points arbitrarily far from x in the graph metric. Note that for the CSRW, the graph metric always satisfies both of the above conditions.
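One way to see that metrics satisfying (1.1) exist is to build one from edge lengths. The sketch below is an illustration under stated assumptions rather than the construction used in the paper: it works on a finite graph, and the edge length $\min(1, \sqrt{\theta_x/\pi_x}, \sqrt{\theta_y/\pi_y})$, together with the helper names `adapted_metric` and `satisfies_1_1`, is a choice made here. Since the resulting shortest-path distance between neighbours is at most $\sqrt{\theta_x/\pi_x}$, the first condition in (1.1) follows from $\frac{1}{\theta_x}\sum_{y \sim x} \pi_{xy}\,\theta_x/\pi_x = 1$.

```python
import math

def adapted_metric(pi, theta):
    """Shortest-path metric for edge lengths min(1, sqrt(theta_x/pi_x), sqrt(theta_y/pi_y))."""
    n = len(pi)
    pi_x = [sum(row) for row in pi]
    inf = float("inf")
    d = [[0.0 if x == y else inf for y in range(n)] for x in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and pi[x][y] > 0:
                d[x][y] = min(1.0,
                              math.sqrt(theta[x] / pi_x[x]),
                              math.sqrt(theta[y] / pi_x[y]))
    # Floyd-Warshall all-pairs shortest paths (fine for small illustrative graphs).
    for z in range(n):
        for x in range(n):
            for y in range(n):
                if d[x][z] + d[z][y] < d[x][y]:
                    d[x][y] = d[x][z] + d[z][y]
    return d

def satisfies_1_1(pi, theta, d, tol=1e-12):
    """Check both conditions of (1.1) for a candidate metric d."""
    n = len(pi)
    for x in range(n):
        nbrs = [y for y in range(n) if pi[x][y] > 0]
        if sum(pi[x][y] * d[x][y] ** 2 for y in nbrs) / theta[x] > 1 + tol:
            return False
        if any(d[x][y] > 1 + tol for y in nbrs):
            return False
    return True

# Hypothetical example: a path on 3 vertices with heavy edge weights.
pi = [[0, 4, 0], [4, 0, 9], [0, 9, 0]]
pi_x = [float(sum(row)) for row in pi]
vsrw = [1.0, 1.0, 1.0]
d_vsrw = adapted_metric(pi, vsrw)   # strictly shorter than the graph metric
d_graph = adapted_metric(pi, pi_x)  # for theta = pi the edge lengths are all 1
```

In this example the graph metric fails (1.1) for the VSRW weights but satisfies it for the CSRW weights, in line with the remark above.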
The use of metrics different from the graph metric in heat kernel estimates was initiated by Davies in [9], and this metric is similar to the metrics considered there. These metrics are closely related to the intrinsic metric associated with a given Dirichlet form; some details on the latter may be found in [18]. Recent work using similar metrics includes [4], [12], [15], and [20].
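We will need a regularity condition on functions, referred to as (A, γ)−regularity on an interval (a, b); if a = 0 and b = ∞, then we simply say that the function is (A, γ)−regular. For appropriate values of A and γ, this set of functions includes polynomial functions such as $c t^{d/2}$, exponential functions such as $c \exp(C t^{\alpha})$, and various piecewise combinations of (A, γ)−regular functions such as $c_1 t^{d_1/2} 1_{(0,T]} + c_2 t^{d_2/2} 1_{(T,\infty)}$, where $c_1$ and $c_2$ are chosen to ensure that the resulting function is continuous.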
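The polynomial and exponential examples can be checked by hand. The computation below is a sketch which assumes that the regularity condition has the form used in [13], namely that f is increasing and $f(\gamma s)/f(s) \le A\, f(\gamma t)/f(t)$ for all $0 < s < t$; this form is an assumption made for the illustration.

```latex
\begin{align*}
f(t) = c\,t^{d/2}: \quad
  & \frac{f(\gamma t)}{f(t)} = \gamma^{d/2}
    \quad \text{is constant in } t, \\
f(t) = c\exp(C t^{\alpha}): \quad
  & \frac{f(\gamma t)}{f(t)} = \exp\bigl(C(\gamma^{\alpha} - 1)\,t^{\alpha}\bigr)
    \quad \text{is nondecreasing in } t \text{ for } C, \alpha > 0,\ \gamma > 1.
\end{align*}
```

Under that assumption, both families satisfy the condition with A = 1 for any γ > 1, since the ratio at s never exceeds the ratio at t > s.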
Our work will assume that one has already obtained on-diagonal upper bounds for the heat kernel at two points $x_1, x_2 \in G$; that is, there are functions $f_1, f_2$ which are (A, γ)−regular on (a, b) such that, for all t > 0 and i ∈ {1, 2},
$$p_t(x_i, x_i) \le \frac{1}{f_i(t)}. \tag{1.2}$$
On-diagonal bounds such as (1.2) have been studied in considerable detail in both discrete and continuous settings, and follow from a variety of analytic inequalities, such as a Sobolev inequality [22], a Nash inequality [6], a log-Sobolev inequality [10], or a Faber-Krahn inequality [14]. Generally, these methods yield a uniform upper bound, valid for all x ∈ G. In the present setting of graphs, one may also use isoperimetric inequalities on general graphs, or volume growth estimates in the particular case of Cayley graphs of groups; details are in [2], [23], and [24].
In the context of Riemannian manifolds, Grigor'yan has shown that any Riemannian manifold M which satisfies an on-diagonal upper bound at two points x, y ∈ M admits a Gaussian upper bound for the heat kernel $q_t(x, y)$. His result is as follows:

Theorem A. [13] Let $x_1, x_2$ be distinct points on a smooth Riemannian manifold M, and suppose that there exist (A, γ)−regular functions $f_1, f_2$ such that, for all t > 0 and i ∈ {1, 2},
$$q_t(x_i, x_i) \le \frac{1}{f_i(t)}. \tag{1.3}$$
Then for any D > 2 and all t > 0, the Gaussian upper bound

One remarkable aspect of this result is that it only requires on-diagonal bounds at the points $x_1$ and $x_2$. Prior to [13], there were several proofs of Gaussian upper bounds for the heat kernel on manifolds, but these papers involve more restrictive hypotheses on the underlying manifold, in addition to requiring on-diagonal heat kernel estimates which hold for all x ∈ M. In practice, the upper bounds (1.3) are often obtained from a uniform upper heat kernel bound using the techniques described previously, such as a Nash inequality. However, Theorem A leaves open the possibility of obtaining Gaussian upper bounds for $q_t(x_1, x_2)$ using only the restricted information in (1.3).
For the discrete time SRW on (Γ, π), one may again assume a uniform upper bound for the heat kernel, and obtain a Gaussian upper bound from it. This was done first by Hebisch and Saloff-Coste in [17] using functional-analytic techniques, and later by Coulhon, Grigor'yan, and Zucca in [7], using techniques analogous to the ones used by Grigor'yan in [13].
In discrete time, a SRW cannot move further than distance n in time n, and hence p n (x, y) = 0 whenever d(x, y) > n, whereas a continuous time random walk has no such constraint. For the CSRW on Z with the standard weights, the heat kernel does not exhibit Gaussian decay if d(x, y) ≫ t (see [5]), and as a result we will only attempt to obtain Gaussian upper bounds when d θ (x, y) ≤ t. Non-Gaussian estimates applicable where d θ (x, y) ≥ t will be discussed in Section 2, which adapt work of Davies from [8] and [9].
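The failure of Gaussian decay for d(x, y) ≫ t can be seen numerically for the CSRW on Z. The sketch below uses the construction $Y_t = X_{N_t}$ described earlier, so that $P^0(Y_t = n) = \sum_k e^{-t} \frac{t^k}{k!} P^0(X_k = n)$; the function name `csrw_transition_on_Z` and the truncation of the sum at `terms` are assumptions made for the illustration. With the standard weights, $\theta_x = \pi_x = 2$, so $p_t(0, n)$ is half of the returned probability.

```python
import math

def csrw_transition_on_Z(t, n, terms=400):
    """P^0(Y_t = n) for the CSRW on Z with unit edge weights, where Y_t = X_{N_t}.

    Sums Poisson(t; k) * C(k, (k + n) / 2) * 2^{-k} over k >= |n| with k - n even.
    The k! in the Poisson weight cancels against the k! in the binomial coefficient.
    """
    n = abs(n)
    total = 0.0
    for k in range(n, terms):
        if (k - n) % 2:
            continue  # the discrete-time walk reaches n only at times of the same parity
        log_term = (-t + k * math.log(t / 2.0)
                    - math.lgamma((k + n) // 2 + 1)
                    - math.lgamma((k - n) // 2 + 1))
        total += math.exp(log_term)
    return total

# Far from the diagonal (n = 50, t = 1): compare logarithms with the Gaussian exponent.
log_far = math.log(csrw_transition_on_Z(1.0, 50))
log_gauss_far = -50 ** 2 / (2 * 1.0)

# Near the diagonal (n = 10, t = 100): compare with the Gaussian density of variance t.
near = csrw_transition_on_Z(100.0, 10)
gauss_near = math.exp(-10 ** 2 / (2 * 100.0)) / math.sqrt(2 * math.pi * 100.0)
```

In the first regime the logarithm of the transition probability is far larger than $-n^2/(2t)$, so no bound of the form $\exp(-c\,n^2/t)$ with a fixed c can hold there, while in the second regime the two quantities are close.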
Our main result is a Gaussian upper bound for the heat kernel p t (x, y) which is valid under mild hypotheses on (Γ, π) and (θ x ) x∈G .
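Theorem 1.1. Let (Γ, π) be a weighted graph, and suppose that there exists a constant $C_\theta > 0$ such that the vertex weights $(\theta_x)_{x \in G}$ satisfy $\theta_x \ge C_\theta$ for each x ∈ G. Let $f_1, f_2$ be (A, γ)−regular functions satisfying (1.5). Suppose also that there exist vertices $x_1, x_2 \in G$ such that for all t > 0 and i ∈ {1, 2},
$$p_t(x_i, x_i) \le \frac{1}{f_i(t)}. \tag{1.6}$$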

Remarks:
1. There is no assumption of stochastic completeness on the process (X t ) t≥0 ; these heat kernel estimates hold even if (X t ) t≥0 has finite explosion time.

2. The main utility of this result is in settings where $f_i(t)$ has polynomial growth, so that (1.5) is satisfied. Suppose that for i ∈ {1, 2}, $f_i(t) = f(t) := \exp(c t^{\alpha})$ for some c, α > 0. By

4. In many applications, one has a uniform on-diagonal heat kernel upper bound, that is, an estimate of the form $p_t(x, x) \le 1/f(t)$ which is valid for all x ∈ G and all t > 0; various techniques for obtaining such estimates were discussed earlier. However, in other cases, one may obtain a heat kernel upper bound of the form $p_t(x, x) \le c / V(x, \sqrt{t})$ which is valid for all x ∈ G and all t > 0, and where c > 0 is independent of x and $V(x, r) := \pi(B(x, r))$. This particular on-diagonal upper bound is related to the condition of volume doubling; see [11]. Theorem 1.1 yields Gaussian upper bounds for the heat kernel even in the second situation, where one may have a different on-diagonal upper bound at each point of the graph.
The following is an immediate consequence of Theorem 1.1.
Corollary 1.2. Let (Γ, π) be a weighted graph, and suppose that there exists a constant $C_\theta > 0$ such that the vertex weights $(\theta_x)_{x \in G}$ satisfy $\theta_x \ge C_\theta$ for each x ∈ G. Let f be an (A, γ)−regular function satisfying (1.5). If for each t > 0, the uniform heat kernel condition $\sup_{x \in G} p_t(x, x) \le 1/f(t)$ is satisfied, then there exist constants $C_1(A, \gamma, C_\theta), C_2(\gamma), \alpha(\gamma) > 0$ such that for all $x_1, x_2 \in G$, and $t \ge 1 \vee d_\theta(x_1, x_2)$,

If f is only (A, γ)−regular on $(T_1, T_2)$, then we obtain a restricted version of Theorem 1.1:

Theorem 1.3. Let (Γ, π) be a weighted graph, and suppose that there exists a constant $C_\theta > 0$ such that the vertex weights $(\theta_x)_{x \in G}$ satisfy $\theta_x \ge C_\theta$ for each x ∈ G. Let $f_1, f_2$ be (A, γ)−regular functions on $(T_1, T_2)$ satisfying, for i ∈ {1, 2},

If there exist vertices $v_1, v_2 \in G$ such that for all $t \in (T_1, T_2)$ and i ∈ {1, 2}, the estimate $p_t(v_i, v_i) \le 1/f_i(t)$ holds, then there exist constants $C_1(A, \gamma, C_\theta), C_2(\gamma), \alpha(\gamma) > 0$ such that for all t > 0 satisfying

Remarks:
1. The primary use of this result is in the case that $T_2 = \infty$, in which case one obtains Gaussian upper bounds for all sufficiently large times. In random environments such as supercritical percolation clusters, the functions which appear in existing on-diagonal heat kernel upper bounds may not be (A, γ)−regular, but rather (A, γ)−regular on (T, ∞) for some T > 0; Theorem 1.3 is useful for obtaining Gaussian upper bounds in this setting. Theorem 1.3 has also been used to obtain Gaussian heat kernel estimates for the random conductance model; see [1].
The structure of this paper is as follows. Section 2 establishes long-range, non-Gaussian upper bounds for the heat kernel using the metric $d_\theta$, similar to earlier estimates of Davies in [8] and [9]. Section 3 proves a maximum principle, analogous to the one established in [13]; this is subsequently used to estimate a tail sum of the square of the heat kernel. The direct analogue of the maximum principle from [13] does not work in the setting of graphs, and additional restrictions are necessary in order to establish the maximum principle of this paper.
In Section 4, we estimate this tail sum further using a telescoping argument from [13]. In [13], this argument is iterated infinitely many times, but in the present setting the telescoping argument cannot be employed past a finite number of steps. At this point, it is necessary to use the heat kernel estimates of Section 2 to get a final estimate on the tail sum. In Section 5, this estimate of the tail sum is used to estimate a weighted sum of the square of the heat kernel, and in turn, this estimate is used in Section 6 to establish Theorem 1.1. Section 7 discusses the modifications to Section 4 which are necessary to prove Theorem 1.3. Finally, Section 8 discusses applications to random walks on percolation clusters, and how the results of this paper may be applied to existing work on random walks in random environments.

Long range bounds for the heat kernel
In this section, we establish non-Gaussian upper bounds for the heat kernel p t (x, y) which are close to optimal in the space-time region where d θ (x, y) ≥ t. These bounds are closely related to the long-range bounds found in [8] and [9], and are established using the same general techniques. These bounds hold for all x, y ∈ G and all t > 0, although they give results weaker than Gaussian upper bounds in the space-time region where d θ (x, y) ≤ t.
Theorem 2.1. If x 1 , x 2 ∈ G, then for all t > 0, where Λ ≥ 0 is the bottom of the L 2 spectrum of the operator L θ .
Proof. By Proposition 5 of [8], for all x, y ∈ G and t > 0, we have the estimate .
Using the triangle inequality for the metric $d_\theta$ and the fact that the function $g(t) := e^{t} + e^{-t} = 2\cosh(t)$ is increasing for t ≥ 0, we obtain

At this point, we use the inequality $e^{s} + e^{-s} - 2 \le s^2 e^{s}$, which is valid for all s ≥ 0. This gives
$$b(\psi_\lambda, x) \le \frac{1}{2\theta_x} \sum_{y \sim x} \pi_{xy} \left( e^{\lambda d_\theta(x,y)} + e^{-\lambda d_\theta(x,y)} - 2 \right)$$
Since this estimate holds uniformly in x, we have that

Set $f(\lambda) := \frac{1}{2} \lambda^2 e^{\lambda}$. Combining these estimates with (2.1), we get, for each λ > 0,

By optimizing over λ > 0, we have

where $\hat{f}$ is the Legendre transform of f, defined by

Note that if $f(\lambda) \le g(\lambda)$ for all λ > 0, $\hat{f}(\gamma) \le \hat{g}(\gamma)$. Now, the function $g(\lambda) := e^{2\lambda}$ satisfies $f(\lambda) \le g(\lambda)$ for all λ > 0, so

Thus, applying this estimate to the preceding work gives

which holds for all t > 0.
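The elementary inequality $e^{s} + e^{-s} - 2 \le s^2 e^{s}$ used in this proof can be verified from the power series of cosh. The following short derivation is included as a check and is not taken from the paper.

```latex
\begin{align*}
e^{s} + e^{-s} - 2
  &= 2(\cosh s - 1)
   = s^2 \sum_{k \ge 1} \frac{2\, s^{2k-2}}{(2k)!} \\
  &\le s^2 \sum_{k \ge 1} \frac{s^{2k-2}}{(2k-2)!}
   = s^2 \cosh s
   \le s^2 e^{s}, \qquad s \ge 0,
\end{align*}
```

where the first inequality uses $(2k)! = (2k)(2k-1)\,(2k-2)! \ge 2\,(2k-2)!$ for $k \ge 1$, and the last step uses $\cosh s \le e^{s}$ for $s \ge 0$.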
One may also use these results to obtain a weak Gaussian upper bound for the heat kernel which does not use any information from on-diagonal bounds.
Proof. We proceed as in the proof of Theorem 2.1. Instead of using the inequality $e^{s} + e^{-s} - 2 \le s^2 e^{s}$, we use the estimate

which was used previously in [9]; we then obtain estimates similar to those above, except with $f(\lambda) := \frac{1}{2} \lambda^2 \left( 1 + \frac{\lambda e^{\lambda}}{6} \right)$. In [9], Davies computes that

Inserting this estimate into the above yields

as desired.

Maximum Principle
For the remainder of the paper, we fix a set of vertex weights (θ x ) x∈G for which there exists C θ > 0 with θ x ≥ C θ for all x ∈ G, and an associated metric d θ , satisfying (1.1). We also fix an increasing set of finite connected subsets (G n ) n∈Z + with limit G.
Let $x_0 \in G$ be a point for which there exists an (A, γ)−regular function f satisfying (1.5) such that for t > 0, $p_t(x_0, x_0) \le 1/f(t)$.
In this section, we will prove a maximum principle for the quantities

where $\xi_R$ will be defined later. This will allow us to estimate various sums and weighted sums of $u^2$. One basic estimate which we will use repeatedly is, for any H ⊂ G,

using the symmetry and semigroup properties of the heat kernel.

The reason for considering the killed heat kernels is that the function $u^{(k)}$ is finitely supported, and thus there is no difficulty in interchanging double sums. When $L_\theta$ is not a bounded operator on $L^2(\theta)$ (see [8] for a proof), the interchange of sums in (3.2) is not straightforward. We also remark that there is in general no simple description of the domain of the Dirichlet form E in this case.

Differentiating $J_R^{(k)}(t)$ and using the fact that u satisfies the heat equation in the second line, we get (writing $u_x^{(k)}$ for $u^{(k)}(x, t)$, ζ for $\exp \circ\, \xi$, and $\zeta_x$ for ζ(x, t)),

By a Gauss-Green type calculation and using the fact that $u^{(k)}$

The equality (3.2) follows from interchanging the order of summation, which is permissible since $u^{(k)}$ has finite support. Completing the square, we see that

It follows that

Given λ > 1, there exists $K_\lambda < \infty$ so that the inequality

Here R ≥ 0, t > 0, and s = s(t) > t are parameters that will be allowed to vary, and δ, ε > 0 are parameters that will be fixed. For the rest of this paper, we will fix λ, δ, ε so that the following conditions are satisfied:

Let us show that such an assignment of constants is possible by exhibiting $\lambda_0, \delta_0, \varepsilon_0$ which satisfy the above conditions. First, we choose $\lambda_0 = 2$, so that $K_{\lambda_0} = 2.98\ldots \le 3$; this satisfies (3.4). Next, since $\lambda_0$ and γ are known, we may define $\delta_0$ through (3.7), and estimate

so that (3.5) is also satisfied. We then choose $\varepsilon_0$ to be

Let us also note that (3.6) is equivalent to

Once λ, δ and ε have been fixed, we have the following result:

Proof. Given $k \in \mathbb{Z}_+$ and $x \in G_k$, set

Now, let us analyze the inequality

As before, we have

which is precisely the condition in the statement of the Lemma.

Now, for $k \in \mathbb{Z}_+$, we define

By (3.1), all of these quantities are finite, and by monotone convergence,

The maximum principle allows us to estimate I, as follows:

Suppose that $R_0 \ge R_1$, and $s > t_0 \ge t_1 > 0$ are such that R, s, t satisfy (3.9). Then

and so the maximum principle yields

The last three inequalities follow from bounding above the exponential weight $\exp(\xi_{R_0}(x, t_1))$, and using (3.1).

Letting k → ∞ and using (3.10), we get

which completes the proof of the Lemma.

Further estimates for I R (t)
In this section, we will prove the following estimate for $I_R(t)$: There exist positive constants $m_0, m_1, n_0, n_1, \alpha$, which do not depend on either $t_0$ or $R_0$, so that

In [13], a similar estimate is obtained without the $n_0 \exp(-n_1 R_0)$ term, and is a key step in establishing Gaussian upper bounds. The condition (1.5) in the statement of Theorem 1.3 prevents the term $n_0 \exp(-n_1 R_0)$ from dominating the 'Gaussian term'.

Recall that γ > 1 was seen first in the (A, γ)−regularity of the function f. Note that

As long as

then Lemma 3.2 gives

Let us analyze when (4.1) is satisfied. Let $j^*$ denote the maximal j for which (4.1) holds. Using the definition of $(R_j)_{j \in \mathbb{Z}_+}$, we obtain

and the maximality of $j^*$ shows that

Rearranging, we obtain

Applying (4.2) repeatedly yields

The product in $S_1$ may be estimated as follows:

We will deal with the $I_{R_{j^*}}(t_{j^*})$ term later. Continuing,

At this point, define β > 0, which depends only on γ > 1, by

and multiplying these estimates together yields

We remark that this is the only point in the proof where we use the (A, γ)−regularity of f.

and insert (4.5) into our earlier estimate for $S_2$ to obtain

At this point, we divide into cases based on whether or not. If it is, then we have

If not, then we can estimate $S_2$ by

It remains to estimate the quantity $I_{R_{j^*}}(t_{j^*})$. From Theorem 2.1, we have the following pointwise estimate of the heat kernel:

Hence,

At this point, note that if t > 0 is fixed, the function

is nonincreasing for d ≥ 2t. Since $R_{j^*} > 2e^2 t_{j^*}$, we get

This is the only point in the argument at which we explicitly use the fact that the vertex weights are bounded below. Now, we can put all of our estimates together. Combining (4.4), (4.6), (4.7), (4.8) we have

where the constants $\alpha, m_0, m_1, n_0, n_1$ may be taken to be
The fact that γ − 1 can be very close to 0 is a potential concern. In practice, one will often have the choice of several values of γ; for example, if f (t) = t α , one may choose any γ > 1. One also has the option of using the fact that (A, γ)−regularity implies (A 2 n , γ 2 n )−regularity to increase γ at the cost of increasing A (and hence m 0 ) also. However, choosing γ excessively large will cause α and n 1 to be undesirably close to zero.
We define $k^*$ to be the largest nonnegative integer so that $2^{k^*} \le \sqrt{t}$ (if there is no such nonnegative integer, set $k^* = 0$), and partition G as

We turn our attention to the quantities $E_{\kappa_0, D, A_j}(x_0, t)$ for $0 \le j \le k^* + 1$, which satisfy

On $A_0$, the exponential weight is bounded above by $e^{\kappa_0}$, and hence

is bounded above by $\exp(\kappa_0 4^{j})$. Since $2^{j-1} \sqrt{t} \le t$, we may apply the bound of Lemma 4.1 to obtain

where

By (1.5), we know that

On $A_{k^*+1}$, the exponential weight

and hence another application of Lemma 3.1 gives

By (1.5) again,

and so

Gaussian upper bounds for the heat kernel
We are now ready to prove Theorem 1.1.
Proof. Let $D := d_\theta(x_1, x_2)$ and assume that $t \ge 1 \vee D$. Then $t/2 \ge 1/2 \vee D/2$, so we may apply Lemma 5.1 with the points $x_1$ and $x_2$ (for which we have (1.6)) to obtain positive constants c and α such that, for $t \ge 1 \vee D$,

for all x ∈ G. By using the semigroup property and Cauchy-Schwarz combined with the above considerations, we obtain, for all $t \ge 1 \vee D$,

which completes the proof of Gaussian upper bounds for the heat kernel.

Restricted (A, γ)−regular functions
In Section 4, where we estimated the quantity $I_R(t)$, we assumed that $t_0 \ge R_0 \ge 1/2$, and used (A, γ)−regularity to obtain, for $0 \le k \le j^*$,

This is the only point at which (A, γ)−regularity is used. It follows that if f is merely (A, γ)−regular on $(T_1, T_2)$, then for this inequality to hold, we must have $T_1 < 2t_{j^*+1}$ and $2t_1 < \gamma^{-1} T_2$. Subsequently, in Section 5, we apply our bounds for $I_R(t)$ with $t = t_0$ and $R = 2^{j} \sqrt{t}$, for $0 \le j \le \sup\{k \in \mathbb{Z} : 2^{k} \le \sqrt{t}\} \vee 0$. Using (4.3), and setting $t_0 = t/2$ (where $t \ge 1 \vee D$), we see that these inequalities hold when

Rearranging, we have $t > 72 e^{4} \gamma^{4} T_1^{2}$, $t < T_2$, and applying these additional constraints yields Theorem 1.3.

Applications to random walks on percolation clusters
In this section, we show how Theorem 1.3 may be used to obtain Gaussian upper bounds for the CSRW on the infinite component of supercritical bond percolation on the lattice Z d equipped with the standard weights. A detailed description of percolation is given in [16]; a percolation cluster is a random connected subgraph of the lattice Z d obtained by deleting each edge independently with probability 1 − p and keeping it otherwise. By fundamental results of percolation theory, there exists a critical probability p c (d) such that for p > p c (d) (i.e., the supercritical case), there is an a.s. unique infinite cluster; we consider the CSRW on this family of random graphs, which we denote by C p,∞ (ω).
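The percolation model described above can be sampled on a finite box. The sketch below is only an illustration under stated assumptions: it uses a finite n × n box of $\mathbb{Z}^2$ as a proxy for the lattice (so it says nothing rigorous about the infinite cluster), and the function name `largest_percolation_cluster`, the seed handling, and the union-find bookkeeping are choices made here.

```python
import random

def largest_percolation_cluster(n, p, seed=0):
    """Size of the largest open cluster for bond percolation on an n x n box of Z^2.

    Each nearest-neighbour edge is kept independently with probability p and
    deleted otherwise; clusters are tracked with a union-find structure.
    """
    rng = random.Random(seed)
    parent = list(range(n * n))

    def find(a):
        # Find the cluster representative, compressing paths along the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for x in range(n):
        for y in range(n):
            v = x * n + y
            if x + 1 < n and rng.random() < p:   # edge to the vertex at (x + 1, y)
                parent[find(v)] = find(v + n)
            if y + 1 < n and rng.random() < p:   # edge to the vertex at (x, y + 1)
                parent[find(v)] = find(v + 1)

    sizes = {}
    for v in range(n * n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())

# Hypothetical usage: one sample on a 50 x 50 box with edge density 0.6.
size = largest_percolation_cluster(50, 0.6, seed=1)
```

Running the sampler for several values of p on a fixed box gives a rough picture of how the largest cluster grows with p, although finite-box effects are significant.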
For existing work on random walks on percolation clusters, including on-diagonal heat kernel estimates and invariance principles, see [21] and [3]. From now on, we fix p > p c (d), and write q ω t (x, y) for the heat kernel of the CSRW on C p,∞ (ω); the dependence on ω of q ω t (x, y) is a consequence of C p,∞ (ω) being random. We denote the graph metric on C p,∞ (ω) by d C . In [21], Mathieu and Remy proved the following on-diagonal heat kernel bound for the CSRW on C p,∞ (ω).
Lemma 8.1. [21] There exist random variables $N_x(\omega) < \infty$ and non-random constants $c_1, c_2$ such that almost surely, for all x ∈ G and t > 0,

The polynomial function $f(t) := c_2 t^{d/2}$ is (A, γ)−regular on $(N_x(\omega), \infty)$ for A = 1, γ = 2, and hence an application of Theorem 1.3 shows that for $t \ge C(N_x(\omega) \vee N_y(\omega)) \vee 1 \vee d_C(x, y)$, we have the Gaussian upper bound

where $C_1, C_2 > 0$ are non-random constants.

Remarks:
1. For the discrete time simple random walk on $C_{p,\infty}(\omega)$, Gaussian upper bounds are obtained in [7] as an application of their discrete time heat kernel estimates. However, the bounds in [7] have a random constant $C_1 = C_1(\omega)$ in (8.1). The reason is that [7] only

Introduction
Let Γ = (G, E) be an unoriented graph. We assume that Γ is connected, contains neither loops nor multiple edges, is locally finite, and countably infinite. Let d be the usual graph metric; given x, y ∈ G, d(x, y) is equal to the number of edges in the shortest (geodesic) path between x and y. We write B(x, r) := {y ∈ G : d(x, y) ≤ r} for the closed ball of radius r in the metric d.
We assume that Γ is a weighted graph, so that associated with each (x, y) ∈ G × G is a nonnegative edge weight π xy which is symmetric (π xy = π yx for x, y ∈ G) and satisfies π xy > 0 if and only if {x, y} ∈ E. The edge weights can be extended to a measure on G by setting π x := π({x}) := y∈G π xy for x ∈ G, and this extends to all subsets of G by countable additivity.
Let (θ x ) x∈G be an arbitrary collection of positive vertex weights. We consider the continuoustime simple random walk (X t ) t≥0 , which has generator L θ , given by * Department of Mathematics, The University of British Columbia, 1984 Mathematics Road, Vancouver, B.C., Canada, V6T 1Z2. mfolz@math.ubc.ca. Research supported by an NSERC Alexander Graham Bell Canada Graduate Scholarship.
Regardless of the choice of (θ x ) x∈G , the jump probabilities of these processes are P (x, y) = π xy /π x ; the various walks corresponding to different choices of (θ x ) x∈G will be time-changes of each other.
Two specific choices of the vertex weights (θ x ) x∈G arise frequently. The first is the choice θ x := π x , which yields a process called the constant-speed continuous time simple random walk (CSRW). The CSRW may also be constructed by taking a discrete-time simple random walk on (Γ, π), which we denote by (X n ) n∈Z + , together with an independent rate 1 Poisson process (N t ) t≥0 ; the CSRW is the process Y t := X Nt .
The second choice, θ x ≡ 1, yields a stochastic process referred to as the variable-speed continuous time simple random walk (VSRW). This walk has the same jump probabilities as the CSRW, but instead of waiting for an exponentially distributed time with mean 1 at a vertex x before jumping, the VSRW waits for an exponentially distributed time with mean π −1 x . As discussed in [4], the VSRW may explode in finite time.
Associated with the process (X t ) t≥0 is a semigroup (P t ) t≥0 defined by (P t f )(x) := E x f (X t ), and which possesses a density p t (x, y) with respect to the measure θ, defined by This function is also called the heat kernel of the process (X t ) t≥0 .
We discuss here an alternative construction of the heat kernel which will be used in Section 3; this closely follows the discussion in [25]. Let (G n ) n∈Z + be an increasing sequence of finite connected subsets of G whose limit is G. Given U ⊂ G, we denote the first hitting time of U by T U := inf{s ≥ 0 : X s ∈ U}.
For each n ∈ Z + , we define the killed heat kernel p This object satisfies the following conditions: for all x, y ∈ G.
Furthermore, we have that for all x, y ∈ G and t > 0 and n ∈ Z + , We will also need a distance function on G×G which is adapted to the vertex weights (θ x ) x∈G ; this will be the metric which appears in our heat kernel estimates. In general, Gaussian upper bounds for the heat kernel do not hold if one only considers the graph metric, see Remark 6.6 of [4] for an example. Let d θ (·, ·) be a metric which satisfies It is not difficult to verify that such metrics always exist. We write B θ (x, r) := {y ∈ G : d θ (x, y) ≤ r} for the closed ball of radius r in the metric d θ ; it should be noted that B θ (x, r) may contain infinitely many points for some choices of x ∈ G and r > 0, or, equivalently, points arbitrarily far from x in the graph metric. Note that for the CSRW, the graph metric always satisfies both of the above conditions.
The use of metrics different from the graph metric in heat kernel estimates was initiated by Davies in [9], and this metric is similar to the metrics considered there. These metrics are closely related to the intrinsic metric associated with a given Dirichlet form; some details on the latter may be found in [18]. Recent work using similar metrics includes [4], [12], [15], and [20].
We will need the following condition: holds. If a = 0 and b = ∞, then we say that g is (A, γ)−regular.
For appropriate values of A and γ, this set of functions includes polynomial functions such as ct d/2 , exponential functions such as c exp(Ct α ), and various piecewise combinations of (A, γ)−regular functions such as c 1 t d 1 /2 1 (0,T ] + c 2 t d 2 /2 1 (T,∞) , where c 1 and c 2 are chosen to ensure that the resulting function is continuous.
Our work will assume that one has already obtained on-diagonal upper bound for the heat kernel at two points x 1 , x 2 ∈ G; that is, there are functions f 1 , f 2 which are (A, γ)−regular on (a, b) such that, for all t > 0 and i ∈ {1, 2}, .
On-diagonal bounds such as (1.2) have been studied in considerable detail in both discrete and continuous settings, and follow from a variety of analytic inequalities, such as a Sobolev inequality [22], a Nash inequality [6], a log-Sobolev inequality [10], or a Faber-Krahn inequality [14]. Generally, these methods yield a uniform upper bound, valid for all x ∈ G. In the present setting of graphs, one may also use isoperimetic inequalities on general graphs, or volume growth estimates in the particular case of Cayley graphs of groups; details are in [2], [23], and [24].
In the context of Riemannian manifolds, Grigor'yan has shown that any Riemannian manifold M which satisfies an on diagonal upper bound at two points x, y ∈ M admits a Gaussian upper bound for the heat kernel q t (x, y). His result is as follows: Theorem A.
[13] Let x 1 , x 2 be distinct points on a smooth Riemannian manifold M, and suppose that there exist (A, γ)−regular functions f 1 , f 2 such that, for all t > 0 and i ∈ {1, 2}, .
One remarkable aspect of this result is that it only requires on-diagonal bounds at the points x 1 and x 2 . Prior to [13], there are several proofs of Gaussian upper bounds for the heat kernel on manifolds, but these papers involve more restrictive hypotheses on the underlying manifold, in addition to requiring on-diagonal heat kernel estimates which hold for all x ∈ G. In practice, the upper bounds (1.3) are often obtained from a uniform upper heat kernel bound using the techniques described previously, such as a Nash inequality. However, Theorem A leaves open the possibility of obtaining Gaussian upper bounds for q t (x 1 , x 2 ) using only the restricted information in (1.3).
For the discrete time SRW on (Γ, π), one may again assume a uniform upper bound for the heat kernel, and obtain a Gaussian upper bound from it. This was done first by Hebisch and Saloff-Coste in [17] using functional-analytic techniques, and later by Coulhon, Grigor'yan, and Zucca in [7], using techniques analogous to the ones used by Grigor'yan in [13].
In discrete time, a SRW cannot move further than distance n in time n, and hence p n (x, y) = 0 whenever d(x, y) > n, whereas a continuous time random walk has no such constraint. For the CSRW on Z with the standard weights, the heat kernel does not exhibit Gaussian decay if d(x, y) ≫ t (see [5]), and as a result we will only attempt to obtain Gaussian upper bounds when d θ (x, y) ≤ t. Non-Gaussian estimates applicable where d θ (x, y) ≥ t will be discussed in Section 2, which adapt work of Davies from [8] and [9].
Our main result is a Gaussian upper bound for the heat kernel p t (x, y) which is valid under mild hypotheses on (Γ, π) and (θ x ) x∈G .

Remarks:
1. There is no assumption of stochastic completeness on the process (X t ) t≥0 ; these heat kernel estimates hold even if (X t ) t≥0 has finite explosion time.

2.
The main utility of this result is in settings where f i (t) has polynomial growth, so that (1.5) is satisfied. Suppose that for i ∈ {1, 2}, f i (t) = f (t) := exp(ct α ) for some c, α > 0. By , r)). This particular on-diagonal upper bound is related to the condition of volume doubling; see [11]. Theorem 1.1 yields Gaussian upper bounds for the heat kernel even in the second situation, where one may have a different on-diagonal upper bound at each point of the graph.
The following is an immediate consequence of Theorem 1.1.
Corollary 1.2. Let (Γ, π) be a weighted graph, and suppose that there exists a constant C θ > 0 such that the vertex weights (θ x ) x∈G satisfy θ x ≥ C θ for each x ∈ G. Let f be an (A, γ)−regular function satisfying (1.5). If for each t > 0, the uniform heat kernel condition is satisfied, then there exist constants C 1 (A, γ, C θ ), C 2 (γ), α(γ) > 0 such that for all x 1 , x 2 ∈ G, and t ≥ 1 ∨ d θ (x 1 , x 2 ), If f is only (A, γ)−regular on (T 1 , T 2 ), then we obtain a restricted version of Theorem 1.1: Theorem 1.3. Let (Γ, π) be a weighted graph, and suppose that there exists a constant C θ > 0 such that the vertex weights (θ x ) x∈G satisfy θ x ≥ C θ for each x ∈ G. Let f 1 , f 2 be (A, γ)−regular functions on (T 1 , T 2 ) satisfying, for i ∈ {1, 2}, If there exist vertices v 1 , v 2 ∈ G such that for all t ∈ (T 1 , T 2 ) and i ∈ {1, 2}, the estimate holds, then there exist constants C 1 (A, γ, C θ ), C 2 (γ), α(γ) > 0 such that for all t > 0 satisfying Remarks: 1. The primary use of this result is in the case that T 2 = ∞, in which case one obtains Gaussian upper bounds for all sufficiently large times. In random environments such as supercritical percolation clusters, the functions which appear in existing on-diagonal heat kernel upper bounds may not be (A, γ)−regular, but rather (A, γ)−regular on (T, ∞) for some T > 0; Theorem 1.3 is useful for obtaining Gaussian upper bounds in this setting. Theorem 1.3 has also been used to obtain Gaussian heat kernel estimates for the random conductance model; see [1].
The structure of this paper is as follows. Section 2 establishes long-range, non-Gaussian upper bounds for the heat kernel using the metric $d_\theta$, similar to earlier estimates of Davies in [8] and [9]. Section 3 proves a maximum principle, analogous to the one established in [13]; this is subsequently used to estimate a tail sum of the square of the heat kernel. The direct analogue of the maximum principle from [13] does not work in the setting of graphs, and additional restrictions are necessary in order to establish the maximum principle of this paper.
In Section 4, we estimate this tail sum further using a telescoping argument from [13]. In [13], this argument is iterated infinitely many times, but in the present setting the telescoping argument cannot be employed past a finite number of steps. At this point, it is necessary to use the heat kernel estimates of Section 2 to get a final estimate on the tail sum. In Section 5, this estimate of the tail sum is used to estimate a weighted sum of the square of the heat kernel, and in turn, this estimate is used in Section 6 to establish Theorem 1.1. Section 7 discusses the modifications to Section 4 which are necessary to prove Theorem 1.3. Finally, Section 8 discusses applications to random walks on percolation clusters, and how the results of this paper may be applied to existing work on random walks in random environments.

Long range bounds for the heat kernel
In this section, we establish non-Gaussian upper bounds for the heat kernel p t (x, y) which are close to optimal in the space-time region where d θ (x, y) ≥ t. These bounds are closely related to the long-range bounds found in [8] and [9], and are established using the same general techniques. These bounds hold for all x, y ∈ G and all t > 0, although they give results weaker than Gaussian upper bounds in the space-time region where d θ (x, y) ≤ t.
Theorem 2.1. If x 1 , x 2 ∈ G, then for all t > 0, where Λ ≥ 0 is the bottom of the L 2 spectrum of the operator L θ .
Proof. By Proposition 5 of [8], for all x, y ∈ G and t > 0, we have the estimate .
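Using the triangle inequality for the metric $d_\theta$ and the fact that the function $g(t) := e^{t} + e^{-t} = 2\cosh(t)$ is increasing for $t \ge 0$, we obtain At this point, we use the inequality $e^{s} + e^{-s} - 2 \le s^{2} e^{s}$, which is valid for all $s \ge 0$. This gives $b(\psi_\lambda, x) \le \frac{1}{2\theta_x} \sum_{y \sim x} \pi_{xy} \left( e^{\lambda d_\theta(x,y)} + e^{-\lambda d_\theta(x,y)} - 2 \right)$ Since this estimate holds uniformly in $x$, we have that Set $f(\lambda) := \frac{1}{2} \lambda^{2} e^{\lambda}$. Combining these estimates with (2.1), we get, for each $\lambda > 0$, By optimizing over $\lambda > 0$, we have where $\hat{f}$ is the Legendre transform of $f$, defined by $\hat{f}(\gamma) := \sup_{\lambda > 0} \left( \gamma\lambda - f(\lambda) \right)$. Note that the Legendre transform reverses order: if $f(\lambda) \le g(\lambda)$ for all $\lambda > 0$, then $\hat{f}(\gamma) \ge \hat{g}(\gamma)$. Now, the function $g(\lambda) := e^{2\lambda}$ satisfies $f(\lambda) \le g(\lambda)$ for all $\lambda > 0$, so Thus, applying this estimate to the preceding work gives which holds for all $t > 0$.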
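The elementary steps in this proof can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the Legendre transform is the standard convex conjugate $\sup_{\lambda > 0}(\gamma\lambda - h(\lambda))$, approximates the supremum on a grid, and compares the result for $g(\lambda) = e^{2\lambda}$ with the closed form $\frac{\gamma}{2}\log\frac{\gamma}{2e}$ obtained by elementary calculus for $\gamma > 2$. The function names, grid sizes, and tolerance are illustrative choices.

```python
import math

def cosh_gap(s):
    """Left-hand side e^s + e^{-s} - 2 of the elementary inequality."""
    return math.exp(s) + math.exp(-s) - 2.0

def f(lam):
    """f(lambda) = (1/2) lambda^2 e^lambda, as set in the proof."""
    return 0.5 * lam * lam * math.exp(lam)

def g(lam):
    """Comparison function g(lambda) = e^{2 lambda}."""
    return math.exp(2.0 * lam)

def legendre(h, gamma, lam_max=10.0, steps=100_000):
    """Grid approximation of sup over 0 < lambda <= lam_max of gamma*lambda - h(lambda).

    ASSUMPTION: standard convex-conjugate definition; lam_max and steps are
    illustrative numerical parameters, not quantities from the paper.
    """
    best = -math.inf
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        best = max(best, gamma * lam - h(lam))
    return best

def g_hat(gamma):
    """Closed form for gamma > 2; the maximiser is lambda = (1/2) log(gamma / 2)."""
    return 0.5 * gamma * math.log(gamma / (2.0 * math.e))

grid = [0.01 * i for i in range(1001)]  # sample points in [0, 10]
# Check e^s + e^{-s} - 2 <= s^2 e^s on the grid (tiny slack for rounding).
inequality_holds = all(cosh_gap(s) <= s * s * math.exp(s) + 1e-12 for s in grid)
# Check f <= g on the grid.
f_below_g = all(f(lam) <= g(lam) for lam in grid)

if __name__ == "__main__":
    gamma = 10.0
    print(inequality_holds, f_below_g)
    print(legendre(f, gamma), legendre(g, gamma), g_hat(gamma))
```

Because $f \le g$ pointwise, the grid value for $\hat{f}$ is never smaller than the grid value for $\hat{g}$, which is the order reversal used to pass from $f$ to the explicit function $g$.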
One may also use these results to obtain a weak Gaussian upper bound for the heat kernel which does not use any information from on-diagonal bounds.
Proof. We proceed as in the proof of Theorem 2.1. Instead of using the inequality $e^{s} + e^{-s} - 2 \le s^{2} e^{s}$, we use the estimate $e^{s} + e^{-s} - 2 \le s^{2} \left( 1 + \frac{s e^{s}}{6} \right)$, which was used previously in [9]; we then obtain estimates similar to those above, except with $f(\lambda) := \frac{1}{2} \lambda^{2} \left( 1 + \frac{\lambda e^{\lambda}}{6} \right)$. In [9], Davies computes that Inserting this estimate into the above yields as desired.
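To see why this choice of $f$ leads to a bound of Gaussian type, one can compare its Legendre transform with $\gamma^{2}/2$ for small $\gamma$. The sketch below is illustrative only. It assumes the standard definition $\hat{f}(\gamma) = \sup_{\lambda > 0}(\gamma\lambda - f(\lambda))$ and that the heat kernel bound has exponent $-t\hat{f}(d_\theta/t)$ as in the proof of Theorem 2.1; the helper names and grid parameters are hypothetical. Since $f(\lambda) \ge \frac{1}{2}\lambda^{2}$, one always has $\hat{f}(\gamma) \le \gamma^{2}/2$, and evaluating at the trial point $\lambda = \gamma$ gives the lower bound $\gamma^{2}/2 - \gamma^{3} e^{\gamma}/12$.

```python
import math

def f_weak(lam):
    """f(lambda) = (1/2) lambda^2 (1 + lambda e^lambda / 6), as in the proof."""
    return 0.5 * lam * lam * (1.0 + lam * math.exp(lam) / 6.0)

def legendre(h, gamma, lam_max=5.0, steps=50_000):
    """Grid approximation of sup over 0 < lambda <= lam_max of gamma*lambda - h(lambda).

    ASSUMPTION: standard convex-conjugate definition; lam_max and steps are
    illustrative numerical parameters.
    """
    best = -math.inf
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        best = max(best, gamma * lam - h(lam))
    return best

def gaussian_rate(gamma):
    """Legendre transform of lambda^2 / 2, i.e. the pure Gaussian exponent."""
    return 0.5 * gamma * gamma

def lower_envelope(gamma):
    """Value of gamma*lambda - f_weak(lambda) at the trial point lambda = gamma."""
    return gaussian_rate(gamma) - gamma ** 3 * math.exp(gamma) / 12.0

if __name__ == "__main__":
    for gamma in (0.1, 0.5, 1.0):
        print(gamma, lower_envelope(gamma), legendre(f_weak, gamma), gaussian_rate(gamma))
```

The gap between the two envelopes is of order $\gamma^{3}$, so in the regime $d_\theta(x_1, x_2) \ll t$ the exponent $t\hat{f}(d_\theta/t)$ behaves like $d_\theta^{2}/(2t)$ under the stated assumptions.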

Maximum Principle
For the remainder of the paper, we fix a set of vertex weights $(\theta_x)_{x \in G}$ for which there exists $C_\theta > 0$ with $\theta_x \ge C_\theta$ for all $x \in G$, and an associated metric $d_\theta$ satisfying (1.1). We also fix an increasing sequence of finite connected subsets $(G_n)_{n \in \mathbb{Z}_+}$ of $G$ whose union is $G$.
Let $x_0 \in G$ be a point for which there exists an $(A, \gamma)$-regular function $f$ satisfying (1.5) such that for $t > 0$, .
We define $u(x, t) := p_t(x_0, x)$, and $u^{(k)}(x, t) := p^{(G_k)}_t(x_0, x)$. In this section, we will prove a maximum principle for the quantities where $\xi_R$ will be defined later. This will allow us to estimate various sums and weighted sums of $u^{2}$. One basic estimate which we will use repeatedly is, for any $H \subset G$ and $k \in \mathbb{Z}_+$, , (3.1) using the symmetry and semigroup properties of the heat kernel.
The reason for considering the killed heat kernels p (G k ) t (x, y) is that the function u (k) is finitely supported, and thus there is no difficulty in interchanging double sums. When L θ is not a bounded operator on L 2 (θ) (see [8] for a proof), and the interchange of sums in (3.2) is not straightforward. We also remark that there is in general no simple description of the domain of the associated Dirichlet form E in this case.
Fix k ∈ Z + . Differentiating J (k) R (t) and using the fact that u is a solution to the heat equation on G k , we get (writing u (k) x for u (k) (x, t), ζ for exp • ξ, and ζ x for ζ(x, t)), x ζ x ) even if x 0 ∈ G k or x ∈ G k . By a Gauss-Green type calculation and using the fact that u (k) The equality (3.2) follows from interchanging the order of summation, which is permissible since u (k) has finite support. Completing the square, we see that It follows that Given λ > 1, there exists K λ < ∞ so that the inequality holds for |t| ≤ K λ . Now, we define the distance function d R,θ (x) := (R − d θ (x 0 , x)) + , and set Here R ≥ 0, t > 0, and s = s(t) > t are parameters that will be allowed to vary, and δ, ε > 0 are parameters that will be fixed. For the rest of this paper, we will fix λ, δ, ε so that the following conditions are satisfied: Let us show that such an assignment of constants is possible by exhibiting λ 0 , δ 0 , ε 0 which satisfy the above conditions. First, we choose λ 0 = 2, so that K λ 0 = 2.98 . . . ≤ 3; this satisfies (3.4). Next, since λ 0 and γ are known, we may define δ 0 through (3.7), and estimate so that (3.5) is also satisfied. We then choose ε 0 to be Let us also note that (3.6) is equivalent to Once λ, δ and ε have been fixed, we have the following result: Now, let us analyze the inequality As before, we have |d 2 and, since d R,θ (x) ≤ R, it certainly holds when which is precisely the condition in the statement of the Lemma. Now, for k ∈ Z + , we define By (3.1), all of these quantities are finite, and by monotone convergence, (3.10) The maximum principle allows us to estimate I, as follows: As long as then Lemma 3.2 gives Let us analyze when (4.1) is satisfied. Let j * denote the maximal j for which (4.1) holds. First, j * ≥ 0, since Using the definition of (R j ) j∈Z + , we obtain and the maximality of j * shows that R j * ≤ 6γe 2 t j * , R j * +1 > 6γe 2 t j * +1 − 1 2 .
Rearranging, we obtain is nonincreasing for d ≥ 2t. Since R j * > 2e 2 t j * , we get .
The fact that γ − 1 can be very close to 0 is a potential concern. In practice, one will often have the choice of several values of γ; for example, if f (t) = t α , one may choose any γ > 1. One also has the option of using the fact that (A, γ)−regularity implies (A 2 n , γ 2 n )−regularity to increase γ at the cost of increasing A (and hence m 0 ) also. However, choosing γ excessively large will cause α and n 1 to be undesirably close to zero.

Applications to random walks on percolation clusters
In this section, we show how Theorem 1.3 may be used to obtain Gaussian upper bounds for the CSRW on the infinite component of supercritical bond percolation on the lattice Z d equipped with the standard weights. A detailed description of percolation is given in [16]; a percolation cluster is a random connected subgraph of the lattice Z d obtained by deleting each edge independently with probability 1 − p and keeping it otherwise. By fundamental results of percolation theory, there exists a critical probability p c (d) such that for p > p c (d) (i.e., the supercritical case), there is an a.s. unique infinite cluster; we consider the CSRW on this family of random graphs, which we denote by C p,∞ (ω).
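As a concrete illustration of the model, the following minimal sketch samples bond percolation on a finite $n \times n$ box of $\mathbb{Z}^2$ and extracts its largest open cluster by breadth-first search. It is an assumption-laden stand-in: the infinite cluster $C_{p,\infty}(\omega)$ cannot be sampled directly, so the largest cluster of a finite box is used as a proxy, and the function names `open_edges` and `largest_cluster` are hypothetical, not taken from the paper.

```python
import random
from collections import deque

def open_edges(n, p, rng):
    """Keep each nearest-neighbour edge of the n x n box independently with probability p."""
    edges = set()
    for x in range(n):
        for y in range(n):
            for dx, dy in ((1, 0), (0, 1)):
                nx, ny = x + dx, y + dy
                if nx < n and ny < n and rng.random() < p:
                    edges.add(((x, y), (nx, ny)))
    return edges

def largest_cluster(n, edges):
    """Return the vertex set of the largest connected component of the open subgraph."""
    adj = {(x, y): [] for x in range(n) for y in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over the open edges from an unvisited vertex.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        if len(component) > len(best):
            best = component
    return best

if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed, for reproducibility only
    n, p = 50, 0.7                  # p above 1/2, the critical value for bond percolation on Z^2
    cluster = largest_cluster(n, open_edges(n, p, rng))
    print(len(cluster), "of", n * n, "vertices lie in the largest cluster")
```

With the standard weights, the CSRW on such a cluster jumps at rate 1 to a uniformly chosen neighbour along an open edge, so the adjacency lists built in `largest_cluster` are all that is needed to simulate it.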
For existing work on random walks on percolation clusters, including on-diagonal heat kernel estimates and invariance principles, see [21] and [3]. From now on, we fix p > p c (d), and write q ω t (x, y) for the heat kernel of the CSRW on C p,∞ (ω); the dependence on ω of q ω t (x, y) is a consequence of C p,∞ (ω) being random. We denote the graph metric on C p,∞ (ω) by d C . In [21], Mathieu and Remy proved the following on-diagonal heat kernel bound for the CSRW on C p,∞ (ω).
2. Theorem 1.3 is also used in [1] to obtain Gaussian upper bounds for the heat kernel in the random conductance model; as in the case of supercritical percolation clusters, the function appearing in the on-diagonal heat kernel estimate of Proposition 4.1 of [1] is not (A, γ)−regular but rather (A, γ)−regular on (T, ∞) for some T > 0, so Theorem 1.3 yields Gaussian upper bounds for all sufficiently large times.