Thermalization of rate-independent processes by entropic regularization

We consider the effective behaviour of a rate-independent process when it is placed in contact with a heat bath. The method used to "thermalize" the process is an interior-point entropic regularization of the Moreau--Yosida incremental formulation of the unperturbed process. It is shown that the heat bath destroys the rate independence in a controlled and deterministic way, and that the effective dynamics are those of a non-linear gradient descent in the original energetic potential with respect to a different and non-trivial effective dissipation potential.


Introduction and Outline
In [5,14], it was proposed that a suitable model for the effect of a heat bath (i.e. the application of statistically disordered energy) on a gradient descent is a time-incremental variational problem in which, in each time step, the usual work done competes with an entropy term that penalizes coherent, deterministic evolutions. In the case of linear kinetics (two-homogeneous dissipation), this method is equivalent to the one used in [4] to generate the Fokker-Planck equation for an Itō stochastic gradient descent. This paper examines the case of one-homogeneous dissipation and generalizes the results of [14].
As outlined in Section 2, the discrete-time formulation of a rate-independent evolution in an energetic potential E(t, x) with respect to a one-homogeneous dissipation potential Ψ(x) is to find, given the state x_i at time t_i, the state x_{i+1} at time t_{i+1} that minimizes
$$ x \mapsto E(t_{i+1}, x) + \Psi(x - x_i). \tag{1.1} $$
To represent the influence of a heat bath of "intensity" θ > 0 upon this evolution, we consider an associated variational problem (2.12) for the probability distribution of the random next state of the system, the solution of which is the Gibbsian density
$$ \rho_{i+1}(x \mid x_i) \propto \exp\left( - \frac{E(t_{i+1}, x) + \Psi(x - x_i)}{\theta \, (t_{i+1} - t_i)} \right). $$
This paper shows that, under suitable assumptions on E and Ψ, in the limit as the time step tends to zero, this procedure yields a non-trivial deterministic limiting process. This limiting process is a gradient descent in the original energetic potential E, but with respect to a new dissipation potential Ψ̂ that is a non-linear transformation (the Cramér transform) of the original one, Ψ. As demonstrated in [13,14], this non-linear gradient descent arises in mechanical contexts such as Andrade creep. Rate-independent processes play an important rôle in the modelling of many physical phenomena, such as plasticity and phase transformations in elastic solids, electromagnetism, dry friction on surfaces, and pinning problems in superconductivity. It is widely accepted that rate-independent processes, which describe mesoscopic or macroscopic properties, are limits of more complicated microstructural evolutions: the rate-independent model arises in the limit of vanishing inertia, relaxation time and thermal effects. This paper is concerned with relaxing the third of these limiting assumptions.
- PREPRINT - May 2, 2014 -
In Section 2, the notation and set-up of the problem are given, including a brief review of the necessary elements of the theories of gradient descents and rate-independent processes. In Section 3, some formal calculations are performed that motivate the introduction of the effective dissipation potential Ψ̂. In Section 4, Ψ̂ is defined more formally, its properties are examined, and the main convergence theorem (Theorem 4.4) is stated. Some conclusions and outlook for future work are given in Section 5. The proofs of the various results are given in Section 6.

Notation and Set-Up of the Problem
2.1. Gradient Descents. Both the unperturbed and perturbed processes of study in this paper are examples of gradient descents. The standard example of a gradient descent is the ordinary differential equation ẋ(t) = −∇E(t, x(t)) for x : [0, T] → R^n, which is characterized by the energy evolution law
$$ \frac{d}{dt} E(t, x(t)) = (\partial_t E)(t, x(t)) - \tfrac{1}{2} |\dot{x}(t)|^2 - \tfrac{1}{2} |\nabla E(t, x(t))|^2. \tag{2.1} $$
In general, gradient descents may be considered on any metric space (Q, d); see [1] for a comprehensive treatment. For the purposes of this paper, however, it is enough to consider the case in which Q is a subset of a Banach space (X, ‖·‖). A gradient descent in Q is characterized by an initial condition x_0 ∈ Q, an energetic potential E : [0, T] × Q → R, and a dissipation potential Ψ : X → [0, +∞], which is convex and satisfies Ψ(0) = 0. For simplicity, E(t, x) is assumed to be differentiable with respect to both t and x.
Definition 2.1. An absolutely continuous curve x : [0, T] → Q is said to be a gradient descent in E with respect to Ψ and starting at x_0 ∈ Q if x(0) = x_0 and the energy evolution law
$$ \frac{d}{dt} E(t, x(t)) = (\partial_t E)(t, x(t)) - \Psi(\dot{x}(t)) - \Psi^{\star}\big( -DE(t, x(t)) \big) \tag{2.2} $$
is satisfied for almost every t ∈ [0, T], where Ψ⋆ : X* → [0, +∞] denotes the convex conjugate (Legendre-Fenchel transform) of Ψ, defined by
$$ \Psi^{\star}(\ell) := \sup_{v \in X} \big( \langle \ell, v \rangle - \Psi(v) \big). $$
The condition (2.2) is the appropriate generalization of (2.1); the classical case of linear kinetics is that in which the dissipation potential is given by Ψ(x) := ½‖x‖².
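The convex conjugate Ψ⋆ in Definition 2.1 can be made concrete numerically. The following is a minimal one-dimensional sketch in Python (NumPy), not taken from the paper: the helper name `convex_conjugate` and the truncated grid [−v_max, v_max] are illustrative assumptions. It checks the linear-kinetics case Ψ(v) = ½v², which is self-conjugate, and the one-homogeneous case Ψ(v) = σ|v|, whose conjugate is the characteristic function of [−σ, σ]. On a truncated grid, the value +∞ outside [−σ, σ] shows up as a large finite number.

```python
import numpy as np

def convex_conjugate(psi, ells, v_max=50.0, n=200_001):
    """Approximate psi*(l) = sup_v (l * v - psi(v)) on R by a maximum over a grid.

    The supremum is restricted to the truncated grid [-v_max, v_max], so conjugate
    values that are +infinity in exact arithmetic appear here as large finite numbers.
    """
    v = np.linspace(-v_max, v_max, n)
    psi_v = psi(v)
    return np.array([np.max(l * v - psi_v) for l in np.atleast_1d(ells)])

# Linear kinetics: Psi(v) = v^2/2 is self-conjugate, so Psi*(l) = l^2/2.
quadratic = convex_conjugate(lambda v: 0.5 * v**2, [0.0, 1.0, -2.0])

# One-homogeneous case: Psi(v) = sigma |v| has conjugate 0 on [-sigma, sigma] and +infinity
# outside; on the truncated grid the value at l = 1.0 is (1.0 - sigma) * v_max = 25.0.
sigma = 0.5
one_homogeneous = convex_conjugate(lambda v: sigma * np.abs(v), [0.0, 0.3, 1.0])
print(quadratic)
print(one_homogeneous)
```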
Shortly, we shall consider rate-independent processes, in which Ψ is positively homogeneous of degree one; the limiting processes of this paper will be gradient descents whose (effective) dissipation potential is not homogeneous of any degree.
2.2. Incremental Formulation. The analysis and numerical approximation of gradient descents are often performed using a discrete-time incremental variational formulation, in which, at each time step, the problem is to minimize the Moreau-Yosida regularization of E(t_i, ·) [10,15]. P will denote a partition of the time interval [0, T], i.e. a finite strictly increasing sequence
$$ P = \{ 0 = t_0 < t_1 < \dots < t_N = T \}, $$
where Δt_i := t_i − t_{i−1}, and [P] denotes the mesh size of P:
$$ [P] := \max_{1 \le i \le N} \Delta t_i. $$
The Moreau-Yosida scheme is a causal sequence of variational problems, the Euler-Lagrange equations of which are the (time-discretized) equations of motion for the original gradient descent:
Definition 2.2. The Moreau-Yosida incremental formulation of the gradient descent in E with respect to Ψ is to solve the following sequence of minimization problems: given an initial condition x_0^{(P)} := x_0, for i = 0, …, N − 1, find
$$ x_{i+1}^{(P)} \in \operatorname*{arg\,min}_{x \in Q} \left( E(t_{i+1}, x) + \Delta t_{i+1} \, \Psi\!\left( \frac{x - x_i^{(P)}}{\Delta t_{i+1}} \right) \right). \tag{2.6} $$
By abuse of notation, let x^{(P)} : [0, T] → Q also denote the càdlàg piecewise-constant interpolation of the sequence (x_i^{(P)})_{i=0}^{N}, as defined by
$$ x^{(P)}(t) := x_i^{(P)} \text{ for } t \in [t_i, t_{i+1}), \qquad x^{(P)}(T) := x_N^{(P)}. \tag{2.7} $$
2.3. Rate-Independent Processes. A rate-independent process is an evolutionary system that has no intrinsic time-scale: it "reacts only as quickly as its time-dependent inputs". Put another way, the solution operator commutes with monotone reparametrizations of time. There is much literature on the theory, modelling and analysis of rate-independent processes and the connections with gradient descent theory; see e.g. [6,7,8].
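Definition 2.3. Let Q and Q* be topological spaces. Suppose that each choice of initial condition x_0 ∈ Q and each input ℓ : [0, T] → Q* determines a (possibly empty) set of outputs x : [0, T] → Q. The input-output relationship is said to be rate-independent if, for every strictly increasing and surjective ϕ : [0, T′] → [0, T], the outputs for the reparametrized input ℓ ∘ ϕ are precisely the reparametrized outputs x ∘ ϕ, where x ranges over the outputs for ℓ. The relationship is said to determine a (possibly multi-valued) evolutionary system if concatenations and restrictions of solutions are also solutions, i.e. the restriction of an output to a subinterval [s, t] ⊆ [0, T] is an output for the restricted input with initial condition x(s), and an output on [r, s] followed by an output on [s, t] that starts from its final value together form an output on [r, t].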
In the case of gradient descents on (subsets of) Banach spaces as described above, rate-independence corresponds to the dissipation potential Ψ : X → [0, +∞] being positively homogeneous of degree one, i.e.
$$ \Psi(\alpha v) = \alpha \Psi(v) \quad \text{for all } \alpha \ge 0 \text{ and } v \in X. \tag{2.8} $$
It will be assumed that Ψ is both continuous and non-degenerate, i.e. there exist constants c_Ψ, C_Ψ > 0 such that
$$ c_\Psi \|v\| \le \Psi(v) \le C_\Psi \|v\| \quad \text{for all } v \in X. \tag{2.9} $$
Given the convexity of Ψ and (2.8), this is equivalent to assuming that Ψ is the convex conjugate of the characteristic function of a suitable subset of X*:
$$ \Psi(v) = \chi_E^{\star}(v) = \sup_{w \in E} \langle w, v \rangle $$
for some bounded, closed and convex set E ⊆ X* having 0 as an interior point. E is known as the elastic region and its frontier ∂E is known as the yield surface. The set
$$ S(t) := \{ x \in Q \mid -DE(t, x) \in E \} $$
is the collection of (locally) stable states at time t; since in this paper the energy E will always be convex, the distinction between global and local stability will not matter. As shown in [9, Theorem 7.1], the rate-independent problem is well-posed in the case that Q = X is a separable and reflexive Banach space, that Ψ satisfies (2.8) and (2.9), and that E(t, ·) is of smoothness class C³, with the eigenvalues of D²E bounded below by some γ > 0, uniformly in time and space.
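The scheme (2.6) and the stable set S(t) can be illustrated in one dimension. The following is a hedged Python sketch that assumes the quadratic energy E(t, x) = a x²/2 − ℓ(t) x with a > 0. The helper names and the parameter values a = 1, σ = 0.5 are hypothetical choices for illustration. For Ψ(v) = ½v², each step of (2.6) is an implicit Euler step. For the one-homogeneous Ψ(v) = σ|v|, the time step drops out and each step projects the previous state onto S(t_{i+1}) = [(ℓ − σ)/a, (ℓ + σ)/a], which is the classical play operator. Holding each load for longer produces the same sequence of states, and this is what rate independence means at the discrete level.

```python
import math

def moreau_yosida_step_quadratic(x_prev, ell_next, dt, a=1.0):
    """(2.6) with Psi(v) = v^2/2 and E(t, x) = a x^2/2 - ell(t) x.

    Minimising a x^2/2 - ell x + (x - x_prev)^2/(2 dt): the first-order condition
    a x - ell + (x - x_prev)/dt = 0 is the implicit Euler step for dx/dt = -(a x - ell).
    """
    return (x_prev + dt * ell_next) / (1.0 + a * dt)

def moreau_yosida_step_play(x_prev, ell_next, a=1.0, sigma=0.5):
    """(2.6) with Psi(v) = sigma |v| (one-homogeneous, so dt drops out) and the same E, a > 0.

    The minimiser of a x^2/2 - ell x + sigma |x - x_prev| is the projection of x_prev onto
    the stable set S = {x : |a x - ell| <= sigma} = [(ell - sigma)/a, (ell + sigma)/a].
    """
    lower, upper = (ell_next - sigma) / a, (ell_next + sigma) / a
    return min(max(x_prev, lower), upper)

# Linear kinetics: relaxation of x(0) = 1 with ell = 0 over [0, 1] approaches exp(-1).
N = 1000
x = 1.0
for _ in range(N):
    x = moreau_yosida_step_quadratic(x, 0.0, 1.0 / N)
print(x, math.exp(-1.0))

def rate_independent_path(loads, x0=0.0):
    """Apply the one-homogeneous step to a sequence of loads ell(t_0), ell(t_1), ..."""
    xs = [x0]
    for ell in loads[1:]:
        xs.append(moreau_yosida_step_play(xs[-1], ell))
    return xs

# Rate independence: only the order of the loads matters, not how long each is held.
fast = rate_independent_path([0.0, 1.0, 2.0, 1.0, 0.0])
slow = rate_independent_path([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0])
print(fast)        # [0.0, 0.5, 1.5, 1.5, 0.5]
print(slow[-1])    # 0.5
```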
2.4. Thermalized Gradient Descents: Entropic Regularization. Consider a gradient descent in R^n with respect to an energy E and dissipation Ψ. The corresponding Moreau-Yosida incremental problem is as follows: given the state x_i at time t_i, the aim is to find x_{i+1} to minimize
$$ x \mapsto E(t_{i+1}, x) + \Delta t_{i+1} \, \Psi\!\left( \frac{x - x_i}{\Delta t_{i+1}} \right). $$
To model the effect of a heat bath on the gradient descent, we pass to an extended problem, in which the state of the system at time t_i is a random variable X_i. Given that the random state X_i assumes the value x_i at time t_i, the random next state X_{i+1} for time t_{i+1} is posited to have the conditional probability density function ρ_{i+1}(·|x_i) ∈ L¹(R^n, λ; [0, +∞]) that minimizes
$$ \rho \mapsto \int_{\mathbb{R}^n} \left( E(t_{i+1}, x) + \Delta t_{i+1} \, \Psi\!\left( \frac{x - x_i}{\Delta t_{i+1}} \right) \right) \rho(x) \, d\lambda(x) + \varepsilon_{i+1} \int_{\mathbb{R}^n} \rho(x) \log \rho(x) \, d\lambda(x) \tag{2.12} $$
among probability density functions ρ, where λ denotes Lebesgue measure. The parameter ε_{i+1} > 0 represents the intensity of the heat bath to which the gradient descent is coupled; more precisely, ε_{i+1} is the amount of (disordered) energy that the heat bath injects into the system over the time interval [t_i, t_{i+1}].
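Equivalently to (2.12), given that the current state X_i has probability density function ρ_i ∈ L¹(R^n, λ; [0, +∞]), we may seek a joint probability density function ρ_{i,i+1} ∈ L¹(R^n × R^n, λ ⊗ λ; [0, +∞]) that has ρ_i as its first marginal and minimizes
$$ \iint_{\mathbb{R}^n \times \mathbb{R}^n} \left( E(t_{i+1}, y) + \Delta t_{i+1} \, \Psi\!\left( \frac{y - x}{\Delta t_{i+1}} \right) + \varepsilon_{i+1} \log \rho_{i,i+1}(x, y) \right) \rho_{i,i+1}(x, y) \, d(\lambda \otimes \lambda)(x, y). \tag{2.13} $$
The connection between (2.12) and (2.13) is given by ρ_{i,i+1}(x, y) = ρ_i(x) ρ_{i+1}(y|x). The minimizer of (2.12) is a Gibbs-Boltzmann-type conditional probability density function (cf. [3,4]):
$$ \rho_{i+1}(x \mid x_i) \propto \exp\left( - \frac{E(t_{i+1}, x) + \Delta t_{i+1} \, \Psi\big( (x - x_i)/\Delta t_{i+1} \big)}{\varepsilon_{i+1}} \right), \tag{2.14} $$
where the proportionality constant is chosen so that ρ_{i+1}(·|x_i) is a probability density. Hence, given a partition P of [0, T], an initial state x_0 ∈ R^n, an energetic potential E : [0, T] × R^n → R and a dissipation potential Ψ : R^n → [0, +∞), the thermalized gradient descent X^{(P)} denotes the discrete-time Markov chain that has transition probability densities given by (2.14). By the usual abuse of notation, X^{(P)} will also denote the càdlàg piecewise-constant interpolation (2.7), defined for all times t ∈ [0, T].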
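One transition of the thermalized Markov chain can be sampled directly from (2.14). Below is a hedged one-dimensional sketch in Python (NumPy). It assumes E(t, x) = a x²/2 − ℓ x and Ψ(v) = σ|v|, so that Δt Ψ((x − x_i)/Δt) = σ|x − x_i|. It tabulates the density on a uniform grid and samples by inverting the discretized CDF. The helper names, grid width and resolution are illustrative assumptions. For small ε the density concentrates at the deterministic minimizer of (1.1). With x_i = 0, ℓ = 1, a = 1 and σ = 0.5, that minimizer is 0.5, and the density there is a Gaussian with variance ε.

```python
import numpy as np

def gibbs_transition_density(x_prev, ell, eps, a=1.0, sigma=0.5, half_width=5.0, n=20_001):
    """Grid approximation of (2.14) in 1-D for E(t,x) = a x^2/2 - ell x and Psi(v) = sigma |v|.

    Because Psi is one-homogeneous, Delta t * Psi((x - x_prev)/Delta t) = sigma |x - x_prev|,
    so rho(x | x_prev) is proportional to exp(-(E(x) + sigma |x - x_prev|) / eps).
    """
    x = np.linspace(x_prev - half_width, x_prev + half_width, n)
    action = 0.5 * a * x**2 - ell * x + sigma * np.abs(x - x_prev)
    weights = np.exp(-(action - action.min()) / eps)   # shift by the minimum to avoid underflow
    dx = x[1] - x[0]
    return x, weights / (weights.sum() * dx)           # normalise with a Riemann sum

def sample_next_states(x_prev, ell, eps, rng, size, **params):
    """Draw `size` independent copies of X_{i+1} given X_i = x_prev by inverting the grid CDF."""
    x, rho = gibbs_transition_density(x_prev, ell, eps, **params)
    cdf = np.cumsum(rho)
    cdf /= cdf[-1]
    idx = np.searchsorted(cdf, rng.random(size))
    return x[np.minimum(idx, x.size - 1)]

rng = np.random.default_rng(1)
x, rho = gibbs_transition_density(0.0, 1.0, eps=1e-3)
dx = x[1] - x[0]
mean = np.sum(x * rho) * dx
samples = sample_next_states(0.0, 1.0, 1e-3, rng, size=2000)
print(mean, samples.mean())   # both close to the deterministic minimiser 0.5 of (1.1)
```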
In the classical case of linear kinetics (i.e. Ψ(x) = ½|x|² for x ∈ R^n), this procedure generates the same sequence of densities as the method of [4], and they converge as [P] → 0 to the solution of the Fokker-Planck equation for the Itō stochastic gradient descent Ẋ(t) = −∇E(t, X(t)) + √ε Ẇ(t). Theorem 4.4 establishes the deterministic limiting behaviour of the stochastic process X^{(P)} as [P] → 0 in the case of a one-homogeneous dissipation potential Ψ.

Heuristics and Calculation of Moments
In this section we perform some calculations to motivate the main result of Section 4. For simplicity, suppose temporarily that E is of the prototypical quadratic type
$$ E(t, x) = \tfrac{1}{2} \langle A x, x \rangle - \langle \ell(t), x \rangle $$
for some symmetric and non-negative A : R^n → (R^n)* and some smooth enough ℓ : [0, T] → (R^n)*; this assumption will be relaxed shortly. Also, merely to aid the heuristic and simplify the notation, suppose that the parameters ε_i > 0 are all equal to some constant ε > 0 independent of i, and that the time step Δt_i is also independent of i.

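Consider the conditional expectation of the next state X^{(P)}_{i+1} of the Markov chain X^{(P)} given that X^{(P)}_i = x_i. Since Ψ is one-homogeneous, Δt Ψ((x − x_i)/Δt) = Ψ(x − x_i), so in (2.14) the change of variables z := (x − x_i)/ε turns Ψ(x − x_i)/ε into Ψ(z), and, for the quadratic energy above,
$$ \frac{E(t_{i+1}, x_i + \varepsilon z) - E(t_{i+1}, x_i)}{\varepsilon} = \langle DE(t_{i+1}, x_i), z \rangle + \frac{\varepsilon}{2} \langle A z, z \rangle. $$
Hence
$$ \mathbb{E}\big[ X^{(P)}_{i+1} - x_i \,\big|\, X^{(P)}_i = x_i \big] = \varepsilon \, \frac{\int_{\mathbb{R}^n} z \, e^{-\langle DE(t_{i+1}, x_i), z \rangle - \Psi(z) - \frac{\varepsilon}{2} \langle A z, z \rangle} \, dz}{\int_{\mathbb{R}^n} e^{-\langle DE(t_{i+1}, x_i), z \rangle - \Psi(z) - \frac{\varepsilon}{2} \langle A z, z \rangle} \, dz}. $$
For w ∈ (R^n)*, define
$$ \widehat{\Psi}^{\star}_{\varepsilon}(w) := \log \frac{\int_{\mathbb{R}^n} e^{-\langle w, z \rangle - \Psi(z) - \frac{\varepsilon}{2} \langle A z, z \rangle} \, dz}{\int_{\mathbb{R}^n} e^{-\Psi(z)} \, dz}. $$
Since differentiation under the integral sign shows that DΨ̂⋆_ε(w) is minus the mean of the probability density proportional to z ↦ exp(−⟨w, z⟩ − Ψ(z) − (ε/2)⟨Az, z⟩), the result of the above calculation may be summarized as
$$ \mathbb{E}\big[ X^{(P)}_{i+1} - x_i \,\big|\, X^{(P)}_i = x_i \big] = -\varepsilon \, D\widehat{\Psi}^{\star}_{\varepsilon}\big( DE(t_{i+1}, x_i) \big). $$
The same change of variables z := (x_{i+1} − x_i)/ε gives an estimate for the p-th moment of the increments of the Markov chain:
$$ \mathbb{E}\big[ |X^{(P)}_{i+1} - x_i|^p \,\big|\, X^{(P)}_i = x_i \big] = \varepsilon^p \, \frac{\int_{\mathbb{R}^n} |z|^p \, e^{-\langle DE(t_{i+1}, x_i), z \rangle - \Psi(z) - \frac{\varepsilon}{2} \langle A z, z \rangle} \, dz}{\int_{\mathbb{R}^n} e^{-\langle DE(t_{i+1}, x_i), z \rangle - \Psi(z) - \frac{\varepsilon}{2} \langle A z, z \rangle} \, dz}, $$
which is O(ε^p) as ε → 0 whenever −DE(t_{i+1}, x_i) lies in the interior of E. For later reference, these calculations are summarized in the following lemma:
Lemma 3.1. Suppose that E(t, x) = ½⟨Ax, x⟩ − ⟨ℓ(t), x⟩ with A : R^n → (R^n)* symmetric and non-negative. Suppose also that Ψ = χ⋆_E : R^n → [0, +∞) is 1-homogeneous and non-degenerate. Let X^{(P)} denote the thermalized gradient descent Markov chain in E and Ψ on a partition P of [0, T]. Then, given X^{(P)}_i = x_i, the increment X^{(P)}_{i+1} − x_i has conditional mean −ε DΨ̂⋆_ε(DE(t_{i+1}, x_i)) and conditional covariance ε² D²Ψ̂⋆_ε(DE(t_{i+1}, x_i)). Moreover, for every k ∈ N, its k-th conditional central moment is ε^k times the k-th central moment of the probability density on R^n proportional to z ↦ exp(−⟨DE(t_{i+1}, x_i), z⟩ − Ψ(z) − (ε/2)⟨Az, z⟩).
The above calculations, including Lemma 3.1, also go through even if E is not a quadratic form. The non-Ψ terms in the exponent are then the Taylor series expansion of
$$ \frac{E(t_{i+1}, x_i + \varepsilon z) - E(t_{i+1}, x_i)}{\varepsilon} = \langle DE(t_{i+1}, x_i), z \rangle + \frac{\varepsilon}{2} \langle D^2 E(t_{i+1}, x_i) z, z \rangle + \dots, \tag{3.2} $$
and Ψ̂⋆_ε, which then also depends on t_{i+1} and x_i, is defined by using the full expression (3.2), minus its first-order term ⟨DE(t_{i+1}, x_i), z⟩, in place of (ε/2)⟨Az, z⟩. By abuse of notation, Lemma 3.1 will henceforth be taken to refer to the generalized result for not-necessarily-quadratic E using (3.2).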
Note, however, that in none of these expressions does the time increment appear explicitly. This is to be expected, since the original evolution was a rate-independent one. Therefore, in order to obtain a Markov chain that takes any account of time, it will be necessary to take ε to be proportional to the time step, i.e. ε_{i+1} = θΔt_{i+1}. Physically, since E, Ψ and ε all have the units of energy, this corresponds to assuming that the heat bath supplies energy to the system at a constant rate: the power of the heat bath is the constant of proportionality θ between ε and the time step. The parameter θ measures the intensity of the heat bath and can be seen, in some sense, as the "temperature".
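The moment calculations behind Lemma 3.1 can be checked numerically in one dimension. The following Python (NumPy) sketch makes several assumptions. It takes the quadratic energy with a = 1 and Ψ(z) = σ|z| with σ = 0.5. For this Ψ, normalizing so that Ψ̂⋆(0) = 0, the one-dimensional computation gives Ψ̂⋆(w) = −log(1 − w²/σ²) for |w| < σ. The helper `increment_moments` and the quadrature grid are illustrative choices. The sketch confirms that the conditional mean increment is of order ε, close to −ε DΨ̂⋆(w), and that the variance is of order ε², close to ε² D²Ψ̂⋆(w), when w = DE(t_{i+1}, x_i) satisfies |w| < σ and ε is small.

```python
import numpy as np

def increment_moments(w, eps, a=1.0, sigma=0.5, z_max=200.0, n=400_001):
    """Conditional mean and variance of X_{i+1} - x_i in 1-D, via z = (X_{i+1} - x_i) / eps.

    Assumes E(t,x) = a x^2/2 - ell(t) x and Psi(z) = sigma |z|, so z has density proportional
    to exp(-w z - sigma |z| - eps a z^2 / 2), where w = DE(t_{i+1}, x_i).
    """
    z = np.linspace(-z_max, z_max, n)
    log_p = -w * z - sigma * np.abs(z) - 0.5 * eps * a * z**2
    p = np.exp(log_p - log_p.max())
    p /= p.sum()                      # uniform grid, so the factor dz cancels
    mean_z = np.sum(z * p)
    var_z = np.sum((z - mean_z) ** 2 * p)
    return eps * mean_z, eps**2 * var_z

w, sigma, eps = 0.2, 0.5, 1e-5
mean, var = increment_moments(w, eps, sigma=sigma)
drift_limit = -2.0 * w / (sigma**2 - w**2)                     # -D(Psi-hat*)(w)
var_limit = 2.0 * (sigma**2 + w**2) / (sigma**2 - w**2) ** 2   # D^2(Psi-hat*)(w)
print(mean / eps, drift_limit)    # increments are O(eps) ...
print(var / eps**2, var_limit)    # ... with variances O(eps^2)
```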
The potential Ψ̂⋆_ε : (R^n)* → R ∪ {+∞} encodes a great deal of information about the Markov chain X. Most of the terms in the exponent defining Ψ̂⋆_ε are of order ε or higher, and so can reasonably be expected to have no influence in the limit as [P] tends to zero in proportion to ε. The limiting dynamics of the Markov chain are therefore expected to be controlled by an effective dual dissipation potential Ψ̂⋆, which is Ψ̂⋆_ε with these higher-order terms omitted. Furthermore, the strong similarity to the Euler method for an ordinary differential equation, and the fact that the variances are of order ε² ≪ ε, suggest that the limiting evolution takes the form of a deterministic ordinary differential equation
$$ \dot{y}(t) = -\theta \, D\widehat{\Psi}^{\star}\big( DE(t, y(t)) \big). \tag{3.4} $$
Therefore, the conjecture is that the effective behaviour of the rate-independent process in E with respect to Ψ, when brought into contact with the heat bath, is that of a gradient descent in E with respect to the non-linear effective dissipation potential Ψ̂.
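The conjectured limit ẏ = −θ DΨ̂⋆(DE(t, y)) can be integrated directly. Below is a hedged Python sketch using the explicit Euler method. It is an illustration only and makes the following assumptions. The energy is the one-dimensional quadratic E(t, y) = a y²/2 − ℓ(t) y, and Ψ(z) = σ|z|. For this Ψ, the one-dimensional computation gives DΨ̂⋆(w) = 2w/(σ² − w²) on |w| < σ. The function name, step size and parameters are hypothetical. With a constant load, an initially stable state is never moved by the rate-independent process. The thermalized limit, by contrast, relaxes steadily towards the state where DE vanishes.

```python
def limiting_ode_euler(y0, ell, theta=1.0, a=1.0, sigma=0.5, T=5.0, n_steps=5000):
    """Explicit Euler scheme for dy/dt = -theta * D(Psi-hat*)(DE(t, y)) in 1-D.

    Illustrative assumptions: E(t, y) = a y^2/2 - ell(t) y, so DE(t, y) = a y - ell(t), and
    Psi(z) = sigma |z|, for which D(Psi-hat*)(w) = 2 w / (sigma^2 - w^2) when |w| < sigma.
    """
    h = T / n_steps
    ys = [y0]
    for i in range(n_steps):
        w = a * ys[-1] - ell(i * h)
        if abs(w) >= sigma:
            raise ValueError("left the interior of the stable set; try a smaller step")
        ys.append(ys[-1] - h * theta * 2.0 * w / (sigma**2 - w**2))
    return ys

# Constant load ell = 1: y0 = 0.6 is stable (|0.6 - 1| < sigma), so the rate-independent
# process never moves, whereas the thermalized limit relaxes towards ell / a = 1.
ys = limiting_ode_euler(0.6, lambda t: 1.0)
print(ys[0], ys[-1])
```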

Main Results
In this section the formal manipulations of the previous section are made more precise: the effective (dual) dissipation potential that corresponds to Ψ is introduced and its properties examined; and the main convergence theorem about the limiting behaviour of the thermalized gradient descent Markov chain X (P ) as [P ] → 0 is stated.
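4.1. Effective Dissipation Potential. As mentioned above, the effective dual dissipation potential Ψ̂⋆ is simply the functional Ψ̂⋆_ε of (3.2) with ε set equal to zero:
$$ \widehat{\Psi}^{\star}(w) := \log \frac{\int_{\mathbb{R}^n} \exp\big( -\langle w, z \rangle - \Psi(z) \big) \, dz}{\int_{\mathbb{R}^n} \exp( -\Psi(z) ) \, dz} \quad \text{for } w \in (\mathbb{R}^n)^*. \tag{4.1} $$
The associated effective dissipation potential Ψ̂ : R^n → [0, +∞] is the Cramér transform of Ψ (more precisely, of the measure ψ defined in (4.3) below) and is defined by convex conjugation:
$$ \widehat{\Psi}(v) := \sup_{w \in (\mathbb{R}^n)^*} \big( -\langle w, v \rangle - \widehat{\Psi}^{\star}(w) \big) \quad \text{for } v \in \mathbb{R}^n. \tag{4.2} $$
Up to a minus sign, Ψ̂⋆ is the logarithmic moment generating function (or cumulant generating function) of the probability measure ψ/ψ(R^n), where the Borel measure ψ on R^n is defined by
$$ d\psi(z) := \exp(-\Psi(z)) \, dz. \tag{4.3} $$
It is often convenient to write Ψ̂⋆ as an integral over the Euclidean unit sphere S^{n−1} ⊂ R^n with respect to (n − 1)-dimensional Hausdorff measure H^{n−1}. Passing to polar coordinates z = ru and using ∫_0^∞ r^{n−1} e^{−cr} dr = (n − 1)!/c^n for c > 0 gives, whenever −w lies in the interior of E,
$$ \widehat{\Psi}^{\star}(w) = \log \frac{\int_{S^{n-1}} \big( \langle w, u \rangle + \Psi(u) \big)^{-n} \, d\mathcal{H}^{n-1}(u)}{\int_{S^{n-1}} \Psi(u)^{-n} \, d\mathcal{H}^{n-1}(u)}. \tag{4.4} $$
Note that Ψ̂ and Ψ̂⋆ are objects that are intrinsic to the dissipation, not the energetic structure: they are determined entirely by the duality between R^n and (R^n)* and the dissipation potential Ψ (or, equivalently, the geometry of the elastic region E). Proposition 4.2 summarizes the important properties of the effective dual dissipation potential Ψ̂⋆; the proof is deferred to Section 6.
Proposition 4.2. Suppose that Ψ = χ⋆_E : R^n → [0, +∞) satisfies (2.8) and (2.9). Then:
(1) Ψ̂⋆(w) > −∞ for every w ∈ (R^n)*;
(2) Ψ̂⋆(w) < +∞ if and only if −w lies in the interior of E;
(3) Ψ̂⋆ is convex;
(4) Ψ̂⋆ is smooth on −int E, and its derivatives may be computed by differentiation under the integral sign;
(5) Ψ̂⋆(w) → +∞ as −w → ∂E, and |DΨ̂⋆| also blows up near −∂E.
Two examples in which Ψ̂⋆ can be computed in closed form are the following.
(1) Suppose that the elastic region E is a rectangular box with faces perpendicular to the coordinate axes in (R^n)*:
$$ E := \{ w = (w_1, \dots, w_n) \in (\mathbb{R}^n)^* \mid |w_j| \le \sigma_j \text{ for } j = 1, \dots, n \}, \qquad \sigma_j > 0. \tag{4.5} $$
Then the dissipation potential Ψ is the weighted ℓ¹ "Manhattan" norm Ψ(z) = Σ_{j=1}^n σ_j |z_j|, and
$$ \widehat{\Psi}^{\star}(w) = -\sum_{j=1}^{n} \log\left( 1 - \frac{w_j^2}{\sigma_j^2} \right) \text{ if } |w_j| < \sigma_j \text{ for all } j, \qquad \widehat{\Psi}^{\star}(w) = +\infty \text{ otherwise}. \tag{4.6} $$
(2) Suppose that the elastic region E is a Euclidean ball
$$ E := \{ w = (w_1, \dots, w_n) \in (\mathbb{R}^n)^* \mid |w_1|^2 + \dots + |w_n|^2 \le \sigma^2 \}. \tag{4.7} $$
Then the dissipation potential Ψ is exactly σ times the usual Euclidean norm, and
$$ \widehat{\Psi}^{\star}(w) = -\frac{n + 1}{2} \log\left( 1 - \frac{|w|^2}{\sigma^2} \right) \text{ if } |w| < \sigma, \qquad \widehat{\Psi}^{\star}(w) = +\infty \text{ otherwise}. \tag{4.8} $$
To the standing assumption that Ψ = χ⋆_E satisfies (2.8) and (2.9), we now add some assumptions on the energetic potential E. E : [0, T] × R^n → R is assumed to be bounded below, smooth in space with all derivatives uniformly bounded, and such that (t, x) ↦ DE(t, x) is uniformly Lipschitz. It is also assumed that E is convex, and hence that the Hessian of E is a non-negative operator. Two further, more technical, assumptions are also required.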
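Both of these technical assumptions, (4.9) and (4.10) below, are satisfied in the prototypical case E(t, x) = ½⟨Ax, x⟩ − ⟨ℓ(t), x⟩ (for (4.10), when A is positive-definite and the initial condition is stable). In this case the set of stable states S(t) = A^{−1}(ℓ(t) − E) is convex and closed for every t; if A is positive-definite, then S(t) is also bounded, and hence compact. The prototypical case was examined in [14]; the technical conditions that follow were introduced in [13].
[Figure 4.2. The composition x ↦ Ψ̂⋆(DV(x)) is evidently non-convex, although it is quasiconvex (i.e. it has convex sublevel sets).]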
In order to control certain error terms in the proof of Lemma 6.4, which leads to Theorem 4.4, a monotonicity assumption is used to ensure that these terms have the right sign regardless of their magnitude. The requisite assumption is that
$$ \text{for all } t \in [0, T], \quad x \mapsto \widehat{\Psi}^{\star}\big( DE(t, x) \big) \text{ is convex}, \tag{4.9} $$
or, equivalently, that D(Ψ̂⋆(DE(t, ·))) is a monotone vector field for every t ∈ [0, T]. This is a non-trivial assumption even if E is strictly convex, as the example illustrated in Figure 4.2 shows. Note also that (4.9) presupposes that the set S(t) of stable states is convex for every t ∈ [0, T], and that convexity of E(t, ·) does not imply convexity of S(t); see e.g. the kidney-shaped stable set of [9, Example 5.5]. Nevertheless, (4.9) holds in the prototypical case, since DE(t, x) = Ax − ℓ(t) is an affine function of x and the composition of a convex function with an affine one always yields a convex function [2, §3.2].
It is also necessary to place an implicit constraint on the time-dependency of E. The problem to be avoided is that all the estimates for the moments of the increments ΔX of the Markov chain degenerate as −DE approaches the yield surface ∂E. Hence it is assumed that the solution y of (3.4) with y(0) = x_0 stays uniformly away from the yield surface, in the sense that
$$ \sup_{t \in [0, T]} \widehat{\Psi}^{\star}\big( DE(t, y(t)) \big) < +\infty. \tag{4.10} $$
(N.B. In this case of "small ℓ", the rate-independent process is static but the thermalized process (3.4) is not.) If A is positive-definite, then (4.10) always holds whenever the initial condition satisfies y(0) ∈ int S(0) = A^{−1}(ℓ(0) − int E). Indeed, for not-necessarily-quadratic energies, uniform convexity of E implies the condition (4.10):
Lemma 4.3. Suppose that D²E(t, x) ≥ γ > 0 uniformly in t and x, that y(0) lies in the interior of S(0), and that ‖∂_t DE‖_{L∞} < +∞. Then (4.10) holds.
Theorem 4.4. Suppose that E and Ψ satisfy the assumptions above, including (4.9) and (4.10), and that ε_i = θΔt_i for every i. Let X^{(P)} be the thermalized gradient descent Markov chain with X^{(P)}_0 = x_0, and let y solve (3.4) with y(0) = x_0. Then, for every η > 0, P[max_{0≤i≤N} |X^{(P)}_i − y(t_i)| ≥ η] → 0 as [P] → 0.
An illustrative comparison of the original rate-independent evolution and the effect of the heat bath is given in Figure 4.3. Note that when θ is large (which corresponds to the heat bath being very hot), ẏ(t)/θ typically lies in the region of R^n close to the origin where Ψ̂ is approximately 2-homogeneous; when θ is small (which corresponds to the heat bath being cold), ẏ(t)/θ typically lies in the region of R^n far from the origin where Ψ̂ is approximately 1-homogeneous.
Indeed, as θ → 0, the original rate-independent dynamics are recovered.
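The transition between 2-homogeneous and 1-homogeneous behaviour of Ψ̂ can be seen explicitly in one dimension. The following Python sketch assumes Ψ(z) = σ|z| on R, normalized so that Ψ̂⋆(w) = −log(1 − w²/σ²) for |w| < σ. This Ψ̂⋆ is even, so the sign convention used in the conjugation does not matter. The maximizer in the conjugation then solves a quadratic equation, which gives a closed form. The function names are hypothetical. Near the origin, Ψ̂(v) behaves like σ²v²/4. For a fixed velocity v, the rescaled potential θ Ψ̂(v/θ) increases towards the rate-independent dissipation σ|v| as θ → 0.

```python
import math

def cramer_transform_1d(v, sigma):
    """Psi-hat(v) for Psi(z) = sigma |z| on R (one-dimensional box), a sketch.

    Uses Psi-hat*(w) = -log(1 - w^2/sigma^2) for |w| < sigma.  This Psi-hat* is even, so
    Psi-hat(v) = sup_w (v w - Psi-hat*(w)).  The maximiser solves v = 2 w / (sigma^2 - w^2);
    the root of v w^2 + 2 w - v sigma^2 = 0 in (-sigma, sigma) is w = (sqrt(1 + (v sigma)^2) - 1) / v.
    """
    if v == 0.0:
        return 0.0
    w = (math.sqrt(1.0 + (v * sigma) ** 2) - 1.0) / v
    return v * w + math.log1p(-(w / sigma) ** 2)

def effective_dissipation(v, sigma, theta):
    """theta * Psi-hat(v / theta), the dissipation experienced by the velocity v at intensity theta."""
    return theta * cramer_transform_1d(v / theta, sigma)

sigma = 0.5
small = cramer_transform_1d(1e-3, sigma)
print(small / 1e-6, sigma**2 / 4)        # approximately quadratic near the origin
values = [effective_dissipation(1.0, sigma, theta) for theta in (1.0, 1e-2, 1e-4)]
print(values)                            # increases towards sigma * |v| = 0.5 as theta -> 0
```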

Conclusions and Outlook
There are three natural directions in which the results of this paper could be generalized. First and most obviously, the smoothness, convexity and other structural assumptions on E could be relaxed: so far, the various error terms in the proof of Theorem 4.4 are controlled only under these assumptions. Note that a rate-independent evolution in a non-convex energetic potential E is not always unique; the thermalization procedure could provide a selection principle if the thermalized process has a unique limit as θ → 0. Secondly, since most rate-independent processes of interest are infinite-dimensional, or even posed on spaces that lack a linear structure, more general state spaces than R^n could be considered. This is a potentially subtle topic, since in infinite-dimensional settings there is no obvious candidate for a reference measure with respect to which to take densities to calculate the entropy in (2.12). As noted in [13, Theorem 5.3.5], the Markov chains of study are not invariant under change of reference measure: the logarithm of the Radon-Nikodým derivative of the change of measure acts as an additive perturbation of the energetic potential. The calculations of Section 3 are also of independent interest: they amount to a study of the tangent measures (in the sense of [12]) of the Gibbsian distribution (2.14).
Thirdly, the limiting result of Theorem 4.4 should be seen as a first-order approximation that is valid for small positive "temperatures" θ. It would be interesting to examine the behaviour of a suitable rescaling of X − y and to determine whether it obeys, say, a large deviations principle with respect to a suitable rate function.

Proofs and Supporting Results
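Lemma 6.1. Let Ψ : R^n → [0, +∞) be one-homogeneous, continuous, and non-degenerate as in (2.8)-(2.9), and let E be the elastic region, so that Ψ = χ⋆_E. Let m : (R^n)* → R be given by
$$ m(v) := \min_{u \in S^{n-1}} \big( \langle v, u \rangle + \Psi(u) \big), $$
where S^{n−1} ⊂ R^n denotes the Euclidean unit sphere. Then m is continuous and
$$ m(v) > 0 \text{ if } -v \in \operatorname{int} E, \qquad m(v) = 0 \text{ if } -v \in \partial E, \qquad m(v) < 0 \text{ if } -v \notin E. $$
Proof. Let f(v, u) := ⟨v, u⟩ + Ψ(u), so that m(v) = min_{u∈S^{n−1}} f(v, u). Since χ_E is convex and lower semi-continuous, χ_E = χ_E^{⋆⋆} = Ψ⋆, and so, by the one-homogeneity of Ψ,
$$ -v \in E \iff \langle -v, u \rangle \le \Psi(u) \text{ for all } u \in S^{n-1} \iff m(v) \ge 0. \tag{6.1} $$
Note that f is continuous. Since m is a pointwise infimum of a family of continuous functions, it is upper semi-continuous. Since S^{n−1} is compact, m is a pointwise infimum of a compactly-parametrized family of continuous functions, and so is also lower semi-continuous [11]. Thus, m is continuous.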
Suppose that there exists −v ∈ int E with m(v) = 0. By the compactness of S^{n−1}, this implies that there exists a unit vector u_0 ∈ S^{n−1} with
$$ \langle v, u_0 \rangle + \Psi(u_0) = 0. \tag{6.2} $$
But, since −v ∈ int E, there exists α > 1 such that −αv ∈ int E. Since Ψ(u_0) ≥ c_Ψ > 0, (6.2) implies that ⟨v, u_0⟩ < 0, so f(αv, u_0) = (α − 1)⟨v, u_0⟩ < 0, and so m(αv) < 0, which contradicts (6.1). Hence, m(v) > 0 for all −v ∈ int E. It remains only to show that m(v) = 0 for −v ∈ ∂E. Suppose not, i.e. that there exists −v ∈ ∂E with m(v) > 0 (by (6.1), m(v) ≥ 0, since ∂E ⊆ E). Since −v ∈ ∂E and E is convex with 0 in its interior (and hence star-convex with respect to the origin in (R^n)*), for every α > 1, −αv ∉ E, and so m(αv) < 0 by (6.1). Hence, by the continuity of m,
$$ m(v) = \lim_{\alpha \downarrow 1} m(\alpha v) \le 0, $$
which is a contradiction. This completes the proof.
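The sign pattern of m in Lemma 6.1 can be checked numerically in R². The following Python sketch assumes the box elastic region E = [−0.5, 0.5] × [−0.8, 0.8], so that Ψ(u) = 0.5|u_1| + 0.8|u_2|. It approximates the minimum over the unit circle on equally spaced angles. The helper name and grid size are illustrative choices. It then evaluates m at three points v, for which −v lies inside E, on its boundary, and outside it respectively.

```python
import math

def m_function(v, psi, n_angles=4096):
    """m(v) = min over unit vectors u in R^2 of <v, u> + Psi(u), approximated on equally spaced angles."""
    best = math.inf
    for k in range(n_angles):
        phi = 2.0 * math.pi * k / n_angles
        u = (math.cos(phi), math.sin(phi))
        best = min(best, v[0] * u[0] + v[1] * u[1] + psi(u))
    return best

sigma1, sigma2 = 0.5, 0.8
psi_box = lambda u: sigma1 * abs(u[0]) + sigma2 * abs(u[1])   # Psi = chi*_E, E = [-0.5, 0.5] x [-0.8, 0.8]

m_inside = m_function((0.0, 0.0), psi_box)           # -v = (0, 0) is in the interior of E
m_boundary = m_function((sigma1, 0.0), psi_box)      # -v = (-0.5, 0) lies on the yield surface
m_outside = m_function((2 * sigma1, 0.0), psi_box)   # -v = (-1, 0) lies outside E
print(m_inside, m_boundary, m_outside)
```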
Proof of Proposition 4.2. Let ψ denote the Borel measure on R n defined by (4.3).
(1) By (2.9), ψ is a strictly positive and finite measure. Hence, since the exponential function in the integrand of (4.1) is never zero, the integral in (4.1) is strictly positive, and so Ψ̂⋆(w) > −∞. (2) Consider the spherical integral form (4.4) for Ψ̂⋆. By Lemma 6.1, if −w ∈ int E, then m(w) > 0, so the integrand (⟨w, u⟩ + Ψ(u))^{−n} is a continuous and bounded function on a compact set, and the integral exists and is finite. If −w ∈ ∂E, then, as in the proof of Lemma 6.1, there exists u_w ∈ S^{n−1} with ⟨w, u_w⟩ + Ψ(u_w) = 0, so the integrand has a pole at u_w. The triangle inequality for Ψ and (2.9) imply that, for u such that u_w + u ∈ S^{n−1},
$$ \langle w, u_w + u \rangle + \Psi(u_w + u) \le \langle w, u_w \rangle + \Psi(u_w) + |w| |u| + \Psi(u) \le (|w| + C_\Psi) |u|. $$
Hence, near u_w, the integrand in (4.4) is bounded below by (|w| + C_Ψ)^{−n} |u|^{−n}. By the standard result that x ↦ |x|^{−α} lies in L¹ on a d-dimensional domain about 0 if, and only if, α < d, and since here α = n > n − 1 = d, it follows that Ψ̂⋆(w) = +∞.
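(3) Let v, w ∈ (R^n)* with −v, −w ∈ int E, let λ ∈ (0, 1), and write Z⋆_0(w) := ∫_{R^n} exp(−⟨w, z⟩) dψ(z). Then
$$ Z^{\star}_0\big( \lambda v + (1 - \lambda) w \big) = \int_{\mathbb{R}^n} e^{-\lambda \langle v, z \rangle} e^{-(1 - \lambda) \langle w, z \rangle} \, d\psi(z) \le Z^{\star}_0(v)^{\lambda} \, Z^{\star}_0(w)^{1 - \lambda}, $$
where the inequality follows from Hölder's inequality with exponents 1/λ and 1/(1 − λ). Hence, since the logarithm is a monotonically increasing function, for all such v, w,
$$ \widehat{\Psi}^{\star}\big( \lambda v + (1 - \lambda) w \big) \le \lambda \widehat{\Psi}^{\star}(v) + (1 - \lambda) \widehat{\Psi}^{\star}(w). $$
Moreover, since −E is convex and Ψ̂⋆ is identically +∞ outside the interior of −E, Ψ̂⋆ is convex on all of (R^n)*. (4) The derivative DZ⋆_0 : −int E → (R^n)** ≅ R^n can be computed using the standard theorem on differentiation under the integral sign, yielding
$$ D Z^{\star}_0(w) = -\int_{\mathbb{R}^n} z \, e^{-\langle w, z \rangle} \, d\psi(z), $$
and so on for higher-order derivatives:
$$ D^k Z^{\star}_0(w) = (-1)^k \int_{\mathbb{R}^n} z^{\otimes k} \, e^{-\langle w, z \rangle} \, d\psi(z). $$
The integrals involved are all finite for −w ∈ int E because of the exponentially small tails of the integrand: e^{−⟨w, z⟩ − Ψ(z)} ≤ e^{−m(w)|z|}, with m(w) > 0 by Lemma 6.1. (5) As in the proof of the second part of the claim, let −w ∈ int E and let u_w ∈ S^{n−1} be such that ⟨w, u_w⟩ + Ψ(u_w) is minimal (i.e. equals m(w)). Then
$$ \langle w, u_w + u \rangle + \Psi(u_w + u) \le m(w) + (|w| + C_\Psi) |u|. $$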
Since m(w) → 0 as −w → ∂E (by the continuity of m and Lemma 6.1), the same argument as in point (2) applies, and so Ψ̂⋆(w) → +∞ as −w → ∂E. Now suppose that |DΨ̂⋆| remained bounded on −int E. Then, since Ψ̂⋆ is smooth on the bounded convex set −int E, Ψ̂⋆ would be Lipschitz, and hence bounded, on −int E, which is a contradiction.
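The spherical formula (4.4) used in the proof above can be checked against closed forms in R². The following Python sketch assumes the normalization Ψ̂⋆(0) = 0. It evaluates Ψ̂⋆(w) = log(∫(⟨w, u⟩ + Ψ(u))^{−n} dH^{n−1}(u) / ∫Ψ(u)^{−n} dH^{n−1}(u)) with n = 2 by the midpoint rule on the unit circle. It compares the result with −((n + 1)/2) log(1 − |w|²/σ²) for a Euclidean ball of radius σ and with −Σ_j log(1 − w_j²/σ_j²) for a box. The helper name, the parameter values and the number of angles are illustrative assumptions. The function returns +∞ as soon as ⟨w, u⟩ + Ψ(u) ≤ 0 at some sampled direction, which is the divergence established in part (2).

```python
import math

def psi_hat_star_spherical_2d(w, psi, n_angles=20_000):
    """Midpoint-rule evaluation of the spherical formula (4.4) in R^2 (n = 2).

    Returns +inf as soon as <w, u> + Psi(u) <= 0 for a sampled unit vector u, reflecting
    that the integral defining Psi-hat*(w) diverges unless -w lies in the interior of E.
    """
    num = den = 0.0
    for k in range(n_angles):
        phi = 2.0 * math.pi * (k + 0.5) / n_angles
        u = (math.cos(phi), math.sin(phi))
        c = w[0] * u[0] + w[1] * u[1] + psi(u)
        if c <= 0.0:
            return math.inf
        num += c ** -2
        den += psi(u) ** -2
    return math.log(num / den)   # the common factor 2*pi/n_angles cancels in the ratio

sigma = 0.5
ball = lambda u: sigma * math.hypot(*u)               # Euclidean ball of radius sigma
box = lambda u: 0.5 * abs(u[0]) + 0.8 * abs(u[1])     # box [-0.5, 0.5] x [-0.8, 0.8]

w_ball, w_box = (0.3, -0.1), (0.2, 0.3)
ball_exact = -1.5 * math.log(1.0 - (w_ball[0] ** 2 + w_ball[1] ** 2) / sigma**2)
box_exact = -math.log(1.0 - 0.2**2 / 0.5**2) - math.log(1.0 - 0.3**2 / 0.8**2)
print(psi_hat_star_spherical_2d(w_ball, ball), ball_exact)
print(psi_hat_star_spherical_2d(w_box, box), box_exact)
print(psi_hat_star_spherical_2d((0.6, 0.0), ball))    # -w outside E: inf
```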
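Proof of Lemma 4.3. The energy evolution equation for Ψ̂⋆ along the solution y_0 of (3.4) can be calculated using the chain rule, yielding, with DΨ̂⋆ evaluated at DE(t, y_0(t)),
$$ \frac{d}{dt} \widehat{\Psi}^{\star}\big( DE(t, y_0(t)) \big) = -\theta \Big\langle D^2 E(t, y_0(t)), \big( D\widehat{\Psi}^{\star} \big)^{\otimes 2} \Big\rangle + \Big\langle \partial_t DE(t, y_0(t)), D\widehat{\Psi}^{\star} \Big\rangle \le -\theta \gamma \big| D\widehat{\Psi}^{\star} \big|^2 + \| \partial_t DE \|_{L^\infty} \big| D\widehat{\Psi}^{\star} \big|. $$
Proposition 4.2(5) implies that if Ψ̂⋆ blows up along any curve (i.e. one that approaches −∂E in the dual space), then so does |DΨ̂⋆|. However, the mean value theorem and the above calculation imply that Ψ̂⋆(DE(t, y_0(t))) must be decreasing whenever |DΨ̂⋆| > ‖∂_t DE‖_{L∞}/(θγ). This yields the desired contradiction.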
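The next lemma (Lemma 6.2) concerns the closeness of the effective dual dissipation potential Ψ̂⋆ and the corresponding quantity Ψ̂⋆_ε that controls the increments of the Markov chain. Lemma 6.3 gives the resulting bound for the classical gradient descents in Ψ̂⋆ ∘ DE and Ψ̂⋆_ε ∘ DE. Both results apply to the prototypical case of a quadratic energetic potential.
Lemma 6.2. Suppose that the energetic potential E is smooth enough that
$$ L_2 := \sup_{t \in [0, T], \, x \in \mathbb{R}^n} \| D^2 E(t, x) \|_{\mathrm{op}} < +\infty, $$
where ‖·‖_op denotes the operator norm. Then, for every K ⋐ −int E and every k ∈ N ∪ {0}, D^kΨ̂⋆_ε → D^kΨ̂⋆ uniformly on K as ε → 0. More precisely, for every such K and k, there exists a constant C ≥ 0 such that
$$ \sup_{w \in K} \big| D^k \widehat{\Psi}^{\star}_{\varepsilon}(w) - D^k \widehat{\Psi}^{\star}(w) \big| \le C \varepsilon^{1/2} \quad \text{for all small enough } \varepsilon > 0. $$
Proof. The essential quantity to estimate is
$$ I_{k,\varepsilon}(w) := \int_{\mathbb{R}^n} |z|^k \, e^{-\langle w, z \rangle - \Psi(z)} \, \big| e^{-r_\varepsilon(z)} - 1 \big| \, dz, $$
where r_ε(z) := (E(t, x + εz) − E(t, x))/ε − ⟨DE(t, x), z⟩ is the remainder in (3.2), which is non-negative by the convexity of E. Indeed, writing Z_ε and Z⋆_0 for the integrals that define Ψ̂⋆_ε and Ψ̂⋆, so that Ψ̂⋆_ε = log Z_ε and Ψ̂⋆ = log Z⋆_0 up to the same additive constant, it holds true that |D^k Z_ε(w) − D^k Z⋆_0(w)| ≤ I_{k,ε}(w). The difference D^kΨ̂⋆_ε − D^kΨ̂⋆ is then controlled by these differences together with the derivatives of Z_ε and Z⋆_0 divided by powers of Z_ε and Z⋆_0. Let m(w) := inf{⟨w, z⟩ + Ψ(z) | |z| = 1}. By Lemma 6.1, m is continuous and bounded away from 0 on K. Similarly, since Z⋆_0 and Z_ε are continuous and positive, they are bounded away from 0 on K, and |D^k Z⋆_0| is bounded on K. (Note that all these bounds fail on −∂E, so the assumption that K ⋐ −int E is essential.) Thus, the emphasis is on estimating I_{k,ε}(w) in terms of ε and uniformly over K.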
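I_{k,ε}(w) will be estimated by splitting the integral into two parts: an integral over a ball around the origin in R^n, and an integral over the complement. More precisely, for any a ∈ (0, 1), let R = R(a, ε, x, t) > 0 be such that
$$ |r_\varepsilon(z)| \le a \quad \text{for all } |z| \le R. \tag{6.4} $$
Three elementary bounds are used: |e^{−r} − 1| ≤ e^{a} − 1 for |r| ≤ a, |e^{−r} − 1| ≤ 1 for r ≥ 0, and e^{−⟨w, z⟩ − Ψ(z)} ≤ e^{−m(w)|z|}. With these, converting to spherical polar coordinates yields that, for some constant c_n depending only on n,
$$ I_{k,\varepsilon}(w) \le c_n \left( (e^{a} - 1) \int_0^R r^{k+n-1} e^{-m(w) r} \, dr + \int_R^\infty r^{k+n-1} e^{-m(w) r} \, dr \right). $$
This estimate is valid for any a ∈ (0, 1) and corresponding R. The above integrals can be evaluated exactly using the recurrence relation
$$ \int r^j e^{-m r} \, dr = -\frac{r^j e^{-m r}}{m} + \frac{j}{m} \int r^{j-1} e^{-m r} \, dr; $$
the resulting polynomial-exponential expressions are a bit cumbersome to deal with, but only the leading-order contributions as ε → 0 are of interest here. We now make a specific choice of a and R such that a → 0 and R → ∞ at the right relative rates. Let a := ε^{1/2} and let
$$ R := \left( \frac{2a}{\varepsilon L_2} \right)^{1/2} = \left( \frac{2}{L_2} \right)^{1/2} \varepsilon^{-1/4}, $$
so that |r_ε(z)| ≤ (ε/2) L_2 |z|² ≤ a for |z| ≤ R, as required for (6.4) to hold. For this choice of a and R, a → 0 and R → +∞ as ε → 0, and there exist constants c_1, c_2 such that
$$ I_{k,\varepsilon}(w) \le c_1 \varepsilon^{1/2} + c_2 R^{k+n-1} e^{-m(w) R}. $$
The dominant term here is the ε^{1/2} term: since R ∝ ε^{−1/4}, the term R^{k+n−1}e^{−m(w)R} tends to 0 faster than any power of ε, recalling that m(w) is bounded away from zero for w ∈ K ⋐ −int E. Thus, there is a constant C_k (dependent on k and the other geometric parameters, but not on ε) such that sup_{w∈K} I_{k,ε}(w) ≤ C_k ε^{1/2} for all small enough ε > 0.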
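The convergence DΨ̂⋆_ε → DΨ̂⋆ in Lemma 6.2 can be observed in a one-dimensional example. The following Python (NumPy) sketch makes these assumptions. The energy is quadratic with a = 1, so the remainder in (3.2) is exactly ε a z²/2. Ψ(z) = σ|z| with σ = 0.5, for which DΨ̂⋆(w) = 2w/(σ² − w²) on |w| < σ. The helper name and quadrature grid are illustrative choices. The test below only checks that the errors at w = 0.2 decrease with ε and stay below a modest multiple of ε^{1/2}. In this particular example the decay appears to be faster than the guaranteed rate.

```python
import numpy as np

def d_psi_hat_star_eps(w, eps, a=1.0, sigma=0.5, z_max=200.0, n=400_001):
    """Quadrature for D(Psi-hat*_eps)(w), i.e. minus the mean of z, in 1-D.

    Assumes the quadratic energy E(t,x) = a x^2/2 - ell(t) x and Psi(z) = sigma |z|, so the
    remainder term in (3.2) is eps * a * z^2 / 2 and z has density proportional to
    exp(-w z - sigma |z| - eps a z^2 / 2).
    """
    z = np.linspace(-z_max, z_max, n)
    log_p = -w * z - sigma * np.abs(z) - 0.5 * eps * a * z**2
    p = np.exp(log_p - log_p.max())
    return -np.sum(z * p) / np.sum(p)

w, sigma = 0.2, 0.5
exact = 2.0 * w / (sigma**2 - w**2)   # D(Psi-hat*)(w) with Psi-hat*(w) = -log(1 - w^2/sigma^2)
eps_values = [1e-2, 1e-3, 1e-4]
errors = [abs(d_psi_hat_star_eps(w, eps) - exact) for eps in eps_values]
print(errors)
```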
Proof. The strategy is to appeal to Lemma 6.2 and Grönwall's inequality. First, note that there exists a K ⋐ −int E such that DE(t, y_0(t)) ∈ K for all t ∈ [0, T], i.e.
$$ \inf_{t \in [0, T]} \operatorname{dist}\big( -DE(t, y_0(t)), \partial E \big) > 0; $$
for otherwise, since Ψ̂⋆ blows up to +∞ on −∂E (by Proposition 4.2(5)), it would follow that t ↦ Ψ̂⋆(DE(t, y_0(t))) is unbounded on [0, T], which would contradict (4.10).
Assume that ε > 0 is small enough that the conclusion of Lemma 6.2 holds. Then, by Lemma 6.2, there exists C ≥ 0 such that
$$ \sup_{w \in K} \big| D\widehat{\Psi}^{\star}_{\varepsilon}(w) - D\widehat{\Psi}^{\star}(w) \big| \le C \varepsilon^{1/2}. $$
Let L be the product of the (finite) Lipschitz constants for DE and DΨ̂⋆|_K. Then, by Grönwall's inequality, for all t ∈ [0, T],
$$ |y_\varepsilon(t) - y_0(t)| \le \frac{C \varepsilon^{1/2}}{L} \big( e^{\theta L t} - 1 \big). $$
Proof. In order to simplify the notation, assume that the partition P is a uniform partition with [P] = h > 0, and define a time-dependent vector field f_h by f_h(t, x) := −DΨ̂⋆_h(DE(t, x)). Fix δ > 0 small enough that K(t) := {x ∈ S(t) | dist(x, ∂S(t)) > δ} is non-empty and contains y(t) for every t ∈ [0, T]. Furthermore, using the Lipschitz assumption on DE, assume that δ is small enough that K(t_i) ⋐ S(t_{i±1}) for each i. Write (dropping the superscript that indicates the partition P or its mesh size h)
$$ X_{i+1} = X_i + h f_h(t_{i+1}, X_i) + \Xi_{i+1}(X_i), \qquad y_{i+1} = y_i + h f_h(t_{i+1}, y_i). $$
By Lemma 3.1 (or, more precisely, its generalization to non-quadratic E through (3.2)), for each x, Ξ_{i+1}(x) is a random variable with mean 0 and k-th central moment at most C_k(x)h^k. The deviations Z_i := X_i − y_i satisfy
$$ Z_{i+1} = Z_i + h \big( f_h(t_{i+1}, X_i) - f_h(t_{i+1}, y_i) \big) + \Xi_{i+1}(X_i). \tag{6.6} $$
In summary, we have the following facts:
(M) f_h(t_{i+1}, ·) is a monotonically decreasing vector field on S(t_{i+1});
(B) f_h(t_{i+1}, ·) is bounded on compactly-embedded subsets of S(t_{i+1});
(Z) for every x, E[Ξ_{i+1}(x)] = 0.
Let K_i be (the σ-algebra generated by) the event that X_j ∈ K(t_{j+1}) for 0 ≤ j ≤ i. Apply the conditional expectation operator E[·|K_i] (which is never conditioning on an event of zero probability) to the Euclidean dot product of (6.6) with itself, and use (M), (B), (Z) and Lemma 3.1. This yields
$$ \mathbb{E}\big[ |Z_{i+1}|^2 \,\big|\, K_i \big] \le \mathbb{E}\big[ |Z_i|^2 \,\big|\, K_i \big] + C h^2, $$
and application of the unconditional expectation operator to both sides yields the following uniform bound for the second moment of the deviations:
$$ \mathbb{E}\big[ |Z_i|^2 \big] \le C T h \quad \text{for } 0 \le i \le N. \tag{6.7} $$
Inequality (6.7) can be used to "bootstrap" a similar inequality for the fourth moments. Define a tetralinear form τ : (R^n)^4 → R by
$$ \tau(w, x, y, z) := (w \cdot x)(y \cdot z), \tag{6.8} $$
so that |x|⁴ = τ(x, x, x, x). This tetralinear form is invariant under arbitrary compositions of the following interchanges of entries: (1 2), (3 4) and (1 3)(2 4). The Cauchy-Bunyakovskiȋ-Schwarz inequality for the Euclidean inner product implies a corresponding inequality for this tetralinear form: for all w, x, y, z ∈ R^n,
$$ |\tau(w, x, y, z)| \le |w| |x| |y| |z|. \tag{6.9} $$
Hence, E[|Z_{i+1}|⁴] ≡ E[E[|Z_{i+1}|⁴ | K_i]] can be expanded using the tetralinear form (6.8) and (6.6), and each term estimated as in the derivation of (6.7). By (Z), those terms containing precisely one Ξ_{i+1}(X_i) have zero expectation; the terms of the form E[τ(Z_i, Z_i, Z_i, h(f_h(t_{i+1}, X_i) − f_h(t_{i+1}, y_i))) | K_i] are non-positive by (M); the remaining terms can all be estimated using (B), (6.7), (6.9) and Lemma 3.1, with the worst bound being O(h³). Thus, the following uniform bound for the fourth moment of the deviations holds:
$$ \mathbb{E}\big[ |Z_i|^4 \big] \le C T h^2 \quad \text{for } 0 \le i \le N. \tag{6.10} $$
Hence, for η > 0, by Chebyshev's inequality and (6.10),
$$ \mathbb{P}\Big[ \max_{0 \le i \le N} |Z_i| \ge \eta \Big] \le \sum_{i=0}^{N} \mathbb{P}\big[ |Z_i| \ge \eta \big] \le \eta^{-4} \sum_{i=0}^{N} \mathbb{E}\big[ |Z_i|^4 \big] \le \eta^{-4} C T^2 h, $$
which establishes (6.5) and completes the proof.
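The convergence argument above can be illustrated by simulating the thermalized chain next to an Euler discretization of the limiting equation. The following Python (NumPy) sketch is a one-dimensional illustration only, under several stated assumptions. The energy is E(t, x) = a x²/2 − ℓ(t) x and Ψ(z) = σ|z|. The heat-bath energy per step is ε = θh. The limit uses DΨ̂⋆(w) = 2w/(σ² − w²). All function names, grid sizes and parameters are hypothetical. Each step samples z = (X_{i+1} − X_i)/ε from the exact rescaled Gibbs density on a fixed grid. With h = 2·10⁻³, the fluctuations of X about y stay at the level of a few multiples of √(θh), consistent with the moment bounds (6.7) and (6.10).

```python
import numpy as np

def thermalized_chain(x0, ell, theta, h, T, a=1.0, sigma=0.5, rng=None, z_max=300.0, n_grid=60_001):
    """Simulate the 1-D thermalized Markov chain with eps = theta * h (a sketch).

    Assumes E(t,x) = a x^2/2 - ell(t) x and Psi(z) = sigma |z|.  Writing X_{i+1} = X_i + eps z,
    z has density proportional to exp(-w z - sigma |z| - eps a z^2 / 2), w = a X_i - ell(t_{i+1}),
    which is sampled by inverting its CDF on a fixed grid.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = theta * h
    z = np.linspace(-z_max, z_max, n_grid)
    xs = [x0]
    for i in range(int(round(T / h))):
        w = a * xs[-1] - ell((i + 1) * h)
        log_p = -w * z - sigma * np.abs(z) - 0.5 * eps * a * z**2
        cdf = np.cumsum(np.exp(log_p - log_p.max()))
        k = np.searchsorted(cdf, rng.random() * cdf[-1])
        xs.append(xs[-1] + eps * z[min(k, n_grid - 1)])
    return np.array(xs)

def limiting_ode(y0, ell, theta, h, T, a=1.0, sigma=0.5):
    """Explicit Euler for dy/dt = -theta * 2 w / (sigma^2 - w^2), w = a y - ell(t) (1-D box case)."""
    ys = [y0]
    for i in range(int(round(T / h))):
        w = a * ys[-1] - ell((i + 1) * h)
        ys.append(ys[-1] - h * theta * 2.0 * w / (sigma**2 - w**2))
    return np.array(ys)

rng = np.random.default_rng(0)
load = lambda t: 1.0
X = thermalized_chain(0.6, load, theta=1.0, h=2e-3, T=1.0, rng=rng)
y = limiting_ode(0.6, load, theta=1.0, h=2e-3, T=1.0)
print(np.max(np.abs(X - y)), X[-1], y[-1])
```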