Weak Error for the Euler Scheme Approximation of Diffusions with Non-Smooth Coefficients *

We study the weak error associated with the Euler scheme of non degenerate diffusion processes with non smooth bounded coefficients. Namely, we consider the cases of H{\"o}lder continuous coefficients as well as piecewise smooth drifts with smooth diffusion matrices.


Setting.
Let T > 0 be a fixed given deterministic final horizon and x ∈ R d be an initial starting point. We consider the following multidimensional SDE: with (1.1) whose dynamics writes X h 0 = x and for all t ∈ [0, T ]: where we set φ(u) = inf{(t i ) i∈[[0,N −1]] : t i ≤ u < t i+1 }.
A useful quantity to study, arising in many applicative fields from physics to finance, is the so-called weak error which for a suitable real valued test function f writes: using the usual Markovian notations, i.e. X h,0,x T , X 0,x T respectively stand for the Euler scheme and the diffusion at time T which start at point x at 0.
There is a huge literature concerning the weak error for smooth and/or non-degenerate coefficients, from the seminal paper of Talay and Tubaro [TT90], to the extensions to the hypoelliptic framework [BT96a]. Under those conditions, the quantity d(f, x, T, h) is of order h corresponding to the magnitude of the time step. In the non degenerate framework (under some uniform ellipticity or hypoellipticity conditions) it is even possible to take f to be a Dirac mass in the above expression (1.3). The associated convergence rate remains of order h for the Euler scheme, see [KM02] [BT96b] and h 1/2 in the more general case of Markov Chain approximations, see e.g [KM00] in which the Brownian increments appearing in (1.2) are replaced by i.i.d. sequences (ξ i ) i≥1 that are not necessarily Gaussian. In the framework of Lipschitz coefficients we can also mention, in the scalar case, the recent work of Alfonsi et al. [AJKH14], who obtained bounds on the Wasserstein distances between the laws of the paths of the diffusion and its Euler scheme. Anyhow, the case of non smooth coefficients, Hölder continuous or less, has rarely been considered. Such cases might anyhow appear very naturally in many applications, when the drifts have for instance discontinuities at some given interfaces or when the diffusion coefficients are very irregular (random media).
In the framework of bounded non degenerate and Hölder continuous coefficients, let us mention the work of Mikulevičius and Platen [MP91] who obtained bounds for the weak error in (1.3) at rate h γ/2 where γ ∈ (0, 1) is the Hölder exponent of the coefficients b, σ in (1.1) provided f ∈ C 2+γ b (R d , R) (space of bounded functions with bounded derivatives up to order two and γ-Hölder continuous second derivatives). This regularity is essential in that work to apply Itô's formula. Our approach permits to establish that this bound holds true, up to an additional slowly varying factor in the exponent, for the difference of the densities itself, which again corresponds to the weak error (1.3) for a δ-function. We also mention the recent work of Mikulevičius et al. [Mik12], [MZ15], concerning some extensions of [MP91] to jump-driven SDEs with Hölder coefficients.
Finally, concerning numerical schemes for diffusions with non-regular coefficients, we refer to the recent work of Kohatsu-Higa et al. [KHLY15] who investigate the weak error for possibly discontinuous drifts and diffusion coefficients that are just continuous. We are able to extend some of their controls to densities. Indeed, in the quoted work, the authors investigate (1.3) for functions f that are at least continuous. We again have an additional slowly varying factor in the exponent which is due to our smoothing approach.
Our strategy is the following. Under the previous assumptions (stated after (1.1)), both processes (X t ) t∈(0,T ] in (1.1) and (X h ti ) i∈[[1,N ]] in (1.2) have densities, see e.g. [KKM16] for the continuous process and Lemaire and Menozzi [LM10] for the scheme. Let us denote them respectively for x ∈ R d , 0 ≤ i < j ≤ N , by p(t i , t j , x, .) and p h (t i , t j , x, .) for the processes starting at time t i from point x and considered at time t j . To study the error (p − p h )(t i , t j , x, y) we introduce perturbed dynamics associated with (1.1) and (1.2) respectively. Namely, for a small parameter ε, we mollify suitably the coefficients, the mollification procedure is described in its whole generality in Section 2 and depends on the two considered sets of assumptions indicated above, and consider two additional processes with dynamics: (1. 4) where b ε , σ ε are mollified versions of b, σ. It is clear that both (X (ε) t ) t∈(0,T ] and (X h,(ε) ti ) i∈ [[1,N ]] have densities. The mollified coefficients indeed satisfy uniformly in the mollification parameter the previous assumptions. Let us denote those densities for x ∈ R d , 0 ≤ t i < t j ≤ T by p ε (t i , t j , x, .), p h ε (t i , t j , x, .) respectively. The idea is now to decompose the global error as: (1.5) The key point is that the stability of the densities with respect to a perturbation has been thoroughly investigated for diffusions and Markov Chains in Konakov et al. [KKM16]. The results of that work allow to control the differences p − p ε , p h ε − p h . On the other hand, since the coefficients b ε , σ ε of (X ) i∈ [[0,N ]] are smooth the central term p ε −p h ε in (1.5) can be investigated thanks to the work of Konakov and Mammen [KM02] giving the error expansion at order h on the densities for the weak error. The key point is that the coefficients in the expansion depend on the derivatives of b ε , σ ε which explode when ε goes to zero. This last condition is natural in order to control p − p ε , p h ε − p h . Thus, two contributions need to be equilibrated to derive the global error bounds. This will be done through a careful analysis of the densities (heat kernel) of the processes with dynamics described in (1.1), (1.2), (1.4). The estimates required for the error analysis will lead us to refine some bounds previously established by Il'in et al [IKO62]. Let us indicate that this perturbative approach had also been considered by Kohatsu-Higa et al. [KHLY15] but for the weak error (1.3) involving at least a continuous function. Our approach, based on parametrix techniques, allows to handle directly the difference of the densities, and gives, up to an additional factor going to zero with the time step, the expected convergence rates.

Assumptions and Main Results.
Let us introduce the following assumptions. (A1) (Boundedness of the coefficients). The components of the vector-valued function b(t, x) and the matrix-valued function σ(t, x) are bounded measurable. Specifically, there exist constants K 1 , K 2 > 0 s.t.
(H) (Hölder drift and diffusion coefficient). The drift b and the diffusion coefficient σ are time-space Hölder continuous in the following sense: for some γ ∈ (0, 1] , κ < +∞, for all (s, t) Observe that the last condition also readily gives, thanks to the boundedness of σ, that the diffusion matrix a = σσ * enjoys the same Hölder property.
(PS) (Piecewise smooth drift and Smooth diffusion coefficient). The drift b is piecewise smooth with bounded derivatives outside of the discontinuity sets.
where the set of possible discontinuities I writes as Here, for all i ∈ [[1, m]], S i is a smooth bounded submanifold of R d (at least C 4 ) of dimension lower or equal to d − 1, i.e. S i := {x ∈ R d : g i (x) = 0} for a corresponding smooth function g i . We also assume that the ( On the other hand we assume that the diffusion coefficient σ is globally . We emphasize that, with the above definition, the discontinuity set of b only depends on the spatial variable. A time-dependent discontinuity set could a priori also be considered provided each of it components is the boundary of a smooth time-space domain. Namely, considering for x) = 0}, the smooth spatial submanifolds S i (t) should as well evolve smoothly in time. We consider the case introduced in (A P S ) for simplicity.
From now on, we always assume conditions (A1)-(A2) to be in force. We say that assumption (A H ) (resp. (A P S )) holds if additionally the coefficients satisfy (H) (resp. (PS)). We will write that (A) holds whenever (A H ) or (A P S ) is satisfied.
We will denote, from now on, by C a constant depending on the parameters appearing in (A) and T . We reserve the notation c for constants that only depend on (A) but not on T . The values of C, c may change from line to line. Other possible dependencies will be explicitly specified.
Theorem 1 (Error for the Euler scheme of a diffusion with Hölder coefficients). Let T > 0 be fixed and consider a given time where p, p h respectively stand for the densities of the diffusion X and its Euler approximation X h with time step h, for all (t, z) ∈ R + * × R d , p c (t, z) := c d/2 (2πt) d/2 exp(−c |z| 2 2t ) and ψ(h) = log 3 (h −1 ) log 2 (h −1 ) where log k denotes for k ∈ N * the k th iterated logarithm. Let us observe that ψ(h) −→ h→0 0. If we are now interested in the weak error in the sense of (1.3), for a function f ∈ C β (R d , R) (uniformly β-Hölder continuous functions), β ∈ (0, 1]: using again the usual Markovian notations, i.e. X h,ti,x tj , X ti,x tj respectively stand for the Euler scheme and the diffusion at time t j which start at point x at t i . Eventually, if we consider a smooth domain A ⊂ R d (i.e. a connected open set at least C 2 ) with bounded boundary and non zero Lebesgue measure, we also get that for all x ∈ R d s.t. d(x, ∂A) ≥ (t j − t i ) 1/2 h γ/2 : where d(., ∂A) stands for the distance to the boundary of A.
Remark 1. We point out that this result is to be compared with the one obtained by Mikulevičius and Platen [MP91] for the weak error. The framework they considered is similar to ours, and their main results consists in controlling at rate h γ/2 the weak error d(f, x, for a smooth function f ∈ C 2+γ b (R d , R) (space of bounded functions, with bounded derivatives up to order two and γ-Hölder continuous second derivatives). The above theorem establishes that |d(f, x, T, h)| ≤ Ch γ/2−Cψ(h) as soon as f is measurable and satisfies the growth condition (1.9) This control can be useful for specific and relevant applications, like for instance quantile estimation (that would involve functions of the form f (x) = I |x|≤K or f (x) = I |x|≤K exp(c|x|)) that appear in many applications: default probabilities in mathematical finance, fatigue of structures in random mechanics. We are able to find the expected convergence rate up to a vanishing contribution. The rate h γ/2 again holds, without the additional term, as soon as f ∈ C β (R d , R), β ∈ (0, 1]. Some extensions to unbounded functions f satisfying the growth condition (1.9) are described in Remark 7 of Section 3.3. The contribution in ψ(h) appearing in (1.6), which slightly deteriorates the convergence, seems to be, with our approach, the price to pay to get rid of any smoothness on f . Observe anyhow that for indicator functions of smooth Borel sets, equation (1.8) provides a better result than (1.6) as soon as the initial distance to the boundary satisfies d(x, ∂A) ≥ (t j − t i ) 1/2 h γ/2 (see Section 3.3.2 for details). Observe that this control improves in that case what could be derived from [KHLY15] in which continuous test functions are considered.
Remark 2 (About the Convergence Rate). We also emphasize that the convergence rate in h γ/2 is closer to a rate associated with a strong error. It indeed corresponds to the typical magnitude of the quantity E[|W h | γ ] ≤ c γ h γ/2 , which reflects the variations, on one time-step of length h, of the Euler scheme with Hölder coefficients. (1.10) These terms typically appear in the error analysis when there is low regularity of the coefficients or of the value Under the previous assumptions, if the function f belongs to C 2+γ b (R d , R), γ ∈ (0, 1) it is then well known, see e.g. Friedman [Fri64] or Ladyzhenskaya et al. [LSU68] that where L t stands for the generator of (1.1) at time t, i.e. for all ϕ ∈ C 2 0 (R d , R), x ∈ R d , Recalling that t 0 = 0, t N = T , we decompose the error as: (1.11) exploiting the PDE satisfied by v for the last equality. For a function f in C 2+γ b (R d , R), the spatial derivatives of v up to order two are globally bounded on [0, T ]. Indeed, the classical Schauder estimates hold (see e.g. Theorem 5.2, p. 361 in [LSU68]). We are thus led to control in (1.11) quantities similar to those appearing in (1.10). The associated bound then precisely gives the convergence rate. The analysis extends if f is simply C β (R d , R), β ∈ (0, 1] and therefore possibly unbounded. In that case the second derivatives yield an integrable singularity in time for the second order partial derivatives. We refer to Proposition 4, which holds under the sole assumption (A H ) for multi-indices α, |α| ≤ 2, and to the proof of Theorem 1 in Section 3.4.1. Extensions to locally β-Hölder functions f satisfying the growth condition (1.9) are discussed in Remark 7.
Remark 3. Even though we have considered γ ∈ (0, 1], our analysis should extend to the framework of Hölder spaces to γ ∈ (1, 2]. On the other hand, Theorem 1 specifies the time-singularity in small time. Remark 4. We feel that the bounds of Theorem 1 are relevant for functions which are truly Hölder continuous, that is for coefficients that would involve some simple transformations of the Weierstrass functions, see e.g. [Zyg36], or of an independent Brownian sample path in order that (A H ) is fulfilled. Indeed, for functions which are just locally Hölder continuous, like the mapping x → 1 + |x| α ∧ K, α ∈ (0, 1], we think that it would be more appropriate to study some local regularizations, close to the neighborhoods of real Hölder continuity (0 and K 1/α for the indicated example) and to exploit that, outside of these neighborhoods, the usual sufficient smoothness is available. For such coefficients we think that the convergence rates might be definitely better.
(1.12) -If d(y, I) (distance of the final point y to the spatial discontinuity set I) satisfies d(y, I) ≥ h 1/2−ǫ for a fixed given ǫ ∈ (0, 1/2], then: (1.13) -In the special case σ(t, x) = σ, i.e. constant diffusion coefficient 1 , the previous bound improves to: (1.14) Remark 5. This result emphasizes that, as soon as the drift is irregular, a true diffusion coefficient deteriorates the convergence rate. This is clear since, in that case, the difference of the densities p ε − p h ε in (1.5) involves higher derivatives of densities of processes with mollified coefficients which are more explosive (see Section 3.4).
We also mention that the distance of the final point to the discontinuity set plays an important role. The global control (1.12) improves to (1.13) as soon as h 1−1/(2d) ≤ d(y, I).
Eventually, if the diffusion coefficient does not depend on space, we find, up to the additional term in ψ(h), the usual convergence rate for the weak error if d = 1 as soon as c 0 ≤ d(y, I) for any given c 0 > 0.
However, our regularization approach clearly feels the dimension, when doing e.g. Hölder inequalities on neighborhoods of the discontinuity sets, and the convergence rates decrease with the dimension.
Let us carefully mention that considering the weak error for smooth functions f and not Dirac masses as we do, should improve the convergence rates and in particular allow to get rid of the terms in ψ(h) through a careful investigation of the derivatives of the associated heat kernels. We refer to the estimates of Proposition 4 that could be refined when considering an additional integration w.r.t. to the final variable. A very popular model for interest rates in the financial literature is the Cox-Ingersoll-Ross process with dynamics:  [GR11].
Of course the dynamics in (1.15) does not enter our framework, since it is closer to the dynamics of a Bessel-like process whose density does not satisfy Gaussian bounds. However, we could introduce for positive parameters η, K, which are respectively meant to be small and large enough, the dynamics: (1.16) The diffusion coefficientσ(x) = (η + σ|x| 1/2 ∧ K) is then uniformly elliptic, 1/2 Hölder continuous and bounded.
On the other hand the drift is not bounded but the analysis of Theorem 1 would still hold true thanks to the work of Konakov and Markova [KM15] that allows to get rid of the linear drift through a suitable transformation. We would then derive a convergence of order h 1/4−Cψ(h) at least for the associated Euler scheme on the densities (see also Remark 4). Even though the marginals in (1.16) enjoy Gaussian bounds, see e.g. [DM10], the expected properties for an interest rate dynamics, mean reverting and positivity, should hold with some high probability. Also, the difference between the approximate dynamics in (1.16) and the original one in (1.15) might be investigated through stochastic analysis tools (occupation times).

Extension to some Kinetic Models
The results of Theorems 1 and 2 should extend without additional difficulties to the case of degenerate diffusions of the form: (1.17) denoting X t = (X 1 t , X 2 t ), under the same previous assumptions (A H ) or (A P S ) on b, σ. The sensitivity analysis when we consider perturbations of the non-degenerate components, i.e. for a given ε > 0: (1.18) has been performed by Kozhina [Koz16] following [KKM16]. The key point is that under (A), the required parametrix expansions of the densities associated with the solutions of equation (1.17), (1.18) were established in [KMM10]. The analysis of the derivatives of the heat kernel, that would require to extend the results of Section 3 to the considered degenerate setting will concern further research.
The paper is organized as follows. We first introduce a suitable mollification procedure of the coefficients in Section 2 and derive from the stability results of Konakov et al. [KKM16] how the error of the mollifying procedure is then reflected on the densities. This allows to control the terms p − p ε and p h ε − p h in (1.5). We then give in Section 3 some pointwise bounds on the derivatives of the heat-kernels with mollified coefficients. From these controls and the previous error expansion obtained for the Euler scheme with smooth coefficients by Konakov and Mammen [KM02], we are able to control the remaining term p ε − p h ε in (1.5). We then establish our main estimates equilibrating the two errors. Eventually, Section 4 is dedicated to the proof of the controls stated in Section 3. These proofs are based on the parametrix expansions of the underlying densities following the Mc-Kean and Singer approach [MS67].

Mollification of the Coefficients and Stability Results
For the error analysis, in order to apply the strategy described in the introduction, we first need to regularize in an appropriate manner the coefficients. The mollifying procedures differ under our two sets of assumptions.

Mollification under (A H ) (Hölder continuous coefficients)
In this case both coefficients b, σ need to be globally regularized in time and space. We introduce the mollified coefficients defined for all (t, where * stands for the spatial convolution and for ε > 0, ρ ε is a spatial mollifier, i.e. for all x ∈ R d , for some compact set K ⊂ R d . The subscript S in b ε,S , σ ε,S appears to emphasize that the spatial convolution is considered. We will also need a mollification in time when the coefficients are inhomogeneous. Up to a symmetrization in time of the coefficients b, σ, i.e. we set for all (t, where ⋆ stands for the time convolution and for s ∈ R, ζ ε 2 (s) := ε −2 ζ(s/ε 2 ), ζ being a scalar mollifier with compact support in [−T, T ]. The complete regularization in the spatial and time variable reflects the usual parabolic scaling. This feature will be crucial to balance the singularities appearing in our analysis (see Propositions 4, 5 and their proofs below). We have the following controls.
Proposition 1 (First Controls on the Mollified Coefficients). Assume that (A H ) is in force. Then, there exists C ≥ 1 s.t. for all ε > 0, where for a given function f : From the Hölder continuity of b assumed in (H) and the above equation, we deduce that b ε,S satisfies (H) as well and that: (2.4) The same analysis can be performed for σ ε,S , so that σ ε,S satisfies (H) and we also have that b ε,S , σ ε,S are both γ/2-Hölder continuous in time uniformly in ε > 0. Repeating the previous arguments replacing ρ ε by ζ ε 2 , we deduce This gives the controls concerning the sup norms in (2.3).
Let us now turn to the Hölder norm. Observe first that, for all t ∈ R + , (x, y) ∈ (R d ) 2 : It readily follows from the γ-Hölder continuity in space of σ (resp. the γ-Hölder continuity in space and the γ/2-Hölder continuity in time of σ ε,S ) that the following controls hold: This completes the proof. We will need as well some controls on the derivatives of the mollified coefficients.
Proposition 2 (Controls on the Derivatives of the Mollified Coefficients). Under the assumptions of Proposition 1, we have that there exists C ≥ 1 s.t. for all ε ∈ (0, 1) and for all multi-index α, |α| ∈ [[1, 4]]: Also, there exists a constant C s.t.: exploiting the Hölder continuity assumption (H) for σ in the last but one inequality and the assumptions on ρ for the last one. Similarly, we derive for all (t, x, y) ∈ [0, T ] × (R d ) 2 and all ε > 0: The same bounds hold for b ε,S as well. The previous controls readily imply (2.6) since the additional time convolution does not have any impact here.
Equation (2.7) is derived proceeding similarly for the time convolution, exploiting as well the γ/2-Hölder continuity in time of σ ε,S . This completes the proof.

Mollification Under (A P S ) (Piecewise smooth drift and Smooth Diffusion Coefficient).
In this case we only need to regularize the drift in a neighborhood of the discontinuities. Let us denote by m ∈ N * , the finite number of spatial discontinuity sets and write I : Here, d S (·, S i ) stands for the signed distance to S i . This function has the same smoothness as the boundary S i (see e.g. Lemma 14.16 and its proof p. 355 in [GT98]). By convention, for d i ≥ 1, we choose d S (x, S i ) to be positive for points x being in the bounded region with bounded boundary S i .
The fact is now that we set b ε (t, x) = b(t, x) on R d \V ε (I) and perform a smooth mollification on the neighborhood V ε (I) of the discontinuity sets. A possible way to proceed is the following. Introduce for all i ∈ the projection of x on the corresponding boundary ( 2} stands for the signed distance of x to the corresponding boundary ∂V i,j ε and is again a smooth function. Observing that for We take the worst bound for simplicity. Observe as well that the following control holds for the derivatives of the mollified coefficient.
Under the considered assumptions it is not necessary to mollify the diffusion coefficients. We thus set for , in order to keep homogeneous notations under our two running assumptions for the drift.

Stability Results
Recall now that under (A H ) or (A P S ) equation (1.1) admits a density (see e.g. [She91] under (A H ) or Proposition 1 in [KKM16] x, y)dy. The same holds for the Euler scheme in (1.2) (see e.g. Theorem 2.1 in [LM10] x, y)dy. These properties remain valid for the respective perturbed diffusion and Euler scheme whose coefficients correspond to the procedures described in Section 2.1 and Section 2.2 depending on whether assumption (A H ) or (A P S ) is in force. We denote the densities associated with the perturbed diffusion and discretization scheme by p ε and p h ε respectively. Let us now state the sensitivity result following from Theorems 1 and 2 in [KKM16].
Theorem 3 (Main Sensitivity Result). Define for q ∈ (d, +∞] and η ∈ (0, 1] the quantities: (2.10) Also, there exists C ≥ 1 s.t.: (2.11) Remark 6 (Constraint on q). The constraint q > d in the above result is due to the fact that to establish (2.10) in the case q < +∞, we are led to control quantities of the type We refer to the proof of Lemma 2 in [KKM16] for additional details.
Proof. Equation (2.10) readily follows from Theorems 1 and 2 in [KKM16]. The point is here to specify the control (2.11) on the constant appearing in (2.10). Lemma 3 in [KKM16], quantifies the explosive contributions for each term of the parametrix series giving the difference of the densities. It holds for both the diffusion and the Euler scheme, see Section 3.2 of [KKM16] for details, and yields: for a constantC :=C((A), T ) which does not depend on η or q. Introduce for θ ∈ (0, 1 2 ] the quantity: One easily gets that for a given T > 0, there existsC :=C((A), T ) ≥ 1 independent of θ as well such that: Set now r 0 := ⌈ 1 θ ⌉ and write by monotonicity of the Γ function (see e.g. formula 8.363 (8) in Gradstein and Ryzhik [GR14]): This gives (2.11) taking θ = η 2 ∧ α(q) and completes the proof. From Theorem 3, we get the following key sensitivity results.
Proof. The lemma derives from Theorem 3 and Proposition 1. The bound on C η follows observing as well that Proof. Recall that under (A P S ), since the diffusion coefficient is smooth, there is no need to regularize it and σ = σ ε . Thus, ∆ ε,σ,1 = 0. From this observation and equation (2.8), Theorem 3 then yields (2.13). The bound on C q follows observing as well that for q ∈ (d, +∞), 1 2 ∧ α(q) = α(q). Let us mention that the constants C η , C q in equations (2.12) and (2.13) respectively explode when η goes to 0 and q goes to d, which is precisely what we want in order to have the fastest convergence rate w.r.t. ε. On the other hand, the explosion rates that we have emphasized in (2.11) are crucial in order to equilibrate the global errors. This step is performed in Section 3.4 below.

Stream Line to the Proofs of the Main Results.
This Section is devoted to the proof of Theorems 1 and 2. Our main results are those controlling the difference of the densities, i.e. the estimates given in equations (1.6) under (A H ) and (1.13), (1.14) under (A P S ).
To obtain these bounds, the strategy is the following. Let 0 ≤ t i < t j ≤ T and (x, y) ∈ (R d ) 2 be given. One writes for ε > 0: Now, one derives from the sensitivity Lemma 1 that, under (A H ), for all η ∈ (0, γ): Similarly, Lemma 2 yields that, under (A P S ), for all q > d: To investigate and minimize the contributions in the error it thus remains from equations (3.2) and (3.3) to precisely control the difference |p ε − p h ε | in (3.1). Let us now recall that, since the densities p ε , p h ε are now respectively associated with a diffusion process and its Euler scheme with smooth coefficients, they can be compared thanks to the results in [KM02] adapted to the current inhomogeneous setting. We thus have that: where C bε,σε depends on the derivatives of b ε , σ ε and therefore explodes when ε goes to 0. The delicate and crucial point is that we must here precisely quantify this explosion. A key ingredient, to proceed is the parametrix series representation for the densities of the diffusion and its Euler scheme. These aspects are recalled in Section 3.2 below.
Importantly, the parametrix expansion of the density of X ti,x tj in (1.1), i.e. for the equation without mollified coefficients, also directly allows to derive, without any sensitivity analysis, exploiting the controls on the derivatives of the density p(t i , t j , x, ·) of X ti,x tj w.r.t. x up to order 2 under (A H ), the bounds in (1.7) and (1.8). The arguments follow from cancellation techniques that are also crucial to derive our main estimates. We first illustrate this approach in Section 3.3 which is dedicated to the proof of (1.7) and (1.8) (integrated weak error).
The main results corresponding to the controls of the difference of the densities are established in Section 3.4. As emphasized above, these results do rely on the sensitivity analysis. They also require a careful analysis of the explosions of the higher order derivatives of the involved heat kernels which need to be quantitatively controlled in terms of the corresponding regularization procedure. The main result in that direction is Proposition 4 below whose proof, which heavily exploits cancellation techniques, is postponed to Section 4. It yields a precise control of the constant C

Parametrix Representation of Densities.
From Section 2 in [KKM16], we derive that under (A) (i.e. the expansions below hold under both (A H ) and (A P S )), for all ε ≥ 0 (the expansion below even holds for the initial coefficients taking ε = 0), 0 ≤ s < t ≤ T, (x, y) ∈ (R d ) 2 : where for 0 ≤ u < t ≤ T, (z, y) ∈ (R d ) 2 : H ε (u, t, z, y) := (L ε u −L ε,y u )p ε (u, t, z, y), (3.6) and L ε u ,L ε,y u respectively stand for the generators at time u of the processes Alsop ε (u, t, z, y) :=p y ε (u, t, z, w)| w=y wherep y ε (u, t, z, .) stands for the density at time t of the processX (ε),y starting from z at time u. We denote in (3.5),p ε ⊗ H (0) ε (s, t, x, y) =p ε (s, t, x, y) and for all r ≥ 1, (v, t, w, y)dw. More generally, the symbol ⊗ stands for the time-space convolution, i.e. for two real valued functions f, g defined on We also recall that under (A P S ), since the diffusion coefficient is smooth we do not regularize it and denote in this case σ ε = σ.
To investigate the contribution p ε − p h ε in (3.1) we will also use for where the quantities at hand are the same as above and the symbol ⊗ h replacing the ⊗ in (3.5) denotes the discrete convolution, i.e. for all r ≥ 1, Even though p d ε (t i , t j , x, .) is not a priori a density, we will call it so with a slight abuse of terminology. An important control, under (A), for the terms in the parametrix series is the following: taking γ = 1 under (A P S ). We emphasize that those bounds are uniform w.r.t. ε ≥ 0 and refer to [KM02] or Section 2 in [KKM16] for a proof. From the same references (see also Lemma 3.6 in [KM00]), we have that the density of the Euler scheme also admits a similar parametrix representation. Introduce for 0 ≤ t i < t k ≤ T, (z, y) ∈ (R d ) 2 , the schemes: (3.10) Viewed as Markov Chains, their generators write for all ϕ ∈ C 2 (R d , R), x ∈ R d : Define now for 0 ≤ t i < t j ≤ T, (z, y) ∈ (R d ) 2 the Markov chain analogue of the parametrix kernel H in (3.6) by: x, y). One gets the following parametrix representation for the density of the Euler scheme: Again, the subscript ε is meant to explicitly express the dependence on the mollified coefficients. Also, the terms in the above series satisfy the controls of equation (3.9) uniformly in ε ≥ 0.

Integrated Weak Error under (A H ).
We first prove the statements concerning the integrated weak error in (1.7) and (1.8). We insist that, in that case, no regularization of the coefficients is needed. We have the following result: Proposition 3 (Controls of the Derivatives.). Let T > 0 be fixed. Under (A H ), there exist constants C ≥ 1, c ∈ (0, 1] s.t. for all 0 ≤ s < t ≤ T, (x, y) ∈ (R d ) 2 and all multi-index α, |α| ≤ 2: (3.12) As a consequence we also derive that for t j = jh ∈ [0, T ] being fixed and setting for all , as soon as f is bounded, we have that for all (t, x) ∈ [0, t j ) × R d : and for f ∈ C β (R d , R), β ∈ (0, 1] (space of globally, and possibly unbounded, Hölder continuous functions), we have for a multi-index α, |α| ≤ 2 and all (t, x) ∈ [0, t j ) × R d : (3.14) Proof. Equation (3.12) is a direct consequence of Proposition 4 below. This estimate readily gives (3.13). On the other hand, we get that for f ∈ C β (R d , R), β ∈ (0, 1], we have for a multi-index α, |α| ≤ 2, (t, x) ∈ [0, t j ) × R d : recalling that D α x R d p(t, t j , x, y)dy = 0 for the last identity. This is precisely what we call a cancellation technique. It allows here to exploit the spatial Hölder continuity of f to get rid of the time singularity appearing in (3.12) when |α| = 2, or to decrease the time singularity appearing in (3.13). Hence, from (3.12): Equation (3.14) readily follows. Similar operations will be recurrent in the proof of Proposition 4.

Proof of (1.7): Hölder final test function.
Set t h β,γ := sup{(t k ) k∈[[0,j]] : t k ≤ t j − h γ/β < t k+1 } and I h β,γ := t h β,γ /h. In particular, if γ ≥ β, t h β,γ = t j−1 and if β > γ, t h β,γ < t j−1 . Let v be the function defined in Proposition 3. It follows from Proposition 4 that ). An expansion similar to (1.11) yields: where T L stands for the contribution associated with the last step(s) and T M for the other main steps. From equation (3.14) in Proposition 3, one readily gets: (3.16) The contribution T L requires a more careful treatment. Let us write: Remark 7 (Extensions to functions f with subquadratic exponential growth). We stated (1.7) for f ∈ C β (R d , R) for simplicity. Observe anyhow that the above arguments can be adapted to derive the expected convergence rate as soon as f is locally β-Hölder and satisfies the growth condition: where c is as in equation (3.12). In that case, the controls of equations (3.13) and (3.14) would write in the following way. There exists a constant C ≥ 1 s.t. for all (t, x) ∈ [0, t j ) × R d : Plugging (3.19) into (3.15) and (3.17) still yields, thanks to the condition on c 0 in (3.18) and (1.9), an integrable contribution.

Proof of (1.8): Indicator of a Domain as Test Function.
We have assumed A to be C 2 domain and ∂A bounded. Let us denote by d S (·, ∂A) the signed distance to the boundary, i.e. d(x, ∂A) > 0 for x ∈ A and d(x, ∂A) ≤ 0 for x ∈ A.
It is known (see e.g. Lemma 14.16 and its proof p. 355 in [GT98]) that for δ > 0 small enough, on V δ (A) := {y ∈ R d : |d S (y, ∂A)| ≤ δ}, the function d S (·, ∂A) is C 2 and both the exterior and interior sphere conditions hold. The interior sphere condition writes that for y ∈ A δ := V δ (A) ∩ A := {y ∈ R d : 0 < d S (y, ∂A) ≤ δ} (interior points of A whose distance to the boundary is lower or equal than δ), its orthogonal projection on the boundary Π ∂A (y) is also the unique point s.t. defining B(y, d S (y, ∂A)) := {z ∈ R d : z − y ≤ d S (y, ∂A)}, B(y, d S (y, ∂A)) ∩ ∂A = Π ∂A (y). The exterior sphere condition writes similarly for the points y ∈ V δ (A)\Ā := {y ∈ R d : −δ ≤ d S (y, ∂A) < 0} (strictly exterior points of A whose distance to the boundary is lower or equal than δ).
For such a δ, let us now write for 0 ≤ t i < t j ≤ T, x ∈ R d : (3.21) Namely, f δ stands for a smooth approximation (at least C 2 ) of the mapping x → I x∈A .
Recalling again from the proof of Lemma 14.16 in [GT98] that for x ∈ V δ (A), ∇ x d S (x, ∂A) = n(Π ∂A (x)), where n(Π ∂A (x)) stands for the inner unit normal associated with the projection on the boundary, we get for x ∈ V δ (A)\A: This yields in particular that |∇f δ | ∞ = sup x∈V δ (A) |∇f δ (x)| ≤ Cδ −1 . This last bound in particular yields that there exists C ≥ 1 s.t. for all η ∈ (0, γ], (3.23) Indeed, from the control on |∇f δ | ∞ and the smoothness of f δ , we get for all x, y ∈ V 2δ (A), either |x − y| ≤ δ and |f δ (x) − f δ (y)| ≤ Cδ −1 |x − y| ≤ Cδ −η |x − y| η , or |x − y| ≥ δ and |f δ (x) − f δ (y)| ≤ C ≤ Cδ −η |x − y| η . Now, the terms T δ 1 and T δ 3 in (3.20) can be handled similarly thanks to the Gaussian upper bound that is satisfied, under (A H ), by the density of both the diffusion and its Euler scheme, see Proposition 4 or again [She91], Theorem 2.1 in [LM10]. Precisely, with the notations of (3.20) and provided that δ ≤ (t j − t i ) 1/2 : where d(x, ∂A) = |d S (x, ∂A)| stands for the nonnegative distance to the boundary. Indeed, we have that locally, up to a change of coordinate, only one variable is orthogonal to the straightened image of the hypersurface ∂A. We can thus integrate the Gaussian bounds w.r.t. the other ones. This yields the above control.
In particular, v δ has the same Hölder continuity modulus as f δ . We can as well assume w.l.o.g. that γ/η ≥ 1 so that h γ/η ≤ h ≤ 1.
Exploiting now (3.25) in an expansion similar to (1.11) and (3.15), we get: Recalling as well that the Euler scheme satisfies the Aronson Gaussian bounds (see Proposition 4 and Theorem 2.1 in [LM10] for details) we obtain for all s ∈ (t i , t j ]: where Π ∂A (y) again denotes the projection of y on the boundary ∂A, we get: (3.28) The point is now to find the η ∈ (0, γ] maximizing J x,A : η ∈ (0, γ] → ηd(x, ∂A) η in order to minimize the associated contribution in 1 ηd(x,∂A) η for T δ 2 . Two cases occur: It is of course the last term above that becomes significant when the distance of the starting point comes closer to the boundary. The global error estimate deriving from (3.24), the previous computations on J x,A and (3.29) is then better, up to a multiplicative constant, than the one deriving from (1.6) as soon as: (3.30) Since to apply the Aronson's estimates for T δ 1 , T δ 3 (see again eq. (3.20)) we had already assumed d(x, ∂A) ≥ (t j − t i ) 1/2 h γ/2 ≥ h (1+γ)/2 , we derive that the condition in (3.30) is always fulfilled. It can indeed be easily checked that h (1+γ)/2 ≥ exp(−h −Cψ(h) ) for h small enough. Equation (1.8) now follows from (3.24) and (3.29).
Remark 8 (Extension to piecewise smooth domains.). Let us mention that results similar to (1.8) could also be derived for domains A := ∩ n i=1 A i that write as finite intersections of smooth domains (A i ) i∈[[1,n]] with bounded boundaries, and therefore have piecewise smooth boundary. In that case, d(x, ∂A) := inf i∈{1,··· ,n} d(x, ∂A i ) is well defined, but the corresponding signed distance can fail to be smooth, precisely close to the resulting corners. Hence, f δ cannot be directly defined as above. Namely, some additional mollification of the corresponding distance would be necessary as well.

Error Expansion for The Euler Scheme: Controls on the Densities.
From Theorem 1.1, Theorem 2.1 and their proofs in [KM02] we have with the notations of the previous paragraph: where we denote for 0 ≤ t i < t j ≤ T, τ ∈ [0, 1]: Observe that L ε t φ(x, y) = L ε t, * φ(x, y), but more generally the operators do not coincide anymore when iterated. Also, we indicate that the operators involved slightly differ from [KM02] since we chose to use a Gaussian process without drift as proxy, see (3.7) and (3.10). Another difference is the fact that we deal with inhomogeneous coefficients, and the notationsL ε ., * ,L. * ,ε in (3.31) are used to emphasize the time dependence of the operators in the discrete convolution ⊗ h . Anyhow, reproducing the proof of [KM02] taking into account the indicated differences leads to the expression in (3.31).
We mention carefully that in order to analyze the contribution of the last term in the r.h.s. of (3.31) no smoothness in time of the coefficients is needed. On the other hand, such smoothness is clearly required to derive some convergence rates, since to control p ε − p d ε we need to investigate the difference between time integrals and Riemann sums (see Proposition 5 and its proof below).
The term x, y)}dτ involves derivatives of the coefficients and heat kernels up to order 4. The point is again that the derivatives of the coefficients and kernels explode with ε going to 0 (see equation (2.6)). It is precisely this aspect that deteriorates the convergence rate w.r.t. the usual smooth case. We carefully mention that if σ(t, x) = σ, the previous contributions involve lower derivatives of the heat kernel (up to order 2).
The key elements are now the following Propositions. The first one gives bounds for the derivatives of the densities involved in the parametrix series (3.5), (3.8). The second one controls the difference between the discrete and continuous convolutions in (3.31).
In the above expressionsp ε can be any of the densities p ε , p d ε , p τ,h ε uniformly in τ ∈ [0, 1]. For p d ε , p τ,h ε , the time variables s, t are taken on the time grid.
Remark 9 (Spatial Hölder continuity and heat-kernel bounds). We point out that the previous controls (3.32) forp ε = p ε would also hold under the sole spatial Hölder continuity of the coefficients b, σ. This improves in some sense those of [IKO62] which require smoothness in time of the coefficients. We get here the same pointwise controls for the derivatives of the non degenerate heat-kernel with spatial Hölder coefficients up to order 2, uniformly in ε ∈ [0, 1].
Remark 10 (Constants in (3.33)). Even though we are currently considering (A P S ), the associated small smoothing effect deriving from the regularization of the drift is the same as for the sensitivities of densities under (A H ), for which it was induced by the small Hölder parameter for the difference of the diffusion coefficient and its regularization. In both cases the constant C η appears through the control of the corresponding parametrix series, see the proofs of Theorem 3, Lemma 1 and Proposition 4 below.

Proof of The Main Results for Hölder Coefficients (Theorem 1 under (A H ))
Observe from Proposition 4 that, for all k ∈ [[i, j − 1]], (z, y) ∈ R d 2 , τ ∈ [0, 1], Iterating the frozen operator, we obtain that L ε t k , * − L * ,ε t k 2 p τ,h ε (t k , t j , z, y) is a fourth order differential operator which is the sum of the following typical terms: It is easy to see that the terms with fourth derivatives are the most singular. Hence, to , it is enough to concentrate on: x, z) a lm ε (t k , z) − a lm ε (t k , y) a qr ε (t k , z) − a qr ε (t k , y) D 4 z l zmzqzr p τ,h ε (t k , t j , z, y)dz =: (T 1 + T 2 + T 3 )(t i , t j , x, y). (3.39) The tools to control the above terms are (3.32) in Proposition 4 and the Hölder continuity of the mollified coefficients under (A H ). We readily derive: For the term T 2 in (3.39), integrating once by parts, we obtain from (3.32) and (2.6) that: (3.41) The term T 3 in (3.39) can be handled using the same arguments and two integrations by parts in order to get rid of the time singularities. After integrations by parts, the most singular terms w.r.t. ε have the following form: For T 31 , we obtain from inequality (3.32) in Proposition 4 and (2.6) that: For T 32 , Proposition 4 and the spatial Hölder continuity of a ε yield: (3.44) An upper-bound for T 3 then follows summing (3.44) and (3.43). We then derive from (3.40), (3.41) and (3.39) that: We thus eventually get from (3.31), (3.34) and (3.45): Without loss of generality we assume now that 0 ≤ t j − t i ≤ T ≤ 1. We also suppose that: (3.47) We will check that (3.47) holds for the specific choice of the parameters ε, η which is performed below. We derive from equations (3.46), (3.47), together with (3.1), (3.2) that: For such a choice of a mollifying parameter we have for (t j − t i ) ≥ h 1/(2−γ) : Assume for a while that η can be taken so that: recalling as well that C η ≥ 1 for the last assertion. Then, for (t j − t i ) ≥ h 1/(2−γ) if (3.49) holds: Hence, from (3.48), if (3.49) holds: (3.50) using the bounds of Lemma 1 for the last inequality. The point is now to carefully choose η := η(h). Let us consider the specific sequence η = η(h) := 2 , where we recall that for k ∈ N, log k (x) stands for the k th iterated logarithm of x. Observe that this η(h) satisfies the condition (3.49) for h small enough. Setting β h := h −η and α h := exp C(2η −1 + 1) 2η −1 +1 , we get that: log 2 (β h ) = log(η log(h −1 )) = log(2) + log 4 (h −1 ) − log 3 (h −1 ) + log 2 (h −1 ), log 2 (α h ) = log(C(2η −1 + 1) 2η −1 +1 ) = log(C) + (2η −1 + 1) log(2η −1 + 1) It is easily seen that there exists a finite constantC > 0 s.t. for all h small enough, R h ≤C and that log 2 (β h ) ≥ log 2 (α h ) −C. By monotonicity of the exponential, recalling as well that η ∈ (0, γ), we thus derive: (3.51) The previous choice of η yields that, since C η = Cα h , (3.47) is satisfied as well. Plugging (3.51) into (3.50) we complete the proof of equation (1.6) in Theorem 1.

Proof of The Main Results for piecewise smooth coefficients (Theorem 2 under (A P S ))
Keeping the definitions of (3.38), the idea is to proceed as in the previous section from equations (3.31), and (3.39). To emphasize the specificity of Assumptions (A P S ), due to the approximation of the piecewise smooth drift, we begin with the special case σ(t, x) = σ. In that framework, the only terms appearing in (L ε ., * −L * ,ε . ) 2 p τ,h ε are the Ψ ε,τ,h l,m introduced in (3.38). From equation (3.33) in Proposition 4, using a direct control for the index k = i and a global integration by part for k > i, associated with the bound of (2.9), we derive: The point is now to use the Hölder inequality to exploit that the set on which ∇ z b ε gives an explosive bound is small. We get: , denoting byq > 1 the conjugate of q, q −1 +q −1 = 1. Recall now that: This yields: as soon as ε 1− 1 q −η ≤ 1 which holds true for η small enough (remember q > d). Performing now in the general case, involving derivatives of the heat kernel up to order 4, an integration by part similar to the one described for (3.39) and using the Hölder inequality as above for the terms involving derivatives of b ε , we derive from (3.33) in Proposition 4, that for all q > d, η ∈ (0, α(q)): (3.53) We thus get in whole generality, from (3.3), (3.31), (3.53) and (3.35) in Proposition 5: If now d(y, V ε (I)) ≥ 2ε, then, from (3.36) in Proposition 5: Eventually, if we additionally have that σ(t, x) = σ, (3.37) in Proposition 5 and (3.52) yield: We then set C q ε 1/q =C η,q hε −2+1/q in the general case, i.e. for b, σ depending both on the spatial variable and without any distance condition for the final point y. If d(y, V ε (I)) ≥ 2ε, we take C q ε 1/q =C η,q hε −(1+η) for a general σ and C q ε 1/q =C η,q hε −(1+η)+1/q if σ(t, x) = σ. The results can be derived as in the previous section choosing η := η(h) = ψ(h), q := q(h) s.t. α(q) = ψ(h). For (1.12) and (1.13), we recall as well that if d(y, I) ≥ h 1/2−ǫ for a fixed given ǫ > 0 for a general σ and d(y, I) ≥ h 1−ǫ for σ(t, x) = σ, the condition d(y, V ε (I)) ≥ 2ε is met.

4.
Proof of the Technical Results from Section 3. Let us establish the result for p ε . We start from the parametrix representation of p ε obtained in (3.5). In all cases, we can readily derive from (3.7) (recall thatX ε,y is a non degenerate Gaussian process) and (2.6) in Proposition 2 that for the main term in the expansion for all multi-index α, |α| ∈ [[1, 4]]: Let us now concentrate on the remainder term: We focus on the first two inequalities in (3.32), the last one can be proved similarly. The ideas are close to those in [IKO62], but we need to adapt them since they considered the "forward" version of the parametrix expansions. The key point is that, for Hölder coefficients we have bounded controls for the derivatives of the remainder in the backward variable up to order two. It is first easily seen for the first derivatives, since the first order derivation gives an integrable singularity in time in the previous expansions. Indeed, from (4.1) and (3.9), one readily gets the statement if |α| = 1. The case |α| ≥ 2 is much more subtle and needs to be discussed thoroughly. Write indeed: The contribution D α x R f ε (s, t, x, y) does not exhibit time singularities in the integral, since on the considered integration set u − s ≥ 1 2 (t − s). Let us now recall the usual control on the parametrix kernel under (A H ), see e.g. Section 2 in [KKM16]. There exist c, c 1 s.t. for all 0 ≤ u < t ≤ T, (z, y) ∈ (R d ) 2 : Inequality (4.3) for H ε then yields for all r ∈ N * , 0 ≤ s < t ≤ T, (x, y) ∈ (R d ) 2 : with the convention 0 i=1 = 1. We thus derive that for all 0 ≤ s < t ≤ T, (x, y) ∈ (R d ) 2 : |Φ ε (s, t, x, y)| ≤ C (t − s) 1−γ/2 p c (t − s, y − x). (4.5) Thus, from inequalities (4.1) and (4.5): (4.6) The delicate contribution is indeed D α x R τ ε (s, t, x, y) for which we need to be more careful. If |α| = 2 we exploit some cancellation properties of the derivatives of the Gaussian kernels. Recall now that for an arbitrary w ∈ R d , setting for 0 ≤ s < u ≤ T, Σ ε (s, u, w) : where for q ∈ R d , we denote for i ∈ [[1, d]] by q i its i th entry. Hence, for all multi-index α, |α| = 2: xp w ε (s, u, x, z)dz = 0. (4.8) Introducing the centering function c α ε (s, u, x, z) := (D α xp w ε (s, u, x, z)) | w=x , we rewrite: x, y))dz := (R τ,1 ε + R τ,2 ε )(s, t, x, y), (4.9) exploiting the centering condition (4.8) to introduce the last term of the first equality. On the one hand, the terms D α xp ε (s, u, x, z), c α ε (s, u, x, z) only differ in their frozen coefficients (respectively at point z and x). Exploiting the Hölder property in space of the mollified coefficients, it is then easily seen that: yielding an integrable singularity in time so that, from (4.5): (4.10) Let us now control the other contribution. The key idea is now to exploit the smoothing property of the kernel Φ ε . Assume indeed that for A := {z ∈ R d : |x − z| ≤ c(t − s) 1/2 } (recall as well that u ∈ [s, s+t 2 ]) one has: |Φ ε (u, t, x, y) − Φ ε (u, t, z, y)| ≤ C |x − z| γ/2 (t − u) 1−γ/4 p c (t − u, y − z). (4.11) Then, we can derive from (4.1), (4.9) and (4.11): x, y)|}dz. (4.12) From (4.5), we finally get on the considered time set: which together with (4.10), (4.9), (4.6) and (4.2) gives the statement. It remains to establish (4.11). From the definition of Φ ε and the smoothing effect of the kernel H ε in (4.4), it suffices to prove that on the set A := {z ∈ R d : |x − z| ≤ c(u ′ − u) 1/2 }: exploiting that z ∈ A, t− u ≥ 1 2 (t− s), and the usual convexity inequality |y−x| 2 t−u ≥ |y−z| 2 2(t−u) − |z−x| 2 t−u ≥ |y−z| 2 2(t−u) − 2c 2 for the last but one inequality. On the other hand, onĀ we get (4.11) from (4.13) and (4.4).
Let us turn to the proof of (4.13). We concentrate on the second derivatives in H ε which yield the most singular contributions: (4.14) Then, from (4.1), using that z ∈Ā for the second inequality, again combined with the convexity inequality |x−w| 2 u ′ −u ≥ |z−w| 2 2(u ′ −u) − |x−z| 2 u ′ −u ≥ |z−w| 2 2(u ′ −u) − c 2 for the last one. Now, from the explicit expression of the second order derivatives in (4.7), (A2) and usual computations we also derive: This gives (4.13) and completes the proof for |α| ≤ 2. Let us now turn to |α| ≥ 3. In those cases, the singularities induced by the derivatives are not integrable in short time, even if we exploit cancellations. We are thus led to perform integration by parts, deteriorating the bounds since these operations make the derivatives of the mollified coefficients appear.
We readily get from the controls of (4.1) that: |D β 1 (s, t, x, y)| ≤ C (t − s) (|β|−1)/2 p c (t − s, y − x), (4.31) which is the expected control. Since a is smooth the terms involving the second derivatives w.r.t. z in D β 2 can be handled performing the change of variables z ′ = z + y as above (see also [KM02] under the current smoothness assumption on the diffusion coefficient). Let us thus focus on the contribution: recalling that for all y ∈ R d , R d D zp (u, t, z, y)dz = 0, so that D β y R d D zp (u, t, z, y)dz = 0, for the last but one equality. Still from the controls of (4.1), we readily get: p c (u − s, y − x + λ(z − y))dλI |z−y|≤(t−s) 1/2 +(p c (u − s, z − x) + p c (u − s, y − x))I |z−y|>(t−s) 1/2 1 (t − u) p c (t − u, y − z)dz ≤ Cp c (t − s, y − x).