A simple method for the existence of a density for stochastic evolutions with rough coefficients

We extend the validity of a simple method for the existence of a density for stochastic differential equations, first introduced in [DebRom2014], by proving local estimates for the density, by proving existence of the density under a merely summable drift, and by improving the regularity of the density.


Introduction
The purpose of this paper is to illustrate and improve a simple but effective method to prove existence and minimal regularity of a density for the solutions of stochastic differential equations with non-regular coefficients. The strong points of the method are its simplicity, its flexibility and the dimension-free nature of the regularity estimates. The method was first introduced in [14] (see also [39,41,42,43]), widely extending an idea in [19], to prove existence of densities for finite dimensional approximations of an infinite dimensional stochastic equation. It has later been used, for instance, in [13,18,44,45,1].
The method we are discussing and extending fits into the general problem of studying existence and regularity of densities for stochastic equations with non-smooth coefficients. The problem has attracted some interest recently. In addition to the aforementioned [19,14], we would like to mention the approach in [5,6,7], based on analytic criteria in spaces of Orlicz type and on interpolation. In [23,24] the authors get rid of the drift with a Girsanov transformation, and then use the Malliavin calculus. Malliavin calculus is also used in [30]. An atypical method based on optimization is instead introduced in [3,4]. Different approaches, based on an explicit representation of the density (and so typically giving results such as lower and upper bounds on the density, rather than regularity), have been given in [25,28,27]. Finally, [33] proves Malliavin differentiability of solutions of stochastic equations with non-differentiable coefficients. For a PDE approach see for instance [31].
In this paper, as in most of the aforementioned papers (some with non-Gaussian noise), we focus on the following "toy problem",

dX_t = b(t, X_t) dt + σ(t, X_t) dB_t,

and in Section 2 we illustrate our simple method for the existence of a density for the solutions of the equation above, under the assumptions b ∈ L^∞(R^d), σ ∈ C^β(R^d; R^{d×d'}), for some β ∈ (0, 1), d, d' ≥ 1, and σ(y)σ(y)^⋆ ≥ δI, with δ > 0 (weaker assumptions will be discussed in the subsequent sections). The method uses the idea of an auxiliary (much) simpler process introduced in [19], together with a smoothing lemma (see Appendix A) based on duality in Besov spaces.
In this paper we use the simple method illustrated in Section 2 to extend the scope and the conclusions of the method itself. It is of foremost importance to notice that, even though we prove our results on the toy problem above, the results themselves are not difficult to extend to other contexts, such as equations driven by non-Gaussian noise, path-dependent equations, stochastic PDEs. We shall illustrate some of these examples in Section 7.
We first show in Section 2.2 that a higher order of approximation provides, whenever it is compatible with the regularity of the coefficients, a higher regularity for the density.
Even though the results of Section 2 assume uniform ellipticity of the diffusion, it is not difficult to adapt the method both to the case of a singular diffusion matrix, as in Section 3.1, and to the hypo-elliptic case. In the latter case we only discuss a simple example to illustrate the ideas. Unfortunately, but not unexpectedly, stronger regularity assumptions are necessary here, strong enough that in principle the problem may be amenable to a more standard approach such as the Malliavin calculus.
In Section 4 we prove a local version of the results of the simple method introduced in Section 2, to take into account the case when the coefficients are not globally regular or globally bounded.
In Section 5 we discuss the case of rougher coefficients (with respect to the assumptions of Section 2, namely bounded drift and Hölder covariance). In Section 5.1 we prove existence of a density for L^p drifts, with p larger than the dimension of the state space. Unfortunately we have not been able to lower the regularity requirements for the diffusion; we briefly discuss the issues in Section 5.2.
Finally, in Section 6 we slightly improve the summability index of the Besov spaces for the regularity of the density. This is not yet a satisfactory result, since in the basic case of Section 2 one would expect Hölder regularity; we can achieve Hölder regularity only in dimension one (see Remark 6.4).
Our presentation aims to be explanatory, so we will not mix different lines of development of the method (for instance, we will not combine local results, as in Section 4, with rough drift, as in Section 5, or with singular diffusion, and so on).
In addition to the results explained so far for the simple toy model, in Section 7 we discuss a series of examples: • path-dependent stochastic equations, • improvements over [13] for a class of stochastic equations driven by α-stable noise, • a singular stochastic PDE. These examples should demonstrate the power and flexibility of our simple method. We notice in particular that the dimension-free nature of the regularity obtained is well suited for infinite dimensional problems (as well as, in general, for singular problems). It is indeed sufficient to apply the simple method to a series of approximating problems to obtain uniform estimates in a Besov space with small but positive regularity. This ensures uniform integrability and thus convergence of the densities to the density of the limit problem.
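The compactness argument sketched in the last two sentences can be made precise along the following standard lines (a schematic reconstruction in our notation, not a verbatim statement from the paper):

```latex
% If the approximating laws mu_n have densities p_n with a uniform bound
%    sup_n \| p_n \|_{B^s_{1,\infty}} \le C      for some s > 0,
% then (p_n) is uniformly integrable, hence weakly compact in L^1. If
% moreover mu_n converges weakly to mu, the limit measure has a density:
\sup_n \| p_n \|_{B^s_{1,\infty}} \le C,
\qquad \mu_n \rightharpoonup \mu
\;\Longrightarrow\;
\mu(dx) = p(x)\,dx,
\qquad \| p \|_{B^s_{1,\infty}} \le C,
% by lower semi-continuity of the Besov norm along weak convergence.
```

The point is that even a small positive regularity exponent s suffices; this is why the dimension-free estimates are useful for infinite dimensional problems.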
Finally, in Appendix A we introduce our functional analytic framework, with the definition of all the function spaces we use throughout the paper, and we prove the crucial smoothing results for laws of random variables.

The core idea
In this section we introduce the core idea behind the simple method which is the main theme of the paper. The idea appears implicitly in [14] (and later, explicitly, in [13]). We repeat it here so that it can serve as the starting point of our improvements. Moreover, we will make explicit the dependence on the time at which the density is evaluated and on the initial condition.
In other words the diffusion coefficient is non-degenerate. Our basic tool here is the smoothing Lemma A.1. Fix a large integer m ≥ 1 and a function φ in C^α_b(R^d), with α ∈ (0, 1) to be chosen later. Our aim is to estimate E[Δ^m_h φ(X_t)] and capture the regularizing effect of the noise. To this end, consider a number 0 < ε < t ∧ 1 and the auxiliary process defined in (2.3). We decompose the quantity E[Δ^m_h φ(X_t)] into two terms: the approximation error (2.4), and the probabilistic estimate. We will address (and denote) these two fundamental quantities as Ae and Pe in the rest of the paper.
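The elided displays here can be reconstructed schematically as follows (our notation, consistent with the later references to (2.3) and (2.4); the auxiliary process freezes the coefficients at time t − ε):

```latex
% Auxiliary process (2.3): coefficients frozen at time t - epsilon,
Y^\epsilon_s = X_{t-\epsilon}
  + \sigma(X_{t-\epsilon})\,\bigl(B_s - B_{t-\epsilon}\bigr),
\qquad s \ge t-\epsilon .
% Decomposition (2.4) of the quantity to be estimated:
\mathbb{E}\bigl[\Delta^m_h \varphi(X_t)\bigr]
 = \underbrace{\mathbb{E}\bigl[\Delta^m_h \varphi(X_t)
      - \Delta^m_h \varphi(Y^\epsilon_t)\bigr]}_{\mathrm{Ae}}
 + \underbrace{\mathbb{E}\bigl[\Delta^m_h \varphi(Y^\epsilon_t)\bigr]}_{\mathrm{Pe}} .
```

The first term is small because Y^ε is a good approximation of X on [t − ε, t]; the second is small because, conditionally, Y^ε_t is Gaussian.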
For the first term we use the regularity of the test function φ and standard estimates on stochastic equations to obtain Ae ≲ ε^{α(1+β)/2}. For the second term we condition on the history F_{t−ε} up to time t − ε, as in [19]: conditionally on F_{t−ε}, the auxiliary process is a Brownian motion, independent of F_{t−ε}, with starting point X_{t−ε} and covariance matrix σ(X_{t−ε})σ^⋆(X_{t−ε}). By a discrete integration by parts (that is, by using the second formula in (A.1) and a change of variables), Pe ≲ c ε^{−m/2} |h|^m, with a number c that depends on det(σ(y)σ^⋆(y))^{−1} (and on ‖σ‖_{L^∞}), and thus is uniformly bounded with respect to y by our non-degeneracy assumption. An optimization in ε suggests the choice ε ≈ |h|^{2m/(m+α(1+β))} (we will address the issue that ε ≤ t below). For m large, the resulting exponent of |h| is about α(1 + β). The smoothing Lemma A.1 yields that X_t has a density p_t ∈ B^{αβ}_{1,∞}(R^d). Since α can be chosen arbitrarily close to 1, we conclude that p_t ∈ B^{β−η}_{1,∞}(R^d) for all η > 0.
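The two estimates and the optimization can be summarised as follows (a sketch consistent with the exponents stated in the surrounding text; constants are indicative):

```latex
% Approximation error: phi in C^alpha and E|X_t - Y^eps_t| <~ eps^{(1+beta)/2}:
|\mathrm{Ae}|
  \lesssim \|\varphi\|_{C^\alpha}\,
    \mathbb{E}\bigl[\,|X_t - Y^\epsilon_t|^\alpha\,\bigr]
  \lesssim \epsilon^{\frac{\alpha(1+\beta)}{2}} .
% Probabilistic estimate: conditional Gaussianity + discrete integration by parts:
|\mathrm{Pe}| \lesssim \|\varphi\|_{L^\infty}\, \epsilon^{-\frac m2}\, |h|^m .
% Balancing the two bounds at eps ~ |h|^{2m/(m + alpha(1+beta))} gives
\bigl|\mathbb{E}\bigl[\Delta^m_h \varphi(X_t)\bigr]\bigr|
  \lesssim \|\varphi\|_{C^\alpha}\,
    |h|^{\frac{m\,\alpha(1+\beta)}{\,m+\alpha(1+\beta)\,}} ,
% and the exponent tends to alpha(1+beta) as m -> infinity.
```

The smoothing lemma converts a bound of order |h|^s, s > α, uniform over φ with ‖φ‖_{C^α} ≤ 1, into membership of the density in B^{s−α}_{1,∞}; here s ≈ α(1 + β) yields the regularity αβ.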
and that |b(x)| ≲ |x|^2 (this is indeed the case analyzed in [14]). Then, arguing as before, the estimate of the approximation error is as above.
We wish to give a version of the above computations that takes into account the size of the pre-factor in (2.8) in terms of the time t and the initial value X_0. We state the result in a way that will be convenient in the next sections. In the terms of the proposition below, the above computations correspond to a_0 = β, θ = 2.
Since by our choice of m we also have m > α(1 + a_0)(1 − δ_2), the smoothing Lemma A.1 applies. By the choice of δ_2, this yields the exponents in (2.10).
, while with our positions,

Remark 2.3 (Time dependent coefficients). It is not difficult to add time regularity to the coefficients. We will do so for the drift in Section 5.1. As for the diffusion coefficient, if for instance σ ∈ L^∞(0, T; C^β_b), and if we define the auxiliary process, for s ≥ t − ε, by freezing the coefficient at time t − ε (and as in (2.3) otherwise), then both the approximation error and the probabilistic estimate can be bounded by the same power of ε as above, and thus we obtain the same result.
Remark 2.4 (Time regularity). It is possible to obtain regularity in time of the density. This has been done in [43] in the framework of finite dimensional projections of the Navier-Stokes equations. Using the methods we have introduced in this section, it is then easy to derive time regularity of the density; the threshold exponent 1/2− is the most challenging case, and in [43] it has been obtained using the Girsanov transformation.

2.1. Local solutions.
In principle the solution of (2.1) may not be defined for all times. In this section we want to handle the case of solutions with explosion. We will see that, under suitable assumptions on the coefficients, the solution of (2.1) has a density on the event of non-explosion.
To this end we need a suitable definition of the solution. We define X_t to be the local solution before explosion, and a cemetery state ∞ afterwards. Under the assumption of local existence and uniqueness this defines a Markov process (this was already done, for instance, in [40] in a general framework). Denote by τ_∞ the explosion time; then it is easy to see that {τ_∞ > t} = {|X_t| < ∞}. This allows us to localize the blow-up at a fixed time (unlike τ_∞, which in principle carries the full past history of the process).
Let η_R(x) be a smooth function equal to 1 when |x| ≤ R, and to 0 when |x| ≥ R + 1. Our computations will prove that, if μ_t is the law of X_t (possibly with an atom at ∞), then η_R(x)μ_t(dx) has a density with respect to the Lebesgue measure. As long as the estimate on the density of η_R μ_t (which depends on the coefficients) does not depend on R, by the uniform integrability ensured by the Besov bound the measure ½_{|X_t|<∞} μ_t = ½_{τ_∞>t} μ_t also has a density, with similar Besov bounds by semi-continuity.
In more detail, fix t > 0, α ∈ (0, 1), φ ∈ C^α_b, m ≥ 1, and h such that |h| ≤ 1. To prove that η_R(x)μ_t(dx) has a density, in view of the smoothing Lemma A.1 it is sufficient to prove a bound of order |h|^s, for some s > α. We decompose the discrete derivative accordingly. The first term on the right-hand side is the approximation error, and its final estimate depends on the coefficients. The third term is the probabilistic estimate, which can be estimated as in the previous section. Here F_{t−ε} is the σ-field of events before time t − ε.
Finally, the new term that accounts for explosion can be controlled as follows: it gives no contribution if X_t and X_{t−ε} are both smaller than R + 1, or both larger than R + 1; in the mixed cases, for instance X_{t−ε} < R + 1 < X_t, one can use the equation to obtain an estimate of the above quantities.

2.2. More regularity - I.
By Proposition 2.2 it is immediately clear that in order to obtain more regularity for the density we need to improve the approximation error. If we think of the auxiliary process (2.3) as a (very) basic numerical approximation of the original process X, then we need to use a more refined numerical method. To do this it is necessary to have smoother coefficients. Moreover, due to the non-anticipative nature of the probabilistic estimate, the numerical method needs to be explicit. Let us focus, for the sake of clarity, on the drift term, namely consider the equation above with b ∈ C^β_b. Similar considerations and conclusions hold for the diffusion coefficient.
To exploit the additional regularity of the drift b, it is natural to define the auxiliary process differently. Define the auxiliary process Y^ε in the general form (2.11), and this time take A_r = b(X_{t−ε}), rather than A_r = 0 as we have done before. The probabilistic estimate does not change, since we have only changed the mean, and in a way that is measurable with respect to the information at time t − ε. To evaluate the approximation error, consider (2.12): Proposition 2.2 then ensures that the density is in B^a_{1,∞} for a < 1 + β. It is now quantitatively clear that the more regular b is, the more regularity we obtain for the density. The (obvious) key idea is to find a good approximation of X; natural candidates for Y are then the Picard iterations for our equation. The next step to improve the regularity of the density is to choose an auxiliary process that ensures a smaller estimate of |X_t − Y^ε_t|. Since in (2.12) the estimate depends on |X_s − X_{t−ε}|, and in turn the size of this term corresponds to the size of the Brownian increments, a way to improve the difference might be to define the auxiliary process (2.11) with A_s = b(X_{t−ε} + B_s − B_{t−ε}) when s ≥ t − ε. Unfortunately this makes the term in the probabilistic estimate "anticipative" (with respect to the time t − ε). If we are ready to consider an "anticipative" term, we are faced with the difficulty that it becomes hard to evaluate the law of the terms that contribute to the probabilistic estimate. This is indeed an implicit requirement of the simple method we are illustrating.
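The gain from freezing the drift can be reconstructed in the model case dX_t = b(X_t) dt + dB_t with b ∈ C^β_b (the unit diffusion is an assumption made here for clarity, matching the text's focus on the drift term):

```latex
% Frozen-drift auxiliary process:
Y^\epsilon_s = X_{t-\epsilon} + b(X_{t-\epsilon})\,(s-(t-\epsilon))
  + (B_s - B_{t-\epsilon}),
\qquad s \ge t-\epsilon .
% Using |b(X_s)-b(X_{t-\epsilon})| <~ |X_s - X_{t-\epsilon}|^beta <~ eps^{beta/2},
\mathbb{E}\,|X_t - Y^\epsilon_t|
  \le \int_{t-\epsilon}^{t}
      \mathbb{E}\,\bigl|b(X_s)-b(X_{t-\epsilon})\bigr|\, ds
  \lesssim \epsilon^{\,1+\frac\beta2} .
```

Feeding the improved exponent q = 1 + β/2 into the optimization of Section 2 (which yields regularity roughly 2q − 1 as α → 1 and m → ∞) gives 2q − 1 = 1 + β, consistent with the statement above.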
A workaround that, albeit "anticipative", keeps the term contributing to the probabilistic estimate simple can be considered if b is more regular, namely if b ∈ C^{1+β}_b(R^d), by looking at iterated integrals of Brownian motion. Additional regularity of b cannot be avoided in general, and a way to see this is to look at the discussion on the Fokker-Planck equation at the beginning of Section 6. With this in mind, take A as in (2.13). With this choice of the auxiliary process, we see that the term providing the probabilistic estimate is again amenable to our analysis: given y = X_{t−ε}, we can express Y^ε_t in terms of B̃_r = B_{t−ε+r} − B_{t−ε}, which is independent of the history until time t − ε. The random variable Y^ε_t is conditionally Gaussian, with a variance comparable to ε, so that the probabilistic estimate has the same order in ε. In conclusion the density is in B^a_{1,∞} for all a < β + 2. With additional regularity of b one can consider better stochastic Taylor approximations to obtain better estimates. We refer to the classical [26, Chapter 10] for some possibilities.
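A possible reconstruction of this first-order stochastic Taylor scheme, again in the model case dX_t = b(X_t) dt + dB_t with b ∈ C^{1+β}_b (the specific form of A is our guess at the elided display (2.13)):

```latex
% First-order Taylor correction of the drift along the noise:
A_s = b(X_{t-\epsilon}) + Db(X_{t-\epsilon})\,(B_s - B_{t-\epsilon}),
\qquad s \ge t-\epsilon .
% Given y = X_{t-eps} and the independent increments
% \tilde B_r = B_{t-\epsilon+r} - B_{t-\epsilon}, the auxiliary process reads
Y^\epsilon_t = y + b(y)\,\epsilon
  + Db(y)\int_0^\epsilon \tilde B_r\, dr + \tilde B_\epsilon ,
% a conditionally Gaussian variable with covariance
\epsilon\, I + O(\epsilon^2),
% so Pe keeps the order eps^{-m/2}|h|^m, while the Taylor expansion of b gives
\mathbb{E}\,|X_t - Y^\epsilon_t| \lesssim \epsilon^{\frac{3+\beta}{2}} .
```

With q = (3 + β)/2 the optimization of Section 2 gives regularity 2q − 1 = 2 + β, matching the conclusion above.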
As a concluding remark we notice that, as long as the coefficients have derivatives, one can more easily resort to the Malliavin calculus. These considerations will nonetheless become useful in Section 3.2.

Singular diffusion coefficient
In this section we wish to consider the case when the diffusion matrix is singular. We will briefly discuss two cases. The first is when the diffusion matrix is non-invertible. The second is the so-called hypo-elliptic case, when the diffusion matrix is singular, but the noise is transmitted to the noise-less components by the drift.
3.1. Singular diffusion coefficient. First of all, we notice that in general we do not need a uniform estimate such as (2.2), since under the assumption of (local) non-singularity one can use the results of Section 4. In this section we wish to investigate whether there is any condition that can ensure the existence of a density even around points where the diffusion coefficient is zero or non-invertible.
An observation from [13] shows that the method of Section 2 can be slightly modified to take into account the case when the diffusion coefficient is not invertible. The idea is as in Section 2.1. Consider a solution X_t of (2.1) and let μ_t be its law. Fix an integer m ≥ 1; the smoothing Lemma A.1 will then be applied to the measure |σ^{−1}(x)|^{−m} μ_t(dx), where |·| is the operator norm of a matrix. If this measure has a density, then μ_t has a density on {y : σ(y) invertible}.
To this end, it is sufficient to prove a suitable bound for the quantity obtained by inserting the weight |σ^{−1}|^{−m} in the decomposition of Section 2. The first term, coming from the replacement of the weight at time t by the weight at time t − ε, can be controlled using the regularity of σ; the second term plays the role of the approximation error; finally, the third term is the probabilistic estimate, which is handled by conditioning. The contribution of the first term is negligible since, if the method is successful, then |h| ≲ √ε, thus |h|^α ε^{β/2} ≲ ε^{(α+β)/2} ≤ ε^{α(1+β)/2}, because α < 1. In conclusion, we end up with the same estimate as in Section 2, so that we obtain the same conclusion, but for the measure |σ^{−1}(x)|^{−m} μ_t(dx). This proves the following result.
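The weighted decomposition can be reconstructed schematically as follows (a sketch in our notation; the precise constants are as in Section 2):

```latex
% Weighted measure to which the smoothing lemma is applied:
\nu_t(dx) = |\sigma^{-1}(x)|^{-m}\,\mu_t(dx) .
% Schematic decomposition:
\mathbb{E}\bigl[|\sigma^{-1}(X_t)|^{-m}\,\Delta^m_h\varphi(X_t)\bigr]
 = \mathbb{E}\bigl[\bigl(|\sigma^{-1}(X_t)|^{-m}
     - |\sigma^{-1}(X_{t-\epsilon})|^{-m}\bigr)\Delta^m_h\varphi(X_t)\bigr]
   + \mathrm{Ae} + \mathrm{Pe},
% where Ae and Pe now carry the weight evaluated at time t - epsilon.
% In Pe the weight exactly compensates the degenerate conditional variance:
|\mathrm{Pe}| \lesssim
  \mathbb{E}\Bigl[\,|\sigma^{-1}(X_{t-\epsilon})|^{-m}\,
    \bigl(|\sigma^{-1}(X_{t-\epsilon})|\,\epsilon^{-1/2}|h|\bigr)^{m}\Bigr]
  \lesssim \epsilon^{-\frac m2}\,|h|^m .
```

This compensation is the reason the exponent m of the weight is tied to the order m of the discrete derivative.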
Let X be a solution of (2.1) and t > 0. Then X_t has a density with respect to the Lebesgue measure on the set {y : σ(y) invertible}.
Clearly, the regularity assumptions on the coefficients can be relaxed (for instance along the lines of Section 5).
We now wish to find some simple condition for the existence of a density on R^d even in the case of a non-invertible diffusion coefficient. For the sake of simplicity, we consider our problem (2.1) without drift, namely dX_t = σ(X_t) dB_t. Then X_t has a density p in R^d and there is a_0 = a_0(β, γ) > 0 such that p ∈ B^a_{1,∞} for all a < a_0.
Proof. Consider the auxiliary process as in (2.3). The approximation error is as in Section 2, thus Ae ≲ ε^{α(1+β)/2}. For the probabilistic estimate, if B̃_s = B_{t−ε+s} − B_{t−ε}, then we proceed by conditioning; the resulting inequality extends to non-invertible σ(y), since by definition |σ(y)^{−1}|^{−1} = inf_{|z|=1} |σ(y)z| (with the understanding 1/0 = ∞). By the moment assumption, the resulting expectation is under control. Choose now 0 < λ_1 < λ_2 < 1 and split the two terms Ae and Pe accordingly (here we have replaced ε with δ for convenience). The proof can now be concluded with computations similar to those in Proposition 2.2.
Example 3.3. The result we have obtained is clearly non-optimal. Let us consider a simple example to figure out where the issues arise.
if and only if γ < 1. The previous considerations ensure that there is a density p_t for X_t on R with a certain (non-optimal) regularity. On the other hand, in this example we know the density explicitly. The correct regularity can be recovered, at least away from the zero of the diffusion coefficient, using the methods of Section 4.

3.2. A hypo-elliptic example.
Here we aim to consider a hypo-elliptic problem, that is, a problem where the diffusion coefficient is not invertible, but the effect of the noise is propagated by the drift. We will only show a (very) elementary example, to illustrate that the simple method is effective also in this framework. It would be difficult, though, to give a general result.
Consider the following problem (3.1), with X = (X^1, X^2), where B is a one-dimensional standard Brownian motion. Here the diffusion coefficient is nowhere invertible, thus none of the results of the previous section is available. The key point is clearly the definition of the auxiliary process for the second component. The basic definition in (2.3), as well as the one in (2.11) with A = 0, are not suitable, because they do not introduce any influence of the noise; this is fundamental in view of the probabilistic estimate. Thus, we consider the auxiliary process in the second component as in (2.11), but with the process A defined similarly to (2.13), and only with variations in the first component (the one that contains the noise), for s ≥ t − ε. In particular, we need to assume that b_2 is differentiable. This is not sufficient: to ensure that the first component "transfers" the regularizing effect of the noise to the second component, we need to add a "hypo-elliptic" non-degeneracy assumption on ∂_1 b_2. We turn to the definition of the first component of the auxiliary process. There is a hidden requirement coming from the approximation error. Indeed, we will see in Proposition 3.4 below that Pe ≲ ε^{−3m/2} |h|^m. This is due to the fact that the random variable in the second component is smoother, and thus gives a stronger singularity in time. To guess the right size of the approximation error, note that, with the probabilistic estimate as above, if Ae ≲ ε^{αq}, then by the simple optimization we have seen in Proposition 2.2 we need q > 3/2. In other words, we need to choose an auxiliary process that approximates the original process to order ε^{3/2}. Proposition 3.4. For every t > 0, any solution X_t of problem (3.1) has a density in B^a_{1,∞} for every a < β/3.
We notice that, on the one hand, the result is not very satisfactory, since the coefficients are assumed differentiable (although one can be more careful about which directional derivatives are really necessary); thus one can derive the existence of a density from the existence of the Malliavin derivative and the non-degeneracy of the Malliavin matrix, the latter following from our hypo-elliptic assumption. On the other hand, even at this level the simple method of this paper still provides the additional value of the minimal regularity ensuring that smooth approximations of the solution have uniformly integrable (thus, weakly compact) densities.
Proof of Proposition 3.4. We first estimate the approximation error, which is straightforward with the choices made above. We turn to the probabilistic estimate. Conditionally on the history up to time t − ε, the relevant random variable is Gaussian, and its covariance matrix has eigenvalues of order ε and ε^3, thus Pe ≲ ε^{−3m/2} |h|^m. Our standard computations (those in Proposition 2.2) then ensure that the density of X_t is in B^a_{1,∞} for all a < β/3. Remark 3.5. The regularity obtained in the previous proposition is the regularity of the joint density of X^1_t and X^2_t. It is not difficult to check that the density of the random variable X^1_t alone is more regular.
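The eigenvalues of order ε and ε³ can be seen on the model Gaussian pair (B_ε, ∫₀^ε B_s ds), which captures the structure of the conditional law (a standard computation, added here for the reader's convenience):

```latex
% Covariance of (B_eps, \int_0^eps B_s ds):
%   E[B_eps^2] = eps,
%   E[B_eps \int_0^eps B_s ds] = \int_0^eps s\,ds = eps^2/2,
%   E[(\int_0^eps B_s ds)^2] = \int_0^eps\!\!\int_0^eps (s\wedge u)\,ds\,du
%                            = eps^3/3,
Q_\epsilon =
\begin{pmatrix}
  \epsilon & \epsilon^2/2 \\[2pt]
  \epsilon^2/2 & \epsilon^3/3
\end{pmatrix},
\qquad
\det Q_\epsilon = \frac{\epsilon^4}{12} .
% Trace ~ eps and det ~ eps^4 force eigenvalues of order eps and eps^3,
% whence the singularity eps^{-3m/2} in the probabilistic estimate.
```

The degenerate direction of variance ε³ is what produces the factor ε^{−3m/2} in place of the ε^{−m/2} of the elliptic case.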

Local estimates
In this section we wish to obtain local estimates on the density. This might be useful if, for instance, • we only have local regularity of the coefficients, • the diffusion coefficient is non-zero or non-singular only in some part of the space, and so on.
To this end, let us consider our toy model (2.1). We will localize the problem as in [12,Theorem 2.4], and then apply the method to the localized problem.
If X^x is a solution of (2.1), with initial condition x ∈ R^d, then for every t > 0 the random variable X^x_t has a density p^x(t) on a smaller ball D' ⊆ D, and p^x(t) ∈ B^a_{1,∞}(D') for every a < β.
We will devote the rest of the section to the proof of the theorem. Prior to this, we give a global version of the above result.
for every a < β and every smaller ball D ′ ⊂ D.
Proof. Notice that a probability measure that is locally absolutely continuous is absolutely continuous; thus it is sufficient to prove the existence of a density on balls. This is immediate from the previous theorem.

4.1. Proof of Theorem 4.1.
Let x_0 ∈ R^d and r > 0 be such that B_{6r}(x_0) ⊂ D. We shall study existence and regularity of the density around x_0 of the solution X of (2.1) at some fixed time t.
If m(t, r) denotes the probability that X_t lies in the ball of radius r around x_0, then r → m(t, r) is non-decreasing with limit 1 as r → ∞. Assume that m(t, r) > 0 for all r > 0 (otherwise |X_t − x_0| > 0 a.s. and the density of X_t is equal to 0 in a neighbourhood of x_0); thus, in view of the smoothing Lemma A.1, it is sufficient to prove a suitable local bound.

To localize the dynamics, we proceed again as in [12] and consider a localizing function. It is immediate to see that b̃ and σ̃ have (globally) the same (local) regularity properties as b, σ.

Decomposition. Fix a small parameter δ ∈ (0, 1], with δ ≪ t, and set the corresponding quantities. Consider first the second term. In [12] an exit-time estimate is proved, and it is easy to see (recall that X̃ satisfies an equation with globally bounded coefficients) that a moment bound holds for every q ≥ 1. Therefore, for every q ≥ 1, the second term is under control.

Consider next the first term in (4.2). Conditionally on X_{t−δ}, the random variable can be analyzed on the event I_t; thus, by the Markov property, the analysis of the first term on the right-hand side is postponed to the next section. We focus here on the second term: on {τ^x ≤ δ} one can argue directly.

Before turning to the next step, we summarise what we have proved so far. Combining (4.4), (4.5) and (4.6) with (4.3) in (4.2), we finally obtain (4.8), where the constant in the inequality depends on r, ‖b‖_{L^∞} and ‖σ‖_{L^∞}.

4.1.3. The method. Let α ∈ (0, 1) and m ≥ 1 (to be suitably chosen later), and apply (4.8) to get (4.9). We will apply our method to the new process X̃. Under our standing assumptions, X̃ has globally bounded drift and globally non-singular Hölder diffusion coefficient, so in principle the estimate should not be any different from what we did in Section 2. Indeed, we consider the extra parameter ε ∈ [0, δ] and an extra process Ȳ^x, defined as in Section 2, and we perform the analogous decomposition. The approximation error is essentially the same. The probabilistic estimate is slightly more delicate: we condition on the σ-field F_{δ−ε} of events known at time δ − ε, with respect to which the driving process is a Brownian motion.
It remains to analyze the term involving g_ε, where g_ε is the density of a centred Gaussian random vector with covariance σ̃(u)σ̃(u)^⋆ ε. By a discrete integration by parts and the Leibniz formula for discrete derivatives (4.10), the term splits into two parts: on the one hand, since ε ≤ 1 and by the assumptions of the theorem, the first part is under control; on the other hand, the second part is estimated directly. In conclusion, by (4.9) and the above computations we obtain an estimate with q, δ, ε yet to be chosen, with 0 < ε ≤ δ ≤ 1 ∧ t. Choose δ to be a multiple of ε, and q = α(1 + β); the same computations as in Proposition 2.2 then yield the result. We finally mention the two formulae on discrete derivatives used in the computations above.

Rougher coefficients
In this section we wish to discuss an extension of the basic existence result of Section 2. We have only been able to lower the required regularity of the drift coefficient.
We wish first to give a few remarks about the possible limitations in the case of rough coefficients. With Proposition 2.2 at hand, it is reasonable to expect a regularity threshold for the coefficients below which the method fails. Indeed, the key requirement is a_0 > 0 in formula (2.9). In the simple case of equation (5.1) below, this would correspond to the estimate Ae ≲ ε^{α(1+a_0)/2}, which for a_0 = 0 reduces to Ae ≲ ε^{α/2}. In other words, in the case a_0 = 0, at short times the size of the drift is the same as the size of the noise, and in principle one may believe that the regularizing effect of the noise fails to be effective for densities.
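The borderline nature of a_0 = 0 can be made explicit as follows (a sketch, consistent with the exponents in Proposition 2.2):

```latex
% If Ae <~ eps^{alpha(1+a_0)/2}, the optimization of Proposition 2.2 yields,
% as alpha -> 1 and m -> infinity, a density of regularity (roughly) a < a_0.
% For a_0 = 0 the approximation error only satisfies
\mathbb{E}\,|X_t - Y^\epsilon_t| \lesssim \epsilon^{\frac12},
% which is exactly the size of the noise increment on [t-eps, t]:
\mathbb{E}\,|B_t - B_{t-\epsilon}| \approx \epsilon^{\frac12},
% so the error of the auxiliary process is no longer negligible with
% respect to the regularizing Gaussian fluctuation.
```

This is the sense in which a_0 = 0 is critical: the drift error and the noise live on the same scale, and the duality argument produces no positive regularity.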
It would be interesting to understand whether there is a counterexample to the existence of densities in the case a_0 ≤ 0, or whether it is simply the method that fails.

5.1. Rougher drift coefficient.
In this section we study the existence of a density for solutions of equation (5.1) with non-regular coefficient b. We have taken a simple diffusion coefficient to focus on the difficulties originated by the drift. Theorem 5.1 below shows existence and regularity of a density under the assumption b ∈ L^q(0, T; L^p(R^d)), with 2/q + d/p < 1. It is interesting to notice that this same condition ensures existence and uniqueness of a strong solution of (5.1) [29]. See Remark 5.2 for a wider discussion along these lines.
This setting could provide a testbed for the problem of existence of densities in the case a 0 ≤ 0 (addressed at the beginning of Section 5).
Theorem 5.1. Assume b ∈ L^q(0, T; L^p(R^d)) with 2/q + d/p < 1. Then for every initial condition x ∈ R^d the solution of (5.1) with initial condition x has a density p^x(t) for every t > 0. Moreover, for every γ < 1 − 2/q, a weighted bound holds, where e_γ is a suitable exponent identified in the proof.

Proof. Set p' = p/(p−1) and a = d/p, so that p' = d/(d−a) and thus B^a_{1,∞} ⊂ L^{p'}. Set, for every γ > 0,

‖p^·‖_{⋆,γ} := sup_x sup_{0<t≤T} (1 ∧ t)^{e_γ} ‖p^x(t)‖_{B^γ_{1,∞}},

where γ → e_γ will be identified below in the proof. Now, by the Hölder inequality and Sobolev's embeddings, we can estimate the approximation error. The probabilistic estimate is as in the standard case (Section 2), so, if we set a_0 = 2/q' − 1, then we have (2.9), therefore (2.10) holds. This formula provides an a-priori estimate for ‖p^·‖_{⋆,γ}, once we have defined, through this same formula, the value of e_γ as e_γ > (γ/a_0) e_a + ((1 + a_0)/(2a_0)) γ.
Unfortunately the estimate, as well as the value of e_γ, depends on ‖p^·‖_{⋆,a} and e_a. Notice that our assumption ensures that a < a_0, therefore if we choose γ = a, the restriction on e_a reads e_a > ((1 + a_0) a) / (2(a_0 − a)).
The constant in the inequality (5.2) does not depend on the initial condition, thus the resulting bound is uniform in x.
If we choose δ small enough that a/a_0 + 2δ < 1, this provides an a-priori estimate for ‖p^·‖_{⋆,a}, with e_a chosen as above.
With the estimate of ‖p^·‖_{⋆,a} at hand, we look back at (5.2) and see that, if e_γ is chosen as above, then (5.2) provides an estimate for ‖p^·‖_{⋆,γ}.
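The key pointwise-in-time estimate behind the approximation error of the proof can be reconstructed as follows (a sketch under the theorem's assumptions; for equation (5.1) with unit diffusion, X_t − Y^ε_t is exactly the drift integral):

```latex
% If X^x_s has density p^x(s) in B^a_{1,\infty} \subset L^{p'}, a = d/p, then
\mathbb{E}\,\bigl|b(s, X^x_s)\bigr|
  = \int_{\mathbb{R}^d} |b(s,y)|\, p^x(s,y)\, dy
  \le \|b(s)\|_{L^p}\, \|p^x(s)\|_{L^{p'}}
  \lesssim \|b(s)\|_{L^p}\, \|p^x(s)\|_{B^a_{1,\infty}} ,
% and, by Hölder in time with exponents q, q',
\mathbb{E}\,|X_t - Y^\epsilon_t|
  \le \int_{t-\epsilon}^{t} \mathbb{E}\,\bigl|b(s, X^x_s)\bigr|\, ds
  \lesssim \|b\|_{L^q_t L^p_x}
    \Bigl(\int_{t-\epsilon}^{t}
      \|p^x(s)\|_{B^a_{1,\infty}}^{\,q'}\, ds\Bigr)^{1/q'} .
```

The circularity (the density estimate requires a bound on the density itself) is exactly what the weighted norm ‖p^·‖_{⋆,γ} and the a-priori argument of the proof resolve.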
Remark 5.2. An alternative proof of Theorem 5.1, via a backward Kolmogorov equation, is available. The idea has been extensively used in results on regularization by noise, see for instance [17] or [29], and [15] for an extension to random drifts. The idea, here, is to re-write equation (5.1) as an equation with more regular coefficients. Notice though that in this way, to "solve" a Fokker-Planck equation, we end up using its dual Kolmogorov equation. Our original proof has the advantage of being based on elementary arguments that do not require solving PDEs, and therefore can be readily used in more general cases, see for instance Section 7.2.2. Indeed, for b ∈ L^q(0, T; L^p(R^d)), p ≥ 2 and d/p + 2/q < 1, consider the backward parabolic equation with solution U ∈ L^q(0, t; W^{2,p}(R^d)) ∩ W^{1,q}(0, t; L^p(R^d)) (see [29, Theorem 10.3]). It is possible to give a quantitative estimate of the dependence on λ (with a minor modification of [16, Lemma 3.4], to make the dependence on λ more explicit), for every positive a < (1/2)(1 − 2/q − d/p), and for λ large enough. By Itô's formula, this immediately allows to estimate the approximation error (with the optimal choice λ = ε^{−1}).

5.1.1. An almost sure regularity result.
We discuss a simple application of the previous result. The regularity of the density found in Theorem 5.1 can be used, under the assumptions of that theorem, to obtain regularity of functionals of solutions of (5.1). To see this, we simply use the Hölder inequality, embeddings of Besov spaces, and the estimate of the previous theorem, where a = d/p, and p' and q' are the conjugate Hölder exponents of p, q. Similar computations show that the same function is Hölder continuous on [0, T] if q' e_a < 1, that is if 2d/p + 2/q < 1. More generally, we have the following result.
Let X^x be the solution of (5.1) with initial condition x ∈ R^d, and let p^x(t) be its density for every t > 0. If f ∈ L^q(0, T; L^p(R^d)), then the map t ↦ ∫_0^t f(s, X^x_s) ds is a.s. Hölder continuous on [0, T] with every exponent smaller than 1/q' − e_a, where q' is the conjugate Hölder exponent of q and a = d/p. Proof. The proof is elementary and uses the Markov property and Kolmogorov's continuity theorem. We only give a sketch under the assumption that f does not depend on t. The general case is entirely similar.
Let a = d/p, and set for brevity g(r) = (1 ∧ r)^{−e_a}. By the Markov property, if r_1 < r_2 < ··· < r_k, the k-fold expectation is bounded by ‖f‖^k_{L^p} g(r_1) g(r_2 − r_1) ··· g(r_k − r_{k−1}), since, as above, E|f(X_r)| ≲ ‖f‖_{L^p} g(r). Thus, by k changes of variables, if t − s ≤ 1, then r_k ≤ r_{k−1} ≤ ··· ≤ r_1 ≤ t − s ≤ 1, and the claim follows from Kolmogorov's continuity theorem. Remark 5.4. We remark that the above proposition does not show that the stochastic flow generated by (5.1) is Hölder continuous.

5.2. Rougher diffusion coefficient.
We wish to briefly discuss some difficulties related to the extension of the "core" method to rougher diffusion coefficients.
First of all, regularity of the diffusion coefficient is a more delicate issue, and regularity itself might not be as significant as it is for the drift coefficient in view of densities. For instance, by the Lévy characterization theorem we know that if σ : R^d → R^{d×d'} satisfies |σ(x)| = 1 for all x, then the solution of (5.4) is a Brownian motion and thus has a smooth density. On the other hand, the method we have introduced seems to depend strongly on an evaluation of the increments of σ. It would thus be reasonable to expect results if we lower the summability of the "derivatives" of σ, namely by requiring that σ ∈ B^β_{p,q} for some finite p, q ≥ 1 (recall that C^β_b = B^β_{∞,∞}).
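The Lévy-characterization observation can be spelled out in the model case σ(x)σ(x)^⋆ = I (which in dimension one reduces to |σ(x)| = 1; the matrix formulation is our reading of the condition above):

```latex
% If sigma sigma^* = I identically, then X_t = \int_0^t sigma(X_s) dB_s is a
% continuous local martingale with quadratic covariations
\langle X^i, X^j \rangle_t
  = \int_0^t \bigl(\sigma(X_s)\,\sigma(X_s)^\star\bigr)_{ij}\, ds
  = \delta_{ij}\, t ,
% so, by the Levy characterization, X is a standard Brownian motion,
% regardless of any regularity of sigma (even merely measurable sigma works).
```

In particular the density can be perfectly smooth while σ has no pointwise regularity at all, which is why increment-based methods are not expected to be sharp here.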
A strategy to estimate the approximation error could be as in Theorem 5.1. Assume for instance that σ is non-singular and σ ∈ L^∞ ∩ B^β_{p,q}, let p_x(s) be the density of the solution X^x_s of (5.4) with initial condition X_0 = x, fix t > 0, and let Y be the auxiliary process introduced in Section 2. The approximation error is then expressed, via the Markov property, in terms of the increments of σ, which could be estimated using the regularity of σ. On the other hand, the term p_y(t − s, ·) gives too singular a contribution if we expect a singularity in time as in Theorem 5.1. In other words, the estimate of the term above is successful only when β > d/p, that is, exactly when σ, by Sobolev's embeddings, is a Hölder function.

More regularity -II
In this section we shall see that the method introduced in Section 2 is not optimal, and we will suggest a partial probabilistic argument that goes in the direction of the optimal result.
Consider for simplicity the problem with constant diffusion (although our considerations hold equally with a non-singular diffusion) and bounded drift, with initial condition X_0 = x. The computations in Section 2 show that X_t has a density p_x(t) ∈ B^{1−}_{1,∞}. It is easy to be convinced, though, that the expected regularity is B^{1−}_{∞,∞} (that is, Hölder). Indeed, ideally the density should solve the associated Fokker–Planck equation in mild form, where g is the heat kernel. It is not difficult, using this mild formulation, to prove the corresponding bound and conclude with estimates for the heat kernel. Similar computations (with a caveat) also show Hölder bounds. The trick is to bound a suitable weighted norm of the density, although we need to know a priori that this quantity is finite. A similar quantity is considered in Section 5. We turn to a result that improves slightly (but without reaching the optimal result) the summability of the density. We will work under the same assumptions as in Section 2.
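A sketch of the mild-formulation computation just described. We write down one plausible form of the Duhamel formula for the Fokker–Planck equation (an assumption on our part; $g_t$ denotes the heat kernel):

```latex
p_x(t) \;=\; g_t(\cdot - x) \;-\; \int_0^t \nabla g_{t-s} \ast \big( b\, p_x(s) \big)\, ds,
```

so that, using the heat-kernel estimates $\|\nabla g_{t-s}\|_{L^1} \lesssim (t-s)^{-1/2}$ and $\|\nabla g_{t-s}\|_{C^a \to L^1} \lesssim (t-s)^{-(1+a)/2}$,

```latex
\|p_x(t)\|_{C^a}
\;\lesssim\; \|g_t\|_{C^a}
+ \|b\|_{L^\infty} \int_0^t (t-s)^{-\frac{1+a}{2}}\, \|p_x(s)\|_{L^\infty}\, ds,
```

which is finite for every $a < 1$ once $\sup_s \|p_x(s)\|_{L^\infty}$ is known to be finite; this is precisely the a-priori knowledge issue mentioned above.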
Theorem 6.1. Assume the coefficients satisfy the assumptions of Section 2, with β ∈ (0, 1). Assume moreover (2.2). Let X^x be a solution of (2.1) with initial condition x ∈ R^d. Then the density p_x(t) of X^x_t is in B^a_{p,∞}(R^d) for every a < β and every p < d/(d − β). Proof. By the computations in Section 2 we know that under the standing assumptions the density satisfies p_x(t) ∈ B^a_{1,∞} for every a < β. By Sobolev's embeddings, p_x(t) ∈ L^p for every p < d/(d − β). Fix p ∈ (1, d/(d − β)) and denote by q the conjugate Hölder exponent of p. Notice that, by our choice of p, we have q > d/β. We will use the smoothing Lemma A.3. To this end let φ ∈ F^α_{q,∞}(R^d), with α ∈ (d/q, 1). With these values of α and q, we can use the characterization of F^α_{q,∞} in terms of differences given in formula (A.3). As usual, we consider E[Δ^m_h φ(X_t)] and split it into the probabilistic estimate and the approximation error. The probabilistic estimate is not the source of issues: we only have to apply the Hölder inequality in formula (2.7) in L^p–L^q rather than L^1–L^∞, where Y^ε is the process defined in formula (2.3). We turn to the approximation error. Notice that Y^ε_t is a function of X^x_{t−ε} and of a Brownian motion B̃_s = B_{t−ε+s} − B_{t−ε}, s ≥ 0, that is independent of the history up to time t − ε, and X is a Markov process. Thus the difference involves the stochastic integral of (σ(X^y_s) − σ(y)) dB_s. Therefore, by the Hölder inequality, and with the same computations used to obtain the estimate (2.6), we get the bound in L^q. We thus have, using the computations above and again the Hölder inequality, the corresponding estimate. Let g_{σ(y),ε} be the density of y + σ(y)B_ε; then, by the non-degeneracy assumption on σ, it is easy to see that a Gaussian-type bound holds, for a constant c that depends on σ. Using the same computations as in Proposition 2.2 and the smoothing Lemma A.3, we finally obtain that p_x(t) ∈ B^a_{p,∞} for every a < β. Remark 6.2. With a more careful analysis of the proof of Proposition 2.2, it is not difficult to obtain an estimate of the norm of the density in B^a_{p,∞} in terms of t, as in (2.10).
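The relation between the summability bound on p and the dual exponent used in the proof is elementary arithmetic; a check with exact rationals (d and β here are sample values):

```python
from fractions import Fraction

def conjugate(p):
    # conjugate Hoelder exponent: 1/p + 1/q = 1
    return p / (p - 1)

# p < d/(d - beta) should be equivalent to q > d/beta, where q = p'
d, beta = Fraction(3), Fraction(1, 2)
for p in [Fraction(11, 10), Fraction(6, 5), Fraction(5, 4), Fraction(2)]:
    q = conjugate(p)
    assert (p < d / (d - beta)) == (q > d / beta)
```

With d = 3 and β = 1/2 the threshold is d/(d − β) = 6/5, matching q > d/β = 6 on the dual side.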
Remark 6.3. Notice that, since by the theorem above we now know that the density lies in B^a_{p,∞} for every a < β and p < d/(d − β), we can deduce a stronger summability (this was the starting point of the proof above). Unfortunately, the limitation in the characterization (A.3) of Triebel–Lizorkin spaces prevents us from iterating the above proof and deducing Hölder bounds.
Remark 6.4. In the case d = 1, β > 1/2, we have the embedding of B^a_{p,∞}, for a < β and p < 1/(1 − β), into spaces of Hölder functions. We can thus conclude that the density is Hölder continuous, as expected.

Examples and applications
In this section we show a few applications of the simple method and of its improvements illustrated in the first part of the paper.
Before presenting the examples, we wish to point out that there are two main key points in the probabilistic estimate. The first is the estimate of the small-time asymptotics of the "noise" part of the equation. The second is that the method we have illustrated depends very much on the fact that, for times close to the final time, the noise is independent of the past. In principle this might rule out, for instance, processes such as the fractional Brownian motion as the source of noise. In the particular case of the fractional Brownian motion, though, one can use a suitable integral representation, and in that case it is easy to be convinced that most of the results (and in particular the basic results of Section 2) hold true with minimal modifications (essentially, the Hurst index plays the role of the parameter θ in Proposition 2.2).
As regards the second key point, we notice that, in view of the analysis of infinite dimensional problems, our method seems to be tailored to white noise, and so far it seems unlikely that it can be applied to problems driven by a purely spatial noise, with no temporal component (see for instance [8] for results in this direction based on Malliavin's calculus).

A path-dependent SDE.
Here we use only the basic method (from Section 2), because as such we do not have a Markov evolution. This result can be essentially found already in [6], we provide it for the purpose of illustrating the method. Notice that the example includes, with suitable adjustments, also the case of equations with delay.
Given T > 0 and two integers d, d′, we assume that, whenever two paths ω, ω′ coincide up to time t, then b(t, ω) = b(t, ω′) and σ(t, ω) = σ(t, ω′). In other words, b and σ, when evaluated at time t, depend only on the part of the path up to time t. This ensures adaptedness in the equation. We also assume a Hölder-type condition on σ in terms of the path increments δ_{s,t}(ω) = sup_{s≤r≤t} |ω_r − ω_s|.
Proposition 7.1. Under the above assumptions, if (X_t)_{t≥0} is a solution of (7.1), then for every t ∈ (0, T] the random variable X_t has a density with respect to the Lebesgue measure. Moreover, the density is in B^a_{1,∞}(R^d) for every a < β. Proof. We use the auxiliary process (2.3). Notice that by our first assumption the term σ(t − ε, X) is measurable with respect to the history up to time t − ε.
The approximation error thus reduces to an estimate of E[δ_{t−ε,s}(X)^{2β}], which follows from our second assumption. The probabilistic estimate can be bounded as in Section 2, to get Pe ≲ ε^{−m/2}|h|^m ‖φ‖_{L^∞}. Proposition 2.2 concludes the proof.
Remark 7.2. Notice that in the derivation of the results of Sections 4, 5, and 6 the Markov property was used at some stage; thus they cannot be adapted to this setting.
Remark 7.3. The result we have obtained here is slightly less general than [6, Section 3.1]. Indeed, in [6] the assumptions are the same, but with a weaker form of continuity of the diffusion coefficient, for some ε > 0. The density in this case is in L^{e_log}, the Orlicz space with Young function e_log(u) = (1 + |u|) log(1 + |u|). We believe that in principle, if one considered Besov spaces with a norm as in [6], one might extend the result we have given above under this weaker assumption. We do not pursue this, due to the analytical difficulties involved.

7.2. Lévy noise driven SDEs. In this section we wish to slightly extend the results of [13] in the direction of Sections 5 and 6. We briefly recall the setting from [13].
We consider the following equation, driven by the α-stable-like process (Z_t)_{t≥0} defined above. Here the non-degeneracy assumption can be weakened following the lines of Section 3.
In the rest of the section we will restrict to the (simpler) case α ∈ (1, 2). While this greatly simplifies the computations of Section 7.2.1, it is a necessary assumption in Section 7.2.2 to obtain sufficient smallness in the approximation error.

7.2.1. More regularity. In this section we will adapt the ideas of Section 6 to problem (7.2). From [13] we know that, under the above assumptions on the coefficients and on the driving process, any solution of (7.2) has a density in B^s_{1,∞} for s < β ∧ (α − 1). We wish to improve the summability index. As far as we know, under the standing assumptions on the coefficients there is not yet a general result of existence or uniqueness for (7.2). So in the rest of this section we assume that either there is a solution of (7.2) that is a Markov process, or that there is a solution that can be obtained by approximation of (7.2) with smooth coefficients. In the latter case the Besov bound ensures the uniform integrability of the approximating densities, and thus that the limit problem has a density with the same Besov regularity.
Theorem 7.5. Let (Z_t)_{t≥0} be a Lévy process as in Assumption 7.4, with α > 1, let b be bounded measurable and σ ∈ C^β_b(R^d; R^{d×d′}) such that (7.3) holds, and set κ = min(α − 1, β). For every x ∈ R^d let (X^x_t)_{t≥0} be a solution (as specified above) of (7.2). Then for every t > 0 and x ∈ R^d, the density p_x(t) of the random variable X^x_t is in B^a_{p,∞}(R^d) for every p ∈ (1, d/(d − κ)) and every a < κ(1 ∧ α/p). We proceed with the proof of the additional regularity of the density. We will first need a slight modification of [13, Lemma 3.3] (which in turn is a quantitative version of [46, Theorem 1.2]).
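The exponents in Theorem 7.5 interact as follows; a quick arithmetic illustration with hypothetical parameter values:

```python
def regularity_exponent(alpha, beta, p):
    # Theorem 7.5: kappa = min(alpha - 1, beta); the density lies in
    # B^a_{p,infty} for every a < kappa * min(1, alpha / p)
    kappa = min(alpha - 1.0, beta)
    return kappa * min(1.0, alpha / p)

# sample values: d = 1, alpha = 1.5, beta = 0.4, so kappa = 0.4 and the
# admissible summability range is p in (1, d/(d - kappa)) = (1, 1.666...)
d, alpha, beta = 1, 1.5, 0.4
kappa = min(alpha - 1.0, beta)
p_max = d / (d - kappa)
assert 1.5 < p_max < 1.7

# for p <= alpha the full regularity kappa survives;
# for p > alpha the bound degrades by the factor alpha / p
assert regularity_exponent(alpha, beta, 1.2) == kappa
assert abs(regularity_exponent(alpha, beta, 1.6) - kappa * alpha / 1.6) < 1e-12
```

Note that in low dimension the window (1, d/(d − κ)) can indeed extend past α, so both regimes of the minimum occur.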
Lemma 7.6. Let (Z_t)_{t≥0} be a Lévy process as in Assumption 7.4, and let g_t be the density of Z_t, for each t > 0. Then the stated bound holds for all integers m ≥ 1, all h ∈ R^d with |h| ≤ 1, and all p ∈ [1, ∞], where q is the conjugate Hölder exponent of p.
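For a strictly α-stable driver, the self-similarity $g_t(x) = t^{-d/\alpha} g_1(t^{-1/\alpha} x)$ makes the shape of such estimates transparent (the α-stable-like case of Assumption 7.4 requires the quantitative argument of [13, Lemma 3.3]); for $m = 1$, $p = 1$, a change of variables gives

```latex
\|\Delta_h g_t\|_{L^1}
= \|\Delta_{t^{-1/\alpha} h}\, g_1\|_{L^1}
\le t^{-1/\alpha}\, |h|\, \|\nabla g_1\|_{L^1},
```

and iterating the difference operator yields $\|\Delta^m_h g_t\|_{L^1} \lesssim t^{-m/\alpha} |h|^m$. The $L^p$ case follows in the same way, at the extra cost of the factor $t^{-(d/\alpha)(1 - 1/p)}$ coming from the normalization of $g_t$.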
Proof of Theorem 7.5. Define the auxiliary process for s ≥ t − ε, and set Y^ε_s = Y^{X_{t−ε},ε}_s. Up to the change of the driving process, this is the same as the auxiliary process (2.3).
We proceed as in Section 6. Let p_x be the density of the solution of (7.2) with initial condition x. We know that p_x(t) ∈ B^a_{1,∞} for all a < κ; hence, by Sobolev's embeddings, p_x(t) ∈ L^p for all p < d/(d − κ), where q denotes the conjugate Hölder exponent of p. Define Φ as in (6.1). The probabilistic estimate is obtained through the previous lemma and the non-degeneracy assumption on σ. We turn to the analysis of the approximation error. We use the Markov property and the Hölder inequality, for each y ∈ R^d, where we have used [13, Lemma 3.1-(ii)], and we need θp < α (this will limit the final regularity when α < p). First, we need a bound on ‖p_x(t − ε)‖_{L^p}. By our choice of p, B^a_{1,∞} ⊂ L^p for a suitable a < κ. Proposition 2.2, together with the estimates in [13], provides an estimate in time of ‖p_x(t − ε)‖_{B^a_{1,∞}}. With our standard choice ε ≤ t/2, we readily have that ‖p_x(t − ε)‖_{L^p} is bounded by a number that depends on t, but that is uniform in x and ε.
It remains to estimate the last term. Denote by g_{y,ε} the density of σ(y)(Z_t − Z_{t−ε}), and by g_ε the density of Z_t − Z_{t−ε}. It is easy to see that g_{y,ε}(z) = det(σ(y))^{−1} g_ε(σ(y)^{−1} z), and it is sufficient to prove that the inner integral is uniformly bounded in z. This follows from computations similar to those in Section 6, using the estimate in [13, Lemma 3.3].
In conclusion, Ae ≲ ε^{(θ/α)(1+κ)}; therefore, using the same computations as in Proposition 2.2 and the smoothing Lemma A.3, we finally obtain that p_x(t) ∈ B^a_{p,∞}(R^d) for every a < κ(1 ∧ α/p).

7.2.2. Rougher drift. In this section we extend the results of Section 5.1 to the α-stable-like drivers. We consider problem (7.2) with b ∈ L^q(0, T; L^p(R^d)) (and the same assumptions as before on σ).
As in Section 6, our result is an a priori estimate on the regularity of the density. In Section 6 the equation we consider has a unique strong solution [29], so the result there is rigorous. Here, under the assumptions on the coefficients we consider, we do not know whether there is a solution, or whether it is a Markov process. So, as in Section 7.2.1 above, we assume that either there is a solution of (7.2) that is a Markov process, or that there is a solution that can be obtained by approximation of (7.2) with smooth coefficients. In the latter case the Besov bound ensures the uniform integrability of the approximating densities, and thus that the limit problem has a density with the same Besov regularity.
Theorem 7.7. Let (Z_t)_{t≥0} be a Lévy process as in Assumption 7.4, with α > 1, and let (X^x_t)_{t≥0} be a solution (as specified above) of (7.2). Assume there is ẽ ≥ 0 such that the conditions (7.5) hold, where q′ is the conjugate Hölder exponent of q. Then for every t > 0 and x ∈ R^d, the random variable X^x_t has a density p_x(t). We recall that, if p_x(t) is the density of a solution X^x_t of (7.2) with initial condition x, we have set the weighted norm with ẽ_γ to be suitably chosen. Set a = d/p, so that p′ = d/(d − a) and B^a_{1,∞} ⊂ L^{p′}, where p′ is the Hölder conjugate exponent of p. We start with some estimates of the contribution of the drift.
Lemma 7.8. If q′ẽ_a < 1 (for the second inequality), then the stated bounds hold for s ≤ t and ε < 1, ε ≤ t/2, where q′ is the Hölder conjugate exponent of q.
Proof. The first estimate follows as in Theorem 5.1. Likewise, by the Markov property and the estimate above (twice), and then by the Hölder inequality (twice), the second estimate follows by elementary computations, since q′ẽ_a < 1.
From the above estimates we immediately deduce the following result.
Lemma 7.9. If γ < α and s ∈ [t − ε, t], the stated bound holds; for the term D we use the previous lemma. Proof of Theorem 7.7. We are ready to estimate the approximation error and the probabilistic estimate. We use the auxiliary process of formula (7.4) of the previous section. The probabilistic estimate can be immediately deduced from Lemma 7.6 (with p = 1), thus Pe ≲ ε^{−m/α}|h|^m ‖φ‖_{L^∞}. We turn to the approximation error. For the first term we use the first statement of Lemma 7.8 (recall that θ < 1). For the second term we use [13] and Lemma 7.9: let γ be such that α < γ ≤ 2 and γβ < α. Putting the two estimates together, all the above computations show that, under the conditions (7.5), the norm ‖p_·‖_{⋆,a} is finite, and thus ‖p_·‖_{⋆,b} < ∞ for every b < a_0, for a suitable value of ẽ_b that can be easily computed as above by Proposition 2.2.
It is not immediately apparent that the conditions of Theorem 7.7 can be satisfied, so we provide a couple of particular cases. The first matches Theorem 5.1. Notice that under assumptions on the drift b that are slightly different from, but essentially the same as, those in the corollary below, existence and uniqueness hold; see [10].
Corollary 7.10. Under the assumptions of Theorem 7.7 on the solution and on the coefficients, and if moreover σ is constant, then κ = 1/q′ and one can take ẽ = 0. In particular, the conclusions of the theorem hold under the corresponding condition on p and q. The second particular case applies for p large. In that case we expect the number ẽ of Theorem 7.7 to be small. As a matter of fact, ẽ → 0 as p → ∞. To further simplify, we consider the case κ = 1/q′. Corollary 7.11. Under the same assumptions of Theorem 7.7, if • either β ≥ α − 1, or β < α − 1 and q ≤ α/(α − β − 1), then any number ẽ such that κd/(p(ακ − 1) − d) < ẽ < 1/q′ meets the conditions (7.5), and thus the conclusions of Theorem 7.7 hold.
Proof. Let us first prove that κ = 1/q′ if ẽq′ < 1. Indeed, by the third of the assumptions above, 1/q′ ≤ (β + 1)/α. Moreover, by the second of the assumptions above, the bound involving βẽ follows, and in conclusion κ = 1/q′. The condition ακ > 1 holds since β > 0 and q′ < α. The condition d/p < ακ − 1 holds by the first of the assumptions of the corollary. To ensure that ẽ can be chosen in a non-empty interval, we need to check the compatibility of the two bounds, and this follows from the first assumption of the corollary.
7.3. The 3D Navier–Stokes equations with noise. The problem of existence of densities for finite dimensional projections of the solutions of the Navier–Stokes equations driven by noise has been the motivating example, discussed in [14], that led to the development of the dimension-free method illustrated in this paper. The results have been further improved in [43], proving Hölder regularity in time with values in Besov spaces, and in [41], proving Hölder regularity in space of the densities. Unlike Section 6, the result of Hölder regularity is optimal, but it has been proved by analytical methods.
7.4. A singular equation: Φ^4_d. In this section we consider the following singular stochastic PDE on the torus T^d, with periodic boundary conditions, in two dimensions (but see Remarks 7.14 and 7.19 for the three dimensional case), where ξ is space-time white noise.
The equation is generally understood as the limit of a family of regularized problems. To this end, let η be a smooth compactly supported function, set η_δ(x) = δ^{−d}η(δ^{−1}x), and ξ_δ = η_δ ⋆ ξ. Consider the regularized equation, where c_δ is a suitable number that depends on δ and η, and the linearization is chosen as the stationary process that solves the corresponding linear equation. With this choice, the renormalized square and cube are also stationary processes. The constants c_δ are chosen so that the renormalized square and cube have a limit as δ ↓ 0; see [11,34] in dimension two, and [20,21,22,9,37] for the three dimensional case. We will denote by the corresponding diagram symbols the limit quantities as δ ↓ 0.
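For the reader's convenience, the standard form of the renormalized powers (we write $\Psi_\delta$ for the stationary solution of the linear equation, spelling out the diagram notation; this matches the analogous definitions used in Section 7.5):

```latex
:\!\Psi_\delta^2\!: \;=\; \Psi_\delta^2 - c_\delta,
\qquad
:\!\Psi_\delta^3\!: \;=\; \Psi_\delta^3 - 3\, c_\delta\, \Psi_\delta,
\qquad
c_\delta = \mathbb{E}\big[\Psi_\delta(t,x)^2\big],
```

with $c_\delta$ independent of $(t,x)$ by stationarity. It is these Wick combinations that converge as $\delta \downarrow 0$, while the plain powers $\Psi_\delta^2$, $\Psi_\delta^3$ diverge.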
7.4.1. Densities for the solution. Set X_δ = Ψ_δ + R_δ, where Ψ_δ is the stationary linearized solution; then R_δ solves the corresponding remainder equation. We prove that the solution has a density at each time. Since the solution is distribution valued, we will prove the existence of a joint density of the solution tested against a finite but arbitrary number of test functions.
Fix t > 0; the auxiliary process for the simple method (the term analogous to (2.3)) is here given by dY^ε_{δ,i} = ⟨φ_i, ξ_δ⟩. By the non-degeneracy assumption on the test functions it follows that, for δ small enough, the random vector (⟨φ_1, ξ_δ⟩, ⟨φ_2, ξ_δ⟩, . . . , ⟨φ_n, ξ_δ⟩) is a non-degenerate Gaussian random vector. It then easily follows that the probabilistic estimate holds. For the approximation error we need to compute the corresponding increment; this is immediate, since X_δ has moments in (negative) Besov spaces [34], read as uniform bounds on moments of X_δ in negative Besov spaces. In conclusion, Ae ≲ ε^α [φ]_{C^α_b}. Proposition 2.2 concludes the proof. Remark 7.13. We actually expect the densities in the previous proposition to be smooth. Following the lines of Section 2.2, one could define an infinite dimensional auxiliary process Y^ε with the drift frozen at t − ε, as in formula (2.13). This would allow us to compute an approximation error at the level of the infinite dimensional processes in the appropriate Besov spaces of distributions. When evaluated over test functions, this error would provide the approximation error for the method.
Remark 7.14. The above result holds also in dimension three, with the same proof. The difference is that, as we shall see below, the correct interpretation of the equation is more involved.
In dimension three we additionally introduce a further diagram, defined as the stationary solution of the corresponding linear equation with renormalized source. Again, the constant c_δ is chosen so that the renormalized quantities have a limit as δ ↓ 0, see [37]. Set X_δ as the sum of the explicit stochastic terms plus a remainder R_δ; then R_δ solves a modified equation. This is not yet enough to give a meaning to the limit equation, because the product of R_δ with the renormalized square is not well defined in the limit, given the expected regularity of the limit R. Without giving too many details (that would be beyond the scope of this paper), we know that the equation can be suitably reformulated, following for instance [37] (see also [9]), in terms of the Bony paraproduct, with Q_δ gathering the more regular terms.
At this stage one can proceed as in the two-dimensional case, since the stochastic diagrams as well as the remainder have uniform-in-time bounds on moments in negative Besov spaces. Thus, Theorem 7.12 holds for (7.6) also in dimension d = 3.
Then the random vector (⟨X(t), φ_1⟩, ⟨X(t), φ_2⟩, . . . , ⟨X(t), φ_n⟩) has a density with respect to the Lebesgue measure in B^a_{1,∞}(R^n), for every a < 1.

7.5. Densities for the remainder. In this section we wish to explore another direction. As a second application of the simple method in this framework, we investigate the existence of a density for the remainder R. A possible approach could be based on the previous considerations and some arguments from Section 3.2. Indeed, it would be sufficient to prove (for instance in dimension two) that X and the linearized solution have a joint density (when evaluated over test functions). Since both are driven by the same noise, this can only be possible if the drift is hypoelliptic. This idea has two drawbacks: the first is that it would give densities for R against test functions, while R is a bona fide function. The second is that, as we have seen in Section 3.2, the method requires good estimates on the drift, while giving back low regularity.
We wish to consider here a different idea, which uses the non-linearity directly. The equation for R (in dimension two) suggests understanding the terms not depending on R as the "noise" of the evolution, since the simple method presented in this paper is essentially based on small-time estimates for the density of the "noise" term.
Unfortunately the idea does not work here in dimension two (and apparently in dimension three as well, see Remark 7.19), since the stochastic terms are "too good", that is, their regularity is comparable with that of the "non-noise" terms R³ + 3 R² − (1 − 3 )R. As such, we would end up with a formula similar to (2.9), but with a_0 = 0. For this reason, in the rest of the section we discuss an example very close to problem (7.6) that is amenable to our analysis.
We consider the following problem, with periodic boundary conditions on the two dimensional torus, where γ ∈ (0, 2/5) and ξ is space-time white noise. As above, the problem makes sense when suitably renormalized, or as a limit of approximated problems. If ξ_δ is a spatially smooth approximation of the noise, we study the regularized problem, where the linearized solution Ψ_δ solves (∂_t − Δ)Ψ_δ = (−Δ)^{γ/2} ξ_δ and c_δ(t) = E[Ψ_δ(t, x)²]. Set :Ψ_δ²:(t) = Ψ_δ(t)² − c_δ(t) and :Ψ_δ³:(t) = Ψ_δ(t)³ − 3c_δ(t)Ψ_δ(t). Denote by :Ψ²: and :Ψ³: their respective limits as δ ↓ 0. From now on we drop for simplicity the subscript δ, even though all computations are rigorous only at the level of approximations. The Besov bound from our method will provide the uniform integrability necessary to carry the argument rigorously, in the limit δ ↓ 0, to the solution of (7.8).
The choice γ < 2/5 is for convenience. In this regime it is sufficient to decompose the solution as X = Ψ + R (Ψ being the linearized solution), and to study the equation solved by the remainder. One can prove, for instance following the lines of [36, Theorem 1.1], that Ψ has regularity C^{−γ−}, :Ψ²: has regularity C^{−2γ−} and :Ψ³: has regularity C^{−3γ−}. Thus we expect that R has regularity C^{(2−3γ)−}, so that the products appearing in the equation for R (in particular :Ψ²:R) are well defined when γ < 2/5 (if γ is above 2/5 but below 1/2, we need to decompose X with an additional term; the case γ = 1/2 is, analytically, equivalent to Φ^4_3). In the rest of the section we will assume that we have a solution for the auxiliary equation with the above mentioned regularity. In principle the solution might be defined only up to a random time, but on the one hand we may guess that the results of [35] extend to this case, and on the other hand our method can take local solutions into account as well (as in Section 2.1). The main result for the densities of the remainder is as follows.
Theorem 7.16. Let X be a solution of (7.8), and set R_t = X_t − Ψ_t, where Ψ is the linearized solution. Then for every t > 0 and every x ∈ T² the random variable R_t(x) has a density with respect to the Lebesgue measure on R. Moreover, the density is in B^a_{1,∞} for every a < γ/(2 − 3γ). To prove the theorem we first identify the approximation problem, as in formula (2.3) for the toy problem. Fix 0 < s < t and write the equation for R in mild form. The terms in the first integral are more regular than those in the second, and thus should provide a smaller approximation error. We will treat the last term as "noise", although clearly this new "noise" does not have independent increments. We define our auxiliary process accordingly for s ≤ r ≤ t. Lemma 7.17. Under the standing assumptions, given φ ∈ C^α for some α ∈ (0, 1), the stated estimate holds for every 0 ≤ s < t, x ∈ T², and every δ > 0.
Proof. Among the terms on the right-hand side of the formula above, the least regular is the one containing :Ψ²:_r R_r. By [36, Proposition 2.3], the corresponding bound follows. The terms containing R³_r and Ψ_r R²_r are more regular and give a smaller contribution in terms of powers of t − s.
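A mechanical check of the exponent counting used throughout this section, under the usual paraproduct rule that a product of distributions of regularities r₁, r₂ is well defined when r₁ + r₂ > 0, and with the regularities quoted above (Ψ ∈ C^{−γ−}, :Ψ²: ∈ C^{−2γ−}, R ∈ C^{(2−3γ)−}); a sketch, not part of the paper's argument:

```python
def product_defined(r1, r2):
    # Bony paraproduct / Young-type rule: the product makes sense
    # whenever the regularities sum to a strictly positive number
    return r1 + r2 > 0

def remainder_products_ok(gamma):
    wick2 = -2 * gamma       # regularity of the renormalized square :Psi^2:
    R = 2 - 3 * gamma        # expected regularity of the remainder
    # binding product in the equation for R: :Psi^2: R, i.e. 2 - 5*gamma > 0
    return product_defined(wick2, R)

assert remainder_products_ok(0.3)        # gamma < 2/5: product well defined
assert not remainder_products_ok(0.45)   # gamma > 2/5: product ill defined
```

The binding condition 2 − 5γ > 0 recovers exactly the threshold γ < 2/5 stated above; the other products (Ψ·R², R³) impose weaker constraints.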
We wish now to "extract" the component of our "noise" that is independent of the past (that is, before time s). To this end notice that, writing Ψ for the linearized solution, if s < r < t, then Ψ_r = e^{Δ(r−s)}Ψ_s + ∫_s^r e^{Δ(r−u)} dW_u =: Ψ_{s→r} + Ψ_{s,r}, with Ψ_{s→r} measurable with respect to the history up to time s (and smooth for r > s), and Ψ_{s,r} independent of the history up to time s. Expanding the third power, Ψ_r³ = Ψ_{s→r}³ + 3Ψ_{s→r}²Ψ_{s,r} + 3Ψ_{s→r}Ψ_{s,r}² + Ψ_{s,r}³.
Proof. Conditioning on F_s, the σ-field of events up to time s, the result follows if we can estimate the small-time asymptotics of the L¹ norm of the density (and of its derivatives) of S_t(x) given F_s.
To this end we recall that, for a real random variable X with density g, integration by parts on the Wiener space gives a bound of the form (7.12): ‖D^m g‖_{L^1} ≲ E[|X||H_{(1,1,...,1)}|], where the weight H_{(1,...,1)} is built from the Malliavin derivative DX and the inverse of the Malliavin matrix M_X of X, and is controlled by ‖M_X^{−1}‖ and ‖DX‖ in suitable norms of order r > p. Our task is then to understand how this expectation scales in powers of (t − s) when we take X = −∫_0^{t−s} e^{Δ(t−s−r)} (U_3(r) + 3U_2(r)Ψ_r + 3U_1(r):Ψ²:_r + :Ψ³:_r) dr, where the U_i are given elements, uniformly bounded in time with values in C^{−iγ−δ} for i = 1, 2, 3 and every δ > 0. Notice that the choice of X above corresponds, up to an additive constant, to S_t conditioned on F_s, with a translation in time of the stochastic diagrams (which does not change the law). By the chain rule for the Malliavin derivative, DX = −∫_0^{t−s} e^{Δ(t−s−r)} (3U_2(r)DΨ_r + 6U_1(r)Ψ_r DΨ_r + 3:Ψ²:_r DΨ_r) dr, and similarly for the second and third derivatives (the fourth derivative is zero, since the random variable X above is in the third Wiener chaos). In particular, the computation can be carried out in the Fourier basis (e_k)_{k∈Z²} of complex exponentials, and we see that, for every small δ > 0, the relevant norms scale in powers of t − s. Notice that, when we evaluate the external expectation in (7.11), we will have sup_{[0,t−s]} ‖U_1‖_{C^{2γ+δ}} ≈ (t − s)^{−γ/2} (since we will replace U_1(r) by Ψ_{s→r}). In conclusion, X scales as (t − s)^{1−3γ/2}, up to small corrections. With similar computations we see that DX also scales as (t − s)^{1−3γ/2} (due to the additional contribution of DΨ), as well as the second and third Malliavin derivatives. Therefore, using formula (7.12), the conclusion of the lemma follows.
We are ready to complete the proof of the main result of this section.
Remark 7.19. One can expect that Theorem 7.16 might also hold for (7.6) in dimension three. In that case the role of the "noise" should be played by the corresponding stochastic terms. Since the remainder has regularity C^{1−}, the approximation error should be of order Ae ≈ ε^{1/2−}, while the probabilistic estimate has a density with increments of order |h|/√ε. This prevents the application of Proposition 2.2 (in short, it would correspond to the case a_0 = 0). Otherwise one could improve the Ae with a smarter definition of the auxiliary process, as in Section 2.2. In principle one could try to use Hölder continuity in time of the stochastic diagrams. Unfortunately this would deteriorate the space regularity (see for instance [36, Theorem 1.1]), yielding a final approximation error again of order Ae ≈ ε^{1/2−} (or worse). We do not pursue this here.

Appendix A. Weaker versions of the smoothing lemma
In the first pages of Malliavin's seminal paper [32] on a probabilistic proof of the Hörmander theorem there is a classical smoothing lemma. This is the link between the existence of a density, probabilistic integration by parts, and the Malliavin calculus. The lemma says, roughly, that if for an R^d-valued random variable X one has |E[∂^α φ(X)]| ≲ ‖φ‖_{L^∞} for all multi-indices |α| ≤ n and all test functions φ, then X has a density, with respect to the Lebesgue measure, in C^{n−d−1}. In this section we give a generalization of this lemma in Besov spaces.
A.1. Besov and Triebel-Lizorkin spaces. Besov spaces, together with the Triebel-Lizorkin spaces, are a scale of function spaces introduced to capture the fine properties of regularity of functions, beyond on the one hand the Sobolev spaces, and on the other hand the spaces of continuous functions. Indeed, Besov spaces contain both. The main references we shall use on this subject are [47,48].
A general definition via the Littlewood–Paley decomposition is (briefly) as follows. Let (φ_n)_{n≥0} be a band-limited decomposition of the frequency space. For a distribution f, each f_n = F^{−1}(φ_n f̂) is a Schwartz function and f = Σ_n f_n. Then ‖f‖_{B^s_{p,q}} := ‖(2^{ns}‖f_n‖_{L^p})_{n≥0}‖_{ℓ^q} and ‖f‖_{F^s_{p,q}} := ‖‖(2^{ns}f_n)_{n≥0}‖_{ℓ^q}‖_{L^p} (with p < ∞ for the F^s_{p,q} norm). Notice that to define the B^s_{p,q} norm, the Littlewood–Paley decomposition is first averaged over position, and then over frequencies, while the opposite happens for the F^s_{p,q} norm. We define the spaces B^s_{p,q}(R^d) and F^s_{p,q}(R^d), with s ∈ R and 1 ≤ p, q ≤ ∞ (with the exception of p = ∞ for the F-space), as the closure of the Schwartz space with respect to the above norms.
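A rough numerical illustration of the B^s_{p,q} norm for periodic one-dimensional samples. We use sharp dyadic frequency cutoffs in place of the smooth band-limited functions φ_n, a crude but instructive simplification (so the result is only an approximation of the true norm):

```python
import numpy as np

def dyadic_blocks(f):
    """Sharp-cutoff Littlewood-Paley pieces f_n of a periodic 1-d sample."""
    N = len(f)
    fhat = np.fft.fft(f)
    k = np.abs(np.fft.fftfreq(N, d=1.0 / N))  # integer frequencies 0..N/2
    pieces, lo, n = [], 0.0, 0
    while lo <= k.max():
        hi = 2.0 ** n
        mask = (k >= lo) & (k < hi)            # dyadic annulus [2^{n-1}, 2^n)
        pieces.append(np.real(np.fft.ifft(np.where(mask, fhat, 0.0))))
        lo, n = hi, n + 1
    return pieces

def besov_norm(f, s, p, q):
    """B^s_{p,q}: L^p average of each block first, then weighted l^q."""
    lp = [np.mean(np.abs(fn) ** p) ** (1.0 / p) for fn in dyadic_blocks(f)]
    w = [2.0 ** (n * s) * v for n, v in enumerate(lp)]
    return max(w) if q == np.inf else sum(x ** q for x in w) ** (1.0 / q)

x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
val = besov_norm(np.sin(x), s=1.0, p=2.0, q=np.inf)
```

For sin, a single dyadic block carries the whole mass, which makes the position-then-frequency order of the averages easy to see; swapping the two averages would give the F^s_{p,q} analogue.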
In this way the spaces we obtain are separable, regardless of the indices (though this is an issue only if p = ∞ or q = ∞). Since ‖·‖_{B^s_{p,p}} = ‖·‖_{F^s_{p,p}} for all p, we define F^s_{∞,∞} := B^s_{∞,∞}. The spaces obtained do not depend on the band-limited decomposition (although the norms do), and different choices of the decomposition give rise to equivalent norms.
A.1.1. Definition via the difference operator. Given α > 0, we shall denote by C^α(R^d) the standard Hölder space, namely the space of functions with [α] derivatives whose derivatives of order [α] are Hölder continuous with exponent α − [α].
A special role in this paper is played by the Zygmund spaces C^α_b(R^d) = B^α_{∞,∞}(R^d) which, for non-integer values of α, coincide with the (separable version of the) Hölder spaces. With this in mind, we recall an alternative definition of Besov spaces that is better suited for our purposes (see [47, Theorem 2.5.12] or [48, Theorem 2.6.1] for further details). For s > 0, 1 ≤ p ≤ ∞, 1 ≤ q ≤ ∞, define the following norms in terms of differences,
where m is any integer such that s < m, and B_1(0) is the unit ball of R^d. A similar definition can be given for Triebel–Lizorkin spaces (see [47, Theorem 2.5.10]). Let 1 ≤ p < ∞, 1 ≤ q ≤ ∞, and s > d/(p ∧ q), and set the analogous quantity, where m is an integer with m > s. Then ‖f‖_{L^p} + [f]_{F^s_{p,q}} is an equivalent norm on F^s_{p,q}(R^d). Unfortunately, the representation in terms of differences only holds for s large.
We know by [47, Theorem 2.11.2] that the space F^0_{q,∞} is the dual of F^0_{p,1}; therefore we deduce from the inequality above that ‖Δ^m_h g‖_{F^0_{p,1}} ≤ K|h|^s. On the other hand, we know by [47, Proposition 2.5.7] that F^0_{p,1} ⊂ L^p, hence ‖Δ^m_h g‖_{L^p} ≲ K|h|^s. Finally, by [2], ‖g‖_{L^p} ≲ ‖f‖_{L^p}, and in conclusion g ∈ B^s_{p,∞} with ‖g‖_{B^s_{p,∞}} ≲ K + ‖f‖_{L^p}; then (A.6) follows, since ‖f‖_{B^{s−α}_{p,∞}} = ‖(I − Δ)^{α/2} g‖_{B^{s−α}_{p,∞}} = ‖g‖_{B^s_{p,∞}}. Remark A.4. Since F^α_{∞,∞} = B^α_{∞,∞} for all α ∈ R, the case p = ∞ in the lemma above is already covered by Lemma A.1.