Density formula and concentration inequalities with Malliavin calculus

We show how to use the Malliavin calculus to obtain a new exact formula for the density ρ of the law of any random variable Z which is measurable and differentiable with respect to a given isonormal Gaussian process. The main advantage of this formula is that it does not refer to the divergence operator δ (the dual of the Malliavin derivative D). The formula is based on an auxiliary random variable G := ⟨DZ, −DL⁻¹Z⟩_H, where L is the generator of the so-called Ornstein-Uhlenbeck semigroup. The use of G was first discovered by Nourdin and Peccati (Probab. Theory Relat. Fields 145, 2009) in the context of rates of convergence in law. Here, thanks to G, density lower bounds can be obtained in some instances. Among several examples, we provide an application to the (centered) maximum of a general Gaussian process. We also explain how to derive concentration inequalities for Z in our framework.


Introduction
Let X = {X(h) : h ∈ H} be a centered isonormal Gaussian process defined on a real separable Hilbert space H. This just means that X is a collection of centered and jointly Gaussian random variables indexed by the elements of H, defined on some probability space (Ω, ℱ, P), such that the covariance of X is given by the inner product of H: for every h, g ∈ H, E[X(h)X(g)] = ⟨h, g⟩_H.
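As a standard illustration (not specific to this paper), one may take H = L²([0,T], dt) and let W be a Brownian motion; then
X(h) = ∫_0^T h(t) dW_t,  h ∈ H,
defines an isonormal Gaussian process, since E[X(h)X(g)] = ∫_0^T h(t) g(t) dt = ⟨h, g⟩_H by the Wiener-Itô isometry.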
The process X has the interpretation of a Wiener (stochastic) integral. As usual in Malliavin calculus, we use the following notation (see Section 2 for precise definitions):
• L²(Ω, ℱ, P) is the space of square-integrable functionals of X; this means in particular that ℱ is the σ-field generated by X;
• 𝔻^{1,2} is the domain of the Malliavin derivative operator D with respect to X; this implies that the Malliavin derivative DZ of Z ∈ 𝔻^{1,2} is a random element with values in H, and that E‖DZ‖²_H < ∞;
• Dom δ is the domain of the divergence operator δ. This operator will only play a marginal role in our study; it is simply used in order to simplify some proof arguments, and for comparison purposes.
From now on, Z will always denote a random variable in 𝔻^{1,2} with zero mean.
The following result on the density of a random variable is a well-known fact of the Malliavin calculus: if DZ/‖DZ‖²_H belongs to Dom δ, then the law of Z has a continuous and bounded density ρ given, for all z ∈ ℝ, by
ρ(z) = E[ 1_{(Z>z)} δ( DZ / ‖DZ‖²_H ) ];   (1.1)
see [16]. In the first main part of our paper (Section 3), we prove a new general formula for ρ which does not refer to δ. For Z a mean-zero random variable in 𝔻^{1,2}, define the function g_Z : ℝ → ℝ almost everywhere by
g_Z(z) = E[ ⟨DZ, −DL⁻¹Z⟩_H | Z = z ].   (1.2)
The operator L appearing here is the so-called generator of the Ornstein-Uhlenbeck semigroup; it is defined, as well as its pseudo-inverse L⁻¹, in the next section. By [15, Proposition 3.9], we know that g_Z is non-negative on the support of the law of Z. Under some general conditions on Z (see Theorem 3.1 for a precise statement), the density ρ of the law of Z (provided it exists) is given by the following new formula, valid for almost all z in the support of ρ:
ρ(z) = ( E|Z| / (2 g_Z(z)) ) exp( − ∫_0^z x dx / g_Z(x) ).   (1.3)
We also show that one simple condition under which ρ exists and is strictly positive on all of ℝ is that g_Z(z) ≥ σ²_min hold almost everywhere for some constant σ²_min > 0 (see Corollary 3.3 for a precise statement). In this case, formula (1.3) immediately implies that
ρ(z) ≥ ( E|Z| / (2 g_Z(z)) ) exp( −z² / (2σ²_min) ),
so that if some a-priori upper bound is known on g_Z, then ρ is, up to a constant, bounded below by a Gaussian density.
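As a quick sanity check of (1.3) (an illustration, not part of the argument below), take Z = X(h) with ‖h‖_H = 1, so that Z ∼ 𝒩(0,1). Then DZ = h, L⁻¹Z = −Z, hence −DL⁻¹Z = h and g_Z ≡ ⟨h, h⟩_H = 1, and (1.3) gives
ρ(z) = ( E|Z| / 2 ) exp( −∫_0^z x dx ) = (1/2) √(2/π) e^{−z²/2} = (1/√(2π)) e^{−z²/2},
the standard Gaussian density, as expected.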
Another main point in our approach, also discussed in Section 3, is that it is often possible to express g_Z relatively explicitly, via the following formula (see Proposition 3.7):
g_Z(Z) = ∫_0^∞ e^{−u} E[ ⟨Φ_Z(X), Φ_Z( e^{−u} X + √(1 − e^{−2u}) X′ )⟩_H | Z ] du.   (1.4)
This is a consequence of the so-called Mehler formula of Malliavin calculus; here X′, which stands for an independent copy of X, is such that X and X′ are defined on the product probability space (Ω × Ω′, ℱ ⊗ ℱ′, P × P′); E denotes the mathematical expectation with respect to P × P′; and the mapping Φ_Z : ℝ^H → H is determined P ∘ X⁻¹-almost surely by the identity DZ = Φ_Z(X) (see Section 2). As an important motivational example of our density formula (1.3) combined with the explicit expression (1.4) for g_Z, let X = (X_t, t ∈ [0,T]) be a centered Gaussian process with continuous paths, and let Z = sup_{t∈[0,T]} X_t − E[ sup_{t∈[0,T]} X_t ]. Understanding the distribution of Z is a topic of great historical and current interest. For detailed accounts, one may consult the textbooks by Adler and co-authors from 1990, 2007, and in preparation: [1], [2], [3]. Expressing the density of Z, even implicitly, is a subject of study in its own right; in the case of differentiable random fields, geometric methods based on the so-called Rice-Kac formula have been used by Azaïs and Wschebor to express the density of Z in a way which allows them to derive sharp bounds on its tail: see [4] and references therein.
Herein we will apply our density formula to Z, resulting in an expression which is not restricted to differentiable fields, and is not related to the Azaïs-Wschebor formula. To achieve this, we will use specific facts about Z (see explanations and references in Section 3.2.4). It is known that Z ∈ 𝔻^{1,2}, that, almost surely, the supremum of X on [0,T] is attained at a single point I_0 ∈ [0,T], and that the law of Z has a density ρ. The underlying Wiener space can then be parametrized so that DZ = 1_{[0,I_0]}. We denote by R the covariance function of X, defined by R(s,t) = E(X_s X_t). Let X′ stand for an independent copy of X (as defined above) and, for u ≥ 0, let I_u be the point where e^{−u} X + √(1 − e^{−2u}) X′ attains its maximum on [0,T]. Using the Mehler-type formula (1.4), we show that, for almost all z in the support of ρ,
g_Z(z) = ∫_0^∞ e^{−u} E[ R(I_0, I_u) | Z = z ] du.
Therefore by (1.3), for almost all z in the support of ρ, we have
ρ(z) = ( E|Z| / (2 g_Z(z)) ) exp( − ∫_0^z x dx / g_Z(x) ),  with g_Z as above.   (1.5)
In particular, if R is bounded above and below by positive constants on [0,T]², we immediately get Gaussian lower and upper bounds for ρ on all of ℝ. Moreover, now that we have a formula for ρ, it is not difficult to derive a formula for the variance of Z. We get
Var( sup_{[0,T]} X ) = ∫_0^∞ e^{−u} E[ R(I_0, I_u) ] du.   (1.6)
Our general density formula (1.3) has found additional applications reported in other publications.
In the context of Brownian directed polymers in Gaussian and non-Gaussian environments, this paper's second author obtained fully diffusive fluctuations for some polymer partition functions in [20] using slightly different tools than we have here, but the techniques in this paper which lead to (1.5) would yield the results in [20, Section 5] as well. In a direct application of (1.3) in the same style as (1.5), Nualart and Quer-Sardanyons proved in the preprint [18] that the stochastic heat and wave equations have solutions whose densities are bounded above and below by Gaussian densities in many cases.
In the second main part of the paper (Section 4), we explain what can be done when one knows that g_Z is sub-affine. More precisely, if the law of Z has a density and if g_Z satisfies g_Z(Z) ≤ αZ + β P-a.s. for some α ≥ 0 and β > 0, we prove the following concentration inequalities (Theorem 4.1): for all z > 0,
P(Z ≥ z) ≤ exp( −z² / (2αz + 2β) )  and  P(Z ≤ −z) ≤ exp( −z² / (2β) ).   (1.7)
As an application of (1.7), we prove the following result. Let B = (B_t, t ∈ [0,1]) be a fractional Brownian motion with Hurst index H ∈ (0,1). Let Q : ℝ → ℝ be a C¹ function such that the Lebesgue measure of the set {u ∈ ℝ : Q′(u) = 0} is zero, and such that |Q′(u)| ≤ C|u| and Q(u) ≥ cu² for some positive constants c, C and all u ∈ ℝ. The square function satisfies this assumption, but we may also allow many perturbations of the square. Let
Z = ∫_0^1 Q(B_u) du − E[ ∫_0^1 Q(B_u) du ].
Then (1.7) holds for this Z, with explicit constants α ≥ 0 and β > 0 given in (1.8). The interest of this result lies in the fact that the exact distribution of ∫_0^1 Q(B_u) du is unknown; even when Q(x) = x², it is still an open problem for H ≠ 1/2. Note also that classical results by Borell [5] can only be applied when Q(x) = x² (because then Z is a second-chaos random variable) and would give a bound like A exp(−Cz). The behavior for large z is always of exponential type. The proof of (1.7) with α and β as above for this class of examples is given at the end of Section 4.1.
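For instance, with Q(x) = x² (the simplest admissible choice), the assumptions hold with C = 2 and c = 1, and, since E(B_u²) = u^{2H},
Z = ∫_0^1 B_u² du − ∫_0^1 u^{2H} du = ∫_0^1 B_u² du − 1/(2H+1),
so that (1.7) provides explicit, non-asymptotic tail bounds for this quadratic functional of fractional Brownian motion.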
A related application of relation (1.7) from our Theorem 4.1 is reported in [6], by this paper's first author, together with J.C. Breton and G. Peccati: they describe an application in statistics where they build exact confidence intervals for the Hurst parameter associated with a one-dimensional fractional Brownian motion.
Section 4 also contains a general lower bound result, Theorem 4.3, again based on the quantity ⟨DZ, −DL⁻¹Z⟩_H via the function g_Z defined in (1.2). This quantity was introduced recently in [15] for the purpose of using Stein's method in order to show that the standard deviation of ⟨DZ, −DL⁻¹Z⟩_H provides an error bound for the normal approximation of Z; see also Remark 3.2 below for a precise statement. Here, in Theorem 4.3 and in Theorem 4.1 as a special case (α = 0 therein), g_Z(Z) = E(⟨DZ, −DL⁻¹Z⟩_H | Z) can instead be assumed to be bounded either above or below almost surely by a constant; the role of this constant is to be a measure of the dispersion of Z, and more specifically to ensure that the tail of Z is bounded either above or below by a normal tail with that constant as its variance. Our Section 4 can thus be thought of as a way to extend the phenomena described in [15] to situations where comparison with the normal distribution can only be expected to go one way. Theorem 4.3 shows that we may have no control over how heavy the tail of Z may be (beyond the existence of a second moment), but that the condition g_Z(Z) ≥ σ² > 0 P-a.s. essentially guarantees that it is no less heavy than a Gaussian tail with variance σ².
The rest of the paper is organized as follows. In Section 2, we recall the notions of Malliavin calculus that we need in order to perform our proofs. In Section 3, we state and discuss our density estimates. Section 4 deals with concentration inequalities, i.e. tail estimates.

Some elements of Malliavin calculus
Details of the exposition in this section are in Nualart's book [16, Chapter 1]. As stated in the introduction, we let X be a centered isonormal Gaussian process over a real separable Hilbert space H. For any m ≥ 1, let H^{⊗m} be the mth tensor product of H and H^{⊙m} be the mth symmetric tensor product. Let ℱ be the σ-field generated by X. It is well known that any random variable Z belonging to L²(Ω, ℱ, P) admits the following chaos expansion:
Z = Σ_{m=0}^∞ I_m(f_m),   (2.9)
where I_0(f_0) = E(Z), the series converges in L²(Ω) and the kernels f_m ∈ H^{⊙m}, m ≥ 1, are uniquely determined by Z. In the particular case where H is equal to a separable space L²(A, 𝒜, µ), for (A, 𝒜) a measurable space and µ a σ-finite and non-atomic measure, one has that H^{⊙m} = L²_s(A^m, 𝒜^{⊗m}, µ^{⊗m}) is the space of symmetric and square-integrable functions on A^m and, for every f ∈ H^{⊙m}, I_m(f) coincides with the multiple Wiener-Itô integral of order m of f with respect to X. For every m ≥ 0, we write J_m to indicate the orthogonal projection operator onto the mth Wiener chaos associated with X. That is, if Z ∈ L²(Ω, ℱ, P) is as in (2.9), then J_m Z = I_m(f_m) for every m ≥ 0.
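For example, if Z = X(h)² − ‖h‖²_H for some h ∈ H, then Z = I_2(h ⊗ h): the chaos expansion of Z reduces to a single term in the second Wiener chaos, so J_2 Z = Z and J_m Z = 0 for every m ≠ 2.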
Let 𝒮 be the set of all smooth cylindrical random variables of the form
Z = g( X(φ_1), ..., X(φ_n) ),
where n ≥ 1, g : ℝⁿ → ℝ belongs to C_b^∞ (the set of bounded, infinitely differentiable functions with bounded partial derivatives), and φ_i ∈ H, i = 1, ..., n. The Malliavin derivative of Z with respect to X is the element of L²(Ω, H) defined as
DZ = Σ_{i=1}^n (∂g/∂x_i)( X(φ_1), ..., X(φ_n) ) φ_i.
In particular, DX(h) = h for every h ∈ H. By iteration, one can define the mth derivative D^m Z (which is an element of L²(Ω, H^{⊙m})) for every m ≥ 2. For m ≥ 1, 𝔻^{m,2} denotes the closure of 𝒮 with respect to the norm ‖·‖_{m,2}, defined by the relation
‖Z‖²_{m,2} = E[Z²] + Σ_{k=1}^m E[ ‖D^k Z‖²_{H^{⊗k}} ].
Note that a random variable Z as in (2.9) belongs to 𝔻^{1,2} if and only if Σ_{m≥1} m · m! · ‖f_m‖²_{H^{⊗m}} < ∞ and, in this case, E‖DZ‖²_H = Σ_{m≥1} m · m! · ‖f_m‖²_{H^{⊗m}}. When H = L²(A, 𝒜, µ) as above, the derivative of a random variable Z as in (2.9) can be identified with the element of L²(A × Ω) given by
D_a Z = Σ_{m=1}^∞ m I_{m−1}( f_m(·, a) ),   a ∈ A.
The Malliavin derivative D satisfies the following chain rule. If ϕ : ℝⁿ → ℝ is of class C¹ with bounded derivatives, and if {Z_i}_{i=1,...,n} is a vector of elements of 𝔻^{1,2}, then ϕ(Z_1, ..., Z_n) ∈ 𝔻^{1,2} and
D ϕ(Z_1, ..., Z_n) = Σ_{i=1}^n (∂ϕ/∂x_i)(Z_1, ..., Z_n) DZ_i.   (2.10)
Formula (2.10) still holds when ϕ is only Lipschitz, provided the law of (Z_1, ..., Z_n) has a density with respect to the Lebesgue measure on ℝⁿ (see e.g. Proposition 1.2.4 in [16]).
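For example, for f : ℝ → ℝ of class C¹ with a bounded derivative and h ∈ H, the chain rule gives D[f(X(h))] = f′(X(h)) h; likewise, a direct computation on the second chaos gives D[X(h)² − ‖h‖²_H] = 2 X(h) h.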
We denote by δ the adjoint of the operator D, also called the divergence operator. A random element u ∈ L²(Ω, H) belongs to the domain of δ, denoted by Dom δ, if and only if it satisfies
| E⟨DZ, u⟩_H | ≤ c_u ‖Z‖_{L²(Ω)}  for any Z ∈ 𝔻^{1,2},
where c_u is a constant depending only on u. If u ∈ Dom δ, then the random variable δ(u) is uniquely defined by the duality relationship
E[ Z δ(u) ] = E[ ⟨DZ, u⟩_H ],
which holds for every Z ∈ 𝔻^{1,2}. Notice that all chaos variables I_m(f), m ≥ 1, are centered and belong to 𝔻^{1,2}. The operator L is defined through the projection operators as L = Σ_{m=0}^∞ −m J_m, and is called the generator of the Ornstein-Uhlenbeck semigroup. It satisfies the following crucial property: a random variable Z is an element of Dom L (= 𝔻^{2,2}) if and only if Z ∈ Dom δD (i.e. Z ∈ 𝔻^{1,2} and DZ ∈ Dom δ), and in this case δDZ = −LZ. We also define the operator L⁻¹, which is the pseudo-inverse of L, as follows. For every Z ∈ L²(Ω, ℱ, P), we set L⁻¹Z = Σ_{m≥1} −(1/m) J_m(Z). Note that L⁻¹ is an operator with values in 𝔻^{2,2}, and that L L⁻¹ Z = Z − E(Z) for any Z ∈ L²(Ω, ℱ, P), so that L⁻¹ does act as L's inverse for centered random variables.
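For instance, if Z = I_m(f_m) lies in the mth Wiener chaos with m ≥ 1, then LZ = −mZ and L⁻¹Z = −(1/m) Z; in particular, for a first-chaos variable Z = X(h), one has −DL⁻¹Z = DZ = h, so that ⟨DZ, −DL⁻¹Z⟩_H = ‖h‖²_H = Var(Z).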

The family (T_u, u ≥ 0) of operators is defined as T_u = Σ_{m=0}^∞ e^{−mu} J_m, and is called the Ornstein-Uhlenbeck semigroup. Assume that the process X′, which stands for an independent copy of X, is such that X and X′ are defined on the product probability space (Ω × Ω′, ℱ ⊗ ℱ′, P × P′). Given a random variable Z ∈ 𝔻^{1,2}, we can write DZ = Φ_Z(X), where Φ_Z is a measurable mapping from ℝ^H to H, determined P ∘ X⁻¹-almost surely. Then, for any u ≥ 0, we have the so-called Mehler formula:
T_u(DZ) = E′[ Φ_Z( e^{−u} X + √(1 − e^{−2u}) X′ ) ],   (2.13)
where E′ denotes the mathematical expectation with respect to the probability P′.
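As an elementary check of (2.13), take Z = f(X(h)) with ‖h‖_H = 1 and f ∈ C¹ with a bounded derivative, so that DZ = Φ_Z(X) = f′(X(h)) h. Then
T_u(DZ) = E′[ f′( e^{−u} X(h) + √(1 − e^{−2u}) X′(h) ) ] h,
which is the familiar one-dimensional Mehler representation of the Ornstein-Uhlenbeck semigroup acting on f′.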

Formula for the density
As said in the introduction, we consider a random variable Z ∈ 𝔻^{1,2} with zero mean. Recall the function g_Z introduced in (1.2):
g_Z(z) = E[ ⟨DZ, −DL⁻¹Z⟩_H | Z = z ].
It is useful to keep in mind throughout this paper that, by [15, Proposition 3.9], g_Z(z) ≥ 0 for almost all z in the support of the law of Z.

General formulae
We begin with the following theorem.

Theorem 3.1. The law of Z has a density if and only if the random variable g_Z(Z) is strictly positive almost surely. In this case, the support of the density ρ is a closed interval of ℝ containing zero and, for almost all z in this support,
ρ(z) = ( E|Z| / (2 g_Z(z)) ) exp( − ∫_0^z x dx / g_Z(x) ).   (3.14)

Proof. Let us first prove a useful identity. For any f : ℝ → ℝ of class C¹ with a bounded derivative, we have, writing Z = L L⁻¹Z = −δ(D L⁻¹Z) and using the duality between δ and D together with the chain rule (2.10),
E[ Z f(Z) ] = E[ ⟨D f(Z), −D L⁻¹Z⟩_H ] = E[ f′(Z) ⟨DZ, −D L⁻¹Z⟩_H ] = E[ f′(Z) g_Z(Z) ].   (3.15)
Now, assume that the random variable g_Z(Z) is strictly positive almost surely. Combining (3.15) with an approximation argument (applied to the function f(z) = ∫_0^z 1_B(y) dy), we get, for any Borel set B ∈ ℬ(ℝ), that
E[ 1_B(Z) g_Z(Z) ] = E[ Z ∫_0^Z 1_B(y) dy ].
If B has zero Lebesgue measure, the right-hand side vanishes, since the inner integral is then identically zero. Consequently, since g_Z(Z) > 0 a.s. by assumption, we have P(Z ∈ B) = 0. Therefore, the Radon-Nikodym criterion implies that the law of Z has a density.
Conversely, assume that the law of Z has a density, say ρ. Let f : ℝ → ℝ be a continuous function with compact support, and let F denote any antiderivative of f. Note that F is necessarily bounded. Following Stein himself (see [19, Lemma 3, p. 61]), we can write:
E[ f(Z) g_Z(Z) ] = E[ Z F(Z) ] = ∫_ℝ z F(z) ρ(z) dz  =(*)  ∫_ℝ f(z) ( ∫_z^∞ y ρ(y) dy ) dz.
Equality (*) was obtained by integrating by parts, after observing that ∫_z^∞ y ρ(y) dy tends to 0 as z → +∞ and also as z → −∞ (this is because Z has mean zero). Therefore, since f is arbitrary, we have shown that, for almost all z in the support of ρ,
g_Z(z) ρ(z) = ∫_z^∞ y ρ(y) dy.   (3.17)
Write α = inf supp ρ and β = sup supp ρ (with possibly α = −∞ and/or β = +∞). Since Z has zero mean, note that α < 0 and β > 0 necessarily. For every z ∈ (α, β), define
ϕ(z) = ∫_z^∞ y ρ(y) dy.   (3.18)
The function ϕ is differentiable almost everywhere on (α, β), and its derivative is −zρ(z). In particular, since ϕ(α) = ϕ(β) = 0 and ϕ is strictly increasing before 0 and strictly decreasing afterwards, we have ϕ(z) > 0 for all z ∈ (α, β). Hence, (3.17) implies that g_Z(Z) is strictly positive almost surely.
Finally, let us prove (3.14). Let ϕ still be defined by (3.18). On the one hand, we have ϕ′(z) = −zρ(z) for almost all z ∈ supp ρ. On the other hand, by (3.17), we have, for almost all z ∈ supp ρ,
ϕ(z) = ρ(z) g_Z(z).   (3.19)
By putting these two facts together, we get the following ordinary differential equation satisfied by ϕ:
ϕ′(z) / ϕ(z) = −z / g_Z(z)  for almost all z ∈ supp ρ.
Integrating this relation over the interval [0, z] yields
log ϕ(z) − log ϕ(0) = − ∫_0^z x dx / g_Z(x).
Taking the exponential and using ϕ(0) = ∫_0^∞ y ρ(y) dy = E(Z⁺) = E|Z|/2 (because Z has zero mean), we obtain
ϕ(z) = ( E|Z| / 2 ) exp( − ∫_0^z x dx / g_Z(x) ).
Finally, the desired conclusion comes from (3.19).
Remark 3.2. The 'integration by parts formula' (3.15) was proved and used for the first time by Nourdin and Peccati in [15], in order to obtain error bounds in the normal approximation of Z.
Specifically, [15] shows, by combining Stein's method with (3.15), that, if Var(Z) > 0, then
d_{TV}(Z, N) ≤ 2 √( Var g_Z(Z) ) / Var(Z),   (3.20)
where N ∼ 𝒩(0, Var Z) and d_{TV} denotes the total variation distance. In reality, the inequality stated in [15] is with Var⟨DZ, −DL⁻¹Z⟩_H instead of Var g_Z(Z) on the right-hand side; but the same proof allows one to obtain this slight improvement, which was not stated or used in [15] because it did not improve the applications therein.
As a corollary of Theorem 3.1, we can state the following.

Corollary 3.3. Suppose that there exists a constant σ²_min > 0 such that g_Z(Z) ≥ σ²_min P-a.s. Then the law of Z has a density ρ whose support is ℝ, and formula (3.14) holds for almost all z ∈ ℝ.

Using Corollary 3.3, we can deduce the following interesting criterion for normality, which one will compare with (3.20).

Corollary 3.4. Assume that Z is not identically zero. Then Z is Gaussian if and only if
Var(g_Z(Z)) = 0.
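A short verification, using the results above: if Var(g_Z(Z)) = 0, then g_Z(Z) is a.s. equal to the constant σ² := E[g_Z(Z)] = E[Z²] (take f(z) = z in (3.15)), which is strictly positive since Z is not identically zero. Corollary 3.3 then applies and (3.14) gives ρ(z) = ( E|Z| / (2σ²) ) e^{−z²/(2σ²)} for almost all z ∈ ℝ; normalization forces E|Z| = σ√(2/π), so ρ is exactly the 𝒩(0, σ²) density. Conversely, if Z ∼ 𝒩(0, σ²), then (3.17) gives g_Z(z) ρ(z) = ∫_z^∞ y ρ(y) dy = σ² ρ(z), so that g_Z(Z) = σ² a.s. and Var(g_Z(Z)) = 0.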
When g_Z can be bounded above and away from zero, we get the following density estimates.

Corollary 3.5. Suppose that there exist constants σ²_min, σ²_max > 0 such that, P-almost surely, σ²_min ≤ g_Z(Z) ≤ σ²_max. Then the law of Z has a density ρ whose support is ℝ and which satisfies, for almost all z ∈ ℝ,
( E|Z| / (2σ²_max) ) exp( −z² / (2σ²_min) ) ≤ ρ(z) ≤ ( E|Z| / (2σ²_min) ) exp( −z² / (2σ²_max) ).   (3.27)

Proof: One only needs to apply Corollary 3.3 and to bound each occurrence of g_Z in formula (3.14) by σ²_min or σ²_max as appropriate.

Remark 3.6.
General lower bound results on densities are few and far between. The case of uniformly elliptic diffusions was treated in a series of papers by Kusuoka and Stroock: see [14]. This was generalized by Kohatsu-Higa [13] in Wiener space via the concept of uniformly elliptic random variables; these random variables proved to be well adapted to studying diffusion equations. E. Nualart [17] showed that fractional exponential moments of a divergence-integral quantity known to be useful for bounding densities from above (see formula (1.1) above) can also be used to derive a scale of exponential lower bounds on densities; the scale includes Gaussian lower bounds. However, in all these works, the applications are largely restricted to diffusions.

Computations and examples
We now show how to 'compute' g_Z(Z) = E(⟨DZ, −DL⁻¹Z⟩_H | Z) in practice. We then provide several examples using this computation.
Proposition 3.7. Write DZ = Φ_Z(X) as in Section 2. Then, P-almost surely,
g_Z(Z) = ∫_0^∞ e^{−u} E[ ⟨Φ_Z(X), Φ_Z( e^{−u} X + √(1 − e^{−2u}) X′ )⟩_H | Z ] du,
where X′ stands for an independent copy of X, and is such that X and X′ are defined on the product probability space (Ω × Ω′, ℱ ⊗ ℱ′, P × P′). Here E denotes the mathematical expectation with respect to P × P′, while E′ is the mathematical expectation with respect to P′.
Proof: We follow the arguments contained in Nourdin and Peccati [15, Remark 3.6]. Without loss of generality, we can assume that H is equal to L²(A, 𝒜, µ), where (A, 𝒜) is a measurable space and µ is a σ-finite measure without atoms. Let us consider the chaos expansion of Z, given by Z = Σ_{m=1}^∞ I_m(f_m). Then −L⁻¹Z = Σ_{m=1}^∞ (1/m) I_m(f_m), so that −D_a L⁻¹Z = Σ_{m=1}^∞ I_{m−1}(f_m(·, a)). On the other hand, we have D_a Z = Σ_{m=1}^∞ m I_{m−1}(f_m(·, a)). Thus, since ∫_0^∞ e^{−mu} du = 1/m,
−D_a L⁻¹Z = ∫_0^∞ e^{−u} ( Σ_{m=1}^∞ m e^{−(m−1)u} I_{m−1}(f_m(·, a)) ) du = ∫_0^∞ e^{−u} T_u(D_a Z) du.
Consequently,
⟨DZ, −DL⁻¹Z⟩_H = ∫_0^∞ e^{−u} ⟨DZ, T_u(DZ)⟩_H du.
By Mehler's formula (2.13), and since DZ = Φ_Z(X) by assumption, we deduce that
⟨DZ, −DL⁻¹Z⟩_H = ∫_0^∞ e^{−u} ⟨Φ_Z(X), E′[ Φ_Z( e^{−u} X + √(1 − e^{−2u}) X′ ) ]⟩_H du.
Taking the conditional expectation given Z concludes the proof.

Combining Theorem 3.1 with Proposition 3.7 immediately yields the following.

Corollary 3.8. Suppose that the law of Z has a density ρ. Then, for almost all z in the support of ρ, formula (3.14) holds with
g_Z(z) = ∫_0^∞ e^{−u} E[ ⟨Φ_Z(X), Φ_Z( e^{−u} X + √(1 − e^{−2u}) X′ )⟩_H | Z = z ] du.

Now, we give several examples of application of this corollary.

First example: monotone Gaussian functional, finite case.
Let N ∼ 𝒩_n(0, K) with K positive definite, and let f : ℝⁿ → ℝ be a C¹ function having bounded derivatives. Consider an isonormal Gaussian process X over the Euclidean space H = ℝⁿ, endowed with the inner product ⟨h_i, h_j⟩_H = E(N_i N_j) = K_{ij}; here, {h_i}_{1≤i≤n} stands for the canonical basis of H = ℝⁿ. Without loss of generality, we can identify N_i with X(h_i) for any i = 1, ..., n. Set Z = f(N) − E(f(N)). The chain rule (2.10) implies that Z ∈ 𝔻^{1,2} and that DZ = Σ_{i=1}^n (∂f/∂x_i)(N) h_i. In particular, Corollary 3.5 combined with Proposition 3.7 yields density estimates for Z in this setting; a sketch of the underlying computation follows.
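The computation is a direct substitution into the formula of Proposition 3.7 (we sketch it here for illustration): writing N′ = (X′(h_1), ..., X′(h_n)) for an independent copy of N,
g_Z(Z) = ∫_0^∞ e^{−u} E[ Σ_{i,j=1}^n K_{ij} (∂f/∂x_i)(N) (∂f/∂x_j)( e^{−u} N + √(1 − e^{−2u}) N′ ) | Z ] du,
the (conditional) expectation being taken with respect to P × P′. In particular, if all K_{ij} ≥ 0 and ε_i ≤ ∂f/∂x_i ≤ M_i for constants 0 ≤ ε_i ≤ M_i, then Σ_{i,j} K_{ij} ε_i ε_j ≤ g_Z(Z) ≤ Σ_{i,j} K_{ij} M_i M_j P-a.s., which is the two-sided control required by Corollary 3.5 (provided the lower sum is strictly positive).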

Second example: monotone Gaussian functional, continuous case.
Assume that X = (X_t, t ∈ [0,T]) is a centered Gaussian process with continuous paths, and that f : ℝ → ℝ is C¹ with a bounded derivative. The Gaussian space generated by X can be identified with an isonormal Gaussian process of the type X = {X(h) : h ∈ H}, where the real and separable Hilbert space H is defined as follows: (i) denote by ℰ the set of all ℝ-valued step functions on [0,T]; (ii) define H as the Hilbert space obtained by closing ℰ with respect to the scalar product
⟨1_{[0,s]}, 1_{[0,t]}⟩_H = E(X_s X_t).
In particular, with such a notation, we identify X_t with X(1_{[0,t]}).
Using Corollary 3.5 combined with Proposition 3.7, we get the following.

Proposition 3.10.
Assume that X = (X_t, t ∈ [0,T]) is a centered Gaussian process with continuous paths, and that f : ℝ → ℝ is of class C¹ with a bounded derivative.
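For the centered functional Z = ∫_0^T f(X_t) dt − E[ ∫_0^T f(X_t) dt ] (a natural choice in this setting, which we assume here for the purpose of illustration), the chain rule gives DZ = ∫_0^T f′(X_t) 1_{[0,t]} dt, and Proposition 3.7 yields
g_Z(Z) = ∫_0^∞ e^{−u} E[ ∫_0^T ∫_0^T R(s,t) f′(X_s) f′( e^{−u} X_t + √(1 − e^{−2u}) X′_t ) ds dt | Z ] du,
with R(s,t) = E(X_s X_t). If, for instance, 0 < ε ≤ f′ ≤ M and 0 < r_min ≤ R(s,t) ≤ r_max on [0,T]², then ε² r_min T² ≤ g_Z(Z) ≤ M² r_max T², and Corollary 3.5 applies.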

Third example: maximum of a Gaussian vector.
Let N ∼ 𝒩_n(0, K) with K positive definite. Once again, we assume that N can be written N_i = X(h_i), for X and h_i, i = 1, ..., n, defined as in Section 3.2.1. Since K is positive definite, note that the vectors h_1, ..., h_n are necessarily pairwise distinct. Let Z = max N − E(max N), and set, for u ≥ 0,
I_u := argmax_{1≤i≤n} ( e^{−u} N_i + √(1 − e^{−2u}) N′_i ),
where N′ is an independent copy of N (in the sense of Section 2).

Lemma 3.11. We have Z ∈ 𝔻^{1,2} and DZ = h_{I_0}.

Proof: Fix u ≥ 0. Since, for any i ≠ j, we have P( e^{−u} N_i + √(1 − e^{−2u}) N′_i = e^{−u} N_j + √(1 − e^{−2u}) N′_j ) = 0, the random variable I_u is a well-defined element of {1, ..., n}. Now, if ∆_i denotes the set {x ∈ ℝⁿ : x_j ≤ x_i for all j}, observe that (∂/∂x_i) max(x_1, ..., x_n) = 1_{∆_i}(x_1, ..., x_n) almost everywhere. The desired conclusion follows from the Lipschitz version of the chain rule (2.10), and the following Lipschitz property of the max function, which is easily proved by induction (on n ≥ 1):
| max(y_1, ..., y_n) − max(x_1, ..., x_n) | ≤ Σ_{i=1}^n | y_i − x_i |  for any x, y ∈ ℝⁿ.   (3.24)

In particular, we deduce from Lemma 3.11 and Proposition 3.7 that
g_Z(z) = ∫_0^∞ e^{−u} E[ K_{I_0 I_u} | Z = z ] du,   (3.25)
so that, by Corollary 3.8, the density ρ of the law of Z is given, for almost all z in supp ρ, by formula (3.14) with g_Z as in (3.25). As a by-product (see also Corollary 3.5), we get the density estimates in the next proposition, and a variance formula.

Proposition 3.12. Let N ∼ 𝒩_n(0, K) with K positive definite and Z = max N − E(max N).
• If σ²_min ≤ K_{ij} ≤ σ²_max for all i, j and some constants σ²_min, σ²_max > 0, then σ²_min ≤ g_Z(Z) ≤ σ²_max P-a.s., and the estimates (3.27) hold for the density of Z.
• With N′ an independent copy of N and I_u := argmax( e^{−u} N + √(1 − e^{−2u}) N′ ), we have
Var( max N ) = ∫_0^∞ e^{−u} E[ K_{I_0 I_u} ] du.

The variance formula above is a discrete analogue of formula (1.6): the reader can check that it is established identically to the proof of (1.6) found in the next section (Proposition 3.13), by using formula (3.25) instead of formula (3.26) therein. It appears that this discrete-case variance formula was established recently, using non-Malliavin-calculus tools, in the preprint [8, Lemma 3.1].
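As a numerical illustration of the variance formula (ours; the 3×3 covariance matrix below is an arbitrary choice, assumed positive definite), here is a Monte Carlo check in Python; the u-integral is truncated at a finite horizon, so the two printed values should agree only up to sampling and truncation error.

import numpy as np

# Monte Carlo check of Var(max N) = int_0^infty e^{-u} E[ K[I_0, I_u] ] du,
# where I_u = argmax( e^{-u} N + sqrt(1 - e^{-2u}) N' ) and N' is an
# independent copy of N.  The covariance matrix K is an assumed example.

rng = np.random.default_rng(0)
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 2.0, 0.3],
              [0.2, 0.3, 1.5]])
C = np.linalg.cholesky(K)

n_samples = 100_000
N  = rng.standard_normal((n_samples, 3)) @ C.T   # N  ~ N_3(0, K)
Np = rng.standard_normal((n_samples, 3)) @ C.T   # N' ~ N_3(0, K), independent

I0 = np.argmax(N, axis=1)

# integrand(u) = e^{-u} * E[ K[I_0, I_u] ], integrated by the trapezoidal rule
u_grid = np.linspace(0.0, 10.0, 201)             # truncate the integral at u = 10
vals = np.empty_like(u_grid)
for k, u in enumerate(u_grid):
    Iu = np.argmax(np.exp(-u) * N + np.sqrt(1.0 - np.exp(-2.0 * u)) * Np, axis=1)
    vals[k] = np.exp(-u) * K[I0, Iu].mean()

rhs = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(u_grid))
lhs = N.max(axis=1).var()                        # empirical Var(max N)

print("empirical Var(max N):", lhs)
print("Malliavin formula   :", rhs)

The truncation horizon u = 10 is harmless here because the integrand is bounded by e^{−u} max_{ij} K_{ij}.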

Fourth example: supremum of a Gaussian process.
Assume that X = (X_t, t ∈ [0,T]) is a centered Gaussian process with continuous paths. Fernique's theorem [12] implies that E( sup_{t∈[0,T]} X_t ) < ∞; in fact, sup_{[0,T]} X has moments of all orders. As in the section above, we can see X as an isonormal Gaussian process (over H). Set Z = sup_{t∈[0,T]} X_t − E( sup_{t∈[0,T]} X_t ).
Let X′ be an independent copy of X (as defined above), and let I_u be the (unique) random point where e^{−u} X + √(1 − e^{−2u}) X′ attains its maximum on [0,T]. Note that I_u is well-defined, see e.g. Lemma 2.6 in [11]. Moreover, we have that Z ∈ 𝔻^{1,2} and the law of Z has a density, see Proposition 2.1.11 in [16], and DZ = Φ_Z(X) = 1_{[0,I_0]}, see Lemma 3.1 in [9]. Therefore
g_Z(z) = ∫_0^∞ e^{−u} E[ ⟨1_{[0,I_0]}, 1_{[0,I_u]}⟩_H | Z = z ] du = ∫_0^∞ e^{−u} E[ R(I_0, I_u) | Z = z ] du,   (3.26)
where R(s,t) = E(X_s X_t) is the covariance function of X. Hence, (1.5) is a direct application of Corollary 3.8. The first statement in the next proposition now follows straight from Corollary 3.5. The proposition's second statement is the variance formula (1.6), and its proof is given below.

Proposition 3.13. Let X = (X_t, t ∈ [0,T]) be a centered Gaussian process with continuous paths, and set Z = sup_{[0,T]} X − E( sup_{[0,T]} X ).
• If there exist constants σ²_min, σ²_max > 0 such that σ²_min ≤ R(s,t) ≤ σ²_max for all s, t ∈ [0,T], then σ²_min ≤ g_Z(Z) ≤ σ²_max P-a.s., and the estimates (3.27) hold for the density of Z.
• Let R(s,t) = E(X_s X_t), let X′ be an independent copy of X, and let I_u be as above. Then
Var( sup_{[0,T]} X ) = ∫_0^∞ e^{−u} E[ R(I_0, I_u) ] du.

When applied to the case of fractional Brownian motion on an interval [a,b] with 0 < a < b, Proposition 3.13 yields two-sided Gaussian-type bounds for the density of the centered supremum. Proof: The desired conclusion is a direct application of Proposition 3.13 since, for all a ≤ s < t ≤ b,
(1/2) a^{2H} ≤ R(s,t) = (1/2)( s^{2H} + t^{2H} − (t − s)^{2H} ) ≤ b^{2H}.
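For completeness, here is a short derivation of the variance identity, using only ingredients already introduced: applying (3.15) with f(x) = x gives E[Z²] = E[g_Z(Z)], and taking the full expectation (over P × P′) in (3.26) removes the conditioning, so
Var( sup_{[0,T]} X ) = Var(Z) = E[ g_Z(Z) ] = ∫_0^∞ e^{−u} E[ R(I_0, I_u) ] du,
which is (1.6). The discrete-case formula of the previous example follows in exactly the same way from (3.25).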

Concentration inequalities
In this whole section, we continue to assume that Z ∈ 𝔻^{1,2} has zero mean, and to work with g_Z defined by (1.2). Now, we investigate what can be said when g_Z(Z) only admits a lower (resp. upper) bound. Results under such hypotheses are more difficult to obtain than in the previous section, since there we could use bounds on g_Z(Z) in both directions to good effect; this is apparent, for instance, in the appearance of both the lower and upper bounding values σ_min and σ_max in each of the two bounds in (3.27), or more generally in Corollary 3.5. However, given our previous work, tail bounds can be readily obtained: most of the analysis of the role of g_Z(Z) in tail estimates is already contained in the proof of Corollary 3.3.
Before stating our own results, let us cite a work which is closely related to ours, insofar as some of the preoccupations and techniques are similar. In [10], Houdré and Privault prove concentration inequalities for functionals of Wiener and Poisson spaces: they have discovered almost-sure conditions on expressions involving Malliavin derivatives which guarantee upper bounds on the tails of their functionals. This is similar to the upper bound portion of our work (Section 4.1), and closer yet to the first-chaos portion of the work in [21]; they do not, however, address lower bound issues.

Upper bounds
The next result allows comparisons both to Gaussian and exponential tails.

Theorem 4.1. Assume that the law of Z has a density and that, for some constants α ≥ 0 and β > 0, we have g_Z(Z) ≤ αZ + β P-a.s. Then, for all z > 0,
P(Z ≥ z) ≤ exp( −z² / (2αz + 2β) )  and  P(Z ≤ −z) ≤ exp( −z² / (2β) ).

Proof: Consider first the right-hand tail. For 0 ≤ θ < 1/α (with 1/α := +∞ if α = 0), let m(θ) := E[e^{θZ}]. Applying (3.15) with f(x) = e^{θx} (after a routine approximation argument) gives m′(θ) = E[Z e^{θZ}] = θ E[e^{θZ} g_Z(Z)], so that the assumption g_Z(Z) ≤ αZ + β yields m′(θ) ≤ αθ m′(θ) + βθ m(θ), that is,
m′(θ) ≤ ( βθ / (1 − αθ) ) m(θ),  0 ≤ θ < 1/α.   (4.28)
Integrating this differential inequality and applying the Chernoff bound P(Z ≥ z) ≤ e^{−θz} m(θ) with θ = z/(αz + β) gives the first estimate. For the left-hand tail, apply the same computation to Y := −Z, whose auxiliary function satisfies g_Y(Y) = g_Z(Z) ≤ −αY + β, and write now m(θ) := E[e^{θY}].
Here, instead of (4.28), we get similarly that m′(θ) ≤ βθ m(θ) for all θ ≥ 0. Therefore, we can use the same arguments as above in order to obtain, this time, firstly that E[e^{θY}] ≤ e^{βθ²/2} for all θ ≥ 0 and secondly that P(Y ≥ z) ≤ exp( −z²/(2β) ) (choosing θ = z/β), which is the desired bound for P(Z ≤ −z).

Remark 4.2.
In Theorem 4.1, when α > 0, the non-negativity of g_Z(Z) together with the assumption g_Z(Z) ≤ αZ + β automatically implies that Z is bounded below by the non-random constant −β/α, and therefore the left-hand tail P(Z ≤ −z) is zero for z ≥ β/α. Consequently, the upper bound on this tail in the above theorem is only of asymptotic interest in the "sub-Gaussian" case where α = 0.
We will now give an example of application of Theorem 4.1. Assume that B = (B_t, t ∈ [0,T]) is a fractional Brownian motion with Hurst index H ∈ (0,1). For any choice of the parameter H, as already mentioned in Section 3.2.2, the Gaussian space generated by B can be identified with an isonormal Gaussian process of the type X = {X(h) : h ∈ H}, where the real and separable Hilbert space H is defined as follows: (i) denote by ℰ the set of all ℝ-valued step functions on [0,T]; (ii) define H as the Hilbert space obtained by closing ℰ with respect to the scalar product ⟨1_{[0,s]}, 1_{[0,t]}⟩_H = E(B_s B_t). In particular, with such a notation one has that B_t = X(1_{[0,t]}). Let Q and Z be as in the introduction (see (1.8), with T = 1). Observe that Z ∈ 𝔻^{1,2}, with DZ = ∫_0^1 Q′(B_u) 1_{[0,u]} du. Plugging this into the formula of Proposition 3.7 gives an explicit expression for g_Z(Z). Now we wish to make Z appear inside the right-hand side of this expression. Note first that, thanks to the assumptions on Q,
∫_0^1 Q′(B_u)² du ≤ C² ∫_0^1 B_u² du ≤ (C²/c) ∫_0^1 Q(B_u) du = (C²/c) ( Z + E ∫_0^1 Q(B_u) du );
combining this with the Cauchy-Schwarz inequality in the expression for g_Z(Z), one obtains g_Z(Z) ≤ αZ + β, with α, β defined as in (1.8). Therefore, due to Theorem 4.1, the desired conclusion (1.7) is proved, once we show that the law of Z has a density.
For that purpose, recall the so-called Bouleau-Hirsch criterion from [16, Theorem 2.1.3]: if Z ∈ 𝔻^{1,2} is such that ‖DZ‖_H > 0 P-a.s., then the law of Z has a density. Here, we have DZ = ∫_0^1 Q′(B_u) 1_{[0,u]} du from the computations performed above. If ‖DZ‖_H vanished on an event of positive probability, then on this event we would have Q′(B_u) = 0 for almost all u ∈ [0,1]; on the other hand, since B_u has a density for every u > 0, E[ Leb{ u ∈ [0,1] : Q′(B_u) = 0 } ] = ∫_0^1 P( B_u ∈ {Q′ = 0} ) du = 0, which is a contradiction with the fact that the Lebesgue measure of the set {u ∈ ℝ : Q′(u) = 0} is zero. Therefore, ‖DZ‖_H > 0 P-a.s., and the law of Z has a density according to the Bouleau-Hirsch criterion. The proof of (1.7) is concluded.