Convergence in distribution norms in the CLT for non-identically distributed random variables

We study the convergence in distribution norms in the Central Limit Theorem for non-identically distributed random variables, that is $$ \varepsilon_{n}(f):={\mathbb{E}}\Big(f\Big(\frac 1{\sqrt n}\sum_{i=1}^{n}Z_{i}\Big)\Big)-{\mathbb{E}}\big(f(G)\big)\rightarrow 0 $$ where $Z_{i}$ are centred independent random variables and $G$ is a Gaussian random variable. We also consider local developments (Edgeworth expansions). This kind of result is well understood in the case of smooth test functions $f$. If one deals with measurable and bounded test functions (convergence in total variation distance), a well-known theorem due to Prohorov shows that some regularity condition on the laws of the random variables $Z_{i}$, $i\in {\mathbb{N}}$, is needed. Essentially, one needs the law of $Z_{i}$ to be locally lower bounded by the Lebesgue measure (Doeblin's condition). This topic is also widely discussed in the literature. Our main contribution is to discuss convergence in distribution norms, that is, to replace the test function $f$ by some derivative $\partial_{\alpha }f$ and to obtain upper bounds for $\varepsilon_{n}(\partial_{\alpha }f)$ in terms of the supremum norm of $f$. Some applications are also discussed: an invariance principle for the occupation time of random walks, small ball estimates and the expected number of roots of trigonometric polynomials with random coefficients.


Introduction
We consider a sequence of centred independent random variables $Z_k\in\mathbb{R}^d$, $k\in\mathbb{N}$, with covariance matrices $\sigma_k^{i,j}=\mathbb{E}(Z_k^iZ_k^j)$, and we look to $S_n(Z)=\sum_{k=1}^nZ_k$. Our aim is to obtain a Central Limit Theorem as well as Edgeworth developments in this framework. The basic hypotheses are the following. We assume the normalization condition $$\sum_{k=1}^n\sigma_k=I_d \qquad(1.2)$$ where $I_d\in\mathcal{M}_{d\times d}$ is the identity matrix. Moreover, we assume that for each $p\in\mathbb{N}$ there exists a constant $C_p\ge 1$ such that the moment bound (1.3) holds. Let $\|f\|_{k,\infty}$ denote the norm in $W^{k,\infty}$, that is, the uniform norm of $f$ and of all its derivatives of order less than or equal to $k$. First, we want to prove that $$\varepsilon_n(f):=\mathbb{E}\big(f(S_n(Z))\big)-\int_{\mathbb{R}^d}f(x)\gamma_d(x)\,dx\rightarrow 0$$ where $\gamma_d(x)=(2\pi)^{-d/2}\exp(-\frac12|x|^2)$ is the density of the standard normal law. This corresponds to the Central Limit Theorem (hereafter CLT). Moreover, we look for some functions (polynomials) $\psi_k:\mathbb{R}^d\rightarrow\mathbb{R}$ such that, for $N\in\mathbb{N}$ and for every sufficiently smooth $f$, the development (1.5) of order $N$ holds. This is the Edgeworth development of order $N$. In the case of smooth test functions $f$ (as is the case in (1.5)), this topic has been widely discussed and is well understood: such a development has been obtained by Sirazhdinov and Mamatov [21] in the case of identically distributed random variables, and then by Götze and Hipp [16] in the non-identically distributed case. A complete presentation of this topic may be found in the book of Bhattacharya and Rao [12]. It is worth mentioning that the classical approach used in the above papers is based on Fourier analysis. In particular, the coefficients $\psi_k$ in the above development are given as inverse Fourier transforms of suitable functions, so the expression of $\psi_k$ is not completely transparent and its explicit computation requires some effort.
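As a quick sanity check on the convergence just described, the following Monte Carlo sketch estimates $\varepsilon_n(f)$ for a smooth test function in dimension one. The particular non-identical choice of the $Z_i$ (alternating Rademacher and uniform laws, scaled so that the variances sum to one, matching (1.2)) is our own illustrative assumption, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_n(f, n, n_samples=200_000):
    """Monte Carlo estimate of eps_n(f) = E f(sum_i Z_i) - E f(G) in d = 1.

    The Z_i are centred but NOT identically distributed: we alternate
    Rademacher and uniform variables, each with variance 1/n, so that the
    variances sum to 1 (the normalization (1.2))."""
    s = np.zeros(n_samples)
    for i in range(n):
        if i % 2 == 0:
            z = rng.choice([-1.0, 1.0], size=n_samples)          # variance 1
        else:
            z = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n_samples)  # variance 1
        s += z / np.sqrt(n)
    g = rng.standard_normal(n_samples)
    return f(s).mean() - f(g).mean()

print(abs(eps_n(np.cos, 10)), abs(eps_n(np.cos, 100)))
```

The error visibly shrinks as $n$ grows, up to Monte Carlo noise of order $n_{\text{samples}}^{-1/2}$.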
In our paper we use a different approach based on the Lindeberg method for Markov semigroups (inspired by works concerning the parametrix method for Markov semigroups in [9]). This alternative approach is convenient for the proof of our main result concerning "distribution norms" (see below). But, even in the case of smooth test functions, it allows us to obtain slightly clearer and more precise results: we prove that the $\psi_k$ are linear combinations of Hermite polynomials of order less than or equal to $k$, whose coefficients are explicit and computed starting from the moments of $Z_i$ and $G_i$, where $G_i$ denotes a Gaussian random variable with the same covariance matrix as $Z_i$. So the computation of these coefficients is easier. Moreover, our estimates hold for each fixed $n$ (in contrast with the ones in the above papers, which are just asymptotic).
A second problem is to obtain the estimate (1.5) for test functions $f$ which are not regular, in particular to replace $\|f\|_{(N+1)(N+3),\infty}$ by $\|f\|_\infty$. This amounts to estimating the error in total variation distance.
In the case of identically distributed random variables, and for $N=0$ (so at the level of the standard CLT), this problem has been widely studied. First of all, one may prove the convergence in Kolmogorov distance, that is, for $f=1_D$ where $D$ is a rectangle. Many refinements of this type of result have been obtained by Bhattacharya and Rao and they are presented in [12]. But it turns out that one may not prove such a result for a general measurable set $D$ without assuming more regularity on the law of $Z_k$, $k\in\mathbb{N}$. Indeed, in his seminal paper [20] Prohorov proved that the convergence in total variation distance is equivalent to the fact that there exists $m$ such that the law of $Z_1+\cdots+Z_m$ has an absolutely continuous component. In [3] Bally and Caramellino obtained (1.5) in total variation distance, for identically distributed random variables, under the hypothesis that the law of $Z_k$ is locally lower bounded by the Lebesgue measure. We assume this type of hypothesis in this paper also. More precisely, we assume that there exist $r,\varepsilon>0$ and $z_k\in\mathbb{R}^d$ such that for every measurable set $A\subset B_r(z_k)$ $$\mathbb{P}(Z_k\in A)\ge\varepsilon\lambda(A) \qquad(1.6)$$ where $\lambda$ is the Lebesgue measure. This condition is known in the literature as Doeblin's condition. Under this hypothesis we are able to obtain (1.5) in total variation distance. It is clear that (1.6) is more restrictive than Prohorov's condition. However, we prove that in the framework of the CLT for identically distributed random variables, if we have Prohorov's condition then we may produce Doeblin's condition as well, just working with the packages $Y_k=\sum_{i=2km+1}^{2(k+1)m}Z_i$. This allows us to prove Corollary 3.12, which is a stronger version of Prohorov's theorem. Let us finally mention another line of research which has been strongly developed in recent years: estimating the convergence in the CLT in entropy distance. This starts with the papers of Barron [11] and Johnson and Barron [14].
In these papers the case of identically distributed random variables is considered, but recently, in [13], Bobkov, Chistyakov and Götze obtained the estimate in entropy distance in the case of random variables which are no longer identically distributed. We recall that convergence in entropy distance implies convergence in total variation distance, so such results are stronger. However, in order to work in entropy distance one has to assume that the law of $Z_k$ is absolutely continuous with respect to the Lebesgue measure with finite entropy, and this is more restrictive than (1.6). So the hypotheses and the results are slightly different. A third problem is to obtain the CLT and the Edgeworth development with the test function $f$ replaced by a derivative $\partial_\gamma f$. If the law of $S_n(Z)$ is absolutely continuous with respect to the Lebesgue measure, this means that we prove the convergence of the density and of its derivatives as well (which corresponds to the convergence in distribution norms). Unfortunately, we fail to obtain such a result in the general framework: this is natural, because we do not assume that the laws of $Z_k$, $k=1,...,n$, are absolutely continuous, so the law of $S_n(Z)$ may have atoms. However, we obtain a similar result, in which we have to keep a "small error". Let us give a precise statement of our result. For a function $f\in C^m_p(\mathbb{R}^d)$ ($m$ times differentiable with polynomial growth) we define $L_m(f)$ and $l_m(f)$ to be two constants such that (1.7) holds. Our main result is the following: for a fixed $m\in\mathbb{N}$, there exist some constants $C_N\ge 1\ge c_N>0$ (depending on $r,\varepsilon$ from (1.6) and on $C_p$ from (1.3)) such that (1.8) holds for every multi-index $\gamma$ with $|\gamma|=m$ and for every $f\in C^m_p(\mathbb{R}^d)$. If the random variables $Z_k$, $k\in\mathbb{N}$, are identically distributed, we succeed in obtaining exactly the same result under Prohorov's condition (see Corollary 3.12).
So this is a strictly stronger version of Prohorov's theorem (for $m=0$ we get the convergence in total variation). Moreover, such a result is used in [6] in order to give invariance principles concerning the variance of the number of zeros of trigonometric polynomials. However, we fail to get convergence in distribution norms because $L_m(f)e^{-c_Nn}$ appears in the upper bound of the error and $L_m(f)$ depends on the derivatives of $f$. But we are close to such a result: (1.9) Another way to eliminate $L_m(f)e^{-c_Nn}$ is to assume that the laws of $Z_i$, $i=1,...,m$, are absolutely continuous with the derivative of the density belonging to $L^1$. This is done in Proposition 4.2: we prove the corresponding estimate for every $k\in\mathbb{N}$ and every multi-index $\alpha$; so, under these stronger conditions, we succeed in obtaining convergence in distribution norms. But the most interesting consequence of our result is given in Theorem 4.1: there we give an invariance principle for the occupation time of a random walk. More precisely, we take $\varepsilon_n=n^{-\frac12(1-\rho)}$ with $\rho\in(0,1)$ and we prove the convergence for every $\rho'<\rho$, with $W_s$ a Brownian motion (so $\int_0^1\frac1{\varepsilon_n}1_{(-\varepsilon_n,\varepsilon_n)}(W_s)\,ds$ converges to the local time of $W$). Here the test function is $f_n=\frac1{\varepsilon_n}1_{(-\varepsilon_n,\varepsilon_n)}$, which converges to the Dirac function. This example shows that (1.8) is an appropriate estimate for dealing with some singular problems. The paper is organized as follows. In Section 2 we prove the result for smooth test functions (that is, (1.5)) and in Section 3 we treat the case of measurable test functions. To do this we use some integration by parts technology which has already been used in [3] and which is presented in Section 3.1. We mention that a similar approach has been used by Nourdin and Poly [18], by means of the $\Gamma$-calculus settled in [10]. The main result in Section 3 is Theorem 3.8. In Section 4 we treat the two applications mentioned above.
Finally, we leave for Appendix A the explicit computation of the coefficients $\psi_q$ from (1.5) for $q=1,2,3$, and in Appendix B we prove a technical result which is used in our development. Although many ideas in our paper come from previous works (mainly from Malliavin calculus), in the end we arrive at an approach which is fairly simple and elementary, so we try to give here a presentation which is essentially self-contained (even if some cumbersome and straightforward computations are just sketched).
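To illustrate Doeblin's condition (1.6) concretely: if the law of a random variable has a density $p$ with $p\ge\varepsilon$ on a ball $B_r(z_0)$, then it splits as a mixture of the uniform law on the ball and a residual law, which is the mechanism exploited in Section 3. The following sketch checks this numerically; the standard normal law, the bound `eps` and the rejection sampler are our own illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# If Z has density p with p >= eps on B_r(z0) (Doeblin's condition (1.6)),
# its law splits as a mixture: with probability eps*lambda(B_r) draw U
# uniformly on B_r(z0), otherwise draw V from the residual density
# (p - eps*1_{B_r}) / (1 - eps*lambda(B_r)).  We check this for p = N(0,1).
z0, r = 0.0, 0.5
eps = np.exp(-r**2 / 2) / np.sqrt(2 * np.pi)   # lower bound of p on B_r(0)
mass = eps * 2 * r                              # eps * lambda(B_r)

def p(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def sample_residual(n):
    # rejection sampling: propose from p, accept with prob 1 - eps*1_B / p
    out = np.empty(0)
    while out.size < n:
        cand = rng.standard_normal(2 * n)
        acc = rng.random(cand.size) < 1 - eps * (np.abs(cand - z0) <= r) / p(cand)
        out = np.concatenate([out, cand[acc]])
    return out[:n]

n = 100_000
chi = rng.random(n) < mass
z = np.where(chi, rng.uniform(z0 - r, z0 + r, n), sample_residual(n))
print(z.mean(), z.var())  # mean and variance of the reassembled mixture
```

The first two moments of the reassembled sample agree with those of $N(0,1)$, as the splitting predicts.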

Smooth test functions

Notation and main result
We fix $n\in\mathbb{N}$ and we consider $n$ centred and independent random variables $Z_k$, $1\le k\le n$. We denote by $\sigma_k$ the covariance matrix of $Z_k$, that is $\sigma_k^{i,j}=\mathbb{E}(Z_k^iZ_k^j)$, and we look to $S_n(Z)=\sum_{k=1}^nZ_k$ defined in (2.1). Our aim is to compare the law of $S_n(Z)$ with the law of $S_n(G)$, where $G=(G_k)_{1\le k\le n}$ denotes $n$ centred and independent Gaussian random variables with the same covariance matrices. This is a CLT result (but we stress that it is not asymptotic), and we will obtain an Edgeworth development as well. We assume that $Z_k$ has finite moments of any order; more precisely, we assume the moment bound (2.2). In particular, for $i=2$ the inequality (2.2) gives a bound on the covariances. Since the covariance matrix of $G_k$ is equal to that of $Z_k$, the inequality (2.2) holds for the $G_k$'s as well, so we can summarize by writing (2.3). Without loss of generality (by Hölder's inequality), we can assume that $1\le C_i(Z)\le C_{i+1}(Z)$ and, more generally, (2.4). Remark 2.1. Although it is not explicitly written, we are assuming that we fix $n$ and that the laws of $Z_k$ and $G_k$, as well as $\sigma_k$, all depend on $n$. In our applications, we take a sequence $Y=\{Y_k\}_k$ of i.i.d. centred r.v.'s taking values in $\mathbb{R}^m$ and we consider $Z_k=\frac1{\sqrt n}C_kY_k$, where $C_k$ denotes a $d\times m$ matrix; the constants are then expressed through $c_i(Y)$, a constant depending only on (the law of) the $Y_k$'s, so that (2.4) actually holds. We will specialize the results to this case. But in order to lighten the notation and the proofs, it is much more convenient to consider a general $Z_k$ instead of $\frac1{\sqrt n}C_kY_k$. In order to give the expression of the terms which appear in the Edgeworth development we need to introduce some notation. We say that $\alpha$ is a multi-index if $\alpha\in\{1,\dots,d\}^k$ for some $k\ge 1$, and we set $|\alpha|=k$, its length. We allow the case $k=0$, giving the void multi-index $\alpha=\emptyset$. Let $\alpha$ be a multi-index and set $k=|\alpha|$. For $x\in\mathbb{R}^d$ and $f:\mathbb{R}^d\rightarrow\mathbb{R}$ we set $x^\alpha=\prod_{i=1}^kx^{\alpha_i}$ and $\partial_\alpha f=\partial_{x^{\alpha_1}}\cdots\partial_{x^{\alpha_k}}f$, the case $k=0$ giving $x^\emptyset=1$ and $\partial_\emptyset f=f$. In the following, we denote by $C^k(\mathbb{R}^d)$ the set of the functions $f$ such that $\partial_\alpha f$ exists and is continuous for every $\alpha$ with $|\alpha|\le k$.
For $f\in C^k_p(\mathbb{R}^d)$ we define $L_k(f)$ and $l_k(f)$ to be constants such that (2.5) holds. Moreover, for a nonnegative definite matrix $\sigma\in\mathcal{M}_{d\times d}$ we denote by $L_\sigma$ the Laplace operator associated to $\sigma$, given in (2.6).
For $r\ge 1$ and $l\ge 0$ we set $D^{(l)}_r$ and $\Delta_\alpha(r)$ as in (2.7). Notice that $D^{(l)}_r\equiv 0$ for $l=0,1,2$ and, by (2.4), for $l\ge 3$ and $|\alpha|=l$, $$|\Delta_\alpha(r)|\le\frac{2C_l(Z)}{n^{l/2}},\qquad r=1,\dots,n.\qquad(2.8)$$ We now construct the coefficients of our development. Let $N$ be fixed: this is the order of the development that we will obtain. Given $1\le m\le k\le N$ we define the index sets in (2.9). Then, for $1\le k\le N$, we define the differential operator $\Gamma_k$ in (2.10). By using (2.2) and (2.8), one easily gets the corresponding estimates, where $L_{3k}(f)$ and $l_{3k}(f)$ are given in (2.5) and $C$, $C_{3k}$ are positive constants.
We now introduce the Hermite polynomials. We refer to Nualart [19] for definitions and properties; here we just give the shortest way to introduce them, by means of the integration by parts formula. Given a multi-index $\alpha$, the Hermite polynomial $H_\alpha$ on $\mathbb{R}^d$ is defined by the duality relation $\mathbb{E}(\partial_\alpha f(W))=\mathbb{E}(f(W)H_\alpha(W))$, where $W$ is a standard normal random variable in $\mathbb{R}^d$. Moreover, for a differential operator $\Gamma=\sum_{|\alpha|\le k}a(\alpha)\partial_\alpha$, with $a(\alpha)\in\mathbb{R}$, we denote $H_\Gamma=\sum_{|\alpha|\le k}a(\alpha)H_\alpha$, so that (2.14) holds. Finally, we define the development in (2.15). The main result in this section is the following (recall the constants $L_k(f)$ and $l_k(f)$, $f\in C^k_p(\mathbb{R}^d)$, defined in (2.5)): the estimate (2.16) holds, in which $\widehat{N}=[N/2]$, $\overline{N}=N(2N+\widehat{N}+5)$, $H_N$ is a positive constant depending on $N$ and $W$ denotes a standard normal random variable in $\mathbb{R}^d$. As a consequence, taking $f(x)=x^\beta$ with $|\beta|=k$, one gets (2.17).
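The duality defining the Hermite polynomials can be verified numerically. The sketch below checks $\mathbb{E}(f^{(k)}(W))=\mathbb{E}(f(W)H_k(W))$ in dimension one for $f=\cos$ and $k=1,2$, using NumPy's probabilists' Hermite polynomials ($He_1(x)=x$, $He_2(x)=x^2-1$); the test function is our own choice.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

rng = np.random.default_rng(2)
w = rng.standard_normal(1_000_000)

# Duality E[f^{(k)}(W)] = E[f(W) H_k(W)] in d = 1, where H_k is the
# probabilists' Hermite polynomial He_k; checked for f = cos.
for k, dkf in [(1, lambda x: -np.sin(x)), (2, lambda x: -np.cos(x))]:
    lhs = dkf(w).mean()
    rhs = (np.cos(w) * hermeval(w, [0.0] * k + [1.0])).mean()
    print(k, lhs, rhs)
```

Both sides agree up to Monte Carlo noise; for $k=2$ they are close to $-e^{-1/2}$.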

Basic decomposition and proof of the main result
Let $N\in\{0,1,\dots\}$. We define the operators $T^0_{N,r}$ and $T^1_{N,r}$ as in (2.18); since $D^{(l)}_r\equiv 0$ for $l=0,1,2$, the sum there actually begins with $l=3$, and of course this is the basic fact. Then, with the convention $\sum_{l=3}^2=0$, we have (2.19). We also define (2.20). For a matrix $\sigma\in\mathcal{M}_{d\times d}$ we recall the Laplace operator $L_\sigma$ associated to $\sigma$ (see (2.6)) and we define (2.21), where $W$ stands for a standard Gaussian random variable. Then we define (2.22). We now put our problem in a semigroup framework. For a sequence $X_k$, $k\ge 1$, of independent r.v.'s and for $1\le k\le p$ we define the operators $P^X_{k,p}$, and we use $P^Z_{k,p}$ and $P^G_{k,p}$. By using independence, we have the semigroup and commutation properties: $$P^X_{k,p}=P^X_{r,p}P^X_{k,r}=P^X_{k,r}P^X_{r,p},\qquad k\le r\le p.$$
(2.24) Moreover, for $m=1,\dots,N$ we denote by $R^{(m)}_{N,k,n}$ the remainder defined in (2.25), built from sums of the form $\sum_{k\le r_1<\cdots<r_m\le n}P^G_{r_m+1,n}P^G_{r_{m-1}+1,r_m}\cdots P^G_{r_1+1,r_2}P^G_{k,r_1}Q$ over the operators introduced above. Notice that in the first sum above the conditions $q_i,q_i'\in\{0,1\}$ and $q_1+\cdots+q_m+q_1'+\cdots+q_m'>0$ say that at least one of the $q_i,q_i'$, $i=1,...,m$, is equal to one. We notice that the operators $T^1_{N,r_i}$ and $U^1_{N,\sigma_{r_i}}$ represent "remainders" and are expected to give small quantities of order $n^{-\frac12(N+1)}$. So the fact that at least one $q_i$ or $q_i'$ is non null means that the product has at least one factor which is a remainder (hence small), and consequently $R^{(m)}_{N,k,n}$ is a remainder as well. Finally, we define the quantity in (2.26). We are now able to give our first result: let $R^{(m)}_{N,k,n}$, $m=1,\dots,N+1$, be given through (2.18), (2.20), (2.25), (2.26); then the stated decomposition holds for every $1\le k\le n+1$ and every sufficiently smooth $f$. Proof.
Step 1 (Lindeberg method). We use the Lindeberg method in terms of semigroups: for $1\le k\le n+1$ we write the telescoping decomposition. Then we define the operator in (2.28), and the above relation reads (2.29). We will write (2.29) as a discrete time Volterra type equation (this is inspired by the approach to the parametrix method given in [9]: see equation (3.1) there). For a family of operators $F_{k,p}$, $k\le p$, we define $AF$ by (2.30) and we write (2.29) in functional form (2.31). By iteration, and by the commutation property in (2.24), straightforward computations give (2.32). Step 2 (Taylor formula). The drawback of (2.31) is that $A$ depends on $P^Z$ also, see (2.28). So we now use Taylor's formula in order to eliminate this dependence. We use (2.4) and we consider a Taylor approximation at the level of an error of order $n^{-\frac{N+2}2}$. Then we have, with $D^{(l)}_r$ defined in (2.7), the corresponding expansion. By using the independence property, one can apply commutativity, and by using (2.32) we obtain (2.33). Notice that the operator in (2.33) acts on $f\in C^{m(N+3)}$. In particular, the chain $P^G_{r_m+1,n}\cdots P^G_{r_1+1,r_2}P^G_{k,r_1}$ contains all the steps, except for the steps corresponding to the $r_i$, $i=1,...,m$ (remark that for each $i$, $P^G_{r_i,r_i+1}$ is replaced with $T^0_{N,r_i}+T^1_{N,r_i}$). In order to "insert" such steps we use the backward Taylor formula (B.3) up to order $\widehat{N}=[N/2]$ (see Appendix B). So, we take $h^0_{N,\sigma_r}$ and $h^1_{N,\sigma_r}$ as in (2.20) and (2.21) respectively, and we get the representation with $U^0_{N,r_1}$ and $U^1_{N,r_1}$ given in (2.22). We use this formula in (2.34) for every $i=1,2,...,m$ and we get the resulting expansion. Our aim now is to isolate the principal term, that is, the sum of the terms where only $U^0_{N,r_i}$ and $T^0_{N,r_i}$ appear. So we write the decomposition involving $R^{(m)}_{N,k,n}$ in (2.25). In order to compute the first term we notice that for every $r'<r<r''$ a commutation identity holds. Then, for $m=1,...,N$, we obtain the principal term. We treat now $A^{N+1}P^Z$; using (2.33) we conclude. We now give some useful representations of the remainders.
where $a_n(\alpha)\in\mathbb{R}$ are suitable coefficients with the property (2.37).
Proof. In a first step we construct the measures $\mu^\alpha_{r_1,\dots,r_m}$ and the operators $\theta^\alpha_{r_1,\dots,r_m}$, and in a second step we prove that the corresponding coefficients $a^{r_1,\dots,r_m}_n(\alpha)$ verify (2.37). We start by representing $T^0_{N,r}$ defined in (2.18). Set the measures in (2.40); hereafter $\gamma$ denotes a nonnegative power. Defining $\nu^{q,\beta}_{r}(A)$ for every Borel set $A$, we obtain the representation (2.41). We represent now the operator $U^0_{N,r}$: denoting by $\rho^0_{\sigma_r}$ the law of $G_r$, we have (2.42). We now obtain a similar representation for $h^1_{N,\sigma}f(x)$ defined in (2.21). Set the measure in which $\phi_{\sigma^{1/2}\sqrt sW}$ denotes the density of a centred Gaussian r.v. with covariance matrix $s\sigma$; then we write (2.43). Using (2.40), (2.41), (2.42) and (2.43) we obtain (2.36), with the measure $\mu^\alpha_{r_1,\dots,r_m}$ from (2.39) constructed as a product in which $\eta_i$ is one of the measures $\nu^{q,\beta}_{r_i}$, $q=0,1$, and $\bar\eta_i$ is one of the measures $\rho^q_{\sigma_{r_i}}$, $q=0,1$. Let us check that the coefficients $a^{r_1,\dots,r_m}_n(\alpha)$ which appear in (2.36) verify the bounds in (2.37).
Indeed, $q_i,q_i'\in\{0,1\}$ and at least one of them is equal to one, and $a^{r_1,\dots,r_m}_n(\alpha)$ is the product of the coefficients which appear in the corresponding representations. We finally prove (2.38); the argument for the terms indexed by $r_1,\dots,r_{N+1}$ is clearly the same. We now give the representation of the "principal term", with $\Gamma_k$ defined in (2.10). Proof. Let $\Lambda_m$ and $\Lambda_{m,k}$ be the sets in (2.9). Notice that, for fixed $m$, the $\Lambda_{m,k}$'s are disjoint, and the resulting operator is a differential operator of the form (2.45). Moreover, the coefficients $c_n(\alpha)$ can be bounded so that the estimate in (2.45) holds as well.
We are now ready for the proof of Theorem 2.2. We denote $P^X_n=P^X_{1,n+1}$, with $X=Z$ or $X=G$. By what we have proved, it is sufficient to study the remaining terms $I_1$, $I_2$ and $I_3$ above. Consider first $m\in\{1,...,N\}$. We use Lemma 2.4 (recall $N_m$ given therein) and in particular (2.36). Since the $G_k\mathbf{1}_{k\notin\{r_1,\dots,r_m\}}$'s are centred and independent, we can use the Burkholder inequality (see (3.26) below), and by inserting the resulting bound we obtain the required estimate. We use this inequality with $g=\theta^\alpha_{r_1,\dots,r_m}\partial_\alpha f$: by applying (2.38) and then (2.37), with $H_N$ denoting a constant depending on $N$ only, and since the set $\{1\le r_1<\dots<r_m\le n\}$ has fewer than $n^m$ elements, we get the bound for $I_3(f)$. The estimate for $I_2(f)$ is analogous. Concerning $I_1(f)$, we use (2.45), again together with the Burkholder inequality (3.26). By using the moment bounds we obtain the estimate with $\overline{N}=N(2N+\widehat{N}+5)$, and statement (2.16) follows. Concerning (2.17), it suffices to notice that for $f(x)=x^\beta$ with $|\beta|=k$ we have $L_N(f)=1$ and $l_N(f)=k$.
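The Lindeberg exchange mechanism behind Step 1 can be checked in its most elementary form: replacing the $Z_k$ by Gaussians one at a time yields an exact telescoping identity, which holds sample by sample when the same draws are reused. The Monte Carlo sketch below verifies it for a toy choice of ours (uniform $Z_k$ with unit variance, $f=\cos$, $d=1$).

```python
import numpy as np

rng = np.random.default_rng(3)

# Lindeberg exchange identity: with hybrid sums
#   T_k = (G_1 + .. + G_{k-1} + Z_{k+1} + .. + Z_n)/sqrt(n),
# one has, exactly,
#   E f(S_n(Z)) - E f(S_n(G))
#     = sum_k E[ f(T_k + Z_k/sqrt(n)) - f(T_k + G_k/sqrt(n)) ].
n, m = 8, 100_000
f = np.cos
z = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), (m, n))   # centred, variance 1
g = rng.standard_normal((m, n))

lhs = f(z.sum(axis=1) / np.sqrt(n)).mean() - f(g.sum(axis=1) / np.sqrt(n)).mean()
rhs = 0.0
for k in range(n):
    t = (g[:, :k].sum(axis=1) + z[:, k + 1:].sum(axis=1)) / np.sqrt(n)
    rhs += (f(t + z[:, k] / np.sqrt(n)) - f(t + g[:, k] / np.sqrt(n))).mean()
print(lhs, rhs)
```

Since the sum telescopes path by path, the two numbers agree to floating-point precision, not just up to Monte Carlo noise.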

Differential calculus based on a splitting method
In this section we use the variational calculus developed in [2,1,7,8] in order to treat general test functions. Let us give the definitions and the notation. We say that the law of a random variable $Y\in\mathbb{R}^d$ is locally lower bounded by the Lebesgue measure if there exist $y_Y\in\mathbb{R}^d$ and $\varepsilon,r>0$ such that (3.1) holds for every nonnegative and measurable function $f$. We denote by $\mathcal{L}(r,\varepsilon)$ the class of the random variables which verify (3.1). Given $r>0$ we consider the functions $a_r,\psi_r:\mathbb{R}\rightarrow\mathbb{R}_+$ defined in (3.2). The advantage of $\psi_r(|y-y_Y|^2)$ is that it is a smooth function (which replaces the indicator function of the ball) and (it is easy to check) that for each $l\in\mathbb{N}$, $p\ge 1$ there exists a universal constant $C_{l,p}\ge 1$ such that (3.3) holds, where $a^{(l)}_r$ denotes the derivative of order $l$ of $a_r$. Moreover, one can check (see [3]) that if $Y\in\mathcal{L}(2r,\varepsilon)$ then it admits the decomposition (3.4) (the equality being understood as identity of laws), where $\chi,U,V$ are independent random variables with the laws given in (3.5): $\mathbb{P}(\chi=1)=\varepsilon m(r)$ and $\mathbb{P}(\chi=0)=1-\varepsilon m(r)$. We are now able to present our calculus. We fix $r,\varepsilon>0$ and we consider a sequence of independent random variables $Y_k\in\mathcal{L}(2r,\varepsilon)$, $k\in\mathbb{N}$. Then, using the procedure described above, we write the law of $Y_k$ by means of $\chi_k$, $U_k$ and $V_k$, their laws being given in (3.5). We assume that $\chi_k,U_k,V_k$, $k\in\mathbb{N}$, are independent. We define $\mathcal{G}=\sigma(\chi_k,V_k,k\in\mathbb{N})$. A random variable $F=f(\omega,U_1,...,U_n)$ is called a simple functional if $f$ is $\mathcal{G}\times\mathcal{B}(\mathbb{R}^{d\times n})$ measurable and, for each $\omega$, $f(\omega,\cdot)\in C^\infty_b(\mathbb{R}^{d\times n})$. We denote by $\mathcal{S}$ the space of the simple functionals. Moreover, we define the differential operator $D:\mathcal{S}\rightarrow l^2:=l^2(\mathbb{R}^d)$ by $D_{(k,i)}F=\chi_k\partial_{u^i_k}f(\omega,U_1,...,U_n)$. Then the Malliavin covariance matrix of $F=(F_1,...,F_m)\in\mathcal{S}^m$ is defined as $\sigma_F^{i,j}=\langle DF^i,DF^j\rangle_{l^2}$. We now introduce the Ornstein-Uhlenbeck operator $L$.
We denote by $p_{U_k}$ the density of $U_k$, and we define $L$ accordingly. Using elementary integration by parts on $\mathbb{R}^d$ one easily proves the corresponding duality formula. Finally, for $q\ge 2$, we define $$|F|_{q,p}=\|F\|_{q,p}+\|LF\|_{q-2,p}.\qquad(3.12)$$ We recall now the basic computational rules and the integration by parts formulae. For $\phi\in C^1(\mathbb{R}^d)$ and $F=(F_1,...,F_d)\in\mathcal{S}^d$ we have the chain rule (3.13), and for $F,G\in\mathcal{S}$ $$L(FG)=FLG+GLF-2\langle DF,DG\rangle.\qquad(3.14)$$ The formula (3.13) is just the chain rule of standard differential calculus, and (3.14) is obtained using duality. Let $H\in\mathcal{S}$. We use the duality relation to obtain (3.15) and (3.16). We now give the integration by parts formula (this is a localized version of the standard integration by parts formula from Malliavin calculus).
Proof. We give here only a sketch of the proof; a detailed one can be found e.g. in [4] and [7]. Using the chain rule, $D\phi(F)=\nabla\phi(F)DF$. It follows that, on the set $\{\det\sigma_F>0\}$, we have $\nabla\phi(F)=\gamma_F\langle D\phi(F),DF\rangle_{l^2}$. Then, by using (3.15), we get the claim, and (3.15)-(3.16) hold. By iteration one obtains the higher order integration by parts formulae.
We now give useful estimates for the weights which appear in (3.17). Lemma 3.2. Let $m,q\in\mathbb{N}$, $F\in\mathcal{S}^d$ and $G\in\mathcal{S}$. There exists a constant $C\ge 1$ (depending on $d,m,q$ only) such that for every multi-index $\alpha$ with $|\alpha|=q$ one has the corresponding bound; in particular we have (3.19). Proof. A rather long but straightforward computation (see [7] or [4] Theorem 3.4; more precise details are given in [5]) gives the first estimate. Notice, moreover, that on the set where $\Psi_\eta(\det\sigma_F)\neq 0$ we have $\det\sigma_F\ge\eta/2$. Taking now $m=0$ and using the Schwarz inequality, we obtain (3.19).
We now go on and give the regularization lemma. We recall that a super kernel $\phi:\mathbb{R}^d\rightarrow\mathbb{R}$ is a function which belongs to the Schwartz space $\mathcal{S}$ (infinitely differentiable functions which decrease faster than any polynomial at infinity), with $\int\phi(x)\,dx=1$, and such that for every multi-indices $\alpha$ and $\beta$ one has $$\int y^\alpha\phi(y)\,dy=0,\quad|\alpha|\ge 1,\qquad(3.20)$$ $$\int|y|^m|\partial_\beta\phi(y)|\,dy<\infty.\qquad(3.21)$$ As usual, for $|\alpha|=m$, $y^\alpha=\prod_{i=1}^my^{\alpha_i}$. Since super kernels play a crucial role in our approach, we give here the construction of such an object (we follow [17], Section 3, Remark 1). We do it in dimension $d=1$ and then we take tensor products. So, if $d=1$ we take $\psi\in\mathcal{S}$ which is symmetric and equal to one in a neighborhood of zero and we define $\phi=\mathcal{F}^{-1}\psi$, the inverse Fourier transform of $\psi$. Since $\mathcal{F}^{-1}$ sends $\mathcal{S}$ into $\mathcal{S}$, the property (3.21) is verified. And we also have $0=\psi^{(m)}(0)=i^{-m}\int x^m\phi(x)\,dx$, so (3.20) holds as well. We finally normalize in order to obtain $\int\phi=1$. We fix a super kernel $\phi$. For $\delta\in(0,1)$ and for a function $f$ we define $f_\delta=f*\phi_\delta$ with $\phi_\delta(x)=\delta^{-d}\phi(x/\delta)$, the symbol $*$ denoting convolution. For $f\in C^k_p(\mathbb{R}^d)$, we recall the constants $L_k(f)$ and $l_k(f)$ in (2.5). Lemma 3.3. Let $F\in\mathcal{S}^d$ and $q,m\in\mathbb{N}$. There exists a constant $C\ge 1$, depending on $d,m$ and $q$ only, such that for every $f\in C^{q+m}_p(\mathbb{R}^d)$, every multi-index $\gamma$ with $|\gamma|=m$ and every $\eta,\delta>0$, the bound (3.22) holds. As a consequence, we have

(3.24)
Proof A. Using a Taylor expansion of order $q$ and (3.20), we obtain $\int I(x,y)\phi_\delta(x-y)\,dy=0$, and by a change of variable we get the claim.

Using the integration by parts formula (3.17) (with $G=1$) and the upper bound from (3.19) (with $p=2$), we get
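The vanishing-moment property (3.20) can be illustrated numerically. A true super kernel requires the Fourier construction above; as a simpler finite-order stand-in (our own example, not from the text) we check the classical fourth-order kernel $\phi(x)=\frac{3-x^2}{2}\gamma(x)$, whose moments of order 1 to 3 vanish while its integral is 1.

```python
import numpy as np

# Finite-order illustration of the vanishing-moment property (3.20):
#   phi(x) = (3 - x^2)/2 * gamma(x),  gamma the standard normal density,
# has integral 1 and vanishing moments of order 1, 2, 3 (but not 4,
# unlike a true super kernel, for which ALL moments >= 1 vanish).
x, dx = np.linspace(-10.0, 10.0, 200_001, retstep=True)
gamma = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
phi = 0.5 * (3.0 - x**2) * gamma

m = [(x**k * phi).sum() * dx for k in range(4)]  # Riemann sums of the moments
print(m)
```

The order-2 cancellation comes from $\mathbb{E}(W^2)=1$ and $\mathbb{E}(W^4)=3$ for $W\sim N(0,1)$; the fourth moment of this kernel is nonzero, which is exactly why the text needs the Fourier construction for a genuine super kernel.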

CLT and Edgeworth's development
In this section we take $F=S_n(Z)=\sum_{k=1}^nZ_k$ defined in (2.1). It is convenient for us to write $Z_k=\frac1{\sqrt n}C_kY_k$. We assume that $Y_k\in\mathcal{L}(2r,\varepsilon)$, so we have the decomposition (3.7). We will use Lemma 3.3, so we estimate the quantities which appear in the right hand side of (3.22). Proof. We will use the following easy consequence of Burkholder's inequality for discrete martingales: if $M_n=\sum_{k=1}^n\Delta_k$ with $\Delta_k$, $k=1,...,n$, independent centred random variables, then $$\|M_n\|_p\le C_p\,\Big\|\Big(\sum_{k=1}^n\Delta_k^2\Big)^{1/2}\Big\|_p.\qquad(3.26)$$ Using this inequality and (2.4) we obtain $\|S_n(Z)\|_p\le C\times C_p(Z)$. We look now to the Sobolev norms. It is easy to see that, $S_n(Z)^i$ denoting the $i$th component of $S_n(Z)$, the first derivative is explicit and $D^{(l)}S_n(Z)=0$ for $l\ge 2$.
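For $p=4$, the Burkholder-type bound (3.26) can be obtained with an explicit constant by expanding the fourth moment (cross terms containing an odd factor vanish), giving $\mathbb{E}(S_n^4)\le 3\,\mathbb{E}\big((\sum_k\Delta_k^2)^2\big)$. The following simulation checks this instance; the non-identical scaling of the increments is our own toy choice.

```python
import numpy as np

rng = np.random.default_rng(4)

# p = 4 instance of Burkholder's inequality (3.26) for independent centred
# increments: expanding E[S_n^4] (terms with an odd factor vanish) gives
#   E[S_n^4] <= 3 * E[(sum_k Delta_k^2)^2].
n, m = 50, 200_000
scale = (1.0 + np.arange(n)) / n                    # non-identical scales
delta = rng.uniform(-1.0, 1.0, (m, n)) * scale
s4 = (delta.sum(axis=1) ** 4).mean()                # E[S_n^4]
q4 = ((delta**2).sum(axis=1) ** 2).mean()           # E[(sum Delta_k^2)^2]
print(s4, 3 * q4)
```

The left-hand side stays below three times the right-hand side, with the gap $2\sum_k\mathbb{E}(\Delta_k^4)$ predicted by the expansion.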
Since $\sum_{k=1}^n|\sigma_k|\le C_2(Z)$, it follows that the corresponding bound holds. We prove that

(3.27)
with $C$ depending on $k,p$ but independent of $n$.
Let $k=0$. The duality relation gives $\mathbb{E}(LZ_k)=\mathbb{E}(\langle D1,DZ_k\rangle_{l^2})=0$. Since the $LZ_k$'s are independent, we can apply first (3.26) and then (2.4). We take now $k=1$; using again (2.4), we reduce to estimating $DLF$. We notice that $D_{(q,j)}a_r(U_q)$ is non null only for $r<|U_q-y_Y|^2<2r$ and contains the derivatives of $a_r$ up to order 2, possibly multiplied by polynomials of order up to 2 in the components of $U_q-y_Y$. Since $|U_q-y_Y|^2\le 2r$, by using (3.3) one obtains $\mathbb{E}(|DLF|^p_{l^2})\le Cr^{-2p}\times C^{1/2}_2(Z)$, so (3.27) holds for $k=1$ also. For higher order derivatives the proof is similar.
We now give estimates for the Malliavin covariance matrix. We have $\sigma_{S_n(Z)}=\sum_{k=1}^n\chi_k\sigma_k$.
Lemma 3.6. Let $q,m\in\mathbb{N}$. There exists a constant $C\ge 1$, depending just on $q,m$, such that for every $\delta>0$, every multi-index $\gamma$ with $|\gamma|=m$ and every $f\in C^m_p(\mathbb{R}^d)$ the stated estimate holds, $c_p$ being given in (3.23).
Proof. We will use Lemma 3.3. Notice first that, by (3.25), the constant $C_{q+m}(S_n(Z))$ defined in (3.23) is upper bounded by $C\,C^{2d(q+m)(q+m+3)}_{4d(q+m)(q+m+3)}(Z)\,r^{-(q+m+1)}$, with $C$ depending on $d$ and $q+m$. And by using the Burkholder inequality (3.26), one gets the bound on $S_n(Z)$, $c_p(f)$ being given in (3.23). We take now $\eta=\big(\frac{\lambda_nm(r)}{2(1+\lambda_n)}\big)^d$ and we use (3.29) in order to conclude. We are now able to characterize the regularity of the semigroup $P^Z_n$. Proof. We take $\eta=\big(\frac{\lambda_nm(r)}{2(1+\lambda_n)}\big)^d$ and the truncation function $\Psi_\eta$, and we write the corresponding decomposition. We first estimate the localized part; in order to estimate $J$ we use integration by parts and we obtain $$J=\mathbb{E}\big(f(x+S_n(Z))\,H_\gamma(S_n(Z),\Psi_\eta(\det\sigma_{S_n}))\big).$$ Then, using (3.19) and (3.25), we conclude. We are now able to give the main result.
Theorem 3.8. We look to $S_n(Z)=\sum_{k=1}^nZ_k$ with $Z_k=\frac1{\sqrt n}C_kY_k$, and we assume that $Y_k\in\mathcal{L}(2r,\varepsilon)$ for some $\varepsilon>0$, $r>0$. We also assume that (1.2) and (1.3) hold (for every $p\in\mathbb{N}$). Let $N,q\in\mathbb{N}$ be fixed. We assume that $n$ is sufficiently large so that $n^{\frac12(N+1)}e^{-\frac{m^2(r)}{128}n}\le 1$ and $n\ge 4(N+1)C_2(Z)$.
There exists $C\ge 1$, depending on $N$ and $q$ only, such that the estimate (3.33) holds for every multi-index $\gamma$ with $|\gamma|=q$ and every $f\in C^q_p(\mathbb{R}^d)$, $c_p$ being given in (3.23). Proof.
So, gathering all the estimates, we obtain the required bound, and (3.40) is proved in Case 1.
Notice that $P^G_{r_N+1,r_{N+1}}\cdots P^G_{r_1+1,\cdot}$ may be written by means of a centred Gaussian random variable $G$ whose variance is computed explicitly. Now the proof follows as in the previous case, so (3.40) is proved. Summing over $r_1<r_2<\cdots<r_{N+1}\le n$, we get the conclusion. Exactly as in Case 2 presented above (using standard integration by parts with respect to the law of Gaussian random variables), we obtain the remaining bound. So (3.36) is proved.
Step 2. We now come back and replace $L_{q+(N+1)(N+3)}(f)$ by $L_q(f)$ in (3.36). We will use the regularization lemma. So we fix $\delta>0$ (to be chosen in a moment) and we write the splitting, noting that $l_m(f)=l_0(f)$. We use now (3.30) with $x=0$ and with some $h$ to be chosen in a moment. We then obtain the bound with $Q_{h,q}(Z)$ defined in (3.31) (in order to match the notation from (3.31), we recall that $q=|\gamma|$ was denoted by $m$ in (3.31), and $h$, which we may choose as we want, was denoted by $q$ in (3.31)). And we also have $A''_\delta(f)\le CL_0(f)\delta^h$ (the proof is identical to that of (3.24), but one employs the usual integration by parts with respect to the Gaussian law). Putting all this together, we obtain the estimate. We now take $\delta$ such that $\delta^h=\frac1{\delta^{q+(N+1)(N+3)}}e^{-\frac{m^2(r)n}{64}}$, and then $n$ sufficiently large for the required smallness. The statement now follows by observing that the constants may be expressed through $C_*(Z)$ given in (3.34). The result in Theorem 3.8 holds under the following slightly weaker condition (which will be used in the proof of Corollary 3.12 below). Proposition 3.9. Assume that for some $m<n$ one has $Y_k\in\mathcal{L}(2r,\varepsilon)$ for $k\le n-m$ and $\sum_{k=1}^{n-m}\sigma_k\ge\frac12I$. Then (3.33) holds true.
Proof. The idea is that, since $\sum_{k=1}^{n-m}\sigma_k\ge\frac12I$, the random variables $Y_k$, $k\le n-m$, contain sufficient noise to give the regularization effect. We show the main changes in the estimate of $I_2(f)$ (for $I_1(f)$, $I_3(f)$ the proof is analogous). We split $P^Z_{r_{N+1}+1,n}=P^Z_{r_{N+1}+1,n-m}P^Z_{n-m,n}$, and we need sufficient noise in order that $P^Z_{r_{N+1}+1,n-m}$ gives the regularization effect. Then the two cases described above apply, where $W$ is a standard Gaussian random variable and $\Phi^{\sigma_n}_N$ is defined in (2.10) using the correspondingly normalized variables. Moreover, $C^{\sigma_n}_*(Z)=(\underline\lambda_n^{dq}/\overline\lambda_n^{-q})C_*(Z)$, with $C_*(Z)$ given in (3.34) and $\underline\lambda_n$, respectively $\overline\lambda_n$, the smallest, respectively the largest, eigenvalue of $\sigma_n$. Finally, $\bar r=r(\underline\lambda_n/\overline\lambda_n)^d$.

(3.45)
Here γ d is the density of the standard normal law in R d .
the last inequality being true by our choice of $\delta_n$. Moreover, $|R'(n)|\le Cn^{-\frac12(N+1)}$, the last inequality being again a consequence of the choice of $\delta_n$. We now prove a stronger version of Prohorov's theorem. We consider a sequence of identically distributed, centred random variables $X_k\in\mathbb{R}^d$ which have finite moments of any order, and we look to the corresponding normalized sums. Following Prohorov, we assume that (3.46) holds: there exists $m\in\mathbb{N}$ such that the law of $X_1+\cdots+X_m$ has an absolutely continuous component $\psi(x)\,dx$, for some measurable nonnegative function $\psi$.
Corollary 3.12. We assume that (3.46) holds. We fix $q,N\in\mathbb{N}$. There exist two constants $0<c_*\le 1\le C_*$ (depending on $N$ and $q$) such that the following holds: under the stated condition on $n$, for every multi-index $\gamma$ with $|\gamma|\le q$ and for every $f\in C^q_p(\mathbb{R}^d)$ one has (3.47). Proof. We denote by $Y_k$ the packages of $2m$ consecutive variables. Notice that we may take $\psi$ in (3.46) to be bounded with compact support. Then $\psi*\psi$ is continuous, and so we may find some $r>0$, $\varepsilon>0$ and $y\in\mathbb{R}^d$ such that $\psi*\psi\ge\varepsilon 1_{B_r(y)}$. It follows that $Y_k\in\mathcal{L}(2r,\varepsilon)$, and we may use the previous theorem in order to obtain (3.47) for $n=2mn'$ with $n'\in\mathbb{N}$. But this is not satisfactory, because we claim that (3.47) holds for every $n\in\mathbb{N}$. This does not follow directly, but requires coming back to the proof of Theorem 3.8 and adapting it in the following way. Suppose that $2mn'\le n<2m(n'+1)$. Since $X_k$, $2mn'+1\le k\le n$, have no regularity property, we may not use them in the regularization arguments employed in the proof of Theorem 3.8. But $Y_k$, $1\le k\le n'$, contain sufficient noise to achieve the proof (see Proposition 3.9).

An invariance principle related to the local time
In this section we consider a sequence of independent identically distributed, centred random variables $Y_k$, $k\in\mathbb{N}$, with finite moments of any order, and we denote $S_n(k,Y)=\frac{1}{\sqrt n}\sum_{i=1}^{k}Y_i$. Our aim is to study the asymptotic behaviour of the expectation of
$$L_n(Y)=\frac1n\sum_{k=1}^{n}\psi_{\varepsilon_n}(S_n(k,Y)) \quad\text{with}\quad \psi_{\varepsilon_n}(x)=\frac{1}{2\varepsilon_n}\mathbf{1}_{\{|x|\leq\varepsilon_n\}}.$$
So $L_n(Y)$ appears as the occupation time of the random walk $S_n(k,Y)$, $k=1,\dots,n$, and consequently, as $\varepsilon_n\to0$, one expects it to be close to the local time at zero and at time $1$, denoted by $l_1$, of the Brownian motion. In fact, we now prove that $\mathbb{E}(L_n(Y))\to\mathbb{E}(l_1)$ as $n\to\infty$.
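Before turning to the proof, the convergence can be seen numerically. The following Monte Carlo sketch is our own illustration: the Rademacher variables used here do not satisfy the Doeblin condition $Y\in L(r,\varepsilon)$ required in the theorem below, but the first-order convergence is still visible. It estimates $\mathbb{E}(L_n(Y))$ and compares it with $\mathbb{E}(l_1)=\sqrt{2/\pi}$, which follows from Lévy's identity $l_1\overset{\text{law}}{=}|W_1|$.

```python
import math
import random

def occupation_time(n, rho, rng):
    """One sample of L_n(Y) for a Rademacher walk: S_n(k, Y) = n^{-1/2} sum_{i<=k} Y_i,
    psi_eps(x) = (2 eps_n)^{-1} 1_{|x| <= eps_n}, with eps_n = n^{-(1-rho)/2}."""
    eps_n = n ** (-0.5 * (1.0 - rho))
    s, hits = 0.0, 0
    for _ in range(n):
        s += (1.0 if rng.random() < 0.5 else -1.0) / math.sqrt(n)
        if abs(s) <= eps_n:
            hits += 1
    return hits / (2.0 * eps_n * n)

rng = random.Random(1)
n, paths, rho = 4000, 1000, 0.5
estimate = sum(occupation_time(n, rho, rng) for _ in range(paths)) / paths
# E(l_1) = E|W_1| = sqrt(2/pi) ~ 0.798; the estimate should be close (the band
# width eps_n and the Monte Carlo error both contribute a visible bias).
print(estimate, math.sqrt(2.0 / math.pi))
```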
Theorem 4.1. Let $\varepsilon_n=n^{-\frac12(1-\rho)}$ with $\rho\in(0,1)$. We consider a centred random variable $Y\in L(r,\varepsilon)$ which has finite moments of any order and we take a sequence $Y_i$, $i\in\mathbb{N}$, of independent copies of $Y$. We define $N(Y)=\max\{2k:\mathbb{E}(Y^{2k})=\mathbb{E}(G^{2k})\}-1\geq1$ and we denote $p_{N(Y)}=8(1+(N(Y)+1)(N(Y)+3))(4+(N(Y)+1)(N(Y)+3))$. For every $\eta<1$ there exists a constant $C$ depending on $r,\varepsilon,\rho,\eta$ and on $\|Y\|_{p_{N(Y)}}$ such that (4.1) holds. The above inequality holds for $n$ which is sufficiently large in order to have (…).

Proof. Throughout this proof we denote by $C$ a constant which depends on $r,\varepsilon,\rho,\eta$ and on $\|Y\|_{p_{N(Y)}}$ (as in the statement of the theorem) and which may change from line to line.
Using Chebyshev's inequality and Burkholder's inequality we obtain, for every $p\geq2$, (…). The same estimate holds with $Y_i$ replaced by $G_i$. We conclude that
We now write (…). We will use (3.33) with $f=h_{k_n,n}$, $\partial_\gamma$ being a first order derivative. Then, by (3.33), (…). Here $C$ is the constant from (3.33), defined in (3.34). Notice that, by (4.2), for $k\geq k_n=n^{\eta\rho}$ one has (…). We recall now that (see (2.15)) (…), with $H_{\Gamma_l}(x)$ a linear combination of Hermite polynomials (see (2.10) and (2.14)). Notice that if $l$ is odd then $\Gamma_l$ is a linear combination of differential operators of odd order (see the definition of $\Lambda_{m,l}$ in (2.9)). So $H_{\Gamma_l}$ is an odd function (being a linear combination of Hermite polynomials of odd order), so that $\psi_{\varepsilon_n}\times H_{\Gamma_l}$ is also an odd function. Since $W_1$ and $-W_1$ have the same law, it follows that (…) and consequently (…). Moreover, by the definition of $N(Y)$, for $2l\leq N(Y)$ we have $\mathbb{E}(Y^{2l})=\mathbb{E}(G^{2l})$, so that $H_{\Gamma_{2l}}=0$. We conclude that (…). Putting together the results from the first and the second step, we obtain (4.1).
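The symmetry argument above — an even function times an odd Hermite polynomial integrates to zero against the Gaussian law — is easy to verify symbolically. The following sketch is our own check, not part of the paper; it uses the sample value $\varepsilon=1/4$ for $\psi_\varepsilon$ and generates the probabilists' Hermite polynomials by the recurrence $He_{n+1}(x)=x\,He_n(x)-n\,He_{n-1}(x)$.

```python
import sympy as sp

x = sp.symbols('x', real=True)
eps = sp.Rational(1, 4)

# Probabilists' Hermite polynomials via He_{n+1} = x*He_n - n*He_{n-1}.
He = [sp.Integer(1), x]
for n in range(1, 4):
    He.append(sp.expand(x * He[n] - n * He[n - 1]))

gauss = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)
# psi_eps is even and He_3 is odd, so E[psi_eps(W) He_3(W)] = 0 by symmetry.
val = sp.integrate(He[3] * gauss / (2 * eps), (x, -eps, eps))
print(sp.simplify(val))  # 0
```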
Step 3. We prove (4.3). Recall first the representation formula (…), where $l_1^a$ denotes the local time at $a\in\mathbb{R}$ and time $1$, so that $l_1=l_1^0$. Since $a\mapsto l_1^a$ is Hölder continuous of order $\frac{\rho'}{2}$ for every $\rho'<1$, we obtain (4.5). We now prove that, for every $\rho'<1$ and $n$ large enough, (4.6) holds.

To begin with, we notice that $S_n(k,G)$ has the same law as $W_{k/n}$, so that we write (…). As above, we take $k_n=n^{\eta\rho}$ and, for $k\leq k_n$, we have (…). Since $\mathbb{P}(|W_s|\geq\varepsilon_n)\leq C\exp(-\frac{\varepsilon_n^2}{2s})$, this immediately gives (…) for $n$ large enough.
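The Gaussian tail bound invoked here holds in fact with $C=1$, since $\mathbb{P}(|W_s|\geq\varepsilon)=\operatorname{erfc}(\varepsilon/\sqrt{2s})$ and $\operatorname{erfc}(x)\leq e^{-x^2}$ for $x\geq0$. A quick numerical check (our own illustration):

```python
import math

def gauss_tail(eps, s):
    """P(|W_s| >= eps) for W_s ~ N(0, s), via the complementary error function."""
    return math.erfc(eps / math.sqrt(2.0 * s))

# Bound used in the text: P(|W_s| >= eps) <= exp(-eps^2 / (2 s)), here with C = 1.
for s in (0.01, 0.1, 1.0):
    for eps in (0.1, 0.5, 1.0):
        assert gauss_tail(eps, s) <= math.exp(-eps**2 / (2.0 * s))
print("bound verified on the grid")
```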
We consider now the case $k\geq k_n$. By a formal computation, applying the standard Gaussian integration by parts formula, we write (…), in which we have used (4.4) and where $H_3$ denotes the third Hermite polynomial. The above computation is formal because $\psi_{\varepsilon_n}$ is not differentiable. But, since the first and the last term in the chain of equalities depend on $\psi_{\varepsilon_n}$ only (and not on its derivatives), we may use regularization by convolution in order to make the argument rigorous. Notice also that the first equality is obtained using Itô's formula and the last one using integration by parts. It follows that

Convergence in distribution norms
In this section we prove that, under some supplementary regularity assumptions on the laws of $Z_k$, $k\in\mathbb{N}$, Theorem 3.8 implies that the density of the law of $S_n(Z)$ converges in distribution norms to the Gaussian density. We write (…) and we denote $\sigma_k=C_kC_k^*$. We assume that
$$0<\underline\sigma\leq\sigma_k\leq\overline\sigma<\infty \quad\text{and}\quad \sup_k\|Y_k\|_p^p<\infty. \qquad(4.7)$$
In particular, each $\sigma_k$ is invertible and we denote $\gamma_k=\sigma_k^{-1}$. Notice that the normalization condition is (…). For a function $f\in C^1(\mathbb{R}^d)$ and for $k\in\mathbb{N}$ we denote (…).

Proposition 4.2. A. We fix $q\in\mathbb{N}$ and a polynomial $P$. Suppose that $Y_i\in L(r,\varepsilon)$, $i\in\mathbb{N}$, and that (4.7) holds. Moreover, we suppose that $\mathbb{P}(Y_i\in dy)=p_{Y_i}(y)dy$ with $p_{Y_i}\in C^1(\mathbb{R}^d)$ for every $i=1,\dots,q$. There exist some constants $c\in(0,1)$ (depending on $r$ and on $\varepsilon$) and $C_q(P)\geq1$ (depending on $q$, $\underline\sigma$, $\overline\sigma$ and on $P$) such that, if $n^{(q+1)/2}e^{-cn}\leq1$, then for every $f\in C^q_p(\mathbb{R}^d)$ and every multi-index $\alpha$ with $|\alpha|\leq q$,
$$\big|\mathbb{E}\big(P(S_n(Z))\partial_\alpha f(S_n(Z))\big)-\mathbb{E}\big(P(S_n(G))\partial_\alpha f(S_n(G))\big)\big|\leq\frac{C_q(P)}{\sqrt n}\prod_{i=1}^{q}m_{1,l_0(f)+l_0(P)}(p_{Y_i})\times L_0(f). \qquad(4.9)$$
B. Moreover, if $p_{S_n}$ is the density of the law of $S_n(Z)$ then, if $n^{(d+q+1)/2}e^{-cn}\leq1$, we have (…), where $\gamma$ is the density of the standard normal law in $\mathbb{R}^d$.
Proof of A. We proceed by recurrence on the degree $k$ of the polynomial $P$. First we assume that $k=0$ (so that $P$ is a constant) and we prove (4.9) for every $q\in\mathbb{N}$. We write (…). Then we define (…) and we have $\mathbb{E}(\partial_\alpha f(S_n(Z)))=\mathbb{E}(\partial_\alpha g(S_n^{(q)}(Z)))$. Now, using (3.44) with $N=0$ for $S_n^{(q)}(Z)$, we get $\mathbb{E}(\partial_\alpha g(S_n^{(q)}(Z)))=\mathbb{E}(\partial_\alpha g(S_n^{(q)}(G)))+R_n=\mathbb{E}(\partial_\alpha f(\dots))$, with $|R_n|\leq C(\frac1{\sqrt n}L_0(g)+e^{-cn}L_q(g))$. Let us estimate $L_q(g)$. We recall that $\gamma_i=\sigma_i^{-1}$. For $\alpha=(\alpha_1,\dots,\alpha_q)$ we have (4.13), in which we have assumed that the $Y_i$'s take values in $\mathbb{R}^m$. So (…). It follows that $|\partial_\alpha g(x)|\leq Cn^{q/2}L_0(f)\,(\dots)$. We conclude that $l_q(g)=l_0(f)$ and $L_q(g)\leq Cn^{q/2}L_0(f)\prod_{i=1}^{q}m_{1,l_0(f)}(p_{Y_i})$. The same is true for $q=0$, and so (4.12) gives (…), the last inequality being true if $n^{q/2}e^{-cn}\leq n^{-1/2}$. So (4.11) says that we have succeeded in replacing $Y_i$, $q+1\leq i\leq n$, by $G_i$, $q+1\leq i\leq n$, and the price to be paid is $CL_0(f)\prod_{i=1}^{q}m_{1,l_0(f)}(p_{Y_i})\times\frac1{\sqrt n}$. Now we can do the same thing and replace $Y_i$, $1\leq i\leq q$, by $G_i$, $1\leq i\leq q$, and the price will be the same (here we use $C_iG_i$, $i=q+1,\dots,2q$, instead of $C_iY_i$, $i=1,\dots,q$). So (4.9) is proved for polynomials $P$ of degree $k=0$.
We assume now that (4.9) holds for every polynomial of degree at most $k-1$ and we prove it for a polynomial $P$ of degree $k$. We have (…). Since $|\beta|\geq1$, the polynomial $\partial_\beta P$ has degree at most $k-1$. Then the recurrence hypothesis ensures that (4.9) holds for $\partial_\beta P\times\partial_\gamma f$. Moreover, using again (4.9) for $g=P\times f$, we obtain (4.9) in which $L_0(g)\leq L_0(P)L_0(f)$ and $l_0(g)\leq l_0(P)+l_0(f)$ appear. So A. is proved.

Remark 4.3.
We would like to obtain Edgeworth expansions as well, but there is a difficulty: when we use the expansion for $S_n^{(q)}(Z)$, the covariance matrix of $S_n^{(q)}(Z)$ is no longer the identity matrix. So the coefficients of the expansion are computed using a correction (see the definition of $\Delta_k$ in Remark 3.10), and this correction produces an error of order $n^{-1/2}$. This means that we are not able to go beyond this level (at least not without supplementary technical effort).

A Computation of the first three coefficients
We explicitly write the expression of $\Gamma_k$ for $k=1,2,3$ (for larger values of $k$ the term $\Gamma_k$ is difficult to compute explicitly). Recall formula (2.10) for $\Gamma_k$ and formula (2.9) for the set $\Lambda_{m,k}$ appearing in (2.10).
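The Hermite polynomials entering these formulas can be generated symbolically. The following sketch is our own illustration (one-dimensional case), using the Rodrigues formula $He_n(x)=(-1)^n e^{x^2/2}\frac{d^n}{dx^n}e^{-x^2/2}$ for the probabilists' Hermite polynomials.

```python
import sympy as sp

x = sp.symbols('x', real=True)

def hermite_prob(n):
    """Probabilists' Hermite polynomial He_n via the Rodrigues formula."""
    deriv = sp.diff(sp.exp(-x**2 / 2), x, n)
    return sp.expand(sp.simplify((-1)**n * sp.exp(x**2 / 2) * deriv))

for n in range(1, 4):
    print(n, hermite_prob(n))
# 1 x
# 2 x**2 - 1
# 3 x**3 - 3*x
```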
We consider now a sequence of independent centred Gaussian random variables $G_k$ with covariance matrices $\sigma_k$ and we denote $S_p=\sum_{k=1}^{p}G_k$. Moreover, for a matrix $\sigma\in\mathcal{M}_{d\times d}$ we define the operators (…), where $W$ is a $d$-dimensional Brownian motion independent of $S_p$.