Berry-Esseen bounds in the Breuer-Major CLT and Gebelein's inequality

We derive explicit Berry-Esseen bounds in the total variation distance for the Breuer-Major central limit theorem, in the case of a subordinating function $\varphi$ satisfying minimal regularity assumptions. In particular, our results cover the case $\varphi(x) = |x| - \sqrt{2/\pi}$, which until now has been outside the scope of available techniques. Our approach combines the Malliavin-Stein method for normal approximations with Gebelein's inequality, which bounds the covariance of functionals of Gaussian fields in terms of maximal correlation coefficients.


Introduction
Let $X = (X_k)_{k\in\mathbb{Z}}$ be a centered stationary Gaussian sequence with covariance function $E[X_k X_j] = \rho(k-j)$ satisfying $\rho(0) = 1$. Let $\varphi \in L^2(\mathbb{R}, \gamma)$, where $\gamma$ is the standard Gaussian measure on the real line, and assume without loss of generality that $E[\varphi(X_1)] = \int_{\mathbb{R}} \varphi\, d\gamma = 0$. By exploiting the orthogonality and completeness of Hermite polynomials in $L^2(\mathbb{R}, \gamma)$ (see, e.g., [11, p. 13]), we can write
$$\varphi = \sum_{\ell \ge d} a_\ell H_\ell, \tag{1}$$
where $H_\ell$ is the Hermite polynomial of order $\ell$, the coefficient $a_d$ is different from zero, $d \ge 1$ is the Hermite rank of $\varphi$, and the series converges in $L^2(\mathbb{R}, \gamma)$. Consider the sequence of normalized sums
$$F_n = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \varphi(X_k), \quad n \ge 1. \tag{2}$$
The celebrated Breuer-Major theorem [2], stated below, provides sufficient conditions on the covariance function $\rho$ for $F_n$ to exhibit Gaussian fluctuations as $n \to \infty$ (see also Taqqu [20] for a related work). Throughout the paper, the symbol $N(a, b)$ denotes a Gaussian random variable with mean $a \in \mathbb{R}$ and variance $b \ge 0$.
Date: December 24, 2018.
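Theorem 1 (Breuer-Major Theorem). Let the previous assumptions on $X$ and $\varphi$ prevail, and suppose moreover that $\sum_{k\in\mathbb{Z}} |\rho(k)|^d < \infty$. Then $F_n \xrightarrow{d} N(0, \sigma^2)$, where
$$\sigma^2 = \sum_{\ell \ge d} \ell!\, a_\ell^2 \sum_{k\in\mathbb{Z}} \rho(k)^\ell.$$
Here, and for the rest of the paper, $\xrightarrow{d}$ denotes convergence in distribution of random variables.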
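As a hedged numerical illustration of the limiting variance, the sketch below assumes the normalisation $E[H_\ell(N)^2] = \ell!$, under which the variance in Theorem 1 reads $\sum_{\ell\ge d} \ell!\, a_\ell^2 \sum_{k\in\mathbb{Z}} \rho(k)^\ell$. The choice $\varphi = H_2$, the covariance $\rho(k) = r^{|k|}$, the truncation level and all function names are assumptions made for this example only.

```python
import math

def limiting_variance(coeffs, rho, cutoff=200):
    """Truncated value of sum_l l! a_l^2 sum_{|k| <= cutoff} rho(k)^l.

    coeffs maps a Hermite index l to the coefficient a_l (hypothetical input format).
    """
    total = 0.0
    for l, a in coeffs.items():
        lag_sum = sum(rho(k) ** l for k in range(-cutoff, cutoff + 1))
        total += math.factorial(l) * a * a * lag_sum
    return total

def variance_of_Fn(coeffs, rho, n):
    """Exact Var(F_n) = (1/n) sum_{j,k=1}^n sum_l l! a_l^2 rho(j-k)^l."""
    total = 0.0
    for l, a in coeffs.items():
        # Group the n^2 pairs (j, k) by the lag m = j - k, which occurs n - |m| times.
        lag_sum = sum((n - abs(m)) * rho(m) ** l for m in range(-(n - 1), n))
        total += math.factorial(l) * a * a * lag_sum
    return total / n

r = 0.5
rho = lambda k: r ** abs(k)          # assumed example covariance, summable
coeffs = {2: 1.0}                    # phi = H_2, Hermite rank d = 2

sigma2 = limiting_variance(coeffs, rho)
closed_form = 2 * (1 + r * r) / (1 - r * r)   # 2 * sum_k r^(2|k|)
print(sigma2, closed_form, variance_of_Fn(coeffs, rho, 500))
```

In this example the finite-sample variance $\mathrm{Var}(F_n)$ approaches the limit from below, since every lag $m$ is weighted by $1 - |m|/n$.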
The Breuer-Major theorem has far-reaching applications in many different areas, such as mathematical statistics, signal processing or geometry of random nodal sets, see e.g. [5,19,21,15] and references therein. It has been generalized and refined in various aspects [3,4,10,12,13]. Now let $\sigma_n^2 := \mathrm{Var}(F_n)$ and $V_n := F_n/\sqrt{\mathrm{Var}(F_n)}$. The aim of the present paper is to develop a novel method for obtaining explicit upper bounds on the sequence
$$d_{TV}(V_n, N(0,1)) := \sup_{A\in\mathcal{B}(\mathbb{R})} |P(V_n \in A) - P(N(0,1) \in A)|, \quad n \ge 1,$$
under minimal regularity assumptions on the function $\varphi$. We recall that, for every $n$, the quantity $d_{TV}(V_n, N(0,1))$ corresponds to the total variation distance between the distributions of $V_n$ and $N(0,1)$; see e.g. [12, Appendix C], and the references therein, for a discussion of the properties of $d_{TV}$. Any statement yielding the existence of an explicit numerical sequence $\{\alpha_n\}$ such that $\alpha_n \to 0$ and $d(V_n, N(0,1)) \le \alpha_n$, for some distance $d$, is called a quantitative Breuer-Major Theorem.
One of the first quantitative Breuer-Major theorems is contained in the work by Nourdin, Peccati and Podolskij [13] (see, in particular, [13, Cor. 2.4]), where the focus is on the Kolmogorov and 1-Wasserstein distances and on the case where $\varphi$ is a Hermite polynomial of order $q$. The rates obtained in [13] are, in general, not optimal. We stress that, according to [13, Corollary 2.4], the convergence in distribution in Theorem 1 always takes place in the sense of the Kolmogorov and 1-Wasserstein distances. On the other hand, determining whether the Breuer-Major CLT holds in the topology of the distance $d_{TV}$ is a much more delicate matter, and is the main focus of the present work. In particular, the following problem is still open:
Problem P: Letting the notation of Theorem 1 prevail, find necessary and sufficient conditions on $\rho$ and $\varphi$ in order to have that $d_{TV}(F_n/\sqrt{\mathrm{Var}(F_n)}, N(0,1)) \to 0$ as $n \to \infty$.
We note that, unlike convergence in the Kolmogorov or 1-Wasserstein distances, convergence in total variation distance cannot always take place without extra assumptions on $\varphi$. For an easy counterexample, consider independent $X_k \sim N(0,1)$ and set $\varphi(x) = \mathrm{sign}(x)$; the assumptions of Theorem 1 are then satisfied, but $d_{TV}(F_n/\sqrt{\mathrm{Var}(F_n)}, N(0,1)) = 1$ for all $n$.
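The counterexample with $\varphi = \mathrm{sign}$ can be made concrete with a small computation. The following sketch (function names are hypothetical, and only the Python standard library is assumed) lists the atoms of the law of $V_n$ and evaluates its Kolmogorov distance to $N(0,1)$ exactly; the total variation distance stays equal to 1 because the law of $V_n$ is carried by finitely many points.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def law_of_Vn(n):
    """Atoms and weights of V_n = n^{-1/2} * sum_k sign(X_k) for i.i.d. N(0,1) X_k.

    The sum of n independent signs equals 2*j - n with probability C(n, j) / 2^n,
    and Var(F_n) = 1, so no further normalisation is needed.
    """
    atoms = [(2 * j - n) / math.sqrt(n) for j in range(n + 1)]
    weights = [math.comb(n, j) / 2 ** n for j in range(n + 1)]
    return atoms, weights

def kolmogorov_distance(n):
    """sup_x |P(V_n <= x) - Phi(x)|; the supremum is reached at an atom (left or right limit)."""
    atoms, weights = law_of_Vn(n)
    cdf_left, best = 0.0, 0.0
    for x, w in zip(atoms, weights):
        phi = std_normal_cdf(x)
        cdf_right = cdf_left + w
        best = max(best, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return best

for n in (10, 100, 1000):
    atoms, _ = law_of_Vn(n)
    # The law of V_n sits on n + 1 points, a Lebesgue-null set that N(0,1) never charges,
    # so d_TV = 1 for every n although the Kolmogorov distance shrinks.
    print(n, len(atoms), kolmogorov_distance(n))
```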
In the case where $\varphi$ has a possibly infinite Hermite expansion (1), and under some extra smoothness assumptions, Nourdin, Peccati and Reinert [14], Nualart and Zhou [18] and Vidotto [23] obtained better error bounds than [13]. The rates of convergence deduced in [13] and [14,18,23] (which are sometimes optimal, and sometimes not) are all obtained via some variation of the Malliavin-Stein approach described in Section 2.2 below.
We observe that, disregarding regularity assumptions on $\varphi$, the upper bound of the order $n^{-1/2}$ obtained in [18] is optimal for the set of assumptions at Points (a) and (b) above. We recall that, according e.g. to the terminology adopted in [11, Chapter 9], a numerical sequence $\alpha_n \downarrow 0$ is said to provide an optimal rate (for $d_{TV}(V_n, N(0,1))$) whenever there exist non-zero finite constants $k < K$ such that $k\alpha_n \le d_{TV}(V_n, N(0,1)) \le K\alpha_n$ for $n$ large enough. Indeed, in the trivial case where $\rho(j) = 0$ for every $j \ne 0$, and using e.g. the reverse Berry-Esseen inequality from [1], it is easy to build a centered smooth function $\varphi$ with Hermite rank 1 such that $d_{TV}(V_n, N(0,1)) \ge C n^{-1/2}$ for some absolute constant $C > 0$. On the other hand, the order $n^{-1/2}$ under the set of assumptions at Point (b) cannot be improved in general, since it coincides with the third/fourth cumulant barrier for the total variation distance between the laws of a sequence of random variables in a fixed chaos and the standard normal distribution. Such a result was established in full generality in [12, Theorem 11.2], and is presented in the next proposition in the simple case of polynomials of order 2.

Proposition 2. [12, Proposition 4.2]
Let $F_n$ be given by (2) Here, $a(n) \asymp b(n)$ means that the ratio $a(n)/b(n)$ is bounded from above and below by positive finite constants. In particular,
As anticipated, the present paper is concerned with upper bounds on the rate of convergence in the Breuer-Major theorem when, unlike [12] but similarly to [18], the function $\varphi$ possibly displays an infinite Hermite expansion (1), and belongs to the Sobolev space $\mathbb{D}^{1,4}$. The requirement that $\varphi \in \mathbb{D}^{1,q}$ for some $q \ge 1$ is minimal, in the sense that it is the least requirement on $\varphi$ that allows one to directly apply the Malliavin-Stein method described in Section 2.2.
One interesting subordinating function $\varphi$ entering the scope of our paper is $\varphi(x) = |x| - \sqrt{2/\pi}$. The Breuer-Major CLT associated with such a mapping has been recently applied in a geometric setting in [3], where $\varphi$ arose in the approximation of the length of a smooth regularization of the sample paths of a Gaussian process with stationary increments. Note that $\varphi \in \mathbb{D}^{1,q}$ for any $q \ge 1$, but $\varphi \notin \mathbb{D}^{2,2}$. Also, $\varphi$ has an infinite expansion (1) with Hermite rank $d = 2$. Such a case is not covered by the findings of [18] or [13,14,23] (due to the lack of sufficient regularity for the function $\varphi$), and indeed enters the framework of our main result, stated in the forthcoming Theorem 3.
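The Hermite rank and the first coefficients of $\varphi(x) = |x| - \sqrt{2/\pi}$ can be checked numerically. The sketch below is only an illustration: it assumes the normalisation $a_\ell = E[\varphi(N) H_\ell(N)]/\ell!$ for probabilists' Hermite polynomials, and it approximates Gaussian expectations by Simpson quadrature on $[-12, 12]$; the function names and the quadrature parameters are hypothetical choices.

```python
import math

def hermite(n, x):
    """Probabilists' Hermite polynomial H_n(x) via H_{k+1} = x H_k - k H_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

def gaussian_expectation(f, half_width=12.0, panels=4800):
    """Composite Simpson approximation of E[f(N)] for N ~ N(0,1).

    The grid has 0 at an even index, so the kink of |x| falls on a panel boundary.
    """
    step = 2.0 * half_width / panels
    total = 0.0
    for i in range(panels + 1):
        x = -half_width + i * step
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * step / 3.0 / math.sqrt(2.0 * math.pi)

def phi(x):
    return abs(x) - math.sqrt(2.0 / math.pi)

def hermite_coefficient(l):
    """a_l = E[phi(N) H_l(N)] / l! (assumed normalisation)."""
    return gaussian_expectation(lambda x: phi(x) * hermite(l, x)) / math.factorial(l)

coeffs = [hermite_coefficient(l) for l in range(9)]
for l, a in enumerate(coeffs):
    print(l, a)
```

Under these assumptions the coefficients of order 0 and 1 vanish, $a_2 = 1/\sqrt{2\pi}$, and all odd coefficients vanish by symmetry, in line with Hermite rank $d = 2$ and with the 2-sparsity of symmetric functions.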
Let $F_n$ be given by (2) and set $\sigma_n^2 = \mathrm{Var}(F_n)$ and $V_n = F_n/\sigma_n$. Then: (ii) If $\varphi$ is symmetric (or, more generally, 2-sparse, as defined in Section 2.4) then, for all $b \in [1, 2]$ and all $n$,
Here, $C(\varphi)$ is a finite constant, whose explicit value is given in (9) below.
(1) As discussed above, the rate provided in Theorem 3-(i) for functions $\varphi$ with Hermite rank 1 is optimal.
(2) For a function $\varphi$ having Hermite rank equal to 2, the sufficient condition for asymptotic normality in Theorem 1 is that $\rho \in \ell^2(\mathbb{Z})$. Thus, in the case of a symmetric function $\varphi$ with Hermite rank 2, Theorem 3-(ii) yields an upper bound on $d_{TV}(V_n, N(0,1))$, explicitly interpolating all the cases $\rho \in \ell^b(\mathbb{Z})$, for $1 \le b < 2$.
(3) By inspection of our forthcoming proof, it will be clear that our techniques do not allow us to deal with the case of a general function $\varphi \in \mathbb{D}^{1,4}$ having Hermite rank equal to 2. This implies that the requirement that $\varphi$ is 2-sparse cannot easily be removed.
We would now like to briefly compare our result with those of [18], which is the closest reference to the present note. The higher regularity requirement of [18] stems from the method used therein of applying integration by parts several times. On the other hand, our approach requires that we only perform one integration by parts in the Malliavin-Stein approach, since our final estimate makes use of an intrinsic correlation bound -known as Gebelein's inequality, see [6,22]. The use of Gebelein's inequality, which is the main technological breakthrough of the present paper, requires much less regularity on ϕ.
A natural question one might ask is whether our bounds are optimal. In view of Proposition 2, applying the upper bound of Theorem 3-(ii) to the case $\varphi = H_2$ (and $\rho \in \ell^b(\mathbb{Z})$, for some $1 \le b < 2$), one obtains a rate which is not optimal. However, we do not know whether the rates implied by our results are optimal in the case $\varphi(x) = |x| - \sqrt{2/\pi}$. Further discussions around this problem are gathered at the end of the paper; see Section 4.
The present paper is organised as follows. We start by reviewing some basic elements of stochastic analysis on the Wiener space and of the Malliavin-Stein approach. Then we introduce the new ingredient, Gebelein's inequality for correlated isonormal Gaussian processes, in Section 2. We apply a Gebelein-Malliavin-Stein bound to prove our main theorem in Section 3. A discussion on optimality is provided in Section 4, thus concluding the paper.
Every random object considered below is defined on a common probability space $(\Omega, \mathcal{F}, P)$, with $E$ denoting mathematical expectation with respect to $P$.

2.1. Stochastic analysis on the Wiener space. The content of this subsection can be found in [11] or [10]. An isonormal Gaussian process $\{W(h) : h \in H\}$ is a family of centered Gaussian random variables indexed by a real separable Hilbert space $H$ such that the covariance satisfies
$$E[W(h)W(g)] = \langle h, g\rangle_H, \quad h, g \in H.$$
Let $F$ be a square-integrable functional of an isonormal Gaussian process $W$. Then, $F$ has a unique Wiener-Itô chaos expansion
$$F = E[F] + \sum_{k \ge 1} I_k(f_k), \tag{4}$$
where $f_k \in H^{\otimes k}$ is a symmetric kernel, and $I_k(f_k)$ is the $k$-th multiple Wiener-Itô integral, $k \ge 1$. By convention we write $I_0(f_0) = f_0 = E[F]$. By orthogonality between multiple integrals of different orders,
$$E[F^2] = \sum_{k \ge 0} k!\, \|f_k\|^2_{H^{\otimes k}}.$$
Let $f : \mathbb{R}^n \to \mathbb{R}$ be of class $C^\infty$, and such that all its partial derivatives have at most polynomial growth. Consider a smooth functional of the form $F = f(W(h_1), \dots, W(h_n))$ with $h_1, \dots, h_n \in H$. We define the Malliavin derivative of $F$ as
$$DF = \sum_{i=1}^{n} \partial_i f(W(h_1), \dots, W(h_n))\, h_i.$$
The set of smooth functionals $F$ introduced above is dense in $L^q(\Omega)$, $q \ge 1$, and the operator $D$ is closable. Therefore, $D$ can be extended to $\mathbb{D}^{1,q}$, the set of $F$ such that there exists a sequence of smooth functionals $(F_n)_{n\ge1}$ satisfying $E[|F_n - F|^q] \to 0$ and $E[\|DF_n - \eta\|_H^q] \to 0$, for some $\eta \in L^q(\Omega, H)$, that we rewrite as $\eta := DF$. One defines similarly $D^p$ and $\mathbb{D}^{p,q}$. When $q = 2$, these spaces are Hilbert spaces and we have the following characterization in terms of the chaos expansion (4):
$$F \in \mathbb{D}^{p,2} \iff \sum_{k \ge 1} k^p\, k!\, \|f_k\|^2_{H^{\otimes k}} < \infty.$$
The adjoint of $D$, customarily called the divergence operator or the Skorohod integral, is denoted by $\delta$ and satisfies the duality formula
$$E[F\delta(u)] = E[\langle DF, u\rangle_H] \tag{5}$$
for all $F \in \mathbb{D}^{1,2}$, whenever $u : \Omega \to H$ is in the domain of $\delta$.
The Ornstein-Uhlenbeck semigroup $(P_t)_{t\ge0}$ is defined by Mehler's formula for all $F \in L^1(\Omega)$ by
$$P_t F = E'\big[F\big(e^{-t} W + \sqrt{1 - e^{-2t}}\, W'\big)\big],$$
where $W'$ is an independent copy of $W$ and $E'$ denotes the expectation with respect to $W'$. For $F \in L^2(\Omega)$ given by the chaos expansion (4), the Ornstein-Uhlenbeck semigroup takes the form
$$P_t F = \sum_{k \ge 0} e^{-kt} I_k(f_k).$$
The generator of $(P_t)_{t\ge0}$ is denoted by $L$ and acts on the chaos expansion in a simple way,
$$LF = -\sum_{k \ge 1} k\, I_k(f_k).$$
The key identity that links the objects defined above is $L = -\delta D$; in particular, we have $-DL^{-1}F \in \mathrm{dom}(\delta)$ for all $F \in L^2(\Omega)$. We end this subsection with a fundamental product formula for multiple integrals.
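In the one-dimensional case, where $F = f(N)$ for a single standard Gaussian $N$, Mehler's formula reads $P_t f(x) = E[f(e^{-t}x + \sqrt{1-e^{-2t}}\,N')]$, and the action on chaoses becomes $P_t H_k = e^{-kt} H_k$. The following sketch checks this identity numerically at one point; it is an illustration under these one-dimensional assumptions, with hypothetical function names and a Simpson quadrature standing in for the Gaussian expectation.

```python
import math

def hermite(n, x):
    """Probabilists' Hermite polynomial H_n(x) via H_{k+1} = x H_k - k H_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h

def gaussian_expectation(f, half_width=12.0, panels=4800):
    """Composite Simpson approximation of E[f(N)] for N ~ N(0,1)."""
    step = 2.0 * half_width / panels
    total = 0.0
    for i in range(panels + 1):
        y = -half_width + i * step
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * f(y) * math.exp(-0.5 * y * y)
    return total * step / 3.0 / math.sqrt(2.0 * math.pi)

def mehler(f, t, x):
    """P_t f(x) = E[f(e^{-t} x + sqrt(1 - e^{-2t}) N')] for a standard Gaussian N'."""
    a = math.exp(-t)
    b = math.sqrt(1.0 - a * a)
    return gaussian_expectation(lambda y: f(a * x + b * y))

t, x, k = 0.7, 1.3, 3                         # arbitrary test values
lhs = mehler(lambda z: hermite(k, z), t, x)   # Mehler's formula applied to H_k
rhs = math.exp(-k * t) * hermite(k, x)        # expected action on the k-th chaos
print(lhs, rhs)
```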

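Proposition 4 (Product formula). Let $p, q$ be non-negative integers. Let $f \in H^{\otimes p}$ and $g \in H^{\otimes q}$ be symmetric kernels. We have
$$I_p(f)\, I_q(g) = \sum_{r=0}^{p \wedge q} r! \binom{p}{r} \binom{q}{r} I_{p+q-2r}\big(f\, \widetilde{\otimes}_r\, g\big),$$
where $f\, \widetilde{\otimes}_r\, g$ is the symmetrized $r$-th contraction of $f$ and $g$, see [11, p. 208] for a definition.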
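In the special case $f = h^{\otimes p}$, $g = h^{\otimes q}$ with $\|h\|_H = 1$, and using the relation $H_p(W(h)) = I_p(h^{\otimes p})$, the product formula reduces to the classical linearisation identity $H_p H_q = \sum_{r=0}^{p\wedge q} r!\binom{p}{r}\binom{q}{r} H_{p+q-2r}$ for probabilists' Hermite polynomials. The sketch below verifies this one-dimensional identity with exact integer arithmetic; the helper names are hypothetical.

```python
from math import comb, factorial

def hermite_coeffs(n):
    """Integer coefficients (lowest degree first) of the probabilists' Hermite polynomial H_n."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + cur                  # x * H_k
        for i, c in enumerate(prev):     # minus k * H_{k-1}
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def product_formula_rhs(p, q):
    """sum_{r=0}^{min(p,q)} r! C(p,r) C(q,r) H_{p+q-2r}, as a coefficient list."""
    out = [0] * (p + q + 1)
    for r in range(min(p, q) + 1):
        weight = factorial(r) * comb(p, r) * comb(q, r)
        for i, c in enumerate(hermite_coeffs(p + q - 2 * r)):
            out[i] += weight * c
    return out

# H_3 * H_2 = H_5 + 6 H_3 + 6 H_1 = x^5 - 4 x^3 + 3 x
print(poly_mul(hermite_coeffs(3), hermite_coeffs(2)))
print(product_formula_rhs(3, 2))
```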

2.2. Malliavin-Stein approach. We make use of an identity (labeled below as (6)) first noted by Jaramillo and Nualart in [8].
First of all, we observe that any stationary centered Gaussian sequence $X = \{X_k : k \in \mathbb{Z}\}$ is embedded in an isonormal Gaussian process $W = \{W(h) : h \in H\}$. This means that there always exists a Hilbert space $H$ and an isonormal Gaussian process $W$ (defined on the same probability space) such that, for some $\{e_k : k \ge 1\} \subset H$, $W(e_k) = X_k$ for all $k$, and consequently $E[W(e_k)W(e_l)] = \langle e_k, e_l\rangle_H = \rho(k - l)$ for all $k, l$ (see, e.g., [9, Section 1] for a justification of this fact).

.3.1]) for d TV and then by integration by parts via (5), we have that
where we used the fact that $E\langle DV_n, u_n\rangle_H = E[V_n^2] = 1$, and the class $\mathcal{G}$ is composed of those $g : \mathbb{R} \to \mathbb{R}$ such that $\|g\|_\infty < \frac{\sqrt{2\pi}}{2}$ and $\|g'\|_\infty \le 2$. Now we estimate from above the variance in the above bound. Note that, by the chain rule and the relation $DX_k = e_k$,
$$\langle DV_n, u_n\rangle_H = \frac{1}{\sigma_n^2\, n} \sum_{k,\ell=1}^{n} \varphi'(X_k)\, \varphi_1(X_\ell)\, \rho(k - \ell).$$
Hence,
The following relation is a consequence of Meyer's inequality and of the equivalence of Sobolev norms [16, p. 72], justifying our integrability assumption on $\varphi$. Its proof is given in [18, Lem. 2.2].

Note that
so that the covariance in (8) is finite.

2.3. Gebelein's inequality. Up to some slight adaptation, Theorem 6 can be deduced from Veraar's paper [22]. For the sake of completeness, in the Appendix contained in Section 5, we will however present an independent proof of such a result (inspired by the approach of [22]), using tools and concepts that are directly connected to the framework of isonormal Gaussian processes.
Recall that an $L^2$ functional of an isonormal Gaussian process is said to have Hermite rank $d$ if its projection to the first $d - 1$ chaoses is zero, and its projection to the $d$-th chaos is non-trivial.
Theorem 6 (Gebelein's inequality for isonormal processes). Let $W = \{W(h) : h \in H\}$ be an isonormal Gaussian process over some real separable Hilbert space $H$, and let $H_1, H_2$ be two Hilbert subspaces of $H$. Define $W_1$ and $W_2$, respectively, to be the restriction of $W$ to $H_1$ and $H_2$. Now consider two measurable mappings $F_i : \mathbb{R}^{H_i} \to \mathbb{R}$, $i = 1, 2$, and assume that each $F_i(W_i)$ is centred and square-integrable. If $F_1$ has Hermite rank equal to $p \ge 1$, one has that
where $\theta := \sup$
2.4. k-sparsity. As we will see in the next section, combining Gebelein's inequality with the Malliavin-Stein approach will lead to effective upper bounds for the total variation distance in the Breuer-Major CLT.
To this end, we need information on the Hermite rank of functionals of the type $F := \varphi'(W(h))\,\varphi_1(W(g))$ for $h, g \in H$ with unit norm, and $\varphi \in \mathbb{D}^{1,4}$. We introduce the notion of $k$-sparsity.
Definition 1. Let $\varphi \in L^2(\mathbb{R}, \gamma)$ be given by the series expansion $\varphi = \sum_{q \ge d} a_q H_q$. We say that $\varphi$ is $k$-sparse if $\min\{j - i : j > i \ge d,\ a_i \ne 0,\ a_j \ne 0\} \ge k$.
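The definition of $k$-sparsity translates directly into a check on the support of the Hermite coefficients. The sketch below is a minimal illustration: it assumes the coefficients are stored in a list indexed by the Hermite order $q$ (with zeros below the Hermite rank), and the function names are hypothetical.

```python
def sparsity_gap(coeffs, tol=0.0):
    """Smallest gap j - i between indices of non-zero coefficients (inf if fewer than two).

    The minimum over all pairs j > i is reached by two consecutive support indices.
    """
    support = [q for q, a in enumerate(coeffs) if abs(a) > tol]
    gaps = [j - i for i, j in zip(support, support[1:])]
    return min(gaps) if gaps else float("inf")

def is_k_sparse(coeffs, k, tol=0.0):
    """True when min{j - i : j > i, a_i != 0, a_j != 0} >= k."""
    return sparsity_gap(coeffs, tol) >= k

even_function = [0.0, 0.0, 0.4, 0.0, -0.03, 0.0, 0.005]   # only even orders: 2-sparse
mixed_function = [0.0, 1.0, 1.0]                            # H_1 + H_2: not 2-sparse
single_chaos = [0.0, 0.0, 1.0]                              # H_2 alone: k-sparse for every k

print(is_k_sparse(even_function, 2), is_k_sparse(mixed_function, 2), is_k_sparse(single_chaos, 10))
```

With this convention every function is 1-sparse, and a symmetric function, whose odd coefficients vanish, is 2-sparse.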
Lemma 7. Assume that $\varphi \in \mathbb{D}^{1,4}$ is 2-sparse and set $F := \varphi'(W(h))\,\varphi_1(W(g))$ for $h, g \in H$ with unit norm. Then $F - E[F]$ has Hermite rank at least 2.
Proof. By [11, Th. 2.7.7], we have $H_p(W(e)) = I_p(e^{\otimes p})$ for $\|e\|_H = 1$. Thus,
$$\varphi'(W(h))\,\varphi_1(W(g)) = \sum_{q \ge d} \sum_{p \ge d} q\, a_q a_p\, I_{q-1}\big(h^{\otimes (q-1)}\big)\, I_{p-1}\big(g^{\otimes (p-1)}\big),$$
where the series converges in $L^2(\Omega)$. By 2-sparsity, only those products of multiple integrals with indices $(p, q)$ satisfying $p = q$ or $|p - q| \ge 2$ remain. Assume $|p - q| \ge 2$. By Proposition 4, the multiple integral of lowest order in the chaos expansion of the product is $I_{|p-q|}(\cdot)$, hence the projection of $I_{q-1}(h^{\otimes (q-1)})\, I_{p-1}(g^{\otimes (p-1)})$ to the first chaos is zero. If $p = q$, Proposition 4 shows that the chaos expansion of the product contains only multiple integrals of even order, ending the proof.

Gebelein-Malliavin-Stein upper bound.
Putting things together, we have the following Gebelein-Malliavin-Stein upper bound for the total variation distance.
Proposition 8. Let $\varphi(X_1) \in \mathbb{D}^{1,4}$ have Hermite rank $d \ge 1$, and define $V_n = F_n/\sigma_n$ according to (2) and $\sigma_n^2 := \mathrm{Var}(F_n)$. We have
If, in addition, $\varphi$ is 2-sparse, then
Proof. We evaluate the right-hand side of (8)
The conclusion follows from symmetry, and by using the estimate (9).

Proof of the main result
We are now ready to prove Theorem 3. We set $\rho_n(k) = |\rho(k)|\, 1_{\{|k| < n\}}$.
Proof of (i). We have
the last inequality being obtained by applying Young's inequality for convolutions twice. The result follows from Proposition 8.
Proof of (ii). First we rewrite the sum of products as a sum of the product of convolutions by introducing the function $1_n(k) := 1_{\{|k| < n\}}$.
We have
Let $b \in [1, 2]$. Applying successively Hölder's inequality and Young's inequality, we are led to
The result follows from Proposition 8.
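Both parts of the proof rely on Young's inequality for convolutions on $\mathbb{Z}$, namely $\|f * g\|_r \le \|f\|_p \|g\|_q$ whenever $1/p + 1/q = 1 + 1/r$. The sketch below checks two instances numerically for truncated sequences of the type $\rho_n$ and $1_n$. It is an illustration only: the polynomially decaying covariance, the exponents and the function names are assumptions, and the exponents are not claimed to be the ones used in the proof.

```python
def lp_norm(seq, p):
    """l^p norm of a finitely supported sequence, 1 <= p < infinity."""
    return sum(abs(x) ** p for x in seq) ** (1.0 / p)

def convolve(f, g):
    """Full discrete convolution (f * g)(m) = sum_k f(k) g(m - k) of two finite sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out

def truncated_rho(n, decay=0.75):
    """Hypothetical example rho_n(k) = (1 + |k|)^(-decay) for |k| < n, listed for k = -(n-1), ..., n-1."""
    return [(1.0 + abs(k)) ** (-decay) for k in range(-(n - 1), n)]

n = 50
rho_n = truncated_rho(n)
one_n = [1.0] * (2 * n - 1)      # the indicator 1_n(k) = 1_{|k| < n}

# First instance: p = q = 4/3 and r = 2, so that 1/p + 1/q = 3/2 = 1 + 1/r.
lhs = lp_norm(convolve(rho_n, rho_n), 2.0)
rhs = lp_norm(rho_n, 4.0 / 3.0) ** 2
print(lhs, rhs, lhs <= rhs)

# Second instance: p = 1 and q = r = 2, involving the indicator 1_n.
lhs2 = lp_norm(convolve(rho_n, one_n), 2.0)
rhs2 = lp_norm(rho_n, 1.0) * lp_norm(one_n, 2.0)
print(lhs2, rhs2, lhs2 <= rhs2)
```

Shifting the index of a finitely supported sequence does not change its norms, which is why plain lists can stand in for sequences indexed by $\mathbb{Z}$.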

A remark on optimality
Our Gebelein-Malliavin-Stein upper bound (Proposition 8) cannot provide the rate $n^{-1/2}$ in the case where $\rho$ is square-summable but not summable. Indeed, restricting ourselves to the subset of indices $\{i = j = k\}$, we obtain that
goes to infinity as $n \to \infty$.

Appendix: Proof of Theorem 6
We start by proving a similar result in a simpler setting, to which we can reduce the general case.
Proposition 9. Let $(X, Y)$ be a pair of jointly isonormal Gaussian processes over $H$, such that $X, Y$ are rigidly correlated, in the following sense: there exists $\theta \in [-1, 1]$ such that, for every $h, g \in H$
Consider measurable mappings $F : \mathbb{R}^H \to \mathbb{R}$ and $G : \mathbb{R}^H \to \mathbb{R}$ such that $F(X)$ and $G(Y)$ are square-integrable and centred, and assume that $F$ has Hermite rank $p \ge 1$. Then,
Proof. Let $\{e_i : i \ge 1\}$ be an orthonormal basis of $H$. We write $\alpha, \beta, \dots$ to indicate multi-indices; for a multi-index $\alpha$, the symbol $H_\alpha$ indicates the corresponding multivariate polynomial. We also write
where $H_k$ stands for the $k$th Hermite polynomial in one variable; $H_\alpha(Y)$ is defined analogously. From the properties of Hermite polynomials and from the rigid correlation assumption, we infer that, for any choice of multi-indices $\alpha, \beta$,
Now, by the chaotic representation property of isonormal processes, one has that
with convergence in $L^2(\Omega)$. By virtue of the previous discussion,
$|b_\alpha c_\alpha|\, \alpha!$, and the conclusion follows from an application of the Cauchy-Schwarz inequality.
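The mechanism of the proof can be illustrated in the simplest one-dimensional situation. The sketch below assumes two standard Gaussian variables $X, Y$ with correlation $\theta$ and uses the classical relation $E[H_j(X) H_k(Y)] = \delta_{jk}\, k!\, \theta^k$ (stated here as an assumption of the example). For $F = \sum_k b_k H_k(X)$ of Hermite rank $p$ and $G = \sum_k c_k H_k(Y)$, it compares $|E[F G]|$ with $|\theta|^p \sqrt{E[F^2]}\sqrt{E[G^2]}$, the bound that the Cauchy-Schwarz inequality delivers in this setting. The coefficient values and function names are hypothetical.

```python
import math

def covariance(b, c, theta):
    """E[F(X) G(Y)] = sum_k k! b_k c_k theta^k for F = sum b_k H_k(X), G = sum c_k H_k(Y)."""
    return sum(math.factorial(k) * bk * ck * theta ** k
               for k, (bk, ck) in enumerate(zip(b, c)))

def l2_norm(a):
    """sqrt(E[F^2]) = sqrt(sum_k k! a_k^2) for F = sum a_k H_k(N)."""
    return math.sqrt(sum(math.factorial(k) * ak * ak for k, ak in enumerate(a)))

def hermite_rank(a):
    """Index of the first non-zero coefficient of order at least 1."""
    return next(k for k, ak in enumerate(a) if k >= 1 and ak != 0)

b = [0.0, 0.0, 0.5, 0.0, -0.1, 0.02]    # F centred with Hermite rank 2 (hypothetical coefficients)
c = [0.0, 0.3, 0.4, -0.2, 0.05, 0.01]   # G centred (hypothetical coefficients)
p = hermite_rank(b)

for theta in (-0.9, -0.3, 0.2, 0.7):
    lhs = abs(covariance(b, c, theta))
    rhs = abs(theta) ** p * l2_norm(b) * l2_norm(c)
    print(theta, lhs, rhs, lhs <= rhs)
```

The factor $|\theta|^p$ appears because every term of order $k \ge p$ carries $|\theta|^k \le |\theta|^p$.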
We now turn to the proof of Theorem 6.