Poincaré Inequalities and Normal Approximation for Weighted Sums

Under Poincaré-type conditions, we explore upper bounds for the Kolmogorov distance between the distributions of weighted sums of dependent summands and the normal law. Based on improved concentration inequalities on high-dimensional Euclidean spheres, the results extend and refine previously known bounds to non-symmetric models.


Introduction
Let X = (X_1, . . . , X_n) be an isotropic random vector in R^n (n ≥ 2), meaning that E X_i X_j = δ_{ij} for all i, j ≤ n, where δ_{ij} is the Kronecker symbol. Define the weighted sums S_θ = θ_1 X_1 + · · · + θ_n X_n, θ = (θ_1, . . . , θ_n), θ_1^2 + · · · + θ_n^2 = 1, with coefficients from the unit sphere S^{n−1} in R^n. We are looking for natural general conditions on the X_k which guarantee that the distribution functions F_θ(x) = P{S_θ ≤ x} are well approximated, for most θ ∈ S^{n−1}, by the standard normal distribution function Φ. Of special interest is the question of possible rates in the Kolmogorov distance ρ(F_θ, Φ) = sup_x |F_θ(x) − Φ(x)|. In this problem, going back to the seminal work of Sudakov [35], the well-studied classical case of independent components may serve as a basic example for comparison with various models of dependence. Let us recall that, if the X_k are independent and have finite 4-th moments (with mean zero and variance one), there is an upper bound on average, E_θ ρ(F_θ, Φ) ≤ c β_4/n, where β_4 = max_k E X_k^4, c > 0 is an absolute constant, and where we use E_θ to denote the integral over the uniform probability measure s_{n−1} on the unit sphere. Moreover, for any r > 0, the set of θ on which ρ(F_θ, Φ) exceeds a multiple of β_4 r/n has exponentially small measure. This non-trivial phenomenon was observed by Klartag and Sodin [26]. It shows that when β_4 is bounded, as in the i.i.d. situation, the distances ρ(F_θ, Φ) turn out to be typically of order at most 1/n. This is in contrast to the case of equal coefficients, which leads to the unimprovable standard 1/√n-rate (in general, including independent Bernoulli summands X_k). Moreover, in the i.i.d. situation with finite moment β_5 = E |X_1|^5 and symmetric underlying distributions, the typical rate of normal approximation for F_θ may be further improved to β_5 n^{−3/2} up to a constant (which is best possible as long as E X_1^4 = 3, cf. [8]).
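The contrast between a typical and the equal-coefficient direction can be observed in a small exact computation (a numerical illustration of our own, not part of the formal argument). The sketch below takes i.i.d. symmetric Bernoulli summands X_k with n = 16, enumerates all 2^16 sign patterns, and compares the exact Kolmogorov distance to Φ for equal coefficients against a generic, non-symmetric coefficient vector θ_k ∝ k:

```python
import itertools
import math

def phi_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_dist(theta):
    """Exact Kolmogorov distance rho(F_theta, Phi) for S_theta = sum theta_k X_k
    with i.i.d. symmetric Bernoulli X_k, by enumerating all 2^n sign patterns."""
    n = len(theta)
    sums = sorted(sum(e * t for e, t in zip(eps, theta))
                  for eps in itertools.product((-1.0, 1.0), repeat=n))
    p = 1.0 / len(sums)            # each sign pattern has probability 2^{-n}
    dist, cum = 0.0, 0.0
    for s in sums:
        dist = max(dist, abs(phi_cdf(s) - cum))   # CDF value just below the atom
        cum += p
        dist = max(dist, abs(phi_cdf(s) - cum))   # CDF value just above the atom
    return dist

n = 16
theta_eq = [1.0 / math.sqrt(n)] * n              # equal coefficients: 1/sqrt(n) rate
w = [float(k) for k in range(1, n + 1)]          # a generic direction, theta_k ~ k
norm = math.sqrt(sum(x * x for x in w))
theta_gen = [x / norm for x in w]

d_eq, d_gen = kolmogorov_dist(theta_eq), kolmogorov_dist(theta_gen)
print(d_eq, d_gen)   # the generic theta is much closer to the normal law
```

Already at n = 16 the generic direction is several times closer to Φ than the equal-coefficient one, in line with the Klartag–Sodin phenomenon.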
As for more general models with not necessarily independent components X_k, the study of this high-dimensional phenomenon has a long history, and we refer the interested reader to the book [15] and a recent paper [13] for an account of various results in this direction. Let us only mention [2], [5], [6], [34], [23], [24], [18], where one can find quantitative variants of Sudakov's theorem on the concentration of F_θ about the typical (average) distribution F = E_θ F_θ and/or about the normal law Φ for different metrics and under certain assumptions (of convexity-type, for example). Some papers provide Berry-Esseen-type estimates on the closeness of F_θ to Φ explicitly in terms of θ, assuming that the distribution of the random vector X is "sufficiently" symmetric, cf. [29], [30], [19], [25], [21].
Whether or not F itself is close to the standard normal law represents a thin-shell problem on the concentration of the values of the square of the Euclidean norm |X| about its mean E |X|^2 = n (or, in essence, on the concentration of |X| about √n). The rate of concentration may be controlled in terms of the functional σ_4^2 = (1/n) Var(|X|^2), which is often of order 1 (including the i.i.d. situation). Once this is the case, one can obtain a standard rate of concentration of F_θ around Φ on average under mild moment assumptions. For example, it is known that, if E |X|^2 = n (without the isotropy hypothesis), then, up to an absolute constant c > 0, the standard 1/√n-rate holds on average, with a bound involving M_3^3 = sup_θ E |S_θ|^3 (cf. [12]). In order to reach better rates, one has to involve stronger assumptions or functionals such as Λ = Λ(X), defined as an optimal constant in the inequality Var(Σ_{j,k} a_{jk} X_j X_k) ≤ Λ Σ_{j,k} a_{jk}^2 (1.3) over all real coefficients a_{jk}, which may be referred to as a second order correlation condition. In terms of Λ, the bound (1.1) has been extended in [13] modulo a logarithmic factor: if, additionally to the isotropy assumption, the distribution of X is symmetric around the origin, it was shown that E_θ ρ(F_θ, Φ) ≤ c Λ (log n)/n (1.4). The optimal value Λ = Λ(X) in (1.3) is finite as long as |X| has a finite 4-th moment. It represents the maximal eigenvalue of the covariance matrix associated to the n^2-dimensional random vector (X_i X_j − E X_i X_j)_{i,j=1}^n. This parameter may be effectively estimated in many examples and is related to other standard characteristics. For example, Λ(X) ≤ 2 max_k E X_k^4 if the X_k are independent. If X is isotropic, and its distribution admits a Poincaré-type inequality λ_1 Var(u(X)) ≤ E |∇u(X)|^2 (1.5) with a positive (optimal) constant λ_1 = λ_1(X) for all smooth functions u on R^n, then we have Λ(X) ≤ 4/λ_1(X). The aim of these notes is to sharpen (1.4) via a large deviation bound in analogy with (1.2). This turns out to be possible as long as all linear forms S_θ have finite exponential moments.
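In the i.i.d. situation the functional σ_4^2 is easy to inspect: Var(|X|^2) = Σ_k Var(X_k^2) = n Var(X_1^2), so σ_4^2 = Var(X_1^2) is dimension-free. A Monte Carlo sanity check (our own illustration; for standard normal coordinates Var(X_1^2) = 2, for symmetric Bernoulli coordinates |X|^2 = n identically):

```python
import random

def sigma4_sq(sample_coord, n, trials, seed=7):
    """Monte Carlo estimate of sigma_4^2 = Var(|X|^2)/n for a random vector
    with i.i.d. coordinates drawn by sample_coord(rng)."""
    rng = random.Random(seed)
    vals = [sum(sample_coord(rng) ** 2 for _ in range(n)) for _ in range(trials)]
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / (trials - 1) / n

# standard normal coordinates: Var(X_1^2) = E X_1^4 - (E X_1^2)^2 = 3 - 1 = 2
s_gauss = sigma4_sq(lambda r: r.gauss(0.0, 1.0), n=10, trials=20000)

# symmetric Bernoulli coordinates: X_1^2 = 1, so |X|^2 = n and sigma_4^2 = 0
s_bern = sigma4_sq(lambda r: r.choice((-1.0, 1.0)), n=10, trials=2000)
print(s_gauss, s_bern)
```

The Bernoulli case also illustrates the degenerate thin shell |X| = √n a.s., for which the average distribution F is especially close to Φ.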
To avoid technical discussions, we restrict ourselves to the case where λ_1 > 0, which at the same time allows us to drop the symmetry assumption.

Theorem 1.1. Let X be an isotropic random vector in R^n with mean zero and a positive Poincaré constant λ_1. Then, with some absolute constant c > 0, the corresponding bound holds on average, and moreover a large deviation bound holds for all r > 0.

Being restricted to isotropic log-concave distributions, an interesting feature of the bound (1.4) is its connection with certain open problems in Asymptotic Convex Geometry such as the K-L-S and thin-shell conjectures. Namely, modulo n-dependent logarithmic factors, the following three assertions are equivalent up to positive constants c and β (perhaps different in different places) for the entire class of isotropic random vectors X in R^n having symmetric log-concave distributions (cf. [13]): (i) λ_1(X) ≥ c; (ii) Var(|X|) ≤ β; (iii) E_θ ρ(F_θ, Φ) ≤ β/n. In this connection, let us also mention a recent paper by Jiang, Lee and Vempala [22], which provides a reformulation of (i)-(ii) as a central limit theorem for random variables of the form ⟨X, Y⟩, where Y is an independent copy of X. Note that the implication (i) ⇒ (ii) is immediate when applying (1.5) to u(x) = |x|, while the reverse statement is a deep theorem due to Eldan [17]. By (1.4), we also have (i) ⇒ (iii). As for the implication (iii) ⇒ (ii), it holds true in view of a general relation c Var(|X|) ≤ n (log n)^4 E_θ ρ(F_θ, Φ) + 1 (which only requires that all S_θ have a finite and bounded exponential moment).
The symmetry assumption is irrelevant both in (i) and (ii). However, this is not so obvious concerning (iii). Indeed, one may try to use a symmetrization argument by applying (1.4) to the random vector X′ = (X − Y)/√2, where Y is an independent copy of X. But then we need a quantitative form of a particular variant of Cramér's theorem: if η is an independent copy of a random variable ξ with mean zero and variance one, and if ξ′ = (ξ − η)/√2 is almost standard normal, then so is ξ. The best result in this direction is the following theorem due to Sapogov [33]: given that ξ′ is ε-close to the standard normal law in the Kolmogorov distance, ξ is close to it as well, but only at a logarithmic rate in 1/ε, up to some absolute constant C, where F_ξ and F_{ξ′} denote the distribution functions of ξ and ξ′. Moreover, the dependence in ε on the right-hand side cannot be improved, as was shown in [16] (cf. also [9] for a related model). Thus, the resulting bound on E_θ ρ(F_θ, Φ) which can be derived this way on the basis of X′ cannot yield even a standard rate.
Here, we choose a different route. As we will see, it is possible to remove the symmetry hypothesis by adding to the right-hand side of (1.4) an additional term responsible for higher order correlations between the X_k. More precisely, as a preliminary bound based on the Λ-functional only, the inequality (1.8) will be established. The last expectation in (1.8) is vanishing for symmetric distributions, or, for example, if |X| = √n a.s. As another scenario, the second term in (1.8) is of smaller order in comparison with (log n/n) λ_1^{−1} when (1.5) holds. Nevertheless, in contrast to the bound (1.4), the derivation of (1.8) turns out to be tedious, since it involves a careful analysis of the projections of the characteristic functions f_θ(t) of S_θ, viewed as functions of θ, onto the subspace of all linear functions in the Hilbert space L^2(R^n, s_{n−1}).
The paper is organized as follows. We start with the study of densities of linear functionals on the sphere S^{n−1}, viewed as random variables with respect to the normalized Lebesgue measure s_{n−1}. Here, the aim will be to refine the asymptotic normality of these distributions in analogy with Edgeworth expansions in the central limit theorem (which we consider up to order 2, Sections 2-3). Then we turn to the problem of deviations of general smooth functions on S^{n−1} in terms of their Hessians, recalling and extending several results in this direction (Section 4). These results are applied in Section 5 to the characteristic functions f_θ(t), with a separate treatment of their linear parts in L^2(s_{n−1}) in Section 6. In Section 7, we adapt basic Fourier analytic tools in the form of Berry-Esseen-type bounds to the scheme of weighted sums. Deviations of the involved integrals as functions on the sphere are discussed separately in Section 8. Section 9 collects several general facts about Poincaré-type inequalities that will be needed for the proof of Theorem 1.1, while the final steps of the proof are deferred to the remaining Sections 10-12.
As usual, the Euclidean space R n is endowed with the canonical norm | · | and the inner product ·, · . We denote by c a positive absolute constant which may vary from place to place (if not stated explicitly that c depends on some parameter).

Distribution of Linear Functionals on the Sphere
By the rotational invariance of s_{n−1}, all linear functionals u(θ) = ⟨θ, v⟩ with |v| = 1 have equal distributions. Hence, it is sufficient to focus just on the first coordinate θ_1 of the vector θ ∈ S^{n−1}, viewed as a random variable on the probability space (S^{n−1}, s_{n−1}). It is well-known that this random variable has density c_n (1 − x^2)^{(n−3)/2}, |x| < 1, with respect to the Lebesgue measure on the real line, where c_n is a normalizing constant. We denote by ϕ_n the density of the normalized first coordinate √n θ_1, i.e., ϕ_n(x) = c′_n (1 − x^2/n)_+^{(n−3)/2} with c′_n = c_n/√n. Clearly, ϕ_n(x) → ϕ(x) = (1/√(2π)) e^{−x^2/2} as n → ∞, and one can also show that c′_n < 1/√(2π) for all n.
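These formulas are easy to check numerically. The following sketch (our own illustration; the constant c′_n is evaluated through the standard Gamma-function normalization c′_n = Γ(n/2)/(Γ((n−1)/2)√(πn))) verifies that ϕ_n integrates to one, that c′_n < 1/√(2π), and that sup_x |ϕ_n(x) − ϕ(x)| decays roughly like 1/n:

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / SQRT2PI

def c_prime(n):
    """Normalizing constant c'_n of the density of sqrt(n)*theta_1
    (Gamma-function form of the beta normalization)."""
    return math.gamma(n / 2.0) / (math.gamma((n - 1) / 2.0) * math.sqrt(math.pi * n))

def phi_n(x, n):
    """Density of sqrt(n)*theta_1, theta uniform on S^{n-1} (n >= 3)."""
    u = 1.0 - x * x / n
    return c_prime(n) * u ** ((n - 3) / 2.0) if u > 0.0 else 0.0

def total_mass(n, steps=100000):
    """Midpoint-rule integral of phi_n over its support [-sqrt(n), sqrt(n)]."""
    a = math.sqrt(n)
    h = 2.0 * a / steps
    return h * sum(phi_n(-a + (i + 0.5) * h, n) for i in range(steps))

def sup_dist(n):
    """sup_x |phi_n(x) - phi(x)| on a grid; expected to decay like 1/n."""
    return max(abs(phi_n(0.01 * j, n) - phi(0.01 * j)) for j in range(-500, 501))
```

For instance, sup_dist(40) comes out roughly four times smaller than sup_dist(10), consistent with a first-order error of order 1/n.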
Deviations for ϕ n (x) from ϕ(x) have been considered in [12]. In particular, if n ≥ 3, then for all x ∈ R, We need to sharpen this bound by obtaining an approximation for ϕ n (x) with an error of order 1/n 2 by means of a suitable modification of the standard normal density. Denote by H 4 (x) = x 4 − 6x 2 + 3 the 4-th Chebyshev-Hermite polynomial.
Proposition 2.1. For all x ∈ R and n ≥ 3, |ϕ_n(x) − ϕ(x) (1 − (1/(4n)) H_4(x))| ≤ c/n^2 with some absolute constant c (2.2).

Proof. In the interval |x| ≤ (1/2)√n, consider the function p_n(x) = (1 − x^2/n)_+^{(n−3)/2}. Using the Taylor expansion for the logarithmic function near zero, one may expand log p_n(x). The remainder term involves some 0 ≤ ε ≤ 1, and by the assumption that x^2 ≤ n/4, it is bounded accordingly. Moreover, using once more x^2 ≤ n/4, we obtain a corresponding expansion with some |ε_1| ≤ 1. To derive a similar expansion for ϕ_n(x), denote by Z a standard normal random variable. From (2.3) we obtain the behavior of the normalizing constants. Here we used the property that p_n(x) has a sufficiently fast decay for |x| ≥ (1/2)√n, as indicated in (2.1). Since ϕ_n(x) = c′_n p_n(x) is a density, we conclude that, in the interval |x| ≤ (1/2)√n, ϕ_n(x) = ϕ(x)(1 − (1/(4n)) H_4(x)) + Q_n(x)/n^2 with a quantity Q_n(x) bounded by a universal constant in absolute value. In view of (2.1), the bound (2.2) follows immediately.
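Proposition 2.1 predicts that replacing ϕ by the corrected density ϕ(x)(1 − H_4(x)/(4n)) improves the approximation order from 1/n to 1/n^2. This can be observed numerically (our own check, not part of the proof): halving is replaced by roughly a factor of four when n is doubled.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)

def phi(x):
    return math.exp(-x * x / 2.0) / SQRT2PI

def phi_n(x, n):
    """Density of sqrt(n)*theta_1 for theta uniform on S^{n-1} (n >= 3)."""
    cpn = math.gamma(n / 2.0) / (math.gamma((n - 1) / 2.0) * math.sqrt(math.pi * n))
    u = 1.0 - x * x / n
    return cpn * u ** ((n - 3) / 2.0) if u > 0.0 else 0.0

def corrected(x, n):
    """phi(x) * (1 - H_4(x)/(4n)) with H_4(x) = x^4 - 6x^2 + 3."""
    return phi(x) * (1.0 - (x ** 4 - 6.0 * x * x + 3.0) / (4.0 * n))

def err(n):
    """sup_x |phi_n(x) - corrected(x, n)| on a grid; expected decay ~ 1/n^2."""
    return max(abs(phi_n(0.02 * j, n) - corrected(0.02 * j, n))
               for j in range(-250, 251))

e20, e40 = err(20), err(40)
print(e20, e40)   # roughly a factor 4 apart
```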

Characteristic Function of Linear Functionals
In the sequel, we denote by J_n = J_n(t) the characteristic function of the first coordinate θ_1 of a random vector θ = (θ_1, . . . , θ_n) which is uniformly distributed on the unit sphere S^{n−1}.
In a more explicit form, for any t ∈ R, J_n(t) = c_n ∫_{−1}^{1} e^{itx} (1 − x^2)^{(n−3)/2} dx. This is just a multiple of the Bessel function of the first kind with index ν = n/2 − 1 ([3], p. 81). Thus, the characteristic function of the normalized first coordinate √n θ_1 is given by φ̂_n(t) = J_n(t√n), which is the Fourier transform of the probability density ϕ_n. Proposition 2.1 can be used to compare φ̂_n(t) with the Fourier transform of the "corrected Gaussian measure", as well as to compare the derivatives of these transforms.
Moreover, for any k = 1, 2, . . . , analogous bounds hold for the k-th derivatives of these transforms. Taking k = 1 gives the first-order case. One may also add a t-dependent factor on the right-hand side. For t of order 1, this can be done just by virtue of Taylor's formula. Indeed, the functions in question have equal first three derivatives at zero. Since, by Proposition 3.1, |f_n(t)| ≤ c/n^2, Taylor's formula refines this proposition in the interval |t| ≤ 1.
These approximations may be complemented by a Gaussian decay bound (3.1).

Proof of Proposition 3.1. In general, given two integrable functions on the real line, say, p and q, the difference of their Fourier transforms p̂ and q̂ is bounded by the L^1-distance between p and q. Moreover, one may differentiate these transforms k times to get analogous bounds, as long as the integrands are integrable, which also yields the corresponding relation for the derivatives. This applies in particular to the functions p(x) = ϕ_n(x) and q(x) = ϕ(x)(1 − (1/(4n)) H_4(x)), whose Fourier transform is described as q̂(t) = e^{−t^2/2} (1 − t^4/(4n)). Since (by Stirling's formula) the normalizing constants are of the required order, it remains to apply (2.2).
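The Fourier-side statement, φ̂_n(t) ≈ e^{−t^2/2}(1 − t^4/(4n)) with an error of order 1/n^2, can be checked by direct numerical integration (our own sketch; the transform of ϕ_n is computed by the midpoint rule, using that ϕ_n is even):

```python
import math

def phi_n(x, n, cpn):
    """Density of sqrt(n)*theta_1, with precomputed normalizing constant cpn."""
    u = 1.0 - x * x / n
    return cpn * u ** ((n - 3) / 2.0) if u > 0.0 else 0.0

def fhat(t, n, steps=20000):
    """Fourier transform of phi_n at t (real, since phi_n is even):
    midpoint-rule integral of cos(t*x) * phi_n(x) over the support."""
    cpn = math.gamma(n / 2.0) / (math.gamma((n - 1) / 2.0) * math.sqrt(math.pi * n))
    a = math.sqrt(n)
    h = 2.0 * a / steps
    return h * sum(math.cos(t * x) * phi_n(x, n, cpn)
                   for x in (-a + (i + 0.5) * h for i in range(steps)))

def qhat(t, n):
    """Transform of phi(x)(1 - H_4(x)/(4n)): exp(-t^2/2) * (1 - t^4/(4n))."""
    return math.exp(-t * t / 2.0) * (1.0 - t ** 4 / (4.0 * n))

def ft_err(n):
    """Max deviation over t in [0, 3]; expected decay ~ 1/n^2."""
    return max(abs(fhat(0.2 * j, n) - qhat(0.2 * j, n)) for j in range(16))

e20, e40 = ft_err(20), ft_err(40)
```

Here fhat(0, n) recovers the total mass 1, and the maximal deviation drops by roughly a factor of four from n = 20 to n = 40.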

Deviations of Smooth Functions on the Sphere
Smooth functions u on the unit n-sphere with s_{n−1}-mean zero are known to have fluctuations of order at most 1/√n (which is the case for all linear functions). This may be seen from the Poincaré inequality on the sphere. Moreover, when u is Lipschitz, that is, |∇u(θ)| ≤ 1 for all θ ∈ S^{n−1}, there is a subgaussian exponential bound on the Laplace transform (cf. [28]). This spherical concentration phenomenon may be strengthened with respect to the dimension n for a wide subclass of smooth functions. We denote by ∇^2 u(x) the Hessian, that is, the n × n matrix of second order partial derivatives ∂_{ij} u(x), and by I_n the identity n × n matrix. Given a symmetric matrix A = (a_{ij})_{i,j=1}^n with real or complex entries, the associated Hilbert-Schmidt and operator norms are defined by ‖A‖_{HS} = (Σ_{i,j} |a_{ij}|^2)^{1/2} and ‖A‖ = max_{|v|=1} |Av|. The next proposition summarizes several results from [13] employing a second order concentration on the sphere, a property developed in [10].
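For orientation, the first-order spherical concentration can be checked by simulation. The sketch below (our own illustration) assumes the standard normalized form Var(u) ≤ E|∇_S u|^2/(n−1) of the spherical Poincaré inequality — the spectral gap of S^{n−1} is n−1, with linear functions as eigenfunctions, so u(θ) = θ_1 gives equality, while u(θ) = θ_1 θ_2 gives a strict inequality:

```python
import math
import random

rng = random.Random(3)
n, trials = 10, 20000

def sphere_point():
    """Uniform point on S^{n-1}: a normalized standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(x * x for x in g))
    return [x / r for x in g]

# u(theta) = theta_1:         |grad_S u|^2 = 1 - theta_1^2 (tangential part of e_1)
# u(theta) = theta_1*theta_2: |grad_S u|^2 = theta_1^2 + theta_2^2 - 4 theta_1^2 theta_2^2
u1, g1, u2, g2 = [], [], [], []
for _ in range(trials):
    th = sphere_point()
    u1.append(th[0]);          g1.append(1.0 - th[0] ** 2)
    u2.append(th[0] * th[1]);  g2.append(th[0] ** 2 + th[1] ** 2 - 4.0 * (th[0] * th[1]) ** 2)

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

lhs1, rhs1 = var(u1), sum(g1) / trials / (n - 1)   # equality case (eigenfunction)
lhs2, rhs2 = var(u2), sum(g2) / trials / (n - 1)   # strict inequality
```

The observed ratio lhs1/rhs1 is close to 1, matching the fact that linear functions saturate the spherical Poincaré inequality.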
Proposition 4.1. Suppose that a real-valued function u is defined and C^2-smooth in some neighborhood of S^{n−1}. If u is orthogonal to all affine functions in L^2(s_{n−1}), then the bounds (4.3)-(4.4) hold. By Markov's inequality, (4.4) yields a corresponding large deviation bound, which may be stated informally as a subexponential stochastic dominance |u| ≤ c b (Z/√n)^2 with Z ∼ N(0, 1). Thus, the deviations of u are of order at most 1/n.
We will need the following generalization of Proposition 4.1 which is more flexible in applications. Given a function u in the (complex) Hilbert space L 2 = L 2 (R n , s n−1 ), we consider its orthogonal projection l = Proj H u onto the linear space H in L 2 generated by the constant and linear functions on R n . Let us call l an affine part of u.
Proposition 4.2. Suppose that a complex-valued function u is C^2-smooth in some neighborhood of S^{n−1} and has s_{n−1}-mean zero. For any a ∈ C, the bound (4.5) holds, where l is the affine part of u. Moreover, if ‖∇^2 u − a I_n‖ ≤ 1 on S^{n−1}, then (4.6) holds as well. Here we used a standard notation for the Orlicz norm on the probability space (S^{n−1}, s_{n−1}) generated by the Young function ψ_1(r) = e^{|r|} − 1.

Proof of Proposition 4.2. The Poincaré-type inequalities (4.1) and (4.3) continue to hold in the class of all complex-valued functions u with s_{n−1}-mean zero, while (4.2) and (4.4) require slight modifications. Indeed, (4.4) may be applied separately to the real part u_0 = Re(u) and to the imaginary part u_1 = Im(u) of u, which results in corresponding bounds (4.7) for k = 0 and k = 1, assuming that the following conditions are fulfilled: a) u_0 and u_1 (that is, u) are C^2-smooth and orthogonal to all affine functions in L^2(s_{n−1}); b) ‖∇^2 u_k − a_k I_n‖ ≤ 1 on S^{n−1} with a_0 = Re(a) and a_1 = Im(a).
The latter requirement is met as long as ‖∇^2 u − a I_n‖ ≤ 1 pointwise on S^{n−1}. As for the exponential bounds in (4.7), they may equivalently be written in terms of the Orlicz ψ_1-norm. Applying the triangle inequality ‖u‖_{ψ_1} ≤ ‖u_0‖_{ψ_1} + ‖u_1‖_{ψ_1} in the Orlicz space, and noting that b_0 + b_1 is just the integral on the right-hand side in (4.5)-(4.6), we arrive at (4.9). This is a "complex" variant of the inequality (4.4), which holds for all a ∈ C under the assumption that u is C^2-smooth in some neighborhood of S^{n−1}, is orthogonal to all affine functions in L^2(s_{n−1}), and satisfies (4.8).
One may now start with an arbitrary C^2-smooth function u with mean zero, but apply these hypotheses and conclusions to the projection T u of u onto the orthogonal complement of the space H of all linear functions in L^2(s_{n−1}). This space H has dimension n, and one may choose for the orthonormal basis in H the canonical functions θ → √n θ_k (k = 1, . . . , n). Therefore, the "linear" part l = u − T u of u is described as the orthogonal projection of u in L^2(s_{n−1}) onto H, namely (4.10). The functions T u and u have identical Euclidean second derivatives. Hence, (4.5) follows from (4.3) when the latter is applied to T u, since T u and l are orthogonal in L^2. Applying (4.9) with T u in place of u, we similarly arrive at (4.11), provided that ‖∇^2 T u − a I_n‖ = ‖∇^2 u − a I_n‖ ≤ 1 on S^{n−1} as in (4.8).
To derive (4.6), it remains to use the fact that linear functions on the sphere behave like Gaussian random variables. This can be seen from (4.2), which may be applied with r = 1 to the real and imaginary parts of l/‖l‖_Lip. This gives a ψ_1-bound on l, which should be combined with (4.11), and we arrive at (4.6) due to the triangle inequality ‖u‖_{ψ_1} ≤ ‖T u‖_{ψ_1} + ‖l‖_{ψ_1}.

Concentration of Characteristic Functions
Given a random vector X = (X_1, . . . , X_n) in R^n, we consider the smooth functions u_t(θ) = f_θ(t) = E e^{it⟨X,θ⟩} (5.1), where t ∈ R serves as a parameter. For any fixed θ ∈ R^n, t → f_θ(t) represents the characteristic function of the weighted sum S_θ = ⟨X, θ⟩ with distribution function F_θ, while the s_{n−1}-mean f(t) = E_θ f_θ(t) is the characteristic function of the average distribution F = E_θ F_θ. Recall that we use E_θ to denote integrals with respect to the uniform measure s_{n−1}.
In order to control deviations of u_t from f(t) on S^{n−1} at the standard rate, the spherical concentration inequalities (4.1)-(4.2) are sufficient. Indeed, the function u_t has a gradient described in the vector form as ∇u_t(θ) = it E X e^{it⟨X,θ⟩}. Hence, under the isotropy assumption, writing ⟨∇u_t(θ), w⟩ = it E ⟨X, w⟩ e^{it⟨X,θ⟩}, we get |⟨∇u_t(θ), w⟩| ≤ |t| (E |⟨X, w⟩|^2)^{1/2} = |t| |w| for all w ∈ C^n. This gives a uniform bound |∇u_t(θ)| ≤ |t|, so that, by the spherical Poincaré inequality (4.1), Var_θ(f_θ(t)) ≤ t^2/(n−1). A similar inequality is also true for the Orlicz ψ_2-norm of f_θ(t) − f(t), generated by the Young function ψ_2(r) = e^{r^2} − 1.
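The variance bound Var_θ(f_θ(t)) ≤ t^2/(n−1), which follows from |∇u_t| ≤ |t|, is easy to probe by simulation. In the sketch below (our own illustration) X has i.i.d. symmetric Bernoulli coordinates, for which f_θ(t) = Π_k cos(t θ_k) in closed form:

```python
import math
import random

rng = random.Random(11)
n, trials, t = 12, 20000, 1.5

def f_theta(theta, t):
    """Characteristic function of S_theta for i.i.d. symmetric Bernoulli X:
    E exp(it<X, theta>) = prod_k cos(t * theta_k)."""
    p = 1.0
    for th in theta:
        p *= math.cos(t * th)
    return p

vals = []
for _ in range(trials):
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(x * x for x in g))
    vals.append(f_theta([x / r for x in g], t))

mean = sum(vals) / trials
var = sum((v - mean) ** 2 for v in vals) / (trials - 1)
bound = t * t / (n - 1)   # spherical Poincare bound, since |grad u_t| <= |t|
print(var, bound)
```

In this example the empirical variance is far below the Poincaré bound, hinting at the improved second-order concentration developed below.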
As it turns out, this rate of concentration may be improved under the second order correlation condition (1.3), at least for values of t which are not too large, by involving the characteristic Λ = Λ(X). In the isotropic case, this condition is described as the relation (5.3). Here, Λ is necessarily bounded away from zero. Indeed, (5.3) includes the inequalities E X_j^2 X_k^2 − δ_{jk} ≤ Λ as particular cases. Summing these over all j, k = 1, . . . , n leads to E |X|^4 − n ≤ n^2 Λ. But E |X|^4 ≥ (E |X|^2)^2 = n^2, so that Λ ≥ 1 − 1/n ≥ 1/2. As was proved in [13] on the basis of Proposition 4.1, if the distribution of X is isotropic and symmetric about the origin, the characteristic functions f_θ(t) satisfy (5.4) in the interval |t| ≤ A n^{1/5}, where the constant c > 0 depends on the parameter A ≥ 1 only. Moreover, (5.5) holds as well. Note that, in the symmetric case, the functions θ → f_θ(t) are even, so all u_t have zero linear parts when projecting them onto the subspace H of all linear functions in L^2(R^n, s_{n−1}). To drop the symmetry assumption, consider an orthogonal decomposition u_t(θ) = f(t) + l_t(θ) + v_t(θ) (5.6), where l_t(θ) = c_1(t) θ_1 + · · · + c_n(t) θ_n, θ = (θ_1, . . . , θ_n) ∈ R^n, is the orthogonal projection of u_t − f(t) onto H (the linear part) and v_t(θ) = u_t(θ) − f(t) − l_t(θ) is the non-linear part of u_t. By the orthogonality, E_θ |u_t(θ) − f(t)|^2 = E_θ |l_t(θ)|^2 + E_θ |v_t(θ)|^2. With these notations, the bounds (5.4)-(5.5) should be properly modified.
Proposition 5.1. Given an isotropic random vector X in R^n, in the interval |t| ≤ A n^{1/5}, the bound (5.8) holds with some constant c > 0 depending on the parameter A ≥ 1. Here l_t is the linear part of f_θ(t) in L^2(R^n, s_{n−1}) from the orthogonal decomposition (5.6). Moreover, if |t| ≤ A n^{1/6}, then (5.9) holds as well. If the distribution of X is symmetric about the origin, then l_t(θ) = 0, and we return in (5.8)-(5.9) to (5.4)-(5.5). The linear part l_t is also vanishing when X has mean zero and a constant Euclidean norm, i.e. when |X| = √n a.s. (this will be clarified in the next section).
Proof. To employ Propositions 4.1-4.2, we need to choose a suitable value a ∈ C and estimate the operator norm ‖∇^2 u_t − a I_n‖ together with the Hilbert-Schmidt norm ‖∇^2 u_t − a I_n‖_{HS}. First note that, by differentiation of (5.1), for any fixed t ∈ R, ∇^2 u_t(θ) = −t^2 E XX^⊤ e^{it⟨X,θ⟩}. Hence, a good choice is a = −t^2 f(t), in order to balance the diagonal elements in the matrix of second derivatives of u_t. For any vector w ∈ C^n, using the canonical inner product in the complex n-space, we may evaluate the quadratic form ⟨∇^2 u_t(θ) w, w⟩. Hence, by the isotropy assumption, ‖∇^2 u_t(θ)‖ ≤ t^2. In terms of the norm defined as in (4.8), this bound ensures that (5.10) holds. In addition, putting a(θ) = −t^2 f_θ(t), we have a representation for ‖∇^2 u_t(θ) − a(θ) I_n‖_{HS}^2 in which the supremum runs over all complex numbers z_{jk} such that Σ_{j,k=1}^n |z_{jk}|^2 = 1. But, under this constraint, due to the second order correlation condition, the last expectation is bounded by Λ. Since u_t and v_t have equal Hessians, we conclude that ‖∇^2 v_t(θ) − a(θ) I_n‖_{HS}^2 ≤ Λ t^4 for all θ. On the other hand, by (5.2), E_θ |a(θ) − a|^2 ≤ t^4 · t^2/(n−1). The last two bounds give E_θ ‖∇^2 u_t − a I_n‖_{HS}^2 ≤ 2Λt^4 + 4t^6, which, by Proposition 4.1, yields a first L^2-bound on v_t. One can sharpen this bound in the range |t| ≤ A n^{1/5}. Applying it in (5.7), we get a bound on E_θ |u_t(θ) − f(t)|^2, which, according to the identity in (5.12), controls E_θ ‖(a(θ) − a) I_n‖_{HS}^2. Combining this with (5.11), we get an improved Hilbert-Schmidt estimate. Hence, by Proposition 4.1 once more, E_θ |v_t(θ)|^2 ≤ (10nt^4/(n−1)^2) E_θ |l_t(θ)|^2 + (10/(n−1)^2) Λt^4 + (50n/(n−1)^4)(Λt^8 + 2t^{10}), so that, by (5.7) and according to the identity in (5.12), this gives E_θ ‖(a(θ) − a) I_n‖_{HS}^2 ≤ nt^4 (1 + 10nt^4/(n−1)^2) E_θ |l_t(θ)|^2 + (10n/(n−1)^2) Λt^8 + (50n^2/(n−1)^4)(Λt^{12} + 2t^{14}).
One can combine this with (5.11) to obtain a corresponding estimate with some constant c depending on A. Since nt^4 < A^4 n^2, by Proposition 4.1, we get the desired L^2-bound. In view of (5.7), this proves the inequality (5.8).
To get a bound for the ψ_1-norm, note that, by (5.10), the conditions of Proposition 4.2 (in its second part) are fulfilled with −(1/2) f(t) in place of a for the function u = u_t/(2t^2). Since (5.13) holds for u_t as well (provided that |t| ≤ A n^{1/5}), this inequality may be rewritten with right-hand side n ‖l_t‖_{L^2}^2 + Λ. The linear part of u is given by l_t/(2t^2). Hence, the inequality (4.6) of Proposition 4.2 yields a corresponding ψ_1-bound. Using once more Λ ≥ 1/2, the above is simplified to (5.14). Here, the last term on the right-hand side is dominated by the second-to-last term in the smaller interval |t| ≤ A n^{1/6}. Indeed, this follows from the concentration inequality (5.2). As a result, (5.14) leads to the required form (5.9).
Remark. By continuing the iteration process in the proof of Proposition 5.1, one may establish (5.8) in the intervals |t| ≤ n^α for any fixed α < 1/4.

Linear Part of Characteristic Functions
In order to make the bounds (5.8)-(5.9) effective, we need to properly estimate the L^2-norm of the linear part l_t(θ) of f_θ(t) in L^2(R^n, s_{n−1}). According to (4.10), it is described as I(t) = ‖l_t‖_{L^2(s_{n−1})}^2 = n Σ_{k=1}^n |E_θ θ_k f_θ(t)|^2. Let us find an asymptotically explicit expression for this function.
Proposition 6.1. Let X be a random vector in R^n such that E |X|^2 = n. For any t ∈ R, the characteristic function f_θ(t) = E e^{it⟨X,θ⟩}, as a function of θ on the sphere, has a linear part whose squared L^2(s_{n−1})-norm may be represented as in (6.2).
Proof. Using an independent copy Y of X, one may rewrite (6.1) equivalently as To compute the inner expectations, introduce the function where, as before, J n denotes the characteristic function of the first coordinate of a point on the unit sphere S n−1 under the normalized Lebesgue measure s n−1 . By the definition, Differentiating this equality with respect to the variable v k , we obtain that

Let us multiply this by a similar equality
to get the resulting product identity for all v, w ∈ R^n. Summing over all k ≤ n, it remains to make the substitution v = tX, w = tY and to take the expectation over (X, Y). Then we arrive at the expression (6.3). In particular, if |X| = √n a.s., the resulting quantity is vanishing as soon as X has mean zero. In fact, the property I(t) = 0 remains valid for more general random vectors. In particular, this is the case where the conditional distribution of X given |X| = r has mean zero for any r > 0. Now, let us derive an asymptotic formula for the function K_n and its derivative. Since J_n(t√n) = K_n(t^2), the expansion of Corollary 3.2 may be differentiated, and after a change of the variable we obtain, uniformly over all t, s ≥ 0, an expansion with a remainder term satisfying |ε| ≤ c/n^2 up to some absolute constant c. The latter yields the desired asymptotics, assuming that E |X|^2 = n. Hence, recalling (6.3), we obtain (6.2).
In the isotropic case, we have E |⟨X, Y⟩| ≤ √n, which leads to a corresponding improvement of the remainder term.

Berry-Esseen Bounds
The Kolmogorov distances between the distribution functions F_θ of the weighted sums S_θ = ⟨X, θ⟩ and the standard normal distribution function Φ can be explored by means of Berry-Esseen-type bounds. They involve the characteristic functions associated to F_θ(x) and the average distribution function F(x) = E_θ F_θ(x). Using the Λ-functional, let us state a few preliminary relations.
Lemma 7.1. Given a random vector X in R^n such that E |X|^2 = n, we have, for all T ≥ T_0 ≥ 1 and θ ∈ S^{n−1}, the bound (7.2). The idea of involving two parameters T and T_0 stems from the observation that the first integrand in (7.2) is small only on a relatively moderate-sized interval [0, T_0], due to the concentration property of f_θ(t) about f(t) as a function of θ (as discussed in Section 5). On the other hand, for T_0 ≤ t ≤ T with a sufficiently large T, one may hope that both f_θ(t) and f(t) will be just small in absolute value (in analogy with the case of independent components).
Proof. One can apply a general Berry-Esseen-type bound, in which ρ(U, V) is estimated by the integral of |(Û(t) − V̂(t))/t| over [0, T] plus a term of order 1/T, where U and V are arbitrary distribution functions with characteristic functions Û and V̂, respectively (cf. e.g. [7], [31], [32]). In particular, this may be applied with U = F_θ and V = F for all θ ∈ S^{n−1}. Splitting the integration in the first integral into the subintervals [0, T_0] and [T_0, T], T ≥ T_0 > 0, we then arrive at (7.3). The decay of the characteristic function f(t) for large t can be controlled in terms of the variance-type functional σ_4^2 = (1/n) Var(|X|^2), which in turn satisfies σ_4^2 ≤ Λ according to the inequality (1.3) applied with coefficients a_{jk} = δ_{jk}. Namely, write the definition (7.1) as f(t) = E J_n(t|X|). Here, one may split the expectation into the event A = {|X|^2 ≤ n/2} and its complement B. By the upper bound (3.1), the expectation over B does not exceed e^{−t^2/4}. On the other hand, by Chebyshev's inequality, the probability of A is at most of order Λ/n. Since |J_n(s)| ≤ 1 for all s ∈ R, we thus obtain that |f(t)| ≤ e^{−t^2/4} + cΛ/n for all t ∈ R, and therefore the corresponding integral over [T_0, T] is suitably small. Using these bounds in the inequality (7.3), it simplifies to the required form. The variance functional may also be used to quantify the closeness of F to the standard normal distribution function via the inequality (7.6) (cf. [11]). Since σ_4^2 ≤ Λ, (7.2) immediately follows in view of the triangle inequality for the Kolmogorov metric.
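The smoothing step can be made concrete with the classical Esseen inequality in the textbook form ρ(U, V) ≤ (2/π) ∫_0^T |(Û(t) − V̂(t))/t| dt + 24 M/(πT), where M = sup |V′| (the references [7], [31], [32] give variants with different constants; the constant 24 here is the standard one, not necessarily the paper's). A sketch of our own, with U the standardized Bernoulli sum and V = Φ:

```python
import math

def esseen_bound(u_hat, T, steps=20000, M=1.0 / math.sqrt(2.0 * math.pi)):
    """Esseen smoothing inequality against V = Phi (v_hat(t) = exp(-t^2/2)):
    rho(U, Phi) <= (2/pi) * int_0^T |u_hat(t) - v_hat(t)|/t dt + 24*M/(pi*T)."""
    h = T / steps
    integral = h * sum(abs(u_hat(t) - math.exp(-t * t / 2.0)) / t
                       for t in (h * (i + 0.5) for i in range(steps)))
    return (2.0 / math.pi) * integral + 24.0 * M / (math.pi * T)

def phi_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n = 16
u_hat = lambda t: math.cos(t / math.sqrt(n)) ** n   # cf of sum of n Bernoulli/sqrt(n)

# exact Kolmogorov distance of the standardized Bernoulli sum to Phi
rho, cum = 0.0, 0.0
for k in range(n + 1):
    x = (2 * k - n) / math.sqrt(n)
    p = math.comb(n, k) / 2.0 ** n
    rho = max(rho, abs(phi_cdf(x) - cum))
    cum += p
    rho = max(rho, abs(phi_cdf(x) - cum))

bound = esseen_bound(u_hat, T=10.0)
print(rho, bound)   # rho <= bound
```

The exact distance (about 0.1 here, driven by the atom at zero) is safely below the smoothing bound, whose second term 24M/(πT) shows why a large cut-off T is needed for fast rates.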
Lemma 7.1 may be used to derive the following upper bound on average which represents a generalization of the inequality (1.4).
Lemma 7.2. Given an isotropic random vector X in R^n, with T_0 = 4√(log n) we have (7.7), where I(t) denotes the squared L^2-norm of the linear part of f_θ(t) in L^2(s_{n−1}).
Proof. When bounding ρ(F_θ, Φ) on average with respect to s_{n−1}, the inequality (7.6) is actually not needed. Using Jensen's inequality |f(t)| ≤ E_θ |f_θ(t)|, from (7.3) and (7.5) we obtain, for all T ≥ T_0 ≥ 1, the bound (7.8). Now, as was shown in [12] (Lemma 5.2 specialized to the parameter p = 2), the estimate (7.9) holds for all t ∈ R, where Y is an independent copy of X. Using a simple relation m_4 ≤ M_4^2 (Corollary 2.3 in [12]), one may also involve the functional M_4^4 = sup_θ E ⟨X, θ⟩^4. It may be bounded in terms of Λ as well as σ_4^2. Indeed, applying (1.3) with a_{jk} = θ_j θ_k, we get Var(⟨X, θ⟩^2) ≤ Λ, θ ∈ S^{n−1}, which implies M_4^4 ≤ 1 + Λ ≤ 3Λ in the isotropic case. This allows us to replace (7.9) with (7.10). Applying the latter in (7.8), this inequality is simplified. Here, the integral can be bounded by virtue of the L^2-bound (5.8), which yields a corresponding estimate for |t| ≤ A n^{1/5} with a prescribed constant A ≥ 1, and hence the desired bound as long as T_0 ≤ A n^{1/5}. Applying this in (7.10), we arrive at the result: finally, choosing T = 4n and T_0 = 4√(log n), we obtain (7.7).

Large Deviations Related to Moderate Sized and Long Intervals
A similar argument can be used when bounding the ψ_1-Orlicz norm of ρ(F_θ, Φ). As a preliminary step, let us start with the first integral in (7.2) over the moderate interval. Applying now the inequality (5.9), we obtain a ψ_1-bound which is used with the same parameter T_0 as in Lemma 7.2. In general, by Markov's inequality, s_{n−1}{|ξ| ≥ r ‖ξ‖_{ψ_1}} ≤ 2 e^{−r}, r > 0. Hence, we arrive at Lemma 8.1. Outside the moderate-sized interval, that is, on the long interval [T_0, T], both |f(t)| and |f_θ(t)| are expected to be small for most θ. To study this property, let us consider the growth of the moments of the integral in (8.1). Given a random vector X in R^n, let X^{(k)}, Y^{(k)} (k = 1, . . . , p) be independent copies of X, and put Σ_p = (X^{(1)} − Y^{(1)}) + · · · + (X^{(p)} − Y^{(p)}).

Lemma 8.2. For the integral in (8.1) with parameters T_0 = 4√(log n) and T = T_0 n, we have (8.2).

Proof. By Hölder's inequality, the p-th moment of the integral is bounded by the integral of the corresponding p-th moments, so that it suffices to estimate E_θ |f_θ(t)|^{2p}. Since |f_θ(t)|^{2p} = E e^{it⟨Σ_p, θ⟩}, we may write E_θ |f_θ(t)|^{2p} = E J_n(t|Σ_p|). Thus, the problem is reduced to the decay of J_n. Next, we split the expectation into the event A = {|Σ_p|^2 ≤ np^2} and its complement B = {|Σ_p|^2 > np^2}. Applying the upper bound (3.1) on B, and estimating the contribution of A, the choice T_0 = 4√(log n), T = T_0 n leads to a bound involving e^{−n/12}. Using the inequality x^{2p} e^{−x} ≤ p^{2p} (x ≥ 0), we have e^{−n/12} ≤ (12p)^{2p} n^{−2p}, and the above bound is simplified to (8.2).

Concentration in Presence of Poincaré-type Inequalities
In order to simplify the bounds in Lemma 7.2 and Lemmas 8.1-8.2, we need more information about the distribution of X, which would allow us to say more about the involved function I(t) and the probability of the event A as in Lemma 8.2. To this aim, our starting hypothesis will be described by Poincaré-type inequalities. Let us first recall several results about concentration, assuming that the random vector X = (X_1, . . . , X_n) in R^n admits the Poincaré-type inequality λ_1 Var(u(X)) ≤ E |∇u(X)|^2 (9.1) for all smooth functions u on R^n, with a positive constant λ_1. As was discovered by Gromov and Milman [20] and by Borovkov and Utev [14], deviations of random variables u(X) from their means are subexponential, as long as u is a Lipschitz function on R^n (cf. also [1], [28]). In a somewhat optimal way, worst possible deviations of u(X) are described in the following assertion proved in [4].
Proposition 9.1. If the function u : R^n → R has a Lipschitz semi-norm ‖u‖_Lip ≤ 1, then, for any r ≥ 0, the deviation bound (9.2) holds. Using a smoothing argument, the inequality (9.2) may be extended to all locally Lipschitz functions, in which case the modulus of the gradient is understood as the Borel measurable function |∇u(x)| = lim sup_{y→x} |u(y) − u(x)|/|y − x|. In terms of partial derivatives, it leads to the usual expression (Σ_{k=1}^n (∂_{x_k} u(x))^2)^{1/2}, assuming that u is differentiable at the point x.
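A concrete one-dimensional example of the Poincaré-type inequality (our own illustration): the two-sided exponential measure dμ(x) = (1/2)e^{−|x|}dx is classically known to have Poincaré constant λ_1 = 1/4. The sketch checks λ_1 Var(u(X)) ≤ E u′(X)^2 for the test function u = sin, using that E cos(tX) = 1/(1 + t^2) gives Var(sin X) = 2/5 and E cos^2 X = 3/5 exactly:

```python
import math

LAMBDA1 = 0.25   # known optimal Poincare constant of the two-sided exponential law

def expect(f, steps=200000, cutoff=30.0):
    """Midpoint-rule integral of f against the density 0.5*exp(-|x|)."""
    h = 2.0 * cutoff / steps
    return h * sum(0.5 * math.exp(-abs(x)) * f(x)
                   for x in (-cutoff + (i + 0.5) * h for i in range(steps)))

mean_u = expect(math.sin)                                   # = 0 by symmetry
var_u = expect(lambda x: (math.sin(x) - mean_u) ** 2)       # exact value: 2/5
grad2 = expect(lambda x: math.cos(x) ** 2)                  # u'(x) = cos(x); exact: 3/5

# Poincare-type inequality: lambda_1 * Var(u(X)) <= E u'(X)^2
print(var_u, grad2)
```

Here 0.25 · 0.4 = 0.1 ≤ 0.6, with plenty of room: sin is not an extremal function for this measure.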
If the function u is not Lipschitz (for example, a polynomial), the bound (9.2) is no longer true, and a more general variant of Proposition 9.1 is needed, which would allow us to control probabilities of large deviations. To this aim, proper bounds on the L p -norms of u in terms of the L p -norms of the modulus of the gradient are useful.
Proposition 9.2. Given a locally Lipschitz function u on R^n, suppose that the moment E |∇u(X)|^p is finite for some p ≥ 2. Then u(X) has finite absolute moments up to order p, and the L^p-norm of u(X) − E u(X) is bounded in terms of the L^p-norm of |∇u(X)|. Equivalently, this may be stated for the corresponding integrals with respect to the distribution μ of X: if the right integral is finite, so is the left one, and thus u is integrable. Moreover, the left integral is greater than or equal to ∫ |u(x) − E u(X)|^p dμ(x) (by Jensen's inequality).
Let us now connect the Poincaré constant with small ball probabilities.
It remains to note that (9.5) is fulfilled automatically when λ_1 n < 16/7, since then the right-hand side is greater than 1.
Let us give another version of this statement for convolutions, namely, for the sums Σ_p = (X^{(1)} − Y^{(1)}) + · · · + (X^{(p)} − Y^{(p)}), where X^{(k)}, Y^{(k)} (1 ≤ k ≤ p) are independent copies of X. One may use the property that the product measure μ^{⊗2p} on (R^n)^{2p} = R^{2pn} has the same Poincaré constant λ_1 as the distribution μ of X. The function u(x_1, . . . , x_p, y_1, . . . , y_p) = |x_1 + · · · + x_p − y_1 − · · · − y_p| has Lipschitz semi-norm √(2p) with respect to the Euclidean distance on R^{2pn}. Therefore, according to Proposition 9.1, it admits an exponential deviation inequality, where m is the μ^{⊗2p}-mean of u. That is, |Σ_p| has subexponential deviations about E |Σ_p|. By the Poincaré-type inequality on the product space, and using E |Σ_p|^2 = 2pn, we have Var(|Σ_p|) ≤ 2p/λ_1 ≤ pn, where the last inequality holds true when λ_1 n ≥ 2. In this case, E |Σ_p| ≥ √(pn), and applying the exponential inequality, we obtain the desired bound. In the case λ_1 n ≤ 2, this inequality is fulfilled automatically, so we arrive at:

Remark 9.5. If the random vector X in R^n (n ≥ 2) is isotropic, then necessarily λ_1 ≤ 1. Indeed, applying the Poincaré-type inequality (9.1) with the linear functions u(x) = ⟨x, θ⟩, we get λ_1 (1 − ⟨a, θ⟩^2) ≤ 1, θ ∈ S^{n−1}, where a = E X. Since one may choose θ orthogonal to the vector a, the conclusion follows. The upper bound λ_1 ≤ 1 is also valid in dimension n = 1, as long as E X = 0 (however, we only have λ_1 ≤ 1/Var(X) without the mean zero assumption).
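Two ingredients of the argument above are easy to verify numerically (our own sketch, with standard normal copies as a concrete isotropic X): the identity E |Σ_p|^2 = 2pn, and the Lipschitz semi-norm √(2p) of u(x_1, . . . , x_p, y_1, . . . , y_p) = |x_1 + · · · + x_p − y_1 − · · · − y_p|, checked here by finite differences at a generic point:

```python
import math
import random

rng = random.Random(5)
p, n, trials = 3, 8, 4000

def sigma_p():
    """Sigma_p = sum_k (X^(k) - Y^(k)) for 2p independent standard normal copies."""
    v = [0.0] * n
    for _ in range(p):
        for i in range(n):
            v[i] += rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0)
    return v

mean_sq = sum(sum(x * x for x in sigma_p()) for _ in range(trials)) / trials
ratio = mean_sq / (2 * p * n)            # isotropic X gives E|Sigma_p|^2 = 2pn

def u(z):
    """u on R^{2pn}: |sum of the p x-blocks minus sum of the p y-blocks|."""
    s = [0.0] * n
    for k in range(2 * p):
        sign = 1.0 if k < p else -1.0
        for i in range(n):
            s[i] += sign * z[k * n + i]
    return math.sqrt(sum(x * x for x in s))

# |grad u|^2 = 2p wherever u != 0: each of the 2p blocks contributes a unit vector
point = [rng.gauss(0.0, 1.0) for _ in range(2 * p * n)]
h, grad_sq = 1e-6, 0.0
for j in range(2 * p * n):
    zp = list(point); zp[j] += h
    zm = list(point); zm[j] -= h
    grad_sq += ((u(zp) - u(zm)) / (2.0 * h)) ** 2
```

The finite-difference gradient norm comes out equal to 2p up to numerical error, confirming the Lipschitz constant √(2p) used above.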

The Case of Non-symmetric Distributions
In order to extend the bound to the case where the distribution of X is not necessarily symmetric about the origin, we need to employ more sophisticated results reflecting the size of the linear part of the characteristic functions f_θ(t) in L^2(s_{n−1}) with respect to the variable θ. This may be achieved at the expense of a certain additional term on the right-hand side in (10.1). More precisely, we derive the following.

Proposition 10.1. Given an isotropic random vector X = (X_1, . . . , X_n) in R^n, the bound (10.2) holds, where Y is an independent copy of X.
The ratio ⟨X, Y⟩/√(|X|² + |Y|²) is understood to be zero in the case X = Y = 0. Note that the last expectation in (10.2) is non-negative, which follows from the representation

E [⟨X, Y⟩/√(|X|² + |Y|²)] = √(2/π) ∫_0^∞ Σ_{k=1}^n (E X_k e^{−|X|² r²/2})² dr,

obtained by applying the identity 1/√s = √(2/π) ∫_0^∞ e^{−s r²/2} dr with s = |X|² + |Y|² and then using the independence of X and Y.
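This sign property can be checked numerically for a toy discrete distribution (the support points below are arbitrary choices, not from the text): the exact double sum over the law of (X, Y) is compared with a trapezoidal quadrature of the manifestly non-negative integrand.

```python
import numpy as np

# Toy numerical check (arbitrary atoms) of the representation
#   E <X,Y>/sqrt(|X|^2+|Y|^2)
#     = sqrt(2/pi) * int_0^oo sum_k ( E X_k e^{-|X|^2 r^2/2} )^2 dr,
# which makes the non-negativity of the left-hand side evident.

support = np.array([[1.0, 0.0, 0.5],
                    [-0.5, 1.0, 0.0],
                    [0.3, -0.7, 1.2],
                    [0.0, 0.4, -1.0],
                    [-1.1, -0.2, 0.3]])
probs = np.full(5, 0.2)
sq = np.sum(support**2, axis=1)                  # |x|^2 per atom

# Exact left-hand side: double sum over the product law of (X, Y).
lhs = sum(probs[i] * probs[j] * np.dot(support[i], support[j])
          / np.sqrt(sq[i] + sq[j])
          for i in range(5) for j in range(5))

# Right-hand side by trapezoidal quadrature of the non-negative integrand
# (the integrand decays like a Gaussian, so [0, 10] captures the integral).
r = np.linspace(0.0, 10.0, 100001)
weights = probs[:, None] * np.exp(-np.outer(sq, r**2) / 2.0)  # p_x e^{-|x|^2 r^2/2}
m = support.T @ weights                          # m[k] = E X_k e^{-|X|^2 r^2/2}
integrand = np.sum(m**2, axis=0)
rhs = np.sqrt(2.0 / np.pi) * np.sum((integrand[1:] + integrand[:-1]) / 2.0
                                    * np.diff(r))

print(f"double sum = {lhs:.8f}, integral = {rhs:.8f}")
```

Since the two sides agree and the integrand is a sum of squares, the expectation is non-negative for any such discrete law.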
If the distribution of X is symmetric, this expectation vanishes, and we return to (10.1). Returning to Proposition 6.1, define the corresponding random variables and recall that the squared L^2-norm of the linear part of the characteristic function f_θ(t) of the weighted sums ⟨X, θ⟩ admits an asymptotic representation.

Lemma 10.2. If X is isotropic, then, putting T_0 = 4√(log n), we have the bound (10.4).

After the change of variable Rt = s (assuming without loss of generality that R > 0) and putting T_1 = R T_0, the expression above is simplified accordingly. At the expense of a small error, the integration here may be extended from the interval [0, T_1] to the whole half-axis (0, ∞). To see this, one can use the estimates already noted in (7.4). Since on the set B we have T_1² = 16 R² log n > 4 log n, and due to E R² = 1, the resulting error term is sufficiently small, where we used the lower bound Λ ≥ 1/2. A similar argument bounds the remaining error terms. Thus, extending the integration to the positive half-axis, we get an asymptotic formula with some absolute constants c_j > 0. Moreover, using the identity for the linear part, and recalling that E ⟨X, Y⟩/R ≥ 0, it follows that, with some other positive absolute constants, the same formula continues to hold. To get rid of the last expectation (by showing that it is bounded by a dimension-free quantity), first note that, by (10.5), the expression under this expectation is bounded in absolute value by Rn. Hence, applying Cauchy's inequality and using E R² = 1, from (10.6) we obtain the desired bound. Turning to the complementary set, note that on B, by Cauchy's inequality and using E ⟨X, Y⟩² = n, the corresponding expectation is suitably bounded as well. Combining this bound with (10.8), we finally conclude that the last expectation is bounded by an absolute constant. As a result, we arrive in (10.7) at the bound (10.4).
Proof of Proposition 10.1. We employ the bound (7.7) of Lemma 7.2, which was stated with T_0 = 4√(log n). Using Cauchy's inequality and applying (10.4), it gives an intermediate estimate; simplifying the expression on the right-hand side, we arrive at (10.2).

The estimate on average
Let us rewrite the bound (10.2) with R² = (|X|² + |Y|²)/(2n), R ≥ 0, where Y is an independent copy of X. In the next step, we are going to simplify the last expectation in terms of λ_1. Note that, under our standard assumptions as in Proposition 10.1, R² is close to 1 with high probability. Hence, with high probability the ratio ⟨X, Y⟩/R is almost ⟨X, Y⟩, which in turn has zero expectation, as long as X has mean zero. However, in general it is not clear whether or not this approximation is sufficient for a further simplification. Nevertheless, the approximation R² ∼ 1 is indeed sufficiently strong, for example, in the case where the distribution µ of X satisfies the Poincaré-type inequality (1.3).
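A quick Monte Carlo experiment illustrates the approximation R² ∼ 1; the standard Gaussian distribution below (for which λ_1 = 1) is a toy stand-in for the general setting, not the paper's model.

```python
import numpy as np

# Monte Carlo illustration of R^2 = (|X|^2 + |Y|^2)/(2n) ~ 1 with a standard
# Gaussian X (lambda_1 = 1; a toy stand-in).  For any isotropic X one has
# E R^2 = 1 exactly (since E |X|^2 = n); in the Gaussian case Var(R^2) = 1/n,
# so R^2 concentrates around 1 as the dimension grows.

rng = np.random.default_rng(3)
n, trials = 100, 20000
X = rng.normal(size=(trials, n))
Y = rng.normal(size=(trials, n))
R2 = (np.sum(X**2, axis=1) + np.sum(Y**2, axis=1)) / (2 * n)

print(f"mean of R^2: {R2.mean():.3f}  (exact value 1)")
print(f"variance of R^2: {R2.var():.4f}  (Gaussian value 1/n = {1/n:.4f})")
```

The 1/n decay of the variance is what makes the replacement of ⟨X, Y⟩/R by ⟨X, Y⟩ plausible in high dimension.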
Using the bound |⟨X, Y⟩| ≤ R² n (11.5), cf. (10.5), we obtain a first estimate, and similarly a second one. Here, the first three expectations on the right-hand side do not exceed in absolute value a multiple of 1/(λ_1² n). Hence, using the previous bound (11.6), we get an estimate in which the quantities c_1 and c_2 are bounded by an absolute constant. By Cauchy's inequality, the square of the last expectation does not exceed E ⟨X, Y⟩² E (R² − 1)⁶ = n E (R² − 1)⁶.
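The two ingredients of this step, the pointwise bound |⟨X, Y⟩| ≤ R² n (Cauchy's inequality followed by the AM-GM inequality: |⟨X, Y⟩| ≤ |X||Y| ≤ (|X|² + |Y|²)/2 = R² n) and the identity E ⟨X, Y⟩² = n for isotropic X, can be sketched numerically; the Gaussian samples below are a toy choice.

```python
import numpy as np

# Sketch (toy Gaussian samples) of two facts used above:
#  (i)  |<X,Y>| <= |X||Y| <= (|X|^2 + |Y|^2)/2 = R^2 n   pointwise;
#  (ii) E <X,Y>^2 = sum_{i,j} E X_i X_j * E Y_i Y_j = n  for isotropic X
#       and an independent copy Y.

rng = np.random.default_rng(4)
n, trials = 100, 20000
X = rng.normal(size=(trials, n))
Y = rng.normal(size=(trials, n))

ip = np.sum(X * Y, axis=1)                                   # <X, Y>
R2n = (np.sum(X**2, axis=1) + np.sum(Y**2, axis=1)) / 2.0    # R^2 * n

print("pointwise bound |<X,Y>| <= R^2 n holds:", bool(np.all(np.abs(ip) <= R2n)))
print(f"E<X,Y>^2 estimate: {np.mean(ip**2):.1f}  (exact value n = {n})")
```

Note that (i) holds for every realization, not just on average, which is why it can be used under the expectation sign.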