Stein approximation for functionals of independent random sequences

We derive Stein approximation bounds for functionals of uniform random variables, using chaos expansions and the Clark-Ocone representation formula combined with derivation and finite difference operators. This approach covers sums and functionals of both continuous and discrete independent random variables. For random variables admitting a continuous density, it recovers classical distance bounds based on absolute third moments, with better and explicit constants. We also apply this method to multiple stochastic integrals that can be used to represent U-statistics, and include linear and quadratic functionals as particular cases.


Introduction
The Stein and Chen-Stein methods have been developed together with the Malliavin calculus to derive bounds on the distances between probability laws on the Wiener and Poisson spaces, cf. [9], [12], [13], and for discrete Bernoulli sequences, cf. [10], [4], [5]. The results of these works rely on covariance representations based on the number (or Ornstein-Uhlenbeck) operator $L$ on multiple Wiener-Poisson stochastic integrals and its inverse $L^{-1}$. Other covariance representations based on the Clark-Ocone representation formula have been used in [18] on the Wiener and Poisson spaces, and in [19] for Bernoulli processes. This paper focuses on functionals of a countable number of uniformly distributed random variables, and uses the framework of [14], cf. also [15], [16], to derive covariance representations from chaos expansions in multiple stochastic integrals, based on a version of the Clark-Ocone formula with finite difference or derivation operators. We obtain general bounds on the distance of a random functional to the Gaussian and gamma distributions using Stein kernels, see Propositions 3.1-3.3, and we also derive specific bounds for multiple stochastic integrals, see Corollary 5.2. Other recent approaches to the Stein method for arbitrary univariate distributions using Stein kernels include [7].
When restricted to single stochastic integrals, our framework applies to sums of independent centered random variables (X k ) k≥1 with variance one. This includes the case of discrete random variables and, e.g., sums and polynomials of Bernoulli random variables with variable parameters, as a consequence of Proposition 3.4, see Proposition 4.2.
In addition, this approach yields the general bound where $d_W$ denotes the Wasserstein distance, see (4.4) below, which recovers classical results such as the bound of Theorem 1.1 in [2], albeit with an additional factor of two.
On the other hand, for random variables which admit a continuous density, as a consequence of Proposition 3.2 we find in Proposition 4.4 that assuming that the cumulative distribution function $F_k$ of $X_k$ admits a non-vanishing density on the support of $X_k$. This recovers in particular Proposition 3.3 of [18] in the case $n = 1$.
In Section 6 we consider U-statistics, or quadratic functionals of the form where $(X_k)_{k\ge 1}$ is a sequence of normalized independent identically distributed random variables, such that $\mathrm{Var}[Q_n] = 1$. Corollary 6.2 shows that we have the bound which provides a different bound from Theorem 1 in [3], with explicit constants. In case $a_{2k,2k-1} = 1/\sqrt{n}$, the bound (1.3) yields which recovers the known convergence rate in $1/\sqrt{n}$ as on pages 1074-1075 of [3]. Corollary 6.4 provides another bound obtained from derivation operators.
More generally, our approach applies to functionals of uniformly distributed random variables, see Propositions 3.2 and 3.3 which deal respectively with smooth random functionals and with multiple stochastic integrals, cf. Proposition 3.4.
This paper is organized as follows. In Section 2 we recall the framework of [14] for the construction of random functionals of uniform random variables, together with the construction of derivation operators and the associated stochastic integral (Clark-Ocone) decomposition formula. In Section 3 we derive Stein approximation bounds for the distance of the laws of general functionals to the Gaussian and gamma distributions. Section 4 deals with single stochastic integrals which can be used to represent sums of independent random variables.
Section 5 treats the general case of multiple stochastic integrals, which can be viewed as U-statistics. Finally, in Section 6, double stochastic integrals are discussed together with their applications to quadratic functionals. In the appendix Section 7 we prove a multiplication formula for multiple stochastic integrals.
We also denote by $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ the filtration generated by $(Y_t)_{t\in\mathbb{R}_+}$, and let The compensated stochastic integral with respect to the compensated point process $(Y_t - t/2)_{t\in\mathbb{R}_+}$ can be defined for square-integrable $\mathcal{F}_t$-adapted processes $(u_t)_{t\in\mathbb{R}_+}$ by the isometry relation see [14], where $(u_t)_{t\in\mathbb{R}_+}$ and $(v_t)_{t\in\mathbb{R}_+}$ are square-integrable $\mathcal{F}_t$-adapted processes. This also implies the bound for $(u_t)_{t\in\mathbb{R}_+}$ a square-integrable $\mathcal{F}_t$-adapted process.
Given f 1 ∈ L 1 (R + ) ∩ L 2 (R + ) we define the first order stochastic integral Next, given f n a function which is square integrable on R n + and belongs to the space L 2 (R n + ) of symmetric functions that vanish outside of we define the multiple stochastic integral see [16] for a construction using a Wick type product, and [22] for the Poisson point process version. It is easily seen, see (2.1) above and Propositions 4 and 6 of [14], that (I n (f n )) n≥1 forms a family of mutually orthogonal centered random variables which satisfy the bound which allows us to extend the definition of I n (f n ) to all f n ∈ L 2 (R n + ). If in addition we have i.e. the function f n is canonical [23], then the multiple stochastic integral I n (f n ) can be written as the U-statistic of order n based on the function f n , i.e.

Finite difference operator
Consider the finite difference operator ∇ defined on multiple stochastic integrals X = I n (f n ) as cf. Definition 5 and Proposition 10 of [14]. The operator ∇ does not satisfy the chain rule of derivation, however it possesses a simple form and it can be easily applied to multiple stochastic integrals.
Proof. We observe that Consequently we have and applying this to (2.7) we obtain the conclusion.
In particular, under the condition (2.3) we have the equality as in Proposition 10 of [14]. The operator ∇ also admits an adjoint operator ∇ * given by where $\tilde{g}_{n+1}$ is the symmetrization of g n+1 ∈ L 2 (R n + ) ⊗ L 2 (R + ) in n + 1 variables, and ∇ is closable with domain and we have the duality relation (2.10) for u in the domain Dom(∇ * ) of ∇ * , cf. Proposition 8 of [14]. The operator $L$ defined on linear combinations of multiple stochastic integrals as is called the Ornstein-Uhlenbeck operator. By (2.6) the operator is well-defined, invertible for centered X ∈ L 2 (Ω), and the inverse operator $L^{-1}$ is given by Recall that the operator ∇ satisfies the Clark-Ocone formula for X ∈ L 2 (Ω), see [14], Theorem 2. This relation is reformulated using the operator Ψ t in the next proposition.

Proposition 2.2
For all X ∈ L 2 (Ω) we have Proof. Since the integral term in the right hand side of (2.7) is constant in t on every interval of the form [2k, 2k + 2), k ∈ N, we get and (2.11) ends the proof.
In particular, it follows from the Clark-Ocone formula (2.11) that (2.13) since the integral term in the right hand side of (2.8) is constant in t on every interval of the form [2k, 2k + 2), k ∈ N.

Derivation operator
Given X a random variable of the form we consider the gradient D t defined as cf. Definition 3 of [14]. By Proposition 5 of [14] the gradient D is closable, and its closed domain is denoted by Dom(D). For any X ∈ Dom(D) and φ ∈ C 1 b (R) we have φ(X) ∈ Dom(D), and the operator D satisfies the chain rule of derivation for all φ ∈ C 1 b (R). The gradient operator with domain Dom(D), defined by DX = (D t X) t∈R + satisfies the following Clark-Ocone representation formula, see Theorem 2 of [14].
. We have

Stein kernel
The next proposition shows that the Stein kernel ϕ X defined in (2.17) is a Stein kernel in the sense of Definition (2.1) in [6].
Proof. We note that by Lemma 2.4 and Jensen's inequality we have and, for any In particular, (2.19) shows that we have In the sequel we will also use the identity (2.20) see Relation (3.17) in [11]. Next, we review some examples of Stein kernels.
Gaussian case. The Stein kernel of $X_1 \sim N(0, \sigma^2)$ with the Gaussian cumulative distribution function $F(x)$ is given by
Gamma case. When $X_1$ has the centered gamma distribution with shape parameter $s > 0$ and density function
Beta case. When $X_1$ has the centered Beta(α, 1) distribution, α > 0, we have and the Stein kernel of $X_1$ is (2.21)
Single stochastic integrals. Such integrals can be used to represent the sum $Z_n$ of independent centered random variables $(X_k)_{k\ge 1}$ as where is the right-continuous inverse of the cumulative distribution function $F_k$ of $X_k$, $k \ge 1$.
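The representation of $X_k$ through the right-continuous inverse of $F_k$ is the usual inverse transform construction. The following Python sketch illustrates it under assumptions that are not stated in the paragraph above: $U_k$ is taken to be uniform on $[-1, 1]$, so that $(U_k + 1)/2$ is uniform on $[0, 1]$, and the law of $X_k$ is chosen to be the centered exponential distribution, whose quantile function is $F^{-1}(u) = -\log(1 - u) - 1$. The function names are hypothetical.

```python
import math
import random


def centered_exponential_quantile(u: float) -> float:
    """Quantile function of X = E - 1 with E ~ Exp(1) (illustrative choice of law)."""
    return -math.log1p(-u) - 1.0


def sample_via_uniform(quantile, rng: random.Random) -> float:
    """Draw U uniform on [-1, 1) and return quantile((U + 1) / 2)."""
    u_k = 2.0 * rng.random() - 1.0  # assumption: U_k is uniform on [-1, 1)
    return quantile((u_k + 1.0) / 2.0)


rng = random.Random(0)  # fixed seed so that the run is reproducible
samples = [sample_via_uniform(centered_exponential_quantile, rng)
           for _ in range(100_000)]

# Empirical mean and variance of the simulated X_k.
empirical_mean = sum(samples) / len(samples)
empirical_var = sum((x - empirical_mean) ** 2 for x in samples) / len(samples)
```

Up to Monte Carlo error, the empirical mean and variance should be close to 0 and 1, which is the normalization assumed for the $X_k$ in this illustration.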
In the sequel we let C 1 (R + ) denote the set of functions which are C 1 on every interval of the form (2k, 2k + 2), k ∈ N. The next lemma can be useful when computing the Stein kernel of single stochastic integrals according to (2.17), see Propositions 4.3 and 4.4 below.
X k belongs to Dom(D), n ≥ 1. We have Proof. We note that for Next, by Proposition 10 and Lemma 1 in [14] we get On the other hand, by (2.1) and (2.20), see (3.17) in [11], we have where we used the identity (2.20).

Density representation and bounds
Working along the lines of the proof of Theorem 3.1 in [11]  satisfies ϕ X (X) > 0 a.s. In this case Supp(p X ) is a closed interval of R containing 0 and we have As a consequence of Proposition 2.7 we get the following result on density bounds as in Corollary 3.5 of [11].
where C, c > 0 are positive constants. Then the density p X satisfies , a.e. z ∈ R, and the tail probabilities satisfy

Stein approximation bounds
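The total variation distance between two real-valued random variables $X$ and $Y$ is defined by
$$d_{TV}(X, Y) := \sup_{A \in \mathcal{B}(\mathbb{R})} |P(X \in A) - P(Y \in A)|,$$
where $\mathcal{B}(\mathbb{R})$ denotes the Borel subsets of $\mathbb{R}$. The Wasserstein distance between the laws of $X$ and $Y$ is defined by
$$d_W(X, Y) := \sup_{h \in \mathrm{Lip}(1)} |E[h(X)] - E[h(Y)]|,$$
where Lip(1) is the class of real-valued Lipschitz functions with Lipschitz constant less than or equal to 1.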
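To make the two distances concrete, the following Python sketch evaluates them in two elementary situations. It relies on two standard facts that are not derived in this paper: for laws on a finite set, the supremum over Borel sets equals half the $\ell^1$ distance between the probability mass functions, and for two empirical measures with the same number of atoms the Wasserstein distance equals the mean absolute difference of the order statistics. The function names are hypothetical.

```python
def total_variation(p: dict, q: dict) -> float:
    """sup_A |P(A) - Q(A)| for two laws on a finite set, given as {value: mass}."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def wasserstein_empirical(xs, ys) -> float:
    """Wasserstein distance between two empirical measures with equally many atoms."""
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same length")
    paired = zip(sorted(xs), sorted(ys))  # optimal coupling matches order statistics
    return sum(abs(a - b) for a, b in paired) / len(xs)


tv_example = total_variation({0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75})
w_example = wasserstein_empirical([0.0, 1.0], [2.0, 1.0])
```

Here the first example gives the value 0.25 and the second gives 1.0, since each atom is shifted by one unit.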
In the following propositions we derive bounds for the Wasserstein and total variation distances between the normal distribution and the distribution of a given random variable X ∈ Dom(D). Recall that by Stein's lemma, cf. [21], [8], for any continuous function In the sequel we denote by the space of twice differentiable functions whose first derivative is bounded by 1 and whose second derivative is bounded by 2. For the gamma approximation we will use the distance

Derivation operator bounds
In the next Proposition 3.1 we derive a Stein bound using the Stein kernel $\varphi_X(z)$ defined in (2.17), see also Proposition 3.3 of [18] for a bound using a different probabilistic representation of the Stein kernel. Here we denote by $\Gamma(\nu/2)$ a random variable distributed according to the gamma law with parameters $(\nu/2, 1)$, $\nu > 0$. We also let $\langle \cdot, \cdot \rangle$ denote the usual inner product $\langle \cdot, \cdot \rangle_{L^2(\mathbb{R}_+)}$ on $L^2(\mathbb{R}_+)$.
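As a numerical sanity check of the role played by a Stein kernel, the following Python sketch tests the integration by parts identity $E[X f(X)] = E[\varphi_X(X) f'(X)]$ for a centered gamma random variable. It rests on assumptions that are not taken from the displays of this paper: the identity is the usual characterisation of a Stein kernel of a centered random variable, $X = G - s$ where $G$ has the gamma law with shape $s$ and scale 1, and the kernel is taken to be $\varphi_X(x) = x + s$, which follows from a direct integration by parts. The test function $f = \sin$, the midpoint rule and the function name are illustrative choices.

```python
import math


def gamma_expectation(func, shape: float, upper: float = 60.0,
                      steps: int = 200_000) -> float:
    """Midpoint-rule approximation of E[func(G - shape)] for G ~ Gamma(shape, 1)."""
    h = upper / steps
    norm = math.gamma(shape)
    total = 0.0
    for i in range(steps):
        g = (i + 0.5) * h
        density = g ** (shape - 1.0) * math.exp(-g) / norm
        total += func(g - shape) * density
    return total * h


s = 2.0  # illustrative shape parameter

# Left and right hand sides of E[X f(X)] = E[(X + s) f'(X)] with f = sin.
lhs = gamma_expectation(lambda x: x * math.sin(x), s)
rhs = gamma_expectation(lambda x: (x + s) * math.cos(x), s)

# Taking f(x) = x gives E[phi_X(X)] = E[X^2]; both equal s here.
mean_kernel = gamma_expectation(lambda x: x + s, s)
second_moment = gamma_expectation(lambda x: x * x, s)
```

The two sides agree up to the quadrature error, and the last two quantities illustrate the identity $E[\varphi_X(X)] = E[X^2]$ used in the proof of Proposition 3.1.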
where the Stein kernel ϕ X is defined in (2.17), and If moreover X is a.s. (−ν, ∞)-valued then we have Proof. We focus on the first inequalities, as the second inequalities follow from the triangle inequality and Jensen's inequality, and the identity E[ϕ X (X)] = E[X 2 ] that follows from Lemma 2.4.
(i) By Lemma 2.4 we have Hence, using the bound (2.33) in [12] and (3.1), we get
(ii) By the covariance identity (3.1) we have and this bound can be extended to $h = \mathbf{1}_C$ for any $C \in \mathcal{B}_b(\mathbb{R})$ by the same approximation argument as in the proof of e.g. Theorem 2.1 of [18].
(iii) Given h ∈ H a twice differentiable function bounded above by 1 we choose c > 0 and By e.g. Lemma 1.3-(ii) of [9], letting Γ ν := 2Γ(ν/2) − ν, the functional equation has a solution f h which is bounded and differentiable on (−ν, ∞), and such that By the covariance identity (3.1) on C 1 b (R) for the centered random variable X we have The claim follows by taking the supremum over all functions h ∈ H.
As a consequence of Proposition 3.1, for any X ∈ Dom(D) such that E[X] = 0, we have and Similarly, Proposition 3.1 implies the following corollary which applies in particular to smooth functionals X ∈ Dom(D).
For any a.s.

Finite difference operator bound
Using the finite difference operator ∇ we obtain the following bound which applies in particular to multiple stochastic integrals, see Proposition 3.4 below.
Proof. By (2.7), for every function $f \in C^2(\mathbb{R})$, the finite difference operator ∇ satisfies Hence for any $f \in T$, by the duality relation (2.10) we have Regarding the first term, we note that for any two square-integrable random variables $F$ and $G$, by (2.7) we have Next, given that $\|f''\|_\infty \le 2$, the term (3.5) can be bounded as where we used the relation $t \in [2k, 2k + 2]$, $k \in \mathbb{N}$, which holds similarly to (3.6). We conclude (3.4) from the inequality (3.2), which is the bound (2.33) in [12].
The second term in (3.4) can also be written as Taking X = I n (f n ) in Proposition 3.3, we get the following result.
Proposition 3.4 Let f n ∈L 2 (R n + ). The following estimate holds:

Single stochastic integrals
For single stochastic integrals, Proposition 3.4 shows the following.
Consider now a sum Z n of independent centered random variables (X k ) k≥1 written, as in (2.22), as with f 1 ∈ C 1 (R + ) ∩ L 2 (R + ) given by (2.23) from the respective cumulative distribution functions (F k ) k≥1 . In this case, Proposition 4.1 can be rewritten as follows.
Proposition 4.2 Given $(Z_n)_{n\ge 1}$ written as in (4.2) we have
Proof. We note that $f_1(2k + 1 + U_k) = F_k^{-1}((U_k + 1)/2)$ has the same distribution as $X_k$, $k \ge 1$, hence (4.1) can be rewritten as
Using Hölder's inequality, Proposition 4.2 shows that for the normalized sum Z n : with however a worse constant.

Bernoulli random variables
Given (p k ) k≥1 a sequence in (0, 1), letting the single integral I 1 (f 1 1 [0,2n] ) becomes a weighted sum α k X k of centered and normalized Bernoulli random variables (X k ) k≥1 with parameters (p k ) k≥1 , and (4.3) shows that The bound for d T V (I 1 (f 1 ), N ) is twice as large as (4.5).
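For orientation, the following Python sketch computes exact moments of a single centered and normalized Bernoulli variable, including the absolute third moment that enters classical bounds. It assumes that "centered and normalized" means $X_k = (B_k - p_k)/\sqrt{p_k(1 - p_k)}$ with $B_k$ a Bernoulli variable of parameter $p_k$; this convention and the function names are assumptions, not taken from the displays above.

```python
import math


def normalized_bernoulli(p: float):
    """Law of X = (B - p) / sqrt(p (1 - p)), B ~ Bernoulli(p), as (value, mass) pairs."""
    q = 1.0 - p
    sd = math.sqrt(p * q)
    return [(q / sd, p), (-p / sd, q)]


def moment(law, g) -> float:
    """Exact expectation E[g(X)] for a finitely supported law."""
    return sum(mass * g(x) for x, mass in law)


p = 0.2  # illustrative parameter
law = normalized_bernoulli(p)
mean = moment(law, lambda x: x)
variance = moment(law, lambda x: x * x)
abs_third = moment(law, lambda x: abs(x) ** 3)

# Closed form for comparison: E|X|^3 = (p^2 + q^2) / sqrt(p q).
closed_form = (p ** 2 + (1.0 - p) ** 2) / math.sqrt(p * (1.0 - p))
```

The absolute third moment grows without bound as $p$ approaches 0 or 1, which is why bounds for sums with variable parameters $(p_k)_{k\ge 1}$ depend on how close the parameters come to the endpoints.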
Proof. We note that by (2.1) and Lemma 2.6 we have Proposition 4.3 can be rewritten as follows using sums Z n of random variables (X k ) k≥1 .

Proposition 4.4
Assume that (X k ) k≥1 is a sequence of independent centered random variables having non-vanishing continuous densities. Then the sum The bound for d T V (Z n , N ) is twice as large as (4.6).
Gaussian case. The Stein kernel of X k centered Gaussian is given by and the bound (4.6) recovers d W (Z n , N ) ≤ |1 − E[Z 2 n ]| as expected.
Gamma case. The Stein kernel of X k a centered gamma random variable is ϕ Zn (y) = and the bound (4.6) shows that the sum Z n satisfies By the scaling relation we find that the normalized sumZ n : , n ≥ 1.
In particular, in the i.i.d. case we have , n ≥ 1, which systematically improves on (4.4) and on the bound (1.1) of [2], i.e.
where Γ(3 + s, s) denotes the upper incomplete gamma function. Indeed, the ratio 2 2Γ(3 + s, s) + 2s 2+s e −s (1 + s) of the two bounds tends to infinity as s tends to infinity, and has smallest value 2 as s tends to 0.

Multiple stochastic integrals
In this section we apply the multiplication formula given in the appendix Section 7 in order to obtain bounds on the distance between multiple stochastic integrals and the normal distribution $N$. In the sequel for $0 \le i \le k \le n \wedge m$ we define

Bounds obtained from the finite difference operator ∇
To obtain a more explicit bound than in Proposition 3.4 we have to employ the multiplication formula. Precisely, by virtue of Proposition 5.1 we may express $I_n(f_n)^2$ as follows: where
$$G^n_k f_n(z_1, \ldots, z_k) = \mathbf{1}_{\Delta_k}(z_1, \ldots, z_k) \sum_{r=0}^{n} \sum_{l=0}^{r} \mathbf{1}_{\{2n-r-l=k\}} \, r! \binom{n}{r}^2 \binom{r}{l} \, f_n \star_r^l f_n(z_1, \ldots, z_k).$$
Corollary 5.2 Let f n ∈ L 2 (R n + ) be a symmetric function satisfying (2.3). Assume that

Proof.
We are going to estimate both components appearing in Proposition 3.4. The formula (5.2) lets us write Hence we have Since multiple integrals of different orders are orthogonal, we get Finally, by (2.5), we obtain which implies To get the second component of the estimate in the statement we use the Cauchy-Schwarz inequality in the following way: Since and by orthogonality of multiple integrals, we have which ends the proof.
As noted above, I n (f n ) can be used to represent various U-statistics, including polynomials of Bernoulli random variables, in which case Corollary 5.2 provides an alternative to the results of [10], [5], [19] for Bernoulli processes.

Bounds obtained from the derivation operator D
Here we let $C^1(\mathbb{R}_+^n)$ denote the set of functions which are $C^1$ on every set of the form $(2k_1, 2k_1 + 2) \times \cdots \times (2k_n, 2k_n + 2)$, $k_1, \ldots, k_n \in \mathbb{N}$. and assume that $H_k, J_k \in L^2(\mathbb{R}_+^{k+1})$. Additionally, we denote The next result is a consequence of Proposition 3.2.
Corollary 5.3 Let f n ∈ C 1 (R n + ) ∩ L 2 (R n + ) and satisfy (2.3). We have The bounds for d T V (I n (f ), N ) are equal to those for d W (I n (f ), N ) multiplied by 2.
Proof. By Lemma 2.4 and formula (2.5) we get Next, we are going to provide an explicit form for the expression D · I n (f n ), E D · I n (f n ) |F · .
We have By Proposition 10 and Lemma 1 in [14] we get Consequently, using the assumption (2.3) twice, we arrive at is a random process when $n \ge 2$. Note that by integration by parts we have and consequently Using the orthogonality of multiple integrals of different orders and the relation $I_{n-1}(f_n(s, *)) I_{n-1}(f_n(s, *)) \mathbf{1}_{\{* < s\}} = \sum_{k=0}^{2n-2} I_k(J_k(s, *))$, we rewrite the latter component as follows: Furthermore, by Proposition 5.1 we have We apply this to Proposition 3.2 and get the first inequality in the assertion of the corollary.
In order to derive the other one we use (2.2) and the estimate

A combinatorial central limit theorem
In this section, we show that the bounds of [4] for the Rademacher combinatorial central limit theorem of [1] can be extended to our setting of random sequences.
Theorem 5.4 There exists a constant C = C(q) such that Proof. Let F be the distribution function of X 1 with generalised inverse function F −1 .
Theorem 5.4 extends the standard Berry-Esseen bound of Corollary 6.2 in [4] to general independent random sequences, in particular when K takes the form K = {1, . . . , n} q ∩∆ q .
Note also that the general result on random sequences in Proposition 6.8 of [10] does not apply to the total variation or Wasserstein distances.

Quadratic functionals
This section is devoted to double stochastic integrals, which are a special case of the multiple integrals discussed in Section 5. We study them in a separate section because of their many applications, e.g. to quadratic functionals. Taking $n = 2$ in Corollary 5.2 of Section 5, we get the following result.
Corollary 6.1 Let $f_2 \in L^2(\mathbb{R}_+^2)$ be a symmetric function satisfying (2.3). Assume that the functions belong to $L^2(\mathbb{R}_+)$ and $L^2(\mathbb{R}_+^2)$, respectively. Then we have For example, when $f_2 \in C^1(\mathbb{R}_+^2) \cap L^2(\mathbb{R}_+^2)$ is given by where $A = (a_{k,l})_{1 \le k,l \le n}$ is a symmetric matrix with vanishing diagonal and such that $\sum_{1 \le k,l \le n} a_{k,l}^2 = 1$, Corollary 6.1 yields the following result, when $f_1$ is given by (2.23).
Corollary 6.2 Given $(X_k)_{k\ge 1}$ a sequence of independent identically distributed random variables such that $E[X_k] = 0$ and $E[X_k^2] = 1$, $k \ge 1$, let $Q_n$ denote the normalized quadratic form
Bounds of that type have already been studied in the literature, see e.g. [20] and [3]. They are usually presented by means of the expression $L_n^2 := \max_{1 \le k \le n} \sum_{l=1}^{n} a_{k,l}^2$.
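The quantity $L_n^2$ is straightforward to evaluate for a given coefficient matrix. The following Python sketch does so for the paired example $a_{2k,2k-1} = a_{2k-1,2k} = 1/\sqrt{n}$ mentioned in the introduction, assuming that $n$ is even and that all other entries vanish, so that the normalization $\sum_{1 \le k,l \le n} a_{k,l}^2 = 1$ holds. The function names and the zero-based indexing are illustrative.

```python
import math


def paired_matrix(n: int):
    """Symmetric n x n matrix with zero diagonal pairing indices (0,1), (2,3), ..."""
    if n % 2 != 0:
        raise ValueError("n must be even for the paired example")
    weight = 1.0 / math.sqrt(n)
    a = [[0.0] * n for _ in range(n)]
    for k in range(0, n, 2):
        a[k][k + 1] = weight
        a[k + 1][k] = weight
    return a


def l_n_squared(a) -> float:
    """L_n^2 = max over rows k of the sum over l of a[k][l]^2."""
    return max(sum(x * x for x in row) for row in a)


n = 100
a = paired_matrix(n)
total_square_sum = sum(x * x for row in a for x in row)  # normalization check
l2 = l_n_squared(a)
```

In this example each row contains a single nonzero entry, so $L_n^2 = 1/n$ and $L_n = 1/\sqrt{n}$, which matches the order of the convergence rate quoted in the introduction.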
Following this convention we can apply the bound of Corollary 6.2 to obtain
When $f_2 \in C^1(\mathbb{R}_+^2) \cap L^2(\mathbb{R}_+^2)$ is given by (6.1), Corollary 6.3 shows the following bound on quadratic functionals.
Corollary 6.4 Given $(X_k)_{k\ge 1}$ a sequence of independent identically distributed random variables such that $E[X_k] = 0$ and $E[X_k^2] = 1$, $k \ge 1$, the normalized quadratic form $Q_n$ defined in (6.2) satisfies The bound for $d_{TV}(Q_n, N)$ is twice as large as (6.5).
Proof. By Corollary 6.3, we have
When $(X_k)_{k\ge 1}$ is a sequence of independent, identically distributed, normalized gamma random variables we have $E[(\varphi_{X_k}(X_k))^2] = 2$, and (6.5) yields A similar expression can be obtained from (4.7) in the beta case.
We note that by (2.3) we have $E[S_2(t) \mid \mathcal{F}_t] = 0$. Additionally, the function $s \mapsto E[S_3(t) \mid \mathcal{F}_s]$ is constant for $s \in [2\lfloor t/2 \rfloor, 2\lfloor t/2 \rfloor + 2)$ which, combined with (2.12), implies Then, by the induction hypothesis and renumbering in the first sum below, we get