Quantitative de Jong theorems in any dimension

We develop a new quantitative approach to a multidimensional version of the well-known {\it de Jong central limit theorem} under optimal conditions. This theorem states that a sequence of Hoeffding-degenerate $U$-statistics whose fourth cumulants converge to zero satisfies a CLT, as soon as a Lindeberg-Feller-type condition is verified. Our approach allows one to deduce explicit (and presumably optimal) Berry-Esseen bounds in the case of general $U$-statistics of arbitrary order $d\geq 1$. One of our main findings is that, for vectors of $U$-statistics satisfying de Jong's conditions and whose covariances admit a limit, componentwise convergence systematically implies joint convergence to Gaussian: this is the first instance in which such a phenomenon is described outside the frameworks of homogeneous chaoses and of diffusive Markov semigroups.

contrasted with the 'typical' non-central asymptotic behaviour of degenerate U-statistics of a fixed order d ≥ 2 and with a fixed kernel (see e.g. [22], [62], [60], [15] or [27, Ch. 11]); it also provides a general explanation of the ubiquitous emergence of the Gaussian distribution in geometric models where counting statistics can be naturally represented in terms of degenerate U-statistics, see e.g. [26,48,54].
One should notice that de Jong's central limit theorem (CLT) is a one-dimensional qualitative statement: in particular, it does not provide any meaningful information about the rate of convergence of the law of W_n towards the target Gaussian distribution. Our aim in this paper is to use Stein's method of exchangeable pairs, as originally developed in Stein's monograph [63], in order to prove new quantitative and multidimensional versions of de Jong's central limit theorem under minimal conditions, in the setting of degenerate and non-symmetric U-statistics that do not necessarily have the form of homogeneous sums. In particular, we are interested in characterizing the joint convergence of vectors of degenerate U-statistics whose components verify one-dimensional CLTs.
One of the main motivations for pursuing our goal is that the findings of [11] have anticipated a modern and very fruitful direction of research, where tools of infinite-dimensional calculus are used in order to deduce fourth moment theorems in the spirit of de Jong (but, crucially, without the use of Lindeberg-Feller-type conditions) for random variables belonging to the homogeneous chaos of some general random field. The best-known results in this area gravitate around the main discovery of [46] (as well as its multidimensional extension [51]), where it is proved that a sequence of normalized random variables {Y_n : n ≥ 1}, belonging to a fixed Wiener chaos of a Gaussian field, verifies a central limit theorem (CLT) if and only if E[Y_n^4] → 3. The combined use of Malliavin calculus and Stein's method has consequently allowed one to deduce strong quantitative versions of these results, with explicit Berry-Esseen bounds (see [40,41]), and it is therefore a natural question to ask whether the original CLT by de Jong can be endowed with explicit bounds that are comparable with those available in a Gaussian setting.
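The fourth moment criterion E[Y_n^4] → 3 can be illustrated numerically. The following Monte Carlo sketch (our toy construction, not taken from the paper) uses the "cyclic" statistic W_n = n^{-1/2} Σ_i X_i X_{i+1 (mod n)} with i.i.d. standard normal inputs; this is a normalized degenerate U-statistic of order 2 whose fourth cumulant equals 18/n, so its fourth moment approaches the Gaussian value 3 as n grows.

```python
import numpy as np

# Monte Carlo sketch (ours, not from the paper): W_n = n^{-1/2} * sum_i X_i X_{i+1 (mod n)}
# with X_i i.i.d. N(0,1) has Var(W_n) = 1 and fourth cumulant 18/n, so
# E[W_n^4] -> 3: the fourth moment criterion in action.
rng = np.random.default_rng(0)

def sample_W(n, n_samples):
    X = rng.standard_normal((n_samples, n))
    # sum_i X_i * X_{i+1 (mod n)}, normalized to unit variance
    return (X * np.roll(X, -1, axis=1)).sum(axis=1) / np.sqrt(n)

for n in (5, 100):
    W = sample_W(n, 50_000)
    print(f"n={n}: Var ~ {W.var():.3f}, E[W^4] ~ {np.mean(W**4):.3f}")
```

For n = 5 the empirical fourth moment stays well above 3 (the exact value is 3 + 18/5 = 6.6), while for n = 100 it is already close to 3.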
The reader can consult the constantly updated webpage https://sites.google.com/site/malliavinstein/home for an overview of the emerging domain of research connected to [40,41,46,51]. Among the many notable ramifications of the results of [40,46] to which our findings should be compared, we quote: [32,44,55] for results involving homogeneous sums in the Rademacher (also called Walsh) chaos; [17,34,35,49,52,57,61] for the analysis of Poissonized U-statistics living in the Wiener chaos associated with a Poisson measure; [1,5,30,45] for fourth moment theorems involving homogeneous sums in a non-commutative setting; and [2,6,36] for results in the setting of chaotic random variables associated with a diffusive Markov semigroup. Central and non-central quantitative versions of de Jong's results in the case of fully symmetric Poissonized U-statistics can be found in [17,20,50]. Two sets of references are particularly relevant for the present work: (a) In reference [43] (see also [53]), de Jong's CLT in the special case of homogeneous sums was studied in the framework of the powerful theory of universality and influence functions initiated in [39]. In particular, explicit bounds were obtained for vectors of homogeneous sums satisfying a CLT.
(b) In the already quoted reference [51], the following striking phenomenon was discovered. For r ≥ 2, let Y_m = (Y_m^1, ..., Y_m^r), m ≥ 1, be a sequence of random vectors whose components live in a fixed Wiener chaos, and assume that the covariance
EJP 22 (2017), paper 2.
U-statistics. Our main theorems on the matter show that the case of U-statistics of the same order must take into account at least one cumulant of order four, thus echoing recent results from [6]. Our forthcoming Theorem 1.7 marks the first instance in which the phenomenon observed in [51] is described in full generality, outside the frameworks of homogeneous chaoses and of the chaoses associated with a diffusive Markov semigroup.
We will now describe our setting and our main results in more detail.

Main results, I: univariate normal approximations
Let us fix the following setup and notation, which we essentially adopt from [12]. We refer the reader to the classical references [25,31,29,62,64], as well as to the more recent works [18,19,33,47], for an introduction to degenerate U-statistics, Hoeffding decompositions and their use in stochastic analysis. Let (Ω, F, P) be a probability space and, for an integer n ≥ 1, let X_1, ..., X_n be independent random elements on this space, assuming values in the respective measurable spaces (E_1, E_1), ..., (E_n, E_n). Further, for J ∈ D_d, let σ_J² := Var(W_J) and define the Lindeberg-Feller quantity

ρ_n² := max_{1≤j≤n} Σ_{J∈D_d : j∈J} σ_J².   (1.7)

The next result corresponds to de Jong's celebrated (qualitative) CLT discussed in Section 1. Our first main statement provides an explicit bound in the Wasserstein distance d_Wass for the CLT of Theorem 1.2. We recall that, given two integrable random variables X and Y, the Wasserstein distance between the distributions of X and Y is given by the quantity

d_Wass(X, Y) := sup_{h∈Lip(1)} |E[h(X)] − E[h(Y)]|,

where Lip(1) stands for the class of 1-Lipschitz functions.

Theorem 1.3. As before, let W ∈ L⁴(P) be a degenerate U-statistic of order d such that (1.1) is satisfied and let Z ∼ N(0, 1) be a standard normal random variable. Then, it holds that d_Wass(W, Z) is bounded above by an explicit quantity, where κ_d is a finite constant which only depends on d.

Recall that a degenerate U-statistic W of order d, as given by (1.6), is called symmetric if, additionally, the measurable spaces (E_1, E_1), ..., (E_n, E_n) all coincide, the random variables X_1, ..., X_n are i.i.d. and there is a measurable kernel g : E_1^d → R such that f_J = g for all J ∈ D_d. In this special situation, the relevant quantities simplify accordingly. Hence, we arrive at the following corollary of Theorem 1.3.
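To make degeneracy and Hoeffding orthogonality concrete, here is a minimal sketch (our toy example, not from the paper): with Rademacher inputs and product kernels f_J(x_i, x_j) = c_J x_i x_j, every component W_J integrates to zero in each single argument, and the variance of W is exactly the sum of the component variances. All expectations are computed by exhaustive enumeration.

```python
from itertools import product, combinations

# Toy example (ours): X_1,...,X_n i.i.d. Rademacher, kernels
# f_J(x_i, x_j) = c_J * x_i * x_j for every 2-subset J.  Each W_J is
# degenerate (averaging over any single coordinate gives 0) and the
# Hoeffding components are orthogonal: Var(W) = sum_J Var(W_J).
n = 4
c = {J: 0.1 * (i + 1) for i, J in enumerate(combinations(range(n), 2))}

def W_J(J, x):
    i, j = J
    return c[J] * x[i] * x[j]

points = list(product([-1, 1], repeat=n))          # uniform on {-1,1}^n

# degeneracy: flipping (i.e. averaging out) the j-th coordinate cancels W_J
for J in c:
    for j in J:
        for x in points:
            x_flip = list(x); x_flip[j] = -x[j]
            assert W_J(J, x) + W_J(J, tuple(x_flip)) == 0

var_W = sum(sum(W_J(J, x) for J in c) ** 2 for x in points) / len(points)
sum_var = sum(cJ ** 2 for cJ in c.values())
print(abs(var_W - sum_var) < 1e-12)   # orthogonality of Hoeffding components
```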

Corollary 1.4.
Let W ∈ L⁴(P) be a normalized, degenerate and symmetric U-statistic of order d and let Z ∼ N(0, 1) be a standard normal random variable. Then, the bound of Theorem 1.3 holds with the correspondingly simplified quantities. In particular, the conclusion applies under the assumptions of Theorem 1.2.

(b) In the context of multilinear forms in independent and standardized real-valued random variables (X_i)_{i∈N} considered in [43], the authors had to assume that the uniform moment condition sup_{i∈N} E[X_i⁴] < ∞ is satisfied. It is easy to check that, for homogeneous sums, this condition is in fact equivalent to a hypercontractivity condition. Interestingly, this condition was also assumed in the monograph [11] by de Jong, who was only able to dispense with it in the later paper [12]. Note further that the bounds for multilinear forms in independent random variables with arbitrary distributions derived in [43] are stated in terms of three times differentiable test functions whose first three derivatives are uniformly bounded by a constant. Hence, our Theorem 1.3 is not only more general than the corresponding result from [43] as far as the class of random functionals is concerned, but is also stated in terms of much less smooth test functions.
(c) It should be mentioned that the original proof of Theorem 1.2 in [12] applies a quantitative martingale CLT from [24] and that, by carefully revising its proof, one would be able to derive a bound on the rate of convergence. This issue is also briefly addressed in the introduction of the monograph [11], but not pursued any further. The resulting rate, however, would be of a much worse order than the rate provided by Theorem 1.3: roughly, the power 1/2 appearing in our statements would have to be systematically replaced by the power 1/5. Furthermore, as was shown in [23] by means of an example, the Berry-Esseen bound for martingales from [24] is in general unimprovable, so that no better rate could be extracted from his qualitative statement. Note that the phenomenon of generally sharp bounds on the rate of convergence for martingale CLTs, which reduce to sub-optimal bounds in particular situations, was already discovered in the paper [3]. We also stress that, unlike our work, references [11,12] do not contain any multidimensional statements.
Finally, we would like to mention that the paper [58] also deals with bounds on the normal approximation of so-called degenerate weighted U-statistics of order d = 2, which have the form

U = Σ_{1≤i<j≤n} w_{i,j} ψ(X_i, X_j)

for some vector X = (X_1, ..., X_n) of i.i.d. random variables, some symmetric, degenerate kernel ψ and nonnegative weights w_{i,j}, 1 ≤ i < j ≤ n. Note that the class of weighted U-statistics is strictly included in our framework, since we can define the degenerate kernel f_{i,j} corresponding to the subset {i, j} ∈ D_2 by f_{i,j} := w_{i,j} ψ, leading to the Hoeffding components W_{i,j} = w_{i,j} ψ(X_i, X_j), 1 ≤ i < j ≤ n. This, of course, also works for arbitrary positive integers d. Note that, in contrast to our work, the bounds given in [58] are expressed in terms of quantities which are related explicitly to the kernel ψ and to the weights w_{i,j}, rather than in terms of the fourth cumulant of U, and, hence, cannot be immediately compared to ours.
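The embedding of weighted U-statistics into the general framework can be spelled out directly. The following small sketch (our toy example, with the hypothetical choice ψ(x, y) = xy) simply exhibits the two equivalent views of the same statistic: the weighted sum on the one hand, and the family of Hoeffding components f_{i,j} = w_{i,j} ψ on the other.

```python
import numpy as np

# Toy check (ours): a weighted U-statistic sum_{i<j} w_ij * psi(X_i, X_j)
# fits the general framework via the kernels f_{i,j} := w_ij * psi attached
# to the subsets {i, j}; its Hoeffding components are W_{i,j} = w_ij * psi(X_i, X_j).
rng = np.random.default_rng(1)
n = 6
w = np.triu(rng.random((n, n)), k=1)         # nonnegative weights w_ij, i < j
X = rng.standard_normal(n)
psi = lambda x, y: x * y                     # a symmetric degenerate kernel (toy choice)

weighted = sum(w[i, j] * psi(X[i], X[j])
               for i in range(n) for j in range(i + 1, n))
components = {(i, j): w[i, j] * psi(X[i], X[j])
              for i in range(n) for j in range(i + 1, n)}
print(abs(weighted - sum(components.values())) < 1e-12)  # same statistic, two views
```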

Main results, II: multivariate normal approximations
In this subsection we state a new approximation theorem for the distribution of vectors of degenerate, non-symmetric U-statistics by a suitable multivariate normal distribution. In particular, we show that an analog of de Jong's Theorem 1.2 holds in any dimension; see Theorem 1.7. Note that, in the multivariate case, even this qualitative result, relating the asymptotic normality of a vector of degenerate, non-symmetric U-statistics to fourth moment conditions, is completely novel. As before, let X_1, ..., X_n be the underlying sequence of independent random variables, let r ∈ N and, for 1 ≤ i ≤ r, let W(i) be a random variable on (Ω, F, P) which is measurable with respect to F_[n] = σ(X_1, ..., X_n) and whose Hoeffding decomposition is given by

W(i) = Σ_{J∈D_{p_i}} W_J(i)

for some p_i ∈ N, i.e. W(i) is a degenerate U-statistic of order p_i. Without loss of generality, we can assume that p_i ≤ p_k whenever 1 ≤ i < k ≤ r. Thus, there are an s ∈ {1, ..., r}, positive integers r_1, ..., r_s with 1 ≤ r_1 < r_2 < ... < r_s = r and integers 1 ≤ q_1 < q_2 < ... < q_s such that p_i = q_l for all i ∈ {r_{l−1} + 1, ..., r_l} and all l = 1, ..., s, where we set r_0 := 0. We define W := (W(1), ..., W(r))^T, assume that each W(i) ∈ L⁴(P) has unit variance, and set v_{i,k} := E[W(i)W(k)] as well as V := (v_{i,k})_{1≤i,k≤r}. Note that v_{i,i} = 1 for i = 1, ..., r and that |v_{i,k}| ≤ 1 for 1 ≤ i, k ≤ r, by the Cauchy-Schwarz inequality. Note also that v_{i,k} = 0 unless p_i = p_k. Hence, V is a block diagonal matrix.
Throughout this section we denote by Z a centered Gaussian vector with covariance matrix V. For 1 ≤ k ≤ r and J ∈ D_{p_k} we define σ_J(k)² := Var(W_J(k)) = E[W_J(k)²] and

ρ_{n,k}² := max_{1≤j≤n} Σ_{J∈D_{p_k} : j∈J} σ_J(k)².

Before stating our multivariate normal approximation theorem, we have to introduce some more notation: for a vector x = (x_1, ..., x_r)^T ∈ R^r we denote by ‖x‖_2 its Euclidean norm, and for a matrix A ∈ R^{r×r} we denote by ‖A‖_op the operator norm induced by the Euclidean norm, i.e. ‖A‖_op := sup{‖Ax‖_2 : ‖x‖_2 = 1}.
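The maximal-influence quantity can be computed mechanically from the component variances. In the sketch below (our toy example), the components are the n "cyclic" kernels f_{i,i+1} = X_i X_{i+1}/√n, each with variance 1/n, so every coordinate j lies in exactly two components and the maximal influence equals 2/n.

```python
# Toy computation (ours): the maximal influence of a single coordinate,
# max_j sum_{J : j in J} sigma_J^2, for the cyclic kernels
# f_{i, i+1 (mod n)} = X_i X_{i+1} / sqrt(n), each of variance 1/n.
n = 10
sigma2 = {(i, (i + 1) % n): 1.0 / n for i in range(n)}   # Var of each component
rho2 = max(sum(s for J, s in sigma2.items() if j in J) for j in range(n))
print(rho2)  # every j lies in exactly two "edges", so the maximum is 2/n
```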
Recall that, for a function h : R^r → R, its minimum Lipschitz constant M_1(h) is given by

M_1(h) := sup_{x≠y} |h(x) − h(y)| / ‖x − y‖_2 ∈ [0, ∞].

For matrices A, B ∈ R^{r×r} we write ⟨A, B⟩_H.S. := Tr(AB^T); thus, ⟨·, ·⟩_H.S. is just the standard inner product on R^{r×r} ≅ R^{r²}. The corresponding Hilbert-Schmidt norm will be denoted by ‖·‖_H.S.. With this notion at hand, following [8] and [38], for k = 2 we finally define

M_2(h) := sup_{x∈R^r} ‖Hess h(x)‖_op,

where Hess h is the Hessian matrix corresponding to h. The bounds of Theorem 1.6 are expressed in terms of quantities of the form q_l min(ρ_{n,k}², ρ_{n,i}²) + q_l ρ_{n,k} ρ_{n,i} + C_{i,k} max(ρ_{n,i}², ρ_{n,k}²); under the above assumptions, we have the corresponding bounds. Since the class of all compactly supported, three times differentiable functions h on R^r is convergence-determining, from Theorem 1.6 (i) we obtain the following statement, which is a new multidimensional extension of Theorem 1.2.

Theorem 1.7. Fix r ≥ 2, as well as integers p_1, ..., p_r, and let n_m → ∞, as m → ∞. Let W_m := (W_m(1), ..., W_m(r))^T, m ≥ 1, be a sequence of random vectors such that each W_m(k) is a centered, unit variance degenerate U-statistic of order p_k whose argument is the vector of independent random elements (X_1^(m), ..., X_{n_m}^(m)). Furthermore, let Σ ∈ R^{r×r} be a positive semi-definite matrix with Σ(j, j) = 1 for j = 1, ..., r, and denote by N = (N(1), ..., N(r))^T ∼ N_r(0, Σ) a centered Gaussian vector with covariance matrix Σ. Assume that conditions (i)-(iv) of the statement hold. Then, as m → ∞, W_m converges in distribution to N.
In the framework of the normal approximation of vectors of eigenfunctions of diffusive Markov semigroups, a condition similar to (iv) in the above statement has been recently introduced and applied in [6]. The rest of the paper is organized as follows: Section 2 contains the proof of our one-dimensional result, Section 3 focusses on our multidimensional statements, whereas Section 4 contains the detailed proofs of several technical lemmas.

Proof of the one-dimensional theorem
In this section we give a detailed proof of Theorem 1.3. First we review Stein's method of exchangeable pairs for univariate normal approximation.

Stein's method of exchangeable pairs
The exchangeable pairs approach within Stein's method dates back to Stein's celebrated monograph [63]. Recall that a pair (X, X′) of random elements on a common probability space is called exchangeable if (X, X′) and (X′, X) have the same distribution. In [63], C. Stein extensively illustrated the fact that a given normalized random variable W is close in distribution to Z ∼ N(0, 1) whenever one can construct another random variable W′ on the same space such that: (i) the pair (W, W′) is exchangeable; (ii) the linear regression property E[W′ − W | W] = −λW is satisfied for some small λ > 0; and (iii) the conditional second moment of W′ − W given W is close to its mean, the constant 2λ, in the L¹ metric. For a precise statement see Theorem 2.1 below.
The range of examples to which this method can be applied was considerably extended by the work [58] of Rinott and Rotar, who proved bounds on the distance to normality under the condition that the linear regression property is only approximately satisfied, i.e. that there is some negligible remainder term R such that

E[W′ − W | G] = −λW + R

is satisfied, where G is a sub-σ-field of F such that σ(W) ⊆ G. The method of exchangeable pairs has been generalized to other absolutely continuous distributions, like the exponential ([7] and [21]), the multivariate normal ([8], [56] and [38]) and the Beta distribution [14]. It has also been developed for general classes of one-dimensional absolutely continuous distributions in [9], [16] and [14]. As was observed in [59], in the case of one-dimensional distributional approximation one may in general relax the exchangeability condition to the assumption that W and W′ be identically distributed.
In this article we focus on the exchangeable pairs method in the context of one- and multidimensional normal approximation. The following result is a variant of Theorem 1, Lecture 3 in [63] (see also Theorem 4.9 in [10]). It slightly improves on these results with respect to the constants appearing in the bound; it is also stated in terms of identically distributed random variables W, W′, as opposed to exchangeable ones, as well as for general sub-σ-fields G of F with σ(W) ⊆ G. The proof is standard and therefore omitted from the paper. Moreover, the result is a direct consequence of Proposition 3.19 in [14], together with the best known bounds on the first two derivatives of the solution to the standard normal Stein equation for Lipschitz test functions (see e.g. Lemma 2.4 in [10]).

Theorem 2.1. Let (W, W′) be a pair of identically distributed, square-integrable random variables on (Ω, F, P) such that, for some λ > 0, (2.1) holds. Furthermore, let G be a sub-σ-field of F with σ(W) ⊆ G. Then, we have the bound (2.3).

For the proof of Theorem 1.3 we will need the following new auxiliary result about exchangeable pairs satisfying identity (2.1), which might be of independent interest.

Lemma 2.2. Let (W, W′) be an exchangeable pair of real-valued random variables in L⁴(P) such that, for some λ > 0, (2.1) is satisfied, and let G be a sub-σ-field of F with σ(W) ⊆ G. Then,

Proof. By exchangeability of (W, W′) we obtain the identities (2.4)-(2.6). Thus, combining (2.4), (2.5) and (2.6), we obtain the claimed identity, proving the lemma.
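Stein's conditions can be checked in closed form in the simplest example. The following sketch (our toy example, not from the paper) takes W = n^{-1/2}(X_1 + ... + X_n) with i.i.d. Rademacher X_i and builds W′ by redrawing one uniformly chosen coordinate; then E[W′ − W | X] = −λW with λ = 1/n, and the key error term of Theorem 2.1, E|1 − (2λ)^{-1} E[(W′ − W)² | G]|, vanishes identically. Both facts are verified by exhaustive enumeration.

```python
from itertools import product

# Toy example (ours): W = n^{-1/2} * sum(X), X_i i.i.d. Rademacher, and W'
# obtained by replacing one uniformly chosen coordinate with a fresh sign.
# Then E[W'-W | X] = -(1/n) W and (2 lambda)^{-1} E[(W'-W)^2 | X] = 1 exactly.
n = 4
lam = 1.0 / n

def W(x):
    return sum(x) / n ** 0.5

for x in product([-1, 1], repeat=n):
    d1 = d2 = 0.0
    for a in range(n):            # uniform replaced coordinate alpha
        for y in (-1, 1):         # fresh Rademacher Y_alpha
            xp = list(x); xp[a] = y
            diff = W(xp) - W(x)
            d1 += diff / (2 * n)       # E[W' - W | X = x]
            d2 += diff ** 2 / (2 * n)  # E[(W' - W)^2 | X = x]
    assert abs(d1 + lam * W(x)) < 1e-12
    assert abs(d2 / (2 * lam) - 1.0) < 1e-12
print("linear regression and conditional variance condition verified")
```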

Proof of Theorem 1.3
Let W ∈ L⁴(P) be as in Theorem 1.3, with Hoeffding decomposition given by (1.6). We are going to apply Theorem 2.1 to the σ-field G = σ(X_1, ..., X_n) and to the exchangeable pair (W, W′), which is constructed as follows. Let Y := (Y_j)_{1≤j≤n} be an independent copy of X := (X_j)_{1≤j≤n} and let α be uniformly distributed on {1, ..., n}, such that X, Y and α are jointly independent. Letting, for j = 1, ..., n,

X′_j := X_j if j ≠ α and X′_α := Y_α,

it is easy to see that the pair (X, X′) is exchangeable. Finally, as exchangeability is preserved under functions, the pair (W, W′) is exchangeable as well, where W′ is defined from X′ in the same way as W is defined from X and the kernel f_J is given by (1.5). We now show that the pair (W, W′) satisfies Stein's linear regression property (2.1) exactly, with coefficient λ = d/n.
Proof. It suffices to prove the second equality. Note that, by independence, the conditional expectation can be computed term by term; here, we have used the defining property of the Hoeffding decomposition to obtain the fourth equality.
We would like to mention that the same construction of the exchangeable pair (W, W′) was used in [58] in the situation of weighted U-statistics; there, the validity of (2.1) with λ = d/n was also noted in the special case of completely degenerate weighted U-statistics of order d.
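The identity λ = d/n can be verified exhaustively in a small example. The sketch below (our toy example, with Rademacher inputs and product kernels) implements the single-coordinate replacement described above and checks E[W′ − W | X] = −(d/n)W for d = 2 over every configuration.

```python
from itertools import product, combinations

# Toy check (ours): single-coordinate replacement for the degenerate
# U-statistic W = sum_{i<j} X_i X_j (order d = 2, Rademacher inputs) gives
# E[W' - W | X] = -(d/n) W exactly.
n, d = 4, 2

def W(x):
    return sum(x[i] * x[j] for i, j in combinations(range(n), 2))

for x in product([-1, 1], repeat=n):
    avg = 0.0
    for a in range(n):          # uniform replaced coordinate alpha
        for y in (-1, 1):       # fresh Rademacher Y_alpha
            xp = list(x); xp[a] = y
            avg += (W(xp) - W(x)) / (2 * n)
    assert abs(avg + (d / n) * W(x)) < 1e-12
print("E[W' - W | X] = -(d/n) W verified")
```

The exact coefficient d/n appears because averaging out the replaced coordinate kills precisely the components W_J with α ∈ J, and each J has cardinality d.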
In order to apply (2.3), by Lemma 2.3 we thus have to compute an upper bound on the variance of (n/(2d)) E[(W′ − W)² | X]. This is done by finding the Hoeffding decomposition of this quantity in terms of the Hoeffding decomposition of W², for which we will now find a new convenient expression. More generally, we derive a formula for the Hoeffding decomposition of the product of two degenerate U-statistics, which will also be needed for the proof of Theorem 1.6.
Assume that 1 ≤ p, q ≤ n and that W and V are square-integrable p- and q-degenerate U-statistics, respectively, with respect to the same underlying sequence X, with Hoeffding decompositions given by (2.7). The product U := VW is in general not a degenerate U-statistic, but it clearly admits a Hoeffding decomposition of its own.

Lemma 2.5. Let J ∈ D_p and K ∈ D_q, respectively.
(a) The Hoeffding decomposition of W_J V_K is given by the formula below.

Proof. The claim of (a) follows immediately from Lemma 2.4 and from the general formula for the Hoeffding decomposition of an F_{J∪K}-measurable random variable T. The claim of (b) follows similarly, upon observing that, for L ⊆ (J ∪ K) \ {j}, the corresponding conditional expectations vanish.

The next result, which might be of independent interest, plays a similar role as the product formula for two multiple Wiener-Itô integrals (see e.g. [41]).

Theorem 2.6 (Product formula for degenerate U-statistics). Let 1 ≤ p, q ≤ n and let W, V ∈ L²(P) be p- and q-degenerate U-statistics, respectively, with respective Hoeffding decompositions given by (2.7). Then, the Hoeffding decomposition of the product VW admits an explicit form.

Before we proceed, let us, following [11] and [12], introduce the following important classes of quadruples (J_1, J_2, J_3, J_4) ∈ D_d⁴. We call an element j ∈ J_1 ∪ J_2 ∪ J_3 ∪ J_4 a free index if it appears in J_i for exactly one i ∈ {1, 2, 3, 4}; note that the presence of a free index implies that E[W_{J_1} W_{J_2} W_{J_3} W_{J_4}] = 0. A quadruple without free indices, i.e. one in which each element of the union J_1 ∪ J_2 ∪ J_3 ∪ J_4 appears in J_i for at least two values of i ∈ {1, 2, 3, 4}, is called bifold if every element of the union appears in J_i for exactly two values of i; it belongs to the class T_d if, instead, there is an element of the union that appears in J_i for at least three values of i ∈ {1, 2, 3, 4}.
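The role of free indices can be checked by brute force in a toy model. In the sketch below (ours, not from the paper), inputs are Rademacher and each component is the monomial Π_{j∈J} x_j; the expectation E[W_{J_1} W_{J_2} W_{J_3} W_{J_4}], computed by exhaustive enumeration, vanishes whenever the quadruple has a free index, because that index then appears an odd number of times.

```python
from itertools import combinations, product
from collections import Counter

# Toy check (ours): with Rademacher X and W_J = prod_{j in J} X_j, any
# quadruple (J1, J2, J3, J4) possessing a free index (an element lying in
# exactly one of the four sets) has E[W_J1 W_J2 W_J3 W_J4] = 0.
n, d = 5, 2
subsets = list(combinations(range(n), d))
points = list(product([-1, 1], repeat=n))

def has_free_index(quad):
    counts = Counter(j for J in quad for j in J)
    return any(c == 1 for c in counts.values())

def expectation(quad):                 # exact average over {-1,1}^n
    tot = 0
    for x in points:
        v = 1
        for J in quad:
            for j in J:
                v *= x[j]
        tot += v
    return tot / len(points)

for quad in product(subsets, repeat=4):
    if has_free_index(quad):
        assert expectation(quad) == 0
print("free index => vanishing mixed moment, on all",
      len(subsets) ** 4, "quadruples")
```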
Note that the last identity in the definition of S_0 is true by virtue of (2.11). The following result is Proposition 5 (b) of [12]; we will prove a more general version, stated as Proposition 3.5, in order to deal with the multivariate case.
Recall the definition of the Lindeberg-Feller quantity ρ_n² given in (1.7). Next, we state a substantial improvement of Lemma B in [12]: indeed, there the upper bound on τ is of order ρ_n, as compared to the order ρ_n² which we obtain. Its proof is deferred to Section 4.
The next two lemmas will be very useful for what follows.
which proves the claim. Now we are able to bound the first term on the right-hand side of (2.3).

Lemma 2.11. For the above constructed exchangeable pair we have the bound (2.14).

Proof. Using the orthogonality of the summands within the Hoeffding decomposition, as well as the fact that a_M ∈ [0, 1] for |M| ≤ 2d − 1, from Lemma 2.7 we obtain the asserted chain of estimates, where the final inequality is by Lemma 2.10.

Now we proceed to bounding the second error term appearing in the bound (2.3) from Theorem 2.1. The next lemma will be crucial for doing this.
Proof. From Lemmas 2.3, 2.2 and 2.7 we obtain the asserted estimates, where we have used Lemma 2.10 to obtain the last inequality.
From the foregoing identity and the Cauchy-Schwarz inequality we obtain the desired bound.

Stein's method of exchangeable pairs for multivariate normal approximation
Although the exchangeable pairs coupling lies at the heart of univariate normal approximation by Stein's method, it was only in 2008, in [8], that the problem of developing an analogous technique in the multivariate setting was finally attacked. In that work, for a given random vector W = (W(1), ..., W(r))^T, the authors assume the existence of another random vector W′ = (W′(1), ..., W′(r))^T, defined on the same probability space (Ω, F, P), such that W′ has the same distribution as W and such that the linear regression property E[W′ − W | W] = −λW is satisfied for some positive constant λ. Under these assumptions, the authors prove several theorems which bound the distance from W to a standard normal random vector in terms of the pair (W, W′). In [56] the authors motivate and investigate the more general linear regression property

E[W′ − W | G] = −ΛW + R,   (3.1)

where now Λ is an invertible non-random r × r matrix, G ⊆ F is a sub-σ-field of F such that σ(W) ⊆ G, and R = (R(1), ..., R(r))^T is a small remainder term. However, in contrast to [8] and to the univariate situation presented in Subsection 2.1, in [56] the full strength of the exchangeability of the vector (W, W′) is needed. Finally, in [38] the two approaches from [8] and [56] are combined, allowing for the more general linear regression property from [56] and using sharper coordinate-free bounds on the solution to the Stein equation, similar to those derived in [8]. The following result, quoted from [13], is (a version of) Theorem 3 in [38], but with better constants.

Theorem 3.1. Let (W, W′) be an exchangeable pair of R^r-valued L²(P) random vectors defined on a probability space (Ω, F, P) and let G ⊆ F be a sub-σ-field of F such that σ(W) ⊆ G. Suppose there exist a non-random invertible matrix Λ ∈ R^{r×r}, a non-random positive semidefinite matrix Σ, a G-measurable random vector R and a G-measurable random matrix S such that (3.1) and

E[(W′ − W)(W′ − W)^T | G] = 2ΛΣ + S

hold true.
Finally, denote by Z a centered r-dimensional Gaussian vector with covariance matrix Σ.

Proof of Theorem 1.6
Recall the notation and assumptions from Subsection 1.3. Starting from the random vector W = (W(1), ..., W(r))^T, we will construct another vector W′ := (W′(1), ..., W′(r))^T such that (W, W′) is an exchangeable pair, in the following way: for each 1 ≤ i ≤ r we construct W′(i) in the same way as in the one-dimensional situation treated in Subsection 2.2, from the same independent copy Y = (Y_1, ..., Y_n) of X = (X_1, ..., X_n) and the same α, which is independent of (X, Y) and uniformly distributed on [n]. We will apply Theorem 3.1 with Σ = V and G = σ(X_1, ..., X_n).

Lemma 3.2.
With the above definitions and notation we have

E[W′ − W | G] = −ΛW,

where the matrix Λ is given by Λ = diag(p_1/n, ..., p_r/n).
Proof. This follows immediately from Lemma 2.3.
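The diagonal regression matrix Λ can be verified directly in a small example. The sketch below (our toy example) takes a vector of degenerate U-statistics of orders p = (1, 2) with Rademacher inputs and checks the componentwise regression E[W′ − W | X] = −diag(p_1/n, p_2/n) W by exhaustive enumeration.

```python
from itertools import product, combinations

# Toy check (ours): for W = (sum_i X_i, sum_{i<j} X_i X_j), i.e. degenerate
# U-statistics of orders p = (1, 2), single-coordinate replacement yields
# E[W' - W | X] = -Lambda W with Lambda = diag(p_1/n, p_2/n).
n = 4

def W_vec(x):
    return (sum(x),                                               # order p_1 = 1
            sum(x[i] * x[j] for i, j in combinations(range(n), 2)))  # order p_2 = 2

for x in product([-1, 1], repeat=n):
    avg = [0.0, 0.0]
    for a in range(n):
        for y in (-1, 1):
            xp = list(x); xp[a] = y
            for i in range(2):
                avg[i] += (W_vec(xp)[i] - W_vec(x)[i]) / (2 * n)
    w = W_vec(x)
    assert abs(avg[0] + (1 / n) * w[0]) < 1e-12
    assert abs(avg[1] + (2 / n) * w[1]) < 1e-12
print("E[W' - W | X] = -diag(p_i/n) W verified")
```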
Hence, we obtain that the regression property (3.1) holds with R = 0. Let us define the random matrix S = (S_{i,k})_{1≤i,k≤r} by the relation

E[(W′ − W)(W′ − W)^T | G] = 2ΛV + S.

From Lemma 3.2 and the fact that v_{i,k} = 0 unless p_i = p_k, we easily conclude that S is symmetric. Also, using exchangeability, it is readily checked that S is centered. Let the Hoeffding decomposition of W(i)W(k) be given as in Lemma 2.5; then, we have the corresponding Hoeffding decomposition of S_{i,k}.

Proof. First note the representation of S_{i,k} in terms of the components of W(i) and W(k); for the third equality we have used a crucial identity. Also, from Lemma 2.5 (a) we obtain the decomposition of the individual products. Since S is centered, from (3.4) and Lemma 3.3 we obtain the following: if p_i < p_k, then from (3.8), Lemma 4.1 immediately yields the required bound; if p_i = p_k, then we obtain (3.11). Similarly to Lemma 4.1, we obtain for p_i = p_k a lower bound

≥ v_{i,k}² − p_i ρ_{n,k} ρ_{n,i}.

Note that we can write the relevant quantity accordingly. Hence, if p_i < p_k, then, since v_{i,k} = 0, from (3.9), (3.10) and (3.13) we obtain the required estimate. If, on the other hand, p_i = p_k, then from (3.11), (3.10), (3.13) and (3.12) we conclude the corresponding bound involving p_i ρ_{n,k} ρ_{n,i} − S_0(i, k); for the last identity we have used an elementarily verifiable fact. For p_i < p_k, by the orthogonality of the Hoeffding decomposition and by the Cauchy-Schwarz inequality we have the corresponding estimate, and, since p_i < p_k, by means of (3.16) we can further bound this quantity, where the final inequality is true by (2.13).
From (3.14) and (3.17), and from (3.15), respectively, we thus obtain the following result. (ii) If p_i = p_k, then the analogous bound holds. It remains to bound the quantities S_0(i, k), 1 ≤ i ≤ k ≤ r. The concepts of free indices and bifold quadruples from Subsection 2.2 generalize in the obvious way to quadruples in D_{i,k}⁴. We denote by B_{i,k} the collection of all bifold quadruples in D_{i,k}⁴. Also, we denote by T_{i,k} the set of quadruples (J_1, J_2, J_3, J_4) ∈ D_{i,k}⁴ which are neither bifold nor have a free index, i.e. which are such that each element of J_1 ∪ J_2 ∪ J_3 ∪ J_4 appears in at least two of the four sets and there is a j ∈ [n] appearing in at least three of them. With these definitions, for 1 ≤ i ≤ k ≤ r, we define the quantities τ_{i,k} and S_0(i, k). The next result is a generalization of Proposition 2.8.
The proof is postponed to Section 4. It remains to obtain a bound on the quantities τ_{i,k} in terms of ρ_{n,i}² and ρ_{n,k}². This is provided by the following result, which generalizes Proposition 2.9; an outline of the main elements of the proof is given in Section 4.

Proposition 3.6. For each 1 ≤ i, k ≤ r, there exists a finite constant C_{i,k}, which depends on i and k only through p_i and p_k and which is independent of n, such that the stated bound holds.

Combining Propositions 3.5 and 3.6, we thus obtain the corresponding estimate. Observe that, using (3.3) and the symmetry of S, we can bound the relevant quantity. Now, using Lemma 3.4, we have a bound of the form

q_l min(ρ_{n,k}², ρ_{n,i}²) + q_l ρ_{n,k} ρ_{n,i} + C_{i,k} max(ρ_{n,i}², ρ_{n,k}²).   (3.20)

Here, the constants C_{q_m} are defined in Proposition 2.9. Note that from Lemma 2.12, applied to the exchangeable pair (W(i), W′(i)), and using (3.3) as well as Jensen's inequality, we obtain the required estimate. Thus, by (3.21), Theorem 1.6 now follows from Theorem 3.1 and from the respective bounds (3.19), (3.20) and (3.22).

Proof. Note that we have
The claim follows by symmetry.
Proof. We repeat the short proof from [12]. By independence, we obtain the asserted identity.

Proof. Again, we imitate the proof given in [12]. Using first the conditional version of the Cauchy-Schwarz inequality and then, twice, the independence of the underlying random variables X_1, ..., X_n, we obtain the asserted bound, where we have used Lemma 4.2 to obtain the second equality. Note that, for a bifold quadruple (J, K, L, M), the identity J \ M = L \ K implies that J ∩ K = L ∩ M = ∅. Thus, the claim follows.
Proof of Proposition 2.9. In order to prove Proposition 2.9, let us review the following concepts and notation, introduced in [11]. For a quadruple (J_1, J_2, J_3, J_4) ∈ D_d⁴, write J_1 ∪ J_2 ∪ J_3 ∪ J_4 = {i_1, ..., i_r} with i_1 < ... < i_r, and define relabelled sets J̄_1, ..., J̄_4 ⊆ {1, ..., r} through the equivalence a ∈ J̄_l ⇔ i_a ∈ J_l; the sets J̄_l then satisfy relations mirroring those of the J_l. For a quadruple in T_d we have r ∈ {d, d + 1, ..., 2d − 1} and J_i ∩ J_k ≠ ∅ for all i, k = 1, 2, 3, 4. Indeed, if, for instance, J_1 ∩ J_2 were empty and j_0 ∈ J_i for at least three values of i ∈ {1, 2, 3, 4}, then (J_1, J_2, J_3, J_4) would have a free index and, hence, could not be in T_d. By the above observation (4.1), this immediately implies that also J̄_i ∩ J̄_k ≠ ∅ for all i, k = 1, 2, 3, 4.
In general, we call a quadruple of sets F = (F_1, F_2, F_3, F_4) a shadow (a d-shadow) if there is an r ∈ {d, d+1, ..., 2d−1} such that F := F_1 ∪ F_2 ∪ F_3 ∪ F_4 = {1, ..., r} and |F_l| = d for l = 1, 2, 3, 4. We call r the size of the shadow F. We say that the shadow F is induced by the quadruple (J_1, J_2, J_3, J_4) if F_l = J̄_l for l = 1, 2, 3, 4. If F′ = (F′_1, F′_2, F′_3, F′_4) is another shadow with F′_1 ∪ F′_2 ∪ F′_3 ∪ F′_4 = {1, ..., r′}, then we say that F and F′ are equivalent, and write F ∼ F′, if r = r′ and there is a permutation σ ∈ S_r such that F′_l = σ(F_l) for l = 1, 2, 3, 4. We denote the latter fact by F′ = F^σ. This clearly defines an equivalence relation on the set of d-shadows, and we denote by [F]_∼ the equivalence class of F. We further denote by γ(F) the number of permutations σ ∈ S_r that leave F fixed, in the sense that σ(F_l) = F_l for all l = 1, 2, 3, 4. The set of these permutations is just the stabilizer of F with respect to the natural action of S_r on the set of d-shadows of size r. Note that, for F′ ∼ F, we have γ(F) = γ(F′), and that γ(F) also gives the number of permutations σ such that (4.2) holds. Further, for a shadow F = (F_1, F_2, F_3, F_4) which is induced by some quadruple (J_1, J_2, J_3, J_4) ∈ D_d⁴ and with F = {1, ..., r}, let π_{F_l} denote the natural projection.

Proof. For ease of notation, in this proof we use bold letters a to denote tuples a = (a_1, ..., a_s) ∈ [n]^s, where s is some natural number. Here, we use the notation [n]_r for the set of all tuples (i_1, ..., i_r) ∈ [n]^r such that i_j ≠ i_k whenever j ≠ k. Hence, it suffices to show that we always have the claimed bound, uniformly in n. We first treat the simple cases in which either two or all of the sets F_l, l = 1, 2, 3, 4, are equal; note that the case of exactly three equal sets is vacuous for a quadruple in T. Assume first that, e.g., F_3 ≠ F_1 = F_2 ≠ F_4; it might be that also F_3 = F_4, but this is immaterial.
Then, we have the corresponding bound. Note that the second inequality follows from the fact that F_1 ∩ F_3 ≠ ∅ and F_1 ∩ F_4 ≠ ∅ in this case, as well as from the definition of ρ_n². For the remainder of this proof we may thus assume that the sets F_l, l = 1, 2, 3, 4, are pairwise different. Then, using the Cauchy-Schwarz inequality, we can bound the corresponding sum in terms of ρ_n².
Note that we have used this fact to obtain the last inequality.
2) There is an element j_0 ∈ F = F_1 ∪ F_2 ∪ F_3 ∪ F_4 which is contained in exactly two of the sets F_1, F_2, F_3, F_4. We may assume that j_0 ∈ F_1. We claim that then there are distinct indices j, k ∈ {2, 3, 4} such that the required inclusion holds and, hence, j_0 cannot be contained in the set on the right-hand side. Thus, let us assume that F_1 ⊆ F_3 ∪ F_4. We obtain a bound in terms of the sums of G_{F_3}(j, a_1, a_2*, k, l) G_{F_4}(j, a_1, a_2*, k, l), where a_2* ∈ [n]^{F_1∩F_2\(F_3∪F_4)} is arbitrary but fixed. Now note that, due to the fact that F is induced by some quadruple in T, we have
where the union on the right-hand side is disjoint. Thus, the last bound becomes a quantity which is independent of n.
Remark 4.5. Using the fact that the equivalence class of a shadow F = (F_1, F_2, F_3, F_4) is determined by the cardinalities of all finite intersections of the sets F_1, F_2, F_3, F_4, one can obtain an upper bound on the number s of all equivalence classes of shadows induced by quadruples in T. Using that γ(F) ≥ 1 then immediately gives a crude bound on C_d. It is not difficult to verify that C_2 = 13 by distinguishing all possible cases. Furthermore, by some clever combinatorial argument, it might be possible to compute sharp bounds on C_d starting from (4.5). This would be of great interest for deriving limit theorems in situations where d = d_n → ∞ with n. We leave this as an interesting problem for possible future work.
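The stabilizer size γ(F) is easy to compute by brute force for small shadows. The following sketch (our toy examples) counts the permutations of {1, ..., r} fixing each F_l setwise.

```python
from itertools import permutations

# Toy computation (ours): gamma(F) = number of permutations sigma in S_r
# with sigma(F_l) = F_l for every l, i.e. the stabilizer of the shadow F
# under the natural action of S_r.
def gamma(F, r):
    return sum(all({sigma[j - 1] for j in Fl} == set(Fl) for Fl in F)
               for sigma in permutations(range(1, r + 1)))

# sigma must fix {1,2} and {3,4} setwise: 2 * 2 = 4 permutations
print(gamma(({1, 2}, {1, 2}, {3, 4}, {3, 4}), r=4))   # 4
# sigma must fix each pairwise intersection, hence every point: identity only
print(gamma(({1, 2}, {1, 3}, {2, 3}, {1, 2}), r=3))   # 1
```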
Idea of the proof of Proposition 3.6. The proof of Proposition 2.9 can easily be generalized to the present situation by introducing the concept of a (p_i, p_k)-shadow corresponding to a quadruple (J_1, J_2, J_3, J_4) ∈ D_{i,k}⁴ and by following exactly the same lines of proof. We have, however, refrained from giving the proof in this more general situation for mainly two reasons. Firstly, the proof of Proposition 2.9 already involves a lot of notation, and introducing even more of it might make the argument less transparent.
Secondly, and more importantly, the precise dependence of the constant C_{i,k} on p_i and p_k would be more complicated and less explicit than the formula given by (4.5), which can be exactly evaluated for small values of d and, as mentioned in Remark 4.5, might be suitably bounded for general d.