Quantitative CLTs for symmetric $U$-statistics using contractions

We consider sequences of symmetric $U$-statistics, not necessarily Hoeffding-degenerate, both in a one- and multi-dimensional setting, and prove quantitative central limit theorems (CLTs) based on the use of {\it contraction operators}. Our results represent an explicit counterpart to analogous criteria that are available for sequences of random variables living on the Gaussian, Poisson or Rademacher chaoses, and are perfectly tailored for geometric applications. As a demonstration of this fact, we develop explicit bounds for subgraph counting in generalised random graphs on Euclidean spaces; special attention is devoted to the so-called `dense parameter regime' for uniformly distributed points, for which we deduce CLTs that are new even in their qualitative statement, and that substantially extend classical findings by Jammalamadaka and Janson (1986) and Bhattacharya and Ghosh (1992).


Motivation and Overview
1.1. Introduction. In the recent reference [DP17], we have provided a multidimensional and quantitative version of a seminal result by de Jong [dJ89, dJ90], roughly stating that, if $F = \{F_n : n \geq 1\}$ is a normalized sequence of random variables having the form of degenerate, not necessarily symmetric U-statistics of a fixed order, and $F$ enjoys an appropriate Lindeberg property, then a sufficient condition for $F_n$ to verify a central limit theorem (CLT) (as $n \to \infty$) is that $E[F_n^4] \to 3$. Observe that $3 = E[N^4]$, where $N \sim \mathcal{N}(0, 1)$ is a standard Gaussian random variable.
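The value $3$ appearing in the fourth moment condition can be checked in one line. The following is a short worked computation, relying only on the classical Stein identity $E[N f(N)] = E[f'(N)]$ for a standard Gaussian $N$ and a smooth $f$ of polynomial growth; the choice $f(x) = x^3$ is ours and is made for illustration.

```latex
% Stein's identity for N ~ N(0,1): E[N f(N)] = E[f'(N)].
% Apply it with f(x) = x^3, so that f'(x) = 3 x^2.
\begin{align*}
E[N^4] \;=\; E\big[N \cdot N^3\big] \;=\; E\big[3 N^2\big] \;=\; 3\, E[N^2] \;=\; 3 .
\end{align*}
```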
The aim of this paper is to develop some remarkable applications and extensions of the main results of [DP17,dJ89,dJ90] to the case of symmetric and degenerate U-statistics, in a possibly multidimensional setting. By symmetric we mean here that the corresponding kernel does not depend on the choice of the subset of the random input, although it might well vary with the sample size n. In particular, our main aim is to establish a collection of quantitative one- and multi-dimensional CLTs (see Theorem 3.3 and Theorem 4.2 below), with explicit bounds expressed in terms of contraction operators (see Section 2 below, as well as [Las16, Section 6], for definitions). Our tools will involve new multiplication formulae for U-statistics (see Proposition 2.6), as well as new estimates on contraction operators (see Lemma 2.4), that seem to be of independent interest.
Although the previously quoted results only involve degenerate U-statistics, we will show in Section 5 that they can be naturally generalized to the case of arbitrary symmetric U-statistics, by exploiting the explicit form of their Hoeffding decomposition, together with our multivariate results. As discussed in great detail in the two monographs [NP12,PR16], as well as in the papers [LRP13a,LRP13b,NPR10b], contraction operators play a fundamental role in CLTs involving random variables belonging to the Wiener chaos of a Gaussian field, of a Poisson measure or of a Rademacher sequence. To the best of our knowledge, our contribution represents the first systematic use of contraction operators in the framework of general symmetric U-statistics.
As the discussion in [BP16] and the references therein largely demonstrates, the use of contraction operators is well-adapted for dealing with geometric applications, involving e.g. additive functionals of random geometric graphs, like the total length, or subgraph counting statistics. In the last section of the present paper, we will apply our results to edge-counting statistics of geometric random graphs, belonging to the family of geometric structures studied in [Pen04], thus substantially generalising some estimates from [LRP13b], as well as from the classical references [BG92,JJ86].
1.2. Comments on previous literature. Due to the generality of our results, the present work is related to most articles dealing with the asymptotic normality of symmetric U-statistics, like the classical paper [Hoe48] about non-degenerate U-statistics given by fixed kernels, as well as the more recent papers [JJ86,Hal84,BG92,Web83,RR97], in which the considered kernels might well depend on the sample size n. We would like to point out explicitly that, like this work, the papers [JJ86,BG92] also prove asymptotic normality of one-dimensional U-statistics that do not necessarily have a dominant Hoeffding component via a multivariate CLT for the vector of Hoeffding components. Our method can be seen as a quantitative counterpart to such an approach. Moreover, whereas the references [JJ86,Hal84,BG92,Web83] provide in general non-equivalent and very technical conditions for asymptotic normality, our statements will only involve simple analytic quantities, merely depending on norms of contractions of the kernels. We believe that, as in the Poisson situation [LRP13a,LRP13b], such conditions are most suitable for a large array of possible applications, plausibly much wider than the set of examples discussed in the present paper. Finally, although for symmetric U-statistics the results of [DP17] imply asymptotic normality whenever each Hoeffding component satisfies a fourth moment condition, these moment conditions are generally quite hard to check in practice. This remark applies even more so when the U-statistic is non-degenerate, so that one would have to deal with the complicated expressions for the kernels appearing in the Hoeffding decomposition.
As in [DP17], our results rely on Stein's method of exchangeable pairs [Ste86]. Other articles which have proved (quantitative) CLTs for U-statistics via this approach include [RR10,RR97]. However, since [RR10] only deals with non-degenerate kernels that do not depend on n, the overlap with the present paper seems marginal. In [RR97], the class of so-called weighted U-statistics is considered, and CLTs are obtained for non-degenerate kernels of arbitrary order as well as for degenerate kernels of order 2. In the latter case, and when all weights are set to 1, our bound in Theorem 3.3 not only improves on [RR97, Theorem 1.4] with respect to the rate of convergence but also deals with degenerate kernels of arbitrary orders.
We eventually observe that an alternate approach for obtaining the main results of the present paper (in particular, the general bounds of Section 5) could be based, in principle, on an adequate generalization of the de-Poissonization techniques of [DM83] to the case of non-degenerate kernels whose expression possibly depends on the sample size, that should then be combined with the estimates from [LRP13b]. In the general case of a sequence of non-degenerate U-statistics whose kernel varies with the sample size, implementing such an approach would involve a number of highly non-trivial technical difficulties: we therefore prefer to keep this direction of research as a separate subject of further investigation. We stress that the intrinsic approach developed in the present paper will also yield some remarkable results of independent interest, most notably the product formulae stated in the next section.
1.3. Plan. Section 2 contains some preliminary results, as well as a discussion of product formulae for degenerate U-statistics, and several useful estimates for contraction operators. Section 3 deals with one-dimensional approximation results for degenerate U-statistics, whereas the multidimensional case is dealt with in Section 4. In Section 5, we establish a number of new bounds for general U-statistics, whereas an application to random graphs is detailed in Section 6. Some technical proofs are collected in Section 7.

Preliminary Notions and Auxiliary Results
We will now present several useful results concerning the Hoeffding decompositions of square-integrable U-statistics, as well as contraction operators. Both constitute the theoretical backbone of our approach.
Every random object appearing in the sequel is defined on a suitable common probability space (Ω, F , P).
In general, the kernel ψ might also depend on the parameter n, but we will often suppress such a dependence, in order to simplify the notation.
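In what follows, we will write $X := (X_i)_{1 \leq i \leq n}$, and use the symbol $D_p := D_p(n) := \{J \subseteq [n] : |J| = p\}$, where $[n] := \{1, \dots, n\}$. For $p$, $\psi$, $X$ as above, we define

$J_p(\psi) := \sum_{J \in D_p} \psi(X_j, j \in J) = \sum_{1 \leq i_1 < \dots < i_p \leq n} \psi(X_{i_1}, \dots, X_{i_p})$.

We say that the random variable $J_p(\psi)$ is the U-statistic of order $p$, based on $X$ and generated by the kernel $\psi$. For $p = 0$ and a constant $c \in \mathbb{R}$ we further let $J_0(c) := 0$. Now assume that $p \geq 1$ and $\psi \in L^1(\mu^{\otimes p})$. The kernel $\psi$ is called (completely) degenerate or canonical with respect to $\mu$, if

$\int_E \psi(x_1, \dots, x_p) \, d\mu(x_1) = 0$ for $\mu^{\otimes (p-1)}$-a.a. $(x_2, \dots, x_p) \in E^{p-1}$.

Remark 2.1. In non-parametric statistics (see e.g. the classical references [KB94,Ser80]), the quantity $\binom{n}{p}^{-1} J_p(\psi)$ is called a U-statistic, since it is always an unbiased estimator of the parameter $\theta := E[\psi(X_1, \dots, X_p)]$; note that many well-known estimators from statistics turn out to be U-statistics (see again [KB94,Ser80]). We however choose to refer to the unaveraged version $J_p(\psi)$ as a "U-statistic". Moreover, in this situation the kernel is typically not degenerate, since $\theta$ would have to be equal to 0 otherwise. However, the centered kernel $\psi - \theta$ might well be degenerate.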
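The distinction between the unaveraged statistic $J_p(\psi)$ and its averaged version can be made concrete with a small computation. The following is a minimal sketch, assuming that $J_p(\psi)$ sums a symmetric kernel over all $p$-element subsets of the sample; the function names and the product kernel are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import comb


def u_statistic(kernel, sample, p):
    """Unaveraged U-statistic: sum of the kernel over all p-element subsets.

    Assumes p >= 1 and a symmetric kernel taking p positional arguments.
    """
    return sum(
        kernel(*(sample[i] for i in idx))
        for idx in combinations(range(len(sample)), p)
    )


def averaged_u_statistic(kernel, sample, p):
    """Averaged version: divide by the number of p-element subsets."""
    return u_statistic(kernel, sample, p) / comb(len(sample), p)


# Illustrative (hypothetical) kernel of order 2: psi(x, y) = x * y.
sample = [1.0, 2.0, 3.0]
j2 = u_statistic(lambda x, y: x * y, sample, 2)           # 1*2 + 1*3 + 2*3
j2_avg = averaged_u_statistic(lambda x, y: x * y, sample, 2)
```

The averaged version is the unbiased estimator of Remark 2.1; the unaveraged one is the object studied in the rest of the paper.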
having the form of a deterministic function $g$ of (not necessarily identically distributed) independent random variables $X_1, \dots, X_n$, has a P-a.s. unique representation of the type

$Y = \sum_{s=0}^{n} \sum_{M \subseteq [n] : |M| = s} Y_M$.

2.3. Hoeffding decompositions for symmetric U-statistics. Now assume that the random variable $Y$ is given by a U-statistic $J_p(\psi)$ based on a vector $X = (X_1, \dots, X_n)$ of i.i.d. random variables, and generated by a symmetric kernel $\psi$, that is: where we used the notation (2). In this case, the Hoeffding decomposition of $Y = J_p(\psi)$ can be expressed as the sum of its expectation and of a linear combination of U-statistics generated by symmetric and degenerate kernels $\psi_s$ of orders $s = 1, \dots, p$, that is, and the symmetric functions $g_k : E^k \to \mathbb{R}$ are defined by

(6) $g_k(y_1, \dots, y_k) := E[\psi(y_1, \dots, y_k, X_1, \dots, X_{p-k})]$,

in such a way that, for $1 \leq s \leq p$, $\psi_s$ is symmetric and degenerate of order $s$. In particular, one has $g_0 \equiv \psi_0 \equiv E[\psi(X_1, \dots, X_p)]$ and $g_p = \psi$. For $s = 1, \dots, p$ one has the alternative formula and one can easily check that the random variables $Y_M$ appearing in (3) verify the relations

Remark 2.2. Plainly, for the averaged version of the U-statistics, the Hoeffding decomposition reads

2.4. Analysis of Variance. In this work, we are interested in symmetric U-statistics $Y = J_p(\psi)$, based on an i.i.d. sample $X$, such that the kernel $\psi$ is square-integrable with respect to $\mu^{\otimes p}$. Under such an assumption, the summands in the Hoeffding decomposition (4) are orthogonal in $L^2(P)$, thanks to the degeneracy of the kernels $\psi_s$, $s = 1, \dots, p$. In particular, we have that Choosing $n = p$ leads to the following useful lower bound on the variance: Another useful variance formula in terms of the functions $g_k$ is as follows (see e.g. [Ser80,p. 183]): Recalling that $g_p = \psi$ yields the following lower bound on the variance of $J_p(\psi)$:

Remark 2.3 (On notation).
For the rest of the paper, for every integer m and every real r > 0, we will use the standard notation: Given a measurable mapping ϕ : E m → R, we will often write (by a slight abuse of notation), even when the right-hand side of the previous equation is infinite.
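Returning to the Hoeffding kernels of Section 2.3, their construction from the functions $g_k$ can be illustrated on a finite state space. The following is a minimal sketch, assuming that $\mu$ is a probability measure on finitely many points and that $\psi_s$ is the usual alternating sum of the $g_k$ over sub-tuples (we assume this is the content of (5)); the helper names and the example kernel are hypothetical.

```python
from itertools import combinations, product


def g(psi, p, k, ys, points, weights):
    """g_k(y_1..y_k) = E[psi(y_1..y_k, X_1..X_{p-k})] for i.i.d. X_i with finite law."""
    total = 0.0
    for idx in product(range(len(points)), repeat=p - k):
        w = 1.0
        for i in idx:
            w *= weights[i]
        total += w * psi(*ys, *(points[i] for i in idx))
    return total


def hoeffding_kernel(psi, p, s, xs, points, weights):
    """psi_s(x_1..x_s) as an alternating sum of the g_k (assumed form of (5))."""
    total = 0.0
    for k in range(s + 1):
        for sub in combinations(range(s), k):
            sign = (-1) ** (s - k)
            total += sign * g(psi, p, k, [xs[i] for i in sub], points, weights)
    return total


# Hypothetical example: uniform law on {0, 1} and psi(x, y) = x * y.
points, weights = [0, 1], [0.5, 0.5]
psi = lambda x, y: x * y
psi_1 = lambda x: hoeffding_kernel(psi, 2, 1, (x,), points, weights)
psi_2 = lambda x, y: hoeffding_kernel(psi, 2, 2, (x, y), points, weights)

# Degeneracy check: integrating out one argument of psi_2 should give zero.
degeneracy_check = [
    sum(w * psi_2(x, y) for x, w in zip(points, weights)) for y in points
]
```

In this example one finds $\psi_2(x, y) = (x - 1/2)(y - 1/2)$ and $\psi_1(x) = x/2 - 1/4$, both of which integrate to zero in each argument.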
Note also that $\psi \star_p^0 \psi = \psi^2$ is square-integrable if and only if $\psi \in L^4(\mu^{\otimes p})$. Hence, $\psi \star_r^l \varphi$ might not be in $L^2(\mu^{\otimes (p+q-r-l)})$ even though $\psi \in L^2(\mu^{\otimes p})$ and $\varphi \in L^2(\mu^{\otimes q})$. Moreover, if $l = r = p$, then $\psi \star_p^p \psi = \|\psi\|^2_{L^2(\mu^{\otimes p})}$ is constant. The next result lists the properties of contraction kernels that are most useful for the present work. The (quite technical) proof is deferred to Section 7.
(i) For all 0 ≤ l ≤ r ≤ p ∧ q the function ψ ⋆ l r ϕ given by (12) is well-defined, in the sense specified at the beginning of the present subsection.
where both sides of the inequality might assume the value +∞.
where both sides of the inequality might assume the value +∞. (iv) If ψ ∈ L 4 (µ ⊗p ) and ϕ ∈ L 4 (µ ⊗q ), then, for all 0 ≤ r ≤ p ∧ q, one has ψ ⋆ l r ϕ ∈ L 2 (µ ⊗p+q−r−l ) and (v) For all 0 ≤ r ≤ p ∧ q the function ψ ⋆ r r ϕ is in L 2 (µ ⊗p+q ) and ψ ⋆ r r ϕ L 2 (µ ⊗p+q−2r ) ≤ ψ L 2 (µ ⊗p ) ϕ L 2 (µ ⊗q ) . (vi) If, for all 0 ≤ l ≤ p − 1, ψ ⋆ l p ψ ∈ L 2 (µ ⊗p−l ) and, for all 0 ≤ k ≤ q − 1, ϕ ⋆ k q ϕ ∈ L 2 (µ ⊗q−k ), then, for all 0 ≤ l ≤ r ≤ p ∧ q, one has ψ ⋆ l r ϕ ∈ L 2 (µ ⊗p+q−r−l ) and ψ ⋆ l r ϕ 2 We will heavily rely on item (iv) for deriving our normal approximation bounds. Moreover, in certain applications item (vi) (which is already contained in Lemma 2.9 of [PZ10]) can be very useful in order to study the asymptotic distributional behaviour of vectors of degenerate U-statistics. (b) The contraction kernels defined by (12) also play a fundamental role for the normal approximation of functionals of a general Poisson measure having the form of multiple Wiener-Itô integrals or, more generally, of U-statistics (see e.g. [PSTU10, LRP13a, LRP13b, PZ10, BP16]), as well as of functionals of a Rademacher sequence (see e.g. [NPR10b,KRT16]). In these settings, the measure µ appearing in (12) is the control measure of the Poisson measure and the counting measure on N, respectively, and, hence, it is in general not finite. We also stress that items (i), (ii), (v) and (vi) of Lemma 2.4 also hold true for σ-finite measures µ. This will be clear from the proof below. (c) Statements (iii) and (iv) can be suitably adapted to the framework of a finite measure, by introducing appropriate additional multiplicative constants on the right hand sides of the respective inequalities. For instance, inequality (iv) becomes ψ ⋆ l r ϕ L 2 (µ ⊗p+q−r−l ) ≤ µ(E) l−r+(p+q)/2 ψ L 4 (µ ⊗p ) ϕ L 4 (µ ⊗q ) . On the other hand, if µ(E) = +∞, then, in general, there is no finite constant C = C(p, q, r, l) such that ψ ⋆ l r ϕ L 2 (µ ⊗p+q−r−l ) ≤ C ψ L 4 (µ ⊗p ) ϕ L 4 (µ ⊗q ) . 
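Indeed, take $(E, \mathcal{E}, \mu) = (\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$, $p = q = r = 2$, $l = 1$ and $\psi(x, y) = \varphi(x, y) = (1 + x^2)^{-1/4} (1 + y^2)^{-1/4}$. Then, for all $x \in \mathbb{R}$,

$(\psi \star_2^1 \psi)(x) = \int_{\mathbb{R}} \psi(x, y)^2 \, dy = (1 + x^2)^{-1/2} \int_{\mathbb{R}} (1 + y^2)^{-1/2} \, dy = +\infty$

and, a fortiori, $\|\psi \star_2^1 \psi\|_{L^2(\lambda)} = +\infty$, but

$\|\psi\|^4_{L^4(\lambda^{\otimes 2})} = \Big( \int_{\mathbb{R}} (1 + x^2)^{-1} \, dx \Big)^2 = \pi^2 < \infty$.

2.6. Product formulae and related estimates. It is easily seen that the contraction kernels $\psi \star_r^l \varphi$ are, in general, not symmetric. If $f : E^p \to \mathbb{R}$ is an arbitrary function, then we denote by $\tilde f$ its canonical symmetrization defined via

$\tilde f(x_1, \dots, x_p) := \frac{1}{p!} \sum_{\pi \in S_p} f(x_{\pi(1)}, \dots, x_{\pi(p)})$,

where, as before, $S_p$ indicates the group of permutations of the set $[p]$. It easily follows from Minkowski's inequality that, if $f \in L^2(\mu^{\otimes p})$, then so is $\tilde f$ and

$\|\tilde f\|_{L^2(\mu^{\otimes p})} \leq \|f\|_{L^2(\mu^{\otimes p})}$.

The following new formula for the product of two degenerate, symmetric U-statistics, which is of independent interest, will be crucial for the proofs of the main results provided in this work. Such a statement is a more explicit expression of the product formula for degenerate, not necessarily symmetric U-statistics which was provided recently in [DP17]; it also represents a particularly attractive alternative to the combinatorial product formulae for U-statistics derived in [Maj13,Chapter 11]. The proof is provided in Section 7.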
Proposition 2.6 (Product formula for degenerate, symmetric U-statistics). Let p, q ≥ 1 be positive integers and assume that ψ ∈ L 2 (µ ⊗p ) and ϕ ∈ L 2 (µ ⊗q ) are degenerate, symmetric kernels of orders p and q respectively. Then, whenever n ≥ p + q we have the Hoeffding decomposition: where, for t ∈ {0, 1, . . . , 2(p ∧ q)}, the degenerate, symmetric kernel of order p + q − t, is given by In the previous expression, the kernels ψ ⋆ t−r r ϕ p+q−t appearing in the Hoeffding decomposition of J p+q−t ( ψ ⋆ t−r r ϕ) p+q−t have been defined in (5) and we have written ⌈x⌉ to indicate the smallest integer greater or equal to the real number x.
Remark 2.7. (a) Proposition 2.6 is in the same spirit as the existing product formulae for multiple stochastic integrals on the Wiener space (see e.g. Theorem 2.7.10 in [NP12]), on the Poisson space (see [Sur84,Las16]) and for functionals of a Rademacher sequence (see [NPR10a,Kro17,PT15]). In particular, the product formula for multiple integrals on the Poisson space in its orthogonal form given explicitly by equation (19) of [PZ10] is completely analogous to (15), as one can see by the change of variables k = p + q − t in (15), and by replacing the indicator 1 {p+q−r−l=k} in formula (18) of [PZ10] with suitable conditions on the respective summation indices.

(b) The product formula for non-symmetric and non-homogeneous Rademacher sequences in [PT15, formula (5.3)] or, equivalently, in [Kro17, formula (2.4)] can be easily related to Proposition 2.6 and their similarity is quite striking. Note that, on the one hand, our formula is more general, in the sense that we allow for an arbitrary distribution of the underlying i.i.d. random variables, whereas the formula in [PT15] is restricted to discrete multiple integrals which are functionals of a Rademacher sequence; on the other hand, the success parameters of the Rademacher sequences considered in [PT15] are allowed to vary and, further, the multiple integrals might depend on the whole infinite sequence.
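The contraction kernels entering Lemma 2.4 and Proposition 2.6 can be evaluated explicitly when $\mu$ is supported on finitely many points. The following is a minimal sketch, assuming the usual convention that $\psi \star_r^l \varphi$ identifies $r$ arguments of the two kernels and integrates out $l$ of them (we assume this agrees with (12)); the ordering of the remaining arguments, the helper names and the example kernel are hypothetical choices made for illustration.

```python
from itertools import product
from math import sqrt


def _weight(idx, weights):
    """Product of the point masses attached to a tuple of indices."""
    w = 1.0
    for i in idx:
        w *= weights[i]
    return w


def contraction(psi, p, phi, q, r, l, points, weights):
    """Return psi *_r^l phi as a function of p + q - r - l arguments.

    Assumed argument order: the r - l shared variables, then the p - r free
    variables of psi, then the q - r free variables of phi.
    """
    def kernel(*args):
        y, t, s = args[:r - l], args[r - l:p - l], args[p - l:]
        total = 0.0
        for idx in product(range(len(points)), repeat=l):
            x = tuple(points[i] for i in idx)
            total += _weight(idx, weights) * psi(*x, *y, *t) * phi(*x, *y, *s)
        return total
    return kernel


def l2_norm(f, m, points, weights):
    """L^2 norm of a function of m arguments with respect to the product measure."""
    total = 0.0
    for idx in product(range(len(points)), repeat=m):
        total += _weight(idx, weights) * f(*(points[i] for i in idx)) ** 2
    return sqrt(total)


# Hypothetical example: uniform law on {0, 1}, degenerate kernel of order 2.
points, weights = [0, 1], [0.5, 0.5]
psi = lambda x, y: (2 * x - 1) * (2 * y - 1)
c11 = contraction(psi, 2, psi, 2, 1, 1, points, weights)  # two free arguments
c22 = contraction(psi, 2, psi, 2, 2, 2, points, weights)  # constant ||psi||^2
```

In this example `c22()` returns the squared $L^2$ norm of the kernel, in line with the remark that $\psi \star_p^p \psi$ is constant, and the norm of `c11` does not exceed the product of the $L^2$ norms, as in item (v) of Lemma 2.4.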
In order to derive our main bounds, we will also make use of the following elementary lemmas.
Lemma 2.9. Let n, p, q, t, r be positive integers such that n ≥ p + q and 1 ≤ r ≤ t ≤ p + q − 1. Then, the inequality n p+q−t n p n q is in order, where C(p, q, t, r) is a suitable constant which only depends on p, q, r and t.
Proof. This immediately follows from the definition of multinomial coefficients.

Main Results in Dimension One
3.1. Degenerate U-statistics. For the rest of this section, we let $Z \sim \mathcal{N}(0, 1)$ denote a standard normal random variable, and write $X = (X_1, \dots, X_n)$ to indicate a vector of i.i.d. random variables, with values in a space $(E, \mathcal{E})$ and common distribution $\mu$. We also fix a degenerate, symmetric kernel $\psi = \psi(n)$, possibly depending on the integer parameter $n$, of order $p \geq 1$ (see Section 2.1 for definitions), and we assume that $E[\psi^4(X_1, \dots, X_p)] < \infty$.
We assume that $\sigma_n^2 > 0$ and denote by $\varphi := \varphi_n$ the kernel defined via where the set $D_p$ is defined in (1). Of course, $E[W] = 0$, $\mathrm{Var}(W) = 1$ and, by degeneracy, is the Hoeffding decomposition of $W$, as defined in Section 2.2. Since, by assumption, $W$ is a square-integrable U-statistic of order $p$, it is easy to see that $U := W^2$ admits a Hoeffding decomposition of the type (3), that we write (with obvious notation) as the explicit form of the Hoeffding decomposition of $U$ can of course be deduced from Proposition 2.6.
Definition 3.1. Given two real-valued random variables $X$, $Y$ we write

$d_W(X, Y) := \sup_{h \in \mathrm{Lip}(1)} \big| E[h(X)] - E[h(Y)] \big|$,

where Lip(1) is the class of all 1-Lipschitz mappings $h : \mathbb{R} \to \mathbb{R}$, to indicate the Wasserstein distance between the distributions of $X$ and $Y$ (see [NP12,Appendix C], and the references therein, for some basic properties of this distance).
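For intuition about the Wasserstein distance, it can be computed exactly between two empirical distributions on the real line. The following is a minimal sketch, using the classical fact that for two samples of equal size the distance equals the average absolute difference between their order statistics; the function name is hypothetical, and the sketch concerns empirical measures only, not the bounds of this section.

```python
def empirical_wasserstein(xs, ys):
    """Wasserstein-1 distance between the empirical laws of two equal-size samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In dimension one the optimal coupling matches order statistics.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Shifting every point by 1 moves the empirical law by exactly 1.
distance = empirical_wasserstein([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
```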
The following lemma is a simple consequence of the techniques developed in [DP17]. An outline of its proof is given in Section 7.
Lemma 3.2. Under the notation of the present section, one has the bound where κ p is a finite constant which only depends on p.
We now state one of the main results of the paper. It corresponds to an explicit bound on the normal approximation of degenerate U-statistics, expressed in terms of contraction operators.
Theorem 3.3. With W as defined above and with the constants κ p from Lemma 3.2 and C(p, p, t, r) defined in Lemma 2.9, we have Remark 3.4. Fix p, and assume as before that the kernel ψ = ψ(n) depends on the parameter n. Then, as n → ∞, the bound (21) is of the order whereas the bound (20) behaves asymptotically as The asymptotic relations pointed out in Remark 3.4 immediately yield the following one-dimensional CLT.
Corollary 3.5. Let p be a fixed positive integer and, for each n ≥ p, let ψ(n) ∈ L 4 (µ ⊗p ) be a symmetric and degenerate kernel with respect to the probability measure µ such that ψ(n) L 2 (µ ⊗p ) > 0. Let X 1 , X 2 , . . . be i.i.d. random variables on (Ω, F , P) with common distribution µ and, for n ≥ p, let W n be the normalized random variable obtained from J p (ψ(n)) := according to the definition (18). Assume that the following conditions (i) and (ii) are satisfied: Then, as n → ∞, W n converges in distribution to Z ∼ N(0, 1).
Remark 3.6. (a) The statement of Corollary 3.5 is in fact an extension of a CLT by Hall [Hal84] to general p. Indeed, in this reference it is proved that, with the above notation for p = 2, the CLT for W n , n ∈ N, holds, whenever Note that our bound (21) even gives a precise estimate of the error of normal approximation in this situation. Proof of Theorem 3.3. We will apply Lemma 3.2. Our goal is therefore to effectively bound from above the quantity in terms of the kernel function ψ. From Proposition 2.6, we deduce that Using (8), (9) as well as Lemmas 2.8 and 2.9, for a fixed t ∈ {1, . . . , 2p − 1}, we obtain that Using Lemma 2.4-(iv) for t 2 < r ≤ t ∧ p and distinguishing the cases of even and odd values of t, we thus infer the chain of inequalities The bounds (20) and (21) now follow from Lemma 3.2 and from the bounds (23) and (24), respectively.
3.2. U-statistics with a dominant component. In this subsection we drop the restriction that the kernel ψ be degenerate, and we obtain quantitative CLTs under the assumptions that one of the terms in the Hoeffding decomposition is dominant in the large sample limit n → ∞. The reason why we treat this case separately from the general results of Section 5 is that, by virtue of the one-dimensional results of the previous section, we are able to obtain explicit bounds in the Wasserstein distance. The theory developed in Section 5 will hinge on multidimensional results involving smooth distances, and will therefore yield bounds for more regular test functions.
We now assume that ψ = ψ(n) is a symmetric kernel of a fixed order 1 ≤ p ≤ n such that

Denote by
the Hoeffding decomposition (4) of $J_p(\psi)$ with symmetric and degenerate kernels $\psi_s$ of order $s$ which automatically satisfy $E[\psi_s^4(X_1, \dots, X_s)] < \infty$, $s = 1, \dots, p$. This can be easily seen from their explicit construction. Let us further assume w.l.o.g. that $E[J_p(\psi)] = 0$ and that $\|\psi\|^2_{L^2(\mu^{\otimes p})} = \mathrm{Var}(\psi(X_1, \dots, X_p)) = 1$. We then define to be the so-called order of degeneracy or Hoeffding rank of $J_p(\psi)$. The second equality in (25) easily follows from (5) and (7). Let as well as (note that $W$, $Y$, $R$ all implicitly depend on $n$). We provide the following bound on the Wasserstein distance between the law of $W$ and the standard normal distribution, which is useful whenever the random variable $Y$ (that is, the first non-trivial Hoeffding component of $W$) is dominant and $R$ is negligible.
Theorem 3.7. Under the above assumption, one has the estimates and suitable bounds on d W (Y, Z) are provided by Theorem 3.3.
Proof. Using the simple inequality as well as This is the first bound stated in the Theorem. The second one follows immediately from this one and from (9) since Combined with Theorem 3.3, Theorem 3.7 yields the following CLT.
Corollary 3.8. Let p be a fixed positive integer and, for each n ≥ p, let ψ(n) ∈ L 4 (µ ⊗p ) be a symmetric kernel such that E p ψ dµ ⊗p = 0 and ψ(n) L 2 (µ ⊗p ) > 0. Let X 1 , X 2 , . . . be i.i.d. random variables on (Ω, F , P) with distribution µ and, for n ≥ p, denote by m = m n the Hoeffding rank of the U-statistic and let Then, with W n := σ −1 mn J p (ψ(n)), n ≥ p, assume that the following conditions (i), (ii) and (iii) are satisfied: Then, as n → ∞, W n converges in distribution to Z ∼ N(0, 1).
If the Hoeffding rank m = m n does in fact not depend on n, then (iii) can be replaced with the weaker condition Again, if we are dealing with a fixed kernel ψ not depending on n, then also m does not depend on n and we obtain asymptotic normality of W n if and only if m = 1. As observed before, such a phenomenon is consistent with classical results about the asymptotic distribution of U-statistics, see e.g. [Hoe48,Gre77,Ser80,DM83].
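The failure of asymptotic normality for a fixed kernel of Hoeffding rank larger than one can be seen on the simplest example. The following is a minimal sketch for the hypothetical fixed kernel $\psi(x, y) = xy$ with centered, unit-variance inputs (so that the rank is 2): the algebraic identity checked below shows that $J_2(\psi) = \big((\sum_i X_i)^2 - \sum_i X_i^2\big)/2$, so that, after dividing by the standard deviation $\sqrt{n(n-1)/2}$, the statistic behaves like $(Z^2 - 1)/\sqrt{2}$ with $Z$ standard normal, which is not Gaussian.

```python
from itertools import combinations


def j2_product_kernel(sample):
    """J_2(psi) for psi(x, y) = x * y, summed over all pairs i < j."""
    return sum(x * y for x, y in combinations(sample, 2))


def j2_closed_form(sample):
    """Same quantity via ((sum x)^2 - sum x^2) / 2."""
    s = sum(sample)
    q = sum(x * x for x in sample)
    return (s * s - q) / 2


sample = [1.0, -2.0, 3.0, 0.5]
direct = j2_product_kernel(sample)
closed = j2_closed_form(sample)
```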

Multivariate Results
Our goal in this section is to deduce explicit multidimensional bounds for vectors of degenerate U-statistics. As in the previous section, we denote by X = (X 1 , ..., X n ) (n ≥ 1) a vector of i.i.d. random variables, with values in (E, E) and with distribution µ.
4.1. Setup. We start by fixing a positive integer d and, for 1 ≤ i ≤ d, we let ψ (i) = ψ (n,i) be a degenerate and symmetric kernel of order 1 ≤ p i ≤ n (as before, the tacit dependence of the kernels on the sample size n will be omitted whenever there is no risk of confusion). We will again assume that ψ (i) ∈ L 4 (µ ⊗p i ) and, for Without loss of generality, we can assume that p i ≤ p k whenever 1 ≤ i < k ≤ d.
Thus, there is an s ∈ {1, . . . , d} as well as positive integers 1 ≤ d 1 < d 2 < . . . < d s = d and 1 ≤ q 1 < q 2 < . . . < q s such that Note that v i,i = σ n (i) 2 for i = 1, . . . , r and |v i,k | ≤ σ n (i)σ n (k) for 1 ≤ i, k ≤ d, by the Cauchy-Schwarz inequality. Note also that, by degeneracy of the kernels, v i,k = 0 unless p i = p k . Hence, V is a block diagonal matrix. Throughout this section we denote by Z = Z(1), . . . , Z(d) the Hoeffding decomposition of W (i)W (k); similarly to the situation of the previous section, the explicit form of the random variables U M (i, k) can be deduced from Proposition 2.6.
4.2. Generalities on matrix norms and related estimates. For a vector x = (x 1 , . . . , x d ) T ∈ R d we denote by x 2 its Euclidean norm and for a matrix A ∈ R d×d we let A op be the operator norm induced by the Euclidean norm, i.e., A op := sup{ Ax 2 : x 2 = 1} .
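More generally, for any $k$-multilinear form $\psi : (\mathbb{R}^d)^k \to \mathbb{R}$, $k \in \mathbb{N}$, we define its (generalized) operator norm as

$\|\psi\|_{op} := \sup\{ |\psi(u_1, \dots, u_k)| : u_j \in \mathbb{R}^d, \ \|u_j\|_2 = 1, \ j = 1, \dots, k \}$.

Recall that for a function $h : \mathbb{R}^d \to \mathbb{R}$, its minimum Lipschitz constant $M_1(h)$ is given by

$M_1(h) := \sup_{x \neq y} \frac{|h(x) - h(y)|}{\|x - y\|_2} \in [0, \infty]$.

If, for instance, $h$ is differentiable, then it is easy to see that $M_1(h) = \sup_{x \in \mathbb{R}^d} \|Dh(x)\|_{op}$. If, more generally, $k \geq 1$ and if $h : \mathbb{R}^d \to \mathbb{R}$ is a $(k-1)$-times differentiable function, then we let

$M_k(h) := \sup_{x \neq y} \frac{\|D^{k-1}h(x) - D^{k-1}h(y)\|_{op}}{\|x - y\|_2}$.

Recall that the Hilbert-Schmidt inner product of two matrices $A = (a_{i,j})$, $B = (b_{i,j}) \in \mathbb{R}^{d \times d}$ is defined by

$\langle A, B \rangle_{H.S.} := \mathrm{Tr}(A B^T) = \sum_{i,j=1}^{d} a_{i,j} b_{i,j}$.

Thus, $\langle \cdot, \cdot \rangle_{H.S.}$ is just the standard inner product on $\mathbb{R}^{d \times d} \cong \mathbb{R}^{d^2}$. The corresponding Hilbert-Schmidt norm will be denoted by $\| \cdot \|_{H.S.}$. With this notion at hand, and following [CM08] and [Mec09], for $k = 2$ we finally define

$\tilde M_2(h) := \sup_{x \in \mathbb{R}^d} \|\mathrm{Hess}\, h(x)\|_{H.S.}$,

with $\mathrm{Hess}\, h$ being the Hessian matrix of $h$. Then, we have the inequality

$\tilde M_2(h) \leq \sqrt{d} \, M_2(h)$.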
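The two matrix norms of this subsection satisfy $\|A\|_{op} \leq \|A\|_{H.S.} \leq \sqrt{d}\, \|A\|_{op}$, which is the elementary fact behind comparisons between Hessian-based quantities. The following is a minimal numerical sketch for $d = 2$, where the operator norm has a closed form in terms of the trace and determinant of $A^T A$; the function names and the test matrix are hypothetical.

```python
from math import sqrt


def hs_inner(a, b):
    """Hilbert-Schmidt inner product: sum of entrywise products, i.e. Tr(A B^T)."""
    return sum(a[i][j] * b[i][j] for i in range(len(a)) for j in range(len(a[0])))


def hs_norm(a):
    """Hilbert-Schmidt (Frobenius) norm."""
    return sqrt(hs_inner(a, a))


def op_norm_2x2(a):
    """Operator norm of a 2x2 matrix: square root of the top eigenvalue of A^T A.

    Uses trace(A^T A) = ||A||_HS^2 and det(A^T A) = det(A)^2.
    """
    t = hs_inner(a, a)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = max(t * t - 4.0 * det * det, 0.0)
    return sqrt((t + sqrt(disc)) / 2.0)


A = [[3.0, 0.0], [0.0, 4.0]]
hs, op = hs_norm(A), op_norm_2x2(A)
```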

4.3. Main results. The next lemma is the multivariate counterpart to Lemma 3.2 and, as the latter, relies on the methods provided in the recent paper [DP17]. Its proof is sketched in Section 7.
Lemma 4.1. Under the assumptions of Section 4.1, the following holds. There are constants $\kappa_{p_i} \in (0, \infty)$, only depending on $p_i$, $1 \leq i \leq d$, such that: (ii) If, moreover, $V$ is positive definite, then for each $h \in C^2(\mathbb{R}^d)$ such that $E|h(W)| < \infty$ and $E|h(Z)| < \infty$,

We next state our main multivariate normal approximation theorem, and some more notation is needed for the sake of readability. For $1 \leq i, k \leq d$, we define as well as where the constants $C(p, q, t, r)$ have been defined in Lemma 2.9. As indicated in the statement below, each of the two estimates appearing in Theorem 4.2 holds when either $A_1$ or $A_2$ is plugged into the right-hand side; the key to this phenomenon is the subsequent Lemma 4.3.
Theorem 4.2. With the above notation and assumptions, the following estimates hold.
(ii) If, moreover, V is positive definite, then for each h ∈ C 2 (R d ) such that E |h(W )| < ∞ and E |h(Z)| < ∞ and for j = 1, 2 we have For the proof of Theorem 4.2 we will need the following preparatory result.
Proof of Lemma 4.3. By orthogonality and the product formula stated in Proposition 2.6, we have where By Lemmas 2.8 and 2.9, arguing similarly as in the proof of Theorem 3.3, we obtain for This proves the first inequality in (27). The second estimate in (27) can be deduced from the first one by again distinguishing the cases of even and odd values of 1 ≤ t ≤ p i + p k − 1 and by using the statement of Lemma 2.4-(iv) in the cases t/2 = r.
Proof of Theorem 4.2. The theorem follows immediately from Lemmas 4.1 and 4.3.

Bounds for General Symmetric U-statistics
As anticipated, we now want to apply the multidimensional results of the previous section in order to deal with the one-dimensional normal approximation of general U-statistics; in particular, our main aim is to develop tools for systematically dealing with sequences of U-statistics without a dominant Hoeffding component, which therefore fall in principle outside the scope of Section 3.2. As before, X = (X 1 , ..., X n ), n ≥ 1, indicates a vector of i.i.d. random variables, with values in (E, E), and common distribution µ.
We let ψ : E p → R be a symmetric kernel of order p which is neither necessarily degenerate nor has a dominating component. From (4) we know that the random variable F := J p (ψ) has the Hoeffding decomposition where the symmetric and degenerate kernels ψ s : E s → R of order s are given by (5). We will assume that 0 < σ 2 := Var(J p (ψ)) = Var(F ) for the normalised version of F . Our goal is to use the multivariate bounds from Theorem 4.2 in order to estimate a suitable distance of W to a standard normal random variable Z∼ N(0, 1). Note that the Hoeffding decomposition of W is given by where, in accordance with the notation from Section 4.1, we define Note that, by construction, we have which implies that (32) 0 ≤ ψ (s) L 2 (µ ⊗s ) ≤ 1 , 1 ≤ s ≤ p . In order to apply Theorem 4.2, we must estimate the following contraction norms.
where 0 ≤ i, k ≤ p and 1 ≤ s ≤ i+k 2 − 1. Since the kernels ψ i , 1 ≤ i ≤ p, appearing in (5) have complicated expressions and are, hence, not straightforward to compute in practice, we provide the following lemma which bounds these norms in terms of norms of contractions of the (much) simpler functions g k given by (6). The proof is deferred to Section 7.
Lemma 5.1. Let $1 \leq s, i, k \leq p$ and $0 \leq l \leq p$ be such that $0 \leq l \leq s \leq i \wedge k$. Let $Q(s, l)$ be the set of pairs $(r, t)$ of nonnegative integers such that $0 \leq t \leq r \leq s$, $t \leq l$ and $r - t \leq s - l$.
(i) There exists a constant $K(i, k, s, l) \in (0, \infty)$ which only depends on $i$, $k$, $s$ and $l$ such that

(ii) If, in particular, $l = s$, then these bounds reduce to

In order to estimate the quantities $A_2(i, k, n)$ from Theorem 4.2, we still have to bound the $L^4$-norms $\|\psi^{(i)}\|_{L^4(\mu^{\otimes i})}$ we obtain from Lemma 5.1 (i) that In order to state our normal approximation result for $W$, let us introduce the following notation. For $1 \leq i, k \leq p \leq n$ define as well as where the constants $C(i, k, t, s)$ and $K(i, k, s, l)$ are those from Lemmas 2.9 and 5.1, respectively. Despite their complicated definition, dealing with bounds involving $B_1$ and $B_2$ is actually rather straightforward, once one observes that there are finite constants $b_1(i, k)$ and $b_2(i, k)$ such that

Theorem 5.2 (Normal approximation of general symmetric U-statistics). Let $W$ be as above and let $N$ be a standard normal random variable. Furthermore, let $g \in C^3(\mathbb{R})$ have three bounded derivatives. Then, for $j = 1, 2$, we have the bound and an analogous inequality holds with the constants $B_j(i, k, n)$ replaced by the respective $B'_j(i, k, n)$. Here, $\kappa_i$ is a finite constant depending only on $i$.

Remark 5.3. (a) Note that, by using (32), the bound in Theorem 5.2 could be simplified further, but we preferred leaving it as it is, since there might be cases where it is possible to estimate the quantities $\|\psi^{(n,i)}\|_{L^2(\mu^{\otimes p})}$ more accurately.

(b) A drawback of our approach is that Theorem 5.2 only allows one to bound expressions involving $C^3$ test functions. Such a technical limitation is an artifact of our method of proof, involving a detour through the multivariate normal approximation result stated in Theorem 4.2. On the other hand, our derivation of (35) from a multidimensional result immediately implies that, if one can prove that the right-hand side of (35) converges to zero as $n \to \infty$, then one can immediately deduce the joint convergence of the vector of Hoeffding components of the U-statistic $W$ to some multivariate normal distribution.
From a qualitative point of view, this seems to be a much stronger statement than the simple convergence of W , since the latter might a priori be due to certain cancellation effects. Observe that, as several Hoeffding components of W might vanish in the limit (thus generating a singular covariance matrix), in the proof of (35) we can only invoke part (i) of Theorem 4.2 which gives a bound in terms of C 3 test functions. In general, resorting to smoother test functions seems to be inevitable when using Stein's method for multivariate normal approximation, when one does not deal with an invertible limiting covariance matrix. As already discussed, whenever one Hoeffding component is dominant, then one might use the bound from Theorem 3.7 in order to obtain a bound on the Wasserstein distance. (c) We stress that our bound is purely analytic and that the functions g k , whose contraction norms must be evaluated, are typically much easier to compute than the individual Hoeffding kernels ψ s which are alternating sums of the g k for 0 ≤ k ≤ s (see (5)). Apart from these norms, the only quantity which has to be controlled is the variance σ 2 of F . (d) We remark that the maxima appearing in the definition of the quantities B j (i, k, n) and B ′ j (i, k, n) give certain important constraints on the indices s, l, r and t. This is comparable to similar constraints appearing in the bounds provided in [LRP13a] and [LRP13b]. In particular, when dealing with example cases, it is usually important to take these constraints into account in order to show that the bounds indeed converge to zero. This is for instance the case in the example dealt with in Section 6. (e) Using a linear projection R p 1 +...+p d → R d , we could similarly use Theorem 4.2 in order to provide a bound on the d-dimensional normal approximation of a vector of non-degenerate U-statistics of respective orders p 1 , . . . , p d . This is clear from the proof of Theorem 5.2.
Proof of Theorem 5.2. Let $g \in C^3(\mathbb{R})$ have three bounded derivatives. Define $S : \mathbb{R}^p \to \mathbb{R}$ by $S(x_1, \dots, x_p) := \sum_{j=1}^p x_j$ as well as $h : \mathbb{R}^p \to \mathbb{R}$ by $h := g \circ S$. Then, $h \in C^3(\mathbb{R}^p)$, and one can easily check that In particular, it follows that where the last inequality is by (26). Let V be the covariance matrix of the vector Then, $S(V) = W$, V is diagonal and by (31) its diagonal entries sum up to 1. Hence, letting $Z = (Z_1, \dots, Z_p)^T$ be a centered $p$-dimensional normal vector with covariance matrix V, it follows that $S(Z)$ has the standard normal distribution of $N$. It is easy to see that plugging in the bounds on the contraction norms $\|\psi_i \star_s^l \psi_k\|_{L^2(\mu^{\otimes i+k-s-l})}$ provided by Lemma 5.1 and (34), as well as respecting (33), yields the bounds $B_j(i,k,n)$, which are themselves bounded from above by the $B'_j(i,k,n)$. Finally, we notice that which is exactly the quantity which is bounded from above in Theorem 4.2 (i).
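The two elementary facts used in this proof can be written out as follows. This is a minimal sketch by the chain rule and bilinearity of the covariance; it only uses the definitions $h = g \circ S$ and $S(x) = \sum_j x_j$ from the proof, and it does not reproduce the specific norms appearing in Theorem 4.2.

```latex
\begin{align*}
\frac{\partial h}{\partial x_i}(x) &= g'\bigl(S(x)\bigr), \qquad
\frac{\partial^2 h}{\partial x_i\,\partial x_j}(x) = g''\bigl(S(x)\bigr), \qquad
\frac{\partial^3 h}{\partial x_i\,\partial x_j\,\partial x_k}(x) = g'''\bigl(S(x)\bigr),
\qquad 1 \le i,j,k \le p, \\
\operatorname{Var}\bigl(S(Z)\bigr)
  &= \sum_{i,j=1}^{p} \operatorname{Cov}(Z_i, Z_j)
   = \sum_{i=1}^{p} \operatorname{Var}(Z_i) = 1 .
\end{align*}
```

The first line shows that every partial derivative of $h$ of order at most three is bounded by the supremum norm of the corresponding derivative of $g$. The second line uses that the covariance matrix of $Z$ is diagonal with diagonal entries summing to 1; since $S(Z)$ is a centered linear functional of a Gaussian vector, it is standard normal.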

An Application to Subgraph Counting
Geometric random graphs are graphs whose vertices are random points scattered on some Euclidean domain, and whose edges are determined by some explicit geometric rule; in view of their wide applicability (for instance, to the modelling of telecommunication networks), these objects represent a very popular and important alternative to the combinatorial Erdős-Rényi random graphs. We refer to the monographs [Pen03] and [PR16] for a detailed introduction to this topic and its several applications. We will use our Theorem 5.2 in order to prove the Gaussian fluctuations of subgraph counts in a typical model of this kind. Although the asymptotic (jointly) Gaussian behaviour of these counts is well understood both in the binomial and in the Poisson point process situation (at least at the qualitative level, see again [Pen03]), we chose this example in order to demonstrate the power and easy applicability of our bounds. As already discussed, in the case of uniformly distributed points on some Euclidean domain, our results yield a substantial refinement and extension of [BG92,JJ86]. In the case where the vertices of the random graph are generated by a Poisson measure, the recent paper [LRP13a] provides the univariate CLT with a rate of convergence for the Wasserstein distance.
We fix a dimension $d \ge 1$ as well as a bounded and Lebesgue almost everywhere continuous probability density function $f$ on $\mathbb{R}^d$. Let $\mu(dx) := f(x)\,dx$ be the corresponding probability measure on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ and suppose that $X_1, X_2, \dots$ are i.i.d. with distribution $\mu$. Let $X := (X_j)_{j \in \mathbb{N}}$. We denote by $(t_n)_{n \in \mathbb{N}}$ a sequence of radii in $(0,\infty)$ such that $\lim_{n\to\infty} t_n = 0$. For each $n \in \mathbb{N}$, we denote by $G(X; t_n)$ the random geometric graph obtained as follows. The vertices of $G(X; t_n)$ are given by the set $V_n := \{X_1, \dots, X_n\}$, which $\mathbb{P}$-a.s. has cardinality $n$, and two vertices $X_i, X_j$ are connected if and only if $0 < \|X_i - X_j\|_2 < t_n$. Furthermore, let $p \ge 2$ be a fixed integer and suppose that $\Gamma$ is a fixed connected graph on $p$ vertices. For each $n$ we denote by $G_n(\Gamma)$ the number of induced subgraphs of $G(X; t_n)$ which are isomorphic to $\Gamma$. Recall that an induced subgraph of $G(X; t_n)$ consists of a non-empty subset $V'_n \subseteq V_n$ and its edge set is precisely the set of edges of $G(X; t_n)$ whose endpoints are both in $V'_n$. We will also have to assume that $\Gamma$ is feasible for every $n \ge p$. This means that the probability that the restriction of $G(X; t_n)$ to $X_1, \dots, X_p$ is isomorphic to $\Gamma$ is strictly positive for $n \ge p$. Note that feasibility depends on the common distribution $\mu$ of the points. The quantity $G_n(\Gamma)$ is a symmetric U-statistic of $X_1, \dots, X_n$ since
$$G_n(\Gamma) = \sum_{1 \le i_1 < \dots < i_p \le n} \psi_{\Gamma,t_n}(X_{i_1}, \dots, X_{i_p}),$$
where $\psi_{\Gamma,t_n}(x_1, \dots, x_p)$ equals 1 if the graph with vertices $x_1, \dots, x_p$ and edge set $\{\{x_i, x_j\} : 0 < \|x_i - x_j\|_2 < t_n\}$ is isomorphic to $\Gamma$, and 0 otherwise. For obtaining asymptotic normality one typically distinguishes between three different asymptotic regimes (see Remark 6.3 (b) below):
(R1) $n t_n^d \to 0$ and $n^p t_n^{d(p-1)} \to \infty$ as $n \to \infty$ (sparse regime)
(R2) $n t_n^d \to \infty$ as $n \to \infty$ (dense regime)
(R3) $n t_n^d \to \varrho \in (0,\infty)$ as $n \to \infty$ (thermodynamic regime)
It turns out that, under regime (R2), one also has to take into account whether or not the common distribution $\mu$ of the $X_j$ is the uniform distribution $\mathcal{U}(M)$ on some Borel subset $M \subseteq \mathbb{R}^d$, $0 < \lambda^d(M) < \infty$, with density $f(x) = \lambda^d(M)^{-1} \mathbf{1}_M(x)$.
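To take into account this specific situation, we will therefore distinguish between the following four cases:
(C1) $n t_n^d \to 0$ and $n^p t_n^{d(p-1)} \to \infty$ as $n \to \infty$.
(C2) $n t_n^d \to \infty$ as $n \to \infty$, and $\mu = \mathcal{U}(M)$ is a uniform distribution.
(C3) $n t_n^d \to \infty$ as $n \to \infty$, and $\mu$ is not a uniform distribution.
(C4) $n t_n^d \to \varrho \in (0,\infty)$ as $n \to \infty$.
The following important variance estimates will be needed (in what follows, for $a_n, b_n > 0$, $n \in \mathbb{N}$, we write $a_n \sim b_n$ if $\lim_{n\to\infty} a_n / b_n = 1$).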
Proposition 6.1. Under all regimes (R1), (R2) and (R3) it holds that $E[G_n(\Gamma)] \sim c\, n^p t_n^{d(p-1)}$ for a constant $c \in (0,\infty)$. Moreover, there exist constants $c_1, c_2, c_3, c_4 \in (0,\infty)$ such that, as $n \to \infty$,
Proof. The formulas for the asymptotic variances given in Theorems 3.12 and 3.13 of the book [Pen03] yield the claims in the cases (C1), (C3) and (C4). However, in the case (C2), the limiting covariance appearing in [Pen03, Theorem 3.12] is actually equal to zero, from which one can only infer that the variance of $G_n(\Gamma)$ is of smaller order than $n^{2p-1} t_n^{d(2p-2)}$. In order to compute an effective lower bound for such a variance, we will apply formula (11). Indeed, by (11) we have where we have used the fact that $\psi_{\Gamma,t_n}^2 = \psi_{\Gamma,t_n}$ for the second identity. Now, from [Pen03, Proposition 3.1] we know that and we obtain from (36) that indeed for a positive constant $c_2$.
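For readers who want to experiment with the statistic $G_n(\Gamma)$ numerically, the following is a minimal brute-force sketch in Python. It assumes NumPy; the function names `adjacency` and `count_induced_copies`, and the encoding of $\Gamma$ as a list of edges on the labels $0, \dots, p-1$, are hypothetical choices made for this illustration and are not taken from the paper. The loop over all $p$-subsets is only practical for small $n$.

```python
import itertools

import numpy as np


def adjacency(points, t):
    """Adjacency matrix of the geometric graph: i ~ j iff 0 < ||x_i - x_j||_2 < t."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist > 0) & (dist < t)


def count_induced_copies(adj, gamma_edges, p):
    """Count the p-subsets of vertices whose induced subgraph is isomorphic to Gamma.

    gamma_edges lists the edges of Gamma on the labels 0, ..., p-1 (illustrative encoding).
    """
    # Edge sets of all relabelings of Gamma on {0, ..., p-1}.
    targets = set()
    for perm in itertools.permutations(range(p)):
        targets.add(frozenset(frozenset((perm[a], perm[b])) for a, b in gamma_edges))

    n = adj.shape[0]
    count = 0
    for subset in itertools.combinations(range(n), p):
        # Edge set of the induced subgraph, written in local labels 0, ..., p-1.
        induced = frozenset(
            frozenset((a, b))
            for a, b in itertools.combinations(range(p), 2)
            if adj[subset[a], subset[b]]
        )
        if induced in targets:
            count += 1
    return count


# Three points on a line with radius 1.5: edges 0-1 and 1-2, but not 0-2.
pts = np.array([[0.0], [1.0], [2.0]])
adj = adjacency(pts, 1.5)
edges = count_induced_copies(adj, [(0, 1)], 2)
paths = count_induced_copies(adj, [(0, 1), (1, 2)], 3)
triangles = count_induced_copies(adj, [(0, 1), (1, 2), (0, 2)], 3)
```

Replacing `pts` by i.i.d. samples from a density $f$ and repeating the computation over many independent samples gives an empirical mean and variance of $G_n(\Gamma)$ that can be compared with the orders of magnitude discussed in Proposition 6.1.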
We denote by $W_n := \bigl(G_n(\Gamma) - E[G_n(\Gamma)]\bigr) / \sqrt{\operatorname{Var}(G_n(\Gamma))}$ the normalized version of $G_n(\Gamma)$. The following statement is a direct application of the main results of this paper.
Theorem 6.2. Let $N$ be a standard normal random variable. Then, with the above definitions and notation, for every function $g \in C^3(\mathbb{R})$ with three bounded derivatives, there exists a finite constant $C > 0$ which is independent of $n$ such that for all $n \ge p$, $\bigl|E[g(W_n)] - E[g(N)]\bigr| \le C \cdot n^{-1/2}$ in cases (C3) and (C4) and In particular, we have that $W_n$ always converges in distribution to $N$ as $n \to \infty$ in the cases (C1), (C3) and (C4). In case (C2), we have that $W_n$ converges in distribution to $N$ under the additional assumption that $\lim_{n\to\infty} n^{2p-3} t_n^{d(2p-2)} = 0$.
Remark 6.3. (a) The proof of Theorem 6.2 provided below is remarkably short, in particular because we are able to directly exploit several technical computations taken from [LRP13b]. The fact that a CLT for U-statistics based on i.i.d. samples can now be directly proved by slightly adapting the computations for the Poisson setting is a demonstration of the power of Theorem 5.2 above, allowing one to replace estimates involving the kernels of Hoeffding decompositions with considerably simpler expressions. As a side remark, we observe that comparable bounds could in principle be obtained by combining [LRP13b] with a de-Poissonization technique analogous to [DM83]; this would however change the rates of convergence, force us to deal with some complicated conditional variance estimates, and provide less complete information about the fluctuations of Hoeffding projections (see also Remark 5.3 (b)).
(b) Note that in all three regimes (R1), (R2) and (R3) considered in Theorem 6.2, one has that $\lim_{n\to\infty} n^p t_n^{d(p-1)} = +\infty$. Indeed, it is shown in [Pen03, Section 3.2] that $W_n$ converges weakly to a Poisson distribution if $\lim_{n\to\infty} n^p t_n^{d(p-1)} = \alpha \in (0,\infty)$ and to 0 if $\lim_{n\to\infty} n^p t_n^{d(p-1)} = 0$, respectively. Hence, $\lim_{n\to\infty} n^p t_n^{d(p-1)} = +\infty$ is a necessary condition for the asymptotic normality of $G_n(\Gamma)$.
(c) We remark that the distinction between uniform and non-uniform distributions is not necessary for the analogous problem on Poisson space considered in [Pen03,LRP13b]. The reason is that, in this situation, the formulae for the respective limiting variances are slightly different, see [Pen03, Section 3.2]. The phenomenon that, in the case of a uniform distribution on a set $M$, the asymptotic order of the variance is different in the dense regime (R2) has already been observed in [JJ86, Section 4] and in [BG92, Theorem 2.1, Theorem 3.1].
It is remarked on page 1357 of [JJ86], in the special case $p = 2$ of edge counting, that the asymptotic order of $\operatorname{Var}(G_n(\Gamma))$ in fact depends on the boundary structure of the set $M$. Moreover, for smooth enough boundaries, it is claimed there that $\operatorname{Var}(G_n(\Gamma)) \sim c\, n^2 t_n^d$ for some constant $c \in (0,\infty)$, whenever $t_n = o(n^{-1/(d+1)})$. Hence, there are cases where our lower bound for $\operatorname{Var}(G_n(\Gamma))$ in case (C2) given in Proposition 6.1 is sharp.
(d) Interestingly, in the case of a uniform $\mu$, the condition $(D'_3)$, which is assumed in Theorem 3.1 of [BG92] to guarantee asymptotic normality for the number of $p$-clusters (that is, subgraphs on $p$ vertices that are isomorphic to the complete graph), is exactly the same as our additional condition that $\lim_{n\to\infty} n^{2p-3} t_n^{d(2p-2)} = 0$.
We notice that [BG92, Theorem 3.1] exclusively deals with the counting of $p$-clusters, whereas our findings allow one to deduce normal fluctuations for general connected graphs on $p$ vertices.
(e) As discussed above (see the proof of Proposition 6.1), in the situation of case (C2), even the qualitative CLT for $G_n(\Gamma)$ given in Theorem 6.2 seems to be new (for instance, in this case the scaling used in Theorem 3.12 of [Pen03] leads to a degenerate limit). We mention that, in the very special case of edge counting ($p = 2$) considered in [JJ86, Section 4], the authors prove that asymptotic normality holds even without the additional assumption that $\lim_{n\to\infty} n t_n^{2d} = 0$.
Remark 6.4. We further mention that, in the cases (C1), (C3) and (C4), we obtain the same rate of convergence as the one obtained in [LRP13b] in the Poisson situation for the Wasserstein distance. Moreover, if $(t_n)_{n \in \mathbb{N}}$ is bounded away from zero, then the CLT holds true due to Hoeffding's classical CLT via the projection method [Hoe48]. In this case, a combination of Theorems 3.3 and 3.7 yields a bound of order $n^{-1/2}$ on the Wasserstein distance.
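To see that the additional assumption in case (C2) is compatible with the dense regime, one can work it out for polynomially decaying radii. The parametrisation $t_n = n^{-a/d}$ with $a > 0$ used below is an illustrative assumption and is not taken from the text.

```latex
\begin{align*}
t_n = n^{-a/d}
  &\;\Longrightarrow\;
  n t_n^{d} = n^{1-a}, \qquad
  n^{2p-3}\, t_n^{d(2p-2)} = n^{(2p-3) - a(2p-2)}, \\
n t_n^{d} \to \infty
  &\iff a < 1, \\
n^{2p-3}\, t_n^{d(2p-2)} \to 0
  &\iff a > \frac{2p-3}{2p-2}.
\end{align*}
```

Under this parametrisation both requirements hold exactly when $(2p-3)/(2p-2) < a < 1$. For $p = 2$ this reads $1/2 < a < 1$, and the additional condition becomes $n t_n^{2d} \to 0$, which is the edge-counting condition mentioned in item (e) of Remark 6.3.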
Proof of Theorem 6.2. Denote by $\sigma_n^2 := \operatorname{Var}(G_n(\Gamma))$ the variance of $G_n(\Gamma)$ and let $g^{(k)}_{\Gamma,t_n}$ be the functions (6) corresponding to the kernel $\psi_{\Gamma,t_n}$. Also, fix $1 \le i, k \le p$ as well as indices $s$ and $l$ such that $1 \le s \le i \wedge k$ and $0 \le l \le (i + k - s - 1) \wedge s$. Furthermore, assume that $(r,t)$ is a pair of indices in the set $Q(s,l)$. Then, by definition of this set we have $0 \le r - t \le s - l$. Further, the computations on pages 4196-4197 of [LRP13b] show that (we observe that the authors of [LRP13b] actually deal with the rescaled measure $n \cdot \mu$, which is why they obtain another power of $n$ as a prefactor). Since $t_n \to 0$ as $n \to \infty$, we can assume that $0 < t_n < 1$ for each $n \ge p$. From (37) and since $r - t \le s - l$ and $0 < t_n < 1$ we conclude that By Theorem 5.2 and the definition of the bound $B'_1(i,k,n)$ we have to show that, under the above constraints on the indices $i$, $k$, $s$, $l$, $r$ and $t$, Taking into account relation (38) as well as the fact that, due to the restrictions on the indices involved, we always have $i + k + s - l \ge 3$, this follows in the same way as in the paper [LRP13b], which is why we omit the details of these computations. Finally, we mention that, in cases (C1) and (C2), the respective relations $n^p t_n^{d(p-1)} = O(n)$ and $n^{-1} = o\bigl(n^{2p-3} t_n^{d(2p-2)}\bigr)$ hold, so that the last term in the bound of Theorem 5.2 does not affect the rate of convergence in these cases.

Proofs
In this section we outline the proofs of several auxiliary results in the paper.
Note that we have $|J \cup K| = |J| + |K| - |J \cap K| = p + q - r$ and Furthermore, we have Also, We denote by $\Pi_r(L)$ the collection of all (ordered) partitions $(A, B, C)$ of the set $L$ (i.e. $L$ is the disjoint union of $A$, $B$ and $C$) such that $|A| = 2r + s - p - q$, $|B| = p - r$ and $|C| = q - r$. Then, for given sets $L \subseteq M \subseteq [n]$ with $|L| = s \le m = |M|$, a fixed $r \in \{0, 1, \dots, p \wedge q\}$ and for a fixed triple $(A, B, C) \in \Pi_r(L)$ there are exactly and $M \subseteq J \cup K$. Indeed, given these restrictions it only remains to choose the set $(J \cap K) \setminus L$ such that $M \setminus L \subseteq (J \cap K) \setminus L$. The claim now follows from the fact that The above implies that, still for fixed $L$ and $M$, we have Let us assume that $M = \{j_1, \dots, j_m\}$ with $1 \le j_1 < \dots < j_m \le n$. For $\pi \in S_m$ let us write $T_\pi := E\bigl[ (\psi \star_r^{p+q-r-m} \varphi)(X_{j_{\pi(1)}}, \dots, X_{j_{\pi(m)}}) \,\big|\, (X_j)_{j \in L} \bigr]$.
Also, for a given partition $(A, B, C) \in \Pi_r(L)$ there are $(p-r)!\,(q-r)!\,(m-p-q+2r)!$ permutations $\pi \in S_m$ such that Henceforth, denote by $X = (X_1, \dots, X_n)$ our given vector of independent random variables $X_1, \dots, X_n$ on $(\Omega, \mathcal{F}, \mathbb{P})$ with values in the respective measurable spaces $(E_1, \mathcal{E}_1), \dots, (E_n, \mathcal{E}_n)$. In fact, in this paper, $X_1, \dots, X_n$ are even i.i.d. with values in $(E, \mathcal{E})$, but we prefer keeping the general framework for possible future reference. Let $Y := (Y_1, \dots, Y_n)$ be an independent copy of $X$ and let $\alpha$ be uniformly distributed on $[n] = \{1, \dots, n\}$ in such a way that $X, Y, \alpha$ are independent random variables. Letting, for $j = 1, \dots, n$, $X'_j := Y_j$ if $\alpha = j$ and $X'_j := X_j$ if $\alpha \ne j$, and $X' := (X'_1, \dots, X'_n)$, it is easy to see that the pair $(X, X')$ is exchangeable.
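The construction of the pair $(X, X')$ can be mimicked numerically. The following Python sketch assumes the usual coordinate-resampling construction, in which $X'$ agrees with $X$ except that the coordinate with the uniformly chosen index $\alpha$ is replaced by the corresponding coordinate of the independent copy $Y$. The helper name `resample_one_coordinate` and the `sampler` argument are hypothetical and introduced only for this illustration; indices are 0-based, unlike in the text.

```python
import numpy as np


def resample_one_coordinate(x, rng, sampler):
    """Return (x_prime, alpha), where x_prime equals x except at a uniform index alpha.

    sampler(rng, n) is assumed to draw an independent copy of the whole vector.
    """
    n = len(x)
    y = sampler(rng, n)        # independent copy Y of X
    alpha = rng.integers(n)    # uniform index in {0, ..., n-1}, independent of X and Y
    x_prime = x.copy()
    x_prime[alpha] = y[alpha]  # X'_alpha := Y_alpha, all other coordinates unchanged
    return x_prime, alpha


rng = np.random.default_rng(0)
uniform_sampler = lambda r, n: r.random(n)  # i.i.d. uniform coordinates on [0, 1)
x = uniform_sampler(rng, 10)
x_prime, alpha = resample_one_coordinate(x, rng, uniform_sampler)
```

Applying the same statistic to `x` and `x_prime` then yields a pair $(W, W')$ of the kind used in the proofs of Lemma 3.2 and Lemma 4.1.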
Proof of Lemma 3.2. We apply the following variant of Theorem 1, Lecture 3 in [Ste86] (see [DP17] for more information).
Theorem 7.1. Let (W, W ′ ) be an exchangeable pair of square-integrable, real-valued random variables on (Ω, F , P) such that, for some λ > 0 and some sub-σ-field G of F with σ(W ) ⊆ G, the linear regression property is satisfied. Then, we have that With the exchangeable pair (X, X ′ ) from above we construct W ′ by defining As exchangeability is preserved under functions, the pair (W, W ′ ) is clearly exchangeable. In Lemma 2.3 of [DP17], we showed that i.e. (55) holds with G = σ(X) and λ = p/n. Furthermore, denoting by Var U M , which already bounds the first term appearing on the right hand side of (56). To bound the second term, we first use the Cauchy-Schwarz inequality to obtain that 1 3λ Lemma 2.2 in [DP17] implies that Hence, using the orthogonality of the Hoeffding decomposition we obtain that The claim now follows from (59), (60) and (62) .
Proof of Lemma 4.1. For the proof of the multivariate lemma we quote the following (simplified) result from [Döb12]. It is a variant of Theorem 3 in [Mec09] albeit with better constants.
Theorem 7.2. Let $(W, W')$ be an exchangeable pair of $\mathbb{R}^d$-valued $L^2(\mathbb{P})$ random vectors defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $\mathcal{G} \subseteq \mathcal{F}$ be a sub-σ-field of $\mathcal{F}$ such that $\sigma(W) \subseteq \mathcal{G}$. Suppose there exists a non-random invertible matrix $\Lambda \in \mathbb{R}^{d \times d}$ such that the linear regression property holds and, for a given non-random positive semidefinite matrix $\Sigma$, define the $\mathcal{G}$-measurable random matrix $S$ by Finally, denote by $Z$ a centered $d$-dimensional Gaussian vector with covariance matrix $\Sigma$.
(a) For any $h \in C^3(\mathbb{R}^d)$ such that $E|h(W)| < \infty$ and $E|h(Z)| < \infty$,
(b) If $\Sigma$ is actually positive definite, then for each $h \in C^2(\mathbb{R}^d)$ such that $E|h(W)| < \infty$ and $E|h(Z)| < \infty$ we have
We now apply Theorem 7.2 with the σ-field $\mathcal{G} = \sigma(X)$ and the nonnegative definite matrix $\Sigma := V = \operatorname{Cov}(W)$. Similarly to the above, we define the random vector Then, clearly the pair $(W, W')$ is exchangeable and from Lemma 3.2 in [DP17] we know that $E[W' - W \mid X] = -\Lambda W$ holds with $\Lambda = \operatorname{diag}(p_1/n, \dots, p_d/n)$ and that the matrix is centered. Note that we have We start by bounding Note that, since $S$ is centered, using the Cauchy-Schwarz inequality, we obtain $\operatorname{Var}(S_{i,k})$ is the Hoeffding decomposition of $W(i)W(k)$. Hence, by the orthogonality of the terms in the Hoeffding decomposition we obtain, for $1 \le i, k \le d$, $\operatorname{Var}(U_M(i,k))$. $\frac{2p_i}{n} E[W(i)^2] = \frac{2p_i}{n} \sigma_n(i)^2$ to obtain the last identity. Now, taking into consideration that, in contrast to the one-dimensional setting, we did not normalize the components $W(i)$ of $W$, similarly to (62) we obtain Now, using the fact that $\mu$ is a probability measure, we obtain that
$$\int_{E^{k+i-s-l}} G^2_{(r,t,j,m)}\bigl(x_{i_1}, \dots, x_{i_{j-t-a}}, y_{q_1}, \dots, y_{q_c}, x_{k_1}, \dots, x_{k_{m-t-b}}\bigr)\, d\mu^{\otimes i+k-l-s}(x_1, \dots, x_{k+i-2s}, y_{l+1}, \dots, y_s) = \int_{E^{j+m-r-t}} \bigl(g_j \star_r^t g_m\bigr)^2\, d\mu^{\otimes j+m-r-t} = \bigl\| g_j \star_r^t g_m \bigr\|^2_{L^2(\mu^{\otimes j+m-r-t})}.$$