Exact rate of convergence of the expected $W_2$ distance between the empirical and true Gaussian distribution

We study the Wasserstein distance $W_2$ for Gaussian samples. We establish the exact rate of convergence $\sqrt{\log\log n/n}$ of the expected value of the $W_2$ distance between the empirical and true c.d.f.'s for the normal distribution. We also show that the rate of weak convergence is unexpectedly $1/\sqrt{n}$ in the case of two correlated Gaussian samples.


Introduction
In this article we investigate in detail the asymptotic behaviour of the quadratic Wasserstein distance between the empirical cumulative distribution function (c.d.f.) $F_n$ of a sample $X_1, \dots, X_n$ of independent standard Gaussian random variables and the standard normal c.d.f. $\Phi$. Thus we consider the random variable
$$W_2^2(F_n, \Phi) = \int_0^1 \left(F_n^{-1}(u) - \Phi^{-1}(u)\right)^2 du.$$
More precisely, we are interested in the exact rate of convergence of $E\,W_2^2(F_n, \Phi)$. Define $h(u) = \Phi' \circ \Phi^{-1}(u)$ for $u \in (0, 1)$. First, note that Corollary 19 in [1] does not apply in this specific case where $b = 2$, and indeed we almost surely have $\lim_{n\to+\infty} n W_2^2(F_n, \Phi) = +\infty$. Second, to our knowledge the most precise result about the behaviour of $W_2(F_n, \Phi)$ is given by Theorem 4.6 (ii) in [9], which implies, as $n \to +\infty$, the convergence in distribution (1), where $B$ is a standard Brownian bridge. This is not enough to control $nE(W_2^2(F_n, \Phi))$ since the deterministic centering integral diverges. In [4], specific bounds on $nE(W_p^p(F_n, F))$ are given for log-concave distributions $F$. In the standard Gaussian case, Corollary 6.14 of [4] gives the two-sided bound (2), where $0 < c < C < +\infty$. The main achievement below is to compute the exact asymptotic constant in (2). As far as we know, this is the first result of this kind.
In the spirit of [1], we moreover extend the investigation from the one-sample case to the case of two correlated samples.
More precisely, we study the random quantity $W_2^2(F_n, G_n)$, where $F_n$ and $G_n$ are the marginal empirical c.d.f.'s obtained from an $n$-sample $(X_i, Y_i)_{1 \le i \le n}$ of standard Gaussian couples with correlation $\rho$. If the Gaussian marginals $\Phi_X$ and $\Phi_Y$ were not identical, the general Theorem 14 in [2] would imply the convergence in distribution where $\Sigma$ is the covariance matrix of $(X_1, Y_1)$ and $\sigma^2(\Sigma)$ has a closed form expression that explicitly depends on $\Sigma$.
In particular, Corollary 18 of [2] shows that for two independent samples from two distinct Gaussian distributions Surprisingly, the second result below establishes that whenever the marginals are the same, $\Phi_X = \Phi_Y = \Phi$, and the samples are not independent, that is $\rho \neq 0$, the rate of weak convergence of $W_2^2(F_n, G_n)$ is $1/n$ and the limiting distribution is a slight variation of the one given in Theorem 11 of [1], even if the sufficient condition of the latter result is not satisfied.

The results
First we provide the limiting constant in (2).
Theorem 1. Let $F_n$ be the empirical c.d.f. of an i.i.d. standard normal sample of size $n$ and $\Phi$ the c.d.f. of the standard normal distribution. Then it holds
$$\lim_{n\to+\infty} \frac{n}{\log\log n}\, E\left(W_2^2(F_n, \Phi)\right) = 1 \quad\text{and}\quad \lim_{n\to+\infty} \sqrt{\frac{n}{\log\log n}}\, E\left(W_2(F_n, \Phi)\right) = 1.$$
Remark 2. This result is consistent with (1) and the fact that, by [3], we have which implies that $\frac{n}{\log\log n} W_2^2(F_n, \Phi) \to 1$ in probability.
Remark 3. In the case of a sample of unstandardized normal random variables with variance $\sigma^2$, the expected values of $W_2^2$ and $W_2$ between the empirical and the true distribution have the same rates as above, with limiting constants $\sigma^2$ and $\sigma$, respectively.
Remark 4. If $G_n$ is a second empirical c.d.f. independent of $F_n$ and built from another sample we see that Therefore, in this independent case we have which is in contrast with the forthcoming dependent sample case.
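As a rough numerical companion to Theorem 1, the following Python sketch estimates $n\,E(W_2^2(F_n, \Phi))/\log\log n$ by Monte Carlo. It is not part of the paper's argument. It relies on the quantile representation $W_2^2(F_n, \Phi) = \int_0^1 (F_n^{-1}(u) - \Phi^{-1}(u))^2 du$ and on the elementary identity $\int_a^b \Phi^{-1}(u)\,du = \varphi(\Phi^{-1}(a)) - \varphi(\Phi^{-1}(b))$, where $\varphi = \Phi'$. The function names, sample sizes and seed are illustrative assumptions. Since $\log\log n$ grows extremely slowly, the printed ratios should not be expected to be close to 1 for the values of $n$ that can be simulated.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf Phi, density phi


def w2_squared_to_normal(sample):
    """Exact W_2^2 between the empirical c.d.f. of `sample` and Phi.

    On ((i-1)/n, i/n) the empirical quantile function equals the i-th
    order statistic, so the integral splits into n closed-form pieces:
    W_2^2 = mean(x^2) - 2 * sum_i x_(i) * (phi(z_{i-1}) - phi(z_i)) + 1,
    with z_i = Phi^{-1}(i/n) and phi(z_0) = phi(z_n) = 0.
    """
    xs = sorted(sample)
    n = len(xs)
    dens = [0.0]
    dens += [STD_NORMAL.pdf(STD_NORMAL.inv_cdf(i / n)) for i in range(1, n)]
    dens.append(0.0)
    second_moment = sum(x * x for x in xs) / n
    cross = sum(x * (dens[i] - dens[i + 1]) for i, x in enumerate(xs))
    return second_moment - 2.0 * cross + 1.0


def mean_scaled_distance(n, reps, seed=0):
    """Monte Carlo estimate of n * E(W_2^2(F_n, Phi)) / log log n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += w2_squared_to_normal(sample)
    return n * (total / reps) / math.log(math.log(n))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, mean_scaled_distance(n, reps=50))
```

The closed form avoids any discretisation of the integral, so the only error in the estimate is the Monte Carlo error.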
Second, in the setting of [2] and [1] we also get the rate of weak convergence in the two correlated samples case.
Theorem 5. Let $F_n$ and $G_n$ denote the marginal empirical c.d.f.'s of a size-$n$ i.i.d. sample of correlated bivariate standard normal vectors with covariance $\rho$, where $(B_X, B_Y)$ are two standard Brownian bridges with cross covariance Then we have the convergence in distribution and the limiting random variable is almost surely finite with finite expectation.
Remark 6. By Theorem 5 it holds $\sqrt{n}\,W_2(F_n, G_n) \to \|G\|_2$ with a CLT rate and a non-degenerate limiting distribution with finite variance. This was not expected since in the case of two independent samples, that is $\rho = 0$, it holds which proves by Theorem 1.3 of [8] that $P(\|G\|_2 = +\infty) = 1$, and is consistent with the similar case where $G_n$ is replaced with $\Phi$ as shown by Theorem 1.
Remark 7. Theorem 5 is an extension of Theorem 11 in [1] to Gaussian correlated samples. It proves that the dependency between two i.i.d. samples, expressed through the joint law, may influence the rate of convergence of $W_2^2(F_n, G_n)$ if the marginal distributions are the same. In the general CLT formulated in Theorem 14 of [2], only the limiting finite variance of was affected by the joint law if the marginal distributions are different, not the rate $1/\sqrt{n}$ as recalled at (3) above.
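To illustrate the contrast between the independent and the correlated two-sample cases described in Remark 4 and Theorem 5, here is a small Python simulation sketch. It is an illustration only, not part of the proofs. For two samples of the same size, $W_2^2(F_n, G_n)$ is the mean squared difference of the order statistics, and a standard Gaussian couple with correlation $\rho$ is generated as $Y = \rho X + \sqrt{1-\rho^2}\,Z$ with $Z$ independent of $X$. The function names, the choice $\rho = 0.9$, the sample sizes and the seed are assumptions made for the example.

```python
import math
import random


def w2_squared_two_samples(xs, ys):
    """W_2^2 between two empirical c.d.f.'s built from samples of equal size.

    Both empirical quantile functions are constant on ((i-1)/n, i/n), so the
    distance reduces to the mean squared gap between order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum((x - y) ** 2 for x, y in pairs) / len(xs)


def correlated_gaussian_sample(n, rho, rng):
    """Draw n standard Gaussian couples (X_i, Y_i) with correlation rho."""
    xs, ys = [], []
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + scale * z)
    return xs, ys


def mean_n_times_distance(n, rho, reps, seed=0):
    """Monte Carlo estimate of E(n * W_2^2(F_n, G_n))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs, ys = correlated_gaussian_sample(n, rho, rng)
        total += n * w2_squared_two_samples(xs, ys)
    return total / reps


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        independent = mean_n_times_distance(n, rho=0.0, reps=50)
        correlated = mean_n_times_distance(n, rho=0.9, reps=50)
        print(n, independent, correlated)
```

In such runs one would expect the $\rho = 0$ column to keep drifting upward with $n$ (very slowly, at a $\log\log n$ pace) while the correlated column stabilises, but a simulation of this size can only suggest that behaviour.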

Preliminaries
Note that the density quantile function As a consequence, we have, as $u \to 1$, and Let us extend the results concerning the first and second moments of the extreme order statistics of a Gaussian sample stated on page 376 of [6].
where, for $k > 0$,
Proof of Lemma 8. Following [6], let Since the random variables $\xi_1/n < \dots < \xi_n/n$ are the order statistics of $n$ independent uniform random variables, we see that $\xi_{n-k+1}$ has density
Step 1. Write $\Gamma(k) = (k-1)!$ and observe that
Step 2. For $k \ge 1$ we have Assume that $k \le C(\log n)^\theta$. By (4) it holds, for some $K > 0$ and all $n$ large enough,

Now turn to
where, for $0 < x < x(n)$, we have, by (4), which is integrable near 0 with respect to the above density since and $\log x$, $(\log x)^2$ are integrable with respect to any Gamma distribution. Hence and moreover (see [6]) it holds Similar computations give the claimed result for the variance. More precisely, in Step 2, when substituting n and $E_{2,n}$, it again appears that we can only consider integrals up to $x(n)$. Then it remains to compute, by substituting the expression of $E(Z_{n-k})$ and using equation (6) for $\Phi^{-1}(1 - x/n)$: We conclude along the same lines as above by the upper bound (7) and the fact that the variance of the logarithm of a variable with distribution $\Gamma(k)$ is $\pi^2/6 - s^2_{k+1}$.

Proof of Theorem 1
We intend to mimic the scheme of proof worked out in [2] and [1], specialized to the simpler case of the distance between the empirical and true c.d.f.'s instead of two correlated empirical ones. However, all arguments have to be reconsidered since the almost sure controls by means of the law of the iterated logarithm and strong approximations cannot easily be turned into $L^1$ controls. Indeed, what happens now is that the main part of the random integral we consider is also built from the extreme parts rather than the inner part only. Moreover, only a very short extreme interval can be neglected, and the remaining extreme intervals define a divergent integral to be precisely evaluated as a series. This is why the expectation rate is no longer a CLT rate. Note that the $\log\log n$ in this paper only comes from the primitive of $u(1-u)/h(u)^2$. Introduce the following decomposition, for $C > 0$, $\gamma > 1$ and $1 < \theta \le 2$,
Step 1. We have, for $\gamma > 1$, where
Step 2. Notice that for all $u \in [1 - 1/n, 1 - 1/(n(\log n)^\gamma)]$, we have Next observe that
Step 3. Start with Recall that As a consequence, Thus, for any $\theta \le 2$ we have
Step 4. Now we compute the limit of the main deterministic contribution to the main stochastic term $D_n$, namely Compared with the result of [3] recalled in Remark 2, the truncation at level $1/v_n$ instead of $1/n$ preserves the same first order.
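The remark above that the $\log\log n$ comes from the primitive of $u(1-u)/h(u)^2$ can be explored numerically. The Python sketch below is a side illustration, not a step of the proof. It evaluates $\int_{1/n}^{1-1/n} u(1-u)/h(u)^2\,du$ and compares it with $\log\log n$. The truncation at $1/n$ is chosen here for simplicity and is not the truncation level $1/v_n$ used in Step 4. With the change of variable $u = \Phi(z)$ the integrand becomes $\Phi(z)(1-\Phi(z))/\varphi(z)$, with $\varphi = \Phi'$, which is integrated by a composite Simpson rule. The function names and the number of Simpson steps are assumptions of the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf Phi, density phi


def integrand_in_z(z):
    """u(1-u)/h(u)^2 * du/dz at u = Phi(z), i.e. Phi(z)(1-Phi(z))/phi(z).

    1 - Phi(z) is computed as Phi(-z) to keep accuracy in the upper tail.
    """
    return STD_NORMAL.cdf(z) * STD_NORMAL.cdf(-z) / STD_NORMAL.pdf(z)


def truncated_integral(n, steps=2000):
    """Composite Simpson approximation of the integral over [1/n, 1 - 1/n]."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    z_max = -STD_NORMAL.inv_cdf(1.0 / n)  # equals Phi^{-1}(1 - 1/n) by symmetry
    width = 2.0 * z_max / steps
    total = integrand_in_z(-z_max) + integrand_in_z(z_max)
    for j in range(1, steps):
        weight = 4.0 if j % 2 else 2.0
        total += weight * integrand_in_z(-z_max + j * width)
    return total * width / 3.0


if __name__ == "__main__":
    for exponent in (3, 6, 9, 12):
        n = 10 ** exponent
        value = truncated_integral(n)
        print(n, value, value / math.log(math.log(n)))
```

Because the difference between the integral and $\log\log n$ stays bounded while $\log\log n$ itself grows extremely slowly, the printed ratio should approach 1 only very gradually.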
Step 5. To show that $E(D_n)$ behaves as $D_{1,n} + o(1)$ we proceed as in [2] with strong approximation arguments.
First, we substitute the uniform quantile process for the general quantile process, with a sharp control of the expectation of the random error terms in the Taylor-Lagrange expansion. For short, write Defining $U_i = \Phi(X_i)$, which is uniform on $(0, 1)$, we obviously have $U_{(i)} = \Phi(X_{(i)})$. Let $F^U_n$ denote the uniform empirical c.d.f. associated to the $U_i$ and define the underlying uniform quantile process to be Thus for all $1/2 \le u \le 1 - d_n$ there exists a random $u^*$ such that $|u - u^*| \le |\beta^U_n(u)|/\sqrt{n}$ and We study Since we have it holds, by Lemma 6.1.1 in [7], Now we introduce the sequence of events, with $0 < \varepsilon < 1$, On the event $A_n$ we have the following control of $u^*$, since, for instance, and the same holds for the reverse ratios. Hence we have By Lemma 9 below and (8) we have, when $\theta = 2$, sup It ensues By using the Cauchy-Schwarz inequality we easily get since by (10) we have, again for $\theta = 2$,
Step 6. Next we evaluate the probability of the rare event $A_n^c$ from (9). To this aim we work on the KMT probability space where we can define a sequence $B_n$ of standard Brownian bridges approximating the processes $\beta^U_n$ in such a way that the error process $w_n = \beta^U_n - B_n$ satisfies, for universal positive constants $c_1, c_2, c_3$ and all $x > 0$, $n \ge 1$, Hence we have Recall that $1 < \theta \le 2$. By the Borell-Sudakov theorem (see [5], [10]) and (11) we obtain, for any $\gamma > 2$, the constant $C$ fixed as large as needed and all $n$ large enough, Therefore we get, for any $0 < b < \gamma/2 - 1$,
Step 7. It remains to study At this stage the approximation bounds play a crucial role and there is no room for relaxing the trimming constraints.
To be more specific, the only allowed choice of $\theta \le 2$ is $\theta = 2$. Choose an arbitrarily large constant $C > 0$. Given any $0 < \eta < 1$, consider the sequence of events By (11), for any $k_1 > 0$ there exist $C = C_\eta > (1 + k_1/c_3)^2/\eta^2 > 0$ and $n_0 > 0$ large enough such that for all $n > n_0$ we have
Lemma 9. For any $p \ge 1$ there exist constants $C > 0$ and $\kappa_p$ such that we have, for ] and all $n$ large enough, and all $n$ large enough, we have By the Sudakov-Borell theorem it holds which proves the first claimed upper bound. Since doesn't depend on $n$ the second expectation bound follows.
By Lemma 9 we get and, by (8), By choosing $\eta$ as small as desired, the first assertion of Theorem 1 is proved.
Step 8. The sequence $\sqrt{n/\log\log n}\,W_2(F_n, \Phi)$ is bounded in $L^2$, thus uniformly integrable, and from (1) (see [9]) converges in probability to 1. Thus the convergence holds in $L^1$, which establishes the second assertion of Theorem 1.

Proof of Theorem 5
In Theorem 11 of [1] we proved that nW under assumptions on the common probability distribution $F$ of the samples ensuring that $\sqrt{n}(F_n^{-1}(u) - F^{-1}(u))$ and $\sqrt{n}(G_n^{-1}(u) - G^{-1}(u))$ can be simultaneously approximated on a suitable sub-interval of $[0, 1]$ by $B_X(u)/h(u)$ and $B_Y(u)/h(u)$ respectively. Here $B_X(u)$ and $B_Y(u)$ are two standard Brownian bridges coupled to the respective marginal samples, and are then correlated as mentioned in Theorem 5 if the two samples are. In [1] the imposed assumptions for the Gaussian approximation concerned the tail of $F$ with respect to the cost function, and the integrability condition The second term needs more attention. First we choose $0 < \alpha < 1$ such that for all $v \in [1/2, 1-(1-u)^{\alpha^2}]$ we have, for $u$ close to 1 and $\eta$ arbitrarily small, $\Phi^{-1}(v) \le (\alpha+\eta)\Phi^{-1}(u)$ and $1-\alpha\rho > \sqrt{1-\rho^2}$. We take $\alpha < (1-\sqrt{1-\rho^2})/\rho$, which is actually less than $\rho$, and we have for $u$ close enough to 1, Thus it comes that is, up to a logarithmic factor, of order $(1-u)^{(1-(\alpha+\eta)\rho)^2/(1-\rho^2)}$, with $(1-(\alpha+\eta)\rho)^2/(1-\rho^2) > 1$ for $u$ close enough to 1.
It remains to study which proves that it is integrable near 1. By symmetry the same holds near 0. We conclude that $(u - C_\rho(u))/h^2(u)$ is integrable on $(0, 1)$.