Sharp minimax tests for large covariance matrices and adaptation

We consider the problem of detecting correlations in a $p$-dimensional Gaussian vector, when we observe $n$ independent, identically distributed random vectors, for $n$ and $p$ large. We assume that the covariance matrix varies in some ellipsoid with parameter $\alpha>1/2$ and total energy bounded by $L>0$. We propose a test procedure based on a U-statistic of order 2 which is weighted in an optimal way. The weights are the solution of an optimization problem; they are constant on each diagonal and non-null only for the first $T$ diagonals, where $T=o(p)$. We show that this test statistic is asymptotically Gaussian distributed under the null hypothesis, and also under the alternative hypothesis for matrices close to the detection boundary. We prove upper bounds for the total error probability of our test procedure, for $\alpha>1/2$ and under the assumption $T=o(p)$, which implies that $n=o(p^{2\alpha})$. We illustrate the behavior of our test procedure via a numerical study. Moreover, we prove lower bounds for the maximal type II error and the total error probabilities. Thus we obtain the asymptotic and the sharp asymptotically minimax separation rate $\tilde{\varphi} = (C(\alpha, L)\, n^2 p )^{- \alpha/(4 \alpha + 1)}$, for $\alpha>3/2$, and for $\alpha>1$ together with the additional assumption $p= o(n^{4 \alpha -1})$, respectively. We deduce rate-asymptotic minimax results for testing the inverse of the covariance matrix. We construct an adaptive test procedure with respect to the parameter $\alpha$ and show that it attains the rate $\tilde{\psi}= ( n^2 p / \ln\ln(n \sqrt{p}) )^{- \alpha/(4 \alpha + 1)}$.


Introduction
A large variety of applied fields collect and need to recover information from high-dimensional data. Among these we can cite, for example, communications and signal theory, econometrics, biology and finance. Testing large covariance matrices is an important problem which has recently been approached via several techniques: corrected likelihood ratio tests using the theory of large random matrices, methods based on the sample covariance matrix, and so on.
Let $X_1, \dots, X_n$ be $n$ independent and identically distributed $p$-vectors following a multivariate normal distribution $N_p(0, \Sigma)$, where $\Sigma = [\sigma_{ij}]_{1 \le i,j \le p}$ is the normalized covariance matrix, with $\sigma_{ii} = 1$ for all $i = 1, \dots, p$. Let us denote $X_k = (X_{k,1}, \dots, X_{k,p})^T$ for all $k = 1, \dots, n$. In this paper we also assume that the size $p$ of the vectors grows to infinity, as well as the sample size $n$.
We consider the following goodness-of-fit test, where we test the null hypothesis $H_0 : \Sigma = I$, where $I$ is the $p \times p$ identity matrix (1), against the composite alternative hypothesis $H_1 : \Sigma \in Q(\alpha, L, \varphi)$, the set of matrices in $F(\alpha, L)$ separated from $I$ by $\varphi$ in normalized Frobenius norm. The class of matrices $F(\alpha, L)$ is defined as follows, for $\alpha > 0$: $F(\alpha, L) = \{\Sigma \in C_{>0} :\ \frac{1}{p} \sum_{1 \le i < j \le p} \sigma_{ij}^2 |i-j|^{2\alpha} \le L$ for all $p$, and $\sigma_{ii} = 1$ for all $i = 1, \dots, p\}$, where $C_{>0}$ is the set of all non-negative definite symmetric $p \times p$ matrices.
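As a quick numerical aid, the ellipsoid condition defining $F(\alpha, L)$ can be checked directly. The sketch below is an illustration only; the values of $\alpha$ and $L$ in the usage line are arbitrary, not ones used in the paper.

```python
import numpy as np

def in_ellipsoid(Sigma, alpha, L):
    """Check membership in F(alpha, L): unit diagonal and
    (1/p) * sum_{i<j} sigma_ij^2 * |i-j|^(2*alpha) <= L."""
    p = Sigma.shape[0]
    if not np.allclose(np.diag(Sigma), 1.0):
        return False
    i, j = np.triu_indices(p, k=1)          # all pairs i < j
    energy = np.sum(Sigma[i, j] ** 2 * (j - i).astype(float) ** (2 * alpha)) / p
    return bool(energy <= L)

# The identity matrix has zero off-diagonal energy, so it belongs
# to F(alpha, L) for every L > 0.
print(in_ellipsoid(np.eye(20), alpha=1.0, L=0.5))
```

Matrices whose off-diagonal entries decay slower than $|i-j|^{-\alpha-1/2}$ on average fail this check for fixed $L$ as $p$ grows.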
Note that the null hypothesis $H_0 : \Sigma = \Sigma_0$, with a given positive definite covariance matrix $\Sigma_0$, is equivalent to (1). This follows simply from the fact that we can always transform the observations $X_i$ into $Z_i = \Sigma_0^{-1/2} X_i$ and then test (1) using the $Z_i$. The set of covariance matrices under the alternative hypothesis consists of matrices of size $p \times p$ whose elements decrease polynomially when moving away from the diagonal. In the following, we assume that $n \to \infty$, $p \to \infty$ and that $\varphi^2 = \varphi^2(n, p)$ depends on $n$ and $p$, but also on $\alpha$ and $L$.
The problem of estimation of large covariance matrices has been considered from a minimax and adaptive point of view in various setups, see [4], [3], [8], [9] and references therein.
Unlike the estimation of the covariance matrix, the goodness-of-fit test has been considered in a minimax setup in only one previous paper, by [7]. They do not restrict the alternative to a nonparametric class; they consider $H_1 : \|\Sigma - I\|_F \ge \varphi$. For this alternative the minimax optimal rate is of order $p/n$. We will see in the next section that this rate corresponds to the first-order term $1/n$ in the variance of the estimator of $\|\Sigma - I\|_F^2/p$. In our setup, the restriction to the nonparametric class $Q(\alpha, L, \varphi)$ makes us go further, to second-order terms.
Likelihood ratio tests (LRT) were first designed for fixed dimension $p$, $p \le n$, but the LRT statistic tends to infinity when $p$ is also large. This was noted by [1], who proposed a correction of the LRT statistic and showed its convergence in law under the null hypothesis, as soon as $p/n \to c$, for some fixed $c \in (0, 1)$. Indeed, this correction is based on the asymptotic behaviour of the spectrum of the covariance matrix. A similar phenomenon was noted for tests based on quadratic forms of the sample covariance matrix by [16], who also gave corrected tests for goodness-of-fit and for sphericity for normally distributed random vectors. In order to deal with non-Gaussian random vectors, [10] make moment assumptions on the stationary law of the observations. Tests of the identity matrix for large non-Gaussian vectors were constructed under mild dependence assumptions by [17]. They use the maximum deviation of the sample covariance matrix, whose limit behaviour was studied by [6] under the null hypothesis and generalized to Gaussian $m$-dependent data. These methods show an original limit behaviour of Gumbel type for the test statistic.
A non-asymptotic sphericity test for Gaussian vectors was studied by [2]. The alternative is given by a model with a rank-one, sparse additive perturbation of the variance.
We describe here the rate asymptotics of the error probabilities from the minimax point of view. We recall that a test procedure $\Delta$ is a measurable function of the observations, taking values in $[0, 1]$. Set $\eta(\Delta) = E_I(\Delta) = P_I(\Delta = 1)$ for its type I error probability, $\sup_{\Sigma \in Q(\alpha, L, \varphi)} P_\Sigma(\Delta = 0)$ for its maximal type II error probability over the set $Q(\alpha, L, \varphi)$, and their sum for the total error probability of $\Delta$. Let us denote by $\gamma$ the minimax total error probability over $Q(\alpha, L, \varphi)$, where the infimum is taken over all test procedures. We want to describe the separation rate $\varphi = \varphi(n, p)$ such that, on the one hand, $\gamma$ stays bounded away from 0 below this rate; in this case we say that we cannot distinguish between the two hypotheses. On the other hand, we exhibit an explicit test procedure $\Delta^*$ such that its total error probability tends to 0 above this rate. We then say that $\Delta^*$ is an asymptotically minimax consistent test and $\varphi$ is the asymptotically minimax rate.
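The notions of type I, type II and total error probability can be illustrated by simulation. The sketch below is a toy illustration, not the paper's procedure: the test, its threshold `t`, and the single alternative matrix are all hypothetical choices (the paper maximizes the type II error over the whole class $Q(\alpha, L, \varphi)$, not over one matrix).

```python
import numpy as np

rng = np.random.default_rng(0)

def naive_test(X, t):
    """Toy test: reject H0 (Sigma = I) when the mean squared off-diagonal
    entry of the sample covariance matrix exceeds the threshold t."""
    n, p = X.shape
    S = X.T @ X / n
    off = S - np.diag(np.diag(S))
    return float(np.mean(off ** 2) > t)

n, p, t, reps = 50, 20, 0.025, 200
# Type I error eta: probability of rejecting under Sigma = I.
eta = np.mean([naive_test(rng.standard_normal((n, p)), t) for _ in range(reps)])
# Type II error at ONE alternative (a tridiagonal Sigma, chosen for illustration).
Sigma = np.eye(p) + 0.3 * (np.eye(p, k=1) + np.eye(p, k=-1))
L_chol = np.linalg.cholesky(Sigma)          # rows of X @ L_chol.T have covariance Sigma
beta = np.mean([1.0 - naive_test(rng.standard_normal((n, p)) @ L_chol.T, t)
                for _ in range(reps)])
gamma = eta + beta                          # total error probability of this test
```

Raising `t` trades type I error for type II error; the minimax theory below optimizes this trade-off uniformly over the alternative class.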
In this paper, we find asymptotically minimax rates for testing over the class F(α, L).
The minimax consistent test procedure is based on a U-statistic of second order, weighted in an optimal way. In this, our procedure is very different from known corrected procedures based on quadratic forms of the sample covariance matrix, see e.g. [16]. This is the first time a weighted test statistic is used for testing covariance matrices.
Moreover, our rates are sharp minimax. We show a Gaussian asymptotic behaviour of the test statistic in the neighbourhood of the separation rate. We obtain an expression of Gaussian type for the infimum of the maximal type II error probability, where $\Phi$ denotes the cumulative distribution function (cdf) of the standard Gaussian distribution and $z_{1-w}$ is the $1-w$ quantile of the standard Gaussian distribution, for any $w \in (0, 1)$.
We deduce that the minimax total error probability is of Gaussian type, where $b^2(\varphi) = C(\alpha, L)\, \varphi^{4 + 1/\alpha}$ as $\varphi \to 0$, and $C(\alpha, L)$ is given explicitly. This shows that the asymptotically sharp minimax rate $\tilde\varphi$ corresponds to $n^2 p\, b^2(\tilde\varphi) = 1$ and to the asymptotic testing constant $C(\alpha, L)$.
Analogous results were obtained by [5] in the particular case where the covariance matrix is Toeplitz, that is, $\sigma_{ij} = \sigma_{|i-j|}$ for all distinct $i$ and $j$ from 1 to $p$. We note a gain of a factor $p$ in the minimax rate. Those results are valid for any $n \ge 2$, with asymptotics taken in $p$. The asymptotically sharp minimax rate for Toeplitz covariance matrices thus gains an additional factor $p$ (the rate involves $n^2 p^2$ instead of $n^2 p$), which can be heuristically explained by the number of parameters, $p - 1$ for a Toeplitz matrix instead of $p(p-1)/2$ for an arbitrary covariance matrix. For $n = 1$, the test problem for Toeplitz covariance matrices was solved in the sharp asymptotic framework, as $p \to \infty$, by [11]. Let us also recall that the adaptive rates (in $\alpha$) for minimax testing were obtained for the spectral density problem by [12], by a non-constructive method using the asymptotic equivalence with a Gaussian white noise model.
Important generalizations of this problem include minimax testing of composite null hypotheses such as sphericity, $H_0 : \Sigma = v^2 \cdot I$ for unknown $v^2$ in some compact set separated from 0, or bandedness, $H_0 : \Sigma = \Sigma_0$ such that $[\Sigma_0]_{ij} = 0$ for all $i \neq j$ with $|i - j| > K$.
Our proofs rely on the Gaussianity of the observed vectors. Generalizations to non-Gaussian distributions with finite moments of some order can be proposed under additional assumptions on the behaviour of higher-order moments, as in e.g. [10]. Finding explicit test procedures which adapt automatically to the parameters $\alpha$ and/or $L$ of our class of matrices will be the object of future work. We focus here on sharp minimax rates.
Section 2 introduces the test statistic and studies its asymptotic properties. Next, we give upper bounds for the maximal type II error probability and for the total error probability, and refine these results to sharp asymptotics under the condition that $n = o(p^{2\alpha})$.
In Section 3 we prove sharp asymptotic optimality without restrictions on $n$ and $p$ large, and deduce the optimality of the minimax separation rates. In Section 4 we present the rate minimax results for testing the inverse of the covariance matrix. Proofs are given in Section 5, and the Appendix contains the extremal problem providing both the optimal weights for the test statistic and a family of optimal covariance matrices for the lower bounds.

Test procedure and sharp asymptotics
In the minimax theory of tests developed since [15], it is well understood that optimal test statistics are estimators (suitably normalized and tuned) of the functional which defines the separation of an element of the alternative from the element of the null hypothesis. In our case this is the Frobenius norm $\|\Sigma - I\|_F$. Weighting the elements of the sample covariance matrix appeared first as hard thresholding in minimax estimation of large covariance matrices. Let us mention [4] for banding, i.e. truncation of the matrix to its first $k$ diagonals (closest to the main diagonal), [3] for hard thresholding, then [8] where tapering was studied. It is a natural idea when coming from minimax nonparametric estimation.
However, weighting has never been used for tests concerning large covariance matrices. In this section, we introduce a weighted U-statistic of order 2 for testing large covariance matrices, study its asymptotic properties and give asymptotic upper bounds for the minimax rates of testing.
From now on, asymptotics and the symbols $o$, $O$, $\sim$ and $\asymp$ are understood as $n$ and $p$ tend to infinity. Recall that, given a sequence of real numbers $u$ and a sequence of positive real numbers $v$, we say that they are asymptotically equivalent, $u \sim v$, if $\lim u/v = 1$. Moreover, we say that the sequences are asymptotically of the same order, $u \asymp v$, if there exist two constants $0 < c \le C < \infty$ such that $c \le \liminf u/v$ and $\limsup u/v \le C$.

Test statistic and its asymptotic behaviour
For any covariance matrix $\Sigma$, we recall that the squared Frobenius norm is computed as $\|\Sigma\|_F^2 = \sum_{1 \le i,j \le p} \sigma_{ij}^2$. Our test statistic is a weighted U-statistic of order 2. It can also be seen as a weighted functional of the sample covariance matrix. The weights $w^*_{ij}$ are constant on each diagonal (they depend on $i$ and $j$ only through $i - j$), non-zero only for $|i - j| \le T$ for some large integer $T$, and decreasing polynomially for elements further from the main diagonal (as $|i - j|$ increases). More precisely, we consider the test statistic $\hat D_n$ defined in (3). Note that the weights $\{w^*_{ij}\}_{i,j}$ and the parameters $T$, $\lambda$, $b^2(\varphi)$ are obtained by solving an extremal problem which is postponed to the Appendix.
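The structure just described can be sketched numerically. This is an illustration under stated assumptions, not the paper's exact formula (3): the weight profile below is a placeholder taper (the true $w^*_{ij}$ solve the extremal problem in the Appendix), and the U-statistic is one standard unbiased way to estimate $\sum_{i \neq j} w_{ij}\, \sigma_{ij}^2$ from pairs of independent observations.

```python
import numpy as np

def weighted_ustat(X, w):
    """Weighted U-statistic of order 2 (sketch): an unbiased estimator of
    sum_{i != j} w_ij * sigma_ij^2, averaging products of sample
    cross-moments over distinct pairs of observations (k, l)."""
    n, p = X.shape
    S = X.T @ X                      # S_ij = sum_k X_{k,i} X_{k,j}
    Q = (X ** 2).T @ (X ** 2)        # Q_ij = sum_k X_{k,i}^2 X_{k,j}^2
    pair_sum = S ** 2 - Q            # sum over k != l of X_{k,i}X_{k,j}X_{l,i}X_{l,j}
    np.fill_diagonal(pair_sum, 0.0)  # only off-diagonal (i != j) terms enter
    return np.sum(w * pair_sum) / (n * (n - 1))

def taper_weights(p, T):
    """Placeholder weights: constant on each diagonal, zero beyond the first
    T diagonals, decreasing away from the main diagonal (hypothetical profile)."""
    w = np.zeros((p, p))
    for d in range(1, T):
        w += (1.0 - d / T) * (np.eye(p, k=d) + np.eye(p, k=-d))
    return w
```

Under $H_0$ ($\Sigma = I$) every $\sigma_{ij}$, $i \neq j$, vanishes, so the statistic is centered at 0; a test would reject when a normalized version of it is large.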
In fact, the weights in (4) have further properties. The following proposition gives the moments of $\hat D_n$ under the null hypothesis and their bounds under the alternative hypothesis, respectively, as well as the asymptotic normality under the null hypothesis.
Proposition 1 The test statistic $\hat D_n$ defined by (3), with parameters given by (4) and (5), has the following moments under the null hypothesis. Moreover, under the alternative, if we assume that $\varphi \to 0$, $p\, \varphi^{1/\alpha} \to \infty$ and $\alpha > 1/2$, we have bounds that hold uniformly over $\Sigma$ in $Q(\alpha, L, \varphi)$. Note that, under the alternative, we have the additional assumption that $T/p \asymp p^{-1} \varphi^{-1/\alpha} \to 0$ as $p$ grows to infinity. This is natural in order to have a meaningful weighted statistic.
When applied at the asymptotically minimax rate, this condition becomes $n = o(p^{2\alpha})$.
Let us take a quick look at the extremal problem (36): for given $\varphi > 0$, $b(\varphi)$ is the least value that $E_\Sigma(\hat D_n)$ can take over $\Sigma$ in the alternative set of hypotheses.
Under the alternative, we shall establish the asymptotic normality under the additional condition that the underlying covariance matrix is close to the boundary of the null set. This will be sufficient to give upper bounds of Gaussian type for the total error probability in the next section.

Upper bounds for the error probabilities
In order to distinguish between the two hypotheses $H_0$ and $H_1$ defined previously, we define the test procedure $\Delta^*$ in (8), where $\hat D_n$ is the estimator defined in (3).
The following theorem proves that the previously defined test procedure is minimax consistent if t is conveniently chosen.
Theorem 1 If $n$ and $p$ tend to infinity, the test procedure $\Delta^*$ defined in (8) with $t > 0$ has the following properties. Type II error probability: if $\alpha > 1/2$ and if $t \le c\, b(\varphi)$, for some constant $c$ in $(0, 1)$, we have an explicit upper bound on the maximal type II error probability.

In the next theorem we give a more refined upper bound on the error probabilities, of Gaussian type. The proof of this result explains the choice of the weights as the solution of the extremal problem given in the Appendix.
Recall that Φ is the cumulative distribution function (cdf) of standard Gaussian random variable and, for any w ∈ (0, 1), z 1−w is defined by Φ(z 1−w ) = 1 − w.
Theorem 2 If $n$ and $p$ tend to infinity, the test procedure $\Delta^*$ defined in (8) with $t > 0$ has the following properties. Type I error probability: we have an upper bound of Gaussian type on $\eta(\Delta^*)$. Type II error probability: if $\alpha > 1/2$, then, uniformly over $Q(\alpha, L, \varphi)$ and for some constant $c$ in $(0, 1)$, we have an upper bound of Gaussian type; in particular, for $t \le c\, b(\varphi)$, for some $c \in (0, 1)$, we also obtain such a bound. Another important consequence of the previous theorem is that the test procedure $\Delta^*$, with a suitable threshold $t^*$, is asymptotically minimax consistent: we get the consistent test $\Delta^*(t^*)$ if $\varphi / \tilde\varphi \to \infty$, where $\tilde\varphi$ is what we call the sharp separation rate. When we compare with Theorem 1, we also obtain the rate of convergence of the total error probability.
Note that the separation rate verifies (9) when $p$ and $n$ are such that $p^{2\alpha}/n \to \infty$.

Proof of Theorems 1 and 2.
The proof is based on Proposition 1 and on the asymptotic normality of the weighted test statistic $n\sqrt{p}\, \hat D_n$ in Proposition 2. We first bound the type I error probability of $\Delta^*$. For the type II error probability of $\Delta^*$, uniformly in $\Sigma$ over $Q(\alpha, L, \varphi)$, we obtain a bound for $t \le c \cdot b(\varphi)$ and $0 < c < 1$. Under the hypotheses of our theorem, $n\sqrt{p} \cdot t$ is at most a finite constant. We therefore distinguish the cases where $n^2 p\, b^2(\varphi)$ tends to infinity or stays close to the asymptotic constant $C(\alpha, L)$.
We use the fact that, under the alternative, $E_\Sigma(\hat D_n) \ge b(\varphi)$. Bounding from below gives a decomposition into two terms, $S_1$ and $S_2$. Let us bound $S_1$ from above using (6).
We will see, using (7), that the term $S_2$ tends to 0 as well, for all $\alpha > 1/2$, as soon as $n^2 p\, b^2(\varphi) \to +\infty$. Now suppose we are close to the separation rate: $n^2 p\, b^2(\varphi) = O(1)$. The nontrivial bound is obtained when $\Sigma$ under the alternative is close to the null hypothesis, in the sense that $E_\Sigma(\hat D_n) = O(b(\varphi))$, together with the fact that $\varphi$ is close to the separation rate, $n^2 p\, b^2(\varphi) = O(1)$. We apply Proposition 2 to get the asymptotic normality. At this point, choosing the optimal weights yields the infimum, after solving the extremal problem in the Appendix, which ends the proof of the theorem.

Asymptotic optimality
The next theorem shows sharp lower bounds for the maximal type II error probability and deduces the lower bounds for the total error probability.
Theorem 3 Suppose $\alpha > 1/2$ and, moreover, that $n$ and $p$ tend to infinity and that $\varphi \to 0$. Then the stated lower bound holds, where the infimum is taken over all tests $\Delta$ with type I error probability less than or equal to a prescribed level. Theorems 2 and 3 imply that the sharp separation rate for minimax testing is $\tilde\varphi$, where the constant $C(\alpha, L)$ is given by (5).
As a corollary, we get that if $\varphi$ is such that $\varphi / \tilde\varphi \to 0$, we cannot distinguish between the null and the alternative hypotheses.
Corollary 4 Suppose $\alpha > 1/2$ and, moreover, that $n$ and $p$ tend to infinity. If $\varphi / \tilde\varphi \to 0$, the minimax total error probability tends to 1, where the infimum defining it is taken over all test procedures $\Delta$.
Together with Theorem 1, this Corollary shows that the separation rates are asymptotically minimax.
The proof of the lower bounds is postponed to Section 5. We construct a family of covariance matrices based on $\Sigma^*$, given by the extremal problem in the Appendix, together with a prior measure on these covariance matrices; the observations are $n$ centered Gaussian vectors in large dimension. We prove that the likelihood under the null and the average likelihood under the alternative hypothesis tend to the same limit asymptotically.
The log of the likelihood ratio associated with an arbitrary $\Sigma$ with respect to $I$ under the null hypothesis is known to drift away to infinity (see [1], who corrected this ratio to get a proper limit). However, the log of the Bayesian likelihood ratio with our prior measure is asymptotically normally distributed. This property is highly surprising and essential in proving sharp asymptotic lower bounds for testing in our setup.

Testing the inverse of the covariance matrix
Let us consider the same model, but the test problem $H_0 : \Sigma^{-1} = I$ against the alternative that $\Sigma^{-1}$ is separated from $I$ by $\psi$ in normalized Frobenius norm, with $\Sigma$ in $G(\alpha, L, \lambda)$, where $G(\alpha, L, \lambda)$ is the class of covariance matrices $\Sigma$ in $F(\alpha, L)$ with the additional constraint that the eigenvalues $\lambda_i(\Sigma)$ are bounded from below by some $\lambda \in (0, 1)$, for all $i$ from 1 to $p$ and all $\Sigma$ in the set.
We prove here that the previous results apply to this setup and we get the same rates, but not the sharp asymptotics. Note that the additional hypothesis is the mildest one that does not change the rates for testing. Indeed, we see this case as a well-posed inverse problem.
The case of an ill-posed inverse problem, where the smallest eigenvalue is allowed to tend to 0, will most certainly imply a loss in the rate and is beyond the scope of this paper.
Proof. Note that $\Sigma^{-1} = I$ if and only if $\Sigma = I$. Moreover, if $\Sigma$ belongs to $G(\alpha, L, \lambda)$ and is separated from $I$ as above, then $\Sigma$ obviously belongs to $F(\alpha, L)$ and satisfies $\|\Sigma - I\|_F \ge \lambda \|\Sigma^{-1} - I\|_F$. Thus we can proceed with our former test procedure, with $\varphi$ replaced by $\lambda \psi$, and we obtain the upper bounds in the definition of the separation rates.
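The inequality underlying this reduction, $\|\Sigma - I\|_F \ge \lambda_{\min}(\Sigma)\, \|\Sigma^{-1} - I\|_F$ (from $\Sigma^{-1} - I = \Sigma^{-1}(I - \Sigma)$), can be checked numerically; the matrix below is a generic random example, not one from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 30
A = rng.standard_normal((p, p))
# A @ A.T / p is non-negative definite, so all eigenvalues of Sigma are >= 0.5.
Sigma = A @ A.T / p + 0.5 * np.eye(p)
lam = np.linalg.eigvalsh(Sigma).min()

# ||Sigma - I||_F >= lambda_min * ||Sigma^{-1} - I||_F
lhs = np.linalg.norm(Sigma - np.eye(p))
rhs = lam * np.linalg.norm(np.linalg.inv(Sigma) - np.eye(p))
assert lhs >= rhs
```

The inequality is deterministic: it follows from $\|BC\|_F \le \|B\|_2 \|C\|_F$ applied to $B = \Sigma^{-1}$, $C = I - \Sigma$, so the assertion holds for any positive definite $\Sigma$.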
The lower bounds of the previous section also remain valid. Indeed, that proof is based on the construction of a subfamily $\{\Sigma^*_U : U \in \mathcal{U}\}$ in the set of alternatives. We have proven in Proposition 3 that the eigenvalues of these matrices are bounded from below by $1 - O(\varphi^{1 - 1/(2\alpha)})$; since $\alpha > 1/2$ and $\varphi = \lambda \psi \to 0$ as $\psi \to 0$, we get $1 - O(\varphi^{1 - 1/(2\alpha)}) \ge \lambda$ for $\psi > 0$ small enough. Thus this family belongs to the set of alternatives considered here as well.

Proofs
Proof of Theorem 3. The first step of the proof is to reduce the set of parameters to a convenient parametric family. Let $\Sigma^* = [\sigma^*_{ij}]_{1 \le i,j \le p}$ be the matrix which has 1 on the diagonal and off-diagonal entries $\sigma^*_{ij}$ given by (10), where $\lambda$ and $T$ are given by (4) and (5).
Let us define $Q^*$, a subset of $Q(\alpha, L, \varphi)$, consisting of the matrices $\Sigma^*_U = [u_{ij} \sigma^*_{ij}]_{1 \le i,j \le p}$ indexed by sign patterns $U \in \mathcal{U}$. The cardinality of $U$ is $p(T-1)/2$.
Using Proposition 3 hereafter, we have that for all $\Sigma^*_U \in Q^*$, $\Sigma^*_U$ is non-negative definite, for $\varphi > 0$ small enough.
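The construction of this family can be sketched numerically. The magnitude profile below is a placeholder for the $\sigma^*_{|i-j|}$ of (10) (which come from the extremal problem); the check illustrates that $\mathrm{tr}(\Delta_U^2) = \|\Sigma^*_U - I\|_F^2$ is the same for every sign pattern $U$, a fact used in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
p, T = 50, 8

# Hypothetical magnitudes, constant on each diagonal and zero beyond the
# first T diagonals (the exact values in the paper solve the extremal problem).
mags = 0.05 * (1.0 - np.arange(1, T) / T)

def sigma_star_U(signs):
    """Build Sigma*_U = I + [u_ij * sigma*_{|i-j|}] for a symmetric sign pattern."""
    S = np.eye(p)
    for i in range(p):
        for j in range(i + 1, min(i + T, p)):
            S[i, j] = S[j, i] = signs[i, j] * mags[j - i - 1]
    return S

U1 = np.where(rng.random((p, p)) < 0.5, -1.0, 1.0)
U2 = np.where(rng.random((p, p)) < 0.5, -1.0, 1.0)
S1, S2 = sigma_star_U(U1), sigma_star_U(U2)
# The squared Frobenius norm of Sigma*_U - I does not depend on the signs U.
f1 = np.linalg.norm(S1 - np.eye(p)) ** 2
f2 = np.linalg.norm(S2 - np.eye(p)) ** 2
assert abs(f1 - f2) < 1e-9
```

Because only the signs differ between members of the family, each $\Sigma^*_U$ sits at the same Frobenius distance from $I$, i.e. on the same separation boundary.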
Assume that $X_1, \dots, X_n \sim N_p(0, I)$ under the null hypothesis and denote by $P_I$ the likelihood of these random variables. We assume that $X_1, \dots, X_n \sim N_p(0, \Sigma^*_U)$ under the alternative, and we denote by $P_U$ the associated likelihood. In addition, let $\bar P = |\mathcal{U}|^{-1} \sum_{U \in \mathcal{U}} P_U$ be the average likelihood over $Q^*$. By Lemma 8.1 and Proposition 2.11 in [14], the problem can be reduced to testing $P_I$ against $\bar P$, and it is therefore sufficient to show (11) and (12). In order to obtain (11) and (12), we apply results in Section 4.3.1 of [14], giving the sufficient condition (13) that, in $P_I$-probability, the log of the Bayesian likelihood ratio is asymptotically Gaussian, where $u_n = n\sqrt{p}\, b(\varphi)$ and $Z_n$ is asymptotically distributed as a standard Gaussian. Let us finish by proving (13).
Recall that $\Sigma^*_U$ has $\sigma_{ii} = 1$ for all $i$ from 1 to $p$ and off-diagonal entries $u_{ij} \sigma^*_{ij}$, with $\sigma^*_{ij}$ defined in (10), and that it is non-negative definite for $\varphi > 0$ small enough and all $U \in \mathcal{U}$. Moreover, denote by $\lambda_{1,U}, \dots, \lambda_{p,U}$ the eigenvalues of $\Sigma^*_U$. We deduce that the smallest eigenvalue is bounded from below by a minimum over $i = 1, \dots, p$ of an explicit quantity, which is strictly positive for $\varphi > 0$ small enough.
Let us continue the proof of (13). More explicitly, $U$ is seen as a randomly chosen matrix with uniform distribution over the set $\mathcal{U}$. Let us denote $\Delta_U = \Sigma^*_U - I$ and write the approximations obtained by Taylor expansion. Indeed, $\mathrm{tr}(\Delta_U) = 0$ and $\mathrm{tr}(\Delta_U^2)$ does not depend on $U$: it equals the squared Frobenius norm of $\Sigma^* - I$. Now, we compute the expected value with respect to the i.i.d. Rademacher variables $u_{i_1, j_1}, \dots$, and we study the resulting expression as a random variable under the null hypothesis. Recall the Taylor expansion $\log\cosh(u) = u^2/2 - u^4/12 + O(u^6)$ for small $u$; the same expansion applies to $\log\cosh(\sigma^*_{ih} \sigma^*_{hj} W_{ij}(1 + o(1)))$. We can check that, for all $i \neq j$, $\sum_{h \notin \{i,j\}} \sigma^*_{ih} \sigma^*_{hj} = o(\sigma^*_{ij})$. Indeed, on the one hand, $|j - i|$ is at most $T - 1$; on the other hand, $\varphi^2 = o(\varphi^{1 - 1/\alpha})$ for all $\alpha > 0$ as $\varphi \to 0$.
Thus, we get the announced expansion. With our definition of $\sigma^*_{ij}$ in terms of $w^*_{ij}$ and $2 b(\varphi)$, we can put $Z_n = n\sqrt{p}\, \hat D_n$, which is asymptotically standard Gaussian under the null hypothesis, by Proposition 1.
Note also that the $W_{ij}$ are uncorrelated and identically distributed for all $i < j$, and that $W_{ij}/\sqrt{n}$ is asymptotically standard Gaussian, for all given $i < j$ and $n$ large. The remaining terms in (15) can be grouped together, which concludes the proof of (13).
Proof of Proposition 1. We recall that under the null hypothesis the coordinates of the vector $X_k$ are independent; using this fact we compute the moments under the null. Remark that $\hat D_n - E_\Sigma(\hat D_n)$ can be written in the form (16). Then the variance of the estimator $\hat D_n$ is a sum of two uncorrelated terms, see (17). We now give an upper bound for the first term on the right-hand side of (17), distinguishing three kinds of terms in the sum, indexed by $A_1$, $A_2$, $A_3$, which form a partition of the set $\{(i, j, i', j') \in \{1, \dots, p\}^4 : i \neq j,\ i' \neq j'\}$.
More precisely, in $A_1$ we have $(i, j) = (i', j')$ or $(i, j) = (j', i')$; in $A_2$ we have three distinct indices, that is, $(i = i'$ and $j \neq j')$ or $(j = j'$ and $i \neq i')$ or $(i = j'$ and $j \neq i')$ or $(j = i'$ and $i \neq j')$; and finally, in $A_3$ the indices are pairwise distinct. First, when $(i, j, i', j') \in A_1$, the corresponding sum is $p(1 + o(1))$, since $\sup_{i,j} w^{*2}_{ij} \asymp 1/T \to 0$. When the indices are in $A_2$, exactly three distinct values appear among the four indices. We assume $i = i'$; it is then sufficient to check the corresponding bound. Now let us bound from above the first term of $T_{1,2}$, treating each term of $T_{1,2,1}$ separately and recalling the properties verified by the weights $w^*_{ij}$. In the rest of the proof we denote by $k_0(\alpha, L), k_1(\alpha, L), \dots$ various constants that depend only on $\alpha$ and/or $L$. We obtain a first bound for $\alpha > 1/2$. For the second term in (19), where $|j - j'| \ge T$, we use a further bound; note that $\sup_{i,j} \sigma_{ij} \le 1$. The second term of $T_{1,2}$ is bounded similarly. As a consequence of (20) to (22), we obtain the bound over $A_2$. In the last case, where $(i, j, i', j')$ varies in $A_3$, the indices are pairwise distinct. As the two resulting terms have the same upper bound, let us deal with the first one, say $T_{1,3,1}$.
We distinguish two cases: first $|i - i'| < T$, then $|i - i'| \ge T$. We begin with the first case, which in turn is decomposed into three terms; for the last of these we use the Cauchy-Schwarz inequality. Now suppose that $|i - i'| \ge T$. Finally, we obtain (28) from (24) to (27). Put together (18), (23) and (28) to obtain (6). Let us now give an upper bound for the second term of (17). Proceeding similarly, we distinguish three kinds of terms, beginning with the case when the indices belong to $A_1$. We then bound each term of $T_{2,2}$ separately: using the Cauchy-Schwarz inequality twice, we bound the first term; the second term in $T_{2,2}$ is $T_{1,2,2}$, already bounded. Finally, when $(i, j, i', j') \in A_3$, we have two further terms to bound from above. These last two terms, in $T_{2,3}$, are treated similarly; using the upper bound of $T_{1,3}$ obtained previously, we conclude. Put together (29), (30) and (31) to get (7).
The asymptotic normality under the null hypothesis is obvious.
Proof of Proposition 2. We use the decomposition (16) from the proof of Proposition 1 and treat each term separately. Recall that, by our assumptions, the first term, suitably normalized, tends to 0, which is true for all $\alpha > 1/2$.
It follows that, for proving the asymptotic normality, it is sufficient to prove the asymptotic normality of $V_n$, a centered, 1-degenerate U-statistic with symmetric kernel $H_n(X_1, X_2)$. We apply Theorem 1 of [13]; therefore, we check that $E_\Sigma(H_n^2(X_1, X_2)) < +\infty$ together with the further moment condition of that theorem. From inequality (6), we obtain a bound on $E_\Sigma(H_n^2(X_1, X_2))$. To bound (33) from above, we distinguish four cases. The first one is when all couples of indices are equal. The second one is when we have two different pairs of couples of indices, which can be obtained by two different combinations of the couples: either two equal pairs of couples, as for example $(i_1, j_1) = (i_2, j_2)$, or three couples of indices equal. In the third case, there are three different couples of pairs of indices, for example $(i_1, j_1) = (i'_2, j'_2)$ and $(i_1, j_1) \neq (i'_1, j'_1) = (i_2, j_2)$; using the Cauchy-Schwarz inequality several times, we obtain a bound in which we recognize a term which is $O(p)$. We finally treat the last case, when the pairs of indices are pairwise distinct; here we have 16 terms to handle. As all terms are treated in the same way, let us deal with $G_4$. In order to find an upper bound for $G_4$, we decompose the sums into several sums, similarly to the upper bound of (28): we partition the index set $\{1, \dots, p\}^8$ into $J_1, \dots, J_{16}$, bound the sum over each $J_r$, $r = 1, \dots, 16$, and partition $J_1$ again as needed.
With these partitions we obtain the stated bounds, where, from now on, $\kappa_0(\alpha, L), \kappa_1(\alpha, L), \dots$ denote constants that depend on $\alpha$ and $L$. Using similar arguments, we can prove that all remaining terms tend to zero. The squared expected value above is a sum of a large number of terms that are all treated similarly; let us consider examples of terms containing squared factors and products of distinct factors, respectively, for $\alpha > 1/2$. The terms containing no squared values are treated as, e.g., $w^*_{i_1 j_1} w^*_{i_2 j_2} w^*_{i_3 j_3} w^*_{i_4 j_4} \sigma_{i_1 i_2} \sigma_{j_1 j_2} \sigma_{i_3 i_4} \sigma_{j_3 j_4} \sigma_{i_1 i_3} \sigma_{j_1 j_3} \sigma_{i_2 i_4} \sigma_{j_2 j_4}$. We can see that $H_2$ coincides with $G_{4,2}$, from which the required bound follows. Finally, we can apply [13] and obtain (35). Combining (32) and (35), we conclude by Slutsky's theorem the announced convergence in law to $N(0, 1)$.

Appendix -Optimal weights and covariance matrix
We solve the following extremal problem that appears in both sharp upper and lower bounds.
Indeed, the solution of this problem defines the weights (w * ij ) 1≤i,j≤p that appear in the optimal test procedure and the covariance matrix Σ * that we use to construct the subfamily of covariance matrices in the proof of the lower bounds.
Recall that $Q(\alpha, L, \varphi)$ is the class of covariance matrices in (2). We define the weights $(w^*_{ij})_{ij}$ and the entries $(\sigma^*_{ij})_{ij}$ as solutions of an optimization problem of inf-sup type, over weights $w_{ij} \ge 0$ and matrices $\Sigma \in Q(\alpha, L, \varphi)$, where we use Proposition 4.1 in [14]; indeed, the set of parameters over which we take the infimum is convex. Using the Cauchy-Schwarz inequality, we then evaluate the supremum over the weights. We evaluate the solution of the resulting system as $T$ tends to infinity, $T < p$. Under the assumption that $T/p \to 0$, this gives an explicit asymptotic equivalent for $T$, with a constant depending on $L(4\alpha + 1)$; moreover, $\Sigma^*_U - I$ has eigenvalues $\lambda_{i,U} - 1$.

Proof of Proposition 3. Let us check the case where $u_{ij} = 1$ for all $i, j$ such that $|i - j| \le T$; the generalization to all $U$ in $\mathcal{U}$ will be obvious. Using Gershgorin's theorem, we get that each eigenvalue of $\Sigma^*_U = [u_{ij} \sigma^*_{ij}]_{1 \le i,j \le p}$ lies in one of the disks centered at $\sigma_{ii} = 1$ with radius $R_i = \sum_{j \neq i} |\sigma^*_{ij}|$.
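The Gershgorin step can be checked numerically; the sign-perturbed matrix below is a generic example with hypothetical entry size, not the specific $\Sigma^*_U$ of the paper.

```python
import numpy as np

def gershgorin_lower_bound(Sigma):
    """Gershgorin bound: every eigenvalue lies within R_i = sum_{j != i} |sigma_ij|
    of the diagonal entry sigma_ii, hence lambda_min >= min_i (sigma_ii - R_i)."""
    off = np.abs(Sigma - np.diag(np.diag(Sigma)))
    return float(np.min(np.diag(Sigma) - off.sum(axis=1)))

rng = np.random.default_rng(2)
p = 40
Sigma = np.eye(p)
i, j = np.triu_indices(p, k=1)
vals = 0.01 * rng.choice([-1.0, 1.0], size=i.size)  # small +/- off-diagonal entries
Sigma[i, j] = vals
Sigma[j, i] = vals
# The true smallest eigenvalue is never below the Gershgorin bound.
assert np.linalg.eigvalsh(Sigma).min() >= gershgorin_lower_bound(Sigma) - 1e-9
```

In the proof, the off-diagonal entries shrink with $\varphi$, so the radii $R_i$ vanish and all eigenvalues of $\Sigma^*_U$ stay close to 1, giving non-negative definiteness for $\varphi$ small enough.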