On the sphericity test with large-dimensional observations

In this paper, we propose corrections to the likelihood ratio test and John's test for sphericity in large dimensions. We first establish new formulas for the limiting parameters in the CLT for linear spectral statistics of sample covariance matrices with general fourth moments. Using these formulas, we derive the asymptotic distributions of the two proposed test statistics under the null. These asymptotics are valid for general populations, i.e. not necessarily Gaussian, provided the fourth moment is finite. Extensive Monte-Carlo experiments are conducted to assess the quality of these tests, with comparisons to several existing methods from the literature. Moreover, we obtain their asymptotic power functions under a spiked population model as a specific alternative.


Introduction
Consider a sample Y_1, ..., Y_n from a p-dimensional multivariate distribution with covariance matrix Σ_p. An important problem in multivariate analysis is to test sphericity, namely the hypothesis H_0 : Σ_p = σ² I_p where σ² is unspecified. If the observations represent a multivariate error with p components, the null hypothesis expresses the fact that the error is cross-sectionally uncorrelated (independent if, in addition, the observations are normal) and has the same variance in each component (homoscedasticity).
Much of the existing theory about this test was first exposed in detail in [17] for the Gaussian likelihood ratio test, and later in [10, 11, 27], as well as in textbooks like [18, Chapter 8] and [1, Chapter 10]. Assume for a moment that the sample has a normal distribution with mean zero and covariance matrix Σ_p. Let S_n = n^{-1} Σ_i Y_i Y_i^* be the sample covariance matrix and denote its eigenvalues by {ℓ_i}_{1≤i≤p}. Two well-established procedures for testing sphericity are the likelihood ratio test (LRT) and a test devised in [10]. The likelihood ratio statistic is, see e.g. [1, §10.7.2],

L_n = { (ℓ_1 ⋯ ℓ_p)^{1/p} / [ p^{-1}(ℓ_1 + ⋯ + ℓ_p) ] }^{pn/2},

which is a power of the ratio of the geometric mean of the sample eigenvalues to their arithmetic mean. Notice that this formula requires p ≤ n to avoid null eigenvalues in the numerator of L_n. If we let n → ∞ while keeping p fixed, classical asymptotic theory indicates that under the null hypothesis, −2 log L_n =⇒ χ²_f, a chi-square distribution with degrees of freedom f = ½ p(p + 1) − 1. This asymptotic distribution is further refined by the following Box-Bartlett correction (referred to as BBLRT):

P(−2ρ log L_n ≤ x) = P_f(x) + ω₂ { P_{f+4}(x) − P_f(x) } + O(n^{-3}),

where P_k(x) = P(χ²_k ≤ x) and

ρ = 1 − (2p² + p + 2) / (6pn),   ω₂ = (p + 2)(p − 1)(p − 2)(2p³ + 6p² + 3p + 2) / (288 p² n² ρ²).
By observing that the asymptotic variance of −2 log L_n is proportional to tr{Σ(tr Σ)^{-1} − p^{-1} I_p}², [10] proposed the statistic

T₂ = (p² n / 2) tr{ S_n (tr S_n)^{-1} − p^{-1} I_p }²

for testing sphericity. When p is fixed and n → ∞, under the null hypothesis it also holds that T₂ =⇒ χ²_f; we refer to this as John's test. Observe that T₂ is proportional to the square of the coefficient of variation of the sample eigenvalues, namely

T₂ = (np/2) · [ p^{-1} Σ_i (ℓ_i − ℓ̄)² ] / ℓ̄²,   ℓ̄ = p^{-1} Σ_i ℓ_i.

Following the idea of the Box-Bartlett correction, [19] established an expansion (1.2) for the distribution function of T₂ (referred to as Nagao's test). It is well known that classical multivariate procedures are in general challenged by large-dimensional data. A small simulation experiment is conducted to explore the performance of the BBLRT and Nagao's test (two corrections) with growing dimension p. The sample size is set to n = 64 while the dimension p increases from 4 to 60 (we have also run experiments with larger sample sizes n, but the conclusions are very similar), and the nominal level is set to α = 0.05. The samples come from normal vectors with mean zero and identity covariance matrix, and each pair (p, n) is assessed with 10000 independent replications. Table 1 gives the empirical sizes of BBLRT and Nagao's test. When the dimension-to-sample-size ratio p/n is below 1/2, both tests have an empirical size close to the nominal level 0.05. As the ratio grows, however, the BBLRT quickly becomes biased, while Nagao's test keeps a correct empirical size. It is striking that although Nagao's test is derived under the classical "p fixed, n → ∞" regime, it is remarkably robust against dimension inflation.
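To illustrate how such a size experiment can be set up, here is a minimal Python sketch of our own (not the authors' code). It computes John's statistic T₂ from the formula above and rejects at the classical χ²_f critical value, then estimates the empirical size under H₀ with normal data; the function names are illustrative.

```python
import numpy as np
from scipy import stats

def johns_T2(Y):
    """John's statistic T2 = (p^2 n / 2) * tr{ S_n (tr S_n)^{-1} - p^{-1} I_p }^2."""
    p, n = Y.shape
    S = Y @ Y.T / n                          # sample covariance (known mean 0)
    A = S / np.trace(S) - np.eye(p) / p
    return 0.5 * p ** 2 * n * np.trace(A @ A)

def empirical_size(p, n, reps=2000, alpha=0.05, seed=0):
    """Fraction of rejections under H0: Sigma = I_p, using the classical
    chi-square approximation T2 ~ chi2_f with f = p(p+1)/2 - 1."""
    rng = np.random.default_rng(seed)
    f = p * (p + 1) // 2 - 1
    crit = stats.chi2.ppf(1 - alpha, df=f)
    hits = sum(johns_T2(rng.standard_normal((p, n))) > crit for _ in range(reps))
    return hits / reps

print(empirical_size(p=4, n=64))   # typically not far from 0.05 when p/n is small
```

Replacing the critical value by the expansion-corrected one would give the Nagao-type version of the experiment.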
Therefore, the goal of this paper is to propose novel corrections to both the LRT and John's test to cope with the large-dimensional context. Similar work has been done in [15], which confirms the robustness of John's test in large dimensions; however, those results assume a Gaussian population. In this paper, we remove the Gaussian restriction and prove that the robustness of John's test is in fact general. Following the idea of [15], [9] proposed a family of well-selected U-statistics to test sphericity; however, as shown in our simulation study in Section 3, the powers of our corrected John's test are slightly higher than those of this test in most cases. More recently, [26] examined the performance of T₁ (a statistic first put forward in [24]) under non-normality, but with the moment condition γ = 3 + O(p^{-ε}), which essentially matches the Gaussian case (γ = 3) asymptotically. We have removed this moment restriction as well. In short, we unveil two corrections with better performance, free from the Gaussian or nearly Gaussian restriction found in the existing literature.
From the technical point of view, our approach differs from [15] and follows the one devised in [4] and [6]. The central tool is a CLT for linear spectral statistics of sample covariance matrices established in [2] and later refined in [21]. The paper also contains an original contribution on this CLT reported in the Appendix: new formulas for the limiting parameters in the CLT. Since such CLT's are increasingly important in large-dimensional statistics, we believe that these new formulas will be of independent interest for applications other than those considered in this paper.
The remainder of the paper is organized as follows. Large-dimensional corrections to the LRT and John's test are introduced in Section 2. Section 3 reports a detailed Monte-Carlo study analyzing the finite-sample sizes and powers of these two corrections under both normal and non-normal data. Next, Section 4 gives a theoretical analysis of their asymptotic power under the alternative of a spiked population model. Section 5 generalizes our test procedures to populations with an unknown mean. Technical proofs and calculations are relegated to Section 6. The last section contains some concluding remarks.

Large-dimensional corrections
From now on, we assume that the observations Y_1, ..., Y_n have the representation Y_j = Σ_p^{1/2} X_j, where the p × n table {X_1, ..., X_n} = {x_ij}_{1≤i≤p, 1≤j≤n} is made of an array of i.i.d. standardized random variables (mean 0 and variance 1). This setting is motivated by random matrix theory and is generic enough for a precise analysis of the sphericity test. Furthermore, under the null hypothesis H_0 : Σ_p = σ² I_p (σ² unspecified), both the LRT and John's test statistics are independent of the scale parameter σ². Therefore, we can assume w.l.o.g. that σ² = 1 when dealing with the null distributions of these test statistics. This will be assumed throughout.
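To make the representation Y_j = Σ_p^{1/2} X_j concrete, here is a small Python sketch; the sampler interface and the example Σ_p are our own illustrative choices.

```python
import numpy as np

def sample_from_model(Sigma, n, entry_sampler, rng):
    """Draw Y_j = Sigma^{1/2} X_j, j = 1..n, where the entries of the p x n
    array X are i.i.d. standardized variables supplied by entry_sampler."""
    vals, vecs = np.linalg.eigh(Sigma)
    root = vecs @ np.diag(np.sqrt(vals)) @ vecs.T   # symmetric square root
    p = Sigma.shape[0]
    X = entry_sampler(rng, (p, n))                  # mean 0, variance 1 entries
    return root @ X                                 # p x n matrix of observations

rng = np.random.default_rng(1)
Sigma = np.diag([2.0, 1.0, 1.0, 0.5])               # hypothetical example
Y = sample_from_model(Sigma, 10_000, lambda r, s: r.standard_normal(s), rng)
S_n = Y @ Y.T / Y.shape[1]                          # sample covariance, mean 0
print(np.round(np.diag(S_n), 1))                    # close to diag(Sigma)
```

Any standardized entry distribution (normal, centered Gamma, ...) can be plugged in via `entry_sampler`, which is exactly the generality the paper works under.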
Throughout the paper we use an indicator κ set to 2 when the {x_ij} are real and to 1 when they are complex, as defined in [3]. Also, we define the kurtosis coefficient β = E|x_ij|⁴ − 1 − κ for both cases and note that for normal variables, β = 0 (recall that for a standard complex-valued normal random variable, the real and imaginary parts are two i.i.d. N(0, 1/2) real random variables).

Q. Wang and J. Yao

The corrected likelihood ratio test (CLRT)
For the correction of the LRT, define the test statistic ℒ_n = −(2/n) log L_n for n ≥ 1.
Our first main result is the following.
Then under H_0 and when p/n = y_n → y ∈ (0, 1),

ℒ_n + (p − n) log(1 − p/n) − p =⇒ N(μ, σ²),   (2.1)

with μ = −((κ − 1)/2) log(1 − y) + (1/2)βy and σ² = −κ log(1 − y) − κy. The test based on this asymptotic normal distribution will hereafter be referred to as the corrected likelihood ratio test (CLRT). One may observe that the limiting distribution crucially depends on the limiting dimension-to-sample ratio y through the factor −log(1 − y). In particular, the asymptotic variance blows up quickly as y approaches 1, so the power can be expected to break down seriously there. The Monte-Carlo experiments in Section 3 provide more details on this behavior.
The proof of Theorem 2.1 is based on the following lemma. In all the following, F_y denotes the Marčenko-Pastur distribution of index y (> 0), which is introduced and discussed in the Appendix, and F_y(f) = ∫ f(x) F_y(dx) denotes the integral of a function f with respect to F_y.
The proof of this lemma is postponed to Section 6.

The corrected John's test (CJ)
Earlier than the asymptotic expansion (1.2) given in [19], [10] proved that when the observations are normal, the sphericity test based on T₂ is a locally most powerful invariant test. It is also established in [11] that under these conditions, the limiting distribution of T₂ under H_0 is χ²_f with degrees of freedom f = ½ p(p + 1) − 1, or equivalently,

nU − p =⇒ (2/p) χ²_f − p,

where for convenience we have let U = 2(np)^{-1} T₂. Clearly, this limit is established for n → ∞ with a fixed dimension p. However, if we now let p → ∞ on the right-hand side, it is not hard to see that (2/p)χ²_f − p tends to the normal distribution N(1, 4). It then seems "natural" to conjecture that when both p and n grow to infinity in some "proper" way, it may happen that

nU − p =⇒ N(1, 4).   (2.2)

This is indeed the main result of [15], where this asymptotic distribution was established assuming that the data are normally distributed and that p and n grow to infinity proportionally (i.e. p/n → y > 0). In this section, we provide a more general result using our own approach. In particular, the distribution of the observations is arbitrary, provided a finite fourth moment exists.
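The claim that (2/p)χ²_f − p tends to N(1, 4) is easy to check directly: with f = p(p + 1)/2 − 1, the exact mean is 1 − 2/p and the exact variance is 4 + 4/p − 8/p². A quick simulation sketch:

```python
import numpy as np

# (2/p) * chi2_f - p with f = p(p+1)/2 - 1 has mean 1 - 2/p and
# variance 4 + 4/p - 8/p^2, so it approaches N(1, 4) as p grows.
rng = np.random.default_rng(3)
for p in (10, 100, 1000):
    f = p * (p + 1) // 2 - 1
    draws = 2.0 / p * rng.chisquare(f, size=100_000) - p
    print(p, round(draws.mean(), 2), round(draws.var(), 2))
# sample means tend to 1 and sample variances tend to 4
```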
The test based on the asymptotic normal distribution given in equation (2.3) will be hereafter referred as the corrected John's test (CJ).
A striking fact in this theorem is that, as in the normal case, the limiting distribution of the CJ statistic is independent of the dimension-to-sample ratio y = lim p/n. In particular, the limiting distribution derived under the classical scheme (p fixed, n → ∞), e.g. the distribution (2/p)χ²_f − p in the normal case, stays very close, for large p, to the limiting distribution derived under the large-dimensional scheme (p → ∞, n → ∞, p/n → y ∈ (0, ∞)). In this sense, Theorem 2.2 gives a theoretical explanation for the widely observed robustness of John's test against dimension inflation. Moreover, CJ remains valid when p is larger (or much larger) than n, in contrast to the CLRT, where the ratio must stay below 1 to avoid null eigenvalues.
It is also worth noticing that for real normal data, we have κ = 2 and β = 0, so that the theorem above reduces to nU − p ⇒ N(1, 4). This is exactly the result discussed in [15]. Moreover, if the data are non-normal but share the same first four moments as the normal distribution, we again have nU − p ⇒ N(1, 4): a universality property.
The proof of Theorem 2.2 is based on the following lemma.
Then under H_0 and the conditions of Theorem 2.2, we have the joint convergence stated above. The proof of this lemma is postponed to Section 6.
Proof of Theorem 2.2. The result of Lemma 2.2 can be rewritten as a joint CLT. Define the function f(x, y) = x/y², through which U can be expressed. By the delta method, the asymptotic distribution (2.3) of nU − p follows. The proof of Theorem 2.2 is complete.
Remark 2.1. Note that the parameter β appears in Theorems 2.1 and 2.2; in practice it is unknown for real data. We may estimate β using the fourth-order sample moment. By the law of large numbers, β̂ = β + o_p(1), so substituting β̂ for β in Theorems 2.1 and 2.2 does not modify the limiting distributions.
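The exact formula for β̂ is not reproduced in this extract, but a natural moment estimator in the spirit of the remark is mean(|x_ij|⁴) − 1 − κ, shown here as an illustration under the assumption that standardized entries are available (as under H₀ with σ² = 1):

```python
import numpy as np

def estimate_beta(X, kappa=2):
    """Moment estimator beta_hat = mean(|x_ij|^4) - 1 - kappa (real case: kappa = 2).
    Assumes the entries are already standardized to mean 0 and variance 1."""
    return np.mean(np.abs(X) ** 4) - 1 - kappa

rng = np.random.default_rng(42)
# Gamma(4, 2) - 2 entries: mean 0, variance 1, E x^4 = 4.5, so beta = 1.5
X = rng.gamma(4.0, 0.5, size=(100, 200)) - 2.0
print(round(estimate_beta(X), 1))   # consistent for the true beta = 1.5
```

By the law of large numbers this estimator converges to β, which is all that is needed to plug it into the limiting distributions.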

Monte Carlo study
Monte-Carlo simulations are conducted to evaluate the empirical sizes and powers of the CLRT and CJ. In particular, we examine two questions: how robust are the tests against non-normal data, and over what range of the ratio p/n are they applicable? For comparison, we also report the performance of the LW test, which uses the asymptotic N(1, 4) distribution in (2.2) (note that this coincides with CJ under normal data), and of Chen's test (denoted C for short), which uses the asymptotic N(0, 4) distribution derived in [9]. The nominal test level is α = 0.05, and for each pair (p, n) we run 10000 independent replications.
We consider two scenarios for the random vectors Y_i:
(a) Y_i is a p-dimensional real random vector from the multivariate normal population N(0, I_p). In this case, κ = 2 and β = 0.
(b) Y_i consists of i.i.d. real random variables with distribution Gamma(4, 2) − 2, so that y_ij satisfies E y_ij = 0 and E y⁴_ij = 4.5. In this case, κ = 2 and β = 1.5.
Table 2 (empirical sizes of the LW, CJ, CLRT and C tests at 5% significance level, based on 10000 independent replications, with real N(0, 1) and real Gamma(4, 2) − 2 variables) reports the sizes of the four tests in these two scenarios for different values of (p, n). When the {y_ij} are normal, LW (= CJ), CLRT and C all have empirical sizes tending to the nominal level 0.05 as either p or n increases. But when the {y_ij} are Gamma-distributed, the sizes of LW stay above 0.1 no matter how large p and n are, while the sizes of CLRT and CJ converge to the nominal level 0.05 as either p or n grows. This empirically confirms that normality is needed for the result of [15], while our corrected statistics CLRT and CJ (as well as the C test) are free of such a distributional restriction.
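The size distortion of an uncorrected N(1, 4)-based rule under Gamma data can be reproduced with a compact sketch of our own (illustrative code, using a one-sided rejection rule built on the limit (2.2), which is only valid for normal-type fourth moments):

```python
import numpy as np
from scipy import stats

def reject_lw(Y, alpha=0.05):
    """One-sided rejection based on nU - p => N(1, 4), U = 2 T2 / (n p)."""
    p, n = Y.shape
    S = Y @ Y.T / n
    U = p * np.trace(S @ S) / np.trace(S) ** 2 - 1
    return (n * U - p - 1) / 2.0 > stats.norm.ppf(1 - alpha)

def size(sampler, p, n, reps, rng):
    """Empirical size: rejection frequency under H0 : Sigma = I_p."""
    return np.mean([reject_lw(sampler(rng, (p, n))) for _ in range(reps)])

rng = np.random.default_rng(7)
normal = lambda r, s: r.standard_normal(s)
gamma = lambda r, s: r.gamma(4.0, 0.5, s) - 2.0    # beta = 1.5 instead of 0
print(size(normal, 64, 128, 1000, rng))  # near the nominal 0.05
print(size(gamma, 64, 128, 1000, rng))   # inflated: N(1, 4) limit not valid here
```

This mirrors the qualitative finding of Table 2: the uncorrected rule keeps its size under normal data but over-rejects under Gamma(4, 2) − 2 data.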
As for empirical powers, we consider two alternatives (under both, the limiting spectral distribution of Σ_p differs from that under H_0):
(1) Σ_p is diagonal with half of its diagonal elements equal to 0.5 and half equal to 1; the corresponding power is denoted Power 1;
(2) Σ_p is diagonal with 1/4 of its elements equal to 0.5 and 3/4 equal to 1; the corresponding power is denoted Power 2.
Table 3 (empirical powers of the LW, CJ, CLRT and C tests at 5% significance level, based on 10000 independent replications, with real N(0, 1) and real Gamma(4, 2) − 2 variables, under the two alternatives Power 1 and Power 2) reports the powers of LW (= CJ), CLRT and C when the {y_ij} are N(0, 1), and of CJ, CLRT and C when the {y_ij} are Gamma(4, 2) − 2, for n equal to 64 or 128, varying values of p, and the two alternatives above. For n = 256 and p varying from 16 to 240, all tests have powers around 1 under both alternatives, so these values are omitted. To visualize the trends, we also present the results for n = 128 in Figure 1. The behavior of Power 1 and Power 2 for the three statistics is similar in each setting, except that Power 1 is much higher than Power 2 for any given dimension design (p, n) and any given test, because the first alternative departs further from the null than the second. The powers of LW (in the normal case), CJ (in the Gamma case) and C are all monotonically increasing in p for fixed n. For the CLRT, however, with n fixed the powers first increase in p and then decrease as p approaches n. This can be explained by the fact that when p is close to n, some eigenvalues of S_n approach zero, causing the CLRT to nearly degenerate and lose power.
Besides, we find that in the normal case the trend of C's power closely resembles that of LW, while in the Gamma case it is similar to that of CJ, under both alternatives. In most cases (especially for large p), the power of the C test is slightly lower than that of LW (in the normal case) and CJ (in the Gamma case).
Lastly, we examine the performance of CJ and C when p is larger than n. Empirical sizes and powers are presented in Table 4. We choose Gamma(4, 2) − 2 variables, since CJ reduces to LW in the normal case and [15] has already reported the performance of LW when p is larger than n. From the table, we see that when p is larger than n, the size of CJ remains correct, staying around the nominal level 0.05 as the dimension p increases; the same holds for the C test.
For the power, the same two alternatives Power 1 and Power 2 as above are considered. The sample size is fixed at n = 64 and the ratio p/n varies from 1 to 20. Power 1 is much higher than Power 2, again because the first alternative is easier to distinguish from H_0. The powers under both alternatives increase monotonically for 1 ≤ p/n ≤ 15. When p/n grows larger, say p/n = 20, the size becomes a little inflated and the powers drop slightly (compared with p/n = 15); overall, however, the test still behaves well and can be considered essentially free of the constraint "p/n → y". Besides, the powers of CJ are always slightly higher than those of C in this "large p, small n" setting.
Since the asymptotic distributions of the CLRT and CJ are both derived under the "Marčenko-Pastur scheme" (i.e. p/n → y ∈ (0, ∞)), if p/n becomes too large (p ≫ n), the limiting results provided in this paper will lose accuracy. It is worth noticing that [7] has extended the LW test to such a scheme (p ≫ n) for multivariate normal distributions.
Summarizing the findings of this Monte-Carlo study, the overall picture is the following: when the ratio p/n is much lower than 1 (say, smaller than 1/2), it is preferable to employ the CLRT (rather than CJ, LW or C); when the ratio is higher, CJ (or LW for normal data) becomes more powerful (slightly more powerful than C).

Asymptotic powers: under the spiked population alternative
In this section, we analyze the powers of the two corrections, CLRT and CJ. To this end, we consider an alternative model that has attracted a lot of attention since its introduction in [13], namely the spiked population model. This model can be described as follows: the eigenvalues of Σ_p are all one except for a fixed, small number of them (the spikes). Thus, we restrict the sphericity testing problem to

H_0 : Σ_p = I_p   vs.   H*_1 : Σ_p = diag(a_1, ..., a_1, ..., a_k, ..., a_k, 1, ..., 1),

where each spike value a_i ≠ 1 appears with multiplicity n_i, the multiplicities being fixed and satisfying n_1 + ⋯ + n_k = M. We derive explicit expressions for the power functions of the CLRT and CJ in this section. Under H*_1, the empirical spectral distribution of Σ_p is

H_n = p^{-1} [ n_1 δ_{a_1} + ⋯ + n_k δ_{a_k} + (p − M) δ_1 ],

and it converges to δ_1, a Dirac mass at 1, which is the same limit as under the null hypothesis H_0 : Σ_p = I_p. From this point of view, anything related to the limiting spectral distribution remains the same under H_0 and H*_1. Recall that the CLT for LSS of the sample covariance matrix provided in [2] concerns the centered statistic p ∫ f(x) d(F_n − F_{y_n})(x), whose limiting mean and variance are determined only by the limiting spectral distribution. So the limiting parameters µ and σ² remain the same under H_0 and H*_1; only the centering term p ∫ f(x) dF_{y_n}(x) may differ. Since the factor p in front of ∫ f(x) dF_{y_n}(x) tends to infinity, knowing the convergence H_n → δ_1 is not enough, and more precise information about this convergence is needed. In [28], we established an asymptotic expansion of the centering parameter when the population has a spiked structure; we use those formulas, namely equations (4.2), (4.3) and (4.6) there, to derive the powers of the CLRT and CJ. Lemma 2.1 remains valid under H*_1, except that the centering terms become those given by formulas (4.11) and (4.12) in [28]. Repeating the proof of Theorem 2.1, we obtain the corresponding CLT under H*_1.
As a result, for a pre-given significance level α, the power of the CLRT for testing H_0 against H*_1 can be expressed in the closed form (4.5).
It is worth noticing that if the alternative has only one simple spike, i.e. k = 1 and n_k = 1, and in the real Gaussian case κ = 2, then (4.5) reduces to a result provided in [20]. Our formula, however, is valid for a general number of spikes with possible multiplicities. Besides, those authors use more sophisticated tools, asymptotic contiguity and Le Cam's first and third lemmas, which differ entirely from our approach.
To calculate the power function of CJ, we restate Lemma 2.2 under H*_1. Using the delta method as in the proof of Theorem 2.2, we obtain the corresponding CLT under H*_1, and for a pre-given significance level α the power function of CJ can be expressed as in (4.8). Now consider the functions a_i − log a_i − 1 and (a_i − 1)² appearing in expressions (4.5) and (4.8): both achieve their minimum value 0 at a_i = 1, so as the a_i move away from 1, the powers β_1(α) and β_2(α) both increase. This agrees with intuition: the further the a_i deviate from 1, the easier it is to distinguish H_0 from H*_1, so the powers naturally grow.
Next, we consider the power functions β_1(α) and β_2(α) as functions of y. In expression (4.5), −log(1 − y) − y is increasing on y ∈ (0, 1), so β_1(α) is decreasing in y, attaining its maximum value 1 as y → 0⁺ and its minimum value α as y → 1⁻. Expression (4.8) is also clearly a decreasing function of y, attaining its maximum value 1 as y → 0⁺ and its minimum value α as y → +∞. Figure 3 presents the trends of β_1(α) and β_2(α) (the powers of the CLRT and CJ) when a single spike a = 2.5 is present. This differs somewhat from the non-spiked case shown in the simulations of Section 3 (Figures 1 and 2), where the power of the CLRT first increases and then decreases while the power of CJ keeps increasing with p. The power drops here are due to the fact that, with only one spiked eigenvalue, the two hypotheses become harder to distinguish as p increases. An interesting byproduct is that these power functions give a new confirmation that the CLRT behaves quite badly as y → 1⁻, while the CJ test retains reasonable power over a significant range of y > 1.
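The basic structural fact used throughout this section, that the ESD of Σ_p under H*_1 still converges to δ_1, can be checked directly. A small sketch with hypothetical spikes (a_1, n_1) = (2.5, 1) and (a_2, n_2) = (4.0, 2):

```python
import numpy as np

def spiked_sigma(p, spikes):
    """Spiked covariance: eigenvalue a_i with multiplicity n_i, the rest equal 1.
    `spikes` is a list of (a_i, n_i) pairs; M = sum(n_i) stays fixed as p grows."""
    diag = [a for a, m in spikes for _ in range(m)]
    diag += [1.0] * (p - len(diag))
    return np.diag(diag)

for p in (20, 200, 2000):
    Sigma = spiked_sigma(p, [(2.5, 1), (4.0, 2)])   # hypothetical spikes, M = 3
    mass_at_one = np.mean(np.diag(Sigma) == 1.0)    # ESD mass of the eigenvalue 1
    print(p, mass_at_one)   # (p - M)/p -> 1: the ESD converges to delta_1
```

Since the spike count M is fixed, the mass at 1 is (p − M)/p, which tends to 1, so anything driven by the limiting spectral distribution alone cannot separate H_0 from H*_1; the separation comes entirely from the centering term.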

Generalization to the case when the population mean is unknown
So far, we have assumed the observations (Y_i) to be centered. However, this is hardly true in practice, where µ = E Y_i is usually unknown. The sample covariance matrix should then be taken as

S*_n = n^{-1} Σ_{i=1}^n (Y_i − Ȳ)(Y_i − Ȳ)*,   Ȳ = n^{-1} Σ_{i=1}^n Y_i.

Since S_n − S*_n = Ȳ Ȳ* is a rank-one matrix, substituting S*_n for S_n when µ is unknown does not affect the limiting distribution in the CLT for LSS; however, this is not the case for the centering parameter, because of the factor p in front of it.
Recently, [22] showed that if S*_n is used in the CLT for LSS when µ is unknown, the limiting variance remains the same as with S_n, while the limiting mean has a shift that can be expressed as a complex contour integral. Later, [30] investigated this shift and derived a concise conclusion on the CLT corresponding to S*_n: the random vector (X*_n(f_1), ..., X*_n(f_k)) converges weakly to a Gaussian vector with the same mean and covariance functions as in Theorem A.1, where this time the centering is evaluated at the substitute ratio y_{n−1} = p/(n − 1). It is important to note that the only difference lies in the centering term, where the new ratio y_{n−1} = p/(n − 1) replaces the previous y_n = p/n, all other terms remaining unchanged.
Using this result, we can modify Theorems 2.1 and 2.2 to obtain the CLTs of CLRT and CJ under H_0 when µ is unknown, simply by using the eigenvalues of S*_n and substituting n − 1 for n in the centering terms. More precisely, equations (2.1) and (2.3) of Theorems 2.1 and 2.2 are adapted accordingly. The same procedure applies under H*_1 when µ is unknown: equations (4.4) and (4.7) are modified in the same way, and the powers of CLRT and CJ under the spiked alternative remain unchanged, as expressed in (4.5) and (4.8).

Additional proofs
We recall the two important formulas that appear in the Appendix as (A.2) and (A.3) here, for ease of reference:

Proof of Lemma 2.1
Let, for x > 0, f(x) = log x and g(x) = x, and define A_n and B_n by the corresponding decompositions. Applying Theorem A.1 given in the Appendix to the pair (f, g), it remains to evaluate the limiting parameters, which results from the following calculations, where we write h = √y:

I_1(f, r) = (1/2) log(1 − h²/r²),   (6.1)
I_1(g, r) = 0,   (6.2)

together with a third formula (6.3) for the term J_1(f, f, r), derived last. We now detail these calculations to complete the proof. They are all based on the formula given in Proposition A.1 in the Appendix and repeated use of the residue theorem.
Proof of (6.1). We split the quantity into four integrals. For the first integral, note that as r > 1 the poles are ±1/r, and we apply the residue theorem. The second integral is computed directly. The third one follows from the change of variable z = 1/ξ; the resulting equality holds because r > 1, so the only pole is z = 0.
The fourth one is evaluated similarly. Collecting the four integrals leads to the desired formula for I_1(f, r).
Proof of (6.2). We split the quantity into two integrals, calculated along the same lines as above. Collecting the two terms leads to I_1(g, r) = 0.
For J_1(f, f, r), the first equality results from the change of variable z = 1/ξ_2, and the third equality holds because |r/h| > 1, so that r/h is not a pole. Finally, we find J_1(f, f, r) = −(1/r) log(1 − h²/r).

Concluding remarks
Using recent central limit theorems for eigenvalues of large sample covariance matrices, we have found new asymptotic distributions for two major procedures for testing the sphericity of a large-dimensional distribution. Although the theory is developed under the scheme p → ∞, n → ∞, p/n → y > 0, our Monte-Carlo study has shown that, on the one hand, both CLRT and CJ are already very efficient, in both size and power, for moderate dimensions such as (p, n) = (96, 128), see Tables 2 and 3; and on the other hand, CJ also behaves very well in most "large p, small n" situations, see Table 4. Three characteristic features emerge from our findings: (a) the asymptotic distributions are universal in the sense that they depend on the distribution of the observations only through its first four moments; (b) the new test procedures improve quickly as either the dimension p or the sample size n grows; in particular, for a given sample size n and within a wide range of values of p/n, higher dimensions p lead to better performance of the corrected test statistics; (c) CJ is particularly robust against dimension inflation: our Monte-Carlo study shows that for a small sample size n = 64, the test is effective for 0 < p/n ≤ 20.
In a sense, these new procedures have benefited from the "blessings of dimensionality".
Appendix A: Formulas for the limiting parameters in the CLT for eigenvalues of a sample covariance matrix with general fourth moments

Given a sample covariance matrix S_n of dimension p with eigenvalues λ_1, ..., λ_p, linear spectral statistics of the form F_n(g) = p^{-1} Σ_{i=1}^p g(λ_i), for suitable functions g, are of central importance in multivariate analysis. CLTs for such statistics have been successively developed since the pioneering work of [12]; see [2] and [16] for a recent account of the subject.
The CLT in [2] (see also the improved version in [5]) has been widely used in applications, as it was the first to provide explicit formulas for the mean and covariance parameters of the limiting normal distribution. In the special case of an array {x_ij} of independent variables, this CLT assumes the following moment conditions:
(a) for each n, the variables x_ij = x_ij^{(n)}, i ≤ p, j ≤ n, are independent;
(b) E x_ij = 0, E|x_ij|² = 1, and max_{i,j,n} E|x_ij|⁴ < ∞.
In Condition (c), the fourth moments of the entries are set to the value 3 (real case) or 2 (complex case), matching the normal case. This is a quite demanding and restrictive condition since, in the real case for example, it is very hard to exhibit a non-normal distribution with mean 0, variance 1 and fourth moment equal to 3. As a consequence, most if not all applications published in the literature using this CLT assume a normal distribution for the observations. Recently, efforts have been made in [21, 16] and [29] to overcome these moment restrictions. We present below such a CLT with general fourth moments, which will be used for the sphericity test.
In all the following, we use an indicator κ set to 2 when the {x_ij} are real and to 1 when they are complex, and define β = E|x_ij|⁴ − 1 − κ for both cases and h = √y. Consider the sample covariance matrix S_n = n^{-1} Σ_{i=1}^n X_i X_i*, where X_i = (x_ki)_{1≤k≤p} is the i-th observed vector. It is well known that when p → ∞, n → ∞ and p/n → y > 0, the distribution of its eigenvalues converges to a non-random distribution, namely the Marčenko-Pastur distribution F_y, with support [a, b] = [(1 − √y)², (1 + √y)²] (plus an additional mass at the origin when y > 1). Moreover, the Stieltjes transform m of the companion distribution F̲_y = (1 − y)δ_0 + y F_y satisfies, for z ∈ C⁺, the inverse equation

z = −1/m + y/(1 + m).

The following CLT is a particular instance of Theorem 1.4 in [21].
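As a numerical illustration of the Marčenko-Pastur support, here is a short sketch with y = 1/4 (an arbitrary illustrative choice); the extreme eigenvalues fluctuate around the edges at scale O(n^{-2/3}), so the tolerances are kept loose:

```python
import numpy as np

# Eigenvalues of S_n concentrate on the Marcenko-Pastur support
# [(1 - sqrt(y))^2, (1 + sqrt(y))^2] when p/n -> y.
rng = np.random.default_rng(11)
p, n = 400, 1600                                   # y = 0.25
X = rng.standard_normal((p, n))
S = X @ X.T / n
eig = np.linalg.eigvalsh(S)

y = p / n
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
print(round(eig.min(), 2), round(eig.max(), 2))    # close to a = 0.25, b = 2.25
```

The same experiment with y > 1 would additionally produce p − n exact zero eigenvalues, the mass at the origin mentioned above.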