Testing the Sphericity of a covariance matrix when the dimension is much larger than the sample size

This paper focuses on the prominent sphericity test when the dimension $p$ is much lager than sample size $n$. The classical likelihood ratio test(LRT) is no longer applicable when $p\gg n$. Therefore a Quasi-LRT is proposed and asymptotic distribution of the test statistic under the null when $p/n\rightarrow\infty, n\rightarrow\infty$ is well established in this paper. Meanwhile, John's test has been found to possess the powerful {\it dimension-proof} property, which keeps exactly the same limiting distribution under the null with any $(n,p)$-asymptotic, i.e. $p/n\rightarrow[0,\infty]$, $n\rightarrow\infty$. All asymptotic results are derived for general population with finite fourth order moment. Numerical experiments are implemented for comparison.


Introduction
High dimensional data with dimension p of same scale with or even larger than the number of observations n has applausive statistical applications in biology and finance recently. In particular, practical needs for testing gene-wise independence in genomic studies have inspired a wide range of discussions regarding test of structures of the covariance matrix.
In this paper, we consider the prominent sphericity test when the dimension p is much larger than the sample size n. Let X = (X 1 , X 2 , · · · , X n ) be a p × n data matrix with n independent and identically distributed p−dimensional random vectors {X i } 1≤i≤n with covariance Σ = V ar(X i ). Our interest is to test H 0 : Σ = σ 2 I p vs. H 1 : Σ = σ 2 I p , (1.1) where σ 2 is an unknown positive constant. Among traditional tests are the likelihood ratio test(LRT) and John's invariant test.
Consider first the LRT with test statistic (Anderson (1984)) − 2 log L n = −2 log (l 1 · · · l p ) 1/p 1 p (l 1 + · · · + l p ) pn 2 = n log l where {l i } 1≤i≤p are the eigenvalues of p−dimensional sample covariance matrix 1 n n i=1 X i X ′ i = 1 n XX ′ , X = (X 1 , · · · , X n ). If we let n → ∞ while keeping p fixed, classics asymptotic theory indicates that under the null hypothesis and assuming the population is normal, −2 log L n d − → χ 2 1 2 p(p+1)−1 , the chi-square distribution is further refined by the Box-Bartlett correction. However, this χ 2 −convergence becomes slow when the dimension p increases so that the LRT (and its Box-Bartlett correction) is seriously biased when the dimension-to-sample size ratio p/n is not small enough.
Wang and Yao (2013) made bias correction to the traditional LRT test under the regime where both p, n → ∞, p/n → c ∈ (0, 1). They derived that when X = {x ij } 1≤i≤p 1≤j≤n with i.i.d entries satisfying E(x ij ) = 0, E|x ij | 2 = 1, ν 4 := E|x ij | 4 < ∞, and under H 0 , Notice that here the scale parameter σ 2 in H 0 has been taken to be σ 2 = 1 as the LRT statistic is invariant under scaling. Extensive simulation study in Wang and Yao (2013) shows that this test is well adapted to high dimensions and has a very reasonable size and power for a wide range of dimension-sample size combinations (p, n). The LRT however requires that p ≤ n because when p > n, n − p of the sample eigenvalues {l i } are null so that the likelihood ratio L n is identically null. In this paper, we introduce a quasi-LRT statistic which can be seen as a natural extension of the LRT statistic to the situation where p > n. The quasi-LRT test statistic is defined as where {λ i } 1≤i≤n are eigenvalues of n−dimensional matrix 1 p X ′ X. The main idea is that the companion matrix X ′ X has exactly the same n non-null eigenvalues with the sample covariance matrix XX ′ (up to some scaling). Therefore, the quasi-LRT test statistic removes all the null eigenvalues in the original LRT test statistic and we find that under the so-called ultra-dimensional asymptotic p ≫ n, that is p/n → ∞ and n → ∞, Based on this asymptotic result, a quasi-LRT test can be conducted to test sphericity to compensate for the inapplicability of the traditional LRT in the ultra-dimension setting.
Next we consider John's invariant test for sphericity. John (1971John ( , 1972) studied the problem for normal populations and proposed the testing statistic U = 1 p tr (1.5) where l = 1 p p i=1 l i . It has been proved that, as n → ∞ while p remain fixed, the limiting distribution of U under H 0 is Contrary to the LRT, it has been noticed for a while that John's test does not suffer from high dimensions and this χ 2 limit is quite accurate even when the ratio p/n is not small. Ledoit and Wolf (2002) studied the (n, p)-consistency of this test statistic under normality assumptions. They proved that, when n, p → ∞, lim n→∞ p/n → c ∈ (0, +∞), In other words, Ledoit and Wolf (2002) extended the classical n-asymptotic theory (where p is fixed) to the high-dimensional case where p goes to infinity proportionally with n. Meanwhile, the robustness of John's test is explained in this proportional high-dimensional scheme. Wang and Yao (2013) further relaxed the normality restriction and proved that, (1.7) Since ν 4 = 3 for normal distribution, it shows that the existing results confirm with each other. In this paper, we extend the above result one step further, i.e. consider the asymptotic behavior of the John's test statistic under the ultra-dimensional p ≫ n setting. We find that this test statistic possesses a remarkable dimension-proof property, which shows that under the (n, p)-asymptotic, the limit in (1.7) still holds when lim n→∞ p/n = ∞. This dimensionproof property of John's test makes it a very competitive candidate for sphericity testing regardless of p, n. Related methods have also been proposed in the literature for the high dimensional sphericity test. Noteworthy work include Schott (2005) where a test statistic based on the logarithm of the norm of sample correlation matrix under (n, p)-asymptotic has been well studied. Yet multivariate normality assumption has been assumed in this paper. Similarly in Fisher et al. (2010), a novel test statistic utilizing the ratio of the fourth and second arithmetic means of the sample covariance matrix is developed under the p/n → c, (n, p)-asymptotic with normality restriction. Srivastava (2005) considered the ratio of arithmetic means of the eigenvalues of sample covariance matrix in the normal case when n = O(p δ ), δ > 0, n, p → ∞ and Srivastava (2011) further proved the robustness of this test statistic against non-normality assumption irrespective of either n/p → 0 or n/p → ∞. However, their results are only applicable under some specified factorized settings, which makes it less general than John's test. Chen et al. (2010) developed a high-dimensional test based on the John's test, however this test is very time-consuming (See Section 2.4). Zou et al. (2013) considered the multivariate-sign-based covariance matrices to construct robust test for sphericity and significantly enhanced test performance when the non-normality is severe, particularly for heavy tailed distributions. In their paper the asymptotic distributions of the test statistic when p = O(n 2 ) is derived. Srivastava (2006) studied a quasi-likelihood ratio test under the n = O(p δ ), 0 < δ < 1, n, p → ∞ asymptotic in the normal case, while in this paper, the normality assumption is released and results are discussed under a wider range of (n, p)-asymptotic. These tests are compared in the simulation studies of the paper in Section 2.4.
The rest of the paper is organized as follows. Section 2 discusses the asymptotic behavior of the John's test statistic and the quasi-LRT test statistic under the ultra-dimensional setting.
Empirical sizes and powers of these two tests and other methods are compared under various scenarios. Section 3 presented theoretical results for power of John's test and quasi-LRT test and testified these results with simulations. Section 4 concludes. Some technique lemmas and related proofs are displayed in the Appendix A.

Preliminary Knowledge
For any n × n Hermitian matrix M with real eigenvalues λ 1 , · · · , λ n , the empirical spectral distribution (ESD for short) of M is defined by F M = n −1 n j=1 δ λ j , where δ a denotes the Dirac mass at a. The Stieltjes transform of any distribution G is defined as where I(z) stands for the imaginary part of z.
Consider the re-normalized sample covariance matrix A = p n 1 p X ′ X − I n , where X = (x ij ) p×n and x ij , i = 1, · · · , p, j = 1, · · · , n are i.i.d. real random variables with mean zero and variance one, I n is the identity matrix of order n. It's known that under the ultra-dimensional setting (Bai and Yin, 1988), with probability one, the ESD of matrix A, F A converges to the semicircle law F with density We denote the Stieltjes transform of the semicircle law F by m(z). Let S denote any open region on the complex plane including [−2, 2], the support of F and M be the set of functions which are analytic on S . For any f ∈ M , denote 11 and √ B 2 − 4AC is a complex number whose imaginary part has same sign as that of B. The integral's contour is taken as |m| = ρ with ρ < 1. Chen and Pan (2013) gives a calibration in advance for the mean correction term in (2.1), where only C is replaced with while others remain the same. The central limit theorem (CLT) of linear functions of eigenvalues of the re-normalized sample covariance matrix A when the dimension p is much larger than the sample size n derived by Chen and Pan (2013) is stated as follows.
Then, for any f 1 , · · · , f k ∈ M , the finite dimensional random vector (G n (f 1 ), · · · , G n (f k )) converges weakly to a Gaussian vector (Y (f 1 ), · · · , Y (f k )) with mean function EY (f ) = 0 and covariance function The proofs of the main theorems in this paper are based on two lemmas derived from this CLT. Notice that the limiting covariance functions in (3.1) has been first established in Bai and Yao (2005) for Wigner matrices.
where X satisfies the assumptions in Theorem 2.2, then as p/n → ∞, n → ∞, X satisfies the assumptions in Theorem 2.3, then as p/n → ∞, n → ∞, The proofs of these two lemma are postponed to Appendix A.

John's Test
Consider John's test statistic U defined in (1.5) based on eigenvalues of the p−dimensional sample covariance matrix S = 1 n XX ′ . Here we assume that the X ′ j s in X have representation It can be seen that, under the null hypothesis H 0 , the John's test statistic is independent from the scale parameter σ 2 . Therefore, we assume w.l.o.g. σ 2 = 1 when we derive the null distribution of the test statistic. In other words, under H 0 , we assume in the rest of this paper that sample vectors The first main result of this paper is the following.
Similarly with this theorem, Wang and Yao (2013) It indicates that as long as X = {x ij } p×n are i.i.d with zero mean, unit variance and finite fourth order moment, John's test statistic nU − p has a consistent limiting distribution N (ν 4 − 2, 4), regardless of normality, under any (n, p)-asymptotic, n/p → [0, ∞). Therefore, the powerful dimension-proof property assigns John's test top priority when little information about the data is known before implementing sphericity test.
The proof of Theorem 2.2 is based on Lemma 2.1.
Proof. Denote the eigenvalues of p×p matrix S n = 1 n XX ′ in descending order by l i (1 ≤ i ≤ p), and the eigenvalues of n × n matrix A = p n Since p > n, S n has p − n zero eigenvalues and the remaining n non-zero eigenvalues We have, for John's test statistic

then John's test statistic can be written as
According to Lemma 2.1, when p/n → ∞, n → ∞, Then by the Delta Method, is the corresponding gradient vector.

Quasi-likelihood ratio test
Consider the Quasi-LRT statistic L n in (1.4) based on the eigenvalues of n−dimensional matrix 1 p X ′ X, which are also proportional to the non-null eigenvalues of p−dimensional sample covariance matrix 1 n XX ′ . Similarly with John's test statistic, it can be seen that, under the null hypothesis H 0 , the L n statistic is independent of the scale parameter σ 2 . Therefore, we again assume w.l.o.g. σ 2 = 1 when we derive the null distribution of the test statistic. The second main result of this paper is the following theorem.
Recall the classic LRT when H 0 holds and p is fixed while n → ∞, if the population is Gaussian, the test statistic where {l i } 1≤i≤p are the eigenvalues of p−dimensional sample covariance matrix 1 n XX ′ . Here we notice that n/p → ∞.
By interchanging the role of n and p, which is feasible under H 0 , it can be seen that when n fixed and p/n → ∞, the test statistic which is nothing but (2.3) applied to the normal case (ν 4 = 3) with fixed n and p → ∞. Therefore, the classical LRT can be thought of as a particular "finite-dimensional" instance of the general limit of (2.3) for the Quasi-LRT, that is, Theorem 2.3 covers a wide range of "large p, small n" situations.
The proof of Theorem 2.3 is based on lemma 2.2.
Proof. Denote the eigenvalues of n × n matrix 1 p X ′ X in descending order byl i (1 ≤ i ≤ n), and eigenvalues of n × n matrix A = p n 1 p X ′ X − I n by λ i (1 ≤ i ≤ n). These eigenvalues are related as We have, for the Quasi-LRT test statistic Define the function then the Quasi-LRT test statistic can be written as According to Lemma 2.2, when p/n → ∞, n → ∞, Then by the Delta Method, is the corresponding gradient vector.

Simulation Studies
In order to further explore the finite sample behavior of John's sphericity test when dimension p is significantly larger than the sample size n, Monte Carlo simulations are implemented in this session to evaluate the size and power of John's Sphericity Test. Test statistic proposed by Chen et al. (2010) is also considered for comparison.
In the simulation, without loss of generality, we conduct the sphericity test with σ 2 = 1. To find the empirical sizes of these two tests, we consider two different scenarios to generate sample data: Ex 2 ij = 1, Ex 4 ij = ν 4 = 4.5. We set sample size n = 64, dimension p = 320, 640, 960, 1280, 1600, 2400, 3200 in order to understand the effect of an increasing dimension. The nominal test level is α = 0.05. For each pair of (p, n), 10000 replications are used to get the empirical size.
As for the test in Chen et al. (2010), the test statistic is defined as follows: where P r n = n!/(n − r)!, * denotes summation over mutually different indices. Then we reject H 0 if nU n exceeds the 5% upper quantile of N (0, 4) distribution.
For the test in Srivastava (2011)(Sri for short), the test statistic is defined as follows: (n−1)(n+2) . According to the limiting distribution of W n , we reject H 0 if W n exceeds the 5% upper quantile of N (0, 1) distribution. As for empirical powers, we generate sample data from two alternatives: -Power 1: Σ is diagonal with half of its diagonal elements 0.5 and half 1. This power scenario is denoted by Power 1; -Power 2: Σ is diagonal with 1/4 of its diagonal elements 0.5 and 3/4 equal to 1. This power scenario is denoted by Power 2. Table 1 reports the empirical sizes and powers of two tests for Gaussian data. Table 2 is for Non-Gaussian data. It can be seen from the above results that both John's test and QLRT perform well with respect to sizes and powers. Empirical powers under Power 1 are in general higher than under Power 2 because of more significant difference between H 0 and H 1 . John's test performs slightly better than Chen's method. In all tested scenarios, the QLRT dominates the other two tests in term of power even though the difference is quite marginal. Srivastava's test performs slightly below John's test in the Gaussian case and still suffers from non-normality with non-negligible bias. Furthermore, we have recorded the execution time of these two tests within different scenarios and we find that Chen's method is more time-consuming due to more complicated computations.

Power of the tests
In this section we study the asymptotic power of the two tests. To begin with, some preliminary knowledge is introduced as follows.

Preliminary knowledge
Consider the re-normalized sample covariance matrix where Z = (z ij ) p×n and z ij , i = 1, · · · , p, j = 1, · · · , n are i.i.d. real random variables with mean zero and variance one, I n is the identity matrix of order n, Σ p is a sequence of p × p non-negative definite matrices with bounded spectral norm. Assume the following limit exist, it has been proven that, under the ultra-dimensional setting (Bai and Yin, 1988), with probability one, the ESD of matrix A, F A converges to the semicircle law F with density We denote the Stieltjes transform of the semicircle law F by m(z). Let S denote any open region on the complex plane including [−2, 2], the support of F and M be the set of functions which are analytic on S . For any f ∈ M , denote where, for any positive integer k, π −π f (2 cos(θ)) cos(kθ) dθ.
Limiting theory of the test statistics under the alternative H 1 is based on a new CLT for linear statistics of A, provided in Li and Yao (2016), as follows.
Then, for any f 1 , · · · , f k ∈ M , the finite dimensional random vector (G n (f 1 ), · · · , G n (f k )) converges weakly to a Gaussian vector (Y (f 1 ), · · · , Y (f k )) with mean function The proofs of Theorem 3.2 and 3.3 about the power of the two test statistics are based on two lemmas derived from this CLT.
where Z, Σ p satisfies the assumptions in Theorem 3.1, then where Z, Σ p satisfies the assumptions in Theorem 3.1, then The proofs of these two lemma are postponed to Appendix A.

John's test
Suppose that an i.i.d. p−dimensional sample vectors X 1 , · · · , X n follow the multivariate distribution with covariance matrix Σ p . To explore the power of John's test under the alternative hypothesis H 1 : Σ p = σ 2 I p , we assume that the X ′ j s in X have representation The main result of the power of John's test is as follows.
Theorem 3.2. Assume X 1 , · · · , X n are i.i.d. p−dimensional sample vectors follow multivariate distribution with covariance matrix Σ p , X = Σ 1/2 ij ) = 1, E|z ij | 4 = ν 4 < ∞, Σ p is a sequence of p × p non-negative definite matrices with bounded spectral norm and the following limit exist, Note that the theorem above reveals the limit distribution of John's test statistic under alternative hypothesis H 1 . Nevertheless, if let Σ p = σ 2 I p , then γ = σ 2 , θ = ω = σ 4 , Theorem 3.2 reduces to Theorem 2.2, which states the null distribution of John's test statistic under H 0 . With the two limit distributions of John's test statistic under H 0 and H 1 , power of the test is derived as below.
Proposition 3.1. With the same assumptions as in Theorem 3.2, when p/n → ∞, n → ∞, n 3 /p = O(1), the power of John's test where α is the nominal test level, Z α , Φ(·) are the alpha upper quantile and cdf of standard normal distribution respectively.
The proof of Theorem 3.2 is based on Lemma 3.1.
Proof. Denote the eigenvalues of p × p matrix S n = 1 n XX ′ = 1 n ZΣ p Z ′ in descending order by Since p > n, S n has p − n zero eigenvalues and the remaining n non-zero eigenvalues l i are related with λ i as We have, for John's test statistic According to Lemma 3.1, when p/n → ∞, n → ∞, Then by the Delta Method, is the corresponding gradient vector.

Quasi-likelihood ratio test
Consider the Quasi-LRT statistic L n in (1.4) based on the eigenvalues of n−dimensional matrix 1 p X ′ X. Similarly with John's test statistic, it can be seen that, under the alternative hypothesis H 1 , the L n statistic can be represented as The main result of the power of the Quasi-LRT test is as follows.
Theorem 3.3. With the same assumptions as in Theorem 3.2, when p/n → ∞, n → ∞, n 3 /p = O(1), Note that the theorem above reveals the limit distribution of the Quasi-LRT statistic under alternative hypothesis H 1 . Nevertheless, if let Σ p = σ 2 I p , then γ = σ 2 , θ = ω = σ 4 , Theorem 3.3 reduces to Theorem 2.3, which states the null distribution of the Quasi-LRT test statistic under H 0 . Similarly, with the two limit distributions of QLRT statistic under H 0 and H 1 , power of the test is derived as below.
Proposition 3.2. With the same assumptions as in Theorem 3.2, when p/n → ∞, n → ∞, n 3 /p = O(1), the power of QLRT β QLRT (H 1 ) is where α is the nominal test level, Z α , Φ(·) are the alpha upper quantile and cdf of standard normal distribution respectively.
The proof of Theorem 3.3 is based on lemma 3.2.
Proof. Denote the eigenvalues of n × n matrix 1 p X ′ X = 1 p Z ′ Σ p Z in descending order by . These eigenvalues are related as We have, for the Quasi-LRT test statistic then the Quasi-LRT test statistic can be written as According to Lemma 3.2, when p/n → ∞, n → ∞, n 3 /p = O(1), n p is the corresponding gradient vector.
Then we have, for (u, v) = 0, The result thus follows.

Simulation Experiments
Empirical power of the two tests are shown in this section to testify the theoretical results presented in Proposition 3.1 and 3.2. Specifically, we consider two different scenarios to generate sample data: (2) {z ij , 1 ≤ i ≤ p, 1 ≤ j ≤ n} i.i.d follow Gamma(4, 2) − 2 distribution, then Ez ij = 0, Ez 2 ij = 1, Ez 4 ij = ν 4 = 4.5. X p×n = Σ 1/2 p Z p×n . To cover multiple alternative hypothesis, Σ p is configured as a diagonal matrix with elements 0.5 and 1. The proportion of "1" is δ. The nominal test level is set as α = 0.05. (p, n) = (2400, 64) and empirical power are generated from 5000 replications. Theoretical values are displayed for comparison. It can be seen from Table 3 that the empirical and theoretical power coincide with each other and both tests have very large power even when δ is small.

Discussions and Auxiliary Results
In summary, we found in the considered ultra-dimension (p ≫ n) situations, QLRT is the most recommended procedure regarding its maximal power for sphericity test. However, from the application perspective where the dimension p and n are explicitly known, it becomes very difficult to decide which asymptotic scheme to use, namely, " p fixed, n → ∞", "p/n → c ∈ (0, ∞), p, n → ∞", or "p/n → ∞, p, n → ∞" etc. Combining our study with the existing literature, we would like to recommend a dimension-proof procedure like John's test or Chen's test, with a slight preference for John's test as it has a slightly higher power and an easier implementation.
We conclude the paper by mentioning some surprising consequence of the main results of the paper as follows. Proof. Interchanging the role of n and p in Theorem 2.2, keeping all other assumptions unchanged, it can be seen that, when n/p → ∞, n, p → ∞, Henceforth, the dimension-proof property of John's test statistic, i.e. regardless of normality, under any (n, p)-asymptotic, n/p → [0, ∞], has been completely testified.

Consequently result follows.
Proof of Lemma 3.1: Proof. According to Theorem 3.1, define function f 1 (x) = x 2 , then where F A is the ESD of A = I n and F represents the semicircular law.