Invariant test based on the modified correction to the LRT for the equality of two high-dimensional covariance matrices

Abstract: In this paper, we propose an invariant test based on the modified correction to the likelihood ratio test (LRT) for the equality of two high-dimensional covariance matrices. It is well known that the classical log-LRT is not well defined when the dimension is larger than or equal to one of the sample sizes. Even when the log-LRT is well defined, it is usually perceived as a poor statistic in high-dimensional cases because of its low power under some alternatives. In this paper, we justify the usefulness of the modified log-LRT and propose an invariant test that works well in cases where the dimension is larger than the sample sizes. Moreover, the test is established under the weakest conditions on the dimensions and the moments of the samples. The asymptotic distribution of the proposed test statistic is obtained under the null hypothesis. We also propose a lite version of the modified LRT. A simulation study and a real data analysis show that the performances of the two proposed statistics are invariant under affine transformations.


Introduction
Since the assumption of homogeneity of covariance matrices is needed in many multivariate statistical analyses based on two populations, testing the equality of two covariance matrices is among the most active hypothesis testing problems. These tests date back to the work of [19], which was followed by a huge amount of literature. Suppose we have N := N_1 + N_2 observations {z_i^(l) ~ N(μ_l, Σ_l), i = 1, ..., N_l, l = 1, 2} and wish to test the hypothesis

H_0 : Σ_1 = Σ_2 versus H_1 : Σ_1 ≠ Σ_2, (1.1)

where μ_l and Σ_l are the population mean vectors and covariance matrices of the p-dimensional vectors z_i^(l), l = 1, 2, respectively. It is natural to first consider the likelihood ratio test (LRT) if it is "applicable". But when is the LRT applicable to testing the equality of two covariance matrices? The traditional viewpoint, based on the χ² approximation [20], is that the LRT is applicable if the sample sizes are both much larger than the dimension. However, [5] and [12] showed that when the dimensions are large but still smaller than the sample sizes, the traditional χ² approximation of the LRT fails. This problem encouraged us to investigate the conditions under which the high-dimensional LRT is applicable. Additionally, we consider a lite LRT proposed by [16, 17]. Currently, three general types of test procedures for the high-dimensional hypothesis (1.1) are widely discussed in the literature: (i) corrected classical LRTs, see [5, 11, 12]; (ii) nonparametric methods, see [13, 14, 18]; (iii) maximum element methods, see [7, 8]. The strengths and weaknesses of these three approaches are significant. Nonparametric methods and maximum element methods can address cases where the dimensions are much larger than the sample sizes, but they are strongly restricted by the structure or the eigenvalues of the population covariance matrices.
By contrast, corrected classical LRT requires the dimensions to be smaller than the sample sizes, but there is no assumption on the population covariance matrices. In addition, if the dimensions are fixed, LRT has been shown to be unbiased and uniformly most powerful among affine-transform-invariant tests. Therefore, we focus on the LRT.
In the following, we denote X_N1^(1) = (x_ij^(1))_{p×N1}, where the {x_ij^(1)} are independent and identically distributed (i.i.d.) random variables with mean zero and variance one. Similarly, X_N2^(2) = (x_ij^(2))_{p×N2}, which is independent of X_N1^(1), constitutes another i.i.d. sample with mean zero and variance one. For l = 1, 2, we assume that the N_l observations follow the linear transformation model

z_i^(l) = μ_l + Σ_l^{1/2} x_i^(l). (1.2)

We recall the famous Bartlett-corrected LRT statistic L proposed by [6]. Here, and in the following, we denote c_1 = n_1/(n_1 + n_2) and c_2 = n_2/(n_1 + n_2). Moreover, under the null hypothesis of (1.1) and the linear transformation model (1.2), it is not difficult to rewrite this statistic in terms of the sample covariance matrices S_l^x of the x_i^(l), l = 1, 2. Thus we know that L is independent of the population means μ_l and covariance matrices Σ_l under the null hypothesis H_0. In the following, we drop the superscripts and denote S_1 := S_1^x and S_2 := S_2^x for simplicity. In addition, by a simple calculation, we can rewrite L in terms of B_n = n_1 S_1 (n_1 S_1 + n_2 S_2)^{-1}.
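To fix ideas, the construction of B_n can be sketched numerically. This is a minimal illustration, assuming Gaussian data; the function name and the example dimensions are ours, not the paper's:

```python
import numpy as np

def beta_matrix(Z1, Z2):
    """Form B_n = n1*S1 (n1*S1 + n2*S2)^{-1} from two p x N_l samples,
    where n_l*S_l is the scatter matrix of the mean-centered sample."""
    A = []
    for Z in (Z1, Z2):
        Zc = Z - Z.mean(axis=1, keepdims=True)   # subtract the sample mean
        A.append(Zc @ Zc.T)                      # n_l * S_l
    A1, A2 = A
    # A1 (A1 + A2)^{-1} via a linear solve; the transpose trick uses the
    # symmetry of A1 and A1 + A2 to avoid forming an explicit inverse.
    return np.linalg.solve(A1 + A2, A1).T

rng = np.random.default_rng(0)
p, N1, N2 = 5, 40, 60
B = beta_matrix(rng.standard_normal((p, N1)), rng.standard_normal((p, N2)))
eigs = np.linalg.eigvals(B).real
```

When p is smaller than both sample sizes, as here, all eigenvalues of B_n fall strictly inside (0, 1), which is what keeps the determinants appearing in L well defined.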
We now analyze L. If p > n_1 or p > n_2, then L is undefined because: (1) if p ≥ n_1 + n_2, the matrix n_1 S_1 + n_2 S_2 is singular, so its inverse is undefined; and (2) if p < n_1 + n_2, the inverse of n_1 S_1 + n_2 S_2 is well defined almost surely (assuming finite fourth moments), but if p > n_1 or p > n_2, the determinant of B_n or of I_p − B_n is zero, which makes the logarithm undefined. However, from random matrix theory (RMT), we know that if the fourth moment of x_ij^(l) exists, the matrix B_n almost surely has p − n_1 eigenvalues equal to zero when p > n_1, and p − n_2 eigenvalues equal to one when p > n_2 (see [1]). Therefore, we can naturally redefine the LRT L by restricting it to the eigenvalues of B_n that are neither zero nor one, i.e., to {λ_i^{B_n} | λ_i^{B_n} ∈ (0, 1)} in (1.3), where λ_i^{B_n} denotes the i-th smallest eigenvalue of B_n. It is clear that L is invariant under the transformations z ↦ Cz, where C is nonsingular. Therefore, we only need to obtain the asymptotic distribution of the redefined LRT in (1.3), which is addressed in the next section. This paper also considers the test statistic L̃, which was proposed in [19] and discussed in [16, 17]. As L̃ is the first term of L, it can be viewed as a lite LRT. Similar to L, we redefine L̃ in (1.4). Unlike L, the statistic L̃ is obviously monotone in the matrix B_n. That means that under the alternative Σ_1 > Σ_2 or Σ_1 < Σ_2, all the eigenvalues {λ_i^{B_n} | λ_i^{B_n} ∈ (0, 1)} become larger or smaller, respectively, than under the null hypothesis, so L̃ becomes correspondingly larger or smaller. For L, however, this behavior is uncertain. Therefore, L̃ should be more powerful than L in these two cases.
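The redefinition can be made concrete with a short sketch. Since the displays (1.3) and (1.4) are not reproduced above, the exact weighting of the log terms below (c_1 log λ + c_2 log(1 − λ) for L, and the log λ term alone for the lite version L̃) is our assumption for illustration only:

```python
import numpy as np

def restricted_eigs(B, tol=1e-8):
    """Eigenvalues of B_n lying strictly in (0, 1); the p - n1 eigenvalues
    at 0 and the p - n2 eigenvalues at 1 are discarded, as in (1.3)."""
    lam = np.linalg.eigvals(B).real
    return lam[(lam > tol) & (lam < 1.0 - tol)]

def modified_lrt(B, c1, c2):
    # Assumed kernel: sum of c1*log(lam) + c2*log(1 - lam) over the retained
    # eigenvalues (the true display (1.3) may differ by centering/scaling).
    lam = restricted_eigs(B)
    return float(np.sum(c1 * np.log(lam) + c2 * np.log(1.0 - lam)))

def lite_lrt(B):
    # The lite statistic keeps only the first (log-determinant) term.
    lam = restricted_eigs(B)
    return float(np.sum(np.log(lam)))

rng = np.random.default_rng(1)
p, n1, n2 = 30, 20, 50                 # p > n1: the classical L is undefined
X1 = rng.standard_normal((p, n1))
X2 = rng.standard_normal((p, n2))
A1, A2 = X1 @ X1.T, X2 @ X2.T          # scatter matrices n_l * S_l
B = np.linalg.solve(A1 + A2, A1).T     # B_n = A1 (A1 + A2)^{-1}
lam = restricted_eigs(B)
```

In this example p − n_1 = 10 eigenvalues of B_n are numerically zero and are dropped, so both statistics stay finite even though the classical determinant-based L does not exist.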
The rest of this paper is organized as follows. The main results are presented in Section 2, including the asymptotic normality of L and L̃ and the necessary conditions, which show that the modification of the LRT cannot be improved further in our model settings. In Section 3, we present simulation results for the proposed statistics in comparison with the tests proposed by [14] and [7]. In Section 5, we present a real data application to demonstrate the use of the proposed tests. All technical details are relegated to the appendix. We note that for the high-dimensional testing problem (1.1), the exact distribution of the test statistic is difficult to obtain when the distributions are arbitrary.

Main results
In this section, we give the asymptotic distributions of the redefined LRTs in (1.3) and (1.4). For applications, we also present consistent estimators for the fourth moments of the samples under the null hypothesis. Before presenting the main results, we give some notation and the necessary assumptions. In the following, we denote the indicator function by δ(·), the natural logarithm by log(·), and convergence in distribution by →_D. We set two assumptions on the samples that will be shown to be necessary for the proposed test statistics.
We are now ready to present the main results of this paper.

Theorem 2.1. In addition to the Moments Assumption and the Dimensions Assumption, we assume that, as min{p, n_1, n_2} → ∞, lim y_1 ∉ {0, 1}, lim y_2 ∉ {0, 1} and lim p/(n_1 + n_2) ∈ (0, 1). Then, under the null hypothesis, (2.1) and (2.2) hold.

According to the above theorem, we can easily obtain the following corollary under normality.

Corollary 2.2. If the samples z_j^(l), j = 1, ..., n_l, l = 1, 2, are normally distributed, then (2.1) in Theorem 2.1 reduces to the case Δ_1 = Δ_2 = 0.

Remark 2.3. If the Dimensions Assumption in Theorem 2.1 is satisfied, the limit conditions lim y_1 ∉ {0, 1}, lim y_2 ∉ {0, 1} and lim p/(n_1 + n_2) ∈ (0, 1) can be taken to hold, because no information about the convergence of the dimensions and sample sizes is available for any given dataset. In addition, [12] showed that if p/n_l → 1, Theorem 2.1 still holds for normally distributed data. However, if y_l is near 1, then the variance ν_n becomes large and the LRT becomes unstable; see Figure 1 for an illustration.

Remark 2.4. When y_1 < 1 and y_2 < 1, Corollary 2.2 recovers Theorem 4.1 in [5] directly.
The necessity of the Dimensions Assumption is clear from the definitions of the limit part, the mean μ_n and the variance ν_n. For the Moments Assumption, we only need to consider the fourth moments of the sample. From the variance ν_n in Theorem 2.1, we know that its fourth-moment term cannot be removed unless y_1 < 1 and y_2 < 1. However, if y_1 < 1 and y_2 < 1, it is not difficult to obtain that Ψ(y_2, y_1) = Ψ(y_1, y_2) = −y_1³y_2³, which implies that the fourth-moment term of μ_n cannot be removed. Therefore, we conclude that the existence of the fourth moment of the sample is necessary for the modified LRT statistic L. However, when y_1 or y_2 is close to 1, the variance ν_n increases rapidly, resulting in poor power. For illustration, we present two 3D figures of μ_n and ν_n² with Δ_1 = Δ_2 = 0 in Figure 1. We now give the asymptotic distribution of L̃.
Theorem 2.5. In addition to the Moments Assumption and the Dimensions Assumption, we assume that, as min{p, n_1, n_2} → ∞, lim y_1 ≠ 1, lim y_2 ≠ 1 and lim p/(n_1 + n_2) < 1. Then, under the null hypothesis, we have the asymptotic normality of L̃; see also [22]. However, if p is larger than both of the two sample sizes, then, because of the lack of theoretical results on the general Beta matrix n_1 S_1 (n_1 S_1 + n_2 A^{1/2} S_2 A^{1/2})^{-1}, the asymptotic distributions of the statistics L and L̃ remain open problems and are left for future work. Here A is an arbitrary non-random symmetric matrix.
If Δ_l ≠ 0, or more specifically, if the samples are not normally distributed, estimates of Δ_l are necessary to apply the test. Thus, we obtain consistent estimators using the method of moments and random matrix theory. The leave-one-out construction of the estimator Δ̂_1 in (2.2) is used to avoid the case where the sample sizes are not larger than the dimension, which gives rise to the singularity of the sample covariance matrix. Otherwise, if the dimension p is smaller than some sample size, say p < n_1, we can estimate Δ_1 by replacing the corresponding terms accordingly. The estimator Δ̂_2 in (2.3) can be treated analogously.
The proofs of Theorems 2.1, 2.5 and 2.7 are given in the appendix.

Results of the simulation
In this section, we compare the performance of the statistics proposed in [14] and [7] with our modified LRTs L and L̃ under various settings of sample size and dimensionality. The classical LRT statistic in [20] was shown by [5] to have poor performance for (1.1); thus, it is not considered in this section. Without loss of generality, we assume μ_l = 0 and set Σ_1 as a perturbation of Σ_2 governed by a constant a. The samples are drawn from the following distributions:

• Case 1: x^(1) and x^(2) are both standard normal and Σ_2 = I_p;
• Case 2: x^(1) and x^(2) are from the uniform distribution U(−√3, √3) and Σ_2 = I_p;
• Case 3: x^(1) and x^(2) are from the uniform distribution U(−√3, √3) and Σ_2 = Diag(p², 1, ..., 1);
• Case 4: x^(1) and x^(2) are from the uniform distribution U(−√3, √3). Here, 1_p represents a p-dimensional vector with all entries 1.

The results are based on 10,000 replicates. In the tables, T and T̃ denote the proposed modified LRTs, T_lc denotes the nonparametric test of [14] and T_clx denotes the maximum element test of [7].
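The sampling schemes above can be sketched as follows. This is an illustration under the null (Σ_1 = Σ_2); the display defining Σ_1 through the constant a is not reproduced in the text, so only Σ_2 is generated here, and the helper name is ours:

```python
import numpy as np

def draw_sample(p, N, dist, sigma_diag, rng):
    """Draw a p x N sample z_i = Sigma^{1/2} x_i with i.i.d. standardized
    entries x_ij; Sigma is diagonal here, which covers Cases 1-3 above."""
    if dist == "normal":                              # Case 1
        X = rng.standard_normal((p, N))
    else:                                             # Cases 2-4: U(-sqrt(3), sqrt(3))
        X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(p, N))
    return np.sqrt(sigma_diag)[:, None] * X

rng = np.random.default_rng(2)
p = 50
sigma_case3 = np.r_[p**2, np.ones(p - 1)]             # Sigma_2 = Diag(p^2, 1, ..., 1)
Z = draw_sample(p, 80, "uniform", sigma_case3, rng)
```

Both distributions are standardized to mean 0 and variance 1, matching the model assumption on the x_ij.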
In the first part of this section, we report the results obtained by assuming that the fourth moments of x^(1) and x^(2) are known. Tables 1-4 present the empirical sizes and empirical powers for Cases 1-4. Additionally, we provide four figures (Figures 2-5) to show how the powers of the four test statistics diverge as the parameter a increases. The results indicate that the sizes of T and T̃ perform quite well in all cases. However, in Cases 3 and 4, the sizes of T_lc and T_clx are not accurate, which reflects the fact that the null distributions of T_lc and T_clx are not well approximated by their asymptotic distributions in these cases. For the empirical power, we can conclude that T and T̃ are more sensitive than T_lc and T_clx when at least one of the sample sizes is smaller than the dimension. Otherwise, the LRT T does not perform as well when the dimensions are large. However, the lite LRT T̃ is always powerful because of its monotonicity in the matrix B_n, which coincides with our intuition.

Next, we show the performance of the estimator proposed in Theorem 2.7. We should explain why we did not use the estimators in the previous simulation results: the estimator is based on the method of moments and requires n = n_1 + n_2 loops, each involving a matrix inversion, for a single replication, and we need 10,000 replications for one result, which makes the running times excessive. In the simulation, we let x_ij be standard normal and uniform on (−√3, √3), respectively, to estimate Δ_1. Under each circumstance, we repeat the experiment 10,000 times; the results are reported in Tables 5 and 6. From the numerical results, the performance of the estimator is remarkable, especially when the sample size is large. Therefore, we believe that the proposed modified LRTs will also perform well under the null hypothesis when the estimators are used instead of the true values.
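The empirical sizes in Tables 1-4 are obtained by Monte Carlo. A generic sketch of the procedure follows; the critical value comes from the asymptotic normal null law of Theorem 2.1 after centering by μ_n and scaling by ν_n, whose closed forms are not reproduced here, so the standardized statistic passed in below is a placeholder assumption:

```python
import numpy as np

def empirical_size(std_stat, n_rep, seed=0):
    """Estimate the size of a two-sided 5% test that rejects when the
    standardized statistic exceeds the normal critical value."""
    rng = np.random.default_rng(seed)
    z_crit = 1.959963984540054          # normal quantile at 0.975
    rejections = sum(abs(std_stat(rng)) > z_crit for _ in range(n_rep))
    return rejections / n_rep

# Placeholder null statistic: under H0 the standardized modified LRT is
# asymptotically N(0, 1) (Theorem 2.1), mimicked here by a normal draw
# purely to exercise the Monte Carlo harness.
size = empirical_size(lambda rng: rng.standard_normal(), n_rep=10_000)
```

With 10,000 replications the Monte Carlo standard error of a 5% empirical size is about 0.002, which is why sizes in the tables scatter narrowly around the nominal level.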
However, under the alternative, we cannot provide any supporting evidence because of the lack of theoretical results on the general Beta matrix.

Real data analysis

A common interest is to test whether the covariance matrices of the logarithmic daily returns of a set of stocks are the same over a period of time. Logarithmic daily returns are commonly used in finance; among their theoretical and practical advantages is that the sequences of logarithmic daily returns can be assumed to be independent of each other, see [10]. According to the Global Industry Classification Standard (GICS), an industry taxonomy developed in 1999 by MSCI and S&P for use by the global financial community, we select two sectors, Consumer Discretionary and Materials, which include 71 and 22 stocks, respectively. First, for the Consumer Discretionary sector, we test whether the covariance matrices of the logarithmic daily returns of the second quarters (1 April-30 June) of 2012 and 2013 are the same. There were 63 and 64 trading days, respectively, in the second quarters of 2012 and 2013. By applying the logarithmic difference transformation to the original data, we obtain the logarithmic daily returns dataset with sample sizes n_1 = 61 and n_2 = 62. Simultaneously, we use the first-quarter (1 January-31 March) data of 2012 and 2013 for the Materials sector, which makes the sample sizes n_1 = 60 and n_2 = 58. The p-values obtained by applying the four test statistics T, T̃, T_lc and T_clx are shown in the first and third columns of Table 7. Next, we use the same procedure to test whether the covariance matrices of the logarithmic daily returns of the first quarter (1 January-31 March) and the second quarter (1 April-30 June) of 2012 are the same. The results are shown in the second and fourth columns of Table 7. All the p-values are much smaller than 0.05; thus, there is strong evidence that the two covariance matrices are different.
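The logarithmic-difference transformation used above is standard; a minimal sketch with made-up prices (the actual stock data are not reproduced here):

```python
import numpy as np

def log_daily_returns(prices):
    """Column t of the output is log(P_{t+1}) - log(P_t): the logarithmic
    daily return of each stock (rows) on day t."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=1)

# Illustrative closing prices for 2 stocks over 4 trading days.
P = np.array([[100.0, 101.0, 99.0, 102.0],
              [50.0, 50.5, 51.0, 50.0]])
R = log_daily_returns(P)
```

Note that a quarter with d trading days yields d − 1 daily returns, which is why the sample sizes above are smaller than the trading-day counts.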
Therefore, we suggest caution with the assumption that the daily returns are identically distributed.

Conclusion and discussion
In this paper, we propose two modified LRTs for the equality of two high-dimensional covariance matrices and derive their asymptotic distributions under the Moments Assumption and the null hypothesis. Furthermore, we show that the Moments Assumption and the Dimensions Assumption are necessary for our results. We also present weakly consistent and asymptotically unbiased estimators of Δ_l for non-Gaussian distributions under the null hypothesis. According to the simulation results, the performances of these modified LRTs are remarkable under the conditions where they are applicable, but theoretical results under the alternative hypothesis are not considered in this paper because of the lack of corresponding random matrix theory. The modification of the LRT under the alternative hypothesis will be presented in future work.

Appendix A: Technical details
In the appendix, we give the proofs of Theorems 2.1, 2.5 and 2.7. By comparing the definitions of L and L̃, it is easy to verify that the proof of Theorem 2.1 can be split into two parts, one of which is the same as the proof of Theorem 2.5, while the other is an analogous analysis. Hence, we first present the proof of Theorem 2.5 and then sketch the analogous part.

A.1. Proof of Theorem 2.5
The main tools used to prove the theorems are the Cauchy integral formula and Theorem 1.6 in [1], which established the central limit theorem for linear spectral statistics of the random matrix B_n and is restated below for convenience. Denote α_n = n_2/n_1 and G_n(x) = p(F^{B_n}(x) − F^{y_1,y_2}(x)), and let F^{y_1,y_2}, supported on [x_l, x_r] with x_l, x_r = y_2(h ∓ y_1)²/(y_1 + y_2)², be the limiting spectral distribution of B_n with parameters α_n, y_1, y_2.

Lemma A.1 (Theorem 1.6 in [1]). In addition to the Moments Assumption and the Dimensions Assumption, we further assume that: (1) as min{p, n_1, n_2} → ∞, y_1 → γ_1 ∈ (0, 1) ∪ (1, ∞), y_2 → γ_2 ∈ (0, 1) ∪ (1, ∞), and α_n → α > 0; and (2) the remaining regularity conditions of Theorem 1.6 in [1] hold. Then G_n converges weakly to a Gaussian process with the mean and covariance functions given there. All the contour integrals involved can be evaluated on any contour enclosing the interval [αc_l/(1 − c_l), αc_r/(1 − c_r)].

Note that Lemma A.1 is proved for centralized sample covariance matrices, i.e., those constructed without subtracting the sample mean vector from each sample vector. However, [21] showed that the only difference between the two types of sample covariance matrices is the normalization by p/n_l versus p/N_l. It is not difficult to verify that Theorem 2.5 falls within the scope of Lemma A.1 by choosing the kernel function to be the logarithmic function. Therefore, the main task in proving Theorem 2.5 is to calculate the three integrals, which we do step by step below.
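As a numerical sanity check on the limiting support in Lemma A.1, the eigenvalues of B_n should concentrate on [x_l, x_r] with x_l, x_r = y_2(h ∓ y_1)²/(y_1 + y_2)². The definition h = sqrt(y_1 + y_2 − y_1 y_2), standard for Beta and Fisher matrices, is our assumption, since the display defining h is not reproduced above:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n1, n2 = 200, 400, 500
y1, y2 = p / n1, p / n2                     # y1 = 0.5, y2 = 0.4
h = np.sqrt(y1 + y2 - y1 * y2)              # assumed definition of h
x_l = y2 * (h - y1) ** 2 / (y1 + y2) ** 2   # left edge of the support
x_r = y2 * (h + y1) ** 2 / (y1 + y2) ** 2   # right edge of the support

# Centralized (mean-not-subtracted) scatter matrices, as in Lemma A.1.
X1 = rng.standard_normal((p, n1))
X2 = rng.standard_normal((p, n2))
A1, A2 = X1 @ X1.T, X2 @ X2.T
B = np.linalg.solve(A1 + A2, A1).T          # B_n = A1 (A1 + A2)^{-1}
lam = np.sort(np.linalg.eigvals(B).real)
```

With p = 200 the extreme eigenvalues typically sit within a few hundredths of the limiting edges, so the check is quite sharp already at moderate dimensions.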
Proof of the limit part. When calculating the integral for the limit part, we use the transformation

x = y_2|y_1 + hξ|²/(y_1 + y_2)².
Clearly, as x moves from y_2(h − y_1)²/(y_1 + y_2)² to y_2(h + y_1)²/(y_1 + y_2)² twice, ξ traverses the unit circle in the positive direction. We then obtain that the integral (A.1) equals (A.2) when y_1 > 1 and (A.3) when y_1 < 1. The two forms of the integral ensure that the logarithmic function yields a finite number of poles of the integrand related to y_1. The pole of the integrand related to y_2 is h/y_2 when y_2 > 1. The value of the integral is unchanged by the substitution ξ → 1/ξ, except that the residue point inside the unit disc becomes y_2/h, which is the residue point under the assumption y_2 < 1. Thus, we assume y_2 > 1 without loss of generality. Then, if y_1 > 1, (A.2) can be rewritten accordingly; similarly, if y_1 < 1, (A.3) can be rewritten. According to Cauchy's residue theorem, under the assumptions y_1 > 1, y_2 > 1 there are three poles {0, −h/y_1, h/y_2}, with residues including (p(y_1 + y_2)/(2y_1y_2)) log(y_1²y_2/(y_1 + y_2)²). In the same way, under the assumptions y_1 < 1, y_2 > 1, we obtain three poles {0, −y_1/h, h/y_2} and residues including (p(y_1 + y_2)/(2y_1y_2)) log(h²y_2/(y_1 + y_2)²). Therefore, by combining the above results with basic calculations, we obtain the limit part (A.1), which completes the proof.
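The change of variable used in this proof can be checked numerically: as ξ traverses the unit circle, x = y_2|y_1 + hξ|²/(y_1 + y_2)² sweeps the support [x_l, x_r] twice, hitting the right edge at ξ = 1 and the left edge at ξ = −1. A sketch, with h = sqrt(y_1 + y_2 − y_1 y_2) again assumed:

```python
import numpy as np

y1, y2 = 0.5, 0.4
h = np.sqrt(y1 + y2 - y1 * y2)              # assumed definition of h
x_l = y2 * (h - y1) ** 2 / (y1 + y2) ** 2
x_r = y2 * (h + y1) ** 2 / (y1 + y2) ** 2

theta = np.linspace(0.0, 2.0 * np.pi, 20001)
xi = np.exp(1j * theta)                     # the unit circle
x = y2 * np.abs(y1 + h * xi) ** 2 / (y1 + y2) ** 2
```

The symmetry x(θ) = x(2π − θ) is what makes x traverse [x_l, x_r] exactly twice per loop of ξ.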
Proof of the mean part μ̃_n. Because m_3 satisfies the stated equation, we make an integral conversion through z as a function of 1 + hrξ, where r is a number greater than, but close to, 1. According to the discussion in the previous part, we assume y_2 > 1 without loss of generality. When z runs in the positive direction around the contour enclosing the interval [αc_l/(1 − c_l), αc_r/(1 − c_r)], m_3 runs in the opposite direction; thus, when y_2 > 1, we choose the corresponding root. Therefore, for y_1 > 1, we obtain the mean part μ̃_n; if y_1 < 1, we only need to change the logarithmic term in that expression, with the other terms remaining the same. For y_1 > 1, there are four poles and four corresponding residues. Thus, by Cauchy's residue theorem and by combining the above results, we obtain the mean part μ̃_n. This completes the proof.
Proof of the variance part. To calculate the variance part ν̃_n, we make an analogous integral conversion, which yields the relationship between ξ_l and m_3(z_l), l = 1, 2. Without loss of generality, we assume r_1 < r_2. When y_1 > 1 and y_2 > 1, there is only one pole, r_1/(r_2ξ_2), in the unit disc of the integrand with respect to ξ_1. Thus, by Cauchy's residue theorem, (A.7) reduces to an integral with three poles 0, −h/r_2 and −h/(r_2y_2). Evaluating (A.7) in the cases y_1 > 1, y_2 > 1 and y_1 < 1, y_2 > 1, we obtain the variance part ν̃_n. This completes the proof.

A.2. Proof of Theorem 2.1
In this part, we give the proof of Theorem 2.1.
Proof. First, calculate the limit part n , Make the transformation x = y 2 |y 1 + hξ| 2 (y 1 + y 2 ) 2 , correspondingly, To ensure the logarithmic functions are not infinity, we change the transformation into following tow forms when y 1 > 1, and when y 1 < 1.
Because of that the log(1 − x) part can only be affect by y 2 , with same discussion, we get when y 2 > 1, and when y 2 < 1. Thus, dξ.
Then we complete the proof.

A.3. Proof of Theorem 2.7
We now prove that Δ̂_1 is a weakly consistent and asymptotically unbiased estimator of Δ_1.
Proof. From the definition of Δ_1 in (2.6) and Chebyshev's inequality, we only need to prove two results, (A.17) and (A.18), under the same assumptions as in Theorem 2.1 and the null hypothesis. We first consider (A.17). Without loss of generality, we assume that the mean of z_j^(1) is zero. Using the truncation steps in [2], we may truncate and renormalize the random variables so that |x_ij^(1)| ≤ η_n√n, where η_n → 0 slowly. In addition, in the following we work with the sample covariance matrix without subtracting the sample mean, because the difference is only a rank-one matrix z̄_1z̄_1', which does not affect the results. From the proof of Theorem 1 in [4], one can conclude that under the conditions of that theorem, there exists a constant M > 0 such that the spectral norm bound holds for any k > 0, where ‖·‖ denotes the spectral norm of a matrix. Thus, in the sequel, we may assume that the smallest eigenvalue of c_11 S_1j^x + c_12 S_2^x is bounded away from zero uniformly. Then we can expand E[(z_j^(1))'(c_11 S_1j^z + c_12 S_2^z)^{-1} z_j^(1)] up to an O(p) term.
Here we use the fact that E[(z^(1))'(c_11 S_1j^z + c_12 S_2^z)^{-1} z^(1)] = O(1), together with the corresponding second-moment bound; these results can be found in [15]. By the independence of z_j^(1) and S_1j^z, and under the null hypothesis, we obtain (A.17), which completes its proof. For (A.18), by the same argument as above, we bound E(Δ̂_1 − Δ_1)². Now, by (2.1) in [2], and using the fact that S_1ji^x, the sample covariance matrix obtained by removing the vectors x_j^(1) and x_i^(1), is independent of these two vectors, we obtain a bound which, together with (A.22), implies (A.18). This completes the proof of Theorem 2.7.