Test of independence for high-dimensional random vectors based on freeness in block correlation matrices

In this paper, we are concerned with the independence test for k high-dimensional sub-vectors of a normal vector, with fixed positive integer k. A natural high-dimensional extension of the classical sample correlation matrix, namely block correlation matrix, is proposed for this purpose. We then construct the so-called Schott type statistic as our test statistic, which turns out to be a particular linear spectral statistic of the block correlation matrix. Interestingly, the limiting behavior of the Schott type statistic can be figured out with the aid of the Free Probability Theory and the Random Matrix Theory. Specifically, we will bring the so-called real second order freeness for Haar distributed orthogonal matrices, derived in Mingo and Popa (2013)[10], into the framework of this high-dimensional testing ∗Corresponding author. †Z.G. Bao was supported by a startup fund from HKUST. ‡J. Hu was partially supported by Science and Technology Development Foundation of Jilin (Grant No. 20160520174JH), Science and Technology Foundation of Jilin during the “13th Five-Year Plan” and the National Natural Science Foundation of China (Grant No. 11301063). §G.M. Pan was partially supported by a MOE Tier 2 grant 2014-T2-2-060 and by a MOE Tier 1 Grant RG25/14 at the Nanyang Technological University, Singapore. ¶W. Zhou was partially supported by R-155-000-165-112 at the National University of Singapore. 1527


Introduction
Test of independence for random variables is a very classical hypothesis testing problem, which dates back to the seminal work by Pearson in [17], followed by a huge literature regarding this topic and its variants. One frequently recurring variant is the test of independence for k random vectors, where k ≥ 2 is an integer. Comprehensive overview and detailed references on this problem can be found in most of the textbooks on multivariate statistical analysis. For instance, here we recommend the masterpieces by Muirhead in [13] and by Anderson in [1] for more details, in the low-dimensional case. However, due to the increasing demand in the analysis of big data springing up in various fields nowadays, such as genomics, signal processing, microarray, proteomics and finance, the investigation on a high-dimensional extension of this testing problem is much needed, which motivates us to propose a feasible way for it in this work.
Let us take a review more specifically on some representative existing results in the literature, after necessary notation is introduced. For simplicity, henceforth, we will use the notation JmK to denote the set {1, 2, . . . , m} for any positive integer m. Assume that x = (x 1 , . . . , x k ) is a p-dimensional normal vector, in which x i possesses dimension p i for i ∈ JkK, such that p 1 + · · · + p k = p. Denote by μ i the mean vector of the ith sub-vector x i and by Σ ij the cross covariance matrix of x i and x j for all i, j ∈ JkK. Then μ := (μ 1 , . . . , μ k ) and Σ := (Σ ij ) k i,j=1 are the mean vector and covariance matrix of x respectively. In this work, we consider the following hypothesis testing To this end, we draw n observations of x, namely x(1), . . . , x(n). In addition, the ith sub-vector of x(j) will be denoted by x i (j) for all i ∈ JkK and j ∈ JnK.
Hence, x i (j), j ∈ JnK are n independent observations of x i . Conventionally, the corresponding sample means will be written asx = n −1 n j=1 x(j) and x i = n −1 n j=1 x i (j). With the observations at hand, we can construct the sample covariance matrix as usual. Set

1529
The sample covariance matrix of x and the cross sample covariance matrix of x m and x will be denoted byΣ andΣ m respectively, to wit, In the classical large n and fixed p case, the likelihood ratio statistic Λ n := W n/2 n with W n := |Σ| is a favorable one for the testing problem (T1). A celebrated limiting law on One can refer to [27] or Theorem 11.2.5 of [13], for instance. The left hand side of (1.2) is known as the Wilks statistic. Now, we turn to the high-dimensional setting of interest. A commonly used assumption on dimensionality and sample size in the Random Matrix Theory (RMT) is that p is proportionally large as n, i.e. p := p(n), p n → y ∈ (0, ∞), as n → ∞.
To employ the existing RMT apparatus in the sequel, hereafter, we will always work with the above large n and proportionally large p setting. This time, resorting to the likelihood ratio statistic in (1.2) directly is obviously infeasible, since the limiting law (1.2) is invalid when p tends to infinity along with n. Actually, under this setting, the likelihood ratio statistic can still be employed if an appropriate renormalization is performed a priori. In [8], the authors renormalized the likelihood ratio statistic, and derived its limiting law under H 0 as a central limit theorem (CLT), under the restriction of One can refer to Theorem 2 in [8] for the details of the normalization and the CLT. One similar result was also obtained in [7], see Theorem 4.1 therein. However, the condition (1.3) is indispensable for the likelihood ratio statistic which will inevitably hit the wall when p is even larger than n, owing to the fact that log |Σ| is not well-defined in this situation. In addition, in [7], another test statistic constructed from the traces of F-matrices, the so-called trace criterion test, was proposed. Under H 0 , a CLT was derived for this statistic under the following restrictions for all i ∈ JkK, together with p − p 1 < n. We stress here, condition (1.4) is obviously much stronger than p i /n → y i ∈ (0, 1) for all i ∈ JkK.
Roughly speaking, our aim in this paper is to propose a new statistic with both statistical visualizability and mathematical tractability, whose limiting behavior can be derived with the following restriction on the dimensionality and the sample size Especially, n is not required to be larger than p or any partial sum of p i . More precisely, our multi-facet target consists of the following: • Introducing a new matrix model tailored to (T1), namely block correlation matrix , which can be viewed as a natural high-dimensional extension of the sample correlation matrix. • Constructing the so-called Schott type statistic from the block correlation matrix, which can be regarded as an extension of Schott's statistic for complete independence test in [21]. • Deriving the limiting distribution of the Schott type statistic with the aid of tools from the Free Probability Theory (FPT). Specifically, we will channel the so-called real second order freeness for Haar distributed orthogonal matrices from [10] into the framework. • Employing this limiting law to test independence of k sub-vectors under (1.5) and assessing the statistic via simulations and a real data set, which comes from the daily returns of 258 stocks issued by the companies from S&P 500.
It will be seen that for (T1), it is quite natural and reasonable to put forward the concept of block correlation matrix. Just like that the sample correlation matrix is designed to bypass the unknown mean values and variances of entries, the block correlation matrix can also be employed, without knowing the population mean vectors μ i and covariance matrices Σ ii , i ∈ JkK. Also interestingly, it turns out that, the statistical meaning of the Schott type statistic is rooted in the idea of testing independence based on the Canonical Correlation Analysis (CCA). Meanwhile, theoretically, the Schott type statistic is a special type of linear spectral statistic from the perspective of RMT. Then methodologies from RMT and FPT can take over the derivation of the limiting behavior of the proposed statistic. As far as we know, the application of FPT in highdimensional statistical inference is still in its infancy. We also hope this work can evoke more applications of FPT in statistical inference in the future. One may refer to [19] for another application of FPT, in the context of statistics.
Our paper is organized as follows. In Section 2, we will construct the block correlation matrix. Then we will present the definition of the Schott type statistic and its limiting law in Section 3. Meanwhile, we will discuss the statistical rationality of this test statistic, especially the relationship with CCA. In Section 4, we will detect the utility of our statistic by simulation, and an example about stock prices will be analyzed in Section 5. Finally, Section 6 will be devoted to the illustration of how RMT and FPT machinery can help to establish the limiting behavior of our statistic, and some calculations will be presented in the Appendix.
Throughout the paper, trA represents the trace of a square matrix A. If A is N × N , we will use λ 1 (A), . . . , λ N (A) to denote its N eigenvalues. Moreover, |A| means the determinant of A. In addition, 0 M ×N will be used to denote the M × N null matrix, and be abbreviated to 0 when there is no confusion on the dimension. Moreover, for random objects ξ and η, we use ξ d = η to represent that ξ and η share the same distribution.

Block correlation matrix
An elementary property of the sample correlation matrix is that it is invariant under translation and scaling of variables. Such an advantage allows us to discuss the limiting behavior of statistics constructed from the sample correlation matrix without knowing the means and variances of the involved variables, thus makes it a favorable choice in dealing with the real data. Now we are in a similar situation, without knowing explicit information of the population mean vectors μ i and covariance matrices Σ ii , we want to construct a test statistic which is independent of these unknown parameters, in a similar vein. The first step is to propose a high-dimensional extension of sample correlation matrix, namely the block correlation matrix. For simplicity, we use the notation to denote the diagonal block matrix with blocks A i , i ∈ JkK, i.e. all off-diagonal blocks are 0. Definition 2.1 ((Block correlation matrix)). With the aid of the notation in (1.1), the block correlation matrix B := B(X 1 , · · · , X k ) is defined as follows . Remark 2.1. Note that when p i = 1 for all i ∈ JkK, B is reduced to the classical sample correlation matrix of n observations of a k-dimensional random vector.
In this sense, we can regard B as a natural high-dimensional extension of the sample correlation matrix.

Remark 2.2.
If we take determinant of B, we can get the likelihood ratio statistic. However, since one needs to further take logarithm on the determinant, the assumption n > p + 2 is indispensable, in light of [8].
Apparently, we have ZZ = XX and Z i Z i = X i X i . Consequently, we can also write An advantage of Z i over X i is that its columns are i.i.d.

Schott type statistic and main result
With the block correlation matrix at hand, we can propose our test statistic for (T1), namely the Schott type statistic. Such a nomenclature is motivated by Schott in [21] on another classical independence test problem, the so-called complete independence test , which can be described as follows. Given a random vector w = (w 1 , . . . , w p ) , we consider the following hypothesis testing: When w is multivariate normal, the above test problem is equivalent to the so-called test of sphericity of population correlation matrix, i.e. under the null hypothesis, the population correlation matrix of w is I p . There is a long list of references devoted to testing complete independence for a random vector under the high-dimensional setting. See, for example, [14], [15], [23], [4], [2] and [21]. Especially, in [21], the author constructed a statistic from the sample correlation matrix. To be specific, denote the sample correlation matrix of n i.i.d. observations of w by R = R(n, p) := (r ij ) p×p . Schott's statistic for (T2) is then defined as follows Note that under the null hypothesis, all off-diagonal entries of the population correlation matrix should be 0. Hence, it is quite natural to use the summation of squares of all off-diagonal entries to measure the difference between the population correlation matrix and I. Then Schott's statistic s(R) is just the sample counterpart of such a measurement. Now in a similar vein, we define the Schott type statistic for the block correlation matrix as follows.

Definition 3.1 (Schott type statistic). We define the Schott type statistic of the block correlation matrix B by
For simplicity, we introduce the matrix Stemming from the definition of B, we can easily get The matrix C(i, j) is frequently used in CCA towards random vectors x i and x j , known as the canonical correlation matrix. A well-known fact is that the eigenvalues of C(i, j) provide meaningful measures of the correlation between these two random vectors. Actually, its singular values (square root of eigenvalues) are the well-known canonical correlations between x i and x j . Thus, the summation of all eigenvalues, trC(i, j), also known as Pillai's test statistic, can obviously measure the correlation between x i and x j . Summing up these measurements over all (i, j) pairs (subject to j < i) yields our Schott type statistic, which can capture the overall correlation among k parts of x. Thus, from the perspective of CCA, the Schott type statistic possesses a theoretical validity for the testing problem (T1). Our main result is the following CLT under H 0 .
Remark 3.1. Note that by assumption, a n is of order O(n) and b n is of order O(1).

Remark 3.2.
For k = 2, this theorem coincides with Theorem 2.2 of [28], which does not need the normality assumption. We believe that Theorem 3.1 should be also true under the same conditions as those in Theorem 2.2 of [28].

Numerical studies
In this section we compare the performance of the statistics proposed in [8], [7] and our Schott type statistic, under various settings of sample size and dimensionality. For simplicity, we will focus on the case of k = 3. The Wilks' statistic in (1.2) without any renormalization has been shown to perform badly for (T1) in [8] and [7], thus will not be considered in this section. Under the null hypothesis H 0 , due to the scale invariance of the three tests, without loss of generality, the samples are drawn from the following distribution: Under the alternative hypothesis H 1 , we adopt five distributions, including the two used in [8](II) and [7](III) respectively: Here 1 p×p stands for the matrix whose entries are all equal to 1, I ij stands for the p i × p j rectangular matrix whose main diagonal entries are equal to 1 and the others are equal to 0, and 1 23 stands for the p 2 × p 3 rectangular matrix whose first entrie is equal to 1 and the others are equal to 0. The empirical sizes and powers are obtained based on 100,000 replications with 3 digits. In the tables, T 0 denotes the proposed Schott type test, T 1 denotes the renormalized likelihood ratio test of [8] and T 2 denotes the trace criterion test of [7]. The empirical sizes and powers of three tests are presented in Table 1 and Table 2. It shows that the proposed Schott type test T 0 performs quite robust Table 1 Empirical sizes (Scenario I) and empirical powers (Scenario II and III) of tests T 0 , T 1 and T 2 at the 5% significance level. Scenario III  T0  T1  T2  T0  T1  T2  T0  T1  T2  (2, 2 Table 2 Empirical powers of tests T 0 , T 1 and T 2 at the 5% significance level. T0  T1  T2  T0  T1  T2  T0  T1  T2  (2, 2 under the null hypothesis. Even when p 1 = 2, p 2 = 2, p 3 = 1 and n = 4 the attained significance levels are also close to 6%. It is obvious that the empirical sizes of T 0 is good enough in the inapplicable cases of T 1 and T 2 . Furthermore, for these applicable cases, when the dimensions and sample sizes are big, all the three tests perform good. But when the sample sizes n are close to the total of dimensions p, test T 1 performs worse than T 0 and T 2 .
For the power comparations, in Scenario II and III, which are introduced in [8] and [7] respectively, all the powers tend to 1 as the sample size increases, and T 0 performs better than T 1 and T 2 in most of the settings. It is worthy to notice that when the sample size n is big, the renormalized likelihood ratio test T 1 is also satisfactory, and better than T 2 . What is more, when the dimensions and sample sizes are big, T 0 also shows to be the most powerful one among the three tests in Scenario V. But if the dimensions are small, T 1 seems slightly better. In Scenario IV, T 2 performs extremely good, especially when the dimensions and sample sizes are big. In this situation, T 0 is also acceptable. Scenario VI is the so called spike model in RMT, and from the result we can find that all the three tests failed, because their powers do not tend to 1 as the sample size increases. Therefore, by the numerical results, we recommend the proposed test T 0 for testing the independence of high dimensional random vectors due to its stability.

An example
For illustration, we apply the proposed test statistic to the daily returns of 258 stocks issued by the companies from S&P 500. The original data are the closing prices or the bid/ask average of these stocks for the trading days of the last quarter in 2013, i.e., from 1 October 2013 to 31 December 2013, with total 64 days. This dataset is derived from the Center for Research in Security Prices Daily Stock in Wharton Research Data Services. According to The North American Industry Classification System (NAICS), which is used by business and government to classify business establishments, the 258 stocks are separated into 11 sectors. Numbers of stocks in each sector are shown in Table 3. A common interest here is to test whether the daily returns for the investigated 11 sectors are independent.
The testing model is established as follows: Denote p i as the number of stocks in the ith sector, u il (j) as the price for the lth stock in the ith sector at day j. Here j ∈ J64K. Correspondingly we have p = 11 i=1 p i = 258. In order to satisfy the condition of the proposed test statistic, the original data u il (j) need to be transformed as follows: (i) logarithmic difference: Let x il (j) = ln(u il (j + 1)/u il (j)). Notice that j ∈ J63K. So we denote n = 63. Logarithmic difference is a very commonly used procedure in finance. There are a number of theoretical and practical advantages of using logarithmic returns, especially we shall assume that the sequence of logarithmic returns are independent of each other for big time scales (e.g. ≥ 1 day, see [18]). (ii) power transform: It is well known that if a security price follows geometric Brownian motion, then the logarithmic returns of the security are normally distributed. However in most cases, the normalized logarithmic returns x il (j) are considered to have sharper peaks and heavier tails than the standard normal distribution. Thus we first transform x il (j) tox il (j) by Box-Cox transformation, and then suppose the transformed data follows a standard normal distribution, that is Here β il is an unknown parameter,x il andσ il are the sample mean and sample standard deviation ofx il (j), j ∈ JnK. β il can be estimated by where a il = min j∈JnKxil (j), b il = max j∈JnKxil (j) and Φ(t) is the standard normal distribution function. (iii) normality test : we use the Kolmogorov-Smirnov test to test whether the transformed prices of each stock are drawn from the standard normal distribution. By calculation, we get 258 p-values. Among them the minimum is 0.1473. And 91.86% of p-values are bigger than 0.5. For illustration, we present the smoothed empirical densities of the transformed datã x il (j) for the first four stocks of each sector in Figure 1. From these graphs, we can also see that the transformed data fit the normal density curve well. Observe that, marginal normal cannot guarantee jointly normal. But still, here we preform our method on the transformed data, since as we mentioned In Remark 3.2, our result Theorem 3.1 is believed to hold under more generally distributed data. The transformation is done mainly for getting more moments matching to the Gaussian distribution. Theorem 2.2 in [28] indicates that moment matching condition like the fourth moment equal to 3 may be necessary for Theorem 3.1. Now we apply s(B) to test the independence of every two sectors. The pvalues are shown in Table 4. We find in the total 55 pairs of sectors, there are 23 pairs with p-values bigger than 0.05 and 18 pairs with p-values bigger than 0.1. Interestingly, according to these results, if we set the significance level as 5% we find that there are seven sectors which are independent of Sector 7, the finance and insurance sector, which seems to be most independent of other sectors. On the other hand, Sector 11 (the arts, entertainment, and recreation sector) is dependent on all other sectors except Sector 1 (the mining, quarrying, and oil and gas extraction sector). We also investigate the mutual independence of every three sectors. Applying Theorem 3.1, we find that there are 11 groups with p-values bigger than 0.05. In addition, we find that there is only one group   containing four sectors which are mutually independent. The results are shown in Table 5. Thus, we have strong evidence to believe that every five sectors are dependent.

Linear spectral statistics and second order freeness
In this section, we will introduce some RMT and FPT apparatus with which Theorem 3.1 can be proved. We will just summarize the main ideas on how these tools fit into the framework of the limiting behavior of our proposed statistic, but leave the details of the reasoning and calculation to the Appendix. We start from the following elementary fact. To wit, for two matrices S and T, we know that ST and TS share the same non-zero eigenvalues, as long as both ST and TS are square. Therefore, to study the eigenvalues of B, it is equivalent to study the eigenvalues of the (n − 1) × (n − 1) matrix by the above discussion we can assert

s(B) = s(B).
A main advantage of B is embodied in the following proposition, which will be a starting point of the proof of Theorem 3.1.
Proof. We perform the singular value decomposition as Note that Proposition 6.1 allows us to study the eigenvalues of a summation of k independent random projections instead. Moreover, it is obvious that under H 0 , this summation of random matrices does not depend on the unknown population mean vectors and covariance matrices of x i , i ∈ JkK.
To compress notation, we set According to the discussion in the last section, we know that our Schott type statistic can be expressed (in distribution) in terms of the eigenvalues of Q, since In RMT, given an N × N random matrix A and some test function f : C → C, one usually calls the quantity a linear spectral statistic of A with test function f . For some classical random matrix models such as Wigner matrices and sample covariance matrices, linear spectral statistics have been widely studied. Not trying to be comprehensive, one can refer to [3], [9], [16], [25] and [22] for instance. A notable feature in this type of CLTs is that usually the variance of the linear spectral statistic is of order O(1) when the test function f is smooth enough, mainly due to the strong correlations among eigenvalues, thus is significantly different from the case of i.i.d variables. Now, in a similar vein, with the random matrix Q at hand, we want to study the fluctuation of its linear spectral statistics, focusing on the test function f (x) = x 2 .
In the past few decades, the main stream of RMT has focused on the spectral behavior of single random matrix models such as Wigner matrix, sample covariance matrix and non-Hermitian matrix with i.i.d. variables. However, with the rapid development in RMT and its related fields, the study of general polynomials with classical single random matrices as its variables is in increasing demand. A favorable idea is to derive the spectral properties of the matrix polynomial from the information of the spectrums of its variables (single matrices). Specifically, the question can be described as

Given the eigenvalues of A and B, what can one say about the eigenvalues of h(A, B)?
Here h(·, ·) is a bivariate polynomial. Usually, for deterministic matrices A and B, only with their eigenvalues given, it is impossible to write down the eigenvalues of h (A, B). However, for some independent high-dimensional random ma- trices A and B, deriving the limiting spectral properties of h(A, B) via those of A and B is possible. To answer this kind of question, the right machinery to employ is FPT. In the breakthrough work by Voiculescu in [26], the author proved that if (A n ) n∈N and (B n ) n∈N are two independent sequences of random Hermitian matrices and at least one of them is orthogonally invariant (in distribution), then they satisfy the property of asymptotic freeness, which allows one to derive the limit of 1 n Etrh(A n , B n ) from the limits of ( 1 n EtrA m n ) m∈N and ( 1 n EtrB m n ) m∈N directly. Sometimes we also call the asymptotic freeness of two random matrix sequences as first order freeness.
Our aim in this paper, however, is not to derive the limit of the normalized trace of some polynomial in random matrices, but to take a step further to study the fluctuation of the trace. To this end, we need to adopt the concept of second order freeness, which was recently raised and developed in the series of work: [11,12,5,10], also see [20]. In contrast, the second order freeness aims at answering how to derive the fluctuation property of trh(A n , B n ) from the limiting spectral properties of A n and B n .
Especially, in [10], the authors established the so-called real second order freeness for orthogonal matrices, which is specialized in solving the fluctuation of the linear spectral statistics of polynomials in Haar distributed orthogonal matrices and deterministic matrices. For our purpose, we need to employ Proposition 52 in [10], an ad hoc and simplified version of which can be heuristically sketched as follows. Assume that {A n } n∈N and {B n } n∈N are two independent sequences of random matrices (may be deterministic), where A n and B n are n by n, and the limits of n −1 Etrh 1 (A n ) and n −1 Etrh 2 (B n ) exist for any given polynomials h 1 and h 2 , as n → ∞. Moreover, trh 1 (A n ) and trh 2 (B l n) possess Gaussian fluctuations (may be degenerate) asymptotically for any given polynomials h 1 and h 2 . Then trq(O A n O, B n ) also possesses Gaussian fluctuation asymptotically for any given bivariate polynomial q(·, ·), where O is supposed to be an n × n Haar orthogonal matrix independent of A n and B n . Now as for Q, we can start from the case of k = 2, which fits the above framework quite well. To wit, we can regard O 1 P 1 O 1 as B n−1 and P 2 as A n−1 , using Proposition 52 of [10] leads to the fact that trh( is asymptotically Gaussian after an appropriate normalization, for any given polynomial h. Then we take O 1 P 1 O 1 + O 2 P 2 O 2 as B n−1 and regard P 3 as A n−1 and repeat the above discussion. By using Proposition 52 in [10] recursively, we can get our CLT for Q finally. A formal result which can be derived from Proposition 52 in [10] is as follows, whose proof will be presented in the Appendix. where κ r (ξ 1 , . . . , ξ r ) represents the joint cumulant of the random variables ξ 1 , . . . , ξ r . Now if we set h i (x) = x 2 for all i ∈ N, we can obtain Theorem 3.1 by proving the following lemma. Lemma 6.1. Under the above notation, we have and 3) The proof of Lemma 6.1 will also be stated in the Appendix. In the sequel, we prove Theorem 3.1 with Lemma 6.1 granted. Proof of Theorem 3.1. Setting h i (x) = x 2 for all i ∈ N in Theorem 6.1, we see that all rth cumulants of trQ 2 tend to 0 if r ≥ 3 when n → ∞, which together with Lemma 6.1 implies that =⇒ N (0, 1).
Then Theorem 3.1 follows from the definition of s(B) directly.

Appendix
In this Appendix, we provide the proofs of Theorem 6.1 and Lemma 6.1.
Proof of Theorem 6.1. In Proposition 52 of [10], the authors state that if {A n } n∈N and {B n } n∈N are two independent sequences of random matrices (may be deterministic), where A n and B n are n × n, each having a real second order limit distribution, and O is supposed to be an n × n Haar orthogonal matrix independent of A n and B n , then B n and O A n O are asymptotically real second order free. By Definition 30 of [10], we see that for a single random matrix sequence {A n } n∈N , the existence of the so-called real second order limit distribution means that the following three statements hold simultaneously for any given sequence of polynomials h 1 , h 2 , h 3 , . . . when n → ∞: 2) Cov(trh 1 (A n ), trh 2 (A n )) converges (the limit can be 0); 3) κ r (trh 1 (A n ), . . . , trh r (A n )) = o(1) for all r ≥ 3.
We stress here, the original definition in [10] is given with a language of noncommutative probability theory. To avoid introducing too many additional notions, we just modify it to be the above 1)-3). Then by the Definition 33 and Proposition 52 of [10]), one see that if both {A n } n∈N and {B n } n∈N have real second order limit distributions, we have the following three facts for any given sequences of bivariate polynomials q 1 , q 2 , q 3 , . . . when n → ∞: Here 1') and 2') can be implied by the definitions of the first and second order freeness in [10] respectively, and 3') can be found in the proof of Proposition 52 of [10], where the authors claim that the proof of Theorem 41 therein is also applicable under the setting of Proposition 52. Note that in [10], a more concrete rule to determine the limit of is given, which can be viewed as the core of the concept of real second order freeness. However, here we do not need such a concrete rule, thus do not introduce it. Now we are at the stage of employing Proposition 52 of [10] to prove Theorem 6.1. Recall our objective Q defined in (6.1). We start from the case of k = 2, to wit, we are considering the linear spectral statistics of the random matrix O 1 P 1 O 1 + O 2 P 2 O 2 . Now, we regard O 1 P 1 O 1 as B n−1 and P 2 as A n−1 , then obviously they both satisfy 1)-3) in the definition of the existence of the real second order limit distribution, since the spectrums of A n−1 and B n−1 are both deterministic, noticing they are projection matrices with known ranks. Then 1')-3') immediately imply that O 1 P 1 O 1 + O 2 P 2 O 2 also has a real second order limit distribution. Next, adding O 3 P 3 O 3 to O 1 P 1 O 1 + O 2 P 2 O 2 , and regarding the latter as B n−1 and P 3 as A n−1 , we can use the above discussion again to conclude that O 1 P 1 O 1 + O 2 P 2 O 2 + O 3 P 3 O 3 also possesses a real second order limit distribution. Recursively, we can finally get that Q has a real second order limit distribution, which implies Theorem 6.1. So we conclude the proof.
It remains to prove Lemma 6.1. Before commencing the proof, we briefly introduce some technical inputs. Since the trace of a product of matrices can always be expressed in terms of some products of their entries, it is expected that we will need to calculate the quantities of the form where O is assumed to be an N -dimensional Haar distributed orthogonal matrix, and O ij is its (i, j)th entry. A powerful tool handling this kind of expectation is the so-called Weingarten calculus on orthogonal group, we refer to the seminal paper of [6], formula (21) therein. To avoid introducing too many combinatorics notions for Weingarten calculus, we just list some consequence of it for our purpose, taking into account the fact that we will only need to handle the case of m ≤ 4 in (6.4) in the sequel. Specifically, we have the following lemma.

3) Replacing O by O , we can obviously switch the roles of i and j in 1) and
2). Moreover, any permutation on the indices {1, . . . , m} will not change (6.4). Any other triple (m, i, j), which can not be transformed into any case in 1) or 2) via switching the roles of i and j or performing permutations on the indices {1, . . . , m}, will drives (6.4) to be 0.
With Lemma 6.2 at hand, we can prove Lemma 6.1 in the sequel.
Proof of Lemma 6.1. At first, we verify (6.2). Note that by definition, Let u i ( ) be the th column of O i and u i ( , s) be the sth coefficient of u i ( ), i.e. the ( , s)th entry of O i , for all i ∈ JkK. Then for i = j we have Here, in the third step above we used 3) of Lemma 6.2 to discard the terms with s = t, while in the last step we used 1) of Lemma 6.2. Therefore, we have Now we calculate Var(trQ 2 ) as follows. Note that we have In the sequel, we briefly write Cov((i, j), (m, )) := Cov(tr Note that the summation in the last step above can be decomposed into the following six cases. Cov((i, j), (j, )), α = 5, 6.
Through detailed but elementary calculation, with the aid of Lemma 6.2, we can finally obtain that for i = j, Moreover, analogously, when i, j, are mutually distinct, we can get Cov((i, j), (i, )) = Cov((i, j), (j, )) = O( 1 n ) by using Lemma 6.2. Here we just omit the details of the calculation. Consequently, by (6.7) one has Thus we conclude the proof.