The fluctuation of eigenvalues in factor models

We consider the fluctuation of eigenvalues in factor models and propose a new method for testing the model. Based on the characteristics of eigenvalues, variables of unknown distribution are transformed into statistics of known distribution through randomization. The test statistic checks for breaks in the structure of factor models, including changes in factor loadings and increases in the number of factors. We give the results of simulation experiments and test the factor structure of the stock return data of China’s and U.S. stock markets from January 1, 2017, to December 31, 2019. Our method performs well in both simulations and real data.


Introduction
Currently, researchers can easily obtain higher-dimensional and larger amounts of data. The factor model, a well-established technique for dimension reduction, is one of the most commonly used methods in high-dimensional research. For the theory related to factor models, see papers such as Refs. [1, 2]. With the wide application of factor models, related theories have also been developed, such as articles on determining the number of factors in the model [3, 4]. These methods often involve the use of covariance matrices. Let N and T denote the cross-sectional and time series dimensions, respectively. When the dimension N is fixed while T tends to infinity, the eigenvalues of the sample covariance matrix are good estimates of the eigenvalues of the population covariance matrix. However, when both N and T tend to infinity, this estimation is biased. There is a substantial body of work dealing with the limiting behavior of the eigenvalues of the sample covariance matrix; it forms an important subset of what is often called random matrix theory. Related research includes Refs. [5, 6].
In this paper, we consider spiked covariance matrices in the high-dimensional setting, which arise naturally from factor models. A special type of spiked covariance matrix, whose nonspiked eigenvalues are all equal to 1, was proposed by Ref. [7]. Subsequent studies based on the spiked matrix have been extended in various directions, including the asymptotic distribution of spiked eigenvalues (e.g., Refs. [8-12]). Related studies extend the scope to more general matrices, for example, by removing the restriction that all nonspiked eigenvalues equal 1. Based on the eigenvalues of the covariance matrix, Ref. [13] proposes a randomization method and constructs a test statistic to determine the number of factors in a factor model. Applications of this randomization method can be traced back to Ref. [14], and Ref. [15] also applied the method in the financial field. This randomization method is also the main method used in our article.
In a typical factor model, the whole sample is often used to construct a model suitable for the whole period. Time invariance of factor loadings is a standard assumption in the analysis of large factor models. However, such an assumption is often too idealized for practice. In reality, the factor structure may change constantly under various influences. This study focuses on two types of changes: instability of the factor loadings and the emergence of new factors after the change point. Breaks in the factor structure have been explored in many papers. Refs. [16, 17] focus on structural instability in the loadings. Some articles consider breaks in factor models before and after significant events. Ref. [18] assesses the evidence for a change in the number of factors during the Great Recession. However, their approach requires prior knowledge of the break point and cannot distinguish the two change types we just mentioned. Other studies aim to find the exact time point at which a break occurs, which we call a change point, with Ref. [19] being one example.
The problem is that when there is no structural change in the process, some detection methods may give a false result. To avoid such false test results, we need to test whether there are structural changes before conducting in-depth research; when there are no structural changes, additional studies are unnecessary. In this article, we consider the fluctuation of the eigenvalues in the factor model and use the randomization method to construct a statistic to test whether there is a structural change in the model within the time range. Employing a sliding window, our technique is more sensitive in detecting changes in factor loadings than simply dividing the sample into two. The test statistic performs well in the absence of change points. The presence of a change point is consistently detected under both types of structural changes (loading instability and new factors). Our research always assumes (1). For any matrix, the norm used is the square root of the largest eigenvalue of the matrix multiplied by its transpose. The rest of the paper is organized as follows. In Section 2, we introduce the model and basic results, and Section 3 presents the theory on which the method is mainly based. Section 4 explains how we construct statistics using the randomization method. Then, in Section 5, simulation experiments are carried out to test the effect of the statistic. In addition, a real data example is reported in Section 6. Finally, we give our conclusions in Section 7. All technical proofs are presented in the Appendix.

The model and basic results
Consider a factor model as follows.
We can also write the factor model with matrices.
where and are two matrices, is an matrix, and is an matrix. Define .
Assumption 1. Both and go to infinity. Moreover, such that 1 ⩽ j, k ⩽ r for any .
Let be a matrix that has elements . Define ; then Eq. (3) implies (4). Eq. (3) holds when satisfies the following conditions: it is strictly stationary and -mixing with , and . If the population covariance matrix is not diagonal, it is easy to adjust it by eigendecomposition and a different . However, (3) may still hold in some other cases, even when is a nonrandom time trend. Thus, we impose Assumption 1 on the samples, not the populations, mainly to cover a more extensive range of situations.
Assumption 2. Here, has i.i.d. elements with mean 0, variance 1, and a finite fourth moment; is a nonnegative definite matrix; and is a matrix with . There exists a constant that does not depend on and such that and .
It also implies that and . Let be the sample covariance matrix of , and let be the eigenvalues of . Proposition 1. Under Assumptions 1 and 2, and .
Theorem 1 shows the fluctuation of from for . Note that when and are independent, degenerates to a separable model with rank on both sides; thus, holds. The result also extends to other cases in which there is only weak dependence between and .
We note that only depends on the population of . Eqs. (7) and (8) imply that the fluctuation of is small even if we replace by a different population satisfying Assumption 2. Thus, becomes large only if the population of becomes large.
as and go to infinity, where is away from 0. Let be the eigenvalues of . There exists a constant such that , where is any finite integer.
Eq. (9) is a common result for nonspiked eigenvalues in random matrix theory, and it covers many cases. For example, the asymptotic distribution of the first eigenvalues of sample covariance matrices with a general population has been established under some moment constraints in Ref. [20]. Ref. [21] also established Tracy-Widom asymptotics for the largest eigenvalues of a general class of random Gram matrices. In fact, is the right endpoint of the support of the limiting spectral distribution (LSD) of . For more details, one can also refer to Refs. [6, 22, 23].
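As a quick numerical illustration of the nonspiked edge (a standard random-matrix fact shown here for pure N(0, 1) noise with identity population covariance, not the paper's general setting), the largest sample eigenvalue concentrates near the right endpoint (1 + √(N/T))² of the Marchenko-Pastur support:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 800                      # illustrative dimensions (our choice)
Z = rng.standard_normal((T, N))      # pure noise, identity population covariance
S = Z.T @ Z / T                      # sample covariance matrix
lam_max = np.linalg.eigvalsh(S)[-1]  # largest sample eigenvalue
edge = (1 + np.sqrt(N / T)) ** 2     # Marchenko-Pastur right endpoint (= 2.25 here)
```

With these dimensions, lam_max falls within Tracy-Widom-scale fluctuations of the edge, i.e., deviations of order T^(-2/3).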

Theorem 2 demonstrates that, excluding the first largest eigenvalues, the remaining first eigenvalues will be located within a small border around the right endpoint . Although Assumption 3 covers the nonspiked cases in random matrix theory, there are other cases for with so-called "spiked eigenvalues". Spiked eigenvalues often have a convergence rate of , which makes them differ from nonspiked eigenvalues. Addressing spiked eigenvalues in random matrix theory is sometimes easier than addressing nonspiked eigenvalues, but the situation is different in our problems. The main reason is that the important terms in our proof of Theorem 2, such as and , may change the spiked structure.

We will consider a special case for with spiked eigenvalues.
Assumption 4. Let and be independent. Moreover, can be rewritten as , where has i.i.d. elements with mean 0, variance 1, and a finite fourth moment, and is a nonnegative definite matrix with eigenvalues .
Under Assumptions 1 and 4, the spectrum of and the distributions of satisfy the conditions of Theorem 3.6 in Ref. [22]. Then, there exist constants such that . Assumption 4 gives a situation with spiked eigenvalues in . In this case, (12) establishes a result similar to Theorem 2: the eigenvalues will still fluctuate in a small range near their limits.

Fluctuation of eigenvalues
Now, we consider with . Define the matrix with elements and . For each , we can calculate the largest eigenvalues of the sample covariance matrix for . We use to represent the th eigenvalue of .
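A minimal sketch of this sliding-window eigenvalue computation (the function name, the T × N data layout, and the looped eigendecomposition are our assumptions, not the paper's code):

```python
import numpy as np

def sliding_window_eigvals(X, m, K):
    """Largest K eigenvalues of the sample covariance in each sliding window.

    X : (T, N) data panel; m : window width.
    Returns a ((T - m + 1), K) array whose row s holds the K largest
    eigenvalues (descending) for the window X[s:s+m].
    """
    T, N = X.shape
    out = np.empty((T - m + 1, K))
    for s in range(T - m + 1):
        W = X[s:s + m]
        S = W.T @ W / m                          # window sample covariance
        out[s] = np.linalg.eigvalsh(S)[::-1][:K] # top K, descending order
    return out
```

When N is much larger than m, one can equivalently take the eigenvalues of W @ W.T / m (the two Gram matrices share their nonzero spectrum), which reduces the per-window cost.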
We do not impose any conditions on the correlations between for varying . In contrast, we want to study whether differences exist between the structures based on the eigenvalues of their sample covariance matrices. We define and by replacing by in . Similarly, we can define by replacing by in . Theorem 4. Suppose that Assumptions 1 and 2 hold for all . Then, from Eq. (7), for any , 1 ⩽ k ⩽ r for any . Moreover, if also holds, . Remark 3. If is not large enough, we may consider a faster rate of . However, and impact in this case. Eq. (17) can have a faster rate provided that there are no differences (in distributions) between .
We also consider an alternative hypothesis and result: assume and exist such that . Then, there may exist such that the left part of Eq. (17) has a larger order than .
Theorem 4 states that, for the first eigenvalues, the left part of Eq. (15) should be small and not fluctuate greatly when the structure has not changed. It is a natural conclusion drawn from Theorem 1.

Theorem 5. Suppose that Assumptions 1-3 hold for all . Then, according to Eq. (10), , where is any finite integer. If for with , then . Theorem 6. Suppose that the conditions of Theorem 3 hold for all . Then, . With an alternative hypothesis, we have the following: if or and have different spectra, then there may exist such that the left part of (21) has a larger order than . Following Theorems 2 and 3, it is easy to obtain Theorems 5 and 6. We know that the left part of (21) should not fluctuate greatly, for both the nonspiked and spiked cases, when there is no break in the factor structure.

Randomization algorithm
In Section 3, we considered only order results. We can use the randomization algorithm in Ref. [13] to construct a new test statistic.
Based on Theorems 1-6, we construct the new statistic. Here, is a value given in advance. It controls the number of possible factors in all sliding windows and is predetermined in the algorithm by a simple initial estimate: for each sliding window, compute an estimate of the number of factors for that window. When corresponding estimates are made for all windows, is chosen to be slightly larger than the maximum of these estimates so that it always includes the largest eigenvalues. To detect whether there is a change point in , 1 ⩽ t ⩽ T*, the null hypothesis and the alternative hypothesis can be set as follows. Here, is defined as in Eq. (22). Then, the new statistic is constructed accordingly. Since has an order of under the null hypothesis, should diverge to infinity at a rate of . The randomization algorithm follows as in Ref. [24]: Step 1. For , generate an i.i.d. sample with common distribution (the standard normal distribution).

Step 2. For any u drawn from a distribution F(u) with support U ⊂ R \ {0}, define . Step 3. Compute . Step 4. Compute Θ.
The considerations in Ref. [24] point toward choosing . A relevant explanation of how the test statistic works under the null and alternative hypotheses is as follows. Under the null hypothesis, will approach infinity, as mentioned before; therefore, the sequence follows a Bernoulli distribution with success probability 1/2. Then, in Step 3, the normalized sum should be asymptotically standard normal because the central limit theorem holds when the number of draws is large enough. Consequently, Θ in Step 4 should follow a chi-square distribution with one degree of freedom. Under the alternative hypothesis, remains finite, so for any , . Naturally, a sum of random variables with nonzero means is obtained in Step 3; hence, Θ also diverges to infinity.
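The four steps above can be sketched as follows. This is a hypothetical implementation of a Trapani-type randomization test (the function name, the default R, the point u = √2, and the equal mass at ±u are our assumptions): phi plays the role of the statistic that diverges under the null and stays bounded under the alternative.

```python
import numpy as np

def randomization_test(phi, R=500, u=np.sqrt(2), seed=0):
    """Randomization step: returns a value that is approximately
    chi-square(1) when phi diverges (null) and explodes when phi is bounded
    (alternative)."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(R)              # Step 1: i.i.d. N(0,1) draws
    # Step 2: Bernoulli indicators at +/- u; when phi -> infinity each
    # indicator has limiting mean 1/2 for any fixed u != 0
    zeta = {s: (phi * xi <= s * u).astype(float) for s in (+1, -1)}
    # Step 3: CLT-normalized sums, centered at 1/2
    theta = {s: (zeta[s] - 0.5).sum() / np.sqrt(R * 0.25) for s in (+1, -1)}
    # Step 4: average the squared normalized sums (mass split over +/- u)
    return 0.5 * (theta[+1] ** 2 + theta[-1] ** 2)
```

One would reject the null hypothesis when the returned value exceeds the chi-square(1) critical value, since a bounded phi pushes the indicator means away from 1/2 and the statistic to infinity.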

Simulation
To verify the effect of the constructed statistic, repeated experiments were carried out under each group of conditions, where represents the number of factors under the null hypothesis. Under , we simulate data with according to the model , where indexes the whole process of data monitoring and is the chosen window width. We fix and for the whole process and set the change point . Additionally, we consider factors and sliding window widths . Common factors are allowed to have some time dependence: ∼ N(0, I_r) with . We simulate each element of as . The idiosyncratic components are with each element of . and are two Toeplitz matrices of size and , respectively; the entries in the th diagonal place are given by for and for .
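A sketch of this simulation design under the null (no break) in Python. The function name and the parameter values (AR coefficient rho, Toeplitz decay a, and the dimensions) are our placeholders, not the paper's exact constants, and only the cross-sectional Toeplitz correlation is included for brevity:

```python
import numpy as np

def simulate_factor_panel(N=100, T=800, r=3, rho=0.5, a=0.2, seed=0):
    """Simulate a T x N panel X = F @ Lam.T + E with r common factors:
    AR(1)-type factors, N(0,1) loadings, Toeplitz-correlated errors."""
    rng = np.random.default_rng(seed)
    Lam = rng.standard_normal((N, r))            # factor loadings
    # factors with time dependence: F_t = rho*F_{t-1} + sqrt(1-rho^2)*eps_t
    eps = rng.standard_normal((T, r))
    F = np.zeros((T, r))
    F[0] = eps[0]
    for t in range(1, T):
        F[t] = rho * F[t - 1] + np.sqrt(1 - rho ** 2) * eps[t]
    # cross-sectional Toeplitz covariance with entries a^{|i-j|}
    idx = np.arange(N)
    Sigma = a ** np.abs(idx[:, None] - idx[None, :])
    L = np.linalg.cholesky(Sigma)
    E = rng.standard_normal((T, N)) @ L.T        # idiosyncratic component
    X = F @ Lam.T + E                            # T x N observed panel
    return X, Lam, F
```

With these choices the r spiked eigenvalues of the sample covariance are of order N, well separated from the bounded nonspiked bulk.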
is always chosen in the randomization step because the construction of the final statistic is based on the central limit theorem. The distribution in Step 1 is set to be , while in Step 2 is chosen to have nonzero and equal mass at . When the factor loadings have changed after the change point, the factor model can be expressed in the following form, where each element of ∼ N(0, 1) and .
When using the traditional approach of examining the two samples separately, the eigenvalues reflect only the information of each sample; they cannot reflect the change in the loadings. However, putting the two samples together and calculating through the sliding window ensures that the eigenvalues reflect that change. One example is when the common factors stay constant in (29) but the factor loadings satisfy A1 = P A2, where P is an orthogonal matrix. If we simply study the eigenvalues of the two covariance matrices, we find that they have almost the same eigenvalues, even though the two loading matrices are obviously not equal. At this point, our sliding window method can identify such changes very well.
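This point can be checked numerically with a toy example (the dimensions are hypothetical): left-multiplying the loadings by an orthogonal matrix changes the loading matrix but leaves the spectrum of the loading Gram matrix unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
A2 = rng.standard_normal((50, 3))                   # loadings after the break
P, _ = np.linalg.qr(rng.standard_normal((50, 50)))  # random orthogonal matrix
A1 = P @ A2                                         # loadings before the break
# Identical covariance spectra (A1 A1^T = P A2 A2^T P^T is a similarity) ...
same_spectrum = np.allclose(np.linalg.eigvalsh(A1 @ A1.T),
                            np.linalg.eigvalsh(A2 @ A2.T))
# ... yet clearly different loading matrices
different_loadings = not np.allclose(A1, A2)
```

Eigenvalue comparison of the two subsamples alone therefore cannot detect this break, while the sliding-window statistic reacts to the transition windows that mix the two loading regimes.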
Consider the case where new factors appear after the change point, and the new factors satisfy ∼ N(0, I_{r1}) with . The factor model at this time can be written as follows. To evaluate the performance of our statistic, we repeat the simulation 300 times under varying conditions and present the outcomes in Tables 1-4. Table 1 reports the fraction of false rejections under (i.e., when there is no break in the factor model) at the 1% and 5% significance levels. In the simulations, the sample size is , and the window width is .

Under the null hypothesis, it can be seen from the data in Table 1 that the constructed test statistic attains a good confidence level when a large window width is chosen. Moreover, the performance of the test statistic is not affected by increases in the number of factors in the original model.
In Table 2, we show the fraction of detections for three types of structural changes when a break takes place, testing at the 1% and 5% significance levels. Under the alternative hypothesis, considering changes in the factor loadings and in the number of factors, the statistic rejects the null hypothesis with a probability of 1, which indicates that it can detect changes in the structure very well. At the same time, the results in Table 2 show that it has strong power under alternatives with different numbers of factors in the original model and different numbers of newly added factors.
Considering a smaller sample size with change point , Table 3 shows the false rejection rate under for window widths . According to the results in the table, even if is reduced to 200, our method can still be used when the window width is . This suggests that a larger window width yields a better size for the test. However, it is not appropriate to select a window width that is too large relative to the whole sample size . Table 4 shows the fraction of detections under the alternative hypothesis when testing at the 1% and 5% significance levels. Although Table 3 shows a better size with window width , the power in Table 4 is too low for the test.

Real data
Taking the daily returns of the CSI 300 and S&P 500 constituent stocks in China's and the U.S. stock markets, respectively, from January 1, 2017, to December 31, 2019, as the research object, we examine the factor structure of the stock markets of the two countries. Stocks that were suspended for more than 60 consecutive days are eliminated, and the remaining missing stock return data are filled with 0. Aggregating the data according to the same opening dates for both markets, we obtain the daily returns of 749 stocks over 708 trading days. We use the randomization algorithm in Section 4 to test the factor structure of the above data. The window width is selected as , and the final test statistic value is , which exceeds the critical value at the significance level. This suggests that there are changes in the factor structure during this period. To show the result more clearly, Fig. 1 presents the value of in each sliding window . Over the whole process, changes greatly as the window moves, which also shows that the factor structure has changed.
The selected data cover the period of the U.S.-China trade war, during which many important events occurred, including multiple rounds of mutual tariffs and negotiation meetings between China and the United States. The trade relationship between the two countries experienced huge volatility, and the stock markets fluctuated accordingly. Hence, the test result is consistent with the actual situation.

Conclusions
We study the fluctuation of eigenvalues in factor models and construct a test statistic for the model structure. One only needs to locate the change points after the test indicates their existence, which reduces unnecessary work. We also apply our method to simulations and to data from China's and the U.S. stock markets. Our method performs well on both simulated and real data.
At the same time, the sliding window method avoids the insensitivity to changes in factor loadings that arises when one simply compares the eigenvalues of two subsamples. Our proposed test statistic performs well under both types of structural changes (instability of factor loadings and the emergence of new factors after the change point) and is not affected by the number of factors in the original model. Accordingly, this novel statistic has good universality for application in future studies.

Biographies
Fanglin Bao is a graduate student under the tutelage of Prof. Bo Zhang at the University of Science and Technology of China (USTC). Her research mainly focuses on high-dimensional statistics.
Bo Zhang is an Associate Professor at USTC. He obtained a Bachelor's degree in Statistics from USTC in 2011, a Master's degree in Probability Theory and Mathematical Statistics from USTC in 2013, and a Ph.D. degree in Mathematics from Nanyang Technological University in Singapore in 2017. His research mainly focuses on high-dimensional random matrices, high-dimensional time series, and complex networks.

A.2 Proof of Theorem
For any , is positive-definite with probability 1 (as and tend to infinity). Then, for any , is the solution of det . It is easy to see that when , is the solution of Eq. (A5).
Then, Eq. (A5) can be rewritten as . First, let us consider the largest eigenvalues of . Since are the eigenvalues of , we can run an SVD on . Then, we can rewrite as

Note that
and is a nonnegative definite matrix, and the largest eigenvalue of is 1. Then, the largest eigenvalues of are not larger than, but also not smaller than, the first largest eigenvalues of . Note that is a projection matrix with rank ; therefore, it has at least eigenvalues equal to 1. This, together with Eqs. (9) and (A8), implies that the largest eigenvalues of are . Similarly, we can repeat the above idea on

We find that is positive-definite with probability 1, and the rank of is not larger than . Moreover, . Thus, it has at least N − 2r eigenvalues equal to 1. This lets us conclude Eq. (10).
We know that is independent of since and are independent. Eq. (A9) can be considered a separable sample covariance matrix when we freeze .
has at least eigenvalues equal to 1 and eigenvalues equal to 0. This, together with Assumption 4, satisfies the conditions of Theorem 3.6 in Ref. [22]. Recalling Theorem 3.6 in Ref. [22], the spiked eigenvalues have a rate of and the nonspiked eigenvalues have a rate of . Since , we find that the fluctuations of the largest eigenvalues are . Thus, we can prove that there exist constants such that . Now, we only need to prove for . Recalling Eq. (A7), we need to prove

Table 4. Power of the statistic at 1% and 5% significance ( ).