Detection of Multiple Structural Breaks in Large Covariance Matrices

ABSTRACT This article studies multiple structural breaks in large contemporaneous covariance matrices of high-dimensional time series satisfying an approximate factor model. The breaks in the second-order moment structure of the common components are due to sudden changes in either the factor loadings or the covariance of the latent factors, requiring an appropriate transformation of the factor model to facilitate estimation of the (transformed) common factors and factor loadings via classical principal component analysis. With the estimated factors and idiosyncratic errors, an easy-to-implement CUSUM-based detection technique is introduced to consistently estimate the location and number of breaks and correctly identify whether they originate in the common or idiosyncratic error components. The algorithms of Wild Binary Segmentation for Covariance (WBS-Cov) and Wild Sparsified Binary Segmentation for Covariance (WSBS-Cov) are used to estimate breaks in the common and idiosyncratic error components, respectively. Under some technical conditions, the asymptotic properties of the proposed methodology are derived, with near-optimal rates (up to a logarithmic factor) achieved for the estimated breaks. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the developed method and compare it with other existing approaches. We finally apply our method to study the contemporaneous covariance structure of daily returns of S&P 500 constituents and identify a few breaks, including those occurring during the 2007–2008 financial crisis and the recent coronavirus (COVID-19) outbreak. An R package "BSCOV" is provided to implement the proposed algorithms.


Introduction
Estimation of covariance matrices is of fundamental importance in modern multivariate statistics, and has applications in various fields such as biology, economics, finance, and social networks. Suppose that (X_t : t = 1, . . . , n) is a collection of n observations from a d-dimensional random vector with E(X_t) = 0 and E(X_t X_t^⊤) = Σ, where 0 is a null vector whose size may change from line to line and Σ is a d × d positive definite covariance matrix. When the dimension d is fixed or significantly smaller than the sample size n, the population covariance matrix can be well estimated by the conventional sample covariance matrix (see Anderson 2003):

Σ̂_n = (1/n) Σ_{t=1}^n X_t X_t^⊤. (1.1)

However, when d exceeds n, the sample covariance matrix Σ̂_n defined in (1.1) becomes singular. Estimation of such a large covariance matrix is generally challenging and has received increasing attention in recent years. Some techniques have been introduced to regularize covariance matrices and produce reliable estimation (see Wu and Pourahmadi 2003; Ledoit and Wolf 2004; Bickel and Levina 2008a, 2008b; El Karoui 2008a, 2008b; Lam and Fan 2009; Rothman, Levina, and Zhu 2009; Cai and Liu 2011; Fan, Liao, and Mincheva 2013; Huang and Fryzlewicz 2019), and comprehensive reviews on large covariance matrix estimation can be found in the references above. Another strand of the literature concerns detection and estimation of structural breaks in time series, where binary segmentation (BS) and its variants are among the most popular techniques; see Killick, Fearnhead, and Eckley (2012), Fryzlewicz (2014, 2020) and the references therein. In recent years, there has been increasing interest in extending this technique to high-dimensional settings such as high-dimensional time series and large panel data (see Cho and Fryzlewicz 2015; Jirak 2015; Aston and Kirch 2018; Wang and Samworth 2018; Wang et al. 2019; Safikhani and Shojaie 2022). However, most of the aforementioned literature mainly considers break detection and estimation in the first-order moment structure. Aue et al.
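As a quick numerical illustration of why the sample covariance estimator fails when d exceeds n, the following sketch (with hypothetical simulated data) shows that the sample covariance matrix has rank at most n and is therefore singular for d > n:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cov(X):
    """Sample covariance (1.1) for mean-zero observations stacked in rows."""
    return X.T @ X / X.shape[0]

# d <= n: the estimator is typically nonsingular ...
X_small = rng.standard_normal((100, 5))             # n = 100, d = 5
print(np.linalg.matrix_rank(sample_cov(X_small)))   # 5: full rank

# ... but d > n forces singularity, since rank(X'X) <= n < d.
X_large = rng.standard_normal((20, 50))             # n = 20, d = 50
print(np.linalg.matrix_rank(sample_cov(X_large)))   # at most 20, hence singular
```

This rank deficiency is exactly what motivates the regularization and factor-model approaches discussed below.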
(2009) use the BS method with mean-corrected cumulative sums (CUSUM) to detect structural breaks in fixed-dimensional covariance matrices, and establish the limit distributions of the test statistics and the convergence rate of the estimated break date. Korkas and Fryzlewicz (2017) detect breaks in the second-order moment structure of time series by combining a wavelet-based transformation with the univariate Wild Binary Segmentation (WBS) algorithm proposed by Fryzlewicz (2014). However, their technique is not directly applicable to our case, where the size of the covariance matrix diverges to infinity as the sample size n increases.
In this article, we consider the second-order moment structure of the high-dimensional random vector X_t generated by an approximate factor model, that is,

X_t = Λ F_t + ε_t, t = 1, . . . , n, (1.2)

where Λ = (λ_1, . . . , λ_d)^⊤ is a d × r matrix of factor loadings with λ_i being an r-dimensional vector, F_t is an r-dimensional vector of common factors, and ε_t = (ε_t1, . . . , ε_td)^⊤ is a d-dimensional vector of idiosyncratic components uncorrelated with F_t. The approximate factor model is very popular in economics and finance, and has become an effective tool in analyzing high-dimensional time series (see Chamberlain and Rothschild 1983; Fama and French 1992; Stock and Watson 2002; Bai and Ng 2002, 2006; Chen et al. 2018). Our main focus is the contemporaneous second-order moment structure, which plays a crucial role in various fields such as the classic mean-variance portfolio choice theory and risk management. Note that

Σ = cov(X_t) = Λ cov(F_t) Λ^⊤ + cov(ε_t) =: Σ(Λ, F) + Σ(ε), (1.3)

where Σ(Λ, F) is the covariance matrix of the common components ΛF_t and Σ(ε) is the covariance matrix of the idiosyncratic components ε_t. In this article, we aim to study multiple structural breaks in the covariance structure of X_t generated by the approximate factor model, estimating the break points and the number of breaks. Structural breaks may occur in either the covariance structure of the common components or that of the idiosyncratic components, indicating that the constant covariance structure in (1.3) is replaced by the following time-varying version:

Σ_t = cov(X_t) = Σ_t(Λ, F) + Σ_t(ε), t = 1, . . . , n. (1.4)

The main contributions of this article, the fundamental novelty of the proposed methodology and its connection to some recent literature are summarized as follows.
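The decomposition (1.3) can be checked numerically on simulated data; the loadings, factor covariance, and noise scales below are arbitrary illustrative choices, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 100_000, 8, 2

Lam = rng.standard_normal((d, r))                 # hypothetical loadings
Sigma_F = np.array([[2.0, 0.3],
                    [0.3, 1.0]])                  # factor covariance
sig_eps = rng.uniform(0.5, 1.5, d)                # idiosyncratic std devs

# Population identity (1.3): cov(X_t) = Lam Sigma_F Lam' + Sigma_eps,
# which holds because F_t and eps_t are uncorrelated.
Sigma = Lam @ Sigma_F @ Lam.T + np.diag(sig_eps ** 2)

# Empirical check on simulated data X_t = Lam F_t + eps_t.
F = rng.multivariate_normal(np.zeros(r), Sigma_F, size=n)
eps = sig_eps * rng.standard_normal((n, d))
X = F @ Lam.T + eps
S = X.T @ X / n                                   # sample covariance (1.1)
print(float(np.max(np.abs(S - Sigma))) < 0.3)     # True for this large n
```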
• A transformation mechanism is introduced to convert the approximate factor model with multiple structural breaks into one with constant factor loadings, justifying the applicability of the classical Principal Component Analysis (PCA) technique (see Bai and Ng 2002; Stock and Watson 2002) in a more general model setting. Note that breaks in the second-order moment structure of the common components are due to changes in the factor loadings, the covariance of the latent factors, or the factor number. This makes it challenging to directly estimate the latent common factors and factor loadings via PCA. When there are multiple breaks in the factor loadings, we transform the original factor model into one with time-invariant factor loadings and a time-varying covariance structure for the transformed factors, and then use the PCA method to estimate the transformed factors and factor loadings, which are shown to be consistent with convergence rates comparable to those in the existing literature on PCA for approximate factor models without breaks (see Bai and Ng 2002; Fan, Liao, and Mincheva 2013). This transformation technique is also used by Han and Inoue (2015) and Baltagi, Kao, and Wang (2017, 2021) to estimate and test factor models with structural instability. However, they assume that the number of factors does not change. In contrast, we consider a more general setting, allowing changes in the factor number and multiple breaks in the covariance matrices of the idiosyncratic components.
• With the estimated factors and idiosyncratic errors, we propose an easy-to-implement CUSUM-based detection technique to estimate the location and number of breaks and identify whether they originate in the common or idiosyncratic components. The developed CUSUM statistics are directly computed via the empirical second moments of the estimated factors and idiosyncratic errors.
Based on Fryzlewicz's (2014) WBS algorithm, we propose a Wild Binary Segmentation for Covariance (WBS-Cov) algorithm to locate multiple breaks in the common components. For breaks in the idiosyncratic error components, we introduce a Wild Sparsified Binary Segmentation algorithm for Covariance (WSBS-Cov), combining WBS-Cov and the Sparsified Binary Segmentation (SBS) proposed by Cho and Fryzlewicz (2015) in the high-dimensional mean estimation context. A recent paper by Barigozzi, Cho, and Fryzlewicz (2018) combines the wavelet-based transformation and the double CUSUM technique to estimate the break times as well as the break number within the second-order moment structure. However, a disadvantage of the wavelet-based transformation of the piecewise-constant "signals", which involves selecting the wavelet filter and scale parameter, is that it could destroy the piecewise constancy and subsequently affect the estimation accuracy of the break number and locations. When we only focus on a specific feature of the data, a data transformation, even a one-to-one transformation, may distort the interpretation of breaks in that structure. Another recent method dealing with sparsity in high-dimensional break detection is the sparse projection introduced by Wang and Samworth (2018), but their method only detects breaks in the first-order moment structure. Wang, Yu, and Rinaldo (2021) extend the BS and WBS techniques to detect multiple breaks in large covariance matrices, but they limit attention to independent sub-Gaussian random vectors and assume that the dimension diverges at a slow polynomial rate of n. In contrast, we allow the dimension to grow faster, for example, at an exponential rate of n, a typical setting in big data analysis.
• Simulation studies are implemented to assess the finite-sample performance of the proposed method and compare it with other competing methods.
The simulation results show that our method has superior numerical performance in most settings, whereas the method by Barigozzi, Cho, and Fryzlewicz (2018) tends to perform poorly when the number of covariance matrix entries with breaks is small relative to the matrix size. Our simulation also shows that the proposed methods are robust when the factor number is over-estimated.
• The developed model and method are applied to the daily return series of S&P 500 constituents, which contains 375 firms over the period from January 01, 2000 to February 28, 2021, and we detect five breaks in the covariance structure of the common components and three breaks in the idiosyncratic error components. In particular, we identify the breaks occurring during the 2007–2008 global financial crisis and the recent coronavirus (COVID-19) outbreak. With an additional comparison against the break detection results for low-dimensional observable risk factor series, we emphasize that the breaks in the covariance structure of returns cannot be fully explained by either the breaks in the covariance structure of observed risk factors or the breaks in the loadings of these factors.
The rest of the article is organized as follows. Section 2 introduces the setting of multiple structural breaks, the PCA estimation and CUSUM-based detection methods, and the WBS-Cov and WSBS-Cov algorithms. Section 3 presents the asymptotic theory for the developed WBS-Cov and WSBS-Cov methods. Section 4 discusses some practical issues in the detection procedure. Sections 5 and 6 give the simulation studies and the empirical application, respectively. Section 7 concludes the article. The technical assumptions are given in Appendix A, and proofs of the main theoretical results are available in Appendices C–E of the supplementary materials. An R package "BSCOV" for implementing the proposed methods is available from https://github.com/markov10000/BSCOV. Throughout the article, we let || · ||_2 and || · ||_F denote the Euclidean norm of a column vector and the Frobenius norm of a matrix, respectively; let vech(·) denote the half vectorization of a symmetric matrix obtained by vectorizing only the lower triangular part of the matrix; let ⌊·⌋ denote the floor function; let a ∨ b and a ∧ b denote max{a, b} and min{a, b}, respectively; and let a_n ≍ b_n denote that a_n = O(b_n) and b_n = O(a_n) hold jointly.
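As a small illustration of the vech(·) notation just introduced:

```python
import numpy as np

# vech(.) stacks the lower-triangular part of a symmetric matrix
# column by column, giving a vector of length d(d+1)/2.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 5.0],
              [3.0, 5.0, 6.0]])
vech_A = np.concatenate([A[k:, k] for k in range(A.shape[0])])
print(vech_A)            # [1. 2. 3. 4. 5. 6.]  (length 3*4/2 = 6)
```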

Methodology
In this section, we first introduce the setting of multiple structural breaks in the contemporaneous covariance structure of both the common and idiosyncratic components, and transform the approximate factor models to construct PCA estimation of the latent common factors and factor loadings (with appropriate rotation). Using the estimated factors and idiosyncratic errors, we then introduce the CUSUM-based statistics to estimate the location and number of structural breaks as well as the WBS-Cov and WSBS-Cov algorithms.

Model Structure for Multiple Structural Breaks
We consider the following multiple structural breaks in the covariance matrix of the common components; that is, Σ_t(Λ, F) in (1.4) is specified as

Σ_t(Λ, F) = Σ^0_k(Λ, F) for η^c_{k−1} + 1 ≤ t ≤ η^c_k, k = 1, . . . , K_1 + 1, (2.1)

where η^c_1, . . . , η^c_{K_1} are unobservable break points with the superscript "c" denoting the "common component". For convenience, we let η^c_0 = 0 and η^c_{K_1+1} = n. The above structural breaks may be caused by one or a combination of the following cases:
• Case (i): sudden changes in the factor loadings;
• Case (ii): sudden changes in the covariance matrix of the common factors;
• Case (iii): sudden changes in the number of factors.
Let Λ^0_k be a d × r_k factor loading matrix and F_{t,k} be the corresponding factor vector with dimension r_k when η^c_{k−1} + 1 ≤ t ≤ η^c_k, and write the approximate factor model as

X_t = Λ^0_k F_{t,k} + ε_t, η^c_{k−1} + 1 ≤ t ≤ η^c_k, k = 1, . . . , K_1 + 1. (2.2)

Accordingly, the structural break specification (2.1) can be equivalently written as Σ^0_k(Λ, F) = Λ^0_k cov(F_{t,k}) (Λ^0_k)^⊤ for η^c_{k−1} + 1 ≤ t ≤ η^c_k. Note that, although Σ^0_k(Λ, F) ≠ Σ^0_{k+1}(Λ, F) for k = 1, . . . , K_1, Λ^0_k may be the same as Λ^0_{k+1} (when cases (i) and (iii) do not occur at the break time). It is worthwhile to point out that case (i) is similar to the approximate factor model with structural breaks; see, for example, Breitung and Eickmeier (2011), Chen, Dolado, and Gonzalo (2014), Han and Inoue (2015), Cheng, Liao, and Schorfheide (2016), Baltagi, Kao, and Wang (2017), Bai, Han, and Shi (2020), and Duan, Bai, and Han (2022). However, the aforementioned papers consider the case of a single structural break in the factor loadings and usually assume that the covariance of the idiosyncratic error components is time invariant. Ma and Su (2018) consider detecting and estimating multiple structural breaks in the factor loadings with a three-step procedure using the adaptive fused group Lasso. Case (iii) is an important type of structural break in the factor model, which has received much attention in recent years (see Baltagi, Kao, and Wang 2017; Barigozzi, Cho, and Fryzlewicz 2018; Li et al. 2019). Section 2.2 will give a unified factor model framework covering cases (i)–(iii) which facilitates the construction of the PCA estimation, and Section 2.3 will then introduce the relevant CUSUM statistics to detect the multiple structural breaks in the common components.
The structural breaks in the covariance matrix of the idiosyncratic error components are specified as

Σ_t(ε) = Σ^0_k(ε) for η^e_{k−1} + 1 ≤ t ≤ η^e_k, k = 1, . . . , K_2 + 1, (2.3)

where η^e_1, . . . , η^e_{K_2} are unobservable break points with the superscript "e" denoting the "idiosyncratic error component". For convenience, we let η^e_0 = 0 and η^e_{K_2+1} = n. Let B^c = {η^c_1, . . . , η^c_{K_1}} and B^e = {η^e_1, . . . , η^e_{K_2}} collect the two sets of break points. In this article, we assume that the minimum length of the subintervals separated by the η^c_k is of order n; see Assumption 4(ii) in Appendix A. Such a restriction is mainly to ensure that the PCA method is applicable to estimate the latent factors and factor loadings. In contrast, the minimum length of the subintervals separated by the η^e_k is allowed to be of order smaller than n; see Assumption 4(iii). We do not impose any restriction on how the break points in B^c and those in B^e are located with respect to each other.

Transformed Factor Models and PCA Estimation
Although the multiple structural breaks formulated in (2.1) and (2.2) may be caused by one or a combination of cases (i)-(iii) described in Section 2.1, we next show that multiple structural breaks in the factor loadings and factor number can be transformed to breaks in the factor covariance structure.
Proposition 2.1. The approximate factor model (2.2) can be equivalently written as the transformed factor model

X_t = Λ F_t + ε_t, t = 1, . . . , n, (2.4)

where Λ denotes the d × q_0 transformed factor loading matrix, which is time invariant, and F_t denotes the q_0-dimensional vector of transformed factors. In addition, the number of factors in the original factor model and that in the transformed factor model satisfy the following inequalities:

max_{1≤k≤K_1+1} r̄_k ≤ q_0 ≤ Σ_{k=1}^{K_1+1} r̄_k, (2.5)

where q_0 is the number of transformed factors, r̄_k is the column rank of Λ^0_k and r_k is the number of original factors F_{t,k} when η^c_{k−1} + 1 ≤ t ≤ η^c_k.
Remark 2.1. A representation similar to Proposition 2.1 is also given in Han and Inoue (2015) and Baltagi, Kao, and Wang (2017, 2021). Note that the construction of Λ in the transformation mechanism is not unique due to the factor model identification issue. When each Λ^0_k is of full column rank, we have r̄_k = r_k, and consequently the inequalities in (2.5) simplify to max_{1≤k≤K_1+1} r_k ≤ q_0 ≤ Σ_{k=1}^{K_1+1} r_k. The upper bound for q_0 is achieved if Λ^0_k, k = 1, . . . , K_1 + 1, are linearly independent. Appendix B in the supplementary materials gives a simple motivating example for the above transformation. Proposition 3.1 in Section 3.1 will further explore the limiting behavior of n^{−1} Σ_{t=1}^n F_t F_t^⊤ as n → ∞, which is crucial to the application of the classic PCA method to estimate the transformed factors and factor loadings. Through the above transformation, it is sufficient to construct the CUSUM statistic using the estimated transformed factors for all of cases (i)–(iii) discussed in Section 2.1.
As the common factors are latent, we have to obtain their estimates (subject to an appropriate rotation) in practice. By Proposition 2.1, we consider the transformed factor model (2.4) with time-invariant factor loadings and apply the PCA estimation technique. Let X_n = (X_1, . . . , X_n)^⊤ be an n × d matrix of observations. For the time being, we assume that the number of transformed factors q_0 is known, and will discuss how to determine it later in this section. The estimated factors F̂_t as well as the estimated idiosyncratic errors ε̂_t will be used to construct the CUSUM statistics in Sections 2.3 and 2.4. The algorithm for PCA estimation is given as follows.

The idiosyncratic component ε_t is approximated by

ε̂_tj = X_tj − λ̂_j^⊤ F̂_t, j = 1, . . . , d, (2.7)

where X_tj is the jth element of X_t and λ̂_j is the estimated transformed factor loading vector for the jth series.
Output: F̂_t, ε̂_t for t = 1, . . . , n.

A key issue in the PCA estimation is to appropriately choose the number of latent factors. As the above PCA method focuses on the transformed factor model, our aim is to estimate the number of transformed factors rather than that of the original ones. In general, the number of transformed factors q_0 could be much larger than the maximum of r_k (or r̄_k) over k = 1, . . . , K_1 + 1. However, when both (r_k, k = 1, . . . , K_1 + 1) and K_1 are assumed to be fixed, it follows from (2.5) in Proposition 2.1 that the true number of transformed factors is upper bounded by a finite positive integer denoted by Q. Some existing criteria developed for stable factor models can be applied to the transformed factor model (2.4) to obtain a consistent estimate of q_0. In this article, we use the commonly used information criterion proposed by Bai and Ng (2002).
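For concreteness, a minimal sketch of the PCA estimation step is given below; the normalization F̂^⊤F̂/n = I_q and the least-squares loading step are standard choices for this kind of algorithm and are assumptions here, not necessarily the exact implementation in the BSCOV package:

```python
import numpy as np

def pca_factors(X, q):
    """PCA sketch: eigen-decompose X X' / (n d); the estimated factors are
    sqrt(n) times the leading q eigenvectors (so that F'F/n = I_q), loadings
    follow by least squares, and the idiosyncratic errors are the residuals
    as in (2.7)."""
    n, d = X.shape
    vals, vecs = np.linalg.eigh(X @ X.T / (n * d))  # ascending eigenvalues
    F_hat = np.sqrt(n) * vecs[:, ::-1][:, :q]       # leading q eigenvectors
    Lam_hat = X.T @ F_hat / n
    eps_hat = X - F_hat @ Lam_hat.T                 # eps_tj = X_tj - lam_j'F_t
    return F_hat, Lam_hat, eps_hat

rng = np.random.default_rng(2)
n, d, q = 300, 60, 3
X = rng.standard_normal((n, q)) @ rng.standard_normal((d, q)).T \
    + 0.3 * rng.standard_normal((n, d))
F_hat, Lam_hat, eps_hat = pca_factors(X, q)
print(np.allclose(F_hat.T @ F_hat / n, np.eye(q)))  # True: normalization holds
print(np.mean(eps_hat ** 2) < 0.15)                 # residuals near noise level
```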
For any 1 ≤ q ≤ Q, we let F̂_n(q) = (F̂_1(q), . . . , F̂_n(q))^⊤ be the estimated factors obtained in Algorithm 1 with q_0 replaced by q, and let Λ̂(q) be the corresponding d × q estimated factor loading matrix. Define

V(q) = (nd)^{−1} Σ_{t=1}^n || X_t − Λ̂(q) F̂_t(q) ||_2^2,

which equals the sum of the eigenvalues μ_j(X_n X_n^⊤/(nd)) over j > q, where μ_j(·) denotes the jth largest eigenvalue. Consequently, we can choose the following objective function:

IC(q) = log(V(q)) + q · ((n + d)/(nd)) · log(nd/(n + d)), (2.8)

and obtain the estimate q̂ via

q̂ = argmin_{0 ≤ q ≤ Q} IC(q), (2.9)

where the common components disappear when q = 0. Alternative IC functions with different penalty terms in (2.8) can be found in Bai and Ng (2002) and in Alessi, Barigozzi, and Capasso (2010) for the case of large idiosyncratic disturbances. With the asymptotic results given in Theorem 2 of Bai and Ng (2002), we may show that q̂ is a consistent estimator of the true number q_0, that is, q̂ = q_0 with probability converging to one as the sample size increases. Although accurate estimation of the factor number may be difficult in practice, slight over-estimation of the (transformed) factor number may not be risky in break detection (see Barigozzi, Cho, and Fryzlewicz 2018). In Appendix F of the supplementary materials, we provide a simulation to show that over-estimating q_0 would not negatively affect the accuracy of multiple break detection. It is possible to avoid precisely estimating the factor number by following Barigozzi, Cho, and Fryzlewicz's (2018) method, multiplying the estimated factors and factor loadings to obtain the estimated common components. However, this would lead to a high-dimensional break detection problem for the common components.
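A minimal sketch of factor-number selection along the lines of (2.8)–(2.9); the IC2-type penalty q((n+d)/(nd))log(nd/(n+d)) is one of the Bai–Ng penalty choices and is an assumption here:

```python
import numpy as np

def select_factor_number(X, Q):
    """Minimize log(V(q)) + q * penalty over q = 0, ..., Q, where V(q) is the
    mean squared PCA residual and the penalty is a Bai-Ng-type term."""
    n, d = X.shape
    vals, vecs = np.linalg.eigh(X @ X.T / (n * d))
    vecs = vecs[:, ::-1]                           # descending eigen-order
    pen = ((n + d) / (n * d)) * np.log(n * d / (n + d))
    best_q, best_ic = 0, np.inf
    for q in range(Q + 1):
        if q == 0:
            V = np.mean(X ** 2)                    # no common component
        else:
            F_hat = np.sqrt(n) * vecs[:, :q]
            Lam_hat = X.T @ F_hat / n
            V = np.mean((X - F_hat @ Lam_hat.T) ** 2)
        ic = np.log(V) + q * pen
        if ic < best_ic:
            best_q, best_ic = q, ic
    return best_q

rng = np.random.default_rng(3)
n, d, q0 = 400, 80, 2
X = rng.standard_normal((n, q0)) @ rng.standard_normal((d, q0)).T \
    + rng.standard_normal((n, d))
print(select_factor_number(X, Q=6))    # recovers q0 = 2 in this strong-factor toy
```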

Break Detection in the Common Components
We start with the Binary Segmentation for Covariance (BS-Cov) technique to detect breaks in the common components and then introduce the WBS-Cov algorithm. As breaks in the factor loadings and the factor number can be transformed to breaks in the covariance of the transformed factors after reformulating the approximate factor model as one with time-invariant factor loadings, we define a CUSUM statistic using the estimated factors: for 1 ≤ l ≤ s < u ≤ n,

CF_{l,u}(s) = sqrt( (u − s)(s − l + 1)/(u − l + 1) ) · [ (s − l + 1)^{−1} Σ_{t=l}^s vech(F̂_t F̂_t^⊤) − (u − s)^{−1} Σ_{t=s+1}^u vech(F̂_t F̂_t^⊤) ], (2.10)

which is a column vector with dimension q_0(q_0 + 1)/2. Maximizing ||CF_{l,u}(s)||_2 over s, we obtain

η̂ = argmax_{l ≤ s < u} ||CF_{l,u}(s)||_2, (2.11)

the first candidate break point (which is not necessarily the estimate of the first break time point η^c_1). If the quantity ||CF_{l,u}(η̂)||_2 exceeds a certain threshold, say ξ^c_n, we split the interval at η̂, estimate the next candidate break points on the two subintervals, and examine whether a further split is needed. We continue until no more splits are needed.
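A toy sketch illustrates how the maximizer of the l_2-aggregated CUSUM contrast of the vech series locates a covariance break; the split convention and the simulated factors are illustrative assumptions:

```python
import numpy as np

def vech(A):
    """Half-vectorization: stack the lower-triangular part column by column."""
    return np.concatenate([A[k:, k] for k in range(A.shape[0])])

def cusum_contrast(Z, l, u, s):
    """CUSUM contrast of the vech series Z on rows l..u (0-based, inclusive),
    comparing the averages before and after the split point s."""
    n1, n2 = s - l + 1, u - s
    diff = Z[l:s + 1].mean(axis=0) - Z[s + 1:u + 1].mean(axis=0)
    return np.sqrt(n1 * n2 / (u - l + 1)) * diff

# Toy factor series whose variance quadruples after t = 149 (0-based).
rng = np.random.default_rng(4)
F = np.concatenate([rng.standard_normal((150, 2)),
                    2.0 * rng.standard_normal((150, 2))])
Z = np.array([vech(np.outer(f, f)) for f in F])       # vech(F_t F_t')
stats = [np.linalg.norm(cusum_contrast(Z, 0, 299, s)) for s in range(10, 290)]
cand = 10 + int(np.argmax(stats))
print(cand)   # candidate break point near 149
```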

Remark 2.2.
(i) It may be possible to extend (2.11) to a weighted version (see Hariz, Wylie, and Zhang 2007; Yu and Chen 2021) with a weight exponent −1/2 < υ ≤ 1/2. A smaller υ may help detect breaks that are close to the boundary. For simplicity, we only consider the unweighted version, that is, υ = 0, as a nonzero υ would complicate the asymptotic analysis. (ii) In the above BS-Cov, we use the l_2-type aggregation of the CUSUM quantities. Alternatively, we may use different norms in the aggregation such as the l_1-norm and l_∞-norm. It may also be possible to consider a weighted aggregation and seek an optimal projection direction as recommended by Wang and Samworth (2018). However, it is not clear how to implement the optimal direction method in a computationally efficient way for break detection in our model setting, so we do not pursue it in the present article. In Appendix F of the supplementary materials, we compare the numerical performance of different types of aggregation of the CUSUM quantities and find that the l_2-type aggregation produces the most accurate detection results. (iii) The l_2-aggregation in (2.11) ignores the possible correlation structure between elements of the vector CUSUM statistic. Such a simplification does not affect estimation consistency of the break number and locations. For break detection in fixed-dimensional covariance matrices, Aue et al. (2009) propose a weighted quadratic form of the CUSUM vector using an estimated long-run covariance matrix. This refinement of the l_2-aggregation may improve break estimation efficiency. However, consistent estimation of the long-run covariance structure of vech(F̂_t F̂_t^⊤) is difficult when there are unknown multiple breaks.
In order to accurately estimate the break times and remove possible spurious spikes in the CUSUM statistics, instead of using BS-Cov, we next extend the WBS algorithm introduced by Fryzlewicz (2014) to detect the multiple structural breaks in the common components. Let η̂^c_1, . . . , η̂^c_{K̂_1} be the estimated break points (arranged in increasing order) and K̂_1 be the estimate of the unknown break number K_1 obtained by the following WBS-Cov algorithm. A discussion on the selection of the threshold ξ^c_n can be found in Section 4. The method for drawing random intervals in the WBS-Cov algorithm is the same as that in Fryzlewicz (2014) (see also Algorithm 4 in Section 4). Fryzlewicz (2020) provides an alternative recursive method of drawing random intervals in the WBS algorithm. In general, the intervals drawn in Algorithm 2 do not have to be random but can be taken over a fixed grid.

Break Detection in the Idiosyncratic Components
We next turn to the detection of multiple structural breaks in the covariance structure of the idiosyncratic error vector, which is much more challenging as the dimension involved can be ultra large and ε_t is unobservable. In the existing literature, to obtain reliable estimation of ultra-high-dimensional covariance matrices (without breaks), it is common to impose a certain sparsity condition limiting the number of nonzero entries. Consequently, thresholding and other generalized shrinkage methods (Bickel and Levina 2008a; Rothman, Levina, and Zhu 2009; Cai and Liu 2011) have been developed to estimate high-dimensional covariance matrices. A similar thresholding technique is also introduced by Cho and Fryzlewicz (2015) to detect multiple structural breaks in a high-dimensional time series setting. We next generalize the latter to detect breaks in the large covariance structure of the idiosyncratic error vector, using the approximation of ε_t obtained in the PCA algorithm. Define the normalized CUSUM-type statistic using the ε̂_tj defined in (2.7): for 1 ≤ l ≤ s < u ≤ n and 1 ≤ i, j ≤ d,

ĉ^σ_{l,u}(s; i, j) = σ̂_{l,u}(i, j)^{−1} · sqrt( (u − s)(s − l + 1)/(u − l + 1) ) · | (s − l + 1)^{−1} Σ_{t=l}^s ε̂_ti ε̂_tj − (u − s)^{−1} Σ_{t=s+1}^u ε̂_ti ε̂_tj |, (2.12)

where σ̂_{l,u}(i, j) is a properly chosen scaling factor. The value of the scaling factor depends on both the time interval [l, u] and the index pair (i, j). The scaling factors ensure that we can choose a common threshold for all index pairs. Note that no scaling factor is needed for the CUSUM statistic (2.10) used to detect breaks in the common components, since the estimated factors have been normalized. We choose σ̂_{l,u}(i, j) as the differential error median absolute deviation computed from the ε̂_tj, whose detailed construction is given in Section 4. To derive the consistency result for break detection, we assume that the random scaling factors are uniformly bounded away from zero and infinity over 1 ≤ i, j ≤ d and 1 ≤ l < u ≤ n with probability tending to one (see Assumption 5 in Appendix A).
As the dimension d → ∞ and might be much larger than the sample size n (see Assumption 4(i)), a direct aggregation of the CUSUM statistics ĉ^σ_{l,u}(s; i, j) over i and j might not perform well in detecting the breaks, in particular when the sparsity condition is imposed. Hence, we compare ĉ^σ_{l,u}(s; i, j) with a thresholding parameter, say ξ^e_n, and delete the index pairs whose maximum CUSUM statistics fall below it, leading to the sparsified statistic

Ĉ_{l,u}(s) = Σ_{1 ≤ j ≤ i ≤ d} [ĉ^σ_{l,u}(s; i, j)]^2 · I( max_{l ≤ t < u} ĉ^σ_{l,u}(t; i, j) > ξ^e_n ), (2.13)

where I(·) is an indicator function. It is easy to see that the truncation component in (2.13) is independent of the random subintervals [l_m, u_m]. Therefore, if max_{l ≤ t < u} ĉ^σ_{l,u}(t; i, j) ≤ ξ^e_n, the algorithm no longer searches for breaks within the interval [l, u) for the index pair (i, j). The main difference between our method and that in Cho and Fryzlewicz (2015) is that they sparsify the classical Binary Segmentation to locate the break points and estimate the break number, whereas we combine the sparsified CUSUM quantity with the WBS-Cov algorithm introduced in Section 2.3. Thus, we call the proposed algorithm the Wild Sparsified Binary Segmentation for Covariance, or WSBS-Cov. The detailed algorithm is given as follows.
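The sparsified aggregation can be sketched as follows; the scale proxy and the threshold value below are illustrative assumptions rather than the exact choices of the article:

```python
import numpy as np

def mad_scale(z):
    """Robust scale proxy: median absolute deviation of first differences
    (a stand-in for the 'differential error MAD' described in Section 4)."""
    d = np.diff(z)
    return np.median(np.abs(d - np.median(d))) / 0.6745 + 1e-12

def scaled_cusum_max(z):
    """Maximum over split points of the scaled CUSUM of a product series z."""
    n = z.size
    c = np.cumsum(z)
    s = np.arange(1, n)
    stats = np.sqrt(s * (n - s) / n) * np.abs(c[:-1] / s
                                              - (c[-1] - c[:-1]) / (n - s))
    return stats.max() / mad_scale(z)

def sparsified_stat(E, xi):
    """Sketch in the spirit of (2.13): sum the squared scaled CUSUM maxima
    only over index pairs whose maximum exceeds the threshold xi."""
    total = 0.0
    for i in range(E.shape[1]):
        for j in range(i + 1):
            m = scaled_cusum_max(E[:, i] * E[:, j])
            if m > xi:
                total += m ** 2
    return total

rng = np.random.default_rng(5)
E = rng.standard_normal((200, 10))
E[100:, 0] *= 3.0                       # variance break in one coordinate only
result = sparsified_stat(E, xi=8.0)
print(result > 0.0)                     # True: the breaking pair survives
```

The thresholding step is what keeps the aggregation informative when only a few of the d(d+1)/2 entries actually break.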
Note that the sparsified CUSUM statistic Ĉ_{l_m,u_m}(s) cannot take any value strictly between 0 and (ξ^e_n)^2.

Large-Sample Theory
In this section we establish the large-sample theory for the methods proposed in Section 2 under some technical assumptions listed in Appendix A. The asymptotic properties for the WBS-Cov and WSBS-Cov methods are given in Sections 3.1 and 3.2, respectively.

Asymptotic Theory of WBS-Cov for the Common Components
We start with the following proposition on some fundamental properties of the transformed factors F_t and factor loadings Λ in (2.4).
Proposition 3.1. Suppose that Assumption 2 in Appendix A is satisfied and let κ^c_n = min_{1≤k≤K_1+1} (η^c_k − η^c_{k−1}). The transformed factor loading matrix Λ is of full column rank and the transformed factors F_t satisfy || Σ_{t=1}^n F_t F_t^⊤ ||_F = O_P(n) when κ^c_n → ∞. If, in addition, κ^c_n ≍ n, there exists a q_0 × q_0 positive definite matrix Σ_F such that

n^{−1} Σ_{t=1}^n F_t F_t^⊤ → Σ_F in probability. (3.1)

The above proposition shows that the sample second-order moment of the transformed factors converges in probability under the regularity conditions in Assumption 2. The convergence result (3.1) is sometimes imposed directly as a high-level condition in the literature (e.g., Assumption A1 in Ma and Su 2018). The restriction κ^c_n ≍ n is crucial to ensure the positive definiteness of Σ_F. If the order of the minimum distance between breaks is smaller than n, the limit matrix Σ_F may be singular.
We next study the asymptotic theory for the WBS-Cov algorithm in Section 2.3 to detect breaks in the common components. Define an infeasible CUSUM statistic using the latent transformed factors: for 1 ≤ l ≤ s < u ≤ n,

C̃F_{l,u}(s) = sqrt( (u − s)(s − l + 1)/(u − l + 1) ) · [ (s − l + 1)^{−1} Σ_{t=l}^s vech(H^⊤ F_t F_t^⊤ H) − (u − s)^{−1} Σ_{t=s+1}^u vech(H^⊤ F_t F_t^⊤ H) ],

where H is a q_0 × q_0 rotation matrix defined by

H = (Λ^⊤Λ/d)(F_n^⊤F̂_n/n) D_{q_0}^{−1},

in which F_n = (F_1, . . . , F_n)^⊤, D_{q_0} is a q_0 × q_0 diagonal matrix with diagonal elements being the first q_0 largest eigenvalues of X_n X_n^⊤/(nd) arranged in descending order, and Λ = (λ_1, . . . , λ_d)^⊤ with λ_j being a q_0-dimensional vector of transformed factor loadings. The following theorem gives the convergence result for the WBS-Cov estimates, where κ^c_n is defined in Proposition 3.1 and ω^c_n denotes the minimum break size (in the common components) defined in Assumption 4(ii). Under the technical conditions in Appendix A, there exists a positive constant ι^c such that, with probability tending to one,

K̂_1 = K_1 and max_{1 ≤ k ≤ K_1} | η̂^c_k − η^c_k | ≤ ι^c ϕ^c_n, (3.6)

where ϕ^c_n = log^4 n / ω^c_n.
Remark 3.1. The convergence result (3.6) shows that the proposed WBS-Cov estimator of the break points in the common components has the approximation rate ϕ^c_n = log^4 n/ω^c_n, which simplifies to log^4 n if the minimum break size ω^c_n is bounded below by a small positive constant. Note that, even if the factors are observable, the optimal estimation rate of the break points in their covariance structure is O_P(1) (Korostelev 1988; Aue et al. 2009) when binary segmentation is used in break detection. Hence, our approximation rate in Theorem 3.1 is nearly optimal up to a fourth-order logarithmic factor. Since we remove the Gaussian assumption on the observations and further allow temporal dependence in the data over time, our approximation rate is slightly slower than that in Theorem 3.2 of Fryzlewicz (2014).

Asymptotic Theory of the WSBS-Cov for the Idiosyncratic Components
We next present the asymptotic theory of structural break estimation in the idiosyncratic error components. Similarly to Section 3.1, we define the following infeasible CUSUM-type statistics using the unobservable ε_tj: for 1 ≤ l ≤ s < u ≤ n and 1 ≤ i, j ≤ d,

c̃^σ_{l,u}(s; i, j) = σ̂_{l,u}(i, j)^{−1} · sqrt( (u − s)(s − l + 1)/(u − l + 1) ) · | (s − l + 1)^{−1} Σ_{t=l}^s ε_ti ε_tj − (u − s)^{−1} Σ_{t=s+1}^u ε_ti ε_tj |, (3.9)

where σ̂_{l,u}(i, j) is defined as in (2.12). Proposition 3.3 plays a crucial role in our asymptotic analysis, indicating that the CUSUM statistics ĉ^σ_{l,u}(s; i, j) can be replaced by the infeasible c̃^σ_{l,u}(s; i, j), which are much easier to handle in the proofs. The following theorem establishes the convergence result for the estimates of the break points and break number in the covariance structure of the idiosyncratic components.
Remark 3.2. The rate in (3.11) depends on both n and d, making it substantially different from the convergence result in Theorem 3.1. If the dimension diverges to infinity at a polynomial rate of n (see Barigozzi, Cho, and Fryzlewicz 2018) and κ^e_n ≍ n, the rate ϕ^e_{n,d} becomes log^4 n/ω^e_n, slightly faster than that obtained in Theorem 3 of Barigozzi, Cho, and Fryzlewicz (2018) when the breaks are sparse in the covariance of the idiosyncratic errors.

Practical Issues in the Detection Procedure
In this section, we first extend the so-called Strengthened Schwarz Information Criterion (SSIC) to our model setting to determine the number of breaks, and then discuss the choice of the random scaling quantity σ̂_{l,u}(i, j) in (2.12) and the thresholding parameter ξ^e_n in WSBS-Cov. The SSIC is proposed by Fryzlewicz (2014) in the context of univariate WBS, and some modifications are needed to make it applicable to our setting. For break detection in the covariance structure of the common components, as the algorithm proceeds with k candidate break points, we denote them (arranged in increasing order) by η̂^c_{1|k}, . . . , η̂^c_{k|k}, with the convention η̂^c_{0|k} = 0 and η̂^c_{k+1|k} = n. Letting ZF_t = vech(F̂_t F̂_t^⊤) = (ZF_{t,1}, . . . , ZF_{t,q_0(q_0+1)/2})^⊤ and ZF_{·,j} = (ZF_{1,j}, . . . , ZF_{n,j})^⊤ for j = 1, . . . , q_0(q_0+1)/2, we define the SSIC objective function for the univariate process ZF_{·,j} by

SSIC^c_j(k) = (n/2) log( n^{−1} Σ_{t=1}^n [ZF_{t,j} − Z̄F_{t,j}(k)]^2 ) + k p(n),

where Z̄F_{t,j}(k) is the sample mean of ZF_{s,j} over η̂^c_{i|k} + 1 ≤ s ≤ η̂^c_{i+1|k} when η̂^c_{i|k} + 1 ≤ t ≤ η̂^c_{i+1|k}, i = 0, . . . , k, p(n) = n^{1/2}, and K̄ is a positive constant which is a prespecified upper bound for the break number. If there exists a positive integer K^c ≤ K̄ such that

SSIC^c_j(K^c + 1) > SSIC^c_j(K^c) for all j = 1, . . . , q_0(q_0 + 1)/2, (4.2)

then no further break exists in any dimension of the transformed factors. Let K̂_1 be the smallest number such that (4.2) is satisfied, and stop the WBS-Cov algorithm in Section 2.3 when the number of detected breaks reaches K̂_1. By using the SSIC, we avoid choosing the tuning parameter ξ^c_n in WBS-Cov. Similarly, we can also use the SSIC to obtain the estimated break number for the idiosyncratic error components: let SSIC^e_{i,j}(k) be defined similarly to SSIC^c_j(k) but with ZF_{t,j} and Z̄F_{t,j}(k) replaced by Zε_{t,ij} and Z̄ε_{t,ij}(k) for all 1 ≤ i, j ≤ d, and thus we obtain the estimated break number for the idiosyncratic error components. In the construction of the CUSUM statistic in WSBS-Cov, we need to define the random scaling quantity σ̂_{l,u}(i, j).
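The SSIC-type model selection can be sketched on a univariate series; the objective below (n/2 times the log residual variance after piecewise demeaning, plus k·p(n)) follows Fryzlewicz's (2014) strengthened SIC up to details assumed here:

```python
import numpy as np

def ssic(z, breaks, p):
    """SSIC-type objective for a univariate series z with a candidate break
    set: (n/2) * log(residual variance after piecewise demeaning)
    + (number of breaks) * p."""
    n = z.size
    edges = [0] + sorted(breaks) + [n]
    resid = np.concatenate([z[a:b] - z[a:b].mean()
                            for a, b in zip(edges[:-1], edges[1:])])
    return 0.5 * n * np.log(np.mean(resid ** 2)) + len(breaks) * p

rng = np.random.default_rng(7)
z = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(2.0, 1.0, 150)])
p = np.sqrt(300)                        # p(n) = n^{1/2}
print(ssic(z, [150], p) < ssic(z, [], p))                 # True: break helps
print(ssic(z, [150], p) < ssic(z, [100, 150, 200], p))    # True: no over-fit
```

The penalty p(n) = n^{1/2} is what stops the criterion from adding spurious breaks, mirroring the stopping rule (4.2).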
In the numerical studies, we let $\hat\sigma_{l,u}(i,j)$ be the differenced-error median absolute deviation, defined as the (suitably rescaled) median absolute deviation of the first differences $\hat Z^\varepsilon_{t+1,ij} - \hat Z^\varepsilon_{t,ij}$ over $l \le t < u$, in which $\mathrm{med}_{l\le t<u}(\cdot)$ denotes the sample median over the time interval $[l,u)$ (e.g., Fryzlewicz 2014). An advantage of using the median absolute deviation is that it provides robust scale estimation.
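A hedged sketch of such a robust scale estimate, assuming the usual MAD-of-first-differences construction with a Gaussian consistency constant (the article's exact definition may differ in its rescaling):

```python
import numpy as np

def diff_mad(z, l, u):
    """Median absolute deviation of the first differences of z over [l, u).

    The divisor sqrt(2) * 0.6745 makes the estimate consistent for the
    standard deviation of z under Gaussian noise (an assumption of this
    sketch): differencing inflates the scale by sqrt(2), and 0.6745 is
    the Gaussian MAD-to-sd constant.
    """
    d = np.diff(np.asarray(z[l:u], dtype=float))
    return np.median(np.abs(d - np.median(d))) / (np.sqrt(2) * 0.6745)
```

Because medians are insensitive to a small number of level shifts, this estimate stays bounded even when the interval $[l,u)$ contains a break, which is exactly why a differenced MAD is preferred to a plain sample standard deviation here.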
When implementing the WBS-Cov and WSBS-Cov algorithms, we have to guarantee that the random intervals are sufficiently long to facilitate break detection. When $l$ and $u$ are too close, $\hat\sigma_{l,u}(i,j)$ may be unbounded, leading to a violation of Assumption 5 in Appendix A. Consequently, the CUSUM statistics on such small intervals would fall below the thresholding parameter, so that no breaks could be detected. Therefore, as in Barigozzi, Cho, and Fryzlewicz (2018), we introduce a trimming quantity $\Delta_n$ to control the minimum length of the random intervals. Specifically, Algorithm 4 is used to produce the set of random intervals in WBS-Cov or WSBS-Cov; we set $M_n = M^c_n = M^e_n = 400$ in the numerical studies. By Algorithm 4, the random intervals have length at least $4\Delta_n$. In addition, we also trim each random interval to keep a distance of $\Delta_n$ from both boundaries, that is, we only maximize the CUSUM statistics over the interval $[l_m + \Delta_n, u_m - \Delta_n]$. Note that $\Delta_n$ should be much smaller than $\kappa^c_n$ and $\kappa^e_n$. Thus, we choose $\Delta_n = (\log^2 n) \wedge (0.25\, n^{6/7})$ following the discussion in Barigozzi, Cho, and Fryzlewicz (2018).

Algorithm 4 Drawing random intervals
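A hedged sketch of such an interval-drawing step, assuming rejection sampling against the minimum-length rule described above (function and argument names are illustrative, not the article's):

```python
import numpy as np

def draw_intervals(n, m, min_len, rng=None):
    """Draw m random intervals [l, u] from {1, ..., n} with u - l >= 4 * min_len.

    Sketch of an interval-drawing step in the spirit of Algorithm 4:
    endpoint pairs are sampled uniformly and rejected until the interval
    is long enough; the CUSUM is later maximised only over the trimmed
    range [l + min_len, u - min_len].
    """
    rng = rng or np.random.default_rng()
    intervals = []
    while len(intervals) < m:
        l, u = sorted(rng.integers(1, n + 1, size=2))
        if u - l >= 4 * min_len:
            intervals.append((l, u))
    return intervals
```

With $n = 400$ and $\Delta_n$ around 10, most uniformly drawn pairs already satisfy the length constraint, so the rejection loop terminates quickly.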
A crucial issue in constructing the sparsified CUSUM statistic in the WSBS-Cov algorithm is the choice of the thresholding parameter $\xi^e_n$. Cho and Fryzlewicz (2015) suggest choosing this parameter using the CUSUM statistic of a stationary process (e.g., a simulated AR(1) process) under the null hypothesis of no breaks. However, in practical applications with high-dimensional time series and latent breaks, it is often difficult to recover the underlying stationary structure. To address this problem, we propose an alternative approach to choosing $\xi^e_n$: (i) pre-detect the breaks (e.g., by combining the classical BS-Cov and the SSIC, which does not require knowing $\xi^e_n$ a priori) and remove them from the high-dimensional covariance matrices through segment-wise demeaning based on the estimated breaks, that is, generate $\hat Z^\varepsilon_{t,ij} - \hat Z^\varepsilon_{t,ij}(\hat K^e_n)$ as in (4.3), where $\hat K^e_n$ is the preliminary estimate of the break number obtained in the pre-detection of the idiosyncratic components; (ii) calculate the CUSUM statistics of the "stationary" process $\hat Z^\varepsilon_{t,ij} - \hat Z^\varepsilon_{t,ij}(\hat K^e_n)$ for each pair $(i,j)$ and take the maximum as the thresholding parameter $\xi^e_n$. This technique is more computationally intensive than formula-based threshold selection procedures; however, we find that it performs well in the numerical studies reported in Section 5.
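Steps (i) and (ii) can be sketched for a single coordinate series as follows. This is a hedged illustration with a textbook CUSUM transform, assuming the preliminary breaks are already available from the pre-detection step:

```python
import numpy as np

def cusum(z, l, u):
    """|CUSUM| statistics of z over [l, u), one value per candidate split b."""
    seg = np.asarray(z[l:u], dtype=float)
    n = len(seg)
    s = np.cumsum(seg)
    b = np.arange(1, n)  # split after position b
    stat = (np.sqrt((n - b) / (n * b)) * s[b - 1]
            - np.sqrt(b / (n * (n - b))) * (s[-1] - s[b - 1]))
    return np.abs(stat)

def threshold_from_predetection(z, prelim_breaks):
    """Sketch of the two-step threshold choice: demean z on the segments
    given by preliminary break estimates, then take the maximum CUSUM of
    the resulting roughly stationary residual as the threshold."""
    z = np.asarray(z, dtype=float)
    resid = z.copy()
    knots = [0] + sorted(prelim_breaks) + [len(z)]
    for a, b in zip(knots[:-1], knots[1:]):
        resid[a:b] -= resid[a:b].mean()
    return cusum(resid, 0, len(resid)).max()
```

In the article, the maximum in step (ii) is also taken over all pairs $(i,j)$, so a single threshold serves all $d(d+1)/2$ coordinate series.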

Monte Carlo Simulation
In this section, we provide a simulation study comparing the finite-sample performance of the proposed methods with the method of Barigozzi, Cho, and Fryzlewicz (2018), and examine the performance of our methods under a weak factor structure. Additional simulation studies are given in Appendix F of the supplementary materials, where the proposed methods are compared with various other competing methods that we designed; each of these differs from the method presented in the main text in only one step. Like the ablation studies commonly conducted in deep learning research, these comparisons help to gain a better understanding of our methods. Since the alternative methods are not designed for practical use, nor are they theoretically justified, we keep them in the supplementary materials.
We use the factor model (5.1) to generate data, where $\theta > 0$ controls the signal-to-noise ratio. The replication number in each simulation case is set to $R = 100$. For the 100 simulated samples, we report the estimated number of breaks as well as the accuracy measure for each break location $\eta_k$, defined by
$$\mathrm{ACU}_k = \frac{1}{R}\sum_{i=1}^{R} I\Big(\min_{l=1,\ldots,\hat K(i)} \big|\hat\eta_l(i) - \eta_k\big| \le \log n\Big), \qquad (5.2)$$
for $k = 1, \ldots, K$, where $K$ denotes the break number $K_1$ or $K_2$, $\hat K(i)$ is the corresponding break number estimate in the $i$th simulated sample, $\eta_k$ denotes either $\eta^c_k$ or $\eta^e_k$, and $\hat\eta_l(i)$, $l = 1, \ldots, \hat K(i)$, are the break point estimates in the $i$th simulated sample.

Example 5.1. We use model (5.1) to generate the data, where $\theta = 0.5$, the number of factors is $r = 5$, the sample size is $n = 400$, and the dimension is $d = 200$. The factor process $F_t = (F_{t1}, \ldots, F_{t5})'$ is generated from a multivariate normal distribution $N_5(0, \Sigma^*_F)$ independently over $t$, where $\Sigma^*_F$ is a covariance matrix with one break specified as follows: for $1 \le t \le \eta^c_1 = 133$, $\phi^F_j$, the square root of the $j$th diagonal element of $\Sigma^*_F$, is independently generated from a uniform distribution $U(0.5, 1.5)$, and the $(i,j)$-entry of $\Sigma^*_F$ is defined as $\phi^F_i \phi^F_j (0.5)^{|i-j|}$ for $1 \le i \ne j \le 5$; for $\eta^c_1 < t \le n$, the $(1,2)$- and $(2,1)$-entries of $\Sigma^*_F$ change from $0.5\phi^F_1\phi^F_2$ to $0.9\phi^F_1\phi^F_2$, and $\phi^F_5$ is replaced by $1.3\phi^F_5$, resulting in structural breaks in the $(5,j)$- and $(j,5)$-entries of $\Sigma^*_F$. For $1 \le t \le \eta^c_2 = 267$, the factor loadings $\lambda_{ij}$ are independently generated from a uniform distribution $U(-1,1)$; for $\eta^c_2 < t \le n$, the factor loadings corresponding to the first two factors are regenerated from a new uniform distribution $U(-1,1)$. Therefore, there are two structural breaks in the covariance structure of the common components, covering cases (i) and (ii) discussed in Section 2.1.
The idiosyncratic errors $\varepsilon_t$ follow a multivariate normal distribution $N_d(0, \Sigma_\varepsilon)$ independently over $t$, where $\phi^\varepsilon_j$, the square root of the $j$th diagonal element of $\Sigma_\varepsilon$, is generated from an independent uniform distribution $U(0.5, 1.5)$, and the off-diagonal $(i,j)$-entries of $\Sigma_\varepsilon$ are specified in a similar fashion. There are three breaks, $\eta^e_1 = n/4 = 100$, $\eta^e_2 = n/2 = 200$, and $\eta^e_3 = 3n/4 = 300$, in the idiosyncratic error components. At each of the break points, we swap the orders of $\varrho_1 d/2$ randomly selected pairs of elements of $\varepsilon_t$, with $\varrho_1$ chosen as 1, 0.5, and 0.1. Note that $\varrho_1 = 0.1$ indicates that the structural breaks are sparse in the high-dimensional error components, whereas $\varrho_1 = 1$ indicates that the breaks are dense. Table 1 reports the simulation results, where "W(S)BS-Cov" denotes our proposed methods and "BCF" denotes the method of Barigozzi, Cho, and Fryzlewicz (2018), which combines a wavelet-based transformation with the double-CUSUM method. We find that the proposed WBS-Cov method has better finite-sample performance in detecting breaks in the common components, with a more accurate estimated break number and a higher ACU, whereas BCF tends to miss the first break and thus under-estimates the number of breaks. For break detection in the idiosyncratic error components, when $\varrho_1 = 1$, the proposed WSBS-Cov method and BCF perform equally well. However, when $\varrho_1 = 0.5$ and 0.1, the WSBS-Cov method clearly outperforms BCF. We next consider an example with breaks under a weak factor structure. When the factor loadings in the transformed factor model (2.4) satisfy $\mu_k = o_P(d)$, we call the $k$th factor a weak factor. Although our asymptotic framework (see Assumption 2(ii)) rules out this scenario, we provide the following simulation to examine the effect of weak factors on the proposed break detection methods.
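Before turning to the weak-factor example, the design of Example 5.1 can be sketched as follows. This is a hedged illustration: the way $\theta$ enters the observation equation and the pair-swap density (set to 0.5 here) are assumptions made for concreteness.

```python
import numpy as np

def simulate_example(n=400, d=200, r=5, theta=0.5, rng=None):
    """Sketch of the Example 5.1 design: a factor-covariance break at
    t = 133, a loading break for the first two factors at t = 267, and
    coordinate-pair swaps in the errors at t = 100, 200, 300."""
    rng = rng or np.random.default_rng(0)
    phi = rng.uniform(0.5, 1.5, r)
    # Toeplitz-type factor covariance: phi_i * phi_j * 0.5^|i-j|
    cov1 = np.outer(phi, phi) * 0.5 ** np.abs(np.subtract.outer(np.arange(r), np.arange(r)))
    cov2 = cov1.copy()
    cov2[0, 1] = cov2[1, 0] = 0.9 * phi[0] * phi[1]  # (1,2)-entry break
    cov2[4, :] *= 1.3                                # phi_5 -> 1.3 * phi_5 ...
    cov2[:, 4] *= 1.3                                # ... applied to row and column
    F = np.vstack([rng.multivariate_normal(np.zeros(r), cov1, 133),
                   rng.multivariate_normal(np.zeros(r), cov2, n - 133)])
    lam1 = rng.uniform(-1, 1, (d, r))
    lam2 = lam1.copy()
    lam2[:, :2] = rng.uniform(-1, 1, (d, 2))         # loading break at t = 267
    common = np.vstack([F[:267] @ lam1.T, F[267:] @ lam2.T])
    eps = rng.standard_normal((n, d))
    for brk in (100, 200, 300):                      # swap 0.5 * d/2 coordinate pairs
        idx = rng.permutation(d)[: d // 2]
        i, j = idx[: d // 4], idx[d // 4:]
        eps[brk:, i], eps[brk:, j] = eps[brk:, j].copy(), eps[brk:, i].copy()
    return common + theta * eps
```

For simplicity the sketch uses independent standard normal errors rather than the correlated $\Sigma_\varepsilon$ of the article, since the swaps alone already create the covariance breaks of interest.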
Example 5.2. We use model (5.1) with a setting similar to Freyaldenhoven (2021) to generate the data, where the number of factors is $r = 9$, $\theta = 1.5$, the sample size is $n = 400$, and the dimension is $d = 200$. The factor process $F_t$ is generated from a multivariate normal distribution $N_9(0, I_9)$ independently over $t$. The $j$th factor only affects $d_j$ elements of the $d$-dimensional observation vector, so the factor loading matrix is sparse with $d_j$ nonzero entries in its $j$th column. The positions of the nonzero entries are randomly selected, and their values are drawn from $N(1,1)$ independently. The values of $(d_1, \ldots, d_9)$ are chosen so that the factors have varying strength. We set two breaks for the common components, $\eta^c_1 = 100$ and $\eta^c_2 = 300$, and consider nine cases in the simulation, where case $j$ represents a setup in which breaks only occur in the $j$th column of the factor loading matrix. Specifically, independently for the three periods $1 \le t \le 100$, $101 \le t \le 300$, and $301 \le t \le 400$, the positions of the nonzero entries in the $j$th factor loading vector are randomly selected, with their values independently regenerated. For the idiosyncratic error components, as in Freyaldenhoven (2021), we let $\varepsilon_t = A_\varepsilon e_t$ and generate $e_t = (e_{t1}, \ldots, e_{td})'$ from an autoregressive scheme driven by innovations $u_{e,ti}$ drawn independently from $N(0,1)$, with $(\rho_e, \rho_v) = (0.3, 0.1)$. We set two break points on $A_\varepsilon$: $\eta^e_1 = 150$ and $\eta^e_2 = 250$. For $1 \le t \le 150$, we first generate a tridiagonal matrix $\tilde A$ with all diagonal entries equal to one and the other nonzero entries drawn from $U(0.5, 1.5)$, and then standardize $\tilde A$ by rescaling each row by a constant so that its sum of squares is one, which defines $A_\varepsilon$. The matrix $A_\varepsilon$ is randomly regenerated in the same way for $151 \le t \le 250$ and $251 \le t \le 400$.
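The row-normalized tridiagonal mixing matrix used for the idiosyncratic errors can be sketched as follows (a hedged illustration; the article regenerates such a matrix independently on each of the three segments):

```python
import numpy as np

def random_row_normalized_tridiagonal(d, rng=None):
    """Tridiagonal matrix with unit diagonal and off-diagonal entries from
    U(0.5, 1.5), rescaled row-by-row so each row has unit sum of squares,
    as in the mixing matrix of Example 5.2."""
    rng = rng or np.random.default_rng()
    A = np.eye(d)
    A[np.arange(d - 1), np.arange(1, d)] = rng.uniform(0.5, 1.5, d - 1)  # superdiagonal
    A[np.arange(1, d), np.arange(d - 1)] = rng.uniform(0.5, 1.5, d - 1)  # subdiagonal
    return A / np.linalg.norm(A, axis=1, keepdims=True)
```

The unit row sums of squares keep the marginal error variances comparable before and after the matrix is regenerated, so the breaks at $\eta^e_1$ and $\eta^e_2$ show up in the cross-sectional correlation pattern rather than in the scale.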
As shown by Freyaldenhoven (2021), the classic PCA estimators are consistent only for the "relevant" factors whose corresponding $d_j$ is of an order larger than $d^{1/2}$. Hence, the number of relevant factors is 6 in the simulated weak factor model. By the factor model transformation mechanism in Proposition 2.1, we expect the number of factors to be 8 in cases 1-6 (6 relevant factors plus 2 extra factors due to the factor transformation accommodating breaks), and 6 in cases 7-9. Table 2 reports the simulation results for break detection. We find that the factor number is over-estimated in all cases. The break numbers and locations in both the common and idiosyncratic components are in general correctly estimated when breaks occur in the relatively strong relevant factors (cases 1-3). The break detection results are less accurate when the breaks are in the loadings of relatively weak relevant factors (cases 4-6). In cases 7-9, since weak factors with breaks in their loadings are not selected as relevant factors by the information criterion, the proposed WBS-Cov method cannot detect either of the two breaks in the common components.

An Empirical Application
In this section, we use the proposed methods to detect breaks in the contemporaneous covariance structure of the daily returns of S&P 500 constituents.

Break Detection in Covariances
The data are retrieved from the Refinitiv Datastream database (formerly Thomson Reuters Datastream), covering the period from January 01, 2000 to February 28, 2021. We consider 375 firms listed on the S&P 500 index over the entire period. Let $X_{it}$ denote the demeaned percentage price change (without reinvestment of dividends) of equity $i$ at time $t$, where $i = 1, \ldots, 375$ and $t = 1, \ldots, 5322$. We fit the factor model (2.2) to the data with $X_t = (X_{1t}, \ldots, X_{375,t})'$, allowing structural breaks in the covariance matrices of both the common and idiosyncratic components.
The information criterion introduced in Section 2.2 selects nine factors (with appropriate transformation), that is, $\hat q = 9$. We set the trimming parameter to 20 and $M^c_n = M^e_n = 1000$ in the WBS-Cov and WSBS-Cov algorithms. Combined with the SSIC, the WBS-Cov algorithm detects five breaks in the covariance matrix of the common components, and the WSBS-Cov algorithm detects three breaks in the idiosyncratic components. We plot the estimated break times on the series of the S&P 500 index and its returns in Figure 1, where the red vertical lines denote breaks in the common components and the blue dotted vertical lines denote breaks in the idiosyncratic components. Among the five breaks in the common components, two occur during the period of a bearish market from 2000 to 2003, two occur in 2008 and 2009 during the global financial crisis, and one occurs in March 2020 due to the COVID-19 outbreak. Among the three breaks in the idiosyncratic errors, the first occurs around the end of 2002, and the second and third occur in 2008 and 2009.

Calibration
We next provide a simulation using the factor model with parameters calibrated from the empirical study, aiming to further examine the proposed break detection methods in finite samples.
• Generate simulated datasets with n = 5322 and d = 375, matching the real data.
• Consider six sub-periods separated by the five estimated break dates in the common components (see the red lines in Figure 1) and use the PCA-estimated factor loadings for each sub-period. The factors are drawn from a standard normal distribution independently over time and across dimensions.
• Divide the estimated idiosyncratic errors into four segments split by the three estimated breaks in the idiosyncratic components (see the blue dotted lines in Figure 1), and separately estimate their sample covariance matrices. For each segment, generate idiosyncratic errors from a multivariate normal distribution independently over time, with mean zero and covariance matrix equal to the segment sample covariance matrix.
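The steps above can be sketched as follows; this is a hedged illustration in which the segment loadings and covariances are taken as given inputs (in the article they are the PCA loading estimates and segment sample covariances from the real data):

```python
import numpy as np

def simulate_calibrated(loadings_by_segment, seg_lengths,
                        err_cov_by_segment, err_seg_lengths, rng=None):
    """Sketch of the calibration design: within each common-component
    segment, X_t = Lambda_s F_t + e_t with F_t ~ N(0, I); within each
    error segment, e_t ~ N(0, Sigma_s) for a segment covariance Sigma_s.
    The two segmentations may differ, but must cover the same n rows."""
    rng = rng or np.random.default_rng()
    common = np.vstack([
        rng.standard_normal((m, L.shape[1])) @ L.T
        for L, m in zip(loadings_by_segment, seg_lengths)
    ])
    errors = np.vstack([
        rng.multivariate_normal(np.zeros(S.shape[0]), S, size=m)
        for S, m in zip(err_cov_by_segment, err_seg_lengths)
    ])
    return common + errors
```

Because the factors are drawn as standard normals, all second-moment structure of the common components is carried by the segment loadings, which is what makes the simulated covariance structure match the empirical one segment by segment.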
The above generating mechanism guarantees that the simulated covariance structure is the same as that of the empirical data. As in the simulation study in Section 5, we repeat the procedure 100 times and report the break detection results. For simplicity, we fix $\xi^e_n = 95.99$ in the WSBS-Cov, calculated by applying the method in Section 4 to the real data. Table 3 reports the average break number estimates, the number of times that an estimated break is within a distance of $\log(n) \approx 8.6$ of the true break (denoted ACU), and the root-mean-squared distance of the closest estimated break to the true break (denoted RMSE). Except for the first two breaks (April 24, 2000 and November 25, 2002) in the common components, the ACUs are larger than 95% and the RMSEs are smaller than 4, indicating that the break dates are accurately detected. The low ACUs and large RMSEs for the first two breaks may be due to their relatively weak break magnitudes. The estimated break numbers reported in Table 3 are slightly larger than the true break numbers (5 for the common components and 3 for the idiosyncratic components). In addition, we also fix $q_0 = 12$ in the break detection procedures and obtain similar simulation results, showing that the proposed methods work even when the factor number is over-estimated.

Comparison with Observed Risk Factors
As a benchmark, we also download eight observable risk factors (over January 01, 2000-February 28, 2021) from Kenneth R. French's data library and study breaks in their covariance structure. These risk factors include the excess market return (MKT), the size factor (SMB), the value factor (HML), the profitability factor (RMW) and the investment factor (CMA) introduced by Fama and French (1993, 2015); the momentum factor (MOM) proposed by Carhart (1997) in a four-factor model; and the short-term reversal factor (STRev) and the long-term reversal factor (LTRev), which are less famous but equally important (see De Bondt and Thaler 1987; Lehmann 1990). They are often used as observable proxies of the latent factors in the approximate factor model. We plot the observed factors in Figure 2, from which we can see that they vary drastically during the global financial crisis and the COVID-19 outbreak. Let $Z_t = (Z_{1t}, \ldots, Z_{8t})'$ denote the vector containing the eight risk factors, $t = 1, \ldots, 5322$. As the dimension of $Z_t$ is low, there is no need to impose the approximate factor model framework, and we apply the WBS-Cov algorithm in Section 2.3 (without any latent factor structure) directly to $Z_t$. Note that the first two estimated break dates (April 19, 2001 and July 01, 2008) are quite different from the first two break dates (April 24, 2000 and November 25, 2002) in the common components obtained by applying the proposed WBS-Cov to the estimated factors; see Figure 1.

Figure 1. The detected break times in the covariance structure of the 375 daily returns of the S&P 500 constituents from January 01, 2000 to February 28, 2021. The estimated break dates for the common components are April 24, 2000, November 25, 2002, September 12, 2008, May 08, 2009 and March 06, 2020; the estimated break dates for the idiosyncratic components are August 06, 2002, September 11, 2008 and May 11, 2009 (in blue dotted lines).
Table 4 shows the canonical correlations between the estimated factors and the observed factors for the entire sample and for the six subsamples divided by the estimated break dates in the common components. p-values are reported in brackets, using the F-approximation of Rao's test statistic related to Wilks' Lambda (see Rao 1973). Some of the minimum canonical correlations are small and insignificant, indicating that the space spanned by the eight observed factors may not be the same as that spanned by the nine PCA-estimated factors.
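For reference, canonical correlations between two sets of series can be computed with a standard QR-plus-SVD routine; a hedged sketch (not the article's code):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the column spaces of the demeaned
    n x p and n x q matrices X and Y: orthonormalize each block with a
    reduced QR, then take the singular values of the cross-product."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against tiny numerical overshoot
```

If one block is a nonsingular linear transform of the other, all canonical correlations equal one, which is why small minimum correlations in Table 4 indicate genuinely different factor spaces.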
To further analyze whether the breaks in the common components can be explained by the observed factors, we regress the stock returns on the eight observed factors and detect whether breaks exist in the covariances of the regression residuals. We apply the factor model (2.2) to the residual vectors and use the information criterion to select six factors. Using the WBS-Cov, we detect six breaks in the common components (of the residual vectors): April 20, 2000, November 22, 2002, July 07, 2008, September 26, 2008, May 08, 2009 and March 06, 2020, and one break in the idiosyncratic components: August 12, 2002. Note that the first two breaks in the common components, April 20, 2000 and November 22, 2002, are very close to April 24, 2000 and November 25, 2002, respectively, which are reported in Figure 1. Hence, we may conclude that break detection on a low-dimensional set of observed risk factors cannot fully capture the structural breaks in the covariance of price changes in the entire stock market.

Conclusion
In this article, we detect and estimate multiple structural breaks in the contemporaneous covariance structure of a high-dimensional time series vector. We allow high correlation between the time series variables by imposing the approximate factor model, a commonly used framework in economics and finance. The structural breaks occur in either the common or the idiosyncratic error components. When there are breaks in the factor loadings or factor numbers, we transform the factor model to make the classic PCA method applicable and subsequently obtain estimates of the transformed factors and an approximation of the idiosyncratic errors. The CUSUM quantities are constructed from the second-order moments of the estimated factors and errors. The WBS algorithm introduced by Fryzlewicz (2014), combined with the SSIC, is extended to the multivariate setting to detect breaks in the covariance structure of the common components, estimating the location and number of breaks. For the idiosyncratic error components, we combine the WBS-Cov algorithm with the sparsified CUSUM statistic to detect structural breaks (which can be either sparse or dense), extending the SBS algorithm proposed by Cho and Fryzlewicz (2015). Under some regularity conditions, we show consistency of the estimated break numbers and derive near-optimal rates (up to a fourth-order logarithmic factor) for the estimated breaks. Monte Carlo simulation studies examine the finite-sample performance of the proposed WBS-Cov and WSBS-Cov methods and compare them with the method introduced by Barigozzi, Cho, and Fryzlewicz (2018). In addition, we examine the performance of the proposed methods when the breaks are on weak factors. In the empirical application to the return series of the S&P 500 constituents, we detect five breaks in the common components and three breaks in the idiosyncratic error components using the developed methods.

Appendix A: Technical Assumptions
The following assumptions are needed to establish the asymptotic theorems in Sections 3.1 and 3.2. Some of the conditions (such as the moment conditions) may not be the weakest possible, but relaxing them would come at the cost of lengthier proofs.
The assumption of mixing dependence on the latent common factor and idiosyncratic error processes in Assumption 1(i) is very mild and covers some commonly used vector time series models. Note that the α-mixing dependence condition allows the underlying process to be nonstationary with a time-varying covariance structure (e.g., a piecewise stationary time series process with breaks in the covariance matrix). The moment conditions in Assumptions 1(ii) and 3 are crucial for consistently estimating the factors and factor loadings (up to rotation) in the transformed factor model (2.4).
Assumption 2 contains some fundamental conditions for the approximate factor model (2.2), which can be seen as generalized versions of those for the stable factor model without breaks (e.g., Bai and Ng 2002; Fan, Liao, and Mincheva 2013). These conditions, together with $\kappa^c_n \asymp n$ in Assumption 4(ii), indicate that $\frac{1}{n}\sum_{t=1}^{n} F_t F_t'$ has a nonsingular limiting matrix and the loading matrix is of full column rank (see Proposition 3.1), which are essential for deriving the convergence results for the PCA-estimated factors and factor loadings. In addition, Assumption 2 also guarantees that $\bar\omega^c_n = O(1)$. The moment conditions (A.2) and (A.3) in Assumption 3(i) imply that the idiosyncratic errors $\varepsilon_{tj}$ are allowed to be weakly dependent over $j$. Similar high-level moment conditions can be found in Assumption 4 of Bai and Ng (2002) and Assumption 3.4 of Fan, Liao, and Mincheva (2013). Assumption 3(ii) is similar to the exponential-tail condition commonly used in high-dimensional data analysis and is needed to allow the dimension $d$ to diverge at an exponential rate of $n$.
Assumption 4(i) indicates that both $d$ and $n$ diverge to infinity jointly, a typical large panel data setting necessary for consistent estimation of the rotated factors and factor loadings. A very similar assumption is used in Theorem 1 of Fan, Liao, and Mincheva (2013). In particular, Assumption 4(i) shows that $d$ can diverge to infinity at an exponential rate of $n$, further relaxing the corresponding condition assumed in Barigozzi, Cho, and Fryzlewicz (2018) and Wang, Yu, and Rinaldo (2021). Although the restriction $n \le \iota_3 d^{\delta/(\delta+4)}$ indicates that $n$ is of smaller asymptotic order than $d$, our methodology is still applicable when $n$ is much larger than $d$, by switching the roles of $n$ and $d$ and the roles of the factor loadings and common factors in the PCA estimation (see sec. 3 in Bai and Li 2012). Assumption 4(ii) is mainly used for the asymptotics of the WBS-Cov in the common components, and would be automatically satisfied when $\iota_4 \le \underline\omega^c_n \le \bar\omega^c_n \le \iota_5$ and $M^c_n \asymp \log n$, where $\iota_4 < \iota_5$ are two positive constants. Assumption 4(iii) is mainly used to derive the WSBS-Cov asymptotic theory in the idiosyncratic error components; the restriction $\underline\omega^e_n \asymp \bar\omega^e_n$ is only imposed to facilitate the proofs. The restrictions on $\omega^c_n$ and $\omega^e_n$ are consistent with the argument in Andrews (1993) and Bai (1997) that break points can be identified only if the size of the breaks, that is, $(\omega^c_n)^{1/2}$ and $(\omega^e_n)^{1/2}$ in our article, is asymptotically larger than $n^{-1/2}$. Assumption 4(iii) also indicates that breaks may occur more frequently in the idiosyncratic error components than in the common components, with $\kappa^e_n \asymp n^{\vartheta}$ and $\omega^e_n\, n^{5\vartheta-4}/\log^4(nd) \to \infty$ for $\vartheta \in (4/5, 1]$. In contrast, the minimum length $\kappa^c_n$ of the subintervals separated by break points in the common components is of order $n$. Assumption 5 requires the random normalizers in the WSBS-Cov to be uniformly bounded away from zero and infinity with probability approaching one.
In Assumption 2(i), we assume that $K_1$ and $r_k$, $k = 1, \ldots, K_1$, are all bounded, which, by Proposition 2.1, indicates that $q_0$, the number of transformed factors in (2.4), is bounded as well. This assumption facilitates the proofs of the consistent estimation theory for the transformed factors and factor loadings and of the consistent break estimation theory. It would be an interesting extension to allow the break numbers $K_1$ and $K_2$ to grow slowly as $n \to \infty$. In this case, $q_0$ would be divergent as well, and the condition $\kappa^c_n \asymp n$ would be violated. Consequently, we would need to amend the estimation and break detection techniques, and modify some technical assumptions (such as Assumption 4) and the asymptotic proofs. We will explore this further in future work.

Supplementary Materials
The supplementary materials contain the detailed proofs of the main asymptotic results, a simple motivating example for the factor model transformation, as well as additional simulation studies.