Simultaneous multiple change-point and factor analysis for high-dimensional time series

We propose the first comprehensive treatment of high-dimensional time series factor models with multiple change-points in their second-order structure. We operate under the most flexible definition of piecewise stationarity, and estimate the number and locations of change-points consistently as well as identify whether they originate in the common or idiosyncratic components. Through the use of wavelets, we transform the problem of change-point detection in the second-order structure of a high-dimensional time series into the (relatively easier) problem of change-point detection in the means of high-dimensional panel data. Also, our methodology circumvents the difficult issue of the accurate estimation of the true number of factors in the presence of multiple change-points by adopting a screening procedure. We further show that consistent factor analysis is achieved over each segment defined by the change-points estimated by the proposed methodology. In extensive simulation studies, we observe that factor analysis prior to change-point detection improves the detectability of change-points, and identify and describe an interesting `spillover' effect in which substantial breaks in the idiosyncratic components get, naturally enough, identified as change-points in the common components, which prompts us to regard the corresponding change-points as also acting as a form of `factors'. Our methodology is implemented in the R package {\tt factorcpt}, available from CRAN.

Factor modelling, in which the individual elements of a high-dimensional time series are modelled as sums of a common component (a linear combination of a small number of possibly unknown factors), plus each individual element's own idiosyncratic noise, is a well-established technique for dimension reduction in time series. Time series factor models are classified, in relation to the effect of factors on the observed time series, into 'static' (only a contemporaneous effect, see e.g., Stock and Watson (2002); Bai and Ng (2002); Bai (2003)) or 'dynamic' (lagged factors may also have an effect, see e.g., Forni and Lippi (2001); Hallin and Lippi (2013)) factor models.
It is increasingly recognised that in several important application areas, such as those mentioned at the beginning of this section, nonstationary time series data are commonly observed.
Arguably the simplest realistic departure from stationarity, which also leads to sparse and interpretable time series modelling, is piecewise-stationarity, in which the time series is modelled as approximately stationary between neighbouring change-points, and changing its distribution (e.g., the mean or covariance structure) at each change-point.
The main aim of this work is to provide the first comprehensive framework for the estimation of time series factor models with multiple change-points in their second-order structure. The existing literature on time series factor modelling has only partially embraced nonstationarity.
One way in which a change-point is typically handled in the literature is via the assumption that the structural break in the loadings is 'moderate' and it affects only a limited number of series, so that it does not adversely impact the quality of traditional stationary principal component analysis (PCA)-type estimation (see e.g., Stock and Watson, 2002, 2009; Bates et al., 2013).
However, opinions diverge on the empirically relevant degrees of temporal instability in the factor loadings, and several authors observe that 'large' changes in the stochastic data-generating mechanism have the potential to severely distort the estimation of the factor structure. Investigations into the effect of a single break in the loadings or the number of factors on the factor structure, with accompanying change-point tests and estimators for its location, can be found in Breitung and Eickmeier (2011), Chen et al. (2014), Han and Inoue (2014), Corradi and Swanson (2014), Yamamoto and Tanaka (2015), Baltagi et al. (2017), Bai et al. (2017) and Massacci (2017). Lasso-type estimation is considered for change-point analysis under factor modelling in Cheng et al. (2016) and Ma and Su (2016): the former concerns single change-point detection in the loadings and the number of factors, while the latter considers multiple change-point detection in loadings only. Note that the $\ell_1$-penalty of the Lasso is not optimal for change-point detection, as investigated in Brodsky and Darkhovsky (1993) and Cho and Fryzlewicz (2011). In summary, apart from Ma and Su (2016) and Sun et al. (2016) (the latter considers factor models with multiple change-points but for a small number of time series only), the existing change-point methods proposed for factor models focus on detecting a single break of a particular type, namely a break in the loadings or the number of factors.
We now describe in detail the contribution and findings of this work, at the same time giving an overview of the organisation of the paper.
(a) We propose a comprehensive methodology for the consistent estimation of multiple change-points in the second-order structure of a high-dimensional time series governed by a factor model. This is in contrast to the substantial time series factor model literature, which is overwhelmingly concerned with testing for a single change-point. In practice, the possibility of the presence of multiple change-points cannot be ruled out from a dataset consisting of observations over a long stretch of time, as illustrated in our applications to financial and macroeconomic time series data in Section 8. (b) Our estimators are 'interpretable' in the sense that they enable the identification of whether each change-point originates from the common or idiosyncratic components. Through simulation studies (Section 7), it is demonstrated that in high-dimensional time series segmentation, factor analysis prior to change-point detection improves the detectability of change-points that appear only in either the common or the idiosyncratic components. (c) We derive a uniform convergence rate for the PCA-based estimator of the common components under the factor model with multiple change-points in Theorems 1 and 2 (Section 3.1). A key to the derivation of the theoretical results is the introduction of the 'capped' PCA estimator of the common components, which controls for the possible contribution of spurious factors to individual common components even when the number of factors is over-specified.
(d) Through the use of wavelets, we transform the problem of change-point detection in the second-order structure of a high-dimensional time series, into the (relatively easier) problem of change-point detection in the means of high-dimensional panel data. More specifically, in the first stage of the proposed methodology, we decompose the observed time series into common and idiosyncratic components, and we compute wavelet transforms for each component (separately), see e.g., Nason et al. (2000), Fryzlewicz and Nason (2006) and Van Bellegem and von Sachs (2008) for the use of locally stationary wavelet models for time series data. In this way, any change-point in the second-order structure of common or idiosyncratic components is detectable as a change-point in the means of the wavelet-transformed series (Sections 3.2 and 3.3).
(e) Each of the panels of transformed common and idiosyncratic components serves as an input to the second stage of our methodology, which requires an algorithm for multiple change-point detection in the means of high-dimensional panel data. For this, a number of methodologies have been investigated in the literature, such as Horváth and Hušková (2012), Enikeeva and Harchaoui (2015), Jirak (2015), Cho and Fryzlewicz (2015) and Wang and Samworth (2018). In Section 4.1, we adopt the Double CUSUM Binary Segmentation procedure proposed in Cho (2016), which achieves consistency in estimating the total number and locations of the multiple change-points while permitting both within-series and cross-sectional correlations. In Section 4.2, we prove that this consistency result carries over to the consistency in multiple change-point detection in the common and idiosyncratic components, as the dimensions of the data, n and T , diverge (Theorem 3).
(f) Motivated by the theoretical finding noted in (c), our methodology is equipped with a step that screens the results of change-point analysis over a range of factor numbers employed for the estimation of common components (Section 4.3). It enables us to circumvent the challenging problem of accurately estimating the true number of factors, in contrast to much of the existing literature, and plays a key role in enabling the proposed methodology to achieve consistent change-point estimation.
(g) Once all the change-points are estimated, we show that the common components are consistently estimated via PCA on each estimated segment (Section 5). Such a result is new to the literature on factor models with structural breaks.
(h) We identify and describe an interesting 'spillover' effect in which change-points attributed to large breaks in the second-order structure of the idiosyncratic components get (seemingly falsely) identified as change-points in the common components (Section 7). We argue that this phenomenon is only too natural and expected, and can be ascribed to the prominent change-points playing the role of 'common factors', regardless of whether they originate in the common or idiosyncratic components. We refer to this new point of view as regarding change-points as factors.
(i) We provide an R package named factorcpt, which implements our methodology. The package is available from CRAN.

[Figure 1: flowchart of the proposed two-stage methodology; Stage 1 and Stage 2 are repeated for each $k \in \mathcal{R}$, applied to $x_{it}$, $1 \le i \le n$.]

An overview of the proposed methodology
We provide an overview of the change-point detection methodology with an accompanying flowchart in Figure 1. For each step, we provide a reference to the relevant section in the paper.
Stage 1: Factor analysis and wavelet transformation.
PCA: $x_{it}$ is decomposed into the common component estimated with $k$ factors ($\hat\chi^k_{it}$) and the idiosyncratic component ($\hat\varepsilon^k_{it}$) via PCA (Section 3.1). Wavelet transformation: $\hat\chi^k_{it}$ and $\hat\varepsilon^k_{it}$ are separately transformed into panels with (almost) piecewise constant signals via wavelet-based transformations $g_j(\cdot)$ and $h_j(\cdot,\cdot)$ (Section 3.3).
Stage 2: High-dimensional panel data segmentation. The joint application of the Double CUSUM Binary Segmentation (Section 4.1) and stationary bootstrap (Section 6.1) algorithms to the panels obtained in Stage 1 returns the sets of change-points detected for the common ($\hat{\mathcal{B}}_\chi(k)$) and idiosyncratic ($\hat{\mathcal{B}}_\varepsilon(k)$) components.
Screening over $k \in \mathcal{R}$: the sets $\hat{\mathcal{B}}_\chi(k)$ and $\hat{\mathcal{B}}_\varepsilon(k)$ are screened to obtain the final estimates $\hat{\mathcal{B}}_\chi$ and $\hat{\mathcal{B}}_\varepsilon$ (Section 4.3).

Notation
For a given $m \times n$ matrix $B$ with $b_{ij} = [B]_{i,j}$ denoting its $(i,j)$th element, its spectral norm is defined as $\|B\| = \sqrt{\mu_1(BB^\top)}$, where $\mu_k(C)$ denotes the $k$th largest eigenvalue of $C$, and its Frobenius norm as $\|B\|_F = \sqrt{\sum_{i,j} b_{ij}^2}$. For a given set $\Pi$, we denote its cardinality by $|\Pi|$. The indicator function is denoted by $\mathbb{I}(\cdot)$. Also, we use the notations $a \vee b = \max(a, b)$ and $a \wedge b = \min(a, b)$. Besides, $a \sim b$ indicates that $a$ is of the order of $b$, and $a \lesssim b$ that $a = O(b)$. We denote an $n \times n$ matrix of zeros by $O_n$.
2 Piecewise stationary factor model

In this section, we define a piecewise stationary factor model which provides a framework for developing our change-point detection methodology. Throughout, we assume that we observe an $n$-dimensional vector of time series $x_t = (x_{1t}, \dots, x_{nt})^\top$, following a factor model and undergoing an unknown number of change-points in its second-order structure. The locations of the change-points are also unknown. Each element $x_{it}$ can be written as
$$x_{it} = \chi_{it} + \varepsilon_{it} = \lambda_i^\top f_t + \varepsilon_{it}, \quad i = 1, \dots, n, \ t = 1, \dots, T, \qquad (1)$$
where $\chi_t = (\chi_{1t}, \dots, \chi_{nt})^\top$ and $\varepsilon_t = (\varepsilon_{1t}, \dots, \varepsilon_{nt})^\top$ denote the common and the idiosyncratic components of $x_t$, with $E(\chi_{it}) = E(\varepsilon_{it}) = 0$ for all $i, t$. We refer to $r$ as the number of factors in model (1). The common components are driven by the $r$ factors $f_t = (f_{1t}, \dots, f_{rt})^\top$ with $E(f_t) = 0$, and $\lambda_i = (\lambda_{i1}, \dots, \lambda_{ir})^\top$ denotes the $r$-dimensional vector of loadings. We denote the $n \times r$ matrix of factor loadings by $\Lambda = [\lambda_1, \dots, \lambda_n]^\top$. The change-points in the second-order structure of $x_t$ are classified into $\mathcal{B}_\chi = \{\eta^\chi_1, \dots, \eta^\chi_{B_\chi}\}$, those in the common components, and $\mathcal{B}_\varepsilon = \{\eta^\varepsilon_1, \dots, \eta^\varepsilon_{B_\varepsilon}\}$, those in the idiosyncratic components.
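To fix ideas, the data-generating process in (1) with a single change-point in the second-order structure of the factors can be simulated as follows. All dimensions, the break location and the break size below are illustrative choices, not ones prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, r = 100, 400, 2      # cross-sectional dimension, sample size, number of factors
eta_chi = 200              # change-point in the common components (illustrative)

Lam = rng.normal(size=(n, r))    # loadings Lambda = [lambda_1, ..., lambda_n]'
f = rng.normal(size=(T, r))      # factors f_t with E(f_t) = 0
f[eta_chi:, :] *= 2.0            # break in the factor variance at t = eta_chi
eps = rng.normal(size=(T, n))    # idiosyncratic components (i.i.d. for simplicity)

chi = f @ Lam.T                  # common components chi_it = lambda_i' f_t
x = chi + eps                    # observed panel x_it = chi_it + eps_it
```

Under this design, the break enters $x_t$ only through the covariance of $f_t$, so it belongs to $\mathcal{B}_\chi$; a break injected into `eps` instead would belong to $\mathcal{B}_\varepsilon$.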
Remark 1. In the factor model (1), both the loadings and the factor number are time-invariant. However, it is well established in the literature on factor models with structural breaks that any change in the number of factors or loadings can be represented by a factor model with stable loadings and time-varying factors. In doing so, $r$, the dimension of the factor space under (1), satisfies $r \ge r_b$, where $r_b$ denotes the minimum number of factors of each segment $[\eta^\chi_b + 1, \eta^\chi_{b+1}]$, $b = 0, \dots, B_\chi$, over which the common components remain stationary. We provide a comprehensive example illustrating this point in Appendix A. Many tests and estimation methods for changes in the loadings rely on this equivalence between the two representations of time-varying factor models, see e.g., Han and Inoue (2014) and Chen et al. (2014). Therefore, we work with the representation in (1), where all change-points in the common components are imposed on the second-order structure of $f_t$ as detailed in Assumption 1 below. At the same time, we occasionally speak of changes in the loadings or the number of factors, referring to those changes in $f_t$ that can be ascribed back to such changes.
We denote by $\beta(t) = \max\{0 \le b \le B_\chi: \eta^\chi_b + 1 \le t\}$ the index of the change-point in $\chi_{it}$ that is nearest to, and strictly to the left of, $t$, and by $\eta^\chi(t) = \max\{\eta^\chi_b: \eta^\chi_b + 1 \le t, \ b = 0, \dots, B_\chi\}$ the latest change-point in $\chi_{it}$ strictly before $t$, with the notational convention $\eta^\chi_0 = 0$ and $\eta^\chi_{B_\chi + 1} = T$. Similarly, $\eta^\varepsilon(t)$ is defined with respect to $\eta^\varepsilon_b \in \mathcal{B}_\varepsilon$, and $\gamma(t)$ denotes the index of the change-point in $\varepsilon_{it}$ that is nearest to $t$ while being strictly to its left. Then, we impose the following conditions on $f_t$ and $\varepsilon_t$.

Assumption 1.
Assumption 1 (i)-(ii) indicates that for each $b = 0, \dots, B_\chi$, the common component $\chi_{it}$ is 'close' to a stationary process $\chi^b_{it}$ over the segment $[\eta^\chi_b + 1, \eta^\chi_{b+1}]$, such that the effect of the transition from one segment to another diminishes at a geometric rate as $t$ moves away from $\eta^\chi_b$, and any $\eta^\chi_b \in \mathcal{B}_\chi$ coincides with a change-point in the autocovariance or cross-covariance matrices of $\chi^{\beta(t)}_t$. The same arguments apply to $\varepsilon_{it}$ under Assumption 1 (iii)-(iv). The treatment of such approximately piecewise stationary $f_t$ and $\varepsilon_t$ is similar to that in Fryzlewicz and Subba Rao (2014).
Note that the literature on factor models with structural breaks has primarily focused on the case of a single change-point in the loadings or the factor number, see e.g., Breitung and Eickmeier (2011), Chen et al. (2014) and Han and Inoue (2014). We emphasise that, to the best of our knowledge, the model (1) equipped with Assumption 1 is the first to offer a comprehensive framework that allows for multiple change-points that are not confined to breaks in the loadings or the emergence of a new factor, but also includes breaks in the second-order structure of the factors and idiosyncratic components.
We now list and motivate the assumptions imposed on the piecewise stationary factor model; see e.g., Stock and Watson (2002), Bai and Ng (2002), Bai (2003), Forni et al. (2009) and Fan et al. (2013) for similar conditions on stationary factor models.
(i) There exists a positive definite $r \times r$ matrix $H$ with distinct eigenvalues, such that $n^{-1} \Lambda^\top \Lambda \to H$ as $n \to \infty$. (ii) There exists $\bar\lambda \in (0, \infty)$ such that $|\lambda_{ij}| < \bar\lambda$ for all $i, j$.
for any sequence of coefficients $\{a_i\}_{i=1}^{n}$ satisfying $\sum_{i=1}^{n} a_i^2 = 1$.
(ii) $\varepsilon_{it}$ are normally distributed.
We adopt the normalisation given in Assumption 2 (i) for the purpose of identification; in general, factors and loadings are recoverable up to a linear invertible transformation only. Similar assumptions are found in the factor model literature, see e.g., equation (2.1) of Fan et al. (2013). In order to motivate Assumptions 2 (i), 3 and 4 (i), we introduce the covariance matrices $\Gamma^b_\chi$, $\Gamma^b_\varepsilon$, $\Gamma_x$, $\Gamma_\chi$ and $\Gamma_\varepsilon$, and denote their eigenvalues (in non-increasing order) by $\mu^b_{\chi,j}$, $\mu^b_{\varepsilon,j}$, $\mu_{x,j}$, $\mu_{\chi,j}$ and $\mu_{\varepsilon,j}$, respectively. Then, Assumptions 2 (i) and 3 imply that $\mu^b_{\chi,j}$, $j = 1, \dots, r_b$, diverge as $n \to \infty$ and, in particular, are of order $n$ for all $b = 0, \dots, B_\chi$. Assumption 3 implicitly rules out change-points which are due to weak changes in the loadings, in the sense that the magnitudes of the changes in the loadings are small, or only a small fraction of $\chi_{it}$ undergoes the change. A similar requirement can be found in e.g., Assumption 1 of Chen et al. (2014) and Assumption 10 of Han and Inoue (2014). We note that Bai et al. (2017) study the consistency of a least squares estimator for a single change-point attributed to a possibly weak break in the loadings in the above sense. In Section 7, we provide numerical results on the effect of the size of the break on our proposed methodology.
Assumption 4 (i) guarantees that, when $a = (a_1, \dots, a_n)^\top$ is a normalised eigenvector of $\Gamma^b_\varepsilon$, the largest eigenvalue of $\Gamma^b_\varepsilon$ is bounded for all $b = 0, \dots, B_\varepsilon$ and $n$, that is, $\mu^b_{\varepsilon,1} < C_\varepsilon$. This is the same assumption as those made in Chamberlain and Rothschild (1983) and Forni et al. (2009), and comparable to Assumption C.4 of Bai (2003) and Assumption 2.1 of Fan et al. (2011a) in the stationary case. Note that Assumption 4 (i) is sufficient to guarantee the commonness of $\chi_{it}$ and the idiosyncrasy of $\varepsilon_{it}$ according to Definitions 2.1 and 2.2 of Hallin and Lippi (2013). Assumption 4 (ii) may be relaxed to allow for $\varepsilon_{it}$ with exponential-type tails, provided that the tail behaviour carries over to the cross-sectional sums of $\varepsilon_{it}$. Note that the normality of the idiosyncratic component does not necessarily imply the normality of the data, since the factors are allowed to be non-normal.
Assumption 5 is commonly found in the factor model literature (see e.g., Assumptions 3.1-3.2 of Fan et al. (2011a)). In particular, the exponential-type tail conditions in Assumptions 2 (ii) and 4 (ii), along with the mixing condition in Assumption 5 (ii), allow us to control the deviation of sample covariance estimates from their population counterparts, via Bernstein-type inequalities (see e.g., Theorem 1.4 of Bosq (1998) and Theorem 1 of Merlevède et al. (2011)).
From (C1)-(C4) above, it is clear that, for the identification of the common and idiosyncratic components, factor models need to be studied in the asymptotic limit where $n \to \infty$ and $T \to \infty$. This requirement, in practice, recommends the use of large cross-sections when applying PCA for factor analysis. In particular, we require the following.

Assumption 7. $n \to \infty$ as $T \to \infty$, with $n = O(T^{\kappa})$ for some $\kappa \in (0, \infty)$.
Under Assumption 7, we are able to establish the consistency in estimation of the common components as well as that of the change-points in high-dimensional settings where n > T , even when the factor number r is unknown.
Remark 2. The multiple change-point detection algorithm adopted in Stage 2 of our methodology still achieves consistency when Assumption 6 is relaxed to allow $\min\{|\eta^\chi_{b+1} - \eta^\chi_b|, |\eta^\varepsilon_{b+1} - \eta^\varepsilon_b|\} \ge c_\eta T^{\nu}$ for some fixed $c_\eta > 0$ and $\nu \in (6/7, 1]$, with $B \equiv B_T$ increasing in $T$ such that $B_T = O(\log_2 T)$. However, under these relaxed conditions, it is no longer guaranteed that $\Gamma_\chi$ and, consequently, $\Gamma_x$ have $r$ diverging eigenvalues. Therefore, in this paper we limit the scope of our theoretical results to the more restricted setting of Assumption 6.

Estimation of the common and idiosyncratic components
Decomposing x it into common and idiosyncratic components is an essential step for the separate treatment of the change-points in χ it and it , such that we can identify the origins of detected change-points. Therefore, in this section, we establish the asymptotic bounds on the estimation error of the common and idiosyncratic components, when using PCA under the piecewise stationary factor model in (1).
Let $\hat{w}_{x,j}$ denote the $n$-dimensional normalised eigenvector corresponding to the $j$th largest eigenvalue of the sample covariance matrix $\hat\Gamma_x$, with entries $\hat{w}_{x,ij}$, $i = 1, \dots, n$. When the number of factors $r$ is known, the PCA estimator of $\chi_{it}$ is defined as $\hat\chi_{it} = \sum_{j=1}^{r} \hat{w}_{x,ij}\, \hat{w}_{x,j}^\top x_t$, for which the following theorem holds.
Theorem 1. Under Assumptions 1-7, the PCA estimator of $\chi_{it}$ with $r$ known satisfies a uniform error bound over $i = 1, \dots, n$ and $t = 1, \dots, T$.

Proofs of Theorem 1 and all other theoretical results are provided in the Appendix.
Despite the presence of multiple change-points, allowed both in the variance and autocorrelations of $f_t$, the rate of convergence for $\hat\chi_{it}$ is almost as fast as that derived for the stationary case, e.g., in Theorem 3 of Bai (2003), Theorem 1 of Pelger (2015) and Theorem 5 of Aït-Sahalia and Xiu (2017), the latter two in the context of factor modelling of high-frequency data. We highlight that Theorem 1 derives a uniform bound on the estimation error over $i = 1, \dots, n$ and $t = 1, \dots, T$; similar results are found in Theorem 4 of Fan et al. (2013).
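In matrix form, the PCA estimator amounts to projecting each $x_t$ onto the $r$ leading eigenvectors of the sample covariance matrix. A minimal sketch follows; the simulation design is an illustrative assumption, not the paper's setup.

```python
import numpy as np

def pca_common(x, k):
    """PCA estimator of the common components: project each x_t onto the
    k leading eigenvectors of the sample covariance matrix of x (T x n)."""
    T, n = x.shape
    Gamma_x = x.T @ x / T              # sample covariance (zero-mean data assumed)
    _, vecs = np.linalg.eigh(Gamma_x)  # eigenvalues returned in ascending order
    W = vecs[:, ::-1][:, :k]           # k leading normalised eigenvectors w_{x,j}
    return x @ W @ W.T                 # chi_hat_it = sum_j w_{x,ij} w_{x,j}' x_t

# illustrative check on simulated data: with large n and T, the estimator
# tracks the true common components closely
rng = np.random.default_rng(1)
n, T, r = 100, 500, 2
Lam, f = rng.normal(size=(n, r)), rng.normal(size=(T, r))
chi = f @ Lam.T
x = chi + 0.5 * rng.normal(size=(T, n))
chi_hat = pca_common(x, r)
```

The relative mean squared error of `chi_hat` is small in this design, consistent with the uniform consistency discussed around Theorem 1.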
However, the true number of factors $r$ is typically unknown, and its estimation has been studied extensively since the seminal paper by Bai and Ng (2002), whose approach has been shown to retain asymptotic consistency in estimating $r$ even in the presence of a single break in the loadings. Nevertheless, it has been observed that such approaches tend to exhibit poor finite-sample performance when the idiosyncratic components are both serially and cross-sectionally correlated, or when $\mathrm{Var}(\varepsilon_{it})$ is large compared to $\mathrm{Var}(\chi_{it})$. Also, it was noted that the factor number estimates depend heavily on the relative magnitude of $n$ and $T$ and on the choice of penalty terms, as demonstrated in the numerical studies of Bai and Ng (2002) and Ahn and Horenstein (2013).
While the majority of change-point methods for factor models rely on the consistent estimation of the number of factors (Breitung and Eickmeier, 2011;Han and Inoue, 2014;Chen et al., 2014;Corradi and Swanson, 2014), our empirical study on simulated data, available in the supplementary document, indicates that this dependence on a single estimate may lead to failure in change-point detection.
To remedy this, we propose to consider a range of factor number candidates in our change-point analysis, in particular allowing for the over-specification of the number of factors. In high dimensions, the estimated eigenvectors $\hat{w}_{x,j}$ for $j > r$ are in general not consistent, as implied by Theorem 2 of Yu et al. (2015). Thus, estimates of the common components based on more than $r$ principal components may be subject to a non-negligible overestimation error.
In order to control the effect of over-specifying the number of factors, we propose a modified PCA estimator of $\chi_{it}$ defined as $\hat\chi^k_{it} = \sum_{j=1}^{k} \check{w}_{x,ij}\, \hat{w}_{x,j}^\top x_t$, where each element of $\hat{w}_{x,j}$ is 'capped' according to the rule $\check{w}_{x,ij} = \mathrm{sign}(\hat{w}_{x,ij}) \cdot (|\hat{w}_{x,ij}| \wedge c_w n^{-1/2})$ for some fixed $c_w > 0$. The estimated loadings are then given by $\hat\lambda_{ij} = \sqrt{n}\, \check{w}_{x,ij}$ and the factors by $\hat{f}_{jt} = n^{-1/2} \hat{w}_{x,j}^\top x_t$. A practical way of choosing $c_w$ is discussed in Section 6.2.
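Since the loadings are bounded (Assumption 2 (ii)) and $\hat\lambda_{ij} = \sqrt{n}\,\check{w}_{x,ij}$, the natural magnitude of each eigenvector entry is of order $n^{-1/2}$. The sketch below therefore caps entries at $c_w n^{-1/2}$; this threshold and the default $c_w$ are illustrative assumptions rather than the paper's prescribed choice (see Section 6.2 for the latter).

```python
import numpy as np

def capped_pca_common(x, k, c_w=1.0):
    """Capped PCA estimator: chi_hat^k_it = sum_{j<=k} w_check_{x,ij} w_hat_{x,j}' x_t,
    where eigenvector entries are truncated at +/- c_w / sqrt(n).
    The threshold c_w * n**(-1/2) is an assumption made for this illustration."""
    T, n = x.shape
    _, vecs = np.linalg.eigh(x.T @ x / T)
    W = vecs[:, ::-1][:, :k]                              # w_hat_{x,j}, j = 1, ..., k
    W_cap = np.sign(W) * np.minimum(np.abs(W), c_w / np.sqrt(n))
    return (x @ W) @ W_cap.T                              # capped weights on the output side

rng = np.random.default_rng(2)
x = rng.normal(size=(300, 50))
chi_cap = capped_pca_common(x, 3)
```

With a very large `c_w`, no entry is truncated and the estimator reduces to the plain PCA projection, illustrating that the capping is asymptotically inactive on the leading eigenvectors.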
Note that, thanks to the result in (C3) and Lemmas 2 and 3 in the Appendix, the capping asymptotically does not alter the contribution of the $r$ leading eigenvectors of $\hat\Gamma_x$ to $\hat\chi^k_{it}$, even when the capping is applied without the knowledge of $r$. On the other hand, by means of this capping, we control the contribution of the 'spurious' eigenvectors when $k > r$, which allows us to establish the following bound on the partial sums of the estimation errors even when the factor number is over-specified.
Theorem 2. Suppose that Assumptions 1-7 hold, and let $\theta$ be defined as in Theorem 1. Then, the capped estimator $\hat\chi^k_{it}$ satisfies bounds on the partial sums of its estimation errors both (i) when $k = r$ and (ii) when $k > r$.
The bound in (2) concerns the case in which we correctly specify the number of factors, and is in agreement with Theorem 1. Turning to the case where the factor number is over-specified, Forni et al. (2000) (in their Corollary 2) and Onatski (2015) (in his Proposition 1) reported similar results for the stationary case. However, the uniform consistency of $\hat\chi^k_{it}$, $k \ge r$, in the presence of multiple change-points has not, to the best of our knowledge, been spelled out before in the factor model literature; we achieve this via the proposed capping. On the other hand, it is possible to show that with $k < r$, the estimation error in $\hat\chi^k_{it}$ is non-negligible. Lastly, thanks to Lemma 4 in the Appendix, it is straightforward to show that analogous bounds hold for the estimator of the idiosyncratic components. Although the over-specification $k > r$ causes the bound in (2) to increase by $\sqrt{n} \wedge \sqrt{T}$, we can still guarantee that all change-points in $\chi_{it}$ ($\varepsilon_{it}$) are detectable from $\hat\chi^k_{it}$ ($\hat\varepsilon^k_{it}$) provided that $k \ge r$, as shown in Proposition 1 and Theorem 3 below. In what follows, we continue describing our methodology by supposing that some $k \ge r$ is given, and we refer to Section 4.3 for the complete description of the proposed 'screening' procedure that considers a range of values for $k$.

Wavelet periodograms and cross-periodograms
As the first stage of the proposed methodology, we construct a wavelet-based transformation (WT) of the estimated common and idiosyncratic components $\hat\chi^k_{it}$ and $\hat\varepsilon^k_{it}$ from Section 3.1, which serves as an input to the algorithm for high-dimensional change-point analysis in Stage 2. In order to motivate the WT, which will be formally introduced in Section 3.3, we limit our discussion in this section to the (unobservable) common component $\chi_{it}$; the same arguments hold verbatim for the idiosyncratic one. In practice, the WT is performed on the estimated common component $\hat\chi^k_{it}$, and the effect of considering estimated quantities rather than the true ones is studied in Section 3.3. Nason et al. (2000) proposed the use of wavelets as building blocks for nonstationary time series, analogous to the Fourier exponentials in the classical Cramér representation of stationary processes. The simplest example of a wavelet system, the Haar wavelets, is defined as $\psi_{j,l} = 2^{j/2}\, \mathbb{I}(0 \le l \le 2^{-j-1} - 1) - 2^{j/2}\, \mathbb{I}(2^{-j-1} \le l \le 2^{-j} - 1)$, with $j \in \{-1, -2, \dots\}$ denoting the wavelet scale and $l \in \mathbb{Z}$ denoting the location. Small negative values of the scale parameter $j$ denote fine scales where the wavelet vectors are more localised and oscillatory, while large negative values denote coarser scales with longer, less oscillatory wavelet vectors.
Recall the notation $d_{j,it} = \sum_{l=0}^{L_j - 1} \chi^{\beta(t)}_{i,t-l}\, \psi_{j,l}$ for the wavelet coefficients of $\chi^{\beta(t)}_{it}$ with respect to $\psi_j = (\psi_{j,0}, \dots, \psi_{j,L_j-1})^\top$, a vector of discrete wavelets at scale $j$. Note that the support of $\psi_j$ is of length $L_j = M 2^{-j}$ for a fixed $M > 0$ (which depends on the choice of wavelet family), so that we have access to wavelet coefficients from at most $\log_2 T$ scales for a time series of length $T$. In other words, wavelet coefficients are obtained by filtering $\chi^{\beta(t)}_{it}$ with wavelet vectors of finite lengths. Wavelet periodogram and cross-periodogram sequences of $\chi^{\beta(t)}_{it}$ are defined as $I_{j,it} = |d_{j,it}|^2$ and $I_{j,ii't} = d_{j,it}\, d_{j,i't}$. It has been shown that the expectations of these sequences are in one-to-one correspondence with the second-order structure of the input time series, see Cho and Fryzlewicz (2012) for the case of univariate time series and Cho and Fryzlewicz (2015) for the high-dimensional case. To illustrate, suppose that $t - L_j + 1 \ge \eta^\chi(t) + 1$. Then $E|d_{j,it}|^2$ can be expressed in terms of the autocorrelation wavelets $\Psi_j(\tau) = \sum_{l \in \mathbb{Z}} \psi_{j,l}\, \psi_{j,l+\tau}$ (with $\psi_{j,l} = 0$ unless $0 \le l \le L_j - 1$; Nason et al., 2000). That is, provided that $t$ is sufficiently distanced (by $L_j$) from any change-points to its left, $E|d_{j,it}|^2$ is a wavelet transformation of the (auto)covariances $E(\chi^{\beta(t)}_{it}\, \chi^{\beta(t)}_{i,t-\tau})$. Similar arguments hold between the wavelet cross-periodograms and the cross-covariances of $\chi^{\beta(t)}_{it}$. Following Cho and Fryzlewicz (2015), we conclude that under Assumption 1 (ii), any jump in the autocovariance and cross-covariance structures of $\chi^{\beta(t)}_t$ is translated into a jump in the means of its wavelet periodogram and cross-periodogram sequences at some wavelet scale $j$, in the following sense: their change-points coincide with those in $\mathcal{B}_\chi$, apart from intervals of length $L_j$ around the change-points.
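The correspondence between second-order structure and wavelet periodograms is easy to visualise with Haar wavelets: for a series with a variance break, the periodogram at scale $j = -1$ exhibits a break in its mean. A minimal sketch, with all design values (series length, break size) chosen purely for illustration:

```python
import numpy as np

def haar_vector(j):
    """Discrete Haar wavelet at scale j in {-1, -2, ...}: support of length
    2**(-j), first half +2**(j/2), second half -2**(j/2)."""
    Lj = 2 ** (-j)
    psi = np.full(Lj, 2.0 ** (j / 2))
    psi[Lj // 2:] *= -1.0
    return psi

def wavelet_periodogram(x, j):
    """I_{j,t} = |d_{j,t}|**2 with d_{j,t} = sum_l x_{t-l} psi_{j,l};
    returned for t = L_j - 1, ..., len(x) - 1 (enough past observations)."""
    psi = haar_vector(j)
    d = np.convolve(x, psi)[len(psi) - 1: len(x)]  # filtering with psi_j
    return d ** 2

# white noise whose standard deviation triples half-way: the mean of the
# scale -1 periodogram jumps at the change-point
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(size=200), 3.0 * rng.normal(size=200)])
I1 = wavelet_periodogram(x, -1)
```

The empirical mean of `I1` over the second half of the sample sits well above its mean over the first half, mirroring the jump in $E|d_{j,it}|^2$ described above.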
It is reasonable to limit our consideration to wavelets at the $J^*_T$ finest scales $j = -1, \dots, -J^*_T$ (with $J^*_T \le \log_2 T$), in order to control the possible bias in change-point estimation that arises from the transition intervals of length $L_j$. On the other hand, due to the compactness of the support of $\psi_j$, a change in $\Gamma^{\beta(t)}_\chi(\tau)$ that appears only at some large lags ($|\tau| \ge L_{-J^*_T} = M 2^{J^*_T}$) is not detectable as a change-point in the wavelet periodogram and cross-periodogram sequences at the few finest scales ($j \ge -J^*_T$); see (4). To strike a balance between these considerations in the choice of $J^*_T$, we set $J^*_T = C \log_2 \log^{\upsilon} T$ for some $\upsilon \in (0, 1]$ and some fixed $C > 0$; we refer to Section 6.2 for the choice of $C$.

Wavelet-based transformation for change-point analysis
In this section, we propose the WT of the estimated common and idiosyncratic components, and show that the change-points in the complex (autocovariance and cross-covariance) structure of the (unobservable) $\chi_{it}$ and $\varepsilon_{it}$ are made detectable as change-points in the relatively simple structure (means) of the panel of the wavelet-transformed $\hat\chi^k_{it}$ and $\hat\varepsilon^k_{it}$. This panel then serves as an input to Stage 2 of our methodology, as described in Section 4. As in Section 3.2, we limit the discussion of the WT and its properties to the change-point analysis of the common components; the same arguments are applicable to that of the idiosyncratic components.
Then, for each $j = -1, -2, \dots, -J^*_T$, we propose a transformation which takes $\hat\chi^k_{it}$ and produces a panel of $n(n+1)/2$-dimensional sequences with elements $g_j(\hat\chi^k_{it})$ and $h_j(\hat\chi^k_{it}, \hat\chi^k_{i't})$, where $s_{ii'} \in \{-1, 1\}$.

Remark 3. Notice that $g_j(\hat\chi^k_{it})$ is simply the square root of the wavelet periodogram of $\hat\chi^k_{it}$ at scale $j$. Arguments supporting the choice of $h_j$ in place of the wavelet cross-periodogram sequences can be found in Section 3.1.2 of Cho and Fryzlewicz (2015), where it is guaranteed that any change detectable from $d_{j,it} \cdot d_{j,i't}$ can also be detected from $h_j(\hat\chi^k_{it}, \hat\chi^k_{i't})$ with either of $s_{ii'} \in \{1, -1\}$. As per the recommendation made therein, we select $s_{ii'} = -\mathrm{sign}\{\widehat{\mathrm{cor}}(\hat\chi^k_{it}, \hat\chi^k_{i't})\}$, where $\widehat{\mathrm{cor}}(\hat\chi^k_{it}, \hat\chi^k_{i't})$ denotes the sample correlation between $\hat\chi^k_{it}$ and $\hat\chi^k_{i't}$ over $t = 1, \dots, T$. This is an empirical effort to select the $s_{ii'}$ that better brings out any change in $E\{h_j(\chi_{it}, \chi_{i't})\}$ for the given data.
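To make the construction concrete, the sketch below maps a $T \times n$ panel of (estimated) common components into the $n(n+1)/2$ transformed series. The diagonal entries use $g_j$, the square root of the wavelet periodogram as stated in Remark 3; for the pairs $(i, i')$ we combine the two coefficient series as $|d_{j,it} + s_{ii'} d_{j,i't}|$ with $s_{ii'}$ chosen from the sample correlation. The exact functional form of $h_j$ is given in the paper, so this combination rule should be read as an assumption for illustration.

```python
import numpy as np

def wav_coeffs(x, j):
    # Haar wavelet coefficients d_{j,t} = sum_l x_{t-l} psi_{j,l} (illustrative)
    Lj = 2 ** (-j)
    psi = np.full(Lj, 2.0 ** (j / 2)); psi[Lj // 2:] *= -1.0
    return np.convolve(x, psi)[Lj - 1: len(x)]

def transform_panel(chi, j):
    """Map a T x n panel into n(n+1)/2 transformed series: g_j for each i
    (square root of the periodogram) and a sketched h_j for each pair (i, i')
    with s_ii' = -sign(sample correlation), following Remark 3."""
    T, n = chi.shape
    d = np.column_stack([wav_coeffs(chi[:, i], j) for i in range(n)])
    rows = []
    for i in range(n):
        rows.append(np.abs(d[:, i]))                     # g_j: sqrt of periodogram
        for ip in range(i + 1, n):
            s = -np.sign(np.corrcoef(chi[:, i], chi[:, ip])[0, 1])
            rows.append(np.abs(d[:, i] + s * d[:, ip]))  # sketched h_j for (i, i')
    return np.vstack(rows)                               # N x (T - L_j + 1) panel

rng = np.random.default_rng(4)
panel = transform_panel(rng.normal(size=(100, 4)), -1)
```

For $n = 4$ the output has $N = n(n+1)/2 = 10$ rows, which then feed into the segmentation algorithm of Stage 2.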
As with the wavelet periodograms and cross-periodograms discussed in Section 3.2, the transformed panels carry the change-points in the second-order structure of the common components as change-points in their 'signals'. This is formalised in the following proposition.
Proposition 1. Suppose that all the conditions in Theorem 2 hold. For some fixed $k \ge r$, denote by $y_{\ell t}$ a generic element of (5). Then, $y_{\ell t}$ admits a decomposition into a piecewise constant signal $z_{\ell t}$ and an error term, such that: (i) the $z_{\ell t}$ are piecewise constant, with their change-points coinciding with those of the corresponding elements of the second-order structure of the common components; that is, all change-points in $z_{\ell t}$ belong to $\mathcal{B}_\chi$. Therefore, the panel data in (5) contains all change-points in the second-order structure of the common components as change-points in its piecewise constant signals represented by $z_{\ell t}$.
Let us denote by $\tilde{y}_{\ell t}$ an element of the panel obtained by transforming $\chi_{it}$ in place of $\hat\chi^k_{it}$ in (5). The proof of Proposition 1 is then based on a decomposition of the estimation error into three terms, I, II and III. Term I accounts for the discrepancy between $\chi_{it}$ and $\chi^{\beta(t)}_{it}$ and is controlled by Assumption 1 (i); term II follows from the weak dependence structure and the tail behaviour of $f_t$ assumed in Assumptions 2 (ii) and 5; and term III arises from the estimation error in $\hat\chi^k_{it}$, which can be bounded as shown in Theorem 2 and which further motivates the WT of $\hat\chi^k_{it}$ via $g_j$ and $h_j$ rather than the use of its wavelet (cross-)periodograms.
To conclude, note that the WT of ε^k_it are collected into the N-dimensional panel (8) for change-point analysis, where the elements of (8) are also decomposed into piecewise constant signals and error terms with bounded partial sums; see Appendix C, where we present the result analogous to Proposition 1 for the idiosyncratic components.
4 High-dimensional panel data segmentation

4.1 Double CUSUM Binary Segmentation
Cumulative sum (CUSUM) statistics have been widely adopted for change-point detection in both univariate and multivariate data. In order to detect change-points in the N-dimensional additive panel data in (5), we compute N univariate CUSUM series and aggregate the high-dimensional CUSUM series via the Double CUSUM statistic proposed by Cho (2016), which achieves this through point-wise, data-driven partitioning of the panel data. When dealing with multiple change-point detection, the Double CUSUM statistic is used jointly with binary segmentation in an algorithm which we refer to as the Double CUSUM Binary Segmentation (DCBS) algorithm. The DCBS algorithm guarantees consistency in multiple change-point detection in high-dimensional settings, as shown in Theorem 3, while allowing the cross-sectional size of the changes to decrease with increasing sample size (see Assumption 9 below).
Further, it admits both serial and cross-correlations in the data, a case highly relevant for the time series factor model considered in this paper.
Consider the N-dimensional input panel {y_ℓt, ℓ = 1, . . . , N; t = 1, . . . , T}, computed from χ^k_it or ε^k_it via the WT, for the change-point analysis of either the common or the idiosyncratic components; see (5) and (8) for their definitions. We define the CUSUM series of y_ℓt over a generic segment [s, e], for some 1 ≤ s < e ≤ T, as

Y_{s,b,e}(ℓ) = σ̂_ℓ^{−1} {(e − b)(b − s + 1)/(e − s + 1)}^{1/2} { (b − s + 1)^{−1} ∑_{t=s}^{b} y_ℓt − (e − b)^{−1} ∑_{t=b+1}^{e} y_ℓt }

for b = s, . . . , e − 1, where σ̂_ℓ denotes a scaling constant that places all rows of the panel data y_ℓt, 1 ≤ ℓ ≤ N, on an equal footing; see Remark 4 below for its choice. We note that, if the ε_ℓt in (6) were i.i.d. Gaussian random variables, the maximum likelihood estimator of a single change-point in row ℓ would coincide with the maximiser of |Y_{s,b,e}(ℓ)| over b. Proposed in Cho (2016), the Double CUSUM (DC) operator aggregates the N series of CUSUM statistics from y_ℓt and returns a two-dimensional array of DC statistics D_{s,b,e}(m), whose maximum T_{s,e} is compared against a threshold π_{N,T} for determining the presence of a change-point over the interval [s, e]. If T_{s,e} > π_{N,T}, the location of the change-point is identified as the maximiser of the DC statistics over b. Remark 4 (Choice of σ̂_ℓ). Cho (2016) assumes second-order stationarity of the error term ε_ℓt in (6), which enables the use of its long-run variance estimator as the scaling term σ̂_ℓ. However, it is not trivial to define a similar quantity in the problem considered here, particularly due to the possible nonstationarities in ε_ℓt. Following Cho and Fryzlewicz (2015), in order for the CUSUM series computed on y_ℓt not to depend on the level of E(y²_ℓt), we adopt the choice σ̂_ℓ = T^{−1} ∑_{t=1}^{T} y²_ℓt. Note that the asymptotic consistency of the DCBS algorithm in Theorem 3 below does not depend on the choice of σ̂_ℓ, provided that it is bounded away from zero and from above for all ℓ = 1, . . . , N with probability tending to one (see Assumption (A6) of Cho (2016)). By adopting arguments similar to those in the proof of Lemma 6 in Cho and Fryzlewicz (2012), it can be shown that our choice of σ̂_ℓ satisfies these properties.
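For concreteness, the computations above can be sketched in Python. This is an illustrative sketch rather than the {\tt factorcpt} implementation; in particular, the exact form of the DC aggregation (ordering the absolute CUSUM values and contrasting the top-m average against the remainder) follows our reading of Cho (2016) and is an assumption here:

```python
import numpy as np

def cusum_series(Y, s, e):
    """CUSUM statistics Y_{s,b,e}(l) for each row l, b = s, ..., e-1
    (0-based indices, e inclusive). Each row is scaled by its mean of
    squares, the choice of sigma_l discussed in Remark 4."""
    seg = Y[:, s : e + 1]
    sigma = np.mean(Y**2, axis=1)          # computed on the full series
    n = seg.shape[1]
    out = np.empty((Y.shape[0], n - 1))
    csum = np.cumsum(seg, axis=1)
    total = csum[:, -1]
    for k in range(1, n):                   # split after k observations
        left = csum[:, k - 1] / k
        right = (total - csum[:, k - 1]) / (n - k)
        out[:, k - 1] = np.sqrt(k * (n - k) / n) * (left - right) / sigma
    return out                              # shape (N, e - s)

def double_cusum(C):
    """Double CUSUM aggregation of an (N, B) array of CUSUM values: at each
    candidate b, order |C| decreasingly and contrast the mean of the top m
    values against the (2N - m)-scaled remainder (form assumed from Cho (2016))."""
    N = C.shape[0]
    A = np.sort(np.abs(C), axis=0)[::-1]    # decreasing within each column
    csum = np.cumsum(A, axis=0)
    total = csum[-1]
    m = np.arange(1, N + 1)[:, None]
    top = csum / m
    rest = (total - csum) / (2 * N - m)
    return np.sqrt(m * (2 * N - m) / (2 * N)) * (top - rest)  # D[m-1, b]
```

The test statistic T_{s,e} is then the maximum of the returned array over both m and b.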
We now formulate the DCBS algorithm which is equipped with the threshold π N,T . The index u is used to denote the level (indicating the progression of the segmentation procedure) and v to denote the location of the node at each level.

The Double CUSUM Binary Segmentation (DCBS) algorithm
Step 0: Set u = 1 and v = 1, let (s_{1,1}, e_{1,1}) = (1, T), and set B̂ = ∅. Step 1: At the current level u, repeat the following for all v.
Step 1.1: Let (s, e) = (s_{u,v}, e_{u,v}) and compute the CUSUM series Y_{s,b,e}(ℓ) and the DC statistics D_{s,b,e}(m) over [s, e]. Step 1.2: Obtain the test statistic T_{s,e} = max_{b∈[s,e)} max_{1≤m≤N} D_{s,b,e}(m).
Step 1.3: If T_{s,e} ≤ π_{N,T}, quit searching for change-points on the interval [s, e]. On the other hand, if T_{s,e} > π_{N,T}, locate η̂ = arg max_{b∈[s,e)} max_{1≤m≤N} D_{s,b,e}(m), add it to the set B̂ of estimated change-points, and proceed to Step 1.4. Step 1.4: Divide [s, e] into the two sub-intervals [s, η̂] and [η̂ + 1, e], which are examined at level u + 1.
Step 2: Once [s u,v , e u,v ] for all v are examined at level u, set u ← u + 1 and go to Step 1.
Step 1.3 provides a stopping rule for the DCBS algorithm, by which the search for further change-points is terminated once T_{s,e} ≤ π_{N,T} on every segment defined by two adjacent estimated change-points in B̂. Depending on the choice of the input panel data y_ℓt, as in (5) or (8), the DCBS algorithm returns B̂_χ(k) or B̂_ε(k), the sets of change-points detected from χ^k_it and ε^k_it, respectively.

4.2 Consistency in multiple change-point detection
In this section, we show the consistency of change-points estimated for the common and idiosyncratic components by the DCBS algorithm, in terms of their total number and locations.
Consider the panel {y_ℓt, ℓ = 1, . . . , N; t = 1, . . . , T} that represents the WTs of χ^k_it and ε^k_it as in (5) and (8), respectively. In either case, let η_b, b = 1, . . . , B, denote the change-points in the piecewise constant signals z_ℓt underlying y_ℓt (see (7) and (9) for the precise definitions of z_ℓt in the two cases). We impose the following assumptions on the signals z_ℓt and the change-point structure therein.
Assumption 8 requires the expectations of the WTs of χ^b_it and ε^b_it defined in Assumption 1 to be bounded, which in fact holds trivially since ∑_{l=0}^{L_j−1} ψ²_{j,l} = 1 for all j. Assumption 9 imposes a condition on the minimal cross-sectional size of the changes in z_ℓt, represented by ∆_{N,T}. Under Assumption 3, the change-points which are due to breaks in the loadings or to the (dis)appearance of factors (see Remark 1) are implicitly required to be 'dense', in the sense that the number of coordinates of x_t affected by the changes is of order n. Consequently, such changes appear in a large number (of order n²) of elements of Γ^β(t)_χ(τ) and, therefore, of the WT of the second-order structure, z_ℓt. Similarly, a break in the autocorrelation structure of the factors results in a dense change-point that affects a large number of the z_ℓt. In other words, for change-point analysis in the common components, Assumption 9 reduces to a requirement that allows the average jump size in the individual coordinates of z_ℓt to tend to zero at a rate slower than T^{−1/4} as T → ∞, and no longer depends on n. On the other hand, our model in (1) does not impose any denseness assumption on η^ε_b, b = 1, . . . , B_ε, and sparse change-points (with m_b ≪ n²) in the idiosyncratic components are detectable provided that their sparsity is compensated by the size of the jumps. Accordingly, ∆_{N,T} takes one of two forms, depending on whether y_ℓt is the WT of χ^k_it or of ε^k_it for some fixed k. The following theorem then establishes that the DCBS algorithm performs consistent change-point analysis for both the common and the idiosyncratic components.
Theorem 3. Suppose that Assumptions 1-9 hold, and let θ be defined as in Theorem 1.
Also, let the threshold π_{N,T} satisfy C₁ N ∆^{−1}_{N,T} log^{2θ+2υ} T < π_{N,T} < C₂ ∆_{N,T} T^{1/2} for some fixed C₁, C₂ > 0. Then, there exists c₁ > 0 such that the number and the locations of the change-points are estimated consistently. The rate attained by univariate change-point estimation in the independent case is known to be minimax optimal (Korostelev, 1987); with our approach, near-optimality in change-point estimation is achieved up to a logarithmic factor when the change-points are cross-sectionally dense (m_b ∼ N) with average size δ_b bounded away from zero.

4.3 Screening over a range of factor number candidates
In this section, we detail a screening procedure motivated by Theorem 2, which enables us to bypass the challenging task of estimating the number of factors in the presence of (possibly) multiple change-points in the factor structure.
The performance of most methods proposed for change-point analysis under factor modelling, such as those listed in the Introduction, relies heavily on accurate estimation of the factor number.
Thanks to Theorem 2, however, mis-specifying the factor number in our methodology does not affect the theoretical consistency reported in Theorem 3, provided that we choose k sufficiently large to satisfy k ≥ r. Based on this observation, we propose to screen B̂_χ(k), the set of change-points detected from χ^k_it, over a range of factor number candidates denoted by R = {r̲, r̲ + 1, . . . , r̄}. Specifically, we select the B̂_χ(k) with the largest cardinality over k ∈ R and, if there is a tie, we select the B̂_χ(k) with the largest k; that is, we set k* = max{k ∈ R : |B̂_χ(k)| = max_{k'∈R} |B̂_χ(k')|} and return B̂_χ(k*). As noted below Theorem 2, using k < r factors leads to an estimate χ^k_it with non-negligible estimation error, which may not contain all the change-points in B_χ as change-points in its second-order structure. Moreover, with such k, some of the η^χ_b may appear as change-points in ε^k_it, which we refer to as the spillage of change-points in the common components over to the idiosyncratic components. This justifies the proposed screening procedure and the choice of k*.
For r̲, we use the number of factors estimated by minimising the information criterion of Bai and Ng (2002) with the penalty p(n, T) = (n ∧ T)^{−1} log(n ∧ T); in principle, any other procedure for factor number estimation may be adopted to select r̲. The range R also involves the choice of the maximum number of factors, r̄. In the factor modelling literature, this choice is a commonly faced problem, e.g., when estimating the number of factors using an information criterion-type estimator. In the literature on stationary factor models, the maximum number of factors is usually fixed at a small number for practical purposes (r̄ = 8 is used in Bai and Ng (2002)). However, the presence of change-points tends to increase the number of factors, as shown with an example in Appendix A. Therefore, we set r̄ = 20 ∨ √(n ∧ T), where the second term dominates the first when n and T are large.
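The screening rule itself is elementary; a sketch (with hypothetical function names) is:

```python
def screen_factor_number(cp_sets):
    """Screening step of Section 4.3: among candidate factor numbers k,
    keep the change-point set of largest cardinality; break ties by
    taking the largest k. cp_sets maps k -> detected set B_chi(k)."""
    max_card = max(len(s) for s in cp_sets.values())
    k_star = max(k for k, s in cp_sets.items() if len(s) == max_card)
    return k_star, cp_sets[k_star]
```

For instance, if k = 9 and k = 10 both return two change-points while k = 8 and k = 11 return one, the rule selects k* = 10.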

5 Factor analysis after change-point detection
Once the change-points are detected, we can estimate the factor space over each segment I^χ_b = [η̂^χ_b + 1, η̂^χ_{b+1}], b = 0, . . . , B̂_χ, defined by two consecutive change-points estimated from the common components. Denoting by Γ̂^b_x the sample covariance matrix of x_t over t ∈ I^χ_b, let ŵ^b_{x,j} denote the eigenvector of Γ̂^b_x corresponding to its j-th largest eigenvalue. Then, for a fixed number of factors k, the segment-specific estimator of χ_it is obtained via PCA as in Section 3.1. The number of factors r_b in the b-th segment can be estimated by means of the information criterion proposed in Bai and Ng (2002), where r̄ denotes the maximum allowable factor number and the penalty function p(n, T) satisfies p(n, T) → 0 as well as (n² ∧ √T) log^{−2/β_f} T · p(n, T) → ∞. Motivated by the formulation of the penalties in Bai and Ng (2002), we may use p(n, T) = ((n + T)/(nT)) log(nT/(n + T)). Let Λ_b be the n × r_b matrix of loadings for the b-th segment. In order to discuss the theoretical properties of the segment-wise factor analysis, we require the following assumption, which extends Assumption 3 (i) to each segment I^χ_b.
Assumption 10. Assumption 3 (i) holds segment-wise: for each b, n^{−1} Λ′_b Λ_b converges to a positive definite r_b × r_b matrix as n → ∞. Then, we obtain the asymptotic results in Propositions 2-3 for the segment-wise estimators of the factor number and the common components.
Proposition 2. Suppose that all the conditions in Theorem 3 and Assumption 10 hold. Then, for each b, the segment-wise estimator of the factor number is consistent for r_b. Proposition 3. Suppose that all the conditions in Theorem 3 and Assumption 10 hold. Then, the segment-wise PCA estimator of the common components is consistent. From Propositions 2 and 3, we have the guarantee that the factor space is consistently estimated by PCA over each estimated segment. We may further refine the post-change-point analysis by first determining whether η̂^χ_b can be associated with (a) a break in the loadings or the factor number, or (b) a break in the autocorrelation structure of the factors only. This can be accomplished by comparing r̂_{b−1} and r̂_b against the factor number estimated from the pooled segment I^χ_{b−1} ∪ I^χ_b: a break in the loadings or the factor number necessarily brings about a change in the number of factors estimated from the pooled segment. However, if (b) is the case, the segments before and after η̂^χ_b, as well as the pooled one, return the identical number of factors, and we can perform a joint factor analysis of the two segments.
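The segment-wise PCA and the information criterion can be sketched as follows. This is an illustrative sketch with hypothetical function names; the penalty used is the (n ∧ T)^{−1} log(n ∧ T) choice quoted in Section 4.3, swapped in here for simplicity rather than the segment-specific penalty discussed above:

```python
import numpy as np

def pca_common(X, k):
    """k-factor PCA estimate of the common component on one segment.
    X is (T, n); returns chi_hat of the same shape."""
    T, n = X.shape
    S = X.T @ X / T                      # segment-wise sample covariance
    vals, vecs = np.linalg.eigh(S)       # ascending order
    W = vecs[:, ::-1][:, :k]             # k leading eigenvectors
    return X @ W @ W.T

def bai_ng_ic(X, k_max, penalty=None):
    """Estimate the factor number on a segment by minimising
    log V(k) + k * p(n, T), where V(k) is the mean squared PCA residual."""
    T, n = X.shape
    if penalty is None:
        m = min(n, T)
        penalty = np.log(m) / m
    ics = []
    for k in range(1, k_max + 1):
        resid = X - pca_common(X, k)
        ics.append(np.log(np.mean(resid**2)) + k * penalty)
    return int(np.argmin(ics)) + 1
```

Applied to each estimated segment I^χ_b in turn, this yields the r̂_b whose comparison across neighbouring and pooled segments is described above.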
6 Computational aspects

6.1 Bootstrap procedure for threshold selection

The range of theoretical rates supplied for π_{N,T} in Theorem 3 involves typically unattainable knowledge of the minimum cross-sectional size of the changes. Hence, we propose a bootstrap procedure for the selection of π^χ_{N,T} and π^ε_{N,T}, the thresholds for the change-point analysis of χ^k_it and ε^k_it, respectively. We omit N and T from their subscripts for notational convenience when there is no confusion. Although a formal proof of the validity of the proposed bootstrap algorithm is beyond the scope of the current paper, the simulation studies reported in Section 7 demonstrate its good performance when applied jointly with our proposed methodology. We refer to Trapani (2013), Corradi and Swanson (2014) and Gonçalves and Perron (2014, 2016) for alternative bootstrap methods under factor models, and to Jentsch and Politis (2015) for the linear process bootstrap for multivariate time series in general.
We propose a new bootstrap procedure which is specifically motivated by the separate treatment of common and idiosyncratic components in our change-point detection methodology.
Namely, the resampling method produces bootstrap samples from the common and idiosyncratic components independently, relying on the consistency of the estimated components with an over-specified factor number as reported in Theorem 2.
Let T^χ_{s,e}(k) and T^ε_{s,e}(k) denote the test statistics T_{s,e} computed on the interval [s, e] for the panel data obtained from the WT of χ^k_it and ε^k_it, respectively. The proposed resampling procedure aims at approximating the distributions of T^χ_{s,e}(k) and T^ε_{s,e}(k) under the null hypothesis of no change-point, which can then be used for the selection of π^χ_{s,e}(k) and π^ε_{s,e}(k), the corresponding interval-specific test criteria for χ^k_it and ε^k_it over [s, e], an interval considered at some iteration of the DCBS algorithm.
Among the many block bootstrap procedures proposed in the literature for bootstrapping time series (see Politis and White (2004) for an overview), the stationary bootstrap (SB) proposed in Politis and Romano (1994) generates bootstrap samples which are stationary conditional on the observed data (see Appendix D.4 for details). Based on the SB, our procedure derives π^χ_{s,e}(k) and π^ε_{s,e}(k). Recall that λ̂_ij and f̂_jt denote the loadings and factors estimated via the capped PCA for j = 1, . . . , k; see Section 3.1.
Stationary bootstrap algorithm for factor models.
Step 1: For the common components, produce the SB sample f̂•_lt for each l ∈ {1, . . . , k} separately and set χ^k•_it = ∑_{l=1}^{k} λ̂_il f̂•_lt; for the idiosyncratic components, produce ε^k•_it by resampling the n-dimensional vectors ε^k_t jointly. Step 2: Generate y•_ℓt by transforming χ^k•_it or ε^k•_it using g_j(·) and h_j(·, ·) as described in Section 3.3.
Step 3: Compute Y•_{s,b,e} on y•_ℓt and generate the test statistic T•_{s,e} according to (10).
Step 4: Repeat Steps 1-3 R times. The critical value π^χ_{s,e}(k) or π^ε_{s,e}(k) for the segment [s, e] is selected as the (1 − α)-quantile of the R bootstrap test statistics T•_{s,e}, for a given α ∈ (0, 1).
The bootstrap algorithm is designed to produce χ^k•_t and ε^k•_t that mimic the second-order structure of χ^k_t and ε^k_t, respectively, when there is no change-point present, and thus to approximate the distributions of the test statistics under the null hypothesis. In the algorithm, the treatment of the common and idiosyncratic components differs only in the application of the SB in Step 1: since the factors estimated via PCA are uncorrelated by construction, we generate the SB samples of f̂_jt for each j separately, while the n elements of ε^k_t are resampled jointly in an attempt to preserve the cross-sectional dependence therein. We discuss the choice of the bootstrap size R and the quantile level α in Section 6.2.
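The resampling scheme of Steps 1-4 can be sketched as follows. All names here are illustrative, and `stat_fn` stands in for the DC test statistic of Step 3:

```python
import numpy as np

def sb_indices(T, p, rng):
    """Time indices of one stationary-bootstrap resample (Politis and
    Romano, 1994): blocks with geometric lengths of mean 1/p and uniform,
    circularly wrapped start points."""
    idx = np.empty(T, dtype=int)
    t = 0
    while t < T:
        start = rng.integers(T)
        length = min(int(rng.geometric(p)), T - t)
        idx[t : t + length] = (start + np.arange(length)) % T
        t += length
    return idx

def bootstrap_threshold(stat_fn, F_hat, Lam_hat, E_hat, p, R, alpha, rng):
    """Steps 1-4 in sketch form: resample each estimated factor separately,
    resample the idiosyncratic panel jointly (rows of E_hat kept together),
    rebuild the panel, and return the (1 - alpha)-quantile of R statistics."""
    T, k = F_hat.shape
    stats = []
    for _ in range(R):
        F_star = np.column_stack(
            [F_hat[sb_indices(T, p, rng), j] for j in range(k)])
        E_star = E_hat[sb_indices(T, p, rng), :]   # joint time indices
        stats.append(stat_fn(F_star @ Lam_hat.T + E_star))
    return float(np.quantile(stats, 1.0 - alpha))
```

In the full procedure, the resampled components would first be passed through the WT of Section 3.3 before the statistic is evaluated; the sketch applies `stat_fn` to the rebuilt panel directly for brevity.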

6.2 Selection of tuning parameters
The proposed methodology involves the choice of tuning parameters for the capped PCA, WT, DCBS algorithm and the bootstrap procedure. We here list the values used for the simulation studies (Section 7) and real data analysis (Section 8). We provide further guidance on the implementation of the proposed methodology in Appendix D.
In the examples considered in Section 7, we have not observed unreasonably large contributions to χ^k_it from spurious factors. As noted in Section 3.1, selecting a sufficiently large constant as c_w effectively disables the capping for the (unknown) r leading principal components of χ^k_it. For this reason, in the current implementation, we disable the capping; however, this does not mean that capping will always be of no practical use. Should it be required, we recommend the data-driven choice c_w = √n max_{1≤j≤r̲} max_{1≤i≤n} |ŵ_x,ij|. With such a c_w, the capping is enabled only for those ŵ_x,ij with r̲ + 1 ≤ j ≤ r̄, where r̲ and r̄ denote the smallest and largest numbers of factors considered in Section 4.3.
For the WT, we propose to use J*_T = C log₂ log^υ T finest-scale Haar wavelets, for some υ ∈ (0, 1] and C > 0, in order to control any bias in change-point estimation arising from the WT. Noting that the bias increases at the rate 2^C with increasing C, we recommend the choice J*_T = log₂ log₂ T in practice. Although omitted from the description of the DCBS algorithm, we select a parameter d_T controlling the minimum distance between two estimated change-points. In light of Remark 2 and Theorem 3, we choose d_T = [log₂ T ∧ 0.25 T^{6/7}]. Note that we could avoid using this parameter by replacing the binary segmentation procedure with wild binary segmentation (Fryzlewicz, 2014), in which the Double CUSUM is applied over randomly drawn intervals to derive the test statistic. It is conjectured that such a procedure would place a tighter bound on the bias in the estimated change-point locations, as well as bypassing the need for the parameter d_T. We leave the investigation in this direction for future research.
Finally, for the proposed bootstrap procedure, we use the bootstrap sample size R = 200 for the simulation studies and R = 500 for the real data analysis. As for the level of quantile, we select α = 0.05; note that this choice does not indicate the significance level in hypothesis testing.

7 Simulation studies
In this section, we apply the proposed change-point detection methodology to both single and multiple change-point scenarios under factor modelling. While our methodology is designed for multiple change-point detection, Section 7.1 gives us insight into its performance in the presence of a single change-point whose type, size and denseness vary in a systematic manner. Multiple change-point scenarios are considered in Section 7.2.

7.1 Single change-point scenarios
The following stationary q-factor model allows for serial correlations in f_jt and for both serial and cross-sectional correlations in ε_it, with λ_ij ∼ iid N(0, 1). The stationary factor model in (12) has been frequently adopted for empirical studies in the factor model literature, including Bai and Ng (2002). For each simulated dataset, we consider the range of possible factor numbers k ∈ R, selected as described in Section 4.3. For any given k, we estimate the common and idiosyncratic components and proceed with the WT of Stage 1, to which only the first iteration of the DCBS algorithm is applied. In this way, we test for the existence of at most a single change-point in the common and the idiosyncratic components separately and, if its presence is detected, we identify its location in time. With a slight abuse of notation, we refer to this simultaneous testing and locating procedure as the DC test, and report its detection power and accuracy in change-point estimation.
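A stationary factor panel of this type can be sketched as follows. The AR coefficients and noise scale below are illustrative placeholders, not the exact parameterisation of (12):

```python
import numpy as np

def simulate_factor_panel(n, T, q, rho_f=0.5, rho_e=0.3, theta=1.0, seed=0):
    """Sketch of a stationary q-factor panel in the spirit of model (12):
    AR(1) factors, iid N(0,1) loadings and AR(1) idiosyncratic noise.
    Returns the (T, n) panel x and its common component chi."""
    rng = np.random.default_rng(seed)
    burn = 100                                  # burn-in for stationarity
    lam = rng.normal(size=(n, q))               # loadings lambda_ij
    f = np.zeros((T + burn, q))
    e = np.zeros((T + burn, n))
    for t in range(1, T + burn):
        f[t] = rho_f * f[t - 1] + rng.normal(size=q)
        e[t] = rho_e * e[t - 1] + rng.normal(size=n)
    chi = f[burn:] @ lam.T
    x = chi + np.sqrt(theta) * e[burn:]
    return x, chi
```

Change-point scenarios such as (S1)-(S5) would then be obtained by altering the loadings, factor dynamics or idiosyncratic dependence of a chosen subset of coordinates after the change-point.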
As noted in the Introduction, existing methods for change-point analysis under factor modelling are not applicable to scenarios other than (S1) and (S3). We compare the performance of the DC, MAX and AVG tests applied to the WT of χ^k_it under (S1)-(S3), and to that of ε^k_it under (S4)-(S5), against their counterparts applied to the WT of x_it itself without any factor analysis; we term the latter approach the DC-NFA (no factor analysis) test, and include it in order to demonstrate the advantage of linking factor modelling and change-point detection as proposed in our methodology. Overall, change-point detection becomes more challenging as σ (the size of the changes) or ϱ (the proportion of the coordinates undergoing the change) decreases, and also as Var(χ_it)/Var(x_it) decreases (with increasing φ) under (S1)-(S3) and increases (with increasing φ^{−1}) under (S4)-(S5). In all scenarios, the DC test shows superior performance to the DC-NFA test. This confirms that factor analysis prior to change-point analysis is an essential step in markedly improving the detection power, as well as in identifying the origins of the change-points; without a factor analysis, a change-point that appears in either of χ_it or ε_it may be 'masked' by the presence of the other component and thus escape detection.
The DC and AVG tests generally attain similar powers, while the MAX test tends to have considerably lower power in scenarios such as (S2), (S4) and (S5). An exception arises under (S3), where the MAX test attains larger power than the others in some settings with decreasing ϱ (Figure 5). This may be explained by the fact that the smaller ϱ is, the sparser the change-point becomes cross-sectionally, which is a setting that favours the approach taken in the MAX test. Between the DC and AVG tests, the former outperforms the latter in the more challenging settings where the change is cross-sectionally sparse (with small ϱ), particularly when the change is attributed to the introduction of a single factor to the existing five, as in (S3).
Apart from (S5), the origin of the change-point is correctly identified, in the sense that it is detected only from χ*_it under (S1)-(S3), or only from ε*_it under (S4), with power strictly above α = 0.05. In (S5), we observe spillage of the change-point in ε_it: the change-point is detected from both χ*_it and ε*_it, particularly when a large proportion of the ε_it undergo the change of an increase in the bandwidth H_i in (14), and when Var(χ_it)/Var(x_it) is small (with large φ); see Figure 7.
This supports the notion of a pervasive change-point acting as a form of factor: a significant break that affects the majority of the idiosyncratic components in their second-order structure may be viewed as a common feature. We also note that, due to the relatively large variance of the common components, the detection power of the DC-NFA test behaves like that of the DC test applied to χ*_it rather than ε*_it in this scenario. We present in a supplementary document the results on other data generating processes taken from the literature on testing for a single change-point under factor models, as well as tables and figures summarising the simulation results for (S1)-(S5) with n = 300, along with box plots of the estimated change-points η̂_1. The performance of all tests under consideration generally improves when n = 300.

7.2 Multiple change-point scenarios

Model (M1)

Based on these observations, we adopt the following data generating model, where Var(·) denotes the sample variance operator, u_jt, v_it ∼ iid t_7 and g_b = 1 for even b and g_b = −1 otherwise. We use the loadings estimated from the S&P100 data without capping. We report the performance of our methodology over 100 realisations when n = 100 in Figures 8-9 for the two extreme cases with φ ∈ {1, 2.5}. We also apply the DCBS algorithm to the WT of the true common and idiosyncratic components generated under (M1) and report the corresponding results, which serve as a benchmark against which the efficacy of the PCA-based factor analysis of the proposed methodology is assessed. Additional simulation results with varying φ and n are reported in the supplementary document.
The DCBS algorithm detects the two change-points in the idiosyncratic components equally well from ε*_it and ε_it, regardless of the values of φ and σ. On the other hand, the detection of η^χ_b, b = 1, 2, is highly variable with respect to these parameters. When the size of the change is large (σ ∈ {1, 0.75}), the panel data generated from transforming χ*_it serves as an input to the DCBS algorithm that is as good as that generated from transforming χ_it, in terms of conveying the presence and the locations of both change-points.
With decreasing Var(χ_it)/Var(x_it), the change-point η^ε_2, which appears only in the idiosyncratic components, is detected with increasing frequency as a change-point in the common components from χ*_it, when (a) σ ≥ 0.75 (the change in ε_it is large and thus all three change-points are detected from χ*_it), or (b) σ = 0.25 (the changes in χ_it are missed and a single change-point is detected at η^ε_2 from χ*_it). This phenomenon is in line with the observation made for model (S5) in Section 7.1 on the spillage of change-points from the idiosyncratic components over to the common components: a significant co-movement in the dependence structure of ε_it may be regarded as being pervasive and common, and hence is captured as a change in the dependence structure of the common components by our proposed methodology.
Lastly, we note that although we impose normality on the idiosyncratic components for the theoretical development, these results show that our methodology works well even when the data exhibit some deviations from normality, such as heavy tails.

Model (M2)
In this model, change-points are introduced as in (S1)-(S4) of Section 7.1. More specifically, we set q = 5 and λ_ij, u_jt, v_it ∼ iid N(0, 1). The parameters ρ_f,j, ρ_ε,i, β_i, H and ϑ are chosen identically to Section 7.1. In summary, three change-points in the common components are introduced, to the loadings (η^χ_1), the autocorrelations of the factors (η^χ_2) and the number of factors (η^χ_3), while a single change-point in the idiosyncratic components is introduced to their serial correlations (η^ε_1). The cardinality of the index sets S^χ_1, S^χ_3 and S^ε_1 determines the sparsity of the change-points η^χ_1, η^χ_3 and η^ε_1, respectively. We randomly draw each index set from {1, . . . , n} with its cardinality set at [ϱn], where ϱ ∈ {1, 0.75, 0.5, 0.25}. Also, in the case of η^χ_1, the size of the shifts in the loadings is controlled by the parameter σ ∈ √2 · {1, 0.75, 0.5, 0.25}, as ∆_ij ∼ iid N(0, σ²). Finally, we set φ ∈ {1, 1.5, 2, 2.5}, a parameter that features in ϑ, in order to investigate the impact of Var(χ_it)/Var(x_it) on the performance of the change-point detection methodology. We fix the number of observations at T = 500 and the dimensionality at n = 100.
We report the performance of our methodology over 100 realisations in Figures 10-11. In accordance with the observations made under the single change-point scenarios (Section 7.1), detecting change-points in the common components, η^χ_1 and η^χ_3 in particular, becomes more challenging as they grow cross-sectionally sparse (with decreasing ϱ) and as Var(χ_it)/Var(x_it) decreases (with increasing φ). For the settings considered here, the DCBS algorithm applied to the WT of ε*_it performs as well as that applied to the WT of the true ε_it, regardless of the model parameters φ and ϱ. Not surprisingly, as the break in the loadings grows weaker with decreasing σ, the detection rate of η^χ_1 deteriorates, especially when σ = 0.25√2. Comparing the performance of the DCBS algorithm applied to χ*_it and χ_it, the gap is not so striking in the detection of the change-point η^χ_2 in the autocorrelations of the factors. As for η^χ_1 and η^χ_3, provided that the breaks in the loadings and in the number of factors are moderately dense (ϱ ≥ 0.5), and the magnitude of the former is reasonably large (σ ≥ 0.5√2), we can expect the common components estimated via PCA to recover both change-points as those in their second-order structure, for a range of φ.
8 Real data analysis

8.1 S&P100 data

Identical change-point estimates were returned for all k ∈ R \ {8}. Table 1 reports the change-points estimated from χ*_it and ε*_it. By way of investigating the validity of η̂^χ_b, b = 1, . . . , B̂_χ, we computed the following quantity over each segment defined by two neighbouring change-points, [η̂^χ_b + 1, η̂^χ_{b+1}]:

k_b(c) = min{ k : ∑_{j=1}^{k} μ̂^b_{x,j} / ∑_{j=1}^{q_b} μ̂^b_{x,j} > c } for some c ∈ (0, 1),

where μ̂^b_{x,j} denotes the j-th largest eigenvalue of Γ̂^b_x, and q_b = (n − 1) ∧ (η̂^χ_{b+1} − η̂^χ_b − 1). In short, k_b(c) is the minimum number of eigenvalues required so that the proportion of the variance of x_t, t ∈ [η̂^χ_b + 1, η̂^χ_{b+1}], accounted for by μ̂^b_{x,j}, j = 1, . . . , k_b(c), exceeds a given c. Varying c ∈ {0.5, 0.55, . . . , 0.95}, we plot k_b(c) over the B̂_χ + 1 segments in Figure 13. We observe that, over long stretches of stationarity, greater numbers of eigenvalues are required to account for the same proportion of variance, compared to shorter intervals, which all tend to be characterised by high volatility. This finding is in accordance with the observation made in Li et al. (2017) that a small number of factors drives the majority of the cross-sectional correlations during periods of high volatility.
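The quantity k_b(c) can be computed directly from the segment-wise sample covariance; a minimal sketch (with an illustrative function name) is:

```python
import numpy as np

def min_eigs_for_variance(X, c):
    """k_b(c): the smallest number of leading eigenvalues of the segment
    covariance of X (rows = time points, columns = series) whose cumulative
    share of total variance exceeds c, for c in (0, 1)."""
    S = np.cov(X, rowvar=False)
    mu = np.sort(np.linalg.eigvalsh(S))[::-1]   # decreasing eigenvalues
    share = np.cumsum(mu) / np.sum(mu)
    return int(np.searchsorted(share, c) + 1)
```

Evaluating this function on each estimated segment over a grid of c values reproduces the kind of curves plotted in Figure 13.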

8.2 US macroeconomic data
We analyse a representative US macroeconomic dataset of 101 time series, collected quarterly between 1960:Q2 and 2012:Q3 (T = 210), for change-points. Similar datasets have been analysed frequently in the factor model literature, for example in Stock and Watson (2002).
The dataset is available from the St. Louis Federal Reserve Bank website (https://fred.stlouisfed.org/). In analysing the quarterly observations, we impose a restriction in applying the DCBS algorithm so that no two change-points are detected within three quarters of each other.
Applying the information criterion of Bai and Ng (2002), r̲ = 8 is returned, so we choose R = {8, . . . , 20}. All k ≥ 14 lead to identical change-point estimates for the common components, with B̂_χ = 5, so we select k* = 20; we also obtain B̂_χ(k) ⊂ B̂_χ(k*) for all k < 14. Table 2 reports the change-points estimated from χ*_it and ε*_it, and we plot two representative series from the dataset, the gross domestic product (GDP) growth rate and consumer price inflation (CPI), along with the η̂^χ_b, in Figure 14. According to the change-points detected, the observations are divided into periods corresponding to different economic regimes characterised by high or low volatility. In particular, we highlight the following regimes (recessions are dated by the National Bureau of Economic Research, http://www.nber.org/cycles.html): 1. early 1970s to early 1980s, marked by two major economic recessions, which were characterised by high inflation due to the oil crisis and the level of interest rates; 2. the so-called Great Moderation period. A related study analysed financial indicators, observed monthly rather than quarterly, over a shorter span between January 1985 and January 2013; their focus was on verifying the existence of a structural break corresponding to the beginning of the financial crisis, which they estimated to be in December 2007, a date close to our η̂^χ_4 (considering that we analyse quarterly observations). As in Section 8.1, we perform a post-change-point analysis by plotting k_b(c) computed on each segment defined by the η̂^χ_b; see Figure 15, where we make similar observations about the contrast between the number of factors required over long stretches of stationarity and short intervals of volatility.

9 Conclusions
We have provided the first comprehensive treatment of high-dimensional time series factor models with multiple change-points in their second-order structure. We have proposed a methodology which consistently estimates the number and locations of the change-points, identifies whether they originate in the common or the idiosyncratic components, and performs consistent factor analysis over each estimated segment.

A An example of piecewise stationary factor model
We illustrate with an example that the piecewise stationary factor model (1) under Assumption 1 admits all possible change-point scenarios that arise under factor modelling for the common components.
The following four comments help in understanding the properties of model (1).
2. Recall that the number of factors in (1) satisfies r ≥ r_b: in this example we have a single factor prior to η^χ_1 (r_0 = 1) and then two factors in each of the subsequent segments (r_1 = r_2 = r_3 = 2), whereas r = 4.

3. The representation in (1) is not unique: we can, for example, re-write (16) with constant loadings.

4. (1) can also be written as a piecewise stationary version of the dynamic factor model introduced in Forni and Lippi (2001) and Hallin and Lippi (2013). We can, for example, re-write (16) as the piecewise stationary version of a dynamic factor model with r = 5, constant dynamic loadings involving the filter (1 − βL)^{−1} (L denoting the lag operator), and suitably re-defined factors. A change in the autocorrelation of the factors can therefore be equivalently represented by a change in the dynamic loadings.

B.1 Preliminary results
We denote by ϕ i the n-dimensional vector with one as its ith element and zero elsewhere.
For part (ii), we now deal with an n-dimensional vector rather than a matrix. Hence, the same approach as in part (i) yields the claimed bound, which completes the proof.
Lemma 2. Let the r × r diagonal matrices M_x and M_χ have the r largest eigenvalues of Γ_x and of Γ_χ, in decreasing order, as their diagonal elements, respectively.

Proof. As a consequence of Lemma 1 (i) and Weyl's inequality, the eigenvalues µ_x,j satisfy the corresponding perturbation bounds. From (19), there exists c_r ∈ (0, ∞) such that µ_χ,r/n ≥ c_r and thus µ_x,r/n ≥ c_r + O_p(√(log n/T) ∨ 1/n), which implies that the matrix n^{-1}M_χ is invertible and that the inverse of n^{-1}M_x exists with probability tending to one as n, T → ∞.

Lemma 3. Denote by w_x,j and w_χ,j the n-dimensional normalised eigenvectors corresponding to the jth largest eigenvalues of Γ_x and Γ_χ, respectively, and define the n × r matrices W_x = [w_x,1, …, w_x,r] and W_χ = [w_χ,1, …, w_χ,r]. Then there exists an orthonormal r × r matrix for which the stated bounds hold.

Proof. From Theorem 2 in Yu et al. (2015), which is a generalisation of the sin θ theorem in Davis and Kahan (1970), we obtain the bound in (20), where µ_χ,0 = ∞ and µ_χ,r+1 = 0. From (19), the denominator of (20) is bounded from below by c_r n, and thus part (i) follows immediately from Lemma 1 (i).
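The two perturbation bounds invoked here, Weyl's inequality for eigenvalues and the Yu et al. (2015) variant of the Davis-Kahan sin θ theorem for eigenvectors, can be checked numerically. The following sketch (using arbitrary simulated symmetric matrices, not the covariance matrices of the paper) verifies both bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Symmetric "population" matrix A and a small symmetric perturbation E
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
E = rng.standard_normal((n, n)); E = 0.1 * (E + E.T) / 2

mu_A = np.sort(np.linalg.eigvalsh(A))[::-1]       # eigenvalues, decreasing
mu_B = np.sort(np.linalg.eigvalsh(A + E))[::-1]

# Weyl's inequality: |mu_j(A + E) - mu_j(A)| <= ||E||_2 for every j
assert np.max(np.abs(mu_B - mu_A)) <= np.linalg.norm(E, 2) + 1e-10

# Davis-Kahan (Yu et al., 2015): the leading eigenvectors differ by at
# most 2^{3/2} ||E||_2 / (mu_1 - mu_2), up to a sign.
wA = np.linalg.eigh(A)[1][:, -1]
wB = np.linalg.eigh(A + E)[1][:, -1]
wB = wB * np.sign(wA @ wB)        # resolve the sign indeterminacy
bound = 2 ** 1.5 * np.linalg.norm(E, 2) / (mu_A[0] - mu_A[1])
assert np.linalg.norm(wB - wA) <= bound + 1e-10
```

In the lemmas, the roles of A and A + E are played by Γ_χ and Γ_x, and (19) guarantees that the relevant spectral gap is of order n.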

B.2 Proof of Theorem 1
Note that under the normalisation adopted in Assumption 2 (i), χ_it = ϕ_i' W_x W_x' x_t. For I, Lemma 3 (i)-(ii), Lemma 4 and the result in (C1) lead to the required bound. As for II, due to the normalisation of the eigenvectors, we invoke Assumption 4 (i). Then, due to Assumption 4 (ii) and a Bonferroni correction, the maximum over 1 ≤ t ≤ T is O_p(√(log T)), and using (23), substituting (22) and (25) into (21) completes the proof.

B.3 Proof of Theorem 2
Note that, recalling (23) and Lemma 3 (ii), we have max_{1≤j≤r} max_{1≤i≤n} |w_x,ij| = O_p(1/√n). Therefore, for c_w chosen large enough, we have χ^r_it = Σ_{j=1}^{r} w_x,ij w_x,j' x_t = χ_it for all i and t with probability tending to one, and we prove the theorem conditioning on such an event; once this is done, the unconditional statement follows.
Consider the scaled partial sums. Starting with II: following (22), its bound follows from Lemma 3 (i)-(ii) and Lemma 4. For IV, we first invoke Assumption 4 (i) as in (24). Hence, under Assumption 4 (ii), the corresponding scaled sum (e − s + 1)^{-1/2} Σ_{t=s}^{e} of the transformed terms is an r-vector of zero-mean normally distributed random variables with finite variance, and the required bound follows. Combining the above proves (i). Next, (ii) follows thanks to Lemma 4.
B.4 Proof of Proposition 1

The quantities g_j(χ^k_it) and h_j(χ^k_it, χ^k_i't) admit the following decompositions. The terms E{g_j(χ_it)} and E{h_j(χ_it, χ_i't)} are piecewise constant with their change-points belonging to B_χ. Moreover, under Assumption 1, all change-points in Γ^{β(t)}_χ(τ), |τ| ≤ τ̄_χ, i.e., all η^χ_b ∈ B_χ, appear as change-points in the panel {E{g_j(χ_it)}, 1 ≤ i ≤ n, 1 ≤ t ≤ T}, with the choice of J*_T = log_2(log^υ T) as discussed in Section 3.2. Next, we turn our attention to the scaled sums of I and IV over any given interval [s, e], which are bounded as in the following Lemma 5 (see Appendix B.4.1 for a proof).
Lemma 5. Suppose that the conditions of Theorem 2 are met. Then the stated bounds hold at scales j ≥ −J*_T.

For bounding III and VI, we investigate the behaviour of ξ_j(χ^k_it), the error arising from replacing the unobservable χ_it by its estimate χ^k_it, in the following Lemma 6 (see Appendix B.4.2 for a proof).
Lemma 6. Suppose that the conditions of Theorem 2 are met. Then the stated bounds hold at scales j ≥ −J*_T for any k ≥ r.

Finally, the scaled partial sums of II and V are handled by the following Lemma 7 (see Appendix B.4.3 for a proof).

Lemma 7. Suppose that the conditions of Theorem 2 are met. Then the stated bounds hold at scales j ≥ −J*_T.

Proposition 1 follows from Lemmas 5-7.

B.4.1 Proof of Lemma 5
Let τ = e − s + 1 when there is no confusion, and note the decomposition in (27). Restricting our attention to the first summand in the RHS of (27) (when h = 0), by the definition of g_j, we start from I: let b_1 = β(s) and b_2 = β(e) + 1. Then the bound holds uniformly in 1 ≤ s ≤ e ≤ T for any i = 1, …, n, from Assumptions 1 (i) and 3. An analogous bound applies to II involving E{h_j(χ_it, χ_i't)}, which concludes the proof.

B.4.2 Proof of Lemma 6
The proof of Theorem 2 indicates the relevant bound. Let τ = e − s + 1 when there is no confusion. Adopting similar arguments to those in Appendix B.4.1, we need to derive a bound as in (28). As for II, note that for all t satisfying w_it · v_it < 0 and |w_it| ≥ |v_it|, we have w_it + v_it = c_t(w_it − v_it) for some c_t ∈ [0, 1), which, as for I, leads to II = O_p(log^θ T). Similar arguments apply to III. Then, similarly to (27), τ^{-1/2}|Σ_{t=s}^{e} ξ_j(χ^k_it)| involves the summation of L_j such summands, and is thus O_p(log^{θ+υ} T). Analogous arguments can be applied to bound the scaled partial sums of ζ_j(χ^k_it, χ^k_i't).

B.4.3 Proof of Lemma 7
The proof of Lemma 4 implies the corresponding bound (with E(χ_it) = 0), from which Lemma 7 follows, since g_j(χ_it) and h_j(χ_it, χ_i't) involve at most L_j such summands.

B.5 Proof of Theorem 3
The DC operator in this paper is identical to D^ϕ_m(·) defined in Cho (2016) with ϕ = 1/2. Let the additive panel data considered therein be y_ℓt = z_ℓt + ε_ℓt. Then Assumptions 8-9, along with Assumption 6 imposed on the change-points in z_ℓt, are sufficient for the conditions imposed on z_ℓt there. On the other hand, the noise term in Cho (2016) satisfies E(ε_ℓt) = 0, while here it is generally expected that E(ε_ℓt) ≠ 0. Assuming that ε_ℓt is strong mixing with bounded moments, it was shown that (e − s + 1)^{-1/2}|Σ_{t=s}^{e} ε_ℓt| ≤ log T uniformly in ℓ ∈ {1, …, N} and 1 ≤ s < e ≤ T (their Lemma 1), which is comparable to the bound of log^{θ+υ} T on (e − s + 1)^{-1/2}|Σ_{t=s}^{e} ε_ℓt| shown in Propositions 1 and 4. This enables us to directly employ the arguments used in the proof of Theorem 3.3 of Cho (2016) in proving Theorem 3.

B.6 Proof of Proposition 2
In order to prove Proposition 2, we first introduce the following lemmas (see Appendix B.6.1-B.6.3 for the proofs).
Lemma 8. Suppose that all the conditions in Theorem 3 hold.
Let Γ^b_x denote the sample second moment matrix of x_t over the bth segment (built from sums of x_t x_t'), and define Γ^b_χ analogously to Γ^b_x. Then the stated bounds hold for all b = 0, …, B_χ.

Lemma 9. Suppose that all the conditions in Theorem 3 hold. Let W^b_χ = [w^b_χ,1, …, w^b_χ,r_b], with w^b_χ,j denoting the normalised eigenvector corresponding to µ^b_χ,j, the jth largest eigenvalue of Γ^b_χ. Then there exists an orthonormal matrix for which the stated bounds hold.

Lemma 10. Suppose that all the conditions in Theorem 3 and Assumption 10 hold.

From Theorem 3, we have P(C_{n,T}) → 1, where C_{n,T} denotes the event that all the change-points are detected within the distance c_1 ω_{n,T} for some c_1 > 0, with ω_{n,T} = J*_T min_{1≤b≤B_χ} δ_b^{-2} log^{2θ+2υ} T. We first show (31), where V_b(k) denotes the criterion built from the W^b_j. Thanks to Theorem 3, it is sufficient to show that P{r_b = arg min_{1≤k≤r} V_b(k) | C_{n,T}} → 1 for the proof of (31). Firstly, let k > r_b. Due to the orthonormality of the w^b_x,j, and using Lemma 10, the relevant terms are bounded due to Assumptions 2 (ii), 5 (ii) and 6. Also, invoking Assumptions 4-5 and Lemma A.3 of Fan et al. (2011a), we conclude that, under the conditions imposed on p(n, T), V_b(k) > V_b(r_b) for any fixed k > r_b with probability tending to one as n, T → ∞.
Then, we can bound V = O_p(√(log n/T)) similarly to II. Also, thanks to Lemma 9, there exists an r_b × (r_b − k) matrix S with orthonormal columns for which the following holds. From Lemma 8 and the fact that SS' is a rank-(r_b − k) projection matrix, VI is bounded away from zero, and hence VI > 0. Also, VII and VIII can be bounded using (32) and Assumption 2 (ii). Combining the bounds on VI, VII and VIII, we conclude that III is bounded away from zero with probability tending to one. Finally, under Assumptions 2 (ii), 4 (ii) and 5, Lemma B.1 (ii) of Fan et al. (2011a) leads to the required bound with probability converging to one. Having shown that V_b(k) is minimised at r_b, we have proved (31). We can then adopt the arguments used in the proof of Corollary 1 in Bai and Ng (2002) verbatim to complete the proof.
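The segment-wise minimisation of V_b(k) parallels the information criterion of Bai and Ng (2002). A minimal sketch of such a criterion (a generic IC_p1-type penalty applied to a full panel, not the exact V_b(k) of the paper; all names are illustrative) is:

```python
import numpy as np

def estimate_num_factors(X, k_max=10):
    """Estimate the number of factors in a T x n panel X by minimising a
    Bai-and-Ng-type criterion log V(k) + k * p(n, T), where V(k) is the
    residual variance after removing the k leading principal components.
    A generic sketch, not the segment-wise V_b(k) of the paper."""
    T, n = X.shape
    X = X - X.mean(axis=0)
    # Eigenvalues of the n x n sample covariance, in decreasing order
    evals = np.sort(np.linalg.eigvalsh(X.T @ X / T))[::-1]
    total = evals.sum()
    penalty = (n + T) / (n * T) * np.log(n * T / (n + T))   # IC_p1 penalty
    ic = [np.log((total - evals[:k].sum()) / n) + k * penalty
          for k in range(1, k_max + 1)]
    return int(np.argmin(ic)) + 1

# Toy check on a strong two-factor panel
rng = np.random.default_rng(1)
T, n, r = 500, 80, 2
F = rng.standard_normal((T, r))
L = rng.standard_normal((n, r))
X = F @ L.T + 0.5 * rng.standard_normal((T, n))
r_hat = estimate_num_factors(X)   # recovers r = 2 here
```

In the setting of the proof, the criterion would be applied to the sub-panel of each segment defined by the estimated change-points.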
B.6.1 Proof of Lemma 8

For a given b, we proceed without loss of generality. Under Assumption 6, Lemmas A.3 and B.1 (ii) of Fan et al. (2011a) can be adopted to show that II = O_p(n²√(log n/T)). Recall the definition of C_{n,T} from the proof of Proposition 2. Then, for some c_2 > 0, consider the probability that the relevant entry exceeds c_2 ω_{n,T}/T jointly with C_{n,T}, plus P(C^c_{n,T}), as in (33), where the second probability in the RHS of (33) tends to zero as n, T → ∞. Note that the remaining term decomposes as III + IV + V.
Lemma A.2 of Fan et al. (2011a) shows that the exponential tail bound carries over to x_it x_i't with parameter 2β_f/(2 + β_f), from which we derive that max_{1≤i,i'≤n} max_{1≤t≤T} |x_it x_i't| = O_p(log^{θ−1/2} T) under Assumption 7. Then we have III, IV, V = O_p(ω_{n,T} log^{θ−1/2} T/T) uniformly in i, i' = 1, …, n on the event C_{n,T} under Assumption 6. Hence, the RHS of (33) tends to zero as n, T → ∞, which leads to I = O_p(n² ω²_{n,T} log^{2θ−1} T/T²) and concludes the proof of (29). The proof of (30) follows analogously.

C WT for change-point analysis of the idiosyncratic components
The WT proposed in Section 3.3 is directly applicable to the estimated idiosyncratic components ε^k_it for change-point analysis of the idiosyncratic components. For completeness, we state the corresponding arguments below.
Lemma 11. Suppose that the conditions of Theorem 2 are met. Then the stated bounds hold at scales j ≥ −J*_T for k ≥ r. The proof of Lemma 11 takes steps analogous to the proofs of Lemmas 5-7 and is thus omitted. Then Proposition 4 below holds for the WT of ε^k_it; again, its proof follows that of Proposition 1 verbatim and is thus omitted.

In practice, we apply the algorithm sequentially: (a) first apply the DCBS algorithm to the panel consisting of the wavelet-transformed χ^k_it (ε^k_it) at scale −1 only, and denote the set of estimated change-points by B; (b) apply the DCBS algorithm to the wavelet-transformed χ^k_it (ε^k_it) at scale −2 over each segment defined by two adjacent change-points in B, and add the estimated change-points to B; (c) repeat (b) with wavelet scales j = −3, −4, … until B does not change from one scale to the next.
The above procedure is motivated by the fact that finer wavelet scales are preferable for the purpose of change-point estimation, and it allows for a data-driven choice of J*_T. Due to the condition on the spread of change-points in Assumption 6, this adjustment does not affect the consistency of the detected change-points, while ensuring that, empirically, the bias in the estimated change-points does not hinder the subsequent search for further change-points.
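The sequential, coarse-to-fine application across wavelet scales can be sketched as follows; `panel_at_scale` and `detect` are hypothetical stand-ins for the wavelet transform and the DCBS routine (implemented in the R package factorcpt):

```python
def sequential_dcbs(panel_at_scale, detect, max_scale=10):
    """Apply a change-point detector scale by scale: scale -1 on the whole
    panel first, then each subsequent scale within the segments defined by
    the change-points found so far, stopping once no new ones appear."""
    cps = set()
    for j in range(1, max_scale + 1):
        panel = panel_at_scale(j)          # wavelet-transformed panel at scale -j
        T = len(panel[0])
        bounds = sorted(cps | {0, T})      # current segment boundaries
        new = set()
        for s, e in zip(bounds[:-1], bounds[1:]):
            new |= set(detect(panel, s, e))
        if not new - cps:                  # B unchanged from one scale to the next
            break
        cps |= new
    return sorted(cps)
```

Any detector with the interface `detect(panel, s, e) -> list of change-points` can be plugged in, which is what makes the scale-by-scale refinement generic.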

D.3 Binary segmentation algorithm
With regard to the discussion in Remark 2 and the bias presented in Theorem 3, we propose to use d_T = [log² T ∧ 0.25 · T^{6/7}].
Though our methodology requires the selection of a multitude of parameters, as listed in this section, we observe that its empirical performance is relatively robust to their choice. The parameter that exerts the most influence is d_T: smaller values of d_T tend to return a larger set of estimated change-points, some of which may be spurious as a result of the bias associated with previously detected change-points. On the other hand, if d_T is set too large, some change-points may be left undetected due to the restriction imposed by its choice. The default choice of d_T recommended above worked well on the simulated datasets. Also, with some prior knowledge of the dataset, d_T may be selected accordingly, as we have done in the real data analysis.
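Reading [·] as the integer part and the first term of the default rule as the squared natural logarithm (our interpretation of the extracted formula), d_T can be computed as:

```python
import math

def default_dT(T):
    """Default minimum segment length d_T = [log^2(T) ∧ 0.25 * T^(6/7)]."""
    return math.floor(min(math.log(T) ** 2, 0.25 * T ** (6 / 7)))

# For short series the 0.25 * T^(6/7) term binds; for longer ones log^2(T) does:
# default_dT(200) == 23, default_dT(2000) == 57
```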

D.3.2 Implementation of the DCBS algorithm
Steps 1-2 of the bootstrap algorithm proposed in Section 6.1 generate panel data y^•_ℓt of the same dimensions as the original {y_ℓt, ℓ = 1, …, N; t = 1, …, T}. Therefore, the statistics T^•_{s,e} over different intervals [s_{u,v}, e_{u,v}] (with s = s_{u,v} and e = e_{u,v}), which are examined at certain iterations of the DCBS algorithm, can be computed from the same bootstrap panel data y^•_ℓt. Since storing R copies of such high-dimensional (N × T) panel data is costly, we propose to combine the DCBS algorithm with the bootstrap algorithm as described below.
Firstly, a binary tree of a given height, say h, is grown from y_ℓt with the (s_{u,v}, e_{u,v}) as its nodes, by omitting the testing procedure of Step 1.3 when applying the DCBS algorithm until u = h.
We impose a minimum length constraint that requires e u,v − s u,v + 1 ≥ 4d T ; if not, the interval [s u,v , e u,v ] is not split further.
Let I = {(s_{u,v}, e_{u,v}) : 1 ≤ v ≤ I_u; 1 ≤ u ≤ h} be the collection of the nodes in the thus-generated binary tree, with I_u denoting the number of nodes at level u. Then the following is repeated for m = 1, …, R: we generate the bootstrap panel data y^{m•}_ℓt using the bootstrap algorithm, and compute T^{m•}_{s,e} for all (s, e) ∈ I, from which the corresponding thresholds are drawn. Once this is complete, we perform Step 1.3 by setting (s, e) = (s_{u,v}, e_{u,v}) ∈ I; starting from (u, v) = (1, 1), we progress in the order (u, v) = (1, 1), (2, 1), …, (2, I_2), (3, 1), …. If T_{s,e} ≤ π_{N,T} for some (s, e), all the nodes (s', e') that depend on (s, e) (in the sense that [s', e'] ⊂ [s, e]) are removed from I, which ensures that we are indeed performing a binary segmentation procedure. We use h = [log₂ T/2] in the simulation studies and real data analysis, which grows a sufficiently large tree considering that a binary tree of height at most log₂ T can be grown.
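The tree-growing stage (splitting without testing, subject to the minimum-length constraint) can be sketched as below; `split_point` is a hypothetical stand-in for the CUSUM-type maximiser of the DCBS algorithm, and the subsequent bootstrap-threshold loop over (s, e) ∈ I is omitted:

```python
def grow_interval_tree(T, height, d_T, split_point):
    """Grow the binary tree of candidate intervals used by the DCBS
    algorithm, skipping the significance test: each node (s, e) is split
    at b = split_point(s, e) provided e - s + 1 >= 4 * d_T.
    Returns {level u: list of (s, e) nodes}."""
    tree = {1: [(1, T)]}
    for u in range(1, height):
        nxt = []
        for s, e in tree[u]:
            if e - s + 1 < 4 * d_T:
                continue               # too short: do not split further
            b = split_point(s, e)
            nxt.extend([(s, b), (b + 1, e)])
        if not nxt:
            break
        tree[u + 1] = nxt
    return tree
```

For instance, splitting at midpoints with T = 128, height 3 and d_T = 8 gives level 1 as [(1, 128)], level 2 as [(1, 64), (65, 128)] and four nodes at level 3; in the actual algorithm, each bootstrap replicate then evaluates its statistic on every stored node.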

D.4 Stationary bootstrap and choice of parameters
The stationary bootstrap (SB) generates bootstrap samples X^•_1, …, X^•_T as follows. Let J_1, J_2, … be i.i.d. geometric random variables independent of the data, with P(J_1 = L) = p(1 − p)^{L−1} for some p ∈ (0, 1); let I_1, I_2, … be i.i.d. uniform variables on {1, …, T}; and denote a data block by B(t, L) = (X_t, …, X_{t+L−1}) for t, L ≥ 1, with a periodic extension (X_t = X_{t−uT} for some u ∈ Z such that 1 ≤ t − uT ≤ T). The SB sample X^•_1, …, X^•_T is generated as the first T observations in the sequence B(I_1, J_1), …, B(I_Q, J_Q), where Q = min{q : Σ_{l=1}^{q} J_l ≥ T}. We used the bootstrap sample size R = 200 for the simulation studies and R = 500 for the real data analysis. Naturally, a larger bootstrap sample size is expected to yield a better choice of the threshold, but we did not observe any sensitivity to this choice in our simulation studies. The binary segmentation procedure implicitly performs multiple testing, with the number of tests determined by the choice of h described in Appendix D.3.2. However, the test statistics computed at different iterations of the DCBS algorithm are correlated; with the added difficulty arising from the hierarchy inherent in the procedure, the task of controlling for multiple testing is highly challenging.
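The block-resampling scheme just described can be implemented directly; the following is a minimal sketch of the Politis-Romano stationary bootstrap (the function name and 0-based indexing are ours):

```python
import numpy as np

def stationary_bootstrap(x, p, rng=None):
    """Stationary bootstrap of a length-T series x: concatenate blocks
    B(I_q, J_q) with i.i.d. uniform start points I_q and i.i.d.
    Geometric(p) lengths J_q, reading x periodically, and keep the
    first T observations."""
    rng = rng or np.random.default_rng()
    T = len(x)
    out = np.empty(T, dtype=float)
    t = 0
    while t < T:
        start = rng.integers(T)            # I_q ~ Uniform{0, ..., T-1}
        length = rng.geometric(p)          # J_q ~ Geometric(p), J_q >= 1
        block = x[(start + np.arange(length)) % T]   # periodic extension
        take = min(length, T - t)
        out[t:t + take] = block[:take]
        t += take
    return out
```

The expected block length is 1/p, so the choice of p governs how much serial dependence each bootstrap sample preserves.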
Instead, noticing that the bootstrap test statistics tend to have heavy tails, we chose to adopt the same α = 0.05 at all iterations of the DCBS algorithm.
For stationarity testing in multivariate (finite-dimensional) time series, Jentsch and Politis (2015) proposed to modify the above p, as per the theory developed for the consistency of their stationary bootstrap, by (a) replacing T^{1/3} with T^{1/5}, and (b) taking the average of the p^{-1} associated with each univariate series. Following their approach, we select p_j for each individual f_jt, j = 1, …, k, according to (a) in Step 1 for the common components; for the idiosyncratic components, we select p_i for each individual ε^k_it, i = 1, …, n, according to (a) and use (n^{-1} Σ_{i=1}^{n} p_i^{-1})^{-1} according to (b) in the same step. As for Λ, we plug in the automatically chosen bandwidth based on the autocorrelation structure of f_jt and ε^k_it, as suggested in Politis and White (2004).
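Rule (b), combining the per-series parameters p_i through the average of their reciprocals, amounts to taking a harmonic mean; a one-line sketch (hypothetical helper name):

```python
import numpy as np

def combine_p(p_values):
    """Combine per-series SB parameters via (n^{-1} sum_i p_i^{-1})^{-1},
    i.e. the harmonic mean, as in rule (b) above."""
    p = np.asarray(p_values, dtype=float)
    return 1.0 / np.mean(1.0 / p)

# combine_p([0.2, 0.5]) == 1 / ((5 + 2) / 2) == 2 / 7
```

Averaging the reciprocals (expected block lengths) rather than the p_i themselves means that series requiring long blocks dominate the common choice, which is the conservative direction for preserving serial dependence.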