Change-point detection in panel data via double CUSUM statistic

In this paper, we consider the problem of (multiple) change-point detection in panel data. We propose the double CUSUM statistic, which exploits the cross-sectional change-point structure by examining the cumulative sums of ordered CUSUMs at each point. The efficiency of the proposed change-point test is studied, which is reflected in the rate at which the cross-sectional size of a change is permitted to converge to zero while remaining detectable. Also, the consistency of the proposed change-point detection procedure, based on the binary segmentation algorithm, is established in terms of both the total number and the locations (in time) of the estimated change-points. Motivated by the representation properties of the Generalised Dynamic Factor Model, we propose a bootstrap procedure for test criterion selection, which accounts for both cross-sectional and within-series correlations in high-dimensional data. The empirical performance of the double CUSUM statistic, equipped with the proposed bootstrap scheme, is investigated in a simulation study comparing it with the state of the art. As an application, we analyse the log returns of S&P 100 component stock prices over a period of one year.


Introduction
Multivariate, possibly high-dimensional observations over time have emerged in many fields, such as economics, finance, natural science, engineering and the humanities, thanks to advances in computing technology (Fan, Lv and Qi, 2011). Multivariate data observed in practical problems often appear nonstationary, in the sense that it is natural to let some quantities or parameters involved in the model be time-varying. Arguably, the simplest departure from stationarity is to operate under the assumption of piecewise stationarity, which allows more flexibility while providing interesting insights into the data with regard to structural change-points. Besides, in the case of time series analysis, it enables (short-term) prediction of future process values by treating the last estimated segment as stationary. Throughout the paper, the term "multiple change-point detection" is used interchangeably with "segmentation".
Panel data models are frequently adopted to analyse high-dimensional data involving measurements over time. In this paper, we focus on the problem of detecting (possibly) multiple change-points in the mean of panel data, where $n$, the dimensionality of the data, may increase with the number of observations $T$. The panel model is presented as $x_{j,t} = f_{j,t} + \varepsilon_{j,t}$, $t = 1, \ldots, T$; $j = 1, \ldots, n$, where $\{f_{j,t}\}_{t=1}^T$, $j = 1, \ldots, n$, are piecewise constant signals which share an unknown number of change-points at unknown locations.
CUSUM statistics have been widely adopted for segmenting both univariate and multivariate data. For univariate data segmentation, CUSUM statistics are computed over time, and this series of CUSUMs is examined to locate a change-point, often at the point where its maximum absolute value is attained. Combined with a binary segmentation (BS) algorithm, the CUSUM statistics can consistently detect multiple change-points in a recursive manner (see e.g., Vostrikova (1981), Venkatraman (1992) and Cho and Fryzlewicz (2012)).
For segmenting $n$-dimensional panel data, we may apply the above procedure to each univariate component series separately, and then prune down the estimated change-points by identifying those that have been detected for an identical change-point across the panel. However, such pruning may be difficult to accomplish even in moderately large dimensions, due to the estimation bias present in each change-point estimate. Besides, this approach does not take into account, and thus cannot benefit from, the cross-sectional nature of the change-points (namely, that they are shared across the panel), which may lead to a loss of power in change-point detection. Instead, we propose to segment the $n$-dimensional data simultaneously by searching for change-points from an aggregation of the $n$ series of CUSUM statistics, rather than from individual CUSUM series separately.

Literature review
Let $\mathcal{C}_b$ denote a CUSUM operator which takes $x_{j,t}$ over a generic interval $t \in [s, e]$ with $1 \le s < e \le T$ as an input and returns, for $b = s, \ldots, e-1$,
$$X^j_{s,b,e} = \sqrt{\frac{(b-s+1)(e-b)}{e-s+1}}\left(\frac{1}{b-s+1}\sum_{t=s}^{b}\frac{x_{j,t}}{\sigma_j} - \frac{1}{e-b}\sum_{t=b+1}^{e}\frac{x_{j,t}}{\sigma_j}\right), \qquad (1)$$
with a suitably chosen scaling constant $\sigma_j$. Assuming the presence of at most one change-point, some change-point tests for panel data have been proposed based on the principle of aggregating the high-dimensional CUSUM series; note that for single change-point detection, $s = 1$ and $e = T$. Zhang et al. (2010) and Horváth and Hušková (2012) proposed test statistics based on the cross-sectional sum of the transformed CUSUMs $\{(X^j_{1,b,T})^2 - 1\}$ (the latter denoted by $T^{HH}_{1,T}$), and the limit distribution of the test statistic was derived for independent panel data. Enikeeva and Harchaoui (2014) proposed a change-point test for panel data with i.i.d. Gaussian noise that combines the linear statistic $T^{linear}_{1,T}$, which is constructed similarly to $T^{HH}_{1,T}$, and the scan statistic $T^{scan}_{1,T}$, which is aimed at detecting a cross-sectionally sparse change. Allowing for both temporal and cross-sectional dependence, Jirak (2015) proposed a test statistic obtained by taking the pointwise maximum of the multiple CUSUM series, $T^{Jirak}_{1,T} = \max_{1 \le j \le n}\max_{1 \le b < T}|X^j_{1,b,T}|$, which is compared against a threshold drawn from an extreme value distribution of Gumbel type, or selected via a bootstrap procedure.

Note that, with the exception of $T^{scan}_{1,T}$, the CUSUM aggregation methods above are not adaptive to the underlying structure of the CUSUM values at each $b$, in the sense that they take either the pointwise maximum or the sum of the (squared) CUSUMs. Empirical studies conducted in Cho and Fryzlewicz (2015) showed that such approaches may lead to inferior performance in detecting and locating change-points in high-dimensional settings. Instead, they proposed the Sparsified Binary Segmentation (SBS), where the change-point test $T^{SBS}_{1,T}(\pi_T) > 0$ is based on the "sparsified" or "thresholded" statistic obtained by summing $|X^j_{1,b,T}| \cdot \mathbb{I}(|X^j_{1,b,T}| > \pi_T)$ over $j = 1, \ldots, n$ ($\mathbb{I}(\mathcal{E}) = 1$ if and only if the event $\mathcal{E}$ is true), with an appropriately bounded threshold $\pi_T$ chosen to guarantee that $|X^j_{s,b,e}| < \pi_T$ uniformly over $j \in \{1, \ldots, n\}$ and $\{(s, b, e):\ 1 \le s \le b < e \le T\}$ with probability converging to one under the null hypothesis of no change-point. The intuition behind the construction of $T^{SBS}_{1,T}(\pi_T)$ is that irrelevant contributions from those components without any change are reflected in small values of $|X^j_{1,b,T}|$ and hence are disregarded through the thresholding step, while large CUSUMs formed in the vicinity of the true change-points are summed up to strengthen the meaningful contributions from the corresponding components. Combining the BS procedure with the thresholded test statistic, the consistency of the SBS algorithm in multiple change-point detection was established. However, it is empirically observed that the higher the autocorrelations in $\varepsilon_{j,t}$, the larger the $\pi_T$ required to guarantee consistency in change-point detection, which amounts to selecting $n$ thresholds for segmenting $n$-dimensional panel data.

Outline of the paper
In this paper, we propose the double CUSUM statistic, which accomplishes the high-dimensional CUSUM series aggregation through a data-driven partitioning of the panel data at each point, while avoiding the difficulty of selecting (possibly $n$) thresholds that apply directly to individual CUSUMs in order to establish change-point detection consistency.
The rest of the paper is organised as follows. We describe the double CUSUM statistic in detail in Section 2. In Section 3, we establish the consistency of the change-point test based on the double CUSUM statistic, and investigate its efficiency in comparison with the tests discussed in Section 1.1. Also, the double CUSUM Binary Segmentation algorithm is formulated and its consistency in multiple change-point detection is studied. Section 4 discusses the choice of important quantities, including the test criterion, for which a bootstrap procedure is introduced. We illustrate its performance on simulated datasets in Section 5 and on the log returns of S&P 100 component stock prices in Section 6. Section 7 concludes the paper, and the proofs of the theoretical results are provided in Section 8. Finally, some auxiliary results and simulation results are reported in the supplementary document (Cho, 2016).

Double CUSUM statistic
Recall the panel data model from the Introduction,
$$x_{j,t} = f_{j,t} + \varepsilon_{j,t}, \quad t = 1, \ldots, T;\ j = 1, \ldots, n. \qquad (3)$$
The noise $\{\varepsilon_{j,t}\}_{t=1}^T$ satisfies $E(\varepsilon_{j,t}) = 0$ for all $j$ and $t$, and is allowed to be correlated both within-series and cross-sectionally, as specified later in Section 3. The piecewise constant signals $\{f_{j,t}\}_{t=1}^T$, $j = 1, \ldots, n$, share $N$ change-points $1 < \eta_1 < \ldots < \eta_N < T$ (possibly with unknown $N$). That is, at each change-point $\eta_r$, there exists an index set $\Pi_r = \{j:\ f_{j,\eta_r} \ne f_{j,\eta_r+1}\}$ of cardinality $m_r = |\Pi_r| \ge 1$, with $\delta_{j,r} = f_{j,\eta_r+1} - f_{j,\eta_r}$ denoting the size of the jump in coordinate $j$.

Recall the definition of $X^j_{s,b,e}$ in (1), which denotes the CUSUM statistic computed on $x_{j,t}$ over a generic interval $t \in [s, e]$ with $1 \le s < e \le T$, and let $|X^{(1)}_{s,b,e}| \ge \ldots \ge |X^{(n)}_{s,b,e}|$ denote the CUSUM values at $b$ ordered in their modulus. For $\varphi \in [0, 1]$, the double CUSUM (DC) operator is defined as
$$D^\varphi_m(\{|X^j_{s,b,e}|\}_{j=1}^n) = \left\{\frac{m(2n-m)}{2n}\right\}^\varphi \left(\frac{1}{m}\sum_{j=1}^m |X^{(j)}_{s,b,e}| - \frac{1}{2n-m}\sum_{j=m+1}^n |X^{(j)}_{s,b,e}|\right), \qquad (4)$$
from which the test statistic is obtained as
$$T^\varphi_{s,e} = \max_{b \in [s, e)}\ \max_{1 \le m \le n}\ D^\varphi_m(\{|X^j_{s,b,e}|\}_{j=1}^n), \qquad (5)$$
which is compared against a test criterion $\pi^\varphi_{n,T}$. Once $T^\varphi_{s,e} > \pi^\varphi_{n,T}$, the location of the change-point is identified as where the pointwise maximum (over $m$) of the DC statistics is maximised (over $b$), i.e., $\widehat\eta = \arg\max_{b \in [s,e)}\max_{1 \le m \le n} D^\varphi_m(\{|X^j_{s,b,e}|\}_{j=1}^n)$. To understand the properties of the DC statistic, first consider the case where the noise $\varepsilon_{j,t}$ is not present in the panel data. Then the series of DC statistics at each fixed $m$ is always maximised at one of the true change-points within the interval $[s, e)$ and, consequently, the maximum over both $b$ and $m$ is guaranteed to be attained at a true change-point. The formal statement and its proof can be found in Appendix B of the supplementary document.
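To make the construction concrete, the following R sketch (ours, not part of the paper; `cusum_panel` and `dc_stat` are hypothetical helper names) implements (1) and (4) directly as reconstructed above, with a user-supplied vector standing in for the scaling constants $\sigma_j$.

```r
# Minimal R sketch of the CUSUM operator in (1) and the DC operator in (4);
# illustrative only. x: n x T data matrix; sigma: vector of scaling constants.
cusum_panel <- function(x, s, e, sigma) {
  # returns an n x (e - s) matrix whose columns are X^j_{s,b,e}, b = s, ..., e - 1
  d <- e - s + 1
  sapply(s:(e - 1), function(b) {
    sqrt((b - s + 1) * (e - b) / d) *
      (rowMeans(x[, s:b, drop = FALSE]) -
       rowMeans(x[, (b + 1):e, drop = FALSE])) / sigma
  })
}

dc_stat <- function(cus_b, phi) {
  # cus_b: the n CUSUM values at a fixed b; returns D^phi_m for m = 1, ..., n
  n <- length(cus_b)
  a <- sort(abs(cus_b), decreasing = TRUE)  # |X^(1)| >= ... >= |X^(n)|
  m <- seq_len(n)
  cs <- cumsum(a)
  (m * (2 * n - m) / (2 * n))^phi * (cs / m - (cs[n] - cs) / (2 * n - m))
}
```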
The key feature of the DC statistic is the ordering of the input series to $D^\varphi_m$. In panel data segmentation, in addition to detecting and locating the change-points in time, it is also of interest to locate the changes in coordinates, a problem relatively unexplored in the literature with the exception of Jirak (2015). Not only is such information useful in the interpretation of detected change-points, but it can also play an important role in aggregating the high-dimensional CUSUM series efficiently, as detailed below. At a given time point $b$, one way of partitioning the components into those with changes and those without is to arrange the moduli of the CUSUMs in decreasing order, and then to label the components corresponding to the first $m_b\ (\in \{1, \ldots, n\})$ largest values of $|X^j_{s,b,e}|$ as being likely to have a change-point around $b$. Note that aggregating the (squared) CUSUMs via pointwise averaging or maximising implicitly sets $m_b = n$ or $m_b = 1$, respectively. In constructing $T^{SBS}_{1,T}(\pi_T)$, the choice of $m_b$ is associated with the choice of $\pi_T$, i.e., $m_b(\pi_T) = \sum_{j=1}^n \mathbb{I}(|X^j_{s,b,e}| > \pi_T)$. The DC statistic provides a data-driven alternative for selecting $m_b$, namely
$$\widehat m^\varphi_b = \arg\max_{1 \le m \le n} D^\varphi_m(\{|X^j_{s,b,e}|\}_{j=1}^n). \qquad (6)$$
While we do not claim that the thus-obtained partitioning consistently identifies $\Pi_r$ at each detected change-point, we see in the simulation studies reported in Section 5 that the DC statistic performs well in change-point detection by identifying those components that contribute to change-point detection according to the above $\widehat m^\varphi_b$. Moreover, since the RHS of (5) involves maximisation over both $b$ and $m$, we avoid explicitly selecting a $\pi_T$ that applies directly to individual CUSUMs, while still enjoying the "sparsifying" effect of the thresholding step.
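As a hypothetical illustration of (5) and (6), the snippet below applies the sketch functions above to a toy panel with a single change-point and recovers both its location and the data-driven partition size.

```r
# Toy usage of cusum_panel() and dc_stat(): eta_hat via (5), m_hat via (6).
set.seed(1)
n <- 100; TT <- 200
f <- matrix(0, n, TT); f[1:10, 101:TT] <- 1   # 10 components shift at eta = 100
x <- f + matrix(rnorm(n * TT), n, TT)
X <- cusum_panel(x, 1, TT, sigma = rep(1, n))
D <- apply(abs(X), 2, dc_stat, phi = 0.5)     # D[m, b], an n x (TT - 1) matrix
peak <- which(D == max(D), arr.ind = TRUE)[1, ]
eta_hat <- peak[2]  # estimated change-point location
m_hat   <- peak[1]  # estimated number of contributing components
```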
Comparing (1) and (4), we can see the connection between the two operators $D^\varphi_m$ and $\mathcal{C}_b$, especially when $\varphi = 1/2$, as both return the scaled difference between partial averages over two disjoint index sets. However, $D^\varphi_m$ involves $\{m(2n-m)/(2n)\}^\varphi$ instead of $\{m(n-m)/n\}^\varphi$, although the input series is of length $n$. This difference comes from the observation that the latter scaling factor may not favour a change-point that is shared by more than $[n/2]$ rows, and may even act as a penalty when $m_r$ is close to $n$, which is against our intuition on the detectability of a change-point. By adopting $\{m(2n-m)/(2n)\}^\varphi$, the DC statistic can be regarded as being computed on panel data of dimension $2n$, where there are an additional $n$ "null" components known to have no change-point. Since $\{m(2n-m)/(2n)\}^\varphi$ is non-decreasing in $m$, when considering the asymptotic efficiency of the DC statistic-based change-point test, the cross-sectional size of a change accounts for both $|\delta_{j,r}|$, the magnitude of the jumps at the change-point, and $m_r$, its "density" (as opposed to sparsity) across the panel (see Remark 3.1 for further discussion).
For illustration, we computed the DC statistics from panel data generated with a single change-point of different configurations, with $\varphi$ chosen as detailed in Section 4.1. Fixing $n = 250$ and $T = 100$, $\varepsilon_{j,t}$ was simulated as in (N1) of Section 5.1.1, with the parameter that controls the degree of cross-correlations in $\varepsilon_{j,t}$ set to 0.2. The piecewise constant signals were generated with a change-point at $\eta_1 = T/2$, where $m_1$ out of $n$ components contained a shift of magnitude randomly drawn from the uniform distribution $\mathcal{U}(0.75\delta_1, 1.25\delta_1)$, with $(m_1, \delta_1) = ([\log n], 0.24)$ and $([0.5n], 0.05)$. We chose $m_1$ and $\delta_1$ in order to set $\sum_{j \in \Pi_1}\delta^2_{j,1}$ at approximately the same level in the two settings.
As shown in Figure 1, the location of the true change-point in time was accurately identified in both settings as where the pointwise maximum of the DC statistics was maximised, i.e., $\widehat\eta_1 = \arg\max_{b \in [1,T)}\max_{1 \le m \le n} D^\varphi_m(\{|X^j_{1,b,T}|\}_{j=1}^n) = \eta_1$; besides, the corresponding $\widehat m^\varphi_{\widehat\eta_1}$ was closer to $m_1$ for the larger $\delta_1$. This implies that, in the presence of noise, not all components with changes contribute to the detection of a change-point, due to the small magnitudes of some changes, and using only a subset of $\Pi_1$ may serve the purpose of change-point detection better.
Remark 2.1. For high-dimensional change-point analysis, Aston and Kirch (2014) defined the high-dimensional efficiency, a concept closely related to asymptotic relative efficiency. Let $\delta = (\delta_{1,1}, \ldots, \delta_{n,1})^\top$. Then, the high-dimensional efficiency is determined by the rate at which the cross-sectional size of the change is allowed to converge to zero ($\|\delta\|_2 \to 0$) as $T$ and, with it, $n$ increase, such that the power of the change-point test is strictly between the size and one. They further investigated the problem of single change-point detection using a class of change-point statistics obtained by (i) first projecting the panel data with respect to a projection vector $p \in \mathbb{R}^n$, and (ii) computing the series of CUSUM statistics from the univariate series $\{\langle \mathbf{x}_t, p\rangle\}_{t=1}^T$, with $\mathbf{x}_t = (x_{1,t}, \ldots, x_{n,t})^\top$. Assuming that $\varepsilon_t = (\varepsilon_{1,t}, \ldots, \varepsilon_{n,t})^\top$ is independent over time and denoting $\Sigma_\varepsilon = \mathrm{var}(\varepsilon_t)$, the oracle projection vector with the optimal high-dimensional efficiency was obtained by maximising $(p^\top\Sigma_\varepsilon p)^{-1}|\langle\delta, p\rangle|^2$ with respect to $p$, as $p_o := \Sigma_\varepsilon^{-1}\delta$. The DC statistic at fixed $b$ and $m$ coincides with the CUSUM of a pointwise projection of the panel data. More specifically, $D^\varphi_m(\{|X^j_{1,b,T}|\}_{j=1}^n)$ is associated with an $n$-vector $p^\varphi_{b,m}$ whose elements take one of two values in modulus, according to whether the corresponding coordinate belongs to the $m$ largest ordered CUSUMs at $b$ or not. In light of the previous discussion on the pointwise ordering and partitioning of the panel data in computing the DC statistics, our approach may be viewed as an attempt at mimicking the performance of the oracle projection without prior knowledge of either $\delta$ or $\Sigma_\varepsilon$. We further investigate the high-dimensional efficiency of the DC statistic in comparison with other competitors and the oracle projection in Section 3.1.
Remark 2.2. The scan statistic (Enikeeva and Harchaoui, 2014) shares with the DC statistic the pointwise maximisation of the cumulative sums of the ordered CUSUMs over $1 \le m \le n$. One key difference is that the former involves the squared ordered CUSUMs $(X^{(j)}_{1,b,T})^2$, with the partial sums compared against $m$-dependent thresholds $\mathcal{T}_m$. Enikeeva and Harchaoui (2014) proposed two choices for $\mathcal{T}_m$: a theoretical one from a $\chi^2_m$-distribution, and an empirical one, $\mathcal{T}_m = 2(2n)^{-1/2}\{m\log(ne/m) + \log(Tn/\alpha)\}$, at a given significance level $\alpha \in (0, 1)$. In conducting the simulation studies reported in Section 5, both choices of $\mathcal{T}_m$ were observed to be too sensitive to cross-sectional correlations in the panel data. $\mathcal{T}_m$ may instead be chosen numerically, e.g., from a bootstrap scheme as indicated by the authors, but this requires selecting $n$ such $\mathcal{T}_m$ and, moreover, it is unclear how the cross-correlations should be treated.

Single change-point detection
In this section, we show the consistency of the DC statistic in the single change-point scenario ($N = 1$). More specifically, we consider the null hypothesis $H_0:\ f_{j,1} = \cdots = f_{j,T}$ for all $j = 1, \ldots, n$, which indicates structural stability in the mean over time. As an alternative hypothesis, we specify the scenario where the piecewise constant signals $\{f_{j,t}\}_{t=1}^T$, $j = 1, \ldots, n$, contain a single change-point at an unknown location $\eta_1 \in (1, T)$, such that $f_{j,t} = f_{j,1} + \delta_{j,1}\mathbb{I}(t > \eta_1)$ with $\delta_{j,1} \ne 0$ for $j \in \Pi_1$ and $\delta_{j,1} = 0$ otherwise.

Throughout the paper, $a \sim b$ is used to denote that $a$ is of the order of $b$, and $a \wedge b = \min(a, b)$. Then, the consistency of the proposed test is established under the following conditions.
Denote by $\bar\delta_1 = m_1^{-1}\sum_{j \in \Pi_1}|\delta_{j,1}|$ the average magnitude of the non-zero changes at $t = \eta_1$. Then, for any $\varphi \in [0, 1]$, the condition (A5) below requires that $(n^\varphi \log T)^{-1} m_1^\varphi \bar\delta_1 T^{\beta/2} \to \infty$.

In (A1), each $\varepsilon_{j,t}$ is assumed to be $\alpha$-mixing (strong mixing) at a geometric rate, with bounded moments and long-run variance. The condition (A1.i) is met by many distributions, such as the exponential, gamma and inverse Gaussian distributions, besides the Gaussian. Note that the cross-sectional correlations of the panel data are not explicitly controlled by any of the conditions imposed in (A1). In (A2), the dimensionality can either be fixed or increase with $T$ at a polynomial rate. (A3) rules out the trivial case where some $|\delta_{j,1}| \to \infty$ as $T \to \infty$. (A4) imposes a condition on the unbalancedness of the change-point location, permitting $T^{-1}\{\eta_1 \wedge (T - \eta_1)\} \to 0$ as $T \to \infty$ when $\beta < 1$. (A5) places a lower bound on the rate of $m_1^\varphi\bar\delta_1$, which dictates the minimum requirement on the cross-sectional size of the change for the change-point to be detected as well as located with accuracy.

We adopt the high-dimensional efficiency of Aston and Kirch (2014) as a tool that allows us to quantify and compare the power of different change-point tests. Table 1 summarises the high-dimensional efficiency of $T^\varphi_{1,T}$ and the other change-point tests discussed in Section 1.1, when the change-point is maximally distanced from the extreme ends of $[1, T]$ (i.e., $\beta = 1$).

[Table 1: High-dimensional efficiency of change-point tests when $\beta = 1$, distinguishing those with the interpretation as projected change-point tests from those without.]
The oracle projection-based change-point test (see Remark 2.1) achieves the high-dimensional efficiency of $T^{1/2}\|\Sigma_\varepsilon^{-1/2}\delta\|_2 \to \infty$, thus attaining better efficiency than $T^{HH}_{1,T}$ and $T^{linear}_{1,T}$ by the factor $n^{1/4}$, and than $T^{scan}_{1,T}$ by a factor involving $m_1$. Comparing $T^0_{1,T}$ and $T^{1/2}_{1,T}$ with $T^{scan}_{1,T}$, the latter attains better high-dimensional efficiency when the change is sparse across the panel. However, the former two achieve partitioning consistency (i.e., consistent estimation of $\Pi_1$), which is not granted by the latter.

Remark 3.2. As briefly noted below (1), some studentisation is required for panel data analysis in practice. We estimate $\sigma_j$ using the flat-top kernel estimator with the automatically chosen bandwidth, as discussed in Politis (2011). Let $\widehat\varepsilon_{j,t}$ in (7) denote the residuals obtained by subtracting from $x_{j,t}$ the sample means over the segments defined by a preliminary, coordinate-wise change-point estimate, where the constants are chosen as per Hušková and Kirch (2010). Then, the estimator $\widehat\sigma^2_j$ is given in (8) as the maximum of the flat-top kernel estimator computed on $\widehat\varepsilon_{j,t}$ and a strictly positive lower bound, where the second term on the RHS of (8) is present to prevent spuriously large CUSUMs resulting from too small estimates of $\sigma_j$ (the flat-top kernel estimator is known to produce negative estimates). Hušková and Kirch (2010) showed the consistency of the above estimator in the single change-point detection problem, under conditions similar to those imposed on $\varepsilon_{j,t}$ in (A1). Noting that (i) the theoretical results derived in this paper hold without the consistency of the flat-top kernel estimator, provided that $\widehat\sigma^2_j$ is uniformly bounded away from zero, and (ii) we ultimately consider the problem of multiple change-point detection (with an adjustment to the estimator of $\sigma^2_j$ in order to accommodate the presence of multiple change-points, see Remark 3.3), we assume (A6) below on $\widehat\sigma^2_j$ without extending the consistency result of Hušková and Kirch (2010).
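For concreteness, a hedged sketch of such a scale estimator is given below; the trapezoidal flat-top taper follows Politis (2011), while the fixed bandwidth `M` and the lower bound `1e-3` are illustrative stand-ins for the automatic bandwidth rule and the second term in (8).

```r
# Sketch of a flat-top-kernel long-run variance estimator in the spirit of
# Politis (2011); bandwidth and lower bound are illustrative choices only.
lrv_flattop <- function(eps, M = floor(sqrt(length(eps)))) {
  g <- as.numeric(acf(eps, lag.max = M, type = "covariance", plot = FALSE)$acf)
  u <- (0:M) / M
  w <- ifelse(u <= 0.5, 1, pmax(0, 2 * (1 - u)))  # trapezoidal flat-top taper
  g[1] + 2 * sum(w[-1] * g[-1])  # gamma(0) + 2 * sum_k w(k / M) * gamma(k)
}
sigma_hat <- function(eps_hat) sqrt(max(lrv_flattop(eps_hat), 1e-3))
```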
Note that under (A4), short intervals near the extreme points $\{1, T\}$ do not contain the change-point; hence we search for a change-point away from the extreme ends of $[1, T]$. Under these conditions, we present the following theorems on the consistency of the DC statistic-based test equipped with a test criterion $\pi^\varphi_{n,T}$ satisfying $C n^\varphi \log T < \pi^\varphi_{n,T} < C' m_1^\varphi \bar\delta_1 T^{\beta/2}$ for some $C, C' > 0$.

Theorem 3.1. Assume that (A1)-(A3) and (A6) hold and that there exists no change-point ($N = 0$) in the panel data in (3). Then $P(T^\varphi_{1,T} > \pi^\varphi_{n,T}) \to 0$ as $T \to \infty$.
Theorem 3.1 guarantees that, when all signals remain constant, the test does not detect any change-point with probability converging to one.
Theorem 3.2. Assume that (A1)-(A6) hold. Then there exists $c_0 > 0$ such that, with probability converging to one, $T^\varphi_{1,T} > \pi^\varphi_{n,T}$ and $|\widehat\eta - \eta_1| < c_0\epsilon_T$ for a sequence $\epsilon_T$ made explicit in the proof.

Theorem 3.2 states that, in the presence of a single change-point, the proposed test detects its presence as well as accurately identifies its location. From (A5), it is easily seen that $\epsilon_T/T^\beta \to 0$ as $T \to \infty$, i.e., in the rescaled time, the estimated change-point location is consistent since $T^{-\beta}|\widehat\eta - \eta_1| \to_p 0$. We may define optimality in change-point detection as when the true change-point and its estimate are within a distance of $O_p(1)$; see Korostelev (1987). Then, near-optimality in change-point estimation is achieved up to a logarithmic factor ($\log^2 T$) with the choice of (i) $\varphi = 0$ when $\bar\delta_1 \sim 1$, or (ii) $\varphi > 0$ when $m_1 \sim n$ and $\bar\delta_1 \sim 1$.

Binary segmentation for multiple change-point detection
In this section, we show the consistency of the DC statistic in multiple change-point detection when applied jointly with a BS algorithm. We first formulate the double CUSUM Binary Segmentation (DCBS) algorithm for panel data segmentation, which is equipped with a test criterion $\pi^\varphi_{n,T}$. As in Section 3.1, a fraction of the interval $[s, e]$ at each end is excluded from the search for change-points, in order to account for possible bias in the previously detected change-points. This does not affect the asymptotic consistency of the estimated change-point locations under the assumption (B1) below on the dispersion of the change-points. The index $p$ is used to denote the level (indicating the progression of the segmentation procedure) and $q$ the location of the node at each level.
Step 1 At the current level $p$, repeat the following for all $q$.

Step 1.1 Letting $s = s_{p,q}$ and $e = e_{p,q}$, obtain the series of CUSUMs $\{X^j_{s,b,e}\}_{j=1}^n$, from which $D^\varphi_m(\{|X^j_{s,b,e}|\}_{j=1}^n)$ is computed over all $b$ and $m$.

Step 1.2 Obtain the test statistic $T^\varphi_{s,e}$ as in (5).

Step 1.3 If $T^\varphi_{s,e} > \pi^\varphi_{n,T}$, add the maximiser $\widehat\eta$ of the pointwise maximum of the DC statistics to the set of estimated change-points, and define the intervals to be examined at the next level as $[s_{p+1,2q-1}, e_{p+1,2q-1}] = [s, \widehat\eta]$ and $[s_{p+1,2q}, e_{p+1,2q}] = [\widehat\eta + 1, e]$.

Step 2 Once $[s_{p,q}, e_{p,q}]$ for all $q$ are examined at level $p$, set $p \leftarrow p + 1$ and go to Step 1.
Step 1.3 furnishes a stopping rule for the DCBS algorithm: the search for further change-points is terminated once $T^\varphi_{s,e} \le \pi^\varphi_{n,T}$ on every $[s, e]$ defined by two adjacent estimated change-points.

Remark 3.3. An adjustment is required to extend the scaling estimation procedure in Remark 3.2, originally designed for single change-point detection, to the multiple change-point scenario. More specifically, $\widehat\varepsilon_{j,t}$ in (7) can no longer be regarded as estimating $\varepsilon_{j,t}$ well when $[1, \widehat\eta_{j,1}]$ or $(\widehat\eta_{j,1}, T]$ may contain further change-points. Seeing that $\widehat\eta_{j,1}$ is the top node of a binary tree that results from applying a BS algorithm to $\{x_{j,t}\}_{t=1}^T$, we remedy this by growing a binary tree of depth $L_T\ (= O(\log T))$ from each $\{x_{j,t}\}_{t=1}^T$, $j = 1, \ldots, n$. Such a tree represents a maximal segmentation of $x_{j,t}$ and therefore, regarding each segment as stationary, $\widehat\varepsilon_{j,t}$ is derived by subtracting the sample means within those intervals. Then, the scaling estimation procedure is applied to the thus-obtained $\widehat\varepsilon_{j,t}$.
Note that this approach requires an arbitrary choice of $L_T$ when there is no prior information on the upper bound on $N$, the total number of change-points. For multiple change-point detection, such a requirement (or its equivalent) is commonly found; see e.g., Soh and Chandrasekaran (2015) and Kirch and Muhsal (2015), where the minimum distance between two consecutive change-points plays an essential role in guaranteeing the consistency of the proposed procedures. A sketch of the DCBS recursion in code form is given below.
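The following R sketch (ours, reusing `cusum_panel()` and `dc_stat()` from Section 2) summarises the DCBS recursion of Steps 1-2 under unit scaling; the scaling estimation of Remarks 3.2-3.3 and the level-wise bookkeeping are omitted, as the recursive calls traverse the binary tree implicitly.

```r
# Simplified sketch of the DCBS recursion: test on [s, e], split at the
# maximiser if the criterion pi_nT is exceeded, and recurse on both segments.
dcbs <- function(x, s, e, pi_nT, phi = 0.5, d_T = 5, sigma = rep(1, nrow(x))) {
  if (e - s + 1 < 2 * (d_T + 1)) return(integer(0))
  X <- abs(cusum_panel(x, s, e, sigma))
  bs <- (s + d_T):(e - d_T - 1)             # trim the extreme ends (Step 1.1)
  D <- apply(X[, bs - s + 1, drop = FALSE], 2, dc_stat, phi = phi)
  if (max(D) <= pi_nT) return(integer(0))   # stopping rule (Step 1.3)
  eta <- bs[which(D == max(D), arr.ind = TRUE)[1, 2]]
  c(dcbs(x, s, eta, pi_nT, phi, d_T, sigma), eta,
    dcbs(x, eta + 1, e, pi_nT, phi, d_T, sigma))
}
```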
The consistency of the DCBS algorithm is established under (A1)-(A3), (A6) and the following conditions extending (A4)-(A5) in order to allow for the presence of multiple change-points.
(B1) There exists a fixed constant $c > 0$ such that $\min_{0 \le r \le N}(\eta_{r+1} - \eta_r) \ge cT^\beta$ for some $\beta \in (6/7, 1]$, using the notational convention $\eta_0 = 0$ and $\eta_{N+1} = T$. Since the unbalancedness of a change-point location is closely related to the distance between two adjacent change-points, (B1) is formulated in terms of the latter. Note that we do not impose any upper bound on the total number of change-points $N$, except for the implication that can be derived from (B1), namely that $N$ may grow with $T$ provided that any two adjacent change-points are sufficiently distanced. Comparing (B2) to (A5), the high-dimensional efficiency is worsened by the factor $T^{3/2-5\beta/4}$ when detecting multiple change-points, which is also the case when applying the BS algorithm to univariate data. This rate can be improved when the DC statistic is combined with e.g., the Wild Binary Segmentation (WBS) (Fryzlewicz, 2014), which applies the CUSUM principle to randomly drawn intervals; we leave the exploration of this direction for future research.
Unlike in the single change-point scenario, the rate $\epsilon_T$ now depends on the dispersion of the change-points through $\beta$. From (B1)-(B2), it is easily seen that $\epsilon_T/T^\beta \to 0$ as $T \to \infty$, and hence the multiple change-points are consistently located in rescaled time for all $r = 1, \ldots, N$.
Remark 3.4 (Post-processing of estimated change-points). In this paper, we focus on establishing the good performance of the DC statistic in high-dimensional CUSUM series aggregation when combined with a simple segmentation method, rather than considering more sophisticated algorithms such as the WBS (Fryzlewicz, 2014) or MOSUM (Kirch and Muhsal, 2015) procedures, to which the DC principle is straightforwardly applicable. The BS algorithm is known to perform sub-optimally in certain unfavourable settings since, at each iteration, it fits a step function with a single break to the data over a segment that possibly contains multiple change-points. Hence, we equip the DCBS algorithm with an extra step aimed at removing spuriously detected change-points, in line with the post-processing steps proposed in Cho and Fryzlewicz (2012) and Cho and Fryzlewicz (2015). It checks whether $T^\varphi_{\widehat\eta_r - d_r, \widehat\eta_r + d_r} > \pi^\varphi_{n,T}$ for $r = 1, \ldots, \widehat N$, with $d_r$ chosen so that the interval $[\widehat\eta_r - d_r, \widehat\eta_r + d_r]$ contains no estimated change-point other than $\widehat\eta_r$. In other words, we compute the DC statistics within each segment containing a single estimated change-point, and retain only those $\widehat\eta_r$ that survive the thresholding. More details on the post-processing step can be found in Section 3.2.1 of Cho and Fryzlewicz (2012).
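One possible implementation of this check is sketched below; taking $d_r$ as half the distance to the nearest neighbouring estimate is our assumption for illustration, and `prune_cpts` is a hypothetical helper name.

```r
# Illustrative post-processing: re-test each estimated change-point over a
# window [eta_r - d_r, eta_r + d_r] containing no other estimate.
prune_cpts <- function(x, eta, pi_nT, phi = 0.5, sigma = rep(1, nrow(x))) {
  eta <- sort(eta); nb <- c(0, eta, ncol(x))
  keep <- sapply(seq_along(eta), function(r) {
    d_r <- floor(min(eta[r] - nb[r], nb[r + 2] - eta[r]) / 2)
    if (d_r < 2) return(TRUE)  # window too short to re-test
    X <- abs(cusum_panel(x, eta[r] - d_r, eta[r] + d_r, sigma))
    max(apply(X, 2, dc_stat, phi = phi)) > pi_nT
  })
  eta[keep]
}
```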

Choice of ϕ
Remark 3.1 indicates that $\varphi = 0$ is preferable in terms of the high-dimensional efficiency for detecting a change-point that is known to be sparse across the panel, while $T^\varphi_{1,T}$ with $\varphi > 0$ achieves the same high-dimensional efficiency as $T^0_{1,T}$ when the change is dense across the panel. In practice, such information is often unavailable a priori, and it is therefore of interest to find a way of combining the information embedded in the array of DC statistics indexed by $\varphi \in [0, 1]$ that works universally well over different ranges of change-point configurations determined by $\eta_r$, $m_r$ and $\delta_{j,r}$.
Recalling the data-driven partitioning of the panel achieved by the DC statistic, the pointwise partitioning of the ordered CUSUM values may be regarded as analogous to fitting a step function with a single break to $|X^{(j)}_{s,b,e}|$, $j = 1, \ldots, n$, as if the ordered CUSUM series arises from an additive model with a piecewise constant signal that contains a single break at $j = m_r$ for some $r \in \{1, \ldots, N\}$ and is otherwise constant. This setting is unlikely to be satisfied by the ordered CUSUM values in general, but it provides a framework for linking the optimality in panel partitioning to that in locating a single change-point. Then, Brodsky and Darkhovsky (1993) showed that the choice of $\varphi = 1/2$ leads to the best rate of convergence in the bias $|\widehat m^\varphi_b - m_r|$ (recall (6)) among $\varphi \in [0, 1]$, i.e., asymptotically, the optimal partitioning is attained by the choice of $\varphi = 1/2$.

Taking into account these observations, we propose a new DC statistic which combines $D^0_m$ and $D^{1/2}_m$ as $\check D_m(\cdot) = \max\{D^0_m(\cdot),\ \gamma_n^{-1}D^{1/2}_m(\cdot)\}$, where $\gamma_n$ acts as a scaling factor that enables treating $D^0_m$ and $D^{1/2}_m$ on an equal footing. Heuristically, the discrepancy between $D^0_m$ and $D^{1/2}_m$ increases at the rate $m^{1/2}$, but is not so pronounced for small values of $m$. Therefore, $\check D_m$ can be viewed as an attempt at combining the different ranges of change-point configurations over which either $D^0_m$ or $D^{1/2}_m$ achieves consistency, and the following conditions modifying (A5) and (B2) reflect this point.
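In code, the combination is a one-line extension of the earlier `dc_stat()` sketch; the default value of $\gamma_n$ below is purely illustrative, as its choice is left open in the text.

```r
# Sketch of the combined DC statistic: pointwise maximum of D^0_m and the
# rescaled D^{1/2}_m; the default gamma_n is an illustrative choice only.
dc_combined <- function(cus_b, gamma_n = sqrt(log(length(cus_b)))) {
  pmax(dc_stat(cus_b, phi = 0), dc_stat(cus_b, phi = 0.5) / gamma_n)
}
```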
Then, the consistency of the DC statistics carries over to the newly defined DC statistic $\check D_m$, as below.

Bootstrap for test criterion selection
Theorems 3.1-4.1 provide ranges for the rate of $\pi^\varphi_{n,T}$ which grant size and power consistency for both single and multiple change-point detection problems. However, the theoretical rates involve typically unattainable knowledge of quantities such as $\beta$ and $\Delta^\varphi$ and, even with such knowledge available, the finite sample performance may be heavily influenced by the choice of the multiplicative constant applied to the given rate. Therefore, we propose a resampling procedure that enables us to approximate the quantiles of $T^\varphi_{s,e}$ under the null hypothesis of no change-points.
Originally proposed as a data-based simulation method for statistical inference with i.i.d. random samples, the bootstrap principle has been extended to a variety of non-i.i.d. situations; see Härdle, Horowitz and Kreiss (2003) for a comprehensive survey. Although some heuristic attempts have been made (Fiecas and von Sachs, 2014), applying bootstrap methods developed for time series of fixed dimensions to high-dimensional settings is challenging. Some recent efforts in this direction include Jentsch and Politis (2015), who proposed the Linear Process Bootstrap for multivariate time series and established its asymptotic validity for the sample mean when $n$ is allowed to increase with $T$. The procedure involves the estimation of the $(nT) \times (nT)$-dimensional covariance matrix of the vectorised version of $[\varepsilon_t,\ t = 1, \ldots, T]$ and the computation of its Cholesky decomposition, where the latter task alone is of computational complexity $O(n^3T^3)$, which makes it difficult to apply the method even to panel data of moderately large dimensions.
Instead, we propose a bootstrap procedure motivated by the representation theory developed for the Generalised Dynamic Factor Model (GDFM). Factor analysis is a popular dimension reduction technique used in many disciplines, such as econometrics, statistics, signal processing, psychometrics and chemometrics. The key idea is that pervasive cross-correlations in $\varepsilon_t$ are modelled by the common component $\chi_t$, while the idiosyncratic component $\xi_t = \varepsilon_t - \chi_t$ exhibits only a moderate degree of cross-correlations. The GDFM introduced in Forni et al. (2000) goes a step further and admits the following representation theorem (Forni and Lippi, 2001): any $\varepsilon_t$ with a finite number ($q < \infty$) of diverging dynamic eigenvalues is decomposed into $\chi_t$, driven by a $q$-tuple of white noises $u_t$ (common shocks) as $\chi_t = b(L)u_t$ ($L$ denotes the lag operator and $b(L)$ is an $n \times q$ matrix of one-sided, square-summable filters $b_{ik}(L)$), and $\xi_t$ with bounded dynamic eigenvalues. Referred to as the GDFM-boot, the proposed bootstrap method utilises this representation property of the GDFM in order to effectively handle the cross-correlations, as well as the within-series correlations, present in $\varepsilon_t$ of large dimensions.
Step 2 Let $\widehat e_l$ denote the eigenvector of the covariance matrix $T^{-1}\mathbf{E}\mathbf{E}^\top$ which corresponds to its $l$-th largest eigenvalue. Then, estimate $q$, the number of common shocks driving the cross-correlations of $\mathbf{E}$, using the information criterion proposed by Bai and Ng (2002): $\widehat q = \arg\min_{0 \le k \le Q} IC(k)$, where $IC(k) = \log\{(nT)^{-1}\sum_{j=1}^n\sum_{t=1}^T(\widehat\varepsilon_{j,t} - \widehat\chi^{(k)}_{j,t})^2\} + kC^{-1}_{n,T}\log(C_{n,T})$ with $\widehat\chi^{(k)}_{j,t}$ denoting the common component estimated with $k$ factors, $Q = C_{n,T}/\log(C_{n,T})$ and $C_{n,T} = n \wedge T$; a code sketch of this step follows Step 5 below.
Step 3 Estimate the $q$-dimensional common shocks ($\widehat u_t$) and the associated filter ($\widehat b(L)$), such that $\widehat\varepsilon_t$ is decomposed into $\widehat\chi_t = \widehat b(L)\widehat u_t$ and $\widehat\xi_t = \widehat\varepsilon_t - \widehat\chi_t$.
Step 4.1 For each $k = 1, \ldots, \widehat q$, draw $\{u^l_{k,t}\}_{t=1}^T$ independently from the empirical distribution of $\{\widehat u_{k,t}\}_{t=1}^T$, from which $\chi^l_t = \widehat b(L)u^l_t$ is obtained.
Step 4.2 Generate a bootstrap sequence of the Fourier coefficients of $\widehat\xi_t$ using the Time Frequency Toggle (TFT)-Bootstrap (Kirch and Politis, 2011), to which the inverse fast Fourier transform algorithm is applied to produce $\xi^l_t$.

Step 4.3 On the bootstrap series $\varepsilon^l_t = \chi^l_t + \xi^l_t$, compute the CUSUM series and hence the test statistic $T^{\varphi,l}_{1,T}$ as in (5).
Step 5 Select the $(1-\alpha)$-quantile of $T^{\varphi,l}_{1,T}$, $l = 1, \ldots, B$, as $\pi^\varphi_{n,T}$.

In Step 1, we adopt the same approach taken in estimating $\sigma_j$: first estimate the locations of the change-points coordinate-wise, and then subtract the estimated piecewise constant signal from each $x_{j,t}$, in order to estimate the unobservable $\varepsilon_{j,t}$. Note that the practice of studentising the input data is often adopted prior to factor modelling. Step 2 is justified by the observations that (i) the number of factors in the static representation of factor models is typically greater than the number of common shocks in the GDFM, and (ii) an overestimated $\widehat q$ still returns mean-square consistent common components (Forni et al., 2000, Corollary 2). Section 4 of Forni et al. (2000) provides an estimator of $b(L)$ (and hence $u_t$) based on the windowed estimator of the spectral density matrix of $\mathbf{E}$, which is adopted for Step 3.
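A compact sketch of the factor-number selection in Step 2 follows; the rank-$k$ principal component fit stands in for the estimated common component $\widehat\chi^{(k)}$, and the loss inside the logarithm reflects our reading of the criterion displayed above.

```r
# Sketch of Step 2: select q_hat by the Bai-Ng-type criterion IC(k);
# E is the n x T residual matrix, and the rank-k SVD fit plays chi_hat^(k).
estimate_q <- function(E) {
  n <- nrow(E); Tn <- ncol(E)
  C_nT <- min(n, Tn); Q <- floor(C_nT / log(C_nT))
  sv <- svd(E)  # E = U diag(d) V'
  ic <- sapply(0:Q, function(k) {
    fit <- if (k == 0) 0 else
      sv$u[, 1:k, drop = FALSE] %*% (sv$d[1:k] * t(sv$v[, 1:k, drop = FALSE]))
    log(mean((E - fit)^2)) + k * log(C_nT) / C_nT
  })
  which.min(ic) - 1  # q_hat in {0, ..., Q}
}
```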
Step 4.1 produces a bootstrap sample of $\chi_t$ by treating the white noise estimates $\widehat u_{k,t}$ as i.i.d. over time. In Step 4.2, the Local Bootstrap, originally proposed in Kirch and Politis (2011) as a part of the TFT-Bootstrap for resampling univariate time series, is applied to the $n$-dimensional $\widehat\xi_t$, whose cross-sectional correlations are bounded. It does not require an initial estimate of the spectral density matrix of $\xi_t$; instead, the Fourier coefficients of $\widehat\xi_t$ are resampled according to a kernel-based probability vector, based on the observation that, in a neighbourhood of each frequency, the distribution of different coefficients is almost identical (provided the spectral density is smooth). A detailed description of the Local Bootstrap is provided in Appendix A of the supplementary document. In the simulation studies, we used the Daniell kernel with window size $[0.05T]$. Since the common and idiosyncratic components are handled independently, generating a smaller bootstrap sample (e.g., of size $\sqrt{B}$) for each component may be sufficient to generate a bootstrap sample of size $B$ for $\varepsilon_t$.
Once a bootstrap sample is generated at level $p = 1$ of the DCBS algorithm, the same sample can be used repeatedly for test criterion selection at levels $p \ge 2$. That is, in Step 4.3 above, the test statistics are computed as $T^{\varphi,l}_{s',e'} = \max_{b \in [s', e')}\max_{1 \le m \le n} D^\varphi_m(\{|E^{j,l}_{s',b,e'}|\}_{j=1}^n)$ over a moving window of size $e - s + 1$, for $1 \le s' \le T - e + s$, $e' = s' - s + e$ and $l = 1, \ldots, B$, from which the test criterion is drawn.
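The overall structure of Steps 4-5 can be sketched as follows; note that the idiosyncratic component is resampled here by a crude i.i.d.-in-time placeholder instead of the TFT/Local Bootstrap of Step 4.2, so the snippet conveys the loop structure only.

```r
# Skeleton of Steps 4-5 (structure only); chi_fit maps a T x q matrix of
# shocks to an n x T common component, u_hat and xi_hat are the estimates
# from Step 3, and cusum_panel()/dc_stat() are the sketches from Section 2.
boot_criterion <- function(chi_fit, u_hat, xi_hat, B = 100, alpha = 0.05, phi = 0.5) {
  n <- nrow(xi_hat); Tn <- ncol(xi_hat)
  stats <- replicate(B, {
    u_star  <- apply(u_hat, 2, sample, replace = TRUE)   # Step 4.1
    xi_star <- xi_hat[, sample(Tn, replace = TRUE)]      # placeholder for Step 4.2
    eps_star <- chi_fit(u_star) + xi_star                # bootstrap series (Step 4.3)
    X <- abs(cusum_panel(eps_star, 1, Tn, sigma = rep(1, n)))
    max(apply(X, 2, dc_stat, phi = phi))                 # T^{phi, l}_{1, T}
  })
  unname(quantile(stats, 1 - alpha))                     # Step 5: pi_nT
}
```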
Establishing the validity of the GDFM-boot algorithm for change-point analysis is beyond the scope of the current paper and is left for future investigation. However, the simulation studies in Section 5 confirm its good performance, together with the proposed change-point statistic, in various settings. Computationally, the GDFM-boot algorithm benefits from the dimension reduction via factor modelling. When applied to generate a bootstrap sample of size $B = 100$ with $n = 25$ and $T = 100$ (executed on a 3.10GHz quad-core with 8GB of RAM running Windows 7), the R code implementing the algorithm took 0.38 seconds, compared to the 62 seconds taken by the Linear Process Bootstrap. Further, applying the latter was not computationally feasible for data of the size and dimensionality considered in our simulation study.

Single change-point detection
In this section, we evaluate the empirical performance of the DC statistic on simulated panel data with (at most) a single change-point. For comparison, change-point tests with the two choices $\varphi = 0, 1/2$ (referred to as $T^0$ and $T^{1/2}$), as well as the combined DC statistic ($\check T$) from Section 4.1, are considered, with $\pi^\varphi_{n,T}$ computed from the GDFM-boot algorithm. Also included are:

• $T^{EH}_{1,T} = \mathbb{I}\{(T^{linear}_{1,T} > 1) \vee (T^{scan}_{1,T} > 1)\}$, equipped with the thresholds chosen as the $(1-\alpha/2)$-quantile of the $\chi^2_n$-distribution and $\mathcal{T}_m = 2\sqrt{2m}\{m\log(ne/m) + \log(nT/\alpha)\}$, as per the recommendation made in Section 5 of Enikeeva and Harchaoui (2014) (referred to as $T^{EH}$);

• $T^{Jirak}_{1,T}$, with the test criterion chosen according to the bootstrap algorithm (Algorithm 4.6) proposed in Jirak (2015) with block size $K = 4$ (referred to as $T^{Jirak}$).
Since the scaling of the component series is not discussed in Enikeeva and Harchaoui (2014), we choose to apply $\widehat\sigma_j$ estimated as in Remark 3.2 in deriving $T^{EH}_{1,T}$. Jirak (2015) showed the consistency of the proposed test when applied with a set of long-run variance estimators of $\sigma_j$. As discussed in Section 5.1.2, $T^{Jirak}$ is highly sensitive to scaling terms that are estimated spuriously small, and we therefore use the most conservative estimator among those included in Proposition 3.5 of Jirak (2015). $T^{HH}_{1,T}$ is constructed similarly to $T^{linear}_{1,T}$ with the same high-dimensional efficiency, and $T^{EH}$ is expected to perform better than either of its two component tests; hence $T^{HH}_{1,T}$ is omitted from our study. As described in Section 5.1.1, each coordinate of $\varepsilon_t$ is generated from the same model, which enables us to select a single threshold $\pi_T$ applicable to all $n$ component series in deriving $T^{SBS}_{1,T}(\pi_T)$. Hence we include $T^{SBS}_{1,T}(\pi_T)$ equipped with the "oracle" threshold in the comparative study (referred to as $T^{SBS}$), where $\pi_T$ is chosen from the GDFM-boot algorithm with the (unobservable) $\varepsilon_t$ replacing the estimated $\widehat\varepsilon_t$; that is, $\pi_T$ is the $(1-\alpha)$-quantile of $\max_{b \in [1,T),\, 1 \le j \le n}|E^{j,l}_{1,b,T}|$, $l = 1, \ldots, B$.
When a bootstrap procedure is involved in test criterion selection, the bootstrap sample size is fixed at $B = 100$. We report the Type I error and the size-corrected power at the significance level $\alpha = 0.05$, as well as the location accuracy ($|\widehat\eta_1 - \eta_1| < \log T$, in %), for $T^0$, $T^{1/2}$, $\check T$, $T^{Jirak}$ and $T^{SBS}$ over 100 simulated sample paths for each setting; for $T^{EH}$, it is not suggested how the change-point location should be estimated. For all tests, $d_T = 5$ is used to trim off the extreme ends of the interval $[1, T]$. Also, we investigate the partitioning accuracy of $T^0$, $T^{1/2}$, $\check T$, $T^{Jirak}$ and $T^{SBS}$ by reporting the Rand index, i.e., the sum of true positives and true negatives divided by the total number of components ($n$), where a Rand index close to one indicates more accurate partitioning (binary classification).
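As used here, the Rand index for this binary classification reduces to the proportion of correctly classified coordinates; a hypothetical helper:

```r
# Rand index for the binary partition of the n coordinates into changed /
# unchanged components: (true positives + true negatives) / n.
rand_index <- function(est_set, true_set, n) {
  mean((seq_len(n) %in% est_set) == (seq_len(n) %in% true_set))
}
rand_index(est_set = c(1, 2, 4), true_set = 1:3, n = 10)  # returns 0.8
```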

Results
Table 2 compares the Type I errors of the different tests under consideration. Combined with the GDFM-boot procedure, the DC statistic-based tests generally manage to control the Type I error below the nominal level ($\alpha = 0.05$) or only slightly above it, with the exception of the case when $n = T = 250$ and the ARMA model (N1), with the cross-correlation parameter set to 0.5, is used to generate the noise. The oracle threshold for $T^{SBS}$ leads to a very conservative test, which is also evident in the power performance of $T^{SBS}$. $T^{Jirak}$ appears to be highly sensitive, and therefore vulnerable, to the choice of $\widehat\sigma_j$, particularly when the critical value is selected by the parametric bootstrap (not reported here), since the test statistic directly depends on the largest CUSUM value attained by a single component series. When the noise is generated as in (N2) with $h = 0.9$, the size of $T^{Jirak}$ is the closest to the nominal level, which is attributed to the presence of a strong factor, as it leads to the scaling terms being estimated homogeneously across the panel. The presence of a strong factor has an adverse effect on the size of $T^{EH}$, which was originally proposed for independent panel data (see also Remark 2.2).
To observe the power behaviour, we present the results for $n = 250$ and $T = 100$ as a representative example in Tables 3-4, and report the rest in Appendix C. Increasing the sample size generally leads to improved power and accuracy in the estimated change-point location. As for the dimensionality, its increase has different effects depending on the error-generating model: while there is no strong discernible trend with increasing $n$ in the case of (N1), it brings a dramatic improvement in the performance of $T^0$ for the noise generated from (N2), especially with increasing influence of the factor (when $h = 0.9$).
The tests tend to lose power when the change-point is sparse, the jumps are of smaller magnitude, or the change-point location is distanced away from the centre of $[1, T]$; similar arguments apply to the location and partitioning accuracy. Note that the Rand index occasionally decreases with increasing $m_1$: as it measures both true positives and true negatives, the former may decrease with the growing density of the change-point. This indicates that, over the range of $\delta_1$ considered here, some $x_{j,t}$, $j \in \Pi_1$, may not contribute to change-point detection due to the small jump sizes $|\delta_{j,1}|$.
When $\varepsilon_t$ is generated from (N1) (where the cross-sectional correlations are relatively small), $T^{EH}$ and $\check T$ consistently achieve high (size-corrected) power over the entire range of change-point configurations, and the latter also achieves high location accuracy and Rand index. In the case of (N2), $T^0$ outperforms all the other tests considered in our study in all change-point configurations and evaluation criteria. Although not reported here, the performance of $\check T$ in this particular setting can be improved by using a greater value of $\gamma_n$, which prompts the development of a data-driven way of exploiting the information embedded in an array of DC statistics over $\varphi \in [0, 1]$. When the size of the data increases ($T = 250$), the performance of $\check T$ catches up with that of $T^0$ in this setting. Comparing $T^0$ and $T^{1/2}$, the former performs better than the latter, except when the change-point is highly dense and the noise is generated from (N1).
$T^{Jirak}$ performs as well as $T^0$ and $\check T$ or, occasionally, even better when the change-point is centrally located, but its performance deteriorates greatly when $\eta_1 = 0.1T$. This can be explained by the presence of a multiplicative factor in the CUSUM statistic adopted therein: with the weighting $b(T-b)/T$ applied to $|X^j_{1,b,T}|$, the resulting statistic amounts to a CUSUM-based change-point test that attains the slowest rate at which its Type II error converges to zero among the set of CUSUM statistics studied in Section 3 of Brodsky and Darkhovsky (1993), with the rate depending on $T^{\beta-1}$ instead of $T^{\beta/2-1/2}$, as is the case with $|X^j_{1,b,T}|$; see Theorem 3.5.2 of Brodsky and Darkhovsky (1993) for further details. Due to the conservative behaviour of the oracle threshold, as observed in Table 2, $T^{SBS}$ does not outperform the other tests.

[Table 2: Type I error when $\alpha = 0.05$; $n = 100$ (top) and $n = 250$ (bottom).]
Interestingly, when $T^{Jirak}$, $T^0$, $T^{1/2}$ and $\check T$ attain similar levels of power, the latter two achieve higher accuracy in locating the change-point. This is attributed to the fact that they usually select a greater $\widehat m^\varphi_b$ in (6), which is evidenced by the higher Rand index values.

Multiple change-point detection
In this section, we evaluate the empirical behaviour of the DCBS algorithm with $T^0$ and $\check T$, based on their good performance observed in Section 5.1.2. For comparison, we also investigate the performance of the SBS algorithm furnished with the oracle threshold.
Fixing the dimensionality and the size of the data at $n = 250$ and $T = 250$, we consider piecewise constant signals $\{f_{j,t}\}_{t=1}^T$ containing three change-points, generated as follows: at $t = \eta_r$, an index set $\Pi_r$ of cardinality $m_r$ is randomly drawn from $\{1, \ldots, n\}$, and $|\delta_{j,r}| \sim_{\mathrm{i.i.d.}} \mathcal{U}(0.75\delta_r, 1.25\delta_r)$ for $j \in \Pi_r$. The configurations $(\eta_r, m_r, \delta_r)$ are set such that $m_r$ remains identical over all $r = 1, 2, 3$. The noise $\varepsilon_t$ is generated as in (N1) and (N2) of Section 5.1.1, with varying degrees of cross-correlations and factor strength $h$. We set $B = 100$, $d_T = 5$ and $L_T = [\log_2(\log T + 1)]$, where the latter is chosen to permit a growing number ($\log T$) of change-points in the data. To account for multiple testing, the Bonferroni correction is adopted by setting $\alpha = \alpha^*/(2^{L_T} - 1)$ with $\alpha^* = 0.05$. We report the total number of estimated change-points ($\widehat N$, in %) and their location accuracy ($|\widehat\eta_r - \eta_r| < \log T$, in %) over 100 simulated sample paths for each setting in Table 5 and Figures 2-3; the signal generation is sketched in code below.
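The generation of the piecewise constant signals can be sketched as below; the random signs of the jumps are our assumption for illustration, as the text specifies the magnitudes $|\delta_{j,r}|$ only.

```r
# Illustrative generation of piecewise constant signals with change-points
# at eta_r, each shared by a random index set of size m_r.
gen_signal <- function(n, TT, eta, m, delta) {
  f <- matrix(0, n, TT)
  for (r in seq_along(eta)) {
    Pi_r  <- sample(n, m[r])
    jumps <- runif(m[r], 0.75 * delta[r], 1.25 * delta[r]) *
             sample(c(-1, 1), m[r], replace = TRUE)
    f[Pi_r, (eta[r] + 1):TT] <- f[Pi_r, (eta[r] + 1):TT] + jumps
  }
  f
}
```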
Overall, the BS algorithm applied with $\check T$ performs the best, detecting all three change-points as well as identifying their locations in over 80% of the simulated data. In comparison, $T^0$ and $T^{SBS}$ tend to miss $\eta_1$, which is associated with the smallest jump size, while all three methods tend to detect $\eta_3$, associated with the largest jump size, the best.

[Table 3: Size-corrected power, location accuracy (%) and Rand index for each $(\delta_1, m_1)$; $n = 250$, $T = 100$ and $\alpha = 0.05$: (N1) with the cross-correlation parameter set to 0.2 (top) and 0.5 (bottom).]

The behaviour of $T^0$ and $\check T$ changes dramatically when $\varepsilon_t$ is generated from (N2) with a strong factor ($h = 0.9$), where the performance of $T^0$ improves (identifying all three change-points in over 70% of the simulated data) whereas that of $\check T$ deteriorates greatly. As observed in Section 5.1.2, using a larger scaling term $\gamma_n$ may improve the performance of $\check T$ in this setting. While the performance of $T^{SBS}$ is better when $\varepsilon_t$ is generated from (N2) rather than (N1), the choice of threshold for $T^{SBS}$ turns out to be too conservative overall. Although each $\varepsilon_{j,t}$ is generated from the identical model, employing $n$ thresholds for the $n$-dimensional panel data might lead to better small sample performance of $T^{SBS}$.

Application to financial time series data
We analysed the log returns of the closing prices of all S&P 100 component stocks between February 24, 2015 and February 23, 2016, denoted by $y_{i,t}$, $i = 1, \ldots, \widetilde n$; $t = 1, \ldots, T$, with $\widetilde n = 88$ (only those components that remained in the index throughout the period were included) and $T = 252$. Jirak (2015) analysed a similar financial dataset for a single change in its mean and variance, respectively. However, considering that (i) log returns are often modelled to have zero mean and time-varying conditional variance using conditionally heteroscedastic models, and (ii) it is difficult to rule out the possible existence of multiple change-points, we chose to perform the change-point analysis on the (unconditional) second-order structure of $y_{i,t}$ using the DCBS algorithm. For this purpose, wavelet-based periodogram and cross-periodogram sequences of $y_{i,t}$ were computed, as also adopted to comprise the input panel data to the SBS-MVTS algorithm in Cho and Fryzlewicz (2015). Any change-point in the autocovariance and cross-covariance structure of $y_{i,t}$ is detectable from examining the wavelet (cross-)periodogram sequences; for further details, see Section 3.1 of Cho and Fryzlewicz (2015). We used Haar wavelets at the two finest scales to produce the periodogram sequences, denoted by $x_{j,t}$, $j = 1, \ldots, n = \widetilde n(\widetilde n + 1)$; $t = 1, \ldots, T$ (the $\widetilde n(\widetilde n + 1)/2$ periodogram and cross-periodogram sequences at each of the two scales). The DCBS algorithm with $\check D_m$ detects a single change-point at $t = 220$, which corresponds to January 6, 2016. It has been noted that the first week of trading in 2016 marked the worst five-day start to a year ever, according to S&P Dow Jones Indices (Financial Times, http://www.ft.com/fastft/2016/01/07/sp-500-logs-worst-annual-kick-off-on-record/). For example, the S&P 500 index dropped by 4.9% during the period, and the Dow Jones Industrial Average by 6.19%. Figure 4 shows $y_{i,t}$ (left) and the pointwise maximum of the DC statistics at the first iteration of the DCBS algorithm (right), where this behaviour of the financial market at the beginning of 2016 is reflected as a large peak in the latter.
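A minimal sketch of how such an input panel can be formed at the finest Haar scale is given below; the coarser scale and boundary treatment are omitted, and the construction is a simplification of that referenced from Cho and Fryzlewicz (2015).

```r
# Sketch: wavelet periodogram and cross-periodogram input panel at the
# finest Haar scale, from an n x T matrix of log returns Y.
haar_detail <- function(y) (y[-length(y)] - y[-1]) / sqrt(2)  # finest-scale details
wavelet_panel <- function(Y) {
  d <- t(apply(Y, 1, haar_detail))  # n x (T - 1) detail coefficients
  ij <- which(upper.tri(diag(nrow(Y)), diag = TRUE), arr.ind = TRUE)
  t(apply(ij, 1, function(p) d[p[1], ] * d[p[2], ]))  # all pairwise products
}
```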

Conclusions
In this paper, we have proposed the DC statistic, a novel way of aggregating high-dimensional CUSUM series across the panel for change-point analysis, and have demonstrated its consistency in single and multiple change-point detection, both theoretically and empirically. We conclude by listing possible directions for future research.
• The DC statistic can be applied to detect changes in high-dimensional time series beyond those in the mean of panel data. For example, the DCBS algorithm is easily extended to detect change-points in both the autocovariance and cross-covariance structure of $n$-dimensional time series, by taking panel data consisting of local periodogram and cross-periodogram sequences as an input; see the real data analysis in Section 6. Moreover, the DC operator may be regarded as a generic tool for aggregating multiple series of statistics cross-sectionally, the result of which can be utilised for panel data analysis beyond change-point detection.

• Empirically, it was shown that $\check T$ generally outperforms $T^\varphi$ for either $\varphi \in \{0, 1/2\}$, while in the presence of strong cross-correlations, $T^0$ performs better than the rest. This opens up several possibilities for future research, including the investigation of the "optimal" way of exploiting the information contained in the infinite array of DC statistics over $\varphi \in [0, 1]$.

• Proposed here for test criterion selection, the GDFM-boot algorithm showed good empirical performance, but it remains to investigate the validity of its application to change-point analysis by establishing that the bootstrap scheme mimics the correct second-order structure for a large class of time series processes. The GDFM-boot will have applicability to a wide range of inference and estimation problems involving high-dimensional time series beyond the context of change-point analysis. For example, a key task in factor analysis is to estimate the number of common factors that drive the pervasive cross-sectional correlations, and several information criterion-type estimators have been proposed. However, there is a lack of any attempt at statistical inference on the factor number, e.g., by constructing its confidence interval, and the GDFM-boot can be adopted for such tasks.

Preliminary results
We first prove a set of lemmas that serve as stepping stones for the proofs of Theorems 3.1-4.1. We assume that (A1)-(A3) and (A6) hold in all the lemmas where applicable, and the notations $C_i, c_i$, $i = 0, 1, \ldots$, are adopted to denote fixed positive constants throughout the proofs. Further, $\widetilde\varepsilon_{j,t}$ denotes $\varepsilon_{j,t}$ scaled with respect to an estimator $\widehat\sigma_j$ satisfying (A6), and $\widetilde x_{j,t}$ and $\widetilde f_{j,t}$ are defined similarly. Throughout, we operate within the event $\mathcal{E}_\sigma$ defined in (A6).

Proof. We first study the probability in (10). Let $d = e - s + 1$. Theorem 1.4 of Bosq (1998) showed that, under the conditions imposed in (A1), the probability in (10) is bounded from above by a term determined by some $C_0, C_1 > 0$, where the latter depends on $\mu$, $c_q$, $C_\varepsilon$, $C$ and $\sigma^{*2}/\sigma^2_*$. More specifically, we can impose appropriate conditions on the above parameters in order that $C_1 > 7/2 + \omega$; $P(\mathcal{E}^c_1)$ is then bounded from above accordingly.

Proof. Note that $\mathcal{I}_2 \subset \mathcal{I}_1$. On the event $\mathcal{E}_1$ defined in Lemma 8.1, the term $I$ is bounded as claimed and, similarly, $II \le (b - s + 1)/(e - s + 1)\log T$, uniformly over all $j$ and $(s, b, e) \in \mathcal{I}_2$. Hence the bound holds with probability tending to one as $T \to \infty$, and therefore $P(\mathcal{E}^c_2) \le P(\mathcal{E}^c_2 \mid \mathcal{E}_1)P(\mathcal{E}_1) + P(\mathcal{E}^c_1) \to 0$.

Next, we introduce $N$ additive models $y_{r,t} = g_{r,t} + \xi_{r,t}$, $r = 1, \ldots, N$, which play a vital role in the following proofs. For each $r$, let $\{k^r_1, \ldots, k^r_n\}$ denote a permutation of $\{1, \ldots, n\}$, and $\{i^r_1, \ldots, i^r_n\}$ a set of signs (taking values from $\{-1, 1\}$, with repetitions), for which the following hold:

(Since $\delta_{k^r_j, r} = f_{k^r_j, \eta_r+1} - f_{k^r_j, \eta_r} = 0$ for all $j \ge m_r + 1$, the ordering and the set of signs are not unique.) Then $y_{r,t}$, $g_{r,t}$ and $\xi_{r,t}$ are defined as in (12)-(14). By its definition, $\{g_{r,t}\}_{t=1}^T$ is a piecewise constant signal with change-points at $t = \eta_1, \ldots, \eta_N$, and the size of its jump at $t = \eta_r$ satisfies the bound below. Similarly, Lemma 8.2 implies that (16)-(17) hold for all $r$ with probability tending to one. Now we consider a generic additive model $y_t = g_t + \xi_t$, where $y_t$, $g_t$ and $\xi_t$ are obtained from the panel data $\{x_{j,t}\}_{t=1}^T$, $j = 1, \ldots, n$, in the same manner as $y_{r,t}$, $g_{r,t}$ and $\xi_{r,t}$ in (12)-(14), with respect to some $m \in \{1, \ldots, n\}$, a permutation of indices $\{k_1, \ldots, k_n\}$ and a sign sequence $\{i_1, \ldots, i_n\}$. Then $g_t$ is a piecewise constant signal with change-points at $t = \eta_1, \ldots, \eta_N$ satisfying (B1). Also, it is easily seen that the inequalities (16)-(17) hold with the zero-mean noise series $\xi_t$ in place of $\xi_{r,t}$, with probability converging to one.
Recall that $s$ and $e$ denote the start and the end of an interval which is examined at some stage of our search for the change-points. Let $s$ and $e$ satisfy $\eta_{q_1} \le s < \eta_{q_1+1} < \ldots < \eta_{q_2} \le e < \eta_{q_2+1}$ for some $0 \le q_1 < q_2 \le N$. In some of the following lemmas, we impose at least one of the conditions (20)-(21), with $\bar\epsilon_T$ to be defined later. The condition (20) implies that there is at least one change-point to be detected which is sufficiently distanced from the previously detected change-points $s$ and $e$, and (21) indicates that each of $s$ and $e$ is detected for one of the true change-points.
Since the CUSUM statistics are not affected by a shift in the overall level of $g_t$, we assume without loss of generality that $\sum_{t=s}^e g_t = 0$.

Lemma 8.4. Suppose that (20) and (21) hold. For $\widehat\eta = \arg\max_{b \in [s, e)}|\mathcal{C}_b(\{y_t\}_{t=s}^e)|$, there exists a true change-point $\eta_q \equiv \eta \in (s, e)$ satisfying $|\widehat\eta - \eta| < c_0\bar\epsilon_T$ with probability converging to one, under an additional rate condition on $\bar\epsilon_T$ specified in the proof.

Proof. The following proof is an adaptation of the proof of Theorem 3.3.1 in Fryzlewicz (2014). Let us assume that $|\widehat\eta - \eta| \ge c_0\bar\epsilon_T$. Due to the fact that $|\mathcal{C}_b(\{g_t\}_{t=s}^e)|$ is either monotonic, or decreasing and then increasing, in $b$ between two adjacent change-points of $g_t$ (Lemma 2.7 of Venkatraman (1992)), it is enough to consider the case when $\widehat\eta$ satisfies $|\widehat\eta - \eta| = c_0\bar\epsilon_T$. Then, if (22) is shown to hold, the claim would be proved by contradiction. Expanding the LHS of (22), we obtain (23). Clearly, $II < 0$ from the definition of $g^*_t$. Let $\Psi$ denote the set of vectors of length $e - s + 1$ whose elements are initially positive and constant and then, after a break, negative and constant; moreover, the elements sum to zero and, when squared, sum to one. Since we assume that $\sum_{t=s}^e g_t = 0$, we can find a vector $\psi^* \in \Psi$ satisfying $g^* = \langle g, \psi^*\rangle\psi^*$, where $g = (g_s, \ldots, g_e)^\top$ and $g^* = (g^*_s, \ldots, g^*_e)^\top$. Then we have $\sum_{t=s}^e(g_t - g^*_t)^2 = \sum_{t=s}^e g^2_t - \langle g, \psi^*\rangle^2$.
As for $\sum_{t=s}^e \xi_t(\widetilde g_t - g_t)$, we note that $|VI|$ and $|VII|$ are of the same order and that the corresponding bounds hold with probability converging to one. Putting together (23) and the upper bounds on $|III|$-$|VII|$, we obtain the dominance of $II$ over $I$ under the conditions given in the lemma.

Lemma 8.5. Assume that there exists a single change-point $\eta_1$ in $g_t$ which satisfies (A4) and $|g_{\eta_1+1} - g_{\eta_1}| \ge \delta$. Then (24) holds for some $C_2 > 0$.

Proof. The first equality in (24) is a direct result of Lemma 8.3, and the second follows from the definition of $|\mathcal{C}_{\eta_1}(\{g_t\}_{t=1}^T)|$.

Lemma 8.6. Assume that the conditions imposed in Lemma 8.5 are met. Then, for some $\epsilon_T$, the bound below holds.

Proof. Without loss of generality, let $g_{\eta_1} = g^*_1 > g^*_2 = g_{\eta_1+1}$ and $b > \eta_1$. Then the claimed bound follows, where the inequality is due to a Taylor expansion.
With the above lemmas, we are now ready to prove the theorem. At the beginning of the DCBS algorithm, we have $s = 1$ and $e = T$, for which both (20) and (21) hold, and thus $\eta_{r_1}$ is estimated by $\widehat\eta_{r_1}$ within a distance of $\epsilon_T$ from a true change-point.