Abstract
Many existing procedures for detecting multiple change-points in data sequences fail in frequent-change-point scenarios. This article proposes a new change-point detection methodology designed to work well in both infrequent and frequent change-point settings. It consists of two ingredients: the first is “Wild Binary Segmentation 2” (WBS2), a recursive algorithm for producing what we call a ‘complete’ solution path to the change-point detection problem, i.e. a sequence of estimated nested models containing \(0, \ldots , T-1\) change-points, where T is the data length. The second is a new model selection procedure, referred to as “Steepest Drop to Low Levels” (SDLL). The SDLL criterion acts on the WBS2 solution path and, unlike many existing model selection procedures for change-point problems, is not penalty-based, using thresholding only as a discrete secondary check. The resulting WBS2.SDLL procedure, combining both ingredients, is shown to be consistent and to significantly outperform the competition in the frequent change-point scenarios tested. WBS2.SDLL is fast, easy to code, and does not require the choice of a window or span parameter.
References
Amiri, A., & Allahyari, S. (2012). Change point estimation methods for control chart postsignal diagnostics: A literature review. Quality and Reliability Engineering International, 28, 673–685.
Anastasiou, A., & Fryzlewicz, P. (2018a). Detecting multiple generalized change-points by isolating single ones. Preprint.
Anastasiou, A., & Fryzlewicz, P. (2018b). IDetect: Detecting multiple generalized change-points by isolating single ones. https://CRAN.R-project.org/package=IDetect. R package version 1.0.
Andreou, E., & Ghysels, E. (2002). Detecting multiple breaks in financial market volatility dynamics. Journal of Applied Econometrics, 17, 579–600.
Arlot, S. (2019). Minimal penalties and the slope heuristics: A survey. Journal de la Societe Française de Statistique, 160, 1–106.
Arlot, S., Brault, V., Baudry, J.-P., Maugis, C., & Michel, B. (2016). capushe: CAlibrating Penalities Using Slope HEuristics. https://CRAN.R-project.org/package=capushe. R package version 1.1.1.
Bai, J. (1997). Estimating multiple breaks one at a time. Econometric Theory, 13, 315–352.
Bai, J., & Perron, P. (2003). Computation and analysis of multiple structural change models. Journal of Applied Econometrics, 18, 1–22.
Baranowski, R., & Fryzlewicz, P. (2015). wbs: Wild binary segmentation for multiple change-point detection. https://CRAN.R-project.org/package=wbs. R package version 1.3.
Baranowski, R., Chen, Y., & Fryzlewicz, P. (2019). Narrowest-Over-Threshold detection of multiple change-points and change-point-like features. Journal of the Royal Statistical Society: Series B, 81, 649–672.
Baudry, J.-P., Maugis, C., & Michel, B. (2012). Slope heuristics: Overview and implementation. Statistics and Computing, 22, 455–470.
Birgé, L., & Massart, P. (2001). Gaussian model selection. Journal of the European Mathematical Society, 3, 203–268.
Birgé, L., & Massart, P. (2007). Minimal penalties for Gaussian model selection. Probability Theory and Related Fields, 138, 33–73.
Bosq, D. (1998). Nonparametric statistics for stochastic processes (2nd ed.). New York: Springer.
Boysen, L., Kempe, A., Liebscher, V., Munk, A., & Wittich, O. (2009). Consistencies and rates of convergence of jump-penalized least squares estimators. Annals of Statistics, 37, 157–183.
Braun, J., & Mueller, H.-G. (1998). Statistical methods for DNA sequence segmentation. Statistical Science, 13, 142–162.
Braun, J., Braun, R., & Mueller, H.-G. (2000). Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika, 87, 301–314.
Brodsky, B., & Darkhovsky, B. (1993). Nonparametric methods in change-point problems. Dordrecht: Kluwer Academic Publishers.
Chen, K.-M., Cohen, A., & Sackrowitz, H. (2011). Consistent multiple testing for change points. Journal of Multivariate Analysis, 102, 1339–1343.
Cho, H., & Fryzlewicz, P. (2011). Multiscale interpretation of taut string estimation and its connection to Unbalanced Haar wavelets. Statistics and Computing, 21, 671–681.
Cho, H., & Fryzlewicz, P. (2012). Multiscale and multilevel technique for consistent segmentation of nonstationary time series. Statistica Sinica, 22, 207–229.
Cho, H., & Fryzlewicz, P. (2015). Multiple change-point detection for high-dimensional time series via sparsified binary segmentation. Journal of the Royal Statistical Society Series B, 77, 475–507.
Ciuperca, G. (2011). A general criterion to determine the number of change-points. Statistics & Probability Letters, 81, 1267–1275.
Ciuperca, G. (2014). Model selection by LASSO methods in a change-point model. Statistical Papers, 55, 349–374.
Cleynen, A., Rigaill, G., & Koskas, M. (2016). Segmentor3IsBack: A fast segmentation algorithm. https://CRAN.R-project.org/package=Segmentor3IsBack. R package version 2.0.
D’Angelo, M., Palhares, R., Takahashi, R., Loschi, R., Baccarini, L., & Caminhas, W. (2011). Incipient fault detection in induction machine stator-winding using a fuzzy-Bayesian change point detection approach. Applied Soft Computing, 11, 179–192.
Davies, P. L., & Kovac, A. (2001). Local extremes, runs, strings and multiresolution. Annals of Statistics, 29, 1–48.
Davis, R., Lee, T., & Rodriguez-Yam, G. (2006). Structural break estimation for nonstationary time series models. Journal of the American Statistical Association, 101, 223–239.
Du, C., Kao, C.-L., & Kou, S. (2016). Stepwise signal extraction via marginal likelihood. Journal of the American Statistical Association, 111, 314–330.
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407–499.
Eichinger, B., & Kirch, C. (2018). A MOSUM procedure for the estimation of multiple random change points. Bernoulli, 24, 526–564.
Frick, K., Munk, A., & Sieling, H. (2014). Multiscale change-point inference (with discussion). Journal of the Royal Statistical Society Series B, 76, 495–580.
Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Annals of Statistics, 42, 2243–2281.
Fryzlewicz, P. (2017). breakfast: Multiple change-point detection and segmentation. https://CRAN.R-project.org/package=breakfast. R package version 1.0.0.
Fryzlewicz, P. (2018). Tail-greedy bottom-up data decompositions and fast multiple change-point detection. The Annals of Statistics, 46, 3390–3421.
Fryzlewicz, P., & Subba Rao, S. (2014). Multiple-change-point detection for auto-regressive conditional heteroscedastic processes. Journal of the Royal Statistical Society Series B, 76, 903–924.
Galceran, E., Cunningham, A., Eustice, R., & Olson, E. (2015). Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction. In 2015 robotics: Science and systems conference, RSS 2015 (vol. 11).
Guntuboyina, A., Lieu, D., Chatterjee, S., & Sen, B. (2020). Adaptive risk bounds in univariate total variation denoising and trend filtering. The Annals of Statistics, 48, 205–229.
Hansen, B. (2001). The new econometrics of structural change: Dating breaks in U.S. labour productivity. Journal of Economic Perspectives, 15, 117–128.
Harchaoui, Z., & Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association, 105, 1480–1493.
Huang, C.-Y., & Lyu, M. (2011). Estimation and analysis of some generalized multiple change-point software reliability models. IEEE Transactions on Reliability, 60, 498–514.
Huskova, M., & Slaby, A. (2001). Permutation tests for multiple changes. Kybernetika, 37, 605–622.
James, N., & Matteson, D. (2014). ecp: An R package for nonparametric multiple change point analysis of multivariate data. Journal of Statistical Software, 62, 1–25.
Killick, R., Fearnhead, P., & Eckley, I. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107, 1590–1598.
Killick, R., Haynes, K., & Eckley, I. (2016). changepoint: An R package for changepoint analysis. https://CRAN.R-project.org/package=changepoint. R package version 2.2.2.
Korkas, K., & Fryzlewicz, P. (2017). Multiple change-point detection for non-stationary time series using wild binary segmentation. Statistica Sinica, 27, 287–311.
Lavielle, M. (1999). Detection of multiple changes in a sequence of dependent variables. Stochastic Processes and their Applications, 83, 79–102.
Lavielle, M. (2005). Using penalized contrasts for the change-point problem. Signal Processing, 85, 1501–1510.
Lavielle, M., & Moulines, E. (2000). Least-squares estimation of an unknown number of shifts in a time series. Journal of Time Series Analysis, 21, 33–59.
Lebarbier, E. (2005). Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Processing, 85, 717–736.
Lee, C.-B. (1995). Estimating the number of change points in a sequence of independent normal random variables. Statistics and Probability Letters, 25, 241–248.
Li, H., & Munk, A. (2016). FDR-control in multiscale change-point segmentation. Electronic Journal of Statistics, 10, 918–959.
Li, H., & Sieling, H. (2017). FDRSeg: FDR-control in multiscale change-point segmentation. https://CRAN.R-project.org/package=FDRSeg. R package version 1.0-3.
Lin, K., Sharpnack, J. L., Rinaldo, A., & Tibshirani, R. J. (2017). A sharp error analysis for the fused lasso, with application to approximate changepoint screening. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in neural information processing systems (pp. 6884–6893). Curran Associates, Inc.
Liu, D., Chen, X., Lian, Y., & Lou, Z. (2010). Impacts of climate change and human activities on surface runoff in the Dongjiang River basin of China. Hydrological Processes, 24, 1487–1495.
Maidstone, R., Hocking, T., Rigaill, G., & Fearnhead, P. (2017). On optimal multiple changepoint algorithms for large data. Statistics and Computing, 27, 519–533.
Mallows, C. (1991). Another comment on O’Cinneide. The American Statistician, 45, 257.
Matteson, D., & James, N. (2014). A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109, 334–345.
Meier, A., Cho, H., & Kirch, C. (2018). mosum: Moving sum based procedures for changes in the mean. https://CRAN.R-project.org/package=mosum. R package version 1.2.0.
Muggeo, V. (2003). Estimating regression models with unknown break-points. Statistics in Medicine, 22, 3055–3071.
Muggeo, V. (2012). cumSeg: Change point detection in genomic sequences. https://CRAN.R-project.org/package=cumSeg. R package version 1.1.
Muggeo, V., & Adelfio, G. (2011). Efficient change point detection for genomic sequences of continuous measurements. Bioinformatics, 27, 161–166.
National Research Council. (2013). Frontiers in massive data analysis. Washington, DC: The National Academies Press. https://doi.org/10.17226/18374.
Olshen, A., Venkatraman, E. S., Lucito, R., & Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5, 557–572.
Pan, J., & Chen, J. (2006). Application of modified information criterion to multiple change point problems. Journal of Multivariate Analysis, 97, 2221–2241.
Pein, F., Hotz, T., Sieling, H., & Aspelmeier, T. (2018). stepR: Multiscale change-point inference. https://CRAN.R-project.org/package=stepR. R package version 2.0-2.
Pezzatti, G., Zumbrunnen, T., Bürgi, M., Ambrosetti, P., & Conedera, M. (2013). Fire regime shifts as a consequence of fire policy and socio-economic development: An analysis based on the change point approach. Forest Policy and Economics, 29, 7–18.
Pierre-Jean, M., Rigaill, G., & Neuvial, P. (2017). jointseg: Joint segmentation of multivariate (copy number) signals. https://CRAN.R-project.org/package=jointseg. R package version 1.0.1.
Ranganathan, A. (2012). PLISS: Labeling places using online changepoint detection. Autonomous Robots, 32, 351–368.
Reeves, J., Chen, J., Wang, X., Lund, R., & Lu, Q. (2007). A review and comparison of changepoint detection techniques for climate data. Journal of Applied Meteorology and Climatology, 46, 900–915.
Rigaill, G. (2015). A pruned dynamic programming algorithm to recover the best segmentations with 1 to \(k_{max}\) change-points. Journal de la Societe Francaise de Statistique, 156, 180–205.
Rigaill, G., & Hocking, T. D. (2016). fpop: Segmentation using Optimal Partitioning and Function Pruning. https://R-Forge.R-project.org/projects/opfp/. R package version 2016.10.25/r55.
Rinaldo, A. (2009). Properties and refinements of the fused lasso. Annals of Statistics, 37, 2922–2952.
Rojas, C., & Wahlberg, B. (2014). On change point detection using the fused lasso method. Unpublished manuscript.
Ross, G. J. (2015). Parametric and nonparametric sequential change detection in R: the cpm package. Journal of Statistical Software, 66, 1–20.
Salarijazi, M., Akhond-Ali, A., Adib, A., & Daneshkhah, A. (2012). Trend and change-point detection for the annual stream-flow series of the Karun River at the Ahvaz hydrometric station. African Journal of Agricultural Research, 7, 4540–4552.
Tibshirani, R. (2014). Adaptive piecewise polynomial estimation via trend filtering. Annals of Statistics, 42, 285–323.
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., & Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B, 67, 91–108.
Truong, C., Oudre, L., & Vayatis, N. (2020). Selective review of offline change point detection methods. Signal Processing, 167, 107299.
Venkatraman, E. S. (1992). Consistency results in multiple change-point problems. Technical Report No. 24, Department of Statistics, Stanford University. https://statistics.stanford.edu/resources/technical-reports.
Venkatraman, E. S., & Olshen, A. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics, 23, 657–663.
Vostrikova, L. (1981). Detecting ‘disorder’ in multidimensional random processes. Soviet Mathematics Doklady, 24, 55–59.
Wang, D., Yu, Y., & Rinaldo, A. (2018). Univariate mean change point detection: Penalization, CUSUM and optimality. Preprint.
Wang, T., & Samworth, R. (2018). High dimensional change point estimation via sparse projection. Journal of the Royal Statistical Society: Series B, 80, 57–83.
Wang, Y. (1995). Jump and sharp cusp detection by wavelets. Biometrika, 82, 385–397.
Wu, Y. (2008). Simultaneous change point analysis and variable selection in a regression problem. Journal of Multivariate Analysis, 99, 2154–2171.
Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz’ criterion. Statistics & Probability Letters, 6, 181–189.
Yao, Y.-C., & Au, S. T. (1989). Least-squares estimation of a step function. Sankhya Series A, 51, 370–381.
Younes, L., Albert, M., & Miller, M. (2014). Inferring changepoint times of medial temporal lobe morphometric change in preclinical Alzheimer’s disease. NeuroImage: Clinical, 5, 178–187.
Zeileis, A., Leisch, F., Hornik, K., & Kleiber, C. (2002). strucchange: An R package for testing for structural change in linear regression models. Journal of Statistical Software, 7, 1–38.
Zhang, N., & Siegmund, D. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics, 63, 22–32.
Work supported by the Engineering and Physical Sciences Research Council Grant no. EP/L014246/1.
Proofs
Proof of Theorem 3.1
Part (i). The statement is trivially true if \(T = 1\). Suppose, inductively, that it is true for data lengths \(1, \ldots , T-1\). For an input data sequence of length T, the first addition to the solution path breaks the domain of operation into two, of lengths \(b_{m_0}\) and \(T-b_{m_0}\). Therefore by the inductive hypothesis, the length of the solution path for the input data will be \(1 + (b_{m_0} - 1) + (T-b_{m_0} - 1) = T-1\), which completes the proof of part (i).
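The induction in Part (i) can be illustrated numerically. The following sketch (our own illustration, not the authors' implementation; `solution_path_length` is a hypothetical name) counts the entries that any recursive splitter in the spirit of WBS2 contributes to the solution path: each segment of length greater than 1 contributes one split plus the contributions of its two children, so the total over data of length T is exactly \(T-1\), regardless of where each split falls.

```python
def solution_path_length(T):
    """Number of solution-path entries produced for a segment of length T."""
    if T <= 1:
        return 0          # a segment of length 1 admits no further split
    # any admissible split point b in {1, ..., T-1} gives the same count;
    # we take the midpoint purely for concreteness
    b = T // 2
    return 1 + solution_path_length(b) + solution_path_length(T - b)
```

For instance, `solution_path_length(10)` returns 9, matching the claimed path length \(T-1\).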
Part (ii). The computation of the CUSUM statistic for all b in formula (2) is of computational order \(O(e-s)\). Therefore the addition of elements to the solution path from all sub-domains within a single scale of operation is of computational order \(O({\tilde{M}}T)\). Once the scale reaches J, the procedure stops. Therefore the overall computational complexity before the sorting step is of order \(O({\tilde{M}}JT)\). The sorting can be performed in time \(O(T \log \,T)\), which never dominates \(O({\tilde{M}}JT)\), as the smallest possible J is of order \(O(\log \,T)\); this is straightforward to see from its definition. This completes the proof of part (ii).
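The \(O(e-s)\) cost for all b follows from a prefix-sum trick. The sketch below (our own illustration, using the standard CUSUM contrast from the WBS literature; formula (2) itself is not reproduced in this excerpt) precomputes cumulative sums in one pass and then evaluates each candidate b in constant time.

```python
import math

def cusums(x, s, e):
    """All CUSUM values for split candidates b = s, ..., e-1 on x[s..e] (0-based, e inclusive)."""
    n = e - s + 1
    cum = [0.0]
    for t in range(s, e + 1):
        cum.append(cum[-1] + x[t])           # O(e - s) precomputation
    total = cum[-1]
    out = []
    for b in range(s, e):                    # O(1) work per candidate b
        left = b - s + 1                     # left-segment length
        right = e - b                        # right-segment length
        lsum = cum[left]                     # sum of x[s..b]
        out.append(math.sqrt(right / (n * left)) * lsum
                   - math.sqrt(left / (n * right)) * (total - lsum))
    return out
```

On a noiseless step signal such as `[1, 1, 1, 5, 5, 5]`, the absolute CUSUM is maximised at the true change-point, i.e. at b = 2.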
Part (iii). We first set the probabilistic framework in which we analyse the behaviour of the WBS2 solution path algorithm. With the change-point locations denoted by \(\eta _1, \ldots , \eta _N\) (with the additional notation \(\eta _0 = 1\), \(\eta _{N+1} = T+1\)), we define the intervals
where \(k,l \in \mathbb {Z} \cap [- C_1(\Delta )({\underline{f}}_T)^{-2}\log \,T, C_1(\Delta )({\underline{f}}_T)^{-2}\log \,T]\) for each \(1 \le i+1 < j \le {N+1}\). Suppose that on each interval \(I_{i,j}^{k,l}\), \({\tilde{M}}\) intervals \(\{[s_m, e_m]\}_{m=1}^{{\tilde{M}}}\) have been drawn, with the start- and end-points having been drawn uniformly with replacement from \(I_{i,j}^{k,l}\) (if \({\tilde{M}} > |I_{i,j}^{k,l}|(|I_{i,j}^{k,l}| - 1)/2\), then we understand \(\{[s_m, e_m]\}_{m=1}^{{\tilde{M}}}\) to contain all possible subintervals of \(I_{i,j}^{k,l}\)). Note we do not reflect the (stochastic) dependence of \(s_m, e_m\) on i, j, k, l so as not to over-complicate the notation. As in the proof of Theorem 3.2 in Fryzlewicz (2014), we define intervals \({{\mathcal {I}}}_r\) between change-points in such a way that their lengths are at least of order O(T), and they are separated from the change-points also by distances at least of order O(T). To fix ideas, define \({{\mathcal {I}}}_r = [\eta _{r-1} + \frac{1}{3}(\eta _r - \eta _{r-1}), \eta _{r-1} + \frac{2}{3}(\eta _r - \eta _{r-1})]\), \(r = 1, \ldots , N+1\). For each interval \(I_{i,j}^{k,l}\), we are interested in the following event
Note that
and hence
Define further the following event
where \({\tilde{\varepsilon }}_{s,e}^b\) is defined as in formula (2) with \(\varepsilon _t\) in place of \(X_t\). It is the statement of Lemma A.1 in Fryzlewicz (2018) that \({\mathcal {B}}_{\Delta , T} \subseteq {{\mathcal {D}}}_{\Delta , T}\).
The following arguments apply on the set \(\bigcap _{i,j,k,l} A_{i,j}^{k,l} \cap {{\mathcal {D}}}_{\Delta , T}\) for any fixed \(\Delta > 0\). Proceeding exactly as in the proof of Theorem 3.2 in Fryzlewicz (2014), at the start of the procedure we have \(s = 1\), \(e = T\), and in view of the fact that we are on \(A_{0,N+1}^{0,0} \cap {{\mathcal {D}}}_{\Delta , T}\), the procedure finds a \(b_{m_0} \in [s_{m_0}, e_{m_0}] \subseteq [s,e]\) such that
(a) there exists an \(r \in \{1, 2, \ldots , N\}\) such that \(|\eta _r - b_{m_0}| \le C_1(\Delta )({\underline{f}}_T)^{-2}\log \,T\), and

(b) \(|{\tilde{X}}_{s_{m_0}, e_{m_0}}^{b_{m_0}}| \gtrsim T^{1/2} {\underline{f}}_T\),
where the \(\gtrsim \) symbol means “of the order of or larger”. Again by arguments identical to those in the proof of Theorem 3.2 in Fryzlewicz (2014), on each subsequent segment containing previously undetected change-points, we are on one of the sets \(A_{i,j}^{k,l} \cap {{\mathcal {D}}}_{\Delta , T}\), and therefore the procedure again finds a \(b_{m_0}\) satisfying properties (a), (b) above for a certain previously undetected change-point \(\eta _r\), until all change-points have been identified in this way. Once all change-points have been identified, by Lemma A.5 of Fryzlewicz (2014), all maximisers \(b_{m_0}\) of the absolute CUSUM statistics \(|{\tilde{X}}_{s_m, e_m}^b|\) (over b) are such that \(|{\tilde{X}}_{s_{m_0}, e_{m_0}}^{b_{m_0}}| \le C_2(\Delta ) \log ^{1/2}T\). As \(\log ^{1/2}T = o(T^{1/2}{\underline{f}}_T)\), for T large enough, the sorting of the elements of \(\tilde{\mathcal {P}}\) does not move any elements from the first N to the last \(T-1-N\) or vice versa, which completes the proof of part (iii). \(\square \)
Proof of Theorem 3.2
With the notation as in the statement of the theorem and in the proof of Theorem 3.1, the following arguments apply on the set \(\bigcap _{i,j,k,l} A_{i,j}^{k,l} \cap {{\mathcal {D}}}_{\Delta , T} \cap \Theta _{\theta , T}\) for any fixed \(\Delta > 0\). Let T be large enough for the assertion of Theorem 3.1 to hold. Further, let \({\tilde{C}}\) be large enough for the following inequality to hold
(this is always possible on \(\Theta _{\theta , T}\)). If \(N = 0\), then in view of inequality (6) and the fact that we are on \({{\mathcal {D}}}_{\Delta , T}\), from the definition of the SDLL algorithm we necessarily have \({\hat{N}} = 0\). If \(N > 0\), then by part (iii) of Theorem 3.1, we must have \(K + 1 \ge N\). If \(|{\tilde{X}}_{s_{K+1}, e_{K+1}}^{b_{K+1}}| > {\tilde{\zeta }}_T\), then by inequality (6), we have \(N = K + 1\) and the SDLL procedure correctly identifies \({\hat{N}} = K + 1 = N\). If \(|{\tilde{X}}_{s_{K+1}, e_{K+1}}^{b_{K+1}}| < {\tilde{\zeta }}_T\), then we necessarily have \(N < K+1\), and by part (iii) of Theorem 3.1, we have, for \(Z_k = \log |{\tilde{X}}_{s_k, e_k}^{b_k}| - \log |{\tilde{X}}_{s_{k+1}, e_{k+1}}^{b_{k+1}}|\),
Therefore, from the definition of the SDLL and by part (iii) of Theorem 3.1, for T large enough, \({\hat{N}} = N\) will be chosen. This completes the proof of the Theorem. \(\square \)
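The selection mechanism analysed in Theorem 3.2 can be conveyed with a small sketch. The following is a simplified, hypothetical rendering of SDLL-style selection (the function name, the exact threshold rule, and the handling of ties are our simplifications, not the paper's definition): given the sorted CUSUM magnitudes along the solution path, choose \({\hat{N}}\) at the steepest drop on the log scale among drops that land below the threshold \({\tilde{\zeta }}_T\).

```python
import math

def steepest_drop(magnitudes, zeta_T):
    """Model size at the steepest log-scale drop to a low level (simplified sketch)."""
    m = magnitudes                            # sorted decreasing CUSUM magnitudes
    if not m or m[0] <= zeta_T:
        return 0                              # nothing significant: no change-points
    # only drops whose lower end falls below the threshold qualify
    candidates = [k for k in range(len(m) - 1) if m[k + 1] <= zeta_T]
    if not candidates:
        return len(m)                         # all magnitudes remain significant
    best = max(candidates,
               key=lambda k: math.log(m[k]) - math.log(m[k + 1]))
    return best + 1                           # number of retained change-points
```

For example, with magnitudes `[10.0, 8.0, 0.5, 0.4, 0.3]` and threshold 1.0, the steepest qualifying log-drop occurs between the second and third entries, so two change-points are retained.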
Fryzlewicz, P. Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection. J. Korean Stat. Soc. 49, 1027–1070 (2020). https://doi.org/10.1007/s42952-020-00060-x