Relevant change points in high dimensional time series

This paper investigates the problem of detecting relevant change points in the mean vector, say $\mu_t =(\mu_{1,t},\ldots ,\mu_{d,t})^T$ of a high dimensional time series $(Z_t)_{t\in \mathbb{Z}}$. While the recent literature on testing for change points in this context considers hypotheses for the equality of the means $\mu_h^{(1)}$ and $\mu_h^{(2)}$ before and after the change points in the different components, we are interested in a null hypothesis of the form $$ H_0: |\mu^{(1)}_{h} - \mu^{(2)}_{h} | \leq \Delta_h ~~~\mbox{ for all } ~~h=1,\ldots ,d $$ where $\Delta_1, \ldots , \Delta_d$ are given thresholds for which a smaller difference of the means in the $h$-th component is considered to be non-relevant. We propose a new test for this problem based on the maximum of squared and integrated CUSUM statistics and investigate its properties as the sample size $n$ and the dimension $d$ both converge to infinity. In particular, using Gaussian approximations for the maximum of a large number of dependent random variables, we show that on certain points of the boundary of the null hypothesis a standardised version of the maximum converges weakly to a Gumbel distribution.

This formulation of the testing problem is motivated by the fact that in many applications a modification of the statistical analysis might not be necessary if the differences between the parameters before and after the change points in the individual components are small. The problem is of particular relevance in high dimensional change point analysis, where a small change in only one component can yield a rejection by the classical procedure although all components change only in a non-relevant way. The asymptotic result is used to construct a consistent asymptotic level α test, and a multiplier bootstrap procedure is proposed, which improves the finite sample performance of the test. The finite sample properties of the test are investigated by means of a simulation study, and we also illustrate the new approach by investigating data from hydrology.

Introduction
In the context of high dimensional time series it is typically unrealistic to assume stationarity. A simple form of non-stationarity, motivated by financial time series, where large panels of asset returns routinely display break points, is to assume structural breaks at different times (the change points) in the individual components. One goal of statistical inference is to correctly estimate these change points such that the original data can be partitioned into shorter stationary segments. This field is called change point analysis in the statistical literature, and since the seminal papers of Page (1954, 1955) numerous authors have worked on the problem of detecting structural breaks or change points in various statistical models [see Aue and Horváth (2013) for a recent review]. There exists in particular an enormous amount of literature on testing for and estimating the location of a change in the mean vector µ_t = (µ_{1,t}, ..., µ_{d,t})^T = E[Z_t] of a multivariate time series (Z_t)_{t=1}^n [see Chu et al. (1996), Horváth et al. (1999), Kirch et al. (2015) among others]. A common feature of these references is that the dimension, say d, of the time series is fixed. High dimensional change point problems, where the dimension d increases with the sample size, have only recently been considered in the literature [see Bai (2010), Zhang et al. (2010), Horváth and Hušková (2012), Enikeeva and Harchaoui (2014), Jirak (2015a), Cho and Fryzlewicz (2015) and Wang and Samworth (2016) among others]. Some of this work uses information across the coordinates in order to detect smaller changes than could be observed in any individual component series.
In the simplest case of one structural break in each component many authors attack the problem of detecting the change point by means of hypothesis testing. For example, Jirak (2015a) investigates the hypothesis of no structural break in a high-dimensional time series by testing the hypotheses

H_0: µ_{h,1} = µ_{h,2} = ... = µ_{h,n} for all h = 1, ..., d,    (1.1)

where µ_{h,t} denotes the h-th component of the mean vector µ_t of the random variable Z_t (t = 1, ..., n). The alternative is then formulated (in the simplest case of one structural break) as

H_1: µ_h^{(1)} = µ_{h,1} = ... = µ_{h,k_h} ≠ µ_{h,k_h+1} = ... = µ_{h,n} = µ_h^{(2)}  for at least one h ∈ {1, ..., d},    (1.2)

where k_h ∈ {1, ..., n} denotes the (unknown) location of the change point in the h-th component. While (even under sparsity assumptions) the detection of small changes in each component is a very challenging problem, a modification of the statistical analysis (such as prediction) might not be necessary if the actual size of the change is small. For example, in risk management one is interested in fitting a suitable model for forecasting Value at Risk from the data after the last change point [see e.g. Wied (2013)], but in practice small changes in the parameter are perhaps not very interesting because they do not yield large changes in the Value at Risk. The forecasting quality might only improve slightly, and this benefit could be outweighed by transaction costs, in particular in high-dimensional portfolios. Moreover, even if the null hypothesis (1.1) is not rejected, it is difficult to quantify the statistical uncertainty for the subsequent statistical analysis (conducted under the assumption of stationarity), as there is no control of the type II error in this case.
The present work is motivated by these observations and proposes a test for the null hypothesis of no relevant change point in a high dimensional context, that is

H_0: |µ_h^{(1)} − µ_h^{(2)}| ≤ ∆_h  for all h = 1, ..., d,    (1.3)

where µ_h^{(1)} and µ_h^{(2)} are the parameters before and after the change point in the h-th component, and ∆_1, ..., ∆_d are given thresholds for which a smaller difference of the means in the h-th component is considered as non-relevant.
The problem of testing for a relevant difference between (one dimensional) means of two samples has been discussed by numerous authors, mainly in the field of biostatistics [see Wellek (2010) for a recent review]. In particular, testing relevant hypotheses avoids the consistency problem mentioned in Berkson (1938): any consistent test will detect any arbitrarily small change in the parameters if the sample size is sufficiently large. Dette and Wied (2016) considered relevant hypotheses in the context of change point problems for general parameters, but did not discuss the high dimensional setup, where the dimension increases with the sample size. In this case the statistical problems are completely different. The alternative approach requires the specification of the thresholds ∆_h > 0, which has to be discussed carefully and depends on the specific application. We also note that the hypotheses (1.3) contain the classical hypotheses (1.1), which are obtained by simply choosing ∆_h = 0 for all h = 1, ..., d. Nevertheless we argue that from a practical point of view it is very reasonable to think about this choice more carefully and to define the size of change one is really interested in. In particular, it is often known that some change is present although one is testing "classical" hypotheses of the form (1.1) and (1.2). Moreover, a decision of no relevant structural break at a controlled type I error can easily be achieved by interchanging the null hypothesis and the alternative in (1.3), i.e. considering the hypotheses

H_0: |µ_h^{(1)} − µ_h^{(2)}| > ∆_h for at least one h ∈ {1, ..., d}  versus  H_1: |µ_h^{(1)} − µ_h^{(2)}| ≤ ∆_h for all h = 1, ..., d.    (1.6)

In this paper we propose for the first time a test for the hypotheses of a relevant structural break in any of the components of a high dimensional time series.
The basic ideas are explained in Section 2 (without going into technical details), where we propose to calculate for each component the integral of the squared CUSUM process and to reject the null hypothesis whenever the maximum of these integrals (calculated with respect to all components) is large. In order to obtain critical values for this test we derive in Section 3 the asymptotic distribution of an appropriately standardized version of the maximum as the sample size and the dimension converge to infinity. We also provide several auxiliary results, which are of independent interest, and investigate the case where the maximum is only calculated over a subset of the components. These results are then used in Section 4 to prove that the proposed test yields valid statistical inference, i.e. it is a consistent and asymptotic level α test. It turns out that, in contrast to the classical change point problem, the analysis of the test for no relevant structural breaks is substantially harder, as the null hypothesis does not correspond to a stationary process (non-relevant changes in the means are allowed). Section 5 is devoted to the investigation of a multiplier block bootstrap procedure. In particular we prove that the quantiles generated by this resampling method also yield a consistent asymptotic level α test. The finite sample properties of the new test are investigated in Section 6, where we also illustrate our approach by analysing a data example from hydrology. Finally, some of the technical details are deferred to Section 7.

Relevant changes in high dimensions - basic principles
In this Section we explain the basic idea of our approach to test for a relevant change in at least one component of the mean vector of a high dimensional time series. For the sake of a transparent representation we try to avoid technical details at this stage and refer to the subsequent sections, where we present the basic assumptions and mathematical details establishing the validity of the proposed method.
Throughout this paper we consider an array of real valued random variables {Z_{j,h}}_{j∈Z,h∈N} such that

Z_{j,h} = µ_{j,h} + X_{j,h},    (2.1)

where µ_{j,h} ∈ R for all j ∈ Z, h ∈ N and {X_{j,h}}_{j∈Z,h∈N} denotes an array of centered and real valued random variables, which implies µ_{j,h} = E[Z_{j,h}] for all j ∈ Z, h ∈ N. It follows from the assumptions made in Section 3 that for each fixed d ∈ N the time series {(X_{j,1}, ..., X_{j,d})}_{j∈Z} is stationary. Suppose that Z_1 = (Z_{1,1}, ..., Z_{1,d})^T, ..., Z_n = (Z_{n,1}, ..., Z_{n,d})^T ∈ R^d are d-dimensional observations from the array {Z_{j,h}}_{j∈Z,h∈N} and assume that for each component h ∈ {1, ..., d} there exists an unknown constant t_h ∈ (0, 1) such that

µ_{j,h} = µ_h^{(1)} for j = 1, ..., ⌊n t_h⌋  and  µ_{j,h} = µ_h^{(2)} for j = ⌊n t_h⌋ + 1, ..., n,

where ⌊x⌋ = sup{z ∈ Z | z ≤ x} denotes the largest integer smaller than or equal to x. In this case the random variables {Z_{j,h}}_{h=1,...,d; j=1,...,n} also depend on n, i.e. they form a triangular array, but for the sake of readability we suppress this dependence in our notation. We define

∆µ_h = µ_h^{(1)} − µ_h^{(2)}

as the unknown difference between the means in the h-th component before and after the change point t_h. Note that in the case ∆µ_h ≠ 0 the actual location k_h = ⌊n t_h⌋ of the change point depends on the sample size n, which is a common assumption in the literature on change point problems to perform asymptotic inference. It simply ensures that the number of observations before and after the change point grows proportionally to n. For each h with ∆µ_h = 0 the observable process {Z_{j,h}}_{j∈Z} is stationary, and to avoid misunderstandings we set t_h = 1/2 whenever ∆µ_h = 0; the reader should notice that in this case the actual value is of no interest in the following discussion. With this notation the hypotheses in (1.3) can be rewritten as

H_{0,∆}: |∆µ_h| ≤ ∆_h for all h = 1, ..., d    (2.4)

versus

H_{A,∆}: |∆µ_h| > ∆_h for at least one h ∈ {1, ..., d}.    (2.5)

Under the assumptions stated in Section 3 it is shown in Section 7.1 that the integral of the squared CUSUM process in the h-th component, say M̂²_{n,h}, converges in probability to |∆µ_h|² times a factor depending only on t_h as n → ∞, and therefore our considerations will be based on the statistic M̂²_{n,h} together with an appropriate estimator t̂_h for the unknown location t_h of the structural break, which will be precisely defined in Section 3.2.
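The integrated squared CUSUM described above is easy to sketch numerically. The following minimal illustration is not the authors' implementation: the function name and the Riemann-sum approximation of the integral are our own choices, and it computes the statistic for a single component only.

```python
import numpy as np

def integrated_squared_cusum(z):
    """Riemann-sum approximation of the integral over [0, 1] of the
    squared (scaled) CUSUM process for one component z_1, ..., z_n."""
    n = len(z)
    csum = np.cumsum(z)                   # partial sums sum_{j <= k} z_j
    k = np.arange(1, n + 1)
    u = (csum - k / n * csum[-1]) / n     # CUSUM process evaluated at s = k/n
    return np.mean(u ** 2)                # approximates the integral of u^2

# toy example: a mean shift of size 1 at relative location t = 0.5
rng = np.random.default_rng(0)
z = rng.normal(size=1000) + np.where(np.arange(1000) < 500, 0.0, 1.0)
print(integrated_squared_cusum(z))
```

For a stationary component the statistic is of order 1/n, while under a mean shift it stabilizes around a positive constant depending on the size and the location of the change; this is what makes it a useful building block for the normalized statistics discussed next.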
The null hypothesis of no relevant change in the h-th component will be rejected for large values of M̂²_{n,h}, and in order to determine a critical value we introduce the normalized statistic T̂^(∆)_{n,h} defined in (2.10), where σ̂_h denotes an estimator for the unknown long-run variance (see Section 3.3 for a precise definition) and the quantity τ̂_h = τ(t̂_h) is a function of the estimate of the change point, with

τ(t) = ∫_0^1 (s ∧ t − s t)² ds.    (2.11)

It will be shown that the test which rejects the null hypothesis of no relevant change in the h-th component whenever T̂^(∆)_{n,h} exceeds the (1 − α)-quantile of the standard normal distribution is a consistent asymptotic level α test.
In order to construct a test for the hypotheses H_{0,∆} versus H_{A,∆} in (2.4) and (2.5) of a relevant change point in at least one of the components of a high dimensional time series, we propose a simultaneous test, which rejects the null hypothesis for large values of the statistic T_{d,n} defined in (2.14), a standardized maximum of the statistics T̂^(∆)_{n,1}, ..., T̂^(∆)_{n,d}. Note that a similar approach has been investigated by Jirak (2015a), who considered the "classical" change point problem in high dimension, that is the hypotheses (1.1) and (1.2). In this case the weak convergence (2.12) does not hold (in fact T̂^(∆)_{n,h} = o_P(1) under H_{0,class}) and a different statistic has to be considered.
As it is well known that the (adjusted) maximum of standard normally distributed random variables converges weakly to a Gumbel distribution if they exhibit an appropriate dependence structure [see for example Berman (1964)], it is reasonable to consider a standardization of the form a_d (max_{h=1,...,d} T̂^(∆)_{n,h} − b_d) for the high-dimensional change point problem (2.4), where a_d and b_d denote appropriate sequences such that this quantity converges weakly at specific points of the "boundary" of the parameter space corresponding to the null hypothesis.
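The Gumbel limit for maxima of Gaussian random variables can be checked in a small simulation. This sketch uses the classical normalizing sequences for i.i.d. standard normals; the paper's sequences in (2.15) and (2.16) are adapted to the dependent high dimensional setting and may differ.

```python
import numpy as np

d, reps = 5000, 1000
a_d = np.sqrt(2 * np.log(d))
b_d = a_d - (np.log(np.log(d)) + np.log(4 * np.pi)) / (2 * a_d)

rng = np.random.default_rng(1)
maxima = rng.normal(size=(reps, d)).max(axis=1)
std_max = a_d * (maxima - b_d)            # standardized maxima

# compare an empirical exceedance probability with the Gumbel limit
x = 1.0
emp = np.mean(std_max > x)
gumbel = 1 - np.exp(-np.exp(-x))          # P(G > x)
print(emp, round(gumbel, 3))
```

The agreement is only moderate even for d = 5000, which illustrates the slow rate of convergence mentioned in Section 5 and motivates the bootstrap correction.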
To make these arguments more precise, let a_d and b_d denote the normalizing sequences defined in (2.15) and (2.16), and note that the parameter spaces corresponding to the null hypothesis (2.4) and the alternative (2.5) are given by the sets of all pairs (µ^(1), µ^(2)) with |∆µ_h| ≤ ∆_h for all h = 1, ..., d and with |∆µ_h| > ∆_h for at least one h ∈ {1, ..., d}, respectively. Define for k = 0, ..., d the "(d − k)-dimensional boundary of the hypotheses (2.4) and (2.5)" as the set A_k of all parameters in the null hypothesis for which exactly k components satisfy |∆µ_h| = ∆_h (see (2.17)). For the case d = 2, we illustrate this decomposition of the null hypothesis parameter space in Figure 1. In fact, a large part of this paper is devoted to proving the following statements (under appropriate assumptions; see Theorem 4.1 in Section 4):

(1) If (µ^(1), µ^(2)) ∈ A_d, the weak convergence T_{d,n} ⇒ G holds as both n, d → ∞, where G denotes a standard Gumbel distribution, i.e. P(G ≤ x) = exp(−e^{−x}).    (2.18)

(2) If (µ^(1), µ^(2)) ∈ A_k for some k < d and the differences |µ_h^{(1)} − µ_h^{(2)}| are bounded by a constant C, the limit superior of the rejection probability does not exceed the nominal level.

(3) If (µ^(1), µ^(2)) ∈ A_k and lim_{d→∞} k/d = 0, the rejection probability vanishes asymptotically.

Let g_{1−α} denote the (1 − α)-quantile of the Gumbel distribution; then it follows from these considerations that the test which rejects H_{0,∆} in favour of the alternative hypothesis H_{A,∆} whenever

T_{d,n} > g_{1−α}    (2.19)

is an asymptotic level α test. More precisely, it follows (under appropriate assumptions stated below) that the rejection probability converges to α on A_d and is asymptotically bounded by α under the null hypothesis. Additionally the test is consistent (see Theorem 4.3). In the following sections we make these arguments rigorous. Moreover, in order to improve the finite sample approximation of the nominal level, we also introduce a multiplier bootstrap procedure and prove its consistency in Section 5.
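The critical value g_{1−α} has a closed form, since the standard Gumbel distribution function is exp(−e^{−x}):

```python
import math

alpha = 0.05
# (1 - alpha) quantile of the standard Gumbel distribution G:
# exp(-exp(-g)) = 1 - alpha  =>  g = -log(-log(1 - alpha))
g = -math.log(-math.log(1 - alpha))
print(round(g, 4))  # approximately 2.9702 for alpha = 0.05
```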

Asymptotic properties
In this section we provide the theoretical background for the test suggested in Section 2. We begin by introducing some notation and assumptions. After stating the main assumptions we provide in Section 3.2 the asymptotic theory for an analogue of the statistic T_{d,n} defined in (2.14), where the centering in (2.10) is performed by the "true" squared differences |∆µ_h|² and the estimates of the variances σ_h and the locations of the change points t_h have been replaced by their "true" counterparts. In Section 3.3 we introduce estimators for the locations of the structural breaks (which may occur at a different location in each component) and investigate their consistency properties. These are then used to define appropriate variance estimators (note that the variances have to be estimated in the presence of changes in the mean). Finally, we consider in Section 3.4 the asymptotic distribution of the maximum of the statistics (2.10), where again the centering is performed by |∆µ_h|² instead of ∆²_h. These results will then be used in Section 4 to investigate the statistical properties of the test (2.19).
Throughout this paper we will use the following notation and symbols. Let x ∧ y and x ∨ y denote the minimum and the maximum of two real numbers x and y, respectively. For an appropriately integrable random variable Y and q ≥ 1, let ‖Y‖_q = E[|Y|^q]^{1/q} denote the L_q-norm. By the symbols ⇒ and →_P we denote weak convergence and convergence in probability, respectively. Moreover, we use the notation x_n ≲ y_n whenever the inequality x_n ≤ C · y_n holds for some constant C > 0 which depends neither on the sample size nor on the dimension and whose actual value is of no further interest. Due to its frequent appearance, G will always represent a (standard) Gumbel distribution defined by (2.18). In the high dimensional setup considered here the dimension d converges to infinity with n → ∞.
Recall the definition of model (2.1) and assume that the time series {X_{j,h}}_{j∈Z,h∈N} forms a physical system [see e.g. Wu (2005)], that is

X_{j,h} = g_h(ε_j, ε_{j−1}, ...),    (3.1)

where {ε_j}_{j∈Z} is a sequence of i.i.d. random variables with values in some measure space S such that for each h ∈ N the function g_h: S^N → R is measurable. Note that it follows from (3.1) that the time series defined in (2.2) is stationary. Let ε'_0 be an independent copy of ε_0 and define

X'_{j,h} = g_h(ε_j, ε_{j−1}, ..., ε_1, ε'_0, ε_{−1}, ...).    (3.2)

The distance between X_{j,h} and its counterpart X'_{j,h} is used to quantify the (temporal) dependence of the physical system, and for this purpose we introduce the coefficients

ϑ_{j,h,p} = ‖X_{j,h} − X'_{j,h}‖_p,    (3.3)

which measure the influence of ε_0 on the random variable X_{j,h}. Let us also define some additional quantities: φ_{h,i}(j) = Cov(X_{0,h}, X_{j,i}) denotes the covariance function of the h-th and i-th component at lag j. Accordingly, the autocovariance function for the h-th component is given by φ_h(j) = φ_{h,h}(j) = Cov(X_{0,h}, X_{j,h}), and we obtain the well-known representations

γ_{h,i} = Σ_{j∈Z} φ_{h,i}(j)  and  σ²_h = Σ_{j∈Z} φ_h(j)    (3.4)

for the long-run covariance and long-run variance, respectively. If σ_h, σ_i > 0, we can additionally define the long-run correlations

ρ_{h,i} = γ_{h,i}/(σ_h σ_i),    (3.5)

and it will be crucial for our work that these quantities become sufficiently small with increasing spatial distance |h − i|. This will be made precise in the following section.
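The representation σ²_h = Σ_{j∈Z} φ_h(j) suggests estimating the long-run variance by a truncated sum of empirical autocovariances. The following sketch is our own minimal estimator (not the one defined in Section 3.3; kernel-weighted variants are common in practice) and recovers the long-run variance of an AR(1) process:

```python
import numpy as np

def long_run_variance(x, bandwidth):
    """Truncated-sum estimate of sigma^2 = sum_j phi(j) for one component,
    based on empirical autocovariances up to the given lag."""
    n = len(x)
    xc = x - x.mean()
    s2 = np.dot(xc, xc) / n                    # lag-0 autocovariance
    for j in range(1, bandwidth + 1):
        s2 += 2 * np.dot(xc[:-j], xc[j:]) / n  # phi(j) = phi(-j)
    return s2

# AR(1) with coefficient 0.5 and unit innovation variance:
# the long-run variance is 1 / (1 - 0.5)^2 = 4
rng = np.random.default_rng(2)
e = rng.normal(size=20_000)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, len(e)):
    x[t] = 0.5 * x[t - 1] + e[t]
print(long_run_variance(x, bandwidth=30))
```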

Assumptions
Operating in a high-dimensional framework usually requires stronger assumptions than those for the finite-dimensional case. Mainly, we need uniform dependence and moment conditions among all components to exclude extreme cases and to ensure that the unknown parameters can be estimated accurately. In the high-dimensional setup considered in this paper the number of parameters t_h, σ_h grows with the dimension d, since even under the null hypothesis of no relevant change points each component exhibits its own variance and change point structure. The precise assumptions made in this paper are the following.
Assumption 3.1 (temporal assumptions) Suppose that there exist constants δ ∈ (0, 1) and σ_− > 0 such that for some p > 4 the physical dependence coefficients ϑ_{j,h,p} and the long-run variances σ_h defined in (3.3) and (3.4), respectively, satisfy conditions (T1) and (T2).

Assumption 3.2 (spatial assumptions) The dimension d increases with the sample size at a polynomial rate determined by constants C_1 and D (condition (S1)), where the exponent D satisfies condition (S2) and p refers to the L_p-norm ‖·‖_p used to measure the physical dependence in Assumption 3.1.
Here B ∈ (0, 1) denotes a constant used to control the bandwidth of a variance estimator that will be defined in Section 3.4. Further we assume for the long-run correlations in (3.5) the conditions (S3) and (S4), which bound sup_{h≠i} |ρ_{h,i}| away from 1 by the constant ρ_+ and impose a polynomial decay of |ρ_{h,i}| in |h − i| with exponent η and constant C_2, where ρ_+ > 0, η > 0 and C_2 > 0 denote global constants.
Assumption 3.3 (moment assumptions) Suppose that there exists a positive sequence (M_d)_{d∈N} and constants C_3 > 1 and C_4 > 0 such that conditions (M1) and (M2) hold.

Assumption 3.4 (location of the change points) There exists a constant t_− ∈ (0, 1/2) such that for all h ∈ N the unknown locations ⌊n t_h⌋ of the structural breaks satisfy (C1): t_− ≤ t_h ≤ 1 − t_−.

Let us give a brief explanation of Assumptions 3.1 - 3.4. The temporal assumptions (T1) and (T2) define the temporal dependence structure and bounds for the long-run variances; further, (T1) implies the existence of the quantities σ_h, γ_{h,i} and ρ_{h,i} defined in (3.4) and (3.5). Conditions (S3) and (S4) refer to the spatial dependence and are only needed to derive the desired extreme value convergence. Assumption (S1) gives a relation between the number of observations and the dimension, while (S2) is a slightly technical assumption which enables reasonable estimation of the variances σ_h and the locations t_h of the structural breaks. For a proof of the uniform consistency of the estimates of the latter quantities we must further rely on Assumption (C1), which makes the change points identifiable and is a common assumption in the literature. Roughly speaking, it simply ensures that there are enough data before and after the change point in each component. Assumptions (S1) and (S2) together with n → ∞ directly imply d → ∞. It is also worth mentioning that (S1) can be replaced by d ≲ n^D if one additionally supposes that d → ∞.
The moment assumptions (M1) and (M2) are required for a Gaussian approximation and are satisfied if {X_{j,h}}_{j∈Z, h∈{1,...,d}} is stationary with respect to the index j and for each h ∈ N the random variable X_{0,h} possesses sufficiently many moments, bounded in terms of the sequence (M_d)_{d∈N}.

Asymptotic properties - known variances and locations
In this section we assume that the locations t_h of the changes and the long-run variances σ_h are known. Recalling the approximation (2.7), we define and investigate the asymptotic properties of the maximum of the statistics T_{n,h} given in (3.6) and (3.7), where τ_h = τ(t_h) and the function τ is defined by (2.11). Note that T_{n,h} is the analogue of the statistic T̂^(∆)_{n,h}, where the thresholds ∆_h and the estimates t̂_h and σ̂_h have been replaced by the unknown quantities ∆µ_h, t_h and σ_h, respectively. Our first result shows that an appropriately standardized version of the maximum of the statistics T_{n,h} converges weakly to a Gumbel distribution. The proof is complicated, and we indicate the main steps in this section, deferring the more technical arguments to the appendix (see Section 7).
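Before stating the result, it may help to recall heuristically where a centering of the form |∆µ_h|² τ(t_h) comes from. The following sketch (with the kernel k(x, y) = x ∧ y − xy from Section 5 and the component index suppressed) is only a plausibility argument, not the proof:

```latex
% One component with mean shift \Delta\mu at \lfloor nt \rfloor:
% the law of large numbers gives, uniformly in s \in [0,1],
\frac{1}{n}\Big(\sum_{j=1}^{\lfloor ns\rfloor} Z_{j}
  - \frac{\lfloor ns\rfloor}{n}\sum_{j=1}^{n} Z_{j}\Big)
  \;\xrightarrow{\;P\;}\; \Delta\mu\,(s\wedge t - st)
  \;=\; \Delta\mu\, k(s,t),
% so the integrated squared CUSUM statistic satisfies
\widehat{\mathbb{M}}^2_{n}
  \;\xrightarrow{\;P\;}\; \Delta\mu^2 \int_0^1 k(s,t)^2\,ds .
```

The stochastic fluctuations around this deterministic limit are of order n^{−1/2}, which explains the √n-scaling in the normalized statistics.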
Theorem 3.5 Assume that Assumptions 3.1 - 3.4 are satisfied and that additionally there exist constants C_ℓ, C_u (independent of h and d) such that the inequalities C_ℓ ≤ |∆µ_h| ≤ C_u hold for all h = 1, ..., d. Then

a_d ( max_{h=1,...,d} T_{n,h} − b_d ) ⇒ G  as d, n → ∞,

where the sequences a_d and b_d are defined in (2.15) and (2.16), respectively.
Proof of Theorem 3.5 (main steps). Observing the definitions (3.6) and (3.7), we obtain by a straightforward calculation the representation T_{n,h} = T^(1)_{n,h} + T^(2)_{n,h}, where the statistics T^(1)_{n,h} and T^(2)_{n,h} are defined in (3.11). For the following discussion we introduce some additional notation. Our first auxiliary result shows that the first term T^(1)_{n,h} in this decomposition is asymptotically negligible; it is proved in Section 7.2.1.
Lemma 3.6 If the assumptions of Theorem 3.5 are satisfied, the maximum max_{h=1,...,d} |T^(1)_{n,h}| is asymptotically negligible.

By Lemma 3.6 it suffices now to deal with the terms T^(2)_{n,h}. The next step in the proof of Theorem 3.5 consists in a (uniform) approximation of the distribution of the maximum of the statistics T^(2)_{n,h} by the distribution of the maximum of (dependent) Gaussian random variables. The proof of the following result is given in Section 7.2.2, where we make use of recent developments on Gaussian approximations for maxima of sums of random variables [see Chernozhukov et al. (2013) and Zhang and Cheng (2016)].
By Lemma 3.7 it suffices to establish the desired limiting distribution for the maximum of a Gaussian distributed vector. This is a well-understood problem, and we can rely on results of Berman (1964), who originally examined the behavior of the maximum of dependent Gaussian random variables. A straightforward adaptation of these arguments shows that the sequence of random variables {N_i}_{i∈N} constructed in Lemma 3.7 satisfies the corresponding extreme value limit theorem (3.14), which together with Lemma 3.6 yields the assertion of Theorem 3.5.
In the next step we replace the unknown quantities t_h and σ_h in (3.7) by suitable estimators, say t̂_h and σ̂_h, and obtain the statistics T̃_{n,h} defined in (3.15). We emphasize again that the statistics T̃_{n,h} do not coincide with the statistics T̂^(∆)_{n,h} in (2.10), which are actually used in the test (2.19), except in the case where ∆µ²_h = |µ_h^{(1)} − µ_h^{(2)}|² = ∆²_h for all h = 1, ..., d. Thus centering is still performed with respect to the unknown difference of the means before and after the change points. In the following two subsections we give a precise definition of the two estimators and derive an analogue of Theorem 3.5 in the case of estimated change points and variances.

Estimation of long-run variances and change point locations
Determining the relative locations t_h of the structural breaks and constructing an estimator for the long-run variances σ_h for all components h ∈ {1, ..., d} is a rather difficult task in a high dimensional setting. A further difficulty in the problem of testing for relevant structural breaks consists in the fact that even under the null hypothesis there may appear structural breaks in the mean, and the corresponding process is not stationary. Therefore, in contrast to testing the "classical" hypotheses in (2.13), the construction of a suitable variance estimator is not trivial. A standard long-run variance estimator of the form Σ_{|i| ≤ β_n} φ̂_h(i), for an increasing sequence {β_n}_{n∈N} and appropriate estimators φ̂_h(i) of the autocovariances from the full sample, may not be consistent due to possible changes in the mean. Following Jirak (2015a) we define for each component h = 1, ..., d the sets D_{h,1} and D_{h,2} of observations before and after the (unknown) change point ⌊n t_h⌋, respectively. Since these points are usually unknown, we need to estimate them, and for this purpose we propose the common estimator t̂_h obtained as the (relative) location of the maximum of the absolute CUSUM in the h-th component. The following lemma shows that these estimators are uniformly consistent with respect to all components where a change point exists. Its proof follows by an adaptation of Corollary 3.1 in Jirak (2015a) and is therefore omitted.
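For a single component, an argmax-of-CUSUM location estimator can be sketched as follows; this is a hypothetical minimal version, and the normalisation details may differ from the definition used in the paper:

```python
import numpy as np

def cusum_changepoint(z):
    """Relative change point location for one component, estimated as the
    argmax of the absolute CUSUM."""
    n = len(z)
    csum = np.cumsum(z)
    k = np.arange(1, n + 1)
    return (np.argmax(np.abs(csum - k / n * csum[-1])) + 1) / n

rng = np.random.default_rng(3)
z = rng.normal(size=2000) + np.where(np.arange(2000) < 800, 0.0, 1.5)
print(cusum_changepoint(z))    # true relative location: 800 / 2000 = 0.4
```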
Lemma 3.8 Denote the set of all components h ∈ {1, ..., d} for which the corresponding time series {Z_{j,h}}_{j∈Z} is stationary, define µ_d as the minimal absolute size of change over all remaining components, and assume that condition (3.19) is satisfied. Then for a sufficiently small constant C > 0 the estimators t̂_h are uniformly consistent over all components with a change point.

Roughly speaking, condition (3.19) guarantees that the decreasing sequence µ_d does not converge too fast to 0 as the dimension of the time series converges to infinity; otherwise it is not possible to identify the (relative) locations of all change points simultaneously. In view of Lemma 3.8 it is reasonable to estimate the unknown sets D_{h,1} and D_{h,2} by the sets D̂_{h,1} and D̂_{h,2} of observations to the left of ⌊n(t̂_h − S)⌋ and to the right of ⌊n(t̂_h + S)⌋, respectively, where S ∈ (0, 1) denotes a user-specified separation constant. Let σ̂²_{h,1} and σ̂²_{h,2} be the standard long-run variance estimators based on the samples D̂_{h,1} and D̂_{h,2}, respectively. Then we can use any convex combination of σ̂²_{h,1} and σ̂²_{h,2} to estimate the long-run variance σ²_h, for example

σ̂²_h = (σ̂²_{h,1} + σ̂²_{h,2})/2.

To simplify the technical arguments in Section 7 we consider a truncated version of this estimator, obtained by restricting σ̂²_h to an interval [s_−, s_+], where s_− and s_+ are a sufficiently small and a sufficiently large constant, respectively. The following statement is a straightforward implication of the results in Section 3 of Jirak (2015a) and yields the consistency of these estimators (uniformly with respect to the spatial components).
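The split-sample variance construction can be sketched as follows; the helper `lrv`, the separation width `S`, the bandwidth and the truncation constants are illustrative choices of our own, not the paper's:

```python
import numpy as np

def lrv(x, bw):
    """Plain truncated autocovariance-sum long-run variance estimate."""
    n = len(x)
    xc = x - x.mean()
    s2 = np.dot(xc, xc) / n
    for j in range(1, bw + 1):
        s2 += 2 * np.dot(xc[:-j], xc[j:]) / n
    return s2

def split_sample_lrv(z, t_hat, S=0.05, bw=20, s_minus=1e-3, s_plus=1e3):
    """Average the long-run variance estimates from the observations before
    and after the estimated change point (a band of relative width S around
    t_hat is discarded) and truncate the result to [s_minus, s_plus]."""
    n = len(z)
    before = z[: int(n * (t_hat - S))]
    after = z[int(n * (t_hat + S)):]
    s2 = 0.5 * (lrv(before, bw) + lrv(after, bw))
    return min(max(s2, s_minus), s_plus)

rng = np.random.default_rng(4)
z = rng.normal(size=4000) + np.where(np.arange(4000) < 2000, 0.0, 2.0)
print(split_sample_lrv(z, t_hat=0.5))    # i.i.d. N(0, 1) noise: sigma^2 = 1
```

Splitting around the estimated break avoids the bias a full-sample autocovariance estimate would incur from the mean shift itself.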
Lemma 3.9 Suppose that Assumptions 3.1 - 3.4 are satisfied and additionally that there exists a constant C_ℓ > 0 such that the inequality C_ℓ ≤ |∆µ_h| holds for all h = 1, ..., d. Then the estimators σ̂²_1, ..., σ̂²_d are uniformly consistent at a polynomial rate determined by a sufficiently small constant η > 0.

Weak convergence
Equipped with Lemmas 3.8 and 3.9 we are now able to state the main result of this section, which will be proved in Section 7.2.3.
Theorem 3.10 If the assumptions of Theorem 3.5 are satisfied, then the statistics T̃_{n,h} defined in (3.15) satisfy

a_d ( max_{h=1,...,d} T̃_{n,h} − b_d ) ⇒ G

as d, n → ∞, where the sequences a_d and b_d are defined in (2.15) and (2.16), respectively.
Recall again that the statistics T̃_{n,h} and T̂^(∆)_{n,h} in (2.10) do not coincide in general. Thus Theorem 3.10 does not show that the test (2.19) is an asymptotic level α test, because it does not cover all parameter configurations of the null hypothesis (2.4). However, if the vector (µ^(1), µ^(2)) is an element of the set A_d defined in (2.17), we have T̃_{n,h} = T̂^(∆)_{n,h}, and it follows from this result that the probability of rejection converges to α, that is

lim_{d,n→∞} P( T_{d,n} > g_{1−α} ) = α.

We conclude this section with a result which can be used to control the type I error of the test (2.19) for other values of the vector (µ^(1), µ^(2)).
Corollary 3.11 Let {M_d}_{d∈N} be an increasing sequence of subsets of {1, ..., d} (as d, n → ∞). If the assumptions of Theorem 3.5 hold, then the analogue of the weak convergence in Theorem 3.10, with the maximum taken only over the components in M_d and the normalizing sequences adapted to the cardinality of M_d, is valid for all x ∈ R.

Relevant changes in high dimensional time series
Recall the problem of testing the hypotheses of a relevant change defined in (2.4) and (2.5). We propose to reject the null hypothesis of no relevant change in any component of the high dimensional mean vector, whenever the inequality (2.19) holds, that is T d,n > g 1−α , where the test statistic T d,n is defined in (2.14) and g 1−α denotes the (1 − α) quantile of the Gumbel distribution. The following two results make the discussion at the end of Section 2 rigorous and show that the test introduced in (2.19) defines in fact a consistent and asymptotic level α test.
Theorem 4.1 Suppose that Assumptions 3.1 - 3.4 are satisfied, α ∈ (0, 1 − e^{−1}], and that there exist constants ∆_−, ∆_+ such that the thresholds ∆_h satisfy the inequalities

∆_− ≤ ∆_h ≤ ∆_+ for all h = 1, ..., d    (4.1)

together with a further condition involving a constant C_∆ > 0. Then, under H_{0,∆}, we have

lim sup_{d,n→∞} P( T_{d,n} > g_{1−α} ) ≤ α.    (4.4)

Remark 4.2 Condition (4.1) is actually not a strong restriction, since the thresholds ∆_h are chosen by the user. Nevertheless the condition is crucial, since we use the factor 1/∆_h as a normalisation in the statistics T̂^(∆)_{n,h} defined in (2.10). Note that under the null hypothesis (2.4) the inequality ∆_h ≤ ∆_+ ensures |∆µ_h| ≤ C_u, which was one of the assumptions in Theorem 3.10. Consequently the assertion of Theorem 4.1 follows from Theorem 3.10 in the case where |∆µ_h| = ∆_h for all h ∈ {1, ..., d}. In the general case, however, the proof of Theorem 4.1 is more complicated and is deferred to Section 7.3.1, where we also handle the case |∆µ_h| < ∆_h.
In the following result we investigate the consistency of the new test; interestingly, it requires fewer assumptions than Theorem 4.1. If the test (2.19) rejects the null hypothesis H_{0,∆} in (2.4), we conclude (at a controlled type I error) that there is at least one component with a relevant change in the mean. As there could exist relevant changes in several components, the next step in the statistical inference is the identification of the set of all components with a relevant change, for which an estimator is defined in (4.6). The following theorem provides a consistency result for this estimate.
Theorem 4.4 Suppose that Assumptions 3.1 - 3.4 hold and assume additionally that there exist two constants 0 < C_ℓ < 1/2 and C_u > 0 such that the corresponding conditions on the sizes of the changes are satisfied. Then the set estimator defined in (4.6) is consistent for α ∈ (0, 1 − e^{−1}] as d, n → ∞; moreover, the lower bound (4.9) holds.

Bootstrap
The testing procedure introduced in the previous sections is based on the weak convergence of the maximum of appropriately standardized statistics to a Gumbel distribution, and it is well known that the speed of convergence in limit theorems of this type is rather slow. As a consequence the approximation of the nominal level of the test (2.19) for finite sample sizes may not be accurate.
A common way to improve the performance of the test is to obtain the critical values from an appropriate bootstrap procedure.
In the context of testing for relevant change points the construction of an appropriate resampling procedure is not obvious because, in contrast to classical change point problems, the parameter space under the null hypothesis is rather large. In particular, it is necessary to simulate the distribution of the statistic $T_{d,n}$ in the case $|\Delta\mu_h| = \Delta_h$ for all $h \in \{1, \ldots, d\}$, so that the quantile of the Gumbel distribution can be replaced by the corresponding quantile of the bootstrap statistics. A further problem is to mimic the dependence of the underlying time series, which we address by employing a Gaussian multiplier bootstrap, where observations are block-wise multiplied with independent Gaussian random variables [see Künsch (1989) or Lahiri (2003)].
To handle the problem of potential change points under the null hypothesis (2.4) of no relevant changes, observations from blocks in a neighborhood of the estimated change points are not used in the estimate. Furthermore, components without a change point are ignored when the bootstrap statistic is constructed. We begin by describing this idea in more detail and show in Theorem 5.5 that the bootstrap statistic converges weakly to a Gumbel distribution as well. In the sequel we assume without loss of generality that $n = KL$ and split the sample into $L$ blocks of length $K$, where additionally $L \sim n^{\epsilon}$ and $K \sim n^{1-\epsilon}$ for some $\epsilon \in (0, 1)$ (5.1). We define the following quantities, which control the number of blocks before and after the estimated change point.
where $\hat t_h$ denotes the estimator of the location $t_h$ of the change in the $h$-th component defined in Section 3.4. The corresponding sample means can be used to define an estimator for the unknown amount of change $\Delta\mu_h = \mu_{h,1} - \mu_{h,2}$ in the $h$-th component. Moreover, these estimators can also be used to define the "mean corrected" sample (5.5). Finally, we define blocking variables (that is, sums with respect to the different blocks). From the representation (5.5) we directly obtain that blocks close to the estimator $\hat t_h$ are ignored. Further, denote by $\mathcal Z_n$ the $\sigma$-field generated by the sample $(Z_{j,h})_{1 \le j \le n,\, 1 \le h \le d}$ and by $\{\xi_\ell\}_{\ell \in \mathbb N}$ a sequence of i.i.d. standard Gaussian random variables, which is independent of $\mathcal Z_n$. Now we consider a multiplier version of the CUSUM process built from the $L$ blocks and introduce the bootstrap integral statistics (for each component), where $k(x, y) = x \wedge y - xy$ denotes the covariance kernel of the standard Brownian bridge and the variance estimate is obtained from the bootstrap sample. Analogously to the previous sections, we define a normalized maximum of the bootstrap statistics $B_{n,h}$, where the normalizing sequences $a_d$ and $b_d$ are given by (2.15) and (2.16), respectively.
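As an illustration of this construction, the following simplified sketch implements a single-component multiplier block bootstrap draw. The mean correction, the change-point estimate `t_hat` and the rule for dropping blocks near the estimated change point are crude placeholders for the quantities in (5.1)–(5.5); this is a sketch of the idea, not the paper's exact statistic.

```python
import numpy as np

def k(x, y):
    """Covariance kernel of the standard Brownian bridge: x ∧ y − x·y."""
    return np.minimum(x, y) - x * y

def multiplier_bootstrap_stat(z, K, t_hat, rng):
    """One draw of a simplified single-component multiplier block
    bootstrap statistic: block sums are multiplied by i.i.d. standard
    Gaussian weights, aggregated into a CUSUM-type process and
    integrated against the kernel k(s, t_hat)."""
    n = len(z)
    L = n // K
    z = z[: L * K] - z[: L * K].mean()          # crude mean correction (placeholder)
    blocks = z.reshape(L, K).sum(axis=1)        # block sums
    s = (np.arange(L) + 1) / L                  # block grid points
    keep = np.abs(s - t_hat) > 1.0 / L          # ignore blocks near the estimated change
    xi = rng.standard_normal(L) * keep          # Gaussian multipliers
    cusum = np.cumsum(xi * blocks) / np.sqrt(n) # multiplier CUSUM process on the grid
    return np.mean(cusum * k(s, t_hat))         # integral statistic against k(s, t_hat)

rng = np.random.default_rng(0)
draws = [multiplier_bootstrap_stat(rng.standard_normal(400), K=10, t_hat=0.5, rng=rng)
         for _ in range(200)]
```

Repeating such draws conditional on the data yields the empirical distribution from which bootstrap quantiles are taken.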
Remark 5.1 Let us give a brief heuristic explanation why (5.8) and (5.9) define an appropriate bootstrap statistic. Our basic aim is to mimic the distribution of the test statistic $T_{d,n}$ on the "0-dimensional boundary" of the null hypothesis $H_{0,\Delta}$, that is, in the case $|\Delta\mu_h| = \Delta_h$ for all $h = 1, \ldots, d$.
Note that we have $\mu_h(s, t_h)/\Delta_h = \operatorname{sign}(\Delta\mu_h)\, k(s, t_h)$, and it was outlined in Section 3 that in this setting the corresponding representation holds, where by Lemma 3.6 the first summand on the right-hand side is of order $o_P(1)$. The component-wise bootstrap statistic $B_{n,h}$ is supposed to imitate the second summand in this decomposition. However, this approach is only sensible for the components $h$ that actually contain a change, and for this reason we introduce the indicator function $I\{|\widehat{\Delta\mu}_h| > n^{-1/4}\}$. To be more precise, we will show in the Appendix a corresponding approximation, where the set $S_d$ is defined in (3.17). The statistic $B_{d,n}$ will then be used to generate bootstrap quantiles for the statistic $T_{d,n}$. In order to prove that this is a valid approach we require the following additional assumptions.
Assumption 5.2 (assumptions for the bootstrap) For the constants $p$, $D$ in Assumption 3.2 and the exponent $\epsilon$ in (5.1) assume that

(B1) $D < \min\{(1 - \epsilon)(p/2 - 2),\ (p/4 - 1)\}$ with $\epsilon > 3/4$ and $p > 8$.

Assumption (B1) is a rather technical condition relating the dimension $d \sim n^D$, the number of blocks $L \sim n^{\epsilon}$ and the constant $p$, which was initially introduced in Assumption 3.1. Assumption (B2) is only a restriction on the monotone decreasing sequence $\mu^*_d = \min_{h \in S_d^c} |\Delta\mu_h|$, which is not allowed to decrease arbitrarily fast.
We are now ready to state the main results of this section. Our first lemma shows that the set of stable components $S_d$ can be identified correctly.
conditional on $\mathcal Z_n$ in probability, where the sequences $a_d$ and $b_d$ are defined in (2.15) and (2.16), respectively.
Finally, the representation

Remark 5.6
In the following, let $g^*_{1-\alpha}$ denote the $(1-\alpha)$-quantile of the distribution of the bootstrap statistic $B_{d,n}$ and define the bootstrap test by rejecting the null hypothesis (2.4) in favour of (2.5) whenever (5.11) holds, where the statistic $T_{d,n}$ is defined in (2.14). If the alternative hypothesis of at least one relevant change point holds, Theorem 4.3 shows that $T_{d,n} \xrightarrow{P} \infty$, which together with Theorem 5.5 directly yields consistency of the bootstrap test. Under the null hypothesis, we consider different cases. Recall the definition of the sets $S_d$ and $S_d^c \subset \{1, \ldots, d\}$. A combination of Theorem 4.1 and Theorem 5.5 now shows that the rule in (5.11) gives an asymptotic level $\alpha$ test whenever the limit $\lim_{d \to \infty} |S_d^c|/d$ exists.
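Schematically, the asymptotic test (2.19) and the bootstrap test (5.11) differ only in the critical value. The sketch below makes this explicit; here `T_dn`, `a_d`, `b_d` and `bootstrap_draws` are placeholders for the statistic (2.14), the sequences (2.15) and (2.16), and replicates of $B_{d,n}$.

```python
import numpy as np

def gumbel_quantile(alpha):
    """(1 - alpha)-quantile of the standard Gumbel distribution."""
    return -np.log(-np.log(1 - alpha))

def reject(T_dn, a_d, b_d, alpha, bootstrap_draws=None):
    """Reject H_0 if a_d * (T_dn - b_d) exceeds the critical value, taken
    either from the Gumbel limit (asymptotic test) or from the empirical
    (1 - alpha)-quantile of bootstrap replicates (bootstrap test)."""
    if bootstrap_draws is None:
        crit = gumbel_quantile(alpha)                    # asymptotic test
    else:
        crit = np.quantile(bootstrap_draws, 1 - alpha)   # bootstrap test
    return a_d * (T_dn - b_d) > crit
```

With `bootstrap_draws=None` the Gumbel quantile $-\log(-\log(1-\alpha))$ is used; otherwise the empirical quantile plays the role of $g^*_{1-\alpha}$.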

Finite sample properties
In this section we examine the finite sample properties of the asymptotic and bootstrap tests by means of a small simulation study and illustrate the application of the new test in a data example.

Simulation study
The results of the previous section demonstrate that a test which rejects the null hypothesis in (1.3) for large values of the statistic $T_{d,n}$ defined in (2.14) is consistent and has asymptotic level $\alpha$, provided that the critical values are either chosen by the asymptotic theory or estimated by the bootstrap procedure introduced in Section 5. It turns out that a bias correction, which does not change the asymptotic properties, yields substantial improvements of the finite sample properties of both the asymptotic and the bootstrap test. To be precise, recall the decomposition in (3.8), where $T^{(1)}_{n,h}$ and $T^{(2)}_{n,h}$ are defined in (3.10) and (3.11), respectively. It is shown in Section 3 that the maximum of the terms $T_{n,h}$ is always nonnegative, which may lead to a non-negligible bias in applications, in particular when the sample size is small relative to the dimension. To address this problem for the asymptotic test (2.19), we subtract the term $\hat\sigma_{n,h}$ defined in (2.10). A similar bias correction is also suggested for the bootstrap test in Section 5; recall that the bootstrap statistic is already constructed to mimic the distribution of the statistic $T^{(2)}_{n,h}$. To guarantee a stable long-run variance estimation, we replaced the standard long-run variance estimator used in the theory of Section 3.3 by an estimator based on the Bartlett kernel [see Newey and West (1987)], that is (for component $h$) a weighted sum of sample autocovariances. In order to obtain a more conservative test, we use the maximum of the two variance estimates based on the sets $D_{h,1}$ and $D_{h,2}$ defined in (3.20) as the final variance estimate. We will focus on two scenarios with independent innovations in model (2.1), i.e.
and on two models of dependent data, an ARMA process and an MA process, where different values of the parameter $\mu$ are considered and the time of change is $t_h = 1/2$. All results presented in this section are based on 1000 simulation runs, and the significance level is always $\alpha = 0.05$. Further, the constant $S$ involved in (3.20) was fixed at 0.9. In our first example we investigate the finite sample properties of the asymptotic test (2.19), which uses the quantiles of the Gumbel distribution. In Table 1 we display the simulated type I error of this test at the "0-dimensional boundary" of the null hypothesis (that is, $\Delta_h = \Delta\mu_h = 1$ for all $h \in \{1, \ldots, d\}$) for different values of $n$ and $d$. It is well known that the approximation of the distribution of the maximum of normally distributed random variables by a Gumbel distribution is not very accurate in small samples, and we therefore consider relatively large sample sizes and dimensions in order to illustrate the properties of the asymptotic test. The results reflect the asymptotic properties derived in Section 3. For the independent cases (I) and (II) the approximation of the nominal level is more accurate than for the dependent scenarios (III) and (IV), where the test is more conservative. We also mention that the rejection probabilities increase with $\Delta\mu_h$, as predicted in Sections 2 and 4 (these results are not presented for the sake of brevity).
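The Bartlett-kernel (Newey–West) long-run variance estimator mentioned above, together with the conservative maximum combination, can be sketched as follows. The bandwidth and the two subsets are left as inputs, since the paper's specific choices for the sets $D_{h,1}$, $D_{h,2}$ in (3.20) are not reproduced here.

```python
import numpy as np

def bartlett_lrv(x, bandwidth):
    """Long-run variance estimate with the Bartlett kernel (Newey-West):
    gamma_0 + 2 * sum_{j=1}^{bw} (1 - j/(bw+1)) * gamma_j,
    where gamma_j is the lag-j sample autocovariance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    lrv = xc @ xc / n                      # lag-0 autocovariance
    for j in range(1, bandwidth + 1):
        gamma_j = xc[j:] @ xc[:-j] / n     # lag-j autocovariance
        lrv += 2 * (1 - j / (bandwidth + 1)) * gamma_j
    return lrv

def conservative_lrv(x1, x2, bandwidth):
    """Conservative combination: maximum of the estimates computed on two
    subsamples (placeholders for the sets D_{h,1}, D_{h,2})."""
    return max(bartlett_lrv(x1, bandwidth), bartlett_lrv(x2, bandwidth))
```

For bandwidth 0 the estimator reduces to the (biased) sample variance; the Bartlett weights guarantee a nonnegative estimate for any bandwidth.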
[Table 1: simulated type I error of the asymptotic test for models (I)–(IV); $n = 2000, 5000, 10000$ and $d = 500, 1000$.]

Next we analyse the properties of the bootstrap test (5.11), which was developed in Section 5. Here we focus on relatively small sample sizes, $n = 100$ and $n = 200$, and a relatively large dimension compared to the sample size, $d = 50$ and $d = 100$. It is well known that the multiplier bootstrap is sensitive with respect to the choice of the block length, and this dependence is also observed for the bootstrap test proposed here. As an example, we show in Table 2 the simulated rejection probabilities for the models (I)–(IV), different values of $K$ and $\Delta\mu_h = \mu$. Here the values $|\Delta\mu_h| \le 1$ correspond to the null hypothesis. For $|\Delta\mu_h| = 1$ (for all $h \in \{1, \ldots, d\}$) the results of Section 5 predict that the level of the test should be close to $\alpha = 0.05$. Note that in the case of independent innovations (models (I) and (II)) the choice $K = 1$ (which means that no blocks are used) leads to the most reasonable results, with rejection probabilities on the "0-dimensional boundary" $A_d$ of the null hypothesis of 8.2% and 2.5%, respectively, while larger values of $K$ such as $K = 4$ yield an inflated type I error. On the other hand, in the dependent models (III) and (IV) the block length needs to be carefully adapted to the time series structure. For model (III) $K = 10$ seems to be optimal, while for model (IV) choosing $K = 2$ gives acceptable results. The larger block length for model (III) might be required due to its autoregressive structure. Moreover, inspecting the results in rows 6 and 8 of Table 2 shows that too small values of $K$ lead to a loss of power, while, similarly to the first two models, too large values can cause an uncontrolled type I error.
Next we display in Figure 2 the simulated power of the bootstrap test for all four models under consideration (using the optimal $K$ from Table 2). Note that the rejection probabilities increase with $\Delta\mu_h$, as predicted by the asymptotic arguments of the previous sections. In the left panel we show the results for the independent scenarios (I) and (II), which are rather similar. On the other hand, an inspection of the right panel shows larger differences in the dependent case: the different dependence structures in models (III) and (IV) yield substantial differences in the power of the bootstrap test (5.11). We conclude the discussion of the bootstrap test by investigating the sample size $n = 200$; the corresponding results are presented in Table 3.

Data example
In this section we illustrate in a short example how the new test can be used in applications. Our dataset is taken from hydrology and consists of average daily flows (m³/sec) of the river Chemnitz at Goeritzhain (Germany) in the period 1909–2014. This data set has recently been analysed by Sharipov et al. (2016) using a statistical model from functional data analysis. Following these authors, we subdivide the data into $n = 105$ years with $d = 365$ days per year. To avoid confusion, the reader should note that the German hydrological year starts on the 1st of November. Equipped with our new methodology, we are now able to test whether there is a relevant change in the mean of at least one component. To specify the term 'relevant', we exemplarily set the thresholds for all components to $\Delta_1 = \Delta_2 = \cdots = \Delta_{365} = 0.63$, which is close to 10% of the overall mean of the data under consideration. At the significance level of 5% the bootstrap test defined in Section 5 rejects the null hypothesis of no relevant change for the given thresholds. Moreover, we can also identify the components where the individual test statistic leads to a rejection, that is, where $a_d(\hat T^{(\Delta)}_{n,h} - b_d)$ exceeds the quantile of the bootstrap distribution used in (5.11). For the data under consideration we found four components with a relevant change, namely $h = 53, 99, 137$ and $252$, with corresponding estimators $n\hat t_{53} = 56$, $n\hat t_{99} = 70$, $n\hat t_{137} = 47$ and $n\hat t_{252} = 41$, respectively. This corresponds to the 23rd of December 1965, the 7th of February 1979, the 18th of March 1956 and the 10th of July 1950, respectively. In Figure 3 we display for these cases the time series before and after the estimated year of change. For example, the panel in the first row shows the average annual flow curves before and after 1950, and the other panels are interpreted analogously. In all cases we observe a large difference between the two curves close to the estimated component (marked by the vertical dashed line). We finally note that the approach of Sharipov et al. (2016) identifies only one change point, namely the year 1965. In contrast, our analysis indicates relevant changes in the years 1950, 1956, 1965 and 1979.
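The layout of the data matrix used in this example can be reproduced with synthetic numbers. The actual Goeritzhain series is not included here, so the flow values below are simulated placeholders; only the reshaping into a year-by-day matrix and the threshold rule are the point.

```python
import numpy as np

# Synthetic stand-in with the same layout as the hydrological data:
# one average daily flow per day, n = 105 years of d = 365 days each
# (the hydrological year starting on 1 November).
n_years, n_days = 105, 365
rng = np.random.default_rng(2)
daily_flow = rng.gamma(shape=2.0, scale=3.0, size=n_years * n_days)

# Each row is one hydrological year, each column one day-of-year,
# so column h is the time series of the h-th component.
Z = daily_flow.reshape(n_years, n_days)

# Thresholds: roughly 10% of the overall mean, mirroring the paper's
# choice Delta_1 = ... = Delta_365 = 0.63 for their data.
delta = 0.10 * Z.mean()
thresholds = np.full(n_days, delta)
```

Running the bootstrap test then amounts to computing the component-wise statistics on the columns of `Z` and comparing the normalized maximum with the bootstrap quantile.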

Technical details
Throughout the proofs we will use that Assumption (C1) directly implies the existence of two constants $\tau_-$, $\tau_+$ such that the function $\tau$ defined in (2.11) satisfies the inequalities (7.1) for all $h \in \mathbb N$.

Proof of assertion (2.7)
Straightforward calculations yield, uniformly with respect to $s \in [0, 1]$, a representation in terms of $\mu_h(s, t_h)$ defined in (3.9). An application of Fubini's theorem now gives assertion (2.7).

7.2 Details in the proof of Theorem 3.5

Proof of Lemma 3.6
Observing the definition (2.11) and Assumptions (T2) and (C1), it easily follows that there exist constants $c$ and $C$ such that the corresponding inequalities hold for all $h \in \{1, \ldots, d\}$. With these inequalities we obtain a bound which, using (7.1) and $\max_{h=1}^d |\Delta\mu_h| \le C_u$, can be estimated further. Introducing the notation $e_d = 2\sqrt{2 \log 2d}$, it follows that the last term can be bounded by a random variable that corresponds to a CUSUM process under the classical null hypothesis of no change point, that is, $\mu_{h,1} = \mu_{h,2}$ for all $h \in \{1, \ldots, d\}$, and the claim follows from Theorem 2.5 in Jirak (2015a) (note that there is a typo in the original paper, which has been corrected in the arXiv version).

Proof of Lemma 3.7
A straightforward calculation yields
$$n \,\mathrm{Cov}\big(U_{n,h}(s_1), U_{n,i}(s_2)\big) = k(s_1, s_2)\,\gamma_{h,i} + r_{n,h,i}(s_1, s_2),$$
where $k(s_1, s_2) = s_1 \wedge s_2 - s_1 s_2$ denotes the covariance kernel of a Brownian bridge and $r_{n,h,i}(s_1, s_2)$ is a remainder term. An application of Fubini's theorem shows that the covariances of the integrated statistics admit an analogous expansion, where the limiting function $\tilde\tau$ is given by (7.4). Moreover, for the function $\tau$ defined in (2.11) we obtain the representation
$$\tau(t) = \frac{6}{(t(1-t))^2} \int_0^1 \!\! \int_0^1 k(s_1, t)\, k(s_2, t)\, k(s_1, s_2)\, ds_1\, ds_2,$$
and it follows that $\tau(t_h)\tau(t_h) = \tilde\tau(t_h, t_h)$. Therefore we obtain as a special case the estimate $\mathrm{Var}\, T^{(2)}_{n,h} = 1 + r_{n,h,h}$.
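The double-integral representation of $\tau$ reconstructed above can be checked numerically. The sketch below evaluates it with a midpoint rule; the constant 6 and the normalisation are taken from that display, which should be compared with the original definition (2.11).

```python
import numpy as np

def k(x, y):
    """Brownian-bridge covariance kernel k(x, y) = x ∧ y − x·y."""
    return np.minimum(x, y) - x * y

def tau(t, m=400):
    """Numerical evaluation of
    tau(t) = 6/(t(1-t))^2 * ∫∫ k(s1,t) k(s2,t) k(s1,s2) ds1 ds2
    via a midpoint rule on an m x m grid over [0,1]^2."""
    s = (np.arange(m) + 0.5) / m
    S1, S2 = np.meshgrid(s, s)
    integral = np.mean(k(S1, t) * k(S2, t) * k(S1, S2))
    return 6.0 * integral / (t * (1 - t)) ** 2

# tau is symmetric about t = 1/2 because k(1-s, 1-t) = k(s, t),
# and positive because k is a positive semidefinite kernel.
vals = {t: tau(t) for t in (0.3, 0.5, 0.7)}
```

The positivity follows since the integrand is of the form $\int\int f(s_1) f(s_2) k(s_1, s_2)\, ds_1\, ds_2$ with $f(s) = k(s, t)$ and $k$ positive semidefinite.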
The proof can now be performed in two steps.

Step 1: For two constants $c, C > 0$, the inequality (7.6) holds, where $N$ is a $d$-dimensional centered Gaussian random variable with the same covariance structure as $(T^{(2)}_{n,1}, \ldots, T^{(2)}_{n,d})^T$.
Step 1: At the end of this proof we derive a representation whose coefficients $c_{n,j,h}$ are uniformly bounded, that is, $\sup_{n,j,h \in \mathbb N} |c_{n,j,h}| \le c_0 < \infty$. Next we apply the Gaussian approximation in Corollary 2.2 of Zhang and Cheng (2016) to the random variables $Y_{n,j,h} = c_{n,j,h} X_{j,h}$; for this purpose we check the assumptions of this result. Assumption (M2) yields that $M_d \lesssim n^m$ with $m < 3/8$. This means that for sufficiently small $b$ we have $m < (3 - 17b)/8$, and by Assumption (S1) it follows that $d \lesssim \exp(n^b)$. Using the identity (3.1), the triangular array $\{Y_{n,j,h} \mid 1 \le j \le n,\ 1 \le h \le d\}_{n \in \mathbb N}$ exhibits the following structure:
$$Y_{n,j,h} = c_{n,j,h} \cdot g_h(\varepsilon_j, \varepsilon_{j-1}, \ldots) =: \tilde g_{n,j,h}(\varepsilon_j, \varepsilon_{j-1}, \ldots).$$
Define the coefficients as indicated, where $\varepsilon'_{i-j}$ is an independent copy of $\varepsilon_{i-j}$, and observe the corresponding inequality, which holds uniformly with respect to $n$. By (7.5) there exist constants $c_1$ and $c_2$ such that $c_1 < \mathrm{Var}\, T^{(2)}_{n,h} < c_2$ if $n$ is sufficiently large. Since all requirements are met, Corollary 2.2 of Zhang and Cheng (2016) implies the existence of a Gaussian random variable $N$ having the same covariance matrix as the vector $(T^{(2)}_{n,1}, \ldots, T^{(2)}_{n,d})^T$ and satisfying inequality (7.6).
Step 2: We choose the random variable $\tilde N$ to be $d$-dimensional centered Gaussian with covariance matrix determined by the function $\tilde\tau$ defined in equation (7.4). Next, denote by $\Sigma$ the covariance matrix of the vector $N$ from Step 1. By (7.3), the entries of the two covariance matrices differ only by terms of order $n^{-1/2}$, and an application of the Gaussian comparison inequality in Lemma 3.1 of Chernozhukov et al. (2013) gives the corresponding bound. The proof of Step 2 is now completed by observing the bound $|\tilde\tau(t_h, t_i)| \le |\tau(t_h)||\tau(t_i)|$, which is a consequence of the (generalised) Cauchy–Schwarz inequality.
Proof of the representation (7.9). Recall the definition $k(s, t) = s \wedge t - st$; then

Now observe the representation for $T^{(2)}_{n,h}$; then the representation (7.9) holds, and the proof is completed by observing the corresponding inequalities.

Proof of Theorem 3.10
As $|\Delta\mu_h| \ge C > 0$ for all $h \in \{1, \ldots, d\}$, we have $S_d = \emptyset$ and $S_d^c = \{1, \ldots, d\}$, and Lemma 3.8 implies (7.10). Moreover, one easily verifies that the function $t \mapsto \tau(t)$ defined in (2.11) is Lipschitz continuous on the interval $[t, 1 - t]$, and therefore we obtain (7.12) from (7.10). Finally, we note that for a sufficiently small constant $C > 0$ the estimate (7.13) holds, which is a direct consequence of Lemma 3.9 and assertion (7.12). After these preparations we return to the proof of Theorem 3.10. We recall the definition (2.8) and introduce the notation $\hat T_{n,h}$. We will first show the weak convergence (7.14). For a proof of (7.14) we recall the definition of the statistic $T_{n,h}$ in (3.7) and apply Theorem 3.5. With the representations for $\hat M^2_{n,h}$ and $M^2_{n,h}$ in (2.8) and (3.6), respectively, and the notation $\hat q_h$, it is easy to see that the resulting term can be bounded by an expression whose remainder satisfies a bound that follows from the inequalities (7.2), (7.11) and the condition $C \le |\Delta\mu_h| \le C_u$. Thus (7.14) follows if we can establish (7.15). For a proof of this result we fix $x \in \mathbb R$ and define $u_d(x) = x/a_d + b_d$. From (7.11) and $d = C_1 n^D$ it follows that the remaining terms are negligible, which by Slutsky's theorem directly implies (7.15). Thus we have established (7.15) and proved (7.14).
To complete the proof of Theorem 3.10, note that the assertion (3.22) is equivalent to (7.18), where we again use $u_d(x) = x/a_d + b_d$. To prove this statement we consider the set $Q_d$ defined via the sequence $\delta_d = (\log d)^{-2}$. The inequalities (7.2) and the estimate (7.13) yield a bound on $P(Q_d^c)$, and by a similar argument for the remaining term we obtain $P(Q_d^c) \to 0$. If $\max_{h=1}^d \hat T_{n,h} \ge 0$ holds, we can conclude the corresponding chain of inequalities, and observing the resulting identity we directly obtain (7.20). Now we fix $\varepsilon > 0$ and note that the inequality $u_d(x - \varepsilon) < u_d(x) + u_d(x)\delta_d < u_d(x + \varepsilon)$ holds if $n$ (or equivalently $d$) is sufficiently large. The weak convergence (7.14) then yields an upper bound for the limit superior in (7.20). Using Bonferroni's inequality we can proceed similarly for the lower bound of (7.20), that is, for the limit inferior. The assertion (7.18) then follows by letting $\varepsilon \to 0$, which completes the proof of Theorem 3.10.

Proof of Corollary 3.11
At first we consider the case where $m_d := |M_d| = c \cdot d + o(d)$ for some constant $c \in (0, 1]$, and note that in this case (7.21) holds. Recall the definition of the set $S_d$ in (3.17), choose a constant $\Delta_-^2 > \zeta > 0$ and consider the corresponding decomposition of the set $\{1, \ldots, d\} \setminus S_d$. Using this representation, the first assertion (4.2) follows from the following three statements.

Proof of (7.23): Observing the definition of $\hat T^{(\Delta)}_{n,h}$ in (2.10), we obtain an inequality whose first summand can be bounded further, which gives an upper bound for (7.30). Similarly to the proof of (7.23), one easily shows that $a_d$ times the corresponding maximum is of order $o_P(1)$. In the case $S^{(2)}_{n,h} \ge 0$, applying Lemma 3.7 yields a bound on the limit superior for all $x \in \mathbb R$, and consequently the right-hand side of (7.30) is of order $O_P(1)$. Now (7.24) follows from (7.28) and (7.29).
It remains to show assertion (4.4) under the additional assumption (4.3). Note that under the latter assumption we can further decompose the set $E_d$ as $E_d = (E_d \setminus B_d) \cup B_d$ and observe the consequence of (4.3). Again, we can examine both sets separately.
By the definition of $E_d \setminus B_d$, the second summand on the right-hand side of (7.32) tends (in probability) to $-\infty$. Due to the lower bound $\min_{h \in E_d \setminus B_d} |\Delta\mu_h| > (\Delta_- - \zeta)/C(t)$, which holds uniformly in $d$, we can apply Corollary 3.11 to the first summand on the right-hand side of (7.32), which then gives the required bound for $a_d$ times the maximum. On the set $B_d = E_d$ we can directly apply Corollary 3.11, so that we obtain (4.4).

Proof of Theorem 4.4
Due to $g_{1-\alpha}/a_d + b_d > 0$ we deduce the first inclusion. Using the notation $d_h = \Delta\mu_h^2 - \Delta_h^2$ and assumption (4.7), arguments similar to those in the proofs of Section 3 show that for all $x \in \mathbb R$ the corresponding limit inferior is bounded below, which yields assertion (4.8). For a proof of (4.9) we apply Bonferroni's inequality. By the arguments in the previous paragraph we have $P\big(R_d \not\subseteq R_d(\alpha)\big) = o(1)$, and Theorem 4.1 gives the required bound on the limit superior, which finishes the proof.

Proofs of the results in Section 5
To establish the bootstrap results, we recall the definition of the set $S_d$ in (3.17) and introduce the set $L_d$, which represents the event that the locations of all change points are identified correctly. We need the following basic properties.
Lemma 7.1 If the assumptions of Section 3.1 and Assumption 5.2 hold, then $P(L_d^c) \lesssim n^{-C}$ for a sufficiently small constant $C > 0$.
Proof. For assertion (i), fix $\varepsilon > 0$ and observe the decomposition (7.39). The first two summands on the right-hand side of (7.39) exhibit the same structure, so we only treat the first of them. Note that on the event $L_d$ the corresponding representation holds. Further, there exists a constant $0 < C_0 < 1$ such that the required inequalities involving $n\hat t_h$, $K$ and $n t_h$ hold. This implies that the probability in question is bounded by
$$\sum_{h=1}^d P\Big(\max_{k=1}^n \Big|\sum_{j=1}^k X_{j,h}\Big| > C_0\, n^{1-C}\, \varepsilon/2\Big).$$
Proof of Lemma 5.3. This is a consequence of Lemma 7.1 (i) and a corresponding inequality (valid for any $\varepsilon > 0$) in which the truncation is not conducted within the blocks. We also make use of additional notation for integrals of the form $\int_0^1 (\cdot)_{n,h}(s)\, k(s, \hat t_h)\, ds$.
The theorem's claim is a direct consequence of the next five lemmas. We will use the notation $P_{|\mathcal Z_n}(\cdot) = P(\cdot \mid \mathcal Z_n)$ and frequently apply that the implication $P(A_n) = o(1) \Rightarrow P_{|\mathcal Z_n}(A_n) = o_P(1)$ holds for all sequences of measurable sets $\{A_n\}_{n \in \mathbb N}$.
where the function $\tilde\tau$ is defined in (7.4). Our aim is to control the (conditional) Kolmogorov distance between $\max_{h \in S_d^c} B_{n,h}$ and $\max_{h \in S_d^c} N_h$. Since the random variables $\{\xi_\ell\}_{\ell \in \mathbb N}$ are independent, we can directly calculate the conditional covariance $\mathrm{Cov}_{|\mathcal Z_n}(B_{n,h}, B_{n,i})$ for sufficiently small constants $C, \delta > 0$, where the set $L_d$ is defined in (7.38). Using the lower bound $\sigma_h \sigma_i \tau(t_h)\tau(t_i) \ge \sigma_-^2 \tau_-^2$ yields
$$P\big(\mathcal C(\delta)^c\big) \lesssim n^{-C} \qquad (7.40)$$
for the set $\mathcal C(\delta)$. For a sufficiently small $C > 0$ we have, by Markov's inequality,
$$P\big(P_{|\mathcal Z_n}(L_d^c \cup \mathcal C(\delta)^c) \ge n^{-C}\big) \le n^{C}\, E\big[P_{|\mathcal Z_n}(L_d^c \cup \mathcal C(\delta)^c)\big] = n^{C}\, P\big(L_d^c \cup \mathcal C(\delta)^c\big) = o(1).$$
Assumptions (T2) and (C1) imply $\sigma_h \ge \sigma_-$ and $\tau(t_h)(t_h(1 - t_h))^2 \ge \tau_- t^4$, which yields that the corresponding bound holds conditionally on $\mathcal Z_n$.