Change-point tests under local alternatives for long-range dependent processes

We consider the change-point problem for the marginal distribution of subordinated Gaussian processes that exhibit long-range dependence. The asymptotic distributions of Kolmogorov-Smirnov- and Cram\'{e}r-von Mises type statistics are investigated under local alternatives. By doing so we are able to compute the asymptotic relative efficiency of the mentioned tests and the CUSUM test. In the special case of a mean-shift in Gaussian data it is always $1$. Moreover our theory covers the scenario where the Hermite rank of the underlying process changes. In a small simulation study we show that the theoretical findings carry over to the finite sample performance of the tests.


Introduction
Over the last two decades various authors have studied the change-point problem under long-range dependence, and classical methods are often found to yield different results than under short-range dependence. The CUSUM test is studied in Csörgő and Horváth (1997) and compared to the Wilcoxon change-point test in Dehling et al. (2012). Ling (2007) investigates a Darling-Erdős-type result for a parametric change-point test, and estimators for the time of change are considered in Horváth and Kokoszka (1997) and Hariz et al. (2009). Moreover, the special features of long memory have motivated new procedures. Beran and Terrin (1996) and Horváth and Shao (1999) test for a change in the linear dependence structure of the time series, and Berkes et al. (2006) and Baek and Pipiras (2011) construct tests in order to discriminate between stationary long-memory observations and short-memory sequences with a structural change. For a general overview of the change-point problem under long-range dependence see Kokoszka and Leipus (2001) and the associated chapter in Beran et al. (2013).

* Fakultät für Mathematik, Ruhr-Universität Bochum, 44780 Bochum, Germany. Email address: Johannes.Tewes@rub.de
One of the classical change-point problems is a change in the marginal distribution of a time series {Y_i}_{i≥1}. When testing for at most one change-point (AMOC) in the marginal distribution one often compares the empirical distribution function of the first k observations with that of the remaining observations. Taking a distance between the two empirical distribution functions and then the maximum over all k < n yields a natural statistic. Common distances are the supremum norm, which gives the Kolmogorov-Smirnov statistic
$$T_n = \max_{1\le k<n} \sup_{x\in\mathbb{R}} \Big| \sum_{i=1}^{k} 1_{\{Y_i\le x\}} - \frac{k}{n}\sum_{i=1}^{n} 1_{\{Y_i\le x\}} \Big|, \qquad (1.1)$$
or an $L^2$-distance, which gives the Cramér-von Mises statistic
$$S_n = \sum_{k=1}^{n} \int_{\mathbb{R}} \Big( \sum_{i=1}^{k} 1_{\{Y_i\le x\}} - \frac{k}{n}\sum_{i=1}^{n} 1_{\{Y_i\le x\}} \Big)^2 \, dF_n(x). \qquad (1.2)$$
Both are widely used for goodness-of-fit tests and two-sample problems. In the change-point literature they are considered by Szyszkowicz (1994) for independent data, by Inoue (2001) for strongly mixing sequences and by Giraitis et al. (1996b) for linear long-memory processes.
However, note that in the LRD setting only the Kolmogorov-Smirnov test has been investigated.
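The two statistics are straightforward to compute from data. The following sketch (our own illustration, assuming the unnormalized forms (1.1) and (1.2) with the integral taken against the empirical measure) evaluates the tied-down sequential empirical process at the sample points:

```python
import numpy as np

def changepoint_stats(y):
    """Kolmogorov-Smirnov- and Cramer-von Mises-type change-point statistics
    (unnormalized), built from the tied-down sequential empirical process
    sum_{i<=k} 1{y_i <= x} - (k/n) * sum_{i<=n} 1{y_i <= x},
    evaluated at the sample points x = y_j."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ind = (y[:, None] <= y[None, :]).astype(float)   # ind[i, j] = 1{y_i <= y_j}
    total = ind.sum(axis=0)                          # full-sample counts at x = y_j
    partial = np.cumsum(ind, axis=0)                 # counts over the first k observations
    k = np.arange(1, n + 1)[:, None]
    tied = partial - (k / n) * total[None, :]
    ks = np.abs(tied).max()                          # sup over x, max over k
    cvm = (tied ** 2).mean(axis=1).sum() / n         # mean over x (dF_n), sum over k, / n
    return ks, cvm
```

For LRD data both statistics would additionally be divided by the normalization $d_{n,m}$ discussed in Section 2; the sketch omits this.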
(1.1) and (1.2) are functionals of the sequential empirical process, that is, of $\sum_{i=1}^{\lfloor nt\rfloor} (1_{\{Y_i\le x\}} - F(x))$ for t ∈ [0, 1] and x ∈ R. Thus the asymptotic distributions of T_n and S_n rely on that of the sequential empirical process. For weakly dependent sequences this limit is a Gaussian process; in the special case of independent random variables it is called the Kiefer-Müller process. For stationary sequences that exhibit long-range dependence, Dehling and Taqqu (1989a) proved that the limit process is of the form {J(x)Z(t)}_{t,x}, where J(x) is a deterministic function; the limit process is therefore called semi-degenerate. They considered subordinated Gaussian processes, in detail Y_i = G(X_i) for a measurable function G and a Gaussian sequence X_i with non-summable autocovariance function. A similar limit structure was later obtained independently by Ho and Hsing (1996) and Giraitis et al. (1996a) for long-range dependent moving-average sequences.
It is the main goal of this paper to derive the limit distribution of change-point statistics of the type (1.1) and (1.2) under local alternatives. We then apply these results to derive the asymptotic relative efficiency (ARE) of several change-point tests. To this end we investigate the sequence
$$G(X_1), \ldots, G(X_{k^*}),\; G_n(X_{k^*+1}), \ldots, G_n(X_n). \qquad (1.3)$$
Here G_n is a sequence of functions such that the distribution of G_n(X_1) converges to the distribution of G(X_1) in some suitable way.
Therefore, we are able to analyze various types of change-points, among them a mean-shift. Thus we may compute the ARE of the Kolmogorov-Smirnov, Cramér-von Mises, CUSUM and Wilcoxon tests and obtain the surprising result that in the case of Gaussian data it is always 1.
The mathematically most challenging case is the situation in which the Hermite rank changes. The Hermite rank of the class $\{1_{\{G(\cdot)\le x\}} - F(x)\}_{x\in\mathbb{R}}$ is defined as the smallest positive integer q such that $E[(1_{\{G(X_1)\le x\}} - F(x)) H_q(X_1)] \neq 0$ for some x ∈ R, with H_q being the q-th Hermite polynomial; we denote this rank by m. The structure of the limiting process Z(t), e.g. its marginal distribution and covariance structure, mainly depends on m. However, a special feature of distributional changes in subordinated Gaussian processes is the fact that the Hermite rank may change, too. Hence the question arises which Hermite process will determine the limit distribution. Under a mean-shift the Hermite rank remains unchanged, which follows easily from its definition.
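The Hermite coefficients in this definition can be estimated numerically. The following sketch (our own illustration, not from the paper) uses Monte Carlo with the probabilists' Hermite polynomials to confirm that for the even function G(x) = x^2 the first coefficient vanishes while the second does not, so the class has Hermite rank 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def hermite_coeff(G, q, x, n=200_000):
    """Monte Carlo estimate of J_q(x) = E[1{G(X) <= x} H_q(X)] for X ~ N(0,1),
    with H_q the probabilists' Hermite polynomials H_1(t) = t, H_2(t) = t^2 - 1."""
    t = rng.standard_normal(n)
    H = {1: t, 2: t ** 2 - 1}[q]
    return float(np.mean((G(t) <= x) * H))

G = lambda t: t ** 2        # an even function: all odd coefficients vanish
J1 = hermite_coeff(G, 1, 1.0)
J2 = hermite_coeff(G, 2, 1.0)
# J_1(1) = 0 by symmetry, while J_2(1) != 0, so {1_{G(.) <= x}} has Hermite rank 2
```

(For q ≥ 1 subtracting F(x) inside the expectation changes nothing, since E[H_q(X)] = 0.)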
Our results differ in various ways from those obtained in Giraitis et al. (1996b), where changes in the coefficients of an LRD linear process were investigated. While the empirical process of LDR moving average sequences converges to fractional Brownian motion, we may encounter higher order Hermite processes. The possible change in the Hermite rank is therefore a novel feature in our investigation.
The rest of the paper is organized as follows. In Section 2 we state a limit theorem for the sequential empirical process under change-point alternatives. Moreover, we give the asymptotic distribution of the test statistics under the hypothesis of no change as well as under local alternatives. This enables us to derive the asymptotic relative efficiency of several change-point tests. In Section 2.5 we consider the empirical process for long-range dependent arrays that are stationary within rows. The outcome mainly serves as a device for proving the main results, but it is also of interest in its own right. Section 3 contains the simulation study. To the best of our knowledge there are no results on the finite sample performance of the Cramér-von Mises change-point test under long memory. It is compared to other change-point tests, and the effect of an estimated Hurst coefficient is discussed. We find that the theoretical results (e.g. the asymptotic relative efficiency between the Cramér-von Mises and CUSUM tests) carry over to the finite sample performance of the tests. Finally, proofs are provided in Section 4.

Main results
Let {X_i}_{i≥1} be a stationary, standardized Gaussian process with autocovariance function $r(k) = \operatorname{Cov}(X_1, X_{k+1}) = k^{-D} L(k)$ for 0 < D < 1 and a slowly varying function L. The non-summability of the covariance function is one possible way to define long-range dependence. We state our results for so-called subordinated Gaussian processes, that is, processes of the form Y_i = G(X_i). The key tool in our analysis of possible changes in the marginal distribution of such a process is the sequential empirical process. To obtain weak convergence of this process the right normalization is given by d_{n,m}, defined by
$$d_{n,m}^2 = \operatorname{Var}\Big(\sum_{i=1}^{n} H_m(X_i)\Big) \propto n^{2-mD} L^m(n),$$
where the constant of proportionality is 2m!(1 − mD)^{-1}(2 − mD)^{-1}, see Theorem 3.1 in Taqqu (1975). H = 1 − mD/2 is called the Hurst coefficient. The mentioned result of Dehling and Taqqu (1989a) then reads as follows.
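For constant L the normalization can be evaluated explicitly. The following sketch (our own illustration, assuming L ≡ 1) checks that d_{n,m} grows like n^H with H = 1 − mD/2:

```python
import math

def d_nm(n, m, D):
    """Normalization d_{n,m} for L == 1 (a simplifying assumption of this sketch):
    d_{n,m}^2 = c_m * n^(2 - m*D) with c_m = 2 * m! / ((1 - m*D) * (2 - m*D))."""
    assert 0 < D < 1 / m, "long-range dependence requires 0 < D < 1/m"
    c_m = 2 * math.factorial(m) / ((1 - m * D) * (2 - m * D))
    return math.sqrt(c_m) * n ** (1 - m * D / 2)

# Doubling n scales d_{n,m} by 2^H, with Hurst coefficient H = 1 - m*D/2
m, D = 2, 0.3
H = 1 - m * D / 2
ratio = d_nm(2000, m, D) / d_nm(1000, m, D)
```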
Theorem A (Dehling, Taqqu). Let the class of functions $\{1_{\{G(\cdot)\le x\}} - F(x)\}_{x\in\mathbb{R}}$ have Hermite rank m and let 0 < D < 1/m. Then
$$\frac{1}{d_{n,m}} \sum_{i=1}^{\lfloor nt\rfloor} \big(1_{\{G(X_i)\le x\}} - F(x)\big) \Longrightarrow \frac{J_m(x)}{m!}\, Z_{m,H}(t)$$
in D([0, 1] × [−∞, ∞]), where $J_m(x) = E[1_{\{G(X_1)\le x\}} H_m(X_1)]$ and Z_{m,H} is an m-th order Hermite process. In the following we denote the distribution functions of G(X_i) and G_n(X_i) by F and F^{(n)}, respectively.
To obtain weak convergence of the empirical process of (2.3) we have to make some assumptions on the structure of the change and the Hermite rank.
Assumption A:
A1. The class of functions $\{1_{\{G(\cdot)\le x\}}\}_{x\in\mathbb{R}}$ has Hermite rank m with 0 < D < 1/m.
A2. Let m(n) denote the Hermite rank of $\{1_{\{G_n(\cdot)\le x\}}\}_{x\in\mathbb{R}}$ and set $m^* = \liminf_{n\to\infty} m(n)$.

(i) Assumption A2 implies, see the proof of Lemma 4.5, a corresponding decay of the Hermite coefficients J_{q,n}.
(ii) Moreover, A2 implies convergence of the marginal distribution functions; to see this, note that $n^{(m-m^*)D(1+\delta)/2} = O(1)$. However, the converse is not always true: consider for instance the functions G(x) = x and G_n(x) = G_1(x) = −x, or the situation in Example 2.8. Then again, there are many natural choices of G and G_n for which convergence of the marginal distribution functions (with a certain rate) does imply Assumption A2, among them G_n(x) = G(x) + µ_n (mean-shift) and G_n(x) = σ_n G(x) (change in variance).
(iii) Our assumptions explicitly allow the Hermite rank to change together with the marginal distribution. Then again, the limit behavior seems to be untouched by this change. Intuitively this corresponds to the idea that the change in the distribution and the change in the Hermite coefficients, both caused by the difference of G and G_n, are of the same order. For q < m this forces the function J_{q,n}(x) to converge rather fast to 0. Technically this can be explained through A2. If this assumption is dropped, we may actually encounter limits with multiple Hermite processes; such cases are considered in Example 2.8 and Corollary 2.13.
(iv) If A1 is violated, the sequence {G(X_i)}_{i≥1} is actually short-range dependent. For stationary observations Csörgő and Mielniczuk (1996) showed convergence of the sequential empirical process to a two-parameter Gaussian process. Change-point alternatives have not yet been considered for such random variables, and would require fundamentally different proofs than ours.

Asymptotic behavior of the change-point statistics
We now apply the results concerning empirical processes to determine the asymptotic distributions of the Kolmogorov-Smirnov statistic T_n and of the Cramér-von Mises change-point statistic S_n. To obtain a non-degenerate limit under a sequence of local alternatives it is important to choose the right magnitude of change. For a mean-shift this is naturally the difference of the expectations before and after the change. For a general change we formulate the test problem as follows. We wish to test the hypothesis
H: Assumption A1 holds and G_n(x) = G(x) for all x ∈ R and n ≥ 1,
against the sequence of local alternatives
A_n: Assumption A holds and, for n → ∞,
$$\frac{n}{d_{n,m}}\big(F(x) - F^{(n)}(x)\big) \to g(x) \qquad (2.7)$$
uniformly in x, where g(x) is a measurable function of bounded total variation whose support has positive Lebesgue measure.
Remark 2.3. Note that $n d_{n,m}^{-1} \sim n^{mD/2} L^{-m/2}(n)$. Thus (2.7) implies that $F(x) - F^{(n)}(x)$ is of order $d_{n,m}/n = n^{-mD/2} L^{m/2}(n)$. (ii) Under the sequence of local alternatives A_n the statistics converge, as n → ∞, to the limits given in Theorem 2. The tests can be performed if the right normalization for the empirical process, the supremum of J_m(x) and the distribution of $\sup_{t\in[0,1]} |Z_{m,H}(t)|$ are known. In practical applications this might not be the case. Possible solutions are self-normalization (Shao (2011)), estimation of the Hurst coefficient (see for example Künsch (1987)) and bootstrap estimators for J_m(x) (Tewes (2016)).

Examples
Example 2.4 (Mean-shift). Let G_n(x) = G(x) + µ_n with µ_n ∼ d_n/n; then we obtain the typical change-in-the-mean problem. In the case of long-range dependent subordinated Gaussian processes this was considered in Dehling et al. (2012, 2013), Csörgő and Horváth (1997), Shao (2011) and Betken (2016). Let f_G be the probability density of G(X_1), and assume that it is continuous and of bounded variation. Then we obtain $F(x) - F^{(n)}(x) = F(x) - F(x - \mu_n) \sim \mu_n f_G(x)$, where, due to the continuity of f_G, the convergence holds uniformly.
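The first-order approximation behind the mean-shift example can be checked numerically. The following sketch (our own illustration, with F = Φ the standard normal distribution function) verifies that (F(x) − F(x − µ))/µ is uniformly close to the density for a small shift µ:

```python
import math

def Phi(x):   # standard normal distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi(x):   # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Under a mean shift, F(x) - F^{(n)}(x) = F(x) - F(x - mu_n) ~ mu_n * f_G(x);
# check the first-order approximation for a small shift:
mu = 1e-4
errs = [abs((Phi(x) - Phi(x - mu)) / mu - phi(x)) for x in (-2.0, -0.5, 0.0, 1.0, 3.0)]
```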
Example 2.5 (Change in variance). To describe the change-in-variance problem define G_n(x) = 1/(1 − δ_n) G(x) with δ_n ∼ d_n/n; for ease of notation let δ_n = d_n/n. The derivative of xF(x) is x f_G(x) + F(x), hence (2.8) converges to 0. The convergence is uniform if f_G and F are continuous. (2.9) converges to 0 because of the continuity, monotonicity and boundedness of F. Thus (2.7) holds with the function g(x) = x f_G(x). Assume without loss of generality that we consider the maximum; the minimum can be treated analogously. Hence Assumption A2 follows from convergence of the marginals.
Additionally one might consider a combined change in mean and variance, given through G n (x) = σ n G(x) + µ n . In this case (2.7) holds with g(x) = f G (x)(C 1 + C 2 x).
Example 2.6 (Generalized inverse of a mixture distribution). By using the generalized inverse of a distribution function one can generate subordinated Gaussian processes with any given marginals, see for example Dehling et al. (2013). We use this for the change-point problem as follows. For a continuous distribution function F^*, define F^{(n)} as a mixture of F and F^*, with mixing weight δ_n ∼ d_n n^{-1} on F^*. Then (2.7) holds with g(x) = F^*(x) − F(x), and moreover P(max{G(X_1), G_n(X_1)} ≤ x) = min{F(x), F^{(n)}(x)}. Analogously one has P(min{G(X_1), G_n(X_1)} ≤ x) = max{F(x), F^{(n)}(x)}. Hence Assumption A2 is also satisfied. For strongly mixing data similar local alternatives were considered by Inoue (2001).
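The generalized-inverse construction itself is easy to illustrate. The following sketch (our own, using i.i.d. normals as a stand-in for an LRD Gaussian series and Exp(1) as the hypothetical target marginal) maps Gaussian data to arbitrary marginals via Y = F^{-1}(Φ(X)):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
Phi = np.vectorize(lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2))))

# Y_i = F^{-1}(Phi(X_i)) has marginal distribution F for any Gaussian input;
# here: Exp(1) marginals via the (generalized) inverse F^{-1}(u) = -log(1 - u).
x = rng.standard_normal(100_000)   # i.i.d. stand-in for an LRD Gaussian series
u = Phi(x)                         # uniformly distributed marginals
y = -np.log(1 - u)                 # Exp(1) marginals
```

The dependence structure of {X_i} is preserved in the sense that Y_i remains a subordinated Gaussian process.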

Example 2.7. Let G(x) = x^2 and further let
$$G_n(x) = x^2\big(1_{\{x\ge 0\}} + a_n 1_{\{x<0\}}\big),$$
with Hermite ranks m(n) = 1 for all n ∈ N. If (a_n − 1) ∼ d_{n,2}/n, then one can show (similarly to the case of a variance change in Example 2.5) that (2.7) holds uniformly in x. As Assumption A2 is satisfied, too, we may apply Corollary 2 (ii) with the corresponding function g.
It is caused by the fact that the change in the distribution and the change in the Hermite coefficients, both originating in the difference of the functions G(x) and G n (x), are of the same order.
To get an additional Hermite process in the limit, one would need (a_n − 1) ∼ d_{n,2}/d_{n,1}, see Corollary 2.13 and its proof. But then the change would be of larger order than d_{n,2}/n, and the test would have asymptotic power 1.
To achieve nontrivial asymptotic power one has to consider structural breaks that consist of two aspects, where only one is captured by the marginal distribution. To this end define the transformations for some sequence (a_n)_n with a_n ≠ 1 and a_n → 1. On the one hand, $\{1_{\{G(\cdot)\le x\}}\}_x$ has Hermite rank m = 2 and G(X_i) ∼ N(0, 1). On the other hand, $\{1_{\{G_n(\cdot)\le x\}}\}_x$ has Hermite rank m(n) = 1 for all n ∈ N and G_n(X_i) ∼ N(µ_n, 1). Now let µ_n ∼ d_{n,2}/n; then Example 2.4 applies, and this holds for any sequence (a_n)_n. In contrast, the convergence of the Hermite coefficients is strongly influenced by (a_n)_n. If the sequence is chosen such that (a_n − 1) ∼ d_{n,2}/d_{n,1} (so that it converges more slowly than µ_n), then the sequential empirical process converges towards a limit involving two Hermite processes. This can be proved similarly to Corollary 2.13. Moreover, the Kolmogorov-Smirnov statistic converges weakly to the corresponding functional of this limit. We find this example rather pathological; therefore such situations are excluded from the main results via Assumption A2.

Asymptotic relative efficiency
By studying the asymptotic distributions under local alternatives one may compare different tests in terms of their asymptotic relative efficiency (ARE). Here we give a precise definition of the ARE in the special context of our change-point setting. The general idea is due to Pitman (1948) (for a published account see for example Noether (1950)) and was formalized in Noether (1955). Of course it can be extended to all kinds of testing procedures.
Definition 2.9. Let T_1 and T_2 represent two change-point test procedures. Consider the local alternatives given by (G, G_{n_k}, τ) and sample sizes (n_k)_k. Let β_1 be the asymptotic power of the test T_1 against the local alternatives given by (G, G_{n_k}, τ, (n_k)_k) and β_2 be the asymptotic power of the test T_2 against the local alternatives given by (G, G_{m_k}, τ, (m_k)_k).
If β_1 equals β_2, then the asymptotic relative efficiency (ARE) of the tests T_1 and T_2 is defined as the limit of the ratio of the required sample sizes, $\lim_{k\to\infty} m_k/n_k$, provided this limit exists.

Example 2.10 (Mean-shift in Gaussian data). Consider G(x) = x and G_n(x) = G(x) + µ_n, in other words a mean-shift in Gaussian data. As for the Hermite coefficient function, we obtain $J_1(x) = E[1_{\{X_1\le x\}} X_1] = -\varphi(x)$, where φ is the standard normal probability density. Thus, according to Corollary 2, the test statistic T_n converges towards its limit under the local alternatives, whereas under the null hypothesis, that is, for a stationary standard Gaussian sequence, the limit distribution is the one without the drift term. For the Cramér-von Mises statistic we obtain analogously the limit distributions under local alternative and hypothesis, respectively. Hence in this special case the CUSUM test, the Wilcoxon test (see Dehling et al. (2013)) and our two tests attain the same asymptotic power, where q_{1−α,H} is the (1−α)-quantile of the maximum of a fractional Brownian bridge, $\sup_{t\in[0,1]} |B_H(t)|$. As a direct consequence, one gets that the ARE of the four tests is 1. This result is quite surprising, keeping in mind that the CUSUM and Wilcoxon tests are designed to detect level-shifts, while our tests have power against all kinds of distributional changes.
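The closed form of the Hermite coefficient in the Gaussian mean-shift example can be verified directly, since $\int_{-\infty}^{x} t\varphi(t)\,dt = -\varphi(x)$. The following sketch (our own numerical check) confirms this identity by quadrature:

```python
import math

def phi(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def J1(x, lo=-10.0, n=20_000):
    """Trapezoidal approximation of J_1(x) = int_{-inf}^{x} t * phi(t) dt
    (truncating the integral at lo, where the tail is negligible)."""
    h = (x - lo) / n
    vals = [(lo + i * h) * phi(lo + i * h) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# closed form: J_1(x) = -phi(x)
errs = [abs(J1(x) + phi(x)) for x in (-1.5, 0.0, 0.7, 2.0)]
```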
For non-Gaussian data and change-points beyond a simple mean-shift, the investigation of the ARE is not that straightforward. In fact, little is known about the distribution of the relevant limit functionals, and even less if higher order Hermite processes are involved. This seems to prevent a precise computation of the ARE in many cases. However, one can derive lower bounds for the efficiency, as we do in the next example for a combined change in mean and variance. Unlike in the previous example, we will make use of the subtle definition of the ARE.
The asymptotic distribution of the CUSUM test has been derived in Dehling et al. (2013), but only in the case of a mean-shift with constant variance. However, for $E[G^2(X_i)] < \infty$ the CUSUM statistic is a continuous functional of the sequential empirical process. Thus, we may apply Theorem 1 and conclude that under this type of local alternatives the CUSUM statistic converges to the same limit as under a mean-shift with constant variance; consequently the asymptotic power is the same as in Example 2.10.
The limiting distribution of the Cramér-von Mises statistic and an expression for its asymptotic power follow analogously. First assume $\sup_t \{B_H(t)\} \le q = q_{1-\alpha,H}$ and consider $C_1^*$. Combining these two findings with (2.11), we can bound the asymptotic power from below for $C_1^* > C_1$. Now we are ready to compute the ARE. To this end we choose different sample sizes for the two tests.
In detail, we take (n_k)_k for the Cramér-von Mises test and (m_k)_k for the CUSUM test. For the CUSUM test, in order to achieve at least asymptotic power β, its limit distribution has to satisfy the corresponding quantile condition, where π^{-1} denotes the generalized inverse; therefore the sample size of the CUSUM test has to be chosen accordingly. We also obtain (as µ_k and (1 − 1/σ_k) are of the same order) a lower bound with some constant $C_2^* > 0$. Next we select the sample size of the Cramér-von Mises test such that its asymptotic power can be bounded from below as in (2.12) (and therefore by β). To this end choose C_1 via the function f, which is monotone increasing, surjective and continuous (as the minimum is attained either in κ_1 or in κ_2); therefore such a C_1 can always be found. By construction of the function f it follows that $C_1 < C_1^*$. Now let the sample size of the Cramér-von Mises test satisfy (2.14). So both tests have (at least) asymptotic power β against the local alternatives (G, G_k, τ). Finally, by construction of the sample sizes and the definition of slowly varying functions, the ratio of the sample sizes converges. In other words, the Cramér-von Mises test is asymptotically more efficient, no matter how small the additional variance change is.

The empirical process of triangular arrays
Since the work of Dehling and Taqqu (1989a,b), uniform reduction principles have been the main tool in the analysis of empirical processes of long-range dependent data. More precisely, the empirical process is approximated by the first term of its Hermite expansion (if the underlying process is not Gaussian, other expansions are available). However, most results concern stationary sequences. When considering G(X_1), . . . , G(X_{⌊nτ⌋}), G_n(X_{⌊nτ⌋+1}), . . . , G_n(X_n), the empirical process of the first ⌊nτ⌋ random variables can be approximated just as in Dehling and Taqqu (1989a). In contrast, the Hermite expansion of $1_{\{G_n(\cdot)\le x\}}$ depends on n. Two difficulties arise. First, m^* might be smaller than m, the Hermite rank of $\{1_{\{G(\cdot)\le x\}}\}_{x\in\mathbb{R}}$.
Secondly, the coefficients J_{q,n}(x) depend on n and might converge uniformly to 0. Thus, it is a priori not clear which term of the Hermite expansion is asymptotically dominant, or whether there is more than one. The next result is a reduction principle that lays emphasis on these aspects. We will make use of it in the proof of Theorem 1, but it is also of interest in its own right.
Theorem 3. Let {G_n}_n be a sequence of measurable functions and let m(n) be the sequence of Hermite ranks of $\{1_{\{G_n(\cdot)\le x\}}\}_{x\in\mathbb{R}}$. Then, for any m ∈ N with m(n) ≤ m < 1/D for n ≥ n_0, the reduction principle below holds, where C and κ do not depend on n.
Remark 2.12. (i) Theorem 3 contains the reduction principle of Dehling and Taqqu (1989a) as a special case (G n (x) = G(x) and m(n) = m).
(ii) Thus, one might expect $d_{n,m^*}^{-1}$ as normalization. The weaker normalization $d_{n,m}^{-1}$ is nevertheless possible, since the empirical process is approximated by additional terms of the Hermite expansion, in detail those up to order m.
(iii) A similar result is given by Wu (2003), who considers linear long-memory processes and even shows convergence with respect to a weighted supremum metric. Then again, he considers only the standard empirical process, while we also treat the sequential version. Moreover, we consider triangular arrays, which Wu (2003) does not.

Corollary 2.13. If $d_{n,m}/d_{n,q}\, J_{q,n}(x) \to q!\, h_q(x)$ uniformly in x, then the limit involves the processes $(Z_{q,H}(t))_{t\in[0,1]}$, which are uncorrelated q-th order Hermite processes.
Remark 2.14. (i) Comparing the limit process of Corollary 2.13 to that of Theorem 1, it is apparent that in the former multiple Hermite processes are involved. This is not the case in Theorem 1. The reason is Assumption A2, which causes the Hermite coefficients J_{m,n}(x) to converge rather fast.
(ii) The Hermite processes occurring in the limit are dependent, see Proposition 1 in Bai and Taqqu (2013).
Remark 2.15. In view of the proof of Corollary 2.13 it is important to note that the functions h_q are uniform limits of the càdlàg functions J_{m,n}(x) and hence elements of D[−∞, ∞]. As a consequence they are also bounded (Pollard (1984)).
Example 2.16. There are indeed sequences of functions {G_n}_n that satisfy the conditions of Corollary 2.13. Consider again the functions from Example 2.7, namely $G_n(x) = x^2(1_{\{x\ge 0\}} + a_n 1_{\{x<0\}})$ with a_n → 1 and a_n ≠ 1. Thus, we are in the situation of Theorem 3 with m(n) = 1 for all n ∈ N. If in addition $(a_n - 1) \sim n^{-D/2} L^{1/2}(n) \sim d_{n,2}/d_{n,1}$, then the conditions of Corollary 2.13 hold with m = 2, m^* = 1 and a constant C depending on D only.

Simulation study

The goal of this simulation study is to examine whether the theoretical, asymptotic results carry over to the finite sample performance of the tests. We consider samples of size 50 to 400. For such sample sizes the approximation of the empirical process by its semi-degenerate limit process is quite inaccurate. The empirical size of the Cramér-von Mises test would therefore be much larger than the nominal size if critical values were deduced from the asymptotic distribution. Instead we simulate J = 1000 Gaussian time series X_{j,1}, . . . , X_{j,n}, j = 1, . . . , J, with Hurst coefficient H; in the simulation study we use fractional Gaussian noise for these series. Subsequently, a Cramér-von Mises statistic is calculated for each of the J = 1000 Gaussian series, and its empirical quantiles serve as critical values. This presumes that the data are monotone transformations of Gaussian data. We note that this is a strong assumption and that an accurate approximation of the empirical process for general long-range dependent data is an issue for future research. The CUSUM statistic is not invariant under monotone transformations; therefore, the Wilcoxon change-point test is considered additionally.
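The Monte Carlo calibration just described can be sketched as follows. As simplifying assumptions of this sketch (not the paper's exact procedure), i.i.d. standard normal series stand in for fractional Gaussian noise, and the statistic is the unnormalized CvM-type statistic:

```python
import numpy as np

rng = np.random.default_rng(2)

def cvm_stat(y):
    """Unnormalized CvM-type change-point statistic: squared tied-down
    sequential empirical process, averaged against the empirical measure."""
    n = len(y)
    ind = (y[:, None] <= y[None, :]).astype(float)
    k = np.arange(1, n + 1)[:, None]
    tied = np.cumsum(ind, axis=0) - (k / n) * ind.sum(axis=0)[None, :]
    return float((tied ** 2).mean(axis=1).sum() / n)

# Empirical critical value from J simulated stationary series
J, n = 1000, 100
stats = np.array([cvm_stat(rng.standard_normal(n)) for _ in range(J)])
crit = float(np.quantile(stats, 0.95))   # 5%-level critical value
```

The test rejects whenever the statistic computed from the data exceeds `crit`.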
In the first part of the simulation study we treat realizations of a Gaussian process X_1, . . . , X_n given by fractional Gaussian noise. For the implementation we have used the function fgnSim from the R package fArma. Subsequently a change is added via $Y_i = X_i + \mu 1_{\{i>\lfloor n\tau\rfloor\}}$, and the three mentioned change-point tests are applied to Y_1, . . . , Y_n. For early changes (after 20% of the observations) the CUSUM test is slightly more accurate than the other tests. Depending on the sample size and the strength of dependence, either the Cramér-von Mises or the Wilcoxon test is second best.
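Outside of R, fractional Gaussian noise can be simulated exactly for moderate n via a Cholesky factorization of its covariance matrix. The following sketch (our own alternative to fgnSim; the series length, H, τ and µ are arbitrary illustration values) generates fGn and adds a mean shift, together with the CUSUM statistic:

```python
import numpy as np

rng = np.random.default_rng(3)

def fgn_cov(k, H):
    """Autocovariance of fractional Gaussian noise at lag k."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + abs(k - 1) ** (2 * H))

def fgn(n, H):
    """Exact fGn simulation via Cholesky factorization of the covariance
    matrix; O(n^3), but adequate for the sample sizes considered here."""
    cov = np.array([[fgn_cov(i - j, H) for j in range(n)] for i in range(n)])
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def cusum(y):
    """Unnormalized CUSUM change-point statistic."""
    n = len(y)
    s = np.cumsum(y)
    return float(np.abs(s - np.arange(1, n + 1) / n * s[-1]).max())

n, H, tau, mu = 200, 0.7, 0.5, 1.0
x = fgn(n, H)
y = x + mu * (np.arange(n) >= int(n * tau))   # mean shift after proportion tau
```

For H = 0.5 the autocovariance vanishes at all positive lags, so the sketch reduces to white noise, which serves as a sanity check.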

Unknown Hurst coefficient
In applications the true Hurst coefficient H is unknown, and in the following we consider two different estimators. The first is the local Whittle estimator (denoted by Ĥ) with bandwidth parameter m = ⌊n^{2/3}⌋, see Künsch (1987). However, if there is actually a change in the data, the local Whittle estimator is known to be biased. For the second estimator we therefore divide the observations into two subsamples X_1, . . . , X_k̂ and X_{k̂+1}, . . . , X_n, where k̂ is the estimated change-point location. Consistency of this estimator was shown in Hariz et al. (2009); Horváth and Kokoszka (1997) verified consistency for the analogous CUSUM-based estimator. The resulting empirical sizes are comparable to the case where H was assumed to be known. Then again, the probability of a false rejection is higher than α = 0.05, so the test is quite liberal; see Table 2.

In terms of empirical power the Cramér-von Mises and Wilcoxon tests give similar results, with the CUSUM test being slightly ahead for τ = 1/2 and clearly advantageous for early changes τ = 1/5 (see Table 3).
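The local Whittle estimator mentioned above admits a compact implementation. The following sketch (our own, using a simple grid search over H instead of numerical optimization) minimizes Robinson's local Whittle objective over the first m Fourier frequencies:

```python
import numpy as np

def local_whittle_H(x, m=None):
    """Local Whittle estimate of the Hurst coefficient H (grid search):
    minimizes R(H) = log( mean_j lam_j^(2H-1) I(lam_j) )
                     - (2H-1) * mean_j log(lam_j)
    over the first m Fourier frequencies lam_j = 2*pi*j/n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(n ** (2 / 3)) if m is None else m
    j = np.arange(1, m + 1)
    lam = 2 * np.pi * j / n
    I = np.abs(np.fft.fft(x)[1 : m + 1]) ** 2 / (2 * np.pi * n)  # periodogram
    grid = np.linspace(0.05, 0.95, 181)
    R = [np.log(np.mean(lam ** (2 * H - 1) * I)) - (2 * H - 1) * np.log(lam).mean()
         for H in grid]
    return float(grid[int(np.argmin(R))])
```

For white noise (H = 0.5, flat spectrum) the objective is minimized near H = 0.5, which gives a quick correctness check.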
We have to keep in mind that the CUSUM and Wilcoxon tests are designed to detect changes in the mean. In contrast, the Cramér-von Mises test is a so-called omnibus test and has power against arbitrary changes in the marginal distribution. Therefore, we consider another situation, with the mean-shift now accompanied by a small change in the variance, corresponding to a situation in which mean, variance, skewness and the Hermite rank change (see Example 2.8).

FARIMA(0, d, 0) processes
For Gaussian long-memory processes beyond fractional Gaussian noise, not only the Hurst coefficient determines the normalization. Instead it is given by $d_n = n^H L^{1/2}(n)\,(H(2H-1))^{-1/2}$, see (2.1). In this study we assume that, as n → ∞, L(n) → C, which is quite common in the literature. For fractional Gaussian noise, C = H(2H − 1), so the two factors just cancel out.
In general the constant C is given through the limit of the slowly varying part L(k) as k → ∞. We suggest a (quite heuristic) estimator Ĉ for C, with Ĥ being one of the two estimators from above. Finally, we use the normalization obtained by plugging Ĥ and Ĉ into the formula above. The estimator Ĉ in (3.1) is only defined under long memory, that is, for H > 0.5. Therefore, we modify both estimators by considering max(Ĥ, 0.501) instead of Ĥ.
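The identity C = H(2H − 1) for fractional Gaussian noise can be checked directly from its autocovariance. The following sketch (our own numerical check, with H = 0.7 as an illustration value) verifies that the slowly varying part $L(k) = r(k)\,k^{2-2H}$ approaches this constant:

```python
# For fractional Gaussian noise, r(k) = 0.5*((k+1)^(2H) - 2k^(2H) + (k-1)^(2H))
# behaves like H*(2H-1)*k^(2H-2), i.e. L(k) = r(k) * k^(2-2H) -> C = H*(2H-1).
H = 0.7
r = lambda k: 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + (k - 1) ** (2 * H))
C = H * (2 * H - 1)
L_vals = [r(k) * k ** (2 - 2 * H) for k in (10, 100, 10_000)]
```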
The effect of this modification on short memory processes will be seen in the next section.
However, for FARIMA(0, d, 0) sequences it seems to work quite well.

Short-range dependent effects
Finally, we have considered deviations from purely LRD sequences by simulating FARIMA(1, d, 0) time series and short-memory AR(1) processes.
First, we have applied the tests to FARIMA(0, d, 1) sequences, which are still long-range dependent. Table 7 indicates that the empirical power of the Cramér-von Mises test is smaller than in the case of FARIMA(0, d, 0) processes. However, the test works well in principle, meaning that the power increases with the number of observations while the empirical size stays close to the nominal size. For the CUSUM and Wilcoxon tests this does not seem to be the case. For the (purely short-range dependent) AR(1) processes we make two observations. First, due to the assumption of LRD (H > 0.5) the normalization is too strong and the statistics converge to 0, at least under stationarity. Secondly, if the structural change is big enough, the tests may still detect the change (see Table 8); however, there is a certain loss in power. Again, this matches the theoretical fact that under short memory the tests show a different asymptotic behavior.

Proof of Theorem 3 and Corollary 2.13
Our goal is to approximate the sequential empirical process by a linear combination of partial sum processes. The indicator function $1_{\{G_n(X_j)\le x\}}$ has the Hermite expansion
$$1_{\{G_n(X_j)\le x\}} = F^{(n)}(x) + \sum_{q=m(n)}^{\infty} \frac{J_{q,n}(x)}{q!}\, H_q(X_j).$$
Now let L_{m,n,j}(x) be the Hermite expansion truncated at m, in detail
$$L_{m,n,j}(x) = F^{(n)}(x) + \sum_{q=m(n)}^{m} \frac{J_{q,n}(x)}{q!}\, H_q(X_j). \qquad (4.1)$$
Let m(n) be the Hermite rank of $(1_{\{G_n(X_j)\le x\}})_x$. Then we have by the conditions of Theorem 3 that m^* ≤ m(n) ≤ m for some m^* ≤ m < 1/D. Finally, let S_n(k; x, y) = S_n(k; y) − S_n(k; x), L_{m,n,j}(x, y) = L_{m,n,j}(y) − L_{m,n,j}(x) and J_{n,q}(x, y) = J_{n,q}(y) − J_{n,q}(x).
We will make use of the chaining technique of Dehling and Taqqu (1989a). To this end, define Λ_n and observe that J_{q,n}(x, y)/q! is bounded by Λ_n(x, y) = Λ_n(y) − Λ_n(x) for all n ∈ N and all q = 0, . . . , m. Furthermore, Λ_n is monotone with Λ_n(−∞) = 0. Define partitions, similarly to Dehling and Taqqu (1989a) but now depending on n, for k = 0, . . . , K, with the integer K chosen below. Note that the right-hand side of (4.2) does not depend on n.
Based on these partitions we can define chaining points i k (x) by for each x and each k ∈ {0, 1, . . . , K}, see Dehling and Taqqu (1989a).
Lemma 4.1. Define the chaining points as above. Suppose the following two conditions hold: (i) There are constants γ > 0 and C > 0, not depending on n, such that for all k ≤ n E|S n (k; x, y)| 2 ≤ C k n n −γ F (n) (x, y).
(ii) For all ǫ > 0 and all n ∈ N there is a real number K = K(n, ǫ) such that the bound below holds for all λ > 0. Then there is a constant ρ > 0 such that for all n ∈ N and all ǫ > 0 the following holds.

Proof. By the definition of the chaining points, each point x is linked to −∞. The last summand on the right-hand side of (4.3) can be treated as in (4.4). By (4.3) and (4.4) we get, using $\sum_{k=1}^{\infty} (k+2)^{-2} < 1/2$, the corresponding bound. Further, by condition (i) of Lemma 4.1 and the Markov inequality, we obtain (4.8). The constant C in (4.8) is the constant from condition (i) of Lemma 4.1 and thus independent of n. In the next line this C is multiplied by Λ_n(+∞), which is itself a constant. Thus the C in the inequality above is a universal constant not depending on n; the same is true for γ.
Using the same arguments we moreover get a bound for (4.6). Finally, condition (ii) of Lemma 4.1 yields a bound valid for all λ > 0. Combining the estimates for (4.5), (4.6) and (4.7), we arrive at the assertion, which finishes the proof.
Lemma 4.2. There exist constants γ and C, not depending on n, such that for all k ≤ n E|S n (k; x, y)| 2 ≤ C k n n −γ F (n) (x, y).
The proof is very close to that of Lemma 3.1 in Dehling and Taqqu (1989a). However, for further results it is crucial that C and γ depend on the function G_n only indirectly, namely through the Hermite rank. Thus we give a detailed proof to highlight this fact.
Proof. First, consider the Hermite expansion. Secondly, by the orthogonality of the H_q(X_i) and $E[H_q^2(X_1)] = q!$, this yields the stated bound. Note that the second factor of the product in the last line may depend indirectly on the function G_n, because G_n determines m; however, this is the only influence. For different combinations of m and D the term $\sum_{i,j\le k} |r(i-j)|^{m+1}$ might have a different asymptotic order. However, in all cases we obtain the required estimate (see page 1777 in Dehling and Taqqu (1989a)). The result then follows because L and L_1 are slowly varying.
Lemma 4.3. Let n ∈ N and ǫ > 0. Define the chaining points and L_{m,n,j}(x) as in (4.1). Then there is a constant C > 0 such that for all λ > 0 the following holds.

Proof. By the construction of the chaining points the corresponding bound holds for q = 0, . . . , m. Therefore, Markov's inequality yields, for q = m^*, . . . , m, the stated estimate. For q = 0 the term is deterministic, thus the probability is 0. Note that $(K+3)^5 \le C\epsilon^{-1} n^{\delta}$ for any δ > 0, see Dehling and Taqqu (1989a), page 1781. By this fact and by virtue of Lemma 4.1, we obtain the bound with ρ = min(γ − δ, m^*D − λ). Now choose δ < γ; then ρ > 0, and we have thus proven a reduction principle in x. It remains to verify uniformity in l. For n = 2^r one gets, by the same arguments as in the proof of Theorem 3.1 in Dehling and Taqqu (1989a), the corresponding maximal inequality for any 0 < ǫ ≤ 1 and universal constants C and ρ. Next consider arbitrary n and define, for l ≤ 2^r,
$$\sum_{j=1}^{l} \big(1_{\{G_n(X_j)\le x\}} - L_{m,n,j}(x)\big),$$
where {G_n(X_j)}_{n∈N, j≤2^r} is a (slightly modified) array. The last step holds since $d_{2^r,m}/d_{n,m}$ is uniformly bounded away from 0 and ∞. Thus, Theorem 3 is proven.
Proof of Corollary 2.13. Using the reduction principle, namely Theorem 3, it remains to show that the remaining partial-sum processes converge to the desired limit processes, where convergence takes place in $D([0, 1] \times [-\infty, \infty])$, equipped with the supremum norm. The result then follows from the uniform convergence of $d_{n,m}/d_{n,q}\, J_{q,n}(x)$ towards $q!\, h_q(x)$, the reduction principle and Slutsky's theorem.

Proof of Theorem 1 and Theorem 2
We start by proving a reduction principle for the empirical process in the presence of a change-point. Consider the array {Y_{n,i}}_{n∈N, i≤n} defined in Section 2, and let H_{n,i}(x) = P(Y_{n,i} ≤ x).

Lemma 4.4. Let the conditions of Theorem 1 hold. Then there are constants C ≥ 0 and κ > 0 such that the bound below holds for all n ∈ N and all ǫ > 0.

Proof. By Theorem 3 we have the estimate (4.11). Using (4.11) several times, we therefore obtain the assertion for all n ∈ N and all ǫ > 0, with the bound vanishing as n → ∞.
Proof of Theorem 1. By the definition of the functions J_{q,n,i} we obtain a decomposition into three summands. The second and third summands are negligible due to the uniform convergence of the functions J_{q,n} (see Lemma 4.5). The first summand converges in distribution towards $\frac{J_m(x)}{m!}\, Z_m(t)$, see Dehling and Taqqu (1989a). Together with Lemma 4.4 this finishes the proof.
Proof of Theorem 2. We give the proof for a sequence of local alternatives; the asymptotic behavior under the hypothesis is then an immediate consequence. Consider the decomposition (4.13). By the uniform convergence of $\frac{n}{d_{n,m}}(F(x) - F^{(n)}(x))$ and ψ_{n,τ}(t) towards g(x) and ψ_τ(t), respectively, Theorem 1 and the continuous mapping theorem, one gets that (4.13) converges weakly towards $\frac{J_m(x)}{m!}\,\big(Z_m(t) - t Z_m(1)\big) + \psi_\tau(t)\, g(x)$.
The convergence of the Kolmogorov-Smirnov type statistic then follows from the continuity of the supremum norm. The Cramér-von Mises statistic S_n can be written as the sum of the terms (4.14) and (4.15). Due to the convergence of (4.13) and the continuous mapping theorem, (4.14) converges to the desired limit process. Thus, it remains to show that (4.15) is negligible. To this end, decompose (4.15) as I_n − II_n + III_n.
As a consequence of Theorem 1 and F^{(n)}(x) → F(x) one gets a weak Glivenko-Cantelli type convergence. Moreover, observe that J_m(x) is of bounded variation (this was also noted in Dehling and Taqqu (1989a)): let [a, b] be an arbitrary interval and $\{x_i\}_{i=0}^{n}$ a partition of this interval; then the variation over the partition can be bounded uniformly. By the boundedness of J_m, $J_m^2$ is also of bounded variation, and thus integration by parts, together with the weak Glivenko-Cantelli-type result, yields
$$I_n = -\big(Z_m(t)\big)^2/(m!)^2 \int_{\mathbb{R}} \big(F_n(x) - F(x)\big)\, dJ_m^2(x) \xrightarrow{P} 0.$$
By definition, the function g(x) is bounded and of bounded variation. Hence the same is true for g^2(x), and by the same arguments as above one gets III_n = o_P(1). Finally, II_n = o_P(1), which can be seen using Hölder's inequality. This finishes the proof.
Remark 4.6. Note that our proof of the weak convergence of the Cramér-von Mises statistic would not work for short-range dependent time series. The reason is the completely different limit behavior of the sequential empirical process: instead of the semi-degenerate process J_m(x)Z_m(t) one gets a Gaussian process K(t, x). While J_m is of bounded variation, this is not the case for the sample paths of K. Hence $\int_{\mathbb{R}} K(t, x)\, d(F_n(x) - F(x))$ cannot be treated in the same way as $\int_{\mathbb{R}} J_m(x)\, d(F_n(x) - F(x))$.