Power of Change-Point Tests for Long-Range Dependent Data

We investigate the power of the CUSUM test and the Wilcoxon change-point test for a shift in the mean of a process with long-range dependent noise. We derive analytic formulas for the power of these tests under local alternatives. These results enable us to calculate the asymptotic relative efficiency (ARE) of the CUSUM test and the Wilcoxon change-point test. We obtain the surprising result that for Gaussian data, the ARE of these two tests equals 1, in contrast to the case of i.i.d. noise when the ARE is known to be $3/\pi$.


Introduction
Statistical tests for the presence of changes in the structure of time series are of great importance in a wide range of scientific discussions, e.g. regarding economic, technological and climate data. Many procedures for detecting changes and for estimating change-points have been proposed in the literature; see e.g. Csörgő and Horváth (1997) for a detailed exposition. In the case of independent data, the theory is quite satisfactory: for various types of change-point models, statistical procedures have been proposed and their properties investigated. The situation is different for dependent data, such as those encountered in time series models. For dependent data, most research has focused on linear procedures, such as cumulative sum (CUSUM) tests, and there are many open problems when it comes to other types of test procedures, e.g. those used in robust statistics.
In the present paper, we study the change-point problem for Gaussian subordinated long-range dependent data. Specifically, we will test the hypothesis that the process is stationary against the alternative that there is a change in the mean. The classical test statistic for this problem is the CUSUM statistic (1.1). When this test statistic is large, one infers that there is a change in the mean. The CUSUM test has good properties when the underlying process is Gaussian. The asymptotic distribution of the CUSUM test in the presence of long-range dependent data has been investigated by Horváth and Kokoszka (1997). Ben Hariz and Wylie (2005) have studied the rate of convergence of a change-point estimator based on the CUSUM test. However, the CUSUM test is not robust against possible outliers in the data, because the sum $\sum_{i=1}^{k} X_i$ can change drastically when there are outliers. Recently, Dehling, Rooch and Taqqu (2013) have proposed a robust alternative to the CUSUM test, which is based on the Wilcoxon two-sample rank statistic. The corresponding "Wilcoxon change-point test" uses the test statistic (1.2). One rejects the null hypothesis when this test statistic is large. Motivation for the centering constant 1/2 is provided in Remark 3.2. Rank tests for change-point problems have been studied earlier by Antoch et al. (2008), in the presence of i.i.d. data, and by Wang (2008) for linear processes.
Ben Hariz, Wylie and Zhang (2007) have studied optimal rates of convergence for a wide class of nonparametric change-point estimators.
In their paper, Dehling, Rooch and Taqqu (2013) investigated the asymptotic distribution of the Wilcoxon change-point test under the null hypothesis of no change, in the presence of long-range dependence. Moreover, they performed a simulation study to compare the finite sample performance and the power of the CUSUM test based on (1.1) and the Wilcoxon change-point test based on (1.2). In the present paper, we study the power of the CUSUM test and the Wilcoxon change-point test for a shift in the mean of a long-range dependent process. We will calculate the power under local alternatives, where the height of the shift decreases with the sample size n in such a way that the tests have non-trivial limit power as n → ∞. These results enable us to compute the asymptotic relative efficiency (ARE) of the CUSUM and the Wilcoxon change-point tests, which is defined as the limit of the ratio of the sample sizes required to obtain a given power. We obtain the surprising result that the ARE of these two tests equals 1 in the case of long-range dependent Gaussian data. This is in contrast with the case of i.i.d. and short-range dependent data, where the ARE of the Wilcoxon change-point test with respect to the CUSUM test is 3/π. In the context of M-estimation of a location parameter, a similar phenomenon has been observed by Beran (1991); see also Beran (1994), Corollary 8.1.
We consider a model where the observations are generated by a stochastic process $(X_i)_{i\ge 1}$ of the type $X_i = \mu_i + \epsilon_i$, (1.3) where $(\epsilon_i)_{i\ge 1}$ is a long-range dependent stationary process with mean zero and finite variance, and where $(\mu_i)_{i\ge 1}$ are the unknown means. We focus on the case when $(\epsilon_i)_{i\ge 1}$ is an instantaneous functional of a stationary Gaussian process $(\xi_i)_{i\ge 1}$ with non-summable covariances, i.e. $\epsilon_i = G(\xi_i)$.
We assume that $(\xi_i)_{i\ge 1}$ is a long-range dependent (LRD), mean-zero Gaussian process with variance $E(\xi_i^2) = 1$ and autocovariance function $\rho(k) = k^{-D} L(k)$, k ≥ 1, (1.4) where 0 < D < 1, and where L(k) is a slowly varying function. Moreover, G : R → R is a measurable function satisfying $E(G(\xi_i)) = 0$. Based on observations $X_1, \ldots, X_n$, we wish to test the hypothesis H: $\mu_1 = \cdots = \mu_n$ that there is no change in the means of the data against the alternative A: $\mu_1 = \cdots = \mu_k \ne \mu_{k+1} = \cdots = \mu_n$ for some $k \in \{1, \ldots, n-1\}$. We shall refer to this test problem as (H, A).
Dehling, Rooch and Taqqu (2013) have studied two tests for this change-point problem, namely the CUSUM test, which uses the test statistic $D_n$ in (1.6), and the Wilcoxon change-point test, which is based on the test statistic $W_n$ in (1.7). Both statistics are normalized by a sequence $d_n$ that is defined through $H_m$, where $H_m$ is the m-th order Hermite polynomial, and where m is the Hermite rank of G, respectively of the class of functions $1_{\{G(\xi)\le x\}}$; see below for details. Observe that the normalization $d_n$, which will be specified below, is the same for both tests. These tests are similar in spirit: they compare the first part of the sample to the second part. The Wilcoxon change-point test (1.7) involves the ranks of the data, whereas the CUSUM test (1.6) involves their values. One rejects the null hypothesis of no change when these test statistics are large.
Dehling, Rooch and Taqqu (2013) investigated the asymptotic distribution of these test statistics under the null hypothesis H of no change in the means. In addition, they calculated the power of these tests numerically via a Monte-Carlo simulation. In this paper, we will compute the power of the above test statistics under a local alternative. More specifically, we shall consider the following sequence of alternatives $A_{\tau,h_n}(n)$: $\mu_i = \mu$ for $i = 1, \ldots, [n\tau]$ and $\mu_i = \mu + h_n$ for $i = [n\tau]+1, \ldots, n$, (1.8) where 0 ≤ τ ≤ 1. Observe that the mean shift $h_n$ depends on the sample size n.
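To make the two statistics concrete, the following Python sketch computes unnormalized versions of a CUSUM-type and a Wilcoxon-type change-point statistic. It is an illustration under stated assumptions, not the authors' code: the forms max_k |∑_{i≤k} X_i − (k/n) ∑_{i≤n} X_i| and max_k |∑_{i≤k} ∑_{j>k} (1{X_i ≤ X_j} − 1/2)| are assumed, the normalization d_n is omitted, the noise is i.i.d. Gaussian rather than long-range dependent, and the function names and the deliberately large shift are hypothetical choices.

```python
import random


def cusum_statistic(x):
    """Return (max_k |S_k - (k/n) S_n|, argmax k); the normalization d_n is omitted."""
    n = len(x)
    total = sum(x)
    best_value, best_k, partial = -1.0, 0, 0.0
    for k in range(1, n):
        partial += x[k - 1]
        value = abs(partial - k * total / n)
        if value > best_value:
            best_value, best_k = value, k
    return best_value, best_k


def wilcoxon_statistic(x):
    """Return (max_k |sum_{i<=k} sum_{j>k} (1{x_i <= x_j} - 1/2)|, argmax k)."""
    n = len(x)
    best_value, best_k = -1.0, 0
    for k in range(1, n):
        s = 0.0
        for i in range(k):
            for j in range(k, n):
                s += (1.0 if x[i] <= x[j] else 0.0) - 0.5
        if abs(s) > best_value:
            best_value, best_k = abs(s), k
    return best_value, best_k


# Illustrative data (assumption): n = 60, mean shift of height 10 after k = 30.
rng = random.Random(0)
n, change_point, shift = 60, 30, 10.0
sample = [rng.gauss(0.0, 1.0) + (shift if i >= change_point else 0.0)
          for i in range(n)]

cusum_value, cusum_k = cusum_statistic(sample)
wilcoxon_value, wilcoxon_k = wilcoxon_statistic(sample)
print("CUSUM argmax:", cusum_k, "Wilcoxon argmax:", wilcoxon_k)
```

The triple loop in the Wilcoxon-type statistic is kept for readability; it is only practical for small samples.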

Power of the CUSUM test under local alternatives
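We will first investigate the asymptotic distribution of the process $(D_n(\lambda))_{0\le\lambda\le 1}$ defined in (2.1). To do so, we consider the Hermite expansion of $G(\xi_i)$, namely $G(\xi_i) = \sum_{q=1}^{\infty} \frac{a_q}{q!} H_q(\xi_i)$, where $H_q$ is the q-th order Hermite polynomial, and where $a_q = E(G(\xi_i) H_q(\xi_i))$. (2.2) We define the Hermite rank of the function G as $m = \min\{q \ge 1 : a_q \ne 0\}$, and introduce the normalization constants $d_n^2 = \mathrm{Var}\big(\sum_{j=1}^{n} H_m(\xi_j)\big)$. (2.3) We suppose 0 < D < 1/m, in which case $d_n^2 \sim c_m\, n^{2-Dm} L^m(n)$ with $c_m = 2\,m!/((1-Dm)(2-Dm))$. Here we use the symbol $a_n \sim b_n$ to denote $a_n/b_n \to 1$ as n → ∞. Under the null hypothesis H of no mean shift, we get that the process $(D_n(\lambda))_{0\le\lambda\le 1}$ in (2.1) converges in distribution to $\frac{a_m}{m!}(\lambda Z_m(1) - Z_m(\lambda))$, 0 ≤ λ ≤ 1, where $Z_m$ denotes the m-th order Hermite process.
Observe that the statistic $D_n(\lambda)$ presumes that the jump occurs at time [nλ]+1, whereas the local alternative $A_{\tau,h_n}(n)$ involves a jump at [nτ]+1. There will therefore be an interplay between λ and τ. In fact, under the local alternative $A_{\tau,h_n}(n)$ in (1.8), we get the decomposition (2.4) of $D_n(\lambda)$ into a first term that has the same distribution as $D_n(\lambda)$ under the null hypothesis and a deterministic second term, described by (2.5), whose shape is given by the function $\phi_\tau(\lambda)$ in (2.6), which takes its maximum value τ(1 − τ) at λ = τ; see Figure 1. Note that for large n, by (2.5), the second term in (2.4) is of the order $n h_n/d_n$.
Figure 1. Graph of the function $\phi_\tau$; see (2.6).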
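Thus, in order for the second term in (2.4) to converge as n → ∞, we have to choose the mean shift $h_n \sim c\, d_n/n$. (2.7) When n is large, this is exactly the order of the mean shift that can be detected with a nontrivial power, that is, with a power which is neither 0 nor 1.

Theorem 2.1.
Under the local alternative $A_{\tau,h_n}(n)$ with $h_n$ as in (2.7) for an arbitrary constant c, the process $(D_n(\lambda))_{0\le\lambda\le 1}$ in (2.4) converges in distribution to $\big(\frac{a_m}{m!}(\lambda Z_m(1) - Z_m(\lambda)) + c\,\phi_\tau(\lambda)\big)_{0\le\lambda\le 1}$, (2.8) where $(Z_m(\lambda))_{\lambda\ge 0}$ denotes the m-th order Hermite process with Hurst parameter H = 1 − Dm/2 ∈ (1/2, 1), where $a_m$ is given by (2.2) and $\phi_\tau(\lambda)$ by (2.6).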
Proof. We use the decomposition (2.4). The first term on the right-hand side has the same distribution as $D_n(\lambda)$ under the hypothesis, and thus converges in distribution to $\frac{a_m}{m!}(\lambda Z_m(1) - Z_m(\lambda))$. Regarding the second term, we observe that by (2.7) and (2.5) it converges to $c\,\phi_\tau(\lambda)$, uniformly in λ ∈ [0, 1], as n → ∞.
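The role of the drift function can be illustrated numerically. The sketch below assumes that $\phi_\tau$ is the tent-shaped function λ(1 − τ) for λ ≤ τ and τ(1 − λ) for λ ≥ τ; this form is an assumption that is consistent with the stated maximum τ(1 − τ) at λ = τ, and the helper names are hypothetical. The second function counts the pairs of indices (i, j) with i ≤ k < j that straddle a jump after position t, which is the finite-sample quantity such a drift would arise from.

```python
def phi_tau(lam, tau):
    """Assumed tent-shaped drift: lam*(1-tau) for lam <= tau, tau*(1-lam) otherwise."""
    if lam <= tau:
        return lam * (1.0 - tau)
    return tau * (1.0 - lam)


def shifted_pair_fraction(n, k, t):
    """Fraction (out of n^2) of index pairs (i, j) with i <= k < j and i <= t < j,
    i.e. pairs whose two observations lie on opposite sides of the jump after t."""
    return min(k, t) * (n - max(k, t)) / float(n * n)


tau = 0.3
grid = [i / 100.0 for i in range(101)]
values = [phi_tau(lam, tau) for lam in grid]
peak_location = grid[values.index(max(values))]
print("peak at lambda =", peak_location, "with value", max(values))

# Finite-sample counterpart for n = 1000, k = [n * 0.5], t = [n * 0.3].
print(shifted_pair_fraction(1000, 500, 300), phi_tau(0.5, 0.3))
```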

Remark 2.2.
(i) Observe that for c = 0 we recover the limit distribution under the null hypothesis. Thus, Theorem 2.1 is a generalization of the results obtained previously under the null hypothesis. The limit process is a fractional bridge process. When m = 1, this process is a fractional Gaussian bridge. For m > 1, the process is non-Gaussian.
(ii) Under the local alternative, i.e. when c ≠ 0, the limit process is the sum of a fractional bridge process and the deterministic function $c\,\phi_\tau$.
As an application of the continuous mapping theorem, we obtain the following corollary.

Corollary 2.3.
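Under the local alternative $A_{\tau,h_n}(n)$ with $h_n \sim \frac{d_n}{n} c$, the statistic $D_n$ as defined in (1.6) converges in distribution to $\sup_{0\le\lambda\le 1} \big| \frac{a_m}{m!}(\lambda Z_m(1) - Z_m(\lambda)) + c\,\phi_\tau(\lambda) \big|$. (2.9)

Remark 2.4.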
(i) The limit distribution (2.9) depends on the constant c. For c = 0, we obtain the limit distribution under the null hypothesis. Quantiles of this limit distribution were calculated numerically via a Monte-Carlo simulation by Dehling, Rooch and Taqqu (2013), Table 1. Increasing the value of |c| leads to a shift of the distribution to the right. If c = ∞, that is, if $h_n$ tends to zero more slowly than $\frac{d_n}{n} c$ for any c > 0, then the correct normalization for $D_n(\lambda)$ should go to ∞ at a higher rate, which would kill the random part $(\lambda Z_m(1) - Z_m(\lambda))$ in (2.8), and hence the mean shift could be detected precisely. The power of the asymptotic test would be equal to 1 in this case.
(ii) For a given τ ∈ [0, 1], the function $\phi_\tau(\lambda)$ takes its maximum value at λ = τ, and this maximum value equals τ(1 − τ). Thus, for values of τ close to 0 or close to 1, τ(1 − τ) is close to 0, and the effect of adding the term $c\,\phi_\tau(\lambda)$ is rather small. As a result, the power of the test is small for mean shifts that occur very early or very late in the process.
(iii) The higher the mean shift, the easier it is to detect.
(iv) If the observations are short-range dependent, one can typically detect mean shifts $h_n$ of height $\sqrt{n}/n = 1/\sqrt{n}$, but here, because of long-range dependence, the mean shifts that can be detected are of larger order $d_n/n \sim c\, n^{1-Dm/2} L^{m/2}(n)/n = c\, n^{-Dm/2} L^{m/2}(n)$; note that Dm < 1.
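A quick numerical comparison shows how much larger the detectable shifts are under long-range dependence. The sketch below is only an order-of-magnitude illustration: it assumes a constant slowly varying function (L ≡ 1), ignores all constants, and uses hypothetical function names.

```python
def lrd_shift_order(n, D, m=1):
    """n^(-D*m/2): order of d_n / n when the slowly varying factor is constant."""
    return n ** (-D * m / 2.0)


def srd_shift_order(n):
    """n^(-1/2): typical order of detectable shifts under short-range dependence."""
    return n ** -0.5


for n in (100, 10_000, 1_000_000):
    lrd = lrd_shift_order(n, D=0.6)
    srd = srd_shift_order(n)
    print(f"n={n:>9}  LRD order={lrd:.5f}  SRD order={srd:.5f}  ratio={lrd / srd:.2f}")
```

The ratio of the two orders is n^{(1 − Dm)/2}, which grows with n because Dm < 1.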
We will now apply Corollary 2.3 in order to make power calculations for the change-point test that rejects for large values of $D_n$. Under the null hypothesis of no mean shift, $D_n$ converges in distribution to $\frac{|a_m|}{m!} \sup_{0\le\lambda\le 1} |\lambda Z_m(1) - Z_m(\lambda)|$. If we denote by $q_\alpha$ the upper α-quantile of the distribution of $\sup_{0\le\lambda\le 1} |\lambda Z_m(1) - Z_m(\lambda)|$, we obtain $\lim_{n\to\infty} P_H\big(D_n \ge \frac{|a_m|}{m!} q_\alpha\big) = \alpha$, where $P_H$ indicates the probability under the null hypothesis H. Thus, the test that rejects the null hypothesis H when $D_n \ge \frac{|a_m|}{m!} q_\alpha$ has asymptotic level α. If $h_n$ is chosen as in (2.7), we obtain under the local alternative $A_{\tau,h_n}(n)$ that $\lim_{n\to\infty} P\big(D_n \ge \frac{|a_m|}{m!} q_\alpha\big) = P\big(\sup_{0\le\lambda\le 1} \big|\lambda Z_m(1) - Z_m(\lambda) + \frac{c\, m!}{a_m} \phi_\tau(\lambda)\big| \ge q_\alpha\big)$. (2.10) Thus, for large n, the power of our test at the alternative $A_{\tau,h_n}(n)$ is approximately given by the right-hand side of (2.10). We may also apply Corollary 2.3 in order to determine the size of a mean shift at time [τn] that can be detected with a given power β. First, we calculate c = c(α, β) such that the right-hand side of (2.10) equals β. Then, by (2.10), the asymptotic power of the test at the alternative $A_{\tau,h_n}(n)$ is equal to β. Thus, given a sample size n, we can detect a mean shift of height $h_n = \frac{d_n}{n} c(\alpha, \beta)$ at time [τn] with power β with a level α test based on the test statistic $D_n$. Note that the above calculations are of limited practical value when m ≥ 2, as the quantiles of the process $\lambda Z_m(1) - Z_m(\lambda)$ are not easily calculated.

Power of the Wilcoxon change-point test under local alternatives
In the context of the Wilcoxon change-point test, the Hermite rank is not that of the function G, but of the class of functions $\{1_{\{G(\xi_i)\le x\}} - F(x),\ x \in R\}$. (3.1) We define the Hermite expansion of the class of functions (3.1) as $1_{\{G(\xi_i)\le x\}} - F(x) = \sum_{q=1}^{\infty} \frac{J_q(x)}{q!} H_q(\xi_i)$, where $H_q$ is again the q-th order Hermite polynomial and where the coefficients are $J_q(x) = E\big(1_{\{G(\xi_i)\le x\}} H_q(\xi_i)\big)$. (3.2) We define the Hermite rank of the class of functions (3.1) as $m = \min\{q \ge 1 : J_q(x) \ne 0 \text{ for some } x \in R\}$.

Theorem 3.1.
Suppose that $(\xi_i)_{i\ge 1}$ is a stationary Gaussian process with mean zero, variance 1 and autocovariance function as in (1.4) with 0 ≤ D < 1/m. Moreover, let G : R → R be a measurable function, and assume that $G(\xi_k)$ has continuous distribution function F(x). Let m denote the Hermite rank of the class of functions (3.1), let $d_n$ be as in (2.3), and let the mean shift $h_n$ be as in (2.7). Then, under the local alternative $A_{\tau,h_n}(n)$, defined in (1.8), if $h_n \to 0$ as n → ∞, the process in (3.3) converges in distribution to the limit process in (3.4), where $(Z_m(\lambda))_{\lambda\ge 0}$ denotes the m-th order Hermite process with Hurst parameter H = 1 − Dm/2.

Remark 3.2.
(i) The normalization d n and the processes (Z m (λ)) λ≥0 in Theorem 2.1 and Theorem 3.1 are the same.
(ii) Note that it is possible that the Hermite ranks in Theorem 2.1 and Theorem 3.1 are different. This is the case, e.g. when Obviously, the Hermite rank of G is 3, while the Hermite rank of the class of functions 1 {G(ξi)≤x} is 1. As a consequence, the CUSUM test converges at a faster rate than the Wilcoxon test.
(iii) Since, by assumption, the distribution F(x) of $G(\xi_k)$ is continuous, it follows from integration by parts that $\int_R F(x)\,dF(x) = \frac{1}{2}$. This explains the centering constant 1/2 in (1.2) and (1.7), since $\int_R F(x)\,dF(x) = P(X_1' \le X_1)$, where $X_1'$ is an independent copy of $X_1$. The independence assumption is reasonable as the dependence between $X_i$ and $X_j$ vanishes asymptotically when |i − j| → ∞.
(iv) As noted at the beginning of the proof, the first part of (3.3) converges to (3.4) under the null hypothesis. We show in the proof that the second part of (3.3) compensates for the presence of the local alternative $A_{\tau,h_n}$.
(v) We make no assumption about the exact order of the sequence (h n ) n≥1 . Theorem 3.1 holds under the very general assumption that h n → 0, as n → ∞.
(vi) If we choose (h n ) n≥1 as in (2.7), the centering constants in (3.3) converge, provided some technical assumptions are satisfied. To see this, observe that The convergence in the next to last step requires some justification -this holds, e.g. if F is differentiable with bounded derivative f (x).

Corollary 3.3.
Suppose that $(\xi_i)_{i\ge 1}$ is a stationary Gaussian process with mean zero, variance 1 and autocovariance function as in (1.4) with 0 ≤ D < 1/m. Moreover, let G : R → R be a measurable function, and assume that $G(\xi_k)$ has a distribution function F(x) with bounded density f(x). Let m denote the Hermite rank of the class of functions (3.1). Then, under the local alternative $A_{\tau,h_n}$, defined in (1.8), with $h_n \sim c\, d_n/n$, we obtain the corresponding limit for the Wilcoxon process, in which the centering constants in (3.3) are replaced by their limit; see Remark 3.2 (vi).

Proof of Theorem 3.1. In our proof, we will make use of the limit theorem that was derived in Dehling, Rooch and Taqqu (2013) under the null hypothesis. They showed (see Theorem 1) that, under the null hypothesis, the Wilcoxon process converges in distribution to the limit process in (3.4), for λ ∈ [0, 1]. In order to make use of this result, we will decompose the test statistic into a term whose distribution is the same under the null hypothesis and under the alternative, and a second term which, after proper centering, converges to zero. As in Dehling, Rooch and Taqqu (2013), we will express the test statistic as a functional of the empirical distribution function of the $G(\xi_i)$. Recall that under the local alternative, the observations after time [nτ] are shifted by $h_n$. Decomposing the test statistic accordingly, separately for λ ≤ τ and for λ ≥ τ, we see that, in order to prove Theorem 3.1, it suffices to show that the two remainder terms (3.5) and (3.6) both converge to zero in probability. We first show this for (3.5). In order to show that (3.5) converges to zero in probability, it suffices to show that (3.7) converges to zero, in probability, uniformly in λ ∈ [0, 1]. In order to prove this, we rewrite the difference of the integrals in (3.7) using integration by parts.
Thus, in order to prove that (3.7) converges to zero, in probability, uniformly in λ ∈ [0, 1], it suffices to show (3.8) and (3.9). In order to prove (3.8) and (3.9), we now apply the empirical process non-central limit theorem of Dehling and Taqqu (1989), stated in (3.10). From (3.10) we may deduce a corresponding limit theorem for the empirical distribution of the observations $X_{[n\lambda]+1}, \ldots, X_{[n\tau]}$, uniformly in 0 ≤ λ ≤ τ and x ∈ R. As a special case, for τ = 1, we obtain the almost sure convergence (3.11), uniformly in 0 ≤ λ ≤ 1 and x ∈ R. Now we return to (3.8) and decompose it as in (3.12). The second term on the right-hand side converges to zero by (3.10). Concerning the first term, note the identities (3.13), where $\phi(y) = \frac{1}{\sqrt{2\pi}} e^{-y^2/2}$ denotes the standard normal density function. For the second identity, we have used the fact that G(ξ), by assumption, has a continuous distribution, and that $\int_R H_m(y)\phi(y)\,dy = 0$, for m ≥ 1. Using (3.13) we thus obtain (3.14) and, using analogous arguments, (3.15). By the Glivenko-Cantelli theorem, applied to the stationary, ergodic process $(G(\xi_i))_{i\ge 1}$, we get $\sup_{x\in R} |F_n(x) - F(x)| \to 0$, almost surely, which yields (3.16). Returning to the first term on the right-hand side of (3.12), we obtain, using (3.14) and (3.15), a bound consisting of two terms. Both terms converge to zero; the second one by (3.16), the first one by continuity of F, the fact that $h_n \to 0$, and Lebesgue's dominated convergence theorem. In both cases, we have made use of the fact that $\int_R |H_m(y)|\phi(y)\,dy < \infty$. Thus we have finally established (3.8). In order to prove (3.9), we split the expression into two terms. The second term converges to zero, by (3.11). Concerning the first term, applying Lebesgue's dominated convergence theorem and making use of the fact that, by assumption, J is continuous, we obtain that $\int_R J(x)\,d(F(x + h_n) - F(x)) \to 0$. In this way, we have finally proved that (3.5) converges to zero, in probability.
By similar arguments, we can prove this for (3.6), which finally ends the proof of Theorem 3.1.

ARE of the Wilcoxon change-point test and the CUSUM test for LRD data
In this section, we calculate the asymptotic relative efficiency (ARE) of the It will turn out that the limit (4.1) exists and that the asymptotic relative efficiency does not depend on the choice of τ, α, β. If this limit is larger than 1, then the CUSUM test requires a larger sample size to detect the mean shift, and hence the Wilcoxon change-point test is (asymptotically) more efficient.
In the remaining part of this section, we will focus on the case when m = 1 both for the CUSUM test and for the Wilcoxon change-point test, i.e. when the Hermite rank of $G(\xi_1)$ and of the class of functions $1_{\{G(\xi_1)\le x\}} - F(x)$, x ∈ R, are both equal to 1. This is the case, for example, when G is a strictly monotone function; see Relation (20) in Dehling, Rooch and Taqqu (2013), which shows that in this case the Hermite rank of the class of functions $1_{\{G(\xi_1)\le x\}} - F(x)$, x ∈ R, equals 1. Focusing now on $G(\xi_1)$ and using integration by parts, we get that the first order Hermite coefficient $a_1$ of G equals $a_1 = E(G(\xi_1)\xi_1) = \int_R G(x)\,x\,\phi(x)\,dx = \int_R \phi(x)\,dG(x) \ne 0$, where $\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ denotes the standard normal density function. Thus, the Hermite rank of $G(\xi_i)$ equals 1, as well.
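The first Hermite coefficient can be approximated numerically. The following sketch evaluates a_1 = E(G(ξ) ξ) with a plain trapezoid rule; it is a rough illustration, not the authors' computation. The integration range, the step count and the function names are hypothetical choices, and the Pareto-type transform G(t) = (Φ(t)^{-1/3} − 3/2)/√(3/4) is an assumed form of a standardized Pareto(3, 1) transformation.

```python
import math


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(t):
    # erfc keeps precision in the lower tail, where 1 + erf would cancel.
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def first_hermite_coefficient(G, lo=-12.0, hi=12.0, steps=48_000):
    """Approximate a_1 = E[G(xi) * xi] for xi ~ N(0, 1) by the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * G(t) * t * std_normal_pdf(t)
    return total * h


def identity(t):
    return t


def pareto_transform(t):
    # Assumed form: standardized Pareto(3, 1) variable (mean 3/2, variance 3/4).
    return (std_normal_cdf(t) ** (-1.0 / 3.0) - 1.5) / math.sqrt(0.75)


a1_identity = first_hermite_coefficient(identity)
a1_pareto = first_hermite_coefficient(pareto_transform)
print("a_1 for G(t) = t:", a1_identity)
print("a_1 for the assumed Pareto transform:", a1_pareto)
```

For G(t) = t the coefficient is 1 by definition. The assumed Pareto transform is decreasing, so its coefficient is negative, and it is bounded by 1 in absolute value because G(ξ) has unit variance.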
In this case, i.e. when m = 1, the Hermite process arising as limit in Theorem 2.1, Theorem 3.1 as well as in Corollary 3.3 is fractional Brownian motion (B H (λ)) 0≤λ≤1 . Note that fractional Brownian motion is symmetric, i.e. (−B H (λ)) 0≤λ≤1 has the same distribution as (B H (λ)) 0≤λ≤1 . Thus the limit processes in Theorem 2.1 and Corollary 3.3 can also be written as

respectively.
As preparation, we first calculate a quantity that is related to the asymptotic relative efficiency, namely the ratio of the heights of mean shifts that can be detected by the two tests, based on the same number of observations n, again for given values of τ, α, β. We denote the corresponding mean shifts by h W (n) and h C (n), respectively, assuming that these numbers depend on n in the following way: where c W and c C are given constants. In order to simplify the following considerations, we take a one-sided change-point test, thus rejecting the hypothesis of no change-point for large values of respectively. These are the appropriate tests when testing against the alternative of a non-negative mean shift. In order to obtain tests that have asymptotically level α, the CUSUM test and the Wilcoxon change-point test reject the nullhypothesis when where q α denotes the upper α quantile of the distribution of sup 0≤λ≤1 (B H (λ) − λB H (1)). This follows from Theorem 2.1 and Corollary 3.3 after applying the continuous mapping theorem. The constants a 1 and the functions J 1 (x) are defined in (2.2) and (3.2), respectively, and have just been computed. Under the local alternative A τ,h C (n) , the asymptotic distribution of the test statistic in (4.4) is given by see Theorem 2.1. Under the local alternative A τ,h W (n) , the asymptotic distribution of the test statistic in (4.5) is given by see Corollary 3.3. 
Thus, the asymptotic power of the CUSUM test is given by In the same way, we obtain the asymptotic power of the Wilcoxon change-point test Thus, if we want the two tests to have identical power, we have to choose c C and c W in such a way that which again yields by (4.2) and (4.3), This quantity gives the ratio of the height of a mean shift that can be detected by a CUSUM test over the height that can be detected by a Wilcoxon changepoint test, when both tests are assumed to have the same level α, the same power β and the shifts are taking place at the same time [nτ ]. In addition, we assume that the tests are based on the same number of observations n, which is supposed to be large.
Hence, both tests can asymptotically, as n → ∞, detect mean shifts of the same height.
The mean shifts can be expressed in terms of viewed as a function of t, for fixed values of τ and α. The function ψ is monotonically increasing. We define the generalized inverse, Thus, we get (4.8) and, in fact, for given τ , α and β, ψ − (β) is the smallest number having this property.
We can now apply Theorem 2.1 and Theorem 3.1. By comparing (4.6) and Similarly, by comparing (4.7) and (4.8), we get for the Wilcoxon change-point test that n has to satisfy In the following theorem, we compute the asymptotic relative efficiency of the Wilcoxon change point test with respect to the CUSUM test.  (1.4). Moreover, let G : R → R be a measurable function satisfying E(G 2 (ξ 1 )) < ∞, and such that G(ξ 1 ) has a distribution function F (x) with bounded density f (x). Assume that the Hermite rank of G(ξ 1 ) as well as the Hermite rank of the class of functions (1 {G(ξ1) ≤x} − F (x)), x ∈ R are equal to 1. Moreover assume that 0 < D < 1. Then where T C and T W denote the CUSUM test and the Wilcoxon change-point test, respectively.
Proof. For abbreviation, we define We will show that the Wilcoxon change-point test based on b n observations has asymptotically the same power as the CUSUM test based on n observations. We will consider the local alternative for the CUSUM test, and the local alternative where we have used the fact that L(n) is a slowly varying function. For the CUSUM test, we can apply Corollary 2.3 and we obtain under the local alternative A C n , that For the Wilcoxon change-point test, we apply Corollary 3.3 with c replaced by c b D/2 , in view of (4.10). We thus obtain under the local alternative A W n , Then the tests that reject the null hypothesis when 1 |a1| D n ≥ q α or when 1 | R J1(x)dF (x)| W n ≥ q α , respectively, have asymptotically level α. The power of these tests at the local alternatives A C n and A W n , respectively, converges to Note that this also holds for the power along any other sequence, such as bn.
Since the mean shift at the local alternative A C n equals the mean shift at the local alternative A W bn , we have shown that the Wilcoxon change-point test requires b n observations to yield the same performance as the CUSUM test with n observations. Thus ARE(T W , T C ) = 1/b, proving the theorem.

ARE of the Wilcoxon change-point test and the CUSUM test for IID data
We have shown in Example 4.1 that in the case of LRD data, the ARE of the Wilcoxon change-point test and the CUSUM test is 1 for Gaussian data. In this section, we will compare this surprising result with the case of i.i.d. Gaussian data. We will find that in this case, the ARE is 3/π, i.e. the Wilcoxon change-point test is less efficient than the CUSUM test.
In this section, we consider the model (1.3) with i.i.d. Gaussian noise ( i ) i≥1 . Thus, the data are given by X i = μ i + i , i ≥ 1. We consider the U -statistic As kernel we will choose h C (x, y) = y − x and h W (x, y) = 1 {x≤y} − 1/2, in other words we consider Both kernels h C , h W are antisymmetric, i.e. they satisfy h(x, y) = −h(y, x), so in order to determine the limit behaviour of U (C) k and U (W ) k , we can apply the limit theorems of Csörgő and Horváth (1988). where (B n (λ)) 0≤λ≤1 is a sequence of Brownian bridges like B 1,n and B 2,n above and where and ⎛ ⎝ 1 Proof. First, we prove (5.1). Like for the case of LRD observations, we decompose the statistic, so that we obtain under the local alternative A τ,hn (n) By Theorem 5.1, the first term on the right-hand side converges to a Brownian bridge B(λ). For the second term we have like in the proof for LRD observations and in order for the right-hand side to converge, we have to choose Now let us prove (5.2). Again like for LRD observations, we decompose the statistic into one term that converges like under the null hypothesis and one term which converges to a constant. Under the local alternative A τ,hn (n) and for the case λ ≤ τ , this decomposition is (5.4) The first term converges uniformly to a Brownian Bridge, like under the null hypothesis. We will show that, if the observations i = G(ξ i ) are i.i.d. with c.d.f. F which has two bounded derivatives F = f and F , the second term converges uniformly to cλ ( In the case of standard normally distributed G(ξ i ), i.e. for F = Φ and f = ϕ, this function is c(2 √ π) −1 φ τ (λ). To this end, we consider the Hoeffding decomposition for the sequence of kernels h n (x, y) = 1 {y<x≤y+hn} :

random variables. Then we define
where in the last step we have used that (F (y + h n ) − F (y))/h n → f (y) and the dominated convergence theorem. Moreover, we define We will now show that and from this it follows by the sequence of Hoeffding decompositions (5.5) that i.e. that the second term in (5.4) converges uniformly to

by (5.6) and (5.3).
We use the triangle inequality and show the uniform convergence to 0 for each of the three terms in (5.7) separately. Since the parameter λ occurs only in the floor function value [λn], the supremum is in fact a maximum, and the $h_{1,n}(\epsilon_i)$ are i.i.d. random variables, so we can use Kolmogorov's inequality. We obtain for the first term in (5.7) the bound (5.8), which involves $\mathrm{Var}[h_{1,n}(\epsilon_i)]$.
By the mean value theorem, and since f = F′ is bounded by assumption, we get $\mathrm{Var}[h_{1,n}(\epsilon)] = O(h_n^2)$. Since $h_n \to 0$, the right-hand side of (5.8) converges to 0 as n increases.
In the same manner, we obtain Finally, we have to show that We set temporarily l := [λn] and T := [τ n] and obtain from Markov's inequality Now for any collection of random variables Y 1 , . . . , Y k , one has the inequality where in the last step we have used that h 3,n ( i , j ) are uncorrelated by Hoeffding's decomposition. Now for two i.i.d. random variables , η, we have, like above with the Taylor expansion of F : using (5.6). We have just shown that P max which proves ( [λn] , as defined in the section about the ARE in the LRD case. Let q α denote the upper α-quantile of the distribution of sup 0≤λ≤1 B(λ). By Theorem 5.2, the asymptotic power of the tests is given respectively by where σ 2 = 1/12 and we assume that Thus if we want both tests to have identical power, we must ensure that Now we define, as in the proof for LRD observations, the probability for whose generalized inverse ψ − holds Now, comparing (5.11) and (5.10), we conclude that we can detect a mean shift of height h at time [nτ ] with the CUSUM test of (asymptotic) level α and power β based on n observations, if h C (n) = c C / √ n and where c C satisfies c C = ψ − (β); hence we obtain that h C (n) has to satisfy In the same manner, we get for the Wilcoxon test the conditions h W (n) = c W / √ n and c W /(σ2 √ π) = ψ − (β) and thus Solving these two equations for n again and denoting the resulting numbers of observations by n C and n W , respectively, we obtain To obtain ARE(T W , T C ), we equate h W and h C . We then obtain the following theorem.
where T C , T W denote the one-sided CUSUM-test, respectively the one-sided Wilcoxon test, for the test problem (H, A τ,hn ).
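The value 3/π can be recovered from the two calibration conditions stated above, c_C = ψ⁻(β) and c_W/(2σ√π) = ψ⁻(β) with σ² = 1/12. The sketch below carries out this arithmetic; it assumes that the ARE is read as the ratio n_C/n_W of the sample sizes needed to detect the same shift h = c/√n, and the variable names are hypothetical.

```python
import math

sigma = math.sqrt(1.0 / 12.0)

# Both tests reach power beta when c_C = psi_inv and c_W / (2 * sigma * sqrt(pi)) = psi_inv,
# so the ratio c_W / c_C does not depend on psi_inv.
c_ratio_w_over_c = 2.0 * sigma * math.sqrt(math.pi)

# With h = c / sqrt(n), detecting a fixed shift h requires n = (c / h)^2 observations,
# hence n_C / n_W = (c_C / c_W)^2.
are_iid = (1.0 / c_ratio_w_over_c) ** 2

print("c_W / c_C =", c_ratio_w_over_c)
print("n_C / n_W =", are_iid, " 3/pi =", 3.0 / math.pi)
```

A ratio below 1 means that the CUSUM test needs fewer observations than the Wilcoxon change-point test in the i.i.d. Gaussian case.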

Simulation results
We have proven that for Gaussian data, the CUSUM test and the Wilcoxon change-point test show asymptotically the same performance, i.e. that their ARE is 1. For Pareto(3,1) distributed data, we obtain, using (4.9) and numerical integration, an ARE of approximately $(2.68)^{2/D}$. Now we will illustrate these findings by a simulation study.

Gaussian data
We consider realizations ξ 1 , . . . , ξ n of a fGn process with Hurst parameter H = 0.7 (D = 0.6), using the fArma package in R, and create observations by applying a transformation G which is (with respect to the standard normal measure) normalized and square-integrable: E[G(ξ)] = 0, E[G 2 (ξ)] = 1 for ξ ∼ N (0, 1). As a first step, we choose G(t) = t in order to obtain Gaussian observations X 1 , . . . , X n (later we will choose a function G such that we obtain Pareto distributed data). In other words, we consider data which follow the local alternative Since our theoretical considerations yield an ARE of 1, we expect that both tests detect jumps equally well -that means that both tests, set on the same level, detect jumps of the same height and at the same position in the same number of observations with the same relative frequency. And indeed, we can clearly see in Table 1 that the power of both tests approximately coincides at many points; differences can be spotted only when the break occurs early in the data.
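For readers who want to reproduce the setup without R, the following Python sketch simulates fractional Gaussian noise by a Cholesky factorization of its covariance matrix. This is a substitute technique chosen for illustration, not the fArma routine used in the study; it assumes the standard fGn autocovariance γ(k) = ½(|k+1|^{2H} − 2|k|^{2H} + |k−1|^{2H}), costs O(n³) operations and is therefore only suitable for short series. All function names are hypothetical.

```python
import math
import random


def fgn_autocovariance(k, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at lag k."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst)
                  + abs(k - 1) ** (2 * hurst))


def cholesky(a):
    """Lower-triangular factor l with l * l^T = a, for a symmetric positive definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][p] * l[j][p] for p in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def simulate_fgn(n, hurst, rng):
    """Draw one path of length n by multiplying i.i.d. N(0,1) draws with the factor."""
    cov = [[fgn_autocovariance(i - j, hurst) for j in range(n)] for i in range(n)]
    l = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(l[i][j] * z[j] for j in range(i + 1)) for i in range(n)]


hurst = 0.7                     # corresponds to D = 2 - 2H = 0.6
xi = simulate_fgn(50, hurst, random.Random(1))

# Local alternative with G(t) = t: add a shift h to the observations after [tau * n].
tau_n, h = 25, 1.0              # illustrative values, not taken from the study
x = [v + (h if i >= tau_n else 0.0) for i, v in enumerate(xi)]
print(len(x), fgn_autocovariance(1, hurst))
```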

Heavy tailed data
We consider again realizations ξ 1 , . . . , ξ n of a fGn process with Hurst parameter H = 0.7 (D = 0.6) and create observations by applying the transformation In this case, the first Hermite coefficient of G, obtained by numerical integration, equals a 1 ≈ −0.6784. This transformation G produces observations X i = G(ξ i ) which follow a standardized Pareto(3, 1) distribution with mean zero and variance 1. The probability density function of X i is given by To the second sample of observations, X [τn]+1 , . . . , X n , we again add a constant h, but this time we choose  This means that the CUSUM test needs approximately 26.66 times as many observations as the Wilcoxon test in order to detect the same jump on the same level with the same probability. In order to find this behaviour, we have analysed the power of the Wilcoxon test for n W = 10, 50, 100, 200 observations and the power of the CUSUM test for n C = 266, 1332, 2666, 5330 observations.
In order to be able to compare the two tests, we need to have identical mean shifts when applying the Wilcoxon test to a sample of size n W and the CUSUM test to a sample of size n C = 26.655 n W . This can be achieved by choosing the constants c in (6.1) accordingly, namely taking c C = 2.67754 c W . In this way, we obtain We ran simulations for two different choices of c W , namely c W = 1 and c W = 2; see Table 3 and Table 4 for the results.
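The relation between the two constants and the two sample sizes can be checked with a few lines of arithmetic. The sketch assumes that mean shifts scale like c · n^{-D/2} (constant slowly varying function), so that equal shifts require n_C/n_W = (c_C/c_W)^{2/D}; the variable names are hypothetical.

```python
D = 0.6                      # long-range dependence parameter, H = 1 - D/2 = 0.7
c_ratio = 2.67754            # c_C / c_W as stated in the text

# Equal shifts c_C * n_C^(-D/2) = c_W * n_W^(-D/2) give n_C / n_W = (c_C / c_W)^(2/D).
sample_size_ratio = c_ratio ** (2.0 / D)
print("n_C / n_W =", sample_size_ratio)

# Consistency check: with this sample-size ratio, both shifts coincide for c_W = 1.
n_w = 100
n_c = sample_size_ratio * n_w
shift_w = 1.0 * n_w ** (-D / 2.0)
shift_c = c_ratio * n_c ** (-D / 2.0)
print(shift_w, shift_c)
```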
Here, we have to face a problem which was already encountered by Dehling, Rooch and Taqqu (2013). For the heavy-tailed Pareto data, the convergence of the CUSUM test statistic towards its limit is so slow that the asymptotic quantiles of the limit distribution are not appropriate as critical values to define the domain of rejection of the test: In finite sample situations, the observed level of the test is not 5% -as it should be when using the 5%-quantile of the asymptotic limit distribution. In order to remedy this, we used as critical value for the test, the finite sample 5% quantiles of the distribution of the CUSUM test statistic under the null hypothesis, using a Monte Carlo simulation; see Table 6 in Dehling, Rooch and Taqqu (2013). Here, we have performed the same steps, but for sample sizes n = n C = 266, 1332, 2666, 5330. The results are given in  The simulation results are shown in Table 3 (for c W = 1) and Table 4 (for c W = 2). Indeed, for a fixed jump position τ , the power of the CUSUM test (for n = n C = 266, 1332, 2666, 5330 observations) and of the Wilcoxon test (for n = n W = 10, 50, 100, 200 observations) coincide. They are not fully equal, but we conjecture this is due to the small sample size which conflicts with the asymptotic character of our results. But it becomes clear: The CUSUM test needs quite a number of observations more to detect the same jump on the same level with the same probability -as predicted by our calculation around 25 times as many.