Change point estimation based on Wilcoxon tests in the presence of long-range dependence

We consider an estimator for the location of a shift in the mean of long-range dependent sequences. The estimation is based on the two-sample Wilcoxon statistic. Consistency and the rate of convergence for the estimated change point are established. In the case of a constant shift height, the $1/n$ convergence rate (with $n$ denoting the number of observations), which is typical under the assumption of independent observations, is also achieved for long memory sequences. It is proved that if the shift height decreases to $0$ at a certain rate, the suitably standardized estimator converges in distribution to a functional of a fractional Brownian motion. The estimator is applied to two well-known data sets. Its finite sample behavior is investigated in a Monte Carlo simulation study.

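We consider observations $X_1, \ldots, X_n$ generated by a stochastic process $(X_i)_{i \ge 1}$ of the form

$$X_i = \mu_i + Y_i,$$

where $(\mu_i)_{i \ge 1}$ are unknown constants and where $(Y_i)_{i \ge 1}$ is a stationary, long-range dependent (LRD, for short) process with mean zero. A stationary process $(Y_i)_{i \ge 1}$ is called "long-range dependent" if its autocovariance function $\rho$, $\rho(k) := \mathrm{Cov}(Y_1, Y_{k+1})$, satisfies

$$\rho(k) \sim k^{-D} L(k), \quad \text{as } k \to \infty,$$

where $0 < D < 1$ (referred to as the long-range dependence (LRD) parameter) and where $L$ is a slowly varying function. Furthermore, we assume that there is a change point in the mean of the observations, that is

$$\mu_i = \begin{cases} \mu, & \text{for } i = 1, \ldots, k_0, \\ \mu + h_n, & \text{for } i = k_0 + 1, \ldots, n, \end{cases}$$

where $k_0 = \lfloor n\tau \rfloor$ denotes the change point location and $h_n$ is the height of the level shift.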
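As an illustration of the LRD condition, consider fractional Gaussian noise with Hurst parameter $H$, the process used later in the simulation study. Its autocovariance is known in closed form and satisfies $\rho(k)\,k^{D} \to H(2H-1)$ with $D = 2 - 2H$, so that the slowly varying function is asymptotically constant. The following Python sketch checks this numerically; the function name `fgn_autocov` and the value $H = 0.8$ are illustrative choices and do not appear in the paper.

```python
def fgn_autocov(k: int, hurst: float) -> float:
    """Autocovariance at lag k of unit-variance fractional Gaussian noise."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * hurst)
                  - 2 * k ** (2 * hurst)
                  + abs(k - 1) ** (2 * hurst))


hurst = 0.8                      # illustrative Hurst parameter
D = 2 - 2 * hurst                # LRD parameter of fractional Gaussian noise
limit = hurst * (2 * hurst - 1)  # asymptotic value of rho(k) * k**D

for lag in (10, 100, 1000, 10000):
    # The second column should approach the third one as the lag grows.
    print(lag, fgn_autocov(lag, hurst) * lag ** D, limit)
```

The slow polynomial decay $k^{-D}$ with $0 < D < 1$ is what makes the autocovariances non-summable.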
In the following we differentiate between fixed and local changes. Under fixed changes we assume that $h_n = h$ for some $h \neq 0$. Local changes are characterized by a sequence $h_n$, $n \in \mathbb{N}$, with $h_n \to 0$ as $n \to \infty$; in other words, the height of the jump decreases with increasing sample size $n$.
In order to test the hypothesis

$$H: \mu_1 = \ldots = \mu_n$$

against the alternative

$$A: \mu_1 = \ldots = \mu_k \neq \mu_{k+1} = \ldots = \mu_n \quad \text{for some } k \in \{1, \ldots, n-1\},$$

the Wilcoxon change point test can be applied. It rejects the hypothesis for large values of $\max_{1 \le k \le n-1} |W_{k,n}|$, where

$$W_{k,n} := \sum_{i=1}^{k} \sum_{j=k+1}^{n} \left(1_{\{X_i \le X_j\}} - \frac{1}{2}\right)$$

(see Dehling, Rooch and Taqqu (2013a)). Under the assumption that there is a change point in the mean at $k_0$, we expect the absolute value of $W_{k_0,n}$ to exceed the absolute value of $W_{l,n}$ for any $l \neq k_0$. Therefore, it seems natural to define an estimator of $k_0$ by

$$\hat{k}_W = \hat{k}_W(n) := \min\left\{k : |W_{k,n}| = \max_{1 \le i \le n-1} |W_{i,n}|\right\}.$$

Preceding papers that address the problem of estimating change point locations in dependent observations $X_1, \ldots, X_n$ with a shift in mean often refer to a family of estimators based on the CUSUM change point test statistics $C_n(\gamma) := \max_{1 \le k \le n-1} |C_{k,n}(\gamma)|$, where

$$C_{k,n}(\gamma) := \left(\frac{k(n-k)}{n}\right)^{1-\gamma} \left(\frac{1}{k} \sum_{i=1}^{k} X_i - \frac{1}{n-k} \sum_{i=k+1}^{n} X_i\right)$$

with parameter $0 \le \gamma < 1$. The corresponding change point estimator is defined by

$$\hat{k}_{C,\gamma} = \hat{k}_{C,\gamma}(n) := \min\left\{k : |C_{k,n}(\gamma)| = \max_{1 \le i \le n-1} |C_{i,n}(\gamma)|\right\}.$$

For long-range dependent Gaussian processes Horváth and Kokoszka (1997) derive the asymptotic distribution of the estimator $\hat{k}_{C,\gamma}$ under the assumption of a decreasing jump height $h_n$, i.e. under the assumption that $h_n$ approaches 0 as the sample size $n$ increases. Under non-restrictive constraints on the dependence structure of the data-generating process (including long-range dependent time series) Kokoszka and Leipus (1998) prove consistency of $\hat{k}_{C,\gamma}$ under the assumption of fixed as well as decreasing jump heights. Furthermore, they establish the convergence rate of the change point estimator as a function of the intensity of dependence in the data if the jump height is constant. Ben Hariz and Wylie (2005) show that under a similar assumption on the decay of the autocovariances the convergence rate that is achieved in the case of independent observations can be obtained for short- and long-range dependent data as well. Furthermore, it is shown in their paper that for a decreasing jump height the convergence rate derived by Horváth and Kokoszka (1997) under the assumption of Gaussianity can also be established under more general assumptions on the data-generating sequences.
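The two estimators can be sketched in a few lines of Python. The sketch below is a direct, unoptimized transcription that assumes the Wilcoxon process $W_{k,n} = \sum_{i \le k} \sum_{j > k} (1_{\{X_i \le X_j\}} - 1/2)$ and the weighted CUSUM process $C_{k,n}(\gamma) = (k(n-k)/n)^{1-\gamma}$ times the difference between the means of the first $k$ and the last $n-k$ observations. The function names and the toy data are hypothetical and serve only as an illustration.

```python
def wilcoxon_process(x):
    """Return [W_{1,n}, ..., W_{n-1,n}] by direct summation (cubic cost)."""
    n = len(x)
    return [
        sum((1.0 if x[i] <= x[j] else 0.0) - 0.5
            for i in range(k) for j in range(k, n))
        for k in range(1, n)
    ]


def cusum_process(x, gamma=0.0):
    """Return [C_{1,n}(gamma), ..., C_{n-1,n}(gamma)]."""
    n = len(x)
    total = sum(x)
    head = 0.0
    out = []
    for k in range(1, n):
        head += x[k - 1]                       # sum of the first k observations
        weight = (k * (n - k) / n) ** (1.0 - gamma)
        out.append(weight * (head / k - (total - head) / (n - k)))
    return out


def first_argmax_abs(values):
    """Smallest 1-based index k at which |values[k-1]| attains its maximum."""
    best = max(abs(v) for v in values)
    return next(k for k, v in enumerate(values, start=1) if abs(v) == best)


# Toy data (not from the paper) with an upward shift after k0 = 5.
x = [0.1, -0.4, 0.3, -0.2, 0.0, 2.1, 1.7, 2.4, 1.9, 2.2]
k_w = first_argmax_abs(wilcoxon_process(x))
k_c = first_argmax_abs(cusum_process(x, gamma=0.0))
print(k_w, k_c)
```

Taking the smallest maximizer mirrors the `min` in the definitions of both estimators. The direct double sum is only meant for short series.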
Bai (1994) establishes an estimator for the location of a shift in the mean by the method of least squares. He proves consistency, determines the rate of convergence of the change point estimator and derives its asymptotic distribution. These results are shown to hold for weakly dependent observations that satisfy a linear model and cover, for example, ARMA($p$, $q$)-processes. Bai extended these results to the estimation of the location of a parameter change in multiple regression models that also allow for lagged dependent variables and trending regressors (see Bai (1997)). A generalization of these results to possibly long-range dependent data-generating processes (including fractionally integrated processes) is given in Kuan and Hsu (1998) and Lavielle and Moulines (2000). Under the assumption of independent data Darkhovskh (1976) establishes an estimator for the location of a change in distribution based on the two-sample Mann-Whitney test statistic. He obtains a convergence rate of order $1/n$, where $n$ is the number of observations. Allowing for strong dependence in the data, Giraitis, Leipus and Surgailis (1996) consider Kolmogorov-Smirnov and Cramér-von-Mises-type test statistics for the detection of a change in the marginal distribution of the random variables that underlie the observed data. Consistency of the corresponding change point estimators is proved under the assumption that the jump height approaches 0. A change point estimator based on a self-normalized CUSUM test statistic has been applied in Shao (2011) to real data sets. Although Shao assumes validity of using the estimator, the article does not cover a formal proof of consistency. Furthermore, it has been noted by Shao and Zhang (2010) that even under the assumption of short-range dependence it seems difficult to obtain the asymptotic distribution of the estimate.
In this paper we briefly address the issue of estimating the change point location on the basis of the self-normalized Wilcoxon test statistic proposed in Betken (2016). In order to construct the self-normalized Wilcoxon test statistic, we have to consider the ranks $R_i$, $i = 1, \ldots, n$, of the observations $X_1, \ldots, X_n$. These are defined by $R_i := \mathrm{rank}(X_i) = \sum_{j=1}^{n} 1_{\{X_j \le X_i\}}$ for $i = 1, \ldots, n$. The self-normalized two-sample test statistic $SW_{k,n}$ is defined in terms of these ranks (see Betken (2016)). The self-normalized Wilcoxon change point test for the test problem $(H, A)$ rejects the hypothesis for large values of $T_n(\tau_1, \tau_2) = \max_{k \in \{\lfloor n\tau_1 \rfloor, \ldots, \lfloor n\tau_2 \rfloor\}} |SW_{k,n}|$, where $0 < \tau_1 < \tau_2 < 1$. Note that the proportion of the data that is included in the calculation of the maximum is restricted by $\tau_1$ and $\tau_2$. A common choice for these parameters is $\tau_1 = 1 - \tau_2 = 0.15$; see Andrews (1993).
A natural change point estimator that results from the self-normalized Wilcoxon test statistic is

$$\hat{k}_{SW} = \hat{k}_{SW}(n) := \min\left\{k : |SW_{k,n}| = \max_{\lfloor n\tau_1 \rfloor \le i \le \lfloor n\tau_2 \rfloor} |SW_{i,n}|\right\}.$$
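The ranks also give a cheaper route to the Wilcoxon process. If the observations contain no ties, a short calculation shows that $W_{k,n} = k(n+1)/2 - \sum_{i=1}^{k} R_i$. The Python sketch below checks this identity against the direct double sum and shows how a maximizer can be restricted to the window $\lfloor n\tau_1 \rfloor \le k \le \lfloor n\tau_2 \rfloor$. All function names are hypothetical, and the self-normalized statistic itself is not implemented here.

```python
import math


def ranks(x):
    """R_i = #{j : x_j <= x_i}, the rank definition used in the text."""
    return [sum(1 for xj in x if xj <= xi) for xi in x]


def wilcoxon_direct(x):
    """W_{k,n} for k = 1..n-1 by direct summation."""
    n = len(x)
    return [sum((1.0 if x[i] <= x[j] else 0.0) - 0.5
                for i in range(k) for j in range(k, n))
            for k in range(1, n)]


def wilcoxon_from_ranks(x):
    """W_{k,n} = k(n+1)/2 - sum of the first k ranks (assumes no ties)."""
    n, r = len(x), ranks(x)
    out, partial = [], 0
    for k in range(1, n):
        partial += r[k - 1]
        out.append(k * (n + 1) / 2 - partial)
    return out


def restricted_argmax(stat, tau1=0.15, tau2=0.85):
    """Smallest maximizer of |stat| over floor(n*tau1) <= k <= floor(n*tau2).

    stat[k-1] holds the statistic at k, for k = 1..n-1.
    """
    n = len(stat) + 1
    lo = max(1, math.floor(n * tau1))
    hi = min(n - 1, math.floor(n * tau2))
    best = max(abs(stat[k - 1]) for k in range(lo, hi + 1))
    return next(k for k in range(lo, hi + 1) if abs(stat[k - 1]) == best)


x = [0.1, -0.4, 0.3, -0.2, 0.0, 2.1, 1.7, 2.4, 1.9, 2.2]  # toy data, no ties
print(wilcoxon_from_ranks(x) == wilcoxon_direct(x))
print(restricted_argmax(wilcoxon_from_ranks(x)))
```

The restriction excludes candidate locations close to the sample boundaries, in line with the trimming by $\tau_1$ and $\tau_2$.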
We will prove consistency of the estimator $\hat{k}_{SW}$ under fixed changes and under local changes whose height converges to 0 with a rate depending on the intensity of dependence in the data. Nonetheless, the main aim of this paper is to characterize the asymptotic behavior of the change point estimator $\hat{k}_W$. In Section 2 we establish consistency of $\hat{k}_W$ and $\hat{k}_{SW}$, derive the optimal convergence rate of $\hat{k}_W$ and finally consider its asymptotic distribution. Applications to two well-known data sets can be found in Section 3. The finite sample properties of the estimators are investigated by simulations in Section 4. Proofs of the theoretical results are given in Section 5.

Main Results
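Recall that for fixed $x \in \mathbb{R}$ the Hermite expansion of $1_{\{G(\xi_i) \le x\}} - F(x)$ is given by

$$1_{\{G(\xi_i) \le x\}} - F(x) = \sum_{q=1}^{\infty} \frac{J_q(x)}{q!} H_q(\xi_i),$$

where $H_q$ denotes the $q$-th order Hermite polynomial and where $J_q(x) = E\left(1_{\{G(\xi_i) \le x\}} H_q(\xi_i)\right)$.

Assumption 1. Let $Y_i = G(\xi_i)$, where $(\xi_i)_{i \ge 1}$ is a stationary, long-range dependent Gaussian process with mean 0, variance 1 and LRD parameter $D$. We assume that $0 < D < 1/r$, where $r$ denotes the Hermite rank of the class of functions $1_{\{G(\xi_i) \le x\}} - F(x)$, $x \in \mathbb{R}$. Moreover, we assume that $G : \mathbb{R} \to \mathbb{R}$ is a measurable function and that $(Y_i)_{i \ge 1}$ has a continuous distribution function $F$.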
Since g D,r is a regularly varying function, there exists a function g − D,r such that g D,r (g − D,r (t)) ∼ g − D,r (g D,r (t)) ∼ t, as t → ∞, (see Theorem 1.5.12 in Bingham, Goldie and Teugels (1987) and if F has a bounded density f . In other words, we havê in both situations. Furthermore, it follows that the Wilcoxon test is consistent under these assumptions (in the sense that 1 ndn,r max 1≤k≤n−1 |W k,n | P −→ ∞).
The following theorem establishes a convergence rate for the change point estimator $\hat{k}_W$. Note that the convergence rate depends on the intensity of dependence in the data only under local changes.
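Theorem 1. Suppose that Assumption 1 holds and let $m_n := g^{-}_{D,r}(h_n^{-1})$. Then, we have

$$|\hat{k}_W - k_0| = O_P(m_n)$$

if either $h_n = h$ is fixed, or $h_n \to 0$ with $h_n^{-1} = o\left(n / d_{n,r}\right)$ and $F$ has a bounded density $f$.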
1. Under fixed changes $m_n$ is constant. As a consequence, $|\hat{k}_W - k_0| = O_P(1)$. This result corresponds to the convergence rates obtained by Ben Hariz and Wylie (2005) for the CUSUM-test based change point estimator and by Lavielle and Moulines (2000) for the least-squares estimate of the change point location. Surprisingly, in this case the rate of convergence is independent of the intensity of dependence in the data characterized by the value of the LRD parameter $D$. An explanation for this phenomenon might be the occurrence of two opposing effects: increasing values of the LRD parameter $D$ go along with a slower convergence of the test statistic $W_{k,n}$ (making estimation more difficult), but a more regular behavior of the random component (making estimation easier) (see Ben Hariz and Wylie (2005)).

Based on the previous results it is possible to derive the asymptotic distribution of the change point estimator $\hat{k}_W$:

Note that if
Theorem 2. Suppose that Assumption 1 holds with r = 1 and assume that F has a bounded density f . Let m n := g − D,1 (h −1 n ), let B H denote a fractional Brownian motion process and define h(s; τ ) by . If h −1 n = o n dn,1 , then, for all M > 0, 1 e n W 2 k0+ mns ,n − W 2 k0,n , −M ≤ s ≤ M, with e n = n 3 h n d mn,1 , converges in distribution to (3) Remark 2.
1. Under local changes the assumption on $h_n$ is equivalent to Assumption C.5 (i) in Horváth and Kokoszka (1997). Moreover, the limit distribution (3) closely resembles the limit distribution of the CUSUM-based change point estimator considered in that paper.
2. The proof of Theorem 2 is mainly based on the empirical process non-central limit theorem for subordinated Gaussian sequences in Dehling and Taqqu (1989). The sequential empirical process has also been studied by many other authors in the context of different models. See, among many others, the following: Müller (1970) and Kiefer (1972) for independent and identically distributed data, Berkes and Philipp (1977) and Philipp and Pinzur (1980) for strongly mixing processes, Berkes, Hörmann and Schauer (2009) for S-mixing processes, Giraitis and Surgailis (1999) for long memory linear (or moving average) processes, Dehling, Durieu and Tusche (2014) for multiple mixing processes. Presumably, in these situations the asymptotic distribution of $\hat{k}_W$ can be derived by the same argument as in the proof of Theorem 2 for subordinated Gaussian processes. In particular, Theorem 1 in Giraitis and Surgailis (1999) can be considered as a generalization of Theorem 1.1 in Dehling and Taqqu (1989), i.e. with an appropriate normalization the change point estimator $\hat{k}_W$, computed with respect to long-range dependent linear processes as defined in Giraitis and Surgailis (1999), should converge in distribution to a limit that corresponds to (3) (up to multiplicative constants).

Applications
We consider two well-known data sets which have been analyzed before. We compute the estimator $\hat{k}_W$ based on the given observations and put our results into context with the findings and conclusions of other authors.

The plot in Figure 1 depicts the annual volume of discharge from the Nile river at Aswan in $10^8\,\mathrm{m}^3$ for the years 1871 to 1970. The data set is included in any standard distribution of R. Amongst others, Cobb (1978), Macneill, Tang and Jandhyala (1991), Wu and Zhao (2007), Shao (2011) and Betken and Wendler (2015) provide statistically significant evidence for a decrease of the Nile's annual discharge towards the end of the 19th century. The construction of the Aswan Low Dam between 1898 and 1902 serves as a popular explanation for an abrupt change in the data around the turn of the century. Yet, Cobb gave another explanation for the decrease in water volume by citing rainfall records which suggest a decline of tropical rainfall at that time. In fact, an application of the change point estimator $\hat{k}_W$ identifies a change in 1898. This result seems to be in good accordance with the estimated change point locations suggested by other authors: Cobb's analysis of the Nile data leads to the conjecture of a significant decrease in discharge volume in 1898. Moreover, computation of the CUSUM-based change point estimator $\hat{k}_{C,0}$ considered in Horváth and Kokoszka (1997) indicates a change in 1898. Balke (1993) and Wu and Zhao (2007) suggest that the change occurred in 1899.
The second data set consists of the seasonally adjusted monthly deviations of the temperature (in degrees Celsius) for the Northern hemisphere during the years 1854 to 1989 from the monthly averages over the period 1950 to 1979. The data has been taken from the longmemo package in R. It results from spatial averaging of temperatures measured over land and sea. In view of the plot in Figure 2 it seems natural to assume that the data generating process is non-stationary. Previous analysis of this data offers different explanations for the irregular behavior of the time series. Deo and Hurvich (1998) fitted a linear trend to the data, thereby providing statistical evidence for global warming during the last decades. However, the consideration of a more general stochastic model by the assumption of so-called semiparametric fractional autoregressive (SEMIFAR) processes in Beran and Feng (2002) does not confirm the conjecture of a trend-like behavior. Neither does the investigation of the global temperature data in Wang (2007) support the hypothesis of an increasing trend. It is pointed out by Wang that the trend-like behavior of the Northern hemisphere temperature data may have been generated by stationary long-range dependent processes. Yet, it is shown in Shao (2011) and also in Betken and Wendler (2015) that under model assumptions that include long-range dependence an application of change point tests leads to a rejection of the hypothesis that the time series is stationary. According to Shao (2011) an estimation based on a self-normalized CUSUM test statistic suggests a change around October 1924. Computation of the change point estimator $\hat{k}_W$ corresponds to a change point located around June 1924. The same change point location results from an application of the previously mentioned estimator $\hat{k}_{C,0}$ considered in Horváth and Kokoszka (1997). In this regard estimation by $\hat{k}_W$ seems to be in good accordance with the results of alternative change point estimators.

Simulations
We will now investigate the finite sample performance of the change point estimator $\hat{k}_W$ and compare it to corresponding simulation results for the estimators $\hat{k}_{SW}$ (based on the self-normalized Wilcoxon test statistic) and $\hat{k}_{C,0}$ (based on the CUSUM test statistic with parameter $\gamma = 0$). For this purpose, we consider two different scenarios:

1. Normal margins: We generate fractional Gaussian noise time series $(\xi_i)_{i \ge 1}$ and choose $G(t) = t$ in Assumption 1. As a result, the simulated observations $(Y_i)_{i \ge 1}$ are Gaussian. Note that in this case the Hermite coefficient $J_1(x)$ is not equal to 0 for all $x \in \mathbb{R}$ (see Dehling, Rooch and Taqqu (2013a)) so that $m = 1$, where $m$ denotes the Hermite rank of $1_{\{G(\xi_i) \le x\}} - F(x)$, $x \in \mathbb{R}$. Therefore, Assumption 1 holds for all values of $D \in (0, 1)$.
2. Pareto margins: In order to get standardized Pareto-distributed data which has a representation as a functional of a Gaussian process, we consider the transformation

$$G(t) = \left(\frac{\beta k^2}{(\beta - 1)^2 (\beta - 2)}\right)^{-1/2} \left(k \left(\Phi(t)\right)^{-1/\beta} - \frac{\beta k}{\beta - 1}\right)$$

with parameters $k, \beta > 0$ and with $\Phi$ denoting the standard normal distribution function. Since $G$ is a strictly decreasing function, it follows by Theorem 2 in Dehling, Rooch and Taqqu (2013a) that the Hermite rank of $1_{\{G(\xi_i) \le x\}} - F(x)$, $x \in \mathbb{R}$, is $m = 1$ so that Assumption 1 holds for all values of $D \in (0, 1)$.
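The Pareto scenario can be illustrated with a short Python sketch. It assumes the usual inverse transform construction: if $U$ is uniform on $(0,1)$, then $\text{scale} \cdot U^{-1/\text{shape}}$ is Pareto distributed, and subtracting the mean and dividing by the standard deviation standardizes it. The function name `pareto_transform` and the reading of Pareto(3, 1) as shape 3 and scale 1 are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, provides Phi


def pareto_transform(t, shape=3.0, scale=1.0):
    """Map a standard normal value t to a standardized Pareto value.

    Assumes shape > 2 so that the Pareto variance is finite.
    """
    if shape <= 2:
        raise ValueError("shape must exceed 2 for a finite variance")
    u = _STD_NORMAL.cdf(t)                      # uniform on (0, 1)
    mean = shape * scale / (shape - 1.0)
    sd = math.sqrt(shape * scale ** 2 / ((shape - 1.0) ** 2 * (shape - 2.0)))
    return (scale * u ** (-1.0 / shape) - mean) / sd


# The transform is strictly decreasing in t, as required in the text.
for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(t, pareto_transform(t))
```

Applying `pareto_transform` to each value of a simulated Gaussian series yields a series with heavy-tailed margins that inherits the dependence structure of the Gaussian input.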
To analyze the behavior of the estimators we simulated 500 time series of length 600 and added a level shift of height $h$ after a proportion $\tau$ of the data. We have done so for several choices of $h$ and $\tau$. The descriptive statistics, i.e. mean, sample standard deviation (S.D.) and quartiles, are reported in Tables 1, 2, and 3 for the three change point estimators $\hat{k}_W$, $\hat{k}_{SW}$ and $\hat{k}_{C,0}$.
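A scaled-down version of such a study can be sketched in pure Python. The sketch assumes that fractional Gaussian noise is generated by a Cholesky factorization of its covariance matrix (the text does not say which generator was used), uses a smaller sample size and fewer replications than the 500 series of length 600 described above so that it runs quickly, and computes $\hat{k}_W$ through the rank identity $W_{k,n} = k(n+1)/2 - \sum_{i \le k} R_i$, which holds when there are no ties. All names and default values are hypothetical.

```python
import math
import random
import statistics


def fgn_autocov(k, hurst):
    """Autocovariance at lag k of unit-variance fractional Gaussian noise."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst)
                  + abs(k - 1) ** (2 * hurst))


def cholesky_factor(n, hurst):
    """Lower-triangular factor of the n x n covariance matrix (cubic cost)."""
    chol = [[0.0] * (i + 1) for i in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = fgn_autocov(i - j, hurst) - sum(
                chol[i][m] * chol[j][m] for m in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return chol


def simulate_fgn(chol, rng):
    """Multiply the Cholesky factor by a vector of i.i.d. standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in chol]
    return [sum(c * zm for c, zm in zip(row, z)) for row in chol]


def wilcoxon_estimate(x):
    """Smallest maximizer of |W_{k,n}|, computed from ranks (no ties)."""
    n = len(x)
    rank = [0] * n
    for pos, idx in enumerate(sorted(range(n), key=x.__getitem__), start=1):
        rank[idx] = pos
    best_k, best_val, partial = 1, -1.0, 0
    for k in range(1, n):
        partial += rank[k - 1]
        val = abs(k * (n + 1) / 2 - partial)
        if val > best_val:               # strict: keeps the smallest maximizer
            best_k, best_val = k, val
    return best_k


def run_study(n=120, tau=0.5, h=1.0, hurst=0.7, reps=100, seed=1):
    """Descriptive statistics of the estimates over `reps` replications."""
    rng = random.Random(seed)
    chol = cholesky_factor(n, hurst)
    k0 = math.floor(n * tau)
    estimates = []
    for _ in range(reps):
        y = simulate_fgn(chol, rng)
        # Observations k0+1, ..., n (0-based indices >= k0) receive the shift.
        x = [v + (h if i >= k0 else 0.0) for i, v in enumerate(y)]
        estimates.append(wilcoxon_estimate(x))
    return {
        "k0": k0,
        "mean": statistics.mean(estimates),
        "sd": statistics.stdev(estimates),
        "quartiles": tuple(statistics.quantiles(estimates, n=4)),
    }


print(run_study())
```

For series as long as those in the study, a faster generator (for example circulant embedding) would be preferable to the Cholesky approach used here.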
The following observations, made on the basis of Tables 1, 2, and 3, correspond to the expected behavior of consistent change point estimators:
• Bias and variance of the estimated change point location decrease when the height of the level shift increases.
• Estimation of the time of change is more accurate for breakpoints located in the middle of the sample than estimation of change point locations that lie close to the boundary of the testing region.
• High values of $H$ go along with an increase of bias and variance. This seems natural since when there is very strong dependence, i.e. $H$ is large, the variance of the series increases, so that it becomes harder to accurately estimate the location of a level shift.
A comparison of the descriptive statistics of the estimator $\hat{k}_W$ (based on the Wilcoxon statistic) and $\hat{k}_{SW}$ (based on the self-normalized Wilcoxon statistic) shows that:
• In most cases the estimator $\hat{k}_{SW}$ has a smaller bias, especially for an early change point location. Nevertheless, the difference between the biases of $\hat{k}_{SW}$ and $\hat{k}_W$ is not big.
• In general the sample standard deviation of $\hat{k}_W$ is smaller than that of $\hat{k}_{SW}$. Indeed, it is only slightly better for $\tau = 0.25$, but there is a clear difference for $\tau = 0.5$.
All in all, our simulations do not give rise to choosing $\hat{k}_{SW}$ over $\hat{k}_W$. In particular, better standard deviations of $\hat{k}_W$ compensate for smaller biases of $\hat{k}_{SW}$.
Comparing the finite sample performance of $\hat{k}_W$ and the CUSUM-based change point estimator $\hat{k}_{C,0}$ we make the following observations:
• For fractional Gaussian noise time series bias and variance of $\hat{k}_{C,0}$ tend to be slightly better, at least when $\tau = 0.25$ and especially for relatively high level shifts. Nonetheless, the deviations are in most cases negligible.
• If the change happens in the middle of a sample with normal margins, bias and variance of $\hat{k}_W$ tend to be smaller, especially for relatively high level shifts. Again, in most cases the deviations are negligible.
• For Pareto(3, 1) time series $\hat{k}_W$ clearly outperforms $\hat{k}_{C,0}$ by yielding smaller biases and decisively smaller variances for almost every combination of parameters that has been considered. The performance of the estimator $\hat{k}_{C,0}$ surpasses the performance of $\hat{k}_W$ only for high values of the jump height $h$.
It is well-known that the Wilcoxon change point test is more robust against outliers in data sets than the CUSUM-like change point tests, i.e. the Wilcoxon test outperforms CUSUM-like tests if heavy-tailed time series are considered. Our simulations confirm that this observation is also reflected by the finite sample behavior of the corresponding change point estimators.
In order to illustrate the convergence rate of $\hat{k}_W$ under fixed changes, we consider the mean absolute error

$$\mathrm{MAE} := \frac{1}{m} \sum_{i=1}^{m} \left|\hat{k}_{W,i} - k_0\right|,$$

where $\hat{k}_{W,i}$, $i = 1, \ldots, m$, denote the estimates for $k_0$, computed on the basis of $m = 5000$ different sequences of fractional Gaussian noise time series. Since $\hat{k}_W - k_0 = O_P(1)$ due to Theorem 1, we expect MAE to approach a constant as $n$ tends to infinity. This can be clearly seen in Figure 3 for $H \in \{0.6, 0.7, 0.8\}$. For a high intensity of dependence in the data (characterized by $H = 0.9$) convergence becomes slower. This is due to a slower convergence of the test statistic $W_n(k)$ which, in finite samples, is not canceled out by the effect of a more regular behavior of the sample paths of the limit process.
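The error measure can be computed in one line. The Python sketch below assumes that MAE denotes the mean absolute deviation of the estimates from the true change point location. The function name and the input values are hypothetical and are not results from the paper.

```python
def mean_absolute_error(estimates, k0):
    """Average of |estimate - k0| over all replications."""
    return sum(abs(k - k0) for k in estimates) / len(estimates)


# Hypothetical estimates from five replications with true location k0 = 300.
estimates = [298, 301, 300, 305, 296]
print(mean_absolute_error(estimates, k0=300))
```

Evaluating this quantity for increasing sample sizes $n$ gives one curve per value of $H$.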

Proofs
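In the following let $F_k$ and $F_{k+1,n}$ denote the empirical distribution functions of the first $k$ and last $n - k$ realizations of $Y_1, \ldots, Y_n$, i.e.

$$F_k(x) := \frac{1}{k} \sum_{i=1}^{k} 1_{\{Y_i \le x\}}, \qquad F_{k+1,n}(x) := \frac{1}{n-k} \sum_{i=k+1}^{n} 1_{\{Y_i \le x\}}.$$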
For notational convenience we write W n (k) instead of W k,n and SW n (k) instead of SW k,n . The proofs in this section as well as the proofs in the appendix are partially influenced by arguments that have been established in Horváth and Kokoszka (1997), Bai (1994) and Dehling, Rooch and Taqqu (2013a). In particular, some arguments are based on the empirical process non-central limit theorem of Dehling and Taqqu (1989) which states that where r is the Hermite rank defined in Assumption 1, Z H is an r-th order Hermite process 1 , H = 1 − rD 2 ∈ 1 2 , 1 , and " The Dudley-Wichura version of Skorohod's representation theorem (see Shorack and Wellner (1986), Theorem 2.3.4) implies that, for our purposes, we may assume without loss of generality that sup λ∈[0,1],x∈R Proof of Proposition 1. The proof of Proposition 1 is based on an application of Lemma 1 in the appendix. According to Lemma 1 it holds that, under the assumptions of Proposition 1, and C denotes some non-zero constant. It directly follows that 1 ndn,r max 1≤k≤n−1 |W n (k)| P −→ ∞. Furthermore, All in all, it follows that for any ε > 0 This proves consistency of the change point estimator which is based on the Wilcoxon test statistic. In the following it is shown that 1 nk SW is a consistent estimator, too. For this purpose, we consider the process SW n ( nλ ), 0 ≤ λ ≤ 1. According to Betken (2016) the limit of the self-normalized Wilcoxon test statistic can be obtained by an application of the continuous mapping theorem to the process 1 a n nλ i=1 n j= nλ +1 where a n denotes an appropriate normalization. Therefore, it follows by the corresponding argument in Betken (2016) that .
Proof of Theorem 1. In the following we writek instead ofk W . For convenience, we assume that h > 0 under fixed changes, and that for some n 0 ∈ N h n > 0 for all n ≥ n 0 under local changes, respectively. Furthermore, we subsume both changes under the general assumption that lim n→∞ h n = h (under fixed changes h n = h for all n ∈ N, under local changes h = 0). In order to prove Theorem 1, we need to show that for all ε > 0 there exists an n(ε) ∈ N and an M > 0 such that Therefore, P 2 ≤ P 2,1 + P 2,2 , where P 2,1 := P sup k∈D n,M (1) In the following we will consider the first summand only. (For the second summand analogous implications result from the same argument.) For this, we define Note that We have Due to Lemma 2 in the appendix and Theorem 1.1 in Dehling, Rooch and Taqqu (2013a) 2 sup i.e. for all ε > 0 there exists a K > 0 such that The right hand side of the above inequality diverges if h n = h is fixed or if h −1 n = o n dn,r . Therefore, it is possible to find an n(ε) ∈ N such that for all n ≥ n(ε).
We will now turn to the summand P 1 . We have P 1 ≤ P 1,1 + P 1,2 , where In the following we will consider the first summand only. (For the second summand analogous implications result from the same argument.) We define a random sequence k n , n ∈ N, by choosing k n ∈ D n,M (1) such that sup k∈D n,M (1) Note that for any sequence k n , n ∈ N, with k n ∈ D n,M (1) where l n := k 0 − k n . Since k n ∈ D n,M (1) and m n −→ ∞ we have for n sufficiently large. Thus, we have If h n is fixed, the right hand side of the inequality diverges. Under local changes the right hand side asymptotically behaves like since, in this case, h n ∼ dm n ,r mn due to the assumptions of Theorem 1. In any case, for δ > 0 it is possible to find an n 0 ∈ N such that All in all, the previous considerations show that there exists an n 0 ∈ N and a constant K such that for all n ≥ n 0 Thus, for n ≥ n 0 For each i ∈ {1, . . . , 4} it will be shown that P sup for n and M sufficiently large.
1. Note that sup k∈D n,M (1) Due to stationarity sup k∈D n,M (1) Note that sup k∈D n,M (1) converges to 0 almost surely. Therefore, for n sufficiently large. Note that sup x∈R |J r (x)| < ∞. Furthermore, it is well-known that all moments of Hermite processes are finite. As a result, it follows by Markov's inequality that for some M 0 ∈ R for n sufficiently large. As a result, sup k∈D n,M (1) Due to the empirical process non-central limit theorem of Dehling and Taqqu (1989) we have Moreover, H is a H-self-similar process with stationary increments. Thus, we have P sup for n sufficiently large. Again, it follows by Markov's inequality that for M sufficiently large. 3. Note that for n sufficiently large. Therefore, sup k∈D n,M (1) The expression on the right hand side of the inequality converges in distribution to due to the empirical process non-central limit theorem. Since As a result, the aforementioned argument yields P sup for n and M sufficiently large. 4. We have sup k∈D n,M (1) Hence, the same argument that has been used to obtain an analogous result for A n,1 can be applied to conclude that P sup All in all, it follows that for all ε > 0 there exists an n(ε) ∈ N and an M > 0 such that for all n ≥ n(ε). This proves Theorem 1.
Proof of Theorem 2. Note that We will show that (with an appropriate normalization) W n (k 0 + m n s )−W n (k 0 ) converges in distribution to a non-deterministic limit process whereas W n (k 0 + m n s ) + W n (k 0 ) (with stronger normalization) converges in probability to a deterministic expression. For notational convenience we write d mn instead of d mn,1 , J instead of J 1 ,k instead ofk W and we define l n (s) := k 0 + m n s . We have We will show that 1 ndm nṼ n (l n (s)) converges to h(s; τ ) in probability and that 1 We rewriteṼ n (l n (s)) in the following way: if s > 0. For s < 0 the limit of 1 ndm nṼ n (l n (s)) corresponds to the limit of due to Lemma 3 and stationarity of the random sequence Y i , i ≥ 1. Note that The above expression converges to −s f 2 (x)dx, since h n ∼ dm n mn .
For s > 0 the limit of 1 ndm nṼ n (l n (s)) corresponds to the limit of due to Lemma 3 and stationarity of the random sequence Y i , i ≥ 1. Note that The above expression converges to s f 2 (x)dx, since h n ∼ dm n mn . All in all, it follows that 1 ndm nṼ n (l n (s)) converges to h(s; τ ) defined by In the following it is shown that 1 ndm n V n (l n (s)) converges in distribution to Note that if s < 0, If s > 0, we have The arguments that appear in the proof of Lemma 3 can also be applied to show that the limit of 1 ndm n V n (l n (s)) corresponds to the limit of 1 nd mn (A 1,n (s) + A 2,n (s) + A 3,n (s)) , where Note that for s < 0 1 nd mn A 2,n (s) = − 1 nd mn m n s l n (s) (F ln(s) (x) − F (x))dF (x).
The above expression converges to 0 uniformly in s, since mn dm n = o( n dn ) and since probability. An analogous argument shows that 1 ndm n A 3,n (s) vanishes if n tends to ∞. Therefore, it remains to show that 1 ndm n A 1,n (s) converges in distribution to a non-deterministic expression. Due to stationarity 1 nd mn A 1,n (s) for s < 0. As a result, If s > 0, an application of the previous arguments shows that 1 ndm n A 2,n (s) and 1 ndm n A 3,n (s) converge to 0 whereas 1 ndm n A 1,n (s) converges in distribution to B H (s) J(x)dF (x).
All in all, it follows that Furthermore, it follows that with the stronger normalization h n n 2 the limit of 1 hnn 2 W n (k 0 + m n s ) corresponds to the limit of 1 hnn 2 W n (k 0 ). We have The second summand on the right hand side vanishes as n tends to ∞, since h −1 n = o (n/d n ). Due to Lemma 3 the limit of In addition, k0 n (n−k0) n −→ τ (1 − τ ). From this we can conclude that 1 h n n 2 (W n (k 0 + m n s) + W n (k 0 )) M ]. This completes the proof of the first assertion in Theorem 2.
In order to show that we make use of Lemma 4. For this purpose, we note that according to Lifshits' criterion for unimodality of Gaussian processes (see Theorem 1.1 in Ferger (1999) Note that Therefore, we have to show that for some M ∈ R Hence, for all ε > 0 there is an M 0 ∈ R and an n 0 ∈ N such that P k =k(M ) < ε for all n ≥ n 0 and all M ≥ M 0 . This concludes the proof of Theorem 2.

Appendix A: Auxiliary Results
In the following we prove some Lemmas that are needed for the proofs of our main results. Lemma 1 characterizes the asymptotic behavior of the Wilcoxon process under the assumption of a change-point in the mean. It is used to prove consistency of the change-point estimatorsk W andk SW . 2. The process 1 nd n,r nτ i=1 n j= nλ +1 1 {Yi≤Yj +hn} − F (x + h n )dF (x) , τ ≤ λ ≤ 1, converges in distribution to Proof. We give a proof for the first assertion only as the convergence of the second term follows by an analogous argument. The steps in this proof correspond to the argument that proves Theorem 1.1 in Dehling, Rooch and Taqqu (2013a). For λ ≤ τ it follows that nλ i=1 n j= nτ +1 1 {Yi≤Yj +hn} = (n − nτ ) nλ F nλ (x + h n )dF nτ +1,n (x).
For the first summand we have sup 0≤λ≤τ d −1 n,r nλ F nλ (x + h n ) − F (x + h n ) dF nτ +1,n (x) H (λ) (J r (x + h n ) − J r (x + h)) dF nτ +1,n (x) We will show that each of the summands on the right hand side converges to 0. The first summand converges to 0 because of the empirical non-central limit theorem of Dehling and Taqqu (1989). In order to show convergence of the second and third summand, note that sup 0≤λ≤τ |Z (r) H (λ)| < ∞ a.s. since the sample paths of the Hermite processes are almost surely continuous.
The first expression on the right hand side converges to 0 by the Glivenko-Cantelli theorem and the fact that |H r (y)| ϕ(y)dy < ∞; the second expression converges to 0 due to continuity of F and the dominated convergence theorem.
To show convergence of the third summand note that For both summands on the right hand side of the above inequality the ergodic theorem implies almost sure convergence to 0.