Detecting long-range dependence in non-stationary time series

An important problem in time series analysis is the discrimination between non-stationarity and long-range dependence. Most of the literature considers the problem of testing specific parametric hypotheses of non-stationarity (such as a change in the mean) against long-range dependent stationary alternatives. In this paper we suggest a simple approach, which can be used to test the null hypothesis of a general non-stationary short-memory process against the alternative of a non-stationary long-memory process. The test procedure works in the spectral domain and uses a sequence of approximating tvFARIMA models to estimate the time varying long-range dependence parameter. We prove uniform consistency of this estimate and asymptotic normality of an averaged version. These results yield a simple test (based on the quantiles of the standard normal distribution), and it is demonstrated in a simulation study that - despite its semi-parametric nature - the new test outperforms the currently available methods, which are constructed to discriminate between specific parametric hypotheses of non-stationary short- and stationary long-range dependence.

A stationary process is commonly called long-range dependent if its autocovariance function γ(k) decays hyperbolically, that is γ(k) ∼ C k^{2d−1} as k → ∞, where d ∈ (0, 0.5) denotes a "long memory" parameter. Statistical models (and corresponding theory) for long-range dependent processes are very well developed [see Doukhan et al. (2003) or Palma (2007) for recent surveys] and have found applications in numerous fields [see Breidt et al. (1998), Beran et al. (2006) or Haslett and Raftery (1989) for such an approach in the framework of asset volatility, video traffic and wind power modeling]. However, it was pointed out by several authors that the observation of "long memory" features in the autocovariance function can equally well be explained by non-stationarity [see Mikosch and Starica (2004) or Chen et al. (2010) among many others]. This is clearly demonstrated in Figure 2, which shows the autocovariances of the squared returns from a fit of the (non-stationary) model X_{t,T} = σ(t/T) Z_t for the returns [here Z_t is an i.i.d. sequence and σ(·) is some suitable function; cf. Starica and Granger (2005) or Fryzlewicz et al. (2006) for more details], and from a stationary FARIMA(3, d, 0)-fit for the squared returns X_t^2. Both models are able to explain the observed effect of 'long-range dependence' for the volatility process. So, in summary, the same effect can be explained by two completely different modeling approaches. For this reason several authors have pointed out the importance of distinguishing between long memory and non-stationarity [see Starica and Granger (2005), Perron and Qu (2010) or Chen et al. (2010) to mention only a few]. However, there exists a surprisingly small number of statistical procedures which address problems of this type. To the best of our knowledge, Künsch (1986) is the first reference investigating the existence of "long memory" when non-stationarities appear in the time series. In this article a procedure is derived to discriminate between a long-range dependent model and a process with a monotone mean functional and weakly dependent innovations.
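The effect described above - that an unconditionally heteroscedastic but short-memory model X_{t,T} = σ(t/T) Z_t produces a sample autocovariance function mimicking long-range dependence - is easy to reproduce numerically. The following sketch uses an arbitrary step-shaped σ(·) (a hypothetical choice, not the rolling-window estimate of the paper) and compares the sample ACF of the squared observations with an i.i.d. benchmark:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function, computed under a (possibly wrong) stationarity assumption."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:-k], xc[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
T = 4096
u = np.arange(T) / T
sigma = np.where(u < 0.5, 0.5, 2.0)        # hypothetical time-varying scale sigma(t/T)
x_nonstat = sigma * rng.standard_normal(T)  # non-stationary but short-memory model
x_iid = rng.standard_normal(T)              # stationary i.i.d. benchmark

acf_sq = sample_acf(x_nonstat**2, 100)      # squares show slowly decaying, positive ACF
acf_iid = sample_acf(x_iid**2, 100)         # benchmark ACF fluctuates around zero
print(acf_sq[:5], acf_iid[:5])
```

The persistently positive autocorrelations of the squared observations are entirely caused by the time-varying variance, not by genuine long memory.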
Later on, Heyde and Dai (1996) and Sibbertsen and Kruse (2009) developed methods for distinguishing between long-memory and small trends. [Figure 2 caption, left panel: autocovariance function of a stationary FARIMA(3, d, 0)-fit to the 2048 squared IBM-returns X_t^2; right panel: autocovariance function of X_t^2 for X_t simulated from the model X_{t,T} = σ̂(t/T) Z_t with σ̂(·) estimated by a rolling window of length 128.] Furthermore, Berkes et al. (2006), Baek and Pipiras
(2012) and Yau and Davis (2012) investigated CUSUM and likelihood ratio tests to discriminate between the null hypothesis of weak dependence with one change point in the mean and the alternative of long-range dependence. Although the procedures proposed in these articles are technically mature and work rather well in suitable situations, they are only designed to discriminate between long-range dependence and a very specific change in the first-order structure, such as one structural break with two stationary segments of the series. This is rather restrictive, since the expectation might change in a different way than assumed [there could be, for example, continuous changes or multiple breaks instead of a single one] and the second-order structure could be time-varying as well. If these or more general non-stationarities occur, the discrimination techniques proposed in the literature so far usually fail, and a procedure working under less restrictive assumptions is still missing. The objective of this paper is to fill this gap and to develop a nonparametric test for the null hypothesis of no long-range dependence in a framework which is flexible enough to deal with different types of non-stationarity in both the first- and the second-order structure. The general model is introduced in Section 2. Our approach is based on an estimate of a (possibly time varying) long-range dependence parameter, which is derived from a sieve of approximating tvFARIMA models. This estimator vanishes if and only if the null hypothesis of a short-memory locally stationary process is satisfied. Its asymptotic properties are investigated in Section 3. In particular we prove asymptotic normality of the proposed test statistic under the null hypothesis of no long-range dependence. As a consequence we obtain a nonparametric test which is based on the quantiles of the standard normal distribution and therefore very easy to implement.
The finite sample properties of the new test are investigated in Section 4, which also provides a comparison with the competing procedures with a focus on non-stationarities. We demonstrate the superiority of the new method and also illustrate the application of the method in two data examples. Finally, all technical details are deferred to an appendix.

Locally stationary long-range dependent processes
In order to develop a test for the presence of long-range dependence which can deal with different kinds of non-stationarity, a set-up is required which includes short-memory processes with a rather general time-varying first- and second-order structure as well as a reasonable long-range dependent extension. For this purpose, we consider a triangular scheme ({X_{t,T}}_{t=1,...,T})_{T∈N} of locally stationary long-memory processes which have an MA(∞) representation of the form

X_{t,T} = μ(t/T) + Σ_{l=0}^∞ ψ_{t,T,l} Z_{t−l},   (2.1)

where μ : [0, 1] → R is a "smooth" function and {Z_t}_{t∈Z} are independent standard normally distributed random variables. For the coefficients ψ_{t,T,l} and the function μ in the expansion (2.1) we make the following additional assumptions.
2) The time varying spectral density f : [0, 1] × [−π, π] → R can be represented as f(u, λ) = |1 − e^{−iλ}|^{−2d_0(u)} g(u, λ), where the function g, defined in (2.7), is twice continuously differentiable.
3) There exists a constant C ∈ R_+, which is independent of u and λ, such that for l ≠ 0 the conditions sup_{u∈(0,1)} ... are satisfied.
Similar locally stationary long-range dependent models have been investigated by Beran (2009), Palma and Olea (2010) and Roueff and von Sachs (2011). Note that, in contrast to the standard framework of local stationarity introduced by Dahlhaus (1997) and extended to the long-memory case in Palma and Olea (2010), condition (2.3) is much weaker and allows, for example, the inclusion of tvFARIMA(p, d, q)-models [see Theorem 2.2 in Preuß and Vetter (2013)]. Moreover, it is worthwhile to mention that the assumption of Gaussianity is only imposed to simplify the technical arguments in the proofs of our main results; it is straightforward (but cumbersome) to extend the theory to a more general framework, see Remark 3.9 for more details. The very specific form of the function g in (2.7) implies that the process {X_{t,T}}_{t=1,...,T} can be locally approximated by a FARIMA(∞, d, 0) process in the sense of (2.3). More precisely, we obtain the relation (2.8) between the approximating functions ψ_l(u) and the time-varying AR-parameters [see the proof of Lemma 3.2 in Kokoszka and Taqqu (1995) for more details]. In order to further visualize some properties of these locally stationary long-memory models we introduce, for every fixed u ∈ [0, 1], the stationary process {X_t(u)}_{t∈Z} defined by X_t(u) = μ(u) + Σ_{l=0}^∞ ψ_l(u) Z_{t−l}. One can show that condition (2.4) implies the existence of bounded functions y_i : [0, 1] → R (i = 1, 2) such that the approximations ψ_l(u) ≈ y_1(u) l^{d_0(u)−1} and γ(u, k) ≈ y_2(u) k^{2d_0(u)−1} hold for large l and k [see Palma and Olea (2010) for details]. Consequently, the autocovariance function γ(u, k) = Cov(X_0(u), X_k(u)) is not absolutely summable if the function a(u) in (2.4) does not vanish, and in this case the time varying spectral density f(u, λ) has a pole at λ = 0 for any u ∈ [0, 1] for which d_0(u) is positive.
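Both properties of the local FARIMA approximation - the power-law decay of the MA(∞) coefficients and the spectral pole at the origin - can be checked numerically. The sketch below computes the MA(∞) coefficients of a stationary FARIMA(0, d, 0) process via the standard recursion ψ_l = ψ_{l−1}(l − 1 + d)/l, verifies the asymptotic ψ_l ∼ l^{d−1}/Γ(d), and evaluates the long-memory factor |1 − e^{−iλ}|^{−2d} near λ = 0; the value d = 0.3 is an arbitrary illustration:

```python
import numpy as np
from math import gamma

def farima_ma_coeffs(d, n):
    """MA(infinity) coefficients of FARIMA(0, d, 0): psi_0 = 1, psi_l = psi_{l-1} * (l - 1 + d) / l."""
    psi = np.empty(n + 1)
    psi[0] = 1.0
    for l in range(1, n + 1):
        psi[l] = psi[l - 1] * (l - 1 + d) / l
    return psi

def lm_factor(lam, d):
    """Long-memory factor |1 - e^{-i lambda}|^{-2d}, which has a pole at lambda = 0 for d > 0."""
    return np.abs(1 - np.exp(-1j * lam)) ** (-2 * d)

d = 0.3                                   # illustrative long-memory parameter in (0, 0.5)
psi = farima_ma_coeffs(d, 5000)
# Power-law decay: psi_l * l^{1-d} * Gamma(d) -> 1 as l -> infinity.
ratio = psi[5000] * 5000 ** (1 - d) * gamma(d)
print(ratio, lm_factor(1e-4, d) / lm_factor(0.1, d))
```

For d = 0 the recursion gives ψ_l = 0 for all l ≥ 1, i.e. the short-memory boundary case is degenerate, and the spectral factor is identically one.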
In the framework of these long-range dependent locally stationary processes we now investigate the null hypothesis that the time-varying "long memory" parameter d_0(u) vanishes for all u ∈ [0, 1], i.e. that there is no long-range dependence in the locally stationary process X_{t,T}. Since the function d_0 is continuous and non-negative, these hypotheses can equivalently be formulated as

H_0 : F = 0   versus   H_1 : F > 0,   (2.9)

where the quantity F is defined by

F = ∫_0^1 d_0(u) du.   (2.10)

In the next section we will develop an estimator of the function d_0 and establish its uniform consistency. The integral F is then estimated by a Riemann sum, and we investigate the asymptotic properties of the resulting estimator both under the null hypothesis and the alternative. In particular we prove consistency and asymptotic normality. As a consequence we obtain a consistent level-α test for the presence of long-range dependence in non-stationary time series models by rejecting the null hypothesis for large values of the estimator of F.

Testing short- versus long-memory
In order to estimate the integral F we use a sieve of semi-parametric models approximating the processes {X_t(u)}_{t∈Z} with time varying spectral density (2.6) and proceed in several steps. First we choose an increasing sequence k = k(T) ∈ N, which diverges 'slowly' to infinity as the sample size T grows, and fit a tvFARIMA(k, d, 0) model to the data. To be precise, we consider a locally stationary long-memory model with time varying spectral density f : [0, 1] × [−π, π] → R⁺₀ defined by (3.1), where, for each k ∈ N, θ_k = (d, a_1, . . . , a_k) : [0, 1] → R^{k+1} is a vector valued function. We then estimate the function θ_k(u) by a localized Whittle estimator, that is

θ̂_{N,k}(u) := argmin_{θ_k ∈ Θ_{u,k}} L_N(u, θ_k),   (3.2)

where L_N denotes the (local) Whittle likelihood (3.3) [see Dahlhaus and Giraitis (1998) or Dahlhaus and Polonik (2009)] and, for each u ∈ [0, 1], Θ_{u,k} ⊂ R^{k+1} is a compact set which will be specified in Assumption 3.1. In (3.2) and (3.3) the quantity (3.4) denotes the mean-corrected local periodogram, N is an even window length which is 'small' compared to T, and μ̂ is an asymptotically unbiased estimator of the mean function μ : [0, 1] → R, see Dahlhaus (1997). Here and throughout this paper we use the convention X_{j,T} = 0 for j ∉ {1, ..., T}. We finally obtain an estimator d̂_N(u) of the time-varying long-memory parameter by taking the first component of the (k + 1)-dimensional vector θ̂_{N,k}(u) defined in (3.2). It will be demonstrated in Theorem 3.3 below that this approach results in a uniformly consistent estimator of the time-varying long-memory parameter. For this purpose we define θ_{0,k}(u) := (d_0(u), a_{1,0}(u), ..., a_{k,0}(u)) as the (k + 1)-dimensional vector containing the long memory parameter d_0(u) and the first k AR-parameter functions a_{1,0}(u), ..., a_{k,0}(u) of the approximating process {X_t(u)}_{t∈Z} defined by the representations (2.6) and (2.7). Here and throughout this paper, A_{11} denotes the element in position (1,1) and ‖A‖_{sp} the spectral norm of the matrix A = (a_{ij})_{i,j=1}^k, respectively.
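A minimal numerical sketch of the localized Whittle step is given below. It fits a FARIMA(1, d, 0) spectral shape to the periodogram of one window of length N centred at a rescaled time point u, using a variance-profiled Whittle objective; the window extraction, the crude local mean correction, the profiling and the parameter bounds are simplifications for illustration, not the exact estimator (3.2) of the paper:

```python
import numpy as np
from scipy.optimize import minimize

def local_periodogram(x, center, N):
    """Periodogram of the length-N segment centred at index `center`, with local mean correction."""
    seg = x[center - N // 2 : center + N // 2]
    seg = seg - seg.mean()
    lam = 2 * np.pi * np.arange(1, N // 2) / N   # positive Fourier frequencies
    dft = np.fft.fft(seg)[1 : N // 2]
    return lam, np.abs(dft) ** 2 / (2 * np.pi * N)

def whittle_farima1(x, center, N):
    """Profiled Whittle fit of a FARIMA(1, d, 0) shape on one window; returns (d_hat, a_hat)."""
    lam, I = local_periodogram(x, center, N)

    def shape(theta):
        d, a = theta
        return (np.abs(1 - np.exp(-1j * lam)) ** (-2 * d)
                * np.abs(1 - a * np.exp(-1j * lam)) ** (-2))

    def objective(theta):
        h = shape(theta)
        # innovation variance profiled out of the Whittle likelihood
        return np.log(np.mean(I / h)) + np.mean(np.log(h))

    res = minimize(objective, x0=np.array([0.1, 0.0]),
                   bounds=[(0.0, 0.49), (-0.9, 0.9)], method="L-BFGS-B")
    return res.x

rng = np.random.default_rng(1)
x = rng.standard_normal(2048)             # short-memory data: true d = 0
d_hat, a_hat = whittle_farima1(x, center=1024, N=512)
print(d_hat, a_hat)
```

On short-memory data the fitted memory parameter should be close to zero; in the paper's procedure this fit is repeated on every block with growing order k.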
We state the following technical assumptions.
where the constant D is the same as in Assumption 2.1 and, for each i = 1, . . . , k, Θ_{u,k,i} is a compact set with a finite number (independent of u, k, i) of connected components with positive Lebesgue measure. Let Θ_k denote the space of all four times continuously differentiable functions θ_k : [0, 1] → R^{k+1} with θ_k(u) ∈ Θ_{u,k} for all u ∈ [0, 1]. We assume that the following conditions hold for each k ∈ {k(T), T ∈ N}: (i) The functions g_k in (3.1) are bounded from below by a positive constant (independent of k) and are four times continuously differentiable with respect to λ and u, where all partial derivatives of g_k up to order four are bounded by a constant independent of k.
(iii) Define the matrix Γ_k(θ_k) [here ∇ denotes the derivative with respect to the parameter vector θ_k]; then the matrix Γ_k(θ_{0,k}) is nonsingular for every u ∈ [0, 1], k ∈ {k(T), T ∈ N}, and (3.6) holds as T → ∞. Furthermore, (3.6) is also satisfied if the function θ_{0,k}(u) is replaced by any sequence θ̃_T(u) such that sup_{u∈[0,1]} |θ̃_T(u) − θ_{0,k}(u)| → 0. For such a sequence we additionally assume that condition (3.7) holds. Assumptions (i) and (ii) are rather standard in a semi-parametric locally stationary time series model [see for example Dahlhaus and Giraitis (1998) or Dahlhaus and Polonik (2009) among others]. Note that, in order to be fully nonparametric, it is necessary that the number of parameters k grows with increasing sample size. In this case the restriction on the spectral norm in part (iv) was verified for a large number of long-range dependent models by Bhansali et al. (2006) [see equation (4.4) in this reference]. While these assumptions solely depend on the "true" underlying model, the following assumption links the growth rates of k and N as the sample size T increases, when the spectral density f(u, λ) in (2.6) is approximated by the truncated analogue, for some 0 < ε < 1/6, as T → ∞.
It follows by similar arguments as given in the proof of Lemma 2.4 in Kreiß et al. (2011). As a consequence, Assumption 3.1(iii) is rather intuitive, because the parametric model (3.1) can be considered as an approximation of the "true" model defined in terms of the time varying spectral density (2.5). We finally note that condition (3.7) is satisfied for a large number of tvFARIMA(p, d, q) models, because it can be shown by similar arguments as in the proof of Theorem 2.2 in Preuß and Vetter (2013) that the coefficients a_{j,0}(u) are geometrically decaying. This yields Σ_{j=k+1}^∞ sup_u |a_{j,0}(u)| = O(q^k) for some q ∈ (0, 1), resulting in a logarithmic growth rate for k, which is in line with the findings of Bhansali et al. (2006). Similarly, one can include processes whose AR coefficients decay such that Σ_{j=0}^∞ sup_u |a_{j,0}(u)| j^r < ∞ is satisfied for some r ∈ N_0; in this case k needs to grow at some specific polynomial rate. Our first main result states a uniform convergence rate for the difference between θ̂_{N,k}(u) and its true counterpart θ_{0,k}(u). As a consequence it implies that the estimator d̂_N obtained by sieve estimation is uniformly consistent for the (time varying) long-range dependence parameter of the locally stationary process.

Theorem 3.3. Let Assumptions 2.1, 3.1 and 3.2 be satisfied and suppose that the estimator of the mean function μ satisfies (3.10). In particular, the estimator d̂_N(u) is then uniformly consistent.

Remark 3.4. It follows from the proof of Theorem 3.7 below that the local window estimator μ̂ with window length N satisfies this condition. Because we use tvFARIMA(k, d, 0) models in (3.1), we can choose a logarithmic rate for the dimension k [see the discussion following (3.8)]. Consequently, for the local window estimate the uniform rate in equation (3.10) is arbitrarily close to the factor N^{D−1/2}.
In order to obtain an estimator of the quantity F in (2.10) we assume without loss of generality that the sample size T can be decomposed into M blocks of length N (i.e. T = N M), where M is some positive integer. We define the corresponding midpoints in the time and rescaled time domain by t_j = (N − 1)j + N/2 and u_j = t_j/T, respectively, and calculate d̂_N(u_j) on each of the M blocks as described in the previous paragraph. The test statistic is then obtained as the Riemann sum

F̂_T = (1/M) Σ_{j=1}^M d̂_N(u_j).

The following two results specify the asymptotic behaviour of the statistic F̂_T under the null hypothesis and the alternative.
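The aggregation step is simple bookkeeping: split the sample into M blocks of length N, evaluate the local estimate at each block midpoint, and average. The sketch below uses a simplified midpoint convention (not the exact t_j of the paper) and a stub in place of d̂_N; note that the midpoint rule integrates a linear long-memory function such as d_0(u) = 0.1 + 0.3u (used in Section 4) exactly:

```python
import numpy as np

def block_midpoints(T, M):
    """Rescaled-time midpoints u_j of M consecutive blocks of length N = T // M (simplified convention)."""
    N = T // M
    assert N * M == T, "T must decompose as T = N * M"
    t = N * np.arange(M) + N / 2          # midpoint of each block in real time
    return t / T

def F_hat(d_estimates):
    """Riemann-sum estimator of F = int_0^1 d_0(u) du from blockwise estimates."""
    return float(np.mean(d_estimates))

T, M = 1024, 8
u = block_midpoints(T, M)
d0 = 0.1 + 0.3 * u                        # stub for the blockwise estimates d_hat_N(u_j)
print(F_hat(d0))
```

Here F̂ equals ∫_0^1 (0.1 + 0.3u) du = 0.25 up to floating-point error, because the midpoint rule is exact for linear integrands.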
Theorem 3.5. Assume that the null hypothesis H_0 (of no long-range dependence) is true. Let Assumptions 2.1, 3.1 and 3.2 be satisfied, define W_T² = [∫_0^1 (...) du]_{1,1}, and suppose that the estimator μ̂ of the mean function satisfies max_{t=1,...,T}

(3.12)
where 0 < ε < 1/6 is the constant in Assumption 3.2.

Theorem 3.6. Assume that the alternative H_1 of long-range dependence is true. Let Assumptions 2.1, 3.1 and 3.2 be satisfied and suppose that the estimator μ̂ of the mean function satisfies the analogous conditions, where 0 < ε < min{1/2 − D, 1/6} is the constant in Assumption 3.2. Note that the term W_T in the denominator of the left-hand side of (3.13) can be consistently estimated by Ŵ_T, such that Ŵ_T/W_T → 1 in probability. Consequently, an asymptotic level-α test is obtained by rejecting the null hypothesis (2.9) whenever

√T F̂_T / Ŵ_T > u_{1−α},   (3.15)

where u_{1−α} denotes the (1 − α)-quantile of the standard normal distribution. It then follows from Theorems 3.5 and 3.6 that, for any estimator of the mean function μ satisfying (3.11), (3.12) and (3.14), the test which rejects H_0 whenever (3.15) is satisfied is a consistent level-α test for the null hypothesis stated in (2.9). A popular estimate for the mean function is given by the local-window estimator μ̂_L defined in (3.16), where L is a window length which does not necessarily coincide with the corresponding parameter in the calculation of the local periodogram. Note that I_N^{μ̂}(u, λ) is also an asymptotically unbiased estimator of f(u, λ) if N → ∞ and N/T → 0. The final result of this section shows that this estimator satisfies the assumptions of Theorems 3.5 and 3.6 if L grows at a 'slightly' faster rate than N. This means it can be used in the asymptotic level-α test defined by (3.15).

Theorem 3.7. a) Suppose that the assumptions of Theorem 3.5 hold and that additionally N^{1+4ε}/L^{1−δ} → 0 and L^{5/2−δ}/T^{3/2} → 0 are satisfied for some δ > 0, where ε > 0 denotes the constant in Theorem 3.5. Then the local-window estimator μ̂_L defined in (3.16) satisfies (3.11) and (3.12).
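Given the block average F̂_T and a variance estimate Ŵ_T, the decision rule only requires a standard normal quantile. A minimal sketch (the numerical inputs are illustrative, not taken from the paper's tables):

```python
from scipy.stats import norm

def lrd_test(F_hat, W_hat, T, alpha=0.05):
    """Reject H0 (no long-range dependence) if sqrt(T) * F_hat / W_hat exceeds u_{1-alpha}."""
    stat = (T ** 0.5) * F_hat / W_hat
    crit = norm.ppf(1 - alpha)            # u_{1-alpha}, standard normal quantile
    return stat, crit, stat > crit

# Illustrative numbers:
stat, crit, reject = lrd_test(F_hat=0.12, W_hat=1.0, T=1024)
print(stat, crit, reject)
```

With these inputs the statistic is 32 · 0.12 = 3.84, which exceeds u_{0.95} ≈ 1.645, so the null hypothesis would be rejected at the 5% level.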
Remark 3.8. Analogues of Theorems 3.5 and 3.6 can be obtained in a parametric framework. To be precise, assume that the approximating process {X_t(u)}_{t∈Z} has a time varying spectral density of the form (3.1), where k is fixed and known. In this case it is not necessary that the dimension k increases with the sample size T, and Assumptions 3.1(iii) and 3.2 are not required. All other assumptions are rather standard in this framework of a semi-parametric locally stationary time series model [see for example Dahlhaus and Giraitis (1998) or Dahlhaus and Polonik (2009) among others]. With these modifications Theorems 3.5 and 3.6 remain valid, and as a consequence we obtain an alternative to the likelihood ratio test proposed in Yau and Davis (2012), which operates in the spectral domain and can be used for more general null hypotheses than those considered by these authors.
Remark 3.9. It is worthwhile to mention that the assumption of Gaussianity for the innovation process in Assumption 2.1 is not necessary at all and is only imposed here to simplify technical arguments in the proof of Theorem 5.1. In fact, all results of this section remain true as long as the innovations are independent with all moments existing, mean zero and E(Z²_{t,T}) = σ²(t/T) for some twice continuously differentiable function σ : [0, 1] → R.
To be more precise, in order to account for non-Gaussian innovations, the variance V_T in Theorem 5.1 (which is one of the main ingredients for the proofs in Section 5) has to be replaced by a version involving the fourth cumulant, where V_T is defined in (5.5) and κ_4(u) denotes the fourth cumulant of the innovations, i.e. κ(t/T) = E(Z⁴_{t,T}) − 3σ⁴(t/T). In the proof of Theorem 3.5, we must then replace Ŵ_T in the decision rule (3.15) by Ŵ_{T,general}, where σ̂(u_j) and κ̂(u_j) are obtained by calculating the empirical second and fourth moments μ̂_{2,Z}(u_j), μ̂_{4,Z}(u_j) of the residuals.
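The moment corrections of this remark are straightforward to compute from a block of residuals. The sketch below estimates the empirical second and fourth moments and the fourth cumulant κ = E(Z⁴) − 3σ⁴; for Gaussian innovations the cumulant estimate is close to zero, for non-Gaussian ones it is not (the blockwise plumbing around Ŵ_{T,general} is omitted):

```python
import numpy as np

def local_cumulant4(z):
    """Empirical second/fourth moments and fourth cumulant of a residual block."""
    z = np.asarray(z, dtype=float)
    m2 = np.mean(z ** 2)                  # empirical second moment
    m4 = np.mean(z ** 4)                  # empirical fourth moment
    kappa4 = m4 - 3.0 * m2 ** 2           # fourth cumulant estimate
    return m2, m4, kappa4

rng = np.random.default_rng(2)
_, _, k_gauss = local_cumulant4(rng.standard_normal(100_000))
_, _, k_unif = local_cumulant4(rng.uniform(-1.0, 1.0, 100_000))
print(k_gauss, k_unif)   # near 0 for Gaussian, near -2/15 for uniform(-1, 1)
```

For the uniform(−1, 1) distribution the population values are m2 = 1/3, m4 = 1/5, hence κ = 1/5 − 3/9 = −2/15.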

Finite sample properties
The application of the test (3.15) requires the choice of several parameters. Based on an extensive numerical investigation we recommend the following rules. For the parameter L in the local window estimator μ̂_L of the mean function [for a precise definition see (3.16)] we use L = N^{1.05}. Because the procedure is based on a sieve of approximating tvFARIMA(k, d, 0)-processes, the choice of the order k is essential, and we suggest an AIC criterion for this purpose, where λ_j = 2πj/T (j = 1, . . . , T), f_{θ̂_{k,s}}(λ) is the estimated spectral density of a stationary FARIMA(k, d, 0) process and I^{μ̂_L}(λ) is the mean-corrected periodogram. Finally, the performance of the test depends on the choice of N, and this dependency will be investigated in the following discussion.
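The order-selection step can be illustrated with a simplified AIC-type criterion. Instead of the Whittle-based criterion used in the paper, the sketch below selects the order k of a plain autoregressive approximation via Yule-Walker fitting and the usual penalty 2(k + 1); it is only a stand-in showing the mechanics of comparing nested orders:

```python
import numpy as np
from scipy.linalg import toeplitz

def yule_walker_sigma2(x, k):
    """Innovation variance of an AR(k) Yule-Walker fit (biased autocovariances)."""
    x = x - np.mean(x)
    T = len(x)
    r = np.array([np.dot(x[: T - h], x[h:]) / T for h in range(k + 1)])
    a = np.linalg.solve(toeplitz(r[:k]), r[1 : k + 1])
    return r[0] - a @ r[1 : k + 1]

def select_order_aic(x, k_max=8):
    """AIC-type order selection: minimise T * log(sigma2_k) + 2 * (k + 1) over k = 1..k_max."""
    T = len(x)
    aic = [T * np.log(yule_walker_sigma2(x, k)) + 2 * (k + 1) for k in range(1, k_max + 1)]
    return 1 + int(np.argmin(aic))

rng = np.random.default_rng(3)
z = rng.standard_normal(2000)
x = np.zeros(2000)
for t in range(2, 2000):                   # AR(2): x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + z_t
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + z[t]
k_hat = select_order_aic(x)
print(k_hat)
```

On AR(2) data the criterion should select an order of at least two, with occasional mild over-selection, which is the well-known behaviour of AIC.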

Simulation results
All results presented in this section are based on 1000 simulation runs, and we begin with an investigation of the approximation of the nominal level of the test (3.15), considering three examples. The first example is given by a tvAR(1)-process (4.1), where {Z_{t,T}}_{t=1,...,T} denotes a Gaussian white noise process with variance 1. Two cases are investigated for the mean function, representing a smooth change (4.2) and an abrupt change (4.3) in the mean, respectively.
Our third example consists of a tvMA(1)-process (4.4), where {Z_{t,T}}_{t=1,...,T} again denotes a Gaussian white noise process with variance 1. Figure 3 shows the autocovariance functions of 1024 observations generated by the models (4.2), (4.3) and (4.4), respectively, from which it is clearly visible that the mean functions in (4.2) and (4.3) cause a long-memory type behaviour of the autocovariance functions [see the left and middle panel in Figure 3]. In Table 1 we show the simulated level of the test (3.15) for various choices of N. We observe a reasonable approximation of the nominal level whenever N/T ≈ 1/4 and the sample size T is larger than or equal to 512. Note that even for model (4.1) with mean function (4.3) the level is only slightly overestimated, although this mean function is not twice continuously differentiable as required by the asymptotic theory. We conjecture that the performance of the test could be improved by using estimators adapted to such jumps. In order to investigate the power of the test (3.15) and to compare it with the competing procedures proposed by Berkes et al. (2006), Baek and Pipiras (2012) and Yau and Davis (2012), we simulated data from a tvFARIMA(1, d, 0)-model (4.5) and a tvFARIMA(0, d, 1)-model (4.6), where {Z_{t,T}}_{t=1,...,T} denotes a Gaussian white noise process with variance 1 and B is the backshift operator given by B^j X_{t,T} := X_{t−j,T}. In both cases the long-memory function is given by d(t/T) = 0.1 + 0.3 t/T. Because all competing procedures are designed to detect stationary long-range dependent alternatives, we also simulated data from a stationary FARIMA(1, d, 1)-process (4.7). The corresponding results for the new test (3.15) are shown in the second column of Tables 2, 3 and 4, and we observe reasonable rejection frequencies in the first two cases. Interestingly, the differences in power between the tvFARIMA(1, d, 0)- and the tvFARIMA(0, d, 1)-model are rather small (see the second column of Tables 2 and 3).
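The displayed model equations (4.1)-(4.4) were lost in extraction and are not reproduced here, but the qualitative point of the level study - that a non-constant mean in a short-memory process induces a long-memory-like sample ACF - can be reproduced with a generic tvAR(1) sketch (the coefficient and mean functions below are arbitrary illustrations, not the paper's specifications):

```python
import numpy as np

def sample_acf(x, max_lag):
    xc = x - np.mean(x)
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:-k], xc[k:]) / denom for k in range(1, max_lag + 1)])

def tv_ar1(T, mean_fn, a=0.3, seed=4):
    """Short-memory tvAR(1) around a time-varying mean: X_t = mu(t/T) + Y_t, Y_t = a Y_{t-1} + Z_t."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = a * y[t - 1] + rng.standard_normal()
    u = np.arange(T) / T
    return mean_fn(u) + y

T = 2048
x_abrupt = tv_ar1(T, lambda u: np.where(u < 0.5, 0.0, 2.0))  # abrupt mean change
x_const = tv_ar1(T, lambda u: np.zeros_like(u))              # constant mean benchmark

acf_abrupt = sample_acf(x_abrupt, 100)
acf_const = sample_acf(x_const, 100)
# The mean shift keeps the sample ACF large at long lags (spurious long memory),
# while the constant-mean AR(1) ACF decays geometrically to zero.
print(acf_abrupt[49], acf_const[49])
```

A level-α test for H_0 should not be fooled by such mean changes, which is exactly what Table 1 of the paper examines.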
The results in Table 4 show a loss in power, which corresponds to intuition because the "average" long-memory effect in model (4.7) is 0.1, while it is ∫_0^1 (0.1 + 0.3u) du = 0.25 in models (4.5) and (4.6). In order to compare our new test with existing approaches, we next investigate the performance of the procedures proposed by Berkes et al. (2006), Baek and Pipiras (2012) and Yau and Davis (2012), which are designed to test the null hypothesis "the process has the short memory property with a structural break in the mean" against the alternative "the process has the long memory property". The third column of Tables 2, 3 and 4 shows the power of the test of Baek and Pipiras (2012), which also operates in the spectral domain. These authors estimate the change in the mean with a break point estimator and remove this mean effect (which is responsible for the observed local stationarity) from the time series. Then they calculate the local Whittle estimator introduced by Robinson (1995) for the self-similarity parameter and reject the null hypothesis for large values of this estimate. Note that the calculation of the local Whittle estimator requires the specification of the number of "low frequencies", and we used m = √T as Baek and Pipiras (2012) suggested in their simulation study. We observe that the new test (3.15) yields larger power than the procedure of Baek and Pipiras (2012) in nearly all cases under consideration. This improvement becomes more substantial with increasing sample size. Next we study the performance of the procedure proposed by Berkes et al. (2006) in the models (4.5)-(4.7). These authors use a CUSUM statistic to construct an estimator, say k̂*, for a (possible) change point k* in a time series. Then two CUSUM statistics are computed for the first k̂* elements of the time series and for the remaining ones, respectively. The test statistic is given by the maximum of those two.
For the choice of the bandwidth function we used q(n) = 15 log_{10}(n), as suggested by these authors in Section 3 of their article. The results are depicted in the fourth column of Tables 2, 3 and 4. From these we see that their test is not able to detect long-range dependence in either the stationary or the locally stationary case. These findings coincide with the results of Baek and Pipiras (2012), who also remarked that the test of Berkes et al. (2006) has very low power against long-range dependent alternatives. Finally, we investigate the method proposed by Yau and Davis (2012), which consists of a parametric likelihood ratio test assuming two (not necessarily equal) ARMA(p, q) models before and after the breakpoint of the mean function. Their method requires a specification of the orders of these two models, and we used ARMA(1, 1)-models under the null hypothesis and a FARIMA(1, d, 1) model under the alternative. The corresponding results are depicted in the fifth column of Tables 2, 3 and 4, corresponding to non-stationary and stationary long-range dependent alternatives, respectively. We observe that the new test (3.15) outperforms the test proposed in Yau and Davis (2012) if the sample size is larger than 512 and that both tests have similar power for sample size 256 (see the fifth column of Tables 2 and 3). On the other hand, in the case of the long-range dependent stationary alternative (4.7), the test of Yau and Davis (2012) yields slightly better rejection probabilities than the new test (3.15) for smaller sample sizes, while we observe advantages of the test proposed in this paper for sample sizes 512 and 1024. These results are remarkable, because the test of Yau and Davis (2012) is specifically designed to detect stationary alternatives of FARIMA(1, d, 1) type, but the nonparametric test still yields an improvement.
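The local Whittle estimator of Robinson (1995) used by Baek and Pipiras (2012) above can be sketched in a few lines: with m low frequencies (m = √T as in their simulation study), d̂ minimises the profiled objective R(d) = log(m⁻¹ Σ_j λ_j^{2d} I(λ_j)) − 2d m⁻¹ Σ_j log λ_j. The sketch applies it to white noise, where d̂ should be close to zero:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def local_whittle_d(x, m):
    """Robinson's local Whittle estimate of the memory parameter from m low frequencies."""
    T = len(x)
    lam = 2 * np.pi * np.arange(1, m + 1) / T
    dft = np.fft.fft(x - np.mean(x))[1 : m + 1]
    I = np.abs(dft) ** 2 / (2 * np.pi * T)

    def R(d):
        return np.log(np.mean(lam ** (2 * d) * I)) - 2 * d * np.mean(np.log(lam))

    res = minimize_scalar(R, bounds=(-0.49, 0.49), method="bounded")
    return res.x

rng = np.random.default_rng(5)
T = 1024
x = rng.standard_normal(T)                    # short memory: true d = 0
d_hat = local_whittle_d(x, m=int(np.sqrt(T))) # m = sqrt(T) low frequencies
print(d_hat)
```

The asymptotic standard deviation of this estimator is 1/(2√m), so with m = 32 values of d̂ within roughly ±0.2 of zero are typical under short memory.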

Data examples
As an illustration we apply our test to two different datasets. The first contains the annual flow volume of the Nile River at Aswan, Egypt, between the years 1871 and 1970, while the second contains 2048 squared log-returns of the IBM stock between July 15th 2005 and August 30th 2013, which were already discussed in the introduction. Both time series are depicted in Figure 4. In the case of the Nile River data our test statistic √T F̂_T / Ŵ_T equals -1.9 for M = 4, which is far below every reasonable critical value and yields a p-value of 0.971. This implies that the null hypothesis of a non-stationary short-memory model cannot be rejected for this dataset, which is in line with the findings of Yau and Davis (2012), who obtained p-values larger than 0.7 for their likelihood ratio approach and the CUSUM procedure of Berkes et al. (2006). The test of Baek and Pipiras (2012) does not reject the null hypothesis either, since the p-value equals 0.944. In the situation of the squared log-returns of the IBM stock, the assumption of Gaussianity is too restrictive and we therefore apply the more general test described in Remark 3.9. The values of the test statistic √T F̂_T / Ŵ_{T,general} are 5.67 and 9.48 for M = 4 and M = 8, respectively, yielding a p-value smaller than 2.87 · 10^{−7} for both choices of the segmentation. This means that the assumption of no long-range dependence is clearly rejected. If we apply the likelihood ratio test of Yau and Davis (2012) to this dataset, we obtain a value of 15.77 for the statistic, which is then compared with the quantiles of the standard normal distribution. This yields an even smaller p-value. On the other hand, the CUSUM procedure of Berkes et al. (2006) only rejects the null hypothesis of no long-range dependence at the 10% but not at the 5% level. This observation is, however, not surprising given the low power of this test in the finite sample situations presented in the previous section.
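The p-values reported above follow directly from the standard normal calibration of the test: p = 1 − Φ(statistic). A quick check of the reported numbers:

```python
from scipy.stats import norm

def p_value(stat):
    """One-sided p-value of the test statistic under the standard normal calibration."""
    return norm.sf(stat)      # 1 - Phi(stat), numerically stable in the tail

p_nile = p_value(-1.9)        # Nile data, M = 4
p_ibm_m4 = p_value(5.67)      # IBM squared log-returns, M = 4
p_ibm_m8 = p_value(9.48)      # IBM squared log-returns, M = 8
print(p_nile, p_ibm_m4, p_ibm_m8)
```

This reproduces the p-value of 0.971 for the Nile data and confirms that both IBM statistics lie far beyond the 2.87 · 10^{−7} bound quoted in the text.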
The test of Baek and Pipiras (2012) rejects the null hypothesis with a p-value of 8.65 · 10^{−12}, yielding the same conclusion as our approach and the one of Yau and Davis (2012).
on the sample size T and define, where I_N^μ is the analogue of the local periodogram (3.4) in which the estimator μ̂ has been replaced by the "true" mean function μ.
Theorem 5.1. a) Let Assumption 2.1 be fulfilled and assume that φ_T(u, λ) : [0, 1] × [−π, π] → R is symmetric in λ and twice continuously differentiable with uniformly bounded partial derivatives, such that (5.1) holds for all u ∈ [0, 1], λ ∈ [−π, π], where C > 0 and 0 < ε < 1/2 − D are constants and g : N → (0, ∞) is a given function. Then (5.4) holds. b) Suppose the assumptions of part a) hold with D = 0, ε < 1/6 and additionally lim inf_{T→∞} V_T ≥ c.

Proof: In order to prove part a) of Theorem 5.1 we define t̃_j := t_j − N/2 + 1, ψ̃_l(u_{j,p}) := ψ_l((t̃_j + p)/T), Z_{a,b} := Z_{a−N/2+1+b} and obtain a decomposition into the terms E_{N,T}, A_{N,T} and B_{N,T}. Note that B_{N,T} and A_{N,T} comprise the errors arising in the approximation of ψ_{t̃_j+p,T,l} by ψ_l((t̃_j + p)/T) and of ψ_m(u_{j,q}) by ψ_m(u_j), respectively. In order to establish the claim (5.4), we prove the statements (5.6)-(5.8).

Proof of (5.6): Due to the independence of the random variables Z_t, we only need to consider terms fulfilling p = q + l − m (this means 0 ≤ q + l − m ≤ N − 1 because of p ∈ {0, 1, 2, . . . , N − 1}), which in turn implies |l − m| ≤ N − 1. Using (2.4), (5.1) and Lemma 6.2 in the online supplement, we obtain a bound in which terms corresponding to l = 0 or m = 0 are of smaller or the same order (we will use this property frequently from now on without further mention). We set h := l − m and conclude with Lemma 6.1a) in the online supplement. By proceeding analogously we obtain that E^2_{N,T} = O(g(k) N^{−1+ε}), which proves the assertion in (5.6).
Proof of (5.7): Without loss of generality we only consider the first summand in A_{N,T} (the second term is treated in exactly the same way). A Taylor expansion and similar arguments as in the proof of (5.6) yield the required bound, with η_{m,j,q} ∈ (u_j − N/(2T), u_j + N/(2T)). Using (2.4), (2.8), (5.1) and Lemma 6.2, it follows that the first term is of the claimed order, where we used Lemma 6.1(c) in the online supplement for the last step. Finally, (2.4), (2.8), (5.1), Lemma 6.2 in the online supplement and the same arguments as above show that the term A^2_{N,T} is of order O(g(k) N² T^{−2}).
Proof of (5.8): By employing (2.3) and the same arguments as above, it can be shown that B_{N,T} is of the stated order. In the next step we prove the asymptotic representation of the variance in (5.5), where we use assumption (2.3) and similar arguments as given in the proof of (5.4). Because of the Gaussianity of the innovations, the calculation of the (dominating part of the) variance splits into two sums, say V^1_{N,T} and V^2_{N,T}. In the following discussion we will show that both terms converge to the same limit. For the sake of brevity we restrict ourselves to the case i = 1. Because of the independence of the innovations Z_t, the conditions p = r + l − n + (j_2 − j_1)N and s = q + o − m + (j_1 − j_2)N must hold, which, because of p, s ∈ {0, ..., N − 1}, directly implies |l − n + (j_2 − j_1)N| ≤ N − 1 and |o − m + (j_1 − j_2)N| ≤ N − 1. Thus the term V^1_{N,T} can be rewritten accordingly. Since q ∈ {0, 1, 2, . . . , N − 1}, we get from the condition 0 ≤ q + o − m + (j_1 − j_2)N ≤ N − 1 that, if q, o, m, j_1 are fixed, there are at most two possible values of j_2 for which the corresponding term does not vanish. It follows from Lemma 6.3 (i)-(iii) in the online supplement that an error of order O(g²(k) T^{−1} N^{−1+2D+2ε}) appears if we drop the condition 0 ≤ r + l − n + (j_2 − j_1)N ≤ N − 1 and assume that the variable r runs from −(N − 1) to −1. Therefore, up to an error of this order, we show (5.9), which then concludes the proof of (5.5). For this purpose we begin with an investigation of the term D_{1,T}, for which the terms in the sum vanish whenever r − q + m − o + (j_2 − j_1)N ≠ 0. Moreover, the following facts hold: I. The variable r runs from 0 to N − 1, since r − q + m − o + (j_2 − j_1)N = 0 and 0 ≤ q + o − m + (j_1 − j_2)N ≤ N − 1.
III. There appears an error of order O(g^2(k) T^{−1} N^{−1+2D+2ε}) if we omit the sums with j_1 ≠ j_2 [we prove this in Lemma 6.3(v) in the online supplement].
IV. We can afterwards omit the condition 0 ≤ q + o − m ≤ N − 1, since it is implied by 0 ≤ r ≤ N − 1 and r − q + m − o = 0 [note that, because of III., we assume j_1 = j_2 from now on].
Thus, using the representation of f(u_{j_1}, λ) in (2.5), the term D_{1,T} can be written as (up to an error of order …)
With Parseval's identity we get, while Lemma 6.2 in the online supplement yields (up to a constant) the inequalities, which proves (5.9). We now consider the term D_{2,T}. Here D^{(1)}_{2,T} corresponds to the sum over all r and vanishes by Parseval's identity, and D^{(2)}_{2,T} stands for the resulting error term, which is of order O(T^{−1} g^2(k) N^{−1+2D+2ε}) because of Lemma 6.3 (vi) in the online supplement. For a proof of this statement we proceed (with a slight modification) analogously to the proof of Theorem 6.1 c) in Preuß and Vetter (2013). Note that these authors work with functions φ_T satisfying (5.11) in the integrated case. The authors then derive the exact same order as in (5.10), with the only difference that ε = 0 and g(k) ≡ 1. In our situation, assumption (5.1) and Lemma 6.2 in the online supplement imply the bound (5.12), and we can therefore proceed completely analogously to the proof of Theorem 6.1 c) in Preuß and Vetter (2013), using (5.12) instead of (5.11). The details are omitted for the sake of brevity. □
For the formulation of the next result we define the set G_T(s, ·) = {φ_T : [−π, π] → R | φ_T is symmetric, there exists a polynomial P of degree … and a constant d …} and state the following result.
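The Parseval steps above ultimately rest on the orthogonality of the Fourier frequencies λ_p = 2πp/N. The snippet below is a minimal numerical illustration of this standard identity only, not of the actual displays (which involve the model quantities):

```python
import cmath

# Orthogonality of the Fourier frequencies lambda_p = 2*pi*p/N:
# the sum equals N if r = s (mod N) and 0 otherwise.
def fourier_sum(N, r, s):
    return sum(cmath.exp(-1j * 2 * cmath.pi * p * (r - s) / N)
               for p in range(N))
```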
By an application of Markov's inequality and a straightforward but cumbersome calculation [see the proof of Lemma 2.3 in Dahlhaus (1988) for more details] this yields
The statement (5.14) then follows with the extension of the classical chaining argument described in Dahlhaus (1988) if we show that the corresponding covering integral of Φ*_T with respect to the semi-metric Δ_{T,ε} is finite. More precisely, the covering number N_T(u) of Φ*_T with respect to Δ_{T,ε} is equal to one for u ≥ A N^{−ε/2} and bounded by T^C (qk)^2 u^{−qk} N^{−qkε/2} for some constant C for u < A N^{−ε/2} [see Chapter VII.2 of Pollard (1984) for a definition of covering numbers]. This implies that the covering integral J_T(δ) = ∫_0^δ log(48 N_T(u)^2 u^{−1})^2 du is, up to a constant, bounded by k^4 log^2(T) N^{−ε/2}. The assertion follows by the assumptions on k and N. □
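The order k^4 log^2(T) N^{−ε/2} for the covering integral can be made plausible as follows. This is only a sketch under the covering-number bound stated above; constants and the exact power of k are not tracked.

```latex
% For u >= A N^{-eps/2} the covering number is one, so that range contributes
% nothing.  For u < A N^{-eps/2} the bound
% N_T(u) <= T^C (qk)^2 u^{-qk} N^{-qk eps/2} gives
\log\bigl(48\,N_T(u)^2 u^{-1}\bigr) \lesssim qk \bigl(\log T + \log(1/u)\bigr),
% and hence, using \int_0^\delta \log^2(1/u)\,du = O(\delta \log^2(1/\delta)),
J_T(\delta) \lesssim (qk)^2 \int_0^{A N^{-\varepsilon/2}}
    \bigl(\log T + \log(1/u)\bigr)^2 \, du
  = O\bigl(k^2 \log^2(T)\, N^{-\varepsilon/2}\bigr),
% which is dominated by the stated bound k^4 \log^2(T)\, N^{-\varepsilon/2}.
```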

Proof of Theorem 3.3
Introducing the notation, we obtain with the same arguments as given in the proof of Theorem 3.6 in Dahlhaus (1997)
max_{t=1,…,T} … (u, λ). By proceeding as in the proof of Theorem 5.2 one verifies, and analogously we get
We will now derive a refinement of this statement. By an application of the mean value theorem there exist vectors ζ_u^{(k)}, and the first term on the left-hand side vanishes due to (5.17). This yields, where E_T denotes the difference between ∇L^{μ}_{N,k}(θ_{0,k}(u), u) and ∇L^{μ̂}_{N,k}(θ_{0,k}(u), u), which is of order max_{t=1,…,T} |μ(t/T) − μ̂(t/T)| O_p(k^{1/2} N^{ε}) by (5.16). It follows from this and Theorem 5.2 that max_{u∈{1/T,…,1}} ‖∇L^{μ}_{N,k}(θ_{0,k}(u), u)‖_2 = O_p(√k N^{−1/2+ε/2}), so it remains to show
P(∇²L^{μ̂}_{N,k}(ζ_u^{(k)}, u)^{−1} exists and ‖∇²L^{μ̂}_{N,k}(ζ_u^{(k)}, u)^{−1}‖_{sp} ≤ Ck for all u ∈ {1/T, …, 1}) → 1
for some positive constant C. This, however, follows with a Taylor expansion, (5.17), Theorem 5.2 and Assumption 3.1 (iv) for the corresponding expression with μ̂ replaced by μ. The more general case is then implied by the convergence assumptions on μ̂. □
W_T in Theorems 5.1 and 3.5, (3.6) yields V_T/W_T → 1. Consequently, under the assumptions of Theorem 3.5, it follows (observing (3.8) and the growth conditions on N)
Since d_0(u) is the first element of the vector θ_{0,k}(u), Theorem 3.5 is a consequence of the fact that (1/M) Σ_{j=1}^M d_0(u_j) = F + O(M^{−2}) [this can be proved by a second order Taylor expansion] if we are able to show that
Analogously, Theorem 3.6 follows from (5.4) and (5.5) if the estimates can be established. It can be shown analogously to the proof of Theorem 3.6 in Dahlhaus (1997) that, under assumptions (3.11)-(3.12), both terms R_{1,T} and R_{2,T} are of order O_p(k^2 N^{−ε} T^{−1/2} + k^2 N^{ε−1}), while, under assumption (3.14), the order is o_p(1) [see the proofs of (5.23) and (5.15), respectively, for more details]. Therefore it only remains to consider the quantities R_{3,T} and R_{4,T}.
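The mean-value step used in the refinement above is the standard one; the following display is a sketch reconstructed in the notation of this proof (assuming θ̂_k(u) denotes the minimizer of the criterion, an assumed piece of notation), not a copy of the original equation.

```latex
% Expanding the score of the criterion based on \hat\mu around \theta_{0,k}(u):
% for some \zeta_u^{(k)} between \hat\theta_k(u) and \theta_{0,k}(u),
\nabla L^{\hat\mu}_{N,k}\bigl(\hat\theta_k(u),u\bigr)
  - \nabla L^{\hat\mu}_{N,k}\bigl(\theta_{0,k}(u),u\bigr)
  = \nabla^2 L^{\hat\mu}_{N,k}\bigl(\zeta_u^{(k)},u\bigr)
    \bigl(\hat\theta_k(u)-\theta_{0,k}(u)\bigr).
% Since the first term vanishes by (5.17), invertibility of the Hessian yields
\hat\theta_k(u)-\theta_{0,k}(u)
  = -\,\nabla^2 L^{\hat\mu}_{N,k}\bigl(\zeta_u^{(k)},u\bigr)^{-1}
    \nabla L^{\hat\mu}_{N,k}\bigl(\theta_{0,k}(u),u\bigr),
% so the uniform bound \|\nabla^2 L^{\hat\mu}_{N,k}(\zeta_u^{(k)},u)^{-1}\|_{sp}
% \le Ck transfers the rate of the score to the estimator.
```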
For this purpose note that ∂θ_{j,s} ∂θ_{j,r}, r, s, t = 1, …, k+1, dλ, where, in the last inequality, we have used the fact that the second and third term in (5.22) are bounded by a constant [this follows directly from Assumption 3.1]. Before we investigate the order of this expression, we derive a similar bound for the term R_{4,T}. Observing (5.21) we obtain
If we show max_{j=1,…,M} sup_{θ_k∈Θ_{R,k}}, r, s, t = 1, …, k, 1/(4π),
it follows with Assumption 3.1 (iv) in combination with (5.23) that
= O_p(k^{1/2} N^{−ε} T^{−1/2} + k^{1/2} N^{ε−1}) = o_P(N^{−1/2+ε/2} k^{1/2}) (5.24)
under the null hypothesis. By using (5.23) and (5.24) instead of (5.15) and (5.16), assertion (5.18) follows by the same arguments as given in the proof of Theorem 3.3.
Proof of Theorem 3.7: We obtain, where we used the independence of the innovations, (2.3) and (2.4), and the last inequality follows by replacing the sums by their corresponding approximating integrals and holds for some positive constant C (which is independent of l and may vary in the following arguments). This yields that μ̂_L(t/T) estimates its true counterpart at a pointwise rate of L^{1/2−D}, and we now continue by showing stochastic equicontinuity. The expansion (5.25) and the bound C^l L^{1−l(1−D)} for the l-th cumulant (l ≥ 2) of μ̂_L yield cum_l(L^{1/2−D−α/2}(μ̂_L(t_1/T) − μ̂_L(t_2/T))) ≤ (2C)^l L^{−lα/2} for all t_i ∈ {1, …, T} and every α > 0, from which we get E(L^{l(1/2−D−α)}(μ̂_L(t_1/T) − μ̂_L(t_2/T))^l) ≤ (2l)! C^l L^{−lα/2} for all even l ∈ N and t_i ∈ {1, …, T}, and from (5.25) we obtain E(Δ(t/T)) = O(T^{−1} + L^2/T^2). A simple calculation reveals cum(Δ(t_1/T), Δ(t_2/T)) = O(L^{−1} T^{−1}) (where the estimate is independent of t_i), and with the Gaussianity of the innovations we get cum(Δ(t_1/T), …, Δ(t_l/T)) = 0 for l ≥ 3. This yields, as above, L^{1/2−α} T^{1/2} max_{t=1,…,T} |Δ(t/T)| = o_p(1) for every α > 0, and completes the proof of Theorem 3.7. □

6 Online supplement: Auxiliary results

Finally we state some lemmas which were employed in the above proofs.
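The final step above, from the cumulant bounds to the uniform statement L^{1/2−α} T^{1/2} max_t |Δ(t/T)| = o_p(1), follows the usual moment/union-bound route. The display below is a sketch, assuming (as is standard in this setting) that L grows polynomially in T; X_t stands generically for either of the centered quantities in the text.

```latex
% With the even-moment bound E[(a_T X_t)^l] <= (2l)! C^l L^{-l alpha/2}
% (X_t = \hat\mu_L(t_1/T)-\hat\mu_L(t_2/T) or \Delta(t/T), with the
% appropriate normalization a_T), Markov's inequality and a union bound give,
% for every fixed \delta > 0 and even l,
P\Bigl(a_T \max_{t=1,\dots,T} |X_t| > \delta\Bigr)
  \le T\,\frac{(2l)!\,C^l L^{-l\alpha/2}}{\delta^l} \longrightarrow 0,
% provided L \ge T^c for some c > 0 and l is chosen large enough, which
% yields a_T \max_t |X_t| = o_p(1).
```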
By (5.1) and Lemma 6.2 this sum can be bounded by
As in the proof of (i) we can argue that there are at most two possible values for j_2 if o, m and j_1 are chosen, and that the expression is maximized for |j_1 − j_2| = 1. Therefore we can bound the above expression, up to a constant, through g^2(k)
By setting p := o − m + κN the claim follows with (7.2). □