Testing for jumps in the presence of smooth changes in trends of nonstationary time series

Nonparametric smoothing methods have been widely used in trend analysis. However, the inference procedure usually requires the crucial assumption that the underlying trend function is smooth. This paper considers the situation where the trend function has potential jumps in addition to smooth changes. In order to determine the existence of jumps, we propose a nonparametric test that remains valid under dependent and nonstationary errors, where existing tests assuming independence or stationarity can fail. When the existence of jumps is confirmed, we further consider the problem of estimating the number, location and size of jumps. The results are illustrated via both Monte Carlo simulations and a real data example.


Introduction
Given a sequence of observations collected over time, a problem of significant interest is to understand the pattern of the underlying trend (mean) function. A common approach is to form a small window around each time point and use the (weighted) sample average within the window as an estimator of the corresponding local mean. The method is nonparametric, as it does not impose any parametric assumption on the underlying trend function, and it has been frequently used and widely studied in the literature; see for example Härdle and Mammen (1993), Fan and Zhang (1999), Horowitz and Spokoiny (2001), Zhou and Wu (2010), Chen and Hong (2012) and references therein. Nevertheless, the validity of the methods developed in the aforementioned papers relies critically on the assumption that the trend function is smooth, namely that it does not contain any jump. Although the smoothness assumption provides the basic motivation and technical justification for nonparametric smoothing, Müller and Stadtmüller (1999) argued that in a number of applications a small number of jumps can exist in addition to smooth changes. Therefore, an immediate and important problem is to detect whether the trend function has any jump before applying the nonparametric estimation procedure. This motivates us to consider a rigorous statistical test for the existence of jumps in the trend function regardless of whether it has smooth changes. Hence, smooth changes can exist under both the null and the alternative hypotheses, which makes the current problem different from the conventional change-point testing problem, where one is interested in detecting any departure from constancy; see for example Shao and Zhang (2010), Zhou (2013) and Vogt and Dette (2015) for recent developments, and the review papers by Aue and Horváth (2013) and Jandhyala et al. (2013) for further references in this direction.
As a result, applying the tests developed in the aforementioned papers to the current problem may lead to distorted p-values, since the current null hypothesis contains smooth changes, which belong to the alternative of those tests.
Nonparametric inference of regression functions with jumps has been an active area of research; see for example Hall and Titterington (1992), Müller (1992), Loader (1996), Qiu and Yandell (1998), Spokoiny (1998), Grégoire and Hamrouni (2002a,b), Gijbels et al. (2007) and Joo and Qiu (2009). Nevertheless, most existing results focus on estimating the discontinuous regression curve under the assumption that the errors are independent and identically distributed (iid). Moreover, the problem of testing for jumps has mainly been studied in the setting where the potential jump location is known to the practitioner, while the difficult problem of establishing an asymptotically valid global test has been much less explored. Assuming that the errors are iid, Wu and Chu (1993) proposed a test based on the maximal discrepancy between two kernel estimators that use the same bandwidth but different kernels, while Müller and Stadtmüller (1999) proposed a test that estimates the sum of squared jump sizes via simple linear regression. However, as commented by Wu and Zhao (2007), the independence assumption used in Wu and Chu (1993) and Müller and Stadtmüller (1999) can seriously restrict their applicability to time series data, where dependence is the rule rather than the exception. To accommodate serial correlation, Wu and Zhao (2007) proposed a dependence-corrected test by assuming that the error process is strictly stationary; see also  for the case with stationary α-mixing observations. We shall here substantially generalize existing results by allowing dependent and nonstationary errors, which appear commonly in practice (Elsner et al., 2008; Zhou and Wu, 2009; Degras et al., 2012; Guinness and Stein, 2013).
Nevertheless, generalizing the aforementioned tests to allow nonstationary error processes can be a nontrivial task. For example, the method of Müller and Stadtmüller (1999) relies on the relationship in their equation (3.2) to estimate the jump indicator parameter. However, if the error process is nonstationary, then the intercept in their equation (3.2) may also depend on time or the index variable, making it no longer suitable for this purpose. Also, the test of  requires a stationary regressor process with a continuous density function to guarantee convergence to a proper limiting distribution, which is violated in the current setting, where the data collection time serves as the index variable and is deterministic and discrete. Since nonstationary processes with time-varying features have received a surge of attention in the recent literature, the current paper considers a nonparametric test that remains valid under both dependence and nonstationarity. In addition, to overcome the slow convergence to the asymptotic extreme value distribution, we propose a pivotalized simulation-assisted (PSA) procedure, and the resulting test seems to outperform the one by Wu and Zhao (2007) even for the stationary error processes for which that test is designed; see the simulation comparison in Section 5.1. Moreover, the current test considerably relaxes the restrictive bandwidth condition required by Wu and Zhao (2007), which is convenient for practitioners. To be more specific, for their test statistic to have a proper limiting distribution, Wu and Zhao (2007) require that the bandwidth $b_n$ satisfies $nb_n^3 \log n \to 0$; see also Wu and Chu (1993) for a similar condition on the bandwidth.
Although this condition regularizes the bandwidth sequence to produce a smaller bias, it can lead to a larger asymptotic variance of the resulting kernel estimators, which in turn introduces more randomness into the test statistic and makes it less powerful. A popular bandwidth choice that balances the bias-variance trade-off is $b_n = cn^{-1/5}$ for some constant $0 < c < \infty$, which has been widely used for estimation and testing purposes in the literature; see the discussion in Zhang and Wu (2011) for further references. However, this popular choice is not allowed by the test of Wu and Zhao (2007). This is inconvenient for practitioners, since finding another bandwidth sequence that has a similarly natural interpretation and satisfies the conditions imposed by Wu and Chu (1993) and Wu and Zhao (2007) can be nontrivial; Wu and Zhao (2007) did not address this issue and simply used $b_n = n^{-2/5}$ in their applications. In contrast, the current test relaxes the condition to $nb_n^7 \log n \to 0$, so one can conveniently use the popular choice $b_n = cn^{-1/5}$ in practice and avoid the search for another bandwidth satisfying the conditions of Wu and Chu (1993) and Wu and Zhao (2007). Furthermore, the current paper also considers the situation in which the null hypothesis of no jump is rejected by the proposed test. In this case, we propose an algorithm for estimating the number of jumps, the locations of the jump points and their jump sizes. The corresponding asymptotic theory is also established under the nonstationary framework introduced in Section 2. Unlike the case of iid normal observations, the current proof uses probabilistic tools, including m-dependence approximation and martingale decomposition, to handle nonstationary error processes, and our asymptotic theory allows a growing number of jump points.
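The effect of the two bandwidth conditions on polynomial bandwidths can be checked directly. The short derivation below is our own illustration (the exponent $a$ and constant $c$ are generic), not a result quoted from the paper.

```latex
With $b_n = c\,n^{-a}$ for constants $c > 0$ and $0 < a < 1$ (so that $b_n \to 0$ and $nb_n \to \infty$),
\[
  n b_n^{3}\log n = c^{3}\, n^{1-3a}\log n \to 0 \iff a > 1/3,
  \qquad
  n b_n^{7}\log n = c^{7}\, n^{1-7a}\log n \to 0 \iff a > 1/7 .
\]
For the popular choice $a = 1/5$,
\[
  n b_n^{3}\log n = c^{3}\, n^{2/5}\log n \to \infty,
  \qquad
  n b_n^{7}\log n = c^{7}\, n^{-2/5}\log n \to 0,
\]
whereas $b_n = n^{-2/5}$ gives $n b_n^{3}\log n = n^{-1/5}\log n \to 0$.
```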
The rest of the paper is organized as follows. Section 2 introduces the nonstationary framework and basic assumptions. Section 3 contains our main results, including a nonparametric test for determining the existence of jumps and an algorithm for estimating the number, location and size of jumps. Section 4 deals with various implementation issues, and Section 5 contains Monte Carlo simulations and a real data application. Technical proofs are deferred to the Appendix.

Framework and basic assumptions
Assume that the data $(y_i)_{i=1}^n$ are observed from the model
$$y_i = \mu(i/n) + e_i, \quad i = 1, \ldots, n, \tag{1}$$
where $\mu: [0,1] \to \mathbb{R}$ is the unknown trend function and $(e_i)_{i=1}^n$ is a zero-mean error process which can be nonstationary. If the trend function is smooth, then one can naturally estimate it, along with its derivative, by the local linear estimator (Fan and Gijbels, 1996)
$$\{\hat\mu_n(t), \hat\mu_n'(t)\} = \arg\min_{(a_0, a_1) \in \mathbb{R}^2} \sum_{i=1}^n \{y_i - a_0 - a_1(i/n - t)\}^2 K\Big(\frac{i/n - t}{b_n}\Big) \tag{2}$$
for any $t \in [0,1]$, where $K(\cdot)$ is the kernel function and $b_n$ is the bandwidth sequence satisfying $b_n \to 0$ and $nb_n \to \infty$. Throughout the paper we assume that the kernel $K(\cdot)$ is a symmetric nonnegative function in $C^1[-1,1]$ satisfying $\int_{-1}^{1} K(v)\,dv = 1$. If $K(v) = \mathbb{1}\{|v| \le 1\}/2$ is the rectangle kernel, then the local linear estimator (2) becomes the least squares estimator based on the local data points $\{y_i : |i/n - t| \le b_n\}$. As commented by Müller and Stadtmüller (1999), in many applications the underlying function is smooth everywhere except at a certain number of points where jumps occur. We shall here consider the situation where the trend function $\mu(\cdot)$ is piecewise smooth on $[0,1]$ with a finite number of jump points. To be more precise, we say that $f \in PS_M[0,1]$ if there exist $0 = t_0 < t_1 < \cdots < t_M < t_{M+1} = 1$ such that on each of the intervals $[t_0, t_1), \ldots, [t_{M-1}, t_M)$ and $[t_M, 1]$, $f$ has bounded third-order derivatives, while $\lim_{s \uparrow t_k} f(s) \ne f(t_k)$, $k = 1, \ldots, M$. Hence, $M$ represents the total number of jump points. We want to test the null hypothesis that the trend function does not contain any jump, namely
$$H_0: \mu \in PS_0[0,1]. \tag{3}$$
Let
$$K_-(v) = 2K(v)\mathbb{1}\{-1 \le v < 0\}, \qquad K_+(v) = 2K(v)\mathbb{1}\{0 \le v \le 1\}$$
be the two one-sided kernel functions deduced from $K(\cdot)$, and let $\hat\mu_{n,-}(t)$ and $\hat\mu_{n,+}(t)$ be the corresponding local linear estimators, defined as in (2) with $K$ replaced by $K_-$ and $K_+$ respectively. Then $\hat\mu_{n,-}(t)$ and $\hat\mu_{n,+}(t)$ are the left and right local linear estimators based on the data points $\{y_i : i/n \in [t - b_n, t)\}$ and $\{y_i : i/n \in [t, t + b_n]\}$ respectively. If the function $\mu(\cdot)$ is smooth, then both $\hat\mu_{n,-}(t)$ and $\hat\mu_{n,+}(t)$ are consistent estimators of $\mu(t)$, and thus their difference is expected to be small. On the other hand, if the trend function has a jump at time $t$ and $\mu(t-) = \lim_{s \uparrow t}\mu(s) \ne \mu(t)$, then $\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)$ provides an estimator of the jump size $\mu(t) - \mu(t-)$, which is relatively large.
Formulating a formal test requires an asymptotic theory for the discrepancy between the left and right local linear estimators, which can be quite nontrivial even for stationary error processes. We shall here substantially generalize earlier results by allowing nonstationary and nonlinear error processes, so that our asymptotic theory is widely applicable. The problem of modeling nonstationary processes has been an active area of research; see for example Dahlhaus (1996), Nason et al. (2000), Ombao et al. (2005), Subba Rao (2006) and references therein. A more detailed comparison of existing frameworks can be found in Zhang and Wu (2011); see also Dahlhaus and Subba Rao (2006) and Vogt (2012) for additional discussions. We shall here generalize the framework of Draghicescu et al. (2009) and assume that there exists a zero-mean nonstationary process $\{G(i/n; \mathcal{F}_i)\}_{i=1}^n$ in the sense of Draghicescu et al. (2009) such that condition (4) holds for some $p \ge 2$, where $\mathcal{F}_i = (\ldots, \varepsilon_{i-1}, \varepsilon_i)$ is the shift process of iid random variables $\varepsilon_j$, $j \in \mathbb{Z}$, and $G$ is a measurable function. Therefore, one can interpret $\mathcal{F}_i$ and $G(i/n; \mathcal{F}_i)$ as the input and output of a time-varying physical system $G$, which approximates the underlying data-generating mechanism through (4). As discussed in Draghicescu et al. (2009) and Zhang and Wu (2011), this framework covers a wide range of nonstationary processes and naturally extends many existing stationary time series models to their nonstationary counterparts.
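To make the construction concrete, here is a minimal Python sketch of the left and right local linear estimators $\hat\mu_{n,-}(t)$ and $\hat\mu_{n,+}(t)$ under model (1). It is an illustration, not the author's code: the rectangle-kernel default and the helper names (`rectangle_kernel`, `one_sided_weights`, `one_sided_estimates`) are our own, and the one-sided kernels are taken to be $2K(v)$ on each half of $[-1, 1]$.

```python
# Minimal sketch: one-sided local linear estimators for y_i = mu(i/n) + e_i.
# The kernel defaults to the rectangle kernel K(v) = 1{|v| <= 1}/2, so that the
# one-sided kernel 2K(v) equals 1 on its half of [-1, 1].
import math


def rectangle_kernel(v):
    return 0.5 if abs(v) <= 1.0 else 0.0


def one_sided_weights(n, t, bn, side, kernel=rectangle_kernel):
    """Return {i: l_i} so that the estimate equals sum(l_i * y_i).

    side = -1 uses {i/n in [t - bn, t)}; side = +1 uses {i/n in [t, t + bn]}.
    The weights come from weighted least squares of y on (1, i/n - t) with kernel
    weights 2K((i/n - t)/bn); the fitted intercept is the estimate at t.
    """
    lo = max(1, math.floor(n * (t - bn)) - 1)
    hi = min(n, math.ceil(n * (t + bn)) + 1)
    pts = []
    for i in range(lo, hi + 1):
        x = i / n
        inside = (t - bn <= x < t) if side < 0 else (t <= x <= t + bn)
        if inside:
            v = max(-1.0, min(1.0, (x - t) / bn))  # guard against rounding at the window edge
            pts.append((i, x - t, 2.0 * kernel(v)))
    s0 = sum(w for _, _, w in pts)
    s1 = sum(w * d for _, d, w in pts)
    s2 = sum(w * d * d for _, d, w in pts)
    det = s0 * s2 - s1 * s1
    if det <= 0:
        raise ValueError("need at least two distinct design points on each side")
    return {i: w * (s2 - s1 * d) / det for i, d, w in pts}


def one_sided_estimates(y, t, bn, kernel=rectangle_kernel):
    """(mu_hat_{n,-}(t), mu_hat_{n,+}(t)) for observations y[0..n-1] taken at times i/n, i = 1..n."""
    n = len(y)
    est = []
    for side in (-1, +1):
        w = one_sided_weights(n, t, bn, side, kernel)
        est.append(sum(wi * y[i - 1] for i, wi in w.items()))
    return est[0], est[1]
```

Because a local linear fit reproduces straight lines exactly, the two estimates coincide for a linear trend and differ by exactly the jump size when a step is added at $t$, which is the intuition behind the test.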
We shall now introduce the functional dependence measure that will be useful in our asymptotic theory. Let $(\varepsilon_j')_{j\in\mathbb{Z}}$ be an iid copy of $(\varepsilon_j)_{j\in\mathbb{Z}}$ and let $\mathcal{F}_i^* = (\mathcal{F}_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i)$ be the coupled shift process. We define the functional dependence measure
$$\theta_{i,q} = \sup_{t\in[0,1]} \|G(t; \mathcal{F}_i) - G(t; \mathcal{F}_i^*)\|_q, \tag{5}$$
which quantifies the dependence of $G(t; \mathcal{F}_i)$ on the single innovation $\varepsilon_0$ over $t \in [0,1]$. The functional dependence measure (5) enables us to develop an asymptotic theory for complicated statistics of time series data; see Wu (2005) for a comparison with strong mixing conditions and near-epoch dependence conditions. If the short-range dependence condition $\Theta_{0,q} = \sum_{i=0}^\infty \theta_{i,q} < \infty$ holds for some $q \ge 2$, then the long-run variance function
$$g(t) = \sum_{k\in\mathbb{Z}} \mathrm{cov}\{G(t; \mathcal{F}_0), G(t; \mathcal{F}_k)\}, \quad t \in [0,1],$$
is well defined. Condition (A2) suggests that the error process is approximately locally stationary. In particular, consider the process $\zeta_j(i/n) = G(i/n; \mathcal{F}_j)$, $j \in \mathbb{Z}$, which is generated by the same physical system $G(i/n; \cdot)$ and is thus stationary. Then by condition (A2), in the small neighborhood $i - k_n \le l \le i + k_n$ with $k_n/n \to 0$, we have $e_l - \zeta_l(i/n) = O(k_n/n + 1/n) \to 0$.
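As an illustration of the framework, the sketch below generates $e_i = G(i/n; \mathcal{F}_i)$ for a time-varying AR(1) system with the Rademacher innovations used in Section 5. The coefficient function `rho(t)` and the truncation at 80 lags are hypothetical choices. For this system $G(t; \mathcal{F}_i) - G(t; \mathcal{F}_i^*) = \rho(t)^i(\varepsilon_0 - \varepsilon_0')$, so $\theta_{i,2} = \sqrt{2}\,\sup_t |\rho(t)|^i$ decays geometrically, and the long-run variance is $g(t) = 1/\{1 - \rho(t)\}^2$.

```python
# Hypothetical locally stationary error process e_i = G(i/n; F_i), where
#   G(t; F_i) = sum_{k >= 0} rho(t)^k * eps_{i-k}   (a stationary AR(1) at each fixed t)
# and eps are Rademacher innovations, P(eps = 1) = P(eps = -1) = 1/2, as in Section 5.
import math
import random


def rho(t):
    # Hypothetical coefficient function; sup_t |rho(t)| = 0.4 < 1.
    return 0.4 * math.cos(math.pi * t)


def simulate_errors(n, seed=0, lags=80):
    """e_i = G(i/n; F_i), i = 1..n, truncating the MA(infinity) sum after `lags` terms."""
    rng = random.Random(seed)
    eps = [rng.choice((-1.0, 1.0)) for _ in range(n + lags)]  # eps[lags + i - 1] plays eps_i
    e = []
    for i in range(1, n + 1):
        r = rho(i / n)
        e.append(sum(r ** k * eps[lags + i - 1 - k] for k in range(lags)))
    return e


def long_run_variance(t):
    """g(t) = sum_k cov{G(t; F_0), G(t; F_k)} = 1 / (1 - rho(t))^2 (unit innovation variance)."""
    return 1.0 / (1.0 - rho(t)) ** 2


def dependence_measure(i):
    """theta_{i,2} = sup_t |rho(t)|^i * ||eps_0 - eps_0'||_2, with ||eps_0 - eps_0'||_2 = sqrt(2)."""
    return 0.4 ** i * math.sqrt(2.0)
```

Since $\rho(t)$ changes smoothly in $t$, neighbouring errors are generated by nearly the same stationary system, which matches the local-stationarity discussion above.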

A nonparametric test for jumps
Recall that the left and right local linear estimators $\hat\mu_{n,-}(t)$ and $\hat\mu_{n,+}(t)$ are based on the data points $\{y_i : i/n \in [t - b_n, t)\}$ and $\{y_i : i/n \in [t, t + b_n]\}$ respectively, and thus the difference $\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)$ provides a natural statistic for testing whether there is a jump at time $t$. Theorem 3.1 provides the central limit theorem for $\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)$, which can be used to test whether a jump occurs at a given time point.
If $t \in (0,1) \setminus \{t_1, \ldots, t_M\}$ is a continuity point, then the bias of the current estimator is $E\{\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)\} = O\{b_n^3 + (nb_n)^{-1}\}$, as can be seen from the proof of Theorem 3.1. For stationary error processes, Wu and Zhao (2007) considered the difference $\Delta_n(t)$ between the left and right local averages, whose bias is $E\{\Delta_n(t)\} = b_n\mu'(t) + O\{b_n^2 + (nb_n)^{-1}\}$. Hence, the finite-sample performance of their method can be greatly affected by the steepness of the underlying function. Intuitively, the right and left local averages in (6) estimate $\mu(t + b_n/2)$ and $\mu(t - b_n/2)$ respectively, so the associated bias is of order $O(b_n)$, which is larger than its local linear counterpart. Owing to this large bias, the method of Wu and Zhao (2007) is very restrictive on the choice of the bandwidth: it requires $nb_n^3 \log n \to 0$, which excludes the popular choice $b_n = cn^{-1/5}$, where $0 < c < \infty$ is a constant; see also the discussion in Section 1. It is worth noting that local linear estimation usually brings significant improvements only at points in the boundary region (Fan and Gijbels, 1996), which constitutes a small proportion of the whole domain, so the difference is usually neglected. For the specific problem of testing for jumps, however, left and right local estimators are formed and compared, which makes every time point a boundary point. Improving the boundary performance through local linear estimation therefore brings significant advantages to the current problem, an aspect not taken into account by the test of Wu and Zhao (2007).
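The bias comparison can be checked numerically. The noise-free sketch below is our own illustration: it uses the rectangle kernel and the arbitrary choices $t = 0.4$, $b_n = 0.1$, $n = 1000$, and evaluates a local-average difference in the spirit of $\Delta_n(t)$ in (6) alongside the local linear difference for the trend $\mu(t) = 2\sin(2\pi t)$ of Section 5. The first comes out close to $b_n\mu'(t) \approx -1.02$, whereas the second is much smaller, in line with the $O(b_n^3)$ bias.

```python
# Noise-free illustration at a continuity point of mu(t) = 2 sin(2 pi t):
# the local-average difference behaves like b_n * mu'(t), while the
# local linear (rectangle-kernel) difference has a much smaller bias.
import math


def mu(t):
    return 2.0 * math.sin(2.0 * math.pi * t)


def ls_line_value(xs, ys, x0):
    """Least-squares line through (xs, ys) evaluated at x0 (rectangle-kernel local linear fit)."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


def left_right_differences(n, t, bn):
    left = [i / n for i in range(1, n + 1) if t - bn <= i / n < t]
    right = [i / n for i in range(1, n + 1) if t <= i / n <= t + bn]
    yl, yr = [mu(x) for x in left], [mu(x) for x in right]
    local_average_diff = sum(yr) / len(yr) - sum(yl) / len(yl)
    local_linear_diff = ls_line_value(right, yr, t) - ls_line_value(left, yl, t)
    return local_average_diff, local_linear_diff


avg_diff, ll_diff = left_right_differences(n=1000, t=0.4, bn=0.1)
slope_term = 0.1 * 4.0 * math.pi * math.cos(2.0 * math.pi * 0.4)  # b_n * mu'(t)
```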
If one has prior knowledge that a certain time point $t \in (0, 1)$ is a potential jump point, for example because of some sudden event at that time, then Theorem 3.1 provides a statistical test. In particular, by Theorem 3.1 (i), one rejects the null hypothesis that $t$ is not a jump point at level $\alpha \in (0, 1)$ if the difference $\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)$, standardized as in Theorem 3.1 (i), exceeds $z_{1-\alpha/2}$ in absolute value, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution. By Theorem 3.1 (ii), this test has asymptotic power one as $n \to \infty$. Nevertheless, in many applications the potential jump locations are unknown to the practitioner, and identifying them can require a series of statistical analyses. We shall here provide a global test for the existence of jumps in the trend function that does not require prespecifying potential jump locations. Theorem 3.2 shows that under the null hypothesis (3), after proper centering and scaling, the maximal discrepancy between the left and right local linear estimators has an asymptotic extreme value distribution.
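For a prespecified candidate point, the two-sided z-test can be sketched as follows. This is a hedged illustration: we *assume* that the variance of $\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)$ is approximated by $g(t)\sum_i l_i(t)^2$, where $l_i(t)$ are the linear-smoother weights of the rectangle-kernel fits and $g(t)$ is supplied by the user; the exact standardization and bias handling in Theorem 3.1 may differ. `pointwise_jump_test` is a hypothetical helper name.

```python
# Sketch of the pointwise test at a known candidate t (ASSUMED variance approximation).
import math
from statistics import NormalDist


def ls_endpoint_weights(xs, x0):
    """Weights l_i such that sum(l_i * y_i) is the least-squares line through (xs, y) at x0."""
    m = len(xs)
    xbar = sum(xs) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1.0 / m + (x0 - xbar) * (x - xbar) / sxx for x in xs]


def pointwise_jump_test(y, t, bn, g_t, alpha=0.05):
    """Return (difference, z statistic, reject?) for H0: t is not a jump point.

    ASSUMPTION: Var{mu_hat_+(t) - mu_hat_-(t)} is approximated by g_t * (sum of squared weights).
    """
    n = len(y)
    left = [i for i in range(1, n + 1) if t - bn <= i / n < t]
    right = [i for i in range(1, n + 1) if t <= i / n <= t + bn]
    wl = ls_endpoint_weights([i / n for i in left], t)
    wr = ls_endpoint_weights([i / n for i in right], t)
    diff = (sum(w * y[i - 1] for w, i in zip(wr, right))
            - sum(w * y[i - 1] for w, i in zip(wl, left)))
    sd = math.sqrt(g_t * (sum(w * w for w in wl) + sum(w * w for w in wr)))
    z = diff / sd
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1 - alpha/2}
    return diff, z, abs(z) > z_crit
```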
Then as n → ∞,

If there is at least one jump point in the trend function, namely μ ∈ PS M [0, 1] for some M > 0, then by Theorem 3.1 (ii), the above test has unit asymptotic power as n → ∞. If the null hypothesis (3) of no jump point is rejected, then one would be interested in knowing the number of jump points along with their locations and sizes, which we shall discuss in Section 3.2.

The case with a single jump point
If one has prior knowledge about the jump location, say $t_1 \in (0, 1)$, then the corresponding jump size can simply be estimated by the difference $\hat\mu_{n,+}(t_1) - \hat\mu_{n,-}(t_1)$, whose consistency is guaranteed by Theorem 3.1 (ii). We shall here consider the situation in which the jump location is unknown and needs to be estimated. Recall that the difference $\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)$ consistently estimates the jump size, which is zero at continuity points, and thus it is natural to estimate the jump location by the maximizer
$$\hat t_1 = \arg\max_{t \in \mathcal{T}_n} |\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)|.$$
The corresponding jump size can then be estimated by
$$\hat d_1 = \hat\mu_{n,+}(\hat t_1) - \hat\mu_{n,-}(\hat t_1).$$
Fryzlewicz (2014) considered the problem of estimating the number and locations of change-points in a piecewise-constant mean function with iid observations, whereas we consider the problem of distinguishing jumps from smooth changes with dependent and nonstationary observations. The following theorem states that $\hat t_1$ and $\hat d_1$ consistently estimate the true jump location and the corresponding jump size, respectively.
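The estimators $\hat t_1$ and $\hat d_1$ can be sketched directly. In this illustration the rectangle kernel is used, and the search grid $\mathcal{T}_n = \{i/n : b_n \le i/n \le 1 - b_n\}$ is our assumption about the search region; the helper names are hypothetical.

```python
# Hypothetical sketch of t_hat_1 (maximizer of the left/right discrepancy) and d_hat_1.
import math


def ls_value_at(xs, ys, x0):
    """Least-squares line through (xs, ys), evaluated at x0."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


def discrepancy(y, t, bn):
    """mu_hat_{n,+}(t) - mu_hat_{n,-}(t) for observations y_i at times i/n (rectangle kernel)."""
    n = len(y)
    left = [i for i in range(1, n + 1) if t - bn <= i / n < t]
    right = [i for i in range(1, n + 1) if t <= i / n <= t + bn]
    plus = ls_value_at([i / n for i in right], [y[i - 1] for i in right], t)
    minus = ls_value_at([i / n for i in left], [y[i - 1] for i in left], t)
    return plus - minus


def estimate_single_jump(y, bn):
    """Return (t_hat_1, d_hat_1): the maximizer of |discrepancy| over the grid and the value there."""
    n = len(y)
    grid = [i / n for i in range(1, n + 1) if bn <= i / n <= 1.0 - bn]
    return max(((t, discrepancy(y, t, bn)) for t in grid), key=lambda td: abs(td[1]))


# Usage: smooth trend 2 sin(2 pi t) plus a jump of size 1.5 at t = 0.6, no noise.
n = 500
y = [2 * math.sin(2 * math.pi * i / n) + (1.5 if i / n >= 0.6 else 0.0) for i in range(1, n + 1)]
t_hat, d_hat = estimate_single_jump(y, bn=0.1)
```

The smooth part contributes only a small local linear bias, so the maximizer sits at the jump even though the trend itself changes steeply.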
n → ∞ and $nb_n^{\nu_1} \to 0$, then for any ε > 0, which can be taken arbitrarily small, we have (i)

The case with multiple jump points
In the presence of multiple jump points, following the discussion in Section 3.2.1, one can naturally estimate the jump locations by local maximizers of the discrepancy function $|\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)|$. More specifically, we shall in the following present an algorithm for estimating the number, location and size of jumps.
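Following the proof of Theorem 3.4, where with probability tending to one there are $M$ separate regions in which $|\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)|$ exceeds a threshold $\varphi_n$, a multiple-jump estimator can be sketched as follows: scan the grid, group consecutive exceedances into regions, and report the maximizer of the discrepancy within each region. This is our reading, not a reproduction of the paper's algorithm; the grid and the threshold value are assumptions (the bandwidth-selection discussion suggests using the cut-off of the global test as the threshold).

```python
# Hypothetical multiple-jump sketch: one estimated jump per region of exceedance.
import math


def ls_value_at(xs, ys, x0):
    """Least-squares line through (xs, ys), evaluated at x0."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


def discrepancy(y, t, bn):
    """mu_hat_{n,+}(t) - mu_hat_{n,-}(t) with the rectangle kernel."""
    n = len(y)
    left = [i for i in range(1, n + 1) if t - bn <= i / n < t]
    right = [i for i in range(1, n + 1) if t <= i / n <= t + bn]
    plus = ls_value_at([i / n for i in right], [y[i - 1] for i in right], t)
    minus = ls_value_at([i / n for i in left], [y[i - 1] for i in left], t)
    return plus - minus


def estimate_jumps(y, bn, threshold):
    """Return [(t_hat_k, d_hat_k)], one pair per region of exceedance; its length is M_hat."""
    n = len(y)
    grid = [i / n for i in range(1, n + 1) if bn <= i / n <= 1.0 - bn]
    jumps, run = [], []
    for t in grid:
        d = discrepancy(y, t, bn)
        if abs(d) > threshold:
            run.append((t, d))
        elif run:
            jumps.append(max(run, key=lambda td: abs(td[1])))
            run = []
    if run:
        jumps.append(max(run, key=lambda td: abs(td[1])))
    return jumps


# Usage: smooth trend plus jumps of +1.0 at t = 0.3 and -1.2 at t = 0.7, no noise.
n = 500
y = [2 * math.sin(2 * math.pi * i / n)
     + (1.0 if i / n >= 0.3 else 0.0) - (1.2 if i / n >= 0.7 else 0.0)
     for i in range(1, n + 1)]
found = estimate_jumps(y, bn=0.1, threshold=0.5)
```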

Estimation of long-run variance
To apply Theorems 3.1 and 3.2, we need an estimate of the long-run variance function g(t), t ∈ (0, 1). Let τ n and n be bandwidth sequences satisfying τ n → 0, n → 0 and nτ n n → ∞, and Following Zhang and Wu (2012), we estimate g(t), t ∈ [0, 1], bŷ  (2012) and (ii) concerns the uniform consistency which is more desirable for the current problem. Note that uniform results as in Theorem 4.1 (ii) are generally much more difficult to obtain than its counterpart as in (i), and the detailed proof is given in the Appendix.
and (ii) sup where ψ n = n −1/4 (τ n n log n) 1/2 + n 1/2 n . We shall here briefly discuss the bounds in (11) and (12). For both, the optimal bounds are complicated and depend on ι, the decay rate of the dependence. If ι ≥ 2 as assumed in Theorem 3.2, then the bound in (11) , where for two positive sequences $(r_n)$ and $(s_n)$ we write $r_n \asymp s_n$ if $r_n/s_n + s_n/r_n$ is bounded for all large n. On the other hand, the bound in (12)  . Hence, if the error process satisfies the geometric moment contraction condition, namely $\theta_{n,4} = O(\rho^n)$ for some $0 \le \rho < 1$, as for finite-order autoregressive processes, then the bounds in (11) and (12) can achieve $O_p(n^{-2/5+\varepsilon})$ and $O_p(n^{-1/3+\varepsilon})$ respectively, for an arbitrarily small ε > 0. In practice, the error process $(e_i)_{i=1}^n$ is usually not observable, and we replace it by the estimated residuals. We suggest using $\hat e_i = y_i - \hat\mu_n(i/n)$, i = 1, …, n, where for each t, $\hat\mu_n(t)$ is the one of the two one-sided local linear estimates $\hat\mu_{n,\pm}(t)$ that has the smaller weighted residual mean squares where χ n = {(nb n ) −1/2 (log n) 1/2 + b 2 n }n 5/4 τ n n .
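A sketch of the residual step and of a long-run variance estimate follows. Both parts rest on assumptions: "weighted residual mean squares" is read here as the residual mean square of each one-sided rectangle-kernel fit on its own window, and the estimator of $g(t)$ is a difference-based, kernel-smoothed form in the spirit of Zhang and Wu (2012) with hypothetical tuning parameters `m` (block length) and `tau` (bandwidth); it is not necessarily the paper's exact estimator.

```python
# ASSUMPTIONS (see lead-in): rectangle-kernel one-sided fits; the side with the smaller
# residual mean square on its own window is used; difference-based long-run variance
# in the spirit of Zhang and Wu (2012) with hypothetical tuning parameters m and tau.


def _ls_fit(xs, ys, x0):
    """Least-squares line through (xs, ys): (value at x0, residual mean square)."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    rms = sum((y - ybar - slope * (x - xbar)) ** 2 for x, y in zip(xs, ys)) / m
    return ybar + slope * (x0 - xbar), rms


def residuals(y, bn):
    """e_hat_i = y_i - mu_hat_n(i/n), using the one-sided fit with the smaller residual mean square."""
    n = len(y)
    out = []
    for i in range(1, n + 1):
        t = i / n
        left = [j for j in range(1, n + 1) if t - bn <= j / n < t]
        right = [j for j in range(1, n + 1) if t <= j / n <= t + bn]
        fits = [_ls_fit([j / n for j in idx], [y[j - 1] for j in idx], t)
                for idx in (left, right) if len(idx) >= 2]
        value, _ = min(fits, key=lambda f: f[1])
        out.append(y[i - 1] - value)
    return out


def long_run_variance(e, t, m, tau):
    """g_hat(t) = sum_j w_j(t) * m * Delta_j**2 / 2 / sum_j w_j(t), where
    Delta_j = (e_{j-m+1} + ... + e_j - e_{j+1} - ... - e_{j+m}) / m (1-based indices)
    and w_j(t) are Epanechnikov weights in (j/n - t) / tau."""
    n = len(e)
    prefix = [0.0]
    for v in e:
        prefix.append(prefix[-1] + v)  # prefix[k] = e_1 + ... + e_k
    num = den = 0.0
    for j in range(m, n - m + 1):
        delta = ((prefix[j] - prefix[j - m]) - (prefix[j + m] - prefix[j])) / m
        u = (j / n - t) / tau
        w = 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0
        num += w * m * delta * delta / 2.0
        den += w
    return num / den
```

For iid errors with unit variance, $E\{m\Delta_j^2/2\} = 1$, so the estimate should be close to $g(t) = 1$; the block differences also remove slowly varying trend remnants from the residuals.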

A pivotalized simulation-assisted testing procedure
It is well known that the convergence to extreme value distributions, as in Theorem 3.2, can be quite slow, so that a very large sample size would be needed for the approximation to be reasonably accurate. We shall here consider a pivotalized simulation-assisted (PSA) procedure that can help improve the finite-sample performance of the proposed test. In particular, by (8), the asymptotic test rejects the null hypothesis (3) at level α ∈ (0, 1) if the test statistic $T_n$ exceeds the corresponding quantile of the limiting extreme value distribution. Instead, we generate iid standard normal random variables $y_i^\circ$, i = 1, …, n, and compute the corresponding test statistic $T_n^\circ$. Repeating this many times, we obtain the empirical quantile $\hat q_{1-\alpha}$ of $T_n^\circ$. Since $T_n$ and $T_n^\circ$ are both asymptotically pivotal and share the same asymptotic distribution, we reject the null hypothesis (3) at level α ∈ (0, 1) if $T_n > \hat q_{1-\alpha}$. The simulation study in Section 5.1 shows that the PSA procedure can substantially improve the finite-sample performance, and that the resulting method can outperform the one by Wu and Zhao (2007) even for the stationary error processes for which that test is designed.
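The PSA procedure can be sketched as follows. The statistic used here is a hypothetical pivotalized discrepancy, $\max_t |\hat\mu_{n,+}(t) - \hat\mu_{n,-}(t)| / \{g(t)\sum_i l_i(t)^2\}^{1/2}$ over the grid $\{i/n : b_n \le i/n \le 1 - b_n\}$, and not the paper's exact $T_n$; the function names, grid and default $g \equiv 1$ are assumptions. What the sketch does take from the text is the key step: the simulated statistics are computed from iid standard normal data, so the smoother weights are computed once and reused.

```python
# Hypothetical pivotalized statistic (ASSUMPTION, not the paper's exact T_n):
#   T = max over grid t of |mu_hat_+(t) - mu_hat_-(t)| / sqrt(g(t) * sum_i l_i(t)^2),
# with rectangle-kernel one-sided local linear fits and grid {i/n : bn <= i/n <= 1 - bn}.
import math
import random


def _endpoint_weights(xs, x0):
    """Weights l_i such that sum(l_i * y_i) is the least-squares line at x0."""
    m = len(xs)
    xbar = sum(xs) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1.0 / m + (x0 - xbar) * (x - xbar) / sxx for x in xs]


def discrepancy_table(n, bn):
    """For each grid t: (t, [(i, weight)], sum of squared weights), where
    sum(weight * y_i) = mu_hat_+(t) - mu_hat_-(t). The weights do not depend on y."""
    table = []
    for k in range(1, n + 1):
        t = k / n
        if not bn <= t <= 1.0 - bn:
            continue
        left = [i for i in range(1, n + 1) if t - bn <= i / n < t]
        right = [i for i in range(1, n + 1) if t <= i / n <= t + bn]
        pairs = list(zip(right, _endpoint_weights([i / n for i in right], t)))
        pairs += [(i, -w) for i, w in zip(left, _endpoint_weights([i / n for i in left], t))]
        table.append((t, pairs, sum(w * w for _, w in pairs)))
    return table


def statistic(y, table, g=lambda t: 1.0):
    return max(abs(sum(w * y[i - 1] for i, w in pairs)) / math.sqrt(g(t) * s2)
               for t, pairs, s2 in table)


def psa_test(y, bn, g=lambda t: 1.0, alpha=0.05, reps=500, seed=0):
    """Compare T with the empirical (1 - alpha) quantile of T computed on iid N(0, 1) data."""
    n = len(y)
    table = discrepancy_table(n, bn)
    t_obs = statistic(y, table, g)
    rng = random.Random(seed)
    sims = sorted(
        statistic([rng.gauss(0.0, 1.0) for _ in range(n)], table)  # simulated data: g = 1
        for _ in range(reps)
    )
    q_hat = sims[min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)]
    p_value = (1 + sum(s >= t_obs for s in sims)) / (reps + 1)
    return t_obs, q_hat, p_value, t_obs > q_hat
```

In practice `g` would be an estimate of the long-run variance function evaluated on the grid.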

Bandwidth selection
Bandwidth selection is a nontrivial problem in the application of nonparametric methods. In the context of nonparametric hypothesis testing, it has been studied by Hall and Hart (1990), Kulasekera and Wang (1997) and  among others. Although many candidates have been proposed in the literature, Wang (2008) commented that there is usually no uniform guidance for an optimal choice. On the positive side, our simulation results in Section 5.1 suggest that the performance of the proposed testing procedure is not very sensitive to the choice of the bandwidth. Therefore, one can simply choose $b_n = n^{-1/5}$, which has a natural interpretation, as suggested by Zhang and Wu (2011). As an alternative, we consider the generalized cross-validation (GCV) selector and correct for the dependence by estimating the covariance matrix $\Gamma_n = \{E(e_i e_j)\}_{1 \le i,j \le n}$, as suggested by Wang (1998). In particular, let $Y = (y_1, \ldots, y_n)^\top$, where ⊤ is the transpose operator. For any bandwidth $b \in (0, 1)$, let $\hat\mu_n(t; b)$ be the one of the one-sided local linear estimates $\hat\mu_{n,\pm}(t; b)$ that has the smaller weighted residual mean squares, and denote the corresponding fitted values by $\hat Y(b)$. We choose the bandwidth $\hat b_n$ that minimizes the GCV criterion (13). An estimate of the covariance matrix $\Gamma_n$ can be obtained by the banding technique as in Bickel and Levina (2008) and Wu and Pourahmadi (2009). The GCV selector (13) works reasonably well in our simulation studies. For the choice of τ n and n in estimating the long-run variance, we suggest using the data-driven selector of Zhang and Wu (2012). When implementing the algorithm described in Section 3.2.2 for finite samples, we suggest using the cut-off value from the hypothesis test as the threshold that defines the critical region.
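The banding step can be sketched as follows, in the spirit of Bickel and Levina (2008) and Wu and Pourahmadi (2009): sample autocovariances of the residuals are kept up to a lag `band` and set to zero beyond it. This is an illustration under a loudly stated assumption: a single (stationary) autocovariance sequence is used, which ignores the time variation allowed by the nonstationary framework, and the residuals are assumed to be approximately mean zero. The function names are hypothetical.

```python
# Banded estimate of Gamma_n = {E(e_i e_j)} from residuals.
# ASSUMPTION: one stationary autocovariance sequence; residuals taken as mean zero.


def sample_autocov(e, k):
    """gamma_hat(k) = n^{-1} * sum_{i=1}^{n-k} e_i e_{i+k}."""
    n = len(e)
    return sum(e[i] * e[i + k] for i in range(n - k)) / n


def banded_covariance(e, band):
    """n x n matrix with entries gamma_hat(|i - j|) if |i - j| <= band, and 0 otherwise."""
    n = len(e)
    gam = [sample_autocov(e, k) for k in range(band + 1)]
    return [[gam[abs(i - j)] if abs(i - j) <= band else 0.0 for j in range(n)]
            for i in range(n)]
```

Banding keeps the estimate sparse and consistent in operator norm under short-range dependence, but it does not guarantee positive definiteness, which is worth checking before the matrix is used inside a selection criterion.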

Monte Carlo simulations
We shall in this section carry out Monte Carlo simulations to examine the finite-sample performance of the proposed test and compare it with the existing method. The problem of testing for jumps in regression functions has attracted much attention in the literature; see for example Hall and Titterington (1992), Qiu and Yandell (1998), Grégoire and Hamrouni (2002b) and Joo and Qiu (2009) for local tests designed specifically for a given potential jump point, and Wu and Chu (1993) and Müller and Stadtmüller (1999) for global tests with unknown jump locations. However, as commented by Wu and Zhao (2007), existing results usually assume that the errors are iid, which can seriously limit their applicability to time series data, where dependence is the rule rather than the exception. Wu and Zhao (2007) proposed a dependence-corrected test by assuming that the error process is strictly stationary. We shall here briefly review their test and compare it with the proposed method. In particular, Wu and Zhao (2007) proposed the test statistic $T_{WZ}$ in (14), where $\Delta_n(\cdot)$ is the difference between the left and right local averages as defined in (6), and $k_n = \lfloor nb_n \rfloor$ is the largest integer not exceeding $nb_n$. To obtain the cut-off values, Wu and Zhao (2007) suggested a simulation scheme based on quantiles of $T_{WZ}^\circ$, the test statistic computed from independent standard normal observations. However, owing to the lack of pivotalization, the asymptotic distributions of $T_{WZ}$ and $T_{WZ}^\circ$ can differ by a scale factor, which needs to be estimated by $\hat r$, where $q_{\Phi,0.75} = 0.674$ is the third quartile of the standard normal distribution and $A_{i,n} = k_n^{-1}\sum_{j=1}^{k_n} y_{ik_n+j}$. The cut-off value is then set to be the corresponding quantile of $\hat r T_{WZ}^\circ$. We shall here compare this test with the proposed test for both stationary and nonstationary error processes. Let n = 500 and let the trend function be $\mu(t) = 2\sin(2\pi t)$; we consider the problem of testing the null hypothesis (3).
Throughout our numerical experiments, the rectangle kernel $K(v) = \mathbb{1}\{|v| \le 1\}/2$ is used. Since the test of Wu and Zhao (2007) was developed specifically for stationary error processes, we first consider a stationary autoregressive error process $e_i = 0.3e_{i-1} + \varepsilon_i$, where $\varepsilon_i$, i ∈ Z, are iid standard normal random variables, and we use the bandwidth suggested by Wu and Zhao (2007). The results are summarized in Figure 1, from which we can see that the simulation method of Wu and Zhao (2007) may still not perform well enough, as the quantiles of $T_{WZ}$ are not well approximated by those of $\hat r T_{WZ}^\circ$. In contrast, the proposed test seems to perform reasonably well, as can be seen from Figure 1 (b).

Fig 1. A comparison of the test by Wu and Zhao (2007) in the left panel with the proposed test in the right panel for stationary autoregressive error processes. Q-Q plots of (a) $T_{WZ}/\hat r$ against $T_{WZ}^\circ$; and (b) $T_n$ against $T_n^\circ$. The dashed line in both plots has unit slope and zero intercept.
We now consider nonstationary error processes. In this case, the asymptotic distribution derived by Wu and Zhao (2007) for their test statistic may no longer hold, as $\Delta_n(t)$ in (14) can behave differently at different time points for nonstationary processes. Write $\zeta_i(t) = G(t; \mathcal{F}_i)$, where $\mathcal{F}_i = (\ldots, \varepsilon_{i-1}, \varepsilon_i)$. We consider (a) a time-varying autoregressive error process $\zeta_i(t) = \rho(t)\zeta_{i-1}(t) + \varepsilon_i$, where $\varepsilon_i$, i ∈ Z, are iid random variables with $\mathrm{pr}(\varepsilon_i = -1) = \mathrm{pr}(\varepsilon_i = 1) = 1/2$; and (b) a time-varying nonlinear error process. The results are summarized in Table 1 for different choices of bandwidths, from which we can see that the empirical acceptance probabilities of the proposed test based on $T_n$ are fairly close to their nominal levels (90%, 95% and 99%), and the results are not very sensitive to the choice of bandwidths. In comparison, the empirical acceptance probabilities of the test by Wu and Zhao (2007) generally deviate from their nominal levels.

Application to global temperature data
The data set at http://cdiac.ornl.gov/ftp/trends/temp/jonescru/global.txt contains global monthly temperature anomalies in Celsius from 1850 to 2012. The data set has been widely studied in the literature on understanding the trend pattern, and a time series plot is provided in Figure 2. In particular, Wu et al. (2001) modeled the trend function as piecewise constant, namely changes in the mean can only occur through pure jumps. On the other hand, Wu and Zhao (2007) proposed to model the trend as a smooth function, as their test suggested that there is no jump in the trend function, with a p-value of 0.22. Nevertheless, these results relied on the assumption of stationarity, while Zhou and Wu (2009) argued that the series should be treated as nonstationary. We shall here model the error process by (4) and apply the proposed method to test whether the trend function contains any jump. The selected bandwidth is $\hat b_n = 0.18$, and the corresponding test statistic is $T_n = 0.537$. By the PSA procedure described in Section 4.2 with 5000 simulated $T_n^\circ$, the p-value is 0.011, and thus we reject the null hypothesis of no jump at the 5% significance level. Figure 2 shows the region (between the two dashed lines) where the local discrepancy exceeds the critical value. It can be seen from Figure 2 that there is a sharp decrease in the mean during the short period between the two dashed lines, which should be treated as a jump, as suggested by our analysis. By the algorithm in Section 3.2.2, the jump location is estimated at March 1946 with a jump size $\hat d = -0.26$ in Celsius, indicating a sudden cooling after the Second World War, during which atomic bombs were dropped on the Japanese cities of Hiroshima and Nagasaki.
Also, the end of World War II could lead to significant changes in naval activities at and under the sea surface, in the national and foreign policies of various countries, and in other human activities that could potentially affect temperature measurements.

Fig 2. Monthly global temperature anomalies in Celsius from 01/1850 to 12/2012. The period between the two dashed lines corresponds to 11/1944-12/1946.

A.1. Technical proofs
Define the projection operator $\mathcal{P}_j \cdot = E(\cdot \mid \mathcal{F}_j) - E(\cdot \mid \mathcal{F}_{j-1})$, j ∈ Z. Let e i = G(i/n; F i ), i = 1, . . . , n, and for t ∈ (0, 1) let The following lemma provides the asymptotic properties of R n,± (t) and R n,± (t), t ∈ (0, 1), and is useful in proving Theorems 3.1 and 3.2. The proof of Lemma A.1 is given in Section A.2.
, v ∈ R, from the proof of Lemma A.1. The following lemmas are useful in proving Theorem 3.2, and their proofs are given in Section A.2.

T. Zhang
Lemma A.2. Let Z 1 , . . . , Z n be iid standard normal random variables and and (ii) the same result holds for R n,+ (t) − R n,− (t).

Lemma A.4. Assume conditions of Theorem 3.3. Then for any positive nonincreasing sequence
Proof. We shall first deal with the region where t ∈ T n \ N (t 1 , b n ). Since M = 1 is assumed, the mean function does not have any jump other than t 1 , and thus by the proof of Theorem 3.2, sup t∈Tn\N (t1,bn) On the other hand, by Theorem 3.1 (ii), and thus lim n→∞ pr |μ n, Therefore, it suffices to deal with the remaining region where t ∈ N (t 1 , b n ) ∩ T n \ N (t 1 , ξ n ), for which we need the following preparation. Let ζ k (t) = G(t; F k ) and write Ξ k,m, Without loss of generality, assume that the jump size d 1 > 0. Let ω n = b n + (nb n ) −1 , then by (15), is a continuous function on [0, 1] satisfying P K (0) = 0, P K (1) = 1 and 0 < P K (s) < 2 if s ∈ (0, 1). Since the kernel function K(·) is symmetric on [−1, 1], we have sup t∈N (t1,bn)∩Tn also tends to one as n → ∞, which entails the desired result. If the jump size d 1 < 0, then one can apply the above argument to the transformed data −y i , i = 1, . . . , n, in which case |μ n,+ (t) −μ n,− (t)| remains the same but the direction of the jump is flipped.
Proof. (Theorem 3.4) Let J n = ∪ M k=1 N (t k , b n ) be the collection of jump neighborhoods, then by the proof of Theorem 3.2, sup t∈Tn\J n |μ n, Therefore, with probability tending to one, there will be M separate regions where |μ n,+ (t)−μ n,− (t)| exceeds the threshold ϕ n . Since the window size b n → 0 as n → ∞, (i) follows. The second claim follows by applying the argument of Theorem 3.3 on N (t k , b n ), k = 1, . . . , M, respectively.
Since τ n → 0 and nτ n n → ∞, it suffices to prove that the same result holds for

For this, note that For any fixed l ∈ Z, (P i−l λ i ) n i=1 forms a sequence of martingale differences, and by Doob's inequality, Let C denote an absolute constant whose value may vary from place to place.
By the proof of Proposition 4.1 of Zhang and Wu (2012), follows by properties of local linear estimates.
Proof. (Theorem 4.2) Letλ i , i = 1, . . . , n, be defined as in (9) but with (e j ) therein replaced by (ê j ), then By the proof of Theorem 3.2, we have In addition, by the proof of Since nτ n n → ∞, the result follows by Theorem 4.1.

A.2. Additional Technical Proofs
We shall here provide the proofs for Lemmas A.1-A.3.
Proof. (Lemma A.1) For any fixed l ≥ 0, the projections P i−l e i , i = 1, . . . , n, form a sequence of martingale differences. Let ζ k (t) = G(t; Since
Proof. (Lemma A.2) Let B(v), v ∈ R, be a standard Brownian motion, and Then Υ(u), u ∈ R, is a stationary Gaussian process with mean zero and marginal variance (2φ) −1 from the stationarity of Υ(u), u ∈ R, by Corollary A1 of Bickel and Rosenblatt (1973),  (26) holds with Υ(u) therein replaced byΥ(u). By the scaling property of Brownian motion, the processes {Q n (t)} t∈Tn and {Υ(t/b n )} t∈Tn have the same joint distribution, and Lemma A.2 follows.
By the technique of summation by parts, we have = o p (n −7/10 b −1 n log n).