A nonparametric approach to detecting changes in variance in locally stationary time series

This paper proposes a nonparametric approach to detecting changes in variance within a time series that we demonstrate is resilient to departures from the assumption of normality and to the presence of outliers. Our method is founded on a local estimate of the variance provided by the locally stationary wavelet framework. Within this setting, the local estimate of the variance will be piecewise constant if the time series has piecewise constant variance. Consequently, changes in the variance of a time series can be detected in a nonparametric setting. In addition, using a simulation study, we explore the robustness of our approach to departures from normality and to the presence of outliers. We illustrate the application of the approach to changes in the variability of wind speeds at a location in the United Kingdom.

The challenge of detecting changes in a sequence $\{x_i\}_{i=1,\ldots,n}$ of observations reduces to the problem of finding the values of $m$ and $\{\tau_i\}_i$ that minimize the following expression:

$$\sum_{i=1}^{m+1} \mathcal{C}\left(x_{(\tau_{i-1}+1):\tau_i}\right) + \beta f(m). \quad (1)$$

Here, we have $m$ changepoints with associated ordered positions $\tau_0 = 0, \tau_1, \ldots, \tau_m, \tau_{m+1} = n$. As a result, having $m$ changepoints splits the data into $m+1$ segments such that segment $i$ contains the observations $x_{(\tau_{i-1}+1):\tau_i}$ and has variance $\sigma^2_{(i)} := \sigma^2_{\tau_{i-1}+1} = \cdots = \sigma^2_{\tau_i}$. The first term in Equation (1) is a cost function $\mathcal{C}$ for the segment $x_{(\tau_{i-1}+1):\tau_i}$. The second term in Equation (1) is a penalty that guards against overfitting. The function $f(m)$ is often taken to be simply the number of changepoints $m$, resulting in a penalty that is linear in the number of changepoints; $\beta$ is a tuning parameter, where small values result in more detected changes than larger values. Different methods can be adopted to minimize Equation (1). Examples of exact methods include PELT (Killick, Fearnhead, & Eckley, 2012) and segment neighborhood search (Auger & Lawrence, 1989), whereas binary segmentation is a commonly used approximate method (Scott & Knott, 1974).
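To make the objective concrete, the following is a minimal Python sketch of evaluating Equation (1) for a candidate segmentation. The function names and the Gaussian-style variance cost are our own illustrative choices; the paper itself develops a nonparametric cost in Section 2.

```python
import numpy as np

def segment_cost(x):
    """Illustrative variance cost for one segment: n * log(sigma^2_hat),
    proportional (up to constants) to twice the negative Gaussian log likelihood."""
    n = len(x)
    var = np.mean((x - np.mean(x)) ** 2)
    return n * np.log(max(var, 1e-12))

def penalised_cost(x, taus, beta):
    """Total penalized cost of Equation (1) for changepoints at positions
    0 < tau_1 < ... < tau_m < n, with the linear penalty beta * m."""
    bounds = [0] + list(taus) + [len(x)]
    cost = sum(segment_cost(x[a:b]) for a, b in zip(bounds[:-1], bounds[1:]))
    return cost + beta * len(taus)

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(0, 3, 200)])
# The true split at 200 should beat no split for a modest penalty.
print(penalised_cost(x, [200], beta=2 * np.log(400)) <
      penalised_cost(x, [], beta=2 * np.log(400)))
```

Minimizing over all possible `taus` and `m` is what the exact (PELT, segment neighborhood) and approximate (binary segmentation) searches mentioned above do efficiently.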
Within the literature, a number of different variance cost functions have been proposed, one of the best known being the penalized likelihood approach (see Chen & Gupta, 2012 for a description). Within this setting, the log likelihood is calculated for each segment and summed over segments to give the log likelihood for the data sequence. To avoid overfitting, the penalty is subtracted from the log likelihood, as in (1). Alternatives to the likelihood-based approach to changepoint detection include the important works of Inclán and Tiao (1994), who use a nonparametric cumulative sums of squares (CSS) approach, and Fearnhead (2006), who develops a Bayesian posterior odds approach.
In practice, data sequences are often prone to outliers and/or heavy-tailed structures, to which a majority of approaches are intolerant. Typically, some preprocessing of the data is performed in an attempt to mitigate these effects (Candemir & Oguz, 2017). In some cases, this is a straightforward adaptation; however, given the unprecedented volume of data now being generated, preprocessing is becoming increasingly impractical and often subjective (Taleb, Dssouli, & Serhani, 2015). This motivates the need for new methods that are inherently resilient to such features.
The novel contribution of this paper is a nonparametric method for detecting changes in variance in the presence of outliers and heavy tails. We develop this nonparametric method using the locally stationary wavelet (LSW) model, due to Nason, Von Sachs, and Kroisandt (2000), to provide a local estimate of the variance of a time series. This paper is organized as follows: In Section 2, we describe the LSW model and our method for detecting changes in variance. The method is then assessed under various simulation scenarios (Section 3). Lastly, Section 4 applies our method to wind speed data collected from a site in the United Kingdom.

A NONPARAMETRIC APPROACH TO DETECTING CHANGES IN VARIANCE
In this section, we describe our nonparametric method for detecting changes in variance. Our approach is based on the key insight that detecting a change in variance in the time domain can be transformed into detecting a change in mean in a transformed domain, given a suitable transformation. We are by no means the first to consider this (see, for example, Darkhovski, 1994; Inclán & Tiao, 1994). In contrast to this earlier work, we adopt a wavelet-based approach, which we describe below. Prior to describing our approach, we provide a brief introduction to the LSW modeling framework that we use.

LSW framework
Our method for detecting changes in variance relies upon capturing the local behavior of a time series' variance. This could be achieved using a rolling window estimate of the variance, but would require choice of a window size. Instead, we choose to adopt the LSW framework, which is built upon nondecimated wavelets.
The advantage of the LSW framework is that it encompasses many common time series processes, such as moving-average and autoregressive processes. Of particular interest for this work, we can use the LSW framework to obtain a local, time-varying measure of the variance. Below, we provide a brief introduction to the LSW approach for time series with a piecewise stationary second-order structure and introduce a time-varying measure of the variance.
Following Fryzlewicz and Nason (2006), a triangular stochastic array $\{X_{t,N}\}_{t=0,\ldots,N-1}$, for a dyadic length of time $N = 2^J \geq 1$, is an LSW process if there is a mean-square representation

$$X_{t,N} = \sum_{j=1}^{\infty} \sum_{k} W_j(k/N)\, \psi_{j,k-t}\, \xi_{j,k}, \quad (2)$$

where $\{\psi_{j,k-t}\}_{j,k}$ is a set of discrete compactly supported nondecimated wavelets and the $\xi_{j,k}$ are zero-mean, orthonormal, identically distributed random variables. Each function $W_j(z): [0,1] \to \mathbb{R}$ is real valued and piecewise constant with some finite unknown number of jumps. Let $L_j$ denote the total magnitude of jumps in $W_j^2(z)$. Then, the functions $W_j(z)$ satisfy

$$\sum_{j=1}^{\infty} W_j^2(z) < \infty \ \text{uniformly in } z, \qquad \sum_{j=1}^{\infty} 2^j L_j < \infty. \quad (3)$$

This definition of an LSW process (2) is a modification of that of Nason et al. (2000), in which the Lipschitz continuity constraint is replaced by one of bounded total variation. This allows a process with a piecewise constant second-order structure to be modeled. The notation $X_{t,N}$, rather than the traditional $X_t$, emphasizes that we have a triangular stochastic array across different values of $N$, although, in practice, the dependence on $N$ is often suppressed in the notation.
To enable various theoretical model properties to be established, Nason et al. (2000) adopted the rescaled-time approach proposed by Dahlhaus (1997). For a given LSW series $\{X_{t,N}\}_{t=1,\ldots,N}$, its time-varying spectrum is defined as $S_j(k/N) = |W_j(k/N)|^2$. To estimate the spectrum, define $d_{j,k} = \sum_{t=1}^{N} X_t \psi_{j,k-t}$ to be the empirical wavelet coefficients of an LSW process $X_{t,N}$. Nason et al. (2000) then demonstrate that the corrected wavelet periodogram

$$L_j(z) = \sum_{l=1}^{J} A^{-1}_{j,l}\, |d_{l, \lfloor zN \rfloor}|^2$$

is an asymptotically unbiased estimator of the time-varying spectrum $S_j(k/N)$. Here, $A^{-1}_{j,l}$ denotes the $(j,l)$th element of the inverse of the matrix of discrete autocorrelation wavelet inner products, $A_{j,l} = \sum_{\tau} \Psi_j(\tau)\Psi_l(\tau)$, where $\Psi_j(\tau) = \sum_k \psi_{j,k}\psi_{j,k-\tau}$ is the autocorrelation wavelet. We note that the wavelet periodogram is an inconsistent estimator of the evolutionary wavelet spectrum (see proposition 4 in Nason et al., 2000). Therefore, to obtain a consistent estimator, we would traditionally smooth the periodogram. Using representation (2), the variance of $X_t$ is given by

$$\text{Var}(X_{t,N}) = \sum_{j=1}^{\infty} \sum_{k} W_j^2(k/N)\, \psi^2_{j,k-t}. \quad (4)$$

The dependence on time in Equation (4) is introduced indirectly via the compact support of the wavelet. Using the wavelet spectrum, Nason et al. (2000) introduce the localized variance function for an LSW process of length $N = 2^J$. This is defined to be

$$\sigma^2(z) = \sum_{j=1}^{J} S_j(z), \quad (5)$$

where $z = k/N \in (0, 1)$ is rescaled time. If a time series has a constant variance, then the dependence on $z$ in Equation (5) is lost and the localized measure becomes a global one. The time-varying estimate of the variance (5) can be interpreted as a windowed rolling estimate of the variance of the time series. However, unlike a usual rolling estimate, no choice of window length is required: through the wavelet transform, a variety of window sizes are used. Through the compact support of the wavelets, the representation in (4) is unique given the wavelet (Nason et al., 2000).
Figure 1 shows an example of a process with (a) constant variance and (b) piecewise constant variance, together with their associated smoothed and unsmoothed local variance functions in (c), (d) and (e), (f), respectively. Figure 1(e) demonstrates that smoothing the spectral estimate masks the abrupt change that is clearly visible in (b) and (f). For this reason, the following section presents a method based on the unsmoothed localized variance.

The NPLE method
As previously described, if a time series is second-order stationary, then its evolutionary wavelet spectrum will be constant within each scale. Similarly, if a time series is piecewise second-order stationary, then the spectrum will be piecewise constant (Fryzlewicz & Nason, 2006). Consequently, as the localized variance function in Equation (5) is the sum of the spectrum over scales, the localized variance function will also be piecewise constant. In order to exploit this property for changepoint detection, we need to translate it into a practical setting. Thus, our estimate of the unsmoothed local variance function is defined as

$$\hat{\sigma}^2(z) = \sum_{j=1}^{J} L_j(z) = \sum_{j=1}^{J} \sum_{l=1}^{J} A^{-1}_{j,l}\, |d_{l, \lfloor zN \rfloor}|^2. \quad (6)$$

Due to the compact support of the wavelets, it is clear that, for a signal with piecewise constant variance, this estimate is also piecewise constant. The following section outlines our method for detecting these changes in the localized variance.
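As an illustration of the localized variance estimate in (6), the sketch below computes a nondecimated Haar wavelet periodogram, corrects it by the inverse of the autocorrelation wavelet inner-product matrix $A$, and sums over scales. This is a simplified Python rendering under our own conventions (finest scale $j = 1$, a small number of scales $J$); it is not the wavethresh implementation that the paper uses.

```python
import numpy as np

def haar_filters(J):
    """Discrete nondecimated Haar wavelets psi_j, j = 1..J (j = 1 finest),
    normalised so that sum(psi_j**2) == 1."""
    filters = []
    for j in range(1, J + 1):
        h = 2 ** (j - 1)
        filters.append(np.concatenate([np.ones(h), -np.ones(h)]) / np.sqrt(2 * h))
    return filters

def local_variance(x, J=4):
    """Sketch of Equation (6): correct the raw wavelet periodogram |d_{j,k}|^2
    by A^{-1}, the inverse autocorrelation wavelet inner-product matrix,
    then sum the corrected periodogram over scales."""
    filters = haar_filters(J)
    # Empirical coefficients d_{j,k} = sum_t x_t psi_{j,k-t}, via convolution.
    I = np.array([np.convolve(x, psi, mode="same") ** 2 for psi in filters])
    # Autocorrelation wavelets Psi_j(tau) and their inner products A_{j,l}.
    acw = [np.correlate(psi, psi, mode="full") for psi in filters]
    A = np.zeros((J, J))
    for j in range(J):
        for l in range(J):
            m = min(len(acw[j]), len(acw[l])) // 2  # overlap of supports
            A[j, l] = sum(acw[j][len(acw[j]) // 2 + t] * acw[l][len(acw[l]) // 2 + t]
                          for t in range(-m, m + 1))
    corrected = np.linalg.solve(A, I)  # one corrected periodogram row per scale
    return corrected.sum(axis=0)       # localized variance over rescaled time
```

A quick sanity check on the construction: for Haar wavelets normalised this way, the finest-scale inner product $A_{1,1}$ equals 1.5, matching the value reported by Nason et al. (2000).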

The nonparametric model
The localized variance function (6) is a sum of correlated $\chi^2$ random variables. In practice, it is difficult to obtain the distribution of such a sum (Gordon & Ramig, 1983). We therefore choose to adopt a nonparametric approach to this changepoint detection problem.
We consider the localized variance, $\hat{\sigma}^2_t$, and model its cumulative distribution function (CDF), $G(u) = P(\hat{\sigma}^2 \leq u)$, at quantile $u$ using the empirical CDF

$$\hat{G}(u) = \frac{1}{n} \sum_{t=1}^{n} \mathbb{1}\{\hat{\sigma}^2_t \leq u\}, \quad (7)$$

where the $\hat{\sigma}^2_t$ are assumed to be independent. With this distribution function in mind, following Zou, Yin, Feng, and Wang (2014), the maximum log likelihood of $G(u)$ is given by

$$n\hat{G}(u) \log \hat{G}(u) + n\left(1 - \hat{G}(u)\right) \log\left(1 - \hat{G}(u)\right), \quad (8)$$

because, for a fixed value of $u$, we have $n\hat{G}(u) \sim \text{Bin}(n, G(u))$.
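For concreteness, a direct Python transcription of the binomial empirical log likelihood just described might look as follows (the function name and the $0 \cdot \log 0 = 0$ convention are ours):

```python
import numpy as np

def ecdf_loglik(x, u):
    """Empirical log likelihood of G(u) at a fixed quantile u:
    n*Ghat(u)*log Ghat(u) + n*(1 - Ghat(u))*log(1 - Ghat(u)),
    which follows from n*Ghat(u) ~ Bin(n, G(u))."""
    n = len(x)
    ghat = np.mean(np.asarray(x) <= u)
    if ghat == 0.0 or ghat == 1.0:  # convention: 0 * log(0) = 0
        return 0.0
    return n * (ghat * np.log(ghat) + (1.0 - ghat) * np.log(1.0 - ghat))
```

At the sample median ($\hat{G} = 1/2$) the value is $-n\log 2$, its smallest possible value, which is why quantiles away from the extremes carry the most information about a change.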
Recall that, in order to identify changepoints, we aim to minimize

$$\sum_{i=1}^{m+1} \mathcal{C}\left(x_{(\tau_{i-1}+1):\tau_i}\right) + \beta f(m), \quad (9)$$

where the cost function for segment $i$ is given by the negative of the empirical log likelihood of the CDF of the localized variance estimate, that is,

$$\mathcal{C}\left(x_{(\tau_{i-1}+1):\tau_i}; u\right) = -(\tau_i - \tau_{i-1}) \left[ \hat{F}_i(u) \log \hat{F}_i(u) + \left(1 - \hat{F}_i(u)\right) \log\left(1 - \hat{F}_i(u)\right) \right], \quad (10)$$

where $\hat{F}_i(u)$ denotes the empirical CDF of segment $i$. Zou et al. (2014) recommend an integrated form of the cost function (10), as follows:

$$\mathcal{C}\left(x_{(\tau_{i-1}+1):\tau_i}\right) = \int \mathcal{C}\left(x_{(\tau_{i-1}+1):\tau_i}; u\right) w(u)\, du, \quad (11)$$

where $w(\cdot)$ is a weight function, dependent upon the CDF of the data set, such that the integral is finite. The consistency of this approach is detailed in Zou et al. (2014). This allows information across all time points to be incorporated into the cost function. The computational cost of the approach suggested by Zou et al. (2014) is $O(Mn^2 + n^3)$, where $M$ is a specified maximum number of changepoints. Zou et al. (2014) suggest a screening step to help reduce this computational time; however, this jeopardizes the accuracy of the locations of the changepoints. Haynes, Fearnhead, and Eckley (2017) suggest an improved segment cost that involves approximating the integral in (11) by a sum with some fixed number of terms $K$. This improves the computational time taken to calculate the cost for a given segment to $O(\log n)$. The suggested approximation is as follows.
Following Haynes et al. (2017), we fix $K$ and define $c = -\log(2n-1)/K$. Time is then rescaled according to quantiles dependent upon the choice of $K$. Let $\{t_k\}_{k=1,\ldots,K}$ be the $\left(1 + (2n-1)\exp\{c(2k-1)\}\right)^{-1}$ empirical quantiles of the data. The approximation to the integral in Equation (11) is then given by

$$\mathcal{C}\left(x_{(\tau_{i-1}+1):\tau_i}\right) \approx \frac{2\log(2n-1)}{K} \sum_{k=1}^{K} \mathcal{C}\left(x_{(\tau_{i-1}+1):\tau_i}; t_k\right). \quad (12)$$

We could use any search method to identify the changepoints using (12). However, Haynes et al. (2017) show that this cost function is compatible with PELT (Killick et al., 2012), a computationally efficient search for changepoints. We therefore use this search method in our simulation study in Section 3. Based upon the above description, we call the method outlined here Nonparametric change in variance detection using Localized Estimates, abbreviated to NPLE.
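Our reading of this construction, sketched in Python: build the $K$ quantile probabilities from $c = -\log(2n-1)/K$, take the corresponding empirical quantiles $t_k$, and average the pointwise costs over them. The function names and the handling of ties are our own; consult the changepoint.np source for the authoritative version.

```python
import numpy as np

def quantile_grid(x, K):
    """Quantiles t_k at probabilities p_k = 1 / (1 + (2n-1) * exp(c*(2k-1))),
    with c = -log(2n-1) / K; the p_k are spaced more densely near the tails."""
    n = len(x)
    c = -np.log(2 * n - 1) / K
    k = np.arange(1, K + 1)
    p = 1.0 / (1.0 + (2 * n - 1) * np.exp(c * (2 * k - 1)))
    return np.quantile(x, p)

def approx_segment_cost(seg, t_grid, n):
    """Sum-over-quantiles approximation to the integrated cost (12)."""
    m = len(seg)
    total = 0.0
    for u in t_grid:
        f = (np.sum(seg < u) + 0.5 * np.sum(seg == u)) / m  # mid-rank empirical CDF
        if 0.0 < f < 1.0:
            total += m * (f * np.log(f) + (1.0 - f) * np.log(1.0 - f))
    return -(2.0 * np.log(2 * n - 1) / len(t_grid)) * total
```

Because $K$ is fixed, evaluating a segment cost no longer scales with the full quantile integral, which is the source of the computational saving noted above.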

Penalty choice
Penalty choice is a practical challenge in many changepoint settings. We choose to take an adaptive approach to penalty selection following that of Lavielle (2005). Intuitively, this approach involves selecting the segmentation that causes the most significant decrease in the cost function. This can be presented graphically in a way analogous to a scree plot in principal component analysis (Jolliffe, 2002). Figure 2 shows an example plot of a cost function against the number of changepoints identified for a model with two true changepoints. The true segmentation occurs at the point of maximum curvature, or "elbow," of the plot, where the largest relative decrease in the cost function occurs. The procedure of identifying this "elbow" can be formalized, and automated, as follows.
In line with Lavielle (2005), let $m_{\text{MAX}}$ be an upper bound on the number of changepoints in the model. The PELT search algorithm results in a single optimal segmentation for a given penalty value. In order to obtain segmentations for a range of penalty values efficiently, we utilize the CROPS method (Haynes, Eckley, & Fearnhead, 2017). From this range of segmentations, we then wish to determine $\hat{m}$, the estimated number of changepoints in the model. Following Lavielle (2005), we obtain $\hat{m}$ using the following procedure.
where  m is the cost for the segmentation corresponding to m changepoints at locations 1:m . The associated costs have now been normalized between 1 and m MAX + 1.

Then, for 1
and D 1 = ∞. 3. The estimate for the true location of the changepoint is given by the largest value of m such that the second derivative of  m , D m , is greater than some threshold S, that is, The above procedure has also been implemented for penalty choice in a wavelet context by Killick, Eckley, and Jonathan (2013). The intuition behind this approach is that true changes will be added to the segmentation first as they will result in the largest improvement to the cost function. Following this, we will start to add spurious changes to the data, which are just due to noise, and so, the improvement in fit will be small. The aim of the choice of S is to put a threshold on the rate of change in the scaled test statistic as the number of changes increases. For an individual data set, we would do this using the changepoint equivalent of a scree plot.
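The selection procedure above can be sketched directly (the function name is ours; `costs[m-1]` holds the optimal cost for $m$ changepoints, as returned for a range of penalties by CROPS):

```python
import numpy as np

def choose_num_changepoints(costs, S=0.5):
    """Lavielle-style elbow selection from costs C_1..C_mMAX.
    Step 1: normalize costs to run between m_MAX + 1 (at m = 1) and 1.
    Step 2: discrete second derivative D_m, with D_1 = infinity.
    Step 3: return the largest m with D_m > S."""
    C = np.asarray(costs, dtype=float)
    m_max = len(C)
    ctil = (C[-1] - C) / (C[-1] - C[0]) * m_max + 1.0
    D = np.full(m_max, -np.inf)
    D[0] = np.inf
    D[1:-1] = ctil[:-2] - 2.0 * ctil[1:-1] + ctil[2:]
    return int(np.max(np.where(D > S)[0]) + 1)
```

With costs that drop sharply up to the true number of changes and only marginally afterwards, the elbow is picked out: for example, `choose_num_changepoints([100, 40, 38, 37.5, 37, 36.8])` returns 2.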

SIMULATION STUDY
In the following simulation study, we test the robustness of NPLE against the log likelihood of a normal distribution with changing variance (MLvar; Chen & Gupta, 2012) and the nonparametric CSS approach (Inclán & Tiao, 1994). This allows for a comparison against both a parametric and a nonparametric method. Each of these is implemented using the changepoint package (Killick, Haynes, & Eckley, 2016; Killick & Eckley, 2014) in R (R Core Team, 2018). For the calculation of the localized variance estimate, we utilize the wavethresh (Nason, 2012) and changepoint.np (Haynes, 2016) packages. The study also considers departures from the idealized setting of a change in variance in normally distributed data. Specifically, the simulation study provides a practical assessment of the proposed approach's resilience to departures from normality, including outliers and a heavy-tailed structure.

Random outliers
In this first study, we seek to test how each of the methods performs with varying degrees of outlier contamination. To this end, we simulate time series with different proportions of outliers. Specifically, we simulate epidemic changes in variance, $\sigma = (1, 3, 1, 3, 1, 3)$, from a normal distribution of length 2048 with changes at $365i$ for $i = 1, \ldots, 5$. The timing of the outliers is simulated from a Unif(1, 2048) distribution. To create outliers at these time points, we add a fixed constant, 15, to the existing observations. We repeat this for densities of $P = 0.01\%$, 1%, and 5% outlying observations within each data set, as well as the no-outlier case for comparison. The choice to use additive rather than multiplicative outliers means that the size of the outliers varies less across segments with differing variances. Table 1 shows the number of changepoints detected by each of the methods for the four values of $P$ over 500 repetitions. As expected, the performance of each method degrades as the percentage of outlying values increases. However, this degradation is not uniform across the methods. NPLE detects the correct number of changepoints 63% of the time when 5% of observations are outliers; in comparison, CSS achieves a similar rate when only 0.01% of observations are outliers. Figure 3 shows the density of detected changepoint locations for each of the methods for $P$ equal to 0.01%, 1%, and 5%. NPLE maintains accurate changepoint locations as $P$ increases, whereas the other methods are drawn to the outliers.
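The design of this study can be reproduced with a short simulation. This is a sketch under our own conventions; the exact random-number streams of the original study are of course not recoverable.

```python
import numpy as np

def simulate_outlier_series(p_outlier, rng):
    """Section 3.1 design: N(0, sigma_t) of length 2048 with standard deviation
    pattern (1, 3, 1, 3, 1, 3) changing at t = 365*i, plus additive outliers
    of +15 at uniformly sampled time points."""
    n = 2048
    sds = np.repeat([1.0, 3.0, 1.0, 3.0, 1.0, 3.0],
                    [365, 365, 365, 365, 365, n - 5 * 365])
    x = rng.normal(0.0, sds)
    n_out = int(round(p_outlier * n))
    x[rng.choice(n, size=n_out, replace=False)] += 15.0
    return x
```

For $P = 5\%$, about 102 of the 2048 points are shifted. Because the shift is additive, an outlier sits roughly 15 standard deviations out in a $\sigma = 1$ segment but only 5 in a $\sigma = 3$ segment, consistent with the stated motivation for additive rather than multiplicative contamination.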
The results of these simulations demonstrate that NPLE is less sensitive to outliers than the other methods. When using MLvar, and similarly CSS, the outliers contribute to both the likelihood and the sum of squares directly and distort the estimates.
In the next simulation study, we consider another model with outliers; however, they are located at fixed points in time.

Fixed outliers
In this section, we test the robustness of the methods for increasingly sized changes in variance, using variance changes that are more difficult to detect than those in Section 3.1. We simulated 500 repetitions of a normal distribution of length 2048 with changepoints at $365i$ for $i = 1, \ldots, 5$ and $\sigma = (1, 1.6, 1, 1.8, 1, 2)$. We also consider the effect of the proximity of outliers to changepoint locations. Hence, we introduce multiplicative outliers located at times (361, 462, 723, 924, 1244, 1630, 1881) with inflation factors (20, −20, 16, 18, 20, 10, 7). Figure 4a shows a realization of this model, where it is important to note the location of each outlier in relation to the changepoint locations. The first and third outliers occur close to changepoint locations, whereas the remaining outliers are firmly within segments. Despite the locations of the outliers being fixed, in comparison to the random locations in Section 3.1, the size of the outliers is more variable as a consequence of their multiplicative nature. Figure 4b shows the density of detected changepoints, and Table 2 gives the corresponding numbers of changepoints detected. NPLE detects the true number of changes 87% of the time, whereas MLvar and CSS achieve only 13% and 14%, respectively.
Turning to the locations of the changes: for the first change, the presence of outliers near the changepoint means that MLvar and CSS exhibit two distinct peaks in the detected location. This is not the case for NPLE; however, the outlier appears to cause the change to be detected slightly early. At the third change, MLvar and CSS often detect a change on either side of the true changepoint location.
All three methods perform similarly when detecting the second change.
Despite being the largest changes, the last three are detected correctly the least by MLvar and CSS; this is probably a consequence of the methods detecting a larger number of changes elsewhere, induced by the outliers. The large outliers at 462, 924, and 1244 have clearly resulted in spurious changes for both MLvar and CSS.
Our final simulation study considers data that, instead of containing outliers, exhibit heavy-tailed behavior.

Heavy-tailed structure
In this section, we consider data generated from a generalized extreme value (GEV) distribution, with zero mean ($E(X_t) = 0$), that exhibits changes in variance. The changes are located at times $256i$, $i = 1, \ldots, 7$, and the sequence of standard deviations is given by $\sigma = (1, 1.6, 1, 1.8, 1, 2, 1, 2.5)$. We consider three values of the shape parameter of the generalized extreme value distribution: 0, 0.25, and 0.45. Note that as $\sigma$ is a function of the shape and scale parameters, we keep the shape constant and modify only the scale across the segments to obtain the required $\sigma$. The tails become heavier as the shape parameter, $\xi$, increases across simulations. Table 3 shows the number of changepoints detected, and Figure 5 shows the corresponding densities for the locations. As expected, as the tails become heavier, the detection rate decreases for all methods. While all methods perform similarly for $\xi = 0$, as the shape parameter increases, NPLE is the most resilient to the heavy tails, providing around double the detection rate when $\xi = 0.25$ and 0.45. This illustrates NPLE's reduced sensitivity to heavy-tailed distributions. For MLvar, we are using a Gaussian assumption, and so we expect this method to perform poorly, but CSS does not make any tail assumptions.
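Reproducing this design requires the GEV scale that delivers a target standard deviation for a fixed shape $\xi < 1/2$, using the standard moment formulas $\text{Var} = \text{scale}^2 (g_2 - g_1^2)/\xi^2$ with $g_k = \Gamma(1 - k\xi)$, and the Gumbel limit $\text{Var} = \text{scale}^2 \pi^2/6$ at $\xi = 0$. A sketch (function names ours), with a helper for the mean-zero location shift:

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gev_scale(target_sd, shape):
    """Scale making a GEV with the given shape have standard deviation
    target_sd (requires shape < 1/2 for the variance to exist)."""
    if shape == 0.0:  # Gumbel limit: Var = scale^2 * pi^2 / 6
        return target_sd * math.sqrt(6.0) / math.pi
    g1 = math.gamma(1.0 - shape)
    g2 = math.gamma(1.0 - 2.0 * shape)
    return target_sd * abs(shape) / math.sqrt(g2 - g1 ** 2)

def gev_location(scale, shape):
    """Location shift that centres the GEV at mean zero, matching E(X_t) = 0."""
    if shape == 0.0:  # Gumbel mean: loc + scale * EULER_GAMMA
        return -scale * EULER_GAMMA
    return -scale * (math.gamma(1.0 - shape) - 1.0) / shape
```

Holding the shape fixed and recomputing the scale segment by segment yields the required sequence $\sigma$ while keeping the tail heaviness constant within each simulation.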

APPLICATION TO WIND SPEED CHARACTERIZATION
We now turn to consider the detection of variance changepoints within a time series of wind speeds. The data we analyze were obtained at a UK wind farm location during November 2005. Each measurement represents the average wind speed obtained from an anemometer at the farm. The series contains 4,261 observations, as depicted in Figure 6a. The data that support the findings of this study are available from the corresponding author upon reasonable request.
To explore whether any changes in variance exist within this wind speed data, we begin by taking first differences to remove the mean behavior. The resulting series has a very clear, nonconstant variance structure (Figure 6b). There also appear to be some anomalous observations that could potentially affect changepoint estimation. Note that while first differences were sufficient for this data sequence, wind speeds are periodic, and a longer stretch of data would require a different approach to remove this periodicity. Next, we apply both the NPLE and MLvar methods to the differenced wind speeds. To provide a fair comparison, we select the penalty for both methods following the approach of Lavielle (2005). The diagnostic plots are given in Figure 7, where it is clear that the elbow in the curve for NPLE is at nine changes and that for MLvar is at eight changes. The resulting changepoint plots for NPLE and MLvar are given in Figure 8. Note, in particular, how MLvar appears to inflate the variance estimate for the first segment of data in response to the anomalous points. This results in a later changepoint than under the NPLE method, which uses two changepoints to capture the period of smaller variability. For operational decisions, the segmentation provided by NPLE is preferred.

CONCLUSION
In this paper, we have introduced a novel changepoint detection procedure, NPLE, to detect changes in variance. The key benefit of our nonparametric approach is its capacity to provide changepoint estimates that are resilient to outliers and departures from normality. The method is shown to perform well against an established nonparametric method (CSS) and a penalized likelihood approach (MLvar) in all simulated settings. We also demonstrated the utility of NPLE on data obtained from a UK wind farm. In future work, we plan to extend our approach to detect changes in local autocovariance. Such a development could prove useful in a wide variety of environmental applications.