Signal Optimal Smoothing by Means of Spectral Analysis Signal Optimal Smoothing by Means of Spectral Analysis

This chapter introduces two new empirical methods for obtaining optimal smoothing of noise‐ridden stationary and nonstationary, linear and nonlinear signals. Both methods utilize an application of the spectral representation theorem (SRT) for signal decomposi‐ tion that exploits the dynamic properties of optimal control. The methods, named as SRT1 and SRT2, produce a low‐resolution and a high‐resolution filter, which may be utilized for optimal long‐ and short‐run tracking as well as forecasting devices. Monte Carlo simulation applied to three broad classes of signals enables comparing the dual SRT methods with a similarly optimized version of the well‐known and reputed empiri‐ cal Hilbert‐Huang transform (HHT). The results point to a more satisfactory performance of the SRT methods and especially the second, in terms of low and high resolution as compared to the HHT for any of the three signal classes, in many cases also for nonlinear and stationary/nonstationary signals. Finally, all of the three methods undergo statistical experimenting on eight select real‐time data sets, which include climatic, seismological, economic and solar time series.


Introduction
The literature on time series smoothing and denoising techniques is vast and encompasses different disciplines, such as chemometrics, econometrics, seismology, signal analysis and many more. Among the most renown and utilized methods are the Savitzky-Golay and Hodrick-Prescott filters, the Hilbert-Huang transform (HHT), wavelet analysis, as well as the ample class of kernel filters [1][2][3][4][5][6].
All of these techniques share an ample spectrum of degrees of resolution loss of the original raw signal, namely, a large family of filters addressed at separating from noise the stochastic or broken trend underlying the observed signal. The researcher is thus enabled to select the desired denoising frequency visually and/or by means of some prior information, like threshold selection, rescaling and data-dependent smoothing weights. Automatic selection procedures are available in the context of Savitzky-Golay filtering developed in the field of chemometrics [7] and quite a few in the field of image processing (e.g. [8]). Such procedures are utilized also for the stoppage criteria embedded in the HHT but are virtually absent in the field of econometrics [3,[8][9][10][11].
In general, however, the capability of balancing high-with low-resolution characteristics in a world of different real-time data (RTD) patterns is oftentimes questionable. In fact, in spite of their efficacy, all of these models contribute a common shortcoming by letting the researcher arbitrarily-and thus casually-select the degree of resolution, thereby risking over/under fitting of the slow mode with respect to the original signal. The immediate consequence is suboptimal denoising, namely, a signal extraction yielding a smoother that is statistically inconsistent with the original peak/trough pattern or that too closely replicates the original signal. For this purpose, automated and optimizing parameter selection must be utilized, based on the minimum squared distance between the actual and the smoothed series [12].
The goal of this chapter exactly addresses this problem, which may be defined as the search for statistical dynamic efficiency of the estimated smoother. This search, in practice, ends up with extracting from the available data set the optimal smoother, namely, the stochastic or broken trend, which minimizes the noise variance among many second-best solutions. Section 2 introduces time series decomposition and the statistical taxonomy of stochastic noise. Sections 3 and 4 are devoted to an introduction and detailed description, respectively, of the HHT and of the SRT models for signal smoothing. Section 5 produces comparative efficiency results of the two new techniques with respect to HHT by using Monte Carlo simulations and reports descriptive statistics of some select climatic, seismological, economic and solar real-time series. Section 6 concludes, while the Appendix produces the sources of the real-time signals utilized.
is the slow-mode or growth component in the form of a broken trend [16] or of a smoother [3]. The second component s t is the periodical seasonal cycle, if any and the last component is the The signal may be linear or nonlinear, as well as stationary or nonstationary, depending on its components and on their distributional properties. Briefly, nonlinearity is a feature of a signal characterized by the presence of linear as well as quadratic and/or cubic variables and/ or multiplicative variables, rendering the signal typically nonparametric and thus unsuitable for standard statistical testing such as variance analysis, prediction and stationarity. In addition, a linear or nonlinear signal is stationary (nonstationary) if its best linear trend fit is significantly flat (rising/falling over time). Nonlinearity and nonstationarity of the signal [17] can be tested by means of appropriate procedures [18][19][20][21]. The resulting number of regime switches, when applicable, is determined by computing single or multiple intercept and/or trend time breaks [22][23][24]. For details on the corresponding critical-value statistics, the reader is redirected to the respective authors.
All components of the observed signal are stochastic and unobservable and must be retrieved from the raw data set by means of appropriate computational methods, some of which were invoked in Introduction. In particular, the slow-mode term requires the use of some filtering procedures to possibly attain an optimal low-resolution time-varying trend, in fact the smoother (Section 3). In addition, if s t is a high-or low-resolution phenomenon, it may be entrenched into either or y t * and thus difficult to disentangle from either component.
Any observed signal in Eq. (1) is shown to pertain to one of the following three statistical taxonomic classes: (1) a random Gaussian white noise (RGS), (2) a colored (pink/red) random noise (RWS) and (3) a valid-threshold regime-switching Gaussian mixture constructed as a casually ordered sequence of two or more of the previous signals (RGWS).
Formally, the first kind of signal (RGS) may be expressed as follows: where the constant term, which corresponds to the mean of the noise and the time-slope coef- T is a real-valued univariate white-noise data sequence such that ε t ~ IID ( 0, σ ε 2 ) and σ ε 2 << ∞ is a constant. In Eq. (2), if δ = 0 , the noise is trend-stationary and the process is a pure RGS with constant mean and variance and expected zero-autocovariance. Instead, if significantly δ ≠ 0 , the noise is trend-nonstationary and exhibits a rising/falling trend line with white noise superimposed. This result may be substituted into Eq. (1) such that the slow model, and with it the entire signal, would be significantly trended.

The second kind of signal (RWS) is
where the coefficients contribute the same characteristics as those in Eq. (2), but even if δ = 0 the noise is trend-nonstationary and both the variance and the autocovariance are not constant overtime. In consequence, the process, which is an additive white noise, is entirely time dependent in both mean and variance and this affects the overtime pattern of the signal and its components.
The third kind of signal (RGWS) is represented by the Markov regime-switching model [25][26][27][28][29][30][31], where the raw signal may undergo significant structural changes across its lifetime due to variability in its underlying probability transition matrix. Therefore, the signal, which is a combination of Eqs. (2) and (3), may experience some quiet and some turbulent states of nature, like stock-market fluctuations and seismic signals. The obvious consequence is that the signal may fall short of standard parametric normality or stationarity statistical testing. Moreover, a threshold upper and lower limit must be imposed on regime-switching dates to avoid spurious computation of clusters too close to each endpoint. The threshold is set in the present context to be 15-85% of the total number of observations with consequential "valid" dates comprised within the reduced-size sample [24].
The RGWS noise class is represented by the following sequence where m ≥ 2 are the valid state indexes and, for i ∈ [ 1, m ] , the time index t ( i ) is state-specific, the expected value E ( | δ i | ) ≥ 0 , whereas f i ( . ) are the nonadditive or additive white-noise functions, respectively, from Eqs. (2) and (3). From Eq. (4), the typical ith state of the signal is where, given an m-sized vector of exponents represented by a sequence of positive integers 1 ≤ α ( i ) << ∞ , the fast modes of the signal are , for the remaining 1 ≤ i < m The entire m-sized sequence of the state functions f i ( ε t(i),i ) may turn to be highly nonlinear if at least one α ( i ) > 1 . As a result, the entire signal would give rise in many cases to quirky graphics characterized by enhanced peaks and/or troughs, possibly intermitted by sizably flatter states. Multiple regime switching is expected to be very frequent in such a case, much less so if α ( i ) ≡ 1 , where Eq. (4) would show a more moderate pattern and fewer state gyrations, even zero.
Either of the three taxonomies enables empirical estimation of Eq. (1) by means of any of the smoothing procedures introduced in Section 1. Such procedures in different manners will find the smoother y t * and by default the component y t . In the IID context applied to the noise ε t of Eq. (2) to Eq. (4), the random slow and fast components of Eq. (1) are both discrete stochastic processes with zero or nonzero mean and with finite variance and autocovariance. Incidentally, if the slow component is significantly trended (not trended), its autocovariance process is persistent (tapers off slowly over time), whereas the autocovariance of the second component rapidly tapers off over time (is white noise).
Whereas by simple logic the extraction process of the smoother y t * is carried out over the entire timespan of the signal, in the presence of significant regime changes in Eqs. (4) and (5), this procedure might not produce an optimal smoother. Therefore, extraction of the smoother may be applied on a subsample cluster basis rather than by full-sample computation. Each subsample corresponds to a state (Eq. (4)) and its own smoother is estimated separately from the others. The sequence so obtained is then tested by minimum percentage root mean squared error (PRMSE). In quite, a few real-time data cases (Section 5), the subsample estimation technique is found to be a better performer.
As a matter of empirical fact, simulation outcomes of Monte Carlo iterations with 1000 draws indicate that around 60% of the RGWS signals with α ( i ) ≡ 1 in Eq. (5) are nonlinear, while on average, roughly 40% are stationary and almost half of them contain one or more valid regime switches. Nonlinearity and stationarity are very common in the RGWS signals with α ( i ) > 1 in Eq. (5), as they are found in 70% of the corresponding simulations. Finally, on average, less than 1% (10%) of the RGS (RWS) are nonstationary (stationary), whereas in most cases (over 90%), both signals are linear.

The Hilbert-Huang transform (HHT)
The Hilbert-Huang transform (HHT) is an empirical procedure designed to correct for noiseridden signal smoothing, purported to work for both nonlinear and nonstationary time series [4-6, 32, 33]. The HHT is based on the so-called ensemble empirical mode decomposition (EEMD), which consists, for any arbitrary number of steps 2 ≤ Q ≤ T , of producing at each step an intrinsic mode function (IMF). Each IMF is computed by consecutive "siftings" of the data, a procedure that involves finding the means of the high-resolution envelopes constructed by cubic splining of the extreme values of the available data, possibly after correcting for endpoints [34]. The Q-step IMF-sifting process for a given signal Y t is represented by the following one-lag adaptive sequence where, given the upper and lower envelopes s t,p U , s t,p is the mean envelope obtained at the end of each sifting process. More generally, from the second line onward, Eq. (7) may be written as follows: which represents the family of all the sifted IMFs starting from the highest frequency.
Subsequently, the matrix H S T,Q : ( T, Q ) of the HHT smoothers is obtained. This is the matrix of the In an HHT environment, the optimal high-resolution smoother among the candidates of H S T,Q may be detected by utilizing the stoppage criterion for sifting, or else by similar means closely akin to the percentage root mean squared error (PRMSE), a much-used performance index for goodness-of-fit purposes. The PRMSEs may be computed as follows: which produce an inverse signal-to-noise ratio screeplot of length Q [35,36]. The procedure for detecting the screeplot global minimizer, denoted as Q * , requires where ‖.‖ is the Euclidean norm of the enclosed argument and Q * = arg max 1≤q≤Q ( P q ) [37]. There immediately follows that the measured PRMSE, given Q * , is represented by the following formula (12) which is the flagship of the efficiency indicators that shall be utilized on confrontational grounds with respect to the two SRTs discussed in Section 4.

The spectral representation transform (SRT)
By virtue of the spectral representation theorem, the signal Y t as in Eq. (1) may be approximated by the following De Moivre's formula, as follows: which depicts a harmonic waveform continuous function, where r = √ ___ − 1 and ω are a given arbitrary and constant frequency. There follows that the signal Y t may be defined by the following periodic function where μ is the mean of the signal, k ∈ [ are real-valued random coefficient sequences, both IID with finite mean and variance and is the frequency sequence such that where Δ y t = f ( . ) expresses the existence of linearity and nonlinearity of the process, depending on the exponent attached to the lagged signal level [20,21]. Needless to say, K generally corresponds to the maximal EMD level contemplated in the HHT method.
The fitted signal of Eq. (14), which is the smoother estimable by ordinary least squares [25], is actually the SRT of the original signal with time-varying amplitude and lag k. If we let the time series of the prediction error be where e t,k ~ IID ( 0, σ e,k 2 ) . After producing 1000 Monte Carlo normal simulations of Eq. (14) with T = 300 , the central limit theorem (CLT), as k → T , is found not to hold asymptotically for RGWS, as is obvious in the presence of regime switches. Elsewise, for RGS and RWS, we have lim k→T ( Y t = Y ^ t ,k ) . Similar to the technique utilized for identifying the optimal smoother in the PRMSE sense exhibited in Section 3, also here a matrix of smoother candidates may be obtained depending on the kth lag chosen. The matrix is defined as S S T,K : ( T, K ) from which the optimal smoother can be selected by applying the optimal lag stopping criterion similar in kind to Eq. (16). However, we utilize here a performance index for model selection different from the PRMSE, which is based on the dynamics of the Hamiltonian optimal control problem [38][39][40].
If we let e t,k be defined as in Eq. (16), then its dynamics is captured by the first-order autoregressive AR(1) process g t,k = e t,k − e t−1,k while the dynamics of the smoother is expressed as the . We expect both processes to be normally distributed with zero mean and finite variance.
Hence, the Hamiltonian problem may be expressed as follows: where the first element within the curly braces is of obvious reading and represents a resolution index that picks up the high frequencies of the problem. The second and the third elements capture the AR(1) dynamics of the processes involved in the Hamiltonian problem, namely, those of the prediction error and those of the signal itself. Eq. (17) is a cubic spline smoother that partly resembles the Hodrick-Prescott filter in the inclusion of both a cyclical and a trend part [3]. Moreover, Eq. (17) forms the basis for a screeplot [36] similar in kind to that of Eq. (11), whereby the optimal lag K * is obtained after letting

17)
such that K * = arg max 1≤k≤K ( V k ) and wherefrom Y ^ t, K * * is the optimal low-resolution Hamiltonianbased smoother, which is named SRT1.
From this smoother the dual high-resolution SRT smoother, named SRT2, may be easily constructed by means of envelope augmentation, a procedure knowingly embedded into the HHT sifting process (Section 3). Let the upper and the lower envelopes of the actual signal be defined as the unique cubic splines s t U , s t L constructed by using the extrema of the given data. The upshot is the signal Y ^ t, K * , which is the mean of the two smoothers. The error time series, for the optimal given lag K * , are expressed in a fashion similar to Eq. (9), as follows where the first error is associated with SRT1 and the second is associated with SRT2.
From Eq. (19), the PRMSEs of both models may be found by the same means as those employed to obtain Eq. (12), namely 0.5 (20) where the first (second) index is associated with SRT1 (SRT2). Figure 1 depicts six signals pertaining to the three taxonomic classes (Section 2). The first three signals are 300 observation artificial RGS, RWS and RGWS, drawn from standard normal distributions. The other three are real-world signals, of which the first two are recent Japanese earthquake seismographic waveforms collected from a specific web directory [41]. The last signal represents the Qualcomm NASDAQ-traded stock close price on a weekly basis from the year 1995 to date. In particular, with regard to the first two real-time signals (Figure 1, panels 4 and 5), the reported seismographs concern two earthquakes, respectively, occurred Horizontal (vertical) measurements in Figure 1 represent time lengths (magnitudes). All of the three smoothers (HHT, SRT1 and SRT2) are included together with a linear best-fit trend. The first two signals (RGS and RWS) are linear while the third (RGWS) is nonlinear (Figure 1, panels 1-3).

Smoother analysis applied to artificial and real-time signals
Among them, only RGS is stationary and none exhibits a valid time break. Of the three real-time signals (Makurazaki and Tohoku earthquakes, Qualcomm), all are nonlinear and one is nonsta- tionary as well (Tohoku earthquake). Moreover, all of these signals exhibit one single valid time break each, respectively, located at observation 57, 1123 and 261. Finally, from unreported results, SRT2 prevails as the least-PRMSE smoother for the first four signals, while the quirky behavior of the last two signals (Tohoku and Qualcomm) finds HHT as the most efficient smoother. Table 1, columns A-C, exhibits the efficiency performance and other coefficients of the three smoothers (HHT, SRT1 and SRT2) for the three signal classes (RGS, RWS and RGWS). The results shown are the mean values obtained from empirical Monte Carlo signal simulations with 1000 draws each and length of 300 observations. The PRMSE coefficients reported in line 1 have received attention in Eqs. (12) and (20), whereas the coefficients reported in line 2 show the simulated mean error variance appearing in their numerators. The variance of the smoothed trends is exhibited in line 3. In all cases, simple eyeballing points to SRT2 as the most efficient, that is, the variance minimizing smoothing method ( Table 1, third column of columns A-C).
Stationarity of the error variances in Eqs. (9) and (19) is crucial for establishing the goodnessof-fit of the estimated smoothers of all signal classes. Here, stationarity is tested by means of a novel technique, which corrects the conventional augmented Dickey-Fuller test statistic [19] after accounting for overtime changes in subsample variances, cycles and growing/falling linear trends in the signal [12]. This technique exhibits a similar nonparametric distribution as the ADF test statistic, such that the critical test statistics for stationarity of the untransformed signal are -2.79 and -2.51 for p-values of 1 and 5%, respectively. The corrected ADF test statistics of the error variances are reported in Table 1, line 4. In many, if not in most cases, stationarity emerges, yet the error variances generated by SRT2 are significantly stationary for all signal classes, whereas the other methods fail in the particular case of RGWS.
In order to further compare the HHT and the SRT performances, select real-time signals are being put to empirical testing for the sake of optimal smoothing analysis. The entire data set of the real-time data (RTD) includes eight signals of different nature: climatic, economic, seismic and solar. The RTD proposed are mostly high resolution and long term, ranging from a minimum of 315 to a maximum of 1915 observations of which one is yearly and all the others are monthly. The last access to the web for all available data was November 30, 2015. Among these RTDs, worth of more details than those provided in the Appendix is the earthquake of Banda Aceh, Indonesia, which occurred on December 26, 2014 at 00:58 UTC with M w of 9.2 and which caused, especially due to an extraordinarily violent tsunami wave, a death toll of an estimated 250,000. The data utilized straddle the foreshock and the main shock tremors, namely, the observations comprised between 12,000 and 15,000 out of a total of recorded waves tallying 58,320 observations.
The relevant descriptive statistics and visual performances of the RTDs are exhibited, respectively, in Table 2 and in Figure 2a and b. From Table 2, a very diverse pattern emerges in terms of mean and standard deviation with estimated volatilities ranging from 1.0 (NASDAQ) to 101.0 (Banda). Moreover, skewness appears mild everywhere, barring two cases where it is relatively high (S&P500 and NASDAQ), whereas kurtosis hovers for all RTDs around its critical value of three. All but two of the RTDs (AMO and SSN) are nonstationary and all exhibit zero valid regime switches, exclusion made for Banda and NASDAQ. Finally, they are all nonlinear except for AMO and SPI, as subsumed from the Harvey linearity test statistic whose critical value is close to 3.0. Optimal smoothing of the RTDs is achieved in the last two cases (Banda and NASDAQ) by means of subsample cluster analysis whereas for the other RTDs full-sample computation is more efficient (Section 2). The SRT2 smoother is found to exhibit the smallest PRMSEs in 75% of the cases proposed, whereas only three of them prefer the HHT method (S&P500, NASDAQ and SSN).
For each RTD, the actual signal, its optimal smoother (HHT or SRT2) and its linear trend are exhibited in Figure 2a and b. In Figure 1, horizontal (vertical) measurements represent time  lengths (magnitudes). Among the three HHT optimal smoothers found above, two are related to quirky signals, S&P500 and NASDAQ (Figure 2a, panel 4 and Figure 2b, panel 7). They exhibit however different frequencies, in spite of being estimated by the same subsample cluster technique. In fact the former (latter) is a low-(high-) resolution signal. Both signals exhibit in any case broken trends characterized by a long period of quiet followed by wild gyrations, which reflect the highly varying moods of both the stock market and of the Federal Reserve Board. The third signal associated with optimal smoothing attained through the HHT is the time series of sunspots (SSN). Full-sample estimation of the smoother was preferred, while broken trends clearly emerge from visual inspection (Figure 2b, panel 8). Significant regime switches are found to occur at the observations 1810 and 1902, in occasion of the Dalton and of the Modern Minimum, respectively [12].
Among the other five RTDs whose optimal smoother is of the SRT2 brand, only one (Banda) requires subsample cluster computation. The optimal smoother exhibits a dramatic regime switch in the passage from foreshocks to the main shocks and somewhat later (Figure 2b, panel 6). The two major break dates were found to be placed at observations 1833 and 1905 of 3000 observations. All of the RTD optimal smoothers, including Banda, exhibit high resolution and, at least visually, manifest considerable accuracy in tracking the original signal. This is particularly true for AMO, GISS, Yamalia and SPI, which are sizably noise-ridden (Figure 2a, panels 1-3 and 5).

Conclusions
This chapter has introduced and described in detail two new dual empirical methods for obtaining optimal smoothing of random signals pertaining to three broad taxonomic classes. Both methods utilize an application of the spectral representation theorem for signal decomposition that exploits the dynamic properties of optimal control. The two methods, named SRT1 and SRT2, produce a low-and a high-resolution filter, which may be utilized for optimal long-and short-run tracking as well as forecasting devices. The methods proven by Monte Carlo simulation found to be more efficient than the empirical Hilbert-Huang transform (HHT) for all of the taxonomic classes. The methods are also comparatively tested by using random artificial and a bunch of real-time signals, particularly eight select data sets including climatic, seismological and economic time series. HHT is proven to be more efficient in few cases of quirky, multiple regime-switch signals, like the Standard & Poor's 500 and the NASDAQ indexes.