Asymptotic Information Measures Discrimination of Non-Stationary Time Series Based on Wavelet Domain

This article is concerned with the problem of discriminating between two classes of locally stationary time series based on minimum discrimination information. We view the observed signals as realizations of Gaussian locally stationary wavelet (LSW) processes. The asymptotic Kullback-Leibler discrimination information and Chernoff discrimination information are developed as discriminant criteria for LSW processes. A simulation study shows that our procedure performs as well as competing classification methods and, in some cases, better. Applications to real data show the usefulness of our discriminant criteria.


Introduction
Many practical problems in time series analysis reduce to classifying a stochastic process into one category or another. Shumway & Stoffer (2011) is a good source of examples in this field. The Kullback & Leibler (1951) and Chernoff (1952) information measures, hereafter the KL and CH information measures, respectively, are appropriate measures for the classification and discrimination of time series data. They have been widely used for many years in fields such as economics, engineering and the medical sciences. Gersch, Yeager, Diamond, Spire, Gerry & Gevins (1975) and Gersch & Yonemoto (1977) used KL information measures to determine whether or not patients are sufficiently anesthetized for surgery. Shumway & Unger (1974) and Dargahi-Noubary & Laycock (1981) obtained spectral forms of the KL discrimination information to distinguish between different classes of seismic data, namely earthquakes and explosions. Extending Shumway & Unger (1974), Kakizawa, Shumway & Taniguchi (1998) developed the KL and Chernoff discrimination measures for bivariate time series data. Parzen (1990) and Zhang & Taniguchi (1992) studied the robustness of KL to departures from Gaussianity. All of the aforementioned papers rest on the assumption of stationarity. These methods perform admirably for stationary processes; however, time series often have time-varying dynamics, for which traditional spectral procedures do not perform well. Recently, several approaches have been proposed to model non-stationary time series, such as the locally stationary modeling of Dahlhaus (1997) and the smooth localized complex exponential (SLEX) methodology of Ombao, Raz, von Sachs & Malow (2001) and Ombao, Raz, von Sachs & Guo (2002). These methods provide a better basis for the discrimination of non-stationary time series such as seismic data. Recognizing this, Shumway (2003) and Sakiyama & Taniguchi (2004), for Dahlhaus' locally stationary processes, and Huang, Ombao & Stoffer (2004), for stationary time series in the SLEX model, proposed classification techniques. In the remainder of the article, we refer to these procedures as the SST and SLEX methods, respectively.
Some authors, such as Maharaj & Alonso (2007) and Fryzlewicz (2003), have proposed wavelet-based methods for the discriminant analysis of time series and have applied them to the discrimination of seismic signals. The modeling of a non-stationary random process as an LSW process is discussed in detail by Nason, von Sachs & Kroisandt (2000). The LSW model provides a time-scale decomposition of the signals in which the evolutionary wavelet spectrum can be defined and rigorously estimated (Fryzlewicz & Ombao 2009). In this paper we view the observed signals as realizations of Gaussian LSW processes. We obtain the likelihood and calculate the asymptotic KL and CH discrimination measures as discriminant criteria for Gaussian LSW processes.
Our discriminant criteria are related to the evolutionary wavelet spectrum (EWS), which contains the second-moment information of the non-stationary time series. Throughout the paper we suppose that there are two different groups of Gaussian LSW processes and a new non-stationary time series that has to be allocated to one of them. For each time series to be classified, we compute its wavelet discrimination information measure with respect to each group and assign the series to the group to which it is least dissimilar. We recognize that our methodology is very close to the method of Fryzlewicz & Ombao (2009) (which we refer to as the LSW method); the difference is that the LSW method is based on an $L_2$ criterion, while our method is based on criteria that are much more familiar in discrimination, i.e., the Kullback-Leibler and Chernoff discrimination measures.
The article is organized in six sections as follows. In Section 2, we discuss the LSW model as a tool to analyze non-stationary time series data. In Section 3, we obtain the minimum discrimination information for two criteria, the Kullback-Leibler and Chernoff discrimination measures, for Gaussian LSW processes. We explain our discrimination algorithm in Section 4. To evaluate the performance of our discriminant criteria, a simulation study is carried out in Section 5 and our methods are compared with other proposed approaches. Finally, in Section 6, we apply our procedure to real data.

The LSW Model
Throughout the paper we assume that the non-stationary time series follows the LSW model based on the discrete wavelet transform (DWT); see Nason et al. (2000). A class of LSW processes is a sequence of doubly-indexed stochastic processes $\{X_{t,T}\}_{t=1,\ldots,T}$, $T \geq 1$, with a representation, referred to as (1), in the mean-square sense with respect to a given wavelet basis $\{\psi_{j,k}(t)\}_{j,k}$ of $L_2(\mathbb{R})$ and an orthonormal random sequence of increments $\xi_{j,k}$, possessing the following properties:
1. $E(\xi_{j,k}) = 0$ for all $j, k$, and hence $E(X_{t,T}) = 0$ for all $t$ and $T$.
4. The real-valued wavelet basis $\{\psi_{j,k}(t)\}_{j,k}$ is orthonormal and the wavelets have compact support.
Note 2. Most time series, including stationary series, can be modeled as (1).

Discrimination Based on Using Decimated Wavelet Basis
Let $d_{j,k;T} = \sum_{t} X_{t,T}\,\psi_{j,k}(t)$ be the empirical wavelet coefficients, where $X_{t,T}$ is an LSW process given by (1). Because the wavelet basis is orthonormal and the $\xi_{l,m}$ are uncorrelated, the empirical wavelet coefficients are uncorrelated. If $X_{t,T}$ in (1) is a Gaussian process, the empirical wavelet coefficients are therefore independent Gaussian variables, and the log-likelihood ratio based on the empirical wavelet coefficients follows from formula (4). Consider the situation where an LSW process $X_{t,T}$ belongs, under hypothesis $H_i$, to population $\Pi_i$ ($i = 1, 2$). One classical measure of disparity between two multivariate distributions is the Kullback-Leibler (KL) discrimination information (Kullback & Leibler 1951, Kullback 1978), defined as the expectation under $H_1$ of the log-likelihood ratio $\log(p_{\Pi_1}/p_{\Pi_2})$, where $p_{\Pi_i}$ is the probability density of $X_{t,T}$ under $H_i$ ($i = 1, 2$).
Another criterion for measuring the discrepancy between two densities, proposed in Parzen (1990), is the Chernoff information, which is based on the Chernoff distance (see Chernoff 1952) and depends on a parameter $h \in (0, 1)$.
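To make the two measures concrete, the following is a minimal sketch, not the paper's own implementation. It assumes the decimated Haar basis (chosen only for brevity) and treats the empirical wavelet coefficients as independent zero-mean Gaussians with known variances. The Chernoff convention used here, $-\log \int p_1^{h} p_2^{1-h}$, is also an assumption, and the function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal decimated Haar transform of a signal of length 2^J.

    Returns the detail coefficients d_{j,k}, one array per scale,
    finest scale first.
    """
    approx = np.asarray(x, dtype=float)
    n = approx.size
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details

def gaussian_kl(var1, var2):
    """KL information I(1:2) for independent zero-mean Gaussian
    coefficients with variances var1 (under H1) and var2 (under H2)."""
    ratio = np.asarray(var1, dtype=float) / np.asarray(var2, dtype=float)
    return 0.5 * np.sum(ratio - 1.0 - np.log(ratio))

def gaussian_chernoff(var1, var2, h):
    """Chernoff information -log int p1^h p2^(1-h) for the same model,
    with 0 < h < 1 (assumed convention)."""
    v1 = np.asarray(var1, dtype=float)
    v2 = np.asarray(var2, dtype=float)
    mix = h * v2 + (1.0 - h) * v1
    return 0.5 * np.sum(np.log(mix) - (1.0 - h) * np.log(v1) - h * np.log(v2))
```

Both measures are zero when the two variance profiles coincide and grow as the profiles separate. In the setting above, `var1` and `var2` would be the two group spectra evaluated at the time-scale indices in use.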

Discrimination Based on Using Non-Decimated Wavelet Basis
In an LSW model based on the non-decimated wavelet basis $\psi_{j,k}(t) = 2^{j/2}\psi(2^{j}(t-k))$, let $\xi_{j,k}$ be a Gaussian process. The covariance between two empirical wavelet coefficients $d_{j,k;T}$ and $d_{j',k';T}$ decays with increasing distance between the location $k$ on scale $j$ and the location $k'$ on scale $j'$. For example, when $j = j'$ and $k \neq k'$, the covariance term is zero once $v + |k' - k|$ exceeds the supports of the wavelets. Therefore, the correlation between $d_{j,k;T}$ and $d_{j,k';T}$ vanishes as $|k' - k| \to \infty$.
As noted in Remark 1, the KL criterion in (8) and the CH criterion in (9) take corresponding forms in this setting. In practice the true wavelet spectra $S^{i}_{j}(z)$ are unknown and must be estimated. Let $I^{j}_{k,T} = |d_{j,k;T}|^{2}$ be the wavelet periodogram of $\{X_{t,T}\}$. From (11), $I^{j}_{k,T}$ is a biased estimator of the evolutionary wavelet spectrum (EWS). Nason et al. (2000) have shown that the $J$-dimensional matrix $A_J = (A_{j,l})_{j,l=-1,\ldots,-J}$ is positive-definite and has a bounded inverse. Formula (11) suggests that the natural asymptotically unbiased estimator of $S_j(k/T)$ is the empirical wavelet spectrum obtained by applying $A_J^{-1}$ to the wavelet periodogram.
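The bias correction can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the paper: it uses the Haar wavelet, periodic boundary handling, and positive scale indices $1, \ldots, J$ (finest first) in place of $-1, \ldots, -J$. The matrix is built as $A_{j,l} = \sum_\tau \Psi_j(\tau)\Psi_l(\tau)$, where $\Psi_j$ is the autocorrelation of the discrete wavelet at scale $j$, following Nason et al. (2000). All function names are hypothetical.

```python
import numpy as np

def haar_wavelet(j):
    """Discrete Haar wavelet at scale j = 1 (finest), 2, ..., unit norm."""
    half = 2 ** (j - 1)
    return np.concatenate([np.ones(half), -np.ones(half)]) / np.sqrt(2.0 * half)

def autocorr_inner_product_matrix(J):
    """A[j, l] = sum over lags of Psi_j(tau) * Psi_l(tau)."""
    centre = 2 ** J - 1                      # index of lag 0
    Psi = np.zeros((J, 2 * centre + 1))      # lags -(2^J - 1), ..., 2^J - 1
    for j in range(1, J + 1):
        psi = haar_wavelet(j)
        L = psi.size
        # Full autocorrelation covers lags -(L - 1), ..., L - 1.
        Psi[j - 1, centre - (L - 1): centre + L] = np.correlate(psi, psi, mode="full")
    return Psi @ Psi.T

def haar_periodogram(x, J):
    """Non-decimated wavelet periodogram I[j, k] = d_{j,k}^2."""
    x = np.asarray(x, dtype=float)
    T = x.size
    I = np.empty((J, T))
    for j in range(1, J + 1):
        psi = haar_wavelet(j)
        # d_{j,k} = sum_m psi[m] * x[(k + m) mod T]  (periodic boundary)
        idx = (np.arange(T)[:, None] + np.arange(psi.size)[None, :]) % T
        I[j - 1] = (x[idx] @ psi) ** 2
    return I

def corrected_spectrum(I, A):
    """Empirical wavelet spectrum: solve A L = I at every time point."""
    return np.linalg.solve(A, I)
```

Solving the linear system column by column avoids forming the inverse explicitly. The corrected values can be negative for individual time points, which is one reason smoothing is applied before the estimate is used inside a logarithm.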

Discrimination Based on KL and CH Measures
Suppose that the observed time series $W = \{W_{t,T}\}_{t=1,\ldots,T}$, $T \geq 1$, has to be assigned to one of two available groups $\{X_1, \ldots, X_{n_1}\}$ or $\{Y_1, \ldots, Y_{n_2}\}$ with evolutionary wavelet spectra $S^{1}_{j}(z)$ and $S^{2}_{j}(z)$, respectively. The classification algorithm proceeds as follows.
1. Choice of the discriminating set $\Theta$. The important wavelet coefficients for classification are identified using the method proposed by Fryzlewicz & Ombao (2009): for each time-scale index $(j, k)$, a divergence index measures the ability to separate the groups. These divergence values are then ordered, and only a pre-specified top proportion of the coefficients is retained.
2. Discrimination based on the Kullback-Leibler (KL) discriminant criterion. According to the KL criterion (16), $W$ is assigned to the group from which its KL divergence, computed over $\Theta$, is smaller.
3. Discrimination based on the Chernoff (CH) criterion. According to the CH criterion (17), $W$ is assigned to the group from which its Chernoff divergence, computed over $\Theta$, is smaller.
In practice, the spectra $S^{i}_{j}(z)$ in (16) and (17) are replaced by the empirical wavelet spectra, averaged across time series replicates. The empirical wavelet spectrum of the $r$-th series in group $i$ is denoted by $L^{i,r}_{j}(k/T)$. We then replace $S^{i}_{j}(z)$ in (16) and (17) by its estimate $\hat{S}^{i}_{j}(k/T) = \frac{1}{n_i}\sum_{r=1}^{n_i} L^{i,r}_{j}(k/T)$. As in the stationary case, the empirical wavelet periodogram is an asymptotically unbiased but inconsistent estimator of the wavelet spectrum and needs to be smoothed, so a smoothed version of the periodogram is used in practice. With these estimates, the KL criterion (18) and the CH criterion (19) each assign $W$ to the first group if the estimated divergence from the first group is the smaller one, and to the second group otherwise.
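The steps above can be sketched in a few lines. This is an assumed, simplified version and not the criteria (16) to (19) themselves: the divergence index is taken to be the squared difference of the group spectra (a placeholder), the KL divergence is the closed form for independent zero-mean Gaussian coefficients, and entries with non-positive spectral estimates are dropped so that the logarithm is defined. All function names are hypothetical.

```python
import numpy as np

def group_spectrum(replicate_spectra):
    """Average the empirical wavelet spectra over the replicates of a group."""
    return np.mean(np.asarray(replicate_spectra, dtype=float), axis=0)

def discriminating_set(S1, S2, p=0.10):
    """Boolean mask of the top proportion p of (j, k) indices, ranked by a
    placeholder divergence index (squared spectral difference)."""
    divergence = (S1 - S2) ** 2
    n_keep = max(1, int(np.ceil(p * divergence.size)))
    cutoff = np.sort(divergence, axis=None)[-n_keep]
    return divergence >= cutoff   # ties at the cutoff are all kept

def kl_to_group(Sw, Sg, mask):
    """Gaussian KL divergence of the series spectrum Sw from the group
    spectrum Sg, restricted to the discriminating set."""
    ratio = Sw[mask] / Sg[mask]
    return 0.5 * np.sum(ratio - 1.0 - np.log(ratio))

def classify(Sw, S1, S2, p=0.10):
    """Return 1 or 2, the group from which Sw diverges least."""
    mask = discriminating_set(S1, S2, p) & (Sw > 0) & (S1 > 0) & (S2 > 0)
    d1 = kl_to_group(Sw, S1, mask)
    d2 = kl_to_group(Sw, S2, mask)
    return 1 if d1 <= d2 else 2
```

Here `Sw`, `S1` and `S2` are arrays of shape (scales, time points). Replacing `kl_to_group` with a Chernoff-type divergence for a chosen `h` would give the analogous CH rule.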

Numerical Study
In this section a numerical study is conducted to check the performance of the discriminant criteria obtained in the previous section. Various kinds of time series, including stationary and different kinds of non-stationary time series, were investigated. The results confirm the ability of the discriminant criteria (16) and (17). In addition, the simulation study of Fryzlewicz & Ombao (2009) is repeated so that the discriminant criteria can be compared with other advanced and useful methods for discriminating time series proposed by other researchers. These methods are: the SST method, proposed by Shumway (2003) and Sakiyama & Taniguchi (2004); the SLEX method, proposed by Huang et al. (2004); and the LSW method, proposed by Fryzlewicz & Ombao (2009). Fryzlewicz & Ombao (2009) considered four cases:
Case 1 for abruptly changing parameters and relatively large disparity between groups.
Case 2 for abruptly changing parameters and small disparity between groups.
Case 3 for smoothly changing parameters and relatively large disparity between groups.
Case 4 for smoothly changing parameters and small disparity between groups.
The $n$-th time series from group $g$ ($g = 1, 2$), denoted by $X^{g}_{n,t}$, is generated from a process driven by innovations $\epsilon^{g}_{n,t} \sim \text{iid } N(0, 1)$. In all cases, each of the two groups consisted of $N = 8$ time series, each of length $T = 256$. Fifty test time series, each of length $T = 256$, were then simulated from each of groups 1 and 2. The test time series were independent of the training data. The subset $\Theta$ of coefficients was selected as the top $p = 10\%$ of the time-scale coefficients. Each test time series was then allocated to group 1 or 2 according to the discriminant criteria (18) and (19).
Daubechies (1992) identifies the Extremal Phase family: a collection of orthonormal wavelet bases possessing different degrees of smoothness and numbers of vanishing moments. This family of bases is indexed by the number of vanishing moments, and the Haar basis is its zeroth member (Fryzlewicz 2003). Nason et al. (2000) have shown that the $J$-dimensional matrix in (12) becomes 'more' diagonal as the number of vanishing moments of the underlying wavelet increases. In all the simulations, we use Daubechies' Extremal Phase wavelet No. 4 from the wavethresh software package for R. However, our investigations showed that the number of vanishing moments has a negligible effect on classification. We also eliminated the coarsest scale in all calculations.
Within the Chernoff discriminant criterion, different values of the parameter $h$ were examined, and small values led to better results. Therefore, $h = 0.1$ was chosen. In practice, however, there were rare cases in which the argument of the logarithm in the discriminant criteria (16) and (17) became negative. These terms were omitted from the calculations. The results are compared with three other approaches: the SST method, the SLEX method and the LSW method.
The results are presented in Table 1. In case 1, the SLEX method, with either finest level $J = 3$ or $4$, and the LSW method, applied with the top $p = 25\%$ of time-scale coefficients, perform perfectly. The other methods also work well. In case 2, the best performance was achieved by the LSW method (89%). The CH and KL methods also give acceptable results, with KL slightly better than CH. In this case the SST and SLEX methods are not as good as the other methods. In case 3 the results are good, except for the SLEX method, which has a 74% correct classification rate. In this case SST and LSW perform perfectly, and the CH method gives slightly better results than the KL method.
The results of KL, CH and LSW in case 4 are the same (92% correct classification). The best result belongs to the SST method, which has a 96% correct classification rate. The SLEX method is the worst.
It should be mentioned that in all four cases the conditions imposed on the two groups, including the number of group members, the length of the time series and their models, were demanding. In general, KL and CH showed relatively good performance in all four cases. Moreover, when applied to real data, the Kullback-Leibler and Chernoff criteria perform better than the LSW method.

Discrimination of Seismic Signals
Discriminating between nuclear explosions and earthquakes is a problem of critical importance for monitoring a comprehensive test-ban treaty. A dataset constructed by Blandford (1993), comprising regional (100-2,000 km) recordings of several typical Scandinavian events, is used in this study. A list of these events, including eight earthquakes, eight explosions and an event of uncertain origin located in the Novaya Zemlya region of Russia, was given by Kakizawa et al. (1998). The problem was discussed in detail by Shumway & Stoffer (2011, chap. 5), and the data are available online at http://lib.stat.cmu.edu/general/tsa2/. Each earthquake and explosion record is composed of two phases, the P-phase and the S-phase, with 1024 points in each phase. Figure 2 shows an earthquake, an explosion and the Novaya Zemlya (NZ) event. The discrimination results are given in Table 2. In the CH method we use $h = 0.05$, since a small value gave better results. To determine the rate of correct classification, we used the leave-one-out method. The discrimination results are good: with $p = 10\%$, the KL method correctly classified 14 of the 16 seismic signals, and the CH method, with $h = 0.05$, correctly classified 13 of the 16. Our investigation showed that, for the seismic data, eliminating the finest scale leads to a better result for the KL discriminant criterion: with the finest scale eliminated, the KL method correctly classified 15 of the 16 seismic signals. The LSW method, applied with the top $p = 0.1k$, $k = 1, \ldots, 10$, of time-scale coefficients, correctly classified at most 14 of the 16 seismic signals.
Moreover, in both the KL and CH methods, the Novaya Zemlya event is classified as an explosion, which is consistent with the findings of Huang et al. (2004) and Fryzlewicz & Ombao (2009).
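The leave-one-out procedure used above can be illustrated with a short, self-contained sketch. It assumes that each signal has already been reduced to an array of positive spectral estimates and uses a Gaussian KL divergence as the dissimilarity; both choices, and the function names, are assumptions made for illustration.

```python
import numpy as np

def kl_divergence(Sw, Sg):
    """Gaussian KL divergence between two positive spectral arrays."""
    ratio = Sw / Sg
    return 0.5 * np.sum(ratio - 1.0 - np.log(ratio))

def leave_one_out_misclassifications(spectra, labels):
    """Hold out each series in turn, rebuild both group spectra from the
    remaining series, classify the held-out series, and count the errors."""
    spectra = [np.asarray(s, dtype=float) for s in spectra]
    errors = 0
    for i, (held_out, truth) in enumerate(zip(spectra, labels)):
        means = {}
        for g in (1, 2):
            members = [s for k, s in enumerate(spectra)
                       if k != i and labels[k] == g]
            means[g] = np.mean(members, axis=0)
        predicted = min(means, key=lambda g: kl_divergence(held_out, means[g]))
        errors += int(predicted != truth)
    return errors
```

Because the held-out series never contributes to its own group spectrum, the resulting count is an honest estimate of the misclassification rate for small samples such as the 16 seismic records.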

Discrimination of Population Data
The population dataset is a collection of time series representing the population estimates from 1900-1999 in 20 US states; the data are available online from http://www.census.gov/population/www/estimates/ststts.html. Some of these time series have an exponentially increasing trend, while others have a stabilizing trend. The 20 states were partitioned into two groups based on their trends: group 1 consisted of CA, CO, FL, GA, MD, NC, SC, TN, TX, VA, and WA and had the exponentially increasing trend, while group 2 consisted of IL, MA, MI, NJ, NY, OK, PA, ND, and SD and had a stabilizing trend (Kalpakis, Gada & Puttagunta 2001).
Since the time series length, 100, is not a power of two, we use a novel approach to retain as much of the information as possible. We partition each time series into three parts: the first part, of length 64, covers 1900-1963; the second part, of length 32, covers 1964-1995; and the third part, of length 4, covers 1996-1999 and was discarded. We then standardized the first and second parts of each series. To discriminate this dataset, we summed each criterion, namely our proposed KL and CH criteria and the Euclidean distance proposed by Fryzlewicz & Ombao (2009), over the two parts. All three methods were applied with the top $p = 10\%$ of time-scale coefficients, and the coarsest scale was eliminated. Once again, within the CH criterion, different values of the parameter $h$ were examined, and small values led to better results; therefore, $h = 0.1$ was chosen. In the leave-one-out method, all time series were correctly classified by the CH and KL criteria. In the LSW method, however, 18 of the 20 time series were correctly classified; NC and MI were misclassified.
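The partitioning step can be sketched as follows. The block lengths 64 and 32 come from the text; standardizing each block to zero mean and unit sample variance is an assumption about what "standardized" means here, and the function name is hypothetical.

```python
import numpy as np

def dyadic_blocks(x, lengths=(64, 32)):
    """Split a series into consecutive blocks of dyadic length and
    standardize each block; any remaining observations are dropped."""
    x = np.asarray(x, dtype=float)
    blocks, start = [], 0
    for n in lengths:
        block = x[start:start + n]
        if block.size != n:
            raise ValueError("series is too short for the requested blocks")
        # Assumed standardization: zero mean, unit sample variance.
        blocks.append((block - block.mean()) / block.std(ddof=1))
        start += n
    return blocks
```

Each block can then be passed to a wavelet transform that requires a dyadic length, and a criterion evaluated on the two blocks can be added to obtain a single score per series.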

Figure 1: Plot of a simulated time series for each of the cases in both groups, g = 1, 2.

Figure 2: An earthquake, an explosion and the NZ event.

Table 1: The correct classification rate for the four cases using different criteria.

Table 2: The number of misclassifications using a leave-one-out cross-validation procedure in the discrimination of seismic signals based on criteria (16) and (17).