Time Series Chaos Detection and Assessment via Scale Dependent Lyapunov Exponent

,


Introduction
Nature is per se' a non-linear entity, whose measurable expressions are non-linear as well.Real-world Data Generating Processes (DGPs) -except for trivial cases or lab-controlled experiments -hardly ever do possess features compatible with a linear framework.In particular, for time series produced by non linear dynamical systems, non linearity can unfold in many different ways, giving rise to various, mainly complex phenomena as a result, such as deterministic chaos, long memory, non stationarity, fractal and multi-fractal behavior.Deterministic chaos is among the most interesting features that can arise in such a framework.In particular, two topics have become crucial in several research fields -such as engineering, meteorology, cosmology and medicine -i.e. its discrimination from random noise and the estimation of its level.In fact, dealing with complicated structures -as in the case of weak RADAR signals (see for example Harman et al., 2006;Liu et al., 2007), noisy satellite images (Olsen et al., 2009) or complex biological data Freeman, 1992 and reference therein) -requires the analyst to have a picture as clear as possible on whether chaotic components are present in the dynamical system at hand as well as on their extent.However, it should be emphasized that, in its short life (it dates back to the early 70s), chaos theory has impacted a wide range of scientific activities, including the non physical ones, such as economics (see e.g.Peters, 1994;Brock et al., 1989;Puu, 2013), psychology (Guastello, 2013), sociology (Eve et al., 1997) and archaeology (Chadwick, 1998).
Deterministic chaos is characterized by sensitivity to initial conditions, which can be detected by measuring the rate of divergence between two trajectories starting from nearby states.Lyapunov Exponent (LE) -is one popular choice routinely employed to this end.This mathematical tool has been conceived and designed to perform in the field of physical science, where the full knowledge of the underlying stochastic process -in terms of both signal-to-noise ratio (SNR) and determining equations -guarantees the control of the system investigated.However, many times evidences of chaos in a given experimental system has to be evaluated on empirical basis, namely by direct observation of the empirical data.Regardless of the field of application, two typical features of such a framework are the lack of knowledge of the set of rules governing the system as well as of the quantity of noise present (and its exact nature).While the former makes system control procedures in general difficult to apply, the latter might have even more dangerous consequences.As an ubiquitous and unavoidable entity inherent with the experimental nature of the data, noise can have a catastrophic impact on virtually all the stages of chaos detection and assessment.Either in the form of environmental fluctuations or limited experimental resolution, noise can introduce significant amount of uncertainty in the system and chaos can go undetected as a result.Another serious problem is related to the fact that the estimation of the Lyapunov exponent, being performed on trajectories of finite length, can be severely biased due to their insufficient duration.
In the present paper, it is shown how chaos detection and assessment can be pursued in empirical setups, i.e. characterized by sensitivity to external noise and poor data resolution, through the estimation of the Lyapunov Exponent within the framework of multi-resolution approximation (MRA).The rest of the paper is organized as follows: in Section 1.1 the goals pursued are more precisely articulated and in Section 1.2 wavelet theory is introduced.Its particularization within Chaos theory is discussed in Section 2, where also the analytic tools employed for chaos detection are introduced.The employed variance decomposition technique is illustrated in Section 2.2.Finally, the empirical experiment is illustrated in Sections 3.

The Goal
As already pointed out, the objective of the present paper is to undertake the analysis of a given time series to check for the presence of deterministic chaos and, if so, to provide an estimate -either quantitative and qualitative -of its degree.In more details, the proposed approach combines chaos and wavelet theories with the three-fold purposes of a) detecting deterministic chaos b) discriminating it from random noise and c) quantifying chaos at different time scales.By having a clearer picture of the embedded structures of the time series at hand, one is more likely to assess the predictive capabilities of a given forecasting tool or to conclude that a portion of it (either in time or frequency), or even the whole time series is unpredictable.In particular, wavelet decomposition can help the analyst choose the more predictable components and to design ad hoc forecasting procedures.Finally, a variance decomposition algorithm is employed as a proxy of the amount of chaos present at a given decomposition level.In more details, point a) is dealt with by repeating the LE estimation at each scale: by probing small length resolutions, chaos (and possibly other non-linear phenomena) detection as well as better understanding of its nature can be gained.Regarding the issue sub b), MRA low-pass filtering capabilities can be useful to discriminate pure random noise and chaotic components whereas point c) is pretty straightforward, as it consists in applying variance decomposition (W-ANOVA, i.e.Wavelet Analysis of Variance) procedures at each scale.

Signal Decomposing Procedure
Dynamical decomposition of the observed time series is performed through a mathematical transform of the type MRA, which is induced by well localized functions: the wavelets, formalized in (1) and (2).MRA consists of a hierarchical sequence of nested subspaces progressively approximating the Hilbert space L 2 (R) of all the squared integrable functions satisfying the following properties: The scaling function ϕ (the father wavelet), constitutes the scale function for a given MRA.In other words, scaling by 2 j generates basis functions for the space V j so that, given that (...V j−1 ⊂ V j ⊂ V j+1 ...), the scaling equation can be expressed as (1), under the condition of a proper choice of the coefficients {c k ; k ∈ Z}; by linearly combining the scaled father wavelet with a suitably chosen set of the coefficients By dilations and translations of ψ(x), the space of functions Ψ(x) is generated , i.e.Ψ = By projecting the data onto shifted and translated transformations of the mother and father wavelets, the sets of wavelet and scaling coefficients are respectively obtained, i.e.
Here, each set of coefficients, usually referred to as crystal, is linked to a spacial scale j, whereas every single coefficient, called atom, accounts for a particular location.In practice, wavelet coefficients { d jk ; j = 1, 2, ..., J } in (3) account for and represent progressively finer and finer details whereas smooth dynamics at the coarsest scale are captured by the crystal {S Jk } (4).MODWT is the filtering approach achieved used to perform MRA.Here the wavelet and scale coefficients are given by: where } are the length L, level j, wavelet and scaling filters, obtained by rescaling their DWT counterparts, i.e.

{ h j,l
} and } , as follows: h j,l = h j,l 2 j/2 and g j,l = g j,l 2 j/2 .Here, the sequences of coefficients } and } are approximate filters: the former, of the type band-pass, with nominal pass-band f ∈ [ 1 4 µ j , 1 2 µ j ], and the latter of the type low-pass, with a nominal pass-band f ∈ [0, 1 4 µ j ], with µ j denoting the scale.Considering all the J = J max sustainable scales, MRA wavelet representation of the given time series x t , in the L 2 (R) space, can be expressed as follows: with k taking integer values from 1 to the length of the vector of wavelet coefficients related to the component j.

Practicalities
In an empirical set up, boundary conditions can be selected on the basis of the characteristics of the time series and/or as a part of a preliminary ad hoc investigation on a guess and check basis.On the contrary, the choice of the wavelet function and its length L is in general critical and strictly system specific.In what follows, the 4 th order Daubechies least asymmetric wavelet filter (known also as symmlets) of length L = 8, usually denoted by LA(8), has been made use of.This type of filter, commonly adopted in several applications, has been chosen for its salient properties, that is: near symmetry in the midpoint and (approximately) linear phase.MRA has been performed using Maximum Overlapping Discrete Wavelet Transform (MODWT) algorithm.Technically, it is a filtering approach aimed at modifying the observed series x t , by artificially introducing an extension of it, so that the unobserved samples {x} t∈Z − are assigned the observed values X T −1 , X T −2 ,...,X 0 .This method, considers the series as it were periodic 1 , and is known as using circular boundary conditions.For more details on MODWT, the reader is referred to Percival and Walden (2006) and Percival (2002).

The Wavelet Phase-space Reconstruction and the Scale Dependent Chaos Detector
The Lyapunov-based investigation is defined by considering an ensemble of trajectories, which requires a suitable reconstruction of the phase space of the underlying dynamical system, say x n+1 = F (x n ).This is observed through a function y = H(x), whose elements are the scalar measurements S (n) at a different time, i.e.
with m being the embedding dimension and τ the time delay.Takens embedding theorem allows us to use the delay coordinates (7), in virtue of the two dynamical systems arising from this setup, i.e. y n+k = H(x n+k ) = H(F k (x n )) and y n+1 = G(y n ).Considering the latter, x ↔ y, one-to-one type relationship is guaranteed only for a "sufficiently" large m.Takens theorem particularizes to time dependent processes, the results obtained by Whitney in the case of dimension D smooth manifold, say M, whose dynamic can be embedded in R 2D+1 and immersed in R 2D .In essence, Takens 1 However, it should be emphasized that, even though such an approach suffices in many cases, it shows weakness when non-periodic signals are affected by discontinuities, as in the case, for example, of certain deseasonalized economic indicators.To alleviate the problem, a common strategy is to introduce an artificial extension of the time series, by doubling up its original sample size through boundary conditions of the type reflection, i.e. the unobserved values x −1 , x −2 , ..., x −T are assigned the values observed at x 0 , x 1 ,...x proved that -given a diffeophorism M defining the set of trajectories on M, a smooth observation function y : M → R generates an embedding of M in 2m + 1 dimensions under the transformation M M,y : M → R 2m+1 , where M M,y (x) = < y(x), y(M(x)), y(M 2 (x)), . . ., y(M 2m (x)) >.Here, each of the elements < y(x), y(M(x)), y(M 2 (x)), . . ., y(M 2m (x)) > represents the time-shifted observations of the dynamics induced by M on M.An in-depth and detailed discussion of this subjected can be found in Deyle and Sugihara (2011) and Noakes (1991).
Relevant for the present analysis is the result from Sauer and Yorke (1993) on whether filtering procedure could affect proper embedding.Their central result is that by applying filters of the type Finite Impulse Response (FIR) embedding procedures, in general, would not be compromised, under the assumption that a sufficient number of independent observables is available.This setup is consistent with the adopted MRA approach -other than the assumption, previously stated, of time series of "sufficient" length.Regarding the first condition, a basis of the type Riesz can be transformed into an orthogonal basis through the transfer function

}
(5), which entirely determines the scaling function S J,t (5).The g-induced scaling function S t is compactly supported if and only if } has a finite number of non zero coefficients, that is g(•) is a FIR filter.
However, in order for the function g(•) to generate multiresolution approximations, some conditions need to be satisfied (see, for example Merry & Steinbuch, 2005).They are: On the other hand, if (a) ĝ(ω) is periodic (period = 2π); (b) ĝ(ω) of the type C 1 in a neighborhood of ω = 0; (c) conditions (i, ii) above hold; is the Fourier transform of a scaling function S ∈ L 2 (R).
Finally, the application of Lyapunov exponent at different resolution levels finds its theoretical justification in the fact that (see e.g.Lamarque & Malasoma, 1996 ) given a scalar function f : R → R, and a wavelet function ϕ(x) (1), the f -wavelet transformation Q is given by with b being the focal point of the mathematical microscope ϕ(•) and 1 a the magnifying factor.The quantification of the separation (δ) over time (∆ t ) of couple of neighbor trajectories starting in δ giving rise to the Lyapunov exponent λ, i.e.
is done with reference to the function Q(•).
Therefore, the Lyapunov exponent is now expressed as where U(x t 0 ) defines the neighborhood centered on x t 0 .

Embedding Dimension and Time Delay Estimation
In what follows, the analytic tools employed for the optimal LE estimation are presented.They are designed to capture patterns in the probabilistic structure in the linear (autocorrelation function) and non linear (mutual information) case, whereas the estimation of the time delay is provided by the false nearest neighbor method.

Autocorrelation and Mutual Information
The usual autocorrelation function (11)(12) and the time delayed mutual information (13-14), as well as visual inspection of delay representations with various lags provide important information about reasonable delay times.
Here ( 12) -whose asymptotic confidence intervals are 1 T -represents the estimator for (11) at a given resolution level j.However, more guidance has been gained by means of the mutual information function.This function is able to capture and account for linear and nonlinear correlations; in essence, it expresses the entropy-related concept of measuring the "amount of information" obtained about one random variable, through the other random variable.It is given by: Its estimator, for the resolution level j, is given by

Time Delay Estimation
Determination of the proper time delay d has been done with regard to nearest neighbor of each of the vectors (7) using the L 2 norm.This is a critical and in general a non trivial task: its underestimation is particularly dangerous, as two time delay vectors might show a small distance not as a result of the peculiar system dynamics but due to projection.The employed procedure, conducted at the j-level, declares a given neighbor of y j,k , denoted as y • j,k , a false neighbor according to the rate of false nearest neighbors in the reconstructed phase space, i.e.: Here, A thr ≈ 2 (see e.g.Aittokallio et al., 1999) is the radius of the attractor whereas with µ(•) denoting the mean value computed on all the points.
However, this procedure -designed to test progressively higher dimensionalies until a sufficiently small number is obtained (possibly 0) -is able to provide reliable outcomes in case of noise-free data.On the contrary, when noisy components are embedded in the time series, the likelihood of finding false nearest neighbors increase with the sample size.The use of wavelet theory, in the form of MRA algorithm, is a coping strategy for such a circumstance, as it is able to provide information at different levels which can be less affected by noise.
Spurious temporal correlation, which can seriously bias the estimation of the system embedding dimension, are dealt with by ruling out the set of points closer than some threshold time, i.e. the Theiler window.This quantity has been estimated using the method introduced by Provenzale et al. (1992), which is basically based on a sequence of space-time separation plots employed for the detection of temporal structures in the data.MODWT's translation-invariant property allows the effective alignment of the different events at different resolution levels so that the integrity of the dynamics induced by transient events is preserved.

Wavelet ANOVA
MODWT is a transformation of the type energy conserving, i.e. ∥X∥ 2 = ∑ J 0 j=1 ∥ W j ∥ 2 + ∥ V j 0 ∥ 2 , therefore a scale dependent analysis of variance can be derived, based on the set of wavelet and scaling coefficients, i.e.: Equation ( 15) enable us to quantify the allocation of the energy roughly attributable to chaotic components across the J scales Ruling out the boundary coefficients sets, we have that the wavelet variance ν2 is time independent under second order stationarity or difference stationary assumptions.The former implies that the extracted sequence (signal) { τ j,t } t∈Z + , that E(X) = µ and cov(τ j,t , τ j,t+k ) = γ j,k , being µ constant and γ k time independent.Defining the back-shift operator L, i.e.Lτ t = τ t−1 (therefore L n τ = τ t−n ) and the difference operator (the subscript j is omitted) ▽ d τ t = (1 − L) d τ t d = 0, 1, . . .D, the latter implies that the transformed time series ▽ d τ t = stationary.However, these conditions are usually met at coarser levels, where with M − j = N − L j + 1 being the size of the set of non-boundary coefficients for the j-level, L j the wavefilter length, and u stands for unbiased.However, by using MODWT, one is forced to use either reflection, or circular boundary conditions.Therefore, the proper estimator is given by: bias Confidence intervals for the true wavelet variance can be built for both ( 17) and ( 18) on the basis of their asymptotic approximation to a scaled χ 2 distribution with δ j EDOF (Equivalent Degree Of Freedom), which reflects the correlation structure at different resolution levels j.Following Percival and Walden (2006), 100%(1 − 2p) confidence intervals are approximated by with according to whether the case (17) or ( 18) is considered.

Table 2 .
Series X t , MRA outcomes for the time frame Z 1

Table 3 .
Series X t , MRA outcomes for the time frame Z 2

Table 4 .
Series X t , MRA outcomes for the time frame Z 3 Figure1.Time series X t and its MODWT coefficient sequence d j,t ; j = 1, . . ., 6