Characterizing abrupt transitions in stochastic dynamics

Data sampled at discrete times appears as a succession of discontinuous jumps, even if the underlying trajectory is continuous. We analytically derive a criterion that allows one to check whether for a given, even noisy time series the underlying process has a continuous (diffusion) trajectory or has jump discontinuities. This enables one to detect and characterize abrupt changes (jump events) in given time series. The proposed criterion is validated numerically using synthetic continuous and discontinuous time series. We demonstrate applicability of our criterion to distinguish diffusive and jumpy behavior by a data-driven inference of higher-order conditional moments from empirical observations.


Introduction
Many empirical time series exhibit fluctuations that are interrupted by jumps in very short time between different states of a system [1]. Examples include dynamics of charge transport in various materials [2], stochastic resonance [3], moving fronts [4], light curves of variable astronomical objects [5], fluctuations of wind and solar power systems [6], early warning signals of systems near to their tipping points [7], transitions in financial [8,9] and climate data [10], ion channel dynamics [11], eye movements [12], or movement and foraging paths of animals [13]. Jumps generate pronounced discontinuities in time series of observables and apparently, the underlying trajectories are unlikely to be generated by statistical continuous, diffusion-type processes. Such processes have been extensively studied [14,15] and are commonly modeled by a Langevin equation (using the Itô interpretation) as 0 is a scalar Wiener (Brownian) motion, and a(x, t) and ( ) b x t , 2 denote the state-dependent deterministic drift and the diffusion functions. A process x(t) generated with equation (1) is a continuous diffusion process if a(x, t) and b(x, t) are smooth and do not change dramatically over a short time interval t d [16]. The unknown functions a(x, t) and b(x, t) can be found non-parametrically [17]-i.e. directly from measured time series-in terms of the firstand second-order Kramers-Moyal (KM) coefficients. For a continuous diffusion process (equation (1) and for infinitesimal t d ), we have x t x . For this type of process and using the conditional probability distribution, one can show that x(t) satisfies Lindeberg's continuity condition, given some δ>0 [16] When analyzing empirical data, one might want to ensure that the corresponding time series can indeed be modeled by equation (1). For this purpose, one should check if the fourth-order KM coefficient vanishes. If this was the case, then-according to the Pawula theorem [18]-all higher-order (>2) coefficients will vanish too ( ), and the probability density of the underlying process can be described by a Fokker-Planck equation [17].
In general, for a given time series, the non-vanishing of higher-order (>2) KM coefficients can be related to the existence of jumps in the time series [16]. In this case, the KM coefficients provide an upper limit for the continuity condition (equation (3)): , which holds for any m3 (provided the firstand second-order coefficients are non-vanishing) [16]. Therefore, any vanishing higher-order KM coefficients, particularly the fourth-order one ( 4 ), guarantee that the underlying process is statistically continuous. Depending on the problem formulation, checking the boundedness of the aforementioned continuity condition may be easier than estimating the tail of the probability distribution.
Here we demonstrate that a finite time interval t d not only influences the firstand second-order KM coefficients [19] but also causes non-vanishing higher-order (>2) ones. We derive a novel criterion to check whether for a given, even noisy time series the underlying process has a continuous trajectories or has jump discontinuities.
This paper is organized as follows. In sections 2 and 3, we present higher-order conditional moments of continuous, linear and nonlinear diffusion processes as well as of jump-diffusion processes. We then present in section 4 a novel criterion to distinguish diffusive and jumpy behavior in time series. In section 5, we derive the firstand second-order conditional moments of noisy empirical time series, and in section 6, we provide our results from an analysis of two real-world time series. We summarize our paper in section 7. All derivations and proofs are presented in appendices A and B.

Higher-order conditional moments of continuous, linear and nonlinear diffusion processes
We begin by deriving conditional moments of the Langevin equation (1) for different orders of the time interval t d (see appendix A and [20]). We find the following expressions for the conditional moments of orders Î { } m 2, 4, 6 , if we consider terms up to the order of the first non-vanishing power in ( ) t d 2 for m=2 and m=4, respectively ( ) t d 3 for m=6 (here we omit the x-and t-dependence of a and b to enhance readability) where the subscript 'd' denotes diffusion, and where ¢ a and ¢ b denote the first and a″ and b″ the second derivatives with respect to state variable x. We check the validity of expansions in equation (4) as well as of those for conditional moments of orders Î { } m 1, 3, 5 (see appendix A) by reconstructing stochastic processes with known drift and diffusion coefficients from synthetic time series sampled with time intervals t d spanning three orders of magnitude.
For both linear and nonlinear continuous processes described by the Langevin equation (equation (1)), we find a very good agreement between estimated conditional moments and the respective theoretical predictions (see figures 1 and 2), demonstrating the validity of our approach for such systems. As expected, we find the second-order conditional moment to depend linearly on t d while the fourth-order one scales with t d 2 (see figure 3). These findings demonstrate that-even for continuous diffusion processes-non-vanishing higherorder (>2) conditional moments can originate from a finite time interval t d . With the second-and fourth-order conditional moments for small t d we find , which confirms Wick's theorem [21,22] and follows from the fact that the shorttime propagator of the Langevin dynamics (equation (1)) is a Gaussian distribution [17]. No such short-time propagator, however, is known for general jump-diffusion processes (see below). In figure 4, we show this relationship for the aforementioned continuous nonlinear diffusion process. Similar findings can be obtained for a continuous linear diffusive process, and we note that a normalization of the time series leaves this relationship between the fourth-and second-order conditional moments unaffected. For the analysis of empirical data, we propose to use this relationship to judge whether the fourth-order moment tends to zero.
estimated from exemplary time series of a continuous linear diffusion process (equation (1)), generated with a(x)=−px and b(x)=b 0 (p = 10, b 0 =1) and with a time interval =t d 10 5 . Time series consisted of N=3×10 6 data points. We observed extreme events up to about ±3σ for these time series (σ denotes the standard deviation). Error bars indicate the standard error of the mean (SEM) in each bin. Theoretically expected values are shown as red lines, and in the range ±2σ more than two-thirds of these values lie within the SEM interval of estimated conditional moments. Deviations in the tails are due to low statistics.
.05) and with a time interval =t d 10 5 . Time series consisted of N=3×10 6 data points. We observed extreme events up to ±4σ for these time series. Theoretically expected values are shown as red lines, and in the range ±1.5σ more than two-thirds of these values lie within the SEM interval of estimated conditional moments. Deviations in the tails are due to low statistics.

Higher-order conditional moments of jump-diffusion processes
As a third example, we consider a dynamical stochastic equation that is capable of generating a discontinuous trajectory [16]  for some x=x 0 on time intervals t d for a continuous nonlinear diffusion process (see figure 2; we obtained similar findings for the continuous linear diffusion process). Conditional moments were estimated from normalized time series (zero mean and unit variance). When analyzing empirical data, the ratio between fourth-and second-order conditional moment (see inset) is often used to judge whether the fourth-order moment tends to zero. Lines are for eye guidance only. 3 . We chose x 0 around the mean of the respective time series. Lines are for eye guidance only.
0 is a scalar Wiener process, a(x, t) and b(x, t) are again the state-dependent deterministic drift and the multiplicative functions, and J(t) is a time-homogeneous Poisson jump process (we assume that jump events are rare and can be modeled via a Poisson process). Jumps have state-dependent rate λ(x) (which defines the mean waiting time τ p =1/λ between successive jumps) and size ξ, which we assume to be Gaussian distributed with zero mean and variance s x 2 (or to follow any symmetric distribution with finite moments). For infinitesimal t d , it was shown recently [16] that x l (the subscript 'j' denotes jumpy), and in the following, we derive conditional moments for different orders of the finite time interval t d . From equation (6), we would expect for infinitesimal t d that all conditional moments (except the odd-order ones with m>1) are non-vanishing. Therefore the conditional probability distribution p (x, t) of the process x(t) in equation (5)   where the superscript { } i denotes the ith derivative with respect to x. The terms A-E are related to drift and diffusion functions and jump properties as (see appendix B) x l x l x l  (7) are zero). We note that finite-N synthetic data from equation (5) may not have jump number l  n N j for a constant jump rate. We find again a very good agreement between estimated conditional moments and the respective theoretical predictions (see figure 5), demonstrating the validity of our approach even for such systems.
In order to derive the unknown functions in equation (6), we use the slope of ( ) for small t d (see figure 3) and estimate x l á ñ ( ) , and this leads to s l x ( ) x 3 4 and s l x ( ) x 30 6 . Now we can estimate both jump amplitude s x 2 and jump rate λ(x) from ( ) provides an estimate for the drift function a.
In closing this section, we note that for jump-diffusion processes and for small t d the ratio between fourthand second-order conditional moments diverges as

A criterion to distinguish diffusive and jumpy behavior in time series
With conditional moments ( ) (the subscript• is a placeholder for either 'd' or 'j') and using either relations 4 or relations 7, we now derive a function Q(x) that-for small time intervals t d -allows one to detect and to distinguish diffusive and jumpy behavior in time series  diffusion function b (see figure 6). Any deviation from will rule out that the underlying process is diffusive. For the exemplary time series of the jump-diffusion process generated with equation (5), Q(x) converges to the jump amplitude s x 2 . We note that Q(x) will approach x x á ñ á ñ 5 6 4 for jump amplitudes with a non-Gaussian distribution.

Higher-order conditional moments of noisy empirical time series
We now consider the case that a time series x(t) is contaminated with some noise η(t), which is not assimilated by the stochastic process, leading to y(t)=x(t)+η(t). The noise η(t) is supposed to be uncorrelated with x(t), and we assume that it has finite even-order statistical moments h á ñ n 2 and vanishing odd-order ones. To reconstruct the unknown dynamics x(t) from the measurement y(t), it is essential to quantify η(t) and its influence on the reconstruction of KM coefficients. One can derive the following expressions for the firstand second-order conditional moments of y(t) (the subscript 'n' denotes noisy) [23]:  [23]. We note that the averaged statistical moments á -ñ ( ( ) ) x t y m do not depend on t d . These terms cause a strong overestimation of KM coefficients = for small t d and thus of the functions of interest in equations (1)and (5). Now, equations (4)and (7) state that there are extra t d -independent terms contributing to the conditional moments and these constants can be subsumed with the different statistical moments of the noise. These constants can easily be estimated by dividing the evenorder conditional moments for y(t) by t d (because this leads to a divergence of ( ) ) [23], and averaged higher-order statistical moments of η can then be derived from the knowledge of the lower-order ones. Having estimated the averaged statistical moments of the noise allows one to find conditional moments for the time series x(t), and with equation (10) one can check whether the underlying process has a continuous (diffusive) or discontinuous (jumpy) trajectory.

Application to real-world time series
In this section, we demonstrate applicability of our criterion to distinguish diffusive and jumpy behavior (equation (10)) by a data-driven inference of higher-order conditional moments from empirical observations. Before going into details, we briefly mention that in the case of empirically derived time series that were sampled with a fixed sampling interval Δ, equation (10) can be verified by scaling the time interval as a = D t d , where α=1, 2, 3, K(i.e. by considering data points , only). This scaling changes the estimated drift coefficient a to α a, the diffusion coefficient b to a b, and the estimated jump amplitude s x 2 to a s x then indicates a possibly diffusive or jumpy behavior. For coarse scales (α ? 1), we expect Q(x) to take on non-vanishing values, given that data discretized at such scales appears as a succession of discontinuous jumps, even if the underlying trajectory is continuous. For small scales (  a = ( ) 1 ) and diffusive processes, Q(x) approaches zero since the Brownian-type (Wiener-type) behavior of the process produces a continuous trajectory [16]. Thus, the small-scale behavior of Q(x) for  a = ( ) 1 is an indicator for rapid changes or jumps in a given time series. For our analysis, we expect that the scaled sampling interval is lying in the interval a t < D < t p M . Here, t M denotes the Markov-Einstein time scale, which is the minimum time interval over which the data can be approximated by a Markov process; t M can be estimated from the investigated time series [17], for instance by directly checking the validity of the Chapman-Kolmogorov equation [24]. The mean waiting time τ p between jumps is defined as in section 3 and can be estimated from the investigated time series as described in [16].
The data we analyze here were part of previous studies [6,25,26]. The first time series is a measurement of the spatial position of a dielectric bead (polystyrene, diameter ∼1 μm, Bangs Laboratories Inc. USA) trapped in optical tweezers. The setup consisted of a Nd:YAG laser (power: 73 mW, wavelength: 1064 nm, Coherent, USA) focused using a water immersion objective (Olympus Corp., Japan) in an optimal condition [27,28]. A Si PIN quadrant photodiode (S5980, Hamamatsu Photonics K. K., Japan) was positioned at the back focal plane (BFP) of the condenser allowing an accurate detection of the displacements of the trapped bead through the BFP detection scheme [29]. The voltage output of the quadrant photodiode was first amplified and then digitized using an A/D card (National Instruments Corp., USA). Trapping experiments were conducted under the optimal condition with almost zero aberrations at a depth of ∼10 μm in order to neglect the hydrodynamic effect of the chamber walls. Once a bead was trapped, ten positional time series (see figure 7 top) were recorded over a period of 3 s each with a sampling rate of 22 kHz (see [25] for further details; t M =Δ). A data-driven estimation of the firstand second-order conditional moments confirmed the diffusive nature of the bead dynamics [25].
The second time series is based on a 12 months measurement (with a sampling rate of 1 Hz) of global solar irradiance on horizontal and inclined surfaces conducted by the United States' National Renewable Energy Laboratory at Kalaeloa Airport (21.312°N, -158.084°W), Hawaii, USA, from March 2010 until March 2011 [30]. Measurement were performed using 19 LI-200 pyranometers (LI-COR, USA). Two pyranometers were tilted by 45°, while the remaining ones were horizontally mounted and scattered across an area of about 750×750 m 2 . The data is available from http://nrel.gov/midc/oahu_archive/. We here investigate clear-sky index data * = ( ) ( ) I t I t I cs , where I(t) and I cs are the measured solar irradiance and its theoretical prediction under clear-sky conditions at a given latitude and longitude, respectively (see [6,26] for further details; t M = 4Δ; A jumpy stochastic behavior of solar irradiance represents a major obstacle in the power production as it influences not only the availability of energy, but also the stability of the entire power grid [6]. Moreover, it requires expensive technical solutions, such as fast reserves or storage systems in power supply to overcome and compensate such fluctuations. In figure 8, we show for both time series the dependence of Q(x) on the scaling coefficient α. For the time series of the spatial position of a bead trapped in optical tweezers, we obtain a clear indication for a continuous diffusion process, as expected. For the time series of the clear-sky index, our analyses indicate a discontinuous trajectory with jumps that can be traced back to on-off fluctuations of the cloud structure generated by turbulence in the atmosphere [6]. In addition, it appears that for time-scales greater than 8 s (α=8) this time series contains jumps with amplitude s x  0.11 2 (for the chosen x 0 around the mean of data). Before closing this section, we demonstrate that our approach to distinguish diffusive and jumpy behavior can easily be extended to non-stationary time series, e.g. by employing kernel-based estimation techniques of higher-order conditional moments [16,31]. This allows for a time-resolved estimation of local characteristics of a time series, such as drift and diffusion coefficients as well as the jump rate and jump amplitude. As an example, we investigate the clear-sky index data (with α=1), which has strong nonlinear and non-stationary properties on time-scales less than 10 4 s [26], and estimate Q(x(t)) in a time-resolved fashion (Gaussian kernel with bandwidth 0.1). Its temporal evolution (see figure 9) provides strong evidence for an intermittent switching between diffusive and jumpy behavior. A similar analysis for the spatial position of a bead trapped in optical tweezers did not indicate a jumpy behavior (Q(x(t)) took on values in the order of 10 −8 only; data not shown).

Concluding remarks
The KM expansion for the probability density of some stochastic process can be reduced to the Fokker-Planck equation if higher-order (>2) KM coefficients vanish. There are, however, many physical experiments that indicate non-vanishing higher-order KM coefficients [17,[32][33][34][35][36][37][38][39]. A priori, it is not evident if such observations are due to the finiteness of the respective sampling intervals or whether the measured time series do not belong to the class of continuous diffusion processes [14,40] and contain discontinuous, abrupt changes or jumps. The latter were shown recently to have pronounced contributions to higher-order KM coefficients [16] and to account for the non-Gaussian behavior of increment statistics of empirical time series [6]. Nevertheless, jumps are notoriously difficult to identify since in practice, only discrete data are available from continuous-time models. Here we have demonstrated that a finite sampling interval not only influences the firstand secondorder KM coefficients but also causes non-vanishing higher-order ones. For small time intervals t d , the linear relationship between the fourth-and squared second-order conditional moments enables one to judge whether the fourth-order moment tends to zero (any deviation from this will rule out that the underlying process is diffusive), thus allowing to reduce the KM expansion to the Fokker-Planck equation. Using information about these higher-order moments, we derived a novel criterion (as a necessary condition) to check whether for a given, even noisy time series the underlying process has a continuous or discontinuous trajectory. Our novel approach to distinguish diffusive from jumpy stochastic behavior in time series enables the detection of jump events in the data and provides a general avenue to better understand the dynamics of complex systems.
are nonvanishing. Therefore, the conditional probability distribution of the process x(t) satisfies the truncated KM differential equation (Fokker-Planck equation) as , and  FP is given by , . A2