Random coefficient autoregressive processes describe Brownian yet non-Gaussian diffusion in heterogeneous systems

Many studies on biological and soft matter systems report the joint presence of a linear mean-squared displacement and a non-Gaussian probability density exhibiting, for instance, exponential or stretched-Gaussian tails. This phenomenon is ascribed to the heterogeneity of the medium and is captured by random parameter models such as ‘superstatistics’ or ‘diffusing diffusivity’. Independently, scientists working in the area of time series analysis and statistics have studied a class of discrete-time processes with similar properties, namely, random coefficient autoregressive models. In this work we try to reconcile these two approaches and thus provide a bridge between physical stochastic processes and autoregressive models. We start from the basic Langevin equation of motion with time-varying damping or diffusion coefficients and establish the link to random coefficient autoregressive processes. By exploring that link we gain access to efficient statistical methods which can help to identify data exhibiting Brownian yet non-Gaussian diffusion.


Introduction
Brownian motion, one of the most fundamental processes in non-equilibrium statistical physics, describes the motion of a passive colloidal particle in a thermal liquid environment.Observed experimentally by Brown [1] it was studied theoretically by Einstein, Sutherland, Smoluchowski, andLangevin between 1905 and1908 [2, 3, 4, 5].Two fundamental properties are typically associated with Brownian motion, namely, the linear growth of the mean-squared displacement (MSD) with the diffusion coefficient D, and the Gaussian probability density function (PDF) Alternatively to the MSD (1) Brownian motion is characterised by a 1/f 2 frequency dependence of the associated ensemble and single trajectory power spectra [6,7].Deviations from the linear time dependence (1) of the MSD are observed routinely in a large variety of systems, in particular, in the power-law form with the anomalous diffusion coefficient K α [8,9,10].In biological cells or other complex liquids both subdiffusion with 0 < α < 1 [11,12,13,14,15,16] and superdiffusion with 1 < α < 2 [17,18,19] are measured, see the recent reviews [20,21].Anomalous diffusion are effected when the increments of the stochastic process are no longer independent, when the variance of the step length or the mean step time diverges, as well as due to the tortuosity of the embedding space.The associated PDF of these processes may have both Gaussian and non-Gaussian shapes [8,9,10].
Of late, a new class of diffusive dynamics has come into focus, following numerous reports in soft matter, biological and other complex systems: in these systems the MSD is normal of the form (1) with invariable coefficient K, however, the PDF is non-Gaussian and often found to be of the distinct exponential shape ("Laplace distribution") see [22,23,24,25,26] as well as the extensive list of references in [27].These "Brownian yet non-Gaussian" processes along with a more general class of non-Gaussian PDFs, discussed in more detail below, are in the focus of this study.Our goal here is, however, different from the previous modelling approaches.Namely, here we try to establish a direct connection of the physical models for Brownian yet non-Gaussian diffusion and a class of processes ubiquitously used in time series analysis, the so-called autoregressive models with random coefficients [28,29,30].As we will show, both approaches are indeed tightly linked, and our detailed analysis here provides a much needed bridge between the worlds of mathematical data science and physical stochastic modelling.As a result we gain access to very efficient statistical methods that will allow us to identify data exhibiting Brownian yet non-Gaussian diffusion.
In section 2 we will provide more details on the process of Brownian yet non-Gaussian diffusion along with a primer to autoregressive models.Section 3 then introduces an intuitive physical derivation of the autoregressive model.The situation when the correlation times of the random diffusion coefficients is comparatively short is then considered in detail in section 4. A statistical analysis with parameter estimation is provided in section 5. We discuss our results in section 6.The paper closes with an appendix, in which details of the mathematical derivations are presented.

Brownian yet non-Gaussian diffusion
In the original works on Brownian yet non-Gaussian diffusion the linear MSD (1) was observed along with the Laplace shape (4) of the PDF.Other findings suggest the presence of more general stretched Gaussian tails with the stretching parameter δ = 2. PDFs of this type describe diffusion that can be normal (α = 1) or anomalous of the form (3) with α = 1.It was found that δ = 1 and α between 0.75 and 0.25 for tracer diffusion in living bacteria and eukaryotic cells [31].
Similarly, in simulations of crowded membranes it was shown that δ is between 1.3 and 1.6, and α below 1 [32].‡ This behaviour can be explained by the fact that the considered medium is spatially or temporally heterogeneous [22,23,27,34,35,36].In such a system the thermal fluctuations are still Gaussian but the observed displacements are mixtures of these Gaussian contributions with random weights, effecting the non-Gaussian outcome.In practice, this is commonly realised by two classes of diffusion models, namely by so-called "superstatistics" and "diffusing diffusivity" models.
Superstatistics is a term proposed by Cohen and Beck [37,38,39] and stands for superposition of statistics.It refers to a model with random parameters, that are fixed for each trajectory.This is a form of hierarchical or multilevel modelling [40], which is also close to the Bayesian interface method.The distributions which arise this way are called compound or mixture [41].In the context of diffusion this mixture approach can be explained in the following way: each observed trajectory either moves in its own neighbourhood, that has distinct properties affecting the motion.These features are supposed to not vary significantly at temporal and spatial scales specific to the observed trajectories.If the motion is not confined, this assumption must be viewed as a short-time approximation.Another possible origin is the situation when the properties of the observed particles vary from one to another, for instance, the radius of different tracer beads varies significantly.In this case the compound distributions are observed at long times, as well, but this possibility can be excluded for experiments in which the type of the diffusing object is precisely controlled.
The simplest type of a mixture model is Brownian diffusion with a randomised diffusion coefficient D. The velocity and position processes are given via where B represents standard Brownian motion.If we assume that D has an exponential distribution, the emerging process is characterised by the Laplace PDF (4).Similarly, ‡ Gamma distributions were found in [25], see the modelling in [33].
a stretched exponential PDF of D leads to relation (5).§ The randomness of D was confirmed in many experiments [25,31,36] and also in numerical studies [32].Since there is no fundamental reason to pinpoint the diffusion coefficient as the only heterogeneous property of the system, other parameters can also be made superstatistical.For example, a random viscosity or memory kernel in the Langevin equation also lead to non-Gaussian PDFs and generally non-linear MSD [42].It must be stressed that by introducing a parameter which is random and fixed trajectory-wise, we break not only homogeneity, but also locality and independence.By sharing the same random parameter, the memory structure of the process becomes non-linear and non-Gaussian [42].This dependence is strong enough to render the resulting motion non-ergodic.This non-ergodic property is not always desired and can be avoided when the system parameters are allowed to be both random and time-dependent, as in the diffusing diffusivity model.Fixed, deterministic values are replaced by a stochastic variation of the parameter, and for this reason the resulting models are called doubly stochastic [43].If the introduced parameter evolution is ergodic, the resulting model will be ergodic, but still non-Gaussian.For example, one can consider a time-varying diffusion coefficient leading to the diffusing diffusivity approach proposed by Chubinsky and Slater [44].In this model small Brownian displacements dB(t) are assumed to be randomly rescaled by random D(t), such that the resulting motion is described by Further details of the process depend on the specific choice of D(t)-the analysis is often complicated.However, an important case was solved by Cox, Ingerson and Ross [47] who proposed a stochastic differential equation with suitable mean-reverting property that leads to D(t) with exponential type of memory.Their work was motivated by financial applications and was expressed in the language of option prices and volatility modelling.More recently, this approach was applied to physically relevant quantities in [27,34,48,49,50], in particular, also for non-stationary initial conditions [33].

Autoregressive models
In contrast to models originating from physics, autoregressive models have their roots in filtering, optimal control theory, and economics [28].In this context the measured sequence of observations V k is assumed to be an output of a system, which is linked linearly to the white Gaussian noise input sequence Z k .For classical autoregressive moving average (ARMA) processes Z k [29,30] are defined by the recursive relation § Of course, other forms may also emerge from such an approach, for instance, Gamma [33] or stable [27] distributions.
In this sense the diffusing diffusivity approach is similar in kind to the diffusing waiting times in correlated continuous time random walks [45,46].This process is denoted by ARMA(p, q) and thus explicitly contains the range parameters p and q.The term autoregressive AR(p) reflects the linear dependence of the observed V k on p past values V k−1 , . . ., V k−p , and the moving average MA(q) represents a linear combination of the last value of the noise Z k and q previous values Z k−1 , . . ., Z k−q .The admissible coefficients φ i and θ j are chosen such that the sequence V k is stationary, resembling the physical velocity process (hence the choice of notation "V ") or highly confined motion.¶ In most of the literature the model (8) was not meant to explain the behaviour of the system.Rather, the philosophy was concentrated on controlling the data.This can be achieved by finding a reasonably small set of φ i and θ j that sufficiently well describe the observed memory structure of the observations.The procedure typically employs methods based on mean-square optimisation and information theory [29,30].Given the estimates of the φ i and θ j the future behaviour can be predicted (in the sense of the least square error or similar), which is a crucial element, for instance, in financial forecasting.Additionally, by inverting the relation (8), the Z k can be estimated from the X k , therefore various white noise tests can be used to verify the goodness of fit of the model [30].
It is in fact remarkable how many concepts of non-equilibrium physics and biological physics have their independent counterparts in time series analysis.For example, the study of anomalous diffusion and long-range dependence can be performed by using autoregressive fractionally integrated (ARFIMA) models where the white noise sequence Z k is replaced by Z d k with a power-law covariance structure, which itself is a result of applying a linear filter, namely, For ARMA processes, the left hand side of (8)-the AR part-is responsible for modelling an exponential decay of the covariance, whereas the MA part introduces short, finite time corrections.Additional filtering of the white noise presented in (9) extends the ARMA model to the ARFIMA with power-law memory, which was very successfully applied to various anomalous diffusion phenomena in biological data [51].An analogue of Lévy flights and diffusion with a power-law PDF is the ARMA model with stable noise, where the Gaussian Z k 's are replaced by non-Gaussian stable random variables [52].Similarly, fractional Lévy stable motion corresponds to ARFIMA with stable noise [53,54].Another line of development in time series analysis, originating from Box-Jenkins models, are non-linear generalisations, mainly the so-called autoregressive conditional heteroscedasticity (ARCH) [55] and generalised ARCH (GARCH) models [56].These allow for the parameterisation and prediction of a non-constant variance and proved to ¶ For non-stationary processes the ARIMA model ("I" stands for "integrated") may be used: for ARIMA(p, 1, q) the differences ∆V k = V k − V k−1 are assumed to be ARMA(p, q), for ARIMA(p, n, q) the nth differences are ARMA(p, q).be very useful for financial time series.Their invention prompted the bestowal of the Nobel Memorial Prize in Economic Sciences in 2003 to Granger and Engle.When integrated ARCH-type processes exhibit non-Gaussian distributions for short times along with a linear time dependence of the MSD.Precisely this observation leads us to pursue the question whether there is a fundamental connection between the two worlds of time series analysis using autoregressive methods and physical models of Brownian yet non-Gaussian type.
The aforementioned notion of heteroscedasticity is related to the time-dependence of the conditional variance [57,58].The basic ARCH(q) model is defined as If an ARMA model is assumed for the noise variance, the model becomes a GARCH(p, q) model: A GARCH model has an ARMA-type representation, so that many of its properties are similar to those of ARMA models, for instance, we can estimate the GARCH parameters by the same technique as for ARMA processes [28].In most applications it is enough to consider the order of the model as (1, 1).ARFIMA combined with GARCH describes both power-law decay of the correlation function with finite-lag effects (ARFIMA part) and varying diffusion parameter (GARCH part) [59].For example, it can describe inhomogeneous diffusion in the cell membrane [60] or solar X-ray variability [61].While the resemblance to diffusing diffusivity models ( 7) is indisputable: X k models velocity and Σ k models the evolving diffusion coefficient, there is to date no physical derivation of relations (10) and (11).
Another approach related to heteroscedasticity, non-Gaussianity and linear MSD is the main point of interest in this work: we assume that the linear coefficients φ i and θ j in the definition of ARMA ( 8) are replaced by random variables Φ i k and Θ j k that are independent of the noise Z k : This is the class of doubly stochastic models, random coefficient ARMA (rcARMA) [62,63].

Physical derivation of the autoregressive model
Let us start with the classical Langevin equation for the velocity of a diffusing particle, which is Newton's second law with the Stokes dissipative force −βV and the stochastic force F .In the classical setting F is given by white Gaussian noise with amplitude k B T β determined by the fluctuation-dissipation relation [64,65].Any heterogeneity of the medium can be modelled by making the parameters of equation (13 time-dependent and random-then they describe the local state of the environment of the diffusing particle.All possible correlation effects caused by the particle repeatedly visiting the same areas of the phase space are assumed to be reflected by the memory structure of random functions modelling these local parameters.Starting from the next section we will neglect this dependence entirely, corresponding to an annealed picture that is also inherent in the current physical diffusing diffusivity models.This assumption can be justified when the environment changes continuously.The resulting equation can then be written in the form We do not provide any more fundamental derivation behind this formula, for us Λ(t) and D(t) are effective parameters whose physical meaning is determined by equation ( 14) alone: D(t) describes the local effective amplitude of velocity gains and Λ(t) the local effective linear damping or relaxation of the velocity.In financial language these parameters would be called stochastic return rate and stochastic volatility.Formula ( 14) resembles the rcARMA.Indeed, in the standard Euler scheme of numerical simulations, one fixes a time delay ∆t between observations, denotes This is an rcAR(1) process with AR coefficient 1 − Λ k ∆t.It is easy to see that the same reasoning applies to any linear stochastic differential equation and thus the classical schemes of discrete-time simulations return rcARMA processes approximating the continuous-time solutions.
In fact, the connections between rcARMA and models of diffusing diffusivity reach deeper than that.Classical results from the theory of time series analysis establish that there is an exact correspondence between ARMA and solutions of linear stochastic equations with constant coefficients-no approximation is needed [66].The classical Langevin equation for a position of a particle confined in an harmonic potential, fits into this category.The resulting discrete motion is the ARMA(2,1) process with AR(2) coefficients The MA(1) coefficient θ can be easily calculated numerically or expressed by a somewhat complicated but elementary formula [67].This class of relations has direct application to the statistical analysis of the data and can be used, for instance, to calculate the exact discrete-time power spectral density (the spectrum given by the Fourier series of X k ) without the need of the commonly used approximation k f (X k )∆t ≈ f (X(t))dt [68].
The core of the physical interpretation we propose in this work is that the derivation used to obtain ( 16) and ( 17) can be generalised for the case of linear equations with timedependent coefficients.Namely, we multiply ( 14) by the integrating factor exp( Λ(t)dt) and integrate from (k − 1)∆t to k∆t, obtaining After some rearrangement it takes the sleek form with The random coefficient AR form of left hand side is clearly present, but it is not immediately clear what is the structure of Z k on the right.Different Z k use increments dB(s) from disjoint intervals [(k − 1)∆t, k∆t].The Gaussian increments dB(s) are stationary and do not depend on k but they are rescaled by D(t) and the exponent of Λ(t), which makes their conditional variances a sequence of random variables, Conditioned on Λ and D the variables Z k are Gaussian, independent, and have the variance specified above.Thus the Z k can be decomposed as which is a function of random Λ(t) and D(t).This is but exactly the rcARMA in which the regressive coefficients 0 ≤ Φ k ≤ 1 model the local return or relaxation rates, and the Θ k ≥ 0 describe the heteroscedasticity of fluctuations, that is the variability of the fluctuations' dispersion.An illustration how the changes of Φ k are visible in the data can be seen in figure 1.In general the form of this dependence is highly non-linear.When the parameters can be assumed to not vary significantly in the interval ∆t, Λ(t) ≈ Λ k and D(t) ≈ D k , the approximate relation reads In practical applications this formula can be further simplified.Namely, the function No matter what is the distribution of the Φ k and Θ k , the solution of (23) itself can be expressed in a simple manner.Repeating the recursive relation between V k and V k−1 we can write the explicit formula linking V k and V k−n for any n in the form Away from the degenerate case Λ(t) ≡ 0 it is true that Φ k < 1, therefore when n → ∞ the series converges towards We see that each V k is a mixture of Gaussian variables W k with random weights.The result is not Gaussian, but can be characterised as conditionally Gaussian N (0, S 2 ) with conditional variance The distribution of S 2 determines the non-Gaussianity: the more spread out S 2 is, the more the PDF of the velocity and displacement differ from the Gaussian shape.This property can be expressed in terms of the excess kurtosis, which is directly related to the relative standard deviation (RSD) of S 2 .Namely, is three times the RSD squared.Different variants of the excess kurtosis or the RSD of S 2 are sometimes called "non-Gaussianity parameter" [69,70], but here we will use more precise terminology.Because both are positive, the distribution of V is always leptokurtic, that is, it has tails thicker than a Gaussian.It is worth stressing that the kurtosis is only one of many possible measures of non-Gaussianity, but it is a convenient one because of the easy estimation.The distribution of S 2 is, in general, hard to analyse even for simple models of Φ and Θ.One exception is encountered when the coefficients stay constant for long intervals of time T α (precisely, E [T α ] larger than the mean relaxation time 1/E [Φ]).In such a situation S 2 spends short times transitioning between these periods and therefore in the majority they attain an equilibrium value obtained from taking the coefficients in ( 27) constant, where α is an identification of the interval.The resulting distribution of S 2 is thus with Θ and Φ being random variables constructed from the values Θ α , Φ α , Θ β , Φ β , Θ γ , Φ γ , . . .with probabilities weighted by the relative mean lengths of the intervals T α , T β , T γ , ....In this approximation one trajectory is effectively the same as an ensemble of equal-length trajectories, each with its own Θ, Φ, which are glued together at the ends, merging into a single trajectory.For a simple demonstration see figure 2. There one of the limiting cases are long periods of constant coefficients, as in (30), but one can also observe the second limit of rapidly varying coefficients.This is the subject of the next section.

Short memory random coefficient AR(1)
When the correlation time of Λ(t) and D(t) is smaller than the observation time, t c ∆t, the integrals defining Φ k , Φ k+1 and Θ k , Θ k+1 contain only the ratio t c /∆t of the total mass, corresponding to dependent values of Λ(t), D(t).Thus, Φ k and Θ k can be assumed to be sequences of independent variables.Note that for fixed k the pair Φ k , Θ k could still be dependent.Averaging (27) yields = Geo(q)) + Geo(q)) = 1/2.The periods of Φ α = 0.9 are on average twice longer, thus the effective distribution is P( Φ = 0.1) = 1/3, and P( Φ = 0.9) = 2/3.The prediction q → 1 from ( 27) with i.i.d.coefficients distributed like Φ.The prediction q → 0 comes from (30), in the latter case the PDF is just a superposition of two Gaussians.
Similarly, E [S 4 ] can be calculated, as detailed in the appendix.The moments E [S 2 ] and E [S 4 ] together determine the kurtosis of V .Accordingly, the motion is again non-Gaussian.The covariance also has a simple form: from the recursive formula (25) we can easily derive the geometric decay The form of the covariance also determines the MSD: for the velocity it converges exponentially to a constant δ 2 V (j) = 2(1−r V (j)) and for the position A similar formula can be obtained if one tries to recreate displacements given V k : then A small error in determining the slope of the true X(t) is made in this procedure, but the general behaviour of the MSD is the same.Note that the shapes of all the above functions depend only on the mean value of Φ k , this is a diffusion with covariance and MSD (up to a scaling factor) as if the system were homogenised-however, at the same time it is non-Gaussian, see figure 3.
As a concrete example of a physical model which leads to this type of process let us assume that the diffusivity is constant (with no loss of generality we take D(t) = 1).Then we consider a diffusing particle interacting with high viscosity traps distributed in a Poissonian manner, that is, the waiting time between subsequent trapping events have the exponential distribution E(ρ).By a high-viscosity trap we mean a short-time interval when Λ(t) = ∞, which immediately kills the momentum of the particlethis is a form of stochastic resetting [71].Outside of these events the damping rate is constant Λ(t) = λ.The probability that the particle does not meet any traps between times (k − 1)∆t and k∆t is exp(−ρ∆t): in this case Φ k = exp(−λ∆t) and Θ k = (1 − exp(−2λ∆t))/(2λ).The probability that the particle meets at least one trap in an interval [(k−1)∆t, k∆t] is of probability 1−exp(−ρ∆t), and the corresponding AR(1) coefficient Φ k becomes 0. The discrete noise Z k for this k has a more complex structure.All Gaussian fluctuations before the last trapping event are erased, which is visible from (21).However the remaining fluctuations still count.Denoting by T k the difference between the last trapping event and k∆t we obtain the formula for Θ k corresponding to the event This corresponds to a small correction O(∆t), which makes the resulting PDF smooth (otherwise the trapping events would be visible as a Dirac delta at x = 0).The approximation on the right holds for λ∆t 1.The relation between Λ(t) and Φ k , Θ k is shown in figure 4 using an exemplary trajectory.
Variables T k which determine Θ k have the distribution of an exponential variable conditioned by the requirement that it has value lower than ∆t (that is, the trapping event occurred in the considered interval).It has the PDF As we determined the exact distribution of Φ k and Θ k , the formula for the covariance (32) can be made completely explicit, namely r V (j) = E S 2 e −j(ρ+λ)∆t , The variance E [S 2 ] has a complicated form, but the geometric decay rate is given by (ρ + λ)∆t.It is an intuitive result: meeting a trap erases the momentum of the particle, which thus loses all connection to its history at a frequency proportional to the density of traps.
Studying the distribution of V k in general is hard, even for i.i.d.coefficients, because the conditional variance ( 27) is given by an infinite sum with terms which are dependent, as the same Φ k−i reappear in them.In the studied example the situation is specific and simpler, because the sum is actually finite.The coefficients Φ k−i are a series of Bernoulli tries-after a series of j − 1 non-zero ones appearing with probability exp(−ρ∆t) j the first zero occurs with probability 1 − exp(−ρ∆t) and the summation stops.Conditioned by this event the first j − 1 values Θ k−i are deterministic, but the last term in the sum, Θ k−j has the conditional distribution (35).This leads to the formula for characteristic function, Here the Laplace transform of the conditional The last approximation works for λ∆t 1 which follows from (35).The limiting case λ∆t ≈ 0 is also interesting: then Φ k are i.i.d.variables with Bernoulli distribution P(Φ k = 1) = exp(−ρ∆t) = 1 − P(Φ k = 0), as usual Θ k = √ Φ k -for simplicity we may neglect the fluctuations counted in the next interval after the trapping events.Directly from (27) we see that S 2 has a geometric distribution.Any such variable can be expressed as an integer part of some exponentially distributed variable, E say, and therefore it has values between E and E − 1.As mentioned in the introduction, the mixture of Gaussian variables with variance E has exponential tails [72], so this is the case also for S 2 .This also provides an intuitive argument for the presence of the tails with thickness between Gaussian and Laplace in the general case considered in this section (see figure 3).As the Φ k are bounded by 1, the tails of S 2 correspond to long stretches of Φ k close to 1, and as they are i.i.d. the probability of such event decays geometrically with the number of Φ k .

Statistical analysis
We now address the statistical procedures in the context of stochastic processes with random coefficients.

Memory
In (32) we showed that the covariance of i.i.d.rcAR(1) is, up to a scale factor, the same as for the AR(1) with averaged coefficient φ = E [Φ k ].The same is true for general i.i.d.rcARMA, it can be demonstrated by multiplying both sides of the equation by past values of the process and deriving the recursive formulas for the covariance (these are called Yule-Walker equations)-they differ from the non-random coefficient case only by the single equation which determines the variance [73].On the one hand it makes the additional randomness harder to detect, but on the other hand it allows to use powerful ARMA based methods of analysis.As we mentioned in section 2.2, AR (thus also rcAR) have a covariance function that is a mixture of exponentials, and the MA part is responsible for short time corrections.For the continuous-time processes the only option of statistical verification would seem to be fitting the estimated covariance function and checking the validity of the result.For rcARMA a more intuitive tool is available: the partial autocorrelation function.Given a stationary time series X k it measures the linear dependence between X k and X k−j when an in-between set of variables I k,j := X k−1 , X k 2 , . . .X k−j+1 is removed.Denoting the least-squares projection operator by P it can be defined as The partial autocorrelation can be effectively estimated using standard methods implemented in the popular statistical packages (such as Python Statsmodels, R tseries, Julia StatsBase, or Matlab Econometrics Toolbox)-the usual command is the standard abbreviation pacf.It is intuitively clear that α X measures the "direct" dependence between X k and X k−j which for AR(p) and rcAR(p), given their recursive definition, should be non-zero for the first p values and zero further on.For the MA part one can reverse the definition and express the noise Z k as the geometrical sum of X k , hence, the partial autocorrelation decays geometrically [29,30].For the full ARMA model the two effects are additive.For the i.i.d.rcAR(1) the statistical procedure becomes very simple: just estimate α X and check if it has a significant non-zero value only for j = 1.This value is simply For an illustration see figure 5.For higher order AR processes there are also exact equations linking the values of partial autocorrelation, covariance and coefficients of the model [29,30].
We now proceed to show that there actually is a way to detect differences between the dependence structure of ARMA and rcARMA.First, we estimate the mean rcAR(1) coefficient ϕ = E [Φ k ], for instance, by using the partial autocorrelation.Next, the typical method of statistical verification for AR processes is to consider is For the AR(1) with deterministic coefficient ϕ = E [Φ k ] this would be an estimate of the noise Z k .Therefore, one could then use many of the testing methods if the series is  16).Here we used a 500 point trajectory-to observe only the characteristic rcAR(1) behaviour even shorter, 200-300 point trajectories are sufficient.
i.i.d.However, for the rcAR(1), As It is a natural consequence of the applied procedure, which fits the system by removing the linear dependence.But the series Z k remains non-linearly dependent.The value Z k still depends on V k−1 and consequently also on Φ k−1 appearing in the expression for Z k−1 .This purely non-linear dependence can be detected only using non-linear measures of memory.The codifference function (see [74] for a detailed discussion) is one possible choice.It was proposed as a tool to study α-stable variables, because it is finite even for variables with no second, or even lower, moments [75]-however its usefulness is by no means limited to this class.One of the few possible close definitions is the formula For any Gaussian process the codifference equals the covariance, so the difference between the two indicate any non-Gaussianity of the PDF and, what is important here, of the memory structure.For i.i.d.rcAR it can be expressed as a function of coefficients, and it can be analytically approximated, see (A.4) and the derivation below.As a simple example let us take the uniformly distributed Blue areas are 95% confidence intervals around 0. The covariance is statistically zero for j > 0, the codifference exhibits exponential decay.We used two extremely long 10 7 point trajectories in order to present a long range of j and very weak dependence (statistically significant even around 10 −3 ).Much shorter trajectories are sufficient for less extreme cases, the requirements in general are similar to those for the covariance estimation.
simulation yields τ 1 Z (1) ≈ 0.0023, in good agreement with the analytic approximation is close 412/170535 = 0.0024.Two more examples of rcAR(1) with higher dispersion of i.i.d.coefficients and even more prominent non-linear memory are shown in figure 6.

Non-Gaussianity
One possible approach of visualising the non-Gaussian behaviour of a diffusive process is to quantify the non-Gaussian nature of its dispersion in a manner similar to how we quantified non-Gaussian memory in the last section.We propose to use the non-linear measure of dispersion which we call log-characteristic function (LCF).It has the property that if the process is Gaussian it equals the MSD for any θ.Therefore estimating it and plotting it together with the MSD reveals the non-Gaussian nature of the observed process.As the MSD averages over squares of the data, small values become even smaller and larger values become even more emphasised.Thus the MSD gives a large importance to the spread of extremal values.Conversely, for the LCF large values are translated into rapid oscillations of the cosine function and cancel each other, the LCF measures mainly the spread of the bulk.For the classical case of Gaussian normal diffusion (4) the LCF is linear for small times, but logarithmic for large times, so it gets dominated by the linear MSD.This insight is more general.Random coefficient models discussed in this work can only describe diffusion processes with tails thicker than Gaussian.This means that compared to Gaussian diffusion with the same MSD more mass is located in the tails than in the . Cases a) and b) are based on variants of i.i.d.rcAR considered before, case c) is an example of a motion with varying diffusivity strongly concentrated at 0. bulk, and the LCF must be smaller than the MSD.
To use the LCF in practice one needs to fix the parameter θ.For too small values the LCF converges to the MSD because cos(θx) ≈ 1 − θ 2 x 2 , for too large values the estimation becomes challenging, because ln(0) diverges.In a typical case reasonable values are those for which θE [|X(t)|] ≈ π so that X(t) explores mainly the first period of the cosine function.In figure 7 we show a practical application of this technique performed using a relatively small sample of 500 integrated rcAR trajectories.
For rigorous statistical testing of the Gaussianity we propose to use the standard Jarque-Bera (JB) test [76].It is a goodness of fit test of whether sample data have skewness and kurtosis that match a normal distribution.For the sample {x 1 , . . ., x n } of observation the JB statistic is defined as where s is the sample skewness and κ is the sample excess kurtosis.Samples from a normal distribution have an expected skewness of 0 and an expected kurtosis of 3 (excess kurtosis 0, in dimension one).Any deviation from this increases the JB statistic.The test is considered as standard and is implemented in various numerical packages, such as R tseries, Python SciPy.stats,Julia HypothesisTests, or Matlab.This test is considered as one of the most powerful tests on normality but, if non-Gaussianity is detected, it does provide information on the origins of this behaviour.However, the idea of observing the kurtosis leads to an algorithm to distinguish between Gaussian and non-Gaussian (in particular, Lévy stable) distributions [77].It is based on the empirical cumulative excess kurtosis (ECEK) which is defined as follows, where x is an arithmetic mean of the random sample.This simple statistic serves as an indicator for whether there is a noticeable difference between Gaussian and non-Gaussian distributions.For the Gaussian case, for large numbers of observations it converges to the theoretical excess kurtosis 0, while for the non-Gaussian case the ECEK does not tend to 0 with increasing number k of observations and, moreover, for distributions that do not have a finite forth moment it does not converge at all and behaves chaotically [77].
To distinguish between Gaussian and non-Gaussian distributions we also advocate the application of the discrimination algorithm based on examining the rate of convergence to the Gaussian law by means of the central limit theorem [78].The idea of the algorithm is to analyse the convergence of the estimated index α of stability for sequential bootstrapped samples from the analysed data.If the estimated values converge to 2, then the data are light-tailed and belong to the domain of attraction of the Gaussian law.In particular, if the data are Gaussian, the estimated values should be equal to 2 for most of the cases.If the data belong to the domain of attraction of a non-Gaussian Lévy stable law, then the values should converge to a constant less than 2.
To illustrate the usefulness of the above statistical tools we consider an rcAR process for which the Φ k are independent uniformly distributed on the interval (0, 0.95) and In the left panel of figure 8 we can see a simulated trajectory of V k of length 1000 points.In the middle panel we display a plot of the ECEK for the simulated trajectory.We can see that the function converges to a constant close to 2 which clearly indicates a non-Gaussian behaviour and a finite positive excess kurtosis of the underlying distribution (a leptokurtic distribution).We also calculated analytically the exact value of the excess kurtosis (equations ( 28) and (A.3)) and obtained the value 1.83, which coincides with the final values of the ECEK function.Finally, in the right panel of figure 8 we show the estimated values of the stability index α, for different non-overlapping consecutive blocks of length M .For the first sample (M = 1; corresponding to the whole trajectory), the value is significantly lower than 2, but for the other M the values increase and tend to 2. This suggests that the simulated trajectory exhibits a non-Gaussian behaviour but its distribution belongs to the normal domain of attraction.The obtained results clearly suggest that the simulated trajectory exhibits a non-Gaussian behaviour and the underlying distribution is heavier-tailed than Gaussian (is leptokurtic) but belongs to the normal domain of attraction.The non-Gaussianity is also confirmed by the JB test which returns a p-value less than 0.001.

Parameter estimation for rcAR processes
Finding reliable estimators of the parameters of the rcARMA models is a very important challenge.In this section we present preliminary ideas for some special cases of the models.
For an rcAR process with Φ being a simple function one can estimate its parameters either by extracting the constant periods of Φ and applying classical fitting techniques for the autoregressive processes to the extracted parts, or to consider a method which directly incorporates the information on switching between different autoregressive processes.
We now follow the latter idea and apply an algorithm based on hidden Markov Models (HMMs) [61,79,80].This algorithm assumes that the trajectories switch among discrete diffusive states according to a stochastic (Markov) process.We consider a modification of the classical HMM, where the Markov regime switching is combined with AR(1) processes [81].To show the usefulness of this technique we take into account a two-regime parameter switching model, that is, a model with both regimes driven by AR(1) processes with the autoregressive parameters φ 1 and φ 2 .In figure 9 we present a simulated trajectory of V k of length 1000 for which Φ α changes from 0.2 to 0.8 at point 400.The estimated regime switching point is 399 which is very close to the true value, and the estimated φ 1 and φ 2 coefficients are 0.17, 0.79 which confirms the usefulness of the procedure.

Discussion
As modern experiments such as single particle tracking routinely produce large amounts of data for the thermally and actively driven motion of test particles, the extraction of physical information from the garnered data becomes ever more important.On the one hand, this is met by the analysis of a growing number of complementary observables such as time averaged MSD [82,83], higher order moments or mean maximal statistics [84], p-variation methods [85], first-passage methods [86], or single trajectory power spectra [7,87].On the other hand objective methods such as maximum likelihood approaches [88,89,90,91] or machine learning [92] are being recognised as useful tools.Here the goal is to identify the physical model giving rise to the observed motion and estimate the involved parameters.An alternative method, well established in financial market studies, is time series analysis.The latter seemingly does not share a lot with the physical approach of modelling.It puts the emphasis not on explaining the observed systems, but instead concentrates on fitting and prediction.Given a class of discrete linear filters suited to describe the general features of the data (for us the non-Gaussianity and linear MSD) it offers a wide choice of powerful methods for finding the optimal model and validating it.In other words by design it concentrates on the phenomenological description.
While a pure time series analysis approach initially may be discouraging for physically-minded researchers, the undeniable effectiveness of the time series methods make any connections to the framework of physics interesting and important.As noted in section 3 the correspondence between the Gaussian Langevin equation and the ARMA process is known (in fact, it was established as early as in 1959 [93]), but it never seemed to have gained a widespread appeal in the physical or biological physics community.Nevertheless, some recent progress was made, especially in topics related to the identification of fractional dynamics [60,94] and accounting for the distortions made by measurement devices [68,95].
This paper continues this line of research and aims at promoting to sample the best of both worlds.This is hard to achieve on a general level, but we study here important cases when a fruitful compromise can be made, and new information provided in the challenging analysis of non-Gaussian diffusion.Omitting the dynamics at time scales significantly shorter than what is available from observation (shorter than the sampling time ∆t) leads to conceptually simple autoregressive models.There are straightforward and quick to simulate (using a medium class computer we were generating ten million values in under 2 seconds) and they provide a wide of statistical methods.Some of those, such as the partial autocorrelation, are intrinsically linked to discrete dynamics and have no continuous-time counterpart.Others, such as the kurtosis and codifference, profit from the analytic methods available for discrete variables.
In section 5 we illustrated important properties of the rcARMA processes and provided a list of statistical tools that can be useful in their identification and validation.In particular we showed that the codifference function can be used to distinguish between ARMA and rcARMA processes.To illustrate the non-Gaussianity of the rcARMA models we considered a non-linear measure of dispersion, namely the logcharacteristic function, the empirical cumulative excess kurtosis, and studied the domain of attraction of the underlying distribution.Finally, we presented a hidden Markov model methodology which can be useful in fitting the process parameters.We also stress that the presented tools can be successfully applied for quite moderate lengths of the trajectories.
Our considerations are general and not limited to any particular system, as we discuss few related models based on random coefficient autoregression, which stem from a widely used form of the Langevin equation and reasonable models of heterogeneous environment.Thus, the provided methods are also universal.We hope that this study will promote the use of autoregressive models in modern data analysis, as well as prompt further studies into the physical meaning of these models.
factor is We recognise the product of two geometric series in the above.Finally, the fourth moment becomes Given these moments, one can easily calculate the kurtosis.It also makes it possible to obtain an analytic approximation of the codifference of the estimated noise We show the procedure for τ θ Z (1).The method for other τ θ Z (j), j > 1 would be the same.We denote Φ k := Φ k − ϕ and start with a straightforward conditioning which yields where S 2 is assumed to have the distribution as in (27) and be independent from Φ k , Θ k , Φ k−1 , Θ k−1 .Expanding the exponents in a Taylor series yields an approximation for the codifference which depends only on the moments of Φ k and Θ k .Expansion up to first order, e −x ≈ 1 − x, is equivalent to calculating the covariance, so it is zero.The second order, e −x = 1 − x + x 2 /2 in general already provides quite good non-zero estimate.For the numerator it is The corresponding denominator is expanded into The expected values that appear in this formulas are just linear combinations of the first four moments of Φ k , Θ k and their products.For variables generated as powers of a uniform distribution these are just integrals over polynomials.Also, without significant change the logarithm can be replaced by the fraction ln(x) ≈ x − 1, which leads to a formula being a fraction of coefficient moments up to fourth order.

Figure 1 .
Figure 1.Trajectory V k in which Φ k changes in 3 intervals from 0.5, to 0.1, and finally 0.9, while Θ k = 1 is kept constant.For small Φ k the motion resembles white noise, larger values result in longer Brownian excursions away from 0. Note that for better visibility very long excursion are cut off.

Figure 2 .
Figure 2. PDF of the rcAR process V with random periods T α of constant coefficients Φ α and Θ α = 1.The dashed lines represent rcAR simulations, the solid lines correspond to theoretical predictions for the limiting cases.The distribution was chosen to be the simple geometric distribution Pr(Φ α = 0.1, T α d = Geo(q)) = 1/2 and Pr(Φ α = 0.9, T α d

Figure 3 .
Figure 3. PDF and MSD estimated from the rcAR process V .Here U k are i.i.d. and uniformly distributed, U(0, 1), and Θ k = √ Φ k .Different transformations of U k correspond to different power laws of Φ k around 0 + and 1 − .

Figure 4 .
Figure 4. Damping rate Λ(t) and corresponding Φ k , Θ k .Here ∆t = 1 and, outside of the short impulses, Λ(t) = 1/4.To better highlight different values of the coefficients, the diffusion coefficient was taken to be D(t) = 4.

Figure 6 .
Figure 6.Comparison of the covariance and codifference for the series Zk = V k − E [Φ k ] V k−1 generated with two different i.i.d.Φ k obtained from U k

Figure 7 .
Figure 7. LCF and MSD calculated from 500 samples of the integrated rcAR processes.Semi-transparent lines are the individual estimated LCFs and MSDs, solid lines are the averaged ones, dashed lines denote 90% confidence intervals.Case a) is

Figure 8 .
Figure 8. Left: Trajectory of V k for which Φ k d = U(0, 0.95) and Θ k = √ Φ k .Middle: Empirical cumulative excess kurtosis for the simulated trajectory.The solid black line represents the true kurtosis value which is equal to 1.83.Right: Estimated values of the index of stability α.The boxplots (for block lengths M ∈ [1, 10]) constructed from 100 bootstrapped samples each of length 1000.The obtained results clearly suggest that the simulated trajectory exhibits a non-Gaussian behaviour and its distribution is leptokurtic but belongs to the normal domain of attraction.
Figure 9. Two regimes identified by Hidden Markov Model methodology for a trajectory of V k for which Φ α changes from 0.2 to 0.8 at point 400.The estimated regime switching point is 399 which is very close to the true value.