Superstatistical generalised Langevin equation: non-Gaussian viscoelastic anomalous diffusion

Recent advances in single particle tracking and supercomputing techniques demonstrate the emergence of normal or anomalous, viscoelastic diffusion in conjunction with non-Gaussian distributions in soft, biological, and active matter systems. We here formulate a stochastic model based on a generalised Langevin equation in which non-Gaussian shapes of the probability density function and normal or anomalous diffusion have a common origin, namely a random parametrisation of the stochastic force. We perform a detailed analytical analysis demonstrating how various types of parameter distributions for the memory kernel result in the exponential, power law, or power-log law tails of the memory functions. The studied system is also shown to exhibit a further unusual property: the velocity has a Gaussian one point probability density but non-Gaussian joint distributions. This behaviour is reflected in relaxation from Gaussian to non-Gaussian distribution observed for the position variable. We show that our theoretical results are in excellent agreement with Monte Carlo simulations.


Introduction
At the beginning of 20th century the works of Einstein, Smoluchowski, Langevin and Wiener [1][2][3][4] opened a new chapter of quantitative understanding of physics, chemistry, and mathematics by laying down the foundations for what we now call the theory of stochastic processes. Their goal was to provide descriptions of various aspects of diffusive motion, which were observed even in ancient times, for instance, by Roman poet Lucretius [5]. However, it was the groundbreaking experiments of Brown in the 19th century that brought this topic into scientific attention.
Two fundamental properties are commonly encountered in observed diffusive motion: (i) The mean squared displacement (MSD) of the particle position X grows linearly with time, the slope of this MSD being determined by the diffusion coefficient D. (ii) The random position X is distributed according to Gaussian statistics with the probability density From a random walk perspective these properties emerge from the central limit theorem for the weakly dependent, identically distributed random variables [6]. Properties (1) and (2) can be readily obtained from the stochastic equation [3] introduced by Langevin, which describes the dynamics of the velocity process V of a particle of mass m in a thermal bath of temperature T , where k B is Boltzmann's constant and λ the damping coefficient. Equation (3) models the interaction of the Brownian particle with the surrounding medium: the Gaussian white noise term √ k B T λξ(t) corresponds to the rapid exchange of momentum between the test particle and the environment. The motion X is considered slow in comparison to individual bombardments by bath particles. The term −λV (t) represents the viscosity of the surrounding medium, its exact magnitude determined by the properties of the liquid and the particle shape, and thus stands for energy dissipation. Solving the Langevin equation (3) and comparing the stationary value of the mean squared velocity with the thermal energy due to the equipartition theorem, we obtain the Einstein-Smoluchowski relation D = k B T /(4λm) [7].
There exist several approaches to anomalous diffusion, which introduce various degrees of memory [36]. An important extension of the Langevin equation (3), the generalised Langevin equation (GLE), was promoted in the famous work of Kubo [37] and widely applied in chemical physics [38,39]. The GLE is an integro-differential equation of the form [39][40][41][42][43][44] in which the more complex dependence is reflected both in the memory integral (of convolution form) with the kernel K, and in the stochastic force ξ, which is now described by the covariance function r ξ (t) = E[ξ(τ + t)ξ(τ )]. For a power-law kernel the solution of the GLE is an antipersistent motion which models subdiffusion [40,41,45] and can be written in terms of a fractional order Langevin equation [36,46,47]. The GLE has a somewhat special status among stochastic models of anomalous diffusion, as it can be strictly derived from statistical mechanics. The most general approach is the projection-operator formalism [39] but additional physical insight can be gained from more specific derivations, for instance, from the Kaz-Zwanzig model of a degree of freedom interacting with a heat bath of harmonic oscillators [48,49], a test particle interacting with a continuous field [50,51], or a Rouse model describing the conformational dynamics of a monomer in a polymeric bead spring model of mass points connected by harmonic springs [52,53]. The GLE with power-law kernel also emerges from a harmonisation of a single file system of interacting hard core particles [54]. It follows from this derivation that ξ is a stationary Gaussian process, that is, every vector [ξ(t 1 + τ ), ξ(t 2 + τ ), . . . , ξ(t n + τ )] has an n-dimensional Gaussian distribution, which does not depend on the time shift τ . Moreover, the kernel K and the stochastic force are related by the famed Kubo fluctuation-dissipation theorem E[ξ(τ )ξ(τ + t)] = √ k B T × K(t) [37,55]. Physically, the GLE with power-law kernel is related to viscoelastic systems, and was identified as the underlying stochastic process driving the subdiffusion of submicron tracers in cells, crowded liquids, and lipid diffusion in simple bilayer membranes [12,18,21,22,25,28].

Non-Gaussian diffusion processes
However, an additional phenomenon was unveiled in numerous experiments recently. Namely, not only the assumption of normal diffusion is no longer generally valid, numerous experiments have shown a new class of diffusive dynamics in which the fundamental Gaussian property (2) is violated [56][57][58][59]; see also the additional references in [60]. In many of these observations the MSD is still linear, of the form (1), however, the probability density function has the exponential shape (often called Laplace distribution) [56,61,62] How can such observations be explained physically? One of the approaches allowing to explain the emergence of the Laplace distribution is that the measured particle motion does not correspond to samples of the distribution (2), but to a mixture of individual Gaussian processes with different values of the diffusivities D. In statistics such an object is called a compound or mixture distribution [63]; in the analysis of diffusion processes this type of model is called superstatistical [64] (which stands for "superposition of statistics") or "doubly stochastic" [65], which is a term for stochastic models generalised by replacing some parameter, for instance, D, by a random process. The observations of the Laplace distribution (6) can be justified by assuming that the diffusion coefficient D is a random variable with exponential distribution. Every single trajectory is still Gaussian, but the probability density calculated from the whole ensemble is a compound distribution, in this case exactly the Laplace distribution [56].
There are a few physical interpretations that explain the randomness of D. The particles that we observe may not be identical and their different shapes and interactions with the surroundings could and should affect how quickly they diffuse. The environment may also be not homogeneous, which is an expected property of many complex systems, especially biological ones, such as cell membranes. In this situation the diffusion coefficient is local and position dependent D = D(x) [66,67]. If the particle is moving along trajectory X(t) the effective diffusivity felt is indirectly timedependent, D(t) = D(X(t)). Time dependence can also be direct (D = D(t, X(t))) if the environment is changing e.g. because of motion of other particles. Often an approximation is used in which in which D is assumed to evolve independently from X. This is a "diffusing diffusivity" approach proposed by Chubinsky and Slater [68], which is currently being actively developed [60,69,70].
According to the superstatistics approach [64] the different diffusivities correspond to the motion of one given particle in a region with a given D-value. At sufficiently short time scales the observed particles are relatively localised in such a region, and D can be considered to be constant for each trajectory. The whole ensemble of particles behaves as a system with random D with a distribution of D-values mirroring the spacial and temporal dispersion of D(t, x) [71]. Beck proposed that in turbulent media one could consider the Langevin equation (3) to be valid, however, the effective temperature is random such that (k B T ) −1 is distributed according to χ 2 statistics [64,72,73].
We note here that viscoelastic anomalous diffusion with Laplace shape of the probability density function was observed for the motion of messenger RNA molecules in the cytoplasm of bacteria and yeast cells [74,75], while stretched Gaussian shapes were unveiled in the motion of lipids in protein-crowded lipid biliayer systems [76].
In what follows we study a natural extension of this idea: what if not the temperature, but the properties of the stochastic force in equations (3) and (5) is random? Such an assumption may be justified in the same way as the randomness of D. Namely, this situation can be realised in an ensemble of particles with varying systems parameters, or in non-homogenous media. This approach resembles to some degree models such as the GLE with kernels, that are a mixture of more elementary functions [43,77]. Similar ideas appear in financial modelling with "gamma-mixed Ornstein-Uhlenbeck process" [78]. However in these models the observed trajectories are not examples of compound distributions but result from deterministic dynamics, which can be interpreted as an average over random local dynamical laws. In our approach the studied processes are truly superstatistical.
The paper is structured as follows. In section 2 we introduce the GLE with random parameters and discuss its elementary properties. Section 3 then considers the concrete case of a compound Ornstein-Uhlenbeck process within the superstatistical approach. In section 4 more complex types of memory are proposed and studied, including oscillatory regimes. Our findings are discussed in section 5. In the appendix some more technical details are presented.

Generalised Langevin equation with random parameters
Our starting point is the GLE assumed to depend on some parameter c, which may describe the type of diffusing particle and/or the local properties of its environment. The parameter c can, in principle, be a number or a vector. In the GLE the stochastic force then also becomes parametrised by c, ξ = ξ c . Due to their coupling via the Kubo fluctuation-dissipation relation, also the memory kernel depends on c, K = K c (see below). The solution of the GLE, the velocity and position processes can then also be considered to be functions of this parameter, V = V c , X = X c . These phase space coordinates thus solve the set of equations The constant √ k B T and the mass m actually only rescale the solutions. Considering c dependent mass m and temperature T would result in a c dependent diffusion constant D c . This type of influence was extensively studied before [64,72,73,79,80], and therefore we will omit this ramification in our analysis and assume m = k B T = 1 in what follows.
In the above definition we tacitly assumed that the introduction of the parameter c does not change the spatially local structure of the GLE, and we assume that the fluctuation-dissipation theorem remains valid in the form By design the GLE is a spatially local equation. The variable X interacts with the heat bath only in its neighbourhood. The bath degrees of freedom themselves do not interact with each other directly, which prohibits spatial long-range correlations. Longtime correlations can still be present, but they result from the interactions between X and the bath degrees of freedom, which "store" the memory structure for a long time, but do so only locally. That means for each fixed value c the fluctuation-dissipation theorem should still hold. For every c, the GLE can be solved using the Green's function formalism. The stationary solution of equation (7) is given by where the Green's function G solves the equatioṅ Equivalently, G c is the inverse Laplace transform of where by G c (s) we denote the Laplace transform of G c (t).
Generally the superstatistical solution V C and X C of the GLE emerges when the parameter c is drawn from some distribution, which we denote by substituting a big letter C for it. In order to get a better feeling, as a guiding example let us consider the simple case of a discrete set of local environments or types of particles. We number them by c = 1, 2, 3, . . . The random variable C with distribution P(C = k) = p k then describes how many trajectories are evolving in each environment or correspond to each particle type. With probability p k the observed trajectory evolved according to the GLE (7) with kernel K k and stochastic force ξ k . In general C may not be discrete, but can be a continuous parameter. The latter case is more complex and interesting, allowing to model a wider range of phenomena, and this will be our main point of interest in the following.
When we consider the solutions of the superstatistical GLE (that is equation (7) together with a model distribution for C) we are interested in two types of observables: ensemble and time averages. Ensemble averages correspond to quantities relevant when an experiment averages over many particles, as in the pioneering experiments of Perrin [81,82]. Single particle tracking experiments with sufficiently long individual particle traces, as those introduced by Nordlund [83], are typically evaluated by time averages [84]. For the case of ensemble averages the situation is simple, these quantities can be calculated using the so-called "tower property", which can be applied to any random function f (X) In what follows we omit the subscripts in the notation E (·) and use the convention that the variables will always be averaged with respect to their natural distribution. In order to calculate ensemble averages of V C and X C we simply need to calculate ensemble averages of V c , X c for fixed c and average them over the distribution of C. In particular, if C and X c have the probability density (PDF) p C and p Xc , the density of X C becomes For the time averages, the most commonly used quantity is the time averaged MSD, which for a trajectory of length T (observation time) reads [10,36,84] The last form stresses that δ 2 (t; T ) can be viewed as a function of the velocity V C , which here is a stationary process. In this work we will be mostly interested in the limit T → ∞ for which one can omit subtracting t and use the simpler average lim T →∞ 1 T T 0 (·)dτ and determine it using tools from ergodic theory. For every choice of c the stochastic force ξ c is stationary and Gaussian, and its covariance function decays to zero, r ξc (t) → 0 as t → ∞. The famous Maryuama theorem [85,86] guarantees that in such a case the process is mixing, in particular, it is ergodic. So the stationary solution V c of the GLE (7) must be mixing and ergodic, as well, for every choice of c [87]. Any time average coincides with the ensemble average, that is, for every function of However, the superstatistical solution V C cannot be ergodic as averaging over one trajectory one cannot gain insight into the distribution of C. But, the process V C ‡ Note, however, that generally a process described by the GLE can be transiently non-ergodic and ageing, as shown in references [21,88,89].
is still stationary, as, for every c, V c is stationary. In such a case the behaviour of the time averages is determined by Birkhoff's theorem [86,90] which guarantees that All time averages converge to a random variable E[f (V C (0))|M], which is an expected value conditioned by M summarising all constants of motion of V C . For isolated systems M would correspond to the energy, total momentum and similar quantities. In our case here every trajectory V c itself is ergodic, so it has no internal constants of motion. Therefore the only constants of motion are the local states of the environment, denoted by C. This statement is intuitively reasonable: given a trajectory evolving with C = c all time averaged statistics converge to the values corresponding to the solution of the GLE with K c and ξ c , which, due to the ergodic theorem (15) are exactly the conditional For the MSD this means that which is a function of the random parameter C and time t. One consequence of this is that the ergodicity breaking parameter [36,91,92] never equals zero and does not converge to zero even in the asymptotic limit t → ∞, for any t = 0, as the mean of a square equals the square of a mean only for non-random variables.
The ensemble averaged MSD can be directly obtained from the Green's function G c (t). Namely, proposition 1 in the Appendix proves that the covariance function of V c equals the Green's function, r Vc = G c § such that the ensemble averaged MSD of X c reads Moreover, as G c is a covariance function, it is bounded, |G c (0)| ≤ G c (0) = 1. Thus, from relation (19) we see that δ 2 Xc (t) ≤ t 2 /2, so the MSD is always finite and the motion governed by the superstatistical GLE is sub-ballistic. The result for the MSD assumes a particularly simple form in Laplace space, Note that G c (0) = 1 also implies that δ 2 Vc (t) = 1. For any t the value V C (t) is not superstatistical, it is simply a Gaussian variable with variance 1, which is the same as for V c (t) with any c. At the same time the covariance function r V C is decaying as a mixture of decaying functions r Vc . Without careful consideration this may seem contradictory: the Maryuama theorem states that if a stationary Gaussian process has a decaying covariance function, it is mixing and ergodic, but V C is stationary, Gaussian at every t, has a decaying covariance function and is not ergodic.
The solution to this seeming contradiction is the fact that while V C is Gaussian at every instant of time t, it is itself not a Gaussian process. For a stochastic process to be Gaussian, it is not sufficient that is has a Gaussian marginal distribution but also a Gaussian joint distribution. The solution V C is an interesting physical example of an object witch has Gaussian marginals, but non-Gaussian memory structure. Such processes are well-known to exist: it is enough to take some non-Gaussian process X(t) and transform it using its own cumulative distribution function, The resulting process Y (t) has uniform distribution for every t, and it is enough to transform it a second time using a normal quantile function to obtain a process with Gaussian PDF, yet a complicated and particularly non-Gaussian type of dependence. However this construction can be considered artificial and without physical meaning. The unusual non-Gaussianity of V C here arises naturally from the physical model. Process V C could be very misleading during the analysis of measured data, using only basic statistical methods it will seem Gaussian. We will show techniques which can be used to unveil its non-Gaussianity in the next Section for specific examples.

Overview of the model
The classical Langevin equation can be considered as an approximation of the GLE in which the covariance function r ξ decays very rapidly in the relevant time scale. The solution of the Langevin equation exhibits many properties typical to the GLE in general. We fix the mass of the particle and the bath temperature, so equation (3) is governed solely by the parameter λ. The superstatistical solution is thus V Λ , where Λ > 0 is a random variable, which can be interpreted as a local viscosity value. The Langevin equation can be solved using the integrating factor exp(Λt), which yields the stationary solution The solution V λ for fixed λ is often called Ornstein-Uhlenbeck process, so V Λ may be called a compound Ornstein-Uhlenbeck process. It can also be represented in Fourier space. Calculating the Fourier transform of equation (3) demonstrates that where we note that the Fourier transform of Gaussian white noise ξ is another Gaussian white noise. Another useful representation is the recursive formula, which is fulfilled by process in discretised time moments. If we solve the Langevin equation (3) using the integrating factor exp(Λt) but integrate from time k∆t to (k + 1)∆t we obtain where the noise Z k has the same distribution as a Gaussian discrete white noise ξ k multiplied by a random constant. The series Z k is, conditionally on Λ, independent from past values V Λ (j∆t), with j < k. Such a process is called a random-coefficient autoregressive process of order 1, in short AR(1) [93,94] with autoregressive coefficient exp(−∆tΛ). When there are only few distinct populations and Λ has only few possible values, they can even be recognised on the phase plot of y = V Λ ((k + 1)∆t) versus x = V Λ (k∆t), see Figure 1. There, the two distinct populations with different autoregressive coefficients can be distinguished. Both have Gaussian distribution, but each one has a distinct elliptical shape. The total distribution, as a mixture of two ellipsoids, is not Gaussian, nor even elliptical. The projection of the joint distribution on x or y axis are the PDF of V Λ (t) and are Gaussian, thus, one needs at least a twodimensional phase plot to reveal the non-Gaussianity of V Λ . For a larger number of populations the phase plot would be much less clear, but the huge advantage of this method is that it works even for trajectories of very short length. The situation becomes more complex and interesting when Λ assumes a continuous distribution. The covariance function of the Ornstein-Uhlenbeck process is Here some care needs to be taken, as the factor 1/2 differs from the covariance of the solution of the GLE for which generally r V (0) = 1 (this is due to the fact that it corresponds to a degenerate Dirac delta kernel). If Λ has the PDF p Λ , the covariance function of V Λ is so it is the Laplace transform of p Λ : in probabilistic language this quantity would be called a moment generating function of the variable −Λ. For instance, if Λ is a stable subordinator with index 0 < α < 1 [95] the covariance function is the stretched exponent which is a common relaxation model [96][97][98]100], sometimes referred to as Kohlrausch-Williams-Watts relaxation [101,102]. If Λ can be decomposed into a sum of two independent random variables Λ = Λ 1 + Λ 2 , the corresponding covariance function is a product, Therefore, in this model various kinds of truncations of the kernel correspond to a decomposition of Λ, for instance, if Λ = λ + Λ ′ with deterministic λ > 0, the covariance function r V Λ will be truncated by exp(−λt). Some general observations about the behaviour of r V Λ can be made. When Λ has a distribution supported on an interval, such as λ 1 < Λ < λ 2 , and its PDF has no singularity, it is necessarily bounded, that is, m ≤ p Λ (λ) ≤ M. In this case, The integrals on the left and right have asymptotics of the form Here we introduce the notation of an asymptotic inequality, which will be useful later on. We write f g if f ∼ h ≤ g for some function h. Using this notion we can write the above results as This equation proves that , so it can be interpreted as the choice taken using the weakest possible assumptions.
Heavier tails of r V Λ may be observed only when the distribution of Λ is concentrated around 0 + .The most significant case of such a distribution is a power law of the form p Λ (λ) ∼ λ α−1 , with λ → 0 + and α > 0. For any distribution of this type Tauberian theorems guarantee that the tail of the covariance has a power law tail [103,104] For α < 1 the process V Λ exhibits a long memory. This observation can be refined as follows. In Proposition 2 i) we present a generalised Tauberian theorem, which states that if the PDF of Λ contains slowly-varying factor L, then the tail of the covariance contains the factor L(t −1 ). One example of such a slowly-varying factor is | ln(λ)| β , β > 0, so heavy tails of the covariance of the power law form t −α ln(t) β can also be present for compound Ornstein-Uhlenbeck process if the distribution of Λ exhibits a logarithmic behaviour at 0 + . This observation proves that this equation can also describe ultra-slow diffusion and can be considered as an alternative to more complex models based on distributed order fractional derivatives [77,105].
This is similar to the Landau-O notation, but when we write we also include the value of the constant factor which would be omitted writing r VΛ = O(t −1 e −λ1t ). ¶ The same situation is sometimes denoted in terms of the "large theta" notation, that is r VΛ = Θ(t −1 e −λ1t ).
In Section 2 we noted that the superstatistical solutions of the Langevin equations are not Gaussian, however, they can be easily mistaken to be Gaussian. The marginal distributions of V Λ are Gaussian at any time t, only the joint distributions are not. This means that the multidimensional PDF of the variables does not have Gaussian shape. This fact is easy to observe studying the characteristic function of the two point distribution, which is a Fourier transform of the two-point PDF. Let us fix V Λ (τ ), V Λ (τ + t) and define the two-point characteristic function as For any deterministic λ this function is determined by the covariance matrix Σ t of the so in the superstatistical case the characteristic function reads As we argued, the marginal factors exp(−θ 2 1 /4), exp(−θ 2 2 /4) are indeed Gaussian, but the cross factor describing the interdependence is not. The function φ Λ would describe a Gaussian distribution if and only if the factor E exp(θ 1 θ 2 e −Λt /2) had the form exp(aθ 1 θ 2 ). But we see that it is in fact a moment generating function of the variable exp(−Λt) at point θ 1 θ 2 /2, which is an exponential if and only if Λ equals one fixed value with probability unity. The compound Ornstein-Uhlenbeck is never Gaussian for non-deterministic Λ.
This property is also evident if we calculate the conditional MSD of X Λ , At short times t this approximately is t 2 /4, so the distribution is nearly Gaussian and the motion is ballistic. However at long t the dominating term is Such a situation occurs when the distribution Λ is not highly concentrated around 0 + . When Λ has a power-law singularity as in (31), that is λ α−1 at 0 + , 0 < α < 1, this condition is not fulfilled: E[Λ −1 ] = ∞. But in this situation the assumptions required for the Tauberian theorem hold and we can apply it twice: first for relation (31), to show that s → 0 + and the second time for relation (20), to prove that In this regime the system is superdiffusive. The transition from superdiffusion (0 < α < 1) to normal diffusion (1 ≤ α) is unusual among diffusion models. Fractional Brownian motion and fractional Langevin equation [41,46,106] undergo transitions from super-to subdiffusion at a critical point of the control parameter. This is so as in these models in this the change of the diffusion type is caused by the change of the memory type from persistent to antipersistent. But the Ornstein-Uhlenbeck process models only persistent dependence, so the mixture of such motions also inherits this property. For 1 ≤ α (and any other case when E[Λ −1 ] < ∞) this dependence is weak enough for the process to be normally diffusive, for smaller values of α it induces superdiffusion.
In the introduction we already mentioned that it is commonly observed that the distribution of the position process is double exponential, see equation (6) and references below. This exact distribution is observed when D has an exponential distribution E(β) For the corresponding compound Ornstein-Uhlenbeck process the distribution of Λ is given by Λ = (4D) −1 and for such a choice the process models normal diffusion with a Laplace PDF. Moreover the covariance function of the velocity process is where we used one of the integral representations of the modified Bessel function of the second kind K 1 (see [107], formula 10.32.10). This function has the asymptotic , formula 10.40.2), so the covariance function behaves like This behaviour is shown in Figure 2, where we present the covariance function corresponding to the Laplace distributed X Λ (t) with random diffusion coefficient D d = E(2). We do not present the Bessel function (38), as it appears to be indistinguishable from the result of the Monte Carlo simulation. Figure 2 also shows how to distinguish this behaviour from an exponential decay on a semi-logarithmic scale: the covariance function and its asymptotic are concave, which is mostly visible for short times t. Analysing the shape of the covariance function can serve as a method to distinguish between a superstatistic introduced by a local effective temperature and the distribution of mass from superstatistics caused by the randomness of the viscosity Λ. In the former case the resulting decay is exponential (as in the non-superstatistical Langevin equation) or even zero for a free Brownian particle, in the latter case it is given by relation (38).

Gamma distributed Λ
In order to better understand the superstatistical Langevin equation we will consider a simple model with one particular choice for the distribution Λ. After going through this explicit example we will come back to the general case at the end of this section.
A generic choice for the Λ distribution is the Gamma distribution G(α, β) with the PDF This corresponds to a power law at 0 + which is truncated by an exponential. As the conditional covariance function is an exponential, too, many integrals which in general would be hard to calculate, in this present case turn out to be surprisingly simple. The Gamma distribution is also a convenient choice because many of its special cases are well established in physics. The Erlang distribution is the special case of expression (40) when α is a natural number. An Erlang variable with α = k and β can be represented as the sum of k independent exponential variables E(β), in particular, for k = 1 it is the exponential distribution itself. The Chi-square distribution χ 2 (k) is also a special case of expression (40) where α = k/2, β = 1/2. The Maxwell-Boltzmann distribution corresponds to the square root of χ 2 (3), and the Rayleigh distribution to the square root of χ 2 (2). We already know from relation (31) that r V Λ has a power tail ∼ 2 −1 β α t −α , more specifically, direct integration yields This is solely a function of the ratio t/β which suggests that the parameter β changes the time scale of the process. Indeed, for any λ the process V λ (bt) is equivalent to V bλ (t), because the Gaussian process is determined by its covariance function, which in both cases is the same. Therefore, also the compound process V Λ (bt) is equivalent to V bΛ (t) and bΛ has the distribution G(α, β/b).
The function (41) would be observed if we calculated the ensemble average of V Λ (τ )V Λ (τ + t) for some τ . If instead the covariance function would be estimated as a time average over individual trajectories, the Birkhoff theorem determines that the result would be a random variable, equal to the conditional covariance It is straightforward to calculate the PDF of this distribution, The mean value of this quantity is given by result (41). This PDF is zero in the point x = 1/2 if α > 1 but has a logarithmic singularity at x = (1/2) − if α < 1 (that is, in the long-memory case). It is zero in x = 0 for t < β as in expression (43) any power law dominates any power of the logarithm. For t > β there is a singularity at x = 0 + which approaches the asymptotics x −1 | ln x| α−1 as t → ∞. This behaviour can be observed in Figure 3, illustrating how the probability mass moves from (1/2) − to 0 + as time increases.
As we already know, the compound Ornstein-Uhlenbeck process is non-Gaussian. Let us follow up on this property in more detail. To study the characteristic function we need to calculate the average in equation (34), which is actually the moment generating function for the random variable exp(−Λt). Some approximations can be made. let us assume that Λt is small in the sense that the probability that this variable is larger than some small ǫ > 0 is negligible. In this regime we can approximate exp(−Λt) ≈ 1 − Λt and find The first factor describes a distribution of V Λ (τ ) = V Λ (τ + t). So in our approximation we assume that the values in the process between short time delays are nearly identical and the multiplicative correction (1 − tθ 1 θ 2 /(2β)) −α is non-Gaussian. The second type of approximation can be made for long times t when exp(−Λt) ≈ 0.
In this case Now we treat the values V Λ (τ ) and V Λ (τ +t) as nearly independent, the small correction is once again non-Gaussian. Apart from the approximations, the exact formula for φ Λ can be provided using the series which is absolutely convergent.
Note that for the specific choice θ 1 = θ, θ 2 = −θ the function φ Λ is a Fourier transform of the probability density of the increment ∆V Λ (τ, t) := V Λ (τ ) − V Λ (τ + t), which therefore equals Clearly, any increment of V Λ is non-Gaussian. This is demonstrated in Figure 4, where we show − ln( p ∆V Λ (τ,1) (θ)) on the y-axis. In this choice of scale Gaussian distributions are represented by straight lines. The concave shape of the empirical estimator calculated using Monte Carlo simulation shows that the process V Λ is indeed non-Gaussian. In the same plot we present the two types of approximations of p ∆V Λ (τ,1) : for t → 0 + we have equation (44), which reflects well the tails θ → ±∞, and for t → ∞ we see that with several terms of the series (47) a good fit for θ ≈ 0 is obtained. It may appear counter-intuitive that the values V Λ (t), which are all exactly Gaussian, are sums of non-Gaussian variables. If the increments were independent that would be impossible, here their non-ergodic dependence structure allows for this unusual property to emerge. However, the they are still conditionally Gaussian with variance The non-Gaussianity is prominent for short times t. As t increases, the distribution of ∆V Λ (τ, t) converges to a Gaussian with unit variance. The non-Gaussian memory structure of the velocity V Λ affects also the distribution of the position X Λ , which, using result (35), for large t becomes For α = 1/2 and long t the position process X Λ is approximately Cauchy distributed. For short t it is nearly Gaussian distributed with variance t 2 /4. In general the above formula is a PDF of the Student's T-distribution type, although unusual in the sense that most often it arises in statistics where it is parametrised only by positive integer values of α. The parameter α in the above expression determines the decay of the tails of the PDF, which, as we can see, scale like x −2α−1 : the parameter β rescales time, but in an inverse manner compared to its action on V Λ . It may therefore seem that for α ≤ 1/2 the process X Λ may not have a finite second moment, however, this is not true. In Section 2 we made the general remark that the MSD of the superstatistical GLE is necessarily finite, see the comments below equation (19). In our current case δ 2 X λ (t) is given in expression (35), and thus This is indeed a bounded function of control parameter λ, and the MSD of X Λ must be finite for any distribution of Λ. Moments of higher even order can be expressed as so they are all also finite as well. Integrating twice relation (41) for Gamma distributed Λ it can be shown that This describes superdiffusion for 0 < α < 1 and normal diffusion for 1 ≤ α in agreement with the more general theory discussed below equation (35). Similar, a somewhat longer calculation yields which determines the asymptotics of the ergodicity breaking parameter (18) at t → ∞. Additionally, in this model it is easy to check that 3 × (EB(t) + 1) is the kurtosis of X Λ (t), that is E[X Λ (t) 4 ]/(E[X Λ (t) 2 ]) 2 . This is one of the measures of the thickness of the tails of a distribution which for any one-dimensional Gaussian distribution equals 3. Here, the distribution is clearly non-Gaussian, but it is hard to judge the tail behaviour using the kurtosis. This is due to the fact that p X Λ converges to a power-law, yet according to result (51) the tails of p X Λ must decay faster than any power, symbolically p X Λ (x, t) = O(x −∞ ) for any t. Therefore the PDF's tails are always truncated, but is not noticeable observing moments, which are affected by the finite range in which PDF becomes close to the power law. The asymptotical properties of X Λ (t) are illustrated in Figure 5, where we show the PDFs of the rescaled position position X Λ (t)/ √ t simulated with α = 1/2, β = 1 and calculated using kernel density estimator. In agreement with result (49), the limiting distribution is of Cauchy type. At the same time for all finite t the tails of the PDF remain truncated: as time increases this truncation is moved more away into x = ±∞, and as a result the MSD increases as t 2−α . This is an illustration of a more general rule: equation (49) is a Laplace transform of √ Λ, so any Λ with power law p Λ (λ) ∼ λ α−1 , λ → 0 + will result in a power law x −2α−1 as limiting distribution of X Λ . But at the same time the superstatistical Langevin equation preserves the finiteness of moments, which stems from the Hamiltonian derivation of the GLE. Therefore this model reconciles power law tails of the observed distribution with a finite second moment by naturally introducing truncation moving to ±∞ as t → ∞.

More complex memory types
We here analyse the behaviour of the superstatistical GLE (7), which is non-Markovian and may be used to model more complex types of memory structure. We study the two important cases of exponential and power law shapes for the kernel.

Exponential kernel GLE
The covariance function of force in exponential kernel GLE has the conditional form This particular parametrisation is chosen for convenience, it will simplify the formulas later. The stochastic force ξ A,B in this model is the compound Ornstein-Uhlenbeck process considered in Section (3), additionally rescaled by random coefficient B 2 . It may be represented in time space or Fourier space as We first solve the corresponding deterministic model. The Laplace transform of the Green's function can be easily obtained in the form Its Laplace inverse is a conditional covariance function, which is the sum of two exponential functions, In the case A = B a division by 0 appears, so the above formula should be understood as a limit A → B. In any case we can calculate the MSD using relation (19) and obtain As we can see the asymptotical behaviour of the MSD at t = 0 + and t = ∞ is very similar to that of the compound Ornstein-Uhlenbeck process. For E[A/B 2 ] < ∞ this GLE models normal diffusion with a random diffusion coefficient. When this condition is not fulfilled it may model superdiffusion determined by the power-law tails of the covariance function, compare relation (20) and the discussion below. The behaviour of this system greatly depends on whether A < B, A = B or A > B. All ensemble averages can be separated between these three classes (60) so even if all three regimes can be present in a physical system, we can model them separately and average the results at the end. We start the analysis with the simplest case.
4.1.1. Critical regime A = B. Taking the limit A → B in expression (58) or calculating the inverse Laplace transform from equation (57) with A = B we determine the form of the conditional covariance within this critical regime, The behaviour of the resulting solution V A is very similar to that of the compound Ornstein-Uhlenbeck process. The differences are mostly technical. For example, if A = A 1 + A 2 for some independent A 1 and A 2 , then Therefore, for instance if A > a 0 we can write A = A ′ + a 0 , A ′ > 0 and the covariance function becomes truncated by a 0 t exp(−a 0 t).
The formula for r V A consist of two terms. Function At exp(−At) has a thicker tail, but the asymptotic behaviour of r V A is determined by the distribution of small values A ≈ 0, so it is not clear which term is most important in that regard. If we assume p A (a) ∼ a α−1 then so actually both terms have comparable influence over the resulting tails of the covariance.

Exponential decay regime A > B.
In this case covariance function is a sum of two decaying exponentials. Because of this A > a 0 results in an exponential truncation by exp(−a 0 t) of the associated covariance, but this time there is no simple rule to determine the behaviour of this function for A = A 1 + A 2 . Instead let us analyse expression (58) in more detail. The first exponential has a negative amplitude, the second one a positive amplitude. In additional the second exponential always has a heavier tail, as its exponent includes the difference of positive terms, A − √ A 2 − B 2 , whereas the other exponent includes a sum. Thus we expect that the exponent with A + √ A 2 − B 2 cannot lead to a slower asymptotics than the one containing A − √ A 2 − B 2 . Given this reasoning let us change the variables in the form The new parameter A ′ attains the value A ′ = B for A = B and decays monotonically to 0 as A → ∞. Note that for small values of A ′ , A ′ ≈ B 2 /(2A), so the tail behaviour of A determines the distribution of A ′ at 0 + ; in particular, a power law shape of the former is equivalent to a power law shape of the latter. Using the parameters A ′ and B the covariance function can be expressed as As A ′ < B the variables A ′ and B cannot be independent unless B is deterministic. In that latter case B = b and for A ′ concentrated around 0 + we have that so the asymptotical behaviour of this model is again in analogy to that of the compound Ornstein-Uhlenbeck process. In particular, p A ′ (a) ∼ a α−1 , when a → 0 + (equivalently p A (a) ∼ 2a −1−α for a → ∞) implies the emergence of a power law, This asymptotic does not depend on the exact choice of b, which means that the scale of the stochastic force does not affect the tails of the memory and the influence of b only matters at short times. When both A ′ and B are random their dependence may potentially be quite complex and influence the tails of the covariance in unpredictable ways. It can only be studied under some simplifying assumptions. We want to require some sort of independence between A ′ and B for small values of A ′ , which determine the asymptotics of exp(−At). So, let us denote B ′ := A ′ /B, which is a random variable that must be less than 1, but may be supposed to be independent from A ′ . Using variables A ′ and B ′ the covariance function can be transformed into In this form the influence of A ′ and B ′ is mostly factorised, the only remainder is A ′ /B ′2 in the second exponent. This leads to some immediate consequences. If the PDF of A ′ is supported on the interval [a 1 , a 2 ], and m ≤ p A ′ (a) ≤ M, then straightforward integration yields Power law tails appear when p A ′ (a) ∼ a α−1 , a → 0 + . The conditional asymptotics then reads so for the unconditional covariance we have Both types of asymptotics are similar to the behaviour of the compound Ornstein-Uhlenbeck process, only with different scaling. For the same reason r V A ′ ,B ′ (t) is truncated under the same conditions as before if it is a sum of independent A ′ 1 and where the constant depends on the distribution of A ′ and B ′ .

Oscillatory decay regime A < B.
When the square root √ A 2 − B 2 is imaginary we can express the covariance function as This represents a trigonometric oscillation truncated by the factor exp(−At). When calculating the unconditional covariance, this function acts as a integral kernel on the distribution of A and B. The exponential factor acts similarly to the Laplace transform, but oscillations introduce Fourier-like behaviour of this transformation. It can be observed in the solutions of the corresponding GLE, which we will show below. Tauberian theorems can be applied for the bound of the covariance, given by the inequality which we prove in Proposition 3 in the Appendix, together with other asymptotic properties. As a general rule it can be said that solutions of the GLE in this regime have a covariance which decays no slower than the covariance of the compound Ornstein-Uhlenbeck process with the same distribution. For example, for independent A and B, when the PDF of A is bounded and supported on some interval [a 1 , a 2 ], and for a power law at a + 1 , that is A = a 1 + A ′ with p A ′ (a) ∼ a α+1 at a → 0 + , the covariance is bounded by The scaling constants depend on the distance between the distributions of A and B: the closer they are the larger is the multiplicative factor. If The question remains if this constraint is reached. The answer is yes, the oscillations of r V A,B (t|A, B) are asymptotically regular, that is, their frequency becomes constant (exactly equal to B) at t → ∞. Because of this they are not influenced by averaging over A, so if B is deterministic B = b and A has power law p A (a) ∼ a α−1 , a → 0 + , we observe that This behaviour can be seen in Figure 6 which demonstrates that the convergence is relatively fast. During the Monte Carlo simulation the parameter B was fixed as B = π and A was taken from gamma distribution G(1/2, 1). For this distribution there exists 98.8% chance that A < π = B and the system is in the oscillatory regime, so it is indeed dominating the result, as shown in Figure 6.

Power law kernel GLE
Our last example is the superstatistical GLE in which the force has a power law covariance function, namely The force process ξ H,Z is a fractional Brownian noise with random index H, rescaled by the random coefficient Z. The factor 1/Γ(2H −1) is added for convenience and simplifies the formulas, however, its presence does not change the outcome of our analysis. The Green's function of this GLE is given by In the last form we recognise the Laplace transform of a function from the Mittag-Leffler class [108]. The asymptotic of the conditional covariance can be derived from Tauberian theorems or analysing the Mittag-Leffler function directly [108,109] From this formula we see that the distribution of Z should not have an influence on the covariance asymptotics. Further on we will assume that Z is independent from H and E[Z −1 ] < ∞. In the simple case when 0 < h 1 < H < h 2 and H has a bounded PDF, that is m ≤ p H ≤ M, one can show that following bound holds Therefore r V H,Z ≍ t −2h 1 ln(t) −1 . The proof is given in Proposition 4 i) in the Appendix. As usual, when H has uniform distribution on [h 1 , h 2 ] the asymptotics is stronger, A more interesting situation occurs when H is distributed according to a power law. Noting that t −2H = e −2H ln(t) one may suspect that the resulting covariance would exhibit power-log tails. This intuition is true. We will analyse case the when the index By imposing h 0 > 0 we prohibit a situation when values of H are arbitrarily close to 0 + because the Mittag-Leffler function diverges in this limit, otherwise it is a continuous function of H. This problem corresponds to the fact that for small H the trajectories of ξ H,Z become very irregular and as H → 0 + the solution of GLE is not well-defined. We show in the Appendix that under these assumptions the asymptotics indeed has power-log factor Because we can take h 0 arbitrarily close to 0 + in this model we can obtain tails which are very close to pure power-log shape.
To finish this section let us also comment on the properties of the position process. Equation (19) describes the MSD as a second derivative of the Green's function, so using its simple form in the Laplace space (20) The inverse transform can be found using tables of two-parameter Mittag-Leffler function, which also determines its asymptotics [108,109] The presence of the factor 1/Z means that this superstatistical GLE can model anomalous diffusion with non-Gaussian PDF's. This dependence on Z is the same as for the parameter Λ of the compound Ornstein-Uhlenbeck process, so both exponential and power law tails can be present in this model in an analogous way. As for the asymptotic of δ 2 X H,Z (t), the identical argument as for the covariance can be used, so in this model the MSD of the form is present for 0 < h 0 < 1 and 0 < α ≤ 1. A numerical evaluation of this behaviour is shown in Figure 7 where we have taken the subdiffusive case H = 3/10 + H ′ , with p H ′ (h) = α5 −α h α−1 and α = 3/4, 0 < H ′ < 1/5. The factor t 2 in expression (84) does not depend on the particular form of the dynamics, so we divided all shown functions by this factor to highlight the influence of H. As we can see the convergence to the asymptotic behaviour is much slower than in the previous examples, which stems from the fact that the Mittag-Leffler function converges slowly to the power law The inclusion of the log power law is significant, but may be difficult to determine on the log log scale. It is demonstrated in the two lower panels in Figure 7. The asymptotic MSD is concave on log log scale, but the effect is not very prominent, and for the MSD estimated from Monte Carlo simulations can be detected only on very long time scales. The difference from a power law is more visible if the lines are shown without the factor t −2h 0 (bottom panel), but h 0 may not be easy to estimate for real systems. This comparison shows that different possible forms of decay can be easily mistaken, so one should exert caution when analysing data suspected to stem from such systems.

Conclusions
We here studied the properties of the solutions of the superstatistical generalised Langevin equation. This is based on the Gaussian GLE, but includes random parameters determining the stochastic force. This new type of processes has a number of properties that are unusual among the models of diffusion. Firstly, the velocity is a stationary but non-ergodic process. The behaviour of its time-averages can be studied, for instance, using the time averaged covariance function (42), see also Figure 3. The resulting process is moreover not Gaussian. At every point of time it has a Gaussian distribution but exhibits a non-Gaussian structure of the memory, see equations (44) and (46). One consequence of this is that it has non-Gaussian increments (47), also shown in Figure 4. Secondly, the position process at small times has a PDF which is approximately Gaussian, but as time increases it converges to non-Gaussian PDF, as demonstrated in Figure 5. This limit distribution can exhibit commonly observed exponential tails but also power-law tails (49). Even when the limiting distribution does not have a finite second moment, at any given finite time t the position process does have a finite MSD. The observed PDFs are always truncated. For short memory GLE, the MSD of the position process is normal or superdiffusive, see (35) and the discussion below. For power law memory models, anomalous diffusion with additional log-power law may be observed (84).
Various kinds of the GLE, distributions of shape parameters and the resulting asymptotic properties of the covariance function r V based on the model developed herein, are shown in Table 1. The notation is chosen as follows: , and f (t) ≍ g(t) when for some constants 0 < c 1 < c 2 . The middle and left columns can be slightly generalised if the distribution of shape parameters contains a slowly varying factor, see Proposition 2. We see our work as a part of the development of the superstatical approach to the modelling of diffusive processes in complex media. The assumption that the stochastic force driving the GLE can change its properties from one localised trajectory to another appears quite natural. Moreover, the superstatistical GLE can simultaneously explain the presence of non-Gaussian distributions and normal or anomalous type of diffusion. The properties of the solutions listed above are specific enough to clearly distinguish this model from possible alternatives, in particular, from the presence of non-Gaussian PDFs caused by random rescaling of the process, which does not change the type of Shape parameter distribution r ξ bounded on interval power law at 0 + power law + c, c < 1 -log law (81) Table 1. Different asymptotics of the covariance function r V for different GLE: memoryless, exponential kernel in decay regime (under convenient parametrisation), exponential kernel in oscillating exponential decay regime, and power law. Different distributions of the shape parameters Λ, A ′ , A, H are considered memory that is observed in the system. We show how in our model the ensembles of short memory trajectories can, concertedly, give rise to a long memory. This dependence is highly non-Gaussian despite the Gaussian PDF of the velocity process. The presence of such peculiar phenomena in a physical model is an interesting theoretical finding in its own right. Its main relevance, however, lies in the description of stochastic data, highlighting the importance of comprehensive Gaussianity checks in data analysis. We also note that the classical Langevin equation, considered in Section 3, exhibits most of the properties specific for superstatistical GLE-even the presence of powerlog tails. This model has a very simple form and at the same time allows for the derivation of not only asymptotic results, but also many exact ones, given distribution of the dumping coefficient or viscosity. Its most significant limitation is that it cannot model subdiffusion. As such it should be considered a simple yet robust model for data with linear or superdiffusive msd, power law or log-power law covariance function and non-Gaussian PDFs.
Finally, let us add a caveat. Of course, any locally diffusing particle will eventually reach the border of its domain characterised by a specific set of diffusion parameters. For time ranges much longer than this dissociation time, first a coarse-graining of the diffusion parameters will arise, and eventually the diffusion will be governed by a Gaussian diffusion process with a single, effective diffusion coefficient, similar to the observations in the diffusing diffusivity model [60].

Appendix: Proofs
We here provide several propositions with proofs, needed in the development of the theory in the main body of the paper. Proposition 1. The covariance function of any stationary solution V of the GLE (5) equals where G is the corresponding Green's function of the GLE, and the MSD of the position process Proof. We will assume that the Green's function G and kernel K are functions on R for which G(t) = 0 and K(t) = 0 for t < 0 (such functions are called "casual"). Using this notation all integrals are defined on R, which simplifies calculations. The solution of the GLE is stationary and has zero mean, so we can take t > 0 and calculate the covariance function as The covariance function r ξ is not casual, but it is symmetric, so it can be represented as r ξ (τ ) = K(τ ) + K(−τ ). This formula fails only at τ = 0, but it does not affect result of integration. The integral separates into two parts, the first being where we used relation (10) in the interior of support of G. Similarly, for the second integral, In I 1 and I 2 we can substitute −τ 1 = τ and −τ 2 = τ . In their sum we recognise the formula for integration by parts, which yields Now, for the position process we find Because of the symmetry between τ 1 and τ 2 the integral is twice the term with G(τ 1 −τ 2 ), after substitution τ 1 − τ 2 = τ ′ 2 > 0 we get Proposition 2. If L is a slowly varying function at 0 + , that is lim λ→0 + L(λx) L(λ) = 1 for any x > 0, then i) A distribution of the form p Λ (λ) ∼ λ α−1 L(λ), λ → 0 + (95) implies that the mean value of the exponential satisfies E e −Λt ∼ Γ(α)t −α L(t −1 ), t → ∞.
ii) A distribution of H of the form H = h 1 + H ′ with h 1 > 0 implies that the mean value of a power law satisfies Proof. We will only show ii), as the proof of i) is similar and simpler. We write the integral for E[t −2(h 1 +H ′ ) ], reformulate it as a Laplace transform using t −2H ′ = e −2H ′ ln(t) , change variables and calculate the limit Proposition 3. Let r V A,B be a covariance function corresponding to the solution of a GLE with an exponential kernel in the oscillatory regime, Then the following asymptotical properties hold: i) For A with bounded PDF p A ≤ M supported on the interval [a 1 , a 2 ], a 2 < B and independent of B, there exists the asymptotic bound ii) If additionally A exhibits a power law behaviour at a + 1 , that is, A = a 1 +A ′ , p A ′ (a) ∼ a α+1 for a → 0 + , the asymptotic bound can be refined to iii) For p A (a) ∼ a α+1 at a → 0 + and deterministic B = b the asymptotic limit of covariance function is which holds for all sequences of t k → ∞ which do not target zeros of cos(bt), that is |bt k − lπ + π/2| > ǫ for all k, l ∈ N and some ǫ > 0.
Proof. We start from the simple inequality This allows us to prove i), namely: Proving iii) requires a more delicate reasoning. We write r V A,b (t) as an integral and change variables at → a, so that After change of variables the Fourier oscillations depend on the variable √ b 2 t 2 − a 2 . In the limit t → ∞ they converge to oscillations with frequency b, Now we check the asymptotics of the integral above, Taking the limit ǫ → 0 and averaging over Z proves point i).
For ii) let us first study the behaviour of the power law asymptotic itself, Now, because of asymptotic (115) for every ǫ > 0 there exist a T H such that for t > T H This inequality holds for any H in a closed interval [h 1 , h 2 ] and fixed Z. As the Mittag-Leffler function and the power function are continuous with respect to H in this range, we can find T sufficiently large such that this inequality will hold for t > T and all H ∈ [h 1 , h 2 ] simultaneously. Otherwise we could take T k → ∞ and corresponding H k ∈ [h 1 , h 2 ] for which it does not hold and obtain a contradiction with continuity of H → E 2H,β (−Zt 2H ) or asymptotic (115) at an accumulation point of the sequence H k . We may divide (119) by and consider some large t > T in order to obtain Taking the limit ǫ → 0 and averaging over Z yields the desired asymptotic