Codifference can detect ergodicity breaking and non-Gaussianity

We show that the codifference is a useful tool in studying the ergodicity breaking and non-Gaussianity properties of stochastic time series. While the codifference is a measure of dependence that was previously studied mainly in the context of stable processes, we here extend its range of applicability to random-parameter and diffusing-diffusivity models which are important in contemporary physics, biology and financial engineering. We prove that the codifference detects forms of dependence and ergodicity breaking which are not visible from analysing the covariance and correlation functions. We also discuss a related measure of dispersion, which is a non-linear analogue of the mean squared displacement.


A.1. Statistical measures in modelling of diffusion
The analysis of stochastic systems has three important and partially distinct aspects: models, properties and estimation. These roughly correspond to physical, mathematical and statistical aspects of research. Modelling is concerned with explaining the nature of a system according to the underlying theory (e.g. "the particle undergoes Brownian motion, because it rapidly exchanges momenta with the molecules of liquid"). The analysis of statistical properties (also called "measures") ‡ relates these models with observable quantities ("Brownian motion has a linear mean squared displacement"). By using suitable estimators we link these parameters to the experimental data ("the mean squared displacement can be efficiently estimated by an arithmetic average over squared displacements").
This work is motivated by our conviction that the choice of statistical measures is too small for contemporary needs, as the scope and number of models increased ‡ Strictly speaking, these are sufficiently regular functionals acting on the space of observations. In quantum mechanics each such linear functional corresponds to an observable. In statistical mechanics a similar rôle is fulfilled by E[f (X)] for bounded continuous functions f . In statistics linearity is usually not required and various measures have the form g(E[f (X)]). This is the case also in the present work.
considerably [1]. The classical models based on the Langevin equation [2], the generalised Langevin equation [3,4], as; well as short- [5] and long- [6,7] memory random walks were complemented by motions on fractals [8], motions in complex energy landscapes [9], random walks in random environments [10,11], random walks with correlated steps and waiting times [12,13,14,15,16] and Lévy walks [17], spatially heterogeneous diffusion processes [18], diffusing-diffusivity [19] and more. Distinguishing between different models from this wide class is of course crucially dependent on the physical understanding of the system, but this requirement does not lessen the importance of empirical verification based on various measures and corresponding estimators. From an experimental point of view the large range of different stochastic processes is called for by ever more detailed insights garnered in highly complex environments such as living biological cells or membranes, for instance, by single particle tracking of individual sub-micron tracers of even fluorescently labelled single molecules [20,21,22].
Traditionally, in the study of diffusion phenomena, the three most basic and popular statistical measures in use are: the mean as a measure of location, the mean squared displacement (MSD) as a measure of dispersion and the covariance as a measure of dependence, respectively , r X (t) := E[(X s+t −µ X (s+t))(X s −µ X (s))].
(A1) Other, alternative choices of measures could be, for example: the median for the location [23], entropy [24] or quantile ranges [23] for the dispersion, the rank correlation [23,25] or the mutual information [26] for the dependence.
The covariance as defined above should not depend on the choice of s, which is true for stationary processes (the term "non-ageing" is also in use). We will assume stationarity whenever we will be studying memory. In practical applications this condition is fulfilled by many types of confined motions or increments of free diffusions. The more general non-stationary case will be only briefly mentioned in Eqs. (C2) and (C4). Many of the arguments presented here could be further extended to non-stationary models, but it would require a case-by-case study. Conversely, measures of dispersion and location are interesting mostly for non-stationary (ageing) processes, otherwise they are constant, and the discussed cases will fit into that category.
The present range of typically employed measures, which could be effectively used for studying diffusion is indeed quite limited, and the need of a wider range of methods has been acknowledged for many years. Various papers proposed, e.g. studying higher order moments and ratios of moments [27], running maximum [28], p-variation [29], or time averages and ensemble averages of time averages [30]. A prominent example of the last kind of measure is, e.g., the ergodicity breaking parameter [18,31,32,33]. Recently also single-trajectory power spectral methods were proposed [34,35]. These techniques are steadily gaining public recognition, but often the range of their application is still narrow. Moreover, a large part of this important research has a limitation of studying properties "not very different" from the second order on. For example, any power function x α for α > 1 has a similar behaviour to x 2 (i.e., it is an increasing, convex function) and parameters based on it are usually not far away from the classical ones. § They all emphasise highly the tails of the distribution, and any change of distributions for large values of observations has a larger influence than for the small ones. This connection is very helpful in making comparisons, but the important part of the total information is lost and could be extracted using more distinct measures.

A.2. Overview of the codifference
Our main subject of interest, the codifference, is an example for a measure different from those based on moments. It was initially proposed as a tool to measure the dependence for α-stable processes, for which the second moment is infinite [37,38,39,40,41,42]. However, in many systems the divergence of the second moment is not an expected physical property, which limits the range of possible applications of stable processes. It was already noticed, e.g., in [43] that the codifference may be useful for both models with or without finite second moment. In our present work we study the applications of the codifference for a class of models based on Gaussian distributions, which we call conditionally Gaussian processes; as we will demonstrate many useful and widely used models fit into this category.
The definition of the codifference which we will use is as follows: for any stationary process X it is given by the formula . (A2) The sample codifference is introduced in a standard way, by replacing the three ensemble averages E[·] in the above expression by arithmetic averages 1 n n j=1 (·). Similarly, one can consider a time-averaged codifference. For all symmetric distributions the considered averages should be real-valued, so in most of the practical applications one can average over cos(θ(·)) instead of exp(iθ(·)); this was used for the Monte Carlo simulations which will be presented further on.
Note that the so-called generalised codifference has X s+t and X s multiplied by θ 1 and θ 2 respectively and contains even more information [37]. In the context of models that we will consider this additional flexibility does not seem to be meaningful and so the cost of complicating our formulae would be unreasonable.
Conversely, the basic formula for the codifference in the classical book of Samorodnitsky and Taqqu [37] is similar to ours, but with θ = 1. In the mathematical study of stable process this is sufficient, but in more broad physical applications introducing an arbitrary dimensional constant equal to unity is not desirable. In our choice of definition the codifference has the unit of X 2 due to the introduction of 1/θ 2 . This factor makes the codifference comparable to the covariance, and allows us to show them on the same plots. When this is not important the factor 1/θ 2 can be omitted. There exists an even more simplified object, the dynamical functional [44], which is just the numerator minus the denominator from (A2) with θ = 1; it is used to study ergodicity breaking [30,45].
Instead of moments such as the covariance, the codifference depends on sines and cosines of θX s+t and θX s . Expanding these functions into Taylor series around zero up to the two first terms and using the fact that for stationary process E [X t ] = const. shows that the codifference agrees with the covariance for distributions concentrated around the origin. The most essential difference is that the codifference measures mainly the dependence determined by the bulk of the probability density in contrast to the covariance, which puts much larger emphasis on the tails. This is caused by the cancellation of highly oscillatory terms in the tails of the PDF as stated by the Riemann-Lebesgue lemma, which is in contrast to the huge influence of the tails in the covariance caused by the quadratic factor in the probabilistic integral E[X s X s+t ].
Because of the presence of two highly non-linear transformations: sine/cosine and logarithm, definition (A2) may initially not seem very intuitive. It becomes more natural if we interpret it as a conveniently transformed Fourier transform of the distribution (that is, the probabilistic characteristic function). In the full, multidimensional form, the characteristic function contains all information about the dependence. Moreover for Gaussian variables it has the very simple form exp(−(θσ) 2 /2), so it seems reasonable to use it as a dependence measure for models related to the Gaussian distribution. Still, it is not obvious that the codifference behaves as we would require from a memory function. Fortunately, simple arguments show that this is the case: a) When X s+t = X s (the case of total positive dependence) the codifference is a positive constant τ θ X (t) = τ θ X (0) > 0. If the values X s+t and X s become independent, the codifference converges to 0. Both facts are immediate consequences of the definition together with and, for X s+t independent of X s , b) If the process is a sum of independent components X t = Y t + Z t then the respective codifferences are additive This property is important in common applications, where the observed process usually is at least to some degree disturbed by noise, which can most often be assumed to be additive and independent of the basic motion.
c) If E[X 2 t ] < ∞, the covariance can be viewed as a limit of the codifference, which stems from expanding the complex exponents in definition (A2) into a Taylor series up to the second term and noting that we obtained the logarithm of expression (1 + θ 2 r X (t) + o(θ 2 )) θ −2 . It is then justified to treat the codifference as a generalisation of the covariance. d) For a Gaussian process the codifference equals the covariance for any θ which follows immediately from a short calculation, see Eq. (C6). Therefore comparing the codifference and the covariance can be used to measure non-Gaussianity.
One intuitive property, that the codifference does not have, is symmetry. Considering two variables we fix the first one and negate the second one (x → −x), and we expect the strength of dependence to be the same but for the sign to change. This is the case for the covariance, but not for the codifference, which is by design non-linear. Even in the borderline case X s+t = −X s we do not have a guarantee that τ θ X (t) < 0, counterexamples can be given even for the otherwise well-behaved class of processes considered later. It is actually possible to remove this sometimes inconvenient property by introducing the symmetrised codifference which for all symmetric distributions changes sign with respect to reflection, X s+t → −X s+t . This quantity can be useful if one wants to compare the strength of positive and negative dependencies, but there is a cost: the symmetrised codifference is "linear enough" to ignore many types of non-linear ergodicity breaking, similarly to the covariance, see Eq. (C12). For this reason further on we will use the non-symmetrised codifference and study systems with a positive type of dependence, at least in some suitable limit, such as t → ∞.
Note that if the codifference is a generalisation of the covariance, one should reasonably expect that there exists a generalisation of the MSD defined in a similar spirit. Indeed, let us consider the formula This quantity may seem trivial, because studying the distribution in Fourier space is a classical method of basic probability theory. But, the distinguishing part of this definition is that the result is treated primarily as a function of time and it is conveniently transformed, so that it can be interpreted as a measure of dispersion with the same unit as X 2 . Up to a rescaling it can be considered a cumulant generating function calculated at imaginary argument, but such a quantity does not seem to have an established name in the literature, so we will call it by the straightforward term "log characteristic function", in short LCF. It is clear that in analogy to the features of the codifference, the LCF measures mainly the spread of the bulk of the probability and is much less influenced by the distribution's tails than the MSD. As before, the first factor, here 2/θ 2 , is optional and only needed when one wants to compare the LCF to the MSD. The LCF is indeed a reasonable measure of dispersion, as shown by the following properties: b) For any Gaussian process the LCF equals the MSD, c) As we stretch the probability density of X t , the LCF diverges, that is, The first two facts are analogues of the corresponding properties of the codifference which allow one to trace the influence of the noise and detect non-Gaussianity. The point c) is just the Riemann-Lebesgue lemma in disguise: it corresponds to the intuition that the rescaled process should have a larger spread. It should be mentioned that in general the LCF can be negative or complex valued, which is highly undesirable. However, for the considered models, which are based on internal Gaussian dynamics, this will never be the case, as proved in Proposition 2.
Decomposing any process with independent increments into a sum of its jumps shows that in this case ζ θ X (t) is a linear function. In particular, this holds of Lévy flights [37]. It also holds for continuous time random walks with exponential waiting times [5], for which where J is one jump and T is one waiting time of diffusion X. The dependence on T is the same as for the MSD, only the scaling depending on J's distribution changes from non-linear to linear. The LCF can also be used for finite-or infinite-variance models which are "anomalous" in some sense. A basic example is fractional Lévy stable motion L H α [46]. It is stable and self-similar which implies that for some constant C θ , which depends on the chosen normalisation. This formula agrees with the intuition that a measure of the spread in this case should behave like a power law. Somewhat surprisingly, the situation is different for continuous time random walks with power-law waiting times, which are used to model subdiffusion. Such processes after rescaling converge to subordinated Brownian motion B(S α (t)), for which the LCF can be calculated directly, using the well-known properties of the inverse α-stable subordinator S α [47], where E α is the Mittag-Leffler function [48]. This function approaches infinity like a logarithm; the exact asymptotic is shown in Eq. (B3). The difference between these two models of anomalous diffusion is that L H α is self-similar, so its PDF spreads in the uniform manner, whereas for B(S α ) the bulk is much more constrained than the tails.
After this brief discussion about the general properties of the codifference and related notions, we will study its behaviour in more detail for models based on random parameters of motion and for models based on random and time-varying diffusion coefficient. The next section (B) provides a general physical overview and concrete examples useful for the modelling. The third and the last section (C) is dedicated to presenting mathematical results and calculation techniques. The paper is written such that, if the reader prefers, the physical and mathematical sections B and C can be read independently.

B.1. Gaussian diffusion governed by random parameters
One of the core concepts behind ergodicity and ergodicity breaking is the idea of looking at information contained in a single trajectory. We speak about ergodicity if the data that can possibly be gained analysing one, sufficiently long, series of observations, is the same as if one analyses all possible trajectories in the ensemble [49]. Conversely, if this amount of information is smaller, we speak about ergodicity breaking. In other words, there is some information contained in a given trajectory, and using only a single trajectory we omit the amount contained in the rest. This is sometimes also rephrased as confinement in the phase space, but this language must be used carefully as the said space has a subtle structure.
From a different perspective, modelling based on the information content often leads to an intuitive description, because the differences between trajectories often stem from differences between diffusing particles and differences between their local surroundings. Both may occur, e.g., in biological systems. The latter case requires the additional assumption that the inhomogeneity present in the surroundings varies on a length scale of the mean distance between trajectories, but does not vary much at the scale of the Even for classical Brownian motion it is the infinitely dimensional Wiener space [50] trajectories themselves. That is, distinct trajectories have distinct surroundings, but each particle is sufficiently localised so that the state of the medium around it does note change significantly. This is reasonable for example when the particles are trapped or the measurement time is sufficiently short-compare, e.g., the absolute spread of the traced particles in [33].
In any case, this information can be parametrised, which leads to the so-called hierarchical or multilevel modelling [51], which in the context of physics is also called "superstatistics" (a short term for "superposition of statistics") [52]. Deterministic parameters of the basic model become random on an additional statistical layer.
B.1.1. Random diffusion coefficient. For diffusion the simplest example of an hierarchical model is the motion with a random diffusion coefficient, the situation when different trajectories depict movements with varying average mobilities. A typical model of such observations is the grey Brownian motion [53,54,55] Here B H is fractional Brownian motion [56] and the diffusion coefficient D β is an independent random variable with the so-called β M-Wright distribution [57]. The moments of grey Brownian motion are the same as those of fractional Brownian motion up to a multiplicative constant, therefore the MSD still grows as t 2H and the process models anomalous diffusion. Nevertheless, a straightforward calculation yields that the LCF can be expressed using the Mittag-Leffler function, which also yields Here the asymptotic '+o(1)' is pointwise, which is stronger than the asymptotic proportionality '∼'; in the sense of '∼' the term 4H/θ 2 ln t is dominating and the logarithmic behaviour clearly distinguishes the LCF from the power-law MSD at long times. This crossover behaviour can be used to distinguish grey Brownian motion from fractional Brownian motion (case β = 1 [53]) and diffusing-diffusivity model (Eqs. (B27) and (B28)). The very slow log increase of the LCF is not surprising: because the diffusion constant is random, but fixed and it constrains the relaxation of the probability density-it is detected by the LCF, but ignored by the MSD; for a more general result see Proposition 7 d).
Grey Brownian motion models free, unconfined movements and is therefore not stationary. Still, the codifference can be used for its increments ∆B 2H,β (t) := B 2H,β (t + ∆t) − B 2H,β (t). The calculation is again not hard and yields The covariance decays to zero like a power law t 2H−1 , but the function above decays to the non-zero constant This means that there is some degree of dependence left even at t = ∞ which the covariance does not detect, but the codifference does. Indeed, it can be interpreted as a joint dependency on the trajectory-wise fixed but random diffusion coefficient D β . The above simple example shows that the codifference does not directly detect nonergodicity, it rather detects dependence. The notion of mixing is useful to describe this idea. It is a property which states that the future evolution of the process after a long delay becomes independent of its past values. Formally speaking, the process is mixing when, if we calculate some statistic in some finite time interval starting at s, and later on any other statistic starting at s + t, these two must become independent as t → ∞ [58]. Therefore, analysing the codifference, which measures the dependence between exp(−iθX s ) and exp(iθX s+t ), allows one to exclude mixing, i.e. to indicate the presence of a non-vanishing dependence. The latter means that the motion is constrained in phase space, which in turn implies ergodicity breaking. ¶ Thus, for a very large class of systems one does not need to study time-averages to detect non-ergodicity. It is sufficient to find a proper memory function which will indicate non-mixing. As we demonstrate the covariance fails in this role for the considered models, but the codifference works.
These detecting capabilities of the codifference work under quite general circumstances. If we observe any ensemble of mixing, zero mean Gaussian trajectories, the covariance will converge to zero. This happens because for Gaussian process, mixing is equivalent to a decay of the covariance [58,59], and the mixture of decaying covariance functions is decaying. But, the ensemble of trajectories as a whole will not be ergodic, which will not be detected by the covariance. Let C is some parametrisation of this mixture, then the conditional average D = E[X 2 t |C] be the resulting, possibly random, conditional variance. We call it D because if the data X corresponds to the velocity or increments of displacements, it will be proportional to the diffusion coefficient. Under these assumptions the codifference converges to the constant as proven in Proposition 5. This quantity is related to the coefficient of variation defined as the standard deviation divided by the mean [23]. Denoting it by CV[X], the formula above can be expressed as θ −2 ln(CV[exp(−θ 2 D/2)] 2 +1) which is an increasing function ¶ The remaining class of processes which are ergodic but non-mixing is complicated and those do not seem to appear in applications. For a mathematically constructed example of such a process and the discussion see [59].
of CV[exp(−θ 2 D/2)] and asymptotically quadratic for small CV. The coefficient of variation is a measure of dispersion, hence so is τ θ X (∞) which reflects the randomness of D. This behaviour is also equivalent to detecting a residual dependence and the resulting non-mixing/non-ergodicity.
Outside of the useful limit t = ∞ not much can be said about the properties of the codifference in such a wide and general class. The situation changes if we consider a more specific model. The idea behind grey Brownian motion and many works about superstatistics [52] is that the trajectories differ mainly by the diffusion coefficient, other properties are not significantly distinct. A simple model of such a system can be written as We assume that the process Y describes the joint form of dependence common for all trajectories. We consider a Gaussian Y , which for grey Brownian motion would be fractional Brownian motion. Another reasonable choice would be, e.g., a solution of the Langevin equation. In this case, as long as Y is stationary (i.e., for free diffusion we consider increments or the velocity process), the covariance is of course as long as E[D] < ∞. If the process Y has sufficiently long memory, r Y (t) ≈ 0 in the considered time scale, also r X (t) ≈ 0. The covariance does not detect the additional dependence introduced by random D. At the same time the codifference can be expressed as a function of the covariance of Y , precisely as for any D, no matter if E[D] < ∞. It clearly converges to the constant (B6) as r Y (t) → 0 and detects the additional non-linear dependence.
For a general, possibly non- Given some model of D these formulae can be made completely explicit, examples are given in Table 1. The first example is the gamma distribution D d = G(α, β) in which the coefficient α describes the power-law behaviour of the PDF near 0 and β is the rate of exponential decay of the tails (the specific case G(1, β) is the exponential distribution); it models common types of experiments in which the distribution of diffusion coefficients resembles a bump concentrated around some finite constant and high values of D become exponentially less probable. This case is also illustrated in Figure 1.
Diffusion coefficients with a heavy-tailed distribution result in a motion that itself exhibits heavy tails of the PDF, a phenomenon actively investigated in transport, finance, turbulence and many other systems [6,60,61]. A classical model of this case , as given by Table 1. Various properties of the codifference are visible: for θ → 0 it converges to the covariance; the codifference and the covariance increase and decay in the same intervals; at t = 0 the codifference is smaller than the covariance; as t → ∞ the codifference converges to a θ-dependent value τ θ X (∞) which is a functional of the law of D; the type of asymptotic of r X (t) and τ θ X (t) − τ θ X (∞) is the same (here: exponential decay). The derivations are presented in Proposition 6.
is the one-sided α-stable subordinator S(α, c), determined by its Laplace transform exp(−(cs) α ). The resulting type of process was thoroughly studied in the literature concerned with stable distributions [37]. This process is called sub-Gaussian, which is arguably a confusing term. In this case the process X has no second moment, therefore attempts to estimate its covariance will lead to a diverging result. This is visible in the formulae for the codifference and the LCF, which diverge as θ → 0. But, for any θ > 0 the codifference and the LCF are finite and can be estimated in a standard way, and from the result if one wishes the covariance and the MSD of Y can be reconstructed.
For a distribution concentrated around its mean value one can use Gaussian N (µ, σ 2 ) or uniform U(a, b) distributions, however the former is only a valid model for σ µ, when the probability that D < 0 can be neglected. Even if the precise model of D is not known, quite a lot can be said about the behaviour of the codifference. In Proposition 6 we show that a) The codifference is a monotonic function of the covariance. If one increases, the second one also increases, the same goes for decreases.

b) If E[D]
< ∞ the codifference is smaller than the covariance for strong positive correlation, but larger for weak or negative correlations.
c) The approach to the value τ θ X (∞) has the same asymptotic as the decay of the Table 1. Formulae for the codifference and the LCF corresponding to common models of D: gamma, one-sided stable, Gaussian and uniform. covariance These are all desirable properties: the memory structure of the internal process Y is reflected in a straightforward manner by the codifference. For small values of the covariance their relation is even linear, as stated in c), and the proportionality constant is finite for any distribution of D, due to the truncating factor exp(−θ 2 D).
Another property is that the codifference depends additively on D. Precisely speaking, if we decompose D = D + D for some independent D and D , the codifference also decomposes for where X and X are processes with diffusion coefficients D and D respectively. Therefore subtracting the codifferences estimated from different samples may be used to analyse different sources of diffusivity. The derivation is given in Proposition 6. Analogous features can also be checked for the LCF (Proposition 7), which can also be decomposed for D = D + D and is a monotonic function of the MSD, but is always smaller than the MSD, therefore detecting the additional constraints of the motion introduced by a random D.
At the end of the discussion about random diffusion coefficients we note that the behaviour of the codifference near t = 0 can also give valuable information. In Proposition 8 we prove that for a typical case when E[D] < ∞ its asymptotic reflects that of the covariance. However, if E[D] = ∞ and D has power tails, corresponding to the presence of high-volatility trajectories, the asymptotic of the codifference has an additional power law. As for Gaussian processes the behaviour of the covariance near t = 0 is determined by their fractal dimension [62, chapter 8.8], the same is true for the codifference, which can be applied also for processes with no moments.
B.1.2. Random memory decay rate. Another interesting type of models are ensembles of particles for which the time dependence may vary from trajectory to trajectory. The simplest model of a time-varying dependency is the exponential decay exp(−tΛ), which is the covariance of Ornstein-Uhlenbeck process [63]. It models many kinds of linear relaxation disturbed by additive noise. It was also studied as a model of the additive measurement noise itself [64,65]. In the hierarchical model the decay rate Λ may be random. The covariance of the resulting mixture of Ornstein-Uhlenbeck type trajectories was studied in [66] in the context of a randomly parametrised Langevin equation.
The coefficient Λ has a different physical interpretation depending on the details of the studied phenomenon. For the velocity of a Brownian particle it is proportional to the friction coefficient and its randomness is related to local changes of the viscosity and/or different shapes of the diffusing particles [67]; in this system the fluctuationdissipation relation also links the scaling to the temperature. For trapped particles Λ is proportional to the stiffness of the confining harmonic potential (the prominent example being optical tweezers [21,68]), therefore the randomness of Λ is equivalent to an ensemble of traps with varying sizes, which are proportional to Λ −1 .
Another case worth mentioning is that of viscoelastic anomalous diffusion [69], for which the velocity (or increments) have power-law dependence ∝ t 2H−1 . This function can be expressed as exp(− ln(t)(1 − 2H)). Therefore it is enough to replace t with ln t and the results further on will also follow for the ensemble of power-law memory trajectories characterised by random parameter (1 − 2H). It is worth to note that the variability of the of the Hurst index H seems to be more of a rule than an exception for biological systems [70,71,72].
We do not want to make the discussion overly technical, so below we will analyse only the case of deterministic scaling and random decay rate, r X (t|C) = σ 2 exp(−tΛ). Results for more general Df (Λ) exp(−tΛ) are presented in Propositions 10, 11 and 12, which prove that the randomness of the scaling is not essential for most of the properties discussed below. We also note that sometimes one can remove the random scaling and normalise the trajectories using the estimate of scaling obtained from the Birkhoff ergodic theorem [58], However, this procedure requires having access to sufficiently long trajectories. A particular property of ensembles with fixed scaling is that any marginal distribution is Gaussian, i.e., all variables X t have Gaussian distribution with variance σ 2 . But the codifference can be found to be and because it does not equal the covariance, the process as a whole is not Gaussian. The codifference indicates the presence of subtle non-Gaussianity of the memory structure.
This formula can also be used to derive useful bounds between the codifference and the covariance, see Proposition 9. Expanding in a Taylor series the exponent from (B14) leads to Note that σ 2 E[e −ktΛ ] = r X (kt), so the result is a type of average over the values r X (kt).
When the distribution of Λ is not sufficiently concentrated near 0 and the covariance decays fast (strictly speaking is rapidly varying [73,74]), the term k = 1 dominates the t → ∞ asymptotic. This is the case, e.g., for the one-sided stable variable Λ that is, we observe a stretched exponential type of dependence. When Λ is more concentrated around 0 the situation differs. A basic example would again be the gamma distribution When α = 1 (i.e., Λ has an exponential distribution) the above can also be written using the incomplete gamma function. For any α all terms in the sum decay like t −α and they are comparable. Because of this, the codifference also decays with the same power law, but the proportionality constant is non-trivial , It is not surprising that this behaviour is not specific to a gamma distribution and can be observed for any Λ with power-law PDF near 0 + , see Proposition 10. Similarly, if the PDF of Λ decays fast near 0 + , the codifference also decays fast. All these properties are analogous to those of the covariance [66], so here they can be used interchangeably or simultaneously, as a mean to obtain stronger statistical verification.
They are also similar in that both do not detect the non-ergodicity, more precisely the non-mixing, of this system. As was already demonstrated for the covariance it is a common occurrence resulting from its linearity. The codifference fails, because it does measure only a reduced form of mixing. For the process to be mixing it means that any two sets of multiple disjoint measurements must become asymptotically independent, i.e., the vectors [X s 1 , X s 2 , . . . , X sn ] and [X s 1 +t , X s 2 +t , . . . , X sn+t ] have to become independent as t → ∞. The codifference (and for that matter also the covariance) measures only the dependence between two values X s and X s+t .
For a process with a random decay rate these are asymptotically independent and the one-point distributions are relaxing. Therefore, in order to detect non-ergodicity, we need to analyse the dependence between at least three values. A practical choice is to use four values divided into two pairs [X s , X s+∆t ] and [X s+t , X s+∆t+t ]. The values in the first pair are correlated as e −∆tΛ trajectory-wise, analogously for the values of the second pair. This property of both pairs is fixed and random, i.e., it is a constant of motion which can be detected. Probably the simplest method to achieve this is to calculate increments and study the codifference of those. A short calculation given in Proposition 11 shows that this method indeed works and The result depends on Λ in a complex manner, but it can be easily estimated numerically. We can also use the fact that for small ∆t the conditional covariance of increments is and normalise the process, ∆ X t := ∆X t / √ ∆t. The result then simplifies and becomes independent of ∆t, We stress here that this method cannot be applied using the covariance, which, calculated from increments, decays to 0 and does not detect this specific memory structure. Its decay is even quicker than for the original process and proportional to the power law decay t −α−2 [66]. Intuitively speaking, the decay rate is quicker by a factor t −2 , because the scale of ∆X depends on Λ as Λ 2 and the trajectories with stronger correlation have smaller amplitude and add less to the average. This property has its analogy for the codifference, for which τ θ ∆X (t) − τ θ ∆X (∞) also decays like t −α−2 (see Proposition 12 for a more general result). This time the faster decay rate actually helps in detecting ergodicity breaking, making the limit τ θ ∆X (∞) visible even at short times. The numerical illustration of the discussed behaviour is shown in Figure 2.

B.2. Diffusing-diffusivity
In the preceding sections we considered models which were non-Gaussian and nonergodic. For non-Gaussian but ergodic models the codifference can also be a useful measure of dependence. In particular we show that it can be successfully used to analyse diffusing-diffusivity models. We now assume that the increments of X t are Brownian fluctuations, but rescaled by a time-dependent random diffusivity D t , (B23) In the presented domain the covariance r ∆X was negative, so we plotted the negated value. One can observe the predicted power law decays ∝ t −3/2 and ∝ t −3/2−2 (Eq. (C62)); the codifference of increments detects the non-ergodicity by converging to a constant τ 1.5 ∆X (∞) ≈ 0.105, which fits perfectly Eq. (B20). The value θ = 1.5 was chosen to best illustrate the interesting properties; for smaller θ the codifference τ θ X becomes closer to the covariance r X , for larger θ the codifference τ θ ∆X converges faster to the t → ∞ limit. To present smooth curves in the whole presented range we used a large 10 7 sample; the general shape of the presented functions is already visible for samples around 10 4 ; a significant difference between r ∆X and τ ∆X is observed using even a few hundred trajectories. Examples for smaller sample sizes are presented in figure A1. This is a generalisation of the random parameter model, for which D t = const. Because we modified the dynamical equation by replacing the previously constant parameter with a stochastic process, models of this class are sometimes called "doubly stochastic" [75]. Before application in physics, they were extensively used in financial engineering, where it is natural to assume that parameters of the market, such as the volatility, vary in time. In 1985 Cox, Ingersoll and Ross [76] proposed a model of interest rate (now commonly named CIR), which describes a non-negative stochastic process with linear mean-reverting property. In 2012 Chubynsky and Slater independently proposed a special case of the CIR process as a model of non-Gaussian diffusion [19,77]. This led the way to a wider range of models based on fluctuating diffusivity coefficient with a short time memory [78,79,80,81,82]. The evolution of the diffusion coefficient in the CIR model is defined by the stochastic equation where a > 0 describes the speed of return to the mean b > 0, and σ > 0 regulates the amplitude of the fluctuations. In this equation as D t → 0 the term a(b − D t )dt ≈ abdt > 0 starts to dominate the fluctuations with the mean-squared amplitude E[( √ D t dB t ) 2 ] = D t dt, consequently dD t > 0 which causes the motion to stay positive. We assume that the system evolved for a long time before the start of the measurement and has reached the stationary gamma distribution D 0 d = G(2ab/σ 2 , 2a/σ 2 ) [83]. Because of the non-Gaussianity the LCF function should differ from the MSD. Conditioning by D t , it can be expressed by the formula Expanding the above in powers of θ 2 shows that again ζ θ X (t) → δ 2 X (t) as θ → 0. The average in (B25) appears in the calculation of the expected price of zero-coupon bond and was calculated in the initial paper of Cox, Ingersoll and Ross [76], who derived the differential equation which it fulfils and then solved it; a more general result is also available in [83]. The calculation was performed for the case when D 0 is fixed and deterministic, however their result can be easily extended for stationary D by averaging over the equilibrium G(2ab/σ 2 , 2a/σ 2 ) distribution of D 0 . Then the formula for the LCF reads (B26) with γ θ = a 2 + (θσ) 2 . From that a brief calculation proves that the motion is Fickian for long times and also for short time, albeit with a diffusion scale agreeing with the MSD which should come as no surprise. For an illustration of these formulae see Figure  3, where we present results of Monte Carlo simulations compared to the theoretical predictions. See also the crossover behaviour of the MSD in the random diffusivity model in [81].
If we want to analyse the codifference of the CIR model, it would be required to study the memory of the velocity V t = √ D t dB t /dt. But the white noise dB t /dt is not well-defined in a classical sense. It can be interpreted as a distribution which leads to a similar redefinition of the covariance, the familiar Dirac delta. The codifference is, however, non-linear and this approach fails. The solution is to consider only the welldefined velocity processes V t = √ D t Y t with Y t being some classical process which models the velocity as being undisturbed by the fluctuations of the diffusivity. The behaviour of the white noise can be studied if we consider t large enough such that r Y (t) = 0 strictly or approximately. It is natural to assume that Y t is Gaussian, while choosing the model of D t is more subtle. It is clearly observed that the MSD exhibits a single linear law whereas the LCF switches between two linear laws at t → 0 + and t → ∞. Also note that for large θ and t the estimation becomes unstable. It is caused by E[cos(θX t )] becoming comparable in amplitude to the estimation uncertainty; for this reason one should be careful using the codifference and the LCF in the range θX t 1.
The CIR process for ab ∈ N, can be proved to be a sum of squared independent Ornstein-Uhlenbeck processes, which follows directly from writing the stochastic differential equation of such a sum [83]. Thus, a natural generalisation is to consider D t being a square of a Gaussian process [80,81]. We will assume that the velocity can be decomposed as where both Z t and Y t are Gaussian with variance one. In this model we have ample freedom in describing a wide range of memory types, because any covariance r Z and r Y can be used. By choosing r Y we model the internal dynamics, if r Y (t) = 0 in the considered time scale we arrive back at (B23); by choosing r Z we model the memory structure of D t : exponential, power law, oscillating, etc. The one-dimensional distributions are more rigged, as we limit ourselves to D t having the PDF of a square Gaussian, that is χ 2 1 distribution (a special case of the gamma distribution). A rather technical derivation (Proposition 13) then shows that the exact form of the codifference This formula looks complicated, but is composed only of elementary functions. It is illustrated in figure 4, were we plotted the codifference τ θ V as a function of r Z and r Y for four different θs. Having calculated the codifference for at least two θs, one can solve the system of equations resulting from (B30) and calculate r Z , r Y . This procedure may be considered simpler than using the covariance r Z , which requires calculating the average of |Z s Z s+t | given by a hard-to-evaluate integral. The covariance r V can also be obtained from taking the limit θ → 0 of the codifference.
More importantly, when r Y (t) = 0 the codifference is clearly non-zero, so it detects the dependence introduced by D t = Z 2 t . Its asymptotic for small r Z (t) (e.g., at long times) in this case is the simple relation Thus the codifference detects the memory structure of the time-varying diffusion coefficient D t = Z 2 t even in the regime r Y (t) = 0 in which the covariance r V (t) is zero and does not contain any important information. This is also true when r Z (t) = 0 but r Y (t) = 0, this time the codifference is asymptotically proportional to r Y (t); the proportionality constant depends only on the one-dimensional distributions of D, the exact form of the dynamics does not matter, see Proposition 14.
For some systems different models of D t may be more suitable. When D t is strongly concentrated around its mean value a possible choice is a simple Gaussian centred around some b, V t = (σZ t + b)Y t . This model permits the unphysical situation when D t < 0, but when σ b the probability of this event is negligible. In this case an elementary formula for the codifference also can be given (see (C78)) and again even for r Y (t) = 0 the internal dependence of D t is still detected, this time with asymptotic (B32)

B.3. Discussion
The aim of this work was to provide the theoretical background for using the codifference as a dependence measure suited for the study of various non-Gaussian and ergodicity breaking models. This goal was achieved in few steps. First we proved that the codifference has intuitive properties that one would expect from a reasonable memory function, such as additivity, positivity for the case of complete dependence and being null for the case of independence. Second, we showed that it can be calculated using fairly straightforward methods for typical random parameters and diffusing-diffusivity models, which represent a significant extension of the previously established results for stable and infinitely divisible processes. Finally, we analysed how the codifference detects forms of dependence and ergodicity breaking which cannot be easily studied using solely covariance-based methods. We also showed one example of non-detected ergodicity breaking, the case of a Langevin equation with a random return rate. In this case we offer an easy fix: the codifference works well for the increments of this process. We note that within this paper we did not analyse ergodicity breaking caused by ageing. In principle, the codifference should work, but the analytical analysis will be challenging for many of these phenomena.
In addition to the codifference, we also discussed a related quantity, the logarithm of the characteristic function (LCF), which was interpreted as a measure of dispersion. Our contribution is an extension of the Fourier methods and a distinct view based on ideas previously developed only for heavy tailed α-stable distributions. The codifference is also very closely related to the theory of the dynamical functional, which was already successfully used for real data, and should be considered a part of the same framework.
The cost of using this technique is that linearity is a powerful analytical tool, especially for complicated models, and a significant part of this strength is lost when using the codifference. The more complicated defining formula also may make its form more complicated (e.g., see Table 1). However, it is a clear application of the characteristic function which does not seem to be commonly acknowledged and the Fourier-based techniques by themselves are widely used by the scientific community. Thus, it has an advantage, offering a wide choice of established analytical methods and estimation techniques. In some cases (e.g., (B30)) the codifference has a simpler form than the covariance.
We believe that the most important example that was considered was also the simplest: deterministic motion with its scale (diffusion coefficient) varying from trajectory to trajectory. The observed asymptotical behaviour of the codifference contains a lot of useful information and lays the foundation for possible future applications in more complex and realistic models, some of which we discussed. At the same time we stress that even this initial, highly simplified model is being commonly used, especially in biophysical systems.
We are confident that the obtained results are interesting in their own right, but we also promote their additional value by indicating the limitations of the methodology based on the MSD and the covariance. Both are, without a doubt, essential parts of the scientific language related to diffusion and complex phenomena, but their limitations are becoming more and more evident, as contemporary research starts to concentrate around non-Gaussian systems with complicated memory structure; the change is stimulated by increasing experimental evidence. These complex and non-linear phenomena require new complex and non-linear methods.

C.1. Basic definitions and properties
All processes considered in this work can be labelled as "conditionally Gaussian". In practical applications these processes are Gaussian locally, in the temporal or spatial sense. The formal definition is more general. Definition 1. We call a process conditionally Gaussian when any of its finitedimensional distributions is a Gaussian distribution under some conditioning by σalgebra C. That is, any finite dimensional distribution X := [X t 1 , . . . , X tn ] can be written as where A and µ are a C-measurable n × n random matrix and an n-dimensional random vector. Both may depend on t 1 , . . . , t n . The vector Y is i.i.d N (0, 1) and is independent of A and µ. If µ = 0 for any t 1 , . . . , t n we call a process conditionally centred Gaussian. Further on we will consider only this class. Similarly, we call a process conditionally stationary Gaussian, if the distribution of A and µ does not depend on time translation t 1 , . . . , t n → t 1 + t, . . . , t n + t. Proposition 1. The distribution of a conditionally Gaussian process is completely determined by the knowledge of C, the conditional mean and the conditional covariance The process is conditionally centred if and only if µ X (t|C) = 0. The process is conditionally stationary if and only if µ X (t|C) = const. and r X (s, t|C) is a function of t − s, denoted r X (t − s|C).
Proof. This is a direct consequence of the equality The conditional probability on the right is a Gaussian integral and a function of µ X (t|C) and r X (s, t|C). The representation of conditionally centred and stationary processes are just a reflection of the analogical representations for Gaussian processes.
which is very useful for calculations. For non-centred process the additional term (C7) appears. Here all averages are finite, but they can generally be complex values, moreover in particular cases the averages in the denominator can be 0. This strongly suggests the codifference should be used carefully in this case (the same applies to the LCF).
Additionally, representation (C6) yields another desirable property of the codifference: Proposition 3. For a conditionally centred Gaussian process with positive covariance r X (s, t|C) the codifference τ θ X (s, t) is also positive, a negative conditional covariance implies negative codifference.
If the support of r X (s, t|C) is on both positive and negative half-axes, the sign of the codifference may vary, but it is worth noting that with r X (t, t|C) and r X (s, s|C) fixed, it depends monotonically on r X (s, t|C), so if the conditional covariance is smaller in the sense of stochastic dominance, the codifference will also be smaller. Now, a simple fact follows only from the expansion ln(x) ∈ x − 1 + o(x) as x → 1.

Proposition 4. For any stationary process X with asymptotically independent values
Proof. We assume that X s+t and X s are asymptotically independent as t → ∞ (note that this property is not sufficient to imply that X is mixing). Therefore and the ratio of expected values under the logarithm converges to 1 so we can use the expansion ln(x) ≈ x − 1.
This simple fact is a prototype for the later results, which describe cases when it is possible to remove the non-linear logarithmic function if the process can be somehow decomposed as a transformation of some weakly dependent variables.
If the process X does not have asymptotically independent values the non-linearity cannot be removed at t → ∞, but if it is an ensemble of such processes (i.e., the conditioned process is mixing), it can be shown that the codifference converges to a positive constant, non-linearly dependent on the law of D.

C.2. Random parameter models
Proposition 5. If the process X is an ensemble of mixing stationary centred Gaussian processes, then, denoting and equal 0 only for deterministic D.
Proof. The calculation is simple. Because r X (t|C) ≤ D almost surely the random variable e θ 2 (r X (t|C)−D) is positive and bounded by 1 for every t. We can commute the limit with the logarithm and the averaging, getting The non-negativity of the above stems from Jensen's inequality applied to the function x → x 2 and the variable e −θ 2 D/2 .

Remark.
A similar calculation repeated for symmetrised codifference (A8) shows that it does not exhibit this behaviour. Under the same assumptions i.e., it cannot detect this form of residual dependence and ergodicity breaking.
Proposition 6. Let the process X have the form where Y is a stationary Gaussian process, E[Y 2 t ] = 1, and D > 0 is a random variable independent of Y . Then the codifference has the form (C14) a) It is additive with respect to D, that is if D = D + D for independent D and D , then b) It is an increasing function of the covariance r Y (t), which is smaller than r X (t) for r Y (t) close to 1 and larger than r X (t) when the latter is close to 0. If E[D] < ∞ the difference τ θ X (t) − r X (t) decreases as a function of r Y (t). c) For any mixing Y the difference τ θ X (t) − τ θ X (∞) exhibits the same type of asymptotic as the covariance r Y (t), that is Proof. Let us start from writing the conditional covariance, which implies that If we substitute D = D + D both numerator and denominator factorise as products of independent random variables. The formula follows.
In point b) the monotonic dependence is a consequence of the fact that only the numerator of the fraction in (C14) depends on r Y (t). It is a Laplace transform of the variable D calculated at the point θ 2 (1 − r Y (t)), it decreases as the argument increases, so it is an increasing function of r Y (t). This dependence is continuous. When r X (t) = 0, e.g., always for t = 0 formula (C14) simplifies and we can apply Jensen's inequality, For r Y (t) close to 0 we can use Proposition 5 to determine, that the codifference is positive. For the last property listed in b), let us write the difference τ θ X (t) − r θ X (t) as a function of r = r X θ(t), Using the majorised convergence theorem, the derivative of the numerator exists and determines the sign of f . Denoting F r : where we used the fact that E[F r ] = 0 and x(e −x − 1) ≤ 0. For c) consider τ X (t) − τ X (∞) and use the expansion (C23) Now we can rearrange the right side of the above equation and get The analogues of a) and b) also hold for the LCF, the derivation is very similar as in Proposition 6 so we only state the result. Proposition 7. Let the process X have the form where Y is a centred Gaussian process and D > 0 is a random variable independent of Y .
Then the LCF has the form and: c) It is an increasing function of the MSD δ 2 is non-negative and increases as δ 2 X (t) increases.
The asymptotic of the codifference near zero depends on the tail behaviour of p D and can be used to study it. This statement is clarified by the following result.
Proposition 8. If the stationary Gaussian process Y is mean-square continuous and for some slowly varying function L, then Proof. For a mean-square continuous Y the covariance r Y is a continuous function. The codifference is also continuous and ln(x) ≈ x − 1 implies that The derivation for ζ θ X is similar. For point b) we write the asymptotic of τ θ and simplify the ratio under investigation Now, let us move our attention from a random D to the class of processes, for which the shape of the covariance function varies from trajectory to trajectory: Proposition 9. For a mixture of stationary Gaussian processes with fixed non-random scale D = σ 2 τ θ X (t) = 1 θ 2 ln E e θ 2 r X (t|C) . (C37) The above formula also implies that Proof. Assumption of a fixed variance means that E[X(t) 2 |C] = σ 2 for some deterministic σ 2 . Using the conditional expectancy it follows that Now the left inequality is just Jensen's inequality applied to the function ln. The right inequality follows from two approximations: the first is ln For the exponentially decaying conditional covariance stronger results are available: Proposition 10. For a mixture of stationary centred Gaussian processes with conditional covariance r X (t|Λ, D) = De −tΛ , with Λ and D independent, we observe the following asymptotic properties.
a) Power law behaviour: if p Λ (λ) ∼ L(λ)λ α−1 , λ → 0 + for slowly varying L, then where the constant C α,θ is where X is a solution of the Langevin equation with viscosity Λ and the same D.
Proof. For a) first we apply the expansion ln( (C44) Therefore Note that the sum within consists of positive terms, so the commutation of expectation and sum is justified. Now, knowing the asymptotic p Λ (λ) ∼ λ α−1 , λ → 0 + we can apply the Tauberian theorem The sum (C45) consists of positive terms, so let us study its asymptotic where the commutation of taking the limit and the sum is justified by the inequality The right term is convergent with respect to t, therefore it is bounded, so the left term is uniformly bounded with respect to k and we can use the dominated convergence theorem.
Note that the resulting sum is also bounded with respect to α, This concludes the derivation of a). Now let us prove b). We fix integer N > 0 and then make the estimation The limit follows because it is a convergent sum of positive terms. In order to prove the last point c) notice that e −tλ 0 < 1 and e tλ 0 > 1 so x → x e −tλ 0 is a concave function and e −θ 2 De tλ 0 ≤ e −θ 2 D . Therefore In the next proposition we will study the properties of the increment process and use it to detect non-ergodicity.
Proposition 11. Considering the same process as in Proposition 10, the codifference of its increments ∆X t converges to a constant which equals 0 only when both D and Λ are deterministic. After suitable rescaling ∆ X t := ∆X t / √ ∆t the limit becomes independent of ∆t, Proof. The reasoning is similar to the one shown in the proof of Proposition 6 b). The increment process ∆X t is a stationary process, which is conditionally Gaussian. We can calculate its conditional variance and the variance of the difference The limit of the codifference is Applying Jensen's inequality to the variable e 2θ 2 De −∆tΛ and the function x → x 2 yields the inequality. For the rescaled process it is straightforward to calculate that The last considered class of covariance functions is Df (Λ) exp(−tΛ). The increment process from Proposition 11 fits this class with f (Λ) = 1 − exp(−∆tΛ), higher order increments and other similar transformations correspond to more complex f , but their behaviour at 0 + can be easily traced. Note that the proposition below is not a straightforward generalisation of Proposition 10. The statements and methods of the derivation below are similar, but the assumptions do not coincide, because the introduction of the scaling f (Λ) with a power law at 0 was made at the cost of adding the strong requirement about the fast decay of tails of D, E[exp(θ 2 D)] < ∞: Proposition 12. Let us consider the stationary, conditionally Gaussian process characterised by the conditional covariance r X (t|Λ, D) = Df (Λ)e −tΛ . (C60) Now, let us assume that D and Λ are independent, E e θ 2 D < ∞ and the PDF of Λ has the form p Λ (λ) ∼ L(λ)λ α−1 , f (λ) ∼ λ γ , λ → 0 + ; α, γ > 0, for slowly varying function L. Then, for this class of processes Proof. We start from the formula which has the asymptotic We thus need to study the tail behaviour of We will analyse it using a bottom-up approach and start from considering the long time Remark. Propositions 10 and 12 above can be generalised by replacing t by g(t) in the formula for the covariance, the only requirement is that g(t) → ∞ as t → ∞. This allows one to consider some more general types of the dependence, e.g., the power-law t −2H corresponds to Λ = 2H and g(t) = ln(t).

C.3. Diffusing diffusivity
Proposition 13. Let us assume that Y and Z are centred stationary Gaussian processes. Without loss of generality we assume E[Y 2 t ] = E[Z 2 t ] = 1. Let X be given by a) with deterministic σ, d > 0. Then the codifference of X is given by elementary formulae, as given at the end of corresponding derivations in Eqs. (C78) and (C82).
For b) the denominator is simple and yields (1 + (θσ) 2 ) −1 . The numerator can be expressed as E e iθσ(|Z s+t |Y s+t −|Zs|Ys ) = E exp − (θσ) 2 2 Using the formula for the two-dimensional density of [Z s+t , Z s ], the term under the logarithm in the formula for the codifference can be expressed as an integral over the function where C := 1 + (θσ) 2 2π 1 − r Z (t) 2 , s := 1 2 (θσ) 2 + 1 1 − r Z (t) 2 , The integration over R 2 of (C80) can be changed to an integration over R 2 + I = 2C Now, the codifference is τ θ X (t) = θ −2 ln I.
When r Y (t) = 0 the above formulae simplify significantly and simple asymptotic can be derived by direct computation, see Eqs. (B31) and (B32). The case r Z (t) = 0 also leads to a simplification and can be considered in a more general setting.
where D has the same distribution as D s or D s+t .  Figure A1. Estimated codifference τ and covariance r for sample sizes (from left to right) 10 4 , 10 3 , and 500.
Proof. We take t large enough so that we can represent the values of X as X s = √ D 1 Y s and X s+t = √ D 2 Y s+t for i.i.d. D 1 and D 2 . Using a conditioning on D 1 , D 2 the codifference can be expressed as Now we consider the numerator in the above, divide it by r Y (t) and, using dominated convergence as in previous propositions, E e −θ 2 (D 1 +D 2 )/2 e θ 2 r Y (t) The result follows.
Appendix A. Sample size dependence of codifference and covariance In supplement to figure 2 we show in figure A1 that even for smaller sample sizes such as 10 4 , 10 3 , and 500 significant differences between the covariance and codifference of increments are visible.