Correlation functions as a tool to study collective behaviour phenomena in biological systems

Tomás S Grigera

doi:10.1088/2632-072X/ac2b06

1. Introduction

Biological systems behave in complex ways. This is particularly evident at the level of whole organisms: animals can adapt to a variety of situations, and this capacity is related to the large number of states (spatiotemporal patterns) that their brains can explore. These states are realised on some physical substrate (neurons in the case of the brain), but it is quite plausible that the same kind of behaviour can be produced by another system, composed of units that interact among themselves in a similar way, but microscopically of a very different nature. For example, robot birds could interact to form flocks that look macroscopically like starling murmurations, or a collection of neurons simulated in software, or built with electronics, could produce the same kind of patterns and behaviour as a brain (this has actually been attempted using simulated neurons wired with the connectome of the nematode Caenorhabditis elegans to drive a robot, see Urbina et al (2020)). The point is that many of the interesting phenomenology of biological systems stems from collective behaviour (Kaneko 2006), and that much of it may arise independently of the details of the biological units making up the system. A first description of collective properties starts of course with computation of average global quantities. But in complex systems there is much information on how fluctuations around those averages are structured in space and time. Correlation functions, the subject of this article, can capture information about these fluctuations, and thus constitute one of the basic statistical physics tools to describe collective properties. To be sure, there is more to collective behaviour than what is captured by the two-point correlation functions we discuss here. But correlation functions should be in the toolbox of any researcher attempting to understand the behaviour of complex biological systems, and it is probably fair to say that, although they have been used and studied for a long time in statistical physics, they have not been exploited enough in biology.

Among systems exhibiting nontrivial collective behaviour, critical systems deserve special mention. At a critical point (i.e. at the border between two static or dynamic phases) of a continuous transition, long-range correlated fluctuations dominate the large-scale phenomenology, so that many critical systems look alike at large scales: this is called universality. Criticality is rare, in the sense that systems are critical only in a negligible region of the space of their control parameters. However, criticality appears to play an important role in biological systems (Bak 1996, Mora and Bialek 2011 and Muñoz 2018) and in particular the brain (Chialvo 2010), with different signs of critical behaviour having been found in systems as diverse as proteins (Mora et al 2010 and Tang et al 2017), membranes (Honerkamp-Smith et al 2009 and Veatch et al 2007), bacterial colonies (Dombrowski et al 2004 and Zhang et al 2010), social amoebas (De Palo et al 2017), networks of neurons (Beggs and Plenz 2003 and Schneidman et al 2006), brain activity (Fraiman and Chialvo 2012 and Haimovici et al 2013), midge swarms (Attanasi et al 2014 and Cavagna et al 2017), starling flocks (Cavagna et al 2010a) and sheep herds (Ginelli et al 2015). Since the odds that the system's parameters are tuned to critical just by chance are vanishing, the question arises of why critical systems appear to be so common in biology. An answer may be that criticality is the simple path to complexity (Chialvo 2018) and thus the reason why a functioning brain, for instance, needs to be 'at the boundary between being nearly dead and being fully epileptic' (Mora and Bialek 2011). Two-point correlation functions are an important tool in the exploration of critical systems. Again, they are not the whole story, but a necessary instrument nevertheless.

The purpose of this contribution is to present in a brief but self-contained form, the definitions of several two-point correlation functions together with a discussion on how to compute them from experimental or simulation data. We aim to give enough practical detail to enable students or researchers with no strong background in statistical physics to start to calculate correlation functions on their own data. At the end we consider a simple neural model, to illustrate how correlations can be studied on neuronal networks, and to argue that a simultaneous study of space and time correlations can yield useful information, and perhaps an alternative view, of their dynamic behaviour.

In the following, we present the theoretical definition of two-point correlation functions (section 2), followed by a discussion on their computation from experimental data (section 3). Then we discuss the definition and computation of the correlation length and time (section 4) and some properties of the correlation functions of near-critical systems (section 5). Finally we present some results on space and time correlations on a neuronal model (section 6).

2. Definitions

2.1. Correlation and covariance

Let x and y be two random variables and p(x, y) their joint probability density. We use ⟨...⟩ to represent the appropriate averages, e.g. the mean of x is $\left\langle x\right\rangle =\int x\enspace p(x)\mathrm{d}x$ (the probability distribution of x can be obtained from the joint probability, p(x) = ∫dy p(x, y), and in this context is called marginal probability) and its variance is ${\mathrm{Var}}_{x}=\left\langle {(x-\left\langle x\right\rangle )}^{2}\right\rangle$ . We define

$\begin{equation}{C}_{xy}=\langle xy\rangle =\int xy\enspace p(x,y)\mathrm{d}x\enspace \mathrm{d}y,\end{equation} \tag{ 1 }$

$\begin{equation}{\text{Cov}}_{x,y}=\left\langle \left(x-\left\langle x\right\rangle \right)\left(y-\left\langle y\right\rangle \right)\right\rangle =\langle xy\rangle -\left\langle x\right\rangle \left\langle y\right\rangle .\end{equation} \tag{ 2 }$

We call C_xy the correlation and Cov_x,y the covariance of x and y. The covariance is bounded by the product of the standard deviations (Priestley 1981, section 2.12),

$\begin{equation}{\text{Cov}}_{x,y}^{2}\leqslant {\mathrm{Var}}_{x}\enspace {\mathrm{Var}}_{y},\end{equation} \tag{ 3 }$

and is related to the variance of the sum:

$\begin{equation}{\mathrm{Var}}_{x+y}={\mathrm{Var}}_{x}+{\mathrm{Var}}_{y}+2\enspace {\text{Cov}}_{x,y}.\end{equation} \tag{ 4 }$

The Pearson correlation coefficient is defined as

$\begin{equation}{r}_{x,y}=\frac{{\text{Cov}}_{x,y}}{\sqrt{{\mathrm{Var}}_{x}\enspace {\mathrm{Var}}_{y}}},\end{equation} \tag{ 5 }$

and, as a consequence of (3), is bounded: −1 ⩽ r_x,y ⩽ 1. It can be shown that the equality holds only when the relationship between x and y is linear (Priestley 1981, section 2.12).

The variables are said to be uncorrelated if their covariance is null:

$\begin{equation}{\text{Cov}}_{x,y}=0\Longleftrightarrow\left\langle xy\right\rangle =\left\langle x\right\rangle \left\langle y\right\rangle \quad (\text{uncorrelated}).\end{equation} \tag{ 6 }$

Absence of correlation is weaker than independence: independence means that p(x, y) = p(x)p(y) (and clearly implies absence of correlation). For uncorrelated variables it holds, because of (4), that the variance of the sum is the sum of the variances, but Cov_x,y = 0 is equivalent to independence only when the joint probability p(x, y) is Gaussian. The covariance, or the correlation coefficient, are said to measure the degree of linear association between x and y, because it is possible to build a nonlinear dependence of x on y that yields zero covariance (see chapter 2 of Priestley 1981).

2.2. Fluctuating quantities as stochastic processes

Consider now an experimentally observable quantity, a, that provides some useful information on a property of a system of interest. We assume here for simplicity that a is a scalar, but the present considerations can be rather easily generalised to vector quantities. a can represent the magnetization of a material, the number of bacteria in a certain region, neural activity (e.g. as measured by fMRI), etc. We assume that a is a local quantity, i.e. that its value is defined at a particular position in space and time, so that we deal with a function $a(\mathbf{r},t)$ (for the case where only space or time dependence is available see section 2.3.1).

We are interested in cases where a is noisy, i.e. subject to random fluctuations that arise because of our incomplete knowledge of the variables affecting the system's evolution, or because of our inability control them with enough precision (e.g. we do not know all variables that can affect the variation of the price of a stock market asset, we do not know all the interactions in a magnetic system, we cannot control all the microscopic degrees of freedom of a thermal bath). We wish to compare the values of a measured at different positions and times, but a statement like ' $a({\mathbf{r}}_{1},{t}_{1})$ is larger than $a({\mathbf{r}}_{2},{t}_{2})$ ' is useless in practice, because even if it is meaningful for a particular realization of the measurement process, the noisy character of the observable means that a different repetition of the experiment, under the same conditions, will yield a different function $a(\mathbf{r},t)$ . Actually repeating the experiment may or may not be feasible depending on the case, but we assume that we know enough about the system to be able to assert that a hypothetical repetition of the experiment would not exactly reproduce the original $a(\mathbf{r},t)$ . For clarity, it may be easier to imagine that several copies of the system are made and let evolve in parallel under the same conditions, each copy then producing a signal slightly different from that of the other copies. The quantity $a(\mathbf{r},t)$ is then a random variable, and to compare it at different values of its argument we will use correlation functions, and this section is devoted to stating their precise definitions. (For a broader discussion of fluctuating variables in the natural sciences we can refer the reader to the book by Sornette (2006)).

The expression 'under the same conditions' deserves a comment. Clearly we expect that two strictly identical copies of a system evolving under exactly identical conditions will produce the same signal $a(\mathbf{r},t)$ . The 'same conditions' must be understood in a statistical way: the system is prepared by choosing a random initial configuration extracted from a well-defined probability distribution, or two identical copies evolve with a random dynamics with known statistical properties (e.g. coupled to a thermal bath at given temperature and pressure). We are excluding from consideration cases where the fluctuations are mainly due to experimental errors. If that where the only source of noise, one could in principle repeat the measurement enough times so that the average $\left\langle a(\mathbf{r},t)\right\rangle$ is known with enough precision. Then $\left\langle a(\mathbf{r},t)\right\rangle$ would be an accurate description of the system's actual spatial and temporal variations, and the correlations we are about to study would be dominated by properties of the measurement process rather than by the dynamics of the system itself. Instead we are interested in the opposite case: experimental error is negligible, and the fluctuations of the observable are due to some process intrinsic to the system. Indeed in many cases (such as macroscopic observables of systems in thermodynamic equilibrium) the average of the signal is uninteresting (it is a constant), but the correlation functions unveil interesting details of the system's spatial structure and dynamics.

From this discussion it is clear that $a(\mathbf{r},t)$ is not an ordinary function. Rather, at each particular value of its arguments $a(\mathbf{r},t)$ is a random variable: $a(\mathbf{r},t)$ is therefore treated mathematically as a family of random variables indexed by r and t, called stochastic process or random field. We do not need to delve into them here (but see appendix A for a brief introduction). To define correlation functions we only need to assume that the joint distributions

$\begin{equation}{P}_{2}({a}_{1},{\mathbf{r}}_{1},{t}_{1},{a}_{2},{\mathbf{r}}_{2},{t}_{2})\end{equation} \tag{ 7 }$

are defined for all possible values of ${\mathbf{r}}_{1}$ , t₁, ${\mathbf{r}}_{2}$ , and t₂. These distributions characterise only partially the stochastic process, but they are enough to define the correlation functions we consider here.

2.3. Definition of space-time correlation functions

The correlation function is the correlation of the random variables $a({\mathbf{r}}_{0},{t}_{0})$ and $a({\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t)$ ,

$\begin{equation}C({\mathbf{r}}_{0},{t}_{0},\mathbf{r},t)=\langle {a}^{\ast }({\mathbf{r}}_{0},{t}_{0})a({\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t)\rangle =\int \mathrm{d}{a}_{1}\enspace \mathrm{d}{a}_{2}\enspace P({a}_{1},{\mathbf{r}}_{0},{t}_{0},{a}_{2},{\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t){a}_{1}^{\ast }{a}_{2},\end{equation} \tag{ 8 }$

where the star stands for complex conjugate. The time difference t is sometimes called lag, and the function is called self correlation or autocorrelation to emphasize the fact that it is the correlation of the same observable quantity measured at different points. However that from the point of view of probability theory $a({\mathbf{r}}_{0},{t}_{0})$ and $a({\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t)$ are two different (though not independent) random variables. The cross-correlations of two observables is similarly defined:

$\begin{equation}{C}_{ab}({\mathbf{r}}_{0},{t}_{0},\mathbf{r},t)=\left\langle {a}^{\ast }({\mathbf{r}}_{0},{t}_{0})b({\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t)\right\rangle .\end{equation} \tag{ 9 }$

The star again indicates complex conjugate. From now on however we shall restrict ourselves to real quantities and omit it in the following equations.

It is often useful to concentrate on the fluctuation $\delta a(\mathbf{r},t)=a(\mathbf{r},t)-\left\langle a(\mathbf{r},t)\right\rangle$ , especially when the average is independent of position and/or time. The correlation of the fluctuations is

$\begin{equation}{C}_{c}({\mathbf{r}}_{0},{t}_{0},\mathbf{r},t)=\left\langle \delta a({\mathbf{r}}_{0},{t}_{0})\delta a({\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t)\right\rangle =C({\mathbf{r}}_{0},{t}_{0},\mathbf{r},t)-\left\langle a({\mathbf{r}}_{0},{t}_{0})\right\rangle \left\langle a({\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t)\right\rangle ,\end{equation} \tag{ 10 }$

and is called the connected correlation function (in diagrammatic perturbation theory, the connected correlation is obtained as the sum of connected Feynman diagrams only, see e.g. Binney et al 1992, chapter 8). From (6) it is clear that this function is zero when the variables are uncorrelated. For this reason it is often more useful than the correlation (8), which for uncorrelated variables takes the value $\left\langle a({\mathbf{r}}_{0},{t}_{0})\right\rangle \left\langle a({\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t)\right\rangle$ .

Sometimes a normalised connected correlation is defined so that its absolute value is bounded by 1, in analogy with the Pearson coefficient:

$\begin{equation}\rho ({\mathbf{r}}_{0},{t}_{0},\mathbf{r},t)=\frac{{C}_{c}({\mathbf{r}}_{0},{t}_{0},\mathbf{r},t)}{\sqrt{{C}_{c}({\mathbf{r}}_{0},{t}_{0},0,0){C}_{c}({\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t,0,0)}}.\end{equation} \tag{ 11 }$

The names employed here are usual in the physics literature (e.g. Binney et al 1992 and Hansen and McDonald 2005). In the mathematical statistics literature, the connected correlation function is called autocovariance (in fact it is just the covariance of $a({\mathbf{r}}_{0},{t}_{0})$ and $a({\mathbf{r}}_{0}+\mathbf{r},{t}_{0}+t)$ ), while the name autocorrelation is reserved for the normalized connected correlation (11).

The correlations defined here are also known as two-point correlation functions, to distinguish them from higher-order correlations that can be studied but are outside the scope of this paper. Higher-order correlation functions are higher-order moments of $a(\mathbf{r},t)$ , while higher-order connected correlations correspond to the cumulants of $a(\mathbf{r},t)$ (Itzykson and Drouffe 1989a, chapter 7) (the distinction between cumulants and moments is only relevant beyond third order, see e.g. van Kampen 2007, section 2.2).

2.3.1. Global and static quantities

The full spatial and temporal evolution may be not accessible or not relevant (e.g. because time evolution is very slow and the quantity is static at the experimental time-scales). In those cases one defines static space correlation functions or global time correlation functions which can be obtained as particular cases of (8) and (10).

The static space correlation function is just the space-time correlation evaluated at t = 0,

$\begin{equation}C({\mathbf{r}}_{0},\mathbf{r})=C({\mathbf{r}}_{0},{t}_{0},\mathbf{r},0)=\left\langle a({\mathbf{r}}_{0},{t}_{0})a({\mathbf{r}}_{0}+\mathbf{r},{t}_{0})\right\rangle ,\end{equation} \tag{ 12 }$

and analogously for the connected correlation. Time-only correlation functions are defined from a signal that can be regarded as a space integral of a local quantity,

$\begin{equation}A(t)={\int }_{V}a(\mathbf{r},t)\mathrm{d}\mathbf{r},\end{equation} \tag{ 13 }$

so that the time correlation function is

$\begin{equation}C(t,{t}_{0})=\left\langle A({t}_{0})A({t}_{0}+t)\right\rangle =\left\langle \int \mathrm{d}{\mathbf{r}}_{0}\enspace a({\mathbf{r}}_{0},{t}_{0})\int \mathrm{d}{\mathbf{r}}_{1}\enspace a({\mathbf{r}}_{1},{t}_{0}+t)\right\rangle =\int \mathrm{d}{\mathbf{r}}_{0}\enspace \mathrm{d}\mathbf{r}\enspace C({\mathbf{r}}_{0},{t}_{0},\mathbf{r},t).\end{equation} \tag{ 14 }$

2.3.2. Symmetries

Knowledge of the symmetries of the system under study, which are reflected in invariances of the stochastic process used to represent it, is very important. Apart from their significance in our theoretical understanding, at the practical level symmetries are useful in the estimation of statistical quantities, because they afford a way of obtaining additional sampling of the stochastic signal. For example, if the system is translation-invariant, one can perform measurements at different places of the same system and treat them as different samples of the same process, because one knows the statistical properties are independent of position (section 3.1).

If a system is time-translation invariant (TTI), then

$\begin{equation}\left\langle a(\mathbf{r},t)\right\rangle =\left\langle a(\mathbf{r},t+s)\right\rangle ,\end{equation} \tag{ 15a }$

$\begin{equation}\left\langle a({\mathbf{r}}_{1},{t}_{1})a({\mathbf{r}}_{2},{t}_{2})\right\rangle =\left\langle a({\mathbf{r}}_{1},{t}_{1}+s)a({\mathbf{r}}_{2},{t}_{2}+s)\right\rangle \end{equation} \tag{ 15b }$

for all ${\mathbf{r}}_{1}$ , ${\mathbf{r}}_{2}$ , t₁, t₂ and s. In this case the system is said to be stationary.

In stationary systems the time origin is irrelevant: no matter when one performs an experiment, the statistical properties of the observed signal are always the same. In particular, this implies that the average $\left\langle a(\mathbf{r},t)\right\rangle$ is independent of time (but does not mean that time correlations are trivial). Setting s = −t₁ one sees that the second moment depends only on the time difference t₂ − t₁, so that the time correlation depends only on one time (in our definition, the second time, or lag),

$\begin{equation}C(t)=\left\langle A(0)A(t)\right\rangle ,\quad (\text{stationary})\end{equation} \tag{ 16 }$

and differs from the connected correlation by a constant:

$\begin{equation}{C}_{c}(t)=\left\langle A(0)A(t)\right\rangle -{\left\langle A\right\rangle }^{2}=C(t)-{\left\langle A\right\rangle }^{2},\quad (\text{stationary})\text{.}\end{equation} \tag{ 17 }$

This is the situation that holds for a system in thermodynamic equilibrium.

Translation invariance in space similarly means that

$\begin{equation}\left\langle a(\mathbf{r},t)\right\rangle =\left\langle a(\mathbf{r}+\mathbf{s},t)\right\rangle ,\end{equation} \tag{ 18a }$

$\begin{equation}\left\langle a({\mathbf{r}}_{1},{t}_{1})a({\mathbf{r}}_{2},{t}_{2})\right\rangle =\left\langle a({\mathbf{r}}_{1}+\mathbf{s},{t}_{1})a({\mathbf{r}}_{2}+\mathbf{s},{t}_{2})\right\rangle \end{equation} \tag{ 18b }$

for all positions and s. In a translation-invariant, or homogeneous system, the space origin is irrelevant: no matter where the experiment is performed, the statistical properties are the same. In particular $\left\langle a\left(\right.\mathbf{r},t\right\rangle$ is independent of $\mathbf{r}$ , and the second moments depend only on the displacement ${\mathbf{r}}_{2}-{\mathbf{r}}_{1}$ . Space correlations then depend only on the second argument:

$\begin{equation}C(\mathbf{r})=\left\langle a(0,{t}_{0})a(\mathbf{r},{t}_{0})\right\rangle ,\quad (\text{homogeneous})\end{equation} \tag{ 19 }$

$\begin{equation}{C}_{c}(\mathbf{r})=\left\langle \delta a(0,{t}_{0})\delta a(\mathbf{r},{t}_{0})\right\rangle =\left\langle a(0,{t}_{0})a(\mathbf{r},{t}_{0})\right\rangle -{\left\langle a(0,{t}_{0})\right\rangle }^{2}=C(\mathbf{r})-{\left\langle a({t}_{0})\right\rangle }^{2},\quad (\text{homogeneous})\text{.}\end{equation} \tag{ 20 }$

If in addition the system is isotropic, then the process will be rotation invariant, and the second moment depends only on the distance $r=\vert \mathbf{r}\vert$ between the two points, so that $C(\mathbf{r})=C(r\mathbf{n})$ for any unit vector n.

For stationary and homogeneous systems, the normalised correlation can be obtained simply by dividing by its value at the origin,

$\begin{equation}\rho (\mathbf{r},t)=\frac{{C}_{c}(\mathbf{r},t)}{{C}_{c}(0,0)},\quad (\text{stationary,}\;\text{homogeneous})\text{.}\end{equation} \tag{ 21 }$

2.3.3. Some properties

It is easy to see from the definitions that in the stationary and homogeneous case it holds that

$\begin{equation}{C}_{c}(0,0)=\left\langle a{({\mathbf{r}}_{0},{t}_{0})}^{2}\right\rangle -{\left\langle a({\mathbf{r}}_{0},{t}_{0})\right\rangle }^{2}={\mathrm{Var}}_{a},\end{equation} \tag{ 22 }$

$\begin{equation}\vert {C}_{c}(\mathbf{r},t)\vert \leqslant {C}_{c}(0,0)\quad \forall \enspace \mathbf{r},t,\end{equation} \tag{ 23 }$

and that the correlation is even in r and t if $a(\mathbf{r},t)$ is real-valued.

Also, for sufficiently long lags and distances the variables will become uncorrelated, so that

$\begin{equation}C(\mathbf{r}\to \infty ,t)\to {\left\langle a\right\rangle }^{2},\qquad {C}_{c}(\mathbf{r}\to \infty ,t)\to 0,\end{equation} \tag{ 24 }$

$\begin{equation}C(\mathbf{r},t\to \infty )\to {\left\langle a\right\rangle }^{2},\qquad {C}_{c}(\mathbf{r},t\to \infty )\to 0.\end{equation} \tag{ 25 }$

Thus the connected correlation will have a Fourier transform (section 2.4).

2.3.4. Example

Let us conclude this section of definitions with an example. Figure 1 shows on the left two synthetic stochastic signals, generated with the random process described in section 3.1.2. On the right there are the corresponding connected correlations. The two signals look different, and this difference is captured by C_c(t). We can say that fluctuations for the lower signal are more persistent: when some value of the signal is reached, it tends to stay at similar values for longer, compared to the other signal. This is reflected in the slower decay of C_c(t).

2.4. Correlations in Fourier space

Correlation functions are often studied in frequency space, either because they are obtained experimentally in the frequency domain (e.g. in techniques like dielectric spectroscopy), because the Fourier transforms are easier to compute or handle analytically, or because they provide an easier or alternative interpretation of the fluctuating process. Although the substance is the same, the precise definitions used can vary. One must pay attention to (i) the convention used for the Fourier transform pairs and (ii) the exact variables that are transformed.

Here we define the Fourier transform pairs as

$\begin{equation}\tilde{f}(\mathbf{k},\omega )={\int }_{-\infty }^{\infty } \mathrm{d}t\int {\mathrm{d}}^{d}r\enspace f(\mathbf{r},t){\text{e}}^{-\text{i}\mathbf{k}\cdot \mathbf{r}+\mathrm{i}\omega t},\qquad f(\mathbf{r},t)={\int }_{-\infty }^{\infty } \frac{\mathrm{d}\omega }{2\pi }\int \frac{{\mathrm{d}}^{d}r}{{(2\pi )}^{d}}\enspace \tilde{f}(\mathbf{k},\omega ){\text{e}}^{\text{i}\mathbf{k}\cdot \mathbf{r}-\mathrm{i}\omega t},\end{equation} \tag{ 26 }$

but some authors choose the opposite sign for the forward transform and/or a different placement for the 1/2π factor (sometimes splitting it between the forward and inverse transforms). Depending on the convention a factor 2π can appear or disappear in some relations.

Let us consider time-only correlations first. These are defined in terms of two times, t₁ and t₂ (which we chose to write as t₀ and t₀ + t). One can transform one or both of t₁ and t₂ or t₀ and t. Let us mention two different choices, useful in different circumstances. First, take the connected time correlation (10) and do a Fourier transform on t:

$\begin{equation}{C}_{c}({t}_{0},\omega )=\int \mathrm{d}t\enspace {\text{e}}^{\text{i}\omega t}{C}_{c}({t}_{0},{t}_{0}+t)={\text{e}}^{-\text{i}\omega {t}_{0}}\left\langle \delta A({t}_{0})\delta \tilde{A}(\omega )\right\rangle ,\end{equation} \tag{ 27 }$

where $\delta \tilde{A}(\omega )$ stands for the Fourier transform of δA(t). This definition is convenient when there is an explicit dependence on t₀ but the evolution with t₀ is slow, as in the physical ageing of glassy systems (see e.g. Cugliandolo 2004); one then studies a time-dependent spectrum. If on the other hand C_c is stationary, it is more convenient to write t₁ = t₀, t₂ = t₀ + t and do a double transform in t₁, t₂:

$\begin{align}\hfill {\tilde{C}}_{c}({\omega }_{1},{\omega }_{2})& =\int \mathrm{d}{t}_{1}\enspace \mathrm{d}{t}_{2}\enspace {\text{e}}^{\text{i}{\omega }_{1}{t}_{1}}\enspace {\text{e}}^{\text{i}{\omega }_{2}{t}_{2}}\enspace {C}_{c}({t}_{1},{t}_{2}-{t}_{1})\hfill \\ \hfill & =\int \mathrm{d}{t}_{1}\enspace \mathrm{d}{t}_{2}\enspace {\text{e}}^{\text{i}{\omega }_{1}{t}_{1}}\enspace {\text{e}}^{\text{i}{\omega }_{2}{t}_{2}}\left\langle \delta A({t}_{1})\delta A({t}_{2})\right\rangle =\left\langle \delta \tilde{A}({\omega }_{1})\delta \tilde{A}({\omega }_{2})\right\rangle \hfill \\ \hfill & =\int \mathrm{d}{t}_{1}\enspace \mathrm{d}{t}_{2}\enspace {\text{e}}^{\text{i}({\omega }_{1}+{\omega }_{2}){t}_{1}}{\text{e}}^{\text{i}{\omega }_{2}({t}_{2}-{t}_{1})}{C}_{c}({t}_{2}-{t}_{1})=(2\pi )\delta ({\omega }_{1}+{\omega }_{2}){\tilde{C}}_{c}({\omega }_{2}),\hfill \end{align} \tag{ 28 }$

where ${\tilde{C}}_{c}(\omega )$ is the Fourier transform of the stationary connected correlation with respect to t and we have used the integral representation of Dirac's delta, $\delta (\omega -{\omega }^{\prime })={(2\pi )}^{-1}{\int }_{-\infty }^{\infty }{\text{e}}^{\text{i}t(\omega -{\omega }^{\prime })}\enspace \mathrm{d}t$ . As (28) shows, the transform is zero unless ω₁ = −ω₂, so that it is useful to define the reduced correlation,

$\begin{equation}{C}_{c}^{R}(\omega )=\left\langle \delta \tilde{A}(-\omega )\delta \tilde{A}(\omega )\right\rangle =\left\langle \delta {\tilde{A}}^{\ast }(\omega )\delta \tilde{A}(\omega )\right\rangle ,\end{equation} \tag{ 29 }$

where the rightmost equality holds when A(t) is real.

The transform ${\tilde{C}}_{c}(\omega )$ is a well-defined function, because the connected correlation decays to zero (and usually fast enough). We can then say

$\begin{equation}{\tilde{C}}_{c}(\omega )=\frac{1}{2\pi \delta (0)}{C}_{c}^{R}(\omega ).\end{equation} \tag{ 30 }$

This relation can be exploited to build a very fast algorithm to compute C_c(t) in practice (appendix D). The at first sight baffling infinite factor relating the reduced correlation to ${\tilde{C}}_{c}(\omega )$ originates in the fact that $\delta \tilde{A}(\omega )$ cannot exist as an ordinary function, since we have assumed that A(t) is stationary. This implies that the signal has an infinite duration, and that, its statistical properties being always the same, it cannot decay to zero for long times as required for its Fourier transform to exist. The Dirac delta can be understood as originating from a limiting process where one considers signals of duration T (with suitable, usually periodic, boundary conditions) and then takes T → ∞. Then 2πδ(ω = 0) = ∫dt e^itω|_ω=0 = ∫dt = T. These considerations can be made rigorous by defining the signal's power spectrum,

$\begin{equation}h(\omega )=\underset{T\to \infty }{\mathrm{lim}}\enspace \frac{1}{T}\left\langle {A}_{T}(\omega ){A}_{T}^{\ast }(\omega )\right\rangle ,\qquad {A}_{T}(\omega )={\int }_{-T/2}^{T/2}A(t){\text{e}}^{\text{i}\omega t}\enspace \mathrm{d}t,\end{equation} \tag{ 31 }$

and then proving that $h(\omega )={\tilde{C}}_{c}(\omega )$ (Priestley 1981, sections 4.7 and 4.8).

The same considerations apply to space or space-time correlations. For reference, the relations one finds are

$\begin{equation}{\tilde{C}}_{c}({\mathbf{k}}_{1},{\omega }_{1},{\mathbf{k}}_{2},{\omega }_{2})=\int {\mathrm{d}}^{d}{r}_{1}\enspace \mathrm{d}{t}_{1}\enspace {\mathrm{d}}^{d}{r}_{2}\enspace \mathrm{d}{t}_{2}\enspace {\text{e}}^{-\text{i}{\mathbf{k}}_{1}\cdot {\mathbf{r}}_{1}+\mathrm{i}{\omega }_{1}{t}_{1}}\enspace {\text{e}}^{-\text{i}{\mathbf{k}}_{2}\cdot {\mathbf{r}}_{2}+\mathrm{i}{\omega }_{2}{t}_{2}}{C}_{c}({\mathbf{r}}_{1},{t}_{1},{\mathbf{r}}_{2}-{\mathbf{r}}_{1},{t}_{2}-{t}_{1}),\end{equation} \tag{ 32a }$

$\begin{equation}{\tilde{C}}_{c}(\mathbf{k},\omega )={\int }_{-\infty }^{\infty } \mathrm{d}t\int {\mathrm{d}}^{d}r\enspace {\text{e}}^{-\text{i}\mathbf{k}\cdot \mathbf{r}+\mathrm{i}\omega t}{C}_{c}(\mathbf{r},t),\end{equation} \tag{ 32b }$

$\begin{equation}{C}_{c}^{R}(\mathbf{k},\omega )=\left\langle \delta a(-\mathbf{k},-\omega )\delta a(\mathbf{k},\omega )\right\rangle ,\end{equation} \tag{ 32c }$

$\begin{equation}{\tilde{C}}_{c}({\mathbf{k}}_{1},{\omega }_{1},{\mathbf{k}}_{2},{\omega }_{2})={(2\pi )}^{d+1}{\delta }^{(d)}({\mathbf{k}}_{1}+{\mathbf{k}}_{2})\delta ({\omega }_{1}+{\omega }_{2}){C}_{c}^{R}({\mathbf{k}}_{1},{\omega }_{1}),\end{equation} \tag{ 32d }$

$\begin{equation}{\tilde{C}}_{c}(\mathbf{k},\omega )=\frac{1}{{(2\pi )}^{d+1}{\delta }^{(d)}(0)\delta (0)}{C}_{c}^{R}(\mathbf{k},\omega )=\frac{1}{{(2\pi )}^{d+1}VT}{C}_{c}^{R}(\mathbf{k},\omega ).\end{equation} \tag{ 32e }$

3. Computing correlation functions from experimental data

The theoretical definitions of correlation functions are given as averages over some probability distribution, or ensemble. To compute them in practice, given data recorded in an experiment or produced in numerical simulation, there are two issues two consider. First, experimental data will be discrete in time as well as in space, either as a result of sampling, or because the spatial resolution is high enough to measure the actual discrete units that make up our system ('particles', i.e. birds, cells, neurons, etc.). Second, we do not have direct access to the probability distribution, but only to a set of samples, i.e. results from experiments, distributed randomly according to an unknown distribution. Thus what we actually compute are estimators (as they are called in statistics) of the averages we want (the correlation functions). We will discuss estimators corresponding to different situations, and for clarity it will be convenient to consider first the case of a global quantity (correlation in time only, section 3.1) and only later introduce the additional complications brought in by the space structure (section 3.2).

Before moving to the estimation of correlations, let us examine the estimator for the mean, which we will need to estimate the connected correlation functions, and which will allow us to make a couple of general points. It is clearly hopeless to attempt to estimate an ensemble average unless it is possible to obtain many samples under the same conditions (i.e., if one is throwing dice, one should throw many times the same dice). So we assume that we are given a set of M samples {a⁽ⁿ⁾}, n = 1, ..., M obtained in M equivalent experiments. From these samples we can estimate the mean as

$\begin{equation}\bar{a}=\frac{1}{M}\sum\limits _{n=1}^{M}{a}^{(n)}.\end{equation} \tag{ 33 }$

This well-known estimator has the desirable property that it 'gets better and better' when the number of samples M grows, if the ensemble has finite variance. More precisely, $\bar{a}\to \left\langle a\right\rangle$ as M → ∞. This property is called consistency, and is guaranteed because $\left\langle \bar{a}\right\rangle =\left\langle a\right\rangle$ (i.e. the estimator is unbiased) and the variance of $\bar{a}$ vanishes for M → ∞ (Priestley 1981, sections 5.2 and 5.3).

How far off the estimate can be expected to be from the actual value (i.e. the 'error'), is proportional to its variance¹ . A result called the Cramer–Rao inequality implies that there is a lower bound to the variance of any estimator, given the number of samples (Priestley 1981, section 5.2). So in practice one chooses a good known estimator such as (33), but the only way to improve on the error is to acquire more samples, since the variance of (33) goes as $\sim 1/\sqrt{M}$ . Unfortunately $1/\sqrt{M}$ is not a very fast-decreasing function (a tenfold increase in the number of samples reduces the error only by about a third), so one is always pressed to obtain the largest possible number of samples. But in experiments (especially in biology) the number of samples may be scarce, and more so in the case of correlations, which are defined in terms of pairs of values, separated by a fixed distance or time.

To make the most out of the available samples, one must take advantage of the known symmetries of the system. Space translation invariance allows us to treat samples measured at different points of a large sample as different equivalent experiments, and the same goes for samples obtained at different points in time (even if the experiment has not been restarted) if the system is stationary. The samples obtained this way are in general not independent (they will be in fact correlated), but the estimate (33) does not need to be used on independent samples. It is still unbiased and consistent, only that its variance will be larger than it would be for the same number of independent samples (see equation (38)).

Accordingly, whenever in what follows we use n to index samples or experiments, it is understood that the averages over n can actually be implemented as averages over a time or space index if the system is stationary or homogeneous, respectively. We refer then to time averages or space averages as replacing ensemble averages when building an estimator.

3.1. Estimation of time correlation functions

Assume that the experiment records a scalar signal with a uniform sampling interval Δt, so that we are handled N real-valued and time-ordered values forming a sequence A_i, with i = 1, ..., N. It is understood that if the data are digitally sampled from a continuous time process, proper filtering has been applied² . In what follows we shall measure time in units of the sampling interval, so that in the formulae below we shall make no use of Δt. To recover the original time units one must simply remember that A_i = A(t_i) with t_i = t₀ + (i − 1)Δt. For the stationary case we shall write C_k = C(t_k) where t_k is the time difference, t_k = kΔt and k = 0, ..., N − 1, and in the non-stationary case C_i,k = C(t_i, t_k).

If the process is not stationary, nothing can be estimated from a single sequence because the successive values correspond to different, nonequivalent experiments, as the conditions have changed from one sample to another (i.e. the system has evolved in a way that has altered the characteristic of the stochastic process). In this case the correlation depends on two times and the mean itself can depend on time. The only way to estimate the mean or the correlation function is to obtain many sequences ${A}_{i}^{(n)}$ , n = 1, ..., M reinitializing the system to the same macroscopic conditions each time (in a simulation, one can for example restart the simulation with the same parameters but changing the random number seed). It is not possible to resort to time averaging, and the estimates are obtained by replacing the ensemble average by averages across the different sequences, i.e. the (time-dependent) mean is estimated as

$\begin{equation}\left\langle A({t}_{i})\right\rangle \approx \bar{{A}_{i}}=\frac{1}{M}\sum\limits _{n=1}^{M}{A}_{i}^{(n)},\end{equation} \tag{ 34 }$

and the time correlation as

$\begin{equation}{C}_{c}({t}_{i},{t}_{k})\approx {\hat{C}}_{i,k}^{(c)}=\frac{1}{M}\sum\limits _{n}^{M}\delta {A}_{i}^{(n)}\delta {A}_{i+k}^{(n)},\qquad \delta {A}_{i}^{(n)}={A}_{i}^{(n)}-\bar{{A}_{i}},\end{equation} \tag{ 35 }$

where the hat distinguishes the estimate from the actual quantity. Both estimators are consistent and unbiased, i.e. ${\hat{C}}_{i,k}^{(c)}\to C({t}_{i},{t}_{i})$ and $\bar{A({t}_{i})}\to \left\langle A({t}_{i})\right\rangle$ for M → ∞.

If the system is stationary, one can resort to time averaging to build estimators (like (37) below) that only require one sequence. But let us remark that it is always sensible to check whether the sequence is actually stationary. A first check on the mean can be done dividing the sequence in several sections and computing the average of each section, then looking for a possible systematic trend. If this check passes, then one should compute the time correlation of each section and then check that all of them show the same decay (using fewer sections than in the first check, as longer sections will be needed to obtain meaningful estimates of the correlation function). It is important to note that this second check is necessary even if the first one passes; as it is possible for the mean to be time-independent (or its time-dependence undetectable) while the time correlation still depends on two times.

In the stationary case, then, we can estimate the mean with

$\begin{equation}\bar{A}=\frac{1}{N}\sum\limits _{i=1}^{N}{A}_{i},\end{equation} \tag{ 36 }$

and the connected correlation function with

$\begin{equation}{\hat{C}}_{k}^{(c)}=\frac{1}{N-k}\sum\limits _{j=1}^{N-k}\delta {A}_{j}\delta {A}_{j+k},\qquad \delta {A}_{j}={A}_{j}-\bar{A}.\end{equation} \tag{ 37 }$

In the last equation, the true sample mean $\left\langle A\right\rangle$ , if known, can be used instead of the estimate $\bar{A}$ . If the true mean is used, the estimator (37) is unbiased, otherwise it is asymptotically unbiased, i.e. the bias tends to zero for N → ∞, provided the Fourier transform of C_c(t) exists. More precisely, $\langle {\hat{C}}_{k}^{(c)}\rangle ={C}_{c}({t}_{k})-\alpha /N$ , where $\alpha =2\pi \enspace {\mathrm{Var}}_{A}\enspace {\tilde{C}}_{c}(\omega =0)$ . The variance of ${\hat{C}}_{k}^{(c)}$ is $O(\frac{1}{N-k})$ (Priestley 1981, section 5.3). This is sufficient to show that, at fixed k, the estimator is consistent, i.e. ${\hat{C}}_{k}^{(c)}\to {C}_{c}({t}_{k})$ for N → ∞. However, the variance grows with increasing k, and thus the tail of ${\hat{C}}_{k}^{(c)}$ is considerably noisy. In practice, for k near N the estimate is mostly useless, and the usual rule of thumb is to use ${\hat{C}}_{k}^{(c)}$ only for k ⩽ N/2.

Knowing the time correlation, one can make the statement that the variance of the estimate (36) is larger than that of (34) quantitative: the variance of the estimate with correlated samples is (Sokal 1997)

$\begin{equation}{\mathrm{Var}}_{\bar{a}}\approx \frac{2{\tau }_{\mathrm{i}\mathrm{n}\mathrm{t}}}{N}\left[\left\langle {A}^{2}\right\rangle -{\left\langle A\right\rangle }^{2}\right],\end{equation} \tag{ 38 }$

i.e. 2τ_int times larger than the variance in the independent sample case, where τ_int is the integral correlation time defined below (58). In this sense N/2τ_int can be thought of as the number of 'effectively independent' samples (see also Sornette 2006, chapter 8).

We have not written an estimator for the nonconnected correlation function. It is natural to propose

$\begin{equation}{\hat{C}}_{k}=\frac{1}{N-k}\sum\limits _{j=1}^{N-k}{A}_{j}{A}_{j+k}.\end{equation} \tag{ 39 }$

However, although ${\hat{C}}_{k}$ is unbiased, it can have a large variance when the signal has a mean value larger than the typical fluctuation (see section 3.1.1). As a general rule, it is not a good idea estimate the connected correlation by using (39) and then subtracting ${\bar{A}}^{2}$ . Instead, the connected estimator (37) should be used. A possible exception may be the case when an experiment can only furnish many short independent samples of the signal (see the examples in section 3.1.2).

If several sequences sampled from a stationary system are available, it is possible to combine the two averages: one first computes the stationary correlation estimate (37) for each sequence, and then averages the different correlation estimates (over n at fixed lag k). It is clearly wrong to average the sequences themselves before computing the correlation, as this will tend to approach the (stationary) ensemble mean $\left\langle A\right\rangle$ for all times, destroying the dynamical correlations.

There is another asymptotically unbiased estimator of the connected correlation, that can be obtained by using 1/N instead of 1/(N − i) as prefactor of the sum in (37). Calling this estimator ${C}_{c,k}^{\ast }$ , it holds that $\langle {C}_{c,k}^{\ast }\rangle ={C}_{c}({t}_{k})-\alpha /N-kC({t}_{k})/N-\alpha k/{N}^{2}$ , where α is defined as before, and again α = 0 if the exact sample mean is used (Priestley 1981, section 5.3). This has the unpleasant property that the bias depends on k, while the bias of ${\hat{C}}_{c,k}$ is independent of its argument, and smaller in magnitude. The advantage of ${C}_{c,k}^{\ast }$ is that its variance is O(1/N) independent of k, thus it has a less noisy tail. Some authors prefer ${C}_{c,k}^{\ast }$ due to its smaller variance and to the fact that it strictly preserves properties of the correlation, which may not hold exactly for ${\hat{C}}_{c,k}$ . Here we stick to ${\hat{C}}_{c,k}$ , as usual in the physics literature (e.g. Allen and Tildesley 1987, Newman and Barkema 1999 and Sokal 1997), so that we avoid worrying about possible distortions of the shape. In practice however, it seems that as long as N is greater than ∼10τ (a necessary requirement in any case, see below), and for k ⩽ N/2, there is little numerical difference between the two estimates.

3.1.1. Two properties of the estimator

We must mention two important features of the estimator that are highly relevant when attempting to compute numerically the time correlation. The first is that the difference between the non-connected and connected estimators is not ${\bar{a}}^{2}$ , but

$\begin{equation}{\hat{C}}_{k}-{\hat{C}}_{c,k}=\bar{a}\left[\frac{1}{N-k}\sum\limits _{j}^{N-k}{a}_{j}+\frac{1}{N-k}\sum\limits _{j}^{N-k}{a}_{j+k}-\bar{a}\right],\end{equation} \tag{ 40 }$

as it is not difficult to compute. The difference does tend to ${\bar{a}}^{2}$ for N → ∞, but in a finite sample fluctuations are important, especially at large k. Since fluctuations are amplified by a factor $\bar{a}$ , when the signal mean is large with respect to its variance, the estimate ${\hat{C}}_{k}$ is completely washed out by statistical fluctuations. This is why, while C(t) and C_c(t) differ by a constant, in practice it is a bad idea to compute ${\hat{C}}_{k}$ and subtract ${\bar{a}}^{2}$ to obtain an estimate of the connected correlation. Instead, one computes the connected correlation directly by estimating first the mean and then using (37).

Another important fact is that the estimator of the connected correlation will always have zero, whatever the length of the sample used, even if N is much shorter than the correlation time. As shown in appendix B, for any time series one has that ${\hat{C}}_{0}^{(c)} > 0$ and that ${\hat{C}}_{c,k}^{(c)}$ will change sign at least once for k ⩾ 1 (this applies to the case when the mean and connected correlation are estimated with a single time series). The practical consequence is that when N is of the order of τ, or smaller, the estimate ${\hat{C}}_{k}^{(c)}$ will suffer from strong finite-size effects, and its shape will be quite different from the actual C_c(t). In particular, since ${\hat{C}}_{k}^{(c)}$ will intersect the x-axis, it will look like the series has decorrelated when in fact it is still highly correlated. Be suspicious if ${\hat{C}}_{k}^{(c)}$ changes sign once and stays very anticorrelated. Anticorrelation may be a feature of the actual C_c(t), but if the sample is long enough, the estimate should decay until correlation is lost (noisily oscillating around zero). One must always perform tests estimating the correlation with different values of N: if the shape of the correlation at short times depends on N, then N must be increased until one finds that estimates for different values of N coincide for lags up to a few times the correlation time (we show an example next). Once a sample-size-independent estimate has been obtained, the correlation time can be estimated (section 4.1), and it must be checked that self-consistently N is several times larger than τ.

3.1.2. Example

To illustrate the above considerations on estimating C_c(t) we use a synthetic correlated sequence generated from the recursion

$\begin{equation}{a}_{i}=w{a}_{i-1}+(1-w){\xi }_{i},\end{equation} \tag{ 41 }$

where w ∈ [0, 1] is a parameter and the ξ_i are uncorrelated Gaussian random variables with mean μ and variance σ². Assuming the stationary state has been reached it is not difficult to find

$\begin{equation}\left\langle a\right\rangle =\mu ,\qquad \left\langle {(a-\mu )}^{2}\right\rangle =\frac{1-w}{1+w}{\sigma }^{2},\qquad {C}_{k}^{(c)}=\left\langle ({a}_{i}-\mu )({a}_{i+k}-\mu )\right\rangle ={\sigma }^{2}{w}^{k},\end{equation} \tag{ 42 }$

so that the correlation time is τ = −1/log w.

We used the above recursion to generate artificial sequences and computed their time correlation functions with the estimates discussed above. Figure 2 shows the problem that can face the non-connected estimate. When the average of the signal is smaller than or of the order of the noise amplitude (as in the top panels), one can get away with using (39). However if μ ≫ σ, the non-connected estimate is swamped by noise, while the connected estimate is essentially unaffected (bottom panels). Hence, if one is considering only one sequence, one should always use the connected estimator.

**Figure 2.** Connected vs nonconnected estimate. The estimate of C(t) (equation (39), left panels) is much worse that the estimates of C_c(t) (equation (37), right panels). The (artificial) signal was generated with (41). Top panels: μ = 1; bottom panels: μ = 100. In both cases, σ² = 1, w = 0.95 (τ ≈ 20) and length N = 5 × 10⁴.
Download figure:
Standard image High-resolution image

In figure 3 we see how using samples that are too short affects the correlation estimates. The same artificial signal was generated with different lengths. For the shorter lengths, it is seen that the correlation estimate crosses the t axis (as we have shown it must) but does not show signs of losing correlation. One might hope that N = 1000 ≈ 10τ is enough (the estimate starts to show a flat tail), but comparing to the result of doubling the length shows that it is still suffering from finite-length effects. For this particular sequence, it is seen that at least 20τ to 50τ is needed to get the initial part of the normalized connected correlation more or less right, while a length of about 1000τ is necessary to obtain a good estimate up to t ∼ 5τ. The unnormalized estimator suffers still more from finite size due to the increased error in the determination of the variance (left panel).

If the experimental situation is such that it is impossible to obtain sequences much longer than the correlation time, one can get some information on the time correlation if it is possible to repeat the experiment so as to obtain several independent and statistically equivalent sample sequences. In figure 4 we take several sequences with the same parameters as in the previous example, but quite short (N = 200 ≈ 2τ). As we have shown, it is not possible to obtain a moderately reasonable estimate of C_c(t) using one such sequence (as is also clear from the M = 1 case of figure 4). However, the figure shows how one may benefit from the multiple repetitions of the (short) experiment by averaging together all the estimates. The averaged connected estimates are always far from the actual correlation function, even for M = 500 (the case which contains in total 10⁵ points, which proved quite satisfactory in the previous example): this is consequence of the fact that all connected estimates must become negative. Instead, the averaged non-connected estimates approach quite closely the analytical result even though not reaching the decorrelated region³ . Although it is tricky to try to extract a correlation time from this estimate (due to lack of information on the last part of the decay), this procedure at least offers a way to obtain some dynamical information in the face of experimental limitations.

**Figure 4.** Effect of averaging the estimates of many short sequences. We show the connected (left) and non-connected (right) estimates of M different samples of sequence (41) as indicated in the legend, with N = 200, μ = 0, σ² = 1, w = 0.99 (τ ≈ 100). The analytical result C_c(t) = σ² w^t is also plotted.
Download figure:
Standard image High-resolution image

3.2. Estimation of space correlation functions

To compute space correlations one needs experimental data with spatial resolution. We can assume that each experiment provides us a set of values $\left\{{a}_{i}^{(n)}\right\}$ together with a set of positions $\left\{{\mathbf{r}}_{i}^{(n)}\right\}$ which specify where in the sample the ith value has been recorded. The experiment label, n = 1, ..., M can indicate experiments on different samples, or for a stationary system, it can be a time index.

The positions ${\mathbf{r}}_{i}^{(n)}$ may be fixed by the experiment (e.g. when sampling in space with an array of sensors, in which case they are likely independent of n) or might be themselves part of the experimental result (as when measuring the positions of moving organisms). In the former case, a naive implementation of definition (12) will pick up the structure of the experimental array itself, introducing correlations that are clearly an artefact. In the latter case, the positions may encode nontrivial structural features, which will show up in the correlation function of a together with a-specific effects. For example, particles with excluded volume will have non-trivial density correlations at short range, while there might be long-range correlations due to an ordering of a. It is then wise to untangle the two effects as much as possible. Thus in any case it is convenient to implement the definition in a way that separates structural correlations.

Structural effects can be studied through density correlation functions. One such function is the pair distribution function (Hansen and McDonald 2005, section 2.5), defined for homogeneous systems as

$\begin{equation}g(\mathbf{r})=\frac{1}{{\rho }_{0}N}\left\langle \sum\limits _{i\ne j\vert }{\delta }^{(d)}(\mathbf{r}-{\mathbf{r}}_{ij})\right\rangle =\frac{1}{N{\rho }_{0}}\left\langle \sum\limits _{i\ne j}\frac{\delta \left(\vert \mathbf{r}\vert -\vert {r}_{ij}\vert \right)}{4\pi \vert \mathbf{r}{\vert }^{2}}\right\rangle ,\end{equation} \tag{ 43 }$

where ${\delta }^{(d)}(\mathbf{r})$ is the d-dimensional Dirac's delta, ${\mathbf{r}}_{ij}={\mathbf{r}}_{i}-{\mathbf{r}}_{j}$ is the pair displacement and ρ₀ = N/V is the average density. The second equality is valid for isotropic systems, and g(r) is often called radial distribution function in this case. Defining the density of the configuration $\left\{{\mathbf{r}}_{i}\right\}$ by $\rho (\mathbf{r})={\sum }_{i}\delta (\mathbf{r}-{\mathbf{r}}_{i})$ , the pair distribution function is found to be related to the density-density correlation function by

$\begin{equation}\left\langle \rho ({\mathbf{r}}_{0})\rho ({\mathbf{r}}_{0}+\mathbf{r})\right\rangle ={\rho }_{0}^{2}g(\mathbf{r})+{\rho }_{0}\delta (\mathbf{r})={\rho }_{0}^{2}g(\vert \mathbf{r}\vert )+{\rho }_{0}\frac{\delta (\vert \mathbf{r}\vert )}{4\pi \vert \mathbf{r}{\vert }^{2}},\end{equation} \tag{ 44 }$

where again the second equality applies to isotropic systems. The Dirac delta term arises because in $\left\langle \rho ({\mathbf{r}}_{0})\rho ({\mathbf{r}}_{0}+\mathbf{r})\right\rangle$ one includes the i = j terms that were excluded from the sum in (43). If the $\left\{{\mathbf{r}}_{i}\right\}$ are obtained with periodic boundary conditions (as is often the case in simulation), (43) can be used to estimate g(r) almost directly: it suffices to replace the Dirac delta by a discrete version (i.e. turn it into a binning function) and the average by a (normalised) sum over all available statistically equivalent configurations. For the isotropic case,

$\begin{equation}{\hat{g}}_{k}=\frac{1}{M}\sum\limits _{n}\frac{1}{{\rho }_{0}N}\sum\limits _{ij}\frac{{\Delta}\left[{r}_{ij}^{(n)}-(k+1/2){\Delta}r\right]}{{V}_{k}},\end{equation} \tag{ 45a }$

$\begin{equation}{\Delta}(r)=\begin{cases}1\quad \hfill & \quad \text{if}\enspace -{\Delta}r/2< r\leqslant {\Delta}r/2,\hfill \\ 0\quad \hfill & \quad \text{otherwise},\hfill \end{cases}\end{equation} \tag{ 45b }$

where Δr is the bin width and V_k is the volume of the kth bin, in three dimensions ${V}_{k}=\frac{4}{3}\pi {({\Delta}r)}^{3}\left({(k+1)}^{3}-{k}^{3}\right)$ . The bin width must not be chosen too small or some bins will end up having very few or no particles. In any case, it must be larger than the experimental resolution, otherwise g(r) will pick up spurious correlations arising from the fact that due to finite spatial resolution, all positions effectively lay on the sites of a square lattice.

If borders are non-periodic, as in experimental data, using (49) will introduce bias at large r, because the number of neighbours of a given particle stops growing as r² (the volume of the bin) as soon as r is of the order of the distance of the particle to the nearest border. In this case an unbiased estimator is

$\begin{equation}{\hat{g}}_{k}=\frac{1}{{\rho }_{0}{N}_{k}}\sum\limits _{i\in {S}_{k}}\sum\limits _{j}\frac{{\Delta}(k-{r}_{ij}/{\Delta}r)}{{V}_{k}},\end{equation} \tag{ 46 }$

where i ∈ S_k means that the sum runs only over the set of particles that are at least at a distance (k + 1/2)Δr from the nearest border, and N_k is the number of such particles. In other words, for a given point i, the pair ij only enters the sum if ${\mathbf{r}}_{j}$ lies on a bin (centred at i) that is completely contained within the system's volume. This is known as the unweighted Hanisch method (Hanisch 1984).

We stress that in systems with non-periodic boundaries it is essential to deal adequately with borders. Border effects are not limited to correlation functions, but also affect other observables, such as the average density. The Hanisch method (46) is an unbiased estimator for the radial distribution function, but to apply it one first needs to define the borders, which are to a certain degree arbitrary in the case of a discrete system. The convex hull algorithm can define an unambiguous border by finding the smallest convex set that includes all the system. However, when obvious concavities or 'holes' are present, the convex hull can greatly overestimate the volume (leading to underestimating the average density). To deal with concavities one can use the α-shape algorithm (Edelsbrunner and Mücke 1994), at the price of introducing an arbitrary parameter (see Cavagna et al 2008, for discussion on using these algorithms to determine the borders of biological groups). Details on the correct determination of borders are also discussed by Villegas et al (2021), who study the spatial structure of a rain forest, and show how valuable insight can be obtained from the combined study of structural correlations together with the spatial-scale dependence of density fluctuations.

To conclude our remarks on structural correlation functions, let us note that if correlations in Fourier space are of interest, it is typically better to estimate them directly, rather than going through the real space functions and applying a numerical Fourier transform. For example, the structure factor, which is proportional to the reduced density correlation,

$\begin{equation}S(\mathbf{k})=\frac{1}{N}\left\langle \rho (\mathbf{k})\rho (-\mathbf{k})\right\rangle =1+{\rho }_{0}\tilde{g}(\mathbf{k}),\end{equation} \tag{ 47 }$

where $\tilde{g}(\mathbf{k})$ is the Fourier transform of $g(\mathbf{r})$ , can be computed directly from the ${\mathbf{r}}_{i}^{(n)}$ through

$\begin{equation}S(\mathbf{k})=\frac{1}{M}\frac{1}{N}\sum\limits _{n}\sum\limits _{jl}{\text{e}}^{\text{i}\mathbf{k}\cdot {\mathbf{r}}_{jl}^{(n)}}=1+\frac{1}{NM}\sum\limits _{n}\sum\limits _{i\ne j}\frac{\mathrm{sin}\vert \mathbf{k}\vert {r}_{ij}^{(n)}}{\vert \mathbf{k}\vert {r}_{ij}^{(n)}},\end{equation} \tag{ 48 }$

where the second equality applies to isotropic system and is obtained by analytically averaging the first expression over all angles. For further discussion of density correlation functions we refer the reader to condensed matter texts such as Hansen and McDonald (2005) or Ashcroft and Mermin (1976).

Turning now to the correlations of a, and focusing on the homogeneous case, we seek an estimator of the connected correlation function. How to write it depends on exactly what one means by $a(\mathbf{r})$ (it can be defined as a density or as specific quantity, i.e. per unit volume or per particle), and on the fact that one wants to separate structural correlations. These issues are discussed at length in Cavagna et al (2018). Here it will suffice to state that a good estimator is

$\begin{equation}{\hat{C}}^{(c)}(\mathbf{r})=\frac{\sum\limits _{ij}\delta {a}_{i}\delta {a}_{j}\delta (\mathbf{r}-{\mathbf{r}}_{ij})}{\sum\limits _{kl}\delta (\mathbf{r}-{\mathbf{r}}_{kl})}=\frac{\sum\limits _{ij}\delta {a}_{i}\delta {a}_{j}\delta (r-{r}_{ij})}{\sum\limits _{kl}\delta (r-{r}_{kl})},\end{equation} \tag{ 49 }$

where again $\delta {a}_{i}={a}_{i}-\bar{a}$ and the last expression is for the isotropic case. We have written δ functions for clarity, but clearly one uses binning as in (45). These expressions do a good job of disentangling correlations of the property a from structural correlations. In the case of a fixed lattice, ${\hat{C}}^{(c)}(\mathbf{r})$ is completely emancipated from the effects of the lattice structure, while in the case of moving particles, at least the uncorrelated effects of structure will be taken care of. Also, thanks to the denominator, (49) is free from boundary effects, so that one does need to worry about the border bias that requires the use of the Hanisch method for g(r).

As written, (49) can be computed using a single configuration. Since the system is homogeneous, the mean $\left\langle a\right\rangle$ can also be estimated with a space average,

$\begin{equation}\bar{a}=\frac{1}{N}\sum\limits _{i}{a}_{i}.\end{equation} \tag{ 50 }$

When only one configuration is available, there is no choice but to use that configuration to compute both $\bar{a}$ and ${\hat{C}}^{(c)}(r)$ . If several configurations, statistically equivalent and with the same boundary conditions, are available, there are two possibilities. One is to use all configurations to compute first an estimate of the mean, and then compute the correlation. In this case we shall call the estimate ${\hat{C}}_{\text{ph}}^{(c)}(r)$ , for phase average, because this procedure is closest to actually performing a phase-space, or ensemble, average. The other possibility is to use each configuration to compute the mean and correlation function (with the mean estimated with the same single configuration), and afterwards average over configurations. We shall write ${\hat{C}}_{\text{sp}}^{(c)}(r)$ (for space average) in this case.

An important fact about the estimate (49) in the space-average case is that ${\hat{C}}_{\text{sp}}^{(c)}(\mathbf{r})$ will always have a zero, exactly like the estimator for the connected time correlation (37). The reason is similar, and it can be seen considering that since ∑_i δa_i = 0, then

$\begin{equation}0=\frac{1}{N}\sum\limits _{ij}\delta {a}_{i}\delta {a}_{j}={\rho }_{0}\int \mathrm{d}\mathbf{r}\enspace {g}_{F}(\mathbf{r})C(\mathbf{r}),\end{equation} \tag{ 51 }$

where ${g}_{F}(\mathbf{r})$ is defined as the pair distribution function (43) but without excluding the case i = j from the double sum. Since g_F(r) > 0, it follows that ${\hat{C}}_{\text{sp}}^{(c)}(\mathbf{r})$ must change sign, so there is always an r₀ such that ${\hat{C}}_{\text{sp}}^{(c)}({r}_{0})=0$ . This is an artefact, but it can be exploited to obtain a proxy of the correlation scale in critical systems (section 5.2).

Let us conclude by writing the Fourier-space counterpart of ${\hat{C}}^{(c)}(r)$ . We introduce it as proportional to the reduced correlation,

$\begin{equation}\hat{C}(\mathbf{k})=\frac{1}{N}\delta a(\mathbf{k})\delta (-\mathbf{k})=\frac{1}{N}\sum\limits _{ij}\delta {a}_{i}\delta {a}_{j}\enspace {\text{e}}^{\text{i}\mathbf{k}\cdot {\mathbf{r}}_{ij}}=\frac{1}{N}\sum\limits _{ij}\frac{\mathrm{sin}\vert \mathbf{k}\vert {r}_{ij}}{\vert \mathbf{k}\vert {r}_{ij}}\delta {a}_{i}\delta {a}_{j}.\end{equation} \tag{ 52 }$

It is related to the real space correlation by

$\begin{equation}\hat{C}(k)={\rho }_{0}\int \mathrm{d}\mathbf{r}\enspace {g}_{F}(r){\text{e}}^{-\text{i}\mathbf{k}\cdot \mathbf{r}}{\hat{C}}^{(c)}(r).\end{equation} \tag{ 53 }$

The function g_F(r) appears in the integral (in effect reintroducing structural effects in $\hat{C}(k)$ ) because (52) is more convenient computational-wise than the Fourier transform of (49), and is consistent with an integration measure chosen so that the integral of a(r) equals ∑_i a_i, which is often the global order parameter (see Cavagna et al 2018, appendix A).

3.3. Estimation of space-time correlations

Space-time correlation estimation brings together the issues encountered in the time and space cases. The previous discussion should allow the reader to generalise the estimators written above to the space-time case. For brevity, let us just write the space-time generalisation of (49) and (52) (for homogeneous and stationary systems):

$\begin{equation}{\hat{C}}^{(c)}(\mathbf{r},t)=\frac{1}{T}\sum\limits _{{t}_{0}}\frac{\sum\limits _{ij}\delta {a}_{i}({t}_{0})\delta {a}_{j}({t}_{0}+t)\delta (\mathbf{r}-{\mathbf{r}}_{i}({t}_{0})+{\mathbf{r}}_{j}({t}_{0}+t))}{\sum\limits _{ij}\delta (\mathbf{r}-{\mathbf{r}}_{i}({t}_{0})+{\mathbf{r}}_{j}({t}_{0}+t))},\end{equation} \tag{ 54 }$

$\begin{equation}\hat{C}(\mathbf{k})=\frac{1}{TN}\sum\limits _{{t}_{0}}\sum\limits _{ij}\delta {a}_{i}({t}_{0})\delta {a}_{j}({t}_{0}+t){\text{e}}^{-\text{i}\mathbf{k}\cdot \left[{\mathbf{r}}_{i}({t}_{0})-{\mathbf{r}}_{j}({t}_{0}+t)\right.}.\end{equation} \tag{ 55 }$

4. Correlation time and correlation length

A connected space correlation function measures how correlation is gradually lost as one considers two locations further and further apart. Similarly, the connected time correlation function measures how correlation is gradually lost as time elapses. One often seeks for a summary of this detailed information, in the form of a (space or time) scale that measures the interval it takes for significant decorrelation to happen: these are the correlation length ξ and the correlation time ⁴ τ.

Sometimes these are interpreted as the distance or time after which the connected correlation has descended past a prescribed level. Although this can be a useful operational definition when working with a set of correlation functions of similar shape, these scales are better understood as the asymptotic exponential rate of the decay. The precise numerical value (which will depend on the details of the computation method) is less important than their trend with the relevant control parameters (temperature, concentration, ...): in this sense one can say that ξ and τ are most useful to compare the correlation range as some environmental condition is changed.

Of course, correlation functions are typically more complicated than a simple exponential, and it may make sense to describe them using more than one scale (e.g. as a superposition of exponential decays). The correlation length refers to the rate of the last exponential to die out. More precisely (Amit and Martín-Mayor 2006, sections II.2–3)

$\begin{equation}\xi =\underset{r\to \infty }{\mathrm{lim}}\enspace \frac{r}{-\mathrm{log}\enspace {C}_{c}(r)}.\end{equation} \tag{ 56 }$

This definition, with emphasis on the long-range behaviour, is particularly suited to the study of systems in the critical region. Note that ξ = ∞ does not mean that the correlation never decays, but only that it does so more slowly than an exponential.

Similarly, the correlation time is (Sokal 1997),

$\begin{equation}\tau =\underset{t\to \infty }{\mathrm{lim}}\enspace \frac{t}{-\mathrm{log}\enspace {C}_{c}(t)}.\end{equation} \tag{ 57 }$

These definitions work well when the decay is a combination of exponentials and power laws. Exponentials of powers, on the other hand, never yield a finite scale according to (56) or (57), we defer discussion of those to appendix C.

Clearly definitions (56) and (57) are not directly applicable to extract ξ or τ from correlation functions computed in experiments or simulations, which are of finite range. The rest of this section is devoted to discussing ways to obtain ξ or τ from finite data.

A possibility is to fit the decay to some function, and apply the definitions to this function. This may be fine, as long as one can avoid proliferation of parameters⁵ . However, it can be difficult to fit both the short- and long-range regimes, so an alternative is to attempt fit $\mathrm{log}\left[{x}^{\alpha }{C}_{c}(x)\right]$ (where x stands for r or t) for large x with a straight line −Ax. If the fit is good then one has ξ = 1/A or τ = 1/A. One expects α = 0 for the time case (at least in equilibrium), and in the space case α ≈ 0 in d = 2 and α ≈ 1 in d = 3 according to the scaling form (66). This power has no effect on the definition (56), but is important to obtain a reasonable fit on relatively short range.

Nevertheless, fitting may not be the best strategy, especially if the linear region in the fit is short and the slope is very sensitive to the choice of fitting interval. We present next some alternative procedures.

4.1. Correlation time

A quite general way to define a correlation time is from the integral of the normalized connected correlation,

$\begin{equation}{\tau }_{\mathrm{i}\mathrm{n}\mathrm{t}}={\int }_{0}^{\infty }\rho (t)\enspace \mathrm{d}t.\end{equation} \tag{ 58 }$

Clearly for a pure exponential decay τ_int = τ. In general, if ρ(t) = f(t/τ) then τ_int = const τ, so that τ_int has the same dependence on control parameters as τ. This would change if $\rho (t)={(t/{t}_{0})}^{\alpha }f(t/\tau )$ (Sokal 1997, section 2), but one expects α = 0 in equilibrium. In any case, τ_int has a meaning of its own, as the scale that corrects the variance of the estimate of the mean when computed with correlated samples (section 3.1).

With some care, τ_int can be computed from experimental or simulation data, avoiding the difficulties associated with thresholds or fitting functions. The following procedure (Sokal 1997) is expected to work well if long enough sequences are at disposal and the decay does not display strong oscillations or anticorrelation. If ${\hat{C}}_{k}^{(c)}\approx {C}_{c}(k{\Delta}t)$ is the estimate of the stationary connected correlation (37), then the integral can be approximated by a discrete sum. However, the sum cannot run over all available values k = 0, ..., N − 1, because the variance of ${\hat{C}}_{k}^{(c)}$ for k near N − 1 is large (section 3.1), so that the sum ${\sum }_{k=0}^{N-1}{\hat{C}}_{k}^{(c)}$ is dominated by statistical noise (more precisely, its variance does not go to zero for N → ∞). A way around this difficulty is (Sokal 1997, section 3) to cut-off the integral at a time t_c = cΔt such that c ≪ N but the correlation is already small at t = t_c (i.e. t_c is a few times τ). Thus τ_int is defined self-consistently as

$\begin{equation}{\tau }_{\mathrm{i}\mathrm{n}\mathrm{t}}={\int }_{0}^{\alpha {\tau }_{\mathrm{i}\mathrm{n}\mathrm{t}}}\rho (t)\enspace \mathrm{d}t,\end{equation} \tag{ 59 }$

where α should be chosen larger than about 5, and within a region of values such that τ_int(α) is approximately independent of α. Longer tails will require larger values of α; we have found adequate values to be as large as 20 (Cavagna et al 2012). To solve (59) one can compute $\tau (M)={\sum }_{k}^{M}{\hat{C}}_{k}^{(c)}/{\hat{C}}_{0}^{(c)}$ starting with M = 1 and increasing M until ατ(M) > M.

Another useful definition of correlation time can be obtained from $\tilde{\rho }(\omega )$ , the Fourier transform of ρ(t). Normalisation of ρ(t) implies that $1=\rho (0)={\int }_{-\infty }^{\infty }\frac{\mathrm{d}\omega }{2\pi }\enspace \tilde{\rho }(\omega )$ . Then a characteristic frequency ω₀ (and a characteristic time τ₀ = 1/ω₀) can be defined such that half of the spectrum of $\tilde{\rho }(\omega )$ is contained in ω ∈ [ −ω₀, ω₀] (Halperin and Hohenberg 1969), i.e.

$\begin{equation}{\int }_{-{\omega }_{0}}^{{\omega }_{0}}\frac{\mathrm{d}\omega }{2\pi }\enspace \tilde{\rho }(\omega )=\frac{1}{2}.\end{equation} \tag{ 60 }$

This definition of can be expressed directly in the time domain writing

$\begin{equation}\frac{1}{2}={\int }_{-{\omega }_{0}}^{{\omega }_{0}}\frac{\mathrm{d}\omega }{2\pi }{\int }_{-\infty }^{\infty } \mathrm{d}t\enspace \rho (t){\text{e}}^{\text{i}\omega t}=2{\int }_{0}^{\infty } \mathrm{d}t\enspace \rho (t){\int }_{-{\omega }_{0}}^{{\omega }_{0}}\frac{\mathrm{d}\omega }{2\pi }\enspace {\text{e}}^{\text{i}\omega t}=\frac{2}{\pi }{\int }_{0}^{\infty } \mathrm{d}t\enspace \rho (t)\frac{\mathrm{sin}\enspace {\omega }_{0}t}{t},\end{equation} \tag{ 61 }$

where we have used the fact that ρ(t) is even. Then the correlation time is defined by

$\begin{equation}{\int }_{0}^{\infty }\frac{\mathrm{d}t}{t}\enspace \rho (t)\mathrm{sin}\left(\frac{t}{{\tau }_{0}}\right)=\frac{\pi }{4}.\end{equation} \tag{ 62 }$

It can be seen that if ρ(t) = f(t/τ), then τ₀ is proportional to τ (it suffices to change the integration variable to u = t/τ in the integral above). An advantage of this definition is that it copes well with the case when inertial effects are important and manifest in (damped) oscillations of the correlation function (see figure 10 in appendix C).

4.2. Correlation length

A quite general procedure is to use the second moment correlation length ξ₂ (Amit and Martín-Mayor 2006, section 2.3.1),

$\begin{equation}{\xi }_{2}=\sqrt{\frac{\int \mathrm{d}\mathbf{r}\enspace {r}^{2}{C}_{c}(\mathbf{r})}{\int \mathrm{d}\mathbf{r}\enspace {C}_{c}(\mathbf{r})}}.\end{equation} \tag{ 63 }$

The quotient here ensures that ξ₂ scales with control parameters like ξ (a definition analogous to that of the integral correlation time would not have this property due to the power-law prefactor). ξ₂ can also be expressed in terms of the Fourier transform of ${C}_{c}(\mathbf{r})$ as

$\begin{equation}{\xi }_{2}^{2}=-{\left.\frac{1}{\tilde{C}(\mathbf{k})}\frac{{\partial }^{2}\tilde{C}(\mathbf{k})}{\partial {k}_{\alpha }\partial {k}_{\alpha }}\right\vert }_{k=0}.\end{equation} \tag{ 64 }$

The integrals in (63) can be computed numerically over the finite volume, but (64) may be more convenient (recall that $\hat{C}(\mathbf{k})$ can be computed directly from the data, section 3.2). Numerical derivatives should clearly be avoided; instead one can fit $\tilde{C}(k)=1/(A+B{k}^{2})$ for small k, and obtain the correlation length as ${\xi }_{2}=\sqrt{B/A}$ (which follows from (64) apart from a dimensional-dependent numerical prefactor). Another possibility is to obtain A and B using only $\tilde{C}(0)$ and $\tilde{C}({k}_{\mathrm{min}})$ (Cooper et al 1982), where k_min is the smallest wavenumber allowed by boundary conditions (e.g. k_min = 2π/L in a cubic periodic system), resulting in

$\begin{equation}{\xi }_{2}=\frac{1}{{k}_{\mathrm{min}}}\sqrt{\frac{\hat{C}(0)}{\hat{C}({k}_{\mathrm{min}})}-1}.\end{equation} \tag{ 65 }$

When working with lattice systems, 2 sin(k_min/2) can be substituted for the k_min outside the square root (Caracciolo et al 1993), rendering the expression exact for a Gaussian lattice model. This procedure works well when ξ ≪ L, so that the small wavevectors are seeing an exponential behaviour for r ∼ L.

If A/B is not much larger than L⁻², then ξ is approaching the system size L and the estimate is not very reliable; for ξ of the order of L or larger, A/B may even become negative. In situations like these (which include critical systems), the first root of ${\hat{C}}_{\text{sp}}^{(c)}(r)$ , can provide a useful proxy for the correlation scale, but in that case it is necessary to compare systems of different sizes (see section 5.2).

5. Correlation functions in the critical region

The critical region refers to the volume of parameter space near the critical surface, which is a boundary between different phases of a system. The word phase may refer to a thermodynamic equilibrium phase, but it also applies to systems out of equilibrium, where different phases are characterised by qualitatively different properties (static or dynamic).

The presence of long-range correlations is a key feature of critical systems, so that correlation functions are a fundamental tool to study systems at or near criticality. Note, however, that long-range correlations are not synonymous with the critical point: in systems with a continuous symmetry (such as the classical Heisenberg model, in which spins are vectors on the sphere), the correlation length is infinite across all the broken-symmetry (low-temperature, magnetised) phase (Goldenfeld 1992, chapter 11).

Sufficiently close to the critical surface, the fact that the correlation is much larger than microscopic lengths allows to write the correlation functions for large enough distances in the form of so-called scaling relations. These are functional relations where the dependence on the microscopic parameters enters only through a few control parameters, which are often recast in terms of the correlation length. The relations involve an unknown function, which gives the shape of the correlation decay but is independent of the control parameters, or depends on a subset of them. We state first the scaling relations for the static and dynamic correlation functions as applicable to infinite systems, and afterwards consider the case of finite sizes.

Scaling relations as presented below apply and are well understood within equilibrium thermodynamics, where the system's (near) scale invariance can be exploited using the renormalization group to explain how scaling relations, as well as subleading corrections, arise (Amit and Martín-Mayor 2006, Binney et al 1992, Cardy 1996, Itzykson and Drouffe 1989b, Le Bellac 1991 and Wilson and Kogut 1974). In out-of-equilibrium, and in particular biological, systems our understanding is less firm, although the scale invariance found near critical points can still be exploited (see e.g. Tauber 2014). Scaling has been successfully applied to biological systems (although in this case it is typically essential to take into account the system's finite size, as discussed in section 5.2), but situations might be encountered where matters are more complicated than presented here, especially when disorder or networks with complicated topology are involved.

5.1. Scaling

The connected correlation function for an infinite system near criticality for the case of a single control parameter can be written as

$\begin{equation}{C}_{c}(r)=\frac{1}{{r}^{d-2+\eta }}f(r/\xi )=\frac{1}{{\xi }^{d-2+\eta }}\hat{f}(r/\xi ),\end{equation} \tag{ 66 }$

where f(x) is a function that tends to 0 faster than any power law for x → ∞, and the second equality is just a different way of writing the same scaling law, using $\hat{f}(x)={x}^{-d+2-\eta }f(x)$ . The exponent of the power-law prefactor is conventionally written in this way because η is typically a small correction to d − 2, which is the exponent obtained from dimensional analysis (Goldenfeld 1992, chapter 7). The exponent η is called an anomalous dimension, and is one of the set of critical exponents that describe the singular behaviour of thermodynamic quantities near criticality (see e.g. Binney et al 1992).

As stated, the scaling law (66) applies for large r and finite ξ. At short distances, non-universal dependence on microscopic details may be observed (in particular note that the correlation must be finite for r = 0). Precisely at the critical point, the correlation does not decay exponentially but as a power law,

$\begin{equation}C(r)\sim {r}^{2-d-\eta }.\end{equation} \tag{ 67 }$

In other words, for ξ = ∞ the scaling function becomes a constant, even if it is not true that f(0) is finite. This decay is scale-free, in the sense that if one chooses an arbitrary scale r₀, C(r)/C(r₀) is a function of r/r₀, i.e. it can be scaled with the externally chosen length scale. In contrast, if the decay is e.g. exponential, C(r)/C(r₀) will depend on r/r₀ and on r₀/ξ, i.e. the function has an intrinsic length scale ξ.

If there is more than one control variable that can take the system away from the critical surface then the scaling law can be more complicated than (66) (see Goldenfeld 1992, chapter 9). The scaling of C_c(r) is illustrated in figure 5 for the 2d Ising model.

**Figure 5.** Example of static (top) and dynamic (bottom) scaling in the 2d Ising model. In the top panel, the space correlation function C(r) was computed using equation (49) for the Ising model on a 100 × 100 square lattice at the specified temperatures. The correlation length ξ₂ was computed in k-space, equation (64) fitting the first five points of C⁻¹(k) to a quadratic function. The rightmost plot shows that ξ^η C(r) plotted vs r/ξ falls on the same curve for all temperatures. We have used the exact known value η = 1/4. Bottom panel: the correlation length and time (computed with definition (62)) for several temperatures and system sizes grow on approaching T_c, but saturate at a size-dependent value (left). Right: correlation time vs correlation length for the same sizes and temperatures as in the left panel, illustrating the dynamic scaling (72). The continuous curve is a power law with exponent z = 2.21, the value of the dynamic exponent obtained by Münkel *et al* (1993).
Download figure:
Standard image High-resolution image

From (66) an equivalent scaling law can be written for the correlation in Fourier space:

$\begin{equation}\tilde{C}(k)={\xi }^{2-\eta }F(k\xi )={k}^{-2+\eta }\hat{F}(k\xi ),\end{equation} \tag{ 68 }$

where $\hat{F}(x)={x}^{2-\eta }F(x)$ and now the scaling law is valid for small k, with F(x = 0) finite, so that $\hat{C}(k=0)$ (which is proportional to the susceptibility) diverges when ξ → ∞.

The space-time correlation can also be written in a scaling form, which ultimately leads to a link between ξ and τ. This dynamic scaling relation states that (Halperin and Hohenberg 1969, Hohenberg and Halperin 1977 and Tauber 2014)

$\begin{equation}\tilde{C}(k,\omega )=\tilde{C}(k)\frac{2\pi }{{\omega }_{k}}h(\omega /{\omega }_{k},k\xi ),\qquad {\omega }_{k}={k}^{z}{\Omega}(k\xi ),\end{equation} \tag{ 69 }$

where $\tilde{C}(k)$ is the static correlation function and ω_k is the k-dependent inverse characteristic time defined through (60) applied to $\tilde{C}(k,\omega )$ at fixed k. Equation (69) introduces the dynamic exponent z and two scaling functions: Ω(x), which stays finite for x → ∞, and the shape function h(x), which obeys ${\int }_{-\infty }^{\infty }h(x)\enspace \mathrm{d}x=1$ and ${\int }_{-1}^{1}h(x)\enspace \mathrm{d}x=1/2$ because $C(k)=C(k,t=0)={\int }_{-\infty }^{\infty }C(k,\omega )\enspace \mathrm{d}\omega /2\pi$ . In the time domain the scaling reads

$\begin{equation}C(k,t)=C(k)\hat{h}(t/{\tau }_{k},k\xi ),\qquad {\tau }_{k}={k}^{-z}{{\Omega}}^{-1}(k\xi ).\end{equation} \tag{ 70 }$

In general the correlation time and the shape of the normalised k-dependent time correlation depend on k. If one compares systems with different correlation lengths but at different k such that the product kξ is held constant, then C(k, t)/C(k) will be seen to depend on t and k only through the scaling variable k^z t.

One can multiply and divide by ξ^z to rewrite the scaling of τ_k:

$\begin{equation}{\tau }_{k}={k}^{-z}{{\Omega}}^{-1}(k\xi )={\xi }^{z}{\hat{{\Omega}}}^{-1}(k\xi ),\end{equation} \tag{ 71 }$

where ${\hat{{\Omega}}}^{-1}(x)=x{{\Omega}}^{-1}(x)$ . From (71) we learn the important fact that, at fixed ξ, the correlation time depends on the observation scale, more precisely that it is smaller for shorter length scales (larger k). Since τ must be finite for finite ξ, ${\hat{{\Omega}}}^{-1}(x)$ must be finite for x → 0, and from the second equality we conclude that the relaxation time of global quantities (i.e. at k = 0) grows with growing correlation length as

$\begin{equation}\tau \sim {\xi }^{z}.\end{equation} \tag{ 72 }$

This important relationship highlights the fact that static, or equal-time, correlations, are nevertheless the result of a dynamic process of information exchange, and that the farther this information travels, the slower the system becomes. So static correlations should not be viewed as instantaneous correlations; in fact they contain contributions from all frequencies, since $C(k,t=0)={\int }_{-\infty }^{\infty }\tilde{C}(k,\omega )\enspace \mathrm{d}\omega /(2\pi )$ .

Note finally that in disordered systems the correlation time may grow with correlation length faster than a power law, leading to extremely slow dynamics even far from a critical point, as it happens in structural glasses (see e.g. Kob 2004). We do not consider here such systems, which require more complicated correlation functions (multi-point, or point-to-set) to detect growing correlations (Biroli et al 2008). A brief discussion of point-to-set correlations with emphasis on numerical simulation can be found in Cavagna et al (2010b); broader theoretical reviews are Berthier and Biroli (2011), Biroli and Bouchaud (2012) and Lubchenko and Wolynes (2007).

5.2. Finite-size effects

Scaling laws as written in the previous subsection apply only to infinite systems. Although infinite systems exist only in theory, experimental systems can be considered infinite when ξ is much smaller than the system's linear size L. It is true that when the critical point is approached, ξ will eventually grow to be larger than any system size. However, in condensed matter systems samples are typically many orders of magnitude larger than microscopic lengths, and ξ can be extremely large with respect to inter-particle distances while still being smaller than L. The condition ξ ⩾ L will only be true for a range of temperatures so thin around the critical point that it this effect can in practice be ignored.

In contrast, in experiments in biology the number of units making up the system (aminoacids, cells, organisms) is quite far from the 10²³ or so that condensed matter systems can boast, and the effects of finite size must be properly taken into account. The same is true for numerical simulations, where in fact finite-size effects have long been used (Privman 1990) to extract information about the critical region. More recently, the effects of finite size have also been exploited in experiments on biological systems.

Finite-size effects blur the sharp transitions and singularities that distinguish phase transitions in the thermodynamics of infinite systems. If a critical system is finite, it can be scale invariant only up to the its size L; in this sense L acts as an effective correlation length if ξ > L. Conversely, if ξ is finite but larger than L, the system will appear nevertheless scale invariant. Thus if L is finite, scaling relations must be modified to account for the fact that correlations cannot extend beyond L. These modified relations are known as finite-size scaling relations (Barber 1983 and Cardy 1988).

According to the finite-size scaling ansatz (Barber 1983), when ξ is of the order or greater than L the scaling relations must be modified to include the ratio L/ξ. This ansatz can be justified with renormalisation group arguments (Amit and Martín-Mayor 2006 and Cardy 1988). For the static correlation function, instead of (66) we write

$\begin{equation}C(r)=\frac{1}{{r}^{d-2+\eta }}f(r/\xi ,L/\xi ),\quad \text{if}\enspace L > \xi ,\end{equation} \tag{ 73 }$

with f(x, y) → f(x) for y → ∞, and

$\begin{equation}C(r)=\frac{1}{{r}^{d-2+\eta }}g(r/L,L/\xi )\quad \text{if}\enspace L< \xi ,\end{equation} \tag{ 74 }$

with g(x) another scaling function.

At the critical point, we are effectively left with a scaling relation of the same type as (66) but with L replacing ξ (Cardy 1988),

$\begin{equation}C(r)=\frac{1}{{r}^{d-2+\eta }}g(r/L)=\frac{1}{{L}^{d-2+\eta }}\hat{g}(r/L)\end{equation} \tag{ 75 }$

(and a different scaling function). Correlations are truly scale-free only in the limit L → ∞, where the scaling function is replaced by a constant. In a finite system, the function g(x) will introduce a modulation of the power law when r ∼ L. Scale invariance manifests instead as a correlation scale that is proportional to L: enlarging the system also enlarges the correlation scale. This, together with the fact that the space-averaged correlation function has at least one zero, can be exploited to show that a system is scale-invariant. Calling r₀ the first zero of C_sp(r), one can show (Cavagna et al 2018, sections 2.3.3) that

$\begin{equation}{r}_{0}\sim \xi \enspace \mathrm{log}(L/\xi ),\quad L\gg \xi ,\end{equation} \tag{ 76a }$

$\begin{equation}{r}_{0}\sim L,\quad L\ll \xi .\end{equation} \tag{ 76b }$

We emphasise that r₀ is not in general a correlation length, since it behaves differently: it does not have a finite limit for L → ∞ (even away from the critical point) and it does not scale like ξ with control parameters. However, if ξ = ∞, then r₀ is proportional to L (the only correlation scale). So, if one can show that (76b) holds for a set of systems of different sizes, but otherwise equivalent, one can argue that those systems are in fact scale-free. Such reasoning has been used e.g. in Attanasi et al (2014), Cavagna et al (2010a), Fraiman and Chialvo (2012) and Tang et al (2017).

To expand on this important issue, we must note that scale invariance, as evidenced by the validity of relation (76b) can arise when the system's parameters are fixed at the values corresponding to the critical point in the thermodynamic limit (ξ = ∞), as shown above, but in a subtler way as well (also related to criticality): if one varies the system size and control parameters together in such a way that the ratio L/ξ remains constant, then relation (74) still leads to (75) and (76). This is what Attanasi et al (2014) find for midge swarms: swarms of different sizes form at different densities (density is the control parameter here), and yet for the set of swarms it still holds that r₀ ∼ L, implying a scaling of the form (75). Since ξ is changing from system to system (because the density changes), what must be happening is that for each size the density is adjusted so that the ratio L/ξ is constant across all swarms, and furthermore ξ must be much larger than L, or r₀ would not be proportional to L. It is remarkable that although midges are not programmed to fly at some predefined distance from one another, they still find a way to tune themselves to near-criticality and display scale-free behaviour. How a tuning of this kind can happen is open to speculation (see Attanasi et al 2014 and Chialvo et al 2020), but it is important to note that, as Attanasi et al (2014) emphasise, the critical point is not strictly defined for systems of finite size, and that the maximum susceptibility will be found for size-dependent values of the control parameters. This pseudo-critical value can be noticeably different from the rigorous thermodynamic-limit critical value. We thus have to consider the possibility that some biological systems are critical not because through evolution they found optimal values of their working parameters, but because evolution has endowed them with some means of adjusting those parameters.

Finally, turning to Fourier space, the finite-size version of (68) can be written, for L < ξ,

$\begin{equation}\hat{C}(k;L)={k}^{-2+\eta }F(kL,L/\xi )={L}^{2-\eta }\hat{F}(kL,L/\xi ).\end{equation} \tag{ 77 }$

The finite-size version of the dynamic scaling law (70) is

$\begin{equation}\tilde{C}(k,t;L)=\tilde{C}(k;L)\hat{h}(t/{\tau }_{k},kL,L/\xi ),\qquad {\tau }_{k}={k}^{-z}{{\Omega}}^{-1}(kL,L/\xi ).\end{equation} \tag{ 78 }$

Then, at the scale-free point ξ = ∞ it holds that $\tilde{C}(k,t;L)$ measured for systems of different size depends on k^z t if one measures each system at a k such that kL = const. This scaling law has been successfully applied to study the dynamical behaviour of swarms of midges in the field (Cavagna et al 2017). For the global relaxation time one has, instead of (72)

$\begin{equation}\tau ={L}^{z}{\hat{{\Omega}}}^{-1}(L/\xi ).\end{equation} \tag{ 79 }$

To conclude, let us point out that from what we have said it follows that a single measurement of ξ on a finite system does not tell one anything about whether correlations are long range or not. An unknown numerical prefactor is involved in the estimation of ξ, and the estimators we have discussed tend to be less reliable when ξ ∼ L, so that it does not make sense to discuss whether, say, ξ = L/2 is large or not. Instead, it is necessary to measure the trend of ξ with control parameters, and if at all possible, with L. Moreover, studying different system sizes can show whether the correlation scale grows with L and establish that the system is scale-free in situations where changing the control parameters is not feasible.

There can be situations, especially in biological experiments, where changing system size is not possible (e.g. how would one study human brains of different sizes?). In such circumstances, one may attempt to do finite-size scaling by measuring subsystems ('boxes') of size W ≪ L and studying how the quantities change when varying W. The idea can be traced back to Binder (1981), and has also been employed out of equilibrium (Fernández and Martín-Mayor 2015). The applicability of relations (76) with the box size W in place of L for systems of fixed size was recently demonstrated for several simulated systems by Martin et al (2021).

6. Space-time correlations in a neuronal network model

The finding of scale-free avalanches in cortical tissue (Beggs and Plenz 2003 and Shew et al 2009) and strong correlations in retinal cells (Schneidman et al 2006, Tkačik et al 2015 and 2006) are early experimental evidence of criticality in neuronal populations (see Mora and Bialek (2011) and Chialvo (2010) for reviews of these and more recent experiments). At the level of the whole brain, fMRI measurements of neuronal activity (Expert et al 2011, Fraiman and Chialvo 2012 and Tagliazucchi et al 2012) have found long-range correlation, providing support for the conjecture that the brain operates at a critical point (Bak 1996, Beggs 2008 and Chialvo and Bak 1999). Dynamic systems studies (e.g. Deco and Jirsa 2012) have also provided support for the criticality hypothesis.

Notwithstanding the importance of these works in forwarding our understanding of how complex behaviour can arise in the brain, it is fair to say that the picture of the brain as a critical system is still work in progress, and surely new experiments will be attempted in the near future. Since critical systems display distinctive dynamic as well as static characteristics, much information can potentially be gained by studying systematically static and dynamic correlations together on the same system, and in particular the scaling relations. For instance, study of static correlations in midge swarms using finite-size scaling led to the finding (Attanasi et al 2014) that they are in a critical regime compatible with the critical Vicsek model (Vicsek et al 1995). A subsequent investigation of dynamic correlations in the same system showed that they obey dynamic scaling (Cavagna et al 2017), but with a novel dynamic exponent, incompatible with the dynamics of the Vicsek model. The exponent turned out to be different from those of the classical dynamic models (Hohenberg and Halperin 1977), and this spurred still-ongoing efforts to understand whether a combination of inertia and activity ingredients can explain this new exponent (Cavagna et al 2019).

Clearly, magnets, swarms and brains are quite different systems, and in particular the special network structure of neuronal assemblies poses specific problems (Korchinski et al 2021). Certainly it is not the case to take tools developed in the statistical mechanics of condensed matter and blindly apply them to study the brain. But knowledge of other critical systems suggests the relation between correlation time and length scales can hold valuable information. An optimal opportunity to attempt this kind of study may come from novel opto-genetics techniques being applied to the simultaneous recording of the activity of thousands of neurons (see Emiliani et al 2015, for a review). In that direction we can mention the recent experimental report (Ribeiro et al 2020) describing the relation (76) with the box size W computed from opto-genetic data from the visual and auditory cortices of freely behaving mice. The question, of course, deserves careful experimental study, but we wish to conclude this article with an exploration of a simple excitable network model, which can serve as some illustration to the exposition of correlation functions, and hopefully stir up interest for experimental studies of length and time correlation scales in the brain.

6.1. The ZBPCC model

We choose the simple neural network model proposed by Zarepour et al (2019). The model is a Greenberg–Hastings cellular automaton (Greenberg and Hastings 1978) implemented on a small-world network (this model was also used by Haimovici et al (2013) but on a different graph). In detail, each site of the network has three possible states, S_i = {Q, E, R}, corresponding to quiescent, excited or refractory, respectively, and the transitions among them are ruled by the probabilities

$\begin{equation}{P}_{i,Q\to E}=1-\left[1-{r}_{1}\right]\left[1-{\Theta}\left(\sum\limits _{j}{W}_{ij}{\delta }_{{S}_{j},E}-T\right)\right],\end{equation} \tag{ 80a }$

$\begin{equation}{P}_{i,E\to R}=1,\end{equation} \tag{ 80b }$

$\begin{equation}{P}_{i,R\to Q}={r}_{2}.\end{equation} \tag{ 80c }$

P_i,a→b is the probability that site i will transition from a to state b, Θ(x) is Heaviside's step function (Θ(x) = 1 for x ⩾ 0 or 0 otherwise), δ_i,j is Kronecker's delta, r₁, r₂ and T are numeric parameters and W_ij is the network's connectivity matrix (see below). Thus an active site always turns refractory in the next time step, and a refractory site becomes quiescent with probability r₂. We set r₂ = 0.3 as in previous work (Haimovici et al 2013 and Zarepour et al 2019), so that the site remains refractory for a few time steps. The probability for a quiescent site to become active is written as 1 minus the product of the probabilities of not becoming active through the two mechanisms at work: spontaneous activation, which occurs with a small probability r₁ (mimicking an external stimulus), or transmitted activation, which occurs with certainty if the sum of the weights of the links connecting i to its active (excited) neighbours exceeds a threshold T. All sites are updated simultaneously, and we chose r₁ = 10⁻⁶ so that external excitation events are relatively rare.

The model runs on an undirected weighted graph described by the connectivity matrix elements W_ij ⩾ 0, where a nonzero element means a bond with the specified weight connects sites i and j, and W_ij = W_ji. Neither the connectivity nor the weights depend on time. The graph is a bidirectional Watts–Strogatz small-world network (Watts and Strogatz 1998) with average connectivity ⟨k⟩ and rewiring probability π. The network is constructed as usual (Watts and Strogatz 1998) by starting from ring of N nodes, each connected symmetrically to its ⟨k⟩/2 nearest neighbours; then each link connecting a node to a clockwise neighbour is rewired to a random node with probability π, so that average connectivity is preserved. The rewiring probability is a measure of the disorder in the network; π = 0 preserves the original network topology, while π = 1 gives a completely random graph. Once the network bonds are drawn, each is assigned a random weight drawn from an exponential distribution, p(W_ij = w) = λ e^−λw, with λ = 12.5 chosen to mimic the weight distribution of the human connectome (Zarepour et al 2019).

The simplest way to characterise the state of the network is perhaps the activity order parameter, i.e. the fraction of active sites

$\begin{equation}a=\frac{1}{N}A,\qquad A=\sum\limits _{i}{\delta }_{{S}_{i},E}.\end{equation} \tag{ 81 }$

By analogy to magnetic systems, one can define an associated susceptibility, proportional to the variance of a,

$\begin{equation}\chi =\frac{1}{N}\left[\langle {A}^{2}\rangle -\langle {A\rangle }^{2}\right],\end{equation} \tag{ 82 }$

where ⟨...⟩ is a time average.

We computed network activity as a function of T for π = 0.2 and average connectivity $\left\langle K\right\rangle =12$ and $\left\langle K\right\rangle =6$ (figure 6). In both cases we find what appears to be a continuous transition, where activity goes smoothly to zero at T_a ≈ 0.19 for $\left\langle K\right\rangle =12$ and at T_a ≈ 0.12 for $\left\langle K\right\rangle =6$ . The transition is however unusual in that χ has a (not very sharp) maximum at a different value of T, and the height of the peak in χ does not scale with size (i.e. unlike usual second order transitions, here χ never diverges).

**Figure 6.** Dynamic transition of the ZBPCC model on the Watts–Strogatz small world network with π = 0.2 and average connectivity $\left\langle K\right\rangle =12$ (left) and $\left\langle K\right\rangle =6$ (right). The plots show activity (blue) and susceptibility (orange) vs threshold level T. Symbol type indicates network size. Curves are obtained after averaging over 2 to 10 realisations of the network. Sizes are N = 5 × 10³, 10⁴, 2 × 10⁴, 5 × 10⁴, 10⁵, 2 × 10⁵, 5 × 10⁵ and 10⁶. In both cases r₁ = 10⁶ and r₂ = 0.3.
Download figure:
Standard image High-resolution image

According to Zarepour et al (2019), for this value of π and $\left\langle K\right\rangle =12$ one has a continuous transition, as we find here, but $\left\langle K\right\rangle =6$ lies in the 'no transition' region of $(\pi ,\left\langle K\right\rangle )$ space (see their figure 4). This discrepancy is due to the fact that Zarepour et al (2019) chose to study the percolation of active clusters instead of the total activity. Here we prefer the simpler activity parameter, which can be measured without knowledge of network connectivity (which may be difficult to obtain experimentally). Also it turns out that the transition described in terms of the activity order parameter is consistent with the dynamical behaviour, as discussed next.

The time correlation of the activity reveals a noticeable slow-down of the dynamics near T_a. We have computed the correlation time τ_a, defined according to (62), from the time correlation function of the order parameter (see figure 7). The correlation time shows a peak that grows with system size at the activity transition T_a. Looking at the normalised time correlation function after one step, C_c(t = 1), a quantity which peaks at a regular second order phase transition (Chialvo et al 2020), we find it displays a clear peak at T_a.

The peak of τ_a vs T for growing system size can also be appreciated in logarithmic scale (figure 7(c), inset), with curves of τ_a vs |T − T_a| saturating at higher values for larger systems. Unfortunately, it is not possible to explore standard dynamic scaling in this model, because it is defined on a graph with no spatial structure, so that space correlations cannot be defined. We make an attempt using χ (which in a ferromagnetic transition would scales with a power of ξ, although here it never diverges). Somewhat surprisingly, a reasonable scaling behaviour is obtained plotting N^−1/2 τ vs χ/N, i.e. the variance of a.

Finally, inspection of the full correlation function shows a qualitative change in its shape, with damped oscillations found at low T giving way to an overdamped-like decay at higher T (figure 7(d)).

6.2. ZBPCC on the fcc lattice

To overcome the impossibility of computing space correlations on the ZBPCC model, we explore the simplest strategy to endow the ZBPCC with a spatial structure: we build a graph connecting the nearest neighbours of a regular lattice, and then rewire it with probability π as in the small world case. We chose the face-centred cubic (fcc) lattice, which has connectivity K = 12 (a value in which a transition is observed both with the present approach focusing on the activity and in the approach of Zarepour et al).

Figure 8 shows the behaviour of the activity for the pure fcc lattice (π = 0) and for the fcc lattice rewired with π = 0.2. The fcc lattice shows a situation similar to the small world, with a transition in a but a maximum in χ shifted with respect to T_a, although for the rewired lattice the maximum is closer to T_a. In both cases T_a ≈ 0.19. These results suggest that T_a depends mainly on the network connectivity, and is not very sensitive to the details of network topology.

In this version of the model we can use the underlying fcc lattice to define distances and compute space correlations. We have computed C(k) and extracted ξ₂ from it using definition (64) and a quadratic fit at small k (figure 9). For π = 0, the largest ξ is about a tenth of the lattice size, and finite-size effects are modest. On the contrary, on the rewired fcc, there is a strong size effect. However, instead of the situation usual in a regular lattice, with ξ vs T curves superimposing at high T and saturating at different values on approaching the critical point (see figure 5), here ξ grows with L almost uniformly for all T, and the ξ vs T curves seem to scale when dividing by L (figure 9(b), inset). Given that this effect only appears with rewiring, it may be related to the fact that when a link is rewired, the new neighbour is chosen randomly over all the nodes without regards to the euclidean distance, a procedure that is not very realistic.

**Figure 9.** ZBPCC on the fcc lattice (left column) and the fcc rewired with π = 0.2 (right column). Panel (a): the correlation length and time have a peak at T_a. Panel (b): in the case of the rewired network, ξ has a strong dependence on lattice size, indicated by symbol colour. Panels (c) and (d): relationship between τ and ξ.
Download figure:
Standard image High-resolution image

Finally we have computed the correlation time of the activity. This shows a size-dependent maximum at T_a in both cases, confirming the dynamical relevance of T_a. Comparing τ_a and ξ shows that both grow together, but the attempt at applying dynamic scaling (figures 9(c) and (d)) is not very satisfactory. A more detailed analysis is needed of the role played by topological details and network disorder.

This brief exploration of neuronal network models should serve to illustrate the usefulness of a combined study of space and time correlations on these systems, even though their behaviour will not necessarily turn out to be exactly like what is found in condensed matter systems. In this case, the time correlation of the activity suggests that total activity is the relevant order parameter to characterise the network state and dynamic transition. However, several questions remain unanswered at this point, such as the relationship between the activity transition and the percolation transition studied by Zarepour et al (2019), the reasons behind the scarce size-effects of the correlation length in the fcc case, and the strange scaling of ξ with L and of τ with ξ in the rewired fcc. The somewhat artificial way space has been introduced in the model is perhaps to blame for some of the strange features of space correlations, and it is worthwhile to consider models with realistic spatial structure and connectivity, using directly the human connectome for a graph, as in Haimovici et al (2013) or Ódor (2016). However, a thorough exploration of these issues would take us too far from the aim of the present article, and are left to be pursued in future work.

7. Conclusions

We have presented space (static), time (dynamic), and space-time correlation functions, and discussed how they can be defined and computed in a variety of situations. We have also considered how to define and compute characteristic correlation lengths and times, and summarised our knowledge of scaling relations and finite-size effects for systems near a critical point. Finally we have used the simple ZBPCC model as an illustration of the kind of analysis that could be carried out in more realistic neuronal models, or even in experiments on networks of neurons.

We have argued that these tools from statistical physics are extremely useful to study collective phenomena in biological systems, because they are designed to capture information about the space and time structure of fluctuations around average quantities, and in that structure lies much information on the nature of complex systems. It should be clear that correlation functions are not 'one size fits all'. We have discussed a variety of them, and only within two-point correlations. The choice of the fluctuating quantity to consider can lead to different descriptions of the same system, as the example of the ZBPCC model shows, and reconciling seeming contradictions will only expand one's understanding of the system under study. In this sense, more is better, and especially so when dealing with biological systems where expectations based on naive transposition of equilibrium thermodynamics ideas about phase transitions can prove wrong. In particular, we have advocated studying both space and time correlations whenever possible. Also in the case of slowly driven out-of-equilibrium systems, their study together with other indicators of criticality such as power-law distributions of relaxation events (avalanches) can be highly informative (see Jensen 2021, in this focus issue).

We hope this paper will be useful to students and researchers outside statistical physics interested in computing correlation functions from simulation and experimental data. It might also help convince experimentalists in biology to invest in the necessary effort to acquire new experimental data with enough resolution to compute space and time correlations.

Acknowledgments

I thank D Chialvo for helpful discussions on the ZBPCC model, and for bringing some references to my attention. I also thank A Cavagna and I Giardina, who over a long collaboration helped shape my understanding of concepts and practicalities of correlation functions, especially as applied to biological systems. This work was supported in part by the BRAIN initiative Grant U19 NS107464-01, and by Grants from Universidad Nacional de La Plata and Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET, Argentina).

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Appendix A.: Stochastic processes

A stochastic process, or sometimes random field, is a family of random variables indexed by $\mathbf{r}$ and t. Thus there must exist a family of probability densities $P(a,\mathbf{r},t)$ that allows to compute all moments $\left\langle {a}^{n}(\mathbf{r},t)\right\rangle$ and in general any average $\left\langle f\left(a(\mathbf{r},t)\right)\right\rangle$ . But $P(a,\mathbf{r},t)$ is not enough to characterise the stochastic process, because in general the variables $a(\mathbf{r},t)$ at different places/times will not be independent. Thus $P(a,\mathbf{r},t)$ is actually a marginal distribution of some more complicated multivariate probability density.

A complete characterisation of the stochastic process would need an infinite-dimensional probability distribution $P\left[a(\mathbf{r},t)\right]$ that gives the joint probability for all random variables $a(\mathbf{r},t)$ . However, one can usually avoid the difficulties associated with such a distribution by considering the set of n-variable joint distributions

$\begin{equation}{P}_{n}({a}_{1},{\mathbf{r}}_{1},{t}_{1},{a}_{2},{\mathbf{r}}_{2},{t}_{2},\dots ,{a}_{n},{\mathbf{r}}_{n},{t}_{n}).\end{equation} \tag{ A1 }$

Most stochastic processes (Priestley 1981, section 3.2.1) can be completely specified by the set of all joint probabilities of the form (A1) for all n and all possible choices of ${\mathbf{r}}_{1},{t}_{1},\dots ,{\mathbf{r}}_{n},{t}_{n}$ . This should include all processes of interest in physics and biology. To define two-point correlation functions we need only the set corresponding to n = 2 (7), which gives also the set for n = 1 by marginalising on the second variable.

The joint probability distributions that define the process are TTI if they obey (we omit the space variables for clarity)

$\begin{equation}{P}_{n}({A}_{1},{t}_{1},\dots ,{A}_{n},{t}_{n})=P({A}_{1},{t}_{1}+s,\dots ,{A}_{n},{t}_{n}+s)\end{equation} \tag{ A2 }$

for all times, s and n. Stochastic processes obeying TTI are called completely stationary processes. A related but less restrictive notion is that of stationary processes up order M, defined by the requirement that all the joint moments up to order M exist and are TTI:

$\begin{equation}\left\langle {A}^{{m}_{1}}({t}_{1}){A}^{{m}_{2}}({t}_{2})\dots {A}^{{m}_{n}}({t}_{n})\right\rangle =\langle {A}^{{m}_{1}}({t}_{1}+s){A}^{{m}_{2}}({t}_{2}+s)\dots {A}^{{m}_{n}}({t}_{n}+s)\rangle \end{equation} \tag{ A3 }$

for all s, n, {t₁, ..., t_n} and {m₁, ..., m_n} such that m₁ + m₂ + ⋯ + m_n ⩽ M. This is less restrictive not only because of the bound on the number of random variables considered, but also because invariance is imposed only on the moments, and not on the joint distributions themselves.

Appendix B.: The estimator ${\hat{C}}_{k}^{(c)}$ always has a zero

An important property of estimator (37) will always have zero, whatever the length of the sequence used to compute it. To see this, consider the quantity

$\begin{equation}{B}_{i}=(N-i){\hat{C}}_{c,i}=\sum\limits _{j=1}^{N-i}\delta {a}_{j}\delta {a}_{j+i},\end{equation} \tag{ B1 }$

and compute the sum

$\begin{equation}\sum\limits _{i=0}^{N-1}{B}_{i}=\sum\limits _{i=0}^{N-1}\sum\limits _{j=1}^{N-i}\delta {a}_{j}\delta {a}_{j+i}=\sum\limits _{i=0}^{N-1}\sum\limits _{j=1}^{N}\sum\limits _{k=1}^{N}\delta {a}_{j}\delta {a}_{k}{\delta }_{k,j+i}.\end{equation} \tag{ B2 }$

But ${\sum }_{i=0}^{N-1}{\delta }_{k,j+i}$ equals 1 if k ⩾ j and 0, so

$\begin{align}\hfill \sum\limits _{i=0}^{N-1}{B}_{i}=\sum\limits _{j=1}^{N}\sum\limits _{k=j}^{N}\delta {a}_{j}\delta {a}_{k}=\frac{1}{2}\sum\limits _{j\ne k}^{N}\delta {a}_{j}\delta {a}_{k}+\sum\limits _{j=1}^{N}{(\delta {a}_{j})}^{2}\\ \hfill =\frac{1}{2}{\left[\sum\limits _{j=1}^{N}\delta {a}_{j}\right]}^{2}+\frac{1}{2}\sum\limits _{j=1}^{N}{(\delta {a}_{j})}^{2}=\frac{1}{2}\sum\limits _{j=1}^{N}{(\delta {a}_{j})}^{2} > 0,\end{align} \tag{ B3 }$

where the last equality follows because ∑_j δa_j = 0. Now we can easily do the same sum starting from i = 1:

$\begin{equation}\sum\limits _{i=1}^{N-1}{B}_{i}=-{B}_{0}+\sum\limits _{i=0}^{N-1}{B}_{i}=-\frac{1}{2}\sum\limits _{j=1}^{N}{(\delta {a}_{j})}^{2}< 0.\end{equation} \tag{ B4 }$

This shows that at least some of the B_i must be negative. But since B₀ > 0, the conclusion is that B_k, and hence ${\hat{C}}_{c,k}$ , which differs from it by a positive factor, must change sign at least once for k ⩾ 1.

Appendix C.: The shape of the correlation decay

Figure 10 illustrates several (simple) possible shapes of a decaying function. It should make it clear why a threshold to define a correlation scale is misleading unless the curves to be compared are all the same shape. Apart from the simple exponential ρ(t) = e^t/τ and the sum of exponentials, the figure illustrates two interesting cases.

epsilon — **Figure 10.** Different possible shapes for the decay of the time correlation. Top left: simple exponential. Top right: simple exponential (black curve) and double exponential (red curve) decay. In this case, the threshold criterion (here = 0.1) labels the red curve as the fastest, but it clearly has a longer tail. Bottom left: stretched exponential. The integral correlation time (58) is τ_int ≈ 22.7 (black curve), τ_int ≈ 92.6 (red curve). Bottom right: exponentially damped harmonic oscillations.
Download figure:
Standard image High-resolution image

One is the Kolrausch–William–Watts functions (lower left panel),

$\begin{equation}{\rho }_{\text{K}}(t)={\mathrm{e}}^{-{(t/{\tau }_{\text{K}})}^{\beta }},\end{equation} \tag{ C1 }$

a function widely used to describe non-exponential decays, employing only three parameters. Here τ_K is a time scale and β is the stretching exponent: the KWW function gives a stretched exponential for β < 1, and a compressed exponential for β > 1. Definition (57) gives 0 in the compressed case, and ∞ in the stretched case. This can be understood considering the decay as a superposition of exponential processes,

$\begin{equation}{\rho }_{\text{K}}(t)={\int }_{0}^{\infty }w(\tau ){\mathrm{e}}^{-t/\tau }\enspace \mathrm{d}\tau ,\end{equation} \tag{ C2 }$

which defines the correlation time distribution function w(τ). It can be seen that the support of w(τ) extends to infinity for β < 1 (Lindsey and Patterson 1980). The integral correlation time is instead more useful here,

$\begin{equation}{\tau }_{\mathrm{i}\mathrm{n}\mathrm{t}}=\frac{{\tau }_{\text{K}}}{\beta }{\Gamma}\left(\frac{1}{\beta }\right),\end{equation} \tag{ C3 }$

where Γ(x) is Euler's gamma function. So τ_int is proportional to τ_K, but τ_int incorporates information about the shape of the tail of the decay, making it better suited to compare two decays with different β.

The other case is that of damped oscillations, that appear in systems with important inertial effects. In the simplest harmonic form we have

$\begin{equation}{\rho }_{\;\text{osc}}(t)={\mathrm{e}}^{t/\tau }\enspace \mathrm{cos}({\omega }_{0}t).\end{equation} \tag{ C4 }$

The integral (58) and spectrum (62) scales are

$\begin{equation}{\tau }_{\mathrm{i}\mathrm{n}\mathrm{t}}=\frac{\tau }{1+{({\omega }_{0}\tau )}^{2}},\qquad {\tau }_{0}=\frac{\tau }{\sqrt{1+{({\omega }_{0}\tau )}^{2}}}.\end{equation} \tag{ C5 }$

In the overdamped case, ω₀ τ ≪ 1 both scales tend to τ, but in the opposite limit ω₀ τ ≫ 1, where the function oscillates many times before significant damping is observed, τ_int ∼ 0 while τ₀ ∼ 1/ω₀.

Appendix D.: Algorithms to compute time correlations

The estimators can be computed numerically by straightforward implementation of equations (35) or (37), although in the stationary case it is much more efficient to compute the connected correlation through relation (31) using a fast Fourier transform (FFT) algorithm. Let us focus on the stationary case and examine in some detail these algorithms.

Algorithm 1 presents the direct method. It is straightforward to translate the pseudo-code to an actual language of your choice. Apart from some missing variable declarations, the only thing to consider is that it is probably inconvenient (or even illegal, as in classic C) to return a large array, and it is better to define C as an output argument, using a pointer or reference (as e.g. FORTRAN or C do by default) to avoid copying large blocks of data. The advantages of this algorithm are that it is self-contained and simple to understand and implement. Its main disadvantage is that, due to the double loop of lines 8–11, it runs in a time that grows as N². For N up to about 10⁵, algorithm 1 is perfectly fine: a good implementation in a compiled language should run in a few seconds in a modern computer. But this time grows quickly; in the author's computer N = 5 × 10⁵ takes 35 s, for N = 10⁶ the time is two and a half minutes. In contrast, the algorithm with FFT takes 1 s for N = 10⁶ and 11 s for N = 10⁷.

Algorithm 1. Compute the connected correlation of sequence a (of length N) using the direct O(N²) method. The connected correlation is returned in vector C.

1: function TIMECORR(a, N)
2: μ ← 0	⊳ Compute average
3: for i = 1, ..., N do
4: μ ← μ + a_i
5: μ ← μ/N
6: for i = 1, ..., N do	⊳ Clear C vector
7: C_i ← 0
8: for i = 1, ..., N do	⊳ Correlation loop
9: d ← a_i − μ
10:for k = 0, ..., N − i do
11: C_k+1 ← C_k+1 + d*(a_i+k − μ)
12: for i = 1, ..., N do	⊳ Normalize and return
13: C_i ← C_i/(N − i − 1)
14: return C

If the correlation of really long sequences is needed, the FFT-based algorithm, though more difficult to get running, pays off with huge savings in CPU time at essentially the same numerical precision. The idea of the algorithm is to compute the Fourier transform of the signal, use (30) to obtain the Fourier transform of the connected correlation, then transform back to obtain C_c(t). This is faster than algorithm 1 because the clever FFT algorithm can compute the Fourier transform in a time that is O(N log N).

Actually, we need discrete versions of the Fourier transform formulas (as we remarked before, the Fourier transform of the continuous time signal does not exist). The discrete Fourier transform (DFT) and its inverse operation are defined (Press et al 1992, section 12.1) as (it is convenient to let the subindex of a_i run from 0 to N − 1 to write the following two equations),

$\begin{equation}{\tilde{a}}_{k}=\sum\limits _{j=0}^{N-1}{\mathrm{e}}^{2\pi ijk/N}{a}_{j},\qquad {a}_{j}=\frac{1}{N}\sum\limits _{k=0}^{N-1}{\mathrm{e}}^{-2\pi ijk/N}{\tilde{a}}_{k},\end{equation} \tag{ D1 }$

where we note that the inverse DFT effectively extends the sequence periodically (with period N). The discrete version of (30) is (Press et al 1992, section 13.2)

$\begin{equation}{\tilde{D}}_{k}=\vert {\tilde{a}}_{k}{\vert }^{2},\quad \text{where}\enspace {D}_{j}=\sum\limits _{k=0}^{N-1}{a}_{k}{a}_{k+j},\end{equation} \tag{ D2 }$

and where the definition of D_j makes use of the (assumed) periodicity of a_i. D_j is almost our estimate (37): we only need to take care of the normalization and of the fact that due to the assumed periodicity of a_i some past times are regarded as future, e.g. for k = 10, in the sum there appear the terms a₀ a₁₀ up to a_N−11 a_N−1 (which are fine), but also a_N−10 a₀ through a_N−1 a₉, which we do not want included. This is fixed by padding the original signal with N zeros at the end, i.e. setting a_k = 0 for k = N, ..., 2N − 1 and ignoring the values of D_j for j ⩾ N.

In summary, to compute the connected correlation using FFT the steps are (i) estimate the mean and subtract from the a_i, (ii) add N zeros to the end of the sequence, (iii) compute the DFT of the sequence, (iv) compute the squared modulus of the transform, (v) compute the inverse DFT of the squared modulus, (vi) multiply by the 1/(N − i) prefactor. Pseudocode for this algorithm is presented as algorithm 2.

Algorithm 2. Compute the connected correlation of sequence a (of length N) using a FFT. This algorithm is O(N log N).

1: function TIMECORR(a, N)
2: μ ← 0	⊳ Compute average
3: for i = 1, ..., N do
4: μ ← μ + a_i
5: μ ← μ/N
6: for i = 0, ..., N do	⊳ Subtract the average from signal
7: a_i ← a_i − μ
8: for i = N, ..., 2N do	⊳ Pad with 0s at the end
9: a_i ← 0
10: b ← FFT(a, 2N)	⊳ Compute the FFT of a as a vector of length 2N
11: for i = 1, ..., 2N do	⊳ Compute squared modulus of b
12: ${b}_{i}{\leftarrow}{\left\vert {b}_{i}\right\vert }^{2}$	⊳ Note that the Fourier transform is complex
13: C ← IFFT(b, 2N)	⊳ Inverse FFT
14: C ← RESIZE(C, N)	⊳ Discard the last N elements of C
15 for i = 1, ..., N do	⊳ Normalize and return
16: C_i ← C_i/(N − i − 1)
17: return C

To translate this into an actual programming language the comments made for algorithm 1 apply, and in addition some extra work is needed for lines 10–13. First, one needs to choose an FFT routine. The reader curious about the FFT algorithm can read for example Press et al (1992) (chapter 12) or Duhamel and Vetterli (1990), but writing an FFT routine is not easy, and implementing a state-of-the-art FFT is stuff for professionals. Excellent free-software implementations of the FFT can be found on Internet. FFTW (Frigo and Johnson 2005), at http://fftw.org deserves mention as particularly efficient, although it is a large library and a bit complex to use. Note that some simpler implementations require that N be a power of two, failing or using a slow O(N²) algorithm if the requirement is not fulfilled. Also pay attention to (i) the difference between 'inverse' and 'backward' DFTs (the latter lacks the 1/N factor), (ii) how the routine expects the data to be placed in the input array, (iii) how it is returned, and (iv) whether the transform is done 'in place' (i.e. overwriting the original data) or not. If the routine is a 'complex FFT' it will expect complex input data (so that for real sequences you will have to set to zero the imaginary part of the a_i), while if it is a 'real FFT' routine it will typically arrange ('pack') the output data in some way, making use of the discrete equivalent of the $\tilde{A}(-\omega )={A}^{\ast }(\omega )$ symmetry so as to return N real numbers instead of 2N. This affects the way one must compute the squared modulus (lines 11–12). For example, for the packing used by the FFTW real routines, lines 11–12 translate to (in C)