Inter-regional correlation estimators for functional magnetic resonance imaging

Functional connectivity between brain regions in functional magnetic resonance imaging (fMRI) is often computed using parcellations defined by functional or structural atlases. Typically, some form of voxel averaging is performed to obtain a single temporal correlation estimate per region pair. However, several estimators can be defined for this task, with different assumptions and degrees of robustness to local noise, global noise, and region size. In this paper, we systematically present and study the properties of 9 functional connectivity estimators that take into account the spatial structure of fMRI data, based on a simple spatial model of fMRI data. These include 3 existing estimators and 6 novel estimators. We demonstrate the empirical properties of the estimators using synthetic, animal, and human data, in terms of graph structure, repeatability and reproducibility, discriminability, dependence on region size, and robustness to local and global noise. We prove analytically the link between regional intra-correlation and inter-region correlation, and show that the choice of estimator has a strong influence on inter-correlation values.


Introduction
Functional connectivity of the brain is estimated from observations using non-invasive techniques such as electroencephalography (EEG), magnetoencephalography (MEG) or functional magnetic resonance imaging (fMRI).
Each recording provides time series associated with spatial locations within regions of the brain. Functional connectomes, that is, graphs representing the estimated connectivity, are then constructed by computing dependence measures between the time series. For fMRI, in addition to the choice of acquisition sequence, physiological noise (Caballero-Gaudes and Reynolds, 2017), preprocessing (Braun et al., 2012), and subject motion, it has been shown that the computation of connectomes is affected by three main parameters: the length of the acquisition (Whitlow et al., 2011; Van Dijk et al., 2010), the number of regions (Stanley et al., 2013; Cao et al., 2019) and the chosen frequency band (Cordes et al., 2001; Salvador et al., 2005; Braun et al., 2012; Chen and Glover, 2015). In addition, sample size also plays a role in group comparisons (Termenon et al., 2016).
Typically, each region of the brain, defined by a structural or functional parcellation, is associated with a given set of voxels amongst the thousands for which a signal is recorded. The idea is then to extract a representative of the set of voxels in order to attach one time series to each region. When structural atlases are used, the most common approach is to take the average of the voxel time series at each time point. Indeed, almost 70% of papers on PubMed that used the Human Connectome Project dataset to conduct functional-connectivity-related studies in the last five years (e.g., Ogawa (2021); Figueroa-Jimenez et al. (2021); Bolt et al. (2017); Zhang et al. (2016)) use this method. While there are numerous other approaches to connectivity estimation (including the related techniques for estimating parcellations from connectivity, see, e.g., Eickhoff et al. (2015), or using regional medians (Braun et al., 2012) or eigenvectors (Büchel and Friston, 1997; Braun et al., 2012) instead of means), we focus on correlations between averaged regional time-courses, and argue that this technique introduces bias in the estimation of functional connectomes.
In the statistical literature, aggregation of data is often used because either the data are collected in different groups (or organizations, or regions), or one wishes to increase the signal to noise ratio. However, difficult challenges arise due to the presence of correlations within the collected datasets.
Measurement error can impact inter-group correlations. For example, in Ostroff (1993), the objective was to evaluate the relationships among variables by studying correlations between satisfaction and technology at two levels: individual scores (i.e., individual correlation) and scores of individuals grouped into organizations (organizational correlation). Depending on three main factors, namely the measurement errors, the within-group variance, and the ratio of intra-group variance to total variance, the ratio of organizational correlation to individual correlation was shown to take values between -1 and 2. This means that it is crucial to identify the way the data are generated. Measurement errors have been well studied in fMRI. They are related to both the hardware and the subject (Greve et al., 2013), and are known to impart correlation structure to the data that is not linked to neural activity (Jo et al., 2010; Murphy et al., 2013).
Group size can also influence inter-group correlations. In studies of familial data (Rosner et al., 1977), specific characteristics are obtained for different families of different sizes. The quantification of family resemblance is studied in this context, with the estimation of sibling correlations and/or parent-offspring correlations (see Donner and Eliasziw (1991) for a review). The difficulties arise because of the dependence between the children and the different number of children per family. It was noted that the correlation between children and parents and the average of correlations between all children and parents are not equal in the majority of cases. In fMRI, the dependence of inter-region correlations on brain region size has been noted (Achard et al., 2011), and it has been shown that autocorrelation depends on region size (Afyouni et al., 2019). Regional homogeneity approaches are a wide class of methods where intra-regional connectivity is used to quantify neurobiological differences (Jiang and Zuo, 2016). This observation has also recently been confirmed using Wasserstein distances between intra-correlation densities in patients with Alzheimer's disease (Petersen et al., 2016).
Finally, the spatial aspect of the data also complicates correlation estimation. Computing correlations is common when geostatistics is applied to ecology, geography, climate studies, and more. The data collected in these fields are attached to spatial positions and usually exhibit spatial correlation.
This problem was first reported by Student (1914), and studied in Clifford et al. (1989) for two spatial processes. Applications of these methods can be found, for example, in the study of meteorological data (Gunst, 1995) and ecological data (Liebhold and Sharov, 1998), but also in fMRI, for instance in clustering (Ye et al., 2009, 2011).
In all these fields of application, the main difficulty is to take into account spatial correlation when the goal is to construct estimators of correlation and to build testing procedures when the averaged variables are not independent.
Thus, in this paper, we question the default choice of using correlations of averages of voxel time-courses, and examine in detail the assumptions of various methods and their robustness to various types of noise. We first propose a simple definition of a spatial model of fMRI with intra-correlations within brain regions. Then, computations of correlations are described, and we show the potential bias in the estimators. Based on simulations, we illustrate the good behaviour of the newly introduced estimators. Finally, we conclude with results on human and rat data.

Definition of the proposed spatial model for fMRI data
Let C ⊂ Z^d, d ∈ N*, be a finite compact set of multi-indices. In the context of our application, d = 3 and C contains all 3-tuples indexing the voxels of a three-dimensional image of a brain. Each brain is virtually partitioned into J regions of interest, represented through their subsets of voxels R_j of cardinal #R_j = N_j, j = 1, ..., J. We thus have R_j ∩ R_{j′} = ∅ for j ≠ j′ and ∪_{j=1}^J R_j ⊆ C. For any d-tuple i ∈ C, a signal Y_i(·) sampled at times t = 1, ..., T is observed, and we assume that it can be decomposed as follows:

Y_i(t) = X_i(t) + ε_i(t) + e(t),   (1)

where X_i(·) is an unobserved signal of interest, ε_i(·) is a local noise contaminating the signal X_i(·) locally, and e(·) is a global noise corrupting in the same way the signal measured in each voxel indexed by an element of C. The global noise can be, e.g., a consequence of thermal or background noise (Lazar, 2008; Greve et al., 2013), which at high field strength has been shown to explain a high proportion of noise variance (Greve et al., 2011).
We make a few assumptions on these different components. First, we assume that all random variables are centered. We also assume that the signals X i (·), ε i (·) and e(·) are mutually independent and independent in time and that the global noise is homoskedastic with a variance denoted as σ e ≥ 0.
Note that these are not very strong assumptions, and precise mathematical definitions of these concepts as well as discussions are provided in Appendix B.
Let us now describe the spatial nature of the signal and of the local noise.
For any i, i′ ∈ C, j, j′ = 1, ..., J (j ≠ j′) and for all t = 1, ..., T, we assume

Cor(X_i(t), X_{i′}(t)) = r_{jj′}   if i ∈ R_j and i′ ∈ R_{j′},
Cor(X_i(t), X_{i′}(t)) = ρ_{ii′}   if i, i′ ∈ R_j,
Cor(ε_i(t), ε_{i′}(t)) = η_{ii′}   if i, i′ ∈ C.

The parameter r_{jj′} represents the correlation between two (unobserved) signals of two different regions R_j and R_{j′} and is called the inter-correlation between regions R_j and R_{j′} in the following. This is the parameter of interest we focus on in the rest of the paper. The parameter ρ_{ii′} (resp. η_{ii′}) represents the intra-correlation between two (unobserved) signals (resp. the spatial correlation between two local noises) inside a common region. We assume that inside each region the signals of interest have positive intra-correlation, and that for each time t and for j = 1, ..., J, (X_i(t))_{i∈R_j} (resp. (ε_i(t))_{i∈C}) is a stationary random field defined over R_j (resp. C). We furthermore assume that both the correlations ρ_{ii′} (for any i, i′ ∈ R_j for some j) and η_{ii′} (for i, i′ ∈ C) are stationary and isotropic with respect to the uniform distance, that is, ρ_{ii′} and η_{ii′} depend only on the (uniform) distance between the two voxels i and i′. For brevity, we still write ρ_{ii′} for ρ_{|i′−i|} and η_{ii′} for η_{|i′−i|}, where for x ∈ Z^d the notation |x| stands for the uniform norm. Our a priori is that the intra-correlation ρ_δ is close to 1 for moderate distances δ, meaning that close neighbours are strongly (positively) correlated. For the local noise, we assume that there exists p such that the local noise is p-dependent, i.e., η_δ = 0 for any δ ≥ p. Without loss of generality, we also intrinsically assume that for all i ∈ R_j and i′ ∈ R_{j′} with j ≠ j′, ε_i(t) and ε_{i′}(t) are uncorrelated. This simplifies the presentation in the next sections.
Given a parcellation of the brain, the objective is to estimate intercorrelations r jj for each pair of regions of interest. We do not model the distribution of Y i but only its second-order properties (through X i , ε i , e).
Moreover, we consider the intra-correlations, the correlation of the local noise and the different variances as nuisance parameters that we do not want to estimate. In the next section, we present various estimators of r_{jj′} built to address one (or several) of the following problems: (1) the regions of interest R_j and R_{j′} may contain different numbers of voxels; (2) the intra-correlation may deviate strongly from 1 (especially for large regions); (3) there may be a non-negligible local noise ε_i affecting the signal in each region; (4) there may be a global noise affecting all regions.
For any j = 1, ..., J, we define a ν-neighbourhood, denoted by V, as a subset of n_ν := (2ν + 1)^d indices, all of which are at a distance less than or equal to ν from the centre of the neighbourhood. For any set of indices E (which could be a ν-neighbourhood or a region of interest) and any spatio-temporal signal Y, we denote by Ȳ_E(t), t = 1, ..., T, the time series spatially averaged over E, that is

Ȳ_E(t) = (#E)^{−1} Σ_{i∈E} Y_i(t).

To sum up, we reserve the bold notation mainly for vectors of length T, one overline notation for averages over time, and another for averages over space. Hence, for instance, σ̂²(Ȳ_E) denotes the sample variance of the vector with components (#E)^{−1} Σ_{i∈E} Y_i(t) for t = 1, ..., T. We also let

ρ̄_E = (#E)^{−2} Σ_{i,i′∈E} ρ_{ii′}   and   η̄_E = (#E)^{−2} Σ_{i,i′∈E} η_{ii′}.   (2)

The quantity ρ̄_E represents the (spatial) average intra-correlation inside the set E. If E corresponds to a ν-neighbourhood V with moderate ν, we may expect ρ̄_V to be close to 1. Such an observation is probably less realistic when E = R_j, especially for large regions. The quantity η̄_E is related to the (spatial) correlation structure of the local noise. Since this noise is assumed to be p-dependent (that is, η_δ = 0 when δ ≥ p), it is clear that the larger #E, the smaller η̄_E.
Using the assumptions given in Section 2.1, for any E ⊆ R_j and E′ ⊆ R_{j′},

Cov(Ȳ_E(t), Ȳ_{E′}(t)) = σ_j σ_{j′} r_{jj′} + σ_e².

The variance can also be deduced as follows:

Var(Ȳ_E(t)) = σ_j² ρ̄_E + σ_{ε,j}² η̄_E + σ_e².   (3)

The detail of this result is given in Proposition Appendix C.1.
In the next sections, we fix j and j′ and thus aim to estimate r_{jj′}.

Existing inter-correlation estimators
We first review existing inter-regional correlation estimators, using a unified notation throughout.¹ Results on consistency of the estimators are provided in Appendix C-Appendix G.4.
¹ In a previous study (Achard et al., 2011), we already described three of the estimators discussed here (ca, ac, and a local variant of ca), but not with a unified notation, as well as a fourth estimator which is only discussed in the appendix of the present paper.

Correlation of averages (method ca)
In order to increase the signal-to-noise ratio, the most standard method in fMRI is to average (or sometimes convolve with a Gaussian kernel) the signal in space, in each region of interest. The correlation of averages estimator corresponds to the standard estimator (see Section 1) considered for example in Achard et al. (2006), Bolt et al. (2017) or Ogawa (2021):

r̂^{ca}_{jj′} = Ĉor(Ȳ_{R_j}, Ȳ_{R_{j′}}).

This estimator, illustrated in Figure 1, was designed to reduce the local noise. Indeed, in the absence of global noise (σ_e² = 0) and assuming equal variances, this estimator tends to

r_{jj′} / √( (ρ̄_{R_j} + σ_ε² η̄_{R_j}) (ρ̄_{R_{j′}} + σ_ε² η̄_{R_{j′}}) ).

The interest of averaging before computing correlations is clear: the local noise is smoothed, so η̄_{R_j} = O(1/N_j) is probably very small. However, even in the absence of noise (σ_ε = σ_e = 0), r̂^{ca} has a serious drawback, since it estimates r_{jj′} / √(ρ̄_{R_j} ρ̄_{R_{j′}}), which is highly dependent on intra-correlation. To give an example, assume r_{jj′} = 1/2, N_j = N_{j′} = 2 and ρ_1 = 0; then ρ̄_{R_j} = ρ̄_{R_{j′}} = 1/2 and r̂^{ca}_{jj′} tends to 1. This is a caricature, but it illustrates what may happen for large regions when some of the signals X_i are not sufficiently positively correlated.
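The toy example above can be checked numerically. The following sketch (hypothetical Python code, not part of the paper's R implementation) simulates two regions of two voxels each with zero intra-correlation between distinct voxels and true inter-correlation 0.5, and shows that the correlation of averages is pushed towards 1:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
r = 0.5  # true inter-correlation r_jj'

# Two regions of N_j = N_j' = 2 voxels; distinct voxels within a region are
# uncorrelated (rho_1 = 0), every cross-region pair has correlation r.
cov = np.array([
    [1.0, 0.0, r,   r  ],
    [0.0, 1.0, r,   r  ],
    [r,   r,   1.0, 0.0],
    [r,   r,   0.0, 1.0],
])
Y = rng.multivariate_normal(np.zeros(4), cov, size=T)

# Correlation of averages (method ca): rho_bar = 1/2 in each region,
# so its limit is r / sqrt(1/2 * 1/2) = 2r = 1.
r_ca = np.corrcoef(Y[:, :2].mean(axis=1), Y[:, 2:].mean(axis=1))[0, 1]
```

With this covariance, the two regional averages are in fact almost surely equal, so r̂^{ca} is essentially 1 even though the voxel-level inter-correlation is only 0.5.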
That fact was already pointed out by Achard et al. (2011).

Average of correlations (method ac)
Instead of evaluating the correlation of spatial averages, it is natural to compute the (spatial) average of correlations. This estimator, illustrated in Figure 1, is given by:

r̂^{ac}_{jj′} = (N_j N_{j′})^{−1} Σ_{i∈R_j} Σ_{i′∈R_{j′}} Ĉor(Y_i, Y_{i′}).

As seen from Table 1, in the absence of global noise (σ_e² = 0) and when the variances are equal to 1, this estimator estimates the quantity r_{jj′}/(1 + σ_ε²), which makes it robust to large regions (for which ρ̄_{R_j} may be far from 1) but more sensitive to local noise than the estimator r̂^{ca}_{jj′}.

Replicates for correlations (method r)
In order to cancel out the effect of local noise, we introduce to fMRI a slight adaptation of the estimator introduced by Bergholm et al. (2010) in the context of image analysis. It is based on the concept of replicates within the same region: pairs of voxels at a (small) distance δ from each other are treated as replicated measurements of the regional signal. The estimator is denoted by r̂^R (R for replicates), is illustrated in Figure 1, and is defined for replicate pairs i_1, i_2 ∈ R_j and i′_1, i′_2 ∈ R_{j′} (with |i_2 − i_1| = |i′_2 − i′_1| = δ) by

r̂^R_{jj′} = √( ( Ĉor(Y_{i_1}, Y_{i′_1}) Ĉor(Y_{i_2}, Y_{i′_2}) ) / ( Ĉor(Y_{i_1}, Y_{i_2}) Ĉor(Y_{i′_1}, Y_{i′_2}) ) ).   (6)

Under equal variances and in the absence of global noise, r̂^R_{jj′} estimates r_{jj′}/|ρ_δ|; the factor 1/|ρ_δ| is clearly independent of σ_ε² and may be expected to be close to one if δ is small.
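A minimal numerical sketch of the replicate idea (hypothetical Python, under the model assumptions; the exact construction of Bergholm et al. (2010) may differ): with intra-correlation ρ_δ = 0.9, true inter-correlation 0.6 and local noise of unit variance, the average of correlations is attenuated by 1/(1 + σ_ε²), whereas the replicate ratio cancels the noise:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000
r, rho, sig_eps = 0.6, 0.9, 1.0  # inter-corr, intra-corr at distance delta, noise SD

# Signal covariance for 2 voxels per region (unit variances).
cov = np.array([
    [1.0, rho, r,   r  ],
    [rho, 1.0, r,   r  ],
    [r,   r,   1.0, rho],
    [r,   r,   rho, 1.0],
])
X = rng.multivariate_normal(np.zeros(4), cov, size=T)
Y = X + sig_eps * rng.standard_normal((T, 4))  # add i.i.d. local noise

C = np.corrcoef(Y.T)
r_ac = C[:2, 2:].mean()  # average of correlations: tends to r/(1 + sig^2) = 0.3
# Replicate ratio: cross-region correlations divided by within-region ones.
r_rep = np.sqrt(C[0, 2] * C[1, 3] / (C[0, 1] * C[2, 3]))  # tends to r/rho ~ 0.67
```

The attenuation factor 1/(1 + σ_ε²) cancels in the ratio, which is why the replicate construction is insensitive to local noise; it pays instead the price of the residual factor 1/ρ_δ.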
Novel estimators: discarding the effect of global and/or local noise

Use of a priori uncorrelated regions (method d based on differences)

We now present an estimator which handles the problem of global noise.
To achieve this task, we assume that among the regions where the signal is recorded, there are at least two regions, say R_k and R_{k′}, which are uncorrelated with each other and with all the other regions. With a slight abuse of notation, k, k′ will be used for the indices of these two regions, while j, j′ will be used when we are interested in the inter-correlation between regions R_j and R_{j′} (hence r_{jk} = r_{jk′} = r_{j′k} = r_{j′k′} = 0). This assumption is realistic in the context of fMRI data, where we are interested in the correlations between cortical regions. Indeed, the field of view is typically larger than the brain itself, and the definition of extra regions is possible, for instance using air voxels or muscle voxels. The estimator is illustrated in Figure 1.
We propose the following strategy:

r̂^D_{jj′} = c(Ȳ_{R_j}, Ȳ_{R_k}, Ȳ_{R_{j′}}, Ȳ_{R_{k′}}) / ( s(Ȳ_{R_j}, Ȳ_{R_k}, Ȳ_{R_{k′}}) s(Ȳ_{R_{j′}}, Ȳ_{R_k}, Ȳ_{R_{k′}}) ),

where for four vectors Y_1, Y_2, Y_3 and Y_4 (with same length)

c(Y_1, Y_2, Y_3, Y_4) = Ĉov(Y_1 − Y_2, Y_3 − Y_4),

and where for three vectors U, V and W with same length

s(U, V, W)² = Ĉov(U − V, U − W).

The intuition of this estimator is quite simple. Assume that the local noise has zero variance. Since the noise e(·) is global, subtracting Ȳ_{R_k} (or Ȳ_{R_{k′}}) from a regional average cancels e(·). And since the regions R_k and R_{k′} are not correlated with each other nor with the other regions, the numerator is an estimate of σ_j σ_{j′} r_{jj′}. Then, we just have to divide by estimates of σ_j (and σ_{j′}). We observe that this cannot be done using simply the sample standard deviation of Ȳ_{R_j} − Ȳ_{R_k}, since its square also includes the variance of Ȳ_{R_k}. This justifies the introduction of s².
Note that r̂^D_{jj′} is still biased with respect to local noise (see Table 1).
A more formal proposition and proof for this estimator are provided in Appendix E.
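The differencing idea can be illustrated with a small simulation (a hypothetical Python sketch, not the paper's implementation): a shared global noise inflates the naive correlation, while subtracting the averages of two null regions removes it.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200_000
r, sig_e = 0.4, 1.0  # true inter-correlation, global-noise SD

# Region-averaged signals: regions j, j' correlated at r; null regions k, k'
# uncorrelated with everything (unit variances throughout).
cov = np.eye(4)
cov[0, 1] = cov[1, 0] = r
X = rng.multivariate_normal(np.zeros(4), cov, size=T)
e = sig_e * rng.standard_normal(T)  # global noise shared by all regions
yj, yjp, yk, ykp = (X + e[:, None]).T

# Naive correlation is inflated by the shared noise: (r + sig_e^2)/(1 + sig_e^2)
r_naive = np.corrcoef(yj, yjp)[0, 1]  # ~ 0.7 instead of 0.4

# Differencing against the null regions cancels e(t):
num = np.cov(yj - yk, yjp - ykp)[0, 1]    # estimates sigma_j sigma_j' r
s2j = np.cov(yj - yk, yj - ykp)[0, 1]     # estimates sigma_j^2
s2jp = np.cov(yjp - yk, yjp - ykp)[0, 1]  # estimates sigma_j'^2
r_d = num / np.sqrt(s2j * s2jp)           # ~ 0.4
```

Note how s² recovers σ_j² cleanly: in Ĉov(Ȳ_j − Ȳ_k, Ȳ_j − Ȳ_{k′}), the global noise cancels in each difference and the two null-region terms are independent of each other.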

Replicates and use of a priori disconnected regions: method rd
Combining replicates with the idea based on differences motivates the following estimator (see Sections 2.3.3 and 2.4.1 for notation), obtained by applying the replicate construction to the difference-based covariances. It is worth pointing out that r̂^{RD}_{jj′} is independent of σ_ε and σ_e and equals the unknown r_{jj′} if ρ_δ is close to 1. A more formal proposition and proof for this estimator are provided in Appendix F.

Localized versions of inter-correlation estimators
As mentioned previously, when noisy signals are averaged, the signal-to-noise ratio increases. A very popular method in neuroimaging analyses is to apply Gaussian smoothing to the fMRI volumes (Worsley et al., 1992, 1996; Poline et al., 1997). However, applying a large kernel width may have a dramatic effect on brain connectivity (Triana et al., 2020). Some earlier work on PET connectivity used a local neighbourhood centered around voxels of interest to smooth the signal in each region prior to connectivity estimation (Köhler et al., 1998). We introduce in this section estimators using local neighbourhoods to control the smoothing effect on correlation estimates.

Local correlation of averages (local method ca)
Motivated by the first two estimators, we propose to estimate r jj using an empirical average of local spatial averages.

Local average of replicates (local method r)
This estimator consists in replacing single indices by neighborhoods in (6).
The local average of replicates estimator is defined accordingly.

Local averages and use of disconnected regions (local method d)

We use in particular the notation introduced in Sections 2.5.2 and 2.4.1 to propose a local version of r̂^D_{jj′}, in which regional averages are replaced by averages over ν-neighbourhoods.

Replicates, local averages and use of a priori disconnected regions (local method rd)

This estimator is a local version of r̂^{RD}_{jj′}, defined by the same substitution.

Summary of estimators
We have formalised 9 estimators for inter-region correlation in fMRI, 6 of which are novel to the best of our knowledge. They vary in terms of their theoretical sensitivity to three factors: differences in region sizes and region intra-correlations (ρ̄_{R_j} ≠ 1), local noise (σ_ε), and global noise (σ_e). Table 1 summarises estimator properties qualitatively, using − for estimators that are sensitive to a factor, + for estimators that are insensitive, and ± for those in-between. The ca, ac, r, and d estimators are also sketched in Figure 1.
As an example, let us interpret the properties of ca shown in Table 1 in terms of these factors. First, we observe that the limit of r̂^{ca}_{jj′}/r_{jj′} strongly depends on the region size. Indeed, even in the absence of noise this limit is 1/√(ρ̄_{R_j} ρ̄_{R_{j′}}), which can be quite far from 1, especially for very large regions (so the estimator is sensitive to region size and intra-correlation, and denoted − in the corresponding column). Now imagine that ρ̄_{R_j} ρ̄_{R_{j′}} = 1 and that σ_e² = 0; then the limit becomes 1/√( (1 + σ_{ε,j}² η̄_{R_j}) (1 + σ_{ε,j′}² η̄_{R_{j′}}) ). Since η̄_E is expected to be small (see (C.3)), especially for large sets E, this limit should be quite close to 1 in this situation (+). Finally, if ρ̄_{R_j} ρ̄_{R_{j′}} = 1 and σ_ε² = 0, and assuming for simplicity that σ_j = σ_{j′} = 1, then r̂^{ca}_{jj′} converges towards (r_{jj′} + σ_e²)/(1 + σ_e²), which can significantly deviate from r_{jj′} when the global noise is strong (−).
This does not describe at all the finite sample properties of the different estimators. Obviously, we could be tempted to always use the last two estimators (method rd and its local version), which seem to be the most robust to additional noise.
However, it is clear that the price to pay for reaching robustness is accuracy.
We propose to investigate these finite sample properties in a simulation study (Section 2.7.1) and real datasets (Sections 2.7.2 and 2.7.3).
We note that evaluating asymptotic variances of the different estimators would add too much notation, assumptions and technicalities, and is left for future work.
R code implementing all estimators, as well as NIFTI time-series extraction and preprocessing, is available at https:

Table 1: Properties of the inter-correlation estimators. We refer the reader to Section 2.2 for details on notation. In particular, σ²_{e,j}, σ²_{ε,j} and σ²_{e,jj′} are given by (3), while ρ̄_E and η̄_E for two sets of indices E, E′ are given by (2) and (C.7). The sensitivity of the estimators to three factors is reported: differences in region sizes and region intra-correlations (ρ̄_{R_j} ≠ 1), local noise (σ_ε), and global noise (σ_e). Estimators that are sensitive to a factor are denoted −, those that are insensitive are denoted +, and those in-between are denoted ±.

Datasets
We used three different datasets to evaluate our estimators: a simulated dataset, a rat dataset including both dead and live animals, and a healthy human subject dataset with test-retest data.

Simulated data
For different sets of parameters (local noise, global noise), we generated, according to model (1), two regions containing 20 and 40 voxels respectively, each voxel corresponding to a time series of length 1000, resulting in a total of 60 time series. This simulation was done in dimension 1 (d = 1) to save computation time, and we abusively keep the terminology voxel for a one-dimensional cell. We assumed that the signal, local noise and global noise followed Gaussian distributions. Throughout the simulations, the true inter-correlation was set to 0.6, and the local noises ε_i(t) and ε_{i′}(t) were assumed to be uncorrelated. The intra-correlation was defined by the following spatial model: ρ_{|i′−i|} = max(1 − |i′ − i|/100, 0.6). This corresponds to a setting where two voxels inside a given region have moderate intra-correlation.
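A sketch of this simulation design (hypothetical Python; the paper's code is in R) builds the full signal covariance from the intra-correlation kernel and a constant inter-correlation, then compares the ca and ac estimates in the noise-free case:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N1, N2, r = 20_000, 20, 40, 0.6  # longer series than the paper, for stability

def intra(n):
    # rho_{|i-i'|} = max(1 - |i-i'|/100, 0.6) on a 1-D grid of n cells
    d = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.maximum(1 - d / 100, 0.6)

# Full signal covariance: intra-correlation blocks, constant inter-correlation r.
cov = np.block([
    [intra(N1),            np.full((N1, N2), r)],
    [np.full((N2, N1), r), intra(N2)           ],
])
X = rng.multivariate_normal(np.zeros(N1 + N2), cov, size=T)

C = np.corrcoef(X.T)
r_ac = C[:N1, N1:].mean()  # average of correlations: close to r with no noise
r_ca = np.corrcoef(X[:, :N1].mean(axis=1), X[:, N1:].mean(axis=1))[0, 1]
# r_ca overshoots r because rho_bar < 1 in both regions.
```

In this noise-free run ac recovers r, whereas ca is inflated by the factor 1/√(ρ̄_{R_1} ρ̄_{R_2}), illustrating the bias discussed in Section 2.3.1.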
Rat data

Twenty-five rats were scanned and assigned to 4 different groups: DEAD, ETO-L (Etomidate), ISO-W (Isoflurane) and MED-L (Medetomidine). The first group contains dead rats, and the last three groups correspond to different anesthetics. In this paper, we show results with data from three rats: one dead and two alive with different anesthetics (ETO-L, ISO-W).
The duration of the scanning was 30 minutes, using single-shot echo-planar imaging with TR/TE = 500/20 ms, so that 3600 time points were available at the end of the experiment. The resolution was 0.47 × 0.47 × 1.00 mm, with a 0.1 mm slice gap and 9 slices. After preprocessing as explained in Becq et al. (2020b), 51 brain regions were extracted for each rat using an in-house atlas. Sufficiently large regions are needed to be able to use the r estimator.
We hence discarded regions that contained fewer than 40 voxels, and were left with 18 brain regions: The anterior cingulate cortex (ACC), bilateral Insular cortex (Ins r and Ins l), bilateral primary motor cortex (M1 r and M1 l), bilateral somatosensory 1 (S1 r and S1 l), bilateral somatosensory 1 barrel field (S1BF r and S1BF l), bilateral auditory cortex (AU r and AU l), bilateral caudate-putamen (striatum) (CPu r and CPu l), bilateral thalamus (Th r and Th l), bilateral basal forebrain region (BF r and BF l), bilateral hippocampus (HIP r and HIP l).
Voxel time series were wavelet-filtered using a Daubechies orthonormal compactly supported wavelet of length 8.

Human Connectome Project data
We also evaluated our estimators on a subset of the Human Connectome Project (HCP) dataset. Voxel time series were wavelet-filtered using a Daubechies orthonormal compactly supported wavelet of length 8.
The code to extract the time series from the preprocessed HCP data is available at https://gitlab.inria.fr/q-func/ireco4fmri.

Evaluation and metrics
Preferable inter-correlation estimators have four properties: i) face validity, ii) good repeatability, iii) preservation of the differences between individuals (discriminative power), and iv) independence from region size.
First, on simulated data, we qualitatively inspected the bias and variance of the distribution of correlation values with respect to known ground truth for various levels of global and local noise.
Then, using rat data, we performed a face validity analysis of the estimators, with the premise that dead rats should show no functional connectivity (the correlation distribution should be centered at zero). In order to quantify the differences between correlation values obtained for dead and live rats, we computed the Wasserstein distance between the correlation distributions of each anesthetized rat in comparison to that of a dead rat. A low value of the Wasserstein distance indicates that correlations values of live and dead rats are comparable and counts negatively in the evaluation of an estimator.
To evaluate the repeatability of the proposed estimators on the rat dataset, we split the time series in two equal parts. We computed the correlations on each part using the whole range of proposed estimators, and computed the Concordance Correlation Coefficient (CCC) (Lin, 1989) between splits to provide a scaled measure of agreement, where 1 is perfect agreement and 0 is no agreement. A preferable estimator should be more repeatable and have higher CCC.
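For reference, Lin's concordance correlation coefficient can be sketched as follows (hypothetical Python; any standard implementation would do):

```python
import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient between two measurement vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()  # population (1/n) variances, as in Lin (1989)
    sxy = ((x - mx) * (y - my)).mean()
    return 2 * sxy / (vx + vy + (mx - my) ** 2)

# Perfect agreement gives 1; a constant shift lowers CCC even though
# Pearson correlation stays 1.
a = np.arange(10, dtype=float)
print(ccc(a, a))        # 1.0
print(ccc(a, a + 5.0))  # < 1 despite perfect linear association
```

Unlike Pearson correlation, CCC penalizes both location and scale disagreement between the two splits, which is why it is used here as a measure of agreement rather than association.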
To quantify the similarity of connectivity graphs between estimators, we computed the number of common edges between graphs obtained from each estimator. To this end we used a sparsity threshold equal to 20% of the total number of edges (i.e., 27 edges in our case with 18 regions).
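The edge-overlap computation can be sketched as follows (hypothetical Python; the exact edge-count convention used in the paper may differ):

```python
import numpy as np

def common_edges(C1, C2, density=0.2):
    """Fraction of edges shared by the two graphs obtained by keeping the
    top-`density` strongest edges (absolute correlation, upper triangle)."""
    iu = np.triu_indices_from(C1, k=1)
    k = int(round(density * len(iu[0])))
    top1 = set(np.argsort(-np.abs(C1[iu]))[:k])
    top2 = set(np.argsort(-np.abs(C2[iu]))[:k])
    return len(top1 & top2) / k

# Toy check with two random symmetric 18-region "correlation" matrices.
rng = np.random.default_rng(5)
A, B = rng.standard_normal((18, 18)), rng.standard_normal((18, 18))
C1, C2 = (A + A.T) / 2, (B + B.T) / 2
```

Identical matrices share 100% of edges; independent matrices share roughly `density` of them on average.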
For human data, we used rs-fMRI sessions from different days, in order to evaluate reproducibility of correlation coefficients. This was analysed using CCC, again with a preferable estimator being more reproducible and having higher CCC.
We also evaluated the reproducibility of graph metrics between sessions.
To this end we used a sparsity threshold equal to 20% of the total number of edges, keeping only the edges with the highest correlation (i.e., 783 edges in our case with 89 regions), and binarized the edges. In order to compute graph metrics, we forced the graph to be connected by applying a minimum spanning tree (Alexander-Bloch et al., 2010). Then we computed classical graph metrics: betweenness centrality, transitivity, and global and local efficiencies, using the igraph package. Reproducibility was evaluated using the CCC.
We also summarized the differences in connectivity graphs between estimators by computing the number of common edges between graphs obtained from ca and its local version, using thresholding at the 20th percentile (i.e., 783 edges with 89 regions), and visualized the difference qualitatively by taking absolute correlation values for each estimator, rank-transforming, and computing the median difference in ranks across all subjects. Additionally, we evaluated the discriminative power of the various estimators via three metrics: inter- vs intra-subject graph distance, a nonparametric test of the same, and identification rate using functional connectome fingerprinting (Finn et al., 2015). A desirable estimator should provide estimates that preserve inter-individual differences.
We defined the intra-subject distance as the distance between the graph representing the first rs-fMRI session and the graph representing the second rs-fMRI session. The inter-subject distance was computed between each subject's first session and all other subjects' first sessions. Separation between the intra-subject distances and the inter-subject distances was quantified by the mean and standard deviation of the distributions, and by a Wilcoxon rank-sum test on multiple random splits of subject data, to avoid having multiple measurements of the same subjects. Here, we repeated 10 times the following procedure for each estimator of interest: first, split the subjects into two disjoint sets, one used to compute intra-distances (18 subjects), and one to compute inter-distances (18 subjects). Within the inter-distances set, 9 subject pairs were formed randomly. Then, a one-sided Wilcoxon rank-sum hypothesis test for the null hypothesis of no difference between inter- and intra-distances was performed, yielding a W statistic and a p-value for each of the 10 runs. We then computed the average W value across runs, as well as the harmonic mean p-value (Wilson, 2019) across runs, a procedure with strong family-wise error rate (FWER) control even for positively dependent tests.
To compute the identification rate, functional connectome fingerprinting represents each subject's graph g as a vectorized version a of the upper-triangular (or lower-triangular) part of the full inter-region correlation matrix (whose entries are the estimated inter-correlations), and computes the fingerprinting distance between graphs as d(g_1, g_2) = 1 − Cor(a_1, a_2), where Cor denotes the Pearson correlation. From the (intra, inter) fingerprinting distance distributions, an identification counts as correct if the intra-subject distance is lower than all inter-subject distances.
This is equivalent to a top-1 recognition rate. We note there are many other possibilities to compute distances between such brain graphs (Richiardi et al., 2013;Ng et al., 2016;Dadi et al., 2019), including computing distances between graph embeddings, which could substantially alter results.
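The top-1 identification rule described above can be sketched as (hypothetical Python):

```python
import numpy as np

def fingerprint_distance(a1, a2):
    # a1, a2: vectorized upper-triangular parts of two correlation matrices
    return 1.0 - np.corrcoef(a1, a2)[0, 1]

def identification_rate(session1, session2):
    """Top-1 rate: subject s is correctly identified if its session-2 vector
    is closer to its own session-1 vector than to any other subject's."""
    hits = 0
    for s in range(len(session2)):
        d = [fingerprint_distance(session2[s], v) for v in session1]
        hits += int(np.argmin(d) == s)
    return hits / len(session2)

# Toy check: 5 subjects, stable individual patterns plus small session noise.
rng = np.random.default_rng(4)
base = rng.standard_normal((5, 100))
s1 = base + 0.1 * rng.standard_normal((5, 100))
s2 = base + 0.1 * rng.standard_normal((5, 100))
```

With a strong individual component and weak session noise, the toy example identifies every subject correctly.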
Finally, we evaluated the dependence on region size by computing Spearman correlations between atlas region size and the average of the correlations in which the region is involved (itself averaged across subjects). A preferable estimator should minimize dependence on region size and show a lower Spearman correlation. We tested differences between estimators using a paired t-test between these Spearman correlations.

Evaluation on simulated data
We first study the properties of the estimators in the absence of noise (σ_ε = σ_e = 0). Figure 2 shows boxplots based on 500 replications of the general model (1) with the parameters described previously. Figure 2 (with σ_ε = σ_e = 0) shows that when the intra-correlation within each region R_j and R_{j′} is high for any pair of voxels, the estimates of r are quite satisfactory, even if we can already observe a slight bias for the method ca. It is to be noticed that, in terms of variance, the methods based on replicates (r and its local version) are as efficient as the methods ac and ca. The methods based on the a priori knowledge of two other disconnected regions (methods based on "differences", i.e. d, rd and their local versions) have a higher dispersion than the other ones. Finally, we also observe that the "complex" local methods r and rd based on local averages generate a larger bias than the other methods. This bias is clearly smaller than the one observed for the method ca and illustrates the following fact: when we suspect that the intra-correlation matrix is not well concentrated around the diagonal, the size of the neighbourhood (ν in the definition of the estimators) should be chosen sufficiently small.

We now turn to the influence of local and global noise, also illustrated in Figure 2. The variances of the local and global noise are parameterized by fixing the signal-to-noise ratio: when for instance σ_e = 0, we fixed σ_ε² as follows:

SNR_ε = 10 log_10( min(σ_j², σ_{j′}²) / σ_ε² )  ⇔  σ_ε² = 10^(−SNR_ε/10) min(σ_j², σ_{j′}²).
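The SNR parameterization above amounts to the following conversion (hypothetical Python helper):

```python
def noise_variance_from_snr(snr_db, var_j, var_jp):
    # sigma_eps^2 = 10**(-SNR/10) * min(sigma_j^2, sigma_j'^2)
    return 10 ** (-snr_db / 10) * min(var_j, var_jp)

# An SNR of 0 dB makes the noise variance equal to the weakest signal
# variance; each additional 10 dB divides the noise variance by 10.
```

The same conversion applies to the global noise by replacing σ_ε² with σ_e².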
When σ_ε ≠ 0 and σ_e = 0, as expected, Figure 2 shows that the methods based on replicates correctly estimate the inter-correlation parameter r, and they remain efficient whatever the value of the SNR. The method ac and the difference-based methods are strongly affected by this additional local noise and exhibit a strong negative bias. As explained in Section 2.3.1, the method ca, which averages the signals in the regions R_j and R_j′, reduces the effect of the local noise. In order to validate these methods, we also apply our estimators to rat data (Becq et al., 2020a). The results for two live rats are shown in Figure 3. Some estimators cannot be used when regions are too small, which is often the case in rat data. From now on, we hence focus on estimator ca.

Evaluation on rat data
We then quantified the edges in common between the networks obtained via the two estimators ca (currently the most widely used estimator) and ca. For the dead rat, 67% of the edges are common to the two estimators; for the two live rats, 60% and 77% of the edges are shared. The estimators ac, d and rd have a very low distance, indicating that correlation values are similar between dead and live rats for these estimators.

Evaluation on human data
Based on our findings on the rat datasets, we evaluate the performance of three estimators, ca (most common estimator, highest dead-live rat distance), ac (low dead-live rat distance) and ca (high dead-live rat distance), on 36 subjects of the HCP dataset. Reproducibility results for correlation estimates are shown in Figure 4 (B). The Concordance Correlation Coefficient (CCC) was similar between estimators, around 0.41 (sd 0.15). ca had a significantly lower CCC than ca (T = -4.6, p = 6.1 × 10⁻⁵).
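The CCC used here is Lin's concordance correlation coefficient; a minimal sketch over a vector of edge values (implementation ours, population variances):

```python
from statistics import mean

def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between two sessions of
    correlation estimates (one value per edge)."""
    mx, my = mean(x), mean(y)
    n = len(x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx2 = sum((a - mx) ** 2 for a in x) / n
    sy2 = sum((b - my) ** 2 for b in y) / n
    # Penalizes both decorrelation and any shift/scale between sessions.
    return 2 * sxy / (sx2 + sy2 + (mx - my) ** 2)
```

Unlike Pearson correlation, the CCC drops below 1 whenever the two sessions differ by a constant offset, which is why it is used to assess agreement rather than mere association.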
In the thresholded graphs, the percentage of edges in common between estimators ca and ca was on average 70% across the thirty-six subjects used in this analysis, for both sessions. Figure 5 shows the median differences between the estimators in brain space across the HCP subjects.
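The edge-overlap percentage between two thresholded graphs can be sketched as follows (proportional thresholding on absolute correlation; function names and the density choice are ours):

```python
def top_edges(corr, density):
    """Indices of the strongest edges after proportional thresholding of a
    symmetric correlation matrix (upper triangle only)."""
    n = len(corr)
    edges = [(abs(corr[i][j]), i, j) for i in range(n) for j in range(i + 1, n)]
    edges.sort(reverse=True)
    k = round(density * len(edges))
    return {(i, j) for _, i, j in edges[:k]}

def edge_overlap(corr_a, corr_b, density=0.2):
    """Percentage of edges shared by the two thresholded graphs."""
    ea, eb = top_edges(corr_a, density), top_edges(corr_b, density)
    return 100.0 * len(ea & eb) / max(len(ea), 1)
```

Proportional (rather than absolute) thresholding keeps the number of retained edges identical across estimators, so the overlap score reflects differences in edge ranking only.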
Group differences were also similar between estimators. Table 2 provides details.

Discussion
In this paper we have illustrated the effect of averaging data on estimators of correlation when two types of noise are present, local and global. The use of the classical correlation of averages is hindered by the presence of these noises, in addition to the presence of intra-correlations.

The correlation of averages estimator is highly biased
The ca estimator tends to be highly biased, as illustrated on synthetic data where the ground truth is known, but also relative to the other estimators on live rat and human data, where the mode of its distribution of correlation values is systematically among the highest found. We hypothesise that this is driven by a combination of low intra-correlation and large region sizes, which further lower the intra-correlation. This can be seen from the estimator definition in Eq. 4. We also note that the ca estimator effectively reduces this influence of region size.
In addition, Figure 5 revealed a systematic spatial bias between the ca and ca estimators, with dorsal posterior hyper-connectivity for ca and corresponding ventral anterior hypo-connectivity. The figure also suggests that the largest differences between the two estimators occur between the largest regions, further highlighting the reduced dependency on region size of the ca estimator. The spatial distribution of these differences suggests caution when examining large-scale resting-state networks derived from the ca estimator, as some apparent topological properties of brain networks, such as modularity, could be driven in part by region size and region intra-correlation. Indeed, in our experiments, the thresholded graphs differed in a large proportion of edges, both in rats (around 30%-50% of edges) and in humans (around 30% of edges).
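For concreteness, a minimal sketch contrasting the two aggregation orders (a simplified reading of the ca and ac definitions, not the paper's exact implementation):

```python
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def corr_of_averages(region_a, region_b):
    """ca: average the voxel time series within each region first, then correlate."""
    avg = lambda region: [mean(v[t] for v in region)
                          for t in range(len(region[0]))]
    return pearson(avg(region_a), avg(region_b))

def avg_of_correlations(region_a, region_b):
    """ac: correlate every voxel pair across the two regions, then average."""
    return mean(pearson(va, vb) for va in region_a for vb in region_b)
```

When voxels within a region share a common signal plus independent local noise, averaging first (ca) attenuates the noise before correlating, while averaging afterwards (ac) keeps each noisy pairwise correlation; the gap between the two grows with region size and with decreasing intra-correlation.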

Local noise and intra-correlation link to long-range correlation
In this paper, we explain the bias observed in the ca estimator by introducing hypotheses on both intra-correlation and noise. Indeed, previous studies on regional homogeneity (Zang et al., 2004) showed that pathologies can be classified based only on intra-regional properties. This was confirmed by a recent work on classification of intra-correlation (Petersen et al., 2016) using Wasserstein distances. Based on these findings, we hypothesize that the bias observed on inter-correlation is driven by intra-correlation and noise. Our simple simulation model illustrates the effect of local noise and intra-correlation, as clearly displayed in the boxplots of Figure 2. However, it is important to note that under local noise, in this framework with controlled intra-correlation, the estimator ca remains relatively close to the exact value. This may be explained by a trade-off in the denominator of the limit, as expressed in Table 1. In our simulations, we also observed that the bias of the ca estimator depends on the intra-correlation and the local noise: high values of ca tend to be observed when intra-correlation is low. Such low values of intra-correlation have already been mentioned in the study of the dynamics of neural networks (Deco et al., 2014), where local decorrelation was reported in real datasets. In our paper, for the first time, we provide a statistical explanation of the link between local decorrelation and long-range correlations in aggregated time series.
The model chosen in this paper for intra-correlation and local noise was driven by the statistical goal of writing explicit formulas for the limits of the estimators. However, as observed in Jiang and Zuo (2016)

Repeatability and reproducibility
Repeatability of correlation values in dead rats was very low for all estimators, consistent with the random nature of the data. For live rats, the CCC ranged from 0.46 to 0.87 depending on the specimen and the estimator. For humans, ca and ca showed approximately the same reproducibility (average 0.63, sd 0.2), and ac was slightly superior (average 0.67, sd 0.2). However, reproducibility differences between estimators were much less pronounced than those between individual subjects.
As a representative graph metric for reproducibility, we investigated betweenness. Here, ac offered the highest reproducibility (average (sd): 0.58 (0.18)) and ca improved markedly over ca (0.41 (0.15) vs 0.30 (0.17)). This is in contrast to another study that found no effect of the aggregation method (region mean time series versus region median versus first eigenvariate of the region) on the reproducibility of graph metrics (Braun et al., 2012), although in that study sessions were weeks apart.

Discriminability
Estimators ca, ac and ca showed similar discriminability, with a slightly lower identification rate and intra- to inter-subject distribution separation for ac than for the two others, and a slightly lower intra-inter separation for ca than for ca. This suggests that the improved robustness of ca to region size and intra-correlation effects does not come at a sizeable cost in discriminative ability, although this warrants further evaluation.

Limitations
Our signal model, and therefore the derived estimators, is a trade-off between model realism and tractability of the analysis of estimator properties.
This comes with important limitations.
First, assuming stationarity and additivity of the local noise fails to capture effects like system instability due to B0 inhomogeneity, RF power variations, or gradient fluctuations (Lazar, 2008;Greve et al., 2013;Liu, 2016).
Independently of the model, note that effects such as drift are mitigated by using wavelet coefficient time series as we did in this study, and that such instabilities explain proportionally less of the noise variance than thermal noise at high field (Greve et al., 2011).
Second, motion effects, and in particular differential long-vs short-range effects on correlations (Van Dijk et al., 2010;Yan et al., 2013), were not studied, and their interplay with the spatial bias exhibited by estimator ca in Figure 5 was not examined.
Third, our new estimators come with the added burden of choosing hyperparameters such as neighbourhood size. These are currently selected empirically, and no systematic sensitivity analysis has been performed.
Despite these limitations, we believe our empirical tests served to bridge the gap towards applicability, since our model yielded at least one estimator, ca, with useful properties for neuroimaging, namely reduced dependency on region size and low intra-correlation, and improved reproducibility of graph metrics.

Acknowledgements
This work was partly funded by a French National Research Agency project.

In the studies we surveyed, 71% (17/24) first averaged voxels before computing the inter-regional correlations and 88% (21/24) employed some kind of spatial aggregation method, including but not limited to averaging over voxels, ICA or dictionary learning.

Appendix B. Hypotheses for the spatio-temporal model
The assumptions on the model can be written as follows, for any i, i′ ∈ C and s, t = 1, . . . , T. Let Σ be the covariance matrix of the vector (Y_i(t))_{i ∈ C, t = 1, . . . , T}. In this paper, we assume throughout, without referring specifically to this assumption, that the parameters σ²_j, σ²_ε, σ²_e, ρ_ii′, η_ii′ and r_jj′ are such that Σ is a positive definite matrix.
We also assume that the random variables are independent in time. This is not overly restrictive: in particular, if the random variables have long memory, then after a wavelet decomposition the variables can be approximated as decorrelated in time at large wavelet scales (Moulines et al., 2007). In addition, assuming that the X_i's are centered is consistent, as it is well known that a wavelet decomposition based on a mother wavelet with K vanishing moments cancels every polynomial trend of degree K − 1.
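A minimal illustration of the vanishing-moments argument, using the Haar wavelet (K = 1), whose level-1 detail coefficients cancel any constant (degree-0) trend, while a linear trend is not cancelled:

```python
def haar_details(signal):
    """Level-1 Haar detail coefficients: (x[2k] - x[2k+1]) / sqrt(2).
    With K = 1 vanishing moment, any constant signal maps to all zeros."""
    return [(signal[2 * k] - signal[2 * k + 1]) / 2 ** 0.5
            for k in range(len(signal) // 2)]
```

Wavelets with more vanishing moments (e.g. higher-order Daubechies) extend this cancellation to higher-degree polynomial trends, which is why wavelet-coefficient time series are robust to slow drifts.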
Finally, to apply the law of large numbers, we also assume that all random variables are absolutely integrable, that is E[|Z i (t)|] < ∞ for Z = X, ε, e, i ∈ C and t = 1, . . . , T .

Appendix C. Properties of the estimators of interest
For any set of indices E with cardinality #E, the results of the paper are based on the following proposition.
Proposition Appendix C.1. Consider the notation of Section 2.1 and the assumptions described in Appendix B. Let j, j′ ∈ {1, . . . , J}. (iii) Let i ∈ E ⊆ R_j and i′ ∈ E′ ⊆ R_j′, and assume that |i − i′| ≥ p for any i ∈ E and i′ ∈ E′ (in the case j = j′). Then, as T → ∞, the following statements hold almost surely.
Proposition Appendix C.1 is given without proof: (i)-(ii) follow from the model (1), while (iii) is quite straightforward since we have assumed independence in time.
As seen from Proposition Appendix C.1, the quantity η̄_E is related to the correlation structure of the local noise. Assuming this noise to be p-dependent (that is, η_δ = 0 when δ ≥ p), it is clear that the larger #E, the smaller η̄_E.

Appendix D. Consistency results for the existing estimators
Appendix D.1. Consistency of r̂^ca_jj′
Proposition Appendix C.1 shows that r̂^ca_jj′ is a strongly consistent estimator of r̃^ca_jj′ as T → ∞, where

r̃^ca_jj′ = r_jj′ (1 + σ²_e,jj′ / r_jj′) / √( (ρ̄_Rj + σ²_ε,j η̄_Rj + σ²_e,j)(ρ̄_Rj′ + σ²_ε,j′ η̄_Rj′ + σ²_e,j′) )

(D.1)
Another way to correct the size effect is to compensate the inter-correlation by the intra-correlation, leading to the estimator (D.2). The two estimators (5) and (D.2) have the important property of removing the size effect (since, when σ_ε = σ_e = 0, r̃^ac_jj′ = r_jj′). Note that both estimators tend to the same limit.

(D.3)
As revealed by (D.1) and (D.3), r̂^ca_jj′ and r̂^ac_jj′ do not converge to r_jj′ when local or global noise is present. One could then ask why r̂^ca_jj′ is interesting. In fact, a first spatial averaging tends to decrease the effect of the local noise. Indeed, when σ²_e = 0 (and with equal unit variances to simplify), we have

r̃^ca_jj′ = r_jj′ / √( (ρ̄_Rj + σ²_ε η̄_Rj)(ρ̄_Rj′ + σ²_ε η̄_Rj′) )  and  r̃^ac_jj′ = r_jj′ / (1 + σ²_ε).

Hence, if we expect that ρ̄_Rj ≈ ρ̄_Rj′ ≈ 1, then r̂^ca_jj′ will be the better estimator since η̄_Rj = O(1/N_j). A natural compromise between these two estimators can be defined using local neighborhoods, as done by the estimator ca.
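A quick numerical check of this comparison (assuming, as in the text, σ_e = 0, unit variances, and for simplicity the same ρ̄ and η̄ in both regions; function names are ours):

```python
def ca_limit(r, rho_bar, eta_bar, var_eps):
    """Limit of the correlation-of-averages estimator when sigma_e = 0 and
    variances are 1 (same rho_bar and eta_bar assumed for both regions)."""
    denom = ((rho_bar + var_eps * eta_bar) * (rho_bar + var_eps * eta_bar)) ** 0.5
    return r / denom

def ac_limit(r, var_eps):
    """Limit of the average-of-correlations estimator in the same setting."""
    return r / (1 + var_eps)
```

With ρ̄ = 1, η̄ = 1/N for a region of N = 100 voxels and unit noise variance, the ca limit stays within about 1% of the true r, whereas the ac limit is halved, illustrating why the first spatial averaging helps.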

(D.4)
From (D.4), we observe that when σ_e = 0, then for any unknown value of σ_ε, r̂^R_jj′ consistently estimates r_jj′ / |ρ_δ|, which should be close to r_jj′ if we take δ = p and expect ρ_p to be close to 1. In other words, the estimator r̂^R_jj′ is robust to the size of the regions and to a possible local noise.
To relax the assumption that ρ_p is close to 1, we can combine this idea of replicates with local averaging. This is the topic of the next section.

Appendix E. Consistency of r D jj
The following result is the key ingredient: Proposition Appendix E.1. Under the notation of this section, as T → ∞, the following statements hold almost surely.
Proof. (i) Using the independence in time, it is clear that the left-hand side converges almost surely to Cov(Y_i(1) − Y_k(1), Y_i′(1) − Y_k′(1)) = σ_j σ_j′ r_jj′ + σ²_e − 2σ²_e + σ²_e = σ_j σ_j′ r_jj′, since the two regions R_k and R_k′ are disconnected, which leads to the result.
In other words, Proposition Appendix E.1 shows that r̂^D_jj′ is a strongly consistent estimator of its limit r̃^D_jj′, which, in the situation where σ_ε = 0, is nothing else than r_jj′.
where V and V′ are two ν-neighborhoods at distance δ.
Propositions Appendix F.1 and Appendix G.1 show that r̂^RD_jj′ is a strongly consistent estimator of r̃^RD_jj′, where V and V′ are two ν-neighborhoods at distance δ. Similarly to the previous estimator, r̂^RD_jj′ is robust to additive global and local noise.