Estimation of functional connectivity from electromagnetic signals and the amount of empirical data required

An increasing number of neuroimaging studies are concerned with the identification of interactions or statistical dependencies between brain areas. Dependencies between the activities of different brain regions can be quantified with functional connectivity measures such as the cross-correlation coefficient. An important factor limiting the accuracy of such measures is the amount of empirical data available. For event-related protocols, the amount of data also affects the temporal resolution of the analysis. We use analytical expressions to calculate the amount of empirical data needed to establish whether a certain level of dependency is significant when the time series are autocorrelated, as is the case for biological signals. These analytical results are then contrasted with estimates from simulations based on real data recorded with magnetoencephalography during a resting-state paradigm and during the presentation of visual stimuli. Results indicate that, for broadband signals, 50-100 s of data is required to detect a true underlying cross-correlations coefficient of 0.05. This corresponds to a resolution of a few hundred milliseconds for typical event-related recordings. The required time window increases for narrow band signals as frequency decreases. For instance, approximately 3 times as much data is necessary for signals in the alpha band. Important implications can be derived for the design and interpretation of experiments to characterize weak interactions, which are potentially important for brain processing.


Introduction
Neuroimaging has traditionally been concerned with identifying which brain areas are specialized to represent different stimulus features or what regions are recruited to carry out different tasks. An important complementary question is how information is integrated across areas [7]. Given their high temporal resolution, imaging techniques such as electroencephalography (EEG) and magnetoencephalography (MEG) are particularly suited to investigate this issue. Different measures can be used to quantify statistical dependencies between time series, such as cross-correlation, coherence, mutual information, phase synchronization and gener-provides a linear measure. For time series with zero autocorrelation, the significance of the cross-correlation coefficient can be assessed with the help of the Student's-t distribution. For time series (x, y) with time samples t = 1 to N the cross-correlation coefficient is x y (1) where .. denotes the average across samples, and x and y denote the standard deviation of x and y, respectively. If the elements of x and y are normal independent identically distributed (i.i.d.) variables, and therefore have zero autocorrelation, the significance of the cross-correlation coefficient r can be calculated with Eq. (2). Under the null hypothesis of zero crosscorrelation between x and y, variablet in Eq. (2) approximately follows a Student's-t distribution with N − 2 degrees of freedom [3]. The p-value corresponding to the cross-correlation coefficient r is the same as the p-value associated with the calculatedt: where N is the number of samples. Since physiological signals are autocorrelated, the above expressions are not directly applicable. An extended expression of the statistical test can be derived for autocorrelated time series [3,5]. This is done by analytically estimating the variance of the cross-correlation coefficient under the null hypothesis of zero cross-correlations, and leads to the definition of an effective sample size N eff , which replaces sample size N in Eq. (2).
Here ˙x = cov(x) and ˙y = cov(y) are the N × N autocovariance matrices of time series x and y respectively. In addition, , where I N , and J N , respectively, denote the N × N identity matrix and the N × N matrix of ones only, and tr(M) denotes the trace of matrix M.
The modified t-test t mod has the same expression as before after substituting the number of samples N with the effective number of samples N eff .
The number of degrees of freedom is N eff − 2 in this case.

Datasets
Two MEG datasets were used. Resting-state data was collected with a 306-channel Elekta Neuromag system (Helsinki, Finland) at the Centre for Biomedical Technology (Technical University of Madrid, Spain). Recordings were obtained from 4 subjects, who sat for 5 min with their eyes open. They were instructed to remain passive and maintain their fixation on a small centrally placed dot. Data was sampled at 600 Hz and a bandpass filter between 0.1 and 100 Hz was applied online. All subject signed an informed consent according to local regulations. A visual event-related design was employed for the second dataset. Details for this dataset have been reported elsewhere [9]. In brief, twenty-three subjects participated after giving written informed consent. Stimuli comprised 60 pictures with affective content. Pictures were presented for 1.5 s in sequences of six with an interstimulus interval between 1.5 and 3 s. Each participant viewed a total of 360 pictures. For the present work the 120 pictures with neutral affective content were employed. MEG data was collected with a 148-channel whole head system (Magnes 2500 WHS, 4D Neuroimaging, San Diego, USA) at the Centre for Magnetoencephalography (Complutense University of Madrid, Spain). The sampling frequency was 254.3 Hz and a band-pass filter of 0.1-50 Hz was applied online. The recordings conformed to The Code of Ethics of the World Medical Association (Declaration of Helsinki).

Source reconstruction
A minimum variance beamformer [1,2,8,14] as implemented in FieldTrip [11] was used to reconstruct neuronal activity time series from sensor data. The forward solution was based on a template brain. For the resting-state data, a grand average map of normalized activity power across subjects was calculated. Normalization was carried out by dividing the power by the sensor noise, separately for each subject [15]. Locations of interest were the maxima of the grand average map spanning a 3D dipole mesh with 1 mm resolution covering the whole brain. Time series at those locations were obtained for each subject. Solutions were obtained from data prefiltered in each of the traditional frequency bands separately: delta (0.1-4 Hz), theta (4-8 Hz), alpha (8)(9)(10)(11)(12)(13)(14)(15), beta (8-15 Hz), gamma (15-60) and from broadband data (0.1-30 Hz). All maxima larger than 20% of the largest maxima and at least 1 cm apart from a larger maximum for each frequency band were considered. This procedure yielded 7-9 maxima per frequency band. Locations are provided in Table 2 and Fig. 3.
For the event-related dataset, the source reconstruction procedure has been described in detail elsewhere [9]. In brief, signals were bandpass-filtered between 0.1 and 30 Hz. Beamformer relative power changes were calculated by dividing the power in an active time window of interest (0.4-0.6 s post-stimulus) by the power during baseline (−0.5 to 0 s prestimulus). The resulting activity maps were submitted to a nonparametric cluster-based permutation statistic [10], as implemented in FieldTrip, to identify cortical source clusters of affect modulation. A 2D surface mesh representing the cortical sheet was employed for reconstruction. The 3 most significant dipole clusters were considered: 2 with a corrected threshold of p < 0.05 (right superior frontal gyrus and left occipitoparietal junction) and one with a trend level of p = 0.06 at the right occipitoparietal junction (Fig. 4). For each cluster, virtual electrode time series were averaged across dipoles. In the present study, cluster time series corresponding to the 0-1.5 s post-stimulus onset period for each subject were used. Details of the source reconstruction procedure can be found in Appendix B.

Results
As described in Section 2.1, the two key parameters influencing how much data is needed to identify statistical dependencies between time series are the autocorrelation time-scale and the level of cross-correlation. We first estimate these parameters from our empirical datasets and we then calculate the amount of data needed for different types of datasets.

Empirical autocorrelation function
The effective number of samples N eff (Eqs. (3) and (4)) decreases as the autocorrelation time scale increases, since samples become less independent. Fig. 1A provides an estimate of the autocorrelation time scales in the empirical datasets. The following exponential model A(t) = exp(−t/ ) was fitted to the empirical autocorrelation function, calculated over 1500 ms epochs, where A(t) is the model autocorrelation function, and t is the time between samples. Data had been bandpass-filtered between 0.1 and 30 Hz. A weighted least-squares fitting was carried out, with the standard deviation for each data point estimated from 500 ms epoch-segments. Fig. 1A shows the distribution of values of the autocorrelation time-scale across epochs for all subjects. Only values of for which the exponential model provided a good fit (p > 0.05 according to the 2 distribution) are shown. This corresponded to a 91% of epochs for the event-related paradigm and to a 68% of epochs for the restingstate data.
Median values of and standard deviation of the median value across subjects are reported in Table 1.
Overall variability in values of was similar for the event-related (standard deviation = 9.8 ms) and for the resting-state dataset (s.d. = 9.2). Median variability for individual subjects was 6.1 and 5.5, respectively. Calculating the variability for a given subject and brain location yielded median values of 5.2 for the event-related data and 4.8 for the resting-state data Table 2.
For the resting-state dataset, values of from neighbouring epochs were independent (cross-correlation coefficient = 0.04, p > 0.05), while for the event-related dataset they were moderately correlated (cross-correlation coefficient = 0.22, p < 0.05).
For band-pass filtered data in the traditional EEG/MEG frequency bands (delta, theta, alpha, beta and gamma) a model with an oscillatory component was used to fit the empirical autocorrelation function, A(t) = exp(−t/ )*cos(2 ωt), as oscillations were apparent in the time series. Table 1 shows the median values of and ω (and standard deviation of median values across subjects) for the different frequency bands, calculated again across epochs for which the model provided a good fit (p > 0.05 according to the 2 distribution).

Empirical cross-correlation coefficients
A second key parameter is the cross-correlation coefficient between virtual electrode time series r (Eq. (5)). Fig. 1B(left) represents the cross-correlation coefficient as a function of the delay between the virtual electrode time series for the different frequency bands. Each curve represents the cross-correlation coefficient averaged across epochs for a given subject and virtual electrode pair. While for resting-state data the empirical values are in the range 0-0.6, for event-related data cross-correlations are in the interval 0-0.4. In both cases, the values peak at zero-delay which indicates that zero-lag cross-correlations are important. These zero-lag cross-correlations reflect that the measured statistical dependencies do not solely arise from direct, causal, interactions between brain areas, as there is always a delay in transmission in physiological signals. They may indicate the existence of a common input to the two areas by a third region or may arise from the limitations inherent in the experimental techniques, such as volume conduction or smoothing in source reconstruction. To have an estimate of how strong interactions beyond zero-lag cross-correlations are, we quantify how asymmetric the cross-correlation function is with respect to lag, by subtracting the negative-lag part of the function from the positive one.
This procedure removes the symmetric component around zero delay in the cross-correlation function arising from instantaneous zero-lag interactions and provides a lower bound estimate to the non-zero-lag cross-correlations. The corrected versions of the cross-correlations for broadband signals are shown in Fig. 1B (right) and have values in the range [−0.05:0.05] which are markedly lower than for the full cross-correlation coefficients.

Required amount of data for different datasets
Models of brain areas and their connections have been used to compare connectivity measures (e.g. [4]). In the present paper the analysis is based instead on the distribution of cross-correlation coefficient values from empirical data. Using our empirical datasets, we first create surrogate datasets to obtain a null distribution of correlation coefficients under the null hypothesis of no functional connectivity between different virtual electrodes. Time series are first segmented into 500 ms long epochs. Then, for each dataset, subject and virtual electrode pairing separately, the epoch order of one of the virtual-electrodes is randomized. This ensures that the original bivariate dependencies are not present in the new epoch pairing, while preserving the univariate statistics such as the spectral power. This procedure defines null resting-state and event-related datasets.
While the epoch-randomizing procedure eliminates zero-lag correlations in the resting-state data, part of such correlations will still be present in the event-related data due to the influence of a common stimulus across epochs. To evaluate the magnitude of cross-correlations between virtual electrodes introduced by the external stimulus, a third null dataset is created by randomizing the phases of the Fourier components of the event-related epochs, where phases of different Fourier components and different virtual electrodes are randomized independently [13]. This procedure destroys all the temporal information in the time series, including dependencies arising from a common external stimulus, which are not removed by epoch randomization, while preserving the power spectrum and autocorrelation function of the univariate signals, and yields a null phase-randomized event-related dataset.
Finally, to assess the effect of non-Gaussian components in the signals, a dataset with Gaussian statistics is created with the help of an autoregressive model. This fourth dataset followed a stationary AR(1) Gaussian autoregressive model with autocorrelation timescale .
where = 15 ms and values for ε are independently drawn from the standard normal distribution. The null distribution of cross-correlation coefficients for each of the 4 datasets described above is obtained by calculating the cross-correlation coefficient across all virtual electrode parings and epochs independently for each subject. Distributions for different time-window lengths are calculated by averaging across groups of epochs spanning the required time window. For example, the correlation coefficient corresponding to a window length of 5 s is obtained by averaging the correlation coefficient of ten 500-mslong epochs.
In addition, following the methods in Section 2.1, an analytical null distribution of cross-correlation coefficients is obtained in the following way. Assuming an exponential autocorrelation function (˙x = ˙y = ˙i ,j = exp(−|t i − t j |/ )), with = 15 ms, and a given number of samples N, the effective number of samples N eff is obtained from Eqs. (3) and (4). N eff is then entered into Eq. (5) to obtain the distribution of cross-correlation coefficients, r, under the null hypothesis of no cross-correlation between time series, as we know that t mod follows a Student's-t distribution with N eff − 2 degrees of freedom.
The ability to detect a certain level of interaction depends on the amount of available empirical data, since the variance of the distributions of cross-correlation coefficients decreases when increasing the amount of data, and, therefore, the distributions corresponding to the presence and absence of interactions overlap less. Let r active denote the mean value of cross-correlation coefficients due to the presence of interactions. We assume that the variability in cross-correlation coefficient around this mean due to statistical fluctuations/noise equals the variability from the corresponding null distribution. Let us set a certain significance threshold corresponding to a given p-value of the null distribution, p null . We will refer to the fraction of the active distribution above this threshold as p active . Fig. 2A shows the amount of data, or time window length, T, required to declare as significant an 80% (p active = 0.8) of interactions, when the statistical threshold corresponds to p null = 0.05, for the different datasets. An autocorrelation timescale = 15 ms was used for the analytical and auto-regressive datasets. Error bars indicate the standard deviation across subjects. As can be observed, the amount of required data, T, increases sharply as the crosscorrelation coefficient r active decreases. This is due to the fact that to distinguish distributions with closer means, their variances must be smaller, and that is achieved with more data. There is almost perfect agreement between the analytical and auto-regressive datasets. Differences between these two first datasets and the other three are probably due to variability in the autocorrelation time-scale and deviations from normality in the latter. The fact that differences between the event-related and phase-randomized event-related datasets are small indicate that the presence of an external stimulus does not markedly increases the variability of the signals and does not make the interactions more difficult to detect.

Discussion
The main aim of the present work was to determine the amount of data needed to estimate the degree of dependency between the activities of different brain regions. As shown in Fig. 2A, the amount of data needed to detect an interaction increases sharply as the interaction level decreases. For broadband signals, while detecting a cross-correlation coefficient of r = 0.2 requires less than 10 seconds of data, 50-100 seconds are needed to detect a cross-correlation coefficient of r = 0.05. For narrow-band data, Fig. 2B shows that the lower the frequency band the more data is needed, with approximately 3 times as much data required for the alpha band than for broadband data. This is consistent with the fact that lower frequency bands are associated with longer autocorrelation time-scales as shown in Fig. 1C.
A key question is then what the typical levels of functional interactions between brain areas are. For the datasets considered in the present work, full cross-correlation coefficients were as high as 0.6. In contrast, reduction of zero-lag components yielded lower bound estimates no higher than 0.1. Analysis of cross-correlations between neurons in cat auditory cortex shows that most of the cross-correlation is due to secondary effects other than direct anatomical interactions between neurons [6] and quantify primary correlation effects between 0 and 0.1. Although large statistical dependencies are also reported in the neuroimaging literature, an important issue is to what extent this values reflect direct interactions between brain areas or reflect as well other contributions such as volume conduction effects, common influences from a third area, or common modulation by an external stimulus. Results from the present analysis allow one to address the potential effect of this last confound. The fact that similar results were found in the present analysis for the event-related and phase-randomised event-related datasets indicates that correlations induced by a common external stimulus did not significantly affect the amount of data required.
In the case of event-related data, the temporal resolution at which a certain interaction can be detected can be calculated by dividing the required window length, as reported in the present study, by the available number of epochs. Thus, if 100 epochs have been obtained and the required amount of data is 20 s, the connectivity measure has at most a resolution of 200 ms.
To summarize, the present analysis shows that, given the number and magnitude of confounding components, substantial amounts of data are needed to reliably detect weak interactions between brain areas. Such weak interactions may constitute a large proportion of interregional brain dependencies, given the low empirical correlation values found in the present work and considering previous reports demonstrating that direct interactions between single neurons are small. Therefore, studies designed to characterize functional connectivity relationships between brain areas should acquire enough data to allow for reliable measures of interaction.