Looking through the windows: a study about the dependency of phase-coupling estimates on the data length

Objective. Being able to characterize functional connectivity (FC) state dynamics in a real-time setting, such as in brain–computer interface, neurofeedback or closed-loop neurostimulation frameworks, requires the rapid detection of the statistical dependencies that quantify FC in short windows of data. The aim of this study is to characterize, through extensive realistic simulations, the reliability of FC estimation as a function of the data length. In particular, we focused on FC as measured by phase-coupling (PC) of neuronal oscillations, one of the most functionally relevant neural coupling modes. Approach. We generated synthetic data corresponding to different scenarios by varying the data length, the signal-to-noise ratio (SNR), the phase difference value, the spectral analysis approach (Hilbert or Fourier) and the fractional bandwidth. We compared seven PC metrics, i.e. imaginary part of phase locking value (iPLV), PLV of orthogonalized signals, phase lag index (PLI), debiased weighted PLI, imaginary part of coherency, coherence of orthogonalized signals and lagged coherence. Main results. Our findings show that, for an SNR of at least 10 dB, a data window that contains 5–8 cycles of the oscillation of interest (e.g. a 500–800 ms window at 10 Hz) is generally required to achieve reliable PC estimates. In general, Hilbert-based approaches were associated with higher performance than Fourier-based approaches. Furthermore, the results suggest that, when the analysis is performed in a narrow frequency range, a larger window is required. Significance. These results pave the way for best-practice guidelines to be followed when a real-time frequency-specific PC assessment is targeted.


Introduction
The characterization of the physiological and pathological organization of the human brain requires the assessment of functional connectivity (FC), defined as the statistical dependency between neural signals (He et al 2018). Reliable FC estimation is crucial to unravel the integrated cooperation of several spatially separated brain regions and its role in cognition (Bressler and Menon 2010). While the assessment of static FC, i.e. obtained under the assumption that the FC does not change over the data acquisition time period, requires a limited number of parameters to be set (e.g. frequency of interest), the estimation of dynamic FC depends on a larger set of factors to be chosen in the analysis pipeline. Albeit being more complex in nature, the estimation of dynamic FC may provide important insights when fluctuations over the data acquisition time period (either in ongoing or epoched paradigms) are relevant (e.g. perception, learning or memory related studies; Goddard et al 2016). Similarly, brain-machine interface, closed-loop or neurofeedback approaches used to guide intervention in neurorehabilitation may require the real-time estimation of dynamic FC (Sitaram et al 2017). This real-time estimation is best achieved by relying on electrophysiological techniques, which feature a high temporal resolution (on a millisecond time scale) that is fundamental to track and interfere with brain activity. These techniques also offer a unique opportunity to investigate, with high spectral resolution, neural oscillations, which are known to subserve brain connectivity (Varela et al 2001, Engel et al 2013). Indeed, it has been hypothesized that only coherently oscillating neuronal groups can effectively interact (Communication Through Coherence; Fries 2005), and that phase-coupling (PC) between oscillatory signals from different brain areas properly represents a measure of long-range connectivity (Engel et al 2013).

Figure 1. (A) Left: two ideal (i.e. noiseless) phase-coupled oscillators in the frequency band 9−11 Hz, with constant phase difference (∆θ = π/4) and independent amplitudes; center: polar histogram and mean-resultant-length (MRL) vector for the instantaneous phase difference sampled at all time points in the range from 0 to 1 s; right: polar histogram and MRL vector for the instantaneous phase difference sampled at all time points in the range from 0 to 0.2 s. (B) Same as in panel (A) for two uncoupled oscillators. (C) Box plots of the distribution of connectivity estimates, as measured by the imaginary part of phase-locking value (iPLV), displayed as a function of data length; results were obtained from 5000 realizations of phase-coupled oscillators (in purple) and uncoupled oscillators (in green); the dot, the rectangular box and the whiskers for each box plot denote the median value, the range from the 16th to the 84th percentile and the range from the 5th to the 95th percentile of the distribution, respectively.
To detect such PC in a dynamic FC framework, several aspects need to be taken into account, among which the length of the data window is crucial. As an illustrative example, we considered the estimation of PC, using the imaginary part of the phase locking value (iPLV; Palva and Palva 2012), in the two extreme scenarios of always-coupled or always-uncoupled noiseless signals (figures 1(A) and (B), respectively). This example shows that, for shorter data window lengths, high iPLV values are obtained also for uncoupled signals (panel (C)). In the practical situation of noisy signals, not only can false connectivity be detected, but true connectivity can also be missed.
In this simulation study, we assess the reliability of PC estimates as a function of the data length and with different metrics used in the literature to detect this neural coupling mode. In particular, we focus on realistic (synthetic) data corresponding to different situations realized by varying the following parameters: the data length, the signal-to-noise ratio (SNR), the phase difference value, the specific spectral analysis method and the fractional bandwidth.
Previous studies investigated the performance of FC metrics with various parameters under simulated conditions as well as on real data (Astolfi et al 2007, Wang et al 2014, Fraschini et al 2016, Bakhshayesh et al 2019, Sommariva et al 2019). Wang et al (2014) benchmarked 42 different connectivity metrics from seven metric families, including coherence, mutual information, and Granger causality. They evaluated the performance of the metrics on electroencephalographic (EEG) and functional magnetic resonance imaging signals simulated by relying on different generative models (GMs), incorporating both linear and non-linear systems. Accuracy in recovering the ground truth was estimated for different data lengths, sizes of sliding window, time delays as well as noise levels. Liuzzi et al (2019) investigated the sensitivity of connectivity metrics to time-varying dynamics of connectivity states. Estimation accuracy was assessed for two classes of connectivity metrics: phase- and amplitude-based. The analysis was performed on synthetic signals representing a two-node system and a large-scale network. Performance was evaluated under different parameters of connectivity state durations, lengths of sliding windows, and SNR levels. Sommariva et al (2019) studied the reliability of connectivity estimation using model-based and data-driven metrics with respect to different data length constraints. Furthermore, artificially generated source signals were projected to EEG sensor space with different levels of biological noise, and then estimated again via source reconstruction. They demonstrated metric-dependent effects of the number of available samples as well as of the noise level on the reliability of estimated connectivity.
Tewarie et al (2019) compared time-resolved methods based on a sliding-window approach with methods incorporating data-driven (adaptive) windowing, similar to the approach used in de Pasquale et al (2010). They showed that adaptive windowing outperforms the fixed-window approach in recovering fast dynamic fluctuations in connectivity. However, adaptive windowing is not easily applicable to real-time analysis scenarios.
All the aforementioned studies demonstrated the sensitivity of connectivity estimates to data length and sliding-window length. However, most of them did not consider an extensive set of phase-based connectivity metrics. In addition, this is the first study that, to our knowledge, addresses this issue by also comparing two spectral approaches: the Hilbert-based approach and the Fourier-based one. For example, Liuzzi et al (2019) and Bakhshayesh et al (2019) focused only on the Hilbert approach, while only the Fourier-based one was used in Sommariva et al (2019). Moreover, we extensively investigated the effects induced by the choice of the frequency bandwidth on metric performance. Indeed, previous work focused on standard band definitions regardless of the relevant central frequency. For instance, Bakhshayesh et al (2019) simulated a coupling in the alpha band, while Liuzzi et al (2019) studied two specific frequency bands (the alpha band for the neural mass model, and the beta band for the autoregressive (AR) model). We also varied the SNR and the phase of the simulated couplings in order to account for different possible scenarios. Finally, we expressed the window length in terms of the number of cycles of a given oscillatory signal component, thus reducing possible confounds in the results induced by the specific frequency of interest.

Material and methods
In brief, signal time-courses were generated as pairs of time-courses of variable length, describing band-limited processes with a given center frequency and bandwidth of the oscillation of interest, associated with two cortical sites. The two sources could be either uncoupled or phase-coupled. The generated source time-courses were projected to the sensor level using realistic head and sensor models. Biological and instrumental noise was added to different extents. The reconstructed source time-courses were then obtained from the simulated signals through the exact low-resolution electromagnetic tomography (eLORETA) spatial filter (Pascual-Marqui 2007a) as implemented in the FieldTrip toolbox. Pairwise coupling between sources was assessed by means of several connectivity metrics intrinsically robust to source-leakage effects: iPLV, phase lag index (PLI), debiased weighted PLI, imaginary part of coherency, and lagged coherence. Additionally, connectivity using coherence and PLV was assessed after orthogonalization (Brookes et al 2012, O'Neill et al 2015) of the reconstructed source time-courses, since these metrics are not intrinsically robust to source-leakage effects. Ten thousand repetitions (5000 for phase-coupled sources and 5000 for uncoupled sources) were obtained by varying the source locations and the center frequency of the oscillation of interest. All details are given below.

Head and sensor models
The head and sensor models used in this study were based on the publicly available 'New York Head' model (Huang et al 2016), downloaded from www.parralab.org/nyhead. The model data contain (but are not limited to): (a) a segmentation of the ICBM-152 template head (Fonov et al 2009, 2011) into six different tissue types (skin, skull, cerebrospinal fluid-CSF, grey matter, white matter, and air cavities); (b) a high-density triangular mesh (74 382 nodes) for the cortical surface; (c) the labels and locations of 231 EEG scalp electrodes defined in accordance with the 10-5 electrode system (Oostenveld and Praamstra 2001); (d) the lead-field matrix evaluated for all electrodes and unit-strength current dipoles located at the nodes of the cortical surface mesh, with orientations normal to the local cortical surface, by using a finite-element method (FEM) with a six-tissue volume-conductor model derived from the head segmentation (tissue conductivities in S m⁻¹: gray matter 0.276; white matter 0.126; CSF 1.65; skull 0.01; skin 0.465; air 2.5 × 10⁻¹⁴); and (e) triangular meshes (1922 nodes each) for the inner and outer boundaries of the skull and for the outer boundary of the skin (i.e. the scalp). We refer to Huang et al (2016) for further details on the New York Head model.
From the mentioned data, we modeled a 128-channel EEG system by using a selection of 128 EEG scalp electrodes of the 10-5 system according to the configuration of conventional high-density EEG systems (Oostenveld 2006). We used a uniform subsample of 9788 nodes of the cortical surface mesh, with the exclusion of the medial wall, for the source space; source-current distributions were modeled using normally oriented current dipoles placed at these nodes. For the selected EEG sensors and sources, we built two forward models, based on different volume-conductor modeling details and numerical computation methods, to independently generate the EEG signals and solve the inverse problem. The first model was the FEM model provided along with the New York Head, used as a GM for simulating the EEG signals. The lead-field matrix was obtained by taking the entries of the full 231 × 74 382 lead-field matrix corresponding to the selected sensors and sources, eventually re-referenced to the common average reference.

Figure 2. Illustrative representation describing the generation and reconstruction of source time-courses. For each realization, we generated a pair of time-courses (either coupled or uncoupled) of variable length associated with two random cortical sites (black dots). The generated source time-courses were projected to the sensor level using realistic head (six-tissue-type FEM) and sensor models. Biological and instrumental noise was added to different extents at the sensor level. The estimated source time-courses were obtained through the eLORETA spatial filter (with lead field based on a three-shell BEM model).
The second forward model was built using a simplified three-shell volume-conductor model along with a boundary-element method (BEM) (de Munck 1992, Mosher et al 1999). The surface meshes for the inner skull, outer skull and scalp were used as conductivity boundaries in a three-shell boundary-element volume-conductor model (skin, skull and intracranial volume). Tissue conductivities were set equal to 0.33 S m⁻¹ for the skin and intracranial volume, and 0.0041 S m⁻¹ for the skull, as in Huang et al (2016). Within this volume-conductor geometry, we computed the lead-field matrix by using a linear collocation BEM (Stenroos et al 2007) formulated with the isolated source approach (Hämäläinen and Sarvas 1989, Stenroos and Sarvas 2012). In addition, we computed the source estimator using one of the most commonly used methods for the solution of the EEG inverse problem: the eLORETA method (Pascual-Marqui 2007a). This model was used as a test model (TM) for source reconstruction and connectivity analysis.
A schematic of the overall pipeline is shown in figure 2.

Simulated EEG data
We generated pairs of signals of variable length, sampled at 512 Hz, representing the time-courses of the activity at two cortical sites. To include noise contamination and interference signal (crosstalk) from other cortical sites, the final source time-courses were generated as described below.
We simulated two dipolar sources, s_1 and s_2, with locations randomly chosen from the nodes of the cortical surface mesh and orientations normal to the cortex. The source time-courses, s_1(t) and s_2(t), were modeled as band-limited processes with center frequency f_c and fractional bandwidth B_f (i.e. the absolute bandwidth divided by the center frequency). The center frequency f_c was randomly sampled from a uniform distribution in the range from 10 to 50 Hz; B_f was fixed to 0.5, i.e. a value that roughly approximates conventional EEG frequency bands (e.g. 4.5-7.5 Hz for theta, 7.5-12.5 Hz for alpha, or 15-25 Hz for beta). The two sources could be either uncoupled or phase-coupled. To simulate uncoupled sources, s_1(t) and s_2(t) were generated as the sum of sinusoids with frequencies from f_c(1 − B_f/2) to f_c(1 + B_f/2), in steps of 0.01 Hz, and independent and identically distributed random amplitudes and phases for each frequency component, eventually passed through an AR filter of length 0.2/f_c. To simulate phase-coupled sources, we first generated s_1(t) and s_2(t) as in the case of uncoupled sources; we then computed their analytic signals using the Hilbert transform, and we set the instantaneous phase of s_2(t) equal to the instantaneous phase of s_1(t) plus a constant phase difference ∆θ, while keeping the instantaneous amplitude of the original signal; ∆θ was randomly sampled from a uniform distribution in the range from −π to π.
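The generation scheme above can be sketched in code as follows. This is a simplified illustration (the AR filtering step and the exact amplitude distributions are omitted, and all function names are our own):

```python
import numpy as np
from scipy.signal import hilbert

def bandlimited_signal(fc, Bf, duration, fs, rng):
    # Sum of sinusoids with frequencies from fc*(1 - Bf/2) to fc*(1 + Bf/2),
    # in steps of 0.01 Hz, with i.i.d. random amplitudes and phases.
    t = np.arange(int(duration * fs)) / fs
    freqs = np.arange(fc * (1 - Bf / 2), fc * (1 + Bf / 2), 0.01)
    amps = rng.random(freqs.size)
    phases = rng.uniform(-np.pi, np.pi, freqs.size)
    x = (amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t
                                + phases[:, None])).sum(axis=0)
    return x / x.std()

def phase_coupled_pair(fc, Bf, duration, fs, dtheta, rng):
    # s2 keeps its own instantaneous amplitude but takes s1's instantaneous
    # phase shifted by the constant dtheta (via the Hilbert analytic signal).
    s1 = bandlimited_signal(fc, Bf, duration, fs, rng)
    s2 = bandlimited_signal(fc, Bf, duration, fs, rng)
    phase1 = np.angle(hilbert(s1))
    amp2 = np.abs(hilbert(s2))
    return s1, amp2 * np.cos(phase1 + dtheta)
```

For a narrow-band pair generated this way, the instantaneous phase difference estimated via the Hilbert transform clusters around the imposed ∆θ.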
The generated source time-courses were projected to the sensor level through multiplication by the source topographies (i.e. the columns of the lead-field matrix) calculated using the GM. Additive noise was modeled as a mixture of independent standard normal processes distributed over cortical nodes and electrodes. This approach allowed us to model the presence of both biological (i.e. correlated brain) noise and instrumental (i.e. sensor) noise. Specifically, the N-dimensional vector noise time-course, n(t), with N being the number of EEG sensors, was sampled from a multivariate Gaussian distribution with zero mean and covariance matrix of the form:

C = L L^T + λ I,   (1)

where L denotes the GM lead-field matrix, the symbol T denotes the transpose operator, I is the identity matrix, and λ is a scalar parameter that weighs the contribution of instrumental noise relative to the biological noise covariance. The diagonal/non-diagonal entries of the first term of the sum in equation (1) model the variance/covariance of the sensor-level signals that is due to biologically realistic spread (which is in turn modeled by the lead-field matrix), and the second term of equation (1) codes the increment of each sensor variance due to instrumental noise. In particular, we set λ = 0.01 trace{L L^T}/N, where trace{·} denotes the trace of a matrix. This value of λ modeled a 10% increase of the sensor variance that is due to instrumental (and not to biological) noise. The reconstructed source time-courses, σ_1(t) and σ_2(t), were obtained from the simulated EEG signals through multiplication by the source spatial filters (i.e. the rows of the eLORETA estimator matrix corresponding to the simulated sources s_1 and s_2) calculated with the TM, i.e.

σ_i(t) = w_i^T [γ_1 l_1 s_1(t) + γ_2 l_2 s_2(t) + n(t)],   i = 1, 2,   (2)

where l_i (for i = 1, 2) denotes the topography of source i calculated using the GM, w_i (for i = 1, 2) denotes the spatial filter of source i calculated using the TM, and γ_1 and γ_2 are two parameters that weight the contribution of the actual source time-courses to the reconstructed source time-courses. We varied γ_1 and γ_2 in order to manipulate the SNR (see section 2.4.2).
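A noise model of this form can be realized by sampling the noise as a linear mixture, without forming the covariance matrix explicitly. The sketch below uses a random stand-in for the lead field (in the study it comes from the New York Head FEM model), and the dimensions are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources, n_samples = 32, 500, 50_000

# Stand-in lead field L; in the study this is the GM lead-field matrix.
L = rng.standard_normal((n_sensors, n_sources)) / np.sqrt(n_sources)

# lambda as in the text: 0.01 * trace{L L^T} / N
lam = 0.01 * np.trace(L @ L.T) / n_sensors

# n(t) = L b(t) + sqrt(lambda) e(t) has covariance L L^T + lambda I, i.e.
# the form of equation (1): spatially correlated brain noise plus
# uncorrelated sensor noise.
b = rng.standard_normal((n_sources, n_samples))   # biological (brain) noise
e = rng.standard_normal((n_sensors, n_samples))   # instrumental (sensor) noise
noise = L @ b + np.sqrt(lam) * e
```

The empirical covariance of `noise` converges to L L^T + λI as the number of samples grows.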

Source connectivity analysis
Source connectivity was estimated from the reconstructed source time-courses, σ_1(t) and σ_2(t), as follows. We computed the spectra Σ_1(f) and Σ_2(f) of the source time-courses σ_1(t) and σ_2(t). From these spectra, which are complex-valued functions of the frequency f, we extracted the spectral amplitudes and phases

A_i(f) = |Σ_i(f)|,   θ_i(f) = arg{Σ_i(f)},   i = 1, 2,

where |·| and arg{·} denote the absolute value and the argument, respectively, of a complex-valued number. We computed the phase difference as

∆θ(f) = θ_1(f) − θ_2(f).

The source connectivity was then estimated by using seven different PC metrics, as listed below.

The PLV (Tass et al 1998, Lachaux et al 1999) was estimated as

PLV = |⟨exp{ı ∆θ(f)}⟩|,   (3)

where ı is the imaginary unit, exp{·} denotes the exponential function, and ⟨·⟩ denotes the expectation value. The latter is evaluated as the average across time (see section 2.4.3 below). The iPLV (Palva and Palva 2012) was estimated as

iPLV = |ℑ{⟨exp{ı ∆θ(f)}⟩}|,   (4)

where ℑ{·} denotes the imaginary part of a complex-valued number. The PLI (Stam et al 2007) was estimated as

PLI = |⟨sign{sin ∆θ(f)}⟩|,   (5)

where sign{·} denotes the sign function.

The debiased weighted phase-lag index (wPLIdeb) (Vinck et al 2011) was estimated as

wPLIdeb = Σ_{j≠k} ℑ{X_j} ℑ{X_k} / Σ_{j≠k} |ℑ{X_j}| |ℑ{X_k}|,   (6)

where X_j = A_1(f) A_2(f) exp{ı ∆θ(f)} is the cross-spectrum evaluated at the jth time observation, and the sums run over all pairs of distinct observations. The coherence (Coh) (Rosenberg et al 1989, Halliday et al 1995) and the imaginary part of coherency (imCoh) (Nolte et al 2004) were derived from the complex-valued coherency

c(f) = ⟨A_1(f) A_2(f) exp{ı ∆θ(f)}⟩ / √(⟨A_1(f)²⟩ ⟨A_2(f)²⟩).   (7)

In particular, the former was computed as Coh = |c(f)| and the latter as imCoh = ℑ{c(f)}; hence, Coh and imCoh coincide with the absolute value and the imaginary part of the complex-valued coherency, respectively. imCoh, which can be negative-valued, was considered in absolute value. The lagged coherence (lagCoh) (Pascual-Marqui 2007b) was estimated as

lagCoh = |ℑ{c(f)}| / √(1 − ℜ{c(f)}²),   (8)

where ℜ{·} denotes the real part of a complex-valued number.
A source-leakage compensation by means of a pairwise orthogonalization procedure (Brookes et al 2012, O'Neill et al 2015) was applied to the source time-courses, σ_1(t) and σ_2(t), prior to the estimation of Coh and PLV. Such compensation was not required for estimating the other connectivity metrics, which are robust to artificial PC detection due to source leakage. An in-house implementation of the above connectivity metrics was used. Each metric was estimated for all frequencies within a band around the center frequency (see section 2.4.4 for the selection of the frequency band), and then averaged over this band.
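As a reference, the seven metric families can be computed from the complex time-resolved spectra as below. This is a minimal single-frequency-bin sketch of ours, not the in-house code used in the study; the debiased wPLI follows the pairwise-product estimator of Vinck et al (2011):

```python
import numpy as np

def pc_metrics(z1, z2):
    """Phase-coupling metrics from two complex spectral time series
    (one complex value per time observation, single frequency bin);
    expectation values are averages over time."""
    dphi = np.angle(z1) - np.angle(z2)
    u = np.exp(1j * dphi)
    cross = z1 * np.conj(z2)                      # cross-spectrum samples
    coherency = cross.mean() / np.sqrt((np.abs(z1) ** 2).mean()
                                       * (np.abs(z2) ** 2).mean())
    im = np.imag(cross)
    # Debiased wPLI: sums over all pairs of distinct observations.
    num = im.sum() ** 2 - (im ** 2).sum()
    den = np.abs(im).sum() ** 2 - (im ** 2).sum()
    return {
        "plv": np.abs(u.mean()),
        "iplv": np.abs(np.imag(u.mean())),
        "pli": np.abs(np.sign(im).mean()),
        "wpli_deb": num / den,
        "coh": np.abs(coherency),
        "imcoh": np.abs(np.imag(coherency)),      # taken in absolute value
        "lagcoh": np.abs(np.imag(coherency))
                  / np.sqrt(1 - np.real(coherency) ** 2),
    }
```

For two unit-amplitude signals with a constant phase lag of π/4, PLV, PLI, wPLIdeb, Coh and lagCoh all equal 1, while iPLV and imCoh equal sin(π/4).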

Simulation repetitions
We generated 10 000 two-source configurations by varying the locations of the cortical nodes s_1 and s_2 and the center frequency f_c. Of these, 5000 were used to test connectivity estimates for phase-coupled sources and 5000 for uncoupled sources. For each two-source configuration, source connectivity was estimated by independently varying the following four parameters in the simulations: (a) the data length; (b) the output signal-to-noise ratio (oSNR); (c) the spectral analysis approach; and (d) the test fractional bandwidth. These parameters are described below in more detail.

Data length
The length L of the source time-courses, hereinafter the data length, was measured as the number of cycles of an oscillation at the center frequency f_c (e.g. L = 5 cycles is equal to 1000 ms for f_c = 5 Hz, 500 ms for f_c = 10 Hz, or 250 ms for f_c = 20 Hz). We varied L in the range from 1 to 20 cycles, in steps of 0.5.

oSNR
For each of the two reconstructed sources, we defined the oSNR in the frequency band of the simulated signal, ∆f = 0.5 f_c, as the ratio between the projected actual-source-signal variance and the projected-noise variance, i.e.

oSNR_i = var_∆f{γ_i w_i^T l_i s_i(t)} / var_∆f{w_i^T n(t)},   i = 1, 2,

where var_∆f{·} denotes the variance of a signal which has been band-pass filtered (two-pass fourth-order Butterworth filter) in the frequency band ∆f. We independently varied γ_1 and γ_2 in equation (2) to set three levels of oSNR, i.e. 20 dB, 10 dB and 3 dB.
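A sketch of this band-limited variance ratio expressed in dB; the band edges and filter order follow the text (with filtfilt providing the two passes), while the function name is ours:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def osnr_db(signal_part, noise_part, fc, fs):
    # Band of interest: fc*(1 ± 0.25), i.e. an absolute width of 0.5*fc.
    # butter(4) + filtfilt = two-pass fourth-order Butterworth filtering.
    band = [0.75 * fc / (fs / 2), 1.25 * fc / (fs / 2)]
    b, a = butter(4, band, btype="bandpass")
    v_sig = np.var(filtfilt(b, a, signal_part))
    v_noise = np.var(filtfilt(b, a, noise_part))
    return 10 * np.log10(v_sig / v_noise)
```

Because the filter is linear, scaling the noise term by 1/√10 relative to the signal yields exactly 10 dB.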

Spectral analysis approach
The estimation of source connectivity from continuous signals, such as σ_1(t) and σ_2(t), relies on a time-resolved representation of signal spectra. That is, the spectra of the source time-courses were computed as functions of both frequency and time, i.e. Σ_1(f, t) and Σ_2(f, t). For each frequency, source connectivity estimates were computed as in equations (3)-(8), in which the expectation values were evaluated as an average over time (segments for the Fourier-based approach and instants for the Hilbert-based approach). Such a time-resolved representation of signal spectra was obtained using either a Fourier-based approach or a Hilbert-based approach.
In the Fourier-based approach, σ_1(t) and σ_2(t) were divided into overlapping segments of equal length W; the value of W was set to control the frequency resolution (see section 2.4.4); the step width between successive segments was chosen equal to the sampling interval of the signal. By construction, each segment was centered around a different time instant t. Within each segment, the signal was first multiplied by a Hanning window function to reduce effects related to spectral leakage (Harris 1978), and then Fourier transformed to compute the frequency-dependent Fourier coefficients. The collection of the Fourier coefficients estimated from the different segments yielded the time-resolved representation of the signal spectra, Σ_1(f, t) and Σ_2(f, t).
In the Hilbert-based approach, σ_1(t) and σ_2(t) were first band-pass filtered around the center frequency, with a given bandwidth (see section 2.4.4), by using a two-pass fourth-order Butterworth filter. The analytic signals of the filtered source time-courses were then computed by using the Hilbert transform. Particular attention was paid to the minimization of the edge effects induced both by the filter and by the Hilbert transform. For the former, we used Gustafsson's method to set the filter initial conditions (Gustafsson 1996), as implemented in the SciPy (scipy.signal) filtfilt function. For the latter, we used the filtered source time-courses to generate the coefficients of an AR model (Yule-Walker, order 30, or 2/3 of the data length for shorter data), which served to iteratively pad the time-courses by one cycle length at both ends prior to the computation of the Hilbert transform. The analytic signals provided the time-resolved representation of the signal spectra, Σ_1(f, t) and Σ_2(f, t), for the frequency band of interest.
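A minimal sketch of this pipeline (the AR-model edge padding described above is omitted here; SciPy's filtfilt with method="gust" supplies the Gustafsson initial conditions):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def hilbert_spectrum(x, f_low, f_high, fs):
    # Two-pass fourth-order Butterworth band-pass with Gustafsson initial
    # conditions, then the analytic signal via the Hilbert transform.
    # NOTE: the AR-model padding used in the study is not reproduced here.
    b, a = butter(4, [f_low / (fs / 2), f_high / (fs / 2)], btype="bandpass")
    xf = filtfilt(b, a, x, method="gust")
    return hilbert(xf)  # one complex value (amplitude and phase) per instant
```

For an in-band sinusoid, the analytic signal recovers a near-unit envelope and the correct instantaneous frequency away from the edges.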

Test fractional bandwidth
In real-data applications, the frequency band associated with the (potentially present) PC of interest may be unknown. We thus wanted to analyze the impact of the choice of the fractional bandwidth (the absolute bandwidth normalized by the center frequency) on the minimum data length required to obtain reliable connectivity estimates. We termed the fractional bandwidth used in connectivity analysis the 'test fractional bandwidth', B_f^test. We recall that, instead of using an unnormalized absolute bandwidth, it is convenient to use the fractional bandwidth, since it does not change as the center frequency of the band-pass process is moved.
This study is conducted under the assumption that, as described in section 2.2, the underlying PC shows a frequency-band specificity corresponding to a fractional bandwidth B_f = 0.5 (a value that roughly approximates standard EEG frequency bands; e.g. for a center frequency f_c = 10 Hz, we have the band 7.5-12.5 Hz, which approximates the alpha frequency band).
The test fractional bandwidth, B_f^test, depends on the choice of specific parameters in spectral analysis, discussed below. In particular, we defined B_f^test as the ratio between the 6 dB bandwidth ∆f_−6dB (namely the range of frequencies comprised between the −6 dB points of the frequency response function) and the center frequency f_c, i.e.

B_f^test = ∆f_−6dB / f_c.

For instance, estimating connectivity between filtered time-courses with ∆f_−6dB = 10 Hz or ∆f_−6dB = 1 Hz, with the same center frequency f_c = 10 Hz (i.e. in the frequency bands 5-15 Hz and 9.5-10.5 Hz), means analyzing the statistical dependency with B_f^test = 1 or B_f^test = 0.1, respectively. In the following, we describe the dependency of B_f^test on the selected parameters for the Fourier-based and Hilbert-based approaches.

For the Fourier-based approach, B_f^test solely depends on the length W of the segments into which the source time-courses are divided for the computation of the Fourier coefficients, and on the shape of the window function which is multiplied with each signal segment to reduce spectral leakage (Harris 1978); in this study we used a Hanning window, for which B_f^test = 2/W, where W is measured in number of cycles of an oscillation at the center frequency f_c. For the Hilbert-based approach,

B_f^test = (f_high − f_low) / f_c,

where f_low and f_high denote the lower and higher cutoff frequencies of the band-pass Butterworth filter applied to the source time-courses prior to the calculation of the analytic signal (note that the 3 dB attenuation at the cutoff frequency of a one-pass Butterworth filter is enhanced to 6 dB after two-pass filtering).
In this work, we varied the segment length W (for the Fourier-based approach) and the cutoff frequencies f_low and f_high (for the Hilbert-based approach) to vary B_f^test in the range from 0.1 to 1, in steps of 0.1. Since the fractional bandwidth B_f used in data generation is fixed to 0.5, the above choice for B_f^test corresponds to analyzing, for each realization, frequency bandwidths in the range from 0.2 to 2 times that of the simulated data (e.g. for a center frequency f_c = 10 Hz, and thus a simulated frequency band of 7.5-12.5 Hz, the analyses will cover bands from 9.5-10.5 Hz to 5-15 Hz).
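The relation between the Butterworth cutoffs and the −6 dB definition of B_f^test can be checked numerically: squaring the one-pass magnitude response (two-pass filtering) places the −6 dB points at the one-pass −3 dB cutoffs, so the measured fractional bandwidth matches (f_high − f_low)/f_c. A sketch (function name is ours):

```python
import numpy as np
from scipy.signal import butter, freqz

def minus6db_fractional_bandwidth(f_low, f_high, fc, fs):
    # Fourth-order Butterworth band-pass; two-pass filtering corresponds to
    # the squared magnitude response, whose -6 dB points we locate on a
    # dense frequency grid.
    b, a = butter(4, [f_low / (fs / 2), f_high / (fs / 2)], btype="bandpass")
    w, h = freqz(b, a, worN=20000, fs=fs)
    gain_db = 20 * np.log10(np.abs(h) ** 2 + 1e-300)  # two-pass => |H|^2
    band = w[gain_db >= -6.0]                         # contiguous passband
    return (band[-1] - band[0]) / fc
```

With cutoffs 7.5 and 12.5 Hz at f_c = 10 Hz, this returns a value very close to 0.5, as expected from B_f^test = (f_high − f_low)/f_c.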

Performance assessment
The performance of each connectivity metric was assessed by using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve (Egan 1975, Fawcett 2006). For each combination of simulation parameters (i.e. data length, oSNR, spectral analysis approach, and fractional bandwidth), we collected n_pos = 5000 connectivity estimates obtained from phase-coupled sources, defined as positives (Fawcett 2006), and n_neg = 5000 connectivity estimates from uncoupled sources, defined as negatives. We defined a threshold T(p), which is a function of p (0 < p ⩽ 100), as the connectivity value at the pth percentile of the connectivity metric distribution of the uncoupled sources. Given a value for p, all the scores exceeding T(p) were classified as positives (coupled), otherwise as negatives (uncoupled); the true positive rate (TPR) was computed as the ratio between the number of positive samples correctly detected as positives (coupled) and the total number of positives; the false positive rate (FPR) was computed as the ratio between the number of negative (uncoupled) samples incorrectly detected as positives and the total number of negatives. Note that, in this simulation, the FPR is equal to 1 − p/100. The sensitivity and specificity were computed as TPR and 1 − FPR, respectively.
The ROC curve was obtained by plotting TPR against FPR for successive values of p in the range from 0 to 100, in steps of 1. The AUC was estimated from the ROC curve using the trapezoidal approximation. The 95% confidence interval for the AUC was estimated under a large-sample-size approximation as CI_95%(AUC) = AUC ± 1.96 SE(AUC), where SE(AUC) is the standard error of the AUC of the form (Hanley and McNeil 1982)

SE(AUC) = √{ [AUC(1 − AUC) + (n_pos − 1)(Q_1 − AUC²) + (n_neg − 1)(Q_2 − AUC²)] / (n_pos n_neg) },

with Q_1 = AUC/(2 − AUC) and Q_2 = 2AUC²/(1 + AUC).
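The AUC procedure (percentile thresholds from the uncoupled distribution, trapezoidal integration, Hanley-McNeil standard error) can be sketched as follows; the function name is ours:

```python
import numpy as np

def auc_with_ci(pos, neg):
    """ROC AUC from percentile thresholds on the negative (uncoupled)
    distribution; returns (auc, 95% confidence half-width)."""
    thr = np.percentile(neg, np.arange(0, 101))     # p = 0, 1, ..., 100
    tpr = np.array([(pos > t).mean() for t in thr])
    fpr = np.array([(neg > t).mean() for t in thr])
    order = np.argsort(fpr)
    x, y = fpr[order], tpr[order]
    auc = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2)  # trapezoidal rule
    n_pos, n_neg = len(pos), len(neg)
    q1 = auc / (2 - auc)                             # Hanley-McNeil Q1
    q2 = 2 * auc ** 2 / (1 + auc)                    # Hanley-McNeil Q2
    se = np.sqrt((auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
                  + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg))
    return auc, 1.96 * se
```

Well-separated positive and negative distributions yield an AUC close to 1, while identical distributions yield an AUC close to 0.5.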
We used the AUC as a measure of the performance of a connectivity metric in discriminating phase-coupled sources from uncoupled sources, i.e. the larger the AUC, the better the performance. The minimum data length required for accurate discrimination was found according to a minimum AUC value of 0.70, which is conventionally used as a threshold value to define a diagnostic test as accurate (Swets 1988). We looked at the AUC as a function of the data length and the test fractional bandwidth; approximate isolines for AUC = 0.7 were computed after applying a two-dimensional Gaussian smoothing kernel with unitary standard deviation.

Dependence on phase difference
To investigate the dependence of the connectivity metric performance on the phase difference between phase-coupled sources, we performed dedicated simulations in which the phase difference ∆θ during data generation was parametrically set in the range from 0 to π, in steps of π/18. At each value of ∆θ, we evaluated the minimum data length required to obtain an AUC larger than or equal to 0.70.

Edge response analysis
To show practical implications of the use of finite data length for connectivity analysis in real-life experiments, we simulated an example case of a dynamic connectivity analysis experiment, as described below.
We generated pairs of signals, sampled at 512 Hz, representing the activity of two sources at 10 Hz, with 5 Hz bandwidth. The signal length was 6 s, centered at t = 0 s (i.e. from −3 s to 3 s). We simulated different source connectivity profiles. In the first profile, the data comprised two consecutive segments, each of length 3 s; the sources were phase-coupled (∆θ = π/2) in the first segment (i.e. from −3 s to 0 s) and uncoupled in the other segment (i.e. from 0 s to 3 s); this profile will be referred to as 'edge'. The other profiles comprised three consecutive segments, i.e. one central segment of length D, centered at t = 0 s, and two lateral segments; the sources were phase-coupled (∆θ = π/2) in the central segment (i.e. from −D/2 s to D/2 s) and uncoupled in the lateral segments (i.e. from −3 s to −D/2 s, and from D/2 s to 3 s); these profiles will be referred to as 'steps'. We varied D across 0.125 s, 0.25 s, 0.5 s, and 1 s. Additive uncorrelated noise was sampled from a univariate Gaussian distribution; the SNR was measured as the ratio between the signal variance and the noise variance in the frequency band 7.5−12.5 Hz and set to be equal to 10 dB.
For each connectivity profile, we generated a total of 1000 signal realizations. We estimated the time-courses of dynamic source connectivity by using a sliding-window approach. Specifically, connectivity was estimated within sliding windows of data of length L, moved in steps of 7.8 ms across the whole signal length. We varied L across 250 ms, 500 ms, 750 ms, and 1000 ms.
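The sliding-window estimator described above can be sketched as follows for the Hilbert-based PLV (names and defaults are illustrative):

```python
import numpy as np
from scipy.signal import hilbert

def sliding_plv(x, y, fs=512, win_s=0.5, step_s=0.0078125):
    """Sliding-window PLV from the instantaneous phase difference.
    Returns the PLV time-course and the window start indices."""
    # instantaneous phase difference via the analytic signal
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))   # 7.8125 ms = 4 samples at 512 Hz
    starts = np.arange(0, len(dphi) - win + 1, step)
    # PLV = modulus of the mean unit phasor of the phase difference
    plv = np.array([np.abs(np.mean(np.exp(1j * dphi[s:s + win])))
                    for s in starts])
    return plv, starts
```

For two sinusoids with a constant phase lag, the PLV stays near 1 in every window, apart from Hilbert edge effects.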

Results
In this section, we first present the performances of the different connectivity metrics in discriminating phase-coupled sources from uncoupled sources, as measured by the ROC AUC; we consider the performance acceptable when the AUC is at least 0.7. Then, we present how the minimum data length required to reach the acceptable performance level depends on the phase difference between phase-coupled sources. Last, we discuss the practical implications of the use of finite data length in a simulated experiment of dynamic connectivity analysis using a sliding-window approach.
We found similar patterns of AUC dependence on data length L and test fractional bandwidth B_f^test across the different connectivity metrics and oSNR values. The AUC dependence patterns are described below for the representative case of PLV estimated from source signals with oSNR = 10 dB. For fixed L, the AUC initially increases with increasing B_f^test and approaches a maximum at a value of B_f^test roughly equal to the fractional bandwidth of the simulated signal (i.e. 0.5), followed by a plateau or a slight decrease at higher B_f^test. Figure 3(B), left plot, shows the heat map of the AUC for PLV estimated using the Fourier-based approach as a function of L and B_f^test. While the AUC still increases with increasing L, we emphasize a few key differences between this AUC pattern and the one for PLV estimated using the Hilbert-based approach. First, the signal spectra and, consequently, the connectivity estimates and AUC could not be evaluated for L < 2/B_f^test (the white region in the AUC heat map), since the data length was not sufficient to build segments of length W = 2/B_f^test, necessary to achieve the desired frequency resolution. Second, for a given combination of L and B_f^test, the AUC is smaller than the respective value calculated using the Hilbert-based approach. Third, the AUC pattern reflects the number of averaging segments used for the computation of Fourier spectra (see figure 3(B), right plot, for quantitative values), with small AUC values paired with a small number of averaging segments. Last, as an effect of the AUC dependence on the number of averaging segments, the value of B_f^test at which, for fixed L, the AUC reaches its maximum is slightly larger than 0.5.
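The Fourier-based constraint above, and a segment-averaged coherency from which imCoh is the imaginary part, can be illustrated as follows (a sketch with our own variable names, under the stated W = 2/B convention):

```python
import numpy as np

def fourier_segments(L_cycles, B_test):
    """Number of non-overlapping averaging segments for the Fourier-based
    estimate: each segment must span W = 2/B cycles of the oscillation of
    interest to reach the required frequency resolution. A return value of
    0 corresponds to the undefined (white) region of the AUC heat map."""
    W = 2.0 / B_test
    return int(L_cycles // W)

def coherency(x_segs, y_segs):
    """Segment-averaged coherency at one frequency bin from per-segment
    complex Fourier coefficients (arrays of length n_segments)."""
    sxy = np.mean(x_segs * np.conj(y_segs))   # cross-spectrum
    sxx = np.mean(np.abs(x_segs) ** 2)        # auto-spectra
    syy = np.mean(np.abs(y_segs) ** 2)
    return sxy / np.sqrt(sxx * syy)           # imCoh = its imaginary part
```

With a single segment the coherency modulus is trivially 1, which is one way to see why few averaging segments degrade the discrimination performance.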
For a comparative evaluation of the performance of the connectivity metrics, we looked at the minimum data length required to reach the minimum performance level, i.e. AUC = 0.7; this quantity will be hereinafter denoted as L_min. For each connectivity metric, we extracted the isoline for AUC = 0.7 in the plots of AUC as a function of L and B_f^test; this isoline indicates the values of L_min for various values of B_f^test.

Figure 3. (A) Heat map of AUC for PLV estimated by using the Hilbert-based approach, as a function of data length (expressed as number of cycles, cyc.) and test fractional bandwidth, for oSNR = 10 dB; the black line indicates the isoline for AUC = 0.7. (B) Left: heat map of AUC for PLV estimated by using the Fourier-based approach, as a function of data length and test fractional bandwidth, for oSNR = 10 dB; the black line indicates the isoline for AUC = 0.7; right: heat map of the number of averaging segments used for the computation of Fourier spectra, as a function of data length and test fractional bandwidth. (C) Isolines (solid lines) for AUC = 0.7 with the respective uncertainty regions (shaded regions, calculated as the area within the isolines for the 95%-confidence-interval bounds of AUC equal to 0.7), as a function of data length and test fractional bandwidth, for the different connectivity metrics (PLV, iPLV, PLI, wPLIdeb, Coh, imCoh, lagCoh); the plots in different rows refer to different spectral analysis approaches (Hilbert-based and Fourier-based); the plots in different columns refer to different values of oSNR (20 dB, 10 dB, and 3 dB).
The results are shown in figure 3(C); the plots in different rows refer to different spectral analysis approaches (Hilbert-based and Fourier-based); the plots in different columns refer to different oSNR values (20 dB, 10 dB, and 3 dB); within each plot, the solid lines represent the isolines for AUC = 0.7 calculated for the different connectivity metrics, and the shaded regions represent the respective uncertainty regions, calculated as the area within the isolines for the 95%-confidence-interval bounds of AUC equal to 0.7. We distinguish two main trends for L_min as a function of B_f^test: (a) L_min rapidly increases with decreasing B_f^test if the latter is smaller than a value close to (Hilbert-based approach) or slightly above (Fourier-based approach) the fractional bandwidth of the simulated signal (i.e. 0.5 in this simulation); and (b) above this value, L_min slightly decreases (oSNR = 20 dB), remains constant (oSNR = 10 dB) or slightly increases (oSNR = 3 dB) with increasing B_f^test. In the comparison between spectral analysis approaches, we found that, for a given value of B_f^test, the L_min obtained by using the Hilbert-based approach is globally smaller than the L_min obtained by using the Fourier-based approach. In the comparison between the different connectivity metrics, the main differences arise if the Hilbert-based approach is used and B_f^test is small, where the values of L_min for Coh and lagCoh are smaller than those of the other connectivity metrics; overall, the values of L_min for imCoh and iPLV are systematically larger than those of the other connectivity metrics. Figure 4 shows the AUC as a function of L in the particular case of B_f^test = 0.5, i.e. a value that roughly corresponds to the bandwidth of conventional EEG frequency bands (e.g. 4.5-7.5 Hz for theta, 7.5-12.5 Hz for alpha, or 15-25 Hz for beta) and that corresponds to the fractional bandwidth of the simulated signals.
For all connectivity metrics, the AUC increases with increasing L, starting from a value around 0.5 (no discrimination ability) up to values greater than 0.8 (high discriminability) in high oSNR conditions.
In high oSNR conditions (20 dB and 10 dB), and using the Hilbert-based approach, we found that an L_min of about 2-3 cycles is needed for PLV, PLI and wPLIdeb to reach the AUC threshold value (AUC = 0.7), while at least 5 cycles are needed for iPLV and imCoh. Conversely, with an oSNR equal to 3 dB, about 10 cycles are needed to reach AUC = 0.7 for essentially all methods.
In accordance with the previous results, the Fourier-based approaches were associated with lower performance than those obtained by relying on the Hilbert transform. In particular, using the former approach, L_min is almost doubled.
We also evaluated the performance of different connectivity metrics as a function of the phase difference between the simulated phase-coupled signals. Figure 5 shows the minimum number of cycles needed to obtain an AUC ⩾ 0.7, as a function of the phase difference.
Consistently with the previous findings, we found that the methods based on the Hilbert approach outperform the Fourier-based ones, since the L_min obtained with the latter is approximately two times larger than that obtained with the former. Figure 5 shows that the curves always have their minimum at phase differences close to π/2. The range of phase differences over which each metric remains close to its minimum depends on the level of oSNR: the higher the oSNR, the larger this range (which approximately spans 100° for an oSNR equal to 20 dB, and 60° for an oSNR of 3 dB). Interestingly, global minima for L_min are reached by imCoh, while other metrics, e.g. lagCoh and iPLV, reached these minima only in particular conditions.
Finally, we showed some practical implications of using a finite data length in the example case of a dynamic connectivity analysis experiment. We simulated pairs of time-courses composed of phase-coupled signals (in the alpha frequency band, i.e. 7.5-12.5 Hz) of different durations (length = 1, 0.5, 0.25 and 0.125 s), preceded/followed by uncoupled signals. Then, we estimated the dynamic connectivity using the PLV with sliding-window lengths L = 10, 7.5, 5 and 2.5 cycles, corresponding to sliding-windows of 1, 0.75, 0.5, and 0.25 s, respectively.

Figure 5. Minimum data length in the range 1−20 cycles (cyc.) required to obtain AUC ⩾ 0.7 as a function of the phase difference between source time-courses, calculated for the different connectivity metrics (PLV, iPLV, PLI, wPLIdeb, Coh, imCoh, lagCoh) and for a test fractional bandwidth B_f^test = 0.5; the plots in different rows refer to different spectral analysis approaches (Hilbert-based and Fourier-based); the plots in different columns refer to different values of oSNR (20 dB, 10 dB, and 3 dB).

Figure 6. Median value (solid lines) and the range from the 16th to the 84th percentile (shaded regions) for 5000 estimates of dynamic connectivity as measured by PLV using a sliding-window approach; the black solid line represents the theoretical phase-coupling profile, i.e. equal to 1 for phase-coupled signals, or 0 for uncoupled signals. The plots in different rows refer to different sliding-window lengths (250 ms, 500 ms, 750 ms, and 1000 ms); the plots in different columns refer to different dynamic connectivity profiles (edge, and steps of length 1 s, 0.5 s, 0.25 s, and 0.125 s).

Figure 6 shows the obtained dynamic connectivity profiles.
The detection of coupling is relatively easy when it lasts for 1 s, although the connectivity profile for each sliding-window length has a different slope and peak value. In particular, a short sliding-window is associated with a connectivity profile with a steep slope, whereas the opposite holds for longer windows.
In addition, our results showed that short windows yield a clear detection peak even when the coupling has a short duration, whereas longer windows failed to detect short couplings. Another interesting aspect is that uncoupled signal pairs result in an increased average PLV value as the window gets shorter. Using a short sliding-window, uncoupled signals may thus be wrongly detected as coupled, but it is likely that most of the phase-coupled sources are detected as functionally coupled. Conversely, using a longer window, uncoupled signals are hardly detected as coupled, but it is possible that short-lived PCs are not detected at all.

Discussion
In this study, we investigated the impact of the data length on the reliability of PC estimation in a dynamic FC framework. Overall, our results show that, in the comparison between the spectral analysis approaches, the Hilbert-based approach outperforms the Fourier-based one. Moreover, we showed that five oscillation cycles may be considered sufficient to conduct a reliable analysis for almost all the analyzed (Hilbert-based) metrics, assuming a conventional threshold for the AUC equal to 0.7 (Swets 1988) and an oSNR greater than or equal to 10 dB. In fact, a substantial fraction (approx. 25%) of spontaneous PC at the frequency of the µ-rhythm within the interhemispheric cortical motor network can be captured by windows of about five oscillation cycles (Ermolova et al 2021). In general, we found some similarities among the behaviors of the analyzed methods, which may depend on mathematical relations among them (Nolte et al 2020; see the supplementary material (available online at stacks.iop.org/JNE/19/016039/mmedia) of this paper for a proof of the theoretical relations among the three PC methods that were not considered in the above-mentioned study). However, since these relations hold for an infinite amount of data and we deal with short windows of data, it is reasonable that some statistical differences arise. Specifically, the metrics that reached an AUC of 0.7 with the smallest number of cycles for high SNRs (i.e. 20 dB and 10 dB) were those relying on both the imaginary part and the real part, such as PLV and coherence of orthogonalized signals, and the lagged coherence. Unfortunately, these metrics might be prone to false positives (figure S2 shows e.g. an overall inflation of Coh with respect to imCoh values also where no connectivity is expected). Moreover, a direct comparison between the different metrics depends on the specific application, e.g. in relation to the accepted FPR, and is beyond the scope of this work.
Our findings also demonstrated that the phase difference influences the number of cycles to be chosen to obtain an AUC equal to or greater than 0.7, but this influence is limited. For example, for an oSNR greater than or equal to 10 dB, the minimum data length required to obtain AUC ⩾ 0.7 is always smaller than five cycles for a phase difference in the range 40°−140° for all connectivity metrics, with a minimum corresponding to ±90°. This minimum value can be easily explained since artifactually zero-lag correlated sources are associated with phase differences close to either 0° or ±180°, which are maximally distant from ±90°. For a frequency of 10 Hz, the range 40°−140° would correspond to a lag between source time-courses in the range 10 ms-40 ms.
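The phase-to-lag conversion used here is simply lag = ∆θ/(360°·f); a trivial helper (illustrative naming) makes the arithmetic explicit:

```python
def phase_to_lag_ms(dtheta_deg, f_hz):
    """Time lag (in ms) corresponding to a phase difference (in degrees)
    at oscillation frequency f: lag = dtheta / (360 * f)."""
    return dtheta_deg / (360.0 * f_hz) * 1000.0
```

At 10 Hz, 40° and 140° map to roughly 11 ms and 39 ms, consistent with the 10 ms-40 ms range quoted above.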
In addition, our results showed that the fractional bandwidth chosen for the analysis also influences the number of cycles needed to obtain an AUC equal to or greater than 0.7. For example, for an oSNR greater than or equal to 10 dB, choosing a bandwidth approximately equal to (or greater than) the one associated with the underlying coupling would be the best choice. Since it is difficult to have a priori knowledge of the exact frequency range of the couplings in real data, unless previous neuroscientific knowledge is available, our suggestion is to avoid selecting a very narrow frequency range, since this choice would imply using larger windows of data to reach the desired performance. In practice, an unmotivated reduction of the analyzed frequency range may worsen the performance of PC methods. In our simulation study, we assumed the individual central frequency to be known. Even though in real-data applications the individual central frequency may also be a priori unknown, power or time-frequency analyses can be performed, e.g. to identify individual frequency peaks (which can subsequently be used as the central frequency for a PC analysis).
Finally, we investigated some practical implications of using short/long windows of data in the case of neural couplings of different durations. In particular, short sliding-windows showed higher sensitivity and lower specificity than longer sliding-windows. In real-time settings, it will be fundamental to decide whether more specificity or more sensitivity is of interest, also considering that the value of these variables depends on the chosen metric (see figure S1). These aspects should be carefully considered in scientific experiments, such as those based on closed-loop EEG with concurrent transcranial magnetic stimulation (TMS) (Zrenner et al 2016, Tervo et al 2022), in which a stimulation pulse might be triggered by the real-time detection of specific functional couplings. Although in the main analysis we only showed, as a representative example, the dynamic profiles associated with PLV, we also performed the edge response analysis with the other PC metrics. The results showed that all metrics behave similarly for sliding-window lengths larger than 250 ms. When the shortest sliding-window length is used, PLI and wPLIdeb tend to reach values close to 1 also for phase-uncoupled sources (see figure S2). This behavior might be due to the fact that these metrics take into account, in the expectation value, the sign function of the phase difference, which equals either 1 or −1, thus being almost always larger (in magnitude, in the case of uncoupled sources) than a plain raw value. The above-described negative behavior is mitigated by considering a higher number of cycles.
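The small-sample inflation of PLI for uncoupled sources can be reproduced in a few lines (a sketch with illustrative names): for uniformly random phase differences, the magnitude of the mean of ±1 signs shrinks roughly as 1/√N, so short windows yield inflated values:

```python
import numpy as np

def pli(dphi):
    """PLI = |mean sign of the sine of the phase difference|; each term is
    +/-1, so averages over few samples are biased away from zero."""
    return np.abs(np.mean(np.sign(np.sin(dphi))))

rng = np.random.default_rng(0)

def mean_pli_uncoupled(n_samples, n_trials=500):
    """Average PLI over many realizations of uncoupled (uniformly random)
    phase differences, for a window of n_samples points."""
    return np.mean([pli(rng.uniform(-np.pi, np.pi, n_samples))
                    for _ in range(n_trials)])
```

With 5 samples per window the expected PLI for uncoupled sources is around 0.37, versus roughly 0.04 with 500 samples, mirroring the short-window inflation described above.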
One aspect that has not been taken into account in this study is the potential advantage of using multidimensional (MD) connectivity methods instead of one-dimensional (1D) methods (see e.g. Basti et al 2020). MD methods allow the analysis of the statistical dependency between two sets of multivariate time-courses, such as the parcel activity or the three-dimensional time-course associated with reconstructed cortical activity with free source orientation. In contrast, 1D methods, such as the approaches analyzed in this study, can only be applied to pairs of scalar time-courses, e.g. to two EEG sensor signals or to two time-courses associated with orientations normal to the local cortical surface. Since recent studies showed the sub-optimality of 1D methods when standard-length data have a multivariate nature (Geerligs and Henson 2016; see also Basti et al 2019 for the concept of 'multivariate nature'), it is possible that this difference in performance also arises when shorter data are considered. Nevertheless, a comprehensive comparison between 1D and MD methods is out of the scope of this paper. Similarly, here we focused on the performance of undirected PC methods and, thus, did not consider e.g. amplitude-based methods (O'Neill et al 2015), cross-frequency methods (Palva et al 2005, Chella et al 2016), or directed PC methods, such as the phase slope index (Nolte et al 2008, Basti et al 2017) and phase transfer entropy (Lobier et al 2014).
The synthetic study reported here relied on simulations of interacting neural sources. While we limited our analyses to pairs of phase-coupled sources, in future studies it will be interesting to quantitatively analyze the changes in metric performance induced by the presence of complex PC networks composed of more than two sources (which may in addition provide a framework for analyzing multivariate and MD PC approaches). We preferred to rely on a model providing phase-coupled time-courses in a frequency band of interest (resembling frequency-specific neural phase-coupled activities). Several other models have been used in the past for conducting synthetic connectivity analyses; two typical examples are autoregressive (AR) models (Sommariva et al 2019) and neural mass models. In future studies, it will also be interesting to analyze whether the presence of amplitude-correlated time-courses, such as those obtained by relying on AR models, may affect the performance of some connectivity methods (e.g. wPLI, which explicitly weighs the amplitudes). Furthermore, in our simulation pipeline we explicitly used two forward models in order to (independently) generate the EEG signals and solve the inverse problem. For the latter step, we relied on the eLORETA method and assumed the source locations to be perfectly estimated by this method. The source-leakage effects are, however, not significantly reduced by this step: the biological noise couples to the estimate through the leakage inherently present in the inverse estimator. Since we only used robust PC methods, we do not expect significant inflation of the metric performances.
In conclusion, the achieved findings pave the way to the definition of guidelines necessary when an online frequency-specific PC assessment is of interest. Such estimation might be crucial in a non-invasive closed-loop EEG-TMS or in a neurofeedback framework.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.