Combined head phantom and neural mass model validation of effective connectivity measures

Objective. Due to its high temporal resolution, electroencephalography (EEG) has become a promising tool for quantifying cortical dynamics and effective connectivity in a mobile setting. While many connectivity estimators are available, the efficacy of these measures has not been rigorously validated in real-world scenarios. The goal of this study was to quantify the accuracy of independent component analysis and multiple connectivity measures on ground-truth connections while exposed to real-world volume conduction and head motion. Approach. We collected high-density EEG from a phantom head with embedded antennae, using neural mass models to generate transiently interconnected signals. The head was mounted upon a motion platform that mimicked recorded human head motion at various walking speeds. We used cross-correlation and signal to noise ratio to determine how well independent component analysis recovered the original antenna signals. For connectivity measures, we computed the average and standard deviation across frequency of each estimated connectivity peak. Main results. Independent component analysis recovered most antenna signals, as evidenced by cross-correlations primarily above 0.8, and maintained consistent signal to noise ratio values near 10 dB across walking speeds compared to scalp channel data, which had decreased signal to noise ratios of ~2 dB at fast walking speeds. The connectivity measures used were generally able to identify the true interconnections, but some measures were susceptible to spurious high-frequency connections inducing large standard deviations of ~10 Hz. Significance. Our results indicate that independent component analysis and some connectivity measures can be effective at recovering underlying connections among brain areas. These results highlight the utility of validating EEG processing techniques with a combination of complex signals, phantom head use, and realistic head motion.


Introduction
A recent thrust of neuroimaging research has been to measure brain activity during mobile real-world scenarios [1]. Traditional neuroimaging methods, such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), require stationary subjects, limiting their use for real-world recordings. In contrast, high-density EEG is a promising method for recording real-world brain dynamics due to its portability and high temporal resolution [2,3]. However, EEG is affected by low spatial resolution and artifact contamination, making it challenging to extract meaningful cortical information [4]. Blind source separation using independent component analysis can separate out cortical and artefactual sources, reducing the impact of artifact contamination and improving spatial resolution [5,6]. Such high-density, source-localized experiments have been performed during mobile tasks such as treadmill walking, stair stepping, and balance-beam walking [7-9].
While many EEG studies analyze frequency-domain spectral power, understanding the flow of information amongst brain areas using connectivity analysis can provide a more complete understanding of the brain by quantifying interactions amongst cortical regions. However, there are many connectivity measures to choose from. One class of measures originated from Granger causality, which states that one signal influences activity in a second signal if information from the past of the first signal provides information that helps predict the future of the second signal [10]. This idea was developed for two signals only, but has since been extended to multichannel data by using multivariate autoregressive modelling [11]. One of these extensions is directed transfer function [12], which was based on the transfer function of the autoregressive model. This was later corrected using normalization to be frequency independent and sensitive only to direct connections, leading to full-frequency directed transfer function (ffDTF) and direct directed transfer function (dDTF), respectively [13]. Another extension of Granger causality is partial directed coherence, which was based on the model coefficients in the frequency domain [14]. Corrections to make this measure less dependent on scaling and later scale-free have resulted in generalized partial directed coherence (gPDC) and renormalized partial directed coherence (rPDC), respectively [15,16]. In addition to these extensions of Granger causality, there is also Granger-Geweke causality (GGC) [17]. Additionally, other connectivity measures, such as phase locking value (PLV) and weighted phase lag index (WPLI), do not use multivariate autoregressive models. PLV measures the relative phase between two sources [18], but can include spurious, instantaneous connections due to volume conduction.
In contrast, WPLI is based on imaginary coherence, which ignores these instantaneous connections and increases sensitivity to true connections [19]. For our study, we used the debiased WPLI-square estimator from Vinck et al [19]. Because PLV and WPLI do not act on model-fit EEG activity, they take advantage of averaging results across multiple trials. Due to the limitations of various connectivity measures, many other measures and techniques exist beyond the ones listed here, including methods to identify connectivity using nonstationary burst dynamics that Granger causality measures do not usually account for [20,21].
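To make the trial-averaging idea behind PLV concrete, the following is a minimal sketch of the standard across-trial definition at a single time-frequency point. It assumes the instantaneous phases have already been extracted for each trial (for example with a Hilbert or wavelet transform); the function name `phase_locking_value` is illustrative and is not taken from any toolbox used in this study.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Across-trial PLV at one time-frequency point.

    phases_a, phases_b: per-trial instantaneous phases (radians) of two
    sources. Phase extraction is assumed to have been done beforehand.
    """
    if len(phases_a) != len(phases_b) or len(phases_a) == 0:
        raise ValueError("need two equal-length, non-empty phase lists")
    # Average the unit phasors of the per-trial phase differences.
    mean_phasor = sum(
        cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)
    ) / len(phases_a)
    # Magnitude is 1 for a perfectly consistent phase difference and
    # approaches 0 when the difference is spread uniformly over trials.
    return abs(mean_phasor)


# A constant 0.5 rad lag across trials gives a PLV of (numerically) 1.
lagged_a = [0.1 * k for k in range(10)]
lagged_b = [p - 0.5 for p in lagged_a]
plv_locked = phase_locking_value(lagged_a, lagged_b)

# Phase differences spread evenly around the circle give a PLV near 0.
spread_a = [2 * math.pi * k / 8 for k in range(8)]
plv_spread = phase_locking_value(spread_a, [0.0] * 8)
```

Note that a zero phase difference repeated across trials also yields a PLV of 1, which is why instantaneous mixing from volume conduction can appear as a connection under this measure.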
Due to the abundance of connectivity measures, several studies have attempted to validate these measures. Previous research has used simulated data to generate connection patterns with a known ground truth [22]. This has been used to verify connectivity during walking [23] and to show that connectivity measures can be affected by volume conduction [24]. The downside of such modelling is that it usually avoids the non-linearities of the real world, which could potentially violate the assumptions of the measure being validated. Another way to compare measures is by recording EEG from human subjects and then using various metrics for validation [25], but this leads to assumptions about the underlying connectivity pattern in the absence of a ground truth. There is currently a need to validate connectivity measures on real-world ground truth signals.
In addition, there is ongoing debate as to whether connectivity estimation should be performed at the channel or source level [24,26,27]. The concern with source-level connectivity is that it impairs the channel data's correlative structure, removing important information [28]. On the other hand, source-level connectivity is less influenced by volume conduction and involves specific cortical areas [24]. In addition to volume conduction, motion artifact becomes a concern in mobile settings [29]. There is therefore also a need to determine whether independent component analysis preprocessing can result in accurate connectivity estimation in the presence of real-world volume conduction and motion artifact.
One way to provide real-world testing with ground truth signals is to use a phantom head with embedded antennae. Head phantoms have long been used in fMRI research to test methods [30] and have also validated EEG source estimation techniques [31,32]. EEG phantom heads mounted upon a moving platform have quantified the effects of head motion and cable sway [33,34]. However, no phantom head studies have validated connectivity estimation techniques. In order to test connectivity measures with complex, multi-frequency signals, we used a neural mass model. Neural mass models are based on the oscillatory properties of neuronal networks, which can be used to generate oscillations at various physiological frequency ranges [35]. By summing the results of multiple neural mass models, complex waveforms can be created [36]. Additionally, interconnections can be created between neural mass model signals to analyze EEG connectivity [23,37,38]. However, these previous studies involved computer simulations and did not account for real-world effects as a phantom head might.
The purpose of this study was to determine the efficacy of independent component analysis and connectivity measures on signals of varying complexity exposed to real-world volume conduction and motion artifact. We hypothesized that independent component analysis would be able to recover the antennae sources and separate out motion artifact as measured by signal to noise ratio and cross-correlation of the resulting independent components. In addition, we hypothesized that the connectivity estimation measures we used would be able to find the true causal interactions based on the peak frequency for each measure.
Figure 1. Phantom head antennae locations. A CT scan (left) and diagram (right) of the antennae locations within the phantom head are shown, using an axial view. The low, mid, and high antennae were used to generate the signals of interest that contained intermittent connections. These names are based on the peak frequency content of each antenna, with the low signal containing the lowest peak frequency while the high signal included the highest peak frequency of the three signals. In addition, we used three distractor signals at the antenna locations marked in yellow, which are numbered for later reference. Two other antennae were not used for this study due to technological constraints.
Figure 2. Antenna signals of interest power spectra. The power spectra for the three signals of interest (low, mid, high) are shown for each condition. Signals in the single peak condition have a single sharp frequency peak, indicating one dominant frequency (the smaller peaks in the mid and high signals are from intermittent connections throughout each condition). The power spectra during the smeared peak condition are less sharp, reflecting a more complex signal. For the double peak condition, each signal had two frequency peaks, which is best exemplified by the high signal power spectra.

Phantom head setup and antenna signals
Our phantom head consisted of a mannequin head with eight exposed wire pairs, around which we used a combination of dental plaster, sodium propionate, and water to simulate realistic tissue conductance. See Oliveira et al for a more complete description [33]. We sent predefined signals into each antenna using an input/output interface (MicroLabBox, dSPACE GmbH, Paderborn, Germany). We used 6 of the 8 antennae due to memory constraints (figure 1). Three antenna signals contained intermittent connections, while the other three were distractor signals to see how well independent component analysis and connectivity measures performed in the presence of other signals.
The three non-distractor signals were classified as low, mid, and high based on the main frequency component of the signal, as shown in figure 2. The low signal's peak frequency was at 6.5 Hz, corresponding to the EEG theta band. Peak frequency for the mid signal was at 10 Hz, corresponding to the EEG alpha band. For the high signal, the peak frequency was at 41 Hz, which corresponded to the EEG gamma band. The experiment included three conditions with different antenna signals: (1) signals with a single peak frequency, (2) signals with a smeared, single peak frequency, and (3) signals with two frequency peaks.
We generated complex and physiologically-relevant signals for each antenna using a neural mass model based on previous research [35,38]. We used the neural mass model to generate six separate sources with peak frequencies in different EEG bands (delta, theta, alpha, beta, low gamma, and high gamma). These sources were summed together to create each final antenna signal, using the different weightings shown in table 1.
For each condition, we induced periodic connections between the three antenna signals of interest, using the pattern shown in figure 3. The six pre-defined signals lasted for 20 min total for each signal condition, with intermittent connections every 2 s. We recorded 20 min of 128-channel EEG (BioSemi Active II, BioSemi, Amsterdam, NL) from these signals sent through the phantom head, resulting in 100 trials for each type of periodic connection.
It should be noted that for the single dominant frequency condition, we found that the front-most low-frequency distractor signal (peak frequency of 4 Hz) resulted in suboptimal independent component analysis decomposition. When we analyzed the correlation for each antenna signal between 12 s trials, we found a much higher autocorrelation for this low-frequency distractor signal (0.42) than any of the other signals, including the low signal (0.01). This suggests that independent component analysis performs better with some data variability. We used a 3.25-4.75 Hz notch filter to remove this signal only for the single dominant frequency condition, which improved the independent component analysis decomposition.
We collected real-world human head motion during gait from one young healthy subject (male), using an inertial measurement unit (APDM, Portland, OR) strapped to his forehead. This subject provided written informed consent, and our protocol was approved by the University of Michigan Health Sciences and Behavioral Sciences Institutional Review Board for the protection of human subjects. We recorded 20 min conditions each of standing and walking at 0.5 m s⁻¹, 1.0 m s⁻¹, 1.5 m s⁻¹, and 2.0 m s⁻¹. This data was converted into trajectories that were replicated by our Notus hexapod (Symétrie, Nimes, FR), similar to a previous study [34]. By mounting the phantom head on top of the hexapod, we could simulate realistic human motion during phantom head recordings. Due to electromagnetic noise from the hexapod motors when using the MicroLabBox, all EEG recordings with the antenna signals turned on were performed with the motors off (signal data). On the same testing day, we separately recorded EEG motion data with the hexapod turned on and the MicroLabBox disconnected (motion data). We then added the motion data to the signal data during post-processing to approximate EEG signal recordings with motion artifact.
Table 1. Neural mass model frequency weightings for each antenna signal across conditions. Values show the relative weighting of each source generated from the neural mass model (column headers), with weights adding up to 1. The single peak condition used one neural mass model source for each antenna, while the smeared peak condition distributed weights to neural mass model sources with nearby peak frequencies. The double peak condition used unequal weightings of only two neural mass model sources.

EEG analysis
EEG data were processed in EEGLAB using custom Matlab scripts [39]. We high-pass filtered the data at 1 Hz to remove baseline drift. We performed bad channel rejection by identifying channels with notably large standard deviation, a kurtosis above five standard deviations, or with uncorrelated activity for more than 1% of the trial time [8]. No channels matched these criteria, so all channels were retained. We referenced the data to the common channel average. For the motion-only data, we performed a fast Fourier transform using Welch's method to characterize the frequency content of motion artifact at different walking speeds. We then added the motion and signal data, creating separate motion trials for each signal condition. In addition, we added simulated pink noise to maintain a similar 1/f power result to standard EEG studies [40]. We also increased the 60 Hz noise by adding uniform random noise that was bandpass filtered between 59-61 Hz. We then re-referenced to the common average across channels and ran adaptive mixture independent component analysis (AMICA) [41,42], using principal component analysis reduction to 60 components beforehand. After running independent component analysis, we used the maximum cross-correlation between independent components and the original antenna signals to identify the component associated with each antenna.
Figure 3. Connectivity protocol between three antennae of interest. The pattern for each connectivity trial is shown. Circles indicate the three antenna signals of interest, with low/mid/high referring to each signal's relative peak frequency. Arrows signify when a connection between signals was present, with titles at the top indicating what type of connection was present during each 2 s period. Each trial lasted 12 s total. We included 100 trials (20 min total) for each motion condition.
It is important to note that the sign of each independent component time series can be arbitrary based on the component weights, leading to inverted component data compared to the original signals [43]. This is less of an issue for single-frequency sinusoidal signals, where lagging the component signal can remove the inversion effect. Because we dealt with complex, non-periodic signals, we selected the maximum cross-correlation across the inverted and non-inverted component time series. We also calculated the power spectra of the three antenna signals of interest (low, mid, high) and their corresponding components in order to quantify similarity in frequency content. We then computed scalp maps for each component to visually determine the spatial similarity between the antenna and its corresponding component. Because we collected the signal and noise data separately, we calculated signal to noise ratio by applying the independent component analysis weights that map channels to components separately to the signal data and to the combined motion and pink noise data. Signal to noise ratio was calculated as the mean square of the signal data divided by the mean square of the noise data, converted to decibels.
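The two recovery metrics described above can be sketched as follows. This is an illustrative, dependency-free Python version under stated assumptions rather than the Matlab code used in the study: the function names, the `max_lag` search range, and the use of a mean-removed, normalized cross-correlation are all assumptions, and each component is obtained by applying one row of hypothetical unmixing weights to channel data.

```python
import math


def project(weights_row, channel_data):
    """Apply one row of unmixing weights (channels -> one component).

    channel_data is a list of channels, each a list of samples.
    """
    n_samples = len(channel_data[0])
    return [
        sum(w * ch[t] for w, ch in zip(weights_row, channel_data))
        for t in range(n_samples)
    ]


def snr_db(signal, noise):
    """Mean square of the signal over mean square of the noise, in dB."""
    ms_signal = sum(x * x for x in signal) / len(signal)
    ms_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(ms_signal / ms_noise)


def max_sign_invariant_xcorr(component, reference, max_lag):
    """Largest normalized cross-correlation over lags -max_lag..max_lag,
    taken over both the component and its sign-inverted copy."""
    n = len(component)
    if n != len(reference):
        raise ValueError("component and reference must be the same length")
    mean_c = sum(component) / n
    mean_r = sum(reference) / n
    c = [x - mean_c for x in component]
    r = [x - mean_r for x in reference]
    norm = math.sqrt(sum(x * x for x in c) * sum(x * x for x in r))
    if norm == 0:
        raise ValueError("inputs must not be constant")
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            s = sum(c[i + lag] * r[i] for i in range(n - lag))
        else:
            s = sum(c[i] * r[i - lag] for i in range(n + lag))
        # Inverting the component flips the sign of s, so keeping the
        # larger of the two values handles the arbitrary ICA sign.
        best = max(best, s / norm, -s / norm)
    return best


# Toy check: an inverted copy of the reference is still a perfect match.
reference = [math.sin(0.3 * t) + 0.4 * math.sin(1.1 * t) for t in range(200)]
inverted_component = [-x for x in reference]
recovery = max_sign_invariant_xcorr(inverted_component, reference, max_lag=5)

# Signal with 4x the mean-square power of the noise is about 6.02 dB.
example_snr = snr_db([2, -2, 2, -2], [1, -1, 1, -1])
```

In this sketch the same weights row would be passed to `project` once with the signal-only recording and once with the combined motion and pink noise data, and the two resulting component time series would then be passed to `snr_db`.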
Connectivity analysis was performed using the source information flow toolbox (SIFT) [44]. We retained the five components that best aligned with the five antenna signals used (excluding the front-most distractor components for consistency across conditions). These five components were processed in SIFT, using a 500 ms sliding window and 25 ms step size. Each window was detrended. Each connection type had 100 trials per condition. We fit separate multivariate autoregressive models to our data for each motion and signal condition (and connection type), using Hannan-Quinn information criterion to determine the optimal model order [11]. After fitting and validating the model, connectivity was estimated using dDTF, ffDTF, gPDC, rPDC, GGC, WPLI, and PLV (figure 4). We also performed phase-randomized surrogate statistics to determine significantly nonzero connectivity estimates [45]. This uses the same model fitting and connectivity estimation techniques, but applied to phase-randomized data, creating a null distribution. Non-significant values were set to 0.
Figure 5. Example of time-averaged connectivity. An example of how the time-averaged connectivity is obtained from the time-frequency connectivity results from SIFT. We averaged the 1 s following connection onset, which is at time 0. This results in a one-dimensional trace that shows the average connectivity across frequency that each measure found. Using this, we were able to plot connectivity results across all measures of interest on a single plot.
Figure 6. Real walking noise effect on EEG. The time courses (left) and power spectra (right) of just the head motion artifact recorded with the EEG system are shown. We recorded head motion during five different walking speeds, from stationary (0 m s⁻¹) to 2.0 m s⁻¹, and used a motion platform to play back this head motion while recording EEG from the phantom head. Peak frequency power increases at faster walking speeds, along with each peak shifting towards a higher frequency. This can be seen in the time courses, as the rhythmic motion artifact becomes more pronounced and oscillates quicker as walking speed increases.
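As an illustration of the phase-randomization step in the surrogate test, the sketch below builds one surrogate time series that keeps the amplitude spectrum of the input while replacing its Fourier phases with random ones. It shows one common form of phase randomization and is not SIFT's implementation; the function names are hypothetical, and a naive discrete Fourier transform is used only to keep the example free of dependencies (an FFT routine would be used in practice).

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def phase_randomize(x, rng):
    """Surrogate with the same amplitude spectrum but random phases."""
    n = len(x)
    spectrum = dft(x)
    surrogate = list(spectrum)  # DC bin (k = 0) is left unchanged
    # Randomize positive-frequency bins and mirror them so the spectrum
    # stays conjugate-symmetric and the output stays real-valued. For
    # even n the Nyquist bin (k = n/2) is also left unchanged.
    for k in range(1, (n - 1) // 2 + 1):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        surrogate[k] = abs(spectrum[k]) * cmath.exp(1j * theta)
        surrogate[n - k] = surrogate[k].conjugate()
    return idft_real(surrogate)


# Toy example: two sinusoidal components, 16 samples, seeded generator.
original = [
    math.sin(2 * math.pi * 3 * t / 16) + 0.5 * math.cos(2 * math.pi * 5 * t / 16)
    for t in range(16)
]
shuffled = phase_randomize(original, random.Random(0))
```

Repeating this for many surrogates and re-running the same model fitting and connectivity estimation on each one gives the null distribution against which the observed connectivity is compared.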
To reduce the dimensionality of our connectivity data, we averaged the resulting time-frequency connectivity values across the first second of connectivity onset, as shown in figure 5. We then normalized the averaged results to the maximum value for each condition, allowing comparisons across different measures. We plotted this averaged, normalized connectivity together for all connectivity measures during the stationary condition. To quantify relative accuracy and precision, we computed an average frequency and standard deviation during the stationary condition, weighted by the connectivity strengths at each frequency. We also used the maximum value across frequency bins for each connection to determine how strong each estimated connection was. We compared the stationary condition to the motion conditions using correlation of the time-averaged, significance-masked connectivity. This included a comparison between the connectivity results from the stationary condition and from the signals that were sent into each antenna, which helped determine the effect of the phantom head. While we did use surrogate statistics, we were unable to use statistics to compare across conditions and measures because there was only one subject. We feel that it is reasonable to not have statistics because we have a ground truth to compare to. Similar studies attempting to validate EEG processing and connectivity measures have also not used statistics [25,33].
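The reduction from a time-frequency connectivity map to a weighted mean frequency and standard deviation can be sketched as below. This is a minimal illustration under assumptions: the function names are hypothetical, the connectivity map is stored as one row per frequency bin, the normalization is applied to a single trace, and the weighted standard deviation uses the population form (the text does not state whether a bias correction was applied).

```python
import math


def time_average(tf_conn, times, t_start=0.0, t_stop=1.0):
    """Average each frequency row over time bins with t_start <= t < t_stop."""
    keep = [i for i, t in enumerate(times) if t_start <= t < t_stop]
    if not keep:
        raise ValueError("no time bins fall inside the averaging window")
    return [sum(row[i] for i in keep) / len(keep) for row in tf_conn]


def normalize_to_max(values):
    """Scale a trace so that its maximum is 1 (left unchanged if all zero)."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else list(values)


def weighted_mean_std(freqs, weights):
    """Connectivity-weighted mean frequency and standard deviation (Hz).

    Returns None when no connectivity survives significance masking.
    """
    total = sum(weights)
    if total <= 0:
        return None
    mean = sum(f * w for f, w in zip(freqs, weights)) / total
    var = sum(w * (f - mean) ** 2 for f, w in zip(freqs, weights)) / total
    return mean, math.sqrt(var)


# Toy example: a connectivity peak centered on 10 Hz.
mean_hz, std_hz = weighted_mean_std([8, 10, 12], [1, 2, 1])
```

A narrow peak at the true connection frequency gives a mean near that frequency and a small standard deviation, whereas spurious high-frequency connectivity pulls the mean upward and inflates the standard deviation.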

Results
The EEG motion artifact noise from head motion during walking was concentrated at frequencies below 4 Hz (figure 6). Each walking speed contained different frequency peaks. As walking speed increased, the spectral peaks of the EEG noise data increased in power and shifted towards higher frequencies. This can be seen in the raw data traces, where faster speeds have larger peak amplitudes and faster oscillatory behavior. At faster walking speeds of 1.5-2.0 m s⁻¹, large harmonic frequency peaks can be seen near 2 and 3 Hz.
Independent component analysis performed well in finding the three signals of interest in each condition (figures 7 and 8). The only exception was the low signal during the double peak condition, which was not well-recovered based on the difference in power spectra and low signal to noise ratio. Otherwise, independent components had high signal to noise ratio values of ~10 dB or higher that remained consistent at fast movement speeds. In contrast, the signal to noise ratio of the Cz channel started near 10 dB during the stationary condition, but decreased to ~2 dB at the fastest walking speed. Visual inspection of the independent component power and original antenna signal power spectra indicated that volume conduction, head motion, and pink noise mostly added power to the delta (1-4 Hz) and gamma (>30 Hz) power bands. Cross-correlation was above 0.9 for the single peak condition and above 0.8 for the mid and high signals in the other conditions. Based on the decreased signal to noise ratios and cross-correlations for the low signals compared to the other signals in the smeared peak and double peak conditions, independent component analysis seemed to have the greatest difficulty recovering low-frequency signals.
Autoregressive model validation prior to connectivity estimation showed reasonable model fits to the data, as shown in table 2. All models across signal and motion conditions had low parameter to datapoint ratios (<0.1), indicating that overfitting was unlikely. Interestingly, the model orders increased slightly for the single peak condition compared to the other two conditions. For all conditions, the likelihood of the residuals being white and consistency values were below the desired levels of 0.95% and 85%, respectively, which likely indicates extra data structure not captured by the model. The negative stability index across all conditions indicated that all models were stable. Overall, the models were stable and appeared to avoid overfitting, indicating that they fit the data well. In addition, we used the same model fit across different connectivity measures, so the fit of each model should not have impacted inter-measure connectivity differences.
Time-averaged estimated connectivity varied among different connectivity measures for the stationary motion condition (figure 9), with some measures containing frequent spurious results. Most connectivity measures were able to determine the mid signal to high signal connection, validating the use of such measures for estimating connectivity. However, there were clear differences across measures. Both PLV and WPLI frequently found spurious, high-frequency connections. They also correctly estimated connectivity in the low to mid connection during the double peak condition, even though independent component analysis did not recover the original low signal. Both GGC and gPDC also had spurious high-frequency connections, with GGC identifying no true connectivity during the single peak condition. In addition, rPDC incorrectly estimated spurious low-frequency connectivity. While ffDTF and dDTF appeared to be robust to noise, we noticed that ffDTF can sometimes estimate the connection to be in the wrong direction, such as for both low to mid connections during the smeared peak condition. dDTF may sometimes show directional connections as bidirectional, but it does not appear to indicate incorrect connectivity direction. Still, this suggests caution when interpreting estimated connectivity direction.
The weighted average and standard deviation of the time-averaged connectivity highlight differences in accuracy and precision across measures (table 3). PLV, WPLI, and gPDC had consistently high average frequency and large standard deviations, reflecting their susceptibility to spurious high-frequency connectivity. GGC performed best when estimating the mid to high connections for the smeared peak and double peak conditions. Otherwise, it did not estimate many other connections, indicating that its performance can vary considerably based on experimental conditions and the underlying connections present. ffDTF performed well for the single peak condition, but did not find anything for the low to mid connections during the smeared peak condition. Both dDTF and rPDC appear to perform well, but rPDC appears biased towards low frequencies of 4-5 Hz during some mid to high connections when the low to mid connection is also present.
Signal to noise ratio remained consistent across walking speeds for the components, while the signal to noise ratio of a representative channel (Cz) is notably affected. Additionally, cross-correlation indicated which independent components best match with their respective original signals. With the exception of the low signal for the double peak condition, independent component analysis appeared to recover the original signals well.
Table 2. Multivariate autoregressive model validation results. Mean validation results from the fit models are shown, with standard deviation in parentheses. Optimal model order was determined using the Hannan-Quinn Criterion across all time windows. The models were stable and likely avoided overfitting due to negative stability indices and parameter to datapoint ratios below 0.1, respectively. The likelihood of the residuals being white and the model consistency were slightly lower than desired, indicating that the model may not have completely captured all of the data variance. We used the same model fit across different connectivity measures, which prevents differences in model fit from affecting inter-measure differences.
Correlation between the stationary condition and motion conditions shows a complex effect of motion on connectivity estimation (figure 10). All measures were affected by motion, which may depend on the quality of the independent component decomposition for each condition. dDTF usually had high correlations close to 1 across motion conditions, except for the double connections for the single peak and smeared peak conditions, where correlation dropped almost to 0. rPDC also had consistently high correlations near 1 across motion conditions, except during the smeared peak condition. Both WPLI and PLV only displayed correlations near 1 for the single peak condition, indicating that they may be more susceptible to motion effects for complex signals. It is important to note that these results must be interpreted in conjunction with the stationary condition results. For example, GGC was quite consistent for the single peak condition, but the stationary results show that GGC was consistently finding hardly any connectivity. In addition, the estimated connectivity using the raw antenna signals was consistently different from the stationary condition for all measures, highlighting the effect of real-world phantom head testing.

Discussion
We used a novel combination of complex neural mass model signals and a phantom head to validate independent component analysis and connectivity measures under realistic head motions. We found that independent component analysis primarily recovered the original signals of interest and separated out motion artifact. For connectivity estimation, we found variable results across measures and conditions, with most measures able to correctly estimate the underlying connectivity. Measures applied directly to the data, instead of a fitted model, were susceptible to spurious high-frequency connections. In general, dDTF, ffDTF, and rPDC performed best for our experiments out of the measures we used.
Table 3. Connectivity weighted mean and standard deviation. For each significance-masked, time-averaged connectivity result, we computed the weighted mean and standard deviation (in Hz) to quantify the estimated connectivity distribution across frequency. This is only shown for the stationary motion speed. In addition, bolded values indicate the maximum value within each time-averaged connectivity result, illustrating how strong each result was. Values are displayed according to connectivity conditions: (A) single peak, (B) smeared peak, and (C) double peak. The last two columns show results from the combined low/mid and mid/high connectivity pattern, while the first two columns are from the individual patterns.

Motion artifact and independent component analysis
The effect of walking on EEG signals occurred mostly at low frequencies, indicating that slow walking speeds minimally affect EEG results, especially at most physiological frequencies. This has been indicated by other studies [46,47], but differences in cable sway across experimental setups can affect results [34]. We bundled the cables together for this study, which likely decreased the effect of motion artifact. As walking speed increased, the spectral power peaks of the noise data and the frequencies of these peaks increased. This highlights the challenge of computationally removing motion artifact during fast walking and running, where the motion artifact is large and can overlap with EEG frequencies of interest. Dual-layer EEG systems that can subtract out motion artifact appear to be a promising method to mitigate this issue [48]. Independent component analysis performed well in mostly recovering the original signals. We expected this given the frequent use of independent component analysis in EEG research and its ability to recover single-frequency, sinusoidal signals during similar phantom head validation [33]. The consistent signal to noise ratio across motion speeds emphasizes the importance of using blind source separation to minimize the effects of motion, which is why such methods are used often during mobile tasks [6,49]. The cross-correlation results aligned well with the signal to noise ratio results of the recovered independent components of interest. Independent component analysis did not recover the low signal as well as the other signals in the smeared peak and double peak conditions, likely because of the added pink noise, not motion artifact. The effect of pink noise can be seen by comparing the recovered components' power spectra to the original antenna signals' power spectra.
Across all signals and movement conditions, low frequency power is consistently increased in the recovered components, which likely leads to the decreased signal to noise ratio seen in the recovered low frequency components. Although motion artifact may be an important concern when analyzing low-frequency EEG activity, our results showed robust motion separation using independent component analysis.
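As a rough illustration of the two recovery metrics discussed above, the sketch below computes a signal to noise ratio in decibels and a peak cross-correlation between a recovered component and an original signal. The exact definitions used in the study are not reproduced here. This sketch assumes the signal to noise ratio is a ratio of mean powers, and it takes the largest absolute Pearson correlation over a small range of lags, because independent components have arbitrary sign. The function names and the `max_lag` parameter are hypothetical.

```python
import math


def snr_db(signal, noise):
    """Signal to noise ratio in dB, assumed here to be a mean-power ratio."""
    p_signal = sum(v * v for v in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def _pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def peak_cross_correlation(x, y, max_lag):
    """Largest |correlation| between x and y over lags in [-max_lag, max_lag].

    x and y must have equal length. The absolute value is taken because the
    sign of an independent component is arbitrary.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have equal length")
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:len(x) + lag], y[-lag:]
        best = max(best, abs(_pearson(a, b)))
    return best
```

A sign-flipped, slightly delayed copy of a source still scores close to 1 with this measure, which matches the intent of asking whether a component recovered the waveform of an antenna signal rather than its polarity or exact timing.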

Connectivity estimation measures
Connectivity measures generally identified the true connections, especially the mid to high connection. This validates the use of independent component analysis and such measures for mobile EEG settings. However, there were substantial differences in performance across measures, especially with regard to false positives. We especially noticed this for PLV and WPLI, which estimated connectivity directly from the data instead of using a fitted model. Because these measures utilize trial averaging, their performance likely would have increased with more trials. In addition, many other factors could have altered connectivity estimation, such as the choice of reference or the type of source localization used [25]. Still, multivariate autoregressive modelling may provide a more robust framework for connectivity estimation than trial averaging.
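To make the role of trial averaging concrete, the following sketch computes PLV and WPLI at a single time-frequency point from per-trial complex cross-spectral values, using the standard textbook definitions of both measures. It is not the implementation used in this study. The input format (a list of complex values, one per trial) and the synthetic examples are assumptions made for illustration.

```python
import cmath
import math
import random


def plv(cross_spectra):
    """Phase locking value: magnitude of the trial-averaged unit phasor."""
    unit = [s / abs(s) for s in cross_spectra if abs(s) > 0]
    if not unit:
        return 0.0
    return abs(sum(unit) / len(unit))


def wpli(cross_spectra):
    """Weighted phase lag index: |mean(Im S)| / mean(|Im S|) across trials."""
    imag = [s.imag for s in cross_spectra]
    denom = sum(abs(v) for v in imag)
    if denom == 0:
        # Purely real cross-spectra (zero phase lag) carry no lagged coupling.
        return 0.0
    return abs(sum(imag)) / denom


# Synthetic examples (hypothetical data, for illustration only).
rng = random.Random(0)
# Consistent 45 degree phase lag on every trial.
lagged = [cmath.exp(1j * math.pi / 4)] * 20
# Zero-lag coupling, as produced by instantaneous volume conduction.
zero_lag = [complex(2.0, 0.0)] * 20
# No coupling: the phase difference is uniformly random on each trial.
uncoupled = [cmath.exp(1j * rng.uniform(-math.pi, math.pi))
             for _ in range(2000)]
```

With these definitions, a consistent non-zero phase lag yields values near 1 for both measures, while zero-lag coupling yields a PLV of 1 but a WPLI of 0. For uncoupled signals the PLV shrinks toward zero only as the number of trials grows, which is one way that a small number of trials can produce spurious connections.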
Out of the estimation techniques using multivariate autoregressive modelling, we found that dDTF, ffDTF, and rPDC appear to provide the most reliable estimates. GGC appeared unreliable, especially in the single peak and smeared peak conditions. Other techniques have been used for GGC besides multivariate autoregressive models [50], indicating that the methods used with GGC should be carefully considered beforehand [51]. We also found that ffDTF estimated the true connectivity correctly in most cases, but some of its results would lead researchers to conclude that the connection occurred in the wrong direction. This makes ffDTF potentially problematic to use if directionality is of particular interest, such as when analyzing the connectivity between the cortex and leg muscles. Directionality accuracy appears improved for dDTF and rPDC, but it still appears important to utilize statistical tests to firmly establish a specific directionality. Our results show that no one measure provides a completely clean picture of the true underlying connectivity, suggesting that using multiple connectivity measures may provide the most robust estimates of underlying connections.
Despite the consistent component signal to noise ratio values, connectivity estimation was still affected by motion and real-world volume conduction. Correlations varied between motion conditions and the stationary condition for all measures, without a clear indication of one measure being most robust to motion in all cases. In general, rPDC and dDTF appear to be the most stable across walking speeds, despite varying estimates during the smeared peak condition. While GGC, WPLI, and PLV were fairly consistent in some conditions, it is important to note that their stationary connectivity estimations were not ideal results, even if they were maintained for different walking speeds. In addition, we found consistently low correlation between the connectivity estimated during the stationary condition and the connectivity estimated from the original signals before they were sent through the head. This effect was seen across all measures and conditions, indicating that volume conduction and noise from real-world recording at the scalp do consistently affect the resulting connectivity estimation [24].

Limitations
While we were able to validate connectivity under real-world scenarios, our study was limited to a subset of connectivity measures and motion artifact that did not consistently occur during the event of interest. There are many other available measures to estimate connectivity, including coherence, mutual information, and multivariate phase synchronization [52]. We focused primarily on measures based on Granger causality that were available in the SIFT toolbox [44]. In addition, there are many other source localization techniques, such as the various beamforming methods [53]. We also did not look at motion artifact that consistently overlaps with connectivity onset, which would be applicable to EEG studies during locomotion. Any lingering motion artifact following independent component analysis may have had a notable effect on connectivity if time-locked to an event of interest. For this reason, it is important to consider the potential effects of motion artifact, even at slow walking speeds, if it is time-locked to the event of interest.
Figure 10. Correlation to original signals and motion conditions. The correlation between time-averaged connectivity of the stationary motion and all other head motions are shown. We also included connectivity results performed on the original signals that were sent into each antenna, designated as 'no phantom'. All motion speeds (and original signals) were fit to their own model, time-averaged, and masked using phase-randomized surrogate statistics with 200 permutations each. Both dDTF and rPDC appear to have high correlation across motion for most conditions. In addition, connectivity on the signals before they were sent through the antennae had consistently low correlation to the stationary condition, indicating the importance of using head phantoms for validating connectivity methods.
The influence of motion artifact depends on a variety of factors, including the performance of blind source separation of motion and brain sources, the events of interest, and cable sway [34,47]. In addition, several post-processing motion artifact removal techniques have been proposed, which might have potentially improved our connectivity results across walking speeds [6,54,55]. Future phantom head studies similar to the one presented here would help validate how such methods work on ground-truth signals in a real-world setting, allowing researchers to better determine which method should work best in a given situation.

Conclusions
We validated that several connectivity measures can accurately estimate true connections between complex signals exposed to real-world volume conduction and head movement via a head phantom. Independent component analysis recovered most of the original signals and appeared to separate out motion artifact. We were able to show that performing connectivity on sources from independent component analysis can find the true connections in a real-world scenario, but no one measure performed optimally in every condition. It may be beneficial to use multiple connectivity measures to increase confidence in the estimated connectivity results. Our technique opens up the ability to use complex, ground-truth signals in a real-world environment to validate EEG methods, improving our understanding of how well common EEG methods truly work.