Information-theoretic approach to detect directional information flow in EEG signals induced by TMS

Effective connectivity analysis has been widely applied to noninvasive recordings such as functional magnetic resonance imaging and electroencephalograms (EEGs). Previous studies have aimed to extract the causal relations between brain regions, but the validity of the derived connectivity has not yet been fully determined. This is because it is generally difficult to identify causality in the usual experimental framework based on observations alone. Transcranial magnetic stimulation (TMS) provides a framework in which a controllable perturbation is applied to a local brain region and the effect is examined by comparing the neural activity with and without this stimulation. This study evaluates two methods for effective connectivity analysis, symbolic transfer entropy (STE) and vector autoregression (VAR), by applying them to TMS-EEG data. In terms of the consistency of results from different experimental sessions, STE is found to yield robust results irrespective of sessions, whereas VAR produces less correlation between sessions. Furthermore, STE preferentially detects the directional information flow from the TMS target. Taken together, our results suggest that STE is a reliable method for detecting the effect of TMS, implying that it would also be useful for identifying neural activity during cognitive tasks and resting states.


Introduction
Conventional studies on neural activity during human cognitive functions have focused on the mapping between functions and the brain regions in which they are processed, so-called functional localization. Recently, attention has shifted toward determining the association between brain regions during cognitive tasks, as this would elucidate how brain functions, especially higher cognitive functions, are processed (Rubinov and Sporns, 2010). To this end, functional connectivity, representing the correlation between the neural activity of different regions, has been intensively studied (Rissman et al., 2004;Babiloni et al., 2005;Chang et al., 2013). However, to determine the detailed mechanism and procedure of neural information processing, it is vital to understand the causal relation, rather than correlation, between neural activity. Recent studies have tackled this issue by developing methods to extract effective connections from noninvasive brain recordings such as functional magnetic resonance imaging (fMRI) or electroencephalograms (EEGs) (Büchel and Friston, 1997). Such analyses focus on the functional or effective connectivity of brain activity during cognitive functions as well as during the resting state, because it has been reported that the pathological resting state network differs from the normal one (Xiong et al., 1999;Greicius et al., 2003;Honey et al., 2009). Therefore, the network is expected to be useful as a diagnostic tool for brain diseases (Fingelkurts et al., 2007;Jordan et al., 2013;Mäki-Marttunen et al., 2013;Thul et al., 2016).
The problem with effective connectivity analysis concerns the validity of the obtained connectivity. In principle, to identify a causal relation, it is necessary to manipulate the candidate "cause" and observe the impacts under different conditions on the "effect." When any effective connectivity analysis is applied to experimental data during cognitive tasks or resting states, it is not clear whether the derived effective connection from one region to another indicates an actual causal relation or not. This is because it is impossible to identify the direct trigger of relevant neural activity in the usual experimental design, which employs observations alone. In contrast, the application of transcranial magnetic stimulation to EEG data enables us to directly influence neural activity in the vicinity of the stimulation target and measure the perturbed neural activity (Ilmoniemi et al., 1997;Massimini et al., 2005;Thut et al., 2005;Morishima et al., 2009;Glim et al., 2019). To compare neural activity induced by TMS against that in the absence of TMS, one could assess the causal relation from the region of the target to other regions.
In the present study, we evaluate two methods for determining effective connectivity by applying them to TMS-EEG data (Fuggetta et al., 2005;Kawasaki et al., 2014). The methods are symbolic transfer entropy (STE) and vector autoregression (VAR). STE is an information-theoretic method in which time-series data are transformed into sequences of motifs of EEG data "symbols" and the transfer entropy is calculated for a pair of symbols' sequences (Staniek and Lehnertz, 2008). VAR is an extended autoregressive model for multivariate time-series data; Granger causality analysis, one of the most frequently used analysis on effective connectivity, is based on VAR (Brovelli et al., 2004;Roebroeck et al., 2005). By applying these methods to TMS-EEG data, their performance can be evaluated in terms of the robustness of the results and the ability to detect directional information flow induced by TMS.

Subjects
Twelve healthy right-handed subjects (six females and six males, mean age: 26.3 ± 6.1 years) participated in the TMS-EEG study after providing informed consent. The study was approved by the ethics committee of RIKEN and was conducted in accordance with the Declaration of Helsinki. The TMS-EEG data were analyzed in a previous study for a different purpose (i.e., to probe the global propagation of phase resetting of ongoing oscillations and associated information flow in phase time series) (Kawasaki et al., 2014).

TMS
Subjects sat in a chair with their eyes closed and their head position fixed on a chin rest. They wore in-ear headphones with earmuffs over the ears and heard white noise masking tones. To reduce auditory evoked potentials, which can contaminate TMS-induced direct cortical responses, the level of white noise was adjusted so that subjects could not detect the TMS coil click. A biphasic stimulator with a 70-mm wing diameter figure-of-eight coil (Magstim Rapid, Magstim Company Ltd., UK) was used to deliver single-pulse TMS. The TMS intensity was set to 95% of the individual active motor threshold. The TMS pulses were delivered at intervals ranging from 2.5 to 3.5 s. If the interval was fixed, participants would predict the timing of TMS pulses, potentially inducing predictive brain responses around the TMS timing. To minimize such an effect, we made the interval varying between 2.5 s and 3.5 s. Subjects underwent four sessions of 50 TMS pulses. In two of the sessions, 50 real TMS pulses were delivered to the visual cortex targeting the Oz electrode. The TMS coil handle was oriented upward. TMS pulses were applied with a thin layer of plastic placed between the TMS coil and the scalp. This was inserted to attenuate TMS-induced mechanical conduction, which could evoke sensory responses. In the other two sessions, 50 sham stimulations were delivered. The handle of the TMS coil was oriented rightward and the handle axis rotated 90 • with one wing of the coil oriented to the scalp with a 3.6cm plastic cube and a thin layer of plastic between the coil and the scalp. The order of conditions was randomized in a counterbalanced manner across subjects.

EEG recordings and preprocessing
EEG signals were recorded using a TMS-compatible EEG amplifier (BrainAmp MR+, BrainProducts GmbH, Germany) with 63 sintered Ag/AgCl electrodes (EasyCap; EASYCAP GmbH, Germany), which were positioned according to the 10/10 international system. The EEG lead wires were arranged based on the orientation of the TMS coil to reduce TMS-induced electromagnetic artifacts (Sekiguchi et al., 2011). During EEG recording, electrode impedance was kept below 10 k . EEG signals were sampled at 1000 Hz and referenced to the left earlobe electrode with the ground electrode located at AFz. Signals were re-referenced offline to the average of the left and right earlobe electrodes. In addition, horizontal and vertical electrooculographic (EOG) signals were recorded.
In terms of preprocessing, the EEG data were first segmented around the TMS pulses (pre 3-s, post 2-s periods) and the period from −1 ms to 7 ms around the TMS was linearly interpolated to remove excessive TMS artifacts. Next, the EEG data were high-pass filtered using a threshold of 0.1 Hz. Trials in which the EOG and EEG signals exceeded an amplitude criterion (±150 V) were rejected. A low-pass filter was then applied using a threshold of 47 Hz. Finally, current source density transformation was applied to the EEG signals (Perrin et al., 1989;Kayser and Tenke, 2006). Two subjects were removed from further analyses because the TMS intensities were too weak to evoke sufficient brain responses.
Overall, careful procedures were applied to obtain genuine TMSinduced responses in comparison with a controlled sham condition during the TMS-EEG recording and preprocessing.

Analyses
The EEG trials remaining after artifact-contaminated epoch rejection (40-50 trials survived in each session) were serially concatenated and the directional information flow from one electrode to another was analyzed using pairs of serial data (Fig. 1b). The directional information flow between the EEG electrodes was evaluated using STE and VAR.
STE is a variation of the transfer entropy that describes a causal relation between time-series data (Staniek and Lehnertz, 2008;Schreiber, 2000). In the case of STE, each of the measured time-series data were transformed into a sequence of "symbols" represented by ordinal patterns of data points within each time interval. The transfer entropy was then calculated for a pair of such symbol sequences. Let the measured value of the i-th electrode at time t be x i (t). The symbol of the i-th electrode at time t, s i (t), was defined as the ordinal pattern of n data points within the sampling interval t, i.e., x i (t), x i (t + t), . . ., x i (t + (n-1) t). For the example of the upper time-series in Fig. 1c, the descending order among the four data points is which is defined as the symbol "1432." The STE from the j-th electrode to the i-th electrode c ij is obtained by where H(·|·) is the conditional entropy and s i (t) is a symbol taken by the time series of the i-th electrode during the time interval from t to t + (n-1) t. By definition, the transfer entropy from the j-th to the i-th time series c ij is the reduced amount of "uncertainty" in the case where the information from the j-th time series is used as compared with the case where only the information from the i-th time series is used. To calculate the conditional entropy, it was necessary to construct the joint probability p(s i (t+ ), s i (t), s j (t)). Because n was set to 4 in the present study, the number of symbols (permutation patterns) was 4! = 24, and so the state of an electrode at each time interval T = (n-1) t was represented by one of 24 permutation EEG recording was done with 63 electrodes and TMS was applied to the visual area of the Oz (59th) electrode. Same-colored close numbers correspond to adjacent areas, indicating that elements near the diagonal of a connection matrix represent local connections. (b) Real and sham TMS trials. Trial duration was 5 s and TMS and sham TMS were applied 3 s after the onset of each trial. The EEG data recorded from each electrode for 40-50 trials in a session were serially concatenated. (c) Forming a "symbol" from time-series data in STE analysis. Using the symbols in orange, the entropy of the symbol starting from the red point over ms is calculated. In this example, the upper time series is supposed to be a "receiver" whereas the bottom one is a "sender.". patterns. We examined the cases of t = 16 or 33 ms (which led to T = 48 or 99 ms). The future time point was set to 50 ms.
VAR is an autoregressive model that can be applied to multivariate time-series. For a pair of time series data i and j, VAR is given by where a i (n) (n = 0, 1, 2, . . ., N) and b ij (n) (n = 1, 2, . . ., N) are the coefficients of past terms for two time series, respectively. Granger causality considers the significance of the coefficients b ij (n) , namely, it assumes that the variable x j should exert its effect on x i as the "cause" of x i if introducing the terms of x j s improves the residual error on x i . Therefore, we regarded the coefficients b ij (n) as effective connections from the j-th electrode to the i-th electrode. In the present study, the order N was set to 3 and we investigated the cases of t = 16 or 33 ms.
To evaluate the ability to detect TMS effects, we conducted the following investigation. Let us denote the enhanced effective connection produced by TMS from the j-th to the i-th electrode in session k by c ij k (= c ij k, TMS -c ij k, sham ). By selecting the 100x% largest effective connections from each session, we calculated the average number of effective connections from an electrode as where (x) is the Heaviside step function, which takes a value of 1 if x > 0 and takes a value of 0 otherwise; c th k (x) is the threshold corresponding to the lowest value among the chosen 100x% connections for session k. We obtained the actual number of connections from the TMS target (59-th, Oz) as We evaluated the detection ability by comparing n TMS with n ave .

Choice of sampling interval t to detect TMS effect
We calculated the STE for the sham and real TMS trials. Fig. 2(a) displays the STE matrices and the difference between them ( c ij = c ij real -c ij sham ) for one of the subjects with t = 16 and 33 ms. As shown in the matrices, the STE with t =33 ms is able to differentiate the directions with large and small STE values better than the STE with t = 16 ms. To determine whether TMS brought about changes in the STE values, we examined the distribution of STE values for sham and real TMS trials. Fig. 2(b) illustrates those distributions for the same subject as in Fig. 2(a) with t = 16 and 33 ms. We applied a statistical test to the two distributions (Kolmogorov-Smirnov (KS) test). The KS statistic is 0.180 for the case t = 16 ms and 0.280 for t =33 ms, suggesting that the 33 ms provides more distinction between the STE values from the real and sham TMS trials. (In both cases, the two distributions are significantly different.) The results of the KS test for the other subjects are summarized in Table 1. Fig. 2(c) shows the distributions of differences induced by TMS, c ij , indicating that the differences for t =33 ms are greater than those for t = 16 ms. Thus, the sampling interval was set to 33 ms to detect the effect of TMS on directional information flows in EEG signals.

Robustness of STE analysis
To verify whether STE analysis can yield robust results, we compared the results of two sessions obtained at different times. STE matrices of sham and real TMS conditions, and the difference between them for two sessions, are shown in Fig. 3(a). For both of the sham TMS sessions, the matrices have very similar structures and directions, with large STE values appearing near the diagonal. This suggests that information flows occurred within local areas. In contrast, in both of the real TMS sessions, directions with larger STE values originated from posterior areas (channel numbers in the 50 s, see Fig. 1(a)) and point toward any regions including anterior areas. At the same time, we can see directions from anterior to posterior regions. The structure of the real TMS sessions looks very different to that of the sham TMS sessions, but both matrices from the real TMS sessions have a similar structure. As indicated in the difference matrices, although the information flows decreased for a small portion of directions (in blue), those of many directions were enhanced by TMS (in red). Overall, there are no large discrepancies between the difference matrices. To quantify the similarity, we examined the correlations between the results of the two sessions  bold value means the larger coefficient within a subject. * P < 0.001 otherwise P < 0.0001.  Fig. 3(b)). For the sham TMS sessions, the points are distributed along the line y = x, suggesting that the STE values obtained in a given direction for the two sessions were almost the same. As indicated in the plot, the Spearman correlation coefficient between the sessions is 0.85. The distribution of points for the real TMS sessions is slightly broad, but the correlation coefficient is high (0.94). As for the difference c ij , the correlation coefficient decreased slightly to 0.85. The correlation coefficients for all subjects are summarized in Table 2. These high correlation coefficients are evidence that STE analysis produces robust results, even for different sessions.

Choice of a suitable coefficient for an effective connection
Similar to STE, the VAR analysis is influenced by the choice of the sampling interval t. We set the interval to 33 ms, as for the STE analysis. Because an N-th order VAR model contains N coefficients representing the causal relation from one time-series to another (see Materials and Methods), we chose one coefficient (b (1) , b (2) , and b (3) ) as an effective connection. Fig. 4(a) shows the matrices B (n) = (|b ij (n) |) for sham and real TMS conditions and those of the dif-  Bold value means the largest coefficient within a subject. ns not significant, * = P < 0.05, ** = P < 0.01, *** = P < 0.001, otherwise P < 0.0001. the different contributions of the "cause" time-series at different time lags to the "effect" time series. Overall, b (1) seems to differentiate strong contributions from weak ones for both the sham and real TMS conditions. This tendency is supported by the difference matrices. While the STE analysis showed that directions from specific senders (rows) were enhanced by TMS, the VAR analysis indicates that the connections to specific receivers (columns) are enhanced by TMS. To select one of the coefficients as an effective connection, we compared their distributions for the sham and real TMS conditions (Fig. 4(b)). KS tests indicate that the distributions on this subject were significantly different (P « 0.001 for all three coefficients, see Table 3). The coefficient that was significantly different between the sham and real TMS conditions for all subjects was b (1) . Thus, we used b (1) to reflect effective connections in the later analyses.

Robustness of VAR analysis
Similar to STE, we examined the robustness of VAR analysis. Fig. 5 is a similar plot to Fig. 3, indicating the correlations between the results of sessions 1 and 2. For this subject, the matrices for the sham TMS condition have a similar structure. However, the matrices for the real TMS condition are very different; there are very few large coefficients in session 1, whereas a number of very large coefficients appear in session 2. This discrepancy is reflected in the difference matrices. The scatter plot for the sham TMS condition demonstrates that the points are loosely distributed along the line y = x, i.e., the coefficients between the two sessions are correlated to some extent. The plots for the real TMS condition and their difference indicate less correlation between the sessions, as shown by the correlation coefficients. The correlation coefficients for all  subjects are summarized in Table 4. These results suggest that the VAR analysis is session-dependent, and therefore less robust than STE analysis.

Detection of directional information flow enhanced by TMS
Finally, we evaluated the detection of TMS-enhanced effective connections. Figs. 6(a) and 6(b) display examples of connections that were commonly selected between the two sessions by STE and VAR, respectively. The number of connections selected by STE is much larger than that by VAR, and most of the connections given by STE arise from the TMS target (59-th, Oz electrode). To quantify this tendency, we compared the expected number of effective connections from an electrode when assuming random sampling, n ave , and the actual number of connections from the TMS target, n TMS (see Materials and Methods). For STE, the actual number is significantly larger than the expected number for any ratio of selection x, suggesting that STE preferentially selected directional information flow from the TMS target (Fig. 6(c)). In contrast, the VAR results indicate that the actual number of effective connections from the TMS target is similar to the expected number; consequently, the method failed to detect the effect of TMS (Fig. 6(d)). Fig. 6(e) and (f) show the distributions of the selected directions from the TMS target with respect to their distances. Although the distances of the (1),1 , b ij (1),2 ) are plotted, where b ij (1), k represents the coefficient from the j-th to the i-th electrode with 1-step lags in session k.
directions found by STE vary, some directions leading to distant regions (frontal regions) were detected, which suggests that TMS has a considerable impact on long-range information transfer.

Discussion
The ability to capture causal relations between regional neural activities would help us understand how brain regions cooperate during cognitive functions. Although this has been the subject of many studies using various data analysis methods, as mentioned in the Introduction1, the problem is how to effectively validate the obtained results. As to evaluate an analysis method, it is useful to apply it to synthetic data (Kobayashi and Kitano, 2013;Hu et al., 2016). Because the mathematically described model of multivariate time series that generates the synthetic data has some "ground truth" of causation, the performance of the analysis method can be evaluated by comparing the obtained results with the ground truth. However, the validity of the model (in this case, the plausibility of the dynamical model on EEG) presents another problem. Hence, the use of TMS, a controllable stimulation that directly induces neural activity in the vicinity of the target brain region, is of great interest. In the present study, we proposed an experimental framework with TMS and practically evaluated two analysis methods by determining whether they could detect causal relations in neural activities modulated by TMS.
Using this framework, we evaluated performances of two causality analysis methods-STE and VAR. The results indicated that the STE analysis yielded robust results, with almost the same results obtained in different sessions ( Fig. 3 and Table 2). In particular, directional information flows during the sham TMS condition exhibited extremely high correlation between sessions. Stable results under the sham TMS condition are advantageous because they play the role of a baseline for detecting neural activity modulated by TMS. On the contrary, VAR analysis was unreliable in terms of its robustness because the results differed considerably across the sessions (Fig. 5 and Table 4). Furthermore, STE analysis successfully detected information flows from the TMS target at a visual area (Oz) whereas VAR detected only by chance (Fig. 6). Taken together, we can conclude that STE is a suitable method of detecting the effect of TMS. Therefore, STE is useful for analyzing whole-brain activity during cognitive functions and resting states.
The good performance of STE would be interpreted as follows. By their nature, EEG data contain strong noise and fluctuations; thus, handling such data with care is important. Most methods, including VAR, utilize the measurements themselves, meaning that the results will be greatly influenced by noise and fluctuations. As for VAR, such noise and fluctuations should increase the difficulty of regression coefficient estimation. In the case of the standard transfer entropy (TE) where the probability distribution of measurements is calculated, the quantization of measurements necessarily for probability distribution are disrupted owing to noise "*" in the plots indicates the significance of P < 0.001 by Welch's t-test. An absence of symbols indicates that the result is "not significant." (e) and (f) show the distributions of lengths of the effective connections from the TMS target. The distance is defined by the coordinate of the electrode map ( Fig. 1(a)), rather than by the actual lengths of the subjects' heads. For example, the distance between Oz (59) and Cz (34) is 0.507. Because only the connections selected between two sessions were examined, the total numbers are different in the cases of STE (e) and VAR (f). and fluctuations. STE is a noise-tolerant approach because it is a coarse-graining method based on the magnitude relation between successive measurements, which decreases influences of the noise and extracts "sketches" of EEG wave forms. Furthermore, while VAR assumes that the data to be analyzed obey a Gaussian distribution and that the interaction between the time series is linear (Barnett et al., 2009), transfer entropy requires no assumption on the nature of data as long as one can obtain the probability distributions necessary for conditional entropy (Schreiber, 2000). Although it is difficult to clarify whether the Gaussian nature or linear interaction assumption, or both, is critical, the poor performance of VAR strongly suggested that the nonlinearity of neural dynamics, including their interactions, underlies EEG signals. Granger causality analysis has been extended to treat such nonlinearity in several manners, and modified versions of this analysis could be expected to perform much better than the original version (Marinazzo et al., 2008).
One of the intrinsic characteristics of EEG signals is their rhythmic behavior. It is well known that frequency bands in EEG signals play an important role in the transfer of neural information within local regions or between distant regions. STE is an amplitudebased method that does not utilize frequency-domain features-its performance depends on the sampling interval t. In the experiments, the ability to detect the effect of TMS on EEG signals was better for t =33 ms than for 16 ms (Fig. 2). When t =33 ms, the length of a symbol corresponds to a period of alpha-band oscillations (=99 ms). This result suggests that directional information flow is highly dependent on frequency bands (Von Stein and Sarnthein, 2000;Buzsaki and Draguhn, 2004;Hillebrand et al., 2016). Further investigation may reveal that such a dependence of directional information flow on frequency bands might differ according to brain regions or distances between brain regions. Furthermore, comparing our results with analysis utilizing features of the frequency domain such as phase would be an important step in understanding the details of inter-regional neural information transfer (Kawasaki et al., 2014).