Measuring neuronal avalanches to inform brain-computer interfaces

Summary Large-scale interactions among multiple brain regions manifest as bursts of activations called neuronal avalanches, which reconfigure according to the task at hand and, hence, might constitute natural candidates to design brain-computer interfaces (BCIs). To test this hypothesis, we used source-reconstructed magneto/electroencephalography during resting state and a motor imagery task performed within a BCI protocol. To track the probability that an avalanche would spread across any two regions, we built an avalanche transition matrix (ATM) and demonstrated that the edges whose transition probabilities significantly differed between conditions hinged selectively on premotor regions in all subjects. Furthermore, we showed that the topology of the ATMs allows task-decoding above the current gold standard. Hence, our results suggest that neuronal avalanches might capture interpretable differences between tasks that can be used to inform brain-computer interfaces.


Highlights
The spatial spreading of neuronal avalanches on the large scale is taskdependent Large-scale avalanches spread more through premotor areas during motor imagery Task-related differences are consistent over subjects and related to task execution Neuronal avalanches properties allow taskdecoding above the current gold standard

INTRODUCTION
Brain-computer interfaces (BCIs) constitute a promising tool for establishing direct communication and control from the brain over external effectors for clinical applications. 1,2However, the ideal features to design a BCI are unknown, since the underlying microscopic brain processes and their reflection on brain signals, are poorly understood. 3As a result, mastering noninvasive BCI systems remains a learned skill that yields suboptimal performance in $30% of users, referred to as the ''BCI inefficiency'' phenomenon. 4Measuring the dynamic features that are relevant to the execution of a task and, as a consequence, may improve BCI performance remains an open challenge. 3Indeed, the current features that are used in the context of BCI rely on local measurements 5 (mostly frequency band power features and time-point features, depending on the BCI paradigm) disregarding the interconnected nature of brain dynamics.Electromagnetic imaging data are dominated by 'bursty' dynamics, with fast, fat-tailed distributed, aperiodic perturbations, called ''neuronal avalanches'', traveling across the whole brain, [6][7][8][9] that have been recorded using electro/magnetoencephalography. 10,11Neuronal avalanches spread preferentially across the white-matter bundles, 12 they are modified by neurodegenerative diseases, 13 and they evolve over a manifold during resting-state, generating rich functional connectivity dynamics. 14Such rich dynamics are a major contributor to time-averaged functional connectivity. 6,15Hence, the spreading of neuronal avalanches might be a correlate of the functional interactions among brain areas and, as such, we hypothesize that they could spread differently according to the task at hand, thereby providing a powerful and original marker to differentiate among behaviors.To test our hypothesis, we compared source-reconstructed magnetoencephalography (MEG) signals in resting state (RS) and while performing a hand motor imagery (MI) task within a BCI protocol, in order to track the dynamical features related to motor imagery as compared to rest.We obtained the probabilities of each pair of regions being recruited sequentially in an avalanche, 12 compared these probabilities across MI and RS conditions edge-and subject-wise, and related the differences between the two conditions to the performance in the BCI task, as measured using the BCI classification accuracy.Furthermore, we used these features to decode the tasks from source-reconstructed data.
Figure 1.Overview of the analysis (A) Subject-level analysis: to identify, for each subject, the edges that show a significant condition effect.The average of all RS trials was subtracted, edge-wise, from the average of the MI trials.The differences were validated by shuffling the labels (i.e., MI and RS) 10,000 times.This way, significantly different edges were identified for each patient.(B) Group level analysis: we identified the edges that were significantly different in a large number of subjects (as compared to what would be expected by chance) and then performed nodal analysis to identify the regions over which significant differences were clustered.(C) Correlation analysis: correlation coefficients between individual differences (for each edge) and individual BCI scores were averaged over 5 regions, namely, executive, pre/motor, parietal, temporal and occipital, obtaining one mean correlation between functional areas (i.e., 5 3 5 = 25 values).Then, the correlation coefficients were shuffled 10,000 times, and each time surrogate average coefficients were obtained for each group of edges.These nulls were used to check if the edges with differences in transition probabilities that related to task performance clustered in any specific area.All statistical analyses were BH corrected for multiple comparisons, as appropriate.
were genuinely related to the task execution, one might expect that they would be greater when the trials were successful (i.e., when the subject could control the BCI device), as compared to when the trials were unsuccessful.To test this hypothesis, in each subject, we compared the differences between MI and RS in the successful trials, to the differences between MI and RS in the unsuccessful trials, expecting to see greater differences at the individual level (in the same set of edges) in the former case as compared to the latter.To statistically test this hypothesis, we used a permutation approach, randomizing successful and unsuccessful trials within each subject.Here, we proceeded under the null hypothesis that if the differences in transition probabilities were truly related to the kind of task at end (i.e., MI or resting state) they should be greater when the task is performed correctly, as compared to when the task has been done wrongly.Hence, we have compared the differences between successful MI and successful resting state, to the differences between unsuccessful MI and unsuccessful resting state.Then, we have built the corresponding null-model under the null-hypothesis that the correct execution of the task would not entail greater differences in the transition probabilities.Hence, we have allocated the trials randomly to the successful (hit) and unsuccessful (miss) trials, and we computed the distribution of the differences expected by chance.Finally, we compared the observed differences between hits (i.e., successful rest vs. successful MI) versus the differences between the misses, demonstrating that when tasks are successfully executed the corresponding differences in the edges are greater than what would be expected by chance.As shown in Figure 2, panels C, to the right, we could confirm that the transition probabilities across the previously identified reliable edges differed more, in each subject, between conditions, in successful trials as compared to unsuccessful trials, supporting the hypothesis that the relationship between transition probabilities and task performance is valid and measurable at the individual level.
MI-based BCI experiments rely on the use of features extracted from power spectra measured in location and frequency bins sensitive to oscillatory changes.More specifically, the system takes advantage of the desynchronization effect associated with a decrease of the power spectra as compared to the rest condition observed within the mu and/or beta band and over the contralateral sensorimotor area when one performs a motor imagery task of the right hand. 17To improve the classification performance based on power spectra, spatial filters relying on the common spatial patterns (CSPs) approach 18,19 have been adopted and widely used in the BCI domain. 5To take advantage of the interconnected nature of brain functioning, recent work consisted in using functional connectivity estimators, mostly relying on phase-locking value (PLV), 20 as alternative features for classification. 20,21To explore the performance of neuronal avalanches in the decoding of the task (i.e., resting-state versus hand motor imagery), we compared the ATMs to the CSP approach.In a preliminary analysis, we also explored the performance of the power-spectra and of the PLVs which both performed, as expected, worse than the CSP (not shown).The CSP and ATM outputs were classified with a Support Vector Machine (SVM), and the results were compared.As it can be seen in Figures 3A and 3D, the classification performance seemed comparable for both methods with MEG (averaged performance of 0.76 for both CSP+SVM and for ATM+SVM) and greater for ATM+SVM than CSP+SVM with EEG (averaged performance of 0.75 and of 0.80 respectively for CSP+SVM and for ATM+SVM).Furthermore, in both MEG and EEG we observed greater inter-subject variability in the case of CSP+SVM (standard deviation = 0.13 and 0.15, for MEG and EEG, respectively) than with ATM+SVM (standard deviation = 0.11 and 0.10 for MEG and EEG, respectively).
Then, we moved on to a subject-specific analysis, to explore the applicability of the ATM method in the context of a BCI training.We aimed to compare the ability of correctly classifying a trial as MI and RS within each subject.Hence, for each subject, we ran t-tests (and confirmed (B) Edge-wise differences in transition probability between hit/miss trials: Reliably different task-based interactions over the subjects (p < 0.05, after correction for multiple comparison).The reported edges are the ones where the difference observed between MI and Rest in the ''hit'' trials was greater than that found in the ''miss'' trials.(C) Node-wise differences in transition probability -on the left: computed from the trials according to the stimulus presented to the subject (''ground truth''); on the right: from the ''hit/miss trials'': the color scale is proportional to the nodal degree derived from the matrix with the group-significant differences (the brighter the color, the more edges that differ between tasks are incident upon that node).them with Wilcoxon tests) to compare the 50 success rates obtained with CSP+SVM to the 50 success rates obtained using ATM+SVM.This was done under the null hypothesis that CSP+SVM and ATM+SVM would not yield any statistically different performance in trial classification.We repeated this comparison for every subject, and corrected the statistical comparisons for multiple comparisons across subjects using the False Discovery Rate (FDR).We also repeated the analysis for 75 random splits, and the results did not change, showing that our results reached convergence at 50 splits.
This analysis showed that for the MEG dataset, ATM+SVM yielded significantly higher classification accuracy than CSP+SVM did for 6 subjects, while the opposite was true for 7 subjects.For the remaining 7 subjects, there was not any statistically significant difference between the decoding performances of CSP+SVM and ATM+SVM (Figure 3, panel B).For the EEG data, ATM+SVM yielded better classification accuracy than CSP+SVM for 12 subjects.In four subjects, CSPs yielded better accuracy than ATMs (Figure 3, panel E).In 5 subjects, there was not any statistically significant difference between the two approaches.
Moreover, we examined the variability of the estimates across the splits.Steady estimates are important to train online algorithms and high variability might be partly responsible for ineffective training.We observed marginally higher intra-subject variability in CSP+SVM (median value of 0.07 in both modalities) as compared to ATM+SVM (median value of 0.06 in both modalities).In particular, the standard deviation across the split is smaller for the ATMs for most subjects.In Figure 3, panels C and F for MEG and EEG respectively, we compare the variance (across random splits) of the estimates obtained with the two pipelines (again, ATM on the x axis, CSP on the y axis).We have also checked what is the contribution of very small avalanches (given that most avalanches are power-law distributed, and small avalanches are the most frequent).As reported in Figure S7 for the case of the threshold |z | > 3, one can observe that extremely small avalanches do not contribute significantly to the performance of the classification, as convergence is reached when including avalanches of size three.We have also investigated the influence of the frequency band (as opposed to broad-band) to the classification performance.As reported in the Figure S8, the broad-band case shows the best performance.
Finally, we explore the relationship between the magnitude of the differences in the transition probabilities between the two experimental conditions (in each subject, for every edge) to the individual BCI performance (defined as the proportion of trials in which the subject (B and E) Individual-level classification performance from MEG data and EEG data: each dot corresponds to the accuracy averaged over the 50 splits obtained from a given subject.The x axis refers to the accuracy of ATM+SVM, while the y axis refers to the accuracy of CSP+SVM.The dashed black line represents equal accuracy for both methods.Therefore, the dots below (resp.above) the line represents the subjects for which ATM+SVM performed better (resp.worse) than CSP+SVM.Dots have been color coded accordingly: in salmon, the subjects for whom ATM+SVM were statistically more accurate than the CSP+SVM; in violet, the subjects for whom CSP+SVM were statistically more accurate than the ATM+SVM, and in gray, the subjects where the two methods did not yield statistically different performances.(C and F) Individual-level classification performance variability from MEG data and EEG data: each dot corresponds to the standard deviation over the 50 splits obtained from a given subject.The dashed black line represents equal variability for both methods.The x-and y axis have been coded in descending value order so that the dot distribution can be read similarly as in the previous plots, with dots below (resp.above) the dashed line representing subjects for whom ATM+SVM led smaller (resp.greater) variability than CSP+SVM.The color code is identical to the plots B and E, so that salmon (resp.violet) dots represent the subjects for whom ATM+SVM accuracy was significantly better (resp.worse) than the accuracy of CSP+SVM.We chose to keep the color code constant to allow for more direct comparison of the performance (accuracy and variability) of both methods.controlled the BCI device) in the MI task, as measured using the BCI score.Figure 4, panel A shows the edges whose differences between conditions correlate the most with the BCI scores (the color code shows the intensities of the correlations, for visualization purposes only edges with correlations with p values <0.05 are shown).Firstly, for nearly every edge we observed a positive correlation.To help interpretation, we clustered all the edges according to functional regions (i.e., executive regions, pre/motor areas, parietal areas, temporal areas, occipital areas, (Figure 1C).That is, we computed the average Spearman's correlation over all the edges connecting any two functional regions (also including self-connections, i.e., edges which are comprised within a functional region).To check if the differences in edges transitions which correlated to task performance clustered over certain functional areas above chance-level, we again built a null-model.To this end, for 10000 times, we randomly allocated edges correlation coefficients to functional areas and, each time, we computed the average.We compared the observed average correlation coefficient to the null distribution to obtain a significance value per functional area.Significance values were then corrected for multiple comparisons across all their combinations (i.e., 5 3 5, 25 p values).Our results suggest that edges that significantly relate to BCI task performance hinge pre/motor areas and parietal areas (p < 0.0001, See Figure 4, panel B).Since we retrieved nearly exclusively positive correlations, our results capture that perturbations spread more often between premotor/motor areas and parietal areas when the subject is engaged in the motor-imagery task, as compared to the resting-state condition.We replicated these results using different parcellation schemes (see Figure S9).

DISCUSSION
In this work, we set out to test if neuronal avalanches can track subject-specific changes induced by the execution of a task (i.e., hand motor imagery) in the large-scale brain dynamics.The working hypothesis was that meaningful communication among regions on the large-scale is intermittent, and it is best understood and measured in terms of aperiodic perturbations.3][24] In our data, we confirmed that the measured branching ratio is compatible with that of a system operating at criticality or near-criticality. 25We went on from there to test the basic idea that brain regions interact differently while performing different tasks.We reasoned that, if avalanches convey interactions occurring between regions, their spreading should also be modified according to the task at hand.Such context-dependent modifications should then be encoded in the avalanche transition matrices and, in turn, might be decoded in order to tailor BCIs.More specifically, such information could be considered either as potential predictors of BCI performance, to conceive tailored training programs, or as alternative features, to improve the classification performance.
With respect to the encoding framework, we identified, in an unsupervised manner, a number of functional links (i.e., edges) that are reliably more likely to be dynamically recruited during a hand motor imagery task as compared to resting state.The edges cluster over regions typically involved in motor planning and attention, as expected from a motor imagery task.In particular, our results demonstrated that the activities spreading across these edges differed mostly when contrasting trials that had been successful, as compared to the trials during which the subject could not control the interface.This finding demonstrates a behavioral readout for the observed changes in the transition probabilities.This is in line with previous evidences demonstrating that premotor areas are involved in the planning of motor actions, in the imagining of actions, in allocating executive attention, 26 as well as in the selection between competing visual targets, 27 while parietal areas are notably involved, among other things, with the processing of sensitive input.In line with previous findings, 28 a premotor-parietal network was found to be specifically implicated with spatial imagery tasks.These results suggest that indeed the localization of the different edges carries a behavioral meaning.
The ATMs directly track the spreading of activations as they happen (as opposed to quantifying dependencies over time intervals).Using such straight-forward approach, 12 we reliably retrieve functional information related to the execution of a task at the subject-level, which was not possible using classical functional metrics. 29Our approach is based on theoretical underpinnings derived from statistical mechanics, which posits that higher-order, long-range correlations would appear in a near-critical dynamical regime. 22,25In fact, it is important to notice that, by z-scoring each region and using a high-threshold we selected only very strong coherent activity, which is unlikely to be generated by a linear process and that, instead, refers to a higher-order phenomenon.In doing so, we discarded most of the available signals.In practice, we have discarded roughly 90% of the data, applying a ''spatio-temporal'' filter, and only selecting those points in time and space where largescale aperiodic perturbations were found.To provide a comparison with more standards techniques, we have used the same pipeline based on techniques that assume stationarity (and take the whole data into account), namely the power-spectra and the event-related desynchronization/synchronization (ERD/S) effects (both containing local information) and the phase-locking value (estimating bivariate synchronization between brain regions).Importantly, all these techniques failed to reproduce any pattern of differences between the two conditions that was replicable at the individual level (see Figure S3).1][32][33] The fact that we could find robust individual differences while discarding most data and that we failed to do so when taking the whole data into account suggests that focusing on higher-order perturbations might be useful to capture functionally relevant processes and, in turn, to apply them to the design of BCIs.
We replicated our results using different thresholds and binnings showing that they are resilient to these choices.Also, they can be replicated using different parcellation schemes and using EEG signals, which is more widely available than MEG for BCI applications, thereby making our methodology suitable in a wide variety of settings.All in all, extensive replications make it unlikely that our results could be due to arbitrary choices or limited to a specific methodology.
Within the decoding framework, we compared the offline classification performance resulting from the use of the ATM to the gold-standard approach, which relies on spatial filters (i.e., the Common Spatial Patterns).Our results suggest that the integration of periodic and aperiodic features would be a straightforward way to improve task classification.Indeed, the information captured by the two types of feature extraction (namely CSPs and ATMs) and the two modalities (MEG and EEG) are complementary.The ATMs maintain a fairly straightforward interpretability as opposed to CSPs, which operate on large-scale components of the signal that are not as readily interpretable.In particular, the ATMs focus on the strong coherent interactions that intermittently occur on the large-scale.The good performance of the ATMs on the EEG data is relevant to translate our methodology to real-world scenarios.In this configuration, the classification of ATMs leads to a significant improvement of the decoding performance with respect to the benchmark in the majority of subjects.Importantly, in both modalities, we observed a reduced intra and inter-subject variability with our approach as compared to CSP+SVM.This might help, in real-life experiments, to reduce the BCI inefficiency phenomenon.To evaluate the feasibility in online applications, we estimated that for an epoch of 5s the time necessary to extract the features, and to perform the classification, was approximately 25 ms for ATM+SVM and 27 ms for CSP+SVM.This value is actually compatible with current on-line settings which use similar time windows and update the feedback every 28 ms.Further investigations are needed to explore the performance in the context of online classification with shorter time windows.Nevertheless, it is worthwhile mentioning that this is a first proof-of-concept study of the use of neuronal avalanches as complementary/alternative features for the design of BCI.
Identifying neural markers associated with BCI performance is crucial to design optimized and tailored BCI systems. 34In turn, the most informative markers provide insight into the processes that underpin the execution of a given task.
Neurophysiological predictors of BCI scores are most commonly associated with power spectra.Indeed, sensorimotor m-and a-rhythms or, more recently, time-averaged brain interactions in these frequency bands have been considered as potential markers. 33These findings were mainly empirical and, in this oscillatory perspective, features such as power spectra and/or (static) synchronization measures have been widely explored to inform the interfaces. 21,33Furthermore, regional connectivity strength 29 and the M/EEG multiplex core periphery 35 of specific associative and somatosensory areas held predictive power over BCI performance in the same session.However, between 15% and 30% of the subjects do not learn to control the effector despite extensive training.This might mean that the typical features only partly capture the processes that lead to the execution of the task.Hence, different markers might be exploited.Our study contributes, on a practical level, by achieving a differentiation between tasks at the individual level.From a more theoretical perspective, our results suggest that the spreading of local synchronization on the large-scale might be intermittent and aperiodic, and that such spreading carries behavioral relevance.The fact that neuronal avalanches are relevant to the execution of a task might also have implications on the underlying microscopic dynamics.As such, this would allow the deployment of complex and solid mathematical tools derived from statistical mechanics to test the presence of specific microscopic physiological processes.
When relating the differences between MI and RS in the probability of an avalanche consecutively recruiting two regions to the magnitude of BCI performance we find mostly positive correlations, indicating that the more avalanches spread between premotor/motor and parietal regions during the task, the better the control of the BCI.This might suggest that the interactions between pre/motor regions and parietal ones underpin the execution of the task.These findings are in line with previous studies relying on MI-based BCI paradigms.In particular, Buch et al. 36 showed that the structural integrity of the frontoparietal networks predict the ability of stroke patients to control a brain computer interface in a motor imagery task.Using fMRI, Halder et al. showed that the premotor areas participate in executing voluntary modulation of brain rhythms through an MI-based BCI. 37mportantly, when interpreting the results in cognitive terms, one should consider that a (BCI) task likely recruits multiple cognitive processes, beyond those exploited by the BCI classification itself.As such, a psychophysiological interpretation of the areas involved in the BCI classification is not straightforward.For example, the right frontoparietal network dynamics also reflects the allocation of attentional resources, which are typically engaged in cognitive/motor tasks.It is also known that frontomedial activities are one of the main correlates of sustained attention. 38Thus, although the clustering of the edges that we found in the frontoparietal network is consistent with the prominent involvement of this network in motor imagery tasks, it cannot be mapped uniquely onto one cognitive process.A different perspective is provided by the analysis of cognitive profiles, since it was shown that spatial abilities influence BCI performance, 39 and in particular mental rotation. 40As such, training strategies might be tailored over a subject-specific assessment of such abilities.Intriguingly, mental rotation abilities were related in turn to increased activity of the premotor cortex, the superior-parietal and the intra-parietal cortices. 41,42Furthermore, activations in the right middle frontal gyrus correlated with BCI performance, which might be interpreted in the light of the role that this region plays in the processing of an observed movement. 37n conclusion, in a real-world scenario, multiple mechanisms might be in place.As such, our approach is not expected to be the only useful framework.However, it might capture part of the processes that were typically overlooked in a more oscillatory perspective.Our work paves the way to use aperiodic activities to improve classification performance and tailor BCI training programs.

Limitations of the study
This first proof-of-concept study aimed at assessing to which extent neuronal avalanches could be relevant to identify potential markers of BCI performance and alternative features to detect the subjects' intent.However, to explore scalability and deployability, studies will need to involve different types of motor imagery tasks (e.g., feet motor imagery, tongue motor imagery etc.), the assess the sensibility of ATMs toward the discrimination of tasks that involve areas close to each other.Furthermore, we have only assessed the performance of the ATMs in controlling one degree of freedom.However, the performance of ATMs in controlling more degrees of freedom will have to be assessed to study the use of ATMs in richer frameworks (i.e., instead of considering only the vertical position of the moving cursor, the horizontal position might also be considered).

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following: contrast maps between conditions to elicit the most relevant information, namely the (channel; frequency bins) couples of interest that best discriminate the subjects' intent, to train the classifier.ii) the testing phase was made of six runs where the feedback, consisting of a moving cursor, was provided.Each trial consisted of 1s of ISI, followed by 5s of target presentation.The feedback was provided from t = 3s to t = 6s.It consists of a cursor that starts from the left to the right part of the screen with a fixed velocity.Experimenters instructed the subjects to start to either remain at rest or to perform a sustained MI task as soon as they saw the target, i.e., at t = 1s.The online features were obtained from the estimation of the power spectra via an autoregressive model that relied on the maximum entropy method 44 every 28 ms on a time window of 0.5s.These features were classified using the Linear Discriminant Analysis (LDA) method.The feedback provided to the subject, namely the vertical position of the moving cursor, relied on the linear combination of the computed features via the moving average method. 45The BCI performance used in this study refers to the proportion of trials in which the subjects could control the vertical position of the moving cursor to hit the target.In this work, the analysis relied on the data obtained from the testing phase.

M/EEG data acquisition and preprocessing
EEG signals were recorded with a 74 EEG-channel system, with Ag/AgCl passive sensors (Easycap, Germany) placed according to the standard 10-10 montage, with the references placed at the mastoids, and the ground electrode located at the left scapula.MEG signals were recorded via a system composed of 102 magnetometers and 204 gradiometers (MEGIN Neuromag TRIUX MEG system).M/EEG signals were simultaneously recorded in a magnetic shielding room with a bandwidth of 0.01-300 Hz and a sampling frequency of 1 kHz.Head positions were digitized via the Polhemus Fastrak digitizer (Polhemus, Colchester, VT).Three points were used as landmarks to provide co-registration with the individual anatomical MRI: nasion, left and right pre-auricular points.Individual T1 sequences (256 sagittal slices, TR = 2.40 ms, TE = 2.22 ms, 0.80 mm isotropic voxels, 300x320 matrix; flip angle = 9 ) were acquired with a 3T Siemens Magnetom PRISMA after the BCI experiments.Subjects were instructed to remain at rest for 15 min.Images were preprocessed with the FreeSurfer toolbox 46 and imported to the Brainstorm toolbox 47 where the digitized locations of the landmarks, and of the EEG electrodes were aligned with the MRI.
To remove the environmental noise in MEG signals, we applied the temporal extension of the Signal Space Separation (tSSS) with Max-Filter. 48To remove ocular and cardiac artifacts, we performed an Independent Components Analysis (ICA) via the Infomax approach with the Fieldtrip toolbox. 49,50Only the components that contained physiological artifacts were removed through a visual inspection of the signals.
Once the data was preprocessed, we cut the recordings into epochs of 7 s.
Source reconstruction was performed by the computation of individual head models with the Boundary Element Method (BEM) 51,52 where the surfaces were obtained from three layers related to the individual MRI (scalp, inner skull, outer skull) that contained 1922 vertices each.Sources were estimated via the weighted Minimum Norm Estimate (wMNE). 53,54In this work, we used the Desikan parcellation scheme. 55The list of the regions of interest is available in the Table S2.
For a complete description of the preprocessing steps, please refer to Corsi et al. 29

Data pipeline
Each source-reconstructed signal was z-scored (over time), thresholded, and set to 1 when above threshold, and to zero otherwise (threshold: z = |3| 10 ).Note that each region was z-scored independently (over time).Then, an avalanche was defined as starting when at least one region is above threshold, and as finishing when no region is active.For each avalanche, we estimated a transition matrix, structured with regions in rows and columns, and the ij-th edge is defined as the probability that regions j would be active at time t+1, given region i was active at time t. 13,56To consider the intra-regional dynamics, the main diagonal of the transition matrix contains the probability that if a region is recruited by an avalanche, it will keep being active at the successive time step.For each subject, we obtained an average transition matrix (i.e., averaging edge-wise over all avalanches) for the baseline condition, and an average transition matrix for the hand motor imagery task.To ensure appropriate sampling, 57 we have binned the data with bins ranging from 1 to 3 (stopping at three to avoid aliasing).To select the optimal binning, we looked at the branching ratio, since a branching ratio $ 1 typically indicates a process operating near a critical regime.The branching ratio is calculated as the geometrically averaged (over all the time bins) ratio of the number of events (activations) between the subsequent time bin (descendants) and that in the current time bin (ancestors) -Equation 1, as: n events ðjÞ 1 N bin À 1 (Equation 1) and then geometrically averaging it over all the avalanches -Equation 2. 22 s = Y where s i is the branching parameter of the i-th avalanche in the dataset, N bin is the total amount of bins in the i-th avalanche, n events (j) is the total number of events active in the j-th bin, and N aval is the total number of avalanches in each participant's recording.
In branching processes, a branching ratio of s = 1 indicates critical processes with activity that is highly variable and nearly sustained, s < 1 indicates subcritical processes in which the activity quickly dies out, and s > 1 indicates supercritical processes in which the activity increases as runaway excitation.The bin length equal to one sample yielded a s = 1, hinting at the avalanches as occurring in the context of a dynamical regime near operating near criticality, was chosen for subsequent analyses.Importantly, the other binnings also yielded branching ratios extremely close to 1 (0.995, and 0.978 for binnings 2 and 3, respectively), and the results of the analyses remain unchanged, showing resilience to the details of the pipeline.However, one should notice that this particular dataset entails rather short epochs and, as such, it is not best suited for the evaluation of a long tail.This is all the more true considering that a stimulus was being delivered and, as such, the dynamics was not evolving unperturbed.
In order to compare how the trials are encoded in the data, we provide a comparison with standard feature extraction techniques, we computed the power spectra, via the Welch method with a window length of 1 s and a window overlap ratio of 50%, the event-related desynchronization/synchronization (ERD/S) effects via Morlet wavelets with a central frequency of 1 Hz and a time resolution of 3s between 3 and 40 Hz, 58,59 and the phase-locking value, as in Lachaux et al. 60 The PLV was chosen for its straight-forward interpretation, and for its theoretical assumptions (i.e., stationarity of the signal), which is different from the one of the ATMs.Even though working in the source space may mitigate the volume conduction effects, 61 it is important to mention that the PLV method is influenced by zero-lag interactions, which might be either true interactions or spurious correlations induced by the field-spread.
For these analyses, the classification of the trials as MI or RS was based on the outcome of the experiment.To explore the robustness of our results to different classification algorithms, we also classified the trials as MI/RS based on either Linear Discriminant Analysis or a Support Vector Machine.The results showed that our conclusions are robust to the classification algorithm (see Figure S6).

Classification analysis
To assess the extent to which the ATMs might be considered as an alternative feature for BCIs, we compared the classification performance resulting from a feature extraction approach based respectively on the ATMs and on spatial filters, namely Common Spatial Patterns (CSP). 18,19In addition, as a preliminary study, we also tested the classification performance associated with other features such as the power-spectra (of the source-reconstructed time-series) and the phase-locking values.All the considered features were classified with two different techniques, namely linear discriminant analysis (LDA) and support vector machines (SVM).We obtained the best classification performance using CSP followed by the Support Vector Machine (SVM) classifier.Therefore, we selected this framework as the benchmark against which ATMs were compared.
For each subject, we divided the dataset to include 80% of the trials in the train split and 20% of the trials in the test split.The classification scores for all pipelines were evaluated with an accuracy measurement using a random permutation cross-validator.50 re-shuffling and splitting iterations were performed.The SVM was trained using either the CSPs or the ATMs.For each subject, the CSP method decomposes the source-reconstructed signals using spatial filters, and then selects the n modes that capture most inter-class variance.Here, we selected eight spatial modes (since they yielded the best classification accuracy) and returned the average power of each.As for the ATMs, for each subject we found the optimal Z score threshold for identifiability.Then, we fed an SVM classifier with either feature (CSP-filtered data or ATM).Finally, we compared the classification performance (i.e., the proportion of correctly labeled trials) for CSP+SVM and ATM+SVM, over 50 random splits of the data.For each subject, we ran t-tests (and confirmed them with Wilcoxon) under the null hypothesis that CSP+SVM and ATM+-SVM would not yield a statistically different performance in trial classification.We repeated this comparison for all the subjects and corrected the statistical comparisons for multiple comparisons across subjects using the False Discovery Rate (FDR).To calculate the inter-subject variability, we used the standard deviation of the classification performance across splits and subjects.As per the intra-subject variability, we calculated the standard deviation of the classification accuracy across the 50 splits for each subject.To estimate the computational time required to extract and to classify the features, we used a built-in function in python.

QUANTIFICATION AND STATISTICAL ANALYSIS
For each subject, we computed the difference in the probability of a perturbation running across a given edge during resting-state and during the MI task.To statistically validate this, for each individual, we randomly shuffled the labels of the individual avalanches (i.e., each trial-specific transition matrix was randomly allocated to either resting-state and hand motor imagery).We performed this procedure 10000 times, obtaining, for each edge, the distribution of the differences given the null-hypothesis that the transition matrices would not capture any difference between the two conditions.Note that this approach does not require normality of the original distributions.We used the null distribution to obtain a statistical significance for each edge.The retrieved significances were Benjamini-Hochberg-corrected for multiple comparisons across edges. 16Following this procedure, we obtained for each patient, a matrix with the edges that significantly differed from the two conditions.We then looked at the concordance of such matrices across subjects, as to only focus on the edges that are reliably related to the task at hand.We have only selected those edges that were significant in a higher-than-chance number of subjects.Finally, we selected only those nodes that had more significantly different edges incident upon them, as compared to chance level.This way, we selected the areas whose involvement in large-scale dynamics is qualitatively different, in multiple subjects, between the two conditions (i.e., RS vs. MI task), and refer to these as the ''task-specific'' areas.
Then, we moved on to check what edges differed related to the BCI performance.To this end, we related, for each edge, the individual differences in the transition probabilities in the two experimental conditions to the individual BCI performances.We then grouped the edges according to functional areas, namely: executive areas, pre/motor areas, parietal areas, temporal areas, and occipital areas.To statistically validate this approach, for 10000 we have randomly allocated the edges to these groups and computed the average correlation coefficient at each iteration for each group of edges.We used these averages to build a null-distribution for each functional area, and used to have a statistical significance.Finally, these significances were corrected across functional regions using the BH correction for multiple comparison.We have repeated the correlation analysis without grouping the regions into functional areas but carrying out the correlation analysis for each region separately.In this case, no single region survived the BH correction (not shown).
Finally, we replicated our results using the Destrieux parcellation scheme 62 (the associated list of the regions of interest is available in the Table S2) and using the electroencephalogram. Furthermore, we repeated the analysis changing the threshold to define active regions.

Figure 2 .
Figure 2. Reliability analysis (A) Edge-wise differences in transition probability from the ground truth: Reliably different task-based interactions over the subjects (p < 0.05, after correction for multiple comparison).(B)Edge-wise differences in transition probability between hit/miss trials: Reliably different task-based interactions over the subjects (p < 0.05, after correction for multiple comparison).The reported edges are the ones where the difference observed between MI and Rest in the ''hit'' trials was greater than that found in the ''miss'' trials.(C) Node-wise differences in transition probability -on the left: computed from the trials according to the stimulus presented to the subject (''ground truth''); on the right: from the ''hit/miss trials'': the color scale is proportional to the nodal degree derived from the matrix with the group-significant differences (the brighter the color, the more edges that differ between tasks are incident upon that node).

Figure 3 .
Figure 3. -Classification analysis (A and D) Group-level classification performance obtained from MEG and EEG data: for each classification tool (respectively, in purple to the left, CSP+SVM and in salmon to the right, ATM+SVM) each dot corresponds to the accuracy obtained for a given subject and a given split.(Band E) Individual-level classification performance from MEG data and EEG data: each dot corresponds to the accuracy averaged over the 50 splits obtained from a given subject.The x axis refers to the accuracy of ATM+SVM, while the y axis refers to the accuracy of CSP+SVM.The dashed black line represents equal accuracy for both methods.Therefore, the dots below (resp.above) the line represents the subjects for which ATM+SVM performed better (resp.worse) than CSP+SVM.Dots have been color coded accordingly: in salmon, the subjects for whom ATM+SVM were statistically more accurate than the CSP+SVM; in violet, the subjects for whom CSP+SVM were statistically more accurate than the ATM+SVM, and in gray, the subjects where the two methods did not yield statistically different performances.(C and F) Individual-level classification performance variability from MEG data and EEG data: each dot corresponds to the standard deviation over the 50 splits obtained from a given subject.The dashed black line represents equal variability for both methods.The x-and y axis have been coded in descending value order so that the dot distribution can be read similarly as in the previous plots, with dots below (resp.above) the dashed line representing subjects for whom ATM+SVM led smaller (resp.greater) variability than CSP+SVM.The color code is identical to the plots B and E, so that salmon (resp.violet) dots represent the subjects for whom ATM+SVM accuracy was significantly better (resp.worse) than the accuracy of CSP+SVM.We chose to keep the color code constant to allow for more direct comparison of the performance (accuracy and variability) of both methods.

Figure 4 .
Figure 4. Correlation analysis (A) Edge-wise correlations: edge-wise correlation with BCI scores.For visualization purposes, only the edges with p values<0.05 are visualized.The color of the edges is proportional to the correlation coefficient.(B) Node-wise differences in transition probabilities: the picture shows the functional-areas averaged correlation coefficients.In orange the pre/motor areas, in turquoise the parietal lobe.

TABLE
d RESOURCE AVAILABILITY B Lead contact B Materials availability B Data and code availability d EXPERIMENTAL MODEL AND STUDY PARTICIPANT DETAILS d METHOD DETAILS B BCI experiment B M/EEG data acquisition and preprocessing B Data pipeline B Classification analysis d QUANTIFICATION AND STATISTICAL ANALYSIS