Improving motor imagery classification during induced motor perturbations

Objective. Motor imagery is the mental simulation of movements. It is a common paradigm to design brain-computer interfaces (BCIs) that elicits the modulation of brain oscillatory activity similar to real, passive and induced movements. In this study, we used peripheral stimulation to provoke movements of one limb during the performance of motor imagery tasks. Unlike other works, in which induced movements are used to support the BCI operation, our goal was to test and improve the robustness of motor imagery based BCI systems to perturbations caused by artificially generated movements. Approach. We performed a BCI session with ten participants who carried out motor imagery of three limbs. In some of the trials, one of the arms was moved by neuromuscular stimulation. We analysed 2-class motor imagery classifications with and without movement perturbations. We investigated the performance decrease produced by these disturbances and designed different computational strategies to attenuate the observed classification accuracy drop. Main results. When the movement was induced in a limb not coincident with the motor imagery classes, extracting oscillatory sources of the movement imagination tasks resulted in BCI performance being similar to the control (undisturbed) condition; when the movement was induced in a limb also involved in the motor imagery tasks, the performance drop was significantly alleviated by spatially filtering out the neural noise caused by the stimulation. We also show that the loss of BCI accuracy was accompanied by weaker power of the sensorimotor rhythm. Importantly, this residual power could be used to predict whether a BCI user will perform with sufficient accuracy under the movement disturbances. Significance. We provide methods to ameliorate and even eliminate motor related afferent disturbances during the performance of motor imagery tasks. This can help improving the reliability of current motor imagery based BCI systems.


Introduction
Non-invasive brain-computer interfaces (BCIs) allow generating a control signal by inferring a user's intention from the ongoing electroencephalography (EEG) (see Vidal 1973, Kuebler et al 2001, Birbaumer 2006, Neuper et al 2006, Sejnowski et al 2007, Müller et al 2008, Blankertz et al 2010b, Millán et al 2010, McFarland and Wolpaw 2017, Scherer and Vidaurre 2018, Stevenson et al 2019, Mane et al 2020. In BCIs based on the modulation of the sensorimotor rhythms (SMR), the imagination of movements elicits changes in the motor cortex that are reflected in the EEG. By distinguishing which kind of imagined movement the user is executing, it is possible to generate an output signal that can be used to interact with, for example, a computer. Other possible interventions that also cause SMR modulations are passive and induced movements. During a passive movement there is no voluntary planning and preparation processes, but activity from the afferent somatosensory components (tactile receptors, muscle spindles and joint receptors) is still present, leading to the modulation of brain activity manifested in event related desychronization/synchronization (ERD/ERS) (Alegre et al 2002, Keinrath et al 2006. The stimulation of muscles, even under the threshold of movement, also excites neural networks in the brain (Chatterjee et al 2007, Cho et al 2011, Vidaurre et al 2013, Ahn et al 2014, Yi et al 2017, Corbet et al 2018 and stimulated (induced) movements also cause SMR modulations through the activation of the afferent pathways (e.g. stimulating one nerve or some muscles directly). Stimulation can produce very specific afferent activity in different structures of the brain's sensorimotor network (Wegrzyk et al 2017). Electrical stimulation is based on a very broad variety of stimulation parameters (e.g. pulse width, voltage amplitude, current, electrodes shape and size, frequency) and empirically found standard stimulation parameters are normally chosen to achieve effective motor or sensory stimulation (Doucet et al 2012).
In the context of SMR based BCIs for rehabilitation purposes (Birbaumer et al 2009, Chaudhary et al 2020, Mane et al 2020, Nierhaus et al 2021, induced movements are commonly used to support the classification of motor imagery (MI). For example, Ramos-Murguialday and Birbaumer (2015) compared motor imagery, passive, induced and active movement tasks, showing that all of them share similar modulation of the oscillatory components. Additionally they showed that the use of proprioceptive feedback increases BCI performance (Ramos-Murguialday et al 2012). Similar results were also recently published in (Corbet et al 2018). Furthermore, previous work showed that calibrating a MI-based BCI with afferent signals obtained with neuro-muscular electrical stimulation (NMES), can significantly increase performance in those tasks where BCI control is difficult to achieve (Vidaurre et al 2013. Over the past years, the role of BCIs in the field of neuroprostheses is widely discussed (see Faghri et al 1994, Mueller-Putz et al 2005, Peckham and Knutson 2005, Neuper et al 2006, Ramos-Murguialday et al 2012. In this setting, induced (NMES) or passive movements are externally initiated to achieve complex movements (Pedrocchi et al 2013. Therefore, there is a need to study induced movements not only as facilitators of BCI control (Chatterjee et al 2007, Vidaurre et al 2013, Ahn et al 2014, Yi et al 2017, Corbet et al 2018, but also as possible disrupters. A recent work by Hehenberger et al (2020) found that vibrotactile stimulation without movement does not negatively influence the performance of the BCI system for the recognition of movements. However, there are situations where induced movement artifacts of neuronal origin can hinder the efficiency of BCI systems. This can occur during attempted movements in e.g. stroke patients. In this situation, peripheral stimulation is used to facilitate motor performance. However, neuronal activity caused by the stimulation could be wrongly recognized as an attempted movement. This in turn can lead to a false triggering of a BCI device.
In this work, we looked at induced movements from that perspective and investigated whether significant performance drops exist in the classification of two motor imagery tasks when one limb is stimulated during the imagination of different movements. And indeed, this was the case for the two studied scenarios: (a) the first one is the continuous stimulation of a limb not involved in any of the MI tasks. Here, and taking into account results reported in literature (see e.g. Vidaurre et al 2013Vidaurre et al , 2019, we hypothesized that the stimulation might cause a more widespread event-related desynchronization (ERD) than MI, which might hamper the classification; (b) the second setting is the continuous stimulation of a limb involved in one of the two MI tasks. In this condition, the always-active stimulation only coincides with the MI task in half of the trials. However, during the other half stimulation and MI activity belong to different classes.
After analysing both settings, we further conjectured that it is possible to ameliorate the observed performance drop by removing spatial directions contaminated with noise. Furthermore, our results show that in the first scenario a simple method to extract oscillatory sources called spatio-spectral decomposition (Nikulin et al 2011, Haufe et al 2014 is sufficient to counteract the effect of the afferent feedback. In the second setting, we offer ways to alleviate this accuracy loss by removing spatial directions where noise related to stimulation could be present. We additionally study the cause of such performance decreases by inspecting the extent of ERD/ERS (event-related synchronization) in comparison to the control condition. We show that the ERD attenuation is significantly diminished and are able to provide the threshold for the residual SMR that must still be present in EEG during the stimulation to ensure that the BCI user will retain good performance.
Similar to the work presented by Iscan and Nikulin (2018), which examined the effect of perturbations on steady-state visual evoked potential based BCIs, the present study can be considered as another demonstration for the possibility to use BCI systems in situations involving external perturbations affecting the quality of task related neural signals.

Experimental setup
Informed and written consent was obtained from the participants before starting the experiments.
Data were recorded in a one-day session lasting between 6 and 7 h, from ten healthy BCI-users. First, the NMES set-up took place. NMES was generated by a Rehastim (Hasomed GmbH, Magdeburg, Germany) system that produced electrical impulses at 20 Hz using four electrodes placed on the skin of the users, two on each biceps using a bipolar set-up. The impulse width was 500 µs. The amplitude needed to stimulate the arms (8-10 mA) was subject-dependent and was selected as the minimum needed to elicit a weak contraction of the muscle. The NMES controller generated stimulation during the whole active trial (3 s) and the evoked contraction caused the arm to slowly move upwards. At the end of the trial, when the stimulation stopped, the user was asked to return the arm to the initial position (over his/her lap).
Brain activity was acquired from the scalp with multi-channel EEG amplifiers using 56 Ag/AgCl electrodes in an extended 10-20 international system. It was sampled at 1000 Hz with a band-pass filter from 0.5 to 50 Hz. Data were down-sampled to 100 Hz before further processing. After the preparation, users were instructed to imagine the same type of movements as those yielded by NMES, and the imagery exercise was repeated several times before the recording session took place to let the participants train the movement imagination. The imagined movement of the feet was ankle flexion. For the upper limbs, it was arm flexion. The session consisted of three different blocks: • MI: Motor imagery of three limbs (left arm, right arm and feet) was carried out. Every seven seconds one of three different visual cues (arrows pointing left, right, or down) indicated the participant which type of MI task to perform (see first row of figure 1). Three runs of 75 trials each (25 per class) were recorded. • NMES: in this condition, the left or the right arm were stimulated but no MI tasks were performed by the participant. Two runs of 50 trials per class (i.e. the arms) were carried out. See second row of figure 1. • MI+NMES: the pair of classes with best predicted performance from the MI recording was chosen for the MI tasks carried out in this block. This pair always included one of the arms and the feet class. During some of the trials the stimulation of the right or left arm occurred. A schema of MI+NMES is visible in the third row of figure 1. In each run of this block there were trials of MI and trials of simultaneous MI and NMES. Three runs with 150 trials each were recorded. The runs contained three types of trials in random order to avoid block-effects: -CONTROL (MI performed without simultaneous NMES): Best classified class-pair in the calibration data containing MI tasks of one arm and the feet. -DIFF: MI of one arm and the feet (the same tasks as CONTROL condition) plus simultaneous NMES on the other arm. NMES was active in all trials of this condition. -SAME: MI of one arm and the feet (the same tasks as CONTROL condition) plus simultaneous NMES on the same arm. Again, NMES was active in all trials of this condition.
Note that in a facilitator paradigm, NMES coincides with the performed tasks in 100% of the trials. In our DIFF and SAME settings NMES is always active as well. However during DIFF, NMES never coincides with the MI tasks. On the other hand, during SAME, NMES coincides with the task in only 50% of the trials, because the other 50% of the trials are feet class.

Spatio-spectral decomposition
Spatio-spectral decomposition (SSD) finds spatial filters that maximize the signal to noise ratio (SNR) of oscillatory signals. This method was first introduced by (Nikulin et al 2011), where it was shown that the SNR of the measured signals is proportional to the SNR of the sources. Thus, maximizing the SNR of the measured signals, also maximizes the SNR of the sources of interest. Taking covariance matrices as power estimators, the optimization problem reduces to a generalized eigenvalue problem. where P s (f ) and P n (f ) are the power of the source signal and the noise at a narrow frequency band respectively, P m (f ) is the power of the measured signal, and Σ m and Σ n are the covariance matrices of the measured signal and noise of the filtered EEG. Finally w is the spatial filter that allows finding sources of maximal SNR.

Common spatial patterns
In order to find maximally discriminant spatial filters in terms of covariances, we apply the methods described by (Blankertz et al 2008b, Lemm et al 2011, Sannelli et al 2019. This analysis is called common spatial patterns (CSPs). The 2 Hz band between 19 and 21 Hz was disregarded from the CSP analysis to avoid the stimulation artefact present in the 20 Hz band, see (Vidaurre et al 2013, Insausti-Delgado et al 2021. CSP filters maximize the variance of the signal under one task while minimizing it for the other. Since, for band-pass filtered signals the variance is equal to band-power, CSP analysis is applied to obtain an effective discrimination of mental states that are characterized by ERD/ERS effects (cf Ramoser et al 2000, Blankertz et al 2008b, Sannelli et al 2011. CSP yields a data-driven supervised decomposition of the signal x(t) parametrized by a matrix W that projects the signal from the original sensor space to a surrogate sensor space: To calculate the matrix W we need the sample covariance matrices of the band-pass filtered EEG signals of two different motor imagery tasks Σ + and Σ − . Then, the filter matrix W is defined by the generalized eigenvalue problem: where D is a diagonal matrix of eigenvalues.

Invariant CSP with robust PCA noise matrix for DIFF condition
Usually CSP filters are optimized for MI trials, but in our case NMES is concurrently applied during the task execution. Therefore, the CSP filtered signals are contaminated with 'MI-like' patterns induced by the stimulation (Vidaurre et al 2013) in trials of all classes. In order to obtain more robust filters, it is possible to include a 'noise matrix' to the right term of equation ( The iCSP method turns a BCI system robust against changes in power that occur due to phenomena unrelated to the performed task, (Blankertz et al 2008a).
The goal of iCSP is to calculate spatial filters W such that the variance of one class is maximized by simultaneously minimizing the variance of the other class and the covariance matrix Ξ of a disturbing signal Y.

Figure 2.
Pipeline to obtain baseline results for CONTROL, DIFF and SAME conditions. After finding the frequency band and time interval of interest for the individual pair of classes, CSP analysis is performed to select CSP derivations. Then, a LDA classifier is trained. All parameters (band, time interval, CSP filters and classifier) were then used in the test set (CONTROL, DIFF or SAME dataset), to obtain the final classification error.
For this purpose, the following formulation of CSP is applied: where η ∈ [0, 1] is a hyperparameter to trade-off the discrimination of the training classes against invariance. This means that a small part of the non-task related but disturbing information is added to the averaged class covariance matrix in order to make the spatial filter invariant against it. In this manuscript, the 'noise-matrix' was estimated from band-pass EEG signals recorded during NMES trials. The EEG can then be filtered as before: x CSP (t) = x(t) · W. To find Ξ, we apply a method called robust PCA (rPCA) to the noise matrix (Pascual et al 2011). This algorithm allows using only principal directions of noise, at the same time attenuating the influence of outlying information in the calculation of PCA. The number of robust principal components that are retained is fixed to 8 and based on previous experience.

Classification
The extracted features were then used to train a Linear Discriminant Analysis (LDA) classifier, which has demonstrated to be effective in the discrimination of MI data (Müller et al 2003, Blankertz et al 2007, Lotte et al 2007, Vidaurre et al 2007, Lemm et al 2011. For our analysis, the 3 runs of the MI dataset described in section 2.1 were used to estimate the required parameters (i.e. frequency band where ERD/ERS occurs and time interval), the spatial filters and train the classifier. To set the additional hyperparameter corresponding to invariant CSP, we employed the first 50 trials of the test set. These 50 trials have been always discarded from performance estimation, regardless of the method used.

Processing pipelines
The final results for accuracies of CONTROL, DIFF and SAME conditions are the outcome of applying different datasets to specific processing pipelines of methods that have been previously described. In order to clarify how each of the methods is applied within each pipeline and for each of the conditions, we summarize these procedures in the present section.

Pipeline to obtain baseline results for all conditions
The processing pipeline used to obtain baseline accuracies in all conditions is depicted in figure 2. This pipeline is state-of-the-art of motor imagery processing for two classes, as shown in (Sannelli et al 2019). First, the band and time interval of interest were found. The subject-specific band was computed based on heuristics from the discriminability between the spectra of binary class combinations (see Sannelli et al 2019 for more details). The band between 19 and 21 Hz or higher harmonics of the stimulation frequency were not included within the search range of the procedure. In particular, the range was divided within two broad bands corresponding to 6-19 Hz and 21-30 Hz. The obtained subject-specific bands varied between 5 and 23 Hz for the lower band limit and 12.5 and 28.5 Hz for the upper band limit, with corresponding mean values 11.85 and 16.35 Hz. Also the time interval was found based on heuristics of the discriminability of the ERD/ERS curves from laplacian channels over the sensorimotor area, as in Sannelli et al (2019). The minimum allowed width of this interval was 1000 ms. Then, a CSP analysis was performed to select six spatial filters. After that, the variance and natural logarithm were applied to the data and finally a LDA classifier was trained. The resulting features obtained with these parameters (frequency band and time interval) and CSP filters were then used to train a classifier that was applied to the test dataset of interest (CONTROL, DIFF or SAME). . SSD pipeline for DIFF condition. Using the MI training data, the frequency band and time interval of interest for the individual pair of classes were found, then SSD was applied and the components with the strongest SNR retained (between percentiles 80 and 100). After that, a CSP analysis was performed to select CSP derivations. Next, a LDA classifier was trained. All parameters (band, time interval, SSD derivations, CSP derivations and classifier) are subsequently applied in the test set (DIFF dataset), to obtain the final SSD pipeline classification error. The error rate achieved with this pipeline on each of the datasets (CONTROL, DIFF and SAME) is the baseline result for each of the conditions. We termed these procedures baseline pipeline.

Pipelines for DIFF condition
Apart from the baseline pipeline, DIFF condition was analysed using two other different pipelines. The first one is based on SSD (see figure 3 for a graphical summary of the procedures involved), where this method was applied to the band-pass filtered MI training data (without stimulation) to find only some MI related oscillatory sources. The frequency band and time interval used were the same as the ones of the baseline pipeline. In order to select the appropriate number of SSD sources, the raw training MI signals were taken and filtered in the previously found SSD space. Then, the power spectra of all SSD sources were estimated. The SNR was computed between the maximum power in the frequency band of interest and the mean power in the noise band. After that, only those SNR values of SSD sources between the 100th and 80th percentiles were retained. Later, a CSP analysis was performed in the selected SSD sources and a LDA classifier trained (see figure 3). Using this SSD pipeline, DIFF data were classified and the corresponding results were compared to the baselines obtained for CONTROL and DIFF conditions. The second pipeline for DIFF data uses iCSP with robust PCA. Again, the frequency band and time interval of interest were the same as in the baseline pipeline. Here the NMES dataset (considered as neuronal noise in this scenario) was used to improve the robustness of CSP filters against NMES disturbances. The dimensions of the noise matrix were reduced using the robust PCA algorithm. This step improves the accuracy of the procedure, as shown in (Pascual et al 2011). We termed these procedures RPCA pipeline. A graphical summary of the methods included in this pipeline is visible in figure 4.

Pipeline for SAME condition
Although SSD was developed to reduce dimensions of data containing oscillatory information, it can also be applied to filter out oscillatory information from data by discarding oscillatory sources of noise in specific bands. In order to achieve this, SSD was computed in the NMES (noise) data at the band where . RSSD pipeline for SAME condition: SSD is applied to the NMES data, where the sources with highest SNR are removed. Then, the training data is filtered in the remaining subspace. After that, SSD is applied and the components with the strongest SNR retained as in DIFF (percentiles 70-100). Then, CSP analysis is performed to select CSP derivations on the band and interval of interest. Next, a LDA classifier is trained. Subsequently, all parameters (band, time interval, SSD derivations from NMES and MI, CSP derivations and LDA classifier) are used in the test set (SAME dataset), to obtain the final accuracy.
the stimulation occurred (19-21 Hz) by previously applying Butterworth filters of order 4. The raw noise signal was filtered in the SSD space and the SNR of each SSD component estimated. In this case, we only discarded the components of very high SNR (noise) between the 100th and the 99th percentiles. The next step was to project the MI data into this 'NMES-clean' subspace. Here we also computed an additional SSD to remove dimensions as in the SSD pipeline. Finally, CSP analysis was applied before training a LDA classifier. The parameters found in this pipeline (band, time interval, SSD derivations from NMES, SSD derivations from MI, CSP derivations and LDA classifier) were used to extract features and classify SAME data. This collection of procedures, which we called RSSD pipeline, are depicted in figure 5.

Signal to noise ratio of motor imagery
In order to understand performance drops due to movement stimulation, we evaluated the amount of ERD/ERS observed for each subject (Gastaut andBert 1954, Pfurtscheller andda Silva 1999). The degree of ERD/ERS can be seen as an estimate of the SNR of modulated oscillatory signals, as those originated during movement imagination. The strength of the modulation allows the classification of different MI tasks, hence, it also relates to the ability of the classifier to distinguish between different classes. Nevertheless, there is a fundamental difference when ERD/ERS is computed between MI and rest or between two MI classes. In the first case, the amount of ERD/ERS might directly quantify the SNR of the signal, as ERD/ERS is, by definition, normalized with the power level in the resting period. However, for two MI classes the SNR of the signal is related to power variations between two MI tasks at different scalp locations. Spatially filtered EEG signals of one class will exhibit ERD and simultaneously (at the same time interval of the trial), the EEG signals from the other class will show ERS (or at least a less pronounced ERD). The difference between these ERD/ERS effects can then be interpreted as SNR, where the larger the difference, the easier it is to discriminate between MI tasks. When the CSP filtered signal exhibits the same level of ERD or ERS for both classes, class differences are not observed and the classification problem might become difficult. The quantification of ERD/ERS of CONTROL, DIFF and SAME conditions was performed as follows: the EEG data were temporally and spatially filtered using the subject-specific frequency band and the CSP derivations obtained in the training data (MI condition). Then, the Hilbert Transform (Clochon et al 1996) was applied in order to obtain the amplitude envelope of the subject-specific oscillations. Processed EEG activity was then averaged across epochs separately for each class. The ERD was calculated for each CSP derivation according to: ERD(%) = 100 × (POST − PRE)/PRE, where POST is the average amplitude in the subject-specific poststimulus interval and PRE is the average activity in the pre-stimulus interval (−500 to 0 ms). The averaged difference between classes was computed across CSP channels. The CSP channel with the highest r 2 -value (point biserial correlation coefficient) was selected. The r 2 -value is a correlation coefficient between a real variable (here, the ERD/ERS) and a dichotomous one containing class information.
Finally, the reported SNR values in terms ERD/ERS were obtained applying the absolute operator to the aforementioned ERD/ERS averaged differences.

Estimation of similarity between NMES and MI patterns
Neuromuscular electrical stimulation generates afferent patterns in the EEG that during the performance of MI occur simultaneously to MI related signatures.
As previously stated, during DIFF the stimulation might spread ERD over the scalp, which in turn might hamper the classification. On the other hand, during SAME the stimulation is always active, but it only coincides 50% of the time with the MI task. Here the classifier might confuse the afferent pattern with a pattern related to the MI task. In both settings, the classification performance might degrade more when the stimulation patterns and the MI patterns corresponding to one of the two classes are very similar.
Thus, quantifying the similarity between MI and stimulation brain patterns might help understanding performance drops observed in this study. Patterns, as obtained by CSP analysis, are simply vectors. A common procedure to estimate the similarity between two vectors is to compute the angle of separation between them using the formula of the inner product: where a MI are the CSP patterns extracted from the training data and a NMES are patterns extracted with CSP during the NMES of each of the arms. The final estimated similarity is the average angle between all six CSP patterns of the MI condition (training) and one selected NMES pattern for DIFF and SAME stimulation. In this way, one angle per subject and condition (DIFF or SAME) is obtained. Then, the correlation of the resulting angles to the DIFF and SAME baseline performances was carried out. The Spearman correlation, robust to outliers, was employed. A positive correlation between the angles and the accuracy means that the higher the difference between patterns, the higher the performance achieved.

Estimation of residual sensorimotor rhythm during the stimulation
The oscillatory activity originated in the sensorimotor cortex, referred to as sensorimotor rhythms (SMR), can be quantified by modelling the EEG power peaks and their corresponding noise baselines. This procedure was introduced in (Blankertz et al 2010a) where it was employed to predict MI accuracy from an SMR power estimate in the resting-state EEG. The method in (Blankertz et al 2010a) models the power spectral density (PSD) at a specific frequency range and also its corresponding noise baseline. In particular, it fits one curve for the noise and another one for the power peaks of the spectrum. The curve fitting is based on the minimization of the L2-norm of the difference vector between the spectral PSD and the modelled parametric curves. The SMR estimate is the maximal difference between the maximum peak and the noise at the specific frequency bin. In this paper, we used the EEG collected during stimulation (NMES dataset) to estimate the level of SMR during induced movements. Movement provokes a power decrease of the oscillatory activity (ERD), thus the remaining estimated power was called in this study 'residual SMR' . We obtained one residual SMR value for each of the participants of the study and correlated it to the accuracy obtained by each user during the SAME condition by means of the Spearman correlation coefficient. The Spearman correlation is a nonparametric method, robust to outliers and to deviations of the normality assumptions.

Results
The classifiers trained on CSP features from different conditions might be affected by a bias shift. This suboptimal bias might cause poor classification performance of potentially discriminable features (Vidaurre et al 2013. One way to avoid this is to use a bias-independent classification measure such as the area over the ROC curve (AOC). Thus, we opted for this procedure to obtain final classification errors.

Signal to noise ratio of CONTROL, DIFF and SAME features
In this manuscript, ERD/ERS is estimated to quantify the signal to noise ratio of the features used to classify BCI commands . The features were extracted with the baseline pipeline, where as described in section 2.6, band, time interval and CSP filters were obtained with training MI data. These parameters were subsequently applied to testing data from CONTROL, DIFF and SAME conditions. An example of ERD/ERS curves computed on test data of different conditions is depicted in the bottom right panel of figure 6. The plot shows that the ERD/ERS difference between classes is greater for CONTROL data than for DIFF and SAME. In fact, in the case of SAME there is no observable difference. These results are related to the power distribution over the scalp displayed on the left panel of figure 6. For CONTROL (top row), the signed-r 2 values show discriminative areas over the left hemisphere. On the other hand, DIFF and SAME conditions display r 2 -values close to zero, meaning that discriminability between MI tasks is drastically reduced. Also, for DIFF and SAME the ERD is more widespread over the scalp than for CONTROL. Comparing the first with the second and third rows of this figure, the ERD contralateral to the stimulation (which can be seen without the influence of MI tasks in the lower row of the picture) is present during both MI classes and it spreads over the whole scalp. This effect appears stronger for SAME than for DIFF.
In the top right panel of figure 6, the ERD/ERS values of each class and user are averaged across time in the interval of interest. Then, the absolute difference between CSP channels for one task is computed and the mean across them calculated, obtaining one value per subject. A high ERD/ERS difference means and SAME (third row) conditions. Each title indicates the MI task. The fourth row corresponds to NMES data, where the first column shows DIFF stimulation data, i.e. on the limb not related to the task (left hand). On the fourth row second column, SAME stimulation data is visible (right hand). The third column displays signed r 2 -values between the data displayed in the first and second column. Top right: scatter plot of ERD/ERS absolute values per subject for each condition (CONTROL, DIFF and SAME) in the baseline pipeline. All values below the diagonal mean that ERD/ERS of CONTROL are higher than those of DIFF or SAME. The higher the value, the more discriminative information is contained in the data. Bottom right: an example of ERD/ERS curves over time for conditions CONTROL, DIFF and SAME with the baseline pipeline. In multi-class settings, the discriminability of ERD/ERS features is achieved when a spatially filtered component of one class reflects ERD and simultaneously the same spatially filtered component of the other class exhibits ERS (or less pronounced ERD). The difference between both conditions helps the classification. When no difference is observed, the classification of CSP-based features turns into a very difficult problem. This is visible for the signals displayed for SAME cl. 1 and cl. 2, where no class difference is noticeable regarding ERD/ERS effects. A bit greater difference between classes is seen for DIFF, and the best situation in terms of discriminability is depicted for CONTROL. high SNR, thus the signals are easier to classify. All points below the diagonal show that DIFF or SAME data have lower SNR than the CONTROL condition. The means and standard error values for ERD/ERS related to CONTROL are 52.21% ± 9.45%, those related to DIFF are 30.55% ± 9.45%, and finally, those of SAME are 13.53% ± 4.03%. We hypothesized that baseline features of CONTROL condition would exhibit better SNR than DIFF and SAME baseline features. Thus, we employed a one-tailed Wilcoxon test for paired samples and corrected the p-values with the Holm-Bonferroni approach (Holm 1979). The SNR of CONTROL was significantly higher than the SNR of DIFF (p-value = 0.007) and also significantly higher than the SNR of SAME (p-value = 0.002) baseline features. Additionally, the difference between the SNR of DIFF and SAME baseline features was near significant (p-value = 0.065).

Results for DIFF condition
Classification errors of individual subjects are presented in the left panel of figure 7. The baseline error for DIFF condition (x-axis) is plotted against SSD and RPCA errors. The CONTROL baseline error is also plotted in the y-axis. The average and standard error of DIFF baseline was 22.41 ± 4.08%, SSD obtained 19.00 ± 3.67% and RPCA was 17.77 ± 3.77%. Finally, the CONTROL baseline error was 14.25 ± 4.08%.
We tested two hypotheses: first, that the baseline pipeline was suboptimal for DIFF condition, and thus CONTROL performance would be significantly better. Second, that SSD and RPCA pipelines would be significantly better than the baseline pipeline for DIFF. To test both hypotheses we performed non-parametric one-tailed Wilcoxon tests. We corrected the α-level for multicomparison using the Holm-Bonferroni approach, (Holm 1979).
The comparison between the baselines of DIFF and CONTROL was significant, with a p-value of 0.010. The difference between SSD and DIFF baseline error was also significant (p-value = 0.018). The same as between DIFF baseline and RPCA errors (p-value = 0.010).
Regarding SNR values, we also studied whether SSD and RPCA pipelines could increase ERD/ERS differences to those of the CONTROL condition (see figure 7, right panel). In this case the hypothesis was that no significant differences might be found, thus a two-tailed non-parametric Wilcoxon test with Holm-Bonferroni correction was employed. Figure 7. Left: Scatter plot comparing the error rates (AOC) obtained classifying DIFF data with baseline pipeline (x-axis) versus SSD and RPCA pipelines (y-axis). Classification results of CONTROL data are also displayed (y-axis). Right: Scatter plot comparing SNR values of DIFF condition with baseline pipeline (x-axis) and DIFF with SSD and RPCA pipelines (y-axis). The higher the value, the more discriminative information is contained in the data.

Figure 8.
Left: bar plot of AOC for CONTROL and SAME baselines. Also, RSSD error is depicted. On the left side of the picture, all subjects for whom AOC in CONTROL was smaller than 10% are shown. Right: scatter plot comparing SNR values of SAME baseline (x-axis) and RSSD pipeline (y-axis). The higher the value, the more discriminative information is contained in the data.
And in fact, no significant differences between the SNR of CONTROL baseline and the SNR of SSD (p-value = 0.5) and RPCA (p-value = 0.14) pipelines were found. Finally, the mean SNR achieved with the SSD pipeline was 50.75% ± 7.08%. The mean SNR for the RPCA pipeline was 41.72% ± 7.47%.

Results for SAME condition
Our goal in this condition was to filter out the spatial noise from the stimulation using SSD filters (RSSD pipeline). In this setting, during half of the trials afferent and efferent patterns were similar (Vidaurre et al 2013) because stimulation and MI task coincided (as in a facilitator paradigm). However, during the other half of the time, the classifier needed to identify a MI task that was performed simultaneously to the stimulation in another limb. This stimulation caused afferent patterns that were very similar to the other non-active MI class.
The detailed AOC results for the CONTROL and SAME baselines and also for RSSD are depicted in figure 8. There, one can observe that for very good performers (under 10% of error) in CONTROL, the RSSD pipeline can only slightly improve averaged results (for two participants it stays at the same level, for one user it worsens and for two an improvement is observed). However, for the rest of participants a clear improvement is visible. Regarding averaged values, the SAME baseline error was 29.13% ± 5.16% and after applying the RSSD pipeline, AOC was 23.91% ± 3.79%. As we hypothesized that the RSSD pipeline would significantly improve the baseline results, we employed a one-tailed Wilcoxon signed rank test (p-value = 0.024). Nevertheless, the difference between RSSD and CONTROL baseline error was still significant (p-value = 0.006).
From these results, it seems that RSSD can provide a significant advantage specially if accuracy rates in Figure 9. Left: correlation line of similarity (CSP patterns of MI and NMES) and SAME baseline accuracy. Right: correlation line of residual SMR (x-axis) versus baseline performance in SAME (y-axis). The line was plotted as described in section 2.9. The two values on the x-axis correspond to SMR residual rhythm at 80% and 90% of accuracy. CONTROL (that is, MI performance without disturbances) are lower than 90%. Regarding the SNR values, the features of the RSSD pipeline obtained an average of 21.25% ± 4.08%. Again, we conjectured that RSSD would significantly improve the SNR of the SAME baseline features, hence we performed a onetailed Wilcoxon sign rank test that allowed us to confirm our hypothesis (p-value = 0.014). All SNR results are depicted in the right panel of figure 8.

Similarity between MI and NMES and its relation to performance decrease during stimulation
As stated in sections 2.8 and 3.3, the similarity between MI and NMES patterns might hinder the classification due to the simultaneous occurrence of afferent and efferent patterns corresponding to different classes. Hence, this similarity might be a reason for the observed performance drops during MI simultaneous to NMES induced movements.
To find this out, we tested whether a significant correlation existed between the accuracy obtained with the baseline pipeline in conditions DIFF and SAME and the pattern similarity of MI and NMES. The result was significant only for SAME condition, with a correlation value of 0.75 and a p-value of 0.009. The corresponding correlation plot is visible on the left panel of figure 9.

Correlation of SAME performance and residual SMR
Finally, we studied whether the amount of residual SMR during stimulation was positively correlated to the baseline accuracy achieved for SAME. The Spearman correlation coefficient was employed to find out whether this relation was significant. This procedure is robust to outliers and based on the correlation of ranked data. We obtained a significantly positive correlation of 0.71 with a p-value of 0.013.
The right panel of figure 9 depicts the scatter plot of the residual SMR values during stimulation of the arm (NMES condition) against the performance obtained by the participants in the SAME condition. The line corresponds to the correlation found with the Spearman coefficient. This line is not equivalent to the one obtained using the Pearson coefficient. It was found using an alternative procedure that consists on finding the β parameter in y − βx for which the absolute correlation with x is minimum (i.e. zero or close to it). The β minimizing the correlation is the slope of the depicted line in the right panel of figure 9. The corresponding intercept was computed as the median of the residuals. The two points marked in the x-axis of the figure correspond to the SMR residual values necessary to obtain 80% and 90% of accuracy, respectively.

Discussion
The present study aims at improving the reliability of BCI systems by turning them more robust against motor disturbances. Reliability is a key factor in the design of human-computer interaction systems and, in relation to BCIs, it is included into a growing body of literature that investigates the use of these systems in the real world (Millán et al 2010, Allison et al 2012. The challenge of bringing BCI out-of-the-lab includes studying how non-controlled conditions affect BCI performance, as we did in this manuscript. A similar example includes Friedrich et al (2011), who found that auditory distractions do not significantly affect BCI performance of mental tasks (including MI). Also, Brandl et al (2016) showed that MI classification is affected by secondary tasks such as hearing news or closing the eyes. A further example is the work of Iscan and Nikulin (2018), in which the effect of distractions on the performance of a SSVEP-based BCI was investigated. In relation to increasing the reliability of BCI systems, other notable examples are related to user-centered approaches designed to improve the feedback experience of BCI users (Friedrich et al 2013, Kübler et al 2014, Lorenz et al 2014, Liberati et al 2015. To the best of our knowledge, our work goes one step further by focusing on motor disturbances, analysing their influence in the performance of BCI systems and providing new methodology to improve the reliability of BCIs for rehabilitation purposes. The results relate to a BCI experiment in which brain oscillatory activity was negatively affected by afferent inputs produced by NMES. These perturbations can occur, for example, during the use of neuroprostheses. However, there are other situations where similar artifacts could be expected. For example, DIFF condition would be similar to that produced by compensatory movements of the contralateral side during movement attempt in e.g. stroke patients, (Cai et al 2019). On the other hand, SAME condition could happen due to incorrect triggering of a movement device in one limb during rehabilitation. More specifically, a possible real case application could be during movement attempt decoding while using a rehabilitation BCI concept. There, brain signals control a body actuator like NMES or a robotic exoskeleton. Involuntary muscle contractions on the non-paretic side during paretic side movement attempts, or false positives in the decoding activating the body actuator, may negatively affect the decoding of movement intention. In such an application (rehabilitation BCI), the decoding or link (i.e. the contingency) between brain oscillatory activity and the movement of the paretic limb is key to induce functional neuroplasticity. Therefore, a system like the one developed and described in the present work, although still not tested with patients, has the potential to partially improve the contingency of BCI systems in rehabilitation contexts.
The results of this manuscript show that both DIFF and SAME conditions exhibit more widespread ERD over the contralateral cortex to the stimulated limb. Furthermore, as seen in figure 6 the ERD/ERS difference between tasks is also diminished, with almost no discriminative information contained in the signed r 2 -values. Along with it, a significant performance drop is also observed in comparison to CONTROL baseline.
Our results show that the ERD/ERS spatial distribution changes with stimulation. Hence, we hypothesized that rejecting or robustifying spatial information related to noise would improve performance. In fact, the idea of reducing noise by rejecting spatial information is a usual procedure to remove artifacts (e.g. the application of independent component analysis (ICA) to EEG data, Meinecke et al 2002, Vorobyov and Cichocki 2002, Winkler et al 2011. However, despite the afferent information in our framework is due to the stimulation, it has a similar oscillatory nature as the signals related to MI tasks. Thus, the preferred procedure to reject afferent patterns was SSD, which is a source separation algorithm that exploits the oscillatory nature of the EEG.
As stated before, the stimulation causes ERD that is not only contralateral to the stimulated limb, but also spreads over other areas of the scalp. In the case of DIFF this means that the ERD elicited by MI tasks is ipsilateral and central to the ERD from the stimulation. Our results show that selecting those directions where MI tasks occur is sufficient to filter out afferent oscillatory noise. Therefore, this also indicates that the NMES and MI pattern similarity is not related to the performance drop observed in DIFF.
For SAME, the problem of removing afferent noise is more complex. During 50% of the trials, the stimulation of the hand is simultaneous to the MI of the feet. The hand stimulation is similar to the corresponding MI hand pattern. Hence, from the classifier's point of view, both classes of interest appear to be active simultaneously, complicating the classification problem in comparison to the usual setting.
In this case, retaining a few MI directions as in the SSD pipeline would not remove stimulation noise in SAME. This is because afferent and efferent patterns coincide for the hand MI task. On the other hand, excessively eliminating SSD directions in the MI band, would also remove ERD containing class discriminative information of the hand class. Our idea was thus to directly reject noise by removing SSD directions that appear at the frequency band of the stimulation. In fact, the authors of (Brickwedde et al 2020) showed that peripheral somatosensory stimulation at 20 Hz produces not only steady-state evoked responses at 20 Hz, but it also affects neuronal activity at α/µ frequency range (i.e. one of the bands where ERD occurs in MI). Moreover, such changes were not constant but varied dynamically during the experiment. Therefore, we reasoned that neuronal structures involved in the processing of NMES in our study would be affected not only through changes in β but also in the µ frequency range. Therefore, although SSD was trained in the range of NMES frequency, rejecting the corresponding SSD spatial filter in a broad frequency range would also remove neuronal changes at structures responding to NMES in other frequency ranges. This idea was applied in the RSSD pipeline, which improved the classification accuracy in condition SAME. Our results showed that it was particularly effective for those participants who are more vulnerable to performance drops.
Finally, given that the baseline accuracy was higher for CONTROL than for SAME, we studied how much residual SMR power during stimulation should be present to ensure acceptable performance.
First, we found that the correlation between this residual SMR power and the baseline performance of SAME was significant. Using this relationship, the right panel of figure 9 shows that an approximate residual SMR between 4 and 5 dB is necessary to respectively achieve 80% or 90% of accuracy when disturbances similar to the SAME condition are expected.

Conclusions
In this paper we showed how stimulated movements might cause performance drops in MI-based paradigms. We also presented three methods to counteract their harmful effect. Disturbances of DIFF condition could be mitigated by applying two different pipelines, one based on robustifying filters (RPCA) and the other on selecting informative directions (SSD). The SSD is a simple technique that increases SNR and helps selecting SMR sources at specific frequency bands. Interestingly, and unlike RPCA, this method does not involve prior knowledge of the noise characteristics, reducing the need of extra recordings. We could also achieve a significant improvement for the SAME condition with an accuracy that was still lower than for CONTROL. Importantly, we studied the level of residual SMR during stimulation and provided a range of easy-to-measure values that are significantly related to future performance under movement disturbances. These values can be obtained during the stimulation without MI and serve as predictors for the successful application of a BCI paradigm under these complicated experimental conditions. Future research will study nonlinear classification such as neural networks as well as explanation techniques (Sturm et al 2016) for assessing potential temporal variability of the disturbance during stimulation and during a training process in rehabilitation. Finally, the number of participants of the study might be a limitation despite the promising results obtained. Thus, future experiments with a greater number of participants should also be considered.

Data availability statement
The data generated and/or analysed during the current study are not publicly available for legal/ ethical reasons but are available from the corresponding author on reasonable request.