Hybrid brain-computer interface with motor imagery and error-related brain activity

Objective. Brain-computer interface (BCI) systems read and interpret brain activity directly from the brain. They can provide a means of communication or locomotion for patients suffering from neurodegenerative diseases or stroke. However, non-stationarity of brain activity limits the reliable transfer of the algorithms that were trained during a calibration session to real-time BCI control. One source of non-stationarity is the user’s brain response to the BCI output (feedback), for instance, whether the BCI feedback is perceived as an error by the user or not. By taking such sources of non-stationarity into account, the reliability of the BCI can be improved. Approach. In this work, we demonstrate a real-time implementation of a hybrid motor imagery BCI combining the information from the motor imagery signal and the error-related brain activity simultaneously so as to gain benefit from both sources. Main results. We show significantly improved performance in real-time BCI control across 12 participants, compared to a conventional motor imagery BCI. The significant improvement is in terms of classification accuracy, target hit rate, subjective perception of control and information-transfer rate. Moreover, our offline analyses of the recorded EEG data show that the error-related brain activity provides a more reliable source of information than the motor imagery signal. Significance. This work shows, for the first time, that the error-related brain activity classifier compared to the motor imagery classifier is more consistent when trained on calibration data and tested during online control. This likely explains why the proposed hybrid BCI allows for a more reliable means of communication or rehabilitation for patients in need.


Introduction
Since the introduction of a brain-computer interface (BCI) by Vidal [1], there have been many implementations as potential communication or rehabilitation interventions for patients (e.g. [2][3][4][5][6]). In BCI systems, brain activity is read directly from the brain and a control command is generated based on the user's intentions while bypassing the common neuromuscular pathways. Electroencephalography (EEG) is one technique to read brain activity non-invasively and is thus widely used in BCI applications. Motor imagery (MI) BCI is a popular category of BCI systems that relies on the user-initiated movement imagination of different body parts, which results in distinguishable patterns of brain activity [3,7]. By detecting and classifying these patterns in real time, imagining movement of different body parts can then be mapped to, for instance, a cursor moving on a screen or a switch controlling a wheelchair.
The MI signal is user-generated; therefore, the user's ability to generate distinguishable motor imagery signals plays an essential role in reliable BCI control. Since the generated patterns are relatively subtle to detect, pattern recognition methods are necessary to train relevant classifiers [8][9][10]. BCIs usually use supervised machine learning in which the training data come from a calibration session [11]. Due to the inherent non-stationarity of brain activity, re-calibration is often necessary after some time. Without such re-calibration, the BCI system suffers a decrease in accuracy over time, limiting the transfer of a classifier trained during calibration to subsequent real-time control [12].
To improve reliability in BCI control while avoiding time-consuming re-calibration sessions, one approach is to use other available sources of information in order to support, adjust, or correct the information from the primary detected signal. One potential such source is the brain activity that occurs in response to the BCI output (feedback). User brain activity was shown to be different when observing a successful execution of an intended task by the BCI versus an unsuccessful execution [13][14][15][16][17][18][19][20]. This so-called 'error-related brain activity' is classifiable and can be used to alleviate the reliability limitation [17][21][22][23]. Of course, during online use of a BCI system, classifiers focusing on these secondary sources of information could potentially suffer the same degradation over time. Aside from the issue of how to optimally combine multiple sources of information, another important question is how the accuracy changes when a classifier trained on the calibration data is used in online control for both the primary (e.g. motor imagery) and the secondary (e.g. error-related brain activity) signals.
There are multiple approaches towards using error-related brain activity to improve BCI performance. One approach is to discard a BCI output and repeat the trial or to execute an action in the reverse direction upon detection of an error [14,24,25]. A second approach, based on error-driven learning, attempts to limit the occurrence of a future error by updating the classifiers upon the detection of error-related brain activity or to discard the unsupervised adaptation temporarily if an error is detected [26,27]. A third approach based on error integration proposed a hybrid BCI for a 1-D cursor control by combining the motor imagery signal with the user brain activity in response to the cursor's changes in the direction of movement [22,23].
Recent studies in invasive BCIs also show evidence for the possible detection of error-related brain activity and propose to use it to improve the BCI performance. For instance, the authors in [28] showed that error signals can be detected from human electrocorticography (ECoG) in a continuous task comprising a video game. In other work, the authors in [29] showed the detection of error-related brain activity in a motor imagery task in human ECoG. Moreover, the authors in [30] proposed to augment an intracortical BCI with error detection. They showed in an experiment with non-human primates that a classifiable error signal can be detected from electrodes located in the premotor and primary motor cortices and proposed a system to automatically undo or prevent mistakes.
In earlier work [17,31], we showed that the motor imagery signal and feedback/error related brain activity occur in overlapping frequency bands and may therefore interfere with each other if not handled appropriately. We also proposed to combine these two signals through a more sophisticated error integration approach that simultaneously combined a right/left hand motor imagery classifier with a classifier detecting whether the user perceived the last BCI output as an error or not. The proposed hybrid BCI system translated the classification score from the domain of the error-related brain activity to that of the motor imagery classifier and learned a logistic regression classifier to best combine the two sources of information for each user. This allows for a system that relies more/less on either the motor imagery or error-related brain activity signals depending on how classifiable each source of information is for a specific user. We showed the efficacy of such a system in an offline BCI with sham feedback across 10 participants [17].
Since real-time feedback can itself modify brain activity, it is vital to evaluate the efficacy of our earlier proposed hybrid BCI in real-time control. This is the goal of the current study where we answer two questions: 1) how the performance of the proposed hybrid BCI in real-time control compares to a conventional motor imagery BCI, and 2) how the underlying two aspects of the proposed hybrid BCI (i.e. error-related brain activity and motor imagery signal) compare with each other in their contributions to robust and reliable control.

Methods
The study was approved by the University Institutional Review Board at UC San Diego and all participants signed an informed consent form prior to their participation. Data were recorded from 12 participants who were naive to BCI experiments (7 females, 1 left-handed, average age = 20.4 ± 1.0). Each participant took part in one session of roughly 2.5 hours in length while they comfortably sat in front of a screen (Dell 19" CRT monitor), centered with respect to the screen and about one meter away. EEG data were recorded using a 64-channel BrainAmp system (Brain Products GmbH). The active electrodes were located according to the international 10-20 system [32]. Electrode impedances were kept below 20 kΩ. Electromyography (EMG) data were also recorded with bipolar electrodes using BrainAmp ExG from the wrists and upper forearms; however, the recorded EMG data were not used for the analysis in this paper. Data were recorded at a sampling rate of 5000 Hz with online reference and ground electrodes located at FCz and AFz, respectively. To ensure accurate recording and inference of the brain activity, a small photo sensor was placed at the bottom right corner of the screen and was connected to the ExG box. A white circle (with the same diameter as the photo sensor) was turned on and off at the same time that the cursor moved on the screen. This allowed accurate recording of the stimulus presentation times and removal of any potential jitter in the measurement system.
Python was used to design and present the stimuli as well as for the real-time processing of the EEG signal. Simulation and Neuroscience Application Platform (SNAP) [33] was used for stimuli presentation, Lab Streaming Layer (LSL) [34] to interface the EEG system with the computer, and NumPy [35], SciPy [36] and scikit-learn [37] for data processing. Python was used for the offline analysis of the data as well. MATLAB [38] and EEGLAB [39] were used for offline epoching and plotting of the results. Temporal filters were implemented through the filtfilt function from SciPy [36], which applies a filter twice, once forward and once backward, to ensure zero-phase filtering. Unless otherwise indicated, all 64 EEG channels were used for feature extraction and classification.
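As an illustration of the zero-phase filtering described above, the following is a minimal sketch rather than the code used in the study. The function name `bandpass_zero_phase`, the synthetic test signal and the mapping of "6th order" onto SciPy's `order` argument are assumptions made for this example.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def bandpass_zero_phase(data, low_hz, high_hz, fs, order=6):
    """Band-pass `data` along its last (time) axis with zero phase shift."""
    nyquist = fs / 2.0
    # For band-pass designs SciPy returns a filter of order 2 * `order`;
    # whether "6th order" means order=6 or order=3 here is an assumption.
    b, a = butter(order, [low_hz / nyquist, high_hz / nyquist], btype="bandpass")
    # filtfilt runs the filter forward and then backward, cancelling the phase delay.
    return filtfilt(b, a, data, axis=-1)


fs = 100.0                                # Hz, the rate used after downsampling
t = np.arange(0, 10, 1 / fs)
in_band = np.sin(2 * np.pi * 12 * t)      # 12 Hz, inside the 7-30 Hz band
out_of_band = np.sin(2 * np.pi * 2 * t)   # 2 Hz, below the band
filtered = bandpass_zero_phase(in_band + out_of_band, 7.0, 30.0, fs)
```

Away from the edges of the signal, `filtered` should follow the 12 Hz component without a time lag, which is the property that matters when epochs are time-locked to cursor movements.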

Experiment
Each participant took part in one experiment session that was comprised of two phases. The first phase was primarily designed for the participants to gain experience with the motor imagery of their right/left hand. Data from the first phase were not analyzed in this work. Phase 2, which is the focus of this work, involved using motor imagery to control a cursor on the screen. This phase had two parts: calibration and online control. Details of the stimuli presentation are described next.

Phase 1
Each trial began by showing a right/left arrow representing the side of imagery. Next, participants fixated on a fixation cross for 1 second. Then the text 'imagery' appeared above the fixation cross instructing participants to begin motor imagery of the corresponding hand. The movement imagination time was set to 3 seconds and afterwards participants were provided with feedback in the form of two bars whose height represented the power in the 7-30 Hz frequency band averaged on the electrodes over the right (EEG channels FC4, C4, CP4) and left (EEG channels FC3, C3, CP3) motor cortices, separately. Since motor imagery of the right/left hand results in contralateral event-related desynchronization (i.e. decreased power over right/left motor cortex for left/right motor imagery), participants were instructed to maximize the height of the bar on the side of imagery and minimize the height of the one on the other side [40]. If the bar height was larger than a set threshold, the scale was adjusted for easy observation and interpretation of the bar heights. An example of a trial is depicted in figure 1(a) and an explanation of the provided feedback is shown in figure 1.
Phase 1 had a total of 30 trials (15 right hand and 15 left hand motor imagery) divided into three blocks, with a 5-second break between consecutive trials. The order of the trials was randomized once and kept the same across participants. Participants were given instructions and suggestions on what to imagine for their hand movements; however, they had a chance to explore different imagined movements and decide on what worked best for them. After each block, participants could take as much rest as needed. Upon completion of this phase, participants filled in a short questionnaire in which they answered what movement they imagined for their right and left hands. They were instructed not to change their selected movement imagination throughout the rest of the experiment.

Phase 2
In phase 2, participants were instructed to use their selected right/left hand movement imagination to move a cursor to the right/left towards a target. At the beginning of every trial, the cursor (blue circle, 2 cm in diameter) appeared at its starting position, i.e. the center of the screen. The target (white circle, 2 cm in diameter) also appeared at either side of the screen, exactly three horizontal steps away from the cursor. After 1.2 seconds, the target disappeared and the cursor moved at the rate of one movement or 'step' per 1.2 seconds to either the left or the right. The trial ended when the cursor hit the target location or the corresponding location on the other side or after 12 cursor steps, whichever came first. Therefore, trial duration varied across trials, depending on the cursor movements, between approximately 4 and 15 seconds. Participants had 5 seconds to rest before the next trial began. The maximum of 12 steps per trial was chosen to ensure that a trial lasted at most about 15 seconds. An example of a trial is depicted in figure 2. In this example, the target appears at the right side of the screen indicating a right hand motor imagery trial where the participant should imagine movement of the right hand throughout the trial. Since the goal in each trial is for the cursor to hit the target, a movement towards the target is called a 'good' movement and a movement away from the target is called a 'bad' movement. Therefore, in this example, the first cursor movement towards the right is a good movement and the second is a bad movement, as the cursor moves away from the target. The third movement is a good movement and so on. This trial is a success (or hit) as the cursor reached the target location.
Phase 2 had a total of nine blocks and each block was comprised of 20 trials. In each trial, the target appeared randomly at the right or the left side of the screen while maintaining that each block had 10 right and 10 left trials. In the first three blocks, participants received sham feedback while they were led to believe that they were in control of the cursor movements.  This was to have the necessary labels to train the classifiers. In these first three blocks, the cursor movements were predefined and randomly generated with the following criteria: the cursor had a fixed probability of going towards the target (p = 0.60) following a Bernoulli distribution until the target or the corresponding location on the other side was reached, or a maximum of 12 steps occurred; however, any generated trials with more than three consecutive changes in direction were not used. The cursor movements for the first three blocks were generated ahead of time and kept the same across participants. The hit rates were 0.75, 0.6 and 0.75 in blocks 1, 2 and 3, respectively. The recorded EEG data in the first three blocks were used to calibrate the classifiers as will be explained in detail later.
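The sham-feedback generation rule can be sketched as follows. This is an illustrative reconstruction, not the script used in the study: the names `generate_sham_trial` and `longest_flip_run` are hypothetical, and "more than three consecutive changes in direction" is interpreted here as more than three direction reversals in a row, which is an assumption.

```python
import numpy as np


def longest_flip_run(steps):
    """Length of the longest run of consecutive direction reversals."""
    longest = run = 0
    for previous, current in zip(steps, steps[1:]):
        run = run + 1 if current != previous else 0
        longest = max(longest, run)
    return longest


def generate_sham_trial(rng, p_good=0.60, distance=3, max_steps=12, max_flips=3):
    """Draw one sham trial; +1 is a step towards the target, -1 a step away."""
    while True:
        steps, position = [], 0
        while abs(position) < distance and len(steps) < max_steps:
            step = 1 if rng.random() < p_good else -1   # Bernoulli(p_good) draw
            steps.append(step)
            position += step
        # Reject trials with too many reversals in a row and draw again.
        if longest_flip_run(steps) <= max_flips:
            return steps, position == distance


rng = np.random.default_rng(0)
sham_trials = [generate_sham_trial(rng) for _ in range(200)]
```

Each returned tuple holds the step sequence and a flag indicating whether the trial ended as a hit.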
In the latter six blocks of phase 2, participants received real online feedback in which three blocks used the conventional right/left hand motor imagery (R/L) control and the remaining three blocks used our proposed BCI control that combined the right/left hand motor imagery with the error-related brain activity signal. The latter classifier detects whether the user perceived the last cursor movement as 'good' or 'bad', i.e. going towards or away from the target, respectively. This is called a good/bad classifier (G/B) and the proposed control is therefore called R/L+G/B control. The blocks with R/L and R/L+G/B control were alternated and the order was counterbalanced across participants. After each block in this phase, participants could take as much rest as needed.
Participants were not aware of the sham feedback in the first three blocks and the different controls in the online blocks. After each block in phase 2 (including the calibration and online control blocks), participants answered the following question: from 1 to 10 where 1 represents the least and 10 the most amount of control, how much in control of the cursor did you feel?
Figure 2. An example of a trial in phase 2. Each trial began with the cursor (blue circle) at the center and the target (white circle) at either side of the screen, exactly three steps from the cursor. Trial ended when the cursor hit the target location or the corresponding location on the other side or after 12 cursor steps, whichever came first. Participants had 5 seconds to rest before the next trial began. Note that the background of the frames was dark gray in the experiment but is depicted lighter here for easier visualization of the details.
Participants also filled in questionnaires aimed at quantifying their handedness and various aspects of their personality [41][42][43][44]. However, the data were not used for the analysis of this work.

Calibration and online control
We performed extensive analysis on our previously recorded data [17], comprising data from 10 separate participants, each of whom completed ten blocks of 20 trials of an experiment with sham feedback (similar to the calibration phase in this work). We compared the performance of different classifiers when applied to three blocks of their data, i.e. the planned duration of the calibration phase, to decide the specifications of the feature extraction and classification methodologies such as the type of classifier, number of spatial filters, specifications of the temporal filters, etc., as explained next. Note that the data from our previous work were not directly used for calibration in this work.

Calibration
The recorded EEG data from the first three blocks of phase 2, called the 'calibration data' during which the cursor moved according to a pre-determined set of movements, were used to train the classifiers to be applied later for online control.
To train classifiers, the recorded calibration data were downsampled to 100 Hz, re-referenced to common average, and epoched 0-1 seconds time-locked to each step (cursor movement) excluding the last cursor movement. The last cursor movement was excluded as it indicated the end of a trial and no motor imagery signal was generated by the participants when a trial ended. In the trial example depicted in figure 2, epoching the EEG data resulted in 4 epochs. Each epoch was labeled as good-right (GR), bad-right (BR), good-left (GL) or bad-left (BL) depending on whether it belonged to a right/left motor imagery trial and whether the cursor moved towards/away from the target resulting in either a good or a bad movement. In the trial example in figure 2, the first epoch was labeled as GR, the second as BR, and the third and fourth as GR. Similarly, a left motor imagery trial (with the target at the left side of the screen), resulted in GL and BL epochs for cursor movements towards and away from the target, respectively.
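The labeling rule above can be written compactly. The following is a minimal sketch with a hypothetical helper `label_epochs`; the five-movement sequence used as the example is inferred from the description of figure 2 (four epochs labeled GR, BR, GR, GR, followed by the final movement that reaches the target).

```python
def label_epochs(target_side, moves):
    """Label every cursor movement except the last one.

    target_side: "R" or "L" (side of the target, i.e. the imagined hand).
    moves: sequence of "R"/"L" cursor movements within one trial.
    """
    labels = []
    for move in moves[:-1]:            # the last movement ends the trial and is excluded
        quality = "G" if move == target_side else "B"
        labels.append(quality + target_side)
    return labels


# Right-hand trial resembling the example in figure 2.
example_labels = label_epochs("R", ["R", "L", "R", "R", "R"])
print(example_labels)  # ['GR', 'BR', 'GR', 'GR']
```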
Three classifiers were trained on the selected epochs: two to classify the error-related brain activity (called the G/B-csp and G/B-wm classifiers) and one to classify the motor imagery signal (called the R/L classifier). Classifiers were trained for each participant separately on her/his calibration data. Since the number of GR, BR, GL and BL steps in the calibration data was not balanced (mainly because the average hit rate for the calibration blocks was by design higher than 0.5 to maintain participants' motivation), the numbers of GR, BR, GL and BL epochs were balanced by subsampling the larger groups prior to training the classifiers. This was to obtain unbiased classifiers. The subsampling was random, and no steps were removed from the calibration data other than those dropped to balance the four aforementioned groups.
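A minimal sketch of balancing the four groups by random subsampling is given below. The helper name `balance_by_subsampling` and the toy label counts are assumptions for illustration only.

```python
import numpy as np


def balance_by_subsampling(labels, rng):
    """Return sorted indices that keep an equal number of epochs per label."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()                      # size of the smallest group
    kept = [
        rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(kept))


rng = np.random.default_rng(0)
toy_labels = ["GR"] * 50 + ["BR"] * 30 + ["GL"] * 45 + ["BL"] * 34   # toy counts
keep = balance_by_subsampling(toy_labels, rng)
balanced_labels = np.asarray(toy_labels)[keep]
```

Because each of GR, BR, GL and BL ends up with the same count, the derived right/left and good/bad classes are balanced as well.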
(a) R/L classifier: For the right/left hand motor imagery classification (R/L), the GR and BR epochs were concatenated and labeled as the right class. The GL and BL epochs were also concatenated and labeled as the left class. The R/L classifier was trained to distinguish between the right and left classes. To do so, first the epoched data were filtered to 7-30 Hz with an IIR filter (6th order Butterworth). Then the method of common spatial patterns (CSP) [9,45] was applied and the top 3 CSP filters for each of the right and left classes were selected (i.e. a total of 6 filters). Temporally filtered epochs were passed through the selected CSP filters and the logarithm of the variance across time of each spatially filtered signal was used as a feature. A linear discriminant analysis (LDA) with automatic shrinkage using the Ledoit-Wolf lemma [46] was trained on the selected features [37]. The choice of shrinkage was to encourage better generalization.
(b) G/B-csp classifier: The GR and GL epochs were concatenated and labeled as the good class. The BR and BL epochs were also concatenated and labeled as the bad class. The G/B-csp classifier was trained to distinguish between the good and bad classes [17]. To train the G/B-csp classifier, we filtered the data to 1-30 Hz with an IIR filter (6th order Butterworth). A smaller time window of 50-950 ms was selected from each filtered epoch. Next, the CSP technique [9,45] was applied and the top 3 CSP filters for each of the good and bad classes were selected (i.e. a total of 6 filters). Temporally filtered epochs were passed through the selected CSP filters and the logarithm of the variance across time of each spatially filtered signal was used as a feature. To encourage better generalization, LDA with automatic shrinkage using the Ledoit-Wolf lemma [46] was trained on the selected features [37].
(c) G/B-wm classifier: We trained another G/B classifier, called G/B-wm, following the windowed-means approach for single trial classification of an event-related potential (ERP) [14,47]. We considered EEG activity on channels Fz, Cz, CPz and Pz as the error-related brain activity is considered to be a fronto-central signal that is best picked up by the mid-line channels [14]. The EEG signal on these 4 channels was filtered to 1-10 Hz with an IIR filter (6th order Butterworth). A smaller time window of 50-950 ms was selected from each filtered epoch and baselined to the first 50 ms. Then the averages of the signal in nine non-overlapping 100 ms windows (spanning 50-950 ms) were selected as features. An LDA classifier with automatic shrinkage using the Ledoit-Wolf lemma [46] was trained on the selected features [37,47]. The choice of shrinkage was to encourage better generalization.
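The two feature extractors described in (a)-(c) can be sketched as follows, assuming epochs that have already been temporally filtered and stored as arrays of shape (epochs, channels, samples) at 100 Hz. This is an illustrative sketch on synthetic data, not the study's code: the function names are hypothetical, the CSP variant (generalized eigendecomposition of averaged class covariances) is one common formulation, and the baseline is assumed to be the 0-50 ms interval of each epoch.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def csp_filters(epochs, y, n_per_class=3):
    """Return (2 * n_per_class, n_channels) CSP filters for binary labels 0/1."""
    cov = [np.mean([np.cov(e) for e in epochs[y == c]], axis=0) for c in (0, 1)]
    # Generalised eigenproblem cov[0] w = lambda (cov[0] + cov[1]) w,
    # with eigenvalues returned in ascending order.
    _, vecs = eigh(cov[0], cov[0] + cov[1])
    picks = np.r_[0:n_per_class, -n_per_class:0]   # both ends of the spectrum
    return vecs[:, picks].T


def log_variance_features(epochs, filters):
    """Log of the variance across time of each spatially filtered signal."""
    projected = np.einsum("fc,ecs->efs", filters, epochs)
    return np.log(projected.var(axis=-1))


def windowed_mean_features(epochs, fs, start=0.05, stop=0.95, width=0.10):
    """Mean amplitude in consecutive non-overlapping windows, per channel."""
    first = int(round(start * fs))
    step = int(round(width * fs))
    n_windows = int(round((stop - start) / width))
    # Baseline: subtract the mean of the first 50 ms of each epoch (assumption).
    baselined = epochs - epochs[..., :first].mean(axis=-1, keepdims=True)
    segment = baselined[..., first:first + n_windows * step]
    means = segment.reshape(*segment.shape[:-1], n_windows, step).mean(axis=-1)
    return means.reshape(len(epochs), -1)


# Synthetic demonstration: class 0 has extra variance on channel 0, class 1 on channel 1.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
epochs = rng.standard_normal((80, 8, 100))
epochs[y == 0, 0] *= 3.0
epochs[y == 1, 1] *= 3.0

filters = csp_filters(epochs, y)
csp_features = log_variance_features(epochs, filters)
# solver="lsqr" with shrinkage="auto" selects Ledoit-Wolf shrinkage in scikit-learn.
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(csp_features, y)
train_accuracy = lda.score(csp_features, y)

wm_features = windowed_mean_features(epochs[:, :4], fs=100)   # 4 channels x 9 windows
```

The training accuracy computed here is only a sanity check on separable synthetic data; the accuracies reported in this paper come from cross-validation or from transfer to online data.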
The proposed R/L+G/B classifier: Our proposed hybrid MI-BCI combined the scores from the aforementioned classifiers as follows. Each of the R/L, G/B-wm and G/B-csp classifiers is a binary LDA classifier. Let the event spaces for the R/L and G/B (-csp and -wm) classifiers be $\{r, l\}$ and $\{g, b\}$, where r, l, g and b represent right, left, good and bad cursor movements. Then $P_{RL}(r)$ and $P_{RL}(l)$ denote the score/probability of the R/L classifier for outputting r or l, respectively. Likewise, $P_{GB\text{-}csp}(g)$ and $P_{GB\text{-}csp}(b)$ denote the score/probability of the G/B-csp classifier for outputting g or b, and $P_{GB\text{-}wm}(g)$ and $P_{GB\text{-}wm}(b)$ denote the score/probability of the G/B-wm classifier for outputting g or b. Since each classifier is binary, the following holds:
$$P_{RL}(r) + P_{RL}(l) = 1, \qquad P_{GB\text{-}csp}(g) + P_{GB\text{-}csp}(b) = 1, \qquad P_{GB\text{-}wm}(g) + P_{GB\text{-}wm}(b) = 1.$$
To combine the three classifiers, we trained a logistic regression on their scores that outputs r/l, as this is the goal of a motor imagery BCI. However, this would not be possible directly since the domains of the G/B (including G/B-csp and G/B-wm) and R/L classifiers are not aligned in general. If the cursor last moved to the right, then $P_{GB\text{-}csp}(g)$ and $P_{GB\text{-}wm}(g)$ would be directly mapped to $P_{RL}(r)$. However, if the cursor last moved to the left, then $P_{GB\text{-}csp}(b)$ and $P_{GB\text{-}wm}(b)$ would be mapped to $P_{RL}(r)$ instead. The selected features therefore depended on the last cursor movement: $[P_{RL}(r),\ P_{GB\text{-}csp}(g),\ P_{GB\text{-}wm}(g)]$ if the cursor last moved to the right, and $[P_{RL}(r),\ 1 - P_{GB\text{-}csp}(g),\ 1 - P_{GB\text{-}wm}(g)]$ if the cursor last moved to the left. A logistic regression (with three weights and one bias term) was trained on the selected features [17].
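The domain alignment and the combining logistic regression can be illustrated with the following sketch. The helper names `hybrid_features` and `noisy_score`, and all numbers used to simulate classifier scores, are assumptions made for the example and do not reflect the recorded data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def hybrid_features(p_rl_right, p_csp_good, p_wm_good, last_move_right):
    """Map G/B scores into the R/L domain according to the last cursor direction."""
    p_rl_right = np.asarray(p_rl_right, dtype=float)
    p_csp_good = np.asarray(p_csp_good, dtype=float)
    p_wm_good = np.asarray(p_wm_good, dtype=float)
    moved_right = np.asarray(last_move_right, dtype=bool)
    # Last move right: 'good' supports 'right'. Last move left: 'bad' supports 'right'.
    csp = np.where(moved_right, p_csp_good, 1.0 - p_csp_good)
    wm = np.where(moved_right, p_wm_good, 1.0 - p_wm_good)
    return np.column_stack([p_rl_right, csp, wm])


# Simulated scores (illustrative only): a weak R/L classifier, stronger G/B classifiers.
rng = np.random.default_rng(0)
n = 400
intends_right = rng.integers(0, 2, n).astype(bool)
moved_right = rng.integers(0, 2, n).astype(bool)
was_good = moved_right == intends_right


def noisy_score(is_positive, shift, noise):
    sign = np.where(is_positive, 1.0, -1.0)
    return np.clip(0.5 + shift * sign + noise * rng.standard_normal(n), 0.0, 1.0)


X = hybrid_features(
    noisy_score(intends_right, 0.10, 0.20),   # simulated P_RL(r)
    noisy_score(was_good, 0.25, 0.15),        # simulated P_GB-csp(g)
    noisy_score(was_good, 0.25, 0.15),        # simulated P_GB-wm(g)
    moved_right,
)
combiner = LogisticRegression().fit(X, intends_right)   # three weights + one bias
hybrid_accuracy = combiner.score(X, intends_right)
```

The fitted weights in `combiner.coef_` indicate how much the combination relies on each source, which is how such a system can lean more on the motor imagery or on the error-related signal for a given user.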

Online control
As mentioned earlier, blocks 4-9 comprised online control of the cursor where half of them used the R/L control and the rest used the proposed R/L+G/B control. In each trial, after the target disappeared, the classifier (selected with respect to the block's control type) was applied to every 1.2 seconds of EEG data at a time and the cursor moved to the right or left according to the output of the classifier.
R/L control: In the R/L control blocks, only the R/L classifier was used to control the cursor the same way as in a conventional motor imagery BCI. This is depicted in figure 3.
R/L+G/B control: The R/L+G/B blocks used the proposed R/L+G/B classifier. As explained earlier, the scores from the R/L, G/B-csp and G/B-wm classifiers were combined through a logistic regression based on the direction of the last cursor movement (CD). This is depicted in figure 4. Note that in the R/L+G/B blocks, in each trial, the first cursor movement was based on the R/L classifier only as no feedback was provided to the participant yet. From the second movement onward, the R/L+G/B classifier was used.
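The per-trial control flow of an R/L+G/B block can be sketched as below. The function `run_trial` and its callable arguments are hypothetical stand-ins for the real-time classifiers; the sketch only captures the rule that the first movement uses the R/L classifier and later movements use the combined classifier, together with the trial-ending conditions.

```python
def run_trial(target, rl_decide, hybrid_decide, distance=3, max_steps=12):
    """Simulate one trial; directions are +1 (right) and -1 (left).

    rl_decide() returns the first movement (R/L classifier only).
    hybrid_decide(last_move) returns each later movement (R/L+G/B classifier).
    """
    position, moves = 0, []
    while abs(position) < distance and len(moves) < max_steps:
        # No feedback has been shown before the first movement, so no G/B score exists.
        move = rl_decide() if not moves else hybrid_decide(moves[-1])
        moves.append(move)
        position += move
    if position == target * distance:
        outcome = "hit"
    elif position == -target * distance:
        outcome = "miss"
    else:
        outcome = "timeout"
    return moves, outcome


# Toy deciders: always move right.
demo_moves, demo_outcome = run_trial(+1, lambda: 1, lambda last: 1)
print(demo_moves, demo_outcome)  # [1, 1, 1] hit
```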

Hit rate and subjective rate
We compared R/L and R/L+G/B controls (blocks 4-9) in various aspects. A trial was successful when the cursor hit the target. Therefore, 'hit rate (HR)' is defined as the rate/percentage of hit targets in each block (of 20 trials). We define another online score based on the participants' ratings of how much in control they felt in each block. We call this 'subjective rate (SR)'. We report the participants' online scores including hit rates and their subjective ratings.

Information transfer rate
Since there is a trade-off between accuracy and time, we also calculated the information transfer rate (ITR) for each type of control. Let $x_i$ and $y_j$ represent the intended and decoded classes, respectively. Since the target is located at either the right or left, $i, j \in \{r, l\}$. We used the following equation to calculate the ITR [48,49]:
$$\mathrm{ITR} = \frac{1}{T} \sum_{i \in \{r,l\}} \sum_{j \in \{r,l\}} p(x_i)\, p(y_j \mid x_i) \log_2 \frac{p(y_j \mid x_i)}{p(y_j)}, \qquad p(y_j) = \sum_{i \in \{r,l\}} p(x_i)\, p(y_j \mid x_i),$$
where T is the trial duration. We estimated T using the average number of steps (AST) per trial across the R/L or R/L+G/B blocks for each participant. Note that the timed-out trials, taking the maximum allowed number of steps (i.e. 12), were also included in the calculation of the AST. Since each step took about 1.2 seconds to complete, we estimated T = 1.2 × AST + 5 as the average trial duration including the inter-trial interval of 5 seconds.
Since the target location was balanced by design, $p(x_r) = p(x_l) = 0.5$. Also, $p(y_j \mid x_i)$ for $i, j \in \{r, l\}$ was estimated as the rate of trials that hit the target at the j side when the target was in fact located at the i side. ITR was calculated for each participant, separately for the three R/L and the three R/L+G/B control blocks.
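The ITR computation can be sketched as follows, assuming that the ITR is the mutual information between intended and decoded classes divided by the trial duration T = 1.2 × AST + 5. The function name `itr_bits_per_second` and the example confusion rates are hypothetical, and the handling of timed-out trials in the study may differ from this simplified sketch.

```python
import numpy as np


def itr_bits_per_second(p_y_given_x, average_steps, p_x=(0.5, 0.5),
                        step_seconds=1.2, rest_seconds=5.0):
    """Mutual information (bits per trial) divided by the average trial duration.

    p_y_given_x[i][j] is the rate of decoding class j when class i was intended.
    """
    p_x = np.asarray(p_x, dtype=float)
    p_y_given_x = np.asarray(p_y_given_x, dtype=float)
    joint = p_x[:, None] * p_y_given_x            # p(x_i, y_j)
    p_y = joint.sum(axis=0)                       # p(y_j)
    denominator = np.broadcast_to(p_y, joint.shape)
    nonzero = joint > 0                           # skip 0 * log(0) terms
    bits = np.sum(joint[nonzero] * np.log2(p_y_given_x[nonzero] / denominator[nonzero]))
    trial_seconds = step_seconds * average_steps + rest_seconds
    return bits / trial_seconds


# Perfect control with the minimum of 3 steps per trial: 1 bit every 8.6 seconds.
best_itr = itr_bits_per_second(np.eye(2), average_steps=3)
# Hypothetical imperfect control for comparison.
example_itr = itr_bits_per_second([[0.8, 0.2], [0.2, 0.8]], average_steps=6)
```

With perfect decoding and AST = 3 the sketch returns 1/8.6 bit/sec, which is the upper bound implied by the 5-second inter-trial interval.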

Classification accuracy of steps
As another metric, we looked at the performance of the R/L, G/B-csp, G/B-wm and R/L+G/B classifiers for classifying each cursor movement or step through offline analysis of the recorded data. Our goal was to investigate how the trained classifiers compared with each other and how they contributed to our proposed R/L+G/B classifier. To do so, the calibration (blocks 1-3 of phase 2) and the online recorded data (blocks 4-9 of phase 2) for each participant were epoched 0-1 seconds time-locked to each cursor movement while excluding the last cursor movements. All trials were used for the analyses except for three trials from participant B5 and two trials from participant B6 (all from the recorded online data of these participants) that were removed due to technical issues during recording.
As mentioned earlier, each cursor step may have any of the following 4 different labels: GR, BR, GL or BL. For training and testing the classifiers, these four groups of steps were balanced by subsampling the groups with the larger population. After balancing the groups, there were 136 steps in the calibration blocks, and on average 272.7± 38.4 steps in the online blocks across participants. Note that the steps in the calibration blocks were the same across participants by design, but the steps differed between participants in the online blocks due to the difference in participants' performance. Subsampling (separately within the calibration and online data steps) was done 10 times where each is called an instance of the data with balanced GR, BR, GL and BL groups leading to balanced right, left, good and bad classes. Balanced classes allow for easier interpretation of the results.
We compared the classification accuracy of the R/L, G/B-csp, G/B-wm and our proposed R/L+G/B classifiers in three conditions: 1) trained and tested on the calibration data using cross-validation (TRcal-TEcal), 2) trained and tested on the online data using cross-validation (TRon-TEon) and 3) trained on the calibration data and tested on the online data (TRcal-TEon). The last condition is what took place during the online control. We did these analyses two ways: with and without separating the online steps that belonged to the R/L and R/L+G/B blocks.

Results

Hit rate and subjective rate
Figure 5 reports the online hit rates (HR) and subjective rates (SR) across participants. For each participant, the online part of our experiment comprised six blocks (three R/L and three R/L+G/B blocks). Scores were averaged across the R/L and R/L+G/B blocks separately and reported as HR and SR for each type of control.
Table 1 presents the sensitivity of the right (R) and left (L) target hit rates in the R/L and R/L+G/B blocks. Even though for some participants (e.g. B4, B8 and B9) there is either negligible bias or bias towards the right, on average across participants there is a slight bias towards the left in both R/L and R/L+G/B blocks. We will show later in section 3.3.1 that this bias is most probably induced by the bias in the R/L classifier at the level of every cursor movement.
As mentioned earlier, trials could end when the number of cursor movements reached its maximum (i.e. 12 movements). Figure 6 shows the average number of timed-out trials for R/L and R/L+G/B blocks before hitting the target or the corresponding location on the other side. Wilcoxon signed rank test shows no significant difference across participants in the number of timed-out trials between the R/L and R/L+G/B blocks (p = 0.86).

Information transfer rate
AST and ITR for the R/L and R/L+G/B blocks are reported in table 2. Note that 1 bit of information was conveyed when the cursor hit the target. Since the target was 3 steps away from the cursor, the best achievable ITR was 1/T when AST = 3, i.e. 0.11628 bit/sec, as T = 1.2 × AST + 5 also included the 5-second inter-trial interval. This explains the low ITR values reported in table 2. Nevertheless, the ITR in the R/L+G/B blocks is significantly higher across participants than the ITR in the R/L blocks (Wilcoxon signed rank test, p = 0.007).
Figure 5. (a) Online hit rates (HR). The upper limit of the chance interval with significance of p = 0.05 is 0.62 [49]. (b) Subjective rates (SR).

Classification accuracy on steps
The classification accuracies of the classifiers on steps are reported in figures 7, 8 and 9. The red bars represent the 'cross-validated calibration accuracy' (TRcal-TEcal), that is the accuracy of a classifier trained and tested on the calibration steps. The blue bars, on the other hand, represent the 'cross-validated online accuracy' (TRon-TEon), that is the accuracy of a classifier trained and tested on the online steps. The blue and red bar heights indicate the average accuracy of 10 instances of a 5-fold cross-validation over balanced steps. Bar heights represent the average and the error bars indicate the standard deviation for individual participants and standard error of the mean for the average bar (AVR) which is across participants.
The green bars, on the other hand, represent the 'transferred online accuracy' (TRcal-TEon), that is the classification accuracy of a classifier trained on the calibration steps and tested on the online steps, again over balanced classes. Bar heights represent the average and the error bars indicate the standard deviation for individual participants and standard error of the mean for the average bar (AVR) which is across participants. Note that we did not separate  the steps from the R/L and R/L+G/B blocks in these analyses.
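The three evaluation conditions can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the nearest-mean classifier on a single scalar feature is a stand-in for the CSP and windowed-means pipelines, the data are synthetic, all function names are hypothetical, and reading '10 instances' as 10 random balanced subsamples is an assumption.

```python
import random
from statistics import mean


def balanced_class_samples(labels, rng):
    """Per class, a random sample of indices of equal size (the minority count)."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(members) for members in by_class.values())
    return [rng.sample(members, n_min) for members in by_class.values()]


def fit_nearest_mean(X, y):
    """Stand-in classifier (ASSUMPTION): one mean feature value per class."""
    return {c: mean(x for x, t in zip(X, y) if t == c) for c in set(y)}


def predict(model, X):
    return [min(model, key=lambda c: abs(x - model[c])) for x in X]


def accuracy(y_true, y_pred):
    return mean(int(a == b) for a, b in zip(y_true, y_pred))


def repeated_cv_accuracy(X, y, n_repeats=10, n_folds=5, seed=0):
    """TRcal-TEcal or TRon-TEon: repeated stratified k-fold on balanced data."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        for members in balanced_class_samples(y, rng):
            for j, i in enumerate(members):
                folds[j % n_folds].append(i)  # deal each class evenly over folds
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            model = fit_nearest_mean([X[i] for i in train], [y[i] for i in train])
            pred = predict(model, [X[i] for i in test])
            scores.append(accuracy([y[i] for i in test], pred))
    return mean(scores)


def transferred_accuracy(X_cal, y_cal, X_on, y_on, seed=0):
    """TRcal-TEon: train on all calibration data, test on balanced online data."""
    rng = random.Random(seed)
    test = [i for members in balanced_class_samples(y_on, rng) for i in members]
    model = fit_nearest_mean(X_cal, y_cal)
    return accuracy([y_on[i] for i in test], predict(model, [X_on[i] for i in test]))


# Synthetic data: two separable classes, and an online set shifted by a constant.
X_cal = [0.1 * i for i in range(20)] + [10 + 0.1 * i for i in range(20)]
y_cal = [0] * 20 + [1] * 20
X_on = [x + 7 for x in X_cal]
y_on = list(y_cal)

print(repeated_cv_accuracy(X_cal, y_cal))              # cross-validated calibration
print(repeated_cv_accuracy(X_on, y_on))                # cross-validated online
print(transferred_accuracy(X_cal, y_cal, X_on, y_on))  # transferred online
```

In this toy case both cross-validated accuracies are perfect while the transferred accuracy falls to chance, because the online data are just as separable as the calibration data but sit in a shifted position. This is the kind of pattern the comparison of the three conditions is designed to expose.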
Since the number of calibration steps was the same across participants, the upper limit of the chance level (at p = 0.05) [50] is 0.58 for the red bars. However, the number of available steps in the online blocks varied across participants, and the upper limit of the chance level (at p = 0.05) for each participant is reported in table 3.
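The dependence of the chance limit on the number of steps can be illustrated with a binomial model. The sketch below assumes two balanced classes and a one-sided binomial test; whether this matches the exact procedure of [50] is an assumption, and the function name is hypothetical.

```python
from math import comb


def chance_upper_limit(n, alpha=0.05):
    """Smallest accuracy k/n with P(X >= k) <= alpha for X ~ Binomial(n, 0.5).

    ASSUMPTION: two balanced classes and a one-sided binomial test.
    Returns None when even n correct out of n is not significant.
    """
    tail = 0      # running count of outcomes with at least k correct
    limit = None
    for k in range(n, -1, -1):
        tail += comb(n, k)
        if tail / 2**n > alpha:
            break
        limit = k / n
    return limit


for n in (12, 100, 400):
    print(n, chance_upper_limit(n))
```

With 100 balanced steps the limit is 0.59, and it moves towards 0.5 as the number of steps grows, which is why a single value applies to the calibration data while the online limits differ by participant.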
As can be seen in figures 7, 8 and 9, even though on average the cross-validated calibration accuracy (TRcal-TEcal) is lower for both G/B-csp and G/B-wm than for the R/L classifier, the loss from transferring the classifier to the online data is much larger for the R/L classifier. We used Wilcoxon signed rank tests to compare the classification accuracy of TRcal-TEcal (the red bars) and TRcal-TEon (the green bars) for each classifier, across participants. For G/B-csp and G/B-wm, the difference is not statistically significant across participants (Wilcoxon signed rank test, p > 0.6). However, for the R/L classifier, TRcal-TEon is worse than TRcal-TEcal and this difference is statistically significant across participants (Wilcoxon signed rank test, p < 0.001).
We also compared the cross-validated calibration accuracy (TRcal-TEcal) and the cross-validated online accuracy (TRon-TEon) for each classifier. One could argue that the lower transferred online accuracy of the R/L classifier arises mainly because the online data are of different quality and less classifiable. However, a Wilcoxon signed rank test shows that the difference between TRcal-TEcal (the red bars) and TRon-TEon (the blue bars) for the R/L classifier across participants is not statistically significant (p = 0.42), suggesting that the R/L motor imagery data quality is not the main cause of the drop in the transferred online accuracy.
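For readers unfamiliar with the paired test used throughout this section, the following sketch computes an exact two-sided Wilcoxon signed rank p-value by enumerating all sign patterns, which is feasible for 12 participants. The accuracy values are made up purely for illustration and are not the study's data; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Return (W+, exact two-sided p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    # Under the null hypothesis every assignment of signs is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            extreme += 1
    return w_plus, extreme / 2**n


# HYPOTHETICAL accuracies (in percent) for 12 participants, for illustration only.
calibration = [78, 81, 69, 74, 85, 72, 77, 80, 66, 83, 75, 79]
transferred = [61, 70, 60, 59, 73, 65, 58, 71, 57, 68, 62, 69]

w_plus, p_value = wilcoxon_signed_rank_exact(calibration, transferred)
print(w_plus, p_value)
```

With 12 pairs, the smallest exact two-sided p-value this test can produce is 2/4096 ≈ 0.00049, which is reached only when all 12 differences have the same sign, as in this made-up example.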
Across participants, the transferred online accuracy (TRcal-TEon) of the R/L+G/B classifier, depicted by the green bars in figure 9, is significantly better than the transferred online accuracy (TRcal-TEon) of the R/L classifier, depicted by the green bars in figure 7 (Wilcoxon signed rank test, p < 0.001). This indicates a higher accuracy in each cursor movement, which resulted in the higher hit rates in the R/L+G/B blocks during the online control.
The sensitivities of TRcal-TEcal and TRcal-TEon for the R/L, G/B-csp, G/B-wm and R/L+G/B classifiers are presented in tables 4, 5, 6 and 7. TRcal-TEon for the R/L classifier seems to have a bias towards the left side for more than half of the participants, which we believe could be the cause of the hit rate bias in table 1. However, since this is not the case for all 12 participants, it is most probably not due to the experiment design. Data from more participants are needed for a more detailed sensitivity investigation. It is also important to note that this bias is slightly alleviated in the R/L+G/B classifier, again pointing towards the superiority of the R/L+G/B control.
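The notion of a side bias can be made concrete by computing the sensitivity separately for each class. The sketch below takes sensitivity to mean the fraction of steps of a given true class that are classified correctly, which is an assumption about how the tables were computed. The labels are invented for illustration.

```python
def per_class_sensitivity(y_true, y_pred, classes=("R", "L")):
    """Fraction of steps of each true class that were predicted correctly."""
    result = {}
    for c in classes:
        preds = [p for t, p in zip(y_true, y_pred) if t == c]
        result[c] = sum(p == c for p in preds) / len(preds) if preds else float("nan")
    return result


# HYPOTHETICAL step labels: intended direction vs. classifier output.
y_true = ["R", "R", "R", "R", "L", "L", "L", "L"]
y_pred = ["R", "L", "L", "R", "L", "L", "L", "R"]

print(per_class_sensitivity(y_true, y_pred))  # {'R': 0.5, 'L': 0.75}
```

Here the overall accuracy is 0.625, but the sensitivity is higher for L than for R, so errors are concentrated on right-intended steps. A classifier with this pattern would push the cursor towards the left more often than intended.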

Comparison of the transferred online performance (TRcal-TEon)
We compared the performance of TRcal-TEon for the R/L, G/B-csp, G/B-wm and R/L+G/B classifiers in table 8. Note that these values are also presented as green bars in figures 7, 8 and 9. A Wilcoxon signed rank test shows that the difference between TRcal-TEon in the R/L and R/L+G/B classifiers is significant (p < 0.001), indicating that the proposed R/L+G/B control provides more accurate and reliable control for every cursor movement.
The difference in TRcal-TEon across participants for the G/B-csp and the R/L+G/B classifiers is also significant (Wilcoxon signed rank test, p < 0.001). However, the difference between the G/B-wm and R/L+G/B classifiers is not significant across participants (Wilcoxon signed rank test, p = 0.2). A Wilcoxon signed rank test for each participant reveals that the R/L+G/B classifier performs significantly differently from the G/B-wm classifier for B1, B2, B4, B5, B8, B9, B10 and B11 (p = 0.02, Bonferroni corrected for the number of comparisons, i.e. 12). In fact, for B10 and B11, the G/B-wm classifier provides a significantly higher accuracy than the R/L+G/B classifier for the classification of every step.

Comparison of the classification accuracy in the R/L and R/L+G/B blocks
We separated the steps in the R/L and R/L+G/B blocks and redid the analysis in figures 7, 8 and 9 to investigate whether there is a difference for the results of the offline analysis with respect to which type of block the steps were randomly selected

Discussion
Table 4. The sensitivity of the cross-validated calibration classifier (TRcal-TEcal) in predicting the right (R) and left (L) for the R/L and R/L+G/B classifiers. The first number in each entry is the average accuracy across the 10 instances of the data and the second number indicates the standard deviation.
In hybrid BCI systems, multiple sources of information can be processed simultaneously or sequentially for improved performance [51]. Earlier work in the literature proposed using the error-related brain activity sequentially, either to correct an executed action [14,24,25] or to adapt a trained classifier [26,27]. A different approach, in [22,23], proposed a motor imagery BCI for 1-D cursor control that simultaneously integrated the user's brain activity in response to changes in the cursor's direction of movement. In this study, we went beyond cursor direction changes and proposed a hybrid BCI that simultaneously combines the feedback-related brain activity elicited by every cursor movement with the motor imagery signal. We showed that the 12 participants were able to control the BCI significantly better using the proposed method than using the conventional motor imagery BCI. Our results showed that this improvement is significant in terms of the classification accuracy of single cursor movements, as well as the target hit rate, subjective rate and information transfer rate. We further showed that the performance of the motor imagery classifier (R/L) was negatively affected when transferred from calibration to the online control. We believe that this is in part due to the drift in the EEG data, but also in part due to the fact that some participants performed worse than the sham calibration feedback, which may affect the transferability of the R/L classifier (as mentioned earlier, during calibration, participants were provided with sham feedback but were not aware of it).
At the same time, the error-related brain activity classifiers (G/B-csp and G/B-wm) were both better transferred from calibration to the online control than the R/L classifier. In other words, G/B-csp and G/B-wm classifiers were able to provide a more reliable (consistent from calibration to online) classifier than the motor imagery R/L classifier. This difference may also be influenced by the fact that the error-related brain activity was time-locked to the stimulus onset as opposed to the user-generated motor imagery signal.
Moreover, we hypothesize that the R/L signals could be impacted by the G/B signals which occur in similar frequency bands [17,23]. However, for G/B, the signals of 'goodness' and 'badness' were either not affected or actually improved [49,52] resulting in a more consistent classifier from calibration to online use.
Other work in the literature also mentions that the error-related brain activity can result in a reliable classifier over time. For instance, the authors in [53] and [54] showed that a trained error-related brain activity classifier could be reliably used in a test session over long periods of time. However, to the best of our knowledge, our work is the first that provides evidence comparing the reliability of the error-related brain activity with that of the motor imagery signal.
As mentioned earlier, error-related brain activity was used in hybrid BCIs before, e.g. to correct mistakes of the primary motor imagery classifier, or to improve it by providing labels for additional training data [14,23,24,26,27]. Error-related brain activity has also been used as the sole input for 2D cursor control using a passive BCI [55][56][57][58]. That control relied on cognitive probing, i.e. the active elicitation of an automatic (in this case, error- or prediction-related) brain response by the system without the explicit involvement of the participant [59]. It was implemented using a windowed-means approach similar to the G/B-wm classifier used here. The current data (see table 8) show that, in the transferred online accuracy, while R/L+G/B outperforms both the R/L and G/B-csp classifiers, the performance of the R/L+G/B classifier is not significantly different from that of the G/B-wm classifier across participants. Given this, one may ask whether a passive control based on the G/B-wm classifier (i.e. a control based solely on the error-related brain activity, as in [56]) is capable of providing comparable or even better control than a hybrid BCI. Further comparisons for each participant revealed that the R/L+G/B classifier provided significantly better classification accuracy for 6 participants (B1, B2, B4, B5, B8 and B9). On the other hand, the G/B-wm classifier was significantly better than the R/L+G/B classifier for 2 participants (B10 and B11). While the answer to this question could very well be user-dependent, one question that is not addressed in this study is whether the error-related brain activity is different (potentially more pronounced) when users are actively controlling a BCI [60] than when they use a passive BCI.
Figure 9. The transferability of the R/L+G/B classifier from calibration to online data. The red and blue bars in each plot indicate the cross-validated calibration accuracy (TRcal-TEcal, i.e. trained and tested on the calibration data) and the cross-validated online accuracy (TRon-TEon, i.e. trained and tested on the online data), respectively. The green bars, on the other hand, represent the transferred online accuracy (TRcal-TEon, i.e. trained on the calibration data and tested on the online data). AVR indicates the averages across participants. Bar heights represent the average and the error bars indicate the standard deviation for individual participants and the standard error of the mean for the average across participants (AVR).

Conclusion and future work
In this work, we showed the efficacy of our proposed hybrid BCI that combines the motor imagery signal with the error-related brain activity in response to the BCI error. Our proposed BCI significantly outperforms the conventional motor imagery BCI in terms of accuracy, information transfer rate and the perceived subjective rate. We further analyzed the two components of our proposed hybrid BCI, namely the motor imagery and the error-related brain activity classifiers. We showed for the first time that compared to the motor imagery classifier, the classifier based on the error-related brain activity is more consistent with respect to transferring the trained classifier to the online control. This finding helps explain why the proposed hybrid BCI outperforms a conventional motor imagery BCI, and may help improve other forms of BCI as well.
Future work should compare the transferability of the classifiers from calibration to online using other approaches beyond common spatial patterns and windowed-means (e.g. [11,61,62,63,64,65]). Further study is also required to determine whether a passive BCI based on either the G/B-csp or G/B-wm classifier, or a combination of the two, may be a better choice for some participants than the proposed hybrid BCI. Future work can also shed light on why the performance of the motor imagery classifier is less consistent than that of the error-related brain activity classifier. This can allow us to better understand the interaction between the two sources of information (i.e. the error-related brain activity and the motor imagery signal) and guide the design of an even more reliable BCI system.