Front-End Replication Dynamic Window (FRDW) for Online Motor Imagery Classification

Motor imagery (MI) is a classical paradigm in electroencephalogram (EEG) based brain-computer interfaces (BCIs). Accurate and fast online decoding is very important to its successful application. This paper proposes a simple yet effective front-end replication dynamic window (FRDW) algorithm for this purpose. Dynamic windows enable classification based on a test EEG trial shorter than those used in training, improving the decision speed; front-end replication fills a short test EEG trial to the length used in training, improving the classification accuracy. Within-subject and cross-subject online MI classification experiments on three public datasets, with three different classifiers and three different data augmentation approaches, demonstrated that FRDW can significantly increase the information transfer rate in MI decoding. Additionally, FR can also be used for training data augmentation. FRDW helped win the national championship of the China BCI Competition in 2022.


I. INTRODUCTION
A BRAIN-COMPUTER interface (BCI) can measure and process the subject's brain activities and translate them into interactive information or commands for external device control [1]. Sensorimotor rhythm (SMR) based BCIs [2], [3], [4] are based on the principle that the execution or imagination of limb movements, the latter also known as motor imagery (MI), changes the cortical rhythmic activity [5]. The increase and decrease of the SMR are called event-related synchronization (ERS) and event-related desynchronization (ERD), respectively.
MI classification can be performed offline or online. Offline classification means the entire test EEG data are available for analysis. Online classification collects the subject's EEG signals and makes inferences in real time, aiming at both high classification accuracy and fast response. The information transfer rate (ITR) is an important metric for online BCIs.
There are several challenges in online MI classification: 1) Varying EEG trial length. Most EEG classification algorithms assume the input trial has a fixed length, whereas for fast response, EEG trials in online classification usually have varying lengths. 2) Trade-off between speed and accuracy. For fast response, classifications should be made on short EEG trials, which usually reduces the accuracy. 3) Large individual differences. In cross-subject MI classification, some EEG data from the test subject are usually needed for model calibration to alleviate individual differences, yet online classification usually has less calibration data than offline classification.
To cope with these challenges, we propose a front-end replication dynamic window (FRDW) approach for online MI classification. Our main contributions are: 1) We propose a front-end replication (FR) approach to fill each test EEG trial to the fixed length required by the online classifier, improving the classification accuracy. Additionally, FR can also be used for training data augmentation. 2) We use a dynamic window (DW) approach to adaptively adjust the length of each test EEG trial, improving the decoding speed. 3) We integrate Euclidean alignment (EA) [15] with FRDW to accommodate individual differences in online cross-subject MI classification.
Extensive within- and cross-subject experiments on three public MI datasets, with three classifiers and three data augmentation approaches, demonstrated the effectiveness of the proposed FRDW approach. FRDW helped win the national championship of the China BCI Competition in 2022. The remainder of this paper is organized as follows: Section II introduces related works. Section III describes the FRDW approach. Section IV introduces the experimental settings. Section V presents the experimental results. Finally, Section VI draws conclusions.

II. RELATED WORKS
This section introduces related works on offline MI classification, online MI classification, and DW approaches for EEG classification.

A. Offline MI Classification
Both conventional machine learning and deep learning (DL) have been used in offline EEG-based MI classification.
Conventional machine learning typically uses expert knowledge to extract EEG features and then feeds them into a traditional classifier, e.g., a support vector machine (SVM). Common spatial pattern (CSP) [16] and its variants, e.g., sparse CSP [17], L1-norm-based CSP [18], divergence CSP [19], and probabilistic CSP [20], have been widely used in MI signal processing and feature extraction. Filter bank CSP [21] first partitions the EEG signal into different frequency bands, and then applies CSP to each of them. Some studies also used power spectral density [22], [23] or wavelet transform [24] features.

B. Online MI Classification
Online MI classification has been attracting much attention, due to the requirement of real-time BCIs.
Yang et al. [31] applied data augmentation to the first three seconds of a test trial to get five samples, and then voted among the five real-time predictions for the final result. The average binary classification accuracy on 80 test trials from two subjects was 71.3%. Tayeb et al. [32] controlled the movement of a robotic arm in real time, which only responded to high-confidence predictions. Parashiva et al. [33] trained an error-related potential detection model to learn the brain response to feedback and make automatic corrections, achieving 64.88% online classification accuracy. Furthermore, asynchronous MI [34], [35] first recognizes whether the subject is performing MI, and then makes the classification. Its typical performance measures include both the classification accuracy and the ITR. Spatiotemporal equalization DW recognition [36] allows adaptive control of the stimulus timing while maintaining high recognition accuracy, significantly improving the ITR and the system's adaptability to different subjects. Chen et al. [37] proposed a filter bank canonical correlation analysis based training-free DW recognition approach for SSVEP. Hadi et al. [38] proposed a novel DW classifier using ensemble learning for SSVEP recognition. [39], [40] enhanced DW threshold selection to further improve the ITR in SSVEP classification. DL-based DW has also been used in SSVEP recognition, e.g., Zhou et al. [41] proposed an EEGNet-DW approach, which uses a different EEGNet model and threshold for each DW length.
To our knowledge, there have been no DW approaches for MI-based BCIs.

III. FRDW

A. Problem Setting
Assume we have trained an MI classifier f(X), where X ∈ R^{C×N}, in which C is the number of EEG channels and N the fixed trial length. During the test stage, n test trials arrive one by one. For each test trial, each sampling update brings in L′ points, i.e., each test trial arrives in a sequence of dimensionality R^{C×L′}, R^{C×2L′}, ..., R^{C×N}. For maximum ITR, we should make an accurate classification as early as possible, not necessarily waiting for the full trial of length N.

B. FRDW for Online Within-Subject MI Classification
We propose FRDW to improve the ITR of MI-based BCIs.
A DW on the test dataflow acts on the currently available test trial data X′ ∈ R^{C×L}, where L_min ≤ L ≤ N and L_min is the minimum trial length. When L < N, FR is used to fill X′ to length N to match the trial length used in training. Specifically, X′ is repeatedly concatenated as X = [X′, X′, ..., X′], which is then trimmed to have N columns and used as the input to the classifier. This approach is called FR because the trimming keeps the front end of X′. When the maximum classification probability on the current X is lower than a confidence threshold τ, the current window may be too short for reliable classification. In this case, we wait for another L′ samples and apply FR to the test input of length L + L′. The process is repeated until the maximum classification probability is at least τ. In the worst-case scenario, the entire test trial of length N is used for classification.
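The FR fill step above can be sketched in a few lines of NumPy (the function name and toy dimensions below are ours, not the paper's):

```python
import numpy as np

def front_end_replication(x, n):
    """Fill a short test trial to length n by repeating it.

    x : array of shape (C, L), the currently available test data (L <= n).
    Returns an array of shape (C, n): x is concatenated with itself as many
    times as needed, then trimmed to the first n columns, so the front end
    of the trial is always kept.
    """
    c, length = x.shape
    reps = int(np.ceil(n / length))   # enough copies to cover n columns
    return np.tile(x, reps)[:, :n]    # trim to exactly n columns

# Toy check: a hypothetical 2-channel trial of length 3 filled to length 7
x = np.arange(6).reshape(2, 3)
x_full = front_end_replication(x, 7)
assert x_full.shape == (2, 7)
assert np.array_equal(x_full[0], [0, 1, 2, 0, 1, 2, 0])
```

Because the trim keeps the first n columns, the earliest (front-end) samples of the trial always appear at the start of the classifier input, matching their position during training.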
Algorithm 1 gives the pseudo-code of FRDW for online within-subject MI classification.

Algorithm 1 FRDW for Online Within-Subject MI Classification
Input: X′ ∈ R^{C×L}, the current test trial; L_min, the minimum trial length; (p, m) = f(X), the classifier, where X ∈ R^{C×N}, p is the maximum prediction probability, and m is the predicted class; τ, the confidence threshold.
Output: m, the predicted class.
while L < N do
    X = FR(X′);  // replicate X′ and trim to N columns
    (p, m) = f(X);
    if p ≥ τ then
        return m;
    end
    Wait for the next L′ samples; append them to X′; L ← L + L′;
end
(p, m) = f(X_N), where X_N contains the first N points of X′; return m.
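A minimal Python sketch of this dynamic-window loop follows; all names are ours, and the always-confident dummy classifier is purely illustrative:

```python
import numpy as np

def frdw_predict(stream, f, n, l_min, tau):
    """Sketch of the within-subject FRDW loop. `stream` yields successive
    growing views of the test trial, each of shape (C, L) with L increasing
    by L' per update; `f` maps a (C, n) array to (p, m), the maximum class
    probability and the predicted class. Returns (m, L_used)."""
    for x_cur in stream:
        length = x_cur.shape[1]
        if length < l_min:
            continue                          # wait for the minimum window
        reps = int(np.ceil(n / length))
        x = np.tile(x_cur, reps)[:, :n]       # front-end replication
        p, m = f(x)
        if p >= tau or length >= n:           # confident enough, or full trial
            return m, length

# Demo: C=3, N=250, updates of 10 points; the dummy classifier is always
# confident, so the decision is made at the minimum window length.
rng = np.random.default_rng(0)
trial = rng.standard_normal((3, 250))
stream = (trial[:, :k] for k in range(10, 251, 10))
f = lambda x: (0.95, 0)
m, used = frdw_predict(stream, f, n=250, l_min=50, tau=0.7)
assert m == 0 and used == 50
```

Note that the worst case (`length >= n`) guarantees a decision once the full trial has arrived, mirroring the last line of Algorithm 1.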

C. FRDW With EA for Online Cross-Subject MI Classification
In cross-subject MI classification, FRDW can be combined with EA [15] for better performance. EA aligns EEG trials from different subjects in the Euclidean space to reduce individual differences, improving the transfer learning performance on a new subject [42].
Assume there are n trials {X_i}_{i=1}^{n} from a subject. EA first computes the reference matrix R̄ as the arithmetic mean of all n spatial covariance matrices:

R̄ = (1/n) Σ_{i=1}^{n} X_i X_iᵀ.

Each trial X_i is then aligned by

X̃_i = R̄^{-1/2} X_i,

where X̃_i is the aligned sample for X_i.
After EA, the mean covariance matrix of each subject's aligned trials equals the identity matrix I:

(1/n) Σ_{i=1}^{n} X̃_i X̃_iᵀ = R̄^{-1/2} ((1/n) Σ_{i=1}^{n} X_i X_iᵀ) R̄^{-1/2} = R̄^{-1/2} R̄ R̄^{-1/2} = I,

i.e., EEG trials from different subjects become more consistent.
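The two EA equations above can be implemented directly with NumPy; the function name is ours, and we compute R̄^{-1/2} via an eigendecomposition:

```python
import numpy as np

def euclidean_alignment(trials):
    """Euclidean alignment [15]: whiten all trials of one subject by the
    inverse square root of their mean spatial covariance matrix.
    trials : array of shape (n, C, N). Returns the aligned trials."""
    covs = np.einsum('ict,idt->icd', trials, trials) / trials.shape[2]
    r_bar = covs.mean(axis=0)                 # reference matrix R
    w, v = np.linalg.eigh(r_bar)              # R is symmetric positive definite
    r_inv_sqrt = v @ np.diag(w ** -0.5) @ v.T # R^{-1/2}
    return np.einsum('cd,idt->ict', r_inv_sqrt, trials)

# Sanity check: after EA the subject's mean covariance is the identity
rng = np.random.default_rng(1)
x = rng.standard_normal((20, 4, 100))         # 20 trials, 4 channels, 100 points
x_aligned = euclidean_alignment(x)
mean_cov = np.einsum('ict,idt->cd', x_aligned, x_aligned) / (20 * 100)
assert np.allclose(mean_cov, np.eye(4), atol=1e-6)
```

The assertion mirrors the identity-covariance property derived above; any constant scaling of the covariance estimate cancels in R̄^{-1/2} R̄ R̄^{-1/2}.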
Since EA requires a few EEG trials for calculating the reference matrix R̄, it raises problems at the beginning of the test phase, when there are too few test trials. When the number of test trials is smaller than a threshold n_EA (n_EA = 10 in this paper), no EA is performed on the test trials, and a classifier trained without EA is applied to them. When n_EA test trials have been accumulated, we compute and save the reference matrix R̄, perform EA on each test trial thereafter, and apply a classifier trained with EA to it.

Algorithm 2 FRDW With EA for Online Cross-Subject MI Classification
Input: X′_n ∈ R^{C×L}, the current test trial; L_min, the minimum trial length; (p, m) = f(X), the classifier without EA, where X ∈ R^{C×N}, p is the maximum prediction probability, and m is the predicted class; (p, m) = f_EA(X), the classifier with EA; τ, the confidence threshold; n_EA, the minimum number of test trials for applying EA; {X′_i}_{i=1}^{n−1}, all test trials before the current trial; R̄, the reference matrix of EA, needed only when n > n_EA.
Output: m, the predicted class.
if n ≥ n_EA then
    if R̄ has not been computed then compute and save R̄ from the first n_EA test trials; end
    Align X′_n by X̃′_n = R̄^{-1/2} X′_n; use f_EA as the classifier;
else
    Use f as the classifier;
end
while L < N do
    X = FR of the (aligned) trial;  // replicate to length N and trim
    Compute (p, m) with the selected classifier;
    if p ≥ τ then
        return m;
    end
    Wait for the next L′ samples; append them; L ← L + L′;
end
Classify X_N, where X_N contains the first N points of the (aligned) trial; return m.
Algorithm 2 shows the pseudo-code of FRDW with EA for online cross-subject MI classification.
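The EA warm-up rule in Algorithm 2 can be sketched as a small stateful wrapper; the class and its names are ours, and the boolean flag stands in for the choice between the no-EA and EA classifiers:

```python
import numpy as np

class OnlineEA:
    """Sketch of the EA warm-up rule: before n_ea test trials have arrived,
    trials pass through unchanged (the no-EA classifier would be used);
    once n_ea trials are available, the reference matrix is computed, fixed,
    and every trial from then on is aligned (the EA classifier is used)."""
    def __init__(self, n_ea=10):
        self.n_ea = n_ea
        self.trials = []
        self.r_inv_sqrt = None

    def __call__(self, x):
        self.trials.append(x)
        if self.r_inv_sqrt is None:
            if len(self.trials) < self.n_ea:
                return x, False               # not aligned: no-EA model
            covs = [t @ t.T / t.shape[1] for t in self.trials]
            r_bar = np.mean(covs, axis=0)     # reference matrix R, then fixed
            w, v = np.linalg.eigh(r_bar)
            self.r_inv_sqrt = v @ np.diag(w ** -0.5) @ v.T
        return self.r_inv_sqrt @ x, True      # aligned: EA model

# Demo with n_ea = 3: the first two trials are unaligned, the rest aligned
rng = np.random.default_rng(2)
ea = OnlineEA(n_ea=3)
flags = [ea(rng.standard_normal((4, 50)))[1] for _ in range(5)]
assert flags == [False, False, True, True, True]
```

Freezing R̄ after the first n_EA trials keeps each later update O(C²N), so the alignment step adds little to the per-update latency budget discussed in Section V-D.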

IV. EXPERIMENTS
Extensive experiments were performed to validate the superior performance of FRDW.

A. Dataset and Preprocessing
Three public MI datasets were used in our experiments, whose statistics are summarized in Table I.
All three datasets were from BCI Competition IV [43] and had the same collection protocol. Each subject sat in front of a computer screen. At the beginning of each trial, a fixation cross appeared on the screen, accompanied by a warning tone. Shortly after, an arrow pointing in a particular direction appeared as a cue (e.g., left arrow for left hand, down arrow for feet), prompting the subject to perform the instructed MI task until the fixation cross disappeared from the screen. The next trial started after a short break. EEG signals were recorded during the experiment.
EEG signals in all three datasets were imported from an open-source repository (http://www.bnci-horizon-2020.eu/database/data-sets).

B. Data Augmentation
We used two EEG data augmentation approaches to reduce model overfitting: 1) Overlap. The trials are augmented using sliding windows, i.e., each sliding window contains 25 sampling points of the original data, and there is a 75-point overlap between two successive windows. Note that 'none' in our comparison means no overlap between successive windows was used. 2) FR. FR introduced in Section III can also be employed for data augmentation in training. It uses sliding windows of length 0.7N with a 25-point overlap to segment each training trial. Then, for each sliding window, FR is used to replicate the front-end data to the back, making the total length N.
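The FR augmentation scheme above can be sketched as follows; the function name is ours, and we read "25-point overlap" as successive windows sharing their last/first 25 columns, which is our assumption about the bookkeeping:

```python
import numpy as np

def fr_augment(trial, n, win_ratio=0.7, overlap=25):
    """Sketch of FR-based training augmentation: cut a length-n trial into
    sliding windows of length win_ratio*n (successive windows overlapping by
    `overlap` points), then replicate each window's front end to restore
    length n. trial : array of shape (C, n)."""
    win = int(win_ratio * n)
    step = win - overlap
    samples = []
    for start in range(0, trial.shape[1] - win + 1, step):
        w = trial[:, start:start + win]
        reps = int(np.ceil(n / win))
        samples.append(np.tile(w, reps)[:, :n])   # FR fill back to length n
    return np.stack(samples)

# Toy trial: C=2, N=200 -> window length 140; the filled sample starts with
# the window itself, then repeats its front end to reach 200 columns.
trial = np.arange(2 * 200).reshape(2, 200)
aug = fr_augment(trial, n=200)
assert aug.shape[1:] == (2, 200)
assert np.array_equal(aug[0][:, :140], trial[:, :140])
assert np.array_equal(aug[0][:, 140:], trial[:, :60])
```

Each augmented sample therefore has the same length N as a full training trial, so the classifier sees a consistent input shape in training and testing.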

C. Classifiers
Three classifiers were considered: 1) EEGNet. EEGNet [25] is a popular convolutional neural network for EEG classification. It starts with a temporal convolution, followed by a depthwise convolution and a separable convolution, and finally a fully connected layer for classification. We used eight temporal filters, two spatial filters, and a dropout rate of 0.25. 2) CSP+Transformer. CSP [16]+Transformer [44] was inspired by Conformer [45], which used a convolutional transformer. However, the data length used by Conformer was about eight times longer than ours, and it is difficult to extract features with only two one-dimensional convolutions. Therefore, we utilized CSP to extract log-variance features and performed a temporal convolution and a depthwise convolution on the features as patch embeddings. The transformer encoder block was repeated three times, and a fully connected layer was used for classification. For 4-class MI classification, we used a one-versus-rest strategy for CSP [46], which divided the 4-class classification task into four binary classification tasks. We concatenated the first four rows of each filter as the final filter. 3) CSP+SVM. Feature extraction using CSP followed by classification using an SVM [47] is a classical MI classification approach. We employed a radial basis function kernel with regularization parameter C = 0.1 for within-subject SVM training, and a linear kernel with C = 1 for cross-subject SVM training (default values were used for all other parameters).
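As a reference for the CSP log-variance features used by classifiers 2) and 3), here is a minimal two-class CSP sketch in NumPy (our own implementation, not necessarily identical to the paper's; names and toy data are ours):

```python
import numpy as np

def csp_filters(class_a, class_b, n_pairs=2):
    """Minimal two-class CSP: jointly diagonalize the two class-mean spatial
    covariances and keep the filters with the most extreme eigenvalues.
    class_a, class_b : arrays of shape (n_trials, C, N)."""
    def mean_cov(x):
        covs = np.einsum('ict,idt->icd', x, x)
        covs /= np.trace(covs, axis1=1, axis2=2)[:, None, None]  # normalize
        return covs.mean(axis=0)

    ca, cb = mean_cov(class_a), mean_cov(class_b)
    w, v = np.linalg.eigh(ca + cb)
    p = np.diag(w ** -0.5) @ v.T                 # whitening transform
    w2, v2 = np.linalg.eigh(p @ ca @ p.T)        # diagonalize whitened ca
    order = np.argsort(w2)
    sel = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return v2[:, sel].T @ p                      # shape (2*n_pairs, C)

def log_var_features(trials, filters):
    """Normalized log-variance of the spatially filtered signals."""
    z = np.einsum('fc,ict->ift', filters, trials)
    var = z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))

rng = np.random.default_rng(3)
a = rng.standard_normal((30, 8, 100))            # 30 trials, 8 channels
b = rng.standard_normal((30, 8, 100)) * 1.5
filt = csp_filters(a, b)
feats = log_var_features(a, filt)
assert filt.shape == (4, 8) and feats.shape == (30, 4)
```

These per-trial feature vectors would then feed the SVM (RBF kernel, C = 0.1 within-subject) or serve as the input to the patch-embedding convolutions of the CSP+Transformer.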

D. Performance Metric
The ITR, which considers both the classification speed and accuracy, was used as the primary performance metric:

ITR = (60/T) [log₂M + P log₂P + (1 − P) log₂((1 − P)/(M − 1))],

where T is the average trial length (in seconds), M is the number of classes, and P is the classification accuracy. The unit of the ITR is bits/min. When P < 1/M, i.e., the classification accuracy is lower than chance, the ITR is set to 0.
Taking 4-class classification as an example, the relationship between the ITR and the classification accuracy for different T is shown in Fig. 1. For a fixed T, the ITR grows super-linearly as the accuracy increases. For a fixed accuracy, the ITR is inversely proportional to the trial length. So, to improve the ITR, we should increase the classification accuracy while reducing the trial length.
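The ITR formula above translates directly into code (the function name and the clipping convention at the boundary cases are ours):

```python
import numpy as np

def itr_bits_per_min(p, m, t):
    """Wolpaw ITR in bits/min: t is the average trial length in seconds,
    m the number of classes, p the classification accuracy. At-or-below
    chance accuracies give zero ITR, as in the paper."""
    if p <= 1.0 / m:
        return 0.0
    bits = np.log2(m)
    if p < 1.0:                                  # the P=1 limit of the entropy terms is 0
        bits += p * np.log2(p) + (1 - p) * np.log2((1 - p) / (m - 1))
    return 60.0 / t * bits

# Perfect 4-class accuracy with 2 s trials: log2(4) = 2 bits per trial,
# 30 trials per minute, hence 60 bits/min.
assert abs(itr_bits_per_min(1.0, 4, 2.0) - 60.0) < 1e-9
assert itr_bits_per_min(0.25, 4, 2.0) == 0.0     # chance accuracy -> 0
```

The 60/T factor is what rewards shorter trials: halving T doubles the ITR at a fixed accuracy, which is exactly the lever the DW mechanism exploits.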

E. Experimental Settings
All datasets were partitioned into two parts: training and test. We reserved part of the training data as the validation set. For within-subject MI classification, we first recorded the hyper-parameters (e.g., trial length, number of epochs) corresponding to the best ITR on the validation set, and then combined the training and validation sets to train the final model using them. For cross-subject MI classification, we used leave-one-subject-out cross-validation, i.e., the test set of one subject was used for testing, and the training sets of all other subjects were combined for training. The model with the best validation ITR was used (no re-training as in the within-subject case).
During offline training, the same data preprocessing and augmentation procedures were applied to the training and validation sets. We used the training set to calculate the CSP filters, which were then used for validation set feature extraction.
We also used offline data to simulate the online data acquisition process: the EEG data were down-sampled to 250 Hz and sent out every 40 ms, i.e., each DW update contained 10 samples (L′ = 10).

F. Hyper-Parameters
There were two types of hyper-parameters: model training related, and FRDW related.
Model training related hyper-parameters included the trial length, FR ratio, maximum number of epochs, learning rate, and batch size. The trial length was selected from {100, 125, 150, 200, 250, 500, 750} according to the ITR on the validation set. A maximum of 100 training epochs, a learning rate of 0.001, and a batch size of 64 were used. To cope with randomness, in each experiment EEGNet and CSP+Transformer were repeated 11 times, and the average performance was reported. CSP+SVM was run only once, as it was much more stable.
The values of the FRDW related hyper-parameters are shown in Table II.

V. RESULTS
This section presents the experimental results to demonstrate the effectiveness of FRDW.

A. Online MI Classification
Tables III-V show the ITRs and accuracies (ACCs) in within-subject classification on the three datasets, respectively; Table III also includes the detailed results on each individual subject. Tables VI-VIII show the average ITRs and ACCs in cross-subject classification on the three datasets, respectively. The best ITRs for each training data augmentation approach are marked in bold, and the values in parentheses are the improvements of FRDW over FW. Tables III-VIII show that: 1) Compared with the fixed window (FW) approach, FRDW improved the ITR with little or no loss of accuracy, regardless of the dataset, classifier, and data augmentation approach. 2) FR can also be used for data augmentation in training, achieving comparable or better results than the other approaches. 3) In cross-subject classification (Tables VI-VIII), combining EA with FRDW further improved the ITR in most cases. To examine whether the differences between FW and our proposed FRDW, and between FRDW without and with EA, were statistically significant, we performed paired-sample t-tests on the ITRs in Tables III-VIII. The null hypothesis was that the difference between the paired samples has zero mean, and it was rejected if p ≤ 0.05.
The within- and cross-subject paired-sample t-test results between FW and FRDW are shown in Tables IX and X, respectively. The results for FRDW without and with EA are shown in Table XI. The statistically significant ones are marked in bold.
Table IX shows that most p-values were smaller than or close to 0.05 when FR was used for training data augmentation, indicating a statistically significant ITR improvement of FRDW over FW in within-subject MI classification. Table X shows that most p-values were smaller than or close to 0.05 when FR and EA were used together, indicating a statistically significant ITR improvement of FRDW over FW in cross-subject MI classification.
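The statistical test used here is standard and available in SciPy; the ITR values below are made up purely to illustrate the procedure, not taken from the paper's tables:

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical paired per-condition ITRs (bits/min) for FW vs. FRDW
itr_fw   = np.array([10.2, 12.5,  9.8, 14.1, 11.3, 13.0])
itr_frdw = np.array([12.1, 14.0, 11.5, 15.8, 12.9, 15.2])

# Paired-sample t-test: null hypothesis is that the paired differences
# have zero mean; reject when p <= 0.05, as in the paper.
stat, p_value = ttest_rel(itr_frdw, itr_fw)
significant = p_value <= 0.05
assert significant
```

Pairing is essential here: each FW/FRDW pair shares the same dataset, classifier, and augmentation condition, so the test compares differences within conditions rather than pooled means.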

B. Sensitivity Analysis
Fig. 2 shows the sensitivity analysis results of the two hyper-parameters in FRDW, the minimum trial length L_min and the confidence threshold τ, on MI1 for within-subject classification.
For a fixed confidence threshold τ, the ITR generally first increased and then decreased as the minimum trial length increased. For a fixed minimum trial length, the ITR also first increased and then decreased as the confidence threshold τ increased. Both results are intuitive: a longer minimum window and a higher threshold both improve the classification accuracy, but beyond some point the resulting delay in decisions outweighs the accuracy gain.

C. Ablation Study
Transformer and SVM classifiers do not require the input EEG trial to have a fixed length, so they can also be used without data replication.However, this subsection shows that using FR in testing still improved their ITRs.
Tables XII and XIII show the test results without and with FR when CSP+Transformer and CSP+SVM were used as the classifier, respectively. The best ITRs are marked in bold. Using FR in testing always improved the ITRs for both classifiers.

D. Computational Cost
The sampling rate of all three datasets was 250 Hz. We assumed that the BCI system brings in 10 sampling points (L′ = 10) each time, i.e., we updated the test trial every 40 ms. FRDW is required to complete all computations within 40 ms for real-time operation.
Table XIV shows the mean and standard deviation of the FRDW computation time for each update on MI1. All experiments were implemented with Python 3.8 and PyTorch, and ran on a server with an NVIDIA RTX 3090 GPU and an Intel(R) Xeon(R) Gold 6226R 2.90 GHz CPU. On average, FRDW always finished all computations within 40 ms, especially when EEGNet or CSP+SVM was used.

Fig. 1. Relationship between the ITR and the classification accuracy for different trial lengths T.
VI. CONCLUSION

MI is a classical and popular paradigm in EEG-based BCIs. Accurate and fast online decoding is very important to its successful application. This paper has proposed a simple yet effective FRDW algorithm for this purpose. Dynamic windows enable classification based on a test EEG trial shorter than those used in training, improving the decision speed; front-end replication fills a short test EEG trial to the length used in training, improving the classification accuracy. Within-subject and cross-subject online MI classification experiments on three public datasets, with three different classifiers and three different data augmentation approaches, demonstrated that FRDW can significantly increase the information transfer rate in MI decoding. Additionally, FR can also be used for training data augmentation. FRDW helped win the national championship of the China BCI Competition in 2022.

TABLE III: ITRS AND ACCURACIES IN WITHIN-SUBJECT CLASSIFICATION ON MI1

TABLE VI: AVERAGE ITRS AND ACCURACIES IN CROSS-SUBJECT CLASSIFICATION ON MI1

TABLE VII: AVERAGE ITRS AND ACCURACIES IN CROSS-SUBJECT CLASSIFICATION ON MI2

TABLE VIII: AVERAGE ITRS AND ACCURACIES IN CROSS-SUBJECT CLASSIFICATION ON MI3

TABLE IX: PAIRED-SAMPLE t-TEST RESULTS ON THE TEST ITRS IN TABLES III-V BETWEEN FW AND FRDW

TABLE X: PAIRED-SAMPLE t-TEST RESULTS ON THE TEST ITRS IN TABLES VI-VIII BETWEEN FW AND FRDW
TABLE XI: PAIRED-SAMPLE t-TEST RESULTS ON THE TEST ITRS IN TABLES VI-VIII FOR FRDW WITHOUT AND WITH EA

TABLE XII: AVERAGE PERFORMANCE WITHOUT AND WITH FR IN TESTING, WHEN CSP+TRANSFORMER WAS USED ON MI1

TABLE XIII: AVERAGE PERFORMANCE WITHOUT AND WITH FR IN TESTING, WHEN CSP+SVM WAS USED ON MI1

TABLE XIV: COMPUTATION TIME (MS) OF FRDW