A CNN model with feature integration for MI EEG subject classification in BMI

Objective: Electroencephalogram (EEG) based motor imagery (MI) classification is an important aspect of brain-machine interfaces (BMIs), which bridge the neural system and computer devices by decoding brain signals into recognizable machine commands. However, the MI classification task is challenging due to the inherently complex properties, inter-subject variability, and low signal-to-noise ratio (SNR) of EEG signals. To overcome these issues, the current work proposes an efficient multi-scale convolutional neural network (MS-CNN).
Approach: In the framework, discriminant user-specific features have been extracted and integrated to improve the accuracy and performance of the CNN classifier. Additionally, different data augmentation methods have been implemented to further improve the accuracy and robustness of the model.
Main results: The model achieves an average classification accuracy of 93.74% and a Cohen's kappa coefficient of 0.92 on the BCI competition IV 2b dataset, outperforming several baseline and current state-of-the-art EEG-based MI classification models.
Significance: The proposed algorithm effectively addresses the shortcomings of existing CNN-based EEG-MI classification models and significantly improves the classification accuracy.

electroencephalography (EEG), functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), and near-infrared spectroscopy (NIRS) [2,3]. However, non-invasive BMI based on EEG signals collected through electrodes placed on the scalp has been a popular choice due to its fine temporal resolution, low cost, and user-friendly communication with other electronic devices. Compared with other types of brain signals, EEG has some distinct characteristics such as uniqueness, non-linearity, and non-stationary behavior, which vary with the brain and the mental state of the particular subject [10]. Additionally, noise from different muscle artifacts makes it challenging to improve the signal-to-noise ratio (SNR) and thereby the accuracy of subject classification. Thus, the feature extraction and classification of EEG signals is an important aspect of designing a robust BMI system. The commonly used EEG signals include motor imagery (MI) related mu/beta rhythm (de)synchronization, event-related P300 potentials, and steady-state visually evoked potentials (SSVEPs) [11,12]. Among these, MI is the most popular in various EEG-based BCI applications [2,3]. The general workflow of a typical EEG-based MI BCI system is shown in Fig. 1; it generally consists of four phases: brain signal acquisition, feature extraction, feature classification, and device control interface. For feature extraction in the time-frequency spectrum, the wavelet transform [13] or the short-time Fourier transform (STFT) [14] has been utilized.
Because features are extracted in the same frequency band for every subject, the classification accuracy may fall for different subjects. To overcome this, wavelet packet decomposition (WPD) and dynamic frequency feature selection (DFFS) [15] have been employed to obtain better time-frequency features for each subject [16]. However, the procedure is time-consuming and cannot be generalized. For feature extraction of the EEG signal in the space domain, common spatial pattern (CSP) [17] and filter bank CSP [18] have been shown to be effective in improving accuracy. However, their performance depends on a specific frequency band, and they do not consider full time-domain feature extraction from different subjects.
With its advancement in recent years, deep learning (DL) has shown superior performance in MI-BCI classification compared to traditional ML methodologies [14], owing to its capability of adapting to non-linear and non-stationary signals and automatically extracting important feature information [19] from the EEG signal. In this regard, several studies have been geared towards EEG signal classification employing DL, in particular the convolutional neural network (CNN) [14,20,21,22,23,24,25]. A DL model combining a CNN and stacked autoencoders (SAE) has been developed, which demonstrated improved recognition accuracy for EEG signal classification [14]. In [23], an end-to-end CNN has been designed for efficient MI-EEG signal classification. Some traditional feature extraction methodologies, such as the wavelet transform (WT) producing time-frequency images, have been incorporated in CNNs for subject classification [26]. A DNN scheme based on restricted Boltzmann machines (RBM) for MI classification has been proposed in [27]. In addition, the EEGNet framework [28], based on a compact CNN, has been proposed for MI and P300 visual-evoked potentials, and it demonstrated improved classification accuracy compared to the state-of-the-art methods. Along similar lines, a deep transfer CNN framework based on the VGG-16 CNN model pre-trained on ImageNet and a target CNN model for MI EEG signal classification has been proposed in [29]. Furthermore, a 1-D multi-scale CNN [30] based on conditional empirical mode decomposition (CEMD) has been developed, which correlates the original EEG signal and the intrinsic modal component (IMF) to encode event-related synchronization/de-synchronization (ERS/ERD) information between the channels, achieving higher accuracy for EEG signal classification. Additionally, a multiple bandwidth method with an optimized CNN framework [31] has been designed for BMI classification with EEG-fNIRS signals.
More recently, a hybrid-scale CNN architecture [32] for EEG MI classification has been proposed which demonstrates significant improvement in classification accuracy.
Although the CNN-based models have achieved better results, several issues hinder the accuracy and performance of the classifier for EEG MI classification.
Firstly, most of the CNN-based models consider only a single convolution scale, which is not sufficient to efficiently extract distinguishable features from the several non-overlapping canonical frequency bands of the EEG signal. Secondly, intrinsic feature extraction from the input signal is often ignored, which limits the CNN's ability to learn more semantic features from the raw EEG data. Moreover, feature extraction has not been designed to integrate fully into the DL workflow, which is the main bottleneck for deploying real-time BCI applications with high classification accuracy. Thirdly, a common issue of CNN-based models is the lack of sufficient training data, which prevents the EEG-based MI-BCI classifier from achieving high classification accuracy.

Figure 1: The general workflow of a typical EEG-based MI-BCI system, consisting of brain signal acquisition (raw EEG data and artifact filtering), feature extraction, feature classification, and electronic device control.
In order to address the aforementioned challenges and shortcomings, an efficient multi-scale convolutional neural network (MS-CNN) has been proposed for EEG-based MI classification.
In the model, a multi-scale convolution block (MSCB) with different convolutional kernel sizes has been designed to extract the effective features of EEG signals from multiple scales for four different frequency bands δ, θ, α, and β from original EEG data for MI classification.
Moreover, important intrinsic and user-specific features including differential entropy (DE) and neural power spectra (NPS) characteristics have been extracted from the original EEG data and integrated into the proposed algorithm to improve the accuracy and performance of the model.

Figure 2: (a) Schematic of the positioning of electrodes C3, Cz, and C4 in the standardized international 10-20 electrode system; (b) timing scheme of each trial for the first two sessions (top) and the remaining three sessions (bottom).
In the present work, the BCI competition IV 2b dataset [33] has been utilized to evaluate the efficiency and accuracy of the proposed MS-CNN model. The 2b dataset contains EEG data collected from nine healthy subjects. For each subject, the EEG signal has been recorded from three bipolar EEG channel electrodes with a sampling frequency of 250 Hz.
These electrodes (i.e., C3, Cz, and C4) have been positioned according to the standardized international 10-20 electrode system [34] as shown in Fig. 2-(a). In order to eliminate the power line noise, a band-pass filter passing EEG signal frequencies between 0.5 Hz and 100 Hz together with a notch filter at 50 Hz was implemented. Additionally, the electrooculogram (EOG) has been recorded using three monopolar electrodes [35,36]. In the dataset, two types of MI classification tasks have been performed by each subject: left-hand (class 1) and right-hand (class 2) movement imagination. From the given dataset, a total of 5 sessions were considered for each subject. For each subject, the first two sessions, consisting of 120 trials, were collected without feedback, whereas the remaining three sessions of 160 trials were recorded with EEG feedback. The schematic of each trial and the corresponding timing has been depicted in Fig. 2-(b). For example, in the first two sessions, the successive events occur in the following order: at the start of each trial (i.e., t=0) a fixation cross appears; this is followed by a cue in the form of an arrow from t=3 s to t=4.25 s indicating one of the two distinct MI tasks; finally, the subject performs left- or right-hand movement imagination according to the arrow direction from t=4 s to t=7 s [35,36,37].

Proposed CNN model:
In recent years, CNNs have demonstrated significant performance improvements, outperforming traditional ML approaches [38] in various applications such as object detection and computer vision [39,40]. MI subject classification from the raw EEG signal, with its high variability and non-stationary noise, is a challenging task. In this regard, a CNN can be an effective method for extracting the most relevant features and learning hierarchical representations of high-dimensional EEG time series data. A CNN is a feed-forward network that usually comprises the following components: input layer, convolution (Conv) layer, pooling (Pool) layer, fully connected (FC) layer, and output layer. Generally, the CNN consists of alternating convolution and pooling layers for extracting features, and a fully connected layer at the end for final classification. Mathematically, the convolution process can be expressed as:

to various distinct behavioral state [41,42]. Each frequency pattern represents a qualitative assessment of awareness during MI tasks. The low-frequency δ-band (1-4 Hz) was found to carry significant class-related information [2,43,44]. Additionally, in movement-related MI-BCI systems, the α (8-13 Hz) and β (13-30 Hz) rhythms are important due to their high temporal resolution [45]. An increase and a decrease of the power spectrum in the β and α-bands result in event-related synchronization (ERS) and event-related desynchronization (ERD), respectively [46,47]. Recently, it has been revealed that the θ-band (4-8 Hz) differs significantly between the left- and right-hand MI tasks, which plays an important role in the MI-BCI classification process [32,48,49]. Thus, in the current study, we have considered four non-overlapping frequency bands, δ, θ, α, and β, for feature extraction from the original EEG data in our proposed CNN model. A filter bank of 1-4 Hz, 4-8 Hz, 8-13 Hz, and 13-30 Hz has been employed to extract EEG signal information in the corresponding frequency bands.
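The filter bank described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the band edges and the 250 Hz sampling rate come from the text, while the Butterworth design, the filter order, the zero-phase filtering, and the names `filter_bank` and `BANDS` are assumptions made here for the example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250.0  # sampling frequency of the 2b dataset (Hz), from the text
# Band edges from the text; the dictionary name is hypothetical
BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def filter_bank(trial, fs=FS, bands=BANDS, order=4):
    """Return {band name: band-passed copy of trial (channels x samples)}.

    Assumption: Butterworth band-pass filters applied forward and backward
    (zero phase); the source does not state the filter type or order.
    """
    out = {}
    for name, (lo, hi) in bands.items():
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out[name] = sosfiltfilt(sos, trial, axis=-1)
    return out

# Synthetic 3-channel, 4 s trial: a 10 Hz (alpha) sinusoid plus weak noise
rng = np.random.default_rng(0)
t = np.arange(1000) / FS
trial = np.sin(2 * np.pi * 10.0 * t)[None, :] + 0.1 * rng.standard_normal((3, 1000))

banded = filter_bank(trial)
power = {name: float(np.mean(x ** 2)) for name, x in banded.items()}
```

On this synthetic trial, most of the signal power should end up in the α output, which is a quick sanity check that the band edges are applied as intended.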

Multi-scale convolution block (MSCB):
In the proposed MSCB network, convolution block $\lambda_L$ has a relatively large convolution kernel of size $1 \times \Omega_L$, which can capture the overall feature map of the EEG signal. The medium convolution kernel of size $1 \times \Omega_M$ in convolution block $\lambda_M$ can preserve relatively coarse-grained feature information. Finally, convolution block $\lambda_S$ with a small kernel of size $1 \times \Omega_S$ can efficiently collect fine-grained, localized information.
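To make the idea of parallel convolution scales concrete, the following NumPy sketch applies large, medium, and small 1-D kernels to the same band-limited signal and stacks the resulting feature maps. The kernel sizes (85, 45, 11), the number of filters per scale, the random weights, and the function names are all hypothetical placeholders; the text does not give the actual values of the three kernel sizes, and a trained network would learn the kernel weights.

```python
import numpy as np

def conv1d_same(x, kernel):
    """Cross-correlate a 1-D signal with one kernel, keeping the input length."""
    return np.convolve(x, kernel[::-1], mode="same")

def mscb(x, kernels_by_scale):
    """Apply every kernel of every scale to x and stack the feature maps.

    x: 1-D band-limited signal of length N.
    kernels_by_scale: {scale name: array of shape (n_filters, kernel_size)}.
    Returns an array of shape (total number of filters, N).
    """
    maps = [conv1d_same(x, k)
            for kernels in kernels_by_scale.values()
            for k in kernels]
    return np.stack(maps, axis=0)

rng = np.random.default_rng(0)
# Hypothetical kernel sizes standing in for the large, medium, and small scales
sizes = {"large": 85, "medium": 45, "small": 11}
kernels = {name: 0.1 * rng.standard_normal((4, size)) for name, size in sizes.items()}

x = rng.standard_normal(1000)      # one band of one 4 s trial at 250 Hz
features = mscb(x, kernels)        # 3 scales x 4 filters = 12 feature maps
```

Because every scale preserves the input length, the feature maps from the three blocks can be concatenated directly along the filter axis.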

Multi-scale CNN (MS-CNN):
The proposed multi-scale CNN architecture has been presented in Fig. 4. At first, the input EEG signal has been divided into four frequency-band channels and passed through the corresponding MSCB blocks (i.e., MSCB$_i$, $i = \delta, \theta, \alpha, \beta$) to obtain multi-scale features from the EEG signal as shown in Fig

optimization process of the MS-CNN network and increases the classification accuracy for MI-BCI application [30]. Moreover, cross-entropy has been used as the loss function to optimize the model during training. The cross-entropy $L_{CE}$ for $n$ classes can be defined as:

$$L_{CE} = -\sum_{i=1}^{n} t_i \log(p_i)$$

where $t_i$ is the truth label and $p_i$ is the Softmax probability for the $i$-th class. In the proposed model, a trial matrix of size $1 \times N_{L_s}$ (where $N_{L_s}$ is the number of data points of each EEG trial) has been input to the MS-CNN, where $L_s$ is the EEG signal segment of interest and $N_c$ is the number of EEG channels. The time period of an MI trial ranges from 3.5 s to 7.5 s as shown in Fig. 2-(b). Thus, with the sampling frequency of 250 Hz, we obtain a trial of $L_s$ = 4 s with $N_{L_s}$ = 1000 data points.
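As a small worked example of the cross-entropy loss and of the trial-length arithmetic above, the sketch below computes the Softmax probabilities for a two-class output and the resulting loss for a one-hot truth label. The logit values and function names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable Softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(t, p, eps=1e-12):
    """Cross-entropy for one sample: minus the sum of t_i * log(p_i).

    eps guards against log(0); it is an implementation detail added here.
    """
    return float(-np.sum(t * np.log(p + eps)))

logits = np.array([2.0, 0.5])   # hypothetical network outputs (left, right)
p = softmax(logits)             # Softmax probabilities p_i
t = np.array([1.0, 0.0])        # one-hot truth label t_i (class 1: left hand)
loss = cross_entropy(t, p)

# Trial length: the 3.5 s to 7.5 s window sampled at 250 Hz
FS = 250
n_ls = int((7.5 - 3.5) * FS)
```

With a one-hot label, the sum reduces to the negative log-probability assigned to the true class, so the loss shrinks as the network becomes more confident in the correct hand.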

EEG Feature Extraction:
Due to the inter-subject variability of EEG signals, the accuracy of the classifier often diminishes. Thus, it is critical to extract user-specific feature information from the EEG data to improve classification accuracy in MI-based BCIs. Although many features, such as statistical, time-domain, frequency-domain, wavelet, and auto-regressive coefficients, can be extracted from EEG signals [52], in the proposed framework two highly discriminant user-specific features have been extracted and integrated into the MS-CNN model, which improves the accuracy and performance of the model.

Differential entropy:
From the MI-EEG signal data segment, the differential entropy (DE) feature has been extracted, which has demonstrated superior performance compared to commonly used features [53]. The DE of a given EEG signal $X$ following the Gaussian distribution $N(\mu, \sigma^2)$ can be expressed as

$$h(X) = -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \log\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right] dx = \frac{1}{2}\log\left(2\pi e \sigma^2\right)$$

where $\sigma$ is the standard deviation and $\mu$ is the mean value. For a particular EEG signal segment, the DE feature is equivalent to the logarithm of the energy spectrum in a distinct frequency band. Thus, a bandpass filter has been employed to extract four different frequency bands: δ (1-4 Hz), θ (4-8 Hz), α (8-13 Hz), and β (13-30 Hz).

Neural power spectra:
Typically, EEG signal data has been analyzed using only canonically defined frequency bands, ignoring the aperiodic component, which may compromise physiological interpretations. However, EEG neural data contains both periodic and aperiodic components [42]. Recently, the neural power spectra (NPS) model has been introduced, which combines both the aperiodic component and putative periodic oscillatory peaks [42].
In this model, the neural PSD has been characterized by the power, specific center frequency, and bandwidth of its peaks, without requiring predefined EEG frequency bands and while controlling for the aperiodic component. Additionally, the characteristics of these aperiodic components allow one to measure and compare the 1/f-like components across subjects, capturing the inter-subject variability of the EEG signals. This model has been utilized to extract periodic and aperiodic features of the EEG data in the present study. To measure the periodic activity, the power relative to the aperiodic component has been calculated, with each peak described by a Gaussian in terms of the parameters $a$, $c$, and $w$, where $a$ is the height of the peak over and above the aperiodic component, $c$ is the center frequency of the peak, $w$ is the width of the peak, and $F$ is the array of frequency values. The $n$-th Gaussian, referred to as $G(F)_n$, can be expressed as [42]:

$$G(F)_n = a \exp\left(-\frac{(F - c)^2}{2w^2}\right)$$

The aperiodic activity component, which has no characteristic frequency, can be expressed as the function $L(F)$ as follows:

$$L(F) = b - \log\left(k + F^{\chi}\right)$$

where $b$ is the broadband offset, $\chi$ is the exponent of the aperiodic fit, and $k$ is the 'knee'. Finally, across a set of frequencies $F$, the NPS can be expressed as

$$NPS(F) = \sum_{n} G(F)_n + L(F)$$

For better clarity, an example of an NPS containing a strong peak in the α band, with overlapping periodic and aperiodic spectral features for the two classes, has been shown in Fig. 5-(a, b). The power ratio (spectral power in a bin normalized by the power in all spectral bins) for the four canonical frequency ranges from the electrodes C3, Cz, and C4, shown in Fig. 5-(c, d, e), reveals that the α-range power decreases on the opposite (i.e., contralateral) hemisphere when MI on one side is performed for a particular class. The differences between the PSDs of the different electrodes are apparent; in particular, the relative PSDs of the C3 (see Fig. 5-c) and C4 (see Fig. 5-e) channels vary between the two MI classes.
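The NPS model can be evaluated directly from its parameters. The sketch below builds a spectrum from one Gaussian peak and the aperiodic component, then recovers the peak frequency by subtracting the aperiodic part. It assumes base-10 logarithms for the aperiodic term (the convention of the FOOOF library cited in this work); the parameter values and function names are made up for illustration, and this is a forward evaluation of the model, not the fitting procedure.

```python
import numpy as np

def gaussian_peak(F, a, c, w):
    """Periodic peak: height a, center frequency c, width w."""
    return a * np.exp(-((F - c) ** 2) / (2.0 * w ** 2))

def aperiodic(F, b, k, chi):
    """Aperiodic component: offset b, knee k, exponent chi (log10 assumed)."""
    return b - np.log10(k + F ** chi)

def nps(F, peaks, b, k, chi):
    """Model spectrum: aperiodic component plus the sum of all Gaussian peaks."""
    total = aperiodic(F, b, k, chi)
    for a, c, w in peaks:
        total = total + gaussian_peak(F, a, c, w)
    return total

F = np.linspace(1.0, 30.0, 291)     # 1 to 30 Hz in 0.1 Hz steps
# Hypothetical parameters: one alpha peak at 10 Hz on a 1/f-like background
model = nps(F, peaks=[(0.8, 10.0, 1.5)], b=1.0, k=0.0, chi=1.2)

# Removing the aperiodic part isolates the periodic activity
periodic_only = model - aperiodic(F, 1.0, 0.0, 1.2)
peak_freq = F[np.argmax(periodic_only)]
```

Separating the two components in this way is what allows the peak parameters and the aperiodic parameters to be used as distinct features.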
Clearly, NPS is a highly discriminant user-specific feature that improves the accuracy and performance of the model significantly (see section 6.3). In the present study, NPS features containing periodic ($c$, $w$, and $P_{band}$) and aperiodic ($b$, $k$, and $\chi$) parameters have been extracted for each of the four frequency bands for each channel and finally concatenated with the MS-CNN main feature matrix as shown in Fig. 8.
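The per-band, per-channel feature assembly can be illustrated with the DE feature, which has the closed form $\frac{1}{2}\log(2\pi e \sigma^2)$ under the Gaussian assumption stated in the differential entropy subsection. In this sketch the band-passed data are synthetic, the 64-dimensional CNN feature vector is a stand-in, and the flattening order is an assumption; NPS parameters could be appended to the same vector in the same manner.

```python
import numpy as np

def differential_entropy(x, axis=-1):
    """Closed-form DE of a Gaussian signal: 0.5 * log(2 * pi * e * variance)."""
    var = np.var(x, axis=axis)
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

rng = np.random.default_rng(0)
# Synthetic band-passed trial: 4 bands x 3 channels x 1000 samples,
# with a different (hypothetical) standard deviation per band
band_std = np.array([1.0, 2.0, 0.5, 3.0])[:, None, None]
banded = rng.standard_normal((4, 3, 1000)) * band_std

de = differential_entropy(banded)   # one DE value per band and channel: (4, 3)
de_vector = de.reshape(-1)          # flattened to 12 values

# Stand-in for the flattened MS-CNN feature vector (size chosen arbitrarily)
cnn_features = rng.standard_normal(64)
fused = np.concatenate([cnn_features, de_vector])

# Sanity check value: DE of a unit-variance Gaussian sample
de_unit = float(differential_entropy(rng.standard_normal(200000)))
```

Bands with larger variance yield larger DE values, so the feature reflects the band energy on a logarithmic scale.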

Data augmentation of EEG signal:
In a DL-based MI-BCI system, the accuracy of the MI classifier depends greatly on the volume of EEG training data. Without sufficient data, the accuracy may drop. Hence, to improve the accuracy and robustness of the CNN classifier, data augmentation (DA) methods can be employed to generate new sample data from the existing EEG training samples [54]. However, improper DA might decrease the performance of the classifier. In the present study, four different types of DA methods have been chosen and implemented specifically for EEG-based MI-BCI systems to increase the volume of EEG data during training and improve the performance of the classifier (see section 6.4).

Gaussian noise:
In the first DA method, noise has been added to the original training data. The EEG signal has strong randomness and highly non-stationary characteristics.
Thus, randomly added local noise may alter important EEG features. In order to preserve the important local features of the EEG data, Gaussian noise (GN) has been added to the original training data [53]. The probability density function $P_G$ of a Gaussian random variable $\xi$ can be expressed as follows:

$$P_G(\xi) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(\xi - \zeta)^2}{2\sigma^2}\right)$$

where $\sigma$ is the standard deviation and $\zeta$ is the mean value. In the present work, $\zeta = 0$ has been

from the same training class data [55]. If $\Delta = \{S_i\}$, $i \in [1, n]$, is the set of $n$ EEG training trials for a given class, each training trial $S_i$ of that class has been subdivided into $K$ consecutive, non-overlapping segments $S_i^K$, and a new trial $\tilde{S}_j = [S_m^j, S_n^j, \ldots, S_n^j]$ has been generated by combining segments from different, randomly selected training trials of the same class. The schematic of the S&R DA procedure has been shown in Fig. 6. Consider original trials $S_m$ and $S_n$ from the same class that have been segmented into $\ldots, S_m^i, S_m^j, S_m^k, \ldots$ and $\ldots, S_n^i, S_n^j, S_n^k, \ldots$, respectively. These segments have been recombined in the time domain to obtain $N$ additional DA trials. In the present work, only input EEG trials with the same label (left- or right-hand MI) have been recombined, and each trial has been segmented into 4 segments of 250 data points (i.e., 1 s long segments).
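The S&R procedure described above can be sketched as follows: every new trial is assembled segment by segment, with each time segment copied from a randomly chosen trial of the same class at the same position. The function name, the synthetic data, and the number of generated trials are assumptions for the example; the 4 segments of 250 points follow the text.

```python
import numpy as np

def segment_and_recombine(trials, n_segments, n_new, rng):
    """Generate n_new trials by recombining time segments within one class.

    trials: array (n_trials, n_channels, n_samples), all with the SAME label.
    The k-th segment of a new trial is the k-th segment of a randomly
    selected original trial, so temporal order is preserved.
    """
    n_trials, n_channels, n_samples = trials.shape
    seg_len = n_samples // n_segments
    new = np.empty((n_new, n_channels, seg_len * n_segments), dtype=trials.dtype)
    for j in range(n_new):
        donors = rng.integers(0, n_trials, size=n_segments)  # one donor per segment
        for k, d in enumerate(donors):
            sl = slice(k * seg_len, (k + 1) * seg_len)
            new[j, :, sl] = trials[d, :, sl]
    return new

rng = np.random.default_rng(0)
# Synthetic stand-in for the left-hand training trials: 20 trials x 3 channels x 1000
left_trials = rng.standard_normal((20, 3, 1000))
augmented = segment_and_recombine(left_trials, n_segments=4, n_new=10, rng=rng)
```

The same call would be repeated separately for the right-hand class so that segments are never mixed across labels.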

Window slicing:
In the window slicing (WS) DA method, slices have been extracted from the EEG time series data and classification has been performed at the slice level [56]. During training, each slice has been assigned the same class label as the trial it was taken from, and the size of the slice is one of the parameters of the DA. In the present work, a randomly positioned window covering 90% of each training EEG trial has been selected and interpolated back to the original length to fit the classifier [56,57].
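A minimal sketch of the window slicing step is given below, assuming linear interpolation for the resampling back to the original length (the text does not name the interpolation scheme). The function name and the synthetic trial are hypothetical; the 90% window ratio is from the text.

```python
import numpy as np

def window_slice(trial, ratio, rng):
    """Cut a random window covering `ratio` of the trial and stretch it back.

    trial: array (n_channels, n_samples). Returns an array of the same shape.
    Assumption: linear interpolation is used to restore the original length.
    """
    n_channels, n_samples = trial.shape
    win = int(round(ratio * n_samples))
    start = rng.integers(0, n_samples - win + 1)   # random window position
    sliced = trial[:, start:start + win]
    src = np.arange(win)                           # sample positions of the slice
    dst = np.linspace(0, win - 1, n_samples)       # positions after stretching
    return np.stack([np.interp(dst, src, ch) for ch in sliced])

rng = np.random.default_rng(0)
trial = rng.standard_normal((3, 1000))             # synthetic 4 s trial
sliced = window_slice(trial, ratio=0.9, rng=rng)   # 900-sample window, stretched
unchanged = window_slice(trial, ratio=1.0, rng=rng)
```

With a ratio of 1.0 the window covers the whole trial and the output equals the input, which is a convenient check of the resampling logic.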

Window warping:
In the last DA technique, window warping (WW) [56] has been utilized, which expands or contracts random windows of the EEG training data by a specific factor. Considering the length of the original EEG signal as a parameter, WW warps a randomly selected segment as shown in Fig. 7. Although WW generates input time series of different lengths, this issue can be overcome by performing DA on transformed EEG data having equal lengths [56]. The present study considers a random window of 10% of the original EEG data and warps it by speeding it up by 2 or slowing it down by 0.5.
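The warping step can be sketched as below. Two interpretive assumptions are made and should be checked against the original implementation: the warp factors 0.5 and 2 are treated as multipliers on the length of the selected window, and the warped trial is linearly resampled back to the original length so that all inputs stay the same size. Function names and the synthetic trial are hypothetical.

```python
import numpy as np

def _resample(x, new_len):
    """Linearly resample each channel of x (n_channels, n_samples) to new_len."""
    old = np.arange(x.shape[-1])
    new = np.linspace(0, x.shape[-1] - 1, new_len)
    return np.stack([np.interp(new, old, ch) for ch in x])

def window_warp(trial, rng, window_ratio=0.1, scales=(0.5, 2.0)):
    """Stretch or compress one random window, then restore the trial length.

    Assumption: `scales` multiply the window length (0.5 halves it, 2 doubles it).
    """
    n_channels, n_samples = trial.shape
    win = int(round(window_ratio * n_samples))
    start = rng.integers(1, n_samples - win)       # keep the window inside the trial
    scale = rng.choice(scales)
    warped_win = _resample(trial[:, start:start + win],
                           max(2, int(round(win * scale))))
    warped = np.concatenate(
        [trial[:, :start], warped_win, trial[:, start + win:]], axis=1)
    return _resample(warped, n_samples)            # back to the original length

rng = np.random.default_rng(0)
trial = rng.standard_normal((3, 1000))             # synthetic 4 s trial
warped = window_warp(trial, rng)
```

Resampling at the end is one way to address the unequal-length issue mentioned in the text; padding or cropping would be alternatives.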

Results and discussion:
In this section, the performance and accuracy of the proposed model have been discussed and compared with several existing methods. The BCI competition IV-2b EEG dataset includes 9-subject and 2-class motor-imagery (right hand, left hand) with five sessions for each subject. The first two sessions (identifiers: 01T and 02T ) have been used to train the classifier for all models [35]. The third session (identifiers: 03T ) has been employed during validation.
The last two sessions (identifiers: 04E and 05E) have been strictly utilized for evaluating the corresponding trained classifier [35,36]. The EEG trial length of $L_s$ = 4 s with $N_{L_s}$ = 1000 data points has been used. The proposed MS-CNN model has been fine-tuned on the valida-

Additionally, the MNE v0.23 [58], PyEEG [59], NeuroDSP [60], and FOOOF [61] libraries have been utilized for data pre-processing, feature extraction, and spectral-domain and NPS analysis.
The flowchart representing the overall workflow of the proposed framework has been shown in Fig. 8. For each subject, the experiment has been executed 10 times. The accuracy per-

The evaluation metrics $P$, $R$, and $F1$ can be defined as

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \, P \, R}{P + R}$$

where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives, respectively. Larger values of $P$, $R$, and $F1$ indicate better performance of the model.

The ROC curve has been plotted from the true positive rate (TPR), or $R$ (on the ordinate), and the false positive rate (FPR) data (on the abscissa) from the SVM, standard CNN, and MS-CNN results, as shown in Fig. 10-(b). The area under the ROC curve is denoted AUC and ranges from 0.5 to 1. The closer the AUC is to 1.0, the higher the performance of the model. As shown in

Fig. 12-(a), it is clear that feature integration significantly improves the accuracy of the model when accuracy increases 10.98% compared to the situation when the proposed model

Table 6, κ values obtained from the MS-CNN model have been compared with some state-of-the-art machine learning models for the BCI Competition IV 2b dataset. These ML models have utilized different feature extraction methodologies, including CSP [63], filter bank CSP (FBCSP) [64], the Hilbert transform (HT) [65], wavelet packet decomposition (WPD) [16], and the empirical mode decomposition (EMD) algorithm, together with different classification methods such as SVM [64,66], least squares twin SVM (LSTSVM) [63], LDA [65], and K-NN [16].
From the overall comparison, it can be seen that the proposed MS-CNN model significantly outperforms the other ML models in terms of κ values for all nine subjects, as shown in Fig. 13. Comparing the average κ, MS-CNN has achieved the highest κ value of 0.92, which is a 22.2% and 26.01% improvement over the state-of-the-art ML methods in [16] and [64], respectively.

Comparison with different state-of-the-art DL models:
In this section, the accuracy of several state-of-the-art DL models, such as the separated channel convolutional neural network (SCCNN) [25], deep transfer CNN (DTCNN) [29], CNN and stacked autoencoders (CNN+SAE) [14], 1-D multi-scale CNN (1DMSCNN) [30], frequential deep belief network (FDBN) [27], and hybrid-scale CNN (HS-CNN) [32], has been compared with the proposed model as detailed in Table 7

[67] where the evaluation phase is independent of the particular subject training class. Additionally, since high-performance person-independent classification is essential for the wide application of BCI systems in the real world, one possible solution is to build a personalized model with transfer learning. An efficient transfer model can adopt a transductive parameter to construct an individual classifier, which can be further extended to an adaptive transfer learning classifier [68].
Another direction could be the implementation of unsupervised or semi-supervised learning [69] to circumvent the expensive and time-consuming manual labeling required in supervised learning when performing classification tasks with abundant class labels for a wide range of MI-BCI scenarios.
Moreover, in future work, classification accuracy can be further improved by integrating a long short-term memory (LSTM) recurrent neural network (RNN) architecture [70] or a self-attention-based transformer [71] for extracting semantic temporal-spatial features of the EEG signal, and the proposed framework can be extended to classify multi-class MI for different BCI applications.
Here, some important future research directions, in particular those geared toward applications of BCI in the healthcare community, are outlined. Firstly, the current framework can be utilized in medical and health care, where deep learning-based BCI systems predominantly work on the detection and diagnosis of mental diseases such as sleeping disorders, Alzheimer's disease, epileptic seizure, and other disorders. The MS-CNN model can be widely adopted for its feature engineering and real-time classification of spontaneous EEG stream-based neurodegenerative diseases such as Parkinson's disease [72]. Moreover, the current model can be suitable for classifying Alzheimer's disease (AD) based on spontaneous EEG [73] and for the diagnosis of epileptic seizures. In such scenarios, a hybrid model in which a recurrent neural network (RNN) architecture is attached to the MS-CNN model for tempo-spatial feature extraction can be utilized in seizure diagnosis [74,75]. Additionally, different mental diseases such as depression [76], Interictal Epileptic Discharge (IED) [77], schizophrenia [78], Creutzfeldt-Jakob disease (CJD) [79], and Mild Cognitive Impairment (MCI) [80] can be detected employing the current BCI deep learning model. Furthermore, the current framework can be extended to more reliable and robust MI-based real-time brain-signal communication applications such as robotic control [5,6,7], the P300 speller [81], rehabilitation of neuromotor disorders [4], text entry and speech communication [8,9], cognitive load measurement [82], and gaming [2,3]. The current model can also be extended to various material modeling [83,84,85,86,87,88,89,90].

Conclusion:
Summarizing, in this study, a multi-scale convolutional neural network has been designed for EEG-based MI classification.
The multi-scale convolution block consisting of different convolutional kernel sizes in the proposed model can extract semantic features in multiple scales for different frequency bands δ, θ, α, and β from original EEG data for the classification purpose.
Several intrinsic and user-specific features have been extracted from the original EEG data and integrated into the proposed algorithm to improve the accuracy and performance of the model.
Furthermore, various data augmentation methods have been utilized to further improve the accuracy and robustness of the proposed classifier by increasing the amount of training EEG data. In order to validate the effectiveness of the framework, the proposed model has been applied to the BCI competition IV-2b dataset. Compared with other existing state-of-the-art algorithms, the classification accuracy of the current algorithm has been significantly improved. The results show that the proposed algorithm can attain high classification accuracy with similar performance among the different subjects. With an average accuracy of 93.74%, the current framework demonstrates excellent classification performance and generalization. It improves the average classification accuracy by 11.13%, 9.74%, and 6.14% over the recent and more advanced DL models 1DMSCNN, FDBN, and HS-CNN, respectively (with up to an 18.1% increase in accuracy in the subject-specific case). The proposed model can extract more effective features from EEG signals and can be used to design the efficient and accurate real-time