Identifying a Suitable Signal Processing Technique for MI EEG Data

Motor imagery (MI) electroencephalography (EEG) is attracting considerable attention from researchers because of its real-world applications. EEG signals are highly non-stationary, which makes their analysis nontrivial; hence, choosing an appropriate signal processing approach is crucial. This comparative paper aims to identify a suitable signal processing method among well-known approaches, namely the short-time Fourier transform (STFT), the continuous wavelet transform (CWT), and two variations of the discrete wavelet transform (DWT): the maximal overlap DWT (MODWT) and the MODWT multiresolution analysis (MODWTMRA). Different mother wavelet basis filters were tested with the wavelet methods: Morse, Amor, Bump, Symlets, Daubechies, Coiflets, and Fejér-Korovkin. The methods were evaluated on the classification of right-hand and left-hand motor imagery tasks using the brain-computer interface (BCI) competition IV 2b dataset. A shallow convolutional neural network containing a single convolution layer was trained and then used for classification. The experimental results indicated that MI EEG signals can be better analyzed and recognized using the maximal overlap-based signal processing methods: in terms of classification accuracy, MODWT and MODWTMRA with the Symlets wavelet outperformed the other methods.


1. INTRODUCTION
Comprehensive research has been conducted on the electroencephalogram (EEG) for various purposes, including medical diagnosis and treatment, person identification and authentication, social interaction, and many other applications [1][2][3][4]. One of the well-known types of EEG recordings adopted in brain-computer interface (BCI) systems is motor imagery (MI) [5,6], which is a noninvasive technique and thus easy to record from a subject. MI EEG-based BCI has emerged as a promising technology with potential applications in the medical and nonmedical fields. Simply put, a motor imagery task involves imagining the movement of a specific limb or joint without actually carrying the movement out. This imagination produces changes in the brain's electrical signals that can be captured at the cortex and used to issue control instructions to an external BCI device. EEG technology, in general, is characterized by high temporal resolution, safety for the subject (the recording process carries no risk), low cost, and portable devices. However, EEG data analysis is difficult because of the non-stationarity of the recorded signals, the multichannel recording strategy, the correlation between channels, and the presence of extrinsic artifacts and noise [7,8]. Consequently, there is a need for robust and efficient systems that can extract relevant information from these signals.
The employment of MI EEG technology in real-life applications faces several challenges. For example, the EEG sensors (electrodes) need to be improved to make them user-friendly, and signal processing approaches need to be enhanced, particularly in terms of sampling rate and classification methods. Additionally, careful consideration must be given to the choice of technology to ensure that the end-user device is fast, reliable, robust, cheap, wearable, and portable.
Deep learning (DL) approaches have recently been used to learn the patterns of different EEG tasks [7]. DL algorithms automatically extract, select, and classify features without requiring the designer to decide in advance which features to use. However, DL approaches typically require a large amount of data for large-scale neural networks to be trained effectively [9], which means that a considerable number of training samples (trials in MI EEG) must be available to obtain a reliable and robust classifier. The available MI EEG datasets comprise few trials per subject, which makes the use of DL techniques difficult. This problem can be overcome by using shallow neural networks, especially when the number of classes is small.
Performing a specific motor imagery task results in a reduction in spectral power in the mu band (8-13 Hz), referred to as event-related desynchronization (ERD), and an increase in spectral power in the beta band (13-30 Hz), referred to as event-related synchronization (ERS) [7]. These changes in energy levels within known frequency bands can be used to create images that capture the ERD/ERS patterns and, consequently, can be used to train a neural network.
Time-frequency representation techniques, such as Fourier and wavelet transforms, can be used to create these images. Generally, three input formulations are used in EEG signal analysis: calculated (hand-engineered) features, time-series signals, and time-frequency spectral images [7]. The calculated-features formulation is the conventional way of providing neural networks with training data as vectors. This input type is suitable only for small datasets because feature calculation may take a long time on large-scale datasets. Also, generating features may cause information loss, which ultimately affects classification accuracy. The time-series formulation uses the amplitude of the signals in the time domain; it offers the advantage of end-to-end training without feature extraction and without a third-party algorithm to generate another form of data, such as images. However, the raw time-series formulation cannot jointly capture all the spatial, temporal, and frequency information of the two important features, ERD and ERS. To formulate EEG data as images, various techniques, such as the STFT and the wavelet transform, can be used. The output feature maps are represented as two-dimensional or three-dimensional matrices. Nevertheless, time-frequency representations neglect the spatial information related to the location of the electrodes, which is essential in EEG analysis [10]. To overcome this problem and achieve a reliable classification model, the sub-images of the electrode signals are arranged in a form that preserves spatial information.
This paper aims to quantify the classification accuracy of the most well-known time-frequency representation approaches: the short-time Fourier transform (STFT), the continuous wavelet transform (CWT), the maximal overlap discrete wavelet transform (MODWT), and the MODWT multiresolution analysis (MODWTMRA). The effect of these techniques is also studied under different mother wavelets and in the presence and absence of the eye blink artifact. The remainder of the paper is structured as follows: Section 2 reviews recent related work. Section 3 describes the approaches used in this research, including the compared signal processing methods, the MI EEG dataset, the architecture of the neural network, and other details. Section 4 presents and discusses the experimental results. Finally, Section 5 concludes the paper.

2. LITERATURE REVIEW
In recent years, a common approach for analyzing EEG trials has been to transform them into two-dimensional images using time-frequency representation approaches, with STFT and CWT being the most frequently used methods. Transforming the data this way makes it possible to capture the power spectrum in each frequency band of a signal. Additionally, the spatial information of the recording electrodes can be preserved by arranging the resulting sub-images in the order of the recorded signals. Hence, all the information related to time, frequency, and space (electrode location) can be used to train a neural network.

3.2. Formulation of Input Images
By leveraging the energy changes within known frequency bands, it is possible to create meaningful images that capture the patterns of ERD and ERS. These patterns are then used for training a neural network. Time-frequency representation techniques can be employed to generate the images. As the dataset was recorded with three electrodes, three images were obtained for each trial. From the image of each electrode, two sub-images were extracted, corresponding to the mu band (8-13 Hz) and the beta band (13-30 Hz), to capture the two phenomena, ERD and ERS. The sizes of the sub-images were then unified to ensure equal weighting for both bands. For each trial, the resulting sub-images were arranged vertically and combined to preserve the electrodes' spatial locations and form the final training image.
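The image formulation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`band_rows`, `resize_rows`, `build_training_image`), the nearest-neighbour resizing, and the 16 rows per sub-image are all hypothetical. The value 16 is chosen only because 3 electrodes × 2 bands × 16 rows gives the image height of 96 reported for the network input.

```python
# Illustrative sketch only: helper names and sizes are assumptions, not the paper's code.

MU_BAND = (8.0, 13.0)     # Hz, band in which ERD is observed
BETA_BAND = (13.0, 30.0)  # Hz, band in which ERS is observed


def band_rows(tf_image, freqs, f_lo, f_hi):
    """Keep the rows of a time-frequency image whose frequency lies in [f_lo, f_hi]."""
    return [row for row, f in zip(tf_image, freqs) if f_lo <= f <= f_hi]


def resize_rows(sub_image, n_rows):
    """Nearest-neighbour resampling along the frequency axis to exactly n_rows rows."""
    src = len(sub_image)
    return [list(sub_image[(i * src) // n_rows]) for i in range(n_rows)]


def build_training_image(electrode_images, freqs, rows_per_band=16):
    """Stack the mu and beta sub-images of every electrode vertically."""
    image = []
    for tf_image in electrode_images:          # electrode order, e.g. C3, Cz, C4
        for f_lo, f_hi in (MU_BAND, BETA_BAND):
            sub = band_rows(tf_image, freqs, f_lo, f_hi)
            image.extend(resize_rows(sub, rows_per_band))
    return image


# Toy input: 3 electrodes, 40 frequency rows (1..40 Hz), 50 time columns.
freqs = [float(f) for f in range(1, 41)]
electrode_images = [[[1000 * e + f] * 50 for f in range(1, 41)] for e in range(3)]
image = build_training_image(electrode_images, freqs)
print(len(image), len(image[0]))  # 96 rows, 50 columns
```

Resizing both bands to the same number of rows is what gives the narrow mu band (5 Hz wide) the same weight in the image as the wider beta band (17 Hz wide).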

3.3. Short-time Fourier Transform (STFT)
STFT is a time-series signal processing technique used to analyze the spectrum of a long signal by windowing it into shorter segments and performing the Fourier transform (FT) on each segment. The spectral features within a particular period are analyzed sequentially by changing the window's position [19]. The STFT is calculated as in Eq. (1).
$$\mathrm{STFT}(\tau, f) = \int_{-\infty}^{\infty} x(t)\, h(t - \tau)\, e^{-j 2 \pi f t}\, dt \tag{1}$$
where $x(t)$ denotes a time-series signal, i.e., a single electrode signal of MI EEG in this study, $h(t)$ denotes a window function, and $\tau$ is the position of the window as it moves over the signal. STFT applies a window function that is localized in time and frequency to the EEG signals and estimates the power spectrum at various time points, which transforms the signals from the time domain to the time-frequency domain and helps network models learn from them. The time window determines the temporal resolution of the transformed signal at every point. The frequency window represents the range of frequencies captured in the generated matrix. As the time-domain window size increases, the frequency resolution improves while the temporal resolution degrades.
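As a rough illustration of the windowing procedure, the sketch below evaluates a discrete STFT power spectrum with a direct (slow) DFT per window. It is a minimal sketch under assumptions: the Hann window, the window length of 125 samples, the hop of 25 samples, and the function names are hypothetical choices, since the section does not state the settings that were actually used. Only the 250 Hz sampling rate comes from the dataset description.

```python
import cmath
import math


def hann(n_win):
    """Periodic Hann window of length n_win (one possible choice for h)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_win) for n in range(n_win)]


def stft_power(x, fs, n_win, hop):
    """Return (freqs, frames), where frames[i][k] is |STFT|^2 for window i and bin k."""
    h = hann(n_win)
    n_bins = n_win // 2 + 1
    freqs = [k * fs / n_win for k in range(n_bins)]
    frames = []
    for start in range(0, len(x) - n_win + 1, hop):   # slide the window along the signal
        seg = [x[start + n] * h[n] for n in range(n_win)]
        frame = []
        for k in range(n_bins):                       # direct DFT of the windowed segment
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_win)
                      for n in range(n_win))
            frame.append(abs(acc) ** 2)
        frames.append(frame)
    return freqs, frames


# Toy signal: one second of a 10 Hz sinusoid sampled at 250 Hz.
fs = 250
x = [math.sin(2 * math.pi * 10 * n / fs) for n in range(250)]
freqs, frames = stft_power(x, fs, n_win=125, hop=25)
peak = max(range(len(freqs)), key=lambda k: frames[0][k])
print(freqs[peak])  # 10.0
```

In practice an FFT-based routine would replace the inner loop; the direct sum is kept here only so that the correspondence with the integral form is easy to follow.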

3.4. Continuous Wavelet Transform (CWT)
CWT and FT are estimated in similar ways. FT estimates correlation coefficients between the original signal and sinusoidal signals, while CWT estimates correlation coefficients between the original signal and a predetermined mother wavelet base filter. Nevertheless, unlike FT, which decomposes the signal into the frequency domain, CWT maps the signal to a time-frequency domain by controlling the shape of the mother wavelet through scaling and shifting parameters. CWT [16] can be evaluated using Eq. (2).

$$\mathrm{CWT}(s, \tau) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{2}$$
where $x(t)$ denotes a time-series signal, $\psi$ denotes a pre-defined mother wavelet ($\psi^{*}$ is its complex conjugate), $\tau$ is the time-shifting parameter, i.e., the translation parameter, and $s$ is the scaling parameter.
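The scaling and shifting idea can be illustrated with a direct evaluation of the CWT sum. This is a minimal sketch under assumptions: it uses a complex Morlet wavelet with centre frequency parameter `w0 = 6`, which is a stand-in and not one of the Morse, Amor, or Bump filters tested in this study, and the mapping from scale to frequency, f ≈ w0·fs/(2πs), holds only for this wavelet. Function names are hypothetical.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (assumed stand-in for the paper's wavelets)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt(x, scales, w0=6.0):
    """Direct evaluation of the CWT for every scale s (in samples) and shift tau."""
    out = []
    for s in scales:
        half = int(4 * s)          # the wavelet is negligible beyond about 4 scale units
        row = []
        for tau in range(len(x)):
            acc = 0j
            for n in range(max(0, tau - half), min(len(x), tau + half + 1)):
                acc += x[n] * morlet((n - tau) / s, w0).conjugate()
            row.append(acc / math.sqrt(s))
        out.append(row)
    return out


# Toy signal: 10 Hz sinusoid at 250 Hz, analysed at scales matched to 5, 10, and 20 Hz.
fs, w0 = 250, 6.0
x = [math.sin(2 * math.pi * 10 * n / fs) for n in range(250)]
target_hz = [5.0, 10.0, 20.0]
scales = [w0 * fs / (2 * math.pi * f) for f in target_hz]
coeffs = cwt(x, scales, w0)
mags = [abs(row[125]) for row in coeffs]        # magnitude at the middle of the signal
print(target_hz[mags.index(max(mags))])         # scale with the strongest response
```

Large scales stretch the wavelet and respond to low frequencies with coarse time localization, while small scales compress it and respond to high frequencies with fine time localization.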

3.5. Discrete Wavelet Transform (DWT)
MODWT and MODWTMRA are variations of the discrete wavelet transform (DWT); therefore, this sub-section briefly introduces the DWT [20] before the mathematical model of MODWT is presented. The DWT quantifies the wavelet coefficients at the scale $2^{j}$ and the location $2^{j}k$ using Eq. (3).
$$W_{\psi}(j, k) = \sum_{n=0}^{N-1} x(n)\, \psi_{j,k}(n) \tag{3}$$
where $W_{\psi}(j, k)$ is a wavelet coefficient, $x(n)$ is the discrete input signal, $N$ is the signal length (an integer power of 2), the scaling parameter $a_0 = 2$, the translating parameter $b_0 = 1$, and the mother wavelet is given in Eq. (4).
$$\psi_{j,k}(n) = \frac{1}{\sqrt{a_0^{\,j}}}\, \psi\!\left(\frac{n - k\, b_0\, a_0^{\,j}}{a_0^{\,j}}\right) \tag{4}$$
where $j$ and $k$ are integer variables that control the scaling and translating parameters.
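To make the dyadic scale and location grid concrete, the following sketch implements a decimated DWT with the Haar wavelet, the simplest possible filter. It is an assumption-laden illustration: Haar is not among the wavelets compared in this study, and the function name is hypothetical. Each level halves the number of coefficients, which is why the signal length must be divisible by a power of two.

```python
import math


def haar_dwt(x, levels):
    """Decimated Haar DWT; len(x) must be divisible by 2**levels."""
    approx = list(x)
    details = []
    for _ in range(levels):
        pairs = [(approx[2 * k], approx[2 * k + 1]) for k in range(len(approx) // 2)]
        # Wavelet (detail) coefficients: scaled differences of neighbouring samples.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        # Scaling (approximation) coefficients: scaled sums, passed on to the next level.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details, approx


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
details, approx = haar_dwt(x, levels=3)
print([len(d) for d in details], len(approx))  # [4, 2, 1] 1
```

Because the transform is orthonormal, the energy of the input equals the summed energy of all detail coefficients plus the final approximation coefficients.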

3.6. Maximal Overlap Discrete Wavelet Transform (MODWT)
MODWT is a mathematical model that decomposes a time-series signal into scaling and multilevel wavelet coefficients. It has some advantages over the discrete wavelet transform (DWT). One key benefit of MODWT is that it handles signals of arbitrary length, whereas DWT is restricted to signals whose length is an integer power of two. Also, MODWT is shift-invariant, meaning that the pattern of the wavelet coefficients is unaffected by any shift in the input signal [21]. The $j$th-level MODWT wavelet coefficients $\tilde{W}_{j,t}$ and scaling coefficients $\tilde{V}_{j,t}$ for an input signal $X_t$ ($t = 0, \ldots, N - 1$) can be obtained from Eq. (5) and Eq. (6).
$$\tilde{W}_{j,t} = \sum_{l=0}^{L_j - 1} \tilde{h}_{j,l}\, X_{(t - l) \bmod N} \tag{5}$$
$$\tilde{V}_{j,t} = \sum_{l=0}^{L_j - 1} \tilde{g}_{j,l}\, X_{(t - l) \bmod N} \tag{6}$$
where $j$ is the level of decomposition ($j = 1, 2, \ldots, J$), $J$ is the highest decomposition level, and $L_j$ is the length of the level-$j$ filters. The two filters, the wavelet filter $\tilde{h}_{j,l}$ and the scaling filter $\tilde{g}_{j,l}$, are determined by the mother wavelet function used.
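The two properties highlighted above, arbitrary signal length and shift invariance, can be checked with a small MODWT sketch. This is a minimal illustration under assumptions: it uses the Haar filters (wavelet filter [1/2, -1/2] and scaling filter [1/2, 1/2]) rather than the Symlets, Daubechies, Coiflets, or Fejér-Korovkin filters of this study, it follows the usual pyramid formulation with circular filtering, and the function name is hypothetical.

```python
def modwt_haar(x, levels):
    """Pyramid-algorithm MODWT with the Haar filters and circular boundary handling."""
    n = len(x)
    v = list(x)
    w_all = []
    for j in range(1, levels + 1):
        d = 2 ** (j - 1)   # at level j the two filter taps are 2^(j-1) samples apart
        w = [(v[t] - v[(t - d) % n]) / 2 for t in range(n)]   # wavelet coefficients
        v = [(v[t] + v[(t - d) % n]) / 2 for t in range(n)]   # scaling coefficients
        w_all.append(w)
    return w_all, v


# A signal whose length (7) is not a power of two.
x = [2.0, 4.0, 1.0, 7.0, 3.0, 5.0, 6.0]
w, v = modwt_haar(x, levels=2)
print([len(level) for level in w], len(v))   # [7, 7] 7

# Circularly shifting the input shifts every coefficient series by the same amount.
x_shifted = x[-1:] + x[:-1]
w_s, v_s = modwt_haar(x_shifted, levels=2)
print(w_s[0] == w[0][-1:] + w[0][:-1])       # True
```

No downsampling takes place, so every level keeps as many coefficients as the input has samples; this redundancy is what distinguishes the maximal overlap transform from the decimated DWT.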

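3.7. MODWT Multiresolution Analysis (MODWTMRA)
The MODWT decomposes the energy of the input signal across detail coefficients and scaling coefficients. MODWTMRA, on the other hand, projects the signal onto wavelet subspaces and a scaling subspace. The MODWTMRA decomposes a signal $X$ into a low-pass filtered approximation component ($S_J$) and high-pass filtered detail components ($D_j$). The MODWTMRA mathematical model [22] can be presented as in Eqs. (7)-(9).
$$X = \sum_{j=1}^{J} D_j + S_J \tag{7}$$
$$D_{j,t} = \sum_{l=0}^{L_j - 1} \tilde{h}_{j,l}\, \tilde{W}_{j,(t + l) \bmod N} \tag{8}$$
$$S_{J,t} = \sum_{l=0}^{L_J - 1} \tilde{g}_{J,l}\, \tilde{V}_{J,(t + l) \bmod N} \tag{9}$$
where $S_J$ is the approximation component, $D_j$ is the level-$j$ detail component, and $\tilde{W}_{j,t}$ and $\tilde{V}_{J,t}$ are the MODWT wavelet and scaling coefficients.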
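The additive decomposition into details and an approximation can be sketched as follows. This is a minimal illustration under assumptions, not the routine used in the study: it again uses the Haar filters, and it obtains each component by inverting the MODWT with all other coefficients set to zero, which is one standard way of computing a multiresolution analysis. All function names are hypothetical.

```python
def modwt_step(v, d):
    """One forward Haar MODWT level; the two filter taps are d samples apart."""
    n = len(v)
    w = [(v[t] - v[(t - d) % n]) / 2 for t in range(n)]
    s = [(v[t] + v[(t - d) % n]) / 2 for t in range(n)]
    return w, s


def imodwt_step(w, v, d):
    """One inverse Haar MODWT level: recover the scaling coefficients one level down."""
    n = len(v)
    return [(w[t] - w[(t + d) % n] + v[t] + v[(t + d) % n]) / 2 for t in range(n)]


def modwt_mra_haar(x, levels):
    """Return (details, smooth) such that x[t] = sum_j details[j][t] + smooth[t]."""
    n = len(x)
    coeffs, v = [], list(x)
    for j in range(1, levels + 1):
        w, v = modwt_step(v, 2 ** (j - 1))
        coeffs.append(w)
    zeros = [0.0] * n
    details = []
    for j in range(1, levels + 1):
        # Invert with everything zeroed except the level-j wavelet coefficients.
        comp = imodwt_step(coeffs[j - 1], zeros, 2 ** (j - 1))
        for i in range(j - 1, 0, -1):
            comp = imodwt_step(zeros, comp, 2 ** (i - 1))
        details.append(comp)
    smooth = v
    for i in range(levels, 0, -1):   # invert with only the scaling coefficients kept
        smooth = imodwt_step(zeros, smooth, 2 ** (i - 1))
    return details, smooth


x = [2.0, 4.0, 1.0, 7.0, 3.0, 5.0, 6.0]
details, smooth = modwt_mra_haar(x, levels=2)
recon = [details[0][t] + details[1][t] + smooth[t] for t in range(len(x))]
print(all(abs(r - a) < 1e-12 for r, a in zip(recon, x)))  # True
```

Unlike the raw MODWT coefficients, the detail and approximation components are time-aligned with the original signal, so they can be compared with it sample by sample.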

3.8. Shallow Convolutional Neural Network
The implemented shallow CNN architecture comprised a convolution layer followed by a max-pooling layer. Fig. 3 shows a simple illustration of the network. The height of the 2D kernels used in the convolutional layer equaled the height of the training image, which was 96 in our case. The vertical direction of the kernel therefore captures information associated with both the electrodes' locations and frequency. The width of the kernels was 3; this horizontal direction of the kernel captures information associated with time. The output of the convolution layer was passed through the rectified linear unit (ReLU) activation function. The max-pooling layer subsampled the output of the convolution layer by a factor of 10. A fully connected layer with softmax classification was the final layer of the network. This last layer had two neurons to classify the two motor imagery classes, i.e., left hand and right hand. The network was trained on the generated images, which contain information about electrode location, time, and frequency.
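The layer sizes implied by this architecture can be worked out with simple arithmetic. The sketch below is an illustration under assumptions: the kernel size (96 × 3), the pooling factor (10), and the two output neurons come from the description above, whereas the image width of 250 columns, the 32 filters, the stride of 1, the absence of padding, and the non-overlapping pooling are hypothetical values chosen only to make the example concrete.

```python
def shallow_cnn_shapes(img_h, img_w, n_filters, kernel_w=3, pool=10, n_classes=2):
    """Output widths and learnable-parameter counts of the shallow CNN (assumed settings)."""
    kernel_h = img_h                    # the kernel spans the full image height
    conv_w = img_w - kernel_w + 1       # no padding, stride 1 (assumption)
    pool_w = conv_w // pool             # non-overlapping max-pooling (assumption)
    conv_params = n_filters * (kernel_h * kernel_w + 1)    # weights + one bias per filter
    fc_params = n_classes * (n_filters * pool_w + 1)       # weights + one bias per class
    return {
        "conv_output": (n_filters, 1, conv_w),
        "pool_output": (n_filters, 1, pool_w),
        "conv_params": conv_params,
        "fc_params": fc_params,
    }


shapes = shallow_cnn_shapes(img_h=96, img_w=250, n_filters=32)
for name, value in shapes.items():
    print(name, value)
```

Because the kernel is as tall as the image, the convolution collapses the vertical (electrode and frequency) axis to a single row and slides only along time, which keeps the number of learnable parameters small.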

3.9. Experimental Settings
The shallow CNN was implemented in MATLAB R2020a using the Deep Learning Toolbox. The learnable parameters of the network were optimized with the Adam algorithm. The initial learning rate was set to 0.0001 and was multiplied by a drop factor of 0.9 every ten epochs to achieve smoother learning toward the end of training. The first three sessions of the dataset were used to train the neural network, while the remaining two were used to test it. The complete training and testing procedure was carried out for two cases: training and testing on trials that contained eye blink artifacts (with artifacts), and training and testing on trials that did not contain the artifacts (without artifacts). Classification accuracy was used to evaluate the performance of the presented signal processing methods. The mathematical formula for the accuracy is given in Eq. (10).
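The learning-rate schedule and the accuracy measure can be written down compactly. The sketch below is hedged in two ways: it assumes that epochs are counted from zero, so the first drop happens when epoch 10 begins, and it assumes that Eq. (10) takes the common form of correctly classified trials divided by the total number of trials, expressed as a percentage. Function names and the toy labels are hypothetical.

```python
def learning_rate(epoch, initial=1e-4, drop=0.9, period=10):
    """Piecewise-constant schedule: multiply by `drop` after every `period` epochs."""
    return initial * drop ** (epoch // period)


def accuracy(predicted, actual):
    """Percentage of trials whose predicted class equals the true class (assumed form)."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


for epoch in (0, 9, 10, 25):
    print(epoch, learning_rate(epoch))

# Toy example with left-hand (L) and right-hand (R) labels.
print(accuracy(["L", "R", "R", "L"], ["L", "R", "L", "L"]))  # 75.0
```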

4. RESULTS AND DISCUSSION
The classification accuracy was evaluated for each subject in the dataset, and the average accuracy over the nine subjects was calculated for each of the presented signal processing methods. One of the hyperparameters that needs to be set is the minibatch size (MBS), which determines the number of samples (images) used for training the neural network at each iteration. Since STFT involves no choice of filter (such as a mother wavelet), different MBS values were tested with the STFT. Table 1 presents the average accuracy achieved by the STFT. It can be seen from Table 1 that 300 images per iteration achieved the highest accuracy; thus, MBS = 300 was adopted for all subsequent processing methods to ensure a fair comparison. Classification was clearly better without artifacts than with them, because artifacts mix signals of different shapes into the MI EEG; such signals are produced by the body's muscles and limbs, e.g., eye blinks, finger movements, and leg or hand movements. Table 2 presents the average accuracy achieved by the CWT with three different mother wavelets: Morse, Amor, and Bump. Again, the classification performance without artifacts was better than with artifacts. 3.
The selected studies used the same MI EEG dataset (BCIC IV 2b) to allow a reasonable comparison. The results obtained in this study are reasonable. The maximal overlap-based methods, i.e., MODWT and MODWTMRA, outperformed the other signal processing methods, which supports the view that MI EEG data require high-concentration methods to cope with the non-stationarity of the signals. Although [9], [11], and this study used almost the same general model (STFT + shallow CNN), the results of the three studies differed because a data augmentation method was used by [9], whereas no augmentation was used by [11] or this study. Also, training a neural network depends on several settings, such as the learning rate, the number of epochs, the batch size, the activation function, and the parameter initialization at the beginning of training. Variation in those settings could also cause some variation in the final performance.

5. CONCLUSION
MI EEG data exhibit high non-stationarity in their time-series signals, so selecting a proper signal processing method is essential. Hence, the present comparative study sought to identify the best method among the well-known approaches, namely STFT, CWT, MODWT, and MODWTMRA, with different types of mother wavelets. Among the tested combinations, MODWT and MODWTMRA achieved the best classification results, especially with the Symlets mother wavelet. The trade-off between time and frequency resolution exhibited by STFT limits its capability for analyzing MI EEG signals.
Also, the simple form of the wavelet transform, which dynamically changes the time-frequency resolution according to the frequency bands inherent in the signals, did not yield the best results. The approaches based on maximal overlapping wavelet filters showed superior classification performance.
Tabar and Halici [11] proposed a neural network model that combines a convolutional neural network (CNN) as an automatic feature extractor and stacked autoencoders (SAE) as a classifier. The suggested CNN was a shallow network comprising only one convolution layer, a pooling layer, and a classification layer; a shallow CNN was chosen so that it could be trained with the limited amount of available data. The authors applied STFT to generate 2D images to train and test their proposed network. Lu et al. [12] obtained the frequency representations of EEG signals by applying the fast Fourier transform (FFT) and wavelet packet decomposition (WPD) separately to compare the two approaches. They trained a deep belief network (DBN) of restricted Boltzmann machines (RBM) on the obtained frequency-domain features. Zhao et al. [13] utilized wavelet kernels within their deep convolutional neural network (CNN) to reduce the number of learnable parameters compared to traditional convolutional filters. Cropping augmentation, transfer learning, and an early stopping policy were implemented to improve the training process and reduce the risk of overfitting.
The studies in [14, 15] applied STFT to transform each raw signal of a multi-signal trial into a 2D spectral image. Sub-images related to certain frequency bands were then extracted and combined while preserving the electrodes' locations. Dai et al. [14] used the spectrograms to train a shallow CNN, whereas Xu et al. [15] used the generated images for transfer learning with the pre-trained VGG-16 neural network. Ortiz-Echeverri et al. [16] applied the blind source separation (BSS) technique to estimate the latent independent sources (components) from the raw EEG signals, with one component per electrode, in order to diminish the impact of probable noise and artifacts. The estimated components were then ordered according to the correlation between them. Subsequently, a 2D image was created for each trial by assembling the sub-images obtained for the estimated components using the CWT. The generated scalograms were used to train a small CNN architecture composed of two convolution layers.
Xie and Oniga [17] used a deep CNN with two branches, each consisting of two convolution layers. The first branch received EEG trials as raw time-series signals, while the second branch received EEG trials as images generated by the CWT. The features obtained from the two branches were combined and sent to the classification layer. Data transformation was used as a data augmentation method to increase the number of training samples. Hwang et al. [18] utilized the filter bank common spatial pattern (FBCSP) as a feature extraction method at different frequency bands. The information-theoretic feature selection (ITFE) algorithm was then used for feature selection. The selected features were used to train a long short-term memory (LSTM) network. The network consisted of three gates: the input gate for learning a new pattern, the forget gate for remembering or forgetting the last pattern, and the output gate for passing the updated pattern to the next step.

3. MATERIALS AND METHODS
This section describes the MI EEG dataset used, the formulation of the training images, the different time-frequency representation techniques tested, the shallow CNN, and the experimental setup. Fig. 1 depicts the general block diagram of the proposed methodology for identifying a suitable signal processing technique for MI EEG analysis. Briefly, the recorded 3-channel MI EEG signals were transformed into time-frequency spectral images and then used for training a neural network. Four main types of signal processing were used: STFT, CWT, MODWT, and MODWTMRA. STFT and CWT were selected because they are widely used for analyzing different types of signals; also, as presented in the literature review (Section 2), they have analyzed EEG signals satisfactorily. Both MODWT and MODWTMRA are recent variations of the wavelet transform and are worth trying for analyzing MI EEG signals.

3.1. Dataset
The BCI competition IV 2b dataset [7] is commonly used in EEG-based BCI research. It consists of EEG trials recorded from 9 subjects while executing motor imagery tasks. The dataset contains 5 sessions, with each subject performing 2 sessions on the first two days and 3 sessions on the following three days. Each session contains 120 or 160 trials, with each trial lasting for 4 seconds. Some of the provided trials in all five sessions contain eye blink artifacts. The EEG signals were recorded using 3 electrodes (C3, Cz, and C4) with a sampling rate of 250 Hz. The dataset includes two classes of MI tasks: the imagination of moving the left hand and the imagination of moving the right hand. A timing scheme of the recorded trials is illustrated in Fig. 2. The BCIC IV 2b dataset is widely used for developing and testing BCI systems that decode motor imagery tasks.

Fig. 1 Block Diagram of the Research's Methodology.

Fig. 2 Timing Outline of the Recorded MI EEG Trials.

Fig. 4 shows a visual comparison between the results achieved by MODWT and MODWTMRA with four different types of mother wavelets, namely Symlets (sym), Daubechies (db), Coiflets (coif), and Fejér-Korovkin (fk), and their different orders. Part A of the figure compares the two approaches in the presence of artifacts, while part B compares them in the absence of artifacts. Red cells indicate relatively poor classification, whereas green cells indicate relatively good classification. In general, the

Table 1 Average Accuracy of STFT for Different MBS.

Table 2 Average Accuracy of CWT.

Table 1 and Table 2. In contrast, the classification accuracy was higher in the absence of artifacts than in their presence. In most cases of mother wavelets, MODWTMRA achieved better results than MODWT. In the case of STFT, a short window provided a reasonable time resolution but a poor frequency resolution; conversely, a wider window provided a good frequency resolution but a poor time resolution. This time-frequency resolution trade-off may not be optimal for interpreting MI EEG signals. The multiresolution-based wavelet transforms, i.e., MODWT and MODWTMRA, were considered more suitable than the static time-frequency-based Fourier transform and the dynamic time-frequency-based wavelet transform. Moreover, in addition to including time and frequency information, the multiresolution analysis, i.e., MODWTMRA, allowed the signals to be analyzed at deep levels, all of which is essential in EEG analysis.
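The window-size trade-off mentioned above can be quantified with a short calculation. The sketch assumes the dataset's sampling rate of 250 Hz; the window lengths are hypothetical examples, since the window sizes actually tested are not listed in this section. The window duration is n_win/fs seconds and the spacing between frequency bins is fs/n_win Hz, so their product is always 1.

```python
def stft_resolution(fs, n_win):
    """Return (window duration in seconds, frequency-bin spacing in Hz)."""
    return n_win / fs, fs / n_win


fs = 250  # Hz, sampling rate of the BCI competition IV 2b recordings
for n_win in (64, 125, 256):   # hypothetical window lengths in samples
    duration, spacing = stft_resolution(fs, n_win)
    print(f"{n_win:4d} samples: {duration:.3f} s window, {spacing:.3f} Hz per bin")
```

With a 64-sample window the bins are about 3.9 Hz apart, so the 5 Hz wide mu band is covered by only one or two bins, whereas a 256-sample window resolves it into about five bins at the cost of averaging over roughly one second of signal.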

Table 3 Comparison of Obtained Results with Other Studies.