A Deep Learning-Based Classification Method for Different Frequency EEG Data

In recent years, research on electroencephalography (EEG) has focused on the feature extraction of EEG signals. The development of convenient and simple EEG acquisition devices has produced a variety of EEG signal sources and increasingly diverse EEG data. Thus, the adaptability of EEG classification methods has become significant. This study proposes a deep network model for autonomous learning and classification of EEG signals, which can self-adaptively classify EEG signals with different sampling frequencies and lengths. Feature extraction methods based on artificial design could not obtain stable classification results when analyzing EEG data with different sampling frequencies. By contrast, the proposed deep network model showed considerably better universality and classification accuracy, particularly for short EEG signals, as validated on two datasets.


Introduction
Epilepsy is characterized by recurrent seizures caused by the abnormal discharge of brain neurons, which often bring physical and psychological harm to patients. Approximately 50 million epilepsy patients have been documented globally, and epilepsy has become one of the most common nervous system diseases endangering human health worldwide. Brain waves are the summed postsynaptic potentials generated by numerous neurons when the brain is active. EEG records brain wave changes during brain activity and reflects the electrophysiological activity of brain neurons at the cerebral cortex or scalp surface [1]. Accordingly, brain wave analysis has become an effective and important method for the study of epilepsy.
Since the 1980s, scholars have been conducting research on epilepsy based on electroencephalography (EEG), in which the identification of epilepsy by analyzing EEG data is one of the important research topics [2]. With the development of computer science and technology, numerous studies have focused on classifying features extracted from EEG signals with a computer classification model [3,4]. Such research often follows these steps: EEG data acquisition and preprocessing, feature extraction, classification model training, and data prediction. Feature extraction from EEG data is one of the most important steps. Numerous methods are used to extract EEG features, including time-domain, frequency-domain, and time-frequency analyses and chaotic features [5][6][7]. Moreover, some studies have combined or redesigned these methods to obtain new features, thereby eventually achieving good classification results [8][9][10].
With the development of science and technology, the accuracy of medical EEG acquisition equipment has improved. In addition, some portable EEG acquisition equipment has been developed. For example, Emotiv has been widely used in brain-computer interfaces [11][12][13] because it is lightweight and inexpensive and has performance similar to that of medical equipment. However, although a variety of medical and portable EEG acquisition devices produce abundant EEG data that can be used for epilepsy research, the different data sources result in a lack of uniform data formats, such as different sampling frequencies, signal lengths, and sampling channels. The inconsistency of data specifications often affects the features obtained by traditional feature extraction methods. This situation raises the question of how to improve the ability of classification methods to adapt to new data. Hence, the universality of classification methods should be improved while maintaining the detection and recognition performance on EEG data.
At present, deep learning is a popular research area. Because deep learning learns autonomously from data, it can skip the manual feature design and extraction process of the traditional methods, thereby avoiding both the difficulty of designing features by hand and the manual adjustment of numerous parameters. Deep learning can accomplish numerous tasks that are difficult to complete with the traditional methods [14]. Some researchers have studied EEG via a deep network [15]. Tabar and Halici [16] converted one-dimensional (1D) brain waves into two-dimensional (2D) image data through short-time Fourier transform and fed them to a deep network for classification. Bashivan et al. [17] converted the frequency bands extracted from brain waves into topographical maps (2D images) through spectral power and classified the images with deep networks. Hosseini et al. [18] used a deep learning method based on a cloud platform to propose a solution for epilepsy prevention and control. Xun et al. [19] and Masci et al. [20] proposed a coding method for epileptic EEG signals based on the deep network. However, the majority of these studies have focused on regular data, that is, sample data of the same frequency and the same length. In terms of feature design, these studies converted 1D EEG data into 2D image data in advance and classified the features via the deep network. The current study constructs a classification model based on the deep convolutional network to automatically learn the characteristics of EEG and adapt to EEG data of different sampling frequencies and lengths. Our method (including the network model and training method) can effectively identify different forms of EEG data.
The remainder of this paper is organized as follows. Section 2 first simulates the EEG data with different frequencies. Thereafter, we classify the data with existing manual feature design classification methods and indicate their disadvantages compared with our model. Section 3 provides details of our proposed network model, training methods, and data processing methods. Section 4 compares our model with existing methods and discusses the advantages of our model. Section 5 presents the summary.

Experimental Result
This section first describes two open datasets and then classifies and compares the EEG data at different sampling frequencies using an artificial design feature method and a deep network autonomous feature learning method.

Data Description and Data Synthesis
2.1.1. Dataset 1. The first dataset was published by Andrzejak et al. [21]. This dataset consists of five subsets (denoted A to E). Each subset contains 100 EEG signals of 23.6 sec in length, and the sampling frequency is 173.6 Hz. The data include records of healthy subjects and epileptic patients. Among them, two subsets were recorded from epileptic patients during seizure-free periods (200 samples in total), and one subset was recorded during seizures (100 samples). Figure 1 shows the two types of signals recorded from epilepsy patients during seizure-free and seizure periods. They are classified as F and S, respectively: 200 samples are classified as F and 100 samples as S. Class F is labeled as a nonepileptic seizure EEG signal, while class S is a seizure signal.

2.1.2. Dataset 2. The second dataset was collected by Boston Children's Hospital [22]. EEG signals are obtained by measuring electrical activity in the brain through multiple electrodes connected to a patient's scalp. Each recording is approximately half an hour to one hour long and includes epileptic seizure and nonepileptic data. Each recording is sampled at 256 Hz and contains 23 to 25 channels, and the sample length is approximately 921600 points. The dataset has 24 subjects, of which the first 10 are selected for the experiment. Each channel in a sample has a name; for example, the first channel is named FP1-F7 (see Figure 2). We selected one of the 23 channels for our study. When a seizure occurs, the EEG signal fluctuates substantially, resulting in an increase in the signal variance. We therefore select the channel based on variance [23]. The method is as follows. We calculate the variance of each channel in each sample, identify the channel with the largest variance in that sample, and then count how often each channel has the largest variance across all samples. The "FT9-FT10" channel has the highest number of occurrences, which leads us to choose this channel. A total of 200 EEG samples of epileptic seizures and 200 nonepileptic samples were randomly extracted from the FT9-FT10 channel. The length of each signal sample is 4096 points (16 sec). As in dataset 1, class F is labeled as a nonepileptic seizure EEG signal in dataset 2, while class S is a seizure signal.
Figure 2: The signal is a cortical signal. The signal on the left side of the black line contains no epilepsy, and the signal on the right side of the black line is epilepsy.
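The variance-based channel selection described for dataset 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_channel_by_variance`, the toy random recordings, and the middle channel name are placeholders introduced here; only FP1-F7 and FT9-FT10 are named in the text.

```python
import numpy as np

def select_channel_by_variance(samples, channel_names):
    """Count, over all samples, how often each channel has the largest variance.

    samples: iterable of arrays shaped (n_channels, n_points).
    Returns the most frequent max-variance channel and the per-channel counts.
    """
    counts = np.zeros(len(channel_names), dtype=int)
    for sample in samples:
        variances = np.var(sample, axis=1)      # one variance per channel
        counts[np.argmax(variances)] += 1       # vote for the largest-variance channel
    return channel_names[int(np.argmax(counts))], counts

# Toy data (assumption): three channels whose amplitude grows with the channel index.
rng = np.random.default_rng(0)
names = ["FP1-F7", "placeholder-channel", "FT9-FT10"]
scales = np.array([1.0, 1.5, 3.0])[:, None]
recordings = [rng.standard_normal((3, 2048)) * scales for _ in range(5)]

best, counts = select_channel_by_variance(recordings, names)
print(best, counts)
```

On real recordings, `samples` would hold the multichannel EEG segments, and the returned channel would then be used to extract the single-channel samples.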
The two datasets are the most widely used in the current research on epilepsy data classification and detection. Given that the sampling frequency of the signals in the two datasets is fixed, we use the signal processing library in SciPy [24] to resample the signals and synthesize data at different sampling frequencies (datasets 1-0 to 1-7 and 2-0 to 2-7, where 1-0 and 2-0 are the original data).
Computational and Mathematical Methods in Medicine
The artificial design features are taken from [25][26][27], which have a good classification effect in the existing research, including integral absolute value, root mean square, waveform length, sample entropy, Lee's index, Hurst index, DFA index, and multifractal feature. After feature extraction, several common classifiers are selected from the scikit-learn library [28], including k-nearest neighbor (k-NN), linear discriminant analysis (LDA), support vector machine (SVM), decision tree (DT), multilayer perceptron (MLP), and Gaussian naive Bayes (GNB). These classifiers use the default parameters of the library. Tables 3 and 4 report the results of applying these features and classifiers to datasets 1-0 and 2-0, respectively, under 3-, 5-, and 10-fold cross-validation. The last column (AVG) is the average classification accuracy of each classifier. SVM, a commonly used classifier, achieves good classification accuracy, which validates the effectiveness of the feature extraction methods.
Tables 5 and 6 show the 5-fold classification accuracy of the various classifiers on datasets 1-0 to 1-7 and 2-0 to 2-7, respectively. Table 5 shows that, at different sampling frequencies, the traditional classification methods based on artificial design features give different results with different classifiers. For example, the classification results of SVM are better than those of GNB. When the sampling frequency decreases, classification accuracy fluctuates. For example, the classification accuracy of k-NN decreases, and those of LDA and SVM change substantially. Table 6 shows that the average accuracy in its last column is higher than that of Table 5. This result indicates that the classification method based on artificial design features can achieve superior classification results on datasets 2-0 to 2-7. However, the classification accuracy for data with different sampling frequencies continues to fluctuate significantly. Figure 3 shows the average classification accuracy of the two datasets based on artificial design features at different sampling frequencies. The classification results of datasets 1-0 to 1-7 are not ideal, while datasets 2-0 to 2-7 have better classification results. Taken together, these results show that the method based on artificial design features depends on the selection of the classifier. Moreover, the features are sensitive to data with different sampling frequencies, which substantially reduces the applicability of the method.
Table 3: Classification accuracy of various classifiers on 1-0 using the artificial design feature method.
Table 5: Classification accuracy of the 5-fold classifier for datasets 1-0 to 1-7.
The preceding data and results are given in Tables 1 to 6. Whether these methods remain effective when data of various frequencies are mixed needs further analysis. Moreover, whether a classification model trained on datasets of existing sampling frequencies can effectively predict data of new sampling frequencies should be further discussed. For example, the model is trained with the 173.61 Hz and 163.61 Hz data to predict the type of the 153.61 Hz data. Section 3 explains the proposed solution to these problems, and Section 4 further discusses and analyzes them.
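To make the artificial design features more concrete, the three time-domain features named above (integral absolute value, root mean square, and waveform length) can be sketched as follows. The text does not give the exact formulas used, so the common textbook definitions are assumed here, and the function names are illustrative.

```python
import numpy as np

def integral_absolute_value(x):
    # Sum of absolute amplitudes (assumed definition).
    return float(np.sum(np.abs(x)))

def root_mean_square(x):
    # Square root of the mean squared amplitude.
    return float(np.sqrt(np.mean(np.square(x))))

def waveform_length(x):
    # Cumulative absolute difference between consecutive points (assumed definition).
    return float(np.sum(np.abs(np.diff(x))))

def time_domain_features(x):
    """Return [IAV, RMS, WL] for a one-dimensional signal."""
    x = np.asarray(x, dtype=float)
    return np.array([
        integral_absolute_value(x),
        root_mean_square(x),
        waveform_length(x),
    ])

signal = np.array([1.0, -2.0, 2.0, -1.0])
features = time_domain_features(signal)
print(features)
```

Under these definitions, the integral absolute value and the waveform length are sums over the sample points, so their values depend on how many points a signal contains. This is one plausible reason why such features shift when the sampling frequency changes.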

Methodology
This section first describes the model structure based on CNN and the training methods for different length sample data.

Classification Model Based on CNN.
Numerous methods of feature extraction are based on artificial design. However, when the data change, the classification effect based on the general feature extraction method is not stable. In this study, the classification model based on CNN can independently learn and classify data features, combining the two steps of feature extraction and classification (see Figure 4). It attempts to obtain good and stable classification results.
Figure 4: The left side is a classification process based on artificial design features, which requires two steps. The right side inputs data into the network model and outputs the classification results directly.
CNN is a feedforward neural network that improves pattern classification ability by means of posterior probability. The network mainly includes convolutional, pooling, fully connected, and softmax layers. The convolutional layer convolves the input signal with different convolution kernels to obtain feature maps (i.e., the number of convolution kernels equals the number of feature maps). The pooling layer downsamples the feature maps obtained from the convolution operation of the preceding layer. The network depth is often increased by iterating the convolutional and pooling layers. The fully connected layer connects all feature maps from the preceding layer to the hidden layer of a common neural network, and the classification results are eventually output through the softmax layer.
This study proposes a multilayer network with three iterations of convolutional and pooling layers, a fully connected layer, and a softmax layer to classify EEG data (hereinafter referred to as CNN-E). The model classifies one-dimensional EEG data of a single channel; let the input sample be $x$. The convolutional layer acts as the feature extractor. This layer convolves $x$ with multiple convolution kernels and obtains several feature maps that retain the main components of the input signal. The convolution is calculated as follows:
$$f_n^k = g^k\left(\sum_{m=1}^{M} w_{m,n}^k * f_m^{k-1} + b_n^k\right), \quad n = 1, \dots, N,$$
where $f_n^k$ is the $n$th feature map of layer $k$, $f_m^{k-1}$ is the $m$th feature map of the preceding layer, $w_{m,n}^k$ is the convolution kernel from the $m$th feature map of layer $k-1$ to the $n$th feature map of layer $k$, $*$ denotes convolution, $b_n^k$ is the neuron bias, and $g^k(\cdot)$ is the activation function. $M$ is the number of feature maps in the preceding layer, and $N$ is the number of convolution kernels. When $k = 1$, that is, for the first convolution operation on the sample data, $f_m^{k-1} = x$ and $M = 1$, because the only feature map in the preceding layer is $x$. Given that the input data $x$ is one-dimensional, the feature map $f_n^k$ output by the convolution operation is also one-dimensional.
In this model, the pooling operation divides a feature map of length $l$ into $j$ nonoverlapping regions of equal length, each with $i = l/j$ elements, and extracts the maximum value from each region. Hence, the feature map is reduced in size by downsampling. In this way, the strongest feature in each region is selected, and the ability of the model to distinguish the overall features is enhanced. After the pooling operation, the feature map changes from the original length $l$ to $j$, where $i = l/j$ is the reduction ratio of the feature map. The maximum pooling operation is as follows:
$$f_n^k = p^k\left(f_n^{k-1}, i\right).$$
Each neuron in the fully connected layer connects to all neurons in the preceding layer $f_n^{k-1}$. The output of all neurons in the preceding layer is mapped to a one-dimensional array $V$ by a reshape operation, and $V$ is input to the fully connected layer. The fully connected layer can be expressed as follows:
$$c = w^c V + b^c,$$
where $w^c$ and $b^c$ are the weights and biases, respectively, of the fully connected layer and $c$ is the output of the fully connected layer. Lastly, the final result is output via softmax:
$$y_t = \frac{e^{c_t}}{\sum_{s} e^{c_s}},$$
where $t$ and $s$ index the classes. The classification result $y$ is obtained. Assume that there are $N$ training samples and that $x^{(i)}$ represents a sample labeled $l^{(i)}$. Sample $x^{(i)}$ is passed through the model to obtain $y^{(i)}$. Cross-entropy is used as the loss function of the model:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t} l_t^{(i)} \log y_t^{(i)},$$
where $l_t^{(i)}$ equals 1 if sample $x^{(i)}$ belongs to class $t$ and 0 otherwise. The loss function of the network model is optimized by the SGD [26] optimizer.
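The layer operations described above can be traced in a small NumPy forward pass. This is only a sketch under stated assumptions: the kernel count, kernel width, pooling ratio, and random weights are toy values chosen here, a single convolution and pooling stage stands in for the three stages of CNN-E, and all function names are illustrative rather than the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_layer(feature_maps, kernels, biases, activation):
    """feature_maps: (M, L); kernels: (M, N, K); biases: (N,). Returns (N, L - K + 1)."""
    M, N, K = kernels.shape
    out = np.zeros((N, feature_maps.shape[1] - K + 1))
    for n in range(N):
        for m in range(M):
            # Sum the convolutions of every input map m with kernel (m, n).
            out[n] += np.convolve(feature_maps[m], kernels[m, n], mode="valid")
        out[n] += biases[n]
    return activation(out)

def max_pool1d(feature_maps, i):
    """Split each map into nonoverlapping regions of i elements and keep the maximum."""
    n_maps, length = feature_maps.shape
    j = length // i
    return feature_maps[:, : j * i].reshape(n_maps, j, i).max(axis=2)

def softmax(c):
    e = np.exp(c - np.max(c))   # shift for numerical stability
    return e / e.sum()

def cross_entropy(y, label):
    # Cross-entropy of one sample whose true class index is `label`.
    return -float(np.log(y[label]))

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64))            # one single-channel sample, so M = 1
w1 = rng.standard_normal((1, 4, 5)) * 0.1   # N = 4 kernels of width 5 (toy values)
b1 = np.zeros(4)

f1 = max_pool1d(conv1d_layer(x, w1, b1, sigmoid), i=4)   # (4, 60) pooled to (4, 15)
V = f1.reshape(-1)                                       # flatten for the dense layer
w_c = rng.standard_normal((2, V.size)) * 0.1             # two classes: F and S
b_c = np.zeros(2)
y = softmax(w_c @ V + b_c)
loss = cross_entropy(y, label=1)
print(f1.shape, y, loss)
```

In practice such a model would be built and trained with a deep learning framework; the explicit loops here only serve to mirror the convolution, pooling, fully connected, softmax, and loss formulas one by one.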

Model Training. Section 3.1 explained the basic structure and principle of the CNN-E model. This section further introduces the parameter settings and training of the model. Figure 5 shows the frame diagram of the CNN-E neural network model used in this research. Given that a sample signal is stored in an array, each small rectangle in the diagram represents an element of the signal, and numerous small rectangles constitute a sample signal. The length of the input sample signal is 4096. After three iterations of convolution and pooling, the feature maps pass through the fully connected layer and the softmax layer, which outputs the classification result.
The sample length in datasets 1-0 and 2-0 is 4096, whereas the length of the new-frequency data obtained by resampling changes. Resampling is performed with the Fourier resampling method in the signal processing toolkit of SciPy. Figure 6(a) presents one sample in dataset 1-0 and four new samples (i.e., 1-1, 1-2, 1-3, and 1-4) generated from that sample at different sampling frequencies. As the sampling frequency decreases, the sample becomes considerably shorter. However, the length of input data acceptable to the model is fixed. This study uses a complementation method that copies a certain length of data from the head of the sample and appends it to the tail so that the length of the sample data reaches 4096. Figure 6(b) shows that the data in the red rectangle are replicated and appended in the blue rectangle. In this way, the model can be adapted to data of different lengths. If the sample is longer than 4096, then it is truncated and only 4096 points are input into the model. To enhance the universality of the model, no data preprocessing is applied during training. For example, the majority of the data in dataset 1 range from −500 to 500, and a small part of the data may extend to −2000 to 2000 owing to abnormal or noise fluctuations.
The sigmoid function used in the first convolution can reduce the impact of these abnormal data on model training.
Figure 6: (a) New data generated by applying different sampling frequencies to the original data; (b) the sample data after complementing the data in (a).
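The resampling and complementation steps can be sketched as below. `scipy.signal.resample` is SciPy's Fourier-method resampler; the helper names, the random stand-in signal, and the choice to keep the first 4096 points of an over-long sample are assumptions made for illustration, since the text does not say which 4096 points are kept.

```python
import numpy as np
from scipy.signal import resample

TARGET_LEN = 4096

def resample_to_frequency(x, fs_old, fs_new):
    """Fourier resampling of x from fs_old to fs_new (the length scales with the ratio)."""
    num = int(round(len(x) * fs_new / fs_old))
    return resample(x, num)

def complement_to_length(x, target_len=TARGET_LEN):
    """Append data copied from the head of x to its tail until target_len is reached."""
    x = np.asarray(x)
    if len(x) >= target_len:
        return x[:target_len]           # assumption: keep the first target_len points
    return np.resize(x, target_len)     # np.resize fills with repeated copies of x

rng = np.random.default_rng(0)
original = rng.standard_normal(4096)                   # stand-in for one sample
low = resample_to_frequency(original, 173.61, 153.61)  # shorter after downsampling
padded = complement_to_length(low)
print(len(low), padded.shape)
```

Because `np.resize` repeats the array cyclically, a sample shorter than half the target length would be copied more than once; whether the authors handle that case the same way is not stated.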

Discussion
This section compares the classification results of the artificial design feature method and the CNN-E model at the same and at different sampling frequencies.

Comparative and Characteristic Analyses of the Classification Results with the Same Frequency. Data were trained and classified at the same sampling frequency. Figure 7 shows the classification accuracy of the two methods for the two datasets. Here, A represents the average classification result of the classification method based on artificial design features, and B is the classification result of the current CNN-E model. In datasets 1-0 to 1-7, the classification accuracy of the CNN-E model is above 0.95, which is a good classification effect. In datasets 2-0 to 2-7, the classification accuracy of CNN-E is lower than that of the classification method based on artificial design features only for 2-0 and 2-2; for the majority of the others, it is higher. Moreover, we find that for the two datasets, the classification accuracy of the artificial design feature method tends to decline with decreasing sampling frequency, whereas CNN-E continues to maintain relatively stable classification accuracy.
Figure 7: A is a classification method based on artificial design features, and B is a classification method based on CNN-E.
Figure 8 shows the distribution of the F and S data features in datasets 1-0 and 1-7. Under different sampling frequencies, the calculated feature distributions differ. For example, the two classes are easy to distinguish in $f_1$, the two classes in $f_6$ and $f_{11}$ are nearly unchanged, and the feature $f_5$ becomes difficult to distinguish. These aspects reflect that the artificial design feature method is considerably dependent on the actual data signal. When the sampling frequency changes, the feature distribution also changes. This is also the reason why the classification accuracy decreases with decreasing sampling frequency in the preceding experiments. According to the classification results of datasets 2-0 to 2-7 in Figure 3, the artificial design feature method remains effective there, for three reasons. First, a relatively large number of features (12) are used. Second, Figure 7 shows that these features change regularly at different sampling frequencies. Lastly, these features are selected from existing features with good experimental results. However, the performance of these features on datasets 1-0 to 1-7 is poor, which shows that the classification methods based on artificial design feature extraction perform considerably differently on different datasets. By contrast, the features obtained by CNN-E are deep, local features. Although these deep features are difficult to visualize, they have good adaptability, as shown in Figure 7.
In the previous section, the classification results of the method based on artificial design features and of CNN-E at the same sampling frequency were analyzed. This section uses the classification results for data with different sampling frequencies to show the universality of the CNN-E model. Figure 9 shows that some characteristic distributions of the sample data change at different sampling frequencies. Given that the data are resampled with the Fourier resampling method, the characteristic changes in the frequency domain are relatively small. Figure 9 shows the spectrum of samples at different sampling frequencies. Figure 9(a) lists the spectra obtained by applying different sampling frequencies to the same sample. This series of spectra is nearly identical within the blue rectangular frame. To ensure that the model can be adapted to data of different lengths, the input samples are extended by the complementation method (see Figure 6). The spectrum also changes after the sample data of different sampling frequencies are complemented. For example, Figure 9(b) shows that as the sampling frequency changes, the spectrum of the new sample becomes increasingly different from that of the original sample.
Figure 9: (a) Spectrum of samples at different sampling frequencies; (b) spectrum of samples after the complementation method.
Figure 10 shows that the classification results of the CNN-E model for data sampled at different frequencies are better than those of the traditional classification methods based on artificial design features (e.g., k-NN, LDA, and SVM). Although the spectral characteristics of the samples differ considerably once the input sample signal is complemented, the CNN-E model can extract deep features and reduce the feature dimension of the samples. Hence, the model achieves a good classification effect.
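The effect of complementation on the spectrum can be explored with a small NumPy experiment. This is an illustrative sketch only: the 8 Hz test tone, the 3000-point length, and the helper name `amplitude_spectrum` are assumptions chosen here and are not taken from the datasets.

```python
import numpy as np

def amplitude_spectrum(x, fs):
    """One-sided amplitude spectrum of a real signal sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    amps = np.abs(np.fft.rfft(x)) / len(x)
    return freqs, amps

fs = 256.0
t = np.arange(3000) / fs
short = np.sin(2 * np.pi * 8.0 * t)    # a sample shorter than 4096 points
padded = np.resize(short, 4096)        # head copied to the tail (complementation)

f_short, a_short = amplitude_spectrum(short, fs)
f_pad, a_pad = amplitude_spectrum(padded, fs)
peak_short = f_short[np.argmax(a_short)]
peak_pad = f_pad[np.argmax(a_pad)]
print(peak_short, peak_pad)
```

Plotting `a_short` against `a_pad` gives a direct view of how the copied segment, which restarts the waveform at the junction, alters the spectrum around the dominant component.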

Nonequal Length Sample Testing. In practical application, an EEG classification model must handle not only data with different sampling frequencies but also signals of different lengths. However, numerous artificial design features place constraints on data length when features are extracted. For example, when the data length is only one second or the sampling frequency is not high, meaningful time-domain, frequency-domain, or nondynamic features cannot be extracted. Previous classification studies are mostly based on time windows. All samples are divided into new sample sets according to a time window of a certain length, and the new sample sets are then divided into training and test sets for training and testing the model, respectively. Given that the proposed model can be adapted to sample data of different lengths, we build on the experiments in the previous section and use time windows of different lengths to segment the sample data without overlap. The window length ranges from 1 sec and 2 sec up to the signal length. The sample length of dataset 1-0 is 23.6 sec, so its maximum window length is 23 sec. The sample length of dataset 2 is 16 sec, so its maximum window length is 16 sec. Table 9 shows how the sample length and the number of samples change when datasets 1-0 and 1-1 are divided into windows of 1 to 5 sec. Figure 11 shows the classification accuracy of the CNN-E classification model on the different datasets divided by different window lengths. The figure shows that the model proposed in this research achieves a good classification effect (i.e., 1 sec of data is enough to obtain a high classification accuracy) and has high timeliness while ensuring high accuracy.
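The nonoverlapping time-window segmentation described above can be sketched as follows. The function name and the decision to discard the incomplete trailing window are assumptions made here for illustration; the text does not state how a remainder is handled.

```python
import numpy as np

def segment_without_overlap(x, fs, window_sec):
    """Split a 1D signal into nonoverlapping windows of window_sec seconds.

    Returns an array shaped (n_windows, window_len); a trailing remainder
    shorter than one window is dropped (assumption).
    """
    x = np.asarray(x)
    window_len = int(round(fs * window_sec))
    n_windows = len(x) // window_len
    return x[: n_windows * window_len].reshape(n_windows, window_len)

# One 16 sec sample at 256 Hz, matching the sample length given for dataset 2.
sample = np.arange(4096, dtype=float)
one_sec = segment_without_overlap(sample, fs=256, window_sec=1)
five_sec = segment_without_overlap(sample, fs=256, window_sec=5)
print(one_sec.shape, five_sec.shape)
```

Each resulting window is shorter than the 4096 points the model expects, so it would then be extended with the complementation method before being input to CNN-E.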

Conclusion
In real life, there are diverse types of EEG signals. The current research on EEG classification has focused on classification accuracy, but the universality of the methods has seldom been discussed. To solve the problem, this study constructed a CNN-E classification model based on CNN. The model could be applied to classify EEG signals with different