Epilepsy EEG classification method based on supervised locality preserving canonical correlation analysis

Existing automatic epileptic seizure detection systems are often hampered by high-dimensional electroencephalogram (EEG) features. High-dimensional features not only introduce redundant information and noise, but also slow the response of the system. To solve this problem, supervised locality preserving canonical correlation analysis (SLPCCA), which can effectively exploit both sample category information and the nonlinear relationships between features, is introduced, and an epileptic signal classification method based on SLPCCA is proposed. First, the power spectral density and the fluctuation index of the frequency slice wavelet transform are extracted as features from the EEG segments. Next, SLPCCA obtains the optimal projection directions by maximizing the weighted correlation between paired within-class samples and their neighbors, and the projections of the original features along the optimal directions form the fusion feature. The fusion features are then fed into an LS-SVM for training and testing. The method is verified on the Bonn dataset and the CHB-MIT dataset with good results. Across the classification tasks on the Bonn dataset, the proposed method achieves an average classification accuracy of 99.16%. On the binary classification of inter-seizure versus seizure EEG on the CHB-MIT dataset, it achieves an average accuracy of 97.18%. The experimental results show that the algorithm compares favorably with several state-of-the-art methods. In addition, the parameter sensitivity of SLPCCA and the relationship between the fusion feature dimension and the classification results are discussed, further verifying the stability and effectiveness of the method.


Introduction
Epilepsy is a brain disease characterized by sudden, recurrent and involuntary seizures [1]. According to statistics from the World Health Organization, nearly 50 million people worldwide suffer from epilepsy [2]. At present, the clinical diagnosis of epilepsy relies mainly on medical history and brain examinations, such as visual inspection of long-term electroencephalogram (EEG) recordings by experienced doctors. However, this approach is time-consuming and its results are subjective. To address this dilemma, a large number of methods have been proposed to identify epileptic EEG signals through signal processing and machine learning. These methods can reduce the workload of doctors and improve diagnostic accuracy.
Due to the obvious nonlinearity and nonstationarity of epileptic EEG signals, researchers use multichannel and high sampling rate EEG acquisition equipment to obtain EEG signals of subjects. In this way, they can ensure the highest spatial and temporal resolution. However, this method directly causes the feature dimension of the multi-channel EEG signal to be too high. On the one hand, information redundancy and noise are increased, which brings interference to accurate identification. On the other hand, the signal processing time is prolonged and the diagnosis efficiency is reduced.
In view of the dimension disaster that may arise in feature extraction from multi-channel EEG signals, many feature dimension reduction algorithms have been proposed. M. Yildiz et al. [3] used principal component analysis (PCA) for dimensionality reduction, obtaining the best 8-dimensional features and improving the classification accuracy by 9%. A. Matin et al. [4] combined PCA with two independent component analysis (ICA) algorithms and achieved good classification results on a variety of tasks. It has also been found that manifold learning can improve classification performance while reducing the dimension of EEG features. Yang et al. [5] proposed a feature dimension reduction algorithm based on locality preserving projection and achieved classification accuracies above 97% on a variety of classification tasks. J. Birjandtalab et al. [6] adopted a nonlinear data embedding technique based on a stochastic nearest-neighbor distance metric and achieved an F-measure above 87%. Hou et al. [7] used locally linear embedding and random forest algorithms to achieve an average classification accuracy of 95%.
Although there have been many dimension reduction algorithms for a single feature, the relationships among features are not well considered. It may cause redundant information between different features, thereby increasing unnecessary feature dimensions. In order to reduce the feature dimension and the information redundancy between features, SLPCCA is applied to the classification of epilepsy EEG since it can effectively use both sample category information and nonlinear relationships between features.
SLPCCA is used to convert the original high-dimensional multi-channel EEG features into low-dimensional fusion features, then the fusion features are input into the classifier. Through experiments on Bonn data set and CHB-MIT data set, it can be proved that SLPCCA can effectively reduce the multi-channel EEG feature dimension. At the same time, the redundant information and irrelevant components in the original features are removed, which improves the classification accuracy and helps to reduce the diagnosis time.
The layout of the paper is as follows. In Section 2, the two original features used in the experiments are introduced. In Section 3, the theory and characteristics of SLPCCA are presented. The datasets, classifiers and evaluation indicators used in the experiments, as well as the classification results on the various tasks over the different datasets, are presented in Section 4. The parameter sensitivity and fusion feature dimension of SLPCCA are analyzed in Section 5. Finally, conclusions are presented in Section 6.

Feature extraction
Automatic detection of epileptic seizures is essentially the mining of effective information from EEG through machine learning. It can be viewed as a pattern recognition problem, in which designing features that effectively distinguish epileptic EEG is crucial. Since EEG signals differ significantly across frequency bands, the power spectral density (PSD) is selected to reflect how EEG energy changes with frequency. However, due to the non-stationary nature of EEG, frequency-domain features that assume stationarity, such as the PSD, have limitations. To overcome these limitations, feature extraction methods based on time-frequency analysis have been proposed, and the fluctuation index of the frequency slice wavelet transform (FSWT-FI) is selected here.

Power spectral density
The PSD [8] describes the energy distribution of a signal in the frequency domain. Since the energy of the EEG signal during an epileptic seizure is significantly greater than during non-seizure periods, the PSD can effectively distinguish seizure and non-seizure EEG. The PSD of a signal f(t) is obtained as
$$P(\omega)=\int_{-\infty}^{+\infty}E\left[f(t)f^{*}(t+\tau)\right]e^{-j\omega\tau}\,\mathrm{d}\tau,$$
where P(ω) is the power spectral density at frequency ω, E denotes expectation, '*' denotes the conjugate and τ is the time delay. EEG can be roughly divided into five rhythms, namely δ (1∼3 Hz), θ (4∼7 Hz), α (8∼13 Hz), β (14∼30 Hz) and γ (30∼80 Hz). Each rhythm has different energy characteristics during a seizure. Therefore, the PSD values on these rhythms are selected as features for the automatic detection of epileptic seizures.
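As an illustration, the band-wise PSD features described above can be sketched in Python using Welch's method from SciPy. This is a minimal sketch, not the paper's exact implementation: the function name, the `nperseg` choice and the use of band-mean PSD are illustrative assumptions; the band edges follow the rhythm ranges listed in the text.

```python
import numpy as np
from scipy.signal import welch

# Rhythm band edges in Hz, as listed in the text.
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (30, 80)}

def band_psd_features(x, fs):
    """Mean Welch PSD within each rhythm band for a 1-D EEG segment x."""
    freqs, psd = welch(x, fs=fs, nperseg=min(256, len(x)))
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs <= hi)
        feats.append(psd[mask].mean())
    return np.array(feats)

fs = 173.61                      # Bonn sampling rate (from the text)
t = np.arange(1024) / fs         # one 1024-point segment
x = np.sin(2 * np.pi * 10 * t)   # synthetic 10 Hz (alpha-band) oscillation
f = band_psd_features(x, fs)     # 5-dimensional PSD feature vector
```

For a pure 10 Hz tone, the alpha-band entry of `f` dominates, which is the property the classifier relies on: seizure segments shift energy between rhythms.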

Fluctuation index of frequency slice wavelet transform
Frequency slice wavelet transform (FSWT) is a time-frequency analysis method proposed by Yan et al. [9]. The FSWT of a signal f(t) is
$$W_f(t,\omega,k)=\frac{1}{2\pi}\int_{-\infty}^{+\infty}\hat{f}(u)\,\hat{p}^{*}\!\left(\frac{k(u-\omega)}{\omega}\right)e^{iut}\,\mathrm{d}u,$$
where W_f(t,ω,k) is the frequency slice wavelet transform of f(t), '*' denotes the conjugate, $\hat{f}$ is the Fourier transform of f, and $\hat{p}$ is the Fourier transform of the mother wavelet function, that is, the frequency slice function. A commonly used frequency slice function is $\hat{p}(\omega)=e^{-0.5\omega^{2}}$. k is the time-frequency analysis coefficient, obtained as k = ω_p η_s, where ω_p is the width of the frequency window of $\hat{p}(\omega)$ and η_s is the frequency resolution of f(t). Taking the inverse transform of W_f(t,ω,k) over the time interval (t₁,t₂) and frequency interval (ω₁,ω₂),
$$f_{(t_1,t_2,\omega_1,\omega_2)}(t)=\frac{1}{2\pi}\int_{\omega_1}^{\omega_2}\int_{t_1}^{t_2}W_f(\tau,\omega,k)\,e^{i\omega(t-\tau)}\,\mathrm{d}\tau\,\mathrm{d}\omega,$$
the signal component of f(t) in the time-frequency region (t₁,t₂,ω₁,ω₂) can be obtained. In real applications, EEG is processed in discrete form. Correspondingly, the discrete FSWT [10] replaces the continuous transforms with the discrete Fourier transform F and inverse discrete Fourier transform F⁻¹, where n is the length of the signal f. Because the frequency slice function $\hat{P}_k$ is a sparse sequence of limited length, Eq (2.4) can be realized by resampling in the time domain and applying the fast Fourier transform for each k; the inverse discrete FSWT follows analogously, where N_θ is the number of resampled points. After obtaining the time-frequency distribution of the EEG signal through FSWT, the frequency ranges corresponding to the rhythms δ, θ, α, β and γ are reconstructed to obtain the time-domain waveform of each rhythm. The fluctuation index (FI) of each rhythm signal is then computed for the subsequent classification task as
$$FI(f)=\frac{1}{n-1}\sum_{t=1}^{n-1}\left|f_{t+1}-f_{t}\right|,$$
where n is the total length of the signal f, and f_{t+1} and f_t are the signal values at two adjacent moments.
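The fluctuation index itself has a very direct implementation. A minimal sketch, assuming the mean-absolute-first-difference form described by the text (in practice it would be applied to each FSWT-reconstructed rhythm waveform, which is not reproduced here):

```python
import numpy as np

def fluctuation_index(x):
    """Mean absolute first difference of a 1-D signal: one common FI form."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.diff(x)).mean()

# A flat signal has zero fluctuation; an alternating one fluctuates maximally.
flat = np.ones(100)
spiky = np.array([0.0, 1.0] * 50)
fi_flat = fluctuation_index(flat)
fi_spiky = fluctuation_index(spiky)
```

Seizure EEG typically shows larger rhythm-wise fluctuation than interictal EEG, which is why this simple statistic is discriminative.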

Feature fusion algorithm
Although the above-mentioned PSD and FSWT-FI can well reflect the characteristics of epilepsy EEG, they have the risk of causing a dimension explosion. Because they are extracted on each rhythm of the EEG, when the object is a multi-channel EEG, the feature dimension will increase rapidly. In order to avoid the risk of dimension explosion, feature fusion algorithm is used after feature extraction to fuse information of two features while reducing the dimension.
Canonical correlation analysis (CCA) [11] is a multivariate statistical algorithm dealing with the correlation between two sets of variates. CCA transforms the correlation between two sets of random variables into the correlation between a few pairs of typical variables. It can not only retain the effective information of multiple EEG features participating in the fusion, but also reduce redundant information, thereby reducing the feature dimension.
Although CCA can effectively analyze the correlation between features and has proved useful in image recognition tasks [12], it has two shortcomings. On the one hand, CCA is an unsupervised feature fusion algorithm, so it does not exploit sample category information. To incorporate category information into the feature fusion process, discriminant canonical correlation analysis (DCCA) [13] was proposed; fusion features obtained through a supervised algorithm are better suited to the task of epilepsy EEG classification. On the other hand, as an essentially linear dimension reduction technique, CCA cannot reveal nonlinear relationships between features. Kernel canonical correlation analysis (KCCA) [14] was therefore proposed to convert the nonlinear problem in the original space into a linear problem in a high-dimensional space, solved via the kernel trick. However, KCCA struggles with nonlinear structure embedded in local neighborhoods. Therefore, the idea of locality preservation was introduced, and locality preserving canonical correlation analysis (LPCCA) [15] was proposed to better capture the nonlinear properties of EEG. Inheriting the advantages of DCCA and LPCCA while making up for their shortcomings and adding sample weights, supervised locality preserving canonical correlation analysis (SLPCCA) [16] was proposed.

Canonical correlation analysis
Suppose there are two random variables ξ and η. CCA is to find the projection direction α and β so that the correlation coefficient of the projection α T ξ and β T η is the largest. α and β are called typical projection directions. α T ξ and β T η are called canonical correlation variables. By analyzing several pairs of unrelated canonical correlation variables, the analysis of the correlation between ξ and η can be completed. Typical projection directions α and β can be obtained by the following criterion function.
$$\rho=\max_{\alpha,\beta}\frac{\alpha^{T}S_{\xi\eta}\beta}{\sqrt{\alpha^{T}S_{\xi\xi}\alpha}\cdot\sqrt{\beta^{T}S_{\eta\eta}\beta}},$$
where S_ξη is the cross-covariance matrix of ξ and η, and S_ξξ and S_ηη are the covariance matrices of ξ and η, respectively. When CCA is used for feature fusion, suppose there are two standardized feature sets X = [x₁, x₂, ···, x_n] ∈ R^{p×n} and Y = [y₁, y₂, ···, y_n] ∈ R^{q×n}, i.e., X and Y contain the p-dimensional and q-dimensional features of n samples, respectively. Based on the idea of CCA, the canonical correlation variables between X and Y are extracted first. These canonical correlation variables, called canonical correlation features, are denoted α₁ᵀX and β₁ᵀY, α₂ᵀX and β₂ᵀY, ···, α_dᵀX and β_dᵀY, for a total of d pairs. The projected feature sets, namely X* = (α₁, α₂, ..., α_d)ᵀX and Y* = (β₁, β₂, ..., β_d)ᵀY, are then stacked into the fusion feature Z = [X*; Y*] for classification.

Discriminant canonical correlation analysis
Because sample category information needs to be considered, DCCA reorders the features in the original feature sets X and Y according to their categories. The matrix C_w describing the within-class similarity of the feature vectors can then be defined as C_w = XAYᵀ, where A is a symmetric positive semidefinite block matrix with rank c and block sizes n_i; c is the number of sample categories and n_i is the number of samples in the i-th category. Similarly, a matrix C_b describing the between-class similarity of the feature vectors can be defined analogously. The criterion function for the typical projection directions α and β of DCCA then maximizes the within-class correlation αᵀC_wβ while penalizing the between-class correlation αᵀC_bβ, under the same variance constraints as CCA. As in CCA, Z = [X*; Y*] can be used as the fusion feature after projection for classification.

Locality preserving canonical correlation analysis
To analyze the local structure of the data, let N_X(x_i) be the set of local neighbors of the sample x_i; the local correlation is then computed over neighbor-weighted sample pairs. The k-neighbor matrix of the feature set X is defined as
$$(F_X)_{ij}=\begin{cases}\exp\left(-\|x_i-x_j\|^{2}/t_x\right), & x_j\in N_X(x_i)\\ 0, & \text{otherwise,}\end{cases}$$
where t_x is the mean distance between all samples in the feature set X; F_Y is obtained in the same way. The local canonical correlation can then be expressed as αᵀX(F_X ∘ F_Y)Yᵀβ, and the criterion function for the typical projection directions α and β of LPCCA maximizes this local correlation under variance constraints built from the same neighbor weights. As in CCA and DCCA, Z = [X*; Y*] can be used as the fusion feature after projection for classification.

Supervised locality preserving canonical correlation analysis
Drawing on the idea of LPCCA, SLPCCA constructs the intra-class neighbor set of each sample and defines the intra-class k-neighbor matrix of the feature set X as
$$(SF_X)_{ij}=\begin{cases}\exp\left(-\|x_i-x_j\|^{2}/t_x\right), & x_j\in SN_X(x_i)\\ 0, & \text{otherwise,}\end{cases}$$
where SN_X(x_i) is the set of intra-class nearest neighbors of the sample x_i; SF_Y is obtained in the same way. Similar to LPCCA, the criterion function for the typical projection directions α and β of SLPCCA is
$$\rho=\max_{\alpha,\beta}\frac{\alpha^{T}X\left(SF_X\circ SF_Y\right)Y^{T}\beta}{\sqrt{\alpha^{T}X\,SD_{SF_X}X^{T}\alpha}\cdot\sqrt{\beta^{T}Y\,SD_{SF_Y}Y^{T}\beta}},$$
where SD_{SF_X} and SD_{SF_Y} are diagonal matrices composed of the column sums of SF_X and SF_Y, respectively. To solve this optimization problem, the Lagrange multiplier method yields the generalized eigenvalue equation
$$\begin{pmatrix}0 & X\left(SF_X\circ SF_Y\right)Y^{T}\\ Y\left(SF_Y\circ SF_X\right)X^{T} & 0\end{pmatrix}\begin{pmatrix}\alpha\\ \beta\end{pmatrix}=\sigma\begin{pmatrix}X\,SD_{SF_X}X^{T} & 0\\ 0 & Y\,SD_{SF_Y}Y^{T}\end{pmatrix}\begin{pmatrix}\alpha\\ \beta\end{pmatrix},$$
where σ is the Lagrange multiplier. The generalized eigenvectors {α_i, β_i}, i = 1, ···, d, corresponding to the d largest generalized eigenvalues can then be found. Thus, α and β are constructed, and Z = [X*; Y*] can be obtained.
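The intra-class k-neighbor weighting at the heart of SLPCCA can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the heat-kernel weight, the choice of bandwidth t as the mean pairwise distance, and the symmetrization step are all assumptions.

```python
import numpy as np

def intra_class_knn_matrix(X, labels, k, t=None):
    """Heat-kernel weight matrix over intra-class k nearest neighbors.

    X: (n, p) samples in rows; labels: (n,) class ids.
    Weights are nonzero only between samples of the same class that are
    among each other's k nearest neighbors.
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    if t is None:
        t = np.sqrt(d2).mean()                            # mean pairwise distance
    W = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        if same.size == 0:
            continue
        nbrs = same[np.argsort(d2[i, same])[:k]]          # k intra-class neighbors
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (t ** 2))
    return np.maximum(W, W.T)                             # symmetrize

# Two tight clusters with opposite labels: weights stay within each class.
X = np.vstack([np.zeros((3, 2)), np.ones((3, 2))])
y = np.array([0, 0, 0, 1, 1, 1])
W = intra_class_knn_matrix(X, y, k=2)
```

By construction, entries linking samples of different classes are exactly zero, which is how category information enters the otherwise LPCCA-style locality weighting.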

Epilepsy EEG feature fusion method based on SLPCCA
After extracting PSD and FSWT-FI from the EEG data, SLPCCA is adopted to merge PSD and FSWT-FI into a fusion feature in order to reduce the feature dimension and information redundancy. First, the intra-class k-neighbor matrices are established according to the relationships between samples in the PSD and FSWT-FI feature sets. Then, samples and their neighbors in PSD are correlated with the corresponding samples and neighbors in FSWT-FI, so the local structure of the data is preserved. After that, the projection directions are obtained by solving the optimization problem. Finally, the original features are projected and combined into a fusion vector. The specific feature fusion process of SLPCCA is as follows:
• Construct the intra-class k-neighbor matrices SF_X and SF_Y of the feature sets X and Y, namely PSD and FSWT-FI;
• Establish the criterion function ρ as in Eq (3.6);
• Calculate the typical projection directions α and β by solving the optimization problem in Eq (3.7);
• Obtain the projections X* and Y* of the feature sets X and Y via α and β;
• Combine X* and Y* into the fusion feature Z = [X*; Y*];
• Z is the fused feature set obtained from PSD and FSWT-FI, which is used to train and test the classifier.
Let λ denote the eigenvalue of the corresponding eigenvalue equation; the optimal projection directions α and β are then the eigenvectors of S_XX⁻¹S_XY S_YY⁻¹S_YX and S_YY⁻¹S_YX S_XX⁻¹S_XY associated with λ, respectively [11]. Under the constraints of the optimization problem in Eq (3.7), there are at most r pairs of typical projection directions α_i and β_i, where r = rank(S_XY). Because the projection directions α_i and β_i corresponding to larger eigenvalues λ_i carry more information, the directions corresponding to the d largest eigenvalues are used to construct the final d-dimensional projection.
Moreover, for any α and β that satisfy the constraints of the optimization problem, the canonical variables are mutually uncorrelated (Eq (3.10)). Therefore cov(z*_i, z*_j) = 0 for i ≠ j, which means the components of the fusion feature are uncorrelated [12]. This shows that there is no information redundancy between the fusion feature components.
In addition, SLPCCA is further analyzed and compared with the above feature fusion algorithms, namely CCA, DCCA and LPCCA.

Dataset
The EEG data comes from the German Bonn University dataset [17] and CHB-MIT dataset [18], which have been widely used in epilepsy detection related tasks.
The Bonn dataset contains five subsets, A-E. Each subset contains 100 segments of single-channel EEG composed of 4097 sampling points, with a sampling frequency of 173.61 Hz. Subsets A and B were recorded from the scalp surface of five healthy volunteers with their eyes open and closed, respectively. Subsets C and D were recorded from five epilepsy patients during seizure-free intervals, outside and within the lesion area, respectively. Subset E was recorded from the seizure zone of the same five patients during seizures. Each EEG signal is divided evenly into four parts, so that each subset becomes a set of 400 single-channel EEG signals of 1024 sampling points.
The CHB-MIT dataset consists of multi-channel EEG signals from 24 epilepsy patients aged 3 to 22 years. Most recordings use 23 EEG channels whose electrode positions and names follow the international 10-20 system [19]. A sliding window with a length of 4 seconds and a step of 2 seconds is used to extract inter-seizure and seizure segments from the continuous EEG signal.
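The windowing step above can be sketched as follows. The 256 Hz value is the CHB-MIT recording rate, an assumption not stated in this text, and the function name is illustrative.

```python
import numpy as np

def sliding_windows(x, fs, win_s=4.0, step_s=2.0):
    """Cut a continuous (channels, samples) recording into overlapping
    segments: 4 s window, 2 s step, as described in the text."""
    win, step = int(win_s * fs), int(step_s * fs)
    n = x.shape[-1]
    starts = range(0, n - win + 1, step)
    return np.stack([x[..., s:s + win] for s in starts])

fs = 256                        # assumed CHB-MIT sampling rate
x = np.zeros((23, fs * 10))     # 23 channels, 10 s of dummy signal
segs = sliding_windows(x, fs)   # (num_segments, channels, window_samples)
```

With a 10 s recording, a 4 s window and a 2 s step, window starts fall at 0, 2, 4 and 6 s, yielding four segments of 1024 samples per channel.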
In the experiment, 70% of each type of sample is used as the training set and 30% is used as the test set. For the Bonn data set, PSD of each sample and the FSWT-FI of each sample's δ, θ, α, β, and γ rhythms are extracted as features. Then feature fusion algorithms are used to get the fusion feature. For the CHB-MIT data set, 23 channels' PSD of each EEG signal and the FSWT-FI of the five EEG rhythms are extracted as features. After forming the concatenated feature vector, feature fusion algorithms are used to get the fusion feature.

Pretreatment, classifier and evaluation index
The raw EEG is contaminated by noise, such as ocular artifacts and EMG interference from the human body. Therefore, to better extract EEG features, wavelet threshold filtering is used for denoising. In addition, a band-stop filter is used to remove power-line interference.
The least squares support vector machine (LS-SVM), which has been widely used in machine learning, is chosen as the classifier. To evaluate the classifier, accuracy, sensitivity and specificity are introduced as evaluation indicators:
$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN},\quad Sensitivity=\frac{TP}{TP+FN},\quad Specificity=\frac{TN}{TN+FP},$$
where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.
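These indicators follow directly from confusion-matrix counts. A minimal sketch for a binary seizure/non-seizure labeling (the toy labels below are illustrative):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Accuracy, sensitivity and specificity for a binary task
    (1 = seizure / positive, 0 = non-seizure / negative)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return acc, sens, spec

acc, sens, spec = evaluate([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

Sensitivity measures how many seizures are caught, while specificity measures how many non-seizure segments avoid false alarms; both matter in clinical use.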

Experiments on Bonn dataset
To cover a wider range of practical clinical situations, such as the classification of multiple types of EEG signals and classification with imbalanced sample counts, the five subsets of the Bonn dataset are combined into 11 classification groups: A vs E, B vs E, C vs E, D vs E, AB vs E, AC vs E, AD vs E, ABC vs E, ABCD vs E, A vs C vs E and AB vs CD vs E. These 11 groups fall into four classification tasks: normal EEG vs seizure EEG, inter-seizure EEG vs seizure EEG, non-seizure EEG vs seizure EEG, and normal EEG vs inter-seizure EEG vs seizure EEG. The classification results on these four tasks are shown in Tables 1, 2, 3 and 4, respectively. The results in the tables are averages over all experiments in the corresponding task, and the optimal result in each task is shown in bold.
It can be seen from the above results that the fusion features formed by combining PSD and FI with the SLPCCA algorithm achieve the best classification performance on all classification tasks, with an average accuracy of 99.08%. Concatenated (series) features are also superior to single features on all tasks, second only to SLPCCA. For the other feature fusion algorithms, which are less suited to epilepsy EEG classification, the results are even inferior to a single feature on some tasks. Given the excellent performance of SLPCCA, however, applying feature fusion to epilepsy EEG classification can be considered feasible and effective. Although the effectiveness of feature fusion has been demonstrated, the differences in classification performance among the algorithms on the Bonn dataset are small. Moreover, because the Bonn dataset is a single-channel EEG dataset with low feature dimensions, the experiments on it cannot reflect the dimension reduction advantage of feature fusion. Therefore, binary classification of inter-seizure versus seizure EEG is carried out on the CHB-MIT dataset. The results are shown in Table 5, with the optimal result in bold.

Experiment on CHB-MIT dataset
It can be seen from Table 5 that the dimension reduction advantage of feature fusion is more obvious on the multi-channel dataset. Compared with the series feature, each fusion feature reduces the dimension to a different degree. The fusion feature obtained by SLPCCA achieves the highest classification accuracy of 97.18% with a feature dimension of roughly one half of the single-feature dimension and one quarter of the series-feature dimension. It is foreseeable that as the feature dimension or the number of EEG channels grows further and the series feature runs into a dimension explosion, the fusion feature obtained by SLPCCA can remain low-dimensional. As in the Bonn experiments, the other fusion algorithms are less suited to epilepsy EEG classification and perform worse than a single feature. However, because CCA and LPCCA consider the correlation between features, their accuracies are still higher than those of single-feature dimension reduction methods such as PCA.

Comparison with other methods
The proposed method is compared with other methods on the same datasets. The comparison results on the Bonn and CHB-MIT datasets are shown in Tables 6 and 7, respectively. The results show that SLPCCA achieves higher accuracy on more comprehensive classification tasks.

Discussion
The results show that although CCA, DCCA and LPCCA achieve dimension reduction, the accuracy of the fusion features they produce suffers. CCA and LPCCA are unsupervised algorithms, so they may mix samples from different categories during feature fusion, thereby reducing classification accuracy. The dimension of the fusion features obtained by DCCA is limited by the number of categories, so DCCA is unsuitable for tasks with few categories. SLPCCA, in contrast, achieves feature dimension reduction while also improving classification accuracy, so it is analyzed further below.

Discussion of algorithm parameter sensitivity
To incorporate the local structure of the data, SLPCCA follows LPCCA in constructing an intra-class k-neighbor matrix of the feature set. Because the parameter k affects the performance of the algorithm, a sensitivity analysis is performed on k, ranging from 15% to 90% of the number of samples in the same category. The accuracy of the task AB vs CD vs E on the Bonn dataset serves as the evaluation index, and the result is shown in Figure 1. When k is greater than 40%, the accuracy is high and fluctuates by no more than 1%. Therefore, SLPCCA can be considered insensitive to this parameter and relatively stable.

Discussion of feature dimension and classification effect
Among the feature fusion algorithms discussed, only the fusion feature dimension of DCCA is limited by the number of classification categories. For CCA, LPCCA and SLPCCA, the dimension of the projection directions depends on the number of non-zero eigenvalues of the generalized eigenvalue equation; this in turn determines the fusion feature dimension and ultimately affects classification. Therefore, the classification accuracies of the fusion features of these three algorithms under different feature dimensions are analyzed, with accuracy on the CHB-MIT dataset as the evaluation index. The result is shown in Figure 2.
As the fusion feature dimension decreases, the accuracy of CCA drops fastest, followed by LPCCA, while SLPCCA maintains relatively high accuracy. This is because LPCCA builds a k-neighbor matrix on top of CCA, so its fusion features carry information about the local structure of the data that CCA's do not. SLPCCA additionally borrows from DCCA, turning the unsupervised fusion into a supervised one; the added sample category information lets SLPCCA pack more information into fewer dimensions and ultimately achieve a better classification result.
In real applications, EEG features usually have high dimension and contain redundant information. Moreover, because EEG has strong frequency characteristics, features from combined frequency bands are often selected, some of which may be irrelevant to the task at hand and thus introduce irrelevant components. It is therefore necessary to use feature fusion methods such as SLPCCA to reduce the dimension while combining multiple features.
Thus, redundant information and irrelevant components in features are reduced, and the application effect of EEG, such as classification accuracy, is improved.

Conclusions
To avoid the dimension explosion and information redundancy caused by concatenating multi-channel EEG features, SLPCCA, which reflects both sample category information and the local structure of the data, is used for feature fusion, and an epilepsy classification method based on SLPCCA with the following three steps is proposed. First, PSD and FSWT-FI are extracted from the EEG as features. Second, SLPCCA is used to find projection directions and merge the two features of each sample into one fused feature after projection; compared with other feature fusion algorithms such as CCA, SLPCCA captures not only the local structure of the samples but also their category information. Finally, the fusion features are used to train and test an LS-SVM. The proposed method is verified on the Bonn dataset and the CHB-MIT dataset. The experimental results show that it achieves an average classification accuracy of 99.16%, a sensitivity of 99.06% and a specificity of 99.58% over the 11 classification tasks on the Bonn dataset, with a feature dimension of only 80% of the series feature dimension. On the inter-seizure versus seizure task of the CHB-MIT dataset, it achieves an accuracy of 97.18%, a sensitivity of 97.10% and a specificity of 97.77%, with a feature dimension of only 25% of the series feature dimension. The proposed method thus not only attains better classification accuracy than the compared methods but also effectively reduces the feature dimension. Moreover, because the method requires no special processing, such as manually selecting specific channels for some patients or different features for different tasks, it can be readily adapted to other epilepsy EEG datasets.