Auto-Weighted Multi-View Discriminative Metric Learning Method With Fisher Discriminative and Global Structure Constraints for Epilepsy EEG Signal Classification

Metric learning is a class of efficient algorithms for the EEG signal classification problem. Usually, metric learning methods deal with EEG signals in a single-view space. To exploit the diversity and complementariness of different feature representations, a new auto-weighted multi-view discriminative metric learning method with Fisher discriminative and global structure constraints, called AMDML, is proposed to promote the performance of epilepsy EEG signal classification. On the one hand, AMDML exploits the multiple features of different views in the scheme of multi-view feature representation. On the other hand, considering both the Fisher discriminative constraint and the global structure constraint, AMDML learns a discriminative metric space in which the intraclass EEG signals are as compact and the interclass EEG signals as separable as possible. To better adjust the weights of the constraints and views, instead of manual adjustment, closed-form solutions are proposed that obtain the best weight values as the optimal model is reached. Experimental results on the Bonn EEG dataset show that AMDML achieves satisfactory results.


INTRODUCTION
Epilepsy is characterized by unexpected, recurrent seizures, in which temporary brain dysfunction is caused by abnormal discharge of neurons (Kabir and Zhang, 2016; Gummadavelli et al., 2018; Li et al., 2019). During a seizure, motor dysfunction, intestinal and bladder dysfunction, loss of consciousness, and other cognitive dysfunctions often occur. Since the occurrence of epilepsy is often accompanied by changes in the spatial organization and temporal dynamics of brain neurons, many brain imaging methods are used to reveal the abnormal neuronal changes caused by epilepsy. The EEG signal is an important signal for recording the activity of neurons in the brain. It uses electrophysiological indicators to record the changes in the electrical waves of the cerebral cortex generated during brain activity, and it is an overall reflection of the activity of brain neurons in the cerebral cortex. Many clinical studies have shown that, due to abnormal discharge of brain neurons, epilepsy-specific waveforms, such as spikes and sharp waves, appear during or shortly before the onset of seizures, so identifying EEG signals is an effective method for detecting epilepsy. Clinically, the detection of seizures based on EEG signals mainly relies on the personal experience of doctors. However, modern EEG recorders can generate up to 1,000 data points per second, and a standard recording session can last for several days. This makes manual screening physically and mentally exhausting, and after a long period of observation, the doctor's judgment is easily affected by fatigue.
With the gradual development of smart healthcare, more and more machine learning algorithms are applied to the detection of epilepsy from EEG signals (Jiang et al., 2017a; Juan et al., 2017; Usman and Fong, 2017; Richhariya and Tanveer, 2018; Cury et al., 2019). From the machine learning perspective, EEG signal recognition contains two stages: feature extraction and classification. The commonly used feature extraction methods for EEG signals are time-domain and frequency-domain feature extractions (Srinivasan et al., 2005; Tzallas et al., 2009; Iscan et al., 2011). Since the original EEG signal is a time series, time-domain feature extraction generally operates on the original EEG signal: the relevant statistics of the time series are calculated, and the epilepsy EEG features are extracted, for example, using kernel principal component analysis (KPCA) (Smola, 1997). Frequency-domain feature extraction transforms the original EEG signal from the time domain to the frequency domain and then extracts the relevant frequency-domain features as EEG features (Griffin and Lim, 1984). Although these feature extraction methods provide good performance in some practical applications, no single feature extraction method can be applied to all application scenarios. EEG signals are generated by numerous brain neuron activities; due to their non-linear and non-stationary nature, how to extract effective features is still an important challenge. For these reasons, the multiple feature based multi-view learning concept has become a hot topic in EEG signal classification (Wen et al., 2020). Different from using a single feature type, a multi-view learning method can comprehensively use a set of data features obtained in multiple ways or at multiple levels. Each view of the data features may contain specific information not available in the other views.
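To make the two feature families above concrete, the following sketch computes simple time-domain statistics and FFT-based band powers for one EEG segment. The statistic and band choices here are illustrative assumptions, not the exact pipeline of any method cited above:

```python
import numpy as np

def time_domain_features(x):
    """Simple statistics of the raw time series: mean, std, zero-crossing rate."""
    zcr = ((x[:-1] * x[1:]) < 0).mean()
    return np.array([x.mean(), x.std(), zcr])

def frequency_domain_features(x, fs):
    """Total spectral power in a few illustrative EEG bands (Hz)."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    bands = [(0.5, 4), (4, 8), (8, 13), (13, 30)]  # delta, theta, alpha, beta
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands])

fs = 173.6                                  # Bonn sampling rate
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10 * t)              # synthetic 10 Hz (alpha-band) oscillation
print(frequency_domain_features(x, fs))     # alpha band carries most of the power
```

In a multi-view setting, each such feature family would populate one view of the data.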
Specifically, these independent and diverse features can be extracted from the time domain, the frequency domain, and multilevel representations of signals. Appropriately designed multi-view learning can significantly promote the performance of EEG signal classification. For example, Spyrou et al. (2018) proposed a multiple features-based classifier that uses spatial, temporal, or frequency EEG data. This classifier performs dimensionality reduction and rejects components by evaluating the classification performance. Two multi-view Takagi-Sugeno-Kang fuzzy systems for epileptic EEG signal classification are proposed in Zhou et al. (2019) and Jiang et al. (2017b), respectively. The former fuzzy system is developed in a deep view-reduction framework, and the latter is developed in a multi-view collaborative learning mechanism.
Besides multi-view learning, the classification algorithm is very important for EEG signal classification. One recent trend is the metric learning method. Metric learning learns a more suitable distance measurement criterion in the feature space from the training data. It can be tailored to specific tasks, such as classification and clustering, so as to represent the similarity between samples more accurately. Different from methods based on the traditional Euclidean distance, such as nearest-neighbor classifiers and K-means, metric learning aims to find appropriate similarity measures between data pairs that maintain the required distance structure (Cai et al., 2015; Wang et al., 2015; Lu et al., 2016). Appropriate distance metrics can provide a good measure of the similarity and dissimilarity between different samples. For example, Liu et al. (2014) developed a similarity metric learning method for EEG P300 wave recognition. Compared with the traditional Euclidean metric, their global Mahalanobis distance metric shows better discriminative representation. Phan et al. (2013) proposed a metric learning method that learns a global distance metric from labeled examples. This method was successfully applied to single-channel EEG data for sleep staging and does not need artifact removal or bootstrapping preprocessing steps. Alwasiti et al. (2020) proposed a deep metric learning model and tested it on motor imagery EEG signal classification. The experimental results show that the proposed deep metric learning model can converge with a very small number of training EEG signals.
Inspired by distance metric and multi-view learning, we present a new auto-weighted multi-view discriminative metric learning method with Fisher discriminative and global structure constraints for EEG signal classification, called AMDML. To better exploit the correlated and complementary data features of multiple views, both the Fisher discriminative constraint and the global structure constraint are adopted in the construction of the distance metric matrix. In the resulting common metric space, the intraclass EEG signals are as compact and the interclass EEG signals as separable as possible. Simultaneously, an auto-weighted learning strategy is developed to automatically adjust the constraint and view weights during model learning. The contributions of our work are as follows: (1) both the Fisher discriminative and global structure information of multiple-view data features are considered in a multi-view metric learning model with high discriminative performance; (2) in the optimization process, the constraint and view weights are adjusted automatically by closed-form solutions instead of manually, so the balance between constraints and the collaboration of multiple views can be optimized; and (3) the experimental results on the Bonn EEG dataset justify the applicability of AMDML for EEG signal classification.

Metric Learning
Here, we introduce the baseline method of this study. Xing et al. (2003) proposed a distance metric with side-information (DMSI) method. Using given similar and dissimilar pairs of samples, DMSI learns a good distance metric that captures the "similar" relationship between all pairs of samples, so that similar pairs are close and dissimilar pairs are separated. Let S and D be the two sets of pairs:

S = {(x_i, x_j) | x_i and x_j are similar},
D = {(x_i, x_j) | x_i and x_j are dissimilar}.

The distance metric is parameterized by a positive semidefinite matrix M, and ||x_i − x_j||_M is induced as a Mahalanobis distance:

||x_i − x_j||_M = sqrt((x_i − x_j)^T M (x_i − x_j)).

The optimization problem of DMSI is represented as

min_M Σ_{(x_i, x_j) ∈ S} ||x_i − x_j||_M^2,  s.t. Σ_{(x_i, x_j) ∈ D} ||x_i − x_j||_M ≥ 1, M ⪰ 0.   (3)

A key point in DMSI is that all sample pairs not explicitly identified as similar are treated as dissimilar. In this way, metric learning tries to find an appropriate measurement that preserves the desired distance structure. When the learned M is a diagonal matrix, Equation (3) can be solved by the Newton-Raphson method; when M is a full matrix, Equation (3) can be solved by an iterative optimization algorithm with gradient ascent and iterative projection steps.
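The Mahalanobis distance above can be sketched in a few lines. This is a minimal illustration of the metric itself (the helper name and toy vectors are ours), not an implementation of the DMSI solver:

```python
import numpy as np

def mahalanobis(xi, xj, M):
    """Distance ||x_i - x_j||_M = sqrt((x_i - x_j)^T M (x_i - x_j))."""
    d = xi - xj
    return float(np.sqrt(d @ M @ d))

xi, xj = np.array([1.0, 2.0]), np.array([4.0, 6.0])

# With M = I the metric reduces to the ordinary Euclidean distance.
print(mahalanobis(xi, xj, np.eye(2)))      # 5.0

# A diagonal M re-weights feature axes; here only the second feature counts.
print(mahalanobis(xi, xj, np.diag([0.0, 1.0])))  # 4.0
```

Learning M amounts to choosing this re-weighting (and, for full M, feature correlations) from the similar/dissimilar pair constraints.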

Bonn EEG Dataset
The EEG signal data in the experiment are from the website of Bonn University, Germany (Tzallas et al., 2009). The Bonn EEG dataset contains five groups of EEG signal sets, called groups A-E. Example samples from groups A-E are shown in Figure 1.
Each EEG data group consists of 100 single-channel EEG signal segments, each 23.6 s long with a 173.6 Hz sampling rate. The basic information of the five groups is listed in Table 1. The EEG signal data in groups A and B are sampled from five healthy volunteers, and the EEG signal data in groups C-E are sampled from five patients at different states of epileptic seizure.
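The per-segment sample count follows directly from these figures: 23.6 s at 173.6 Hz gives roughly 4,097 data points per single-channel segment. A minimal sketch (the array layout is a hypothetical illustration, not the dataset's actual file format):

```python
import numpy as np

# Duration and sampling rate reported for the Bonn dataset.
duration_s, fs = 23.6, 173.6
n_samples = round(duration_s * fs)
print(n_samples)  # 4097

# Hypothetical in-memory layout: one group = 100 segments x n_samples values.
group = np.zeros((100, n_samples))
print(group.shape)
```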

Objective Function
After collecting a set of EEG signals, we obtain N samples presented as {(x_i, l_i)}, i = 1, ..., N, where l_i is the class label of sample x_i. Let X^m denote the feature matrix of the mth view, whose columns are the samples x_i^m. According to the label information, we construct two k-nearest neighbor graphs for each view. The intraclass graph G^m is computed as

G^m_{i,j} = 1 if x_j^m ∈ N_{k_1}(x_i^m); 0 otherwise,   (5)

where N_{k_1}(x_i^m) denotes the intraclass sample set containing the k_1 nearest neighbors of x_i^m. The interclass graph P^m is computed as

P^m_{i,j} = 1 if x_j^m ∈ N_{k_2}(x_i^m); 0 otherwise,   (6)

where N_{k_2}(x_i^m) denotes the interclass sample set containing the k_2 nearest neighbors of x_i^m.

Then, the intraclass correlation constraint F^m_G from the mth view can be written as

F^m_G = Tr(H^T X^m L^m_G X^{mT} H),

where L^m_G is the Laplacian matrix on G^m, computed as L^m_G = D^m_G − G^m; D^m_G is a diagonal matrix with elements D^m_{G,i,i} = Σ_j G^m_{i,j}, and Tr(·) is the trace operator. Analogously, the interclass correlation constraint F^m_P from the mth view can be written as

F^m_P = Tr(H^T X^m L^m_P X^{mT} H),

where L^m_P is the Laplacian matrix on P^m, computed as L^m_P = D^m_P − P^m; D^m_P is a diagonal matrix with elements D^m_{P,i,i} = Σ_j P^m_{i,j}.

To preserve the global structure knowledge of the multiple-view data, the following global structure consistency term Q^m is employed:

Q^m = Tr(H^T X^m L_W X^{mT} H),

where W is an adjacency matrix whose elements are W_{i,j} = 1/N^2; L_W is the Laplacian matrix on W, computed as L_W = D_W − W, where D_W is a diagonal matrix with elements D_{W,i,i} = 1/N. The basic principle of Q^m is to use global structural information through the cross-view data covariance: the term X^m L_W X^{mT} is equivalent to the covariance matrix of the centered mth-view data, so Q^m represents the average squared distance between all samples of the mth view in the metric space. Therefore, Q^m can be considered a principal component analysis (PCA) (Smola, 1997)-like regularization term for the mth view.

The goal of AMDML is to find an optimal discriminative distance metric in a multi-view learning model; in such a metric space, the complementary information of different view data features can be exploited, which further enforces the proposed method to be more discriminative.
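The graphs and trace terms above can be sketched for a single view as follows. This is a simplified illustration with our own helper names and a rows-as-samples convention (the paper stores samples as columns), using a symmetrized neighbor relation as an assumption:

```python
import numpy as np

def knn_graph(X, labels, k, same_class=True):
    """G[i, j] = 1 if x_j is among the k nearest intraclass (or interclass) neighbors of x_i."""
    n = len(X)
    G = np.zeros((n, n))
    for i in range(n):
        # candidates restricted to the intraclass or interclass sample set
        mask = (labels == labels[i]) if same_class else (labels != labels[i])
        mask[i] = False
        cand = np.flatnonzero(mask)
        d = np.linalg.norm(X[cand] - X[i], axis=1)
        for j in cand[np.argsort(d)[:k]]:
            G[i, j] = G[j, i] = 1.0  # symmetrize the neighbor relation
    return G

def laplacian(G):
    """Graph Laplacian L = D - G with D_ii = sum_j G_ij."""
    return np.diag(G.sum(axis=1)) - G

def global_laplacian(n):
    """Global structure Laplacian: W_ij = 1/n^2, so D_W,ii = 1/n."""
    return np.eye(n) / n - np.full((n, n), 1.0 / n**2)

# Toy usage: two tight clusters, one per class (samples are rows here).
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
G = knn_graph(X, labels, k=1, same_class=True)   # intraclass graph G^m
P = knn_graph(X, labels, k=1, same_class=False)  # interclass graph P^m
H = np.eye(2)
# Trace terms of the form Tr(H^T X^T L X H) in the rows-as-samples convention.
F_G = np.trace(H.T @ X.T @ laplacian(G) @ X @ H)
F_P = np.trace(H.T @ X.T @ laplacian(P) @ X @ H)
```

On this toy data the interclass trace F_P is much larger than the intraclass trace F_G, which is exactly the structure the learned metric is meant to preserve and amplify.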
To achieve this goal, we learn a metric that maximizes the Fisher discriminative constraint (the interclass/intraclass correlation ratio) while simultaneously maximizing the preservation of the global structure consistency constraint. The objective function of AMDML is designed as

max_{H, Θ} Σ_{m=1}^M θ_m^r (F^m_P + Q^m) / F^m_G,  s.t. H^T H = I, Σ_{m=1}^M θ_m = 1, θ_m ≥ 0.   (10)

The projection matrix H builds a discriminative metric space shared among multiple views, such that the feature correlation and complementary structural information among multiple views can be exploited. The vector Θ = [θ_1, θ_2, ..., θ_M] is the view weight vector, and its element θ_m indicates the importance of the mth view. When θ_m tends to 0, the data features of the mth view are useless for the discrimination task. When θ_m = 1, only one type of data features from one view is used in AMDML, and in this case, Equation (10) degenerates to a single-view learning problem. To better utilize the complementary information of multiple features rather than relying on the single best feature, we apply an index parameter r (r > 1) to θ_m.
Equation (10) is a trace-ratio problem and can be equivalently rewritten as Equation (11). However, the optimization of Equation (11) involves a complex matrix inverse operation. Using a constraint weight parameter γ, we reconstruct Equation (11) as

max_{H, Θ, γ} Σ_{m=1}^M θ_m^r (F^m_P + Q^m − γ F^m_G),  s.t. H^T H = I, Σ_{m=1}^M θ_m = 1, θ_m ≥ 0,   (12)

where γ represents a constraint weight that trades off the Fisher discriminative constraint against the global structure constraint. It is noted that the constraint weight γ and the view weights Θ are not manually adjusted parameters. In this study, we adaptively update γ and Θ with two closed-form solutions, respectively.

Optimization
Because the optimization problem in Equation (12) is a non-linear constrained non-convex problem, we solve it with an iterative optimization strategy to obtain the AMDML parameters H, Θ, and γ.

First, we update H while fixing Θ and γ. The optimization problem in Equation (12) can be reformulated as

max_H Tr(H^T S H),  s.t. H^T H = I,  with S = Σ_{m=1}^M θ_m^r X^m (L^m_P + L_W − γ L^m_G) X^{mT}.

In terms of Lagrange optimization, this problem can be converted with a multiplier, and H can be easily calculated by eigenvalue decomposition: the columns of H are the eigenvectors of S corresponding to the largest eigenvalues.

Next, we update Θ while fixing H and γ. Let J_m = Tr(H^T X^m (L^m_P + L_W − γ L^m_G) X^{mT} H) denote the objective value of the mth view. Setting ∂J(Θ, γ, α)/∂θ_m = 0, where α is the Lagrange multiplier of the constraint Σ_m θ_m = 1, we obtain θ_m in closed form as

θ_m = (1/J_m)^{1/(r−1)} / Σ_{v=1}^M (1/J_v)^{1/(r−1)}.

Finally, we update γ while fixing H and Θ. Setting ∂J(Θ, γ, α)/∂γ = 0 yields the closed-form solution

γ = Σ_{m=1}^M θ_m^r Tr(H^T X^m (L^m_P + L_W) X^{mT} H) / Σ_{m=1}^M θ_m^r Tr(H^T X^m L^m_G X^{mT} H).

Based on the above analysis, the proposed AMDML method is summarized in Algorithm 1.
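The alternating scheme can be sketched as follows. This is an illustrative reconstruction written in the spirit of standard trace-difference and auto-weighted multi-view solvers; the exact closed forms of Equations (13)-(17) are not fully recoverable from the text, so every update below is our assumption, and the toy Laplacians stand in for the real intra-/interclass graphs:

```python
import numpy as np

def update_H(Xs, Ls_P, Ls_G, L_W, theta, gamma, r, dim):
    """H-step: top-`dim` eigenvectors of S = sum_m theta_m^r X_m (L_P^m + L_W - gamma L_G^m) X_m^T."""
    S = np.zeros((Xs[0].shape[0],) * 2)
    for Xm, LP, LG, th in zip(Xs, Ls_P, Ls_G, theta):
        S += th**r * Xm @ (LP + L_W - gamma * LG) @ Xm.T
    vals, vecs = np.linalg.eigh(S)          # eigh returns ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:dim]]

def view_scores(H, Xs, Ls_P, Ls_G, L_W, gamma):
    """Per-view objective J_m = Tr(H^T X_m (L_P^m + L_W - gamma L_G^m) X_m^T H)."""
    return np.array([np.trace(H.T @ Xm @ (LP + L_W - gamma * LG) @ Xm.T @ H)
                     for Xm, LP, LG in zip(Xs, Ls_P, Ls_G)])

def update_theta(J, r):
    """Theta-step from the Lagrange condition: theta_m proportional to (1/J_m)^(1/(r-1))."""
    w = (1.0 / np.maximum(J, 1e-12)) ** (1.0 / (r - 1))
    return w / w.sum()

def update_gamma(H, Xs, Ls_P, Ls_G, L_W, theta, r):
    """Gamma-step: ratio of (interclass + global) traces to intraclass traces."""
    num = sum(th**r * np.trace(H.T @ Xm @ (LP + L_W) @ Xm.T @ H)
              for Xm, LP, th in zip(Xs, Ls_P, theta))
    den = sum(th**r * np.trace(H.T @ Xm @ LG @ Xm.T @ H)
              for Xm, LG, th in zip(Xs, Ls_G, theta))
    return num / max(den, 1e-12)

# Toy alternating loop: 2 views, 4-dim features, 8 samples (columns).
rng = np.random.default_rng(0)
N, d, r, dim = 8, 4, 2.0, 2

def rand_laplacian():
    A = rng.random((N, N)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
    return np.diag(A.sum(axis=1)) - A

Xs = [rng.standard_normal((d, N)) for _ in range(2)]
Ls_G = [rand_laplacian() for _ in range(2)]
Ls_P = [rand_laplacian() for _ in range(2)]
L_W = np.eye(N) / N - np.full((N, N), 1.0 / N**2)
theta, gamma = np.array([0.5, 0.5]), 1.0
for _ in range(5):
    H = update_H(Xs, Ls_P, Ls_G, L_W, theta, gamma, r, dim)
    theta = update_theta(view_scores(H, Xs, Ls_P, Ls_G, L_W, gamma), r)
    gamma = update_gamma(H, Xs, Ls_P, Ls_G, L_W, theta, r)
```

Each step is closed-form, so the loop needs no step-size tuning; the eigendecomposition keeps H orthonormal, and the theta normalization keeps the view weights on the simplex.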

Input: EEG signal sample pairs from M views;
Output: the best metric projection H = H_l.

Performance Comparisons With Single-View Methods
We first compare the performance of AMDML with several single-view classification methods. NPE with the two classifiers KNN and SVM is denoted as NPE-KNN and NPE-SVM, respectively. Table 3 shows the classification performance of these methods on the Bonn EEG dataset using three signal views (WPD, STFT, and KPCA) and full views. When AMDML uses single-view feature data, the view weight is fixed with θ_m = 1 in its objective function. For a fair comparison, the features of the three signal views are combined for the four single-view classification methods in the full-view setting. We can see that, on the one hand, both AMDML-KNN and AMDML-SVM using full-view features are better than their single-view counterparts. For example, the accuracy of AMDML-KNN with full-view features is 1.44, 1.57, and 1.31% higher than its accuracy with WPD, STFT, and KPCA features on Task 1, respectively. On the other hand, the classification accuracies of AMDML-KNN and AMDML-SVM are better than those of the single-view methods on all 10 tasks. These results show that (1) a simple combination of features provides only limited improvement in classification performance for single-view methods, and (2) due to the inherent diversity and complexity of EEG signals, it is suitable to exploit multiple view features to better make use of correlated and complementary EEG data. Thus, the multi-view learning framework can promote EEG signal classification performance.

Performance Comparisons With Multi-View Methods
In this subsection, we compare AMDML with several multi-view classification methods. The multi-view metric learning method DMML uses KNN and SVM as testing classifiers, and the two resulting classifiers are named DMML-KNN and DMML-SVM, respectively. Figure 2 shows the classification accuracies of all methods on all EEG classification tasks. In addition, we use the balanced loss l_bal (Wang et al., 2014; Gu et al., 2020) to evaluate all methods, and the results are shown in Figure 3. In the framework of multi-view learning, and to best discriminate each signal category from all other categories, AMDML learns a discriminative metric space that utilizes global and local information by adopting the Fisher discriminative constraint and global structure constraint. Thus, intraclass compactness and interclass separability of EEG signals perform better in the learned metric space. In addition, the auto-weighted learning strategy used in the proposed method adjusts the constraint and view weights: the optimal weights can be obtained adaptively, and the multiple feature representations in each view can be collaboratively learned. Similar to the results shown in Table 3, the classification accuracies of AMDML-KNN and AMDML-SVM are comparable. To summarize, the results in Figures 2, 3 confirm that the AMDML method is effective in EEG signal classification.

Model Analysis
To further validate the performance of AMDML, we first discuss the effects of k_1 in Equation (5) and k_2 in Equation (6). The parameters k_1 and k_2 build the k-nearest neighbor intra- and interclass graphs, respectively. For convenience, we set k_1 = k_2 in the range {2, ..., 10}. Figure 4 shows the classification accuracy of AMDML-KNN with different values of k_1 for Tasks 1, 4, and 8; meanwhile, the number of nearest neighbors in the KNN classifier is fixed at 7. We can see that the performance of AMDML-KNN is not highly sensitive to the variation of k_1 and k_2. Next, for AMDML-KNN, we discuss the effect of K in the KNN classifier. In the KNN classifier, the class label of a testing sample is determined by its K nearest training samples. Figure 5 shows the classification performance for different values of K on all tasks; meanwhile, k_1 = k_2 = 5 is fixed. We can see that the classification accuracy of AMDML-KNN is relatively stable with respect to the variation of K. Therefore, we empirically set K to 7 for Tasks 1, 4, and 8.
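The evaluation step pairs the learned projection with a KNN vote; a minimal numpy sketch of that pairing (our own helper on synthetic data, not the experimental pipeline):

```python
import numpy as np

def knn_predict(H, X_train, y_train, X_test, K=7):
    """Majority-vote KNN in the learned metric space z = H^T x (samples are rows here)."""
    Z_tr, Z_te = X_train @ H, X_test @ H
    preds = []
    for z in Z_te:
        idx = np.argsort(np.linalg.norm(Z_tr - z, axis=1))[:K]
        vals, counts = np.unique(y_train[idx], return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Toy usage: two well-separated Gaussian classes, projected to 2 dimensions.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 0.3, (20, 3)), rng.normal(3.0, 0.3, (20, 3))])
y_train = np.array([0] * 20 + [1] * 20)
H = np.eye(3)[:, :2]                      # stand-in for a learned projection
preds = knn_predict(H, X_train, y_train, np.array([[0.0, 0, 0], [3.0, 3, 3]]), K=7)
print(preds)
```

A larger K smooths the decision over more neighbors, which is why the accuracy curve in Figure 5 stays flat once the classes are well separated in the metric space.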

CONCLUSION
In this paper, we propose a new multi-view metric learning method to achieve a robust distance metric for EEG signal classification. In the scheme of multi-view data representation, the diversity and complementariness of the features of all views can be exploited; meanwhile, both the Fisher discriminative constraint and the global structure constraint are considered, so the learned classifier obtains high generalization ability. Through learning a discriminative metric space, AMDML shows higher classification performance. There are several directions for future study. First, in this paper, we use the k-nearest neighbor intra- and interclass graphs to exploit local discriminative information; we will consider other discriminative terms in the multi-view metric learning framework. Second, the iterative optimization method used in this study is a simple and common solution method; we may develop a more effective method to speed up the solution of our method. Third, we plan to apply the proposed method to more EEG signal classification applications.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: http://epileptologie-bonn.de/cms/upload/ workgroup/lehnertz/eegdata.html.