Electroencephalography (EEG) based epilepsy diagnosis via multiple feature space fusion using shared hidden space-driven multi-view learning

Epilepsy is a chronic, non-communicable disease caused by paroxysmal abnormal synchronized electrical activity of brain neurons, and is one of the most common neurological diseases worldwide. Electroencephalography (EEG) is currently a crucial tool for epilepsy diagnosis. With the development of artificial intelligence, multi-view learning-based EEG analysis has become an important method for automatic epilepsy recognition because EEG contains different types of features, such as time-frequency features, frequency-domain features and time-domain features. However, current multi-view learning still faces some challenges; for example, the difference between samples of the same class from different views can be greater than the difference between samples of different classes from the same view. In view of this, we propose a shared hidden space-driven multi-view learning algorithm. The algorithm uses kernel density estimation to construct a shared hidden space and combines the shared hidden space with the original space to obtain an expanded space for multi-view learning. By learning from both the shared hidden space and the original space, the relevant information of samples within and across views can be fully utilized. Experimental results on an epilepsy dataset provided by the University of Bonn show that the proposed algorithm has promising performance, with an average classification accuracy of 0.9787, an improvement of at least 4% over single-view methods.


INTRODUCTION
Epilepsy is a chronic, non-infectious but genetic disease that affects all ages and is caused by paroxysmal abnormal hypersynchrony of brain neurons. It is one of the most common neurological diseases globally. Due to the diversity and complexity of its clinical manifestations, epilepsy is often misdiagnosed or missed. Repetitive seizures can have a persistent negative impact on the patient's mental and cognitive functions, and can even threaten their life. Therefore, the study of epilepsy diagnosis and treatment has important clinical significance. The electroencephalogram (EEG) is a microvolt-level electrical signal generated by synchronized neurons in the brain and recorded by electrodes placed at specific locations on the scalp. As the most commonly used and cheapest non-invasive brain wave detection method, EEG has a research history of over 70 years and is the most effective method for diagnosing epilepsy-related diseases, for example by identifying seizures, predicting their occurrence, and localizing the affected areas.
With the development of artificial intelligence, machine learning models are extensively used in automatic epilepsy recognition. Feature representation is a crucial step in machine learning. Research has indicated that EEG signals can be represented by both linear and nonlinear features. Time-domain features are the fundamental features in EEG signal processing, primarily extracted by directly observing and calculating relevant characteristics from the raw signal. Their advantages lie in their simplicity of computation and ease of interpretation. However, the non-stationarity of EEG signals, individual differences, and external interferences can easily affect time-domain features. Frequency-domain features are based on the significant changes in EEG energy during epileptic seizures, assuming that the background EEG is approximately stationary. Most frequency-domain features are derived from the study of signal power spectra, and various parameter estimation methods can be used for extracting spectral features. The accuracy of these parameters also affects the quality of frequency-domain features. In terms of the amount of information contained in the features, neither pure time-domain features nor pure frequency-domain features can comprehensively characterize an EEG signal. Additionally, EEG analysis based on the assumption of stationarity is not rigorous. Therefore, researchers have turned their attention to time-frequency analysis methods, such as time-frequency transformations, to re-represent non-stationary EEG signals and extract corresponding features. In addition to the aforementioned linear features, many studies also consider the brain as a nonlinear system and extract corresponding nonlinear features from descriptions of complexity, persistence, synchrony, and other changes in the system. These features are not affected by the non-stationarity of EEG signals and offer more flexibility in dealing with issues such as multi-channel correlation and channel loss.
Based on the aforementioned linear or nonlinear feature representations, numerous scholars have constructed machine learning models for the automatic diagnosis of epilepsy. For example, Li, Chen & Zhang (2016) employed a dual-tree complex discrete wavelet transform to extract nonlinear features from individual components, used an ANOVA analysis to select relevant classification features, including the Hurst parameter and fuzzy entropy, and employed a support vector machine (SVM) for classification. Reddy & Rao (2017) computed the central correlated entropy of wavelet components obtained from the tunable Q-factor wavelet transform, and utilized models such as RF, LR, and multi-layer perceptron for epileptic signal recognition. Jaiswal & Banka (2017) proposed a feature extraction method called local gradient pattern transformation and applied classification methods such as k-nearest neighbors, SVM, and decision trees for epilepsy detection.
The aforementioned machine learning-based epilepsy diagnostic models utilize a single EEG feature representation and have low model complexity and high interpretability. However, these models rely on expert knowledge, and deep features are not easily observed and extracted. As a result, their accuracy is limited. Multi-view learning (Zhao et al., 2017; Jiang et al., 2020; Zhang, Chung & Wang, 2018; Yan et al., 2021) improves the classification accuracy of models by utilizing the differences and similarities between multiple views, based on the principles of view consistency and complementarity. For example, Tian et al. (2019) utilized a convolutional neural network (CNN) model to extract deep features from EEG signals in the time domain, frequency domain, and time-frequency domain. These features were constructed as three views, and multi-view learning was conducted using a multi-view Takagi-Sugeno-Kang (TSK) fuzzy system, which improved the classification and detection performance compared to a single view. Yuan et al. (2018) implemented multi-view automatic epilepsy diagnosis by utilizing channel characteristics and intra-channel time-frequency features of multichannel EEG signals, extracted using an autoencoder (AE) through channel perception technology. Liu & Li (2019) utilized a user-sensitive model for channel selection and extracted time-frequency features from each sub-band of the selected channels, forming multi-view features. They extracted numerical and morphological features using a common spatial projection matrix and utilized a maximum average difference autoencoder to extract inter-channel time-frequency domain features, enabling automatic diagnosis of epilepsy with multiple views. These effective models based on collaborative regularization can construct a common feature space for multi-view learning. However, they also have certain limitations. First, they construct the density distribution of each view solely from the corresponding observed data and thus overlook the correlated information among all views. Second, they separate the original sample space from the common space obtained through mapping and learn only in the common space, neglecting the discriminative information present in the original space.
To overcome these shortcomings, in this study a shared hidden feature space is constructed by using kernel density estimation, and it is extended to an expanded space by combining it with the original space. Then, SVM is introduced and a multi-view SVM based on the shared hidden space is proposed to carefully account for the differences and relationships between samples from different views. Experimental verification on different multi-view datasets confirms the effectiveness of this method in addressing the challenges mentioned above. The contributions of this study are mainly reflected in the following aspects:
(1) The kernel density estimation (KDE) technique is used to construct a new shared hidden space, which is combined with the original space to construct an expanded space for multi-view learning, thus effectively addressing the special issue of multi-view learning mentioned above.
(2) By constructing the expanded space and learning from both the shared hidden space and the original space, the relevant information of samples within and across views is fully utilized. This effectively solves the problem that the difference between samples of the same class from different views is greater than the difference between samples of different classes from the same view.
(3) During the optimization phase, the proposed model is transformed into a classical quadratic programming (QP) problem, so that readily available optimization methods, which are highly efficient and come with theoretical guarantees, can be applied.
The remaining sections are organized as follows. In 'Data', we introduce the EEG data used in this study and the corresponding multiple feature space representation. In 'Methodology', we present the proposed model. In 'Experimental studies', experimental results are reported, and in the last section the whole study is summarized.

DATA
The EEG data of epileptic patients used in this study was authorized and provided by the University of Bonn in Germany (Andrzejak et al., 2001), as shown in Table 1.

Frequency-domain representation extraction
Frequency-domain feature representation originates from the significant changes in EEG energy during epileptic seizures. To extract a frequency-domain representation from EEG signals, Daubechies4 wavelet coefficients are utilized to decompose the original signals into a series of binary wavelets. The frequency band of each Daubechies4 wavelet coefficient is provided in Table 2. With these settings, the EEG signals are divided into six distinct frequency bands. An illustrative example of the decomposed signals from group E is depicted in Fig. 2.
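As a rough illustration of how a wavelet decomposition partitions the spectrum, the sketch below lists the nominal band edges of a dyadic discrete wavelet transform. It assumes a five-level decomposition (which yields six sub-bands) and the 173.6 Hz sampling rate of the Bonn recordings; the function name and the level count are illustrative assumptions, and the bands actually used in this study are those of Table 2, which may differ.

```python
def dyadic_bands(fs, levels):
    """Nominal (low, high) band edges in Hz for a dyadic wavelet decomposition.

    The detail coefficients at level j nominally cover [fs / 2**(j + 1), fs / 2**j],
    and the final approximation covers [0, fs / 2**(levels + 1)].
    """
    bands = {}
    for j in range(1, levels + 1):
        bands[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands


FS = 173.6   # sampling rate reported for the dataset
LEVELS = 5   # assumption: five levels give six sub-bands (D1..D5 and A5)

bands = dyadic_bands(FS, LEVELS)
for name, (low, high) in bands.items():
    print(f"{name}: {low:.2f}-{high:.2f} Hz")
```

The band edges are nominal: real wavelet filters overlap at the boundaries, so energy near an edge is shared between neighbouring sub-bands.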

Time-domain feature extraction
Time-domain features are the fundamental features in EEG signal processing, primarily extracted by directly observing and calculating relevant characteristics from the raw signal. Their advantages lie in their simplicity of computation and ease of interpretation for researchers. In this study, we employ kernel principal component analysis (KPCA) (Li et al., 2022b) on the raw EEG signals to enable complex nonlinear mapping. Previous research has shown that KPCA features offer discriminative patterns suitable for pattern recognition. An illustration depicting an example of KPCA features from group E can be observed in Fig. 3.

Group | #Volunteers | Collection information
A | 100 | This group was collected from healthy volunteers who were instructed to keep their eyes open during the recording process. These volunteers did not have any known neurological or psychiatric disorders and were not experiencing any abnormal symptoms at the time of data collection.
B | 100 | This group was collected from healthy volunteers under conditions where they kept their eyes closed.
C | 100 | This group was collected from the hippocampal formation of the contralateral hemisphere of the brain during seizure-free intervals, that is, when the patient was not experiencing any epileptic seizures.
D | 100 | This group was collected from the epileptogenic zone during periods of seizure freedom, that is, when the patient was not experiencing seizures.
E | 100 | This group was collected during the seizure activity phase, offering a unique opportunity to study the temporal dynamics of epileptic seizures and paving the way for the development of more accurate and reliable seizure detection and prediction algorithms.

Time-frequency representation extraction
Pure time-domain or frequency-domain feature representations alone cannot comprehensively characterize an EEG signal, and EEG analysis based on the assumption of stationarity is not rigorous. Therefore, researchers have turned their attention to time-frequency analysis methods, such as time-frequency transformations, to re-represent non-stationary EEG signals and extract corresponding features. To capture a time-frequency representation, researchers often employ the short-time Fourier transform (STFT) (Li et al., 2022a), which allows for the analysis of how the frequency content of a signal changes over time. In the context of EEG signal analysis, Eq. (1) represents the transformation of continuous EEG signals, denoted as $x(\mathrm{time})$, into the time-frequency plane using the [...] (1), which takes into account the observed discrepancies. A visualization of these six energy bands, exemplified by group E, is illustrated in Fig. 4.
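To make the idea of band energies on the time-frequency plane concrete, the following sketch computes a discrete STFT with a sliding Hann window and sums the squared magnitudes inside a frequency band for each frame. It is a minimal illustration only: the window length, hop size, band limits and the sinusoidal test signal are hypothetical choices, not the settings used in this study.

```python
import cmath
import math


def stft(signal, win_len, hop):
    """Discrete STFT with a Hann window; returns one list of complex bins per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (win_len - 1)) for n in range(win_len)]
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(win_len)]
        bins = []
        for k in range(win_len // 2 + 1):  # non-negative frequencies only
            bins.append(sum(segment[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                            for n in range(win_len)))
        frames.append(bins)
    return frames


def band_energy(frames, fs, win_len, f_low, f_high):
    """Per-frame sum of squared magnitudes over bins with f_low <= frequency < f_high."""
    energies = []
    for bins in frames:
        energy = 0.0
        for k, value in enumerate(bins):
            if f_low <= k * fs / win_len < f_high:
                energy += abs(value) ** 2
        energies.append(energy)
    return energies


FS = 173.6                      # sampling rate reported for the dataset
WIN_LEN, HOP = 64, 32           # hypothetical window length and hop size
# Hypothetical test signal: a 10 Hz sinusoid, 256 samples long.
x = [math.sin(2 * math.pi * 10.0 * n / FS) for n in range(256)]

frames = stft(x, WIN_LEN, HOP)
low_band = band_energy(frames, FS, WIN_LEN, 5.0, 15.0)
high_band = band_energy(frames, FS, WIN_LEN, 40.0, 80.0)
```

For this test signal the energy concentrates in the 5-15 Hz band in every frame, which is the kind of per-band, per-frame quantity that a time-frequency feature vector can be built from.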

METHODOLOGY
In this section, we will design a shared hidden space-driven multi-view learning method to fuse time-frequency representation, frequency-domain representation and time-domain representation.

Construction of shared hidden feature space
Suppose that the transformation matrix, of size $r \times d$, is an orthogonal matrix, that $f_A = \{x_i^A, y_i \mid x_i^A \in \mathbb{R}^d,\ i = 1, 2, \ldots, N\}$ represents one kind of feature space, e.g., the time-domain feature space, and that $f_B = \{x_i^B, y_i \mid x_i^B \in \mathbb{R}^d,\ i = 1, 2, \ldots, N\}$ represents another kind of feature space. Then the hidden feature spaces of $f_A$ and $f_B$ can be generated by projecting $x_i^A$ and $x_i^B$ into $\mathbb{R}^r$ with the transformation matrix, where $r$ represents the number of hidden features. To obtain a consistent hidden feature space between the two projections, the difference between them should be minimized as much as possible.
Kernel density estimation (KDE), one of the non-parametric estimation methods in probability theory, is usually used to estimate an unknown probability density function (Wang, Wang & Chung, 2013). For a training set $X = \{x_i, y_i \mid x_i \in \mathbb{R}^d,\ i = 1, 2, \ldots, N\}$, the corresponding kernel density estimation function is given by Eq. (2), where $\delta$ is the kernel width and $K(\cdot)$ is the kernel function. If the Gaussian kernel function is adopted, Eq. (2) can be updated accordingly, and the kernel density estimates of the projected $x_i^A$ and $x_i^B$, denoted $P_A(x)$ and $P_B(x)$, are expressed in the same way (Eq. (3) and its counterpart for the second view). In this study, the difference between $P_A(x)$ and $P_B(x)$ is measured by the mean square error $J$ (Eq. (5)). By minimizing $J$, the two-view data $x_i^A$ and $x_i^B$ can be made to have the maximum commonality in the shared hidden space, and thus the challenge of excessive variability between samples from different views can be addressed.
To solve Eq. (6), $P_A(x)$ and $P_B(x)$ are first rewritten so that Eq. (5) can be computed explicitly. According to Hansen, Jaumard & Xiong (1994), this yields Eq. (9). However, it is difficult to solve Eq. (9) directly. Thus, a Taylor expansion is used to obtain an approximate solution, with which Eq. (9) is further updated to the minimization problem in Eq. (11). In Eq. (11), the implicit feature transformation matrix still cannot be solved directly, but it can be solved by the gradient descent method: the partial derivative of $J$ with respect to the transformation matrix is computed, and the matrix is then updated iteratively, where the step size is itself obtained from a separate expression. According to the above analysis and derivation, the algorithm for solving the implicit feature transformation matrix is described in Algorithm 1.
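The role of the density discrepancy $J$ can be illustrated with a small numerical sketch. The code below uses the standard one-dimensional Gaussian kernel density estimate, $\hat{p}(x) = \frac{1}{N \delta \sqrt{2\pi}} \sum_i \exp\left(-\frac{(x - x_i)^2}{2\delta^2}\right)$, and approximates the mean square difference between two estimated densities on a grid. It assumes a single hidden feature ($r = 1$), a hypothetical kernel width, and made-up projected samples; it is not the closed-form derivation of this section and does not perform the gradient descent step.

```python
import math


def gaussian_kde(samples, delta):
    """Return the 1-D Gaussian kernel density estimate of samples with kernel width delta."""
    n = len(samples)
    norm = 1.0 / (n * delta * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-((x - s) ** 2) / (2.0 * delta ** 2)) for s in samples)

    return density


def mean_square_difference(p_a, p_b, grid):
    """Grid approximation of the mean square error between two densities."""
    return sum((p_a(x) - p_b(x)) ** 2 for x in grid) / len(grid)


DELTA = 0.3                                     # hypothetical kernel width
STEP = 0.01
grid = [-3.0 + STEP * i for i in range(701)]    # evaluation grid on [-3, 4]

# Hypothetical hidden features (r = 1) of the same samples seen from two views.
view_a = [0.0, 0.2, 0.4]
view_b_aligned = [0.05, 0.25, 0.35]             # projection that agrees with view A
view_b_shifted = [1.0, 1.2, 1.4]                # projection that disagrees with view A

p_a = gaussian_kde(view_a, DELTA)
j_aligned = mean_square_difference(p_a, gaussian_kde(view_b_aligned, DELTA), grid)
j_shifted = mean_square_difference(p_a, gaussian_kde(view_b_shifted, DELTA), grid)
area = sum(p_a(x) for x in grid) * STEP         # should be close to 1 for a density
```

A projection that brings the two views together gives a smaller discrepancy (`j_aligned`) than one that leaves them apart (`j_shifted`), which is the quantity the transformation matrix is trained to reduce.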

Multi-view learning based on shared hidden feature space
After determining the shared hidden space between two views, the extended space can be generated by combining the original space and the shared hidden space. Then, a multi-view classifier based on SVM is designed for multi-view data classification in the extended space. In existing multi-view learning mechanisms, it is generally assumed that each view can provide a classifier containing specific information, and that classifiers constructed from different views tend to be consistent. Additionally, since views can provide specific information to each other, the proposed model establishes the objective function by considering the mutual information between two views. In summary, the proposed model, based on SVM, restructures the slack variables on each view and then narrows the gap between the two views by using the corresponding regularization term. The objective function of multi-view learning based on the shared hidden feature space is formulated as the minimization problem in Eq. (16), where $\lambda$, $C_A$ and $C_B$ are the regularization parameters.
Observe that Eq. (16) consists of three parts: the first four terms reflect the structural risk in the original feature space and the shared hidden space, respectively; the next two terms represent the empirical risk; and the last term reflects the difference between the two views in the shared hidden space. The objective function in Eq. (16) strengthens the constraints of the traditional SVM through the implicit mapping, so that the probability distributions of data from different views in the shared hidden space are as consistent as possible, which can well solve the problem described at the beginning of this study.
To solve Eq. (16) efficiently, Lagrangian multipliers are introduced according to Lagrangian optimization theory, so that Eq. (16) can be converted into the corresponding dual form. In the Lagrangian function corresponding to Eq. (16), $\alpha_i^A \geq 0$, $\alpha_i^B \geq 0$, $\mu_i^A \geq 0$, and $\mu_i^B \geq 0$ are Lagrangian multipliers. Setting the partial derivatives of the Lagrangian function $L$ with respect to the primal variables (the weight vectors through the slack variables $\xi_i^B$) to 0 gives Eqs. (18)-(22). By substituting Eqs. (18)-(22) into Eq. (16), we have the dual problem of Eq. (24), in which $K$ is the kernel function. It is obvious that the optimization of Eq. (23) can be considered as a QP problem, which can be solved according to Deng et al. (2013). The decision function of the proposed model is then defined from this solution. The algorithm of multi-view learning based on the shared hidden feature space is shown in Algorithm 2. From Algorithm 2, we can find that the time complexity is mainly contributed by steps 1, 3 and 4. The time complexity of Algorithm 1 [...]

Algorithm 1 Shared hidden feature space generation.

Hu et al. (2024), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.1874

EXPERIMENTAL STUDIES

Settings
To observe the merits of the proposed model, k-nearest neighbor (KNN) (Liu & Liu, 2016), support vector machine (SVM) (Liu & Liu, 2016), SVM-2K (Farquhar et al., 2005), multi-view L2-SVM (MV-L2-SVM) (Huang, Chung & Wang, 2016), and alternative multi-view MED (AMVMED) (Chao & Sun, 2015) are introduced for comparison. Accuracy is used as the evaluation indicator in this study. SVM, SVM-2K, MV-L2-SVM, and the proposed 2V-SVM-SH are all trained using a Gaussian kernel. For all methods, ten-fold cross-validation (CV) is used to determine the optimal parameters. Table 3 provides the specific parameters and ranges used for each method. All experiments are conducted on a PC with a 16-core 3.40 GHz CPU and 32 GB of memory. The programming environment is Matlab R2016a.

Algorithm 2, steps 1-3:
1. Use Algorithm 1 to obtain the transformation matrix.
2. Use the transformation matrix to obtain the shared hidden space.
3. Solve the $\tilde{\alpha}_i$ according to Eq. (23).

To construct a two-view learning scenario, based on 'Data', three feature extraction methods, namely wavelet packet decomposition (WPD), short-time Fourier transform (STFT) and kernel principal component analysis (KPCA), are adopted to extract frequency-domain features, time-frequency features and time-domain features, respectively, from the original EEG signals, as shown in Fig. 2. Finally, 12 datasets are constructed, as shown in Table 4.
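The ten-fold cross-validation used above for parameter selection can be sketched as follows. The fold construction and the search loop are generic; the candidate values, the sample count of 200 and the scoring function `toy_score` are hypothetical placeholders that stand in for training and evaluating an actual classifier, and the real parameter ranges are those of Table 3.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, validation


def select_parameter(n_samples, candidates, score_fn, k=10):
    """Return the candidate with the highest mean validation score and that score."""
    best_param, best_score = None, float("-inf")
    for param in candidates:
        scores = [score_fn(param, training, validation)
                  for training, validation in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_param, best_score = param, mean_score
    return best_param, best_score


def toy_score(param, training, validation):
    """Hypothetical stand-in for 'train on training, return accuracy on validation'."""
    return 1.0 - 0.1 * abs(math.log10(param))


splits = list(k_fold_indices(200, k=10))
best_c, best_acc = select_parameter(200, [0.01, 0.1, 1.0, 10.0, 100.0], toy_score)
```

In a real experiment `toy_score` would be replaced by a function that fits the classifier with the given parameter on the training indices and returns its accuracy on the validation indices.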

Experimental results and analysis
The experimental results are reported in Table 5. We can see from Table 5 that the proposed model achieves the best performance on most datasets. Only on DS5 and DS9 does the proposed model perform worse than SVM-2K and MV-L2-SVM. The advantages of the proposed model indicate the promising ability of the shared hidden space. These results suggest that, by constructing the expanded space and learning from both the shared hidden space and the original space, the proposed model fully utilizes the relevant information of samples within and across views, and thereby effectively solves the problem that the difference between samples of the same class from different views is greater than the difference between samples of different classes from the same view. The experimental results also indicate the power of KDE, which is used to construct the shared hidden space.

Statistical analysis
We use the Friedman test (Zimmerman & Zumbo, 1993; Sakamoto et al., 2015) to conduct a statistical analysis of the experimental results of all methods across all datasets. The Friedman test is a non-parametric testing method that can be used to analyze whether there are significant differences in performance among multiple methods on multiple datasets. The principle is to first obtain the average ranking of each method's performance on all datasets, and then compare whether these rankings are the same. If they are the same, all methods have the same performance; otherwise, there are significant differences in performance among the methods. If there are significant differences, we further use a Holm post-hoc hypothesis test to analyze which methods differ significantly from our proposed algorithm. From Fig. 5, we see that 2V-SVM-SH achieves the best ranking. The p-values embedded in Fig. 5, computed by the Friedman test, indicate that there are significant differences among the models. From Table 6, it can be seen that all hypotheses are rejected except the proposed model vs AMVMED and the proposed model vs SVM-2K.
These results indicate that the proposed model performs significantly better than KNN-A, KNN-B, SVM-B, SVM-A and MV-L2-SVM. Although the hypotheses of the proposed model vs AMVMED and the proposed model vs SVM-2K are not rejected, their low p-values also indicate the competitiveness of the proposed model.
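The ranking-based procedure described above can be sketched in a few lines. The code computes average ranks (rank 1 is best, ties share the mean rank), the Friedman chi-square statistic $\chi_F^2 = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$ for $N$ datasets and $k$ methods, and the Holm step-down decision for a set of post-hoc p-values. The accuracy table and the p-values are made-up illustrative numbers, not the results of Table 5 or Table 6.

```python
def average_ranks(scores):
    """scores[i][j] is the accuracy of method j on dataset i; rank 1 is the best."""
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda j: -row[j])
        pos = 0
        while pos < n_methods:
            end = pos
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2.0 + 1.0   # tied methods share the mean rank
            for t in range(pos, end + 1):
                totals[order[t]] += shared_rank
            pos = end + 1
    return [total / len(scores) for total in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic computed from the average ranks."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12.0 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4.0)


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break   # once one hypothesis is retained, all larger p-values are retained
    return reject


# Hypothetical accuracies: 4 datasets (rows) x 3 methods (columns).
accuracies = [
    [0.90, 0.85, 0.80],
    [0.92, 0.88, 0.81],
    [0.95, 0.90, 0.85],
    [0.91, 0.89, 0.80],
]
ranks = average_ranks(accuracies)
chi_square = friedman_statistic(accuracies)
decisions = holm_reject([0.001, 0.04, 0.03], alpha=0.05)
```

The statistic is compared against a chi-square distribution with $k - 1$ degrees of freedom to obtain the Friedman p-value; the Holm procedure is then applied to the pairwise comparisons against the proposed model.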

CONCLUSIONS
In this study, a multi-view support vector machine based on a shared hidden space is constructed using kernel density estimation. The method is designed to address the decrease in recognition performance caused by differences in sample characteristics between views in multi-view learning. It incorporates SVM into the shared hidden space, and the resulting model is transformed into a classic QP problem that can be solved effectively. Experimental results on EEG-based epilepsy diagnosis demonstrate that the proposed method is better able to extract complementary information between different views than the other methods. In practical applications, annotating training samples is often a time-consuming task. Therefore, in subsequent research, we intend to extend the multi-view algorithm proposed in this article to transfer learning scenarios, aiming to reduce the reliance on labeled samples.
The dataset included volunteers who could be divided into five groups, namely A, B, C, D, and E. Each group contained 100 single-channel EEG segments lasting 23.6 s, with a sampling rate of 173.6 Hz. The EEG signals of groups A and B were collected from healthy volunteers in a relaxed and conscious state; the eyes of the volunteers were open during the data collection of group A and closed during the data collection of group B. The signals of the remaining three groups were collected from epileptic volunteers, with group C's signals collected from the hippocampi of the two brain hemispheres and group D's signals collected from the epileptic foci. The signals of groups C and D were measured during periods without epileptic seizures, while the signals of group E were collected during epileptic seizures. Figure 1 provides an example of EEG signals from the five groups.

Algorithm 2, steps 4-5:
4. Solve $w_A^T$, $w_B^T$, $b_A$, $b_B$, $v_A$ and $v_B$ by Eqs. (18)-(22).
5. Construct the decision function based on $w_A^T$, $w_B^T$, $b_A$, $b_B$, $v_A$ and $v_B$.

Table 1
Basic collection information of epilepsy EEG signals.

Table 2
Frequency band of each Daubechies4 wavelet coefficient.
Algorithm 2 Multi-view learning based on shared hidden feature space.
Input: training samples of view-1: $\{x_i^A, y_i\}$, training samples of view-2: $\{x_i^B, y_i\}$, regularization parameters $C_A$, $C_B$ and $\lambda$
Output: $w_A^T$, $w_B^T$, $b_A$, $b_B$, $v_A$ and $v_B$
Procedures:

Table 4
Two-view learning scenarios.

Table 5
Classification performance in terms of accuracy on all multi-view learning scenarios.
Note: Bold entries indicate the best performance achieved by the corresponding method.