Latent Prototype-Based Clustering: A Novel Exploratory Electroencephalography Analysis Approach

Electroencephalography (EEG)-based applications in brain–computer interfaces (BCIs), neurological disease diagnosis, rehabilitation, etc., rely on supervised approaches such as classification, which requires given labels. However, with the ever-increasing amount of EEG data, incompletely labeled, incorrectly labeled, or unlabeled EEG data are increasing, which likely degrades the performance of supervised approaches. In this work, we put forward a novel unsupervised exploratory EEG analysis solution: clustering based on low-dimensional prototypes in latent space that are associated with the respective clusters. With the prototype as a baseline of each cluster, a compositive similarity is defined to act as the critic function in clustering, incorporating similarities on three levels. The approach is implemented with a Generative Adversarial Network (GAN), termed W-SLOGAN, by extending the Stein Latent Optimization for GANs (SLOGAN). The Gaussian Mixture Model (GMM) is utilized as the latent distribution to adapt to the diversity of EEG signal patterns. The W-SLOGAN ensures that images generated from each Gaussian component belong to the associated cluster. The adaptively learned Gaussian mixing coefficients keep the model effective in dealing with imbalanced datasets. By applying the proposed approach to two public EEG or intracranial EEG (iEEG) epilepsy datasets, our experiments demonstrate that the clustering results are close to the classification of the data. Moreover, we present several findings that were discovered by intra-class clustering and cross-analysis of clustering and classification. They show that the approach is attractive in practice in the diagnosis of epileptic subtypes, multiple labelling of EEG data, etc.


Introduction
Electroencephalography (EEG) is a well-established non-invasive tool to record brain electrophysiological activity. Compared to other neuroimaging techniques that provide information about anatomical structure (e.g., MRI, CT, and fMRI), EEG offers ultra-high time resolution, which is critical in understanding brain function. As the mainstream means for examining brain electrical activity, EEG techniques have wide applications in cognitive neuroscience, emotion recognition [1], motor imagery [2], and the diagnosis of diseases such as autism, schizophrenia, and epilepsy [3]. However, these applications are mostly focused on supervised tasks that require a priori knowledge, such as EEG labels that define the class the signals belong to. Not all EEG labels associated with specific patterns of brain activity can be completely or correctly obtained from subjects across different recording sessions. This is especially so for patients with complex conditions, such as those suffering from stroke [4], Alzheimer's disease (AD) [5], amyotrophic lateral sclerosis (ALS) [6], or epileptic seizures [7]. Therefore, the increasing amount of incompletely labeled, incorrectly labeled, or unlabeled EEG data likely degrades the efficacy of supervised techniques, e.g., classification, which is crucial to brain–computer interface (BCI)-based applications and disease diagnosis. Moreover, EEG signals often have multiple attributes and consequently need to be annotated with multiple labels. Some EEG data may have subclasses that deserve more attention in clinical applications, as in epileptic subtype diagnosis. In practice, either multiple labelling or subclass labelling is almost infeasible due to the labor and time costs of the labelling.
Dai et al. suggested a semi-supervised EEG clustering method that makes good use of limited a priori knowledge [8]. In our work, we focus on unsupervised clustering. Clustering aims to organize the elements of a dataset into distinct clusters according to the resemblance of the intrinsic patterns of the data. Data elements of the same cluster are characterized by a similarity higher than those of other clusters [9]. Clustering is probably the most important and fundamental means of exploratory data analysis for finding intrinsic hidden information and patterns (if any) without requiring a priori knowledge, such as detecting unknown kinds of abnormal states from brain imaging data.
The most widely applied clustering techniques, such as K-means, rely on distance to assign clusters: they assign each data element to the nearest cluster center and find the most appropriate centers by optimizing a distance-based objective function [10]. Benefiting from this simple principle, distance-based algorithms are the most commonly used.
To adapt to the diversity of data, distribution-based approaches such as the Gaussian Mixture Model (GMM) have drawn increasing attention. They employ predefined probability distribution functions to reproduce data elements [11]. If the predefined distribution cannot be adaptively adjusted, the clustering efficacy relies on the capability of the trial probability distribution to represent the data. Based on the Density Peaks Clustering (DPC) algorithm [12], Gao et al. formed an adaptive density peaks clustering (ADPC) solution for exploratory EEG analysis [13].
Generative Adversarial Networks (GANs) have achieved remarkable success in many unsupervised learning tasks [14]. Recently, in order to better fit the target data distribution when the image dataset includes many different classes, some variants of the basic GAN model have been proposed in which the probability distribution over the latent space is a mixture of Gaussians, including the Gaussian Mixture GAN (GM-GAN), the dynamic GM-GAN [15], and DeLiGAN [16]. These models tend to map latent vectors sampled from different Gaussians in the latent space to samples of different classes in the image data space. This phenomenon implies that they may be exploited for the task of unsupervised clustering. However, these GANs do not provide an inverse mapping from the data space X to the latent space Z. Therefore, given a query data point, we cannot know which latent variable it was generated from; that is, we cannot obtain its latent space representation.
Some GAN techniques make use of an encoder that has the potential to provide another form of back-projection, such as InfoGAN [17], the Variational Auto-Encoder GAN (VAE-GAN) [18], and Stein Latent Optimization for GANs (SLOGAN) [19]. However, they are usually not specifically designed for clustering.
The main contributions of this work are as follows.
A novel unsupervised approach is put forward for exploratory EEG analysis. The basic idea is to form a kind of GAN to learn a Gaussian mixture distribution in latent space, from which the prototype or center associated with each cluster can be abstracted. Then, based on the latent prototypes and according to a well-defined similarity metric, the query EEG data are assigned to a cluster.
By applying the proposed approach to two public EEG or intracranial EEG (iEEG) epilepsy datasets, our experiments demonstrate that the clustering results are close to the classification of the data. Moreover, several findings show that the approach is attractive in practice for the diagnosis of epileptic subtypes, multiple labelling of EEG data, etc.

Materials
In this work, two publicly available EEG epilepsy datasets were used in the experiments, the benchmark Bonn dataset and the HUP iEEG dataset.

Bonn Dataset
The Bonn dataset [20], collected at the University of Bonn, contains EEG and iEEG signals from healthy volunteers and epileptics. Muscle activity and eye movement artifacts were already removed from the collected data on the basis of visual inspection [21]. The complete database consists of five sets denoted A–E. Sets A and B contain scalp EEG signals collected from healthy volunteers with their eyes open (A) and closed (B). Set C contains iEEG recordings from the hippocampal formation of the opposite hemispheric region during inter-ictal periods [22]. Set D comprises iEEG signals collected from within the epileptic zone of the brain of patients during seizure-free intervals. Set E contains data collected from within the epileptogenic zone of patients during the ictal period. Detailed descriptions of the dataset are shown in Table 1. Each set contains 100 single-channel EEG or iEEG segments with a sampling rate of 173.61 Hz and a duration of 23.6 s.

HUP IEEG Epilepsy Dataset
The HUP dataset [23], collected at the Hospital of the University of Pennsylvania, contains intracranial EEG (iEEG) signals from 58 patients diagnosed with drug-resistant epilepsy. Each of the 58 subjects underwent iEEG with subdural grid, strip, and depth electrodes (electrocorticography (ECoG)) or purely stereotactically placed depth electrodes (sEEG). Since each patient's epilepsy type may not be the same, a patient-specific study is necessary [24]. We chose the ECoG signals of three de-identified patients, HUP65, HUP88, and HUP89, for the experiments. Details of the dataset are provided in Table 2. The data for each patient include three ictal and two inter-ictal segments, stored in EDF format. Each ictal segment includes recordings from two minutes before the seizure onset, which were viewed as the pre-ictal period in this study. Therefore, the data of each patient can be categorized into three periods: pre-ictal, ictal, and inter-ictal. Figure 1 illustrates the different periods in EEG data collected from epileptics.

Methods
Given a test data point, the probability of its belonging to each cluster can be calculated using a given critic function, which enables us to assign the data point to a cluster. We propose a critic function based on the test data's low-dimensional prototype in latent space, termed the latent prototype. Each prototype is responsible for a certain attribute of the data, namely a cluster. With the prototype as a baseline of each cluster, we are able to define a compositive critic metric that incorporates the similarities between the test data and the prototype of a given cluster on three levels: the latent representation level, the image level, and the deep feature map level.

Schematic of Latent Prototype-Based Clustering
According to the above considerations, we put forward an unsupervised EEG clustering approach based on latent prototypes. Its schematic is briefly illustrated in Figure 2 and sketched in code below. First, train a W-SLOGAN on the EEG dataset to learn a generator, a discriminator, an encoder, as well as the latent prototypes µ_k that are responsible for the respective clusters. Given a query signal, transform it with the continuous wavelet transform to a scalogram x_query. The wavelet transform is an effective technique for analyzing the local characteristics of non-stationary signals, offering both time-domain and frequency-domain resolution. Then, utilizing the trained W-SLOGAN, calculate three levels of similarities separately between (i) the latent space representation of the query signal and the latent prototype of each cluster, (ii) the scalogram of the query signal and the baseline scalogram of each cluster, and (iii) the deep feature map (DFM) of the query signal and the baseline deep feature map of each cluster. Obtain the compositive similarity between the query signal and the prototype of each cluster by incorporating the above three levels of similarities. Finally, convert the compositive similarity to a probability with the SoftMax function, which enables us to assign a cluster to the query signal.
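A minimal sketch of this query pipeline, assuming a trained encoder E, generator G, and DFM extractor together with the learned prototypes mu. Cosine similarity is used as a placeholder metric for all three levels (an assumption for illustration; the paper's exact metrics are given by Equations (7)-(10)):

```python
# Sketch of cluster assignment for one query scalogram. E, G, dfm and
# the prototype matrix mu (shape [N_clusters, d_z]) come from a trained
# W-SLOGAN; flat_cosine is a placeholder similarity on all three levels.
import numpy as np

def flat_cosine(a, b):
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def zscore(s):
    # normalize each similarity level separately before combining
    return (s - s.mean()) / (s.std() + 1e-8)

def assign_cluster(x_query, mu, E, G, dfm, alphas=(1/3, 1/3, 1/3)):
    e_query = E(x_query)                                     # latent representation
    s_lat = np.array([flat_cosine(e_query, m) for m in mu])  # level (i)
    s_img = np.array([flat_cosine(x_query, G(m)) for m in mu])            # level (ii)
    s_dfm = np.array([flat_cosine(dfm(x_query), dfm(G(m))) for m in mu])  # level (iii)
    s_com = alphas[0]*zscore(s_lat) + alphas[1]*zscore(s_img) + alphas[2]*zscore(s_dfm)
    probs = np.exp(s_com - s_com.max()); probs /= probs.sum()  # SoftMax
    return int(np.argmax(probs)), probs
```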


Gaussian Mixture Distribution in Latent Space
GANs usually use a unimodal distribution as the prior distribution for Z, such as the multivariate uniform distribution (i.e., U[−1, 1]^{d_z}) or the multivariate normal distribution (i.e., N(0, I_{d_z×d_z})) [15]. To better adapt to the diversity of the real data, the W-SLOGAN adopts a multimodal Gaussian mixture distribution as the prior from which the latent space is sampled, as shown in Figure 3.
The Gaussian mixture distribution is defined as

q(z) = ∑_{k=1}^{N} p(k) q(z|k), (1)

where N is the number of Gaussian components, which can be predefined by the number of data clusters; p(k) is the mixing coefficient; and q(z|k) denotes the probability distribution of the kth Gaussian component, formulated as q(z|k) = N(z; µ_k, Σ_k), where µ_k and Σ_k denote the mean vector and covariance matrix of the kth Gaussian component, respectively.
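As a concrete illustration, the following sketch samples latent vectors from such a mixture prior; the dimensions are illustrative, and the initial values match the initialization described later (p(k) = 1/N, Σ_k = I, µ_k ∼ N(0, I)):

```python
# A minimal sketch of sampling z ~ q(z) = sum_k p(k) N(z; mu_k, Sigma_k).
import numpy as np

rng = np.random.default_rng(0)
N, d_z = 3, 64                          # number of components, latent dim (illustrative)
p = np.full(N, 1.0 / N)                 # mixing coefficients p(k), init uniform
mu = rng.normal(size=(N, d_z))          # mean vectors (latent prototypes)
Sigma = np.stack([np.eye(d_z)] * N)     # covariance matrices, init to identity

def sample_latent(batch_size):
    """Pick a component k ~ p(k), then draw z ~ N(mu_k, Sigma_k)."""
    ks = rng.choice(N, size=batch_size, p=p)
    zs = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in ks])
    return zs, ks

z_batch, k_batch = sample_latent(64)
print(z_batch.shape, np.bincount(k_batch, minlength=N))
```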

W-SLOGAN
We decided to form a kind of GAN to obtain the latent prototype of each cluster, which is needed in the calculation of the compositive similarity. A GAN is a generative model that learns a generator (G) capable of generating samples from the data distribution (p_data) by converting latent vectors from a lower-dimensional latent space (Z) to samples in a higher-dimensional data space (X). Specifically, we need a kind of GAN such that: (i) the latent space distribution of the GAN is defined as a Gaussian mixture distribution to model the diversity of the data; (ii) it is able to learn the latent prototype that is responsible for each cluster from the data distribution; (iii) it is able to back-project the query image to the latent space, for which an encoder (E) is needed; and (iv) well-defined objective functions are available for training G, D, and E and the latent distribution. Based on the above considerations, we put forward a GAN, termed W-SLOGAN, which takes advantage of both the Wasserstein GAN with Gradient Penalty (WGAN-GP) and the Stein Latent Optimization for GANs (SLOGAN), especially the latter. The W-SLOGAN adopts the discriminator objective function and adversarial loss function proposed in WGAN-GP, with which the training can be more stable and the generated images can be of better quality. Also, it utilizes an encoder as well as an Unsupervised Conditional Contrastive loss (U2C loss), ensuring that the encoded vector of a generated image is similar to its assigned low-dimensional prototype in the latent space.
In the following, the network architecture, objective functions, and optimization algorithm of the W-SLOGAN will be described.

Network Architecture
Figure 4 shows the network architecture of the W-SLOGAN, which consists of a generator (G), a discriminator (D), and an encoder (E). G is responsible for mapping the latent space (Z), defined by the Gaussian mixture distribution, to the real image domain (X) (G(z): Z → X). In this mapping, the mean vector µ_k of each Gaussian component in the latent space can be viewed as the prototype of samples with a certain salient attribute, i.e., the average representation of that attribute in the latent space. D receives the generated images (x_g) and real images (x_r) to train its capability to discriminate between real and fake, providing the driving force for the training of the generator. E maps an image onto a space of the same dimension as the latent space. In order to stabilize the process of adversarial learning and improve the learning of images, the W-SLOGAN adopts the convolutional layer structure that was introduced in the Deep Convolutional GAN (DCGAN) [25]. Table 3 provides the implementation details of the W-SLOGAN model.
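For orientation, a sketch of DCGAN-style G, D, and E networks in TensorFlow/Keras follows; the layer widths are illustrative and do not reproduce Table 3 exactly. The critic outputs an unbounded score (WGAN style), and the last convolution of D provides the deep feature map used later:

```python
import tensorflow as tf
from tensorflow.keras import layers

d_z = 64  # latent dimension (illustrative)

def make_generator():
    return tf.keras.Sequential([
        layers.Dense(4 * 4 * 256, use_bias=False, input_shape=(d_z,)),
        layers.Reshape((4, 4, 256)),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same"),
        layers.BatchNormalization(), layers.ReLU(),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same"),
        layers.BatchNormalization(), layers.ReLU(),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same"),
        layers.BatchNormalization(), layers.ReLU(),
        # 64x64x3 scalogram; tanh matches the (-1, 1) pixel scaling
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ])

def make_discriminator():
    return tf.keras.Sequential([
        layers.Conv2D(64, 4, strides=2, padding="same", input_shape=(64, 64, 3)),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"), layers.LeakyReLU(0.2),
        layers.Conv2D(256, 4, strides=2, padding="same"), layers.LeakyReLU(0.2),  # last conv: DFM
        layers.Flatten(),
        layers.Dense(1),  # WGAN critic score (no sigmoid)
    ])

def make_encoder():
    return tf.keras.Sequential([
        layers.Conv2D(64, 4, strides=2, padding="same", input_shape=(64, 64, 3)),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"), layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(d_z),  # encoded vector in the latent space
    ])
```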

Objective Functions
The W-SLOGAN is trained to learn the parameters of the generator G, discriminator D, and encoder E from the data, as well as the parameters (µ_k, Σ_k, and p(k)) of the Gaussian mixture distribution of the latent space. Well-defined objective functions are necessary for this training. We chose a discriminator objective function and an adversarial loss function that are the same as those of WGAN-GP. The unsupervised conditional contrastive loss (U2C loss) proposed by Hwang et al. [19] for SLOGAN was also employed in the training. In the training of the W-SLOGAN, D and (G, E, µ_k, Σ_k, and p(k)) were updated alternately. D was trained with the discriminator objective function, and G, E, µ_k, Σ_k, and p(k) were trained with a total objective function that comprises the adversarial loss and the U2C loss.
Discriminator objective function. The discriminator objective function is defined as that of WGAN-GP, which helps to stabilize the training process and provides the driving force for the training of the generator [26-28]:

L_D = E_{x_g∼p_g}[D(x_g)] − E_{x_r∼p_data}[D(x_r)] + λ_1 E_{x̂∼p_x̂}[(‖∇_x̂ D(x̂)‖_2 − 1)^2], (2)

where λ_1 denotes the gradient penalty coefficient and x̂ is sampled along the line between the real training data distribution and the generated data distribution. Such a design can make the model converge faster.
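A sketch of this objective in TensorFlow, assuming the model builders above; x̂ is formed by interpolating between real and generated batches, with λ_1 = 10 as in WGAN-GP:

```python
import tensorflow as tf

def discriminator_loss(D, x_real, x_gen, lambda_1=10.0):
    # Wasserstein terms: score generated samples high -> loss up, real high -> loss down
    w_loss = tf.reduce_mean(D(x_gen)) - tf.reduce_mean(D(x_real))
    # interpolate along straight lines between real and generated samples
    eps = tf.random.uniform([tf.shape(x_real)[0], 1, 1, 1], 0.0, 1.0)
    x_hat = eps * tf.cast(x_real, tf.float32) + (1.0 - eps) * x_gen
    with tf.GradientTape() as tape:
        tape.watch(x_hat)
        d_hat = D(x_hat)
    grads = tape.gradient(d_hat, x_hat)
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    gp = tf.reduce_mean(tf.square(grad_norm - 1.0))  # penalize ||grad|| away from 1
    return w_loss + lambda_1 * gp
```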
Adversarial loss function. The purpose of minimizing the adversarial loss is to make the samples generated by the generator as realistic as possible, so that the discriminator cannot accurately distinguish the generated samples from the real ones. The adversarial loss function is defined as that of WGAN-GP, as formulated in (3):

L_adv = −E_{z∼q(z)}[D(G(z))]. (3)
Total objective function. The W-SLOGAN learns the parameters of the generator (G), the encoder (E), and the Gaussian mixture distribution of the latent space by minimizing a total objective function that comprises the unsupervised conditional contrastive loss (U2C loss) and the adversarial loss:

L_total = L_adv + λ_2 L_U2C, (4)

where λ_2 denotes the weight coefficient of the U2C loss.
U2C loss. With the U2C loss, the training allows each salient attribute to cluster in the latent space, and each component of the learned latent distribution becomes responsible for a certain attribute of the data. Given a batch of latent vectors {z_i}_{i=1}^{B} ∼ q(z) (where B is the batch size), we can find the corresponding Gaussian component K_i (with mean vector µ_{K_i}) to which z_i most likely belongs by the use of (5):

K_i = argmax_k p(k) N(z_i; µ_k, Σ_k). (5)
The generator receives the latent vector z_i and generates the corresponding sample x_g^i = G(z_i). Then, the generated sample x_g^i is mapped by the encoder to an encoded vector e_i = E(x_g^i). The U2C loss is computed from the cosine similarities between e_i and the prototypes µ_{K_j} in a contrastive manner (Equation (6)). In this way, by minimizing the U2C loss, the training encourages the encoded vectors of samples with the same prototype to be as similar as possible to that prototype. This allows each component of the learned latent distribution to be responsible for a certain cluster of the data.
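A sketch of a U2C-style contrastive loss, assuming a softmax cross-entropy over the cosine similarities between encoded vectors and prototypes; the exact formulation is that of Hwang et al. [19]:

```python
import tensorflow as tf

def u2c_loss(e, mu, k_assign):
    """e: [B, d_z] encoded vectors of generated samples; mu: [N, d_z]
    latent prototypes; k_assign: [B] component index K_i per sample."""
    mu = tf.cast(mu, e.dtype)
    e_n = tf.math.l2_normalize(e, axis=1)
    mu_n = tf.math.l2_normalize(mu, axis=1)
    sims = tf.matmul(e_n, mu_n, transpose_b=True)   # [B, N] cosine similarities
    # pull e_i toward its assigned prototype and push it away from the others
    return tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=k_assign, logits=sims))
```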

Optimization Algorithm of Latent Distribution Parameters
In order to train the parameters of the Gaussian mixture distribution of the latent space, it is crucial to obtain the gradients of the parameters during the training. Gurumurthy et al. [16] and Ben-Yosef et al. [15] adopted the "reparameterization trick" proposed by Kingma et al. [29] in their related work to update the mean vectors µ_k and covariance matrices Σ_k of each Gaussian component. However, this method assumes uniform mixing coefficients p(k) that are fixed; as a consequence, it fails to generate data in the case of imbalanced datasets. Based on the generalized Stein lemma, Hwang et al. [19] derived gradient identities for the parameters of the Gaussian mixture distribution, which not only enable µ_k and Σ_k to be updated but also ensure that the mixing coefficients p(k) can be updated. This is called the Stein latent optimization algorithm. The W-SLOGAN employs the Stein latent optimization algorithm to enable imbalanced attributes to be naturally clustered in a continuous latent space. Table 4 presents a comparison of these two reparameterization techniques.

Step 1. Parameter initialization. Initialize the parameters of the Gaussian mixture distribution, including µ_k, Σ_k, and p(k), as well as the parameters of the three networks G, D, and E.
Step 2. Train D for b_D times. Train D with the discriminator objective function as presented in Equation (2).
Step 3. Train G, E, µ_k, Σ_k, and p(k) once. Train them with the total objective function as presented in Equation (4). Go to Step 2.
The loop of Steps 2 and 3 stops after it has been carried out a predefined number of times. A skeleton of this procedure is sketched below.
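The following skeleton reuses the sketches above (the make_* builders, sample_latent, discriminator_loss, and u2c_loss); next_real_batch is a hypothetical data iterator, and the Stein latent optimization updates of µ_k, Σ_k, and p(k) rely on the gradient identities of Hwang et al. [19], so they are only indicated by a comment:

```python
import tensorflow as tf

B, b_D, lambda_2, total_iters = 64, 5, 1.0, 18000   # lambda_2 illustrative
G, D, E = make_generator(), make_discriminator(), make_encoder()
opt_D = tf.keras.optimizers.Adam(4e-4)    # D's learning rate is 4*eta
opt_GE = tf.keras.optimizers.Adam(1e-4)   # eta for G and E

for it in range(total_iters):
    for _ in range(b_D):                                    # Step 2: train D
        z, _ = sample_latent(B)
        x_real = next_real_batch(B)                         # hypothetical iterator
        with tf.GradientTape() as tape:
            loss_D = discriminator_loss(D, x_real, G(z, training=True))
        grads = tape.gradient(loss_D, D.trainable_variables)
        opt_D.apply_gradients(zip(grads, D.trainable_variables))
    z, k = sample_latent(B)                                 # Step 3: train G, E (and latent params)
    with tf.GradientTape() as tape:
        x_g = G(z, training=True)
        loss_total = -tf.reduce_mean(D(x_g)) + lambda_2 * u2c_loss(E(x_g), mu, k)
    vars_GE = G.trainable_variables + E.trainable_variables
    grads = tape.gradient(loss_total, vars_GE)
    opt_GE.apply_gradients(zip(grads, vars_GE))
    # ...update mu, Sigma, p here via Stein latent optimization [19]
```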

Compositive Similarity Metric
In our clustering approach, the similarity metric plays the role of the critic function, which enables us to assign a cluster to the query data. With the prototype as a baseline of each cluster, we put forward a compositive similarity metric that combines the similarities between the test data and the prototype of a given cluster on three levels, namely the latent representation, the image, and the deep feature map. Figure 5 illustrates the three levels of similarity for clustering.
Latent representation similarity. The scalogram x_query of the query signal is mapped to the latent space by the encoder (E) to become an encoded vector, which can be viewed as the latent representation of the query signal, denoted by e_query = E(x_query). The latent representation similarity measures the similarity between e_query and the latent prototype that is responsible for a given cluster. It is defined in (7), using cosine similarity to measure the similarity between two vectors:

S_latent^k = (e_query · µ_k) / (‖e_query‖_2 ‖µ_k‖_2), (7)

where S_latent^k denotes the similarity between the query signal and the kth cluster in latent space.
Image similarity. It measures the similarity between the scalogram of the query signal, x_query, and the baseline scalogram x_k of a given cluster, which is generated by the generator G from the prototype µ_k of that cluster, i.e., x_k = G(µ_k). The image similarity S_image^k is defined accordingly (Equation (8)).
DFM similarity. It measures the similarity between the DFM of the query signal, DFM_query, and the baseline DFM of a given cluster, DFM_k = DFM(x_k). The DFM refers to the output of the last convolution layer of the discriminator, which can be viewed as a deep feature map of a given image. This kind of deep feature is inspired by the work of Nhan et al. [30], where the discriminator is employed as an unsupervised feature extractor. The DFM similarity S_DFM^k is defined accordingly (Equation (9)).
Compositive similarity. We define a compositive similarity between the query signal and the centroid of the kth cluster that incorporates all three levels of similarities:

S_com^k = α_1 S_latent^k + α_2 S_image^k + α_3 S_DFM^k, (10)

where α_1, α_2, and α_3 are the weight coefficients of the three similarities. As the dimensions are different, it is necessary to normalize S_latent^k, S_image^k, and S_DFM^k separately before calculating the compositive similarity.
Through some sensitivity tests, we found that the proposed clustering approach is somewhat sensitive to the α_1 and α_2 values. Therefore, to seek good settings for them, we carried out multiple experiments in which α_1 and α_2 were set to different values. Then, based on the resulting clustering performance evaluated with external clustering indexes, we determined the optimal values. In real applications where the data are unlabeled, the optimal values can also be determined by the use of internal clustering indexes. The query data can then be assigned to the cluster with the highest compositive similarity (i.e., argmax_k S_com^k). With respect to computational complexity, when applied to a dataset of size n, the complexity of both the training and the clustering of the W-SLOGAN algorithm is O(n), i.e., linear. A toy numeric example of the compositive scoring is sketched below.
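In the toy example below, the raw similarity values are invented for illustration and deliberately given different scales, which shows why per-level normalization is needed before combining (Equation (10)):

```python
# Three clusters, three similarity levels with different scales,
# z-score normalized per level and combined with equal weights 1/3.
import numpy as np

s_latent = np.array([0.91, 0.42, 0.38])     # cosine similarities, range [-1, 1]
s_image  = np.array([-12.4, -30.1, -28.8])  # e.g., negative distances (other scale)
s_dfm    = np.array([0.63, 0.22, 0.30])

def zscore(s):
    return (s - s.mean()) / (s.std() + 1e-8)

s_com = (zscore(s_latent) + zscore(s_image) + zscore(s_dfm)) / 3.0
probs = np.exp(s_com - s_com.max()); probs /= probs.sum()   # SoftMax
print(probs, "-> cluster", int(np.argmax(s_com)) + 1)       # cluster 1 wins here
```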

External Clustering Indexes
To evaluate the closeness between the clustering results of the proposed approach and the classification of the data, three widely used external clustering indexes were adopted, namely Purity, the Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI).
Purity. Purity is an intuitive evaluation index that indicates the degree of agreement between the clustering results and the real data distribution. It is defined as

Purity = (1/n) ∑_{i=1}^{k} max_j n_ij, (11)

where n denotes the total number of samples, k denotes the total number of clusters, and n_ij denotes the number of samples in both class u_j and cluster v_i. The range of the Purity value is [0, 1], where a higher value indicates a purer clustering result.
ARI. The ARI measures the degree of similarity between two data distributions [31]. It takes into consideration the consistency between the resulting cluster labels and the class labels. The definition of the ARI is

ARI = [∑_{ij} C(n_ij, 2) − ∑_i C(a_i, 2) ∑_j C(b_j, 2) / C(n, 2)] / [(1/2)(∑_i C(a_i, 2) + ∑_j C(b_j, 2)) − ∑_i C(a_i, 2) ∑_j C(b_j, 2) / C(n, 2)], (12)

where n_ij denotes the number of samples in both class u_j and cluster v_i, a_i denotes the number of samples in class u_i, b_j denotes the number of samples in cluster v_j, n is the total number of samples, and C(·, 2) denotes the binomial coefficient, e.g., C(n_ij, 2) for n_ij samples taken two at a time.
NMI. NMI evaluates the consistency between two distributions by measuring their mutual information [32]. The definition of NMI is

NMI(C, K) = MI(C, K) / √(H(C) H(K)), (13)

where MI(C, K) denotes the mutual information between the class labels and the resulting cluster labels, H(C) is the entropy of the classification labels, and H(K) is the entropy of the clustering results. The range of the NMI value is [0, 1], where 1 indicates perfect consistency, which means the clustering is exactly consistent with the classification.
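These three indexes are standard. As a sketch, Purity can be computed from the contingency matrix, while ARI and NMI are available in scikit-learn (note that sklearn's default NMI normalization may differ slightly from Equation (13)):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def purity(classes, clusters):
    m = contingency_matrix(classes, clusters)  # rows: classes, cols: clusters
    return m.max(axis=0).sum() / m.sum()       # majority class per cluster

classes  = np.array([0, 0, 0, 1, 1, 2, 2, 2])  # toy class labels
clusters = np.array([0, 0, 1, 1, 1, 2, 2, 0])  # toy cluster labels
print(purity(classes, clusters),
      adjusted_rand_score(classes, clusters),
      normalized_mutual_info_score(classes, clusters))
```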

Experimental Setup and Running Environment
The Bonn dataset consists of five subsets (A, B, C, D, and E), which were organized into four groups in the experiments. This grouping considers the complexity of the signals of each subset and the clinical relevance of the subsets. We segmented the scalp EEG and iEEG signals from the Bonn dataset using a sliding time window of 2.88 s with 50% overlap. For each subset, the 100 signals, each with a 23.6 s duration, were segmented into 1500 samples, each with a 2.88 s duration. In the experiments, the cluster number was set according to the class number in each group. Detailed descriptions of the four groups are shown in Table 5.
The HUP dataset comprises three classes of signals: pre-ictal, ictal, and inter-ictal. The pre-ictal class is defined as the two minutes before seizure onset. The ECoG signals were segmented with a sliding time window of 5 s with 50% overlap. The number of clusters was set to 3. The description of the ECoG data of the three de-identified epileptics in the HUP dataset is outlined in Table 6. In each experimental group, all the data were used for the unsupervised training of the W-SLOGAN; then, with the trained model, all the EEG segments were clustered; finally, the clustering results were evaluated with the three external clustering indexes.
Preprocessing. Each signal was filtered by an FIR bandpass filter, preserving information within the frequency range from 0.5 to 40 Hz. The scalp EEGs and iEEGs from the Bonn dataset were segmented into 2.88-s segments, while the ECoG signals from the HUP dataset were segmented into 5-s segments. The 1-d time series segments were transformed into 2-d scalograms. The Morlet wavelet was selected as the mother wavelet in the continuous wavelet transform. The scalogram dimension is 64 × 64 × 3 (3 is the number of RGB channels). Before feeding these scalograms into the W-SLOGAN model for training, each pixel value of the scalograms was scaled to ensure its range fell within (−1, 1).
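The segmentation and CWT were implemented in MATLAB (see the running environment below); a rough Python equivalent, assuming SciPy and PyWavelets, might look as follows. The crude 64 × 64 resize and single-channel output are simplifications: the paper's 64 × 64 × 3 scalograms come from an RGB colormap rendering.

```python
# Sketch of the preprocessing: 0.5-40 Hz FIR bandpass, 50%-overlap
# segmentation, Morlet CWT, and scaling of the scalogram to (-1, 1).
import numpy as np
import pywt
from scipy.signal import firwin, filtfilt

fs = 173.61                                  # Bonn sampling rate (Hz)
win = int(2.88 * fs)                         # 2.88-s window, 50% overlap

def preprocess(signal):
    b = firwin(numtaps=401, cutoff=[0.5, 40], fs=fs, pass_zero="bandpass")
    filtered = filtfilt(b, [1.0], signal)    # zero-phase FIR bandpass
    segments = [filtered[s:s + win]
                for s in range(0, len(filtered) - win + 1, win // 2)]
    scalograms = []
    for seg in segments:
        coef, _ = pywt.cwt(seg, scales=np.arange(1, 65), wavelet="morl",
                           sampling_period=1.0 / fs)
        sc = np.abs(coef)                    # 64 scales x win time points
        sc = sc[:, np.linspace(0, sc.shape[1] - 1, 64).astype(int)]  # crude resize
        sc = 2 * (sc - sc.min()) / (sc.max() - sc.min() + 1e-8) - 1  # to (-1, 1)
        scalograms.append(sc)
    return np.stack(scalograms)
```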
Parameter settings. For simplicity, we denote the learning rate of the generator as η, the learning rate of the covariance Σ_k as γ, the gradient penalty coefficient as λ_1, the weight coefficient of the U2C loss as λ_2, and the weights for the latent representation, image, and DFM similarities as α_1, α_2, and α_3, respectively. In the experiments, the learning rate of the discriminator was set to 4η and the learning rate of the encoder to η. The learning rate of the latent prototype µ_k was set to 10γ and the learning rate of the mixing coefficient p(k) to γ. Specifically, the parameter values were η = 0.0001, γ = 0.004, and λ_1 = 10. In addition, we initialized p(k) = 1/N and Σ_k = I_{d_z×d_z}, and sampled µ_k from N(0, I_{d_z×d_z}). The three weights α_1, α_2, and α_3 were empirically set to 1/3.
During the training, the Adam optimizer was employed to train G, D, and E, and the stochastic gradient descent (SGD) optimizer was adopted to train Σ_k, µ_k, and p(k). The batch size (B) was 64, and the number of training iterations was set to 18,000 to ensure sufficient training. In the clustering experiments, we repeated each experiment several times and report the means and standard deviations of the model performances. Table 7 shows the details of the experimental parameter settings.
Running environment. The experiments were run on a desktop computer equipped with an Intel(R) Core(TM) i9-10900K CPU (Intel, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 3080 GPU (NVIDIA, Santa Clara, CA, USA). Segmentation and continuous wavelet transform of the signals were implemented with MATLAB (R2019a), while the training and evaluation of the W-SLOGAN were carried out with Python 3.7 and TensorFlow 2.6.0.
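For reference, the stated settings and optimizer wiring collected into one sketch; the value of λ_2 is not given in the prose (it appears in Table 7), so the value below is a placeholder:

```python
import tensorflow as tf

eta, gamma = 1e-4, 4e-3                            # eta = 0.0001, gamma = 0.004
cfg = dict(
    lr_G=eta, lr_E=eta, lr_D=4 * eta,              # Adam for G, E, D
    lr_mu=10 * gamma, lr_Sigma=gamma, lr_p=gamma,  # SGD for latent parameters
    lambda_1=10.0,                                 # gradient penalty coefficient
    lambda_2=1.0,                                  # placeholder; actual value in Table 7
    batch_size=64, iterations=18000,
    alphas=(1 / 3, 1 / 3, 1 / 3),                  # similarity weights
)
opt_G = tf.keras.optimizers.Adam(cfg["lr_G"])
opt_D = tf.keras.optimizers.Adam(cfg["lr_D"])
opt_E = tf.keras.optimizers.Adam(cfg["lr_E"])
opt_mu = tf.keras.optimizers.SGD(cfg["lr_mu"])
opt_Sigma = tf.keras.optimizers.SGD(cfg["lr_Sigma"])
opt_p = tf.keras.optimizers.SGD(cfg["lr_p"])
```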

Clustering Results
When applying the proposed approach to the benchmark Bonn EEG dataset, the clustering results and the classification of the data are highly consistent. Taking the group AB_CD_E of the Bonn dataset as an example, the three subplots of Figure 6A depict the probability density functions for signals belonging to Cluster 1, Cluster 2, and Cluster 3. Red, green, and purple indicate samples from Class AB (healthy), Class CD (inter-ictal, epileptic), and Class E (ictal, epileptic), respectively. Taking Cluster 1 as an example, as shown in Figure 6A1, the probability that Class AB samples belong to this cluster is the highest, while the probability that Class CD and Class E samples belong to it is relatively low. This indicates that Cluster 1 gathers the general samples of Class AB; in other words, Cluster 1 corresponds to Class AB. Likewise, from Figure 6A2,A3, it is obvious that Cluster 2 and Cluster 3 correspond to Class E and Class CD, respectively. On the other hand, there are also some samples whose resulting cluster labels are inconsistent with their class labels. For example, a few samples of Classes CD and E are clustered into Cluster 1, although the probability that they belong to that cluster is not high. This does not necessarily indicate clustering error, but rather reveals intra-class diversity.
The samples whose resulting cluster label is consistent with their class label represent the generic attributes of that class. The following analysis focuses on this part of the samples. Figure 6B shows the probability density function of Class AB samples that are clustered into Cluster 1. These samples can be divided into two parts: one comprises the 95% of samples with higher probabilities, and the other comprises the remaining 5% with lower probabilities. The red color indicates high-probability samples. Several high-probability samples with their respective scalograms are shown in the upper row, representing the typical attributes of Class AB. The yellow color indicates low-probability samples, which are shown in the lower row. Similarly, Figure 6C shows the probability density function of Class CD samples clustered into Cluster 3, several high-probability samples, and several low-probability samples. As for the Class E samples, see Section 4.4 for a more detailed analysis.
It is obvious that (i) the high-probability samples of each cluster are quite similar, reflecting the typical characteristics of that cluster or class. For example, the high-probability samples of Cluster 1 reflect the characteristics of the EEG signals of healthy volunteers with eyes open or closed, i.e., the amplitude fluctuates between −180 and 100, with relatively high-frequency fluctuations that suggest rapid changes in brain activity. The high-probability samples of Cluster 3 reflect the characteristics of iEEG signals during the inter-ictal periods of epileptics, i.e., the amplitude fluctuates between −70 and 70 with a relatively low frequency. (ii) Low-probability samples exhibit significant differences in waveforms and scalograms compared to high-probability samples. They show diverse patterns, which could reflect intra-class diversity or may be caused by noise.

Clustering Results from Different Similarity Metrics
According to the class labels provided in the datasets, the closeness of the resulting clustering to the data classification can be measured by three external clustering evaluation indexes, namely Purity, ARI, and NMI. In order to observe the role that different similarities play in clustering in the four groups of EEG data of the Bonn dataset and the three epileptic patients' ECoG data of the HUP dataset, we applied clustering separately with three kinds of similarity metrics (namely, the latent representation similarity alone, the latent representation similarity + the image similarity, and the compositive similarity that incorporates all three levels of similarities). The results are shown in Tables 8 and 9. Tables 8 and 9 show that, in both the Bonn dataset and the HUP dataset, clustering by the use of the compositive similarity outperforms that of the other two kinds of similarities. Specifically, clustering using the compositive similarity achieves the best average rank (1.43) when evaluated with Purity, ARI, and NMI. Also, the compositive similarity achieved the largest number of the best Purity, ARI, and NMI values across the seven groups of experiments (five out of seven).
Figures 7 and 8 show the bar charts of the three external clustering indexes for clustering using different kinds of similarities on the Bonn dataset and the HUP dataset, respectively. Compared to using the latent representation similarity alone, the inclusion of the image similarity significantly improved the clustering performance in most experimental groups. In the Bonn dataset, the average Purity, ARI, and NMI increased by 2.18%, 11.92%, and 13.04%, respectively. In the HUP dataset, those three indexes increased on average by 1.70%, 6.20%, and 6.36%, respectively. The improvement in performance implies that the scalogram has the potential to capture the time-frequency characteristics of different EEG signals. However, compared to the combination of the latent representation similarity and the image similarity, the inclusion of the DFM similarity has little effect on the improvement in clustering performance. In the Bonn dataset, the three indexes increased by 0.1%, 0.37%, and 0.66%, respectively.

Impact of the Number of Iterations in Training W-SLOGAN
We investigated the impact of the number of training iterations on the clustering performance of the W-SLOGAN model. We evaluated the clustering performance of the proposed approach on the two datasets separately when the model was iterated 0, 3000, 6000, 9000, 12,000, 15,000, and 18,000 times. The results are shown in Figures 9 and 10. In most groups, a fairly good clustering performance was achieved by 9000 iterations. Before that, the performance increases rapidly with the iteration number, whereas after that, it changes smoothly. Nevertheless, for some groups, such as AB_CD_E, the clustering performance continues to increase with the iteration number.

Reproducibility of the Results
The results are reproducible to some extent, based on our experiments testing the reproducibility of each experimental group. Taking the group AB_CD_E of the Bonn dataset as an example, we trained three W-SLOGAN models with the same experimental setup and parameters. The mixing coefficients p(k) of the latent mixture components obtained from the different trained models are displayed in Table 10. It can be seen that they are close. Also, the ratios of the Gaussian mixture components are all close to 2:2:1, which fits the true class ratio (3000:3000:1500) of that group.

In order to explore the diversity of ictal iEEG signals, we carried out clustering on the ictal data of the Bonn dataset. The number of clusters was set to five. It is noteworthy that three of the resulting clusters were highly consistent with three typical kinds of epileptiform waves in their characteristic patterns. Epileptic seizures are accompanied by some typical discharge waveforms, which serve as significant characteristics and diagnostic criteria for epileptic seizures. Common epileptiform waves include the sharp wave, spike wave, spike and slow wave complex, sharp and slow wave complex, highly rhythmic disorganization, and so on. We found clusters corresponding to the rhythmic sharp wave, the spike and slow wave complex, and highly rhythmic disorganization in the iEEG recordings.
Figure 11 shows the three types of epileptiform waveforms and the corresponding clusters that resulted from our approach. Each row displays the characteristic waveform of a type of epileptiform discharge, three epileptiform waves of that type that were clustered into the same cluster from the iEEG recordings by our approach, and the baseline scalogram of that cluster.
For reference, the definitions and characteristics of sharp wave, spike and slow wave complex, and highly rhythmic disorganization are listed below [33].
Sharp wave. Sharp waves are the most basic form of burst EEG activity, lasting from 70 to 200 ms (5-14 Hz). The amplitudes range from 100 to 200 µV. They usually take the form of negative phase waves.
Spike and slow wave complex. An epileptiform pattern composed of spike waves and slow waves. The slow wave is the predominant component of this complex, lasting approximately 200 to 500 ms. The spike and slow wave complex typically has higher amplitudes, ranging from 105 to 300 µV, and can even exceed 500 µV.
Highly rhythmic disorganization. Usually composed of sharp waves, spikes, etc.; the frequency and amplitude are highly irregular. It is often seen in complex partial seizures.

Multiple Labels of EEG Data
The cross-analysis of clustering and classification has the potential to discover interesting knowledge, including multiple labels of data. Taking the group AB_CD_E of the Bonn dataset as an example, Figure 12 shows the class labels and clustering results of several samples. Samples in each row belong to the same class, and those in each column are clustered into the same cluster. Each grid displays four samples. Row 1 and column 1 both correspond to Class AB, i.e., healthy; Row 2 and column 2 both correspond to Class CD, i.e., inter-ictal, epileptic; and Row 3 and column 3 both correspond to Class E, i.e., ictal, epileptic.
The class labels of the waveforms on the diagonal in Figure 12 are consistent with their respective cluster labels. These waveforms best reflect the salient attributes of that type of EEG signal. For example, the samples in row 1, column 1 (AB) represent scalp EEG in healthy volunteers, showing high-frequency components compared to the inter-ictal iEEG (CD) in row 2, column 2. The four ictal iEEG signals in row 3, column 3 (E) exhibit various morphologies, including some typical epileptiform discharge waveforms such as rhythmic sharp waves and spike and slow wave complexes.
Perhaps it is the signals whose class label and cluster label do not refer to the same attribute that deserve more attention. Taking the waveforms in row 2, column 3 as an example, they belong to the inter-ictal, epileptic class; however, they are clustered into an ictal, epileptic cluster. They exhibit characteristics of typical epileptiform waveforms, such as spike waves and the spike and slow wave complex. These findings are consistent with the clinical experience that epileptiform discharges exist in inter-ictal periods. In fact, these signals with multiple attributes should be annotated with multiple labels so that the information within the recordings can be reflected more comprehensively and objectively.

Discussion
Clustering plays a unique role in exploratory EEG analysis. It is unsupervised and consequently has low labor and time costs. With our approach, the adaptively learned Gaussian mixing coefficients keep the model effective in dealing with imbalanced datasets. By means of intra-class clustering or cross-analysis of clustering and classification, it is possible to reveal intra-class diversity or other interesting information. As demonstrated in this work, the proposed approach is attractive in practice for epileptic subtype diagnosis, multiple labelling of EEG data, etc.
With the latent prototype-based clustering approach, the clustering results are close to the classification of the data (with reference to the results in Sections 4.1 and 4.2). This is in part due to the sound definition of the critic function, which is based on latent prototypes and measures similarity on three levels. In this way, the approach is able to detect underlying unknown patterns in the data. Nevertheless, we would like to point out that, even if a clustering result is inconsistent with the classification, this does not mean that the performance of the clustering method is poor. This is because the objectives of clustering and classification are different: classification is task-oriented, while clustering organizes data elements according to the resemblance of the intrinsic patterns (if any) of the data.
Different types of epileptiform waves were discovered from the EEG recordings, as shown in Figure 11, in an unsupervised way without any given type labels. It has been found that some types of epileptic waveforms are related to specific epilepsy subtypes. For example, rhythmic sharp waves are often associated with focal seizures, while the spike and slow wave complex is more common in absence seizures. Therefore, our approach can not only reveal the diversity of EEG signals during seizures and provide a representative scalogram of each subtype, but can also point out when and what type of epileptic discharge occurs in the brain, so as to assist the doctor in epilepsy subtype diagnosis.
Multiple labels of EEG or iEEG data can be discovered by means of the cross-analysis of clustering and classification, as shown in Figure 12. Such cross-analysis between unknown kinds and known classes has the potential to reveal novel knowledge. Sometimes, it is the signals whose class label and cluster label do not refer to the same attribute that deserve more attention; they reflect intra-class diversity. On the other hand, such analysis helps to better understand the multiple attributes of the data. As revealed in Figure 12, in the ictal-period iEEG signals of an epileptic, there exist waveforms similar to those of a healthy subject, while in the inter-ictal period, some epileptiform discharges are found. In fact, these signals with multiple attributes should be annotated with multiple labels so that the information within the recordings can be reflected more comprehensively and objectively.
Discussion on DFM similarity. As shown in Figures 7 and 8, the inclusion of the DFM similarity has little effect on the improvement in clustering performance. This may be due to the fact that the discriminator's task is to distinguish between real and fake samples; the features extracted by its convolutional layers are those that serve that task. Hence, in most cases, the DFM-level similarity plays a less important role in clustering than the other two levels of similarity. However, in a few cases, e.g., the ECoG of HUP89, the use of the DFM similarity is more effective than that of the image similarity. This remains a topic for further study.
With respect to the W-SLOGAN model, the number of Gaussian components of the latent distribution needs to be set in advance according to a predetermined number of clusters. The optimization of the cluster number may be a future research direction.

Figure 1. Different periods of electroencephalography (EEG) signals of an epileptic. a-e denote different time points.

Figure 2. Schematic of the EEG clustering solution based on latent prototypes. CWT, continuous wavelet transform. DFM, deep feature map. e_query, latent space representation of the query signal. µ_k, latent prototype of the kth cluster. x_query, scalogram of the query signal. x_k, baseline scalogram of the kth cluster. DFM_query, deep feature map of the query signal. DFM_k, baseline deep feature map of the kth cluster. α_1, α_2, and α_3 are weights.

Figure 3. Latent distribution defined as a Gaussian mixture distribution, the distribution of generated data, and that of real data. Suppose there are three clusters in the dataset; µ_1, µ_2, and µ_3 can be regarded as the latent prototypes of the three clusters.

Figure 4. Network architecture of the W-SLOGAN. The latent distribution is defined as a Gaussian mixture distribution; assume the number of Gaussian components is 3. z_1, z_2, and z_3 denote the latent vectors sampled from the latent space. e_1, e_2, and e_3 denote the encoded vectors of the scalograms calculated by the encoder. µ_1, µ_2, and µ_3 denote the mean vectors of the three Gaussian components, corresponding to the latent prototypes of the three clusters. d_x denotes the output of the discriminator.

Figure 5. Three levels of similarity for clustering. Assume the number of Gaussian components is 3. DFM: deep feature map. µ_1, µ_2, and µ_3 denote the mean vectors of the three Gaussian components, corresponding to the latent prototypes of the three clusters. e_query denotes the latent representation of the query signal. x_1, x_2, and x_3 denote the baseline scalograms of the three clusters. x_query denotes the scalogram of the query signal. DFM_1, DFM_2, and DFM_3 denote the baseline deep feature maps of the three clusters. DFM_query denotes the deep feature map of the query signal.

Figure 6. Clustering results and intra-class diversity. (A1-A3) show the probability density functions for samples belonging to Cluster 1, Cluster 2, and Cluster 3, respectively. (B) shows the probability density function of Class AB samples clustered into Cluster 1, several high-probability samples with their scalograms (in the upper row), and several low-probability samples with their respective scalograms (in the lower row). (C) shows the probability density function of Class CD samples clustered into Cluster 3, several high-probability samples with their scalograms, and several low-probability samples with their scalograms (in the lower row).

Figure 7. Purity, ARI, and NMI of the results of clustering on four groups of EEG/intracranial EEG (iEEG) data of the Bonn dataset, separately using different kinds of similarities.

Figure 8. Purity, ARI, and NMI of the results of clustering on three epileptic subjects' ECoG data of the HUP dataset, separately using different kinds of similarities.


Figure 9. Impact of the iteration number during training W-SLOGAN on the clustering performance on four groups of EEG data of the Bonn dataset, separately evaluated with Purity, ARI, and NMI.


Figure 10. Impact of the iteration number during training W-SLOGAN on the clustering performance on three epileptic subjects' ECoG data of the HUP dataset, separately evaluated with Purity, ARI, and NMI.

Figure 11. Typical kinds of epileptiform waveforms found by clustering the ictal iEEG data of the Bonn dataset. Each row displays the characteristic waveform of a type of epileptiform discharge, three epileptiform waves of that type that were clustered into the same cluster from the iEEG recordings by our approach, and the baseline scalogram of that cluster.

Figure 12. Class labels and clustering results of several samples in group AB_CD_E of the Bonn dataset. Samples in each row belong to the same class, and those in each column are clustered into the same cluster. Each grid displays four samples. Row 1 and column 1 both correspond to Class AB, i.e., healthy; Row 2 and column 2 both correspond to Class CD, i.e., inter-ictal, epileptic; Row 3 and column 3 both correspond to Class E, i.e., ictal, epileptic.

Table 1. Description of the Bonn dataset.

Table 2. Description of electrocorticography (ECoG) data of three patients in the HUP dataset.

Table 3. Implementation details of the W-SLOGAN model. a Batch Normalization. b Global Average Pooling.

Table 4. Comparison of two reparameterization methods.

Table 5. Description of four experimental groups of the Bonn dataset.

Table 6. Description of the ECoG data of three de-identified epileptics in the HUP dataset.

Table 7. Details of the experimental parameter settings.

Table 8. The clustering results on the Bonn dataset by the use of different kinds of similarity metrics. # Purity, # ARI, and # NMI, respectively, indicate the largest number of the best Purity, ARI, and NMI across the four groups of the Bonn dataset.

Table 9. The clustering results on the HUP dataset by the use of different kinds of similarity metrics. # Purity, # ARI, and # NMI, respectively, indicate the largest number of the best Purity, ARI, and NMI across the three epileptic subjects of the HUP dataset.


Table 10. Mixing coefficients p(k) of the latent mixture components obtained from different trained W-SLOGAN models.