Automatic Epileptic Tissue Localization Through Spatial Pattern Clustering of High Frequency Activity

High-frequency activity (HFA) in intracranial electroencephalography recordings are diagnostic biomarkers for refractory epilepsy. Clinical utilities based on HFA have been extensively examined. HFA often exhibits different spatial patterns corresponding to specific states of neural activation, which will potentially improve epileptic tissue localization. However, research on quantitative measurement and separation of such patterns is still lacking. In this paper, spatial pattern clustering of HFA (SPC-HFA) is developed. The process is composed of three steps: (1) feature extraction: skewness which quantifies the intensity of HFA is extracted; (2) clustering: k-means clustering is applied to separate column vectors within the feature matrix into intrinsic spatial patterns; (3) localization: the determination of epileptic tissue is performed based on the cluster centroid with HFA expanding to the largest spatial extent. Experiments were conducted on a public iEEG dataset with 20 patients. Compared with existing localization methods, SPC-HFA demonstrates improvement (Cohen’s d $>0.2$ ) and ranks top in 10 out of 20 patients in terms of the area under the curve. In addition, after extending SPC-HFA to high-frequency oscillation detection algorithms, corresponding localization results also improve with effect size Cohen’s d $\geq 0.48$ . Therefore, SPC-HFA can be utilized to guide clinical and surgical treatment of refractory epilepsy.

get their seizures under control. However, medication malfunctions for nearly 1/3 of the patients who are medically refractory and leaves epilepsy not fully controlled [2]. In such cases, patients can potentially be treated by surgical resection of epileptic tissue and become seizure-free. Intracranial electroencephalography (iEEG) monitoring is the clinical gold standard to observe the seizure dynamics and guide surgery. Currently, clinical surgical treatment mainly rely on the removal of seizure onset zone (SOZ), defined as the the primary part for generating clinical seizures. SOZ can be identified through visual inspection of ictal iEEG signals by neurologists. Removing the cortex of SOZ is typically necessary but not sufficient since it does not delineate the full extent of epileptic tissue [3], [4], [5]. Moreover, ictal period during which seizure happens only take up a small proportion of intracranial recording as opposed to inter-ictal period. Therefore, biomarkers extracted from inter-ictal iEEG signals are also utilized, aiming to better localize epileptic tissue in addition to SOZ.
Epileptic spikes and high-frequency oscillations (HFOs) are most common inter-ictal biomarkers. In particular, HFOs are receiving great attention in recent years and their relationships with SOZ and surgical outcomes have been extensively studied [5], [6], [7], [8]. Typically, HFOs are prominent transient high-frequency events in the spectral range between 80-500 Hz [5], [9]. In the past decades, automatic HFO detection algorithms [10], [11], [12], [13] have been proposed and validated against human reviewers. Recently, their utility in SOZ localization and surgical outcome evaluation are directly explored [5], [6], [7], [14], [15], [16]. HFOs are proved as effective biomarker in localization [14], [15], [16]. But in [6], results indicate that HFOs are not statistically better than spikes. In [7], correlation between the removal of HFO-generating regions and seizure-free outcome is only in group level. It is worth noting that these studies are all based on automatic detection algorithms. However, detection algorithms suffer from two problems: first, physiological and pathological HFOs can be both detected and demonstrate variability in brain areas with different structure and functions [6], [7], [17], [18]; second, these algorithms are insensitive to HFA within epileptic tissue due to low signal amplitude and limited oscillation peaks outside of the detection threshold [19]. These problems will probably result in Fig. 1. The schematic diagram of automatic epileptic tissue localization process, which includes intracranial EEG signal processing (Step 1) and spatial pattern clustering of high-frequency activity (SPC-HFA, Step 2-4). In Step 2, SPC-HFA transforms multi-channel data matrix X into feature matrix Y with size N × T, representing channel and time, respectively. In Step 3, SPC-HFA applies spatial clustering along columns of Y with proper clustering number k. In Step 4, k cluster centroids c 1 , c 2 , . . . , c k are sorted and localization is based on c k which contains information about epileptic tissue.
To overcome the limitations and improve localization ability, we develop a method that we refer to as spatial pattern clustering of HFA (SPC-HFA), which includes three steps. The first step is feature extraction. Instead of HFO event detection, SPC-HFA adopts skewness in 80-500Hz as the quantitative measurement to improve sensitivity to HFA within epileptic tissue. High skewness values are correlated with the presence of HFOs [20], identifying pathological HFOs [21] and characterizing paediatric epilepsy [22]. In the second step, SPC-HFA employs spatial clustering according to the spatial extent of HFA during iEEG recordings. As revealed by previous research [19], [23], [24], HFOs show different spatial patterns. For example, the majority of HFOs are located only in one or several channels that are specifically within epileptic tissue [19], [23]. HFOs can also appear in multiple contacts at a time with certain spatial and temporal extent [23], [24], resulting from highly synchronized neural activation. These spatial patterns are related to epileptic tissue and surgical outcomes [19], [23], [24] and reflect the dynamics of the epileptic network in ictogenesis [25]. The third step is localization. After performing clustering, clusters corresponding to different spatial extent can be automatically separated and epileptic tissue localization is based on the cluster centroid with HFA expanding to the largest spatial extent.
Therefore, by improved sensitivity to HFA and separation among different spatial patterns, we hypothesize that SPC-HFA would achieve better localization results compared with automatic detection algorithms. Experiments are conducted on a public iEEG dataset consisting of 20 patients with refractory epilepsy. Localization results are statistically analyzed with respect to the receiver operating characteristic (ROC) curves and area under the curve (AUC).
The remainder of the paper is organized as follows: Section II introduces the dataset, data processing and details of SPC-HFA. The experiment results and analysis are in Section III. Section IV compares SPC-HFA with automatic HFO detection algorithms and methodological details. Section V concludes the paper.

II. MATERIAL AND METHODS
The schematic diagram of SPC-HFA for epileptic tissue localization is illustrated in Fig. 1. An example of epileptic tissue localization for P-2 is shown in Fig. 2. There are 8 depth electrodes implanted stereotactically into the amygdala, the hippocampal head and the entorhinal and perirhinal cortex bilaterally, which is referred to as AL, AR, AHL, AHR, ECL, ECR, PHL and PHR [26]. For each electrode (1.3 mm diameter, 8 contacts of 1.6 mm length, spacing between contacts centers 5 mm), the 3 most mesial bipolar channels are included for analysis.

A. Epileptic Patients and iEEG Recordings
The dataset was introduced in [26], which contains segments of long-term iEEG recordings from 20 patients with drug-resistant focal epilepsy. Recording types include subdural strip electrodes, grid electrodes and depth electrodes. Each segment consists of five-minute recordings during inter-ictal slow-wave sleep with 2000 Hz sampling rate. The number of segments for each patient is different, depending on the number of slow-wave sleep in each night and recording days. Details of each patient are available in [26]. Slow-wave sleep is the period during which HFOs are most frequent [5], [9]. All patients underwent surgery and postsurgical seizure outcome was determined according to the International League Against Epilepsy (ILAE) from class 1: completely seizure free and no auras to class 6: more than 100% increase of baseline seizure days [27]. The clinical profiles of the patients are summarized in Table I.

B. Data Processing
The data processing incorporates: • re-reference: iEEG signals are transformed into bipolar montage to suppress interference caused by severe common-mode noise and outliers during iEEG recording; • bandpass filtering: each iEEG segment is bandpassed in 80-500 Hz based on the type-II Chebyshev infinite impulse response (IIR) forward-backward filter with a passband ripple of no more than 1 dB and a stopband attenuation of at least 100 dB in RIPPLELAB toolbox [28]; • envelope extraction: peak upper envelope of iEEG signals are extracted using spline interpolation over local maxima separated by at least 30 samples. If original filtered signals are used, the skewness will center around 0 due to the oscillating components. The upper envelope over local maxima can also smooth the envelope to reduce the influence of noise; • segmentation: data is further segmented into 1-second epochs without overlap to enhance the temporal resolution.

C. Feature Extraction
Skewness is a measure of the asymmetry of a distribution, which is used here to quantify the asymmetry in the amplitude distribution of the iEEG time series within an epoch brought by HFA superimposed on a quiet background. For a time series x with n samples, skewness is calculated by where x is the mean of x.

TABLE I SUMMARY OF CLINICAL PROFILES IN THE DATASET
The envelope amplitude distribution is expected to be skewed due to the existence of HFA for a given channel. The feature extraction is performed in each epoch across channels, resulting in a feature matrix Y with size of N × T , where N is the number of bipolar channels and T is the total recording seconds as illustrated in Fig. 1 Step 2. Each column vector in Y corresponds to the extracted skewness within an epoch. We refer to 1, 2, .., T as temporal dimension since it represents the time course of iEEG signals and 1, 2, . . . , N as spatial dimension since it represents different iEEG channels.

D. Clustering and Optimal Clustering Number
The purpose of clustering is to separate column vectors into k proper clusters according to their similarity, which correspond to different spatial extents of neural activation. In this paper, k-means clustering is employed. Two reasons are listed below: first, the dimension of column vector equals the number of iEEG channels and will not exceed 65 in this paper without concerning "curse of dimensionality" problem; second, k-means is easy to implement with robust convergence and the resulting clustering centroid with the largest neural activation can be used directly for localization.
Before implementing the clustering algorithm and evaluating the optimal clustering number, each column of Y is z-scored with zero mean and unit variance as illustrated in Fig. 2(b) since clustering algorithms are sensitive to scaling. By this way, the fluctuation of skewness caused by varying amplitude of HFA within a time epoch can be reduced. Column vectors with HFA expanding to similar spatial extent are more likely to be within the same cluster.
The optimal clustering number is estimated based on different criteria to reach consensus. Here, elbow method, gap statistic, silhouette score and Davies-Bouldin index are employed. The elbow method measures the total within-cluster sum of squares and identifies the optimal number by the 'elbow' with the increasing of k. Gap statistic compares the change in within-cluster dispersion with that expected under null reference distribution, the optimal cluster number being the maximum value. Silhouette score measures how well the intra-cluster points are matched to the neighboring cluster, the local peak value being the appropriate cluster number. Davies-Bouldin index is defined as the ratio between the cluster scatter and the separation among clusters, the local minimum being the appropriate cluster number. When a monotonic trend is observed for a certain method and indicates a large k evaluation, the turning point in the trend is selected as the proper clustering number. If no obvious clustering number is observed, the clustering number will be set to 1, equal to taking the average of each channel as the clustering centroid. An example evaluation process for P-2 is plotted in Fig. 2(c), obvious turning points appear in k = 3 for elbow, gap statistic and Davies-Bouldin methods whereas the highest silhouette score appears in k = 2. Hence we choose k = 3 for P-2. K-means is implemented with 10 repetitions and k-means++ algorithm for cluster center initialization to increase robustness [29]. Since k-means is likely to converge to a local minimum or not converge, multiple repetitions and random centroids initialization can help reach global minimum. In practice, we empirically find that 10 repetitions are sufficient. Only the optimal solution after 10 repetitions are utilized for subsequent analysis.
After evaluation, clustering numbers of patients in the dataset are listed in Table II.

E. Epileptic Tissue Localization
After clustering, we can get k clusters and corresponding cluster index across time as illustrated in Fig. 2(c). It can be observed that the distribution of different clusters is mixed, with each cluster centroid in Fig. 2(d) representing a specific extent of neural activation. Cluster centroids are sorted in ascending order according to variance. Z-scored column vectors with dimension N lie on the N − 1 dimensional unit hypersphere, the intersection between the N dimensional unit ball y 2 1 + y 2 2 + . . . + y 2 N = 1 and N dimensional hyperplane y 1 + y 2 + . . . + y N = 0. In the ideal case without noise and with the amplitude of HFA constant, column vectors with HFA expanding to different spatial extent can be well separated and variance of each clustering centroid equals to 1. However, due to the existence of noise and random fluctuations in clinical EEG recordings, the distribution of column vectors with no obvious HFA is likely scattered, causing the clustering centroid shifting toward the origin with variance approximately 0. In contrast, the distribution of column vectors with HFA expanding to larger spatial extent are more likely centered and less influenced by noise, causing the clustering centroid close to the point on the unit hypersphere with variance around 1. Therefore, the cluster centroid with the largest variance manifests clearly the dynamic of the epileptic network, which is used for epileptic tissue localization. Taking the results in Fig. 2(d) for example, for centroid 1, skewness values are relatively low for most channels. For centroid 2, channels ER1-2, HR1-2 within epileptic tissue are highly activated and skewness values of AR1-2, ER2-3 and HR2-3 also increase. For centroid 3, skewness values of channels PR1-2, PR2-3 and AR1-2 are also higher compared with centroid 2. Therefore, epileptic localization can be conducted on centroid 3, which represents the neural activation that extends most epileptic tissue.
Channels which exceed a certain threshold are labeled 'epileptic' whereas channels which are below the threshold are labeled 'non-epileptic'. Localization results are compared with the channels within resected zone (RZ), which is accessible in the dataset. Channels within the RZ are considered as true 'epileptic' whereas channels outside the RZ are considered as true 'non-epileptic'. For patients with seizure-free outcomes, the RZ will contain or be equal to the epileptic tissue. For patients with surgical outcomes that still evoke seizures, RZ does not fully overlap with the full extent of epileptic tissue and includes a portion under certain clinical condition.
To demonstrate the efficacy of SPC-HFA, existing localization methods with open-source code based on HFO and spike detection are implemented. HFO detection methods include short time energy (STE) detector [10], short-time line length (SLL) detector [11], Montreal Neurological Institute (MNI) detector [12] and Cimbálník-Stead (CS) detector [13]. STE, SLL and MNI detectors are based on bandpass filtering, baseline activity estimation and HFO event detection with specific requirement on energy and duration. CS detector simultaneously considers the amplitude and frequency traces to increase sensitivity and specificity over STE and SLL detectors. It has been applied in several studies [7], [14]. Spike detector from [30] with standard parameters in [14] is used to detect the number of spikes in each channel. Average of HFA (Ave-HFA) which takes the average of skewness without clustering is also implemented. STE, SLL and MNI detectors are implemented based on RIPPLELAB toolbox [28] in the frequency band between 80-500 Hz. Spike detection and CS detector are implemented according to the code provided in the original papers. In particular, HFO number of CS detector is summed together except in the gamma band.

A. Case Analysis
To give a clear example on how SPC-HFA works compared with the other methods, results of different methods for P-2 are given in Fig. 3. From Fig. 3, except SPC-HFA, the values in part of epileptic channels are higher than part of non-epileptic channels. However, the separation between channels within and out of epileptic tissue is not clear. For example, skewness/spike number/HFO number in channels AL1-2, HL1-2, PL1-2 are comparable to some channels within epileptic tissue. Such non-separability will cause false positive results in localization. In contrast, for centroid 3 derived from SPC-HFA, skewness of channels within epileptic tissue stands more clearly than the other methods. Therefore, SPC-HFA will have less false positive results and better localize epileptic tissue.  Speci f icit y = T N T N + F P (3) Considering the variability of electrode types, channel numbers, signal patterns among different patients, epileptic tissue localization is performed for each patient separately and the sensitivity and specificity are averaged over all patients.
To quantify the diagnostic ability of epileptic tissue localization at various threshold settings, ROC curve is plotted as sensitivity against 1-specificity (FPR). The results are illustrated in Fig. 4, 5 and Table III. In Fig. 4, average AUCs of all methods but CS are above chance level (0.5), which indicates that skewness, HFO number and spike number are effective biomarkers in epileptic tissue localization. SPC-HFA ranks top with the average AUC 0.72. The average AUCs of Ave-HFA, STE, SLL and MNI are above 0.60. The average AUC of Spike is 0.57. Under low FPR (< 0.2), ROC curves of SLL and SPC-HFA are above the other methods and the sensitivities approach to 0.5. With the FPR continuously increasing but still at a low rate (less than 0.3), the sensitivity of SPC-HFA rises up quickly and reaches around 0.7 whereas the sensitivity of SLL  is about 0.6, similar to STE, MNI and Ave-HFA. When FPR is larger than 0.3, SPC-HFA maintains superior performance over the other methods. For Ave-HFA, STE and MNI, ROC curves are close to each other and then diverge when the FPR is larger than 0.3. The localization performance of Spike and CS are inferior to the other methods.
Individual AUCs and ROC curves across patients are listed in Table III and Fig. 5. SPC-HFA achieves highest AUCs in 10 out of 20 patients, compared with MNI in 6 patients, Ave-HFA and STE in 4 patients, SLL in 3 patients, Spike and CS in 2 patients. The 95% confidence interval (CI) of AUCs for SPC-HFA is [0.61,0.84], which is superior to the other methods. MNI detector perform better for P-3, P-4, P-15, P-18, P-19, P-20 over SPC-HFA possibly because it is able to handle the segment with continuous HFO activity whereas skewness is not sensitive enough to quantify the asymmetry between HFA and background activity. For patients with seizure-free outcomes like P-1 to P-6, P-10, P-13 to P-16, the top AUCs among different localization algorithms are above 0.7, indicating good consistency with surgical treatment. Their ROC curves are characterized by rapid elevation of sensitivities under extremely low FPR (0 or close to 0), which means that part of epileptic tissue can be directly identified and is distinct from the remaining tissue. When FPR increases, sensitivities gradually step up, corresponding to the mixture of the other part of epileptic tissue and part of normal tissue. This is reasonable since epileptic tissue is equal or contained within RZ for these patients. For patients with ILAE 3 evaluation like P-7 and P-8, the top AUCs are only 0.55 and 0.67 in poor consistency with surgical treatment. None of these localization algorithms is effective compared with surgical treatment. For patients with ILAE 5 or 6 evaluation like P-9, P-17 to P-20, good (P-9 with AUC 0.91 by SPC-HFA, P-18 with AUC 0.80 by MNI and P-19 with AUC 0.98 by SLL) and poor localization consistency (AUC 0.66 by STE for P-17 and AUC 0.43 by MNI for P-20) both exist. There are two possible cases. When implanted electrodes only cover a specific part of the total epileptic tissue and RZ determined by the doctor is the best possible treatment, good consistency between RZ and the localization algorithm indicates that the algorithm can successfully localize epileptic tissue as well whereas poor consistency indicates that the algorithm fails to localize it. When implanted electrodes miss the majority of epileptic tissue or do not cover it at all, seizures cannot be effectively controlled for the patient whether the consistency between RZ and the localization algorithm is good or not. We use effect size Cohen's d to quantify the improvement of SPC-HFA over the other methods in terms of paired

C. Extending SPC to HFO Detection Algorithms
In this part, we extend the results of SPC-HFA to 4 HFO detection algorithms. Similar to SPC-HFA, HFO numbers are summed only within the cluster with the largest centroid variance for localization. They are referred to as SPC-CS,  SPC-STE, SPC-SLL and SPC-MNI. Evaluation process follows Section IIIB. The results are illustrated in Fig.6 and Table V. In Fig. 6(a)-((b)), compared with the original localization results, ROC curves with SPC gives better performance (average AUC from 0.48 to 0.52 for CS, 0.61 to 0.69 for STE, 0.67 to 0.72 for SLL and 0.67 to 0.73 for MNI). The sensitivities are similarly elevated when FPR is approximately larger than 0.2. In Fig. 6(e)-6(h), higher AUCs (except P-5 in Fig. 6(a) whose AUC approaches to 1) are achieved for P-1 to P-6, P-8 and P-9. For the other patients, AUCs in two cases remain the same or similar. In Table V, compared  with Table IV, medium effect size (Cohen's d ≥ 0.5) is found between SPC-CS and CS (0.72), SPC-STE and STE (0.58) and SPC-SLL and SLL (0.50). The effect size between SPC-MNI and MNI is around medium (0.48). In contrast to Table IV, Cohen's d increases for STE, SLL and MNI because localization results of each detection algorithm are only compared with itself with SPC. Therefore, variability induced by design criteria is greatly reduced.

A. Comparison With Detection Algorithms
Automatic HFO/spike detection methods often follow evaluation by comparison with clinical gold standard HFOs/spikes marked by clinicians. Some false detection will be rejected. This way of evaluation is extremely time-consuming and differs significantly subject to patients, centers and so on. Validation on large-volume data is difficult.In this study, by directly applying detection algorithms and building correlation with epileptic tissue, their diagnostic abilities are demonstrated quantitatively. Some false detection are potentially allowed since they do not influence the final localization results after analyzing long-term data.
Two ways to enhance HFO detection for localizing epileptic tissue are proposed: first, clustering based on automatically detected HFOs can be used to separate spikes and artifacts from putative HFOs [16], [32]. But one research points out combining HFOs with spikes by cross-rate is better than HFOs alone in epileptic tissue localization [6]. Reference [33] shows that different morphology of HFOs reflect similar epileptogenicity and it may not be necessary to separate putative HFOs from false oscillations produced by the filter effect of sharp spikes. The relationship between HFOs and spikes in determining epileptic tissue is also in active investigation. Second, detected HFOs can be organized into networks for analysis [8], [24], [25]. In this case, patient-specific geometric information on electrodes [24], [25] and temporal interval restriction [8], [25] are required.
Presumably, both ways further analyze HFOs based on detected HFOs, which is different from SPC-HFA. Since HFOs that relate to neural activation within epileptic tissue are not always detectable, SPC-HFA starts from skewness which generalizes HFA quantification by a single and continuous measurement and is able to maintain the quantitative comparison among different channels within a specific time segment. A quite surprising finding is that clusters derived from SPC-HFA can also be applied for HFO detection algorithms in improving epileptic tissue localization. This supports a close relationship between HFA and HFOs and denies the possibility that the measurement of HFA by skewness is contaminated or biased by noise. The possible underlying mechanism on the formation of spatial clusters can be explained by the network hypothesis [25] that HFOs are generated by pathologically interconnected neuronal nodes. Coordinated activity within these nodes produces seizures.

B. Methodological Consideration
The evaluation process of proper cluster numbers is not fully automatic. Clear cluster numbers can be found in 15 out of 20 patients. For the rest 5 patients, data for each patient is treated as one cluster across time since no suitable cluster number can be found. There are two possible reasons. The first is that epileptic activity is limited in one or only several channels and shows clear difference between epileptic and normal tissue, especially for the patients with ILAE 1 outcome (e.g., P-10, P-14). The second reason is that the activity difference is so trivial or diverse that it is very hard to separate them only by clustering. SPC-HFA is suitable in the case that epileptic tissue is sampled by multiple electrodes and boundary determination between epileptic and normal tissue is expected.
SPC-HFA as well as other HFO/spike detection algorithms process different types of electrodes equally since there is no anatomical data and accurate coordinations of implanted electrodes provided in the dataset. The influence of electrode types has been examined in a recent study [34] and needs investigation in future study.
In this study, we sort different clusters according to variance, which is a simple and straightforward way to describe the difference between epileptic and normal channels. Alternative ways to sort these clusters need to be considered. In addition, only the information of clusters is utilized and the sequential cluster index across time remains unexplored. Recently, some researchers start to investigate such temporal variability with statistical models such as binomial and Nelson-Aalen models [35], [36]. How to interpret temporal information of different clusters is still a question. In terms of time period, data analysis is carried out in slow-wave sleep during which HFOs are most frequent and interference is limited. Whether this method can be generalized to other phases still needs careful examination.

C. Limitations and Future Perspective
In this study, we retrospectively analyzed 20 patients with refractory epilepsy. The proposed SPC-HFA and its extension to HFO detection algorithms demonstrate improvement over the other methods generally. However, limited patient number and variability from different algorithms result in restricted validation power when SPC-HFA is compared with the rest of the algorithms. Variability can be effectively reduced by extending SPC-HFA to the corresponding HFO detection algorithm. The limitation of sample size is partially overcome by effect size measurement which is able to assess the magnitude of effect under limited samples. Another limitation is that SPC-HFA has not been tested on healthy controls due to the lack of available dataset. To fully solve the problem, more clinical patients and healthy controls should be incorporated to get conclusive results in the future. The understanding derived from clinical patients with seizure-free outcomes is more straightforward than patients who still suffer from seizures after surgery. For the latter, we can only compare the results under the assumption that surgical resection is the optimal treatment. Except for the extent of consistency between automatic localization and surgical treatment, potential utility for these patients also lies in the clinical profile and expertise of doctors. SPC-HFA is intentionally designed as an automatic localization algorithm. After retrospective analysis and evaluation, it is more expected that SPC-HFA can be directly applied in pre-surgical evaluation and provide useful reference for doctors.
V. CONCLUSION SPC-HFA deals with epileptic tissue localization based on quantitative HFA measurement via skewness and spatial clustering corresponding to different extent of neural activation. It supports automatic localization and can be used in presurgical evaluation for patients with drug-resistant epilepsy. We hope this method can be incorporated in clinical practice to contribute to better treatment options for patients.