Coherent False Seizure Prediction in Epilepsy, Coincidence or Providence?

Seizure forecasting using machine learning is possible, but the performance is far from ideal, as indicated by many false predictions and low specificity. Here, we examine false and missing alarms of two algorithms on long-term datasets to show that the limitations are less related to classifiers or features than to intrinsic changes in the data. We evaluated two algorithms on three datasets by computing the correlation of false predictions and estimating the information transfer between both classification methods. For 9 out of 12 individuals, both methods performed better than chance. For all individuals, we observed a positive correlation in predictions. For individuals with a strong correlation in false predictions, we were able to boost the performance of one method by excluding test samples based on the results of the second method. Substantially different algorithms exhibit a highly consistent performance and a strong coherence in false and missing alarms. Hence, changing the underlying hypothesis from a preictal state of fixed duration prior to each seizure to a proictal state is more helpful than further optimising classifiers. These findings are relevant for the evaluation of seizure prediction algorithms on continuous data.


Introduction
The prediction of epileptic seizures has become an increasingly realistic prospect, driven by new capabilities and insights in data-driven signal processing over the last decades. For over 30 years, researchers have been trying to identify precursors of seizures and to understand the underlying mechanisms of ictogenesis in the epileptic brain (Kuhlmann et al., 2018b). After overly optimistic reports in the late 1990s on the predictive performance of many approaches derived from short-term intracranial electroencephalography (iEEG) data, the focus shifted to assessing new algorithms on continuous multi-day recordings obtained during pre-surgical evaluation (Mormann et al., 2007).
A prospective study with an implanted advisory system, the first and so far only in-man clinical trial, has shown that seizure prediction in humans is generally possible (Cook et al., 2013). Subsequently, as continuous long-term recordings became available to scientists through collaborative platforms (Brinkmann et al., 2016; Howbert et al., 2014; Wagenaar et al., 2015), significant progress was made in terms of the sensitivity and specificity of seizure prediction.
In two online competitions, the contestants focused on developing and refining algorithms to solve a binary classification problem, i.e. to classify data clips as either preictal (i.e. a seizure is imminent) or interictal (Brinkmann et al., 2016; Kuhlmann et al., 2018a). To solve this classification problem, researchers used state-of-the-art machine learning and deep learning methods such as random forests, support vector machines (SVM) (Direito et al., 2017), k-nearest neighbours (Ghaderyan et al., 2014), convolutional neural networks (Eberlein et al., 2018; Nejedly et al., 2019), and recurrent networks with long short-term memory (LSTM) (Ma et al., 2018; Tsiouris et al., 2018), among others.
Although in many cases the algorithms perform better than a random predictor, they are mostly considered unsuitable for actual clinical use. This is due to a rather low specificity, characterised by a high number of false positive alarms or an inappropriately long time in warning (Snyder et al., 2008). Even though the aim is to achieve a high sensitivity in order to detect as many potentially dangerous situations as possible, false alarms should be avoided to spare patients the additional psychological stress caused by incorrect warnings or unnecessarily long warning times.
However, it is not clear whether an apparently false positive decision is always actually a false prediction, or whether it might uncover an epoch of high seizure likelihood that ultimately did not end in an ictal event, i.e. a subclinical event. In this contribution, we evaluate false alarms in seizure prediction by comparing the outputs of two substantially different classifiers. Our hypothesis is that false alarms shared by more than one method imply intrinsic changes in the iEEG data that cannot be detected by the data-driven methods used, and are thus not necessarily the result of a poor classification algorithm.

Datasets
In this study, we used three different datasets of long-term intracranial electroencephalography (iEEG). Two of them comprise iEEG segments preselected for recent public seizure prediction challenges, which have become benchmark datasets for seizure prediction studies (Brinkmann et al., 2016; Kuhlmann et al., 2018a). In addition, a new dataset (Dataset 1) with continuous, unselected iEEG recordings was included to show that our findings are not biased by the selection procedure of the former two.
Dataset 1
Intracranial EEG from five patients (Patients A-E) with pharmacorefractory epilepsy was recorded during presurgical diagnostics using subdural strips and grids as well as depth electrodes at the Department of Neurosurgery of University Hospital Carl Gustav Carus in Dresden, Germany (Eberlein et al., 2019). The beginning and end of each clinical seizure, as well as artefacts, were annotated by a clinical neurologist (G. L.) specialised in epileptology. For data analysis, between 58 and 107 channels at a sampling rate of 500 Hz or 1 kHz were available. Patients' details are given in Table 1.

Dataset 2
iEEG recordings (Coles et al., 2013; Davis et al., 2011) from human patients and dogs with naturally occurring epilepsy were used. In the context of the American Epilepsy Society Seizure Prediction Challenge, conducted on the platform kaggle.com, data from five canines and two humans were made available to the contestants (Brinkmann et al., 2016). From these datasets, we excluded the human patients and Dog 5, since their data acquisition differs from that of the other animals (Dog 1 to Dog 4), which were recorded with 16 channels at a sampling rate of 400 Hz.

Dataset 3
These iEEG data were made available to the contestants of the Melbourne-University AES-MathWorks-NIH Seizure Prediction Challenge (Kuhlmann et al., 2018a) and include recordings of three human patients who took part in a study with the NeuroVista system (Cook et al., 2013). The recordings span about six months, starting after the first month following implantation, and were likewise acquired from 16 electrodes at a sampling rate of 400 Hz.

General Considerations
In this contribution, we compare two approaches to seizure prediction that both perform well on long-term iEEG datasets but follow fundamentally different strategies. The first method uses a list of "hand-crafted" univariate and bivariate features that have proven suitable for EEG characterisation (Tetzlaff and Senger, 2012; Senger and Tetzlaff, 2016). The feature vectors are classified by a multilayer perceptron (MLP), and the best feature combination is chosen to optimise the performance of the algorithm. The second method applies a deep convolutional neural network (CNN) directly to the raw iEEG data. Through successive convolution and pooling layers, both feature extraction and classification are learned entirely by the deep learning process.
Data Handling
All recordings were divided into a training set to fit our models and a test set for evaluation on out-of-sample data. For each individual, training and test sets were separated in time. For the five patients of Dataset 1, the first half of the data was assigned as training data and the second half as test data (Eberlein et al., 2019). Details about the structure of Dataset 2 and Dataset 3 are given by Korshunova et al. (2018) and Kuhlmann et al. (2018a), respectively.

Segmentation of Dataset 1
The segmentation of recordings from Dataset 1 generally followed the procedure of the competitions on kaggle.com as outlined by Brinkmann et al. (2016) and Kuhlmann et al. (2018a). All data was divided into contiguous, non-overlapping 10-min clips. The data from 65 min to 5 min before the onset of a seizure was assigned as preictal. The 60 min following a seizure onset were excluded from the analysis to avoid contamination of the data by ictal and postictal activity; any seizure occurring during that period was not included in the analysis as a separate event. Data recorded at least 4 h away from any seizure (i.e. from 240 min after a seizure until 240 min before the next seizure) was assigned as interictal.
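The labelling rules above can be sketched as follows. This is a minimal illustration, assuming that clip start times and seizure onsets are given in minutes since the start of the recording; the function `label_clips` and its parameter names are hypothetical, and treating seizures within 60 min of a previous onset as non-lead seizures (without their own preictal window) is our reading of the exclusion rule.

```python
import numpy as np


def label_clips(clip_starts, onsets, clip_len=10.0, pre_start=65.0,
                pre_end=5.0, post_excl=60.0, inter_gap=240.0):
    """Return 1 (preictal), 0 (interictal) or -1 (discarded) per clip.

    clip_starts and onsets are in minutes since the recording started.
    """
    clip_starts = np.asarray(clip_starts, dtype=float)
    onsets = np.sort(np.asarray(onsets, dtype=float))
    # Seizures starting within post_excl minutes of a previous onset are not
    # treated as separate events; only "lead" seizures get a preictal window.
    lead = onsets[np.diff(onsets, prepend=-np.inf) > post_excl]
    labels = np.full(clip_starts.shape, -1, dtype=int)
    for k, start in enumerate(clip_starts):
        end = start + clip_len
        # Discard clips overlapping the post_excl window after any onset.
        if np.any((end > onsets) & (start < onsets + post_excl)):
            continue
        # Preictal: the clip lies entirely within [onset - 65, onset - 5] min.
        if np.any((start >= lead - pre_start) & (end <= lead - pre_end)):
            labels[k] = 1
        # Interictal: the clip is at least inter_gap minutes from every onset.
        elif np.all((end <= onsets - inter_gap) | (start >= onsets + inter_gap)):
            labels[k] = 0
    return labels
```

With a single seizure at minute 1005 and clips starting every 10 min, the six clips starting at minutes 940 to 990 are labelled preictal, and everything closer than 4 h to the seizure (but not preictal) is discarded.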
To avoid contamination by seizures that may have occurred shortly before the start or after the end of a recording, 4 h of data at the beginning and at the end of each recording were discarded. EEG channels showing artefacts (identified by visual inspection by the neurologist) were excluded from this study. Table 2 provides an overview of the numbers of interictal and preictal clips for all datasets.
Preprocessing
All data was subsampled to 200 Hz to reduce computational cost. Before being fed into the CNN, each channel of every 10-min clip was z-score normalised individually. Subsequently, the 10-min clips were divided into 15-s segments (3,000 samples at 200 Hz). In this study, an adaptive training approach with retraining after a fixed period or after a seizure is not considered.
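A minimal NumPy sketch of the per-clip normalisation and segmentation, assuming a clip is a (channels, samples) array that has already been subsampled to 200 Hz; the subsampling itself, which should include an anti-aliasing filter, is not shown. The function name and the small `eps` guard against flat channels are our own choices.

```python
import numpy as np

FS = 200        # sampling rate after subsampling (Hz)
SEG_SEC = 15    # segment length in seconds -> 3,000 samples at 200 Hz


def zscore_and_segment(clip, fs=FS, seg_sec=SEG_SEC, eps=1e-8):
    """Z-score each channel of a (channels, samples) clip, then cut it into
    non-overlapping segments of shape (n_segments, channels, seg_len)."""
    clip = np.asarray(clip, dtype=float)
    mean = clip.mean(axis=1, keepdims=True)
    std = clip.std(axis=1, keepdims=True)
    z = (clip - mean) / (std + eps)          # eps avoids division by zero
    seg_len = fs * seg_sec
    n_seg = z.shape[1] // seg_len            # drop an incomplete tail, if any
    z = z[:, : n_seg * seg_len]
    # (channels, n_seg, seg_len) -> (n_seg, channels, seg_len)
    return z.reshape(z.shape[0], n_seg, seg_len).transpose(1, 0, 2)
```

A 10-min, 16-channel clip at 200 Hz (120,000 samples per channel) yields an array of shape (40, 16, 3000), i.e. 40 CNN inputs of 3,000 samples each.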

Feature-based Classification (Method 1)
Features
We considered both univariate and bivariate features to capture epileptiform anomalies in individual iEEG channels and in the relationships between channels. A major part of our core feature set comes from top-ranked algorithms of the American Epilepsy Society Seizure Prediction Challenge on kaggle.com (Brinkmann et al., 2016). This includes spectral band power and statistical moments of single-channel signals as well as cross-channel (linear) correlations in the time and frequency domains. Other features were adapted from previous studies to complement and strengthen the set (Mormann et al., 2007; Tetzlaff and Senger, 2012). As univariate features, we added autoregressive (AR) model coefficients and signal prediction errors, which capture sequential information in iEEG signals (Tetzlaff and Senger, 2012). As bivariate features, we used nonlinear interdependence and mean phase coherence to characterise nonlinear cross-channel coherence. Given the complex nature of brain dynamics, nonlinear measures are expected to be better suited for extracting information from iEEG signals. We considered three variants of the symmetric bivariate features: 1) the feature matrix itself, 2) the eigenvalues and eigenvectors of the matrix as well as the maxima of its rows/columns, and 3) a combination of all features of the two previous variants.
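To illustrate one of the bivariate measures and the three feature variants, the following NumPy sketch computes a mean phase coherence matrix from instantaneous phases of the analytic signal and derives the variant features from it. The analytic signal is computed with an FFT, equivalent to `scipy.signal.hilbert`. The function names, the exclusion of the trivial diagonal from the row maxima, and the ordering of the feature vector are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal along the last axis via FFT (as scipy.signal.hilbert)."""
    n = x.shape[-1]
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * h, axis=-1)


def mean_phase_coherence(x):
    """(channels, channels) matrix of |<exp(i(phi_a - phi_b))>_t|."""
    phase = np.angle(analytic_signal(np.asarray(x, dtype=float)))
    u = np.exp(1j * phase)                    # unit phasors, shape (C, T)
    return np.abs(u @ u.conj().T) / u.shape[1]


def feature_variants(m):
    """Three feature variants derived from a symmetric bivariate matrix m."""
    v1 = m[np.triu_indices_from(m, k=1)]     # 1) the matrix itself (upper triangle)
    eigvals, eigvecs = np.linalg.eigh(m)      # eigenvector signs are arbitrary
    off_diag = np.where(np.eye(len(m), dtype=bool), -np.inf, m)
    v2 = np.concatenate([eigvals, eigvecs.ravel(), off_diag.max(axis=1)])  # 2)
    return v1, v2, np.concatenate([v1, v2])   # 3) combination of 1) and 2)
```

Two identical sinusoidal channels give a coherence of 1, whereas a sinusoid paired with white noise gives a value close to 0.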

Classification
Classification was performed on input vectors of each univariate and bivariate feature using a multilayer perceptron (MLP) with three hidden layers of 16, 8, and 4 neurons, respectively. Combining univariate and bivariate characteristics is intended to capture properties of both single channels and their correlations. Each network layer is followed by batch normalization. The rectified linear unit (ReLU) activation function is used in the hidden layers and the sigmoid function in the output layer. The network was trained with stochastic gradient descent (SGD) using backpropagation over 500 epochs with a learning rate of 10⁻⁴. Dropout was found not to improve the performance of such a small network. We used the mean output of an ensemble of 100 networks with different initial weights to obtain statistically robust outputs.
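The classifier topology and the ensemble averaging can be sketched in plain NumPy (forward pass only; training with SGD is not shown). Assumptions: batch normalisation is applied after each hidden layer using the statistics of the current batch, without a learnable scale and shift; weights use He initialisation; and the ensemble output is the mean, with the standard deviation as plotted in Fig. 2, over independently initialised networks. All names are hypothetical.

```python
import numpy as np

HIDDEN = (16, 8, 4)  # hidden layer widths used for method 1


def init_mlp(n_in, rng):
    """He-initialised weights and zero biases for n_in -> 16 -> 8 -> 4 -> 1."""
    sizes = (n_in, *HIDDEN, 1)
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def batch_norm(h, eps=1e-5):
    """Normalise each unit with the batch mean/variance (no learned scale/shift)."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)


def forward(params, x):
    """Return P(preictal) for each row of the feature matrix x."""
    h = x
    for W, b in params[:-1]:
        h = batch_norm(np.maximum(h @ W + b, 0.0))    # dense -> ReLU -> batch norm
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)[:, 0]))   # sigmoid output


def ensemble_predict(ensemble, x):
    """Mean and standard deviation of the outputs of several networks."""
    out = np.stack([forward(p, x) for p in ensemble])
    return out.mean(axis=0), out.std(axis=0)
```

Building the ensemble as `[init_mlp(n_features, np.random.default_rng(seed)) for seed in range(100)]` mirrors the 100 differently initialised networks; in practice each member would first be trained.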
To find the optimal feature combination, the area under the receiver operating characteristic curve (ROC AUC value) of all possible combinations was estimated on a respective validation set, and the combination with the highest validation score was selected to obtain the test scores and predictions. For the patients from Dataset 1, the validation set equals the test set due to the relatively short recording time of each patient. For Dataset 2 and Dataset 3, the available public test sets were used for validation. We are fully aware of the limited significance of this methodology and that the resulting performance does not reflect a purely prospective approach, in which the model's hyperparameters would be optimised on the training data (Eberlein et al., 2018). However, we accept this limitation in the context of this study, since the focus is on the false alarms rather than on the prediction performance itself.
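The exhaustive search can be sketched as follows. `score_fn` is a hypothetical callable standing in for "train the MLP ensemble on this feature subset and return its validation ROC AUC"; the sketch only shows the selection logic. Note that n candidate features yield 2^n − 1 non-empty combinations, so this search is only tractable for short feature lists.

```python
from itertools import combinations


def best_feature_combination(features, score_fn):
    """Evaluate all non-empty subsets of `features` with `score_fn` and
    return the subset with the highest score together with that score."""
    best, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            score = score_fn(subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Toy usage with made-up validation AUCs for three hypothetical feature groups.
toy_auc = {("power",): 0.61, ("moments",): 0.55, ("mpc",): 0.64,
           ("power", "moments"): 0.63, ("power", "mpc"): 0.70,
           ("moments", "mpc"): 0.66, ("power", "moments", "mpc"): 0.68}
print(best_feature_combination(["power", "moments", "mpc"], toy_auc.get))
# prints (('power', 'mpc'), 0.7)
```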

Deep-Learning Classification (Method 2)
In contrast to the "hand-crafted" feature extraction, we applied a convolutional neural network to the multichannel iEEG data, as originally proposed as topology 1 (nv1x16) by Eberlein et al. (2018). A suitable low-dimensional representation of the signal is derived from the raw data through repeated consecutive layers of convolution, nonlinear activation, and pooling.
A schematic of the topology is depicted in Fig. 1. The input data (a k-channel array of 3,000 samples) is processed along the time axis by convolution and pooling operations with kernel sizes in the range of 2 to 5. The number of feature maps changes from 32 in the first layers to 128 and to 32 again in the last layers. ReLU was chosen as nonlinear activation in the convolutional and dense layers and the sigmoid function was used in the output layer. Additionally, dropout layers (p = 0.2 and p = 0.5) as well as L1 and L2 regularization were applied. Finally, the classification was done in a fully connected layer of 64 neurons.
In contrast to method 1, the network's hyperparameters were optimised using only the training and test data of Dog 2 to avoid overfitting the models. For all remaining individuals, the derived network topology was applied without any further validation sets. To obtain statistically robust outputs, 20 models were trained for each individual.
Figure 1: Schematic of the convolutional neural network (CNN) topology for processing raw multichannel iEEG data, shown for an example of 16-channel input data with 3,000 samples.

Correlation of prediction errors
Various metrics have been used to assess prediction performance, for instance accuracy and sensitivity. Here, we are interested in changes of predictions over time, and especially in the consistency of these changes between two different methods. In the following, the term prediction refers to the respective network output, which represents the probability that an iEEG clip is preictal. According to our hypothesis, a high correlation between the predictions of fundamentally different classifiers indicates an intrinsic change in the data.
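A standard measure of the coherence between two data series is the Pearson correlation coefficient. For two predictions p_i and p_j it reads
$$c = \frac{\langle (p_i - \langle p_i \rangle)(p_j - \langle p_j \rangle) \rangle}{\sigma(p_i)\,\sigma(p_j)},$$
where ⟨·⟩ and σ(·) denote the mean and standard deviation of a prediction over all samples.
Since any good prediction should be close to the ground truth, the correlation c is strongly biased, i.e. its value is almost trivially high for two good predictions. For a metric that emphasises the coherence between "false" predictions, we used the weighted correlation coefficient
$$c_w = \frac{\langle (e_i - \langle e_i \rangle_w)(e_j - \langle e_j \rangle_w) \rangle_w}{\sigma_w(e_i)\,\sigma_w(e_j)},$$
where e_i = p_i − L denotes the prediction error and L is the label. The weighted mean and standard deviation over all samples k are given by
$$\langle x \rangle_w = \frac{\sum_k w_k x_k}{\sum_k w_k}, \qquad \sigma_w(x) = \sqrt{\langle (x - \langle x \rangle_w)^2 \rangle_w}.$$
The weight factor w = max(|e_i|, |e_j|), evaluated for each sample, was assigned to emphasise the effect of coherent false predictions.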
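Both coefficients can be computed directly from these definitions. This NumPy sketch assumes binary labels L in {0, 1} and predictions p in [0, 1] for all test clips; the function names are hypothetical. Note that c_w is undefined if both methods predict every clip perfectly, because all weights are then zero.

```python
import numpy as np


def pearson_c(p_i, p_j):
    """Pearson correlation c of two prediction series."""
    p_i, p_j = np.asarray(p_i, float), np.asarray(p_j, float)
    cov = np.mean((p_i - p_i.mean()) * (p_j - p_j.mean()))
    return cov / (p_i.std() * p_j.std())


def weighted_error_correlation(p_i, p_j, labels):
    """Weighted correlation c_w of the prediction errors e = p - L."""
    labels = np.asarray(labels, float)
    e_i = np.asarray(p_i, float) - labels
    e_j = np.asarray(p_j, float) - labels
    w = np.maximum(np.abs(e_i), np.abs(e_j))  # large where either method errs

    def wmean(x):
        return np.sum(w * x) / np.sum(w)

    d_i, d_j = e_i - wmean(e_i), e_j - wmean(e_j)
    return wmean(d_i * d_j) / np.sqrt(wmean(d_i ** 2) * wmean(d_j ** 2))
```

Because the weights grow with the larger of the two absolute errors, clips that both methods predict well contribute little, which removes most of the bias that inflates c.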

Information transfer
As a further demonstration of the observed coherent false predictions, we tested how knowledge of the false predictions of method A can be used to artificially boost the performance of method B, and vice versa. Specifically, based on the predictions of method A, we eliminate all clips whose absolute prediction error |e| is larger than a threshold value e_th. The ROC AUC value of method B is then evaluated on the reduced dataset. If the prediction errors are strongly correlated, we expect that samples that are difficult for method A to classify are likely to be misclassified by method B as well, so that removing them increases the ROC AUC of method B.
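The procedure can be sketched as follows, assuming binary labels and clip-level predictions of both methods for the same test clips. The ROC AUC is computed from the Mann-Whitney U statistic, and the random baseline drops the same number of randomly chosen clips, as in the comparison with a randomly reduced test set. Function names are hypothetical, and both classes must remain in the reduced set for the AUC to be defined.

```python
import numpy as np


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, with average ranks for ties."""
    labels = np.asarray(labels, dtype=bool)
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0  # 1-based average rank per value
    ranks = avg_rank[inv]
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def information_transfer(p_a, p_b, labels, e_th, rng):
    """AUC of method B after removing clips that method A gets wrong by more
    than e_th, plus the AUC after removing the same number of random clips."""
    labels = np.asarray(labels, dtype=int)
    p_b = np.asarray(p_b, dtype=float)
    keep = np.abs(np.asarray(p_a, dtype=float) - labels) <= e_th
    rand_keep = rng.permutation(labels.size) < keep.sum()  # same size, random clips
    return (roc_auc(labels[keep], p_b[keep]),
            roc_auc(labels[rand_keep], p_b[rand_keep]))
```

Sweeping `e_th` from 1 down to smaller values and plotting both returned AUCs reproduces the kind of curve compared for the two methods: with e_th = 1 nothing is removed, so both values equal the AUC on the complete test set.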

Classification performance
For an objective characterisation of the performance of our classifiers, we used the ROC AUC value. To obtain classifications of the original 10-min clips, the predictions of the corresponding 15-s segments were averaged. Due to the missing timestamps of Dataset 2 and Dataset 3, it was not possible to calculate other specificity-related metrics such as time in warning (Mormann et al., 2007). As shown in Tab. 3, both classifiers performed better than a chance predictor in 9 out of 12 individuals from all three datasets. The performance is comparable to that of the leading state-of-the-art algorithms for the long-term recordings of Dataset 2 (Brinkmann et al., 2016) and Dataset 3 (Kuhlmann et al., 2018a).
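Averaging segment outputs into clip predictions amounts to a reshape and a mean. This sketch assumes that the 15-s segment predictions are stored in temporal order, clip by clip, so that each 10-min clip contributes 600 s / 15 s = 40 consecutive segments; the names are hypothetical.

```python
import numpy as np

SEGMENTS_PER_CLIP = 600 // 15  # 40 segments of 15 s per 10-min clip


def clip_predictions(segment_preds, per_clip=SEGMENTS_PER_CLIP):
    """Average consecutive segment predictions into one prediction per clip."""
    seg = np.asarray(segment_preds, dtype=float)
    if seg.size % per_clip:
        raise ValueError("number of segments is not a multiple of per_clip")
    return seg.reshape(-1, per_clip).mean(axis=1)
```

The resulting clip-level predictions are the values on which the ROC AUC and the correlation measures are evaluated.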

Coherent false predictions and information transfer
Interesting new aspects emerged when comparing the prediction time series directly, as shown for Patient A and Patient E in Fig. 2. Both classifiers performed well on the recordings of Patient A, with AUC values of 0.79 for method 1 and 0.74 for method 2. Hence, it is not surprising that both methods consistently made correct predictions for many segments. Notably, they also delivered false predictions in a highly coherent manner, as indicated for instance by the cluster of false positive states on day 2 from 8:00 am to 4:00 pm. Moreover, the way the prediction values change over time is also very coherent. The same phenomenon can be seen in the predictions over time for Patient E in the lower part of Fig. 2, where a strong coherence of false predictions is observed, for example, on day 2 from 0:00 am to 3:00 am.
Figure 2: Comparison of predictions in the test data for Patient A (upper) and Patient E (lower). The standard deviation (SD) is computed over 100 models for method 1 and 20 models for method 2. False positive predictions are delivered coherently by both methods, for instance for Patient A on day 2 from 8:00 am to 4:00 pm and for Patient E on day 2 from 0:00 am to 3:00 am. Seizure onset is marked as a red line.
The correlation coefficient c and the weighted correlation coefficient c_w were used to quantify the observed coherence between false predictions. Negative values represent anti-correlation, meaning that a state falsely labelled by classifier A tends to be correctly labelled by classifier B, and vice versa. Positive values of c and c_w (see Tab. 3) indicate that the two classifiers err in the same time periods by giving coherent false predictions, with a higher positive value representing a stronger coherence.
Note that for Patient C of Dataset 1, the coherent false predictions follow a clear one-day rhythm (see Fig. S1-S4 in the supplementary material), which points to the circadian cycle as a direct cause. However, no regular cycles are visible for the other patients of Dataset 1 with a significantly high coherence measure c_w. Hence, the circadian cycle and other periodicities cannot be the only cause of the observed coherence in false predictions.
Table 3: Seizure prediction performance of the two methods, characterised by receiver operating characteristic (ROC) area under curve (AUC) values, the correlation of the predictions c, and the weighted correlation of prediction errors c_w. Both methods were compared against random predictors, and p-values of their superior performance were assessed using the Hanley-McNeil method (Hanley and McNeil, 1982). For the correlations, p-values are estimated as the probability that two random predictions with the same ROC AUC have equal or higher values. A total of M = 100 random predictions were obtained for each method by randomly permuting the original predictions within each label class. All possible combinations amount to N = 5050 correlations for computing the p-values. For c and c_w, all p-values are < 1/N, except for a: p = 0.0008, b: p = 0.008, c: p = 0.02.
For most individuals of all three datasets, the values of c_w are larger than 0.5, indicating that medium to strong correlations between the false predictions of the two classifiers are common. Exceptions are Patient B, Patient D, and Patient 1, where at least one of our classifiers performs significantly worse than the other. For all individuals, the Pearson correlation coefficient c is larger than the corresponding value of c_w, which reflects the biasing effect of the common correct predictions.
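The Hanley-McNeil comparison against a random predictor mentioned in the Table 3 caption can be sketched with the standard-error formula of Hanley and McNeil (1982) and a one-sided normal test of AUC > 0.5. This is one common way of applying the method; whether exactly this test statistic was used is an assumption, and the function names are hypothetical. The formula degenerates (zero standard error) for AUC = 1.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an ROC AUC (Hanley and McNeil, 1982).

    n_pos: number of preictal (positive) clips, n_neg: number of interictal clips.
    """
    q1 = auc / (2.0 - auc)             # P(two random positives both outrank a negative)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)  # P(a random positive outranks two negatives)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def p_better_than_chance(auc, n_pos, n_neg):
    """One-sided p-value for AUC > 0.5 using a normal approximation."""
    z = (auc - 0.5) / hanley_mcneil_se(auc, n_pos, n_neg)
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Since the standard error shrinks with the number of clips, the same AUC is more significant for longer recordings, which is one reason why the ROC AUC values of the shorter Dataset 1 recordings carry less weight.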
The change of the ROC AUC values depending on the omission threshold e_th is shown in Fig. 3(a) for Dog 3 and in Fig. 3(b) for Patient 1. The ROC AUC value at e_th = 1 corresponds to that of the original, complete test set; with decreasing e_th, more and more falsely predicted samples are omitted.
Figure 3: Coherence of false predictions demonstrated by information transfer between two methods, where c_w is the weighted correlation of prediction errors, e denotes the prediction error, and e_th denotes the threshold of omission. Here, "method 1" means that the falsely predicted samples (with |e| > e_th) of method 1 were eliminated from the test set of method 2, and vice versa. For comparison, "method 1 random" shows the performance for a randomly reduced test set of the same size.
For Dog 3 in Fig. 3(a), we clearly observe an increase in the ROC AUC values with decreasing e_th, which is consistent with the strong correlation of false predictions for this individual (c = 0.80 and c_w = 0.79, see Tab. 3). To exclude the possibility that the increase in ROC AUC values is merely due to the decreased number of test samples, the performance was also evaluated on a test set reduced by the same number of randomly selected samples. In this case, the ROC AUC values remain almost constant.
For comparison, we observe no significant increase in the ROC AUC values for Patient 1 in Fig. 3(b) as long as e_th does not fall below 0.6. This behaviour is expected, since Patient 1 shows a rather low correlation of false predictions (c = 0.22 and c_w = −0.07); hence, omitting samples falsely predicted by method A is unlikely to significantly affect the performance of method B.

Discussion
In our study, we were able to show that seizure prediction is possible with a performance better than chance for a majority of the individuals, substantiated by ROC AUC values above 0.5 for 11 out of 12 individuals for method 1 and for 9 out of 12 individuals for method 2. This result is in line with recent studies (Brinkmann et al., 2016; Kuhlmann et al., 2018a).
The performance of method 1 is slightly better than that of method 2 for almost all individuals (except for Dog 3 and Dog 4), as the best feature combination was chosen retrospectively on the validation set. In a prospective approach, such an optimisation is not possible, and the performance of this method is therefore expected to be worse, as shown by Eberlein et al. (2019). Moreover, the relevance of the ROC AUC values of Dataset 1 is limited, since its recordings are considerably shorter than those of Dataset 2 and Dataset 3. However, it provides continuous and annotated data, which is valuable for the discussion of coherent false predictions given below.
Generally, the performance for each individual seems to be limited by a "ceiling effect". This is consistent with a recent study which observed that ensembling top-performing algorithms yields no real improvement (Reuben et al., 2019). These findings suggest that classifiers might not be able to perform significantly better on these data and that false predictions are correlated across different algorithms.
We attribute this upper bound to the non-stationarity of the signal, which causes temporally varying distributions of the raw data and hence different training and test distributions in clinically relevant applications, i.e. with temporally separated training and test phases. This intrinsically limits how well a model can generalise from one of these data sets to the other on the basis of the data alone. For data-driven methods, this deficit could be overcome by substantially more and/or less correlated training data.

Origins of coherent false predictions
In our analysis of three different long-term datasets, two fundamentally different algorithms show a remarkable coherence in correct and wrong predictions of iEEG sequences, indicated by a weighted correlation coefficient c_w > 0.5 in 7 out of 12 individuals. For three of the five remaining individuals (with c_w < 0.5), at least one algorithm yields a very weak prediction performance. In the network outputs over time (as given for Patient A in Fig. 2), this correlation appears as a temporal conformity of the outputs on time scales of hours to days. Moreover, our investigation of information transfer reveals an increase in the ROC AUC of one method when samples that have been classified falsely by the other method are eliminated from the test set (see Fig. 3).
In this study (as in the majority of similar studies (Brinkmann et al., 2016; Kuhlmann et al., 2018a,b; Mormann and Andrzejak, 2016)), the complex problem of seizure prediction is reduced to a binary classification task of labelling data as either preictal or interictal. We assume that data-driven algorithms are generally sensitive to patterns that are correlated with these two states of the brain, at least in the training set. However, the epileptic brain passes through a variety of different modes, even during interictal periods (Kalitzin et al., 2011). Therefore, especially in scenarios with limited and non-stationary data, it is unclear whether all possible states are sufficiently represented in the training data. This leads to two possible interpretations of the frequent occurrence of coherent false predictions.
On the one hand, the models may be overfitted. As already discussed, generalisability is intrinsically limited for the problem at hand, for reasons including the long-term non-stationarity described above and the low number of seizures. Furthermore, patients might experience different types of seizures with different generating dynamics (Mormann et al., 2005). Seizure onsets varying in time and manifesting over different channels were observed, for example, by Ung et al. (2016). Seizures of a type that did not occur during the training period are potentially difficult for an optimised algorithm to predict. Phenomena like this might explain why generalisation problems not only occur regardless of the algorithm but also tend to affect the same periods. This suggests that these problems will persist for data-driven approaches applied to the present data, regardless of the method.
On the other hand, we might not have a valid ground truth for the commonly used working hypothesis of binary classification. The labelling and classification of the EEG segments is based solely on whether or not a seizure subsequently occurred. Consequently, changes in the EEG signal that have the potential to lead to a seizure, but are not strong enough to cross the threshold to an actual seizure, will not be classified as preictal despite their ictogenic potential. In such a scenario, the term proictal would be more appropriate, as it does not depend on the actual occurrence of a seizure.
Finally, false negatives are likely to occur in a set-up with an assumed preictal period of fixed duration. In accordance with the definitions used by Brinkmann et al. (2016) and Kuhlmann et al. (2018a), we assumed a preictal state between 65 min and 5 min prior to each seizure. However, it is questionable whether such a fixed seizure prediction horizon (SPH) is valid for all individuals, or even for all seizures of one individual (Snyder et al., 2008). Several studies have already shown that prediction algorithms achieve their best performance with a patient-specific SPH in the range of 10 min to 60 min (Gadhoumi et al., 2016; Senger and Tetzlaff, 2016; Zheng et al., 2014) or with even shorter prediction horizons of less than 10 min (Kuhlmann et al., 2010). In our opinion, more flexibility should be provided here when working with databases of long-term recordings.

Steps to improve seizure prediction
Regardless of how the presented results are interpreted, a new working hypothesis is likely needed, since the binary classification of interictal and preictal segments is limited. The concept of a preictal state rests on the assumption of a deterministic transition from the interictal state to a seizure. Physiological or pathophysiological processes that are related to ictogenesis but not inevitably followed by a seizure are not considered in this hypothesis. Recently, a more probabilistic approach has increasingly been considered, assuming the existence of a proictal state that is characterised by an increased probability of seizure onset (Kalitzin et al., 2011; Meisel et al., 2015). Studies on forecasting epileptic seizures that identify periods of increased seizure risk based on the analysis of circadian and multi-day cycles already show promising results (Proix et al., 2019; Stirling et al., 2020).
However, retrospectively determining periods of increased seizure risk is infeasible with currently available data. This applies especially to periods of high seizure likelihood that do not develop into a seizure, but also to epochs preceding seizures, since the actual duration of the proictal phase prior to seizures can only be hypothesised. This poses major challenges for defining a uniform framework for the development and comparison of new methods. Algorithms can be compared in their overall performance (e.g. time in false warning and sensitivity), but identifying faulty behaviour is impossible given the lack of a reliable ground truth. In the future, experimentally probing cortical excitability via electrical or transcranial magnetic stimulation (Bauer et al., 2014; Freestone et al., 2011) could provide hints as to whether the brain is in a state of increased seizure susceptibility.
Finally, until data with a reliable ground truth are available, we suggest considering alternative approaches for data-driven methods. The way data-driven models are currently trained implicitly assumes that a deterministic preictal period precedes every seizure and that any other time is, by definition, interictal. Our findings support current studies arguing that this hypothesis might not hold for every patient and that probabilistic frameworks should be considered instead. In light of these developments, we propose the use of semi- or unsupervised models to acknowledge this fundamental change in the underlying hypothesis.

Conclusion
By comparing two substantially different seizure prediction algorithms on three datasets, we observed a remarkably strong coherence of correct, but also of false, predictions. As algorithms are predominantly sensitive to underlying changes in the data, the problem of apparently false predictions is unlikely to disappear through further optimisation of algorithms for binary classification.
In our opinion, we should instead focus on a new working hypothesis in seizure prediction that follows a probabilistic rather than a deterministic approach. Considering a proictal state along with a clustering of the EEG data using unsupervised learning could be a promising approach.

Disclosure of Conflicts of Interest
None of the authors has any conflict of interest to disclose.

Ethical Publication Statement
We confirm that we have read the Journal's position on issues involved in ethical publication and affirm that this report is consistent with those guidelines.