Channel Increment Strategy-Based 1D Convolutional Neural Networks for Seizure Prediction Using Intracranial EEG

The application of intracranial electroencephalogram (iEEG) to seizure prediction remains challenging. Although channel selection has been used in seizure prediction and detection studies, most of these studies combine it with conventional machine learning methods, so channel selection combined with deep learning merits further analysis in seizure prediction. In this work, we propose a novel iEEG-based deep learning method that combines One-Dimensional Convolutional Neural Networks (1D-CNN) with a channel increment strategy for effective seizure prediction. First, 4-sec non-overlapping sliding windows were used to segment the iEEG signals. Then, 4-sec iEEG segments with an increasing number of channels (the channel increment strategy, from one channel to all channels) were sequentially fed into the constructed 1D-CNN model. Next, a patient-specific model was trained for classification. Finally, according to the classification results of the different channel cases, the channel case with the best classification rate was selected for each patient. Our method was tested on the Freiburg iEEG database, and system performance was evaluated at two levels (segment-based and event-based). Two model training strategies (Strategy-1 and Strategy-2) based on K-fold cross validation (K-CV) were investigated. (1) Strategy-1 is a basic K-CV. It achieved a sensitivity of 90.18%, a specificity of 94.81%, and an accuracy of 94.42% at the segment-based level, and an event-based sensitivity of 100% with a false prediction rate (FPR) of 0.12/h at the event-based level. (2) Strategy-2 differs from Strategy-1 by adding a trained model selection step during model training. It obtained a sensitivity, specificity, and accuracy of 86.23%, 96.00%, and 95.13%, respectively, at the segment-based level, and an event-based sensitivity of 98.65% with an FPR of 0.08/h at the event-based level.
Our method also outperformed many previous studies using the same database, as well as the random predictor. These results may serve as a reference for future clinical applications of seizure prediction.


I. INTRODUCTION
Epilepsy is one of the most common neurological diseases and seriously affects the health of patients. An estimated 70 million people have epilepsy, and approximately 30% of them are intractable to anti-epileptic drugs [1], [2]. For patients with drug-resistant epilepsy, seizure prediction may offer more treatment options, because it provides a time window in which interventions can be taken to suppress seizure onset.
Electroencephalogram (EEG) is an important tool that has been widely used in the diagnosis of epilepsy [3], [4] and in the source localization of the epileptic focus [5], [6]. However, EEG-based seizure prediction remains a challenging task. Because accurate seizure prediction would greatly reduce suffering and improve the quality of life of patients with epilepsy, it has attracted increasing attention in recent years. Seizure prediction using intracranial electroencephalogram (iEEG) and scalp electroencephalogram (sEEG) has been widely studied over the past two decades. The Freiburg iEEG [7] and the CHB-MIT sEEG [8] databases are commonly used in iEEG-based and sEEG-based seizure prediction studies, respectively. An overview of related research is briefly given as follows.
As mentioned above, many conventional machine learning and deep learning methods have achieved remarkable results in seizure prediction. However, several issues still deserve attention. First, the combination of channel selection and deep learning methods has been little studied and should be further analyzed in seizure prediction. Second, many previous studies using the Freiburg iEEG database evaluate performance at the event-based level (event-based sensitivity and FPR), whereas many previous studies using the CHB-MIT sEEG database evaluate performance at the segment-based level (sensitivity, specificity, and accuracy); both levels can therefore be considered at the same time. Third, LSTM and 2D-CNN models have been widely used for seizure prediction, whereas 1D-CNNs have been used much less. Based on these considerations, the main contributions of this work are summarized as follows:
1) A novel channel increment strategy-based 1D-CNN method is presented for seizure prediction. In the channel increment strategy, iEEG signals with a varying number of channels (from one channel to all channels) are used in turn as the inputs of the 1D-CNN model for classification. Then, the channel case with the best classification rate is selected for each patient.
2) To better evaluate the performance of our method, classification results are evaluated simultaneously at two levels (segment-based and event-based). The two levels are also used together to select the best channel case. For example, if several channel cases show the same high event-based performance for a patient, the segment-based performance is used to help select the best channel case.
3) Two model training strategies (Strategy-1 and Strategy-2) based on K-fold cross validation (K-CV) are discussed, and they correspond to two channel selection processes. Strategy-1 is a basic K-CV, and the best channel case is selected only after the K-CV. Strategy-2 differs from Strategy-1 by adding a trained model selection step during model training, which serves as a preliminary selection of channel cases. The best channel case is then selected from these preliminarily selected channel cases after the K-CV.
The remaining sections of this paper are organized as follows: Section II describes the materials, Section III the methodology, Section IV the results of the proposed method, and Section V the discussion. Section VI presents our conclusions.

II. DATA
The Freiburg iEEG database is used for seizure prediction in this work. The iEEG signals were recorded at a sampling rate of 256 Hz, with a 50 Hz notch filter and a 0.5-120 Hz bandpass filter. The database contains 21 patients, 87 epileptic seizures, 509 h of interictal iEEG, and 73 h of preictal or ictal iEEG [7]. Each patient has at least 24 h of interictal and 50 min of preictal iEEG signals. More details of this database can be found in [7].
In seizure prediction, the seizure occurrence period (SOP) is defined as the period during which a seizure is expected to occur. The seizure prediction horizon (SPH) is the period from an alarm to the beginning of the SOP [54] (as shown in Fig. 1). The SPH is also regarded as the period available for interventions to prevent seizure onset [55]. In this work, we adopt SPH = 5 min and SOP = 30 min (i.e., a 35 min preictal period before each seizure), following studies [25] and [26]. Only patients with at least 4 seizures are considered, to ensure a sufficient number of samples for model training. The details of the selected iEEG signals are summarized in Table I.
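A minimal Python sketch of how the SPH and SOP translate into a correctness test for an alarm and into preictal labels. Times are in seconds; the function names are hypothetical, and excluding segments that straddle the preictal boundary is an assumption, not something the text specifies.

```python
SPH_S = 5 * 60    # seizure prediction horizon used in this work (s)
SOP_S = 30 * 60   # seizure occurrence period used in this work (s)


def is_true_prediction(alarm_t: float, onset_t: float,
                       sph: float = SPH_S, sop: float = SOP_S) -> bool:
    """True if the onset falls in the SOP window that opens one SPH after the alarm."""
    return alarm_t + sph <= onset_t <= alarm_t + sph + sop


def is_preictal(seg_start: float, seg_len: float, onset_t: float,
                sph: float = SPH_S, sop: float = SOP_S) -> bool:
    """True if a segment lies entirely in the (SPH + SOP) = 35 min before onset.

    Assumption: a segment straddling the boundary is not labeled preictal.
    """
    return onset_t - (sph + sop) <= seg_start and seg_start + seg_len <= onset_t
```

With these settings, an alarm raised less than 5 min before onset does not count as a true prediction, because the seizure falls inside the SPH rather than the SOP.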

III. METHODOLOGY
The overall framework of the 1D-CNN combined with the channel increment strategy is shown in Fig. 2. In the iEEG database used in this work, each patient has six iEEG channels: three in-focal channels (channels 1-3) and three out-of-focal channels (channels 4-6). iEEG signals with an increasing number of channels (from one to six) are sequentially fed into the 1D-CNN models for classification; this process is referred to as the channel increment strategy. The best channel case is then selected according to the classification results (Fig. 2). The remainder of this section covers five parts: preprocessing, the channel increment strategy, the 1D-CNN model, model training, and system evaluation.

A. Preprocessing
In preprocessing, 4-sec non-overlapping sliding windows are used to segment the raw iEEG signals (Fig. 3). Since the iEEG signals are sampled at 256 Hz, each 4-sec iEEG segment is an n × 1024 matrix, where n (n = 1 to 6) is the number of channels and 1024 is the number of samples per channel. The 4-sec iEEG segments are then used as the inputs of the 1D-CNN models. The number of 4-sec iEEG segments for each patient is summarized in Table II.
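The non-overlapping segmentation can be sketched with NumPy as follows. This assumes a recording stored as an (n_channels, n_samples) array; the function name is hypothetical, and dropping trailing samples that do not fill a whole window is an assumption, since the text does not say how the remainder is handled.

```python
import numpy as np

FS = 256               # sampling rate of the Freiburg recordings (Hz)
WIN_SEC = 4            # window length used in this work (s)
WIN = FS * WIN_SEC     # 1024 samples per window


def segment(signal: np.ndarray, win: int = WIN) -> np.ndarray:
    """Split an (n_channels, n_samples) array into non-overlapping windows.

    Returns shape (n_windows, n_channels, win); leftover samples are dropped.
    """
    n_ch, n_samp = signal.shape
    n_win = n_samp // win
    trimmed = signal[:, : n_win * win]
    # (n_ch, n_win * win) -> (n_ch, n_win, win) -> (n_win, n_ch, win)
    return trimmed.reshape(n_ch, n_win, win).transpose(1, 0, 2)


# One minute of 6-channel data yields 15 segments of shape (6, 1024).
x = np.random.randn(6, 60 * FS)
print(segment(x).shape)  # (15, 6, 1024)
```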

B. Channel Increment Strategy
The iEEG signals of each patient contain six channels: channels 1-3 (in-focal) and channels 4-6 (out-of-focal). In the channel increment strategy, when iEEG segments from a single channel are used as the inputs of the 1D-CNN model, there are six channel cases ($\binom{6}{1} = 6$). By analogy, there are cases of two channels, three channels, and so on up to six channels. Consequently, there are 63 channel cases in total:
$$\sum_{k=1}^{6}\binom{6}{k} = \binom{6}{1}+\binom{6}{2}+\binom{6}{3}+\binom{6}{4}+\binom{6}{5}+\binom{6}{6} = 6+15+20+15+6+1 = 63.$$
All channel cases are summarized in Table III.
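The 63 channel cases are simply all non-empty subsets of the six channels, which can be enumerated with the standard library. The ordering below (by subset size, then lexicographically) is an assumption and may differ from Table III.

```python
from itertools import combinations

CHANNELS = (1, 2, 3, 4, 5, 6)  # 1-3 in-focal, 4-6 out-of-focal


def channel_cases(channels=CHANNELS):
    """All non-empty channel subsets, ordered by size, then lexicographically."""
    return [case for n in range(1, len(channels) + 1)
            for case in combinations(channels, n)]


cases = channel_cases()
print(len(cases))   # 63
print(cases[:3])    # [(1,), (2,), (3,)]
print(cases[-1])    # (1, 2, 3, 4, 5, 6)
```

Given a segment array of shape (n_windows, 6, 1024), the input for one case can be taken as `segments[:, [c - 1 for c in case], :]`, which yields the n × 1024 matrices described in the preprocessing step.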

C. 1D-CNN Model
Since the 4-sec iEEG segments are used directly as the inputs of the classifier, a 1D-CNN model is constructed in this study. As shown in Fig. 4, the proposed 1D-CNN model consists of two parallel blocks (Block 1 and Block 2), one convolution portion, and two fully connected (FC) layers. Both blocks have the same structure, each containing three convolution portions. Each convolution portion is composed of a convolutional layer with a rectified linear unit (ReLU) activation, a batch-normalization (BN) layer, and a max-pooling (MP) layer. The two parallel blocks use different kernel sizes so that the model can learn more diverse representations from the input signals. A convolutional layer applies convolution followed by a nonlinearity to its input, and its output is commonly fed into a pooling layer, which downsamples the feature maps while preserving higher-level representations.
The parameters of the proposed 1D-CNN model are as follows. In Block 1, the three convolutional layers contain 32 kernels (size = n × 3, where n is the number of channels, ranging from 1 to 6, and stride = 2), 64 kernels (size = 3, stride = 2), and 128 kernels (size = 3, stride = 1), respectively. The three MP layers all have a pooling size of 3 and a stride of 2. Block 2 differs from Block 1 only in the kernel sizes of its three convolutional layers (Fig. 4), which are n × 5, 5, and 5, respectively. The diverse representations from the two blocks are then concatenated as the input of the last convolution portion, which consists of a convolutional layer (256 kernels, size = 3, stride = 1), a BN layer, and an MP layer (size = 3, stride = 2). Finally, the outputs of the last convolution portion are globally averaged and fed into the two FC layers. The first FC layer has 128 neurons,
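The text does not state the padding mode. The pure-Python shape trace below, a sketch rather than the authors' model, follows a 1024-sample input through both blocks under two padding assumptions. If the two blocks are concatenated along the feature (kernel) axis, their temporal lengths must agree; that holds with "same" padding but not with unpadded "valid" layers. The "same" rule used here, out = ceil(L / stride), is the TensorFlow/Keras convention and is an assumption. The first convolution (kernel n × 3 or n × 5) is treated as a 1D convolution over n input channels with kernel length 3 or 5, so n does not change the temporal length.

```python
import math

# (kernel, stride) of conv1, MP1, conv2, MP2, conv3, MP3, as listed in the text
BLOCK1 = [(3, 2), (3, 2), (3, 2), (3, 2), (3, 1), (3, 2)]
BLOCK2 = [(5, 2), (3, 2), (5, 2), (3, 2), (5, 1), (3, 2)]
LAST = [(3, 1), (3, 2)]  # final conv (256 kernels) and its MP layer


def out_len(length: int, kernel: int, stride: int, padding: str) -> int:
    """Temporal output length of a 1D convolution or pooling layer."""
    if padding == "same":    # Keras/TensorFlow-style 'same' padding
        return math.ceil(length / stride)
    if padding == "valid":   # no padding
        return (length - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


def trace(layers, length: int, padding: str):
    """Length after each conv/MP layer (ReLU and BN do not change it)."""
    lengths = []
    for kernel, stride in layers:
        length = out_len(length, kernel, stride, padding)
        lengths.append(length)
    return lengths


for pad in ("same", "valid"):
    b1, b2 = trace(BLOCK1, 1024, pad), trace(BLOCK2, 1024, pad)
    print(pad, b1[-1], b2[-1])
# same 32 32   -> channel-wise concatenation gives 32 steps x 256 features
# valid 30 28  -> lengths differ, so channel-wise concatenation would fail
```

Under the "same" assumption, the last convolution portion keeps 32 steps and its MP layer halves them to 16, so global average pooling produces a 256-dimensional vector (one value per kernel) for the first FC layer.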

D. Model Training
In this work, a patient-specific model is trained for each patient. Two strategies (Strategy-1 and Strategy-2) based on K-fold cross validation (K-CV) are used for model training.
1) Strategy-1: Strategy-1 is a basic K-CV. Model training is carried out for K rounds, where K is the number of seizures of the patient. In each round, K-1 parts are used for training and the remaining part is used for testing. For example, Fig. 5 shows Strategy-1 for a patient with 4 seizures. First, the interictal segments are sequentially divided into 4 equal parts. Since there are far more interictal segments than preictal segments, down-sampling is applied before model training: as shown in Fig. 5, in each round the same number of interictal segments as preictal training segments is randomly selected from the 3 training interictal parts. The interictal training set is therefore the same size as the preictal training set, while the remaining parts (one interictal part and one preictal part) are used for testing. After 4 rounds, all segments have been tested.
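The round structure of Strategy-1 can be sketched as a generator over segment IDs. This is a hypothetical helper, not the authors' code: it assumes one list of preictal segment IDs per seizure, interictal IDs in temporal order, and splits the interictal IDs into K contiguous near-equal parts so that no segment is left untested. Random down-sampling uses `random.Random.sample`, which requires the interictal pool to be at least as large as the preictal training set.

```python
import random


def strategy1_rounds(preictal_parts, interictal, seed=0):
    """Yield (train_pre, train_inter, test_pre, test_inter) for each of K rounds.

    preictal_parts: K lists of preictal segment IDs, one list per seizure.
    interictal: interictal segment IDs in temporal order.
    """
    k = len(preictal_parts)
    rng = random.Random(seed)
    n = len(interictal)
    # Sequential split into K contiguous, near-equal interictal parts.
    bounds = [i * n // k for i in range(k + 1)]
    inter_parts = [interictal[bounds[i]:bounds[i + 1]] for i in range(k)]
    for test in range(k):
        train_pre = [s for i in range(k) if i != test for s in preictal_parts[i]]
        pool = [s for i in range(k) if i != test for s in inter_parts[i]]
        # Random down-sampling: as many interictal as preictal training segments.
        train_inter = rng.sample(pool, len(train_pre))
        yield train_pre, train_inter, preictal_parts[test], inter_parts[test]
```

For a patient with 4 seizures this yields 4 balanced training sets, and the held-out interictal and preictal parts together cover every segment exactly once.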
2) Strategy-2: Strategy-2 differs from Strategy-1 in that a trained model selection step is added in each round (Fig. 6). The trained models are selected based on the F1 score, which is calculated as follows:
$$\text{Precision} = \frac{TP}{TP+FP}, \quad \text{Recall} = \frac{TP}{TP+FN}, \quad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}},$$
where TP is the number of preictal segments correctly predicted as preictal, FP is the number of segments falsely predicted as preictal, and FN is the number of segments falsely predicted as interictal. In each round, only the trained models with F1 scores greater than 0.97 are selected from the 63 trained models (one per channel case). For example, Fig. 6 shows Strategy-2 for a patient with 4 seizures. The sample balancing is the same as in Strategy-1. In each round, one part is held out as the testing set, 90% of the samples from the other three parts are used to train the models, and the remaining 10% are used as a validation set for selecting trained models (a preliminary selection of channel cases). The trained models with F1 scores greater than 0.97 are selected in each round, and the selected models are then applied to the testing set.
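The per-round model selection of Strategy-2 can be sketched as below. The helper names are hypothetical, and the random (shuffled) 90/10 train/validation split is an assumption; the text does not say whether the split is random or sequential.

```python
import random


def train_val_split(samples, train_frac=0.9, seed=0):
    """Shuffle and split samples into 90% training / 10% validation (assumed random)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def select_trained_models(val_counts, threshold=0.97):
    """Keep channel cases whose validation F1 is strictly above the threshold.

    val_counts maps a channel case, e.g. (1, 2), to its (TP, FP, FN)
    on the validation split.
    """
    return [case for case, (tp, fp, fn) in val_counts.items()
            if f1_score(tp, fp, fn) > threshold]
```

Because the criterion is a strict "greater than 0.97", a model with an F1 score of exactly 0.97 is not selected.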

E. System Evaluation
In seizure prediction, system performance is commonly evaluated at the event-based level. In this work, however, the performance of our method is evaluated at both levels (segment-based and event-based) simultaneously, for two reasons. First, the segment-based performance can help select the best channel case when several channel cases have the same high event-based performance for a patient. Second, evaluating at both levels makes the assessment of classification performance more comprehensive.
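The text states only that event-based performance comes first and segment-based performance breaks ties. One way to encode that rule is a lexicographic key, sketched here with a hypothetical ranking: higher event-based sensitivity, then lower FPR, then higher segment-based accuracy, then fewer channels. The exact ordering and the choice of accuracy as the tie-breaker are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    case: tuple          # e.g. (1, 2) for channels 1 and 2
    event_sens: float    # event-based sensitivity (0..1)
    fpr: float           # false predictions per hour
    seg_acc: float       # segment-based accuracy (0..1)


def best_case(results):
    """Event-based level first, segment-based level as tie-breaker (hypothetical ranking)."""
    return max(results,
               key=lambda r: (r.event_sens, -r.fpr, r.seg_acc, -len(r.case)))


results = [
    CaseResult((1,), 1.0, 0.10, 0.90),
    CaseResult((1, 2), 1.0, 0.10, 0.95),
    CaseResult((1, 2, 3), 0.75, 0.05, 0.99),
]
print(best_case(results).case)  # (1, 2)
```

Here the first two cases tie at the event-based level, so the higher segment-based accuracy decides between them.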
1) Segment-Based Level: At the segment-based level, sensitivity, specificity, and accuracy are calculated as follows:
$$\text{Sensitivity} = \frac{TP}{P}, \quad \text{Specificity} = \frac{TN}{N}, \quad \text{Accuracy} = \frac{TP+TN}{P+N},$$
where TP is the number of correctly predicted preictal segments, P is the number of all preictal segments, TN is the number of correctly predicted interictal segments, and N is the number of all interictal segments.
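These three segment-based metrics follow directly from per-segment labels. A minimal sketch, assuming labels are encoded as 1 for preictal and 0 for interictal (the function name is hypothetical):

```python
def segment_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy); labels: 1 = preictal, 0 = interictal."""
    pairs = list(zip(y_true, y_pred))
    p = sum(1 for t, _ in pairs if t == 1)          # all preictal segments
    n = len(pairs) - p                              # all interictal segments
    tp = sum(1 for t, y in pairs if t == 1 and y == 1)
    tn = sum(1 for t, y in pairs if t == 0 and y == 0)
    sens = tp / p if p else float("nan")
    spec = tn / n if n else float("nan")
    acc = (tp + tn) / len(pairs)
    return sens, spec, acc


print(segment_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

In this toy example, one of two preictal segments and two of three interictal segments are classified correctly, giving a sensitivity of 0.5, a specificity of 2/3, and an accuracy of 0.6.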
2) Event-Based Level: At the event-based level, the event-based sensitivity and the FPR are computed as follows:
$$\text{Sensitivity}_{\text{event}} = \frac{\text{number of correctly predicted seizures}}{\text{number of all seizures}}, \quad \text{FPR} = \frac{\text{number of false alarms}}{\text{interictal duration (h)}}.$$
To raise alarms reliably, a simple postprocessing of the prediction labels is performed. In our work, an alarm is raised only when all prediction labels within 32 seconds are positive (Fig. 7), i.e., eight consecutive labels of 4-sec segments must all be positive. To avoid unnecessary repeated alarms, the minimum interval between two alarms is set to the sum of the SPH and SOP; consequently, a second alarm between the first alarm and the end of its SOP is suppressed. At the event-based level, we also compare the proposed method with the random predictor. The probability that the random predictor predicts at least k out of K seizures is
$$p_v = \sum_{j=k}^{K} \binom{K}{j}\, p_1^{\,j} (1-p_1)^{K-j},$$
where $p_1 \approx 1 - e^{-\mathrm{FPR}\cdot\mathrm{SOP}}$ [56] is the probability of a random alarm, FPR and SOP are the false prediction rate and the seizure occurrence period, respectively, k is the number of true predictions, and K is the number of all seizures. The significance level is set to 0.05, so the proposed method is considered significantly better than the random predictor when $p_v < 0.05$.
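The alarm postprocessing and the random-predictor test can be sketched together. The alarm rule follows the text (8 consecutive positive 4-sec labels, then suppression for SPH + SOP = 35 min); timing the alarm at the end of the eighth segment is an assumption. The p-value uses the binomial form of the probability of at least k hits out of K seizures with $p_1 = 1 - e^{-\mathrm{FPR}\cdot\mathrm{SOP}}$, with FPR and SOP in the same time unit (hours here). Function names are hypothetical.

```python
import math


def raise_alarms(labels, seg_sec=4, n_consec=8, refractory_sec=(5 + 30) * 60):
    """Alarm times (s) from per-segment labels (1 = preictal).

    An alarm fires when the last n_consec labels are all positive (8 x 4 s = 32 s);
    after an alarm, further alarms are suppressed for SPH + SOP.
    """
    alarms, run, next_allowed = [], 0, 0
    for i, label in enumerate(labels):
        run = run + 1 if label == 1 else 0
        t_end = (i + 1) * seg_sec  # end time of segment i
        if run >= n_consec and t_end >= next_allowed:
            alarms.append(t_end)
            next_allowed = t_end + refractory_sec
    return alarms


def random_predictor_pvalue(k, K, fpr_per_h, sop_h=0.5):
    """Probability that a random predictor hits at least k of K seizures."""
    p1 = 1 - math.exp(-fpr_per_h * sop_h)
    return sum(math.comb(K, j) * p1**j * (1 - p1)**(K - j)
               for j in range(k, K + 1))
```

For illustrative inputs of 4 out of 4 seizures predicted at an FPR of 0.12/h, $p_1 \approx 0.058$ and $p_v \approx 1.2 \times 10^{-5}$, far below the 0.05 significance level.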

IV. RESULTS

A. Results of Strategy-1
The whole algorithm is run twice. For each channel case (63 channel cases in total, $\sum_{k=1}^{6}\binom{6}{k} = 63$), the averaged sensitivity (Sen1), specificity (Spe), and accuracy (Acc) are obtained at the segment-based level, and the averaged event-based sensitivity (Sen2) and FPR are obtained at the event-based level. Then, based on the results at both levels, the best of the 63 channel cases is selected for each patient, and the corresponding classification results are summarized.
For example, Fig. 8 shows the averaged results of each channel case for patient 19 at the segment-based level (Fig. 8(A)) and the event-based level (Fig. 8(B)). The case of channels 1 and 2 is finally selected based on the results at both levels, and its results for patient 19 are included in Table IV. Table IV summarizes the best channel cases and the corresponding classification results for all patients.
Fig. 7. At the event-based level, a simple postprocessing of prediction labels is performed to sound an alarm accurately. In this work, a 32-sec duration is required to sound an alarm robustly, meaning that 8 consecutive labels of 4-sec iEEG segments must be positive.
TABLE IV. In Strategy-1, the selected channel cases and the corresponding results for each patient.
As shown in Table IV, the results of each patient are reported for the best channel case. Overall, a sensitivity of 90.18%, specificity of 94.81%, and accuracy of 94.42% are achieved at the segment-based level. At the event-based level, all 74 seizures are predicted, giving an event-based sensitivity of 100% with an FPR of 0.12/h. According to the $p_v$ values in Table IV, our method performs significantly better than the random predictor for all patients.

B. Results of Strategy-2
Unlike Strategy-1, Strategy-2 adds a model selection step in each round (Fig. 6). The whole algorithm is again run twice. Averaging over the two runs gives one F1 score per channel case in each round, so each channel case has K averaged F1 scores over the K rounds. A channel case is pre-selected only when all K of its averaged F1 scores are greater than 0.97. The classification results on the testing sets of these pre-selected channel cases are then calculated to select the final best channel case, and the corresponding results for each patient are summarized.
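The pre-selection rule (average each channel case's validation F1 over the runs, then require every round's average to exceed 0.97) can be sketched as follows. The nested-list layout `f1_runs[r][k][case]` is a hypothetical data structure chosen for illustration.

```python
def preselect_cases(f1_runs, threshold=0.97):
    """Channel cases whose run-averaged F1 exceeds the threshold in every round.

    f1_runs[r][k][case] is the validation F1 of `case` in run r, round k.
    """
    n_runs, n_rounds = len(f1_runs), len(f1_runs[0])
    selected = []
    for case in f1_runs[0][0]:
        averages = [sum(f1_runs[r][k][case] for r in range(n_runs)) / n_runs
                    for k in range(n_rounds)]
        if all(a > threshold for a in averages):
            selected.append(case)
    return selected


# Two runs, two rounds, two channel cases (toy values)
run0 = [{(1,): 0.99, (2,): 0.99}, {(1,): 0.98, (2,): 0.94}]
run1 = [{(1,): 0.98, (2,): 0.99}, {(1,): 0.99, (2,): 0.99}]
print(preselect_cases([run0, run1]))  # [(1,)]
```

Channel case (2,) is rejected because its second-round average (0.965) falls below the threshold, even though its first-round average is high.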
For example, as shown in Fig. 9(A), for patient 19 with 4 seizures, each channel case has 4 averaged F1 scores after the whole algorithm runs twice. Thirty channel cases are pre-selected because their F1 scores are greater than 0.97 in all rounds. The classification results on the testing sets for these 30 channel cases are shown in Fig. 9(B) and (C). Based on these results, the case of channels 1 and 2 is finally selected as the best of the 30 pre-selected channel cases, and the corresponding results are summarized in Table V. As shown in Table V, the overall sensitivity, specificity, and accuracy at the segment-based level are 86.23%, 96.00%, and 95.13%, respectively. At the event-based level, 73 out of 74 seizures are correctly predicted (all except one seizure of patient 5), giving an overall event-based sensitivity of 98.65% and an FPR of 0.08/h. According to the $p_v$ values in Table V, this method also performs significantly better than the random predictor for all patients.

V. DISCUSSION

A. Comparison With Studies Using the Freiburg Database for Seizure Prediction
The results of this work are also compared with those of previous studies based on the same iEEG database. The comparison details, including features, classifiers, number of patients, number of seizures, SOP, SPH, number of channels used, sensitivity, and FPR, are summarized in Table VI. As shown in Table VI, previous studies mainly use three types of methods: threshold analysis, conventional machine learning, and deep learning. (1) Studies [7], [9], [10], [11], [12], [13], [14] combine threshold analysis with linear or non-linear features and achieve sensitivities of 42% to 92.9% and FPRs of 0.04/h to 1/h. Among them, study [13] attains the highest sensitivity of 92.9% with an FPR of 0.096/h, but uses only 10 of the 21 patients. (2) Among conventional machine learning methods, SVM [15], [16], [17], [18], [19], [20], LS-SVM [21], Bayesian methods [22], [23], and a linear classifier [57] have been used for seizure prediction, achieving sensitivities of 85.11% to 100% and FPRs of 0.03/h to 0.36/h. The highest sensitivity of 100% with a low FPR of 0.0324/h is obtained using the SVM in study [19]. (3) Among deep learning methods, 1D-CNN [24] and 2D-CNN [25], [26] models have been used, attaining sensitivities of 81.4% to 98.85% and FPRs of 0.01/h to 0.08/h; study [24] achieves the highest sensitivity (98.85%) and the lowest FPR (0.01/h). Our work also applies deep learning to the same iEEG database and obtains an event-based sensitivity of 98.65-100% and an FPR of 0.08-0.12/h. Compared with the previous studies in Table VI, our method outperforms most of them.
Although studies [18], [19], [57] achieve a sensitivity of 100%, they leave no time for interventions to suppress seizure onset (SPH = 0). Moreover, our method also reaches a sensitivity of 100% with a reduced number of channels (Table IV). Compared with studies [12], [16], [19], [24], [25], our work has a slightly higher FPR of 0.08-0.12/h, but it still meets the requirement that the FPR be less than 0.15/h [54]. In terms of sensitivity, our 98.65-100% is higher than that of studies [12], [16], [25] and comparable to that of studies [19], [24]. Another strength of this work should be emphasized. For the Freiburg iEEG database, most prior studies evaluate seizure prediction performance only at the event-based level (Table VI), without considering the segment-based level. In contrast, our work evaluates performance at both levels (Tables IV and V), which provides a more comprehensive assessment.

B. Comparison With Studies Using Channel Selection for Seizure Prediction
Table VII summarizes the studies using a channel selection strategy (CSS) for seizure prediction. As shown in Table VII, three CSS are considered: the pre-specified, the statistical criteria, and the sequential search strategies. Studies [9], [24], [48], which use the pre-specified strategy, achieve a sensitivity of 60-98.85% and an FPR of 0.01-0.15/h. In the pre-specified strategy, some channel cases are predefined (the other channel cases are ignored), and the best channel case is selected only from these predefined cases. One drawback of this strategy is therefore that the true best channel case may be among the ignored ones. Studies [49], [50], [51], [52], which use the statistical criteria strategy, attain a sensitivity of 70.9-97.83% with an FPR of 0.031-0.076/h. In the statistical criteria strategy, extracted features or classification rates from single or multiple channels are statistically evaluated to select significant channels, which are then used for seizure prediction. However, feature extraction is time-consuming, and complex feature extraction and selection approaches may also generalize poorly. In our work, we use the sequential search strategy (the number of channels ranges from one to all) for channel selection, and the best channel case is selected from all channel cases according to their performance, without discarding any channel cases in advance. Combined with deep learning, our method achieves a sensitivity of 98.65-100% and an FPR of 0.08-0.12/h, which compares favorably with the studies in Table VII.

VI. CONCLUSION
In this paper, a novel method combining a 1D-CNN with a channel increment strategy was proposed for seizure prediction. In the channel increment strategy, iEEG signals with an increasing number of channels (from one channel to all channels) were sequentially used as the inputs of the 1D-CNN model to find the best classification. The proposed method was tested on the Freiburg iEEG database, which has six channels per patient. All 74 seizures were predicted with Strategy-1, and 73 of 74 with Strategy-2. At the event-based level, a high event-based sensitivity of 98.65-100% and a low FPR of 0.08-0.12/h were achieved. At the segment-based level, a sensitivity of 86.23-90.18%, specificity of 94.81-96.00%, and accuracy of 94.42-95.13% were attained. Our method was also statistically better than the random predictor for all patients. These results show that our method performs well in seizure prediction with a minimal or reduced number of channels, and that patient-specific channel selection was necessary in this work. This may provide a reference for future clinical applications of seizure prediction with a reduced number of channels.