Automated Detection of Abnormalities from an EEG Recording of Epilepsy Patients With a Compact Convolutional Neural Network

Electroencephalography (EEG) is essential for the diagnosis of epilepsy, but identifying abnormalities requires expertise and experience. It is thus crucial to develop automated models for the detection of epilepsy-related abnormalities in EEGs. This paper describes the development of a novel class of compact convolutional neural networks (CNNs) for detecting abnormal patterns and electrodes in epileptic EEGs. The proposed model, the multichannel EEGNet (mEEGNet), is inspired by EEGNet, a CNN developed for brain-computer interfacing. Unlike the EEGNet, the mEEGNet has the same number of electrode inputs and outputs so that abnormal patterns can be detected per electrode. The mEEGNet was evaluated with a clinical dataset consisting of 29 cases of juvenile and childhood absence epilepsy labeled by a clinical expert. The labels were given to paroxysmal discharges visually observed in both ictal (seizure) and interictal (nonseizure) durations. Results showed that the mEEGNet detected abnormalities with area under the curve, F1-values, and sensitivity equivalent to or higher than those of existing CNNs, while requiring far fewer parameters. To our knowledge, the absence-epilepsy dataset validated with machine learning in this research is the largest in the literature.


Introduction
Epilepsy is a chronic neurological disorder that affects approximately 50 million people worldwide [1]. An electroencephalogram (EEG) is a crucial tool in the diagnosis of epilepsy. Depending on its invasiveness, EEG can be classified into either scalp EEG or intracranial EEG (iEEG). A scalp EEG is performed using electrodes attached to the scalp and is used for an early epilepsy diagnosis.
On the other hand, iEEG is performed by placing electrodes inside the skull via craniotomy. Intracranial EEG is used to identify the primary epileptogenic site in patients whose seizures are not controlled by medication and who may benefit from its surgical removal [2].
To diagnose epilepsy based on EEG measurements, a trained specialist (epileptologist) must visually read the EEG and distinguish between normal and abnormal epileptic waveforms. This process is highly specialized, time-consuming, and laborious, and it places a significant burden on epileptologists. Therefore, there is a need to detect epileptic intervals in EEGs automatically. In the past decades, the effectiveness of machine learning has been widely recognized because it can automatically learn the features necessary for detection given a sufficient amount of data. Among machine learning-based approaches, most automated techniques for scalp EEG are designed for seizure or seizure-onset detection [3] and waveform (such as spike) detection [4]. In particular, methods based on convolutional neural networks (CNNs), a class of deep neural networks (DNNs), have shown high performance in detection tasks [5,6,7,8,9,10].
In terms of the diagnosis of epilepsy, ictal (seizure) EEGs are crucial, but interictal (nonseizure) EEGs are also essential, since abnormal waveforms are observed in a patient's EEG during interictal states as well. Even though these abnormal patterns are the basis of diagnosis, they are intermittent and infrequent in an entire EEG recording, which typically lasts about half an hour. Searching for abnormal patterns in an entire EEG is time-consuming for epileptologists. However, there are few studies on the automatic detection of abnormal patterns in the EEGs of epilepsy patients. Sakai et al. [11] proposed a CNN model to detect abnormalities in the EEGs of childhood absence epilepsy patients. CNNs typically have many parameters and require a large labeled dataset for training; however, as with other medical data, it is difficult to collect a large number of epileptic EEGs. This points to a need for a compact CNN model suitable for EEG.
Recently, in the field of brain-computer interfacing, a CNN model for multichannel EEG called EEGNet was proposed [12]. The EEGNet demonstrated its efficiency in classifying EEG into different mental states, such as motor imagery. This paper proposes a novel variant of the EEGNet called the multichannel EEGNet (mEEGNet), a compact CNN model that can detect EEG abnormalities in epilepsy for each electrode. To validate the mEEGNet, we experimented with a practical dataset consisting of labeled EEGs from 29 cases (19 patients) of childhood absence epilepsy (CAE) and juvenile absence epilepsy (JAE), both of which are major childhood epilepsy syndromes. The performance is compared with that of conventional models using actual EEG data from epilepsy patients to show its effectiveness in terms of the F1-score, sensitivity, specificity, and AUC.
Our contributions can be summarized as follows. We have shown that a compact CNN model based on the EEGNet can identify abnormal patterns in the EEGs of CAE and JAE patients with higher accuracies than other, more complicated architectures. We tested several machine learning models with an original dataset consisting of 19 epilepsy patients in the validation. To our knowledge, the dataset used in the validation is the largest in the literature. This technology may enable epileptologists to extract "keyframes" from EEG recordings of up to 40 minutes, in which abnormal patterns appear infrequently.

Related Works
In engineering, seizure detection from the EEGs of epilepsy patients is an essential topic in epileptic signal analysis, and most published works have used public datasets such as the Bonn dataset [13] and the CHB-MIT corpus [14].
In particular, the Bonn dataset consists of 100 single-channel EEG segments of 23.6 s duration. Each segment is labeled normal, interictal, or ictal. This makes it an easy go-to dataset for computer simulations, and hundreds of papers have accordingly been published in the engineering context (for recent works, see, e.g., [15,3,16]). Although the Bonn dataset contributed to the development of classification algorithms, its data format is incomplete and far from a practical clinical format; thus, it is difficult to use in clinical settings.
On the other hand, the CHB-MIT corpus contains multi-electrode EEGs of 23 subjects with labels of seizure duration. This corpus contributed significantly to the development of neural network algorithms for seizure detection in EEGs using CNNs and recurrent neural networks [5,17,6,7,8]. However, labels are given only to ictal (seizure) durations. For the diagnosis of epilepsy, abnormal waveforms (such as spike waves) during the interictal state are also essential. To detect such abnormalities, some of the authors of the current paper proposed a two-dimensional CNN (ScalpNet) with convolutional kernels in both the time and electrode directions for detecting abnormal epileptic segments [11].
In terms of waveform detection, epileptic spike detection is a major problem.
Spike waves appear in the non-paroxysmal scalp EEGs of epileptic patients and are essential biomarkers in diagnosis. Their automatic detection could reduce the burden on specialists [10,9,18]. Johansen et al. [10] used a CNN to classify EEGs preprocessed with high-pass and notch filters to detect spike waves. Fukumori et al. [9] used a CNN to classify abnormal epileptic spikes and artifacts.
Abnormality can also be labeled for a whole EEG recording. Some works used datasets in which every EEG recording had one label: normal or abnormal. Van Leeuwen et al. [19] used their own clinical dataset consisting of 15-min EEG recordings labeled normal or abnormal. Every 1-min segment from a 15-min EEG was classified as normal or abnormal by a CNN, and the segment-wise features were aggregated to make a final decision for the input EEG. Yıldırım et al. [20] used the TUH abnormal corpus to test their CNN, which requires the first minute of the EEG recording to determine normal or abnormal. Both methods used balanced datasets in which the numbers of normal and abnormal labels are similar.
Our aim differs from the above-mentioned studies. We detect abnormal patterns from entire EEG recordings of absence epilepsy patients, because epileptologists often visually review long-term EEGs to find abnormal patterns (epileptiform discharges and slow waves) for the diagnosis. However, abnormal patterns in EEGs are intermittent, so most intervals are normal.

EEG Data
In this study, scalp EEG was measured in 19 patients with epilepsy at Juntendo University Nerima Hospital, Japan. Nine of the 19 patients were diagnosed with JAE and 10 with CAE. The dataset consisted of 29 cases, including multiple measurements from the same patient on different occasions.
The sex, age, and disease name are shown in Table 1. The EEG No. in Table 1 is a serial number, and the alphabetical letters were used as patient identifiers, as the dataset includes multiple measurements from the same patient.
For all 29 cases shown in Table 1, the sampling frequency was 500 Hz, and the number of electrodes was 16. A clinical expert visually inspected all cases to find paroxysmal discharges [21,22] and labeled the time durations and electrodes of abnormalities. These paroxysmal discharges were observed in both ictal and interictal EEGs and were sometimes induced by hyperventilation [23]. The EEG recording and analysis were approved by the Ethics Committee of Tokyo University of Agriculture and Technology and the Ethics Committee of Juntendo University Nerima Hospital in Tokyo, Japan. Written informed consent was obtained from the patients and caretakers.

mEEGNet: A Model for Detecting Epileptic Abnormalities in EEG
This paper proposes a CNN architecture called mEEGNet for detecting epileptic abnormal EEGs. The mEEGNet is inspired by EEGNet [12], which was originally proposed for brain-computer interfaces. The pipeline of the mEEGNet is illustrated in Fig. 1, and its details are listed in Table 2. The proposed mEEGNet has an input layer with a size of 16 × 500, corresponding to 16 electrodes (at international 10-20 system locations) and a one-second time window. The output layer also has 16 units. This means that the mEEGNet makes a positive or negative decision every second at each electrode. The mEEGNet consists of three convolutional layers and one fully connected layer.
The first Conv2D layer of the mEEGNet is a convolutional layer with a kernel only in the time direction. The purpose of the first layer is to extract features in the frequency domain of an EEG. Since the kernel size in the time direction is set to 1/2 of the sampling frequency, it can extract frequency features above 2 Hz. The kernel size will be further explored in the simulation, as described later in this paper. The second convolutional layer, DepthwiseConv2D, is a convolutional layer with kernels only in the direction of the electrodes. It aims to learn the relationship between electrodes by weighting and adding the features of each electrode obtained in the first convolutional layer. The third convolutional layer is SeparableConv2D. Separable convolution is a convolutional layer that combines depthwise convolution and pointwise convolution. A dropout layer follows to suppress overfitting.
The significant difference from the EEGNet [12] lies in the stage following SeparableConv2D: the mEEGNet has an equal number of corresponding electrodes in its inputs and outputs. The final layer uses a sigmoid activation function so that the output indicates the abnormality of each electrode.
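The pipeline above can be sketched in Keras as follows. The first temporal Conv2D has a single 1 × 250 kernel, matching the 250 parameters of the first layer reported in Table 2; the depth multiplier, the number of separable filters, the pooling sizes, and the activation function are illustrative assumptions rather than the exact published configuration.

```python
# Minimal sketch of the mEEGNet pipeline (layer hyperparameters other than
# the first kernel are assumptions, not the published values).
from tensorflow.keras import layers, models

def build_meegnet(n_electrodes=16, n_samples=500, dropout_rate=0.5):
    inp = layers.Input(shape=(n_electrodes, n_samples, 1))
    # 1) temporal convolution: one 1x250 kernel (half the 500 Hz sampling
    #    frequency), extracting frequency features down to ~2 Hz
    x = layers.Conv2D(1, (1, 250), padding='same', use_bias=False)(inp)
    # 2) depthwise convolution across the electrode axis, weighting and
    #    combining the per-electrode features from the first layer
    x = layers.DepthwiseConv2D((n_electrodes, 1), depth_multiplier=2,
                               use_bias=False)(x)
    x = layers.Activation('elu')(x)
    x = layers.AveragePooling2D((1, 4))(x)
    # 3) separable convolution = depthwise + pointwise convolution
    x = layers.SeparableConv2D(8, (1, 16), padding='same', use_bias=False)(x)
    x = layers.Activation('elu')(x)
    x = layers.AveragePooling2D((1, 8))(x)
    x = layers.Dropout(dropout_rate)(x)  # suppress overfitting
    x = layers.Flatten()(x)
    # one sigmoid output per electrode: abnormal/normal every second
    out = layers.Dense(n_electrodes, activation='sigmoid')(x)
    return models.Model(inp, out)

model = build_meegnet()
```

With this input/output layout, a one-second, 16-electrode window maps to 16 sigmoid scores, one per electrode, which is the key structural difference from the original EEGNet.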

Weighted Loss Functions
The binary cross-entropy (BCE) is a widely used loss function for training neural networks with binary outputs. The BCE loss is given as

$\mathrm{BCE}(p_t) = -\log(p_t),$

where $p_t$ denotes the model's estimated probability (denoted by $p$) for one class and $1 - p$ for the other class. One of the significant challenges in anomaly detection is the imbalance between normal and abnormal labels. When a dataset without an equal number of samples per label is used to train a classifier (detector), the result tends to be an unbalanced classifier with a very high detection rate for the majority class and a low detection rate for the minority class [24]. To address this problem, learning with weighted loss functions such as the focal loss (FL) [25] and the class-balanced focal loss (CBF) [26] has been proposed.
This paper uses not only the BCE but also the FL, defined as

$\mathrm{FL}(p_t) = -\alpha (1 - p_t)^{\gamma} \log(p_t),$

and the CBF, defined as

$\mathrm{CBF}(p_t) = -\frac{1 - \beta}{1 - \beta^{n}} (1 - p_t)^{\gamma} \log(p_t),$

to check their impact on the detection performance of the mEEGNet. In the above loss functions, $\alpha$, $\gamma$, and $\beta$ are hyperparameters, and $n$ is the number of training samples.

Figure 2: The architecture of ScalpNet [11]
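The three loss functions can be sketched for a single prediction $p_t$ as follows; the default hyperparameter values below are illustrative assumptions, not the values used in the experiments.

```python
# Sketch of the three loss functions compared in this paper, evaluated for
# one prediction p_t (probability assigned to the ground-truth class).
import numpy as np

def bce(p_t):
    return -np.log(p_t)

def focal(p_t, alpha=0.25, gamma=2.0):
    # (1 - p_t)^gamma down-weights well-classified (easy) samples
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)

def cb_focal(p_t, n, beta=0.999, gamma=2.0):
    # class-balanced weight (1 - beta) / (1 - beta^n), where n is the
    # number of training samples of the ground-truth class
    weight = (1.0 - beta) / (1.0 - beta ** n)
    return -weight * (1.0 - p_t) ** gamma * np.log(p_t)
```

For an easy sample (e.g. $p_t = 0.9$) the focal term $(1 - p_t)^{\gamma}$ shrinks the loss sharply relative to the BCE, which is how the FL shifts training effort toward the rare abnormal class.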

ScalpNet
ScalpNet is a simple CNN architecture proposed by Sakai et al. [11], and its network structure is illustrated in Fig. 2. The main feature of this network is that the 16 electrodes of the international 10-20 system are rearranged into a two-dimensional 5 × 4 array under the assumption that the distances between the electrodes are approximately equal.

Evaluation
The implementations of neural network models and SVM were written in Keras [28] with a Tensorflow backend [29] and in cuML [30], respectively. We ran our experiments with Amazon Linux 2 on Amazon Web Service (AWS) EC2 P3 instances.
This paper examines the model's performance through two types of validation: 5-fold cross-validation and leave-one-case-out validation. In the 5-fold cross-validation, all the EEGs from the 29 cases shown in Table 1 were randomly divided into five blocks; one of the five blocks was the test data, and the remaining four blocks were the training data. The cross-validation was thus conducted five times independently with randomized divisions, for 25 training and testing sessions in total. In the leave-one-case-out validation, one of the 29 cases shown in Table 1 was the test data, and the remaining 28 cases were the training data.
As shown in Table 2, the first layer of the mEEGNet (Conv2D) has 250 parameters, the largest among the layers. Thus, reducing the number of parameters in this layer could contribute to simplifying the mEEGNet. We evaluated the performance of the mEEGNet with kernel sizes of 10, 50, 125, and 250 in the first layer. For the AUC and the F1-value, the Friedman test [31] was performed with the significance level set at p < 0.05. The Nemenyi test [32] was performed as a post hoc test, and combinations of models with significant differences were also investigated.
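The leave-one-case-out scheme can be sketched as a simple split generator; the case identifiers below are illustrative indices, not the actual EEG numbers of Table 1.

```python
# Minimal sketch of the leave-one-case-out validation over the 29 cases:
# each case is the test set once, and the remaining 28 cases form the
# training set.
def leave_one_case_out(cases):
    """Yield (training_cases, test_case) pairs, one per case."""
    for i, test_case in enumerate(cases):
        train_cases = cases[:i] + cases[i + 1:]
        yield train_cases, test_case

cases = list(range(29))          # illustrative case indices
splits = list(leave_one_case_out(cases))
```

Splitting by case (rather than by segment) ensures that no EEG segment from the test patient's recording leaks into the training data.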

AUC, F1-value, Sensitivity, and Specificity
Tables 3 and 4 list the averaged scores of the 5-fold cross-validation and the leave-one-case-out methods, respectively. In Table 3, all the values are the grand average with the standard deviation across the cross-validation repeated five times (N = 25). Table 4
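The threshold-based scores among these metrics can be computed from a confusion matrix as sketched below (the AUC additionally requires the continuous sigmoid outputs and is omitted here); the example labels are purely illustrative.

```python
# Sketch of sensitivity, specificity, and F1-value from binary labels
# (1 = abnormal, 0 = normal) and binary predictions.
import numpy as np

def detection_scores(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)   # recall on the abnormal class
    specificity = tn / (tn + fp)   # recall on the normal class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

sens, spec, f1 = detection_scores([1, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1])
```

On imbalanced data like ours, specificity alone is uninformative (predicting "normal" everywhere already scores highly), which is why sensitivity and the F1-value are reported alongside it.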

Kernel Size of the First Layer
The results of the 5-fold cross-validation with kernel sizes of 10, 50, 125, and 250 in the temporal direction of the first mEEGNet layer are presented in Fig. 3.

Discussion
We have assessed evaluation scores (AUC, F1-value, sensitivity, and specificity) for two-class classification models with different loss functions. The experimental results showed that the proposed mEEGNet with the binary cross-entropy provided the best performance on the two criteria. Additionally, kernels that were too short (size of 10) led to significant deterioration in the classification performance.

Weighted Loss Functions
The focal loss and the class-balanced focal (CBFocal) loss were proposed for datasets with imbalanced labels [33,26]. Cui et al. [26] proposed to characterize a dataset by its so-called imbalance factor, defined as the number of training samples in the largest class divided by that in the smallest. The CBFocal loss showed better classification accuracy than the BCE and focal losses for imbalance factors ranging from 1 to 200. Although our dataset falls into this range (the imbalance factor in terms of time is 36.006; see Table 1), there was almost no difference in performance among the three loss functions compared. Thus, the CBFocal loss was not as effective as expected. This might be due to the selection of its parameters, which may need fine-tuning.
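The imbalance factor of Cui et al. [26] is a one-line computation; the class counts below are illustrative, not the actual per-class durations of our dataset.

```python
# Imbalance factor (Cui et al. [26]): samples in the largest class divided
# by samples in the smallest class.
def imbalance_factor(class_counts):
    return max(class_counts) / min(class_counts)

# illustrative counts, e.g. seconds of normal vs. abnormal EEG
factor = imbalance_factor([3600, 100])
```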

Effect of the Kernel Size
From Fig. 3, it can be confirmed that the performance of the mEEGNet with a kernel size of 10 is significantly lower than that of the mEEGNet with larger kernel sizes in both the 5-fold cross-validation and the leave-one-case-out validation. This result may be explained by the frequency bandwidth that can be extracted for a given kernel size. The EEGs used in this paper were all sampled at 500 Hz. Therefore, when the kernel size is 250, features at frequencies of 2 Hz or higher can be extracted; when the kernel size is 10, only features above 50 Hz can be extracted. In EEG analysis, the frequency band below 40 Hz is often used [34] because essential features often lie in this low-frequency band. This could explain the significant performance difference between the mEEGNet with a first-layer kernel size of 10 and those with larger kernel sizes.
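The relationship between kernel size and the lowest extractable frequency follows from the kernel's window length: a full cycle of frequency f fits in the kernel only if kernel_size / fs ≥ 1 / f, i.e. f ≥ fs / kernel_size.

```python
# Lowest frequency (Hz) whose full cycle fits within the temporal kernel.
def lowest_extractable_frequency(fs_hz, kernel_size):
    return fs_hz / kernel_size

# the four kernel sizes evaluated in this paper, at fs = 500 Hz
limits = {k: lowest_extractable_frequency(500, k) for k in (10, 50, 125, 250)}
```

At 500 Hz, a kernel of 250 samples reaches down to 2 Hz, whereas a kernel of 10 samples only covers frequencies above 50 Hz, which excludes the sub-40 Hz band where the essential EEG features lie.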

Advantage of mEEGNet in Terms of Number of Parameters
Results of the AUC, F1-value, and sensitivity tests showed that the proposed mEEGNet achieved the best performance, although, as observed in Tables 3 and 4, ScalpNet and the Zhou-freq model performed similarly to the mEEGNet.
Nevertheless, we conclude that the mEEGNet is the best in overall performance because its number of parameters is by far the smallest, as summarized in Table 5. The number of parameters of the mEEGNet is about one-third that of Zhou et al.'s model and about 1/1,500 that of ScalpNet.

Characteristics of Abnormality Detection
It is also essential to discuss the abnormality-detection behavior of the proposed mEEGNet. To evaluate this behavior, we consider some representative cases. We focus on sensitivity because the specificity values are nearly one for all models, and sensitivity is an essential measure in abnormality-detection applications. Figure 4 is a scatter plot of the sensitivity values of the mEEGNet (BCE) and ScalpNet (BCE) for each case in the leave-one-case-out validation. The figure shows that the sensitivity is strongly correlated between the two models (r = 0.83, p < 10⁻⁶), whereas three cases (13G, 16G, and 27R) showed a relatively notable difference in sensitivity. One possible explanation is that the mEEGNet uses larger kernels (see Table 2) for time convolutions, which may be more effective in detecting temporal abnormalities in EEGs.

Conclusion
This paper has proposed using the mEEGNet to detect abnormalities in EEGs of absence epilepsy (CAE and JAE). The structure of the mEEGNet is a variant of the EEGNet, which was designed in the context of brain-computer