Ultrafast review of ambulatory EEGs with deep learning

• We ran a deep neural network on 100 ambulatory EEGs, and an expert reviewed each EEG based only on the flagged epochs.
• The expert's conclusion regarding the presence of IEDs matched the original EEG report in 97% of the recordings.
• Our network can reduce the time spent on visual analysis in the clinic by a factor of 50–75.


Introduction
Electroencephalograms (EEGs) are important tools in the diagnosis of epilepsy (Smith, 2005; Tatum et al., 2018). Interictal epileptiform discharges (IEDs) are biomarkers of epilepsy that typically manifest as spikes or sharp waves followed by a slow wave (Pillai and Sperling, 2006; Smith, 2005). IEDs are present in about 30–50% of routine EEGs of epilepsy patients, with the likelihood rising to 80% when the recording includes a period of sleep (Halford, 2009). Ambulatory EEGs, usually recorded at the patient's home or in an epilepsy center, include several hours of data during wakefulness and sleep (Benbadis, 2015), so IEDs are very likely to be captured if the patient has epilepsy.
The gold standard for IED detection is still visual analysis of EEGs by experts, who need extensive training for this task. Furthermore, there is significant intra- and inter-rater variability in EEG scoring, and review times in clinical practice are long (Benbadis and Lin, 2008; Benbadis and Tatum, 2003). Since review time is proportional to the duration of the signal being reviewed, this burden is vastly greater for ambulatory EEGs than for routine recordings. Experts need approximately 8.3 minutes to review a routine EEG (Brogger et al., 2018), and 2–3 h for an ambulatory recording. Automating IED detection would reduce the time spent on visual analysis, decreasing the resources dedicated to this task and the overall burden on clinicians.

Clinical Neurophysiology (journal homepage: www.elsevier.com/locate/clinph)
In recent years, deep learning methods have been widely applied in the medical field. Besides achieving human-level performance in skin cancer detection (Esteva et al., 2017) and mammography screening (Shen et al., 2019), artificial neural networks have also tackled tasks that humans cannot perform, such as predicting atrial fibrillation from the ECG (Attia et al., 2019) or distinguishing sex from EEG patterns (Van Putten et al., 2018). Other applications in EEG analysis include classifying signals as normal or abnormal (Van Leeuwen et al., 2019) and, specifically in epilepsy, seizure detection (Saminu et al., 2022), seizure prediction (Baud et al., 2022) and IED detection (Nhu et al., 2022). For IED detection, deep learning approaches have surpassed the performance of mimetic methods, template-based approaches and even traditional machine learning methods (da Silva Lourenço et al., 2021; Nhu et al., 2022).
While it is not trivial to compare IED detection studies given the variability of datasets and outcome measures (da Silva Lourenço et al., 2021; Halford, 2009; Nhu et al., 2022), recent papers have reported AUCs of up to 0.99 (Thangavel et al., 2021). In a recent study, two neural networks trained for IED detection were tested both in a fully automated and in a hybrid approach (Kural et al., 2022). In the fully automated approach, SpikeNet (Thomas et al., 2020) yielded 67% sensitivity at 63% specificity, and Encevis (Fürbass et al., 2020) yielded 97% sensitivity at 17% specificity. In the hybrid approach, experts examined the clustered detections of each network and classified them as IEDs or non-IEDs. With this method, the sensitivities of both networks decreased (albeit not significantly), and the specificities increased to 97% for SpikeNet and 93% for Encevis. The authors reported that, on routine recordings, the hybrid approach reduced the time burden by 26–91% compared with full EEG review.
Knowing how much time is saved by automating IED detection is key to determining the clinical impact of these algorithms. Since the burden of reviewing ambulatory EEGs is much larger than that of routine EEGs, the time-saving potential for these recordings is expected to be much higher. One study reported that hyperclustering of spikes led to an eight-fold decrease in review time for 24 h recordings (Scherg et al., 2012). In that study, the review time included not only the review of the clusters but also visual inspection of a 5-minute segment of each hour of the recording, which was necessary to reach satisfactory conclusions regarding the presence of IEDs.
To the best of our knowledge, no study has reported the reduction in the time burden of IED detection on ambulatory EEGs achieved with deep learning methods. Here, we apply a neural network trained on a mix of routine and ambulatory data and validated on ambulatory EEGs (da Silva Lourenço et al., 2023), and we quantify the reduction in time spent on visual analysis when reviewing only the output of the algorithm (i.e. only the epochs flagged as potential IEDs) compared with traditional visual analysis of the whole EEG.

Deep learning algorithm
A modified VGG C convolutional neural network (Simonyan and Zisserman, 2014) was implemented in Python 3.8 using TensorFlow 2.2.0 on a CUDA-enabled NVIDIA GPU (GTX 1080), running on Rocky Linux. Convolutional neural networks contain convolutional layers whose filters are applied across the input to extract information, allowing the network to learn its own representation of the feature space. This architecture includes five blocks of zero-padding, convolutional and max pooling layers, followed by flattening, dropout and fully connected layers. To adapt the architecture to EEG data, the input dimensions were changed to [number of epochs × 250 × 18], where 250 corresponds to 2 seconds of data sampled at 125 Hz and 18 is the number of channels. In the final dense layer, we used two nodes instead of the 1000 used for the ImageNet classes in the original VGG. We used the Adam optimizer for stochastic optimization, with a learning rate of 2 × 10⁻⁵, β1 = 0.91, β2 = 0.999 and ε = 10⁻⁸. Training used a sparse categorical cross-entropy loss and a batch size of 64.
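To make the optimizer and loss settings concrete, the following pure-Python sketch shows one textbook Adam update (Kingma and Ba) with the hyperparameters stated above, together with a two-class sparse categorical cross-entropy. It is illustrative only: it is not the TensorFlow code used for training (TensorFlow's Adam may differ in details such as where ε enters the update), and the function names are hypothetical.

```python
import math

# Optimizer settings reported in the text.
LR, BETA1, BETA2, EPS = 2e-5, 0.91, 0.999, 1e-8


def adam_step(params, grads, m, v, t, lr=LR, b1=BETA1, b2=BETA2, eps=EPS):
    """Apply one textbook Adam update to a list of scalar parameters.

    `t` is the 1-based step counter; `m` and `v` are the running first- and
    second-moment estimates, returned updated along with the new parameters.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = b1 * m_i + (1 - b1) * g        # moving average of gradients
        v_i = b2 * v_i + (1 - b2) * g * g    # moving average of squared gradients
        m_hat = m_i / (1 - b1 ** t)          # bias-corrected first moment
        v_hat = v_i / (1 - b2 ** t)          # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def sparse_categorical_cross_entropy(probs, labels):
    """Mean negative log-probability of the integer class labels.

    `probs` holds one [p(non-IED), p(IED)] pair per epoch (the two output
    nodes, assumed ordered this way); `labels` holds 0 or 1 per epoch.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)
```

For example, `adam_step([0.0], [0.5], [0.0], [0.0], t=1)` moves the parameter by almost exactly the learning rate (2 × 10⁻⁵), because after bias correction the first step is close to lr × sign(gradient).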
The algorithm was trained on 125 EEG recordings with IEDs and 116 normal EEGs, randomly selected from the digital database of Medisch Spectrum Twente, the Netherlands. All EEGs were obtained as part of routine care and anonymized before analysis. This dataset included routine and ambulatory EEG data, containing a total of 5482 IEDs. The dataset was subsequently augmented by shifting the acquisition window (by 0.5 s, 1 s and 1.5 s) and re-montaging the data (in the longitudinal bipolar, source derivation and common average montages), yielding 63,180 IEDs. The data were then filtered in the 0.5–30 Hz band, downsampled to 125 Hz and split into 2 s samples. These steps were implemented in MATLAB R2021b (The MathWorks, Inc., Natick, MA).
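Two of the re-montaging operations used for augmentation can be sketched as follows. This is a hypothetical Python illustration (the authors used MATLAB) that assumes referential 10–20 recordings stored as a dict from electrode name to a list of samples. The 18 bipolar pairs are the common "double banana" chains and are an assumption, since the text does not list them; the source derivation montage is omitted because it depends on a neighbourhood definition not given here.

```python
# Assumed 18-channel longitudinal bipolar ("double banana") pairs.
DOUBLE_BANANA = [
    ("Fp1", "F7"), ("F7", "T3"), ("T3", "T5"), ("T5", "O1"),
    ("Fp2", "F8"), ("F8", "T4"), ("T4", "T6"), ("T6", "O2"),
    ("Fp1", "F3"), ("F3", "C3"), ("C3", "P3"), ("P3", "O1"),
    ("Fp2", "F4"), ("F4", "C4"), ("C4", "P4"), ("P4", "O2"),
    ("Fz", "Cz"), ("Cz", "Pz"),
]


def longitudinal_bipolar(ref):
    """Build bipolar derivations (electrode A minus electrode B) per sample."""
    return {f"{a}-{b}": [x - y for x, y in zip(ref[a], ref[b])]
            for a, b in DOUBLE_BANANA}


def common_average(ref):
    """Re-reference each electrode to the mean of all electrodes per sample."""
    names = list(ref)
    n_samples = len(ref[names[0]])
    avg = [sum(ref[n][i] for n in names) / len(names) for i in range(n_samples)]
    return {n: [x - a for x, a in zip(ref[n], avg)] for n in names}
```

By construction, the common average montage sums to zero across electrodes at every sample, which is a quick sanity check when re-montaging.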
Training was performed using 2 s epochs of 18-channel EEG data (18 × 250 matrices) as input to the network. The output of the network was a probability, ranging from 0 (no IED in the sample) to 1 (the sample definitely contains an IED), assigned to each non-overlapping 2 s epoch of the EEG recording. More technical details regarding the architecture and implementation of the network can be found in da Silva Lourenço et al. (2023).
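The epoching step described above can be sketched in a few lines. This is a hypothetical helper, assuming the recording is held as a list of per-channel sample lists already filtered and downsampled to 125 Hz; trailing samples that do not fill a complete 2 s epoch are simply dropped, which is an assumption about how the end of a recording is handled.

```python
FS = 125                    # sampling rate after downsampling (Hz)
EPOCH_S = 2                 # epoch length (s)
EPOCH_LEN = FS * EPOCH_S    # 250 samples per epoch


def split_into_epochs(channels, epoch_len=EPOCH_LEN):
    """Cut a multichannel recording into non-overlapping epochs.

    Returns a list of epochs, each a channels x epoch_len matrix
    (e.g. 18 x 250 for 18 bipolar derivations).
    """
    n_samples = min(len(c) for c in channels)
    n_epochs = n_samples // epoch_len
    return [[c[k * epoch_len:(k + 1) * epoch_len] for c in channels]
            for k in range(n_epochs)]
```

At this rate, a recording of average length (20.6 h, i.e. 74,160 s) yields 37,080 epochs. Each 18 × 250 epoch would still need to be transposed to match the [number of epochs × 250 × 18] input layout of the network.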

Time reduction test
We randomly selected 100 ambulatory EEG recordings from patients aged 8 years and older, together with the corresponding report conclusions, from the digital database of Medisch Spectrum Twente, the Netherlands. All EEGs were fully anonymized and obtained as part of routine clinical care; the ethical review board therefore waived the need for ethical approval. According to the reports, this dataset included 42 EEGs that contained IEDs, 25 EEGs that were abnormal but did not contain IEDs, and 33 normal EEGs. The average recording duration was 20.6 hours, patient age ranged from 8 to 88 years, and the sex distribution was close to 50/50. As summarized in Table 1, focal slowing was the most common abnormality in the abnormal non-epileptiform EEGs. The abnormalities in this group included intermittent abnormalities, isolated delta waves, polymorphic delta and background asymmetry, among others. For the EEGs with IEDs, the most common types of epilepsy were fronto-temporal and frontal epilepsy (cf. Table 1).
The filtering and downsampling steps applied to the training data were also applied to the test EEGs. The recordings, in the longitudinal bipolar montage, were split into 2 s non-overlapping epochs and processed by the neural network. The output probabilities were thresholded at 0.99, and epochs with probabilities above the threshold were flagged in the EEG viewer (NeuroCenter EEG). The entire pipeline, from pre-processing to writing the annotations into the EEG, took approximately 20 minutes per 20 h of EEG signal.
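The thresholding step can be illustrated with a minimal sketch that turns per-epoch probabilities into annotation candidates. Converting these tuples into NeuroCenter EEG annotations is not shown, because that format is not described here; the function name is hypothetical.

```python
THRESHOLD = 0.99   # probability threshold from the text
EPOCH_S = 2.0      # non-overlapping epoch length (s)


def flag_epochs(probabilities, threshold=THRESHOLD, epoch_s=EPOCH_S):
    """Return (onset_s, duration_s, probability) for epochs above threshold.

    probabilities[k] is the network's IED probability for epoch k, which
    starts k * epoch_s seconds into the recording. Only probabilities
    strictly higher than the threshold are flagged.
    """
    return [(k * epoch_s, epoch_s, p)
            for k, p in enumerate(probabilities) if p > threshold]
```

For example, `flag_epochs([0.2, 0.995, 0.99, 0.999])` flags the epochs starting at 2 s and 6 s; an epoch at exactly 0.99 is not flagged.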
The 100 processed EEGs were then provided to an expert, who used NeuroCenter EEG to review each recording by scrolling through the epochs flagged by the network. No clinical information about the recordings was provided, and the expert was not told whether an EEG came from a healthy control or a patient. The expert could change the montage, filter settings and scaling of the EEG. Based only on the flagged epochs, the expert had to conclude whether the recording contained IEDs. Any other abnormalities found during the review, as well as any uncertainty, could be reported as comments. Each review was timed.

Performance assessment
The expert's conclusion for each recording was compared with the category extracted from the original report. Normal EEGs and EEGs with non-epileptiform abnormalities were grouped as non-epileptic (i.e. the negative class), and EEGs with IEDs constituted the positive class. Agreement rate, sensitivity and specificity were calculated.
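The recording-level metrics follow directly from a 2 × 2 confusion matrix with EEGs containing IEDs as the positive class. A minimal sketch, using the counts given in the Results (39 of 42 EEGs with IEDs detected, and all 58 EEGs without IEDs classified as such); the function name is hypothetical:

```python
def recording_level_metrics(tp, fn, tn, fp):
    """Agreement rate, sensitivity and specificity at the recording level."""
    total = tp + fn + tn + fp
    return {
        "agreement": (tp + tn) / total,      # fraction matching the report
        "sensitivity": tp / (tp + fn),       # EEGs with IEDs recognized
        "specificity": tn / (tn + fp),       # EEGs without IEDs recognized
    }


metrics = recording_level_metrics(tp=39, fn=3, tn=58, fp=0)
```

This gives an agreement of 0.97, a sensitivity of 39/42 ≈ 0.93 and a specificity of 1.0.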

Results
The expert took a total of approximately 4 h to review the roughly 2000 h of EEG in the dataset, looking only at the epochs flagged by the network. Examples of epochs flagged by the algorithm (true and false positives), as well as an IED missed by the network, are shown in Fig. 1.
Fifty of the 100 EEGs took less than 2 minutes to review, and 97 recordings were reviewed in less than 9 minutes (Fig. 2). On average, each recording took 2.4 minutes to review, and the time distribution depended on the EEG category (Table 2). Normal EEGs and abnormal EEGs without IEDs took, on average, less time to review than EEGs with IEDs, and the maximum review time of normal EEGs was less than half that of the other two categories.
Across the 100 recordings, the network made on average 7.1 detections/h, or 141.5 detections per EEG. The average number of detections per hour was more than four times higher in EEGs with IEDs than in the other two categories, and the maximum number of detections was also higher in these recordings (Table 2).
The expert's conclusion agreed with the original EEG report in 97% of the recordings regarding whether the EEG contained IEDs (cf. Table 3).
The expert detected IEDs in 39 of the 42 recordings that contained IEDs according to the original report, corresponding to 93% sensitivity at 100% specificity. In 10 of those recordings, the IEDs were only suspected or very few in number, which led to doubts that the expert expressed in the conclusion. These doubts matched the original report, as shown in Table 3. Of the 3 EEGs with IEDs that the expert missed when reviewing the network's output, one included a seizure, which was not flagged by the algorithm because it was not an interictal discharge. The network made only a small number of detections in this EEG, which corresponded to right temporal epilepsy, so no IEDs were identified during the review (which was based only on the flagged epochs). In the two other recordings, from right central and left temporal epilepsy, the network flagged more than 50 epochs, but these were not sufficient for the expert to identify convincing IEDs.
Of the EEGs that did not contain IEDs (normal and abnormal non-epileptiform), all the normal EEGs were identified as such. Some patterns, such as wicket waves, and abnormalities, such as slowing and polymorphic delta, were also detected based only on the flagged epochs. As the final classification only considered whether the recordings contained IEDs, these EEGs were included in the negative class (i.e. EEGs without IEDs). The remarks about these abnormalities were kept as comments and did not affect the results of the study (cf. Table 3).

Discussion
Automating IED detection in EEGs can reduce the time burden associated with visual analysis.We report on how much time can be saved in clinical practice by applying an artificial neural network to ambulatory EEG recordings and reviewing only the flagged epochs to conclude whether the EEG contains IEDs.
An expert reviewed 100 EEGs, equivalent to approximately 2000 h of recordings, in approximately 4 h (cf. Fig. 2). In the clinic, visual analysis of an ambulatory recording takes on average 2–3 h, which means that the workload can be reduced from 200–300 h to 4 h. This represents a 50- to 75-fold decrease in analysis time.
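The fold-reduction figures above follow from simple arithmetic on the reported numbers. A minimal sketch, assuming the 2–3 h manual review time applies uniformly to every recording in the dataset:

```python
N_RECORDINGS = 100
MEAN_DURATION_H = 20.6          # average recording length
EXPERT_REVIEW_H = 4.0           # total time reviewing flagged epochs only
MANUAL_REVIEW_H = (2.0, 3.0)    # full visual analysis per ambulatory EEG

# Total amount of EEG signal covered by the review (about 2060 h).
total_signal_h = N_RECORDINGS * MEAN_DURATION_H

# Workload if every recording were analysed visually in full (200 h and 300 h).
manual_workload_h = [N_RECORDINGS * t for t in MANUAL_REVIEW_H]

# Fold reduction when only flagged epochs are reviewed (50x and 75x).
fold = [w / EXPERT_REVIEW_H for w in manual_workload_h]
```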
The conclusion of the review process agreed with the original EEG report in 97 of the 100 recordings. All normal EEGs and abnormal EEGs without IEDs were classified as such by the expert, and 39 of the 42 EEGs with IEDs were correctly identified. Ten of the reports of EEGs with IEDs mentioned IEDs in doubt or suggested further studies, and the expert made the same comments upon reviewing the flagged epochs (cf. Table 3). The expert also commented on the presence of other abnormalities, such as slowing and wicket waves, which the network mistakenly flagged in some recordings (cf. Table 3).
There were 3 EEGs in which the expert did not detect IEDs while reviewing the flagged epochs. For the EEG corresponding to right temporal epilepsy, the report based its conclusion on a potential seizure that was not detected by the network, as ictal patterns generally do not correspond to IEDs. The network flagged only 7 epochs in this recording, which appeared insufficient to lead to a positive conclusion. In the two other false negatives, the network flagged more than 50 epochs, none of which corresponded to clear IEDs according to the expert.
The number of epochs flagged by the network was, on average, 4 times higher in EEGs with IEDs than in normal or abnormal EEGs without IEDs (cf. Table 2), resulting in an average review time of 3 minutes for EEGs with IEDs. Normal EEGs had an average review time of only 1.5 minutes, half the average time spent reviewing EEGs with IEDs. This means that, for EEGs without IEDs in particular, the decrease in analysis time exceeds the average 50- to 75-fold reduction.
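The per-category effect can be made explicit by dividing the 2–3 h of full visual analysis by the average review times reported above. This sketch assumes, as a simplification, that full visual analysis takes 2–3 h regardless of EEG category; the helper name is hypothetical.

```python
MANUAL_REVIEW_MIN = (120.0, 180.0)   # 2-3 h of full visual analysis
MEAN_FLAGGED_REVIEW_MIN = {"normal": 1.5, "with IEDs": 3.0, "all": 2.4}


def fold_reduction(review_min, manual=MANUAL_REVIEW_MIN):
    """Range of speed-up factors for a given average review time."""
    return tuple(m / review_min for m in manual)


for category, minutes in MEAN_FLAGGED_REVIEW_MIN.items():
    low, high = fold_reduction(minutes)
    print(f"{category}: {low:.0f}- to {high:.0f}-fold")
```

Under this assumption, normal EEGs are reviewed 80 to 120 times faster, EEGs with IEDs 40 to 60 times faster, and the overall average of 2.4 minutes reproduces the 50- to 75-fold figure.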
The time reduction achieved with our neural network exceeds those reported for other algorithms. A recent study reported a 26–91% reduction on routine recordings using a hybrid approach in which experts reviewed the IEDs flagged by deep learning algorithms, clustered into IED types (Kural et al., 2022). That study included three methods (two deep learning approaches, SpikeNet (Thomas et al., 2020) and Encevis (Fürbass et al., 2020), as well as Persyst (Scheuer et al., 2017)), but it did not compare time reduction between algorithms. We have previously shown that algorithm performance is not necessarily maintained between routine and ambulatory data, so it is unclear how these algorithms, and their reductions in time burden, would compare on ambulatory EEGs. Another study, using the BESA algorithm based on hyperclustering of IEDs, achieved a review that was 8 times faster than traditional visual analysis of the whole EEG in 24 h recordings (Scherg et al., 2012). The review time in that study included not only the analysis of the hyperclusters but also visual analysis of five minutes of each hour of the recording. In a recent comparison study, BESA scored lower than Persyst and Encevis in IED detection performance (Reus et al., 2022), which may explain the need to review a longer portion of the EEG rather than only the flagged IEDs.
Our aim was to assess the presence or absence of IEDs in EEG recordings based on automated detections, a task with high clinical relevance when developing an assistive tool to support decisions and streamline the visual analysis of ambulatory recordings. This differs from the focus of previous works (Scheuer et al., 2017; da Silva Lourenço et al., 2021, 2023; Fürbass et al., 2020; Scherg et al., 2012), which aimed to optimize performance for individual IED detection. While an automated approach must show sufficient performance for single IEDs (as several recently published approaches do), in the clinic these methods will likely be applied to full EEG recordings. Thus, assessing recording-level performance, and comparing the conclusion based on automated detections with the medical report resulting from traditional visual analysis, is paramount for the adoption of these algorithms in the clinic. We show that our approach leads to fast and reliable review of IEDs in ambulatory EEGs, supporting its potential for clinical use.
We have previously shown that algorithm performance is not maintained across data types (i.e. a network trained on ambulatory EEGs performs worse on routine recordings, whereas a network trained on routine EEGs performs worse on ambulatory recordings), so we propose using this network specifically for ambulatory recordings (da Silva Lourenço et al., 2023).
Clustering as a post-processing step could further reduce the number of false positives to be reviewed by grouping similar potential spikes, and similar potential artifacts, together. Furthermore, it can help identify the source of the spikes by sorting them spatiotemporally. Adding this step to our current approach may therefore improve it further in the future.
Our study has a few limitations. Our network was trained on data from patients aged 8 years and older, so the same level of performance is not expected on other types of data, such as EEGs from younger children, where the different types of patterns present will likely lead to more false positive detections. Moreover, as the algorithm was trained on data from a single clinical center, the same level of performance is not guaranteed on data from other centers, where the patient population or the physical characteristics of the acquired signal might differ. Finally, all recordings were reviewed by a single expert (MvP).

Conclusion
We show that an artificial neural network trained for IED detection can aid clinicians by making EEG review 50 to 75 times faster. The algorithm processes an ambulatory EEG in approximately 20 minutes, reliably identifying potential IEDs that lead to accurate conclusions upon review by an expert.

Declaration of Competing Interest
M.J.A.M. van Putten is co-founder of Clinical Science Systems, a supplier of EEG systems for Medisch Spectrum Twente. Clinical Science Systems offered no funding and was not involved in the design, execution, analysis, interpretation or publication of the study. The remaining authors have no conflicts of interest.

Fig. 1. Examples of epochs flagged as interictal epileptiform discharges (IEDs) by the neural network. The samples in the top row are IEDs detected by the network (true positives): the left panel shows a left temporal IED with a maximum at F7-T3, the middle panel a generalized discharge, and the right panel an IED in F7-T3/F3-C3. The bottom row shows two false positives (i.e. epochs incorrectly identified as IEDs; left and center panels), containing artifacts and eye movement artifacts, respectively, and a false negative (a missed IED around F7/T3) in the right panel.

Table 1
Characteristics of the patients included in the study, including the type of abnormality for the abnormal electroencephalograms (EEGs) that did not contain interictal epileptiform discharges (IEDs) and the type of epilepsy for the EEGs with IEDs.

Table 2
Average and maximum time (in minutes) spent reviewing individual electroencephalograms (EEGs) of each category (normal, abnormal without interictal epileptiform discharges (IEDs), and EEGs with IEDs), based on the epochs flagged by the network. The average and maximum (max) number of epochs flagged by the network per hour of EEG recording is also shown.

Table 3
Comparison of the original electroencephalogram (EEG) report with the comments of the expert and the subsequent conclusion based only on epochs flagged by the network. The conclusion concerned only the presence of interictal epileptiform discharges (IEDs) in the recording. The EEG reports included in this table are examples taken from abnormal EEGs without IEDs (first two reports), normal EEGs (third report), and EEGs with IEDs with and without doubts (fourth and fifth reports, respectively).
a considerable asymmetry to the detriment of the left hemisphere, where polymorphic delta activity is almost continuously present. No epileptiform abnormalities. intermittent slow activity. In light sleep, occurrence of SSWs, suspected of epileptiform abnormalities. Slowing, several IEDs in doubt IEDs Normal ground pattern, with occasional epileptiform abnormalities frontotemporal left during light sleep. Frontotemporal left temporal discharges IEDs