Personalized seizure detection using logistic regression machine learning based on a wearable ECG-monitoring device.

Purpose: Wearable devices for automated detection of focal epileptic seizures are needed to alert patients and caregivers and to optimize medical treatment. Heart rate variability (HRV)-based seizure detection devices have shown good detection sensitivity; however, false alarm rates (FAR) remain too high. Methods: In this phase-2 study we aimed to decrease the FAR by using patient-adaptive logistic regression machine learning (LRML) to improve the performance of a previously published HRV-based seizure detection algorithm. ECG data were prospectively collected using a dedicated wearable electrocardiogram device during long-term video-EEG monitoring. Sixty-two patients had 174 seizures during 4,614 h of recording. The dataset was divided chronologically into training, cross-validation, and test sets in order to avoid overfitting. Patients with a >50 beats/min change in heart rate during their first recorded seizure were selected as responders. We compared 18 LRML settings to find the optimal algorithm. Results: The patient-adaptive LRML classifier, combined with training the initial decision boundary on responders only, was superior both to the generic approach and to including non-responders in the training data. Using the optimal LRML setting in responders of the test dataset yielded a sensitivity of 78.2% and a FAR of 0.62/24 h. The FAR was reduced by 31% compared to the previous method while upholding similar sensitivity. Conclusion: The novel, patient-adaptive LRML seizure detection algorithm outperformed both the generic approach and the previously published patient-tailored method. The proposed method can be implemented in a wearable online HRV-based seizure detection system, alerting patients and caregivers of seizures and improving seizure counts, which may help optimize patient treatment.


Introduction
Wearable seizure detection devices alerting patients, caregivers and family represent a vital asset for patients with intractable epilepsy, who have uncontrolled and unpredictable seizures [1]. In clinical practice, reporting of an individual patient's seizure frequency relies on seizure diaries logged by the patients or caregivers; however, these are highly unreliable, as more than half of seizures go unnoticed by the patients [2]. The erroneous seizure counts lead to both under- and over-prescription of anti-seizure drugs and furthermore bias the clinical assessment of efficacy in anti-seizure drug trials [3].
Wearable seizure detection devices based on electromyography (EMG) [4], accelerometry [5], and multimodal recordings [6] have been validated for detection of convulsive seizures in phase 2, 3, and 4 clinical trials [7]. However, detection of nonconvulsive seizures remains a challenge that has not yet been reliably solved, according to the recently published systematic review and recommendations of the International League Against Epilepsy (ILAE) [8,9]. The ILAE therefore encourages further research and development in the field, as wearable devices detecting both convulsive and nonconvulsive seizures bear the potential of providing objective data on seizure burden and, by triggering seizure alarms, may contribute to preventing morbidity and mortality associated with seizures [10,11]. Detection of nonconvulsive/focal seizures requires biomarkers other than those used by movement- or muscle-activity-based devices, as these seizures are often not accompanied by marked movement or muscle activity. Changes in heart rate (HR) and heart rate variability (HRV) have been suggested as biomarkers for detecting focal nonconvulsive seizures [12]. However, no phase 3 or 4 studies, and only a few phase 2 studies, using dedicated seizure detection devices to measure HR and HRV changes have been conducted [12].
Recently, we conducted two phase-2 studies using HRV-based seizure detection algorithms for detection of both convulsive and focal nonconvulsive seizures [13,14]. In the first study, we compared 26 HRV algorithms for seizure detection and found a high sensitivity (93%) among patients with marked ictal autonomic changes (responders) using the best of the 26 detection algorithms. However, this came at the cost of a relatively high false alarm rate (FAR) of 1.0 per 24 h [13]. In a second study, we validated the best detection algorithm in an independent new test set and found a sensitivity of 87% and a FAR of 0.9 per 24 h among responders [14]. In the two studies, 53-57% of all patients were responders [13,14]. However, in those studies we determined each patient's individual seizure detection threshold from the first 24 h of recording (baseline). The threshold was always set at a standard 105% of the maximal HRV-algorithm value in the non-seizure periods of the first 24 h of the patient's recording; this requires 24 h of recording and manual setting of the seizure alarm threshold, which is suboptimal in practice.
In this phase 2 clinical trial, we implemented an adaptive seizure-alarm detection threshold that is initially based on patient-independent data and gradually adapts to the patient's own HRV data. This circumvents manual threshold setting and any pre-recording. We used logistic regression machine learning with the primary aim of lowering the FAR of our previous seizure detection study [13], which was later validated [14]. The specific aim of lowering the FAR is essential, as both patient surveys and the ILAE have pointed out that a high FAR constitutes one of the major obstacles to establishing a reliable wearable seizure detection device for focal nonconvulsive seizures [8,9,15].

Patients and dataset
Patients admitted to the long-term epilepsy monitoring units (EMU) at Aarhus University Hospital and the Danish Epilepsy Centre in Denmark were invited to participate in the study, which was approved by the Danish Ethical Committee (ID no. 1-10-72-343-15). The patients agreeing to participate were also invited to perform a 10-min exercise bike test and a 10-min Paced Auditory Serial Addition Test (PASAT) cognitive stress test [16]. The tests were optional in order to enhance the likelihood of patient recruitment.
The wearable ECG device (ePatch) used to record the ECG has a sampling frequency of 512 Hz and a battery lifetime of 72 h. The ePatch was placed by the staff on the lower left ribs, 5 cm from the midline at an angle of 30° to the horizontal (Fig. 1). The patients were able to move freely in the EMU during the recording. The R-peak detection and HRV analysis of the ECG recordings from the ePatch were done offline using custom-made computer programs in LabVIEW 2016 (64-bit) (National Instruments). A fully automated R-peak detection algorithm developed specifically for the ePatch was used for R-peak detection in the ECG [17]. The HRV analysis was done using the seizure detection programs created and described in our previous phase-2 exploration study [13]. The new logistic regression machine learning adaptive threshold computation was done using MATLAB 2016.
Seizures <20 s in duration and electrographic seizures without any clinical correlate (episodes without observable semiologic manifestations) were excluded from the seizure detection analysis. If seizures occurred in clusters with an interval of <30 min, only the first seizure was considered. Trained experts, blinded to the HRV analysis, marked seizure onset as the first electroencephalography (EEG) or clinical sign of the seizure (whichever came first) and seizure termination as the last EEG or clinical sign of the seizure (whichever came last). The EEG data were only used to mark the timing of the seizures and were not in any way used in the seizure detection algorithm, which was based solely on ECG data.
Data were prospectively collected from February 2016 to November 2019 and consisted of 62 patients with 174 seizures; the dataset has been described in detail in [13,14]. The dataset from Jeppesen et al. 2019 [13] (43 patients with 126 seizures) was divided in two, where the first half of the patients was used as the training dataset and the other half as the cross-validation dataset. The dataset from Jeppesen et al. 2020 [14] (19 patients with 48 seizures) was then used as an independent test dataset for the seizure detection algorithm derived from the training and cross-validation datasets.

Responders
Previous studies suggested that HRV-based seizure detection only works in patients with marked autonomic changes [13,18]. Here, we used a selection method that we have previously proposed [14]: we considered responders to be the patients in whom the first recorded seizure had an ictal heart rate change of >50 beats/min during a 100 R-R interval period. The remaining patients were considered non-responders.
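As a concrete illustration, the responder criterion can be sketched as follows. This is a minimal sketch, not the authors' code: `is_responder` is a hypothetical helper, and the assumption that the heart rate change is measured as the max-min spread of instantaneous heart rate within each 100-beat window is ours.

```python
import numpy as np

def is_responder(rr_intervals_s, window_beats=100, threshold_bpm=50.0):
    """Classify a patient as a responder from the R-R interval series
    (in seconds) around their first recorded seizure: responder if the
    heart rate changes by more than `threshold_bpm` within any
    `window_beats`-beat window (assumed windowing)."""
    hr_bpm = 60.0 / np.asarray(rr_intervals_s, dtype=float)  # instantaneous HR
    changes = [hr_bpm[i:i + window_beats].max() - hr_bpm[i:i + window_beats].min()
               for i in range(len(hr_bpm) - window_beats + 1)]
    return max(changes) > threshold_bpm
```

For example, a series that jumps from 60 bpm (R-R = 1.0 s) to 120 bpm (R-R = 0.5 s) shows a 60 bpm change within a 100-beat window and would be classified as a responder.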

Extracting seizure candidates time epochs
Ictal tachycardia or bradycardia is a well-known phenomenon during seizures in patients with epilepsy. In order to find time epochs (candidates) of possible seizures, we initially used a basic heart rate change classifier that could capture the seizures with high sensitivity, though low specificity. First, the tachogram (continuously measured R-R intervals) was passed through a 7 R-R interval moving-window median filter with maximum overlap to remove possible false R-peak detections, missed R-peaks and/or ectopic heart beats. The absolute value of the differential in a 100-heart-beat moving window with maximum overlap of the filtered tachogram was then computed, and time epochs where this value surpassed 35 bpm were marked (times where HR changed >35 bpm within 100 heart beats). If two consecutive epochs were within 120 heart beats of each other, they were merged into one, so that the time epoch also included the period between them. Lastly, 20 heart beats before each epoch start and after each epoch end were included in the final seizure candidate time segment (see Fig. 2 flowchart).
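The extraction steps can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: we measure the HR change within a window as the max-min spread of instantaneous heart rate, whereas the paper's exact differential may differ, and the edge handling of the median filter is our choice.

```python
import numpy as np

def seizure_candidates(rr_s, win=100, thr_bpm=35.0, merge_gap=120, pad=20):
    """Return candidate epochs as (start, end) beat indices: 7-beat median
    filter, flag windows with >35 bpm HR change, merge epochs closer than
    120 beats, pad each epoch by 20 beats."""
    rr = np.asarray(rr_s, dtype=float)
    # 7-beat moving-window median filter (edges left unfiltered)
    filt = rr.copy()
    for i in range(3, len(rr) - 3):
        filt[i] = np.median(rr[i - 3:i + 4])
    hr = 60.0 / filt                      # instantaneous heart rate in bpm
    # flag every beat covered by a 100-beat window with >35 bpm HR change
    flagged = np.zeros(len(hr), dtype=bool)
    for i in range(len(hr) - win + 1):
        w = hr[i:i + win]
        if w.max() - w.min() > thr_bpm:
            flagged[i:i + win] = True
    # collect contiguous flagged runs as [start, end] beat indices
    epochs, start = [], None
    for i, f in enumerate(flagged):
        if f and start is None:
            start = i
        if not f and start is not None:
            epochs.append([start, i - 1])
            start = None
    if start is not None:
        epochs.append([start, len(hr) - 1])
    # merge epochs separated by fewer than `merge_gap` beats
    merged = []
    for s, e in epochs:
        if merged and s - merged[-1][1] <= merge_gap:
            merged[-1][1] = e
        else:
            merged.append([s, e])
    # pad each epoch by 20 beats on either side
    return [(max(0, s - pad), min(len(hr) - 1, e + pad)) for s, e in merged]
```

A flat tachogram yields no candidates, while a sustained HR jump of more than 35 bpm yields a single padded candidate epoch.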
The choice of using a lower threshold for the seizure candidate selection (>35 bpm) than for the selection of responder patients (>50 bpm) was threefold: 1) In pilot testing on data from the training set, it was observed that not all seizures of responders had a >50 bpm increase, and these seizures would thus not be regarded as seizure candidates if the threshold was set too rigidly. Please note that only each patient's first seizure was used to classify responders and non-responders. 2) In order to personalize the seizure detection algorithm as quickly as possible, we wanted to ensure a higher number of seizure candidates for each patient (including non-seizure ones); a lower threshold personalizes the algorithm faster.
3) It was our hope that the non-responder patients would achieve better seizure detection performance with our adaptable seizure detection threshold, so in order to include as many of the non-responder patients' seizures as possible, we lowered the seizure candidate threshold to also include seizures with moderate heart rate change.

Seizure detection parameters
After finding the time epochs of all seizure candidates, the maximum values within each candidate's time epoch of the two heart rate variability parameters from the best detection algorithm found previously in [13] (ModCSI100_filtered x Slope and CSI100 x Slope) were computed. If the seizure candidate time epoch included a seizure, counted from 120 heart beats before seizure start to seizure end, the candidate was designated a "positive seizure candidate"; otherwise, it was designated a "non-seizure candidate". If two seizure candidate epochs designated the same seizure, only the highest maximum value of each parameter across the two epochs was used.
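The labeling rule can be sketched as an overlap test. This is a hypothetical helper illustrating the criterion; the representation of epochs and seizures as (start, end) pairs in beat indices is our assumption.

```python
def is_positive_candidate(epoch, seizures, lead_beats=120):
    """A candidate epoch is a "positive seizure candidate" if it overlaps
    the interval from 120 heart beats before a seizure's onset to the
    seizure's end; otherwise it is a "non-seizure candidate".
    `epoch` and each element of `seizures` are (start, end) beat indices."""
    cand_start, cand_end = epoch
    return any(cand_start <= sz_end and cand_end >= sz_start - lead_beats
               for sz_start, sz_end in seizures)
```

For instance, a candidate ending at beat 300 overlaps a seizure starting at beat 350 (since 350 - 120 = 230 <= 300), while one ending at beat 200 does not.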

Logistic regression machine learning computation
A detailed description of the logistic regression machine learning technique used to classify the seizure candidates into seizures or non-seizures is provided in Supporting Information 1. In short, a decision boundary line was computed using 200 non-seizure candidates and 200 positive seizure candidates from the training dataset in order to classify the seizure candidates into seizures or non-seizures.
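The idea of fitting a linear decision boundary in the two-feature space can be sketched as below. The feature values are synthetic stand-ins (the real ModCSI100_filtered x Slope and CSI100 x Slope values come from the training candidates), and plain gradient descent on the unweighted logistic loss is used for illustration; the paper's actual cost function is given in its Supporting Information 1.

```python
import numpy as np

# Synthetic stand-ins for the two HRV features of 200 non-seizure and
# 200 positive seizure candidates from the training set.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([1.0, 1.0], 0.3, (200, 2)),    # non-seizure
               rng.normal([3.0, 3.0], 0.4, (200, 2))])   # seizure
y = np.r_[np.zeros(200), np.ones(200)]

# Batch gradient descent on the logistic loss.
Xb = np.hstack([np.ones((400, 1)), X])        # prepend a bias column
theta = np.zeros(3)
for _ in range(5000):
    h = 1.0 / (1.0 + np.exp(-(Xb @ theta)))   # sigmoid predictions
    theta -= 0.1 * Xb.T @ (h - y) / len(y)    # gradient step

def classify(features):
    """Seizure if the point lies on the positive side of the boundary
    line theta0 + theta1*x1 + theta2*x2 = 0."""
    return (np.r_[1.0, features] @ theta) > 0
```

With well-separated clusters like these, points near the seizure cluster center fall on the positive side of the learned line and points near the non-seizure center on the negative side.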

Optimizing seizure detection threshold
The analysis for finding the optimal classifier for seizure detection, with the lowest possible false alarm rate and the highest possible sensitivity, was twofold. First, it was tested whether a training dataset consisting of responders only was better than a training dataset from all patients (non-responders + responders), and whether an adaptive threshold that gradually personalizes the detection threshold was better than a generic threshold based only on the training dataset. Both the responders and non-responders of the cross-validation dataset were used to test these aspects.
Second, three values of Q (1.0, 1.2, 1.4) in (3) and six values of λ (1, 5, 10, 50, 100, 200) in (2) (see formulas in Supporting Information 1) were tested in order to find the optimal settings of the logistic regression machine learning parameters. The Q- and λ-values tested were selected based on our pilot study. Finding the optimal Q- and λ-values is a trade-off between high detection sensitivity and low false alarm rate (FAR); we focused on a low FAR, but without losing significant sensitivity, when selecting the optimal setting. The higher the Q-value, the more weight is given to high sensitivity (and the less to few false positives). The regularization parameter λ influences how closely the decision boundary fits the training data: a low λ-value fits closer to the training data but leaves the algorithm more prone to overfitting, whereas a high λ-value fits the training data less specifically but leaves the algorithm more prone to underfitting. The responders of the cross-validation set were used to test these aspects. Of the 18 algorithms (3 × 6), only the best-performing algorithm in the cross-validation dataset was used in the independent test dataset, thus eliminating the bias of training and testing on the same dataset.
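The roles of Q and λ can be illustrated with one plausible form of a class-weighted, L2-regularized logistic cost. This is a sketch under assumptions: the paper's actual formulas (2) and (3) are in its Supporting Information and may differ in detail.

```python
import numpy as np

def lrml_cost(theta, X, y, Q=1.0, lam=100.0):
    """Hypothetical weighted, L2-regularized logistic cost illustrating
    Q and lambda as described in the text: Q > 1 up-weights the seizure
    class (favoring sensitivity over few false positives), and a larger
    lambda shrinks the weights (low lambda risks overfitting, high
    lambda risks underfitting). X includes a leading bias column."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))               # sigmoid
    eps = 1e-12                                          # numerical safety
    ce = -(Q * y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    reg = lam / (2.0 * len(y)) * np.sum(theta[1:] ** 2)  # bias not penalized
    return ce.mean() + reg
```

Under this form, increasing Q raises the cost of missed seizures only, and increasing λ raises the cost of large weights, reproducing the sensitivity/FAR and overfitting/underfitting trade-offs described above.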

Patient independent algorithm
The patient-independent classifier used only the training dataset to determine the decision boundary, either from all patients in the training dataset or from the responders only. The patients in the cross-validation dataset were used to test the false positive rate and detection sensitivity, either in the non-responders or in the responders only.

Patient adaptive threshold algorithm
In order to optimize the algorithm for each patient, we created an adaptive threshold that gradually changes the decision boundary depending on the patient's own consecutive seizure-candidate-epoch measures of the two HRV features, in the following way, whenever a new candidate appeared: If the candidate was classified as a seizure and in fact was a seizure, 10% of the positive seizure candidates of the original patient-independent training data were deleted (randomly selected) and replaced with the new candidate.
If the candidate was classified as a non-seizure but in fact was a seizure, 25% of the positive seizure candidates of the original patient-independent training data were deleted (randomly selected) and replaced with the new candidate.
If the candidate was classified as a non-seizure and in fact was a non-seizure, 5% of the non-seizure candidates of the original patient-independent training data were deleted (randomly selected) and replaced with the new candidate.
If the candidate was classified as a seizure but in fact was a non-seizure, 25% of the non-seizure candidates of the original patient-independent training data were deleted (randomly selected) and replaced with the new candidate.
The percentages of new candidates were always produced by resampling the new candidate, i.e., adding copies of it. For instance, when 5% of the non-seizure data was replaced with the new candidate, this was 5% of the 200 non-seizure candidates = 10 copies.
Whenever the patient-adaptive threshold reached the point where the classifier was 100% based on the patient's own data in one of the two classes (for example, after 20 correct non-seizure candidates), the algorithm would still update, now deleting the corresponding percentage of the patient's own previous candidates. This was done so that the threshold would always best resemble the newest candidates.
Whenever a candidate was classified wrongly (false positive or missed seizure detection), it counted with a higher weight (25%) in the updated classifier than when candidates were classified correctly (10% for seizures, 5% for non-seizures). This was done to put higher weight on the false classifications and base the updated decision boundary more heavily on these, to avoid future misclassifications.
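The four replacement rules above can be sketched in one small routine. This is a hypothetical helper, not the authors' code; each pool is assumed to be a list holding that class's training candidates (initially the 200 patient-independent ones).

```python
import random

def adaptive_update(seiz_pool, nonseiz_pool, candidate, predicted, actual):
    """Replace a fraction of the pool for the candidate's true class with
    copies of the new candidate: 10% for a correctly detected seizure,
    5% for a correct non-seizure, and 25% for either kind of
    misclassification (missed seizure or false alarm)."""
    if actual == "seizure":
        pool = seiz_pool
        frac = 0.10 if predicted == "seizure" else 0.25      # 25% if missed
    else:
        pool = nonseiz_pool
        frac = 0.05 if predicted == "non-seizure" else 0.25  # 25% if false alarm
    n_copies = int(frac * len(pool))     # e.g. 5% of 200 = 10 copies
    # delete randomly selected old entries and insert copies of the candidate
    for idx in random.sample(range(len(pool)), n_copies):
        pool[idx] = candidate
```

For example, after a correctly classified non-seizure candidate, 10 of the 200 entries in the non-seizure pool become copies of the new candidate, while a missed seizure replaces 50 of the 200 seizure-pool entries.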
Fig. 3 illustrates an example of how the decision boundary changes from the first to the last computed seizure candidate of a patient.

Results
We prospectively recruited 62 patients with 174 seizures and a total recording time of 4614 h. The training set consisted of the first 21 patients with 70 seizures, of whom 9 patients with 23 seizures were responders. The cross-validation set consisted of the following 22 patients with 56 seizures, of whom 13 patients with 31 seizures were responders. The test set consisted of the last 19 patients with 48 seizures, of whom 11 patients with 23 seizures were responders.
The 13 responder patients in the cross-validation dataset had a total of 1186 "seizure candidate" epochs (median: 57, range 13-253), and only one of the 31 seizures was not detected as a "seizure candidate". The nine non-responder patients in the cross-validation dataset had a total of 868 "seizure candidates" (median: 47, range 5-251), and 13 of the 25 seizures were not included as "seizure candidates".
The 11 responder patients from the test set had a total of 1301 "seizure candidate" epochs (median: 85, range 2-360), and all 23 seizures were included as "seizure candidates".
When testing the trained algorithms on the responders of the cross-validation dataset, the first analysis showed that using the training set consisting of responders only together with the adaptive threshold (sensitivity: 83.9% (95% CI: 70.9-96.8%), FAR: 0.82/24 h) was superior to using either the non-adaptive threshold or the training dataset from all patients (see Table 1).
In the non-responder group of the cross-validation dataset, we found that both the sensitivity and the FAR were unsatisfactory for all the algorithms, with sensitivity ranging from 20 to 28% and FAR of 1.38-2.95/24 h (see Table 1).
In the second analysis, we considered the algorithm using Q = 1.0 and λ = 100 to show the best performance among the three Q-values and six λ-values tested, with a sensitivity of 77.5% (62.7-92.1%) and a FAR of 0.24/24 h, since our main aim was lowering the false alarm rate (see Table 2).
When using the superior algorithm (Q = 1.0 and λ = 100) on the independent test set, the algorithm yielded a sensitivity of 78.2% (95% CI: 61.4-95.1%) with a FAR of 0.62/24 h (see Table 3). Ten of the 13 focal seizures (76.9%) and eight of the 10 generalized tonic-clonic (GTC) / focal-to-bilateral tonic-clonic (FBTC) seizures (80%) were detected. Five of the 11 patients in this test (45.5%) did not have any false alarms at all and had all of their seizures detected (see Table 3). Two of the patients (numbers 7 & 11) together had 14 of the 23 (61%) registered false alarms.

Discussion
In this phase 2, multicenter clinical validation study of seizure detection using patient-adaptive logistic regression machine learning, we achieved a seizure detection sensitivity of 78.2% and a false alarm rate of 0.62/24 h. The robustness of the algorithm was ensured by using separate training, cross-validation, and test datasets from separate patients, eliminating potential bias.
Decreasing the FAR in seizure detection devices has been stated as being of major importance by users [19], has been recognized by the ILAE [8,9], and has been highlighted in systematic reviews of seizure detection devices [12,20]. Our first analysis clearly showed that training the seizure detection algorithm using data only from responders (i.e., patients whose first seizure had a >50 bpm HR increase) was superior to training it on data from responder and non-responder patients combined, which resulted in a 54% higher FAR (see Table 1). Intuitively, this makes good sense, as a decision boundary trained on seizures with very low HR change will lower the seizure detection threshold, but at the cost of an unacceptably high false alarm rate.
The first analysis also revealed that the patient-specific adaptation approach was superior to a generic threshold, in which the decision boundary of the proposed algorithm is not adapted to the patient-specific data and the FAR was 52% higher (see Table 1). The large gain of the adaptive threshold in terms of FAR reduction could seem somewhat surprising, as the patients in the cross-validation dataset only had 1-5 recorded seizures each, so a limited influence on the personal adaptation of the decision boundary was to be expected. However, one has to keep in mind that all the seizure candidates without seizures (non-seizure candidates) gradually fed to the logistic regression ML also contribute to personalizing the decision boundary of the seizure detection algorithm. Thus, the part of the decision boundary based on the non-seizure candidates' data is relatively quickly altered from the initial generic setting to a personalized, patient-tailored setting, constituting a large gain in terms of a lower FAR.
The seizure detection sensitivity in the non-responders was very poor (24-28%) regardless of adaptation or type of training set (see Table 1). This was not surprising given the low number of seizures marked as seizure candidates (12 of 25), due to very low heart rate changes during the seizures of these patients. These findings highlight the fact that not all patients have seizures that include heart rate changes, and that individual pre-screening of heart rate changes during a seizure is necessary to determine whether any form of heart-rate-based seizure detection system is feasible for the patient.
The second analysis consisted of varying the parameters Q and λ in the logistic regression to optimize the adaptive seizure detection algorithm. This analysis clearly shows the trade-off between sensitivity and specificity (false alarm rate) when manipulating these parameters (see Table 2). As the emphasis in this study was on reducing the FAR, we regarded the formula using Q = 1.0 and λ = 100 as the superior algorithm, as it yielded a very low FAR of 0.24/24 h in the cross-validation dataset. The FAR in the responders of the cross-validation dataset, which included both focal (24 seizures) and GTC/FBTC (7 seizures) seizures, was thus comparable to the false alarm rates of accelerometry-based wearable devices detecting only convulsive GTC/FBTC seizures [5], while still maintaining a relatively high detection sensitivity (77.4%).
In the final analysis using the independent test set, a much higher FAR (0.62/24 h) was seen compared to the cross-validation set (0.24/24 h), highlighting the importance of using an independent test set to avoid the bias that occurs when testing on a cross-validation dataset. In spite of this, the FAR of 0.62/24 h in the independent test set still constitutes a 31% reduction compared to the one we reported for the same dataset in Jeppesen et al. 2020 [14] (FAR: 0.9/24 h), in which we used a simple patient-tailored detection threshold based only on the first 24 h of non-seizure data from each patient. Also, when using our novel logistic regression ML approach, five of the 11 patients in the test set (45.5%) did not have any false alarms and had all their seizures detected (see Table 3), while two of the patients (numbers 7 & 11) together had 61% (14 of 23) of the registered false alarms. Patient 7 had some noise artifacts during the first of the non-detected seizures, which may have contributed to a spuriously low decision boundary and thereby more false positives. Patient 11 had false positives related to abruptly starting physical activity and to hyperventilation exercises conducted to try to provoke seizures. Other than this, we did not find any pattern distinguishing patients with many false positives from those with none. These findings indicate that the seizure detection alarm presented in this study works very well for some patients, while others inherently have some false alarms. The seizure detection sensitivity of the novel logistic regression ML approach (78.2%, 95% CI: 61.4-95.1%) was slightly lower than, although similar to, that of the previously reported study [14] (87.0%, 95% CI: 73.2-100%), but the difference was not significant (p-value: 0.44). Another advantage of the logistic regression machine learning approach presented in this study, besides the 31% FAR reduction, is that patients do not need an initial 24-h baseline recording to determine the seizure detection threshold. Instead, the detection threshold is initially based on data from other patients and then gradually adapts to the patient's individual HRV responses during seizure and non-seizure epochs of heart rate changes.

Table 2
Test results using three values of Q (1.0, 1.2, 1.4) and six values of λ (1, 5, 10, 50, 100, 200) in order to find the optimal settings of the logistic regression machine learning parameters. The initial decision boundary was trained using the responders of the training set, and the personal adaptive threshold method was applied. The cross-validation dataset (responders only) was used to conduct the test. Abbreviations: E = Exercise test, C = Cognitive test, N = None, FBTC = Focal to bilateral tonic-clonic, GTC = generalized tonic-clonic.
Patient-adaptive seizure detection thresholds using heart rate changes as biomarkers have also been suggested in other studies [21-23]. De Cooman et al. used an adaptive linear support vector machine (SVM) classifier in a phase 1 study (wired ECG measured from standard video-EEG recording systems) and also found that it was better than patient-independent seizure detection [21]. Although the algorithm achieved a high sensitivity of 77.1%, the FAR in that study was relatively high (29.7/24 h). The same research group attempted to confront the issue of high FAR by applying an adaptive transfer learning approach in a recent publication, which lowered the FAR by 37% [24]. However, the FAR was still at an unacceptably high level, more than 70-fold higher than the FAR we report in the current study. Pavei et al. also used a support vector machine with HRV data as input to detect seizures and achieved a high detection sensitivity of 94.1%; however, in that study too the FAR was unacceptably high (11.8/24 h) [23]. It is, however, likely that the high FARs seen in those studies are due to the datasets not being divided into responders and non-responders; patients with no or subtle autonomic changes during seizures are likely to have low detection sensitivity and generate a high FAR, thus contaminating the overall performance of the seizure detection algorithms.
It is important to emphasize that not all patients with epilepsy can benefit from HRV-based seizure detection. One limitation of our study is the pre-selection of responders, which requires that a seizure has been captured with heart rate recordings that can determine a >50 beats/min change during the seizure. In this study, 53% of patients were considered responders. However, wearable ECG or pulse recorders are cheap and easy to come by, so pulse recordings of patients during seizures should be feasible even in an out-of-hospital setting. In addition, the algorithm is optimized for detecting seizures >20 s in duration, and thus very short seizures are not within the scope of this study. However, shorter seizures are arguably less important from a clinical point of view [13,14].
Another limitation is that the patient-adaptive feature requires patients to correctly identify seizures and false alarms whenever they occur. As previously mentioned, many patients do not know when they are having, or have had, a seizure, and it can therefore be difficult for them to correctly identify all seizure alarms; a video-EEG monitoring session may therefore be needed to initialize the system. A prospective study of real-time online detection (phase 3) is needed to address this issue and to test how many seizures and false alarms patients can identify correctly following seizure alarms.

Conclusion
The novel, adaptive, patient-tailored HRV-based seizure detection algorithm using a logistic regression machine learning approach presented in this study was superior both to the generic detection threshold and to the previously published method of personalizing the seizure detection threshold. The false alarm rate was reduced by 31% compared to the previous method, while still upholding a similar sensitivity. This novel method can be implemented in a wearable online HRV-based seizure detection system, alerting patients and caregivers of seizures and helping to obtain an objective seizure count, thereby optimizing patient treatment.

Fig. 1 .
Fig. 1. Positioning of the ePatch heart monitor on the lower left ribs, 5 cm from the midline at an angle of 30° to the horizontal.

Fig. 2 .
Fig. 2. Flowchart of how the seizure candidate time epochs were computed from the patients' tachograms (continuously measured R-R intervals).

Fig. 3 .
Fig. 3. Example: Patient 10 in the test set. Fig. 3A shows the decision boundary at the first seizure candidate, before any personal seizure candidates have been added ("Patient's own"), so all seizure and non-seizure epochs are data from the training set. Fig. 3B shows the change in the decision boundary after the last (48th) seizure candidate from patient 10 has been added (5 seizures and 43 non-seizures). At this point the decision boundary is personalized, based 50% on the patient's own seizures and 50% on seizures from the training set, and 100% on the patient's own non-seizure candidates.

Table 1
Detection sensitivity and false alarm rate using the adaptive or non-adaptive threshold, and using either all patients in the training dataset or only the responders of the training set to create the initial decision boundary. The cross-validation dataset (responders only) was used to conduct the test, with parameters λ = 5 and Q = 1.0 in the logistic regression cost function.

Table 3
Test set patient data (responders only): seizures, seizure types, performed tests, false alarm rates and detection sensitivity. The initial decision boundary was trained using the responders of the training set. The personal adaptive threshold method was applied with parameters λ = 100 and Q = 1.0.