An integrative method to quantitatively detect nocturnal motor seizures

In this proof-of-concept investigation, we demonstrate a marker-free, video-based method for detecting nocturnal motor seizures across a spectrum of motor seizure types, using nighttime recordings of a single adult female patient with refractory epilepsy. In doing so, we further explore the intermediate biosignals, visually mapping seizure "fingerprints" to seizure types. The method is designed to be flexible enough to generalize to unseen data, and shows promising performance characteristics for low-cost seizure detection and classification. The dataset contained recordings from 27 nights. Seizures were observed in 22 of these nights, with 36 unequivocally confirmed seizures in total. Each seizure was classified by an expert epileptologist according to both the ILAE 2017 classification and the Lüders semiological classification guidelines, yielding 5 ILAE-recognized seizure types and 7 distinct seizure semiologies. Detection was based on motion, oscillation, and sound signals extracted from the recordings. The model architecture consisted of two feature extraction and event determination layers and one thresholding layer, establishing a simple framework for multimodal seizure analysis. The optimal parameters were trained by randomly resampling the seizure-matching events for each signal and choosing the threshold expected to retain 90 % sensitivity on the sampled distribution. With the cut-off values selected, statistical performance was calculated for two target seizure groups: seizures containing a clonic component and seizures containing a tonic component. When tuned to 90 % sensitivity, the system achieved a very low false discovery rate of 0.038/hour when targeting seizures with a clonic component, and a clinically relevant rate of 1.02/hour when targeting seizures with a tonic component.
These results indicate that the method is sensitive for detecting various nocturnal motor seizure types and has high potential to differentiate motor seizures by their video and audio signal characteristics. Given the low cost of this technique, further development and commercialization of the method might yield both cost savings and improved quality of care.


Introduction
Epilepsy is one of the most common neurological disorders and is characterized by the occurrence of seizures caused by excessive, abnormal brain activity. Accurate seizure documentation is essential for assessing therapy outcomes and risks, especially for nocturnal seizures, which markedly increase the risk of sudden unexpected death in epilepsy (SUDEP) Lamberts et al. (2012); Baumgartner et al. (2018). However, ample data suggest that as many as 50 % of motor seizures are missed by patients Elger and Hoppe (2018), and under-reporting is even more frequent for nocturnal seizures and seizures with impaired awareness Hoppe et al. (2007). Seizure diaries are commonly unreliable due to postictal amnesia and the inability of caregivers to observe (and accurately describe) all of the patient's seizures Akman et al. (2009). Inaccurate documentation affects the patient's treatment and the evaluation of treatment efficacy Elger and Hoppe (2018). Improved documentation could not only help in assessing therapy outcome and thus facilitate treatment optimization, but could also provide information for lateralizing and localizing the epileptogenic zone, which is helpful for classifying the epilepsy syndrome and planning therapy. For these reasons, there is a need for more objective and reliable seizure detection.
Due to the inaccuracies of traditional diary-based follow-up, new strategies for epilepsy monitoring have been proposed. Non-EEG systems can be used to detect motor seizures Conradsen et al. (2012), but their accuracy and reliability remain problematic for some seizure types. Video-based monitoring systems either rely on markers attached to the patient or use purely computer-vision techniques, which allow motor seizures to be detected without any physical attachment to a device. While most studies successfully target prominent convulsive seizures, subtle motor seizures tend to be detected with lower sensitivity and positive predictive value (PPV). Even so, video monitoring has been shown to capture more subtle seizure types, including those not observed by caretakers or the patient Peciola et al. (2018); van der Lende et al. (2016). Furthermore, seizure audio has proven useful for seizure classification De Bruijne et al. (2008); Hartl et al. (2018), and could be combined with video in a multimodal approach to better distinguish true video-based detections from false positives.
Given (1) the clinical need for better epilepsy monitoring, (2) the gap in current solutions for a wider spectrum of motor seizures, and (3) the hypothesized power of a multimodal approach to improve detection, we propose a framework for a new seizure detection method based on off-the-shelf hardware and open-source software. In this study we present a system that measures seizure features quantitatively, which allows changes in seizure severity or propagation to be detected. The parameter selection in this proof-of-concept model is based on a single adult with refractory multifocal epilepsy, whose variety of nocturnal motor seizure types allowed us to explore a spectrum of motor seizures. From a detection-method point of view, this study serves as a Phase 1 validation of a new epilepsy monitoring device Beniczky and Ryvlin (2018).

Clinical history of the patient
The patient is an 18-year-old woman with moderate intellectual disability and refractory epilepsy. The onset of epilepsy, at the age of one, was associated with fever and infection. Her pediatric epileptologist classified several seizure types, including myoclonic absences, bilateral tonic seizures, and generalized tonic-clonic seizures. Brain MRI at the age of five revealed mild cerebellar atrophy and was otherwise normal. Genetic testing for Dravet syndrome (SCN1A gene) was negative. The epilepsy was classified as generalized epilepsy with encephalopathy. Despite adequate trials of multiple antiepileptic drugs (AEDs), the patient continued to have frequent seizures.
When her medical care was transferred to the adult neurology unit at the age of 16, video-EEG (VEEG) monitoring was performed in June 2016 for electroclinical characterization of the seizures and definition of the epilepsy syndrome. The seizures observed during VEEG were similar to those observed during the home monitoring period used in this study; more detailed descriptions are presented below. The VEEG registration demonstrated multifocal seizures arising from either the left or the right hemisphere, with frontal or centro-parietal EEG seizure onset. Based on the unequivocal focal onset of the seizures, the epilepsy was reclassified as multifocal epilepsy. Because of the bilateral multifocal onset, resective epilepsy surgery was excluded as a treatment option.
The patient was implanted with a vagus nerve stimulator (VNS) in January 2017. Stimulation was initiated in February 2017, and the primary target stimulation dose was reached by May 2017 (1.75 mA, 30 Hz, 250 μs, 30 s ON and 5 min OFF; autostimulation and magnet mode activated). The caregivers (parents) described a significant improvement in alertness and decreased seizure severity, but the patient continued to experience the above-mentioned seizure types. The seizures were frequent and mostly nocturnal, and their count, severity, and duration were difficult to evaluate from the seizure diary. In addition to VNS, the patient continues to be treated with daily doses of 1200 mg sodium valproate, 200 mg lamotrigine, and 30 mg clobazam.

Semiological classification of the patient's seizures
To assess the continuing evolution of the patient's disease, one month of video-based nighttime monitoring was performed at the patient's home in May 2018 (service provider: Neuro Event Labs). The 35 recorded intervals ranged from 42 min to 11 h 39 min in duration, with a total recorded time of 262 h 8 min and a mean duration of 7 h 29 min. As the recordings were manually controlled by the patient, the variation in length can be explained by natural changes in sleep patterns as well as by manual stopping of the recording for privacy reasons. These nocturnal registrations formed the dataset for this case study, which was chosen in particular for the variability and frequency of the seizures, as well as for the relatively long registration period: seizures were observed in 22 of 27 recorded nights, with a total of 36 confirmed seizures. All video data were manually reviewed by an annotator. The evaluation was based on suspected seizure events detected by analysis of the motion, audio, and oscillation signals recorded during nighttime monitoring. All events were manually evaluated and classified by two experienced epileptologists (E.H., S.N.); this review also serves as the reference standard as defined in the standards for testing and clinical validation of seizure detection devices Beniczky and Ryvlin (2018). Events that were not determined to be unequivocal seizures were divided into two categories, "not a seizure" and "unlikely to be a seizure", and excluded from further analysis. A total of 7 seizure semiologies were identified, falling under 5 ILAE-recognized seizure types. The seizures are listed in Table 1, with their ILAE codes Beniczky et al. (2017) in parentheses. Short descriptions of the seizures observed during the home monitoring period, a summary of their quantitative characteristics, and statistics on the seizures observed by caregivers are presented below.
Electrophysiological features of the seizures are also summarized where they were observed in the VEEG registration.
• Focal tonic seizure (I.C.05) (n = 24): started from sleep with sudden stiffening of the body, accompanied by an exhalation sound and typically bilateral raising of the arms. In some cases (10/24 seizures) the seizure ended at this stage (bilateral tonic). In the remaining cases, the tonic phase was followed by tonic posturing accompanied by a guttural sound, or by a clonic phase (bilateral tonic to bilateral clonic). According to the home monitoring data, the duration of these seizures varied from 4 to 42 s. Apart from the seizures with the guttural sound (12/24 seizures), focal tonic seizures were not witnessed by the caregivers. In the VEEG registration, these seizures were associated with frontal EEG seizure activity.
• Focal clonic seizure (I.C.03) (n = 3): started from sleep with unilateral clonic movement (unilateral clonic). During the home monitoring period, the duration of these seizures varied from 9 to 14 s. Only 1 of the 3 seizures in the dataset was noticed by the caregivers. According to VEEG recordings, the seizure onset was in the right frontal region.
• Focal to bilateral tonic-clonic seizure (I.D.01) (n = 5): started from sleep with sudden stiffening of the body, simultaneously with an exhalation sound, followed by guttural sounds (4/5 seizures). After a bilateral tonic phase of 15-20 s, the patient entered a period of bilateral clonic movement. These seizures were noticed by the caregivers and lasted from 23 to 45 s. The VEEG showed bilateral postictal flattening.
• Focal motor seizure (I.C.01) (n = 2): appeared on awakening. The patient rose and leaned backwards, followed by a clonic movement of the right arm (complex motor, asymmetric clonic) or stiffening of the body (complex motor, bilateral tonic). The duration of these seizures varied from 11 to 26 s. These seizures were not noticed by the caregivers and were not captured during the prior VEEG evaluation.
• Focal myoclonic seizure (I.C.02) (n = 2): consisted of single myoclonic jerks or clusters of myoclonic jerking of the arms and legs. These seizures lasted 4-16 s, were not noticed by the caregivers, and did not manifest during the VEEG recording.

Dataset
The original (raw) data from the home registration consisted of the aforementioned 262 h of grayscale, 30 frames-per-second (Hz), compressed (VP9-encoded) stereo video at 1280 × 720 ("HD Ready") resolution, with accompanying compressed (Vorbis-encoded) 48 kHz stereo audio. Sound was captured using the built-in stereo microphone of an Intel NUC, a low-cost compact PC, which also performed the collection of the video and audio content.
Video was captured with an Intel RealSense D435 camera module, a low-cost depth sensor containing stereo near-infrared imaging sensors, connected to the PC via USB. Imaging in the infrared spectrum allows clear grayscale images to be captured in the dark. The camera's built-in infrared projector is designed for structured-light stereoscopy, but here it was fitted with an optical diffuser so that it illuminated the scene instead of projecting the structured light pattern. The device has a global shutter, ensuring a fixed frame rate despite changes in lighting conditions. The camera's built-in autoexposure was enabled to adjust for natural and electrical lighting contributions to the scene's illumination. The camera was placed in a fixed position at the foot of the bed on a boom arm extending toward the patient, and was oriented relative to the bed to maximize the number of "physiologically active" pixels in the image.
All seizures presented in Table 1 were annotated against this raw data using the UTC timestamp of estimated onset and offset of the ictal period based on observable phenomena.

Model architecture
As motor seizures are typically recognized by the presence of abnormal movement, we designed the model around the extraction of features typical of motor seizures but absent in normal sleep. It is based on the intuition that any generic measurement of motion or sound (aspects found in motor seizures and detectable by a camera and microphone) might have thresholds or features that are more indicative of seizure behavior than of the behaviors typically observed during sleep.
Given the distribution of semiological features in the dataset, we focused on three biomarkers: sudden movement (suggesting a tonic component), sustained oscillatory movement (suggesting a clonic component), and a sudden increase in audio level (suggesting a vocalization). We paired each biomarker with a signal processing algorithm, resulting in three input signals for the model. A basic multilayer approach for event determination and thresholding was then constructed on these input signals, ordered from the most sensitive and inclusive to the most specific and exclusive. The parameters of the upper layers were tuned on the input dataset for sensitivity (to capture all possible seizures), and those of the lower layers for positive predictive value (to eliminate false positives). The model architecture is presented in Fig. 1.
The extracted signals are sampled at the original video frame rate (30 Hz) and are written as functions of time t (e.g. m_t). All signals are normalized to the range 0 to 1.

Extraction layer
The computationally inexpensive first layer generates the signals closest to the raw data and filters out periods of low salience. This layer includes motion and audio intensity extraction, which form basic physiological biomarkers for movement and sound. To model scene motion, the background subtraction model of Zivkovic and Van Der Heijden (2006) was paired with the semi-global matching stereo correspondence filter of Hirschmuller (2008) (both implemented in OpenCV). The background subtraction model provided a binary mask of the moving parts of the image, and each masked pixel was multiplied by the distance in meters provided by the stereo filter, resulting in lower values for pixels representing points closer to the camera and larger values for pixels representing objects farther from the camera. OpenCV default parameters were used, and the software manual's guidance for eliminating noise and improving the correctness of these models was followed. Denoting this distance-weighted mask for frame t as M_t, its mean per-frame value, i.e. the distance-weighted ratio of active pixels to total pixels, formed the one-dimensional motion signal m_t:

$$m_t = \frac{1}{|\Omega|} \sum_{(x,y) \in \Omega} M_t(x, y),$$

where Ω is the set of pixel coordinates in a frame.

To model the sound level of the scene, a similar approach was used: a signal S_t was derived from the raw audio by subsampling it to 30 Hz (down from 48 kHz), taking the maximum value within each subsampled period (here 1600 original samples), with the intuition that the general sound intensity ("loudness") can be inferred from this signal. This models only the sound volume at each moment in time, which is consistent with other research showing vocalization intensity to be a good marker for seizure localization Hartl et al. (2018). In future iterations of the model, more complex audio features such as pitch might improve PPV, as demonstrated by Speck et al. (2018).
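The motion and loudness signals described above reduce to a few array operations. The sketch below is a hedged illustration in NumPy, not the authors' code: it assumes the per-frame foreground masks and depth maps have already been computed (in the study, by OpenCV's background subtractor and semi-global matching stereo filter), treats the audio as a single mono channel, and takes the maximum of the absolute amplitude in each block. The function names are hypothetical.

```python
import numpy as np

def motion_signal(fg_masks: np.ndarray, depth_m: np.ndarray) -> np.ndarray:
    """m_t: per-frame mean of the distance-weighted foreground mask.

    fg_masks: (T, H, W) array of 0/1 foreground flags from a background subtractor.
    depth_m:  (T, H, W) array of distances in metres from a stereo filter.
    """
    weighted = fg_masks.astype(float) * depth_m          # farther moving pixels weigh more
    return weighted.reshape(weighted.shape[0], -1).mean(axis=1)

def loudness_signal(audio: np.ndarray, sr: int = 48_000, fps: int = 30) -> np.ndarray:
    """S_t: peak absolute amplitude in each block of sr // fps samples (1600 at 48 kHz)."""
    block = sr // fps
    n_blocks = len(audio) // block                       # drop an incomplete tail block
    frames = np.abs(audio[: n_blocks * block]).reshape(n_blocks, block)
    return frames.max(axis=1)

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale a non-negative signal to [0, 1] by its maximum (e.g. s_t from S_t)."""
    peak = x.max()
    return x / peak if peak > 0 else np.zeros_like(x, dtype=float)
```

Because both signals come out at 30 samples per second, they share a time axis with the video frames and can be thresholded and intersected directly.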
The sound loudness signal s_t was obtained by normalizing S_t against its maximum value:

$$s_t = \frac{S_t}{\max_{\tau} S_{\tau}}.$$

For detecting periods of sustained oscillation (as present in clonic seizures), a method based on optical flow Horn and Schunck (1981), a common approach in video-based seizure detection Geertsema et al. (2018), was applied. Specifically, the "PixFlow" optical flow implementation Facebook (2016) was used to compute a time series of motion vector fields for the salient clip. These vector fields were used to construct a sparse path history, with paths eliminated from the output wherever the optical flow algorithm lost confidence in the tracked image feature. The unbroken paths within a sliding window (1 s) were then analyzed for direction reversals, with each change in direction of more than 90° counted as a reversal. The resulting signal o_t was defined as the number of unbroken paths in the set P that contain at least N reversals:

$$o_t = \left|\{\, p \in P : r_t(p) \geq N \,\}\right|,$$

where r_t(p) is the number of reversals of path p within the window at time t. A value of N = 5 (i.e. five reversals within 1 s, corresponding to an oscillation frequency of 2.5 Hz) was experimentally found to be a good filter for finding oscillating movements that do not occur during normal sleep.
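The reversal count can be sketched as follows, assuming each unbroken path is given as the sequence of (x, y) positions of one tracked feature over the 1 s window (PixFlow itself is not reproduced here). A change of direction of more than 90° between two consecutive displacements is exactly a negative dot product, which this sketch uses as the test; the function names and the `min_step` filter for stationary frames are hypothetical choices.

```python
import numpy as np

def count_reversals(path: np.ndarray, min_step: float = 1e-6) -> int:
    """Count direction changes of more than 90 degrees along one tracked path.

    path: (K, 2) array of (x, y) positions of one feature, one row per frame.
    Two displacements differ by more than 90 degrees iff their dot product is negative.
    """
    steps = np.diff(path, axis=0)
    steps = steps[np.linalg.norm(steps, axis=1) > min_step]   # skip stationary frames
    if len(steps) < 2:
        return 0
    dots = np.einsum("ij,ij->i", steps[:-1], steps[1:])        # consecutive step pairs
    return int(np.sum(dots < 0))

def oscillation_value(paths, n_reversals: int = 5) -> int:
    """o_t for one window: number of unbroken paths with at least n_reversals reversals."""
    return sum(count_reversals(p) >= n_reversals for p in paths)
```

For example, a feature jiggling back and forth seven times within the window yields five or more reversals and counts toward o_t, while a feature drifting steadily in one direction yields none.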

Thresholding layer
The thresholding layer creates events from each input signal based on thresholds for amplitude, duration, and sample count. In the parameter selection phase of this model (discussed in the following section), these parameters can be optimized for a given evaluation criterion, e.g. maximal sensitivity. Based on each signal and its target evaluation criterion, a set of events was determined: oscillation events (E_o), noticeable movement events (E_m), and sound events (E_s). This provides a flexible way to combine events based on their time intersection, e.g. finding events with both sudden movement and sound, as observed in many of the tonic seizures within the dataset. This particular case (E_i) can be defined as:

$$E_i = \{\, e_m \cap e_s \;:\; e_m \in E_m,\; e_s \in E_s,\; e_m \cap e_s \neq \emptyset \,\}.$$
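A minimal sketch of the thresholding layer and of the time intersection that forms E_i is shown below. It assumes that "sample count" means the number of supra-threshold samples in an event and that short sub-threshold gaps (`max_gap_s`) may be bridged; both readings, and all names, are hypothetical rather than taken from the study.

```python
import numpy as np

def threshold_events(signal, amp_thr, min_samples=1, min_duration_s=0.0,
                     max_gap_s=0.0, fs=30.0):
    """Convert a sampled signal into a list of (start_s, end_s) events.

    Supra-threshold samples (signal > amp_thr) separated by at most max_gap_s
    are merged into one event; an event is kept only if it contains at least
    min_samples supra-threshold samples and spans at least min_duration_s.
    """
    idx = np.flatnonzero(np.asarray(signal) > amp_thr)
    if idx.size == 0:
        return []
    max_gap = int(round(max_gap_s * fs))
    # Split where consecutive supra-threshold indices are more than max_gap samples apart.
    breaks = np.flatnonzero(np.diff(idx) > max_gap + 1) + 1
    events = []
    for run in np.split(idx, breaks):
        start, end = run[0], run[-1] + 1              # end index is exclusive
        if run.size >= min_samples and (end - start) / fs >= min_duration_s:
            events.append((float(start / fs), float(end / fs)))
    return events

def intersect_events(a, b):
    """All non-empty pairwise time intersections of two event lists (e.g. E_m and E_s)."""
    out = []
    for a0, a1 in a:
        for b0, b1 in b:
            lo, hi = max(a0, b0), min(a1, b1)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)
```

Keeping only the overlapping periods is what lets two weak detectors (motion and sound) reject each other's false positives.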

Threshold selection
For this study, it was desirable to find cut-off thresholds that yield sensitivity close to 100 % while maximizing positive predictive value. This is an important distinction of this model architecture, as its purpose is to provide a narrowing view with each added layer: the top layer should catch all possible seizures at the expense of generating many false positives, and each subsequent layer filters out positives based on semiological characteristics that can be determined by a biomarker. This section describes a method for optimizing the cut-off values that determine whether these events are relevant to the patient's seizures. Further study is required to understand whether such thresholds truly generalize across patients, how many of a given patient's seizures would be needed to find meaningful thresholds through training, and whether these physically based values hold when the patient and environment change.
Under the hypothesis that a characteristic difference exists between seizures and non-seizures within the events detected by this model, tuning the model's parameters should provide a way to separate the two classes with one or more thresholds. To test this hypothesis, the value distribution of each set of extracted events was compared with the corresponding ground truth dataset to determine whether any statistically significant effects were at play for the given variable. For an event to be considered a match to the ground truth, it had to begin within 10 s of the reference standard onset (before or after) and had to end after the reference standard onset (thus eliminating short events that started and stopped before the actual seizure). To account for variability, the ground truth (all events matching seizures) was resampled into 5 folds, each containing 80 % of the original hits. This cross-validation of the threshold parameters gives insight into the stability of the cut-off value and indicates how well the model can be expected to work on future data from this patient. It also creates a basis for future datasets (from different patients and for different seizure types) to be used to find good parameters given an equivalently sized "training set" of that patient's seizures.
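The hit criterion and the fold construction can be sketched as below. Since five folds of 80 % each cannot be disjoint, this sketch reads them as five independent random subsamples drawn without replacement; that reading, the fixed seed, and the function names are assumptions for illustration.

```python
import random

def is_hit(event, seizure, tol_s=10.0):
    """Hit criterion: the event starts within tol_s of seizure onset
    (before or after) and ends after seizure onset."""
    ev_start, ev_end = event
    sz_onset = seizure[0]
    return abs(ev_start - sz_onset) <= tol_s and ev_end > sz_onset

def resample_folds(hits, n_folds=5, frac=0.8, seed=0):
    """Draw n_folds random subsets, each holding frac of the hits (without replacement)."""
    rng = random.Random(seed)
    k = round(frac * len(hits))
    return [rng.sample(hits, k) for _ in range(n_folds)]
```

With 36 reference seizures, each fold would hold 29 of them (80 % rounded).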
Given the goal of characterizing signal intensity, the possible numerical features that can be extracted from such time-series data are practically limitless. For this study, a simple descriptor was used: the Euclidean (L2) distance between the maximum and mean magnitudes of the event (both values were scaled by the sample standard deviation before calculation). This descriptor serves as a reasonable marker for intensity, as it favors both events with a sharp peak (maximum magnitude) and those with an overall high energy content (mean magnitude). These values were then used to estimate the probability density function using kernel density estimation (KDE). To retain at least 90 % sensitivity, the optimal threshold was selected as the 10th percentile of the cumulative distribution function of the KDE. As the experiment was performed 5 times for each signal (once per fold), the mean of the returned values was used in evaluation, and the range has been plotted to show variability.
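The descriptor and the percentile-based threshold can be sketched as follows. Two assumptions are made and should be treated as such: the descriptor is read as the L2 norm of the (max, mean) pair, since that reading matches the stated rationale that both a sharp peak and a high mean raise the score; and the KDE uses a Gaussian kernel with Scott's rule bandwidth (the default of SciPy's `gaussian_kde`), which the text does not specify. The scale `sigma` is passed in explicitly, and all names are hypothetical.

```python
import numpy as np

def intensity(event_values, sigma):
    """Descriptor: L2 norm of the event's (max, mean) magnitudes, each scaled by sigma."""
    v = np.abs(np.asarray(event_values, dtype=float)) / sigma
    return float(np.hypot(v.max(), v.mean()))

def kde_percentile(samples, q=0.10, grid_size=4096):
    """q-quantile of a Gaussian KDE (Scott's rule bandwidth), via a numerical CDF."""
    x = np.asarray(samples, dtype=float)
    h = x.std(ddof=1) * x.size ** (-1 / 5)            # Scott's rule in one dimension
    grid = np.linspace(x.min() - 5 * h, x.max() + 5 * h, grid_size)
    # Density on the grid = average of Gaussian kernels centred on each sample.
    z = (grid[:, None] - x[None, :]) / h
    pdf = np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2 * np.pi))
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]
    return float(np.interp(q, cdf, grid))

def fold_threshold(folds, q=0.10):
    """Mean, minimum and maximum of the per-fold thresholds."""
    thr = np.array([kde_percentile(f, q) for f in folds])
    return float(thr.mean()), float(thr.min()), float(thr.max())
```

Taking the 10th percentile of the seizure-event distribution means roughly 90 % of the seizure density lies above the cut-off, which is how the threshold encodes the sensitivity target.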
For "noticeable" motion, a total of 1525 events (total duration 630 min) were detected, with a mean duration of 24.8 s and a range of 3.3-997.8 s (σ = 43 s). Every seizure had exactly one matching motion event (100 % sensitivity). As the distributions appear to be roughly log-normal, the x-axis is plotted on a logarithmic scale and cropped around the central tendency (note that this visually skews the probability distribution, so it must be remembered that the density increases as the x value increases). The seizure samples appear to come from a different distribution than the overall collection (p < 0.001 for all seizures, p < 0.01 for those with a tonic component), so considerable separation power is expected from this feature. The optimal threshold was calculated from the tonic seizure folds and was found to be 0.0092 (range 0.0081-0.0104, σ = 0.0011). The small variance in this range suggests that the intensity measure is a good fit to the problem. The density distributions of the seizure samples and of all motion events are presented in Fig. 2.
For "audible" sound, a total of 5681 events were detected (total duration 185 min), with a mean duration of 1.9 s and a range of 0.6-257.5 s (σ = 8.5 s). A total of 48 sound events matched seizures, with 34/36 seizures (94 % sensitivity) detected by one or two sound events according to the hit criterion. This suggests that the missed seizures had no audible sound at clinical onset, but it may also indicate that the signal would benefit from adaptive filtering that adjusts the noise floor as ambient sound levels change throughout the recording. The intensity descriptor showed separation between the class distributions, as shown in Fig. 3, with non-seizure events in gray, all seizures in green, and all "guttural" seizures in orange.
Visually, seizures appear more likely to contain loud sound samples, which is intuitively expected. The difference in density estimates is statistically significant (p < 0.001 for all seizures, p < 0.01 for those with guttural sounds). As the most audible seizures were the target of the model, the guttural dataset was used; the optimal value was 0.025 (range 0.0074-0.041, σ = 0.013). The rather large variance of the range suggests either that the intensity measure does not adequately capture the seizure sound feature, or that such sounds naturally vary widely.
Finally, for the richest event type used in this study, E_o, representing "visible oscillation", only 25 events were detected (total duration 3 min 25 s), with a mean duration of 8.1 s and a range of 3.6-25.4 s (σ = 6.4 s). As oscillations tend to occur later in the seizure (particularly in FTBTC seizures), the hit criterion was applied to the encapsulating motion event E_m instead of the start of the oscillation. Of the 36 seizures, 11 had exactly one oscillation event according to the hit criterion (31 % sensitivity), but all 10 seizures with a clonic component were detected (as well as one tonic seizure, apparently due to the oscillation of the caregiver patting the patient on the back). The distributions are not significantly different (p > 0.1), which is expected given that nearly half of the detections correspond to a seizure. When compared to events that did not match a clonic seizure, however, the difference in distributions appears significant (p < 0.02). All oscillation events without a clonic correspondence are displayed in gray, and hits with a clonic component are shown in orange. The optimal threshold was calculated to be 0.0037 (range 0.0-0.011, σ = 0.0042). The variance is somewhat high, suggesting that the chosen intensity measure may not be optimal for the problem. The probability density distribution of oscillation events is presented in Fig. 4.
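Applying the hit criterion to the encapsulating motion event, rather than to the oscillation itself, can be sketched as below. The sketch assumes every oscillation event lies entirely within some motion event; the function names are hypothetical.

```python
def encapsulating_event(osc_event, motion_events):
    """Return the first motion event (start_s, end_s) that fully contains osc_event, else None."""
    o0, o1 = osc_event
    for m0, m1 in motion_events:
        if m0 <= o0 and o1 <= m1:
            return (m0, m1)
    return None

def oscillation_hit(osc_event, motion_events, seizure_onset, tol_s=10.0):
    """Judge an oscillation event by the hit criterion applied to its host motion event."""
    host = encapsulating_event(osc_event, motion_events)
    if host is None:
        return False
    start, end = host
    return abs(start - seizure_onset) <= tol_s and end > seizure_onset
```

For example, an oscillation starting 22 s after seizure onset would fail the 10 s criterion on its own, but still counts as a hit if its host motion event began at the tonic onset.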

Results
To help illustrate and understand the statistical performance of the model, it is important to document some of the observable

Algorithm-based description of seizures
As Fig. 5 shows, each visually depicted seizure category had a distinctive signal profile. All seizure signal profiles in the dataset are provided in the supplementary material.
Seizures with one or more clonic phases showed a prominent amount of oscillation o_t compared to the bilateral tonic seizures. The focal clonic seizure on the left in Fig. 5 shows an oscillating phase without further semiological findings. The observable increase in the sound signal s_t is due to movements of the bed caused by the shaking patient.
Seizures with a bilateral tonic phase revealed a sudden, simultaneous increase in the amount of movement (distinctive spike in m_t) and sound (distinctive spike in s_t). There was no notable oscillation present in the scene. The two examples of focal tonic seizures in Fig. 5 depict the signal difference caused by the guttural sound: the movement-related signals were similar, but the bilateral tonic seizure with guttural sounds led to an altered sound profile, in which the guttural sounds increased the variance of the signal after the initial, sudden sound onset.
Generalized convulsive seizures comprised the following features: stiffened arms were raised slowly before the patient entered the clonic phase, either without significant vocalization (Fig. 5, bilateral tonic to bilateral clonic) or accompanied by distinct guttural sounds. Furthermore, the patient had one seizure that evolved from bilateral clonic to bilateral tonic-clonic, with the tonic stiffening of the body observable as a lack of movement between the two oscillating phases.
The signal profiles also differentiated between seizure and non-seizure periods, and this was exploited during the parameter selection phase by optimizing amplitude and duration thresholds for each signal separately. The model was derived from seizures categorized by E.H., S.N., and J.P., and the final parameters quantify the minimum requirements for a signal segment to be counted as a seizure candidate.

Seizures with a tonic component
The most common seizure type in our registration was the focal tonic seizure. Tonic seizures consist of a sustained contraction of one or more muscle groups, usually lasting >3 s and leading to "positioning" Noachtar and Peters (2009); Fisher et al. (2017). In our patient, focal tonic seizures manifested as a sudden movement simultaneous with an exhalation sound, recognizable in both the movement and sound analyses (Fig. 5) without any change in the oscillation signal. The intersection model E_i (events containing both sudden movement and sound) was the best predictor of this seizure type. This model also detected some seizures that contained clonic components, and these were considered true positives if they manifested a clear tonic phase. The PPV of events produced by the sound signal alone (E_s) was only 2.0 %, while the motion model (E_m) yielded a PPV of 3.9 %. The sound feature alone produced 1938 false positives, whereas the motion model produced 661. Combining the signals (E_i) greatly improved the results: by considering only the time periods where these events intersected, the PPV increased to 8.8 % and the number of false positives decreased to 268. The sensitivity of this model remained rather high at 90 %, missing only 3 seizures. Such a model would be a good basis for further models, or even useful as a clinical aid at about 1 false discovery per hour.
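The per-hour false discovery figure follows from simple arithmetic. The sketch below assumes the rate is normalized by the total recorded time (262 h 8 min), which reproduces the 1.02/hour reported for the tonic model from its 268 false positives; the helper names are illustrative.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: fraction of detections that are true seizures."""
    return tp / (tp + fp)

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference seizures that were detected."""
    return tp / (tp + fn)

def false_detections_per_hour(fp: int, recorded_hours: float) -> float:
    return fp / recorded_hours

RECORDED_HOURS = 262 + 8 / 60                                   # 262 h 8 min of recordings
rate_ei = false_detections_per_hour(268, RECORDED_HOURS)        # E_i false positives
print(f"{rate_ei:.2f} false detections per hour")               # prints 1.02 ...
```

The same function shows why the intersection helps: cutting false positives from 661 (E_m) to 268 (E_i) lowers the rate from about 2.5 to about 1.0 per recorded hour.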

Seizures with a clonic component
Together with focal to bilateral tonic-clonic seizures, focal clonic seizures formed the second most common category. By definition, clonic seizures consist of more or less regular, repeated, short contractions of various muscle groups Noachtar and Peters (2009); Fisher et al. (2017). Seizures with one or more clonic phases were detected using the oscillation signal. Two seizures began with very short tonic activity followed by clonic activity; these seizures were classified as tonic seizures because the tonic phase was considered the earliest prominent motor feature according to ILAE instructions. In these two cases, a sudden increase in total movement was observed before the oscillation signal changed. For the oscillation event model (E_o), sensitivity was perfect (100 %), with a reasonably high PPV (50 %) and a low false discovery rate (0.038/hour, or about two per week).
The four secondarily generalized seizures, marked as focal to bilateral tonic-clonic (FTBTC) according to the ILAE definition, were recognized with the motion model (E_m), which detected the sudden seizure onset, while the clonic phase was recognized with the oscillation event model (E_o). Using both the motion and oscillation events to detect the tonic and clonic phases, both the sensitivity and the PPV of FTBTC detection were also 100 %.

Other seizure components
Two complex motor seizures with unspecific motor features occurred; these were categorized as unclassified motor seizures according to the ILAE specification. These seizures, as well as myoclonic jerks, were not targeted by our system and were not used in selecting the event parameters. However, a single myoclonic jerk was detected by the motion model: despite the small sample size, this anecdotal evidence suggests that such a model may also serve as a myoclonic seizure detector.

Overall statistics
The resulting statistics, obtained with the previously established cut-off thresholds, are presented in Table 2. Along with standard accuracy scores, a review time is provided to estimate the effort required to determine the salience of the detections. It is calculated as the total event time, with each event counted for at least 10 s (as required by the hit criterion) and at most 20 s (on the assumption that a reviewer can determine salience within that period).
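The review-time rule amounts to clamping each event's duration to the 10-20 s window before summing. A minimal sketch, with a hypothetical function name:

```python
def review_time_s(event_durations_s, min_s=10.0, max_s=20.0):
    """Estimated review effort: each event counts for its duration clamped to [min_s, max_s]."""
    return sum(min(max(d, min_s), max_s) for d in event_durations_s)

# A 3.3 s event counts as 10 s, a 15 s event as 15 s, and a 997.8 s event as 20 s.
print(review_time_s([3.3, 15.0, 997.8]))  # → 45.0
```

The upper clamp keeps a few very long motion events (up to about 1000 s in this dataset) from dominating the effort estimate.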
As expected from the threshold selection, all models gave at least 90 % sensitivity at the selected operating point. The oscillation model exhibited good PPV for clonic seizures, with well under one false discovery per night. The tonic model, based on intersections of the motion and sound events, gave about one false discovery per hour, a performance likely to be acceptable as a clinical aid for closer seizure tracking. Although the basic motion and sound models alone do not achieve high PPV, their combination clearly demonstrates the filtering power of combining weak estimators into a stronger one. Furthermore, they act as viable first-pass filters (returning less than 2 % of the original recorded material) for later steps in an algorithmic pipeline. While sensitivity appears to be adequate, more indicative intensity features could likely be extracted from these biomarkers to increase PPV.
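The tonic model is described as an intersection of motion and sound events. Assuming each model outputs a sorted list of non-overlapping (start, end) intervals in seconds (an assumption about the data layout, not a documented format), the intersection can be computed with a standard two-pointer sweep; the function name and example intervals are illustrative:

```python
def intersect_events(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) intervals.

    Returns the overlapping segments, i.e. the times flagged by both models.
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # advance whichever interval finishes first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


motion = [(10, 30), (100, 110), (500, 520)]
sound = [(25, 40), (200, 215), (505, 512)]
print(intersect_events(motion, sound))  # [(25, 30), (505, 512)]
```

Only periods flagged by both weak detectors survive, which is why the combined model discards most of the isolated motion-only and sound-only false detections.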

Discussion
In this proof-of-concept study, we introduce a novel multimodal registration system based on long-term nocturnal video and audio home monitoring. This system detects nocturnal seizures with prominent motor features by integrating three distinct modes of analysis: sound, oscillation, and sudden movement. This multimodal approach has the potential to discriminate between seizures of tonic, clonic, and tonic-clonic semiologies, as well as their evolution. In addition, further distinctions are possible by analyzing more detailed semiological features and the intra-event evolution of the seizures. A multimodal system may thus also be able to detect more subtle seizure types, such as automatisms, although a naive implementation would likely increase false positives because such signals are of lower intensity. However, further development and testing are needed for more reliable detection of subtle seizures and differentiation between seizure types.
According to previous studies, the sensitivity varies depending on [...]. Analysis of epileptic seizure semiology relies on qualitative criteria, which makes it prone to inter-observer discrepancy Bleasel et al. (1997). The capacity of a detection system to differentiate between seizure types would also be helpful, but this was not reported in previous studies. Our system measures seizure features quantitatively, which makes it possible to detect changes in seizure severity or seizure propagation. Quantitative analysis of movements during video-recorded seizures has been applied to develop objective criteria for the analysis of seizure semiology Cunha et al. (2016), and this might be useful in the presurgical workup or in therapy outcome assessment. Current devices that detect tonic-clonic seizures use some form of oscillation measurement as a biomarker. While highly specific, this method has the disadvantage that a seizure is first detected during the clonic phase; this higher latency makes the technique less useful as the basis for an alarm. If a multimodal model such as the one described here were implemented in an alarm system, the tonic biomarkers (sudden movement and sound) could potentially detect the seizure's onset earlier, as well as support a wider range of seizure types (e.g. seizures with tonic but no clonic activity). For a detection system, by contrast, accurate detection of both prominent and subtle seizures is the main priority, even at the cost of more false positives. In this study, the system was thus tuned to a sensitivity of 90 % or greater for seizures with either a clonic or a tonic component, with much better PPV when clonus was present. If this approach were developed into an alarm, minimizing the false detection rate would naturally become a more important target than achieving perfect sensitivity.
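Tuning to 90 % sensitivity was done by randomly resampling the event hits for each signal and choosing a threshold that kept an expected 90 % sensitivity over the sample distribution. The exact procedure is not given in this section, so the following is only one plausible bootstrap reading: per-seizure peak scores are resampled with replacement, the highest threshold still detecting at least 90 % of each resample is recorded, and the mean of those thresholds is returned. The function name, the use of peak scores, and the averaging step are our assumptions.

```python
import math
import random
import statistics


def bootstrap_threshold(seizure_scores, target_sensitivity=0.9, n_resamples=1000, seed=0):
    """Estimate a detection threshold that keeps an expected target sensitivity.

    seizure_scores: peak model output for each known seizure (assumed input).
    A seizure counts as detected when its score is >= the threshold.
    """
    rng = random.Random(seed)
    n = len(seizure_scores)
    # minimum number of seizures that must be detected (small epsilon guards float error)
    n_detect = math.ceil(target_sensitivity * n - 1e-9)
    k = n - n_detect  # number of seizures allowed to fall below the threshold
    thresholds = []
    for _ in range(n_resamples):
        sample = sorted(rng.choices(seizure_scores, k=n))
        # sample[k] is the highest threshold that still detects >= n_detect seizures
        thresholds.append(sample[k])
    return statistics.mean(thresholds)


scores = [float(s) for s in range(1, 11)]  # illustrative scores for 10 seizures
print(bootstrap_threshold(scores))
```

Averaging over resamples smooths the threshold against the idiosyncrasies of a small seizure set; with only a few dozen seizures, a single empirical percentile would be noticeably less stable.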
Thus, the proposed model could eventually be extended into a seizure classification system, simply by observing the correlation of multiple biomarker models or specific output ranges within a model. Perhaps most importantly, this model may prove useful as a clinical aid for finding subtle seizures in long recordings, as well as for dramatically reducing the amount of material that requires manual review in general.
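As an illustration of classification from the correlation of biomarker models, a coarse label could be assigned from which models fired for a given event. The rule table below is hypothetical: it simply encodes the associations stated in this paper (motion and sound as tonic biomarkers, oscillation as the clonic biomarker) and is not a validated classifier.

```python
def classify(fired):
    """Map the set of biomarker models that fired for an event to a coarse label.

    Hypothetical rules: tonic requires both motion and sound (the tonic model's
    intersection); clonic requires oscillation.
    """
    has_tonic = {"motion", "sound"} <= fired
    has_clonic = "oscillation" in fired
    if has_tonic and has_clonic:
        return "tonic-clonic"
    if has_clonic:
        return "clonic"
    if has_tonic:
        return "tonic"
    return "unclassified"


print(classify({"motion", "sound", "oscillation"}))  # tonic-clonic
print(classify({"motion"}))  # unclassified
```

Output ranges within a single model (e.g. oscillation amplitude) could refine such labels further, but which ranges correspond to which semiologies would have to be established on data.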
Nevertheless, marker-free video-based methods have some limitations. The camera must be placed so that it observes the patient's body and limbs to detect movement. If the patient has a seizure outside the area of interest, seizure recognition relies entirely on the sound signal, which makes detection challenging, and adjusting the parameters to compensate may increase the number of false positives. Small movements can be difficult to recognize with marker-free systems, especially if part of the patient's body is covered by a blanket. Lighting conditions in a home setting usually change, and because most commonly available cameras adjust their frame rate to the lighting, the algorithm must adapt to changing video input. A current challenge for video detection systems is the recognition of seizures with more subtle motor features; systems limited to prominent motor seizures benefit only part of the patient population Ahmedt-Aristizabal et al. (2018). Reliable detection of subtle seizures might therefore be one of the next development steps in order to serve a larger portion of epilepsy patients. Because the recordings in our study were mainly captured during sleep, the system mostly encountered normal nocturnal motion in addition to seizure-related movements. During the daytime, patients move more or may lie on the bed without sleeping, which could make it harder for the system to detect daytime seizures with the same accuracy. However, daytime seizures are more easily noticed by family members or caregivers and are associated with a lower SUDEP risk than nighttime seizures. In addition, the dataset consists of recordings from only one patient and is therefore patient-specific. Datasets with multiple patients and more diverse seizures could help to better estimate the PPV of the motion and audio signals, given the inter-individual variability of ictal movements and sounds, which range from whispering to screaming and from smacking to generalized convulsions. However, lowering a detection threshold to capture subtle seizures (e.g. the audio signal threshold) can increase the number of false positives Arends et al. (2016). The single-patient design limits the statistical power of our results and indicates the need for testing on larger patient groups and seizure datasets.

Conclusions
In conclusion, this proof-of-concept study introduces a methodological framework for deriving biomarkers from a simple video-based home registration system, and demonstrates the use of these biomarkers to model and automatically detect a spectrum of nocturnal epileptic seizures with motor components. Because this technique is non-invasive and relatively low cost (in terms of labor, hardware, and computational power), and because of its high sensitivity, it offers the potential for both cost savings and improved quality of care for those suffering from nocturnal seizures. Through the provided automated analysis, the patient (as well as their physician and caregivers) could be kept informed of their seizure frequency and characteristics over the long term. Our intent is to further develop this system, with the goal of improving the models to increase positive predictive value as well as sensitivity to a wider range of seizure types and semiologies. Furthermore, the amount of individualization and calibration needed when applying this model to unseen data warrants careful investigation. We have already begun running this method on a larger cohort (15 to 20 similarly monitored patients), which will allow us to conduct a phase 2 validation study Beniczky and Ryvlin (2018) in which the models are trained in a generalized manner with separate training and test sets. This will allow us to validate the efficacy of this model (and perhaps the level of individualization needed) across a more diverse population of epilepsy patients.
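For a multi-patient phase 2 study, "separate training and test sets" is most informative when the split is made at the patient level, so that no patient's nights appear in both sets and the test score reflects generalization to unseen patients. The sketch below assumes recordings are stored as (patient_id, recording) pairs; the function name, the 30 % test fraction, and the identifiers are illustrative assumptions, not details of the planned study.

```python
import random


def patient_split(recordings, test_fraction=0.3, seed=0):
    """Split (patient_id, recording) pairs so that no patient appears in both sets."""
    patients = sorted({pid for pid, _ in recordings})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_ids = set(patients[:n_test])
    train = [r for r in recordings if r[0] not in test_ids]
    test = [r for r in recordings if r[0] in test_ids]
    return train, test


# Illustrative cohort: 15 patients with two recorded nights each
recordings = [(f"P{p:02d}", f"night{n}") for p in range(15) for n in range(2)]
train, test = patient_split(recordings)
print(len(train), len(test))
```

A night-level random split would instead let the model see other nights of the same patient during training, which would overstate performance on genuinely unseen data.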

Declaration of Competing Interest
AK, AH, and JB are employees of Neuro Event Labs, the company that provided the equipment and technology used in the study. PO has provided medical consultation for Neuro Event Labs. JP is a shareholder of Neuro Event Labs. No other authors claim a conflict of interest.