Radar-based sleep stage classification in children undergoing polysomnography: a pilot-study

Study objectives: Unobtrusive monitoring of sleep and sleep disorders in children presents challenges. We investigated the possibility of using ultra-wide band (UWB) radar to measure sleep in children. Methods: Thirty-two children scheduled to undergo a clinical polysomnography participated; their ages ranged from 2 months to 14 years. During the polysomnography, the children's body movements and breathing rate were measured by a UWB radar. A total of 38 features were calculated from the motion signals and breathing rate obtained from the raw radar signals. Adaptive boosting was used as the machine learning classifier to estimate sleep stages, with polysomnography as the gold standard method for comparison. Results: With the data of all participants combined, this study achieved a Cohen's Kappa coefficient of 0.67 and an overall accuracy of 89.8% for wake and sleep classification, a Kappa of 0.47 and an accuracy of 72.9% for wake, rapid-eye-movement (REM) sleep, and non-REM sleep classification, and a Kappa of 0.43 and an accuracy of 58.0% for wake, REM sleep, light sleep and deep sleep classification. Conclusion: Although the current performance is not yet sufficient for clinical use, UWB radar is a promising method for non-contact sleep analysis in children.



Introduction
Sleep is thought to play a crucial role in infants' and children's brain development [1,2]. The human sleep cycle has two main stages: rapid-eye-movement (REM) sleep and non-REM (NREM) sleep [3]. The latter can be subdivided into three stages: N1 and N2 (light sleep) and N3 (deep sleep, slow wave sleep). REM sleep and NREM sleep each seem to contribute to different aspects of brain maturation [4]. During early brain development REM sleep is thought to provide important endogenous neural stimulation, laying the groundwork for early neural circuitry [4]. NREM sleep seems to be involved in regulating synaptic homeostasis [4].
While acute or transient changes in sleep may impact a person's cognitive and behavioral performance [5,6], long-term sleep deprivation or sustained changes in sleep patterns are risk factors for several diseases and suboptimal development (eg, diabetes, obesity, mood swings, suboptimal growth and school performance) [7,8]. Analyzing the sleep duration and the distribution of sleep stages may well offer useful clinical insights and a deeper understanding of sleep physiology and pathology in young children.
To date, a level 1 polysomnography (PSG) with clinical evaluation remains the gold standard method to study sleep. PSG has been widely used in clinical and laboratory settings through multichannel bio-signal recordings manually assessed by sleep technologists. Performing a PSG can be a very challenging procedure for children, and a level 1 full PSG cannot be performed easily outside a laboratory setting [9-11].
These limitations drive the demand for an unobtrusive, or even non-contact, sleep monitoring system for home monitoring. Assessing sleep in the child's safe environment and without sensors attached to the body for several days can provide valuable information and could serve as a screening method for the necessity of an in-hospital PSG. Several techniques have been suggested for this purpose, such as actigraphy, radar, ballistocardiography, or Doppler laser [12]. These techniques make use of non-EEG vital signs and behavioral measurements that are easier to obtain unobtrusively, such as heart rate, respiration and body movements.
Radar is a contactless method for the monitoring of human vital signs. It relies on the modulation, caused by chest-wall displacement, of a radio signal sent by a transmitter towards the patient. Depending on the type of signal it transmits, four types of radar can be distinguished: pulsed wave, continuous wave (CW), frequency-modulated continuous-wave (FMCW) or stepped-frequency continuous-wave (SFCW) [13].
CW radar systems, such as Doppler radar, have frequently been used for vital sign monitoring because of their relatively low power consumption and simple radio architecture [14]. Several studies have shown that CW radar is a feasible method for vital sign estimation [15-19]. However, complex signal processing techniques are required to accurately detect and measure these vital signs, and these techniques increase the power consumption significantly. These technical difficulties hinder the use of CW radar for widespread public usage [13]. Kagawa et al. investigated sleep stage classification using two Doppler radars, but showed lower accuracy compared to other non-contact methods [20].
Moreover, frequency modulation of the radar is needed for the detection of multiple subjects and human presence [21]. SFCW is not commonly used for human vital sign detection. FMCW, on the other hand, has already demonstrated its potential for vital sign monitoring [22-25]. Turppa et al. [24] demonstrated that FMCW can provide accurate vital sign monitoring during sleep. However, unlike the radars discussed above, pulse-based radars are able to detect vital signs of humans behind solid objects and have high power efficiency [13]. Furthermore, a comparison by Wang et al. showed that impulse-radio ultra-wide band (IR-UWB) radar had better accuracy ratios than FMCW [26].
UWB radar is the most commonly used pulse-based radar and is considered a promising and reliable technique that can capture patients' vital signs without contacting the body [27]. By measuring reflected RF signals, the distance and movement of an object can be determined [14]. This way, movements of the chest and abdomen can be quantified. The UWB radar technique has a high range resolution and a strong penetration with good capability to distinguish between multiple targets [28].
Several studies have used radar technology to measure respiratory information [29-34]. One of these, by Immoreev and Tao [33], measured the agreement between UWB radar and impedance pneumography in four infants under the age of two months. The authors reported that >95% of sample pair differences lay within the ±95% confidence interval. Kang et al. used IR-UWB radar to measure sleep and sleep apnea in 21 adults and compared it with PSG, with good results [35]. At the moment, however, studies validating UWB radar sleep data against PSG as the gold standard are still very limited, especially in children.
The aim of our study was to make a pediatric sleep stage classification algorithm based on UWB radar data with PSG as gold standard method; a secondary aim was to assess the UWB radar accuracy in automatically determining sleep stages.

Study population, inclusion and exclusion criteria
The study was conducted at the Erasmus MC-Sophia Children's Hospital Rotterdam. The institutional medical ethics review board approved this study (Rotterdam, the Netherlands. File No. MEC-2017-159). Children scheduled to undergo a clinical PSG were eligible for inclusion in this study. Informed consent was given by the parents or guardians of all children, and children from the age of 12 years also provided consent themselves. A subject was excluded in case of technical failure of the sensor or the PSG.

Study set-up
The UWB sensor was connected to a stand-alone laptop that stored the radar data and was placed on a stand at the head of the bed, at a distance of approximately 1.5 m from the child, as shown in Fig. 1. While the radar was functioning, the clinically indicated level 1 PSG was carried out as usual.

Polysomnography
The level 1 PSG included the following: ECG, 14-channel EEG, nasal airflow (thermistor), video recording, chest and abdominal wall motion (plethysmography), a capillary blood gas test, arterial blood oxygen-hemoglobin saturation using pulse oximetry (SpO2), and transcutaneous partial pressure of carbon dioxide (tcpCO2). The monitoring was a single-night recording. Sleep stages were scored on each 30-s non-overlapping epoch according to the rules of the American Academy of Sleep Medicine (AASM) [3]. Respiratory events were also scored according to the latest AASM criteria, and the apnea-hypopnea index (AHI) was calculated [3]. The child was classified as having obstructive sleep apnea (OSA) if the obstructive AHI was higher than 1 event per hour [37], and central sleep apnea (CSA) if the central AHI was higher than 1 event per hour [38]. The limited sample size restricted the comparison to only OSA vs. non-OSA, and CSA vs. non-CSA. These groups were compared to the total group. To explore the effect of age, the population was split into a group younger than 1 year old and a group older than 1 year old.
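The AHI criterion described above can be illustrated with a short sketch. It assumes the usual definition of the AHI as the number of scored events per hour of sleep, and it uses the AASM stage labels W, N1, N2, N3 and R for the 30-s epochs. The function names are hypothetical and are not taken from the scoring software used in this study.

```python
EPOCH_SECONDS = 30  # epoch length used for sleep scoring in the text


def apnea_hypopnea_index(n_events, epoch_stages):
    """Return events per hour of sleep.

    n_events     -- number of scored apneas/hypopneas (obstructive or central)
    epoch_stages -- one stage label per 30-s epoch; "W" marks wake (assumed labels)
    """
    sleep_epochs = sum(1 for stage in epoch_stages if stage != "W")
    hours_asleep = sleep_epochs * EPOCH_SECONDS / 3600.0
    if hours_asleep == 0:
        raise ValueError("recording contains no sleep epochs")
    return n_events / hours_asleep


def exceeds_ahi_threshold(ahi, threshold=1.0):
    """Apply the 'higher than 1 event per hour' criterion quoted in the text."""
    return ahi > threshold
```

For example, 12 obstructive events in a recording with 8 h of scored sleep give an obstructive AHI of 1.5, which exceeds the threshold of 1 event per hour.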

Radar system
The XeThru X2M200 and X4M200 radar modules (Novelda AS, Oslo, Norway) [39] were used for this research. They rely on observing periodic movements when a person is resting and breathing. These modules make use of pulse-Doppler processing, in which coherent pulses produced by a local oscillator are transmitted through an antenna. These pulses propagate through space until they meet reflectors. Some of the transmitted energy is reflected back to the receiver along with phase modulation caused by motion. The received RF signals are then down-converted by the local oscillator to a baseband signal. The baseband signal is split into two quadrature signals [40] (see Fig. 2). The XeThru modules we used convert the RF signals into baseband frequencies of 20 Hz in the X2M200 and 17 Hz in the X4M200. Both radar modules provide in-phase and quadrature (IQ) or amplitude/phase (AP) data.
The amplitude baseband data were used in this study because magnitudes carry the most comprehensible information. Body motion and respiration rate were obtained by processing these baseband data. The motion signal was created by integrating the differences between two subsequent time frames across the amplitude baseband signals [36]. Here, MVM_win denotes the movement quantity within a time window win at a specific sample point t, A the amplitude of the baseband data, and t_i the sample point in time.
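A minimal sketch of such a motion signal is given here for illustration. It is not the exact expression from [36]: it assumes that the frame-to-frame difference is taken as an absolute value, summed over all range bins, and accumulated over a trailing window of length win. The function names and the data layout (one list of range-bin amplitudes per frame) are hypothetical.

```python
def motion_signal(frames, frame_rate_hz, window_s):
    """Movement quantity per frame, accumulated over a trailing window.

    frames        -- list of frames, each a list of amplitudes A per range bin
    frame_rate_hz -- baseband frame rate (e.g. 20 or 17 Hz in the text)
    window_s      -- window length in seconds (e.g. 1 for MVM1, 20 for MVM20)
    """
    # Absolute amplitude change between subsequent frames, summed over range bins.
    diffs = [0.0]  # the first frame has no predecessor
    for previous, current in zip(frames, frames[1:]):
        diffs.append(sum(abs(c - p) for p, c in zip(previous, current)))

    window = max(1, int(round(window_s * frame_rate_hz)))
    signal, running = [], 0.0
    for i, d in enumerate(diffs):
        running += d
        if i >= window:
            running -= diffs[i - window]  # drop the sample leaving the window
        signal.append(running)
    return signal


def normalize_by_max(signal):
    """Scale a motion signal by its maximum value (all zeros stay zeros)."""
    peak = max(signal)
    return [0.0] * len(signal) if peak == 0 else [v / peak for v in signal]
```

Under these assumptions, calling `motion_signal` with `window_s=1` and `window_s=20` would yield signals comparable to MVM1 and MVM20.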
A Fast Fourier Transform (FFT) analysis on amplitude baseband data was then performed to create a 'range-frequency' matrix. Static objects were removed, and small movements of breathing were measured. By localizing the peak of the range-frequency matrix, the breathing rate (BR, in respirations per minute) was detected (see Fig. 3).
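The peak-localization step can be sketched as follows. This is an illustration under stated assumptions, not the processing chain of the radar modules: a naive discrete Fourier transform stands in for the FFT to keep the example dependency-free, static objects are removed by subtracting the mean of each range bin, and the search is limited to an assumed breathing band of 0.1 to 1.0 Hz. The function name and the band limits are hypothetical.

```python
import cmath
import math


def breathing_rate(frames, frame_rate_hz, band_hz=(0.1, 1.0)):
    """Locate the strongest spectral peak across range bins.

    frames -- list of frames, each a list of amplitudes per range bin
    Returns (breaths_per_minute, range_bin), or (None, None) if no peak is found.
    """
    n = len(frames)
    best_power, best_freq, best_bin = 0.0, None, None
    for r in range(len(frames[0])):
        series = [frame[r] for frame in frames]
        mean = sum(series) / n
        series = [a - mean for a in series]  # assumed static-object removal
        for k in range(1, n // 2 + 1):
            freq = k * frame_rate_hz / n
            if not band_hz[0] <= freq <= band_hz[1]:
                continue
            # One row/column entry of the range-frequency matrix (DFT magnitude).
            coeff = sum(a * cmath.exp(-2j * math.pi * k * i / n)
                        for i, a in enumerate(series))
            if abs(coeff) > best_power:
                best_power, best_freq, best_bin = abs(coeff), freq, r
    if best_freq is None:
        return None, None
    return best_freq * 60.0, best_bin
```

The frequency resolution equals the frame rate divided by the number of frames, so a 30-s segment resolves the breathing rate in steps of 2 respirations per minute.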

Feature extraction
Thirty-eight motion and respiratory features were extracted in this study, summarized in Table 1. Nine features came from the motion signals and twenty-nine from the BR. [45,46]. Lastly, a robust algorithm, Katz's fractal dimension, was applied to estimate the fractal dimension of the signals [47]. All these features were subjected to a Z-score normalization by subtracting the mean of the feature values and dividing by their standard deviation for each recording. The normalization step aims to decrease physiological variance from subject to subject by reducing data redundancy and improving data integrity.
2) Motion features: From the analysis of the baseband data, two motion signals were obtained by the motion detection algorithm with different settings. One was the signal with a window of 1 s, which represented the amount of movement within 1 s (MVM1), also called the fast movement signal. The other was the movement quantity for a time window of 20 s (MVM20). These motion signals were normalized by their maximum values. The average, area, variance, and entropies of the motion signals were computed. In addition, we computed the fast movement ratio, ie, the relative time within 10 min in which MVM1 exceeds a threshold of 10% of the 20-s window signal.
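Two of the steps named above, the per-recording Z-score normalization and Katz's fractal dimension, can be sketched as follows. The sketch assumes the population standard deviation for the Z-score and the common one-dimensional simplification of Katz's method, in which curve length and diameter are computed from amplitude differences only. Whether the study used these exact variants is not stated, and the function names are hypothetical.

```python
import math


def zscore(values):
    """Z-score one feature over a single recording (population std assumed)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]


def katz_fd(signal):
    """Katz's fractal dimension, 1-D amplitude-only simplification.

    FD = log10(n) / (log10(n) + log10(d / L)), where L is the total curve
    length, d the largest distance from the first sample, n the step count.
    """
    steps = [abs(b - a) for a, b in zip(signal, signal[1:])]
    total_length = sum(steps)
    if total_length == 0:
        return float("nan")  # flat signal: dimension undefined here
    n_steps = len(steps)
    diameter = max(abs(v - signal[0]) for v in signal[1:])
    denominator = math.log10(n_steps) + math.log10(diameter / total_length)
    if denominator == 0:
        return float("nan")
    return math.log10(n_steps) / denominator
```

A straight line yields a dimension of 1, while more irregular signals give larger values.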

Classifier
An adaptive boosting (AdaBoost) algorithm based on decision trees was adopted for the classification tasks because of its good classification performance, low susceptibility to overfitting, and relatively high computational efficiency. For comparison, we used two other widely used classification algorithms, the k-nearest neighbors (KNN) algorithm and the support vector machine (SVM).
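To illustrate the boosting principle, a minimal sketch of discrete AdaBoost is given here. It is a simplification and not the classifier configuration used in this study: it handles only a two-class problem (labels +1 and -1, eg, wake versus sleep), and it uses depth-1 decision trees (stumps) as weak learners. All function names are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None  # (weighted_error, feature_index, threshold, polarity)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] >= thr else -pol


def adaboost_fit(X, y, n_rounds=10):
    """Train a weighted ensemble of stumps; y must contain +1 / -1 labels."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # weight of this weak learner
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1
```

Each round re-weights the epochs that the previous stumps misclassified, so later stumps concentrate on the harder cases. The multi-class tasks in this study would require a multi-class extension of this scheme.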

Evaluation
Leave-one-out cross validation was conducted: in each iteration, one patient served as the test set while the remaining patients formed the training set. Classification results and performance metrics were obtained for each patient in the corresponding test iteration. Three sleep stage classification tasks were performed and evaluated: 1) Wake-sleep (WS) classification, 2) Wake-REM-NREM (WRN) classification, and 3) Wake-REM-Light-Deep (WRLD) classification. To evaluate the performance of the classifiers, the overall accuracy was computed. As we were dealing with class imbalance, Cohen's Kappa coefficient of agreement was also computed. In addition, to show the discriminative power of a single feature towards the output, we calculated the feature importance for each iteration. Furthermore, an unpaired Wilcoxon rank-sum test served to test for significant differences in performance between different groups.
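The evaluation scheme described above can be sketched as follows. The mapping of stages to task labels follows the definitions in the Introduction (N1 and N2 as light sleep, N3 as deep sleep). The stage labels W, N1, N2, N3 and R, the label strings, and the function names are assumptions made for illustration.

```python
from collections import Counter

# PSG stage -> class label for each of the three classification tasks.
TASKS = {
    "WS":   {"W": "wake", "R": "sleep", "N1": "sleep", "N2": "sleep", "N3": "sleep"},
    "WRN":  {"W": "wake", "R": "REM", "N1": "NREM", "N2": "NREM", "N3": "NREM"},
    "WRLD": {"W": "wake", "R": "REM", "N1": "light", "N2": "light", "N3": "deep"},
}


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per iteration."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1:
        return float("nan")  # only one class present in both label sets
    return (p_o - p_e) / (1 - p_e)
```

Because Kappa discounts the agreement expected by chance, a classifier that labels nearly every epoch as sleep can reach a high accuracy on an imbalanced recording while still scoring a low Kappa.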

Subjects
Initially, 40 patients were included. Six had to be excluded because of software issues; these issues were resolved for the subsequent measurements. Two patients with severe mental retardation did not have a single episode of normal REM sleep during the recording and were excluded as well. Thus, data of 32 patients were analyzed. Characteristics of these patients are presented in Table 2.
AdaBoost and SVM showed similar performance, but because the computational cost of AdaBoost was much lower than that of SVM, we used AdaBoost in the remainder of this study.
Fig. 5 shows the importance of each feature within the model for the three classification tasks. The feature numbers in Fig. 5 correspond to the features listed in Table 1. This indicates that the motion features and respiratory features are complementary. In general, motion features occupy the leading positions in the classifications due to their capability of distinguishing between wake and sleep states. For example, the motion features average of the MVM20 signal within 30 s (#30), variance of the MVM1 signal within 30 s (#35), and fast movement ratio within the last 10 min (#38) were of high importance. Interestingly, the BR features variance of BR over 6 min centered at the epoch (#1) and Katz's fractal dimension within 30 s (#29) were given higher scores in the WRN and WRLD classifications than in the WS classification.
The summary of the classification performance per patient group is presented in Table 3. For all patients, we achieved a mean Kappa of 0.67 ± 0.14 with an overall accuracy of 89.82 ± 5.5% for the WS classification, a mean Kappa of 0.47 ± 0.12 with an accuracy of 72.93 ± 6.8% for the WRN classification, and a mean Kappa of 0.43 ± 0.11 with an accuracy of 57.99 ± 7.7% for the WRLD classification. The summary of the classification performance presented as positive predictive value (PPV) and sensitivity is shown in Table 4.

Discussion
This study shows that sleep stages in children can be classified accurately using UWB-radar technology, and that this can be a reliable technique for contactless assessment of children's sleep. Of the machine learning classifiers that we tested in this study, AdaBoost was the most suitable: it performed similarly to SVM at a much lower computational cost. This study also identified the most useful features for the classification tasks. While movement features were more important for the wake-sleep differentiation, breathing rate features were more important to differentiate between sleep stages. Differentiating between wake and sleep was the most accurate. Differentiating between more sleep stages became less accurate, with the WRLD classification being the least accurate.
The accuracy that our machine learning algorithm reached was similar to that reported in other studies [20,48-53]. Some of these studies have used UWB radar, similar to our study; others have used ECG signals, video-based actigraphy, or combined different types of sensors and microphones to get the results. However, most studies show similar accuracies for the three types of sleep stage classification: approximately 90% for the WS classification, 60-70% for the WRN classification, and 55-75% for the WRLD classification [49].
In the present study, we have made use of UWB radar technology to classify sleep stages. At the moment, several other modalities may be used for this purpose. Probably the most popular method at the moment is actigraphy with the use of a watch. ECG and methods with other kinds of bands, such as ballistocardiography, are also used to estimate sleep [49,54-56]. In the end, it comes down to different methods of quantifying physiologic signals and using a machine learning algorithm to estimate sleep variables. The advantage of UWB radar is that it is a truly unobtrusive device to measure a person's breathing and movement during sleep. This unobtrusive tool has the potential of increasing patient compliance in sleep diagnostics. The device can be placed at a fixed location next to the bed, and does not have to be attached to the person's body. Furthermore, close to or within the device, several other factors can be measured, such as room temperature, sound, and light exposure. Note that UWB radar is unable to measure oxygen saturation. To measure oxygen saturation, a single extra sensor could be added to the set-up.
For sleep stage classification, many studies have focused on extracting informative features from respiratory signals. For instance, features such as breathing rate, respiratory self-similarity and regularity, and inhalation/exhalation rate and volume have been applied [50,57]. Very few sleep staging studies using radar technology have been performed in infants and children. Some studies in this age group applied other contactless monitoring approaches (eg, capacitive ECG and video camera) to acquire vital signs for sleep stage classification. Werth et al. [53] developed an algorithm to automatically detect sleep stages from capacitive ECG data from 8 preterm infants (gestational age 30 ± 2.5 weeks). In that study, a Kappa of 0.44 was achieved for classifying active sleep, quiet sleep, and caretaking and wakefulness. Recently, Long et al. [51] used video-based actigraphy to identify wakefulness and sleep states in 10 healthy term infants (<18 months), whereby they obtained a Kappa of 0.73 and an overall accuracy of 92%.
This study has several limitations. First, the age range of the subjects was wide, from newborn infants to adolescents. For the sleep stage classification, age could not be included in our machine learning algorithm. In our exploratory analysis, however, there was no difference in accuracy between children younger than 1 year of age and children older than 1 year of age.
Second, there was also a wide range of diseases among the subjects. As this pilot study was performed in a tertiary referral university medical center, the study population does not reflect the average pediatric population. Further research has to be done to also validate the algorithm in a healthy population.
Third, some of the subjects in this study had OSA. This means that their breathing pattern and movement pattern were disturbed by apneas and arousals during the measurements. Due to the relatively small sample size, we could not correct for apneas and arousals in this particular study. OSA seemed to have some negative effect on the accuracy of the algorithms, but the patient numbers in this study are too small to correct for having OSA. In future studies, investigators should try to only include subjects that do not suffer from OSA or any other disease.
Lastly, the hypnograms of the polysomnography were scored by three different observers. Although they were blinded to the UWB-radar results, this may have caused some bias in the gold standard method of determining sleep stages and might have lowered the accuracy of the sleep stage classification by the UWB-radar sensor. In an ideal situation, every hypnogram would have been scored by two observers independently, and their results combined through consensus.
Even given these limitations, the results tally well with those of other studies, and we therefore believe that our results could have been even better in a larger and more homogeneous study group.
Our current study has only focused on sleep stage classification, even though many other aspects of a night's sleep can be evaluated. Future studies could focus on adding new parameters to the algorithm, such as the total sleep time, sleep onset latency, and wake time after sleep onset. Future studies could also focus on detecting obstructive and central sleep apneas. Some studies in adults have already proven the feasibility of detecting apneas using a UWB radar [58,59].
In conclusion, this study shows that sleep classification using a UWB radar could be a feasible non-contact method in children.
Since this was a pilot study with a heterogeneous study population, the accuracy of the sleep stage classification is not yet sufficient for clinical use. However, the results are promising for the future of this radar technique in sleep diagnostics. This study paves the way for more in-depth and more detailed studies on sleep quantification using radar-technologies.

Financial/Non-financial disclosure
None.