Validation of mandibular movements’ analysis to measure sleep in adults with sleep complaints by comparison with actigraphy and polysomnography

Objective In adults with sleep complaints, we assessed the software of automatic analysis of mandibular movements to identify sleep and wake states by confrontation with the polysomnography (PSG) and the actigraphy (ACTG). Material and Methods Simultaneous and synchronized in-lab PSG, ACTG, and JAWAC were carried out in 100 patients with a sleep complaint. Epoch by epoch analysis was realized to assess the ability to sleep-wake distinction. Sleep parameters as measured by the three devices were compared. These included three regularly reported parameters: total sleep time (TST), sleep onset latency (SOL), and wake after sleep onset (WASO). Also, two supplementary parameters, wake during sleep period (WDSP) and latency to arising (LTA) were added to measure separately the quiet wakefulness states. Results The epoch by epoch analysis showed that the JAWAC, as compared to ACTG, classified sleep and wake states with greater specificity, while the overall accuracy and sensitivity of the two devices were comparable. The sleep parameters analysis showed that for the JAWAC estimates, the differences in TST, SOL, and LTA values were not statistically significant. However, WDSP and subsequently WASO were slightly underestimated. In contrast, the dissimilarities between ACTG estimates and PSG measurements of all the above sleep parameters were statistically significant; TST was overestimated whilst SOL, LTA, WDSP, and WASO were underestimated. Conclusion This study indicated that, besides its ability to reliably estimate TST, the JAWAC based on mandibular movements’ analysis was able, in adults with sleep complaints, to overcome the important problem of the recognition of the state of quiet wakefulness.


INTRODUCTION
Epidemiological studies carried out in the past decade have shown that obstructive sleep apnea (OSA) is a highly prevalent medical condition in the general population. It implies a range of clinical presentations with an ever-growing list of known adverse health consequences 1 . The diagnosis and severity of OSA are determined by the apnea-hypopnea index (AHI), which itself requires an accurate and objective assessment of both sleep time and respiratory events. Full inlab polysomnography (PSG) is considered as the gold standard in the diagnosis of OSA, however this procedure can be costly and time-consuming limiting thus its scope. Consequently, alternative approaches based on portable devices used for home sleep testing (HST) were developed from early on 2 .
Whilst HST devices make use of different combinations of PSG respiratory sensors to identify respiratory events, they all lack the objective EEG measurement of the total sleep time (TST). Therefore, the International Classification of Sleep Disorders (ICSD) recommends that, in those conditions, the total recording time be used instead of the total sleep time in calculating respiratory disturbance index (RDI) 3 . This change of a variable, obviously, impacts both the diagnostic outcome and disease severity stratification 4 . To overcome this problem, many HST devices are using either a separate or an integrated actigraphy device (ACTG) as a surrogate for the EEG measurement of the TST. Although these devices improved the overall sensitivity and specificity of HST devices, they turn out to be less reliable in the presence of comorbidities, a situation frequently reported in patients who require a sleep study. The use of ACTG technology in association with HST devices is therefore considered as "conditional" rather than "standard" recommendation 5,6 . Otherwise, a growing number of HST devices resorts to technologies that do not make use of the same recommended variables for respiratory events as used in standard PSG, but instead rely on surrogate parameters. In such an approach, the number of channels used on a given device becomes less relevant than the sensitivity and specificity of the device and the clinical outcomes that it can achieve 7 .
One of the most promising surrogate used in HST is the analysis of the sagittal mandibular movement (MM) using a highresolution magnetometer named JAWAC. Such analysis is able not only to recognize but also to differentiate between different sleep related respiratory events 8 . The algorithm used in the automatic analysis software of MM compared to PSG, proved to be a reliable alternative to the latter 9 . Moreover, as the sagittal mandibular movements reflected different behaviors associated, either to wakefulness (speaking, swallowing, eating, drinking, tonic support…), or to sleep (quiescence), we implemented another algorithm in order to detect sleep and wake epochs. This complementary algorithm provided a good estimation of the sleep and wake states 10 . The use of a single sensor to measure both the relevant respiratory and sleep parameters in a reliable way, offers a definite advantage that requires further validations. In a previous study conducted on healthy adults, we showed that MM was comparable to standard PSG and superior to ACTG in differentiating sleep and wake states 11 .
However, the presence of sleep disturbances or comorbidities may interfere with the ability of a device to measure sleep. So, we designed the present study in order to assess the accuracy of predicting sleep and wake states by the analysis of MMs, confronted to synchronized analyses of ACTG and PSG, the latter one considered as the gold standard, in a cohort of patients suffering from a sleep disorder.

Participants
The study was conducted at the Sleep Center of the University Hospital of Liege, Belgium. In accordance with the Helsinki declaration on human experimentation 12 , every participant read and signed an informed consent in which the aims of the study were also explained. They were selected from a group of adult patients who had been referred to the sleep center for investigation of sleep complaints. All patients were 18 years old or above. Their medical history, medications and demographic data were collected. The size of the cohort was limited to the first 100 patients in whom the simultaneous recordings of the devices were completed without technical defects.

Polysomnography
PSGs were carried out using EMBLA N7000 systems equipped with the Somnologica software. The PSG montage included three EEG channels, left and right EOG, chin EMG, bilateral tibialis anterior EMG, EKG, nasal cannula/pressure transducer, chest, and abdominal inductance plethysmography belts, fingertip pulse oximetry, snoring sensor, body position sensor, and light sensor. The manual scoring was done according to AASM scoring rules and was realized by qualified technologists blinded to the results of the other devices 2 . PSG was named hereafter as the gold standard for wake and sleep identification as well as for the diagnosis of sleep disorders.

Tested device: actigraphy (ACTG)
Actiwatch monitor (Actiwatch 2; Philips -Respironics, Murrysville, PA, USA) attached to the nondominant wrist was used for that purpose. Data were collected in 30-seconds epochs and analyzed thereafter by Philips ActiWare software version 6.0.1. The "default" settings provided by the manufacturer were selected for automatic analysis 13 .

Tested device: JAWAC
The JAWAC (Nomics -Liege, Belgium) 14 is a device validated in the diagnosis of sleep breathing disorders through an analysis of mandibular movements. It employs a noninvasive motion sensor, based on the principle of electromagnetic selfinduction. The output voltage at the receiver coil is a monotonic cubic function of the distance between the transmitter and the receiver coils. When the two coils are placed parallel to each other on the median-line of forehead and chin, the distance between them, which represents the sagittal MM, can be calculated from the properties of the received signal.
The output was amplified, digitalized at a rate of 10Hz and made available online with the PSG channels. The data were also stored for subsequent retrieval and analysis. A first software based on MM analysis to detect and classify the ventilatory effort has been developed and validated 14 . Furthermore, a second validated software, using a wavelet-based complexity measure of the MM signal, was proposed to recognize sleep and wake states 10 .

Procedures
To ensure a reliable temporal synchronization between the three devices, we used the "Network Time Protocol". Before each sleep study, the computer from each device was connected to the Internet and its clock synchronized manually with the Internet timeserver. Several units of each device were available for randomly use. Patients were admitted to the sleep laboratory between 14:00 and 17:00 hours. They were equipped early with ACTG and JAWAC sensors while those of PSG were installed later in the evening. Each patient freely chose the time devoted to sleep, in accordance with his or her bed and wake habits. The data from each device was stored for subsequent retrieval and analysis. At the end of each sleep study, the sleep technologist made sure that the three devices had remained synchronized, defined as showing no more than 30 seconds of discrepancy.

Data analysis
For methodological reasons, the duration of the recording was different with the three devices. Consequently, as they were all synchronized, we selected the period from "lights out" to "lights on" identified by the PSG sensors, as the time base for analysis for all three devices (gold standard PSG, tested device ACTG, and tested device JAWAC).
Qualified technologists scored the data from the PSG recordings manually and according to the AASM scoring rules. Automatic analyses were used in order to get ACTG and JAWAC data. The results of the scorings were available in 30sec. epochs. The PSG epochs were reduced to a binary form (S for any sleep stage and W for wakefulness), while those of ACTG and JAWAC were labeled directly 'sleep (S)' or 'wake (W)' by automatic analyses.
For each device, five derived sleep parameters were calculated using the same definitions. These included three AASM recommended parameters: 1) the total sleep time (TST) defined as the duration of all epochs labeled as sleep; 2) the sleep onset latency (SOL) measured as the time from light-off to the first epoch of sleep; and 3) the wake after sleep onset (WASO), which is the time scored as wake from first sleep epoch to light-on. Two additional parameters were added: 1) the wake during sleep period (WDSP), calculated as the time of wake between the first and the last epoch of sleep and 2) the latency to arising (LTA) measured as the elapsed time from last sleep epoch to light-on 15 .

Statistical analysis
We assessed the outcomes of the three devices both on a pooled-epoch and on a per-subject basis. The objectives of the statistical analysis were threefold: 1) to compare epoch by epoch the respective abilities of JAWAC and ACTG to differentiate sleep from wake states; 2) to compare their estimates of sleep parameters; and 3) to explore the differences between them.
In the epoch by epoch comparison, we combined all scored epochs from all subjects on a pooled-epoch basis and for each device. A three-way presentation of the results was used in accordance with the recommendations of the statistical guidance of the Food and Drug Administration (FDA) 16 .
The classification of the epochs by the PSG was used as reference. According to Tyron's method 17 , sensitivity, specificity and accuracy were used to compute the percentage of matching epochs between each tested device and PSG. Sensitivity is the measure of the correctly identified sleep epochs. It is calculated by dividing the number of epochs correctly recognized by the tested device as sleep by the total number of PSG-identified sleep epochs. Specificity is the proportion of correctly identified wake epochs and is calculated by dividing the number of epochs the device correctly identified as wake by the total number of PSG-identified wake epochs. Accuracy is the overall agreement between PSG and the device. Accuracy is determined by dividing the cumulative number of correctly identified sleep and wake epochs by the total number of epochs in the recording period. Wake and sleep epoch agreements were analyzed for each device against PSG using the Cohen's Kappa correlation, which determines the amount of agreement that can be expected by chance. This statistic ranges from 1, which demonstrates perfect agreement, to 0 which demonstrates agreement based on chance alone, and to -1 which demonstrates complete disagreement.
To investigate the differences between ACTG and JAWAC, we analyzed their agreements and disagreements. First, three agreement levels between ACTG and JAWAC were calculated: a) the overall agreement, is the percentage of the total number of epochs labeled identically by the two tested devices; b) the agreement in 'sleep epochs' is the percentage of identical labeling by the two tested devices in the epochs scored by the PSG as sleep; and c) the agreement in 'wake epochs' is the percentage of identical labeling by the two tested devices in the epochs scored by the PSG as wake. Second, in epochs where the two tested devices disagreed, we used the discrepant resolution test considering PSG as resolver to determine the 'right' device.
In the comparison between the estimates of sleep parameters, we proceeded to a per-subject analysis. The means and standard deviations of each sleep parameter were calculated for PSG, ACTG and JAWAC. The Pearson correlation coefficient, with 95% confidence interval was reported for the two devices against the PSG gold standard values.
One-way ANOVA tests were used to verify if the ACTG and JAWAC estimates varied from the PSG measurement and Pairwise t-tests with adjusted p-values (using Holm-Bonferroni correction) were used to determine the significance of any difference between ACTG and JAWAC with the null hypothesis (the two devices provide equivalent performance). All statistically significant conclusions are made at an α=0.05 level.
To illustrate the data, we calculated for each subject the difference between the PSG measurements and the ACTG and JAWAC estimates. The mean differences (bias), the standard deviation of the differences and the limits of agreement were reported. A positive bias indicates an overestimation, and a negative bias indicates an underestimation relative to the PSG analysis, by the ACTG or JAWAC. For each sleep parameter, we displayed the Bland and Altman plots. Since PSG is the gold standard, we considered for each subject the PSG results rather than the mean of two methods to be plotted against the difference between PSG measurements and ACTG and JAWAC estimates 18 .

RESULTS
One hundred seven sleep recordings were needed to obtain 100 simultaneous recordings unaffected by technical faults: 3 recordings were excluded for lack of synchronization, 1 for ACTG software problem, and 3 for JAWAC signal loss.
The demographic characteristics and sleep variables for the three devices are presented in Table 1. PSG was considered as normal in 6 patients (normal distribution and proportion of sleep stages; IAH <5; PLM <15/h). 42 patients were diagnosed with severe sleep apnea syndrome (IAH >30), 31 with moderate SAS (15<IAH<30) and 19 with mild SAS (5<IAH<15). Periodic limb movement (PLM >15/h) was present in 70 patients.

Performance of the sleep ⁄wake classification
In Table 2, a three-way presentation compares the sleep/ wake classifications according to each device and the different combinations of labeling between ACTG and JAWAC for the overall epochs (106,456) and also for epochs scored by the PSG as sleep (81,169) or wake (25,287).
The performance of ACTG and JAWAC compared to PSG are presented in Table 3. Both devices showed an identically high level of accuracy (82.86% for ACTG and 83.17% for JAWAC). The high sensitivity levels (96.26 for ACTG and 91.78 for the JAWAC) confirm their excellent ability to identify epochs of "sleep". Whereas a higher specificity of JAWAC (55.54) compared with ACTG (39.88) indicates its greater efficiency in identifying epochs of "wake". The JAWAC's Cohen's Kappa coefficient (0.50) even though moderate, was slightly higher than ACTG (0.43).

Sleep parameters concordance
The sleep parameters calculated by each of the three devices are shown in Table 4. All ACTG estimates differed significantly from their corresponding PSG measures. TST was largely overestimated by ACTG, while the other parameters related to 'wake' were all underestimated.
The Pearson correlation coefficient between ACTG and PSG showed a good and significant correlation for TST, WASO, and WDSP and poor or non-significant correlation for SOL and LTA.
The JAWAC's estimates of TST and SOL were not significantly different from the PSG measurements, while WASO/wake time was underestimated and significantly different. Interestingly, a differentiation of WASO into its two components, WDSP and LTA, showed that this difference is due to the WDSP, which is underestimated rather than to the LTA, which does not show a significant difference. The correlation coefficient for the JAWAC estimates showed a significant correlation for all the parameters with a good correlation for TST and LTA and a modest one for SOL, WASO, and WDSP.

Bias and precision statistics
The Bland and Altman statistics for the sleep parameters are described in Table 5 and the plots are shown in Figure 1. According to these results, the directions of the biases were the same for both devices: an overestimation of TST by the 2 tested devices but the overestimation with the JAWAC was less than half the one for the ACTG. Moreover, the JAWAC expressed a rather adequate estimation of SOL. The other wake parameters were underestimated by the two tested devices but far less for the LTA, with the JAWAC. The biases with JAWAC were closer to zero and with more tightened limits of agreement for all sleep parameters and showed a greater degree of constancy with regard to TST, SOL, and LTA. ACTG on the other hand showed a constant bias only with regard to TST whereas they tended to diverge farther as the values for PSG increased. Table 6 presents on a pooled-epoch basis, the analysis of agreements and disagreements between ACTG and JAWAC in classifying epochs into sleep and wake.    These results showed a good level of agreement (0.82) between the two devices, and where they agreed, their degree of convergence with the reference PSG scoring was excellent (0.90). We used the discrepant analysis of disagreement with PSG as a resolver to analyze the disagreements between ACTG and JAWAC 16 .

Exploring the differences between ACTG and JAWAC
This showed that both devices overall produced the same level of agreement with PSG (0.49 for ACTG and 0.51 for JAWAC). ACTG was more accurate in correctly identifying the sleep epochs (0.72). Conversely, JAWAC was more accurate in correctly identifying wake epochs (0.68).
The sleep parameters (Table 4) demonstrate the presence of a significant difference between ACTG and JAWAC estimates of TST but not those of WASO, although when this is broken down into its two components, LTA and WDSP, a significant difference is found with regard to the former but not to the latter.

DISCUSSION
In the current study, we sought to assess the performance of the JAWAC, a new HST device based on MM analysis, to identify wake and sleep state and provide estimates of sleep parameters.
We chose to include patients who were referred to the sleep laboratory for a PSG, regardless of the nature of the suspected sleep disorder, so as to minimize the impact of a given sleep disorder on the performance of the device 6 .
We also compared the results of JAWAC with those of both the reference standard (PSG) and the non-reference standard (ACTG). This triangular comparison allowed a more accurate assessment of the differences between JAWAC and ACTG.
Our results showed that JAWAC, as compared to ACTG, classified sleep and wake states with greater specificity, while the overall accuracy and sensitivity of the two devices were comparable. Furthermore, the use of PSG as the determining factor in disagreements between the two HST devices showed a superiority of JAWAC in correctly identifying the wake epochs. The fact that wake occurs more frequently in patients with sleep breathing disorders as compared to normal subjects, regardless of its distribution within the TIB, may explain the greater impact of the degree of specificity of such surrogate devices, as we have defined above, on the quality of the sleep analysis in those patients.
In the choice of parameters, we used TST as a specific measure of the duration of sleep. We also made a distinction between different situations of wakefulness during TIB, by designating SOL and LTA as two periods of quiet wake, distinct from WDSP, which is a measure of the time awake during the sleep period. WASO was calculated by adding WDSP to LTA. The sleep efficiency, which is the ratio of TST to TIB, was reported in table 1 but not included in the statistical analysis due to the fact that the denominator was common for the three devices.
Set against PSG as reference, JAWAC proved efficient in distinguishing sleep from quiet wakefulness. This was illustrated by the fact that the differences in TST, SOL, and LTA values were not statistically significant. However, WDSP and subsequently WASO were slightly underestimated.
In contrast, the dissimilarities between ACTG estimates and PSG measurements of all the above sleep parameters were statistically significant; TST was overestimated whilst SOL, LTA, WDSP, and WASO were underestimated. These results of the ACTG performance are broadly in agreement with earlier reports, which show that actigraphies, regardless of the manufacturer and software, tend to rate quiet wakefulness as sleep, and hence systematically overestimate TST and underestimate SOL and WASO 5 .
The results presented in this study indicated that, besides its ability to reliably estimate TST, JAWAC was able to overcome the important problem of the recognition of the state of quiet wakefulness.
The better performance of JAWAC in this regard is probably related to the distinct behavior of mandibular muscles during sleep. The position of the mandible is the result of a balance of forces between the jaw-closing and the jaw-opening muscles. The onset of sleep has different effects on the basal activity of those muscles, where masseter and medial pterygoid show a significant decrease in their tonic activities whereas genioglossus and geniohyoid muscles maintain a greater phasic activity 19 . The resultant imbalance induces thus an opening movement of the jaw that is recorded both in healthy adults and patients with obstructive sleep apneas syndrome 20,21 .
The automatic analysis algorithm used in JAWAC recognizes the onset of sleep from a decrease in the amplitude of the jaw movements associated with a slight mouth opening lasting for at least 2 minutes, whereas the onset of wakefulness is recognized from a sharp increase in the amplitude of movements 10 . The ability of JAWAC to efficiently detect these specific mandibular movements probably explains its success in identifying the state of quiet wakefulness.
This study has however some limitations, which ought to be taken into account in future research. Just as it has been shown in the literature that actigraphies' ability to measure sleep may be affected to different extent within different sleep disorders. Further studies should be carried out to evaluate the performance of JAWAC in specific patient groups, such as in those suffering from insomnia and sleep breathing disorders. Furthermore, in the choice of sleep parameters it seems appropriate to take into account sleep efficiency, as this parameter is widely used in validation studies.

CONCLUSION
Our study demonstrates the ability of JAWAC to correctly identify sleep/wake epochs and thus give an accurate estimation of various sleep parameters. This feature combined with its capacity to record sleep respiratory events, validated in previous studies, makes it a device unique in its ability to calculate AHI reliably using a single sensor, a valuable asset in the science of HSTs that seeks to couple simplicity with reliability.