Evaluation of the Accuracy of Contactless Consumer Sleep-Tracking Devices Application in Human Experiment: A Systematic Review and Meta-Analysis

Compared with the gold standard, polysomnography (PSG), and the silver standard, actigraphy, contactless consumer sleep-tracking devices (CCSTDs) are better suited to large-sample, long-period experiments in the field and outside the laboratory because of their low price, convenience, and unobtrusiveness. This review aimed to examine the effectiveness of CCSTDs in human experiments. A systematic review and meta-analysis of their performance in monitoring sleep parameters was conducted following PRISMA (PROSPERO: CRD42022342378). PubMed, EMBASE, Cochrane CENTRAL, and Web of Science were searched; 26 articles qualified for systematic review, of which 22 provided quantitative data for meta-analysis. The findings show that CCSTDs were more accurate in healthy participants monitored with mattress-based devices with piezoelectric sensors. CCSTDs’ performance in distinguishing waking from sleeping epochs is as good as that of actigraphy. Moreover, CCSTDs provide data on sleep stages that are not available when actigraphy is used. Therefore, CCSTDs could be an effective alternative tool to PSG and actigraphy in human experiments.


Introduction
Sleep is an important indicator when analyzing fatigue [1,2], working performance [3,4], mood [5,6], circadian entrainment [7,8], etc. Some human experiments on large samples have been conducted in the field, with sleep monitoring performed at home [6,9,10]. Although the gold standard for measuring sleep, polysomnography (PSG), is the most precise method for evaluating sleep quality, it is expensive and requires the assistance of specialists [11]. In addition, the discomfort caused by electrodes attached to the body can interfere with normal sleep behavior, which makes long-term sleep monitoring in home environments impractical [12]. Currently, actigraphy is a widely used instrument for assessing sleep in human experiments [1,13-15]. Compared with PSG, actigraphy provides a convenient and feasible solution for long-term sleep monitoring in the home environment. However, actigraphy is still costly in sleep experiments with a large sample size because many participants must be monitored simultaneously. Moreover, because actigraphy devices are wearable, some people find them uncomfortable or forget to put them on before sleep [16].
Contactless consumer sleep-tracking devices (hereinafter referred to as 'CCSTDs') have a low cost and are simple to operate. Whether it is a bedside device using technologies, such as radiofrequency identification, an infrared camera, or a mattress device that

Bias Assessment
The quality of the studies was assessed (unblinded) using the Quality Assessment of Diagnostic Accuracy Studies, version 2 (QUADAS-2) [27] (see Supplementary Materials, Figures S1 and S2).

Statistical Analysis
For studies that compared CCSTDs with PSG and provided quantitative data, the mean, standard deviation, and sample size were used to produce a pooled estimated mean effect size and 95% confidence intervals (CI) using a random effects model. Heterogeneity among studies was determined by calculating I² statistics. I² values of around 25%, 50%, and 75% were considered to indicate small, medium, and large heterogeneity, respectively [28]. A threshold probability of 5% (p = 0.05) was selected as the basis for rejecting the null hypothesis. As recommended, a threshold probability of 10% (p = 0.10) was used for testing the significance of heterogeneity and for determining the statistical significance of subgroup comparisons [29]. Meta-analyses were performed using Stata version 14 with the metan, metafunnel, and metabias packages. In addition, Review Manager 5.4.1 and R were used for analysis in this study.
Figure 1 presents a summary of the selection and qualification of the articles that were reviewed. A total of 1744 publications were retrieved by searching the databases. One additional publication was identified in the reference lists of the identified articles. After removing duplicate publications found in multiple databases, 1165 articles remained for screening. The examination of individual titles and abstracts yielded 137 publications for full-text appraisal. However, after scrutinizing each of them according to a priori inclusion and exclusion criteria, only 26 articles qualified for systematic review [17,20,. Out of the 26 articles, one did not evaluate PSG and CCSTD simultaneously [30], which likely led to a bias; three did not provide data for meta-analysis [30,39,41,46]. Finally, 22 articles qualified for meta-analysis [17,20,31-38,40,42-45,47-53].
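The random effects pooling and the I² statistic described in the Statistical Analysis can be illustrated with a minimal Python sketch. It assumes the DerSimonian-Laird estimator of between-study variance; the text does not state which estimator was used, and the function name `random_effects_pool` and the example numbers are hypothetical, not data from the included studies.

```python
import math

def random_effects_pool(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random effects model.

    effects:   per-study effect sizes (e.g., mean CCSTD minus PSG difference)
    variances: per-study sampling variances of those effects
    Returns (pooled effect, 95% CI, tau^2, I^2 in percent).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching lengths")
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                         # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    # I^2: share of total variation attributable to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, ci, tau2, i2

if __name__ == "__main__":
    # Hypothetical mean differences in TST (minutes) and their variances.
    pooled, ci, tau2, i2 = random_effects_pool([12.0, 25.0, 5.0, 18.0],
                                               [16.0, 25.0, 9.0, 36.0])
    print(f"pooled = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
          f"tau2 = {tau2:.2f}, I2 = {i2:.1f}%")
```

Under the thresholds quoted above, an I² near 25%, 50%, or 75% from such a calculation would be read as small, medium, or large heterogeneity, respectively.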

Main Characteristics of the Studies Included in the Review
The descriptive characteristics of the 26 included articles are summarized in Table 1, and the results from qualified publications on CCSTDs and actigraphy versus PSG can be found in the Supplementary Materials, Table S2. These articles described 32 studies of CCSTDs. Among the studies that used PSG as the standard test, five compared CCSTDs and actigraphy using the same cohort [34,35,42,49,53]. Their performance was tested against that of PSG by monitoring participants with both a CCSTD and an actigraphy device. CCSTDs fall into two device types, i.e., mattress-based devices and bedside devices. Out of 26 articles, 11 (42%) studied mattress-based devices, 12 (46%) studied bedside devices, 2 (8%) studied both mattress-based and bedside devices [49,51], and 1 (4%) studied an automatic bedside video device [36]. The most common device sensors used in CCSTDs were piezoelectric sensors (n = 9), pressure sensors (n = 5), radiofrequency sensors (n = 17), and infrared cameras (n = 1). Figure 2 shows the different types of sensors and their sleep monitoring mechanisms. Piezoelectric or pressure sensors are used in mattress-based devices, and radiofrequency sensors and infrared cameras are used in bedside devices. These sensors monitored two to four physiological signals in real time, including heart rate, respiration rate, body movement, and sleep posture, and the data were analyzed by built-in algorithms to determine the sleep status. The studies were performed in 11 countries. The number of participants ranged from 5 to 198, and their ages ranged from 16 to 84 years. The participants included normal sleepers and those diagnosed with periodic limb movements during sleep (PLMS), obstructive sleep apnea (OSA), sleep-disordered breathing (SDB), central disorders of hypersomnolence (CDH), insomnia, diabetes, hypertension, arthritis, and septic shock, as well as perioperative patients. Feng et al. [45] was included in the patient group in the meta-analysis since only one of the subjects did not have OSA. Among the twenty-six articles, experiments in twenty-one (81%) were conducted in sleep laboratories, one (4%) was conducted in the home environment, three (11%) were conducted either at home or in a sleep laboratory, and one (4%) was conducted in an ICU [20].

Publication Bias
Following the Cochrane handbook, publication bias was tested for the outcome indicators that were reported by more than ten studies in the meta-analysis. The funnel plot showed no publication bias (see Supplementary Materials, Figure S3). Egger's test (TST: p = 0.608; SOL: p = 0.595; SE: p = 0.458; WASO: p = 0.637; light sleep: p = 0.172; deep sleep: p = 0.997; REM: p = 0.487) and Begg's test (TST: p = 0.733; SOL: p = 0.434; SE: p = 0.707; WASO: p = 0.620; light sleep: p = 0.371; deep sleep: p = 0.721; REM: p = 0.592) indicated no publication bias. Out of the 26 articles, 22 were included in the meta-analysis. Of the four articles excluded, three reported their results without raw data, and in one, the CCSTD was not monitored at the same time as the gold standard, PSG, in the experiment [30], which likely led to a bias. The articles included described thirty-six studies in total, with six articles each reporting from two to five studies of CCSTDs. A summary of the values for each outcome is provided in Table
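As an illustration of the Egger's test reported in this section, the classical Egger regression can be sketched in Python. This is a simplified stand-in for the Stata metabias package named in the Statistical Analysis, not its implementation; the function name `egger_regression` and the input values are hypothetical. The sketch regresses each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error); an intercept far from zero suggests funnel-plot asymmetry. Converting the t statistic into a p-value requires a t distribution with k - 2 degrees of freedom and is left to a statistics library.

```python
import math

def egger_regression(effects, std_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope, t statistic of the intercept).
    The t statistic is NaN when the fit is exact (zero residual variance).
    """
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least three studies with matching lengths")
    x = [1.0 / s for s in std_errors]                    # precision
    y = [e / s for e, s in zip(effects, std_errors)]     # standardized effect
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_stat

if __name__ == "__main__":
    # Hypothetical effects in which less precise studies report larger effects.
    intercept, slope, t_stat = egger_regression([1.0, 2.0, 3.0, 2.5],
                                                [0.5, 1.0, 2.0, 1.5])
    print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, t = {t_stat:.3f}")
```

Begg's test, which is rank based, is not shown here.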

Subgroup Analyses
The results of subgroup analyses of sensors, device type, participant type, and brands compared with those of PSG are presented in the Supplementary Materials Tables S3-S6 and Figures S4-S31.
The subgroup analyses of sensors revealed no significant differences from PSG for the piezoelectric sensor in the estimation of TST, SOL, WASO, and REM; for the pressure sensor in the estimation of SOL, SE, and deep sleep; for the radiofrequency sensor in the estimation of sleep stages; and for the infrared camera in the estimation of TST. In the subgroup of device types, there were no significant differences for the mattress-based devices in the estimation of SOL, WASO, and sleep stages or for the bedside devices in the estimation of sleep stages. In the subgroup of participant types, there were no significant differences for healthy participants in the estimation of sleep stages; for patients in the estimation of REM; and for the mixed healthy and patient group in the estimation of WASO and deep sleep. The results of the subgroup analyses for the brand of devices showed no significant differences for the use of ResMed S+ in terms of all the sleep parameters; for the use of Beddit in terms
Figure 3 displays error bars of the EBE agreement for sleeping and awake states in terms of sensors, the device type, the health conditions of the participants, and the brand of device. As only one study [20] provided EBE data given by the pressure sensor, it was not included in the error bar of the sensors subgroup. The four subgroup analyses were highly consistent: asleep states were identified with high sensitivity, whereas awake states were identified with low specificity.
Comparing the mean values across subgroups, the results of the EBE agreement for asleep and awake states in terms of accuracy, sensitivity, and specificity are as follows: the piezoelectric sensor (accuracy: N = 2, 0.90 ± 0.01; sensitivity: N = 3, 0.92 ± 0.05; specificity: N = 3, 0.56 ± 0.21) was better than the radiofrequency sensor (accuracy: N = 19, 0.81 ± 0.06; sensitivity: N = 19, 0.90 ± 0.07; specificity: N = 19, 0.51 ± 0.11); the mattress-based devices (accuracy: N = 3, 0.83 ± 0.13; sensitivity: N = 4, 0.91 ± 0.04; specificity: N = 4, 0.52 ± 0.19) were slightly better than the bedside devices (accuracy: N = 19, 0.81 ± 0.06; sensitivity: N = 19, 0.90 ± 0.07; specificity: N = 19, 0.51 ± 0.11); healthy participants (N = 13, 0.94 ± 0.03) had the best sensitivity results; the healthy + patient group (N = 5, 0.53 ± 0.16) was slightly better than the healthy group (N = 13, 0.52 ± 0.12) in terms of specificity; the healthy (N = 13, 0.84 ± 0.06) and healthy + patient groups (N = 4, 0.82 ± 0.06) generally had consistent accuracy, and the patient group had the worst scores of all; the others group (B) (N = 3, 0.96 ± 0.02) scored the best in terms of sensitivity, and ResMed S+ had a higher accuracy (N = 4, 0.84 ± 0.09) and specificity (N = 4, 0.59 ± 0.15) than the other brands did. The piezoelectric sensor, mattress-based devices, the healthy group, and the others group (B) had the lowest coefficients of variation (CV) for sensitivity, while the radiofrequency sensor, bedside devices, the patient group, and SleepMinder had the lowest CVs for specificity.
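The coefficient of variation used in this comparison is the standard deviation divided by the mean. A minimal Python sketch, applied to the sensor subgroup values reported above, shows how the ranking is obtained; the function name `coefficient_of_variation` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def coefficient_of_variation(mean, sd):
    """Return SD / mean, a unit-free measure of relative spread."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return sd / mean

# (mean, SD) pairs for the sensor subgroups, as reported in the text.
reported = {
    ("piezoelectric", "sensitivity"): (0.92, 0.05),
    ("radiofrequency", "sensitivity"): (0.90, 0.07),
    ("piezoelectric", "specificity"): (0.56, 0.21),
    ("radiofrequency", "specificity"): (0.51, 0.11),
}

if __name__ == "__main__":
    for (sensor, metric), (mean, sd) in reported.items():
        cv = coefficient_of_variation(mean, sd)
        print(f"{sensor:15s} {metric:12s} CV = {cv:.3f}")
```

A lower CV indicates that the subgroup's results are more tightly clustered around their mean, independently of whether the mean itself is high or low.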

Sleep Stage Identification
A total of eight studies (n = 390 samples) appraised the performance of CCSTDs with sleep staging functions in identifying sleep stages via EBE analysis [17,39-43,48,49]. Out of these eight studies, two consisted of two or three different samples, thereby increasing the total number of evaluations to eleven. Figure 4 displays the error bars of the EBE agreement for sleep stages. Compared with PSG, CCSTDs had the highest sensitivity (N = 4, 0.64 ± 0.05) and the lowest specificity (N = 4, 0.60 ± 0.08) and accuracy (N = 9, 0.63 ± 0.06) in detecting light sleep. REM showed the opposite result: the highest specificity (N = 6, 0.91 ± 0.05) and accuracy (N = 9, 0.75 ± 0.13), but the lowest sensitivity (N = 6, 0.49 ± 0.20). Deep sleep (CV = 0.06) had the smallest coefficient of variation in specificity and a high level of specificity (N = 5, 0.88 ± 0.03). Light sleep had the smallest coefficient of variation in terms of sensitivity (CV = 0.08) and accuracy (CV = 0.09).

Discussion
The aim of the study was to assess the feasibility of CCSTDs as an alternative to PSG and actigraphy in experiments with large sample sizes or long-term monitoring of humans, through a comprehensive analysis of the effectiveness of CCSTDs. The findings are based on a review of relevant published articles and a meta-analysis.
According to the qualified publications, all CCSTDs can track sleep and wake, among which ResMed S+, EarlySense Live, and Somnofy can also monitor sleep stages. Compared to PSG, CCSTDs overestimated TST, SE, and deep sleep and underestimated SOL and WASO; only light sleep showed no significant difference. The EBE analysis of asleep and awake epochs showed that CCSTDs had a high degree of sensitivity (0.90 ± 0.06), but a relatively low degree of specificity (0.51 ± 0.12), indicating that the devices tend to detect sleep epochs accurately but awake epochs less accurately. Moreover, there is a wide range of values for accuracy (i.e., between 0.68 and 0.91) of sleep and wake epoch identification. In terms of EBE agreement for sleep stages, the degree of sensitivity was relatively low for light sleep, deep sleep, and REM. The degree of specificity was relatively high, with a narrower range of values, for deep sleep and REM, but low for light sleep. This indicates an overall poorer and inconsistent ability of CCSTDs to correctly detect sleep stage epochs [17,49]. The results of the analysis of the accuracy for EBE agreement for sleep stages confirm this. In summary, the accuracy of CCSTDs varied widely, which could be attributed to the varied health states of participants and the different sensors and algorithms used in CCSTDs.
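The sensitivity, specificity, and accuracy discussed above come from epoch-by-epoch (EBE) comparison against PSG. The following Python sketch shows how they are computed for sleep/wake scoring, with sleep treated as the positive class, matching the usage in this review. The function name `ebe_metrics`, the 'S'/'W' label encoding, and the ten-epoch example are hypothetical and do not come from any included study.

```python
def ebe_metrics(psg, device):
    """Compare device epochs with PSG epochs ('S' = sleep, 'W' = wake).

    Sensitivity: share of PSG sleep epochs the device also scores as sleep.
    Specificity: share of PSG wake epochs the device also scores as wake.
    Accuracy:    share of all epochs on which device and PSG agree.
    """
    if len(psg) != len(device) or len(psg) == 0:
        raise ValueError("epoch sequences must be non-empty and equal in length")
    tp = sum(1 for p, d in zip(psg, device) if p == "S" and d == "S")
    tn = sum(1 for p, d in zip(psg, device) if p == "W" and d == "W")
    fp = sum(1 for p, d in zip(psg, device) if p == "W" and d == "S")
    fn = sum(1 for p, d in zip(psg, device) if p == "S" and d == "W")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("PSG must contain both sleep and wake epochs")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(psg),
    }

if __name__ == "__main__":
    # Toy night: PSG scores 8 sleep and 2 wake epochs; the device misses
    # one of the two wake epochs and scores it as sleep.
    psg = list("SSSSSSSSWW")
    device = list("SSSSSSSSSW")
    print(ebe_metrics(psg, device))
```

Because wake epochs are a small share of a typical night, a device that misses many of them can still reach high accuracy, which is why specificity is reported separately.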
The health state of participants is one of the most important factors affecting the accuracy of CCSTD sleep monitoring. For healthy participants with normal sleep patterns and behaviors, the CCSTD results did not differ significantly from those of PSG in assessing sleep stages, whereas for patients, only the REM results showed no significant difference. When healthy participants were involved, CCSTDs demonstrated much higher degrees of accuracy, sensitivity, and specificity in the EBE analysis of asleep and awake epochs than when patients were involved. The same conclusion was reached in previous studies that used the same device to study healthy participants and patients [20,31,35,55]. This is because patients who suffer from disturbed sleep are prone to repeated arousal during sleep [56]. The more wakefulness occurs, the more erroneous data there are, leading to sleep overestimation [57]. Hence, sleep/wake identification becomes more inaccurate for patients.
The sensor is the most important factor affecting the accuracy of CCSTDs in sleep monitoring. CCSTDs sense the sleep state by detecting one or more physiological signals, such as chest and abdominal breathing movements, heart movements, and body movements, using radiofrequency, infrared, pressure, and piezoelectric sensors and microphones [58]. The subgroup analysis of sensors showed that both piezoelectric and radiofrequency sensors performed well in sleep monitoring.
However, the piezoelectric sensor had a high degree of accuracy in terms of both wakefulness and sleep, while the radiofrequency sensor had a high degree of accuracy in terms of sleep only. This is consistent with the results of the EBE analysis, where the piezoelectric sensor had excellent accuracy and sensitivity, and the radiofrequency sensor excelled only in terms of sensitivity. The reason for this is that the piezoelectric sensor collects data on heart rate, respiratory rate, body movements, and sleep postures during sleep, while the radiofrequency sensor obtains sleep data by detecting breathing and body movements [59]. In addition, current radiofrequency technology has difficulty detecting small movements within the body, such as heartbeat or pulse, and it cannot monitor multiple people simultaneously [58]. Therefore, the radiofrequency sensor is slightly worse than the piezoelectric sensor in terms of sleep monitoring performance. This could also explain the results of the meta-analysis in which mattress-based devices performed slightly better than bedside devices did in terms of the mean values of sensitivity and specificity for monitoring the sleep period. The other advantages of mattress-based devices over bedside devices are that they not only allow small movements within the body to be more easily detected, but they also allow multiple people in the same room to be monitored simultaneously [16].
It is worth noting that some bedside instruments with better sensor performances and better algorithms show accuracy values as high as those of mattress instruments, such as ResMed S+, EarlySense, and Somnofy [42,48,49], indicating a significant difference among different brands. The results of the subgroup meta-analysis by brand confirm this. For example, ResMed S+ performed as well as PSG did in measuring the sleep parameters in this review.
Therefore, CCSTDs, especially mattress-based devices with a built-in piezoelectric sensor, demonstrated excellent performance in the sleep monitoring of healthy populations. They can be used as an alternative tool to PSG for monitoring overall sleep conditions in long-period and large-sample experiments on healthy populations, but not in experiments that require precise data, because CCSTDs overestimate or underestimate some sleep parameters to some degree [43,47,49,53].
Actigraphy devices have been extensively validated [34,60] and found to be highly sensitive (0.965) and accurate (0.863), but poorly specific (0.329), in detecting sleep [61]. Actigraphy estimates asleep/awake epochs by measuring body movements with accelerometers, and it defines sleep as a lack of movement. Thus, brief awake periods that produce little motion, and the physical stillness that precedes sleep onset, are often classified incorrectly as sleep, leading to low specificity, overestimated TST and SE, and underestimated SOL and WASO, especially among individuals with sleep disturbances [61-64]. The specificity in studies of healthy participants ranged from 0.269 to 0.77 [56,65-73], whereas studies of a variety of patient groups reported specificity values from 0.325 to 0.80 [61,74-76]. A low degree of specificity is the main limitation of sleep monitoring via actigraphy. A further limitation is the inability to discriminate between sleep stages. Studies that compared CCSTDs with actigraphy [34,35,42,49,53] revealed that the CCSTDs demonstrated a similar degree of accuracy, a higher degree of specificity, and a lower degree of sensitivity than actigraphy did for sleep/wake epoch identification. TST, SE, SOL, and WASO estimates from CCSTDs were superior to those from actigraphy. The accuracy of actigraphy systems varies widely and typically depends on the observed population [77]. The accuracy of actigraphy in determining asleep and awake epochs is reasonably high in normal subjects [78-80]. The American Academy of Sleep Medicine Practice Guidelines indicate that actigraphy is a reliable method of measuring sleep in the normal healthy adult population [81]. However, the differences in accuracy are large in the presence of sleep disorders, such as sleep-disordered breathing, insomnia, and periodic limb movements [82,83]. For example, in sleep-disordered breathing patients, a per-epoch accuracy of 0.80-0.86 has been reported, and the accuracy tends to be lower in severe apnea cases (0.861 in the normal group versus 0.799 in the severe OSA group) [77]. This difference is consistent with the results of EBE agreement for asleep/awake epochs in the subgroup of participant type in this study. In summary, this study shows that CCSTDs can be a valid alternative to the silver standard of actigraphy for sleep monitoring. In addition, CCSTDs have the advantage of detecting sleep stages.
However, the included studies have certain limitations. First, the participants were mostly monitored for one night. Second, there is a lack of published information, such as details of the algorithms and the reliability of the hardware, to explain the different performances of the various CCSTDs. Third, most of the included studies analyzed only one or two sleep indicators using the device, which may have led to a bias in the final results. In addition, a wide range of CCSTDs have appeared on the market, and their data and algorithms may be unstandardized, undisclosed, and unvalidated [54,84]. Hence, in the face of the constant introduction of new devices and algorithms, it is essential to validate them against the gold standard, PSG, in the specific populations of interest before using them in human experiments.

Conclusions
This systematic review and meta-analysis has shown that CCSTDs, especially mattress-based devices with a piezoelectric sensor, are more accurate when they are used in experiments involving healthy participants. They can be used as an alternative tool to PSG for long-period and large-sample human experiments that do not require precise data. Furthermore, CCSTDs demonstrate accuracy values as high as those for actigraphy and can provide data on sleep stages that are not available with the use of actigraphy. Therefore, CCSTDs can be an effective alternative tool to actigraphy for monitoring sleep.