Accuracy of 12 Wearable Devices for Estimating Physical Activity Energy Expenditure Using a Metabolic Chamber and the Doubly Labeled Water Method: Validation Study

Background: Self-monitoring using certain types of pedometers and accelerometers has been reported to be effective for promoting and maintaining physical activity (PA). However, the validity of estimating the level of PA or PA energy expenditure (PAEE) for general consumers using wearable devices has not been sufficiently established. Objective: We examined the validity of 12 wearable devices for determining PAEE during 1 standardized day in a metabolic chamber and 15 free-living days using the doubly labeled water (DLW) method. Methods: A total of 19 healthy adults aged 21 to 50 years (9 men and 10 women) participated in this study. They followed a standardized PA protocol in a metabolic chamber for an entire day while simultaneously wearing 12 wearable devices: 5 devices on the waist, 5 on the wrist, and 2 placed in the pocket. In addition, they spent their daily lives wearing 12 wearable devices under free-living conditions while being subjected to the DLW method for 15 days. The PAEE criterion was calculated by subtracting the basal metabolic rate measured by the metabolic chamber and 0.1×total energy expenditure (TEE) from TEE. The TEE was obtained by the metabolic chamber and DLW methods. The PAEE values of wearable devices were also extracted or calculated from each mobile phone app or website. The Dunnett test and Pearson and Spearman correlation coefficients were used to examine the variables estimated by wearable devices. Results: On the standardized day, the PAEE estimated using the metabolic chamber (PAEEcha) was 528.8±149.4 kcal/day. The PAEEs of all devices except the TANITA AM-160 (513.8±135.0 kcal/day; P>.05), SUZUKEN Lifecorder EX (519.3±89.3 kcal/day; P>.05), and Panasonic Actimarker (545.9±141.7 kcal/day; P>.05) were significantly different from the PAEEcha. None of the devices was correlated with PAEEcha according to both Pearson (r=−.13 to .37) and Spearman (ρ=−.25 to .46) correlation tests. 
During the 15 free-living days, the PAEE estimated by DLW (PAEEdlw) was 728.0±162.7 kcal/day. PAEE values of all

Murakami et al. JMIR Mhealth Uhealth 2019;7(8):e13938. https://mhealth.jmir.org/2019/8/e13938/


Introduction
Background
Physical activity (PA) has been reported to reduce the incidence of and mortality from several noncommunicable diseases, including cardiovascular disease, stroke, and some types of cancer [1][2][3]. To promote or maintain PA, self-monitoring using pedometers and accelerometers has been considered effective [4]. However, the validity of wearable devices for estimating the amount of PA or PA energy expenditure (PAEE) has not been sufficiently established. Previously, we examined the validity of total energy expenditure (TEE) estimated by 12 wearable devices during 1 standardized day in a metabolic chamber and 15 free-living days using the doubly labeled water (DLW) method [5]. That study showed that the devices could rank individuals by daily TEE (ρ=.80-.88), but absolute values varied widely among devices and differed significantly from the criterion under free-living conditions. Moreover, it is desirable to estimate accurately not only TEE but also daily PAEE because TEE is mainly determined by the basal metabolic rate (BMR) rather than PA [6].
Several studies have tested the validity of wearable devices for estimating energy expenditure (EE) during some activities [7][8][9][10][11][12][13][14]. However, most have compared EE estimated by wearable devices with standard reference measures obtained by expired gas analysis during very short structured activities in laboratories [7][8][9][11][12][13]. EE measured in such study designs also includes resting EE (REE) or BMR and therefore does not reflect net PAEE. The BMR accounts for a substantial proportion of TEE and is relatively constant from day to day. In contrast, PAEE contributes to TEE to a lesser extent, but it is a fairly variable component that offers the opportunity to increase TEE [6]. Given the relationship between the amount of PA and health outcomes, accurate estimations of net PAEE using wearable devices are required, especially under the free-living conditions in which such devices are used. Various wearable devices are available for consumer purchase [15], but little is known about their validity.

Objectives
In this study, we evaluated the validity of consumer-based and research-grade wearable devices for estimating PAEE values without the BMR or REE. We used 2 designs: (1) a standardized day, with PAEE estimated using a metabolic chamber, and (2) 15 free-living days, with PAEE estimated using the DLW method.

Participants
A total of 21 healthy adults aged 21 to 50 years (9 men and 12 women) participated in this study. None of the participants had chronic diseases that could affect their metabolism or daily PA. Their body mass index (BMI) values were within the normal range (18.5-25.0 kg/m²). Of the 21 participants, 2 were excluded from all analyses: 1 because personal information in the JAWBONE UP24 (Jawbone, San Francisco, CA, USA) app had been set incorrectly during the 15 free-living days experiment, and the other because data from the metabolic chamber during the 1 standardized day experiment were incorrect owing to instrument failure. Finally, 19 participants (9 men and 10 women) were included in this analysis. All procedures were reviewed and approved by the Ethics Review Board of the National Institute of Health and Nutrition (kenei-4-02). All participants provided written informed consent.

Wearable Devices
The consumer-based wearable devices used in this study were selected based on the following criteria: they were the most popular devices in Japan according to sales rankings on several marketing websites (eg, the Amazon Japan website [16] or the kakaku website [17] as of December 1, 2014); the app could be displayed in Japanese on a mobile phone or website; and the clock settings of the app or device could be manipulated. Clock manipulation was needed because the metabolic chamber measurement ran from 9:00 am to 9:00 am the next day; we shifted the clock so that this period corresponded to 12:00 am to 12:00 am the next day, which allowed the TEE for an entire day to be obtained. A total of 8 wearable devices, including the Fitbit Flex (Fitbit, San Francisco, CA, USA), JAWBONE UP24, Misfit Shine (Misfit Wearables, Burlingame, CA, USA), EPSON PULSENSE (SEIKO EPSON, Nagano, Japan), Garmin Vivofit (Garmin, Olathe, KS, USA), TANITA AM-160 (TANITA, Tokyo, Japan), Omron CaloriScan HJA-401F (OMRON HEALTHCARE, Kyoto, Japan), and Withings Pulse O2 (Withings, Issy-les-Moulineaux, France), were selected for this study (Table 1). In addition, 4 research-grade wearable devices, namely, Omron Active style Pro (OMRON HEALTHCARE, Kyoto, Japan), Panasonic Actimarker EW4800 (Panasonic, Osaka, Japan), SUZUKEN Lifecorder EX

Experimental Design
A total of 2 experiments were conducted to test the validity of the wearable devices: 1 used the metabolic chamber method during 1 standardized day, and the other used the DLW method during 15 free-living days. These 2 methods were used as the standard to determine TEE [18,19]. For the 1-day standardized experiment, participants visited the laboratory 2 hours before the start of the experiment (7:00 am) after an overnight fast of at least 10 hours. Height, weight, and body composition were then measured. After setting and putting on the 12 wearable devices, participants entered the metabolic chamber before 9:00 am and completed 24-hour metabolic chamber measurements (9:00 am to 9:00 am the next day) using a standardized protocol that included various activities common in daily life such as eating 3 meals, watching television (TV), using a computer, cleaning, and walking on a treadmill (Table 2). Each participant's daily energy intake was calculated by multiplying his or her BMR by 1.6, which was the PA level (PAL) assumed for the standardized day. Meals were served 3 times per day, and the total energy intake was divided equally among the 3 meals. The participants were instructed to eat all the meals that were served, and they were not allowed to eat any other foods in the metabolic chamber. However, they were permitted to drink water freely. The average intensity of this protocol, estimated using the Compendium of Physical Activities [20] and previous studies [21][22][23][24], was 1.37 metabolic equivalents (METs), and the mean PAEE estimated from MET-hours and participants' body weight was 447.0±66.8 kcal/day. Participants wore all the wearable devices during their waking hours without removing them. The 5 devices on the wrist were worn even while sleeping.
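The meal prescription described above (BMR multiplied by an assumed PAL of 1.6, split equally over 3 meals) can be illustrated with a minimal sketch. The function names and the example BMR value are hypothetical and do not come from the study.

```python
def daily_energy_intake_kcal(bmr_kcal_per_day: float, pal: float = 1.6) -> float:
    """Prescribed daily energy intake: BMR multiplied by the assumed PAL."""
    return bmr_kcal_per_day * pal


def energy_per_meal_kcal(bmr_kcal_per_day: float, pal: float = 1.6,
                         meals_per_day: int = 3) -> float:
    """Daily intake divided equally among the meals served."""
    return daily_energy_intake_kcal(bmr_kcal_per_day, pal) / meals_per_day


# Hypothetical participant with a BMR of 1500 kcal/day (illustrative value only).
example_bmr = 1500.0
print(round(daily_energy_intake_kcal(example_bmr), 1))
print(round(energy_per_meal_kcal(example_bmr), 1))
```

With these assumptions, a BMR of 1500 kcal/day corresponds to about 2400 kcal/day in total, or about 800 kcal per meal.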
Participants wore all the wearable devices when they were awake, but they did not wear them during water-related physical activities, during physical activities in which the devices were difficult to wear, or while the battery was charging. Of the 12 wearable devices, 5 were worn on the wrist even while sleeping. After 15 free-living days, all urine samples were collected and stored at −30°C until they were analyzed. Dietary assessments using a brief self-administered diet history questionnaire [25] were conducted to calculate the food quotient (FQ) after 15 days. Logs for time awake, time asleep, nonwearing time, and PA during nonwearing time were completed for 15 days by each participant. PAEE during the nonwearing time was calculated based on the recorded time and the METs taken from the Compendium of Physical Activities [20].
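The text does not give the exact formula used to convert logged nonwearing activities into PAEE. The sketch below shows one common way to do it, under two assumptions that are not taken from the study: 1 MET is treated as roughly 1 kcal per kg per hour, and 1 MET is subtracted as the resting component so that the result is net of rest. The function name and the log entries are hypothetical.

```python
def nonwear_paee_kcal(log_entries, body_weight_kg,
                      kcal_per_kg_per_met_hour=1.0, resting_met=1.0):
    """Net activity energy for logged nonwearing periods.

    log_entries: iterable of (duration_minutes, met_value) pairs, where the
    MET value would be looked up in the Compendium of Physical Activities.
    Both default conversion constants are assumptions, not study values.
    """
    total = 0.0
    for duration_minutes, met_value in log_entries:
        net_met = max(met_value - resting_met, 0.0)  # remove the resting share
        hours = duration_minutes / 60.0
        total += net_met * hours * body_weight_kg * kcal_per_kg_per_met_hour
    return total


# Hypothetical log: 30 min at 6.0 METs and 60 min at 3.0 METs for a 60 kg person.
print(nonwear_paee_kcal([(30, 6.0), (60, 3.0)], body_weight_kg=60.0))
```

Keeping the conversion constants as parameters makes it easy to substitute whatever convention a particular analysis actually uses.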

Data Reduction for Each Wearable Device
For the experiment involving 15 free-living days, the days were considered valid when participants wore the wearable devices for more than 10 hours/day [26]. However, we included 1 day when a participant slept for more than 14 hours and, therefore, did not wear the devices for more than 10 hours. The minimum number of valid days was defined as 10 days, and all participants fulfilled this requirement. The mean PAEE of valid days was used for the experiment involving 15 free-living days.
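The valid-day rule above can be sketched as a simple filter. This is an illustration only: the data layout and names are hypothetical, and the single exception the authors describe (a day with more than 14 hours of sleep that was still included) is not modeled.

```python
from statistics import mean

MIN_WEAR_HOURS = 10.0   # a day is valid when wear time exceeds this
MIN_VALID_DAYS = 10     # minimum number of valid days per participant


def mean_paee_over_valid_days(days):
    """days: list of (wear_hours, paee_kcal) tuples for one participant.

    Returns the mean PAEE over valid days, or None when the participant
    does not reach the minimum number of valid days.
    """
    valid = [paee for wear_hours, paee in days if wear_hours > MIN_WEAR_HOURS]
    if len(valid) < MIN_VALID_DAYS:
        return None
    return mean(valid)


# Hypothetical record: 10 valid days and 5 short-wear days that are dropped.
record = [(12.0, 500.0)] * 10 + [(8.0, 100.0)] * 5
print(mean_paee_over_valid_days(record))
```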
The PAEE for each device (PAEEdev) was calculated by subtracting the BMR and 0.1×TEE as diet-induced thermogenesis (DIT) from TEE estimated by each device (TEEdev
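The subtraction described above, PAEE = TEE − BMR − 0.1×TEE, is shown below as a small worked example. The PAL helper uses the standard definition TEE/BMR; the function names and the numeric values are hypothetical.

```python
def paee_kcal(tee_kcal: float, bmr_kcal: float, dit_fraction: float = 0.1) -> float:
    """PAEE = TEE - BMR - DIT, with DIT taken as a fixed fraction of TEE."""
    return tee_kcal - bmr_kcal - dit_fraction * tee_kcal


def pal(tee_kcal: float, bmr_kcal: float) -> float:
    """Physical activity level, defined as TEE divided by BMR."""
    return tee_kcal / bmr_kcal


# Illustrative values only: TEE 2400 kcal/day and BMR 1500 kcal/day.
print(round(paee_kcal(2400.0, 1500.0), 1))
print(round(pal(2400.0, 1500.0), 2))
```

Because 10% of TEE is removed as DIT, any error in a device's TEE or BMR estimate carries over directly into its PAEE.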

Anthropometry and Body Composition
Height and body weight were measured on both experiment days, and each profile was used for each experiment. BMI (kg/m²) was calculated, and body composition was determined using a bioelectrical impedance analysis (Inner Scan BC-600; TANITA).

Measurement of Energy Expenditure on a Standardized Day Using the Metabolic Chamber
An open-circuit, indirect metabolic chamber equipped with a bed, desk, chair, TV, toilet, sink, and treadmill was used to measure EE. The temperature and relative humidity in the room were controlled at 25°C and 55%, respectively. Oxygen and carbon dioxide concentrations of the air supply and exhaust were measured using mass spectrometry (ARCO-1000A-CH; Arco System, Kashiwa, Japan). The flow rate of the exhaust from the chamber was measured using pneumotachography (FLB1; Arco System). Oxygen uptake (VO2) and carbon dioxide output (VCO2) were determined from the gas concentrations of the inlet and outlet air and the flow rate of the exhaust from the chamber. TEE from 9:00 am the first day until 9:00 am the next day was estimated from VO2 and VCO2 using the Weir equation (TEEcha
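For orientation, the widely used abbreviated form of the Weir equation (without the urinary nitrogen term) is sketched below. Whether the authors used this form or the version with a nitrogen correction cannot be determined from the text, so the choice of form is an assumption, as are the function name and the gas volumes.

```python
def weir_kcal(vo2_litres: float, vco2_litres: float) -> float:
    """Abbreviated Weir equation: EE (kcal) = 3.941*VO2 + 1.106*VCO2.

    Volumes are in litres over the period of interest. The urinary
    nitrogen correction is omitted in this form.
    """
    return 3.941 * vo2_litres + 1.106 * vco2_litres


# Hypothetical 24-hour gas volumes (litres/day), for illustration only.
vo2_day = 400.0
vco2_day = 340.0
print(round(weir_kcal(vo2_day, vco2_day), 1))
```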

Measurement of Energy Expenditure During 15 Free-Living Days Using the Doubly Labeled Water Method
Gas samples for the isotope ratio mass spectrometer (IRMS) were prepared by equilibrating the urine sample with gas. CO2 was used to equilibrate 18 [27]. The memory effects of the IRMS were eliminated and checked using additional samples when the expected isotope ratio difference was high (eg, days 2-8), and the potential drift of the IRMS was corrected mathematically using standardized working criteria and checked for accuracy and precision using another working criterion at regular intervals in a series of measurements and between different measurement days. The samples obtained from 1 participant were analyzed in 1 series of measurements in 1 day to minimize the effects of day-to-day variation. The dilution space ratio of ²H (Nd) and ¹⁸O (No) of all 21 participants was 1.036±0.010 (range 1.021-1.056), which was an acceptable value according to a previous review of a large database [28]. Therefore, total body water (TBW) was calculated from the mean value of the isotope pool size of ²H divided by 1.041 and that of 18

Statistical Analysis
Data were expressed as mean (standard deviation). The Dunnett test, for which standard criteria were set as references, was used for comparing variables estimated by wearable devices during the use of the metabolic chamber method and the DLW method. The mean absolute percent errors (MAPEs) relative to the PAEE values estimated using standard methods were calculated to provide an indicator of the overall measurement error. The Pearson and Spearman correlation coefficients were used to examine the relationship between standard criteria and variables estimated by wearable devices. Modified Bland-Altman plots [32] were used to test proportional biases between standard methods and devices, and the correlation coefficient of the standard criteria and the differences between the standard criteria and each device were examined for significance. During all analyses, P<.05 was considered statistically significant. All statistical analyses were performed with SPSS version 20.0 for Windows (IBM SPSS Japan Inc, Tokyo, Japan).
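Two of the measures above, the MAPE and the proportional-bias check behind the modified Bland-Altman plots, can be sketched in plain Python. This is an illustration rather than the authors' SPSS procedure: the sign convention for the difference (device minus criterion) is an assumption, and the function names and data are hypothetical.

```python
from math import sqrt


def mape(criterion, estimates):
    """Mean absolute percent error of device estimates relative to the criterion."""
    errors = [abs(e - c) / c for c, e in zip(criterion, estimates)]
    return 100.0 * sum(errors) / len(errors)


def pearson_r(x, y):
    """Pearson correlation coefficient computed from first principles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def proportional_bias_r(criterion, estimates):
    """Correlation between the criterion and (device - criterion).

    A negative value means the device falls further below the criterion
    as the criterion value increases.
    """
    differences = [e - c for c, e in zip(criterion, estimates)]
    return pearson_r(criterion, differences)


# Hypothetical PAEE values (kcal/day) for 3 participants.
criterion = [100.0, 200.0, 300.0]
device = [100.0, 150.0, 200.0]
print(round(mape(criterion, device), 2))
print(round(proportional_bias_r(criterion, device), 2))
```

In the modified plot the criterion value, rather than the mean of the 2 methods, is placed on the x-axis, which is why the correlation is computed against the criterion alone.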

Descriptive Results
Participants were aged 32 (Tables 3 and 4). PAEE/body weight also showed results similar to those of PAL (Tables 3 and 4). Moreover, similar results were obtained in a partial correlation test using body weight as a control variable.

Figure 2. Correlation between PAEEcha (physical activity energy expenditure) and PAEEdev during 1 standardized day. Scatter plots of PAEEcha (x-axis) against PAEEdev (y-axis) during 1 standardized day. There was no significant correlation according to Pearson and Spearman tests. n.s.: nonsignificant.

Figure 3). On the other hand, systematic biases indicated by Bland-Altman plots were observed for all devices with negative coefficients (Figure 1). Regarding PAL, all devices except the Omron Active style Pro (1.72±0.10; P>.05) and Omron CaloriScan (1.71±0.09; P>.05) showed significant differences in PALdev compared with PALdlw (1.73±0.21; Tables 3 and 4).
No devices showed a significant correlation with PALdlw according to both Pearson and Spearman tests (Tables 3 and 4). PAEE/body weight also showed results similar to those of PAL (Tables 3 and 4). Moreover, similar results were obtained in a partial correlation analysis using body weight as a control variable.

Unique PAEE by Consumer-Based Devices
On the standardized day, we also compared the PAEEcha with the unique PAEE parameters obtained by 6 of the 8 consumer-based devices (Table 5). The absolute values from each device were not compared with PAEEcha because we could not find any information on how these parameters were defined and therefore could not treat the values as PAEE. None of the parameters showed a significant correlation with PAEEcha.

Principal Findings
We examined the validity of 12 consumer-based and research-grade wearable devices for estimating PAEE using a metabolic chamber and the DLW method as standard methods. On the standardized day, most of the wearable devices showed significant differences in PAEE when compared with PAEEcha (MAPE 26.5%-93.7%). Moreover, all wearable devices except the Omron CaloriScan and Omron Active style Pro significantly underestimated values during 15 free-living days (MAPE 19.4%-100.2%). These results were similar, even for PAL. The number of wearable devices with significant differences in PAEE compared with the standard criteria in this study was greater than the number of devices with significant differences in TEE in our previous study using the same 12 devices; in that study, only 2 devices during the standardized day and 4 devices during 15 free-living days showed significant differences in TEE compared with the standard criteria [5]. These results showed that wearable devices were less accurate when estimating PAEE than when estimating TEE, which includes the BMR.

Comparison With Previous Studies
Several studies have evaluated the validity of EE estimated by wearable devices during some activities [7][8][9][10][11][12][13][14]. Most of these studies were conducted during very short structured activities in laboratories. For the most studied device (Fitbit), there were many inconsistent results such as overestimated EE [4,33,34], underestimated EE [11,12], and comparable EE [8]. It has also been reported that the EE estimations based on the Fitbit differed largely depending on the activity types performed during those studies [8,12]. These discrepancies may have depended on differences in the standard criteria, EE assessment method, and selected activities. In this study, among the consumer-based wearable devices, the PAEE estimated by the Fitbit Flex was somewhat comparable with standard PAEEs during a standardized day and during 15 free-living days, which was consistent with the results of the Fitbit Zip [11]. Furthermore, in this study, the JAWBONE UP24 underestimated PAEEs during both experiments, which was consistent with the results of previous studies [7,11]. However, the Misfit Shine and Garmin Vivofit underestimated PAEE during this study but overestimated PAEE during previous studies [7,9]. Caution is necessary when directly comparing the results of this study with previous results because the quantity evaluated as PAEE was slightly different. We evaluated TEE−BMR−TEE×0.1 as PAEE (ie, net EE with PA); however, most previous studies that evaluated EE included the BMR or REE during experimental activities as PAEE. We also compared the unique indices of PAEE provided by several devices with PAEEcha (Table 5). These were indicated on the app as active EE or exercise EE. However, no parameters were significantly correlated with PAEEcha. Most epidemiological evidence demonstrating the relationship between PA and risk reduction of disease has been described in terms of the amount of PA rather than TEE. Therefore, it is important to accurately assess daily PAEE in terms of preventive medicine and public health.

Underestimation Under Free Living
In a comparison of the results of the standardized day and those of 15 free-living days, all wearable devices except the Omron CaloriScan and Omron Active style Pro underestimated PAEE for 15 free-living days, whereas 6 devices underestimated PAEE and 3 devices overestimated PAEE on the standardized day. Because TEE measurements using the metabolic chamber have been reported as not significantly different from TEE measured by DLW methods on the same days [35], our results were not caused by different criteria for the TEE assessment. It has been reported that the EE of cycling and washing laundry is underestimated by wearable devices [8,12]. Moreover, standing that does not produce acceleration may be classified as sedentary behavior [36]. These types of PA during free-living days may have caused underestimation of PAEE in this study. Although early consumer-based wearable devices for estimating PA relied on movement sensors alone (eg, accelerometers), more recently developed wearable devices integrate several physiological or geographical outputs, including heart rate, skin temperature, galvanic skin response, and a global positioning system [37]. PAEE that cannot be captured by an accelerometer may be accurately estimated using these multisensor wearable devices in the future. Another reason for the underestimation of PAEE during free-living days could have been transitions in posture (eg, sit-to-stand), changes in direction, and acceleration and deceleration during movements. Recent studies have suggested that significant additional EE is associated with changing directions and/or changing postures [38][39][40][41], and those transitions are often observed during free-living days [42,43]. However, those elements have not usually been considered when PA monitors are developed and validated. To assess actual PAEE during daily life, it is necessary to continuously evaluate the validity of these sensors for estimating PAEE.

Perspectives
Wearable devices can be powerful tools that provide not only individual information but also large-scale population data on a global scale. Most wearable devices can connect to the internet through an app on a user's mobile phone and collect data. Using 68 million days of step count data from 717,517 users of the Argus Smartphone app, Althoff et al [44] showed that inequality in PA within a country was associated with the prevalence of obesity in the population. Moreover, multiple aspects of health behavior need to be monitored simultaneously and continually because health outcomes result from various health behaviors that include not only PA but also daily diet, smoking, and sleep [45]. Under such circumstances, it is important to be able to properly evaluate multiple health behaviors and physiological parameters globally. However, some problems have been highlighted regarding the continuous wearing of such devices. One-third of owners of a consumer-based wearable device stopped using it within 6 months [15]. Therefore, it is necessary to enhance continuity and strive to maintain and improve health outcomes through various other approaches.

Limitations
There were some limitations to this study. First, the sample size was small and restricted to normal-weight individuals; therefore, results cannot be generalized to obese or lean people. Comprehensive validation extending to other populations with various PALs is required. Because it was expected that some types of PA were underestimated and some were overestimated by wearable devices, different PA situations may lead to different results. Second, we could not examine the validity of all wearable devices for all types of activity during a standardized day. Different settings using different intensities and other types of activities may lead to different results. We also need to confirm the results in different settings or examine the validity of each activity performed during a standardized day to reveal the causes of the underestimation and overestimation. Third, for several wearable devices, BMR values were obtained as whole-day values under stable conditions. This use was not intended by the manufacturers; therefore, we might have used BMR incorrectly for several devices, which might have led to erroneous estimations of PAEE because it was calculated by subtracting the BMR from the TEE. Therefore, comparisons of absolute values of PAEE for these devices in this study must be interpreted with caution.

Conclusions
In conclusion, most wearable devices showed PAEEs that were significantly different from those estimated using gold standard methods during a standardized day and 15 free-living days. It is possible that the PAEE of some PA is underestimated during free-living situations by wearable devices. The development of wearable devices that can accurately estimate PAEE will lead people to use them as motivational tools. Moreover, this will allow researchers to precisely understand PA in an observational study or intervention study, thereby leading to public health recommendations based on scientific evidence.