Validity Evaluation of the Fitbit Charge2 and the Garmin vivosmart HR+ in Free-Living Environments in an Older Adult Cohort

Background: Few studies have investigated the validity of mainstream wrist-based activity trackers in healthy older adults in real life, as opposed to laboratory settings.
Objective: This study explored the performance of two wrist-worn trackers (Fitbit Charge 2 and Garmin vivosmart HR+) in estimating steps, energy expenditure, moderate-to-vigorous physical activity (MVPA) levels, and sleep parameters (total sleep time [TST] and wake after sleep onset [WASO]) against gold-standard technologies in a cohort of healthy older adults in a free-living environment.
Methods: Overall, 20 participants (>65 years) took part in the study. The devices were worn by the participants for 24 hours, and the results were compared against validated technology (ActiGraph and New-Lifestyles NL-2000i). Mean error, mean percentage error (MPE), mean absolute percentage error (MAPE), intraclass correlation (ICC), and Bland-Altman plots were computed for all the parameters considered.
Results: For step counting, all trackers were highly correlated with one another (ICCs>0.89). Although the Fitbit tended to overcount steps (MPE=12.36%), the Garmin and ActiGraph undercounted (MPE 9.36% and 11.53%, respectively). The Garmin had poor ICC values when energy expenditure was compared against the criterion. The Fitbit had moderate-to-good ICCs in comparison to the other activity trackers, and showed the best results (MAPE=12.25%), although it underestimated calories burned. For MVPA estimation, the wristband trackers were highly correlated with each other (ICC=0.96); however, they were only moderately correlated with the criterion, and they overestimated MVPA minutes. For the sleep parameters, the ICCs were poor in all cases, except when comparing the Fitbit with the criterion, which showed moderate agreement. The TST was slightly overestimated by the Fitbit, although it provided good results with an average MAPE of 10.13%. Conversely, WASO estimation was poorer and was overestimated by the Fitbit but underestimated by the Garmin. Again, the Fitbit was the most accurate, with an average MAPE of 49.7%.
Conclusions: The tested well-known devices could be adopted to estimate steps, energy expenditure, and sleep duration with an acceptable level of accuracy in the population of interest, although clinicians should be cautious in considering other parameters for clinical and research purposes.
(JMIR Mhealth Uhealth 2019;7(6):e13084) doi: 10.2196/13084


Introduction
Fitness trackers are popular devices used by athletes and the general public to monitor their physical activity levels, sport performance, and even their general health status in real time, with the latter having the potential to also predict the person's future health status [1]. Compared to other body positions, the wrist has been identified as the most suitable location for enhancing user acceptability and the user-friendliness of the device [2]. Common consumer-level, wrist-worn devices typically provide data on step count, distance traveled, number of floors climbed, and minutes of physical activity, as well as sport-related activity recognition, physiological measurements, energy expenditure, and sleep patterns. This information can promote a healthier lifestyle or an optimal training program through user-friendly visual feedback of current status or performance compared to set targets.
Fitness trackers based on motion sensors are also being used for monitoring biomechanical quantities of clinical interest, such as gait analysis applications [3,4], indirect estimation of ground reaction forces, and posture in general [5,6].
Although the number of studies investigating the validity and reliability of different fitness trackers is growing, the majority of the evidence is limited to young and middle-aged adult populations, mostly in good health [7-9]. Considering the multiple applications of wrist-based technology and its potential adoption in health care, and with an aging population, it is important to investigate the use of these devices in different populations, such as older people [10]. Although older adults perceive commercial trackers as useful and acceptable [11,12], activity trackers designed specifically for older people are still limited [13].
Few studies have investigated the validity of mainstream wrist-based activity trackers in healthy older adults [14,15]. However, such investigations mainly involved a protocol structured around a number of daily activities simulated or recreated in a laboratory environment. Studies that investigated fitness trackers' performance when used by older people in their home environment, where older adults can perform their real daily routine, are scarce and mainly limited to step-counting features [16]. The cited review examined the validity and reliability of consumer-grade activity trackers in older community-dwelling adults across seven observational studies, of which only five were conducted in free-living settings, with monitoring periods of between 3 and 7 days.
For example, Paul et al [17] reported that the average steps per day measured over 7 days in a community-dwelling older adult population with a Fitbit and an ActiGraph showed excellent agreement, with the ActiGraph undercounting steps compared against participants' physical activity logs.
In another study, a Fitbit Flex and an ActiGraph were worn by a cohort of cardiac patients and their family members to measure steps and moderate-to-vigorous physical activity (MVPA) levels for 4 days [18]. It showed a significant correlation for step counts but lower values for MVPA, with the Fitbit Flex slightly overestimating both parameters.
Boeselt et al [19] compared a Polar A300 with a BodyMedia SenseWear in a cohort of patients with chronic obstructive pulmonary disease (mean age 66.4 years). Participants used the devices for three consecutive days, measuring steps, calories burned, daily activity time, and metabolic equivalents. The study showed a high correlation for step count and calories burned.
Farina and Lowry [20] compared the accuracy of step counts from two consumer-level activity monitors (Misfit Shine on both wrist and waist, and Fitbit Charge HR on the wrist) against two waist-worn reference devices (ActiGraph GT3X+ and New Lifestyle NL2000i) in healthy, community-dwelling older adults in free-living conditions over seven consecutive days. All consumer-level activity monitors positively correlated with reference devices. Compared to the ActiGraph GT3X+, the waist-worn Misfit Shine displayed the highest agreement, whereas the wrist-worn devices showed poorer performances.
Finally, Burton et al [21] reported good reliability and validity for the Fitbit Flex and Fitbit Charge HR compared against a GENEactiv accelerometer in a free-living environment over 14 days. Step count, distance traveled, MVPA minutes, and sleep were measured. Good strength of agreement was found between the total distance and steps obtained with the fitness trackers and the MVPA estimated by the GENEactiv.
It is evident that a comparative analysis of mainstream trackers worn by healthy older people in a more ecologically valid environment is needed. This study aims to investigate the reliability and accuracy of the wrist-based Fitbit Charge 2 and the Garmin vivosmart HR+ activity trackers in the estimation of daily step count, total calorie expenditure, MVPA, and sleep parameters within a home environment in a cohort of older adults.

Participants
This study was based on a sample of 20 healthy older people (9 males, 11 females). Volunteers were recruited via a general invitation email, posters, and word of mouth to former staff at University College Cork (Cork, Ireland) and their relatives, and also through local social and voluntary groups that had older adults as members. They were informed of the study by the Centre for Gerontology and Rehabilitation in University College Cork.
The inclusion criteria for the cohort were age 65 years or older, good general health, and no history of neurological or other disorders or disability that could affect the participant's movements. Before participation, volunteers received an oral and written explanation of the study protocol, and written consent was obtained. Sociodemographic information was collected on gender, age, weight, height, and dominant arm. The study was approved by the Clinical Research Ethics Committee at University College Cork. Demographic information on the participants who completed the study protocol is presented in Table 1.

Experimental Protocol
Two consumer-grade, wrist-worn trackers (Fitbit and Garmin) were tested, both worn on the nondominant arm. The trackers' position on the wrist was randomized. The dominant side of the waist (midaxillary line) and the dominant wrist are reported to be optimal for monitoring energy expenditure, MVPA, and sleep in older adults [25-28]; therefore, two ActiGraph monitors were placed in these positions as a reference for those parameters. Energy expenditure, MVPA levels, and steps were measured with the ActiGraph on the waist; sleep parameters were derived from the ActiGraph on the wrist. The New-Lifestyles NL-2000i tracker was also worn on the dominant side of the waist (midaxillary line) and was considered the reference for step counting. Figure 1 illustrates the body positions of the different devices on a participant.
The algorithm adopted by the ActiGraph for estimating energy expenditure was based on the method designed by Crouter et al [29] and also considered in Patterson et al [26]. This method provides estimates expressed in metabolic equivalents (METs), which are later converted into total calories per day. Likewise, the algorithms of Troiano et al [30] and Cole-Kripke et al [31] were used to estimate MVPA levels and sleep parameters, respectively, from the ActiGraph accelerometer [28,32]. By definition, MVPA level is the amount of time spent performing any activity requiring more than 3 METs, which, according to Troiano et al [30], corresponds to at least 2020 ActiGraph counts per minute. Finally, to guarantee a fair comparison between the trackers, only the sleep parameters measured by both the Fitbit and the Garmin were analyzed: total sleep time (TST) and wake after sleep onset (WASO). Participants were also asked to complete a sleep diary, and the in-bed and out-of-bed times required by the Cole-Kripke method were entered manually according to the values reported in the sleep diary.
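The MVPA and sleep definitions above can be illustrated with a minimal Python sketch. It assumes minute-by-minute ActiGraph counts and an already-scored sleep/wake sequence; the Cole-Kripke scoring itself is not reproduced. The function names, the WASO convention used (wake minutes between the first and the last epoch scored as sleep), and the sample data are illustrative assumptions, not the authors' implementation.

```python
MVPA_CUTPOINT_CPM = 2020  # Troiano et al cut point cited in the text (counts per minute)


def mvpa_minutes(counts_per_minute):
    """Count the minutes whose activity count reaches the MVPA cut point."""
    return sum(1 for c in counts_per_minute if c >= MVPA_CUTPOINT_CPM)


def sleep_summary(epochs):
    """Return (TST, WASO) in minutes from 1-minute epochs.

    epochs: booleans covering the in-bed to out-of-bed interval,
            True = scored asleep, False = scored awake.
    Assumption: WASO counts the wake epochs that fall between the
    first and the last epoch scored as sleep.
    """
    asleep_idx = [i for i, asleep in enumerate(epochs) if asleep]
    if not asleep_idx:
        return 0, 0
    tst = len(asleep_idx)
    onset, last_sleep = asleep_idx[0], asleep_idx[-1]
    waso = (last_sleep - onset + 1) - tst
    return tst, waso


# Hypothetical data, for illustration only
counts = [150, 2500, 2020, 1999, 3100, 0]
night = [False, False, True, True, False, True, True, True, False]
print(mvpa_minutes(counts))   # 3
print(sleep_summary(night))   # (5, 1)
```

In this toy night, the two wake epochs before sleep onset and the final wake epoch do not count toward WASO; only the single awakening in the middle does.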
The devices were fitted to each participant in the morning for data collection and were returned to the researchers the following morning. All trackers were removed by the participants during bathing, whereas only the trackers on the waist were removed during sleep.
Nonwear periods were defined as 90 minutes or more with no activity counts [33]. A valid day was defined as 10 wearing hours or more in a 24-hour period [30].
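The two rules above can be sketched in Python as follows. This is a simplified illustration that assumes minute-level activity counts and treats only strictly consecutive zero counts as nonwear; published nonwear algorithms may tolerate brief spikes of movement, which this sketch does not. The function and constant names are hypothetical.

```python
NONWEAR_MIN_RUN = 90       # minutes of consecutive zero counts treated as nonwear
VALID_DAY_MIN_WEAR = 600   # 10 hours of wear, expressed in minutes


def wear_minutes(counts_per_minute):
    """Return wear time in minutes after removing zero-count runs of 90+ minutes."""
    nonwear = 0
    run = 0
    # A nonzero sentinel at the end closes any zero run that reaches the end of the record.
    for c in list(counts_per_minute) + [1]:
        if c == 0:
            run += 1
        else:
            if run >= NONWEAR_MIN_RUN:
                nonwear += run
            run = 0
    return len(counts_per_minute) - nonwear


def is_valid_day(counts_per_minute):
    """A day is valid when wear time reaches the 10-hour threshold."""
    return wear_minutes(counts_per_minute) >= VALID_DAY_MIN_WEAR


# Hypothetical 24-hour record: a 120-minute removal and a 30-minute sedentary spell
day = [100] * 700 + [0] * 120 + [50] * 500 + [0] * 30 + [80] * 90
print(wear_minutes(day))   # 1320
print(is_valid_day(day))   # True
```

The 30-minute run of zeros is kept as wear time because it is shorter than the 90-minute threshold, whereas the 120-minute run is excluded.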

Statistical Analysis
Descriptive statistics were run on the computed parameters. The following indicators were computed for each parameter and device: mean estimated value with related standard deviation (SD), mean bias with standard deviation, mean percentage error (MPE) with standard deviation, and mean absolute percentage error (MAPE). Intraclass correlation (ICC [2,1]) was computed for each tracker against all other devices and against the criterion. The related 95% confidence intervals (CIs) were also computed. ICC values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.90 are indicative of poor, moderate, good, and excellent reliability, respectively [34]. Bland-Altman plots were also obtained for every parameter, comparing all possible pairings of trackers and the criterion. All statistical analyses were performed using MATLAB (MathWorks, Natick, MA, USA).
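The indicators listed above can be sketched as follows. The authors report using MATLAB; Python is used here purely for illustration, and none of this is their code. The sketch assumes that percentage errors are taken relative to the criterion value, that limits of agreement are bias ± 1.96 SD of the differences, and that ICC(2,1) is the two-way random-effects, absolute-agreement, single-measure form. Confidence intervals are omitted, and all names and sample values are hypothetical.

```python
from statistics import mean, stdev


def error_metrics(device, criterion):
    """Mean error (bias), MPE, and MAPE of device readings against a criterion."""
    diffs = [d - c for d, c in zip(device, criterion)]
    pct = [100.0 * (d - c) / c for d, c in zip(device, criterion)]
    return {
        "bias": mean(diffs),
        "mpe": mean(pct),                     # signed: over- and undercounts cancel
        "mape": mean([abs(p) for p in pct]),  # unsigned: no cancellation
    }


def bland_altman_limits(device, criterion, z=1.96):
    """Lower and upper limits of agreement (assumed to be bias +/- 1.96 SD)."""
    diffs = [d - c for d, c in zip(device, criterion)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias - z * sd, bias + z * sd


def icc_2_1(ratings):
    """ICC(2,1) from an n-subjects x k-devices table of measurements."""
    n, k = len(ratings), len(ratings[0])
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ms_rows = ss_rows / (n - 1)                                    # between subjects
    ms_cols = ss_cols / (k - 1)                                    # between devices
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def interpret_icc(icc):
    """Map an ICC value to the reliability bands quoted in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical daily step counts for four participants
device = [5200, 4100, 7300, 6100]
criterion = [5000, 4000, 8000, 6000]
print(error_metrics(device, criterion))
print(bland_altman_limits(device, criterion))
icc = icc_2_1(list(zip(device, criterion)))
print(round(icc, 2), interpret_icc(icc))
```

In this made-up example the MPE is close to zero while the MAPE is roughly 4%, because one large undercount offsets three small overcounts in the signed average. This is why both indicators are reported side by side.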

Results
Overall, 20 participants took part in the data collection. All participants were white, of Irish and British ancestry. Data collection was carried out at the Tyndall National Institute between April 2018 and August 2018. Table 2 shows the mean values measured, mean error, mean percentage error, and MAPE with related standard deviations for the activity monitors and each parameter. Likewise, Table 3 shows the related ICCs with the 95% CI for each tracker and every parameter. Figure 2 displays the MAPE.
The average wear time for each tracker among all participants was mean 963 (SD 102) minutes per day; thus, every monitored day for each participant was deemed a valid test day above the 10 wearing hours threshold. All monitored days were on weekdays.
For step counting, all the trackers were highly correlated with one another (ICCs>0.89). Although the Fitbit tended to overcount steps (MPE=12.36%), the Garmin and ActiGraph undercounted, with an MPE of 9.36% and 11.53%, respectively. In terms of MAPE, the Garmin and ActiGraph were slightly more accurate, with values of 12.89% and 14.23%, respectively. Therefore, all the considered activity trackers can capture steps accurately when worn on the nondominant wrist. The Bland-Altman plots are shown in Figures 3-6 for steps, energy expenditure, MVPA, and sleep parameters, respectively, and summarized in Table 4.

Discussion
This investigation is one of the first to examine the reliability and accuracy of two consumer-level, wrist-based activity trackers in the estimation of daily step count, total calorie expenditure, MVPA, and sleep parameters in a home environment for a cohort of healthy older adults.
Results show that the mainstream monitors may be adopted to estimate steps, energy expenditure, and some sleep parameters (eg, TST) with a certain level of accuracy in a healthy older adult population in free-living settings, whereas other variables (MVPA and WASO) may show excessively large errors.
The measured mean values are largely consistent with those reported for the same population of interest in other studies that adopted non-consumer-level technologies for energy expenditure [26], steps and MVPA [35], and sleep analysis [36].
Regarding step-counting performance, all the trackers showed good strength of agreement with one another and with the reference device. This study also confirms previous findings [17,18] indicating slight overcounting by the Fitbit device and undercounting by the ActiGraph. In absolute terms, as shown by the MAPE, there was no significant difference between the Fitbit and the Garmin monitors when monitoring steps. Similar considerations could be drawn for energy expenditure; however, only the Fitbit showed moderate-to-good agreement with the other trackers. The Fitbit underestimated energy expenditure in our study, in line with the review by Feehan et al [37]. That review cited a number of studies in which Fitbits worn on the wrist in free-living settings underestimated METs by 7% (compared against doubly labeled water), showed a −10% measurement error (against the SenseWear), and provided MAPE values ranging from 16% to 30% (compared against an ActiGraph or Actiheart accelerometer). However, most of the studies reviewed considered healthy adults rather than older adults; thus, the lower MAPE value reported in our study (mean 12.25%, SD 7.33%) may be due to the generally limited amount of moderate-to-vigorous activity performed by older adults. The Fitbit showed the narrowest limits of agreement among the trackers, indicating that the device could underestimate daily energy expenditure by up to 750.34 kcal and overestimate it by up to 398.94 kcal.
Because of its many reported health benefits, MVPA is an important aspect of people's lives and, with aging, it may become even more useful for supporting independent living and preventing noncommunicable diseases [35]. The current national MVPA recommendations consider a threshold of 30 minutes per day or 150 minutes per week. Therefore, a correct and reliable estimation of MVPA bouts helps support behavior change techniques applied to sedentary older adults. All the trackers considered (the Fitbit and Garmin on the wrist, and the New-Lifestyles NL-2000i on the waist) were moderately correlated with the reference. However, the Fitbit and Garmin showed an excellent strength of agreement with each other. The Fitbit and Garmin tended to overestimate MVPA, with an average error of 12.63 minutes per day and 13.8 minutes per day, respectively, consistent with previously reported results [18], whereas the New-Lifestyles NL-2000i underestimated it by 11.7 minutes per day. However, because of the limited moderate-to-vigorous activity performed, MAPE values were large, especially for the wrist-worn devices. The MAPE was mean 75.73% (SD 75.31%) and mean 91.98% (SD 47.13%) for the Fitbit and Garmin, respectively, confirming the large overestimation errors observed in Fitbit devices estimating MVPA in free-living settings compared with an ActiGraph accelerometer in healthy young adults and older adults living with a variety of chronic diseases (MAPEs >30%) [37]. The waist-worn device showed slightly better results in terms of both MAPE and limits of agreement (−47.85 to 24.45).
Finally, aging also affects sleep, and sleep patterns change with age (for example, a decrease in the amount of slow wave sleep, increases in non-rapid eye movement sleep, an increase in the number of spontaneous arousals, and changes in the normal circadian sleep cycle) [38]. Moreover, older adults are more prone to develop sleep-related respiratory disorders, which are associated with cardiovascular disease, metabolic disorders, and impaired neurocognition [38]. Thus, low-cost, unobtrusive, and effective sleep monitoring devices such as consumer-level activity trackers are ideal for providing insightful details on the normal changes in sleeping patterns with advancing age. Of the two trackers, only the Fitbit was moderately correlated with the ActiGraph worn on the wrist, and only for the estimation of sleeping time. TST was overestimated by a mean of 5.72 minutes per day, with a MAPE of 10.13% (SD 9.12%), which is largely consistent with findings reported in other studies adopting Fitbit devices to investigate sleep measurement accuracy in healthy young adults in free-living settings (MAPE approximately 10%) [37]. The Garmin showed larger errors, with a MAPE of 16.8% (SD 13.3%). In contrast, WASO measurements were poorly correlated with the ActiGraph for both devices. Although the lowest mean error was 0.21 minutes per day for the Fitbit, the MAPE was large (mean 49.7%, SD 72%) because of the generally limited amount of time spent awake overnight. Conversely, the Garmin significantly underestimated the measurements. Limits of agreement were similar for both trackers for both parameters. This level of performance may not be suitable for clinical-grade investigations, which require accurate measurements to support the decision-making process. For example, WASO is typically adopted as a criterion for discriminating insomnia and normal-sleeper groups (the general threshold is WASO ≥31 minutes per day occurring at least three times per week for at least 6 months) [39]; thus, the WASO estimation inaccuracy may hinder the adoption of mainstream wristband devices for clinical assessments in populations expected to have abnormal sleep patterns [40].
It is worth clarifying that there is no universally accepted definition of an acceptable degree of error for physical activity wearable devices. Some studies recommend that an acceptable measurement error is within ±3% under controlled conditions or for research purposes and within ±10% under free-living conditions [41,42]. Other studies recommend that mean errors of less than 20% indicate acceptable validity for clinical purposes [43]. This investigation considered a tested measure to have acceptable validity against the criterion measure for clinical purposes when the mean error was less than 20%. Results suggest that the tested devices could be adopted to estimate steps, energy expenditure, and sleep duration with an acceptable level of accuracy in the population of interest, whereas clinicians should be cautious in considering other parameters (eg, MVPA, awakenings) for clinical and research purposes. Although estimation performance is modest for some variables, it may still be adequate for guidance purposes. For instance, the ever-growing acceptance of wearable technologies by older people may push the adoption of wrist-worn trackers in behavior change investigations [11,44].
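As a rough illustration of how such thresholds separate the parameters, the following Python sketch checks error values quoted in this article against a chosen limit. The text states the 20% criterion in terms of mean error; applying it to the MAPE values here is an assumption made only for illustration, and the function name is hypothetical.

```python
def within_threshold(error_pct, threshold_pct=20.0):
    """True when the magnitude of a percentage error is below the threshold."""
    return abs(error_pct) < threshold_pct


# MAPE values (%) quoted in the text
reported_mape = {
    "steps (Garmin)": 12.89,
    "energy expenditure (Fitbit)": 12.25,
    "TST (Fitbit)": 10.13,
    "TST (Garmin)": 16.8,
    "MVPA (Fitbit)": 75.73,
    "MVPA (Garmin)": 91.98,
    "WASO (Fitbit)": 49.7,
}

for name, value in reported_mape.items():
    verdict = "within" if within_threshold(value) else "outside"
    print(f"{name}: {value}% is {verdict} the 20% limit")
```

Under this reading, the step, energy expenditure, and TST values fall within the limit, whereas the MVPA and WASO values fall well outside it.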
This study was limited to healthy older adults. As a consequence, it is difficult to indicate whether these findings are generalizable to less active, impaired, or hospitalized older adults. Indeed, as shown in the literature, step counting in people using a walking aid in a laboratory-structured protocol is a challenge for all consumer-level trackers, as evidenced by large MAPE values. Moreover, the small number of participants and the short monitoring duration may also limit the generalizability of these findings. Thus, further studies are needed to investigate activity trackers' performance in a larger cohort and also in nonhealthy populations.
Although the most common commercial trackers were considered in this study, it is difficult to indicate if results may translate to other consumer monitors on the market, due to the different algorithms they may employ.
This analysis was limited to some health parameters, whereas other variables, which may be of interest in older adults, could not be taken into account due to the lack of a gold-standard for nonlaboratory settings. Some examples are sedentary bouts, light activity bouts, the amount of time spent in different postures, distance traveled, speed, additional sleep measures (eg, sleep efficiency, sleep latency), and physiological measures, such as continuous heart rate measurements, blood oxygen saturation levels, galvanic skin response, blood pressure, or photoplethysmography, and these should be further investigated in future studies.
This study explored the performance of two wrist-worn trackers (Fitbit Charge2 and Garmin vivosmart HR+) estimating steps, energy expenditure, MVPA levels, and sleep parameters against gold-standard technologies in a free-living environment in a cohort of healthy participants aged 65 years and older.
This study confirmed that the wrist-worn devices are effective in estimating steps, energy expenditure, and some sleep parameters with a certain level of accuracy in healthy older adults (lowest MAPE values: 12.89% for step counting with the Garmin, 12.25% for energy expenditure with the Fitbit, and 10.13% for TST estimation with the Fitbit). The results were consistent with previous studies, and the observed accuracy was acceptable for monitoring everyday activities. However, clinicians should be cautious in considering other parameters (eg, MVPA levels and WASO) for clinical and research purposes.