Validity and Reliability of Wearable Devices during Self-Paced Walking, Jogging and Overground Skipping

Wearable technology can track unusual exercise, providing data for improving fitness. The aim of the study was to determine the validity and reliability of wearable devices during walking, jogging, and skipping. Eighteen volunteers completed 5 min self-paced activities interspersed with 5 min rest. Variables and devices were step count (Garmin Instinct), estimated energy expenditure (Garmin Instinct, Polar Vantage M2), and heart rate (Garmin Instinct, Polar Vantage M2, Polar OH1, Polar Verity Sense). Validity measures were mean absolute percent error (MAPE) and Lin's concordance correlation coefficient (CCC), and reliability measures were coefficient of variation (CV) and intraclass correlation coefficient (ICC). Thresholds were MAPE ≤5%, CCC ≥0.90, CV ≤10%, ICC ≥0.70. Garmin Instinct step count during skipping was not considered valid (MAPE=90.2%, CCC=0.008) or reliable (CV=6%, ICC CI=0.4). Energy expenditure during skipping was not valid or reliable in the Garmin Instinct (MAPE=28%, CCC=0.27; CV=19%, ICC=0.61) or the Polar Vantage M2 (MAPE=19%, CCC=0.57; CV=13%). While the Polar Vantage M2 was reliable for estimated energy expenditure during walking and jogging activities, wrist-worn devices (Garmin Instinct, Polar Vantage M2) were neither valid nor reliable in returning estimated energy expenditure during overground skipping. From a wider perspective, wearable device algorithms for estimating energy expenditure should continue to be refined until they return the same level of accuracy as what is currently observed for heart rate, and to a lesser extent step count. Skipping may be an excellent unusual activity for testing wearable devices.


Introduction
Wearable technology, particularly activity trackers, has sprung up in popularity in recent years. These wearable devices are designed to monitor and track physical activity, including steps taken, estimated energy expenditure, and heart rate (Bunn, Navalta, Fountaine, & Reece, 2018). Systematic reviews consistently report that using wearable activity trackers leads to improvements in weight loss, waist circumference, and body mass index (Yen & Chiu, 2019). Wearing an activity tracker and receiving daily feedback on progress can lead to increased physical activity and reduced sedentary behavior (Jakicic et al., 2016). However, the boundless enthusiasm for wearable devices should be tempered, as the benefits of using activity trackers may depend on motivation and level of engagement with the device (Cadmus-Bertram, Marcus, Patterson, Parker, & Morey, 2015).
Wearable technology has been increasingly used to monitor and track physical activity beyond traditional exercise activities, such as in water-based sports, dancing, and video gaming. Activity trackers have been proposed to obtain metrics during water-based activities such as kayaking and canoeing (Umek & Kos, 2018; Liu, Wang, Qiu, Zhang, & Hao, 2021). Similarly, wearable sensors can be used to assess activity during both social and jazz dancing activities (Stančin & Tomažič, 2022), making them a useful tool for dancers who must perform repetitive leaping motions to track their progress and improve their performance. Furthermore, virtual reality gaming has gained popularity as an alternative form of exercise, and wearable technology such as virtual reality headsets and motion capture devices can be used to monitor and track physiological responses during these activities (Cao, Xie, & Chen, 2019). Overall, wearable technology has shown potential in monitoring and tracking physical activity in a variety of unusual exercise activities, providing useful data for individuals to optimize their performance and improve their overall fitness levels. However, further research is needed to explore the accuracy and reliability of these devices in diverse settings and unusual activities.
Bounding straight to the issue, overground skipping is a non-traditional exercise that has not been investigated to date in the context of wearable technology devices. Most humans transition from walking to running as a faster method of travel and exercise (Navalta, Davis, Carrier, Sertic, & Cater, 2021); however, running requires greater energy expenditure than walking (Harrell et al., 2005). Some people, particularly children, perform skipping as a transitory movement as an alternative to walking and running (Navalta et al., 2021). The available literature centers on rope skipping, rather than overground skipping (Verdel et al., 2022; Yongmao & Yuxin, 2023). One study evaluated skipping in 20-sec bouts as obtained from a novel smart patch (Verdel et al., 2022), and a recent investigation was conducted on rope skipping and deep learning algorithms (Yongmao & Yuxin, 2023). However, overground skipping is different in that the action involves forward propulsion, which is absent in rope skipping and may be responsible for half of the energy cost of walking (Gottschall & Kram, 2005). Thus, the evaluation of wearable devices to return valid and reliable measurements during overground skipping should be conducted.
The purpose of this investigation was to determine the validity and reliability of wearable technology devices during an overground skipping activity. While many studies are designed to determine validity, there is a need to also obtain reliability metrics (Carrier, Barrios, Jolley, & Navalta, 2020). Previous research has shown wearable devices to have varied levels of validity, with heart rate generally being acceptable, energy expenditure estimation unacceptable, and step count falling in between (Bunn et al., 2018). Based on previous literature and testing performed in our laboratory, it was hypothesized that during overground skipping, step count would be reliable but not valid, estimates of energy expenditure would be neither valid nor reliable, and heart rate would be both valid and reliable. These findings are important, as this is the first study to report validity and reliability of wearable devices during an overground skipping activity.

Participants
Eighteen participants (female n=10, male n=8, transgender, intersex, or other n=0) provided written informed consent that was approved by the Institutional Review Board (IRB approval# UNLV-2022-80) and volunteered for the study. A power analysis was performed using pilot data, indicating the need for at least eleven participants (coefficient of determination r²=0.57, correlation r effect size=0.755, α=0.05, power=0.80). Participants were screened and deemed not to require medical clearance to complete exercise according to the American College of Sports Medicine preparticipation health screening recommendations. The mean demographic information included: age=26±8 years, body mass=73.3±11.9 kg, height=66.4±3.5 cm, and Body Mass Index=25.6±3.0 kg·m⁻². Participants' self-identified ethnic groups included African American (n=1), White (n=10), Hispanic (n=3), Polynesian (n=1), and Southeast Asian (n=3).

Device Setup
Demographic information was obtained and input into the criterion and wearable devices prior to testing each participant. Participants were outfitted with criterion devices (K5 [COSMED, Rome, Italy] for energy expenditure; H10 [Polar Electro, Kempele, Finland] for heart rate) and wearable devices (described below), and a secure Bluetooth connection was confirmed. In all cases, devices were affixed to the body according to manufacturer recommendations. Briefly, the K5 was secured on the back of the participant, and the H10 was attached around the chest. The biceps-worn experimental devices were the Polar OH1 (Polar Electro, Kempele, Finland) and Polar Verity Sense (Polar Electro, Kempele, Finland), each placed on both the right and left biceps. The wrist-worn experimental devices were the Garmin Instinct (Garmin Limited, Olathe, Kansas) and Polar Vantage M2 (Polar Electro, Kempele, Finland), each placed on both the right and left wrist. Two of the same models of each experimental device were used simultaneously so that concurrent reliability could be obtained (Pinedo-Jauregi, Garcia-Tabar, Carrier, Navalta, & Camara, 2022).
The chest-worn (H10) and biceps-worn devices (OH1, Verity Sense) were connected via Bluetooth to an iPad mini (Apple Inc., Cupertino, CA) with the PerformTek application (Valencell, Inc., Raleigh, NC), which returns heart rate of all connected devices in a single CSV file. The wrist-worn devices (Instinct, Vantage M2) collected information directly onto the device, and the data were downloaded at a later time using recommended applications (Instinct through the Garmin Connect application and Bluetooth connection; Vantage M2 through the Polar FlowSync desktop application and Polar Flow online database).

Exercise Bouts
After confirming connection to the criterion and experimental devices, participants performed a 5-minute (min) self-paced walk back and forth through an indoor hallway along a 61-meter track, followed by 5 min of seated rest. All devices were reset during the period of rest (i.e., a new activity recording session was started with the same demographic and anthropometric data in the watch; this was not a factory reset). Participants then completed 5 min of self-paced jogging back and forth through the same indoor hallway, followed by 5 min of seated rest. All devices were again reset during the period of rest. Finally, participants performed 5 min of self-paced overground skipping back and forth through the same indoor hallway.
Step count was collected via manual clicker step counters by two independent observers during the overground skipping bout only. The step count was the arithmetic mean of the two observers' manual counts. A step was defined as the completion of a unilateral stride before transferring motion to the contralateral side of the body. The protocol is similar in terms of timing and administration to other investigations conducted by the laboratory group (Montes & Navalta, 2019; Navalta et al., 2019; Montes, Tandy, Young, Lee, & Navalta, 2022).

Devices
Polar H10: Although the use case specific to overground skipping has not been determined, the Polar H10 chest strap has been shown in other settings to have acceptable reliability (Speer, Semple, Naumovski, & McKune, 2020) and to be valid compared to electrocardiography (Gilgen-Ammann, Schweizer, & Wyss, 2019). The Polar H10 is an electrocardiogram-based heart rate sensor secured around the chest at the level of the xiphoid process. The H10 contains plastic electrodes on the underside of the strap that detect heart rate. The sensor materials include acrylonitrile butadiene styrene (ABS), ABS plus glass fiber (ABS+GF), polycarbonate, and stainless steel, and the strap material is composed of 38% polyamide, 29% polyurethane, 20% elastane, 13% polyester, and silicone prints. The H10 has a sampling frequency of 1000 Hertz (Hz).
Polar OH1: The Polar OH1 is a photoplethysmography (PPG) device that uses an optical heart rate sensor worn on the upper arm. The sensor materials include ABS, ABS+GF, poly(methyl methacrylate) (PMMA), and steel use stainless (SUS) 316. The device was positioned so that the sensor was on the underside of the armband and firmly against the skin. The OH1 has a sample rate of 135 Hz.
Polar Verity Sense: The Polar Verity Sense is also a PPG device worn on the upper arm. The sensor materials include ABS, ABS+GF, PMMA, and SUS 316. The device was positioned with the sensor on the underside of the armband and firmly against the skin. The Verity Sense has a sample rate of 135 Hz.
Garmin Instinct: The Garmin Instinct is a PPG device. The physical size is 46 x 46 x 12.5 millimeters (mm) with a mass of 45 grams (g). The optical heart rate sensor is worn on the wrist and employs Garmin Elevate™ heart rate sensor technology. The sample rate is unknown.
Polar Vantage M2: The Polar Vantage M2 is a PPG device. The physical size is 45 x 45 x 15.3 mm with a mass of 52 g. The optical heart rate sensor is worn on the wrist and employs Precision Prime™ heart rate sensor technology. The Vantage M2 is reported to have a sample rate of 135 Hz.

Statistical Analysis
Reported measures associated with validity include mean absolute percent error (MAPE), Lin's Concordance Correlation Coefficient (CCC), and the mean absolute error (MAE). The equations for these metrics were input into an Excel spreadsheet (Microsoft Excel for Mac version 16.66.1, Redmond, WA). For validity thresholds our laboratory group has used a MAPE value ≤ 5%, and a CCC ≥ 0.90 (Navalta et al., 2020). A device meeting both thresholds was considered evidence in support of validity. The Bland-Altman analysis was used to determine agreement. Bias and limits of agreement were determined using the blandr analysis in jamovi (version 2.3.19.0).
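The validity measures named above can be illustrated with a minimal Python sketch. This is not the authors' spreadsheet or the jamovi blandr routine; the function names, the use of population (n-denominator) moments in the CCC, the device-minus-criterion sign convention, and the 1.96 multiplier for the limits of agreement are all assumptions made for illustration.

```python
import numpy as np

def mape(criterion, device):
    """Mean absolute percent error (%); criterion values must be non-zero."""
    c = np.asarray(criterion, dtype=float)
    d = np.asarray(device, dtype=float)
    return float(np.mean(np.abs(d - c) / c) * 100.0)

def mae(criterion, device):
    """Mean absolute error, in the units of the measurement."""
    c = np.asarray(criterion, dtype=float)
    d = np.asarray(device, dtype=float)
    return float(np.mean(np.abs(d - c)))

def lins_ccc(criterion, device):
    """Lin's concordance correlation coefficient (assumed n-denominator moments)."""
    x = np.asarray(criterion, dtype=float)
    y = np.asarray(device, dtype=float)
    mx, my = x.mean(), y.mean()
    cov_xy = np.mean((x - mx) * (y - my))
    return float(2.0 * cov_xy / (x.var() + y.var() + (mx - my) ** 2))

def bland_altman(criterion, device):
    """Return (bias, lower limit, upper limit), assuming device minus criterion."""
    diff = np.asarray(device, dtype=float) - np.asarray(criterion, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the differences
    return float(bias), float(bias - 1.96 * sd), float(bias + 1.96 * sd)

def meets_validity(mape_pct, ccc, mape_max=5.0, ccc_min=0.90):
    """Both thresholds from the text must be met (MAPE <= 5%, CCC >= 0.90)."""
    return bool(mape_pct <= mape_max and ccc >= ccc_min)
```

For example, `meets_validity(90.2, 0.008)` applies the two thresholds to the skipping step count values reported in the abstract and returns `False`.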
Reported measures associated with reliability include the coefficient of variation (CV), and two-way mixed model with absolute agreement intraclass correlation coefficient (ICC). The CV was determined using Excel (Microsoft Excel for Mac version 16.66.1, Redmond, WA), and the ICC (single measures) using SPSS Statistics (IBM SPSS Statistics, version 28.0.1.0, Chicago, IL). For non-laboratory settings a threshold of ≤ 10% for CV, and ≥ 0.70 for ICC, with a lower bound of the 95% CI ≥ 0.70 has been used (Navalta et al., 2019). A device meeting both thresholds was considered evidence in support of reliability.
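The reliability measures can be sketched in the same way. The text does not state which CV formula was used, so the within-participant CV below (SD of the two simultaneous readings divided by their mean, averaged over participants) is an assumption. The ICC follows the standard two-way, absolute-agreement, single-measures ANOVA formula and is a sketch rather than the SPSS implementation; the confidence interval is not computed here and is taken as an input. All function names are hypothetical.

```python
import numpy as np

def paired_cv_percent(device_a, device_b):
    """Mean within-participant CV (%) for two units of the same model (assumed formula)."""
    pairs = np.column_stack([device_a, device_b]).astype(float)
    sd = pairs.std(axis=1, ddof=1)   # SD of each participant's two readings
    mean = pairs.mean(axis=1)
    return float(np.mean(sd / mean) * 100.0)

def icc_a1(ratings):
    """Two-way, absolute-agreement, single-measures ICC for an n x k array
    (rows = participants, columns = devices)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-device mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))

def meets_reliability(cv_pct, icc, icc_ci_lower, cv_max=10.0, icc_min=0.70):
    """CV <= 10%, ICC >= 0.70, and lower 95% CI bound >= 0.70, as in the text."""
    return bool(cv_pct <= cv_max and icc >= icc_min and icc_ci_lower >= icc_min)
```

Under this rule, a device with an acceptable CV and ICC point estimate still fails when the lower bound of the 95% CI falls below 0.70. If the ICC CI value of 0.4 reported in the abstract for Garmin Instinct step count is read as that lower bound, `meets_reliability` returns `False` for any point estimate.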

Step Count
Step count was obtained only from the Garmin Instinct during the skipping trial because the other devices did not return the measure. Manual step count was not obtained from one participant and was not recorded from the Garmin Instinct for two other participants. Because two Garmin Instinct units were worn simultaneously, validity measures reflect data points from 30 distinct bouts (15 participants with both manual and device counts, two units each), and reliability measures reflect 16 device pairs (see Table 1). Based on the predefined definition of a step, the device did not meet any of the threshold measures for validity and overestimated the step count by nearly double (see Table 1). The Garmin Instinct met the reliability thresholds for CV and ICC but was not considered reliable because the lower end of the 95% CI did not meet the established threshold.

Energy Expenditure
There was no missing energy expenditure data from the criterion device or any of the wearable activity trackers. Validity and reliability measures for wrist-worn devices (Garmin Instinct, Polar Vantage M2) during walking, jogging, and skipping are provided in Table 2.

Heart Rate
Heart rate measures were not obtained from the Polar OH1 for one participant during the walking trial. There was no missing heart rate data from the criterion device or any of the other wearable activity trackers. All devices met the preestablished validity thresholds during the self-paced walking trial (see Table 3). One wrist-worn device (Garmin Instinct) and one biceps-worn device (Polar Verity Sense) met the thresholds for reliability during walking (see Table 3) and can be considered both valid and reliable during this type of exercise. During both self-paced jogging and self-paced skipping, the biceps-worn devices (Polar OH1, Polar Verity Sense) met all preestablished thresholds for validity and reliability (see Table 3). Thus, the Polar OH1 and the Polar Verity Sense can be considered valid and reliable during jogging as well as overground skipping.

Discussion
The purpose of this investigation was to determine the validity and reliability of wearable technology devices to return step count, energy expenditure, and heart rate during walking, jogging, and overground skipping. It was hypothesized that, during overground skipping, step count would be reliable but not valid, energy expenditure would be neither valid nor reliable, and heart rate would be both valid and reliable. First, the hypothesis for step count was only partially supported because step count during self-paced skipping was neither valid nor reliable, whereas reliability had been expected. Second, the hypothesis for energy expenditure was supported because estimated energy expenditure during self-paced skipping was neither valid nor reliable. Third, the hypothesis for heart rate was supported for biceps-worn devices (valid and reliable) but not wrist-worn technology (neither valid nor reliable).
The conclusions drawn from step count during skipping are based on a single wrist-worn wearable device and should be considered with caution. The device did not meet the validity thresholds, likely because of how a step was defined in the current investigation (as a complete unilateral stride before transferring motion to the contralateral side of the body). The definition did not account for the foot tap down during a stride that many participants performed during the skipping motion, which likely accounted for the very large bias that was observed. It may be useful for future researchers interested in step count during overground skipping to account for the foot tap, as it appears at least one wearable device registers this action. It is proposed that users of consumer activity trackers may utilize this knowledge to their advantage, to increase step count during wellness program challenges by performing overground skipping as their preferred method of transportation, which would conceivably double the step count compared to the more mundane method of walking. The practical implications of this finding should not be skipped over.
Heart rate was observed to be both valid and reliable in all devices during the walking bout. When wrist-worn devices (Garmin Instinct, Polar Vantage M2) are considered, the finding did not extend to activities of greater relative intensity, being considered neither valid nor reliable during jogging and overground skipping. This finding aligns with a study using a different wrist-based device, the Garmin fēnix 5, which reported moderate validity measures during sitting and walking, but poor validity measures during increased intensities of exercise (Duking et al., 2020). However, an earlier Polar wrist-worn model, the Polar Vantage V, was reported to have acceptable MAPE values for heart rate during running and cycling at low and high self-selected intensities (Hajj-Boutros, Landry-Duval, Comtois, Gouspillou, & Karelis, 2023). The specific wrist-based devices used in the current investigation have little published literature outside of conference abstracts, many of which are from our laboratory group. The same holds true for one of the biceps-worn devices, the Polar Verity Sense, in that published validation literature currently exists only in abstract form (Bodell et al., 2021). The other biceps-worn device, the Polar OH1, is reported to have acceptable validity during treadmill (MAPE between 0.2 and 1.9%) and cycle exercise (MAPE between 0.6 and 3.9%) (Muggeridge et al., 2021), spin bike activities (mean bias less than 1 bpm) (Hettiarachchi, Hanoun, Nahavandi, & Nahavandi, 2019), front crawl swimming at various intensities (ICC between 0.72 and 0.96) (Olstad & Zinner, 2020), and endurance sports (difference from criterion < 5%), as well as acceptable reliability during endurance sports (ICC = 0.99) (Hermand, Cassirame, Ennequin, & Hue, 2019). Based on the results of the current investigation, biceps-worn devices may be preferable for individuals who are interested in obtaining both valid and reliable heart rate measures in non-traditional activities such as overground skipping.
It has been noted that estimated energy expenditure in wearable devices during exercise validation studies is generally poor (Bunn et al., 2018). As described with heart rate above, the specific devices used in the current investigation have not been reported in the published literature, and the same holds true for energy expenditure. The existing literature details opportunities for improvement when energy expenditure is considered. Physical activity energy expenditure in wearable devices including the Garmin vívofit, Jawbone Up24, and Fitbit Flex, compared to doubly labeled water, was deemed unacceptable (MAPE range for 12 devices 19.4% to 100.2%) (Murakami et al., 2019). Estimated energy expenditure returned from the Garmin vivo HR+ and Fitbit Charge 2 was evaluated against COSMED units (K4, K5) during progressive exercise tests to volitional fatigue (treadmill, cycle ergometer), and MAPE did not meet the validity threshold (18.9% to 43.5%) (Reddy et al., 2018). During trail running, the Hexoskin biometric shirt did not return correlated energy expenditure measures compared to the COSMED K4 (r = -0.058) (Tanner et al., 2016). The Apple Watch 6 (14.9% to 47.8% MAPE), Polar Vantage V (15.6% to 34.6% MAPE), and Fitbit Sense (17.8% to 45.1% MAPE) all displayed unacceptable MAPE during walking, running, resistance exercises, and cycling (Hajj-Boutros et al., 2023). The current investigation adds to the volume of literature, in that estimated energy expenditure obtained from Garmin Instinct and Polar Vantage M2 wrist-worn devices during self-paced walking, jogging, and overground skipping is similarly poor when validity measures are considered. While the current investigation reports the Polar Vantage M2 to satisfy reliability thresholds during walking and jogging, this is of little consolation if the validity assumptions are violated.
The current investigation is not without limitations or hurdles, which are inherent in wearable technology validation studies (Navalta & Bunn, 2023). As mentioned previously, the predetermined definition for what constitutes a step during the overground skipping motion likely led to the device returning poor validation measures. Future studies employing overground skipping while obtaining step count measures should take this factor into account. Another limitation is that the order of the exercise bouts was constant, with walking first, then jogging, and ending with overground skipping. Because the intent of the study was to obtain concurrent validity and reliability measures, it is believed that this approach is acceptable. However, it is unknown whether scheduling the overground skipping trial last affected mechanics of the skill, and whether potential fatigue may have induced extraneous movements that decreased validity and/or reliability measurements in certain devices. Future studies employing various exercise modalities should not skip over randomizing the order of exercise bouts.
In conclusion, the modality of overground skipping is an area of research that represents boundless possibilities. In the context of the current investigation, several findings are reported on specific wearable devices for the first time. Regarding heart rate, the Garmin Instinct, Polar Vantage M2, Polar Verity Sense, and Polar OH1 met all thresholds to be considered both valid and reliable during self-paced walking. Only the biceps-worn devices (Polar OH1, Polar Verity Sense) met the heart rate thresholds for self-paced jogging and overground skipping. The Garmin Instinct was the only device to return step count, and it was neither valid nor reliable during overground skipping according to our predetermined definition of a step. While the Polar Vantage M2 was reliable for energy expenditure during walking and jogging, wrist-worn devices (Garmin Instinct, Polar Vantage M2) were neither valid nor reliable in returning estimated energy expenditure during overground skipping. From a wider perspective, wearable device algorithms for estimating energy expenditure should continue to be refined until they return the same level of accuracy as what is currently observed for heart rate, and to a lesser extent step count. Overground skipping may be an excellent unusual exercise for the continued testing of these measures as leaps and bounds are made in the development of wearable activity devices.

Table 1.
Step count during the self-paced skipping bout. Criterion (manually counted steps), and validity and reliability measures of the Garmin Instinct. Average is the arithmetic mean (standard deviation) measurement of step count. MAPE = mean absolute percent error, CCC = Lin's Concordance Correlation Coefficient, CV = coefficient of variation, ICC = Intraclass correlation coefficient, CI = Confidence interval of the ICC. Values noted in bold and italics meet the predetermined threshold for validity or reliability.

Table 2.
The biceps-worn devices did not return an estimate of energy expenditure. Neither the Garmin Instinct nor the Polar Vantage M2 met the predetermined validity thresholds when returning estimates of energy expenditure across any bout of exercise. The Polar Vantage M2 met the reliability thresholds during self-paced walking and jogging, but not overground skipping.
VALIDITY OF WEARABLES DURING OVERGROUND SKIPPING | J. W. NAVALTA ET AL.

Table 2.
Energy expenditure during self-paced walking, jogging, and skipping bouts. Criterion (COSMED K5), and validity and reliability measures of the Garmin Instinct, and Polar Vantage M2. Average is the arithmetic mean (standard deviation) measurement of kilocalories expended. MAPE = mean absolute percent error, CCC = Lin's Concordance Correlation Coefficient, CV = coefficient of variation, ICC = Intraclass correlation coefficient, CI = Confidence interval of the ICC. Values noted in bold and italics meet the predetermined threshold for validity or reliability.
MAPE = mean absolute percent error, CCC = Lin's Concordance, CV = coefficient of variation, ICC = Intraclass correlation coefficient, CI = Confidence interval. Values noted in bold and italics meet the predetermined threshold for validity or reliability.