Within and between-day reliability of energetic cost measures during treadmill walking

The efficacy of assistive devices used during walking is often measured as a reduction in metabolic cost. Metabolic cost is typically assessed within a day or on multiple days, yet the benefit of performing within-day vs. between-day metabolic assessments is unknown. The purpose of this study was to determine the within-day minimal detectable change of standard measures of physiologic performance using a conventional portable metabolic system (K4b2 Cosmed, Rome, Italy), and compare these to between-day values. Twenty healthy adults completed two identical data collection sessions on separate days. In each session they performed three bouts of treadmill walking interspersed with three bouts of rest while oxygen consumption (V̇O2), carbon dioxide production, and heart rate were measured. Intraclass correlation coefficients (ICC) and minimal detectable change values were calculated for non-resting within-day, as well as all between-day comparisons. All within-day measures were clinically reliable (ICC > 0.96), while between-day measures were generally less reliable (ICCs > 0.82). Within-day minimal detectable change values (walking heart rate = 4.9 bpm; gross V̇O2 = 0.80 mL/kg/min; net V̇O2 = 0.80 mL/kg/min; cost of transport = 0.022 J/Nm) were about half as large as between-day values. The results of this study suggest that, where possible, physiologic changes should be assessed within a single day of testing to maximize ability to detect small changes in performance.

*Corresponding author: Deanna H. Gates, School of Kinesiology, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109, USA. E-mails: gatesd@umich.edu, deanna.h.gates@gmail.com

ABOUT THE AUTHORS Audra Davidson is a researcher and Dr Emily S. Gardinier is a postdoctoral fellow in The Rehabilitation Biomechanics Laboratory, run by Dr Deanna H. Gates. In the Rehabilitation Biomechanics Laboratory, we focus on the study of repetitive human movements such as walking and reaching. Throughout these studies, we aim to determine which aspects of movement a person actively controls and how this function can be modeled most effectively. Along with governing control strategies, these models can be used to design both passive and active devices, which can mimic biological function and restore or improve function in individuals with disability. Another focus of our research is determining appropriate outcomes to measure performance with new technology. Without such guidelines, it is difficult to assess the efficacy of assistive devices. The following work aids in this endeavor, providing additional tools for the evaluation of assistive devices.

PUBLIC INTEREST STATEMENT
A key component of the design and prescription of medical devices that assist in walking is a thorough understanding of how patients will benefit from their use. Reducing the energy patients must expend to walk is often a primary goal of such devices. In order to accurately determine if this goal is being met, it is important for researchers to separate everyday changes in energy expenditure from changes caused by using the device. This work aims to provide a standard against which changes in energy expenditure within one day of testing can be measured. With this benchmark, researchers will have a better understanding of which assistive devices result in meaningful reductions in energy expenditure. This will allow for more rigorous assessments of assistive walking devices, ultimately providing physicians and patients with a more thorough understanding of their benefits.

Experimental protocol
Participants performed two identical test sessions in the morning at least one day apart (14 (13) days). Participants were instructed to abstain from caffeine, avoid strenuous exercise, and fast for at least 4 h prior to testing.
A K4b2 portable metabolic system was used to measure the rate of oxygen consumption (V̇O2, mL/kg/min), carbon dioxide production (V̇CO2, mL/kg/min), and heart rate. This system consists of a mask, heart-rate monitor, and a collection unit that wirelessly transmits data to a laptop where it can be observed in real time (Figure 1(A)). First, participants were asked to remain seated and still for at least 5 min to obtain metabolic cost measures for seated rest. Then, they walked on a treadmill at a speed that was normalized to leg length and approximated a comfortable walking speed (average speed = 1.22 (0.03) m/s) (Gates, Wilken, Scott, Sinitski, & Dingwell, 2012). Participants walked for a minimum of 5 min, or until they reached steady-state, which was identified in real time by visually observing a plateau in V̇O2 (average time = 9.1 (1.98) min; Figure 1(B)). Once this plateau was reached, participants walked for an additional 3 min to obtain metabolic cost measures for steady-state walking. This process was then repeated such that all participants completed three bouts of walking, which were interrupted by 5 min bouts of seated rest. Metabolic rate was estimated from V̇O2 and V̇CO2 (Brockway, 1987). COT, defined as the energy cost of moving 1 kg of body mass 1 m (J/Nm), was calculated by dividing net metabolic rate by walking speed (Mian, Thom, Ardigo, Narici, & Minetti, 2006).
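The conversion from gas exchange to cost of transport can be illustrated with a minimal sketch. It assumes the commonly used form of the Brockway (1987) equation without the urinary nitrogen term (16.58 J per mL of O2 and 4.51 J per mL of CO2), assumes that net metabolic rate is the walking value minus the seated-rest value, and assumes normalization by body weight so that the result is in J/Nm. The function names and the input values are hypothetical and are not data from this study.

```python
G = 9.81  # gravitational acceleration in m/s^2, used to normalize by body weight


def metabolic_power_w_per_kg(vo2_ml_kg_min, vco2_ml_kg_min):
    """Metabolic power in W/kg from gas exchange rates in mL/kg/min.

    Assumes the Brockway (1987) coefficients with the urinary nitrogen
    term omitted: 16.58 J per mL O2 and 4.51 J per mL CO2.
    """
    joules_per_kg_per_min = 16.58 * vo2_ml_kg_min + 4.51 * vco2_ml_kg_min
    return joules_per_kg_per_min / 60.0  # per minute -> per second


def cost_of_transport(walk_vo2, walk_vco2, rest_vo2, rest_vco2, speed_m_s):
    """Net cost of transport in J/Nm (dimensionless).

    Assumption: net metabolic rate = walking rate - seated-rest rate.
    """
    gross = metabolic_power_w_per_kg(walk_vo2, walk_vco2)
    rest = metabolic_power_w_per_kg(rest_vo2, rest_vco2)
    net = gross - rest                # W/kg
    return net / (G * speed_m_s)      # (W/kg) / (m/s^2 * m/s) = J/(N m)


# Hypothetical values for one walking bout and one rest bout
cot = cost_of_transport(walk_vo2=12.0, walk_vco2=10.0,
                        rest_vo2=3.5, rest_vco2=3.0, speed_m_s=1.22)
print(round(cot, 3))
```

Because net values depend on the resting measurement, any noise in seated-rest V̇O2 propagates directly into net V̇O2 and COT under this assumption.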

Statistical analysis
Intraclass correlation coefficients (ICCs) were calculated to assess reliability within and between days (Portney & Watkins, 2009). The standard error of measurement (SEM) was then estimated as:

$$\mathrm{SEM} = SD_{pooled}\,\sqrt{1 - \mathrm{ICC}}$$

where $SD_{pooled}$ is the pooled standard deviation across testing bouts or days. Subsequently, minimal detectable change (MDC95) values were estimated for each variable according to the equation (Weir, 2005):

$$\mathrm{MDC}_{95} = \mathrm{SEM} \times 1.96 \times \sqrt{2}$$

where 1.96 corresponds to a 95% confidence level. MDCs are defined as the smallest change in a measure that exceeds random variation (Lassere, van der Heijde, & Johnson, 2001), and are often used to quantify the limits of expected variation in a measure. Bland-Altman plots comparing daily average values were used to identify proportional bias (Bland & Altman, 1999). Pearson's correlations were performed to determine whether testing separation, subject age, or subject weight was correlated with the percent difference in performance measures across days for each subject. All statistical analyses were performed in SPSS version 22 (IBM Corp., Armonk, NY, USA).
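As an illustration of the MDC calculation, the following sketch assumes the standard formulation attributed to Weir (2005): SEM = SD_pooled × sqrt(1 − ICC) and MDC95 = SEM × 1.96 × sqrt(2). The function names are hypothetical, and the inputs are illustrative rather than values from this study. The ICC itself is taken as an input because the specific ICC model is not stated here.

```python
import math


def sem(sd_pooled, icc):
    """Standard error of measurement from the pooled SD and the ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1 for this formula")
    return sd_pooled * math.sqrt(1.0 - icc)


def mdc95(sd_pooled, icc):
    """Minimal detectable change at the 95% confidence level.

    1.96 is the z-score for 95% confidence; sqrt(2) accounts for
    measurement error being present in both of the compared values.
    """
    return sem(sd_pooled, icc) * 1.96 * math.sqrt(2.0)


# Illustrative inputs: the same pooled SD with two different ICCs
within = mdc95(sd_pooled=1.0, icc=0.96)
between = mdc95(sd_pooled=1.0, icc=0.82)
print(round(within, 3), round(between, 3))
```

With the pooled SD held fixed, lowering the ICC from 0.96 to 0.82 roughly doubles the MDC in this sketch. In practice the pooled SD also differs between comparisons, so this is only an indication of how the ICC drives the MDC.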

Results
All 20 subjects were included in the analysis. Data from one subject (S08) were lost for one session on day two of testing due to equipment malfunction. F-tests were significant for all variables (p < 0.01), indicating sufficient heterogeneity. No proportional bias was observed in any measure (see Supplemental material).
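A Bland-Altman check for proportional bias can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in this study: it computes the mean between-day difference (bias), the 95% limits of agreement, and the slope of the differences regressed on the per-subject means. A slope near zero is consistent with no proportional bias. The function name and the data are hypothetical, and a formal assessment would also test whether the slope differs significantly from zero.

```python
from statistics import mean, stdev


def bland_altman(day1, day2):
    """Bias, 95% limits of agreement, and slope of difference vs. mean."""
    if len(day1) != len(day2) or len(day1) < 3:
        raise ValueError("need paired samples from at least 3 subjects")
    means = [(a + b) / 2.0 for a, b in zip(day1, day2)]
    diffs = [b - a for a, b in zip(day1, day2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Ordinary least-squares slope of differences on means
    m_bar = mean(means)
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx
    return {"bias": bias, "limits": limits, "slope": slope}


# Hypothetical daily averages (e.g., gross VO2 in mL/kg/min) for five subjects
day1 = [11.2, 12.5, 10.8, 13.1, 12.0]
day2 = [11.6, 12.1, 11.3, 13.4, 11.8]
result = bland_altman(day1, day2)
print(result["bias"], result["limits"], result["slope"])
```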
All within-day measures and one between-day measure (walking HR) showed clinically meaningful reliability (ICC > 0.96; Table 1). Between-day measures had good reliability (ICC > 0.82), except for seated rest V̇O2, which had poor reliability (ICC = 0.65).
The within-day MDCs were about half the magnitude of between-day MDCs (Table 1). The MDC for walking heart rate exhibited the smallest comparative difference, with a within-day MDC of 4.9 bpm, which was about 30% less than its between-day MDC of 7.1 bpm (Table 1).
The percent changes across days in energetic measures were not significantly correlated with the days between testing (p ≥ 0.24), subject age (p ≥ 0.17), or subject weight (p ≥ 0.54) for any performance measure (Table 2).

Discussion
The purpose of this study was to determine within-day MDCs for various energetic measures of walking performance and compare these to between-day values. All within-day measures were deemed clinically reliable (ICC > 0.96). Between-day comparisons of physiologic measurements generally exhibited good agreement (ICC > 0.82), but were less reliable than within-session measures. Therefore, consistent with our expectations, the K4b2 system was able to reliably detect smaller changes in physiologic measures for comparisons made within a day than comparisons made across different testing days.
The results for between-day reliability were consistent with those reported previously on healthy adults using the K4b2 system (Darter et al., 2013) and another portable metabolic system (Blessinger et al., 2009). While seated rest and walking HR MDCs in the present study were 3-4 bpm lower than those previously reported (Darter et al., 2013), between-day oxygen consumption MDCs were roughly 0.86 and 0.61 mL/kg/min higher than previously reported values (Darter et al., 2013) for gross and net V̇O2, respectively. This discrepancy may be the result of a larger separation between collection days in the current study. In the present study, testing days were separated by an average of two weeks, compared to five days of average separation in other studies (Blessinger et al., 2009; Darter et al., 2013). The longer between-day interval in the present study may have allowed for greater changes in fitness, body weight, stress levels, or other factors known to influence energetic measurements (Donahoo, Levine, & Melanson, 2004).
Consistent with prior studies (Darter et al., 2013; Thomas et al., 2009), resting V̇O2 was less reliable than walking V̇O2. Littlewood et al. (2002) suggested that the K4b2 device is highly variable when acquiring resting energetic measurements, and may only be suitable for measuring the energetics of activity. When comparing the Cosmed system to the Deltatrac II, a metabolic cart specifically designed to measure resting energetics, Littlewood et al. (2002) reported a mean bias for resting energy expenditure (REE) and V̇O2 of 268 ± 702 kcal/day and 51.6 ± 125.6 mL/min, respectively. The exact reason for this discrepancy is not well understood, as the design of the K4b2 is similar to that of devices that have been reported to produce consistently accurate resting measures (Littlewood et al., 2002). However, supine rest, longer and stricter fasting periods, and resting prior to data collection have been shown to improve the between- and within-day reliability of resting measures obtained with the Cosmed K4b2 system (Welch et al., 2015). These stricter resting protocols should be used if resting measures are of particular importance to a study.
There are several limitations to this study. First, as mentioned above, the protocol may have limited our ability to determine accurate energetic measures for seated rest. To obtain more reliable measures, we could have had participants rest for longer periods of time. This may have improved the reliability for both seated rest and net V̇O2. Second, although participants were asked to abstain from exercise and to fast and avoid caffeine, we had no way of determining whether they actually did so. Failure to adhere to these guidelines could increase day-to-day variability. This would be true for any study unless participants were observed over extended periods of time in the laboratory. Third, there was a wide variation in subject characteristics and in the time between testing sessions. Testing days were separated by an average of two weeks and the age range of subjects tested was large (19-65 years). These factors were not correlated with changes in any measure across days, however. Although this variation may increase the variability of the data, testing a wide variety of subjects with varying lengths of testing separation increases the generalizability of the results.
In conclusion, the K4b2 system is able to detect small changes in physiologic measurements within a day. Within-day MDCs were about half the size of between-day values. These results emphasize the benefit of using within-day methodological designs when feasible, because the smallest detectable difference in metabolic cost is roughly halved.