Performance of Different Accelerometry-Based Metrics to Estimate Oxygen Consumption during Track and Treadmill Locomotion over a Wide Intensity Range

Accelerometer data can be used to estimate incident oxygen consumption (VO2) during physical activity. Relationships between the accelerometer metrics and VO2 are typically determined using specific walking or running protocols on a track or treadmill. In this study, we compared the predictive performance of three different metrics based on the mean amplitude deviation (MAD) of the raw three-dimensional acceleration signal during maximal tests performed on a track or treadmill. A total of 53 healthy adult volunteers participated in the study; 29 performed the track test and 24 the treadmill test. During the tests, data were collected using hip-worn triaxial accelerometers and metabolic gas analyzers. Data from both tests were pooled for the primary statistical analysis. For typical walking speeds at VO2 less than 25 mL/kg/min, the accelerometer metrics accounted for 71–86% of the variation in VO2. For typical running speeds, from a VO2 of 25 mL/kg/min up to over 60 mL/kg/min, 32–69% of the variation in VO2 could be explained, and the test type had an independent effect on the results, except for the conventional MAD metric. The MAD metric was the best predictor of VO2 during walking but the poorest during running. Depending on the intensity of locomotion, the choice of proper accelerometer metrics and test type may affect the validity of the prediction of incident VO2.


Introduction
Accelerometer-based assessment of physical activity (PA) intensity is widely used in observational and intervention studies. Accelerometers permit continuous assessment of PA intensity patterns over long periods with high resolution. A crucial step in the assessment of PA is extracting the useful information from the raw acceleration signal and transforming it into meaningful measures, such as oxygen consumption (VO2) or metabolic equivalent (MET). While the acceleration signal obtained from the sensor follows the biomechanical rules determined by body dimensions, attachment position, and movement efficiency, the corresponding energy cost is affected by the physiological maturation and efficiency of the given person [1].
Current accelerometer-based methods for monitoring daily physical behavior vary considerably in terms of the sensor itself, its wear time and placement, and the analysis methods used to extract the information of interest. This methodological variance is likely the factor that feeds the current controversies in the epidemiology of daily physical behavior. Chastin et al. concluded recently that the data from hip-worn accelerometry can provide accurate and meaningful estimates of PA [2]. Further, recent advances in data analysis have made it possible to also quantify the sedentary time spent in standing, sitting, and reclining postures [3,4].
Relationships between the accelerometer output, gait speed, and VO2 are typically determined using specific walking or running protocols on a treadmill or overground. So far, only a few studies have compared these two approaches [5,6]. Yngve et al. found that during track locomotion the accelerometer output was consistently higher and VO2 slightly lower compared to treadmill locomotion when both activities were performed at the same individual velocity [5]. Barnett et al., in turn, found that the treadmill-based calibration equations overestimated both VO2 and walking speed during a free-living test, whereas the calibration equations based on acceleration data collected on a 400 m track at a controlled speed provided accurate and unbiased estimates of VO2 [6]. These observations mean that the accelerometer outputs differ depending on whether the locomotion was performed on a track or a treadmill. Thus, the differences in gait mechanics between treadmill walking and overground walking appear to result in inconsistent estimations of free-living gait speed and VO2 from the accelerometer data [5,6].
We have devised the mean amplitude deviation (MAD) of the resultant acceleration signal as a metric for comparable classification of PA intensity irrespective of substantial differences in measurement ranges and sampling rates of different accelerometer brands [7]. Several other researchers have evaluated the performance of the MAD metric and found it at least satisfactory [8–15]. The initial MAD method provides a valid and accurate estimate of incident VO2 within a wide range of walking and running speeds on track locomotion [16]. However, Chen et al. recently stated that the MAD method employing trunk-worn accelerometer data cannot be used to estimate VO2 while running on a treadmill [14]. In this study, we therefore assessed the validity of three different MAD-based analysis methods employing hip-worn accelerometer data in estimating VO2 during track and treadmill locomotion.

Accelerometry
This study employed the raw acceleration signals from a triaxial accelerometer (Hookie AM20, Traxmeet Ltd., Espoo, Finland) collected during controlled track and treadmill tests. This accelerometer employs the commonly used 13-bit digital triaxial acceleration sensor (ADXL345; Analog Devices Inc, Norwood, MA, USA). The measurement range of the accelerometer was ±16 g (g denotes the acceleration due to Earth's gravity, 9.81 m/s²), and the data were measured at a 100 Hz sampling frequency.
The accelerometer was attached to a hip-mounted elastic belt at the level of the iliac crest. Because the indoor track had banked turns only to the left, the track group kept the accelerometer on either the right or the left side of the hip as per random assignment. The left side (i.e., the inner side of the curve) was assigned to 13 participants and the right side (i.e., the outer side of the curve) to 16 participants. The treadmill group kept the accelerometer on the right side of the hip.

Test Protocols
The track test was conducted by the UKK Institute and the treadmill test by the University of Graz. The track group consisted of 29 healthy volunteers (15 males and 14 females), and the treadmill group of 24 healthy volunteers (12 males and 12 females). Before testing, participants' body height and weight were measured with standard methods. Participants were informed about the experimental test protocol, and they gave their written informed consent before the tests. Local ethical committees approved the studies.
The track group performed a pace-conducted non-stop test on a 200 m long indoor track. The initial speed of the track test was 0.6 m/s (2.16 km/h), and the speed was increased by 0.4 m/s (1.44 km/h) every 2.5 min.
The initial speed of the treadmill test was 2.0 km/h, and the speed was increased by 2.0 km/h every 3.0 min. The treadmill grade was constantly 1.5%. At the end of each stage, the treadmill test was paused for 30 s for blood sampling.
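The two speed progressions described above can be tabulated with a short sketch. The function names and the 0-based stage index are illustrative assumptions; only the starting speeds and increments come from the protocol description.

```python
def track_speed_kmh(stage: int) -> float:
    """Track test speed at a 0-based stage index, in km/h."""
    speed_ms = 0.6 + 0.4 * stage   # starts at 0.6 m/s, +0.4 m/s per 2.5 min stage
    return speed_ms * 3.6          # 1 m/s = 3.6 km/h


def treadmill_speed_kmh(stage: int) -> float:
    """Treadmill test speed at a 0-based stage index, in km/h."""
    return 2.0 + 2.0 * stage       # starts at 2.0 km/h, +2.0 km/h per 3.0 min stage


if __name__ == "__main__":
    for stage in range(6):
        print(stage, round(track_speed_kmh(stage), 2), treadmill_speed_kmh(stage))
```

The track stages are spaced more finely (1.44 km/h) than the treadmill stages (2.0 km/h), so the two tests sample the walking and running range at different speeds.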
While performing the test, the participants could freely decide whether they preferred walking or running for the given speed of locomotion. The test was continued until volitional exhaustion when the participant could not keep up with the concurrent pace.
During the track test, VO2 was continuously measured in breath-by-breath mode using a portable metabolic gas analyzer (Oxycon Mobile, Carefusion, Yorba Linda, CA, USA), and the data were recorded with a telemetry system. During the treadmill test, VO2 was continuously measured in breath-by-breath mode using a cardiopulmonary exercise testing system (Oxygon Pro®, Carl Reiner GmbH, Vienna, Austria). Both devices were calibrated before each test according to the manufacturer's instructions.

Data Analysis
The acceleration signal was analyzed for the final two minutes of each stage, when the participants had reached a steady rhythm. The mean VO2 of the final minute of each stage was used as the steady-state VO2 for the given speed. Stages that were not fully completed or that had a respiratory exchange ratio over 1.0 were excluded from the analysis.
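The stage-level selection described above could be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the acceleration is assumed to be an (n_samples, 3) array for one stage, and the respiratory exchange ratio is assumed to be summarized over the final minute, which the text does not specify.

```python
import numpy as np

FS_HZ = 100  # sampling rate stated in the Accelerometry section


def final_two_minutes(acc: np.ndarray, fs: int = FS_HZ) -> np.ndarray:
    """Return the last 120 s of one stage's (n_samples, 3) acceleration array."""
    return acc[-120 * fs:]


def final_minute_mean(times_s: np.ndarray, values: np.ndarray, stage_len_s: float) -> float:
    """Mean of breath-by-breath values recorded during the final 60 s of the stage."""
    mask = times_s >= stage_len_s - 60.0
    return float(values[mask].mean())


def keep_stage(completed: bool, rer: float) -> bool:
    """Assumed exclusion rule: drop incomplete stages and stages with RER above 1.0."""
    return completed and rer <= 1.0
```

For a 2.5 min track stage, `final_two_minutes` keeps 12,000 of the 15,000 samples, and `final_minute_mean` averages the breaths recorded from 90 s onward.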
According to our standard procedure using the MAD method [7,16–18], the accelerometer data were analyzed in six-second epochs. For each epoch, MAD, MADxyz, and mean magnitude (MM) values were calculated using Equations (1)–(4). In these equations, r_i is the magnitude of the incident resultant acceleration of the three orthogonal vectors x_i, y_i, and z_i; N is the number of samples in the epoch (600 for the six-second epoch); and R_ave, X_ave, Y_ave, and Z_ave are the mean acceleration values of the epoch. The unit of all calculated values is milligravity (mg), which corresponds to one-thousandth of Earth's gravity.
For illustration, Figure 1 depicts the measured raw and processed acceleration data for a two-second walking period. The conventional MAD metric is sensitive to the variation in the total magnitude of the acceleration, while the MADxyz metric and the novel MM metric are also sensitive to changes in the accelerometer orientation relative to Earth's gravity vector. The MM metric uses the same acceleration parameters that are needed not only to calculate the MAD and MADxyz values used in this study but also to determine the accelerometer angle used to estimate body posture [3].
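As a rough illustration of the epoch-wise computation, the sketch below calculates the conventional MAD value, assuming the usual definition of MAD as the mean absolute deviation of r_i from R_ave within each epoch. The function name and array layout are assumptions for this example. MADxyz and MM are left out because their definitions rest on Equations (3) and (4).

```python
import numpy as np


def mad_per_epoch(acc_mg: np.ndarray, fs: int = 100, epoch_s: int = 6) -> np.ndarray:
    """MAD (mg) for each complete epoch of an (n_samples, 3) acceleration array in mg."""
    n = fs * epoch_s                                    # N = 600 samples per six-second epoch
    n_epochs = acc_mg.shape[0] // n                     # incomplete trailing epoch is dropped
    r = np.linalg.norm(acc_mg[: n_epochs * n], axis=1)  # resultant r_i of x_i, y_i, z_i
    r = r.reshape(n_epochs, n)
    r_ave = r.mean(axis=1, keepdims=True)               # R_ave of each epoch
    return np.abs(r - r_ave).mean(axis=1)               # mean |r_i - R_ave|
```

A motionless sensor reading a constant 1000 mg yields a MAD of zero, whereas a resultant that alternates between 900 mg and 1100 mg yields a MAD of 100 mg.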

Statistical Analysis
Multiple regression models for VO2 estimation were determined separately for the three accelerometry-based metrics for typical walking and running intensities. Stage-specific VO2 was the dependent variable, and the respective MAD, MADxyz, or MM values and the test type served as the independent predictor variables. The test type was a dummy variable (track test = 0 and treadmill test = 1). In addition, receiver operating characteristic (ROC) analysis was used to find the optimal cut-points for moderate, vigorous, and very vigorous PA for the MAD, MADxyz, and MM metrics. The cut-point between light and moderate PA was set to 3.0 MET (in terms of oxygen consumption, 1 MET = 3.5 mL (O2)/kg/min), between moderate and vigorous PA to 6.0 MET, and between vigorous and very vigorous PA to 9.0 MET. The Kolmogorov-Smirnov statistic was used to determine the optimal cut-points.
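The cut-point search can be sketched as follows. In the ROC setting, the Kolmogorov-Smirnov statistic is the largest separation between the true-positive and false-positive rates over all candidate thresholds, which is the same quantity as Youden's index. The function name, the brute-force scan over observed values, and the use of "at least" at the MET boundary are assumptions of this sketch, not details taken from the SPSS analysis.

```python
import numpy as np

ML_PER_KG_MIN_PER_MET = 3.5  # 1 MET as defined in the text


def optimal_cut_point(metric, vo2, met_threshold):
    """Return (cut_point, ks) maximizing TPR - FPR for 'intensity >= met_threshold'."""
    metric = np.asarray(metric, dtype=float)
    positive = np.asarray(vo2, dtype=float) / ML_PER_KG_MIN_PER_MET >= met_threshold
    best_cut, best_ks = None, -1.0
    for cut in np.unique(metric):               # every observed value is a candidate
        predicted = metric >= cut
        tpr = predicted[positive].mean()        # sensitivity
        fpr = predicted[~positive].mean()       # 1 - specificity
        if tpr - fpr > best_ks:
            best_cut, best_ks = float(cut), float(tpr - fpr)
    return best_cut, best_ks
```

The sketch requires at least one data point on each side of the MET threshold; otherwise one of the rates is undefined.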

All statistical analyses were conducted using statistics software (IBM SPSS Statistics for Windows, Version 27.0, Armonk, NY, USA).
Table 1 shows the participant characteristics in the track and treadmill test groups. On average, both groups were normal weight, but the treadmill group was 11.6 years younger and 4.7 cm taller than the track group. The age difference was statistically significant.
Table 1 notes: BMI is body mass index. SD is standard deviation. * Between-group difference p < 0.0001.
Figure 2 shows the stage-specific mean values of the accelerometry-based metrics plotted against the locomotion speed and the measured VO2. The distinct gap in the MAD, MADxyz, and MM values between speeds from 6 to 8 km/h and VO2 from 22 to 30 mL/kg/min coincides with the transition between typical walking and running speeds. In addition, the VO2 points displayed different slopes for typical walking and running speeds. Therefore, the regression analysis was conducted separately for points having VO2 values less than 25 mL/kg/min and at least 25 mL/kg/min.
Table 2 shows the results of the regression analysis separately for typical walking and running intensities (VO2 < 25 mL/kg/min and VO2 ≥ 25 mL/kg/min, respectively). The MAD metric had the best performance for the walking intensities, explaining 86% of the variation in VO2 without significantly depending on the test type. For the running intensities, its predictive accuracy was substantially poorer. The MADxyz metric had the best overall performance, explaining 84% and 63% of the variation in VO2 during walking and running, respectively. The test type was, however, significantly associated with the predicted VO2 during running. The MM metric provided consistent predictive accuracy, explaining 71% and 69% of the variation in VO2 during walking and running, respectively, but the test type was significantly associated with the predicted VO2 during running.
Figures 3–5 show the scatter plots (correlation) and Bland-Altman difference plots between the measured and predicted VO2 for the MAD, MADxyz, and MM metrics, respectively. In general, the differences were larger for typical running intensities than for walking intensities for all metrics. Although the MAD metric had the best and most consistent performance for walking intensities, at running intensities it tended to overestimate the lower VO2 values and underestimate the higher VO2 values.
Table 3 shows the results of the ROC analysis and the optimal cut-points for moderate (3 MET), vigorous (6 MET), and very vigorous (9 MET) PA separately for the MAD, MADxyz, and MM metrics. The sensitivity and specificity values were mainly higher than 90%, ranging from 91.6% to 99.4% and from 88.1% to 100%. For each metric, the optimal cut-points for separating different intensity levels from each other differed between the track and treadmill tests, and the values depended on the intensity level. In six cases, the track test gave the same cut-point as the pooled dataset. In one case, the treadmill test gave the same cut-point as the pooled dataset, whereas in two cases, the pooled dataset had unique cut-points compared to the track and treadmill tests. Of note, the treadmill test data contained 261 data points and the track test 137 data points, which can give more weight to the treadmill test in the analysis of the pooled dataset.
Figure 5. Scatter plots of measured and predicted oxygen consumption (VO2) values and Bland-Altman difference plots for the MM metric. The upper row shows the results for the data points having VO2 less than 25 mL/kg/min and the lower row for the points having VO2 at least 25 mL/kg/min. The dotted lines on the Bland-Altman plot represent the mean difference and the limits of agreement, calculated as the mean difference ± 1.96 times the standard deviation (SD) of the differences. The blue circles are for the track test and the red circles for the treadmill test.
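The limits of agreement drawn in the Bland-Altman plots follow directly from the differences between predicted and measured values. A minimal sketch is given below; the function name and the sign convention (predicted minus measured) are assumptions, and the sample SD is used.

```python
import numpy as np


def bland_altman_limits(measured, predicted):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    mean_diff = diff.mean()
    sd = diff.std(ddof=1)                      # sample SD of the differences
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd
```

A mean difference near zero with wide limits indicates an unbiased but imprecise prediction, which is the pattern the text reports for the running intensities.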

Discussion
All three MAD-based accelerometry metrics evaluated in the present study showed characteristic, relatively strong associations with oxygen consumption during locomotion over a wide range of intensities. Common to all of them was that the associations were stronger for typical walking intensities than for running intensities. For the latter, the variation in the measured data was wide and the test type had a significant effect on the prediction of oxygen consumption, except for the conventional MAD metric.
The MAD metric showed the best performance for walking intensities, but the relationship between the accelerometer values and oxygen consumption virtually plateaued during running in the treadmill test. This means that the MAD values hardly increased despite the substantially increased intensity. A similar plateau effect during treadmill running was recently observed by Chen et al. [14]. However, in the present study, only one participant showed a negative relationship between VO2 and the MAD metric, that is, the MAD values decreased with increasing VO2. Individual physical fitness is likely to contribute to this plateau. Those who had higher MAD values at lower running speeds were also the first to drop out of the test. These individual results likely compromised the performance of the MAD metric during running. The standard error of the VO2 estimate was over 7 mL/kg/min (corresponding to two METs), indicating relatively poor accuracy of the MAD metric in estimating energy consumption at high speeds of locomotion. On the other hand, the MAD metric performs accurately when it comes to the classification of PA into intensity categories, for example, into low-intensity PA or moderate-to-vigorous PA (MVPA). These categories, in addition to sedentary time, are commonly used in scientific literature addressing the associations of PA with various health outcomes and PA recommendations [19,20].
The MADxyz metric performed well for both walking and running intensities. The test type confounded the VO2 prediction at running speeds, but its impact was smaller than with the MM metric. VO2 was on average 1.8 mL/kg/min lower (corresponding to about half a MET) during treadmill running than during track running at the same MADxyz level. While the MM metric seemed to show the best prediction of VO2 for running speeds, the test type had a significant impact on the results: VO2 during treadmill running was 4.1 mL/kg/min lower (corresponding to more than one MET) than during track running at the same MM level. These test type-related errors can be problematic in free-living assessments of PA intensity because the actual conditions cannot be known a priori or are nearly impossible to measure, especially in studies with a great number of participants.
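The interpretation of the test-type effect as an offset at the same metric level follows from the regression model with a dummy variable described in the Statistical Analysis section. The sketch below fits such a model by ordinary least squares with NumPy rather than SPSS; the function name is an assumption, and any data passed to it in examples are synthetic, not study data.

```python
import numpy as np


def fit_vo2_model(metric, test_type, vo2):
    """Fit VO2 = b0 + b1 * metric + b2 * test_type; return (coefficients, R squared)."""
    metric = np.asarray(metric, dtype=float)
    test_type = np.asarray(test_type, dtype=float)    # track = 0, treadmill = 1
    vo2 = np.asarray(vo2, dtype=float)
    X = np.column_stack([np.ones_like(metric), metric, test_type])
    coef, *_ = np.linalg.lstsq(X, vo2, rcond=None)    # ordinary least squares
    residual = vo2 - X @ coef
    r_squared = 1.0 - np.sum(residual**2) / np.sum((vo2 - vo2.mean()) ** 2)
    return coef, r_squared
```

In this parameterization, the coefficient b2 is the difference in predicted VO2 between a treadmill and a track data point with the same metric value, so a b2 of −1.8 would correspond to the offset reported above for the MADxyz metric during running.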
Treadmill walking qualitatively and quantitatively resembles overground walking. However, when the walking speed is matched, the typical stride length is shorter and the cadence is higher on the treadmill than overground [21]. Additionally, overground walking has more variability in temporal rhythm and other gait parameters, such as trunk velocity [22], but VO2 is similar during treadmill and overground walking when the speed is the same [23]. In the present track and treadmill tests, the accelerometry-based metrics were at the same level for typical walking speeds. The previously observed differences in the accelerometer outputs between overground and treadmill walking may be attributed to count-based algorithms [5,6]. The count-based metrics used in the previous studies are affected by a band-pass filter, which attenuates the acceleration signal substantially outside the frequency range of 0.25 Hz to 2.5 Hz [24]. Thus, differences in the preferred gait frequency can modify the accelerometer output and account for the diverging results of the count-based algorithms.
Treadmill and overground running have similar submaximal VO2 at lower intensities. Roughly a 1% grade can be used in the treadmill test at running velocities corresponding to at least 80% of the maximal VO2 or speeds over 13 km/h to apply the treadmill-based assessment of running more precisely to real-life conditions [25]. However, the biomechanics of running differ between the treadmill and overground tests. On the treadmill, the belt moves the supporting leg under the body, whereas overground, the body moves over the supporting leg [26]. Vertical displacement during the entire gait cycle is lower, and the ground contact time is longer during treadmill running [26]. In addition, a person's inexperience in treadmill running can result in a higher stride frequency and shorter stride length that, in turn, reduce braking and propulsive forces [26]. Humans also optimize their leg stiffness and ground contact to minimize the metabolic cost of running [27]. Apparently, each accelerometer-based metric has a different sensitivity to different gait parameters. Therefore, variations in the running conditions can impose specific effects on the accelerometry metrics. In this study, the lower MAD values during treadmill running may be explained by the lower vertical displacement during the entire gait cycle. The influence of the test type was confirmed by the ROC analysis, which showed clear differences in the cut-points at different intensity levels.
In the present study, the sensor was placed on the side of the hip. The indoor track with banked curves had only a marginal effect on the accelerometer output between the opposite sides [16]. In general, depending on the habitual activity engaged in by the individual, movement-generated accelerations at one site do not necessarily represent those at another site. Therefore, placement of the accelerometer in the middle of the low back could be more optimal for evaluating locomotion because it is closer to the body's centre of gravity than the hip site [5]. Additionally, for walking and running measurements only, other body locations may provide accurate results as well. For example, wrist-worn devices can predict VO2 accurately in different running conditions [28]. However, physical behaviour assessments should provide reliable estimates for all types of physical activities and sedentary behaviour, particularly in population-based studies. Thus, the sensor location does not have to be the best at anything but should be good at everything and feasible to use, as recently recommended by Chastin et al. [2].
According to the present findings, the MAD metric had the best performance for VO2 levels up to 25 mL/kg/min, and the MADxyz metric for VO2 levels of 25 mL/kg/min or higher. In practice, the MAD metric can be used for accurate VO2 estimation when the MAD value is less than 500 mg. When the MAD value exceeds 500 mg, the MADxyz metric is preferable. The MM metric showed slightly better performance for running intensities than the MADxyz metric, but it was also the most sensitive to the test type. Consequently, it is also more sensitive to unknown variations in free-living conditions, such as surface stiffness, making it susceptible to substantial uncertainty. Nevertheless, because of its consistently strong association with VO2 over the wide intensity range of locomotion, the utility of the novel MM metric warrants further evaluation.
One essential difference between the evaluated accelerometry-based metrics is that the MAD metric is not sensitive to sensor rotation whereas the MADxyz and MM metrics are. In other words, when the sensor is rotated in place, the acceleration magnitude corresponds to the level of Earth's gravity, but the axis-specific values can vary between −1 g and +1 g. While the MAD metric then reflects only the sensor noise, the MADxyz and MM metrics can even indicate high-intensity running. This measurement property may prove useful in predicting VO2 during different activities, such as household chores or occupational activities, which are conducted mainly in place but require changes in the body position. This calls for further validation studies as well.
Major strengths of the present study are the direct breath-by-breath measurement of incident VO2 during both tests with similar metabolic gas analyzers, well-controlled test conditions, and the use of the same accelerometers for data collection. Further, the data analysis was centralized and based on identical, validated algorithms. On the other hand, the participants were not the same in both tests, and the treadmill group was significantly younger. Other limitations pertain to the unknown performance of the metrics during low-intensity locomotion (VO2 less than 10 mL/kg/min or less than 3 MET), locomotion on different surfaces, and uphill or downhill locomotion. Assessment of these conditions calls for further studies.
In conclusion, the MADxyz metric had a similar performance to the MAD metric at walking intensity. However, caution is required with the MADxyz metric because of its high sensitivity to changes in the sensor orientation relative to Earth's gravity vector. A small change in body posture during the epoch can substantially affect the MADxyz value and possibly result in a fallacious estimation of VO2. For all evaluated accelerometry-based metrics, the prediction accuracy of VO2 was consistently poorer at high intensities than at low intensities corresponding to walking. Depending on the intensity of locomotion, the choice of proper accelerometer metrics and test type may affect the validity of the prediction of incident VO2. However, the accuracy of these metrics is sufficient for the proper classification of physical activity into low, moderate, and vigorous intensity categories, which is the most common approach to analyzing PA data in large population-based studies.
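The practical rule above amounts to switching between two prediction models at a MAD value of 500 mg. A minimal sketch follows; the function name is hypothetical, the two models are passed in as callables because the fitted coefficients are reported in Table 2 and not restated here, and assigning a value of exactly 500 mg to the MADxyz branch is an assumption.

```python
from typing import Callable


def estimate_vo2(
    mad_mg: float,
    madxyz_mg: float,
    walk_model: Callable[[float], float],
    run_model: Callable[[float], float],
    threshold_mg: float = 500.0,
) -> float:
    """Use the MAD-based model below the threshold and the MADxyz-based model otherwise."""
    if mad_mg < threshold_mg:
        return walk_model(mad_mg)      # typical walking intensities
    return run_model(madxyz_mg)        # typical running intensities
```

Any callable that maps a metric value in mg to VO2 in mL/kg/min can be supplied for the two models, for example a linear function built from the regression coefficients.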
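The rotation argument can be checked numerically with a synthetic signal. The sketch below assumes an idealized, noise-free sensor turned once about its x-axis during a six-second epoch, so that only gravity (1000 mg) is measured; the MAD value is computed as the mean absolute deviation of the resultant, and the variable names are illustrative.

```python
import numpy as np

fs, epoch_s = 100, 6
t = np.arange(fs * epoch_s) / fs
theta = 2.0 * np.pi * t / epoch_s              # one full turn during the epoch

# Gravity only, redistributed between the y and z axes as the sensor rotates.
acc_mg = 1000.0 * np.column_stack([np.zeros_like(t), np.sin(theta), np.cos(theta)])

r = np.linalg.norm(acc_mg, axis=1)             # resultant stays at 1000 mg
mad = np.mean(np.abs(r - r.mean()))            # close to zero
axis_range = acc_mg.max(axis=0) - acc_mg.min(axis=0)

print(f"MAD: {mad:.6f} mg")
print("axis ranges (mg):", np.round(axis_range, 1))
```

The resultant is constant, so the MAD value is essentially zero, while the y and z axes each sweep the full range from −1 g to +1 g. Metrics built on axis-specific deviations therefore register this rotation as movement.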

Institutional Review Board Statement:
The studies were conducted according to the guidelines of the Declaration of Helsinki. The track test study was approved by the Regional Ethics Committee of the Expert Responsibility area of Tampere University Hospital (R13040) and the treadmill test study by the Medical University of Graz (26-510 ex 13/14).

Informed Consent Statement:
Informed consent was obtained from all study participants.

Data Availability Statement:
Non-identifiable data are available for research purposes from the corresponding author upon reasonable request.