Stepping up with GGIR: Validity of step cadence derived from wrist-worn research-grade accelerometers using the Verisense Step Count Algorithm

Abstract
The Verisense Step Count Algorithm facilitates the generation of steps from wrist-worn accelerometers. Based on preliminary evidence suggesting a proportional bias, with overestimation at low steps/day but underestimation at high steps/day, the algorithm parameters have been revised. We aimed to establish the validity of the original and revised algorithms relative to waist-worn ActiGraph step cadence. We also assessed whether step cadence was similar across accelerometer brands and wrists. Ninety-eight participants (age: 58.6 ± 11.1 y) undertook six walks (~500 m each on a hard path) at different speeds (cadence: 92.9 ± 9.5 to 127.9 ± 8.7 steps/min) while wearing three accelerometers on each wrist (Axivity, GENEActiv, ActiGraph) and an ActiGraph on the waist. Of these, 24 participants also undertook one run (~1000 m). Mean bias for the original algorithm was −21 to −26.1 steps/min (95% limits of agreement (LoA) ~±65 steps/min) and mean absolute percentage error (MAPE) was 17–22%. Error was unevenly distributed, increasing as speed increased. Mean bias and 95% LoA were halved with the revised algorithm parameters (mean bias ~−10 to −12 steps/min, 95% LoA ~±30 steps/min, MAPE ~10–12%). Performance was similar across brands and wrists. The revised step algorithm provides a more valid measure of step cadence than the original, with MAPE similar to recently reported summary MAPE values for wrist-worn devices (7–11%).


Introduction
Increasingly, evidence pertaining to the benefits of physical activity for health is generated from wrist-worn accelerometers; for example, the Axivity in UK Biobank (Doherty et al., 2017), the ActiGraph in the US National Health and Nutrition Examination Survey (NHANES, Belcher et al., 2021) and the GENEActiv in the Pelotas Birth Cohort (Da Silva et al., 2014). Accelerometers address the subjectivity and recall bias inherent in self-reported activity (Sallis & Saelens, 2000), facilitate the capture of incidental, non-purposeful activity that self-report tends to miss, and increase the precision of measurement (Troiano et al., 2014). Further, the move from waist-worn to wrist-worn accelerometers has led to greater participant compliance, reducing the potential for bias due to non-wear or selective wear (Toftager et al., 2013).
While this is good news for the research community, the physical activity outcomes from research-grade accelerometers do not always lend themselves to intuitive interpretation and effective public health messaging. Around a quarter of US adults (Rising et al., 2021; Xie et al., 2020) and 14% of UK adults (Strain et al., 2019) reported using consumer wearable health and activity devices in 2019 and 2018, respectively; in the US this represents a doubling of reported users since 2015 (Rising et al., 2021). This growth in wearables has led to people becoming accustomed to monitoring their physical activity, usually with stepping activity as the main behaviour of interest because it is intuitive and readily understandable (Bassett et al., 2017). Given this, generating steps as an outcome from research-grade wrist accelerometer data in large national resources could improve understanding of the associations between stepping behaviour and health. This, in turn, could inform goal setting and health messaging for consumer wearable self-monitoring devices.
However, converting raw acceleration data into useful activity metrics is not a trivial task. In recent years, GGIR has emerged as arguably the most widely used open-source software for processing research-grade accelerometer data (Migueles et al., 2019; https://github.com/wadpac/GGIR/wiki/Publication-list). GGIR facilitates identical processing and analysis of data from different brands of research-grade monitors, which has been shown to yield physical activity outcomes that can be considered equivalent between brands (Rowlands, Plekhanova et al., 2019). Within GGIR, the Verisense Step Count Algorithm, based on the Gu et al. (2017) method of step detection from triaxial acceleration data, is now available for extracting stepping metrics (https://github.com/ShimmerEngineering/Verisense-Toolbox/tree/master/Verisense_step_algorithm; Patterson, 2020).
While steps appear to be an intuitive metric (Bassett et al., 2017), their apparent face validity can be problematic: recorded daily steps have been shown to vary from 69% to 220% of actual daily steps depending on the monitor and/or algorithm (Toth et al., 2018). This highlights the importance of validating step metrics. Previously, we compared free-living daily steps estimated from wrist accelerometry using the Verisense algorithm with activPAL-determined steps in a large (N = ~650) sample of office workers (Maylor et al.,). Results indicated a strong correlation (r = 0.79, p < 0.001) and a modest positive mean bias of ~850 steps per day, representing approximately 9% overestimation from the wrist. However, there was a proportional bias: in highly active individuals, steps per day were underestimated at the wrist, potentially through underestimation of steps during faster walking and/or running. Based on these results, the Verisense algorithm has been refined. Before the algorithm is used to interpret and/or prescribe activity in terms of steps, the original and revised Verisense step algorithm parameters need further validation across a range of ambulatory speeds and metrics. In recent years, interest has grown in the potential importance of step cadence as an indicator of walking pace (Tudor-Locke et al., 2018); cadence has also been used as an independent predictor of health outcomes (Paluch et al., 2022).
Therefore, the aim of this study was to compare step cadence derived from the original and revised Verisense Step Count Algorithm parameters to concurrent waist-worn ActiGraph determined step cadence across slow, steady, and brisk walking, and running. A secondary aim was to assess whether step cadence determined from the Verisense step algorithm was similar across three brands of research-grade accelerometers worn on the dominant and non-dominant wrist.

Methods
This analysis used data from the "Walk in the Park" study (Dawkins et al., 2021), which aimed to facilitate simple and meaningful translation of wrist-worn accelerometer data. In brief, a convenience sample of 105 adults aged 40–79 y was recruited after responding to advertisements at the university and on social media. The study had ethics approval from the University of Leicester's College of Life Sciences ethics committee (Ref: 18,779) and all participants gave written informed consent. Height was measured to the nearest 0.1 cm and mass to the nearest 0.1 kg. Age and sex were self-reported.
Participants were fitted with seven accelerometers: three on each wrist (Axivity, GENEActiv, ActiGraph) and an ActiGraph worn on the waist above the right hip to provide a reference measure of step cadence. While a true criterion would be directly counted steps, waist-worn ActiGraph accelerometers have been shown to produce valid measures of steps during free-living walking (Abel et al., 2011; Lee et al., 2015) and ambulation (Gould et al., 2021; Moore et al., 2020). For example, step count from the ActiGraph GT3X (the predecessor of the GT9X) worn at the waist has been shown to measure steps at speeds >4 km/h with <5% mean absolute percentage error (MAPE) in children (Gould et al., 2021) and <6% MAPE in adults (Moore et al., 2020). The relative position of each wrist device was randomised between participants, with one device placed at the wrist, one proximal to it, and the third placed on top of one of the first two. Positioning of devices was consistent between wrists for each participant.
To cover a range of walking speeds as well as "preferred" speeds, participants completed three walks in a park along a hard path, standardised between participants to represent a slow, steady, and brisk pace, followed by three self-paced slow, steady, and brisk walks. For the standardised speeds, a steady walking speed based on leg length (predicted from height; Pheasant, 1982) was calculated for each participant (Kramer & Sarton-Miller, 2008). The associated pacing frequency was determined (Soczawa-Stronczyk et al., 2019) so that pace could be regulated with a metronome. Standardised slow and brisk speeds were 10% below and above the steady speed, respectively. For the self-paced slow walk, participants were asked to walk "at a pace they may use when strolling along chatting"; for the self-paced steady walk, "at their preferred pace"; and for the self-paced brisk walk, "at a pace that feels brisk or purposeful". Finally, participants were given the option of a self-paced run around the park at a "comfortable and consistent pace which they could maintain for 5 km". Each walk was a straight "out and back" of approximately 485 m, with a turn at the midpoint to return to the starting point. The run was a loop of approximately 1,110 m around the park with some right-angle turns. Speed was calculated from the duration (start and end time) and distance of each walk/run.
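The pace arithmetic above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names `standardised_speeds` and `walk_speed_kmh` are hypothetical, the steady speed is taken as an input (the leg-length prediction of Kramer & Sarton-Miller is not reproduced here), and the example values are invented for illustration.

```python
"""Sketch of the walking-protocol pace arithmetic (hypothetical helper names)."""


def standardised_speeds(steady_kmh: float) -> dict[str, float]:
    """Slow and brisk speeds set 10% below and above the steady speed."""
    return {
        "slow": steady_kmh * 0.90,
        "steady": steady_kmh,
        "brisk": steady_kmh * 1.10,
    }


def walk_speed_kmh(distance_m: float, start_s: float, end_s: float) -> float:
    """Mean speed of a walk/run from path distance and start/end times in seconds."""
    duration_s = end_s - start_s
    if duration_s <= 0:
        raise ValueError("end time must be after start time")
    return (distance_m / duration_s) * 3.6  # convert m/s to km/h


# Usage with hypothetical values: a 5.0 km/h steady speed, and a 485 m walk taking 6 min
speeds = standardised_speeds(5.0)
print({k: round(v, 2) for k, v in speeds.items()})  # {'slow': 4.5, 'steady': 5.0, 'brisk': 5.5}
print(round(walk_speed_kmh(485, 0, 360), 2))  # 4.85
```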

Accelerometers
The GENEActiv Original, Axivity AX3 and ActiGraph GT9X Link (hereafter: GENEActiv, Axivity and ActiGraph) are triaxial accelerometers with a dynamic range of ± 8 g, where g is the acceleration due to Earth's gravity. All were configured to record at a frequency of 100 Hz and were initialised using the same computer to synchronise the timing of the recording period. GENEActiv monitors were initialised and data downloaded (bin format) using GENEActiv PC (version 3.2). Axivity monitors were initialised and data downloaded (cwa format) using OmGui (OmGui Version 1.0.0.30, Open Movement, Newcastle, UK). ActiGraph monitors were initialised and data downloaded (gt3x format) using ActiLife (version 6.13.3). The "idle sleep mode" in the ActiGraph software was disabled. Wrist ActiGraph files were converted to .csv format for data processing.

Waist-worn ActiGraph
Files from the waist-worn ActiGraph were converted to 5-s epoch agd files (no low-frequency extension) and processed in ActiLife (version 6.13.3) to generate the criterion measure of steps in 5-s epoch csv files.

Wrist-worn accelerometers
All wrist-worn accelerometer files (six per participant) were analysed with the R package GGIR version 2.6-0 (parts 1 and 2) in R (https://cran.r-project.org/web/packages/GGIR/; Bastian et al., 2015). This included auto-calibration using local gravity as a reference (van Hees et al., 2014), detection of sustained abnormally high values, and calculation of the average magnitude of dynamic acceleration, i.e., the resultant vector magnitude corrected for gravity and expressed as Euclidean Norm Minus One with negative values set to zero (ENMO), in milligravitational units (mg) averaged over 5-s epochs. Files were excluded if they showed a post-calibration error greater than 0.01 g (10 mg). Steps were generated for the wrist-worn devices using the original (https://github.com/ShimmerEngineering/Verisense-Toolbox/tree/master/Verisense_step_algorithm) and revised versions of the Verisense step count algorithm, run as a GGIR external function, and output in 5-s epoch csv files. Details of the original and revised algorithm parameters are in the supplemental methods.
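The ENMO metric described above can be illustrated with a short sketch. It is not GGIR's code: it assumes the samples are already auto-calibrated and expressed in g, omits GGIR's calibration and abnormal-value detection, and uses hypothetical function names (`enmo_mg`, `epoch_enmo`). At 100 Hz, each 5-s epoch averages 500 per-sample ENMO values.

```python
import math
from statistics import fmean


def enmo_mg(x, y, z):
    """ENMO for one calibrated sample (inputs in g), returned in mg."""
    vector_magnitude = math.sqrt(x * x + y * y + z * z)
    return max(vector_magnitude - 1.0, 0.0) * 1000.0  # negative values set to zero


def epoch_enmo(samples, fs_hz=100, epoch_s=5):
    """Average per-sample ENMO over consecutive non-overlapping epochs.

    samples: sequence of (x, y, z) accelerations in g, assumed already calibrated.
    A trailing partial epoch is discarded.
    """
    n = fs_hz * epoch_s  # 500 samples per 5-s epoch at 100 Hz
    values = [enmo_mg(*s) for s in samples]
    return [fmean(values[i:i + n]) for i in range(0, len(values) - n + 1, n)]


# Hypothetical data: 5 s at rest (1 g on one axis), then 5 s at a constant 1.5 g
samples = [(0.0, 0.0, 1.0)] * 500 + [(0.0, 0.0, 1.5)] * 500
print(epoch_enmo(samples))  # [0.0, 500.0]
```

Averaging per-sample ENMO within each epoch (rather than computing ENMO from epoch-averaged axes) keeps dynamic acceleration from cancelling out across the epoch.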
The 5-s epoch csv files for all seven accelerometers were time-matched for each participant, and the timing of each walk/run was identified. The first and last 30 s of each walk/run were trimmed to exclude the accelerations and decelerations at the beginning and end of the activity.
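The bout selection and trimming step might look like the following minimal sketch. It assumes each epoch is stored as a (start timestamp, value) pair and that bout start/end times come from the protocol log; the function name `bout_epochs` and the example bout are hypothetical. Only epochs lying wholly inside the trimmed window are kept.

```python
from datetime import datetime, timedelta


def bout_epochs(epochs, start, end, epoch_s=5, trim_s=30):
    """Return epoch values lying wholly inside a trimmed walk/run bout.

    epochs: list of (epoch_start_datetime, value) pairs, e.g. steps per 5-s epoch.
    start, end: bout start and end times (assumed to come from the protocol log).
    """
    lo = start + timedelta(seconds=trim_s)  # drop the acceleration phase
    hi = end - timedelta(seconds=trim_s)    # drop the deceleration phase
    step = timedelta(seconds=epoch_s)
    return [v for t, v in epochs if lo <= t and t + step <= hi]


# Usage with a hypothetical 6-minute bout recorded as 72 five-second epochs
t0 = datetime(2021, 6, 1, 10, 0, 0)
epochs = [(t0 + timedelta(seconds=5 * i), i) for i in range(72)]
kept = bout_epochs(epochs, t0, t0 + timedelta(minutes=6))
print(len(kept), kept[0], kept[-1])  # 60 6 65
```

Trimming 30 s at each end of a 5-s epoch series removes six epochs from the start and six from the end.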
Step count per 5-s epoch was multiplied by 12 to obtain cadence in steps per minute. Mean cadence per walk/run was calculated for each device.
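As a sketch of the cadence conversion (hypothetical function names, invented example counts): a 5-s epoch is one-twelfth of a minute, so steps per epoch multiplied by 12 gives steps/min, and the mean over the bout's epochs gives mean cadence for that walk/run.

```python
from statistics import fmean

EPOCH_S = 5
EPOCHS_PER_MIN = 60 // EPOCH_S  # 12 five-second epochs per minute


def epoch_cadence(steps_per_epoch):
    """Convert steps per 5-s epoch to cadence in steps/min."""
    return [s * EPOCHS_PER_MIN for s in steps_per_epoch]


def mean_cadence(steps_per_epoch):
    """Mean cadence (steps/min) across the epochs of one walk/run."""
    return fmean(epoch_cadence(steps_per_epoch))


# Hypothetical step counts for four consecutive 5-s epochs
print(epoch_cadence([9, 10, 9, 10]))  # [108, 120, 108, 120]
print(mean_cadence([9, 10, 9, 10]))   # 114.0
```

Because every epoch has the same length, this mean equals total steps divided by bout duration in minutes.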

Data analysis
Descriptive statistics (median, inter-quartile range) were calculated for all outcome measures. Cadence from each wrist accelerometer was compared with the criterion using mean absolute percentage error and Bland–Altman limits of agreement (LoA) (Bland and Altman, 1986). Comparisons were carried out by pace for the original and revised Verisense step algorithm parameters. To be included in the analysis, participants needed data for the criterion measure of steps and for at least one brand of accelerometer worn on both wrists.
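The agreement statistics can be sketched as follows, assuming one mean cadence per participant per walk for the wrist device and the reference. Bias is taken as device minus reference, so a negative bias indicates underestimation at the wrist; the 95% LoA are bias ± 1.96 × SD of the paired differences. Function names and data are hypothetical.

```python
from statistics import fmean, stdev


def mape(device, reference):
    """Mean absolute percentage error (%) of device cadence vs reference cadence."""
    return 100 * fmean([abs(d - r) / r for d, r in zip(device, reference)])


def bland_altman(device, reference, z=1.96):
    """Mean bias (device minus reference) and 95% limits of agreement."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical per-participant mean cadences (steps/min) for one walk
reference = [100, 110, 120]
wrist = [90, 100, 115]
bias, lower, upper = bland_altman(wrist, reference)
print(round(bias, 2), round(mape(wrist, reference), 2))  # -8.33 7.75
```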
To determine whether performance of the Verisense step algorithms was similar across the three brands of research-grade accelerometer worn on the dominant and non-dominant wrists, we used pairwise 95% equivalence tests (Wellek, 2003). We selected ±10% as the equivalence zone, as used in previous studies comparing activity monitors (e.g., Lee et al., 2014). Comparisons were carried out pairwise by pace for the best-performing Verisense step algorithm: 1) within brand, between wrists; and 2) between brands, within wrist.
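A pairwise equivalence check of this kind could be sketched with two one-sided tests (TOST), expressed as a confidence interval for the mean paired difference that must lie inside the equivalence zone. This is an illustrative stand-in, not necessarily the exact Wellek procedure used: it uses a normal approximation rather than the t distribution, and it assumes the ±10% zone is taken relative to the grand mean cadence of the two monitors. Whether "95%" maps to alpha = 0.05 per one-sided test (a 90% CI) or to a 95% CI is left as the `alpha` parameter. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def equivalence_test(a, b, zone=0.10, alpha=0.05):
    """TOST-style paired equivalence check using a normal approximation.

    a, b: paired mean cadences (steps/min) from two monitors, one value per participant.
    zone: equivalence zone as a fraction of the pair's grand mean cadence (assumed).
    alpha: error rate of each one-sided test; the (1 - 2*alpha) two-sided CI
           of the mean paired difference is compared with the zone.
    Returns (is_equivalent, (ci_low, ci_high), margin).
    """
    diffs = [x - y for x, y in zip(a, b)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    z = NormalDist().inv_cdf(1 - alpha)  # ~1.645 for alpha = 0.05
    mean_diff = fmean(diffs)
    ci = (mean_diff - z * se, mean_diff + z * se)
    margin = zone * fmean(list(a) + list(b))
    return (-margin < ci[0] and ci[1] < margin), ci, margin


# Hypothetical cadences from two brands on the same wrist
brand_1 = [100, 102, 98, 101]
brand_2 = [99, 101, 99, 100]
ok, ci, margin = equivalence_test(brand_1, brand_2)
print(ok, round(margin, 1))  # True 10.0
```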

Results
Of the 105 participants recruited, 98 (56 female, age 58.6 ± 11.1 y, height 168.6 ± 9.7 cm, weight 72.9 ± 13.9 kg) had sufficient data for the walking dataset. Participants were excluded for not meeting the minimum data requirements, owing to failure to complete the study protocol (n = 1) or a reference monitor device error during one measurement batch (n = 6). Twenty-five participants volunteered for the run, of whom 24 (10 female, age 53.5 ± 9.7 y, height 171.2 ± 11.3 cm, weight 74.5 ± 12.7 kg) had sufficient data for the running dataset (n = 1 excluded for reference monitor device error).
Mean walking speed ranged from 4.1 to 6.5 km/h, with self-paced steady walking tending to be faster than the standardised walking speed based on leg length (Table 1). Increases in pace were reflected in increased speed and reference cadence for both standardised-pace and self-paced walking/running.
Cadence estimated from the original Verisense step algorithm tended to plateau as speed and reference cadence increased, then decline for self-paced brisk walking and running (Table 1, Figure 1). A similar pattern was shown across all devices worn at either the non-dominant or dominant wrist. However, the revised Verisense step algorithm elicited the expected linear positive association with pace and reference cadence (Table 1, Figure 2).

Figure 2.
Association between reference step cadence and estimations from the revised Verisense step algorithm applied to the Axivity, GENEActiv, and ActiGraph worn at the dominant wrist (a, b, and c, respectively) and non-dominant wrist (d, e, and f, respectively).

Figure 3.
Mean bias and 95% limits of agreement relative to reference step cadence for estimations from the original Verisense step algorithm applied to the Axivity, GENEActiv, and ActiGraph worn at the dominant wrist (a, b, and c, respectively) and non-dominant wrist (d, e, and f, respectively).

Figure 4.
Mean bias and 95% limits of agreement relative to reference step cadence for estimations from the revised Verisense step algorithm applied to the Axivity, GENEActiv, and ActiGraph worn at the dominant wrist (a, b, and c, respectively) and non-dominant wrist (d, e, and f, respectively).
was more evenly spread across pace, albeit still greatest for the faster speeds of self-paced brisk walking (~−17 to −20 steps/min, MAPE 14–17%) and running (−12 to −28 steps/min, MAPE 9–17%). The Verisense step algorithm performed similarly across wrists and devices. Pairwise 95% equivalence tests by pace were carried out for the revised (best-performing) algorithm. Cadence output could be considered equivalent within brand between wrists (Figure 5a) and between brands within wrist (Figures 5b and 5c) for all walks/runs for the Axivity and the GENEActiv, and for >80% of the walks/runs for pairings involving the ActiGraph. The pairings that could not be considered equivalent involved either the slowest walk (Figure 5a) or the self-paced run (Figures 5b and 5c), and fell just outside the 10% equivalence zone (~±8 steps/min and ~±14–15 steps/min for the slowest walk and the self-paced run, respectively). Notably, over 92% of GENEActiv/Axivity pairings and over 60% of inter-brand comparisons involving the ActiGraph fell within a 5% equivalence zone (~5 steps/min at a steady pace).

Discussion
While the original Verisense step algorithm parameters increasingly underestimated step cadence as walking speed increased, the revised algorithm parameters provided a valid measure of step cadence. However, some underestimation of cadence during walking persisted; given the modest positive bias observed during free-living (Maylor et al.,), this suggests that the algorithm misclassifies some non-stepping lifestyle activity as steps. Overall, the mean absolute percentage error of the revised algorithm was similar to recently reported summary values for wrist-worn devices of 7–11% (Moore et al., 2020). Importantly, performance of the revised algorithm was similar across wrists and accelerometer brands, particularly for the Axivity and GENEActiv, where differences fell within the 10% equivalence zone for all walks and the run.
Results were poor for brisk walking and running with the original Verisense step algorithm parameters because these gait conditions were not present in the data used for the initial development of the algorithm. The original algorithm parameters were developed using an open-source dataset (Mattfeld et al., 2017) that consisted of unstructured, semi-structured and structured gait periods, all performed in a supervised laboratory environment. This dataset, and hence the original algorithm parameters, reflect a use case in which participants have a relatively low level of activity and do not perform sporting activities.
Step count results were improved with the updated parameter set in the revised Verisense step count algorithm because the parameters were optimised to perform well on a small subset of a free-living validation dataset. As described in the supplemental methods, sixteen participants, representing step overestimation, underestimation and good performance with the original parameters, were chosen as a training sample to find a parameter set that performed better on this subset of participants. A parameter set that balanced performance on the under- and overestimation participants was best, as it did not sacrifice performance for some participants to achieve good performance for others. The data in this sample came from free-living recordings of healthy individuals performing their everyday activities (Maylor et al.,). In contrast, the parameters obtained from the original open-source dataset were not sufficient to accurately detect steps outside the supervised laboratory environment, despite the inclusion of unstructured gait periods in that dataset (Mattfeld et al., 2017). The improved performance of the revised algorithm highlights the synergistic potential, and thus importance, of using both free-living and structured scenarios in the development and testing of algorithms that generate physical activity outcomes from accelerometer data. This approach is in concordance with best-practice recommendations (e.g., Bastian et al., 2015; Butte et al., 2012).

Figure 5.
Equivalence between cadence estimated from the revised Verisense algorithm applied to pairs of monitors: a) within brand, between wrists; b) between brands on the non-dominant wrist; and c) between brands on the dominant wrist.
Importantly, the results suggest that step outcomes could be generated and compared across studies even if protocols differed in the brand of wrist-worn research-grade accelerometer and/or the wrist of wear, e.g., UK Biobank (Doherty et al., 2017) and NHANES (Belcher et al., 2021). However, some caution is warranted when comparing the ActiGraph with the GENEActiv or Axivity for slow walking and running. These slightly larger differences at very slow or fast cadences may be device- or artefact-driven, given reports of differences between the ActiGraph and other accelerometer brands following identical processing in some (Bastian et al., 2015; Edwardson et al., 2022; John et al., 2013; Rowlands, Plekhanova et al., 2019), although not all (Migueles et al., 2022), studies. Notably, irrespective of device, underestimation relative to the reference device was greater for the higher cadences evident in self-paced brisk walking and running. In a free-living setting, this would have implications for estimates of steps taken at moderate-to-vigorous intensity, e.g., cadences ≥100 steps/min (Tudor-Locke et al., 2018). The extent of this would depend on the proportion of higher cadences in the population studied and should be investigated in further research using free-living data.
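To make the implication for cadence-based intensity estimates concrete, the following sketch counts minutes at or above the ≥100 steps/min threshold before and after applying a systematic underestimation. The cadences are hypothetical, and the uniform −12 steps/min offset is a simplifying assumption (the observed bias varied with pace); `minutes_at_or_above` is a hypothetical helper.

```python
def minutes_at_or_above(cadences, threshold=100):
    """Count minute-level cadence values at or above a threshold (steps/min)."""
    return sum(1 for c in cadences if c >= threshold)


# Hypothetical minute-level reference cadences and a uniform -12 steps/min wrist bias
reference = [95, 102, 108, 115, 121, 130]
wrist = [c - 12 for c in reference]
print(minutes_at_or_above(reference), minutes_at_or_above(wrist))  # 5 3
```

Minutes just above the threshold are the ones reclassified, so the size of the effect depends on how much of a population's walking falls near 100 steps/min.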
Strengths of the study were the use of three research-grade accelerometers worn on both wrists, which ensures that the results apply to a range of accelerometer protocols currently in use, and the large age range studied (40–79 y), although the results may not be generalisable to adults aged under 40 y. However, although the relative position of devices was randomised between participants, only one of the three devices on each wrist could be placed directly in the optimal wrist position. Further, only one run, completed by approximately a quarter of the participants, was included. Finally, although our reference measure of step count from the waist-worn ActiGraph is well validated (Gould et al., 2021; Moore et al., 2020), a visual step count would have provided a true criterion. However, this was a secondary analysis of an existing dataset, and an observed step count was not available. In the absence of a true step count, the waist-worn ActiGraph was deemed an appropriate reference measure given its established validity for assessing free-living (Abel et al., 2011; Lee et al., 2015) and ambulatory steps (Gould et al., 2021; Moore et al., 2020). Additionally, much of the evidence base on steps and health outcomes has been generated from studies using the waist-worn ActiGraph to assess steps and/or cadence (Paluch et al., 2022), further supporting the appropriateness of the reference measure.
In conclusion, the revised Verisense step algorithm parameters provide a valid measure of step cadence. The error during walking and running is similar to that recently reported for contemporary wrist-worn step-counting devices. As step count is a readily understandable metric, application of this algorithm could enable more intuitive interpretation of the research evidence generated by the many studies worldwide that process their wrist accelerometer data through GGIR.