The Analytical and Clinical Validity of the pfSTEP Digital Biomarker of the Susceptibility/Risk of Declining Physical Function in Community-Dwelling Older Adults

Measures of stepping volume and rate are common outputs from wearable devices, such as accelerometers. It has been proposed that biomedical technologies, including accelerometers and their algorithms, should undergo rigorous verification as well as analytical and clinical validation to demonstrate that they are fit for purpose. The aim of this study was to use the V3 framework to assess the analytical and clinical validity of a wrist-worn measurement system of stepping volume and rate, formed by the GENEActiv accelerometer and GENEAcount step counting algorithm. The analytical validity was assessed by measuring the level of agreement between the wrist-worn system and a thigh-worn system (activPAL), the reference measure. The clinical validity was assessed by establishing the prospective association between changes in stepping volume and rate and changes in physical function (SPPB score). The agreement of the thigh-worn reference system and the wrist-worn system was excellent for total daily steps (CCC = 0.88, 95% CI 0.83–0.91) and moderate for walking steps and faster-paced walking steps (CCC = 0.61, 95% CI 0.53–0.68 and 0.55, 95% CI 0.46–0.64, respectively). A higher number of total steps and faster-paced walking steps was consistently associated with better physical function. After 24 months, an increase of 1000 daily faster-paced walking steps was associated with a clinically meaningful increase in physical function (0.53 SPPB score, 95% CI 0.32–0.74). We have validated a digital susceptibility/risk biomarker, pfSTEP, that identifies an associated risk of low physical function in community-dwelling older adults using a wrist-worn accelerometer and its accompanying open-source step counting algorithm.


Introduction
The use of wearable sensors, including accelerometers, to estimate the number of daily steps and their cadence, has become ubiquitous in research [1]. Their ability to objectively and unobtrusively obtain multi-day, 24/7 recordings of stepping in free-living conditions can provide insights not obtainable within the constraints of the laboratory, such as detailed distributions of step counts, stepping durations, and cadences [2][3][4][5].
A higher stepping volume (the number of steps counted during an interval of time, e.g., steps/day) is associated with reduced all-cause mortality and cause-specific mortality [1,6,7,8,9], as well as a lower risk of chronic disease [10] including cardiovascular disease [11].
Meeting a daily stepping goal through slower-paced walking is qualitatively different from achieving the same daily steps through faster-paced walking. Investigations of the association between an accelerometer-derived stepping volume and rate (the cadence at which steps were accumulated, e.g., 62 steps/min) with health outcomes provide equivocal findings. A higher stepping volume and rate have been shown to be jointly associated with lower hospitalisation and all-cause mortality in older adults [12]. Stepping at a faster pace has also been associated with all-cause, cancer, and cardiovascular morbidity and mortality even when adjusting for the daily stepping volume [13], and it has been suggested that the stepping rate may be of greater importance for cardiometabolic risk reduction than total stepping volume [14]. Furthermore, the association of total stepping volume with all-cause dementia [15] and incident diabetes [16] was found to be stronger when steps were accrued at a faster pace. Similarly, a larger proportion of steps at higher stepping rates was associated with a greater risk reduction for diabetes [17]. In contrast, a higher daily stepping volume was associated with lower mortality but stepping rate was not when adjusted for the stepping volume [7,8,18].
The different measurement properties of the systems used to estimate step counts are one possible reason for the uncertainty about the relative importance of stepping volume and rate, because the outputs of different systems are far from interchangeable [19]. The aspects shaping an accelerometer system's measurement properties include the design of algorithms turning raw data into stepping estimates [20], the construction and duration of variable-length stepping events [21], and the sensor's wear location (usually hip, thigh, or wrist) [22][23][24][25][26][27][28]. Discrepancies between systems can also be exacerbated at low stepping rates because it is harder to detect steps from weak acceleration signals [5,25,29,30,31]. Likewise, the acceleration is moderated by the setting in which movement occurs. Stepping metrics estimated from the same device can vary depending on whether the data are recorded in an artificial laboratory setting, such as treadmill walking or simulated activities of daily life, or in an authentic free-living setting [32,33].
Epoch-based methods and event-based methods are two different ways to estimate the stepping volume and the rate at which steps were accumulated. An epoch-based method collects and analyses data in predefined, non-overlapping time intervals (epochs). For example, an epoch might be set to last 60 s, and the number of steps taken during that 60-s interval would be recorded. The number of steps taken are divided by the epoch's duration to estimate the cadence. This approach underestimates the 'true' cadence if the epoch includes stationary time or if the start and end of the stepping event spans over two epochs. This is likely to be an issue because it is uncommon for humans to step consistently for a whole minute [34]. An event-based method, on the other hand, records steps in real-time as they occur [3]. When a step is detected, an event is started, and the number of steps taken recorded until the continuous period of stepping comes to an end. The number of steps in the event is still divided by the duration of the event to estimate the cadence. However, this estimate is a more accurate estimate of the true cadence because the stationary time is not included in the variable-length event's duration.
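The difference between the two approaches can be illustrated with a minimal sketch. It assumes a list of step timestamps in seconds and a hypothetical segmentation rule (a new event starts when the gap between consecutive steps exceeds `max_gap_s`); neither measurement system in this study is claimed to segment events in this way.

```python
def epoch_cadences(step_times, epoch_s=60.0):
    """Cadence (steps/min) in each fixed, non-overlapping epoch."""
    if not step_times:
        return []
    n_epochs = int(max(step_times) // epoch_s) + 1
    counts = [0] * n_epochs
    for t in step_times:
        counts[int(t // epoch_s)] += 1
    # Steps are divided by the full epoch duration, stationary time included.
    return [c / (epoch_s / 60.0) for c in counts]


def event_cadences(step_times, max_gap_s=2.0):
    """Cadence (steps/min) of each variable-length stepping event.

    max_gap_s is an illustrative assumption, not a setting from the study.
    """
    events, current = [], []
    for t in sorted(step_times):
        if current and t - current[-1] > max_gap_s:
            events.append(current)
            current = []
        current.append(t)
    if current:
        events.append(current)
    cadences = []
    for ev in events:
        if len(ev) < 2:
            continue  # a single step has no measurable duration
        duration_s = ev[-1] - ev[0]
        # len(ev) - 1 step intervals fit between the first and last step.
        cadences.append(60.0 * (len(ev) - 1) / duration_s)
    return cadences


# 30 s of walking at 2 steps/s, starting 45 s into the recording,
# so the stepping event straddles two 60-s epochs.
walk = [45.0 + 0.5 * i for i in range(60)]
epoch_result = epoch_cadences(walk)
event_result = event_cadences(walk)
```

In this constructed case each epoch contains 30 steps and reports 30 steps/min, whereas the single event recovers the 120 steps/min at which the steps were actually taken.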
When measuring stepping, placing accelerometers on the lower body (e.g., the thigh or hip) is generally preferred because the lower limbs are the body parts in contact with the ground and the primary source of movement during stepping. An accelerometer placed on the upper body, such as the wrist, may still capture the motion of the body during stepping, but the signals can be affected by secondary actions (e.g., holding a phone) and may not always reflect whole body movement [25]. The sensor wear location also impacts wear time adherence, which may lead to differences between studies. Periods of missing data due to non-wear can reduce the accuracy of stepping estimates and lead to erroneous estimates of the association between stepping and health outcomes [35,36]. Reduced wear time for hip-worn devices has been attributed to the discomfort and inconvenience they can cause [37] and evidence suggests that wrist-worn systems have higher adherence to wear time protocols in adolescents and adults [1,38,39].
Comfortable, waterproof, single-device sensors worn at the wrist are more likely to maximise wear time, and thus the accuracy of derived estimates, but these measurement systems need to show that the stepping estimates they produce are reliable and fit for purpose. It has been proposed that biomedical technologies, including accelerometers and their algorithms, should undergo rigorous verification as well as analytical and clinical validation (the V3 framework) to confirm their suitability [40]. While digital measures of stepping have been estimated from wrist-worn accelerometers, none have been shown to be fit for purpose based on the V3 framework [25,28,41,42,43,44,45,46]. Studies that assessed the accuracy of the widely used wrist-worn ActiGraph in free-living settings and with criterion measures obtained in a laboratory found that stepping estimates obtained from the wrist were often in disagreement with those measured at the hip [26,47,48], highlighting the importance of rigorous verification and validation of accelerometer measurement systems. Therefore, the aims of this study were to apply the complete V3 framework by:

1. Selecting a verified wrist-worn measurement system, formed by the GENEActiv accelerometer [49] and its accompanying open-source step counting algorithm.
2. Establishing its analytical validity by measuring the level of concurrent agreement between the GENEActiv wrist system and the activPAL thigh system when worn simultaneously in a sample of older adults.
3. Establishing its clinical validity by measuring the prospective association between repeated measures of daily stepping volume and rate with physical function measured via the Short Physical Performance Battery (SPPB) score [50] in a sample of older adults. The SPPB score is a clinically based measure of physical function associated with all-cause mortality, hospitalisation, future functional decline, and long-term disability [51,52]. Furthermore, the SPPB score is a predictor of frailty phenotypes and geriatric syndromes in community-dwelling older people [53].
We conclude that wrist-measured stepping volume and rate obtained through the verified and analytically and clinically validated GENEActiv measurement system create a viable digital susceptibility/risk biomarker [54] associated with a decreased risk for low physical function in older, community-dwelling adults not suffering from health conditions preventing them from engaging in physical activity.

Verification
This study was conducted with well-established measurement hardware. The acceleration measurement of the GENEActiv has been shown to have excellent intra-device and inter-device reliability [55]. The function of the open-source step counting algorithm was verified both by code inspection and replication in an alternate code base. The analytical reference device for step measurement was the activPAL [56], which has demonstrated an absolute percentage error of 1% when compared to the leading pedometers [57]. The analysis pipeline was regularly tested throughout development, with full records of the package dependencies. 20/21R1-008). All participants provided written informed consent prior to participation, including consent for their anonymised data to be used for future research.

Processing of Raw Accelerometer Data
The raw accelerometer data from the thigh- and wrist-worn devices were processed to obtain estimates of stepping volume and rate. Both measurement systems used event-based rather than epoch-based approaches to achieve a granular assessment of stepping volume and rate throughout a day, although the event segmentation approaches were different. The thigh-worn devices ran firmware 649 and their raw data were processed with the manufacturer's proprietary PALbatch desktop software (version 8.11.1.63) [56] using the VANE algorithm. The minimum non-upright period and minimum upright period durations were set to the default of 10 s. The wear time validation algorithm was set to the most stringent option, using the 24-h protocol, which allows a maximum of 4 h of non-wear per day. Wear correction was enabled to automatically correct inverted wear if a participant accidentally attached the device the wrong way round. There is no calibration option in the PALbatch software. The resulting data were exported via the 'Events (extended)' and 'Stepping bouts' reports to acquire time-stamped strides and variable-length events.
The wrist-worn devices ran firmware version 4.08a. Their raw sensor data were calibrated to remove potential measurement errors, which may result from local gravity or temperature [58], using the GENEAread R package [59]. The calibrated raw data were then processed into stepping metrics using variable-length events with the GENEAread and GENEAclassify packages [60] in versions 2.0.8 and 1.5.1, respectively. The number of valid wear hours on each measurement day was identified separately with the GGIR package version 2.7-1 [61] by analysing the calibrated raw data in 24-h chunks (midnight to midnight). This made it possible to match the 24-h protocol from the thigh-worn system during data quality checking. All the code (in Supplementary Materials) was run with R

Data Quality and Aggregation
The processed stepping measurements were quality checked to ensure that only the relevant and reliable observations were included. Only data recorded on valid days, defined as days on which a participant wore both devices for at least 20 h, were analysed. Sedentary or upright events without stepping activity were excluded, as were events with fewer than 10 steps because fewer than 10 consecutive steps may lead to unreliable estimates [63]. The thigh-worn system did not report cadences less than 20 steps/min, possibly because slow stepping produces smaller accelerations, which do not satisfy the minimum acceleration thresholds necessary for a step to be registered [25]. Wrist events with cadences less than 20 steps/min were therefore removed. Similarly, the thigh-worn system does not report cadences greater than 175 steps/min and such events were removed from the wrist data accordingly. Where participants recorded data for more than 7 consecutive days, the additional days were excluded to avoid a potential distortion of the results by cyclical behaviour, such as work-related activity patterns or exercise routines.
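The exclusion rules above can be expressed as a short filter. This is a sketch under stated assumptions: the `StepEvent` class and the function names are hypothetical and are not part of the GENEAclassify or PALbatch outputs; only the thresholds (10 steps, 20-175 steps/min, 20 h of wear on both devices) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class StepEvent:
    """A variable-length stepping event (hypothetical container)."""
    steps: int
    duration_s: float

    @property
    def cadence(self) -> float:
        # Steps per minute over the event's own duration.
        return 60.0 * self.steps / self.duration_s


def keep_event(event, min_steps=10, min_cadence=20.0, max_cadence=175.0):
    """True if the event passes the step-count and cadence checks."""
    return (event.steps >= min_steps
            and min_cadence <= event.cadence <= max_cadence)


def is_valid_day(thigh_wear_h, wrist_wear_h, min_wear_h=20.0):
    """A day counts only if both devices were worn long enough."""
    return thigh_wear_h >= min_wear_h and wrist_wear_h >= min_wear_h


events = [
    StepEvent(steps=9, duration_s=6.0),    # too few steps
    StepEvent(steps=12, duration_s=8.0),   # 90 steps/min, kept
    StepEvent(steps=10, duration_s=60.0),  # 10 steps/min, too slow
    StepEvent(steps=40, duration_s=12.0),  # 200 steps/min, too fast
]
kept = [e for e in events if keep_event(e)]
```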
For each participant, the event-level stepping estimates were aggregated into daily measures of stepping volume and rate. Total steps (20-175 steps/min) were obtained by summing up a participant's steps on each valid day.
Total steps were then further categorised into two subsets: 'non-walking steps' (20-44 steps/min) and 'walking steps' (45-175 steps/min). Stepping below 45 steps/min was not considered walking, because stepping below this threshold tends to consist of less sustained stepping consistent with non-walking behaviours [5].
The 'walking' category (45-175 steps/min) was then further divided into two sub-sets representing 'slower-paced walking' and 'faster-paced walking'. A comparison of the distributions of event step counts, durations, and cadences showed that the thigh and wrist systems had different response characteristics due to differences in their processing pipelines (Figure A1). The system-specific cadence thresholds to delineate walking at slower and faster pace were therefore required for a meaningful evaluation of agreement. This was achieved by identifying each system's median walking cadence (the median cadence of events ≥45 steps/min). For the thigh system, the median walking cadence was 74 steps/min, for the wrist 76 steps/min. Step counts from events below and including the median walking cadence were then summed for each day to obtain slower-paced walking steps and those above the median walking cadence to calculate faster-paced walking steps (Table 1, Figure 1).
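The categorisation described in this section can be sketched as follows. The example assumes that each event has already passed the quality checks and is represented as a `(steps, cadence)` pair; the function names and the toy data are illustrative only.

```python
from statistics import median


def median_walking_cadence(cadences, walking_min=45.0):
    """Median cadence of events at or above the walking threshold."""
    walking = [c for c in cadences if c >= walking_min]
    return median(walking)


def categorise_day(events, split_cadence, walking_min=45.0):
    """Sum one day's steps into the cadence categories.

    events: iterable of (steps, cadence) pairs within 20-175 steps/min.
    split_cadence: the system-specific median walking cadence.
    """
    totals = {"non_walking": 0, "slower_walking": 0, "faster_walking": 0}
    for steps, cadence in events:
        if cadence < walking_min:
            totals["non_walking"] += steps
        elif cadence <= split_cadence:
            # Events at the median itself count as slower-paced walking.
            totals["slower_walking"] += steps
        else:
            totals["faster_walking"] += steps
    totals["total"] = sum(totals.values())
    return totals


day = [(20, 30.0), (50, 60.0), (100, 74.0), (200, 90.0)]
split = median_walking_cadence([cadence for _, cadence in day])
day_totals = categorise_day(day, split)
```

In practice the median would be computed once per measurement system over all walking events, as described above, rather than per day as in this toy example.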

Data Source
Participants were from the REtirement in ACTion (REACT) study, which was reviewed and approved by the National Health Service (NHS) South East Coast-Surrey Research Ethics Committee (15/LO/2082) and is registered as a completed randomised controlled trial (ISRCTN45627165). All participants provided written informed consent, including the use of anonymised data for future research.
The full study protocol is published in detail elsewhere [64,65]. In short, participants were over the age of 65 and had an SPPB score between four and nine (inclusive). They were also screened for a variety of health-related exclusion criteria before recruitment. Participants were followed for 2 years during which they were asked to wear a wrist-worn accelerometer in a community-dwelling setting for 7 consecutive days at baseline and at 6 months, 12 months, and 24 months after baseline. At each of these four accelerometer recording periods, the participants also completed a laboratory assessment during which their physical function (SPPB score) and other health metrics were recorded.

Processing of Raw Accelerometer Data
Only the wrist-worn measurement system was used in the REACT study and the raw data were processed with the same tools and settings described in Section 2.2.2.

Data Quality and Aggregation of Stepping Metrics
In keeping with the exclusion criteria described in Section 2.2.3, stepping events with fewer than 10 steps, a cadence less than 20 steps/min and greater than 175 steps/min, and measurement days beyond the 7th day were excluded. For each of the four accelerometer recording periods, stepping estimates were only considered valid if the device was worn for at least 18 h/day on at least 6 days/period to maximise the reliability of the habitual physical activity estimates [35]. The 18 h/day wear time limit accounted for the fact that the accelerometers were configured to start recording at 5:00 am on the first measurement day. A minimum of 18 h/day avoided discarding valuable data while producing reliable estimates because it remained considerably more stringent than the recommended wear times [66].
For each valid recording period, the data were aggregated into average daily stepping variables. The aggregation of the stepping measures happened in two stages. First, total steps and the sum of steps accrued during slower-paced and faster-paced walking were calculated for each participant per day. The median walking cadence (62 steps/min in the REACT dataset) was used to delineate slower-paced from faster-paced walking. In the second stage, the daily aggregates were averaged across recording periods, resulting in mean daily step counts for total steps (20-175 steps/min), slower-paced steps (20-62 steps/min), slower-paced walking steps (45-62 steps/min), and faster-paced walking steps (63-175 steps/min) for each of the four recording periods.
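A minimal sketch of the validity rule and the second aggregation stage is given below. It assumes per-day wear hours and per-day step sums as plain lists, and it assumes that only days meeting the 18-h criterion enter the average; the text does not state this explicitly, and the function names are hypothetical.

```python
def is_valid_period(daily_wear_hours, min_hours=18.0, min_days=6, max_days=7):
    """At least min_days days with min_hours of wear in the first max_days."""
    considered = daily_wear_hours[:max_days]
    return sum(1 for h in considered if h >= min_hours) >= min_days


def mean_daily_steps(daily_steps, daily_wear_hours, min_hours=18.0, max_days=7):
    """Average the daily step sums over days meeting the wear criterion.

    Assumption: days below min_hours are left out of the average.
    """
    pairs = zip(daily_steps[:max_days], daily_wear_hours[:max_days])
    valid = [steps for steps, hours in pairs if hours >= min_hours]
    return sum(valid) / len(valid)


wear = [19, 24, 24, 23, 17, 24, 24, 24]  # an 8th day is ignored
steps = [5000, 6000, 7000, 6000, 1000, 5000, 7000, 9000]
period_ok = is_valid_period(wear)
period_mean = mean_daily_steps(steps, wear)
```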

Analytical Validity
The wrist-worn system's analytical validity was determined by assessing the concurrent agreement of its daily stepping estimates with those of the thigh-worn reference standard via the Concordance Correlation Coefficient (CCC) developed by Lin [67] and extended by Carrasco et al. [68] for longitudinal repeated measurements. This extension expresses the CCC in terms of the variance components of a Linear Mixed Effects Model (LMEM). This accounted for the hierarchical nature of the data by modelling the paired daily stepping estimates as longitudinal replicates separately for each participant (random effects), including the interactions between systems, participants, and recording periods. Separate models were fitted with the cccrm R package version 2.0.3 [69] for total steps and the walking, slower-paced walking, and faster-paced walking subsets to obtain a CCC for each cadence category.
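For orientation, the basic form of Lin's CCC for two sets of paired measurements can be sketched as below. This is not the variance-components extension fitted with cccrm in this study: it ignores the repeated-measures structure and is shown only to make the precision (covariance) and accuracy (mean shift) components of the coefficient concrete. The function name is hypothetical.

```python
def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired values.

    Uses 1/n moments: 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Perfectly correlated but shifted by one unit: the correlation is 1,
# yet the CCC is penalised for the systematic offset.
shifted = lin_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
identical = lin_ccc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

The shifted example yields 4/7, which shows why a system that tracks the reference closely but with a constant bias still receives a reduced CCC.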

Clinical Validity
The outcome of interest was the change of physical function (SPPB score) over the 24-month follow-up period. Therefore, participants had to provide valid accelerometer and SPPB data for at least two of the four recording periods to be included in the statistical analysis. The independent association between stepping variables and physical function was assessed through LMEMs. All data were analysed at the level of the individual participant and a random intercepts term was included in the model to allow the intercepts to vary for each participant. The covariates included group allocation, site of data collection, age at recruitment, sex, indices of multiple deprivation (IMD) quintile, highest education qualification, perceived general health (SF-36 Score), and the presence of comorbidities (see Table 2 for covariate levels). Data from the control and intervention groups could be analysed together because the effect that the intervention had on SPPB was accounted for by including the group allocation and its respective interaction with time (0, 6, 12, and 24 months) in the model. Longitudinal associations between stepping and SPPB were also modelled by including 'stepping x time' interaction terms. Three different models were fitted, containing total steps and faster-paced walking steps separately and concurrently (Table 4). Statistical significance of coefficient estimates, indicating the presence of associations, was defined as p < 0.05. All models were fitted in Stata version 17.0 [70] using the 'mixed' command. Sensitivity analyses were conducted to confirm the robustness of the LMEM results. These included: (i) fitting the models without health and socio-economic covariates, (ii) the examination of the Control and Intervention groups separately, and (iii) the replication of the analysis using mixed effects ordinal logistic regressions (cumulative link models) because SPPB scores lie on a non-equidistant 12-point ordinal scale derived from the sum of three individual four-point scores on an ordinal scale (gait, balance, and sit-to-stand). However, as the 12-point scores were normally distributed, they could be approximated to, and treated as, a continuous scale for the primary LMEM analyses.
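One way to write down a random-intercept model of this kind for a single stepping variable is sketched below. The notation is an assumption introduced for illustration and is not taken from the study's Stata code: s_{it} is the mean daily step count (per 1000 steps) of participant i at recording period t, g_i is the group allocation, x_{it} collects the remaining covariates (including the main effect of group allocation), and u_i is the participant-specific random intercept.

```latex
\mathrm{SPPB}_{it} = \beta_0 + \beta_1 s_{it}
  + \sum_{k \in \{6,12,24\}} \left( \gamma_k + \delta_k\, s_{it} + \lambda_k\, g_i \right) \mathbb{1}[t = k]
  + \mathbf{x}_{it}^{\top} \boldsymbol{\theta} + u_i + \varepsilon_{it},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Under this parameterisation, \beta_1 is the association at baseline and \delta_k is the 'stepping x time' interaction, so the total association at follow-up k is \beta_1 + \delta_k.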

Biomarker Description
A detailed analysis of the different measurement systems in this study and those referenced in the prior art made it possible to conceive a simple biomarker that reveals the association of in-community measured steps and physical function (pfSTEP). This biomarker is structured as two integer numbers that represent the average number of slower-paced steps per day, followed by the average number of faster-paced walking steps per day, e.g., (6931; 428). The sum of the two integers is the average total number of steps per day and the separation threshold is the median walking cadence of the population under study.
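The structure of the biomarker can be illustrated with a short sketch. It assumes quality-checked events given as `(steps, cadence)` pairs grouped by day and a population-level median walking cadence supplied by the caller; the function name `pfstep` and the toy data are hypothetical.

```python
def pfstep(days, split_cadence):
    """Return (mean slower-paced steps/day, mean faster-paced walking steps/day).

    days: list of days, each a list of (steps, cadence) pairs that
          already lie within 20-175 steps/min.
    split_cadence: median walking cadence of the population under study.
    """
    if not days:
        raise ValueError("at least one day of data is required")
    slower = [sum(s for s, c in day if c <= split_cadence) for day in days]
    faster = [sum(s for s, c in day if c > split_cadence) for day in days]
    n = len(days)
    return (round(sum(slower) / n), round(sum(faster) / n))


two_days = [
    [(100, 50.0), (200, 80.0)],
    [(300, 30.0), (400, 62.0), (100, 100.0)],
]
biomarker = pfstep(two_days, split_cadence=62.0)
```

The two components sum to the average total number of steps per day, here 550, up to rounding of the individual means.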

Analytical Validity
Of the N = 56 participants who provided valid data, 30 (54%) were female and 26 (46%) were male. Their age ranged between 50 and 87 years, with an average age of 64 (±8) years. On average, participants took 9065 (±5104) daily total steps for the thigh-worn reference system and 9721 (±4776) for the wrist-worn system.
The agreement of the thigh- and wrist-worn systems for total daily steps was excellent [71] with a CCC of 0.88 (95% CI 0.83-0.91). The CCC for walking steps and faster-paced walking steps showed a moderate agreement with CCCs of 0.61 (95% CI 0.53-0.68) and 0.55 (95% CI 0.46-0.64), respectively. The CCC for slower-paced walking steps was 0.14 (95% CI 0.02-0.27), indicating a poor agreement for this category with a 95% CI close to zero. Figure 2 further illustrates the precision and accuracy dimensions of the CCC. The pairs of daily total steps estimated by the two systems were similar (Plot A). A given thigh step count tended to correspond to a similar wrist step count and vice versa, meaning that the measured linear relationship was close to what would be observed in the presence of perfect agreement. The thigh-worn system consistently recorded higher step counts for walking (Plot B). Considerably different thigh and wrist step counts corresponded to each other for slower-paced walking and the true linear relationship between thigh- and wrist-worn measurements at slower-paced walking was far from the theoretical relationship for perfect agreement (Plot C). For faster-paced walking, the agreement was weaker for higher step counts and on many days the thigh-worn reference system recorded considerably more steps than the wrist-worn system (Plot D). The thigh-worn system also consistently reported more faster-paced walking steps than the wrist-worn system.
The variance components from the LMEMs (Figure A2) showed that most of the observed differences between the thigh- and wrist-measured total steps were attributable to the participants (51% of total variance) and the variation of their behaviour on different days (37%), while the measurement systems were the source of little variation (1%). However, when step counts were categorised into slower-paced and faster-paced walking, the measurement systems were a much larger source of variation, contributing between 15% and 30% of the total variance.

Clinical Validity
Of the N = 777 participants who took part in the REACT trial, 651 (83.78%) provided both valid accelerometer data and completed the physical function tests on at least two of the four recording periods (10.42% of participants provided data from two periods, 28.44% from three, and 44.92% from all four). Consequently, the model coefficients and goodness-of-fit metrics were derived from stepping activity collected from 15,374 measurement days across 2227 participant-recording periods. On average, each participant provided 6.9 days of accelerometer data from 3.6 recording periods. The age at recruitment ranged from 65 to 98 years. Table 2 presents the characteristics of the study population, resulting from randomisation via a minimisation algorithm, which balanced groups by study site, age group, gender, and initial functional ability [64,72].
Daily total steps and step counts in the cadence-specific categories declined at each follow-up. The proportion of non-walking steps increased over time while the proportion of slower- and faster-paced walking declined (Table 3).
More total steps were consistently associated with better physical function (higher SPPB scores) at all three follow-ups compared to baseline and the association became stronger over the 2-year period. After 24 months, an increase of 1000 daily total steps was associated with a physical function increase of 0.21 points on the 12-point SPPB scale compared to the baseline (Table 4, Model 1).
Table 4. Comparing models containing 'total steps' and 'faster-paced walking steps' separately and concurrently, in the prediction of physical function. Unstandardised coefficient estimates reported per 1000 steps with 95% confidence intervals. All models are adjusted for age, sex, site, allocation to Control or Intervention, SF-36 Score, comorbidities, IMD quintile, and highest education in addition to the stepping variables shown. The reference level for all interactions is baseline. The 6-, 12-, and 24-month coefficients represent how much stronger the coefficient is compared to baseline and need to be added to the baseline coefficient to obtain the total strength of the association at each time point (e.g., 0.04 + 0.21 = 0.25 for total steps at 24 months in Model 1). Only models 2 and 3 are nested. N/A = not applicable (variable not included in model); AIC = Akaike Information Criterion (lower by 2 units is considered a better model). * p < 0.05, ** p < 0.01, *** p < 0.001 (coefficient statistically significantly greater than reference baseline coefficient).
A higher number of faster-paced walking steps was also consistently associated with better physical function and the association became stronger over time. An additional 1000 daily faster-paced walking steps were associated with a physical function increase of 0.69 points at the 24-month follow-up compared to the baseline (Table 4, Model 2).
Additionally controlling for total steps attenuated the association but it remained present at all time points. An increase of 1000 slower-paced steps (20-62 steps/min) was associated with a physical function increase of 0.13 points, while 1000 additional faster-paced walking steps (63-175 steps/min) were associated with an increase of 0.53 points compared to the baseline (Table 4, Model 3).
In relation to the study population's mean baseline activity levels (Table 3), which are typical for older adults [9], 1000 additional steps represented a daily physical activity increase of 17% total steps, 20% slower-paced steps, 172% slower-paced walking steps, or 112% faster-paced walking steps. Sensitivity analyses showed that the health and socio-economic covariates did not alter the presence of the reported associations (Table A1). The results remained the same when the models were replicated as ordinal logistic regressions with mixed effects (Table A2). Separate models of Control and Intervention group data produced comparable results that were also in line with the primary analysis (Tables A3 and A4).

Discussion
The aims of this study were to verify and evaluate the analytical and clinical validity of a wrist-worn system for estimating the stepping volume and rate in community-dwelling adults. The results showed that verified, processed data on stepping volume and rate had a high level of agreement with total steps and an acceptable level of agreement with faster-paced walking steps, when directly compared with a thigh-based reference standard in a sample of community-dwelling adults aged over 50.
Direct comparisons with other studies reporting the analytical validity of step counting algorithms that process data from wrist-worn devices are challenging due to differences in methodology. The primary challenge is the absence of a true gold standard to classify stepping volume and rate in free-living settings where a direct observation is not feasible. For this reason, most validity studies [44,46,73,74,75,76] are limited to laboratory settings or semi-supervised conditions involving simulated outdoor stepping situations where direct observation is possible for short periods. The performance of step counting algorithms validated under such conditions is poor when they are applied to free-living situations [77,78]. The absence of a gold standard measure prevents the assessment of criterion validity, and it has been proposed that the term 'reference standard' be used in situations when the best available method is being used rather than a gold standard [79]. To our knowledge, only one study has assessed the analytical validity of a wrist-worn system in community-dwelling adults using the thigh-worn activPAL as the reference measure [28]. In a sample of N = 713 (aged 45 ± 10 years), participants wore both accelerometers together for 7 days. A greater number of daily total steps were recorded for the wrist-worn system compared to the thigh-worn system, a finding similar to the current study. In addition, consistent with the current study, the wrist-worn system had a high level of agreement with daily total steps and a lower level of agreement with faster-paced walking steps (which Maylor et al. [28] defined as >100 steps/min). The level of agreement for slower-paced stepping was not described.
As reported in the current study, Maylor and colleagues also comment that the between-accelerometer differences in faster-paced walking steps may be largely due to the reference measure not reliably capturing slower-paced non-walking steps, rather than an error in the wrist-worn system. This again highlights the problem of the absence of a true gold standard criterion measure when assessing a wide range of cadences in free-living settings. In older populations, where the proportion of daily slower-paced steps to total steps is likely to be higher, underestimating slower-paced stepping could be particularly problematic [80].
In the clinical validity study in community-dwelling older adults with a mean age of 77 (±7) years, both faster-paced walking steps (63–175 steps/min) and all other slower-paced steps (20–62 steps/min) were independently associated with higher physical function. The model with both the number of faster-paced walking steps and slower-paced steps was a better fit than models with just total steps (20–175 steps/min) or just faster-paced walking steps. Even if total steps are quite low, if they are mostly faster-paced walking steps, the risk of reduced function is lower than it would be for a higher total number of entirely slower-paced steps.
Over a 2-year period, total steps, slower-paced steps, and faster-paced walking steps fell by 944, 708, and 236 steps/day, respectively, in the combined control and intervention group dataset from the REACT trial. These declines in stepping could potentially lead to a decrease of 0.22 in SPPB score ([0.708 × 0.13] + [0.236 × 0.53]). Consequently, if older people merely retained their baseline stepping level, they could potentially prevent a 0.22 decrease in the SPPB score. However, a meta-analysis of intervention studies has shown that an increase of approximately 1000 steps/day is possible [81]. If achieved in this population (e.g., 500 extra faster-paced walking steps/day and 500 extra slower-paced steps/day), the SPPB score would be expected to increase by 0.33 ([0.500 × 0.53] + [0.500 × 0.13]). Alternatively, if an extra 500 faster-paced walking steps/day were achieved and 500 slower-paced steps/day were replaced with faster-paced walking steps, the SPPB score could increase by 0.47 ([1.000 × 0.53] − [0.500 × 0.13]), a reversal of the expected age-related decline in physical function as well as being a clinically meaningful change [82][83][84]. Increases of 1000 steps/day would not only increase the physical function but also reduce the risk of all-cause mortality and cardiovascular disease morbidity and mortality [1]. Additional examples of changes in slower-paced steps and faster-paced walking steps and the estimated changes in the SPPB score can be found in Figure A4.
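The arithmetic in these scenarios can be reproduced with a short sketch. It uses only the two coefficients quoted in the bracketed calculations above (0.13 and 0.53 SPPB points per 1000 slower-paced and faster-paced walking steps/day, respectively) and assumes, for illustration only, that the two contributions simply add. The function name `sppb_change` is hypothetical and is not part of the study code.

```python
# Coefficients quoted in the text: SPPB change per 1000 steps/day.
COEF_SLOWER = 0.13   # slower-paced steps
COEF_FASTER = 0.53   # faster-paced walking steps

def sppb_change(delta_slower, delta_faster):
    """Estimated SPPB change for given changes in daily steps (steps/day).

    Assumes a simple additive combination of the two coefficients.
    """
    return (delta_slower / 1000.0) * COEF_SLOWER + (delta_faster / 1000.0) * COEF_FASTER

# Observed 2-year decline: 708 fewer slower-paced, 236 fewer faster-paced steps/day.
decline = sppb_change(-708, -236)
# 500 extra slower-paced and 500 extra faster-paced walking steps/day.
added = sppb_change(500, 500)
# 500 extra faster-paced steps/day plus 500 slower-paced steps/day
# replaced with faster-paced walking steps.
replaced = sppb_change(-500, 1000)

for label, value in [("decline", decline), ("added", added), ("replaced", replaced)]:
    print(f"{label}: {value:+.3f} SPPB points")
```

To two decimal places, these values correspond to the 0.22 decrease and the 0.33 and 0.47 increases given in the text.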
To the best of our knowledge, there are no studies that have assessed the clinical validity of accelerometer-derived stepping metrics and objective measures of physical function. More specifically, there are no studies of the association between the changes in stepping volume and rate with changes in physical function. Despite this, our results for daily total steps are consistent with the many prospective cohort studies that consistently show that higher volumes of daily stepping are associated with reduced risk of mortality and chronic disease [1,11,13].
However, there remains uncertainty about whether stepping at a faster pace is associated with health benefits, independent of the total steps taken per day. In this study, increases in faster-paced walking steps were more strongly associated with physical function than increases in slower-paced steps. Different devices, their wear location, and step detection methods can lead to different estimates of stepping rate. The lowest stepping rate that can be reliably detected also varies between devices [25]. Inevitably, this will lead to the misclassification of stepping rates, especially in older adult populations where slow stepping rates are most prevalent. Nevertheless, different devices attached to the same body part and using the same processing algorithm can reduce the differences between device outputs, at least for 'average-paced' walking [46]. Some studies calculate the stepping rate using an epoch method while others, including this study, use an event-based method (identify a variable-length stepping event, count the number of steps in the event, and divide by the duration of the event). It has been reported that the epoch method underestimates the 'true' stepping rate because a single epoch can include periods of standing as well as stepping [85]. In addition, event-based methods may be better placed to establish the independence of stepping rate because the stepping rates estimated from epoch methods are more correlated with total steps [86]. Moreover, some studies only computed the stepping rates for epochs ≥2 min and cadences ≥60 steps/min [8], whereas others computed cadences as low as 1–39 steps/min [13], even though the device used was not validated for such low cadences.
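The difference between the two ways of calculating stepping rate can be illustrated with a minimal sketch. The function names, the 60 s epoch length, and the step timings are assumptions made for this example; they do not reproduce the GENEAcount algorithm or any of the cited epoch methods.

```python
def event_cadence(n_steps, duration_s):
    """Event-based cadence (steps/min): steps in the event over the event duration."""
    return n_steps / (duration_s / 60.0)

def epoch_cadence(step_times_s, epoch_start_s, epoch_len_s=60.0):
    """Epoch-based cadence (steps/min): steps in a fixed window over the window length."""
    epoch_end_s = epoch_start_s + epoch_len_s
    n_steps = sum(epoch_start_s <= t < epoch_end_s for t in step_times_s)
    return n_steps / (epoch_len_s / 60.0)

# Hypothetical minute: 30 s of walking at 2 steps/s, followed by 30 s of standing.
step_times = [i * 0.5 for i in range(60)]   # 60 steps between t = 0.0 s and t = 29.5 s

# The stepping event is taken to last 30 s (an assumption about event boundaries).
cadence_event = event_cadence(n_steps=60, duration_s=30.0)
cadence_epoch = epoch_cadence(step_times, epoch_start_s=0.0)

print(f"event-based cadence: {cadence_event:.0f} steps/min")
print(f"epoch-based cadence: {cadence_epoch:.0f} steps/min")
```

Because the epoch also contains the standing period, the epoch-based estimate for this hypothetical minute is half the event-based estimate, which is the underestimation described above.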
Declines in physical function are insidious and start at a point when traditional measures of physical function, such as the SPPB score, would likely return 'normal' values despite the function already being in decline. We show that the wrist-worn system evaluated in this paper is fit for purpose to obtain a digital biomarker for the early detection of people's susceptibility or risk of decline in physical function and can be measured remotely at a time when people still have a reserve of function sufficient to alter their trajectory towards low function and frailty. A meta-analysis of interventions [81] has shown that the level of change required to preserve or improve the function identified in this study can be achieved and would also be accompanied by a significant reduction in the risk of chronic disease and all-cause mortality [1].
The GENEActiv wrist-worn system used in this study achieves a high wear time compliance in a variety of populations, is low burden for the wearer, and is proven to be easily deployable in a wide range of applications. The pfSTEP biomarker can be derived from the GENEActiv raw (sensor-level) acceleration data using standard approaches and the open-source GENEAcount algorithm. The continuous measurement of body movement from the wrist is fully aligned with the intended utility of the pfSTEP biomarker (assessing physical function through stepping volume and rate). The representation of the biomarker as two integers retains the intuitive simplicity and usability of steps for both individuals and clinicians, while providing a much richer outcome to support decision-making.
A strength of this study is the methodical approach to the V3 process [40] for assessing how fit for purpose the measures of stepping volume and rate obtained from the wrist are as a digital biomarker of susceptibility/risk for low physical function in older, community-dwelling adults. In addition, the analytical validity was assessed in a real-world setting over several days, better reflecting daily living values of stepping rates compared to laboratory estimates of gait speed [87]. The repeated measures of both exposure and outcome measures are a real strength of this clinical validity study, along with the large sample of community-dwelling older adults.
A major strength of the current study is that it has collected measures of both the exposure (stepping) and the health outcome (physical function) at four time points over a 2-year period. The wholly longitudinal nature of these data allows for the analysis of dynamic associations, rather than the static associations afforded by cross-sectional designs. Dynamic associations in this analysis are represented by the 'stepping × time' interaction term, which describes the degree to which time-related changes in the SPPB score are associated with time-related changes in stepping. A more common approach in longitudinal studies is to measure stepping once, at the baseline timepoint, and to measure physical function at baseline and follow-up. The absence of repeated measures of the exposure in such studies would be a major limitation in ageing populations, as this study showed that large decreases in daily total steps, especially in faster-paced walking steps, occurred over a 2-year period (a reduction of 16% and 26%, respectively). Repeated, longitudinal data are also likely to improve the reliability of associations compared to cross-sectional data as they are less affected by the occurrence of non-typical measures (e.g., a non-representative week of walking/stepping or sub-optimal performance in the SPPB tests). Representing total steps with two different stepping variables of non-overlapping cadence (faster-paced walking steps and slower-paced steps) in the same model makes for an intuitive interpretation of the model coefficients and reduces the level of collinearity between predictors. If total steps and faster-paced walking steps (a sub-set of total steps) had been entered into the same model, this would have caused a high level of collinearity, which in turn would have increased the uncertainty and decreased the reliability of their respective model coefficients. Furthermore, the 'total steps' coefficient would represent the coefficient of 'slower-paced steps' with 'faster-paced walking steps' already being accounted for in the model.
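The point about the interpretation of the 'total steps' coefficient can be checked numerically. The sketch below uses ordinary least squares on synthetic data as a simplified stand-in for the models used in the study; the simulated step counts, the outcome, and the helper `ols` are assumptions for illustration and do not reproduce the study data or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic predictors in thousands of steps/day (illustrative values only).
slower = rng.normal(4.0, 1.0, n)
faster = rng.normal(1.0, 0.5, n)
total = slower + faster            # faster-paced steps are a sub-set of total steps

# Synthetic outcome built with arbitrary illustrative coefficients.
y = 8.0 + 0.13 * slower + 0.53 * faster + rng.normal(0.0, 1.0, n)

def ols(predictors, outcome):
    """Least-squares coefficients: intercept first, then one per predictor."""
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta

b_sf = ols([slower, faster], y)    # intercept, slower, faster
b_tf = ols([total, faster], y)     # intercept, total, faster

# Correlation between the two predictors under each parameterisation.
r_slower_faster = np.corrcoef(slower, faster)[0, 1]
r_total_faster = np.corrcoef(total, faster)[0, 1]

print("slower coefficient (slower + faster model):", b_sf[1])
print("total coefficient  (total + faster model): ", b_tf[1])
print("corr(slower, faster):", r_slower_faster)
print("corr(total, faster): ", r_total_faster)
```

In this sketch the coefficient for total steps in the second model equals the coefficient for slower-paced steps in the first, and the coefficient for faster-paced steps becomes the difference between the faster-paced and slower-paced coefficients. The predictors in the second model are also correlated by construction, because one is a sub-set of the other.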
This study is the first to be methodical in trying to match the processing methods for both systems as much as possible. Future studies of analytical validity in real-world settings would benefit from being more transparent about the differences in the step detection methods to ensure that the measurement systems are not a large source of the variance between the stepping estimates, potentially leading to false conclusions about the accuracy of the system being compared to the reference system.
A major limitation in this study, and any other analytical validity study in free-living settings, was the absence of a true gold standard criterion measure. As a result, differences in the estimates of stepping could not be attributed to a misclassification in one system or the other. However, it has been observed that, in situations where an acceptable reference standard does not exist, clinical validation can provide a significant methodological advance [79]. Furthermore, our analytical and clinical validity studies were restricted to older people, limiting the external validity of the results. Additional studies are required in a broader range of populations to determine how generalisable the results are.
The well-documented challenge of accurately detecting slower-paced stepping [5,25,[29][30][31], especially in older people, requires urgent attention to better understand the value of slower-paced stepping in this population. Systematic reviews of the prospective association of stepping measures and health outcomes struggle to harmonise the data for meta-analysis due to the very many differences in the systems used to collect the estimates of stepping measures. With the increasing availability of cloud storage, it is possible to store the raw acceleration data, from which stepping measures are derived, at scale. This would allow future reviews to apply a single processing method to raw acceleration data collected from different devices if the wear location was consistent, the wear time was standardised, and the device outputs were verified. This could improve the precision of estimates of the associations between stepping and health outcomes.

Conclusions
We have described and validated a digital susceptibility/risk biomarker (pfSTEP) that identifies the associated risk of low physical function in community-dwelling older adults using a wrist-worn accelerometer and its accompanying open-source step counting algorithm. Older adults who increase their proportion of faster-paced walking steps reduce their risk of developing low physical function and thereby their risk of premature mortality, frailty, hospitalisation, and falls. The digital pfSTEP biomarker uses real-world evidence from a system with proven high usability. It supports continuous measurement outside the confines of the clinic or laboratory environment and enables the remote monitoring of changes in ambulatory activity to identify older adults at risk of developing low physical function.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s23115122/s1, Code S1: R code for the processing, cleaning, analysis, and visualisation of the analytical validity study; Code S2: R code for the processing, cleaning, and visualisation of the clinical validity study; Code S3: Stata code for the analysis of the clinical validity study.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the original DAPPA and REACT studies, including consent for their anonymised data to be used for future research.

Data Availability Statement:
The data presented in this study are available on request from Afroditi Stathi (REACT dataset) and Max Western (DAPPA dataset). The data are not publicly available due to privacy protection.

Conflicts of Interest:
Joss Langford is the director of Activinsights Ltd. All other authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
Figure A1. Event-level distributions of (A) step count of the event, (B) duration of the event, and (C) cadence of the event as measured on the thigh and on the wrist.

Appendix B
Table A1. Comparing models containing 'total steps' and 'faster-paced walking steps' separately and concurrently, without health and socio-economic covariates, in the prediction of physical function. Reference level for all interactions is the baseline visit. Note that only models 2 and 3 are nested. N/A = not applicable (variable not included in model); AIC = Akaike Information Criterion (lower by 2 units is considered a better model); * p < 0.05, ** p < 0.01, *** p < 0.001 (coefficient statistically significantly greater than reference baseline coefficient).

Table A2. Comparing models containing 'total steps' and 'faster-paced walking steps' separately and concurrently, modelled as ordinal logistic regression with mixed effects, in the prediction of physical function. Unstandardised coefficient estimates are log-odds-ratios reported per 1000 steps with 95% confidence intervals. All models are adjusted for age, sex, site, allocation to Control or Intervention, SF-36 Score, comorbidities, IMD quintile, and highest education in addition to the stepping variables shown. The reference level for all interactions is the baseline visit. Note that only models 2 and 3 are nested. N/A = not applicable (variable not included in model); AIC = Akaike Information Criterion (lower by 2 units is considered a better model); ** p < 0.01, *** p < 0.001 (coefficient statistically significantly greater than reference baseline coefficient).

Table A3. Comparing models fitted to Control group data only, containing 'total steps' and 'faster-paced walking steps' separately and concurrently, in the prediction of physical function. Unstandardised coefficient estimates reported per 1000 steps with 95% confidence intervals. All models are adjusted for age, sex, site, SF-36 Score, comorbidities, IMD quintile, and highest education in addition to the stepping variables shown. The reference level for all interactions is the baseline visit. Note that only models 2 and 3 are nested. N/A = not applicable (variable not included in model); AIC = Akaike Information Criterion (lower by 2 units is considered a better model); * p < 0.05, ** p < 0.01, *** p < 0.001 (coefficient statistically significantly greater than reference baseline coefficient).

Table A4. Comparing models fitted to Intervention group data only, containing 'total steps' and 'faster-paced walking steps' separately and concurrently, in the prediction of physical function. Unstandardised coefficient estimates reported per 1000 steps with 95% confidence intervals. All models are adjusted for age, sex, site, SF-36 Score, comorbidities, IMD quintile, and highest education in addition to the stepping variables shown. The reference level for all interactions is the baseline visit. Note that only models 2 and 3 are nested. N/A = not applicable (variable not included in model); AIC = Akaike Information Criterion (lower by 2 units is considered a better model); * p < 0.05, ** p < 0.01, *** p < 0.001 (coefficient statistically significantly greater than reference baseline coefficient).