Revealing the optimal thresholds for movement performance: A systematic review and meta-analysis to benchmark pathological walking behaviour

In order to address whether increased levels of movement output variability indicate pathological performance, we systematically reviewed and synthesized meta-analysis data on healthy and pathological motor behavior. After screening up to 24’000 reports from four databases, 85 studies were included containing 2409 patients and 2523 healthy asymptomatic controls. The optimal thresholds of variability with uncertainty boundaries (in % Coefficient of Variation ± Standard Error) were estimated in 7 parameters: stride time (2.34 ± 0.21), stride length (2.99 ± 0.37), step length (3.34 ± 0.84), swing time (2.94 ± 0.60), step time (3.35 ± 0.23), step width (15.87 ± 1.86), and dual-limb support time (6.08 ± 2.83). All spatio-temporal parameters exhibited a positive effect size (pathology led to increased variability) except step width variability (Effect Size = −0.21). By objectively benchmarking thresholds for pathological motor variability also presented through a case-study, this review provides access to movement signatures to understand neurological changes in an individual that are apparent in movement variability. The comprehensive evidence presented now qualifies stride time variability as a movement biomarker, endorsing its applicability as a viable outcome measure in clinical trials.


Introduction
The effective performance of activities of daily living such as standing, walking or reaching is fundamental to leading an independent life. Walking, specifically, is an essential locomotor activity that requires effective regulation between central and peripheral nervous, and musculoskeletal system resources, allowing us to synergize the movement of limbs both spatially (inter-limb coordination) and temporally (rhythmicity, i.e. constancy in step repetitions), such that we can maintain our balance (regulation of the centre of mass over the base of support) (Clark, 2015; Plotnik and Hausdorff, 2008; Bruijn and van Dieen, 2018). In addition, these resources are needed to negotiate obstacles in the environment (obstacle avoidance) and to respond to perturbations to the moving body (Kovacs, 2005; Song and Geyer, 2017). Due to the continuous regulation of these neural resources, as well as the noise inherently present in both sensory and motor signals (Singh et al., 2012; König et al., 2014a; Jones et al., 2002; Churchland et al., 2006), motor output fluctuates around a target or desired level; this fluctuation is termed motor output variability or Movement Variability (MoV; Stergiou et al., 2006; Herzfeld and Shadmehr, 2014).
The ability to walk in a stable manner declines with age and pathology, adversely affecting health-related quality of life (Netuveli and Blane, 2008; Seidler et al., 2010). Furthermore, movement deficits require individuals to perform activities near their maximal effort (Singh et al., 2012; Aagaard et al., 2010; Chandler et al., 1998), thus continually challenging them physically (but also cognitively (Beauchet et al., 2011; Hamilton et al., 2009)) and, in many cases, leading to adverse events such as injurious falling. Recent evidence shows that elderly adults produce repetitive movements such as walking with greater MoV, possibly due to the loss in strength and flexibility (Kang and Dingwell, 2008) as well as the decline in walking speed (Chien et al., 2015). Interestingly, MoV is even greater in individuals who are at a high risk of falling (Hamacher et al., 2011; König et al., 2014b), or who suffer from neurological disorders affecting motor function (hereinafter referred to as "movement disorders") such as Parkinson's disease (PD), Multiple Sclerosis, or Huntington's disease (König et al., 2016a; Hausdorff et al., 2001; Hausdorff, 2005; Moe-Nilssen et al., 2010).
https://doi.org/10.1016/j.neubiorev.2019.10.008 Received 9 May 2019; Received in revised form 7 October 2019; Accepted 11 October 2019
It is likely that an optimal window of MoV characterises asymptomatic individuals, and differentiates healthy from pathological movement function (Singh et al., 2012;Stergiou et al., 2006;König et al., 2016b;Stergiou and Decker, 2011;Rosenblatt et al., 2014). In general, below this window, movement is likely to become rigid (system with limited flexibility to adjust to internal and external perturbations), while MoV above the optimum would bring the system closer to its limits of stability (generally provided by the feet in contact with the ground that form the base-of-support -see (Hof et al., 2007) for details), with both extremes indicating deficits in movement performance (Stergiou and Decker, 2011). Such an interpretation might appear in line with contemporary theoretical frameworks (cf. proposed by (Todorov, 2004), but also confirmed by (Cusumano and Dingwell, 2013)), which hypothesize that movement tasks are likely adapted and executed (i.e. the level that one should aim for (Todorov, 2004)) by optimizing accuracy, while requiring minimal control effort. The threshold probed within this study however, is conceptually different in that it reflects an optimal boundary identified across multiple studies that differentiates healthy from pathological gait. Such a threshold would allow the possible use of MoV as an effective biomarker for assessing an individual's neuro-motor status. A comprehensive knowledge of MoV with clear definitions for asymptomatic task performance in both temporal and spatial domains could allow different metrics of variability to be established as intrinsic features of performance (signatures) to associate an individual's quality of movement with their underlying neural status. In essence, by identifying the optimal thresholds for benchmarking motor signatures, we envisage prioritising and formalising potentially useful movement-based biomarkers for neurological disorders (Shipitsin et al., 2014;Strimbu and Tavel, 2010). 
As such, a well-defined biomarker, or combination thereof, can address the persistent clinical need for early identification of movement disorders (screening) as well as for evaluating the effectiveness of therapies for returning individual patients to some level of independent living.
In a previous systematic review (König et al., 2016b), we investigated the thresholds between healthy asymptomatic and pathological magnitudes of task variability, and concluded that, for the parameter "variability of stride time" (STV; evaluated as the coefficient of variation, CV, of stride time), an upper threshold of 2.6 %CV discriminated pathological from healthy adult performance. Furthermore, 1.1 %CV was identified as the lower threshold for healthy variability in adults. Although the systematic review approach was comprehensive, a very large sample of clinical trials, each consisting of a suitable number of subjects, is clearly needed in order to better estimate the true value of the boundaries. Until then, such a pool of information will be neither reliable nor robust, meaning that the inclusion of any further clinical trial data would have sufficient relative weight to influence the existing evidence on optimal levels of variability. Therefore, it is critical to determine methods that are able to absorb such changes, so that the estimated boundary levels remain robust and meaningful in light of additional evidence. One such approach, involving the estimation of probabilistic thresholds, has been proposed previously in other domains, but never used in the clinical context of movement signatures (Xu and Gupta, 2005). As a result, and given the growing recognition of metrics of MoV for assessment in clinical settings, a full and complete study evaluating thresholds between healthy asymptomatic and pathological task variability is timely.
The selective effects of movement disorders such as PD on different neurological structures (consequently impairments may also be variable) highlight the importance of holistically understanding the interplay between various locomotor characteristics that are critical during walking (König et al., 2016a). However, due to the conventional subjective approach of pre-selecting parameters in a somewhat arbitrary manner (e.g. STV is reported far more than any other spatio-temporal parameter in the literature (König et al., 2016b), possibly due to its ease of measurement), the complex interactions between multiple different features of walking remain largely unexplored. Such subjective practices therefore hinder an accurate characterization of movement deficits in both clinical cohorts and on an individual basis. Thus, an understanding of how multiple signatures of gait are regulated would lay the foundations for unravelling the neuromuscular mechanisms that are involved in not only walking, but also movement in general (Dingwell and Cusumano, 2000; Lord et al., 2011). With a vision to distinguish healthy asymptomatic from pathological gait performance in a holistic manner, we therefore aim to broaden our knowledge of optimal windows of variability (originally investigated only for stride time (König et al., 2016b)) by estimating the optimal thresholds for all commonly reported characteristics of walking.
The aim of this review was therefore firstly to provide current state-of-the-art and reliable evidence on the magnitude of MoV in healthy walking behaviour, and secondly to exploit a probabilistic approach for improving the robustness of the optimum window of MoV. Finally, this paper presents a case study involving a statistical model for investigating motor deficits in PD patients using the optimum thresholds identified in the review, including preliminary data from a retrospective case-control study.

Search strategy and study selection
The literature search and selection strategy (Supp. Methods 1) in the original systematic review (König et al., 2016b) was maintained in this study in order to extend the search to articles published after June 2014 until June 2018. Four databases (PubMed, Web of Science, Embase and Ebsco) were comprehensively searched to include only studies in which continuous measures of variability during straight line walking were collected in both healthy asymptomatic adults and patients with a neurological disorder. The inclusion criteria for the studies were: 1) Outcome measure: inter-cycle variability of gait measures (e.g. step length, stance time) expressed as the percentage coefficient of variation, calculated as the ratio of the standard deviation to the mean; 2) Participants: a cohort of healthy adults and a cohort of patients with a neurological pathology; and 3) Task: walking on a treadmill or overground at a comfortable or self-selected walking speed. The search was further restricted to peer reviewed articles published in the English language. An exhaustive list of exclusion criteria is presented as supplementary material (Supp. Methods 2. Exclusion criteria). Two of the authors (DKR and MG) performed the literature search and screened the studies at each stage of the review; any disagreement was resolved by consensus (together with NBS). The study was performed in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA, see Fig. 1) and the PRISMA checklist is provided as supplementary material. The literature search identified 11,111 potentially relevant articles. After the removal of duplicates (n = 2384) and articles rejected based on title or abstract (n = 8543), 184 articles were included for full text screening. During this process, a further 163 articles were excluded.
Finally, 21 studies were included containing a total sample of 752 patients and 608 healthy asymptomatic participants, over and above the 1657 patients and 1915 healthy asymptomatic participants from the original study (König et al., 2016b). The characteristics of all included studies are presented in a supplementary table (Supp. Table 1). Screening of all the articles was performed within the EPPI-Reviewer 4 software.

Risk of bias assessment
Two authors (DKR and NBS) independently assigned the risk of bias quality scores to the studies with the use of the MINORS tool. Each study was assessed as having high, unclear or low risk of bias on all items (scored from 0 to 2 respectively with an ideal global score of 18) included in the original checklist (three items: Unbiased Assessment of the Study Endpoint, Follow-up period appropriate to the aim of the study, and Loss to follow up less than 5% were considered irrelevant and were excluded). Any discrepancies between reviewers were resolved through consensus. The results are presented in a supplementary table (Supp. Table 2).

Data analysis and synthesis
We used measures of variability of spatio-temporal parameters of walking reported as the coefficient of variation (CV) for both healthy asymptomatic and pathological cohorts. An effect size (ES: the difference between the means of the pathological and healthy asymptomatic control groups over the pooled standard deviation, additionally corrected for sample size to provide Hedges' g) for each study was used to express the difference between cohorts in a standardised manner. The group averages for each gait parameter from all studies were then combined by calculating a pooled effect size using the standard error as a weighting factor (in order to minimize the risk of overestimation). Heterogeneity among the included studies was assessed and interpreted using the I² statistic (according to the Cochrane guidelines (Higgins et al., 2011)). Additionally, potential publication bias was assessed through visual inspection of the funnel plot of effect sizes against the standard error of the effect estimate. In order to estimate the upper boundary of the performance window, all studies that exhibited a positive ES were selected for the logistic regression. Similarly, studies that revealed a negative ES formed the basis for estimating the lower boundary (König et al., 2016b).
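The effect size and pooling steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the variance formula for Hedges' g is the usual large-sample approximation, and inverse-variance (fixed-effect) weighting is assumed as the way the standard error enters the weights.

```python
import math

def hedges_g(mean_path, sd_path, n_path, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardised mean difference (pathological minus control) with
    Hedges' small-sample correction; returns (g, approximate variance of g)."""
    df = n_path + n_ctrl - 2
    pooled_sd = math.sqrt(((n_path - 1) * sd_path ** 2 +
                           (n_ctrl - 1) * sd_ctrl ** 2) / df)
    d = (mean_path - mean_ctrl) / pooled_sd
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    n_sum = n_path + n_ctrl
    var_d = n_sum / (n_path * n_ctrl) + d ** 2 / (2.0 * n_sum)
    return j * d, j ** 2 * var_d

def pool_effects(effects, variances):
    """Inverse-variance pooled effect, its standard error,
    Cochran's Q and the I^2 statistic (in %)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * g for w, g in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    dof = len(effects) - 1
    i2 = max(0.0, (q - dof) / q) * 100.0 if q > 0 else 0.0
    return pooled, se, q, i2

if __name__ == "__main__":
    # Hypothetical CV summaries (mean, SD, n) for two studies, illustration only
    studies = [(3.1, 1.2, 25, 2.0, 0.8, 25), (2.8, 1.0, 40, 2.1, 0.9, 38)]
    gs, vs = zip(*(hedges_g(*s) for s in studies))
    print(pool_effects(gs, vs))
```

A positive g here means larger variability in the pathological group, matching the sign convention used in the text.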
A generalized linear mixed model fit using maximum likelihood criteria (with Laplace approximation to integrate the likelihood function (Handayani et al., 2017)) was used to model the log odds of the binary outcome (0 for healthy asymptomatic, 1 for pathological) as a linear combination of the predictor variable (CV gait parameter) and the study index (random effects). As the means of the healthy asymptomatic and pathological groups for a particular walking parameter originating from the same study could be correlated to each other due to the inherent study design and experimental protocol, a mixed-effects model was used to account for the random effect of the paired data (healthy asymptomatic vs. pathological) within each included study and also accounting for heterogeneity in meta-analysis.
The logistic model (with random effect for the study index) is given by:

$$\ln\left(\frac{p_i}{1 - p_i}\right) = B_0 + B_1 x_i + U_i \tag{1}$$

where $p_i$ is the probability that the observation belonged to a particular cohort given the predictor variable, $x_i$ (gait parameter), and the study index, $U_i$; $B_0$ and $B_1$ are coefficient estimates of the regression model estimated using maximum likelihood (see Supp. Figures 1-7). Finally, in order to test whether the model with the predictor variable (model with the alternate hypothesis with respect to coefficient $B_1$) fits significantly better than a null model (model with the null hypothesis with respect to $B_1$), we evaluated the Bayes Factor using ΔBIC (where $\Delta BIC = BIC_{H1} - BIC_{H0}$ and BIC is the Bayesian Information Criterion; please refer to (Jarosz and Wiley, 2014)).
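As a rough illustration of the model comparison, the sketch below fits a plain logistic regression by Newton-Raphson and converts ΔBIC into an approximate Bayes factor using the BIC approximation BF01 ≈ exp(ΔBIC / 2). It is a simplification under stated assumptions: the study-level random effect is omitted (so this is not the mixed model used in the review), the data are invented, and all function names are hypothetical.

```python
import math

def fit_logistic(x, y, iters=100, tol=1e-10):
    """Newton-Raphson fit of logit(p) = b0 + b1 * x (fixed effects only).
    Returns (b0, b1, maximised log-likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        g0 = sum(yi - pi for yi, pi in zip(y, p))              # gradient wrt b0
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))  # gradient wrt b1
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det                        # Newton step
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
             for yi, pi in zip(y, p))
    return b0, b1, ll

def null_log_likelihood(y):
    """Log-likelihood of the intercept-only model."""
    n1 = sum(y)
    n0 = len(y) - n1
    p = n1 / len(y)
    return n1 * math.log(p) + n0 * math.log(1.0 - p)

def bic(log_lik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * log_lik

def bayes_factor_01(bic_h1, bic_h0):
    """BIC approximation: values below 1 favour H1 (predictor included)."""
    return math.exp((bic_h1 - bic_h0) / 2.0)

if __name__ == "__main__":
    # Invented group-level CV values (%); 0 = healthy, 1 = pathological
    cv = [1.5, 1.8, 2.0, 2.2, 2.6, 2.1, 2.5, 2.9, 3.2, 3.8]
    group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    b0, b1, ll1 = fit_logistic(cv, group)
    bf01 = bayes_factor_01(bic(ll1, 2, len(cv)),
                           bic(null_log_likelihood(group), 1, len(cv)))
    print(b0, b1, bf01)
```

A full replication would need a mixed-effects fit with a random intercept per study; the fixed-effects version is shown only because it makes the ΔBIC logic easy to follow.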
The discriminatory accuracy of our model was assessed using the receiver operating characteristic (ROC) curve obtained by mapping sensitivity versus specificity for all possible values of the cut-off point between pathology and healthy. Here, sensitivity (true positive rate) was defined as the probability of correctly classifying an individual as pathological, while specificity (true negative rate) was the probability of correctly classifying an individual as healthy asymptomatic (Zweig and Campbell, 1993). The optimality criterion was given by:

$$d = \sqrt{(1 - \text{sensitivity})^2 + (1 - \text{specificity})^2} \tag{2}$$

The optimum cut-off probability point, $P_{opt}$, was obtained from Eq. (2) as the point minimizing the Euclidean distance between the ROC curve and the (1,1) coordinate on the ROC plane (Zweig and Campbell, 1993); see Supp. Figures 8-14 for details. $P_{opt}$ was then fed into Eq. (3), obtained by reformulating the binary logistic regression function in Eq. (1), in order to estimate the optimal threshold value, $x_{opt}$, for a particular gait parameter.
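The cut-off selection and the inversion of the logistic function can be sketched as below. This is an illustration under assumptions rather than the published procedure: candidate cut-offs are taken from the observed predicted probabilities, and the inversion sets the study-level random effect to zero, giving x_opt = (ln(P_opt / (1 - P_opt)) - B0) / B1. Function names are hypothetical.

```python
import math

def optimal_cutoff(probs, labels):
    """Cut-off probability whose (specificity, sensitivity) point lies
    closest to the (1, 1) corner. labels: 1 = pathological, 0 = healthy."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_dist = None, float("inf")
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < cut and y == 0)
        sens = tp / n_pos
        spec = tn / n_neg
        dist = math.hypot(1.0 - sens, 1.0 - spec)
        if dist < best_dist:
            best_cut, best_dist = cut, dist
    return best_cut

def threshold_from_cutoff(p_opt, b0, b1):
    """Invert logit(p) = b0 + b1 * x at p = p_opt
    (assumes the random effect is zero)."""
    return (math.log(p_opt / (1.0 - p_opt)) - b0) / b1

if __name__ == "__main__":
    # Invented predicted probabilities and coefficients, illustration only
    probs = [0.10, 0.25, 0.40, 0.55, 0.45, 0.60, 0.80, 0.95]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    p_opt = optimal_cutoff(probs, labels)
    print(p_opt, threshold_from_cutoff(p_opt, -7.0, 3.0))
```

With a cut-off of exactly 0.5 the log-odds term vanishes and the threshold reduces to -B0 / B1.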

Probabilistic thresholds
Optimum separation between healthy asymptomatic and pathological motor performance for any physiological parameter (in our case gait variability) will approach the "true value" only when estimated using a sufficiently large number of clinical trials with an adequate sample of healthy and pathological participants. Such a measure can only be informative when the mean is reported together with its standard error, hence providing the basis for confidence levels (or uncertainty) to be evaluated (Altman and Bland, 2005). In the present study, the standard error (SE) was estimated as probability boundaries of the evaluated optimal threshold ($x_{opt}$) via Eqns. 1-3, which then provided a more systematic and reliable estimate of the window of healthy physiological gait performance. SEs were evaluated using the Delta Method involving a first-order Taylor approximation (Venables and Ripley, 2010), which is given by:

$$SE(x_{opt}) \approx \sqrt{\nabla g(\hat{B})^{T} \, \mathrm{Cov}(\hat{B}) \, \nabla g(\hat{B})}$$

where $x_{opt} = g(\hat{B})$ expresses the threshold as a function of the estimated coefficients $\hat{B} = (B_0, B_1)$ and $\mathrm{Cov}(\hat{B})$ is their covariance matrix.

All analyses were conducted in Matlab (v2016b, The Mathworks Inc., USA) and R (v3.4.1, The R Foundation for Statistical Computing, Austria).
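For a threshold of the form x_opt = (c - B0) / B1 with c = ln(P_opt / (1 - P_opt)), the first-order delta method can be written out explicitly. The sketch below assumes that P_opt is treated as a fixed constant and that the variances and covariance of the two coefficients are available from the fitted model; the function name and example numbers are hypothetical.

```python
import math

def delta_method_se(p_opt, b0, b1, var_b0, var_b1, cov_b0_b1):
    """First-order delta-method standard error of
    x_opt = (logit(p_opt) - b0) / b1. Returns (x_opt, se)."""
    c = math.log(p_opt / (1.0 - p_opt))
    x_opt = (c - b0) / b1
    d0 = -1.0 / b1        # partial derivative of x_opt wrt b0
    d1 = -x_opt / b1      # partial derivative of x_opt wrt b1
    var = d0 * d0 * var_b0 + 2.0 * d0 * d1 * cov_b0_b1 + d1 * d1 * var_b1
    return x_opt, math.sqrt(var)

if __name__ == "__main__":
    # Invented coefficient estimates and covariance terms, illustration only
    print(delta_method_se(0.5, -6.0, 3.0, 0.9, 0.2, -0.4))
```

The covariance term matters in practice: intercept and slope estimates in a logistic fit are usually strongly negatively correlated, which tends to shrink the resulting standard error.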

Case study
Experimental approach to demonstrate the application of optimal thresholds in the clinical context.
In an attempt to demonstrate how the optimum thresholds can be used to provide rapid and robust indications of healthy versus pathological gait performance, we applied the synthesised evidence, by way of example, to preliminary movement data from a retrospective case-control study. Twenty elderly volunteers were recruited: 10 patients with PD (PwPD) with a mean (SD) age of 59 (6) years, height 173.9 (4.6) cm and weight 75.5 (9.4) kg, and 10 healthy controls with age 65 (11) years, height 165.6 (5.3) cm and weight 61.7 (8.4) kg. All participants provided written, informed consent approved by the local ethics committee (registration number: 2015-00141) prior to participation. The participants were requested to walk continuously along a path shaped as an "8" in the laboratory for ten minutes at their own self-selected walking speed (König et al., 2014a), while 3D kinematics of both feet (reflective markers placed on the calcaneus of each foot) were recorded only during the straight line walking sections using optical motion capture (VICON, OMG Ltd, UK). PwPD were on their normal dopaminergic prescriptions at the time of measurement. The median cohort score using the Unified Parkinson's Disease Rating Scale (MDS-UPDRS III) while off medication was 38 (range 24-61) and while on medication was 22.5 (range 6-40).
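Stride time variability of the kind reported here is the coefficient of variation of successive stride durations. A minimal sketch follows, assuming that heel-strike times of one foot have already been extracted from the marker trajectories; event detection itself is not shown, the sample standard deviation is assumed, and the function name is hypothetical.

```python
import statistics

def stride_time_cv(heel_strike_times):
    """Coefficient of variation (%) of stride times, where a stride is the
    interval between consecutive heel strikes of the same foot (seconds)."""
    strides = [t1 - t0 for t0, t1 in zip(heel_strike_times,
                                         heel_strike_times[1:])]
    if len(strides) < 2:
        raise ValueError("at least two strides are needed to estimate variability")
    return 100.0 * statistics.stdev(strides) / statistics.mean(strides)

if __name__ == "__main__":
    # Invented heel-strike times (s) from one straight walking section
    print(stride_time_cv([0.00, 1.08, 2.15, 3.25, 4.31, 5.40]))
```

In a figure-of-eight protocol, strides spanning the curved sections would have to be excluded before pooling the straight-line strides, so that each interval comes from uninterrupted straight walking.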

Results
When combined, a total of 85 studies from the current and original review were considered for analysis, and 7 parameters were reported across different pathological groups, resulting in a total of 147 ES values (Forest plot, Fig. 2). The heterogeneity in this systematic evidence was I² = 13.8%, with an average Cochran's Q of 198.5, which indicates excellent consistency across the publications reviewed. STV remained the most commonly reported parameter characterising MoV (47 studies). Other frequently reported gait variability parameters were variability of: a) stride length (23 studies), b) step length (22 studies), c) swing time (13 studies), d) step time (12 studies), e) step width (7 studies), and f) dual limb support time (6 studies). The evaluated articles presented varying risk of bias and methodological quality. The mean quality score was 13.38 ± 1.54 (range 10-17) out of an ideal score of 18. The summary of the methodological score for each question and study is provided in Supp. Table 2. Unfortunately, only 45% of the included studies used 50 or more steps in their analysis, which might be required for reliable assessment of gait variability (König et al., 2014a).
Since 2014, when the previous review was conducted, nine additional studies included STV in their trials, an increase of ∼24%, hence necessitating an update of the reported thresholds between healthy asymptomatic and pathological variability. In the current study, we evaluated the higher bound of physiological STV to be 2.34%, which is 0.26% lower than the estimate in the original review. The delta method was then employed to identify the standard error in the likelihood estimates as ± 0.21%. The corresponding 95% confidence intervals (95%CIs) for the upper thresholds of physiological stride time variability were thus estimated as 1.92-2.76% (2 SEs above and below the estimate). Similar to STV, the optimal thresholds and subsequently the likelihood estimates were also evaluated for the other commonly reported gait parameters (Table 1). All parameters except step width variability had positive effect sizes (stride time (0.75 ± 0.09), stride length (0.59 ± 0.11), step length (0.82 ± 0.15), swing time (0.34 ± 0.11), step time (0.83 ± 0.20) and dual limb support time (0.42 ± 0.27)), with the upper thresholds discriminating healthy asymptomatic from pathological gait mostly ranging from approximately 2-6%; step width variability had an ES = −0.21 ± 0.18 and a threshold of 15.9%. Furthermore, the most commonly reported parameters, variability of stride time (N = 47) and stride length (N = 23), were both highly significant (p < 0.01), while also displaying the lowest Bayes Factor (< 1.4 × 10⁻⁴, see Supp. Table 3). Finally, the funnel plot (Fig. 3) of our data reveals an asymmetrical distribution towards positive ES, possibly due to reporting bias (reduced likelihood of publication for studies that report negative or no ESs).
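The interval arithmetic above, and a simple comparison of an individual's variability against the reported thresholds, can be sketched as follows. The threshold and standard error values are those reported in this review; the function names and category labels are hypothetical, and the two-standard-error interval mirrors the calculation quoted for stride time. Step width is included for completeness, although its negative effect size means that a value above its threshold should not be read in the same way as for the other parameters.

```python
# (threshold, standard error) in % CV, as reported in this review
THRESHOLDS = {
    "stride time": (2.34, 0.21),
    "stride length": (2.99, 0.37),
    "step length": (3.34, 0.84),
    "swing time": (2.94, 0.60),
    "step time": (3.35, 0.23),
    "step width": (15.87, 1.86),
    "dual limb support time": (6.08, 2.83),
}

def threshold_interval(parameter, n_se=2.0):
    """Interval of n_se standard errors around the upper threshold."""
    estimate, se = THRESHOLDS[parameter]
    return estimate - n_se * se, estimate + n_se * se

def flag_variability(parameter, cv_percent, n_se=2.0):
    """Place a measured CV (%) relative to the threshold and its interval."""
    estimate, _ = THRESHOLDS[parameter]
    low, high = threshold_interval(parameter, n_se)
    if cv_percent > high:
        return "above interval"
    if cv_percent > estimate:
        return "above threshold"
    if cv_percent >= low:
        return "within interval, below threshold"
    return "below interval"

if __name__ == "__main__":
    print(threshold_interval("stride time"))
    print(flag_variability("stride time", 3.0))
```

Keeping the interval separate from the point threshold makes the uncertainty explicit: a measured value between the threshold and the upper interval bound is a weaker indication than one clearly above both.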

Schematic representation of optimal thresholds data using radar plots
Patients suffering from basal ganglia disorders (Parkinson's as well as Huntington's disease; Fig. 4b in red, please see the figure caption for details on the radar plot representation) had larger variabilities than their asymptomatic counterparts (displayed in Fig. 4a-f as a solid green line) in all the reported parameters, and these values were also outside the higher bound of the threshold (Fig. 4a-f, green bars on the axes). In contrast, patients with cognitive disorders (e.g. Alzheimer's disease or mild cognitive disorder; Fig. 4c in red) exhibited larger levels than the asymptomatic group for only the temporal variability parameters. Finally, patients suffering from global disorders (e.g. Multiple sclerosis; Fig. 4d) also showed asymmetry issues relative to the asymptomatic group, as depicted by larger variability in both step length and step time.
Schematic representation benchmarking the data from the case study with optimal thresholds using radar plots
Finally, Fig. 4e-f illustrate the results of the retrospective case control study, with patients suffering from Parkinson's disease denoted by red lines. The retrospective case control study revealed that PwPD showed increased average gait variability compared to their healthy counterparts within the case study (displayed as a blue solid line), consistent with the systematic review data. The pooled means of all the gait variability parameters for controls within the case study were within the derived window for healthy physiological gait performance (as obtained from the meta-analytic investigation here, presented in Table 1 and overlaid on the radial plot as a green solid line; Fig. 4a-f), while the mean of the PwPD cohort consistently lay outside the upper thresholds of the optimal window (as indicated by the green bars on each axis, Fig. 4a-f). Fig. 4f also provides the gait variability parameters for each individual patient and control participant (dashed lines) assessed within the case study, with the optimally identified thresholds (in green) for comparison.

Discussion
Movement variability is an important characteristic that may represent both redundancy and adaptability of human task performance (Hausdorff, 2005; Stergiou and Decker, 2011; Latash et al., 2002). Recent investigations clearly demonstrate that MoV is sensitive to adaptations, including learning new skills (Sternad, 2018; Wu et al., 2014), the onset of movement disorders (König et […] Contemporary reports (König et al., 2014a; Hausdorff, 2005; König et al., 2016b; Dingwell and Cusumano, 2000) suggest that the primary reasons for such a discrepancy are: i) a lack of understanding and misconceptions associated with MoV, e.g. MoV might represent error, but is also associated with adaptability (Schmidt, 2003; Lipsitz and Goldberger, 1992), ii) a subjective bias together with an arbitrary choice of parameters, and iii) the general requirement for dedicated measurements during extended periods of walking to effectively estimate MoV (König et al., 2014a). Despite the use of 4- and 6-minute walk tests in clinical settings, the continued lack of comprehensive assessment of gait quality suggests a missed opportunity, and perhaps a disadvantage to both the individual patient and the health care system. The evidence from this systematic review of the literature clearly suggests that MoV is able to quantify gait quality. Furthermore, the probabilistic analysis revealed an optimum threshold between healthy asymptomatic and pathological gait patterns. As such, there is potential added value in complementing standard clinical assessments with the evaluation of MoV as a relevant movement biomarker for identifying subjects with early signs of disease, but also for monitoring adaptations after therapy. One of the principal challenges hindering the uptake of MoV has been the need to collate the wealth of diverse literature, and provide a clear understanding of the healthy physiological levels of variability during walking.
This study has therefore directly provided numerical evidence on healthy physiological movement performance, suggesting that locomotor tasks within populations suffering from neurological disorders might be distinctly regulated from their healthy asymptomatic counterparts (König et al., 2016a;Hausdorff, 2005;Stergiou and Decker, 2011).
A previous literature-based meta-analysis that investigated STV provided optimum thresholds for establishing the limits between healthy asymptomatic and pathological task performance (König et al., 2016b). Due to the lack of both large-scale as well as long-term assessments, this numerical evidence was plausibly subject to change with new literature appearing. A preliminary search gauged and confirmed the existence of substantial new evidence, which therefore necessitated this review update (Garner et al., 2016). To accommodate such changes, a probabilistic approach to evaluate the boundaries for healthy performance in walking behaviour was considered within this updated review. The CI of the upper threshold of 2.34 %CV ranged from 1.92 to 2.76 %CV, and provides a robust probabilistic estimate for handling any new evidence on STV. An STV larger than this threshold, and certainly one exceeding the CI's upper boundary, indicates an overall inability to maintain temporal consistency of foot placement, such that walking performance might, in general, be closer to the limits of stability (Bruijn and van Dieen, 2018; König et al., 2016a). Specifically, an inter-cycle STV larger than the threshold might lead to inconsistency between swing and stance phases of gait, making it difficult to maintain the CoM of the body within the base of support across cycles. Such an estimate therefore provides a meaningful threshold for objectively assessing STV and highlights its value for clinical uptake for the purposes of screening individuals who might suffer from movement impairments.

Table 1. Effect size statistics including the z-test across all seven walking parameters, optimum thresholds and probabilistic levels (represented with Standard Errors) calculated at 50% dose levels.
Parameters listed: stride length variability, step length variability, swing time variability, step time variability, step width variability, dual limb support time variability.

Fig. 3. Publication bias demonstrated using funnel plots.

D.K. Ravi, et al. Neuroscience and Biobehavioral Reviews 108 (2020) 24-33

Meta-analytic data
In the interest of translation to clinical settings, an important question is: "Does STV, the single most commonly used parameter of MoV, have the necessary attributes to qualify as a gait biomarker for detecting movement disorders?" Here, Hausdorff (2005) has proposed that for MoV to be used as a biomarker, its reliability, accuracy, sensitivity and clinical utility (time- and cost-effectiveness) should all be established, or an optimal trade-off between all these attributes should be found. The duration of walking trials has a substantial effect on the reliability of assessing kinematic variability (Owings and Grabiner, 2003), but STV has also been shown to be modestly reliable with 50 steps, resulting in an inter-day test-retest variability of ∼13% (König et al., 2014a). This systematic review revealed that STV had a large effect size with the lowest standard errors for discriminating healthy asymptomatic vs. pathological performance (Table 1). Furthermore, the logistic regression model based on population data revealed that there is clearly a high level of sensitivity and specificity (75%) for using STV alone to identify movement disorders, proving its predictive capability (Table 1). Benchmarking demonstrated robust levels for higher bounds of unstable walking performance, providing the much-needed impetus for the practical uptake of STV in clinical settings towards accurately identifying individuals who suffer from movement impairments.

Fig. 4a. Here, the overall mean of the commonly reported gait variability parameters for healthy asymptomatic controls obtained from the studies included within the systematic review provides the baseline, and is depicted with a green solid line. The optimum windows for the gait characteristics (depicted as green bars on the different axes of the radial plot) are formed using the optimum threshold (higher bound) identified from the logistic regression procedure and the lowest observed group value for the asymptomatic subjects (lower bound).
Fig. 4b-d. Radar plots illustrating patients grouped according to pathology from the studies included in the meta-analysis (solid red line). b. The overall mean of patients in the basal ganglia group (red). c. The overall mean of patients in the cognitive group (red). d. The overall mean of the global group (red).
Fig. 4e & f. Radar plots illustrating patients with Parkinson's Disease from the retrospective case study (solid red line). e. Parameter variables extracted from the retrospective case-study allow a comparison between PwPDs and healthy older adults (solid blue line). All values outside of the optimum windows indicate movement deficits. In this case, not only did the mean of the PwPDs have higher levels of variability compared to the healthy controls, but the values were also consistently outside the upper thresholds of the optimal window. f. Average retrospective case-study data on PwPDs (red solid line) and healthy older adults (blue solid line), and individual retrospective case-study data on PwPDs (red dashed lines) and healthy older adults (blue dashed lines).
All values are presented as standardized scores (z-scores). The solid axes on each radial plot, in grey and radiating from the center of the plot, range from -3.5 to 3.5 z-scores. The horizontal axis on the right-hand side displays variability (Var) of stride length (StrideL), followed in a clockwise manner by step length (StepL), step time (StepT), swing time (SwingT), dual limb support time (DLS-T), stride time (StrideT), and step width (StepW); together these represent the 7 signatures considered within the study.
Finally, although direct evidence of the clinical utility of STV remains lacking, the metric is seemingly well-accepted and incorporated in many clinical trials as a primary (Lord et al., 2011) or secondary outcome measure (Zweig and Campbell, 1993; Henderson et al., 2013). Notably, a Phase II clinical trial has recently used step time variability as a surrogate marker to investigate gait stability and fall risk in PwPD (Henderson et al., 2016). In addition, the accessibility of this parameter from wearable technologies clearly promotes STV as a promising signature for assessing movement rhythmicity.
Wearable technologies provide easy access to MoV indicators, such as STV, for patients in real-world or ecologically valid scenarios and therefore deliver a viable and cost-effective means for population-wide screening, but also for personalized monitoring of patients over extended periods. With a complete system that includes wearables and a clear interpretation of gait quality, it is possible to assess large-scale movement data (large sample sizes) over a long-term follow-up (multiple assessments from every individual). Subtle differences during repetitive tasks can then be recognized in an unbiased and rapid manner and be combined with the clinical status of the individual to provide close tracking of neuro-muscular performance. Here, the potential of MoV, especially STV, to identify motor quality in clinical populations has been well recognized (Kovacs, 2005; Song and Geyer, 2017; Moe-Nilssen et al., 2010; Stergiou and Decker, 2011). However, STV as well as other parameters will need to be quantified and generalized across populations in order to establish their association with other signatures (or features) of gait quality. While the methodological challenges associated with the quantitative assessment of various attributes of reliability and accuracy in MoV (König et al., 2014a, 2014b; Beauchet et al., 2009) have now mostly been resolved, epidemiological research in ecologically valid settings is still required before elevated clinical uptake can be realised.
While gait variability metrics such as STV are clearly quantifiable indicators of neural control of movement, it is unlikely that a single parameter can independently characterize age- and pathology-related motor control impairments. By focusing predominantly on STV, movement scientists and clinicians might well under-estimate the complexity of gait control. Recent evidence indicates that multiple signatures are modulated differently by age (Rosso et al., 2014), pathology (Moe-Nilssen et al., 2010; Takakusaki, 2013), walking speed (Frenkel-Toledo et al., 2005), and gender (Hughes-Oliver et al., 2018), but more importantly also by complex interactions between one or more of these factors (e.g. for age and walking speed see Callisaya et al., 2010; Helbostad and Moe-Nilssen, 2003). Although we did not observe a considerable effect of age on the overall effect size (or on stride time variability in separate subgroup analyses; see Supplementary Tables 4 & 5), the modulatory effects of walking speed or gender could not be explored within this meta-analysis due to the lack of data within the original studies. Despite this limitation, the meta-analysis clearly provides evidence on optimal boundaries for variability of multiple commonly identified spatio-temporal parameters.
Of all the parameters reported, step width variability was the only metric to exhibit reduced levels of variability in pathological populations compared to healthy asymptomatic individuals. This result is consistent with the original systematic review (König et al., 2016b) and coherent with previous findings that diminished step width variability is linked to poor balance and fall risk (Maki, 1997). On the neuromotor level, this observation plausibly indicates tightening control in one dimension at the expense of another. The significance of this finding, however, needs further exploration and should be interpreted with caution, especially in the context of its dynamic interplay with both centre of mass kinematics (Arvin et al., 2018; Wang and Srinivasan, 2014) and STV during task execution (Bauby and Kuo, 2000). In fact, variability in step width reflects the regulation of the body's base of support in order to maintain balance in the medio-lateral direction during walking. Biomechanically, modifications to step width (e.g. extending the lateral margins of the base of support; see Bruijn and van Dieen, 2018; Wang and Srinivasan, 2014; König Ignasiak et al., 2019) and/or double limb support time (extended time period during which the projected CoM is within the base of support) might influence our balance during walking (Moe-Nilssen et al., 2010; Rosenblatt et al., 2014; Galna et al., 2013; Owings and Grabiner, 2004; Gabell and Nayak, 1984). Moreover, step width might be regulated on a cycle-to-cycle basis and is intricately related to the walking speed (Helbostad and Moe-Nilssen, 2003; Stimpson et al., 2018), the kinematic state of the CoM (Bruijn and van Dieen, 2018), the energy requirements (Donelan et al., 2001, 2004), as well as other environmental constraints.
Although step width as well as dual limb support time have been suggested to reflect balance control (Bruijn and van Dieen, 2018; Wang and Srinivasan, 2014; König Ignasiak et al., 2019), the number of studies that reported step width variability (as well as dual limb support time variability) is approximately 9 times lower than the number reporting STV. Despite widespread access to wearable technologies that allow the assessment of temporal parameters, their inability to assess spatial parameters such as step width is one of the reasons, if not the main reason, for this disproportionality. Interestingly, variability of double-support time (a temporal parameter), conventionally associated with balance control, actually demonstrated a positive overall effect size. It might therefore represent a different sub-feature than the variability of step width (a spatial parameter) within the qualitative domain of balance control. Overall, the role of neuromotor control on step width variability (and dual limb support time variability) in older and pathological populations needs further investigation.
The hypothesis of dynamic interplay between movement or gait signatures (which are also relevant as independent domains), such as that between rhythmicity and balance control, highlights the need for comprehensive assessment of movement performance in future over the more conservative and commonplace subjective approach of pre-selecting parameters in an arbitrary manner. Here, we support the necessity of assessing a family (for a similar approach please also see ) of gait signatures including but not limited to rhythmicity, balance, coordination, regularity (the predictability of movement), asymmetry (motor and/or physical symptoms leading to a discrepancy between parameters from the two limbs) and obstacle avoidance. In fact, the results presented here provide preliminary evidence that movement disorders affecting different regions of the brain (i.e. Parkinson's vs. Alzheimer's disease) might influence different signatures (cf. Fig. 4b vs. c). Such a representation therefore clearly indicates how presenting boundaries for multiple parameters enables rapid determination of a particular motor deficit, but also allows comparison across different disease groups. In future, an operational model that provides an unbiased statistical/data-lean presentation to extract the most important signatures would be a preferred approach that will allow clinicians to characterize complex distinctive walking behaviours in both healthy asymptomatic and pathological individuals. Such an approach could incorporate multivariate methods to rank the important signatures from kinematic data, while simultaneously allowing the identification of interplay between the signatures. Consequently, it would also make way for multivariate classification models using population-based optimal estimates, which until now have not been available.
Establishing and benchmarking movement signatures with population-based optimal estimates, as has been undertaken in this review, will allow generalized estimates of MoV to provide scalable and unbiased information on movement quality.
Finally, we have reported a preliminary approach to characterize motor impairments in subjects with movement disorders based on population-based evidence derived from the meta-analysis. The usage of this approach has been demonstrated in an exemplary case study comprising PwPD (Fig. 4e-f). In general, captured and pre-processed 3D kinematics of patients (using e.g. optical motion capture or wearable inertial sensors) allow easy access to an individual's movement signatures. The boundaries identified within the meta-analysis and presented within the case study (Fig. 4e-f) now provide an easy and accessible manner to benchmark such movement signatures against data from 85 studies containing 2409 patients and 2523 healthy asymptomatic controls (Table 1), e.g. 1.92-2.76 %CV as the uncertainty range for the upper threshold of 2.34 %CV for STV, enabling effective and rapid screening of movement performance. Further studies are necessary to explore the broader utility and potential biases of our approach (utilizing optimum thresholds obtained through meta-analyses based on aggregate data to benchmark an individual's movement quality might not always be straightforward). Nevertheless, our robust meta-analytic approach will allow the uptake of objective metrics on signatures within clinical information, a concept that is extremely important but has not been well appreciated until now.
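As a rough illustration of how such benchmarking could be carried out for a single individual, the following minimal sketch computes stride time variability as a coefficient of variation and compares it with the STV threshold (2.34 %CV) and its uncertainty range (1.92-2.76 %CV) quoted above. The function names, the three category labels and the stride time values are hypothetical and serve only as an example; this is not the software announced in the Data sharing statement, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# Values quoted in the text for stride time variability (STV), in % CV.
STV_THRESHOLD = 2.34        # optimal (upper) threshold
STV_UNCERTAINTY_LOW = 1.92  # lower end of the threshold's uncertainty range
STV_UNCERTAINTY_HIGH = 2.76 # upper end of the threshold's uncertainty range


def coefficient_of_variation(values):
    """Return the coefficient of variation in percent (sample SD / mean * 100)."""
    return stdev(values) / mean(values) * 100.0


def classify_stv(cv_percent):
    """Place an STV value relative to the threshold's uncertainty range.

    The three labels are illustrative assumptions, not clinical categories.
    """
    if cv_percent < STV_UNCERTAINTY_LOW:
        return "below threshold range"
    if cv_percent <= STV_UNCERTAINTY_HIGH:
        return "within threshold uncertainty"
    return "above threshold range"


# Hypothetical stride times in seconds. A real assessment would use far more
# strides (the text cites modest reliability with 50 steps).
stride_times = [1.10, 1.12, 1.08, 1.11, 1.09, 1.13, 1.07, 1.10]

stv = coefficient_of_variation(stride_times)
print(f"STV = {stv:.2f} %CV -> {classify_stv(stv)}")
```

Values falling inside the uncertainty range are reported separately here rather than being forced to one side of 2.34 %CV, which reflects the caution expressed above about applying aggregate thresholds to individuals. Step width variability would need separate handling, since it was the only parameter with a negative effect size.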
Overall, when viewed from a human sensorimotor control perspective, a neurophysiological approach that collates all available evidence-based clinical information in a statistical model would bridge the gap in knowledge between the currently dominant bipedal walking (e.g. Bauby and Kuo, 2000) and neuroanatomical (e.g. Takakusaki, 2013) models. In the context of clinical decision-making, a gait hypermodel (an integrative model that synergizes knowledge from multiple models) that combines a neurophysiological model (experimentally validated and therefore physiologically representative, but generally in laboratory settings) with a statistical model (population-based using rapid clinical tests and therefore providing comprehensive and ecologically valid evidence) offers the ability to identify biomarkers and thereby objectively track the state of a pathology (monitoring) and its progression (prognostic). In addition, such a gait hypermodel opens perspectives for estimating susceptibility (risk), e.g. in the prediction of an individual's risk of falling (BEST, 2016). An additional benefit would be the exploration of task performance in scenarios that might be unethical or harmful to test directly, through statistical perturbation via probabilistic techniques such as Monte Carlo methods.
Information on MoV has obvious relevance for the deeper understanding of movement control mechanisms and has been extensively investigated in both healthy asymptomatic and pathological cohorts. Clinical interest in the use of MoV as a biomarker for identifying pathological movement status has increased considerably in recent years, as evident through the publication of 21 relevant studies in 4 years compared to 64 in the previous 34 years. It now seems plausible that quantitative motor performance data (also in the form of pooled population information available through this review) can be utilized for developing gait hypermodels (such as the one proposed in this paper) with a vision for benchmarking behaviour, developing biomarkers and understanding neuro-adaptive behaviour of aged and pathological populations.

Contributors
DKR, NBS and WRT conceived and designed the review. DKR and MG performed literature search, screening and data extraction. DKR and NBS did quality assessment of included studies. DKR, NBS and WRT did the subsequent meta-analyses. NBS, WRT and CRB designed and supervised the case study. DKR, MG, NKI, MU are the study co-coordinators for the case study. CRB provided clinical content expertise while drafting and revising the manuscript. JHVD provided critical opinion and revision of the manuscript as a subject expert. All the authors reviewed and approved the manuscript for submission. WRT is the guarantor.

Ethics approval
Within the case study project, prior to study participation each of the 20 subjects (10 PwPD and 10 healthy controls) provided written informed consent. The entire study protocol was approved by the Kantonale Ethikkommission Zürich (registration number: 2015-00141) and was conducted in accordance with the Declaration of Helsinki.

Funding
No external funding was received to support this study.

Data sharing
The algorithms presented in this article to identify the optimum thresholds in physiological outcomes of walking will be released as a user-friendly software interface on our laboratory webpage (movement.ethz.ch/research/neuromuscular-biomechanics). Complete meta-analysis data are available as supplementary material to this article. The case study data will also be made available on reasonable request.

Declaration of Competing Interest
All authors have completed the ICMJE uniform disclosure at www.icme.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.