Towards validation and standardization of automatic gait event identification algorithms for use in paediatric pathological populations

Background: To analyse and interpret gait patterns in pathological paediatric populations, accurate determination of the timing of specific gait events (e.g. initial contact, IC, or toe-off, TO) is essential. As currently used clinical identification methods are generally subjective, time-consuming, or limited to steps with force platform data, several techniques have been proposed based on processing of marker kinematics. However, validation and standardization of these methods for use in diverse gait patterns are still lacking. Research questions: 1) What is the accuracy of available kinematics-based identification algorithms in determining the timing of IC and TO for diverse gait signatures? 2) Does automatic identification affect interpretation of spatio-temporal parameters? Methods: 3D kinematic and kinetic data of 90 children were retrospectively analysed from a clinical gait database. Participants were classified into 3 gait categories: group A (toe-walkers), B (flat IC) and C (heel IC). Five kinematic algorithms (one modified) were implemented for two different foot marker configurations for both IC and TO and compared with clinical (visual and force-plate) identification using Bland-Altman analysis. The best-performing algorithm-marker configuration was used to compute spatio-temporal parameters (STP) of all gait trials. To establish whether the error associated with this configuration would affect clinical interpretation, the bias and limits of agreement were determined and compared against inter-trial variability established using visual identification. Results: Sagittal velocity of the heel (Group C) or toe marker configurations (Group A and B) was the most reliable indicator of IC, while the sagittal velocity of the hallux marker configuration performed best for TO. Biases for walking speed, stride time and stride length were within the respective inter-trial variability values.
Significance: Automatic identification of gait events was dependent on algorithm-marker configuration, and best results were obtained when optimized towards specific gait patterns. Our data suggest that correct selection of automatic gait event detection approach will ensure that misinterpretation of STPs is avoided.


Introduction
Gait analysis allows detailed characterisation of specific movement and functional deficits and can provide critical support for screening, diagnosing, and monitoring disease progression. It has therefore become an established approach in paediatric medicine for assessing motor developmental disorders. To effectively characterise motor deficits, 3D Clinical Gait Analysis (CGA) assists in objectively quantifying gait deviations, informing clinical decision making, and monitoring the effectiveness of therapy [1]. Deviations are assessed by comparing ensemble-averaged kinematic signals over the course of a gait cycle [0-100 %], which depends critically on the accurate identification of a gait cycle. Gait event detection determines the timing of events (initial contact, IC, and toe-off, TO) to discriminate between gait phases (stance vs swing) in a typical gait cycle. Incorrect identification of gait events could lead to errors in the normalization and ensemble-averaging of kinematic and kinetic data, as well as to inaccurate spatio-temporal parameters. The comparison of gait patterns between and within subjects therefore depends on accurate detection of gait events.
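The dependence of time normalisation on event timing can be illustrated with a minimal sketch. The function name `normalise_gait_cycle`, the synthetic sinusoidal "joint angle", and the 5-frame shift are illustrative assumptions and are not taken from the study.

```python
import math

def normalise_gait_cycle(signal, ic_start, ic_end, n_points=101):
    """Linearly resample signal[ic_start..ic_end] onto n_points spanning 0-100 %."""
    cycle = signal[ic_start:ic_end + 1]
    last = len(cycle) - 1
    resampled = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)      # fractional sample index
        lo = min(int(pos), last - 1)         # left neighbour, kept in range
        frac = pos - lo
        resampled.append(cycle[lo] * (1.0 - frac) + cycle[lo + 1] * frac)
    return resampled

# Synthetic angle curve (degrees) sampled at 150 Hz with a 1 s cycle (assumed values)
fs = 150
angle = [30.0 + 25.0 * math.sin(2.0 * math.pi * n / fs) for n in range(2 * fs)]

reference = normalise_gait_cycle(angle, ic_start=30, ic_end=180)
# Same cycle with both ICs set 5 frames (about 33 ms) late
shifted = normalise_gait_cycle(angle, ic_start=35, ic_end=185)
max_deviation = max(abs(r - s) for r, s in zip(reference, shifted))
```

In this toy case, a timing error of a few frames shifts the whole normalised curve, so `max_deviation` is a deviation introduced purely by event setting rather than by the movement itself.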
Currently, gait events are identified in clinics through two main approaches (collectively referred to in this paper as clinical identification). The first approach, widely treated as the 'gold standard', involves setting thresholds on vertical ground reaction forces (GRFs) obtained using force plates. However, clean, isolated force plate "hits" are often difficult to obtain due to the use of assistive devices, short step lengths, or partial contact with the plate. In addition, the number of steps from which events can be identified is constrained by the number of force platforms. Furthermore, the acceptable force threshold to identify IC and TO events has not been standardized across clinics, especially for paediatric populations [2,3]. When force plate data is unavailable, events are determined through visual identification of segment kinematics by trained experts. Besides being time-consuming, this approach is subjective, as its reliability depends on the expertise of the identifier. Furthermore, its precision is limited by the video frame rate.
As an alternative to these clinical identification methods, algorithm-based event detection (referred to in this paper as automatic identification), primarily using kinematic data from optoelectronic markers, was introduced in the early 1990s [4]. While many methods have since been developed to estimate gait events in both typically developing and pathological subjects [3,5-15], validation and standardization of these methods for use in diverse gait patterns is lacking. Specifically, walking patterns commonly observed in the Cerebral Palsy population involve a toe or flat-foot initial contact rather than a heel-strike. In such cases, it might be difficult to assess the gait events as the kinematics differ from the typical pattern. Recommendations to improve accuracy of IC detection for pathological gait patterns include modification of existing algorithms, and the use of the hallux marker to increase accuracy in TO detection [2]. Unfortunately, these alternatives have not been evaluated. Furthermore, the influence of automatic identification on the estimation of spatio-temporal parameters has neither been investigated nor validated against the gold standard.
Therefore, the primary goal of this study was to determine and compare the accuracy of available (modified) automatic identification approaches in subgroups with different gait patterns. Our secondary goal was to estimate if automatic identification affects interpretation of spatio-temporal parameters.

Participants
This study utilized a retrospective clinical database consisting of 3D kinematic and kinetic data of patients who underwent 3D CGA at a local hospital during regular clinical visits. Participants were included in the study if they were aged 3-18 years at time of measurement and visited the lab between 2015 and 2017. Furthermore, subjects were excluded if they walked with assistance of orthotic devices, crutches or walkers during the measurement trial.
Ninety participants who fulfilled the inclusion criteria were included in this study. Limbs were classified into one of 3 groups (each N = 30) according to the region of the foot which initially contacted the ground at IC. Group A consisted of toe-walkers with forefoot contact; Group B were flat-foot walkers, where the entire sole or the side of the foot contacted the ground; Group C exhibited typically developing gait patterns with a heel-strike (Table 1). Only a single, randomly selected limb of each participant was included in this study, and group categorisation was confirmed using video evidence.
Informed consent was obtained from all children or their guardians, as approved by the local ethical committee (EKNZ Nr. 2018-01640). All measurements were conducted according to the Declaration of Helsinki.

Measurement procedure
All participants walked barefoot on a 10 m instrumented walkway, without assistive devices, at their preferred walking speed for at least 6 trials. Kinematic data was collected at a sampling frequency of either 300 Hz (data collected until 2016) or 150 Hz (after 2016) using an optoelectronic motion capture system (12-camera MTX20, VICON, Oxford, UK). A total of 64 markers were attached to the subjects according to the modified Plug-in Gait (PiG) model (9.5-mm diameter, see Supplementary material S1). 3D ground reaction forces (GRFs) were collected through force platforms embedded in the walkway (Kistler, Switzerland, sampling frequency 1500 Hz). Walking trials were only included for data processing when at least one step with clean force plate contact was achieved. Trials with occluded marker data from the heel, toe, hallux or posterior superior iliac markers were excluded. In total, 90 trials were included for analysis.

Data processing
3D trajectories of the posterior superior iliac spine (PSI), calcaneus (HEE), second metatarsal head (TOE), and hallux (HLX) markers were extracted and low-pass filtered (Butterworth 2nd order; 10 Hz cut-off). Five algorithms for detecting gait events (Zeni, Desailly, Ghoussayni, Hreljac and Marshall, Hsue), as recommended by previous studies for assessing pathological gait [2,16], as well as the algorithm by O'Connor et al. [12], which focusses on typically developing gait, were implemented and compared in this study (Fig. 1). The Ghoussayni algorithm [6] was further modified to tune event detection as a function of walking speed, according to previous recommendations [2]. To allow standardised comparisons across all groups (including forefoot, flat-foot, and heel-strike gait types), all algorithms were implemented using data from two different foot marker configurations for both IC and TO (Table 2). All data was processed using MATLAB (v. 2019a, Mathworks Inc., Natick, MA, USA). Code related to this publication can be found on GitHub, project ID: 20884 (https://github.com/Roosje95/AGED_gait-event-detection). Event-detection-error was determined as the difference in timing of the IC and TO gait events obtained using each algorithm, compared against the 'gold standard' force plate measurement, using a vertical GRF threshold of 20 N. Additional GRF thresholds of 10 N, 15 N, and 2% of the maximum vertical GRF were also analysed in order to establish a possible effect on event timing. The best-performing algorithm for each group was determined on the basis of sensitivity and accuracy, explained below under statistical analysis. To examine efficacy for use in clinical settings, spatio-temporal parameters (stride time (ms), walking speed (m/s), stride length (mm), single limb support time (ms), and stride width (mm)) were calculated using the best-performing algorithm for each group.
Table 1. Participant description. Group A: the forefoot is in contact with the ground during IC; Group B: the entire sole or the side of the foot is in contact with the ground during IC; Group C: the heel is in contact with the ground during IC. IC: initial contact; SD: standard deviation; M: male, F: female, GMFCS: Gross Motor Function Classification System; CP: cerebral palsy; uni: unilateral, bi: bilateral, Neurological-Other: traumatic brain injury, incomplete spinal cord injury, infection, tumor; ITW: idiopathic toe-walking.
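As a rough illustration of a sagittal-velocity-based detector of the kind described above, the following sketch thresholds the resultant sagittal-plane speed of a foot marker. It is not the authors' MATLAB implementation: the function names, the 50 mm/s threshold, and the synthetic trajectory are assumptions chosen for illustration, and the input is assumed to be already low-pass filtered.

```python
import math

def sagittal_speed(ap_mm, vert_mm, fs):
    """Resultant sagittal-plane speed (mm/s) from central differences."""
    n = len(ap_mm)
    speed = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)   # one-sided at the ends
        dt = (hi - lo) / fs
        speed.append(math.hypot((ap_mm[hi] - ap_mm[lo]) / dt,
                                (vert_mm[hi] - vert_mm[lo]) / dt))
    return speed

def crossings(speed, threshold, direction):
    """Frames where speed crosses the threshold.
    'falling' (marker comes to rest) is an IC candidate,
    'rising' (marker starts moving) is a TO candidate."""
    frames = []
    for i in range(1, len(speed)):
        was_below, is_below = speed[i - 1] < threshold, speed[i] < threshold
        if direction == "falling" and not was_below and is_below:
            frames.append(i)
        elif direction == "rising" and was_below and not is_below:
            frames.append(i)
    return frames

# Synthetic marker: at rest, one swing between 0.5 s and 1.0 s, at rest again
fs = 150
ap, vert, x = [], [], 0.0
for n in range(int(1.5 * fs)):
    t = n / fs
    swing = math.sin(math.pi * (t - 0.5) / 0.5) ** 2 if 0.5 <= t <= 1.0 else 0.0
    x += 1500.0 * swing / fs        # forward speed peaks at 1500 mm/s (assumed)
    ap.append(x)
    vert.append(50.0 * swing)       # 50 mm foot clearance (assumed)

speed = sagittal_speed(ap, vert, fs)
to_frames = crossings(speed, threshold=50.0, direction="rising")
ic_frames = crossings(speed, threshold=50.0, direction="falling")
```

Tuning event detection to walking speed, as in the modified approach, would amount to passing a speed-dependent value for `threshold`; the scaling used in the study is not stated in this section, so none is assumed here. Switching the marker configuration (for example TOE instead of HEE) only changes which trajectories are passed in.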

Statistical analysis
The event-detection-errors obtained for each algorithm and group were tested for normality with a Kolmogorov-Smirnov test. Sensitivity was defined as the percentage of participants for which the absolute event-detection-error was below 33 ms [2,16]. Bland-Altman analyses [17,18] were performed to compare a) the timing of gait events obtained using the implemented algorithms against the 'gold standard' (force plate identification), and b) the spatio-temporal parameters obtained automatically and clinically (through force plate and visual identification). Accuracy was defined as the bias (mean difference between two methods of measurement) and 95 % limits of agreement (LoAs, bias ± 1.96 SD of the event-detection-error) resulting from the Bland-Altman analysis. Furthermore, to evaluate whether the determined parameters were clinically meaningful, the bias and LoAs of the spatio-temporal parameters were compared against the inter-trial variability (coefficient of variation, CV) determined over 6 CGA trials (supported by visual identification when the number of clean force plate hits was insufficient). Moreover, the linear association between the automatically and clinically identified spatio-temporal parameters was evaluated using the coefficient of determination (R² value, from the Bland-Altman analysis). The performance of automatic identification was considered valid when the bias and LoAs fell within the inter-trial variability and the R² value was >0.95.
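The accuracy and sensitivity measures defined above can be sketched as follows. The event times in the example are made-up values for illustration, not study data, and the function names are hypothetical.

```python
import statistics

def bland_altman(method_a, method_b):
    """Bias (mean difference) and 95 % limits of agreement (bias ± 1.96 SD)."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)   # sample SD of the differences
    return bias, (bias - half_width, bias + half_width)

def sensitivity(errors_ms, window_ms=33.0):
    """Percentage of cases whose absolute event-detection-error is below the window."""
    within = sum(1 for e in errors_ms if abs(e) < window_ms)
    return 100.0 * within / len(errors_ms)

# Illustrative IC times in ms for five participants (made-up numbers)
algorithm_ms = [512.0, 498.0, 530.0, 505.0, 560.0]
force_plate_ms = [500.0, 500.0, 520.0, 510.0, 520.0]

errors = [a - f for a, f in zip(algorithm_ms, force_plate_ms)]
bias, (loa_low, loa_high) = bland_altman(algorithm_ms, force_plate_ms)
sens = sensitivity(errors)
```

With these numbers the differences are 12, -2, 10, -5 and 40 ms, so the bias is 11 ms and four of the five errors fall inside the 33 ms window.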

Comparison of GRF thresholds
For group A, varying the force threshold from 10 N, 15 N, or 2% GRF to 20 N for estimating the timing of gait events using vertical GRFs led to a bias in event-detection errors of up to 4.8 ms with LoAs of 50 ms for estimating the timing of IC (Supplementary material S2). Smaller biases were found for groups B (bias: 0.3 ms, LoAs: 0.9 ms) and C (bias: 0.3 ms, LoAs: 0.5 ms). Similar trends were observed for TO (group A, bias: 4.9 ms, LoAs: 65 ms; group B, bias: 1.2 ms, LoAs: 1.9 ms; group C, bias: 0.9 ms, LoAs: 1.4 ms).
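How the choice of force threshold moves the reference events can be seen in a small sketch. The half-sine force curve, its 600 N peak, and the function name `force_plate_events` are assumptions for illustration; real vertical GRF curves rise and fall differently, which is why the effect differed between groups.

```python
import math

def force_plate_events(fz, fs, threshold_n=20.0):
    """IC = first sample above the threshold; TO = first later sample at or below it.
    Returns (ic_time_s, to_time_s) for the first complete stance phase."""
    ic = to = None
    for i in range(1, len(fz)):
        if ic is None and fz[i - 1] <= threshold_n < fz[i]:
            ic = i
        elif ic is not None and fz[i - 1] > threshold_n >= fz[i]:
            to = i
            break
    if ic is None or to is None:
        raise ValueError("no complete stance phase above the threshold")
    return ic / fs, to / fs

# Synthetic vertical GRF at 1500 Hz: stance from 0.3 s to 0.9 s, 600 N peak (assumed)
fs = 1500
fz = []
for n in range(int(1.5 * fs)):
    t = n / fs
    in_stance = 0.3 <= t <= 0.9
    fz.append(max(0.0, 600.0 * math.sin(math.pi * (t - 0.3) / 0.6)) if in_stance else 0.0)

thresholds = {"10 N": 10.0, "15 N": 15.0, "20 N": 20.0, "2% max": 0.02 * max(fz)}
events = {label: force_plate_events(fz, fs, thr) for label, thr in thresholds.items()}
# Shift of IC (ms) relative to the 20 N reference used in the study
ic_shift_ms = {label: 1000.0 * (ev[0] - events["20 N"][0]) for label, ev in events.items()}
```

A lower threshold detects IC earlier and TO later; the slower the force builds up at contact, the larger the shift between thresholds.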

Comparison of algorithms
While 5 algorithms were successfully implemented, two algorithms (Hreljac and Marshall, Hsue) [7,8] were excluded from the study due to high levels of false positives in identifying gait events automatically (S3).
The Kolmogorov-Smirnov test did not reject the hypothesis of normality for the event-detection-error values in any group except groups A and B when using the O'Connor algorithm, for which the Bland-Altman analysis was adapted.
For estimating the timing of TO, the modified Ghoussayni approach with HLX marker yielded the best results for all 3 groups (bias: 0.1-1.3 ms; LoAs: 15-22 ms; Fig. 2D). With regard to sensitivity, all event-detection-error values were within the accepted 33 ms threshold for groups A and C, while 96.7 % of the errors were within the threshold for group B.

Comparison of automatically and clinically determined spatiotemporal parameters
Results showed high agreement between automatic and clinical identification methods (R² >0.95) for most spatio-temporal parameters, except single limb support time (R² values: group A: 0.72; B: 0.88; C: 0.88). The biases and LoAs for stride time, walking speed, and stride length all fell within the inter-trial variability (Fig. 3). While the biases for single limb support time and stride width fell within the inter-trial variability, LoAs exceeded it for single limb support time for all conditions, as well as for stride width for group A and when all gait signatures were taken together (Fig. 3).

Discussion
While gait analysis has become commonplace for clinical assessment of cerebral palsy, the robust identification of gait events remains challenging, and the lack of standardization in this routine reduces reproducibility and possibly its ecological validity. The purpose of this study was therefore to determine and compare the accuracy of available (modified) automatic identification methods in subgroups with different gait patterns and evaluate the feasibility of incorporating such approaches into clinical settings. The accuracy and sensitivity of automatic identification procedures were shown to be dependent on a combination of algorithm and marker selection and yielded best results when optimized towards a specific subgroup. Moreover, the best-performing automatic identification approaches per subgroup were shown to be valid for the calculation of stride time, walking speed and stride length, and are therefore able to yield robust metrics for supporting clinical decision making for these spatio-temporal parameters.

Evaluating algorithm performance
Within this study, the evaluation of algorithm performance was done against the gold standard of force plate identification; this sets it apart from previous work, which was mainly based on visual event setting [2]. The sagittal-velocity-based approaches (Ghoussayni and modified Ghoussayni) outperformed the other algorithms (Zeni, Desailly, O'Connor; which only considered movement in a single direction) for estimating the timing of IC and TO, which is in line with previous findings and recommendations [2]. For estimating the timing of TO, tuning event detection as a function of walking speed improved the accuracy of identification for the sagittal velocity approaches, as did the use of the HLX instead of the TOE marker for all groups, which is also in line with recommendations from literature [2]. For estimating timing of TO, the modified Ghoussayni approach demonstrated the greatest accuracy across all groups, while for estimating timing of IC it was clearly favourable to optimize the algorithm set-up to the walking behaviour of specific subgroups. Here, the use of the TOE instead of the HEE marker improved the results for the toe- and flat-foot walking groups (A and B) when using the Ghoussayni approach, as would be expected based on the part of the foot used at IC. However, no improvements were observed for the other algorithms when adapting input marker configuration. This result is likely due to the consideration of local maxima in only one direction (horizontal position or vertical velocity), rather than Ghoussayni's 2D approach.

Impact of automatic approaches for gait event detection on spatiotemporal parameters
The best-performing algorithms per subgroup proved robust for calculation of stride time, walking speed and stride length when compared against the current clinical standard of visual identification. The inter-trial variability values observed in this study were similar to those reported in the literature for the spastic Cerebral Palsy population (7.6-14 % [19] and 3.4-9.7 % [20]) and were therefore considered appropriate for the intended use. As all observed bias values between automatically and clinically identified spatio-temporal parameters fell within the inter-trial variability (Fig. 3), the automatic algorithms yield robust metrics that could support clinical decisions. However, it is important to consider that, for this study, results for automatically and clinically determined spatio-temporal parameters were computed for one gait cycle per trial; the differences between the methods could change over multiple consecutive gait cycles. Additionally, because the LoAs for single limb support time and stride width exceeded the inter-trial variability, further evaluation of these parameters is recommended before implementing automatic identification within clinical settings.
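To make the link between event timing and spatio-temporal parameters concrete, the sketch below derives stride time, stride length and walking speed from consecutive ipsilateral IC frames, and computes a coefficient of variation. The definitions are common conventions assumed here rather than taken from the study, and the constant-speed marker trajectory is a deliberate simplification.

```python
import statistics

def stride_parameters(ic_frames, ap_position_mm, fs):
    """One entry per stride between consecutive ICs of the same foot."""
    strides = []
    for start, end in zip(ic_frames[:-1], ic_frames[1:]):
        time_ms = (end - start) / fs * 1000.0
        length_mm = abs(ap_position_mm[end] - ap_position_mm[start])
        strides.append({
            "stride_time_ms": time_ms,
            "stride_length_mm": length_mm,
            "walking_speed_m_s": length_mm / time_ms,   # mm/ms equals m/s
        })
    return strides

def cv_percent(values):
    """Coefficient of variation (%), used here as inter-trial variability."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Marker progressing at a constant 1000 mm/s, sampled at 150 Hz (simplified)
fs = 150
position = [1000.0 * n / fs for n in range(400)]

reference = stride_parameters([30, 180, 330], position, fs)
# Middle IC detected 3 frames (20 ms) late by a hypothetical algorithm
shifted = stride_parameters([30, 183, 330], position, fs)

stride_time_error_ms = shifted[0]["stride_time_ms"] - reference[0]["stride_time_ms"]
example_cv = cv_percent([1000.0, 1020.0, 980.0])
```

In this example a 20 ms event error lengthens one stride time by 20 ms and shortens the next by the same amount. Whether such an error matters is then judged against the variability between trials, as done in the study.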

Limitations
The clinical 'gold standard' for estimating the timing of gait events (force plate identification method) is dependent on the GRF threshold. To estimate the effect of different GRF thresholds on the timing of gait events, the 20 N threshold used in this study was compared against 3 other thresholds (10 N, 15 N, 2% GRF) that have been reported in literature (S2). This evaluation showed high agreement between the different force thresholds (R² value >0.99) and therefore we consider the effect of threshold choice on our results to be limited.
It is possible that the 33 ms window applied in previous studies is not an acceptable range for identifying gait events with the different kinematic algorithms, and that it could hence lead to a misinterpretation of spatio-temporal parameters. Use of this window in our study allowed comparison of our results to previous investigations [2]. In addition, outcomes from our validation analyses showed that biases for spatio-temporal parameters remained within the inter-trial variability.
Within this study, inter-trial variability was used to investigate if automatic identification would affect interpretation of spatio-temporal parameters. Inter-trial variability shows the difference in calculation of spatio-temporal parameters between trials of one individual during a measurement session. As the vast majority of clinical gait labs focus on average values for clinical decision making, the assumption was made that if the difference between visual identification and kinematic algorithm detection is smaller than the variation between trials, it should not affect the interpretation of average values. As no minimum clinically important differences are currently established for paediatric spatio-temporal parameters, inter-trial variability was seen as the best possible alternative. Calculating inter-trial variability requires more than one stride; therefore, in addition to force plate identification, visual event setting was used to gather the required reference values.

Recommendations for further research
Our results showed that event detection improved when input marker configurations used within the algorithms were adapted to the gait pattern of the participant. For example, as a large number of participants from group B initially contacted the ground with the side of the foot, it might be worthwhile considering the use of marker configurations at the proximal and distal ends of the fifth metatarsal (PMT5 and DMT5 [21]) for estimating the timing of IC.
While accounting for walking speed supported the tuning of individual thresholds for estimating the timing of TO, it did not improve our estimation of IC timing. Here, it would be worthwhile identifying parameters that could lead to optimization of individualized thresholds for estimating IC timing.
In a next step, the most promising method could be used to evaluate cases in which no force plate detection is available. In addition, the method could be used to investigate gait event detection during, e.g., treadmill walking, while wearing orthotics, or when ambulatory aids are used.

Conclusions
Our findings suggest that the sagittal velocity of the heel (Group C) or toe marker configurations (Group A and B) was the most reliable indicator of IC, while the sagittal velocity of the hallux marker configuration performed best for TO. Evaluation of the resultant spatio-temporal parameters showed that automatic event identification is capable of producing reliable metrics consistent with clinical interpretation.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.