Concurrent repeatability and reproducibility analyses of four marker placement protocols for the foot-ankle complex

Multi-segment models of the foot have been proposed in recent years to overcome the limitations imposed by the oversimplified traditional approaches used to describe foot kinematics, but they have been only partially validated and never compared. This paper presents a unique comparative assessment of the four most widely adopted foot kinematic models and aims to provide guidance for the clinical interpretation of their results. Sensitivity of the models to differences between treadmill and overground walking was tested in nine young healthy adults using a 1D paired t-test. Repeatability was assessed by investigating the joint kinematics obtained when the same operator placed the markers on thirteen young healthy adults on two occasions. Reproducibility was then assessed using data from three randomly selected participants, asking three operators to repeat the marker placement three times. The analyses were performed on sagittal kinematics using curve similarity and correlation indices (Linear Fit Method) and absolute differences between selected points. Differences between treadmill and overground gait were highlighted by all the investigated models. The two most repeatable and reproducible investigated models had average correlations higher than 0.70, with the lowest values (0.56) obtained for the midfoot. Averaged correlations were always higher than 0.74 for the former and 0.70 for the latter, with the lowest obtained for the midfoot (0.64 and 0.51). For all investigated models, foot kinematics generally showed low repeatability: normative bands must be adopted with caution when used for comparison with patient data.


Introduction
The observation of the foot-ankle complex is of clinical interest for various pathologies, including foot drop and deformities. Clinical decision-making might benefit from objective measurements of foot kinematics to isolate the causes of altered movements.
In gait analysis the foot is typically considered as a rigid segment linked to the tibia. This simplification, justifiable for some clinical applications, might be unsuitable for problems where the multi-segmental anatomy of the foot needs to be considered. In the past two decades several multi-segment models of the foot-ankle complex have been proposed and reviewed (Deschamps et al., 2011; Saraswat et al., 2012; Sawacha et al., 2009; Theologis and Stebbins, 2010). Nowadays, the most popular models used either for research or clinical applications are those illustrated by Leardini et al. (2007), Saraswat et al. (2012), Sawacha et al. (2009) and Stebbins et al. (2006). The major differences are in the number and definition of the segments to be tracked, as well as in the identification of the associated anatomical landmarks. The validation of these models is limited (Arnold et al., 2013; Caravaggi et al., 2011; Curtis et al., 2009; Deschamps et al., 2012a) and their clinical feasibility and utility have been previously questioned (Baker and Robb, 2006). Moreover, their repeatability (i.e. their precision when applied on the same, or similar, subjects by the same operator (JCGM, 2012)) and reproducibility (i.e. their precision when applied on the same, or similar, subjects by different operators (JCGM, 2012)) are still unclear (Deschamps et al., 2011).
This paper aims at quantifying the within- and between-subject repeatability and the between-operator reproducibility of the data obtained from the four mentioned models for overground and treadmill walking, and at assessing their ability to highlight changes imposed by these two walking conditions.

Subjects
Thirteen healthy subjects were recruited (ten males, age: 27.0 ± 1.9 years, height: 1.83 ± 0.08 m, foot length: 28.5 ± 1.0 cm). Exclusion criteria were self-reported musculoskeletal pain or impairments. Ethical approval was granted by the University of Sheffield. Prior to the data collection, all subjects read and signed a consent form. The sample size was calculated using a power analysis with significance α = 0.05 and power of 0.80, based on the data from the sagittal kinematics of the first two subjects.
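The paper does not detail the power calculation, so the following is only a minimal sketch of one common approach: a normal-approximation sample size for a two-sided paired comparison. The function name and the effect size passed to it are hypothetical and are not taken from the study.

```python
from math import ceil
from statistics import NormalDist


def paired_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate number of subjects for a two-sided paired comparison.

    effect_size is the standardised mean difference, i.e. the mean of the
    paired differences divided by their standard deviation. The normal
    approximation used here slightly underestimates the size obtained
    from the exact t-distribution for small samples.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    return ceil(((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect size, used only to illustrate the call.
n_subjects = paired_sample_size(effect_size=1.0)
```

In practice the effect size would be estimated from pilot data, which is the role played by the first two subjects in the text.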

Data collection and processing
Each subject was instrumented with the marker set obtained by merging those proposed by Stebbins et al. (2006) (M1), Leardini et al. (2007) (M2), Sawacha et al. (2009) (M3), Saraswat et al. (2012) (M4), and Davis et al. (1991) (Fig. 1 and supplementary material). This choice avoided the effect of the between-stride variability associated with placing the marker sets one at a time. The merged set of 39 markers was obtained respecting the anatomical landmark locations and the critical positioning alignments described in each paper: 4 on the pelvis, 2 on the thighs, 2 on the lateral femoral condyles; plus, on the right side, 6 markers on the shank, 7 on the hindfoot, 2 on the mid-foot, 12 on the forefoot, and 4 on the hallux. Spherical markers (diameter: 9.5 mm) were used for pelvis, thighs and shank segments, whereas hemispherical markers (diameter: 4 mm) were used for the foot.
Marker trajectories were collected with a 10-camera stereophotogrammetric system (T-160, Vicon Motion Systems Ltd, Oxford, UK, 100 Hz, Vicon Nexus 1.8.5). Aperture, focus and position of the cameras were set to ensure good visibility and precise and accurate tracking of the smaller 4 mm markers (Di Marco et al., 2016; Windolf et al., 2008).
Labelling, manual cycle-events detection (from the absolute vertical component of the heel marker, and the 3D position of the foot), gap filling, and filtering (Woltring spline routine, size 30 (Woltring, 1986)) were conducted within Nexus, and C3D files were then post-processed in MATLAB (R2015b, The MathWorks, Inc., Natick, MA, USA). The local coordinate systems for each segment were defined according to the corresponding model, selecting the pertaining markers, and used to compute joint kinematics consistently with the definitions given in each paper. M1 was implemented according to its most repeatable configuration (option 5 in Stebbins et al. (2006)), using static calibration and dynamic tracking of the hindfoot without considering the wand marker.
The following notations will be used to simplify the data reporting: hindfoot and calcaneus will both be indicated as HF, midfoot as MF, metatarsus and forefoot as FF, tibia and fibula as Tib, hallux as Hal, and finally, the foot modelled as a rigid segment as Foot. A left-side superscript will specify the model: e.g. the forefoot in M1 and the metatarsus in M2 will be noted as M1 FF and M2 FF, respectively. Fig. 2 summarises the flow of data collection and processing explained in the following sections.

Comparison between treadmill and overground walking
A treadmill (ADAL3D-F, TECMACHINE HEF Groupe, Andrézieux-Bouthéon, France) was used to collect more than one stride per trial. A comparison between treadmill and overground walking conditions made it possible to check whether the models were all sensitive enough to detect expected changes in the kinematic patterns, known to be different mainly due to the inherently different walking speeds (Alton et al., 1998; Sloot et al., 2014).
A trained operator placed the entire marker set on the thirteen subjects, who were asked to walk barefoot at a self-selected speed on both the treadmill and overground. The observed walking speeds were 0.82 ± 0.15 m/s and 0.99 ± 0.11 m/s, respectively. A total of five right strides were retained from each session for the analysis.
Data from four subjects among the thirteen recruited were discarded due to poor marker visibility in the overground trials. For the remaining subjects, the ability of the models to discriminate between treadmill and overground walking was tested with the 1D paired t-test (α = 0.05) (Pataky, 2012). This test is based on the statistical parametric mapping (SPM) theory (Friston et al., 2007), which is used to analyse statistical differences among continuous curves, without reducing the test to summary metrics (maximum or minimum values). The analysis was performed using the SPM1D open-source package for MATLAB (spm1d.org) and generated: a map of t-values (SPM{t}), the t* limit, and the areas where differences were found, with the relevant p-values.
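To make the idea of a map of t-values concrete, the sketch below computes the pointwise paired t statistic along the gait cycle in plain Python. It is not the SPM1D implementation: the t* limit and the p-values of the supra-threshold areas require random field theory inference, which is deliberately left out. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_curve(condition_a, condition_b):
    """Pointwise paired t statistic between two sets of curves.

    condition_a and condition_b hold one curve per subject, in the same
    subject order, each curve being a list of samples over the
    time-normalised gait cycle. Returns one t-value per sample.
    A sample where all paired differences are identical has zero
    standard deviation and would raise ZeroDivisionError.
    """
    n_subjects = len(condition_a)
    n_samples = len(condition_a[0])
    t_values = []
    for s in range(n_samples):
        diffs = [condition_a[i][s] - condition_b[i][s] for i in range(n_subjects)]
        standard_error = stdev(diffs) / sqrt(n_subjects)
        t_values.append(mean(diffs) / standard_error)
    return t_values


# Synthetic angles (degrees) for three subjects and three samples per curve.
treadmill = [[1.0, 2.0, 3.0], [2.0, 3.0, 5.0], [3.0, 5.0, 6.0]]
overground = [[0.0, 2.0, 1.0], [0.0, 2.0, 2.0], [1.0, 1.0, 2.0]]
t_curve = paired_t_curve(treadmill, overground)
```

The SPM approach then compares this whole curve against a single critical threshold that accounts for the smoothness of the data, rather than testing each sample independently.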

Within- and between-subjects analyses
Two sessions of data collection for the treadmill walking were carried out one month apart. In each session the same operator re-placed the markers on the same subject.

Waveform similarity
Waveform similarity was assessed both for overground and treadmill walking using the Linear Fit Method (LFM) (Iosa et al., 2014). This method was chosen rather than the Coefficient of Multiple Correlation (Kadaba et al., 1989), as the latter has been heavily questioned in the past (Ferrari et al., 2010; Røislien et al., 2012). The LFM yields three coefficients: a1 is the scaling factor between the compared curves and acts as the similarity index (the closer to 1, the more similar the curves); a0 measures the shift between the curves, quantifying the offset when a1 tends to 1; R² validates the linear relationship between the curves and measures their correlation (the closer to 1, the stronger the linear model).
For the within-subject analysis in treadmill walking, for each i-th subject the j-th kinematic curve at the k-th gait cycle was compared to the same kinematics averaged among the five strides and the two sessions of the i-th subject. As reported in Iosa et al. (2014), a1 and a0 tend to their ideal values (i.e., 1 and 0, respectively) when comparing n curves with their averaged pattern. In this case, to have a measure of the variations, it is relevant to report and observe the standard deviations of both a1 and a0.
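The three coefficients can be obtained from an ordinary least-squares fit of the curve under investigation against the reference curve. The following is a minimal sketch under that assumption; the function name and the example data are hypothetical.

```python
from statistics import mean


def linear_fit_method(reference, curve):
    """Least-squares fit curve = a0 + a1 * reference.

    Returns (a1, a0, r2): scaling factor, offset, and coefficient of
    determination of the linear model.
    """
    ref_mean, curve_mean = mean(reference), mean(curve)
    s_xy = sum((x - ref_mean) * (y - curve_mean) for x, y in zip(reference, curve))
    s_xx = sum((x - ref_mean) ** 2 for x in reference)
    s_yy = sum((y - curve_mean) ** 2 for y in curve)
    a1 = s_xy / s_xx
    a0 = curve_mean - a1 * ref_mean
    r2 = s_xy ** 2 / (s_xx * s_yy)
    return a1, a0, r2


# Hypothetical curves: the second is the first scaled by 2 and shifted by 3.
reference_curve = [0.0, 5.0, 10.0, 5.0, 0.0]
tested_curve = [3.0, 13.0, 23.0, 13.0, 3.0]
a1, a0, r2 = linear_fit_method(reference_curve, tested_curve)
```

In this example the scaling and the offset are recovered separately while the correlation stays perfect, which is why a0 is only interpretable as an offset when a1 is close to 1.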
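The reason why the averaged coefficients carry no information in this case can be shown in a few lines, assuming the least-squares definitions of the coefficients and a reference that is exactly the mean of the n compared curves. The symbols below are introduced only for this derivation: y_k(t) is the k-th curve, m(t) their average, and angle brackets denote the mean over time.

```latex
\begin{align*}
m(t) &= \frac{1}{n}\sum_{k=1}^{n} y_k(t), \qquad
a_{1,k} = \frac{\sum_t \bigl(m(t)-\langle m\rangle\bigr)\bigl(y_k(t)-\langle y_k\rangle\bigr)}
               {\sum_t \bigl(m(t)-\langle m\rangle\bigr)^2}, \qquad
a_{0,k} = \langle y_k\rangle - a_{1,k}\,\langle m\rangle \\[4pt]
\frac{1}{n}\sum_{k=1}^{n} a_{1,k}
  &= \frac{\sum_t \bigl(m(t)-\langle m\rangle\bigr)
           \Bigl(\tfrac{1}{n}\sum_k y_k(t)-\tfrac{1}{n}\sum_k\langle y_k\rangle\Bigr)}
          {\sum_t \bigl(m(t)-\langle m\rangle\bigr)^2}
   = \frac{\sum_t \bigl(m(t)-\langle m\rangle\bigr)^2}
          {\sum_t \bigl(m(t)-\langle m\rangle\bigr)^2} = 1 \\[4pt]
\frac{1}{n}\sum_{k=1}^{n} a_{0,k}
  &= \frac{1}{n}\sum_k \langle y_k\rangle - \langle m\rangle\,\frac{1}{n}\sum_k a_{1,k}
   = \langle m\rangle - \langle m\rangle = 0
\end{align*}
```

The means of the two coefficients are therefore fixed by construction, and only their spread across curves reflects the variability of the data.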

Repeatability
Models' repeatability was assessed considering the sagittal joint angles at Initial Contact (IC) and Toe-Off (TO) as summary metrics (Wilken et al., 2012). The Median Absolute Deviation (MAD) and the Maximum Difference (MD) were calculated. The former is a variability index reported to be robust to outliers; the latter measures the differences obtained in the worst case (Benedetti et al., 2013).
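As an illustration of the two summary indices, the sketch below uses the standard definition of the MAD (median of the absolute deviations from the median) and assumes that the MD is the difference between the largest and the smallest value; the latter is an assumption, since the exact formula is not given in the text. Function names and angle values are hypothetical.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def maximum_difference(values):
    """Worst-case difference, assumed here to be max minus min."""
    return max(values) - min(values)


# Hypothetical sagittal angles (degrees) at initial contact; 9.0 is an outlier.
angles_at_ic = [2.0, 3.5, 1.0, 2.5, 9.0]
mad_ic = median_absolute_deviation(angles_at_ic)
md_ic = maximum_difference(angles_at_ic)
```

The single outlier dominates the MD but barely affects the MAD, which is the complementary behaviour described above.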

Between-operators analysis
A subset of three male subjects (age: 25.7 ± 2.3 years, height: 1.84 ± 0.08 m, foot length: 28.7 ± 0.2 cm), randomly selected among those recruited, was considered. Three trained operators repeated the marker placement three times and measured the relevant anthropometric parameters. Subjects walked barefoot on the treadmill at a self-selected speed (walking speed: 0.97 ± 0.24 m/s). This condition is considered the most controlled and produces the least variation in the relevant joint kinematics. Five right strides were isolated for the analysis.

Waveform similarity
The agreement among the kinematic curves was tested using the LFM. For each subject, the i-th kinematic variable averaged among the five strides at the j-th repetition for the k-th operator was compared to the same kinematic variable averaged among the five strides, the three repetitions and the three operators.

Reproducibility
Median Absolute Deviations (MAD) and Maximum Differences (MD) were calculated on the sagittal kinematics at the Initial Contact (IC) and the Toe-Off (TO) to evaluate the between-operator reproducibility.

Comparison between treadmill and overground walking
Figs. 3 and 4 show the joint kinematics and the relevant mapping of t-values (SPM{t}) obtained from the 1D paired t-test over the two walking conditions. Despite corresponding joints being differently defined, the Knee and FF-MF (where relevant) obtained from the four models showed differences in the same part of the gait cycle. For the other kinematics some inconsistencies among the models were highlighted: for example, HF-Tib displayed differences between 40% and 50% of the gait cycle for M2 and M3 (p < 0.001), whereas M1 and M4 did not. These inconsistencies are not relevant for this study, which aimed to assess the models' ability to highlight the changes imposed by the two walking conditions (treadmill and overground walking).

Within- and between-subjects analyses
Tables 1 and 2 show the results of the within- and between-subject repeatability analyses for both the statistical similarities and absolute differences, for treadmill and overground walking, respectively. Results are complemented with the Range of Motion (ROM) values for the targeted joints.
For the treadmill walking (Table 1), the within-subject analysis yielded high averaged correlations (R²) among the curves for all the kinematics of the four models, with values ranging between 0.90 and 0.97. The only exceptions were M2 FF-MF (R² = 0.87) and M3 MF-HF (R² = 0.77). These were also the kinematics with the smallest range of motion: 9° and 5°, respectively. Averaged a1 was equal to 1 for all the kinematics and its standard deviation never exceeded 0.27: peaks for SD a1 occurred for M2 FF-MF (SD a1 = 0.22) and M3 MF-HF (SD a1 = 0.27). Interestingly, standard deviations of the offset a0 were comparable among M1, M2 and M3 (between 0° and 3°), whereas higher values were found for M4 (between 3° and 10°). Although less marked, a similar trend was detected by MAD and MD at both IC and TO.
Although only one session of data collection was performed for the overground walking, the comments given for the results shown in Table 1 (treadmill walking) are also valid for the results in Table 2 (overground walking). Interestingly, the kinematics M3 MF-HF showed the worst behaviour also in the between-subject analysis for this walking condition: a1 = 1.00 ± 0.37 and R² = 0.55 ± 0.23.

Between-operators analysis
Table 3 shows the treadmill between-operators reproducibility. Averaged correlations ranged from 0.85 to 0.98 for M1, from 0.87 to 0.98 for M2, from 0.72 to 0.98 for M3, and from 0.90 to 0.98 for M4. As for the within- and between-subjects analyses, M2 FF-MF (SD a1 = 0.24 and R² = 0.87) and M3 MF-HF (SD a1 = 0.29 and R² = 0.72) showed the highest SD a1, and their correlations were lower than those of the other joints. The SD a0 also confirmed what was observed in the previous analysis: the highest values were obtained for M1 and M4. Averaged MAD values at IC and TO were in the range 0-3° for

Table 1. Treadmill walking: Range Of Motion (ROM), Linear Fit Method (LFM) coefficients, Median Absolute Deviation (MAD) and Maximum Difference (MD) at Initial Contact (IC) and Toe-Off (TO) obtained from the within- and between-subject repeatability analyses. Segment names are abbreviated as follows: tibia (Tib), calcaneus and hindfoot (HF), midfoot (MF), metatarsus and forefoot (FF), hallux (Hal), and foot as rigid segment (Foot). M1 stands for the model illustrated by Stebbins et al. (2006), M2 for Leardini et al. (2007), M3 for Sawacha et al. (2009), and M4 for Saraswat et al. (2012).

Table 2. Overground walking: Range Of Motion (ROM), Linear Fit Method (LFM) coefficients, Median Absolute Deviation (MAD) and Maximum Difference (MD) at Initial Contact (IC) and Toe-Off (TO) obtained from the within- and between-subject repeatability analyses. Segment names are abbreviated as follows: tibia (Tib), calcaneus and hindfoot (HF), midfoot (MF), metatarsus and forefoot (FF), hallux (Hal), and foot as rigid segment (Foot). M1 stands for the model illustrated by Stebbins et al. (2006), M2 for Leardini et al. (2007), M3 for Sawacha et al. (2009), and M4 for Saraswat et al. (2012).

Discussion
This study evaluated the repeatability and reproducibility of four foot-ankle models used for gait analysis. Tests were conducted on healthy adults and, thus, the presented results cannot be compared with studies that include patients with pathologies causing foot deformities, for whom ad-hoc studies investigating within- and between-subject, and between-operator variability are recommended. Out-of-sagittal kinematics have not been analysed, since they have already been reported to be the least repeatable and reproducible (Ferrari et al., 2008; Kadaba et al., 1989), also for the four models here investigated (Caravaggi et al., 2011; Deschamps et al., 2012b; Leardini et al., 2007; Saraswat et al., 2012; Sawacha et al., 2009; Stebbins et al., 2006). While this choice could be regarded as a limitation, it is safe to assume that out-of-sagittal variables would be even less repeatable and reproducible than sagittal kinematics.
The obtained kinematics have been verified by comparing the ROM to those reported in the original articles for M1, M2, M3 and M4. A good match of the kinematics was observed, even though M1 and M4 were originally proposed for a paediatric population. In particular, the obtained ROM differed by at most 6° for M1 (M1 FF-Tib), 8° for M3 (M3 FF-MF), and 10° for M4 (M4 FF-HF). A comparison of the kinematics over the entire gait cycle was not possible for M2, since Leardini et al. (2007) reported only the stance phase. However, Deschamps et al. (2012b) provided the ROM of the relevant joints for M2, and the largest discrepancy from our results (10°) was obtained for M2 HF-Tib. These differences are most probably due to the present sample not being age-matched with those of the cited papers.

Comparison between treadmill and overground walking
The sensitivity of the four models to the changes imposed by walking overground or on a treadmill was investigated. This part of the study was designed to overcome some of the limitations of the most common analyses of joint angles estimated in these two conditions. Indeed, when testing statistical differences, not only time history correlations or point-by-point differences were considered, but also the intrinsic correlation between subsequent time-samples of the same variable (Deschamps et al., 2011; Pataky, 2012; Schwartz et al., 2004). The 1D paired t-test on the kinematics showed statistically significant differences between the two walking conditions (Figs. 3 and 4). These differences are likely to be ascribed to the different walking speeds, consistently with the literature (Alton et al., 1998). For the majority of the kinematics, the different definitions adopted for segments and joints did not allow a direct comparison of the differences observed in the various models. This, as highlighted in Figs. 3 and 4, led to some inconsistent statistical differences in the kinematics among models during the stance phase. However, the reported results showed an overall ability to distinguish between the two walking conditions. In conclusion, the four models are sensitive to the examined walking conditions.

Within- and between-subjects analyses
The within-subject analysis performed on the treadmill data (Table 1) provided information on the effects of the marker repositioning. Considering the standard deviation of a0 for each of the four models, it was evident that the kinematics obtained from M4 were the most affected by the marker repositioning. This is also confirmed by the MDs, and is most likely due to the lack of a neutral configuration definition for the joints, i.e. the alignment with the static posture. Although M1 does not require any reference posture to define the joint angles either, the relevant kinematics did not display the same large variability, although it was still larger than for M2 and M3. It is worth highlighting that referencing the kinematics to the static posture, as is done for M2 and M3, would lead to a loss of information on possible anatomical deformities. The within-subject results obtained for the overground walking (Table 2) are similar to those obtained for the treadmill walking. However, the overground results showed a smaller range of values for a0, strengthening the conclusion that marker repositioning affects mainly the outputs of M4 and M1. These results (Table 2) are more reasonably a measurement of between-stride variability. M2 FF-MF and M3 MF-HF were the angles that led to the worst similarity and correlation indices. These two variables showed a small range of motion, and a large magnitude of the soft tissue artefact could have concealed the actual information, reducing both a1 and R². Moreover, the midfoot segment (MF) is tracked by markers placed on very close landmarks in both models, and this could increase the variability of the midfoot-based kinematics.
Our results seem to contrast with those from Caravaggi et al. (2011), who found M2 Foot-Tib to be the most repeatable among the foot joints, which would call for higher values of a1 and R². This could be due to the two different methods used to quantify the repeatability: averaged standard deviation in Caravaggi et al. (2011), and LFM, complemented with MAD and MD, in the present study.
The between-subject repeatability analysis, performed both for overground and treadmill walking, highlighted some critical issues concerning the clinical meaningfulness of normative bands (Tables 1 and 2). Among all the kinematics, M2 FF-MF and M3 MF-HF appear to be the least reliable, in terms of both similarity and correlation. Incidentally, M2 FF-MF was already found to be the least reliable among the M2 kinematics by Deschamps et al. (2012b), who used a z-score analysis for this purpose (Schwartz et al., 2004).
Although Deschamps et al. (2012a) showed that for M2 the use of absolute angles did not have a critical impact on the variability of 3D rotations, the results of the present study indicate that the static posture subtraction might be crucial for foot kinematics. Indeed, M4 yielded larger normative bands than the other protocols, as shown in Figs. 3 and 4. M1 did not call for a posture subtraction either, but appeared to be more robust to the marker repositioning. Its MADs and MDs were generally higher than those obtained for M2 and M3, but lower than the values obtained for M4.
The models M2 and M3 are comparable in terms of absolute differences on the summary metrics for the within-subject analysis, whereas M1 and M4 led to slightly higher values for the MDs. This was true both for treadmill and overground walking, with the latter condition leading, as expected, to the highest values for MADs and MDs. The same trend was confirmed by the between-subject analysis.

Between-operators analysis
Reported LFM coefficients (Table 3), and particularly the parameter a0, showed that the effect of the marker repositioning on the same subject by the same operator (repeatability) is similar to that of the repositioning performed by different operators (reproducibility). This was also confirmed by MAD and MD values. Although a bias might be introduced by the different sample sizes considered for these two analyses, the equivalence of the two effects suggests that the variability of the foot motion is higher than any other source of variability. The presented results seem to contrast with those previously reported for M2 (Deschamps et al., 2012a), where between-operator reproducibility was assessed with the CMC, and was lower than the within- and the between-day repeatability for a sample of six subjects. As for the within-subject analysis, this is likely due to the different methodologies used to assess the curve similarities. Indeed, the CMC decreases considerably when a large offset occurs between curves, whereas R² does not.
Both between-operator similarity and correlation indices confirmed what was discussed for the within- and between-subject analyses: M2 MF-HF should be interpreted with caution, and M2 FF-MF and M3 MF-HF were the least reliable, having the lowest similarities and correlations. M1 and M4 were confirmed to be the models leading to the highest differences in terms of both MADs and MDs, with consequently larger normative bands.
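The different sensitivity of the two indices to an offset can be illustrated with a small numerical sketch. It assumes the commonly quoted single-session form of the CMC (one minus the ratio between the variance about the frame-by-frame mean and the variance about the grand mean, under a square root); the squared value is returned because that ratio can exceed 1 for large offsets, in which case the CMC is undefined. Function names and curves are hypothetical.

```python
from math import pi, sin
from statistics import mean


def cmc_squared(curves):
    """Squared CMC of a set of equally sampled curves (assumed formula)."""
    n_curves, n_frames = len(curves), len(curves[0])
    frame_means = [mean(c[f] for c in curves) for f in range(n_frames)]
    grand_mean = mean(frame_means)
    within = sum(
        (c[f] - frame_means[f]) ** 2 for c in curves for f in range(n_frames)
    ) / (n_frames * (n_curves - 1))
    total = sum(
        (value - grand_mean) ** 2 for c in curves for value in c
    ) / (n_curves * n_frames - 1)
    return 1 - within / total


def r_squared(x, y):
    """Coefficient of determination of the linear fit of y against x."""
    x_mean, y_mean = mean(x), mean(y)
    s_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    s_xx = sum((a - x_mean) ** 2 for a in x)
    s_yy = sum((b - y_mean) ** 2 for b in y)
    return s_xy ** 2 / (s_xx * s_yy)


# Synthetic curve over one cycle and a copy shifted by a constant offset.
base = [sin(2 * pi * f / 100) for f in range(101)]
shifted = [value + 0.5 for value in base]

cmc2_identical = cmc_squared([base, base])
cmc2_offset = cmc_squared([base, shifted])
r2_offset = r_squared(base, shifted)
```

With identical shapes, the offset alone lowers the CMC while R² remains at 1; in the LFM the offset is instead reported separately through a0.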

Conclusion
Concurrent within- and between-subject repeatability, and between-operator reproducibility analyses of the kinematics obtained using four foot models have been performed, together with an assessment of their ability to highlight changes imposed by treadmill and overground walking. All the models were able to distinguish between the two walking conditions and the models M2 (Leardini et al., 2007) and M3 (Sawacha et al., 2009) were the most repeatable and reproducible. Nevertheless, this study clearly showed that it is questionable to assume the foot kinematics to be repeatable and hence to rely on normative bands for the clinical assessment of patients.

Conflict of interests
The authors have no conflict of interests to report.