The accuracy of markerless motion capture combined with computer vision techniques for measuring running kinematics

Markerless motion capture based on low‐cost 2‐D video analysis in combination with computer vision techniques has the potential to provide accurate analysis of running technique in both a research and clinical setting. However, the accuracy of markerless motion capture for assessing running kinematics compared to a gold‐standard approach remains largely unexplored.


| INTRODUCTION
Running technique is considered an important risk factor for running injuries 1,2 and has also often been associated with running economy and running performance. 1,[3][4][5] For example, a larger knee flexion angle during the stance phase has been associated with a higher patellofemoral compression stress 6,7 and poorer running economy. 3 Although the technical components associated with injury risk and running economy are not always consistent across studies, both researchers and clinicians often assess running technique to identify components that may be modified with gait-retraining to reduce injury risk (e.g., 8 ) and/or improve performance. Three-dimensional (3-D) marker-based motion capture is often considered the "gold-standard" method to quantify running technique. However, this method requires expensive equipment and trained operators, and the marker placement can be time-consuming, all of which make this method unsuitable for most trainers and clinicians such as physiotherapists. 9 Trainers and physiotherapists therefore often use qualitative visual assessment or two-dimensional (2-D) video analysis to assess running technique, 10,11 with sagittal-plane hip, knee, and ankle angles typically being assessed because they are considered relevant to infer injury risk or the cause of injuries (e.g., 12 ). However, while 2-D video analyses have been shown to exhibit excellent validity for some outcomes, sagittal-plane knee and ankle kinematic outcomes have been shown to exhibit poor to moderate concurrent validity relative to 3-D motion capture during running. [13][14][15] Similarly, 2-D video analysis has been shown to exhibit highly varying reliability, ranging from poor to excellent. 13,15,16 Further, with 2-D analysis typically only a few frames are manually analyzed, yielding discrete outcomes rather than time-series.
Although approaches have been developed to track landmarks throughout a video sequence, these methods often only allow one point at a time to be tracked or still require marker placement. 14 Markerless motion capture based on low-cost 2-D video analysis in combination with computer vision techniques has the potential to overcome most of the limitations inherent to 3-D marker-based motion capture and manual 2-D video analysis. Computer vision techniques use various algorithms to automatically identify (anatomical) landmarks, which in turn can be used to compute segment and joint angles to objectively quantify running technique. There are numerous freely available approaches to (2-D or 3-D) markerless motion capture such as OpenPose, 17,18 XNect, 19 and DeepLabCut, 20,21 or commercial software such as Theia3D, 22 each with their own drawbacks and advantages (see also 23 for an overview). OpenPose is one of the most popular approaches to markerless motion capture because it is freely available, relatively easy to use, and allows use of a pre-trained model. 24,25 In contrast, DeepLabCut is an open-source deep learning approach that allows researchers to train an artificial neural network to identify the position of manually identified landmarks throughout a video sequence. 20,21 The ability to self-select landmarks is a major strength over other markerless motion capture methods that often track predetermined landmarks, which may reduce their accuracy and applicability (as in, e.g., OpenPose 17,18 or XNect 19 ). Further, DeepLabCut is also freely available and is embedded within a graphical user interface, which facilitates implementation by practitioners and researchers.
Despite the potential of OpenPose and DeepLabCut to facilitate larger-scale data collection in scientific studies, and to improve the reliability and validity of manual 2-D video analysis for practitioners, only a few studies have compared their accuracy for assessing kinematics in human movements to a gold-standard marker-based approach. Indeed, only one study has investigated the accuracy of OpenPose and DeepLabCut for assessing sagittal-plane running kinematics. 25 However, the researchers used DeepLabCut's pre-trained human pose model to identify landmarks, and this pre-trained model may not be as accurate for detecting landmarks as a model specifically trained for running. Indeed, the differences were largely systematic in nature, and the authors therefore argued that they likely reflected human error due to mislabeling of the training set. 25 Moreover, they used data from multiple cameras as input, as opposed to the single 2-D camera that may typically be available in an applied setting, and assessed the accuracy of both methods by comparing the estimated joint centers. Knowledge about the deviation in joint angles would, however, be more informative to both practitioners and researchers. This study therefore aimed to compare sagittal-plane hip, knee, and ankle running kinematics measured using a custom-trained DeepLabCut model and the widely used OpenPose model with a "gold-standard" marker-based motion capture system. We hypothesized that the custom-trained DeepLabCut model would show better agreement with the marker-based motion capture than OpenPose because the landmarks in DeepLabCut were labeled by an experienced researcher, whereas OpenPose relied on pre-trained landmarks that were not necessarily labeled by an experienced researcher.
Finally, as a secondary aim, we also assessed the generalizability of the custom-trained DeepLabCut model by investigating the model's performance in conditions where participants ran without retroreflective markers. We hypothesized that the model would partially learn to recognize the bright pixels of the retroreflective markers and that removal of these markers would therefore lead to lower agreement.

| General study design
This study used data collected as part of an intervention study that aimed to investigate the effects of real-time spatiotemporal feedback on running injuries and performance. As part of this intervention study, a subsample of 40 randomly selected participants reported to the laboratory prior to the intervention to assess their running technique and running economy. Briefly, the participants were first equipped with retroreflective markers. After subject calibration and a familiarization period, the participants completed short (4 min) runs while full-body kinematics and videos were collected. The video and kinematic data collected during these running trials were used in this study to compare markerless motion capture with marker-based motion capture for assessing running kinematics.
Both the larger intervention study and markerless motion capture procedures were approved by the local ethics committee (nr. NL72989.068.20, and FHML-REC/2021/076, respectively), and all participants signed informed consent prior to the measurements.

| Participants
Forty participants (21 male, 19 female, mean ± SD age 37.8 ± 11.5 years, body height 173.7 ± 7.9 cm; body mass 72.9 ± 11.7 kg) volunteered to participate in the study. All participants were required to be free of any moderate injuries for at least 3 months, or minor injuries for at least 1 month, had a body mass index (BMI) of <27.5, and were aged 18-65 years. The participants were also required to be running a maximum of 40 km per week at the start of the study.

| Instruments
The computer-assisted rehabilitation environment (CAREN, Motek, The Netherlands) system 26 combines an instrumented split-belt treadmill (belt length and width 2.15 × 0.5 m, 6.28-kW motor per belt, 60 Hz belt speed update frequency and 0-18 km h −1 speed range) with a 12-camera 3-D motion capture system (Bonita, VICON NEXUS v2.7, Oxford Metrics Group, Oxford, UK, 100 Hz) and was used to determine the marker-based kinematic outcomes during running at various speeds. Two-dimensional gray-scale videos were captured at 50 Hz using a camera (Basler scA640-74gm, 659 × 494 pixels, Germany) placed at the right side of the treadmill at a height of 0.91 m relative to the treadmill surface (Figure 1). The video data and motion capture data were both recorded using the D-flow software, 26 which also ensured time-synchronization between the data.

| Data collection procedures
After calibration of the systems, 46 retroreflective skin markers with a diameter of 14 mm were attached to the skin with double-sided tape using a modified full-body marker set (Human Body Model v2 26 ). In line with the International Society of Biomechanics (ISB) suggestions, each participant was asked to perform basic motion tasks of the hip and knee joint to determine the "functional" axis of rotation. No functional calibration of the ankle joint axis was performed, as this option was not embedded within the Human Body Model. After calibration, each participant performed an 8-minute familiarization trial 27 at their typical training speed. Thereafter, the participants ran at their comfortable running speed, 10% faster than this speed, and at fixed speeds of 2.78 and 3.33 m s −1 , with the order of the conditions being randomized. Marker-based and markerless motion capture data were collected simultaneously during all experimental trials (excluding the familiarization), but only the fixed speeds of 2.78 and 3.33 m s −1 were used for analysis in this paper, because the comfortable and 10% faster speeds were largely similar to the fixed speeds and we therefore expected highly similar and thus redundant results. Data from all speeds were, however, used to train the DeepLabCut model (see Section 2.6). This resulted in two trials per participant being used for the data analyses.

| Marker-based data processing
Three-dimensional running kinematics were determined in real time using a lower-body and trunk musculoskeletal model (Human Body Model v2, consisting of nine rigid body segments, 21 degrees of freedom, and 86 muscles) implemented in the D-flow software. Due to global optimization, the short computation time of the real-time inverse kinematics analysis results in negligible errors compared to unlimited computation time during walking. 26 A 2nd-order Butterworth filter with a low-pass cutoff of 20 Hz was used for the inverse kinematics. The output was further processed using custom-written algorithms in Matlab to extract the variables of interest. Briefly, the sagittal-plane hip, knee, and ankle angles were time-normalized from 0% to 100% of the gait cycle over a 60 s period of steady-speed running. Footstrike was identified when the vertical ground reaction force exceeded 20 N. If a measurement contained incorrect data due to the real-time processing (e.g., an implausible knee angle), the trial was first labeled in Vicon Nexus, and gaps were filled with a combination of spline, pattern, and rigid-body fills. The marker coordinates and ground reaction forces were then exported to the Gait Offline Analysis Tool 3.3 (Motek ForceLink B.V., Amsterdam) for offline inverse kinematics analyses using a 20 Hz 2nd-order Butterworth filter, before re-using the custom-made Matlab script for further processing. Biomechanical outcomes compared between the markerless and marker-based motion capture were chosen based on their potential relevance to running injury prevention and include the sagittal-plane hip, knee, and ankle angles (e.g., 12 ). Only the right leg was used for analysis since this leg was closest to the sagittal-view 2-D camera.
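As an illustration, the footstrike threshold detection and time-normalization steps described above can be sketched in a few lines of Python. This is a minimal sketch and not the authors' D-flow/Matlab pipeline; the function names, the synthetic ground reaction force trace, and the 101-point normalization grid are assumptions for demonstration only.

```python
import numpy as np

def detect_footstrikes(fz, threshold=20.0):
    """Indices where the vertical GRF rises above the threshold (footstrike)."""
    above = fz > threshold
    # rising edges: below threshold on sample i-1, above on sample i
    return np.flatnonzero(~above[:-1] & above[1:]) + 1

def time_normalize(signal, start, end, n_points=101):
    """Resample one gait cycle (samples start..end) to 0-100% of the cycle."""
    cycle = signal[start:end + 1]
    x_old = np.linspace(0.0, 100.0, num=len(cycle))
    x_new = np.linspace(0.0, 100.0, num=n_points)
    return np.interp(x_new, x_old, cycle)

# toy example: two simulated stance phases in a vertical GRF trace
fz = np.zeros(200)
fz[30:80] = 800.0    # first stance phase
fz[120:170] = 800.0  # second stance phase
strikes = detect_footstrikes(fz)

knee = np.sin(np.linspace(0, 2 * np.pi, 200))  # placeholder "knee angle" trace
cycle = time_normalize(knee, strikes[0], strikes[1])
```

On this toy trace, the two detected footstrikes delimit one gait cycle, which is then resampled onto the common 0-100% grid used for the group comparisons.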

| Markerless data processing
All video files were imported into DeepLabCut (version 2.2rc3) to train an artificial neural network for the sagittal view of the right leg. The neural network combines a residual neural network (ResNet-50) pre-trained on ImageNet with deep convolutional and deconvolutional layers to predict the "learned" locations of landmarks using feature detection. 20 The pre-trained nature of this network ensures that only a relatively small amount of training data is required for robust landmark identification. 23 Further technical details of the model training are described in. 20,28 Since we trained a new artificial neural network with DeepLabCut, we split the data into a training and a test group to assess the performance and generalizability of the model. This split allowed us to assess how well the model performed on new participants (i.e., the test group) that the model was not trained on, which gives some indication of the generalizability of the trained model. First, we assigned 30 participants to the training group and 10 participants to the test group (75%-25% split, similar to 20 ). Three out of four trials from each participant in the training group were randomly chosen to train the model, resulting in a total of 90 videos being used for training purposes.
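The participant-level split and per-participant trial selection can be sketched as follows. This is an illustrative Python sketch that only reproduces the reported counts (30/10 participants, 3 of 4 trials, 90 training videos); the seeded randomization, function names, and file-naming scheme are assumptions, as the paper does not report the exact procedure.

```python
import random

def split_participants(participant_ids, n_train=30, seed=0):
    """Randomly assign participants to a training and a test group."""
    rng = random.Random(seed)
    shuffled = participant_ids[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def pick_training_videos(train_ids, trials_per_participant=4, n_chosen=3, seed=0):
    """Randomly pick n_chosen of the trials of each training participant."""
    rng = random.Random(seed)
    videos = []
    for pid in train_ids:
        trials = rng.sample(range(1, trials_per_participant + 1), n_chosen)
        videos += [f"P{pid:02d}_trial{t}.avi" for t in sorted(trials)]
    return videos

train_ids, test_ids = split_participants(list(range(1, 41)))  # 40 participants
videos = pick_training_videos(train_ids)                      # 30 x 3 = 90 videos
```

Splitting at the participant level (rather than the trial level) is the key design choice here: it guarantees that no frames from test participants leak into training, so test performance reflects generalization to unseen runners.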
The model was trained using a personal computer (NVIDIA GeForce 3060 Ti GPU, AMD 5800X CPU) with the default settings. To this purpose, for each video we extracted 10 frames (out of 4740 for a typical 1-minute video) for labelling, using an automated feature in DeepLabCut that identifies distinct frames optimized for labelling. This resulted in 30 × 10 × 3 = 900 labeled video frames being used for model training. This number of images is substantially higher than the number required to reduce overfitting and improve the generalizability to a non-trained dataset. 20,28 One researcher manually labeled the following anatomical points of interest in the sagittal plane for model training: elbow, hip, knee, and ankle (approximate) joint centers, and mid-head, shoulder, mid-wrist, hand, anterior and posterior superior iliac spine, mid-pelvis, 5th metatarsal head, and heel (Figure 2, upper panel). We used the anatomical landmarks from the motion capture markers for labelling where possible. Note that the upper-body markers were not used for further analysis in this study, but are mentioned for completeness as they are visible in Figure 2, upper panel. To improve our estimate of the hip joint center, we first determined the hip joint center of rotation from the functional calibration using the SCoRE procedure in Vicon Nexus 29 for three randomly chosen subjects. We then visualized this hip joint center during the running trials in Vicon Nexus to guide labelling of the hip joint center of rotation. The head was labeled at the highest location where the ear attaches to the head, the shoulder was labeled at the acromioclavicular joint, the wrist was labeled at the center between the two anatomical wrist markers, and mid-pelvis was labeled as the midpoint between the anterior and posterior superior iliac spine markers (Figure 2, upper panel).
Occluded body parts were not labeled to ensure only accurate landmarks were used as input to the model as recommended previously. 23,30 Model training took approximately 15 h of computational time. The trained model was subsequently used to analyze the videos of the 10 participants in the test set.
The same videos used for DeepLabCut's test set were also imported into OpenPose (v.1.6.0), where we used the 25-point model for landmark detection (Figure 2, lower panel). The x-y positions of the labeled landmarks were exported from DeepLabCut and OpenPose for each participant and trial, and imported into Matlab for further analyses. We first filtered all coordinates using a 15 Hz 4th-order low-pass Butterworth filter to remove high-frequency noise. Then, we computed the sagittal-plane joint angles as the angles between the straight lines connecting the segment landmarks, using the atan function in Matlab. In some frames, the target landmark was occluded by the hand or arm, resulting in low confidence values for the landmark of interest, and sometimes also in inaccurately estimated positions. We therefore removed all datapoints with a confidence of <0.5 and filled the resulting gaps using the interp1 function (pchip type), which preserves the shape of the signal via piecewise cubic interpolation (see Figure S1). Markerless kinematics were time-normalized from 0% to 100% of the gait cycle using a kinematic event detection algorithm that identified heel strike as the lowest and most anterior position of the heel landmark. Gait cycles that were less than half or more than three times the median gait cycle duration were considered outliers and removed. We chose this kinematic event detection approach, rather than using the force plate events as in the marker-based approach, because force data would typically not be available in a clinical setting.
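The confidence gating, gap filling, and joint-angle computation above can be sketched as follows. This is a hedged Python illustration rather than the study's Matlab implementation: plain linear interpolation stands in for the shape-preserving pchip fill used in the paper, and the landmark coordinates and function names are invented for demonstration.

```python
import numpy as np

def fill_low_confidence(x, conf, threshold=0.5):
    """Drop landmark samples below the confidence threshold and bridge the
    gaps by interpolation (the study used shape-preserving pchip; plain
    linear interpolation is used here to keep the sketch dependency-light)."""
    x = np.asarray(x, dtype=float)
    good = np.asarray(conf) >= threshold
    idx = np.arange(len(x))
    return np.interp(idx, idx[good], x[good])

def sagittal_angle(proximal, joint, distal):
    """Sagittal-plane angle (degrees, normalized to [-180, 180]) between the
    two segments meeting at `joint`, from the orientation of each segment."""
    a1 = np.arctan2(proximal[1] - joint[1], proximal[0] - joint[0])
    a2 = np.arctan2(distal[1] - joint[1], distal[0] - joint[0])
    return (np.degrees(a2 - a1) + 180.0) % 360.0 - 180.0

# confidence gating: one occluded sample (conf 0.1) is bridged by its neighbors
x = [0.0, 1.0, 100.0, 3.0, 4.0]   # landmark x-coordinate with one outlier
conf = [0.9, 0.9, 0.1, 0.9, 0.9]  # per-frame landmark confidence values
filled = fill_low_confidence(x, conf)

# toy knee angle: hip above the knee, ankle below and behind it
hip, knee, ankle = (0.0, 1.0), (0.0, 0.0), (-0.5, -1.0)
angle = sagittal_angle(hip, knee, ankle)  # included thigh-shank angle
```

The same two helpers, applied per landmark and per frame, yield continuous hip, knee, and ankle angle time-series that can then be time-normalized per gait cycle.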

| Statistical analysis
The test set (n = 10) was used to compare the marker-based and markerless approaches at each speed using one-dimensional (1-D) statistical parametric mapping (SPM) implemented in Matlab. 31,32 Normal distribution was assessed using a normality check. Note that even though OpenPose did not require a separate test and training set, for consistency we used only the participants that were also in the DeepLabCut test set for the comparison between OpenPose and the marker-based approach. An SPM paired 1-D t-test with a Sidak-corrected alpha level of 0.05 (i.e., an effective alpha level of 0.017 for three joints per speed) was used for comparison. Since differences in ground contact identification between the marker-based and markerless approach could have introduced differences in the time-normalized data, we a priori also chose to perform a second analysis where the time-series were temporally aligned using the "xcorr" function in Matlab. This function determines the cross-correlation between the two time-series and shifts the markerless time-series to maximize the cross-correlation. After comparing the temporally aligned data, we noticed a systematic offset in some joints and therefore also compared the markerless approach to the marker-based approach with the offset removed (both with and without temporal alignment). The root-mean-square error (RMSE; in degrees) was used to quantify the magnitude of the error between the marker-based and markerless approach for all joints and corrections, and was also averaged over all time points and participants. The RMSE was interpreted in relation to previously reported RMSEs and the smallest detectable change for marker-based motion capture.
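The Sidak correction and the cross-correlation-based temporal alignment with offset removal can be sketched as follows. This Python sketch mirrors the logic of the Matlab "xcorr" alignment and mean-offset removal described above, but the function names, toy joint-angle traces, and lag search range are assumptions for demonstration.

```python
import numpy as np

def sidak_alpha(alpha=0.05, n_comparisons=3):
    """Sidak-corrected per-comparison alpha level."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)

def align_and_compare(reference, signal, max_lag=10):
    """Shift `signal` by the lag maximizing its correlation with `reference`
    (cf. Matlab's xcorr), remove the mean offset, and return the RMSE
    before and after these corrections. np.roll gives a circular shift,
    which is adequate for this periodic toy example."""
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        r = np.corrcoef(reference, np.roll(signal, lag))[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    aligned = np.roll(signal, best_lag)
    offset = np.mean(aligned - reference)
    corrected = aligned - offset
    rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))
    return best_lag, offset, rmse(reference, signal), rmse(reference, corrected)

# toy example: "markerless" trace = reference shifted by 3 samples plus a 5-degree offset
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
ref = 30 * np.sin(t)              # marker-based joint angle over one gait cycle
markerless = np.roll(ref, -3) + 5.0
lag, offset, rmse_raw, rmse_corr = align_and_compare(ref, markerless)
```

On this toy trace, the lag search recovers the simulated 3-sample shift and the 5-degree offset, so the RMSE after correction drops to essentially zero, which illustrates why the uncorrected and corrected comparisons were reported separately.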
To remove retroreflective markers as a nonconventional potential source of information for 2-D video analysis, we also assessed DeepLabCut's performance on four trials from two participants in the test set that ran without any retroreflective markers, and compared the markerless kinematics with the kinematics from the same participants at the same speed while running with markers.

FIGURE 2 Example images from one participant in the test set showing the identified landmarks with DeepLabCut (top) and OpenPose (bottom). A video of both models is also available from the OpenScience Framework at https://osf.io/34mka.

| RESULTS
DeepLabCut and OpenPose hip, knee, and ankle angles showed several clusters of significant differences with the marker-based approach prior to temporal alignment and offset removal (Figures S2-S5), but were typically not significantly different from the marker-based approach at 2.78 m s −1 after both offset removal and temporally aligning the time-series (Figure 3; Figure 4). Only for DeepLabCut, there was a significantly lower knee flexion angle during the mid-late stance phase (Figure 3). The effect of only offset removal or only temporal alignment is shown in Figures S6-S13. At 3.33 m s −1 , there remained two clusters of significant differences for DeepLabCut and one cluster for OpenPose (Figure S14; Figure S15).
The RMSE per joint averaged over all time points and all participants for DeepLabCut and OpenPose, with and without offset removal and temporal alignment, is reported in Table 1 for 2.78 m s −1 and in Table 2 for 3.33 m s −1 .
DeepLabCut produced similar kinematics for the hip when applied to videos from participants running with and without retroreflective markers, but did not identify the ankle and foot landmarks (Figure S16).

| DISCUSSION
The aim of this study was to compare sagittal-plane hip, knee, and ankle running kinematics measured using a custom-trained computer vision method (DeepLabCut) and an existing computer vision method (OpenPose) with a "gold-standard" marker-based motion capture system. Our findings show that both markerless methods showed significant and likely meaningful differences in comparison with the marker-based approach prior to temporal alignment and offset removal. After temporal alignment and offset removal, OpenPose did not show any significant differences at 2.78 m s −1 , while DeepLabCut showed a significantly smaller knee flexion angle during the mid-late stance phase in comparison with the gold-standard marker-based approach (Figure 3; Figure 4). At 3.33 m s −1 , both DeepLabCut and OpenPose showed a significantly smaller knee flexion angle during the mid-late stance phase, and DeepLabCut also predicted a significantly smaller ankle dorsiflexion angle during the stance phase.

FIGURE 3 Top: marker-based (blue) and markerless motion capture (DeepLabCut, red) sagittal-plane joint angles at 2.78 m s −1 for the hip (left column), knee (middle column), and ankle (right column). Middle: corresponding paired-samples t-test statistic from statistical parametric mapping in relation to the critical threshold (red dashed line). Bottom: root-squared error between marker-based and markerless (DeepLabCut) motion capture for each joint. Colored lines represent individual root-squared errors; the thick bold line represents the root-mean-squared error averaged over all participants per time point.
Our findings generally agree with previous research that has compared markerless and marker-based motion capture during other activities such as jumping, aqua-jogging, and walking. [33][34][35][36] Drazan and colleagues, 34 for example, found a very strong overall agreement between the pre-trained DeepLabCut model and marker-based motion capture during jumping, with a coefficient of multiple correlation of >0.991 and an overall RMSE of 3.2 degrees, which is lower than our overall RMSEs of 6.2 and 8.8 degrees for DeepLabCut and 5.7 and 5.9 degrees for OpenPose at 2.78 m s −1 and 3.33 m s −1 , respectively. The larger RMSE in our study likely reflects the higher movement velocity of running as compared to jumping, which may have introduced noise in the form of motion blur, thereby increasing the difficulty of accurate feature detection. 25 In support of this, previous research showed higher errors during running as compared to vertical jumping and walking. 25 Similar to our findings, Moro et al. 35 found no significant differences in sagittal-plane thigh, shank, and foot angles derived from a custom-trained DeepLabCut model and a marker-based approach during overground walking, indicating good agreement at a group level. In the only study to investigate running, Needham and colleagues 25 however showed errors in joint center location of 23-58 mm during running with different pose estimation methods, including DeepLabCut. Although these errors in joint center location are difficult to compare to our joint angles, a simulated anterior-posterior displacement of the knee marker by 10 mm leads to an 8-degree change in peak knee angle. 37 Since our DeepLabCut model showed an RMSE of 8-11.5 degrees for the knee angle, this suggests that the knee joint center was estimated more accurately than in the study by Needham and colleagues. 25 Needham and colleagues also showed OpenPose to exhibit a smaller difference in joint center location during running as compared to the pre-trained DeepLabCut model for all joints. The authors attributed this to the use of the oldest and lowest-scoring method for pose estimation in the pre-trained DeepLabCut model as compared to better methods in OpenPose, and to jumping of joint centers to individuals in the background of the video. In our study, OpenPose generally showed a slightly smaller RMSE (Tables 1 and 2) despite the absence of individuals in the background and the use of a custom-trained rather than pre-trained model. This suggests that the poorer performance of DeepLabCut as compared to OpenPose may primarily be related to the complexity and architecture of the neural network for detecting landmarks. Nevertheless, the custom-trained nature likely also reduced the error of DeepLabCut, as the difference between DeepLabCut and OpenPose in our study was substantially smaller (0.1-3.5 degrees) than the estimated difference in joint angles as derived from the joint center errors in Needham and colleagues 25 (13.6 degrees).
To infer the usefulness of both methods for kinematic analysis in research studies or clinical practice, it is important to contrast the observed differences at a group and individual level with other methods for kinematic analysis, and to interpret the observed error in relation to the minimum detectable change from marker-based motion capture. With regard to the former, the magnitude of the error for both DeepLabCut and OpenPose is slightly larger (DeepLabCut) or in line with/smaller (OpenPose) than the RMSE reported between different motion-capture techniques during running. Wouda and colleagues, 38 for example, reported an average RMSE of 5.2, 8.3, and 5.5 degrees for the hip, knee, and ankle at 3.33 m s −1 between two musculoskeletal models (OpenSim and the PlugInGait model). This is smaller than the RMSEs between DeepLabCut and marker-based motion capture at 3.33 m s −1 in our study, which were 7.4, 10.9, and 8.0 degrees, respectively (Table 2; Figure S14). Similarly, the RMSEs for DeepLabCut, both after temporal alignment and offset correction and without such corrections, are generally larger than the RMSEs between the OpenSim model and a suit (Xsens) equipped with wearable sensors (13.5, 7.2, and 3.8 degrees, respectively). 38 In contrast, the RMSE for OpenPose after temporal alignment and offset correction was in line with or smaller (5.3, 7.0, and 5.4 degrees for the hip, knee, and ankle at 3.33 m s −1 ) than the reported RMSEs between the two models or between the musculoskeletal model and wearable suit by Wouda and colleagues. This suggests that OpenPose may be useful to facilitate larger-scale and in-field data collection. In further support of this, the difference between OpenPose and the marker-based method is generally smaller than the difference reported between other 2-D video analysis approaches. [13][14][15] For example, the knee angle at toe-off as measured using a smartphone application (Coach's Eye) was 7 degrees smaller in comparison with the knee angle from marker-based motion capture, 13 which is larger than the 4.7-degree underestimation with OpenPose after temporal alignment and offset correction at 2.78 m s −1 in our study (1.8 degrees without corrections).
TABLE 1 Mean ± SD RMSE (°) per joint averaged over all time points and participants at 2.78 m s −1 .

The RMSE at peak joint angles between both DeepLabCut and OpenPose and the marker-based approach is, however, generally larger than the minimum detectable change reported for peak joint angles from marker-based motion capture (Figure 2; Figure 3), which is typically <5 degrees at speeds ranging from 2.5 to 3.4 m s −1 . [39][40][41][42] In contrast, the minimum detectable change for joint excursions of the hip, knee, and ankle ranges from 4.5-8.3, 4.3-6.0, and 7.5-9.8 degrees, respectively, 40,42 and is more in line with the mean RMSE values reported over the whole gait cycle for the temporally aligned and offset-corrected DeepLabCut and OpenPose values.
Overall, these findings suggest that after temporal alignment and offset correction, OpenPose (and to a lesser extent the custom-trained DeepLabCut model) may be used to speed up data collection procedures and facilitate in-field measurements. For OpenPose, the accuracy obtained with a single 2-D camera seems comparable to, or even better than, other approaches for kinematic analysis such as wearable suits and other 2-D analysis approaches, yet lower than would be obtained with 3-D marker-based motion capture. The additional speed of data collection may in turn facilitate larger-scale studies (e.g., 43 ). For example, marker placement in our laboratory typically takes between 15 and 30 min for the full-body marker set used in this study, and an additional 10-15 min is used for system and subject calibration. Since the markerless approach used in our study does not involve any marker placement nor any system or subject calibration, it may save 25-45 min of research time. The post-processing may also be faster since no marker labelling is required. Since sample sizes are typically rather small in biomechanics/gait research (i.e., roughly 30 subjects 44 ), markerless approaches may help to measure larger samples and thereby increase generalizability, although at the cost of reduced accuracy. Moreover, the markerless approach can be used with only one camera and can also be used outside of traditional laboratory settings (e.g., during competition).
At an individual level, the error in joint angles after temporal alignment and offset removal was still substantial for some individuals during certain parts of the gait cycle (Figure 3; Figure 4). This suggests that both the custom-trained DeepLabCut model and OpenPose should be used with caution for assessing the running technique (and hence potentially inferring injury risk/causes) of an individual. For example, a 1-2 degree difference in peak knee flexion angle has been observed between individuals with and without patellofemoral pain, 45,46 but such a difference may not necessarily be captured with the computer vision methods, as the RMSE at peak knee flexion was roughly 7 degrees for DeepLabCut (range 0-15 degrees; Figure 3) and 3.4 degrees for OpenPose (range 0.6-6.4 degrees). This high variability in accuracy suggests that the methods may be most suitable for larger-scale studies interested in group mean effects (as also evidenced by the generally good agreement of the mean joint angles) rather than analysis at an individual level.
An important consideration when using DeepLabCut or OpenPose for large-scale studies (or individual analyses) is that we had to temporally align the data and remove a systematic difference (i.e., offset) to obtain the best agreement. Such corrections may, however, not be available when the computer vision methods are used in a clinical setting, and they also impair real-time applications. The offset may have been introduced by the position of the camera relative to the participants: the camera field of view was centered around the pelvis of the participant (Figure 1; Figure 2), thereby creating an angle relative to the more distal joints, which in turn potentially introduces a slight systematic offset in the computed joint angles. This limitation is inherent to the use of a single 2-D camera, but may be overcome by combining the views of multiple 2-D cameras to approximate a 3-D view as done in other studies, 18,24,25,47,48 or by using 3-D information obtained from a depth camera (e.g., [49][50][51][52] ). Note, however, that the amplitude correction for the OpenPose ankle angle typically did not, or only minimally, impact the RMSE (Table 1; Table 2), suggesting the ankle joint did not show a systematic offset. The temporal difference is likely primarily introduced by the different methods used to determine initial contact. Specifically, for the marker-based method we used the (gold-standard) force plate, while we used a kinematic-based approach for the markerless method, as a force plate would typically not be available outside a research lab. The relatively low camera sampling frequency did, however, not allow us to use the most accurate kinematic-based event detection methods (e.g., 53 ). As such, the use of a camera with a higher sampling frequency could likely substantially improve the temporal agreement.
The video camera in our study was, however, originally included in the set-up with the sole purpose of providing clinicians an opportunity to visually inspect data. The resolution and frame rate were therefore not optimal for markerless motion capture in sports applications. Nevertheless, we temporally aligned the markerless and marker-based time-series in an attempt to explore what the agreement could be with a more accurate event detection method, as may be used with a better camera set-up. Overall, the limitations to an improved accuracy of both methods therefore seem primarily technical in nature (i.e., the number of cameras and the camera sampling frequency), which suggests potential for future use with a better camera set-up.
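As an illustration of a simple kinematic-based event detection approach, initial contacts can be estimated from the vertical position of a foot landmark as local minima separated by a minimum step time. This is a generic heuristic for illustration only, not the event-detection method used in the study, and its frame-level precision is directly bounded by the camera sampling frequency:

```python
import numpy as np

def initial_contacts(vertical_foot_pos, fs, min_step_time=0.25):
    """Estimate initial-contact frames as local minima of the vertical
    foot-landmark position, at least `min_step_time` seconds apart.
    Simple kinematic heuristic for illustration only."""
    y = np.asarray(vertical_foot_pos, dtype=float)
    min_gap = int(min_step_time * fs)  # minimum spacing in frames
    # Candidate minima: samples lower than both neighbors.
    cand = [i for i in range(1, len(y) - 1)
            if y[i] < y[i - 1] and y[i] <= y[i + 1]]
    events = []
    for i in cand:
        if not events or i - events[-1] >= min_gap:
            events.append(i)
    return events
```

At 50 Hz, each frame spans 20 ms, so even a perfectly detected minimum can be up to 10 ms from the true contact instant; a higher-frame-rate camera directly tightens this bound, consistent with the suggestion above.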
Further research is required on the generalizability, specifically of the DeepLabCut model, in different settings that may typically be encountered in clinical practice, for example a different camera type and distance/angle from the treadmill, different lighting, the effect of wearing more clothing, etc. Although previous research has shown that clothing does not meaningfully impact the accuracy of markerless motion capture, 54 pilot experiments in our laboratory indicate that different camera positions can substantially affect the accuracy of the custom-trained DeepLabCut model (but not the OpenPose model), highlighting that further training may be required to ensure robust functioning with different camera set-ups. An important related finding was that the DeepLabCut model worked equally well for the hip and knee landmarks on data collected without any markers (Figure S16), suggesting the model did not "learn" to identify retroreflective markers and can therefore truly be used without markers on the participants. However, the model was not able to identify the ankle or foot landmarks on the marker-free data and therefore might have relied on the presence of the markers to identify these landmarks. This difference between the ankle/foot and hip/knee landmarks is likely related to the availability of the markers in the training set: the ankle and foot markers were always visible in the labeled images of the training set, whereas the hip or knee markers were not always visible, for example due to occlusion by the arms. This finding is important as it suggests that future studies should ensure labeling includes video frames where markers are not present, and it indicates the need to assess the performance of the model on movements without markers.
Finally, both markerless approaches still use paid software (i.e., Matlab) for part of the data processing, and further steps are required to implement these processing steps in free software (e.g., Python) with no need for coding, to facilitate use by practitioners. Related to this, both practitioners and researchers could benefit from real-time analysis, which may be achieved using recently developed tools that use real-time pose estimation. 55

| Limitations
There are several limitations to this study. First, our gold-standard marker-based approach is prone to error, for example due to marker misplacement 56 and soft-tissue artifacts. Nevertheless, the same researcher, experienced with marker-based motion capture, placed all markers to minimize placement error, and soft-tissue artifacts were likely minimized by the inverse kinematics procedure (i.e., fitting a rigid-body musculoskeletal model with joint constraints to the marker positions and using the model's orientation to compute joint angles). Related to this, the markerless approach used direct kinematics (i.e., computing angles between straight lines connecting landmarks), rather than the inverse kinematics used in the marker-based approach. A direct kinematics approach is more sensitive to errors in the estimation of joint centers than an inverse kinematics approach, 57 and this may have increased the error of the joint angles in the markerless approach. Recently, fully automated workflows for inverse kinematics with markerless motion capture data have been developed, and these may prove useful for future applications. 47,58 Second, the accuracy of the markerless approach depends heavily on the accuracy of the manually identified landmarks used for supervised learning. Since only one researcher manually labeled all images, this could have introduced bias, as different researchers may place the landmarks at (slightly) different locations. However, we informally assessed the agreement between five different researchers in pilot experiments, and this produced highly similar results. Third, the resolution and sampling frequency of our cameras were relatively low (659 × 494 pixels and 50 Hz, respectively, as compared to, for example, 200 Hz in 25 ), which limited the ability to accurately identify landmarks, in particular for the toes and ankle, which often moved at high speeds and thereby caused motion blurring (see Figure 2 for an example of this effect).
Despite this, we did not observe a higher error at the ankle joint as compared to, for example, the hip joint, as would be expected from the higher segment speed. Finally, we only measured kinematics on the right side of the body, and this approach may therefore not be suitable for identifying between-leg imbalances in running technique.
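The direct-kinematics approach described above can be made concrete: a sagittal-plane joint angle is computed as the angle between the two segment vectors defined by adjacent 2-D landmarks. The following minimal sketch computes a knee flexion angle from hip, knee, and ankle landmark coordinates; the landmark names and the convention (0 degrees = fully extended knee) are illustrative, not necessarily those used in the study:

```python
import numpy as np

def sagittal_knee_angle(hip, knee, ankle):
    """Direct-kinematics knee flexion angle (degrees) from 2-D landmark
    coordinates: the angle between the thigh (hip->knee) and shank
    (knee->ankle) vectors. 0 deg = fully extended knee.
    Illustrative convention only."""
    thigh = np.asarray(knee, dtype=float) - np.asarray(hip, dtype=float)
    shank = np.asarray(ankle, dtype=float) - np.asarray(knee, dtype=float)
    cos_a = np.dot(thigh, shank) / (np.linalg.norm(thigh) * np.linalg.norm(shank))
    # Clip to guard against floating-point overshoot outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))
```

Because the angle depends directly on the three estimated landmark positions, an error in any single joint-center estimate propagates straight into the angle, which illustrates why direct kinematics is more sensitive to joint-center errors than a constrained inverse kinematics fit.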

| Perspective
Overall, our findings show that both markerless methods exhibit significant and likely meaningful differences in sagittal-plane joint angles during treadmill running in comparison with the marker-based approach prior to temporal alignment and offset removal. After temporal alignment and offset removal, OpenPose did not show any significant differences at 2.78 m s −1 , while DeepLabCut showed a significantly smaller knee flexion angle during the mid-late stance phase in comparison with the gold-standard marker-based approach (Figure 3; Figure 4). At 3.33 m s −1 , both DeepLabCut and OpenPose showed a significantly smaller knee flexion angle during the mid-late stance phase, and DeepLabCut also predicted a significantly smaller ankle dorsiflexion angle during the stance phase. The magnitude of the differences between OpenPose and the marker-based method was in line with, or smaller than, that reported between other methods and marker-based methods, suggesting some usefulness of this approach to facilitate large-scale and in-field data collection. Nevertheless, the error was higher than the minimum detectable change for marker-based motion capture. Combined with the considerable variation in accuracy between individuals, this suggests that OpenPose, and to a lesser extent a custom-trained DeepLabCut model, are most suitable for large-scale data collection and investigation of group effects rather than individual-level analyses.
Researchers and/or clinicians who would like to use the custom-trained DeepLabCut model can download it from the Open Science Framework (DOI 10.17605/OSF.IO/D6VPW). In this case, only a small amount of additional training will likely be required to improve the model's functioning in a new task (e.g., walking) and setting, due to transfer learning. 20,23,30 The OpenPose model is already freely available.