Human movement, results from the complex interplay of the nervous, muscular, and skeletal systems, and exhibits unique outcomes due to the adaptability of those systems1,2. The interaction of intrinsic-organismic, environmental, and task constraints leads to individualized movement patterns3.

Running is a typical example of a fundamental human activity where all the above-described properties can be observed. Running is a demanding activity for each of the three-component structure that controls movement. It becomes even more challenging for the system mechanics when fatigue progressively accumulates. Fatigue and running are two terms that are inextricably linked. Fatigue as a phenomenon has attracted significant research interest over the years and is defined as the decline in various objective measures of performance over a discrete time period4.

During exhaustive exercise, central and peripheral fatigue contribute to performance decline5. Central fatigue relates to the reduced central nervous system capacity for optimal motor output, while peripheral fatigue involves factors at/or distal to the neuromuscular junction6,7,8. The interplay between fatigue mechanisms and physiological constraints have been reported to influence performance decline9.

Researchers have proposed various physiological indices to identify the onset of fatigue accumulation during exercise, including the ventilatory threshold, respiratory compensation point, heart rate deflection point, critical power, physical working capacity at the fatigue threshold, and electromyographic fatigue threshold. These indices are used to estimate physical exertion and mark the metabolic transition from aerobic to anaerobic energy production, where fatigue effects are more likely to be observed10,11. Fatigue significantly impacts musculoskeletal mechanics during running, affecting the mechanical behavior of the system. Adverse effects on neuromuscular function can lead to a reduction in mechanical energy transfer during the stretch–shortening cycle12 and a decrease in muscle reaction times13. Fatigue also influences trunk kinematics, with changes in trunk flexion and extension observed in recreational runners after a fatigue running protocol14. Exhaustive running makes the stance phase more variable, complicating athletes' efforts to maintain optimal angular displacements15,16. A recent study17 highlighted the importance of features like lateral trunk bending and maximum loading rate in predicting the fatigue state of recreational runners.

Since inter-individual variability is a well-established concept18, the accumulation of fatigue during running is very probable to cause a unique pattern in the strategy that every runner employs to compensate for the effects of fatigue. Bates et al.1,19 identified unique performance characteristics among five elite runners that were masked when a descriptive group approach was adopted. However, running biomechanics are mostly investigated using group-based analyses. While group-based analysis can provide meaningful insights about the differences between groups or/and extract conclusions about universal features that characterize human movement irrespective of the level of expertise20, the lack of homogeneity in human movement result in a substantial difficulty in deciphering individual patterns. Especially, using group model statistics, this challenge becomes even harder. On the contrary, subject-specific modeling can be useful in identifying emerging patterns and track changes within a single individual. With the substantial improvement of wearable Inertial Measurement Unit (IMU) sensors in quality, cost and ease of recording reliable data, coaches and runners use such systems to quantify fatigue, training load and running mechanics and explore movement patterns.21,22 By taking advantage of the ability these sensors provide to continuously record data (which can lead to an extensive amount of information), and the use of subject-specific modeling, we could obtain more specialized information about the movement changes, and in some cases, outperform group-based models.23,24.

Advanced data analysis techniques like supervised machine learning models have been proposed to model complex relationships between biomechanical measures and outcomes of interest25. Tree based methods like random forests (RF) have been successfully used in biomechanical research because of their robustness26,27. Nevertheless, such tree ensemble algorithms can be more difficult in interpretation despite their potentially satisfactory prediction performance. This drawback can be mediated using model interpretation techniques such as SHAP (SHapley Additive exPlanations) values which can be used for assigning feature importance and interactions among the examined features and thus provide insights about algorithm predictions28,29.

Therefore, the purpose of this investigation was to explore through machine learning models (classification) the subject-specific changes in biomechanics of running to exhaustion and map the importance of the measured features, using a physiological threshold as the cut-off criterion. The importance of individuality in the running pattern was supported by Hoitz et al.30 which found that kinematic features of running, derived from coronal and transverse plane were the most relevant in providing information on a runner’s unique movement pattern, whereas characteristics of the sagittal plane and ground reaction forces in vertical or anterior–posterior direction were mostly irrelevant.

This study was set to explore the hypothesis that during incremental running to exhaustion some features, which characterize running technique, will have a universal importance amongst the individuals, while others will contribute in a subject-specific way.

Results

The classification accuracy value range was from 0.853 to 0.962. Classification accuracy and other metrics obtained for each runner are displayed in Table 1. The SHAP based variable importance ranked all the features according to their value for the model, to predict the state above the VT2.

Table 1 Classification accuracy, Cohen’s kappa, sensitivity, specificity and F1 score for each runner.

In Fig. 1, the proportion of feature importance for each runner is presented. Τhe feature RTAPu appeared to be the most important for prediction in 6 out of 13 participants. Also, RFDmaxD appeared first in 3 out of 13 and RTLTu in 2 cases respectively. GRFpeak and Itotal were the most important features for 1 out of 13 of the sample participants.

Figure 1
figure 1

SHAP based variable importance for each participant.

Figure 2 displays the SHAP dependence plot which describes feature effects on the predictions. The y-axis represents the SHAP value and x-axis the raw feature values and so the plot shows the features’ overall influence on the model predictions and thus provides an intuitive way to understand the fitted model.

Figure 2
figure 2

SHAP dependence plots for the three representative participants with RTAPu (lateral trunk flexion/extension) as their most influential feature. SHAP dependence plots show the contribution of a feature to the model based on the feature’s distribution. In this plot each point shows an observation from the individual datasets, the X-axis line shows the value of the feature in that instance, and the Y-axis shows the SHAP value for that feature (RTAPu) that indicates the effect of that feature with that specific value on the prediction. The unit of X-axis is the same as the unit of the feature (for RTAPu—degrees), and the Y-axis is the SHAP value for predicting pre or post the VT2. Color coding has been used to show the velocity stages of the observations. SHAP values above the y = 0 lead prediction values towards the target class (after VT2) and the opposite is true for SHAP values less than y = 0.

Discussion

The aim of this study was to classify changes in subject-specific running to exhaustion patterns based on the VT2 using RF classifiers. The results of the current study supported the first part of the initial hypotheses and demonstrated that using an RF approach, robust and accurate classification could be achieved (Ahamed et al., 2019).

While each participant had a different mix of important predictor variables, RTAPu was the most or second most important feature in the majority of the participants. This is in line with previous findings from group-based analysis and also verifies the initial hypothesis of the study17. Winter et al.2 identified the critical role of trunk frontal plane kinematics (RTAPu) for balance control due to the trajectory of the center of mass which is medial to the base of support. Also, lateral flexion (RTAPu) of the upper trunk towards the supporting leg serves as an assisting mechanism to oppose the abduction torque mainly induced by GRF during running31. Therefore, regulation of this feature is important for an effective running pattern.

Nevertheless, although the majority of runners exhibit a behavior that was characterized from RTAPu there were also individuals who did not followed that pattern and had a totally different set of features describing their performance. For example, subjects 2 and 7 displayed a very specific behavior regarding feature importance, which at the same time differed from all other participants (Fig. 1). Other researchers that studied the walking and running gait patterns in healthy individuals also reported the presence of distinct patterns32. Putting fatigue in the equation, it is reasonable to expect that individuals will respond with a specific motor solution seeking a more energy-efficient movement pattern based on the constraints imposed from their anatomy, morphology, physiology and level of training. Apart from the energy efficiency hypothesis33, fatigue in this case could be the control variable, the scaling of which may be responsible for moving to a different attractor34 according to the dynamic systems theory perspective. This theory posits that the behavioral state (attractor) of an order parameter (running mechanics in the present case) is dependent on the scaling of a control parameter (duration or speed of running). Such transition phases are characterized by highly variable patterns35 of the order parameter.

Therefore, the expected motor response to fatigue could be different not only for the magnitude level of a specific feature, but also for all selected features that are triggered as a response to the imposed demands. The SHAP dependence plots for the most influential feature are also pointing out the specificity of the response, with some individuals exhibiting more consistent patterns, while others have more variance in their predictions. Specifically, it is observed that in all runners, for the first stages of the test, where velocity is at the low end of their respected continuum to exhaustion, predictions are very consistent with a negative prediction impact on the overall model. That is not the case as running velocities increase and especially in the transition phase (between 55—75% of the total test duration in the current study) where variability of prediction is evident. At this point it is crucial to report that increasing velocities without the presence of fatigue does not influence the variability of the response. This finding is valid up to velocities up to 100% of maximal aerobic velocity36 which corresponded to velocities up to ~ 5.5 m/s. The final velocities in our study ranged from 3.89 to 5 m/s. Therefore, the dispersion of prediction values, originating from the intermediate stages of velocity, points out the existence of a certain degree of variability in the movement response pattern during these stages, where fatigue apparently kicks-in. For example, in runners 6 and 8 (Fig. 2) there is a notable consistency in predictions with increasing velocity but this is not the case for runner 3 (Fig. 2) where significant variability is present.

Also, the vertical dispersion (when for about the same area of the feature’s distribution the impact on overall model may be very different) in the SHAP values that appears in some individuals for fixed feature values, may be related to an interaction effect with other features29. This highlights the fact that an instance of SHAP values for a feature is not solely dependent on the value of that feature, but is also influenced from the values of other features at that instance. In Fig. 2 it is observed that at the early stages of the test, where everybody runs in a comfortable velocity, there is limited vertical dispersion in the SHAP values, relatively to later stages of the test. Possibly this phenomenon suggests that the model can capture the compensations, or interconnections of available features, that occur while the body tries to adapt and maintain proper technique.

The importance of these two characteristics of the SHAP plots, as interpreted above, is that they account for feature interaction and the variability of prediction around the transition phase. This could provide a direction regarding what velocities are those in which training should take place to make the movement pattern more stable and mitigate the effects of fatigue at higher intensities while maintaining the proper technique15,37,38. As stated previously observing Fig. 2 can reveal that there are individuals where predictions are consistent and in line with the natural succession of velocities as going towards the end of the test and makes clear for example that runners 6 and 8 exhibit a stable pattern up to 3.61 m/s. Beyond those velocities significant variation and interactions are present which likely suggest alterations in the running biomechanical pattern.

A substantial body of the literature has examined mostly single and discrete variables (e.g. stride time, cadence) often in isolation from other features, resulting in conflicting outcomes about the effects of fatigue during running. For example, measuring contact time after fatigue in a study by Morin et al. (2011a) in ultramarathon runners who performed a 24-h treadmill test showed a decrease. However, it has to be clear that fatigue during an incremental to exhaustion test and ultra marathons is regulated by very different mechanisms. Moreover, there are reports of reductions in peak knee flexion angle during stance40, whereas others reported an increase41. The differentiation of the protocols in terms of test velocity (constant—variable) may also have contributed to the observed difference. Nevertheless, this type of univariate statistical approach may also be less sensitive in detecting changes in overall running patterns42. In contrast, the current study supports previous research which suggests that examining multivariate and/or multi-segment changes may better quantify the overall biomechanical changes that occur during running (Phinyomark et al., 2015). It should be noted that although data points from each step in the entire trial were recorded for building the subject-specific models, the sample was relatively small to account for the substantial population variations. Other variables that often have shown to be sensitive to fatigue such as flight time, contact time, vertical stiffness, etc., were not measured in this study and therefore this could impact the model output.

A critical point concerns the calculation of the RFD feature for which the time component is imperative. The validity of the treadmill force plate set up was checked for its structural stiffness over several loads with dead weights with the motor on and off and was found quite higher than leg stiffness during running45. Yet any amount of dampening or delay in the force transmission through the treadmill to the force plates may affect the accuracy of RFD value or even GRFpeak by smoothening them due to “filtering” of high frequencies. Nevertheless, dampening occurs when running on track, grass and practically in several natural running surface that runners tend to train. Another potential limitation could arise from the a) the timing variability of VT2 detection and b) the individuality of the physiological or psychological response. Since there is a certain degree of subjectivity in the identification process and interpretation criteria in the literature this could bring some uncertainty in the precise estimation of VT2, and also the perception of fatigue could differ due to individual physiological or psychological differences. While this method offers an interesting angle for exploration, we recognize the inherent difficulty in pinpointing a precise crossing point at which fatigue becomes conspicuous and emerges. The intricate and multifaceted nature of fatigue, along with its manifestation in kinematic patterns, contributes to this challenge. Nevertheless, the accuracy of the models supports the use of VT2 as a criterion for supervised machine learning approach. Future research should focus on examining alternative approaches for detecting the onset of fatigue accumulation and further understanding its impact on kinematic variations.

It should be reported that the present research setup and methodology points out associations between selected features but no cause-and-effect relationships. However, the SHAP values, can assist to gain a deeper understanding of the complex relationships between the features and the output, as well as the interactions among the features themselves. Although these associations are not entirely interpretable, they form an inspiring challenge to study the complexity of the system as a whole. For example, this can be pursuit following a dynamic systems perspective (how musculoskeletal and biological constraints interact with the task).

Materials and methods

Participants

Thirteen (13) male recreational runners (age = 37.84 ± 4.53 years, height = 178.15 ± 5.37 cm, weight = 78.85 ± 6.89 kg) voluntarily participated in the study. All runners were healthy and free of any neuromuscular or musculoskeletal disorders. All participants had to have at least three years of systematic training and racing at distances greater than 10 km. The range of performances across the participants in the study was between 49 and 57 min for 10 km races. The study protocol was approved by the Institutional Research Review Board (Aristotle University Research Ethics and Bioethics Committee: ΕΗ-12/2020) and was conducted in accordance with the Declaration of Helsinki. Written informed consents were obtained from all participants.

Protocol and instrumentation

Both biomechanical and physiological data were collected during an incremental running to exhaustion test on a treadmill (Impulse RT700, UK). All tests took place between 14.00 and 18.00 pm. Familiarization with the measuring equipment was pursued with participants a day before the trial when they came to the lab for an easy five-minute run, wearing all the measuring equipment without logging any data. On the testing day the participants were asked not to eat anything at least 3 h prior to testing. At this day the procedure included an eight-minute light warmup at a self-comfort velocity followed by 5 min of dynamic stretching. The main running task were initiated with participants running at a velocity equivalent to 85% of their 10 k tempo (most recent race pace) which was approximately between 2.5 and 3.61 m/s. All subjects performed 3-min stages with a steady increase on the workload of 0.28 m/s until they were unable to continue and voluntarily interrupted the test. To ensure that participants reached significant levels of exhaustion the following criteria were used: Respiratory Exchange Ratio ≥ 1.1, Heart Rate ≥  ± 10 beats × min−1 age predicted HRmax, and RPE > 17. Respiratory data (\(\dot{V}\)O2 and \(\dot{V}\)CO2) were recorded through a portable gas analyzer (PNOE, ENDO Medical, Palo Alto, CA). Ground reaction force data obtained from a dual force plate system (k-Delta, K-Invent, Biomechanique, Montpellier) with a sampling frequency of 516 Hz on which the treadmill was securely mounted. Also, torso kinematics were quantified using a pair of USB connected 6 DoF IMUs (k-sens, K-Invent, Biomechanique, Montpellier) with 218 Hz sampling frequency. From angular velocity data, the angular displacement was calculated through integration of angular velocity around the axis of the interest. Systematic drift of the gyroscopes that appeared during integration was removed using least squares regression methods. The minimum detectable step (resolution) for the IMU sensors is 4 mg/LSB (Least Significant Bit) for the accelerometer, and 0.06°/s for the gyroscope. The maximum detectable value for the accelerometer was ± 16 g and for gyroscope ± 2000°/s. The sensors were fixed on C7 and L5 respectively. Both systems were internally synchronized.

Data analysis

Pre-processing

Raw kinetic and kinematic signals were filtered with a second order Butterworth filter with a cutoff frequency at 30 and 15 Hz respectively. All the features that were extracted from the ground reaction force and the IMU data, represented discrete values from every step (Table 2). For each footfall, contact times were determined with backward and forward search of the point when the curve gradient was equal to zero and setting a threshold for the vertical force signal exceeded 40 N.

Table 2 Extracted features from the raw signals.

A moving average filter of eleven breath window was adopted to smooth ventilation data. For the second ventilatory threshold (VT2) identification, two experienced independent researchers were asked to examine \(\dot{V}\)E/\(\dot{V}\)O2\(\dot{V}\)E/\(\dot{V}\)CO2 curves and report the specific time point. In the literature, the VT2 which is closely related to anaerobic processes, is defined as an over proportional increase in \(\dot{V}\)E vs \(\dot{V}\)O2 output46. The average estimation was defined as the time point of data division in two conditions: before and after VT2. A twenty second data window before and after that point was removed for every individual. The datasets were searched for outliers according to a rule of ± 3 standard deviations. If a point were ± 3 standard deviations away from the local mean value (3’ stage), that point’s value was replaced with nearest non outlier point value. In any case in all of the thirteen datasets the percentage of outliers were less than 1%. Finally, thirteen n x m matrices were created, one for every participant with n representing rows (observations) and m columns (features) with n ranging between 2347 and 3577 among participants.

Subject specific models

Random forests (RF) were used for classification of the pre and post conditions. RF implementation was performed as originally described by Breiman47. The algorithm combines many decision trees grown in random subsets selected with bootstrap aggregation from the feature space. Also, each time a split is made a random sample of the available features are considered. Since predictions are produced from the majority vote (average prediction of individual trees) of the produced trees, the described procedure has the advantage to decorrelate the trees and provide good results in terms of accuracy and highlighting the importance of the supplied features. Model results validation was checked by randomly splitting the data into training (80%) and testing (20%) sets. The algorithm was trained with tenfold cross validation in the training set and tested the predictions in the remaining 20% of the data. This procedure was performed separately for every individual and the related performance metrics for each runner appear in Table 1.

In classification problems, large differences in proportions of the response (dependent) variable may have a significant negative impact on model fitting. For that reason, special consideration was taken regarding representative proportions of classes in the response variable during splitting the datasets because of the inter-individual variability in the time points where the VT2 appeared. If large disparity between classes was identified, subsampling with random under-sampling was adopted. Random under-sampling seeks to randomly select and remove instances from the majority class and so to provide a balanced proportion of the classes consisting the response.

RF is sensitive to parameters such as number of trees that will be grown and number of predictors to be considered. For this reason, grid search for optimal values of those parameters was adopted, using out-of-bag error minimization as criterion selection. For feature selection a recursive elimination process was adopted based on accuracy values. This iterative approach aimed to identify the subset of features that have the greatest impact on the accuracy of the model. It should also be noted that to ensure the method was entirely subject-specific, parameter tuning was conducted separately within each individual. Accuracy (the number of correctly predicted data points out of all the data points), sensitivity (the true positive recognition rate), specificity (the proportion of actual negatives, which got predicted as the negatives) and Cohen’s Kappa coefficient (a measure of how closely the instances classified by the machine learning classifier matched the data labeled as ground truth, controlling for the accuracy of a random classifier as measured by the expected accuracy) were used to assess model’s performance (Table 1). All models were built in Python 3.6 with scikit-learn 1.0.1. Finally, SHAP (SHapley Additive Explanations) with TreeSHAP implementation29 were used as proposed by Lundberg & Lee28 to interpret how the adopted RF models yielded their predictions. This methodological approach provides consistent and accurate attribution values for each feature within each prediction model29. SHAP can provide a visualization of the overall feature importance for all the individual models that were fitted to understand the different combinations of relative importance needed for each of the participant’s predictions. Also, SHAP dependence plots examined how a feature’s attributed importance changes as its value varies within its distribution range29. That is a model agnostic, unified approach for explaining the outcome of any machine learning model. SHAP values evaluate the importance of the output resulting from the inclusion of feature A for all combinations of features other than A. SHAP analysis was carried out using the SHAP python module48. Since in the current study a subject-specific design and analysis was adopted, we did not include any null hypothesis significance testing (NHST).

Conclusions

In the present study, each individual exhibited different changes in overall running mechanical parameters in predicting the post VT2 state. These findings also support the efficacy of machine learning modeling approach for understanding the complexities of running gait patterns based on collecting large amount of data in a laboratory setting. Overall, the results of this study showed that the production of work in a fatigued condition results in subject-specific changes in biomechanical running patterns, possibly in an effort to compensate and optimize the system function to account for the imposed demands based on individual movement characteristics or constraints.