Predictive Performance Models in Long-Distance Runners: A Narrative Review

Physiological variables such as maximal oxygen uptake (VO2max), velocity at maximal oxygen uptake (vVO2max), running economy (RE) and changes in lactate levels are considered the main factors determining performance in long-distance races. The aim of this review was to present the mathematical models available in the literature to estimate performance in the 5000 m, 10,000 m, half-marathon and marathon events. Eighty-eight articles were identified, selections were made based on the inclusion criteria and the full texts of the articles were obtained. The articles were reviewed and their variables were categorized as demographic, anthropometric, exercise physiology or field test variables, by athletic specialty. A total of 58 studies were included, from 1983 to the present, distributed in the following categories: 12 in the 5000 m, 13 in the 10,000 m, 12 in the half-marathon and 21 in the marathon. A total of 136 independent variables associated with performance in long-distance races were considered, 43.4% of which pertained to variables derived from the evaluation of aerobic metabolism, 26.5% to variables associated with training load and 20.6% to anthropometric variables, body composition and somatotype components. The variables most closely associated with performance in the prediction models for the half and full marathon were those obtained from laboratory tests (VO2max, vVO2max), training variables (training pace, training load) and anthropometric variables (fat mass, skinfolds). A large gap exists in predicting time in long-distance races based on field tests. Physiological effort assessments are almost exclusive to the shorter specialties (5000 m and 10,000 m). The predictor variables of the half-marathon are mainly anthropometric, but with moderate coefficients of determination. The variables of note in the marathon category are fundamentally those associated with training and those derived from physiological evaluation and anthropometric parameters.


Introduction
The popularity of long-distance running has increased at an unprecedented rate in the last 10 years. This has generated great interest among coaches and athletes in the development of performance prediction models based on linear regression equations, with the aim of helping athletes in their preparation for competitions. These predictions are based on a combination of physiological, anthropometric, nutritional and training factors (modifying frequency, volume and intensity), most of them obtained in exercise physiology laboratories or through variables related to training load [1,2].
Performance in long-distance disciplines can be defined as the final time or race time, and understanding it is important both for designing training programs and for determining scheduled training and race pace. However, accurate knowledge of it is frequently difficult to obtain, especially in long-distance races, as it would involve high training loads, which can at times reflect poor race planning in inexperienced runners, who normally use polarized training methods [3]. These and other factors associated with the control of training make predictive models recognized and useful tools for coaches and professional runners. The physiological adaptations produced by training in amateur runners are well understood and generally result from training performed at submaximal intensities with continuous training strategies [4]. In high-level athletes, these improvements are seen particularly with tempo runs and short-interval training as methods to improve performance [5]. Therefore, transferring the results and conclusions obtained from amateur athletes to high-level athletes is not advisable [6].
Performance in endurance running is influenced by a variety of factors, both anthropometric and training-related. Morphological (somatotype components) and anthropometric characteristics such as skinfolds, body fat percentage, circumferences, lower limb length, weight, height and body mass index appear to influence performance. Accordingly, certain characteristics are associated with a better relationship between energy expenditure and performance [7,8].
There are numerous studies on physiological factors in the literature on performance prediction in long-distance runners. Classically, maximal oxygen uptake (VO2max), running economy (RE) and anaerobic threshold (AT) stand out as the main variables that have been used to predict performance in long-distance races [9,10], but a large gap exists in the field of performance prediction based on field tests.
The aim of this narrative review was to undertake a descriptive, analytical and detailed analysis of the determinants and predictive ability of anthropometric, physiological (laboratory test), training and combined variables, as well as field assessments (field tests), to estimate performance in specialties of long-distance races (5000 m, 10,000 m, half-marathon and marathon).

Materials and Methods
This document is classified as a narrative review and was carried out under a framework of assigning key attributes based on Search, Appraisal, Synthesis and Analysis (SALSA) [11]. Accordingly, the search was exhaustive. The synthesis is a tabular exposition of the data and the analysis may be chronological, conceptual or thematic [11]. In general terms, this narrative review presents all the known published works that include runners of different levels (amateur, moderately trained, highly trained, high-level and elite), with the common denominator that they are generally trained, both in length of time and number of weekly sessions. Also included are all studies that found associations between anthropometric and physiological parameters and performance in the middle-distance (5000 m and 10,000 m) and long-distance (half-marathon and marathon) events.

Search
The abstracts of original English articles registered in the PubMed, SciELO (Scientific Electronic Library Online), ScienceDirect and SPORTDiscus databases were reviewed. The terms entered in the search engines were as follows: "runners", "long distance runners", "performance", "performance prediction", "anthropometric", "physiological determinants", "performance determinants", "5000 m", "10,000 m", "half-marathon" and "marathon", as well as combinations of them, depending on the specialty examined.

Selection Criteria
The selection criteria were all relevant articles, as well as books and monographs. The first evaluation consisted of reading the abstract and the full text of the selected studies, followed by an analysis of the results.

Exclusion Criteria
Case studies, duplicate articles and abstracts without clear and sufficient information were excluded.

Results
The flow chart (Figure 1) shows the final selection of 58 articles, with 12 articles identified for the 5000 m modality, 13 for the 10,000 m, 12 for the half-marathon and 21 for the marathon.

In Table 1, the variables are grouped as demographic, laboratory assessments, field test, training, anthropometric and others.

Demographic Variables
Of the seven demographic variables, the most notable is age, which is included in all the specialties studied. Gender is only recorded in the 5000 m specialty [12].

Aerobic Metabolism Assessment Variables
In this section, the variables were classified into two groups:
1. Maximum range (VO2max, velocity at maximal oxygen uptake [vVO2max], maximum heart rate, maximum lactate, vVO2 with the University of Montreal Track Test, anaerobic capacity and oxygen deficit, etc.).
2. Submaximal range (VO2 at lactate threshold, lactate threshold, velocity at lactate levels of 2.5-3 and 4 mmol/L, RE, heart rate at individual anaerobic threshold (IAT), velocity at heart rate deflection point, VO2 and % VO2 at AT, velocity at AT, lactate level at AT and % of peak velocity at AT).
Of particular note are vVO2max and VO2max, RE (understood as oxygen uptake at a specific velocity), VO2 at AT and velocity at the 4 mmol/L lactate level. Thirty-one of these studies include variables expressed in mL/kg/min among those that are associated with or are predictive factors of running performance from middle to long distance. Additionally, 24 studies include variables expressed in km/h, m/min or m/s, associated with conditions obtained at VT2 (anaerobic threshold), velocity at heart rate deflection, IAT, ATLab (AT in laboratory test), etc.

Training Variables
The training variables were grouped into two categories: quantitative (mean race duration, number of training sessions per week, miles per week, km per week, training volume, miles in 8 weeks, training in 9 weeks, years of training) and qualitative (training pace, record for 1 mile, 5 miles, 10 miles, half-marathon time and having finished a marathon).

Field Test Variables
Field test variables appear in only a few studies: AT measured using the University of Montreal Track Test [13] and distance covered in the Cooper test [14,15].

Anthropometric Variables
These variables are classified into three categories: (i) basic measurements (height, weight, body mass index, skinfolds and muscle circumferences), (ii) body composition fractions (fat mass, fat-free mass and skeletal muscle mass) and (iii) somatotype components (endomorphy, mesomorphy and ectomorphy). Particularly important performance-related variables are body mass index, fat mass percentage and skinfolds, the latter as regional indicators of adiposity. Fifteen of the 26 studies were conducted in the half-marathon specialty by Knechtle's research group [8,16,17].

Other Variables
Also noteworthy are the use of a biochemical variable (transferrin levels), a model based on data collected through a post-competition survey [14], and leg volume and heart rate changes during the recovery period of the Ruffier test [15].

Data Management and Presentation
Tables 2-5 are individual tables for each middle-distance (5000 m, 10,000 m) and long-distance specialty (half-marathon and marathon), respectively, and are structured to display: author, year of publication, sex, number of participants, athletic level, dependent variable and independent variable(s) associated with performance (correlation coefficient, p-value) or, if the independent variables comprise a significant model (equation), the coefficient of determination (R2), the standard error of the estimate (SEE), the limits of agreement of the Bland-Altman plot (half-marathon only) and the predictive equation. The tables present two types of study. Studies without a prediction equation provide the correlations between the independent variables and the dependent variable (correlation coefficient and p-value). Studies including a prediction equation are shown with the R2 value and the SEE. In Table 4 only, corresponding to the studies on the half-marathon, a further section is included with the information on bias between the predicted and the actual time, with the limits of agreement derived from the studies by Knechtle's group [8,18,19] and other authors [14,15,20,21]. Finally, the studies with a prediction equation are presented in a highlighted text box.
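As an illustration of the statistics reported in Tables 2-5, the following minimal Python sketch computes R2, the SEE and the Bland-Altman bias with its 95% limits of agreement from paired actual and predicted race times. The function name, the argument names and the convention of taking differences as predicted minus actual are assumptions made for illustration and are not taken from any of the reviewed studies. The SEE is computed here as the square root of the residual sum of squares divided by n - k - 1, where k is the number of predictors.

```python
import math


def model_fit_summary(actual, predicted, n_predictors):
    """Summarize how well predicted race times agree with actual race times.

    actual, predicted: sequences of race times in the same unit (e.g. minutes).
    n_predictors: number of independent variables (k) in the prediction equation.
    """
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if n <= n_predictors + 1:
        raise ValueError("not enough observations for this number of predictors")

    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)

    # Coefficient of determination and standard error of the estimate.
    r2 = 1.0 - ss_res / ss_tot
    see = math.sqrt(ss_res / (n - n_predictors - 1))

    # Bland-Altman statistics: mean difference (bias) and bias +/- 1.96 SD.
    diffs = [p - a for a, p in zip(actual, predicted)]
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    return {"r2": r2, "see": see, "bias": bias, "limits_of_agreement": limits}


# Hypothetical half-marathon times in minutes, for illustration only.
actual_min = [90.0, 100.0, 110.0, 120.0, 130.0]
predicted_min = [92.0, 99.0, 111.0, 118.0, 131.0]
print(model_fit_summary(actual_min, predicted_min, n_predictors=1))
```

Wide limits of agreement indicate that a model with an acceptable R2 can still miss an individual runner's time by a large margin, which is why both kinds of statistic are shown where available.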

Variables and Models Associated with the 5000 m Event
Search: The different keywords were combined as follows: "performance, performance prediction", "performance determinants", "anthropometric and physiological determinants", "5000 m", "5 km".
Appraisal: The subjects of the different studies were generally moderately trained or highly trained athletes of different athletic levels (amateur, collegiate, competitive, elite), except for the study by Stratton, which includes untrained individuals [22]. Of all the studies, only a few report coefficients of determination [13,23-27]. These ranged from 0.62 to 0.98, but none of the studies reported the standard error. Additionally, the study by Stratton includes an external validation in a subsample of subjects [22].
Synthesis: It should be noted that in all the studies, the variables most used for performance prediction are derived from determinations of aerobic metabolism. In one study the variable is the percentage of fat mass measured by anthropometry [28] and in another the fat-free mass [29]. Only one study presents the velocity at VO2max in the University of Montreal Track Test, a field variable, as a predictor variable [13].

Variables and Models Associated with the 10,000 m Event
Search: The different keywords were combined as follows: "performance, performance prediction," "anthropometric and physiological determinants," "performance determinants," "10,000 m," "10 km".
Appraisal: The subjects of the different studies were generally trained athletes of different levels (amateur, competitive, elite) with the exception of the studies by Brandon [34] and Berg [35], which included only moderately trained individuals.
Synthesis: In all the studies, the variables most used for prediction continue to be those derived from laboratory tests. Furthermore, the number of these variables increases compared to the 5000 m specialty. New variables include those from training data, such as number of training sessions, miles per week and years of training [7]. In addition, anthropometric variables such as skinfolds [36] and two somatotype components [35] are beginning to be included, although these equations have a low-moderate R2 (0.38-0.41).

Variables and Models Associated with the Half-Marathon Event
Search: The different keywords were combined as follows: "long distance runners," "performance, performance prediction," "anthropometric and physiological determinants," "performance determinants," "half-marathon".
Appraisal: The subjects of the different studies were generally at an amateur level and infrequently at a competitive level (Roecker et al., 1998) [28].
Synthesis: It should be noted that the half-marathon is not an official specialty of the Olympic Games or the World Championships, although there are national and international competitions in this event. Consequently, most of the individuals who practice this modality are amateur runners, with different training loads, ages and levels of experience. Multiple associations have been found between performance and anthropometric variables, but with models of moderate predictive power (R2 = 0.44-0.71) and with wide limits of agreement between the predicted time and the actual race time. Finally, two studies should be mentioned due to the high coefficient of determination (R2 = 0.84) and relatively narrow limits of agreement obtained using the distance covered in the Cooper test as a predictor variable [14,15]. This is a simple field test that can be introduced into training routines and can provide very useful information. The Cooper test has good accuracy and reliability in amateur long-distance runners [20].
Analysis: Table 4 presents 11 studies from 1985 to 2020 [8,14-16,28,47-50]. Of these 11 studies, nine were undertaken from 2011 onwards. In this section we should note the many contributions by Knechtle's group. Multiple publications by these authors base their results on the relationships between performance in half-marathon races and anthropometric variables such as skinfolds, estimated body composition variables such as fat mass and skeletal muscle mass, and training load variables such as average training velocity [8,48,50,51] (Table 4).
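To make concrete how a single field-test variable can yield a prediction equation of the kind summarized above, the following minimal Python sketch fits a simple least-squares line relating Cooper test distance to half-marathon time. All data values are synthetic and chosen only for illustration. The resulting coefficients are not those of any reviewed study, and the function and variable names are hypothetical.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic example: distance covered in the Cooper test (m) and
# half-marathon race time (min). These numbers are NOT real study data.
cooper_distance_m = [2400, 2600, 2800, 3000, 3200]
half_marathon_min = [125.0, 118.0, 109.0, 101.0, 95.0]

intercept, slope = fit_simple_regression(cooper_distance_m, half_marathon_min)

# Predicted time for a hypothetical runner covering 2900 m in the test.
predicted_min = intercept + slope * 2900
print(f"time (min) = {intercept:.2f} + ({slope:.4f}) * distance (m)")
print(f"predicted time for 2900 m: {predicted_min:.1f} min")
```

A negative slope is expected in such a model, since runners who cover more distance in the test tend to finish the race in less time.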

Variables and Models Associated with the Marathon Event
Search: The different keywords were combined as follows: "long distance runners," "performance, performance prediction," "anthropometric and physiological determinants," "performance determinants" and "marathon".
Appraisal: The subjects in the different studies are generally trained and/or highly trained and at different levels (amateur, competitive, elite), with the exception of the study by Hagan which includes novice runners [41].

Discussion
The main strength of this literature review is the considerable number of publications and the subsequent analysis of the variables that make up the prediction equations of each of the specialties. This analytical text invites the reader and the scholar to use the assessment methods available to evaluate athletic performance.
One of the difficulties we encountered in comparing the different equations is that there is no consensus on the definition of the type of athletes, with each author naming the type of subjects involved in their own way. Therefore, we recommend unifying and clearly defining each type of athlete and their level. We also found great differences in the number of athletes participating in the studies, ranging from eight subjects [24,36] to 427 including both men and women [28].
The dependent variables of the models found are diverse, as they are expressed as time in minutes, seconds or hours; speed in m/s, m/min or km/h; and, finally, race pace in s/km. As for the independent variables, those defining training load have been used, but we found no work quantifying either strength training [57] or high-intensity intervals [6,58] from which predictor variables could be extracted. The number of independent variables is typically two or three, with some equations having as many as six. A piece of data missing in almost all the studies is the variance inflation factor (VIF), which informs us of multicollinearity.
Some of the possible solutions to the problem of multicollinearity are the following: improvement in the sample design by extracting the maximum information from the observed variables, elimination of the variables suspected of causing multi-collinearity and, finally, in the case of having few observations, increasing the sample size [59].
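As a sketch of how the VIF could be reported alongside a prediction equation, the following Python example computes one VIF per predictor as the corresponding diagonal element of the inverse of the predictors' correlation matrix, which is equivalent to 1 / (1 - R2) from regressing each predictor on the others. The helper names and the example data are hypothetical, and the small Gauss-Jordan inversion is included only to keep the sketch free of external dependencies.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long predictor columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """Return one VIF per predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical predictors for four runners (illustrative values only).
weekly_km = [1.0, 2.0, 3.0, 4.0]
sessions_per_week = [1.0, 3.0, 2.0, 4.0]
print(variance_inflation_factors([weekly_km, sessions_per_week]))
```

A VIF of 1 means a predictor is uncorrelated with the others, and larger values indicate increasing redundancy among predictors.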
The identification of physiological variables for performance prediction has at least two important applications around sports training. The first is the evaluation of certain defining physiological characteristics related to the sports specialty and the second is associated with training (volume and intensity) in relation to the sports modality and especially with regard to metabolic and functional characteristics (capacity and power, aerobic and anaerobic).
The most widely studied variables for predicting aerobic performance in running are VO2max and vVO2max, both of which are fundamentally associated with the shorter distances such as the 5000 m and 10,000 m events [10,22,23,25,28,43]. This is likely because the intensities at which these races are executed are very close to maximal intensities and thus their close correlation. VO2max is the physiological variable that represents aerobic capacity, or in other words, the measurement of the maximum energy produced by aerobic metabolism per unit of time. Both vVO2max and VO2max would effectively be the same as they occur essentially at the same time [28,31,43,60,61].
The variables related to the submaximal level and the variable intensities that occur in these areas have been studied extensively in all specialties, except for the half-marathon [26,28,39,43,62]. This is related to the fact that the half-marathon has not been recognized in the international federative sphere and, therefore, there has been no interest in its study. In the half-marathon specialty, very few studies are available: one by Campbell in 1985 [47] and another by Roecker et al. [28]. Campbell found moderate-low correlations between some basic anthropometric parameters and running pulse rate and weeks of training. Roecker et al. [28] observed high correlations (r > 0.89) between individual anaerobic threshold and running velocity at an intensity of 4 mmol/L, both physiologically very similar concepts, and vVO2max. From 2011 onwards, the references come from Knechtle's group, which published many articles linking half-marathon times with numerous anthropometric variables, with low-moderate correlation coefficients [48], and prediction models with moderate coefficients of determination [19].
Many studies in the literature analyse performance prediction in aerobic specialties based on the physiological parameters mentioned above. However, these studies, using simple or multiple regression models, analyse the associations between physiological parameters and aerobic performance capacity in athletes for a single distance (frequently between 1500 m and 10,000 m) [27,61,63]. Based on the studies mentioned above, it has been proposed that race distance and, therefore, exercise intensity may influence the associations between physiological indicators and aerobic performance. Nonetheless, no studies have addressed aerobic performance capacity in the same athletes at different distances with two or more physiological indicators, particularly in studies with vVO2max and its respective time to exhaustion. As a result, it is not possible to draw the same conclusions for all sports specialties and at different athletic levels (amateur, highly trained, trained) [60]. The variables related to the quantity and quality of training are almost exclusive to studies undertaken in the marathon specialty and for different levels of training.
A contribution of this review is the general idea that the parameters recorded at the end of the graded exercise stress test are well understood, as are the parameters associated with aerobic and anaerobic thresholds, in terms of both metabolism and gas exchange, since in the different prediction models, variables range between 85% and 99% of the stress intensities. From our point of view, it is in this range of intensities that stronger associations should be sought, which would allow us to obtain more powerful models for predicting performance.
Similarly, in the field of ultramarathon races, which are becoming increasingly popular, variables related to RE, associated low lactate concentrations, percentage of VO2max and the search for models that integrate genetic aspects related to muscle damage and protein synthesis capacity should be explored, as well as how to more accurately determine and calculate training load both in terms of quantity and quality. In relation to genetic studies, about 160 polymorphisms in 27 genes were identified in 10,442 participants, of whom 2984 were marathon runners, although the variance in sports performance that they explain remains to be studied [64].

Practical Applications
The prediction of race time in the long-distance modalities has, above all, an initial application for novice runners, who have little knowledge of their race paces, allowing them to adjust to constant paces. Running paces can be modified depending on the phase of training. The knowledge of the variables associated with performance in long-distance runners should help coaches and exercise physiologists understand and promote the search for new variables that improve the prediction of sports performance.
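As a simple illustration of this application, the Python sketch below converts a predicted race time into the constant pace and speed a runner would need to hold, using the units in which the reviewed models express their dependent variables (s/km, km/h, m/s). The function name and the dictionary of events are hypothetical. The half-marathon and marathon distances are the standard 21.0975 km and 42.195 km.

```python
# Official race distances in kilometres for the four specialties reviewed.
RACE_DISTANCE_KM = {
    "5000 m": 5.0,
    "10,000 m": 10.0,
    "half-marathon": 21.0975,
    "marathon": 42.195,
}


def constant_pace(event, predicted_time_min):
    """Convert a predicted race time (minutes) into an even pace and speed."""
    if predicted_time_min <= 0:
        raise ValueError("predicted_time_min must be positive")
    distance_km = RACE_DISTANCE_KM[event]
    total_seconds = predicted_time_min * 60.0

    pace_s_per_km = total_seconds / distance_km
    speed_kmh = distance_km / (predicted_time_min / 60.0)
    speed_m_per_s = distance_km * 1000.0 / total_seconds

    # Human-readable pace rounded to the nearest second.
    minutes, seconds = divmod(round(pace_s_per_km), 60)
    return {
        "pace_s_per_km": pace_s_per_km,
        "speed_kmh": speed_kmh,
        "speed_m_per_s": speed_m_per_s,
        "pace_label": f"{minutes}:{seconds:02d} min/km",
    }


# Hypothetical predicted time of 50 minutes for the 10,000 m.
print(constant_pace("10,000 m", 50.0))
```

The same conversion can be repeated whenever the prediction is updated during a training phase, so that target paces follow the runner's current estimated performance.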

Future Research Directions
As future lines of research, we must consider aspects that are currently known as physiological events that occur at the aerobic threshold (VT1), at the anaerobic threshold (VT2) and at maximum intensities (VO2max). At the lactate threshold, normally below 50-60% of VO2max, we know the lactate values, the energy expenditure for the race and the RE. These same parameters are also well known at the anaerobic threshold, which could be estimated to be around 85% of VO2max. We have many parameters that associate sports performance with VO2max, such as running speed, individual anaerobic threshold, and lactate levels. In addition, we know the physiological responses when reaching 100% of VO2max. Up to this point we can see what the exercise physiology studies have been based on for performance. However, we believe that there is a gap in what occurs between the aforementioned points, with regard to studying these values (percentage VO2max, RE, lactate levels, etc.). Anaerobic capacities should also be further explored, particularly as related to the 5000 and 10,000 m events. Finally, we must not forget the quantification of training load and of the molecular and genetic aspects related to human performance (see Figure 2).
Int. J. Environ. Res. Public Health 2020, 17

Figure 2. Proposal for the study of long-distance runners.

Conclusions
Physiological stress assessments are almost exclusive to the shorter long-distance specialties (5000 m and 10,000 m). Half-marathon predictor variables are mainly anthropometric, with moderate coefficients of determination, together with physiological and field test variables with high R2 coefficients. The most relevant variables in the marathon modality are training variables, variables derived from the evaluation of aerobic metabolism and anthropometric parameters.
