VO2max prediction based on submaximal cardiorespiratory relationships and body composition in male runners and cyclists: a population study

Background: Oxygen uptake (VO2) is one of the most important measures of fitness and a critical vital sign. Cardiopulmonary exercise testing (CPET) is a valuable method of assessing fitness in sport and clinical settings. There is a lack of large studies on athletic populations to predict VO2max using somatic or submaximal CPET variables. Thus, this study aimed to: (1) derive prediction models for maximal VO2 (VO2max) based on submaximal exercise variables at anaerobic threshold (AT) or respiratory compensation point (RCP), or on somatic variables only, and (2) internally validate the provided equations. Methods: Four thousand four hundred twenty-four male endurance athletes (EA) underwent maximal symptom-limited CPET on a treadmill (n=3330) or cycle ergometer (n=1094). The cohort was randomly divided into: variables selection (nrunners = 1998; ncyclist = 656), model building (nrunners = 666; ncyclist = 219), and validation (nrunners = 666; ncyclist = 219). Random forest was used to select the most significant variables. Models were derived and internally validated with multiple linear regression. Results: Runners were 36.24±8.45 years; BMI = 23.94 ± 2.43 kg·m−2; VO2max=53.81±6.67 mL·min−1·kg−1. Cyclists were 37.33±9.13 years; BMI = 24.34 ± 2.63 kg·m−2; VO2max=51.74±7.99 mL·min−1·kg−1. VO2 at AT and RCP were the most contributing variables to exercise equations. Body mass and body fat had the highest impact on the somatic equation. Model performance for VO2max based on variables at AT was R2=0.81, at RCP was R2=0.91, at AT and RCP was R2=0.91 and for somatic-only was R2=0.43. Conclusions: Derived prediction models were highly accurate and fairly replicable. Formulae allow for precise estimation of VO2max based on submaximal exercise performance or somatic variables. Presented models are applicable in sport and clinical settings. They are a valuable supplementary method for fitness practitioners to adjust individualised training recommendations.
Funding: No external funding was received for this work.


Introduction
Oxygen uptake (VO2) is considered an important metric in assessing cardiorespiratory fitness, health status, or endurance performance potential (Guazzi et al., 2012). With the application of standardised procedures and interpretation protocols during graded exercise tests (GXT), the maximal oxygen uptake (VO2max) can be established (Bentley et al., 2007). GXT is the most widely used assessment to examine the dynamic relationship between exercise and integrated physiological systems (Albouaini et al., 2007; Bentley et al., 2007). The information from GXT during cardiopulmonary exercise testing (CPET) can be applied across the spectrum of sport performance, occupational safety screening, research, and clinical diagnostics (Guazzi et al., 2017).
VO2max is often used as a boundary between severe and extreme intensity domains and by definition requires maximal effort from the tested subject (Gaesser and Poole, 1996). However, it is not always recommended or possible to undertake a test to exhaustion (Guazzi et al., 2012). For athletes, the proximity of competition or injury history can allow submaximal testing, but not testing to exhaustion (Sassi et al., 2006). Testing that requires maximal effort may be disruptive to the training process or interfere with race performance (Coutts et al., 2007; Lamberts et al., 2011). Due to practical constraints, tests to exhaustion or peak-power-output tests are often performed only two or three times a year (Coutts et al., 2007).
Nevertheless, VO2 values are widely used in sport science and in the decision-making process (Mann et al., 2013). VO2 is widely considered one of the major endurance performance determinants (Joyner and Coyle, 2008). Using VO2max to guide the selection process, prescribe training intensity, assess training adaptations, or predict race times is a common practice in high-performance sports (Bassett and Howley, 2000; Bentley et al., 2007; Hawley and Noakes, 1992; Noakes et al., 1990).
VO2max is also one of the critical vital signs. Because it reflects the coordinated function of the cardiovascular, respiratory, and muscular systems, it is an indicator of overall body health status (Kaminsky et al., 2017). Quantifying VO2max provides additional input regarding clinical decision-making, risk stratification, evaluation of therapy, and physical activity guidelines (Guazzi et al., 2012). For patients, undertaking a test to exhaustion is rarely needed or possible due to health restraints or cardiac risk (Guazzi et al., 2016).
For many years researchers have studied indirect methods of estimating VO2max (Sartor et al., 2013). Protocols such as the Astrand-Ryhming Test, Six-Minute Walk Test, or YMCA Step Test have been established and validated (Astrand and Ryhming, 1954; Beutner et al., 2015; Carey, 2022; Jalili et al., 2018). Moreover, estimation of the VO2 and heart rate (HR) values below the ventilatory threshold can be based on cardiorespiratory kinetics assessment using randomised changes in the work rate, known as pseudo-random binary sequence testing (Hoffmann et al., 2022). However, with the development of technology, the accessibility of laboratory testing and mobile testing has improved (Montoye et al., 2020; Pritchard et al., 2021). Therefore, new opportunities are emerging to develop more precise yet simple and accessible methods and models to assess VO2max (Jurov et al., 2023). This appears to be especially important considering the low prediction accuracy of most of the VO2max formulae that were validated in our previous study (Wiecha et al., 2023).
Recently, prediction methods based on machine learning (ML) and artificial intelligence (AI) have been developing (Ashfaq et al., 2022). Both ML and AI are used in sport science as forecasting and decision-making support tools (Abut and Akay, 2015; Bobowik and Wiszomirska, 2022; Chmait and Westerbeek, 2021; Hammes et al., 2022; Rossi et al., 2021). There is growing evidence that VO2max prediction based on ML models, especially support vector machines and artificial neural network models, exhibits more robust and accurate results compared to multiple linear regression (MLR) only (Abut and Akay, 2015; Ashfaq et al., 2022). Therefore, in this research, with the support of ML, we look for algorithms and prediction patterns that allow us to use values obtained during submaximal CPET and somatic measurements to estimate VO2max in male runners and cyclists. We hypothesise that prediction models allow for accurate calculation of VO2max based on somatic or submaximal CPET variables.

Materials and methods
We followed the TRIPOD guidelines for prediction model development and validation to conduct the study (see TRIPOD Checklist for Prediction Model Development and Validation) (Collins et al., 2015). The study is based on retrospective data analysis from the CPET registry collected from 2013 to 2021 at the medical clinic (Sportslab, Warsaw, Poland). All CPET were performed at the individual request of participants, as a part of regular training monitoring or performance assessment.

Ethical approval
The Institutional Review Board of the Bioethical Committee at the Medical University of Warsaw (AKBE/32/2021) approved the study protocol. The regulations of the Declaration of Helsinki were met during all parts of the study. Each study participant provided written consent to undergo CPET and participate in the study.

Derivation cohort
We selected the cohort using rigorous inclusion/exclusion criteria. Because the number of women in our database was insufficient to provide adequate power given the number of potential variables in the regression models, we limited the analysis to the male population (Martens and Logan, 2021). Out of 6439 healthy, adult male cyclists and long-distance runners who underwent CPET, 4424 met the following criteria: (1) age ≥18 years, (2) declared regular cycling or running training for ≥3 months, (3) no extreme outliers, that is, no values beyond ±3 standard deviations (SD) from the mean in any of the testing variables (beyond ±3 SD in VO2max), (4) lack of any injury, medical condition, or addiction in medical history that may affect exercise capacity, (5) not taking any medications with a modifying effect on exercise capacity, (6) maximum exertion achieved during CPET. We defined maximum exertion in CPET as the fulfilment of at least six of the following criteria: (1) respiratory exchange ratio (RER) ≥1.10, (2) present VO2 plateau (growth <100 mL·min−1 in VO2 despite increased running speed or cycling power), (3) respiratory frequency (fR) ≥45 breaths·min−1, (4) declared subjective exertion intensity during CPET ≥18 in the Borg scale (Borg, 1970), (5) blood lactate concentration [La−]b ≥8 mmol·L−1, (6) growth in speed/power ≥10% of respiratory compensation point (RCP) values after exceeding the RCP, (7) peak heart rate (HRpeak) ≤15 beats·min−1 below predicted maximal heart rate (HRmax) (Lach et al., 2021).
The participant selection procedure is shown in Figure 1.
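The maximal-exertion rule described above can be expressed as a simple count of fulfilled criteria. The following is a minimal sketch, not the registry's actual screening code: the class and field names are hypothetical, and the heart-rate criterion is read here as HRpeak lying no more than 15 beats·min−1 below predicted HRmax, which is an interpretation.

```python
from dataclasses import dataclass


@dataclass
class CpetSummary:
    """End-of-test values for one athlete (field names are illustrative)."""
    rer: float               # respiratory exchange ratio at peak
    vo2_plateau: bool        # VO2 growth <100 mL/min despite higher load
    breathing_freq: float    # fR, breaths/min
    borg_rating: int         # subjective exertion on the Borg scale
    lactate: float           # [La-]b, mmol/L
    peak_load: float         # final speed (km/h) or power (W)
    rcp_load: float          # speed or power at RCP, same unit as peak_load
    hr_peak: float           # beats/min
    hr_max_predicted: float  # beats/min


def criteria_met(t: CpetSummary) -> int:
    """Count how many of the seven maximal-exertion criteria are fulfilled."""
    checks = [
        t.rer >= 1.10,
        t.vo2_plateau,
        t.breathing_freq >= 45,
        t.borg_rating >= 18,
        t.lactate >= 8.0,
        t.peak_load >= 1.10 * t.rcp_load,       # >=10% above the RCP load
        t.hr_max_predicted - t.hr_peak <= 15,   # assumed reading of criterion 7
    ]
    return sum(checks)


def is_maximal(t: CpetSummary, minimum: int = 6) -> bool:
    """A test counts as maximal when at least `minimum` criteria are met."""
    return criteria_met(t) >= minimum


# Made-up example values, for illustration only.
athlete = CpetSummary(rer=1.15, vo2_plateau=True, breathing_freq=52,
                      borg_rating=19, lactate=9.4, peak_load=17.0,
                      rcp_load=15.0, hr_peak=182, hr_max_predicted=185)
print(criteria_met(athlete), is_maximal(athlete))
```

Keeping the count separate from the threshold makes it easy to inspect how many tests would pass under a stricter or looser minimum.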

Somatic measurements and CPET protocols
Body mass was measured with a body composition (BC) analyser (Tanita, MC 718, Japan) with the multifrequency of 5 kHz/50 kHz/250 kHz via the bioimpedance analysis and normal testing mode. The participants' skin was cleaned with alcohol before placing the electrodes on the skin. Prior to the test, the participants received instructions to refrain from exercising for 2 hr, consume a light meal rich in carbohydrates 2-3 hr beforehand, and maintain hydration by drinking isotonic beverages. Additionally, they were advised to abstain from medications, caffeine, and cigarettes on the day of the test. Running CPET (TE) was performed on a mechanical treadmill (h/p/Cosmos Quasar, Germany). Cycling CPET (CE) was performed on Cyclus-2 (RBM elektronik-automation GmbH, Leipzig, Germany). Hans Rudolph V2 mask (Hans Rudolph, Inc, Shawnee, KS, USA), breath-by-breath method with Cosmed Quark CPET gas exchange analysing device (Cosmed Srl, Rome, Italy), and Quark PFT Suite to Omnia 1.6 software were utilised. The gas analyser device was regularly calibrated with the reference gas (16% O2; 5% CO2) in accordance with the manufacturer's instructions (Airgas USA, LLC, Plumsteadville, PA, USA). From 2013 to 2021, three Cosmed Quark CPET units were used. HR was measured with the Cosmed torso belt (Cosmed srl, Rome, Italy). [La−]b was measured via enzymatic-amperometric electrochemical technique with Super GL2 analyser (Müller Gerätebau GmbH, Freital, Germany). The [La−]b analyser was regularly calibrated before each measurement series. The 40 m2 indoor, air-conditioned laboratory with 20-22°C temperature and 40-60% humidity, and 100 m ASL provided the same conditions for all BC and CPET.
Figure 1. Flowchart of the preliminary inclusion and exclusion process.
Abbreviations: EA, endurance athlete; CPET, cardiopulmonary exercise testing; SD, standard deviation; TE, treadmill; RER, respiratory exchange ratio; VO2, oxygen uptake (mL·min−1·kg−1); [La−]b, lactate concentration (mmol·L−1); fR, breathing frequency (breaths·min−1); RCP, respiratory compensation point; HRpeak, peak heart rate (beats·min−1); HRmax, maximal heart rate (bpm). At both stages of the selection, some participants met several (>1) exclusion criteria.
Each CPET began with a 5 min personalised warm-up (walk or easy jog at 'conversational' intensity for running, easy pedalling at 'conversational' intensity for cycling). Then, after the preparation (about 5 min), the continuous progressive step test was conducted. Due to the population diversity (training status), the initial running speed ranged from 7 to 12 km·hr−1 with a 1% treadmill incline. The choice of initial speed was determined by the interview and the sports results achieved. For example, those running 10 km in more than 60 min started the test at 7 km·hr−1, while those running 10 km in less than 35 min started the test at 12 km·hr−1. The pace increased by 1 km·hr−1 every 2 min with no change in incline. The cycling test began at 60-150 W, depending on the athlete's training status. The power increased by 20-30 W every 2 min. It was recommended to maintain a constant cadence of 80-90 revolutions·min−1 during the test. The tests were terminated due to exhaustion: volitional inability to continue the activity and/or VO2 and HR plateau with increasing load and/or observed disturbance of coordination in running and/or inability to maintain the set cadence. Due to the graded protocol used, the cycling power and running speed values were calculated as a function of time to better reflect the actual level for the test moment being determined (Kuipers et al., 1985). Before the test, after every step, and 3 min after the termination of the effort, a technician took a 20 µL blood sample from a fingertip. Samples were collected during the test without interrupting the effort. The samples were taken from the initial puncture. The first blood drop was collected into the swab and the second blood drop was drawn into the capillary for further analysis. VO2max was recorded as the highest value (15 s intervals) before the termination of the test.
HRmax was recorded as the highest value obtained at the end of the test, without averaging.
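The time-based calculation of speed and power mentioned above can be illustrated with a linear interpolation within an unfinished step. This is a minimal sketch under stated assumptions: the text only says that the values were calculated as a function of time, so the proportional formula below (last completed load plus the elapsed fraction of the next increment) is an assumed, commonly used form, and the function name is hypothetical.

```python
def time_adjusted_load(last_completed: float,
                       increment: float,
                       seconds_in_step: float,
                       step_duration: float = 120.0) -> float:
    """Load (km/h or W) credited at a given moment of a step test.

    last_completed   load of the last fully completed step
    increment        load added per step (e.g. 1 km/h, or 20-30 W)
    seconds_in_step  time spent in the unfinished step
    step_duration    step length in seconds (2 min steps in this protocol)
    """
    if not 0.0 <= seconds_in_step <= step_duration:
        raise ValueError("seconds_in_step must lie within one step")
    return last_completed + increment * seconds_in_step / step_duration


# Runner: finished the 15 km/h step, stopped 60 s into the 16 km/h step.
print(time_adjusted_load(15.0, 1.0, 60.0))     # 15.5
# Cyclist: finished 300 W, stopped 40 s into a step with a 30 W increment.
print(time_adjusted_load(300.0, 30.0, 40.0))   # 310.0
```

The same function can be applied to the moments at which AT or RCP were identified, not only to the end of the test.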
The anaerobic threshold (AT) was established with the following criteria: (1) common start of VE/VO2 and VE/VCO2 curves, (2) end-tidal partial pressure of oxygen raised constantly with the end-tidal partial pressure of carbon dioxide (Beaver et al., 1986). The RCP was established with the following criteria: (1) PetCO2 must decrease after reaching maximal amount, (2) the presence of fast nonlinear growth in VE (second deflection), (3) the VE/VCO2 ratio achieved minimum and started to rise, and (4) a nonlinear increase in VCO2 versus VO2 (lack of linearity) (Beaver et al., 1986). The [La−]b was estimated for AT and RCP in relation to power or speed (Wiecha et al., 2022).

Data analysis
Our comprehensive ML approach enables the evaluation of each formula by the precision of the preliminary variables (at the selection stage), then by accuracy (during model building) and recall (in internal validation).
Individual CPET results were saved into an Excel file (Microsoft Corporation, Redmond, WA, USA) and a custom-made Python script was used to generate the database in Excel. Further, mean, SD, and 95% confidence intervals (CI) were calculated. The normality of the distribution of the data was examined using the Shapiro-Wilk test and intergroup differences were calculated using the Student's t-test for independent variables. Three-step variable selection procedures based on random forests were applied using the R package VSURF in RStudio software (R Core Team, Vienna, Austria; version 3.6.4) (Genuer et al., 2016). For each level of measurement (AT, RCP) and their combination (AT+RCP), significant variables were identified separately. The first step was dedicated to eliminating irrelevant variables from the dataset. The second step aimed to select all variables related to the response for interpretation purposes. The third step refined the selection by eliminating redundancy in the set of variables selected by the second step, for prediction purposes (Genuer et al., 2017). Each time for variables selection, the anthropometric variables as in Tables 1-2 and the CPET parameters given in Tables 3-4 from a specific level of measurement (AT; RCP) and their combinations were visible. After selection, only the selected parameters were put into multiple linear regression (MLR) modelling. The data for MLR model building were randomly distributed into sets, that is derivation, testing, and validation, representing 60%, 20%, and 20% of the cases, respectively. Only significant predictors (with p<0.05) were included in the final models. Derived equations are characterised by the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE). Bland-Altman plots analysis was used to establish the model's precision and accuracy during validation (Altman and Bland, 1983). Other tests implemented to fulfil the MLR modelling requirements included Ramsey's RESET test (for the correctness of specification of the MLR equations), the Chow test (for stability assessment between different coefficients), and the Durbin-Watson test (for autocorrelation of residuals). Each model was examined under the above-mentioned requirements and no irregularities were noted.
Table abbreviations: CI = 95% confidence interval. SD = standard deviation. rVO2AT = oxygen uptake at anaerobic threshold relative to body mass. RERAT = respiratory exchange ratio at anaerobic threshold. HRAT = heart rate at anaerobic threshold. VEAT = pulmonary ventilation at anaerobic threshold. SPEEDAT = velocity at anaerobic threshold. LAAT = blood lactate concentration at anaerobic threshold. rVO2RCP = oxygen uptake at respiratory compensation point relative to body mass. RERRCP = respiratory exchange ratio at respiratory compensation point. HRRCP = heart rate at respiratory compensation point. VERCP = pulmonary ventilation at respiratory compensation point. SPEEDRCP = velocity at respiratory compensation point. LARCP = blood lactate concentration at respiratory compensation point. rVO2max = maximal oxygen uptake relative to body mass.
The ggplot2 package in RStudio (R Core Team, Vienna, Austria; version 3.6.4), GraphPad Prism (GraphPad Software; San Diego, CA, USA; version 9.0.0 for Mac OS), and STATA software (StataCorp, College Station, TX, USA; version 15.1) were used in statistical analysis. A two-sided p-value <0.05 was considered the threshold of significance.
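The three error measures used to characterise the equations (R2, RMSE, MAE) can be computed from paired observed and predicted values. The sketch below is illustrative only: the function name is hypothetical, the numbers are made up, and R2 is taken here as 1 − SSres/SStot, which is one common definition and an assumption about how it was computed in the analysis software.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return R2, RMSE and MAE for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Made-up VO2max values in mL·min−1·kg−1, for illustration only.
observed = [50.0, 54.0, 58.0, 62.0]
predicted = [51.0, 53.0, 59.0, 61.0]
print(regression_metrics(observed, predicted))
```

RMSE and MAE are expressed in the units of the outcome, so for relative VO2max they read directly as mL·min−1·kg−1.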

Somatic measurements and CPET results
Anthropometric data of the runners for the derivation, testing, and validation groups are presented in Table 1, while those of the cyclists are in Table 2. The runners' groups consisted of 1998, 666, and 666 men for derivation, testing, and validation, respectively. In turn, the cyclists' groups included 656, 219, and 219 men, respectively. Significant differences (p<0.05) between the derivation cohorts of runners and cyclists were found in BMI and age, between the testing cohorts in all baseline parameters, and between the validation cohorts only in BMI.
Table abbreviations: CI = 95% confidence interval. SD = standard deviation. rVO2AT = oxygen uptake at anaerobic threshold relative to body mass. RERAT = respiratory exchange ratio at anaerobic threshold. HRAT = heart rate at anaerobic threshold. VEAT = pulmonary ventilation at anaerobic threshold. rPOWAT = power at anaerobic threshold relative to body mass. LAAT = blood lactate concentration at anaerobic threshold. rVO2RCP = oxygen uptake at respiratory compensation point relative to body mass. RERRCP = respiratory exchange ratio at respiratory compensation point. HRRCP = heart rate at respiratory compensation point. VERCP = pulmonary ventilation at respiratory compensation point. LARCP = blood lactate concentration at respiratory compensation point. rPOWRCP = power at respiratory compensation point relative to body mass. rVO2max = maximal oxygen uptake relative to body mass.
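The group sizes reported above are what a random 60%/20%/20% split of 3330 runners and 1094 cyclists produces. A minimal sketch of such a split follows; the function name, the seed, and the rounding rule (round the first two shares, give the remainder to validation) are assumptions, not the procedure actually used.

```python
import random


def split_60_20_20(ids, seed=0):
    """Randomly split ids into derivation, testing and validation sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    n = len(shuffled)
    n_derivation = round(0.6 * n)
    n_testing = round(0.2 * n)
    derivation = shuffled[:n_derivation]
    testing = shuffled[n_derivation:n_derivation + n_testing]
    validation = shuffled[n_derivation + n_testing:]   # remainder
    return derivation, testing, validation


runners = split_60_20_20(range(3330))
cyclists = split_60_20_20(range(1094))
print([len(part) for part in runners])    # [1998, 666, 666]
print([len(part) for part in cyclists])   # [656, 219, 219]
```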

Models validation
Evaluation of each model for cyclists is presented in Table 5, while for runners in Table 6. In summary, the performance of our prediction equations was similar to that observed in the derivation cohort. Slightly higher RMSE and MAE were noted. Overall, RMSE values ranged between 2.03 and 6.11 in cyclists and between 2.0 and 5.54 in runners. MAE ranged from 1.64 to 4.74 mL·min−1·kg−1 in the cyclists' models and from 1.58 to 4.37 mL·min−1·kg−1 in the runners' models. The most accurate prediction (defined as the highest replicability and the lowest risk of inaccuracies in the test set) was obtained in cyclists by the RCP equations (R2=0.913, RMSE=2.03, MAE=1.64). Interestingly, the models that performed most and least accurately were the same in derivation and validation. Figure 3 illustrates the Bland-Altman plots with a comparison of observed vs predicted VO2max using the newly derived prediction models at the stage of validation.
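The quantities behind a Bland-Altman comparison of observed and predicted VO2max are the mean difference (bias) and the limits of agreement. The sketch below assumes the conventional limits of bias ± 1.96 SD of the differences; the function name is hypothetical and the values are made up.

```python
from statistics import mean, stdev


def bland_altman(observed, predicted):
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    bias = mean(diffs)     # mean observed-minus-predicted difference
    sd = stdev(diffs)      # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up VO2max values in mL·min−1·kg−1, for illustration only.
observed = [50.0, 54.0, 58.0, 62.0]
predicted = [51.0, 53.0, 59.0, 61.0]
bias, lower, upper = bland_altman(observed, predicted)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

A bias close to zero with narrow limits indicates that the equation neither systematically over- nor under-predicts and that individual errors are small.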

Discussion
In the present study, we derived and internally validated novel advanced and accurate prediction models for VO2max. The main findings are as follows: (1) we can precisely predict VO2max based on submaximal CPET parameters, (2) inclusion of cardiopulmonary and BC variables enriches their prediction performance, (3) based only on somatic parameters, weak-to-low VO2max assessment is currently possible, (4) derived equations showed high transferability abilities during validation. Our findings indicate that prediction models based on AT and RCP variables allow for accurate VO2max calculation. Equations based on somatic variables allow for limited precision.
The main advantage of our research is the unified CPET protocol conducted on a wide cohort of endurance athletes with different levels of fitness. This approach enables the comprehensive evaluation of the most important predictors which were further applied to build prediction equations. In the current literature, prediction models for sports and performance diagnostics are mostly derived from narrow and specified athletic cohorts which limit their applicability to broader populations (Paap and Takken, 2014). Moreover, the advantage of the presented research was the fact that regression equations for the treadmill and cycle ergometer were derived based on the most commonly used machines and forms of activity or movement in the laboratory stress exercise tests.
An important issue addressed in publications on various attempts to estimate VO2max is the question of their usefulness in assessing changes in endurance over a training cycle (Klusiewicz et al., 2016). As reported by Klusiewicz et al., although the suitability of the two indirect methods of assessing VO2max was statistically confirmed, their usefulness for estimating changes in the endurance of the trained individuals during the training cycle was rather low (Klusiewicz et al., 2016). The standard estimation error of these methods (ranging between 4.2% and 7.7% in the female and 5.1% and 7.4% in the male athletes) was higher than the real differences in the VO2max values determined in the direct measurements (between the first and the second examination the VO2max rose by 3.0% in the female athletes and dropped by 4.3% in the male athletes) (Klusiewicz et al., 2016). Popular wearables provided substantial accuracy at the population level when devices with exercise-based algorithms were considered (Molina-Garcia et al., 2022). However, VO2max predictions at the individual level still need improvement in the context of both sports and clinical settings (Molina-Garcia et al., 2022).
For the Astrand-Ryhming method, a VO2max prediction method widely used for almost 70 years, the correlation coefficients of the measured values to the predicted values ranged from 0.63 to 0.85 in several papers published so far. Standard estimated error values (in L·min−1) generally exceeded 0.5 (Grant et al., 1995; Legge and Banister, 1986). In our study, the highest R2 was 0.913 and the lowest was 0.775. As an additional advantage, we propose an equation based only on somatic variables, which showed a low R2 (0.35 for runners and 0.43 for cyclists). Nevertheless, our models remain more accurate than those widely described in the literature so far (Paap and Takken, 2014; Wiecha et al., 2023).
As VO2 is an exercise parameter that combines the function of the respiratory, circulatory, and muscular systems, the use of only somatic variables such as body fat, weight, or age was not sufficient for optimal prediction (Bassett and Howley, 2000). It is worth underlining that the main factors contributing to VO2max were the submaximal variables VO2AT and VO2RCP, as well as running speed for runners and pedalling power for cyclists (Billat and Koralsztein, 1996). Thus, VO2 is a universal and interchangeable measurement of endurance (Albouaini et al., 2007). Our results suggest that VO2 is an indicator of both endurance and critical vital signs (Blair et al., 1989; Kaminsky et al., 2015). It is also in line with current standings, as Kaminsky et al. and Blair et al. postulate that the higher the VO2max, the fitter the individual is (Kaminsky et al., 2015), and the lower their all-cause mortality (Blair et al., 1989).
In sports medicine and exercise physiology, evaluation of the body's functional performance remains crucial (Sartor et al., 2013). Our results indicate that VO2max could be accurately predicted based only on submaximal parameters (without the inclusion of maximal ones), which is reflected in R2 (Wiecha et al., 2022). This confirms that submaximal CPET is a valuable tool in assessing fitness levels. Thus, submaximal exercise testing appears to be more applicable for physicians and fitness professionals in their role as clinical exercise specialists (Noonan and Dean, 2000). For individuals who have a moderate to high possibility of cardiovascular diseases, exerting themselves up to their maximum abilities increases the risk of adverse outcomes (Guazzi et al., 2012; Noonan and Dean, 2000). There are numerous possibilities to use the results of submaximal CPET. Currently, pseudo-random binary sequencing appears to be one of the feasible approaches. It enables assessment of cardiorespiratory kinetics within the selected workload ranges (Hoffmann et al., 2022; Koschate et al., 2016). This is important as CPET to exhaustion is often impossible to conduct or highly dangerous (Guazzi et al., 2016). Such situations appear mainly in clinical cardiovascular conditions, such as heart failure, dyspnea of unknown aetiology, or risk evaluation for providing treatment protocol (Guazzi et al., 2016). Furthermore, previous studies have described that submaximal variables are significant predictors for performance measurements, as pointed out by Snowden et al., 2010, and Albouaini et al., 2007. In conclusion, ensuring high repeatability through submaximal prediction methods is crucial for monitoring endurance changes in both sports and medical diagnostics (Mann et al., 2013; Noonan and Dean, 2000).
It is worth mentioning the effect of body fat percentage on VO2max. This variable has been included in the majority of our models. With the increase in body fat percentage, VO2 decreased; this relationship was particularly important in the somatic equation and has been previously described in the literature (Shete et al., 2014). This is due to the fact that a higher level of adipose tissue and general body mass both have a negative impact on the results during long-term endurance sports (i.e. running and cycling), and with increasing fitness levels, the level of participant fatness decreases (Schwartz et al., 1991).
Results of internal validation show that our prediction models allow for an accurate assessment of VO2max. The observed RMSE and MAE values are significantly lower than in the validation of other prediction models on endurance athletes' cohorts. Petek et al., 2022, and Malek et al., 2004, while validating the majority of widely used prediction models, observed MAE and RMSE on the level of 7-9 mL·kg−1·min−1. Our highest value of error for the somatic equation was in the cycling model (MAE; 4.74 mL·kg−1·min−1). Moreover, as we mentioned above, the somatic equation showed the lowest accuracy, and the remaining equations have RMSE between 1.94 and 6.11 and MAE in the range of 1.46-4.74 mL·kg−1·min−1.
Our study has some limitations. The applied exercise protocol may affect CPET results. There may be differences in performance measured in 2 min steps compared with longer steps, but this should not significantly impact the participants' exercise results. Additionally, longer constant intervals may increase accuracy in determining AT and RCP level, but have negligible impact on VO2max values (Muscat et al., 2015). The study, due to the insufficient number of women in the database to obtain reliable results, was restricted to men only. Therefore, the equations should be applied with more caution in women.
To summarise, our study has vast practical applications in the comprehensive assessment of an athlete's training and is a valuable tool for coaches in the preparation of individualised training prescriptions (Mann et al., 2013). Targeting training regimens and diet to optimise the most important parameters contributing to VO2max (i.e. VO2AT, VO2RCP, RER, body fat, etc.) will allow for achieving better results during competition, and the models provide a useful indirect method for assessing changes in endurance during the training cycle (Mann et al., 2013). Various areas of application of prediction models have also been postulated in the literature so far, for example in submaximal and maximal efforts, or simulating the overcoming of the starting distance, or even at rest (Zhou et al., 1997). It is also worth mentioning their clinical implications in cardiology for the diagnosis of heart disease in athletes (where a reduction in VO2max may occur despite maintaining other parameters, e.g. RER) (Guazzi et al., 2016; Löllgen and Leyk, 2018).

Conclusion
Briefly, we provided new prediction models for VO2max. The proposed method allows for precise prediction of VO2max based on submaximal results. Our equations were derived from a wide cohort of 4424 athletes with varied fitness levels, which enhanced the quality and transferability of the presented data. Higher accuracy was noted when applying submaximal predictors. Adding circulatory and respiratory variables enriches prediction performance. Body fat and fat-free mass had significant impacts on most of the VO2max prediction equations. A novel model based only on somatic parameters is presented. Derived equations showed high performance during internal validation and were fairly replicable. Such a tool has practical use for fitness professionals and personal coaches in preparing more precise training recommendations and establishing competition pacing strategies.

Additional information
Competing interests Szczepan Wiecha: received payment for leading CPET workshops at IX Małopolskich Warsztatach Niewydolności Serca. The author has no other competing interest to declare. Tomasz Kowalski: has received funding from the Institute of Sport -National Research Institute. The author has received consulting fees for regular coaching and consulting work with private clients, Polish Triathlon Federation and The Triathlon Squad professional triathlon team. The author has no other competing interests to declare. The other authors declare that no competing interests exist.

Funding
No external funding was received for this work.

Additional files
Supplementary files • MDAR checklist • Source code 1. Source code in Python for transforming files in the database.

Data availability
All data generated or analysed during this study are included in the manuscript.