Integrative Interpretation of Cardiopulmonary Exercise Tests for Cardiovascular Outcome Prediction: A Machine Learning Approach

Integrative interpretation of cardiopulmonary exercise tests (CPETs) may improve assessment of cardiovascular (CV) risk. Here, we identified patient phenogroups based on CPET summary metrics and evaluated their predictive value for CV events. We included 2280 patients with diverse CV risk who underwent maximal CPET by cycle ergometry. Key CPET indices and information on incident CV events (median follow-up time: 5.3 years) were derived. Next, we applied unsupervised clustering by Gaussian Mixture modeling to subdivide the cohort into four male and four female phenogroups solely based on differences in CPET metrics. Ten of 18 CPET metrics were used for clustering as eight were removed due to high collinearity. In males and females, the phenogroups differed significantly in age, BMI, blood pressure, disease prevalence, medication intake and spirometry. In males, phenogroups 3 and 4 presented a significantly higher risk for incident CV events than phenogroup 1 (multivariable-adjusted hazard ratio: 1.51 and 2.19; p ≤ 0.048). In females, differences in the risk for future CV events between the phenogroups were not significant after adjustment for clinical covariables. Integrative CPET-based phenogrouping, thus, adequately stratified male patients according to CV risk. CPET phenomapping may facilitate comprehensive evaluation of CPET results and steer CV risk stratification and management.


Introduction
Cardiovascular (CV) diseases substantially burden societal health and healthcare [1]. We are still in search of tools to timely identify individuals at high risk for CV disease in order to timely initiate risk reducing strategies [2]. The assessment of cardiorespiratory fitness (CRF) may improve CV risk assessment [3]. CRF reflects the ability of the muscles to perform dynamic work, relying on the respiratory and CV systems for the transport of oxygen (O 2 ) to, and carbon dioxide (CO 2 ) from, working muscles. Low CRF is strongly associated with a higher incidence of adverse CV events [3,4]. Nowadays, cardiopulmonary exercise testing (CPET) allows standardized assessment of CRF in clinical settings. To date,

Study Population
This retrospective study is an iCOMPEER sub-study ("Integrative computer modeling for personalized profiling of CRF and prediction of response to ambulatory cardiac rehabilitation"). The study received approval from the Ethics Committee of UZ/KU Leuven (#S64901). We screened data from 3466 adult patients who underwent a maximal CPET at the University Hospital Leuven (Leuven, Belgium) between April 2010 and October 2020. Patients were referred for CPET for diverse reasons: (a) following recent revascularization, including elective percutaneous coronary interventions (PCIs); (b) for CV risk assessment in hypertension, obesity, or diabetes mellitus; (c) as part of screening before initiating an exercise program; or (d) for differential diagnosis of dyspnea, chronic fatigue syndrome or another exercise-limiting condition. Only the first CPET of each patient was considered.
Patients were excluded from analysis if they (1) were below 18 or above 80 years old; (2) had previous myocardial infarction with an impact on left ventricular function (i.e., ejection fraction < 50%), symptomatic heart failure, congenital heart disease, cardiomyopathy, cardiac surgery (e.g., coronary artery bypass grafting) or an artificial pacemaker; (3) had an autoimmune disease, a malignancy or recreational drug abuse; (4) were pregnant; or (5) did not pass the criteria of a maximal CPET (see Section 2.5). The final study sample included 2280 patients.

Retrospective Data Retrieval
Data was pooled from the medical repository system of the University Hospital Leuven. The data comprised demographics, anthropometrics, disease history and medication intake as well as CPET summary metrics. Blood test results for renal function (eGFR) and glucose regulation (blood glucose, HbA1c) from less than 1 year before CPET were retrieved to complement the medical data. This biochemical data was available in 1772 patients (77.7%; median time between blood sample and CPET: 1.1 months).
Definitions of the CV risk factors and comorbidities, such as hypertension, chronic kidney disease and diabetes mellitus, are detailed in the Supplementary Material. CV diseases and surgery were determined from the patient's medical history files and surgery reports (Table S1).

Outcome Collection (Incident CV Events)
To evaluate the performance of the CPET-based phenogrouping model for CV disease prediction, we collected information on incidence of fatal and non-fatal CV events from the medical repository system. CV events were ascertained until December 2022. Fatal and non-fatal CV events included the following: coronary events (myocardial infarction, acute coronary syndrome, angina pectoris or ischemic heart disease requiring coronary revascularization), symptomatic heart failure, valvular heart disease requiring surgical intervention, heart block, pacemaker implantation, incident atrial fibrillation, stroke, transient ischemic attacks, aortic aneurysm, pulmonary heart disease, pulmonary embolism or infarction, arteriosclerosis, other peripheral vascular disease, arterial embolism or thrombosis, and other diseases of arteries and arterioles. A grace period of 3 months for a first CV event was respected to exclude events resulting from residual therapy (e.g., planned PCI for residual lesions).

Spirometry
Spirometry was available for 1756 of the 2280 patients (77.0%) prior to CPET. Variables recorded were the forced expiratory volume in 1 s (FEV1), the forced vital capacity (FVC), and their ratio (FEV1/FVC). The percentage predicted FEV1 and FVC were calculated as the measured FEV1 and FVC relative to the FEV1 and FVC predicted when considering the patient's age, sex, and height [8].

Cardiopulmonary Exercise Testing
Patients were instructed not to refrain from taking their medications prior to CPET. Patients had performed a CPET in a hospital setting on a cycle ergometer (Ergometrics 800S, Ergometrics, Bitz, Germany) supervised by a physiotherapist or clinician in accordance with the then prevailing recommendations [9]. Each patient performed an incremental exercise protocol, aiming to reach maximum exertion within 8-12 min (20 W + 20 W/min (79.9%), 20 W + 10 W/min (12.2%), 10 W + 10 W/min (6.7%) or another incremental protocol (1.2%)). During the test, all patients were encouraged to reach maximal exertion and instructed to maintain the cycling cadence between 60 and 70 rotations per minute. Exercise testing was performed under continuous ECG monitoring with breath-by-breath analysis of inspired and expired gas (VO 2 , VCO 2 ) and minute ventilation. The BP was measured every other minute using an automated BP monitor. The CPET was stopped upon volitional fatigue, when the patient was unable to keep the pedalling rate above 60 rpm or when any AHA termination criterion was reached [10]. After the test, the patient rated the maximal level of perceived exertion on the 6 to 20 Borg scale. The test supervisor determined the point of the second ventilatory threshold (VT2) using the ventilatory equivalent VE/VCO 2 method [11]. Patients were included in analyses if VT2 and/or a peak respiratory exchange ratio (RER, the ratio of VCO 2 and VO 2 ) above 1.05 occurred [12].

CPET Summary Metrics
We derived CPET metrics at peak exercise, including the load, VO 2 , RER, HR, O 2 pulse (as VO 2peak /HR peak ), systolic BP and ventilation. VO 2peak was the highest of the last three consecutive 30-s averages of VO 2 . Peak O 2 pulse was calculated as VO 2peak /HR peak . We additionally indexed VO 2peak and peak O 2 pulse for body weight in kilograms. The percentage of age-predicted HR was 100 × [peak HR/maximal predicted HR (defined as 220age)]. We applied sex-specific equations to calculate the predicted peak VO 2 from age, weight, and height [in men: −69 + 1.48 × age + 14.02 × height + 7.44 × weight − 0.2256 × age 2 ; in women: −588 − 11.33 × age + 9.13 × height + 26.88 × weight − 0.12 × weight 2 ] and retrieved percentage predicted VO 2 as 100 × VO 2peak /predicted peak VO 2 [13]. The peak metabolic equivalents of Task (METs peak ) were calculated as VO 2 /kg peak /3.5. The VE/VCO 2 slope was determined by applying linear regression on the ventilation and VCO 2 tracings recorded during the exercise period until the respiratory compensation point [14]. In addition, we derived the percentage of VO 2 and HR at VT1 relative to the peak exercise values.

Statistical Analysis
Study data were managed using the REDCap electronic data management tool hosted at KU Leuven [15]. Means and frequencies were compared between males and females using two-sided unpaired t-tests and χ 2 tests, respectively. A p value below 0.05 on a two-sided test was considered statistically significant.

Cluster Analysis for CPET-Based Phenogrouping
Python 3.8 (https://www.python.org, accessed on 10 May 2023) was used for the unsupervised analysis to identify sex-specific, CPET-based phenogroups and to investigate their associations with the clinical data and CV outcome. Figure 1 summarizes the steps of the unsupervised ML pipeline.
percentage of age-predicted HR was 100 × [peak HR/maximal predicted HR (defined as 220-age)]. We applied sex-specific equations to calculate the predicted peak VO2 from age, weight, and height [in men: −69 + 1.48 × age + 14.02 × height + 7.44 × weight − 0.2256 × age 2 ; in women: −588 − 11.33 × age + 9.13 × height + 26.88 × weight − 0.12 × weight 2 ] and retrieved percentage predicted VO2 as 100 × VO2peak/predicted peak VO2 [13]. The peak metabolic equivalents of Task (METspeak) were calculated as VO2/kgpeak/3.5. The VE/VCO2 slope was determined by applying linear regression on the ventilation and VCO2 tracings recorded during the exercise period until the respiratory compensation point [14]. In addition, we derived the percentage of VO2 and HR at VT1 relative to the peak exercise values.

Statistical Analysis
Study data were managed using the REDCap electronic data management tool hosted at KU Leuven [15]. Means and frequencies were compared between males and females using two-sided unpaired t-tests and χ 2 tests, respectively. A p value below 0.05 on a two-sided test was considered statistically significant.

Cluster Analysis for CPET-Based Phenogrouping
Python 3.8 (https://www.python.org, accessed on 10 May 2023) was used for the unsupervised analysis to identify sex-specific, CPET-based phenogroups and to investigate their associations with the clinical data and CV outcome. Figure 1 summarizes the steps of the unsupervised ML pipeline.

Feature Selection
Before phenogrouping, we applied a feature selection step to avoid any clustering bias originating from strongly correlated features. First, the 18 spirometric and CPET features were standardized to a mean of 0 and standard deviation of 1. A Pearson's

Feature Selection
Before phenogrouping, we applied a feature selection step to avoid any clustering bias originating from strongly correlated features. First, the 18 spirometric and CPET features were standardized to a mean of 0 and standard deviation of 1. A Pearson's correlation matrix was constructed by using the Pandas library (1.1.2) to identify highly correlated features (i.e., correlations with a Pearson correlation coefficient ≥ 0.8). Of the highly correlated features, only the most connected and most clinically relevant features were kept so as to end up with a set of features for phenogrouping free from strong interrelationships. Eventually, FEV1, FVC, and their ratio were not considered for the phenogrouping as spirometric data was missing in 23% of the patients. In total, 10 CPET features were, thus, selected for phenomapping.

Model Fitting
Two clustering approaches were taken from the scikit-learn library version 0.23, which were applied to the male and female patients separately [16,17]. First, we performed exploratory analysis with biclustering, based on hierarchical clustering with Euclidean distance and Ward linkage, to visually identify patterns that could define subsets of patients based on the previously determined set of 10 features. A biclustering heatmap was created using the seaborn library (v0.10.1), visualizing the grouping of both patients and CPET variables.
Second, we used Gaussian mixture model-based clustering, fit with an expectation maximization algorithm, for the actual CPET-based phenomapping [17,18]. The advantages of Gaussian mixture model-based clustering, compared to other approaches, relate to the following: (i) the possibilities of statistical analysis using the underlying probabilistic framework, (ii) estimating cluster parameters from soft assignments, (iii) accounting for variance and (iv) the formation of clusters of different sizes and shapes [19].
By using Bayesian information criteria (BIC), the ideal number of clusters (K) in men and women was determined for every tested k, as implemented in the scikit-learn library [16,20]. The K with the lowest BIC represented the most appropriate number of clusters. Based on visual inspection of the biclustering heatmap and the BIC values, we instructed the Gaussian mixture algorithm to identify four male and four female phenogroups. Four clusters were considered per sex to enable sufficient granularity between the phenogroups, while maintaining adequate interpretability. Next, the R package VarSelLCM v2.1.3 (https://www.r-project.org, accessed on 10 May 2023) was applied to determine the importance of the features in the phenogroups by sex. The package uses a model-based clustering framework by optimizing a modified integrated completedata likelihood [21]. Radar charts were used to visualize CPET features across the sexspecific phenogroups.

Model Validation
To define the clinical relevance of the CPET phenogrouping, we first compared clinical characteristics across the phenogroups by means of two-sided unpaired t-tests and χ 2 statistics, with Bonferroni correction of p values to account for multiple testing. Next, using the lifelines library v0.27, we plotted the event-free survival of CV events per phenogroup and performed pairwise Log-rank tests comparing the individual Kaplan-Meier curves. We also calculated the multivariable-adjusted Cox proportional hazard ratios expressing the adjusted risk for CV events per phenogroup (with phenogroup 1 as reference). We considered the following confounders in Cox regression: age, body height and weight, heart rate, systolic and diastolic BP, antihypertensive drug intake and a history of diabetes mellitus, chronic kidney disease and cardiovascular intervention. We also tested models that additionally accounted for differences in VO 2 /kg peak or the percentage predicted VO 2peak . Table 1 presents the clinical and CPET characteristics by sex. Of the 2280 patients, 1092 (47.9%) were female, 1310 (57.5%) had hypertension and 1057 (46.4%) were taking BP lowering medication. Overall, females were younger than males (47.5 ± 13.7 years in women vs. 55.9 ± 13.0 years in men), and had a more favorable CV risk profile and medical history and less medication intake. Apart from a higher HR at rest and at peak exercise in women, men achieved significantly higher CPET values than women, including a higher load, VO 2 , O 2 pulse, systolic BP, minute ventilation, VE/VCO 2 slope and RER at peak exercise (p ≤ 0.011) ( Table 1). For feature selection prior to the phenogrouping, correlations were determined across 18 spirometric and CPET features by means of a Pearson's correlation matrix to filter a set of features free from strong interrelationships (Figure 2). Of the eight highly correlated features, we kept load peak and VO 2 /kg peak as features for phenogrouping, given their major clinical importance ( Figure 2B). As a result, we excluded VO 2peak , VCO 2peak and peak minute ventilation (because they were strongly correlated with load peak ) as well as VCO 2 /kg peak and O 2 pulse/kg peak (because they were strongly correlated with Diagnostics 2023, 13, 2051 7 of 15 VO 2 /kg peak ) from the phenogrouping ( Figure 2B). In addition, the spirometric variables were also excluded from phenogrouping, as this data was missing in 23% of the patients, leaving 10 CPET variables for phenomapping. . The spirometric measurements were eventually excluded from the phenogrouping, due to the extensive amount of spirometric data missing (23.0%). In addition, five CPET metrics were excluded from the phenogrouping due to high collinearity (see B), leaving ten features for the phenomapping (in bold in (A)). Abbreviations as in Table 1.

Figure 2.
Correlations between the spirometric and the CPET features. Strong correlations, those with an absolute Pearson correlation coefficient > 0.8, were framed in red in the correlation matrix (A) and presented below (B). The spirometric measurements were eventually excluded from the phenogrouping, due to the extensive amount of spirometric data missing (23.0%). In addition, five CPET metrics were excluded from the phenogrouping due to high collinearity (see B), leaving ten features for the phenomapping (in bold in (A)). Abbreviations as in Table 1. Figure S1 presents the heatmap from biclustering based on agglomerative hierarchical clustering of the 10 CPET features. Based on visual inspection of the biclustering heatmap ( Figure S1), and on the BIC values ( Figure S2), the optimal number of clusters ranged between 3 and 6 in women and between 4 and 6 in men. Four clusters were eventu-ally considered per sex to enable sufficient granularity between the phenogroups, while maintaining adequate interpretability. As such, the Gaussian mixture modeling algorithm extracted four phenogroups per sex. In both men and women, the load peak , VO 2 /kg peak , and HR peak had the highest power to discriminate the four clusters ( Figure S3). In contrast, SBP peak (and the Borg score in women) were the least important for the phenogrouping. Figure 3 presents the radar charts comparing the 10 CPET features across the sexspecific phenogroups. In general, in both men and women, VO 2 /kg peak and HR peak gradually declined and VE/VCO 2 slope gradually increased from phenogroups 1 to 4 ( Figure 3). Within each sex, phenogroup 4 was, thus, characterized by the lowest VO 2 /kg peak , lowest HR peak and highest VE/VCO 2 slope.  Table 1.

The Association between CV Outcome and Phenogroup Assignment
The median follow-up time was 5.3 years (5th-95th percentile, 1.3 to 10.8 years). During the follow-up time, 278 males and 109 females experienced a CV event (43.9 and 16.2 events per 1000 person-years, respectively). In both sexes, the incidence of CV events increased significantly from phenogroup 1 to 4 ( Figure 4).
In males, relative to phenogroup 1, the adjusted risk for CV events was significantly higher for phenogroups 3 (adjusted hazard ratio with 95% CI (HRadj): 1.51, 1.00-2.27; p = 0.048) and 4 (HRadj: 2.19, 1.31 to 3.66; p = 0.0028), after adjustment for clinical confounders ( Figure 5). After an additional adjustment for VO2/kgpeak or percentage predicted VO2peak, phenogroup 4 was still at higher risk for CV events than phenogroup 1 (Table 2). Reversely, a lower METspeak was significantly associated with a higher CV risk in men, also after adjustment for clinical phenogroup information, at least for METspeak on a continuous scale (Table S4). In females, the risk for future CV events did not differ between the phenogroups after adjustment for the clinical covariables and VO2/kgpeak or percentage predicted VO2peak. (Figure 5, Table 2), but was also not associated with METspeak after accounting for clinical covariables (Table S4).  Table 1. Table S2 presents the clinical characteristics per female phenogroup. From the first (n = 102) to the second (n = 713) to the third (n = 166) to the fourth phenogroup (n = 111), we noted the following: (i) a progressive increase in age, BMI and resting systolic BP (ii) a strong increase in disease prevalence (as the history of hypertension, diabetes mellitus, chronic kidney disease and CV disease), (iii) a progressive increase in the intake of antihypertensive, lipid-lowering, anti-thrombotic and anti-diabetic drugs, and (iv) a progressive decline in the FEV1 and FVC (Table S2). In men, we observed the same inter-phenogroup differences, in age, systolic BP, disease history, medication intake and spirometry from the first (n = 327) to the second (n = 132) to the third (n = 661) to the fourth phenogroup (n = 68), as in women (Table S3).

The Association between CV Outcome and Phenogroup Assignment
The median follow-up time was 5.3 years (5th-95th percentile, 1. 16.2 events per 1000 person-years, respectively). In both sexes, the incidence of CV events increased significantly from phenogroup 1 to 4 ( Figure 4).
In males, relative to phenogroup 1, the adjusted risk for CV events was significantly higher for phenogroups 3 (adjusted hazard ratio with 95% CI (HR adj ): 1.51, 1.00-2.27; p = 0.048) and 4 (HR adj : 2.19, 1.31 to 3.66; p = 0.0028), after adjustment for clinical confounders ( Figure 5). After an additional adjustment for VO 2 /kg peak or percentage predicted VO 2peak , phenogroup 4 was still at higher risk for CV events than phenogroup 1 (Table 2). Reversely, a lower METs peak was significantly associated with a higher CV risk in men, also after adjustment for clinical phenogroup information, at least for METs peak on a continuous scale (Table S4). In females, the risk for future CV events did not differ between the phenogroups after adjustment for the clinical covariables and VO 2 /kg peak or percentage predicted VO 2peak . ( Figure 5, Table 2), but was also not associated with METs peak after accounting for clinical covariables (Table S4). Hazard ratios (95% CI) represent the risk for cardiovascular events relative to phenogroup 1. Clinical covariables included age, height, weight, heart rate, systolic and diastolic blood pressure, antihypertensive drug intake and history of diabetes mellitus, chronic kidney disease, and cardiovascular intervention at baseline. Models with VO 2 /kg peak did not include weight. Models with % predicted VO 2, peak did not account for age (as already considered in the calculation of the % predicted).    The risk for cardiovascular events by the integrative CPET profiles. The adjusted hazard ratios (95% CI) accounted for differences in age, height, weight, heart rate, systolic and diastolic blood pressure, antihypertensive drug intake and a history of diabetes mellitus, chronic kidney disease, and cardiovascular intervention at baseline. Table 2 details the hazard ratios and p values from the unadjusted and adjusted models, including the models accounting for differences in VO2/kgpeak or % predicted VO2,peak. 0.0098 Hazard ratios (95% CI) represent the risk for cardiovascular events relative to phenogroup 1. Clinical covariables included age, height, weight, heart rate, systolic and diastolic blood pressure, antihypertensive drug intake and history of diabetes mellitus, chronic kidney disease, and cardiovascular intervention at baseline. Models with VO2/kgpeak did not include weight. Models with % predicted VO2, peak did not account for age (as already considered in the calculation of the % predicted).

Discussion
We leveraged clinical and CPET data from a large and heterogeneous patient sample to construct integrative cardiopulmonary exercise response profiles for CV risk stratification. In brief, unsupervised machine learning identified four CPET-based phenogroups in both men and women, which stratified the patient sample along CV risk in terms of both prevalent and future CV disease. These findings may pave the way to an integrative CPET interpreter for CV risk stratification and CV event prediction.
Previous studies demonstrated the additive prognostic value of individual CPET metrics, beyond traditional risk factors in individuals with symptomatic diseases [22][23][24]. These studies, however, focused on prognostic adverse events in patients with already established diseases, particularly symptomatic heart failure [24]. Moreover, the studies mostly tested the predictive value of only peak O2 consumption and the VE/VCO2 slope, Figure 5. The risk for cardiovascular events by the integrative CPET profiles. The adjusted hazard ratios (95% CI) accounted for differences in age, height, weight, heart rate, systolic and diastolic blood pressure, antihypertensive drug intake and a history of diabetes mellitus, chronic kidney disease, and cardiovascular intervention at baseline. Table 2 details the hazard ratios and p values from the unadjusted and adjusted models, including the models accounting for differences in VO 2 /kg peak or % predicted VO 2,peak .

Discussion
We leveraged clinical and CPET data from a large and heterogeneous patient sample to construct integrative cardiopulmonary exercise response profiles for CV risk stratification. In brief, unsupervised machine learning identified four CPET-based phenogroups in both men and women, which stratified the patient sample along CV risk in terms of both prevalent and future CV disease. These findings may pave the way to an integrative CPET interpreter for CV risk stratification and CV event prediction.
Previous studies demonstrated the additive prognostic value of individual CPET metrics, beyond traditional risk factors in individuals with symptomatic diseases [22][23][24]. These studies, however, focused on prognostic adverse events in patients with already established diseases, particularly symptomatic heart failure [24]. Moreover, the studies mostly tested the predictive value of only peak O 2 consumption and the VE/VCO 2 slope, by means of traditional regression modeling [24]. More research is warranted to assess the added value of integrative CPET evaluation, especially for the identification of at risk (asymptomatic) populations [25].
Within this context, ML approaches (either supervised or unsupervised) may enable the extraction of clinically meaningful profiles from CPET data for early CV risk stratification. Recently, plenty of studies have explored supervised ML methods for disease diagnosis or prognosis using CPET data in at-risk individuals [26][27][28], but only a limited number of studies have applied unsupervised learning approaches. For instance, a previous study on 1619 patients with chronic heart failure applied a cluster analysis on an extensive set of clinical variables, including exercise performance metrics, such as HR peak , VO 2,peak , RER peak and VE/VCO 2 slope [29]. They identified four distinct heart failure phenotypes heterogeneous in exercise capability and at risk for all-cause mortality and hospitalization. Unfortunately, the authors did not provide insights into the importance of the variables considered in the clustering, making it impossible to evaluate the contribution of the CPET metrics to the phenogroup assignment and performance.
So far, only one study has applied clustering methods to group patients solely based on CPET summary data. In 738 patients with unexplained dyspnea or exertional intolerance who underwent an invasive CPET (iCPET), the algorithm defined four patient groups based on a preselected set of 10 features [7]. Of note, although the authors adjusted minimally for confounders, the assigned phenogroups enabled prediction of three-year all-cause hospitalization independently of separate invasive measures and the clustering was validated in two external patient cohorts [7]. However, the invasiveness of the CPET protocol applied in the study limits the usefulness of its iCPET risk calculator to symptomatic patients with advanced exercise intolerance and hampers its applicability for early CV risk stratification strategies, particularly in settings with milder patient phenotypes.
In our study, the phenogrouping appeared to stratify the patient sample along CV risk in terms of both prevalent and future CV disease. Indeed, in both males and females, we identified a CPET-based phenogroup of patients with the most favorable CV risk profile as being younger, having a lower BMI, a lower prevalence of hypertension, diabetes mellitus and CV disease, a lower medication intake and the greatest response to exercise compared to the other phenogroups. Besides having the lowest risk for prevalent CV disease, patients within this phenogroup also represented the lowest risk of developing CV diseases in the future, which was likely thanks to their having the highest CRFs and more favorable CV risk profiles at baseline. In contrast, the unsupervised clustering identified a distinct CPET-based phenogroup with patients at high risk for future CV events, as it included older patients with an unfavorable CV risk profile (e.g., more CV risk factors, higher disease prevalence and higher medication intake). Of note, in males, the increased risk for future CV events of this 'poor performing' phenogroup was found to be independent of traditional CV risk factors. In both males and females, the clusters between the adequate and poor response groups were at intermediate risk with regards to CPET response, severity of CV risk profile and risk for future CV events.
As a proof of concept, our unsupervised clustering analyses illustrate that integration of CPET summary data may generate clinically useful CRF profiles that can help characterize CV health in asymptomatic individuals at risk for future CV disease. In the past, we applied a similar approach to identify echocardiography-based [18,30] and proteomebased [20] phenogroups of CV health, illustrating the potential of integrative biomarker profiling for better CV disease prediction. In addition, the clustering highlighted the CPET metrics most important for adequate discrimination of patient clusters for CV risk stratification. Indeed, the distinction between the phenogroups was predominantly based on differences in metrics of overall CRF (load peak and VO 2 /kg peak ) and peak hemodynamic response (HR peak ), and less on differences in other metrics, such as SBPpeak, RERpeak and Borg score. The non-steered preference of the phenogrouping algorithm for CRF measurements, such as VO 2 /kg peak , may explain its capability to predict CV events. Importantly, our findings demonstrated an additive value for integrative CPET profiling for CV risk stratification beyond isolated CPET metrics, as the phenogrouping itself was found to be predictive of CV events beyond highly predictive indexes, VO 2 /kg peak , and percentage predicted VO 2,peak .

Clinical Implications and Perspectives
Our approach may enable a more comprehensive, faster, and more observer-friendly way to evaluate CPET results than current approaches or it may be a complementary tool to current approaches. Future large-scale studies should further evaluate the clinical usefulness of integrative CPET profiling for CV risk stratification. In particular, studies should further unravel the extent to which CPET-based phenomapping complements or supplements current approaches for CV risk stratification. Future work should also consider how our approach can be translated to other patient populations. Validated, integrative CPET phenotyping may serve as a complementary tool for healthcare workers to improve the clinical assessment of CRF and steer clinical decision making. Overall, more research is needed to utilize the rich data originating from clinical exercise tests to the fullest. While our analyses were limited to the most common summary CPET metrics, the interpretation of the complex time series recorded during CPET may further improve assessment of CRF components and, in consequence, the risk for CV events.

Limitations and Strengths
We analyzed a rich resource of CPET data from a heterogeneous group of patients who visited a university hospital with a high standard of care. Still, medical reports remain prone to incomplete reporting of diseases and medication. Although the study sample was heterogeneous, some patient groups may have been underrepresented (selection bias), especially individuals too sick to perform a maximal CPET or too healthy to be prescribed one. Furthermore, CPET was supervised by different clinicians and physiotherapists using three devices during routine clinical practice, so caution is necessary regarding interobserver variability in CPET measurements. Yet, CPET was performed by intensively trained staff of a university hospital with a high standard of care and thorough quality controls were performed on all data, including the CPET metrics. Causality cannot be inferred from this cross-sectional setting. Lastly, the phenogrouping model produced during this study has high commercial potential, but a stringent development procedure for the software to be used as a medical device is required before it can be used as a real-life medical application.

Conclusions
CPET-based clustering by means of unsupervised machine learning algorithms identified integrative CPET profiles, stratifying a heterogeneous patient sample according to CV risk. Such phenogrouping may enable an integrative interpretation of CPET results, facilitate a more comprehensive assessment of CRF and complement risk stratification strategies in asymptomatic individuals. Future studies should further validate the clinical value of cardiopulmonary exercise response profiles for CV risk stratification in the community and in at-risk patient populations.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/diagnostics13122051/s1, A Supplemental Data File presenting: Supplemental Methods: Table S1: Cardiovascular Diseases and Surgery. Table S2: Characteristics of female patients by phenogroup. Table S3: Characteristics of male patients by phenogroup. Table S4: Multivariable-adjusted risk for cardiovascular events by peak Metabolic Equivalents of Task (METs). Figure S1. Phenogrouping heatmaps for female (left) and male (right) patients from agglomerative hierarchical biclustering analysis on CPET features. Figure S2. Selection of optimal number of clusters according to the Bayesian Information Criterion (BIC). Figure S3. Discriminative power of the CPET features used for phenogrouping by sex.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of UZ/KU Leuven (protocol code S64901, date of approval: 11 March 2021).
Informed Consent Statement: Patient consent was waived as only retrospectively collected anonymous patient data were analyzed.

Data Availability Statement:
The data presented in this study are available upon reasonable request from the corresponding author. The data are not publicly available due to privacy and ethical issues.