Tailored Risk Stratification in Severe Mitral Regurgitation and Heart Failure Using Supervised Learning Techniques

Background: Secondary mitral regurgitation (sMR) in the setting of heart failure (HF) has considerable impact on quality of life, HF rehospitalizations, and mortality. Identification of high-risk cohorts is essential to understand disease trajectories and for risk stratification.
Objectives: This study aimed to provide a structured decision tree–like approach to risk stratification in patients with severe sMR and HF.
Methods: This observational study included 1,317 patients with severe sMR from the entire HF spectrum. Clinical, echocardiographic, and laboratory data were extracted for all patients. The primary end point was all-cause mortality. Survival tree analysis, a supervised learning technique, was applied to identify patient subgroups at risk of mortality, and the analysis was further stratified by HF subtype (preserved, mildly reduced, and reduced ejection fraction).
Results: Using supervised learning (survival tree method), 8 distinct subgroups were identified that differed significantly in long-term survival. Subgroup 7, characterized by younger age (≤66 years), higher hemoglobin (>12.7 g/dL), and higher albumin levels (>40.6 g/L), had the best survival. In contrast, subgroup 5 displayed a 20-fold risk of mortality (hazard ratio: 20.38 [95% CI: 10.78-38.52]; P < 0.001) and was characterized by older age (>68 years), low serum albumin (≤40.6 g/L), and higher NT-proBNP levels (≥9,750 pg/mL). Unique subgroups were further identified for each HF subtype.
Conclusions: Supervised machine learning reveals heterogeneity in the sMR risk spectrum, highlighting the clinical variability in the population. A decision tree–like model can help identify differences in outcomes among subgroups and can help provide tailored risk stratification.

2,3 Severe sMR is specifically relevant because therapeutic strategies can improve prognosis.4,5 Adequate treatment relies on patient selection and risk stratification. Recently, 2 randomized trials have shown the importance of patient selection specifically for patients with severe sMR and reduced ejection fraction.4,6,7 Previous investigations from our group demonstrated a consistent adverse effect of severe sMR on survival in all HF subtypes (heart failure with preserved ejection fraction [HFpEF], heart failure with mildly reduced ejection fraction [HFmrEF], and heart failure with reduced ejection fraction [HFrEF]) in comparison to patients with HF and no/mild sMR. Although the effect on outcome was most pronounced in patients with HFmrEF, excess mortality was present in all HF subtypes in short- and long-term follow-up, thereby implying that features other than the HF subtype might be of central importance in the disease process.3 The high burden of comorbidities and systemic involvement suggests that severe sMR and HF are a complex clinical entity.3 The results of the COAPT and MITRA-FR studies have mainly been interpreted in terms of left ventricular (LV) size and function.4,6 These anatomical and functional substrates of sMR are important but only partially explain the prognosis in sMR.8 The impact of individual risks, systemic factors, and comorbidities is less well studied but is essential for understanding prognosis and improving personalized risk stratification, and may be helpful when identifying therapeutic targets.
We therefore aimed to investigate the association between a comprehensive and readily available set of clinical, echocardiographic, and laboratory variables and outcome in HF patients with severe sMR.
Furthermore, we aimed to provide a comprehensive and structured decision tree-like guide to risk stratification using supervised machine learning methods.
To further improve risk stratification, we also applied this structured approach to each HF subtype (HFpEF, HFmrEF, and HFrEF).

METHODS
STUDY POPULATION. This is an observational cohort study of patients with sMR and HF, seen between 2010 and 2020, who were identified from the Medical University of Vienna's longitudinal health records and the institutional echocardiography database. HF was diagnosed according to guideline criteria9 and subclassified into HF with preserved (LV ejection fraction [LVEF] ≥50%; HFpEF), mildly reduced (LVEF 40%-49%; HFmrEF), and reduced ejection fraction (LVEF <40%; HFrEF).
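The LVEF-based subclassification above can be written as a small rule. The following is a minimal sketch, not the study's code: the function name `hf_subtype` is hypothetical, and assigning non-integer values between 49% and 50% to HFmrEF is an assumption, since the text only lists integer ranges.

```python
def hf_subtype(lvef_percent: float) -> str:
    """Map LVEF (%) to the HF subtype labels used in the text."""
    if lvef_percent >= 50:
        return "HFpEF"   # preserved ejection fraction: LVEF >= 50%
    if lvef_percent >= 40:
        return "HFmrEF"  # mildly reduced: LVEF 40%-49% (49-50 assumed here)
    return "HFrEF"       # reduced ejection fraction: LVEF < 40%


labels = [hf_subtype(v) for v in (62, 45, 28)]
```

Note that LVEF alone does not establish the HF diagnosis; as stated in the text, the diagnostic algorithm also relied on signs and symptoms and natriuretic peptides.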
The diagnostic algorithm also included signs and symptoms of HF, natriuretic peptides, and echocar-
Commercially available equipment was used, and board-certified physicians interpreted the images.
Examinations and interpretation were performed as recommended by current guidelines.
First, 70% of the patients were assigned to the derivation cohort, whereas the remaining 30% served as the validation cohort, and second, the splitting procedure was stratified by HF subtype to ensure equal representation of each HF subtype. All main analyses (ie, univariate analysis, bootstrap selection, and survival tree analysis) were performed on the derivation data set, and results were then internally validated on the validation cohort. In addition, we implemented temporal validation to ensure consistency of sMR diagnosis and applicability of identified subgroups over a long period. Patients were therefore pooled into 2 cohorts stratified by inclusion year in an alternating pattern. We also applied the methodology outlined below to each HF subtype separately, with the learning sample derived from the whole study cohort. Because of sample size limitations, we chose to abstain from internal validation in the subgroup analysis (HFpEF, HFmrEF, and HFrEF).
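A stratified 70/30 split of the kind described above could look as follows. This is a sketch under stated assumptions: `stratified_split`, its parameters, and the fixed seed are hypothetical, and the study's actual sampling routine is not reproduced here. The subtype counts (331, 330, and 656) are those reported for the study population.

```python
import random
from collections import defaultdict


def stratified_split(subtypes, train_fraction=0.7, seed=0):
    """Split patient indices into derivation/validation within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, label in enumerate(subtypes):
        by_stratum[label].append(idx)

    derivation, validation = [], []
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)                        # random order within stratum
        cut = round(train_fraction * len(members))  # about 70% per stratum
        derivation.extend(members[:cut])
        validation.extend(members[cut:])
    return sorted(derivation), sorted(validation)


subtypes = ["HFpEF"] * 331 + ["HFmrEF"] * 330 + ["HFrEF"] * 656
derivation, validation = stratified_split(subtypes)
```

Because the cut is rounded within each stratum, the exact cohort sizes depend on the rounding rule and may differ by a patient from the counts reported in the study.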
STATISTICAL ANALYSIS. Continuous data are presented as median and interquartile range, and categorical data are presented as count and percent. Data were compared with the Kruskal-Wallis test for the former and the chi-square test for the latter. The results of the univariate Cox proportional hazards regression analysis were depicted in forest plots. Three models were formed, encompassing: 1) all clinical parameters; 2) all laboratory variables; and 3) all echocardiographic variables. A stepwise bootstrap resampling procedure using forward and backward selection with 500 repeats was applied to each model. This bootstrap resampling procedure2,12,13 was used to identify the variables that were most frequently included in the respective final model.
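The bookkeeping behind such a bootstrap selection can be sketched as follows. The stepwise Cox selection itself is not implemented: `select_variables` is a hypothetical stand-in for it, and `toy_selector` exists only to make the sketch runnable. The 500 repeats and the 85% retention threshold are the values used in this study.

```python
import random
from collections import Counter


def inclusion_frequency(n_patients, select_variables, n_repeats=500, seed=0):
    """Fraction of bootstrap repeats in which each variable is selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_repeats):
        # Resample patient indices with replacement (one bootstrap sample).
        sample = [rng.randrange(n_patients) for _ in range(n_patients)]
        counts.update(set(select_variables(sample)))
    return {var: counts[var] / n_repeats for var in counts}


def keep_frequent(frequencies, threshold=0.85):
    """Variables selected in at least `threshold` of all repeats."""
    return sorted(v for v, f in frequencies.items() if f >= threshold)


def toy_selector(sample):
    """Hypothetical selector: 'age' always, 'noise' only if patient 0 was drawn."""
    return ["age", "noise"] if 0 in sample else ["age"]


freq = inclusion_frequency(n_patients=50, select_variables=toy_selector)
kept = keep_frequent(freq)
```

In this toy setting the unstable variable is selected in roughly two-thirds of repeats and is therefore dropped, which is the behavior the threshold is meant to enforce.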
Variables selected in ≥85% of all repeats were kept for further analyses and used as covariates for a survival tree-based model, a form of supervised learning.14 Supervised learning refers to a form of machine learning in which the underlying data are labeled and the aim is classification of observations based on the available features.15,16 The survival tree method applies recursive partitioning and groups patients according to their survival data. The survival tree is grown by iteratively performing 2 steps at every split until a stopping criterion is met and the tree ends in its terminal leaves (ie, the final risk groups). In the first step, the variable with the highest association with mortality is selected out of all covariates; in the second step, the optimal split that maximizes the survival difference for the chosen variable is determined. The checklist is provided in Supplemental Table 2. For all analyses, a 2-sided P value <0.05 was considered statistically significant. SPSS 25 (IBM Corp) and the R software (R Core Team [2020], R: A language and environment for statistical computing, R Foundation for Statistical Computing) were used for all analyses. A detailed description of packages used is provided in Supplemental Table 1.
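The second step of the partitioning, the search for the optimal cutoff of a chosen variable, can be illustrated with a short sketch. It assumes that the survival difference is measured with a two-group log-rank statistic, which is a common choice but not necessarily the criterion of the R packages used in this study; the names `logrank_chi2`, `best_split`, and `min_leaf` are hypothetical.

```python
def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({ti for ti, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d = d_a + sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n = n_a + n_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_split(values, times, events, min_leaf):
    """Cutoff (value <= cut vs > cut) with the largest log-rank statistic."""
    best_cut, best_chi2 = None, 0.0
    for cut in sorted(set(values))[:-1]:
        left = [i for i, v in enumerate(values) if v <= cut]
        right = [i for i, v in enumerate(values) if v > cut]
        if min(len(left), len(right)) < min_leaf:
            continue  # respect the minimum leaf size
        chi2 = logrank_chi2(
            [times[i] for i in left], [events[i] for i in left],
            [times[i] for i in right], [events[i] for i in right],
        )
        if chi2 > best_chi2:
            best_cut, best_chi2 = cut, chi2
    return best_cut, best_chi2


# Toy data: marker values above 5 go with short survival (all deaths observed).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
times = [10, 11, 12, 13, 14, 1, 2, 3, 4, 5]
events = [1] * 10
cut, chi2 = best_split(values, times, events, min_leaf=2)
```

Repeating this search over all candidate covariates and recursing into each resulting child node yields a tree. The quadratic-time loops are chosen for readability rather than speed.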

BASELINE CHARACTERISTICS OF THE TOTAL STUDY POPULATION. A total of 1,317 patients (median age 71 [61-78] years) with severe sMR and HF were included in this observational study, 60% of whom were male.

Heitzinger et al. Risk Stratification in Severe Secondary Mitral Regurgitation and Heart Failure. August 2022:100063

Among those, 331 had HFpEF, 330 had HFmrEF, and 656 had HFrEF. A comprehensive overview of baseline characteristics for all HF subtypes is presented in Table 1. Overall comorbidity burden was high, as almost half of the patients had a history of coronary artery disease (CAD) (n = 641, 49%), 677 (51%) had hypertension, and diabetes was present in 276 (21%). LV end-diastolic diameter (P < 0.001) and right ventricular end-diastolic diameter (P < 0.001) both increased with worsening systolic function; however, atrial dimensions did not significantly differ between HF subtypes (Table 1). These data are presented in Supplemental Table 3.

Intraclass correlation coefficient and Kappa statistics showed excellent correlations in accordance with recommendations from the American Society of Echocardiography,19 and in addition, Bland-Altman plots (Supplemental Figure 1) show good agreement
In the first step, the association between all recorded parameters and outcome was investigated by univariate Cox regression analysis (Supplemental Figure 2).
We then used a bootstrap resampling procedure to identify the most comprehensive set of variables associated with adverse outcome among: 1) all clinical parameters; 2) all echocardiographic variables; and 3) all laboratory variables (Supplemental Figure 3).
Within the clinical parameters, age and peripheral artery disease (PAD) were selected most frequently.
In the echocardiographic model, LV function and LV end-diastolic diameter were the most significant predictors, and among laboratory variables, blood urea nitrogen (BUN), creatinine, and albumin were found to be the most frequent predictors.
Values are HR (95% CI) unless otherwise indicated. BUN = blood urea nitrogen (mg/dL); HB = hemoglobin (g/dL); NT-proBNP = N-terminal pro brain-type natriuretic peptide (pg/mL).
analysis, depicted in Table 2, further supported the robustness of these results. To verify the applicability of our 8 risk groups on long-term data, temporal validation was performed. These results are depicted in Supplemental Figure 5 and show overall good consistency of subgroups.

PREDICTORS OF MORTALITY AND SURVIVAL AMONG HF SUBTYPES IN THE TOTAL STUDY COHORT. Univariate Cox regression analysis for each clinical, echocardiographic, and laboratory predictor according to HF subtype is depicted in Supplemental Figure 6. Supplemental Figure 7 shows the results of stepwise bootstrap selection.
In HFpEF, patients' age and female sex were selected most frequently in the clinical model, and bilirubin, BUN, cholesterol, and albumin were selected most frequently in the laboratory model (Supplemental Figure 7). The survival tree analysis (Supplemental Figure 8) identified 3 separate subgroups. HFpEF patients with BUN >24.7 mg/dL had the lowest survival of 31% at 6 years. A further split was selected with cholesterol, where subgroup 2 with higher total cholesterol and better kidney function had the lowest risk of mortality. The details are shown in the Kaplan-Meier curves and corresponding HRs.
In HFmrEF, the most frequently selected variables were atrial fibrillation, body mass index, LV end-diastolic diameter, albumin, and aspartate transaminase (Supplemental Figure 7). In survival tree analysis, HFmrEF subgroup 1, defined by hypoalbuminemia (≤35.2 g/L), had the worst survival at 25% after 6 years. The remaining subgroups were stratified according to age and presence of atrial fibrillation. Subgroup 2 with younger patients and no hypoalbuminemia had the best survival. The survival tree is visualized in Supplemental Figure 9, and the risk of mortality for each subgroup is presented in Table 3. In comparison to subgroup 2, patients with hypoalbuminemia in subgroup 1 had a significantly higher mortality (HR: 5.7 [95% CI: 3.05-10.66]).
In HFrEF, age and PAD were the most frequently selected clinical variables, and LV end-diastolic diameter was the most frequently selected echocardiographic variable. NT-proBNP was the most frequently selected laboratory variable, followed by aspartate transaminase, hemoglobin, and BUN. The survival tree in HFrEF patients was mainly shaped by NT-proBNP. Subgroup 1, consisting of younger patients with lower NT-proBNP levels, had the most favorable survival prognosis. Mortality was highest in older patients with excessive NT-proBNP levels in subgroup 6. The partitioning algorithm also determined additional subgroups, which were defined by renal function and hemoglobin levels. Apart from subgroup 1, survival was severely impaired in all patients (Supplemental Figure 10). In reference to subgroup 1, the risk of mortality was increased nearly 9-fold in subgroup 6 (HR: 9.34 [95% CI: 6.21-14.06]; P < 0.001). HRs for the corresponding subgroups are in Table 3.
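Reported HRs and their CIs can be checked for internal consistency with a few lines of arithmetic. The sketch below assumes a Wald-type interval that is symmetric on the log scale, as is usual for Cox models but not stated explicitly in the text; the function names are hypothetical. Under this assumption, the point estimate equals the geometric mean of the CI limits.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a 2-sided 95% CI


def log_hr_standard_error(ci_low, ci_high):
    """Standard error of log(HR) recovered from a 95% Wald interval."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)


def geometric_midpoint(ci_low, ci_high):
    """Point estimate implied by a log-symmetric interval."""
    return math.sqrt(ci_low * ci_high)


def wald_z(hr, ci_low, ci_high):
    """Wald z statistic for the null hypothesis HR = 1."""
    return math.log(hr) / log_hr_standard_error(ci_low, ci_high)


# HFrEF subgroup 6 versus subgroup 1, as reported in the text.
midpoint = geometric_midpoint(6.21, 14.06)
z_value = wald_z(9.34, 6.21, 14.06)
```

For this comparison, the implied midpoint agrees with the reported HR of 9.34 to 2 decimal places, and the z statistic lies well above 3.29, the threshold corresponding to a 2-sided P value of 0.001.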

DISCUSSION
In this study, we performed a unique large-scale in-depth risk analysis in patients with sMR and HF using a supervised machine learning approach. We could identify high-risk subgroups with a 20-fold risk of mortality compared with the group with the most favorable survival, which served as the reference. This analysis highlights that sMR is not a simple lesion but a complex syndrome with multifaceted comorbidities, diverse anatomic substrates, and variable systemic involvement. The term covers a broad patient spectrum, ranging from patients who are otherwise nearly healthy to those with a disproportionately high risk of mortality (Figure 2).
This heterogenous spectrum of sMR needs to be considered, and a tailored approach to risk stratification may be a valuable tool to guide therapeutic decisions.

RISK FACTORS FOR MORTALITY IN PATIENTS WITH SEVERE sMR AND HF. The comprehensive set of clinical, echocardiographic, and laboratory parameters in this study covers a broad range of factors and includes patients with multiorgan involvement.
Among the most important predictors in our survival trees were BUN, albumin, hemoglobin, and NT-proBNP. Previous studies have shown the predictive role of these laboratory parameters in HF patients.
Low serum albumin was found to be associated with adverse cardiovascular outcomes in both HFrEF and HFpEF,19,20 although it remains unclear whether hypoalbuminemia is the result of impaired liver function or of hemodilution from volume overload. In severe sMR, it is probable that the latter is the case, as liver function may not yet be severely impaired. Similar to other HF cohorts,21 we found low hemoglobin to be associated with mortality. Anemia secondary to hemodilution is frequent in chronic HF and has a worse prognosis than true anemia.22,23 BUN has repeatedly been shown to be a powerful predictor of mortality in HF.24-26 In addition, some studies even report superiority to creatinine-based markers of renal function.25 In the setting of severe sMR, increases in BUN may be interpreted as a marker of forward failure. Renal malperfusion due to reduced cardiac output in severe sMR stimulates neurohumoral response and vasoconstriction in the afferent arterioles, leading to increased water and urea absorption, thereby increasing BUN.26 With regard to echocardiographic variables, markers of LV function and LV dimension were selected by the bootstrap procedure as the most relevant predictors
diagnosis and treatment planning. In the setting of concomitant severe sMR, however, we previously found that the HF subtype does not add prognostic information.3 Therefore, the current analysis helps improve our understanding of this condition by identifying other variables that impact outcome.
Nevertheless, substratification into HF subtypes still holds relevance even in sMR, as survival tree analysis revealed distinct risk features of HF subtypes, allowing for a personal and individual risk stratification (Figure 3).
sMR in HF, and therefore are not generalizable to all HF patients. Also, we chose to use a single survival tree over "bagged" decision trees. Although this may affect the predictive accuracy, model interpretability is higher, and structured decision trees are more feasible to apply in everyday clinical practice.

CONCLUSIONS
Machine learning (survival tree analysis) is helpful to highlight the complex and heterogenous outcomes in patients with severe sMR and HF. Using this approach, subgroups of patients with disproportionately high mortality can be identified. This phenotype-driven approach may help with personalized risk stratification and the identification of targets for optimizing patient care.

FUNDING SUPPORT AND AUTHOR DISCLOSURES
In our case, the stopping criterion was a minimum of 95 patients in the terminal leaves to ensure a large enough sample size for subsequent Kaplan-Meier analysis while still retaining sufficient and clinically relevant subgroup differentiation. The minimum criterion for a node split was defined as P < 0.05. The Central Illustration provides a visual reproduction of the methodology used. In a final step, we used Kaplan-Meier and univariate Cox proportional hazards regression analysis to further investigate each identified subgroup. All analyses were planned and performed in accordance with the Proposed Requirements for Cardiovascular Imaging-Related Machine Learning Evaluation (PRIME): A Checklist, recently published by Dr Sengupta et al.
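The two stopping rules stated above (at least 95 patients per terminal leaf and P < 0.05 for a node split) can be expressed as a simple check. This sketch assumes that the split statistic follows a chi-square distribution with 1 degree of freedom, which may differ from the test used by the study's software; `may_split` and `chi2_1df_p_value` are hypothetical names.

```python
import math


def chi2_1df_p_value(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def may_split(n_left, n_right, chi2, min_leaf=95, alpha=0.05):
    """Allow a split only if both leaves are large enough and P < alpha."""
    if min(n_left, n_right) < min_leaf:
        return False  # a terminal leaf would fall below the minimum size
    return chi2_1df_p_value(chi2) < alpha


decisions = [
    may_split(100, 120, chi2=6.0),   # large leaves, significant split
    may_split(94, 300, chi2=20.0),   # one leaf too small
    may_split(100, 100, chi2=2.0),   # not significant
]
```

A larger minimum leaf size trades subgroup granularity for more stable Kaplan-Meier estimates in each terminal leaf.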

CENTRAL ILLUSTRATION Supervised Learning Techniques Reveal the Heterogeneity in Severe Secondary Mitral Regurgitation
Heitzinger G, et al. JACC Adv. 2022;1(3):100063.
The study included 1,317 patients with severe secondary mitral regurgitation and heart failure. Clinical, echocardiographic, and laboratory parameters were used to predict mortality in the derivation cohort. The most relevant predictors were investigated further using supervised learning techniques and 8 distinct subgroups could be identified, revealing a heterogenous risk spectrum in patients with severe secondary mitral regurgitation and heart failure. Additional analysis demonstrated unique risk factor profiles for each heart failure subtype. AFIB = atrial fibrillation; BUN = blood urea nitrogen; HB = hemoglobin; HFmrEF = heart failure with mildly reduced ejection fraction; HFpEF = heart failure with preserved ejection fraction; HFrEF = heart failure with reduced ejection fraction; NT-proBNP = N-terminal pro brain-type natriuretic peptide.
of vena contracta width measurements and indicate no systemic bias. Moreover, the raw data is presented in Supplemental Table 5.
DERIVATION AND VALIDATION COHORT CHARACTERISTICS. Conditional random sampling of individual patient data allocated 923 patients to the derivation cohort and 394 to the validation data set. Accordingly, strata were equally weighted in the initial and the split data sets (25% HFpEF, 25% HFmrEF, and 50% HFrEF). Baseline characteristics according to the respective cohort are presented in Supplemental Table 4 and showed equal data distribution without evidence of systematic bias.
RISK PHENOTYPING OF PATIENTS WITH SEVERE sMR AND HF IN THE DERIVATION COHORT. The median survival time of patients in the derivation cohort was 5.4 years, in which 371 patients died.

SURVIVAL TREE-BASED MODEL FOR PATIENTS WITH sMR AND HF IN THE DERIVATION COHORT. All prior selected variables were further analyzed by recursive partitioning, which yielded 8 distinct risk groups that differed significantly in their long-term survival. Predictors, cutoffs, and the corresponding Kaplan-Meier curve for each subgroup are depicted in Figure 1. Patients with the most favorable survival were allocated to subgroup 7, characterized by several splits. These include a younger age (≤66 years), hemoglobin >12.7 g/dL, and albumin >40.6 g/L and resulted in an estimated survival of 97% after 1 year and 85% after 6 years. In comparison, the patient subset with the highest mortality was subgroup 5, consisting of older patients (age >68 years) with hypoalbuminemia (≤40.6 g/L) and highly elevated NT-proBNP (9,570 pg/mL), with an estimated survival of 48% after 1 year and 11% after 6 years. The remaining subgroups and their respective cutoffs are visualized using a survival tree in Figure 1. Survival analysis by Kaplan-Meier estimates is presented in Figure 2 and color coded according to survival. With subgroup 7 as

FIGURE 1 Survival Tree Analysis for Patients With Severe Secondary Mitral Regurgitation and Heart Failure (Derivation Cohort)
IN THE VALIDATION COHORT. Subgroups with respective cutoffs identified by survival tree analysis based on the derivation cohort were validated on the remaining 394 patients. Kaplan-Meier analysis (Supplemental Figure 4) yielded consistent results and showed similar differentiation of subgroups. Cox proportional hazard

FIGURE 2 Kaplan-Meier Survival Curves Stratified According to Supervised Learning Derived Subgroups (Derivation Cohort)
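The survival percentages quoted for each subgroup are Kaplan-Meier estimates. For readers unfamiliar with the estimator, a minimal sketch follows; the function names and the toy follow-up data are hypothetical and unrelated to the study data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event=1, censored=0)."""
    curve, survival = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1.0 - deaths / at_risk  # product-limit update
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    estimate = 1.0
    for event_time, s in curve:
        if event_time > t:
            break
        estimate = s
    return estimate


# Toy follow-up in years; 0 marks a censored patient.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored patients contribute to the number at risk up to their last follow-up but never count as deaths, which is why the estimate can differ from a simple proportion of survivors.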

FIGURE 3 Kaplan-Meier Survival Curves Stratified According to Heart Failure Subtypes (Subgroup Analysis)
The specific strengths of this study included: 1) a comprehensive diagnosis of HF and the specific subtypes according to societal guideline criteria; 2) detailed echocardiographic diagnostics of sMR etiology and severity; 3) risk phenotyping covering all HF subtypes as well as a broad spectrum of anatomic substrates, systemic involvement, and comorbid conditions; and 4) consistent results on internal and temporal validation, highlighting the reliability of the findings. Although the present data provide the most comprehensive contemporary information about risk stratification in sMR and HF, external validation might improve applicability in clinical practice. Despite the available clinical, echocardiographic, and laboratory information, even more data (such as detailed medical therapy, information on secondary end points, and incorporation of other imaging modalities) could further improve individual risk estimation. Our data provide results for a very specific patient population, namely, severe

This work was supported by a grant of the Austrian Science Fund (FWF identification number: KLI-818B). The authors have reported that they have no relationships relevant to the contents of this paper to disclose.
ADDRESS FOR CORRESPONDENCE: Dr Philipp E. Bartko, Department of Internal Medicine II, Medical University of Vienna, Waehringer Guertel 18-20, Vienna A-1090, Austria. E-mail: philippemanuel.bartko@meduniwien.ac.at.
PERSPECTIVES
COMPETENCY IN PATIENT CARE AND PROCEDURAL SKILLS: In patients with severe sMR and HF, a heterogenous risk spectrum was identified by supervised learning techniques. Patients with younger age, better renal function, and higher hemoglobin values had the most favorable survival, whereas older patients with low serum albumin and higher NT-proBNP values experienced a 20-fold risk increase in mortality.
TRANSLATIONAL OUTLOOK: Further studies are needed to refine the therapeutic management for sMR in every HF subtype, taking into account the complex underlying heterogeneity in this population.

TABLE 2
Subgroups Identified by Survival Tree Analysis in the Derivation and Validation Cohorts

TABLE 3
Subgroups Identified by Survival Tree Analysis According to
1 While echocardiographic variables will continue to be important, this study highlights that, in a survival tree model, laboratory and clinical variables provide better differentiation of subgroups for all-cause mortality.
RISK STRATIFICATION IN HF SUBTYPES. Classification of HF patients into subtypes is essential for