Reliability of EuroSCORE II on Prediction of Thirty-Day Mortality and Long-Term Results in Patients Treated with Sutureless Valves

Background: EuroSCORE II (ES2) is a reliable tool for predicting preoperative mortality risk in cardiac surgery; however, patient age, the weight of the surgical procedure and the new devices available may cause its accuracy to drift. We sought to investigate the performance of ES2 in estimating operative risk and late mortality in patients who underwent aortic valve replacement (AVR) with sutureless valves. Methods: Between 2012 and 2021, a total of 1126 patients with isolated aortic stenosis who underwent surgical AVR with sutureless valves were retrospectively collected from six European centers. Patients were stratified into three groups according to the EuroSCORE II risk classes (ES2 < 4%, ES2 4–8% and ES2 > 8%). The accuracy of ES2 in estimating mortality risk was assessed using the standardized mortality ratio (observed/expected, O/E ratio), the area under the ROC curve (AUC) and the Hosmer–Lemeshow (HL) goodness-of-fit test. Results: The overall observed mortality was 3.0% (ES2-predicted mortality: 5.39%), with an O/E ratio of 0.64 (confidence interval (CI): 0.49–0.89). In our population, ES2 showed moderate discriminating power (AUC 0.65, 95%CI 0.56–0.72, p < 0.001; HL p = 0.798). Good accuracy was found in patients with ES2 < 4% (O/E ratio 0.54, 95%CI 0.23–1.20, AUC 0.75, p < 0.001, HL p = 0.999) and in patients aged < 75 years (O/E ratio 0.98, 95%CI 0.45–1.96, AUC 0.76, p = 0.004, HL p = 0.762). Moderate discrimination was observed for ES2 in the estimation of long-term mortality risk (AUC 0.64, 95%CI: 0.60–0.68, p < 0.001). Conclusions: EuroSCORE II showed good accuracy in patients aged < 75 years and in patients with ES2 < 4%, while overestimating risk in the other subgroups. A recalibration of the model should be considered, given the complexity of contemporary patients and the impact of new technologies.


Introduction
Results of transcatheter aortic valve implantation (TAVI) have dramatically changed the treatment of degenerative aortic stenosis [1][2][3]. Consequently, a careful preoperative risk assessment is now required to establish the best treatment option between surgery and TAVI [4]. In this setting, surgical risk scores are useful tools for preoperative risk prediction and decision-making [5,6]. EuroSCORE II (ES2) is an easy, user-friendly risk score for predicting operative mortality [7,8]; however, improvements in clinical practice, the spread of minimally invasive surgical approaches and the increasing burden of patients' comorbidities [9][10][11] may cause its predictive accuracy to drift. In particular, patient age has been shown to potentially jeopardize the reliability and predictive power of ES2 [12]: its calibration decreases in patients over 70 years at intermediate to high surgical risk [13], and it tends to overestimate risk in octogenarians [14].
The introduction of sutureless aortic valves (Su) into the surgical armamentarium for patients undergoing aortic valve replacement (AVR) has reduced operative risks and times [15,16], with excellent safety profiles observed even in elderly patients at intermediate and high surgical risk [17,18]. However, the use of such new devices may represent an additional confounding factor when ES2 is used to predict preoperative risk. Indeed, surgical risk scores do not take into account the use of new technologies, such as Su-AVR and TAVI, either for their advantages or for the potential onset of device-related complications, such as paravalvular leak (PVL) and permanent pacemaker implantation, which are well-known independent predictors of long-term survival.
In this study, we sought to investigate the performance of EuroSCORE II in predicting early and long-term mortality across different surgical risk classes in a large European dataset of patients who underwent Su-AVR.

Materials and Methods
Data from a total of 1126 consecutive patients who had undergone isolated Su-AVR with the sutureless Perceval S valve (CorCym Srl, Saluggia, Italy) were retrospectively collected between January 2012 and December 2021 from six European centers. Patients who had undergone combined procedures were excluded from this study.
Patients were judged unsuitable for Perceval implantation if the ratio between the sinotubular junction (STJ) diameter and the aortic annulus diameter was over 1.3. Measurements of the STJ, the annulus and the distance between the annulus and the STJ were obtained with ECG-gated angio-CT scans or transthoracic echocardiography.
Sievers type I bicuspid aortic valve was considered a relative contraindication, while patients with bicuspid valve Sievers type 0 (true bicuspid without raphe) were excluded.
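The anatomical screening rules above can be expressed as a small decision function. The following is a minimal Python sketch, assuming diameters in millimetres and encoding only the criteria stated in the text; the function name `perceval_screening` and the handling of other valve morphologies (returned as eligible) are illustrative assumptions, not the centers' actual screening protocol.

```python
from typing import Optional


def perceval_screening(stj_mm: float, annulus_mm: float,
                       sievers_type: Optional[int] = None) -> str:
    """Hypothetical screening helper encoding the criteria stated in the text.

    stj_mm, annulus_mm: sinotubular junction and annulus diameters (mm).
    sievers_type: None for a tricuspid valve, otherwise the Sievers type of a
    bicuspid valve (0 = true bicuspid without raphe, 1 = one raphe, ...).
    """
    if annulus_mm <= 0:
        raise ValueError("annulus diameter must be positive")
    # STJ/annulus ratio above 1.3 -> unsuitable for Perceval implantation
    if stj_mm / annulus_mm > 1.3:
        return "excluded"
    # Sievers type 0 bicuspid valves were excluded
    if sievers_type == 0:
        return "excluded"
    # Sievers type I was a relative contraindication
    if sievers_type == 1:
        return "relative contraindication"
    # Assumption: any other case is treated as eligible in this sketch
    return "eligible"
```

For example, `perceval_screening(28.0, 23.0)` returns `"eligible"` because 28/23 is about 1.22, below the 1.3 threshold.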
The primary outcomes were 30-day mortality and EuroSCORE II calibration for mortality prediction. Secondary outcomes were long-term mortality, major adverse cardiac and cerebrovascular events (MACCE) at follow-up, EuroSCORE II performance in predicting mortality at follow-up, and identification of independent predictors of mortality at follow-up.

Statistical Analysis
The normality of the distribution of continuous variables was tested with the Kolmogorov-Smirnov test. Normally distributed data were described as mean with standard deviation, while non-normally distributed data were described as median with interquartile range. ANOVA and the Kruskal-Wallis test were used for intergroup comparisons of normally and non-normally distributed data, respectively. Categorical variables were expressed as frequency and percentage and compared with the Pearson chi-square test, or with Fisher's exact test when the expected frequency in one or more cells was less than 5.
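The rule for choosing between the Pearson chi-square test and Fisher's exact test depends only on the expected cell counts under independence (row total × column total / N). The plain-Python sketch below is a hypothetical helper that computes those counts, applies the < 5 threshold described above and returns the Pearson statistic, with a p-value only for 2×2 tables via the 1-degree-of-freedom identity P(χ² > x) = erfc(√(x/2)). It does not compute Fisher's exact p-value itself; the analyses in this study were run in SPSS.

```python
import math
from typing import List, Optional, Tuple


def expected_counts(table: List[List[int]]) -> List[List[float]]:
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_categorical_test(table: List[List[int]]) -> Tuple[str, float, Optional[float]]:
    """Return (test name, Pearson chi-square statistic, p-value or None).

    Fisher's exact test is selected when any expected count is below 5;
    the p-value is only computed here for 2x2 tables (1 degree of freedom).
    """
    exp = expected_counts(table)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))
    test = "fisher" if any(e < 5 for row in exp for e in row) else "chi-square"
    is_2x2 = len(table) == 2 and all(len(row) == 2 for row in table)
    p = math.erfc(math.sqrt(chi2 / 2)) if is_2x2 else None
    return test, chi2, p
```

With the hypothetical table `[[1, 9], [8, 2]]`, two expected counts are 4.5, so the helper selects Fisher's exact test.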
The standardized mortality ratio (observed/expected ratio, O/E ratio) was calculated at 30 days with its 95% confidence interval (CI).
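The O/E ratio is the number of observed 30-day deaths divided by the number of expected deaths, the latter being the sum of the individual EuroSCORE II probabilities. The text does not state how the 95% CI was derived; the sketch below assumes Byar's approximation to the exact Poisson interval for the observed count, a common choice for standardized mortality ratios. The helper names, and the assignment of ES2 values of exactly 4% and 8% to the intermediate class, are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def oe_ratio(observed: int, expected: float, z: float = Z_95) -> Tuple[float, float, float]:
    """O/E ratio with an approximate CI (Byar's approximation, an assumed method)."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    o = observed
    lower = 0.0 if o == 0 else o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3
    upper = (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * math.sqrt(o + 1))) ** 3
    return o / expected, lower / expected, upper / expected


def risk_class(es2_percent: float) -> str:
    """ES2 classes used in the study (< 4%, 4-8%, > 8%); boundary handling assumed."""
    if es2_percent < 4:
        return "low"
    if es2_percent <= 8:
        return "intermediate"
    return "high"


def oe_by_class(es2_percent: List[float], died_30d: List[int]) -> Dict[str, Tuple[float, float, float]]:
    """Per-class O/E: expected deaths are the sum of ES2 probabilities (percent / 100)."""
    groups: Dict[str, List[Tuple[float, int]]] = {}
    for p, d in zip(es2_percent, died_30d):
        groups.setdefault(risk_class(p), []).append((p / 100, d))
    return {cls: oe_ratio(sum(d for _, d in rows), sum(p for p, _ in rows))
            for cls, rows in groups.items()}
```

An upper CI limit below 1 would indicate significant overestimation of risk by the score, which is how the intermediate- and high-risk O/E ratios are interpreted in the Discussion.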
Kaplan-Meier curves for late overall mortality, cardiac-related mortality and MACCE were plotted with 95%CI for the three groups and compared using the log-rank test. Univariable and multivariable Cox regression analyses were used to investigate the effect of preoperative characteristics on survival.
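The Kaplan-Meier curves rely on the product-limit estimator, in which survival is multiplied by (1 − d/n) at each time with d deaths among n patients still at risk. A minimal pure-Python sketch follows; it assumes follow-up times in a single unit and an event indicator of 1 for death, and it omits the 95% CI and the log-rank comparison reported in the study.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    Returns (event time, survival just after that time) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Patients censored at t are still counted at risk at t, then leave.
        n_at_risk -= removed
    return curve
```

Survival only drops at death times; censored patients shrink the risk set for later times without changing the curve at the moment of censoring.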
The discriminatory capacity of EuroSCORE II for 30-day mortality was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC). Calibration refers to the agreement between observed and predicted in-hospital mortality. Overall model calibration was assessed by comparing observed and predicted mortality in 10 equally sized subgroups ordered by increasing patient risk, using the Hosmer-Lemeshow (HL) goodness-of-fit test. An HL p-value > 0.05 was taken to indicate no significant lack of fit, i.e., adequate calibration in the study population.
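Both metrics in this paragraph can be computed from the predicted ES2 probabilities and the observed outcomes. In the sketch below, the AUC is computed as the concordance probability (equivalent to the Mann-Whitney formulation), and the HL statistic sums (O − E)² / [E(1 − E/n)] over 10 risk-ordered groups and is compared with a χ² distribution on 8 degrees of freedom. The closed-form tail probability used here is valid only for an even number of degrees of freedom; the helper names are hypothetical, and SPSS may form the groups slightly differently (e.g., with tied risks).

```python
import math
from typing import List, Tuple


def auc(pred: List[float], outcome: List[int]) -> float:
    """P(random death has higher predicted risk than random survivor); ties = 1/2."""
    pos = [p for p, y in zip(pred, outcome) if y == 1]
    neg = [p for p, y in zip(pred, outcome) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    score = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return score / (len(pos) * len(neg))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even number of df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(pred: List[float], outcome: List[int], groups: int = 10) -> Tuple[float, float]:
    """HL statistic over risk-ordered groups of (nearly) equal size, df = groups - 2."""
    data = sorted(zip(pred, outcome), key=lambda t: t[0])  # stable for tied risks
    n = len(data)
    stat = 0.0
    for g in range(groups):
        chunk = data[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        variance = expected * (1 - expected / len(chunk))
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat, chi2_sf_even_df(stat, groups - 2)
```

With the default 10 groups the p-value is taken on 8 degrees of freedom; a value above 0.05 is read, as in the text, as no significant lack of fit.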
Statistical analyses were performed with SPSS version 26.0 (SPSS Inc., Chicago, IL, USA). A p-value ≤ 0.05 was considered statistically significant.

Results
Patients' baseline characteristics differed significantly between the three risk groups. Specifically, age, gender, body surface area (BSA), dyslipidemia, diabetes, previous atrial fibrillation, peripheral artery disease, previous CABG, previous stroke, GFR < 30 mL/min, low ejection fraction, NYHA class III/IV, mitral regurgitation ≥ II, COPD and echocardiographic data differed significantly between groups, reflecting the increase in patient complexity with increasing risk profile (Table S1).
Binary logistic regression showed that age over 75 years did not increase the risk of 30-day mortality (odds ratio [OR] 0.66, 95%CI 0.32-1.34, p = 0.256), while mitral regurgitation of grade ≥ II was identified as a predictor of mortality (OR 2.56, 95%CI 1.04-5.72, p = 0.040). The Hosmer-Lemeshow goodness-of-fit test revealed acceptable calibration in the overall population and in patients younger than 75 years (p = 0.399 and p = 0.534, respectively) (Figure 3A,B) (Table 2B).
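For a single binary predictor, such as age over 75 years, the unadjusted logistic-regression odds ratio equals the cross-product ratio of the 2×2 table, and its Wald CI equals Woolf's logarithmic interval. The sketch below illustrates that arithmetic under the assumption of an unadjusted model; the text does not say whether the reported ORs were adjusted, multivariable estimates require fitting the full model, and the function name and any cell counts used with it are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.959964) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (Wald) 95% CI from a 2x2 table.

    a = exposed deaths,   b = exposed survivors,
    c = unexposed deaths, d = unexposed survivors.
    Matches an unadjusted single-binary-predictor logistic model only.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf CI undefined (consider a continuity correction)")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

The interval is symmetric on the log scale, which is why reported OR intervals such as 0.32-1.34 look asymmetric around the point estimate.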

The results of the Cox regression analysis for all-cause death are presented in Table 3. At univariable analysis, EuroSCORE II > 8%, male gender, preoperative atrial fibrillation and advanced NYHA class (III-IV) were associated with all-cause death. After multivariable adjustment, only EuroSCORE II > 8% (HR: 1.82, 95%CI: 1.29-2.56, p < 0.001) and preoperative atrial fibrillation (HR: 1.74, 95%CI: 1.14-2.65, p < 0.009) were identified as independent predictors of all-cause death. Figure 5 shows a continuous relationship between EuroSCORE II and the HR for mortality based on restricted cubic spline models. The receiver operating characteristic curve of EuroSCORE II for mortality at follow-up showed an AUC of 0.64 (95%CI: 0.60-0.68, p < 0.001). ROC analysis identified a EuroSCORE II value of 6.9% as the optimal cutoff point for the prediction of mortality (sensitivity 0.62, 1-specificity 0.36).
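Two techniques from this paragraph are sketched below. The optimal ROC cutoff is assumed here to be the threshold that maximizes Youden's J (sensitivity − (1 − specificity)), since the text does not name the criterion. The spline curves in Figure 5 rely on a restricted cubic spline basis, shown in Harrell's parameterization, which is constrained to be linear beyond the outer knots. Knot positions and function names are hypothetical; in practice the basis columns would be passed to a Cox model fitted in statistical software.

```python
from typing import List, Tuple


def youden_cutoff(pred: List[float], outcome: List[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, 1 - specificity) maximizing Youden's J.

    Every observed score is tried as a threshold; score >= cutoff is positive.
    """
    n_pos = sum(outcome)
    n_neg = len(outcome) - n_pos
    best, best_j = (float("nan"), 0.0, 0.0), float("-inf")
    for cut in sorted(set(pred)):
        tp = sum(1 for p, y in zip(pred, outcome) if p >= cut and y == 1)
        fp = sum(1 for p, y in zip(pred, outcome) if p >= cut and y == 0)
        sens, fpr = tp / n_pos, fp / n_neg
        if sens - fpr > best_j:
            best_j, best = sens - fpr, (cut, sens, fpr)
    return best


def rcs_basis(x: float, knots: List[float]) -> List[float]:
    """Restricted cubic spline basis: x plus k - 2 nonlinear terms (Harrell)."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least three knots")

    def pos3(u: float) -> float:
        return max(u, 0.0) ** 3  # truncated cubic (u)+^3

    scale = (t[-1] - t[0]) ** 2  # keeps nonlinear terms on a scale comparable to x
    span = t[k - 1] - t[k - 2]
    terms = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / span
                + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / span)
        terms.append(term / scale)
    return terms
```

The nonlinear terms are zero below the first knot and, by construction, their cubic and quadratic parts cancel beyond the last knot, so the fitted HR curve becomes linear in the tails where data are sparse.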

Discussion
This multi-institutional study represented a "real-world" experience evaluating the performance of EuroSCORE II in predicting 30-day and long-term mortality on a large population of patients undergoing Su-AVR.
The major findings of our study are as follows:
1. EuroSCORE II overestimated the risk of mortality in patients who underwent Su-AVR, with an O/E ratio of 0.64 (95% CI: 0.49-0.89);
2. The ROC curve tested on the whole population showed acceptable accuracy of EuroSCORE II in estimating 30-day mortality, with an AUC of 0.65;
3. In the low-risk subgroup (L-ES2, ES2 < 4%), EuroSCORE II had good predictive power, with an O/E ratio of 0.98 (0.45-1.96);
4. In patients younger than 75 years, the ROC curve showed good estimation performance for 30-day mortality, with an AUC of 0.75, while suboptimal results were observed in patients older than 75 years (AUC 0.61);
5. The ROC curve showed barely acceptable performance when EuroSCORE II was used to estimate the risk of mortality at 60-month follow-up (AUC 0.64).
The performance of EuroSCORE II has been extensively evaluated in several studies, which reported an improvement in risk assessment compared with the previous logistic and additive versions of EuroSCORE I [7,8]. Nevertheless, these validation studies were mostly performed on relatively young patients and do not reflect the current landscape [5].
Increased life expectancy has led to a progressive increase in the median age of patients requiring cardiac intervention. However, the increase in patients' complexity has not led to a significant increase in operative mortality, mainly owing to improvements in surgical, anesthesiologic and perioperative care [9,10]. As a result, risk stratification by means of appropriate scores is considered mandatory to establish the best therapeutic strategy, especially in elderly patients with aortic stenosis. The choice between a transcatheter and a surgical approach strictly depends on the preoperative risk assessment [19,20].
It has been shown that age, comorbidities and the weight of the planned surgical procedure might influence the discriminating capability of such risk scores. In this setting, aortic valve replacement procedures, when weighted against other surgical procedures, have been reported to reduce the discrimination of EuroSCORE II [9][10][11][12].
The present results showed that the observed mortality in the whole population was lower than the expected mortality calculated with EuroSCORE II (observed 3.02% vs. expected 5.39%), with a significant overestimation of risk in the intermediate-risk (O/E ratio: 0.61, 95%CI: 0.37-0.96) and high-risk (O/E ratio: 0.3, 95%CI: 0.11-0.65) groups. Onorati et al. already reported this tendency toward overestimation, with observed mortality lower than expected in both intermediate-risk (observed 3.0% vs. expected 5.8%) and high-risk (observed 2.1% vs. expected 15.4%) patients [21]. Similarly, Howell and colleagues reported the low discriminatory power of EuroSCORE II in predicting mortality in high-risk patients, with an AUC of 0.67 and a failed calibration test [10]. A similar trend was observed in TAVI patients, in whom EuroSCORE II overestimated the observed mortality, reinforcing the concept that minimally invasive procedures may influence mortality and suggesting that the effect of the weight of the intervention on predictive power should be tested [22,23].
Several studies demonstrated that elderly patients represent a special population in cardiac surgery.Provenchère et al. found low calibration of EuroSCORE II in octogenarian patients undergoing cardiac surgery [14], and Poullis et al. reported an AUC below 0.7 in patients > 70 years old receiving AVR [13].Consistent with these previous results, the present study reported good predictive power in patients younger than 75 years old (AUC 0.75, p < 0.001), while the AUC decreased in elderly patients with an overestimation of risk [13,14].
Furthermore, in the subgroup of younger patients (age < 75 years), we found fair calibration on the Hosmer-Lemeshow goodness-of-fit test, similar to previous EuroSCORE II validation results [7,8], suggesting that age has a relevant impact on the performance of the risk calculator. This finding could be explained by the relative under-representation of elderly patients in the EuroSCORE II validation dataset [5]. Since patient age is a discriminant in the decision-making process for the treatment of severe aortic stenosis, and the latest VHD guidelines recommend TAVI in patients over 75 years, our results call for caution. Moreover, EuroSCORE II does not take into account some anatomical (aortic calcification, aortic annulus dimensions) and biological (frailty, organ function) characteristics that, if included, could increase its accuracy. A recalibration or update of EuroSCORE II should be seriously considered. Currently, the best way to guide clinical practice is the Heart Team approach [24]. In this field, machine learning has been shown to be a valid option for improving decision-making in the evaluation of preoperative risk for aortic valve replacement [25].
According to our findings, the evaluation of a surgical risk score should not be limited to short-term/30-day mortality but should also consider its ability to properly estimate mid- and long-term outcomes [24]. In this regard, high-risk patients may benefit from cardiac surgery in the short term but might have poorer mid- and long-term outcomes related to their comorbidities [26]. Hence, risk stratification for long-term outcomes is of paramount importance beyond immediate results. Predictive factors influencing 30-day mortality may not play a role in determining mid- and long-term outcomes, whereas other neglected predictors might be decisive. We demonstrated that EuroSCORE II has low accuracy for the estimation of mortality at mid- and long-term follow-up. Indeed, in agreement with Barili et al., our results point to low calibration of EuroSCORE II in predicting 5-year mortality [27].
A similar phenomenon was reported for the STS score by Ishimizu and colleagues, who analyzed the 5-year outcomes of 2588 patients who underwent TAVI. In that study, patients at high surgical risk had higher mortality than intermediate- and low-risk patients, and high surgical risk was an independent predictor of mortality. Despite these findings, the discrimination of the STS score for long-term results remained poor (AUC 0.63) [23].
The main limitation of this study is its retrospective nature, and the results may have been affected by selection bias and unknown potential confounders.Moreover, the analysis in the present study focused only on the performance of EuroSCORE II.

Conclusions
Considering our results, we suggest a recalibration of EuroSCORE II for determining the risk of elderly patients, keeping in mind that the decision between surgical and percutaneous intervention should be based on an updated tool. Future updates of EuroSCORE II should also consider other contemporary outcome predictors, such as frailty and patients' anatomical features. Machine learning should also be considered as a complementary tool to improve decision-making and the choice of the best therapeutic strategy for each patient.

Figure 1 .
Figure 1. (A) ROC curve for EuroSCORE II in the overall population. (B) ROC curve for EuroSCORE II in patients younger than 75 years. (C) ROC curve for EuroSCORE II in patients older than 75 years.

Figure 3 .
Figure 3. (A) Calibration plot for EuroSCORE II in the overall population. (B) Calibration plot in patients younger than 75 years.

Figure 4 .
Figure 4. (A) Kaplan-Meier curves for overall survival in patients at low, intermediate and high surgical risk. (B) Kaplan-Meier curves for freedom from MACCE in patients at low, intermediate and high surgical risk.

Figure 5 .
Figure 5. (A) Continuous relationship between EuroSCORE II and HR for mortality based on restricted cubic spline models (univariable Cox regression). (B) Continuous relationship between EuroSCORE II and HR for mortality based on restricted cubic spline models (multivariable Cox regression).

Table 1 .
Operative and early outcomes.

Table 2 .
A. EuroSCORE II calibration in the overall population and in patients younger and older than 75 years. B. EuroSCORE II calibration according to preoperative risk category (LES2, IES2, HES2).

Table 3 .
Univariable and multivariable Cox regression for long-term mortality.