Machine learning‐based prediction of 1‐year mortality in hypertensive patients undergoing coronary revascularization surgery

Abstract
Background: Machine learning (ML) has shown promising results in all fields of medicine, including preventive cardiology. Hypertensive patients are at higher risk of mortality after coronary artery bypass graft (CABG) surgery; thus, we aimed to design and evaluate five ML models to predict 1‐year mortality among hypertensive patients who underwent CABG.
Hypothesis: ML algorithms can significantly improve mortality prediction after CABG.
Methods: Tehran Heart Center's CABG data registry was used to extract several baseline and peri‐procedural characteristics and mortality data. The best features were chosen using the random forest (RF) feature selection algorithm. Five ML models were developed to predict 1‐year mortality: logistic regression (LR), RF, artificial neural network (ANN), extreme gradient boosting (XGB), and naïve Bayes (NB). The area under the curve (AUC), sensitivity, and specificity were used to evaluate the models.
Results: Among the 8,493 hypertensive patients who underwent CABG (mean age of 68.27 ± 9.27 years), 303 died in the first year. Eleven features were selected as the best predictors, among which total ventilation hours and ejection fraction were the leading ones. LR showed the best prediction ability with an AUC of 0.82, while the NB model had the lowest AUC (0.79). Among the subgroups, the LR model achieved its highest AUCs in two age groups (50–59 and 80–89 years) and in the overweight, diabetic, and smoker subgroups of hypertensive patients.
Conclusions: All ML models had excellent performance in predicting 1‐year mortality among hypertensive patients undergoing CABG, while LR was the best regarding AUC. These models can help clinicians assess the risk of mortality in specific subgroups at higher risk (such as hypertensive ones).


| INTRODUCTION
Cardiovascular diseases (CVDs) are responsible for approximately 17.9 million deaths annually. 1 Ischemic heart disease (IHD) is the most prevalent CVD in the general population, as 49.2% of CVD deaths are among IHD patients. 2 Revascularization methods, including percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG), are the primary therapies in IHD. 3 CABG is one of the most common cardiac surgeries and is considered the preferable therapeutic approach in patients with multivessel or left main coronary artery disease (CAD) or in case of left ventricular dysfunction. 4 With a prevalence of one in every three adults in the United States, 5 hypertension is a major modifiable risk factor for CAD irrespective of sex and age. 6 Hypertensive patients tend to have different risk factor patterns from other CABG patients. 7 Moreover, higher rates of postoperative complications, early mortality, and 2-year mortality have been reported in hypertensive than in nonhypertensive patients. 7,8 Perioperative morbidity has been reported to be up to 40% higher in hypertensive patients undergoing CABG. 9 Besides traditional risk scores, machine learning (ML)-developed models are gaining attention for outcome prediction after cardiac surgeries. 10 However, there are controversies about the accuracy of ML models compared to the risk scores currently in use. 11 Given the greater need for mortality prediction in the hypertensive population, we aimed to use and compare different ML methods to predict 1-year mortality of hypertensive patients after isolated CABG.

| Study design and data collection
We conducted this serial cross-sectional study based on the Tehran Heart Center CABG registry among hypertensive patients between 2005 and 2015. Hypertension was defined as systolic blood pressure (SBP) ≥ 140 mmHg and/or diastolic blood pressure (DBP) ≥ 90 mmHg following two separate examinations in patients' history and/or taking antihypertensive medications. All the perioperative data of patients were collected and managed by expert nurses in our center.
The ethics committee of Tehran Heart Center approved this study (IR.TUMS.THC.1401.023).

| Variables' definition
Baseline characteristics including demographic, preoperative, and intraoperative variables were used as potential predictors. Age, gender, weight, height, and body mass index (BMI) were demographics. Serum hemoglobin (Hb), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), total cholesterol, triglycerides (TG), and creatinine, in addition to left ventricular ejection fraction (EF) measured by echocardiogram, diabetes, opium consumption, smoking status, prior myocardial infarction (MI), preoperative heart failure (HF), and chronic obstructive pulmonary disease (COPD) were preoperative variables. Finally, hospitalization parameters and intraoperative variables were total ICU hours, total ventilation hours, and cardiopulmonary pump utilization (on-pump or off-pump). All these data were obtained from either past medical records or blood sample measurements during hospitalization episodes and before surgery.

| Main outcome
The study's main outcome was 1-year mortality post-CABG, for which we compared different ML-based prediction models. This outcome included both in-hospital and after-discharge mortality events.

| Test/train split, feature selection, and oversampling
In a random assignment process, the total hypertensive population was divided into train and test cohorts (70% and 30%, respectively).
The test cohort sample was used to evaluate and validate the ML models.
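The 70/30 random assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the seed, and the toy labels are hypothetical, and stratifying by outcome (so that the rare deaths are spread proportionally across both cohorts) is an assumption that the text does not confirm.

```python
import random


def stratified_split(labels, test_frac=0.3, seed=42):
    """Split sample indices into train/test, keeping the class ratio in both parts.

    Stratification is an assumption made for this sketch; the text only
    states that the assignment was random.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical): 34 deaths among 1,000 patients, i.e. a 3.4% event rate.
labels = [1] * 34 + [0] * 966
train_idx, test_idx = stratified_split(labels)
test_deaths = sum(labels[i] for i in test_idx)
```

All later preprocessing steps (feature selection, oversampling, scaling, threshold tuning) would then be fitted on `train_idx` only, so that the test cohort remains untouched until evaluation.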
To select the best predictors for mortality in the total population and each of the subgroups, a feature selection process based on the random forest (RF) model was designed using 10-fold cross-validation.
This technique investigates the effect of each predictor alone and in combination with other predictive variables. RF feature selection is based on the mean decrease in accuracy (MDA) and the mean decrease in Gini (MDG). MDA shows how much accuracy is lost if a variable is excluded or randomly permuted, while MDG represents the contribution of each variable to the homogeneity of the nodes and leaves in the resulting RF. The higher these scores, the higher the importance of the variable. 12-14 Wherever there was a strong clinical and statistical correlation between two variables, the one with better prediction potential and/or clinical relevance was chosen, and the other was omitted.
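To make the two importance scores concrete, the sketch below computes the Gini decrease of a single split and a permutation-based accuracy drop. It is a simplified illustration under stated assumptions: a real RF averages the Gini decrease over all splits and trees and measures the accuracy drop on out-of-bag samples, whereas here a hypothetical fixed rule (`rule`, predicting death when ventilation exceeds 24 h) stands in for the fitted forest, and all data values are invented.

```python
import random


def gini(labels):
    """Gini impurity of binary labels (0 = survived, 1 = died)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def gini_decrease(parent, left, right):
    """Impurity reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_drop(predict, X, y, col, n_repeats=20, seed=0):
    """Mean loss of accuracy after shuffling one column (simplified MDA)."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(base - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats


# A perfectly separating split removes all impurity from the parent node.
pure_split = gini_decrease([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1])

# Hypothetical rows: [ventilation hours, unrelated noise column].
hours = [4, 6, 8, 10, 12, 30, 40, 50, 60, 72]
X = [[h, i % 3] for i, h in enumerate(hours)]
y = [0] * 5 + [1] * 5


def rule(row):
    """Stand-in for a fitted model: predict death if ventilation exceeds 24 h."""
    return int(row[0] > 24)


drop_ventilation = permutation_drop(rule, X, y, col=0)
drop_noise = permutation_drop(rule, X, y, col=1)
```

Shuffling the column the rule depends on lowers accuracy, while shuffling the ignored column leaves it unchanged, which is the intuition behind ranking variables by MDA.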
Our study population was highly imbalanced in terms of mortality, with a rate of only 3.39%. To tackle this common challenge in ML models, we applied the synthetic minority oversampling technique (SMOTE) to balance the training sample. SMOTE identifies the k-nearest neighbors of each minority-class sample, selects a subset of those neighbors, and generates new synthetic samples by interpolating between the sample and its selected neighbors. 15 Ten-fold cross-validation was used to tune this oversampling strategy and select the minority-to-majority class ratio; a SMOTE ratio of 25% (minority to majority group) was used.
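The interpolation step of SMOTE can be sketched in a few lines. This is a minimal pure-Python illustration rather than the implementation used in the study; the function name `smote`, the choice of k, the seed, and the toy coordinates are all hypothetical. The only figure taken from the text is the 25% minority-to-majority target ratio.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample (Euclidean).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy, already-scaled feature vectors for the minority (death) class.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
n_majority = 40
target_ratio = 0.25  # minority-to-majority ratio stated in the text

n_new = math.ceil(target_ratio * n_majority) - len(minority)
synthetic = smote(minority, n_new, k=3)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling is applied to the training cohort only; otherwise synthetic copies of test patients would leak into training.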
As the last step of preparation for the model development, the "standard scaler" (from the scikit-learn package 16 ) was used to scale each variable by removing the mean and scaling to unit variance, which is the requirement for many ML algorithms.
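Standard scaling amounts to a z-score transform per variable. The sketch below reproduces that arithmetic without scikit-learn, using the population standard deviation. The helper names and EF values are hypothetical, and estimating the mean and SD on the training cohort only and reusing them for the test cohort is an assumption, since the text does not say how the scaler was fitted.

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Return the mean and population SD estimated from training values."""
    return fmean(column), pstdev(column)


def transform(column, mean, std):
    """Centre to zero mean and scale to unit variance; constant columns map to 0."""
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Hypothetical ejection fraction values (%).
train_ef = [30.0, 40.0, 50.0, 60.0]
test_ef = [45.0, 70.0]

mean_ef, std_ef = fit_scaler(train_ef)
scaled_train = transform(train_ef, mean_ef, std_ef)
scaled_test = transform(test_ef, mean_ef, std_ef)  # reuses training statistics
```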

| Model development
Predictive ML models used in this study were (1) logistic regression (LR), (2) RF, (3) artificial neural network (ANN), (4) extreme gradient boosting (XGB), and (5) naïve Bayes (NB). In all models, we used the variables obtained by the feature selection method previously described. The "Grid Search" method was used to select the best parameters for each model and thereby improve model performance.
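Grid search simply evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below shows that loop in isolation. The parameter names, the grid, and especially `toy_cv_auc` are hypothetical: in practice the score function would fit the model with the given parameters and return its cross-validated AUC, which is replaced here by a made-up formula so the example runs on its own.

```python
import math
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every parameter combination and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cv_auc(params):
    """Stand-in for a cross-validated AUC; NOT a real model evaluation."""
    penalty_cost = 0.005 if params["penalty"] == "l1" else 0.0
    return 0.80 - 0.01 * abs(math.log10(params["C"])) - penalty_cost


# Hypothetical grid for a regularized logistic regression.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}
best_params, best_score = grid_search(param_grid, toy_cv_auc)
```

The cost of this approach grows with the product of the grid sizes (eight evaluations here), which is why grids are usually kept coarse.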

| Model performance evaluation
Performance was evaluated using the following metrics: (A) sensitivity and specificity; (B) accuracy of prediction using 10-fold cross-validation; and (C) the AUC, that is, the area under the curve obtained by plotting the true positive rate against the false positive rate. The threshold is the cut-off used to convert a predicted probability into a class label and is normally set at 0.5 (50%). Because the outcome in our study was highly imbalanced, this default of 0.5 was tuned using 10-fold cross-validation on the training data to adjust the sensitivity and specificity of the models.
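The effect of moving the threshold away from 0.5 can be illustrated as follows. This is a sketch under assumptions: the selection criterion used here is Youden's J (sensitivity + specificity - 1), which is one common choice and is not confirmed by the text, and the labels, predicted probabilities, and candidate thresholds are invented.

```python
def sens_spec(y_true, probs, threshold):
    """Sensitivity and specificity when probabilities >= threshold are labeled 1."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(y_true, probs, candidates):
    """Pick the candidate maximizing Youden's J (an assumed criterion)."""
    return max(candidates, key=lambda t: sum(sens_spec(y_true, probs, t)) - 1)


# Hypothetical training-fold outcomes and predicted probabilities of death.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
probs = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.30, 0.22, 0.40, 0.35]

default_metrics = sens_spec(y_true, probs, 0.5)
chosen = best_threshold(y_true, probs, [0.1, 0.2, 0.3, 0.5])
tuned_metrics = sens_spec(y_true, probs, chosen)
```

In this toy case the default cut-off of 0.5 labels nobody as high risk, so both deaths are missed, whereas the tuned cut-off recovers them at the cost of some false positives.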
The primary metric for evaluating models was chosen as AUC (with a 95% confidence interval [CI] using several random states) since it is independent of the threshold. To validate the findings, the best model in terms of AUC was implemented to measure the metrics for the most recent 30% of cases in terms of admission time (2013-2015). This method assesses the temporal validity of findings over time. 17,18
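Two ideas from this paragraph lend themselves to a short sketch: the AUC as a threshold-free ranking measure, and temporal validation on the most recent cases. The code below computes the AUC as the proportion of (death, survivor) pairs in which the death received the higher score, and holds out the latest 30% of records by admission year. The record layout, field names, and all values are hypothetical, and the confidence interval computation is not shown.

```python
def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def temporal_holdout(records, frac=0.3):
    """Split records into earlier cases and the most recent `frac` of cases."""
    ordered = sorted(records, key=lambda r: r["year"])
    cut = len(ordered) - round(len(ordered) * frac)
    return ordered[:cut], ordered[cut:]


# Small ranking example: one of the four (death, survivor) pairs is misordered.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Hypothetical registry rows: admission year, predicted risk, observed outcome.
risk = [0.05, 0.10, 0.40, 0.08, 0.12, 0.07, 0.30, 0.20, 0.35, 0.15]
died = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
records = [
    {"year": 2006 + i, "risk": r, "died": d}
    for i, (r, d) in enumerate(zip(risk, died))
]

earlier, recent = temporal_holdout(records)
recent_years = [r["year"] for r in recent]
recent_auc = auc([r["died"] for r in recent], [r["risk"] for r in recent])
```

Because the AUC depends only on how cases are ranked, it does not change when the classification threshold is moved, which is the reason given above for using it as the primary metric.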

| Statistical analysis
Baseline characteristics are reported as mean ± standard deviation (SD) or proportion (percentage). The comparison was made using Pearson χ² test and Fisher's exact test for categorical variables, in addition to an independent sample t-test for continuous variables. A two-sided p value of less than .05 was considered statistically significant. Prediction models were designed and evaluated for 1-year mortality for the whole hypertensive cohort of patients and subgroups based on gender, age group, BMI, diabetes, and smoking status. All statistical analyses and model development were performed using Python (version 3.10). LR, NB, and RF models were implemented using scikit-learn (1.0.2) library, 16 ANN with TensorFlow (version 2), 19 and XGB using XGBoost (version 1.6.0) Python library.
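As an illustration of the categorical comparison, the sketch below computes the Pearson χ² statistic for a 2×2 table and its p value with one degree of freedom. The counts are purely hypothetical and are not taken from the registry; no continuity correction is applied, which is an assumption, and the helper name `chi_square_2x2` is invented for this example.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    With one degree of freedom the upper-tail probability of the chi-square
    distribution equals erfc(sqrt(stat / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = died / survived, columns = diabetic / non-diabetic.
stat, p_value = chi_square_2x2(30, 20, 40, 60)
significant = p_value < 0.05  # two-sided significance level stated in the text
```

For tables with small expected cell counts, Fisher's exact test, also named above, would be the more appropriate choice.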
The methodological design of the study, including all the stages described above, is illustrated in Figure 1.

Table 1. Among all patients, 303 (3.39%) died during the 1-year follow-up. Patients who died were significantly older than survivors (71.94 ± 9.56 vs. 68.41 ± 9.26 years; p < .001). Hb and EF were significantly lower in patients who died than in survivors. In addition, the prevalence of diabetes was higher in patients who died (54.78% vs. 45.65%; p < .001). Figure 2 presents the baseline characteristics, measured before, during, or after CABG, of patients who died and those who survived in the whole cohort.

| Feature selection
The RF feature selector was applied with 10-fold cross-validation to select the top features according to their AUC. We used the Pearson correlation coefficient (r) to assess correlations between features. Total ventilation hours and BMI were retained instead of total ICU hours and weight, respectively, because of their statistical correlation and greater clinical acceptance.
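Screening feature pairs by Pearson r can be sketched as below. The feature values are invented, and the cut-off of 0.8 for flagging a pair is an assumption, since the text does not report the value used. The code only flags correlated pairs; deciding which member of a pair to keep remains a clinical judgment, as described above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(features, cutoff=0.8):
    """Return (name_a, name_b, r) for every pair with |r| at or above the cutoff."""
    names = sorted(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(features[a], features[b])
            if abs(r) >= cutoff:
                flagged.append((a, b, r))
    return flagged


# Hypothetical values for five patients.
features = {
    "total_icu_hours": [20, 30, 45, 70, 100],
    "total_ventilation_hours": [6, 10, 16, 28, 40],
    "ef": [55, 30, 50, 35, 45],
}
flagged = correlated_pairs(features)
```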

| Models evaluation
We designed five ML algorithms for the prediction of 1-year mortality among hypertensive patients undergoing CABG. Table 1 compares the sensitivity, specificity, and AUC of prediction models.
All the models had an acceptable performance with LR outperforming others [AUC (95% CI) = 0.82 (0.78-0.86)]. Considering AUC as the main metric for evaluation, LR was followed by XGB, ANN, RF, and NB. In addition, LR had the highest specificity and accuracy (specificity = 83% and accuracy = 82.37%), while XGB had the best performance in terms of sensitivity (88%). Figure 3B demonstrates the receiver operating characteristic curve (ROC) for all five models.
Finally, the LR model showed an AUC of 0.77 (0.73-0.81) for the most recent 30% of the total cohort.

| Subgroups
Models were run for each of the subgroups of hypertensive patients.

FIGURE 4 Logistic regression (LR) model evaluation for prediction of mortality in different subgroups of patients

Therefore, it seems that these scoring tools need to be modified and improved. 21-24 The recent advancement in electronic medical records and artificial intelligence has resulted in an increasing interest in utilizing ML algorithms for individualized clinical decision-making and risk prediction. 25 ML algorithms have shown a notable ability to be trained to develop personalized risk prediction scoring systems for outcomes of specific therapeutic approaches such as surgeries by identifying complex patterns in big data. 26-28 In addition, ML models allow for adjustment of the sensitivity and specificity of each model in different clinical settings in the context of risk predictions at the individual level. 29 Although some previous studies used the 50% default threshold, which can lead to many missed cases, we modified it to achieve the optimum sensitivity and specificity on the ROC curve, as was done in earlier studies. 29,30 It has been illustrated that ML could improve the forecasting quality of the traditional epidemiologic standard mortality models. 31,32 These findings extend several studies that demonstrated the superiority of ML models over classical tools for identifying patients at increased risk of mortality after CABG. ML models were more accurate than EuroSCORE II in forecasting in-hospital mortality after cardiac surgeries. 33 The preoperative ML models also outperformed the conventional STS model concerning the prediction of mortality or major morbidity in patients who underwent isolated CABG, mainly using intraoperative parameters such as cross-clamp and bypass times as additive predictive factors. 34

The frequency of hypertension among patients who require CABG is notable, as a recent study reported that hypertension was present in 54.6% of patients who underwent CABG. 37 Moreover, hypertension is a significant risk factor for mortality and worsened prognosis after CABG. 9,38 Therefore, this study focused on the prediction of mortality after CABG in a hypertensive group of patients, in which the LR model showed the best discriminative performance for predicting 1-year mortality. The simplicity of implementation and regularization, good efficiency from a training perspective, and not being affected by small data noise and multicollinearity constitute the advantages of LR. 39 It has been reported that LR performs as well as ML models in predicting the risk of CVDs, chronic kidney disease (CKD), diabetes, and hypertension. 40 Similar to our study, several studies aimed to investigate implementing ML models in certain groups of patients after cardiac surgery. For instance, Zhong et al. 41 revealed that XGB was associated with overall better predictive ability in terms of AUC compared to other models for forecasting 30-day mortality in critically ill patients after cardiac surgery. Consistently, another study compared five ML algorithms for estimating the long-term mortality risk in older adults (>65 years old) who underwent CABG. Based on their results, the XGB and multivariate adaptive regression spline (MARS) models yielded the best predictive performance before and after variable selection, respectively. 42 Altogether, there are controversies in selecting the best model for predicting clinical outcomes and mortality.
Feature selection is widely applied to remove irrelevant and unnecessary data and can thereby improve the accuracy and interpretability of ML models. 43 The RF algorithm has been applied in many studies 44-46 and found to perform better in classification prediction modeling than other ML techniques. 47,48 Since the use of too many features can decrease a model's performance, reducing the number of variables and taking the correlation of features into consideration are among the advantages of the RF model. 45 Likewise, in this study, we used the RF feature selector technique to determine the top features. Based on our results, the ventilation time after the surgery was the most influential variable for predicting mortality, followed by baseline EF.
Consistently, in another study, LR, RF, and XGB models selected the mechanical ventilation time as an important perioperative factor for predicting mortality after CABG. 42 Also, the prolonged mechanical ventilation requirement after cardiac surgery has been reported as a predictive factor for in-hospital and long-term mortality, with patients who were intubated for more than 21 days having significantly worse 1-year survival compared to other patients (88.9 vs. 70.9%, p = .03). 49 Fernandez-Zamora et al. 50 also reported that prolonged mechanical ventilation (>24 h) after cardiac surgery was observed in 10%-20% of patients, who accounted for most of the postoperative mortality. 50 In a meta-analysis, He et al. reported that prolonged mechanical ventilation time (>48 h) could be associated with a higher risk of ventilator-associated pneumonia (VAP), averaging 35.2%. Also, VAP after cardiac surgery is related to poor prognosis with high mortality and long ICU stays. 51 The notable inverse relationship between EF and risk of post-CABG mortality has also been frequently reported by other investigations, 52-55 and a dose-response relationship between decreasing EF and risk of death has been revealed. 53 So far, many previous studies have reported the pivotal role of age, 56,57 impaired glucose 58-60 and lipid profile, 61,62 Hb levels, 63

data, which is a main methodological challenge in ML models. Several approaches have been suggested to resolve this issue, including oversampling the minority group, undersampling the majority group, and lowering the prediction threshold. 76,77 To overcome the imbalance of mortality data, we modified the threshold and applied the SMOTE oversampling method, which is used more frequently than undersampling methods for predicting rare outcomes such as mortality because it retains valuable data. 15

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The data set analyzed in this study, along with the codes used to develop and evaluate machine learning models, are available upon reasonable request from the corresponding author.