Development and Validation of a Machine Learning Score for Readmissions After Transcatheter Aortic Valve Implantation

BACKGROUND Identifying predictors of readmissions after transcatheter aortic valve implantation (TAVI) is an important unmet need.

OBJECTIVES We sought to explore the role of machine learning (ML) in predicting readmissions after TAVI.

METHODS We included patients who underwent TAVI between 2016 and 2019 in the Nationwide Readmission Database. A total of 917 candidate predictors representing all International Classification of Diseases, Tenth Revision, diagnosis and procedure codes were included. First, we used lasso regression to remove noninformative variables and rank informative ones. Next, we used an unsupervised ML model (K-means) to identify patterns/clusters in the data. Furthermore, we used Light Gradient Boosting Machine and Shapley Additive exPlanations to specify the impact of individual predictors. Finally, we built a parsimonious model to predict 30-day readmission.

RESULTS A total of 117,398 and 93,800 index TAVI hospitalizations were included in the 30- and 90-day analyses, respectively. Lasso regression identified 138 and 199 informative predictors for the 30- and 90-day readmission, respectively. Next, K-means recognized 2 distinct clusters: low risk and high risk. In the 30-day cohort, the readmission rate was 10.1% in the low risk group and 23.3% in the high risk group. In the 90-day cohort, the rates were 17.4% and 35.3%, respectively. The top predictors were the length of stay, frailty score, total discharge diagnoses, acute kidney injury, and Elixhauser score. These predictors were incorporated into a risk score (TAVI readmission score), which exhibited good performance in an external validation cohort (area under the curve 0.74 [0.7-0.78]).

CONCLUSIONS ML methods can leverage widely available administrative databases to identify patients at risk for readmission after TAVI, which could inform and improve post-TAVI care.

In addition, these studies often used institutional databases, which limit their generalizability. [10][11][12][13] ML algorithms have been shown to provide excellent risk discrimination regarding in-hospital mortality after TAVI. [9] However, to our knowledge, their utility in identifying risk factors for readmission has not been previously investigated. To address this knowledge gap, we sought to develop and validate an ML-based model to predict the risk of readmission after TAVI using a nationwide contemporary administrative database.

METHODS
The NRD and SID are developed by the Healthcare Cost and Utilization Project. The SIDs encompass ~97% of all hospitalizations in the participating states, and the NRD represents a ~60% sample of the SIDs across 30 states. The NRD and SID provide demographics, inpatient diagnoses and procedures, total costs, primary payers, length of stay, and hospital characteristics. In addition, they contain a patient linkage number that identifies discharges belonging to the same individual within the same state. [14][16][17][18] Because the NRD and SID are publicly available and deidentified, this study was deemed exempt by the institutional review board.
The authors cannot make the data directly available for replication of the study's results, but the data can be obtained directly from the Healthcare Cost and Utilization Project with the appropriate data user agreement.
STUDY POPULATION. We identified hospital stays for adult patients (aged ≥18 years) who underwent TAVI using International Classification of Diseases, Tenth Revision, Clinical Modification codes (Supplemental Table 1). We excluded patients who died, those discharged in December for the 30-day cohort and in October, November, or December for the 90-day cohort, and those who had missing information (Figure 1).
PRIMARY OUTCOMES. The outcomes of interest included 30- and 90-day readmission after TAVI.
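The month-based exclusions can be read as a follow-up requirement: a discharge late in the calendar year leaves too little observable time for a full 30- or 90-day window. The reason is an assumption here (the text states only which months were excluded). The following minimal Python sketch illustrates the cohort rule and the readmission flag. The record layout, patient identifiers, and day-of-year values are hypothetical and are not actual database variable names.

```python
# Discharge months excluded per follow-up window, as stated in the text.
EXCLUDED_DISCHARGE_MONTHS = {30: {12}, 90: {10, 11, 12}}


def is_eligible(discharge_month, window_days):
    """True if an index stay is kept in the cohort for this window."""
    return discharge_month not in EXCLUDED_DISCHARGE_MONTHS[window_days]


def readmitted_within(index_discharge_day, later_admission_days, window_days):
    """True if any later admission starts within window_days of discharge."""
    return any(
        0 <= day - index_discharge_day <= window_days
        for day in later_admission_days
    )


# Hypothetical index stays:
# (patient link, discharge month, discharge day of year, later admission days)
stays = [
    ("p1", 3, 80, [95]),     # readmitted 15 days after discharge
    ("p2", 9, 260, [320]),   # readmitted 60 days after discharge
    ("p3", 12, 350, []),     # December discharge: excluded from both cohorts
    ("p4", 11, 320, []),     # November discharge: kept for 30-day cohort only
]

results = {}
for window in (30, 90):
    cohort = [s for s in stays if is_eligible(s[1], window)]
    results[window] = {
        s[0]: readmitted_within(s[2], s[3], window) for s in cohort
    }
    print(window, results[window])
```

In this toy example the same stay (p2) counts as a readmission in the 90-day cohort but not in the 30-day cohort, which is why the two cohorts are built and analyzed separately.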
PREDICTORS SELECTION PROCESS. Lasso regression was used to find the most informative predictors and rank them in descending order of importance.
Nine hundred seventeen variables spanning all body systems were included in lasso regression: 49 repre- [...] 2).
K-MEANS CLUSTERING. In the 30-day cohort, lasso regression ranked 138 informative predictors in descending order of importance (Supplemental  2).
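To illustrate how lasso regression can both remove noninformative variables and rank the remaining ones, the sketch below fits a squared-error lasso by cyclic coordinate descent on a toy data set. This is a simplification under stated assumptions: the study's outcome is binary and its exact lasso implementation and penalty are not described in this section, and the predictor names, weights, and penalty value here are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=50):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho, scale = 0.0, 0.0
            for i in range(n):
                # Prediction from all predictors except j (partial residual).
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                scale += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (scale / n)
    return beta


def rank_informative(names, beta):
    """Drop zero coefficients, then sort by absolute size (descending)."""
    kept = [(name, b) for name, b in zip(names, beta) if b != 0.0]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Toy design: four mutually orthogonal +/-1 columns (hypothetical predictors).
names = ["code_A", "code_B", "code_C", "code_D"]
X = [[(-1) ** bin(i & j).count("1") for j in (1, 2, 3, 4)] for i in range(8)]
true_weights = [3.0, -1.0, 0.1, 0.0]   # made-up effects, not study results
y = [sum(w * x for w, x in zip(true_weights, row)) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.5)
ranking = rank_informative(names, beta)
print(ranking)
```

Because the toy columns are orthogonal, each coefficient is simply the unpenalized estimate shrunk by the penalty, so the weak predictor (code_C) and the null predictor (code_D) are set exactly to zero while the two strong predictors are kept and ranked.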

COMPARISON OF HIGH- AND LOW-RISK CLUSTERS.
A comparison of the baseline characteristics of the high-risk and low-risk groups is shown in Tables 1 and 2. The top 20 predictors with the highest SHAP values are displayed in Figures 3 and 4. The frequencies of these predictors in the high-risk and low-risk groups are shown in Tables 3 and 4. The high-risk cluster had a higher comorbidity burden (Elixhauser score 8 [5-12] vs 2 [0-5], SD = 0.44), a higher total number of diagnoses (23 [18-27] vs 15 [11-18], SD = 0.55), and a higher prevalence of hemodialysis (11.1% vs 0.8%, SD = 0.61).
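SHAP values attribute a model's prediction for a single hospital stay to its individual predictors. As a hedged illustration of the underlying idea, the sketch below computes exact Shapley values by brute force for a toy risk function with 3 binary predictors. The function, its weights, and the predictor names are hypothetical and are not the study's Light GBM model; in practice, faster tree-specific algorithms are used for gradient-boosted models.

```python
from itertools import permutations
from math import factorial


def toy_risk(stay):
    """Hypothetical risk function with one interaction; not the study's model."""
    return (
        2.0 * stay["long_stay"]
        + 3.0 * stay["aki"]
        + 4.0 * stay["long_stay"] * stay["aki"]
        + 0.0 * stay["unrelated_code"]
    )


def shapley_values(model, stay, baseline):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    names = list(stay)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)          # start from the reference stay
        previous = model(current)
        for name in order:
            current[name] = stay[name]    # reveal one predictor at a time
            value = model(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


stay = {"long_stay": 1, "aki": 1, "unrelated_code": 1}
baseline = {"long_stay": 0, "aki": 0, "unrelated_code": 0}

values = shapley_values(toy_risk, stay, baseline)
print(values)
```

The attributions sum to the difference between the prediction for this stay and the prediction for the baseline stay, and the interaction term is split evenly between the two predictors involved. Ranking predictors by the size of such attributions is the idea behind a "top predictors" list.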

DISCUSSION
This study proposes a simple ML-derived score to predict unplanned rehospitalizations after TAVI using widely available administrative data (Central Illustration). The "readmission score" could be incorporated into electronic medical records to automatically flag patients at high risk for readmission.

The previously mentioned studies had 2 major limitations: first, they used variables that might not be routinely collected in every center (eg, 5-m gait speed), hence limiting their broad applicability; and second, none of the studies constructed or validated an actual risk score that can be easily incorporated into clinical practice. Our study sought to mitigate these limitations by: 1) using variables from administrative data sets that are routinely used in virtually all hospitals; and 2) constructing and validating a risk score that can be incorporated within hospital databases to stratify TAVI patients by their risk of readmission. In addition, the previous studies identified dissimilar predictors of readmission, reflecting the preselection of a narrow pool of candidate predictors. Therefore, we applied ML methods that, in contrast to conventional statistical approaches, allow a full survey of all existing variables in the database (>900 in this study), hence allowing potential discovery of new predictors and reducing reliance on a priori knowledge and bias.
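Validating a risk score of this kind typically involves measuring discrimination with the area under the receiver operating characteristic curve and choosing a cutpoint for flagging patients. The sketch below shows both steps on made-up data. The scores and outcomes are hypothetical, and the use of the Youden index (sensitivity minus false-positive rate) to pick the cutpoint is an assumption; the text does not state how the optimal cutpoint was chosen.

```python
def auc(scores, labels):
    """Probability that a readmitted stay scores above a non-readmitted one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def youden_cutpoint(scores, labels):
    """Cutpoint maximizing sensitivity minus false-positive rate."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        # A stay is flagged as high risk when its score is >= cut.
        tpr = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut) / n_pos
        fpr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut) / n_neg
        if tpr - fpr > best_j:
            best_cut, best_j = cut, tpr - fpr
    return best_cut, best_j


# Hypothetical validation data: risk score and 30-day readmission (1 = yes).
scores = [10, 20, 30, 40, 50, 60, 70]
labels = [0, 0, 0, 1, 0, 1, 1]

area = auc(scores, labels)
cutpoint, youden_j = youden_cutpoint(scores, labels)
print(area, cutpoint, youden_j)
```

With a cutpoint chosen this way, a hospital system could flag any stay whose score meets or exceeds the threshold, accepting the corresponding trade-off between missed readmissions and false alarms.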

DATA SOURCE.
We used the Nationwide Readmission Database (NRD) from January 1, 2016, through December 31, 2019, for the development stage (training data set), and the 2020 Maryland State Inpatient Database (SID) for the validation stage (testing data set).
UNSUPERVISED ML (K-MEANS). K-means identifies hospital stays with similar features and assigns them to clusters. The algorithm was allowed to split the data into 2 to 8 distinct clusters. The final number of clusters was chosen based on the largest Silhouette score (Supplemental Figure 1). The Silhouette score compares each stay's average distance to the other stays in its own cluster with its average distance to the stays in the nearest other cluster; it ranges from -1 to 1, and the larger the score, the more compact and well separated the clusters are. To visualize these clusters, we ran a principal component analysis and plotted hospital stays on the first and second principal components.
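The selection of the number of clusters by Silhouette score can be sketched as follows. This is a dependency-free toy illustration, not the study's implementation: the 8 two-dimensional points, the deterministic seeding rule, and the reduced search range (2 to 4 clusters, because the toy data set has only 8 points) are all assumptions made for the example, and the principal component step used for visualization is omitted.

```python
from math import dist


def farthest_point_init(points, k):
    """Deterministic seeding: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=50):
    """Plain K-means (Lloyd's algorithm); returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(
                tuple(sum(x) / len(members) for x in zip(*members))
                if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if own is None:               # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy "hospital stays": two well-separated groups in a 2-feature space.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]

scores_by_k = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 5)}
best_k = max(scores_by_k, key=scores_by_k.get)
best_labels = kmeans(points, best_k)
print(best_k, best_labels)
```

In this toy example, splitting either group further lowers the average Silhouette score, so 2 clusters are selected, mirroring the 2-cluster (low-risk and high-risk) solution reported in the abstract.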

FIGURE 1 Study Flowchart

FIGURE 2 Scatterplot of the 90-Day Cohort on Principal Components 1 and 2

FIGURE 3 SHAP Values of the Top 20 Predictors of Light GBM Model
FIGURE 4

FIGURE 5 Receiver Operating Characteristic Curve and Optimal Cutpoint for the TAVI 30-Day Readmission Score in the Validation Cohort

antiplatelet and anticoagulation use predicted early readmission after TAVI. A third study, involving 1,749 TAVIs from a Japanese multicenter registry, identified atrial fibrillation, obstructive pulmonary disease, Frailty Scale ≥4, chronic kidney disease, and moderate-to-severe mitral regurgitation as independent predictors of readmission for heart failure after TAVI. [5]

CENTRAL ILLUSTRATION Machine Learning Derived Score to Predict Readmissions After Transcatheter Aortic Valve Implantation
Sulaiman S, et al. JACC Adv. 2022;1(3):100060.
The right lower subfigure shows the scatterplot of the 30-day cohort on principal components 1 and 2; blue dots represent the low-risk cluster, and red crosses represent the high-risk cluster. The left lower subfigure shows the suggested score equation with a histogram of the score per high- and low-risk clusters; 37.8 is the 95th percentile of the low-risk cluster, and 30.8 is the fifth percentile of the high-risk cluster. AKI = acute kidney injury; NRD = Nationwide Readmission Database; PC = principal component; TAVI = transcatheter aortic valve implantation.

Sulaiman et al. JACC: Advances, Vol. 1, No. 3, 2022. Machine Learning to Predict Readmissions After TAVI. August 2022:100060

TABLE 1 Comparison of Baseline Characteristics Stratified by 30-Day Readmission Status and K-Means Recognized Clusters
Values are mean ± standard deviation, %, or median (IQR). a: Defined by HCUP Elixhauser comorbidities software. SD = standardized difference.

TABLE 2 Comparison of Baseline Characteristics Stratified by 90-Day Readmission Status and K-Means Recognized Clusters
Values are mean ± standard deviation, %, or median (IQR). a: Defined by HCUP Elixhauser comorbidities software. SD = standardized difference.

TABLE 3 Comparisons of the Top 20 Most Impactful Predictors of High vs Low Risk of 30-Day Readmission After TAVI
Values are median (IQR) or %. a: Defined by HCUP Elixhauser comorbidities software.

TABLE 4 Comparisons of the Top 20 Most Impactful Predictors of High vs Low Risk of 90-Day Readmission After TAVI
Values are median (IQR) or %. a: Defined by HCUP Elixhauser comorbidities software. IQR = interquartile range; SD = standardized difference.