Machine Learning Risk Prediction for Incident Heart Failure in Patients With Atrial Fibrillation

Background: Atrial fibrillation (AF) increases the risk of heart failure (HF); however, little focus is placed on the risk stratification for, and prevention of, incident HF in patients with AF.
Objectives: This study aimed to construct and validate a machine learning (ML) prediction model for HF hospitalization in patients with AF.
Methods: The Fushimi AF Registry is a community-based prospective survey of patients with AF in Fushimi-ku, Kyoto, Japan. We divided the data set of the registry into derivation (n = 2,383) and validation (n = 2,011) cohorts. An ML model was built to predict the incidence of HF hospitalization using the derivation cohort, and predictive ability was examined using the validation cohort.
Results: HF hospitalization occurred in 606 patients (14%) during a median follow-up period of 4.4 years in the entire registry. Data of transthoracic echocardiography and biomarkers were frequently nominated as important predictive variables across all 6 ML models. The ML model based on a random forest algorithm using 7 variables (age, history of HF, creatinine clearance, cardiothoracic ratio on x-ray, left ventricular [LV] ejection fraction, LV end-systolic diameter, and LV asynergy) had high prediction performance (area under the receiver operating characteristics curve [AUC]: 0.75) and was significantly superior to the Framingham HF risk model (AUC: 0.67; P < 0.001). Based on Kaplan-Meier curves, the ML model could stratify the risk of HF hospitalization during the follow-up period (log-rank; P < 0.001).
Conclusions: The ML model revealed important predictors and helped us to stratify the risk of HF, providing opportunities for the prevention of HF in patients with AF.

proportion of deaths in contemporary patients with AF. 4,5 However, many studies have focused on the prevention of thromboembolism, and little attention has been paid to the risk stratification for, and prevention of, HF despite its high prevalence and poor prognostic impact in patients with AF. An important step toward HF prevention is to identify patients at high risk for the disease. Therefore, comprehensive risk stratification of incident HF is warranted for the management of AF in daily practice; however, published reports on this issue are scarce.
Machine learning (ML) is a subset of artificial intelligence in which algorithms learn from data without explicit programming. ML techniques provide a powerful tool for learning complex relationships between risk predictors and clinical outcomes from a representative sample of patients. 6,7 In addition, ML can efficiently process large multicategory data sets, including biological, clinical, and imaging data, to predict clinical outcomes. 8,9 Recent studies, including ours, revealed that ML models can achieve higher prediction performance for thromboembolism than the validated risk score, the CHA2DS2-VASc score, in patients with AF. 10-12 We consider ML techniques promising for risk prediction of future HF events; however, risk stratification for HF using ML algorithms in patients with AF has not been investigated.
Accordingly, the aim of the present study was to construct an ML model for predicting the incidence of HF events and to validate its performance using data from the Fushimi AF Registry, a large-scale community-based prospective survey of Japanese patients with AF.
perspective. In addition, several variables were created using existing variables (for example, body mass index was calculated using patients' height and weight data). Variables with more than 30% missing data in the derivation cohort were deleted. 12
After the model derivation, each of the 6 ML models was evaluated for its performance using the validation cohort. For model evaluation with the validation cohort, the missing values were imputed 20 times using multiple imputation by chained equations to address the randomness of the estimation. 18,19 In the
characteristics in the derivation and validation cohorts are presented in Table 1. Patients in the derivation cohort had a lower prevalence of paroxysmal AF and dyslipidemia and had a higher prevalence of pre-existing HF, hypertension, diabetes mellitus, and chronic kidney disease (all P < 0.05). Oral anticoagulants were less frequently prescribed, and cardiothoracic ratio and LV ejection fraction were lower in patients in the derivation cohort than in those in the validation cohort (all P < 0.05) (Table 1). The Kaplan-Meier curve for the incidence of HF hospitalization is shown in Supplemental Figure 1.
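The variable preparation steps described in the text above (deriving body mass index from height and weight, and deleting variables with more than 30% missing data in the derivation cohort) can be illustrated with a minimal sketch. It is not the registry's actual pipeline: the column names, the units (centimeters and kilograms), the representation of a missing value as None, and the toy records are all assumptions made for illustration.

```python
MISSING_THRESHOLD = 0.30  # from the text: variables with >30% missing data were deleted


def bmi(height_cm, weight_kg):
    """Body mass index in kg/m^2, or None if either input is missing."""
    if height_cm is None or weight_kg is None:
        return None
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def missing_fraction(rows, name):
    """Fraction of records in which the variable `name` is missing (None)."""
    return sum(1 for row in rows if row.get(name) is None) / len(rows)


def variables_to_keep(rows, threshold=MISSING_THRESHOLD):
    """Names of variables whose missing fraction does not exceed the threshold."""
    names = sorted({key for row in rows for key in row})
    return [n for n in names if missing_fraction(rows, n) <= threshold]


# Hypothetical toy records, NOT registry data; "marker_x" is a made-up variable.
derivation = [
    {"height_cm": 170.0, "weight_kg": 65.0, "marker_x": None},
    {"height_cm": 160.0, "weight_kg": None, "marker_x": None},
    {"height_cm": 155.0, "weight_kg": 50.0, "marker_x": 120.0},
    {"height_cm": 165.0, "weight_kg": 70.0, "marker_x": None},
]

# Create the derived variable first, then apply the missing-data filter.
for row in derivation:
    row["bmi"] = bmi(row["height_cm"], row["weight_kg"])

kept = variables_to_keep(derivation)
print(kept)
```

In this toy example, "marker_x" is missing in 3 of 4 records and is dropped, whereas the derived "bmi" is missing in 1 of 4 records and is retained. The missing fraction is computed on the derivation records only, as in the text.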

METHODS
The annual incidence rate of HF hospitalization in the derivation cohort was 4.0% per person-year, and that in the validation cohort was 2.5% per person-year.
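As a reminder of how an incidence rate of this kind is obtained, the sketch below divides the number of first events by the total person-time at risk and expresses the result per 100 person-years. The function name and the follow-up times are hypothetical and chosen only to make the arithmetic visible; they are not registry data.

```python
def incidence_rate_per_100_person_years(events, person_years):
    """Number of events divided by total person-time at risk, per 100 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 100.0 * events / person_years


# Hypothetical follow-up times in years (each patient is followed until the
# first event or censoring), NOT registry data.
follow_up_years = [2.0, 5.0, 1.5, 4.0, 3.5, 4.0]
events = 1  # one patient had a first HF hospitalization during follow-up

rate = incidence_rate_per_100_person_years(events, sum(follow_up_years))
print(f"{rate:.1f} per 100 person-years")
```

Here 1 event over 20 person-years gives 5.0 per 100 person-years.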
All-cause death occurred in 986 (22%) of 4,394
Table 3. The AUCs of the 6 ML algorithms using these 7 variables for the validation cohort are shown in Figure 3A. The AUCs for each model were high (range: 0.73-0.75) using the validation cohort.
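For readers less familiar with these quantities, the following minimal sketch shows how an AUC can be computed from predicted risks and how two AUCs measured on the same patients can be compared in the manner of Hanley and McNeil. It is an illustration under stated assumptions, not the analysis code of this study: the scores are toy values, the function names are hypothetical, and the correlation r between the two AUC estimates is taken as an input supplied by the caller rather than derived here.

```python
import math


def auc(scores_pos, scores_neg):
    """AUC as the probability that a patient with the event is scored higher
    than a patient without it (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def hanley_mcneil_se(a, n_pos, n_neg):
    """Standard error of an AUC `a` using the Hanley and McNeil approximation."""
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    variance = (
        a * (1.0 - a)
        + (n_pos - 1) * (q1 - a * a)
        + (n_neg - 1) * (q2 - a * a)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def z_for_correlated_aucs(a1, se1, a2, se2, r):
    """z statistic for two AUCs estimated on the same patients; `r` is the
    correlation between the two AUC estimates (an input, assumed known)."""
    return (a1 - a2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)


# Hypothetical predicted risks, NOT study data.
risk_with_event = [0.9, 0.8, 0.4]
risk_without_event = [0.7, 0.3, 0.2, 0.4]

a = auc(risk_with_event, risk_without_event)
se = hanley_mcneil_se(a, len(risk_with_event), len(risk_without_event))
print(f"AUC = {a:.3f}, SE = {se:.3f}")
```

The z statistic from `z_for_correlated_aucs` is referred to the standard normal distribution to obtain a P value. Because both models are evaluated on the same validation patients, the covariance term involving r reduces the variance of the difference relative to a comparison of independent samples.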
The AUC of the Framingham HF risk model for the validation cohort is shown in Figure 3B. According to the Hanley and McNeil method, the ML model with random forest algorithm using the 7 variables was
Figure 5A, Supplemental Table 5). The practical ML model was able to stratify the risk of HF hospitalization among patients without pre-existing HF (log-rank; P < 0.001) (Figure 5B). High-risk patients had a
Indeed, the Belgrade AF Study reported that mild left atrial dilatation or low-normal LV ejection fraction in a structurally normal heart heralds an increased risk of incident HF. 30 Another study reported that increased left atrial volume provided prognostic information for the prediction of HF events in AF. 31 Taken together with our findings, these studies indicate that transthoracic echocardiography plays an important role in risk stratification for incident HF in patients with AF. In addition to imaging data, our study suggested that biomarkers can help identify patients with AF who are at an increased risk of HF events. We previously demonstrated that natriuretic peptide levels are a useful biomarker for the risk stratification of HF hospitalization in patients with AF, although this biomarker could not be included in our ML models because of missing data. 32 Biomarkers of inflammation, kidney function, and hemoglobin levels were also reportedly associated with a higher incidence of HF in these patients. 28,33,34 However, there is a scarcity of studies incorporating imaging data and biomarkers into risk prediction models for HF events. Our ML models using these imaging and biomarker data had a high predictive ability, which suggests the utility of incorporating these data for risk stratification for incident HF in patients with AF.
A machine learning prediction model was created using over 100 variables included in the data set of the Fushimi AF Registry. The entire cohort data were divided into a derivation cohort and a validation cohort. Imaging data and biomarkers were nominated as important variables for the prediction of future heart failure events. Finally, 7 variables were selected for the practical machine learning model based on validity, feasibility, and applicability from the clinician's perspective. Our practical machine learning model had a certain level of predictive ability and was able to stratify the risk of hospitalization for heart failure in patients with atrial fibrillation. HF = heart failure.
inhibitors, was reported to aid in preventing HF development in selected patients with AF. 35