Machine learning-based models to predict one-year mortality among Chinese older patients with coronary artery disease combined with impaired glucose tolerance or diabetes mellitus

Purpose Accurate prediction of survival prognosis helps guide clinical decision-making. This prospective study aimed to develop a model to predict one-year mortality among older patients with coronary artery disease (CAD) combined with impaired glucose tolerance (IGT) or diabetes mellitus (DM) using machine learning techniques.
Methods A total of 451 patients with CAD combined with IGT or DM were enrolled and randomly split 70:30 into a training cohort (n = 308) and a validation cohort (n = 143).
Results The one-year mortality was 26.83%. The least absolute shrinkage and selection operator (LASSO) method with ten-fold cross-validation identified seven characteristics significantly associated with one-year mortality: creatine, N-terminal pro-B-type natriuretic peptide (NT-proBNP), and chronic heart failure were risk factors, whereas hemoglobin, high-density lipoprotein cholesterol, albumin, and statins were protective factors. The gradient boosting machine model outperformed the other models in terms of Brier score (0.114) and area under the curve (0.836). It also showed favorable calibration and clinical usefulness based on the calibration curve and the clinical decision curve. SHapley Additive exPlanations (SHAP) analysis found that the top three features associated with one-year mortality were NT-proBNP, albumin, and statins. The web-based application is available at https://starxueshu-online-application1-year-mortality-main-49cye8.streamlitapp.com/.
Conclusions This study proposes an accurate model to stratify patients with a high risk of one-year mortality. The gradient boosting machine model demonstrates promising prediction performance. Interventions targeting NT-proBNP and albumin levels, and the use of statins, may be beneficial for improving survival outcomes among patients with CAD combined with IGT or DM.
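The two headline metrics in the Results, the Brier score and the area under the curve (AUC), can be illustrated with a minimal sketch. This is not the study's implementation: the function names `brier_score` and `roc_auc` and the toy data are hypothetical and chosen only for illustration. The Brier score is the mean squared difference between predicted probability and observed outcome (lower is better), and the AUC is the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0 is perfect)."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Fraction of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study: 1 = died within one year.
outcomes = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [0.80, 0.10, 0.30, 0.60, 0.20, 0.70, 0.40, 0.05]

toy_brier = brier_score(outcomes, predicted)
toy_auc = roc_auc(outcomes, predicted)
print(f"Brier score: {toy_brier:.3f}, AUC: {toy_auc:.3f}")
```

The pairwise AUC loop is quadratic in the number of patients, which is fine for a cohort of a few hundred; a rank-based formula would be preferable for much larger data sets.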
Supplementary Information The online version contains supplementary material available at 10.1186/s12933-023-01854-z.


Introduction
Background and objectives 3a
Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models.
3b Specify the objectives, including whether the study describes the development or validation of the model or both.

Methods

Source of data 4a
Describe the study design or source of data (e.g., randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable.
4b Specify the key study dates, including start of accrual; end of accrual; and, if applicable, end of follow-up.

Participants 5a
Specify key elements of the study setting (e.g., primary care, secondary care, general population) including number and location of centres.
5b Describe eligibility criteria for participants.
5c Give details of treatments received, if relevant.

Outcome 6a
Clearly define the outcome that is predicted by the prediction model, including how and when assessed.
6b Report any actions to blind assessment of the outcome to be predicted.

Predictors 7a
Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured.
7b Report any actions to blind assessment of predictors for the outcome and other predictors.

Sample size 8
Explain how the study size was arrived at.

Missing data 9
Describe how missing data were handled (e.g., complete-case analysis, single imputation, multiple imputation) with details of any imputation method.

Statistical analysis methods 10a
Describe how predictors were handled in the analyses.
10b Specify type of model, all model-building procedures (including any predictor selection), and method for internal validation.
10d Specify all measures used to assess model performance and, if relevant, to compare multiple models.

Risk groups 11
Provide details on how risk groups were created, if done.

Results

Participants 13a
Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful.
13b Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and outcome.

Model development 14a
Specify the number of participants and outcome events in each analysis.
14b If done, report the unadjusted association between each candidate predictor and outcome.

Model specification 15a
Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients, and model intercept or baseline survival at a given time point).
15b Explain how to use the prediction model.

Model performance 16
Report performance measures (with CIs) for the prediction model.
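One common way to obtain the confidence intervals requested for performance measures is a percentile bootstrap over the validation cohort. The sketch below is illustrative only and is not prescribed by the checklist or taken from the study: the helper names `brier_score` and `bootstrap_ci`, the number of resamples, and the toy data are all assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    metric: Callable[[Sequence[int], Sequence[float]], float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample patients with replacement, recompute the metric."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data, NOT taken from any study: 1 = outcome event observed.
outcomes = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [0.80, 0.10, 0.30, 0.60, 0.20, 0.70, 0.40, 0.05]

point = brier_score(outcomes, predicted)
ci_low, ci_high = bootstrap_ci(outcomes, predicted, brier_score)
print(f"Brier score {point:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Any metric with the same two-argument signature can be passed as `metric`. For rank-based measures such as the AUC, resamples that contain only events or only non-events would need to be skipped or redrawn.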

Discussion

Limitations 18
Discuss any limitations of the study (such as nonrepresentative sample, few events per predictor, missing data).

Interpretation 19b
Give an overall interpretation of the results, considering objectives, limitations, and results from similar studies, and other relevant evidence.

Implications 20
Discuss the potential clinical use of the model and implications for future research.

Other information
Supplementary information 21
Provide information about the availability of supplementary resources, such as study protocol, Web calculator, and data sets.

Funding 22
Give the source of funding and the role of the funders for the present study.
We recommend using the TRIPOD Checklist in conjunction with the TRIPOD Explanation and Elaboration document.