Machine Learning to Predict the Likelihood of Acute Myocardial Infarction

Supplemental Digital Content is available in the text.


Gradient boosting
We used a supervised learning technique, gradient boosting, to develop a model which provides a probabilistic estimate of type 1 myocardial infarction (MI) when presented with a set of explanatory variables. In brief, gradient boosting employs an ensemble of weak learners (here decision trees) which are combined in an iterative process. The aim is to sequentially improve model accuracy, where each tree attempts to correct the errors of the preceding stage.
The final model is a weighted contribution of each decision tree. We refer interested readers to Hastie et al. (2009), Friedman (2001), and Friedman et al. (2000) for further details.
The following explanatory variables (features) were used as inputs to the gradient boosting algorithm: (1) age, as a categorical variable in one of seven bands (<30 years, 30-39 years, 40-49 years, 50-59 years, 60-69 years, 70-79 years, ≥80 years), (2) sex (male or female), (3) high-sensitivity cardiac troponin concentration at presentation (ng/L), (4) high-sensitivity cardiac troponin concentration on repeat testing (ng/L), and (5) rate of change in high-sensitivity cardiac troponin concentration (the first cardiac troponin concentration subtracted from the second, divided by the time between the two samples in minutes; ng/L per min). The diagnosis of type 1 myocardial infarction (yes or no) was the target. Age was categorized because, in the acute setting globally, the date of birth is not always available and exact age cannot always be determined. This was a practical decision to make the algorithm globally applicable.
Mathematically, the model can be expressed as:
$$P_1 = \frac{1}{1 + e^{-F^*(X)}}, \qquad F^*(X) = F_0(X) + \sum_{i=1}^{M} a_i \, h_i(X, \theta_i)$$
where $P_1$ is the probability of type 1 myocardial infarction, $F^*(X)$ denotes all decision trees included in the algorithm, $M$ is the number of decision trees, $F_0(X)$ is the initial decision tree, $X$ is the variable vector ($X$ = age category, sex category, high-sensitivity cardiac troponin I concentration at presentation, high-sensitivity cardiac troponin I concentration on repeat testing, rate of change of concentration), $a_i$ is the weighting for each decision tree, and $h_i(X, \theta_i)$ is a decision tree characterized by parameters $\theta_i$. Once the decision trees and weightings were determined (using the training set), they were locked in place and used to assess the performance of the model in the test set.
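As an illustration of how the five inputs and the additive model fit together, the following Python sketch encodes one patient and evaluates a small boosted ensemble. It is a sketch under stated assumptions, not the trained model: the function names (`age_band`, `troponin_rate`, `predict_probability`), the patient values, the two toy trees and their weights are hypothetical, the initial term is simplified to a constant, and the logistic link is assumed because it is the usual choice for a Bernoulli loss.

```python
import math

AGE_BANDS = ["<30", "30-39", "40-49", "50-59", "60-69", "70-79", ">=80"]


def age_band(age_years):
    """Map an age in years to one of the seven bands used as a categorical input."""
    if age_years < 30:
        return AGE_BANDS[0]
    if age_years >= 80:
        return AGE_BANDS[-1]
    return AGE_BANDS[(int(age_years) - 30) // 10 + 1]


def troponin_rate(first_ng_l, second_ng_l, minutes_between):
    """Rate of change in ng/L per minute: (second - first) / elapsed minutes."""
    if minutes_between <= 0:
        raise ValueError("the repeat sample must be taken after the first sample")
    return (second_ng_l - first_ng_l) / minutes_between


def predict_probability(x, f0, trees, weights):
    """Evaluate f0 + sum_i a_i * h_i(x) and map the score to a probability.

    Assumptions: f0 is a constant starting value and the link is logistic.
    """
    score = f0 + sum(a * h(x) for a, h in zip(weights, trees))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical patient; the numbers are made up for illustration.
patient = {
    "age_band": age_band(64),
    "sex": "female",
    "troponin_first": 12.0,
    "troponin_second": 30.0,
    "troponin_rate": troponin_rate(12.0, 30.0, 60),
}

# Two made-up single-split trees with arbitrary cut points (not clinical thresholds).
toy_trees = [
    lambda x: 1.0 if x["troponin_second"] > 26.0 else -1.0,
    lambda x: 1.0 if x["troponin_rate"] > 0.1 else -1.0,
]

probability = predict_probability(patient, f0=-2.0, trees=toy_trees, weights=[0.5, 0.5])
print(patient["age_band"], round(patient["troponin_rate"], 3), round(probability, 3))
```

Keeping the feature encoding separate from the scoring function mirrors the description above: the trees and weights are fixed after training, so scoring a new patient only requires building the same five inputs.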
Some of the gradient boosting machine (GBM) hyper-parameters were preset to specific values: the learning rate (the shrinkage parameter applied to each tree in the expansion) was set to 0.01 (a small value chosen to reduce the risk of overfitting); the interaction depth (the maximum depth of each tree, which determines the highest level of variable interaction allowed) was set to 2 (i.e., up to 2-way interactions among the variables in the model were allowed); the minimum number of observations in the terminal nodes was set to 7; and the fraction of the training set observations randomly selected for each subsequent tree was set to 0.5 (introducing randomness into the model fit). We used the Bernoulli distribution for the loss function to obtain probabilistic estimates from the GBM. The optimal number of iterations (trees) was determined using 5-fold cross-validation, searching over 1 to 1000 trees; 987 trees gave the best cross-validated performance. The algorithm was developed in R using the package 'gbm' (https://cran.r-project.org/web/packages/gbm/).
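To make the role of these settings concrete, the sketch below implements a minimal gradient boosting loop for a Bernoulli loss in plain Python. It is not the 'gbm' implementation and does not reproduce the study's model: it assumes a single numeric feature, single-split trees (depth 1 rather than 2), synthetic data and a fixed number of trees with no cross-validation, and all names are hypothetical. It only shows where the learning rate, the subsampling fraction and the minimum terminal node size enter the fit.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals, probs, min_obs):
    """Fit a one-split tree to the residuals; return (threshold, left, right) or None."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    best = None
    for cut in range(max(1, min_obs), n - max(1, min_obs) + 1):
        lo, hi = order[:cut], order[cut:]
        if xs[lo[-1]] == xs[hi[0]]:
            continue  # no threshold separates identical values
        # Maximizing this quantity minimizes the squared error of the split.
        gain = (sum(residuals[i] for i in lo) ** 2 / len(lo)
                + sum(residuals[i] for i in hi) ** 2 / len(hi))
        if best is None or gain > best[0]:
            best = (gain, (xs[lo[-1]] + xs[hi[0]]) / 2.0, lo, hi)
    if best is None:
        return None
    _, threshold, lo, hi = best

    def leaf_value(idx):
        # Newton step for the Bernoulli deviance: sum(residual) / sum(p * (1 - p)).
        curvature = sum(probs[i] * (1.0 - probs[i]) for i in idx)
        return sum(residuals[i] for i in idx) / max(curvature, 1e-12)

    return threshold, leaf_value(lo), leaf_value(hi)


def boost(xs, ys, n_trees, learning_rate=0.01, bag_fraction=0.5, min_obs=7, seed=0):
    """Return the initial log-odds and a list of shrunken stumps."""
    rng = random.Random(seed)
    base_rate = sum(ys) / len(ys)
    f0 = math.log(base_rate / (1.0 - base_rate))
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # Each tree sees a random half of the training observations.
        bag = rng.sample(range(len(xs)), int(bag_fraction * len(xs)))
        probs = [sigmoid(scores[i]) for i in bag]
        residuals = [ys[i] - p for i, p in zip(bag, probs)]
        stump = fit_stump([xs[i] for i in bag], residuals, probs, min_obs)
        if stump is None:
            break
        threshold, left, right = stump
        # Shrink each tree's contribution by the learning rate.
        left, right = learning_rate * left, learning_rate * right
        stumps.append((threshold, left, right))
        for i, x in enumerate(xs):
            scores[i] += left if x <= threshold else right
    return f0, stumps


def predict(x, f0, stumps):
    score = f0 + sum(left if x <= t else right for t, left, right in stumps)
    return sigmoid(score)


# Synthetic data: the outcome becomes more likely as the feature increases.
data_rng = random.Random(1)
xs = [data_rng.uniform(0.0, 100.0) for _ in range(200)]
ys = [1 if x + data_rng.gauss(0.0, 10.0) > 50.0 else 0 for x in xs]

f0, stumps = boost(xs, ys, n_trees=300)
low, high = predict(5.0, f0, stumps), predict(95.0, f0, stumps)
print(len(stumps), low < 0.5 < high)
```

With a learning rate of 0.01 each tree moves the log-odds only slightly, which is why many trees are needed and why the number of trees is the quantity worth tuning by cross-validation.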
The final algorithm returned a value between 0 and 100 for each patient that reflects the likelihood of type 1 myocardial infarction. The algorithm is incorporated into a clinical decision support tool, which also reports the diagnostic parameters associated with each MI³ value. These diagnostic parameters cannot be derived for an individual patient; the decision support tool therefore uses an embedded reference table to report estimates of sensitivity, specificity, and negative and positive predictive values from our training set alongside the calculated MI³ value. Research groups wishing to test the algorithm in their own research may contact the authors of the study, and we will provide a version of the algorithm trained on our dataset.
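The reference-table idea can be sketched as follows: diagnostic parameters are computed once from a labelled set for each possible threshold on the 0 to 100 scale and are then looked up for a new score. The scores, the labels, the function name and the convention that a score at or above the threshold counts as test-positive are assumptions made for illustration; they do not reproduce the table embedded in the decision support tool.

```python
def diagnostic_parameters(scores, labels, threshold):
    """Sensitivity, specificity, PPV and NPV when score >= threshold is test-positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)

    def ratio(numerator, denominator):
        # Undefined when a group is empty (e.g. no test-positive patients).
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical scores (0-100) and adjudicated diagnoses (1 = type 1 MI).
scores = [0.4, 1.2, 2.5, 8.0, 15.0, 35.0, 60.0, 72.0, 90.0, 97.0]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Precompute one row per integer threshold, then look up the row of interest.
reference_table = {t: diagnostic_parameters(scores, labels, t) for t in range(0, 101)}
row = reference_table[50]
print(row)
```

Precomputing the table means the tool never needs access to the underlying patient data at the point of care; it only needs the fixed rows.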

The Abbott ARCHITECT high-sensitivity cardiac troponin I assay
The manufacturer-reported limit of detection (LoD) and 99th percentile upper reference limit