Dynamic risk prediction for diabetes using biomarker change measurements

Background: Dynamic risk models, which incorporate disease-free survival and repeated measurements over time, may yield more accurate predictions of future health status than static models. The objective of this study was to develop and apply a dynamic prediction model to estimate the risk of developing type 2 diabetes mellitus.

Methods: Both a static prediction model and a dynamic landmark model were used to provide predictions over a 2-year horizon time for diabetes-free survival, updated at 1, 2, and 3 years post-baseline; i.e., predicting diabetes-free survival to 2 years, and predicting diabetes-free survival to 3, 4, and 5 years post-baseline given that the patient had already survived past 1, 2, and 3 years post-baseline, respectively. Prediction accuracy was evaluated at each time point using robust non-parametric procedures. Data from 2057 participants of the Diabetes Prevention Program (DPP) study (1027 in the metformin arm, 1030 in the placebo arm) were analyzed.

Results: The dynamic landmark model demonstrated good prediction accuracy, with area under the curve (AUC) estimates ranging from 0.645 to 0.752 and Brier Score estimates ranging from 0.088 to 0.135. Relative to a static risk model, the dynamic landmark model did not differ significantly in terms of AUC but had significantly lower (i.e., better) Brier Score estimates for predictions at 1, 2, and 3 years post-baseline (e.g., 0.167 versus 0.099; difference −0.068, 95% CI −0.083 to −0.053, at 3 years in the placebo group).

Conclusions: Dynamic prediction models based on longitudinal, repeated risk factor measurements have the potential to improve the accuracy of future health status predictions.

Electronic supplementary material: The online version of this article (10.1186/s12874-019-0812-y) contains supplementary material, which is available to authorized users.


Background
In recent years, a wide range of markers have become available as potential tools to predict risk or progression of disease, leading to an influx of investment in the area of personalized screening, risk prediction, and treatment [1][2][3][4]. However, many of the available methods for personalized risk prediction are based on snapshot measurements (e.g., biomarker values at age 50) of risk factors that can change over time, rather than longitudinal sequences of risk factor measurements [2,[5][6][7]. For example, the Framingham Risk Score estimates the 10-year risk of developing coronary heart disease as a function of most recent diabetes status, smoking status, treated and untreated systolic blood pressure, total cholesterol, and HDL cholesterol [6]. With electronic health record and registry data, incorporating repeated measurements over a patient's longitudinal clinical history, including the trajectory of risk factor changes, into risk prediction models is becoming more realistic and might enable improvements upon currently-available static prediction approaches [8,9].
Specifically considering prediction of incident type 2 diabetes, a recent systematic review by Collins et al. [10] found that the majority of risk prediction models have focused on risk predictors assessed at a fixed time; the most commonly assessed risk predictors were age, family history of diabetes, body mass index, hypertension, waist circumference, and gender. For example, Kahn et al. [11] developed and validated a risk-scoring system for 10-year incidence of diabetes including (but not limited to) hypertension, waist circumference, weight, glucose level, and triglyceride level using clinical data from 9587 individuals. Models that aim to incorporate the trajectory of risk factor changes, e.g., the change in a patient's glucose level in the past year, into risk prediction for incident diabetes have been sparse. Some available methods that allow for the use of such longitudinal measurements are often considered overly complex or undesirable due to restrictive parametric modeling assumptions, or infeasible due to computational requirements [12][13][14][15]. That is, with these methods it is often necessary to specify a parametric model for the longitudinal measurements and a parametric or semiparametric model characterizing the relationship between the time-to-event outcome and the longitudinal measurements, and then use, for example, a Bayesian framework to obtain parameter estimates.
Recently, the introduction of the dynamic landmark prediction framework has proved a useful, straightforward alternative in several other clinical settings [16][17][18][19]. In the dynamic prediction framework, the risk prediction model for the outcome of interest is updated over time at prespecified "landmark" times (e.g., 1 year or 2 years after the initiation of a particular medication), incorporating information about the change in risk factors up to that particular time. That is, suppose the goal is to provide an individual with the predicted probability of survival past time τ = t + t0 given that he/she has already survived to time t0 (t0 is the landmark time); the dynamic prediction approach provides this prediction using a model that is updated at time t0 so that it can incorporate the information available up to time t0. The approach is appealing because it is relatively simple and does not require modeling assumptions as strict as those required by a joint modeling approach.
In this paper, we describe the development and use of a dynamic prediction model to estimate the risk of developing type 2 diabetes mellitus, incorporating biomarker values measured repeatedly over time, using data from the Diabetes Prevention Program study. We compare our dynamic prediction approach to a static prediction model to determine whether improvements in prediction accuracy can be obtained. Our aim is to illustrate how such a dynamic approach may be useful and appealing to both clinicians and patients when developing prediction models for the incidence of type 2 diabetes.

Static prediction model
For each individual i, let Zi denote the vector of available baseline covariates, Ti denote the time of the outcome of interest, Ci denote the censoring time assumed to be independent of Ti given Zi, Xi = min(Ti, Ci) denote the observed event time, and Di = I(Ti < Ci) indicate whether the event time or censoring time was observed. Suppose the goal is to predict survival to some time τ for each individual i, based on their covariates Zi. A static model based on the Cox proportional hazards model [20,21] can be expressed, in terms of survival past time τ, as

P(Ti > τ | Zi) = exp{−Λ0(τ) exp(β′Zi)},   (1.1)

or, in terms of the hazard function, as

λ(τ | Zi) = λ0(τ) exp(β′Zi),   (1.2)

where Λ0(τ) is the cumulative baseline hazard at time τ, λ0(τ) is the baseline hazard at time τ, and β is the vector of regression parameters to be estimated. Estimates of β are obtained by maximizing the partial likelihood [22]. Here, we use the term "static" because the model itself never changes: the model is fit once, the β vector of parameters is estimated, and these estimates are used to calculate an individual's predicted probability of survival given their particular Zi. In practice, even when Zi is actually a vector of covariate values measured after baseline (e.g., 1 year later), this model is still used under the static approach. This type of model is standard in the risk prediction literature [2,6,7,10,23]. For example, with the Framingham risk score, a single static model is used to provide risk estimates whether a patient comes in at age 40 or age 60 (using age as the time scale); the β estimates used to calculate risk are the same, and only the Zi values change to reflect the current covariate values.
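To make the static model concrete, the predicted survival probability in model (1.1) can be computed directly from a fitted coefficient vector and an estimated baseline cumulative hazard. The sketch below is illustrative only (written in Python, whereas the analyses in this paper used R); the coefficient values and baseline hazard are made up, not DPP estimates.

```python
import numpy as np

def static_survival(tau, z, beta, baseline_cumhaz):
    """Predicted P(T > tau | Z = z) from a fitted Cox model (model 1.1).

    beta            : fitted log-hazard-ratio vector
    baseline_cumhaz : callable returning Lambda_0(t), e.g. a Breslow estimate
    """
    return np.exp(-baseline_cumhaz(tau) * np.exp(beta @ z))

# Illustration with made-up values (hypothetical, not DPP estimates):
beta = np.array([0.05, 0.4])        # e.g. glucose and HbA1c coefficients
cumhaz = lambda t: 0.02 * t         # toy baseline cumulative hazard
z = np.array([1.0, 2.0])
p = static_survival(2.0, z, beta, cumhaz)   # predicted 2-year survival
```

Once fit, the same `beta` and `cumhaz` are reused for every patient and every prediction time, which is exactly what "static" means here.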

Dynamic prediction model
A dynamic prediction model differs from a static prediction model in that the model itself is updated (i.e., refit) at specified "landmark times", e.g., 1 year, 2 years, and 3 years after baseline [17,18,24]. This model can be expressed as a landmark Cox proportional hazards model, in terms of survival past time τ, as

P(Ti > τ | Ti > t0, Zi(t0)) = exp{−Λ0(τ | t0) exp(α′Zi(t0))},   (1.3)

or, in terms of the hazard function, as

λ(τ | Ti > t0, Zi(t0)) = λ0(τ | t0) exp(α′Zi(t0)),   (1.4)

where t0 is the landmark time, τ = t + t0, t is referred to as the "horizon time", Zi(t0) denotes a vector of covariates and (if available) covariates that reflect changes in biomarker values from baseline to t0, Λ0(τ | t0) is the cumulative baseline hazard at time τ given survival to t0, λ0(τ | t0) is the baseline hazard at time τ given survival to t0, and α is the vector of regression parameters to be estimated at each time t0. As in model (1.1), estimates of α are obtained by maximizing the appropriate partial likelihood. However, for estimation of α, model (1.3) is fit only among individuals surviving to t0, and thus the partial likelihood is composed of only these individuals.
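One way to see what "fit only among individuals surviving to t0" means in practice is to construct the landmark analysis dataset explicitly before refitting the model. The sketch below is a hypothetical illustration in Python (not the authors' R code); it also administratively censors follow-up at the horizon τ = t0 + t, a common landmarking device that we assume here rather than take from the paper.

```python
import numpy as np

def landmark_dataset(X, D, Z_t0, t0, horizon):
    """Build the analysis dataset for landmark time t0.

    Keeps only subjects still event-free and under observation at t0, and
    administratively censors follow-up at tau = t0 + horizon.
    X : observed times; D : event indicators (bool); Z_t0 : covariates at t0.
    """
    at_risk = X > t0
    X_lm = np.minimum(X[at_risk], t0 + horizon)
    D_lm = D[at_risk] & (X[at_risk] <= t0 + horizon)
    return X_lm, D_lm, Z_t0[at_risk]

# Toy data: subject 1 has an event at 0.5 years (dropped at t0 = 1),
# subject 4 has an event at 4 years (censored at the 3-year horizon).
X = np.array([0.5, 1.5, 2.5, 4.0])
D = np.array([True, True, False, True])
Z = np.arange(4.0).reshape(4, 1)
X_lm, D_lm, Z_lm = landmark_dataset(X, D, Z, t0=1.0, horizon=2.0)
```

A Cox model fit to `(X_lm, D_lm, Z_lm)` then has the conditional interpretation of model (1.3), with coefficients specific to this t0.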
The key substantive differences between the static and dynamic landmark models are that (1) no information regarding change in covariate (e.g., biomarker) measurements is incorporated in the static approach, (2) no information regarding survival up to t0 is incorporated in the static approach, and (3) the static approach uses a single model (i.e., a single set of Cox regression coefficients) for all predictions, whereas the dynamic landmark model fits an updated model at each landmark time and thus has a distinct set of regression coefficients for each t0. Importantly, the probability being estimated with the static model vs. the landmark model is different, and thus the resulting interpretation of this probability differs between the two approaches. The static model estimates P(Ti > τ | Zi), ignoring any information about survival to t0, while the landmark model estimates P(Ti > τ | Ti > t0, Zi(t0)), explicitly incorporating information about survival to t0 and changes in biomarker values from baseline to t0. Of course, a simple derivation shows that one could obtain an estimate for P(Ti > τ | Ti > t0, Zi) using the static model based on model (1.1) as

exp{−(Λ̂0(τ) − Λ̂0(t0)) exp(β̂′Zi)},

where β̂ and Λ̂0 denote the estimates of the regression coefficients obtained by maximizing the partial likelihood and the Breslow estimator of the baseline cumulative hazard, respectively. However, this is not what is done in current practice when using a static model; the estimated P(Ti > τ | Zi) is typically provided to patients even when it is known that they have survived to t0 (e.g., the patient is given this prediction at a 1-year post-intervention appointment, t0 = 1 year). In addition, even with this calculation, β̂ and Λ̂0 themselves are not restricted to individuals who survive to t0 but are instead estimated using all patients at baseline.
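The derived conditional probability above follows from S(τ | Z)/S(t0 | Z) under model (1.1) and can be written directly in code. A minimal sketch (Python, with made-up β̂ and Λ̂0 values rather than DPP estimates):

```python
import numpy as np

def static_conditional_survival(tau, t0, z, beta_hat, cumhaz_hat):
    """Estimate of P(T > tau | T > t0, Z = z) derived from static model (1.1):
    exp{-(Lambda_0(tau) - Lambda_0(t0)) * exp(beta' z)}."""
    delta = cumhaz_hat(tau) - cumhaz_hat(t0)
    return np.exp(-delta * np.exp(beta_hat @ z))

# Hypothetical fitted quantities (illustrative only):
beta_hat = np.array([0.05, 0.4])
cumhaz_hat = lambda t: 0.02 * t          # toy Breslow-type estimate
z = np.array([1.0, 2.0])
p_cond = static_conditional_survival(3.0, 1.0, z, beta_hat, cumhaz_hat)
```

Note that even this conditional calculation still uses β̂ and Λ̂0 estimated from all patients at baseline, which is the point made above.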
Using the dynamic prediction model, one would generally expect improved prediction accuracy because the updated models take into account survival to t0 and should more precisely estimate risk for patients after time t0. Indeed, previous work outside of diabetes has demonstrated the benefits of this dynamic approach: Parast & Cai [24] showed through a simulation study improved prediction performance when a dynamic landmark prediction model was used instead of a static model in a survival setting.
With respect to the selection of the landmark times t0, these are generally chosen based on the desired prediction times relevant to the particular clinical application. For example, if patients come in for yearly appointments, the t0 times of interest may be 1 year, 2 years, and 3 years; if patients come in every 2 years, the t0 times of interest may be 2 years and 4 years.

Model assumptions and model complexity
Both the static model and the dynamic prediction model described above rely on correct specification of the relevant models (models (1.2) and (1.4), respectively). Correct model specification includes the assumption of linearity in the covariates (i.e., β′Zi), the assumption of no omitted confounders, and the proportional hazards assumption. The proportional hazards assumption states that the ratio of the hazards for two different individuals is constant over time; this can be seen in the specification of model (1.2), where the ratio of the hazards for two individuals, λ(τ | Zi)/λ(τ | Zj), is exp(β′(Zi − Zj)), which is not a function of time. The simulation study of Parast & Cai [24] showed that when model (1.2) holds, the static model and dynamic landmark model perform equally well, but when this model is not correctly specified, the dynamic landmark model outperforms the static model.
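The time-constancy of the hazard ratio can be verified numerically; a toy check (Python, with an arbitrary baseline hazard and hypothetical coefficients, purely for illustration):

```python
import numpy as np

def cox_hazard_ratio(t, z_i, z_j, beta, baseline_hazard):
    """Ratio of model-(1.2) hazards for two covariate profiles at time t."""
    h_i = baseline_hazard(t) * np.exp(beta @ z_i)
    h_j = baseline_hazard(t) * np.exp(beta @ z_j)
    return h_i / h_j

beta = np.array([0.5])
lam0 = lambda t: 0.1 * np.exp(0.2 * t)   # any positive baseline hazard
z_i, z_j = np.array([2.0]), np.array([1.0])

# The baseline hazard cancels, so the ratio equals exp(beta'(z_i - z_j))
# regardless of which time t is plugged in:
ratios = [cox_hazard_ratio(t, z_i, z_j, beta, lam0) for t in (0.5, 1.0, 3.0)]
```

If the true hazard ratio does change with time, model (1.2) cannot capture it, which is one route by which the refit landmark models can outperform the single static model.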
Models (1.2) and (1.4) are relatively straightforward. These models could certainly be altered to incorporate desired complexities, including more complex functions of the covariates, spline or other basis expansions, and/or regularized regression. In addition, this dynamic prediction framework is not restricted to the Cox proportional hazards model alone. Other modeling approaches appropriate for time-to-event outcomes can be considered here, including an accelerated failure time model, a proportional odds model, or even a fully non-parametric model if there are only 1-2 covariates and the sample size is very large [25,26].

Evaluation of prediction accuracy
To evaluate the accuracy of the prediction models in this paper, we assessed both discrimination and calibration. Discrimination measures the extent to which the prediction rule can correctly distinguish between those who will be diagnosed with diabetes within 2 years and those who will not. As a measure of discrimination, we used the area under the receiver operating characteristic curve (AUC) [27,28], defined as

AUC_K = P(p̂Ki < p̂Kj | Ti ≤ τ, Tj > τ)

for K = D, S (i.e., dynamic and static), where p̂Di and p̂Si indicate the predicted probability of survival to time τ using the dynamic model and the static model, respectively, for person i. The AUC ranges from 0 to 1, with higher values indicating better prediction accuracy. The AUC has an appealing interpretation as the probability that the prediction model being evaluated will assign a lower probability of survival to an individual who will actually experience the event within the time period of interest, compared to an individual who will not. Calibration is based on the alignment between observed event rates and predicted event probabilities (i.e., how well predictions match observed rates). As a measure of calibration, we used the Brier Score [29,30], defined as

BS_K = E[{I(Ti > τ) − p̂Ki}²]

for K = D, S. The Brier Score ranges from 0 to 1, with lower values indicating better prediction accuracy; it captures the mean squared error comparing true event status with the predicted event probabilities obtained from the prediction model. As a test of calibration, we additionally calculated the Hosmer-Lemeshow goodness-of-fit test statistic (extended to survival data) [31,32]. We compare the AUC, Brier Score, and Hosmer-Lemeshow test statistic from the dynamic model versus the static model. Lastly, as another measure of comparison between the dynamic and static models, we calculated the net reclassification improvement (NRI) [33,34].
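The two metrics can be illustrated with a simplified, censoring-free computation (Python sketch; the estimators actually used in the paper are the inverse-probability-of-censoring-weighted versions, and the analyses were run in R):

```python
import numpy as np

def auc_brier(p_surv, event_by_tau):
    """Empirical AUC and Brier Score for predicted survival probabilities,
    ignoring censoring for simplicity.

    p_surv       : predicted P(T > tau) for each subject
    event_by_tau : True if the subject had the event by tau
    """
    cases, controls = p_surv[event_by_tau], p_surv[~event_by_tau]
    # P(case gets lower predicted survival than control); ties count 1/2
    diff = cases[:, None] - controls[None, :]
    auc = np.mean((diff < 0) + 0.5 * (diff == 0))
    # Mean squared error between survival status I(T > tau) and prediction
    brier = np.mean(((~event_by_tau) - p_surv) ** 2)
    return auc, brier

# Toy example: subjects 1 and 3 have the event by tau
p_surv = np.array([0.2, 0.9, 0.4, 0.8])
event = np.array([True, False, True, False])
auc, brier = auc_brier(p_surv, event)
```

Here every case receives a lower predicted survival than every control, so the AUC is 1; the Brier Score is small but nonzero because the predictions are not exactly 0 or 1.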
The NRI quantifies how well a new model (the dynamic model) reclassifies individuals in terms of estimated risk predictions, either appropriately or inappropriately, as compared to an old model (the static model).
For the AUC, Brier Score, and NRI, we used a nonparametric inverse probability of censoring weighted estimation approach that does not rely on correct specification of any of the prediction models described above [28,35], and we used 500 bootstrap samples to obtain confidence intervals and p-values [36]. In addition, for all four accuracy metrics, we used cross-validation, whereby we repeatedly split the data into a training set and a test set during the estimation process to guard against over-fitting (as we did not have access to an external validation data source) [37,38]. That is, when the same dataset is used both to construct and to evaluate a prediction rule, the prediction accuracy measures can appear overly optimistic because the prediction rule has been over-fit to the single dataset available; the accuracy observed may therefore not reflect what one could expect using an external validation data source. Cross-validation is helpful in settings where only one dataset is available: the data are split such that some portion is used to "train" the prediction rule (build the model) and the remainder is used to "test" the prediction rule, i.e., evaluate its accuracy. This is not as ideal as having access to an external validation source, but is better than no cross-validation at all. For our analysis, we took a random sample of 2/3 of the data as a training set, with the remaining 1/3 as the test set. This random splitting, fitting, and evaluating was repeated 100 times, and the average of the 100 estimates was calculated.
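The repeated-split procedure is generic; a minimal sketch (Python, with hypothetical `fit_fn`/`eval_fn` callbacks standing in for the actual Cox model fitting and IPCW accuracy estimation):

```python
import numpy as np

def repeated_split_cv(n, fit_fn, eval_fn, n_rep=100, train_frac=2/3, seed=0):
    """Average test-set accuracy over repeated random 2/3-1/3 splits.

    fit_fn(train_idx) -> fitted model; eval_fn(model, test_idx) -> metric.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_rep):
        perm = rng.permutation(n)          # random shuffle of subject indices
        n_train = int(train_frac * n)
        model = fit_fn(perm[:n_train])     # build the rule on the training 2/3
        scores.append(eval_fn(model, perm[n_train:]))  # score on held-out 1/3
    return float(np.mean(scores))
```

In the analysis described above, `fit_fn` would fit the static or landmark model on the training rows and `eval_fn` would return the IPCW AUC or Brier Score on the test rows, averaged over 100 repetitions.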
Application to diabetes prevention program: study description

Details of the Diabetes Prevention Program (DPP) have been published previously [39,40]. The DPP was a randomized clinical trial designed to investigate the efficacy of multiple approaches to prevent type 2 diabetes in high-risk adults. Enrollment began in 1996 and participants were followed through 2001. Participants were randomly assigned to one of four groups: metformin (N = 1073), troglitazone (N = 585; this arm was discontinued due to medication toxicity), lifestyle intervention (N = 1079), or placebo (N = 1082). After randomization, participants attended comprehensive baseline and annual assessments as well as briefer quarterly visits with study personnel. In this paper, we focus on the placebo and metformin groups. Though the lifestyle intervention was found to be more effective in reducing diabetes incidence in the main study findings [40], prescribing metformin for patients at high risk of diabetes is becoming more common in current clinical practice, and thus this comparison is likely of more practical interest [41]. We obtained data on 2057 DPP participants (1027 in the metformin arm, 1030 in the placebo arm) collected on or before July 31, 2001 as part of the 2008 DPP Full Scale Data Release through the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Data Repository, supplemented by participant data released by the 2011 Diabetes Prevention Program Outcomes Study, which followed participants through August 2008, after the conclusion of the DPP. The median follow-up time in this cohort was 6.11 years.
The primary outcome was time to development of type 2 diabetes mellitus, measured at mid-year and annual study visits, as defined by the DPP protocol: fasting glucose greater than or equal to 140 mg/dL for visits through 6/23/1997, greater than or equal to 126 mg/dL for visits on or after 6/24/1997, or 2-h post-challenge glucose greater than or equal to 200 mg/dL. For individuals who did not develop type 2 diabetes mellitus, observation time was censored at the date of their last visit within the study.
This study (a secondary data analysis) was approved by RAND's Human Subjects Protection Committee.

Application to diabetes prevention program: analysis
In this application, our goal was to provide predictions over a 2-year horizon for diabetes-free survival, updated at 1, 2, and 3 years post-baseline. That is, we predict diabetes-free survival to 2 years post-baseline, and then diabetes-free survival to 3, 4, and 5 years post-baseline, given that the patient already survived to 1, 2, and 3 years post-baseline, respectively. In our notation, τ = 2, 3, 4, 5 years, t0 = 0, 1, 2, 3 years, and t = 2 years. Our focus on relatively short-term survival here reflects both data availability for this study and the fact that the study population is composed of high-risk individuals.
We first fit the static model (model (1.2)) with covariates age, gender, BMI, smoking status, race/ethnicity, and baseline (the time of randomization) measurements of HbA1c and fasting plasma glucose. Recall that this results in a single model, with a single set of regression coefficients. To obtain our predictions of interest from the static model when t 0 > 0, probabilities were calculated using the HbA1c and fasting plasma glucose measurements at t 0 , applied to this single model.
Next, we fit dynamic landmark prediction models in which we additionally incorporate information on survival to the landmark times t0 = 1, 2, 3 years and on the change in HbA1c and fasting plasma glucose from baseline to t0. These models yield an estimate of the probability of a diabetes diagnosis within 2 years after the landmark time as a function of baseline characteristics, lab measurements at baseline, and the change in lab measurements from baseline to t0. This approach results in four models, each with its own set of regression coefficients. We stratified all analyses by treatment group: placebo and metformin.
Data availability, code and software

DPP data are publicly available upon request from the NIDDK Data Repository and require the establishment of a data use agreement. Code for all analyses presented here is available upon request from the authors. All analyses were performed in R version 3.3.2, an open-source statistical software environment, using the packages survival and landpred.

Results
Approximately 49% of participants in our sample were younger than 50, 67% were female, and the majority were of white race (Table 1). At baseline, more than one-third of participants had BMI greater than 35 kg/m², and the majority did not smoke. Previous analyses have shown that these characteristics were balanced across the randomized treatment groups [40,42]. Eight participants were missing HbA1c values at baseline and were thus excluded from our subsequent analyses.
A total of 182 participants assigned to the placebo arm (18%) and 126 participants assigned to the metformin arm (12%) were diagnosed with diabetes within 2 years of baseline. Among the 866 placebo participants and 914 metformin participants who survived to 1 year post-baseline without a diabetes diagnosis, 159 (18%) and 140 (15%), respectively, were diagnosed with diabetes within 2 years (i.e., by 3 years post-baseline). Among the 748 placebo participants and 815 metformin participants who survived to 2 years without a diabetes diagnosis, 105 (14%) and 127 (16%), respectively, were diagnosed with diabetes within 2 years (i.e., by 4 years post-baseline). Among the 638 placebo participants and 703 metformin participants who survived to 3 years without a diabetes diagnosis, 73 (11%) and 74 (11%), respectively, were diagnosed with diabetes within 2 years (i.e., by 5 years post-baseline).
In the baseline static prediction model for the placebo arm, the risk of developing diabetes within 2 years was higher for BMI ≥ 35 kg/m² than for BMI < 30 kg/m² (hazard ratio [HR] = 1.28, p < 0.05) and higher among Hispanic than among white participants (HR = 1.31, p < 0.05) (Table 2). In both treatment arms, higher baseline fasting plasma glucose and HbA1c were associated with higher diabetes risk (for glucose, HR = 1.08 in the placebo arm and 1.05 in the metformin arm, p < 0.001; for HbA1c, HR = 1.52 and 1.73, p < 0.001). In the dynamic models (see Additional file 1 for model results), the risks associated with each variable changed over time and, as expected, larger changes (increases) in fasting plasma glucose and HbA1c relative to baseline were associated with higher diabetes risk.
In terms of prediction accuracy, at baseline the static and dynamic models are equivalent and thus had equal AUC estimates, as expected (0.728 for the placebo group and 0.663 for the metformin group). At each subsequent landmark time (years 1, 2, and 3), the AUC of the dynamic model was slightly better than that of the static model (Fig. 1), though not significantly (e.g., 0.725 for the static model in the placebo group). The Brier Score at baseline was 0.130 for the placebo group and 0.107 for the metformin group for both models. At each landmark time, the Brier Score of the dynamic model was lower (i.e., better) than that of the static model (Fig. 1); in the placebo group, these Brier Score differences were statistically significant at all three landmark times. The Hosmer-Lemeshow test statistics, provided in Table 3, show that both the static model and the dynamic model are reasonably calibrated at most time points. Reclassification results are provided in Table 4; these quantities reflect the extent to which the dynamic landmark model moves an individual's predicted risk "up" or "down" in the correct direction, compared to the static model. In the metformin group, examining predictions at 1 year, these results show that among those individuals who will have an event within 2 years, the dynamic landmark model gave 40.4% of them a higher risk (correct direction of risk change) and 59.6% a lower risk (incorrect direction of risk change), compared to the static model. Among those who will not have an event within 2 years, the dynamic landmark model gave 38.1% a higher risk (incorrect direction of risk change) and 61.9% a lower risk (correct direction of risk change). On net, 4.6% of participants had more accurate risk estimates under the dynamic model than under the static model at year 1 (NRI = 4.6%, 95% CI: −15.8 to 24.9%, p = 0.661).
With the exception of predictions calculated at 1 year in the placebo group, the dynamic model tended to produce more accurate risk estimates than the static model, though these improvements were not statistically significant.

Discussion
Our results demonstrate the potential to improve individual risk prediction accuracy by incorporating information about biomarker changes over time into a dynamic modeling approach. Using DPP clinical trial data, we found that incorporating changes in fasting plasma glucose and HbA1c into the diabetes prediction model moderately improved prediction accuracy, in terms of calibration, among study participants in both the placebo and metformin trial arms. However, we found no evidence of improvements in terms of discrimination (i.e., AUC or NRI) when the dynamic model was used. This is not unexpected given that calibration and discrimination each measure important, but distinct, aspects of prediction accuracy [43,44]. These results indicate that while the dynamic model does not appear to significantly improve the ordering or ranking of individuals in terms of risk of a diabetes diagnosis, it does improve upon the absolute risk estimates compared to the static model. The clinical significance of this improvement in accuracy, as measured by the Brier Score and the Hosmer-Lemeshow test statistic, depends on the practical use of the calculated predictions. If risk estimates are to be compared with absolute thresholds for the purpose of clinical decision making (for example, when an intervention or treatment will be initiated if the risk of an event exceeds 10%), our observed small but significant improvement in precision may be considered clinically meaningful. However, the additional computational complexity required to implement the dynamic prediction model may not be worth the trade-off for this small improvement.
The methodology described here offers a straightforward approach to developing more accurate and personalized prediction rules for individual patients. In addition, this approach can be extended to take advantage of longitudinal electronic health record data that might already be available in practice. Multiple areas of health research have focused on collecting and improving the utility of a vast amount of patient-level data, for example, by allowing for data collection using smartphones or tablets [45,46]. The development of methods that can use this wealth of data to appropriately inform decision-making warrants further research. While most risk predictions are based on static models, there are some notable, very recently developed exceptions, such as the Million Hearts Longitudinal Atherosclerotic Cardiovascular Disease Risk Assessment Tool [47], which uses a dynamic prediction modeling approach. Though we do not focus heavily here on discussing the estimated associations between covariates and the primary outcome (i.e., the model coefficients and hazard ratios), we have assumed that these associations would be important to practitioners in this setting. For example, both practitioners and patients may wish to view explicit regression coefficients to understand the contribution of each risk factor to their risk score [48]. If this were not the case, and only the individual predictions were needed, then other approaches, such as machine learning methods including boosting algorithms and artificial neural networks, which could also incorporate this dynamic prediction concept, should be considered [49][50][51][52]. Though these approaches do not provide explicit estimates of the associations between individual covariates and the primary outcome (e.g., regression coefficient estimates), they might be useful when the relationships between covariates and outcomes are complex (e.g., nonlinear, nonadditive) and/or a large number of covariates is available (e.g., genetic information). Future research comparing our approach to machine learning approaches in a dynamic prediction framework is warranted.
Our study applying these methods to the DPP data has some limitations. First, since these data are from a clinical trial that was specifically focused on high-risk adults, these results may not be representative of individuals at lower risk for diabetes. Second, our data lacked precise information on patient characteristics (exact age and BMI, for example) and were limited to the biological information available in the DPP data release. This may have contributed to the moderate overall prediction accuracy we observed (AUC in the 0.6-0.7 range), even with the dynamic model. Future work examining the utility of dynamic models is warranted within studies that have more patient characteristics available for prediction. However, even with this limitation, this illustration shows the potential advantages of such a dynamic approach over a static approach.

Conclusions
Dynamic prediction has the potential to improve the accuracy of future health status predictions for individual patients. Given the widespread use of risk prediction tools in population management and clinical decision making, even modest enhancements in prediction accuracy could yield improvements in care for large numbers of patients, at little added cost or effort.

Additional file
Additional file 1: