A competing-risk-based score for predicting twenty-year risk of incident diabetes: the Beijing Longitudinal Study of Ageing study

Few risk tools have been proposed to quantify the long-term risk of diabetes among middle-aged and elderly individuals in China. The present study aimed to develop a risk tool to estimate the 20-year risk of developing diabetes while incorporating competing risks. A three-stage stratified random-cluster sampling procedure was used to ensure the representativeness of the Beijing elderly. We prospectively followed 1857 community residents aged 55 years and above who were free of diabetes at baseline examination. Sub-distribution hazards models were used to adjust for the competing risks of non-diabetes death. The cumulative incidence function of twenty-year diabetes event rates was 11.60% after adjusting for the competing risks of non-diabetes death. Age, body mass index, fasting plasma glucose, health status, and physical activity were selected to form the score. The area under the ROC curve (AUC) was 0.76 (95% Confidence Interval: 0.72–0.80), and the optimism-corrected AUC was 0.78 (95% Confidence Interval: 0.69–0.87) after internal validation by bootstrapping. The calibration plot showed that the actual diabetes risk was similar to the predicted risk. A cut-off value of 19 points on the risk score marked the difference between low-risk and high-risk patients, with a sensitivity of 0.74 and a specificity of 0.65.

Scientific Reports | 6:37248 | DOI: 10.1038/srep37248
There are a number of risk assessment tools based on readily available clinical variables that predict the development of new diabetes cases, including ones proposed by the Framingham Offspring study 8 , Rancho Bernardo study 9 , and Guangzhou Biobank Cohort study 10 . The available tools have been derived from European 7,8,11-13 , American 6,8,9,14 , Australian 15 , Brazilian 16 , African 17 , and Asian 10,18-21 populations. Differences in ethnicities, locations, and lifestyles partly limit the applicability of some of the effective risk scores to the Chinese population. Despite the large number of risk tools being developed, only a very small minority are designed for middle-aged and older Asian populations, Chinese populations in particular 8,10 . In addition, few diabetes risk prediction models included health status, despite the fact that some studies have confirmed that health status is an important diabetes predictor 22 . Furthermore, current diabetes risk prediction algorithms were developed for 10-year periods or less. The increasing life expectancy and elderly population suggest the need for longer-term risk assessment tools.
Additionally, conventional statistical methods that analyse time-to-event data assume an absence of competing risks 23 . However, competing risks must be considered explicitly in frail populations, especially among the elderly 24 . Ignoring the presence of competing risks can bias estimates of the incidence of the event of interest upwards 24-26 . Specifically, the sum of the estimates of each event type's incidence will exceed the estimate of the incidence of the composite outcome, defined as any of the event types 25 . To overcome this problem, sub-distribution hazards models (e.g., the Fine-Gray model) were proposed, in which the cumulative incidence function (CIF) is used to estimate the incidence of an event while accounting for the presence of competing events 25 . Sub-distribution hazards models permit one to assess the association of predictors with the absolute risk of DM as well as to calculate the absolute risk of DM conditional on those predictors. Sub-distribution hazards models are being increasingly applied to predict diseases 26,27 . However, to the best of our knowledge, no algorithm has been proposed that quantifies the 20-year risk of diabetes among middle-aged and elderly individuals using a sub-distribution hazards model.
In this report, we develop a risk tool for estimating the 20-year risk of developing diabetes among middle-aged and elderly individuals who are free of diabetes at baseline. Our risk estimates adjust for the competing risk of non-diabetes death and simultaneously include lifestyle behaviours, psychological factors, cognitive function and physical conditions. The tool is based on the Beijing Longitudinal Study on Ageing, which has contributed to the development of a 10-year risk score algorithm for coronary artery disease using a sub-distribution hazards model 27 , and offers 20 years of rigorous surveillance data for diabetes occurrence.

Results
Baseline Characteristics and Follow-up. We followed 1857 participants who did not have diabetes at baseline for a median of 10.9 years (interquartile range: 8.0-15.3). The average age at baseline was 69.00 ± 8.81 years for women and 69.88 ± 8.55 years for men. By the end of 2012, there were 144 documented cases of incident diabetes and 919 non-diabetes deaths. Approximately 4.7% of the participants were lost to follow-up (n = 87). The incidence density of diabetes was 7.908/1000 person-years. The cumulative incidence function (CIF) of incident diabetes was 11.60% after adjusting for the competing risks of non-diabetes deaths. There were differences between the incident diabetes and non-diabetes groups in the baseline distribution of age, disability, marital status, self-assessment of health status, blood lipids, and physical exercise (P < 0.05) (Table 1). The baseline characteristics of the subjects by non-diabetes and diabetes events for men and women are also provided in Table 1. The sensitivity analysis showed that there were no statistically significant differences in the distribution of baseline characteristics between those lost to follow-up and those retained.
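As a quick consistency check on the follow-up figures quoted above, the short sketch below recomputes the loss-to-follow-up percentage and derives the total person-years implied by the reported incidence density. The person-years figure is not stated in the text; it is back-calculated here and should be read as an approximation.

```python
# Figures quoted in the Results text
n_participants = 1857
n_lost = 87
n_cases = 144
incidence_density_per_1000 = 7.908  # cases per 1000 person-years

# Share of participants lost to follow-up, in percent
lost_pct = 100 * n_lost / n_participants

# Person-years implied by the reported incidence density
# (derived for illustration; this total is not reported in the text)
implied_person_years = n_cases / (incidence_density_per_1000 / 1000)

print(f"lost to follow-up: {lost_pct:.1f}%")
print(f"implied person-years: {implied_person_years:.0f}")
```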
Diabetes Risk Prediction Model. Univariate analyses were used to regress the sub-distribution hazard of diabetes incidence on all twelve candidate variables, and the estimated regression coefficients, estimated regression sub-hazard ratios, estimated 95% confidence intervals, and the statistical significance of the estimated regression coefficients are reported in Table 2. After accounting for competing risk events in the risk set, standard diabetes risk factors (female gender, age, overweight/obesity, IFG, poor self-assessment of health, divorced or single, and high blood lipids) were significant in the univariate analysis (P < 0.05). Then, all significant variables in the univariate analyses were entered into the multivariate prediction model; five variables were retained after backward selection (Table 2). In the multivariate prediction model, after all adjustments, a greater risk of diabetes incidence was associated with impaired FPG (SHR = 1.99, 95% CI = 1.37-2.90), poor self-assessment of health (SHR = 1.73, 95% CI = 1.19-2.51), overweight (SHR = 2.15, 95% CI = 1.44-3.21) or obesity (SHR = 1.96, 95% CI = 1.27-3.03), and less physical activity (SHR = 1.39, 95% CI = 1.01-1.91). The bootstrap-adjusted regression coefficients, SHR and score of the sub-distribution hazards model are presented in Table 3.
Calibration, Discrimination, Reclassification, and Internal Validation. The calibration plot of the sub-distribution hazards model showed good calibration (Hosmer-Lemeshow test, chi-square = 4.544, P value = 0.805), and the actual diabetes risk in the BLSA cohort was similar to the predicted risk (Fig. 1). The sub-distribution hazards model performed better in terms of discrimination and calibration than the Cox proportional hazards model. The area under the ROC curve (AUC) values were 0.76 (95% CI: 0.72-0.80) and 0.73 (95% CI: 0.69-0.77) for the sub-distribution hazards model and the Cox proportional hazards model, respectively (Fig. 2). The AUC of the sub-distribution hazards model was better than that of the Cox proportional hazards model at t = 20 years (Z = 4.30, P = 0.00002). The difference in AUC between the sub-distribution and Cox proportional hazards models was greater than zero (P = 0.307) (Fig. 3). After internal validation by bootstrapping, the optimism-corrected AUC at t = 20 years was 0.78 (95% CI: 0.69-0.87) for the sub-distribution hazards model and 0.74 (95% CI: 0.65-0.84) for the Cox proportional hazards model, suggesting a well-validated model.
Additional value of self-rated health. The added value of the self-rated health variable was assessed by the paired difference of risk scores. The empirical distribution function of the change in estimated risk scores for subjects who had events (thick solid line) and those who were event-free (thin solid line) was assessed (Fig. 4). The difference between the areas under the two curves is the IDI, and the distances between the two black dots and between the two grey dots represent the continuous NRI and the median improvement, respectively. The estimates of IDI and NRI were 0.019 (95% CI: 0.002-0.054; P = 0.024) and 0.124 (95% CI: 0.032-0.236; P = 0.028), respectively, at t = 20 years. The median increment in the risk score after including self-rated health status in the prediction model was −0.002 (95% CI: −0.008 to 0.005; P = 0.351) at t = 20 years.
Diabetes Risk Score Tool. Finally, we developed a simple risk score tool to estimate the 20-year diabetes risk for each individual using the baseline cumulative incidence function and the bootstrap-adjusted regression coefficients of the sub-distribution hazards model (Table 4). The score ranges from −4 to 38, and is positively related to the predicted risk of developing diabetes by linear regression (P for trend < 0.001). The competing-risk-based score exhibited a reasonable sensitivity of 0.74 and specificity of 0.65, with an optimal cut-off value of 19 marking the difference between low-risk and high-risk patients at t = 20 years.

Discussion
Using a community-based sample with a 20-year follow-up, we have constructed a multivariable risk factor algorithm applying a competing risk model that can be used to predict an individual's risk and provides a helpful guide to identifying the groups at high risk for diabetes among adults over 55 years of age. To the best of our knowledge, this is the first community-based diabetes prediction model considering competing risk to be developed for an elderly population in China.
In terms of discrimination and calibration, the competing risk model is superior to the Cox proportional hazards model. The competing risk analysis and the Cox proportional hazards model may show no relevant differences when the mortality rate is low. From a statistical perspective, these models are not strictly comparable, as they model different endpoints (cumulative incidence versus cause-specific hazard). The present study extends previous general diabetes risk models by adding a new risk factor, and the prediction model including self-rated health status was superior to the model without it. A user-friendly risk score tool predicting the 20-year probability of diabetes was developed.
Currently, the Finnish Diabetes Risk Score (FINDRISK) 7 , Framingham DM risk score 8 , Cambridge Diabetes Risk Score 11 , and German Diabetes Risk score 13 are the most widely used scores in clinical guidelines. In addition, there are a number of other important risk algorithms or functions 28 . However, a prediction model specifically designed for the risk of incident diabetes in the Chinese elderly population is not currently available, especially one considering the competing risk. Our risk prediction model provided a feasible tool for identifying the high-risk individuals among the elderly in Beijing.
To the best of our knowledge, this is the first community-based diabetes prediction model considering competing risk that has been developed for the elderly population in China. It should be emphasized that general model evaluation methods are not applicable to competing risk models. We therefore calculated calibration plots, the net reclassification index (NRI), and the integrated discrimination improvement (IDI), and these values were adjusted for the competing risk of non-diabetes death.
The AUCs of previous diabetes risk scores for elderly adults ranged from 0.71 to 0.78 in their original population 9  variables. If further predictors related to blood test results were included, the scores would likely show an improved performance. In our score based on the competing risk model, age is the strongest predictor of incident diabetes (a contribution of 15 points). Individuals aged 55 to 65 years have the highest risk of developing diabetes in our scores (accounting for 39.47% of the total score based on the competing risk model), followed by individuals aged 66 to 75 years. Similar results were found in the Guangzhou Biobank Cohort Study (GBCS), which was a 4.1-year population-based follow-up of 16,043 Chinese aged 50 years or above 10 .
BMI is the second-strongest predictor in our scores, and has been included in most of the published scores used to predict incident diabetes 10 . In our scores, the FPG variable is the third-strongest predictor after BMI (a contribution of 7 points). This result is roughly consistent with previous reports 9 . The value representing impaired fasting glucose (IFG) has been defined to be from 6.1 to 6.9 mmol/L. It is unsurprising that individuals with IFG have a high risk of developing diabetes. The risk of incident diabetes increased with high FPG levels.
Physical activity is also an important predictor of incident diabetes, and environmental pathways may be able to account for this relationship 13 . It has been demonstrated that interventions that include increases in physical activity are able to reduce the incidence of diabetes in high-risk adults 29,30 . Another possible reason for this finding is that participants who frequently exercise are more likely to be aware of their blood glucose levels than people who never or rarely exercise.
We are the first to include self-rated health status in a diabetes prediction score. In the competing-risk-based score, self-rated health status was assigned 6 points. Self-rated health (SRH) is a reflection of social, psychological, and biologic dimensions; it is one of the most widely used yet poorly understood measures of health 31 . In the present study, SRH was based on individuals' assessment of their health status compared with that of peers their age. Similar to our results, SRH scores provide additional valuable information for risk prediction in patients with diabetes 32 , and SRH has also been recommended as a tool for cardiovascular disease risk assessment 33 . Thus, diabetes guidelines should extend their focus on clinical and social aspects of diabetes to include questions on patients' SRH 34 .
There were some limitations to our study. First, we did not include waist circumference. However, the Guangzhou Biobank Cohort Study showed that using waist circumference or waist-to-hip ratio instead of BMI did not substantially improve the discrimination 10 . Second, due to the long-term follow-up, follow-up biases could easily have been introduced. However, the sensitivity analysis showed that there were no statistically significant differences in the distribution of baseline characteristics between those lost to follow-up and those who remained in the study. In addition, because cases of diabetes were identified through reexamination and questionnaires, diabetes onset may have occurred before the recorded date of diagnosis.

Conclusion
We constructed a multivariable risk score using a community-based sample with a 20-year follow-up that can be used to predict an individual's risk for diabetes among adults over 55 years of age. To the best of our knowledge, this is the first community-based diabetes risk score to consider competing risk developed for an elderly population in China. Further studies are needed to test this score in other population samples of China.

to December 2012, and this study was managed by Xuanwu Hospital of Capital Medical University in Beijing, China. In total, 244 subjects were excluded because they had either a baseline FPG level higher than 7.0 mmol/L (126 mg/dL) or a history of diabetes (as informed by a physician) or because they were taking antidiabetic medicine. This left 1857 participants (925 men and 932 women) who did not have diabetes at baseline for the analysis.
The study followed the guidelines of the Helsinki Declaration and was approved by the ethics committee of Xuanwu Hospital, Capital Medical University. Written informed consent was obtained from all participants.
Assessment of risk factors and outcomes. The candidate baseline variables presented in Table 1 were chosen for their common availability and use in previous diabetes prediction models. The demographic characteristics and information on dietary habits, lifestyle, psychological factors and physical condition were obtained using questionnaires with a high degree of reliability and accuracy 36 ; the questionnaires were administered by hospital research doctors who were specifically trained for the job. The questionnaires were designed by the Beijing Geriatric Clinical and Research Centre and the Australian Geriatric Research Centre of Flinders University. The measurement and classification of each categorical variable have been reported elsewhere in detail 35 .
A food frequency questionnaire was conducted for the dietary assessment 37 . Then, a latent class model was constructed and the best model was selected according to the value of the Bayesian information criterion. Based on the posterior probability (representing the frequency of food intake), dietary habits were divided into three latent groups: sufficient nutrition, intermediate-type and meat-based diet. Self-reported smoking, drinking, residence, and health status and the frequency of physical activity were evaluated by questionnaires with a high degree of reliability and accuracy. If the elderly exercised almost every day, this was defined as exercising frequently. The activities included Qi Gong, TaiChi, walking, running/jogging, dancing, etc.
Age was categorized into three sub-groups: 55 to 65 years, 66 to 75 years, and ≥ 76 years. Marital status was divided into two categories: married and unmarried. Height, weight, hip circumference, and waist circumference (2.5 cm above the umbilicus) were measured in the standing position without heavy clothing to the nearest 0.1 cm or 0.1 kg by nurses who were responsible for annual routine health examinations. BMI was calculated according to the equation BMI = weight (kg)/height (m)² and was classified based on the common Chinese criteria 38 , i.e., thin corresponding to BMI < 18.5 kg/m², normal to 18.5 ≤ BMI < 24.0 kg/m², overweight to 24.0 ≤ BMI < 28.0 kg/m², and obese to BMI ≥ 28.0 kg/m².
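The BMI calculation and the Chinese classification thresholds described above can be sketched as follows. The function names are illustrative and are not taken from the study's analysis code.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def classify_bmi(value: float) -> str:
    """Map a BMI value (kg/m^2) to the four categories quoted in the text."""
    if value < 18.5:
        return "thin"
    if value < 24.0:
        return "normal"
    if value < 28.0:
        return "overweight"
    return "obese"


# Example: a participant weighing 70 kg with a height of 1.75 m
example_category = classify_bmi(bmi(70, 1.75))
print(example_category)
```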
Blood pressure (BP) was measured twice on the left arm of the seated participants with a mercury sphygmomanometer and an appropriately sized cuff; the average of the blood pressure measurements constituted the examination blood pressure value. The two BP measurements were obtained with a 5-minute interval. If the two measurements differed by more than 5 mmHg, an additional reading was taken, and the average of the readings was used for the analysis. BP was classified into two groups: high (systolic blood pressure > 140 mmHg or diastolic blood pressure > 90 mmHg) and normal blood pressure.
Blood samples were collected after an overnight fast of at least 12 hours. Fasting plasma glucose (FPG), total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C) were subsequently determined with standardized enzymatic methods. Based on the standards for impaired FPG and dyslipidaemia, an FPG level of 6.1 to 6.9 mmol/L (109.8-125.9 mg/dL) was considered to indicate impaired fasting glucose (IFG) 39 . A TC level of 5.18 mmol/L (200 mg/dL) or greater, a TG level of 1.7 mmol/L (150 mg/dL) or greater, an HDL-C level less than 1.03 mmol/L (40 mg/dL) in men and 1.29 mmol/L (50 mg/dL) in women, or an LDL-C level of more than 3.35 mmol/L (130 mg/dL) were considered to indicate dyslipidaemia 40 .
The outcome of interest was the first incidence of diabetes at follow-up. This was identified according to a self-reported history of diabetes diagnosis, the use of antidiabetic medicine after the baseline examination, or a measured FPG level ≥ 7.0 mmol/L (126 mg/dL) at any of the periodic examinations. The date of diagnosis (incidence) was defined as the date of the examination visit when a new case of diabetes was identified or the diagnosis date on the most recently documented diabetes history collected by the questionnaire, whichever came first. Survival status was determined through interviews with surviving household members, or with neighbours when surviving household members were unavailable. For a subset of participants, this information was verified against household registration records. Cause of death was determined according to the International Classification of Diseases (ICD), ninth or tenth revision (ICD-9 or ICD-10). Non-diabetes deaths, including those from cardiovascular diseases, cancers and other causes, were classified as competing events.
Statistical Analysis. Follow-up time was calculated from the return date of the 1992 questionnaire until incidence of diabetes, death, loss to follow-up, or the end of follow-up (December 2012), whichever came first. Considering the length of the follow-up and the potential bias due to the competing risk of non-diabetes mortality, we employed a sub-distribution hazards model to adjust the risk estimates for the competing risk of non-diabetes death 25 . The sub-distribution hazards model was used to calculate the cumulative incidence of diabetes. Sub-distribution hazards models, adjusted for clinical and biochemical variables, were fitted to predict the risk of developing diabetes using the cmprsk and crrstep packages in R. In the first step, univariate sub-distribution hazards models were used to regress the sub-distribution hazard of diabetes incidence on all nineteen candidate variables, and variables whose estimated regression coefficients had P > 0.20 were removed. In the second step, all remaining significant variables were entered into a multivariate prediction model with backward selection. In the third step, the variables retained were used to build the final prediction model. For each model, sub-distribution hazard ratios (SHRs) and 95% confidence intervals (95% CIs) were calculated to estimate the relative risk.
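For reference, the cumulative incidence under a sub-distribution hazards model is usually written in the standard Fine-Gray form shown below. This is the textbook formulation, given here as an assumption about the model used rather than as the authors' own equation. The symbols are introduced for illustration: T is the event time, ε = 1 denotes incident diabetes, X is the covariate vector, β the regression coefficients, λ₁₀(t) the baseline sub-distribution hazard, and F₁₀(t) the baseline cumulative incidence function.

```latex
\lambda_1(t \mid X) = \lambda_{10}(t)\,\exp\!\left(\beta^{\top} X\right),
\qquad
F_{10}(t) = 1 - \exp\!\left(-\int_0^{t} \lambda_{10}(s)\,ds\right),

F_1(t \mid X) = P(T \le t,\ \varepsilon = 1 \mid X)
             = 1 - \bigl[\,1 - F_{10}(t)\,\bigr]^{\exp\left(\beta^{\top} X\right)}.
```

Under this form, exp(β) for a covariate is its sub-distribution hazard ratio (SHR), and the 20-year risk for an individual follows from the baseline cumulative incidence at t = 20 years and that individual's linear predictor.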
Self-rated health is an important risk factor for diabetes, as confirmed in some studies 22,41 . However, no diabetes risk prediction models have considered the impact of self-assessed health status. Therefore, the diabetes risk prediction model in this study accounted for self-assessment of health status. We did not account for interaction terms between the independent variables. All continuous variables included in the model were categorized, and thus the estimated contribution of these factors to diabetes risk could be expressed through simplified point scores assigned to each category. In addition, points for each risk factor were determined by multiplying the β-coefficients by 10 and rounding to the nearest integer. The sum of these points for each model was further calculated to predict the hazard of the incidence of diabetes over a mean follow-up period of 9.81 years for each person.
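The point-assignment rule described above (multiply each β-coefficient by 10 and round to the nearest integer) can be illustrated with a minimal sketch. It assumes β = ln(SHR) and uses the multivariate SHRs quoted in the Results as inputs. The published score is built from bootstrap-adjusted coefficients (Table 3), so the points computed here are only approximate and need not match the published ones (for example, self-rated health is assigned 6 points in the final score).

```python
import math


def points_from_shr(shr: float) -> int:
    """Convert a sub-distribution hazard ratio to score points.

    Assumes beta = ln(SHR); points = beta * 10, rounded to the nearest integer.
    """
    beta = math.log(shr)
    return round(10 * beta)


# Multivariate SHRs quoted in the Results (not the bootstrap-adjusted values)
shrs = {
    "impaired FPG": 1.99,
    "poor self-assessment of health": 1.73,
    "overweight": 2.15,
    "obesity": 1.96,
    "less physical activity": 1.39,
}

points = {name: points_from_shr(value) for name, value in shrs.items()}
for name, pts in points.items():
    print(f"{name}: {pts} points")
```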
After the prediction models were developed, it was critical to evaluate their performance. The receiver operating characteristic (ROC) curve and areas under the ROC curves (AUCs, also referred to as C statistics) were used to evaluate the discriminative ability of the sub-distribution hazards models 42 and were obtained with the ROCR package in R (R Foundation for Statistical Computing, Vienna, Austria) 43 . The cut-off point was estimated by calculating the value that minimizes the Euclidean distance between the ROC curve and the upper left corner of the graph. Calibration refers to the agreement between observed outcomes and predictions. The calibration of the model was assessed graphically by comparing the predicted probability with the observed probability across the 10 deciles of predicted risk 44 , which was performed with the R package pec. The wider the spread of risk across the 10 deciles, the better the model discriminates. The Hosmer-Lemeshow test was used to indicate the goodness of fit.
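The cut-off rule described above (the point on the ROC curve closest to the upper left corner) can be sketched in plain Python. The scores and outcomes below are made-up toy data, the function names are illustrative, and the sketch treats the outcome as binary, ignoring censoring and competing risks.

```python
import math


def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A subject is classified as high risk when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def closest_to_corner(scores, labels):
    """Pick the threshold minimizing the distance to (FPR = 0, TPR = 1)."""
    def distance(point):
        _, sens, spec = point
        return math.hypot(1 - sens, 1 - spec)
    return min(roc_points(scores, labels), key=distance)


# Toy data (hypothetical): risk scores and observed diabetes status
toy_scores = [5, 10, 12, 15, 18, 20, 22, 25, 30, 33]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
best_threshold, best_sens, best_spec = closest_to_corner(toy_scores, toy_labels)

# Distance to the corner for the operating point reported in the paper
reported_distance = math.hypot(1 - 0.74, 1 - 0.65)
```

The same distance criterion applied to the reported sensitivity of 0.74 and specificity of 0.65 gives a value of about 0.44.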
Additionally, internal validation was supported by estimating the potential of over-fitting and the optimism of the models 45 , which was performed by applying bootstrap resampling 1000 times with R package pROC. The bootstrap optimism-corrected AUC was computed by subtracting the optimism from the original AUC. Bootstrap-adjusted regression coefficients better reflect what can be expected when the model is tested or applied in new individuals from the same theoretical source population 45 . However, no internal validation methods can substitute for external validation.
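The optimism correction described above is commonly computed as follows; this is the usual bootstrap formulation, stated here as an assumption about the procedure rather than a reproduction of the authors' code. The notation is introduced for illustration: B is the number of bootstrap resamples (1000 here), AUC_app is the apparent AUC of the model fitted and evaluated on the original data, and for the model refitted on resample b, AUC_b^boot is its AUC on that resample and AUC_b^orig its AUC on the original data.

```latex
\widehat{\text{optimism}} = \frac{1}{B} \sum_{b=1}^{B}
  \left( \mathrm{AUC}_b^{\text{boot}} - \mathrm{AUC}_b^{\text{orig}} \right),
\qquad
\mathrm{AUC}_{\text{corrected}} = \mathrm{AUC}_{\text{app}} - \widehat{\text{optimism}}.
```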
Recently, some novel alternatives to the area under the receiver operating characteristic curve, such as net reclassification improvement (NRI) and integrated discrimination improvement (IDI), have been proposed 46 to measure the improvement in prediction from a new risk factor. The NRI and IDI are two new metrics used to formally assess new risk factors and to supplement the improvement in the AUC; they were assessed using the R package survIDINRI. All P values reported were two-sided. Two-independent-sample chi-square tests were performed in SAS software (Version 9.2, SAS Institute Inc., Cary, NC).
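To make the two metrics concrete, the sketch below computes the continuous NRI and the IDI for a binary outcome from predicted risks under an old and a new model. It is a simplified illustration on made-up data: unlike the survIDINRI package used in the study, it does not handle censoring or competing risks, and the function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def continuous_nri(p_old, p_new, events):
    """Continuous NRI: net upward movement in events plus net downward
    movement in non-events when switching from the old to the new model."""
    ev = [(o, n) for o, n, e in zip(p_old, p_new, events) if e == 1]
    ne = [(o, n) for o, n, e in zip(p_old, p_new, events) if e == 0]
    nri_events = (sum(n > o for o, n in ev) - sum(n < o for o, n in ev)) / len(ev)
    nri_nonevents = (sum(n < o for o, n in ne) - sum(n > o for o, n in ne)) / len(ne)
    return nri_events + nri_nonevents


def idi(p_old, p_new, events):
    """IDI: mean change in predicted risk among events minus the mean
    change among non-events."""
    change_events = mean([n - o for o, n, e in zip(p_old, p_new, events) if e == 1])
    change_nonevents = mean([n - o for o, n, e in zip(p_old, p_new, events) if e == 0])
    return change_events - change_nonevents


# Toy data (hypothetical): predicted risks without and with the new factor
old_risk = [0.10, 0.20, 0.30, 0.40]
new_risk = [0.05, 0.25, 0.45, 0.50]
outcome = [0, 0, 1, 1]

toy_nri = continuous_nri(old_risk, new_risk, outcome)
toy_idi = idi(old_risk, new_risk, outcome)
```

Positive values of either metric indicate that the new model assigns higher risks to subjects who develop the event and lower risks to those who do not, relative to the old model.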