A Prediction Model Based on Noninvasive Indicators to Predict the 8-Year Incidence of Type 2 Diabetes in Patients with Nonalcoholic Fatty Liver Disease: A Population-Based Retrospective Cohort Study

Background. The prevention of type 2 diabetes (T2D) and its associated complications has become a major priority of global public health. In addition, there is growing evidence that nonalcoholic fatty liver disease (NAFLD) is associated with an increased risk of diabetes. Therefore, the purpose of this study was to develop and validate a nomogram based on independent predictors to better assess the 8-year risk of T2D in Japanese patients with NAFLD.
Methods. This is a historical cohort study from a collection of databases that included 2741 Japanese participants with NAFLD without T2D at baseline. All participants were randomly assigned to a training cohort (n = 2058) or a validation cohort (n = 683). The data of the training cohort were analyzed using the least absolute shrinkage and selection operator method to screen for suitable and effective risk factors for Japanese patients with NAFLD. A Cox regression analysis was applied to build a nomogram incorporating the selected features. The C-index, receiver operating characteristic (ROC) curve, calibration plot, decision curve analysis, and Kaplan-Meier analysis were used to validate the discrimination, calibration, and clinical usefulness of the model. The results were reevaluated by internal validation in the validation cohort.
Results. We developed a simple nomogram that predicts the risk of T2D for Japanese patients with NAFLD by using the parameters of smoking status, waist circumference, hemoglobin A1c, and fasting blood glucose. For the prediction model, the C-index of the training cohort and validation cohort was 0.839 (95% confidence interval (CI), 0.804-0.874) and 0.822 (95% CI, 0.777-0.868), respectively. The pooled area under the ROC curve of 8-year T2D risk in the training cohort and validation cohort was 0.811 and 0.805, respectively. The calibration curve indicated good agreement between the probability predicted by the nomogram and the actual probability. The decision curve analysis demonstrated that the nomogram was clinically useful.
Conclusions. We developed and validated a nomogram for the 8-year risk of incident T2D among Japanese patients with NAFLD. Our nomogram can effectively predict the 8-year incidence of T2D in Japanese patients with NAFLD and helps to identify people at high risk of T2D early, thus contributing to effective prevention programs for T2D.


Introduction
The global prevalence of adult diabetes has increased rapidly in recent decades and has become a major public health problem [1]. Type 2 diabetes (T2D), the most common form of diabetes, is a chronic disease characterized by elevated blood glucose levels due to insufficient insulin production and insulin resistance [2]. In addition, there is growing evidence that nonalcoholic fatty liver disease (NAFLD) is associated with an increased risk of diabetes, independent of traditional diabetes risk factors [3,4]. Because T2D is a debilitating chronic disease, a core component of T2D prevention strategies is to identify individuals at high risk of T2D [5]. Studies have shown that early identification of people at high risk for T2D and timely lifestyle changes or pharmacological interventions can delay the process of β-cell failure and the development of T2D [6,7]. Therefore, it is important to study the risk factors for the development of T2D in Japanese patients with NAFLD and to find an easy, reliable, and accurate screening tool for identifying groups at high risk of T2D among patients with NAFLD. This will contribute to the effective implementation of T2D prevention programs in Japanese adults with NAFLD.
Risk prediction models have great potential in the decision-making process for the management of subhealthy populations and patients [8,9]. By predicting the onset of disease, they can help guide screening and interventions. At present, a variety of risk prediction models have been constructed to identify individuals at high risk of T2D, such as the Leicester Risk Assessment [10], the Cambridge Risk Score [11], the QDiabetes® Calculator [12], and FINDRISC [13], but they all have a number of limitations. First, most do not take into account lifestyle factors such as physical activity, smoking, and alcohol consumption. Second, some are based on invasive and costly data, or on small or inappropriately selected cohorts. Third, others are based on short-term follow-up or lack transparent reporting of the steps that produced the model. Most importantly, these diabetes prediction models are based on the general population, and limited research has focused on individuals at low risk.
The purpose of this paper is to develop a T2D risk prediction model for Japanese patients with NAFLD based on data from the NAGALA cohort study to better screen and assess the 8-year risk of developing T2D in high-risk nondiabetic patients.

Data Source.
We downloaded the raw data uploaded by Okamura et al. from the "DATADRYAD" database (http://www.datadryad.org) free of charge. Since Okamura et al. [14] have transferred ownership of the raw data to the Dryad website, we were able to use them for secondary data analysis under different scientific hypotheses (Dryad data package: 10.5061/dryad.8q0p192).

Study Design and Participants.
In the original study, Okamura et al. [14] used the NAGALA (NAfld in the Gifu Area, Longitudinal Analysis) database to investigate the effect of obesity phenotype on the risk of developing T2D. Since most of the participants underwent repeated examinations, the researchers conducted a follow-up study of incident T2D diagnosed by blood tests and fatty liver diagnosed by abdominal ultrasound. The present study was a secondary analysis of the open data from the NAGALA study. The inclusion criteria for the NAGALA study were described in detail in the original article [15]. Briefly, a total of 15744 participants were selected after applying the following exclusion criteria: (1) a lack of important data; (2) the participants had hepatitis B or C virus or fatty liver disease; (3) alcohol intake exceeding 60 g per day for men or 40 g per day for women; (4) the participants took medication at baseline; (5) fasting plasma glucose ≥ 6.1 mmol/L.
Participants were classified as nonsmokers, past smokers, or current smokers. Nonsmokers were defined as participants who had never smoked, past smokers as participants who had smoked in the past but quit before the baseline examination, and current smokers as participants who smoked at the time of the baseline examination. Researchers also investigated participants' recreational and physical activities. Regular exercisers were defined as participants who took part in any type of exercise at least once a week.

BioMed Research International

Definitions.
T2D was defined as HbA1c ≥ 48 mmol/mol, fasting plasma glucose (FPG) ≥ 126 mg/dL, and/or self-reported diabetes during follow-up. NAFLD was defined as having fatty liver demonstrated by abdominal ultrasound.
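The outcome definition above can be read as an "any criterion met" rule. The following is a minimal sketch under that reading; the function and parameter names are illustrative assumptions and do not come from the NAGALA dataset.

```python
def has_incident_t2d(hba1c_mmol_mol, fpg_mg_dl, self_reported):
    """Return True if at least one criterion of the T2D definition is met.

    Assumption: the "and/or" in the definition means that any single
    criterion is sufficient. Names here are hypothetical.
    """
    return (
        hba1c_mmol_mol >= 48.0   # HbA1c threshold in mmol/mol
        or fpg_mg_dl >= 126.0    # fasting plasma glucose threshold in mg/dL
        or bool(self_reported)   # self-reported diabetes during follow-up
    )


# Example: normal HbA1c but FPG at the threshold counts as incident T2D.
print(has_incident_t2d(hba1c_mmol_mol=40.0, fpg_mg_dl=126.0, self_reported=False))
```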

Ethical Approval.
As this study is a secondary analysis of existing anonymous data, informed consent of participants was not required. The ethical approval is detailed in the published paper [14].

Statistical Analyses.
This study follows the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [16].
Statistical analyses were performed using R software (version 3.6.3; https://www.R-project.org). First, the 2741 Japanese patients with NAFLD were randomly divided into a training cohort of 2058 and a validation cohort of 683 for internal validation using the caret package in R, consistent with a theoretical ratio of 3:1. Continuous variables were expressed as mean ± standard deviation (normal distribution) or median (quartiles) (skewed distribution), and categorical variables were expressed as frequencies or percentages. Differences between the training and validation cohorts were analyzed using two-sample t-tests for normally distributed continuous variables, the Wilcoxon rank-sum test for nonnormally distributed continuous variables, and the chi-square test for categorical variables. Then, the data of the training cohort were analyzed using the least absolute shrinkage and selection operator (LASSO) method to screen for suitable and effective risk factors for Japanese patients with NAFLD. LASSO regression is a method for simplifying high-dimensional data: features with nonzero coefficients in the LASSO regression model are selected. Next, the indicators selected by the LASSO regression model were included in univariate and multivariate Cox regression analyses of risk factors related to T2D, and the hazard ratio (HR) and 95% confidence interval (CI) were calculated. The results of the univariate and multivariate Cox regression analyses were visualized using forest plots. Finally, the results of the multivariate Cox regression analysis were used to construct a nomogram prediction model. In addition, a variety of validation methods were used to estimate the accuracy of the risk prediction model in the training and validation cohorts separately. The C-index and receiver operating characteristic (ROC) curve were used to quantify the discrimination performance of the nomogram.
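As a rough illustration of what the C-index measures, the sketch below computes a simplified version of Harrell's concordance index for right-censored data. It is not the routine used in the study (the analysis was run in R); the function name and the toy data are assumptions, and pairs with tied follow-up times are simply skipped.

```python
def harrell_c_index(times, events, risks):
    """Simplified Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i had the event strictly
    before subject j's follow-up time. It is concordant when subject i
    also has the higher predicted risk; tied risks count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in days, event flag, predicted risk.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risks = [0.9, 0.2, 0.5, 0.4]
print(harrell_c_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of event times by predicted risk.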
We plotted calibration curves using the rms package to evaluate the calibration of the T2D risk nomogram, accompanied by a Hosmer-Lemeshow test. Decision curve analysis was performed to determine the clinical usefulness of the T2D risk prediction model. The net benefit was calculated as the proportion of true positive results minus the proportion of false positive results, with the latter weighted by the relative harm of false positive and false negative results. Bootstrapping with 1000 resamples was applied to the ROC curve, C-index, calibration curve, and decision curve analysis to reduce overfitting bias. Survival analysis was also performed using the Kaplan-Meier method, comparing low-risk and high-risk groups defined by a cut-off value of 50%, and the log-rank test was used to compare survival between the groups. All statistical tests were two-sided, and P values of <0.05 were considered significant.
The 2741 participants comprised a training cohort (n = 2058) and a validation cohort (n = 683). A flow diagram of the study design is depicted in Figure 1. The overall incidence of T2D was 8.14% (223/2741). In the training and validation cohorts, 157 (7.63%) and 66 (9.66%) participants developed T2D, respectively. The median follow-up time was 1865 days (quartiles: 779-3445) in the training cohort and 2073 days (quartiles: 1054-3474) in the validation cohort. In addition, there were no significant differences observed between the two cohorts. Baseline characteristics of the training and validation cohorts are summarized in Table 1.
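The reported incidence figures can be checked with simple arithmetic. The sketch below recomputes them from the event counts and cohort sizes given in the text; the overall 8.14% corresponds to 223 events among all 2741 participants, and 2518 is the number who remained free of T2D. The function name is illustrative only.

```python
def incidence_percent(events, participants):
    """Cumulative incidence as a percentage, rounded to two decimals."""
    return round(100.0 * events / participants, 2)


# Counts taken from the text.
total_n, total_events = 2741, 223
train_n, train_events = 2058, 157
valid_n, valid_events = 683, 66

# The two cohorts add up to the full sample.
assert train_n + valid_n == total_n
assert train_events + valid_events == total_events

print(incidence_percent(total_events, total_n))   # overall
print(incidence_percent(train_events, train_n))   # training cohort
print(incidence_percent(valid_events, valid_n))   # validation cohort
print(total_n - total_events)                     # participants without T2D
```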

Characteristics of Selection by LASSO Regression Analysis.
Seventeen potential risk factors were selected from demographic and clinical characteristics and analyzed by LASSO regression (Figures 2(a) and 2(b)). Variables with nonzero coefficients in the LASSO regression model were retained. Therefore, the number of potential variables was reduced from seventeen to four: smoking status, waist circumference (WC), FPG, and HbA1c. Table 2 shows the coefficients of these variables at lambda.1se. The results showed that smoking status, WC, FPG, and HbA1c were independent predictors of T2D (P < 0.05).
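To illustrate why an L1 penalty leaves only a few variables with nonzero coefficients, the sketch below applies the soft-thresholding rule, which gives the LASSO solution in the special case of linear regression with orthonormal predictors. This is a simplification: the study used LASSO with time-to-event data in R, not this rule. The coefficient values and the variable names "age" and "alt" are hypothetical.

```python
def soft_threshold(coefficient, penalty):
    """Shrink a coefficient toward zero by `penalty`; return 0.0 if it would cross zero."""
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    magnitude = abs(coefficient) - penalty
    if magnitude <= 0.0:
        return 0.0
    return magnitude if coefficient > 0 else -magnitude


def select_features(coefficients, penalty):
    """Return the names whose shrunken coefficient stays nonzero."""
    shrunk = {name: soft_threshold(value, penalty) for name, value in coefficients.items()}
    return [name for name, value in shrunk.items() if value != 0.0]


# Hypothetical unpenalized coefficients for standardized predictors
# (NOT the values fitted in the study).
example = {"hba1c": 0.60, "fpg": 0.45, "wc": 0.20, "age": 0.05, "alt": -0.08}
print(select_features(example, penalty=0.10))
```

A larger penalty, such as the one corresponding to lambda.1se rather than the minimum-error lambda, removes more variables and yields a sparser model.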

Development of the Individualized Prediction Model.
We combined the above four independent predictors into a predictive model and displayed it in the form of a nomogram. As shown in Figure 4, the nomogram is a quantitative and convenient tool. To obtain a personalized 8-year risk of T2D for a Japanese patient with NAFLD, a vertical line is drawn from the value of each variable to the point scale to obtain the points for that variable. The points for all variables are then summed, and the total score is matched to the risk on the bottom axis.
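A nomogram of this kind is a graphical form of the underlying Cox model: the points are a rescaling of each variable's contribution to the linear predictor, and the bottom axis converts the total into a risk. The sketch below shows that calculation directly. All coefficients, reference values, and the baseline survival are hypothetical placeholders, not the values fitted in this study.

```python
import math

# Hypothetical log hazard ratios (NOT the fitted values from the study).
BETAS = {"current_smoker": 0.5, "wc_cm": 0.02, "fpg_mmol_l": 1.5, "hba1c_pct": 1.8}
# Hypothetical reference values at which each variable contributes zero.
REFERENCE = {"current_smoker": 0.0, "wc_cm": 85.0, "fpg_mmol_l": 5.3, "hba1c_pct": 5.3}
# Hypothetical 8-year T2D-free survival probability for the reference patient.
BASELINE_SURVIVAL_8Y = 0.95


def linear_predictor(patient):
    """Sum of coefficient times deviation from the reference value."""
    return sum(BETAS[name] * (patient[name] - REFERENCE[name]) for name in BETAS)


def eight_year_risk(patient):
    """Cox model risk: 1 - S0(t) ** exp(linear predictor)."""
    return 1.0 - BASELINE_SURVIVAL_8Y ** math.exp(linear_predictor(patient))


reference_patient = dict(REFERENCE)
higher_risk_patient = {"current_smoker": 1.0, "wc_cm": 95.0, "fpg_mmol_l": 5.9, "hba1c_pct": 5.9}
print(eight_year_risk(reference_patient))
print(eight_year_risk(higher_risk_patient))
```

Reading the printed nomogram by hand and evaluating the model equation should give the same risk, up to the resolution of the printed scales.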

Performance of the Nomogram.
The ROC curve and C-index were used to evaluate the discriminatory ability of the prediction model. The pooled area under the ROC curve of the nomogram was 0.811 in the training cohort, with a sensitivity and specificity of 77.49% and 72.36%, respectively (Figure 5(a)). It was 0.805 in the validation cohort, with a sensitivity and specificity of 69.88% and 75.59%, respectively (Figure 5(b)), which indicates a moderately good performance. The C-index of the training cohort and validation cohort was 0.839 (95% CI, 0.804-0.874) and 0.822 (95% CI, 0.777-0.868), respectively. As shown in Table 3, the nomogram showed good predictive performance. The calibration of the prediction model was assessed using the calibration curve and the Hosmer-Lemeshow test. According to the calibration curves, the prediction model showed a good fit in both the training and validation cohorts (Figures 6(a) and 6(b)). As shown by the Hosmer-Lemeshow test, there was good agreement between the predicted and actual probabilities in both the training and validation cohorts. The decision curve analysis of the training cohort (Figure 7(a)) and the validation cohort (Figure 7(b)) indicated that applying the prediction model to predict the risk of incident T2D in Japanese patients with NAFLD is more effective than the intervention-for-all-patients scheme. Each patient was assigned to a high-risk or low-risk group according to the cut-off value of 50% predicted by the nomogram. Kaplan-Meier survival analysis showed a significant difference in T2D-free survival probability between the high-risk and low-risk groups in both the training cohort (Figure 8(a)) and the validation cohort (Figure 8(b)). This stratification could effectively discriminate the T2D outcomes of the two risk groups in the training and validation cohorts.
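The comparison with the intervention-for-all-patients scheme in the decision curve analysis rests on the net benefit at a given threshold probability. The sketch below uses the standard net-benefit formula, true positives per patient minus false positives per patient weighted by the odds of the threshold. The counts are hypothetical and are not taken from Figure 7.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit at a threshold probability strictly between 0 and 1."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a missed case
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(n_events, n, threshold):
    """Net benefit when every patient is treated as high risk."""
    return net_benefit(n_events, n - n_events, n, threshold)


# Hypothetical cohort: 1000 patients, 80 of whom develop T2D.
# At a 10% threshold the model flags 200 patients, 60 of them true positives.
model_nb = net_benefit(true_pos=60, false_pos=140, n=1000, threshold=0.10)
all_nb = net_benefit_treat_all(n_events=80, n=1000, threshold=0.10)
print(model_nb, all_nb)  # treating no one has a net benefit of 0 by definition
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all and the treat-none strategies.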

Discussion
With rapid economic development, human lifestyles have changed dramatically worldwide. The prevalence and incidence of T2D are rapidly increasing worldwide [17]. T2D is associated with an increased risk of cardiovascular disease and premature death. It is the main cause of end-stage renal disease, blindness, and nontraumatic amputations resulting from microvascular complications, thus imposing a significant economic burden on society [18,19]. The risk of T2D is strongly associated with lifestyle, nutritional status, and environmental factors [20]. Several large intervention studies have shown that lifestyle changes or pharmacological interventions targeting people at risk of T2D can effectively prevent or delay the onset of T2D and reduce the risk of death from T2D and its complications [21,22]. The key to successful intervention is the early identification of people at high risk of T2D [18]. In recent years, numerous T2D risk prediction models have been developed and tested, such as the Australian Type 2 Diabetes Risk Assessment Tool [23], the Cambridge Risk Score [11], the Finnish Risk Score [13], and the Framingham Diabetes Risk Score [24], among others [25-27]. However, there is a lack of risk prediction models based on cohort study data for the Japanese population, especially for patients with NAFLD. Most of the available T2D risk prediction models are applicable only to the population in which they were developed, and direct application of these models, constructed mainly from populations of European origin, may underestimate the risk of developing T2D in the Japanese population. Based on data from the NAGALA cohort study, we aimed to develop a T2D risk prediction model for Japanese patients with NAFLD to identify individuals at high risk of T2D.
In this retrospective cohort study, we developed and validated a nomogram model using cost-effective and easily available parameters to predict the 8-year risk of T2D in Japanese patients with NAFLD and to help clinicians identify populations at high risk of T2D. In the training and validation cohorts, our nomogram showed excellent prediction performance and excellent consistency on the calibration curve. The decision curve analysis illustrates the clinical application value of the nomogram. To the best of our knowledge, this is the first nomogram to use continuous values instead of segmented values to estimate the risk of T2D in Japanese patients with NAFLD. In addition, the nomogram will be of great practical value because its parameters are easily available.
Our prediction model includes four parameters: smoking status, WC, FPG, and HbA1c. The identification of these variables as risk factors for T2D is consistent with previous studies [28,29]. In our nomogram, current or past smokers have a higher risk of developing T2D than never-smokers. Numerous epidemiological studies have confirmed that smoking is associated not only with the occurrence of T2D but also with an increased risk of T2D hospitalization and mortality, and that the risk increases in a dose-dependent manner with daily smoking [30-32]. According to the 2014 US Surgeon General's report, active smokers have a 30-40% higher risk of T2D than nonsmokers, which indicates that smoking cessation should be emphasized as a basic public health strategy to combat the global diabetes epidemic [33]. The World Health Organization also recognizes smoking as a preventable risk factor for T2D and includes avoiding or quitting smoking in its lifestyle recommendations [30]. The pathways by which tobacco use leads to T2D, especially the mechanisms acting on pancreatic beta cells, are not yet completely understood. However, data from numerous clinical studies suggest that smoking and nicotine affect body composition, pancreatic beta cell function, and peripheral insulin sensitivity [20,29,34,35].
Obesity has become a major global epidemic affecting more than 300 million people, and it is the most important risk factor leading to the onset of T2D [36,37]. BMI has been used as a surrogate marker of obesity and as one of the predictor variables in most diabetes risk models [38-41]. However, BMI does not reflect central obesity. Compared with BMI, WC has a better predictive value for incident T2D, which is consistent with the results of this study [42-44]. WC is a simple anthropometric parameter of abdominal obesity; it is an indicator of central obesity, whereas BMI is an indicator of general obesity [45,46]. Central obesity is a recognized risk factor for the metabolic syndrome and is strongly associated with the secretion of adipocytokines and inflammatory cytokines, all of which are strongly associated with an increased risk of developing T2D [47-49]. Elevated WC reflects an accumulation of abdominal fat, which is followed by an increase in free fatty acid levels [50]. Excess circulating free fatty acids lead to insulin resistance by inhibiting insulin signalling and directly accelerating the rate of hepatic gluconeogenesis, coupled with desensitization of the hepatic regulatory loop that involves hypothalamic sensing of fatty acids [51-53]. Therefore, WC was used as a basis for the construction of the predictive model in this study.
The FPG level can reflect the secretion level and function of basal insulin [54]. Many epidemiological studies have shown that baseline FPG levels are highly predictive of T2D and that elevated baseline FPG levels are closely related to an increased risk of developing T2D [55-58]. HbA1c is an integrated measure of circulating blood glucose that reflects the average blood glucose level over the previous 2-3 months and is used as the gold standard for long-term follow-up of blood glucose control [59,60]. Compared with the oral glucose tolerance test (OGTT) and 2-hour plasma glucose (2hPG), HbA1c measurement is faster and more convenient and can be performed at any time, regardless of the length of fasting or the composition of the previous meal [61,62]. Epidemiological studies show that elevated HbA1c in nondiabetic adults is associated with the incidence of T2D, the incidence of cardiovascular disease, and mortality [63-65]. A recent recommendation by the American Diabetes Association Committee on the Diagnosis and Classification of Diabetes Mellitus advocates the use of HbA1c as a practical and effective test in the diagnosis of prediabetes to identify high-risk groups, for whom intensive lifestyle interventions to prevent T2D may be cost-effective. The decision of the American Diabetes Association Committee is mainly based on the established association between HbA1c and microvascular diseases [59].
Although our nomogram performed well, several limitations warrant mention. First, we relied on FPG and HbA1c, rather than OGTT, to define incident T2D. However, performing OGTT in a large sample is not feasible, whereas HbA1c does not require fasting and reflects long-term glycemic status. In addition, the International Expert Committee also recommended the use of HbA1c to diagnose diabetes [66]. Second, our validation cohort was derived from the same population as the training cohort, which may make the findings overly optimistic. Future research could verify the performance of this prediction model with other databases. Third, this large-scale cohort study was conducted in Japan. Therefore, whether the results of this study can be generalized to other ethnic groups and to specific groups, such as pregnant women and children, requires further validation in external cohorts. Fourth, the current assessment results might not be satisfactory in practice, and novel biochemical markers or indicators, especially genetic factors, could improve the performance of the prediction model in the future. Finally, this report is a secondary analysis of an existing database. Although many confounding factors were adjusted for, some potential predictors could not be included in our prediction model because data on socioeconomic status, lifestyle (except for smoking, drinking, and exercise habits), disease history (such as cardiovascular disease and chronic kidney disease), family health history (such as diabetes), and the specific timing of diabetes diagnosis (especially for self-reported diabetes) were not collected in the database.

Conclusion
In summary, we developed and validated a nomogram for the 8-year risk of incident T2D among Japanese patients with NAFLD, including smoking status, WC, FPG, and HbA1c. Our prediction model can effectively predict the 8-year incidence of T2D in Japanese patients with NAFLD and helps to identify people at high risk of T2D early, thus contributing to effective prevention programs for T2D.

Data Availability
All datasets generated and/or analyzed during the present study are included in this published article and available in Dryad (http://www.datadryad.org/).