Validation of prevalent diabetes risk scores based on non-invasively measured predictors in Ghanaian migrant and non-migrant populations – The RODAM study

Background: Non-invasive diabetes risk models are a cost-effective tool in large-scale population screening to identify those who need confirmation tests, especially in resource-limited settings. Aims: This study aimed to evaluate the ability of six non-invasive risk models (Cambridge, FINDRISC, Kuwaiti, Omani, Rotterdam, and SUNSET model) to identify screen-detected diabetes (defined by HbA1c) among Ghanaian migrants and non-migrants. Study design: A multicentered cross-sectional study. Methods: This analysis included 4843 Ghanaian migrants and non-migrants from the Research on Obesity and Diabetes among African Migrants (RODAM) Study. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), Hosmer-Lemeshow statistics, and calibration plots. Results: All six models had acceptable discrimination (0.70 ≤ AUC < 0.80) for screen-detected diabetes in the overall/combined population. Model performance did not significantly differ except for the Cambridge model, which outperformed the Rotterdam and Omani models. Calibration was poor, with a consistent trend toward risk overestimation for screen-detected diabetes, but this was substantially attenuated by recalibration through adjustment of the original model intercept. Conclusion: Though acceptable discrimination was observed, the original models were poorly calibrated among populations of African ancestry. Recalibration of these models among populations of African ancestry is needed before use.


Introduction
There is a compelling argument for frequent screening among people who are at high risk of developing diabetes [1]. Early-stage detection would enable appropriate interventions that can slow the progression of diabetes and limit related complications, disability, and mortality [2]. This has led to a booming field in clinical research in which mathematical models, also known as prediction models, prediction equations, or risk scores, are used to assess the probability of an individual having (diagnostic prediction scores) or developing (prognostic prediction scores) diabetes [3]. Depending on the scope of predictors included, diabetes risk scores may be classified as invasive or non-invasive. Invasive risk scores include laboratory-acquired biomarkers as predictors, while non-invasive risk scores do not [4,5].
Because of differences in genetic factors, environment, and body composition, several risk models have been reported to perform differently when evaluated in different settings [6][7][8][9]. Risk scores are evaluated on their discriminative power, that is, their ability to distinguish people with the outcome of interest from those without [10], and on their calibration, which is the model's ability to accurately estimate the probability of the event under consideration [11]. Diagnosis of diabetes by invasive biochemical laboratory testing has been found to perform better than non-invasive models [9]. However, given the need for large-scale screening in high-risk populations, especially in resource-poor settings, the use of non-invasive risk models is more cost-effective than the use of invasive risk models [5,11].
Available diabetes risk scores mostly originate from non-African populations and have rarely been validated in people of African ancestry [12,13]. The few validation studies conducted have reported varying predictive performance for different non-invasive diabetes models in sub-Saharan African populations [9,14]. It is therefore not clear which diabetes risk score is most suitable for diagnostic prediction of diabetes among people of African ancestry, both within Africa and in the diaspora. This work therefore aimed to evaluate the predictive performance of six non-invasive diabetes risk scores (Cambridge, FINDRISC Original, Kuwaiti, Omani, Rotterdam, and SUNSET model) in detecting new cases of diabetes and prediabetes among Ghanaian migrants and non-migrants.
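To make the notion of discrimination concrete, the following minimal sketch computes the AUC as the probability that a randomly chosen person with the outcome receives a higher risk score than a randomly chosen person without it, with ties counted as one half. The function name `auc` and the toy scores and labels are illustrative assumptions and are not taken from the RODAM data or from the software used in this study.

```python
def auc(scores, labels):
    """Rank-based AUC: share of (case, non-case) pairs in which the case
    has the higher score; tied pairs count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: predicted risks and true status (1 = diabetes).
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
toy_labels = [0, 0, 1, 1, 0, 1]
toy_auc = auc(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation; the measure says nothing about whether the predicted probabilities themselves are accurate, which is the separate question of calibration.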

Study population and design of the study
Data from the Research on Obesity and Diabetes among African Migrants (RODAM) study were used as the basis for validation of the models. The detailed rationale and design of the RODAM study are published elsewhere [15]. The current analysis includes 4843 participants aged between 25 and 70 years (mean age 45.5 years) who had no previous diagnosis of diabetes and were not on treatment for diabetes (Figure A1, Appendix).

Ethical clearance
Ethical clearance was obtained from the relevant ethics committees at each of the study sites: Ghana (Kwame Nkrumah University of Science & Technology: CHRPE/AP/200/12), Netherlands (Amsterdam University Medical Center: W12-062#12.17.0086), Germany (Charité University Berlin: EA1/307/12) and UK (London School of Hygiene & Tropical Medicine: 6208). The aim of the study and the use of participant data for the purpose of understanding obesity- and diabetes-related outcomes were explained to each participant before written consent was taken. Participant confidentiality was assured during data collection, storage and analysis.

Model selection
The diabetes prediction models included in this study were prevalence models with non-invasive predictors that had previously been validated in populations of sub-Saharan African ancestry [14] and whose predictor variables were available in the RODAM study database. Six non-invasive prevalent diabetes models were selected for this study: the Cambridge Risk Score [16], FINDRISC Original [1], the Kuwaiti diabetes risk score [17], the Omani diabetes risk score [18], the Rotterdam Predictive Model [19], and the SUNSET Risk Score [20].
Details of the methods, including the criteria for inclusion in the current study, physical measurements, outcome variables and cut-offs, and the statistical methods used to measure predictive performance and to recalibrate the models, are provided in the supplementary material because of manuscript word limits.

Participants' characteristics and prevalent screen-detected diabetes
Of the 4843 Ghanaian participants selected for this analysis, 57.2% were resident in Europe and 42.8% were resident in Ghana. The prevalence of screen-detected diabetes was 4.6% in the overall population, with an additional 30% classified as having prediabetes. Screen-detected diabetes was more prevalent among participants resident in Europe (5.5%) than among those resident in Ghana (3.4%) (Table 1).
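The outcome categories behind these prevalence figures follow the HbA1c cut-offs given in the footnote to Table 1 (diabetes ≥48 mmol/mol, prediabetes ≥39 and <48 mmol/mol, normoglycemia <39 mmol/mol). A minimal sketch of that classification is shown below; the function names and the example HbA1c values are hypothetical and serve only to illustrate the rule.

```python
def classify_hba1c(hba1c_mmol_mol):
    """Classify glycaemic status from HbA1c in mmol/mol using the
    cut-offs stated in the Table 1 footnote."""
    if hba1c_mmol_mol >= 48:
        return "diabetes"
    if hba1c_mmol_mol >= 39:
        return "prediabetes"
    return "normoglycemia"


def prevalence(values, category):
    """Share of HbA1c values that fall into the given category."""
    hits = sum(1 for v in values if classify_hba1c(v) == category)
    return hits / len(values)


# Hypothetical HbA1c values (mmol/mol), not RODAM data.
example_values = [35, 38.9, 39, 41, 47.9, 48, 52, 36, 37, 40]
example_prev_diabetes = prevalence(example_values, "diabetes")
example_prev_prediabetes = prevalence(example_values, "prediabetes")
```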

Prediction of prevalent screen-detected diabetes
All six diabetes risk models assessed produced acceptable discrimination (AUC = 0.70 to 0.75) for the detection of screen-detected diabetes in the total study population. The Cambridge risk model showed the highest discrimination, with an AUC of 0.75 (95% CI, 0.73-0.76), and the Rotterdam risk score the lowest, with an AUC of 0.70 (95% CI, 0.69-0.72). The SUNSET risk model was the best performing model for screen-detected diabetes among Ghanaians resident in Ghana (AUC = 0.78) but the worst performing model among Ghanaians resident in Europe (AUC = 0.70) (Table 2 and Figure A2). Among the population living in Ghana, model performance did not differ between urban and rural dwellers (Table A5). As shown in Supplementary Table A6, discrimination was poor for all models when fasting blood glucose was used to define the outcome variable.

Prediction of prevalent prediabetes
All the assessed diabetes risk equations had modest to acceptable discriminative power for the detection of prediabetes, with the Rotterdam risk model showing the lowest AUC, 0.64 (0.63-0.66), and the Cambridge risk model the highest, 0.70 (0.69-0.71), in the overall sample (Table A3 and Figure A5, Appendix).

Comparison of model performance
The Rotterdam risk model was outperformed by all the other risk scores assessed among Ghanaians living in Ghana. The FINDRISC and Cambridge diabetes risk models outperformed the Rotterdam diabetes risk model in the prediction of screen-detected diabetes irrespective of gender. The SUNSET model exhibited superior discriminative power over the Omani and the Kuwaiti models among Ghanaians resident in Ghana (Table 3).

Model calibration and recalibration
In general, we observed an overestimation of the risk of diabetes by the original models, with no agreement between the expected risk and the observed risk (p<0.0001 for the Hosmer & Lemeshow statistic). Intercept adjustment substantially attenuated these differences (Table 4). After intercept adjustment, the Kuwaiti and Cambridge models underestimated the risk at the lower deciles and overestimated the risk at the higher deciles, and the reverse was observed for the Rotterdam and the SUNSET models. The Omani model produced an undulating pattern of both underestimation and overestimation. Both the original and the recalibrated Rotterdam models produced an intercept >0. The original versions of the remaining five models all produced an intercept <0. This remained the case after intercept adjustment for four of these five models; the exception was the SUNSET model, which produced an intercept >0 after recalibration. Three models (Kuwaiti, Omani, and Cambridge) produced a slope <1. The other two models (Rotterdam and SUNSET) produced a slope >1. In general, both the original and recalibrated models exhibited low Brier scores. The exception was the original SUNSET model, which had a Brier score of 0.129. Visual inspection of the calibration graphs revealed that the Omani model provided sufficient attenuation for satisfactory recalibration (Figures A3 and A4, Appendix). The deciles of risk plots are shown in Figures A6 and A7 in the Appendix.
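As an illustration of what recalibration through intercept adjustment does, the sketch below shifts every predicted probability by a single constant on the logit scale, chosen so that the mean predicted risk equals the observed prevalence, and then compares Brier scores before and after. This is one common way to implement an intercept update and is shown under that assumption; it is not necessarily the exact procedure described in the supplementary material. All function names and the toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def intercept_shift(pred, outcomes, tol=1e-10):
    """Bisection search for the constant delta such that the mean of
    sigmoid(logit(p) + delta) equals the observed prevalence."""
    target = sum(outcomes) / len(outcomes)
    if not 0.0 < target < 1.0:
        raise ValueError("observed prevalence must lie strictly between 0 and 1")
    lps = [logit(p) for p in pred]
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_pred = sum(sigmoid(lp + mid) for lp in lps) / len(lps)
        if mean_pred < target:
            lo = mid  # predictions still too low: shift upwards
        else:
            hi = mid  # predictions too high: shift downwards
    return (lo + hi) / 2.0


def recalibrate(pred, delta):
    """Apply the intercept shift to every predicted probability."""
    return [sigmoid(logit(p) + delta) for p in pred]


def brier(pred, outcomes):
    """Mean squared difference between predicted risk and outcome."""
    return sum((p - y) ** 2 for p, y in zip(pred, outcomes)) / len(pred)


# Hypothetical toy data in which the original model overestimates risk.
toy_pred = [0.20, 0.30, 0.50, 0.60, 0.40, 0.70, 0.25, 0.35, 0.45, 0.55]
toy_outcomes = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
delta = intercept_shift(toy_pred, toy_outcomes)
toy_recal = recalibrate(toy_pred, delta)
brier_before = brier(toy_pred, toy_outcomes)
brier_after = brier(toy_recal, toy_outcomes)
```

Because the shift is the same for everyone, it leaves the ranking of individuals, and hence the AUC, unchanged; it corrects only the average level of predicted risk, which is why a residual pattern of under- and overestimation across deciles can persist after the adjustment.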

Discussion
In general, the six selected non-invasive models assessed had acceptable discrimination for the detection of diabetes and modest-to-acceptable discrimination for prediabetes. Findings from the few validation studies of the selected diabetes models among populations of African ancestry are consistent with modest-to-acceptable discrimination [14,21]. The discrimination observed for the Cambridge model in the current study (AUC: 0.75) was higher than in two earlier reports [22,23] (AUC: 0.67, and C-statistic: 0.67, respectively). The performance of the FINDRISC original model in the current study was lower than that reported earlier in Nigeria [24], similar to that reported in Kenya [21], but higher than that reported in Botswana [25]. With an AUC range of 0.70 to 0.73, the Kuwaiti, Omani, and Rotterdam predictive models showed higher discrimination in the RODAM study than reported for the Bellville South study (AUC: 0.64-0.68) [22], but lower discrimination than recorded among African Surinamese (AUC: 0.78-0.81) and Ghanaians (AUC: 0.74-0.76) in the HELIUS study [26]. The lower performance of these models in the current study compared with the HELIUS study may be explained by the inclusion of participants with known diabetes (prevalent diabetes) in the HELIUS analysis.
The Rotterdam model was outperformed by all the other models among participants living in Ghana, but it produced acceptable predictions among participants living in Europe. Masconi et al. [22] also reported that the Rotterdam model had the lowest validation performance range among the South African population. The SUNSET model was the best performing model for Ghanaians living in their home country, but the worst among Ghanaian migrants in Europe. Thus, the two models exhibited opposite geographical inconsistencies within a homogeneous population. This could be explained by variations in the baseline risk profile across settings [27], such as higher obesity and hypertension rates among participants living in Europe than among those living in Ghana [26]. However, such geographical variations in performance may also be attributed to the influence of migration and its impact on the diabetes risk profile. Earlier findings from the RODAM study have shown that the probability of developing diabetes is higher among Ghanaians living in Europe than among rural Ghanaians with the same level of BMI [28]. A plausible explanation is that Ghanaians in the home country may engage in more physical activity than their peers living in Europe at a similar level of BMI. Therefore, the interaction between captured and non-captured behavioural and biomedical risk factors in the models could account for these differences in performance [26]. Among the participants included in this study, 46.4% of Ghanaians living in Europe reported having been tested for diabetes within the two years before sampling, compared with 5% of those in Ghana. Since the design excluded participants with known diabetes, more intensive testing in the European setting may have produced differences in both screen-detected diabetes prevalence and inclusion rates. However, with a higher prevalence of screen-detected diabetes among Ghanaians living in Europe (5.5%) compared to those in the home country (3.4%), the assumption that a higher frequency of previous diabetes screening in the population living in Europe would affect the performance of models among Ghanaians in Europe seems implausible.
Model composition, which includes the number of variables (parsimonious/extended) and the type of variables (behavioural/anthropometric), may influence the performance of a model. For example, most studies suggest that waist circumference (WC) predicts diabetes better than body mass index (BMI) [29,30]. Thus, a model using WC or both WC and BMI may be a better predictor of diabetes than a model using BMI only. Apart from the variables common to all the models, the Cambridge model includes smoking and current steroid treatment, the FINDRISC model includes physical activity and fruit and vegetable intake, while the SUNSET model includes ethnicity, resting heart rate, and family history of premature cardiovascular disease. These variables may capture different aspects of the underlying diabetes risk and therefore have the potential to influence model performance. However, the benefit of these additional variables in the extended models is not clear in the current study, since the performance of the models with additional variables did not differ significantly from that of the models without.
Table 4 note: Values are Hosmer & Lemeshow chi-square values obtained for calibration of the original model and after intercept adjustment of the model (recalibrated model). The larger the value, the greater the deviation between the expected prevalence predicted by the model and the observed prevalence. At 2 degrees of freedom, the p-value for the Hosmer & Lemeshow test was <0.0001 for all chi-square values.
The poor calibration of the original models, and the inability of model updating through intercept adjustment to achieve total attenuation, as seen in the current study, corroborate earlier studies [22,26]. The initial difference in prediction by the original models could result from a true difference in the prevalence of the outcome between the development and validation populations, or from a difference in the methods of outcome measurement between the development and the validation study [22,26]. The persistence of non-optimal calibration after intercept adjustment has been attributed to the sensitivity of the Hosmer and Lemeshow statistic to sample size, where small differences between estimated and observed risks can still produce a significant p-value in a large sample [11,26]. In general, low Brier scores were observed in the current study, lower than those reported by Masconi et al. [22]. However, a lower Brier score does not necessarily imply better calibration [31].
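The sensitivity of the Hosmer and Lemeshow statistic to sample size can be illustrated with a small sketch. It uses the commonly cited form of the statistic, the sum over risk-ordered groups of (observed - expected)² / (n × p̄ × (1 - p̄)), with ten groups assumed here; the grouping and degrees of freedom used in this study are not reproduced. The cohort generator and its numbers are hypothetical: every risk level is overestimated by the same relative amount, and only the number of participants changes.

```python
def hosmer_lemeshow(pred, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square over equally sized, risk-ordered groups."""
    pairs = sorted(zip(pred, outcomes))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        pbar = expected / size                # mean predicted risk in group
        if 0.0 < pbar < 1.0:
            chi2 += (observed - expected) ** 2 / (size * pbar * (1.0 - pbar))
    return chi2


def make_cohort(multiplier):
    """Hypothetical cohort: ten risk levels (0.05 to 0.50), each with
    100 * multiplier people, in which the observed event rate is always
    80% of the predicted risk."""
    pred, outcomes = [], []
    for k in range(1, 11):
        size = 100 * multiplier
        events = 4 * k * multiplier           # 80% of the 5 * k expected
        pred.extend([k / 20.0] * size)
        outcomes.extend([1] * events + [0] * (size - events))
    return pred, outcomes


chi2_small = hosmer_lemeshow(*make_cohort(1))    # 1000 participants
chi2_large = hosmer_lemeshow(*make_cohort(10))   # 10000 participants
ratio = chi2_large / chi2_small
```

In this constructed example the degree of miscalibration is identical in both cohorts, yet the statistic grows in proportion to the number of participants, which is the behaviour that makes a significant p-value hard to avoid in large validation samples.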
The pattern of misprediction revealed that the Kuwaiti and the Cambridge models underestimated the probability in the low-risk group and overestimated the probability in the high-risk group, while the Omani model produced sufficient attenuation for satisfactory calibration based on visual inspection. A good model would be one that is able to (i) identify those at high risk who need confirmation testing and (ii) identify those at low risk, who can be screened out, since the confirmation test is invasive and comes at additional cost. Thus, the misclassification seen with the Kuwaiti and the Cambridge models would not affect their use for first-line screening of diabetes, more so when a threshold is set for the absolute risk requiring a further confirmation test (i.e. ≥32 points for Kuwaiti and >0.37 for Cambridge). Therefore, an overestimation beyond such values does not cause a change in the public health strategy [26]. The other two models (Rotterdam and SUNSET) underestimated the probability in the high-risk group and overestimated that in the low-risk group, and therefore would not be desirable for the screening of diabetes among such a population of African ancestry. Hence, use of the Rotterdam and SUNSET models would tend to subject people at low risk to a confirmation test and delay the diagnosis of those who have diabetes.
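The threshold-based use of a score for first-line screening can be sketched as follows, using the two cut-offs quoted above (≥32 points for the Kuwaiti score and >0.37 for the Cambridge score). The dictionary layout, function names, and toy data are illustrative assumptions only.

```python
# Cut-offs quoted in the text: (threshold, whether the threshold itself counts).
CUTOFFS = {
    "Kuwaiti": (32.0, True),     # refer if points >= 32
    "Cambridge": (0.37, False),  # refer if predicted risk > 0.37
}


def needs_confirmation(model, value):
    """True if the score is at or beyond the referral cut-off for the model."""
    cutoff, inclusive = CUTOFFS[model]
    return value >= cutoff if inclusive else value > cutoff


def sensitivity_specificity(flags, outcomes):
    """Sensitivity and specificity of the referral decision."""
    tp = sum(1 for f, y in zip(flags, outcomes) if f and y == 1)
    fn = sum(1 for f, y in zip(flags, outcomes) if not f and y == 1)
    tn = sum(1 for f, y in zip(flags, outcomes) if not f and y == 0)
    fp = sum(1 for f, y in zip(flags, outcomes) if f and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical Kuwaiti points and true status (1 = diabetes), not RODAM data.
toy_points = [10, 32, 40, 28, 35, 31]
toy_status = [0, 1, 1, 0, 0, 1]
toy_flags = [needs_confirmation("Kuwaiti", p) for p in toy_points]
toy_sens, toy_spec = sensitivity_specificity(toy_flags, toy_status)
```

Once a person's score is beyond the cut-off, the referral decision is the same however much the absolute risk is overestimated, which is the sense in which overestimation in the high-risk group leaves the screening strategy unchanged.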

Strengths and limitations
A strength of this study is the use of a large, randomly selected, homogeneous sub-Saharan African adult population living in Ghana and Europe, with a wide age distribution; capturing a population at high risk through random selection limits potential selection bias. The findings of this study could therefore potentially be extrapolated to other migrants of sub-Saharan African ancestry living in western Europe and to their counterparts living in West Africa. However, the extension of the current findings to people of African ancestry born and living in high-income regions such as Europe and North America is uncertain, since the population of this study does not include significant numbers of second-generation migrants. Another strength is that HbA1c was measured in one laboratory for samples from all sites. However, the use of HbA1c to define screen-detected diabetes and prediabetes in an African population, where there is a high prevalence of hemoglobinopathies and other conditions that may shorten the life of erythrocytes, is a potential limitation because of misclassification arising from underestimation [32,33]. Also, since there is limited overlap between the various available biochemical tests for diabetes, positive diabetes cases could remain undetected when such a one-test definition is used; the resulting false-negative cases would lead to underestimation of the prevalence of diabetes and could affect the calibration of the models. However, the recalibration of the models through intercept adjustment attenuates such an effect. Another limitation of this study is that the FINDRISC model was originally designed to predict incident diabetes, and therefore its estimate does not reflect the likelihood of being a prevalent undiagnosed diabetes case. However, the FINDRISC model has been used over the years as both a prevalence and an incidence model, and therefore its inclusion in the current analysis is appropriate [14,22,34].

Conclusion
Our findings suggest an acceptable discrimination ability of all six non-invasive risk screening models to detect diabetes and moderate discrimination to detect prediabetes among this population of African ancestry. Three of the five models assessed for calibration (Kuwaiti, Omani, and Cambridge) were found to be useful in the estimation of the absolute risk after adjusting for the prevalence of screen-detected diabetes in the validation population. Further validation and recalibration of these models in non-migrant African and African migrant populations are needed before large-scale use can be recommended.

Table 1
Socio-demographic characteristics and diabetes status of RODAM Study participants.
Data are presented as numbers and corresponding percentages. a Age and BMI are presented as mean and standard deviation. b WC is presented as median and 95% confidence interval of the median. c Diabetes: HbA1c ≥48 mmol/mol; prediabetes: HbA1c ≥39 and <48 mmol/mol; normoglycemia: HbA1c <39 mmol/mol.

Table 2
Predictive performance of diabetes risk scores for screen-detected diabetes among Ghanaians living in Europe and Ghanaians living in Ghana.

Table 3
Differences in AUC among the six diabetes risk scores in the prediction of screen-detected diabetes among Ghanaians living in Europe and Ghana.

Table 4
Chi-square values from Hosmer and Lemeshow tests for the calibration of diabetes risk models for screen-detected diabetes before and after recalibration through intercept adjustment among Ghanaians resident in Ghana and Ghanaian migrants.