Development and validation of predictive risk models for sight threatening diabetic retinopathy in patients with type 2 diabetes to be applied as triage tools in resource limited settings

Summary
Background: Delayed diagnosis and treatment of sight threatening diabetic retinopathy (STDR) is a common cause of visual impairment in people with type 2 diabetes. Systematic regular retinal screening is therefore recommended, but global coverage of such services is challenging. We aimed to develop and validate predictive models for STDR to identify the 'at-risk' population for retinal screening.
Methods: Models were developed using datasets obtained from general practices in inner London, United Kingdom (UK), on adults with type 2 diabetes during the period 2007–2017. Three models were developed using Cox regression, and model performance was assessed using the C statistic, calibration slope and observed to expected ratio. Models were externally validated in cohorts from Wales, UK, and India.
Findings: A total of 40,334 people were included in the model development phase, of whom 1427 (3·54%) developed STDR. Age, gender, diabetes duration, antidiabetic medication history, glycated haemoglobin (HbA1c), and history of retinopathy were included as predictors in Model 1; Model 2 excluded retinopathy status, and Model 3 further excluded HbA1c. All three models attained strong discrimination performance in the model development dataset, with C statistics ranging from 0·778 to 0·832, and in the external validation datasets (C statistic 0·685–0·823), with calibration slopes closer to 1 following re-calibration of the baseline survival.
Interpretation: We have developed new risk prediction equations to identify those at risk of STDR among people with type 2 diabetes in any resource setting so that they can be screened and treated early. Further testing and piloting are required before implementation.
Funding: This study was funded by the GCRF UKRI (MR/P207881/1) and supported by the NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology.


Introduction
Diabetes and its complications are a significant global health burden, and there are currently around 463 million people with diabetes worldwide. 1,2 Diabetic retinopathy, the most common microvascular complication of diabetes, is a common cause of visual impairment in people of working age. 2,3 Retinopathy can progress to sight threatening diabetic retinopathy (STDR) without any symptoms, so STDR has to be identified early by retinal examination or photography. 4 It is estimated that approximately 28 million people with diabetes have STDR globally. 5 Therefore, systematic regular retinal screening is recommended for all people with diabetes. However, retinal screening is resource-intensive, and most countries do not have the expertise or facilities to develop and sustain a systematic retinal screening programme for their increasing population with diabetes. Because of other pressing health needs, establishing diabetic retinopathy screening is not a priority in the majority of less-developed countries. Therefore, the number of people with visual impairment due to STDR is likely to increase with the rising prevalence of diabetes. There is an unmet need to identify those at risk of STDR using easily available predictors so that they can be prioritised for retinal examination or screening. In addition, targeted optimisation of risk factors for this group of individuals may also reduce the risk of disease progression.
Several prognostic models have been developed for STDR to personalise retinal screening strategy. 6,7 However, these cannot be applied in low-resource settings, as most of them require a previous record of retinopathy status, glycated haemoglobin (HbA1c) estimation or other laboratory or clinical parameters (Supplement Table 1). 8−25 Therefore, although the presence of diabetic retinopathy and HbA1c are strong predictors of STDR, an ideal risk model for STDR in low-resource settings should be limited to demographic and clinical parameters that patients can report to fieldworkers during community screening. Community-based health screening is widely practised in low- and middle-income countries, as primary care is still in its infancy.
Although systematic diabetic retinopathy screening with retinal camera is the gold standard, most people with diabetes do not have access to this service globally. The aim of this study was to develop predictive models for STDR that could be applied based on available resources so that those at risk of STDR could be prioritised for retinal screening from the growing population with diabetes.

Methods
Local research ethics approval was obtained from Moorfields Research Management Committee. Further ethics approval from the Health Research Authority was not required as the study included only fully anonymised data. Approval was also obtained from the Caldicott Guardian of these anonymised datasets at Queen Mary University of London (QMUL) and from Secure Anonymised Information Linkage (SAIL) in Wales, and local research ethics approval was obtained at the Madras Diabetes Research Foundation (MDRF), Chennai, India. This study was conducted in accordance with the Declaration of Helsinki. Patient-level consent was not required as the study only used fully anonymised routinely collected data (SIVS1057, Moorfields Eye Hospital dated 14/04/2020).

Study design, setting and data source
Model development dataset. We developed the predictive models using an existing dataset obtained from general practices (GP) in three Clinical Commissioning Groups (CCGs) in East London: Newham, Tower Hamlets, and City and Hackney. The dataset covered more than 98% of the GP-registered multiethnic population in these CCGs. The data included demographic information, diagnoses, prescriptions, referrals, laboratory test results and clinical values. Diagnoses, symptoms and clinical values were recorded using Read code classifications.
We included all adults with a diagnostic Read code for type 2 diabetes (T2DM) during the period 01/01/2007 to 31/12/2017 who were aged 18 or over at study entry. As retinal screening events and DR events were not recorded simultaneously, we allowed for a 6-month delay for DR events to be recorded from the date of retinal examination. Study baseline was defined as the date of the first DR screening examination (or the recorded non-STDR event date where a retinal screening examination was not recorded in the previous 6 months) within the study start and end dates. Start date was defined as the latest of 01/01/2007, the patient's registration date, or the date the patient turned 18. Participants needed a baseline screening episode or DR event followed by a last screening episode or DR event recorded during the study period. For participants who developed STDR outside the study period, the last screening episode or last non-STDR event did not have to fall within the study period, but it had to be recorded at least 6 months before the STDR onset date. The end of follow-up was defined as the earliest of the date of first diagnosis of STDR, the date of death, 31/12/2017, the date of de-registration from the GP, or the date of the last DR screening appointment or last DR event. People who were lost to follow-up were censored at the date they left the study. Follow-up time was censored at 3 years.
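The end-of-follow-up rule above can be illustrated with a minimal sketch. It takes the definition literally (earliest of the listed dates, then censoring at 3 years); the function name, argument names and the 365.25-day year are illustrative assumptions, not the authors' Stata code.

```python
from datetime import date

STUDY_END = date(2017, 12, 31)          # administrative end of the study period
MAX_FOLLOW_UP_DAYS = round(3 * 365.25)  # 3-year horizon (assumed day count)


def follow_up(baseline, stdr_date=None, death_date=None,
              dereg_date=None, last_dr_date=None):
    """Return (days of follow-up, STDR event indicator) for one participant.

    Hypothetical helper: follow-up ends at the earliest of the dates that
    are available, and is then truncated at the 3-year horizon.
    """
    candidates = [d for d in (stdr_date, death_date, dereg_date,
                              last_dr_date, STUDY_END) if d is not None]
    end = min(candidates)
    # An event is counted only if STDR is what ended follow-up.
    event = stdr_date is not None and stdr_date == end
    days = (end - baseline).days
    if days > MAX_FOLLOW_UP_DAYS:
        # Anything beyond 3 years is censored at 3 years.
        days, event = MAX_FOLLOW_UP_DAYS, False
    return days, event
```

For instance, a participant with baseline on 1 January 2010 and STDR recorded on 1 January 2011 contributes 365 days with an event, whereas an event-free participant followed for eight years contributes the 3-year maximum as a censored observation.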

Model validation datasets
Fully anonymised data from the SAIL databank 26

Outcome
The main outcome was STDR and this was classified according to the American Academy of Ophthalmology International Classification as the first record of severe non-proliferative diabetic retinopathy, proliferative diabetic retinopathy or diabetic macular oedema. In the UK datasets, the respective grades of R2, R3 and M1 based on the English Diabetic Eye Screening Programme classification were used. The grades of retinopathy were determined by trained graders or ophthalmologists from retinal images captured through dilated pupils using fixed retinal cameras.

Predictor variables
Based on existing literature (Supplement Table 2), we identified predictors that have been found to be associated with the outcome and then checked these variables in the model development dataset. After considering their availability in the model development dataset, we considered the following risk factors in this study: age, HbA1c, systolic blood pressure (SBP), duration of diabetes, body mass index (BMI), total cholesterol, antidiabetic medication, estimated glomerular filtration rate (eGFR), and history of cardiovascular disease (ischaemic heart disease, heart failure, stroke, peripheral vascular disease, cardiovascular death, acute myocardial infarction, bypass graft/angioplasty, angina pectoris, cardiac arrhythmia, major ECG abnormality, silent myocardial infarction, congestive heart failure, transient ischaemic attack, arterial event requiring surgery). These covariates were measured at baseline for each individual, and the closest record to the index date (± 6 months) was selected for clinical variables.
The covariates considered in the SAIL dataset were recorded in the same way as in the model development dataset, taking the closest record to the index date (± 6 months), and coding standards were consistent with the Quality and Outcomes Framework (QOF). 27 However, eGFR in the SAIL dataset was calculated from data on serum creatinine, ethnicity and age using the 4-variable Modification of Diet in Renal Disease Study equation, unlike in the model development dataset, where eGFR was directly provided by the laboratories. 28 The MDRF dataset had fewer newly diagnosed participants, as patients were seen in secondary care (hospital data), which may or may not have been the initial point of contact, unlike the UK primary care datasets. Eligibility criteria for MDRF differed from those of the UK cohorts, in that the dataset was previously curated to include participants who had concurrently undergone routine screening for assessment of eGFR and DR. 29
Sample size and missing data. The Events per Variable (EPV) was between 18 and 95 in the datasets, assuming a maximum of 35 parameters used in this study, indicating adequate sample size for model development and validation. 30 All covariates were inspected for missing values by coding the missing data as a separate category, and they were then considered in the univariable and multivariable modelling. None of the variables retained in the final models had missing data and therefore no further action was required.
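As a worked illustration of the EPV calculation, using the 1427 STDR events reported for the model development dataset and the assumed maximum of 35 parameters (the event counts of the validation datasets are not restated here):

```latex
\mathrm{EPV} = \frac{\text{number of outcome events}}{\text{number of candidate parameters}}
             = \frac{1427}{35} \approx 40.8
```

This value lies within the reported range of 18 to 95.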

Model development and validation
We developed the predictive model using the Cox proportional hazard model given below.
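The equation itself is not reproduced in this text. As a reference point, the standard form of the Cox proportional hazards model, and the 3-year risk it implies, can be written as follows. The symbols $h_0$, $S_0$, $\beta_k$ and $x_k$ are generic notation assumed here, not necessarily the authors' own.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right)

\widehat{\mathrm{risk}}_{3\,\mathrm{yr}}(\mathbf{x}) = 1 - S_0(3)^{\exp(\mathrm{LP})},
\qquad \mathrm{LP} = \sum_{k=1}^{p} \beta_k x_k
```

Here $h_0(t)$ is the baseline hazard, $S_0(3)$ the baseline survival at 3 years, and LP the linear predictor built from the predictors retained in each model.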
Predictive factors were selected using a backward elimination procedure based on statistical significance at each step: variables that were not statistically significant (p>0.05) were removed until all of the variables remaining in the model were statistically significant (p<0.05). The resulting parsimonious model was then assessed for its performance in the development dataset, and each variable was assessed for its contribution towards model performance. Variables with the least contribution (<0.1 change in C-statistic) were removed, and the resulting model was identified as Model 1. A further reduced model (Model 2) was developed by removing predictors that would require laboratory testing and retinal screening expertise and would therefore be difficult to apply in resource-restricted settings. A minimal model with non-invasive predictors was then obtained by removing any clinical variables or laboratory tests from the reduced model, and this was named Model 3. These three models were used to assess the effects of prognostic factors, and hazard ratios (HR) with 95% confidence intervals (CI) are presented for each variable.
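The backward elimination step can be sketched generically as below. The callable `fit_pvalues` is a hypothetical stand-in for refitting the Cox model on the current variable set and returning one p-value per variable; dropping the least significant variable first is a common convention assumed here, since the text does not state the removal order.

```python
def backward_eliminate(candidates, fit_pvalues, alpha=0.05):
    """Drop non-significant predictors one at a time.

    candidates  -- iterable of variable names to start from
    fit_pvalues -- hypothetical callable: list of names -> {name: p-value},
                   standing in for a refit of the Cox model
    alpha       -- significance threshold (0.05 in the text)
    """
    selected = list(candidates)
    while selected:
        pvals = fit_pvalues(selected)
        # Find the variable with the weakest evidence in the current fit.
        worst = max(selected, key=lambda name: pvals[name])
        if pvals[worst] <= alpha:
            break  # every remaining variable is significant
        selected.remove(worst)
    return selected
```

In practice each variable may contribute several parameters (for example, categories of age), so a joint test per variable would be needed; the sketch treats each variable as a single term.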
Internal validity of the model was assessed according to the measures of model performance in the development dataset. 30 External validity of the model was assessed in the SAIL and MDRF datasets described above. For both internal and external validation, model performance was assessed using calibration and discrimination measures, where calibration is the agreement between observed and predicted times to the outcome. 31 The calibration slope was quantified using the beta-coefficient of the linear predictor in each dataset. This gives an impression of whether risks were over- or under-predicted across all time points. To visualise calibration at a single time point of 3 years across various risk thresholds, participants were categorised into deciles of 3-year STDR risk, with observed (3-year Kaplan-Meier event rate) and mean predicted risks plotted for each group. The ratio of observed to expected (O/E) risks was also reported, which summarises the agreement between the Kaplan-Meier event rate and the mean predicted risk at 3 years. 31 Model discrimination is the ability of the model to differentiate between patients who reached the endpoint and those who did not. This measure is quantified using the C-statistic, in other words the Area Under the Curve. A C-statistic from 0·6 up to 0·7 suggests good discrimination, and a C-statistic between 0·7 and 0·9 suggests strong discrimination. Models were updated in each external validation cohort, if required, by re-calibrating the baseline survival function at 3 years. This is where the calibration slope is set to 1 by assigning the linear predictor as an offset term in the model for the external validation datasets. Updating of the baseline survival aims to correct the calibration slope to make the average predicted risk equal to the observed overall event rate. A risk chart was developed for the minimal model (Model 3) using colours representing high, medium and low risk of having STDR within the next 3 years. This chart was produced using all three datasets: the model development dataset and the model validation datasets in the UK and India. Data management and analysis were performed using Stata 17 (Stata Corp., College Station, Texas, USA).
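To make the discrimination measure concrete, the following is a minimal sketch of a concordance statistic for right-censored data, assuming Harrell's C (the text does not name the exact estimator used in Stata). The function name and the simple O(n²) pair loop are illustrative choices.

```python
def harrell_c(times, events, scores):
    """Harrell's concordance statistic for right-censored data.

    times  -- follow-up time per participant
    events -- 1 if STDR was observed, 0 if censored
    scores -- model risk score (e.g. linear predictor); higher = riskier
    """
    concordant = tied = comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when i had the event strictly before j's
            # follow-up ended, so we know i failed first.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable
```

A value of 0.5 corresponds to chance-level ranking and 1 to perfect ranking, which is the scale on which the 0·6 to 0·9 thresholds above are interpreted.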

Sensitivity analysis
As the date of onset of STDR is difficult to pinpoint with routine data, final models were re-estimated accounting for the interval-censored nature of retinal screening data using the interval-censored Cox model. 32,33 These models were fitted using the icenReg package in R. 34 In this approach, the left interval was defined as the time from baseline to the last screening examination at which the participant was event-free, and the right interval as the time from baseline to the date STDR was recorded. Participants who did not develop STDR were assumed to be right censored, with the previous definitions for end of follow-up applying. DR events recorded within 6 months of the retinal screening date were assumed to be the result of that screening visit in the model development dataset, as screening and DR events were not recorded simultaneously. Moreover, incidence rates generated from the nonparametric Turnbull's estimator 35 (allowing for events to occur in an interval), using the R package interval, 36 were compared with the Kaplan-Meier estimate (assuming specific event times).
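The interval bounds described above could be constructed as in the following sketch. The function and argument names are hypothetical, and representing a right-censored participant by an infinite right bound is an assumption that mirrors a common convention in interval-censored software.

```python
import math
from datetime import date


def censoring_interval(baseline, last_event_free, stdr_recorded=None):
    """Return (left, right) in days since baseline for one participant.

    left  -- time to the last screening at which the person was event-free
    right -- time to the date STDR was recorded, or infinity when STDR was
             never recorded (right censoring)
    """
    left = (last_event_free - baseline).days
    if stdr_recorded is None:
        return left, math.inf
    return left, (stdr_recorded - baseline).days
```

For example, a participant screened event-free one year after baseline and recorded with STDR a year later is only known to have developed STDR somewhere between day 365 and day 730.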
Role of the funding source: The funder had no role in the design and conduct of the study; collection, management, analysis or interpretation of the data; preparation, review or approval of the manuscript; and decision to submit the manuscript for publication.

Ethics approval and consent to participate
Ethical approval was not necessary for the use of deidentified data derived from routinely recorded information in the electronic health record accessed by the Clinical Effectiveness Group Queen Mary University of London with the permission of the general practitioner data controllers.

Results
The model development dataset included 71,908 people with a T2DM diagnosis and a screening code during the period 2007–2017. After the exclusion criteria explained in detail in Supplement Figure 1 were applied, 40,334 patients were eligible to be included in the study.

Overall study population
Table 1 shows the baseline characteristics of the study population in each dataset. Overall, 40,334 people in the model development dataset, 102,672 people in the UK validation dataset and 17,509 people in the Indian validation dataset met our inclusion criteria. All of the variables that contributed to the final models (age, gender, duration of diabetes, HbA1c, history of background or mild to moderate retinopathy) were complete in the model development dataset, and ethnicity was recorded for more than 98% of the model development dataset.
Supplement Table 4 shows the number of incident cases of STDR during the follow-up period and the incidence rate in the model development and model validation datasets. In the model development dataset, a total of 1427 people developed STDR during the follow-up period, giving an incidence rate of 14.69 per 1000 person-years (95% CI: 13.94 to 15.47).
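The incidence rate and its confidence interval can be illustrated with a short sketch. The person-years figure used in the example is not reported in this text and is back-calculated from the published rate, so it is an assumption. The log-normal approximation to the Poisson interval is also an assumption, as the interval method used is not stated.

```python
import math


def incidence_rate_per_1000(events, person_years, z=1.96):
    """Rate per 1000 person-years with an approximate 95% CI.

    Uses the log-scale normal approximation for a Poisson count:
    rate * exp(+/- z / sqrt(events)).
    """
    rate = events / person_years * 1000
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# 1427 events is reported; ~97,140 person-years is an assumed,
# back-calculated denominator consistent with a rate of 14.69.
rate, lower, upper = incidence_rate_per_1000(1427, 97_140)
```

Under these assumptions the interval is close to the reported 13.94 to 15.47, although small differences are expected if an exact Poisson interval was used.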

Baseline characteristics of the study population
The largest group of people in the model development dataset, 21,672 (53·7%), and in the UK validation dataset, 57,352 (55·9%), had 0−2 years' duration of known diabetes. However, the largest group in the Indian validation dataset, 7514 (42·9%), had more than 10 years' known duration of diabetes at baseline. Both the UK model development dataset and the Indian model validation dataset had lower proportions of people aged 65 and above (26% and 13% respectively) than the UK model validation dataset (48%). More than 75% of people in the UK were overweight or obese with a BMI ≥25 kg/m², whereas 67% of people in the Indian dataset were in the same group. The distribution of HbA1c levels was similar in the UK model development dataset and the Indian model validation dataset, with around 13.2% of people in the ≥80 mmol/mol group; however, the UK model validation dataset had 8.5% of people in this group. All three datasets had around 24%−37% of people in the highest SBP group (≥160 mmHg). Total cholesterol level was <5.2 mmol/L among 29,919 (77.5%) and 14

Internal validation and external validation
All three models had strong discrimination performance, with a C-statistic greater than 0.7 in the model development dataset, as given in Table 3. Model performance was also strong in all ethnic groups (White, South Asian, Black and other), with C-statistics greater than 0.7 (Supplement Table 5). In the external validation, Model 1 had excellent performance in both the UK and Indian datasets, with a C-statistic greater than 0.7 and a calibration slope closer to 1. Model 2, with fewer variables, had slightly lower performance in both the UK and Indian datasets. Model 3, the non-invasive model, had satisfactory performance in the UK and Indian datasets, with C-statistics of 0·685 and 0·713 respectively, and the observed to expected ratio closer to 1 following re-calibration of the baseline survival function showed its applicability in both settings (Table 3). The beta coefficient of the linear predictor, or calibration slope, suggests that on average observed risks across all time points up to 3 years were being over-estimated by our models in Wales and in India, except for Model 1 in India, which had a calibration slope > 1. The calibration plot in the external validation datasets (Figure 1) was generated by categorising 3-year risks into 10 groups, and it suggests that the predicted risks were better aligned with the observed risks at 3 years in the UK (Wales) dataset than in India. However, following re-calibration of the baseline survival, observed and predicted risks in both the UK and India datasets were more closely aligned. Observed over expected (O/E) ratios show that all models applied in the Indian dataset were on average over-predicting risks at 3 years, whereas risks were slightly under-predicted in Wales at 3 years. However, re-calibrated O/E ratios for both the UK (Wales) and India were all nearer to 1 (1·013–1·026).
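The re-calibration and O/E ratio discussed above can be sketched as follows. This assumes a Breslow-type estimate of the baseline survival with the linear predictor held fixed (slope of 1), which is one common way to re-estimate the baseline in a new cohort; the analysis was run in Stata and the exact estimator is not stated, so the function names and this choice are assumptions.

```python
import math


def km_survival(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon`."""
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1 - failed / at_risk
    return surv


def recalibrated_baseline_survival(times, events, lp, horizon):
    """Breslow-type baseline survival with the linear predictor as an offset."""
    cum_hazard = 0.0
    for t in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        # Risk set weighted by exp(LP); coefficients are not re-estimated.
        weight = sum(math.exp(x) for u, x in zip(times, lp) if u >= t)
        cum_hazard += failed / weight
    return math.exp(-cum_hazard)


def predicted_risk(lp_i, s0):
    """Absolute risk at the horizon: 1 - S0 ** exp(LP)."""
    return 1 - s0 ** math.exp(lp_i)


def observed_expected_ratio(times, events, lp, s0, horizon):
    """Kaplan-Meier event rate divided by the mean predicted risk."""
    observed = 1 - km_survival(times, events, horizon)
    expected = sum(predicted_risk(x, s0) for x in lp) / len(lp)
    return observed / expected
```

An O/E ratio above 1 indicates that the model under-predicts risk on average, and a ratio below 1 indicates over-prediction, matching the interpretation of the Wales and India results above.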

Model presentation
The risk chart (Figure 2) is a graphical representation of the risk score, and it shows the risk of each individual according to their group of T2DM duration, age, gender and antidiabetic medication. For example, a female aged 55 years with one year's duration of diabetes and on insulin has an estimated 3-year risk of developing STDR of 6%, whereas a male aged 55 years with one year's duration of diabetes and on insulin has an estimated 3-year risk of 8%. In addition, for those with a duration of more than 10 years, the risk of STDR peaked in the age group of <45 years, with a further increase in risk if they were taking two antidiabetic medications or were on insulin. Supplementary file Figures 1 and 2 provide the risk charts generated for the UK validation dataset and the Indian validation dataset.

Sensitivity analysis
The incidence rates of STDR obtained using Turnbull's estimator, which accounts for the interval-censored nature of routine healthcare data, were in close correspondence with the Kaplan-Meier rates in the development cohort (Supplement Table 6). Similarly, hazard ratios from the interval-censored Cox model presented in Supplement Table 7 are consistent with those of our final models generated using the Cox model assuming right censoring (Table 2).

Discussion
We have developed and externally validated three risk prediction equations to estimate the absolute risk of STDR over a period of 3 years. The first model may be applied in settings where laboratory facilities and retinal screening are available. The second model is for settings where retinal screening is a challenge but HbA1c is routinely available. The third model was developed for resource-restricted settings. The equations are well calibrated and have good performance, with C statistics close to 0.7 and above in both the development and external validation datasets. To our knowledge, this is the first study to develop a risk prediction model for STDR to triage the at-risk group for retinal screening, especially in resource-restricted settings where systematic retinal screening is not available or accessible. Although all people with diabetes should be screened regularly for STDR, nearly all low- and middle-income countries lack a systematic retinal screening programme, and late presentation of STDR with irreversible visual loss is prevalent. The third model uses no laboratory markers or record of previous retinopathy status. Although it is the least accurate of the three models, it may be used by community workers to prioritise people at risk for retinal screening, facilitating efficient use of the limited capacity of retinal facilities. It may also be promoted as a self-assessment tool that people with diabetes can use to assess their risk of STDR. It may help them to make informed decisions about managing their diabetes and preventing or delaying STDR.
Undiagnosed diabetes also remains a global challenge. In our UK datasets, more than 50% of the people had newly diagnosed diabetes (≤2 years' duration at baseline). However, some of them may present with STDR. Using these datasets therefore enabled us to take this factor into account. Our risk models highlight that duration of diagnosed diabetes is by itself an insufficient predictor of STDR. Whilst several non-laboratory-based risk scores are available for community screening of diabetes, no similar models exist for identifying STDR alongside. By contrast, a simple urine dipstick can identify albuminuria in every person with newly detected diabetes. Therefore, this model may also be useful in such circumstances. Two recent systematic reviews on predictive models for diabetic retinopathy have summarised the existing literature on this topic and have identified several predictive models for severe diabetic retinopathy related outcomes, including any form of retinopathy, 8 blindness, 9,13,20 diabetic macular oedema (DME) or proliferative diabetic retinopathy (PDR), 10,14 treatment of DME/PDR, 12 intraretinal microvascular abnormalities, 15 STDR, 16,19 retinopathy requiring photocoagulation, 21 and other forms of severe diabetic retinopathy, 23,24 as presented in Supplement Table 1. These studies have used different data sources, including routinely available databases such as The Health Improvement Network Data (THIN data) 18 and Clinical Practice Research Datalink Data (CPRD), 20 a US claims database, 14 hospital databases, 37,38 diabetic eye screening data and clinical trials data. 9,15 However, models utilising only non-laboratory parameters are required for application in low-resource settings.
Only a few of the existing predictive models were judged to have a low risk of bias, mainly because most had small sample sizes, missing data, or lacked external validation. Our study has filled this gap by introducing three models that can be chosen based on the resource setting. We also externally validated the models in two datasets, from the UK and India, so that the models are tested in both high- and middle-income countries. We have also provided a risk chart with a colour scheme to aid community workers. These models are not a replacement for retinal screening and should be used only for prioritisation for regular retinal screening.
One of the limitations of the study is that there were differences in the sources of our datasets. Our development dataset is a primary care dataset from London, where records are updated from the diabetic screening units, while the SAIL de-identified dataset on the population of Wales obtains data from multiple sources, including retinal screening episodes. The SAIL dataset had poor recording of ethnicity, with 54,483 (53%) missing values, and by extension of eGFR (which was calculated using the Modification of Diet in Renal Disease (MDRD) equation incorporating serum creatinine and ethnicity), with 54.5% missing values. However, neither of these variables was included in the models, and so they did not have an impact on the results. Nevertheless, further studies with complete data on all relevant covariates, or studies using missing data techniques such as multiple imputation, would be useful as part of any pilot or external validation of these models. The India dataset came from electronic medical records from an established diabetes centre. Furthermore, the English national diabetic eye screening classification of R2 also includes a proportion of people with less severe grades compared with the American Academy of Ophthalmology International Classification. In addition, there are inconsistencies in the referral criteria used around the world and in the use of the terms "sight-threatening" and "vision-threatening" retinopathy. Therefore, further studies using alternative referral thresholds are required.
In conclusion, we have developed three predictive models to predict three-year risk of STDR that may be applied based on resource settings. These risk scores may be used to identify those who need prioritisation for retinal screening and treatment so that the rate of blindness due to STDR does not rise with the rising prevalence of diabetes. However, further testing and piloting of these models would be required, and they do not replace retinal screening but could be more useful as a pre-screening strategy until systematic retinal screening for all people with diabetes is made available globally.