Deep learning for automated detection of neovascular leakage on ultra-widefield fluorescein angiography in diabetic retinopathy

Diabetic retinopathy is a leading cause of blindness in working-age adults worldwide. Neovascular leakage on fluorescein angiography indicates progression to the proliferative stage of diabetic retinopathy, which is an important distinction that requires timely ophthalmic intervention with laser or intravitreal injection treatment to reduce the risk of severe, permanent vision loss. In this study, we developed a deep learning algorithm to detect neovascular leakage on ultra-widefield fluorescein angiography images obtained from patients with diabetic retinopathy. The algorithm, an ensemble of three convolutional neural networks, was able to accurately classify neovascular leakage and distinguish this disease marker from other angiographic disease features. With additional real-world validation and testing, our algorithm could facilitate identification of neovascular leakage in the clinical setting, allowing timely intervention to reduce the burden of blinding diabetic eye disease.


Results
Patient and imaging characteristics. The study included 678 images from 377 patients. Table 1 shows the demographic and clinical characteristics of the 377 patients.
Eyes with clinical DR severity ranging from no DR to PDR were included, with severity determined on clinical examination by the treating retinal specialist. Mean logMAR visual acuity was 0.36 (SD 0.36), a Snellen equivalent of 20/46. Of the eyes, 10% (n = 70) had no DR, 15% (n = 102) had mild NPDR, 20% (n = 135) had moderate NPDR, 13% (n = 90) had severe NPDR, and 24% (n = 163) had PDR. A substantial proportion of eyes had a history of macular edema (43%; n = 290). Graders identified angiographic neovascular leakage in 18% (n = 120) of images. Most eyes were pseudophakic (63%; n = 425).
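The Snellen equivalent quoted above follows from the standard relation logMAR = log10(denominator / numerator), so on a 20-foot chart the denominator is 20 × 10^logMAR. The short Python check below is an illustrative sketch of that conversion; the function name is ours and is not part of the study code.

```python
def logmar_to_snellen_denominator(logmar: float, numerator: float = 20.0) -> float:
    """Return the Snellen denominator for a given logMAR value.

    logMAR is log10 of the minimum angle of resolution (MAR), and
    MAR = denominator / numerator, so denominator = numerator * 10**logMAR.
    """
    return numerator * 10 ** logmar


mean_logmar = 0.36  # mean visual acuity reported in the text
denominator = logmar_to_snellen_denominator(mean_logmar)
print(f"20/{round(denominator)}")
```

With a logMAR of 0.36 the denominator is about 45.8, which rounds to the reported 20/46.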
Algorithm performance. We trained three CNNs and evaluated the performance of the model-averaged ensemble classifier through five-fold cross-validation (Fig. 1). The component CNNs were selected based on their performance in a variety of prior ophthalmic applications [13][14][15][16][17][18][19]. Since the data set of 678 images contained only 120 images (18%) with grader-identified neovascular leakage, additional weight was placed on this class to address the class imbalance. Figure 2A shows the receiver operating characteristic (ROC) curves obtained from five-fold cross-validation. Area under the ROC curve (AUC) was 0.96 for the model-averaged ensemble predictor. The AUCs for the individual CNNs in the ensemble were 0.90 for InceptionResNetV2, 0.92 for EfficientNetB6, and 0.94 for ResNet152V2. Figure 2B shows the precision-recall (PR) curves from five-fold cross-validation. The average precision was 0.87 for the ensemble predictor. Individual CNN average precisions were 0.76 for InceptionResNetV2, 0.79 for EfficientNetB6, and 0.83 for ResNet152V2. Table 3 lists the metrics obtained within each fold of training and testing for the ensembled predictor. At the selected operating point, sensitivity was 0.82, specificity was 0.95, and precision was 0.77. Figure 3 shows the confusion matrix for the ensemble classifier at the selected operating point. Most of the positive and negative images were correctly identified by the model, but a proportion of images were false positives or false negatives.
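The text states that extra weight was placed on the leakage class but does not give the weights. One common choice, assumed here purely for illustration, is the "balanced" heuristic, which weights each class by n_samples / (n_classes × n_class). The sketch below applies it to the image counts reported above; the function name and class labels are hypothetical.

```python
def balanced_class_weights(counts: dict) -> dict:
    """Weight each class by n_samples / (n_classes * n_class).

    This is the common "balanced" heuristic. The weights actually used in
    the study are not stated in the text, so treat these as an assumption.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Image counts from the text: 120 of 678 images show neovascular leakage.
counts = {"no_leakage": 678 - 120, "leakage": 120}
weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under this heuristic each leakage image counts roughly 4.65 times as much as a non-leakage image, so both classes contribute equally to the total loss. In Keras, a dictionary of this kind keyed by integer class index can be passed to `Model.fit` through its `class_weight` argument.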
Supplementary Figure 1 shows images of the false-positive and false-negative algorithm predictions from Fold 1 of cross-validation. Some potential reasons for false-negative predictions in these images (Supplementary Fig. 1) included NV that exhibited slow leakage of fluorescein, very small foci of NV, NV near the optic disc,
Figure 1. Architecture of the deep learning classifier. Five-fold cross-validation was used with an 80%/20% train/test split. Predictions from the ResNet152V2, EfficientNetB6, and InceptionResNetV2 convolutional neural networks were ensembled using model averaging.

Discussion
Our deep learning algorithm was able to detect neovascular leakage in UWF-FA images containing other DR disease features that cause hyper- and hypo-fluorescent angiographic changes. To our knowledge, an algorithm to perform this classification task has not previously been constructed. Our image data set was obtained from a well-characterized group of patients with DM and varying stages of DR [20]. The algorithm may be useful in augmenting ophthalmologists' or retinal specialists' ability to discern neovascular leakage on fluorescein angiography in the clinical setting. Deep learning algorithms have been created for screening, classification, and segmentation of numerous eye conditions [8][9][11][21][22]. In diabetic retinopathy, deep learning has been used to screen patients with diabetes mellitus (DM) for referral-warranted diabetic retinopathy (DR) [11][23], and to identify the severity of DR [24][25]. However, many studies, including these, used 50-degree color fundus photographs. Peripheral vascular lesions not visible on traditional photography are commonly found in DR and may be important prognostic indicators [26][27]. Our study used UWF imaging with a 200-degree field of view to include NV lesions outside of the posterior pole [28]. A study by Nagasawa et al. used non-FA UWF imaging to detect treatment-naïve PDR with an AUC of 0.97 [29]. However, in their study the classifier only had to distinguish between images of normal subjects and images of subjects with PDR, whereas discriminating between NPDR and PDR may be a more challenging task. Our algorithm trained on UWF-FA images achieved an AUC of 0.96 to detect NV leakage, a finding that is diagnostic of PDR. Sickle cell retinopathy is another retinal vascular disease that may progress to a proliferative stage characterized by neovascularization and tractional retinal detachment. Cai et al. trained an InceptionV4 network to detect sea fan neovascularization from ultra-widefield fundus photographs, achieving sensitivity and specificity of 0.97 [30]. Early detection of proliferative vascular disease using automated methods may facilitate early treatment to reduce the risk of vision loss.
Our data included 163 eyes classified as PDR by a clinician, whereas only 120 images were labeled by graders as positive for NV leakage. This discrepancy could be explained by clinical scenarios in which a clinician would diagnose PDR despite a lack of evidence of NV on UWF-FA. These could include development of vitreous hemorrhage in an eye with known pre-existing diabetic retinopathy, imaging of a patient who had received an intravitreal injection for proliferative disease before referral (causing regression of NV on imaging), neovascularization in the far periphery that is visible clinically but not visualized on UWF-FA, or iris neovascularization. The rate of false-positive predictions by the algorithm did not differ significantly (chi-squared test) between eyes with PDR without NV leakage on imaging and eyes with NPDR without NV leakage on imaging.
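The comparison described above is a 2 × 2 test of independence between DR group and prediction outcome. The sketch below shows how such a test can be computed, assuming Pearson's chi-squared statistic without continuity correction; the text does not say which variant or software was used, and the counts are placeholders, not study data.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-squared survival function is
    # erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# PLACEHOLDER counts for illustration only; these are not study data.
# Rows: PDR without leakage, NPDR without leakage.
# Columns: false-positive prediction, true-negative prediction.
statistic, p_value = chi_squared_2x2(8, 72, 20, 250)
print(f"chi2 = {statistic:.3f}, p = {p_value:.3f}")
```

A p-value above the chosen significance level (conventionally 0.05) would correspond to the "no significant difference" conclusion reported in the text.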
Fluorescein angiogram results are typically recorded as a collection of a dozen or more image frames reflecting the different time points and phases of the angiogram. A potential use of the algorithm would be identifying image frames in which neovascularization is detected, indicating the most important frames to review. In this role, the algorithm would facilitate clinical diagnosis. Deep learning has been used to detect abnormalities on FA in other retinal diseases such as retinopathy of prematurity and age-related macular degeneration [31][32][33]. Since PDR is a cause of vision-threatening complications of diabetes mellitus, early diagnosis is key to obtaining appropriate treatment to prevent vision loss [4][5]. Leakage from NV is present in active, untreated proliferative diabetic retinopathy, and often resolves with treatment of PDR.
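The frame-triage use suggested above could be sketched as follows: score every frame of an angiogram and list the frames at or above a threshold, highest score first. The frame identifiers, scores, and the 0.5 threshold are hypothetical. Because the classifier was trained only on early venous phase images, its scores on other phases would need separate validation.

```python
def rank_frames_for_review(frame_scores: dict, threshold: float = 0.5) -> list:
    """Return identifiers of frames scoring at or above the threshold,
    ordered from highest to lowest predicted probability of leakage."""
    flagged = [(frame, score) for frame, score in frame_scores.items()
               if score >= threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [frame for frame, _ in flagged]


# Hypothetical per-frame leakage probabilities from a classifier.
scores = {"frame_01": 0.04, "frame_05": 0.91, "frame_08": 0.63, "frame_12": 0.22}
print(rank_frames_for_review(scores))
```

For the hypothetical scores shown, the function returns frame_05 followed by frame_08.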
The limitations of this study include the lack of a larger, standardized external test image set, the inherent limitations of a retrospective, single-center study, the inability to assess for diabetic macular edema, and the use of images only from the early venous phase of fluorescein angiography. No standardized external test image set exists for this classifier. We used a data set of 22 images from the ASRS Image Bank as a pilot data set for external validation, but a larger, standardized external test set would be needed to confirm the algorithm's generalizability. Improved data capture and sharing standards through the National Eye Institute Bridge2AI initiative or a model-to-data approach could be used to confirm external validity [34]. Although diabetic macular edema does cause leakage on FA, the preferred imaging modality for detection of diabetic macular edema is optical coherence tomography and not FA. A multi-modality imaging approach with optical coherence tomography would be needed to incorporate detection of diabetic macular edema, which is an additional vision-threatening complication of DM. Finally, we chose the early venous phase of FA because leakage from NV appeared to be most prominent during this phase in contrast to other angiographic findings. We reasoned that earlier phases of FA would be less likely to exhibit leakage from neovascularization, and that in the later phases of FA, hyperfluorescence from staining would be more difficult to distinguish from leakage. However, leakage from NV may also be present during other phases, and the temporal pattern of leakage could provide additional information to the model. Analysis of videos of FA could also be of benefit, but video FA is not widely recorded, which limits the generalizability of such an approach.
In summary, we trained a deep learning algorithm to detect the presence of neovascular leakage in UWF-FA images from patients with diabetic retinopathy. With additional testing to verify external validity, the algorithm could help guide early identification and treatment of proliferative diabetic retinopathy.

Methods
We included patients 18 years of age or older with DM who received a diagnosis of diabetic retinopathy by a retina-trained clinician and had an ultra-widefield fluorescein angiography (UWF-FA) study performed between January 2009 and May 2018 at two sites of a tertiary academic medical center (Kellogg Eye Center Ann Arbor and Grand Blanc). Images were obtained retrospectively from a previously generated data set used for quantification of retinal neovascularization [20]. Images from the early venous phase of the angiogram were used because leakage from NV was most prominent during this phase compared to other angiographic findings. Images were excluded if image quality was too poor to identify distinct fundus features. Images were labeled for neovascular leakage by graders who underwent training as described previously [20]. Graders were masked to patient data. A fellowship-trained retinal specialist (P.Y.Z.) verified grader labels. Images were cropped to the central 1792 × 1280 pixels and downsampled to 896 × 640 pixels.
This study adhered to the tenets of the Declaration of Helsinki. The study was initiated after approval by the University of Michigan Institutional Review Board (HUM00120509, PI: Y.M. Paulus), which approved an exemption from the requirement to obtain informed consent.
For the deep learning algorithm, we trained three CNNs: ResNet152V2, EfficientNetB6, and InceptionResNetV2. Each network was pre-trained on ImageNet and then trained on the UWF-FA data set. The algorithm was evaluated using five-fold cross-validation, with each fold consisting of an 80% training set and a 20% test set. To generate an ensembled prediction, we averaged the predictions of the three CNNs. The Adam algorithm was used for optimization, and the learning rate was set to 0.0005. The batch size was set to 16.
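The cross-validation and model-averaging steps described above can be sketched in plain Python. This is a minimal illustration, not the study code: the function names, the random seed, and the example probabilities are assumptions, and the split here is made per image because the text does not say whether images from the same patient were kept in the same fold.

```python
import random


def five_fold_splits(n_items: int, n_folds: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Each item appears in exactly one test fold, so every fold trains on
    roughly 80% of the data and tests on the remaining 20%.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def average_predictions(per_model_probs):
    """Model averaging: mean of the per-image probabilities across models."""
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


for fold, (train, test) in enumerate(five_fold_splits(678), start=1):
    print(f"fold {fold}: {len(train)} train, {len(test)} test")

# Hypothetical outputs of the three CNNs for two test images.
probs = [[0.9, 0.2], [0.7, 0.1], [0.8, 0.3]]
print(average_predictions(probs))
```

With 678 images from 377 patients, a per-image split can place images from one patient in both the training and test sets; grouping folds by patient is a common way to avoid that, although the text does not state which approach was taken.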
Training images were randomly augmented with horizontal and/or vertical translation of up to 10%, random horizontal and vertical flip, image rotation of up to 72 degrees, and image zoom between 90 and 110%. Training was set to a maximum of 35 epochs, with early stopping if the training loss did not decrease after 8 epochs. Computation was performed using Keras with TensorFlow version 2.7.0 as backend on the University of Michigan High Performance Computing Cluster (16 GB NVIDIA Tesla V100; NVIDIA Corporation). Statistical analyses were performed using Excel and Python version 3.9.7 with scikit-learn module version 1.0.1.
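The stopping rule described above (at most 35 epochs, stopping early when the training loss has not decreased for 8 epochs) can be illustrated with a small standalone function. This is a sketch of the logic only; the function name and the example loss curve are hypothetical, and the study's exact callback configuration is not given in the text.

```python
def epochs_run(losses, max_epochs: int = 35, patience: int = 8) -> int:
    """Return how many epochs run before training stops.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return min(len(losses), max_epochs)


# Hypothetical loss curve: improves for 10 epochs, then plateaus.
losses = [1.0 - 0.05 * i for i in range(10)] + [0.6] * 30
print(epochs_run(losses))
```

For this example curve, training stops after epoch 18, eight epochs after the best loss at epoch 10. In Keras, comparable behavior is available through the `EarlyStopping` callback with `monitor="loss"` and `patience=8`.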

Data availability
The data that support the results of this study are not publicly available in order to protect patient confidentiality. Programming code and algorithm weights are available from the corresponding authors upon reasonable request.