Introduction

In 2017, the International Diabetes Federation estimated that 13% of the United States adult population (30.2 million people) had diabetes. Moreover, it was estimated that 1 in 3 of these individuals had some degree of diabetic retinopathy (DR), a common visual complication of diabetes [1]. There are numerous vision-saving interventions for DR including intensive glycemic control [2,3,4], intraocular anti-vascular endothelial growth factor (anti-VEGF) injections [5,6,7,8], steroid injections [9], laser photocoagulation, and/or surgery [10,11,12]. Using these techniques, it is estimated that early interventions can prevent over 90% of significant vision loss at five years [13]; yet, DR remains the leading cause of blindness in working-age adults (20–65 years old) [1].

Many individuals, especially those of lower socioeconomic status, do not seek medical evaluation and intervention [14,15,16]. Four years after the American Diabetes Association began recommending annual eye exams for individuals with diabetes [17], fewer than half of adults with diabetes in the United States had received DR screening at the recommended interval, if at all [18]. Despite significant improvements in treatments and awareness, this statistic has not improved significantly two decades later: a 2016 study following 339,646 individuals with diabetes found that fewer than half received the recommended annual eye exam [19]. Major risk factors for not receiving eye screening include young age [14], low socioeconomic status [14,15,16, 20, 21], Black or Latino ethnicity [21,22,23], low health literacy [21], and lack of access to screening services [21, 24, 25]. In addition, low screening rates may be influenced by referral practices; one study found that as few as 35% of diabetic patients were referred for eye exams by other specialists [26].

A possible solution to reducing DR-related vision loss is improving access to high-quality DR screening programs that offer early and accurate referral for vision-threatening DR. Telemedicine and remote, digital retinal imaging have emerged as potentially resource-effective, technological solutions to the growing need for accurate and widely available DR screening. Recently, attention has been drawn to smartphones as a screening solution that combines telecommunication and imaging capabilities [27]. Remote imaging programs have already succeeded in improving DR screening rates and lowering the incidence of DR-related vision loss in select populations [28,29,30,31,32]. To date, the majority of these remote screening programs are built upon a two-step approach, wherein patients are first imaged and the images are then sent for delayed diagnostic grading by an ophthalmologist or a trained grader [33,34,35,36,37,38]. Yet, this approach may introduce unnecessary cost and delay, posing barriers to patients receiving needed treatment.

Attention has recently been drawn to artificial intelligence (AI) as a potential solution to this logistical dilemma. By training on large, pre-scored data sets, these mathematical models and algorithms are capable of learning complex behaviors, such as image classification, which have traditionally required a human. Thus far, researchers have utilized AI to diagnose a range of diseases, including pediatric pneumonia [39], malaria [40], and numerous ocular conditions [39, 41,42,43,44,45,46,47]. Recent programs have been shown to identify DR with diagnostic accuracy similar to trained graders and ophthalmologists [41, 48,49,50,51,52,53,54]. AI systems displaying both high sensitivity and specificity could allow for preliminary screening of large patient populations and could reduce screening costs. However, AI alone will not allow for greater access to remote patient populations, and increased screening for DR remains a critical need across the globe. About 79% of individuals with diabetes live in low- or middle-income countries [1], and estimates suggest that 84% of these cases remain undiagnosed [55]. Thus, if paired with portable imaging devices, AI could allow for rapid, autonomous identification of referral-warranted DR (RWDR) in near real-time, thereby drastically simplifying the screening process and improving accessibility.

Herein, we present a mobile platform that combines portable, smartphone-based wide-field retinal imaging with automated grading to detect RWDR. The sensitivity and specificity of the system were determined through comparison to current gold-standard slit-lamp biomicroscopy.

Materials and methods

RetinaScope hardware

For a detailed description of the device hardware and software design, please refer to previous publications from our group [56, 57]. Briefly, the RetinaScope weighs roughly 310 g. Its 3D-printed plastic housing encloses optics for illuminating and imaging the retina onto the smartphone camera. Deep red (655-nm peak wavelength) light-emitting diodes are used for focusing and to minimize photopic response. During imaging, polarized bright white illumination is used in conjunction with two polarizing filters to minimize unwanted glare (Fig. 1a). A display may be magnetically attached to either side of the device to present a fixation target (Fig. 1b). The electronic hardware in RetinaScope communicates with an iPhone (Apple Inc., Cupertino, CA) application via Bluetooth (Fig. 1c). Prior to image acquisition, operators can employ intuitive touch and swipe motions to adjust focus, zoom, and exposure. This approach reduces the time necessary to capture the image and minimizes patient discomfort. After pharmacological mydriasis, each fundus image has an ~50° field of view. Using a custom algorithm running directly on the smartphone, sequential images may be computationally merged to create an ~100°, wide-field montage of the retina (Fig. 2). Images can then either be stored on the iPhone or directly uploaded to a secure server using Wi-Fi or cellular service for remote review.

Fig. 1: RetinaScope hardware.
figure 1

a A series of optical lenses are used to focus light from the LED array on the retina (orange light). Reflecting off the retina, light passes through the wire grid beamsplitter and is focused on the smartphone’s camera sensor (blue arrow). b During image acquisition, patients focus on a fixation dot on the magnetic display (grey arrow) to guide their gaze while the front element (blue arrow) acquires the image. c Technicians control the device via intuitive touch controls in an application running on the paired iPhone.

Fig. 2: Example wide-field montage assembled using RetinaScope software.
figure 2

Three overlapping images of the central, nasal, and inferior retina of the right eye are captured and merged on the phone using custom software to create a wide-field montage of the retina. Each individual image spans approximately 50°.

Study participants

Study participants were recruited at the University of Michigan Kellogg Eye Center Retina Clinic and the ophthalmology consultation service at the University of Michigan Hospital, Ann Arbor, MI, in accordance with the University of Michigan Institutional Review Board Committee approval (HUM00097907 and HUM00091492). The study adhered to the tenets of the Declaration of Helsinki and was registered at ClinicalTrials.gov (Identifier NCT03076697). Inclusion criteria required that patients be at least 18 years of age and show no significant bilateral media opacity (e.g. vitreous haemorrhage or advanced cataract). Participants' demographic data, including age and sex, and clinical findings were recorded.

Photography and remote interpretation

Images were acquired by a medical student and a medical intern rather than ophthalmologists or ophthalmic photographers. Patients underwent dilated fundus imaging in a dimmed room at the Kellogg Eye Center Retina Clinic. Smartphone imaging was used in conjunction with a custom software application to capture five sequential images (central, inferior, superior, nasal, and temporal). Both eyes were imaged except when one eye was not dilated, had severe media opacity, or the patient was monocular. Patients subsequently underwent a gold-standard dilated eye examination as part of routine care. The images were then evaluated by a retina specialist and a comprehensive ophthalmologist who specialized in telemedicine. The investigators were masked to the clinical DR severity grading. Images were graded in a controlled environment on a high-resolution (1600 × 1200 pixels) 19-inch display with standard luminance and contrast on a black background. A grading template was used to assess the severity of DR as mild, moderate, or severe non-proliferative DR (NPDR); proliferative DR (PDR); or no DR. The presence of clinically significant macular oedema (CSMO) was also evaluated. All grading was assessed in accordance with the modified Airlie House classification system used in the Early Treatment Diabetic Retinopathy Study (ETDRS) severity classification criteria [58]. Eyes displaying moderate or severe NPDR, PDR, or CSMO were classified as RWDR.

EyeArt® AI eye screening system for autonomous grading

After the patients’ identities were masked, the smartphone images were uploaded to the EyeArt® (v2.0) system. The EyeArt AI eye screening system is an autonomous, cloud-based, deep neural network software designed to detect the presence of RWDR. Image quality was evaluated, and images with quality insufficient for DR screening were excluded from further analyses. Gradable images were enhanced and normalized to improve lesion visibility, and then analyzed to localize and identify lesions. The location, size, and degree of confidence of lesion detection were used to assess DR severity on the International Classification of DR (ICDR) scale. Finally, hard exudates within one disc-diameter of the macula were used as biomarkers indicating the presence of CSMO. Using these data from all images belonging to the eye/patient, the system assigned a referral score, which, if above a preset threshold, resulted in a binary decision to refer the patient (Fig. 3). A receiver operating characteristic (ROC) curve was generated by varying this threshold and plotting the true positive rate against the false positive rate (see Supplemental Fig.).

Fig. 3: Visual depiction of the EyeArt system used for binary classification of images as RWDR or non-RWDR.
figure 3

EyeArt system flow diagram indicating the sequence of operations performed on the input retinal images for binary classification of images as RWDR or non-RWDR. After the initial image analysis operations, the DR severity and presence/absence of surrogate markers for CSMO in all images of the patient are used to assign a referral score for the patient, which, if above a preset threshold, results in a binary decision to refer the patient.
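The thresholded referral decision and the ROC construction described above can be sketched in a few lines. This is an illustrative reconstruction, not EyeArt code; the scores, labels, and function names are hypothetical.

```python
def refer(referral_score, threshold):
    """Binary referral decision: refer when the score meets the preset threshold."""
    return referral_score >= threshold

def roc_points(scores, labels):
    """Sweep the referral threshold over all observed scores and return
    (false positive rate, true positive rate) pairs that trace the ROC curve.
    labels: 1 = RWDR per the reference standard, 0 = no RWDR."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical scores for four patients (two RWDR, two non-RWDR):
print(roc_points([0.9, 0.8, 0.4, 0.2], [1, 1, 0, 0]))
# → [(0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
```

Raising the threshold trades sensitivity for specificity; the operating point reported in the Results corresponds to one fixed choice of this threshold.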

Statistical analysis

At the eye level, a generalized estimating equation (GEE) logistic regression with an exchangeable working correlation matrix was used in RStudio to estimate the sensitivity and the specificity for each grader and account for the inter-eye correlations. The clinical diagnoses were used as the true references for each grader. The GEE logistic regression was then used to determine whether there was a significant difference in sensitivity between graders. Robust standard errors were employed when deriving the 95% confidence intervals. At the patient level, if a patient had both the right and the left eyes imaged, the corresponding eyes were paired. Patients were classified as referral warranted if either eye was clinically diagnosed as having RWDR. Independence was assumed between patients, and thus a standard 2 × 2 matrix was used. Due to the high sensitivity and specificity values, Wilson 95% confidence intervals were calculated for each grader. McNemar's (chi-squared) test was used to determine whether there was a significant difference in sensitivity between the human graders and the AI. Finally, a weighted kappa was calculated to assess inter-grader reliability between the human graders.
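As a concrete illustration of the patient-level statistics, the Wilson interval and the McNemar statistic can be computed directly from their closed forms. This is a sketch using the textbook definitions (uncorrected McNemar statistic), not the study's analysis code.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a proportion
    (e.g. sensitivity = true positives / all gold-standard positives)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def mcnemar_chi2(b, c):
    """Uncorrected McNemar chi-squared statistic from the discordant pairs:
    b = cases one grader flags and the other does not; c = the reverse.
    Compared against a chi-squared distribution with 1 degree of freedom."""
    return (b - c) ** 2 / (b + c)
```

Unlike the simple Wald interval, the Wilson interval never extends past 0 or 1 and remains well behaved for proportions near 1, which is why it suits the high sensitivities and specificities reported here.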

Ethics

All patients consented to enrol in this study and were informed of the risks involved. It was emphasized that each patient could withdraw from the study at any point. In addition, imaging personnel were instructed to stop an evaluation whenever there was a possibility of harm to the patient.

Results

Demographic data and gold-standard evaluation

A total of 72 patients with diabetes undergoing routine dilated clinical exams were recruited from the University of Michigan Kellogg Eye Center Retina Clinic. Twenty-six eyes were excluded from smartphone imaging due to dense media opacity (e.g. vitreous haemorrhage), lack of mydriasis, or imaging deferral by the patient. Three patients were removed due to lack of imaging of either eye. A total of 119 eyes from 69 patients were included for analysis. The mean age of the cohort was 57.0 years (standard deviation = 15.7 years); 26 patients (37.7%) were female. One patient was excluded from patient-level analysis due to the absence of a smartphone image necessary to make an accurate referral recommendation. By gold-standard clinical diagnosis, RWDR was present in 88 eyes (73.9%) and 54 individuals (78.3%).

Sensitivity and specificity

At the patient level, automated analysis achieved a sensitivity and specificity of 87.0% and 78.6%, respectively (Table 1). Automated analysis maintained a greater specificity than both grader 1 (42.9%) and grader 2 (50.0%). Grader 1 achieved a significantly greater sensitivity than the automated analysis (96.3%; p = 0.02); however, grader 2 did not (92.5%; p = 0.3).

Table 1 Patient level sensitivity and specificity of all three graders as derived using a standard 2 × 2 matrix and Wilson confidence intervals.
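The rates in Table 1 follow from the standard 2 × 2 definitions against the clinical reference. The cell counts in the sketch below are hypothetical, chosen only to be consistent with the reported patient-level automated rates.

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
    with the gold-standard clinical diagnosis as the reference."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts consistent with the reported patient-level
# automated rates of 87.0% sensitivity and 78.6% specificity:
sens, spec = sens_spec(tp=47, fn=7, tn=11, fp=3)
print(round(sens * 100, 1), round(spec * 100, 1))  # → 87.0 78.6
```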

At the eye-level, automated analysis achieved a sensitivity and specificity of 77.8% and 71.5%, respectively (Table 2). Both graders 1 and 2 demonstrated greater sensitivities (94.0% and 89.5%, respectively) but lower specificities (52.2% and 66.9%, respectively) than the automated analysis. Yet, when accounting for the inter-eye correlation using the GEE logistic regression, neither grader 1 nor grader 2 was significantly more sensitive than the automated analysis (p = 0.2 and p = 0.7, respectively).

Table 2 Eye level sensitivity and specificity of all three graders as derived using a GEE logistic regression with an exchangeable working correlation matrix.

Inter-grader agreement

The trained graders demonstrated moderate inter-grader agreement, achieving a kappa value of 0.452 ± 0.334. Of note, the kappa value was affected by the high disease prevalence within the cohort.
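The prevalence effect on kappa can be illustrated with a generic linear-weighted kappa (which reduces to Cohen's kappa for binary grades); this is a sketch of the statistic with hypothetical counts, not the study's computation.

```python
def weighted_kappa(confusion):
    """Linear-weighted kappa from a k x k confusion matrix between two graders.
    Disagreement weights are |i - j| / (k - 1), so for k = 2 this reduces
    to Cohen's kappa."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            v = abs(i - j) / (k - 1)  # disagreement weight
            observed += v * confusion[i][j]
            expected += v * row_tot[i] * col_tot[j] / n
    return 1 - observed / expected

# With most eyes diseased, 80% raw agreement yields a much lower kappa:
print(round(weighted_kappa([[1, 2], [2, 15]]), 3))  # → 0.216
```

In this hypothetical matrix the graders agree on 16 of 20 eyes (80%), yet kappa is only ≈0.22, because chance agreement is high when nearly every eye is positive, mirroring the prevalence effect noted above.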

Discussion

We previously presented the RetinaScope, a portable and easy-to-use smartphone-based imaging device capable of capturing high-quality images of the retina [56, 57, 59]. In this study, we evaluated the efficacy of handheld imaging with RetinaScope by non-ophthalmic personnel, combined with automated interpretation by the EyeArt system, as a complete mobile platform for generating referral recommendations for DR. Our results suggest that this approach can achieve the necessary sensitivity, as defined by the British Diabetic Association, to be used as a screening tool for DR [60, 61]. At the patient-level, only one grader was significantly more sensitive than the automated analysis, while the automated analysis maintained a higher specificity than both graders. At the eye-level, automated analysis maintained a higher specificity but a lower sensitivity than the human graders. Yet, the differences in sensitivity did not achieve statistical significance once the inter-eye correlations were accounted for using the GEE logistic regression. It should be noted that in practice, referrals will be based upon patient-level evaluations.

Recently, several studies have demonstrated the promise of smartphone-based devices in screening and monitoring diseases such as DR. To date, only one study has combined smartphone imaging and AI-based image classification [62]. Rajalakshmi et al. reported on the Remidio ‘Fundus on phone’ (FOP), a smartphone-based device, which was used in conjunction with the EyeArt system for automated grading. The system produced excellent results in screening for RWDR with a 95.8% sensitivity and 80.2% specificity; however, the findings lacked validation by concurrent gold-standard dilated fundus examination. Instead, the ophthalmologists were asked to grade the images captured using the FOP device, and these scores were used as the true clinical references. Studies employing this structure rest upon the assumption that a clinician reviewing an image from a mobile imaging device is equivalent to a dilated fundus exam. Yet, recent research has shown that there is a notable difference when trained, human graders score smartphone images versus traditional images. Within the past five years, studies evaluating trained graders' ability to detect RWDR in smartphone images reported sensitivities ranging from 50% to 91% [33, 63, 64]. Thus, it is imperative that researchers reference gold-standard clinical diagnoses when validating the sensitivity and specificity of new modalities, especially when combining two new modalities simultaneously. Without such validation, results are more representative of inter-grader agreement, and the reported sensitivities may be erroneously high.

In addition, Rajalakshmi et al. did not specify whether images were acquired by expert or non-ophthalmic operators. The FOP system is reported to be capable of handheld operation or of being mounted on a slit-lamp frame; however, the study did not disclose whether the device was slit-lamp mounted during data acquisition. Rigid stabilization of the camera and the patient's head improves image quality substantially, but it also introduces barriers to implementation in the community, including higher cost, lack of portability, and the need for greater operator skill. Importantly, our study tested the feasibility of imaging by non-ophthalmic operators (a medical intern and a medical student), handheld operation, and a lack of rigid head stabilization. Our study validates the efficacy of combining smartphone-based retinal imaging and automated grading for RWDR, and further demonstrates the feasibility of RetinaScope with the EyeArt system in detecting RWDR under non-ideal imaging conditions.

This study presents several notable strengths. First, images used for DR grading were acquired by non-ophthalmic operators and, as a result, images of lesser quality were included in analysis, more closely simulating conditions that may be encountered when screening in the community where such imaging is most needed. Second, our study utilized gold-standard dilated examination by a retina specialist for validating the combination of smartphone imaging and automated image interpretation. Third, our study adheres to wide-field imaging guidelines for photographic screening of DR. Complex imaging tasks, such as imaging multiple regions of the retina for wide-field analysis, were simplified and standardized by leveraging the computational capabilities of the smartphone to guide imaging, achieving up to ~100° fields of view, as previously reported [57]. It should be noted that ETDRS guidelines for photographic screening of DR utilized 7-field retinal images comprising a 90° field of view [65]. Numerous studies have emphasized the need for wide-field imaging when screening for DR [66,67,68]. For example, a single 45° field-of-view retinal image detected disease relatively well but was inadequate for determining the severity of DR as necessary for referral [65, 69, 70].

There are several limitations to this study. First, participants in this study were recruited from the retina clinic in a tertiary care eye hospital, where the prevalence of DR and other retinal diseases is much higher than in the general population. While our feasibility study shows promising results, additional work is required to validate the accuracy and utility of the RetinaScope in the general population. Second, RetinaScope is currently designed as a mydriatic device that requires patients' eyes to be pharmacologically dilated. This can be time-consuming, uncomfortable for patients, and unfamiliar to non-ophthalmic providers in the community. Third, the EyeArt system was trained using traditional retinal photographs rather than smartphone-based imaging. It is possible that automated grading was unable to identify pathology in the smartphone images that it would have recognized in traditional fundus images. The incorporation of smartphone images into the algorithm’s training will help address this limitation.

Smartphone-based retinal imaging combined with automated interpretation is a promising method for increasing accessibility of DR screening. However, it is important to recognize that image quality from handheld or smartphone-based technologies can be highly variable [64]. A key benefit of a smartphone-based approach is the familiarity of this technology, which can assist in usability and rapid learning among inexperienced and non-ophthalmic operators [59]. In addition, we found that device improvement guided by user feedback could dramatically reduce the learning time associated with smartphone-based retinal imaging among inexperienced operators [59]. Minimizing the variability in data quality that arises from nonideal conditions and inexperienced operators will become increasingly important for effective and widespread deployment of automated screening technologies in the community. RetinaScope implements hardware and software automation to simplify acquisition of high-quality retinal images [57]. The field will benefit greatly from continued innovation that improves the reliability and quality of smartphone-based retinal imaging.

Conclusion

At the patient level, RetinaScope combined with the EyeArt system achieved a sensitivity similar to that of trained human graders while maintaining a higher specificity. Future refinements to both the algorithm and the hardware should continue to improve the device accuracy and help to eliminate current burdens on the healthcare system.

Summary

What was known before

  • Artificial intelligence based programs now allow for rapid and accurate analysis of retinal images. Researchers are beginning to integrate autonomous grading with mobile imaging platforms to increase access to medical screening programs.

What this study adds

  • This study is the first to reference the gold standard diagnosis when evaluating the efficacy of combining smartphone-based retinal imaging and autonomous grading. The imaging platform allows for wide-field imaging, which has not been previously explored in conjunction with autonomous grading.