Diabetic Retinopathy Diagnosis based on Convolutional Neural Network in the Russian Population: A Multicenter Prospective Study

Background Diabetic retinopathy is the most common complication of diabetes mellitus and is one of the leading causes of vision impairment globally, which is also relevant for the Russian Federation. Objective To evaluate the diagnostic efficiency of a convolutional neural network trained for the detection of diabetic retinopathy and estimation of its severity in fundus images of the Russian population. Methods In this cross-sectional multicenter study, the training data set was obtained from an open source and relabeled by a group of independent retina specialists; the sample size was 60,000 eyes. The test sample was recruited prospectively, 1186 fundus photographs of 593 patients were collected. The reference standard was the result of independent grading of the diabetic retinopathy stage by ophthalmologists. Results Sensitivity and specificity were 95.0% (95% CI; 90.8-96.4) and 96.8% (95% CI; 95.5-99.0), respectively; positive predictive value – 98.8% (95% CI; 97.6-99.2); negative predictive value – 87.1% (95% CI, 83.4-96.5); accuracy – 95.9% (95% CI; 93.3-97.1); Kappa score – 0.887 (95% CI; 0.839-0.946); F1score – 0.909 (95% CI; 0.870-0.957); area under the ROC-curve – 95.9% (95% CI; 93.3-97.1). There was no statistically significant difference in diagnostic accuracy between the group with isolated diabetic retinopathy and those with hypertensive retinopathy as a concomitant diagnosis. Conclusion The method for diagnosing DR presented in this article has shown its high accuracy, which is consistent with the existing world analogues, however, this method should prove its clinical efficiency in large multicenter multinational controlled randomized studies, in which the reference diagnostic method would be unified and less subjective than an ophthalmologist.


INTRODUCTION
According to the International Diabetes Federation reports in 2021, 537 million adults worldwide suffer from diabetes mellitus (DM).At the same time, it is expected that this figure will increase to 643 million by 2030, and to *Address correspondence to this author at the World-Class Research Center "Digital Biodesign and Personalized Healthcare", I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia; Tel: +7 (969) 021-48-47; E-mail: gognieva_d_g@staff.sechenov.ru783 million by 2045.In 2021, the estimated total mortality rate of DM was 6,700,000 [1].In Russia, according to the Federal Register of Patients with Diabetes Mellitus, in 2021, DM was diagnosed in 4,799,552 people (3.23% of the country's population) [2].In 2022, 345,000 new cases of the disease were detected [3].The prevalence of diabetic retinopathy (DR) is 30 to 40% [4].At the same time, according to the World Health Organization (WHO), DR ranks third among the causes of blindness in the world [5], which is also relevant for the Russian Federation [6].
In the United States of America, DM is the leading cause of blindness among the working adult population aged 20-74 years [7].
According to the order of the Ministry of Health of Russia, preventive examinations of apparently healthy adults should be carried out once every three years for people from 18 to 39 years old and annually for people over 40 years old.Routine ophthalmological examination with ophthalmoscopy is not included in the healthcare survey, but the evaluation of blood glucose levels in order to exclude glucose metabolism disorders is mandatory.It should also be noted that such examinations are often voluntary, and in case of the absence of symptoms of the disease, such as vision impairment and other manifestations of DM, a fairly large population group remains out of sight of the doctor and does not receive timely medical care.
If we consider the population group in which DM is already diagnosed, preventive fundus examination should be carried out annually.In case of absence of DR or mild and moderate DR, funduscopy should take place 2-3 times a year, in case of severe or proliferative DR ̶ at least 3-4 times a year.However, the implementation of such a planned dynamic observation in practice is not always feasible, because patients suffering from DM may also have other complications (nephropathy, diabetic foot, etc.), which aggravate the general condition and may cause immobility.In addition, one cannot ignore the fact that the number of qualified specialists who are able to accurately diagnose retinal pathology is not always enough, especially in remote regions, which also limits access to qualified medical care.Thus, there is wide clinical and research potential for developing a portable, non-invasive, cost-effective screening and diagnostic tool for DR with high diagnostic efficiency and reproducibility.Neural networks, which have already proven to be highly effective in DR detection, can become such a tool.This article presents the results of a multicenter prospective study aimed at evaluating the effectiveness of DR detection and grading using a convolutional neural network (CNN) developed by a team of scientists from Russia.This study is the first and currently the only one in the Russian Federation.

MATERIALS AND METHODS
This cross-sectional multicenter study was conducted according to the principles of the Helsinki Declaration and was approved by the Local Ethics Committee of Sechenov University (protocol number 05-21, March 10, 2021).For the purpose of neural network training, a public dataset was used (https://www.kaggle.com/c/diabetic-retinopathy-detection/data).Digital fundus photographs from this source were independently relabeled twice by seven certified ophthalmologists with at least 2 years of professional experience in retinology in order to ensure high accuracy in diagnosing macular pathology.All images were reduced to a resolution of 1024x1024 px, their color was normalized to average values according to the "dataset".The labeled sample included 30,000 patients (60,000 eyes) and it was divided into two parts: 80% -training dataset, and 20% validation set.The basic CNN was trained in three steps: 1. Training of the first six layers of the CNN on 224x224 px image resolution.
2. The use of nine layers with 448x448 px image resolution.

Full CNN training with 1024x1024 px image resolution.
This technique enables the gradient descent during the error backpropagation at the early stages of training, which improves the convergence and generalization of the network.A further increase in image resolution does not lead to an improvement in classification accuracy.To obtain an expert CNN, all layers of the base CNN, except for the last ones, were frozen and only fully connected layers were trained using the training dataset.At each training step, the neural network traversed the same dataset 200 times (epochs) or until the training error on the validation data set stopped decreasing.The architecture of the neural network is presented in Table 1.All codes used in the study were executed in the TensorFlow 2.0 environment with Windows 10 + CUDA (Compute Unified Device Architecture) 10.0.TensorFlow is an open source machine learning library developed by Google® for building and training neural networks.The training was conducted on a hardware platform with a 3.60 GHz Intel Core i9-9900K processor with 32.0 GB of RAM and an NVIDIA GeForceRTX 2080 graphics card.

First Stage
The inclusion of patient data in the test sample was carried out according to the previously calculated sample size on the basis of the Cardiology Department of the University Clinical Hospital No.1.of Sechenov University (237 patients, 474 eyes) and the Department of retinal and optic nerve pathology of the Federal State Budgetary Institution "Research Institute of Eye Diseases named after M.M. Krasnov" (356 people, 712 eyes).This study included 350 patients with a confirmed DM (according to data from medical documentation, which is stored in the electronic database of the clinics), of which 310, in addition to DM and diabetic retinopathy, had a confirmed diagnosis of arterial hypertension and also signs of hypertensive retinopathy; the control group consisted of 242 people, of which 156 had cardiac diagnosis and the corresponding hypertensive retinopathy, and 86 people did not have DM or arterial hypertension and any macular pathology.After inclusion in the study, all patients were assigned a unique identification number for the purpose of blinding, further work was carried out with depersonalized data.Images were obtained using a medical digital camera "Optomed Smartscope PRO", Optomed USA Inc.San Francisco, CA.Fundus photography was performed under the condition of adequate pharmacological mydriasis with the usage of the drug "Fenikamide" (international nonproprietary name tropicamide + phenylephrine, manufacturer GROTEX, LLC, reg.No.: LP-004830 dated 04/26/2018).After receiving 8-field fundus photographs (according to Early Treatment of Diabetic Retinopathy Study (ETDRS) protocol requirements), a number identical to the unique patient's number was assigned to the photographs, and then all of the photos were sent to the specific cloud storage.Subsequent image analysis was performed by two independent ophthalmologists with at least 2 years of professional experi-ence in retinology.Ambiguous clinical cases, that required the adoption of a collegial expert decision, were resolved.Based on the analysis, ophthalmologists graded the photographs according to image quality (high, medium, low, very poor -cannot be interpreted), patients whose photographs were of very poor quality were excluded from further analysis at this step.After that, confirmation or refutation of the diagnosis of DR and identification of the stage of DR were carried out based on International Council of Ophthalmology (ICO) Guidelines for Diabetic Eye Care, 2017 (Table 2) [8].
An average time required for the analysis of one photo by each specialist was calculated.
At this stage, due to poor image quality, 53 people were excluded from the study (30 with a diagnosis of arterial hypertension, 21 with combination of DM and arterial hypertension, and 2 healthy volunteers).

Second Stage
Along with the fundus photography, the researchers conducted analysis of medical documentation from the electronic database, followed by creating specific database containing metadata of the study sample.The presence of the diagnosis "DM" was recorded when one or more of the following WHO criteria were met: Fasting venous or capillary plasma glucose ≥ 7.0 mmol/L (126 mg/dL), 2-hour post-load venous plasma glucose ≥ 11.1 mmol/L (200 mg/dL), 2-hour postload capillary plasma glucose ≥ 12.2 mmol/L (220 mg/dL), HbA1c 6.5% (48 mmol/mol) [9].In addition, the presence of aggravations in the form of hypertension, coronary heart disease, atrial fibrillation, ventricular arrhythmias and severe renal pathology was recorded.

Third Stage
At this stage, the test sample was passed through the neural network.To obtain predictions, a CNN was used, it more detailed description can be found in paper of Tatarkanov A., et al. [10].The main principle of this process is extraction of important simple features of an object from an image (borders, points, color transitions, etc.) and filtering out all unimportant ones.This results in a generation of a single number, indicating that the sample belongs to some certain classification section.
It should be noted that due to the large amount of data (8field fundus photographs per one eye), in order to reduce resource costs and unify the technique, it was decided to use only the photo of the central field for analysis.All photographic images used at this stage for testing the CNN were subjected to pre-processing, the resolution was unified to 1024x1024 px, and the color was normalized to average values according to the dataset.
After completion of testing, confirmation of the presence or absence of the diagnosis of DR and determination of its stage was obtained for all of the fundus photographs.The time required for analysis of each image by CNN was fixed.Analysis could not be performed for some images due to poor quality at this stage of the study, therefor, 1 patient was excluded.

EXPERIMENTAL
Point estimation and the creation of graphs were carried out using the Python v.3.10 programming language, based on the scikit-learn v.1.2module.The bootstrap method was used to calculate the 95% confidence interval.Continuous variables with normal distribution were presented as mean (standard deviation); non-normal variables were reported as median (interquartile range).P-values < 0.05 were considered statistically significant.The diagnosis of DR, made by independent ophthalmologists, was used as a reference standard.The efficiency of the CNN has been estimated in terms sensitivity, specificity, positive and negative predictive value, accuracy, Cohens Kappa score, F1 score, and area under Receiver Operating Characteristic curve (ROC analysis).These parameters were also determined in the classification of DR by stages.

RESULTS
The final sample consisted of 540 people (1080 eyes), 299 (55.37%) of which were female.The mean age was 56±14.08 years.Researchers reached different conclusions when analyzing fundus photographs of the test sample in 21 cases, the final decision on each specific case was made collectively, with the involvement of a third independent ophthalmologist.
The average time for labeling one fundus image for an ophthalmologist was 1 minute 14 seconds, while for a neural network, this figure was 3 minutes, of which the processing itself took 0.8 milliseconds.

DISCUSSION
The first study dedicated to an application of neural network for the screening of DR was published by Gardner et al. about 30 years ago [11].Scientific progress, accompanied not only by the improvement of optical equipment and digitalization necessary to create high-quality databases of big data of medical images, but also by an increase in the efficiency of neural networks themselves, has led to an increase in interest in research aimed at DR screening using neural networks in recent years.
In the first work of Gardner et al. from 1996 the sensitivity and specificity of back propagation neural network in diabetic retinopathy detection was 88.4% and 83.5%, respectively [9].At the same time, current neural networks demonstrate higher diagnostic accuracy, with sensitivity and specificity greater than 90% [12].According to a systematic review and meta-analysis by Mohaimenul et al., which included 20 articles published from January 2000 to March 2019, the sensitivity of methods based on machine learning was 0.83 (95% CI: 0.83-0.83), the specificity was 0.92 (95% CI: Abbreviations: AUC -area under the curve; CI -confidence interval; NPV -negative predictive value; PPV -positive predictive value 0.92-0.92),and the area under the ROC curve -0.97 (95% CI: 0.95-0.98).At the same time, the authors noted that the methodology of the reviewed studies was quite heterogeneous, which may affect the result, and in most of them, the diagnosis established by an ophthalmologist was used as a reference, which does not allow to exclude a possible component of subjective diagnostic error [13].
The study by Choi et al. revealed that the diagnostic efficiency of the neural network decreased in cases of various corresponding fundus pathologies (combination of DR with hypertensive retinopathy, age-related macular degeneration, etc.) [14].In this regard, patients in our study were divided into two groups: Those who had only one fundus pathology ̶ DR and those who, in addition to DR, also had hypertensive retinopathy.As mentioned above, the presence of signs of hypertensive retinopathy did not have a significant impact on the indicators of diagnostic efficiency.
It should be noted that in the course of data analysis, ophthalmologists considered fundus photographs of 53 patients unsuitable for analysis due to its low quality, while the CNN was unable to analyze the data of only 1 patient.Unfortunately, we were unable to evaluate the diagnostic capabilities of the CNN on the specified set of poor quality images, because the diagnosis established by an ophthalmologist was used as a reference in our study.
As for the disagreements that arose between ophthalmologists regarding the diagnosis in 21 cases, in 13 (61.9%) of them, the diagnosis made by CNN coincided with the reference, a discrepancy was noted in 8 (38.1%) cases.All of the above indicates that, unlike a person, a machine is not characterized by doubts when making decisions, fatigue and professional burnout, which undoubtedly creates prerequisites for the use of such diagnostic models for mass screening in conditions of limited material and health human resources.At the same time, researchers do not yet have an understanding of exactly what processes take place within the so-called ''black box'', which implies the emergence of organizational, legal and ethical problems with widespread introduction into practical healthcare.In the study conducted by Lynch et al. fundus photographs containing typical signs of DR were amended at the pixel level, invisible to the human eye.Further, these photographs were passed through the neural network, which had previously been trained with a large amount of images containing signs of the indicated pathology, and it did not reveal pathological changes and attributed them as "normal".At the same time, a neural network trained to diagnose certain pathological changes (lesion-based approach) in the fundus made the diagnosis correctly [15].

LIMITATIONS
Despite the rather high diagnostic efficiency obtained in the course of our study, it has certain limitations: Firstly, a public database of digital fundus photographs, which does not contain any additional information about patients, includ-ing race and nationality was used for the training and validation of the neural network, which theoretically could influence the results; secondly, the inclusion of patients data in the test sample was carried out in Moscow clinics and ethnicity data was not taken into account, which does not allow to expand the results on the multinational population of Russia (represented by more than 160 nationalities); thirdly, the neural network used in our study was trained using images and is also characterized by the ''black box'' phenomenon and the possible resulting problems described above.

CONCLUSION
The method for DR detection presented in this article has shown its high efficiency, which is generally consistent with the results of world analogues.To date, the evidence base for the use of neural networks for the detection and screening of DR seems to be quite impressive.However, it becomes obvious that for a wider clinical application, these methods should prove their effectiveness in large multicenter multinational randomized trials in which the reference diagnostic method would be unified and less subjective than an ophthalmologist.

ETHICS APPROVAL AND CONSENT TO PARTICI-PATE
The study was approved by the Local Ethics Committee of Sechenov University, Russia (protocol number 05-21, March 10, 2021).

HUMAN AND ANIMAL RIGHTS
All procedures performed in studies involving human participants were in accordance with the ethical standards of institutional and/or research committee and with the 1975 Declaration of Helsinki, as revised in 2013.

CONSENT FOR PUBLICATION
All enrolled patients signed written informed consent to participate in the study.After inclusion patient data were anonymized by an individual registration number assignment.

Fig. ( 2 )
Fig. (2) shows the area under the ROC curve for each stage of DR.

Fig. ( 2 )
Fig. (2).Area under the ROC curve for each stage of DR. (A higher resolution / colour version of this figure is available in the electronic copy of the article).