Scale-adaptive model for detection and grading of age-related macular degeneration from color retinal fundus images

Age-related Macular Degeneration (AMD), a retinal disease that affects the macula, can be caused by aging abnormalities in a number of different cells and tissues in the retina, retinal pigment epithelium, and choroid, leading to vision loss. An advanced form of AMD, called exudative or wet AMD, is characterized by the ingrowth of abnormal blood vessels beneath or into the macula itself. The diagnosis is confirmed by either fundus auto-fluorescence imaging or optical coherence tomography (OCT), supplemented by fluorescein angiography or dye-free OCT angiography. Fluorescein angiography, the gold-standard diagnostic procedure for AMD, involves invasive injection of a fluorescent dye to highlight the retinal vasculature, exposing patients to life-threatening allergic reactions and other risks. This study proposes a scale-adaptive auto-encoder-based model integrated with a deep learning model that can detect AMD early by automatically analyzing texture patterns in color fundus imaging and correlating them with vasculature activity in the retina. Moreover, the proposed model can automatically distinguish between AMD grades, assisting in early diagnosis and thus allowing earlier treatment of the patient's condition, slowing the disease and minimizing its severity. Our model features two main blocks: the first is an auto-encoder-based network for scale adaptation, and the second is a convolutional neural network (CNN) classification network. Based on a conducted set of experiments, the proposed model achieves higher diagnostic accuracy than other models, with accuracy, sensitivity, and specificity reaching 96.2%, 96.2%, and 99%, respectively.

www.nature.com/scientificreports/

The rest of the paper is organized as follows: the "Material and methods" section introduces the proposed methodology, explaining the auto-encoder-based scale adapting network and the classification network. The "Experiments and results" section reports the results of the conducted set of experiments. The "Discussion" section interprets the obtained results. Finally, the "Conclusion and future work" section presents the conclusion and outlook for future work.

Material and methods
This study aims to solve the classification problem of distinguishing between AMD grades by classifying colored fundus images of patients that are either normal or have intermediate AMD, GA, or wet AMD. The method is applied to a local dataset. Our proposed model integrates two stages. The first stage is a custom auto-encoder-based model that takes the fundus images from the available dataset as its input and feeds its output to the second stage, a ResNet50 pre-trained model. Figure 1 shows the proposed integrated model diagram.
Data collection. A cohort of 864 human subjects was recruited for this study by the Comparison of Age-Related Macular Degeneration Treatments Trials (CATT), sponsored by the University of Pennsylvania 42. Enrollment was open to those aged 50 and older. During the two years of the clinical trial, 43 clinical centers in the United States enrolled participants who received intravitreal injections of ranibizumab or bevacizumab under one of three dosing regimens. All imaging and clinical data for this study were de-identified by the CATT Study Group before being sent to the University of Louisville. Because the data had been collected in the past by a third party and had been appropriately de-identified, it was deemed exempt from the local institutional review board (IRB) process by the IRB of the University of Louisville. All data collection methods were carried out in accordance with relevant guidelines and regulations. Informed consent was obtained from all subjects and/or their legal guardian(s). The CATT program provided study treatments at every participant's first visit. Treatment was delivered to those in the fixed monthly dosing groups at every visit, or as needed based on the presence of exudation. Treatment evaluations were conducted at every visit for those assigned to variable dosing groups, and participants with lesion activity received study treatment. From these data, we collected 216 normal, 216 intermediate AMD, 216 GA AMD, and 216 wet AMD images.
Auto-encoder based scale adapting network. Because the available fundus images come in unequal sizes, we built a customized resizing model that accepts any fundus image size (ranging from 2224 × 1888 px down to 547 × 491 px) and resizes it to 224 × 224 px, so that transfer learning can be applied with any pre-trained model. The scale adapting (SA) network is an auto-encoder-based neural network model; accordingly, it filters out noise and irrelevant information. The auto-encoder-based model aims to resize the input images to 224 × 224 × 3 dimensions and handle any needed data preprocessing before classification training starts. It is constructed of two convolutional layers (CLs) and a max-pooling layer (MPL), after which the network splits into two branches; a branch is made of a CL, a transpose convolutional layer (TCL), and finally a reshape layer that reshapes its output to 224 × 224 × 12. In the end, the two paths are combined using a concatenation layer to produce an output containing high- and low-resolution images of 224 × 224 × 15 dimensions. The required output dimension is obtained from the low resolution generated by the first branch, while the high resolution is needed to ensure that the output matches the original input image. During training, the model learns to minimize the reconstruction error between the input and the generated output image by applying a custom loss function that compares the low- and high-resolution output images with the input image. The high-resolution image is the reconstructed image that should be as similar as possible to the actual input image, while the low-resolution image is the required resized output. We used the Adam optimizer with a fixed 0.001 learning rate and tanh as the activation function. Training was then performed over 100 epochs with a batch size of 1.
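The SA architecture described above might be sketched in Keras as follows. This is a hedged approximation, not the authors' implementation: the filter counts, kernel sizes, and the fixed 448 × 448 example input size are all assumptions (the paper's network accepts arbitrary image sizes), and the placeholder `mse` loss stands in for the custom two-term loss described below.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Example input size (assumption); the paper's network accepts any fundus size.
inp = layers.Input(shape=(448, 448, 3))
x = layers.Conv2D(16, 3, padding="same", activation="tanh")(inp)   # CL 1
x = layers.Conv2D(16, 3, padding="same", activation="tanh")(x)     # CL 2
x = layers.MaxPooling2D(2)(x)                                      # MPL -> 224 x 224

# Branch 1: the low-resolution (resized) output image, 224 x 224 x 3
low = layers.Conv2D(3, 3, padding="same", activation="tanh")(x)

# Branch 2: CL + transpose CL + reshape to 224 x 224 x 12 (high-resolution path)
high = layers.Conv2D(12, 3, padding="same", activation="tanh")(x)
high = layers.Conv2DTranspose(12, 3, padding="same", activation="tanh")(high)
high = layers.Reshape((224, 224, 12))(high)

# Concatenate the two paths: 224 x 224 x (3 + 12) = 224 x 224 x 15
out = layers.Concatenate()([low, high])
sa_model = Model(inp, out)
sa_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                 loss="mse")  # stand-in; the paper combines separate losses
                              # on the low- and high-resolution outputs
```

In this sketch the channel split of the concatenated output (`[..., :3]` vs. `[..., 3:]`) would let a custom loss weigh the low- and high-resolution reconstruction errors separately, as the paper describes.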
Our custom scale-adaptive (SA) auto-encoder-based model regenerated a perfect-match image with a structural similarity index measure (SSIM) of 1 using a combination of two loss functions, the Pseudo-Huber loss and the Log-Cosh loss, for the high resolution and low resolution respectively, proving good reconstruction quality with a Root Mean Square Error (RMSE) of 0.081. By trying different combinations of the Mean Square Error (MSE) loss function and the Mean Square Logarithmic Error (MSLE) loss function, our model again recorded an SSIM of 1 while the RMSE improved to 0.075. This comparison was fairly evaluated with the same hyper-parameters, using an Adam optimizer with a 0.0001 learning rate and setting the factor of the high-resolution loss function to 0.25 and the factor of the low-resolution loss function to 0.075. Figure 2 shows the experimental results for our SA model: Fig. 2a shows the loss curve over training epochs for the combination of MSE and MSLE losses for high and low resolution respectively, while Fig. 2b shows the loss curve for the combination of the Pseudo-Huber and Log-Cosh losses for high and low resolution respectively. Figure 2c shows the output of our SA model using the combination of MSE and MSLE losses for high and low resolutions respectively.

Classification network. The proposed classification network architecture is shown in Fig. 1b. It is constructed of a ResNet50 convolution backbone, a global average pooling layer, a flatten layer, three repeated blocks, and a final softmax dense layer. Each block consists of a dense layer, a batch normalization layer to stabilize and speed up the training process, and a dropout layer to avoid overfitting.
All of the dense layers use the Rectified Linear Unit (ReLU) as their activation function, setting all values less than zero to 0 and retaining all values greater than zero, except for the last dense layer, which uses softmax as the output layer with four nodes representing the normal (no AMD), intermediate, GA, and wet AMD grades. We used categorical cross-entropy as the loss function and the stochastic gradient descent (SGD) optimizer, starting with a 0.001 learning rate that was reduced automatically during the training phase whenever the loss metric stopped improving. A total of 24,750,212 out of 24,811,338 parameters were trainable in the proposed classification network architecture.
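A minimal Keras sketch of this classification head, under stated assumptions: the dense-layer widths (512/256/128) and dropout rate (0.3) are guesses not given in the text, and `weights=None` replaces the paper's ImageNet initialization only to keep the sketch runnable offline.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# ResNet50 backbone; the paper uses weights="imagenet" (None here avoids the
# pretrained-weight download in this sketch).
backbone = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                          input_shape=(224, 224, 3))
x = layers.GlobalAveragePooling2D()(backbone.output)
x = layers.Flatten()(x)
for units in (512, 256, 128):           # assumed widths of the three blocks
    x = layers.Dense(units, activation="relu")(x)
    x = layers.BatchNormalization()(x)  # stabilizes and speeds up training
    x = layers.Dropout(0.3)(x)          # assumed rate; guards against overfitting
out = layers.Dense(4, activation="softmax")(x)  # normal / intermediate / GA / wet

clf = Model(backbone.input, out)
clf.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
            loss="categorical_crossentropy", metrics=["accuracy"])

# Automatic learning-rate reduction when the monitored loss stops improving
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                 factor=0.1, patience=5)
```

The `ReduceLROnPlateau` callback (its `factor` and `patience` here are assumptions) would be passed to `clf.fit(..., callbacks=[reduce_lr])` to reproduce the adaptive learning-rate schedule described above.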
Due to the dataset size limitation, we applied transfer learning, using the ResNet50 model pre-trained on the ImageNet dataset. Training was performed over 300 epochs with a batch size of 64. The dataset samples were split into 70% for the training set and the remaining 30% for the validation and testing sets.
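The 70/30 split can be sketched with scikit-learn; the arrays below are index placeholders standing in for the 864 images and their labels, and the even validation/test division of the held-out 30% is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder labels: 216 images per grade (normal, intermediate, GA, wet AMD)
y = np.repeat([0, 1, 2, 3], 216)
X = np.arange(len(y))   # indices standing in for the images themselves

# Stratified 70/30 split into training vs. the rest
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Divide the held-out 30% into validation and test sets (even split assumed)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=0)
```

Stratifying on the grade labels keeps the four classes balanced across all three subsets.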
While carrying out training on a limited number of samples, we applied data augmentation to the training dataset to increase its size and avoid overfitting, implementing the following augmentation processes: image rotation at a 50° angle, and image mirroring by flipping the image horizontally and vertically. Data augmentation is applied only during the training phase, and no augmentation is used during the testing phase; the model is thus trained on the augmented samples and tested against the remaining original samples.
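This augmentation scheme could look like the following with Keras' `ImageDataGenerator` (a sketch, not the authors' code; note that `rotation_range=50` samples rotations from [−50°, 50°], whereas the text reports a fixed 50° rotation):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training-time augmentation: rotation plus horizontal and vertical mirroring
train_gen = ImageDataGenerator(rotation_range=50,
                               horizontal_flip=True,
                               vertical_flip=True)

# Test-time generator applies no augmentation, per the text
test_gen = ImageDataGenerator()
```

The generators would then feed `model.fit` and `model.evaluate` respectively, so augmented samples never leak into testing.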

Experiments and results
The proposed model was trained on a Colab-Pro GPU. We developed, trained, validated, and tested our model, and calculated its performance metrics, in Python using TensorFlow 43, Keras 44, and scikit-learn 45; the latter, along with matplotlib 46 and seaborn 47, was used for plotting all of the shown figures and graphs, such as performance metrics, confusion matrices, feature extraction, and activation maps. We applied the k-fold cross-validation technique to validate the best model performance and propose our model, which is composed of our SA model integrated with the ResNet50 model. The hyperparameters were set for each model separately: the scale-adaptive auto-encoder-based model hyperparameters were a batch size of 1, the Adam optimizer with a fixed 0.001 learning rate, and tanh as the activation function, while the ResNet50 pre-trained model hyperparameters were a batch size of 64 and the SGD optimizer with an automatically adaptive learning rate starting at 0.001 and reduced whenever the accuracy evaluation metric stops improving.
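The k-fold cross-validation loop might be sketched with scikit-learn as follows; the label array is a placeholder for the 864-image dataset, and the per-fold training call is omitted since the full SA + ResNet50 pipeline is described above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder labels: 216 images per grade
y = np.repeat([0, 1, 2, 3], 216)
X = np.arange(len(y))

fold_counts = {}
for k in (3, 5, 10):                 # the fold settings evaluated in the study
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    folds = list(skf.split(X, y))
    # For each (train_idx, val_idx) pair: train the SA + ResNet50 model on
    # train_idx, record its accuracy on val_idx (training call omitted here).
    fold_counts[k] = len(folds)
```

Stratified folds keep the four AMD grades balanced within every partition, which matters for a dataset with exactly 216 images per class.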
Accurate detection and grading compared to other models. Distinguishing between the normal healthy retina and the different AMD grades recorded the best performance when using our proposed integrated model compared to the other models, as shown in Table 5. SGD 48 and Adam 49 are important optimization techniques used in machine learning for updating the weights of a neural network during training, where the latter is considered a hybrid combination of RMSProp and SGD with momentum 49. SGD is a straightforward optimization approach that updates the neural network weights in the direction of the negative gradient of the loss function with respect to the weights. It randomly chooses a subset of the training data for every update, reducing the computational cost of the optimization. The choice of optimization algorithm depends on the problem being solved as well as the computing resources available. SGD is simple and computationally efficient, whereas Adam is more complex but can achieve faster convergence on larger datasets and more complex studies 50. According to the outcomes of applying the Bayesian optimization approach to find the optimal hyperparameter tuning, the top nominated optimizers for tackling our problem were the SGD and Adam optimizers with batch sizes of 64 and 32 respectively, as shown in Table 6 (Figs. 7, 8, 9).
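The SGD update rule described above, w ← w − η∇L(w), can be illustrated on a toy one-parameter loss (this worked example is for exposition only and is not part of the study):

```python
# SGD update w <- w - lr * grad on the convex loss L(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3); the minimizer is w = 3.
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3)
    w -= lr * grad
# Each step shrinks the error (w - 3) by a factor of (1 - 2 * lr) = 0.8,
# so w converges toward 3.
```

In the full training loop, the gradient is instead estimated on a random mini-batch of fundus images, which is what keeps each SGD update cheap.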
Based on our study, SGD proved to be a better optimization technique than the Adam optimizer; the results are shown in Table 5 and Fig. 3.

Figure 3. Comparison of models' accuracy using the SGD optimizer and the Adam optimizer.

For every experiment, we started the learning rate at 0.001 and let it adapt and reduce automatically; to ensure fair experimental results, we fixed every other hyper-parameter at its default except for the batch size, set to 64 over 300 epochs. Performance metrics of the trained models, shown in Tables 1 and 2 for the SGD and Adam optimizers respectively, were computed based on the overall true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The overall performance metrics and parameters are shown in Table 8 for the SGD optimizer and Table 9 for the Adam optimizer, and the confusion matrices are shown in Fig. 10. From Tables 1, 2, 8 and 9, it is clear that ResNet50 recorded the most promising performance metrics during the training and testing phases using either the SGD or Adam optimizer, in terms of precision or positive predictive value (PPV), sensitivity (recall, or true positive rate, TPR), and specificity (true negative rate, TNR). We applied 10-fold, 5-fold, and 3-fold cross-validation to the pre-trained models integrated with SA using the SGD or Adam optimizer to find the optimized performance, as shown in Tables 3 and 4, comparing the accuracies recorded by training the models in each fold. We also examined the proposed model with batch sizes of 16, 32, and 128, as shown in Table 7, where the SGD optimizer recorded the highest accuracy of 96.2% with batch size 64; although the Adam optimizer recorded higher accuracy in the same experimental environment, the cross-validation results favor SGD, as shown in Table 3.
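The metrics above follow directly from the overall TP, TN, FP, and FN counts; a small helper makes the definitions explicit (the example counts in the usage note are illustrative, not the study's values):

```python
def overall_metrics(tp, tn, fp, fn):
    """Performance metrics computed from overall TP, TN, FP, FN counts."""
    precision = tp / (tp + fp)            # PPV
    sensitivity = tp / (tp + fn)          # recall / TPR
    specificity = tn / (tn + fp)          # TNR
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, sensitivity, specificity, accuracy
```

For example, `overall_metrics(90, 85, 5, 10)` gives a sensitivity of 0.900 and a specificity of about 0.944.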
Explainable retina maps. We used feature maps to verify the availability of information and to visualize feature propagation through the convolution layers up to the last layer. Figure 4 shows feature-map visualizations of the first and last convolution layers of the proposed model and of SA integrated with other pre-trained models: Fig. 4a shows the output of the 64 filters of the first convolution layer of the ResNet50 pre-trained model integrated with SA, while Fig. 4b shows the output of 64 filters of its last convolution layer. Similarly, for SA integrated with the InceptionV3 pre-trained model, Fig. 4c displays the 25 filters of its first convolution layer, while Fig. 4d displays the output of 64 kernels out of the 192 filters of its last convolution layer. For the VGG16 pre-trained model integrated with SA, Fig. 4e,f show the output of the top 64 filters for the first and last convolution layers, respectively. Figure 5a,c,e show the output of the top 64 filters of the first convolution layer of the ResNet101, VGG19, and ResNet18 pre-trained models integrated with SA respectively, while Fig. 5b,d,f show the output of the top 64 filters of their last convolution layers. The predicted output of the proposed model is shown in Fig. 18, where it successfully discriminates between the different AMD grades. Publicly available datasets, such as those of 53 and STARE 54, classify images into AMD and normal retina only; hence, it was hard to use any of these datasets in either training, testing, or evaluating the proposed model (Tables 5, 6, 7). Despite these limitations, our model classified the AMD grades successfully and recorded an accuracy of 96.2% when integrating the SA model with the ResNet50 model using the SGD optimizer, although the Adam optimizer recorded an accuracy of 97.7%.
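Extracting and plotting first-layer feature maps might be done as follows. This is a sketch, not the authors' code: the layer name `conv1_conv` follows the Keras ResNet50 implementation, random input stands in for a fundus image, and `weights=None` avoids the pretrained-weight download.

```python
import tensorflow as tf
import matplotlib
matplotlib.use("Agg")   # render off-screen
import matplotlib.pyplot as plt

# Truncate ResNet50 at its first convolution layer ("conv1_conv" in Keras)
model = tf.keras.applications.ResNet50(weights=None, input_shape=(224, 224, 3))
feat = tf.keras.Model(model.input, model.get_layer("conv1_conv").output)

# Random tensor standing in for a preprocessed fundus image
maps = feat.predict(tf.random.uniform((1, 224, 224, 3)), verbose=0)
# maps has shape (1, 112, 112, 64): one map per first-layer filter

# Plot the 64 first-layer feature maps in an 8 x 8 grid
fig, axes = plt.subplots(8, 8, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    ax.imshow(maps[0, :, :, i], cmap="gray")
    ax.axis("off")
fig.savefig("first_layer_feature_maps.png")
```

The same pattern, with a different `get_layer` name, yields the last-convolution-layer maps shown in Figs. 4 and 5.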
The best model was determined based on the results in Tables 1, 2, 8 and 9 and by applying several deep learning methodologies, such as the k-fold cross-validation recorded in Table 3, evaluating the model using 3, 5, and 10 folds to determine the optimal performance and decide on the best model. By applying data augmentation, the dataset was sufficient to demonstrate the feasibility of our proposed deep learning model for distinguishing AMD grades using fundus images.
We examined the integrated model with different optimizers, such as Adam and SGD; the latter proved to be the best optimization technique in our case study.
The pre-trained ResNet50 model proved to be more efficient, whether integrated with the SA model or standalone and whether using the SGD or Adam optimizer. It was the best-fit model for our study according to the cross-validation results recorded in Table 3. During the training phase, it recorded an accuracy roughly 3% higher than the VGG16 and InceptionV3 models when integrated with the SA model; compared with ResNet101, VGG19, and ResNet18, the proposed model recorded higher accuracy by more than 6%, 10%, and 15% respectively. It recorded 91.7% accuracy when trained as a standalone model. Although the VGG16 pre-trained model recorded performance metrics similar to the InceptionV3 pre-trained model using SGD, and the VGG19 pre-trained model recorded acceptable results using SGD, both VGG16 and VGG19 recorded the lowest results using the Adam optimizer, either standalone or integrated with the SA model. InceptionV3 recorded good performance metrics during the training phase; however, it was excluded due to its cross-validation results, similar to ResNet101 and ResNet18.

Conclusion and future work
In this study, we have proposed an integrated model for scaling input images and distinguishing between normal retinas and AMD grades using color fundus images. Our approach involves two stages. The first stage is a custom auto-encoder-based model that resizes the input images to 224 × 224 × 3 dimensions and handles any needed data preprocessing, then feeds its output to the second stage, which classifies its input into normal retina, intermediate AMD, GA, and wet AMD grades using the ResNet50 pre-trained model. The proposed model is trained on the color fundus images dataset provided by the CATT Study Group. We compared our proposed model's performance against different pre-trained models, either standalone or integrated with our SA model, and validated our approach using a cross-validation technique that shows our proposed model achieves the best performance.

For future work, we plan to integrate the scale adapting network with other systems that diagnose other retinal diseases, such as diabetic retinopathy, and with other networks that work on different imaging modalities. We also plan to expand the study by collecting data from additional cohorts that include subjects from a wider range of institutions and geographic areas globally.

Table 6. The outcomes of the Bayesian optimization approach indicate the optimal hyperparameter tuning in terms of the optimizer and batch size to achieve the best performance.