Exudate Detection for Diabetic Retinopathy Using Pretrained Convolutional Neural Networks

In the field of ophthalmology, diabetic retinopathy (DR) is a major cause of blindness. DR manifests as retinal lesions, including exudates. Exudates are among the signs of serious DR abnormalities, so these lesions should be detected properly and treated immediately to prevent loss of vision. In this paper, a pretrained convolutional neural network (CNN)-based framework is proposed for the detection of exudates. Recently, deep CNNs have been applied individually to solve specific problems, whereas pretrained CNN models with transfer learning can reuse previously acquired knowledge to solve other, related problems. In the proposed approach, data preprocessing is first performed to standardize the exudate patches. Next, region of interest (ROI) localization is used to localize exudate features, and transfer learning is then performed for feature extraction using pretrained CNN models (Inception-v3, Residual Network-50, and Visual Geometry Group Network-19). Finally, the fused features from the fully connected (FC) layers are fed into a softmax classifier for exudate classification. The performance of the proposed framework has been analyzed on two well-known publicly available databases, e-Ophtha and DIARETDB1. The experimental results demonstrate that the proposed pretrained CNN-based framework outperforms existing techniques for the detection of exudates.


Introduction
In ophthalmology, deep learning plays a vital role in diagnosing serious diseases, including diabetic retinopathy (DR). DR is a severe and widespread disease that affects diabetic patients all over the world.
The World Health Organization (WHO) has projected that by 2030 diabetes will be the seventh leading cause of death in the world [1]. From this perspective, it is essential to protect human lives from the effects of diabetes. In diabetic retinopathy, abnormalities including lesions develop in the retina and later lead to irreversible blindness and vision impairment. However, early detection and treatment of these lesions can significantly reduce blindness. The retinal abnormalities in DR include hemorrhages, cotton wool spots, microaneurysms (MA), retinal neovascularization, and exudates, as shown in Figure 1. Soft exudates (cotton wool spots) appear as light yellow or white areas with indistinct edges, whereas hard exudates appear as yellow, waxy patches in the retina. The presence of exudates in retinal fundus photographs is one of the most serious signs of diabetic retinopathy [3]. Manual identification of hard exudates depends on the analyst and is time-consuming. In contrast, automatic exudate identification can detect hard exudates accurately and in a timely manner. Automation is nonetheless difficult because exudates vary in shape, texture, color, and size and often have poor contrast.
For the diagnosis of diabetic retinopathy, image processing techniques such as optic disc localization, adaptive thresholding, image boundary tracing, and morphological preprocessing are widely used for feature extraction from retinal fundus images. According to [4], early detection of exudates in the retina may help ophthalmologists provide timely and proper treatment to the affected person. A U-Net-based technique was applied for the segmentation and detection of exudates on 107 retinal images. The network is composed of expanding and contracting paths, where the contracting path has a structure similar to that of a conventional CNN. An unsupervised segmentation technique based on ant colony optimization can also detect hard exudates; its results were compared with a traditional segmentation technique, the Kirsch filter, and the unsupervised approach performed better [5].
Deep convolutional neural networks have also played an important role in the segmentation and detection of exudates in digital fundus images. Tan et al. [6] developed a convolutional neural network to automatically discriminate and segment microaneurysms, hemorrhages, and exudates. Their method shows that a single CNN, trained on a large amount of retinal data, can segment these retinal features with appropriate accuracy. Furthermore, García et al. [7] investigated three classifiers, multilayer perceptron (MLP), radial basis function (RBF), and support vector machine (SVM), to detect hard exudates, using 117 retinal fundus images that varied in quality, brightness, and color. Xiao et al. [8] reviewed exudate detection in diabetic retinopathy through a large-scale assessment of the published literature, focusing on recent and emerging techniques, including deep learning, for detecting and classifying diabetic retinopathy in retinal fundus images.
Segmenting and detecting exudates requires first localizing the relevant features. A location-to-segmentation approach for exudate segmentation in digital fundus images was reported in [9]; it consists of three steps: noise removal, hard exudate localization in the retinal fundus images, and hard exudate segmentation. For noise removal, matched filters were used for vessel segmentation, and the optic disc was segmented using a saliency technique. Next, a random forest classifier located the exudates by categorizing patches into exudate and nonexudate classes. Finally, local contrast and the exudate regions were used to segment the exudates, which were further classified as exudate and nonexudate patches. Asiri [10] presented a review highlighting recent developments in the field of diabetic retinopathy. The automatic detection of diabetic retinopathy and macular degeneration has become one of the hottest topics in recent deep learning research.
In addition, extensive work has been done to identify exudates automatically on the basis of their features, including texture, shape, and size. The well-known exudate detection techniques can be divided into four basic types: (1) machine learning-based techniques; (2) threshold-based techniques; (3) mathematical morphological techniques; and (4) region growing approaches.
Machine learning-based algorithms include supervised and unsupervised learning approaches. A. R. Chowdhury et al. [11] applied a random forest classifier for the detection of retinal abnormalities. Their technique was based on k-means segmentation of fundus photographs, with preprocessing performed by machine learning approaches based on statistical and low-level features. Perdomo et al. [12] introduced a novel approach for detecting diabetic macular edema from the locations of exudates using machine learning techniques. Furthermore, Carson Lam et al. [13] applied pretrained models, namely, AlexNet and GoogleNet, for the detection of diabetic retinopathy and recognized its different stages using convolutional neural networks. The authors highlighted multinomial classification models and discussed issues concerning misclassification of the disease and the limitations of CNNs.
Threshold techniques exploit variations in color intensity among different image regions. In this context, an iterative thresholding technique based on particle swarm and firefly optimization was presented to diagnose exudates and hemorrhages [14]. That technique consisted of image enhancement using preprocessing techniques and vessel segmentation using the top-hat and Gabor transformations; hemorrhages were detected using linear regression and a support vector machine classifier. Additionally, Kaur and Mittal [15] developed an exudate segmentation technique to help eye specialists plan effective and timely treatment in the detection of DR. The authors applied a dynamic decision thresholding approach that finds faint and bright edges and selects threshold values dynamically for each retinal fundus image, which allows hard exudates to be segmented efficiently. Furthermore, Das and Puhan [16] presented a Tsallis entropy thresholding technique to enhance the visibility of exudates in diabetic retinopathy. The resulting exudate candidates are further filtered to remove false positives using sparse representation-based dictionary learning and classification. The Tsallis technique was evaluated on the public DIARETDB1 and e-Ophtha datasets and achieved 95% accuracy.
Many contributions have used mathematical morphological approaches to detect abnormalities in fundus images. Morphological techniques apply mathematical operators with structuring elements of different shapes. Jaafar et al. [17] reported an automated technique for the identification of exudates in fundus photographs. In this work, a new method for splitting color fundus images was applied: in the first stage, segmentation was performed by calculating pixel variation in the fundus images, and morphological operations were then applied to filter the adaptive thresholding results based on the segmentation output. Additionally, a random forest technique was applied for the detection of hard exudates in the given fundus images. For diabetic retinopathy, an ensemble classifier has been applied for multiclass segmentation and localization of hard exudates [18]. In that work, exudate features were extracted at coarse-grained and fine-grained levels using a Gabor filter and morphological reconstruction, respectively.
The candidate regions were used to train the ensemble classifier to classify exudate and nonexudate boundaries. Four publicly available datasets, Messidor, HEI-MED, e-Ophtha Ex, and DIARETDB1, were used for the experiments. Harangi and Hajdu [19] also reported a novel approach that detects exudates in three steps: candidate extraction by greyscale morphology, precise boundary segmentation by a contour-based technique, and exudate classification by a region-wise classifier. Harangi et al. [20] presented an exudate detection approach that uses greyscale morphology to recognize potential exudate candidates and active contour techniques to extract their exact boundaries.
Region growing approaches examine the neighbourhoods of seed positions and decide whether neighbouring pixels belong to a particular region. Lim et al. [21] introduced a modified version of previous research work, in which diabetic macular edema and normal cases were classified with the help of extracted exudates. Exudates were detected within marked macular regions to distinguish diabetic retinopathy in retinal fundus images. Harangi and Hajdu [22] introduced an exudate detection technique based on contour identification, with additional region-wise classification. In this technique, morphological approaches, including greyscale morphology, were applied to extract exudate features, and a Markovian segmentation model was used to obtain their proper shape. Giancardo et al. [23] developed a novel approach for detecting diabetic macular edema using features including exudate segmentation, wavelet decomposition, and color. Experiments on publicly available datasets yielded 88% to 94% accuracy, depending on the dataset.
In this research work, the goal of the proposed technique is to detect exudates in diabetic retinopathy using transfer learning. The main contribution of the proposed work is to apply transfer learning for feature extraction using well-known pretrained deep convolutional neural networks, namely, Inception-v3, ResNet-50, and VGG-19. The extracted features are then fused and classified by softmax for the final decision.
The rest of the article is organized as follows: the proposed method is explained in Section 2, the experimental results and discussion are covered in Section 3, and the findings are concluded in Section 4.

The Proposed Technique
This section describes the proposed framework, based on pretrained convolutional neural network architectures, for retinal exudate detection and classification in fundus images. In the proposed framework, three well-established pretrained network architectures are combined to perform feature fusion, because different architectures can capture different features. Had a single architecture been adopted instead, some useful features would have been more likely to be missed, which could ultimately have degraded the performance of the framework.
Initially, data preprocessing is performed on both datasets to standardize the exudate patches, and then a Gaussian mixture technique is applied to localize candidate exudates before feature extraction. The framework then extracts low-level features individually with three well-known pretrained convolutional neural network architectures: Inception-v3, VGG-19, and ResNet-50. The combined features are fed into the fully connected (FC) layers for further processing, and classification into retinal exudate and nonexudate patches is performed by softmax, as shown in Figure 2.

Dataset.
Data gathering is an essential part of the experiments for the analysis of the proposed technique. In the proposed approach, two publicly available retinal datasets are used for experiments: (i) e-Ophtha and (ii) DIARETDB1. The e-Ophtha dataset contains 47 retinal fundus images in which exudates were manually annotated by four expert ophthalmologists [24]. The image resolution ranges from 1400 × 960 to 2544 × 1696 pixels. The DIARETDB1 dataset contains 89 retinal fundus photographs with a resolution of 1500 × 1152 pixels [25]. All the retinal images were captured with a digital fundus camera with a 50-degree field of view. Exudates were examined manually and evaluated by five authorized ophthalmologists. Soft and hard exudates were labelled as a single "exudates" class. All images were resized to the standard DIARETDB1 size of 1500 × 1152 pixels, and the image scale was estimated from the standard size of the retinal optic disc. Sample affected and healthy retinal images from e-Ophtha and DIARETDB1 are shown in Figure 3.

Data Preprocessing.
In this phase, the input data are standardized because the retinal exudates vary in size. Figure 4 shows the distribution of sizes among all the extracted retinal exudate patches; the length and width of the extracted patches correspond to the X and Y axes, respectively. It also shows that, ignoring outliers, the exudate patch sizes range from 25 × 25 to 286 × 487 pixels. Analysis of the retinal images therefore requires a standard patch size for consistent data labelling. To address this, the smallest patch size was selected, as used by experts to identify the pathological sign [26].
In the proposed model, 25 × 25 color image patches are used, divided into two groups: nonexudate and exudate. Manual patch extraction yielded 36500 and 75600 exudate patches from the e-Ophtha and DIARETDB1 datasets, respectively. Similarly, to balance the dataset, 35000 and 60000 nonexudate patches were extracted from the nonexudate regions of the e-Ophtha and DIARETDB1 databases, respectively. The nonexudate patch group contains various retinal structures, including optic nerve heads, background tissue, and retinal blood vessels. In the proposed technique, all patches were extracted without overlap; examples of the nonexudate and exudate patch classes are shown in Figures 5(a) and 5(b), respectively.
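The non-overlapping 25 × 25 patch extraction described above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions: the function name `extract_patches`, tiling from the top-left corner, and the rule that a patch counts as exudate when any pixel of a binary ground-truth mask falls inside it are illustrative choices, not the authors' documented procedure (the text states that extraction was manual).

```python
import numpy as np

PATCH = 25  # patch side length stated in the text (25 x 25 colour patches)


def extract_patches(image: np.ndarray, mask: np.ndarray, patch: int = PATCH):
    """Tile an H x W x 3 image into non-overlapping patch x patch blocks.

    Assumption (hypothetical rule): a patch is labelled 1 (exudate) if any
    pixel of the binary ground-truth mask inside it is set, else 0.
    Pixels in the ragged right/bottom border are discarded.
    """
    rows, cols = mask.shape[0] // patch, mask.shape[1] // patch
    img = image[: rows * patch, : cols * patch]
    msk = mask[: rows * patch, : cols * patch]
    # Split into a (rows, cols) grid of blocks, then flatten the grid row by row.
    blocks = img.reshape(rows, patch, cols, patch, img.shape[2]).swapaxes(1, 2)
    patches = blocks.reshape(rows * cols, patch, patch, img.shape[2])
    mblocks = msk.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    labels = mblocks.reshape(rows * cols, -1).any(axis=1).astype(np.int64)
    return patches, labels


# Usage on a synthetic 100 x 130 image with a single annotated exudate pixel.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 130, 3)).astype(np.uint8)
mask = np.zeros((100, 130), dtype=bool)
mask[30, 60] = True
patches, labels = extract_patches(image, mask)
print(patches.shape, labels.sum())  # (20, 25, 25, 3) 1
```

The reshape/swapaxes pair avoids an explicit double loop; because tiles never overlap, each source pixel appears in at most one patch, matching the no-overlap condition in the text.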

Region of Interest Localization.
Exudates appear as bright lesions, highlighted as bright patches and spots in diabetic retinopathy, with strong contrast in the yellow plane of the color fundus image. Exudate segmentation is applied before feature extraction through region of interest (ROI) localization; in this step, exudates are segmented to detect the ROI in the retinal fundus images. Numerous approaches have been used for this purpose, including neural networks, fuzzy models, edge-based segmentation, and ROI-based segmentation. In the proposed technique, a Gaussian mixture approach is used for exudate localization. Stauffer and Grimson [27] used an adaptive Gaussian mixture model for background subtraction. In this paper, a hybrid technique that integrates a Gaussian mixture model (GMM) with an adaptive learning rate (ALR) is applied to obtain candidate exudate detections. The ROI obtained from this hybrid approach, shown in Figure 6, is fed into the pretrained convolutional neural network models for feature extraction to obtain a compact feature vector. The ROI is computed with the Gaussian mixture model as

$$P(x) = \sum_{q=1}^{Q} r_q \, w(x; \mu_q, \sigma_q), \tag{1}$$

where Q is the number of Gaussian components, r_q is the weight factor of the qth component, and w(x; μ_q, σ_q) is the normalized Gaussian density with mean μ_q and standard deviation σ_q. The adaptive learning rate is used to update μ_q frequently, applying the probability w(x; μ_q, σ_q) to determine whether a pixel belongs to the qth Gaussian distribution.
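The sketch below evaluates equation (1) and applies one online update to a single pixel intensity in the style of Stauffer and Grimson's adaptive mixture. The paper does not give the exact rules of its hybrid GMM-ALR step, so the matching threshold (2.5 standard deviations), the base rate `alpha`, the weight update, and the function names are assumptions borrowed from the usual Stauffer and Grimson formulation; replacement of unmatched components is omitted.

```python
import numpy as np


def gaussian(x, mu, sigma):
    """Normalized 1-D Gaussian density w(x; mu, sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


def mixture_density(x, r, mu, sigma):
    """Equation (1): P(x) = sum_q r_q * w(x; mu_q, sigma_q)."""
    return float(np.sum(r * gaussian(x, mu, sigma)))


def adaptive_update(x, r, mu, sigma, alpha=0.01, match_k=2.5):
    """One online update of the mixture for a pixel intensity x.

    Assumed Stauffer-Grimson-style rules (not the paper's stated ones):
    the closest component within match_k standard deviations 'owns' x and
    is updated with the adaptive rate rho = alpha * w(x; mu_q, sigma_q).
    Returns updated copies (r, mu, sigma) and the owner index, or -1.
    """
    r, mu, sigma = r.astype(float), mu.astype(float), sigma.astype(float)
    dist = np.abs(x - mu) / sigma
    q = int(np.argmin(dist))
    owner = np.zeros_like(r)
    if dist[q] < match_k:
        owner[q] = 1.0
        rho = alpha * gaussian(x, mu[q], sigma[q])  # adaptive learning rate
        mu[q] = (1.0 - rho) * mu[q] + rho * x
        sigma[q] = np.sqrt((1.0 - rho) * sigma[q] ** 2 + rho * (x - mu[q]) ** 2)
    else:
        q = -1  # no match; replacing the weakest component is omitted here
    r = (1.0 - alpha) * r + alpha * owner
    return r / r.sum(), mu, sigma, q


# Usage: a dark background component and a bright (candidate exudate) one.
r = np.array([0.7, 0.3])
mu = np.array([60.0, 200.0])
sigma = np.array([15.0, 10.0])
p = mixture_density(200.0, r, mu, sigma)
r2, mu2, sigma2, q = adaptive_update(205.0, r, mu, sigma)
```

Scaling the step by w(x; μ_q, σ_q) means pixels near a component's mean move it more than pixels at its edge, which is one way to read the text's "probability constraint" on updating μ_q.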

Pretrained Deep Convolutional Neural Network Models for Feature Extraction.
First, the individual deep convolutional neural network models are applied to extract features; the adopted models are then combined through an FC layer for the categorization of fundus images. In such feature combination, a single shape descriptor could itself yield multiple types of features, such as compactness, roundness, and circularity. In the proposed technique, three well-established deep convolutional neural network architectures, Inception-v3 [28], Residual Network (ResNet)-50 [29], and Visual Geometry Group Network (VGGNet)-19 [30], are applied for feature extraction and the subsequent classification of exudate and nonexudate patches. These CNN models have already been trained on large standard image collections, and the significant features they learned are reused on the small retinal patches through transfer learning [31]. The adopted deep convolutional neural network architectures are briefly described in the following subsections.

Inception-v3 Architecture.
Inception-v3 is a convolutional network built from convolutional layers, pooling layers, rectified linear unit layers, and fully connected layers, and it is designed for image recognition and classification. The proposed model also uses the Inception-v3 architecture, whose Inception modules combine several convolutional filters of different sizes into a single module. This design not only decreases the computational complexity but also reduces the number of parameters. Inception-v3 also attains better accuracy by combining heterogeneous-sized filters with low-dimensional embeddings. The basic architecture of Inception-v3 is shown in Figure 7.
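The parameter savings that the text attributes to Inception-style modules can be checked with simple arithmetic. The sketch below counts convolution weights (biases ignored) for a 1 × 1 bottleneck placed in front of a 5 × 5 convolution and for the asymmetric 1 × 3 plus 3 × 1 factorization used in Inception-v3; the channel widths are hypothetical and not taken from the actual network.

```python
def conv_params(kh: int, kw: int, c_in: int, c_out: int) -> int:
    """Weight count of a convolution layer (biases ignored)."""
    return kh * kw * c_in * c_out


# Hypothetical channel sizes, chosen only for illustration.
C_IN, C_MID, C_OUT = 256, 32, 64

# A 5x5 convolution applied directly vs. after a 1x1 "low-dimensional embedding".
direct_5x5 = conv_params(5, 5, C_IN, C_OUT)
bottleneck_5x5 = conv_params(1, 1, C_IN, C_MID) + conv_params(5, 5, C_MID, C_OUT)

# Inception-v3 style asymmetric factorization of a 3x3 conv into 1x3 then 3x1.
square_3x3 = conv_params(3, 3, C_OUT, C_OUT)
asym_1x3_3x1 = conv_params(1, 3, C_OUT, C_OUT) + conv_params(3, 1, C_OUT, C_OUT)

print(direct_5x5, bottleneck_5x5)  # 409600 59392
print(square_3x3, asym_1x3_3x1)    # 36864 24576
```

With these widths the bottleneck cuts the 5 × 5 branch to about 15% of its weights, and the asymmetric pair needs two thirds of the weights of a square 3 × 3 layer.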

ResNet-50 Architecture.
Residual Network-50 is a deep convolutional neural network that achieved significant results in the classification of the ImageNet database [32]. ResNet-50 combines convolutional filters of several sizes with residual (shortcut) connections, which ease training and address the degradation problem that arises in very deep structures. In this work, ResNet-50 pretrained on the standard ImageNet database [33] is applied, excluding the fully connected softmax layer of the original model. The basic architecture of ResNet-50 is shown in Figure 8.
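The idea behind the residual connection can be shown in a few lines of NumPy. In this sketch, dense matrices `w1` and `w2` stand in for the convolutions of a real ResNet-50 bottleneck block (a simplification assumed for brevity); the point is that when the residual branch outputs zero, the block passes non-negative inputs through unchanged, so extra depth need not degrade the result.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), with residual branch F(x) = w2 @ ReLU(w1 @ x).

    Dense matrices stand in for the convolutions of a real ResNet-50 block
    (a simplification for illustration); the identity shortcut is the point.
    """
    return relu(w2 @ relu(w1 @ x) + x)


x = np.array([0.5, 1.0, 2.0])  # non-negative, like the output of an earlier ReLU
zero_w1, zero_w2 = np.zeros((4, 3)), np.zeros((3, 4))
identity_out = residual_block(x, zero_w1, zero_w2)  # equals x: block acts as identity

rng = np.random.default_rng(0)
y = residual_block(x, rng.normal(size=(4, 3)), rng.normal(size=(3, 4)))
```

A plain (non-residual) stack would have to learn weights that reproduce the identity explicitly; with the shortcut, driving the branch weights toward zero is enough.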

VGG-19 Architecture.
The Visual Geometry Group Network is a deep neural network model built from multiple layers of operations. It is comparable to the AlexNet model but has additional convolutional layers. VGGNet replaces large kernel-sized filters with successive 3 × 3 convolutional filters, followed by 2 × 2 pooling layers. The general VGG-19 architecture contains 3 × 3 convolutional layers, rectification (ReLU) layers, pooling layers, and three fully connected layers, the first two of which have 4096 neurons each [30]. VGGNet-19 performs better than the AlexNet architecture, owing to its simple, uniform design. The basic architecture of VGG-19 is shown in Figure 9.
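The benefit of replacing large kernels with stacked 3 × 3 filters can be illustrated numerically: n stride-1 3 × 3 layers cover a (2n + 1) × (2n + 1) receptive field with fewer weights than a single layer of that size. The channel width below is a hypothetical value chosen for illustration.

```python
def stacked_receptive_field(n_layers: int, k: int = 3) -> int:
    """Receptive field side length of n stacked k x k, stride-1 convolutions."""
    return 1 + n_layers * (k - 1)


def conv_weights(k: int, channels: int, n_layers: int = 1) -> int:
    """Weights of n_layers k x k convolutions with `channels` in and out (no bias)."""
    return n_layers * k * k * channels * channels


C = 256  # hypothetical channel width
for n in (2, 3):
    rf = stacked_receptive_field(n)
    print(f"{n} x 3x3 -> {rf}x{rf} field: {conv_weights(3, C, n)} weights "
          f"vs one {rf}x{rf}: {conv_weights(rf, C)}")
# 2 x 3x3 -> 5x5 field: 1179648 weights vs one 5x5: 1638400
# 3 x 3x3 -> 7x7 field: 1769472 weights vs one 7x7: 3211264
```

Stacking also inserts a ReLU between the small convolutions, so the deeper stack is more nonlinear than the single large filter it replaces.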

Transfer Learning and Features Fusion.
In machine learning, transfer learning is recognized as a highly useful method: knowledge learned while solving one problem is applied to new, related problems. In the transfer learning approach, a network is first trained for a particular task on a related dataset and is then transferred to the target task and trained on the target dataset [34]. In this work, the objective of the proposed technique is to evaluate well-known CNN models in both a transfer learning context and feature-level fusion for retinal exudate classification, and to validate the results on the e-Ophtha and DIARETDB1 retinal datasets. The fusion approach combines features extracted from the fully connected layers of three different DCNNs into a single feature vector. Suppose the three CNN architectures and their respective FC layers are represented as

$$\mathcal{C} = \{\mathrm{CNN}_1, \mathrm{CNN}_2, \mathrm{CNN}_3\}, \tag{2}$$

$$\mathcal{F} = \{\mathrm{FC}_1, \mathrm{FC}_2, \mathrm{FC}_3\}, \quad \mathrm{FC}_k \in \mathbb{R}^{d_k}, \tag{3}$$

where equation (2) represents the three CNN models and equation (3) the corresponding FC layers. Therefore, the extracted features are combined in the feature vector space FV⊕ of dimension d, where ⊕ denotes concatenation:

$$\mathrm{FV}_{\oplus} = \mathrm{FC}_1 \oplus \mathrm{FC}_2 \oplus \mathrm{FC}_3 \in \mathbb{R}^{d}, \quad d = d_1 + d_2 + d_3. \tag{4}$$

Transfer learning-based techniques are implemented with the Inception-v3, ResNet-50, and VGGNet-19 architectures pretrained on ImageNet. The transfer learning setup treats the retained neural network modules as a fixed feature extractor for the different datasets. Generally, transfer learning keeps the original pretrained weights and extracts image features from the final network layers. Training a convolutional neural network from scratch usually requires a large amount of data; however, it is sometimes hard to assemble a large database for a related problem. Moreover, contrary to the ideal case, in most real-world applications it is difficult, and rare, to obtain training and testing data with similar distributions.
In this scenario, the transfer learning approach has proved to be a fruitful technique. Applying transfer learning involves two main considerations: first, the selection of a pretrained architecture; second, the similarity and size of the problem. In the selection phase, the pretrained architecture is chosen according to how closely its source problem relates to the target problem. Regarding similarity and dataset size, if the target database is small (for example, fewer than one thousand images) and also related to the source database (for example, vehicle datasets, handwritten character datasets, and medical datasets), then the risk of overfitting is higher. If, instead, the target database is sufficiently large and related to the source training database, then there is little chance of overfitting and only fine-tuning of the pretrained architecture is needed. The deep convolutional neural network (DCNN) models Inception-v3, ResNet-50, and VGG-19 are applied in the proposed framework to exploit their features through fine-tuning and transfer learning. First, the selected convolutional neural network models are trained on sample images from the standard, publicly available ImageNet database; the transfer learning idea is then applied for fused feature extraction. In this way, the proposed technique helps the architecture learn common features from the new dataset without requiring additional training. The independently extracted features of all selected convolutional neural network models are joined in the FC layer for further processing, including classification of the nonexudate and exudate patch classes by softmax.
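The fusion of equation (4) followed by a softmax classifier on frozen features can be sketched with NumPy. Everything here is an assumption for illustration: the per-backbone feature sizes (2048 for pooled Inception-v3 and ResNet-50 features, 4096 for VGG-19's fc7 layer) reflect commonly used layers rather than the paper's stated configuration, the features are random stand-ins rather than real CNN outputs, and the function names and gradient-descent settings are hypothetical. The backbones are not fine-tuned here; only the softmax layer's weights are learned.

```python
import numpy as np

# Assumed feature sizes: pooled Inception-v3 and ResNet-50 features (2048 each)
# and VGG-19 fc7 (4096). The paper does not state which layers it used.
DIMS = {"inception_v3": 2048, "resnet50": 2048, "vgg19": 4096}


def fuse(features):
    """Equation (4): concatenate N x d_k feature matrices into N x (d_1+d_2+d_3)."""
    return np.concatenate(features, axis=1)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_softmax_head(x, y, n_classes=2, lr=0.01, epochs=200, seed=0):
    """Fit a softmax layer (W, b) on fixed features by full-batch gradient
    descent on the mean cross-entropy; the CNN backbones stay frozen."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(x.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        grad = (softmax(x @ w + b) - onehot) / len(x)  # dLoss/dlogits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    return softmax(x @ w + b).argmax(axis=1)


# Usage with random stand-in features (not real CNN outputs):
# label 0 = nonexudate patch, 1 = exudate patch.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
per_model = [rng.normal(size=(200, d)) + 0.05 * y[:, None] for d in DIMS.values()]
x = fuse(per_model)  # shape (200, 8192)
w, b = train_softmax_head(x, y)
train_acc = float((predict(x, w, b) == y).mean())
```

Concatenation keeps every backbone's features intact and leaves it to the softmax weights to decide how much each source contributes; with 8192 dimensions and few patches, a real experiment would evaluate on held-out data rather than the training accuracy computed here.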

Results and Discussion
The experiments are performed on Google Colab using graphics processing units (GPUs). For performance evaluation, two publicly available standard datasets are selected. The training phase is divided into two sessions, each of which took 6 hours to complete. The proposed framework is trained on three convolutional neural network architectures, Inception-v3, ResNet-50, and VGG-19, individually, after which transfer learning is performed to transfer the learned knowledge into the fused extracted features. The experimental results obtained from the individual convolutional neural networks are compared with those of the fused feature set and with various existing approaches. A 10-fold cross-validation approach is applied for performance evaluation. Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The procedure has a single parameter, k, which is the number of groups into which a given data sample is split; hence, the procedure is often called k-fold cross-validation. When a specific value of k is chosen, it may replace k in the name of the method, so that k = 10 becomes 10-fold cross-validation [35]. The input data are divided into training and testing sets in three different ratios for the experiments: 70% for training and 30% for testing, 80% for training and 20% for testing, and 90% for training and 10% for testing the CNN architectures. Table 1 compares the three individual CNN architectures with the proposed technique for each data split on the e-Ophtha dataset.
Similarly, Table 2 reports the classification accuracy of the individual architectures and the proposed technique on the DIARETDB1 dataset.
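The two evaluation protocols mentioned above, k-fold cross-validation and fixed train/test ratios, can be sketched with the Python standard library. How the authors combined the two protocols is not stated, so the sketch shows each one separately; the function names are hypothetical, and a real experiment on the patch sets might prefer stratified folds that preserve the exudate/nonexudate ratio.

```python
import random


def holdout_split(n, train_frac, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Each index lands in exactly one test fold; fold sizes differ by at most 1.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# The three hold-out ratios from the text, on a hypothetical pool of 1000 patches.
for frac in (0.7, 0.8, 0.9):
    train, test = holdout_split(1000, frac)
    print(frac, len(train), len(test))  # 0.7 700 300 / 0.8 800 200 / 0.9 900 100

folds = list(kfold_indices(103, k=10))
```

In 10-fold cross-validation every sample is tested exactly once, whereas a single hold-out split tests only the held-out fraction, which is why the two protocols can give different accuracy estimates.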
In the context of classification performance, a true positive (TP) is an outcome where the model correctly predicts the positive class, and a true negative (TN) is an outcome where the model correctly predicts the negative class. False negatives (FN) and false positives (FP), in contrast, are samples that the model misclassifies. The performance measures are computed from these four counts as follows.
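Given the four outcome counts, the accuracy, precision, recall, and F-measure used in this section can be computed as in the sketch below. The function name is hypothetical, and the confusion counts in the example are made up for illustration; they are not results from this paper.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F-measure from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall (1 = best, 0 = worst).
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical confusion counts for illustration only.
m = classification_metrics(tp=90, tn=85, fp=15, fn=10)
# accuracy = 175/200 = 0.875, recall = 0.9, F-measure = 180/205
```

Equivalently, the F-measure equals 2TP / (2TP + FP + FN), which makes clear that it ignores true negatives, unlike accuracy.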
Accuracy: a measure of how effectively the model identifies the correct class labels, calculated as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{5}$$

F-measure: the harmonic mean of the precision and recall of a classifier, ranging between 0 and 1, where the best and worst scores are 1 and 0, respectively. It is computed as

$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}. \tag{6}$$

For a clearer view of the classification accuracy results, Figures 10 and 11 compare the classification accuracies of the proposed model with those of the individual models on the e-Ophtha and DIARETDB1 datasets, respectively. Additionally, Table 3 compares the results of the proposed framework with existing well-known approaches for the detection of retinal exudates. Table 3 lists classification accuracies of 87% for [18], 92% for [36], and 97.60% and 98.20% for [37]. The proposed framework achieved higher accuracy than these techniques on both the e-Ophtha and DIARETDB1 datasets. Its classification performance is only slightly higher than that of [37], but the features extracted by the proposed framework support the final results and can be particularly meaningful