Abstract

Diabetes is a rapidly growing disease in India, with currently more than 72 million patients. Prolonged diabetes (about 20 years) can cause serious damage to the tiny blood vessels and neurons of the patient's eyes, a condition called diabetic retinopathy (DR). It first causes occlusion and then rapid vision loss. The symptoms of the disease are not very conspicuous in its early stage. The disease is characterized by the formation of small bulging structures in the retinal area called microaneurysms. If neglected, the condition of the eye worsens, producing more severe blots and damage to the retinal vessels and eventually causing complete loss of vision. Early screening and monitoring of DR can greatly reduce the risk of vision loss in patients. However, detection and classification of diabetic retinopathy by a human is a challenging and error-prone task because of the complexity of the images captured by color fundus photography. Machine learning algorithms armed with feature extraction techniques have previously been employed to detect and classify the levels of DR, but these techniques provide below-par accuracy. With the advent of deep learning (DL) techniques in computer vision, it has become possible to achieve very high levels of accuracy. DL models are loose abstractions of the human brain coupled with the eyes. Creating a model from scratch and training it is a cumbersome task requiring a huge number of images. This deficiency of DL techniques can be overcome through transfer learning, in which a DL model pretrained on a large image dataset spanning hundreds of classes is reused to learn features from the DR fundus images. This enables professionals to create models capable of classifying unseen images into the proper grade or level with acceptable accuracy. This paper proposes a DL model coupled with different classifiers to classify fundus images into their correct class of severity. We have trained the model on IDRID images, and it has shown very high accuracy.

1. Introduction

In the human eye, the retina is a thin tissue layer. The retina is responsible for vision; it receives light, converts the received light into neural signals, and forwards these signals to the brain for visual recognition. Damage to the retina caused by diabetes is called diabetic retinopathy (DR) [1–4]. Here, “retino” means retina, and “pathy” means disease. DR can be broadly categorized into three stages: (1) diabetes without retinopathy, (2) non- or preproliferative DR, and (3) proliferative DR. However, these stages often present no observable symptoms; each stage is defined only by its pathology.

When the sugar level in the body increases, it results in hyperglycemia, which damages the cells known as retinal pericytes [5]. These retinal pericytes play a significant role in the regulation of blood flow. Damage to the pericytes can impair the ability of cells to properly metabolize glucose. This early stage of DR is called diabetes without retinopathy and can only be detected by using a microscope. An increase in vascular or capillary permeability allows large molecules such as proteins and lipids to move in and out of vessels. If these fluids leak out, they become trapped [6]; such leakage can be detected through repeated dilated eye examinations. The leakage also results in swelling of the macula and appears as yellow and white flecks on the retina, which are called hard exudates. Microaneurysms and hard exudates can be detected in the second phase of DR, which results in ischemia, a condition in which the retinal cells receive a poor supply of oxygen. In response, Vascular Endothelial Growth Factor (VEGF) attempts to counteract ischemia by producing new blood vessels in the retina.

The last and final stage of DR results in blurred vision and is called proliferative DR (PDR). Once DR reaches this third stage, it becomes proliferative and can be severe enough to cause complete and permanent loss of vision. In PDR, new, abnormal blood vessels form in the retina. Other complications of PDR include detachment of the retina from the posterior eye, and sometimes the newly formed blood vessels burst and bleed into the retina, resulting in permanent blindness. DR detected early can be treated, but once it becomes too severe, the process of alleviating it becomes difficult. Treatment at the severe stage includes photocoagulation, or focal laser treatment. Photocoagulation slows down bleeding in the eye or leakage of fluid within the eye. After dilation of the pupil, fundoscopy is performed, a test in which the fundus of the eye is examined with a magnifying lens and a light source to visualize the entire retina. The changes seen in DR are microaneurysms, hemorrhages, and hard exudates.

Classifying DR lesions into different categories, namely hemorrhages (HE), microaneurysms (MA), soft exudates (SE), and hard exudates (EX), makes it easier for the ophthalmologist to provide the best treatment for the detected disease. Many studies have been completed on image classification techniques based on the signs of DR using computer vision [7, 8].

For the last few years, various machine learning approaches have been used for automatic classification tasks [9–13]. For image classification, the usual procedure begins with a preprocessing stage, after which the important features are extracted from the image dataset by making use of convolutional layers. Convolutional Neural Networks (CNNs) have provided researchers with an effective way to create state-of-the-art algorithms that classify diseases reliably, to the benefit of ophthalmologists [14, 15]. Figure 1 shows the presence of different types of lesions in the retina of an infected patient. The major contributions of the present research work are as follows:
(1) A wide CNN-based computer-assisted lesion detection system for early and accurate categorization of lesions to aid in treatment planning.
(2) An approach that involves image preprocessing, feature extraction, feature reduction, and classification of images into the different lesions present in the retina of a patient suffering from diabetic retinopathy.
(3) Data preprocessing techniques applied to images of various lesions, which allowed us to increase CNN performance.

The remainder of the paper is structured as follows: Section 2 provides a thorough overview of the literature; Section 3 explains the preliminaries (dataset description), the experimental environment, and the procedure; Section 4 presents the results and discussion; and Section 5 concludes the paper and outlines the scope of future work.

2. Related Work

Image processing plays a vital role in extracting significant data from an image [16]. Previously, a lot of research has been carried out on the detection of DR in clinical image datasets, and several new state-of-the-art algorithms have been used to detect DR from a given image dataset. Many researchers have contributed to overcoming the shortcomings of existing DR detection methods.

Gadekallu et al. [17] introduced several DL and machine learning (ML) techniques with data normalization and a dimensionality reduction approach to obtain good results. The Firefly algorithm and Principal Component Analysis were applied to extract features and reduce dimensionality, and the resulting representations were then passed to a Deep Neural Network for classification. Gangwar et al. [18] used an Inception-ResNet-v2 pretrained model merged with additional CNN layers for the detection of diabetic retinopathy; their work used the Messidor-1 diabetic retinopathy and APTOS 2019 blindness detection datasets, and the use of transfer learning gave good results. Reddy et al. [19] applied min-max normalization to the given dataset to extract the diabetic images; once these images were preprocessed, ensemble-based ML algorithms were applied, and the results show that ensemble learning algorithms outperform traditional ML algorithms. Gupta et al. [8] worked with different DL pretrained approaches, namely the Inception v3, VGG16, and VGG19 models, for feature extraction from an image dataset of several lesions; the extracted features were then passed to ML classifiers to classify the lesions. Gayathri et al. [20] discussed a new technique for DR detection in which Anisotropic Dual-Tree Complex Wavelet Transform and Haralick features were extracted, and ML classifiers such as SVM, RF, Tree, and J48 were used for binary as well as multiclass classification of different DR lesions; the Messidor and DIARETDB0 datasets were used in their work. Nguyen et al. [21] used various DL models for the classification of various lesions; with DL techniques, automatic detection becomes easier than manual detection, and attractive results were reported, with an accuracy of 82% and a sensitivity of 80%. Erciyas et al. [22] proposed a deep learning-based technique for detecting diabetic retinopathy lesions automatically and independently of datasets and then classifying the detected lesions. In the first stage of their technique, a data pool is generated by gathering diabetic retinopathy data from several datasets, and lesions are identified and the region of interest is tagged using Faster RCNN; transfer learning and an attention mechanism are then used in the second stage to classify the acquired images. Wan et al. [23] presented a unique segmentation approach for the various lesions in DR; because the proposed technique is based on a convolutional neural network and can be split into three modules (encoder, attention, and decoder), it is named EAD-Net. After normalization and augmentation, the fundus scans were submitted to EAD-Net for automatic feature extraction and pixelwise label prediction. The described EAD-Net technique is a novel method for clinical DR diagnosis, and the segmentation of four distinct types of lesions produced excellent results. Gharaibeh [24] describes a new method for detecting microaneurysms and hemorrhages in fundus photographs using particle swarm optimization (PSO) and a Gaussian interval type-2 fuzzy membership function. The experimental results are based on a MATLAB simulation program using the DR2 and Messidor databases and show accurate and efficient categorization with an accuracy of 95%.

3. Experimental Details

3.1. Preprocessing

The IDRID dataset [25, 26] is the first dataset prepared for the Indian population for detecting eye disease. The idea originated from an eye specialist located in Nanded, India, who captured the fundus images included in IDRID. Out of thousands of images, experts verified 516 images to form a dataset based on adequate quality, clinical relevance, and no duplication of images. Image acquisition was handled using a Kowa VX-10 alpha digital fundus camera with an image resolution of 4288 × 2848 pixels, and the captured images were stored in JPEG format with a size of about 800 KB each. The dataset includes typical DR lesions as well as some normal retinal structures, and it highlights facts related to the stage of the disease along with its severity. The dataset contains eighty-one color fundus images showing traces of DR. These images contain various types of lesions, namely hard exudates (EX), soft exudates (SE), microaneurysms (MA), and hemorrhages (HE), as represented in Table 1.
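
As a rough illustration of the preprocessing step (not the authors' released code; the file paths, directory layout, and 224 × 224 target size are assumptions), the lesion images can be loaded and prepared for a VGG16-style network as follows:

```python
# Minimal preprocessing sketch: read lesion images, resize them to the
# 224 x 224 input commonly used with VGG16, and apply the standard VGG16
# channel-wise preprocessing. Paths and sizes here are assumptions.
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input

def load_fundus_batch(image_paths, target_size=(224, 224)):
    """Return a preprocessed (N, 224, 224, 3) array from a list of JPEG paths."""
    arrays = [img_to_array(load_img(p, target_size=target_size)) for p in image_paths]
    return preprocess_input(np.stack(arrays))
```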

3.2. Training the Convolutional Network

The VGG16 model [27] was pretrained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset, which contains over a million labeled images spanning 1,000 classes, and was then fine-tuned on 122 color fundus lesion images from the IDRID dataset using transfer learning.

CNNs are used for image classification, and their performance has surpassed that of humans on some computer vision-related tasks. The VGG architecture was among the top performers in the ILSVRC challenge organized in 2014. The VGG group designed two separate CNN models, a 16-layer model and a 19-layer model, both used for image classification. The architecture of VGG16 includes 16 trainable layers, consisting of repeated blocks of convolutional layers with ReLU activations separated by pooling layers and followed by fully connected layers at the end. The weights of the VGG model are freely available and easy to load and use.

In this work, the pretrained model is fine-tuned on several color fundus images using transfer learning. Pretrained models are reused on new predictive modeling tasks, exploiting the advanced feature extraction capability of well-established models. Figure 2 shows the overall architecture for the classification of lesions using the VGG16 model.
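
A minimal sketch of this transfer-learning setup is given below; the number of frozen layers, the size of the dense head, the learning rate, and the four-class softmax output are illustrative assumptions rather than the exact configuration used here:

```python
# Transfer-learning sketch: reuse ImageNet-pretrained VGG16 convolutional
# blocks, freeze most of them, and attach a small classification head for the
# four lesion types (EX, SE, MA, HE). Hyperparameters are assumptions.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers[:-4]:
    layer.trainable = False                     # keep early ImageNet features fixed

x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
outputs = Dense(4, activation="softmax")(x)     # one unit per lesion class

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```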

3.3. Classifiers and Energy Functions Used

In ML, a computer program is trained on a set of inputs and then uses this learning to classify new input data; this technique is called classification, and the algorithm used to implement it is called a classifier. The different types of classifiers used in ML include SVM, RF, K-nearest neighbors, decision tree, logistic regression, naïve Bayes, and AdaBoost [28]. In this work, once the model is trained, classification is the most important step for verifying its applicability. Classifiers such as logistic regression, neural network, SVM, RF, and AdaBoost can be applied to classify the various types of lesions, namely MA, EX, SE, and HE, in retinal fundus images [29]. Logistic regression (LR) is suited to cases where the outcome variable is categorical; it is applied when the data have a binary output, i.e., 0 or 1. Neural network classifiers have a self-learning ability that produces outputs not strictly constrained by the input data, and partial loss of data has little effect on the operation of the system because the learned information resides in the network. SVM methods require a small amount of memory for execution and show significant results when classes have a clear margin of separation, including in high-dimensional spaces; SVM approaches are also effective when the number of observations is smaller than the number of features. Random forests achieve high accuracy compared with many other classification algorithms, can handle large datasets with thousands of variables, and can effectively balance datasets in which one class is rarer than the others. Adaptive Boosting (AdaBoost) is a common boosting approach that combines many weak classifiers into a single strong classifier.
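
The sketch below illustrates how deep features could be fed to such classifiers; the pooled-feature extractor and all classifier hyperparameters are assumptions for illustration, not the tuned settings of this study:

```python
# Feature extraction with a pretrained VGG16 (global-average-pooled 512-dim
# vectors) followed by the classical classifiers discussed above. Settings are
# illustrative defaults, not tuned values.
from tensorflow.keras.applications import VGG16
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

extractor = VGG16(weights="imagenet", include_top=False,
                  pooling="avg", input_shape=(224, 224, 3))

CLASSIFIERS = {
    "LR": LogisticRegression(max_iter=1000),
    "NN": MLPClassifier(hidden_layer_sizes=(128,), max_iter=500),
    "SVM": SVC(probability=True),
    "RF": RandomForestClassifier(n_estimators=200),
    "AdaBoost": AdaBoostClassifier(),
}

def fit_and_score(train_images, y_train, test_images, y_test):
    """Fit each classifier on VGG16 features and report its test accuracy."""
    X_train = extractor.predict(train_images)
    X_test = extractor.predict(test_images)
    for name, clf in CLASSIFIERS.items():
        clf.fit(X_train, y_train)
        print(f"{name}: {clf.score(X_test, y_test):.3f}")
```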

When using a binary classification system, the results may be positive or negative. The possible outcomes of binary classification are true negative (TN), true positive (TP), false negative (FN), and false positive (FP). Several measures are considered important for assessing the performance of each classifier, among them precision, recall, accuracy, F-score, and AUC [30]. These measures can be evaluated as follows.

Accuracy may be defined as the proportion of correct classifications out of the total number of available cases:
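\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \]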

Recall may be derived as the total number of TP divided by the total number of actual positives:
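\[ \text{Recall} = \frac{TP}{TP + FN}. \]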

Precision is TP divided by the number of predicted positives (sum of TP and FP):
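\[ \text{Precision} = \frac{TP}{TP + FP}. \]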

The F-score is the harmonic mean of the precision and recall values. The F1-score is high only when both precision and recall are high, and it will not be high if one measure is good at the expense of the other:
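\[ F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \]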

Sensitivity and specificity are both conditional probabilities: sensitivity is the probability of a true positive given that the disease is present, and specificity is the probability of a true negative given that the disease is absent. A symptom or a test is considered effective in predicting a disease when both the sensitivity and the specificity are high [28].
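
In terms of the outcomes defined above, these can be written as

\[ \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}. \]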

4. Results and Discussion

The network trained using the VGG16 model was able to classify 122 lesion images segmented from the IDRID dataset. The results obtained gave us an insight into what the VGG model has learned; convolutional networks can sometimes give better results in object recognition than humans. The 122 lesion images comprised four types of lesions: SE with 14 images and EX, HE, and MA with 27 images each. VGG16 was used as the standard model together with various classifiers, namely NN, KNN, RF, LR, and SGD, to classify the mentioned lesions [31].

The confusion matrix, also known as the error matrix, is used to describe a classifier’s performance on a particular test dataset [32] and is represented in Table 2. A confusion matrix breaks the results down into correct and incorrect predictions for each class. Table 2 shows the confusion matrix for the logistic regression classifier, in which MA, HE, EX, and SE were correctly predicted for 26, 19, 19, and 8 images, respectively.
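
A confusion matrix of this kind can be produced with standard tooling; the sketch below is illustrative (the label order and variable names are assumptions), not the exact code behind Table 2:

```python
# Build a per-class confusion matrix and summary metrics for the four lesion
# types; y_true and y_pred are assumed to hold the true and predicted labels.
from sklearn.metrics import classification_report, confusion_matrix

LESION_LABELS = ["MA", "HE", "EX", "SE"]

def summarize_predictions(y_true, y_pred):
    print(confusion_matrix(y_true, y_pred, labels=LESION_LABELS))
    print(classification_report(y_true, y_pred, labels=LESION_LABELS, digits=3))
```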

Another parameter is the ROC curve, which can be used to assess the efficiency of the model by plotting the sensitivity against 1 − specificity (the false-positive rate) at several classification probability thresholds. A good ROC curve reflects a high true-positive rate while keeping the false-positive rate low. For the present work, the ROC curve of the LR classifier covered the largest area. The ROC curves obtained with the various classifiers, such as AdaBoost, NN, RF, LR, and SVM, are shown in Figures 3–6 for the different threshold values. This multiclass classification scenario is considered in Tables 3–5, which show that MA was accurately predicted for 26 out of 27 images, SE for 9 out of 14 images, and HE and EX each for 22 out of 27 images. Tables 3–7 present the concluding results, which show that LR outperformed the other classifiers on HE, EX, SE, and MA and obtained better precision, accuracy, F1-score, recall, and AUC [33].
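
Per-class (one-vs-rest) ROC curves of the kind shown in Figures 3–6 can be computed as sketched below; the variable names and plotting details are illustrative assumptions, not the exact figure-generation code:

```python
# One-vs-rest ROC/AUC sketch for the four lesion classes. y_true holds class
# labels and y_score holds an (N, 4) array of predicted class probabilities.
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

LESION_LABELS = ["MA", "HE", "EX", "SE"]

def plot_lesion_roc(y_true, y_score):
    y_bin = label_binarize(y_true, classes=LESION_LABELS)
    for i, name in enumerate(LESION_LABELS):
        fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
        plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.3f})")
    plt.plot([0, 1], [0, 1], linestyle="--", color="gray")   # chance line
    plt.xlabel("False positive rate (1 - specificity)")
    plt.ylabel("True positive rate (sensitivity)")
    plt.legend()
    plt.show()
```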

Microaneurysms (MA) are very important and are the first lesions to appear in the retina. They mark the starting stage of DR and can be identified in the retina as small red dots that normally occur in clusters. The ROC curve for MA is shown in Figure 3.

If microaneurysms are not identified at the initial stage, they may rupture over the retinal area and generate minor bleeding known as retinal hemorrhage (HE). Eye bleeding is a symptom that can be observed in newborn babies as well as adults. The ROC curve for HE is depicted in Figure 4.

Hard exudates (EX) are deposits of lipid- and protein-like material that are retained in the outer layers of the retina. The ROC curve for EX is shown in Figure 5.

Soft exudates (SE), also known as cotton wool spots, are produced when the retinal arterioles, the slightly larger precapillary arteries supplying the retina, undergo occlusion. The ROC curve for SE is represented in Figure 6.

The current study describes the application of VGG pretrained models for the categorization of various lesions found in the retina of diabetic patients’ eyes. The lesion images were converted into feature space using the VGG16 and VGG19 pretrained models. These models were used because of their clarity, ease of use, and the strength of their depth of learning. The multiple hidden layers in these models aid in capturing and analyzing the characteristics present in the scans, as well as categorizing their components. A major advantage of these networks is their consecutive blocks, which reduce the amount of spatial information required by stacking successive convolutional layers one after another. These models were implemented in our lesion detection pipeline to improve the accuracy of detecting lesions in the given input JPEG images [34]. The findings obtained in our research may be compared with those produced by Gharaibeh [24], who classified the lesions using particle swarm optimization and Gaussian interval type-2 fuzzy membership function methods. When those approaches are compared with the accuracy obtained in our research, it is evident that the accuracy produced in our study surpasses the accuracy reported in [24].

To carry out this research and implement our models, we used an Nvidia Tesla K80 GPU, and we built our algorithm on Google Colaboratory [35]. We used a deep CNN-based architecture for our model.

5. Conclusion

The present work has shown that the use of transfer learning with DL models can achieve high accuracy in the proper classification of the different lesions present in the retinal area of eyes affected by diabetic retinopathy. In our paper, we used VGG16 trained on the ImageNet dataset as our model, froze the first few layers, and retrained the last few layers of the network so that it specifically captures the higher-level abstracted image features of the IDRID dataset. Diabetic retinopathy targets the retinal portion of the eye and entails the loss of vision, and early diagnosis of this condition in its prestages helps avoid that outcome. This requires very accurate learning models to correctly classify the stages. Our proposed model classifies the lesions of DR patients into the proper class, with accuracies for classifying MA, SE, EX, and HE images of 95.9% by using LR, 91% by using SGD, 95.9% by using NN, and 93.4% by using HE. A high AUC of 99.7% is indicative of a smaller number of type II errors, which is highly desirable in medical classification problems. The rise of DL techniques with transfer learning has made it possible to achieve high accuracies in vision problems. Gharaibeh [24] utilized the PSO and GIT2FMFS techniques to classify lesions; the accuracy attained using those approaches was 95%, which was lower than the accuracy achieved in our study.

This model can be extended to achieve higher accuracy by enlarging the dataset or by implementing newer architectures such as ResNet, MobileNet, or EfficientNet, which can learn more relevant features from the image dataset.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors confirm that there are no conflicts of interest regarding the study.

Acknowledgments

The authors would like to thank Taif University Researchers Supporting Project (no. TURSP-2020/26), Taif University, Taif, Saudi Arabia.