Hybrid CNN-SVD Based Prominent Feature Extraction and Selection for Grading Diabetic Retinopathy using Extreme Learning Machine Algorithm

This paper exploits the extreme learning machine (ELM) approach to address diabetic retinopathy (DR), a medical condition in which diabetes damages the retina. DR, a leading cause of blindness worldwide, arises when excessive blood sugar causes swelling and leakage in the retinal vessels. An early-stage diagnosis therefore helps prevent diabetes patients from losing their sight. This study introduces a novel method to detect DR for binary and multiclass classification based on the APTOS-2019 blindness detection and Messidor-2 datasets. First, DR images are pre-processed using Ben Graham's approach. After that, contrast limited adaptive histogram equalization (CLAHE) is applied to obtain contrast-enhanced images with lower noise and more distinguishing features. Then a novel hybrid convolutional neural network-singular value decomposition (CNN-SVD) model is developed to reduce the input features for the classifier. Finally, the proposed method uses an ELM algorithm as the classifier, which minimizes the training time cost. The experiments focus on accuracy, precision, recall, and F1-score and demonstrate the feasibility of adopting the proposed scheme for DR diagnosis. The method outperforms the existing techniques and achieves an accuracy and recall of 99.73% and 100%, respectively, for binary classification. For five-stage DR classification, the proposed model achieves an accuracy of 98.09% and 96.26% on the APTOS-2019 and Messidor-2 datasets, respectively, outperforming the existing state-of-the-art models.


I. INTRODUCTION
Diabetes mellitus, also known as diabetes, is a collection of metabolic illnesses that occur when a person's blood sugar level is high and the body does not make enough insulin to control it. The number of affected people is increasing rapidly. In 2019, the International Diabetes Federation (IDF) stated that around 463 million people aged between 20 and 79 had been affected by diabetes [1]. The World Health Organization (WHO) has estimated a 5% increase in diabetes-related early deaths since 2000 [2]. Diabetes causes various illnesses such as diabetic retinopathy (DR), stroke, kidney failure, and heart attack. DR emerges when the retina's blood vessels are disrupted by excessive blood sugar levels, which causes swelling and leakage [3]. In the fundus retina image, the leaked blood and fluids appear as dots termed lesions. There are two types of lesions: red lesions and bright lesions. Figure 1 shows both types, where microaneurysms (MA) and haemorrhages (HM) are red lesions, and soft and hard exudates (EX) are bright lesions. MA refers to the small dark red dots, while HM refers to the larger spots. Soft EX, often known as cotton wool spots, appear as yellowish-white, fluffy spots caused by nerve fiber injury, whereas hard EX appear as distinct yellow dots [4]. Ophthalmologists divide DR into two major stages: non-proliferative DR and proliferative DR (PDR). Non-proliferative DR is further divided into three stages: mild, moderate, and severe. Hence, the datasets have five stages: No-DR, mild, moderate, severe, and PDR [5]. The number and types of lesions on the retina image define the stage. Globally, more than 0.4 million people have lost their vision, and around 2.6 million people are affected by severe vision damage [6]. These visual impairments can be prevented or minimized if DR is diagnosed and treated promptly.
In the early stages of DR, however, there are few noticeable symptoms, so most people miss the ideal period for treatment. In the traditional procedure, ophthalmologists use fundus images (FIs) to diagnose DR, which takes a long time, requires considerable effort, and is vulnerable to misinterpretation. As a result, a computer-aided diagnostic system is needed to help detect DR early, prevent misdiagnosis, and save money, time, and effort. In the last few years, researchers have proposed several deep learning (DL) algorithms for the automatic detection of DR from FIs. In this study, a novel architecture is proposed for both binary and multiclass DR classification. First, the FIs are pre-processed with Ben Graham's pre-processing method. The contrast of the processed images is then enhanced using contrast limited adaptive histogram equalization (CLAHE). Next, the prominent features are extracted using a deep CNN model combined with singular value decomposition (SVD), named the hybrid CNN-SVD method. Finally, the five stages of DR are classified using an extreme learning machine (ELM) algorithm. The proposed framework performed well on different datasets and surpassed the existing state-of-the-art models.
The novel contributions of this study are given below:
• FIs have been pre-processed using Ben Graham-CLAHE to reduce the noise, enhance the image contrast, and highlight the lesions.
• This study developed a new hybrid CNN-SVD to extract the features from the FIs. A CNN has been used to extract 256 features from the processed FIs. SVD is then applied to reduce these to the 100 most prominent features, which decreases the model complexity and improves the model performance.
• A comparative analysis has been performed, showing that the proposed method outperformed the state-of-the-art models on both classification problems across different datasets.
The rest of the paper is organized as follows. Section II reviews recent works on DR classification. Section III describes the steps of the proposed method. Section IV presents the results of the ELM with different approaches and compares the outcomes to the findings of other recent studies. Section V presents the key conclusions.

II. RELATED WORK
In the last decade, researchers have utilized various DL algorithms and many machine learning (ML) algorithms for the automatic detection of DR from fundus images. This section describes two categories of work: binary DR classification and multiclass DR classification. Researchers have used various DL algorithms for binary DR classification from fundus images (FIs). Das et al. utilized maximal principal curvature to extract the blood vessels from the FI [7]. They then enhanced the result and removed the falsely segmented regions using adaptive histogram equalization (AHE) and morphological opening. Finally, a CNN was used to detect DR and achieved an accuracy of 98.7%. Liu et al. proposed a weighted path CNN, named WP-CNN, for the detection of DR [8]. WP-CNN was built by stacking weighted path blocks. Three variants were evaluated, WP-CNN-32, WP-CNN-52, and WP-CNN-105, with 32, 52, and 105 convolutional layers, respectively. WP-CNN-105 performed better than the other two models. Pires et al. investigated a custom CNN for extracting features from fundus images for the detection of referable DR [9]. By adding data augmentation and robust feature extraction, the model achieved better performance. Mahmoud et al. proposed a hybrid inductive ML algorithm (HIMLA) for automatic detection of DR from the CHASE dataset and achieved an accuracy of 96.62% [10]. Szegedy et al. proposed a way to scale up networks so that the additional computation is used as efficiently as possible by applying suitably factorized convolutions and aggressive regularization [11]. Finally, they ensembled 4 models, performed a multi-crop evaluation, and reached a 3.6% top-5 error on the test data. Gangwar [19]. They proposed two deep CNNs, which were trained first, one for identifying four stages and the other for further classifying the last stage into two further stages. Qummar et al. 
developed an ensemble model with five TL models, namely Resnet50, Inceptionv3, Xception, Dense121, and Dense169, which enhanced the classification of DR stages and further encoded the complex characteristics [20]. Sikder et al. proposed a decision tree-based ensemble method for the detection of DR [21]. They used gray-level intensity and a genetic algorithm (GA) to extract the FI features and achieved an accuracy of 94.20%. Gayathri et al. extracted the features from FIs using a CNN model [22]. They applied several ML algorithms, for instance, support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP), for the automatic detection of DR. Vijayan et al. used a Gabor filter with ML algorithms for the detection of DR and achieved an accuracy of 70.1516% [23]. Tsao et al. utilized various ML models, for instance, decision tree (DT), SVM, artificial neural network (ANN), and logistic regression (LR), for classification of DR [24]. Their highest accuracy, 79.5%, was achieved using SVM. Somasundaram and Alli developed an ML bagging ensemble classifier named ML-BEC for the detection of DR from FIs [25]. They first extracted the candidate objects from the FI and then classified DR using an ensemble classifier. Rajanna et al. proposed a neural network (NN) combined with pre-processing and augmentation for the classification of DR from FIs [26]. Mohammad et al. developed an ANN model with three layers for the classification of four stages of DR [27]. Costa et al. proposed a multiple instance learning (MIL) framework for classifying DR from FIs [28]. They trained and tested their model using the Messidor dataset and achieved an AUC of 93%. Pour et al. enhanced the FIs using CLAHE and used the EfficientNet-5 architecture for the classification of DR stages [29]. They achieved an AUC of 0.945 on the Messidor-2 dataset. Zeng et al. 
utilized transfer learning and developed a CNN model with a Siamese-like architecture that can be trained with little effort [30]. They tested their model using 7024 images and achieved an area under the ROC curve of 0.951. Quellec et al. described a general framework for target lesion detection and characterization that can be set up quickly [31]. This tool takes image samples depicting target lesions, together with other ocular structures that are similarly shaped but are not targets, and extracts a feature space from them, allowing a user to more readily locate and choose target lesions in images.

Figure 2 demonstrates the proposed framework for the detection of DR. In the last few decades, researchers have focused on designing computer-aided systems that automatically detect different types of life-threatening diseases from medical images. Some diseases, such as DR, COVID-19, and malaria, require early detection to reduce death rates. Hence, it is necessary to develop a system that detects these diseases correctly, efficiently, and on time to reduce cost and death rates. In this study, a novel framework is proposed for two DR detection schemes: a two-class scheme (No DR and DR) and a five-class scheme (No DR, Mild, Moderate, Severe, and Proliferative DR). First, the fundus retina images are collected. The FIs are then pre-processed using well-known image processing techniques, namely Ben Graham's method and CLAHE, which are described later. After pre-processing, a hybrid CNN-SVD model extracts and selects the most discriminant features. Finally, an ELM performs both binary and multiclass classification.

A. DATASET DESCRIPTION
Two datasets have been used in this study. The first comes from the Asia Pacific Tele-Ophthalmology Society (APTOS) and was released as part of the Kaggle blindness detection challenge 2019 [32]. The retina images were obtained in the lab using several types of clinical cameras, and the database contains 3662 high-resolution color FIs. The second dataset is Messidor-2, which contains 1748 FIs [33]. Some of the FIs in this dataset were incorrectly labelled; after correction, the final dataset contains 1738 FIs. There are five stages, where no DR, mild, moderate, severe, and proliferative DR are represented by 0, 1, 2, 3, and 4, respectively. Samples of each stage are shown in Figure 3. Table 1 shows the data distribution, and Table 2 lists the stages and their descriptions.
In this study, two schemes of DR classification have been considered: five-stage DR classification and binary classification. For binary classification, stages 1 to 4 are treated as class 1 (DR), and stage 0 is treated as class 0 (No DR). For both multiclass and binary classification, APTOS-2019 has been divided in a ratio of 90:10 for training and testing, respectively. To validate the hyper-parameters of the proposed model, the Messidor-2 dataset has also been used for multiclass classification. For training and testing purposes, this dataset is divided in a ratio of 80:20.
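The label mapping and the 90:10 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `to_binary` and `stratified_split` are hypothetical, and splitting per class (stratification) is an assumption, since the text states only the overall ratio. The class totals are the sums of the training and test counts reported in Section IV-A.

```python
import numpy as np

def to_binary(labels):
    """Map DR grades 1..4 to class 1 (DR) and grade 0 to class 0 (No DR)."""
    return (np.asarray(labels) > 0).astype(int)

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), splitting each class separately.

    Stratification is an assumption made for this sketch.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * idx.size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# APTOS-2019 class totals (grades 0..4), train + test counts from Section IV-A.
class_totals = [1805, 370, 999, 193, 295]
grades = np.repeat(np.arange(5), class_totals)
train_idx, test_idx = stratified_split(grades, test_fraction=0.10)
binary_grades = to_binary(grades)
print(len(train_idx), len(test_idx), np.bincount(binary_grades[test_idx]))
```

The same helper with `test_fraction=0.20` would correspond to the 80:20 split used for Messidor-2.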

B. PRE-PROCESSING
Pre-processing image data is critical because the quality of image pre-processing affects the classification results. In the first pre-processing stage, images are blurred using the Gaussian blur method to reduce the noise. This blurring step is referred to as Ben Graham's method because he suggested the parameters used for blurring the image. After blurring, CLAHE is applied to enhance the image contrast. Blurring is performed before CLAHE so that CLAHE does not amplify unwanted noise in the images. Figure 4 shows the effect of pre-processing on five images, one from each class. The first column shows the images before pre-processing, the second column shows the blurred images, and the last column shows the contrast-enhanced images. Histogram equalization (HE) improves image contrast and, in this study, model accuracy. It is mainly utilized for low-contrast images in scientific applications, such as FIs, X-ray, satellite, and thermal images [35]. Because DR images have low contrast, contrast enhancement is helpful for this investigation. This study used an HE approach known as contrast limited adaptive histogram equalization (CLAHE), a variant of adaptive histogram equalization (AHE). In regions where the image is virtually uniform, AHE overamplifies the noise [36]. CLAHE overcomes this problem by limiting the amplification with a clipping factor. A clipping factor of 2 and a tile size of 8 × 8 have been used in this study. Another consideration is that, after Ben Graham's method is applied to the FIs, the images are in grayscale with a single channel. Since the CLAHE step here operates on BGR images, the single channel is copied into three channels before CLAHE is applied, and the resulting images are shown in Figure 4. The figure shows that the lesions are more clearly visible after CLAHE.
Since the datasets include FIs of various shapes and sizes, the images are converted to a common shape and size so that they can be fed into the CNN model. The FIs are therefore resized to 224 × 224 pixels after applying Ben Graham's method and CLAHE. An image is represented by a wide range of intensity values, so normalization is used to avoid unnecessary numerical complexity. After resizing, the pixel values of each FI are divided by 255, which changes the scale from 0-255 to 0-1.

A dataset sometimes contains a large number of features, some of which contribute little to predicting the target variable or introduce redundancy. A high-dimensional feature space has a considerable impact on the performance of a classifier, a problem known as the curse of dimensionality. To reduce the model's learning complexity and time cost, dimensionality reduction techniques are required. They transform the original feature space into a smaller feature space that holds the non-redundant information without significant loss [37]. Well-known techniques for this purpose include principal component analysis, linear discriminant analysis, and singular value decomposition (SVD). In this study, a CNN is first used to extract features. The extracted features are then standardized. Finally, SVD is used for dimensionality reduction.
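The effect of the clipping factor can be illustrated on a single tile. The sketch below is an assumption-laden simplification rather than the pipeline used in the study: the function name `clipped_equalize` is hypothetical, it handles one grayscale tile only, and it omits the tiling and bilinear interpolation of full CLAHE. In practice a library routine such as OpenCV's `cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))` would typically be used.

```python
import numpy as np

def clipped_equalize(tile, clip_factor=2.0, n_bins=256):
    """Histogram-equalize one uint8 tile using a clipped histogram.

    Simplified, single-tile illustration of contrast limiting.
    """
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    # Clip limit expressed as a multiple of the mean bin height.
    limit = clip_factor * tile.size / n_bins
    excess = np.sum(np.maximum(hist - limit, 0.0))
    hist = np.minimum(hist, limit)
    # Redistribute the clipped mass uniformly over all bins.
    hist += excess / n_bins
    cdf = np.cumsum(hist)
    cdf /= cdf[-1]
    # The normalized CDF acts as the intensity look-up table.
    lut = np.round(cdf * (n_bins - 1)).astype(np.uint8)
    return lut[tile]

# Synthetic low-contrast tile: intensities confined to 100..119.
rng = np.random.default_rng(0)
tile = rng.integers(100, 120, size=(64, 64)).astype(np.uint8)
enhanced = clipped_equalize(tile, clip_factor=2.0)
print(int(tile.max()) - int(tile.min()), int(enhanced.max()) - int(enhanced.min()))
```

Because the histogram peaks are clipped before the mapping is built, the intensity range is stretched, but by less than plain histogram equalization would stretch it, which is what keeps noise in near-uniform regions from being overamplified.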

1) Feature Extraction using CNN from Fundus Images
In this section, a simple CNN is proposed to extract the most prominent features from the FIs. The model's classification performance increases if the attributes that distinguish between the various DR stages are extracted. For this reason, a simple CNN model has been used. The structure of the feature extractor CNN is illustrated in Figure 5. Table 3 lists the parameter values of the CNN model used for this study, and the deep CNN model is summarized in Table 4. The extracted features can be used to classify DR stages efficiently. Each convolutional layer (CL) of the CNN is followed by a batch normalization layer and a max-pooling layer. Batch normalization has been employed because it speeds up training and improves the performance of the model by re-centering and re-scaling the layers' inputs [38].
Max-pooling selects the largest value from each cluster of neurons and thereby extracts the most critical features from the processed FIs [39], [40]. Dropout is employed to reduce overfitting by randomly skipping a subset of the nodes in each layer during the training phase, which also boosts training speed considerably [41]. Adam has been chosen as the optimizer since it performs well on large amounts of data [42]. Finally, 256 discriminant features are extracted from each FI using the last dense layer.
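SVD factorizes the standardized feature matrix $D$ as $D = A S B^{*}$, where the columns of $A$ and $B$ are the left and right singular vectors of $D$, $S$ is a diagonal matrix holding the singular values, and $B^{*}$ is the conjugate transpose of $B$. SVD can convert $D$ into an optimal lower-rank approximation by keeping only the elements of $S$ that are greater than a specified value. Figure 6 shows the cumulative energy of the singular values, which indicates how much information is retained. Hence, after extracting 256 features using the CNN, 100 features have been selected from them.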
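The reduction from 256 to 100 features can be sketched with NumPy as follows. This is a hedged illustration under stated assumptions: the helper names `fit_svd_reducer` and `transform` are hypothetical, the random matrix merely stands in for CNN features, and projecting onto the top-k right singular vectors is one common way to realize the rank reduction, which may differ in detail from the authors' implementation.

```python
import numpy as np

def fit_svd_reducer(features, k):
    """Standardize features and find the top-k right singular vectors."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (features - mean) / std
    # Thin SVD: Z = A @ diag(s) @ Bh, singular values in decreasing order.
    A, s, Bh = np.linalg.svd(Z, full_matrices=False)
    # Cumulative energy: fraction of squared singular values retained.
    energy = np.cumsum(s**2) / np.sum(s**2)
    return mean, std, Bh[:k].T, energy

def transform(features, mean, std, components):
    """Project standardized features onto the retained components."""
    return ((features - mean) / std) @ components

# Stand-in for 256 CNN features extracted from 500 images (synthetic data).
rng = np.random.default_rng(0)
cnn_features = rng.normal(size=(500, 256))
mean, std, components, energy = fit_svd_reducer(cnn_features, k=100)
reduced = transform(cnn_features, mean, std, components)
print(reduced.shape)
```

The `energy` vector plays the role of the cumulative-energy curve: its value at index k - 1 is the fraction of total energy kept when k components are retained. Test features must be transformed with the `mean`, `std`, and `components` fitted on the training set.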

D. EXTREME LEARNING MACHINE
Huang [43] proposed ELM to reduce the training time cost caused by iterative tuning of the model parameters. It is a feed-forward neural network (NN) with an input layer, a single hidden layer, and an output layer. It is capable of achieving the smallest training error. Training is fast because the ELM design is straightforward and does not require iterative parameter tuning. The proposed ELM model is shown in Figure 7. It has 100 input nodes, 500 hidden nodes, and five output nodes for multiclass classification; for binary classification, there is one output node instead of five. Consider a training sample $\{X_1, Y_1\} = \{x_{(1,m)}, y_{(1,t)}\}$, where $X$ represents the input and $Y$ represents the output. The $n$ training samples can then be expressed in matrix form as

$$X_{(n,m)} = \begin{bmatrix} x_{(1,1)} & \cdots & x_{(1,m)} \\ \vdots & \ddots & \vdots \\ x_{(n,1)} & \cdots & x_{(n,m)} \end{bmatrix}, \qquad Y_{(n,t)} = \begin{bmatrix} y_{(1,1)} & \cdots & y_{(1,t)} \\ \vdots & \ddots & \vdots \\ y_{(n,1)} & \cdots & y_{(n,t)} \end{bmatrix}.$$

The input weight matrix $W_{(m,N)}$ and the bias matrix $B_{(1,N)}$ are assigned randomly, where $m$ is the number of attributes and $N$ is the number of hidden nodes. For this study, $N = 500$. The output of the hidden layer is then calculated as

$$H_{(n,N)} = G\left(X_{(n,m)} \cdot W_{(m,N)} + B_{(1,N)}\right),$$

where $G(x)$ is the activation function. In this study, the ReLU activation function has been used.
Finally, the output layer weight $\beta_{(N,t)}$ is calculated using the Moore-Penrose pseudo-inverse $H^{\dagger}$ of the hidden layer output:

$$\beta_{(N,t)} = H^{\dagger}_{(N,n)} \cdot Y_{(n,t)}.$$
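The training procedure above reduces to a few lines of linear algebra. The following is a minimal sketch, not the authors' implementation: the class name `ELM` and its methods are hypothetical, drawing the random weights from a standard normal distribution is an assumption (the text does not state the distribution), and the data are synthetic.

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM with ReLU activation (illustrative sketch)."""

    def __init__(self, n_inputs, n_hidden=500, seed=0):
        rng = np.random.default_rng(seed)
        # W (m x N) and B (1 x N) are random and never trained.
        # Standard normal initialization is an assumption of this sketch.
        self.W = rng.normal(size=(n_inputs, n_hidden))
        self.B = rng.normal(size=(1, n_hidden))
        self.beta = None

    def _hidden(self, X):
        # H = G(X W + B) with G = ReLU
        return np.maximum(X @ self.W + self.B, 0.0)

    def fit(self, X, Y):
        # beta = pinv(H) Y, a single least-squares solve (no iterations)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ Y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Synthetic stand-in: 200 samples, 100 reduced features, 5 DR grades.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 100))
labels = rng.integers(0, 5, size=200)
Y_train = np.eye(5)[labels]  # one-hot targets, shape (n, t)

model = ELM(n_inputs=100, n_hidden=500).fit(X_train, Y_train)
train_acc = np.mean(model.predict(X_train).argmax(axis=1) == labels)
print(model.beta.shape, train_acc)
```

Because this toy example has more hidden nodes than training samples, the pseudo-inverse solution fits even random labels on the training set; that only confirms the solve works and says nothing about generalization, which must be measured on held-out data.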

IV. EXPERIMENTS AND RESULTS
The machine and deep learning algorithms have been implemented using Keras, with TensorFlow as the backend, in PyCharm Community Edition (2020.2.364). A PC with an Intel(R) Core(TM) i7-6700 CPU @3.40GHz, 32GB RAM, and an NVIDIA GeForce GTX1650 SUPER 4 GB GPU running a 64-bit Windows 10 Pro operating system has been used to train and test the models. In this study, the model's performance has been measured using evaluation metrics derived from the confusion matrix. The percentage of correctly detected cases among all cases is referred to as accuracy (ACC). It reveals how accurate the classification algorithm is at identifying cases [44], [45]:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, TN, FP, and FN indicate true positive, true negative, false positive, and false negative, respectively. Precision (P) is the ratio of TP to all predicted positives [46]. In this study, it is the percentage of patients who actually have DR among all the patients the model identifies as having DR:

$$P = \frac{TP}{TP + FP}$$

The recall (R) of a model is the percentage of actual positives that it correctly detects [46]. It is a critical factor in this study because each DR-affected patient must be identified:

$$R = \frac{TP}{TP + FN}$$
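These metrics can be computed directly from a confusion matrix, as in the sketch below. The function name `metrics_from_cm` is hypothetical, the rows-are-true and columns-are-predicted convention is an assumption, and the matrix is a toy example rather than a result from this study. The F1-score is computed as the harmonic mean of precision and recall.

```python
import numpy as np

def metrics_from_cm(cm):
    """Accuracy plus per-class precision, recall, and F1.

    Assumes cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp  # belong to the class, but missed
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Toy binary confusion matrix (illustrative numbers only).
toy_cm = [[50, 2],
          [1, 47]]
accuracy, precision, recall, f1 = metrics_from_cm(toy_cm)
print(accuracy, precision, recall, f1)
```

For a five-class problem the same function returns one precision, recall, and F1 value per DR stage, which can then be averaged if a single summary figure is needed.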

A. RESULTS FOR FIVE STAGES OF DR CLASSIFICATION
First, the FIs are pre-processed, and a simple CNN is used to extract 256 features from the processed FIs. SVD is then applied to reduce these 256 features to the 100 most discriminant features, in order to check whether the performance of the ELM increases. With the hybrid CNN-SVD serving as feature extractor and reducer, the ELM is finally used for classification.

1) Result for APTOS-2019
The proposed ELM has been trained using 3296 samples from APTOS-2019, where the numbers of no-DR, mild, moderate, severe, and proliferative DR samples are 1625, 333, 899, 174, and 265, respectively. To estimate the overall performance of the proposed model, it has been tested using 366 samples, where the numbers of no-DR, mild, moderate, severe, and proliferative DR samples are 180, 37, 100, 19, and 30, respectively. A confusion matrix (CM), shown in Figure 8, has been used to examine the robustness of the ELM model. The precision, recall, F1-score, and accuracy of the ELM have been calculated from this CM. Table 5 presents the result of the proposed ELM model without SVD. That is, the 256 features were extracted using the CNN, and the ELM was then applied to classify the five stages of DR, achieving an accuracy of 97.27%. SVD was then used to reduce the features to 100, which improved the accuracy of the ELM model to 98.07%, as shown in Table 6. The area under the curve (AUC) of the ELM with CNN and with hybrid CNN-SVD for the classification of five DR stages is shown in Figure 9. AUC is a measure for analyzing and fine-tuning the performance of a machine learning model [48]. Figure 10 presents a graphical comparison of performance on different metrics between ELM with CNN and ELM with hybrid CNN-SVD for multiclass classification. Figure 10 shows that, by processing the images with Ben Graham's method and CLAHE and extracting the features with the hybrid CNN-SVD, the performance of the ELM model increased while the number of features was reduced, which decreases both computational complexity and cost. It also shows that ELM with hybrid CNN-SVD achieved the highest score in every metric.

2) Result for Messidor-2
The proposed ELM has been tested using 348 samples from the Messidor-2 dataset. The numbers of no-DR, mild, moderate, severe, and proliferative DR samples are 203, 54, 69, 15, and 7, respectively. Figure 11 shows the CM for both approaches. The accuracy of the ELM model with CNN-SVD is 96.26%, which is higher than the 94.54% accuracy of the ELM without SVD, as shown in Tables 7 and 8. The AUC of the ELM using CNN-SVD is 98.24%, compared with 97.42% for the ELM using only CNN. The ROC curves of both approaches are shown in Figure 12. Figure 13 depicts a graphical comparison of performance on several criteria between ELM with CNN and ELM with hybrid CNN-SVD. It shows that ELM with hybrid CNN-SVD achieved the better result in every category. The performance analysis presented above supports the novelty of the proposed model. In this study, the combined pre-processing (Ben Graham-CLAHE) removed the noise from the FIs and enhanced the lesions. Since the lesions are highlighted in the pre-processing stage, the hybrid CNN-SVD extracted the most discriminant features from the lesions accurately, and the SVD step reduced the complexity by removing irrelevant features. Hence, the ELM shows a promising result on both the APTOS-2019 and Messidor-2 datasets with the same hyper-parameters and data pre-processing technique. This suggests that the hyper-parameters and pre-processing techniques applied in the proposed method do not depend on the dataset used.

B. RESULTS FOR BINARY DR CLASSIFICATION
For binary classification of DR, a similar approach has been taken. The proposed ELM has been trained using 3296 samples from APTOS-2019, of which 1625 are no-DR and 1671 are DR. The classification performance of the ELM model has been estimated by testing it on 366 samples, where the numbers of no-DR and DR samples are 180 and 186, respectively. The CM for binary classification is shown in Figure 14. The ROC curve for binary classification of DR is depicted in Figure 15. Figure 16 presents the performance comparison on different evaluation metrics between ELM with CNN and ELM with hybrid CNN-SVD for binary classification. Figure 16 shows that using the hybrid CNN-SVD as the feature extractor substantially increased the performance of the ELM model, with both precision and recall reaching 100%. In the medical field, recall should be maximized because patients at any DR stage must be correctly diagnosed. The figure also shows that ELM with hybrid CNN-SVD achieved the highest score for each criterion.

C. COMPARISON OF PERFORMANCE TO OTHER WORKS
In this section, the performance of the proposed ELM with hybrid CNN-SVD for DR classification from processed FIs is compared with different existing models. The work in [14] achieved an accuracy of 97.82%. The same fold has been used for testing the proposed ELM model, which achieved an accuracy and recall of 99.32% and 99%, respectively, as shown in Table 11. The model's performance on low-quality images is outside the scope of this study. In the future, the model's performance can be analyzed on a large dataset containing a mixture of low- and high-quality images.

V. CONCLUSION
As diabetes has become more frequent around the world, the consequences of DR are becoming more common as well. Many DR patients lose their vision globally due to a lack of proper treatment. Early detection and treatment of DR can therefore play a substantial role in reducing the risk of blindness. This paper has found that the feature extraction capability of the hybrid CNN-SVD based method is helpful for binary and multiclass classification of DR. The proposed technique shows that DR classification is more effective when this feature extraction is integrated with an ELM approach. The method exploits Ben Graham's method and CLAHE image pre-processing to reduce the noise, highlight the lesions, and thereby achieve better DR classification performance. Based on the comparison with existing schemes, it is concluded that an ELM classifier can detect DR more precisely. The results of this research are expected to help doctors detect different stages of DR early and treat patients accordingly.