Automated Staging of Diabetic Retinopathy Using Convolutional Support Vector Machine (CSVM) Based on Fundus Image Data

Diabetic Retinopathy (DR) is a complication of diabetes mellitus that attacks the eyes and often leads to blindness. The number of DR patients is increasing significantly because some people with diabetes are not aware that they have developed complications from chronic diabetes. Some patients also complain that the diagnostic process is slow and expensive. Early detection should therefore be automated using Computer-Aided Diagnosis (CAD). The DR classification process in this study has two steps: preprocessing and classification. Preprocessing consists of resizing and augmenting the data, while classification uses the CSVM method. CSVM combines the CNN and SVM methods so that feature extraction and classification become a single unit. The first stage of CSVM extracts convolutional features using an existing CNN architecture. CSVM could overcome the shortcoming of CNN in terms of training time: it accelerated the learning process without reducing accuracy relative to CNN in the 2-class, 3-class, and 5-class experiments. The best result was achieved in 2-class classification using CSVM with data augmentation, which reached an accuracy of 98.76% with a time of 8 seconds. In contrast, CNN with data augmentation obtained an accuracy of only 86.15% with a time of 810 minutes 14 seconds. It can be concluded that CSVM was faster than CNN and also more accurate in classifying DR.


I. INTRODUCTION
Population growth without attention to health and healthy lifestyles has led Diabetes Mellitus (DM) to spread more and more widely [1]. DM can cause several complications in other parts of the body. One of the most common complications of DM is Diabetic Retinopathy (DR), which attacks the eyes and often causes blindness [2], [3]. A report from the World Health Organization (WHO) estimates that the number of people with diabetes will double, from 171 million patients in 2000 to 366 million patients in 2030. The report also estimates that in 2030, 4.8% of people who suffer from DR will have blindness, which is approximately 37 million people [4].
The number of DR patients is increasing because some people with diabetes are not aware that they have developed complications from their chronic diabetes [5]. Some patients complain that the diagnostic process takes a long time and is expensive, so they choose to leave the disease untreated [6]. Given these problems, it is necessary to develop an automatic identification system that helps experts diagnose DR quickly, precisely, and at low cost. A Computer-Aided Diagnosis (CAD) system is used in automated DR diagnostics. The CAD system has several stages, namely preprocessing, feature extraction, and classification [7], [8].
Preprocessing in CAD is intended to improve image quality. It is needed because most medical images contain some noise, which degrades the diagnosis results. Preprocessing therefore affects the accuracy of the diagnostic process [9]. The second stage of the CAD system is feature extraction, which derives statistical values from medical images for use in the classification process. There are several types of feature extraction, namely texture, color, convolution, and others.
The feature extraction used in this study was the Convolutional Neural Network (CNN) method. The convolution layer in a CNN can provide information about the pattern of an image as a whole and is more complex than other feature extraction methods [10]. Several researchers have used convolutional methods, particularly CNN, to make medical diagnoses. CNN is a deep learning method that focuses on image data and uses convolution as a feature-learning process to obtain good accuracy [11]. CNN has several architectures for the convolutional learning process, such as AlexNet, GoogLeNet, ResNet, DenseNet, and others. CNN has attracted much interest; for example, one study used CNN to classify lung cancer into 2 classes [12]. The convolutional model used in that study was a 3D CNN, with an AUC value of 0.83. Ratul Ghosh et al. used CNN to divide DR into 5 classes: normal, mild, moderate, severe, and PDR (Proliferative DR). The data used were DIARETDB0 and DIARETDB1, which produced a highest classification accuracy of 85% [13].
The convolutional features of the CNN architecture can produce high accuracy and capture the patterns of an image well. However, the classification stage of CNN requires a long training time. This drawback has led several researchers to develop the CNN method further [14]. Diabetic retinopathy classification using CNN by [16] achieved a good accuracy of 95.23%, which shows that CNN performs well in classifying diabetic retinopathy.
Several researchers have proposed embedding other machine learning methods in the neural network to obtain better results and a shorter training time. One such machine learning method is the Support Vector Machine (SVM). SVM has a fast training process and produces good accuracy. DR research using SVM by [15] classified DR into 2 classes, namely normal and DR, and obtained a best accuracy of 82.35%. To overcome the drawback of CNN, the CNN classifier, which was originally a neural network, was replaced with an SVM. The convolutional features of the CNN architecture are classified using the SVM method, and the combination becomes the Convolutional Support Vector Machine (CSVM). Agarap [17] has researched the combination of the CNN and SVM methods. The data used in that study was MNIST data, which produced a best accuracy of 99.04%. In addition, Piotr W. Mirowski predicted epileptic seizures using the CSVM method with high accuracy [18]. Another study on CSVM was conducted by Lei Zhang et al. using Amazon, DSLR, Webcam, and Caltech data. The results of that study showed that the accuracy of CSVM was better than that of CNN [19]. Based on the explanation above, CSVM is expected to provide high accuracy and a faster training time when classifying DR into 5 classes, namely normal, mild DR, moderate DR, severe DR, and proliferative DR.
Based on several previous studies, CNN can recognize image features well. However, its weakness is a long training time. This study used CNN to learn image features and the SVM method for classification. This research is expected to produce a diabetic retinopathy classification system with a short training time and good accuracy.

II. MATERIALS AND METHOD
DR is damage to the retina caused by complications of diabetes mellitus [20], [21]. Based on severity, DR consists of two levels: non-proliferative diabetic retinopathy (NPDR) and proliferative diabetic retinopathy (PDR). Clinically, NPDR has three stages (mild, moderate, and severe) characterized by microaneurysms, hard exudates, hemorrhages, and venous abnormalities [22]. PDR is a progressive condition characterized by neovascularization, retinal or vitreous hemorrhages, and fibrovascular proliferation [23]. Sample data showing the differences between mild, moderate, severe, and proliferative DR are presented in Fig. 1.

Fig. 1 Sample fundus images: Normal, Mild, Moderate, Severe, PDR

The fundus image data in this study were obtained from the DRIVE dataset. The dataset comprised 44 images. Because of the small amount of data, augmentation was carried out during preprocessing to produce a larger dataset. Preprocessing also included a resizing stage because the input dimensions required by the convolutional architectures are 224 × 224. The entire research process can be seen in the flowchart in Fig. 2.
Fig. 2 shows that after preprocessing is carried out, the convolutional features are extracted. Several architectures were used for comparison, namely GoogLeNet, ResNet101, and DenseNet. After the convolutional features are obtained, the next step is classification using SVM. This step consists of two stages: training and testing. The training stage is carried out to obtain the optimum SVM model, and the testing stage is used to evaluate that model.

A. Convolutional Neural Network (CNN)
CNN is a deep learning technique inspired by the natural visual perception mechanism of living things. A CNN is built from several layers of an artificial neural network trained by the backpropagation algorithm [24]. In contrast to traditional learning methods, CNN can learn features from training images automatically, given large amounts of data [25]. CNN has provided excellent results for visual object recognition in medical research such as DR screening, skin lesion classification, lymph node metastases detection, and others [26].
The primary parameter of the convolution layer is the filter, which produces the feature map. The user can determine the size and number of filters, while the weights are learned and optimized automatically [26]. This process also introduces padding (adding rows and columns around the input) and stride (the parameter that determines how far the filter moves at each step), both of which affect the result of the convolution [27]. The pooling layer reduces the computational load and helps prevent overfitting. This layer uses two types of processes, max-pooling and average pooling, which select the maximum value and the average value, respectively, in each region of a feature map [11]. The fully connected layer has a complete connection to all activations in the previous layer and is followed by a nonlinear function, such as ReLU [28].
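The effect of padding, stride, and pooling described above can be illustrated with a small sketch. It is only a toy example in plain Python: the helper names `conv_output_size` and `pool2d`, the 4 × 4 feature map, and the 7 × 7 filter with padding 3 and stride 2 are all illustrative assumptions, not settings reported in this study.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Spatial size along one axis after a convolution (or pooling) window."""
    return (n + 2 * padding - kernel) // stride + 1


def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling over a 2-D list (stride equals the window size)."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Collect the values that fall inside the current window.
            window = [
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map used only for illustration.
fmap = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [0, 2, 9, 1],
    [4, 6, 3, 5],
]

print(conv_output_size(224, kernel=7, padding=3, stride=2))  # 112
print(pool2d(fmap, mode="max"))  # [[7, 6], [6, 9]]
print(pool2d(fmap, mode="avg"))  # [[4.0, 3.0], [3.0, 4.5]]
```

Max-pooling keeps the strongest response in each window, while average pooling keeps the mean, so the two outputs differ even though both halve the spatial size.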
A simple CNN architecture consists of a convolutional layer, a pooling layer, and a fully connected layer. The arrangement of these three layers forms the complete CNN architecture [25], as shown in Fig. 3. Over time, CNN architectures have become deeper and more complex. CNN architectures in the literature include LeNet, ResNet, GoogLeNet, AlexNet, DenseNet, and others [29]. This study used the GoogLeNet, ResNet, and DenseNet architectures.
GoogLeNet won the ImageNet ILSVRC 2014, and its main strength is the inception module. This module consists of convolutions of various sizes combined in a depth concatenation layer [30]. Inception has developed into several models, including V2, V3, and V4 [31]. This study used the V2 inception model to detect DR. This CNN model has 22 layers and uses convolution layers and max-pooling with a size of 2 × 2. The network uses a ReLU layer after the convolution layer with a size of 2 × 2. The GoogLeNet architecture is shown in Fig. 4.

ResNet is an architecture that was introduced in 2015 and won the 2015 ILSVRC ImageNet. ResNet is a residual network that seeks to improve network efficiency and make it less prone to overfitting [32]. ResNet introduces shortcut connections that skip over multiple layers to avoid vanishing gradients as the number of network layers increases [33]. ResNet is available at several depths, such as the ResNet-18, ResNet-34, ResNet-50, and ResNet-101 architectures [32]. This study used ResNet-101, a 101-layer implementation trained on a large number of images [34]. The ResNet architecture is shown in Fig. 5.
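The shortcut connection used by ResNet can be sketched as follows. This is a toy illustration only: it replaces convolutions with small fully connected maps on plain Python lists, and the names `relu`, `linear`, and `residual_block` as well as the weight values are hypothetical. It is not the ResNet-101 network used in this study.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def linear(v, weights):
    """Multiply vector v by a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute y = ReLU(F(x) + x), where F is two linear maps with a ReLU between."""
    fx = linear(relu(linear(x, w1)), w2)
    # The shortcut adds the unchanged input back onto the learned residual F(x).
    return relu([f + xi for f, xi in zip(fx, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.5, 2.0]

print(residual_block(x, zeros, zeros))        # [1.5, 2.0]
print(residual_block(x, identity, identity))  # [3.0, 4.0]
```

With all-zero weights the block still passes its input through unchanged, which is the property that keeps gradients flowing in very deep networks.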
In contrast to ResNet, DenseNet introduces direct connections from any layer to all subsequent layers. This model has several advantages, including alleviating the vanishing-gradient problem, strengthening feature propagation, encouraging feature reuse, and substantially reducing the number of parameters [35]. This study used the DenseNet-201 model, consisting of 4 dense blocks, each with a growth rate of 32. A transition layer between dense blocks consists of convolution and pooling layers [36]. The DenseNet architecture is shown in Fig. 6.
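The role of the growth rate can be made concrete with a short arithmetic sketch. The 4 dense blocks and the growth rate of 32 come from the text; the block sizes (6, 12, 48, 32), the 64 initial channels, and the transition compression of 0.5 are assumptions taken from the commonly published DenseNet-201 configuration, not values stated in this study. The function names are hypothetical.

```python
def dense_block_channels(in_channels, num_layers, growth_rate=32):
    """Each layer in a dense block appends growth_rate new feature maps."""
    return in_channels + num_layers * growth_rate


def densenet_feature_channels(block_layers, init_channels=64,
                              growth_rate=32, compression=0.5):
    """Track the channel count through dense blocks and transition layers."""
    channels = init_channels
    for index, num_layers in enumerate(block_layers):
        channels = dense_block_channels(channels, num_layers, growth_rate)
        # A transition layer follows every block except the last one
        # and shrinks the channel count by the compression factor.
        if index < len(block_layers) - 1:
            channels = int(channels * compression)
    return channels


# Assumed DenseNet-201 block sizes (not given in this paper).
print(densenet_feature_channels([6, 12, 48, 32]))  # 1920
```

Under these assumptions the flattened feature vector passed to a classifier would have 1920 entries per image after global pooling.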

B. Support Vector Machine (SVM)
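SVM is a supervised machine learning method used for binary classification [37]. SVM works by finding the best hyperplane that separates two data classes $y_i \in \{-1, 1\}$. The best hyperplane can be formulated as in Equation (1) [38].

$$f(x) = w^{T}x + b \tag{1}$$

where $w$ is the weight vector and $b$ is the bias. For a feature vector $x \in \mathbb{R}^{d}$, if $f(x) > 0$, then $x \in$ class 1. If $f(x) < 0$, then $x \in$ class 2. Otherwise, $f(x) = 0$ indicates that $x$ lies on the hyperplane, as illustrated in Fig. 7. The best hyperplane is the one that maximizes the margin. The margin is maximized when $\|w\|$ is minimized, as in Equations (2) and (3).

$$\min_{w,\,b} \ \frac{1}{2}\|w\|^{2} \tag{2}$$

$$y_i\,(w^{T}x_i + b) \ge 1, \quad i = 1, 2, \ldots, n \tag{3}$$

This problem can be solved through its dual form in Equation (4).

$$\max_{\alpha} \ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i \alpha_j y_i y_j\, x_i^{T}x_j \tag{4}$$

with the provision that $\sum_{i=1}^{n}\alpha_i y_i = 0$, with $\alpha_i \ge 0$ and $i = 1, 2, \ldots, n$. For data that cannot be separated linearly, the kernel concept can be used, which maps the original data space into a higher-dimensional space using a mapping function $\varphi$ with the product in Equation (5).

$$K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j) \tag{5}$$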
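The hyperplane decision rule and the kernel idea described in this subsection can be sketched in a few lines of plain Python. The weight vector, bias, and sample points below are hypothetical values chosen for easy arithmetic, and the RBF kernel is only one common choice; the kernel actually used in this study is not stated, so it is an assumption here.

```python
import math


def decision_value(w, b, x):
    """Evaluate the hyperplane function w . x + b for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 by the sign of the decision value, 0 on the hyperplane."""
    value = decision_value(w, b, x)
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


def margin_width(w):
    """Width of the margin between the two supporting hyperplanes, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, an assumed example of a nonlinear kernel."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, z)))


# Hypothetical hyperplane parameters for illustration only.
w, b = [3.0, 4.0], -5.0

print(classify(w, b, [3.0, 1.0]))  # 1
print(classify(w, b, [0.0, 0.0]))  # -1
print(classify(w, b, [1.0, 0.5]))  # 0
print(margin_width(w))             # 0.4
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0
```

A smaller norm of the weight vector gives a wider margin, which is why minimizing the norm corresponds to maximizing the margin.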

III. RESULT AND DISCUSSION
In this study, the 44 images covered 5 classes, namely normal, mild, moderate, severe, and proliferative DR (PDR). The data did not need to be validated because they had already been labeled by experts who handle DR. The DR classification process has two steps, namely preprocessing and classification. Preprocessing consists of resizing and augmenting the data, while classification uses the CSVM method. CSVM combines the CNN and SVM methods so that feature extraction and classification become a single unit. The preprocessing step resizes each image to match the input size of the CNN architecture and augments the data to reduce the probability of overfitting.
The resizing process also includes a cropping step, because the architectures require a square image of 224 × 224. The next stage is data augmentation, which uses the rotation method. The image is rotated in one-degree steps to obtain different image values. After the final images are obtained from preprocessing, classification using the CSVM method is conducted. The first stage of the CSVM process extracts convolutional features using an existing CNN architecture. The results of this process can be seen in Fig. 8.
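The cropping and rotation steps described above can be sketched without any image library. The helper names, the 600 × 400 input size, and the choice of a full 0 to 359 degree range are assumptions for illustration; the text only states that images are cropped to a square, resized to 224 × 224, and rotated in one-degree steps.

```python
TARGET_SIZE = (224, 224)  # input size stated in the text


def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def rotation_angles(step_degrees=1, full_turn=360):
    """List the rotation angles used for augmentation (assumed full turn)."""
    return list(range(0, full_turn, step_degrees))


# Hypothetical image size, used only to show the crop arithmetic.
box = center_square_box(600, 400)
angles = rotation_angles()

print(box)          # (100, 0, 500, 400)
print(len(angles))  # 360
print(angles[:3])   # [0, 1, 2]
```

In practice the box and angles would be handed to an image library, for example Pillow's `Image.crop`, `Image.resize`, and `Image.rotate`, to produce the augmented 224 × 224 images.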
Fig. 8 shows that the RGB fundus image is processed to obtain the convolved image. The final output of the convolution process is then flattened to obtain the statistical features of the image. In this study, trials were conducted by dividing the data into 2 classes (Normal, DR), 3 classes (Normal, NPDR, PDR), and 5 classes (Normal, Mild, Moderate, Severe, PDR). In the 2-class division, the DR data consisted of Mild, Moderate, Severe, and PDR, while in the 3-class division, the DR data were further divided into NPDR and PDR, where NPDR consisted of Mild, Moderate, and Severe. Next, convolutional features were extracted using the GoogLeNet, ResNet, and DenseNet architectures. The features obtained were separated into training data and testing data using the K-Fold Cross Validation method with k=5. The K-Fold Cross Validation results can be seen in Table 1.
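The class regrouping and the 5-fold split described above can be sketched as follows. The function names are hypothetical, the split is a plain sequential one without shuffling or stratification, and the sample count of 44 is used only as an illustration of the unaugmented dataset; the exact splitting procedure of this study is not specified beyond k=5.

```python
GRADES = ["Normal", "Mild", "Moderate", "Severe", "PDR"]


def to_three_classes(grade):
    """Map a 5-class grade to Normal / NPDR / PDR."""
    if grade in ("Normal", "PDR"):
        return grade
    return "NPDR"


def to_two_classes(grade):
    """Map a 5-class grade to Normal / DR."""
    return "Normal" if grade == "Normal" else "DR"


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


print([to_three_classes(g) for g in GRADES])
# ['Normal', 'NPDR', 'NPDR', 'NPDR', 'PDR']
print([to_two_classes(g) for g in GRADES])
# ['Normal', 'DR', 'DR', 'DR', 'DR']

folds = list(k_fold_indices(44, k=5))
print([len(test) for _, test in folds])  # [9, 9, 9, 9, 8]
```

Each sample appears in exactly one test fold, so every image contributes to both training and testing across the five runs.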
In Table 1, each fold has a different value, which makes it possible to identify the best accuracy. The GoogLeNet architecture has the best results in Fold 4 of 2 class and 3 class and in Fold 1 of 5 class. In ResNet101, the best results are obtained in Fold 4 of 2 class, Fold 2 of 3 class, and Fold 2 of 5 class. In DenseNet201, the best results are achieved in Fold 3 of 2 class and 5 class, and in Fold 2 of 3 class. Variations in the K-fold results indicate the quality of the model, because the more stable the results obtained, the more accurate the validation. After obtaining the K-Fold Cross Validation values, the best results for each architecture were compared to determine the most suitable architecture based on accuracy and training time. The results of the comparison can be seen in Table 2.

Table 3 shows that CSVM is better than CNN in terms of accuracy and learning time. CSVM proved to be faster and more accurate than CNN, so CSVM can serve as a substitute for researchers who have problems with CNN in terms of time and accuracy. Based on Tables 1, 2, and 3, SVM classifies 2 classes (binary classification) well, with an accuracy of 98.76%, while for multiclass cases SVM is still less effective because it is a binary classification method. Furthermore, the tables indicate that data augmentation affects the accuracy of the system. The augmentation process applied to this system can increase accuracy by up to 10%. Because SVM is based only on binary classification, another method is needed to classify multiclass data well. Other combination methods can use extreme learning machines, KELM, MLEM, and other conventional methods [41], [42].

Fig. 7 SVM Model

Fig. 8 Feature extraction process using CNN

Fig. 9 Comparison result of CNN and CSVM

TABLE I K-FOLD CROSS VALIDATION RESULT (K=5)

Table 3 also explains that SVM is good at classifying 2 classes, with or without data augmentation. Comparison results of CNN and CSVM are shown in Fig. 9.

TABLE III COMPARISON BETWEEN CNN AND CSVM