Diabetic Retinal Fundus Images: Preprocessing and Feature Extraction for Early Detection of Diabetic Retinopathy

Clinical reports suggest that more than ten percent of patients with diabetes have a high risk of eye problems. Diabetic retinopathy (DR) is an eye disease that affects eighty to eighty-five percent of patients who have had diabetes for more than ten years. Retinal fundus images are commonly used in clinics for the detection and analysis of diabetic retinopathy. Raw retinal fundus images are, however, hard for machine learning algorithms to process. In this paper, raw retinal fundus images are pre-processed using green channel extraction, histogram equalization, image enhancement and resizing. Fourteen features are also extracted from the pre-processed images for quantitative analysis. The experiments are performed on the Kaggle Diabetic Retinopathy dataset, and the results are evaluated by considering the mean value and standard deviation of the extracted features. Exudate area emerged as the best-ranked feature, with a mean difference of 1029.7. This result is attributed to its complete absence in normal retinal images and its presence in the three classes of diabetic retinopathy images, namely mild, normal and severe.


INTRODUCTION
In recent years, there has been a dramatic increase in the number of diabetic patients suffering from diabetic retinopathy (DR). DR is one of the most chronic diseases and a key cause of vision loss in middle-aged people in the developed world [1]. DR emerges as small changes in the retinal capillaries. The first distinguishable deviations are microaneurysms, which are local disruptions of the retinal capillaries. Distorted microaneurysms cause intraretinal hemorrhage. This leads to the first stage of DR, commonly termed mild non-proliferative diabetic retinopathy [2]. Because the eye fundus is sensitive to some vascular diseases, fundus imaging is well suited to non-invasive screening. The result of the screening approach is directly related to the quality and accuracy of the fundus image extraction technique, coupled with efficient image processing methodologies [3] for identifying the abnormalities [4,5].
Exudates are oily formations leaking from weakened end blood vessels. Once they start emerging, the DR is termed moderate non-proliferative diabetic retinopathy. If these exudates start developing around the central vision area, the condition is called diabetic maculopathy. After a certain time, as the retinopathy progresses, the blood vessels in the retina get blocked by microinfarcts. These small infarcts are known as soft exudates. When the above three abnormalities are encountered together, the retinopathy is termed severe non-proliferative diabetic retinopathy [6]. Several techniques have been used to detect and classify DR, including fluorescein angiography, direct and indirect ophthalmoscopy, stereoscopic color film fundus photography, and mydriatic or non-mydriatic digital color or monochromatic photography. Under typical clinical conditions, direct ophthalmoscopy performed by non-ophthalmologists has a sensitivity of approximately 50% for the detection of proliferative retinopathy [7,8].

Literature Review
The reviewed literature indicates that diabetic retinopathy affects approximately two-fifths of the population who identify themselves as having diabetes mellitus (DM) [9]. Harding et al. first detected diabetic retinopathy by screening the eye structure of normal and diabetic patients using an ophthalmoscope. The specificity and sensitivity obtained were 97 and 73 percent, respectively [5]. The normal features of the fundus images included the optic disc, fovea and blood vessels. The main abnormal features of diabetic retinopathy included exudates and blot hemorrhages [6]. Philips et al. first performed exudate detection and identification. Three strategies, namely thresholding, edge detection, and classification, were deployed for exudate detection. Global and local thresholding values were used to segment exudate lesions. The sensitivity and specificity obtained were 100% and 71%, respectively [10]. The main advantage of single-field fundus photography interpreted by trained readers is its ability to detect retinopathy. Its sensitivity varies from 61% to 90%, and its specificity falls in the range of 85% to 97% [11]. In another study, the optic disk (OD) boundary was extracted using the red and green channels. The location methodology succeeded in 99% of cases, and the segmentation algorithm yielded an overlap of 86% between automated segmentations and true OD regions [12]. Ravishankar et al. proposed a new methodology for optic disk detection in which they first identified the major blood vessels and used their bifurcations to find the approximate location of the optic disk. Many classifiers have been tested, including fuzzy C-means clustering, SVM, neural networks, PCA, and simple Bayesian classification [13]. GG Gardener et al. used a back-propagation neural network. The features selected for detection were exudate area, blood vessel area, hemorrhage area, edema and microaneurysm area. The study analyzed images of one hundred forty-seven patients with DR and thirty normal retinal images, covering retinas with exudates, retinas with hemorrhages or microaneurysms, and retinal images with and without blood vessels.
The results showed specificity and sensitivity values of 88.4 and 83.5, respectively [14]. Mookiah et al. proposed a new methodology for the fully automatic classification of retinal fundus images into various classes by identifying the blood vessels and hard exudates. The features taken into account were mainly area, Shannon entropy, Kapur entropy, and the bifurcation points between two blood vessels. To extract the textural features, they used the Local Binary Pattern (LBP). It was observed that C4.5, a type of decision tree, achieved an accuracy of 88.46%, whereas an SVM with a linear kernel achieved an accuracy of 77.56%. The results also showed specificity and sensitivity values of 95.7 and 94.2, respectively. Akara et al. proposed an exudate detection method based on mathematical morphology for low-quality retinal images of non-dilated pupils [15]. The standard deviation of the image, obtained by applying a local variation operator to the preprocessing result, showed the main characteristics of the closely distributed clusters of exudates. The sensitivity and specificity for exudate detection were found to be 80% and 99.5%, respectively [16]. Xiaohui et al. proposed a solution for the three main difficulties in the detection of MA, including non-uniform illumination and interference from similar objects. KPCA yielded a better result than PCA for the SVM classifier. With 2 false positives (FP) left per image, KPCA successfully obtained 90.6% true positives [17]. Judah et al. took the features extracted from the image and applied SVM and KNN classifiers to classify the image according to its severity grade [18]. Alireza et al. proposed a segmentation based on a combination of color representation in the Luv color space and an efficient coarse-to-fine segmentation using fuzzy c-means (FCM) clustering. They took advantage of retinal color information toward their objectives and showed the improvement obtained compared with gray-level-based techniques. The FCM clustering yielded an accuracy of 85.6%, a sensitivity of 97.2 and a specificity of 85.4 [10].
In this paper, we present the preprocessing of retinal fundus images and the feature extraction steps, followed by feature ranking. This paper also covers exudate elimination, optic disc elimination, contrast enhancement, extraction of the green channel, and MA and hemorrhage detection.

MATERIALS AND METHODS
Figure 1 shows the process flow of the methodology adopted in the present work. The following subsections describe the preprocessing of retinal fundus images and the extraction and ranking of features useful in the detection of diabetic retinopathy.

Data Acquisition Phase
The Kaggle platform provides a large set of high-resolution fundus images taken under a variety of imaging conditions. A clinician (trained pathologist) has rated the presence of diabetic retinopathy in each image on a scale of 0 to 4. The images are taken with a variety of different models and types of cameras. Therefore, some of the images may be dark or out of focus. For quantitative analysis, the data set used here includes 20 normal and 20 diabetic retinopathy images. This data set is used to calculate the individual values of the features taken into consideration [19].
We have considered 14 features, which were collected from various sources as described in the literature review section. The features taken into consideration are described in Table 1 [20].

Feature selection
Feature selection is the process of automatically selecting those features in the data that contribute most to the prediction variable or output of interest. It can be used to recognize and eliminate unneeded, irrelevant and redundant attributes that do not add to the accuracy of the classifier or that may even diminish the accuracy of the model. We have chosen 7 of the 14 features for the experimentation; the reasons for their selection are shown in Table 1 in bold font.

Table 1: Features considered, with their description and the reason for selection or rejection
Optic distance: The optic distance, or optic nerve head, is the point in the eye where the optic nerve fibers leave the retina. The optic nerve head in a normal human eye carries about 1 million neurons from the eye towards the brain [21]. Hence it is a good feature to be taken into consideration.
Fovea: The cone cells found in this area are attached to ganglion cells, providing the sharpest central vision in the retina, known as foveal vision [22]. However, due to the absence of this feature in previous literature, it is not selected.
Blood vessel: When small, delicate blood vessels break beneath the tissue covering the white of the eye (conjunctiva), the resulting eye redness may indicate a subconjunctival hemorrhage [23]. Diabetes damages the blood vessels in the retina. The blood vessel area for the normal image is 37230.56, and since contraction occurs in diabetic retinopathy, the value decreases.
Blot hemorrhages: They are not disease entities in and of themselves; rather, they are hallmarks of ocular and systemic diseases [23]. When small, they appear the same as microaneurysms; they occur as MA distortion in the inner layers of the retina, such as the deeper nuclear and outer plexiform layers. Hemorrhages are also termed red lesions. The same detection algorithm is used for the detection of hemorrhages. Hence it is an important feature to be taken into consideration.
Exudate number: An exudate is any fluid that filters from the circulatory system into lesions or areas of inflammation. The fluid is composed of serum, fibrin, and white blood cells. Exudates may ooze from cuts or areas of infection or inflammation [21]. However, there is no significant difference in the number of exudates between a normal and a DR-affected image.
Edema: Macular edema is the build-up of fluid in the macula, an area in the center of the retina. The retina is a light-sensitive, fragile tissue at the back of the eye, and the macula is the portion of the retina that is primarily responsible for sharp, straight-ahead vision [24]. As per the above literature survey, edema does not contribute towards diabetic retinopathy.
Bifurcation between two blood vessels: A division of a blood vessel into two or more branches is termed a bifurcation. A class of popular approaches for vessel segmentation is based on filtering methods, which work by maximizing the response to vessel-like structures. The mean value for a normal fundus image is far higher than that for a diabetic image, as observed in [18]. Hence, it serves as a good feature.
Shannon entropy: Shannon entropy is the expected value of the information contained in each message. Messages can be modeled using any flow of information [25]. The entropy value for diabetic retinopathy images differs significantly from that of normal images.
Kapur entropy: Kapur entropy $H_\alpha^K(p)$ of order $\alpha$, $\alpha > 0$, reduces to the Shannon entropy $H(p)$ as $\alpha \to 1$. The literature review showed the use of Kapur entropy in [20]; however, the results for Kapur entropy were not significantly good.
Renyi's entropy: Renyi's entropy automatically detects artifacts, including eye blinks. It was more effective than another ICA-based approach that jointly used kurtosis and Shannon's entropy [26]. Renyi's entropy is ignored for the same reason as Kapur entropy.
LBP entropy: LBP is a good textural description operator that requires a minimal amount of calculation and is very resistant to light interference. However, the features selected by LBP primarily contain only the relevant textural information of the stimuli and do not contain any shape-related information.
LBP energy: LBP energy is significantly low for glaucoma, indicating that it has coarser textural variation than the normal class [27].
Microaneurysm: An MA is a tiny aneurysm, or swelling, in the side of a blood vessel. In people with diabetes, microaneurysms are sometimes found in the retina of the eye [28]. The detection of MA and its variation from the normal image clearly shows its presence in the diabetic image. In the early stages of DR, patients are mostly asymptomatic; however, in the later stages of the disease, patients may experience symptoms that include distortion, floaters and blurred vision. Microaneurysms are the earliest clinical sign of diabetic retinopathy.
Exudates area: The yellow flecks are called hard exudates. They are the lipid residues of leakage from damaged capillaries. The commonest cause of this phenomenon is diabetes [29][24]. Exudates are present in all the stages of DR; hence exudate area must be considered an important feature.

Pre-processing
To detect the presence of diabetic retinopathy, the steps followed are preprocessing, segmentation and feature ranking. Preprocessing is required to ensure that the dataset is consistent and displays only relevant features. This step is necessary to simplify the workload of the following processes. Next, the images are segmented to differentiate between the normal and abnormal substances.
Green Channel
Of the three color channels in the image (red, green, and blue), the contrast between the blood vessels, exudates and hemorrhages is best seen in the green channel, and this channel is neither under-illuminated nor over-saturated like the other two. Hence, we have extracted only the green channel for analysis and classification; an illustrative example is given in Figure 2.
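The green channel extraction described above can be sketched in a few lines. This is only an illustration in plain Python, not the MATLAB code used in this work; the function name `extract_green_channel` and the toy pixel values are hypothetical, and the image is assumed to be stored as rows of (R, G, B) tuples.

```python
def extract_green_channel(rgb_image):
    """Return the green channel of an RGB image stored as rows of (R, G, B) tuples."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


# Hypothetical 2x2 RGB image with 8-bit channels (values are illustrative only).
toy_image = [
    [(120, 60, 20), (200, 150, 40)],
    [(90, 30, 10), (255, 220, 80)],
]

green = extract_green_channel(toy_image)
print(green)
```

The result is a single-channel (grayscale) image on which the later enhancement and morphological steps can operate.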
Contrast Enhancement
To further enhance the features of the image, contrast limited adaptive histogram equalization is performed. The image is divided into smaller blocks, and histogram equalization is carried out on each block [29].
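The block-wise idea behind this step can be illustrated with a simplified sketch. It equalizes each block independently and deliberately omits the clip limit and the interpolation between neighbouring blocks that full contrast limited adaptive histogram equalization applies, so it is not the routine used in this work. The function names, the 8-bit gray-level assumption and the toy image are all hypothetical.

```python
def equalize_histogram(block, levels=256):
    """Equalize one grayscale block (rows of ints in the range 0..levels-1)."""
    pixels = [value for row in block for value in row]
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    # Cumulative distribution of gray levels within the block.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant block: there is no contrast to stretch.
        return [row[:] for row in block]

    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[value] - cdf_min) * scale) for value in row] for row in block]


def equalize_by_blocks(image, block_size):
    """Apply histogram equalization separately to each block_size x block_size tile."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size] for row in image[top:top + block_size]]
            for dy, eq_row in enumerate(equalize_histogram(block)):
                out[top + dy][left:left + len(eq_row)] = eq_row
    return out


# Hypothetical 2x4 image: a low-contrast left tile and a constant right tile.
toy = [
    [10, 20, 100, 100],
    [30, 40, 100, 100],
]
print(equalize_by_blocks(toy, 2))
```

In the left tile the four gray levels are spread over the full 0 to 255 range, while the constant tile is left unchanged.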
Cropping and Resizing
Since the original images vary widely in size and some images were chopped at the top and bottom, they had to be standardized. Since the field of view (FOV) of an image (the section of the retina seen in the image) is circular, each image is first cropped to a square with a side equal to the diameter of the FOV. As some images do not have the top and bottom segments, a black patch is added to the images containing the segments to make them uniform. This new square image is then downsampled to 512 x 512 pixels [29], as shown in Figure 3.
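A minimal sketch of this standardization is given below, under stated assumptions: the FOV is taken to be centred in the image, missing rows are filled with black, and nearest-neighbour sampling is used for downsampling (the interpolation method is not specified in the text). The helper names `centre_fit`, `to_square` and `resize_nearest` are hypothetical.

```python
def centre_fit(length, side):
    """Return (source_start, dest_start, count) for centring one axis on `side` pixels."""
    if length >= side:
        return (length - side) // 2, 0, side   # crop symmetrically
    return 0, (side - length) // 2, length      # pad symmetrically


def to_square(image, side, fill=0):
    """Centre-crop or pad a grayscale image onto a black side x side canvas."""
    height, width = len(image), len(image[0])
    src_y, dst_y, n_rows = centre_fit(height, side)
    src_x, dst_x, n_cols = centre_fit(width, side)
    canvas = [[fill] * side for _ in range(side)]
    for i in range(n_rows):
        canvas[dst_y + i][dst_x:dst_x + n_cols] = image[src_y + i][src_x:src_x + n_cols]
    return canvas


def resize_nearest(square, new_size):
    """Resize a square image with nearest-neighbour sampling."""
    old = len(square)
    return [
        [square[(y * old) // new_size][(x * old) // new_size] for x in range(new_size)]
        for y in range(new_size)
    ]


# Hypothetical 2x6 image whose top and bottom rows are missing; assumed FOV diameter 4.
toy = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
]
square = to_square(toy, side=4)
small = resize_nearest(square, new_size=2)
print(square)
print(small)
```

For the images in this work, the same two calls would use the measured FOV diameter as `side` and 512 as `new_size`.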

Feature Extraction
Optic Disk Elimination and Exudate Detection
A prerequisite for exudate detection is the removal of the optic disc before the process begins. This is essential because the optic disc appears with intensity, color and contrast similar to those of other attributes of the fundus image. The optic disc can be separated out as a high-contrast, circular area. It should be noted that vessels also appear with high contrast; however, they are distinctly smaller in area and number. A grayscale closing operator (φ) can be applied to remove the blood vessels that lie inside the optic disc region. For this purpose, a flat disc-shaped structuring element with a constant radius (B1) is used, as shown in Eq. (1), where B1 is the morphological structuring element.
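The effect of a grayscale closing with a flat disc-shaped structuring element can be illustrated with the sketch below. It is a plain-Python illustration rather than the MATLAB routine used in this work: closing is implemented as a dilation (neighbourhood maximum) followed by an erosion (neighbourhood minimum), pixels outside the image are simply ignored, and the function names, radius and toy image are hypothetical.

```python
def disc_offsets(radius):
    """Pixel offsets covered by a flat disc-shaped structuring element."""
    return [
        (dy, dx)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if dy * dy + dx * dx <= radius * radius
    ]


def _filter(image, offsets, reducer):
    """Apply `reducer` (max or min) over the structuring element at every pixel."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                image[y + dy][x + dx]
                for dy, dx in offsets
                if 0 <= y + dy < height and 0 <= x + dx < width
            ]
            row.append(reducer(neighbours))
        out.append(row)
    return out


def grey_closing(image, radius):
    """Grayscale closing: dilation followed by erosion with the same disc."""
    offsets = disc_offsets(radius)
    dilated = _filter(image, offsets, max)
    return _filter(dilated, offsets, min)


# Hypothetical bright region (200) crossed by a thin dark vessel (50) in column 2.
toy = [[200, 200, 50, 200, 200] for _ in range(5)]
closed = grey_closing(toy, radius=1)
print(closed)
```

Because the dark line is narrower than the disc, the closing fills it with the surrounding bright value, which is the behaviour relied on to suppress vessels inside the optic disc region.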
Exudates
The final image was thresholded at automatically selected gray levels to eliminate the regions of low intensity. To ensure that all the surrounding pixels of the threshold result were included in the selected region, a flat disc-shaped structuring element with a constant radius of 6 (B1) is used [15]. The steps used in exudate elimination are shown in Figure 4, and an example of optic disc elimination is shown in Figure 5.
Microaneurysm and Hemorrhage Detection
This step involves the use of a classification mechanism. The data is taken from Messidor and then preprocessed using image segmentation. Once that is done, the image is then selected and classified using an SVM classifier [32]. For the selection process, filter and wrapper approaches are used. Once the data is classified, the hemorrhages and microaneurysms are detected and plotted in the graph as shown [33].
Blood Vessels
By looking at the gray-level image from the red or green field of the sub-image containing the OD (Fig. 3, images R and G), we can see that the blood vessels within the OD act as strong distractors; hence they should be removed from the image beforehand. The vasculature is piecewise linear, and hence can be thought of as a structure comprising many similar connected linear shapes with a minimal length and a maximum width. As a general rule, these linear shapes are formed by a set of pixels with an almost constant gray-level value that is somewhat smaller than the gray-level values of the non-vessel pixels in their surroundings. Using a rotating linear structuring element of length and width both set as one, a linear shape can be identified by computing the statistical variance of the gray-level values of the pixels associated with it. The rotation with the minimum value is the one in which the vessel contains the structuring element and, conversely, the rotation with the highest value corresponds to the situation in which the element crosses the linear shape [34].

RESULTS AND DISCUSSION
Experiments are performed on separate sets of feature values for normal retinal fundus images and diabetes-affected retinal images. The feature values were computed for 20 images of each type using the digital image processing (DIP) toolbox of MATLAB [35]. Table 4 shows a summary of the extracted mean and standard deviation values for both normal and diabetic retinal fundus images. It is observed from Table 4 that the exudate, hemorrhage and MA values reduce to zero because of their complete absence in the normal retinal images. The mean difference over all the images was taken and ranked based on the feature ranking parameters. The normal range for each individual feature is shown in Table 3.
As depicted in Figure 9, the mean value of the exudate area for the normal retinal images is 0, and the mean value and standard deviation for the diabetic retinal images are 1029.7 and 12.246, respectively. Similarly, Figure 10 shows that the mean and standard deviation of the bifurcation point count are 332.245 and 20.120, respectively, for normal retinal images and 315.09 and 12.34, respectively, for diabetic retinal images.
It is evident from Figure 11 that the mean value and standard deviation of the blood vessel area are 34673 and 2311.86, respectively, for the normal retinal images and 38352.60 and 2839.39, respectively, for the diabetic retinal images. It can be observed from Figure 12 that the mean value and standard deviation of the Shannon entropy for the normal retinal images are 6.5001 and 0.226, respectively. From the same figure, the mean and standard deviation of the Shannon entropy for the diabetic retinal images are 5.855 and 0.08, respectively.
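For reference, the Shannon entropy of an image is commonly computed from its normalized gray-level histogram as H = -Σ p_k log2(p_k). The sketch below illustrates that computation in plain Python. It assumes a base-2 logarithm and a single-channel image, neither of which is stated explicitly in this work, and the function name and toy image are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(image):
    """Shannon entropy (in bits) of the gray-level distribution of an image."""
    pixels = [value for row in image for value in row]
    total = len(pixels)
    counts = Counter(pixels)
    # p_k is the relative frequency of gray level k; absent levels contribute nothing.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical 2x2 image with four equally likely gray levels.
toy = [[0, 1], [2, 3]]
print(shannon_entropy(toy))
```

Under these assumptions an 8-bit image has an entropy between 0 and 8 bits, which is the range in which the reported mean values fall.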
As demonstrated in Figure 13, the mean value and standard deviation of the optic distance for the normal retinal images are 928.80 and 10, respectively. From the same figure, the mean and standard deviation of the optic distance for the diabetic retinal images are 925.36 and 19.353, respectively.
It can be seen from Figure 14 that the mean value and standard deviation of the hemorrhage feature for the normal retinal images are 79323.4 and 28384.45, respectively. From the same figure, the mean and standard deviation of the hemorrhage feature for the diabetic retinal images are 90642.15 and 390000, respectively. In Figure 15, the mean value and standard deviation of the MA for the normal retinal images are 86188 and 23626, respectively. From the same figure, the mean and standard deviation of the MA for the diabetic retinal images are 63435 and 23759, respectively.
From the above analysis, we can conclude that exudates are the highest-ranked feature, owing to their complete absence from normal retinal images, followed by blood vessels, which have the highest mean difference. The absolute mean difference has also been computed using the method discussed in [2]. Further analysis of the exudate area, and of micro features extracted from it, may improve the performance of diabetic retinopathy detection systems.
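The per-feature statistics and the absolute mean difference used for ranking can be sketched as follows. This is an illustration only: the feature names and values are hypothetical and do not reproduce Table 4, the sample standard deviation is assumed (the text does not say whether the sample or population form was used), and the ranking rule shown is a plain sort on the absolute difference of means rather than the exact method of [2].

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) of one feature over a set of images."""
    return mean(values), stdev(values)


def rank_by_mean_difference(normal, diabetic):
    """Rank features by |mean(diabetic) - mean(normal)|, largest first."""
    differences = {
        name: abs(mean(diabetic[name]) - mean(normal[name]))
        for name in normal
    }
    return sorted(differences.items(), key=lambda item: item[1], reverse=True)


# Hypothetical feature values for three normal and three diabetic images.
normal = {"feature_a": [0, 0, 0], "feature_b": [10, 12, 14]}
diabetic = {"feature_a": [100, 110, 120], "feature_b": [11, 12, 13]}

ranking = rank_by_mean_difference(normal, diabetic)
print(summarize(diabetic["feature_a"]))
print(ranking)
```

Here `feature_a`, which is absent in the normal set and present in the diabetic set, is ranked first, mirroring the reasoning given for exudate area.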

CONCLUSION
In this paper, pre-processing and feature extraction of diabetic retinal fundus images are carried out for the detection of diabetic retinopathy using machine learning techniques. Pre-processing techniques such as green channel extraction, histogram equalization and resizing were performed using the DIP toolbox of MATLAB. The images were divided into two datasets: one of normal retinal images and the other of diabetes-affected retinal images. A total of 14 biologically significant features are extracted from the normal and diabetic retinal fundus image datasets. Of these, the seven most significant features are used for comparison, and ranking these features is a simple and fundamental step in distinguishing a normal fundus image from a diabetic one. From the results obtained, it is observed that exudate area is the best of all the features and can primarily be used for diabetic retinopathy detection, followed by blood vessels and other features, which suggests that exudates are one of the major features associated with diabetic retinopathy. The features used in this study were chosen for their biological relevance and previously reported results. In future, many more features can be extracted from attributes such as red lesions, Kapur entropy, edema, etc. Learning algorithms can be used to classify diabetic retinopathy images into multiple classes based on the feature values, and their performance may be evaluated on different measures.