Diabetic Retinopathy Detection Using Local Extrema Quantized Haralick Features with Long Short-Term Memory Network

Diabetic retinopathy is one of the leading diseases affecting the eyes. Lack of early detection and treatment can lead to total blindness of the diseased eyes. Recently, numerous researchers have attempted to produce automatic diabetic retinopathy detection techniques to supplement diagnosis and early treatment of diabetic retinopathy symptoms. In this manuscript, a new approach is proposed. The proposed approach utilizes features extracted from the fundus image using local extrema information with quantized Haralick features. The quantized features encode not only the textural Haralick features but also exploit the multiresolution information of numerous symptoms of diabetic retinopathy. The Long Short-Term Memory network, together with the local extrema pattern, provides a probabilistic approach to analyze each segment of the image with higher precision, which helps to suppress false-positive occurrences. The proposed approach analyzes the retina vasculature and hard-exudate symptoms of diabetic retinopathy on two different public datasets. The experimental results, evaluated using performance metrics such as specificity, accuracy, and sensitivity, reveal promising indices. Similarly, comparison with related state-of-the-art studies highlights the validity of the proposed method, which performs better than most of the studies used for comparison.


Introduction
Diabetic retinopathy (DR) is a medical term used to describe a sight-threatening ailment which appears on the retina. The disease is predominantly common among working-age people and, when not properly treated, can lead to total loss of vision [1,2]. Retinopathy is the leading cause of blindness among adults worldwide [3]. According to one study, over 239 million people were affected in the year 2010, with over 7.7 million in America alone. The number of Americans with diabetic retinopathy is expected, according to the new projection, to nearly double, from 7.7 million in 2010 to around 14.6 million by the year 2050 [4]. Owing to its enormity and the gravity of the devastation it causes, early detection and treatment have been among the key focuses of most health institutions around the world.
Diabetic retinopathy causes major changes in the retina vasculature structure, which comprises the major blood vessels that transport oxygenated blood and nutrients to the various parts of the retina. In some people with diabetic retinopathy, blood vessels may swell up and leak fluid, leading to the formation of abnormalities such as hard and soft exudates. In other situations, development of abnormal new blood vessels, blood vessel occlusion, and leakage of blood (hemorrhages) into a healthy portion of the retina are experienced. Detecting the signs of diabetic retinopathy involves proper identification of all the possible abnormalities, such as hard and soft exudates, hemorrhages, and blood vessel occlusions [2,3,5].
Therefore, an effective technique for realizing a reasonable diagnosis of a DR-diseased eye will involve identification and segmentation of the various features of the retina, such as the blood vessels, the optic disk, and many other anomalous symptoms of a diseased eye. Segmentation of blood vessels is one of the important aspects, whereby the retina vasculature is extracted from the fundus image for close examination of anatomical changes in the structures of these vessels, such as diameter, branching angle, and occlusion.
In the past, diagnosis of these retinal symptoms depended on manual segmentation of the retinal fundus image. This task requires ophthalmologist expertise, as the procedure is time-consuming, error-prone, and very complicated [6]. However, with the major breakthroughs in the machine learning and artificial intelligence fields, many algorithms have been developed by researchers to help accomplish the diagnosis of these abnormalities in retina fundus images. For example, ophthalmologists can use a properly segmented vessel map to examine and detect disease by identifying the growth of extra vessels or fluid-like features, their shapes, and their sizes. Different techniques have been adopted or proposed in many studies, some of which focus mainly on the segmentation of various features of the retina to examine anomalies. Other techniques take a more general approach, wherein a machine learning algorithm is developed and trained to classify a fundus image as healthy or nonhealthy with various stages of the disease. An anatomical overview of the retina is depicted in Figure 1.
The manuscript is divided into seven sections. Sections 1 and 2 comprise the introduction and literature review, while Section 3 presents the theoretical background of the key concepts used in the proposed approach. Section 4 discusses the proposed approach, and Section 5 presents the experimental results. Section 6 presents the discussion and comparison, and the manuscript is concluded in Section 7.

Related Literature
Quite recently, a number of studies have emerged, giving birth to computer-aided diagnostic systems used for the diagnosis of ophthalmic anomalies in retina fundus images. Blood vessel segmentation has seen one of the highest volumes of articles produced [2,7]. Most of the techniques used in segmentation can be grouped into two categories: supervised methods and unsupervised methods. The supervised methods use labeled data to train classifier algorithms, such as the support vector machine (SVM), to classify each pixel according to the labels. The unsupervised methods, in contrast, use no labeled data or any prior information about the disease. Unsupervised techniques include morphological operations, matched filtering approaches, and deformable models [1].
Yin et al. [8] used a probabilistic tracking-based method to segment the retinal blood vessels from fundus images using the STructured Analysis of the Retina (STARE) and Digital Retinal Images for Vessel Extraction (DRIVE) databases. Sensitivity and specificity of 0.75 and 0.95 were recorded, respectively.
In their submission, Ravichandran and Raja [9] applied a local entropy-based thresholding technique after preprocessing the fundus image. Wang et al. [10] utilized a supervised learning approach in which a Convolutional Neural Network (CNN) and a Random Forest (RF) were combined to classify vessel pixels after being trained with labeled data. This method performed well, but at the cost of a huge computational burden.
Sohini et al. [11] extracted major vessels in the preprocessing stage and then applied a Gaussian Mixture Model (GMM) classifier to tune and refine the final vessels from the rest of the image. Zhao et al. [12] used active contour techniques by applying graph cut to segment vessels, initially enhancing the image using a local phase filter. Zhao et al. [13] proposed a new infinite active contour model for the automated detection of retinal blood vessels.
Walter et al. [14] introduced an algorithm for microaneurysm detection using candidate extraction. Their technique initially enhances the image, then extracts and normalizes the green channel, followed by candidate detection with diameter closing and an automatic thresholding scheme. The classification of the microaneurysm pixel candidates was done based on kernel density estimation.
Similarly, Spencer et al. [15] and Frame et al. [16] both applied a candidate extractor approach in which shade correction was performed by subtracting a median-filtered background from the green channel image. They then used morphological operations based on top-hat transformations with twelve structuring elements to extract candidates. The resulting candidate pixels were subjected to contrast enhancement before finally being binarized.

Gray-Level Cooccurrence Matrix (GLCM).
Figure 2 shows how these offsets are considered. For a gray image I of size M × N with L gray-level values, the cooccurrence matrix C has size L × L, is defined over the entire image, and is parameterized by the offset (d, θ). Consider the original gray image in Figure 3(a) with gray level L = 3: if a horizontal offset is defined (i.e., d = 1, θ = 0°), the computed GLCM in Figure 3(b) is a 3 × 3 matrix whose entries are computed using Eq. (1). For instance, the highlighted entry "2" of the GLCM matrix (Figure 3(b)), which occurs at row (i = 3) and column (j = 1), was obtained by counting the number of pixel pairs on the horizontal offset whose gray values are exactly 3 and 1, respectively, in the original gray image of Figure 3(a). The normalized GLCM in Figure 3(c) represents the estimated probability of each GLCM combination occurring within the image. For each row of the gray image in Figure 3(a), there are 3 unique possibilities for pixel pairs on the horizontal offset; therefore, a total of 12 possibilities exist for the entire gray image. The normalized GLCM in Figure 3(c) is obtained by dividing each cooccurrence entry in Figure 3(b) by 12. This normalized GLCM, whose entries sum to one, is used to extract the Haralick features and can be considered a probability mass function of the gray-level pairs in the image.
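For concreteness, the horizontal-offset GLCM construction described above can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical 3-level image; it does not reproduce the exact example image of Figure 3(a).

```python
import numpy as np

def glcm(image, levels, d=1):
    # Count cooccurrences of gray-level pairs (i, j) along a horizontal
    # offset (d = 1, theta = 0 degrees); gray values assumed in 1..levels.
    C = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols - d):
            i, j = image[r, c], image[r, c + d]
            C[i - 1, j - 1] += 1
    return C

# Hypothetical 3 x 3 image with L = 3 gray levels.
img = np.array([[1, 1, 2],
                [2, 3, 3],
                [3, 1, 2]])
C = glcm(img, levels=3)
Cn = C / C.sum()   # normalized GLCM: entries sum to one
```

Each row contributes two horizontal pixel pairs, so the six counted pairs normalize to a probability mass function over gray-level pairs, exactly as described for Figure 3(c).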

Haralick Textural Features.
Haralick computed 14 different statistical features from the normalized GLCM matrix C_n, as exemplified in Figure 3(c). These features quantify essential local information and spatial features within the image. Though all 14 features may be useful in different textural analyses, in this research, after a few trial combinations of these features to extract the desired information, five of them stood out as very promising and hence are described in Eqs. (2)-(6).
Homogeneity: describes the measure of closeness of each GLCM element to its diagonal elements.
Entropy: the measure of randomness, or the degree of disorder, present in the image.
Energy: the root of the Angular Second Moment, which measures the local uniformity of the gray levels.
Correlation: this feature shows the linear dependency of gray-level values in the cooccurrence matrix.
where the terms μ_x, μ_y and σ_x, σ_y are the means and standard deviations of the cooccurrence matrix C_n summed along the horizontal and vertical spatial planes, respectively.
Contrast: contrast, sometimes also called inertia, indicates the measure of gray-level intensity variation between neighboring pixels.
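The five selected features can be computed directly from a normalized GLCM. The formulas below follow the standard Haralick definitions, which may differ slightly in convention from Eqs. (2)-(6):

```python
import numpy as np

def haralick_five(Cn):
    # Cn: normalized GLCM (entries sum to one), indexed from 0.
    L = Cn.shape[0]
    i, j = np.indices((L, L))
    eps = 1e-12                                   # guard against log(0), /0
    homogeneity = np.sum(Cn / (1.0 + np.abs(i - j)))
    entropy = -np.sum(Cn * np.log2(Cn + eps))
    energy = np.sqrt(np.sum(Cn ** 2))             # root of Angular Second Moment
    mu_x = np.sum(i * Cn)
    mu_y = np.sum(j * Cn)
    sd_x = np.sqrt(np.sum((i - mu_x) ** 2 * Cn))
    sd_y = np.sqrt(np.sum((j - mu_y) ** 2 * Cn))
    correlation = np.sum((i - mu_x) * (j - mu_y) * Cn) / (sd_x * sd_y + eps)
    contrast = np.sum((i - j) ** 2 * Cn)
    return [homogeneity, entropy, energy, correlation, contrast]
```

On a perfectly uniform 2 × 2 normalized GLCM, for example, the entropy is 2 bits and the correlation is zero, which is a quick sanity check of the implementation.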
Figure 2: GLCM center pixel neighborhoods offset.

3.3. Long Short-Term Memory (LSTM) Network. The LSTM network is a deep learning form of Recurrent Neural Network (RNN), first proposed in 1997 by Hochreiter and Schmidhuber [19]. It addresses the familiar vanishing/exploding gradient problems associated with traditional recurrent neural networks. The vanishing gradient problem gradually erodes the magnitude of the error gradients used to update the weights and biases. This prevents the network from further adjusting the weights and biases, and hence, learning eventually halts in the deeper layers of the network [13].
In the classical structure of an RNN (Figure 4), the network updates the weight vectors W at each hidden layer. The output of the hidden layer, h_t, at time stamp t is compared with the data label y(t) to compute the net error L(t) for that layer, which is used by gradient descent to minimize the network error. The net input to each hidden layer at time lag t comprises the input sequence x(t) and the weighted output W h_(t−1) of the adjacent hidden layer at time t − 1.
LSTM replaces each hidden layer with a gated structure called a cell (Figure 5), which has an additional connection to each layer through the cell state, C. The LSTM cell consists of three gates: the forget gate, the input gate, and the output gate. Each of these gates uses a sigmoid (σ) activation to control the amount of information flowing through the cell at various stages. The forget gate f_t controls the information to retain or forget from the previous time lag of the sequence in the adjacent layers, while the input gate i_t regulates the current internal cell state C, which holds the cell's net input after being squashed using the tanh activation function. The output gate o_t controls the cell output h_t, which is a squashed vector of the current cell state C_t regulated by the output gate.
The relations for the LSTM network gates, with biases, can be presented as Eq. (7):

f_t = σ(w_f · [h_(t−1), x_t] + b_f),
i_t = σ(w_i · [h_(t−1), x_t] + b_i),
C̃_t = tanh(w_C · [h_(t−1), x_t] + b_C),
C_t = f_t ⊙ C_(t−1) + i_t ⊙ C̃_t,
o_t = σ(w_o · [h_(t−1), x_t] + b_o),
h_t = o_t ⊙ tanh(C_t),          (7)

where [h_(t−1), x_t] denotes the concatenation of the previous output and the current input, C̃_t is the candidate cell state, and ⊙ denotes elementwise multiplication.
The LSTM network uses gradient descent with truncated Backpropagation Through Time (BPTT) to adjust the weights (w_f, w_i, w_C, w_o) and biases (b_f, b_i, b_C, b_o) to minimize the error between the network's outputs and the target outputs.
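A single forward step of the gated cell can be sketched under the standard LSTM formulation; the gate sizes and random initialization below are illustrative assumptions, not values from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    # One forward step of the gated cell. W and b hold weights/biases for
    # the forget (f), input (i), candidate (C), and output (o) gates; each
    # weight matrix acts on the concatenated vector [h_prev, x_t].
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])    # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde        # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(C_t)                  # cell output
    return h_t, C_t

# Minimal sketch with hypothetical sizes: 2 inputs, 3 hidden units.
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 5)) * 0.1 for k in "fiCo"}
b = {k: np.zeros(3) for k in "fiCo"}
h, C = lstm_step(rng.standard_normal(2), np.zeros(3), np.zeros(3), W, b)
```

Because the output is o_t ⊙ tanh(C_t), with o_t in (0, 1) and tanh bounded, every component of h_t stays inside (−1, 1).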

Proposed Method
In the proposed method, feature encoding uses the local extrema information at different quantization levels to construct the GLCM. The constructed normalized GLCM is used to extract the Haralick features of interest, which are subsequently encoded as a feature sequence to be fed to the LSTM network for training. An image I, quantized at gray level L_Q with gray local extrema information G_r = [I_min, I_max], where I_min and I_max represent the minimum and maximum of the local extrema gray-level intensity values, will have different Haralick features when these parameters are varied. The new modified image I_s, which captures both the quantization L_Q and G_r, can be computed using Eqs. (8) and (9).
I_int(m, n) = L_Q · (I(m, n) − I_min) / (I_max − I_min),   (8)

I_s(m, n) = ⌈I_int(m, n)⌉,   (9)

where I_int is an intermediary image and ⌈·⌉ is the ceiling operator that maps each computed value in I_int to the least integer greater than or equal to itself.
One of the common choices for the extrema G_r is the global minimum and maximum of the gray-intensity values in the image. Choosing G_r within a localized ROI of the image, at the same quantization level L_Q, will result in a different intensity-value distribution than for the same image whose G_r is the global extrema. Hence, the choice of the extrema G_r influences the GLCM matrix (see Figure 6).
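A minimal sketch of the extrema-parameterized quantization, assuming I_s = ⌈L_Q (I − I_min)/(I_max − I_min)⌉ as one plausible reading of Eqs. (8) and (9):

```python
import numpy as np

def quantize(I, L_Q, G_r=None):
    # Map gray image I into L_Q bins using the extrema G_r = (I_min, I_max);
    # the default is the global extrema of the image. Note ceil(0) = 0, so
    # bin indices run from 0 to L_Q.
    I = I.astype(float)
    I_min, I_max = G_r if G_r is not None else (I.min(), I.max())
    I_int = L_Q * (I - I_min) / (I_max - I_min)   # intermediary image
    return np.ceil(I_int).astype(int)

img = np.array([[0, 50], [200, 255]])
I8 = quantize(img, 8)                      # global extrema
# A local ROI extrema choice changes the bin assignment of the same pixels:
I8_local = quantize(img, 8, G_r=(0, 200))
```

Running this on the tiny example shows the effect described in the text: with global extrema, the pixel value 255 lands in bin 8, whereas with the local extrema (0, 200) it maps past the nominal range, so the intensity distribution of the quantized image changes with G_r.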
The proposed quantized Haralick features are formed using the five Haralick features of the normalized GLCM as described in Section 3. Initially, four different versions of the original gray image are computed at different quantization levels, i.e., L_Q = [128, 64, 16, 8] bins.
From each quantized version I_Q, a normalized GLCM is constructed and then used to extract the five Haralick features (i.e., homogeneity, entropy, energy, correlation, and contrast). For each quantized image, the minimum and maximum pixel intensity values are used as the extrema G_r. For instance, the extrema values for I_128 are given by Eq. (11).
In the end, all the Haralick features extracted from the normalized GLCMs of the four quantized images are concatenated to form a feature vector F_Haralick of length 20 (five features at each of the four quantization levels).

Segmentation.
Instead of applying the proposed feature extraction described above to the entire gray fundus image, the image is segmented into 64 × 64 windows. Where necessary, the image is padded with zeros to ensure that an integer number of 64 × 64 segments is generated. The proposed approach is then applied to each segmented window to form the 20 features. The objective of segmentation is to help find the probability of occurrence of a region of interest (ROI) where the diabetic retinopathy symptoms most likely occur. To compute this probability, prior information about the location of the symptoms is needed. This information is provided in the training datasets, where the infected regions were marked by experts. The probability for each segment is determined using a benchmarking method in which similarity measures between the features of a segment and the features of an actual ROI are compared. The ROI is simply the ground-truth image of the training data, as indicated in Figure 7(b).
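The windowing step can be sketched as follows, zero-padding so that each dimension becomes a multiple of 64:

```python
import numpy as np

def segment(image, win=64):
    # Zero-pad so both dimensions are multiples of `win`, then cut the
    # image into non-overlapping win x win windows.
    rows, cols = image.shape
    pad_r = (-rows) % win
    pad_c = (-cols) % win
    padded = np.pad(image, ((0, pad_r), (0, pad_c)))   # pads with zeros
    segs = []
    for r in range(0, padded.shape[0], win):
        for c in range(0, padded.shape[1], win):
            segs.append(padded[r:r + win, c:c + win])
    return segs

# A 100 x 130 image is padded to 128 x 192, yielding 2 x 3 = 6 windows.
segs = segment(np.ones((100, 130)), win=64)
```

Each returned window is then passed through the quantized-Haralick pipeline to produce its 20-element feature vector.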

Sequence Encoding for LSTM. The LSTM network is trained in a vector-to-sequence model. For any training image, its corresponding ground truth is used to extract the 20 proposed features. If there are N training samples, each ground truth (ROI) generates a vector F_i, for i = 1 to N, of length 20. The benchmark ROI feature vector F_ROI used to evaluate each segment is given as the average of these features, as in Eq. (12):

F_ROI = (1/N) Σ_(i=1)^N F_i.   (12)
To generate a label for each encoded segment, a correlation coefficient ρ is computed between the segment's encoded features and the benchmark ROI features F_ROI. If the computed ρ is less than 0.5, that segment is assigned the label none. A value of ρ between 0.5 and 0.7 is labeled mild, 0.7 to 0.89 is labeled strong, and 0.9 to 1 takes the label very strong. The 20 extracted features of every segment in the image are fed as sequential input to the LSTM, with the computed label as the label data; therefore, the network trains on a stream of sequential data. For a ROI encoded feature X and segment feature Y, the correlation coefficient ρ is computed using Eq. (13). The complete flowchart of the proposed method is depicted in Figure 8.
where E[·], μ, and σ are the expected value, mean, and standard deviation functions, respectively.
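A small sketch of the correlation and labeling rule. The text leaves the interval 0.89-0.9 unassigned; folding it into "very strong" below is an assumption made for completeness.

```python
import numpy as np

def rho(X, Y):
    # Pearson correlation coefficient between the ROI feature vector X
    # and a segment feature vector Y: E[(X - mu_X)(Y - mu_Y)] / (sd_X sd_Y).
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    num = np.mean((X - X.mean()) * (Y - Y.mean()))
    return num / (X.std() * Y.std())

def label(p):
    # Thresholds as stated in the text; values in the unstated 0.89-0.9
    # gap are treated as "very strong" here (an assumption).
    if p < 0.5:
        return "none"
    if p < 0.7:
        return "mild"
    if p <= 0.89:
        return "strong"
    return "very strong"
```

For perfectly proportional feature vectors, rho returns 1, which maps to the "very strong" label as expected.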

International Journal of Biomedical Imaging
The LSTM network is trained with the segments of each training image and their corresponding labels (correlation coefficients ρ) computed above. During testing, the network uses its learned model to predict the correlation coefficient ρ of a new segment presented to it. The result of the prediction, interpreted as the probability of occurrence of the symptoms within that segment, is used in subsequent stages to analyze and detect portions of the fundus image with symptoms. The complete flowchart of the proposed method is shown in Figure 9.

Experimental Results
The proposed approach was implemented on two separate popular public retinopathy datasets: the STARE [20] and ImageRet [21] datasets. ImageRet consists of two separate datasets, DIARETDB0 and DIARETDB1. DIARETDB1 consists of 89 fundus images, of which 5 are healthy samples while the remaining samples show light symptoms of diabetic retinopathy such as hemorrhages, microaneurysms, hard exudates, and soft exudates. We used this dataset for the detection of hard exudates. On the other hand, the STARE database was used for blood vessel segmentation. Apart from the general performance accuracy of the proposed approach, other performance metrics were considered. These metrics are similarity measures dependent on pixel-to-pixel template matching between the ground-truth template and its equivalent obtained using the proposed method. True positive (TP) and true negative (TN) are defined for correct classification: TP counts all the candidate pixels that are correctly classified as candidates, whereas TN gives the number of noncandidate pixels that are correctly identified as noncandidate pixels. For misclassification, false positive (FP) and false negative (FN) are defined: FP is where a noncandidate pixel is misclassified as a candidate pixel, whereas FN is where a candidate pixel is misclassified as noncandidate. The similarity measures considered are defined in Eqs. (14)-(19).
For example, true positive (TP) is computed as the number of white pixels in the intersection between the ground-truth binary image and the binary image obtained with our method, whereas true negative (TN) is the number of black pixels in that intersection. FP and FN are the numbers of white and black pixels, respectively, in the complementary set between the two templates.
Sensitivity = TP / (TP + FN),   (14)

Specificity = TN / (TN + FP),   (15)

Accuracy = (TP + TN) / (TP + TN + FP + FN),   (16)

Positive prediction value, PPV = TP / (TP + FP),   (17)

Negative prediction value, NPV = TN / (TN + FN),   (18)

Structural similarity index, SSIM(x, y) = ((2 μ_x μ_y + c_1)(2 σ_xy + c_2)) / ((μ_x^2 + μ_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2)),   (19)

where μ, σ, and c_1, c_2 are the means, standard deviations, and dynamic range constants of the template images x and y, respectively.
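A sketch of the pixel-wise counts and the first five measures on boolean masks (the SSIM computation is omitted here for brevity):

```python
import numpy as np

def similarity_measures(gt, pred):
    # Pixel-to-pixel counts between a ground-truth binary mask `gt` and a
    # predicted mask `pred` (both boolean arrays), then the derived measures.
    TP = np.sum(gt & pred)      # candidate pixels correctly detected
    TN = np.sum(~gt & ~pred)    # noncandidate pixels correctly rejected
    FP = np.sum(~gt & pred)     # noncandidates flagged as candidates
    FN = np.sum(gt & ~pred)     # candidates missed
    return {
        "sensitivity": TP / (TP + FN),
        "specificity": TN / (TN + FP),
        "accuracy": (TP + TN) / (TP + TN + FP + FN),
        "PPV": TP / (TP + FP),
        "NPV": TN / (TN + FN),
    }

gt = np.array([[1, 0], [1, 1]], dtype=bool)
pred = np.array([[1, 0], [0, 1]], dtype=bool)
m = similarity_measures(gt, pred)
```

On this 2 × 2 toy pair there are TP = 2, TN = 1, FP = 0, and FN = 1, so, for instance, the sensitivity is 2/3 and the specificity is 1.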

LSTM Implementation Information.
To realize the training of the LSTM network, the Python 3.8 programming language was used with TensorFlow and Keras as the libraries. Each cell of the LSTM has a look-back memory of 3, meaning that to compute the present output of the cells, it uses three previous results from the preceding segments. The input layer is made of 100 cells, and the output layer (dense layer) is made of 1 neuron, as indicated in the model summary in Figure 10. The network trained a total of 41,701 parameters in 100 iterations, with training and testing losses (RMSE) of 20.79 and 30.13, respectively.
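The stated configuration can be sketched in Keras. The text gives 100 cells, a single-neuron dense output, and 41,701 trainable parameters; the input shape of (20, 3) below (the 20 features as a sequence, each step carrying the 3-step look-back values) is an assumption, chosen because it reproduces the stated parameter count exactly: 4 × 100 × (3 + 100 + 1) = 41,600 LSTM weights plus 101 dense weights.

```python
# Sketch of the described LSTM model; the (20, 3) input shape is an
# assumption that reproduces the stated 41,701 trainable parameters.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20, 3)),   # 20-step sequence, 3-value look-back
    keras.layers.LSTM(100),       # 100 LSTM cells
    keras.layers.Dense(1),        # single output neuron (predicted rho)
])
model.compile(optimizer="adam", loss="mse")
```

Calling model.summary() on this sketch lists 41,701 trainable parameters, matching the figure quoted in the text.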

Hard-Exudate Detection. The output of the LSTM network determines the existence or nonexistence of hard exudates in a particular image segment based on the predicted ρ value. For ρ less than or equal to 0.5, hard exudate is assumed absent from the segment. Values of ρ greater than 0.5 indicate the presence of hard exudates at different strengths (mild, strong, and very strong).
However, to further classify pixels within a segment for template matching against the ground-truth image, additional processing is required. The previously computed quantized versions of the segment (I_8, I_16, I_64, I_128) are transformed using a nonlinear gamma transformation (Eq. (20)) to enhance their contrast. The gamma constant in the transformation is taken from the output score ρ of the segment from the LSTM network. These gamma-transformed segments are converted to binary images using Otsu global thresholding. Only segments with ρ > 0.5 are considered, and all pixels in segments with ρ ≤ 0.5 are classified as noncandidate. Pixels in segments with ρ > 0.5 are classified by computing the intersection of the binarized quantized versions. Figure 11 and Table 1 present extracts from the experimental results on the DIARETDB1 database using the proposed technique.

T(r) = c · r^ρ,   (20)
where c is a constant of the gamma transform.

5.3. Blood Vessel Segmentation. As a sign of diabetic retinopathy, blood vessels may swell up and leak fluid, leading to the formation of abnormalities such as hard and soft exudates. In other situations, development of abnormal new blood vessels, blood vessel occlusion, and leakage of blood (hemorrhages) into a healthy portion of the retina are experienced. Detecting the signs of diabetic retinopathy involves proper segmentation to give clues about any abnormal development on and around the vessels. The segmentation deploys the same approach as hard-exudate detection (using Eqs. (20) and (21)), except that a background estimation is used to remove the background from the preprocessed image before quantization.
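The shared enhancement-and-binarization step can be sketched as follows. The gamma-transform form s = c · r^γ with γ = ρ is an assumption consistent with the description, and Otsu's method is implemented directly so the sketch stays self-contained:

```python
import numpy as np

def gamma_enhance(seg, rho, c=1.0):
    # Nonlinear gamma transform s = c * r**gamma on normalized intensities,
    # with gamma taken from the LSTM output score rho (assumed form).
    r = seg.astype(float) / max(seg.max(), 1)
    return c * r ** rho

def otsu_threshold(img):
    # Otsu's method on a 256-bin histogram of a [0, 1]-valued image:
    # pick the threshold maximizing the between-class variance.
    hist, edges = np.histogram(img, bins=256, range=(0.0, 1.0))
    p = hist / hist.sum()
    cum = np.cumsum(p)
    mean_cum = np.cumsum(p * np.arange(256))
    mu_T = mean_cum[-1]
    best_t, best_var = 0.0, -1.0
    for t in range(1, 255):
        w0, w1 = cum[t], 1.0 - cum[t]
        if w0 == 0 or w1 == 0:
            continue
        mu0 = mean_cum[t] / w0
        mu1 = (mu_T - mean_cum[t]) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, edges[t + 1]
    return best_t

# Hypothetical segment with a clear dark/bright split.
seg = np.array([[10, 20, 200], [220, 15, 210]])
enhanced = gamma_enhance(seg, rho=0.8)
binary = enhanced > otsu_threshold(enhanced)
```

On this toy segment the three bright pixels end up above the Otsu threshold and the three dark pixels below it, which is the per-segment binarization fed into the intersection step.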
To estimate the background, the three channels of the RGB image are summed together and the resulting image is converted to binary using a threshold value of 100, as shown in Figure 12. Figure 13 presents the results of blood vessel segmentation using the proposed method, while Table 2 tabulates a sample of the performance measure results of automatic blood vessel segmentation on the STARE database.
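The background-estimation step can be sketched directly. Whether the True pixels are subsequently treated as background or foreground depends on the convention adopted after Figure 12; here pixels whose channel sum exceeds 100 are simply marked True:

```python
import numpy as np

def channel_sum_mask(rgb, threshold=100):
    # Sum the three RGB channels (cast to int to avoid uint8 overflow)
    # and threshold the result, as described in the text.
    summed = rgb.astype(int).sum(axis=2)
    return summed > threshold

# Hypothetical 2 x 2 RGB image: one pixel sums above the threshold.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = (60, 40, 30)          # channel sum 130 > 100
mask = channel_sum_mask(rgb)
```

The cast to int before summing matters: summing uint8 channels directly would wrap around at 256 and corrupt the threshold comparison.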

Comparison and Discussion
Detection of a particular symptom of diabetic retinopathy is quite a challenging task. This is because some of these symptoms have similar textural compositions and intensity distributions, which makes them hard to differentiate using simple textural or intensity-distribution analysis. For instance, the retina vasculature structure, when viewed through the green channel of the fundus image, exhibits an intensity distribution similar to that of the optic disk, which sits right on top of the nerve center where the vascular structures originate. Separating such instances therefore needs more than just filtering; a much more robust approach is needed to distinctively separate and scale up the tiniest differences existing between these features. The use of different quantization levels significantly enhances the possibility of differentiating overlapping features in the image. Each quantization level transforms the features into a different resolution, whereby some features that are not visible at a higher quantization level suddenly become visible and can therefore easily be analyzed. Symptoms like blood hemorrhages and microaneurysms are usually tiny spots whose detection is often challenging, because any noise in the processed image can take the form of these symptoms. The use of sequential training with LSTM becomes very handy and efficient here, since its output can be used to decide whether a segment of the image has the symptom or not. This approach ensures that a smoothed output is obtained and that only segments with a higher probability of symptoms are further postprocessed. In Table 3, we present comparison results between the proposed approach and other state-of-the-art approaches.

Conclusions
A new approach for detecting symptoms of diabetic retinopathy has been proposed. An algorithm which thoroughly and comprehensively analyzes the retina vascular structure and hard exudates has been developed. The approach encodes a powerful feature representation of the analyzed symptoms, which facilitated improved performance compared to its counterparts in the literature. The use of different quantization levels transforms the spatial image domain into different resolution domains and significantly helps to improve the interclass difference between features of similar textural and intensity distributions. Moreover, the LSTM model has been very effective and helps to mitigate the presence of noise and false-positive occurrences in the final postprocessed output image. It also reduces the time needed to postprocess the image, as only segments with a higher probability of symptom occurrence are considered.
In summary, the results obtained are impressive and validate the relevance and efficiency of the proposed approach in this context. Despite this success, the proposed method does not cover the detection of other symptoms of diabetic retinopathy, such as hemorrhages and microaneurysms; detecting those symptoms might present different challenges.

Conflicts of Interest
The authors declare that they have no conflicts of interest.