Optic Disc and Optic Cup Segmentation for Glaucoma Detection from Blur Retinal Images Using Improved Mask-RCNN



Introduction
Glaucoma damages the optic nerve (ON) because of an imbalance of intraocular pressure within the eye. The affected nerve fibers cause deterioration of the retinal layer and give rise to an enlarged optic disc (OD); the OD is a part of the retina, and the optic cup (OC) is the central portion of the OD. Glaucoma is typically diagnosed by obtaining the patient's medical history, measuring intraocular pressure (IOP), conducting visual field loss tests, and manually assessing the OD with ophthalmoscopy to examine the shape and color of the ON [1]. The cup-to-disc ratio (CDR) is one of the key structural image cues considered for glaucoma identification. The CDR compares the diameter of the OC with the diameter of the OD; a CDR below 0.5 is considered normal [2]. Timely detection of the disease can prevent blindness [3]. Hence, segmenting the affected area is not only advantageous for more rigorous medical analysis by the ophthalmologist but also useful for designing automated systems for disease categorization [4]. Traditionally, experts identify eye abnormalities through manual examination of the glaucoma regions by calculating the CDR, diameter, and boundary variations [5]. However, because few experts are available, timely identification of the eye abnormality is often delayed [6], whereas early detection and treatment of the disease can save the patient from complete blindness. To tackle these challenges, the research community is targeting disease identification via Computer-Aided Diagnosis (CAD) based solutions.
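To make the CDR criterion concrete, the following minimal sketch computes a cup-to-disc ratio from two binary masks and applies the 0.5 threshold mentioned above. Measuring the diameter vertically, the mask representation (rows of 0/1 values), and all function names are assumptions made for illustration; they are not taken from the paper.

```python
def vertical_extent(mask):
    """Vertical diameter (in pixels) of a binary mask given as rows of 0/1."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def cup_to_disc_ratio(cup_mask, disc_mask):
    """Vertical CDR: cup diameter divided by disc diameter (assumed convention)."""
    disc = vertical_extent(disc_mask)
    if disc == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc


def is_cdr_normal(cdr, threshold=0.5):
    """A CDR below 0.5 is treated as normal, following the text."""
    return cdr < threshold


# Toy example: a disc spanning 8 rows and a cup spanning 3 rows.
disc_mask = [[0] * 5] + [[0, 1, 1, 1, 0] for _ in range(8)] + [[0] * 5]
cup_mask = [[0] * 5 for _ in range(3)] + [[0, 0, 1, 0, 0] for _ in range(3)] + [[0] * 5 for _ in range(4)]
cdr = cup_to_disc_ratio(cup_mask, disc_mask)
```

In a segmentation pipeline, the two masks would come from the predicted OD and OC regions rather than from hand-written lists.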
In research, deep learning (DL) based approaches [3, 4, 7–20] have been utilized to identify glaucoma signs from retinal images. In [21], an end-to-end RCNN method for joint OD and OC segmentation was proposed. In joint-RCNN, OD and OC proposal networks were used to create bounding box (BB) proposals for the OD and OC, respectively. The presented technique is computationally complex because it utilizes two distinct RCNNs to calculate the BBs of the regions of interest (ROIs). Therefore, a more reliable technique is required that can detect the glaucoma-affected region efficiently. In [22], a region-based pixel density calculation method was used for OD localization. Afterward, OD segmentation was performed through the Circular Hough Transform method. The procedure is efficient and robust for OD segmentation; however, its recognition performance degrades on images with pathological distractions. In [3], the authors adapted DenseNet into a U-Net shaped framework for OD and OC segmentation. The method comprised three major phases: (i) preprocessing, (ii) FC-DenseNet model design, and (iii) segmentation of the OD and OC. In the first phase, the green channel was extracted from the RGB images; after that, the OD region within two OD diameters was collected and used for model training. In the second phase, the model was built from three kinds of blocks, that is, dense, transition down, and transition up. In the final phase, refinement was performed to extract the OD and OC through the Softmax operation. The method [3] was evaluated over five different datasets and achieved good results with a short testing time. However, it has some shortcomings: (i) the calculation of the OD centre depends on ground-truth (GT) data, (ii) the training time is high, and (iii) training was done on a small set.
In [18], an eighteen-layer CNN architecture was proposed for glaucoma localization, which has two main components: (i) a convolutional and max-pooling layer phase and (ii) a fully connected layer phase. The method was evaluated on 1426 images and achieved an accuracy of 98.13%. However, the performance of the method in [18] degrades on unseen samples, and it may not detect glaucoma at early stages.
In [15], Lu et al. presented a weakly and semisupervised segmentation method based on a modified U-Net model for OD segmentation. Initially, the GrabCut technique was employed to generate the GTs. The U-Net model was improved by reducing the original U-shape structure and adding a 2-dimensional convolutional layer at the end of the convolutional layer. This method needs a smaller amount of training; however, it shows lower accuracy than other methods due to the lack of GTs. Elangovan et al. [23] proposed an approach for glaucoma identification based on a CNN consisting of 18 layers. The technique has different phases: preprocessing, key points computation, and classification. Initially, image resizing and data augmentation were performed; furthermore, rotation augmentation was applied to increase the number of samples. Features were extracted through a CNN that has four convolutional layers, two pooling layers, and a fully connected layer. For performance evaluation of the method, different datasets were used, namely, ORIGA, DRISHTI-GS1, RIM-ONE2, LAG, and ACRIMA. In [24], the authors presented the attention-based CNN (AG-CNN) technique for glaucoma recognition. They also created a new database called large-scale attention-based glaucoma (LAG), which has a total of 11760 retinal images, each labeled as glaucoma negative or positive. The AG-CNN method comprised two main stages; in the first, the attention prediction subnet was used to learn the ROI of glaucoma and predict the attention map. Secondly, the predicted map was utilized in the localized region, and the feature map of this subnet was visualized to locate the pathological region. Lastly, the located region was merged with the predicted attention to combine the input and the subnet's glaucoma key points for computing the binary labels of glaucoma. The method in [24] shows good performance and reduces the redundancy of fundus images; however, it depends on the attention prediction subnet.
Existing techniques perform well on standard datasets but do not generalize well to real-world scenarios. The main reasons for the performance degradation are blurring, noise, and light variations introduced during image capture, whereas the standard datasets are acquired in a controlled environment. In this work, our main motivation is to propose a technique that can localize and segment fundus samples in the presence of such factors. We have selected standard datasets, namely, the ORIGA and HRF databases, which contain light variations and noise effects but lack blurriness. So, in this work, we have added blurriness to the samples of these datasets and proposed a novel technique, namely, a DenseNet-77 based [25] customized Mask-RCNN, to detect and segment the OC and OD from fundus samples. The following are the main contributions of our work: (1) The proposed method can precisely segment the OD and OC for glaucoma diagnosis from retinal images in the presence of blurring, noise, and light variations in the input images. (2) We have created the annotations that are essential for training the proposed model, because the available datasets do not have BB and mask GTs. (3) The OD and OC are localized and segmented accurately owing to the effective region proposal network of the Custom Mask-RCNN, which works in an end-to-end manner. (4) Extensive experiments are performed over the challenging ORIGA dataset to show the robustness of the presented framework. Moreover, we have performed cross-dataset validation over the HRF database to demonstrate the generalization power of our technique to real-world scenarios.

Materials and Methods
The retinal images collected from different clinics can contain various artifacts such as blurring, noise, out-of-focus regions, or light variations, which must be removed to enhance the segmentation performance of the system. In our work, we have employed the feature level set technique to correct the bias field and applied a median filter to reduce noise in the retinal images.

International Journal of Optics

Preprocessing.
The augmentation step is employed to increase the number and diversity of the image samples. For this purpose, the input images are rotated by angles of 0°, 90°, 180°, and 270°, and Gaussian blur [26] is applied to them to add blurriness.
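The augmentation described above can be sketched in plain Python as four right-angle rotations followed by a separable Gaussian blur. The kernel radius, the value of sigma, the border handling (clamping), and the single-channel list-of-rows image representation are assumptions for illustration; the paper does not report these settings.

```python
import math


def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotations(image):
    """Return the image rotated by 0, 90, 180, and 270 degrees."""
    out = [[list(row) for row in image]]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    offsets = range(-radius, radius + 1)
    return [
        [sum(k * row[min(max(x + d, 0), width - 1)] for d, k in zip(offsets, kernel))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    """Swap rows and columns."""
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def augment(image, sigma=1.0):
    """Four rotations of the image, each followed by Gaussian blur."""
    return [gaussian_blur(rotated, sigma) for rotated in rotations(image)]
```

For color fundus images, the same functions would be applied per channel; in practice an image library would normally replace these hand-written loops.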
Furthermore, we have generated the annotations for the OC and OD regions. A GT mask for each retinal image is needed in the training procedure to detect the glaucoma regions, that is, the OD and OC. We used the VGG Image Annotator [27] tool to create a polygon mask for every image. Figure 1 presents sample images and the related mask images. The annotations are saved in a JSON file that contains the set of polygon points for the OD and OC regions. This file is used to generate the mask image related to each retinal image.
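As an illustration of how polygon points stored in JSON can be turned into a mask image, the sketch below rasterizes a polygon with an even-odd ray casting test on pixel centers. The JSON layout (a `regions` list with `all_points_x` and `all_points_y` under `shape_attributes`) is assumed to resemble a polygon export from the annotation tool, and the `label` attribute name is hypothetical; the authors' actual file structure is not given in the paper.

```python
import json


def point_in_polygon(px, py, xs, ys):
    """Even-odd ray casting test for the point (px, py)."""
    inside = False
    j = len(xs) - 1
    for i in range(len(xs)):
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (ys[i] > py) != (ys[j] > py):
            x_cross = xs[i] + (py - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
            if px < x_cross:
                inside = not inside
        j = i
    return inside


def polygon_to_mask(xs, ys, height, width):
    """Rasterize a polygon into a binary mask by testing pixel centers."""
    return [
        [1 if point_in_polygon(x + 0.5, y + 0.5, xs, ys) else 0 for x in range(width)]
        for y in range(height)
    ]


def masks_from_annotation(text, height, width):
    """Build one mask per labeled region from a JSON annotation string."""
    record = json.loads(text)
    masks = {}
    for region in record["regions"]:
        shape = region["shape_attributes"]
        label = region["region_attributes"]["label"]  # hypothetical attribute name
        masks[label] = polygon_to_mask(
            shape["all_points_x"], shape["all_points_y"], height, width
        )
    return masks


# Toy annotation with a single square region labeled "OD".
annotation = json.dumps({
    "regions": [{
        "shape_attributes": {"name": "polygon",
                             "all_points_x": [1, 4, 4, 1],
                             "all_points_y": [1, 1, 4, 4]},
        "region_attributes": {"label": "OD"},
    }]
})
od_mask = masks_from_annotation(annotation, 6, 6)["OD"]
```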

Localization and Segmentation of OD and OC Using Custom Mask-RCNN.
Our objective is the automated detection and segmentation of the OD and OC from fundus images with complicated backgrounds and in the presence of postprocessing operations, without any human involvement. We aim to identify glaucoma-affected and nonaffected areas in a given sample by utilizing the Mask-RCNN [28] approach. The introduced approach (as shown in Figure 2) comprises the following steps: (1) key points computation, (2) region proposal network (RPN), (3) region of interest (ROI) classifier and bounding box regressor (BBR), and (4) OD and OC segmentation. All steps are explained in detail in the following.

Features Extraction.
In our approach, we have used DenseNet-77 at the feature extraction level of the Mask-RCNN to compute the key points from a given sample. Utilizing DenseNet-77 for feature computation improves both the segmentation accuracy and the computational complexity. The starting layers compute low-level key points from the images, that is, edge and corner information, and the deep layers calculate high-level key points, that is, structure and chrominance information. The extracted feature map is further enhanced through the FPN, which calculates key points with improved object representation at diverse scales for the RPN module.
The DenseNet [25] model is an improved form of ResNet in which each layer is connected to all preceding layers. DenseNet contains a set of dense blocks that are linked consecutively by additional convolutional and pooling layers between dense blocks. DenseNet can represent complex transformations, which to some degree alleviates the loss of the target's position information in the top-level key points. DenseNet reduces the total number of parameters, which makes it cost-effective. Furthermore, it supports the propagation of key points and encourages their reuse, which makes it more suitable for region classification in retinal images. So, in this paper, we have employed DenseNet-77 as the feature extractor for Mask-RCNN. The structure of the DenseNet-77 model is shown in Figure 3, which also indicates the input sample size to be accommodated before computing key points from each layer. The complete flow of the proposed method is presented in Algorithm 1.
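The feature reuse in a dense block can be illustrated with simple channel arithmetic: because every layer receives the concatenated outputs of all earlier layers, the input width grows by a fixed growth rate per layer. The numbers used below (64 input channels, growth rate 32, six layers, compression 0.5) are illustrative values only and are not the DenseNet-77 configuration, which the text defers to Figure 3.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Input channel count seen by each layer of a dense block, plus the block output.

    Layer i receives the block input concatenated with the outputs of the
    i preceding layers, each of which adds `growth_rate` channels.
    """
    inputs = [in_channels + growth_rate * i for i in range(num_layers)]
    return inputs, in_channels + growth_rate * num_layers


def transition_channels(channels, compression=0.5):
    """Channels after a transition layer that compresses the feature maps."""
    return int(channels * compression)


# Illustrative values, not the DenseNet-77 settings.
layer_inputs, block_output = dense_block_channels(64, 32, 6)
after_transition = transition_channels(block_output)
```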
DenseNet-77 has two main differences from the traditional DenseNet: (i) it has a smaller number of parameters than the original model and (ii) the layers within each dense block are adjusted to reduce the computational complexity. Table 1 presents the details of the training parameters for the Custom Mask-RCNN.

Region Proposal Network.
The feature map calculated in the previous step is passed as input to the RPN module to produce ROIs. We use a 3 × 3 convolutional layer to scan the input with a sliding window and produce anchors, that is, BBs of varying scales distributed over the whole input sample. The RPN module generates almost 20k anchors of varying scales and dimensions, which overlap each other to cover the entire image. A classifier decides whether an anchor holds an object or background (fg/bg). The BBR produces BBs according to a set intersection-over-union (IoU) value. Precisely, if the IoU between an anchor and a GT box is greater than 0.7, the anchor is categorized as positive; otherwise, it is marked as negative.
The RPN module may generate overlapping regions; therefore, nonmaximum suppression is used to keep the regions with the highest foreground score and discard the remaining ones. The final ROIs are passed to the succeeding step for classification.
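The anchor labeling rule and the nonmaximum suppression step of the RPN can be sketched as follows. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the NMS overlap threshold of 0.5 is a hypothetical default, since the text only states that the highest-scoring overlapping region is kept. This is a plain-Python illustration, not the authors' implementation.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7):
    """Mark an anchor positive (1) if its best IoU with any GT box exceeds
    the threshold, and negative (0) otherwise, as described in the text."""
    return [
        1 if max((iou(a, g) for g in gt_boxes), default=0.0) > pos_thresh else 0
        for a in anchors
    ]


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: visit boxes by descending foreground score and keep a box
    only if it does not overlap an already kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Toy example with one GT box and three anchors.
gt = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (0, 0, 10, 8), (5, 5, 15, 15)]
labels = label_anchors(anchors, gt)

proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(proposals, [0.9, 0.8, 0.7])
```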

ROI Classification and Bounding Box Regression.
This module accepts two inputs: the proposed ROIs and the feature map from the previous steps. In contrast to the RPN module, this part is deeper; it assigns a specific class to each ROI, glaucoma or nonglaucoma, and improves the location of the BB. The main objective of the BBR is to refine the location and dimensions of the BB so that it correctly captures the glaucoma region. Typically, the margins of an ROI do not align with the granularity of the feature map because the computed feature map is shrunk k times relative to the actual image size. To handle this, the ROIAlign layer is used to compute fixed-length key point vectors for arbitrarily sized candidate areas. The ROIAlign layer employs bilinear interpolation to avoid the misalignment problems that occur in the ROI pooling layer, which relies on quantization.
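The role of bilinear interpolation in ROIAlign can be illustrated with a small sketch that samples a feature map at fractional coordinates instead of rounding them. For brevity, this version takes a single sample at the center of each output bin, which is a simplifying assumption; the function names and the list-of-rows feature map are also illustrative.

```python
import math


def bilinear_sample(feature, x, y):
    """Sample a 2D feature map (list of rows) at a fractional location (x, y)."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2):
    """Pool an ROI (x1, y1, x2, y2 in feature-map coordinates) to a fixed
    out_size x out_size grid without quantizing the ROI boundaries."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [
        [bilinear_sample(feature, x1 + (j + 0.5) * bin_w, y1 + (i + 0.5) * bin_h)
         for j in range(out_size)]
        for i in range(out_size)
    ]


# Toy feature map whose value at (x, y) is x + y.
feature_map = [[float(x + y) for x in range(4)] for y in range(4)]
pooled = roi_align(feature_map, (0.5, 0.5, 2.5, 2.5))
```

Because the ROI corners at 0.5 and 2.5 are used as they are, the pooled values follow the underlying feature map exactly, whereas rounding the corners first would shift the sampled region.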

Segmentation Mask.
This module accepts the ROIs marked positive by the ROI classifier as input and computes a segmentation mask of dimension 28 × 28, represented by floating-point values that hold more detail than binary masks. In the training step, the GT masks are resized to 28 × 28 to compute the loss against the predicted mask; the predicted mask is later scaled up to the actual size of the ROI BB to give the final mask. The presented framework uses a multitask loss $L$ on all sampled ROIs, given as follows:

$$L = L_{bclass} + L_{ref} + L_{smask}$$

Here, $L_{bclass}$, $L_{ref}$, and $L_{smask}$ denote the box class label estimation loss, the BB refinement loss, and the segmentation mask prediction loss, respectively. $L_{bclass}$ is the log loss over the two categories (glaucoma/nonglaucoma), given as follows:

$$L_{bclass}(P_t, l) = -l \log P_t - (1 - l) \log (1 - P_t)$$

where $P_t$ is the predicted probability that anchor $t$ holds glaucoma and $l$ is the GT label. About 20k anchors of distinct scales and sizes are generated, which overlap each other to cover the image. If an anchor has an intersection over union (IoU) higher than 0.5 with a GT box, it is classified as a positive anchor; otherwise, it is negative. If several anchors overlap too much, we keep the one with the highest foreground score and discard the rest (referred to as nonmaximum suppression). Moreover, the value of $l$ is 1 for true-marked anchors and 0 otherwise. The BB regression loss is given as follows:

$$L_{ref} = \sum_{j} \mathrm{smooth}_{L1}(c_j - c_j^{*})$$

where

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

Here, the vector $c_j$ holds the four dimensions of the estimated BB, and $c_j^{*}$ holds the dimensions of the GT box related to the true-marked anchors. The smooth-L1 function is a robust L1 loss that is less sensitive to outliers than the L2 loss. When regression targets are unbounded, training with the L2 loss can lead to exploding gradients and requires a carefully tuned learning rate.
During the training of Mask-RCNN, the average cross-entropy loss is used, which is calculated as follows:

$$L_{smask} = -\frac{1}{N^2} \sum_{1 \le x, y \le N} \left[ p_{xy} \log V_{xy}^{k} + (1 - p_{xy}) \log \left(1 - V_{xy}^{k}\right) \right]$$

where $p_{xy}$ is the pixel value at location $(x, y)$ in a GT mask of size $N \times N$ and, for the same pixel, $V_{xy}^{k}$ is its estimated value in the mask obtained for class $k$ ($k = 1$ for the glaucoma region and 0 for the nonglaucoma region) [28].
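The three loss terms described in this section can be sketched in plain Python. The forms used here are the standard smooth-L1 and binary cross-entropy definitions commonly associated with Mask-RCNN [28]; treating them as the exact expressions used by the authors, and summing the terms without weights, are assumptions.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def box_refinement_loss(pred, target):
    """Sum of smooth L1 over the four box coordinates."""
    return sum(smooth_l1(c - c_star) for c, c_star in zip(pred, target))


def binary_log_loss(p, label, eps=1e-7):
    """Log loss for one prediction; label is 1 (glaucoma) or 0 (nonglaucoma)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mask_loss(gt_mask, pred_mask):
    """Average per-pixel binary cross-entropy over an N x N mask."""
    n_pixels = len(gt_mask) * len(gt_mask[0])
    total = sum(
        binary_log_loss(v, p)
        for gt_row, pred_row in zip(gt_mask, pred_mask)
        for p, v in zip(gt_row, pred_row)
    )
    return total / n_pixels


def multitask_loss(class_loss, refine_loss, seg_loss):
    """Unweighted sum of the three terms (assumed weighting)."""
    return class_loss + refine_loss + seg_loss
```

The quadratic region of `smooth_l1` keeps gradients small for nearly correct boxes, while the linear region bounds the gradient for large errors, which is why it is less affected by outliers than a squared loss.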

Results and Discussion
We have implemented the model using the Keras and TensorFlow libraries with DenseNet-77 and FPN for feature extraction. We initialized the model with pretrained weights obtained from the COCO dataset and employed transfer learning to fine-tune the model on retinal datasets for OD and OC segmentation. For experimentation, the data are randomly divided into training (70%) and test (30%) sets.
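A random 70-30 split of this kind can be sketched as below. The fixed seed and the function name are assumptions added for reproducibility of the illustration; the paper does not state how the random division was seeded or implemented.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and split it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Example with 650 image identifiers.
train_ids, test_ids = split_dataset(range(650))
```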

Dataset.
The evaluation experiments of the system were performed on the ORIGA "Online Retinal Fundus Image Database for Glaucoma Analysis" dataset [29]. The details of the dataset are presented in Table 2. The dataset has a total of 650 images, of which 168 are glaucomatous samples and the remaining 482 are nonglaucomatous samples, gathered from the "Eye Research Institute, Singapore." In each image, the OD and OC regions are marked by experts using a vertical, nonrotated ellipse. Sample images are shown in Figure 4.

Evaluation Parameters.
The proposed method is assessed by employing the intersection over union (IOU), as described in Figure 5, where A denotes the GT rectangle and B denotes the estimated rectangle with the ROI. A region is counted as detected when the IOU value is greater than 0.5; otherwise, it is not recognized. The average precision (AP) is widely used to evaluate the precision of object detectors such as R-CNN, SSD, and YOLO. The geometrical explanation of precision is shown in Figure 6. In our framework for the detection of glaucoma regions, AP depends on the IOU [30].
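The criteria above can be written compactly as follows, with A the GT rectangle and B the estimated rectangle. The symbols TP and FP (true and false positive detections) are introduced here for illustration and follow the usual definition of precision; they do not appear in the original text.

```latex
\[
\mathrm{IOU}(A, B) = \frac{|A \cap B|}{|A \cup B|},
\qquad
\text{region detected} \iff \mathrm{IOU}(A, B) > 0.5
\]
\[
\mathrm{Precision} = \frac{TP}{TP + FP}
\]
```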

Results.
This section presents the results achieved after performing the experiments over diverse samples with variations in light, color, and region size, and with the presence of blurring. For the OD, the visual results that show the detection accuracy of the presented framework are reported in Figure 7. It can be observed from the results that the proposed method can accurately localize the OD regions and separate them from the healthy areas despite discontinuous or blurry boundaries and artifacts in the fundus images. Moreover, the Mask-RCNN method can precisely segment the OD regions by overcoming the challenges of location, shape, and size.
Furthermore, the visual results for the segmented OC regions are shown in Figure 8. From the reported results, it can be seen that our method can accurately localize and segment the OC regions under different conditions owing to the representative set of features extracted by DenseNet-77 and the segmentation power of Mask-RCNN. However, its localization and segmentation power may slightly decrease for samples with intense color variations, where the affected regions match the color of healthy regions. The proposed method can accurately recognize the OD and OC with an average accuracy of 0.965 on the ORIGA dataset. Moreover, the proposed technique can precisely segment the OD and OC by overcoming the challenges of blurriness and variations in location, size, and shape.
To further understand the performance of our method, we have used the evaluation parameters accuracy, precision, recall, F-measure, and IOU. Table 3 presents the results of the proposed approach. We can observe that the presented framework has achieved an average precision, recall, F-measure, and IOU of 0.965, 0.963, 0.97, and 0.972, respectively. Moreover, the confusion matrix of the proposed approach is presented in Figure 9.
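For reference, the evaluation parameters named above can be computed from detection counts as in the following sketch. The counts in the example are hypothetical and are not taken from Table 3 or Figure 9; the definitions are the usual ones for precision, recall, F-measure, and accuracy.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall, and F-measure from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def accuracy(tp, tn, fp, fn):
    """Fraction of all decisions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts for illustration only.
p, r, f = precision_recall_f(tp=90, fp=10, fn=5)
acc = accuracy(tp=90, tn=95, fp=10, fn=5)
```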

DenseNet-77 Framework Evaluation.
We performed an analysis to evaluate the robustness of the DenseNet-77 framework for eye disease detection by comparing it with other DL approaches. To accomplish this, the accuracy of the introduced Mask-RCNN with DenseNet-77 is compared with other base models, that is, Inception-v4 [31], VGG-16 [32], ResNet-101 [33], ResNet-152 [33], and DenseNet-121 [34]. Table 4 shows the comparative analysis of the presented method with the other frameworks in terms of both model parameters and detection accuracy. The results of this comparative analysis indicate that the custom Mask-RCNN with DenseNet-77 works better than Inception-v4, VGG-16, ResNet-50, ResNet-101, ResNet-152, and DenseNet-121. Moreover, from Table 4, it can be seen that VGG-16 has the highest number of model parameters, whereas ResNet-152 is the most expensive approach in terms of execution time. In contrast, the presented framework with the DenseNet-77 model is computationally the most efficient and took only 1067 seconds for execution. The main reason for the efficient performance of DenseNet-77 is its shallow architecture, which reuses framework parameters efficiently without using redundant key point maps. This structure gives DenseNet-77 a much smaller number of framework parameters, whereas the comparative techniques suffer from high computational cost and are unable to show efficient classification performance for samples with noise, blurring, scale, and angle variations. Therefore, the presented technique better tackles the issues of the comparative models by introducing a robust network for feature extraction that can represent complicated transformations, leading to enhanced detection accuracy under postprocessing attacks as well. From the conducted analysis, it can be summarized that our customized Mask-RCNN with the DenseNet-77 framework exhibits better performance than the other deep learning models in terms of both accuracy and efficiency.

Evaluation of the Custom Mask-RCNN Model.
In this section, we have compared the performance of the introduced methodology with other region-based segmentation methods, that is, RCNN and Faster-RCNN, over the ORIGA database, and the results are reported in Figure 10 [35], and Fu et al. [8]. These techniques are capable of detecting glaucoma from retinal images. However, they require intense training and exhibit lower accuracy for training samples with the class imbalance problem. The comparison results are presented in Table 5. Our framework has acquired the highest average precision, recall, and AUC, that is, 0.965, 0.963, and 0.96, respectively, which signifies the reliability of the proposed method in comparison with the other methods. Unlike these methods, our model performs segmentation on the localized ROIs, which limits the space of segmentation, and uses the ROIAlign layer, which ultimately improves the accuracy of the final segmentation result. We have plotted a box plot for the cross-dataset evaluation in Figure 11; the training and test accuracies are spread across the number line into quartiles, median, whiskers, and outliers. According to the figure, we achieved an average accuracy of 98% for training and 97.7% for testing, which shows that the proposed work performs well on unseen samples as well. Therefore, it can be concluded that the introduced framework is robust for OD and OC localization and segmentation.

Conclusions
In this paper, we presented a deep learning technique that customizes Mask-RCNN for precise and automated segmentation of the OD and OC from retinal images. We introduced the DenseNet-77 model at the feature computation level of Mask-RCNN to compute more diverse key points, which assist in accurately localizing the OD and OC regions under various sample conditions. We have tested our framework over a challenging database, namely, ORIGA, and performed cross-dataset validation on the HRF database to show its robustness. The results show that the improved Mask-RCNN can compute deep features with a more effective representation of glaucoma regions than existing systems and can serve as a new automated tool for diagnostic purposes. Moreover, both the qualitative and quantitative results show that the Custom Mask-RCNN works better than the base framework. Although our approach has shown better OD and OC detection accuracy, it can be further enhanced by the inclusion of other recent DL-based techniques such as EfficientNet. Furthermore, we plan to extend our work to other medical abnormalities.

Data Availability
Data sharing is not applicable to this article, as the authors have used publicly available datasets, whose details are included in the Experimental Results section of this article.