Deep Learning Methods for Glaucoma Identification Using Digital Fundus Images

In this survey we analyzed the literature, evaluated the methods for glaucoma identification and identified the main issues faced by other researchers. The literature shows that most computer-aided diagnosis (CAD) tools for identifying pathological changes in the eye fundus are at an early stage of development. The glaucoma classification accuracy achieved by different methods ranges from 87.50% to 99.41%. However, these results were obtained with different data sets and with images of different quality. Further research is therefore needed to develop an algorithm using a data set that covers a wider range of images of varying quality. It is also necessary to assess the advantages and disadvantages of the existing methods and to compare their classification results under the same experimental conditions.


Introduction
Glaucoma ranks second among the most common eye diseases in the world. Approximately 60 million cases were reported worldwide in 2010, and the number of affected patients is forecast to increase to about 80 million by 2020 (Quigley et al., 2006) and to 111.8 million by 2040 (Tham et al., 2014). Glaucoma is an eye disease in which the pressure inside the eye increases, the field of view narrows, the optic nerve begins to atrophy and vision deteriorates. A prolonged increase in intraocular pressure (IOP) damages the nerve fibers in the eye and the optic nerve, resulting in a narrowing of the field of view and impaired vision (Gayathri et al., 2014). The pressure in a healthy eye is typically 10-20 mmHg, although it varies from person to person. The optic nerve begins at the optic disc (OD), which is about 1.5 mm in diameter, bright yellow and round, with a recess in the center (Wu et al., 2019). The optic nerve transmits nerve impulses between the retina and the brain. The place on the retina where the optic nerve begins is called the optic nerve head (ONH). A complete ocular examination is necessary for an accurate diagnosis of glaucoma. Intraocular pressure is measured with a test called tonometry. However, intraocular pressure changes constantly, and in some people with glaucoma it may be normal. Besides elevated intraocular pressure, glaucoma of any type is characterized by glaucomatous depression and atrophy of the optic disc and by typical changes in the eye (Hirota et al., 2020). Together with the optic disc, the fovea (FOV) is an important anatomical landmark on the posterior pole of the retina. The fovea is a depression in the inner retinal surface whose photoreceptor layer consists entirely of cones and which is specialized for maximum visual acuity. The fovea is responsible for sharp central vision and is located in the center of a darker area (Niemeijer et al., 2009).
Another important parameter in glaucoma detection is the cup-to-disc ratio (CDR) (Gayathri et al., 2014). The larger the ratio, the more empty space there is in the nerve head; this space may be left behind when nerve cells die. The severity of glaucoma increases with the CDR. Most healthy people have a CDR of about 0.3 on average. Mild, moderate and severe glaucoma might have a CDR of 0.4, 0.5-0.7 and above 0.7, respectively. Mild glaucoma is the slowest form of the disease, and changes in peripheral vision are not noticeable. Moderate glaucoma is primarily characterized by a decrease in the visual field, already in the peripheral regions. Visual function is significantly reduced, resulting in reduced efficiency (Acharya et al., 2015). To determine whether a person shows signs of glaucoma, a glaucoma risk index (GRI) can be helpful. Mookiah et al. (2012) and Acharya et al. (2015) proposed GRI calculations based on significant features; the range of the GRI may differ depending on the feature extraction method selected. Mookiah et al. (2012) calculated a GRI of 33.159±0.012 for the normal class and 4.701±0.003 for the glaucoma class using HOS and DWT features, while Acharya et al. (2015) combined ranked features extracted from the Gabor transform coefficients and calculated a GRI of 8.68±1.67 for the normal class and 4.84±2.08 for the glaucoma class.
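The CDR ranges quoted above can be illustrated with a minimal Python sketch. It assumes that the ratio is taken between a cup diameter and a disc diameter measured in the same unit. The function names and the exact cut-offs (in particular, how the gaps between 0.3, 0.4 and 0.5 and the boundary at 0.7 are handled) are assumptions made for illustration. They are not clinical criteria and are not taken from the cited studies.

```python
def cup_to_disc_ratio(cup_diameter: float, disc_diameter: float) -> float:
    """Return the cup-to-disc ratio for two diameters given in the same unit."""
    if disc_diameter <= 0:
        raise ValueError("disc diameter must be positive")
    if not 0 <= cup_diameter <= disc_diameter:
        raise ValueError("cup diameter must lie between 0 and the disc diameter")
    return cup_diameter / disc_diameter


def cdr_severity(cdr: float) -> str:
    """Map a CDR value to an illustrative severity label.

    Assumed cut-offs (hypothetical, derived from the ranges in the text):
    below 0.4 -> normal range, 0.4 up to 0.5 -> mild,
    0.5 to 0.7 inclusive -> moderate, above 0.7 -> severe.
    """
    if not 0.0 <= cdr <= 1.0:
        raise ValueError("CDR must lie in [0, 1]")
    if cdr < 0.4:
        return "normal range"
    if cdr < 0.5:
        return "mild"
    if cdr <= 0.7:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    # Hypothetical measurements: a 0.75 mm cup inside a 1.5 mm disc.
    ratio = cup_to_disc_ratio(0.75, 1.5)
    print(ratio, cdr_severity(ratio))
```

A label of this kind is only a coarse summary. In practice the CDR is one of several parameters considered together with the clinical examination.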
To avoid the unpleasant and irreversible effects of glaucoma, it is necessary to visit an eye doctor for preventive check-ups and have vision tests performed (Raghavendra et al., 2018). Various tests are needed to diagnose glaucoma accurately. Because primary glaucoma is asymptomatic, it is difficult to detect at its early stage. Hirota et al. (2020) described the variability of rim width (RW), optic disc margin (DM) and rim margin (RM) and the complexity of finding rim thinning manually. The detection of rim thinning in a severe glaucoma case (CDR≥0.7) was successful for only 1 of the 5 clinicians. Computer-aided diagnosis (CAD) tools have therefore recently become an important research topic, and researchers are already developing CAD tools for ophthalmologists to use in identifying diabetic retinopathy, blood vessel changes, myopia and hypertension (Stabingis et al., 2017), (Stabingis et al., 2018). In parallel, CAD tools for glaucoma identification are also under development. CAD tools use digital fundus images (Figure 1) captured with a fundus camera and help to assess retinal health using different computational algorithms (Raghavendra et al., 2018). The key diagnostic parameters for the automated detection of glaucoma are:
- cup-to-disc ratio (CDR);
- optic disc (OD);
- optic nerve head (ONH);
- fovea (FOV).

Glaucoma detection process using digital fundus images
The automatic diagnosis of glaucoma can be divided into five stages: image preprocessing, image segmentation, feature extraction, image classification and performance analysis. These stages are described in more detail in the following subsections.

Image segmentation
Image segmentation divides a digital image into several segments to simplify it for further analysis. Segmentation of digital fundus images incorporates:
- a geometric parameter model to detect the vessel centerlines (Matsopoulos et al., 2008);
- the Hough transform to identify the margin of the optic disc (Niemeijer et al., 2009);
- probability maps to localize the center of the optic disc (Niemeijer et al., 2009);
- a template-based method (Niemeijer et al., 2009);
- label-preserving transformations and dropout to avoid over-fitting on the images (Chen et al., 2015).

Performance evaluation
Overall accuracy, specificity, sensitivity, area under the curve (AUC), positive predictive value (precision) and the glaucoma risk index are the main criteria used to evaluate the performance of a glaucoma identification algorithm. More details are given by Singh et al. (2019).
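As an illustration of how these criteria are computed for a binary normal/glaucoma classifier, the following Python sketch uses the standard confusion-matrix definitions and the rank-based (Mann-Whitney) formulation of the AUC. The class and function names, and the example counts in the usage part, are hypothetical and are not taken from the cited studies. The glaucoma risk index is omitted because its definition depends on the features chosen in each study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier; 'positive' means glaucoma here."""
    tp: int  # glaucoma images classified as glaucoma
    fp: int  # normal images classified as glaucoma
    tn: int  # normal images classified as normal
    fn: int  # glaucoma images classified as normal


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a class is absent.
    return numerator / denominator if denominator else float("nan")


def evaluate(c: ConfusionCounts) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity and PPV (precision)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
    }


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts for 100 test images.
    print(evaluate(ConfusionCounts(tp=45, fp=5, tn=40, fn=10)))
    print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

The pairwise AUC is quadratic in the number of images, which is acceptable for test sets of the size discussed here. A sort-based implementation would be preferable for much larger sets.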

Literature Survey
The literature shows that most algorithms for the automated detection of glaucoma (Table 1) apply feature extraction and classification techniques. Feature extraction techniques include Higher Order Spectra (HOS) (Mookiah et al., 2012), the Discrete Wavelet Transform (DWT) (Mookiah et al., 2012), (Matsopoulos et al., 2008) and the Gabor transform (Acharya et al., 2015). The classes are predicted using, among others, k-Nearest Neighbor (k-NN) (Matsopoulos et al., 2008), (Niemeijer et al., 2009), Support Vector Machine (SVM) (Gajbhiye et al., 2015), (Mookiah et al., 2012), (Acharya et al., 2015), Convolutional Neural Network (CNN) (Mitra et al., 2018), (Chen et al., 2015), (Raghavendra et al., 2018) and Artificial Neural Network (ANN) (Matsopoulos et al., 2008), (Gayathri et al., 2014). The architectures of the algorithms that achieved the highest accuracy are detailed in Table 2. The detailed review of recent studies on the localization of glaucoma diagnostic parameters (Table 1) has shown that classification accuracy varies depending on the data set. The overall classification accuracy achieved ranges from 87.50% to 99.41%. Although researchers are actively developing algorithms, it is very difficult to summarize and compare the results because different data sets are used.
A case-by-case analysis of the data sets shows that neural networks have the following advantages over conventional classifiers:
- A CNN does not include a preprocessing stage that might influence the performance accuracy (Raghavendra et al., 2018);
- A CNN extracts features automatically, which saves considerable computation time and memory (Raghavendra et al., 2018);
- A CNN does not require image segmentation, as it is powerful enough to provide the significant features for describing normal and glaucoma subjects (Raghavendra et al., 2018);
- A CNN can process even lower-quality images (Mitra et al., 2018);
- A CNN processes the entire image and encodes valuable information about its various features (Mitra et al., 2018);
- As a CNN processes the full image during training, it is able to generate a completely new dataset (Mitra et al., 2018);
- Adjusting the depth and size of the latent space changes its dimension and properties. A CNN is able to extract these properties, from which the classification result is obtained (Stabingis et al., 2017);
- Autoencoders can select features that serve as input for succeeding layers instead of considering the image as a whole.
As neural networks have shown remarkable advantages, we decided to analyze the architectures of the recently proposed methods (Table 2). The neural networks used in the latest studies have different architectures. Many methods have used their own data sets, and there are no generalized results on how those data sets interact with each other. A separate study would be needed to identify the advantages and disadvantages of these architectures, as the differences between them are quite large. A further study would be needed to assess how these algorithms perform on different data sets and on images of different quality, including an evaluation of computation time, learning speed and classification accuracy.

Data sets for analysis
Although standard databases are used, different investigators select specific cases, different data sets and different data augmentation techniques (Mitra et al., 2018), (Chen et al., 2015). The data sets used in recent studies are: (Acharya et al., 2015).
- The MESSIDOR (WEB, c) dataset is one of the largest open-source data sets and consists of 1200 eye fundus images. The images were acquired using a 3CCD camera on a Topcon TRC NW6 non-mydriatic retinograph with a 45° field of view. As most of the original images have blank space around the retina, they are cropped and resized (Mitra et al., 2018).
- The DRIVE (WEB, a) dataset consists of 40 images, of which 33 show no signs of diabetic retinopathy and 7 are identified as showing mild early diabetic retinopathy. The data were provided by a diabetic retinopathy screening program in the Netherlands. A Canon CR5 non-mydriatic 3CCD camera with a 45° field of view was used for image acquisition.
The listed data sets RIM-ONE r2, ORIGA, STARE, EGPS, KMC, MESSIDOR and DRIVE are publicly available and can be used in our further work. In addition, Vilnius University has received a bioethics permit and has begun collecting its own data, which will be made publicly available to other researchers. Kaggle, SCES and the data set provided by the Abramoff and Suttorp-Schulten program are not available.

Conclusion and Discussion
To get acquainted with the latest work in automated glaucoma detection, the following tasks were carried out:
- literature survey;
- algorithm analysis;
- evaluation of the classification accuracy obtained by classical methods and neural networks;
- review of the data sets that are publicly available and can be used for future research;
- analysis of neural network architectures.
Recent research shows that most algorithms have been developed for fairly high-quality images only, and that there are no algorithms that work for lower-quality images captured by hand-held cameras. The main goal of recent algorithms is to solve an image classification problem. Currently, the overall image classification accuracy achieved ranges from 87.50% to 99.41%. However, due to the different architectures, these results and the parameters extracted by the neural network for the image classification stage are difficult to interpret. The purpose of our future work is therefore:
- to develop algorithms that can handle images of different quality obtained by different cameras;
- to perform experiments with images of different quality and evaluate their impact on neural network learning;
- to link the neural network to biomedical parameters so that the obtained results are understandable;
- to evaluate the advantages and disadvantages of the existing architectures;
- to evaluate the impact of neural network architecture on classification results;
- to experiment with the number of layers and the depth of the latent space.