Automated Glaucoma Detection Using Support Vector Machine Classification Method

Glaucoma is an eye disease that can result in blindness if it is not detected and treated in proper time. Increased intraocular pressure (IOP) of the fluid in the eye often causes glaucoma. Glaucoma is the second leading cause of blindness in the world and is called as the


INTRODUCTION
The eyes, with which we view the world, are the most complex, delicate and sensitive organs in our human body. They are responsible for four fifth of all the information our brain receives. Glaucoma is the term applied to a group of eye diseases that gradually result in loss of vision by permanently damaging the optic nerve, the nerve that transmits visual images to the brain [1]. Eduard Jaeger (1854) described glaucoma as the silent thief of sight which is a specific optic nerve disease with the progressive break down of nerve fibers [2]. The eye is filled with a fluid (aqueous), which has a stable pressure called intraocular pressure (IOP). This fluid is continuously formed within the eye and is also simultaneously drained out to maintain a stable IOP. IOP is increased due to the blockage of the normal outflow mechanism. This situation damages the optic nerve of the eye. This elevation in IOP is generally associated with the development of glaucoma, although additional factors may be also responsible in its development. In some cases, glaucoma may occur in the presence of normal eye pressure. This form of glaucoma is believed to be caused by poor regulation of blood flow to the optic nerve. Glaucoma is the diagnosis given to a group of ocular conditions that contribute to the loss of retinal nerve fibres with a corresponding loss of vision. Glaucoma therefore is a disease of the optic nerve, the nerve bundle which connects the eye to the brain and relays the visual signal [2].
There are two main types of glaucoma: "openangle" and "closed-angle" (or "angle closure") glaucoma [2]. Open-angle chronic glaucoma is painless, develops slowly over time. It often shows no symptoms until the disease has progressed significantly. It is treated with either glaucoma medication to lower the pressure, or with various pressure-reducing glaucoma surgeries. Unlike open-angle glaucoma closedangle glaucoma, is characterized by sudden ocular pain, redness, nausea and vomiting, and other symptoms resulting from a sudden spike in IOP. It is treated as an ocular emergency. The iridocorneal angle between the iris and the cornea is used to differentiate open angle glaucoma and closed angle glaucoma [2].
As lost capabilities of the optic nerve cannot be recovered, it is necessary to detect glaucoma at an early stage. Subsequent treatment is essential for affected patients to preserve their vision. Manual analysis of the eye is time consuming and the accuracy of the parameter measurements also varies with different clinicians. Optical Coherence Tomography (OCT) and Heidelberg Retinal Tomography (HRT) are used in glaucoma detection but the cost of these methods is very high [3]. As an alternative, many ophthalmologists use fundus cameras to diagnose glaucoma [4]. Image processing techniques allow to extract the features that can provide useful information to diagnose glaucoma. This paper presents a novel method for to detect glaucoma using digital fundus images. At first, the images of eye fundus are pre-processed. Then, Principal Component Analysis (PCA) method is used for extracting features from preprocessed images. After PCA, the modified images are fed into a Support Vector Machine (SVM) classifier for training purpose. The performance of the classifier is tested by cross validation approach. Now, the classifier can distinguish between a normal eye fundus and a glaucoma affected eye fundus up to a certain level of accuracy.

LITERATURE REVIEW
Several studies are reported in literature for detection of optic disk and detection and classification of glaucoma.
In year 2006, Kevin Noronha performed a work, "Enhancement of retinal fundus Image to highlight the features for detection of abnormal eyes" [5]. Different techniques that are used to detect main features of retinal fundus images such as fovea, optic disk, exudates and blood vessels are specified in this work. Author finds the brightest part of the fundus and applies Hough transform to determine the optic disk and its centre.
In year 2007, Sangyeol Lee performed a work, "Validation of Retinal Image Registration Algorithms by a Projective Imaging Distortion Model" [6]. A variety of methods for retinal image registration have been proposed. Authors also present the validation tool for any retinal image registration method by tracing back the distortion path and accessing the geometric misalignment from the coordinate system of reference standard.
In year 2008, S. Sekhar performed a work," Automated localization of retinal optic disk using hough transform" [7]. In this work, the proposed methodology consists of two steps: in the first step, region of interest (ROI) is found by image by means of morphological processing, and in the second step, optic disk is detected using the Hough transform.
In year 2010, Zhuo Zhang performed a work," ORIGA-light: An Online Retinal Fundus Image Database for Glaucoma Analysis and Research" [8]. Author present an online dataset, ORIGAlight, which aims to share clinical retinal images with the public. Author had updated the system continuously with more clinically verified groundtruth images. The proposed method focuses on optic disk and cup segmentation.
In year 2010, Vahabi Z proposed," The new approach to Automatic detection of Optic Disc from non-dilated retinal images" [9]. Author describes a new filtering approach like Sobel edge detection, Texture Analysis, Intensity and Template matching to detect Optic Disc. The proposed algorithm is applied in wavelet domain on 150 images of Messidor dataset.
In year 2011, Zafer Yavuz performed a work," Retinal Blood Vessel Segmentation Using Gabor Filter and Tophat Transform" [10]. In this paper, Author gave a method for retinal blood vessels segmentation by applying firstly Gabor filter to enhance blood vessels and then applying top-hat transform. Later on, the output is converted to binary image with p-tile thresholding.
In year 2012, Nilan jan Dey performed a work," Optical Cup to Disc Ratio Measurement for Glaucoma Diagnosis Using Harris Corner" [11]. In this paper, CDR is determined using Harris Corner. Harris comer detector measures the local changes of the signal with patches shifted in different directions by a small amount. It is based on the local auto-correlation function of a signal.
In year 2012, R. Geetha Ramani performed a work," Automatic Prediction of Diabetic Retinopathy and Glaucoma through Retinal Image Analysis and Data Mining Techniques" [12]. This paper proposed a novel approach for automatic disease detection. Retinal image analysis and data mining techniques are used to accurately categorize the retinal images as either normal, Diabetic Retinopathy and Glaucoma affected.
In year 2013, S. Sri Abirami and S. J Grace Shoba proposed "Glaucoma Images Classification Using Fuzzy Min-Max Neural Network Based On Data-Core" [13]. In this method, the Optical Coherence Tomography (OCT) images are preprocessed which includes colour conversion, resizing and pruning before it is given as input for classification. This classification technique uses Fuzzy min-max neural network algorithm.
In year 2014, Sheeba O., Jithin George, Rajin P. K., Nisha Thomas, and Sherin George proposed "Glaucoma Detection Using Artificial Neural Network" [14]. Here neural network is trained to recognize the parameters for the detection of different stages of the disease. The neuron model has been developed using feed forward back propagation network.

PROPOSED METHODOLOGY
The block diagram of the proposed system for the detection of glaucoma is shown in Fig. 1. The various steps in our proposed method i.e image pre-processing, feature extraction and dimensionality reduction and image classification are detailed in the following section

Image Pre-processing
Retinal images are acquired with a digital fundus camera which captures the illumination reflected from the retinal surface. Despite controlled conditions, many retinal images suffer from nonuniform illumination given by several factors: the curved surface of the retina, pupil dilation (highly variable among patients), or presence of diseases among others [14]. The curved retinal surface and the geometrical configuration of the light source and camera lead to a poorly illuminated peripheral part of the retina with respect to the central part. Pre-processing is the step taken before the major image processing task. The problem here is to perform some basic tasks in order to render the resulting image more suitable for the job to follow [14]. In this case it may involve enhancing the contrast, removing noise. Pre-processing can dramatically improve the performance of image processing methods like Image transform, Segmentation, Feature extraction and disease detection. Several techniques have been used to enhance retinal images. Pre-processing also eliminates disease independent variations from the input image [14].
In our proposed method, 100 digital eye fundus images are taken for training the SVM classifier.
At first, some preprocessing tasks are required to be done on the input images as mentioned above. This includes normalizing the images, colour conversion from RGB to grayscale, resizing the images, removal of noise from the images and contrast adjustment for improving image quality. In our method, instead of preprocessing some particular regions of images, entire images are preprocessed before feature extraction and classification tasks.

Normalization
In our method, the original raw input images are normalized in the range [0, 1], resized to 256 X 256 and converted to grayscale before noise removal and contrast improvement. In image processing, normalization is a process that changes the range of pixel intensity values.
Applications include photographs with poor contrast due to glare, for example. Normalization is sometimes called contrast stretching or histogram stretching. In more general fields of data processing, such as digital signal processing, it is referred to as dynamic range expansion. The motivation is to achieve consistency in dynamic range for a set of data, signals, or images and also Principal Component Analysis method cannot be applied on unnormalized images.

Colour conversion
Then the normalized RGB images are converted to grayscale images as grayscale images are easier to be operated upon for preprocessing tasks like contrast enhancement than true colour images.

Resizing
After that, the grayscale images are resized to 256 X 256 as SVM classifier cannot be trained using images of different dimensions. Also, resizing images improves efficiency in implementation and reduces runtime overhead.

Noise removal
Image noise is random variation of brightness or colour information in images, and is usually an aspect of electronic noise. It can be produced by the sensor and circuitry of a scanner or digital camera. Noise reduction is the process of removing noise from the input image. Our goal is to remove the noise from the image in such a way that the original image is discernible [14]. We have used two -dimensional Gaussian Filter for noise removal. A Gaussian filter smooths an image by calculating weighted averages in a filter box. It is more effective in smoothing images than normal mean filter.

Improvement of image contrast
After noise removal, Adaptive Histogram Equalization technique is used for improving image contrast. Histogram equalization increases the dynamic range of the histogram of an image and intensity value of pixels in the input image. The output image contains a uniform distribution of intensities and increased contrast of an image than the original image. Adaptive Histogram Equalization differs from ordinary histogram equalization in the respect that the adaptive method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image. It is therefore suitable for improving the local contrast [14]. This step also removes the non-uniform background which may be due to non-uniform illumination or variation in the pigment colour of eye.

Feature Extraction Using Principal Component Analysis Method
In this paper, the normalizing features were extracted by using Principal Component Analysis (PCA) method and the features were fed to the SVM classifier. Principal Component Analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. The principal components are orthogonal because they are the eigenvectors of the covariance matrix, which is symmetric. PCA is sensitive to the relative scaling of the original variables. PCA is a powerful tool for analysing data because it is a simple, non-parametric method of extracting relevant information from confusing data sets. The other main advantage of PCA is when identifying patterns in the compress data (or by reducing the number of dimension) the information loss is very less. The number of principal components is less than or equal to the number of original variables.
After pre-processing step, we get pre-processed input images with size 256 X 256. These images are basically 256 X 256 matrices with normalized pixel values where rows correspond to observations and columns correspond to samples or data dimensions. The steps of PCA are as follows: Step 1: Subtract the mean For PCA to work properly, we have to subtract the mean from each of the data dimensions. The mean subtracted is the average across each dimension. This produces a data set whose mean is zero.
Step 2: Calculate the covariance matrix In probability theory and statistics, covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behaviour, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behaviour, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. A covariance matrix is a matrix whose element in the i, j position is the covariance between the i th and j th elements of a random vector (that is, of a vector of random variables). Intuitively, the covariance matrix generalizes the notion of variance to multiple dimensions. For n dimensional image data, the covariance matrix will be n Xn.

Step 3: Calculate the eigenvectors and eigenvalues of the covariance matrix
Since the covariance matrix is square, we can calculate the eigenvectors and eigenvalues for this matrix. These are important as it gives useful information about the data.

Step 4: Choosing components and forming a feature vector
If we look at the eigenvectors and eigenvalues from the previous step, we notice that the eigenvalues are quite different values. In fact, it turns out that the eigenvector with the highest eigenvalue is the principle component of the data set. In general, once eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest. This gives the components in order of significance. Now, we can decide to ignore the components of lesser significance. We do lose some information, but if the eigenvalues are small, we don't lose much. If we leave out some components, the final data set will have less dimensions than the original. To be precise, if we originally have n dimensions in your data, and so we calculate n eigenvectors and eigenvalues, and then we choose only the first p eigenvectors, then the final data set has only p dimensions. What needs to be done now is we need to form a feature vector, which is just a fancy name for a matrix of vectors. This is constructed by taking the eigenvectors that we want to keep from the list of eigenvectors, and forming a matrix with these eigenvectors in the columns.
Feature Vector = (eig 1 eig 2 eig 3 …eig n ) Step 5: Deriving the new data set This is the final step in PCA, and is also the easiest. Once we have chosen the components (eigenvectors) that we wish to keep in our data and formed a feature vector, we simply take the transpose of the vector and multiply it on the left of the original data set, transposed.

RowFeatureVector x RowDataAdjust
where RowFeatureVector is the matrix with the eigenvectors in the columns transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top, and RowDataAdjust is the mean-adjusted data transposed, ie. the data items are in each column, with each row holding a separate dimension. It will give us the original data solely in terms of the vectors we chose. Basically we have transformed our data so that it is expressed in terms of the patterns between them. Now the columns of the matrix FinalData are separate features which are linearly uncorrelated.
In order to get the original data back, we can use the equation shows below:

Classification using Support Vector Machine (SVM)
A classification task usually involves separating data into training and testing sets. Each instance in the training set contains one target value (i.e. the class labels) and several attributes (i.e. the features or observed variables).
We have used support vector machine (SVM) classifier, a supervised learning model, for classifying normal eye fundus from glaucoma affected eye fundus. The goal of SVM is to produce a model (based on the training data) which predicts the target values of the test data given only the test data attributes [15]. In our case, the input image matrices modified after applying preprocessing techniques and PCA in the previous steps serve as test data.
SVM is designed to separate of a set of training images into two different classes, (x 1 , y 1 ), (x 2 , y 2 ),…(x n , y n ) where x i in R d , d-dimensional feature space, and y i in {-1,+1}, the class label, with i=1..n. SVM builds the optimal separating hyper planes based on a kernel function (K). All images, of which feature vector lies on one side of the hyper plane, belong to class -1 and the others are belong to class +1 [16].
As we can see from Fig. 3, H3 does not separate the two classes while H1 separate the two class with a small margin, only H2 gives a maximum margin between two classes, therefor it's the right hyper plane used by support vector machine.
If the data of various classes can be separated as in Fig. 3, then the linear SVM is used.
Otherwise if the data of the classes cannot be separated, then the non-linear SVM classifier is used. However, instead of defining a function for the hyper plane itself; we define the margin in between the two classes. From Fig. 4, we can see that the position of our hyper plane depends on the value of w where w is the (not necessarily normalized) normal vector to the hyper plane.
More formally, linear SVM classifier function can be defined as, such that for each training sample x i the function gives f(x i )> 0 for y i = +1, and f(x i ) < 0 for y i = -1. In other words, training samples of two different classes are separated by the hyperplane f(x) = w T x + b = 0, where w is weight vector and normal to hyperplane, b is bias or threshold and x i is the data point.
The nonlinear SVM classifier is defined as, The transformation from non-linear to linear separating hyperplane in higher dimensional feature space is done by taking help of kernel functions. A kernel function on two samples, represented as feature vectors in some input space, is defined as k(x i , x j ) = ϕ(x i ) T ϕ(x j ), ϕ is the feature vector. Most commonly used kernels are: where r is a free parameter trading off the influence of higher-order versus lower-order terms in the polynomial, d is the degree of the polynomial, slope γ>0 where σ>0 is an adjustable free parameter; higher σ means that the kernel is a "flatter" Gaussian and so the decision boundary is "smoother"; lower σ makes it a "sharper" peak, and so the decision boundary is more flexible where γ is the slope and r is the intercept constant. It is also called Hyperbolic Tangent kernel or Multilayer Perceptron kernel.  5 shows by applying SVM kernel trick (as explained above) a non-linear classifier can be transformed into a linear separating hyper plane in higher dimensional feature space [17].
SVM is a binary linear classifier but it can be extended for multiclass problems also. However, our problem is a two class prediction problem where normal healthy eye fundus images will belong to one class (say positive class) and glaucoma affected eye fundus images will belong to another class (say negative class). We have used RBF kernel as kernel function taking parameter σ=1.5.

EXPERIMENTAL RESULTS
In our method, 100 fundus images which consist of 50 normal fundus images and 50 glaucoma affected fundus images are used. All the image pre-processing, feature extraction and SVM classification techniques in our proposed method are simulated in MATLAB 8.1 (R2013a) and run on an Intel(R) Core(TM) Quad CPU-Q 8200 PC with 8-GB memory.
These images are pre-processed through a series of steps as described in section 4.1. Then these pre-processed images are modified by Principal Component Analysis for feature extraction. After that, these modified images are fed into a Support Vector Machine classifier for training purpose.
In this work, we consider three classes: nonglaucoma, glaucoma suspect and glaucoma cases.
Generally, the cup-to-disc ratio (CDR V ) in vertical is the main indicator to analyze the development of glaucoma. When the CDR V is increased gradually, it also means that the size of the optic cup becomes larger. According to the ophthalmologist's suggestion when CDR V is higher than 0.6 is considered as glaucoma as shown in Fig. 6.c. When CDR V is approximately between 0.4 and 0.6, it is in the suspect range of glaucoma as described in Fig.6.b. Finally, Fig. 6 shows the comparison of CDR V in each stage.
We have used k-fold cross validation method to test the performance of classifier. In k-fold crossvalidation, the original set of input images is randomly partitioned into k equal size subsets. Of the k subsets, a single subset is retained as the validation data for testing the model, and the remaining (k−1) subsets are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsets used exactly once as the validation data. The k results from the folds is averaged (or otherwise combined) to produce a single estimation. The advantage of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once [18]. 10-fold crossvalidation is commonly used, but in general k remains an unfixed parameter. When k = n (the number of observations), the k-fold crossvalidation is exactly the leave-one-out crossvalidation.
We have taken k=10. 90 images are used for training and 10 images are used for testing each time. This process is repeated 10 times using different sessions of the test data each time.
Performance of the classifier can be tested and evaluated by the following parameters: Here, sample denotes input images used for training the classifier.
After training, we have tested the performance of the trained classifier on 50 eye fundus images (20 normal and 30 glaucoma affected) which were not in the trained set of input images. The SVM classifier can successfully classify this test set with accuracy rate 86%, sensitivity 100%, specificity 65%, positive predictive accuracy 81.08% and negative predictive accuracy 100%.
We have collected fundus images of eye from Susrut Eye Foundation & Research Centre, Kolkata. These are all real world images verified by eye specialists. Comparison of our proposed method with other recent methods is shown below: We name methods used in references [13,14] as Method 1, Method 2. Our method is named as GDSVM It can be observed that, our method GDSVM is optimal compared to other recent methods with respect to number of images and also accuracy. We have used more images for classification and testing than Method 1 and Method 2.

CONCLUSION
In this paper, our aim is to develop a model which can classify glaucomatous eye fundus images from healthy eye fundus with high accuracy. We have used Support Vector Machine classifier for this purpose. Only a few minutes of runtime are necessary for training the SVM classifier. Hence, the computational efficiency of SVM is great. SVM is advantageous as only a small training set is needed to provide very good results because only the support vectors are of importance during training. For these reasons, SVM tends to perform better than other supervised learning methods. We have developed our model using RBF kernel. Before PCA some statistical features can be extracted from fundus images for gaining more efficiency in the classification method. Our proposed method is designed to help the doctors in their decision making process for detecting glaucoma.

CONSENT
It is not applicable.

ETHICAL APPROVAL
It is not applicable.