Unsupervised Anomaly Detection for Glaucoma Diagnosis

With the rapid development of high tech, Internet of Things (IoT) and arti ﬁ cial intelligence (AI) achieve a series of achievements in the healthcare industry. Among them, automatic glaucoma diagnosis is one of them. Glaucoma is second leading cause of blindness in the world. Although many automatic glaucoma diagnosis approaches have been proposed, they still face the following two challenges. First, the data acquisition of diseased images is extremely expensive, especially for disease with low occurrence, leading to the class imbalance. Second, large-scale labeled data are hard to obtain in medical image domain. The aforementioned challenges limit the practical application of these approaches in glaucoma diagnosis. To address these disadvantages, this paper proposes an unsupervised anomaly detection framework based on sparse principal component analysis (SPCA) for glaucoma diagnosis. In the proposed approach, we just employ the one-class normal (nonglaucoma) images for training, so the class imbalance problem can be avoided. Then, to distinguish the glaucoma (abnormal) images from the normal images, a feature set consisting of segmentation-based features and image-based features is extracted, which can capture the shape and textural changes. Next, SPCA is adopted to select the e ﬀ ective features from the feature set. Finally, with the usage of the extracted e ﬀ ective features, glaucoma diagnosis can be automatically accomplished via introducing the T 2 statistic and the control limit, overcoming the issue of insu ﬃ cient labeled samples. Extensive experiments are carried out on the two public databases, and the experimental results verify the e ﬀ ectiveness of the proposed approach.


Introduction
Internet of Things (IoT) techniques are emerging, which are known to be among the most critical sources of data streams that produce massive amounts of data continuously from numerous applications, such as transportation systems, security systems, intrusion systems, and fault detection in industry [1][2][3][4][5][6][7][8][9][10]. Among them, anomaly detection has drawn tremendous interest in the past couple of years. Since science and technology play a vital role in the medical department, how to establish an anomaly detection big data analysis model supported by medicine becomes a hot topic.
With the rapid growth of the worldwide population, the count of eye diseases is also increased. Among them, glaucoma is one of the serious eye diseases that can lead to irreversible vision loss [11]. According to the recent report [12], the number of glaucoma patients is predicted to increase to 80 million by 2020 and to 111.8 million by 2040. Since glaucoma can cause blindness, early detection and timely treatment are good ways to slow down the progress, preventing further vision loss [13]. In clinical, ophthalmologists always employ the color fundus images to assess the optic nerve head (ONH) for diagnosing glaucoma [14] due to its low cost. Figure 1 depicts the structure of ONH, in which the optic disc (OD) appears as a bright yellowish elliptical region consisting of the central bright region as optic cup (OC) and a peripheral region as the neuroretinal rim. Since ONH assessment requires the qualified ophthalmologists to delineate the OC and OD, it is subjective, labor-intensive, timeconsuming, and not suitable for population screening [15].
For large-scale glaucoma screening, it is necessary to use automatic ONH assessment approaches. Since the glaucoma caused by the enlargement OC, several quantitative indicators are presented to identify the symptoms of glaucoma from the fundus images, such as vertical cup to disc ratio (CDR), disc diameter, rim area, and the ISNT (inferior (I), superior (S), nasal (N), and temporal (T)) rule [16,17] (as shown in Figure 1). Among these indicators, CDR is well accepted and widely employed indicator by clinicians. The CDR value can be calculated by the ratio of vertical cup diameter (VCD) to vertical disc diameter (VDD). Normally, a larger CDR value means the higher risk of glaucoma and vice versa. Besides, ISNT rule includes inferior (I), superior (S), nasal (N), and temporal (T) rules, which is used for the detection of glaucoma. For a normal subject, I has the highest value of neuro retinal rim configuration followed by S, N, and T.
With the rapid development of pattern recognition and machine learning, a series of automatic glaucoma diagnosis approaches have been proposed recently [18][19][20][21][22][23][24][25]. Although these approaches can achieve automatic glaucoma diagnosis, there are still two challenges in the practical applications. On one hand, the data acquisition of abnormal images is extremely expensive in medical image domain due to the large variations in appearance of OD. On the other hand, it is a time consuming and tedious task to labeling the medical images by the experienced clinicians. Therefore, the cost of obtaining large-scale labeled medical images is often expensive in clinical. In contrast, collecting a large number of normal data is relatively easy.
As we all know, human are good at distinguishing the abnormal images from the normal images [26]. Inspired by this, this paper designs an automatic approach for glaucoma diagnosis, which just uses the normal images for training model. The main advantages of the proposed approach are given as below: (1) In order to improve the detection accuracy and reduce the computational cost, the region of interest (ROI) in retinal image is extracted by exploring the OD location with unsupervised boundary extraction and Hough transform (2) Segmentation-based features and image-based features are extracted from the obtained ROIs to capture the shape and textural changes of the fundus images (3) To reduce the effects caused by noises and redundancy features, an effective feature set can be selected by combining the elastic net penalty and PCA together (4) In order to solve the issue of class imbalance, the T 2 statistic and the control limit are designed on the normal images, which can be used to distinguish the abnormal images from the normal images, achieving automatic glaucoma diagnosis (5) Extensive experiments are carried out on the two public fundus databases, and the experimental results indicate that the proposed approach is effective The remainder of this paper is arranged as below. Section 2 gives some related works, and then, the proposed approach is introduced in details in Section 3. Next, the experimental results and analyses are presented in Section 4. Finally, the whole work is concluded in brief in Section 5.

Related Work
The existing automatic glaucoma diagnosis approaches can be divided into two categories: machine learning-(ML-) based approaches and deep learning-(DL-) based approaches.

Machine
Learning-(ML-) Based Approaches. ML-based glaucoma diagnosis approaches follow a fixed procedure [27], e.g., (1) input an image; (2) preprocessing; (3) feature extraction; and (4) classification (diagnosis). In preprocessing stage, Contrast Limited Adaptive Histogram Equalization (CLAHE) [28] is commonly used to reduce the negative effects caused by noise and artifact, improving the quality of the fundus images. In feature extraction stage, a series of important and distinctive hand-crafted features will be extracted to explore the concealed pixel variations in the retinal images. The extracted hand-crafted features are classified into wavelet decomposition-based features [29], morphological-based features [30], nonlinear-based features [31], textural-based features [32], and image descriptorbased features [33]. After feature extraction, the last process is classification. Usually, with the usage of the extracted features, a classification model can be trained which identifies the normal class versus abnormal class. Many classifiers have been employed to distinguish the two classes based on the extracted features, for instance, artificial neural network (ANN) [34], K-nearest neighbor (KNN) [32], support vector machine (SVM) [35], least square support vector machine

Wireless Communications and Mobile Computing
(LS-SVM) [29], and extreme learning machine (ELM) [36]. A major challenge in the machine learning-based approaches is that the hand-crafted appropriate features should be set beforehand. Seen from these extracted features, most of them belong to the image-based features. Nevertheless, segmentation-based features are ignored, which can be regarded as one of the most important clinical indicators for glaucoma diagnosis. Besides, usually, the number of normal images is much larger than the abnormal images, leading to the class imbalance. Under this circumstance, the classifier of machine learning approaches can hardly train well, which will affect the final diagnosis performance.

Deep
Learning-(DL-) Based Approaches. DL-based approaches follow the same sequence as ML approaches for glaucoma diagnosis. However, the major difference between them is the deep learning network can self-learn during the training of the network, without extracting a series of hand-crafted features for classification. For example, convolutional neural network (CNN) is the most commonly employed form of deep learning. Generally, a structure of CNN contains (1) convolutional layers, (2) pooling layers, and (3) fully connected layers. The convolutional layer is used to extract the nonlinear features. The pooling layer is utilized to reduce the space dimensionality of sample and keep the important information unchanged. The fully connected layer is to connect the every neuron in the previous layer with the neurons in the current layer of the CNN model. Furthermore, some variations of DL, e.g., adversarial learning [37], FCNs [38], and modified U-net [24], are used to glaucoma diagnosis, improving the diagnosis performance. Although DL-based approaches have achieved many breakthroughs in medical image analysis, these approaches mainly rely on large-scale labeled data. However, in medical image domain, the samples are always small-scale and unlabeled, so DL-based approaches can hardly perform well [39,40].

Our Approach
Our approach consists of the following three stages, including ROI extraction, features extraction (segmentation-based features and image-based features), and glaucoma diagnosis. The flowchart of the proposed approach is shown in Figure 2.

ROI Extraction.
Generally, the resolution of original fundus image is relatively large. Meanwhile, the region of interest for glaucoma diagnosis is just a small region according to the clinical prior knowledge. In order to reduce the influence of the unnecessary background information and the computation cost, improving the diagnosis accuracy, the extraction of ROI around the OD is necessary (as shown in Figure 3).
The process of ROI extraction consists of the following three stages: first, this paper employs our previous proposed approach named as an adaptive rough OD boundary curve extraction based on unsupervised learning [16] to extract the OD region (as shown in Figure 3(b)). After that, in order to accurately locate the center of OD, the Circular Hough Transform (CHT) is employed to the extracted OD region according to the fact that the prior knowledge of OD is circle in shape, depicted in Figure 3(c). At last, with the usage of the extracted center of OD as the ROI center, according to the clinical experience [16], this paper cuts the original images into small images with the resolution of 400 × 400 on the REFUGE and RIM-ONE r2 databases, which are called ROIs around the OD. An example from the REFUGE database is shown in Figure 3(d).
Successful optic disc center localization is considered to be that the distance between the estimated optic disc center and the manually selected center is less than the optic disc radius [41,42]. Compared with the state-of-the-art OD detection approaches, our approach is more robust to the image quality, contrast, brightness, and different lesions. Since human vision is more sensitive to green color, only green channel of ROI is extracted from the RGB image for further processing. Meanwhile, CLAHE is introduced into the extracted green channel of ROI [28] to enhance the contrast (illustrated in Figure 3(e)).

Feature Extraction.
OD with varying intrinsic principal features, e.g., distinct vessel structures, sizes, and brightness, gives more challenges for glaucoma diagnosis. To fully explore the difference between the normal images and abnormal images, improving the effectiveness and accuracy of the glaucoma diagnosis, this paper describes the OD from the following two sides. For one side, inspired by the clinical glaucoma diagnosis, a series of segmentation-based features are extracted to accurately capture the shape deformations to characterize glaucoma. For another, some image-based features are extracted to capture the shape and textual changes in OD. We believe that taking the segmentationbased and image-based features into a united framework can provide complementary and independent information for OD descriptions. The detailed descriptions of the extracted features are given as below.

Extraction of Segmentation-Based Features.
Based on the extracted ROI around the OD, this paper adopts [11] to segment the OD and OC. In clinical, for a normal fundus image, the rim is the thickest in the inferior (I) sector and thinnest in the temporal (T) sector (as shown in Figure 1). Usually, experienced ophthalmologists observe inferior (I), superior (S), nasal (N), and temporal (T) quadrants for glaucoma diagnosis. Therefore, a series of clinical features are extracted from the segmented OD and OC, denoted as F1 to F5 (as shown in Table 1). F1-F3 are based on rim-disc ratio, and F4-F5 are calculated by rim profile and ISNT, respectively. Among them, the calculation of F5 is based on a disk-shaped region named as the NRR, obtained by removing the OC from the OD.
The calculation process of F5 is given as below:

Extraction of Image-Based Features.
Since the gray level of the image on the edge is discontinuous, it has singularity. Recent studies have shown that the multiresolution features of wavelet coefficients and the localization characteristics of wavelet analysis can be used to obtain the domain features at different scales. In addition, the high frequency information decomposed by wavelet can also be used for multiscale edge detection [43], which is very fit for medical image processing and analysis. Inspired by the advantages of wavelet features, this paper employs an adaptive signal decomposing approach, named as two-dimensional (2D) EWT with Littlewood-Paley as an empirical wavelet [44] to decompose the image into various frequency bands. Based on the decomposed components of (2D) EWT, a series of image-based features consisting of chip histogram, gray level cooccurrence matrix (GLCM), and moment invariance are extracted. Here, the enhanced green channel image is utilized as the input of (2D) EWT. For each input image, four frequency bands can be decomposed. The detailed imagebased features descriptions are given as below: (a) Chip histogram features Chip feature belongs to statistical textural feature, which can be extracted from the second-order histogram. There are six features in it including mean, variance, skewness, kurtosis, energy, and entropy. For a given gray image, t ðiÞ is the probability density function of the intensity level i, which is denoted as tðiÞ = HistogramðiÞ/T, i = 1, 2, 3, ⋯, N. T is the total pixels in the gray image, and N is the total number of gray levels, in which Q is the gray level vector represented as ½1, 2, 3, ⋯, N. The expressions of the chip histogram features are depicted in Table 2.
(b) GLCM features GLCM is utilized to extract the texture in an image by doing the transition of gray level between two pixels, which is one of the earliest approaches for texture feature extraction [45]. For each GLCM, four directions can be computed, in which each direction has four different characteristics, i.e.,

Wireless Communications and Mobile Computing
In order to overcome the changes of object shape, position, and orientation [46], this paper employs the moment invariance features for describing the OD and OC. There are seven features in moment invariant features, and the mathematical expressions of these features are shown in Table 4.
The invariants λ mn can be calculated by λ mn = μ mn / μ 1+ðm+nÞ/2 00 , in which μ mn and μ 00 are, respectively, denoted as the center moment and the zeroth central moment.
After feature extraction, 73 features in which 5 segmentation-based features and 68 4 × ð6 + 4 + 7Þ imagebased features are computed for each input image, forming a 73-dimension feature set F = fF 1 , F 2 , ⋯, F 73 g. Each feature F i is normalized to zero mean and unit variance by using F i ′= ðF i − μ i Þ/σ i in which μ i is the mean of the ith feature, and σ i is the corresponding standard deviation.
3.3. Anomaly Detection for Glaucoma with SPCA. In this subsection, we firstly review the (principal component analysis, PCA) and sparse PCA. Then, T 2 statistic of glaucoma and the corresponding control limit are given for glaucoma diagnosis.
3.3.1. PCA. PCA is one of the most popular dimensionality reduction approaches, which is aimed at maximizing the variance of projections on the new directions. For PCA, a series of load vectors can be constructed by the orthogonal vectors. Supposing that X ∈ R N×D is a given training sample set, N and D denote the number of the samples and variables (features), respectively. The objective function of PCA is given as where u ∈ R D . The solution of Equation (1) can be computed by singular value decomposition (SVD) as where W ∈ R N×N and U ∈ R D×D are unitary matrices, in which the orthogonal column vectors in matrix U are called the load vectors. Projecting X on the ith column has the variance as σ 2 i . Σ ∈ R N×D is a diagonal matrix where the elements in main diagonal are the nonnegative singular values in descending order ðσ 1 ≥ σ 2 , ⋯,≥σ min ðN,DÞ ≥ 0Þ, and the others are zeros.
A new load matrix P ∈ R D×A can be obtained by selecting the first A columns in U, named as the principal component subspace (PCS), which is represented as T = XP. For a given new sample x, it can be projected on the PCS as In Equation (3), A is a parameter, which is calculated by cumulative percent variance (CPV), and this paper adopts reference [47] to resolve it.

Sparse PCA.
Seen from the PCA, its major work is to acquire the maximum variance on the certain loading vectors. However, in practice, some principal components are linear relevant with each other and some of them have large noises, which will weaken the precision of detection. Inspired by the advantages of sparsity, sparse PCA has been proposed by performing the maximization of objective function under the L1 norm constraint as where s is the number of nonzero elements in a load vector. u j ðj ∈ f1, 2, ⋯, DgÞ denotes the jth element of the load vector. By introducing the aforementioned sparse constraint, each principle component has lesser original variables. Therefore, it not only enhances the interpretability of the principal components but also reduces the storage space. Many approaches have been designed to solve Equation (4). Among them, Jolliffe and Uddin [48] proposed the most effective approach to resolve it. The main processes are depicted as below: first, PCA can be recast exactly by a regression problem, such as ridge regression. Then, the regression problem is changed to an elastic-net regression by introducing the L1 norm constraint. For more details, please refer to reference [49].

Index and Control Limit.
A series of statistics have been applied to multivariable statistical analysis. Among them, one of the most preventative ways, namely, Hotelling's T 2 statistic, is utilized to represent the variability in the PCS, which is defined as where F A,N−A is an F distribution with A and N − A degrees of freedom. For a given significance level α, the detected image is regarded as glaucoma if The whole process of the proposed glaucoma diagnosis approach can be divided into the following four stages. In the first stage, an unsupervised boundary curve extraction model and Circular Hough Transform (CHT) are used to extract ROI. In the second stage, a series of features containing segmentation-based and image-based features are extracted from the ROI, forming the feature matrix. The third stage is the offline training. In this part, the glaucoma model is constructed based on spares PCA, and the control limit L is also given. The last stage is the online testing. For a new fundus image, first, repeat the process of stages one and two, and then, compute the corresponding statistic; at last, compare them with the control L for distinguishing glaucoma from normal.

Databases.
In experiment, we employ two public databases to verify the effectiveness of the proposed approach. The detailed descriptions of the databases are given as below: Retinal Fundus Glaucoma Challenge (REFUGE) database [50] consists of 1200 color retinal fundus images stored in JPEG format, in which 120 are glaucomatous and 1080 are nonglaucoma images. REFUGE database is divided into three parts: 400 training images, 400 validation images, and 400 testing images, in which 400 training images are acquired with a Zeiss Visucam 500 fundus camera with the resolution of 2124 × 2056 pixels, and the 400 validation images and 400 testing images are acquired with Canon CR-2 device with the resolution of 1634 × 1634 pixels. In REFUGE database, only 400 training images and 400 validation images which are labeled as "ground truth" with the segmentation results of OD and OC and the rest 400 testing images without labeling the ground truth. Therefore, in this experiment, we adopt 400 training images (40 glaucoma images and 360 nonglaucoma images) and 400 validation images (40 glaucoma images and 360 nonglaucoma images)

GLCM features Expressions
Contrast Wireless Communications and Mobile Computing to train and verify the effectiveness of the proposed approach.
In total, there are 80 glaucoma images and 720 nonglaucoma images. In experiment, 500 nonglaucoma images are randomly selected constructing the training set. 20 glaucoma images and 160 nonglaucoma images are regarded as the validation set (20 + 160). And the rest 60 nonglaucoma images and 60 glaucoma images are regarded as the testing set (60 + 60). The random sample selection process is repeated 10 times, and the average result is regarded as the final result. RIM-ONE r2 database [51] comprises of 455 retinal fundus images, in which 255 are normal images and 200 are glaucoma images. In experiment, first, all of the images in this database are resized in the same dimensionality. And then, the ROI extraction depicted in Section 2 is used to these images. At last, a series of features are extracted from the obtained ROIs, which are regarded as the inputs for our approach.
In experiment, 200 nonglaucoma images are randomly selected constructing the training set. 25 glaucoma images and 25 nonglaucoma images are randomly selected as the validation set (25 + 25). The rest 30 nonglaucoma images and 175 glaucoma images are regarded as the testing set (30 + 175). The random sample selection process is repeated 10 times, and the average result is regarded as the final result.
The proposed approach is trained and tested by using a desktop computer having 16 GB random access memory (RAM), Intel ® Core™ i7 CPU950@3.7 GHz. We develop our approach using MATLAB 2021a for training and testing.

Evaluation Criterion.
In order to evaluate the effectiveness of the proposed approach, three evaluation criteria, namely, accuracy, sensitivity, and specificity, are employed where true positive (TP) is the number of glaucoma images that are correctly identified; false negative (FN) is the number of incorrectly found as nonglaucoma images; false positive (FP) is the number of incorrectly found as glaucoma images. true negative (TN) is the number of nonglaucoma images that are correctly identified. In order to evaluate the effectiveness of the proposed approach, receiver-operating characteristics (ROC) curve is utilized in this paper. In ROC curve, the vertical axis is the sensitivity and the horizontal axis is (1-specificity). The area under the ROC curve denotes as the AUC value   Figures 5(a) and 5(b)). As seen from these figures, we can learn that the proposed glaucoma diagnosis approach can achieve maximum AUC values on the REFUGE and RIM-ONE r2 databases when the values of sparsity are set as 40 and 30. After the sparsity is determined, the threshold for T 2 can be computed by Equation (7).

Experimental Results and Analysis.
In this subsection, we will carry out two experiments to verify the effectiveness of the proposed approach. For one side, the proposed approach with different features including segmentation-based features, image-based features, and their fusion features will be tested, respectively, and the obtained diagnosis performances are shown in Figure 6.
Seen from the results depicted in Figure 6, we can learn that when the proposed approach employs the fusion features, it can achieve the best performance on the testing set. The main reason is that the segmentation-based and image-based features can capture the shape and textural changes, respectively. Nevertheless, the fusion of these features provides complementary information for glaucoma diagnosis, which can improve diagnosis performance.
For another, the performance of the proposed approach is compared against the state-of-the-art approaches, i.e., wavelet features [52], superpixel segmentation [53], semisupervised clustering [54], deep learning [55], and AWLCSC [31]. Table 5 shows the diagnosis results obtained by different algorithms on the REFUGE database and RIM-ONE r2 database, respectively. Among them, superpixel segmentation, wavelet features, and deep learning are supervised learning approaches; semisupervised clustering is the semisupervised learning approach; AWLCSC and the proposed approach belong to unsupervised learning approaches. For supervised learning approaches, they can achieve good performance when a large number of labeled samples are used to train model. However, a large number of labeled data are hard to obtain, especially for medical domain. Under this situation, semisupervised learning and unsupervised learning have gained tremendous attention. Although these approaches can overcome the problem of insufficient samples, the class imbalance problem still lies in these approaches reducing their classification performance. Different from the existing glaucoma diagnosis approaches, this paper just employs one-class normal data for constructing model and designs the control limit for detecting the abnormal data. Therefore, the aforementioned two limitations can be avoided. According to the comparison results depicted in Table 5, we can learn that the proposed approach is more reliable than other tested approaches in terms of diagnosis accuracy, indicating the effectiveness of the proposed approach.

Conclusions
In this paper, we propose an unsupervised anomaly detection approach via sparse PCA for glaucoma diagnosis. Since the ODs vary in sizes, shapes, and appearances for glaucoma images, it can hardly construct an effective model to diagnose glaucoma. Instead of using the glaucoma images, this paper just constructs the diagnosis model based on the nonglaucoma images. Therefore, the proposed approach overcomes the class imbalance issue, which is a hard problem in classification. Furthermore, a series of features including segmentation-based features and image-based features are designed, which can capture the shape and textural changes, respectively, improving the discrimination of the OD and OC. Experimental results indicate that the proposed approach can achieve good glaucoma diagnosis performance.
The main disadvantage of the proposed approach is that the fundus images were obtained by different equipment and institutions, so that the trained model parameters by our proposed approach cannot perform stable detection results. Therefore, it is necessary to improve the generalization of the model from different datasets, which is one of our future research directions. Besides, more large-size databases will be introduced to further verify the effectiveness of the proposed approach.

Data Availability
The data are derived from public domain resources.

Conflicts of Interest
The authors declare that they have no conflicts of interest.