Statistical modeling of retinal optical coherence tomography using the Weibull mixture model

: In this paper, a novel statistical model is proposed for retinal optical coherence tomography (OCT) images. According to the layered structure of the retina, a mixture of six Weibull distributions is proposed to describe the main statistical features of OCT images. We apply Weibull distribution to establish a more comprehensive model but with fewer parameters that has better goodness of fit (GoF) than previous models. Our new model also takes care of features such as asymmetry and heavy-tailed nature of the intensity distribution of retinal OCT data. In order to test the effectiveness of this new model, we apply it to improve the low quality of the OCT images. For this purpose, the spatially constrained Gaussian mixture model (SCGMM) is implemented. Since SCGMM is designed for data with Gaussian distribution, we convert our Weibull mixture model to a Gaussian mixture model using histogram matching before applying SCGMM. The denoising results illustrate the remarkable performance in terms of the contrast to noise ratio (CNR) and texture preservation (TP) compared to other peer methods. In another test to evaluate the efficiency of our proposed model, the parameters and GoF criteria are considered as a feature vector for support vector machine (SVM) to classify the healthy retinal OCT images from pigment epithelial detachment (PED) disease. The confusion matrix demonstrates the impact of the proposed model in our preliminary study on the OCT classification problem.


Introduction
Nowadays enormous datasets in different fields cause difficulty in their processing or interpretation. Hence, many data science experts are looking for a suitable model and pattern to take the advantages to specify the characteristics of each data category as particularly and uniquely [1,2]. So, modeling can be considered as the main core of many signal/image processing tasks because choosing different models lead to different strategies for data analysis. Particularly in medical image analysis which prior information about the properties of the organs under imaging is available, modeling shows its importance more. According to nature and noise effect, stochastic models are generally suggested in these images [3][4][5][6].
Optical Coherence Tomography (OCT) has appeared as the most promising ocular imaging technique that provides high-resolution, cross-sectional, and microstructure visualizing of the retina tissue [7,8]. It is applied to detect and manage more retinal diseases in the early stages [9,10]. Like other coherent systems, the OCT images have a multiplicative noise called speckle that dramatically degrades their quality as grainy appearance [8,11]. Here, a novel tailored statistical model for retinal OCT images has been proposed and its performance is surveyed in noise reduction application.
Until now several statistical distributions have been suggested for retinal structure modeling in OCT images and used in various processing applications.
The first investigation can be assigned to Grzywacz et al. that introduced the stretched exponential distribution for the homogenous rectangular regions or parallel regions on retinal layers. The modulation of parameters was served to detect Diabetic Retinopathy (DR). The dependence of parameters was defined as a position function and the spikes in the place of blots were observed in the DR data. However, the parameters were steady within the retinal layers of healthy data [12].
The Generalized Extreme Value (GEV) and Gaussian distributions were suggested to model seven layers of the retina. These models were exploited to build a synthetic OCT dataset [13].
Amini et al. defined a symmetric Normal Laplace Mixture (NLM) model, the convolution of Normal and Laplace distribution, to enhance the contrast of OCT images. It was handled matching the NLM to the Gaussian Mixture Model (GMM) using Gaussianization Transform (GT) [14]. In another study of the researchers, the GT was combined with Spatially Constrained Gaussian Mixture Model (SCGMM) algorithm to reduce the noise of retinal OCT images [15]. In [16], the suggested statistical model in [14] was modified to the asymmetric version due to the histogram of retinal layers' intensities and was called, Asymmetric Normal Laplace Mixture (ANLM) distribution. The denoising results indicated the increase of Contrast to Noise Ratio (CNR) compared to the symmetric model [17]. Jesus et al. described a statistical model for corneal OCT speckle in Region of Interests (ROIs) to evaluate the properties of microstructure in vivo. The generalized Gamma distribution was defined as the distribution of corneal OCT intensities and its parameters have statistically demonstrated significant differences for short and age-related corneal changes [18].
Dubose et al. [19] took into account the segmentation problem of OCT retina layers utilizing the Cramer-Rao Lower Bounds (CRLBs). The authors defined the intensity distribution of each retinal layer as K-family distribution to derive the CRLBs.
Recently, Samieinasab et al. [20] noted a horizontal dependency between neighbor pixels of retinal OCT images at specific distances modeled by a multivariate asymmetric Bessel K Form (BKF). Finally, the result of denoising has been reported by the combination of gaussianization and SCGMM. On the other hand, statistical modeling can also be applied in the transform domain. In the following, we would review them for OCT modeling.
Cauchy distribution has been suggested to model wavelet coefficients of the retinal OCT images. The distribution has been converted to Gaussian by applying histogram matching, and the noiseless coefficients have been recovered by Minimum Mean Square Error(MMSE) estimator [21]. Also, two recent literatures were reported on statistical modeling of retinal OCT images in transform domain [22,23]. Ghaderi et al. revealed the scale mixture model for the group of similar nonlocal patches of OCT image in the sparse domain. The sparse coefficients have been considered as the product of a random vector and a positive scale variable, which are modeled by Laplace and GEV distributions, respectively. The efficiency has been surveyed in denoising and obtained the high resolution OCT images captured by Bioptigen cooperation [22]. In another paper, Amini et al. extended their previous works to a multivariate mixture NL in the dual-tree-complex wavelet transform (DTCWT) domain. The researchers' innovation compared to their previous studies was to propose a multivariate model and gaussianization in the sparse domain. All above mentioned reviewed studies are summarized in Table 1 which is comprised the name of the statistical model, type of OCT device, the domain of the modeling, and application of modeling in the mentioned papers.
Furthermore, some other studies modeled the speckle noise of OCT images by Rayleigh [24], negative exponential [25], and gamma distribution [26]. With this in mind, we undertook this study to propose a more comprehensive model but with fewer parameters that has a better Goodness of Fit (GoF) than previous models as well as considering features such as asymmetry and heavy-tailed of the intensity distribution of retinal OCT data. Finally, to evaluate the performance of the proposed model, it is applied to reduce the noise from the retinal OCT images. In addition, a feasibility study to illustrate the effect of this model in OCT classification is provided. The paper is organized as follows. Section II dedicates to the proposed model and its application on our dataset denoising. The result of GoF and noise reduction are reported in Section III. Finally, the discussion and conclusion are presented in Section IV and V, respectively.

Dataset
In this study, two datasets are used that introduced in [27,28] and available in [29]. The first dataset was obtained from Topcon 3D OCT-1000 imaging system at the Ophthalmology Dept. of Feiz Hospital, Isfahan, Iran. Thirteen 3-D macular (spectral domain) SD-OCT images collected from eyes without pathology were included in this dataset. The x, y, z scale of the collected volumes is 650 × 512 × 128 voxels, 7 mm × 3.125 mm×3.125 mm, voxel size 13.67 µm × 4.81 µm × 24.41 µm. We have randomly selected 10 B-scans from each volume so we have totally130 B-Scans which the GoF of candidate distributions are compared on them. Also, the results of denoising are reported on 60 randomly selected B-scans from different volumes. It should be noted that 50 healthy data from this dataset is also selected for classification application. The second dataset contains six 3D OCT data that has recorded from the same device and with the same characteristics. They were diagnosed to have retinal Pigment Epithelial Detachment (PED). This dataset is used to survey the ability of the proposed model to classify healthy data from abnormal ones. Hence, 50 B-scans with pathological symptoms are selected from this dataset.

Statistical modeling
To provide a statistical model for OCT images, at first a logarithm operator is used to convert the speckle noise to an additive Gaussian noise [30,31].

Retinal layer modeling
To improve the modeling of a retinal OCT image, we have to find a suitable distribution for each layer that provides an appropriate insight into the special structure. To this end, the eleven retinal layers in OCT images were manually segmented by an expert and their normalized histograms were plotted as illustrated in Fig. 1.
The most important observed features were the asymmetry and heavy-tailed nature of data distribution. Among the existing distributions, our goal is to present a model with less complexity and parameters that can be well fitted to OCT data.
In this regard, based on the data structure, different distributions were investigated that twoparameter Weibull distribution can better address the mentioned properties than other candidate distributions. The probability distribution function (pdf) of a Weibull random variable is as Fig. 1. A sample B-scan of retinal OCT with displaying the manual segmentation of the twelve boundaries as well as their nomination and the normalized histogram of each layer. The 11 layers are numbered from ILM to the last RPE complex, respectively. Heavy-tail and asymmetry nature of data distribution are marked as curved red line and vertical dashed red line in each graph, respectively. The x-axis and y-axis are the range of intensity values of pixels and the normalized histogram of these intensities, respectively. follow [32,33].
where α, β are parameters of scale and shape, respectively and x denotes the pixel intensity in each B-scan. The advantages are the comprehensiveness, i.e., changing of the values of Weibull parameters leads to a range of well-known distributions as special cases such as exponential (β = 1) and Rayleigh (β = 2, α = √ 2σ). The proposed pdf has the ability to model the features of data (asymmetry and heavy-tailed properties). This is compared with Gamma, symmetric and asymmetric Normal Laplace, Generalized Gaussian Distribution (GGD), and GEV pdfs that have been used in the previous studies. The visual and numerical results of GoF demonstrated the ability of Weibull distribution in capturing the main statistical properties of each retinal layer of OCT images in comparison to other mentioned distributions. In the following, we introduce a statistical retinal OCT image model based on Weibull distribution.

Retinal OCT image modeling
Since retina has several various cellular layers, we suggest a mixture distribution [34][35][36][37] to model the retinal OCT images. The number of mixture components can in the simplest case, be equivalent to the total eleven anatomic layers of retina. However, these layers are located in different geometric positions, some of them have similar statistical characteristics and can be described with an identical pdf. Due to the one by one (alternate) changing of retinal layer's intensities, we have selected 6 components as nearly optimal value for the number of components. Since the Weibull pdf has a significant GoF for each layer, it was considered as a tailored distribution of each component in the mixture model. Therefore, the mixture of Weibull distribution with different values of parameters in each component is chosen in our proposed model of retinal OCT images.
The proposed mixture model is formulated as (1), w i are non-negative weights with . . , I are the parameters of the Weibull pdf. In our experiments we have used I = 6 in model Eq. (2). In total, the parameter sets α = (α 1 , . . . , α I ), β = (β 1 , . . . , β I ) and w = (w 1 , . . . ., w I ) have to be estimated from the observed data x j , j = 1, . . . , N, where in our case N is the number of pixels in an image. The accurate estimation of these parameter sets plays an important role for the GoF results of the mixture model and for the performance of the processing tasks.
Due to incomplete data, we employ the Expectation Maximization (EM) [38,39] algorithm to estimate the model parameters. The EM works iteratively. We start with an initial estimate α (0) , β (0) , w (0) according to our observations from statistical behavior of OCT layers. At the (k+1)-th iteration step of the EM algorithm, we proceed as follows. See Appendix I for more details.
First, we compute the auxiliary variables for j = 1, . . . , N and l = 1, . . . , I, (expectation step). In the second step (Maximization step), we update the parameter sets α (k) , β (k) , w (k) as follows, Instead of the given formulas in Eqs. (3)-(6), we can also re-compute the auxiliary variables using the latest estimates. For example, after having computed w (k+1) and (6). Moreover, the estimate for β (k+1) can be improved by solving the nonlinear equations , respectively. In our numerical experiments, we have solved Eq. (8) iteratively by choosing appropriate initial values. Most benefit of our proposed mixture model compared to the one introduced in [14,17] is related to the form of Eqs. (5)- (6). Here, we computed these parameters in the explicit formulas using the numerical methods and fixed the non-convergence problem of parameters that the impacts are illustrated in the result section. The proposed mixture model is compared with NLMM, ANLMM and, GMM using GoF tests and visual assessment which shows the significant superiority of the proposed model. After finding the suitable model for OCT data, its performance is investigated in denoising application.

OCT image denoising
As mentioned before, speckle noise has greatly reduced the quality of OCT images, hence several articles have surveyed on the noise reduction of these images [40][41][42][43][44]. Because of the effect of denoising in the main steps of OCT image processing such as segmentation or abnormalities detection, we aim to provide an efficient method of noise reduction using the MAP estimator based on the proposed mixture model. For this purpose, we selected the SCGMM algorithm which is a well-known denoising algorithm based on Gaussian assumption. So, we first convert the Weibull mixture model (WMM) to the Gaussian mixture model and then apply the SCGMM algorithm.

Gaussianization
Converting non-Gaussian data to Gaussian type is considered in multiple processing applications [45][46][47][48][49][50][51][52]. Here, Gaussianization is performed using the histogram matching which transforms an image so that its histogram matches to a specified histogram. Here, the Cumulative Distribution Function (CDF) of the Weibull model is equivalent with the CDF of Gaussian distribution according to the following equation, The Gaussian random variable x G is derived in terms of the Weibull random variable x W , and Weibull and Gaussian parameters {µ iG , σ iG } as where erfinv is the inverse of error function, erf( {µ iG , σ iG }, are calculated in terms of Weibull parameters as where Γ is gamma function [53].
In this way, each Weibull component in mixture model is converted to a Gaussian component. Finally, all Gaussian components are combined using weighted summation proposed in [54] with where x iG is the i th gaussianized component and f i (.) denotes the Weibull distribution using the parameters of i th component. Finally, the exponential operator is used to return the image into the main domain due to use of the logarithm operator at the beginning of the study.

SCGMM
The SCGMM algorithm [55] is a patch-based method [56][57][58][59] in which the image is divided into a number of overlapping patches. Then the number of exemplar patches are uniformly chosen in the image, and similar patches are selected using K-Nearest Neighbors algorithm for each exemplar and form a cluster. It is assumed that each cluster has a Gaussian distribution with mean vector and specific covariance matrix. Therefore, the image would follow a Gaussian mixture model. To calculate these areas as well as to estimate the Gaussian parameters (mean vector and correlation matrix), a two-step algorithm is used, the first step of which is the clustering step and the second step is noise reduction (Fig. 2 shows the main steps of SCGMM). Finally, the steps of denoising based on Weibull mixture model are summarized in Algorithm I.

Results
In the two next subsections, the results of the study are reported. Subsection A is related to the GoF of the proposed model as visual and numerical measurements. In Subsection B the application of the proposed method in denoising of retinal OCT images is investigated.

Goodness of fit
Two numerical measurements were calculated to show how well the various above-mentioned statistical models (i.e. GEV, NL, ANL, Gamma, GGD and Weibull) fit into the intensity distribution of retina layers and B-scans. Chi square examines the difference between the observation and expected values according to the following equation: where p i and q i point out to data distribution and the statistical model, respectively. B is the number of bins for the histogram. Another numerical measurement is Kullback-Leibler Divergence (KLD) which is called relative entropy and measures the discrepancy between two probability distributions. So, it can serve as a GoF test.
The smaller value of the criteria indicates the less discrepancy. In this study, they were used to compare six distributions to model the retina layers as well as four mixture models applied to model the whole OCT B-scan. Table 2-4 and Fig. 3-4 are related to the quantitative and qualitative evaluations of the results, respectively. From these results we can conclude that WMM has the lowest value of the quantitative criteria and it is demonstrated more remarkable superiority than other models in the modeling of retinal OCT images.

Denoising of OCT images
The results of the proposed method on denoising application were evaluated by three numerical criteria called CNR, Texture Preservation (TP), and Edge Preservation (EP). They were calculated in the chosen regions of interest (ROI) on the B-scans.    CNR measures the contrast between the pixels of m-th ROI and background (noisy region) as given by following definition.
where µ mROI , σ 2 mROI denote the mean and variance of m-th ROI chosen in the intra-layers and the µ background , σ 2 background are related to the noisy region on the image (vitreous). It is obvious if the m-th region has a high mean and less standard deviation, it will make a high CNR. So, the region with high texture (large σ 2 ) calculated mistakenly the contrast low. Hence the next criterion (TP) is computed to consider this property in the denoising algorithms and show the preservation of texture after denoising. It is between 0 and 1, and is close to 0 if the texture is become so flatten. The texture regions, layers area, are favor to calculate the TP based on equation in [60].
The last indicator, EP, was calculated to display the maintaining of edges after applying the noise reduction method. EP ranges are similar to TP and the related equation in [60] calculates the correlation of the edge in the denoised and noisy image over the ROIs. The numerical denoising results were reported in Table 5. Also, the quality of the denoising method was shown on a sample image in Fig. 5. From these results we can conclude that our proposed denoising method (WMM-GT-SCGMM) provided the images with high CNR and TP compared to the other methods, i.e. SCGMM, NLMM-GT-SCGMM and, ANLMM-GT-SCGMM. Also, the computational time of the aforementioned denoising methods has been calculated for a sample 271×512 OCT image using a 2.40 GHz Intel Core i7 CPU and reported in Table 6. Our method has a less computational time than ANLMM-GT-SCGMM method however it needs more computational time than SCGMM and NLGTSCGMM. This is due to the more accurate parameter estimation method which leads to better results on Goodness-of-Fit. Furthermore,  our denoising results are also better than other competitive methods. Therefore, this additional processing time is justified, especially for offline applications like denoising.

Discussion
In this study, a new statistical mixture model has been addressed for retinal OCT images in the spatial domain. For this purpose, we first explored a suitable model for each retinal layer using 2-parameter Weibull distribution. It is more compatible regarding the intensity distribution of each layer (excluded the 7 th layer) than other candidate distributions like Gamma, GEV, NL, ANL, and GGD. Regarding the 7 th layer, none of the previously examined distributions fitted well because it is a very thin layer and it is difficult to propose a mixture state for it. The proposed Weibull model with few parameters and computational complexity, is able to model the asymmetry and heavy-tailed nature of the intensity distribution of the layers as well as possible, therefore the mixture of them has been selected for modeling of retinal OCT images. This mixture model is composed of 6 components of Weibull distribution. In addition to aforementioned advantages of the proposed model, a robust estimation of WMM's parameters has been provided by EM algorithm due to the closed form of equations and convergence of parameters. Eventually, WMM was compared to three other mixture models, GMM, NLMM, and ANLMM. It had more remarkable superiority than the other candidate mixture models in terms of both qualitative and quantitative criteria. In addition, the efficiency of the proposed model was investigated in noise reduction. For this purpose, SCGMM algorithm was used to decrease the noise. The proposed algorithm, WMM-GT-SCGMM, has demonstrated better CNR and TP compared to other examined denoising methods. Therefore, it seems that the proposed method can be used as a suitable preprocessing step for future operations such as diagnosis from texture analysis.
Although most previous modeling studies on OCT images have been purposed for noise reduction, some researchers have used modeling to diagnose disease. Therefore, in this study a preliminary study was conducted to classify normal and abnormal data (PED dataset) using the proposed model. In order to differentiate between normal and abnormal datasets, the region of retina from the first to the last boundary (ILM-RPE) was considered which segmented automatically using method in [28]. Figure 6 illustrated an OCT image with the pathological symptoms as well as extracted ILM and RPE boundaries. The proposed mixture model was fitted to the intensity distribution associated with the pixels of ILM-RPE region and the parameters and GoF criteria values {α 1 , . . . , α 6 , β 1 , . . . , β 6 , w 1 , . . . , w 6 , χ 2 , D kl } were extracted as feature vectors of 50 healthy B-scans from 5 subjects and 50 B-scans with pathological symptoms from 5 subjects and then given to a linear SVM classifier as input vector. In order to reduce the effect of correlation which is related to selected B-scans from same subject in train and test phases, the classifier was repeatedly trained and tested on ten different selections of subjects' combination so that 80 percent of subjects were selected for train step and remained 20 percent for test step. Eventually, the average of results of accuracy, precision, recall and F1-score are reported in Table 7. The obtained results of classification procedure show that a simple classifier can result in an acceptable performance of distinguishing between healthy data from patients which illustrate the ability of the proposed Weibull mixture model in classification of OCT data.

Conclusion
Modeling is considered as an important and notable topic in medical imaging, so that the more specific and precise model achieves the ultimate purpose of image processing tasks, i.e., discrimination of healthy from abnormal data. Hence, in this study the Weibull mixture statistical model was addressed for retinal OCT images that despite of less complexity and parameters, was considerably more consistent with dataset than previous ones, and is due to the well GoF. Also, the problem of EM algorithm that were directly solved using a numerical method and GoF results illustrated the remarkable effect.
In addition to address the statistical model for retinal OCT images, its efficiency has also been investigated for noise reduction and classification. The denoising results illustrated the proficiency of proposed method (WMM-GT-SCGMM) to improve the CNR while preserving the texture. In the initial study to utilize the proposed model in classification, the parameters and criteria of GoF as a feature vector achieved the promising results in the machine learning problem. Therefore, the proposed model can be taken into account as a way to discriminate the healthy from unhealthy data. In future work, adding more specific features, such as the geometry of disease symptoms as well as more data volumes, would allow for using sophisticated classification methods such as deep learning networks and discrimination of more diseases.

Appendix I
Here we shortly explain the EM algorithm for iterative calculation of the parameters of the Weibull , and with the unknown parameters α = (α 1 , . . . , α I ), β = (β 1 , . . . , β I ), and w = (w 1 , . . . , w I ) and the side condition ∑︁ I i=1 w i = 1. For given observed data x j , j = 1, . . . , N, (in our case, N is the number of pixels in an image) we aim at estimating all w i , α i , β i , i = 1, . . . , I.
Funding. Vice-Chancellery for Research and Technology; Isfahan University of Medical Sciences (398999).