Semi-Huber potential function for image segmentation

,


Introduction
Segmentation can be considered the first step and essential part in object recognition, scene and image understanding. Some of its applications comprise industrial quality control, medicine, robot navigation, geophysical exploration, military applications, agriculture, among others. Image segmentation is an image processing method that subdivides an image into its constitutive regions or objects. The level until which this process is carried out depends on the problem to be solved. This means, segmentation stops once the objects of interest in an application have been isolated. For example, in automated inspection in electronic assemblies is of particular interest to analyze images with the purpose of determining the presence or absence of anomalies such as missing components or broken paths. In this case, segmentation process is carried out until the required level to identify these elements.
On the other hand, digital images are usually affected by some degrading factors as blurring or noise coming from image acquisition systems or during the transmission-reception process, resulting in degraded or distorted images of the real world and, as a consequence, yielding inadequate segmentation results. A degradation process can be described as a degradation function H that, together with an additive noise term n, it operates on an input image x and produces a degraded image y: y = Hx + n.
Given y, some previous knowledge about the degradation function and some knowledge about the additive noise term, the aim is to obtain an estimation x of the original image x for a good segmentation of the regions or objects into it. The more we know about H and n, the closer x will be to x [1]. The simplest case is when the degradation function H is modeled as a linear function, but it could be non linear in many cases and then the problem becomes more complex. In this work it will be assumed that H is the identity operator and we consider only degradation due to Gaussian noise. In general, segmentation methods are based on two basic properties of the pixels in relation to their local neighborhood, discontinuity and similarity [2,3]. Methods based on discontinuity property of the pixels are called boundary-based methods, while methods based on similarity property are called region-based methods. Unfortunately, these both techniques often fail to produce accurate segmentation. For example, in boundary-based methods, if an image is noisy or if its region attributes differ by only a small amount between regions, edge detection may result in spurious and broken edges. On the other hand, region-based methods always provide closed contour regions and make use of relatively large neighborhoods in order to obtain sufficient information to decide the aggregation of a pixel into a region. This tends to sacrifice resolution and detail in the image, which can result in segmentation errors at the boundaries of the regions, and in failure to distinguish regions that would be small in comparison with the block size used [2].
To overcome these difficulties, the use of Markov random fields (MRF) within a Bayesian framework has become a powerful method and has been used in different works and different areas such as medicine [4][5][6][7][8], texture modeling [9][10][11], image segmentation and restoration [12][13][14][15][16][17][18] and SAR imagery classification [10], [19][20][21]. This is because enables posing this problem, and many others in image processing, as statistical estimation problems [9] where the solution is going to be estimated from the degraded image. Prior distributions of images, which encode contextual constraints [14], can be modeled with MRF. The basic premise is that neighborhood pixels are expected to have similar characteristics [10]. Because MRF models state global statistics in terms of local neighborhood, all computations are restricted to a local window. Typical MRF algorithms visit all sites in the image in a specific order and execute a local computation at each site; this is repeated until some convergence criterion is reached [9].
Usually, information data (input image) is not enough for an accurate estimation of the original image, so the regularization of the problem is necessary. This means that a priori information or assumptions about the structure of x need to be introduced in the estimation process [16]. The a priori knowledge is given in terms of a probability distribution. This distribution, together with a probabilistic description of the noise that corrupts the observations, allows the use of Bayes theory to compute the posterior distribution which represents the likelihood of a solution x given the observations y [22].
Statistical methods look for the solution that best matches the probabilistic behavior of the data. Maximum likelihood (ML) estimation selects the reconstruction which most closely matches the data available. Maximum a posteriori (MAP) estimation allows the introduction of a prior distribution that reflects knowledge or beliefs concerning the types of images acceptable as estimates of the original one [4]. There is a wide variety of MRF models; the difference between them lies on the choice of a potential function. Each of them characterizes the interactions between pixels in the same local group by assigning a larger cost to configurations of pixels which are less likely to occur. The main idea is to avoid excessive penalties extracted, for instance, by the Gaussian's quadratic potential function, which tends to blur edges due to the high cost of abrupt transitions [5].
In this work, we introduce a novel potential function named semi Huber MRF as a proposal of a new algorithm for image segmentation. The main advantage of this model lies in the fact that the hyperparameters to be tuned for an adequate result are less than those needed by other models that were taken as a reference to verify the results of the proposed one. Section 2 provides an overview about the Bayesian approach. Section 3 includes a theoretical basis of Markov random fields and describes the MAP estimators used in this work as a reference. A description of the proposed model and its corresponding MAP estimator is provided in section 4. Some results and comments for image segmentation experiments are presented in section 5. Finally, in section 6 some conclusions are given.

The Bayesian approach
Problems consisting on finding the solution of the model to restore some or all features of the objects in an image using assumptions about the real world are called inverse problems. A common approach to solve this kind of problems is the Bayesian modeling. A Bayesian model is a statistical description of an estimation problem that consists of three components. The first component, the prior model p(x) that is a probabilistic description of the real world or its properties, that we are trying to estimate, before collecting data. The second component, the sensor model p(y|x), is a description of the behavior of noise or stochastic characteristics that relate the original state x to the sampled input image or sensor values y. These two components can be combined to obtain the third component, the posterior model p(x|y), which is a probabilistic description of the current estimation of the original scene x, given the observed data y. The model is obtained using the Bayes rule: where p(y) is the density function of y and is constant if the observed image is provided [21].
To use Bayesian modeling in image processing, it is necessary to encode somehow the smoothness inherent in the image. This can be done by describing the correlation between adjacent pixels of the image in the prior model, and a simple method for modeling such correlation are the Markov random fields [23].
In its usual application [12], Bayesian modeling is used to find the maximum a posteriori (MAP) estimate, that is, the value of x that maximizes the conditional probability p(x|y). It is one of the more efficient and most used estimators [4,5,8,9,18] defined by: where g(x) is a MRF function that models prior information of the phenomena to be estimated as a probability distribution, X is the set of pixels capable to maximize p(x|y) and p(y|x) is the likelihood function from y given x [24].

Markov random fields and MAP estimators
In this section, some previous concepts about Markov random fields are given. Then we present and describe some existing MRF models that were taken as a reference in order to evaluate the performance of our proposal. Finally, the corresponding MAP estimators for each model are defined.
be the set of sites of a rectangular lattice for a 2D image of m × n size. Its elements correspond to the locations where an image is sampled. The sites in S are related to one another via a neighborhood system defined as  where N i is the set of sites neighboring i. Figure 1(a) shows first order neighborhood with four neighbors, second order neighborhood is shown in Fig. 1(b) with eight neighbors, and Fig. 1(c) shows higher order neighborhoods.
A clique c is defined as a subset of sites in S that consists of a single site c = {i}, a pair of neighboring sites c = {i, i }, a triple of neighboring sites c = {i, i , i }, and so on. All posible cliques for the second order neighborhood system are displayed in Fig. 2 [19,25]. The Hammersley-Clifford theorem establishes the equivalence between Markov random fields and Gibbs random fields [12,21,25,26], so the MRF can be determined by defining the potential function in a Gibbs distribution, whose basic form is given by where is the partition function and in practice is a normalization constant value. T is the temperature parameter, that controls the sharpness of the distribution [12] and in practice is assumed to be 1 [25]. U(x) is the energy function such that which is determined as a sum of clique potentials V c (x) over all posible cliques C in the neighborhood [8,19,21,25]. Most segmentation approaches based on MRF use the multi-level logistic (MLL) model to define the potential function. Usually the second order pairwise cliques are selected and the potentials of all non-pairwise cliques are defined to be zeros [21]. In that sense, we will consider in this work simple MRF's based on a second order neighborhood (eight sites) and potential functions of the form ρ(λ (x i − x j )) which act on pairs of sites, where λ is a constant that scales the difference between pixel values.

Generalized Gaussian MRF (GGMRF)
A common choice for the prior model is a Gaussian Markov random field (GMRF). The distribution for a random field of this kind has the form where B is a symmetric positive definite matrix, named the precision matrix, λ is a constant and x t is the transpose of x. To make this to correspond to a Gibbs distribution with neighborhood system ∂ s, it is imposed the constraint that B sr = 0 when s is not in ∂ r and s = r. This distribution may then be rewritten, to form the log likelihood, as where a s = ∑ r∈S B sr and b sr = −B sr .
The generalization of the GMRF is made by replacing the power 2 by p, where 1 ≤ p ≤ 2 and λ is a parameter inversely proportional to the scale of x [5]. The potential function for a GGMRF is then where a s ≥ 0 and b sr > 0, s is the site of interest, r corresponds to the local neighbors and c is a constant term. In practice it is recommended to take a s = 0 for Gaussian noise assumption, thus the unicity of the MAP estimator can be assured, resulting in Here, b sr is a constant that depends on the distance between pixels s and r. The selected value for power p is determinant, since it constrains the convergence speed of the local or global estimator and the quality of the estimated image [24].

Welsh's potential function.
The Welsh's potential function, proposed by Rivera [16] as a hard redescender potential function with granularity control, is defined as where μ is the granularity control parameter, ϕ 1 (x) = e 2 with e = (x s − x r ), and k is a positive scale parameter for edge preservation.

Tukey's potential function.
Another hard redescender potential function also proposed by Rivera [16] is the Tukey's potential function with granularity control, given by where, in this case k is also a scale parameter and μ provides the granularity control too.

MAP estimators
With Eq. (3) and the MRF's previously defined, the corresponding MAP estimators are deduced. The MAP estimator for a GGMRF [5] is given by where the term ∑ s∈S |y s −x s | q stands for the term log p(y|x), and the term σ q λ p ∑ {s,r}∈C b sr |x s − x r | p corresponds to the term log g(x) of Eq. (3). This applies for the other models. The minimization problem can be solved from a global or local point of view considering various methods [16,28,29,30]. As global iterative techniques we have the descendent gradient, the conjugate gradient, the Gauss-Seidel, among others. Local minimization techniques work minimizing at each pixel x s . In this work the Levenberg-Marquardt algorithm was used for local minimization, because all the parameters included into the potential functions were chosen heuristically or according to values proposed in references [24]. Thus, local estimation is implemented with the expression where the subset ∂ s stands for the sites in the neighborhood. Estimator performance depends on the chosen values for parameters p and q. For example if p = q = 2, we have the Gaussian condition for the potential function and the obtained estimator is similar to the least-square one since the likelihood function is quadratic. Moreover, when p = q = 1, the criterion is absolute and the estimator converges to the median one; nevertheless, this criterion is not differentiable at zero and it causes instability in the minimization process [24]. The form of the first term in Eq. (12) depends on the type of noise regarded. For all experiments made in this work we assumed that noise has a Gaussian distribution with mean value μ n and variance σ 2 n . From this idea, the corresponding value for parameter q was set at 2.
As a second MAP estimator we introduce in the second term of Eq. (3) the Welsh's potential function [16] x MAPwel = arg min . (14) In the same context, local estimation for this model is given by Finally, the last MAP estimator used as a reference, corresponds to the Tukey's potential function [16], which is given by the expression , (16) where the local estimation is obtained by

Semi-Huber proposal
For log g(x) in Eq. (3), we introduce the Huber-like norm or semi-Huber potential function, which has been used in one dimensional robust estimation problems [27] for the case of nonlinear regression. This function has been modified for the two dimensional case according to the following equation: where s is the site of interest, r corresponds to the local neighbors, c is a constant term and Here Δ 0 > 0 is a constant value and ϕ 1 (x) = e 2 with e = (x s − x r ). Semi-Huber potential function is shown in Fig. 3 for Δ 0 = 1. Near zero the function is quadratic and for values beyond ±1, the function is almost linear. This linear region of the function allows sharp edges, while convexity makes MAP estimate efficient to compute.
The MAP estimator corresponding to the proposed semi-Hubber potential function, Eq. (18) [24,27], is given by: Equation (20) is simplified to obtain: for local MAP estimation in a single site.

Experiments and results
In this section, we present a set of experiments to show the performance of the proposed model in the segmentation of some images. In all cases, we used the Levenberg-Marquardt algorithm provided in the optimization toolbox of MATLAB R2009a for the local minimization stage. For this algorithm, we need to introduce the initial value, X 0 , to start the search of the solution. It was observed that the final result depended on the choice of this value, which adds one additional hyperparameter to the segmentation process. All the tests was executed on a Mac Pro computer with a 2 × 2.8 GHz Quad-Core Intel Xeon processor and 2 GB at 800 MHz DDR2 RAM. The first experiment was carried out with an image of the brain, trying to segment in three tissues: gray matter, white matter and cerebrospinal fluid. Figure 4 shows segmentation results of the brain image corrupted by centered Gaussian noise, n ∼ N(0, Iσ 2 n ). In top row from left to right we have the original brain image, brain image with noise (σ 2 n = 10), and segmentation result using the semi-Huber MRF. In the bottom row we have the segmentation result using the GGMRF, segmentation result using the Welsh's MRF and segmentation result using the Tukey's MRF. Table 1 shows computation times taken by each model, and Table 2 shows the list of hyperparameters to be tuned.
It can be seen that visual result obtained with the proposed model is good enough with respect to those obtained with the others and difference in computation times is not significant. What matters here is the fact that, from the original semi-Huber MRF model, we only had to adjust one hyperparameter value, Δ 0 in this case, keeping λ = 1 (two if one takes into account the initial value in the optimization stage). Therefore, the number of times that it was necessary to run the segmentation process was significantly lower than with other models. Semi-Huber GGMRF Welsh A second experiment was made with a geographical image of Paso de las Piedras dam, located in Argentina, taken from Google Earth. In this case, the main interest is on segment water from no water in spite of the noise present. Figure 5 shows segmentation results of the dike image. In top row from left to right we have the original dike image, dike image corrupted by Gaussian noise with σ 2 n = 20, and segmentation result using the semi-Huber MRF. In the bottom row we have the segmentation result using the GGMRF, segmentation result using the Welsh's MRF and segmentation result using the Tukey's MRF. Table 3 shows times taken by each model and Table 4 shows the list of hyperparameters values by which results presented in Fig. 5 were obtained.
Here, it can be seen that with the semi-Huber proposed model, noise reduction looks more satisfactory than GGMRF model for example, where we can see in the water region, bigger gray points than the others. Even though the semi-Huber model took the longest time, the relative difference is negligible.
From a third experiment, we show segmentation results for a geographical image of Villahermosa Tabasco city, from the German Spatial Agency, taken by the german satellite TerraSAR-X  in november 13th 2007. Figure 6 shows in top row from left to right: original image, image corrupted by Gaussian noise with σ 2 n = 20, and segmentation result using the semi-Huber MRF. In the bottom row we have the segmentation result using the GGMRF, segmentation result using the Welsh's MRF and segmentation result using the Tukey's MRF.
In this case, computation times during the segmentation process for each model had a similar behavior than in the previous cases. It means, there was not a significant difference between them, being about 575 s on average for a 282 × 190 image size. Again, we remark the advantage that using semi-Huber potential function, fewer parameters have to be tuned according to the type of image, which allows faster segmentation results.
In order to construct a more objective assessment of the results in numeric form, we also included an experiment performed with a synthetic image of the chessboard type, which was degraded with Gaussian noise and then segmented, applying the three models of reference and the proposed one. Two error measures were considered, from segmented images with respect to the original, namely the mean square error (MSE) and the mean absolute error (MAE). Figure 7 shows the synthetic image to the left, and to the right, we have the same image degraded by Gaussian noise with σ 2 n = 20. Figure 8 contains the segmentation results obtained  Table 5, where it can be seen that the proposed model presents another advantage, the smaller error measures.

Conclusion
It was proposed the semi-Huber MRF model to construct a novel algorithm for image segmentation. We verified that this proposal has a satisfactory performance. Times in execution were similar and visual segmentation results were agree with those obtained from models reported in other works. In the case of the Generalized Gaussian, Welsh's and Tukey's Markov random fields, several tests had to be made to find adequate parameter values for each kind of image, since one has more degrees of freedom. While, for the semi-Huber MRF we obtained good results with fewer tests because the parameter adjustment is less complicated. On the other hand, the proposed model produced very good results concerning to error measures. Some of the experiments presented were made on geographical images because we pretend an application on the analysis of this kind of images, properly that concerning to hydrographic resources.