HALF-LINEAR REGULARIZATION FOR NONCONVEX IMAGE RESTORATION MODELS

Abstract. Image restoration is the problem of recovering an original image from an observation of it in order to extract the most meaningful information. In this paper, we study this problem from a variational point of view through the minimization of energies composed of a quadratic data-fidelity term and a nonsmooth nonconvex regularization term. In the discrete setting, the existence of minimizers is proved for arbitrary linear operators. For this kind of problem, fully segmented solutions can be found by minimizing objective nonconvex functionals. We propose a dual formulation of the model by introducing an auxiliary variable with a double role. On the one hand, it marks the edges and ensures their preservation from smoothing. On the other hand, it makes the criterion half-linear in the sense that the dual energy depends linearly on the gradient of the image to be recovered. This leads to the design of an efficient optimization algorithm with wide applicability to several image restoration tasks such as denoising and deconvolution. Finally, we present experimental results and compare them with TV-based image restoration algorithms.


1.
Introduction. An important problem in image processing is the reconstruction of an original image from a degraded observation of it. This can be done by removing some kind of distortion or perturbation in order to recover the underlying true image. Such a process, called image restoration, has been studied at length from different points of view [11,27,28,43,44]. Most degradations are caused by noise (restoration by denoising) and by blur (restoration by deblurring).
A simple model for image restoration is based on the following steps: use a priori information on the degradation, propose a model for the degradation process, and formulate and evaluate objective criteria of goodness. In the variational framework, this leads to the minimization of a functional involving a first regularization term related to the gradient of the image to be recovered u and a second one which measures the fidelity to the data f. The regularization is written in terms of a so-called potential function φ and the problem consists in the minimization of an energy functional such as

E(u) = ∫ φ(|∇u|) dx + (λ/2) ‖Au − f‖²,

where u and f belong to a suitable space of functions, A is a linear operator modeling the degradation of u, and λ is a trade-off parameter between the two terms. The most common model is the widely cited total variation smoothing functional of Rudin, Osher and Fatemi [43], where φ(t) = t. Although convex functions are often used for the regularization term, Aubert and Kornprobst [3] pointed out that nonconvexity gives better results in numerical tests. Since the pioneering work of S. Geman and D. Geman [21], different nonconvex potential functions have been considered either in a statistical or a variational framework [3,6,19,32,37,39]. For this reason, many image restoration problems are often converted into nonconvex optimization problems. However, their practical interest is limited by theoretical deficiencies about the existence of solutions and the convergence of minimizing sequences. Furthermore, nonconvexity requires complex computational stages involving backward-parabolic Euler-Lagrange equations that lead to ill-posed problems. For instance, it is well known that the Perona-Malik PDE [39] is formally ill-posed.
An important effort has been made in order to avoid the numerical intricacies arising from nonconvex optimization. Robini et al. [41,42] combined simulated annealing with deterministic continuation for the reconstruction of piecewise constant images. On the other hand, Nikolova et al. [36,37] proposed a nonsmooth continuation GNC-like approach in the context of linear inverse problems. The segmentation and the restoration problems were solved jointly by minimizing a nonconvex and nonsmooth energy. However, there is no guarantee for the convergence of the continuation method. Furthermore, an important drawback is the use of a large number of additional variables that increase the computational complexity of the algorithm. Other interesting approaches for dealing with nonconvex optimization can be found in [9], where the authors proposed a gradient sampling algorithm, and in [15], where a smoothing nonlinear conjugate gradient method was introduced. Moreover, a Newton-type solution algorithm was proposed in [25], while in [1,8], the authors introduced a proximal alternating linearized minimization algorithm by completely departing from the convex setting.
In this work, we are interested in expanding the scope of edge-preserving convex approximations to nonsmooth nonconvex regularizers. By imposing some conditions on the potential function to preserve the main information of the image, that is, forward diffusion in quasi-constant areas and enhancement around edges, we can interpret the dynamics of the associated Euler-Lagrange equation. In this sense, our proposed model acts as a weighted total variation, where the weight depends on the norm of the gradient. We also introduce an auxiliary variable ω that transforms the primal problem into a dual one by means of an augmented energy J* which is linear with respect to the norm of the gradient of u. For this reason, we refer to our approach as half-linear regularization. More precisely, we shall have that

J(u) = min_ω J*(u, ω),

where the solution of min_ω J*(u, ω) will be explicitly given by ω = φ′(|∇u|). We shall finally propose a dual algorithm with ω acting as an edge detector.
Our proposal has a direct relationship with two different approaches to solve the optimization problem in the nonconvex case. On one hand, Menard et al. [2,24] proposed a weighted total variation model by using the ideas of anisotropic diffusion and the total variation. On the other hand, Charbonnier et al. [14] provided a deterministic algorithm to find the solution of the optimization problem by minimizing a dual energy, which involves an isotropic weighted Laplacian. This is the so-called half-quadratic criterion and it is based on the pioneering papers by Geman et al. [19,20]. In our case, the minimization of the dual energy involves an anisotropic term given by the weighted total variation. From a numerical point of view, the weighted TV is not straightforward to minimize since it is not differentiable at zero. However, TV-regularizers can better preserve sharp edges and object boundaries, which are usually the most important features to recover, than Laplacian-like regularizers. Now, we briefly outline the novelty of our approach and our contributions. This paper has four major contributions, which are subsequently outlined:
• We prove the existence of minimizers for general nonconvex discrete energies by only imposing that the linear operator modeling the degradation of the image does not annihilate constant functions, as the gradient does. Because coercivity is not guaranteed for some of the functionals under consideration, the novelty lies in the original mathematical tools used in the proof.
• The main contribution of the paper is the newly proposed half-linear regularization for nonsmooth nonconvex energies satisfying a suitable set of conditions for general edge preservation.
• Nikolova et al. [36] proved that the minimization of nonsmooth nonconvex energies provides piecewise constant images with neat edges. We relax the conditions imposed there and, thus, we extend the theorem to more general functionals.
• The last contribution is an optimization dual algorithm that exploits the properties of the half-linear regularization. We establish convergence results for the newly proposed scheme. Numerical experiments confirm the role of the dual variable as an edge detector and exhibit the efficiency of our approach to restore piecewise constant images from noisy and blurred data.
The rest of the paper is organized as follows. Section 2 presents the general minimization problem we focus on and establishes the existence of minimizers for general nonconvex energies. We discuss in Section 3 the set of conditions imposed on the potential function under which the half-linear regularization applies. This section contains theoretical results that give meaning to our approach and allow us to introduce a robust optimization algorithm in Section 4. Section 5 exhibits the applicability of the algorithm to image denoising and deconvolution. We compare our results with TV-based restoration algorithms and with the half-quadratic criterion. We conclude with some final remarks.

2.
The general minimization problem. The most commonly studied degradation model in image restoration relates the observed data f to the underlying true image u by means of

(1) f = Au + η,

where η accounts for additive white Gaussian noise and A is a linear operator modeling the degradation of u caused, for instance, by optical blurring. Image denoising models may be seen as a special case by taking A as the identity operator. Unfortunately, recovering u from (1) is in some cases an ill-posed inverse problem in the sense of Hadamard [26]. Indeed, A*A is not always one-to-one and, even when it is, its eigenvalues may be small, causing numerical instability. Prior information on the underlying image can then help the restoration of u. A flexible way to do that is regularization. In this case, the solution is defined as the minimizer of an energy functional. The general problem we consider in this paper lies in the minimization of an objective functional composed of a nonsmooth nonconvex regularization term and a quadratic data-fidelity term:

(2) min_u ∫ φ(|∇u|) dx + (λ/2) ‖Au − f‖².

In this setting, φ is a nonsmooth nonconvex potential function and λ is a positive weighting constant that controls the trade-off between a good fit to f and the amount of oscillation of u. An important requirement in image restoration is that φ allows the recovery of large differences at the locations of edges and smoothes differences elsewhere.
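To make the degradation model (1) concrete, the following numpy sketch produces an observation f = Au + η. The 3×3 mean filter standing in for the blur operator A and the periodic boundary handling are illustrative choices of this sketch, not the paper's.

```python
import numpy as np

def degrade(u, sigma_noise=10.0, blur=True, seed=0):
    """Form an observation f = A u + eta: here A is a simple 3x3 mean blur
    (a stand-in for a point-spread function, with periodic boundary via
    np.roll) and eta is additive white Gaussian noise."""
    rng = np.random.default_rng(seed)
    if blur:
        Au = sum(np.roll(np.roll(u, dx, 0), dy, 1)
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1)) / 9.0
    else:
        Au = u.copy()  # denoising case: A is the identity operator
    return Au + sigma_noise * rng.standard_normal(u.shape)
```

Note that a blur operator of this kind annihilates no constant image, which is exactly the situation covered by assumption (A0) below.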
Let us now introduce the discrete minimization problem we are interested in. To simplify, we assume that an image is represented by a two-dimensional matrix of size N × N. We denote by X the Euclidean space R^{N×N} endowed with the usual inner product ⟨u, v⟩_X = Σ_{i,j} u_{i,j} v_{i,j}, and we set Y = X × X. The discrete gradient ∇u ∈ Y is defined by forward differences, (∇u)_{i,j} = ((∇u)¹_{i,j}, (∇u)²_{i,j}), with

(∇u)¹_{i,j} = (u_{i+1,j} − u_{i,j})/α if i < N, and 0 otherwise,
(∇u)²_{i,j} = (u_{i,j+1} − u_{i,j})/α if j < N, and 0 otherwise,

for each i, j ∈ {1, . . . , N}. Here α > 0 is a real scaling parameter that tunes the value of the gradient above which a discontinuity is detected. The standard Euclidean norm of R² will be simply denoted by |·| and, thus, |∇u|²_Y = Σ_{i,j} |(∇u)_{i,j}|². With this notation, the general problem we consider is

(3) min_{u∈X} J(u),

where J is the discrete functional given by

(4) J(u) = Σ_{i,j} φ(|(∇u)_{i,j}|) + (λ/2) ‖Au − f‖²_X.

2.1. Existence of minimizer. We start our investigation of problem (3)-(4) by establishing the existence of a solution. A basic requirement is

(A0) ker(A) ∩ ker(∇) = {0},

which amounts to supposing that A does not annihilate constant functions, as the gradient operator does. Note that if φ is a bounded nonconvex potential function, then (4) is not coercive unless we impose A to be invertible. However, this hypothesis does not apply when A is, for instance, a convolution operator with a point spread kernel. Therefore, the coercivity of the functional cannot be used. In the following result, we provide a novel and original proof instead.

Theorem 2.1. Let (A0) hold and let φ : [0, +∞) → [0, +∞) be continuous and increasing with φ(0) = 0. Then the functional J in (4) has at least one minimizer.

Proof. Since φ(t) ≥ 0 for t ∈ (0, +∞) and φ(0) = 0, the functional J(u) is bounded below by zero for every u ∈ X and, thus, its infimum exists. Let us denote it by m = inf_{u∈X} J(u).
Define G(u) = (λ/2) ‖Au − f‖²_X and consider the optimization problem inf_{u∈X} G(u). It is easy to prove that G has at least a minimizer. If A is of full rank, then J is coercive and continuous, so it attains its minimum and the proof would be finished. Otherwise, the solution set is given by the affine variety u_0 + ker(A), where u_0 is a particular solution of the associated equation A*(Au − f) = 0. Let µ_0 = min_{u∈X} G(u). For a fixed µ ≥ µ_0, consider the subset

M_µ = {u ∈ X : G(u) = µ},

and define the functional J_1(u) = Σ_{i,j} φ(|(∇u)_{i,j}|), together with the problem

(5) min_{u∈M_µ} J_1(u).

Let us prove that J_1 attains its minimum on M_µ. Fix u_0 ∈ M_{µ_0} and define τ = max_{i,j} |(∇u_0)_{i,j}|. We can assume τ > 0; otherwise u_0 would be a minimizer of (5) and we would be done. Since M_µ is unbounded, we can move sufficiently far from u_0 in M_µ so that any v = u − u_0, with u ∈ M_µ, tends to be parallel to ker(A). Therefore, (A0) implies that there exists C > 0 for which the key estimate (6) holds. On the other hand, take R > C and define the compact set K = M_µ ∩ B_R(u_0), where B_R(u_0) denotes the closed ball of radius R centered at u_0. Because J_1 is continuous, there exists u_µ ∈ K such that

J_1(u_µ) = min_{u∈K} J_1(u).

It remains to prove that u_µ is in fact a minimizer of (5) on M_µ. For that, fix u ∈ M_µ \ K and define v = u − u_0. Therefore, one has that ‖v‖_X ≥ R > C, and (6) applies. The previous inequalities and the monotonicity of φ imply J_1(u) ≥ J_1(u_0) ≥ J_1(u_µ), so u_µ is a minimizer of (5) on M_µ. Finally, consider the function g : [µ_0, +∞) → R defined as g(µ) = µ + k_µ, where k_µ = J_1(u_µ) denotes the minimum value of (5) on M_µ. Since g is continuous and coercive, there exists µ̂ ∈ [µ_0, +∞) such that g(µ̂) = min_{µ≥µ_0} g(µ).
In what follows we shall prove that g(µ̂) = m. By definition of the infimum, for any ε > 0 there exists v ∈ X such that J(v) < m + ε. Define µ* = (λ/2) ‖Av − f‖²_X, whence one clearly has that v ∈ M_{µ*}, and let u_{µ*} be a minimizer of J_1 on M_{µ*}. One deduces that

g(µ*) = µ* + J_1(u_{µ*}) ≤ µ* + J_1(v) = J(v) < m + ε.

Since g(µ̂) ≤ g(µ*) < m + ε for any ε > 0, then g(µ̂) ≤ m. On the other hand, we have that

g(µ̂) = µ̂ + J_1(u_{µ̂}) = G(u_{µ̂}) + J_1(u_{µ̂}) = J(u_{µ̂}) ≥ m.

Then, g(µ̂) = m and u_{µ̂} ∈ M_{µ̂} is a minimizer of J, which ends the proof.
Beyond guaranteeing the existence of a solution for the minimization problem under consideration, the main contribution of Theorem 2.1 lies in the mathematical tools used in its proof.
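For concreteness, the discrete objective (4) can be evaluated directly. The sketch below uses forward differences for the gradient; the exact place where the scaling parameter α enters, and the sample potential φ(t) = t/(1+t), are assumptions of this illustration rather than choices fixed by the paper.

```python
import numpy as np

def grad(u, alpha=1.0):
    """Forward-difference discrete gradient with zero difference at the
    last row/column, scaled by alpha (assumed discretization convention)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = (u[1:, :] - u[:-1, :]) / alpha
    gy[:, :-1] = (u[:, 1:] - u[:, :-1]) / alpha
    return gx, gy

def energy_J(u, f, lam, phi, A=lambda v: v, alpha=1.0):
    """Discrete objective J(u) = sum phi(|grad u|) + (lam/2) ||A u - f||^2."""
    gx, gy = grad(u, alpha)
    mag = np.hypot(gx, gy)  # pointwise Euclidean norm of the gradient
    return phi(mag).sum() + 0.5 * lam * ((A(u) - f) ** 2).sum()
```

On a constant image that matches the data, both terms vanish, in line with φ(0) = 0.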
3. Half-linear regularization. It is well known that minimization of nonconvex energies involves numerical difficulties that restrict the methods that can be used. In order to overcome these drawbacks, we reformulate the objective functional J as an augmented energy J * by introducing a closed-form dual variable.
3.1. Assumptions on the potential function. In order to correctly introduce the half-linear regularization, we need to assume some specific conditions on the potential function. A qualitative reasoning on the Euler-Lagrange equation is developed here to justify the set of imposed conditions. The requirements are imposed to have forward diffusion (smoothing) in quasi-constant areas and backward diffusion (enhancement) around edges.
If we focus on the continuous case, suppose that some particular function u happens to be a minimizer of (2). Then it formally satisfies the Euler-Lagrange equation

(7) −div(φ′(|∇u|) ∇u/|∇u|) + λ A*(Au − f) = 0.

Note that the divergence term is clearly related to the weighted total variation regularizer and, thus, φ′(|∇u|) acts as the "weight". In the case that a pixel belongs to a quasi-constant area, we can assume that all neighboring gradients are close to zero. Consequently, we impose that lim_{t↓0+} φ′(t) = M, 0 < M < +∞. Then φ′(|∇u|) is close to M and (7) behaves as the usual total variation near points where the norm of the gradient is small. On the contrary, consider now a pixel belonging to an edge. A discontinuity appears in its neighborhood and, thus, the gradient in the orthogonal direction of the edge is large. In this case we impose lim_{t→+∞} φ′(t) = 0, so that φ′(|∇u|) is close to zero in the transversal direction of the edge and the singularity is preserved. Furthermore, since we want to avoid that small variations in the gradient produce large changes in the value of the weight, we impose φ′ to be continuous. It also seems natural that there should be a one-to-one correspondence between values of the gradient and values of the weight, so that φ′ must be strictly monotone.
Requirements involved in (A1)-(A2) are typical in image restoration [3]. In general, (A3)-(A5) are conceived in such a way that small variations are smoothed, assuming they are due to noise, and high variations are preserved, supposing they are generated by edges. But noise can also produce high variations while edges with low contrast can generate low gradient modulus. This may be seen as a limitation of the proposed model, although it commonly appears in almost all edge-preserving regularizers such as the half-quadratic criterion [14,19,20].

Remark 1.
Since φ is given on [0, +∞), we can define the right derivative of φ at zero by φ′(0) := M due to (A2) and (A5). Therefore, φ is continuously differentiable on [0, +∞). However, under the above conditions, the even extension of φ to R would be non-differentiable at zero.
Let us now study the above requirements from another point of view. Using the tangent and normal directions to the level lines along which the image intensity is constant, the Euler-Lagrange equation can be decomposed in terms of the directional derivatives. Hence, (7) can be rewritten as

(8) (φ′(|∇u|)/|∇u|) u_TT + φ″(|∇u|) u_NN = λ A*(Au − f),

where u_TT and u_NN denote the second derivatives in the tangent and the normal direction, respectively. At first glance, (A1)-(A5) imply that we encourage forward diffusion along isophotes, that is, lines of constant grey level value, combined with backward diffusion along the gradient direction. Perona and Malik [39] showed that edge enhancement and reconstruction of blurred images can be performed by imposing the normal diffusion coefficient to be negative. In our case, (A3) implies that φ″(|∇u|) < 0 and, thus, an inverse diffusion might be observed in the normal direction when using the class of potential functions proposed in this paper. Therefore, the regularization term in (4) always sharpens the edges of the image to some extent.
At locations where the gradients are low, (A5) ensures that the tangent diffusion coefficient φ′(|∇u|)/|∇u| is very large. Consequently, the diffusion takes place mainly following the level lines along which the image intensity is constant, while these curves are also enhanced. In a neighborhood of an edge, where the image has a strong gradient, condition (A4) implies that

lim_{t→+∞} φ′(t)/t = 0 and lim_{t→+∞} φ″(t) = 0.

Consequently, there is diffusion neither in the tangent nor in the normal direction. Note that φ′(t)/t tends to 0⁺, whereas φ″(t) tends to 0⁻, so that the edge is not only preserved but also slightly enhanced. After all, the diffusion process leads to constant regions surrounded by sharp edges.
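The limiting behavior just described is easy to check numerically for a concrete concave potential. The example φ(t) = t/(1+t) below is our own illustration and is not claimed to be one of the potentials in Table 1.

```python
import numpy as np

# One concrete concave potential, phi(t) = t / (1 + t):
#   phi'(t)  = 1 / (1 + t)^2   -> M = phi'(0+) = 1 and phi'(t) -> 0 at infinity,
#                                  matching the limits imposed in (A4)-(A5)
#   phi''(t) = -2 / (1 + t)^3  -> negative normal diffusion coefficient (A3),
#                                  vanishing as t -> +infinity
phi = lambda t: t / (1.0 + t)
dphi = lambda t: 1.0 / (1.0 + t) ** 2
d2phi = lambda t: -2.0 / (1.0 + t) ** 3
```

With this potential, the tangent coefficient φ′(t)/t blows up near zero (smoothing in quasi-constant areas) while both directional coefficients vanish at infinity (edges preserved and slightly enhanced).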

3.2.
The main dual theorem. The key to the half-linear regularization is to introduce a dual variable ω, with a closed-form expression, which detects edges and image features. Our approach is based on the theorem below.

Theorem 3.1. Let φ satisfy (A1)-(A5). Then:
i) There exists a strictly decreasing and strictly convex function ψ : (0, M) → (0, β), for suitable β > 0, such that

(9) φ(t) = min_{ω∈(0,M]} (ωt + ψ(ω)) for every t ≥ 0.

ii) For each fixed t > 0, let ω ∈ (0, M) be the value for which the minimum is reached. Then, ω is unique and it is given by ω = φ′(t). Furthermore, for t = 0 the minimum is attained at ω := M.
Proof. In view of (A2), φ′ : (0, +∞) → (0, M) is continuous and strictly decreasing. Then, φ is strictly concave on (0, +∞) and, thus,

φ(t) ≤ φ(s) + φ′(s)(t − s) for all t, s > 0,

with equality only if t = s. Furthermore, there exists a strictly decreasing inverse function (φ′)⁻¹ : (0, M) → (0, +∞). For each ω ∈ (0, M), let us define the function

(11) ψ(ω) = φ(t_ω) − ω t_ω, with (12) t_ω = (φ′)⁻¹(ω),

where we should pay special attention to the limit cases ω → 0⁺ and ω → M⁻. On the other hand, substituting (12) into (11) and taking into account that (φ′)⁻¹ is one-to-one, we obtain

φ(t) = min_{ω∈(0,M]} (ωt + ψ(ω)),

which proves (9).
Remark 2. The proof of the theorem reveals the explicit expression of ψ, although the second statement makes clear that we do not really need ψ to solve (9).
By assuming that (A1)-(A5) hold, the functional (4) can be written as

(14) J(u) = min_{ω∈(0,M]^{N×N}} J*(u, ω),

where the dual energy is explicitly given by

(15) J*(u, ω) = Σ_{1≤i,j≤N} (ω_{i,j} |(∇u)_{i,j}| + ψ(ω_{i,j})) + (λ/2) ‖Au − f‖²_X.

Note that J* is linear with respect to the norm of the gradient of the main variable, and it is strictly convex with respect to the auxiliary variable when the main one is fixed. In fact, J* involves a weighted total variation term on u with ω acting as the weight. Furthermore, from Theorem 3.1, the value of the dual variable that minimizes (15), when u is fixed, is explicitly given by

(16) ω_{i,j} = φ′(|(∇u)_{i,j}|).

It is important to emphasize that the above closed formula and the strictly decreasing property of φ′ guarantee the role of ω as an indicator of edges:
• If ω_{i,j} is close to zero, then the gradient at pixel (i, j) is large and, thus, it belongs to an edge.
• If ω_{i,j} is close to M, then the gradient at pixel (i, j) is small and, thus, it belongs to a quasi-constant region.
Table 1 lists some examples of nonconvex potential functions satisfying (A1)-(A5), and Figure 1 displays their graphs as well as the graphs of the corresponding weight functions. Table 1. Examples of potential functions satisfying (A1)-(A5), with the associated weight functions ω and the strictly convex functions ψ : (0, M) → (0, β).
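The closed-form dual update (16) is cheap to compute in practice. The numpy sketch below builds the edge map ω = φ′(|∇u|) with forward differences; the weight function used in the test is our own hypothetical example, not necessarily one from Table 1.

```python
import numpy as np

def edge_weight(u, dphi):
    """Closed-form dual variable: omega_{i,j} = phi'(|(grad u)_{i,j}|).
    Small omega marks an edge; omega close to M = phi'(0+) marks a
    quasi-constant region."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]   # forward differences, rows
    gy[:, :-1] = u[:, 1:] - u[:, :-1]   # forward differences, columns
    return dphi(np.hypot(gx, gy))
```

On a step image, ω drops to nearly zero along the jump and stays at M = φ′(0⁺) in the flat regions, which is exactly the edge-indicator behavior described above.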
Theorem 3.1 is based on similar results related to the half-quadratic regularization [14,19,20]. However, it is important to distinguish the scope of the two approaches. Functions to which the half-quadratic criterion applies do not meet all assumptions (A1)-(A5). Moreover, our potential functions and those given in [14,19,20] have different behaviors near zero, although both are edge-preserving in the general
sense. Another important difference concerns their associated Euler-Lagrange equations. The proposed model approximates the potential function by t ↦ ω|t|, which involves the weighted total variation with weight function φ′(t). On the contrary, the half-quadratic criterion uses the transformation t ↦ ωt², which leads to a weighted Laplacian with weight φ′(t)/t, closely related to quadratic functionals. After all, Theorem 3.1 is a significant contribution to the minimization of energy functionals in the image restoration field because it allows us to deal with potential functions that are not treatable by the half-quadratic approach. More concretely, we consider nonsmooth nonconvex functionals that are suitable for the restoration of piecewise constant images with neat edges or for the segmentation of noisy and blurred data. This shall be developed in the following sections.
Finally, it is obvious that minimizing quadratic functionals is simpler than minimizing the weighted total variation. Moreover, a well-known drawback in the latter case is the staircasing effect. Nevertheless, Rudin, Osher and Fatemi [43] observed that quadratic regularization does not allow recovering sharp discontinuities as the total variation does. Hence, we meet a meaningful benefit of the proposed model.

3.3.
Properties of the minimizers. The interest in the class of potential functions described in the previous subsections is motivated by theoretical results, which assert that nonconvex nonsmooth regularization provides better possibilities for restoring images composed of constant regions surrounded by neat edges [34,36,37].
We prove a similar result for the optimization problem (3)-(4) under the hypotheses imposed on φ. The main interest of this result is that fully "segmented" solutions of arbitrary linear inverse problems can be found by minimizing an objective functional whose regularization term is constructed using a nonconvex potential function. To prove it, we need to assume the following additional condition:

(A6) t ↦ φ″(t) t² strictly increases on (0, +∞) and lim_{t↓0+} φ″(t) t² = −∞.

We can now state the following result, whose proof is outlined in Appendix A.

Theorem 3.2. Let (A0)-(A6) hold and let û be a (local) minimizer of (4). Then there exists θ > 0 such that, for every i, j ∈ {1, . . . , N}, either |(∇û)_{i,j}| = 0 or |(∇û)_{i,j}| ≥ θ.
Although θ is not given explicitly, this theorem provides an interesting theoretical justification: it covers the restoration of piecewise constant images, where the number of regions and their values are not fixed in advance, from noisy and blurred data obtained at the output of a linear operator.
Theorem 3.2 is based on a similar result in [36], where Nikolova et al. showed the same property but under different conditions. To compare both results, any twice continuously differentiable function satisfying Nikolova's requirements also meets (A1)-(A6) and, thus, our theorem follows. However, φ1 in Table 1 is a good counterexample showing that the converse is not always true.

4. The proposed dual algorithm. For solving the minimization problem (3)-(4) we propose Algorithm 1, which exploits the properties of the half-linear regularization by alternating minimizations over u and ω.

Algorithm 1 Proposed dual algorithm
u⁰ = f
repeat
    ω^{n+1} = arg min_ω J*(u^n, ω) = φ′(|∇u^n|)
    u^{n+1} = arg min_u J*(u, ω^{n+1})
until convergence

On one hand, when u^n is known and fixed, Theorem 3.1 states that the minimizer of J*(u^n, ω) with respect to the dual variable ω ∈ (0, M]^{N×N} is explicitly given by

(17) ω^{n+1}_{i,j} = φ′(|(∇u^n)_{i,j}|), ∀i, j ∈ {1, . . . , N}.

On the other hand, once ω^{n+1} is known and fixed, the minimization problem

(18) u^{n+1} = arg min_{u∈X} J*(u, ω^{n+1})

is equivalent to the minimization of the objective functional J and it underlies a total variation regularizer weighted by ω^{n+1}. The weighted total variation term in the above expression is

TV_ω(u) = Σ_{i,j} ω_{i,j} |(∇u)_{i,j}|,

where ω plays the role of a weight function. In fact, this is a discretization of the standard weighted total variation [10], defined in the continuous setting for a function u ∈ L¹(Ω) by TV_ω(u) = ∫_Ω ω |∇u| dx.
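A toy implementation of the alternating scheme can be sketched as follows. This is only an illustration: the inner weighted-TV problem (18) is solved here by plain gradient descent on a smoothed surrogate √(|∇u|² + ε²) rather than by the primal-dual scheme of Section 4, and the weight function used in the test is our own example.

```python
import numpy as np

def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # negative adjoint of grad (backward differences, standard convention)
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0] = px[0]; dx[1:-1] = px[1:-1] - px[:-2]; dx[-1] = -px[-2]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def half_linear_denoise(f, lam, dphi, outer=20, inner=50, eps=1.0, step=0.1):
    """Algorithm 1 for A = I: alternate the closed-form dual update
    omega = phi'(|grad u|) with an approximate weighted-TV solve.
    The inner solve is explicit gradient descent on the smoothed
    surrogate sqrt(|grad u|^2 + eps^2), not the primal-dual scheme."""
    u = f.copy()
    for _ in range(outer):
        gx, gy = grad(u)
        w = dphi(np.hypot(gx, gy))          # dual variable / edge detector
        for _ in range(inner):
            gx, gy = grad(u)
            mag = np.sqrt(gx ** 2 + gy ** 2 + eps ** 2)
            # descend on  sum w*sqrt(|grad u|^2+eps^2) + (lam/2)||u-f||^2
            u = u - step * (-div(w * gx / mag, w * gy / mag) + lam * (u - f))
    return u
```

With ε = 1 and w ≤ 1, the diffusivity w/mag is bounded by 1, so the explicit step size 0.1 keeps the inner iteration stable.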

4.1.
Convergence analysis. Algorithm 1 converts the original nonconvex minimization problem into a sequence of convex minimization problems. As usual, convergence is a key theoretical issue for iterative schemes. However, nonconvexity introduces theoretical deficiencies about the existence of a global minimizer and the convergence of minimizing sequences {u n } n≥0 . In our case, it can be shown that {u n } n≥0 converges to a stationary point of (4) provided its stationary points are isolated. Otherwise, the accumulation points of {u n } n≥0 can be guaranteed to get arbitrarily close to the set of all stationary points of J.
For every u ∈ X let us define the mapping

(19) M(u) = arg min_{v∈X} J*(v, ω_u), where (20) ω_u = φ′(|∇u|).

Note that M is a point-to-set map that associates with every point u ∈ X a subset of X (see [29] for more details about this class of mappings). This is due to the fact that J* is not strictly convex with respect to its first variable and, thus, it may have several minimizers. The procedure (17)-(18) is thus equivalent to the iterative application of (19) from an initial point u⁰ ∈ X, where it is not relevant which of the points in the set M(u^n) is assigned to u^{n+1}. The fixed points of M satisfy u ∈ M(u). By the construction in (19)-(20), all fixed points are minimizers of J* with respect to its first variable, but they are also stationary points of J, as stated in the following proposition, whose proof is outlined in Appendix B. We can now state several results about the convergence of the proposed algorithm that theoretically strengthen it. Our analysis proceeds in much the same way as [17,38]. We first show that the sequence {J(u^n)} is decreasing and convergent, and that the sequence {u^n} has accumulation points. Under additional conditions, we finally prove that {u^n} converges to a stationary point of (4). The following theorem contains the statements about the convergence of our algorithm. All proofs are contained in Appendix C.
Theorem 4.1. Let conditions (A0)-(A5) hold. From any initial point u⁰ ∈ X, let {u^n}_{n≥0} be any sequence generated by Algorithm 1 as described in (19)-(20). Then, the following statements hold:
i) The sequence {J(u^n)}_{n≥0}, with J(u^n) = J*(u^n, ω^{n+1}), decreases and converges to J(û), where û is a stationary point of J.
ii) The sequence {u^n}_{n≥0} has convergent subsequences and their limits are stationary points of J.
iii) ‖u^{n+1} − u^n‖_X → 0 as n → +∞.
Finally, let us emphasize that the above convergence analysis does not depend on the numerical scheme used to solve the convex minimization problem (18).

Algorithmic details.
There exist many numerical algorithms for solving (18) efficiently, for instance [4,5,16,23,33,46]. We use the first-order primal-dual algorithm proposed by Chambolle and Pock in [13,40], which can be applied to a wide class of convex optimization problems comprising ours. Throughout this subsection we shall make frequent use of the so-called proximity operator, introduced by Moreau in [30] as a generalization of the notion of a convex projection operator. Let us consider the generic saddle-point problem

(21) min_{u∈X} max_{p∈Y} ⟨Ku, p⟩_Y + G(u) − F*(p),

where F* is the convex conjugate of F. Every solution (û, p̂) ∈ X × Y of (21) satisfies Kû ∈ ∂F*(p̂) and −(K*p̂) ∈ ∂G(û), so the iterates of the algorithm proposed by Chambolle and Pock [13] are

(22) p^{n+1} = (I + η∂F*)⁻¹(p^n + ηKū^n),
     u^{n+1} = (I + τ∂G)⁻¹(u^n − τK*p^{n+1}),
     ū^{n+1} = u^{n+1} + θ(u^{n+1} − u^n),

where K* is the adjoint operator of K. The algorithm thus consists of alternating a gradient ascent in the dual variable and a gradient descent in the primal variable.
The step-size parameters τ > 0 and η > 0 are chosen such that τηL² < 1, where L = ‖K‖. Finally, the parameter θ ∈ [0, 1] controls the over-relaxation in u. Now, we adapt the algorithm of Chambolle and Pock in order to solve (18). We first emphasize that ψ(ω) in (15) is not involved in (18) and, thus, our primal problem is defined as

min_{u∈X} Σ_{i,j} ω_{i,j} |(∇u)_{i,j}| + (λ/2) ‖Au − f‖²_X.

Let K be given by Ku = ω∇u, so that its adjoint operator is defined as K*(p) = −div(ωp), since ⟨Ku, p⟩_Y = ⟨∇u, ωp⟩_Y = ⟨u, −div(ωp)⟩_X for all u ∈ X and p ∈ Y. It is easy to show that ‖K‖² ≤ 8. For casting (18) in the form (21), it remains to identify F and G, and to detail the proximity operators in (22). For a better understanding, we exhibit first the case A = I.
Identity operator. If A = I, define F(Ku) = ‖Ku‖₁ and G(u) = (λ/2) ‖u − f‖²_X. Therefore, the associated saddle-point problem is

min_{u∈X} max_{p∈Y} ⟨Ku, p⟩_Y + (λ/2) ‖u − f‖²_X − δ_P(p),

where P = {p ∈ Y : |p_{i,j}| ≤ 1, ∀i, j ∈ {1, . . . , N}} and δ_P is the indicator function of P, defined as δ_P(p) = 0 if p ∈ P and +∞ otherwise. The resolvent operator (I + η∂F*)⁻¹ reduces to point-wise Euclidean projections onto L² balls:

(23) p_{i,j} = p̃_{i,j} / max(1, |p̃_{i,j}|).

Furthermore, (I + τ∂G)⁻¹ can be computed point-wise using the Euler-Lagrange equation:

u_{i,j} = (ũ_{i,j} + τλ f_{i,j}) / (1 + τλ).

The iterative scheme for A = I is outlined in Algorithm 2.
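A numpy sketch of this A = I scheme follows. The pointwise projection and resolvent are as described above; the forward-difference gradient with its matching discrete divergence is the usual convention and an assumption of this sketch, and we take w ≤ 1 so that ‖K‖ ≤ √8.

```python
import numpy as np

def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # negative adjoint of grad (backward differences)
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0] = px[0]; dx[1:-1] = px[1:-1] - px[:-2]; dx[-1] = -px[-2]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def cp_weighted_tv(f, w, lam, n_iter=200):
    """Primal-dual iteration for min_u sum w|grad u| + (lam/2)||u - f||^2
    (identity operator, K u = w * grad u, theta = 1)."""
    L = np.sqrt(8.0)
    tau = eta = 1.0 / L
    u = f.copy(); ubar = u.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    for _ in range(n_iter):
        # dual ascent + pointwise projection onto the unit balls |p| <= 1
        gx, gy = grad(ubar)
        px = px + eta * w * gx
        py = py + eta * w * gy
        norm = np.maximum(1.0, np.hypot(px, py))
        px /= norm; py /= norm
        # primal descent: note K* p = -div(w p), so u - tau K* p = u + tau div(w p)
        u_old = u
        u = (u + tau * div(w * px, w * py) + tau * lam * f) / (1.0 + tau * lam)
        ubar = 2.0 * u - u_old
    return u
```

With w identically one, this reduces to plain ROF-type TV denoising, a quick sanity check for the implementation.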

Algorithm 2 Detailed algorithm for the identity operator (A = I)
General case. For a general linear operator, a convenient reformulation can be obtained by dualizing the norm in the data-fidelity term. This introduces an additional dual variable q ∈ X, yielding

min_{u∈X} max_{p∈Y, q∈X} ⟨ω∇u, p⟩_Y + ⟨Au − f, q⟩_X − δ_P(p) − (1/(2λ)) ‖q‖²_X.

We have F*(p, q) = δ_P(p) + (1/(2λ)) ‖q‖²_X + ⟨f, q⟩_X and G(u) = 0. The proximity operator for F* can be computed separately for each variable. For p ∈ Y we get the same result as in (23), and for q ∈ X we obtain

q_{i,j} = (q̃_{i,j} − η f_{i,j}) / (1 + η/λ).

Finally, since G = 0 and u is unconstrained, the proximity operator for G is the identity. The general iterative scheme is outlined in Algorithm 3.
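For the extra dual variable, the resolvent is again pointwise. A minimal sketch, assuming the dualized fidelity contributes F*(q) = ⟨f, q⟩ + ‖q‖²/(2λ):

```python
import numpy as np

def prox_q(q_tilde, f, eta, lam):
    """Resolvent (I + eta dF*)^{-1} for the dual variable q with
    F*(q) = <f, q> + ||q||^2 / (2 lam): it solves
        min_q  ||q - q_tilde||^2 / 2 + eta * (<f, q> + ||q||^2 / (2 lam)),
    whose first-order condition gives q = (q_tilde - eta f) / (1 + eta/lam)."""
    return (q_tilde - eta * f) / (1.0 + eta / lam)
```

The first-order optimality condition q − q̃ + η(f + q/λ) = 0 can be verified directly on the output.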

Algorithm 3 Detailed algorithm for a general linear operator
5. Experimental results. In this section, we present a set of experimental results illustrating the performance of the newly proposed half-linear regularization. The aim is to show that, whatever the perturbations to which an image is subjected, the proposed algorithm provides piecewise constant restored images with neat edges.
We let Algorithm 1 run until ‖u^{n+1} − u^n‖₂ / N² became smaller than 10⁻⁶. The initial weight was set to ω⁰ = φ′(|∇f|). The trade-off parameter λ and the scaling parameter α were estimated empirically throughout the experimental section. Moreover, the tolerance for the Chambolle-Pock scheme in Algorithms 2-3 was 10⁻⁴, and the parameters there were chosen so that L = √8, τ = η = 1/L and θ = 1 in all cases. All standard deviations given in this section are relative to the intensity range [0, 255]. 5.1. The role of the dual variable as edge detector. Section 3 revealed that requirements (A3)-(A5) strengthen the role of the dual variable as an indicator of edges. This fact was experimentally checked by applying the proposed dual algorithm for A = I (Algorithm 2 was thus used for solving the associated convex optimization problem).
A first example is given in Figure 2. We display the weight associated to each potential function in Table 1 after applying our algorithm to a degraded image with noise of standard deviation σ = 10. For a better visualization, the dual variable is plotted in grayscale, so that black pixels correspond to negligible weights (enhancement) whereas white pixels stand for the largest ones (diffusion). Despite the noise, ω perfectly detects the edges and other image features. Figure 3 shows the restoration of a non-constant image. As it has been theoretically justified, the solutions are piecewise constant images with neat edges. Figure 2. Weights associated to the denoised images obtained after applying the proposed algorithm for each potential function. The noisy image contains Gaussian noise of standard deviation σ = 10. The parameters were fixed to λ = 0.1 and α = 30. As expected, the dual variable perfectly detects image features and it provides neat edges.
We also observe that the potential function used influences the result. Indeed, some image features are emphasized by some φ's whereas they are omitted by others. Consequently, the weight assigned to each pixel changes depending on the potential function being used. Note that, in general, image regions with slight intensity variations are more regularized with φ_1 than with the other potentials. This fact appears to be related to the property lim_{t↓0^+} φ'_1(t) = 0 in (8) and, as a consequence, small gradients are not enhanced. In any case, the behavior of the dual variable can be controlled through the scaling parameter α, which sets the gradient level that one wants to preserve. In this sense, Figure 4 displays the weights for several values of α. As the scaling parameter increases, more singularities are penalized.

5.2. Applications in image restoration using different potential functions.
Many nonconvex potential functions have been proposed in the literature as regularization terms. We are interested in those listed in Table 1 satisfying conditions (A1)-(A6). In the following, the optimal values for the trade-off parameter λ and the scaling parameter α were chosen manually to maximize the so-called peak signal-to-noise ratio (PSNR) for each potential function.
Image denoising. Denoising is the problem of removing the inherent noise from an image. The standard noise model is additive white Gaussian noise, where each pixel is perturbed by a zero-mean i.i.d. Gaussian random variable. Therefore, the half-linear regularization applies to solve the minimization problem (3)-(4) for A = I. Figure 5 displays an example on piecewise constant data corrupted with noise of standard deviation σ = 25. Note that all potential functions behave similarly except φ_4, which performs worse than the others. The numerical results are quite impressive, since the RMSE values have been reduced by one third whereas the PSNR values have been considerably improved. Furthermore, the denoised images are visually close to the ground truth in all cases.

The peak signal-to-noise ratio (PSNR) is the ratio between the reference signal and the distortion signal in an image, given in decibels. The higher the PSNR, the closer the distorted image is to the original. Its value is defined as PSNR = 10 log_10(MAX^2 / MSE), where MSE is the mean-squared error between the original and the distorted image, and MAX is the maximum possible pixel value of the image. In our case, since the pixels are represented using 8 bits per sample, MAX = 255.

Figure 3. Restored images (first row) and their associated weights (second row) obtained after applying the proposed algorithm for each potential function. The observed data was a noise-free image, and the parameters were fixed to λ = 0.1 and α = 10. Note that pixels have different weights depending on the potential function being used. On the other hand, small gradients are more regularized with φ_1 than with the other potential functions.

Figure 4. The dual variable can be efficiently controlled by α, since it tunes the value of the gradient above which a discontinuity is detected. As α increases, more gradients are penalized.

Therefore, the proposed model is
able to remove the noise from the data and provides denoised images with constant regions surrounded by neat edges.
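The quality measures reported throughout this section follow directly from the definition above; a minimal sketch (the function names are ours):

```python
import numpy as np

def rmse(u, ref):
    """Root mean-squared error between a restored image and the reference."""
    diff = np.asarray(u, float) - np.asarray(ref, float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(u, ref, max_val=255.0):
    """Peak signal-to-noise ratio, 10 log10(MAX^2 / MSE), in decibels."""
    diff = np.asarray(u, float) - np.asarray(ref, float)
    mse = np.mean(diff ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

For instance, a uniform error of 16.37 on the [0, 255] range yields a PSNR of roughly 23.85 dB, matching the pair of values quoted for the noisy image in the half-quadratic comparison below.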
Once we have confirmed that the half-linear regularization works well on the class of images related to the underlying image model, we want to check its performance on textured images. Figure 6 shows that, as a consequence of Theorem 3.2, the algorithm provides segmented-denoised images where high-frequency details such as texture features are removed. In fact, TV-based denoising algorithms are not able to distinguish high-frequency signals generated by texture from those generated by noise.

Image deconvolution. Deblurring is the problem of restoring an image that has been blurred and possibly corrupted with noise. Deconvolution refers to the case where the blur to be removed is linear and shift-invariant, so it may be expressed as a convolution with a point spread function. We suppose that the linear operator is given by Au = ϕ * u, where ϕ is a Gaussian convolution kernel. In this case, we used Algorithm 3 to solve (18), which requires convolving twice with Gaussian functions. For large values of the standard deviation, this operation in the spatial domain is extremely time-consuming. This can be avoided by working in the Fourier domain, where the convolution becomes a mere multiplication. In fact, the convolution with a normalized Gaussian can be implemented exactly in the discrete Fourier domain [31].
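A minimal sketch of the Fourier-domain Gaussian convolution with numpy follows; the periodic kernel construction and the function name are our own choices, not the exact discretization of [31]:

```python
import numpy as np

def gaussian_blur_fft(u, sigma):
    """Blur a square image with a periodic (circular) Gaussian kernel.

    The convolution Au = phi * u is evaluated in the Fourier domain,
    where it reduces to a pointwise multiplication of the transforms.
    """
    n = u.shape[0]
    # Distances wrapped around the torus so the kernel is periodic.
    x = np.minimum(np.arange(n), n - np.arange(n)).astype(float)
    xx, yy = np.meshgrid(x, x, indexing="ij")
    phi = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    phi /= phi.sum()  # normalized kernel: the DC coefficient equals 1
    return np.real(np.fft.ifft2(np.fft.fft2(u) * np.fft.fft2(phi)))
```

Since the Gaussian kernel is symmetric, A^T = A, so the two convolutions required by Algorithm 3 are both calls to the same routine; normalization guarantees that the mean intensity of the image is preserved.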
We display first an example of deconvolution for a piecewise constant image. The observed data was simulated by convolving the ground truth with a Gaussian kernel of standard deviation σ = 5. Figure 7 shows the restored images for each potential function. Although φ_1 provides the best result in terms of RMSE and PSNR values, all restored images are of high quality since the blur has been completely removed while edges have been enhanced. Therefore, images look clear and are visually close to the ground truth. Figure 8 illustrates the restoration of a non-constant image that was convolved with a Gaussian kernel of standard deviation σ = 3. In this case, our algorithm provides segmented images from blurred data and, thus, the problems of segmentation and deconvolution are solved jointly. For this reason, the numerical differences between the segmented images and the ground truth are necessarily larger.

Figure 6. Denoising of a textured image corrupted with Gaussian noise of standard deviation σ = 10. As expected, the algorithm provides segmented-denoised images with neat edges, but where high-frequency details are not preserved. This is the case of the house bricks texture.

5.3. Comparison with half-quadratic regularization. We shall compare in the next subsection our algorithm with some state-of-the-art TV-based restoration methods solving problem (3)-(4), the half-quadratic approach among them. However, let us first exhibit numerically the benefits, limitations and complementarities of both regularizers.
We implemented in C/C++ the ARTUR algorithm proposed in [14] with the potential function φ_GM(t) = t^2/(1 + t^2), which satisfies all edge-preserving requirements for half-quadratic regularization. Note that φ_GM is nonconvex but smooth. For our algorithm, we used φ_1(t) = φ_GM(t).

First, we tested both algorithms on a piecewise constant image corrupted with Gaussian noise of standard deviation σ = 20. The parameters λ and α were chosen manually to maximize the PSNR value for each method. Figure 9 reveals that the result provided by ARTUR becomes cartoon-like due to over-smoothing and, thus, some important details have been blurred. This phenomenon does not happen with our algorithm, since image features are highly preserved and edges look clear. These assertions are confirmed by the RMSE and PSNR values, for which our model performs better.
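A quick numerical check of φ_GM confirms the properties that matter here (a sketch; the helper names are ours): it is smooth and bounded, its derivative vanishes at 0, and the derivative decays at infinity, so large gradients are barely penalized further and edges survive the regularization.

```python
import numpy as np

def phi_gm(t):
    """Geman-McClure potential t^2/(1+t^2): nonconvex, smooth, bounded by 1."""
    return t ** 2 / (1.0 + t ** 2)

def phi_gm_prime(t):
    """Derivative 2t/(1+t^2)^2: vanishes at 0 and decays as t grows."""
    return 2.0 * t / (1.0 + t ** 2) ** 2
```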
Both regularizers were also used to denoise a textured image corrupted with Gaussian noise of standard deviation σ = 10. Figure 10 displays the best results for each algorithm. In general terms, our method provides a piecewise constant denoised image, whereas the result provided by ARTUR is piecewise smooth. Note also that important artifacts appear in the sky in the latter case. The proposed dual algorithm also outperforms the other in terms of the error. In any case, both regularizers remove high-frequency details such as texture features.
In a last experiment, the trade-off parameter was badly adjusted, λ = 10^{-4}, and both algorithms were run for the same number of iterations. One sees in Figure 11 that the restored noise-free image has been more regularized with ARTUR than with the proposed algorithm. It is clear that the asymptotic behavior of the half-linear regularizer is a piecewise constant image. On the contrary, the asymptotic behavior of the half-quadratic regularizer is an isotropically smoothed image. Consequently, the half-quadratic criterion leads to an over-smoothing process in such a way that image features are partially or completely removed. Our regularization approach is able to preserve the edges, even when the role of the fidelity term in the energy functional is almost insignificant.

5.4. Comparison with TV-based image denoising algorithms. The image denoising problem has been intensively studied during the last two decades, and numerous algorithms have been proposed with remarkable success. Here we are interested in those belonging to the class of variational methods based on the widely mentioned Rudin-Osher-Fatemi model [43]. A detailed comparison between Algorithm 1 and some TV-based denoising algorithms was performed.

[Figure panels: Ground truth; Noisy image (RMSE = 16.37, PSNR = 23.85); Half-quadratic (RMSE = 6.62, PSNR = 31.72); Half-linear (RMSE = 6.03, PSNR = 32.52).]

We compared our algorithm with Chambolle's projection algorithm [12], the Split Bregman method [45], and the ARTUR algorithm [14]. Reliable software and online demos for the first two approaches can be found at the IPOL webpage [18,22]. In all cases, the trade-off parameter λ was selected manually to maximize the PSNR values. Other algorithm-specific parameters were fixed as given in the corresponding papers. The proposed dual algorithm used φ_3 as potential function. The study was conducted on the noise-free images displayed in Figure 12, and our result was the best one also in these cases. We also note that the performance of all methods on images with high-frequency regions such as Building is less impressive. This can be easily justified by taking into account that TV-regularizers cannot distinguish between high frequencies due to texture and those due to noise. In these cases it is more appropriate to use a patch-based denoising method. However, these algorithms have much greater computational complexity and are usually much slower than most TV-based denoising techniques.

Table 4. Global averages of the RMSE and PSNR values over all images and all standard deviations.
In general, a low performance in the RMSE and PSNR values also entails a rejection by human visual inspection. Nevertheless, no numerical criterion can fully replace human evaluation, which is still an important criterion for judging the performance of denoising algorithms. A first example on the piecewise constant Phantom image is displayed in Figure 13, where the observed data contains Gaussian noise of standard deviation σ = 30. Note that the denoised images provided by all techniques except ours are over-smoothed and, thus, edges look blurred. Furthermore, Chambolle's and Split Bregman methods suppress important image features. On the contrary, our algorithm outperforms all the other techniques and its result best approaches the ground truth. We manage to completely remove the high-level noise from the input data while sharpening the edges. Accordingly, the overall denoised image looks clear. Figure 14 displays a second example on the Building image corrupted with Gaussian noise of standard deviation σ = 15. All denoised images are visually similar and of high quality, although the result provided by the ARTUR algorithm presents several artifacts on the sky. However, all methods fail to recover texture features such as the trees in front of the building entrance.

5.5. Extension to color images. For color images, we use the standard dual definition of the vectorial total variation introduced by Blomgren and Chan [7], where u = (u_1, . . . , u_M) and M is the number of color channels. The dual energy (15) is adapted accordingly, with f = (f_1, . . . , f_M) the vector-valued data, and the weight function is modified in the same way. It is straightforward to adapt Algorithms 2 and 3 for solving the vectorial minimization problem (18).
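For the reader's convenience, a standard way of writing the vectorial total variation of Blomgren and Chan, in our own notation, is

```latex
\mathrm{TV}_{BC}(u) \;=\; \Bigl(\sum_{m=1}^{M} \mathrm{TV}(u_m)^{2}\Bigr)^{1/2},
\qquad
\mathrm{TV}(u_m) \;=\; \sum_{k} \bigl\lVert D_k u_m \bigr\rVert ,
```

so the channels are coupled through the Euclidean norm of the per-channel total variations.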
In order to illustrate the behavior of the half-linear regularization, Figure 15 displays an example of color denoising for an image corrupted with Gaussian noise of standard deviation σ = 15. We used φ_3 as the potential function for our algorithm, and we compared it with Chambolle's projection method. As in the grayscale case, the denoised image provided by Chambolle's algorithm suffers from over-smoothing, since some details are blurred. Indeed, note the loss of information on the letters of the book cover. On the contrary, our result looks clear and image features are better preserved.

Figure 14. The results are visually close to each other, although the denoised image provided by the half-quadratic criterion presents important artifacts on the sky. As shown by the numerical study, the performance of all compared techniques on high-frequency regions is not entirely good. Indeed, the texture of the trees in front of the building entrance is not recovered from the input noisy data.
6. Conclusions. In this paper, we have proposed the half-linear regularization to minimize objective functionals composed of a quadratic data-fidelity term involving an arbitrary linear operator, and a nonsmooth nonconvex regularization term that depends on the norm of the gradient of the image to be recovered. By imposing some hypotheses on the potential function, we have proved the existence of a solution for the minimization problem. Furthermore, the proposed approach introduces a closed-form auxiliary variable that correctly detects edges and, as the experimental results have shown, preserves them from smoothing. Accordingly, we have presented a dual optimization algorithm that converts the primal problem into a sequence of convex optimization problems. Because of the properties of the minimizers, the solutions are piecewise constant images with neat edges, where the number of regions and their values are not fixed in advance, recovered from noisy and blurred data. We have validated the proposed strategy with theoretical results and established convergence properties for our algorithm.

Another important issue that should be kept in mind is that our approach limits the eligible potential functions to concave ones. This may be seen as both a limitation and an important contribution. Indeed, concavity, which cannot be handled by the half-quadratic regularization, provides remarkable benefits in the image restoration field. Concave potential functions make it possible to solve jointly the segmentation and the restoration of images from noisy and blurred data in the context of linear inverse problems.
Experiments have convincingly shown that the newly proposed half-linear regularization gives state-of-the-art results in several image restoration tasks such as denoising and deconvolution. Furthermore, it outperforms TV-based image restoration algorithms for restoring piecewise constant images. The results of this paper could be extended to image problems involving TV-regularization. Furthermore, it would be interesting to test a continuation-like strategy. These issues could be part of future work.
Appendix A. Proof of Theorem 3.2. Let us denote the set of all pixels by I = {1, . . . , N^2}. For each k ∈ I we rewrite the differential operator D_k : X → R^2 as D_k u = ((∇u)^1_{i,j}, (∇u)^2_{i,j}), where k = j + (i − 1)N, and equip it with the standard Euclidean norm. Since D_i(ū + v) ≠ 0 for all i ∈ Ī_1, it follows that φ(‖D_i(ū + v)‖) > 0 at all i ∈ Ī_1 due to (A1). On the other hand, by noticing that ū ∈ V(Ī_0), it is easy to check that D_i(ū + v) = 0 at i ∈ Ī_0 for any v ∈ V(Ī_0) and, consequently, φ(‖D_i(ū + v)‖) = 0 in these cases. Accordingly, the functional J̃ obtained from J by restricting the regularization sum to the pixels of Ī_1 satisfies J̃(ū + v) = J(ū + v) for any v ∈ V(Ī_0) ∩ B_ρ(0), so that J̃ is of class C^2 on a neighbourhood of ū and it has a local minimizer at ū. Hence, ∇J̃(ū) and ∇^2 J̃(ū) are well-defined in the usual sense, and the necessary second-order condition for a minimizer must also hold, that is,

(24) ⟨∇^2 J̃(ū) v, v⟩ ≥ 0 for all v ∈ V(Ī_0).

We now develop this inequality. For any i ∈ Ī_1, we compute the second derivative of the corresponding regularization term, and also ⟨∇^2 ‖Aū − f‖^2_X v, v⟩ = 2‖Av‖^2_X, from where (24) can be written as a sum over i ∈ Ī_1. On the other hand, let j ∈ Ī_1 be such that

(26) ‖D_j ū‖ ≤ ‖D_i ū‖, ∀i ∈ Ī_1,

and set κ = ‖D_j ū‖. Note that κ > 0 by definition of Ī_1 and it is finite due to the finiteness of Ī_1. Define v̄ = κ^{−2} ū, which clearly satisfies v̄ ∈ V(Ī_0). We shall show that there exists θ > 0 such that if 0 < ‖D_j ū‖ < θ, then ⟨∇^2 J̃(ū) v̄, v̄⟩ < 0, which contradicts (24). First, we have