THE MOREAU ENVELOPE APPROACH FOR THE L1/TV IMAGE DENOISING MODEL


1. Introduction. Given a noisy image x contaminated by impulse noise, we consider a denoised image of x as a minimizer of the following L1/TV model

(1)    min { λ‖u − x‖_1 + ‖u‖_TV : u ∈ R^d },

where ‖·‖_1 represents the ℓ_1-norm, ‖·‖_TV denotes the total variation, and λ is the regularization parameter balancing the fidelity term ‖· − x‖_1 and the regularization term ‖·‖_TV. It is well accepted that the ℓ_1-norm fidelity term can effectively suppress the effect of outliers that may contaminate a given image, and it is therefore particularly suitable for handling non-Gaussian additive noise such as impulse noise [4]. The L1/TV model (1) has many distinctive and desirable features. For example, the model does not erode geometric structures of the images under processing and possesses properties such as contrast invariance, data-driven parameter selection, multiscale image decomposition, and morphological invariance [8,22,28]. Applications of the L1/TV model include computer vision [10], biomedical imaging [27], optical flow and object tracking [29], and shape denoising [29].
In light of the useful features of the L1/TV model and its successful applications, it is highly desirable to develop efficient and fast algorithms for the numerical treatment of the model. An algorithmic difficulty is the non-differentiability of both the fidelity and regularization terms in the objective function of (1). To overcome this difficulty, several numerical approaches were proposed recently for solving the L1/TV model. Depending on how the non-differentiability of the two terms is treated, these approaches can be roughly divided into two categories. In the first category, prior to developing a numerical scheme for the model, additional quadratic terms with some auxiliary variables are added to the objective function of the L1/TV model to tackle the non-differentiability of its fidelity term, its regularization term, or both. For instance, a quadratic term involving an auxiliary variable was introduced to overcome the non-differentiability of the fidelity term in [1]. Depending on whether we view the total variation itself as a single non-differentiable function or as a composition of a non-differentiable function with a first-order difference matrix, quadratic terms were added to the model in two different ways for the purpose of remedying the regularization term in [11,20]. Non-differentiability of both the fidelity and regularization terms was handled simultaneously by introducing two extra quadratic terms into the primal formulation of the L1/TV model in [16,26] and into the dual formulation of the model in [13]. Such techniques certainly benefit the development of numerical treatments of the model, but they change the original L1/TV model into various modified versions, and these modified models may lose the desirable features of the L1/TV model. Unlike the ones in the first category, algorithms in the second category solve the L1/TV model directly, without introducing additional terms into the model. The algorithms in this category are developed by using the proximity operator [18,20], the primal-dual formulation [7,17], and the augmented Lagrangian method [25].
The main purpose of this paper is twofold. On the one hand, the models corresponding to all approaches classified into the first category can be reformulated in terms of the notion of the Moreau envelope. This, in turn, allows us to develop algorithms for these models in a unified way based upon the proximity operator. On the other hand, we compare the performance of the various modified L1/TV models for the problem of impulse noise removal and make recommendations for the use of these models in applications.
By replacing the non-differentiable functions in the L1/TV model with their corresponding differentiable Moreau envelopes, we obtain five modified L1/TV models, which together with the original model give six models in total. Based on our previous work in [18,20], all of these models are solved under a unified framework based on the proximity operator. In this framework, a solution of a model is characterized in terms of fixed-point equations via the proximity operators of the convex functions associated with the fidelity and regularization terms. Depending on whether the regularization term of a model is differentiable or not, the characterizations of the solutions of the models are presented separately. The characterization for models having a non-differentiable regularization term motivates us to develop corresponding algorithms that exploit a Gauss-Seidel (GS) strategy to speed up the rate of convergence. For the models having a differentiable regularization term, the well-known algorithm FISTA introduced in [3] can be applied.
The performance of the proposed algorithms for the original L1/TV model and its variants is evaluated on the problem of impulse noise removal. We confirm that for removing impulse noise the fidelity term of a model should be the ℓ_1-norm of the difference between the image to be sought and the given noisy image. The choice of the regularization term of a model relies upon the nature of the underlying image. It was pointed out in [18] that the original L1/TV model is particularly suitable for cartoon-like images, which are approximately piecewise constant. However, since TV minimization techniques are not effective for detailed or textured images [15], we conclude in this paper via intensive numerical experiments that a model with the ℓ_1-norm fidelity term and a smoothed total variation as its regularization term is more suitable for the problem of impulse noise removal for natural images. As mentioned above, the total variation can be viewed as a composition of a non-differentiable function with a first-order difference matrix. Here, the smoothed total variation is derived from the total variation by simply replacing the non-differentiable function by its Moreau envelope.
The outline of this paper is as follows. In Section 2 we propose several possible modified L1/TV models by approximating the non-smooth ℓ_1 fidelity term or the total variation term by its corresponding Moreau envelope. These modified L1/TV models cover several splitting-based models in the literature. In Section 3, the solutions of the L1/TV models are characterized in terms of fixed-point equations via the proximity operators. The corresponding algorithms based on these characterizations, together with their convergence analysis and accelerating schemes, are given in Section 4 for the models having a non-differentiable regularization term and in Section 5 for the models having a differentiable regularization term. In Section 6 we report the numerical experiments, discuss the differences among the modified L1/TV models, and compare the performance of the proposed algorithms in restoring images corrupted by impulse noise. We draw our conclusions in the last section.

2. The L1/TV model and its Moreau envelope variants. In this section, we first explain the suitability of model (1) for impulse noise removal from a statistical viewpoint. We then give five modified versions of model (1) in terms of the Moreau envelope. Finally, we point out the connection of the modified L1/TV models with several existing models for solving the original model (1).
We begin with a statistical explanation of the L1/TV denoising model for the problem of impulse noise removal. Impulse noise is caused by malfunctioning pixels in camera sensors, faulty memory locations in hardware, or transmission in a noisy channel [5]. We denote by u ∈ R^d the original image. Its noisy version x corrupted by salt-pepper noise (a special type of impulse noise) with a noise level 0 < r < 1 is modeled as

(2)    x_i = 0 with probability r/2,    x_i = 255 with probability r/2,    x_i = u_i with probability 1 − r,

where x_i and u_i are the i-th pixel values of x and u, respectively. A maximum a posteriori estimate of u can be found by maximizing the conditional a posteriori probability p(u|x), the probability that u occurs when x is observed. Using Bayes' theorem, we have that

p(u|x) = p(x|u) p(u) / p(x).

By taking the negative logarithm of the above equation, the estimate is a solution of the following minimization problem:

(3)    min { −log p(x|u) − log p(u) : u ∈ R^d }.

The expression −log p(x|u) in (3) can be viewed as a fidelity term measuring the discrepancy between the estimate u and the noisy image x. The term −log p(u) is used to regularize against a solution that has a low probability. The choice of the likelihood p(x|u) depends upon the property of the noise. For the salt-pepper noise given in (2), we have that

p(x|u) = (r/2)^{|{i : x_i ≠ u_i}|} (1 − r)^{|{i : x_i = u_i}|},

where |S| denotes the cardinality of the set S. Note that |{i : x_i ≠ u_i}| = ‖u − x‖_0, where ‖·‖_0 denotes the number of non-zero elements in a vector. Then the above equation becomes

(4)    −log p(x|u) = ‖u − x‖_0 log(2(1 − r)/r) − d log(1 − r).

A Gibbs prior of the total variation is used in (3) for p(u), due to the fact that the total variation is sensitive to geometric features of images, such as edges. This prior has the form

(5)    p(u) = (1/Z) exp(−‖u‖_TV / σ),

where Z is a normalization factor, σ is a positive number, and ‖u‖_TV denotes the total variation of the image u. The precise definition of ‖·‖_TV will be provided later in this section.
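The noise model (2) is easy to simulate; the following sketch corrupts a flat test image and checks that the fraction of altered pixels is close to the noise level r (the function name and parameters are illustrative, not from the paper):

```python
import numpy as np

def add_salt_pepper(u, r, seed=None):
    """Corrupt an image u (values in [0, 255]) with salt-pepper noise of level r,
    following model (2): each pixel independently becomes 0 or 255 with
    probability r/2 each, and keeps its value with probability 1 - r."""
    rng = np.random.default_rng(seed)
    x = u.astype(float).copy()
    p = rng.random(u.shape)
    x[p < r / 2] = 0.0                   # "pepper" pixels
    x[(p >= r / 2) & (p < r)] = 255.0    # "salt" pixels
    return x

u = np.full((64, 64), 128.0)             # a flat gray test image
x = add_salt_pepper(u, r=0.3, seed=0)
frac_corrupted = np.mean(x != u)         # close to r = 0.3 for a gray image
```

For a gray image whose original value is neither 0 nor 255, every flipped pixel is visible, so `frac_corrupted` estimates r directly.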
Putting the expressions of p(x|u) and p(u) given in (4) and (5) into (3) and ignoring constant terms, we obtain

(6)    min { λ‖u − x‖_0 + ‖u‖_TV : u ∈ R^d },

where λ is a positive number related to the parameters r, σ, and the factor Z. Since the function ‖· − x‖_0 is not convex, it introduces numerical difficulties in solving the minimization problem (6). Motivated by the approximation ‖v‖_0 = lim_{p→0⁺} Σ_{i=1}^d |v_i|^p of the ℓ_0-norm of v ∈ R^d, and by the fact that p = 1 is the smallest number p > 0 such that ‖·‖_p is convex, we replace ‖· − x‖_0 in (6) by ‖· − x‖_1. This leads to model (1). We remark that the ℓ_1-norm data fidelity term was first proposed by Nikolova for the total variation regularization of images corrupted by impulse noise [22]. Its effectiveness in handling impulse noise can also be found in [9]. Next, we propose several modified models of the L1/TV denoising model (1) by replacing one of the non-differentiable functions involved in the model with its Moreau envelope. To this end, we begin with the precise definition of the total variation. In what follows, when we speak of an image u in R^d, we always mean the vector formed from a q × q square image by sequentially concatenating the columns of the image; here we assume that d = q². To define the total variation of the image u, we need a q × q difference matrix D whose first row is zero and whose i-th row, for i = 2, …, q, has −1 in position i − 1 and 1 in position i.
Through the notion of the matrix Kronecker product ⊗, we define a 2d × d matrix B by

(7)    B := [ I_q ⊗ D ; D ⊗ I_q ],

where I_q denotes the q × q identity matrix. We further define a convex function ϕ : R^{2d} → R by

(8)    ϕ(v) := Σ_{i=1}^d ( v_i² + v_{d+i}² )^{1/2},    v ∈ R^{2d}.

With the above preparation, the total variation ‖u‖_TV of u can be written as ϕ(Bu).
Hence, the L1/TV image denoising model in (1) has the form

(9)    min { λ‖u − x‖_1 + ϕ(Bu) : u ∈ R^d }.

Notice that in formulation (9) of the L1/TV denoising model both the function ‖· − x‖_1 and the function ϕ are non-differentiable. A typical way of smoothing a non-smooth convex function is to replace it by its Moreau envelope. We denote by Γ_0(R^m) the class of all lower semicontinuous convex functions ψ : R^m → (−∞, +∞] such that the domain dom(ψ) := {x ∈ R^m : ψ(x) < +∞} of ψ is nonempty. For a positive number β and a function ψ ∈ Γ_0(R^m), the Moreau envelope of ψ with index β at z ∈ R^m is defined as

(10)    env_{βψ}(z) := min { ψ(y) + (1/(2β))‖y − z‖_2² : y ∈ R^m },

which is again in Γ_0(R^m) (see, e.g., [21,23]). For any positive number β the Moreau envelope env_{βψ} is bounded above by ψ, and it converges to ψ as β → 0⁺ (see, e.g., [24]). More importantly, the Moreau envelope of a convex function is always differentiable, and its gradient is given by

(11)    ∇env_{βψ}(z) = (1/β)( z − prox_{βψ}(z) ),

where

prox_{βψ}(z) := argmin { ψ(y) + (1/(2β))‖y − z‖_2² : y ∈ R^m }

is called the proximity operator of ψ with index β.
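For the scalar case ψ = |·|, the envelope (10) and gradient formula (11) can be computed explicitly: prox_{β|·|} is soft-thresholding, and env_{β|·|} is the well-known Huber function, quadratic near the origin and linear beyond β. The sketch below verifies both facts numerically:

```python
import numpy as np

def prox_abs(z, beta):
    # Proximity operator of beta*|.|: soft-thresholding.
    return np.sign(z) * np.maximum(np.abs(z) - beta, 0.0)

def env_abs(z, beta):
    # Moreau envelope (10) of |.|, evaluated at its minimizer y = prox_abs(z, beta):
    # env_{beta|.|}(z) = |y| + (1/(2 beta)) (y - z)^2.
    y = prox_abs(z, beta)
    return np.abs(y) + (y - z) ** 2 / (2 * beta)

beta = 2.0
z = np.linspace(-5, 5, 1001)
e = env_abs(z, beta)

# env_{beta|.|} is the Huber function: quadratic for |z| <= beta, linear beyond.
huber = np.where(np.abs(z) <= beta, z ** 2 / (2 * beta), np.abs(z) - beta / 2)
assert np.allclose(e, huber)

# Gradient formula (11): d/dz env = (z - prox(z)) / beta; check by finite differences.
grad = (z - prox_abs(z, beta)) / beta
fd = np.gradient(e, z)
assert np.max(np.abs(grad[5:-5] - fd[5:-5])) < 1e-2
```

Note how the gradient (z − prox(z))/β is bounded by 1 in magnitude, reflecting the 1/β-Lipschitz smoothness of the envelope.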
Considering the non-differentiability of the ℓ_1-norm and of the function ϕ in (9), and the above remarkable properties of the Moreau envelope, we may propose several variants of the original model (9). For the fidelity term ‖· − x‖_1, we can keep it unchanged or replace it by its envelope env_{α‖·‖_1}(· − x). For the regularization term ϕ∘B, besides keeping it unchanged, depending on whether we view the non-differentiable total variation as a single function or as a composition of the non-differentiable function ϕ with the difference matrix B, we have two alternatives: env_{βϕ∘B}(u) and env_{βϕ}(Bu). Considering all of the possible choices, we have five different modified models listed in Table 1, in addition to the original model.
Table 1. The L1/TV denoising model and its five variants.

Model    Fidelity term              Regularization term
1        ‖u − x‖_1                  ϕ(Bu)
2        env_{α‖·‖_1}(u − x)        ϕ(Bu)
3        ‖u − x‖_1                  env_{βϕ}(Bu)
4        env_{α‖·‖_1}(u − x)        env_{βϕ}(Bu)
5        ‖u − x‖_1                  env_{βϕ∘B}(u)
6        env_{α‖·‖_1}(u − x)        env_{βϕ∘B}(u)

In the rest of this section, we shall point out that our proposed modified L1/TV models in Table 1 are consistent with several existing models which were obtained by using the splitting technique in dealing with the original L1/TV model. In [1], by introducing an auxiliary variable v to replace the difference u − x in the L1/TV model (9) and then imposing a condition to penalize the closeness between v and u − x, a model of the form

(13)    min { J_AGCO(u, v) := λ‖v‖_1 + (γ/2)‖u − v − x‖_2² + ϕ(Bu) : u, v ∈ R^d }

was proposed, with γ being a positive number.
Proposition 1. Set α := λ/γ. If the pair (u*, v*) is a solution of the optimization problem (13), then u* is a solution of the optimization problem

(15)    min { J_2(u) := λ env_{α‖·‖_1}(u − x) + ϕ(Bu) : u ∈ R^d }.

Conversely, if u* is a solution of the optimization problem (15), then the pair (u*, prox_{α‖·‖_1}(u* − x)) is a solution of the optimization problem (13).

Proof. We first show the non-emptiness of the solution set of the optimization problem (13). Since the functions ‖·‖_1, ‖·‖_2², and ϕ are convex and non-negative, so is J_AGCO. Hence, the existence of minimizers of the objective function J_AGCO follows. By the definition of the Moreau envelope, for every u ∈ R^d the minimum of J_AGCO(u, ·) over v is attained at prox_{α‖·‖_1}(u − x), and

(17)    J_AGCO(u, prox_{α‖·‖_1}(u − x)) = J_2(u) for all u ∈ R^d,

so that J_AGCO(u, v) ≥ J_2(u) for every pair (u, v). Now, assume that the pair (u*, v*) is a solution of the optimization problem (13). We show that u* is a solution of the optimization problem (15). If not, there exists a vector ũ such that J_2(ũ) < J_2(u*). By (17), we have J_AGCO(ũ, prox_{α‖·‖_1}(ũ − x)) = J_2(ũ) < J_2(u*) ≤ J_AGCO(u*, v*); this contradicts the assumption of (u*, v*) being a solution of the optimization problem (13).
Conversely, if u* is a solution of the optimization problem (15), we show that the pair (u*, prox_{α‖·‖_1}(u* − x)) is a solution of the optimization problem (13). Indeed, if this statement were not true, then there would exist a pair (ũ, ṽ) such that J_AGCO(ũ, ṽ) < J_AGCO(u*, prox_{α‖·‖_1}(u* − x)). This, by (17), implies that J_2(ũ) ≤ J_AGCO(ũ, ṽ) < J_2(u*), which violates the assumption of u* being a solution of the optimization problem (15).
By Proposition 1, we conclude that model (13), with α = λ/γ, is the same as Model 2 listed in Table 1.
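The mechanism behind Proposition 1 can be illustrated in one dimension: minimizing the split objective |v| + (1/(2α))(v − s)² over v reproduces the envelope env_{α|·|}(s) (the Huber function), and the minimizing v is prox_{α|·|}(s). The sketch below checks this by brute-force grid search:

```python
import numpy as np

# 1-D illustration of the splitting-vs-envelope equivalence behind Proposition 1:
#   min_v { |v| + (1/(2a)) (v - s)^2 }  equals  env_{a|.|}(s),
# and the minimizing v is prox_{a|.|}(s) (soft-thresholding).
a = 0.5
s_values = np.linspace(-3, 3, 13)
v_grid = np.linspace(-5, 5, 200001)

for s in s_values:
    split_obj = np.abs(v_grid) + (v_grid - s) ** 2 / (2 * a)
    grid_min = split_obj.min()
    envelope = s ** 2 / (2 * a) if abs(s) <= a else abs(s) - a / 2   # env_{a|.|}(s)
    v_star = np.sign(s) * max(abs(s) - a, 0.0)                       # prox_{a|.|}(s)
    assert abs(grid_min - envelope) < 1e-6
    assert abs(v_grid[split_obj.argmin()] - v_star) < 1e-4
```

Eliminating the auxiliary variable from the split problem therefore leaves exactly the envelope model, which is the content of Proposition 1.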
In model (13), when an additional auxiliary variable w is introduced to replace Bu, we have the following model:

(18)    min { J_YZY(u, v, w) := λ‖v‖_1 + (γ/2)‖u − v − x‖_2² + ϕ(w) + (1/(2β))‖w − Bu‖_2² : u, v ∈ R^d, w ∈ R^{2d} },

with γ and β being positive. Model (18) was originally introduced in [26].

Proposition 2. Set α := λ/γ. If (u*, v*, w*) is a solution of (18), then u* is a solution of

min { J_4(u) := λ env_{α‖·‖_1}(u − x) + env_{βϕ}(Bu) : u ∈ R^d };

conversely, if u* is a solution of the latter problem, then (u*, prox_{α‖·‖_1}(u* − x), prox_{βϕ}(Bu*)) is a solution of (18).

Proof. In a similar way to that leading to (17), we have that J_YZY(u, prox_{α‖·‖_1}(u − x), prox_{βϕ}(Bu)) = J_4(u) for all vectors u ∈ R^d. With this equation, the rest of the proof is similar to that of Proposition 1, and it is omitted.
By Proposition 2, model (18) is the same as Model 4 listed in Table 1.
In [16], the model

(20)    min { λ‖v‖_1 + (γ/2)‖u − v − x‖_2² + ϕ(Bw) + (1/(2β))‖w − u‖_2² : u, v, w ∈ R^d }

was proposed, with γ and β being positive numbers.

Proposition 3. Set α := λ/γ. The solutions of model (20) correspond to the solutions of min { J_6(u) := λ env_{α‖·‖_1}(u − x) + env_{βϕ∘B}(u) : u ∈ R^d } in the same way as in Proposition 1; the key identity is that the objective function of (20), evaluated at (u, prox_{α‖·‖_1}(u − x), prox_{βϕ∘B}(u)), equals J_6(u) for all u ∈ R^d. Using this identity, the proof of this proposition is similar to that of Proposition 1; therefore, we omit it here.
We conclude from Proposition 3 that the alternative model (20) is the same as Model 6 listed in Table 1.
In [11], the model

(22)    min { J_CJK(u, w) := λ‖u − x‖_1 + ϕ(Bw) + (1/(2β))‖w − u‖_2² : u, w ∈ R^d }

was discussed, with λ and β being positive numbers.

Proposition 4. If (u*, w*) is a solution of model (22), then u* is a solution of min { J_5(u) := λ‖u − x‖_1 + env_{βϕ∘B}(u) : u ∈ R^d }; conversely, if u* is a solution of the latter problem, then (u*, prox_{βϕ∘B}(u*)) is a solution of model (22).
Proof. By using the definitions of the proximity operator and the Moreau envelope, we have that J_CJK(u, prox_{βϕ∘B}(u)) = J_5(u) for all vectors u ∈ R^d. Using this equation, the proof of this proposition is similar to that of Proposition 1, and we skip the details.
Proposition 4 says that model (22) is the same as Model 5 listed in Table 1.
In summary, the key idea of the work in [1,11,16,26] is to use the splitting technique, originally motivated by the convenience of applying alternating minimization approaches to the resulting alternative L1/TV models or to the corresponding dual models. However, as we proved above, the splitting technique essentially replaces the non-differentiable functions in the original L1/TV model by their corresponding differentiable Moreau envelopes. This understanding provides us with a unified framework for the numerical treatment of all six models based on fixed-point equations via the proximity operators.
3. Characterization of solutions to the L1/TV models. In this section, we characterize the solutions of the modified L1/TV models listed in Table 1. All of these models can be cast into the following general form

(24)    min { λF(u − x) + R(Wu) : u ∈ R^d }

by specifying the functions F and R and the matrix W properly. The correspondence between a model listed in Table 1 and a choice of F, R, and W is described in Table 2.
We first provide a characterization of a solution of model (24). This characterization will be further refined when the function R is differentiable. To characterize a solution of model (24), we briefly review the concept of the subdifferential in convex analysis.
The subdifferential of a proper convex function ψ ∈ Γ_0(R^m) at a given vector u ∈ R^m is the set defined by

(25)    ∂ψ(u) := { y ∈ R^m : ψ(z) ≥ ψ(u) + ⟨y, z − u⟩ for all z ∈ R^m }.

The subdifferential and the proximity operator of the function ψ are related in the following way (see, e.g., [19]): for u in the domain of ψ and y ∈ R^m,

(26)    y ∈ ∂ψ(u) if and only if u = prox_ψ(u + y).

With the help of this relationship between the subdifferential and the proximity operator, we can characterize a solution of model (24) in terms of fixed-point equations.
Proposition 5. Let x be in R^d and let λ be a positive number. If u ∈ R^d is a solution of model (24), where F ∈ Γ_0(R^d), R ∈ Γ_0(R^m), and W is an m × d matrix, then for any γ, σ > 0 there exist v, b ∈ R^m such that

(27)    u = prox_{(λ/γ)F(·−x)}( u − (σ/γ) W^⊤ b ),
(28)    b = (I − prox_{(1/σ)R})( b + v ),
(29)    v = Wu.

Conversely, if there exist γ, σ > 0 and vectors u ∈ R^d and v, b ∈ R^m satisfying equations (27), (28), and (29), then u is a solution of model (24).
Proof. We first assume that u ∈ R^d is a solution of model (24). By the Fermat rule in convex analysis, we have that

(30)    0 ∈ ∂(λF(· − x))(u) + W^⊤ ∂R(Wu).

For any positive numbers γ and σ, we can therefore select vectors a ∈ (1/γ)∂(λF(· − x))(u) and b ∈ (1/σ)∂R(Wu) such that γa + σW^⊤b = 0, that is, a = −(σ/γ)W^⊤b. By equation (26), the inclusion a ∈ (1/γ)∂(λF(· − x))(u) gives u = prox_{(λ/γ)F(·−x)}(u + a) = prox_{(λ/γ)F(·−x)}(u − (σ/γ)W^⊤b), which is equation (27). By equation (26) again, from the inclusion b ∈ (1/σ)∂R(Wu) we have that Wu = prox_{(1/σ)R}(Wu + b), that is, b = (I − prox_{(1/σ)R})(b + Wu), which is equation (28) with v := Wu. Clearly, Wu = v; therefore, equation (29) holds.
Conversely, suppose that there exist γ, σ > 0, v, b ∈ R^m, and u ∈ R^d satisfying equations (27)-(29). By equation (29), it holds that Wu = v. With this relation, by equation (26), equations (27) and (28) imply that −(σ/γ)W^⊤b ∈ (1/γ)∂(λF(· − x))(u) and b ∈ (1/σ)∂R(Wu), respectively. Combining these two inclusions yields that 0 ∈ ∂(λF(· − x))(u) + W^⊤∂R(Wu). This in turn implies that u ∈ R^d is a solution of model (24).
We remark that the characterization given in Proposition 5 reduces to the one given in [18] when the function F in model (24) is the ℓ_1-norm. The integer m, the number of rows of the matrix W, is 2d for Models 1-4 and d for Models 5 and 6.
Next, we show that if the convex function R is differentiable, a solution of model (24) can be characterized by a single fixed-point equation.

Proposition 6. Let x be in R^d, let λ be a positive number, let F ∈ Γ_0(R^d), let W be an m × d matrix, and let R ∈ Γ_0(R^m) be differentiable. If u ∈ R^d is a solution of model (24), then for any γ > 0

(32)    u = prox_{(λ/γ)F(·−x)}( u − (1/γ) W^⊤ ∇R(Wu) ).

Conversely, if u ∈ R^d satisfies equation (32) for some γ > 0, then u is a solution of model (24).
Proof. According to Proposition 5, we need only prove that the fixed-point equations (27)-(29) can be reformulated as equation (32) when R is differentiable. Notice that, by eliminating v, equations (28) and (29) imply that Wu = prox_{(1/σ)R}(b + Wu). Hence, b ∈ (1/σ)∂R(Wu) by equation (26). Recall that the subdifferential of a differentiable function R at a given point z ∈ R^m is the singleton ∂R(z) = {∇R(z)}, where ∇R(z) denotes the gradient of R at z. Hence, we have that σb = ∇R(Wu). Substituting this result into (27) and eliminating the variables b and v yields the fixed-point equation (32). This completes the proof.
4. Proximity algorithms. In this section, we present proximity algorithms for solving model (24) based on the characterizations of solutions given in the previous section, and we establish their convergence results.
According to Proposition 5, a solution of model (24) is characterized by the fixed-point equations (27)-(29). Based on these fixed-point equations, we can naturally propose an iterative algorithm that generates three sequences {u^k : k ∈ N}, {v^k : k ∈ N}, and {b^k : k ∈ N} from arbitrary initial vectors u^0 ∈ R^d and v^0, b^0 ∈ R^m. Here N is the set of all natural numbers. This iterative procedure may be stated as follows:

(33)    u^{k+1} = prox_{(λ/γ)F(·−x)}( u^k − (σ/γ) W^⊤ b^k ),
        v^{k+1} = W( 2u^{k+1} − u^k ),
        b^{k+1} = (I − prox_{(1/σ)R})( b^k + v^{k+1} ).

The following theorem establishes a convergence result for the proposed iterative scheme (33). Its proof is omitted here since it is similar to that of Theorem 3.5 in [18].

Theorem 4.1. Let x be in R^d, let λ be a positive number, let F ∈ Γ_0(R^d), let W be an m × d matrix, and let R ∈ Γ_0(R^m). If the positive numbers γ and σ are chosen to satisfy

(34)    (σ/γ) ‖W‖_2² < 1,

then the sequence {u^k : k ∈ N} generated by the iterative scheme (33) converges to a solution of model (24).
Implementing the iterative procedure (33) requires the availability of explicit forms of the two proximity operators prox_{(λ/γ)F} and prox_{(1/σ)R}. We next present explicit forms of these operators for F being either ‖·‖_1 or env_{α‖·‖_1} and R being either ϕ or env_{βϕ}, where ϕ is given in (8). With these explicit forms, the scheme (33) is particularly suitable for solving Models 1-4 listed in Table 1. However, it might not be suitable for solving Model 5 and Model 6, since the proximity operator of env_{βϕ∘B} lacks an explicit expression.
We first give the closed forms of the proximity operators of ‖·‖_1 and its Moreau envelope env_{α‖·‖_1}. For t > 0 and u ∈ R^d, set y := prox_{t‖·‖_1}(u) and z := prox_{t env_{α‖·‖_1}}(u). Then y_i and z_i, the i-th components of y and z, are given by

(35)    y_i = sign(u_i) max(|u_i| − t, 0),
        z_i = (α/(α + t)) u_i if |u_i| ≤ α + t, and z_i = sign(u_i)(|u_i| − t) otherwise.

From the equations in (35), we notice that the proximity operators of both the ℓ_1-norm and env_{α‖·‖_1} can be applied component-wise. This allows us to exploit a Gauss-Seidel strategy to implement the first equation in the iterative scheme (33) for Models 1-4. For these four models, the matrix W is always equal to B. Hence, in view of the first equation of (33), the i-th entry u^k_i of u^k and the entries of its neighbors have an impact on the i-th entry u^{k+1}_i of u^{k+1}. By b^k_U we denote the upper half of the vector b^k and by b^k_L its lower half; similarly, we define v^k_U and v^k_L. For convenience of exposition, we shall view all vectors u^k, u^{k+1}, b^k_L, b^k_U, v^k_L, v^k_U in the iterative scheme (33) as matrices for the time being. The component-wise expression of the first equation in the iterative scheme (33) for the interior pixel at the (i, j)-th location can be rewritten as equation (37), where f is either |·| or env_{α|·|} depending on whether F is the ℓ_1-norm or its Moreau envelope env_{α‖·‖_1}. Similar expressions can be derived for boundary pixels. The proximity operator of f has been explicitly given in equations (35). It can be seen from equation (37) that the pixel u^k_{i,j} and its four neighboring pixels all contribute to the updated pixel u^{k+1}_{i,j}. However, the iterative scheme (33) does not use the most recently available information when computing u^{k+1}_{i,j}. For example, if the pixels are updated in an order of top-to-bottom and column by column to obtain u^{k+1} at the (k + 1)-th iteration, then u^{k+1}_{i−1,j} and u^{k+1}_{i,j−1} are not used in the calculation of u^{k+1}_{i,j} even though they are already available. Therefore, following the idea of the Gauss-Seidel iteration, we revise
the calculation of the pixel u^{k+1}_{i,j} so that we always use the most current estimates of the pixels u_{i−1,j} and u_{i,j−1}.
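The component-wise closed forms in (35) can be checked against brute-force minimization of the defining prox objectives. The sketch below does so, assuming soft-thresholding for the first formula and, for the second, a linear shrinkage below the breakpoint |u_i| = α + t and soft-thresholding above it:

```python
import numpy as np

def prox_l1(u, t):
    # Componentwise prox of t*||.||_1: soft-thresholding (first formula in (35)).
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def prox_env(u, t, alpha):
    # Componentwise prox of t*env_{alpha|.|} (t times a Huber-type function):
    # linear shrinkage for small |u_i|, soft-thresholding otherwise.
    small = np.abs(u) <= alpha + t
    return np.where(small, u * alpha / (alpha + t), np.sign(u) * (np.abs(u) - t))

# Verify both formulas against brute-force minimization of the prox objective.
t, alpha = 0.7, 0.4
z_grid = np.linspace(-6, 6, 600001)
for u in [-2.5, -0.6, 0.0, 0.3, 1.8]:
    obj1 = t * np.abs(z_grid) + 0.5 * (z_grid - u) ** 2
    assert abs(z_grid[obj1.argmin()] - prox_l1(np.array([u]), t)[0]) < 1e-4
    huber = np.where(np.abs(z_grid) <= alpha,
                     z_grid ** 2 / (2 * alpha), np.abs(z_grid) - alpha / 2)
    obj2 = t * huber + 0.5 * (z_grid - u) ** 2
    assert abs(z_grid[obj2.argmin()] - prox_env(np.array([u]), t, alpha)[0]) < 1e-4
```

Because both operators act independently on each component, a sweep over pixels can reuse freshly updated neighbors, which is exactly what the Gauss-Seidel revision exploits.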

Inverse Problems and Imaging
Volume 8, No. 1 (2014), 53-77

That is, u^{k+1}_{i,j} can be computed by replacing u^k_{i−1,j} and u^k_{i,j−1} in equation (37) with the already computed values u^{k+1}_{i−1,j} and u^{k+1}_{i,j−1}. This modification does not increase the computational complexity of equation (37) and is expected to speed up the convergence. With such a modification in the first equation of the iterative scheme (33), we obtain Algorithm 1, which can be used for solving Models 1-4. Note that we could choose σ/γ < 1/8 to ensure convergence of the iteration, because the norm ‖B‖_2² is always less than 8 (see, e.g., [19]).
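The benefit of reusing the freshest values can be seen on the classical linear-system analogue of this sweep (this is the mechanism Algorithm 1 borrows, not an implementation of scheme (33)): on a diagonally dominant system, a Gauss-Seidel sweep typically reaches a given tolerance in fewer iterations than a Jacobi sweep at identical per-iteration cost.

```python
import numpy as np

# Jacobi vs Gauss-Seidel on a tridiagonal, diagonally dominant system A z = c.
n = 50
A = (np.diag(4.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
c = np.ones(n)
z_exact = np.linalg.solve(A, c)

def iterations_to_tol(sweep, tol=1e-8, max_iter=10000):
    z = np.zeros(n)
    for k in range(1, max_iter + 1):
        z = sweep(z)
        if np.linalg.norm(z - z_exact) < tol:
            return k
    return max_iter

def jacobi(z):
    # Every entry updated from the previous iterate only.
    return (c - (A - np.diag(np.diag(A))) @ z) / np.diag(A)

def gauss_seidel(z):
    # Sweep top-to-bottom, reusing freshly updated entries immediately.
    z = z.copy()
    for i in range(n):
        z[i] = (c[i] - A[i] @ z + A[i, i] * z[i]) / A[i, i]
    return z

k_jacobi = iterations_to_tol(jacobi)
k_gs = iterations_to_tol(gauss_seidel)
assert k_gs < k_jacobi   # Gauss-Seidel needs fewer sweeps
```

The same reuse-the-latest-value idea, applied pixel-by-pixel to the first equation of (33), is what accelerates Algorithm 1.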

Algorithm 1 Proximity algorithm for Models 1-4 accelerated by GS iteration
Given: a noisy image x, the functions F and R of the model to be solved (with W = B), and positive parameters λ, γ, σ with σ/γ < 1/8.
Initialization: u^0 ∈ R^d, v^0, b^0 ∈ R^{2d}.
repeat
    update u^{k+1} pixel-by-pixel by the Gauss-Seidel version of the first equation of (33), always using the most recent pixel values;
    v^{k+1} ← B(2u^{k+1} − u^k);
    b^{k+1} ← (I − prox_{(1/σ)R})(b^k + v^{k+1});
until u^{k+1} converges or satisfies a stopping criterion.
To close this section, we point out the relation between our proposed iterative algorithm (33) and the CPPD method proposed in [7]. The CPPD method is a primal-dual algorithm developed for solving the saddle-point formulation associated with a minimization problem. The saddle-point formulation of model (24) has the form

(39)    min_{u ∈ R^d} max_{s ∈ R^m} { λF(u − x) + ⟨Wu, s⟩ − R*(s) }.

The iterative scheme of the CPPD algorithm for solving model (39) is

(40)    s^{k+1} = prox_{δR*}( s^k + δ W ū^k ),
        u^{k+1} = prox_{τλF(·−x)}( u^k − τ W^⊤ s^{k+1} ),
        ū^{k+1} = 2u^{k+1} − u^k,

where δ and τ are two positive numbers and R* is the convex conjugate function of R.
We rewrite the iterative scheme (33) to establish the relationship between the CPPD algorithm and ours. From the third step of the iterative scheme (33), we can express b^k in terms of b^{k−1} and v^k. Substituting this expression into the first equation in (33) and combining the second and third equations of (33), the iterative scheme (33) can be rewritten in the form (41). Recall the Moreau decomposition I = prox_{tR} + t prox_{(1/t)R*} ∘ (1/t)I for any t > 0. Letting s^k := σb^k, the iterative scheme (41) then has the equivalent form (42), which is exactly the iterative scheme of CPPD applied to the saddle-point formulation (43) of the dual problem of model (24). Therefore, our proposed iterative algorithm (33) for solving the primal problem (24) is algebraically equivalent to the CPPD algorithm applied to the saddle-point formulation of its dual problem. But they are not algorithmically equivalent: as pointed out in [18], the component-wise Gauss-Seidel iteration applied to CPPD does not result in an acceleration of convergence.
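The Moreau decomposition invoked above can be checked directly for R = ‖·‖_1, whose conjugate R* is the indicator of the ℓ_∞ unit ball, so that prox of the conjugate is the projection onto that ball (componentwise clipping):

```python
import numpy as np

# Moreau decomposition I = prox_{tR} + t * prox_{(1/t)R*}(./t) for R = ||.||_1.
# R* is the indicator of the l-infinity unit ball, whose prox (for any positive
# scaling of the indicator) is the projection: componentwise clipping to [-1, 1].
rng = np.random.default_rng(7)
x = rng.normal(size=1000) * 3.0
t = 0.8

prox_tR = np.sign(x) * np.maximum(np.abs(x) - t, 0.0)   # prox of t*||.||_1
prox_conj = np.clip(x / t, -1.0, 1.0)                   # prox_{(1/t)R*}(x/t)
assert np.allclose(prox_tR + t * prox_conj, x)
```

In other words, soft-thresholding at level t plus clipping to [−t, t] always recovers the input exactly, which is the identity used to pass from (41) to (42).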

5. Special proximity algorithms for model (24) with a differentiable R. This section is devoted to proximity algorithms for model (24) when the convex function R is differentiable. Models 3-6 summarized in Table 1 possess this property.
Recall that Proposition 6 presents a characterization of solutions of model (24) when the function R in the regularization term is differentiable. From the fixed-point equation (32), for any initial value u^0 ∈ R^d, we propose the following iterative scheme for finding a solution of model (24):

(44)    u^{k+1} = prox_{(λ/γ)F(·−x)}( u^k − (1/γ) W^⊤ ∇R(Wu^k) ).

We remark that the fixed-point equation (32) can be viewed as a special case of the proximal forward-backward splitting described in [12]. Furthermore, if the gradient ∇R is Lipschitz continuous with Lipschitz constant L, that is,

(45)    ‖∇R(y) − ∇R(z)‖_2 ≤ L‖y − z‖_2 for all y, z ∈ R^m,

and if γ is chosen to satisfy ‖W‖_2² L / γ < 2, then it was proved in [12] that for any initial guess u^0 ∈ R^d, the sequence {u^k : k ∈ N} generated from the iterative scheme (44) converges to a fixed point of equation (32), which, by Proposition 6, is a solution of model (24).
The iterative scheme (44) can be applied to Models 3-6. Implementing the iterative procedure (44) requires the availability of the proximity operator prox_{(λ/γ)F} and the gradient ∇R. Since the function F is either the ℓ_1-norm or the Moreau envelope env_{α‖·‖_1}, its proximity operator prox_{(λ/γ)F} can be explicitly computed according to equations (35). We know from Table 2 that R is the Moreau envelope of either ϕ or ϕ∘B. If R is env_{βϕ} (i.e., for Models 3 and 4), then by equation (11) we have ∇R(Bu) = (1/β)(Bu − prox_{βϕ}(Bu)). Hence, the iterative algorithm (44) applied to Model 3 and Model 4 has the form

(46)    u^{k+1} = prox_{(λ/γ)F(·−x)}( u^k − (1/(γβ)) B^⊤( Bu^k − prox_{βϕ}(Bu^k) ) ).

Since the proximity operator prox_{βϕ} can also be explicitly computed according to equation (36), the calculation of u^{k+1} in the iterative scheme (46) has a closed form. Furthermore, due to the non-expansiveness of the proximity operator [2], the gradient of env_{βϕ} in equation (45) is Lipschitz continuous with Lipschitz constant 1/β. Hence, if γ is chosen to satisfy the condition below equation (45) with W = B and L = 1/β, then the sequence {u^k : k ∈ N} generated from the iterative scheme (46) for any initial value u^0 ∈ R^d converges to a solution of the model corresponding to Model 3 or 4. The flow of finding this solution is described in Algorithm 2.

Algorithm 2
The proximity algorithm for Models 3 and 4. Given: a noisy image x and parameters λ, β, γ satisfying the convergence condition for (46), iterate the scheme (46) until u^{k+1} converges or satisfies a stopping criterion.

If R is env_{βϕ∘B} (i.e., for Models 5 and 6), then W is the identity matrix and, by equation (11), the iterative algorithm (44) takes the form

(47)    u^{k+1} = prox_{(λ/γ)F(·−x)}( u^k − (1/(γβ))( u^k − prox_{βϕ∘B}(u^k) ) ).

The sequence {u^k : k ∈ N} generated from the iterative scheme (47) converges to a solution of Model 5 or Model 6 if the number γ is chosen to satisfy 1/(γβ) < 2. In the iterative scheme (47), the proximity operator of ϕ∘B at u^k needs to be evaluated at each iteration. Although this proximity operator has no closed form, the algorithms proposed in [6,14,19] can be directly applied to evaluate it. In this paper, we apply the approach proposed in [19] to compute prox_{βϕ∘B}(u^k). In [19], it was shown that prox_{βϕ∘B}(u^k) can be expressed through a vector b^k that solves the fixed-point equation (49), where δ is positive. It was proved in [19] that if we choose the number δ > 0 such that δβ < 1/4, then for any initial guess b^{k,1} ∈ R^{2d} the sequence {b^{k,j} : j ∈ N} generated from the Picard iteration (50) of equation (49) converges to a fixed point b^k of equation (49). As pointed out in [19], a Gauss-Seidel strategy can be used to accelerate the convergence of the iterative scheme (50).

The FISTA acceleration of [3] (Algorithm 5) can be combined with the Gauss-Seidel strategy as well. Denote by w^k_U and w^k_L the upper and the lower half of the vector w^k, respectively, and view all vectors x, y^k, y^{k+1}, w^k_U, and w^k_L in equation (55) as matrices momentarily. Then the interior pixel y^{k+1}_{ij} of y^{k+1} is updated according to equation (56), reusing the most recent pixel values; similar calculations for the boundary pixels of y^{k+1} proceed analogously. The proximity algorithm for solving Model 3 or Model 4 accelerated by both FISTA and the Gauss-Seidel iteration is described in Algorithm 6.
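The FISTA mechanism of [3] leaves the proximal-gradient step unchanged but applies it at an extrapolated point with momentum weights t_k. The generic sketch below illustrates the speed-up on a small lasso problem (chosen because its optimum is easy to compare against, not because it is the paper's denoising model):

```python
import numpy as np

# Generic FISTA sketch: same proximal-gradient step as ISTA, applied at an
# extrapolated point y^k with momentum weights t_k updated as in [3].
rng = np.random.default_rng(1)
A = rng.normal(size=(60, 120))
c = rng.normal(size=60)
mu = 1.0
L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(z):
    return 0.5 * np.sum((A @ z - c) ** 2) + mu * np.abs(z).sum()

def ista(iters):
    z = np.zeros(120)
    for _ in range(iters):
        z = soft(z - (A.T @ (A @ z - c)) / L, mu / L)
    return objective(z)

def fista(iters):
    z = y = np.zeros(120)
    t = 1.0
    for _ in range(iters):
        z_new = soft(y - (A.T @ (A @ y - c)) / L, mu / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = z_new + ((t - 1.0) / t_new) * (z_new - z)   # extrapolation step
        z, t = z_new, t_new
    return objective(z)

obj_fista, obj_ista = fista(100), ista(100)
assert obj_fista <= obj_ista + 1e-6      # FISTA at least as good after 100 steps
```

The O(1/k²) rate of FISTA versus the O(1/k) rate of plain forward-backward is what motivates Algorithms 5 and 6 above.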

Algorithm 6
The proximity algorithm for Model 3 or Model 4 accelerated by FISTA and the GS iteration
We close this section with Table 3, which summarizes the choices of algorithms and acceleration techniques for solving the L1/TV model and its variants.
6. Numerical experiments. In this section we present numerical results to demonstrate the differences among the various modified L1/TV models and to compare the performance of the proposed algorithms in restoring images corrupted by salt-pepper noise. We choose the four images shown in Figure 1 as our testing examples. Both the "Cameraman" and "Lena" images are of size 256 × 256. The "Lighthouse" and "Window" images are downloaded from the Kodak database (http://r0k.us/graphics/kodak/) and cut to square images of size 512 × 512. The quality of the restored image is evaluated in terms of the peak signal-to-noise ratio (PSNR).

6.1. The favorable fidelity term. In the first experiment, we compare the ℓ_1-norm fidelity term with its smoothed version env_{α‖·‖_1}(· − x) by comparing the performance of Model 1 with that of Model 2 for several values of the smoothing parameter α. The numerical results are listed in Table 4. We observe that the PSNR-values of the restored images by Model 1 are consistently higher than those by Model 2. Furthermore, the PSNR-values of the restored images by Model 2 decrease as the value of the smoothing parameter α increases. This is essentially due to the fact that env_{α‖·‖_1} approaches ‖·‖_1 only as α → 0⁺ and that the use of ‖· − x‖_1 is based on the statistical analysis of salt-pepper noise in Section 2. This experiment confirms that the ℓ_1-norm fidelity term should not be changed for the purpose of removing impulse noise.

6.2. The favorable regularization term. In the second experiment, we fix the fidelity term to be ‖· − x‖_1. We shall compare the total variation regularization term with its two smoothed versions env_{βϕ}(Bu) and env_{βϕ∘B}(u). To this end, we compare the performance of Model 1 with those of Models 3 and 5. Similarly, we choose different values of the smoothing parameter β in Models 3 and 5 to see its effect on the restored images. We use Algorithm 1 to solve Model 3. Model 5 is solved by Algorithm 5, in which the parameter n is chosen as 50; it was shown numerically in [19] that this value of n is accurate enough to evaluate prox_{ϕ∘B}(u). The stopping criterion in this experiment is also chosen as TOL = 10^{-5}.
We summarize the numerical results in Table 5. Unlike those of ϕ and env_{βϕ}, the proximity operators of ϕ∘B and env_{βϕ∘B} have no closed forms, which introduces some difficulty in the numerical treatment of Model 5. Moreover, as shown in Table 5, Model 5 cannot improve the quality of the restored images in comparison with Model 1. We therefore conclude that replacing the total variation ϕ∘B by env_{βϕ∘B}(u) is not an advisable choice. However, Model 3, with the smoothed regularization term env_{βϕ}(Bu), can outperform Model 1 when the parameter β is properly selected (usually in the range of 10 to 20). This is mainly because the smoothed regularization term uses env_{βϕ} rather than ϕ to penalize the gradients of the image. For z ∈ R^{2d}, write z_i := [z_i, z_{d+i}]^⊤ for i = 1, 2, …, d. Then, by the definition of ϕ in equation (8) and the closed form of its proximity operator in equation (36), we have that

(57)    env_{βϕ}(z) = Σ_{i=1}^d h_β(‖z_i‖_2),    where h_β(s) = s²/(2β) if s ≤ β and h_β(s) = s − β/2 if s > β.
As can be seen from equation (57), the function env_{βϕ} differs from ϕ by discriminating gradients with small magnitudes and penalizing them quadratically. As a result, it smooths small gradients of the restored image while preserving sharp edges. Consequently, the staircasing effects usually caused by the total-variation term can be reduced to some extent. The parameter β specifies the gradient magnitude at which the regularization term env_{βϕ}(Bu) switches from penalizing quadratically to behaving like the total variation. When β is too small, env_{βϕ}(Bu) performs like the total variation, while it gradually imitates Tikhonov regularization as β increases. Hence, we recommend env_{βϕ}(Bu) as the preferred regularization term in applications. The experimental results in Table 5 show that the best value for β usually lies in the range of 10 to 20.
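The quadratic-to-linear switch described above can be illustrated with a small sketch. Under the envelope convention env_{βf}(x) = min_u {f(u) + ‖u − x‖²/(2β)}, the envelope of the absolute value (and, applied to a gradient magnitude t ≥ 0, of the Euclidean norm) is the Huber-type function below; the helper name `huber` is ours.

```python
import numpy as np

def huber(t, beta):
    """Moreau envelope of |.| evaluated at magnitude t: quadratic for
    |t| <= beta (smoothing small gradients), linear beyond (edge-preserving)."""
    t = np.abs(t)
    return np.where(t <= beta, t * t / (2.0 * beta), t - beta / 2.0)
```

Both branches meet at t = β with value β/2, so the penalty is continuously differentiable, which is exactly what makes env_{βϕ} amenable to gradient-based schemes.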

6.3. The favorable model. Based on the above numerical study of models with various fidelity and regularization terms, Model 3 is a favorable choice for impulse noise removal. Since the function env_{βϕ} in the model is differentiable and its proximity operator also has a closed form, the model can be solved by the algorithms presented in Sections 4 and 5, accelerated by the Gauss-Seidel iteration, FISTA, or both. A detailed comparison of these algorithms for impulse noise removal is presented in the remainder of this section.
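Before the comparison, the FISTA acceleration referred to above can be recalled in a generic, hedged sketch: this is the standard Beck–Teboulle extrapolation loop for a smooth objective, not the paper's Algorithm 4 or 6; the names `fista`, `grad`, and `step` are ours.

```python
import numpy as np

def fista(grad, x0, step, iters=100):
    """Generic FISTA loop for minimizing a smooth f with Lipschitz gradient:
    take a gradient step at an extrapolated point y, then update momentum."""
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x_new = y - step * grad(y)                     # forward step at y
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # extrapolation
        x, t = x_new, t_new
    return x
```

The extrapolation step costs only one extra vector operation per iteration yet improves the worst-case rate from O(1/k) to O(1/k²), which is the source of the speedups reported below.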
We first validate the acceleration by the Gauss-Seidel iteration and FISTA. To see whether the Gauss-Seidel iteration can accelerate the convergence, we compare the performance of the iterative scheme (33), the CPPD algorithm in equation (40), and Algorithm 1 in restoring the "Cameraman" image corrupted by salt-and-pepper noise at the level of 30%. We select three different values, 1, 10, and 20, for β. The PSNR-values of the restored images obtained from the three approaches at each iteration are plotted in Figure 2. We observe that the proposed scheme in equation (33) performs similarly to the CPPD algorithm. This does not surprise us, as the two algorithms are closely related, as discussed at the end of Section 4. However, applying the Gauss-Seidel iteration in Algorithm 1 clearly improves the convergence rate, and an approximate restored image can be obtained within 20 iterations of Algorithm 1. It can also be seen from Figure 2 that the parameter β does not much affect the convergence speed of these three algorithms.
We then discuss the acceleration effects of FISTA and the Gauss-Seidel iteration on Algorithm 2. To this end, we apply Algorithms 2, 4 and 6 to restore the same noisy image ("Cameraman" corrupted by salt-and-pepper noise at the noise level 30%) used above. The three values 1, 10, and 20 are again selected for β. Figure 3 shows the PSNR-values of the restored images obtained by the three algorithms at each iteration. We can draw two conclusions from the results in Figure 3. First, Algorithm 4, which uses FISTA, yields a much faster convergence rate than Algorithm 2, and the Gauss-Seidel iteration in Algorithm 6 further speeds up the convergence of Algorithm 4. Second, the convergence rates of the three algorithms depend heavily on the value of β: the larger the value of β, the faster the convergence. As we have already seen, Model 3 can be solved by the proposed scheme (33) and by Algorithm 2, whose accelerated counterparts are Algorithm 1 and Algorithm 6, respectively. A natural question follows immediately: which accelerated scheme should we choose for solving Model 3? To answer this question, we compare below the performance of Algorithms 1 and 6 in detail. We set the stopping criterion in this experiment as TOL = 10⁻³ in (56), and choose six different values, 1, 5, 10, 15, 20 and 25, for the parameter β. For the "Cameraman" image corrupted by salt-and-pepper noise at three levels, 10%, 30% and 50%, we show in Figure 4 the running time consumed, the PSNR-values of the restored images, and the number of iterations needed by both algorithms. The first observation is that the convergence rates of both algorithms improve as the value of the parameter β increases. However, the performance of Algorithm 6 depends much more on this parameter than that of Algorithm 1.
Specifically, we prefer Algorithm 1 when β is less than 10, while we prefer Algorithm 6 when β becomes larger. We also observe that the best choice of the parameter β, the one that generates the restored image with the highest PSNR-value, usually lies in the range of 10 to 20. The higher the noise level in an image, the larger the preferred value of β.

7. Concluding remarks. We propose several modified L1/TV models by replacing one or both of the non-differentiable functions in the objective function of the original L1/TV model with their corresponding differentiable Moreau envelopes. Some existing approaches for the L1/TV model in the literature intrinsically solve one of these modified models. A unified fixed-point algorithmic framework via the proximity operator is developed for the numerical treatment of these models. The algorithms are motivated by the fixed-point equations which characterize the solutions of the models, and the corresponding accelerated schemes that speed up their convergence are given. We compare the performance of the modified models in the application of impulse noise removal and single out Model 3 as the best choice for this purpose. Two types of fixed-point algorithms that can solve this model are compared in detail, and recommendations on which algorithm to use are made based on the numerical experiments.

Proposition 1. If λ and γ are positive, x is a noisy image in R^d, B is the 2d × d matrix defined by (7), and ϕ is the function defined by (8), then the solution set of the optimization problem (13) is nonempty. If the pair (u*, v*) is a solution of the optimization problem (13), then v* = prox_{α‖·‖₁}(u* − x), where α = λγ. Furthermore, the pair (u*, v*) is a solution of the optimization problem (13) if and only if u* is that of the optimization problem

(15) min_u J₂(u),

where
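The operator prox_{α‖·‖₁} appearing in Proposition 1 is the componentwise soft-thresholding map; a minimal sketch, where the function name `prox_l1` is ours:

```python
import numpy as np

def prox_l1(z, alpha):
    """Proximity operator of alpha * ||.||_1: componentwise soft-thresholding,
    prox(z)_i = sign(z_i) * max(|z_i| - alpha, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)
```

Entries with magnitude at most α are set to zero, which is why this operator suppresses small residuals exactly.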

Proposition 2. If λ, γ and β are positive, x is a noisy image in R^d, B is the 2d × d matrix defined by (7), and ϕ is the function defined by (8), then the solution set of the optimization problem (18) is nonempty. If the triple (u*, v*, w*) is a solution of the optimization problem (18), then v* = prox_{α‖·‖₁}(u* − x) and w* = prox_{βϕ}(Bu*), where α = λγ. Furthermore, the triple (u*, v*, w*) is a solution of the optimization problem (18) if and only if u* is that of the optimization problem min_u J₄(u), where J₄(u) := λ env_{α‖·‖₁}(u − x) + env_{βϕ}(Bu).

Inverse Problems and Imaging, Volume 8, No. 1 (2014), 53-77
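The operator prox_{βϕ} in Proposition 2 has a closed form when ϕ couples the two difference directions isotropically, i.e. ϕ(z) = Σ_i ‖(z_i, z_{d+i})‖₂, which we assume matches (8); under that assumption each pair (z_i, z_{d+i}) is shrunk toward zero in Euclidean norm. A sketch, with `prox_phi` our name:

```python
import numpy as np

def prox_phi(z, beta):
    """Prox of beta * phi with phi(z) = sum_i ||(z_i, z_{d+i})||_2 (assumed
    isotropic coupling): shrink each 2-vector's Euclidean norm by beta."""
    d = z.size // 2
    pairs = z.reshape(2, d)                 # row 0: z_1..z_d, row 1: z_{d+1}..z_{2d}
    norms = np.linalg.norm(pairs, axis=0)   # per-pair gradient magnitude
    scale = np.maximum(norms - beta, 0.0) / np.where(norms > 0, norms, 1.0)
    return (pairs * scale).reshape(-1)
```

This is the vector analogue of scalar soft-thresholding: pairs with magnitude at most β vanish, while larger pairs keep their direction and lose β in length.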
and α₃ being positive numbers.

Proposition 3. If α₁, α₂, and α₃ are positive, x is a noisy image in R^d, B is the 2d × d matrix defined by (7), and ϕ is the function defined by (8), then the solution set of the optimization problem (20) is nonempty. If the triple (u*, v*, w*) is a solution of the optimization problem (20), then v* = prox_{βϕ∘B}(u*) and w* = x + prox_{α‖·‖₁}(u* − x), where α = α₂/2 and β = α₃/(2α₁). Furthermore, the triple (u*, v*, w*) is a solution of the optimization problem (20) if and only if u* is that of the optimization problem min_u J₆(u), where J₆(u) := λ env_{α‖·‖₁}(u − x) + env_{βϕ∘B}(u) with λ = α₂/α₃.

Proof. By using the definitions of the proximity operator and the Moreau envelope, we have that for all vectors u

Proposition 6. Let x be in R^d, let λ be a positive number, let F ∈ Γ₀(R^d), let W be an m × d matrix, and let R ∈ Γ₀(R^m) be differentiable. If u* ∈ R^d is a solution of

converges or satisfies a stopping criterion. If R is env_{βϕ∘B} (Model 5 or Model 6), by equation (11) we have that ∇env_{βϕ∘B} = (1/β)(I − prox_{βϕ∘B}), which is Lipschitz continuous with Lipschitz constant 1/β. Hence the iterative algorithm (44) applied to Model 5 or Model 6 has the form of (47)
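The identity ∇env_{βf} = (1/β)(I − prox_{βf}) used here can be checked numerically; a sketch with f = ‖·‖₁, whose prox is soft-thresholding (all function names below are ours):

```python
import numpy as np

def prox_l1(z, beta):
    """Soft-thresholding: proximity operator of beta * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - beta, 0.0)

def env_l1(z, beta):
    """Moreau envelope of ||.||_1 with parameter beta (summed Huber terms)."""
    a = np.abs(z)
    return np.sum(np.where(a <= beta, a * a / (2 * beta), a - beta / 2))

def grad_env_l1(z, beta):
    """Gradient via the identity grad env_{beta f} = (I - prox_{beta f}) / beta."""
    return (z - prox_l1(z, beta)) / beta
```

Since I − prox_{βf} is nonexpansive, the gradient is indeed Lipschitz with constant 1/β, which is what licenses the step size in (44).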

Figure 2. Acceleration results of applying the Gauss-Seidel iteration to scheme (33) when restoring a "Cameraman" image corrupted by salt-and-pepper noise at the noise level of 30%.

Table 3. The choices of algorithms and acceleration techniques for solving the L1/TV models.