EDGE-PRESERVING RECONSTRUCTION WITH CONTOUR-LINE SMOOTHING AND NON-QUADRATIC DATA-FIDELITY

. The standard approach for image reconstruction is to stabilize the problem by including an edge-preserving roughness penalty in addition to faithfulness to the data. However, this methodology produces noisy object boundaries and creates a staircase eﬀect. State-of-the-art methods to correct these undesirable eﬀects either have weak convergence guarantees or are limited to speciﬁc situations; furthermore, most of them use a quadratic data-ﬁdelity term. In this paper, we propose a simple alternative regularization model to improve contour regularity and to reduce the staircase eﬀect—our model incorporates the smoothness of the edge ﬁeld in an implicit way by adding a simple penalty term deﬁned in the wavelet domain. We also derive an eﬃcient half-quadratic algorithm to solve the resulting optimization problem, including the case when the data-ﬁdelity term is not quadratic and the cost function is not convex. Our approach either extends or supplements existing methods and oﬀers strong convergence guarantees. Numerical experiments show that it outperforms ﬁrst-order total variation regularization as well as state-of-the-art second-order regularization techniques.

1. Introduction. We consider the classical inverse problem of recovering a piecewise-smooth image x ∈ R K from some data d ∈ R K (K K) of the form where H is a linear map from R K to R K which models the acquisition process, η ∈ R K is an unknown realization of a random vector η, and the function χ : R K → R K represents possible additional corruption by impulse noise. (We call this problem image denoising when H = id R K .) Throughout this paper, we make no distinction between (i) a real-valued image with n pixels and its vector representation in R n , (ii) a vector of R n and its n × 1 column-matrix representation, and (iii) a linear map from R n to R m and its m × n canonical matrix. We denote by L n,m (R) the set of linear maps from R n to R m .
A common estimate of the original image x is defined as any global minimum of a cost function U : R K → R which combines a data-fidelity term Φ d with a stabilization term Ψ weighted by a parameter λ > 0 : (2) U (x) = Φ d (x) + λΨ(x).
The original justification for this choice lies within the regularization framework [38] and its Bayesian interpretation [17]. The data-fidelity term accounts for the noise characteristics; it is generally defined by where the φ k 's are increasing in R + ([[1, K ]] is a shorthand notation for the set {1, . . . , K }). In the Bayesian setting, this form is obtained in the absence of impulse noise and under the following assumptions: (i) the noise components η 1 , . . . , η K are independent, and (ii) the density of each η k is proportional to exp(−φ k (|t|)). The most commonly used examples are the squared 2 -norm of the residual, defined by φ k (t) = t 2 , and the 1 -norm of the residual, defined by φ k (t) = t. (The former corresponds to i.i.d. Gaussian random variables, and the latter corresponds to i.i.d. Laplace random variables.) The 1 -norm also allows to deal with impulse noise [41]. In this case, a possible refinement is to detect the data pixels corrupted by impulse noise and to consider the data to be missing at these outlier pixel locations [7,8,34]; the data-fidelity term is then of the same form as (3), but with a summation over the set of non-outlier pixels. The regularization functional Ψ is intended to promote the formation of smooth regions separated by edges; it is of the form where · 2 is the 2 -norm, R l ∈ L K,ρ l (R) (ρ l ∈ N * ), and the functions ψ l are increasing in R + . Concrete examples can be found in [22,33,11,16,46,41,47]. Usually, all the ψ l 's are equal to a same function ψ, and {R l : l ∈ [[1, L]]} is a set of first-order difference operators (ρ l = 1) or a discrete approximation to the gradient (ρ l = 2). The ψ l 's are called potential functions (PFs) in the Bayesian framework. They are often classified in two categories: convex PFs, which ensure the convexity of U and reduce smoothing in the vicinity of discontinuities [11], and non-convex PFs, which yield sharper edges [42] at the expense of increased optimization difficulty. The formulation of the image reconstruction problem as the minimization of a cost function of the form (2)-(4) has proven effective, and yet it has two limitations: 1) Noisy contour lines. Regularization functionals of the type of (4) do not embed prior knowledge on the geometry of edges. They tend to produce noisy object boundaries which are not faithful to the original image. Early attempts to correct this effect were based on the introduction of an explicit edge-process -either boolean [23,3] or continuous [6,29] -to model the mutual dependence between neighboring discontinuities. However, boolean edge-processes drastically increase computational complexity (not to mention the problem of correctly specifying the penalties of the various edge configurations), and continuous edge-processes give disappointing results.
2) Staircase effect. A basic requirement for edge preservation is that ψ (t)/t goes to zero as t → +∞ (see [11]), which explains why convex PFs are typically nearly affine beyond a neighborhood of the origin. In fact, convex edge-preserving PFs are often similar to the identity on R + , but with zero derivative at the origin to make the regularization term Ψ differentiable -a well-known example is (5) ψ δ id (t) = δ 2 + t 2 − δ, where δ > 0 is small compared to the width of the range of the original image. If ψ is the identity on R + , Ψ(x) is the discrete total variation (TV) of x. Despite its popularity, TV regularization produces blocky images (see, for instance, [19,10]), and so do PFs close to the identity. This phenomenon is called the staircase effect because ramps (affine regions) tend to turn into stairs (piecewise-constant regions); it is most pronounced for bounded PFs [22].
State-of-the-art methods for edge continuation and/or reduction of the staircase effect use second-order difference operators [10,35,1,34,49,32,28,53]. These methods are reviewed in Section 2; they lack strong convergence guarantees, and, except for [34], they are restricted to quadratic data-fidelity and/or image denoising. Our contributions are the following: 1) We propose a simple alternative regularization model to encourage the formation of smooth contours lines (while preserving edges) and to reduce the staircase effect. Our model is obtained by adding constraints in a multiresolution space: the solutions are the global minima of an augmented functional V : R K → R of the form (6) V where T is a directional multiresolution transform, Ψ is a smoothness penalty term similar to (4) which operates on the detail coefficients, and λ > 0 weights the influence of Ψ. This technique incorporates the smoothness features of the edge field implicitly and has limited impact on optimization complexity. 2) We derive an efficient deterministic relaxation algorithm to minimize V under differentiability and coercivity assumptions. Our algorithm can deal with nonquadratic data-fidelity and has the following properties: (i) it finds the global minimum when V is strictly convex, (ii) it converges from any initialization when the minima are isolated, and (iii) it gets arbitrarily close to the set of local minima in any situation. The proposed optimization scheme has two further advantages: it allows to enforce inequality constraints on the pixel values, and it allows to use continuation sequences to gradually reveal the complexity of minimizing V (this latter refinement improves performance when V is not convex and/or approximates a non-differentiable cost function).
The price of the convergence guarantees of our deterministic relaxation algorithm is that the PFs must be smooth at zero and coercive (a function ψ defined on R + is said to be smooth at zero if ψ (0) = 0, and it is called coercive if lim t→+∞ ψ(t) = +∞). These conditions preclude PFs of the form ψ(t) = t p with p ∈ (0, 1] [47] and bounded PFs such as ψ(t) = t p /(δ p + t p ) with p ∈ [1, 2] [24,22]. However, they allow standard choices such as, for instance, ψ δ id (see (5)), ψ(t) = ln(cosh(t/δ)) [25], and the non-convex PF (7) ψ δ HL (t) = ln 1 + (t/δ) 2 originally proposed by Hebert and Leahy [26]. In fact, the limitations imposed by our extra conditions are of no practical importance; indeed, any PF that is not smooth at zero can be closely approximated by a smooth-at-zero PF, and coercive non-convex PFs such as ψ δ HL perform as well as bounded PFs in terms of edge preservation [42,Theorem 3.1].
This paper is organized as follows. We review previous work related to ours in Section 2. In Section 3, we discuss our approach to the issue of edge continuation. Section 4 is devoted to the description of the optimization algorithm along with its convergence properties (the proofs are relegated to Appendix A). Experimental results are presented in Section 5, followed by concluding remarks.
2.1. Methods restricted to quadratic data-fidelity. Chouzenoux et al. [13] proposed a subspace optimization approach for minimizing cost functions of the form where R l ∈ L K,ρ (R) for some fixed ρ ∈ N * and ψ is smooth at zero and coercive. They showed that, under assumptions similar to ours, the sequence (x (n) ) n of iterates generated by their algorithm is such that (i) (U (x (n) )) n is non-increasing, and (ii) lim n→∞ ∇U (x (n) ) = 0 (as we will see in Section 4.2, these convergence guarantees are weak compared to those derived for our algorithm). Cost functions of the above form were also studied by Nikolova et al. [43]. They developed a continuation approach to handle the situation where ψ is not smooth at zero and not convex, but they did not investigate the theoretical convergence properties of their algorithms. In [27], Hou et al. considered the special case where ψ(t) = t p with p ∈ (0, 1) in order to promote sparsity of the gradient at different resolutions (the set {R l } l consists of multi-resolution derivative filters). They used the multi-stage convex relaxation method proposed in [52], whose only convergence guarantee is that the cost of the iterates is non-increasing. The Hessian-based approach of Lefkimmiatis et al. [32] and the isotropic higherorder TV approach of Hu and Jacob [28] also boil down to minimizing a function of the form (8). The Hessian-based regularization functionals are of the form l Q l x 2 (Frobenius-norm regularizer) and ∆x 1 + l Q l x 2 (spectral-norm regularizer), where ∆ is a discrete approximation to the Laplacian operator and the Q l 's are second-order differential operators (Q l ∈ L K,3 (R) for the Frobenius norm and Q l ∈ L K,2 (R) for the spectral norm). The isotropic higher-order TV regularization functional is of the form l Q l x q , where the Q l 's are discrete nth-order differential operators in L K,n+1 (R) (n = 2 or 3) and · q is the norm induced by a positive-definite quadratic form on R n+1 . The resulting optimization problems are solved by majorization-minimization algorithms, but neither paper provides a convergence proof. Hu and Jacob [28] also introduced an anisotropic higher-order TV penalty to improve contour regularity while preserving edges (this penalty does not have a closed-form expression). Their experiments on image denoising show that anisotropic regularization slightly improves the SNR compared to isotropic regularization (the gain in SNR remains below 2.15%), but there is no clear evidence of edge continuation.
Other higher-order models were proposed in [10,35,1,53] to reduce the staircase effect; the associated optimization algorithms are restricted to image denoising. Chan et al. [10] combined a first-order TV penalty and a second-order term that penalizes the square of the laplacian at the locations where the gradient is small. Lysaker and Tai [35] defined the solution as a convex combination of the solutions associated with first-and second-order TV regularization. Bae et al. [1] used the Euler's elastica model [9], and Zhu and Chan [53] employed the TV of the mean curvature of the image surface. In [10,35,53], the optimization method is based on solving the Euler-Lagrange equations associated with the model considered, which are quite complex nonlinear PDEs. These equations are difficult to solve numerically and the proposed iterative solution schemes have very weak convergence guarantees. The approach of Bae et al. [1] is radically different: they designed an optimization algorithm based on graph cuts. Their algorithm is very efficient for image denoising, but it cannot be extended to the case where H is not the identity; the reason is that the cost function is not submodular if H is a blurring operator. (An alternative would be to use quadratic pseudo-Boolean optimization [48,31], which is an extension of graph cuts that allows to deal with non-submodular cost functions. However, as discussed in [45], reconstruction techniques based on quadratic pseudo-Boolean optimization are limited to small-size blurring kernels.) 2.2. Methods using a non-quadratic data-fidelity term. The approaches reviewed above share a common limitation: the quadratic data-fidelity term Hx−d 2 2 is appropriate only for Gaussian noise and leads to solutions that are significantly altered by outliers. In particular, quadratic data-fidelity does not allow to deal with impulse noise. To overcome this drawback, a standard choice is the 1 -norm of the residual, or, more generally, a data-fidelity term of the form (3) (it is common practice to replace the 1 -norm of the residual by (3) with φ k = ψ δ id ). Nikolova [41] proposed a convergent method to minimize cost functions of the form where the R l 's are first-order difference operators and ψ is smooth at zero and convex. This approach is expressly restricted to image denoising and impulse noise; it is based on the assumptions that only a part of the pixels are corrupted by noise and that each noisy pixel is dissimilar to its nearest neighbors. Cai et al. [7] and Chan et al. [8] also considered the special case of impulse noise. They proposed a two-phase approach which extends the work of Nikolova: the noisy data pixels are first detected using an impulse-noise detector, and then the original image is estimated by minimizing where the first sum is over the set I of indices of the noise-free data pixels. In [7], the set {R l } l consists of first-order difference operators, and a smooth approximation of U is minimized using a fixed-point iteration scheme without convergence guarantee.
In [8], {R l } l is a discrete approximation to the gradient, and U is minimized by a variant of the primal-dual method proposed in [20]. By extension, the iterative reweighted norm approach of Rodrìguez and Wohlberg [47] allows to consider cost functions of the form with p, q ∈ (0, 2). The associated convergence guarantees are the same as in [13]: the cost of the iterates is non-increasing, and the gradient of U goes to zero as the number of iterations grows to infinity. Li et al. [34] considered a two-phase approach to deal with Gaussian plus impulse noise: after detection of the pixels corrupted by impulse noise, the original image is estimated from the non-outlier pixels by minimizing where φ ν (t) = νt 2 /(ν + t) and {R l } l is a set of tight framelet filters ({R l } l is the union of (i) a weighted-averaging operator, (ii) edge-detection operators, and (iii) second-order difference operators). The authors designed an iterative shrinkage algorithm in which the hyper-parameters ν and λ are determined automatically, but they did not provide any convergence guarantee. Finally, Tai et al. [49] proposed an augmented Lagrangian method for minimizing an Euler's energy functional combined with a data-fidelity term of the form k |x k − d k | p with p ∈ [1,2]. Their numerical experiments demonstrate the efficiency of their algorithm, but they did not study its theoretical convergence properties.

2.3.
Position of our approach. The methods reviewed above either have weak convergence guarantees [10,35,47,7,43,1,13,27,34,49,32,28,53] or are limited to very specific situations [41,8]. Furthermore, the methods aimed at improving contour regularity and/or reducing the staircase effect are generally restricted to quadratic data-fidelity [10,35,1,53,32,28], and the methods that employ a nonquadratic data-fidelity term are often limited to first-order models [41,47,7,8]. By contrast, our approach described in the next two sections has all three of the following advantages: • it is not limited to quadratic data-fidelity; • it has good edge-continuation capabilities (as our experiments show); • it offers strong convergence guarantees.
As mentioned in the introduction, the latter advantage is balanced by assumptions that rule out PFs that are either non-smooth at zero or bounded (or both), but the resulting practical limitations are minor. In addition, our deterministic relaxation algorithm can incorporate inequality constraints as well as continuation sequences at no cost. It is also easy to implement and quite generic (it can be used to minimize the cost functions considered in [41,7,8,13,27,34,32,28]).
3.1. General description. Since we are interested in the recovery of piecewisesmooth images, we can assume that the level curves of the original continuous image (from which x is sampled) are piecewise-smooth. In the domain of a multiscale directional transform (such as a wavelet or a curvelet transform), this prior information is synonymous with clean and strongly oriented detail images. Consequently, edge continuation can be incorporated into the reconstruction process by means of an additional penalty term of the form where x denotes the multiresolution decomposition of x, the D m 's are discrete directional derivative operators acting on the detail coefficients, and the functions ψ m are increasing in R + . Let T ∈ L K, K (R) be the considered multiresolution transform ( K K with strict inequality if T is redundant). Our estimate of x is defined as any global minimum of the augmented cost function V : R K → R obtained by combining the contour-line smoothing term Ψ with the standard cost function U : where Φ d and Ψ are defined by (3) and (4), and where the positive parameters λ and λ control the degree of smoothing in the spatial domain and in the transformed domain. The interests of this approach are that (i) it avoids the delicate use of explicit edge-processes, (ii) it can deal with edges at different resolution levels, (iii) it involves only one extra hyper-parameter (namely, λ), and (iv) the associated increase in optimization complexity is small if the ψ m 's are convex.

Remark 1.
If the D m 's are replaced by canonical projections, and if ψ m (t) ∝ t p for all m and for some fixed p ∈ (0, 2), then Ψ is an p -penalty that favors sparse representations in the transformed domain. Therefore, our approach generalizes the sparsity-constraint approaches in [4,15], which are restricted to cost functions of the form where the λ m 's are non-negative reals. (When T is a wavelet transform, p -penalties are obtained under the assumption that the detail coefficients follow a generalized Gaussian distribution with shape parameter p [4], or when the continuous original image is assumed to belong to a Besov space [15].) Remark 2. The issue of choosing appropriate values for λ and λ is beyond the scope of this paper. Good starting points are the L-hypersurface method [5], the Monte Carlo SURE approach [44], the use of the no-reference measure proposed in [54], and the zero-crossing method developed in [30]. (The L-hypersurface method requires the PFs to be convex, the approaches in [44] and [54] are restricted to image denoising, and the method in [30] must be calibrated as a function of the regularization model.) Another avenue is to determine the parameters of the cost function automatically by updating their values adaptively during the course of the optimization process, as in [34], but this considerably complicates the convergence analysis.
In the following section, we refine the description of the contour-line smoothing term Ψ in the case of a non-redundant wavelet transform. We focus on this situation for computational efficiency reasons and because the construction is easier when directional selectivity is limited. However, the approach can be applied to the case of a redundant transform with high directional selectivity (for instance, the curvelet transform [36]) at the expense of additional design efforts and increased computational burden.

3.2.
Wavelet-domain edge-continuation. Let J ∈ N * . The J-level decomposition of an image x ∈ R K in a wavelet basis is a set of subimages where the index j represents the scale, A 0 x is the approximation of x at resolution 2 −J (the resolution of x is equal to one and its scale is zero), and the detail images A h j x, A v j x, and A d j x convey the difference of information between A j x (the approximation of x at resolution 2 j−J ) and the finer approximation A j+1 x. The linear map A h j gives the horizontal high frequencies (that is, the vertical edges), A v j gives the vertical high frequencies (that is, the horizontal edges), and A d j gives the high frequencies in both horizontal and vertical directions. The wavelet transform T can be viewed as the concatenation of the component functions of the maps . Piecewise-smooth level curves in the original continuous image translate to the fact that the horizontal and vertical detail images of x are respectively vertically and horizontally oriented. In other words, the vertical derivatives in A h j x and the horizontal derivatives in A v j x have small amplitude. This suggests to define the contour-line smoothing term as follows: where the α j 's are positive reals, and where the D v j 's and the D h j 's are vertical and horizontal discrete derivative operators, respectively. The outer summation is over scales, and the inner summations are over the pixels of the detail images where the derivatives are evaluated. A natural choice for the PF ψ is the identity on R + , for we want to preserve both sharp transitions and smooth variations along the preferential directions of the detail images. In practice, we advocate using a strictly convex function such as (5), which facilitates the optimization process without compromising the quality of the reconstructions. The weights α j allow to strengthen the edge-continuation effect at some particular resolution level(s). If there is no conflict with possible application-specific constraints, we choose the α j 's so as to compensate for wavelet-coefficient decay, as discussed in Section 3.3.
From a filtering perspective, the action of Ψ is to discard the solutions that convey significant information in the frequency regions schematized in Fig. 1(a). These regions are exemplified in Figs. 1(b)-(d) for the setting used in our experiments (namely, biorthogonal spline wavelets with two vanishing moments [14], and firstorder difference derivative operators). It appears that the proposed wavelet-domain smoothing term not only favors the formation of horizontal and vertical edges, but also preserves diagonal boundaries (which is why we do not add a penalty on the

Remark 3.
Although the parallel is tempting, Ψ is not comparable to a secondor higher-order smoothing term in the spatial domain. To fix ideas, consider the standard second-order model introduced in [22]: ψ |R x l x| , where x stands for any of the letters h, v, and d, and where the second-derivative operators R h , R v , and R d are the convolutions with the masks

Fig. 2 displays the frequency responses of
shows that Ψ 2 penalizes the high-frequency components regardless of their orientation in the spatial domain. Indeed, R h and R v filter out the horizontal and the vertical high frequencies, including the diagonal subbands which are further attenuated by R d . Therefore, the combined effect of Ψ h , Ψ v , and Ψ d (or of Ψ h and Ψ v only) is that of a low-pass filter with maximum attenuation in the diagonal subbands, which reveals that second-order models are not even appropriate for edge-preservation. The same conclusion holds for third-and higherorder models, as the strength of spatial smoothing increases with the order of the model.

3.3.
Correcting wavelet-coefficient decay. We propose to set the coefficients α j in (16) so as to compensate for the decay of the magnitudes of the detail coefficients across scales. An elegant way to do so is to lie at the intersection of two standard image models: the independent generalized Gaussian (IGG) model [37,40,4], and the Besov space (BS) model [18,15]. The IGG model is a statistical model in which it is assumed that (i) the wavelet coefficients are independent, and (ii) the detail coefficients at scale j follow a zeromean, generalized Gaussian distribution π j with shape parameter p j > 0 and variance σ 2 j : . From the standpoint of this model, wavelet decay correction is achieved by multiplying the D v j 's and the D h j 's by σ 0 /σ j . To obtain a guideline for setting the ratios σ 0 /σ j , we take inspiration from [15] by assuming that the original discrete image x is sampled from a continuous image f : D → R (D is a rectangular domain of R 2 ) that belongs to a Besov space B s p (L p (D)) with p 1 and s 2/p − 1. The precise definitions of B s p (L p (D)) and of its standard norm · B s p (Lp(D)) are given in [18]. All we need to know here is that the BS model is a deterministic characterization of image smoothness: f ∈ B s p (L p (D)) essentially means that f has "s derivatives" in L p (D).
The relationship between the IGG and BS models arises when (20) ∀j 0, (In other words, we further assume that the IGG shape parameter is constant and that the wavelet coefficients decay exponentially across scales.) In this case, the standard Besov norm on B s p (L p (D)) is equivalent to the negative logarithm of the normalized IGG likelihood raised to the power of 1/p [12]; that is, letting w x j,k (f ) : k ∈ Z be the set of detail coefficients of f : D → R at scale j and at orientation x = h, v, or d, we have Hence, if (20) is satisfied, the more likely an image according to the IGG model, the smaller its Besov norm (and vice versa). This connection between the IGG and BS models suggests to set α j = σ 0 /σ j = 2 j(s+1−2/p) for some values of s and p that reflect some knowledge about the original continuous image f in addition to (20). In our experiments (see Section 5), we set p = 1 and s = 2 (that is, α j = 2 j ), which amounts to assume that f has second-order smoothness in L 1 (D) and that it conforms to the independent Laplacian model with standard deviation 2 −j σ 0 at scale j.
4. Deterministic relaxation. In this section, we focus on the minimization of the augmented cost function V given in (14) (the proposed wavelet-domain smoothing term defined in (16) is an instance of (13)). For convenience, we write V in the more compact form (22) V We assume that the data penalty functions φ 1 , . . . , φ K and the PFs θ 1 , . . . , θ L satisfy the following conditions: where S is the set of functions ϕ : R + → R + such that (a) ϕ(0) = 0, ϕ is increasing, and ϕ is twice differentiable; (b) lim t→+∞ ϕ(t) = +∞; (c) ϕ is four times differentiable at zero and ϕ (0) = 0; (d) the function is strictly decreasing, and lim t→+∞ ϕ † (t) = 0. Conditions of type (a) are standard in regularized reconstruction; conditions (b) ensure that V is coercive (that is, lim x →∞ V (x) = +∞) and hence has a global minimum; and conditions (c) are technical requirements for convergence proofs. Conditions of type (d) have different implications depending on whether they concern the data penalty functions or the PFs: their application to the φ k 's allows to handle the case of a non-Gaussian noise, and their application to the θ l 's guarantees edge-preservation properties.
. The choice C1.2 allows to consider the important special case when the data-fidelity term is an 2 -norm weighted by a symmetric positive-definite matrix (call it S), that is, when Φ d is of the form This situation arises in the Bayesian setting when there is no impulse noise and when the noise vector η has a multivariate normal distribution with covariance matrix S −1 .
Our deterministic relaxation algorithm for minimizing V is described in Section 4.1. Its convergence properties are presented in Section 4.2, and we discuss two possible refinements in Section 4.3 (namely, the enforcement of inequality constraints and the introduction of continuation sequences).
For any x ∈ R K , we define the diagonal matrices and where I r is the identity matrix of order r and ρ l is the number of rows of Q l . The first-order necessary condition for x ∈ R K to be an extremum of V (that is, Assuming this condition holds, (29) suggests the following iterative relaxation algorithm: This fixed-point iteration scheme can be equivalently written as Condition C3 guarantees that the quadratic function x ∈ R K → V 0 (x, δ, ε) is positive-definite for any (δ, ε) ∈ (0, +∞) K × (0, +∞) L , and thus that the halfquadratic algorithm A V can be efficiently implemented using a preconditioned conjugate gradient method. Condition C3 trivially holds when rank(H) = K, as in image denoising problems. It is also usually satisfied for reconstruction problems where rank(H) < K, such as image deblurring and limited-angle tomography. Indeed, piecewise-smooth reconstruction involves gradients or directional derivatives in the spatial domain, which implies that ker(Q) = span{1 K } (1 K denotes the vector in R K whose entries are all equal to one), and we generally have H1 K = 0.

4.2.
Convergence properties. The convergence properties of A V (30) are summarized in Theorem 4.1 below, whose proof is given in Appendix A. We denote by C the set of critical points of V , that is, and we use the following terminology: a continuum of R K is a connected compact ( · is any norm on R K ); and C is said to be discrete if all its points are isolated.
Theorem 4.1. Assume that conditions C1-C3 are satisfied, and let (x (n) ) n be any sequence generated by A V .
(i) (x (n) ) n has convergent subsequences, and all its accumulation points are in C. (ii) There exists x * ∈ C such that lim n→∞ V (x (n) ) = V (x * ), and, for any n ∈ N, Either (x (n) ) n converges, or its accumulation points form a continuum.
If V is strictly convex, then (x (n) ) n converges to the global minimum of V .
Theorem 4.1 characterizes the behavior of A V in three mutually exclusive situations which cover all possibilities: (i) when V is strictly convex, (ii) when V is not strictly convex and C is discrete, and (iii) when C is not discrete. The corresponding conclusions are the following: (i) The cost function V is strictly convex if C3 holds and if the data penalty functions φ 1 , . . . , φ K and the PFs θ 1 , . . . , θ L are strictly convex. Therefore, from (vii), convergence to the global minimum always occurs when the φ k 's and the θ l 's are strictly convex. (ii) If V is not strictly convex and C is discrete, then we have from (vi) that (x (n) ) n converges to a critical point x * ∈ C. We cannot rigorously exclude the possibility that x * is a maximum or a saddle point; however, this is very unlikely because, while any isolated critical point that is a minimum is an attractor (by (iv)), maxima and saddle points are unstable. Indeed, from (ii), small random perturbations away from an isolated critical point that is not a minimum will eventually move the iterates away from it. Therefore, from an algorithmic standpoint, convergence to a maximum or a saddle point can be avoided by adding a small random vector to the current solution whenever is below a certain threshold. In practice, the roundoff errors in floating-point arithmetic do the job, and hence any numerical implementation will lead to a minimum. (iii) If C is not discrete, then, from (i) and (iii), either (x (n) ) n converges, or C contains a non-empty continuum. The latter case may be considered as a failure in the design of the regularization term(s), but the algorithm also behaves well in this situation: by (ii) and (v), (V (x (n) )) n converges, and (x (n) ) n gets arbitrarily close to C.

Refinements.
4.3.1. Inequality constraints. The proposed optimization scheme can be easily extended to enforce inequality constraints on the pixel values, that is, constraints of the form Ax b with A ∈ L K,M (R) and b ∈ R M . Indeed, it suffices to consider the cost function where µ is a positive real constant and y + denotes the positive part of y (that is, y + = (y + 1 , . . . , y + M ) with y + m = max{y m , 0}). Assume that conditions C1-C3 hold, and let W 0 : where V 0 is given in (32). The quadratic function x ∈ R K → V 0 (x, δ, ε) is strictly convex and coercive for any (δ, ε) ∈ (0, +∞) K ×(0, +∞) L , and the penalty function 2 is continuously differentiable and convex. Therefore, for any positive δ and ε, the piecewise-quadratic function W 0 (·, δ, ε) : x ∈ R K → W 0 (x, δ, ε) is continuously differentiable, strictly convex, and coercive, and thus it has a unique global minimum which can be easily found (using, for instance, a conjugate gradient method). This leads to the following extension of A V : where e φ and e θ are given in (26). As is made clear in Appendix A, all the convergence results discussed in Section 4.2 generalize to W along with A W .

4.3.2.
Continuation. The algorithm A V is very efficient when the data penalty functions φ 1 , . . . , φ K and the PFs θ 1 , . . . , θ L are strictly convex and when their derivatives do not increase too rapidly at the origin. On the other hand, A V may be trapped in a poor local minimum when the φ k 's or the θ l 's are not convex, and its convergence is slow when the φ k 's or the θ l 's closely approximate a function that is not smooth at zero. In these latter two situations, the convergence can be improved by replacing some or all of the φ k 's and of the θ l 's with the elements of some carefully designed sequences (φ at the beginning of the optimization process. The resulting modified algorithm is the same as A V , except for the first N iterations where E φ and E θ are respectively replaced by E φ (n) and E θ (n) . This continuation technique can be thought of as providing a good initialization for A V , and the convergence results in Theorem 4.1 still hold.
C4. ϕ (n) should be as close as possible as ϕ for all n.
C6. If ϕ is not convex, then the sequence κ(ϕ (n) ) n should be gradually increasing to κ(ϕ), so that the non-convexity (and hence the difficulty) of the optimization problem increases progressively with the number of iterations. For instance, if the target is the convex PF ψ δ id (see (5)), a natural choice is ϕ (n) = ψ δn id with δ n decreasing with n. This sequence speeds up the solution search by gradually reducing the range over which ϕ (n) acts as a quadratic function. If the target is the non-convex PF ψ δ HL (see (7)), we suggest to use ϕ (n) = (1 − a n )ψ δ id + a n ψ δ HL with a n increasing with n from 0 to 1. In this way, the maximum concavity increases progressively from 0 to κ(ψ δ HL ) = 1/(4δ 2 ), and the algorithm can reach deeper minima. These two example sequences are used in our experiments (see Remark 5).

Experiments.
We consider the reconstruction of the 128 × 128 images shown in Figs. 3(a) and 4(a) from the data displayed in Figs. 3(b) and 4(b), respectively. The data associated with the "office" image were generated by first blurring with a 7 × 7 uniform mask and then adding white Gaussian noise at 20 dB SNR. (The standard deviation σ of the noise is defined via the decibel level of the signal-tonoise ratio: SNR dB = 20 log 10 (σ /σ), where σ is the standard deviation of the exact data H(x ).) The data associated with the "peppers" image were obtained by first blurring with a 7 × 7 rotationally symmetric Gaussian mask with a standard deviation of 2 pixels, then adding white Gaussian noise at 20 dB SNR, and finally adding random-valued impulse noise with a 30% corruption rate. More precisely, letting d = Hx + η be the blurred image corrupted by Gaussian noise, the data d = χ(d) is defined by (39) d k = d k with probability 0.7, τ k with probability 0.3, where τ k is a sample from the uniform density on [min k d k , max k d k ]. Our estimates of the original images are obtained by minimizing the cost function V given in (14), where the wavelet-domain smoothing term Ψ is defined by (16) • φ(t) = t 2 , which is appropriate when the noise is additive and consists of i.i.d. normal random variables (the data-fidelity term Φ d (·) is the 2 -norm of the residual H(·) − d). • φ = ψ 0.1 id , as defined in (5), which is appropriate for dealing with impulse noise (Φ d (·) approximates the 1 -norm of H(·) − d).
• ψ = ψ 0.1 id , referred to as the convex case (Ψ(x) is a smooth, convex approximation to the discrete TV of x). • ψ = ψ 10 HL , as defined in (7), referred to as the non-convex case (Ψ acts as a quadratic regularizer for gradient magnitudes up to 3-4 and as an edge detector for gradient magnitudes greater than 10).
For the wavelet-domain smoothing term, we use biorthogonal spline wavelets with two vanishing moments [14] and two resolution levels (we set α 0 = 1 and α 1 = 2 in accordance with the discussion in Section 3.3). We chose these particular wavelets because they combine symmetry, regularity, and compact support [50]. The D v j 's compute differences between vertically adjacent coefficients in the horizontal detail images, and the D h j 's compute differences between horizontally adjacent coefficients in the vertical detail images. In other words, D v j convolves A h j (x) with the mask  [ −1 1 ] T , and D h j convolves A v j (x) with the mask [ −1 1 ] (there are of course other and better discrete approximations of a derivative [51,21], but the associated extra computational load does not worth the effort). Finally, the PF ψ is set equal to ψ 0.1 id , so that the inner summations in Ψ approximate the vertical TVs of the horizontal detail images and the horizontal TVs of the vertical detail images.
(We also performed experiments using the Hessian spectral-norm regularizer proposed in [32], but we do not present the corresponding results because there are slightly less good than those achieved with the HFN regularizer.) The metrics used to assess reconstruction quality are the mean-square error (MSE) and the improvement in SNR (ISNR) [2]; the ISNR associated with a computed solutionx is defined by (42) ISNR dB = 20 log 10 where x| S denotes the restriction of x to the lattice S supporting d (in image deblurring, the solution space R K and the data space R K are supported by two rectangular lattices S and S such that S ⊂ S).
5.1. "Office" image. Since the blurred observation of the "office" image is corrupted by additive white Gaussian noise only, we restrict ourselves here to quadratic data-fidelity (that is, φ(t) = t 2 ).
5.1.1. Convex case. Fig. 5(a) shows the best reconstruction obtained with the PF ψ = ψ 0.1 id and without wavelet-domain smoothing. The corresponding value of the spatial-domain smoothing parameter λ is 1.0, the MSE is 208.0, and the ISNR is 5.70dB. The edges are noisy (not to say visually unpleasant), and the staircase effect is clearly visible at the bottom of the image. By comparison, Fig. 5(b) displays the best solution achieved by adding the wavelet-domain smoothing term and keeping λ = 1.0 (this solution is obtained for λ = 0.8). There are noticeable improvements: the image contours are smoother without penalizing object boundaries, and the "patchy" appearance due to the staircase effect is softened. Quantitatively, the MSE is 180.4 and the ISNR is 6.32 dB. Fig. 6 displays the ISNR as a function of the wavelet-domain smoothing parameter λ. Too large values of λ predictably produce over-smoothed solutions, but the values of λ that improve reconstruction quality cover a large interval compared to λ.  6.32 Figure 6. Convex reconstruction of the "office" image: ISNR as a function of the wavelet-domain smoothing parameter λ (λ is set to its optimal value in terms of ISNR without wavelet-domain smoothing).
The above results clearly show that first-order spatial regularization benefits from wavelet-domain smoothing; but is the converse also true? To answer this question, Fig. 7 displays the best reconstruction obtained using wavelet-domain smoothing without spatial regularization (that is, λ = 0). This estimate is worse than that obtained using only spatial regularization (the "trellis-like" texture is due to the fact that wavelet-domain smoothing favors solutions with vertically-oriented horizontal subbands and horizontally-oriented vertical subbands). We can therefore conclude that spatial-and wavelet-domain smoothing work in synergy.

5.1.2.
Non-convex case. Fig. 8 shows the estimates obtained with the PF ψ = ψ 10 HL for (λ, λ) = (10.5, 0) and (λ, λ) = (10.5, 2.3). The particular value λ = 10.5 was purposely selected to obtain a slightly under-regularized reconstruction in the absence of wavelet-domain smoothing (the optimal value is 21.0). The value λ = 2.3 is the optimal setting associated with λ = 10.5. The MSE and ISNR are 501.9 and 1.88 dB for the standard regularization approach, and 191.0 and 6.08 dB using wavelet-domain smoothing. The improvement is even more striking than in the convex case: the edge artifacts are almost completely removed, and the contour plots show substantial reduction of the staircase effect. As seen in Fig. 9, the evolution of the ISNR as a function of λ is reasonably smooth, which indicates that our optimization algorithm behaves well in the non-convex case. Although λ is not optimal (the best reconstruction achieved without wavelet-domain smoothing has an MSE of 354.1 and an ISNR of 3.40 dB), our model outperforms optimal standard regularization over a significant range of values of λ.  Figure 9. Non-convex reconstruction of the "office" image: ISNR as a function of the wavelet-domain smoothing parameter λ (the dotted green line represents the highest ISNR achievable without wavelet-domain smoothing).

5.1.3.
Comparison with state-of-the-art methods. Fig. 10 shows the estimates achieved by IFAS reconstruction, HFN regularization, and ITV2 regularization. (The HFN and ITV2 regularization estimates are the best obtained by adjusting the regularization strength.) ITV2 regularization produces the best solution among these three methods, but it fails to outperform first-order TV regularization: the ITV2 regularization estimate shown in Fig. 10(c) has an MSE of 211.2 and an ISNR of 5.64dB, whereas the first-order TV regularization estimate shown in Fig. 5(a) has an MSE of 191.0 and an ISNR of 5.70dB. This observation is consistent with the fact that the performances of second-order regularization techniques decline when the original image is too degraded. We also observe that HFN and ITV2 regularization produce estimates with a slightly granular appearance and tend to smooth edges (this unwanted smoothing effect is clearly visible from the enlargements of the cropped areas in Figs To measure the global deviation of the pixel values inx from the range of x , we use a metric we call the deviation from the original range (DOR), which is defined by .
We consistently observe that the introduction of the wavelet-domain smoothing term does not introduce DOR if there is none and that it reduces the DOR otherwise. For instance, for the reconstruction of the "office" image, wavelet-domain smoothing reduces the DOR from 3.38 · 10 −4 to 3.08 · 10 −4 in the convex case, and from 1.79 · 10 −3 to 4.82 · 10 −4 in the non-convex case. The DOR values of the estimates obtained using wavelet-domain smoothing are also smaller than those of the solutions achieved by IFAS reconstruction, HFN regularization, and ITV2 regularization (1.16 · 10 −3 , 1.10 · 10 −3 , and 8.58 · 10 −4 , respectively).

5.1.5.
Comparison of intensity profiles. Fig. 11 displays line intensity profiles of the estimates shown in Fig. 5 (convex case), Fig. 8 (non-convex case), and Figs. 10(b) and 10(c) (HFN and ITV2 regularization). We can see from Figs. 11(b) and 11(c) that the wavelet-domain smoothing term does not smooth the edges that are preserved by first-order spatial regularization. Furthermore, the addition of waveletdomain smoothing allows to recover more details (see arrow 1) and eliminates offsets 5.2. "Peppers" image. The "peppers" image test-problem has the following objectives: (i) to examine the effect of the wavelet-domain smoothing term when the original image has reduced high-frequency content in the horizontal and vertical directions, (ii) to show that our algorithm behaves well when the data-fidelity term is not quadratic, and (iii) to capture the improvement brought by non-quadratic  Figure 11. Intensity profiles along line 40 (displayed in cyan in Fig. 3(a)): (a) original; (b) convex reconstruction with (red) and without (blue) wavelet-domain smoothing; (c) non-convex reconstruction with (red) and without (blue) wavelet-domain smoothing; (d) HFN (blue) and ITV2 (red) regularization. data-fidelity in the presence of impulse noise. Note that, contrary to the approaches proposed in [7,8,34], we do not remove the outlier pixels from the data before reconstruction.

5.2.1.
Quadratic versus non-quadratic data-fidelity. Figs. 12(a) and 13(a) show the best reconstructions obtained using the square data-penalty function φ(t) = t 2 and without wavelet-domain smoothing. They are qualitatively similar: the MSE and the ISNR are 559.3 and 6.16 dB in the convex case, and 568.4 and 6.09 dB in the non-convex case. Both estimates are over-smoothed, but decreasing the value of λ leads to worse solutions that are substantially biased by the impulse noise. By contrast, Figs. 12(b) and 13(b) show the best reconstructions obtained using the non-quadratic data-penalty function φ = ψ 0.1 id . These solutions have much better contrast and sharpness than those achieved with quadratic data-fidelity. Quantitatively, changing quadratic data-fidelity for non-quadratic data-fidelity divides the MSE by about 4.5 in the convex-case and about 2.8 in the non-convex-case; the corresponding increases of the ISNR are 6.58 dB and 4.51 dB, respectively.

Convex versus non-convex regularization.
Looking at the contour plots in Figs. 12(b) and 13(b), we observe that the edges obtained with the non-convex PF ψ = ψ 10 HL are sharper than those obtained with the convex PF ψ = ψ 0.1 id . It should be stressed, however, that sharp-edge recovery makes complete sense when a significant proportion of the discontinuities in the original image are only one pixel wide -it can be a disadvantage otherwise. Indeed, non-convex regularization tends to create sharp discontinuities wherever the gradient exceeds a certain threshold, thus turning edges that are more than one pixel wide into thin edges that do not conform to reality. This unwanted effect is clearly observed by comparing the original contours in Fig. 4(a) and the contours in Fig. 13 reduction of the staircase effect, there is not much difference with the solution obtained without wavelet-domain smoothing. This result is actually satisfactory: the wavelet-domain smoothing term has negligible effect if the reconstruction obtained without using it does not have noisy contour lines and does not present a staircase effect. In the non-convex case, the improvement accompanying the introduction of the wavelet-domain smoothing term is significant: the MSE decreases from 201.2 to 129.3, and the ISNR increases from 10.60 dB to 12.52 dB. As observed for the "office" image, the staircase effect is largely reduced and the noise on the contour lines is eliminated. In addition, we notice that the wavelet-domain smoothing term allows to better recover the true thickness of the edges. This can be seen from the contour plots in Figs. 4(a), 13(b), and 13(c): the vertical edge obtained without wavelet-domain smoothing is too sharp compared to the original one, whereas that obtained using wavelet-domain smoothing is more conform to reality.

5.2.4.
Comparison with state-of-the-art methods. Contrary to IFAS reconstruction, the algorithms proposed in [32,28] for HFN and ITV2 regularization are limited to quadratic data-fidelity. Nevertheless, as mentioned in Section 2.1, the HFN and ITV2 regularizers are special cases of (4), and thus we can use our algorithm A V to assess their behavior when added to a non-quadratic data-fidelity term. Fig. 14 displays the estimates achieved by IFAS reconstruction and by the HFN and ITV2 regularizers combined with non-quadratic data-fidelity. Similarly to what observed for the "office" image, IFAS reconstruction gives the worse estimate, and HFN and ITV2 regularization are outperformed by first-order TV regularization. Again, our approach outperforms all three alternatives in terms of MSE and ISNR. We also clearly see from the enlargements and the contour plots that HFN and ITV2 regularization produce over-smoothed solutions compared to reconstruction with wavelet-domain smoothing. 6. Conclusion. We introduced the idea of implicitly interacting discontinuities by means of an additional penalty term operating on the detail images in the wavelet domain. When combined with first-order edge-preserving regularization, the proposed wavelet-domain regularizer preserves boundary sharpness while smoothing contour lines. Aside from producing visually more pleasing reconstructions, this behavior is desirable for subsequent feature-extraction and segmentation tasks. We also provided an efficient half-quadratic optimization algorithm with strong convergence guarantees even when the faithfulness to the data is non-quadratic and the cost function is not convex. Our experiments show that our approach outperforms first-order TV regularization as well as state-of-the-art second-order regularization techniques. The only drawback is the introduction of the additional hyper-parameter λ which controls the degree of smoothing in the wavelet domain. However, the range of values of λ that lead to an improvement in reconstruction quality is fairly large, and preliminary results suggest that λ can be selected by tracking the value of the spatial-domain regularizer Ψ when this parameter varies.
Appendix A. Proof of Theorem 4.1.
Let g = (g 1 , . . . , g r ) : R K → R r be the vector-valued function with components and let J : R K → R K be given by g(y)).
(The function J is well defined. Indeed, for any γ ∈ (0, 1] r , we have where 1 r = (1, . . . , 1) ∈ R r , and where the right inequality follows from A2. Therefore, Z 0 (·, γ) is coercive by A3, and since it is strictly convex, it has a unique global minimum.) We will show in Appendix A.3 that the cost functions of type (45) that satisfy assumptions A1-A4 are minimized by the algorithm But first, we show in Appendix A.2 that the cost functions and the optimization algorithms studied in Section 4 are special cases of this general framework -Theorem 4.1 then readily follows from the convergence results stated in Theorem A.9 in Appendix A.3.
A.2. Special cases of interest.
We consider the function Z of the form (45) with r given above and with and ∀i ∈ I φ , 2 i−L . Clearly, Z = W , and assumption A1 is satisfied. It remains to check the validity of A2-A4 and the equivalence of A Z and A W .
We immediately have that f i (0) = 0, f i is strictly decreasing on (0, +∞), lim t→0 + f i (t) = 1, lim t→+∞ f i (t) = 0, and f i is twice differentiable on (0, +∞). Without going into details, the second-order Taylor expansion of ϕ at zero gives that f i (0) = 1 (hence f i is strictly decreasing on R + , and thus f i is strictly concave), and the third-order Taylor expansion of ϕ at zero gives that f i is twice differentiable at zero.
Assumption A3. Suppose that V is not coercive, that is, there exists a sequence (x (n) ) n in R K such that lim n→∞ x (n) = ∞ and (V (x (n) )) n is bounded. Since the φ k 's and the θ l 's are non-negative, and since lim t→+∞ φ k (t) = lim t→+∞ θ l (t) = +∞ for all k and for all l, (V (x (n) )) n is bounded if and only if the sequences (sup k |(Hx (n) − d) k |) n and (sup l Q l x (n) 2 ) n are bounded, which is equivalent to say that ( Hx (n) ) n and ( Qx (n) ) n are bounded. Let q be the quadratic form on R K defined by q(x) = Hx 2 + Qx 2 . By C3, q is positive-definite (or, equivalently, coercive), and thus lim n→∞ q(x (n) ) = +∞. This contradicts the fact that ( Hx (n) ) n and ( Qx (n) ) n are bounded. Therefore, V is coercive, and thus so is W .
A.3. Convergence analysis of A Z . In this section, we consider the general framework described in Appendix A.1, and we assume that assumptions A1-A4 hold. We let C(Z) be the set of critical points of Z and F(J ) be the set of fixed points of J , that is, and We denote by A(x (n) ) the set of accumulation points of a sequence (x (n) ) n .
We first show in Lemma A.2 that finding a critical point of Z is equivalent to finding a fixed point of J . We will then use specific instances of Theorems 3.1 and 3.5 in [39] to establish convergence to a fixed point of J -the results that we need are put together into Theorem A.4 after some definitions. The purpose of the subsequent lemmas (A.5 to A.8) is to show that Theorem A.4 applies when its variables ζ and K are restrictions of Z and J to compact sets of the form {x ∈ R K | Z(x) a}, a ∈ (inf x∈R K Z(x), +∞). Our main results about the convergence of A Z are stated in Theorem A.9.
Proof. Let x ∈ R K . We have where ∇Z 0 (·, γ)(x) denotes the gradient of Z 0 (·, γ) : y ∈ R K → Z 0 (y, γ) evaluated at x. Therefore, (The second equivalence follows from the fact that Z 0 (·, γ) is strictly convex and coercive for any γ ∈ (0, 1] r .) Let Ω be a closed subset of R K , let K be a function from Ω to Ω, and let F(K) be the set of fixed points of K.
(ii) K is said to be uniformly compact if there exists a compact subset C of R K such that K(Ω) ⊆ C. (iii) K is said to be strictly monotonic with respect to a function ζ : Ω → R if (57) ∀x ∈ Ω \ F(K), (ζ • K)(x) < ζ(x).
(iv) K is said to be upper semi-continuous if, for any convergent sequences (x (n) ) n and (y (n) ) n in Ω, Theorem A.4. Let Ω be a closed subset of R K , let ζ : Ω → R be a continuous function, and let K : Ω → Ω be a function that is uniformly compact, strictly monotonic with respect to ζ, and upper semi-continuous. Let (x (n) ) n be any sequence generated by K.
There exists x * ∈ F(K) such that lim n→∞ ζ(x (n) ) = ζ(x * ). (iii) Either (x (n) ) n converges, or A(x (n) ) is a continuum. Proof. Let f : R + → R + be a function satisfying the conditions of assumption A2 (we drop the subscript i to simplify notation). Then, f has the following properties: f is negative on (0, +∞), f is a bijection from R + to (0, 1], and (f ) −1 is positive on (0, 1). Let t ∈ R + . We have from (61) that For any s ∈ R + , there is a unique u s ∈ (0, 1] such that u s = f (s) (or, equivalently, s = (f ) −1 (u s )), and thus we can write where h : (0, 1] → R is given by Therefore, Hence, h is strictly decreasing and strictly convex on (0, 1). Since, in addition, h is continuous and h(1) = 0, it follows that h is non-negative, strictly decreasing, and strictly convex. tu + h(u) .
Theorem A.4 hold when Ω = Ω a , K = J | Ωa , and ζ = Z| Ωa . Therefore, there exists an open neighborhood N * of x * such that, for any sequence (x (n) ) n generated by A Z , Hence, it suffices to take N = N * ∩ Ω • a , where Ω • a denotes the interior of Ω a . (v) Let d C(Z) : R K → R + be defined by d C(Z) (x) = inf y∈C(Z) x − y , and suppose that d C(Z) (x (n) ) does not go to zero as n → ∞; that is, there exists a > 0 such that ∀m ∈ N, ∃n m, d C(Z) (x (n) ) > a.
Then, there exists a subsequence (y (l) ) l of (x (n) ) n 1 such that d C(Z) (y (l) ) > a for all l, and since (x (n) ) n 1 is in the compact set Ω 1 , (y (l) ) l has a convergent subsequence (z (k) ) k . The limit point z * of (z (k) ) k is an accumulation point of (x (n) ) n , but z * ∈ C(Z) (as d C(Z) (z (k) ) > a for all k), which contradicts (i).
(vi) Assume that C(Z) is discrete. Then A(x (n) ) is discrete by (i), and we have from (iii) that (x (n) ) n converges. Therefore, A(x (n) ) reduces to the singleton {lim n→∞ x (n) }, and hence, using (i) again, lim n→∞ x (n) ∈ C(Z).
(vii) Assume that Z is strictly convex. Then, since Z is coercive, Z has a unique global minimum x * and C(Z) = {x * }. It follows from (vi) that lim n→∞ x (n) = x * .