Iteratively Solving Linear Inverse Problems under General Convex Constraints

We consider linear inverse problems where the solution is assumed to fulfill some general homogeneous convex constraint. We develop an algorithm that amounts to a projected Landweber iteration and that provides an iterative approach to the solution of this inverse problem. Under relatively moderate assumptions on the constraint we can always prove weak convergence of the iterative scheme. In certain cases, i.e. for special families of convex constraints, weak convergence implies norm convergence. The presented approach covers a wide range of problems, e.g. Besov- or BV-restoration, for which we also present numerical experiments in the context of image processing.


1 Scope of the Problem
In a wide range of practical applications one has to face the problem that the features of interest cannot be observed directly, but have to be inferred from other quantities. Very often a linear model describing the relationship between the features and the measured quantities works quite nicely, i.e. in such situations the task is to solve T f = h.
Typically, the observed data are not exactly equal to the image h = T f , but rather a distortion of h, i.e. g = h + e = T f + e .
To find an estimate for f from an observation g, one might minimize the discrepancy ‖g − T f‖². This kind of problem lies in the class of ill-posed inverse problems if the generalized inverse T⁺ : g ↦ f⁺ is unbounded. In such cases, the generalized inverse must be replaced by bounded approximants, so that numerically stable solutions can be defined.
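The effect of an unbounded (or badly conditioned) generalized inverse is easy to see numerically. The following sketch is entirely our own toy construction (a diagonal operator with rapidly decaying singular values, standing in for a compact T): a small data error e is amplified by roughly the reciprocal of the smallest singular value when the exact inverse is applied.

```python
import numpy as np

n = 8
s = 2.0 ** -np.arange(n)            # singular values 1, 1/2, ..., 1/128
T = np.diag(s)                      # toy ill-conditioned operator
f_true = np.ones(n)
e = 1e-3 * np.ones(n)               # small measurement error
g = T @ f_true + e                  # observed data g = T f + e

f_naive = np.linalg.solve(T, g)     # generalized inverse applied to noisy g
amplification = np.linalg.norm(f_naive - f_true) / np.linalg.norm(e)
# the solution error exceeds the data error by a large factor (about 52 here)
print(round(amplification, 1))
```

As the decay of the singular values gets faster (the infinite-dimensional limit of unbounded T⁺), this amplification factor grows without bound, which is exactly why regularization is needed.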
A typical procedure to avoid these instabilities, i.e. to regularize the inverse problem, is to modify the functional to be minimized, so that it not only incorporates the discrepancy, but also some a priori knowledge one may have about the solution. More precisely, we then consider a functional of the form F(f) = ‖g − T f‖²_H + 2α J(f), (1) where the requirement that J(f) < ∞, or even J(f) < 1, is the mathematical translation of the a priori knowledge (sometimes we shall write ‖·‖ for ‖·‖_H). In this paper, we shall consider two different choices of J(f), both adapted to the case where the inverse problem consists in deblurring and denoising a 2-dim. image, as in [9], which was, in turn, inspired by [7] and [17]. Both approaches are natural sequels to [9]. In the first approach, we consider J(f) of the same type as in [9], but we put it in a more general framework, where J(f) can be any positive, convex, one-homogeneous functional. An extensive discussion of such functionals, in much greater generality than what we present here, is given in [6]. In order to be self-contained, and to avoid introducing the full complexity of [6], we present here a sketch of a simpler version that suffices for our case. In the second approach, J(f) is the same as in [20] and [16], but the numerical solution in [16] of a 4th-order nonlinear PDE is replaced by an iterative approach similar to [7] and [9].
The paper is organized as follows: in Section 2, we present the framework for our first approach (generalizing [9]), which is then discussed in Section 3 in several stages: a "warm-up" for the case where T*T has a bounded inverse (in this case, the problem is well-posed), then the ill-posed problem (where 0 is in the spectrum of T*T), with a convergence analysis. In Section 4, we explain our second approach, an alternative way of minimizing the functionals in [20] and [16]. Finally, in Section 5, we give numerical results for both approaches.
2 Penalization by a positive, convex, one-homogeneous functional

2.1 Preliminaries
In this Section and the next, we assume that the functional to minimize takes the form (1), where J is a positive, convex and one-homogeneous functional. In this case, the variational problem can be recast as follows. Consider J*, the Fenchel transform or so-called dual functional of J, see [19]. Since J is positive and one-homogeneous, there exists a convex set C such that J* equals the indicator function χ_C of C. In Hilbert space, we have total duality between convex sets and positive, one-homogeneous functionals, i.e. J = (χ_C)*, or J(f) = sup_{h∈C} ⟨f, h⟩; see, e.g., [1,4,6]. (Note: [6] gives a much more general and complete discussion; we restrict ourselves here to a simple situation, and only sketch the arguments. For a complete, detailed discussion, we refer the reader to [6].) We thus end up with the following reformulation of our problem: given some closed convex set C ⊂ H (on which we may still impose extra conditions, below), we wish to minimize F_C(f) = ‖g − T f‖²_H + 2α sup_{h∈C} ⟨f, h⟩, (2) where we assume T to be a bounded operator from H to itself, with ‖T‖ < 1. We shall consider two particular cases in more detail.
Example 1. As in [7], a particular orthonormal basis {φ_λ}_{λ∈Λ} in H is preselected, and the prior is defined as J(f) = Σ_{λ∈Λ} |⟨f, φ_λ⟩|. This can, of course, be viewed as a special case of (2), since in this case C = {h = Σ_{λ∈Λ} c_λ φ_λ : |c_λ| ≤ 1 ∀λ}. Similarly, the case with the weighted prior J(f) = Σ_{λ∈Λ} w_λ |⟨f, φ_λ⟩| (with positive weights w_λ) also fits into the framework of (2), C now defined by C = {h = Σ_{λ∈Λ} c_λ φ_λ : |c_λ| ≤ w_λ ∀λ}. When T ≠ Id and the problem is ill-posed, the resulting minimization scheme amounts to a Landweber iteration with thresholding applied in each step.
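In a finite-dimensional toy setting (our own illustration, with the standard basis playing the role of {φ_λ}), the duality between the ℓ¹ prior and the box C can be checked directly: the supremum of ⟨f, h⟩ over the box is attained at h_λ = sign(f_λ) and equals the ℓ¹ norm of the coefficients, and the projection P_C is coefficientwise clipping.

```python
import numpy as np

f = np.array([1.5, -0.2, 0.0, -3.0, 0.7])

# sup over the box C = {h : |h_lambda| <= 1} of <f, h> is attained at
# h = sign(f) and recovers J(f) = sum_lambda |f_lambda|
sup_over_C = float(f @ np.sign(f))
print(sup_over_C)                    # 5.4 = 1.5 + 0.2 + 0.0 + 3.0 + 0.7

# the projection onto C clips each coefficient to [-1, 1]
P_C = lambda u: np.clip(u, -1.0, 1.0)
print(P_C(f))                        # coefficients clipped: [1., -0.2, 0., -1., 0.7]
```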
Example 2. In the BV regularization framework, [21], [20], one considers functionals of the form F(f) = ‖g − T f‖²_{L²(Ω)} + 2α |f|_{BV(Ω)} (3) and minimizes over all possible f ∈ L²(Ω). Expressing this functional by means of a convex set C, one discovers that C is the L²-closure of {div ξ : ξ ∈ C¹_c(Ω; R²), |ξ(x)| ≤ 1 ∀x ∈ Ω}, i.e. we may again write |f|_{BV} = sup_{h∈C} ⟨f, h⟩; for details on the structure of C we refer the reader to [14]. It turns out that the results on iterative strategies developed for Example 1 carry over to the BV case, and that much of the analysis elaborated in [7] can be generalized to the minimization of (3).

2.2 Reformulation of the problem
We shall assume that C is a closed convex set in H, that C is symmetric, i.e. h ∈ C ⇒ −h ∈ C, and that there exist finitely many vectors a_1, …, a_N ∈ H and an r > 0 so that {h ∈ H : ‖h‖ ≤ r and ⟨h, a_i⟩ = 0, i = 1, …, N} ⊂ C. We introduce the Lagrangian-type functional L(f, h) := ‖g − T f‖² + 2α ⟨f, h⟩, so that minimizing F_C amounts to the saddle-point problem inf_f sup_{h∈C} L(f, h). (4)

Lemma 2.1 L(f, h) is continuous in both arguments; it is also convex with respect to f and concave with respect to h.
Proof. Continuity follows immediately from the continuity of T and of the inner product. For fixed h, f ↦ L(f, h) is the sum of the convex function f ↦ ‖g − T f‖² and a linear function, hence convex; for fixed f, h ↦ L(f, h) is affine, hence concave. Thereby the second assertion follows. □
This means that (provided some technical conditions are fulfilled) we can apply the minimax theorem, which allows us to interchange inf and sup in (4). In this case the minimax theorem moreover asserts that inf and sup are achieved, i.e. the inf is a min, the sup is a max.
Remark. In order to invoke the minimax theorem, we need the following lemma.

Lemma 2.2 For the minimization in (4), it suffices to consider f in some bounded set B ⊂ H.
Proof. We need to show that there is some R > 0 so that necessarily arg min_f sup_{h∈C} L(f, h) ∈ B := {f ∈ H : ‖f‖ ≤ R}. First, we have L(0, h) = ‖g‖² for all h, thus inf_f sup_{h∈C} L(f, h) ≤ ‖g‖². If we define V := span{a_1, …, a_N} (see the conditions on C for the definition of the a_i), then we have sup_{h∈C} L(f, h) ≥ ‖g − T f‖² + 2αr ‖P_{V⊥} f‖, where P_{V⊥} denotes the orthogonal projection onto V⊥. It follows that if ‖P_{V⊥} f‖ > ‖g‖² (2αr)^{−1}, then sup_{h∈C} L(f, h) > ‖g‖², so that such f need not be considered in the minimization. Consequently, ‖P_{V⊥} f‖ ≤ ‖g‖² (2αr)^{−1} for all candidate minimizers. Since V is finite dimensional, T|_V : V → T V can be represented by an N × N matrix. If this matrix has a non-trivial kernel, then we can quotient it out. The quotient operator is then bounded below. It follows that there exists an f̃ ∈ V so that T f̃ = T P_V f and ‖f̃‖ ≤ c ‖T P_V f‖ for some constant c independent of f. Under a variety of different possible conditions on C (e.g. if the set C is bounded; see the literature on convex optimization, such as [3] or [13], for this and other cases) it then follows that ‖P_V f‖ can likewise be bounded for all candidate minimizers, which completes the argument. In what follows we shall assume that we can indeed apply the minimax theorem and that we can first minimize L(f, h) over f, and then maximize over h in C.
3 Solving the inverse problem for convex penalization

3.1 The well-posed case
Although the case where T * T does not have a bounded inverse, i.e. where the inverse problem is ill-posed, is of most interest to us, we start by sketching the approach in the easier well-posed case.
Theorem 3.1 Suppose that all the assumptions made above hold true, and that T*T has a bounded inverse on its range. Define A := (T*T)^{−1/2} and, for an arbitrary closed convex set K ⊂ H, S_K := Id − P_K, where P_K is the (nonlinear) projection onto K, i.e. P_K φ = arg min_{h∈K} ‖h − φ‖. Then the minimizer of (2) is given by f = A S_{αAC}(A T* g).

Proof. According to the last section, we first compute min_f L(f, h) for fixed h ∈ C. Since the problem is well-posed, the minimizer is easily found: f = (T*T)^{−1}(T* g − α h) = A²(T* g − α h).
In this case it remains to determine max_{h∈C} L(A²(T* g − α h), h), which amounts to minimizing ‖A T* g − α A h‖² over h ∈ C. This minimum is achieved when α A h = P_{αAC}(A T* g). Finally, the corresponding function f is given by f = A (A T* g − P_{αAC}(A T* g)) = A S_{αAC}(A T* g). □

An obvious example is the case where we just need to denoise an image, without deblurring:

Example 3. Consider the denoising problem with an ℓ¹-constraint in the basis {φ_λ}_{λ∈Λ}. In this case T = Id, so that A = Id as well, and αC = {h = Σ_λ c_λ φ_λ : |c_λ| ≤ α ∀λ}. Moreover, in the real case we have P_{αC}(f) = Σ_λ sign(⟨f, φ_λ⟩) min(|⟨f, φ_λ⟩|, α) φ_λ. This implies that S_{αAC} ∘ A T* is exactly the soft thresholding operator S_α(f) = Σ_λ S_α(⟨f, φ_λ⟩) φ_λ, with S_α(x) = sign(x) max(|x| − α, 0). In the complex case, we have P_{αC}(f) = Σ_λ e^{i arg⟨f, φ_λ⟩} min(|⟨f, φ_λ⟩|, α) φ_λ, and S_{αAC} ∘ A T* reduces to the "complex soft thresholding operator", i.e. S_α(x) = e^{i arg x} max(|x| − α, 0).
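A small implementation of the thresholding operator of Example 3 (our own code, covering both the real and the complex case); it also checks, in finite dimensions, the identity S_α = Id − P_{αC} with P_{αC} realized as coefficientwise clipping.

```python
import numpy as np

def soft_threshold(x, alpha):
    # soft thresholding S_alpha for real or complex input: the sign (or phase)
    # is kept and the modulus is shrunk by alpha (to zero if it was <= alpha)
    mag = np.abs(x)
    scale = np.maximum(mag - alpha, 0.0) / np.where(mag > 0, mag, 1.0)
    return x * scale

alpha = 0.5
x = np.array([-2.0, -0.3, 0.0, 0.7, 1.5])

# real case: S_alpha = Id - P_{alpha C}, with P_{alpha C} = clipping to [-alpha, alpha]
assert np.allclose(soft_threshold(x, alpha), x - np.clip(x, -alpha, alpha))

# complex case: phase preserved, modulus shrunk by alpha
z = 2.0 * np.exp(1j * 0.3)
w = soft_threshold(z, alpha)
assert np.isclose(np.abs(w), 1.5) and np.isclose(np.angle(w), 0.3)
print(soft_threshold(x, alpha))    # values: -1.5, 0., 0., 0.2, 1.
```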

3.2 The ill-posed case
In the most interesting problems, the operator T*T does not have a bounded inverse. We can then use the surrogate functionals introduced in [7]: we replace (1) by the family of surrogate functionals G_{n,C}(f) := ‖g − T f‖²_H + 2α sup_{h∈C} ⟨f, h⟩ + ‖f − f_n‖²_H − ‖T(f − f_n)‖²_H, and we have

Proposition 3.1 Let C be as assumed in Section 2.2. Then the minimizer of G_{n,C} is given by f_{n+1} = (Id − P_{αC})(f_n + T*(g − T f_n)). (5)

Proof. Under the assumptions made on C, we can apply the minimax theorem to the problem of finding the minimizer of G_{n,C}, so that we have to determine inf_f sup_{h∈C} [‖g − T f‖² + 2α ⟨f, h⟩ + ‖f − f_n‖² − ‖T(f − f_n)‖²]. For fixed h, the expression is minimized by f = f_n + T*(g − T f_n) − α h, and, arguing as in the proof of Theorem 3.1, the maximizing h ∈ C satisfies α h = P_{αC}(f_n + T*(g − T f_n)); combining the two yields (5). □

Next, we discuss convergence properties of the iteration (5). We mentioned above that [6] contains an extensive discussion, including (not easily verifiable) conditions that ensure strong convergence. The full generality of [6] makes it less easy to read if one is mainly interested in the special case presented in this paper. For this reason, we sketch here a more direct (but more specialized) statement. Since the iteration is very similar to the one in [7], a very similar strategy for the proof of convergence applies as well. Up to the question of strong convergence, the techniques apply in almost the same way (thus we just sketch the steps). In order to achieve norm convergence, however, we have to pay more attention to the structure of C.
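The iteration (5) is easy to try out numerically. The sketch below is our own toy finite-dimensional instance (a random matrix T with ‖T‖ < 1 and the box C of Example 1, so that Id − P_{αC} is soft thresholding); it runs the iteration and checks that the limit point satisfies the fixed-point equation of Proposition 3.1.

```python
import numpy as np

# toy instance: T = U diag(s) V^T with singular values s in (0, 1)
rng = np.random.default_rng(2)
n = 20
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
T = U @ np.diag(rng.uniform(0.05, 0.95, n)) @ V.T      # ||T|| < 1

f_true = np.zeros(n)
f_true[:3] = [2.0, -1.5, 1.0]                          # sparse ground truth
g = T @ f_true + 0.01 * rng.standard_normal(n)
alpha = 0.05

def S(x, a):
    # soft thresholding = (Id - P_{alpha C}) for the unit box C
    return x - np.clip(x, -a, a)

f = np.zeros(n)
for _ in range(5000):                                  # iteration (5)
    f = S(f + T.T @ (g - T @ f), alpha)

# the limit point is (numerically) a fixed point of the iteration map
resid = np.linalg.norm(f - S(f + T.T @ (g - T @ f), alpha))
print(resid)
```

In finite dimensions weak and norm convergence coincide, so the residual above decays to zero; the infinite-dimensional situation is exactly what the analysis below addresses.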
We start by sketching how to achieve weak convergence for our scheme. To this end, we define the nonlinear operator T by T f := (Id − P_{αC})(f + T*(g − T f)). In order to establish weak convergence of the f_n = T^n f_0 we apply Opial's Theorem (see [15]), which reads as follows:

Theorem 3.2 Let the mapping A from H to H satisfy the following conditions:
(i) A is non-expansive, i.e. ‖A v − A w‖ ≤ ‖v − w‖ for all v, w ∈ H;
(ii) A is asymptotically regular, i.e. ‖A^{n+1} v − A^n v‖ → 0 for n → ∞, for all v ∈ H;
(iii) the set G of the fixed points of A in H is not empty.
Then, ∀v ∈ H, the sequence (A n v) n∈N converges weakly to a fixed point in G.
To apply this to T, we thus need to verify that this operator satisfies the three conditions (i), (ii) and (iii) above.
First of all, condition (iii) is verified, by the following argument. We know that (1) has a minimizer f̄. Then f̄ minimizes not only F_C, but also the surrogate functional G_{n,C} with f_n replaced by f̄. Consequently, the analysis above implies that T f̄ = f̄, so T does indeed have at least one fixed point.
Next, we ensure that T is asymptotically regular (condition (ii)): we observe that, since f_{n+1} minimizes G_{n,C}, G_{n,C}(f_{n+1}) ≤ G_{n,C}(f_n) = F_C(f_n). This implies F_C(f_{n+1}) + (1 − ‖T‖²) ‖f_{n+1} − f_n‖² ≤ G_{n,C}(f_{n+1}) ≤ F_C(f_n). Consequently, the F_C(f_n) decrease monotonically, and telescoping gives Σ_{n=0}^{∞} ‖f_{n+1} − f_n‖² ≤ (1 − ‖T‖²)^{−1} F_C(f_0) < ∞; hence we deduce ‖f_{n+1} − f_n‖ → 0 for n → ∞. Finally, we need to check that T is non-expansive (condition (i)): in order to show this property we need the following standard properties of convex sets.

Lemma 3.1 Let K be a closed and convex set; then for all u ∈ H and all k ∈ K the inequality ⟨u − P_K u, k − P_K u⟩ ≤ 0 holds.
Proof. For all λ ∈ [0, 1] one has P_K u + λ(k − P_K u) ∈ K, and hence ‖u − P_K u‖² ≤ ‖u − P_K u − λ(k − P_K u)‖². Thus, for all λ ∈ [0, 1], −2λ ⟨u − P_K u, k − P_K u⟩ + λ² ‖k − P_K u‖² ≥ 0, so that (dividing by λ and letting λ → 0) we have ⟨u − P_K u, k − P_K u⟩ ≤ 0. □

Lemma 3.2 Let K be a closed and convex set; then for all u, v ∈ H the inequality ‖P_K u − P_K v‖² ≤ ⟨u − v, P_K u − P_K v⟩ holds.

Proof. We need to prove that ⟨(u − v) − (P_K u − P_K v), P_K u − P_K v⟩ ≥ 0. By Lemma 3.1, ⟨u − P_K u, P_K v − P_K u⟩ ≤ 0 and ⟨v − P_K v, P_K u − P_K v⟩ ≤ 0. Summing the two inequalities leads to ⟨(u − v) − (P_K u − P_K v), P_K v − P_K u⟩ ≤ 0, which is the desired statement. □

In particular, an easy computation shows that both P_K and S_K = Id − P_K are non-expansive. Now, with the help of Lemma 3.2 and since 0 ≤ T*T ≤ Id, we obtain ‖T u − T v‖ = ‖S_{αC}(u + T*(g − T u)) − S_{αC}(v + T*(g − T v))‖ ≤ ‖(Id − T*T)(u − v)‖ ≤ ‖u − v‖. Thus, T is non-expansive, and we have

Proposition 3.2 Opial's Theorem 3.2 applies, i.e. f_n = T^n f_0 converges weakly to a fixed point f̄ of T.
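Lemmas 3.1 and 3.2 are easy to test numerically for a concrete convex set; the sketch below (our own check, using the closed unit ball as K) verifies both inequalities, and the resulting non-expansiveness, for random points.

```python
import numpy as np

rng = np.random.default_rng(3)

def P_ball(u):
    # (exact) projection onto the closed unit ball of R^d
    nrm = np.linalg.norm(u)
    return u if nrm <= 1.0 else u / nrm

for _ in range(100):
    u = 3.0 * rng.standard_normal(5)
    v = 3.0 * rng.standard_normal(5)
    Pu, Pv = P_ball(u), P_ball(v)
    # Lemma 3.1: <u - P_K u, k - P_K u> <= 0 for k in K (take k = Pv)
    assert np.dot(u - Pu, Pv - Pu) <= 1e-12
    # Lemma 3.2: ||Pu - Pv||^2 <= <u - v, Pu - Pv>
    assert np.dot(Pu - Pv, Pu - Pv) <= np.dot(u - v, Pu - Pv) + 1e-12
    # consequence: P_K is non-expansive
    assert np.linalg.norm(Pu - Pv) <= np.linalg.norm(u - v) + 1e-12
print("projection inequalities verified")
```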
One can argue that this weak convergence theorem suffices for practical purposes, because every numerical computation is always finite-dimensional so that weak and strong (i.e. norm) convergence of the f n are equivalent. However, it is often useful to establish norm convergence for the infinite dimensional Hilbert space as well, since this then implies that the rate of convergence, and the other constants involved, do not "blow up" as the dimensionality of the discretization increases.
To obtain norm convergence, we need to do some more work. In summary, we have established the following facts:
• f_n ⇀ f̄ (weakly), for n → ∞,
• f_{n+1} = f_n + T* g − T*T f_n − P_{αC}(f_n + T* g − T*T f_n),
• ‖f_{n+1} − f_n‖ → 0, for n → ∞.
Defining u_n := f_n − f̄ and v := f̄ + T* g − T*T f̄, we can recast these results as follows: u_n ⇀ 0, u_{n+1} = (Id − T*T) u_n − [P_{αC}(v + (Id − T*T) u_n) − P_{αC}(v)], and ‖u_{n+1} − u_n‖ → 0. We can then apply, without any change, Lemmas 3.15 and 3.17 of [7], leading to ‖T*T u_n‖ → 0 for n → ∞, so that we obtain the equivalent formulation u_n ⇀ 0 and ‖P_{αC}(v + u_n) − P_{αC}(v)‖ → 0, for n → ∞. (7) To obtain norm convergence of the f_n, we must establish u_n → 0. For general convex sets C, the conditions (7), where α > 0 and v ∈ H are arbitrary (but fixed), actually do not imply norm convergence of the u_n to 0. Abstract necessary and sufficient conditions for norm convergence are given in [6]; the following theorem gives a more concrete restriction on C under which we can establish norm convergence.
Theorem 3.3 Suppose that u_n ⇀ 0 and that P_C(v) − P_C(v + u_n) → 0. Moreover, assume that u_n is orthogonal to both v and P_C(v). If, for some sequence γ_n with γ_n → ∞, the convex set C satisfies γ_n u_n ∈ C for all n, then u_n → 0.
Proof. Since γ_n u_n ∈ C and P_C(v) ∈ C, the convexity of C gives (γ_n/(1+γ_n)) P_C(v) + (1/(1+γ_n)) (γ_n u_n) = (γ_n/(1+γ_n)) (P_C(v) + u_n) ∈ C. Thus, applying Lemma 3.1 with u = v + u_n and k = (γ_n/(1+γ_n))(P_C(v) + u_n), and abbreviating ε_n := P_C(v + u_n) − P_C(v) (so that ‖ε_n‖ → 0 by assumption), we get ⟨(v − P_C(v)) + u_n − ε_n, (γ_n/(1+γ_n)) u_n − (1/(1+γ_n)) P_C(v) − ε_n⟩ ≤ 0. Expanding the inner product and using u_n ⊥ v, P_C(v) to cancel the cross terms, this yields (γ_n/(1+γ_n)) ‖u_n‖² ≤ (1+γ_n)^{−1} ⟨v − P_C(v), P_C(v)⟩ + ⟨(v − P_C(v)) + u_n + (γ_n/(1+γ_n)) u_n − (1+γ_n)^{−1} P_C(v), ε_n⟩. Since the u_n are uniformly bounded, there is some C_1 such that the right-hand side is bounded by C_1 (‖ε_n‖ + (1+γ_n)^{−1}), which, in turn, gives ‖u_n‖² ≤ (1 + γ_n^{−1}) C_1 (‖ε_n‖ + (1+γ_n)^{−1}), and consequently, since γ_n → ∞ and ‖ε_n‖ → 0, u_n → 0 for n → ∞. □
Unfortunately, this theorem is not sufficiently strong to be applied to the BV-functional of Example 2, above. Without going into full detail, we sketch here how it (just) falls short.
This concludes our theoretical analysis of our first case described in the introduction, i.e. the case where J(f ) in (1) is convex. We now turn to the second approach.

4 Iterative algorithm for PDE-based deblurring and denoising
We start by recalling briefly the framework of [20], in which the inverse problem g = T f + e, with edge-preserving regularization, is cast as the minimization of an energy functional of the form (assuming Gaussian noise for simplicity) F_φ(f) = ‖g − T f‖²_{L²(Ω)} + 2α ∫_Ω φ(∇f) dx; (8) here the potential φ : R² → R is typically a positive, continuous, increasing function with at most linear growth at infinity, therefore satisfying |φ(ξ)| ≤ c(1 + |ξ|) for some positive constant c (note that we have replaced the subscript C by φ, describing now the penalty and not the convex set of the previous approach). Convex examples include (note that, only for illustration reasons, we also give examples beyond the one-homogeneous case)
• φ(ξ) = |ξ| (the total variation minimization of Rudin-Osher-Fatemi [21], [20]),
• φ(ξ) = √(ε² + |ξ|²) (a smoothed, no longer one-homogeneous variant of the previous example; cf. Figure 4).

Table 2: Spatial discretization of the blur operator T.
Let us now restrict ourselves again to the one-homogeneous case and assume in addition that φ is differentiable. Then the Euler-Lagrange equation associated with the minimization problem (8), which must be satisfied by a minimizer f, if such a minimizer exists, is given by T*(T f − g) = α div(∇_ξ φ(∇f)), (9) where ∇_ξ φ = (φ_{ξ1}, φ_{ξ2}), and with the boundary conditions ∇_ξ φ(∇f) · n = 0 on ∂Ω, where n is the unit exterior normal to the boundary. In the case α > 0, the partial differential equation (9) is non-linear for the examples of potentials φ given above. Moreover, the presence of the term T*T f makes it computationally expensive and numerically nontrivial. In order to overcome these problems, we propose here not to solve (9) numerically in a direct way, but to apply the surrogate functional algorithm (see [7], or the previous sections), i.e. we construct a sequence of iterates f_n that approximate f, without having to invert T*T at every iteration. On the other hand, the direct implementation of the projection P_{αC} associated to our minimization is rather complicated; in this case we prefer to avoid it by switching to an expression based on the Euler-Lagrange equation.

Table 3: Comparison of the convergence rates: for both algorithms, we give the number of iterations, the CPU time and the corresponding relative RMSE, applied to the blurry fingerprint image, using total variation minimization. We notice that the new method using the surrogate functionals converges faster to the restored image: the relative RMSE ‖f_n − f_orig‖ / ‖f_orig‖ hits the value 0.15 at 2,500 iterations instead of 5,000, and uses a CPU time of ∼1300 instead of 2790; there thus seems to be a speed-up factor of 2.

The total iteration goes thus as follows: start with an initial
f_0; find f_n, n > 0, as a minimizer of the surrogate functionals G_{n−1}(f) = ‖g − T f‖²_{L²(Ω)} + 2α ∫_Ω φ(∇f) dx + ‖f − f_{n−1}‖²_{L²(Ω)} − ‖T(f − f_{n−1})‖²_{L²(Ω)}, where we have assumed that ‖T*T‖ < 1. The associated Euler-Lagrange equation in f_n, now easily solved in practice, is f_n = f_{n−1} + T*(g − T f_{n−1}) + α div(∇_ξ φ(∇f_n)), (11) together with the same boundary conditions. One then simply carries out this iterative algorithm to find (an approximation to) the desired minimizer.
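As an illustration of how (11) can be used in practice, here is a deliberately small 1D sketch; everything below (signal, kernel, α, ε, sweep counts) is our own illustrative choice, not the 2D code used in the experiments. It uses φ(ξ) = √(ε² + ξ²), a periodic 3-tap blur with ‖T‖ < 1, outer surrogate steps as in (11), and a few lagged-diffusivity Jacobi sweeps to solve the implicit Euler-Lagrange equation for each f_n.

```python
import numpy as np

rng = np.random.default_rng(4)
N, alpha, eps = 128, 0.1, 0.1

def blur(f):
    # periodic 3-tap blur, scaled so that ||T|| < 1; T is self-adjoint here
    return 0.95 * (0.25 * np.roll(f, 1) + 0.5 * f + 0.25 * np.roll(f, -1))

f_true = np.where(np.arange(N) < N // 2, 1.0, -1.0)   # piecewise constant signal
g = blur(f_true) + 0.05 * rng.standard_normal(N)

def energy(f):
    # discrete version of (8) with phi(xi) = sqrt(eps^2 + xi^2)
    df = np.roll(f, -1) - f
    return np.sum((g - blur(f)) ** 2) + 2 * alpha * np.sum(np.sqrt(eps**2 + df**2))

f = g.copy()
E0 = energy(f)
for _ in range(50):                       # outer surrogate steps, cf. (11)
    b = f + blur(g - blur(f))             # right-hand side f_n + T*(g - T f_n)
    for _ in range(10):                   # lagged-diffusivity Jacobi sweeps
        df = np.roll(f, -1) - f           # forward difference
        w = 1.0 / np.sqrt(eps**2 + df**2) # phi'(xi)/xi, frozen ("lagged")
        wm = np.roll(w, 1)                # w_{i-1}
        f = (b + alpha * (w * np.roll(f, -1) + wm * np.roll(f, 1))) \
            / (1.0 + alpha * (w + wm))
print(E0, energy(f))
```

Each inner sweep is one Jacobi relaxation of the linear system f − α div(w ∇f) = b obtained by freezing the diffusivity w; the energy (8) decreases along the iteration.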
Remark. The same idea can also be applied to other cases, when the L²(Ω) norm in (8) is replaced by another norm in a Hilbert space.
For instance, let us consider the minimization problem from Osher-Solé-Vese [16] (for simplicity, for denoising): F(f) = ‖g − f‖²_{(H¹₀(Ω))′} + 2α |f|_{BV(Ω)}. (12) This model is a refinement of the total variation minimization [21]; it decomposes the data g into a BV(Ω) component f and an oscillatory component g − f that belongs to (H¹₀(Ω))′, the dual of H¹₀(Ω) when endowed with the standard seminorm. Even in the case without blur, the minimization of the convex functional (12) leads to a computationally expensive partial differential equation in f, or to a fourth-order non-linear partial differential equation with associated boundary conditions. By applying the surrogate functional approach to the minimization of (12), we can avoid solving this PDE directly; instead, we compute f_n as the minimizer of a convex surrogate functional G_{n−1} constructed from (12) exactly as before. The minimization of the energy G_{n−1} with respect to f_n then leads to a simpler partial differential equation (15).

5 Numerical Illustrations
In this section, we present numerical results for the two approaches. We assume the linear degradation model g = T f + e, where g is the given data, a square-integrable function in L²(Ω), f is the unknown true image, and e is additive noise of zero mean. The operator T : L²(Ω) → L²(Ω) is a linear and continuous degradation operator, here a convolution with a Gaussian kernel.
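The degradation operator can be sketched as follows (with our own parameter choices; the kernel size and width used in the paper's experiments are not reproduced here): a periodized Gaussian convolution implemented as a Fourier multiplier. Since the kernel is nonnegative with unit sum, the operator norm equals the maximal modulus of the multiplier, namely 1, attained at zero frequency; a rescaling (e.g. by 0.99) is therefore needed to meet the assumption ‖T‖ < 1.

```python
import numpy as np

N, sigma = 64, 2.0
# periodized squared distances to the origin on an N x N grid
x = np.minimum(np.arange(N), N - np.arange(N)).astype(float)
X, Y = np.meshgrid(x, x, indexing="ij")
kernel = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
kernel /= kernel.sum()                       # nonnegative, sums to 1
multiplier = np.fft.fft2(kernel)             # Fourier multiplier of T

def T(f):
    # convolution with the Gaussian kernel, as a product in Fourier domain
    return np.real(np.fft.ifft2(np.fft.fft2(f) * multiplier))

op_norm = np.max(np.abs(multiplier))
print(op_norm)   # equals 1 up to rounding; rescale T by 0.99 to get ||T|| < 1
```

A constant image is preserved by this T (the zero-frequency multiplier is 1), while high frequencies are strongly damped, which is the source of the ill-posedness in the deblurring experiments.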
In the first approach, let now {φ_λ}_{λ∈Λ} be a frame, i.e. there exist constants 0 < A ≤ B < ∞ such that, for all f ∈ L²(Ω), A ‖f‖² ≤ Σ_{λ∈Λ} |⟨f, φ_λ⟩|² ≤ B ‖f‖²; we define the corresponding frame operator F by F f := {⟨f, φ_λ⟩}_{λ∈Λ}. By means of the adjoint F*, the variational problem (2) reads in the ℓ¹ setting as follows: minimize ‖g − T F* f‖²_{L²(Ω)} + 2α Σ_{λ∈Λ} |f_λ|, with the shorthand notation f = {f_λ}_{λ∈Λ}. The convex set C is now C = {h = {h_λ}_{λ∈Λ} : |h_λ| ≤ 1 ∀λ}. In accordance with Proposition 3.1, the resulting iteration scheme reads f^{n+1} = S_α(f^n + F T*(g − T F* f^n)), (17) where S_α denotes the soft shrinkage operator with shrinkage parameter α. Wavelet-based methods of this kind (in the context of image decomposition) are considered in [8,9]. In particular, we have chosen here a wavelet frame that is simply given by a translation invariant wavelet system. As the example image we consider a fingerprint and its blurred version, see Figure 1. The results obtained with iteration (17) are visualized in Figure 2 and the convergence rates are given in Table 1. The blur operator T used in the experiments has the discrete spatial representation given in Table 2.
The blur convolution is easily implemented as a multiplication in Fourier domain, which means that we switch between the wavelet and Fourier representation at every step of the iteration process.
Next, we present numerical results for the second (PDE) approach. In Figure 3 we show the results of the iterative algorithm (11) on the same blurred and noisy image. For comparison with the purely PDE-based method (without the iterative approach corresponding to surrogate functionals) we show in Figure 4 the end results of methods (11) and (9); they look very similar. Table 3 lists the CPU time and the relative RMSE for the first 5000 iterations of both methods, illustrating that the surrogate functional method produces a better error decay for the same amount of CPU time. (These two computations were carried out on the same machine; note that the numerical results in Table 1 were obtained on a different computer and should thus not be compared with this Table.) Finally, we show in Figure 5 the result of a Cartoon+Texture decomposition of the same image (without blur) obtained by the surrogate-based iteration (15).

Figure 2: Top left to bottom right: blurred image, several iterates using the wavelet scheme: 1st, 100th, 500th, 1000th, 2000th, 3000th, 4000th.

6 Conclusion
In this paper, we have extended the approach of [7] or, alternatively, illustrated with concrete examples the abstract analysis in [6]. In particular, we have formulated an iteration algorithm for solving linear inverse problems of the following general variational form: F_C(f) = ‖g − T f‖²_H + 2α sup_{h∈C} ⟨f, h⟩, and we have applied it to sparse representations with respect to a wavelet frame. We have also shown that the iterative approach that naturally follows from introducing surrogate functionals leads to a simplified solution algorithm for PDE-linked variational problems, resulting in a more efficient algorithm. We have discussed convergence and presented a sequence of numerical illustrations that verify the usefulness of the iteration.

Figure 4: Deblurring results obtained using the models (11) (left, with surrogate functional) and (9) (right, without surrogate functional), with φ(ξ) = √(ε² + |ξ|²) (total variation minimization with regularization).