An Inner-Outer Iterative Method for Edge Preservation in Image Restoration and Reconstruction

We present a new inner-outer iterative algorithm for edge enhancement in imaging problems. At each outer iteration, we formulate a Tikhonov-regularized problem where the penalization is expressed in the 2-norm and involves a regularization operator designed to improve edge resolution as the outer iterations progress, through an adaptive process. An efficient hybrid regularization method is used to project the Tikhonov-regularized problem onto approximation subspaces of increasing dimensions (inner iterations), while conveniently choosing the regularization parameter (by applying well-known techniques, such as the discrepancy principle or the ${\mathcal L}$-curve criterion, to the projected problem). This procedure results in an automated algorithm for edge recovery that does not involve regularization parameter tuning by the user, nor repeated calls to sophisticated optimization algorithms, and is therefore particularly attractive from a computational point of view. A key to the success of the new algorithm is the design of the regularization operator through the use of an adaptive diagonal weighting matrix that effectively enforces smoothness only where needed. We demonstrate the value of our approach on applications in X-ray CT image reconstruction and in image deblurring, and show that it can be computationally much more attractive than other well-known strategies for edge preservation, while providing solutions of greater or equal quality.

1. Introduction. In this paper, we consider a new inner-outer iterative algorithm for edge enhancement in image restoration and reconstruction problems. Specifically, we consider generating regularized solutions $x_{\mathrm{reg}}$ to
$$b = A x_{\mathrm{true}} + \eta, \qquad (1.1)$$
where $A \in \mathbb{R}^{m\times n}$ is a known ill-posed forward operator, $b$ represents the known measured data corrupted by some unknown noise $\eta$, and $x_{\mathrm{true}}$ is the unknown, vectorized version of the true image that we would like to approximate. Here and in the following we assume that $\eta$ is Gaussian white noise. One of the most well-known regularization methods is Tikhonov regularization, which consists in computing
$$x_{\mathrm{reg}} = \arg\min_x \|Ax - b\|_2^2 + \lambda R(x), \qquad (1.2)$$
where $R(x)$ is referred to as the regularization term. The regularization parameter $\lambda$ determines the trade-off between fidelity to the model and damping of the noise through enforcement of a priori information via $R$. Indeed, the role of $\lambda$ cannot be overemphasized, since the quality of the solution estimate is highly dependent on its value. Unfortunately, in practice, good values of $\lambda$ are rarely known a priori. When $R(x)$ is a non-quadratic term, the most widely used approach is to solve (1.2) with a suitable optimization method for many values of $\lambda$, and then use a heuristic, based on the output, to decide which value provides the best-quality solution. For some choices of $R(x)$ that try to preserve edges in $x$, this can be a very expensive strategy, as we discuss below.
To motivate the work in this paper, let us first consider the case $R(x) = \|Lx\|_2^2$, where $L$ is taken to be the identity (standard-form Tikhonov) or a discrete derivative-type operator (general-form Tikhonov), such as the discrete gradient or Laplacian. Assuming that an appropriate value of $\lambda$ is known, the corresponding solution $x_{\mathrm{reg}}$ will necessarily be smooth given these choices of $L$, as the magnitude of $x$ or of its derivatives is penalized. For fixed $\lambda$, the solution of the optimization problem is straightforward to compute. For large-scale problems, this is usually done via a Krylov subspace-based iterative projection method, such as CGLS, LSQR [22,23] or LSMR [8]. At each iteration, these iterative methods project the original problem (1.2) onto Krylov subspaces of increasing dimension; they only require one matrix-vector product with each of $A$, $A^T$, $L$, $L^T$, and they have short-term recurrences (so storage is minimal). Provided that the number of iterations is small relative to the dimension of the problem, and that fast matrix-vector products are available, these are relatively efficient solvers.
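For a fixed regularization parameter, a standard-form Tikhonov solve of this kind needs only matrix-vector products. The following minimal sketch (the test operator and all parameter values are illustrative assumptions of ours, not taken from this paper) uses SciPy's LSQR, whose `damp` argument adds the penalty $\mathtt{damp}^2\|x\|_2^2$ to the least squares problem:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# Small ill-conditioned test problem b = A x_true + noise (illustrative only).
rng = np.random.default_rng(0)
n = 64
A = np.triu(np.ones((n, n))) / n              # smoothing, integration-like operator
x_true = np.sin(np.linspace(0, np.pi, n))
b = A @ x_true + 1e-2 * rng.standard_normal(n)

# Standard-form Tikhonov min ||Ax - b||^2 + damp^2 ||x||^2 via LSQR's `damp`.
damp = 0.05
x_reg = lsqr(A, b, damp=damp, atol=1e-10, btol=1e-10)[0]

# The naive solve amplifies the noise; the damped solve suppresses it.
x_naive = np.linalg.solve(A, b)
err_reg = np.linalg.norm(x_reg - x_true) / np.linalg.norm(x_true)
err_naive = np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true)
```

Hybrid methods, discussed next, go further: rather than fixing `damp` in advance, they select it from a small projected problem at each iteration.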
The so-called hybrid methods [3,4,7,11,12,19,21] exploit the efficiency of iterative solvers such as those mentioned in the previous paragraph to select a good regularization parameter when $R(x)$ has the form $\|Lx\|_2^2$. Iterative solvers produce a small-dimensional projected problem that retains certain features of the original large-scale problem; the order of the projected problem equals the number of performed iterations. Additional regularization (for instance, Tikhonov regularization) is then efficiently applied to the small-dimensional projected problem, from which a regularized solution to the original large-scale problem can be obtained. The upside of the hybrid approach is that the regularization parameter is selected at each iteration for the small-dimensional problem only: in this setting, well-known techniques such as the discrepancy principle or the L-curve criterion can be applied at negligible computational cost (both in terms of floating-point operations and storage) by manipulating the much smaller projected quantities. However, with the usual choices of $L$ mentioned above, the solutions will be smoother than desired.
Two well-known alternative choices for $R(x)$ when one desires regularized solutions with sharp edges are $TV(x)$ (isotropic total variation) and $\|Lx\|_p^p$, with $L$ the discrete gradient operator and $p$ close to 1 (see [29] and references therein). For a single fixed $\lambda$, the algorithms employed to solve the optimization problem (1.2) usually require much more computational effort than if 2-norm regularization is used. Moreover, the biggest pitfall of using these regularization operators is that $\lambda$ is not known a priori: many optimization problems need to be solved, one for each $\lambda$ in a discrete set, in order to compute a meaningful solution.
A natural and well-established way of handling problem (1.2) with $R(x) = \|Lx\|_p^p$, $p \neq 2$, is to reformulate it as a sequence of quadratic problems: the $\ell$th quadratic problem involves a regularization term of the form $R(x) = \|M^{(\ell)}x\|_2^2$, where the $\ell$th regularization matrix $M^{(\ell)}$ is defined with respect to $L$ and the approximate solution of the $(\ell-1)$th quadratic problem in the sequence. Here and in the following we refer to this approach as the iteratively reweighted norm (IRN) method: it was first proposed in [14], and then extended in [25,26,31]. Recently, an Arnoldi-Tikhonov (hybrid) method for $R(x) := \|x\|_1$ was proposed [11], which can be used when the operator $A$ is square. This approach combines a reweighting strategy (to approximate $R(x)$) with a hybrid strategy that applies the discrepancy principle to choose the regularization parameter. Since the weights are updated at each iteration of the hybrid method, and since the weights are formally regarded as variable preconditioners after transforming the current quadratic Tikhonov problem into standard form, the reweighted problem is efficiently projected using the flexible Arnoldi algorithm: because of this, only one cycle of iterations needs to be performed, while, for all the other IRN methods mentioned so far, inner-outer iteration cycles are needed when dealing with large-scale problems. This method based on flexible Krylov subspaces has been extended to work with $R(x) = TV(x)$ in [10], and with rectangular matrices $A$ as well as a sparsity-under-transform regularization term in [6].
In this paper we propose a technique that is similar to the IRN methods, in the sense that we, too, generate a sequence of quadratic problems, with a different weight matrix at each outer iteration, and employ a hybrid approach on each regularized problem in the sequence. However, our method differs from each of the methods listed above in that: (a) our regularization matrix is different; indeed it is rectangular and possibly not of full column rank; (b) the operator $A$ may be rectangular; (c) due to (a) and (b), we must use a different hybrid projection algorithm. More precisely, we propose an automated, inner-outer iterative approach, consisting in solving a sequence of regularized least squares problems with 2-norm regularization of the form
$$\min_{x \in \Gamma^{(\ell)}} \|Ax - b\|_2^2 + \lambda \|M^{(\ell)}x\|_2^2, \qquad (1.3)$$
where a near-optimal value of $\lambda$ for the $\ell$th problem (1.3), which we denote by $\lambda_{*,\ell}$, is determined on-the-fly for each regularization operator $M^{(\ell)}$. Here, $\Gamma^{(\ell)}$ denotes a specific projection space in which we will look for a projected solution; because this solution space depends in part on $M^{(\ell)}$ itself (but not on the regularization parameter), we use a superscript on it as well. To compute $\Gamma^{(\ell)}$ and $\lambda_{*,\ell}$ we employ the hybrid regularization approach (inner iterations) of [18]. We give a method for designing $M^{(\ell)}$ adaptively (outer iterations) in such a way that edges are enhanced as $\ell$ increases. We discuss expected behavior on a class of images, and present results and comparisons on applications in X-ray CT and image deblurring to demonstrate that our algorithm has the potential to produce high-quality images in a computationally efficient manner. This paper is organized as follows. In Section 2, we give definitions, notation, and motivation for studying our new inner-outer iterative approach. We describe hybrid iterative regularization algorithms in Section 3, highlighting the particular method in [18] that we leverage in our new approach.
In Section 4, we develop our inner-outer iterative algorithm, and provide some analysis. Section 5 is devoted to numerical results. We give conclusions and outline future work in Section 6.
2. Background and Motivation. We begin by motivating the need to take $M^{(\ell)}$ in (1.3) as something other than the identity or the discrete gradient operator. Then, we discuss other algorithms proposed in the literature that have been used for edge preservation.

2.1. Filtering Methods. Consider the model (1.1) and assume that $A$ has rank $\rho$. Consider the singular value decomposition (SVD) of $A$,
$$A = U\Sigma V^T, \qquad U = [u_1,\ldots,u_m], \qquad V = [v_1,\ldots,v_n],$$
where $U$ and $V$ are orthogonal, $\Sigma$ is diagonal with diagonal entries $\sigma_1 \geq \cdots \geq \sigma_\rho > 0$. Then the minimum-norm least squares solution to $Ax = b$ is easily seen to be given by
$$x = \sum_{i=1}^{\rho} \frac{u_i^T b}{\sigma_i}\, v_i. \qquad (2.1)$$
Under the assumption of white noise $\eta$, the values of $|u_i^T \eta|$ are approximately constant, whereas the $|u_i^T b_{\mathrm{true}}|$ are large and dominant for small indices $i$ but eventually decay toward zero, so that the noise becomes dominant (this property is commonly known as the discrete Picard condition, see [15]).
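The effect of damping the SVD terms associated with small singular values can be seen on a small numerical example. The sketch below uses Tikhonov filter factors $\phi_i = \sigma_i^2/(\sigma_i^2+\lambda^2)$; the Hilbert-matrix test problem and all parameter values are illustrative assumptions of ours:

```python
import numpy as np

# Filtered SVD solution: x_reg = sum_i phi_i * (u_i^T b / sigma_i) * v_i,
# with Tikhonov filter factors phi_i = sigma_i^2 / (sigma_i^2 + lam^2).
n = 12
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
A = 1.0 / (i + j + 1.0)                  # Hilbert matrix: severely ill-conditioned
x_true = np.ones(n)
rng = np.random.default_rng(0)
b = A @ x_true + 1e-6 * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(A)
lam = 1e-4
phi = s**2 / (s**2 + lam**2)             # ~1 where sigma_i >> lam, ~0 where sigma_i << lam
x_filt = Vt.T @ (phi * (U.T @ b) / s)    # filtered (regularized) solution
x_naive = Vt.T @ ((U.T @ b) / s)         # unfiltered: small sigma_i amplify the noise
```

Even with noise of size $10^{-6}$, the unfiltered expansion is dominated by the amplified noise in the trailing terms, while the filtered solution remains close to the true one.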
Regularization methods like TSVD, and Tikhonov with $R(x) = \|x\|_2^2$ in (1.2), work by damping the contributions from the terms corresponding to large indices in (2.1); i.e., they are essentially filtering methods and compute a regularized solution of the form
$$x_{\mathrm{reg}} = \sum_{i=1}^{\rho} \phi_i\, \frac{u_i^T b}{\sigma_i}\, v_i, \qquad (2.2)$$
where $\phi_i$ denotes a scalar that decreases with increasing $i$; see [15,17]. However, the edge content of the image is encoded in the high-frequency terms that are damped or discarded in (2.2), and hence such regularized solutions tend to be smooth. Next, let us consider what happens when $R(x) = \|Lx\|_2^2$, where $L$ is the discrete gradient operator. To fix notation, we assume the following. Let $X$ be an $N_v \times N_h$ image, and let $x = \mathrm{vec}(X)$ be its vectorized version, obtained by stacking its columns. Define $L_v \in \mathbb{R}^{(N_v-1)\times N_v}$ and $L_h \in \mathbb{R}^{(N_h-1)\times N_h}$ to be one-dimensional discrete first derivative (forward difference) operators. Using $\otimes$ to represent the Kronecker product between two matrices, note that
• $L_v X = \mathrm{reshape}((I \otimes L_v)\mathrm{vec}(X)) \in \mathbb{R}^{(N_v-1)\times N_h}$ is the image that approximates all vertical derivatives;
• $X L_h^T = \mathrm{reshape}((L_h \otimes I)\mathrm{vec}(X)) \in \mathbb{R}^{N_v \times (N_h-1)}$ is the image that approximates all horizontal derivatives.
Here $\mathrm{reshape}(\cdot)$ can be regarded as the inverse of the $\mathrm{vec}(\cdot)$ operator. Now let
$$L = \begin{bmatrix} I \otimes L_v \\ L_h \otimes I \end{bmatrix}. \qquad (2.3)$$
The matrix $L$ is called the discrete gradient operator and, clearly, $Lx \in \mathbb{R}^{N_h(N_v-1)+(N_h-1)N_v}$ contains the edge information of $x$. Figure 2.1 illustrates the vertical derivative image, horizontal derivative image, and magnitude of the gradient image of a standard test image of size $128 \times 128$ pixels from the MATLAB toolbox "IR Tools" [9]. Consider (1.2) with $R(x) = \|Lx\|_2^2$. Since $Lx$ contains the edge information, adding a multiple of $R(x)$ to the data misfit component of the cost functional has the effect of producing regularized solutions where jumps are penalized. For images that are supposed to be smooth, this is not a problem. But assume for a moment that $X$ is a piecewise constant image with jump discontinuities (such as the one displayed in the top-left corner of Figure 2.1).
Additive noise has the effect of creating jumps, so R(x) smooths out the discontinuities in a region of otherwise constant value, which is desired. Unfortunately, because the values of Lx are large near true jump discontinuities, the price one pays is that the jump discontinuity is also smoothed out.
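The Kronecker-product construction of $L$ translates directly into code. A sketch with SciPy sparse matrices (the helper names and the small test image are our own illustrative choices):

```python
import numpy as np
from scipy.sparse import diags, eye, kron, vstack

def first_diff(n):
    """(n-1) x n forward-difference matrix: (Dz)_i = z_{i+1} - z_i."""
    return diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))

def discrete_gradient(Nv, Nh):
    """Discrete gradient L = [I (x) Lv ; Lh (x) I] acting on vec(X),
    where X is Nv x Nh and vec stacks columns (MATLAB-style)."""
    Lv, Lh = first_diff(Nv), first_diff(Nh)
    return vstack([kron(eye(Nh), Lv),      # vertical derivatives
                   kron(Lh, eye(Nv))])     # horizontal derivatives

# A piecewise-constant 8 x 8 image: L x is nonzero only on the jumps.
X = np.zeros((8, 8))
X[2:6, 3:7] = 1.0
x = X.flatten(order="F")                   # column-stacking vec(X)
Lmat = discrete_gradient(8, 8)
g = Lmat @ x
```

For $N_v = N_h = 8$, $Lx$ has $N_h(N_v-1)+(N_h-1)N_v = 112$ entries, and it is nonzero exactly along the boundary of the constant block.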

2.2. Edge Preservation Algorithms.
As mentioned in Section 1, two popular choices for $R(x)$ in (1.2) are total variation [27] and $\|Lx\|_p^p$ for $1 \leq p < 2$ [15]. The total variation functional can be expressed, in a discrete setting, as
$$TV(x) = \sum_i \sqrt{\left(D_i^v x\right)^2 + \left(D_i^h x\right)^2},$$
where $D_i^v x$ and $D_i^h x$ denote the vertical and horizontal differences at pixel $i$ (with suitable handling of the boundary). Clearly, either choice for $R$ necessitates an iterative optimization technique and, at each step of this routine, a linear system or least squares problem may be solved to generate the next search direction. The lagged diffusivity fixed point iteration (see [30] for algorithm details) is a popular choice, but other options such as steepest descent, Newton's method, and primal-dual approaches are also possible (see [29] for details and references). For $R(x) = \|Lx\|_1$, one can employ iteratively reweighted methods (from the Newton family) or interior point methods [20]. Other approximations to TV may also be employed (see, for example, [1] and the discussion in [29]). However, a key issue that is somewhat disregarded by the edge-preserving solvers listed so far is the choice of the regularization parameter $\lambda$. Indeed, the optimization codes written to solve (1.2) with the edge-preserving choices of $R(x)$ outlined above all assume that the value of $\lambda$ is fixed. Since the optimal value of $\lambda$ is not known a priori, the procedure usually advocated is to solve multiple optimization problems for different values of $\lambda$ picked from a discrete set, and then choose the value of $\lambda$ that gives the "best" image according to some metric. This involves many calls to the optimization routine; moreover, the convergence behavior of a particular run of the optimization method will vary with $\lambda$. Thus, on one hand, recovery of edge information via any of the aforementioned methods appears very computationally expensive: one optimization problem must be solved for each value of $\lambda$, and each of these optimization problems may require multiple linear system solves.
On the other hand, if we take $R(x) = \|x\|_2^2$ or $R(x) = \|Lx\|_2^2$, the cost functional in (1.2) is quadratic and it is relatively straightforward to compute the solution of the optimization problem (i.e., in this case the cost of one full solve is roughly the cost of a single optimization step for the edge-preserving choices of $R$). Even so, especially when it comes to the choice of the regularization parameter, further savings for a quadratic problem (1.2) can be obtained through hybrid iterative methods. Hybrid regularization methods operate by projecting a quadratic problem onto smaller-dimensional subspaces where certain properties of noise and signal separation are preserved. When employing a hybrid approach, (additional) regularization is applied to the small projected problem instead, and the regularization parameter is chosen by applying a suitable heuristic to the projected problem. This is significantly cheaper than solving a quadratic problem (1.2) from scratch for each new value of $\lambda$, as discussed in detail in the next section. However, by adopting this family of quadratic $R(x)$'s, we trade edge content for computational efficiency.
In this paper we propose to merge the best of these two worlds, defining an adaptive sequence of quadratic problems whereby a hybrid approach can be employed to automatically choose the regularization parameter in an efficient way for each quadratic problem in the sequence. Our choice of $M^{(\ell)}$ has features in common with the methods proposed in [2,31], in that it involves different weightings of the gradient operator to emphasize edges in current estimated solutions. However, it differs from those methods in that the effects of our $M^{(\ell)}$ are cumulative, i.e., information from all the previous steps is retained. To run our algorithm we only need to be able to perform matrix-vector products with the forward operator $A$ in (1.1) and with $L$ in (2.3), and with their respective transposes. Moreover, our approach uses hybrid methods to automatically determine regularization parameters.
3. Hybrid Iterative Methods. In this section, we give the general outline of two (hybrid) approaches that could ultimately be used within our inner-outer iterative strategy. The first one is based on LSQR, while the second one is based on a joint bidiagonalization algorithm. In our problems, $D^{(\ell)}L$ will not have full column rank, which makes the second option a more elegant choice and easier to implement (for this reason, the second option is used to perform the numerical experiments displayed in Section 5). However, for completeness and ease of exposition, we present the LSQR-based hybrid approach first. For more specific details, and for additional methods from the hybrid regularization family, we refer to [11,12].
Consider the problem
$$\min_x \|Ax - b\|_2^2 + \lambda\|Mx\|_2^2, \qquad (3.1)$$
where, for the remainder of this section, we assume that $M$ remains fixed. This problem is mathematically equivalent to the linear least squares problem
$$\min_x \left\| \begin{bmatrix} A \\ \sqrt{\lambda}\,M \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2. \qquad (3.2)$$
The hybrid methods described in the following subsections compute approximate solutions to (3.1) and (3.2) as functions of $\lambda$, and select good choices of $\lambda$ in efficient ways.
3.1. Hybrid-LSQR Method for Standard Form Tikhonov. Let us first address solving (3.1) iteratively by LSQR [22,23]. Implicitly, this approach amounts to transforming problem (3.1) into standard form and constraining the solution to belong to a $k$-dimensional subspace $\mathcal{K}_k$ at the $k$th iteration, i.e., solving
$$\min_{y \in \mathcal{K}_k} \|AM_A^\dagger y - b\|_2^2 + \lambda\|y\|_2^2, \qquad (3.3)$$
where the $A$-weighted pseudo-inverse of $M$ is defined as
$$M_A^\dagger = \left(I - \left(A(I - M^\dagger M)\right)^\dagger A\right)M^\dagger. \qquad (3.4)$$
In the above definition, the use of the dagger symbol without an accompanying subscript (e.g., $M^\dagger$) denotes the standard Moore-Penrose pseudo-inverse (see, for instance, [13]). Note that, if $M$ is invertible, then $M_A^\dagger = M^{-1}$, and, if $M$ has full column rank, then $M_A^\dagger = M^\dagger$. If $M$ is an underdetermined matrix, then $M_A^\dagger$ may not be the same as $M^\dagger$. For further details on $A$-weighted pseudo-inverses in the context of Tikhonov regularization, we refer to [15].
When adopting LSQR, $\mathcal{K}_k$ represents the $k$-dimensional Krylov subspace
$$\mathcal{K}_k = \mathrm{span}\left\{ \bar{A}^T b,\; (\bar{A}^T\bar{A})\bar{A}^T b,\; \ldots,\; (\bar{A}^T\bar{A})^{k-1}\bar{A}^T b \right\}, \qquad \bar{A} = AM_A^\dagger.$$
The $k$th iteration of the LSQR algorithm updates the matrix recurrence
$$\bar{A}V_k = U_{k+1}B_k,$$
where $B_k$ is $(k+1)\times k$ lower bidiagonal, $V_k$ has $k$ orthonormal columns that span $\mathcal{K}_k$, and $U_{k+1}$ has $k+1$ orthonormal columns with $U_{k+1}(\beta e_1) = b$. Taking $y = V_k z$ (i.e., imposing $y \in \mathcal{K}_k$) and exploiting the unitary invariance of the 2-norm allows us to rewrite (3.3) as the $O(k)$-sized least squares problem
$$\min_{z \in \mathbb{R}^k} \|B_k z - \beta e_1\|_2^2 + \lambda\|z\|_2^2, \qquad (3.5)$$
which is a Tikhonov-regularized projected problem in standard form. Denoting by $z^{(\lambda,k)}$ the minimizer of the above problem, $y^{(\lambda,k)} = V_k z^{(\lambda,k)}$ is the minimizer of (3.3). If $k$ is large enough that the spectral behavior of the bidiagonal matrix $B_k$ mimics that of the operator $AM_A^\dagger$, then the impact of the choice of $\lambda$ on the quality of the solution can be observed. However, as hinted in Section 1, $k$ should also be small enough that problem (3.5) can be efficiently generated and solved. For fixed $k$, we can use any suitable parameter choice strategy (such as the discrepancy principle or the L-curve criterion) on the projected problem (3.5) to choose $\lambda$.
Still assuming $k$ fixed, and regardless of the parameter selection method, let $\lambda_k$ denote the estimated optimal parameter. Assuming $k$ is large enough, we then find $z^{(\lambda_k,k)}$ by solving (3.5), we set $y^{(\lambda_k,k)} = V_k z^{(\lambda_k,k)}$, and we transform from standard form (3.3) back to general form (3.1) to get an approximation $x^{(\lambda_k,k)}$ of an optimal regularized solution of (3.1). Computing $y^{(\lambda_k,k)}$ may be done with short-term recurrences (in $k$, for fixed $\lambda$), without storage of $V_k$. See, for example, [19] for details.
To evaluate the computational cost of this class of hybrid algorithms, we should consider that each iteration requires a matrix-vector product with $AM_A^\dagger$ and one with its transpose. Note that $AM_A^\dagger v$ is computed as the two subsequent "products" $A(M_A^\dagger v)$. The product $M_A^\dagger v$ is actually computed implicitly: for instance, if $M_A^\dagger = M^{-1}$, then $M^{-1}v = g$ is the same as $v = Mg$, so in practice the system $Mg = v$ is solved for $g$, and $M^{-1}$ is never explicitly formed. Thus, one usually refers to the "product" $M_A^\dagger v$ as "applying $M_A^\dagger$". Unfortunately, applying $M_A^\dagger$ in the case of the $A$-weighted pseudo-inverse defined in (3.4) requires additional solver calls and, in general, each solver call can itself involve invoking iterative solvers. Therefore, analyzing the computational cost of the products $M_A^\dagger v$, and of this hybrid algorithm as a whole, may be difficult. In any case, the overall cost of producing a regularized solution depends predominantly on $k$ and on the cost of applications of $M_A^\dagger$ and matrix-vector products with $A$ (likewise, their transposes); we highlight again that the computational cost is essentially independent of the number of discrete values of $\lambda$ tested to determine the optimal regularization parameter.
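The projection step of this subsection can be sketched in the standard-form case $M = I$, so that $AM_A^\dagger = A$ (a simplification made purely for illustration; the helper function and the random test data are our own assumptions):

```python
import numpy as np

def golub_kahan(A, b, k):
    """k steps of Golub-Kahan bidiagonalization with reorthogonalization:
    A V_k = U_{k+1} B_k, with U_{k+1} (beta e_1) = b."""
    m, n = A.shape
    U = np.zeros((m, k + 1)); V = np.zeros((n, k)); B = np.zeros((k + 1, k))
    beta = np.linalg.norm(b)
    U[:, 0] = b / beta
    for j in range(k):
        v = A.T @ U[:, j]
        v -= V[:, :j] @ (V[:, :j].T @ v)          # reorthogonalize against V
        alpha = np.linalg.norm(v)
        V[:, j] = v / alpha; B[j, j] = alpha
        u = A @ V[:, j]
        u -= U[:, :j + 1] @ (U[:, :j + 1].T @ u)  # reorthogonalize against U
        beta_j = np.linalg.norm(u)
        U[:, j + 1] = u / beta_j; B[j + 1, j] = beta_j
    return U, B, V

# Project, then solve the small problem min ||B z - beta e1||^2 + lam^2 ||z||^2.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 30))
b = A @ np.ones(30) + 0.01 * rng.standard_normal(40)
k, lam = 10, 0.5
U, B, V = golub_kahan(A, b, k)
rhs = np.zeros(k + 1); rhs[0] = np.linalg.norm(b)
z = np.linalg.solve(B.T @ B + lam**2 * np.eye(k), B.T @ rhs)
x_k = V @ z                                       # regularized iterate
```

For each candidate value of the regularization parameter, only the small $(k+1)\times k$ problem is re-solved, which is why scanning many parameter values costs almost nothing.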

3.2. Hybrid Method for General Form Tikhonov. Applying $M_A^\dagger$ is often non-trivial [15], especially when $M$ is defined as a product of matrices (see also [10]). The authors of [18] develop an alternative hybrid method based on a joint bidiagonalization algorithm, which we now describe. Here, $M$ can be rectangular and does not need to be of full rank. Avoiding the transformation into standard form (and the need to apply $M_A^\dagger$), and starting instead from (3.2), the formulation
$$\min_{x \in \mathcal{R}(Z_k)} \left\| \begin{bmatrix} A \\ \sqrt{\lambda}\,M \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2 \qquad (3.6)$$
is considered, where a partial joint bidiagonalization for $A$ and $M$ is updated at each iteration as follows:
$$AZ_k = U_{k+1}B_k, \qquad MZ_k = \widehat{U}_k\widehat{B}_k, \qquad (3.7)$$
where $B_k$ is $(k+1)\times k$ lower bidiagonal, $\widehat{B}_k$ is $k\times k$ upper bidiagonal, the matrices $U_{k+1}$, $\widehat{U}_k$ and $Z_k$ have orthonormal columns, and $U_{k+1}(\beta e_1) = b$ with $\beta = \|b\|_2$. Writing $x = Z_k w$ and exploiting (3.7), problem (3.6) reduces to
$$\min_{w \in \mathbb{R}^k} \|B_k w - \beta e_1\|_2^2 + \lambda\|\widehat{B}_k w\|_2^2.$$
This last problem is a Tikhonov-regularized problem of dimension $O(k)$ involving only bidiagonal matrices and, for $k$ small relative to the original problem dimension, it is significantly cheaper to solve. Let us denote by $w^{(\lambda,k)}$ the minimizer of this $O(k)$-dimensional problem: then $x^{(\lambda,k)} = Z_k w^{(\lambda,k)}$ is the minimizer of (3.6). The bidiagonal structure leads to recurrence updates (in $k$, for fixed $\lambda$) for $x^{(\lambda,k)}$, for the residual norm $\|Ax^{(\lambda,k)} - b\|_2$, and for the regularization term $\|Mx^{(\lambda,k)}\|_2$, as functions of $\lambda$. So, for each fixed $k$, the regularization parameter can be computed efficiently using strategies such as the discrepancy principle or the L-curve criterion, which only require those quantities.
Thus, as long as $k$ is large enough that the bidiagonal pair inherits certain spectral properties of the original operators, the impact of $\lambda$ as a regularization parameter for the $k$th system can be observed, a suitable parameter $\lambda_k$ can be selected, and the solution $x^{(\lambda_k,k)} = Z_k w^{(\lambda_k,k)}$ of (3.6) can be readily obtained by employing short-term recurrences: this solution is an estimate of an optimal regularized solution of (3.2). The reader is referred to [18] for further details.
The algorithm requires only that products with $A$, $M$ and their transposes can be computed. The cost of generating the bidiagonal pair is somewhat difficult to quantify. The $k$th step requires one call to LSQR to produce the orthogonal projections that are needed to update the partial joint bidiagonalization (3.7), and each LSQR iteration requires 4 matrix-vector products (one with each of $A$, $A^T$, $M$, $M^T$). If LSQR needs $m_k$ steps, then the cost of a call to LSQR is proportional to $m_k$ times the cost of those 4 matrix-vector products. The value of $m_k$ is typically small relative to the problem dimension but, depending on $k$, it may affect the overall reconstruction time. Promising methods for keeping $m_k$ small are currently under investigation. Nevertheless, bounds on the behavior of $k$ and $m_k$ are possible, and for some classes of operators (e.g., see the CT image examples in Section 5) both $k$ and $m_k$ are very small. Even in cases where $k$ and/or $m_k$ are larger, when this approach is used as part of the inner-outer edge-preserving iterative scheme below, the overall performance in producing quality images without any parameter tuning favors our approach over multiple calls per $\lambda$ to sophisticated optimization routines, whose behavior and performance are at least as difficult to analyze.
4. New Inner-Outer Iterative Algorithm for Edge Preservation. So far, we have seen that hybrid iterative methods for 2-norm Tikhonov-regularized problems are computationally appealing for two reasons: (a) their cost is associated with the number of iterations, k, and the ability to compute matrix-vector products, thus requiring little storage (although a precise computational cost estimate can be difficult; see Sections 3.1 and 3.2); (b) a good value for λ can be computed on-the-fly essentially for free, with strategies like the discrepancy principle or the L-curve criterion. On the other hand, choosing the 2-norm of the gradient as a regularizer has a smoothing effect; changing to the p-norm of the gradient, or changing to TV regularization, would enhance edges but requires the solution of a sequence of difficult optimization problems for many values of λ.
We therefore look to build an approach that allows us to leverage the capabilities of hybrid algorithms, with their computational efficiency and ability to choose regularization parameters on-the-fly, but has the adaptivity to capture edge information in the reconstruction process. Clearly, based on the discussion in Section 2, this means we cannot use a regularization term of the form $R(x) = \|Lx\|_2^2$. We propose to solve the sequence of problems
$$x^{(*,\ell)} = \arg\min_{x \in \Gamma^{(\ell)}_{k_\ell}} \|Ax - b\|_2^2 + \lambda_{*,\ell}\,\|D^{(\ell)}Lx\|_2^2, \qquad \ell = 1, 2, \ldots, \qquad (4.1)$$
where $\lambda_{*,\ell}$ denotes the "optimal" regularization parameter for the $\ell$th problem, chosen according to some criterion. The $\ell$th problem computes a solution over a subspace of dimension $k_\ell$ (i.e., the subspace dimension can vary with $\ell$, hence the subscript on $\Gamma$). In this section, we will answer the following questions:
• How can $D^{(\ell)}$ be expressed, and how should it evolve to produce edge-enhanced images?
• How can $x^{(*,\ell)}$ be obtained efficiently?
4.1. Choosing the Regularization Operator. Suppose that an initial solution estimate $x^{(*,0)}$ is available. Assume further that it is a good enough estimate that at least one edge is visible. If we compute the gradient image (as a matrix-vector product) and normalize it, i.e.,
$$g^{(0)} = \frac{|Lx^{(*,0)}|}{\|Lx^{(*,0)}\|_\infty}, \qquad (4.2)$$
where $|\cdot|$ in the numerator denotes the element-wise absolute value, then this vector will have its largest values, equal to 1, where the dominant edges are located (these are precisely the values we do not want to penalize) and smaller nonnegative values where pixels are still corrupted by noise (these are precisely the values we do want to penalize). If we now consider the "image"
$$d^{(1)} = \mathbf{1} - \left(g^{(0)}\right)^{\bullet p}, \qquad (4.3)$$
where $\mathbf{1}$ is the vector of all ones and $\bullet$ denotes element-wise exponentiation, then the values we do not want to penalize have been mapped to the smallest values $\geq 0$, and the noisy parts of the image we want to penalize have been mapped to the largest values $\leq 1$. Choosing $p \gg 1$ in (4.3) results in more penalization of the supposedly smooth regions (as more entries in the vector $d^{(1)}$ are forced to be close to 1), and therefore in an overall smoother reconstruction. Choosing $p \ll 1$ in (4.3) results in less penalization of the supposedly smooth regions (as fewer entries in the vector $d^{(1)}$ are forced to be close to 1), and therefore in an overall less smooth reconstruction. Thus if, for $\ell = 1$, we use
$$D^{(1)} = \mathrm{diag}\left(d^{(1)}\right)$$
in (4.1), this has the effect of enforcing smoothness on the parts of the image where we still expect to be able to see, and improve, smoothness; at the same time, it does not wash out the "true" edges, because we have (almost) zeroed out those components of the gradient image, as prescribed by the diagonal weighting in (4.1).
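The construction of the normalized gradient and of the weight vector is inexpensive and easy to check on a small example. The 1-D test signal below is a hypothetical illustration of ours, not a problem from this paper:

```python
import numpy as np

def edge_weights(Lx, p):
    """d = 1 - (|Lx| / ||Lx||_inf)^p: near 0 on dominant edges (little penalty),
    near 1 in flat or noisy regions (full smoothing penalty)."""
    g = np.abs(Lx) / np.linalg.norm(Lx, np.inf)
    return 1.0 - g ** p

# Noisy 1-D piecewise-constant profile with one true jump at index 50.
rng = np.random.default_rng(2)
x = np.concatenate([np.zeros(50), np.ones(50)]) + 0.01 * rng.standard_normal(100)
Lx = np.diff(x)                  # 1-D forward differences (discrete gradient)
d = edge_weights(Lx, p=2)
# The gradient entry at the jump gets weight ~0; smooth regions get weight ~1.
```

Larger values of `p` push more of the small entries of `d` toward 1, mirroring the discussion of the role of $p$ above.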
Suppose for the moment that $\lambda_{*,1}$ is known, and that $x^{(*,1)}$ has been determined (as described in the next subsections).
If $x^{(*,1)}$ is an improvement over $x^{(*,0)}$, then the normalized gradient image $g^{(1)} = |Lx^{(*,1)}| \,/\, \|Lx^{(*,1)}\|_\infty$ should reveal even more edge information. We set $d^{(2)} = \mathbf{1} - (g^{(1)})^{\bullet p}$, so that again edge information we do not want to penalize has been mapped to the smallest values $\geq 0$, and noise-contaminated components have been mapped to the largest values $\leq 1$. We then take $D^{(2)} = \mathrm{diag}(d^{(2)})\,D^{(1)}$.
Notice that $D^{(2)}$ encodes information about both of the previous solution estimates $x^{(*,0)}$ and $x^{(*,1)}$, and its entries still remain between 0 and 1. The null space of $D^{(2)}L$ is spanned by the constant vector $\mathbf{1}$ and by the piecewise-constant vectors having edges corresponding to the dominant edges of $x^{(*,1)}$ and $x^{(*,0)}$. This process is iterated. More insight into this choice of diagonal weighting matrices is given in [28]. More algorithmic details are unfolded in Sections 4.2 and 4.3 below.

4.2. Choosing the Regularization Parameter and Computing the Solution. We now further elaborate on techniques for the computation of the regularization parameter and the corresponding solution for the $\ell$th problem in (4.1).
We can use a hybrid method to determine $\lambda_{*,\ell}$ and $x^{(*,\ell)}$. In our numerical examples, we use the method described in Section 3.2, together with either the discrepancy principle or the L-curve criterion for the projected problem (3.6). The following statements can easily be reformulated to work with the hybrid method described in Section 3.1.
The discrepancy principle can be applied if a good estimate of the norm $\|\eta\|_2$ of the noise affecting the data is available. When used at the $k$th iteration of the hybrid solver for (4.1), it consists in determining the parameter $\lambda_k$ such that
$$\|B_k w^{(\lambda_k,k)} - \beta e_1\|_2 = \tau\|\eta\|_2, \qquad (4.4)$$
where $\tau > 1$, $\tau \approx 1$, is a safety threshold, and where the factorizations (3.7) and the properties of the matrices appearing therein have been exploited. Equation (4.4) is nonlinear with respect to $\lambda_k$, and an appropriate zero-finder should be employed (see, for instance, [24]). Problem (4.4) can be solved at a negligible additional computational cost, provided that $k$ is much smaller than the original problem size $\min\{m,n\}$, and by updating some relevant factorizations of $B_k$ as $k$ increases. The inner iterations (in $k$) are stopped when no significant variation in $\lambda_k$ is detected for two consecutive values of $k$. Alternatively, one can employ the so-called "secant method" [12], which updates $\lambda_k$ in such a way that stopping by the discrepancy principle is ensured. When the stopping criterion is satisfied, we take $k_\ell = k$, and the optimal parameter $\lambda_{*,\ell}$ for the $\ell$th problem in (4.1) is set to $\lambda_{k_\ell}$; also, $\Gamma^{(\ell)}_{k_\ell} = \mathcal{R}(Z_{k_\ell})$, and $x^{(*,\ell)} = Z_{k_\ell} w^{(\lambda_{k_\ell},k_\ell)}$. The L-curve criterion can be employed at the $k$th iteration of the hybrid solver for (4.1) even when an estimate of $\|\eta\|_2$ is not available: in this case, the horizontal axis is determined by $\|B_k w^{(\lambda,k)} - \beta e_1\|_2 = \|Ax^{(\lambda,k)} - b\|_2$, and the vertical axis is determined by $\|\widehat{B}_k w^{(\lambda,k)}\|_2 = \|M^{(\ell)}x^{(\lambda,k)}\|_2$, where $M^{(\ell)} = D^{(\ell)}L$; these quantities are evaluated, at each iteration $k$, for a given fixed discrete set of values of $\lambda$, which correspond to points on the L-curve. The inner iterations (in $k$) are stopped when the corner of an "L" is visible for a few iterations, and the estimated corner value is constant for a few iterations. When this stopping criterion is satisfied, we take $k_\ell = k$, $\Gamma^{(\ell)}_{k_\ell} = \mathcal{R}(Z_{k_\ell})$, and $\lambda_{*,\ell} = \lambda_{k_\ell}$ as the optimal parameter for the $\ell$th problem (4.1).
The solution $x^{(*,\ell)}$ is generated by a short-term recurrence, and it is therefore very storage-efficient. We refer to [18] for further computational details.
We stress that somewhat different computational approaches should be adopted when considering the discrepancy principle or the L-curve criterion within the inner hybrid solver for the $\ell$th problem in (4.1). This is because, at the $k$th hybrid iteration, the former computes approximate solutions that correspond to regularization parameters sequentially selected by the zero-finder applied to (4.4), while the latter simultaneously computes approximate solutions that correspond to a predetermined set of regularization parameters. However, in both cases, the computational overhead of selecting $\lambda_{*,\ell}$ is negligible if $k \ll \min\{m,n\}$.
Like the discrepancy principle and the L-curve criterion, any parameter choice rule that relies on the current residual or solution (semi)norms is an attractive option for hybrid methods.
Clearly, this process of adaptively computing optimal solutions and updating the regularization operator to account for newly acquired edge information can be repeated for the $(\ell+1)$th problem in (4.1). A sketch of the resulting algorithm is presented in the next section.
4.3. Summary of the Main Algorithm. As already highlighted, our new algorithm iteratively improves available solution estimates by recovering and exploiting edge information on the fly. This algorithm is inherently based on an inner-outer iteration scheme, where the inner iterations are handled by a hybrid algorithm, and the outer iterations update the available regularization term. The main operations performed by the new method are summarized in Algorithm 1.
The outer iterations in Algorithm 1 need a starting guess $x^{(*,0)}$ to compute the first weighting matrix $D^{(1)}$. Although Algorithm 1 can accept any $x^{(*,0)}$ (even one determined by a different regularization method), a natural choice is to take $x^{(*,0)}$ as a constant vector: in this way, for $\ell = 1$, we get $D^{(1)} = I$, and the first problem in the sequence of quadratic problems (4.1) is actually a Tikhonov regularization problem with $R(x) = \|Lx\|_2^2$. A constant $x^{(*,0)} = 0$ will be used for the numerical experiments in Section 5.
Two appropriate stopping criteria should be set when implementing Algorithm 1. The first one prescribes how to terminate the hybrid (inner) iterations, and should be devised in connection with the regularization parameter choice strategy, as discussed in Section 4.2 (basically, we monitor the stabilization of the approximate $\lambda_k$ along consecutive hybrid iterations). The second one prescribes how to terminate the outer iterations: since we would ideally like to iterate until there is no more real edge information to recover, we track $\|L x^{(*,\ell)}\|_2$. If we detect that this quantity decreases at step $\ell$, then we have likely oversmoothed our solution, so we break out of the (outer) loop.
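The outer stopping rule can be sketched as a simple loop skeleton; `solve_inner` and `update_weights` below are placeholders for the hybrid solver and the weight update of Section 4.1, not the paper's actual implementation:

```python
import numpy as np

def run_outer(solve_inner, update_weights, L, x0, max_outer=20):
    """Skeleton of the inner-outer scheme: terminate the outer loop as soon
    as the seminorm ||L x|| decreases, signalling likely oversmoothing.
    (In practice one may prefer to return the previous iterate instead.)"""
    x, seminorm_prev = x0, -np.inf
    for ell in range(1, max_outer + 1):
        D = update_weights(x)            # diagonal weights from current edge estimate
        x = solve_inner(D)               # hybrid solve of the ell-th problem (4.1)
        seminorm = np.linalg.norm(L @ x)
        if seminorm < seminorm_prev:     # oversmoothing detected: stop
            break
        seminorm_prev = seminorm
    return x
```

Because the seminorm grows while genuine edges are being sharpened and drops once the regularization starts flattening them, its first decrease is a cheap proxy for "no more edge information to recover".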

4.4. Analyzing the Main Algorithm.
Directly from the definition of the diagonal weights given in Section 4.1, it is immediate to state that:
1. If a diagonal entry of $D^{(\ell)}$ becomes 0, it stays 0 at future outer iterations.
2. The $i$th diagonal entries of two consecutive weighting matrices satisfy the non-increase property (4.5).
3. Large values in the gradient images incur smaller weights, which are even smaller when $p > 1$ is selected.
To further analyze the behavior of Algorithm 1, we ignore the subspace constraint $\Gamma^{(\ell)}_{k_\ell}$ in (4.1) and consider the sequence of minimization problems
$$x^{(\lambda,\ell)} = \arg\min_x \|Ax - b\|_2^2 + \lambda \|D^{(\ell)} L x\|_2^2. \tag{4.6}$$
This assumption simplifies the following derivations, and it is also quite realistic, because the hybrid solutions of (4.1) resemble the minimizers of (4.6) for $k$ sufficiently large (i.e., when enough inner hybrid iterations are performed). Let us first give a qualitative overview of the role played by the regularization parameter $\lambda$ in (4.6). For smaller $\lambda$, the model fidelity, acting through the operator $A^T A$, is the dominant term in the cost functional; for larger $\lambda$, the regularization (derivative) term is dominant. Note that solving problem (4.6) is mathematically equivalent to solving the normal equations
$$\left(A^T A + \lambda L^T (D^{(\ell)})^2 L\right) x = A^T b. \tag{4.7}$$
In this reformulation it is easier to see that the second term plays the role of a diffusion-like operator (see, for instance, [2,30]). Diffusion operators are frequently used in denoising (see, for instance, [5] and references therein), so we may also state that a large $\lambda$ favors denoising. When $D^{(1)} = I$ (e.g., when Algorithm 1 runs with a constant initial guess $x^{(*,0)}$), the applied regularization simply enforces smoothness through the derivative. In our problems, we know that the image is not smooth, so overly smooth solution estimates result in a large data-fidelity mismatch. Therefore, any reasonable parameter selection routine would not select a $\lambda$ that is too large, since doing so would return a smooth solution at the price of a large residual; such a choice is surely avoided when the discrepancy principle is adopted.
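Properties 1-3 can be mimicked on a toy example. The weight formula below is one plausible instance of a mapping from a normalized gradient image to $[0,1]$, not the paper's exact (4.3); the monotone `minimum` update makes properties 1 and 2 hold by construction in this sketch:

```python
import numpy as np

def edge_weights(grad, p=1.0):
    """Map a gradient image to [0,1]: large gradients get small weights.
    Illustrative formula only (the paper's (4.3) may differ)."""
    g = np.abs(grad) / max(np.abs(grad).max(), 1e-15)   # normalize to [0,1]
    return (1.0 - g) ** p

def update_weights(D_old, grad, p=1.0):
    """Monotone update: entries can only decrease, so a zero weight stays
    zero at all future outer iterations (properties 1 and 2)."""
    return np.minimum(D_old, edge_weights(grad, p))
```

For property 3, note that raising a number in $(0,1)$ to a power $p > 1$ makes it smaller, so larger $p$ penalizes detected edges even less.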
However, if $x^{(*,0)}$ is already quite successful in revealing the edges of $x_{\mathrm{true}}$, then, for any vector $v$ of appropriate size, the seminorm $\|D^{(2)} L v\|_2$ penalizes $v$ only away from those edges, so that, when running the second outer iteration of Algorithm 1, a minimizer of (4.6) with larger $\lambda$ does not enforce smoothness uniformly across the entire solution, and could be tolerated. Indeed, the second outer iteration behaves more like denoising than deblurring, because the weights are large in regions where no edges were detected, and those are regions where a good amount of smoothing is still needed. Thus, we expect the parameter selection strategy to choose a larger value of $\lambda$ in the second step. We expect this to be a common trend also in the following outer iterations, meaning that there is increased reliance on the denoising provided by the regularization term as the outer iterations progress. We can prove that this is exactly the case when the discrepancy principle is used to select $\lambda$ for the $\ell$th problem in (4.6). This result (stated in Proposition 4.2) requires a fair amount of derivations, and it is preceded by a lemma (Lemma 4.1) stating that the discrepancy curves, i.e., the curves
$$\lambda \mapsto \|b - A x^{(\lambda,\ell)}\|_2, \tag{4.8}$$
decrease with respect to the outer iteration counter $\ell$.
Lemma 4.1. Assume that the null spaces of $A$ and $D^{(\ell)} L$ intersect trivially, for $\ell = 1, 2, \dots$. Let $r^{(\lambda,\ell)} = b - A x^{(\lambda,\ell)}$ be the discrepancy associated to the $\ell$th problem in the sequence (4.6), with $\lambda = \lambda_\ell \ge 0$, $\ell = 1, 2, \dots$. Then
$$\|r^{(\lambda,\ell+1)}\|_2^2 \le \|r^{(\lambda,\ell)}\|_2^2. \tag{4.9}$$
Proof. Consider the $\ell$th problem in the sequence (4.6). We first have to introduce some notations. Let
$$\bar{A}^{(\ell)} = A^T A + \lambda_\ell L^T (D^{(\ell)})^2 L,$$
i.e., the matrix appearing on the left-hand side of (4.7), with $\lambda = \lambda_\ell \ge 0$. Let $h$ be the number of indices $i \in \{1, \dots, n\}$ at which $D^{(\ell+1)}$ and $D^{(\ell)}$ differ, i.e., the diagonal weighting matrices computed at two consecutive iterations differ in at most $h \ge 0$ entries, and let $\bar{A}^{(\ell),j}$, $j = 0, \dots, h$, be the intermediate matrices obtained by updating these entries one at a time, with $\bar{A}^{(\ell),h} = \bar{A}^{(\ell)}$ and $\bar{A}^{(\ell),0} = \bar{A}^{(\ell+1)}$. Then, thanks to (4.5), we can write
$$\bar{A}^{(\ell),j} - \bar{A}^{(\ell),j-1} = w w^T, \tag{4.10}$$
i.e., the difference between two consecutive matrices of the form $\bar{A}^{(\ell),j}$ is a symmetric rank-1 matrix. Let us fix $j \in \{0, \dots, h-1\}$.
If we show that $\|r^{(\lambda,\ell),j}\|_2^2 \ge \|r^{(\lambda,\ell),j-1}\|_2^2$ for each $j$, we can conclude that
$$\|r^{(\lambda,\ell)}\|_2^2 := \|r^{(\lambda,\ell),h}\|_2^2 \ge \|r^{(\lambda,\ell),h-1}\|_2^2 \ge \cdots \ge \|r^{(\lambda,\ell),0}\|_2^2 =: \|r^{(\lambda,\ell+1)}\|_2^2.$$
Thanks to the Sherman-Morrison formula (see, e.g., [13]),
$$\left(\bar{A}^{(\ell),j-1}\right)^{-1} = \left(\bar{A}^{(\ell),j} - w w^T\right)^{-1} = \left(\bar{A}^{(\ell),j}\right)^{-1} + \frac{\left(\bar{A}^{(\ell),j}\right)^{-1} w w^T \left(\bar{A}^{(\ell),j}\right)^{-1}}{1 - w^T \left(\bar{A}^{(\ell),j}\right)^{-1} w},$$
where we have used twice that $(\bar{A}^{(\ell),j})^{-1}$ is symmetric positive definite. Then
$$\|r^{(\lambda,\ell),j}\|_2^2 - \|r^{(\lambda,\ell),j-1}\|_2^2 = \delta^{(1)} + \delta^{(2)},$$
where we have defined $\bar{w} = A (\bar{A}^{(\ell),j})^{-1} w$. Here $\delta^{(2)} \ge 0$, since $\bar{w} \bar{w}^T$ is symmetric positive semi-definite. If we show that $\delta^{(1)} \ge 0$, then we have (4.11), and therefore (4.9). Let the GSVD of the matrix pair $(A, D^{(\ell),j} L)$ be given as in [13]. Then, exploiting the properties of the matrices involved in this decomposition, (4.10), and the definitions of $w$ and $\bar{w}$, it follows that $\delta^{(1)} \ge 0$, which completes the proof.
Proposition 4.2. Under the assumptions of Lemma 4.1, let $\lambda^{*,\ell}$ be the regularization parameter selected by the discrepancy principle for the $\ell$th problem in the sequence (4.6). Then
$$\lambda^{*,\ell} \ge \lambda^{*,\ell-1}, \quad \ell = 2, 3, \dots. \tag{4.12}$$
Proof. Take $\ell \ge 1$. Thanks to (4.9), the discrepancy curve (4.8) computed for the $\ell$th problem lies below the discrepancy curve computed for the $(\ell-1)$th problem. Also, one can easily prove that, for a fixed $\ell$ and a variable $\lambda \ge 0$, the discrepancy curve increases with respect to $\lambda$. Since applying the discrepancy principle consists in solving $\|b - A x^{(\lambda,\ell)}\|_2 = \tau \|\eta\|_2$, this implies (4.12). More generally, we experimentally find that the parameters $\lambda^{*,\ell}$ (often) satisfy $\lambda^{*,\ell} \ge \lambda^{*,\ell-1}$ also when inner hybrid iterations are performed (i.e., when problem (4.1) is solved instead of (4.6)), and that this also happens when the L-curve criterion is used instead of the discrepancy principle (see Section 5).
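The monotonicity asserted by Lemma 4.1 is easy to check numerically on a tiny instance: with $\lambda$ fixed, decreasing one diagonal weight (a symmetric rank-1 change of the normal-equations matrix, as in (4.10)) does not increase the discrepancy. The sketch below solves (4.7) directly with small random stand-ins for the operators:

```python
import numpy as np

def discrepancy(A, b, L, d, lam):
    """Solve the normal equations (4.7) for diagonal weights d and
    return the discrepancy ||b - A x||_2."""
    M = A.T @ A + lam * L.T @ np.diag(d**2) @ L
    x = np.linalg.solve(M, A.T @ b)
    return np.linalg.norm(b - A @ x)
```

Reducing a weight relaxes the penalty on one gradient component, which lets the minimizer fit the data at least as well; this is exactly the mechanism the lemma formalizes via rank-1 downdates.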
We conclude this section by stressing again that Algorithm 1 works well when the initial solution estimates $x^{(*,0)}$ or $x^{(*,1)}$ are able to reveal at least one decent edge: if the initial estimates are too smooth, which might be the case, for example, in deblurring with a large noise-to-signal ratio, our method can fail, as the mapping from the normalized gradient image to $[0,1]$ (described in equation (4.3)) may be totally inappropriate. The qualitative performance of Algorithm 1 also depends on the success of the parameter choice strategy but, as we show in the next section, it works quite well for many applications, providing significant enhancement over the initial solution estimate in a completely automated fashion.

5. Numerical Results.
All experiments were performed in MATLAB version 9.1. The test problems come from the MATLAB software packages IR Tools [9] and AIR Tools II [16].
In this section we assess the potential of our new algorithm, mainly by comparing it to other similar inner-outer iteration methods for Total Variation regularization. Namely, we consider the IRN method for Total Variation (IRN-TV) proposed in [31]: similarly to our new method, IRN-TV updates some weights and defines a sequence of least-squares problems (outer iterations), and solves each iteratively reweighted least-squares problem iteratively (inner iterations). IRN-TV and our new method mainly differ in the way the weights are defined. The IRN-TV weights $D^{(\ell)}$ for the $\ell$th least-squares problem (4.1) have the expression given in (5.1), where $x^{(*,\ell-1)}$ is the solution of the $(\ell-1)$th least-squares problem, $I_2$ is the identity matrix of order 2, and $1 \le q < 2$. Taking $q = 1$, fixing $y = \operatorname{vec}(Y)$, $Y \in \mathbb{R}^{N_h \times N_v}$, and evaluating the above weights at $y$ leads to $\|(W(y))^{1/2} L y\|_2^2 = \mathrm{TV}(y)$, where $L$ is the discrete gradient operator (2.3) and $\mathrm{TV}(\cdot)$ is defined as in (2.4). Because of this, we can expect the reweighted regularization term in IRN-TV to be a good approximation of $\mathrm{TV}(x_{\mathrm{true}})$ when $x^{(*,\ell-1)}$ is a good approximation of $x_{\mathrm{true}}$. Moreover, when $[\operatorname{vec}(L_v X)^2 + \operatorname{vec}(X L_h^T)^2]_i$ is large (typically when there is an edge between the $i$th and the $(i+1)$th pixel), then $[(W_R^{(\ell)})^{1/2}]_{ii}$ is small (so that the corresponding gradient component is not heavily penalized in the minimization process); vice versa, when $[\operatorname{vec}(L_v X)^2 + \operatorname{vec}(X L_h^T)^2]_i \approx 0$ (typically corresponding to smooth portions of $X$), then $[(W_R^{(\ell)})^{1/2}]_{ii}$ is very large (so that the corresponding gradient component is penalized even more in the minimization process). IRN-TV and our new method also differ in the way each least-squares problem is solved: IRN-TV assumes that a good value of the regularization parameter is fixed, and employs CGLS for the inner iterations.
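The link between the reweighted seminorm and TV for $q = 1$ is easy to verify in one dimension (the 2-D weights (5.1) couple vertical and horizontal derivatives, which we skip for brevity). The small smoothing parameter `eps` below is a common practical safeguard against division by zero and may differ from the exact formulation in [31]:

```python
import numpy as np

def irn_weights_1d(y, q=1.0, eps=1e-8):
    """IRN-style diagonal weights for a 1-D signal: |grad y|^(q-2),
    smoothed by eps (illustrative, not the exact formula of [31])."""
    g = np.diff(y)
    return (g**2 + eps**2) ** ((q - 2) / 2)

def weighted_seminorm_sq(y, w):
    """Evaluate ||W^{1/2} L y||_2^2 for the 1-D discrete gradient L."""
    g = np.diff(y)
    return float(np.sum(w * g**2))
```

With $q = 1$ the weight of each gradient component is its reciprocal magnitude, so the weighted squared seminorm collapses to the sum of absolute gradient values, i.e., the (anisotropic) total variation.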
We also make comparisons with two inner-outer iterative schemes that can be placed somewhere in between IRN-TV and our new algorithm, namely: (a) we define the weights at the beginning of each outer iteration as in (5.1), and we solve each quadratic problem by the hybrid method described in Section 3.2, with the added benefit of adaptively choosing the regularization parameter; (b) we define the weights at the beginning of each outer iteration as in Section 4.1, and we solve each quadratic problem by CGLS, with a fixed regularization parameter.

5.1. Examples from X-ray CT.
Parallel-beam CT. We use IR Tools to set up the following X-ray tomography simulation: • Generate a simulated true phantom image called "grains" from AIR Tools II, of size 128 × 128 pixels.
• Construct noise-free projections, along with the matrix A that simulates the ray-trace forward operator, assuming parallel-beam X-ray transmission, with data collected at angles 0, 2, . . . , 130 degrees.
First of all, we test our new algorithm with different parameter choice strategies employed within the inner hybrid scheme for generalized Tikhonov regularization (Section 3.2). More precisely, we consider the discrepancy principle and the L-curve criterion.
[Figure: Parallel-beam CT test problem. Left frame: logarithmic plot of the discrepancy curves, i.e., $\|A x^{(\lambda,k_\ell)} - b\|_2$ versus $\lambda$, at the end of selected cycles of inner iterations (corresponding to the outer iterations $\ell = 1, 2, 5, 10, 15$); the red circles denote the intersection between each discrepancy curve and the noise level line, which corresponds to the chosen regularization parameter $\lambda^{*,\ell}$ for that outer iteration. Right frame: logarithmic plot of the L-curves for each outer iteration (the top curve corresponds to the first outer iteration, $\ell = 1$, and the curves below it correspond sequentially to iterations $\ell = 2, 3, \dots, 11, 12$); the red circles denote the corners of each L-curve, which correspond to the chosen regularization parameter $\lambda^{*,\ell}$ for that outer iteration.]
When our algorithm is implemented with the stopping criterion described in Section 4.3 (i.e., stopping as soon as $\|L x^{(*,\ell)}\|_2$ decreases during two consecutive outer iterations), it terminates after $\ell = 15$ iterations if the discrepancy principle is adopted in the inner iterations, and after $\ell = 12$ iterations if the L-curve criterion is adopted. Figure 5.3 shows a plot of the relative errors and chosen regularization parameters at each outer iteration, when the discrepancy principle and the L-curve criterion are used to select the regularization parameter during the inner hybrid iterations. Note that, as expected, the regularization parameters increase as the outer iterations proceed: this illustrates the property derived in Section 4.4 for the discrepancy principle, which experimentally holds also for the L-curve criterion. This behavior of the regularization parameter is meaningful and desirable: indeed, as the outer iterations progress, we recover approximate solutions of enhanced quality that result in improved weights for the regularization term, which in turn should be weighted more to achieve reconstructions of even higher quality. Computed reconstructions for the first outer iteration (that is, $x^{(*,1)}$) and for the final outer iteration (that is, $x^{(*,15)}$ or $x^{(*,12)}$, depending on the parameter choice strategy) are shown in Figure 5.4. As we can see from these plots, there is a significant improvement in the reconstructions; in particular, the edges at the final outer iteration are much sharper than at the first outer iteration. Also, pixel intensities are very close to those of the true image. Since our new algorithm is an inner-outer iterative strategy, and since so far only the behavior of the solution and regularization parameter across the outer iterations has been displayed, Figure 5.5 displays the relative errors against the total number of iterations.
The end of each inner iteration cycle is marked by an asterisk. We can see that each inner iteration cycle consists of 60 iterations, i.e., the maximum allowed number of inner iterations: although the stopping criterion for the inner iterations is not very effective for this example, we can observe that the quality of the reconstruction stabilizes and is almost optimal after some inner iterations (though it seems to slightly deteriorate for the discrepancy principle); this means that both adaptive parameter choice strategies are quite effective. The definition of the weights at each outer iteration depends on a power $p > 0$ (see, for instance, equation (4.3)). In Section 4.1 we argued that the choices $p < 1$ and $p > 1$ correspond to less and more smooth reconstructions, respectively. We now experimentally assess how the value of $p$ affects the quality of the reconstructions when the discrepancy principle is employed to adaptively set the regularization parameter at each inner iteration. Although the considered values of $p$ deliver reconstructions of excellent quality, we can see some slight spurious oscillations in the supposedly constant patches reconstructed taking $p = 0.5$, and we can clearly see that some of the edges reconstructed using $p = 4$ are washed out. Interestingly, still comparing the choices $p = 0.5$ and $p = 4$ in Figure 5.6, we can see that, since $p = 0.5$ intrinsically enforces less smoothness through the weights, a larger regularization parameter is automatically set to compensate for this, and fewer outer iterations are performed; vice versa, since $p = 4$ intrinsically enforces more smoothness through the weights, a smaller regularization parameter is automatically set to compensate for it, and more outer iterations are performed.
When generating the following graphs, and when considering all the following test problems, we always take the same value of $p$. We now consider some comparisons between the new solver and different inner-outer iterative methods for edge enhancement in imaging. First of all, we assess the effect of a different choice of the weight matrix: namely, we keep an inner iteration scheme based on the hybrid solver for general-form Tikhonov regularization described in Section 3.2, together with the discrepancy principle to adaptively set the regularization parameter, and we choose the weights as in (5.1), i.e., the weights associated with the IRN-TV method. In Figure 5.8 we display the behavior of the relative errors and regularization parameters versus the number of outer iterations, considering the new weights and the IRN-TV ones. We can clearly see that the IRN-TV method seems to perform well during the early (outer) iterations, but then the quality of the reconstructed solution rapidly stagnates and deteriorates, while the new method keeps improving. Also, the regularization parameter selected by the discrepancy principle is almost constant right from the early outer iterations when the IRN-TV weights are used, while the regularization parameter selected by the same rule keeps increasing when the new weights are used. Correspondingly, Figure 5.9 shows the reconstructions computed by the two methods at selected outer iterations: namely, at iterations $\ell = 2$ (i.e., as soon as the reweighting becomes effective; see Section 4.3), $\ell = 6$ (i.e., when the relative error associated with the IRN-TV weights is lower than the one obtained employing the new weights), and $\ell = 15$ (i.e., when the stopping criterion for the outer iterations is satisfied employing the new weights). We can clearly see that, contrary to the new method, the reconstructions obtained using the IRN-TV weights do not improve much as the outer iterations proceed, and piecewise constant features are never completely recovered.
In order to better understand the reasons behind the different performances of these two inner-outer iterative schemes, in Figure 5.10 we display the entries of the weighting diagonal matrices at selected outer iterations, rearranged as images. Note that, directly from the definition of the weights (see (4.3) and (5.1)), when considering the new weights it is meaningful to display two images (one for the weights applied to the vertical derivatives, and one for the weights applied to the horizontal derivatives), while a single image suffices when considering the IRN-TV weights (because the same weights, containing information about both the vertical and the horizontal derivatives, are applied to both). Looking at Figure 5.10, it is evident that the new weights associated with both the vertical and horizontal derivatives correctly recover most of the edge locations (which are mapped to the smallest values, close to 0) and the smooth regions (which are mapped to the largest values, close to 1): in this way, appropriate penalization takes place, and both the reconstructions and the weights improve along the outer iterations. The same is not true for the IRN-TV weights: while the locations of the edges are somewhat recovered (and the smallest weights are assigned to them), the smooth regions do not properly show up (highly oscillatory small weights are assigned to them, too). These weights are clearly not effective in enforcing piecewise smoothness, and almost no improvement can be seen in either the reconstructions or the weights as the outer iterations proceed.
Finally, we show two more comparisons that assess the influence of the inner iterative solver on the overall behavior of the method. Namely, we consider the inner-outer iterative schemes implemented with both the new and the IRN-TV weights, and with both the hybrid and the CGLS methods as inner solvers. Since CGLS requires a fixed value of the regularization parameter to be available in advance of each cycle of inner iterations, we first run the methods based on the hybrid solver for general-form Tikhonov regularization that adaptively chooses the regularization parameter at each inner iteration according to the discrepancy principle: in this way, a parameter $\lambda^{*,\ell}$ is eventually set at the $\ell$th outer iteration. When running the methods based on CGLS, we take $\lambda^{*,\ell}$ as the regularization parameter for the $\ell$th outer iteration. Figure 5.11 displays the relative errors versus the number of outer iterations for the four instances of inner-outer iterative methods just described. In both frames of Figure 5.11 we can see that the results obtained using CGLS as the inner iterative solver are not as good as the ones obtained using the hybrid strategy; this is most evident when the IRN-TV weights are used. This could be because $\lambda^{*,\ell}$ is expected to be a nearly optimal regularization parameter when the joint bidiagonalization algorithm is used to project the $\ell$th quadratic problem (4.1); however, CGLS projects the $\ell$th quadratic problem onto a different subspace, and the same regularization parameter $\lambda^{*,\ell}$ might not be such a nearly optimal choice for CGLS. Moreover, adaptively choosing the regularization parameter for the CGLS inner iterations might help improve the quality of the reconstructions.
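For the CGLS-based variants, each inner cycle amounts to running CGLS on the Tikhonov problem in stacked least-squares form with the fixed parameter $\lambda^{*,\ell}$. A minimal dense sketch (not the IR Tools implementation):

```python
import numpy as np

def cgls_tikhonov(A, b, M, lam, iters=50):
    """Run CGLS on min_x || [A; sqrt(lam)*M] x - [b; 0] ||_2,
    i.e., Tikhonov regularization with a fixed parameter lam."""
    K = np.vstack([A, np.sqrt(lam) * M])
    c = np.concatenate([b, np.zeros(M.shape[0])])
    x = np.zeros(A.shape[1])
    r = c - K @ x
    s = K.T @ r
    p = s.copy()
    norm_s = s @ s
    for _ in range(iters):
        if norm_s < 1e-28:            # normal-equations residual exhausted
            break
        q = K @ p
        alpha = norm_s / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = K.T @ r
        norm_s_new = s @ s
        p = s + (norm_s_new / norm_s) * p
        norm_s = norm_s_new
    return x
```

Since the parameter is baked into the stacked operator, it must be fixed before the cycle starts; this is exactly why the CGLS variants borrow $\lambda^{*,\ell}$ from the hybrid runs.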
Parallel-beam CT with limited angle. We use IR Tools to vary some of the options defining the previous X-ray tomography simulation. In particular, we still consider a parallel-beam X-ray transmission problem, but we change the phantom image to the so-called "three-phases" one from AIR Tools II, of size 128 × 128 pixels, which has many piecewise constant patterns, and we make the reconstruction problem more challenging by using (limited) projection angles 0, 1, . . . , 90. Within IR Tools, this problem can be generated by just replacing the first line of the MATLAB statements specific to the previous example, i.e., we define a new ProblemOptions structure in the following way: ProblemOptions = PRset('angles', 0:90, 'phantomImage', 'threephases'); Figure 5.12 shows the true phantom image, along with the measured data b (which are again corrupted by Gaussian white noise of level $10^{-3}$). As in the previous example, we first run our algorithm with different parameter choice strategies within the inner hybrid scheme for generalized Tikhonov regularization: when the discrepancy principle and the L-curve criterion are used, the stopping rule of Section 4.3 (decrease of $\|L x^{(*,\ell)}\|_2$) is satisfied after $\ell = 14$ and $\ell = 10$ outer iterations, respectively. Figure 5.13 shows a plot of the relative errors and chosen regularization parameters at each outer iteration. We can clearly see that, for both the discrepancy principle and the L-curve criterion, the reconstructions greatly improve as the outer iterations proceed, with the latter strategy being more efficient (it employs fewer outer iterations to achieve a relative error comparable to the one obtained by the discrepancy principle at the end of the iterations).
Note that, as expected (see again the theory presented in Section 4.4), the regularization parameters increase as the outer iterations proceed and, because of this and the improved identification of the edges, the relative errors generally decrease. The behavior of the relative error versus the total number of iterations, and the dependence of the results on the power $p > 0$ appearing in the weights, are analogous to those displayed for the previous test problem.
Next, we compare the new method to other inner-outer iterative methods for edge enhancement in imaging. Figure 5.14 displays the relative error and regularization parameter values versus the number of outer iterations for the new method, and for a method that still employs the hybrid solver for general-form Tikhonov regularization to handle the inner iterations while updating the IRN-TV weights (5.1) at each outer iteration. For both methods, the discrepancy principle is employed to adaptively choose the regularization parameter at each inner iteration. As for the previous test problem, both methods perform similarly during the first (outer) iterations, but then the quality of the IRN-TV solutions rapidly stagnates and eventually worsens, while the new method keeps improving. With respect to the previous test problem, however, IRN-TV noticeably performs better in this setting. Also, the value of the automatically selected regularization parameter is always quite small for IRN-TV, with a steep decrease during one of the final outer iterations: this could explain why the method based on IRN-TV stagnates and worsens at some point, namely, the method fails to give more weight to the regularization term when the latter should become more accurate. Figure 5.15 assesses the influence of the inner iterative solver on the overall behavior of the method. Namely, we consider the inner-outer iterative schemes implemented with both the new and the IRN-TV weights, and with both the hybrid and the CGLS methods as inner solvers. The hybrid method chooses the regularization parameter adaptively according to the discrepancy principle as the iterations proceed, and the value $\lambda^{*,\ell}$ selected when the $\ell$th inner iteration cycle terminates is taken as the fixed regularization parameter to be set in advance of the $\ell$th CGLS inner iteration cycle. In both frames of Figure 5.15 we can see that the results obtained using CGLS as the inner iterative solver are not as good as the ones obtained using the hybrid strategy, especially when the IRN-TV weights are used. Moreover, the CGLS solution obtained using the new weights seems to deteriorate as the outer iterations proceed (a sort of "semiconvergence" appears): this can probably be avoided if the fixed regularization parameter is chosen differently.
Finally, Figure 5.16 displays some relevant reconstructions. We show the initial reconstructions $x^{(*,1)}$, obtained at the end of the first inner iteration cycle, when a Tikhonov-regularized problem with regularization term $R(x) = \|Lx\|_2^2$ is used, and the reconstructions $x^{(*,12)}$. We consider the inner-outer iterative solvers that employ both the new and the IRN-TV weights, and both the hybrid method (with the discrepancy principle) and CGLS. We can clearly see that, except when the IRN-TV weights are used together with CGLS, there is a significant improvement in the reconstructions as the outer iterations proceed; in particular, the edges at the final outer iteration are much sharper than at the initial outer iteration. Interestingly, when the new weights are used together with CGLS, the edges and piecewise features of the solution are partially recovered. The entries of the weighting diagonal matrices are displayed at iteration $\ell = 2$ (i.e., as soon as the reweighting becomes effective) and at iteration $\ell = 12$, for both the new reweighting strategy (4.3) and the IRN-TV reweighting strategy; when considering the new reweighting strategy, both the weights applied to the vertical derivatives and those applied to the horizontal derivatives are displayed. Edges and piecewise constant portions of the image to be recovered show up remarkably well in the new weights: although these features are not so evident when $\ell = 2$ (indeed, they are defined with respect to the reconstruction $x^{(*,1)}$, which is still very inaccurate; see Figure 5.16), they are recovered as the outer iterations progress. The IRN-TV weights are not as effective in revealing the structure of the image: although the edges seem to be better recovered when comparing the weights used at outer iterations $\ell = 2$ and $\ell = 12$, the piecewise constant parts are mainly assigned small weights, too, and therefore are not so effectively smoothed in the reconstruction process.
Parallel-beam CT with more limited angle. We use IR Tools to vary some of the options defining the previous X-ray tomography simulations. We still consider a parallel-beam X-ray transmission problem, but we change the phantom image to the well-known Shepp-Logan one, of size 128 × 128 pixels. Moreover, we make the reconstruction problem even more challenging by using even more limited projection angles 0, 1, . . . , 45. Again, we generate this test problem within IR Tools, using instructions similar to the previous ones, where a new ProblemOptions structure is defined in the following way: ProblemOptions = PRset('angles', 0:45, 'phantomImage', 'shepplogan'); Figure 5.18 shows the true phantom image, along with the measured data b (which are again corrupted by Gaussian white noise of level $10^{-3}$). As in the previous examples, we first run our algorithm with different parameter choice strategies within the inner hybrid scheme for generalized Tikhonov regularization: when the discrepancy principle is used, the method performs the maximum allowed number of outer iterations, $\ell = 20$; when the L-curve criterion is used, the stopping rule of Section 4.3 (decrease of $\|L x^{(*,\ell)}\|_2$) is satisfied after only $\ell = 7$ outer iterations. We can clearly see that, for both the discrepancy principle and the L-curve criterion, the reconstructions greatly improve as the outer iterations proceed, with the latter strategy being very efficient; however, the discrepancy principle eventually achieves a lower relative error (although the L-curve criterion might still achieve the same relative error, were it not that its stopping criterion is satisfied quite early). Note that, as expected, the regularization parameters increase with the outer iterations. Next, we compare the new method to other inner-outer iterative methods for edge enhancement in imaging.
Figure 5.20 displays the relative error and regularization parameter values versus the number of outer iterations for the new method, and for a method that still employs the hybrid solver for general-form Tikhonov regularization to handle the inner iterations while updating the IRN-TV weights (5.1) at each outer iteration. For both methods, the discrepancy principle is employed to adaptively choose the regularization parameter at each inner iteration. We can clearly see that, when the IRN-TV weights are used, the behavior of the relative errors is quite erratic and, although the quality of the solution computed at the second outer iteration is good, this trend is not maintained during the following outer iterations. This may be because the automatically selected regularization parameter for IRN-TV keeps oscillating between values of the order of $10^{-2}$ and values of the order of $10^{-4}$. Figure 5.21 assesses the influence of the inner iterative solver on the overall behavior of the method. Namely, we consider the inner-outer iterative schemes implemented with both the new and the IRN-TV weights, and with both the hybrid and the CGLS methods as inner solvers. The hybrid method chooses the regularization parameter adaptively according to the discrepancy principle as the iterations proceed, and the value $\lambda^{*,\ell}$ selected when the $\ell$th inner iteration cycle terminates is taken as the fixed regularization parameter to be set in advance of the $\ell$th CGLS iteration cycle. In both frames of Figure 5.21 we can see that the results obtained using CGLS hardly improve as the outer iterations proceed, especially when the IRN-TV weights are used. Again, this can probably be avoided if the fixed regularization parameter is chosen differently.
Finally, Figure 5.22 displays some relevant reconstructions. We show the initial reconstructions $x^{(*,1)}$, obtained at the end of the first inner iteration cycle, when a Tikhonov-regularized problem with regularization term $R(x) = \|Lx\|_2^2$ is used, and the reconstructions $x^{(*,20)}$ obtained when the maximum number of outer iterations is performed. We consider the inner-outer iterative solvers that employ both the new and the IRN-TV weights, and both the hybrid method (with the discrepancy principle as the parameter choice rule) and CGLS. We can clearly see that, when the hybrid method is used as the inner solver, there is a significant improvement in the reconstructions as the outer iterations proceed and, in particular, the edges at the final outer iteration are much sharper than at the initial outer iteration: this is especially true for the new weights, while some artifacts are evident in the IRN-TV reconstructions. As expected, the CGLS reconstructions are still very corrupted, even at the end of the full cycle of outer iterations. Figure 5.17 displays the entries of the weighting diagonal matrices at outer iterations $\ell = 2$ and $\ell = 20$, for both the new reweighting strategy (4.3) and the IRN-TV reweighting strategy. When considering the new reweighting strategy, we can clearly see that both the weights applied to the vertical derivatives and those applied to the horizontal derivatives are appropriate, and improve with increasing outer iterations. Similarly to the previous test problems, the IRN-TV weights are not as effective in revealing the structure of the image; some true and some spurious edges seem to be assigned very small weights at the end of the iterations.

Examples from image deblurring.
Shaking blur. We use IR Tools to set up the following shaking blur simulation:
• Generate a simulated true sharp geometric image, called "pattern1", of size 128 × 128 pixels.
• Construct a noise-free blurred image, along with the matrix A that simulates a shaking blur forward operator of medium intensity.
• Add 0.1% normally distributed (white Gaussian) noise.
More specifically, the test problem is generated with the following MATLAB statements: ProblemOptions = PRset('trueImage', 'pattern1', 'BlurLevel', 'medium'); [A, b_true, x_true, ProblemInfo] = PRblurshake(128, ProblemOptions); b = PRnoise(b_true, NoiseLevel);
Figure 5.25 shows a plot of the relative errors and chosen regularization parameters at each outer iteration. We can clearly see that, when considering both the discrepancy principle and the L-curve criterion, the reconstructions greatly improve as the outer iterations proceed, with the former strategy being very efficient; however, the L-curve criterion eventually achieves a slightly lower relative error. The regularization parameter selected by the L-curve criterion is quite small and almost stagnates during the first outer iterations (where the corners of the L-curves may be hard to distinguish), but increases considerably during the final outer iterations: as remarked in the previous subsection, this behavior is meaningful and desirable. On the contrary, the regularization parameters computed by the discrepancy principle are always either zero or numerically zero (and therefore they are not displayed in the rightmost frame of Figure 5.25). This means that the discrepancy curves almost never cross the noise level line: this could be because the noise level ‖η‖_2 / ‖b_true‖_2 = 10^{-3} is not very high, and/or because this test problem is not very ill-conditioned. It is interesting to note that, despite this, the regularization term favorably affects the approximation subspace for the solution, and the stopping criterion for the inner iterations prevents the quality of the solutions from degenerating.
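The behavior just described can be illustrated with a small standard-form Tikhonov sketch of the discrepancy principle (hypothetical helper names; A is assumed square and full rank for simplicity). The rule seeks λ such that ‖A x_λ − b‖_2 = τδ with δ = ‖η‖_2; when even the unregularized residual already exceeds the target, the discrepancy curve never crosses the noise-level line and a (numerically) zero parameter is returned:

```python
import numpy as np

def discrepancy_lambda(A, b, delta, tau=1.01, lam_max=1e4, tol=1e-10):
    """Illustrative discrepancy principle for standard-form Tikhonov
    (A assumed square and full rank): find lam such that
    ||A x_lam - b||_2 = tau * delta, with x_lam = (A^T A + lam I)^{-1} A^T b
    and delta = ||eta||_2. Returns 0 when the discrepancy curve never
    crosses the noise-level line."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b

    def res(lam):
        # residual norm via the SVD: (1 - filter factor) = lam / (s^2 + lam)
        return np.linalg.norm(beta * lam / (s**2 + lam))

    target = tau * delta
    if res(0.0) >= target:
        return 0.0  # no crossing: the rule selects a (numerically) zero parameter
    lo, hi = 0.0, lam_max  # res is increasing in lam, so bisect
    while hi - lo > tol * max(hi, 1.0):
        mid = 0.5 * (lo + hi)
        if res(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the hybrid method the same rule is applied to the projected problem at each inner iteration, which is far cheaper than working with the full-size operator.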
Next, we compare the new method to other inner-outer iterative methods for edge enhancement in imaging. Figure 5.26 displays the relative error and regularization parameter values versus the number of outer iterations for the new method, and for a method that still employs the hybrid solver for general-form Tikhonov regularization to handle the inner iterations while using the IRN-TV weights (5.1) at each outer iteration. For both methods, the L-curve criterion is employed to adaptively choose the regularization parameter at each inner iteration. We can clearly see that, when the IRN-TV weights are used, the behavior of the relative errors is quite erratic and, although the quality of the solution computed at the second outer iteration is good, this trend is not maintained during the following outer iterations. This situation may improve if alternative stopping criteria for the inner iterations are adopted; note also that the regularization parameter for IRN-TV almost immediately stabilizes around a value of the order of 10^{-3}. Figure 5.27 assesses the influence of the inner iterative solver on the overall behavior of the method. Namely, we consider the inner-outer iterative schemes implemented with both the new and the IRN-TV weights, and with both the hybrid and the CGLS methods as inner solvers. The hybrid method chooses the regularization parameter adaptively according to the L-curve criterion as the iterations proceed, and the value λ_{*,ℓ} selected when the ℓth inner iteration cycle terminates is taken to be the fixed regularization parameter set in advance of the ℓth CGLS iteration cycle.
Figure 5.25 (caption): Shaking blur test problem. Relative errors and regularization parameter values at each outer iteration ℓ, until the stopping criterion is satisfied. Both the discrepancy principle and the L-curve criterion are considered; the regularization parameter computed by the former is always numerically zero, and therefore it is not displayed in the rightmost frame.
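For reference, CGLS with a fixed, pre-selected regularization parameter amounts to running conjugate gradients for least squares on the stacked system [A; sqrt(λ)L]x ≈ [b; 0]. The following is a minimal standard-form Python sketch (an illustrative analogue of the inner solver used here, with dense matrices for simplicity):

```python
import numpy as np

def cgls_tikhonov(A, b, L, lam, n_iter=50, tol=1e-12):
    """Illustrative CGLS on the stacked least-squares problem
    min_x || [A; sqrt(lam) L] x - [b; 0] ||_2,
    i.e. Tikhonov regularization with a FIXED parameter lam (the role
    played by the value the hybrid method selects at the end of an
    inner iteration cycle)."""
    Abar = np.vstack([A, np.sqrt(lam) * L])
    bbar = np.concatenate([b, np.zeros(L.shape[0])])
    x = np.zeros(A.shape[1])
    r = bbar.copy()        # residual of the stacked system (x = 0)
    s = Abar.T @ r         # normal-equations residual
    p = s.copy()
    gamma = s @ s
    for _ in range(n_iter):
        if gamma < tol:    # converged: normal-equations residual negligible
            break
        q = Abar @ p
        alpha = gamma / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        s = Abar.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x
```

The key difference from the hybrid method is that here λ must be fixed before the iterations start, which is exactly why the quality of the CGLS results is so sensitive to how that fixed value is chosen.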
Looking at both frames in Figure 5.27 we can see that CGLS performs quite well. When the new weights are adopted, the quality of the solution stagnates during the first outer iterations (this may be because the selected regularization parameter is quite small; see also Figure 5.25), then considerably improves, and eventually degenerates (the latter phenomenon could be avoided by experimenting with different stopping criteria for the outer iterations). When the TV weights are adopted, no significant improvement in the quality of the solution is visible as the outer iterations proceed (again, this may be because of the almost constant value of the regularization parameter; see also Figure 5.26).
Finally, Figure 5.28 displays some relevant reconstructions. We show the initial reconstructions x^{(*,2)} obtained at the end of the second inner iteration cycle (i.e., as soon as the iterative reweighting of the regularization term is active), and the reconstructions x^{(*,9)} obtained when the maximum number of outer iterations is performed. We consider the inner-outer iterative solvers that employ both the new and the IRN-TV weights, and both the hybrid method (with the L-curve criterion) and CGLS. We can clearly see that, when the new weights are used together with the hybrid method, there is a marked improvement in the reconstructions as the outer iterations proceed and, in particular, the final reconstruction is piecewise constant. When the new weights are used together with CGLS, the final reconstruction is overall good, but a few spurious constant patches show up in several locations. When considering the IRN-TV weights with both inner solvers, we cannot see any improvement as the outer iterations proceed (on the contrary, the reconstruction at the end of the outer iterations is much worse than the initial one), and none of the reconstructions looks properly piecewise constant. We also display the entries of the diagonal weighting matrices at outer iterations ℓ = 2 and ℓ = 9, for both the new reweighting strategy (4.3) and the IRN-TV reweighting strategy. When considering the new reweighting strategy, we can clearly see that edges are properly detected in both the vertical and horizontal derivatives starting from the very first reweighting step: as explained in Section 4, this greatly contributes to the success of the new algorithm. As in the previous test problems, the IRN-TV weights are not as effective in revealing the structure of the image; very small and oscillating weights are assigned to every location in the image.
Out-of-focus blur. We use IR Tools to vary some of the options defining the previous image deblurring simulation. First, we change the sharp image to the well-known satellite image, of size 128 × 128 pixels. Second, we consider an out-of-focus blur, still of medium intensity. Again, we generate this test problem within IR Tools, using the following MATLAB statements: ProblemOptions = PRset('trueImage', 'satellite', 'BlurLevel', 'medium'); [A, b_true, x_true, ProblemInfo] = PRblurdefocus(128, ProblemOptions); b = PRnoise(b_true, NoiseLevel); Figure 5.30 shows the true image, along with the measured data b (which are again corrupted by Gaussian white noise of level 10^{-3}).
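The action of an out-of-focus blur operator can be sketched under the simplifying assumptions of a normalized disk PSF and periodic boundary conditions; PRblurdefocus assembles its own (more careful) operator, so the Python code below is only an illustrative analogue with hypothetical helper names:

```python
import numpy as np

def defocus_psf(size, radius):
    """Illustrative out-of-focus PSF: a normalized disk of the given
    radius (a simplified stand-in for the PSF used by PRblurdefocus)."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    psf = ((xx - c) ** 2 + (yy - c) ** 2 <= radius ** 2).astype(float)
    return psf / psf.sum()

def blur_periodic(img, psf):
    """Blur with periodic boundary conditions via the FFT: pad the PSF
    to the image size and shift its center to the origin, so the
    convolution introduces no translation."""
    big = np.zeros_like(img)
    k = psf.shape[0]
    big[:k, :k] = psf
    big = np.roll(big, (-(k // 2), -(k // 2)), axis=(0, 1))  # center PSF at (0, 0)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(big)))
```

Because the PSF is normalized, the blur preserves total intensity, and a constant image is left unchanged; these are convenient sanity checks for any blur operator of this kind.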
As in the previous examples, we first run our algorithm with different parameter choice strategies within the inner hybrid scheme for generalized Tikhonov regularization: in order to fully assess how the reconstructions evolve as the outer iterations proceed, we perform 10 outer iterations, using both the discrepancy principle and the L-curve criterion. Figure 5.31 shows a plot of the relative errors and chosen regularization parameters at each outer iteration. We can clearly see that the reconstructions computed by both parameter selection strategies are of similar quality, and greatly improve as the outer iterations proceed. The regularization parameters selected by the L-curve are of the order of 10^{-3} and slightly oscillate; as for the shaking blur test problem, the regularization parameters computed by the discrepancy principle are always either zero or numerically zero (and therefore they are not displayed in the rightmost frame of Figure 5.31).
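A minimal sketch of the L-curve criterion for standard-form Tikhonov evaluates the curve (log ‖A x_λ − b‖_2, log ‖x_λ‖_2) on a grid of parameter values and returns the grid point of maximum absolute discrete curvature; this is a simple stand-in for more careful corner-finding algorithms, with hypothetical helper names:

```python
import numpy as np

def l_curve_corner(A, b, lams):
    """Illustrative L-curve criterion for standard-form Tikhonov:
    evaluate (log residual norm, log solution norm) on a grid of
    parameters and pick the grid point of maximum absolute discrete
    curvature as the corner."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    rho = np.empty(len(lams))  # log residual norms
    eta = np.empty(len(lams))  # log solution norms
    for i, lam in enumerate(lams):
        filt = s**2 / (s**2 + lam)          # Tikhonov filter factors
        rho[i] = np.log(np.linalg.norm((1 - filt) * beta) + 1e-300)
        eta[i] = np.log(np.linalg.norm(filt * beta / s) + 1e-300)
    # discrete curvature of the parametric curve lam -> (rho, eta)
    d1r, d1e = np.gradient(rho), np.gradient(eta)
    d2r, d2e = np.gradient(d1r), np.gradient(d1e)
    kappa = (d1r * d2e - d2r * d1e) / np.maximum((d1r**2 + d1e**2) ** 1.5, 1e-300)
    return lams[int(np.argmax(np.abs(kappa)))]
```

When the corner is flat or poorly defined, as remarked above for the early outer iterations, grid-based corner detection of this kind can stagnate, which is consistent with the observed behavior of the selected parameters.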
The regularization parameter selected by the L-curve criterion is always quite small and almost uniform across the outer iterations. Despite the different value of the regularization parameters selected by the two strategies, the quality of the corresponding solutions is almost identical.
Next, we compare the new method to other inner-outer iterative methods for edge enhancement in imaging. Figure 5.32 displays the relative error and regularization parameter values versus the number of outer iterations for the new method, and for a method that still employs the hybrid solver for general-form Tikhonov regularization to handle the inner iterations while using the IRN-TV weights (5.1) at each outer iteration. For both methods, the L-curve criterion is employed to adaptively choose the regularization parameter at each inner iteration. We can clearly see that, when the IRN-TV weights are used, the behavior of the relative errors is quite erratic and the quality of the reconstructions keeps deteriorating as the outer iterations proceed (dramatically so during the last iterations). The regularization parameter for IRN-TV is always around a value of the order of 10^{-3} (as in the case of the new weights). Figure 5.33 assesses the influence of the inner iterative solver on the overall behavior of the method. Namely, we consider the inner-outer iterative schemes implemented with both the new and the IRN-TV weights, and with both the hybrid and the CGLS methods as inner solvers. The hybrid method chooses the regularization parameter adaptively according to the L-curve criterion as the iterations proceed, and the value λ_{*,ℓ} selected when the ℓth inner iteration cycle terminates is taken to be the fixed regularization parameter set in advance of the ℓth CGLS iteration cycle.
Figure 5.31 (caption): Out-of-focus blur test problem. Relative errors and regularization parameter values at each outer iteration ℓ, until the stopping criterion is satisfied. Both the discrepancy principle and the L-curve criterion are considered; the regularization parameter computed by the former is always numerically zero, and therefore it is not displayed in the rightmost frame.
Looking at the leftmost frame of Figure 5.33, we can see that CGLS stagnates, i.e., no noticeable improvement in the quality of the reconstructions occurs as the outer iterations proceed. Looking at the rightmost frame of Figure 5.33, we can see that CGLS performs much better than the hybrid method when the IRN-TV weights are considered: although the reconstructions computed by the former are not of high quality, those computed by the latter degenerate.
Finally, Figure 5.34 displays some relevant reconstructions. We show the initial reconstructions x^{(*,2)} obtained at the end of the second inner iteration cycle (i.e., as soon as the iterative reweighting of the regularization term is active), and the reconstructions x^{(*,9)} obtained when the maximum number of outer iterations is performed. We consider the inner-outer iterative solvers that employ both the new and the IRN-TV weights, and both the hybrid method (with the L-curve criterion) and CGLS. We can clearly see that, when the new weights are used together with the hybrid method, there is an excellent improvement in the reconstructions as the outer iterations proceed, although the last reconstruction is still slightly irregular (and one may consider performing additional outer iterations). When the new weights are used together with CGLS, the final reconstruction is almost identical to the one displayed in the top leftmost frame of Figure 5.34 (as expected; see also Figure 5.33). We also display the entries of the diagonal weighting matrices at outer iterations ℓ = 2 and ℓ = 9, for both the new reweighting strategy (4.3) and the IRN-TV reweighting strategy. When considering the new reweighting strategy, we can clearly see that the weights applied to both the vertical and horizontal derivatives are appropriate and improve with increasing outer iterations. More evidently than in the previous test problems, the IRN-TV weights are not effective in revealing the structure of the image: they oscillate but are usually very small, and the original structure of the image is totally lost in the final weights.
6. Conclusions and Future Work. In this paper we introduced a new inner-outer iterative algorithm for restoring and reconstructing images with enhanced edges, where a sequence of quadratic Tikhonov-regularized problems is solved (outer iterations) and a specific hybrid method is applied to solve each quadratic problem in the sequence (inner iterations). The regularization matrix for each quadratic problem in the sequence is updated at each outer iteration and is obtained by premultiplying the gradient operator by a weighting matrix that encodes the edge information disclosed during all the previous outer iterations, in such a way that edges are not penalized in the solution process. The hybrid inner solver is very effective, and well-known strategies for setting the regularization parameter can be successfully incorporated therein. The resulting strategy is innovative and very efficient when compared to available edge-recovery techniques, mainly because of the definition of the new weights and the incorporation of automatic regularization parameter choice techniques.
Future work will include the investigation of similar reweighting techniques, where the inner iterations are performed by a different hybrid method (for instance the one described in Section 3.1, with possible strategies to approximate the action of M_A^†). Also, different expressions for the weighting matrices may be considered, in such a way that edge information is extracted by operators other than the gradient. Finally, similarly to [6, 10, 11], possible hybrid variants that involve the use of flexible Krylov subspaces will be investigated, with the ideal goal of avoiding nested cycles of iterations.