Enhanced choice of the parameters in an iteratively regularized Newton-Landweber iteration in Banach space

This paper is a close follow-up of Kaltenbacher and Tomba 2013 and Jin 2012, where Newton-Landweber iterations were shown to converge either (unconditionally) without rates or (under an additional regularity assumption) with rates. The choice of the parameters in the method was different in each of these two cases. We have now found a unified and more general strategy for choosing these parameters that enables both convergence and convergence rates. Moreover, as opposed to the previous choices, the new one yields strong convergence as the noise level tends to zero, also in the case of no additional regularity. Additionally, the resulting method appears to be more efficient than the one from Kaltenbacher and Tomba 2013, as our numerical tests show.


1. Introduction. Regularization of inverse problems in Banach spaces is a field of highly active research, cf., e.g., [2,16,8] for variational regularization and, e.g., [1,6,10,11,13,15,21] for iterative methods. The main reason for this lies in the fact that extending the scope from Hilbert to general Banach spaces allows for a better formulation of requirements on the sought-for solution and a more realistic description of noise models.
We will here especially concentrate on a combination of a Newton-type strategy with Landweber iterations to approximate the Newton step, which leads to a fully explicit iteration, cf. [9,10].
To formulate the method, consider a nonlinear ill-posed operator equation

F(x) = y,    (1)

where F maps between Banach spaces X and Y. The given data y^δ are typically contaminated by noise, and we assume that the noise level δ in

‖y^δ − y‖ ≤ δ

is known. In the following, x_0 is some initial guess and we will assume that a solution x† to (1) exists. For some p, r ∈ (1, ∞), we will make use of the duality mappings J_p^X(x) := ∂(1/p ‖x‖^p) from X to its dual X*, and J_r^Y(y) := ∂(1/r ‖y‖^r) from Y to Y*, respectively. While under the assumptions we will make on X, the mapping J_p^X will in fact be single valued, this will not necessarily be the case for J_r^Y, and we will denote by j_r^Y a single valued selection from J_r^Y. Therewith, we consider a combination of the iteratively regularized Gauss-Newton method with an iteratively regularized Landweber method for approximating the Newton step, using some initial guess x_0 and starting from some x_0^δ (that need not necessarily coincide with x_0):

For n = 0, 1, 2, . . . do
    u_{n,0} = 0,  z_{n,0} = x_n^δ
    For k = 0, 1, 2, . . . , k_n − 1 do
        u_{n,k+1} = u_{n,k} − α_{n,k} J_p^X(z_{n,k} − x_0) − ω_{n,k} F'(x_n^δ)* j_r^Y(F(x_n^δ) + F'(x_n^δ)(z_{n,k} − x_n^δ) − y^δ)
        z_{n,k+1} = x_0 + J_{p*}^{X*}(J_p^X(x_n^δ − x_0) + u_{n,k+1})
    x_{n+1}^δ = z_{n,k_n}    (3)

It is clear that the choice of the parameters α_{n,k}, ω_{n,k}, k_n, and of the overall stopping index n = n* crucially influences the stability and efficiency of the method. While in [9], [10] only either convergence or convergence rates have been established, with disjoint parameter choices for each of these two cases, the aim of this paper is to provide a unified parameter choice strategy for this method that allows us to show both unconditional convergence and convergence rates under additional regularity assumptions on the solution. Moreover, by using an appropriate choice of the stopping index for the inner iteration, differently from [9], [10], we can show continuous dependence of the iterates x_n on the data y^δ for each fixed outer iteration index n and are therewith able to prove strong convergence.
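To make the structure of the two nested loops concrete, here is a minimal numerical sketch in the Hilbert-space special case p = r = s = 2, where all duality mappings reduce to the identity. The parameter choices (a constant step ω, a geometrically decaying α_n, and a fixed inner iteration count) are illustrative assumptions only, not the adaptive rules analyzed below; the function name and defaults are ours.

```python
import numpy as np

def newton_landweber(F, Fprime, y_delta, x0, delta,
                     omega=1.0, alpha0=0.1, q=0.9,
                     k_inner=50, tau=1.02, n_max=50):
    """Sketch of a Newton-Landweber scheme for p = r = 2:
    an inner, iteratively regularized Landweber loop approximates
    each (Gauss-)Newton step; the outer loop is stopped by the
    discrepancy principle ||F(x) - y_delta|| <= tau * delta."""
    x = x0.copy()
    for n in range(n_max):
        res = F(x) - y_delta
        if np.linalg.norm(res) <= tau * delta:  # discrepancy principle
            break
        A = Fprime(x)                # linearization at the outer iterate
        alpha = alpha0 * q ** n      # decaying regularization parameter
        u = np.zeros_like(x)
        z = x.copy()
        for k in range(k_inner):
            # Landweber step for the linearized residual, plus the
            # iterative-regularization term pulling z toward x0
            grad = A.T @ (A @ (z - x) + res)
            u = u - omega * grad - alpha * (z - x0)
            z = x + u                # for p = 2 the duality maps are the identity
        x = z
    return x
```

For a well-conditioned toy problem this behaves like a damped Gauss-Newton method; the Banach-space version replaces the updates by their duality-mapping counterparts.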
Finally, the new parameter choice appears to enhance efficiency as compared to [9], as the numerical tests below show.
The remainder of this paper is organized as follows. In Section 2 we provide some preliminaries. The parameter choice as well as convergence results are derived and formulated in Section 3. Section 4 shows some numerical tests for a coefficient identification problem in an elliptic PDE in one and two space dimensions.
2. Preliminaries and assumptions. Throughout this paper we will assume that X is smooth, which means that the duality mapping is single valued, and moreover that X is s-convex for some s ∈ [p, ∞), which implies

D_p(x, y) ≥ (c_{p,s}/s) ‖x − y‖^s

for some constant c_{p,s} > 0, cf. Corollary 2.61 in [21]. Here D_p(x, y) denotes the Bregman distance

D_p(x, y) := (1/p)‖x‖^p − (1/p)‖y‖^p − ⟨j_p^X(y), x − y⟩

(where j_p^X(x) denotes a single valued selection of J_p^X(x)). We will also make use of its shifted version D_p^{x_0}(x, y) := D_p(x − x_0, y − x_0). As a consequence of the above assumptions, X is reflexive, and a dual smoothness estimate with some constant C_{p*,s*} holds, where s* denotes the dual index s* = s/(s − 1), cf. (4) in [9]. Under these assumptions, the duality mapping is bijective and (J_p^X)^{−1} = J_{p*}^{X*}, the latter denoting the (by s-convexity also single-valued) duality mapping on the dual X* of X. We will also make use of standard identities for duality mappings and Bregman distances. For more details on the geometry of Banach spaces we refer, e.g., to [20], [21] and the references therein.
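As a concrete illustration of these notions, the following sketch evaluates the duality mapping and the induced Bregman distance in the sequence space ℓ^p (a finite-dimensional stand-in for L^p); the argument order of D_p follows one common convention, and the helper names are ours.

```python
import numpy as np

def J_p(x, p):
    """Duality mapping J_p(x) = grad (1/p)||x||_p^p in ell^p,
    i.e. |x|^(p-1) * sign(x) componentwise."""
    return np.abs(x) ** (p - 1) * np.sign(x)

def bregman(x, y, p):
    """Bregman distance D_p(x, y) induced by (1/p)||.||_p^p;
    the shifted version D^{x0}_p(x, y) is obtained by replacing
    x, y with x - x0, y - x0."""
    f = lambda v: np.sum(np.abs(v) ** p) / p
    return f(x) - f(y) - np.dot(J_p(y, p), x - y)
```

For p = 2 this reduces to D_2(x, y) = (1/2)‖x − y‖², so the Bregman distance generalizes the squared Hilbert-space distance.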
The assumptions on the forward operator, besides a condition on the domain, include a structural condition on its degree of nonlinearity. For simplicity of exposition we restrict ourselves to the tangential cone condition and mention in passing that this could be extended to a more general condition on the degree of nonlinearity (cf. [5]), as in [12]. Here F' is not necessarily the Fréchet derivative of F but just a linearization of F satisfying the Taylor remainder estimate (9). Additionally, we assume that F and F' are uniformly bounded. By distinguishing between the cases ‖x − x_0‖ < 2‖x† − x_0‖ and ‖x − x_0‖ ≥ 2‖x† − x_0‖ and using the second triangle inequality, we obtain a uniform residual bound from (4). For obtaining convergence rates we impose a variational inequality (or variational source condition) (11), with s the parameter of smoothness of X and ν ∈ (0, 1/2]. Condition (11) corresponds to a source condition x† − x_0 ∈ R((F'(x†)* F'(x†))^ν) in the special case of Hilbert spaces (where s = 2), cf., e.g., [5]. Note that (11) is stronger for larger ν, and (11) always holds for ν = 0 due to (4), (10). The case ν = 0 can be identified with the situation that no additional regularity (i.e., (11) with ν > 0) is known to hold.
3. Error estimates and parameter choice. We first of all follow the lines of Section 2 in [9]. Some of the estimates are the same as in [9] (and will be repeated here only for the convenience of the reader). Some of them are different, though, and these enable different parameter choice strategies. For any n ∈ ℕ we have an error decomposition into terms (I)-(IV), with suitable abbreviations. Assuming that z_{n,k} ∈ B_ρ^D(x†), we now estimate each of the terms on the right-hand side separately.
By (5) and (7) we can estimate the term (I), where we use the triangle inequality in X* and X, together with (10) and suitable abbreviations, among them the quantities t_{n,k} and r_n used throughout. This involves a function which, by p* ≥ s* > 1, is strictly monotonically increasing and convex on ℝ₊. Note that the resulting estimate (16) is just the same as (15) in [9].
To make use of the variational inequality (11) for estimating (III), we first of all use (9), which together with (11) yields an estimate based on the elementary inequality with constant C(λ) = λ^λ(1 − λ)^{1−λ} and λ(s, ν) as in (12). Note that the resulting estimate (26) differs from the corresponding estimate (24) in [9]. Finally, for the term (IV) we obtain an estimate which is just (26) in [9].
To meet conditions (39), (43) with a minimal α_{n,k+1} we set α_{n,k+1} according to (44). It remains to choose
• the inner stopping index k_n and
• the outer stopping index n*,
see below. Indeed, with these choices of ω_{n,k} and α_{n,k+1} we can inductively conclude from (35) the monotonicity result (45), which holds for all n ∈ ℕ and all k ∈ ℕ. By (45) and α_{n,k} ≤ 1 (cf. (41)) it can be shown inductively that all iterates remain in B_ρ^D(x†), and that sup_{k∈ℕ₀} t_{n,k} → 0 as n → ∞.
To quantify the behavior as k → ∞ of the sequence α_{n,k} defined according to (39), (43), (44) for fixed n, we distinguish between two cases. Case (i): there exists a k̄ such that for all k ≥ k̄ we have α_{n,k} = α̃_{n,k}. Considering an arbitrary accumulation point ᾱ_n of α_{n,k} (which exists since 0 ≤ α_{n,k} ≤ 1), we therefore have ᾱ_n = ᾱ_n(1 − (1 − q)ᾱ_n^{1/θ}), hence ᾱ_n = 0.
Case (ii): consider the situation that (i) does not hold, i.e., there exists a subsequence k_j such that for all j ∈ ℕ we have α_{n,k_j} ≠ α̃_{n,k_j}. Then by (39), (43), and (48) we have α_{n,k_j} → (τ̃(ηr_n + (1 + η)δ))^{r/(1+θ)}. Since η and δ can be assumed to be sufficiently small, this especially implies the bound α_{n,k} ≤ 1 in (41). We consider z_{n*,k*_{n*}} as our regularized solution, where n*, k*_{n*} (and also k_n for all n ≤ n* − 1; note that k*_{n*} is to be distinguished from k_{n*}: actually the latter is not defined, since we only define k_n for n ≤ n* − 1!) are still to be chosen appropriately, according to the requirements from the proofs of
• convergence rates in case ν, θ > 0,
• convergence for exact data δ = 0,
• convergence for noisy data as δ → 0.
3.1. Convergence rates in case ν, θ > 0. From (45) we obtain a rate estimate; hence, in order to get the desired rate in view of (51) (which is a sharp bound in case (ii) above), we need a bound r_{n*} ≤ τδ for some constant τ > 0, and we should choose k*_{n*} large enough so that

α_{n*,k*_{n*}} ≤ C_α (r_{n*} + δ)^{r/(1+θ)},    (54)

which is possible with a finite k*_{n*} by (51) for C_α > (τ̃(1 + η))^{r/(1+θ)}. Note that this holds without any requirements on k_n for n < n*.
3.2. Convergence as n → ∞ for exact data δ = 0. To show that (x_n)_{n∈ℕ} is a Cauchy sequence (following the seminal paper [4]), for arbitrary m < j we choose the index l ∈ {m, . . . , j} such that r_l is minimal and use the identity (55) together with the fact that the monotone decrease and boundedness from below of the sequence D_p^{x_0}(x†, x_m) implies its convergence. Hence it suffices to prove that the last term in (55) tends to zero as m < l → ∞. (Analogously it can be shown that D_p^{x_0}(x_l, x_j) tends to zero as l < j → ∞.) This term can be estimated by our choice of α_{n,k} = α̃_{n,k} (note that α̃_{n,k} = 0 in case θ = 0), condition (9), and the minimality of r_l. Thus we have by ω_{n,k} ≤ ω̄ and Young's inequality that there exists C > 0 such that the term is bounded by a constant multiple of Σ_{n=m}^{l} k_n r_n^r, for which we can conclude convergence as m, l → ∞ from (47) provided that

Σ_{n=m}^{∞} k_n r_n^r → 0 as m → ∞,

which we guarantee by choosing, for an a priori fixed summable sequence (a_n)_{n∈ℕ}, e.g. a_n = 2^{−n},

k_n := ⌊a_n r_n^{−r}⌋,

or, using (50), just k_n ≡ k̄ for some fixed integer k̄, e.g., k̄ = 3. This is consistent with (54), since in case δ = 0 we have n* = ∞, so condition (54) never becomes active in the noiseless case.
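The inner-iteration count above can be sketched as follows; the summable sequence a_n = 2^{−n} is the example from the text, and the helper name is ours.

```python
import math

def inner_steps(n, r_n, r):
    """Choice k_n = floor(a_n * r_n^{-r}) with the summable sequence
    a_n = 2^{-n} suggested in the text; by construction
    k_n * r_n^r <= a_n, so sum_n k_n * r_n^r <= sum_n 2^{-n} < infinity.
    (The fixed alternative k_n = 3 mentioned in the text also works.)"""
    a_n = 2.0 ** (-n)
    return math.floor(a_n * r_n ** (-r))
```

The point of the floor-of-a_n construction is precisely the summability condition displayed above: each summand k_n r_n^r is dominated by a_n.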

3.3. Convergence with noisy data as δ → 0. In case ν, θ > 0, convergence follows from the convergence rate results in Subsection 3.1. Therefore it only remains to show convergence as δ → 0 in the case ν = θ = 0. In this section we explicitly emphasize the dependence of the computed quantities on the noisy data and on the noise level by a superscript δ.
Let ‖y^{δ_j} − y‖ ≤ δ_j with (δ_j)_{j∈ℕ} a zero sequence and n*_j the corresponding stopping indices. As usual [4], we distinguish between the two cases that (i) n*_j has a finite accumulation point and (ii) n*_j tends to infinity. Case (i): there exists an N ∈ ℕ and a subsequence (n_{j_i}) such that for all i ∈ ℕ we have n_{j_i} = N. Provided that (57) holds, we can conclude that x_N^{δ_{j_i}} → x_N^0 as i → ∞, and by taking the limit as i → ∞ also in (53), x_N^0 is a solution to (1); here we have again used the continuous dependence (57) in the last step. Case (ii): let n*_j → ∞ as j → ∞, and let x† be a solution to (1). For arbitrary ε > 0, by convergence for exact data δ = 0 (see the previous subsection) we can find n̄ such that D_p^{x_0}(x†, x_{n̄}^0) < ε/2 and, by Theorem 2.60 (d) in [21], there exists j_0 such that for all j ≥ j_0 we have n*_j ≥ n̄ + 1 and |D_p^{x_0}(x†, x_{n̄}^{δ_j}) − D_p^{x_0}(x†, x_{n̄}^0)| < ε/2, where we use

n ≤ n*(δ) − 1 for all δ  ⇒  the mapping δ ↦ x_n^δ is continuous at δ = 0.    (58)

Hence, by monotonicity of the errors, D_p^{x_0}(x†, x_{n*_j}^{δ_j}) < ε. Indeed, (57), (58) can be concluded from the continuity of F and F', the definition of the method (3), as well as the stable dependence of all parameters ω_{n,k}, α_{n,k}, k_n according to (37), (39), (43), (44), (56) on the data y^δ.
The analysis above yields the following convergence result.
Theorem 3.2. Assume that X is smooth and s-convex with s ≥ p, that x_0 is sufficiently close to x†, i.e., x_0 ∈ B_ρ^D(x†), that F satisfies (9) with (8), that F and F' are continuous and uniformly bounded in B_ρ^D(x†), and that (38), (42) hold. Then the iterates z_{n,k} defined by Algorithm 3.1 remain in B_ρ^D(x†) and converge to a solution of (1) subsequentially as δ → 0 (i.e., there exists a convergent subsequence, and the limit of every convergent subsequence is a solution). In the case of exact data δ = 0, we have subsequential convergence of x_n to a solution of (1) as n → ∞.
If additionally a variational inequality (11) with ν ∈ (0, 1/2] and β sufficiently small is satisfied, we obtain the optimal convergence rate O(δ^{4ν/(2ν+1)}) for the error of the regularized solution z_{n*,k*_{n*}}. Note that we here deal with an a priori parameter choice: θ and therefore ν has to be known; otherwise ν must be replaced by a known lower bound ν̲ for the true ν, and since ν̲ ≤ ν implies validity of (11) with ν replaced by ν̲, Theorem 3.2 still implies the (possibly suboptimal) rates O(δ^{4ν̲/(2ν̲+1)}), or just convergence if we have set ν̲ = 0.
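A small helper makes the dependence of the rate on ν explicit (assuming the exponent 4ν/(2ν+1) stated above):

```python
def rate_exponent(nu):
    """Exponent in the convergence rate O(delta^(4*nu/(2*nu+1)))
    of Theorem 3.2; nu lies in (0, 1/2], nu = 1/2 gives the best
    rate O(delta), and nu -> 0 degenerates to mere convergence."""
    assert 0 < nu <= 0.5
    return 4 * nu / (2 * nu + 1)
```

Using a lower bound ν̲ ≤ ν simply evaluates this function at ν̲, which is why the resulting rate is possibly suboptimal but still valid.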

4. Numerical experiments. In this section we present some numerical experiments to test the method defined in Section 3. We consider the identification of the space-dependent coefficient c in the elliptic boundary value problem

−Δu + cu = f in Ω,   u = g on ∂Ω,

from the measurement of u in Ω, where f is a fixed function and Ω is assumed to be a smooth, bounded domain in ℝ^d, d ∈ ℕ. Note that inhomogeneous Dirichlet boundary conditions can easily be incorporated into the right-hand side f if necessary. We consider three examples with d = 1 and an example with d = 2. In all cases we take X := L^p(Ω), 1 < p < ∞, Y := L^r(Ω), 1 < r < ∞, and recall that the following facts hold true.
(i) For 1 < p < ∞, the duality mapping in X is given by J_p^X(c) = |c|^{p−1} sgn(c).
(ii) If the domain of the forward operator is defined by a suitable condition, then (8) is satisfied with c_0 = 0.
(iii) It follows from Lemma 2 in [15] that the operator F and its linearization F' are well defined and bounded.
In all the numerical simulations we take θ = 0 and stop the outer iteration by means of the discrepancy principle (53). Concerning the stopping index of the inner iteration, we slightly modify Algorithm 1, requiring also that the inner iteration be stopped as soon as ‖F(z_{n,k}) − y^δ‖ ≤ τ̃δ. More precisely,

k_n = min{k ∈ ℤ, k ≥ 0 : ‖F(z_{n,k}) − y^δ‖ ≤ τ̃δ ∨ k ≥ a_n r_n^{−r}},    (62)

and the regularized solution is c_{n*}^δ = z_{n*−1, k_{n*−1}}.
4.1. 1-dimensional examples. We consider the same numerical simulations as in [9], taking Ω = (0, 1) and inhomogeneous boundary conditions u(0) = g_0, u(1) = g_1. We solve all differential equations approximately by a finite difference method, dividing the interval [0, 1] into N + 1 subintervals of equal length 1/(N + 1); in all examples below, N = 400. The L^p and L^r norms are calculated approximately by means of a quadrature method.
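For reference, a minimal finite-difference forward solver for the 1-dimensional model problem −u'' + cu = f on (0, 1) (our reading of the elliptic problem used in the examples) can be sketched as follows; the grid size and names are illustrative.

```python
import numpy as np

def forward_1d(c, f, g0, g1, N=400):
    """Forward operator F(c) = u(c): solve -u'' + c u = f on (0,1)
    with u(0) = g0, u(1) = g1 by central finite differences on
    N + 1 subintervals of length h = 1/(N + 1).
    c and f are callables; returns interior grid points and solution."""
    h = 1.0 / (N + 1)
    t = np.linspace(h, 1.0 - h, N)              # interior nodes
    main = 2.0 / h ** 2 + c(t)                   # tridiagonal system
    off = -np.ones(N - 1) / h ** 2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    rhs = f(t).astype(float)
    rhs[0] += g0 / h ** 2                        # Dirichlet boundary data
    rhs[-1] += g1 / h ** 2
    return t, np.linalg.solve(A, rhs)
```

For manufactured data with an affine exact state u (as in Example 4.1, where u = 1 + 5t and f = c†u), the scheme reproduces u up to roundoff, since the second difference of an affine function vanishes.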
Example 4.1. In the first simulation we assume that the solution c† is sparse. The test problem is constructed by taking u(t) = u(c†)(t) = 1 + 5t, f(t) = u(t)c†(t), g_0 = 1 and g_1 = 6. We perturb the exact data u with Gaussian white noise: the corresponding perturbed data u^δ satisfy ‖u^δ − u‖_{L^r} = δ, with δ = 0.1 × 10^{−3}. We apply Algorithm 1, with the inner stopping index satisfying (62), with τ = 1.02, τ̃ = 0.1, a_n = (50 + n)^{−2}. The upper bound c_ω is fixed equal to 0.1 and ϑ is chosen as 2^{−j}, where j is the first index satisfying the corresponding condition. In Figure 1 we show the results obtained by our method with p = 2 and p = 1.1, respectively. The reconstructed solutions are very similar to those obtained in [9] and [10]. Concerning the total number of inner iterations, similarly to [9] it is larger in the case p = 2 (N_2 = 3992) than in the case p = 1.1 (N_{1.1} = 3063). In both cases, the value of N_p is lower than the corresponding value found in [9]. We underline that different choices of τ̃, c_ω, a_n are possible and may further improve the speed of the method. The partial freedom in the choice of these parameters makes Algorithm 1 more flexible than the method described in [9]. In our numerical simulations, we tested several other choices, which gave similar but slightly worse results than those stated here.
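The noise model of the examples (Gaussian white noise rescaled so that the discrete misfit equals a prescribed δ) can be sketched as follows; the plain vector r-norm without quadrature weights is a simplifying assumption.

```python
import numpy as np

def perturb(u, delta, r=2.0, rng=None):
    """Return u_delta = u + noise, with Gaussian white noise rescaled
    so that the discrete r-norm of u_delta - u equals delta."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(u.shape)
    noise *= delta / np.linalg.norm(noise, ord=r)
    return u + noise
```

Passing a seeded generator makes the perturbation reproducible, mirroring the fixed random seed used for the comparisons with [9].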
Example 4.2. We modify the exact solution of the previous example and choose again δ = 0.1 × 10^{−3}. In this case we take τ = 1.02, τ̃ = 0.01, a_n = (100 + n)^{−2}, c_ω = 0.1 and ϑ as in the previous example. In Figure 2 we show the results obtained by our method with p = 2 and p = 1.1, respectively. As usual, the reconstruction of the sparsity is much better for p = 1.1. Concerning the total number of inner iterations N_p, in this case we obtain N_2 = 4141 (with a corresponding error of 0.1110) and N_{1.1} = 3110 (with a corresponding error of 0.0482). We observe that although the corresponding error is slightly larger than that obtained in [9], the value of N_2 is only slightly more than a fifth of the value obtained in [9], a gain in speed of 421.8%. Moreover, in the case p = 1.1, Algorithm 1 performs 1415 fewer iterations than in [9], a gain in speed of 45.5%, and even obtains a lower error.
Example 4.3. We suppose c† to be a smooth solution and take u(c†)(t) = 1 − 2t, f(t) = (1 − 2t)(2 − t + 4 sin(2πt)), u(0) = g_0 = 1 and u(1) = g_1 = −1 as exact data of the problem. We start the iteration from the initial guess c_0(t) = 2 − t, and take τ̃ = 5 × 10^{−3}, a_n = (1 + n)^{−1.1}, c_ω = 5 × 10^{−3} and ϑ as in the previous examples. Using the same Matlab seed for generating random data, we consider the same perturbed data as in [9], Example 3, case B (cf. Figure 3 here and Figure 3, picture (d), in that paper). We run both Algorithm 1 with τ = 1.0015 and the algorithm that generated the results presented in [9] for this example, with p = 2 and r = 1.1. Pictures (b) and (d) of Figure 3 show the corresponding results. The solution obtained by Algorithm 1 is slightly more precise, with an error equal to 1.8852 × 10^{−1} for Algorithm 1 and equal to 2.9388 × 10^{−1} for the method in [9].
The most interesting fact is that Algorithm 1 computes only N_2 = 249 total inner iterations to obtain this solution, whereas for the method in [9] N_2 = 3285 and the reconstruction is poorer. Moreover, due to the flexibility of our method, we can simply change the value of τ to 1 + 10^{−5} to get a more precise solution (see picture (c) in Figure 3). A visual inspection gives an idea of the improvement: the error of this solution, obtained with 278 total inner iterations, is equal to 1.1607 × 10^{−1}. We summarize the numerical results of the 1-dimensional examples in Table 1.

4.2. A 2-dimensional example. We show the performance of Algorithm 1 in the following 2-dimensional example.
Let Ω = (0, 1)² and assume the exact solution c† to be of characteristic-function type, where χ denotes the characteristic function of a subset of ℝ². We take as exact data u(x, y) = 1 + x + y; as a consequence, the fixed right-hand side of the problem is f(x, y) = c†(x, y)u(x, y), and the data at the boundary are given by g(x, y) = u(x, y) for every (x, y) ∈ ∂Ω. We discretize the interval [0, 1] in the x-direction into N + 1 subintervals and in the y-direction into M + 1 subintervals, and compute the solutions of the forward operator F(c) = u(c) by a finite difference method. The L^p and L^r norms on Ω are calculated by a simple quadrature method. We fix p = 1.1, N = M = 30, τ = 1 + 10^{−5}, τ̃ = 10^{−4}, c_ω = 0.1 and ϑ as in the 1-dimensional examples. We run Algorithm 1 starting from c_0 = 0, with the data u^δ perturbed by Gaussian white noise at two different noise levels. In the first case, we choose a small value δ = ‖u − u^δ‖ = 10^{−3} with r = 2. In the second case, we choose δ = 10^{−2}, once with r = 2 and once with r = 10.
In Figure 4 we plot the exact solution and the reconstructions obtained using Algorithm 1 in all these cases. The pictures show that the method provides a good reconstruction of the sparsity in this example. In particular, we underline that in the case δ = 10^{−2} the choice of a large r improves the result, yielding a better reconstruction with very few iterations (the total number of iterations equals 9 in this case).

Table 1. Numerical results for the 1-dimensional simulations.

5. Conclusions. In this paper we have devised an alternative parameter choice strategy for the iteratively regularized Newton-Landweber iteration proposed in [9]. This strategy is based on alternative error estimates and allows for a unified treatment of the unconditional convergence case and convergence with rates. In future research we will try to extend the analysis to faster inner iterations than Landweber, such as steepest descent or conjugate gradient methods.
6. Acknowledgment. Support by the German Science Foundation DFG under grant KA 1778/5-1 is gratefully acknowledged.