A conjugate-gradient-type rational Krylov subspace method for ill-posed problems

Conjugate gradients on the normal equation (CGNE) is a popular method to regularise linear inverse problems. The idea of the method can be summarised as minimising the residual over a suitable Krylov subspace. It is shown that applying the same idea to the shift-and-invert rational Krylov subspace yields an order-optimal regularisation scheme.


Introduction
We consider the solution of the linear system

Tx = y^δ, (1)

where the operator T acts continuously between the Hilbert spaces X and Y. The linear system is assumed to be ill-posed, that is, the range R(T) is not closed in Y. y^δ is a perturbation of the exact data y such that ‖y^δ − y‖ ≤ δ; y^δ is also called the noisy data and δ the noise level. For exact data, we assume that y is in the range of T, y ∈ R(T), which guarantees that there exists a unique x+ ∈ N(T)^⊥ such that Tx+ = y. N(T)^⊥ designates the orthogonal complement of the null space N(T) of T. x+ can also be characterised as the unique element of N(T)^⊥ that solves the normal equation

T*Tx = T*y. (2)
In fact, the normal equation possesses a unique solution x+ ∈ N(T)^⊥ for every y ∈ D(T+) := R(T) ⊕ R(T)^⊥, where R(T) ⊕ R(T)^⊥ designates the direct orthogonal sum of R(T) and its orthogonal complement R(T)^⊥ = N(T*). The linear unbounded map T+ : D(T+) → N(T)^⊥, y ↦ x+, is the Moore-Penrose (generalised) inverse, and x+ is the minimum-norm solution.
In order to reconstruct the solution x+ of the unperturbed problem Tx+ = y as well as possible subject to a given noise level δ, special procedures, called regularisation schemes, have to be used. Let {R_m}_{m∈N0} be a family of linear or nonlinear operators from Y to X with R_m 0 = 0. If there exists a mapping m = m(δ, y^δ) such that

lim_{δ→0} sup { ‖R_{m(δ,y^δ)} y^δ − x+‖ : y^δ ∈ Y, ‖y^δ − y‖ ≤ δ } = 0

for any x+ ∈ N(T)^⊥, then the pair (R_m, m(δ, y^δ)) is called a (convergent) regularisation scheme for T. The mapping m is called parameter choice or stopping rule. We will always use the discrepancy principle as our stopping rule, which is due to Morozov [23]. The discrepancy principle reads: choose a fixed τ > 1 and set

m(δ, y^δ) := min { m ∈ N0 : ‖y^δ − Tx^δ_m‖ ≤ τδ }, (3)

where x^δ_m := R_m y^δ. The discrepancy principle leads to convergent regularisation schemes (see [6]). Regularisation schemes might converge arbitrarily slowly unless the (unperturbed) solution x+ satisfies some smoothness assumption. Convergence rates can be given when x+ lies in the source set

X_{µ,ρ} := { x ∈ X : x = (T*T)^µ w, ‖w‖ ≤ ρ }, µ > 0.

Regularisation schemes (R_m, m(δ, y^δ)) that attain the highest possible convergence speed are called order-optimal, that is,

‖x^δ_{m(δ,y^δ)} − x+‖ ≤ C_µ δ^{2µ/(2µ+1)} ρ^{1/(2µ+1)} for all x+ ∈ X_{µ,ρ},

where C_µ depends neither on δ nor on ρ.
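In a discrete setting, the discrepancy principle (3) amounts to a one-line stopping test on the residual norms. The following sketch is our own illustration, not taken from the paper; the function name and conventions are ours.

```python
def discrepancy_stop(residual_norms, tau, delta):
    """Return the first index m with ||y_delta - T x_m|| <= tau * delta.

    residual_norms[m] is the residual norm of the m-th regularised iterate
    x_m = R_m y_delta; by convention residual_norms[0] = ||y_delta||, since
    x_0 = R_0 y_delta = 0.
    """
    threshold = tau * delta
    for m, r in enumerate(residual_norms):
        if r <= threshold:
            return m
    raise RuntimeError("discrepancy principle not satisfied; iterate further")

# tau * delta = 0.1; the first residual norm below it appears at index 3
print(discrepancy_stop([1.0, 0.5, 0.2, 0.05, 0.01], tau=2.0, delta=0.05))  # 3
```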
One of the most popular iterative regularisation schemes is conjugate gradients on the normal equation (CGNE), which can be stated briefly as

x^δ_m := argmin { ‖y^δ − Tx‖ : x ∈ K_m }, (4)

where K_m is the (polynomial) Krylov subspace

K_m := span { T*y^δ, (T*T)T*y^δ, …, (T*T)^{m−1}T*y^δ }.

An efficient algorithm is available to compute these approximations (see [15]). CGNE with the discrepancy principle as a stopping rule is an order-optimal regularisation scheme for all µ > 0 (see theorem 7.12 in [6] and [24]). And, due to its definition, CGNE is the fastest to satisfy the discrepancy principle among all regularisation schemes that compute approximations in the Krylov subspace K_m. The analysis of CGNE with respect to its regularisation properties is involved, since the operators R_m are nonlinear and not necessarily continuous (see theorem 7.6 in [6] and [5]). In this paper, we will define a method of the same type, but for the shift-and-invert or resolvent Krylov subspace

Q_m := span { T*y^δ, (I + T*T/γ)^{−1}T*y^δ, …, (I + T*T/γ)^{−(m−1)}T*y^δ }, (5)

where γ > 0 is a fixed real number (e.g. [1, 11, 12, 20-22, 26, 29]). Due to the relation (I + T*T/γ)^{−1}T*T = γI − γ(I + T*T/γ)^{−1}, this rational Krylov subspace can also be written as

Q_m = (I + T*T/γ)^{−(m−1)} K_m. (6)

We define our method by

x^δ_m := argmin { ‖y^δ − Tx‖ : x ∈ Q_m }, (7)

combined with the discrepancy principle as its stopping rule. (The minimizer x^δ_m is uniquely defined, see lemma 2.3 below.) The subspace Q_m belongs to the class of rational Krylov subspaces which have been studied in recent years (e.g. references in [8,13]). Since this method can be seen as solving the normal equation (2) approximately in the shift-and-invert Krylov subspace Q_m, the method will be called shift-and-invert on the normal equation (SINE). Several regularisation schemes have been proposed that compute approximations in the subspace Q_m, see example 1.1. By definition, our method will be the fastest to stop with respect to the discrepancy principle. SINE is related (but not identical) to CGNE preconditioned by (I + T*T/γ)^{−1}. Actually, SINE is not a preconditioning technique in the usual sense.
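For a finite-dimensional T, definition (7) can be realised directly with dense linear algebra: build a basis of Q_m, minimise the residual by least squares, and stop by the discrepancy principle. The sketch below is our own illustration of (7) combined with (3), not the paper's algorithm 1 (which computes the same iterates more efficiently by recurrences); all names and the test data are ours.

```python
import numpy as np

def sine(T, y_delta, gamma, delta, tau=1.001, max_m=50):
    """Minimise ||y_delta - T x|| over the shift-and-invert Krylov subspace
    Q_m = span{(I + T^T T/gamma)^{-j} T^T y_delta : j = 0, ..., m-1} and stop
    as soon as ||y_delta - T x_m|| <= tau * delta (discrepancy principle)."""
    A = T.T @ T
    shift = np.eye(A.shape[0]) + A / gamma
    v = T.T @ y_delta                    # j = 0 basis vector
    basis, x = [], np.zeros(A.shape[0])
    for m in range(1, max_m + 1):
        basis.append(v)
        v = np.linalg.solve(shift, v)    # apply the resolvent once more
        V = np.column_stack(basis)       # basis of Q_m
        c, *_ = np.linalg.lstsq(T @ V, y_delta, rcond=None)
        x = V @ c                        # residual minimiser over Q_m
        if np.linalg.norm(y_delta - T @ x) <= tau * delta:
            return x, m
    return x, max_m                      # discrepancy principle not reached

# tiny diagonal test problem (our own data): T x+ = y with x+ = (1, ..., 1)
t = np.linspace(0.05, 1.0, 8)
T, delta = np.diag(t), 1e-2
y_delta = T @ np.ones(8) + delta * np.ones(8) / np.sqrt(8.0)
x, m = sine(T, y_delta, gamma=1.0, delta=delta)
print(m, np.linalg.norm(y_delta - T @ x) <= 1.001 * delta)
```

Since the subspace grows by one vector per step, the residual norms are non-increasing in m, so the discrepancy test is met as soon as the noise level is reached.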
Nevertheless, rational Krylov subspaces have been observed to be capable of accelerating convergence (e.g. [10]). The analysis of SINE with respect to its regularisation properties shares the difficulties of the analysis of CGNE: the family of operators R_m is again nonlinear and not continuous in general, which can be seen by generalising the ideas of the proof of theorem 7.6 in [6] (see also [5]).

Example 1.1. The following schemes compute approximations in the shift-and-invert Krylov subspace:

(i) Iterated Tikhonov-Phillips regularisation (see [3,7,17,18]).
(ii) Applying the implicit Euler method, the implicit midpoint rule, or the trapezoidal rule with fixed time-step to asymptotic regularisation (Showalter's regularisation) leads to approximations in Q_m (see [27]).

(iii) The method proposed by Riley in [28], applied to the normal equation.

(iv) The rational Arnoldi approach proposed in [2].
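For item (i), the connection is easy to verify numerically: with x_0 = 0, the stationary iterated Tikhonov-Phillips iterates x_k = x_{k−1} + (γI + T*T)^{−1}T*(y^δ − Tx_{k−1}) are, by induction, linear combinations of (γI + T*T)^{−j}T*y^δ, j = 1, …, k, and these vectors span a shift-and-invert Krylov subspace. A small self-contained check with random data (our own illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma, k = 6, 0.5, 3
T = rng.standard_normal((n, n))
y_delta = rng.standard_normal(n)
M = gamma * np.eye(n) + T.T @ T          # gamma*I + T^T T

# stationary iterated Tikhonov-Phillips with x_0 = 0
x = np.zeros(n)
for _ in range(k):
    x = x + np.linalg.solve(M, T.T @ (y_delta - T @ x))

# basis (gamma*I + T^T T)^{-j} T^T y_delta, j = 1, ..., k
v = T.T @ y_delta
basis = []
for _ in range(k):
    v = np.linalg.solve(M, v)
    basis.append(v)
V = np.column_stack(basis)

# the k-th iterate lies (numerically) in the span of these vectors
c, *_ = np.linalg.lstsq(V, x, rcond=None)
print(np.linalg.norm(V @ c - x))         # tiny: x is in the span
```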
In section 2, we will show that x^δ_m in (7) can be computed efficiently and discuss some basic properties of the method. Convergence for unperturbed data is shown in section 3 before the SINE method is discussed with respect to its regularisation properties in section 4. In section 5, upper bounds on the number of iterations of SINE are discussed by comparing them to the known upper bounds on the number of iterations of CGNE. The findings are illustrated by an experiment in section 6.
Throughout we will use notations identical or closely related to the notations in [6]. In particular, the functional calculus described in section 2.3 of [6] is used without further note. Our proofs will follow closely or sometimes literally the corresponding proofs for CGNE in chapter 7 of [6].

Basic properties
We consider algorithm 1, where we choose x^δ_0 = 0 without loss of generality. If x^δ_0 were not zero, the corresponding shift-and-invert Krylov subspace would be spanned with r_0 = y^δ − Tx^δ_0 instead of y^δ, which allows one to use prior information on the solution. First, we aim for the following properties: algorithm 1 computes x^δ_m according to (7) as long as the algorithm does not break down. If algorithm 1 breaks down in step κ with q_κ = 0, we have x^δ_κ = T+y^δ as well as x^δ_m = T+y^δ for m ≥ κ in (7).

Algorithm 1. Shift-and-invert on the normal equation (SINE).
An alternative way to compute the iterates of SINE is presented in algorithm 2. This variant is closer to the standard CG iteration and it might therefore help to compare SINE to known methods. All proofs and experiments refer to algorithm 1.

Algorithm 2. Variant of shift-and-invert on the normal equation (SINE).

V Grimm Inverse Problems 36 (2020) 015008
We further obtain which concludes the proof of our statements for m = 1. Now we assume that the assertions are satisfied for m. Then, we have With respect to (ii), it follows due to (6) and the fact that assertion (i) is already proved for m + 1. □ The following lemma is a direct consequence of lemma 2.1.

Lemma 2.2. As long as algorithm 1 does not break down with q_m = 0, we have
With these preparations, the statements we aimed for can be proved.
and obtain by lemma 2.1. The strict inequality, and therefore uniqueness of the minimizer, follows. Hence we have 0 = T*(y^δ − Tx^δ_κ), which characterises the minimum-norm solution, that is, x^δ_κ = x+ (see theorems 2.5 and 2.6 in [6]). □

If the algorithm stops with q_κ = 0, it follows from (7) that x^δ_m = x^δ_κ for m ≥ κ. Analogous to the description of the Krylov subspace K_m by the set Π_{m−1} of polynomials of degree less than m, the shift-and-invert Krylov subspace Q_m can be described with the help of rational functions as

Q_m = { r(T*T)T*y^δ : r ∈ Π_{m−1}/(1 + ·/γ)^{m−1} }.

The functional calculus of section 2.3 in [6] applies to these rational functions. The iterates, residuals etc of algorithm 1 can be identified with the corresponding rational functions (see [6,14]). The following lemma describes some properties of the rational function r_m(λ) that belongs to the residual r_m, i.e. the rational function that represents the residual r_m in the functional calculus.

Lemma 2.5. As long as the stopping index κ has not been reached, we have
where r_m(λ) is the rational function that describes the residual r_m in algorithm 1. The zeros λ_{j,m−1}, j = 1, …, m − 1, of r_{m−1}(λ) and the zeros λ_{j,m}, j = 1, …, m, of r_m(λ) are interlacing, real, and positive, that is,

0 < λ_{1,m} < λ_{1,m−1} < λ_{2,m} < λ_{2,m−1} < ⋯ < λ_{m−1,m−1} < λ_{m,m}.

Proof. Let v_0, …, v_{m−1} be an orthonormal basis such that Then we can represent the iterate x^δ_m as is an equivalent condition to x^δ_m being the minimizer in (7), we have Specifically, Inductively, by the interlacing eigenvalue theorem (see theorem 3.6 in [30]), this leads to the result that the eigenvalues of S_m are separated and interlaced with the eigenvalues of S_{m−1}. If we designate the eigenvalues of S_m with λ_{1,m} < ⋯ < λ_{m,m}, then we obtain the statement on the interlacing of the numbers in our theorem. Using the representation of the iterate in the rational Krylov subspace (see [9]), one obtains

The inner product introduced in the following lemma will be crucial for the proof of our main theorem. Also, the idea of algorithm 1 can be briefly stated as computing an orthogonal basis w_0, …, w_{j−1} of the rational Krylov subspace Q_j with respect to the inner product [·, ·] in (13), when the vectors are identified with the corresponding rational functions. In the following, we will often not distinguish between the residual r_m and the rational function r_m(λ) and denote both by r_m.

Convergence
The following theorem shows convergence of the iterates x_m in algorithm 1 to the minimum-norm solution x+ = T+y for data y ∈ D(T+) and our general choice x_0 = 0. For an initial guess x_0 ≠ 0, it can be readily shown that the iterates converge to T+y + P_{N(T)}x_0, where P_{N(T)} is the orthogonal projector onto the null space of T. The superscript δ has been dropped in this section in order to emphasise that data y ∈ D(T+) without perturbation is considered.

Proof. We basically follow the lines of the proof of theorem 7.9 in [6] or [25], respectively. If the iteration terminates after a finite number of steps, then the corresponding iterate coincides with T+y according to lemma 2.4. We therefore assume that the iteration does not terminate. Then we have the sorting 0 < λ_{1,m} < λ_{2,m} < ⋯ < λ_{m,m} ≤ ‖T‖² of the Ritz values according to lemma 2.5. From the representation of the residual rational function (10) in lemma 2.5, we obtain Since And therefore, we obtain the estimate where E_λ designates the spectral family of T*T. For the last inequality, we additionally assumed y ∈ R(T), that is, y = Tx+. For later use (e.g. lemma 4.1), we discuss the slightly more general function λ^ν φ_m²(λ), ν > 0. Standard calculations lead to Since 0^ν φ_m²(0) = 0 = λ_{1,m}^ν φ_m²(λ_{1,m}), there is at least one 0 < λ* < λ_{1,m} such that (λ^ν φ_m²(λ))′(λ*) = 0 and such that the maximum is achieved at this point. Hence the equation holds true. We need to distinguish two cases. First case: γ ≥ ‖T‖². Then we have 0 < λ* < λ_{1,m} ≤ ‖T‖² ≤ γ. Since λ* < γ, we have and hence, by (17), and therefore Second case: γ < ‖T‖². We set We then have We can therefore conclude, by (17), In both cases, we have We now relax the assumption on y to y ∈ D(T+) = R(T) ⊕ R(T)^⊥. Since R(T)^⊥ = N(T*), algorithm 1 produces the same iterates for y ∈ R(T) ⊕ R(T)^⊥ and P_{R(T)}y ∈ R(T), respectively.
Rewriting P_{R(T)}y = Tx+ with x+ = T+y, we can apply (18) with ν = 1 and obtain From here, a nearly literal copy of the corresponding part of the proof of theorem 7.9 in [6] will finish the proof. □

SINE is an order-optimal regularisation method
We assume that ‖y^δ − y‖ ≤ δ, where the noise level δ > 0 is known. Algorithm 1 is stopped with m = m(δ, y^δ) according to the discrepancy principle (3). For the stopping index m = m(δ, y^δ) ≥ 1,

‖y^δ − Tx^δ_m‖ ≤ τδ < ‖y^δ − Tx^δ_{m−1}‖ (19)

is satisfied (with an a priori chosen τ > 1); for m = 0, only the first inequality holds true. The algorithm always terminates after a finite number of steps, which can be seen as follows. Lemma 4.1 also holds for µ = 0 and ρ = ‖x+‖. Hence lim_{m→∞} ‖y^δ − Tx^δ_m‖ ≤ δ (see (15)). The limit of the norm of the residuals exists, since the sequence is non-increasing due to lemma 2.3 or (7), respectively, and bounded from below by zero. Since τδ > δ, the discrepancy principle stops the algorithm after a finite number of steps. If the algorithm has a finite termination index κ, then q_κ = 0. According to lemma 2.4, we have x^δ_κ = T+y^δ, in which case ‖y^δ − Tx^δ_κ‖ ≤ ‖y^δ − y‖ ≤ δ < τδ. The letter c designates a generic constant in the following lemmata and proofs.

Lemma 4.1. Let y = Tx+ with x+ ∈ X_{µ,ρ}. Then
Proof. The bound (16) proved in theorem 3.1 reads As before, φ_m is bounded by 1 in [0, λ_{1,m}] and satisfies equation (18). If we insert these estimates and use y = T(T*T)^µ w with ‖w‖ ≤ ρ, we obtain which gives the assertion. □

The following lemma and its proof are a nearly literal copy of lemma 7.11 in [6], only that the functions representing the iteration are rational instead of polynomial.
Lemma 4.2. Assume that y = Tx+ with x+ ∈ X_{µ,ρ}. Then, for 0 ≤ m ≤ κ,

Proof. By the interpolation inequality (see [6], page 47) and x^δ_0 = 0, We conclude that the assertion of the lemma is true for m = 0 by keeping in mind that r_0 = 0. Now let 0 < m ≤ κ. By assumption, we have and we choose a positive number such that which in particular implies that it is smaller than or equal to λ_{1,m}, see (15). Next, we introduce where g_m is the rational function that represents the mth SINE iterate in Q_m. We obtain From here, a literal copy of the proof of lemma 7.11 in [6] will do. The only difference is that r_m is a rational function (see (10)) instead of a polynomial. □

Finally, we can prove our main theorem.

Theorem 4.3. If algorithm 1 is stopped according to the discrepancy principle (19) with m = m(δ, y^δ), then SINE is an order-optimal regularisation method, i.e. if T+y ∈ X_{µ,ρ}, then
With respect to lemma 4.2, it remains to estimate |r_{m(δ,y^δ)}(0)|. For simplicity, we write m instead of m(δ, y^δ) in the following and assume, without loss of generality, that m ≥ 2. (The case m = 0 follows from lemma 4.2 with r_0 = 0; m = 1 refers to the space Q_1 = K_1, and theorem 7.12 in [6] applies.) By lemma 4.1, we conclude that Since τ > 1, this implies that It remains to estimate

The rational function
due to (13) in lemma 2.6. Moreover, by definition of π_m and (15), Substituting u_m = π_m + λφ, we have φ ∈ Π_{m−2}/(1 + ·/γ)^{m−1} and We first show that Using (21), we obtain Since [r_m, q] = 0 according to lemma 2.6, we have, again with lemma 2.6, By lemma 2.6 and (21), we obtain and, similarly, which finally gives (25). Due to and by lemma 2.6, we obtain analogously and therefore Hence, by setting (25) and (26) in (24), we obtain From here, the proof continues literally as the proof of theorem 7.12 in [6]. □

Upper bounds for the stopping index
The number m(δ, y^δ) of iterations necessary to meet the discrepancy principle reflects the efficiency of the method. By construction, SINE stops no later, with respect to the discrepancy principle, than any other method that relies on the shift-and-invert Krylov subspace. Here, we will additionally show that SINE stops earlier than CGNE (or at the same iterate as CGNE, in the worst case) under the same assumptions as in section 4. For this discussion, we designate the stopping index for SINE with parameter γ > 0 as m_γ(δ, y^δ). When γ tends to infinity, SINE turns into CGNE; we therefore designate the stopping index of CGNE with m_∞(δ, y^δ). Due to the definition of CGNE (4) and SINE (7), it follows immediately that the norm of the residual of the mth SINE iterate x^SINE_m is always smaller than or equal to the norm of the residual of the mth CGNE iterate x^CGNE_m, i.e.

‖y^δ − Tx^SINE_m‖ ≤ ‖y^δ − Tx^CGNE_m‖.

Theorem 5.1 basically shows that all upper bounds that are known for CGNE also apply to SINE, but SINE might be faster. We explicitly state some corollaries. As a corollary of theorem 7.13 in [6], which is due to [24], and theorem 5.1, we obtain the following statement.
Corollary 5.2. If y ∈ R(T), γ ∈ R+ ∪ {∞}, and T+y ∈ X_{µ,ρ}, then and this estimate is sharp in the sense that the exponent cannot be replaced by a smaller one as long as the bound is to hold true for all possible values of γ. Theorems 7.14 and 7.15 in [6] also hold literally for SINE as simple corollaries of theorem 5.1.

Illustration and discussion
We use the multiplication operator T : L²(0, 1) → L²(0, 1), Tf(t) := tf(t), in order to illustrate the theoretical findings. The range of T is not closed. For example, it can readily be seen that any constant function apart from zero is in the closure of the range R(T), but not in the range R(T) itself. We further have T* = T and T*Tf(t) = t²f(t). For the fractional powers of the operator T*T, we obtain (T*T)^µ f(t) = t^{2µ} f(t), µ > 0. We choose the exact solutions x+_1 = t and x+_2 = t³ with right-hand sides y_1 = Tx+_1 = t² and y_2 = Tx+_2 = t⁴, respectively. Then x+_1 ∈ X_{1/2,1} and x+_2 ∈ X_{3/2,1}. We use the perturbed right-hand sides y^δ_i = y_i + δ, i = 1, 2, with ‖y^δ_i − y_i‖ = δ. The linear systems with the perturbed right-hand sides y^δ_i, i = 1, 2, do not have a solution, and a regularisation is necessary. We now use SINE to compute regularised solutions, where the iteration is stopped according to the discrepancy principle with τ = 1001/1000. In figure 1, the L²-norm of the error of the computed regularisation is plotted versus δ^{−1}. The red, circle-marked line belongs to the error with respect to x+_1 = t and the green, square-marked line belongs to the error with respect to x+_2 = t³. As predicted by theorem 4.3, the convergence to the exact solution with decreasing perturbation δ is at least of order δ^{1/2} or δ^{3/4}, respectively, as indicated by the gray lines. The operator is simple enough that all computations could be conducted exactly with the computer algebra system Maple. By construction, SINE stops no later, with respect to the discrepancy principle, than any other method computing regularisations in the shift-and-invert Krylov subspace Q_m. That is, where these methods have been used successfully, SINE should also be a very good choice.
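The experiment can also be mimicked in floating-point arithmetic by discretising the multiplication operator on a grid. The sketch below is our own finite-dimensional illustration: the grid size, γ, δ and the shape of the perturbation are our choices, and the paper's exact Maple computation is replaced by least squares over the shift-and-invert subspace.

```python
import numpy as np

n, gamma, tau, delta = 200, 1e-2, 1.001, 1e-1
t = (np.arange(n) + 0.5) / n                        # grid on (0, 1)
T = np.diag(t)                                      # (T f)(t) = t f(t), discretised
x_true = t                                          # x+ = t, hence y = t^2
y_delta = t**2 + delta * np.ones(n) / np.sqrt(n)    # perturbation of norm delta

# SINE: minimise the residual over span{(I + T*T/gamma)^{-j} T y_delta},
# stopping by the discrepancy principle (note T* = T for this operator)
shift = np.eye(n) + np.diag(t**2) / gamma
v = T @ y_delta
basis = []
for m in range(1, 61):
    basis.append(v)
    v = np.linalg.solve(shift, v)
    V = np.column_stack(basis)
    c, *_ = np.linalg.lstsq(T @ V, y_delta, rcond=None)
    x = V @ c
    if np.linalg.norm(y_delta - T @ x) <= tau * delta:
        break

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(m, rel_err)   # stopping index and relative reconstruction error
```

In our runs the iteration stops after a handful of steps, mirroring the early termination observed in the paper's exact computation.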
That SINE might also be useful compared with regularisation schemes that do not use the shift-and-invert Krylov subspace Q_m will be illustrated by another simple experiment, in which we compare SINE and CGNE. For γ = 1/1000 and x+ = t, y^δ = t² + δ, δ = 1/1000, SINE stops after two steps with

x^δ_2 = −(21/5000) t³ + (1507/1500) t = c_1 T*y^δ + c_2 (I + T*T/γ)^{−1} T*y^δ ∈ Q_2, c_1 = −21/5000, c_2 = 15 070 063/15 000,

whereas CGNE produces a polynomial of degree 39 after 19 steps. Both methods have been stopped according to the discrepancy principle with τ = 1001/1000. In figure 2, it can be seen that the SINE regularisation on the left-hand side is qualitatively better than the CGNE regularisation on the right-hand side. The experiment also shows that the stopping index of SINE can be significantly smaller than the stopping index of CGNE. With the designations of section 5, we have m_γ(δ, y^δ) = 2 < 19 = m_∞(δ, y^δ).
Altogether, the theory and the experiments suggest that SINE is a valuable order-optimal regularisation scheme. It is also hoped that the given analysis inspires further research on the regularisation properties of rational Krylov subspace methods. For example, it is immediately clear that theorem 5.1 carries over to rational Krylov subspaces with arbitrary negative real poles when the method is defined analogously to (7). Even choosing negative poles at random can only improve on CGNE with respect to the stopping index. These more general rational Krylov subspace methods might also be seen as accelerations of the nonstationary iterated Tikhonov iteration (e.g. [16]) or of method (ii) in example 1.1 with varying step sizes. While there is only one polynomial Krylov subspace, rational Krylov subspaces inspire a wide range of methods that might be adapted to the needs at hand. As a possible application, rational Krylov subspaces have been successfully used to accelerate computations related to seismic imaging (e.g. [4,19,31,32]), which is known to be an ill-posed inverse problem.