On the Choice of the Tikhonov Regularization Parameter and the Discretization Level: A Discrepancy-Based Strategy

We address the classical issue of appropriately choosing the regularization parameter and the discretization level for the Tikhonov regularization of an inverse problem with imperfectly measured data. We focus on the fact that jointly choosing the discretization level in the domain and the regularization parameter is a key step in adequate regularization. We propose a discrepancy-based choice of these quantities by applying a relaxed version of Morozov's discrepancy principle. We prove the existence of a discretization level and a regularization parameter satisfying such a discrepancy principle, and we establish the associated regularizing properties of the Tikhonov minimizers.


Introduction
In many applications, inverse problems are solved in a finite-dimensional, discrete setup with noisy and sparse data, although the theoretical framework is infinite-dimensional. See [11,12,21]. Thus, the relation between the finite- and the infinite-dimensional descriptions of the same problem should be well understood. More precisely, it is important to establish a criterion for choosing the domain discretization level appropriately in terms of the available data, in order to find a reliable solution of the inverse problem, which is in general ill-posed.
Thus, in the context of Tikhonov-type regularization, we propose a discrepancy-based rule for appropriately choosing a regularization parameter and a domain discretization level. We also establish the corresponding regularizing properties of this rule under fairly general assumptions inspired by [21, Chapter 3].
Several authors have considered discretization as a regularization tool. See [12,13,15,17] and references therein. We go one step further by analyzing the interplay between the regularization parameter and the domain discretization level from a discrepancy principle viewpoint.
Assume that a model is given by the operator F : D(F) ⊂ X → Y, defined on the reflexive Banach spaces X and Y, with a convex domain D(F). We want to identify the element x ∈ D(F) ⊂ X that generated, through F, the data y ∈ R(F) ⊂ Y. In other words, we have the following problem:

Problem 1. Given y ∈ R(F), find x ∈ D(F) satisfying F(x) = y.

Problem 1 is an idealization, since it is intrinsically assumed that the data y is perfectly measured, i.e., there is no uncertainty when measuring y. However, in practice we have access only to noisy data in Y, denoted by y^δ. Furthermore, in general the inverse of the forward operator is not continuous or not well-defined, i.e., the problem is ill-posed.
When the statistics of the noise are available and the way the noise corrupts the data is known, the noisy and the noiseless data are related by

y^δ = h(y, e),

where e is the noise, given by some random variable, and the function h(·, ·) states how the uncertainties corrupt the data. See [23].

Remark 1. Let us assume that X and Y are Hilbert spaces¹. Let the noise be additive, i.e., h(y, e) = y + e, and let e be a zero-mean Gaussian random variable with covariance operator Σ, where Σ : Y → Y is a positive-definite, bounded-from-below linear operator. Then, the noiseless and the noisy data should satisfy

⟨y − y^δ, Σ⁻¹(y − y^δ)⟩ ≤ 1,

with δ > 0 and Σ⁻¹ ≥ 1/δ². By the positiveness and boundedness of Σ, it follows that ⟨·, Σ⁻¹·⟩ is a scalar product equivalent to the standard one of Y. In other words, Y endowed with ⟨·, Σ⁻¹·⟩ is also a Hilbert space. Therefore, when we have zero-mean Gaussian noise with covariance operator Σ, there is no loss of generality in assuming that y^δ, y and the noise level δ > 0 satisfy

‖y − y^δ‖_Y ≤ δ.

The presence of noise and the intrinsic ill-posedness of Problem 1 imply that some regularization technique should be employed to find stable approximate solutions. Thus, we investigate a Tikhonov-type regularization approach. See [11,21]. More precisely, we analyze:

Problem 2. Find a minimizer, in the domain of the operator F (assumed convex), of the Tikhonov functional

F^δ_{α,x_0}(x) := ‖F(x) − y^δ‖_Y^p + α f_{x_0}(x),   (4)

with α > 0 and 1 ≤ p < +∞.

¹ The case of Banach spaces follows similarly by replacing the scalar product with the dual pairing.
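For orientation, the minimization in Problem 2 can be carried out in closed form when F is linear, p = 2 and f_{x_0}(x) = ‖x − x_0‖². The following sketch is purely illustrative: the matrix A, the true solution and the noise realization are hypothetical stand-ins for F, x and e, and the paper's setting allows nonlinear F.

```python
import numpy as np

def tikhonov_minimizer(A, y_delta, alpha, x0):
    """Minimize ||A x - y_delta||^2 + alpha * ||x - x0||^2 for a linear operator A.

    The minimizer solves the normal equations
    (A^T A + alpha I) x = A^T y_delta + alpha x0.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n),
                           A.T @ y_delta + alpha * x0)

# Hypothetical ill-conditioned forward operator (a smoothing kernel matrix).
rng = np.random.default_rng(0)
n = 30
A = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 5.0) / n
x_true = np.sin(np.linspace(0.0, np.pi, n))
y = A @ x_true

# Noisy data with ||y - y_delta|| = delta exactly, matching the noise model above.
delta = 0.05
e = rng.standard_normal(n)
y_delta = y + delta * e / np.linalg.norm(e)

x_rec = tikhonov_minimizer(A, y_delta, alpha=1e-4, x0=np.zeros(n))
```

Since the minimizer has a smaller functional value than x_0 = 0, its residual never exceeds ‖y^δ‖, which is a quick sanity check on any implementation.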
In the sequel, we shall make some fairly general assumptions on the functional f_{x_0}. We remark that x_0 was introduced in (4) to allow the incorporation of a priori information.
We analyze Problem 1 in a discrete setup. Thus, besides choosing the regularization parameter α appropriately, we should also choose a proper discretization level in the domain of the forward operator. Such a choice should take into account the available data y^δ. We therefore base the choice of both parameters on the same relaxed version of Morozov's discrepancy principle. The present approach applies, in a nontrivial way, the methodology developed in [16,6,4,5] to the context of nonlinear operators in a discrete setting. We also establish that the continuous case can be recovered from the discrete one when the discretization level goes to infinity.
We propose an a posteriori choice of the discretization level in the domain of the forward operator and of the regularization parameter α in the Tikhonov functional (4), based on the following relaxed version of Morozov's discrepancy principle:

Problem 3. Given fixed 1 < τ < λ, find m ∈ N and α > 0 such that

τδ ≤ ‖F(x^δ_{m,α}) − y^δ‖_Y ≤ λδ,   (5)

where x^δ_{m,α} is a minimizer of (4) in D(F) ∩ X_m, with X_m a finite-dimensional subspace of X.
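Problem 3 can be realized numerically by a brute-force scan over discretization levels m and a logarithmic grid of regularization parameters, accepting the first pair whose residual lands inside the prescribed band. A minimal sketch, assuming a linear operator, coordinate subspaces X_m = span{e_1, …, e_m}, and the band [τδ, λδ]; all numerical values are hypothetical.

```python
import numpy as np

def discrepancy_search(A, y_delta, delta, tau=1.5, lam=3.0,
                       alphas=np.logspace(-10, 2, 400)):
    """Return the first (m, alpha, residual) whose discrete Tikhonov minimizer
    has residual in the band [tau*delta, lam*delta], scanning m upward."""
    n = A.shape[1]
    for m in range(1, n + 1):
        Am = A[:, :m]                      # restriction of A to X_m
        for alpha in alphas:
            xm = np.linalg.solve(Am.T @ Am + alpha * np.eye(m),
                                 Am.T @ y_delta)
            r = np.linalg.norm(Am @ xm - y_delta)
            if tau * delta <= r <= lam * delta:
                return m, alpha, r
    return None

# Hypothetical linear test problem with exact noise level delta.
rng = np.random.default_rng(1)
n = 30
A = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 5.0) / n
y = A @ np.sin(np.linspace(0.0, np.pi, n))
delta = 0.05
e = rng.standard_normal(n)
y_delta = y + delta * e / np.linalg.norm(e)

found = discrepancy_search(A, y_delta, delta)
```

Because the residual is nondecreasing in α and ranges from nearly zero up to ‖y^δ‖ at the finest level, a sufficiently dense α-grid always intersects the band.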
Under this framework, we prove that, if Problem 1 has a unique solution and the parameters α and m satisfy (5), then the discrete regularized reconstructions converge weakly, as the noise level δ goes to zero, to the solution of Problem 1. When uniqueness does not hold, we still obtain convergence, now to some f_{x_0}-minimizing solution of Problem 1 (see Definition 1), under some restrictions on the choice of m.
We observe that, in general, the set of elements in the finite-dimensional subspace X_m satisfying (5) may be empty. Thus, we shall prove that there exist some m and α such that the discrepancy principle (5) is satisfied. Furthermore, under suitable assumptions, convergence and convergence-rate results for the Tikhonov minimizers can be obtained for such parameters in terms of the noise level δ.
Part of the proof of these results relies on the well-posedness of a discrete version of Morozov's discrepancy principle presented in [4,5,6].
We also present some guidelines for adapting the proofs of these results when the forward operator is replaced by a discrete approximation, or when the discrepancy principle (5) is replaced by the sequential discrepancy principle studied in [3].

This article is organized as follows. In Section 2 we introduce the discrete setup and make some assumptions concerning Tikhonov-type regularization. We also present existence and stability results for the minimizers of (4) with a fixed discretization level. In addition, we prove the convergence of the approximate solutions to some solution of Problem 1. The well-posedness of the discrepancy principle (5) for finding the appropriate domain discretization level and regularization parameter is stated in Section 3. The convergence and convergence-rate results associated with the discrepancy principle (5) are established in Section 4. In Section 5, we point out a key change in the proof of the convergence result when, instead of the forward operator F, we consider a discrete approximation of F. In Section 6, we introduce an alternative discrepancy principle, more general than that of Equation (5), and comment on the principal changes in the proofs of the convergence results when this principle is used. Section 7 is devoted to numerical examples that illustrate the present approach. The regularizing properties of the auxiliary discrepancy principle introduced in Section 3 are proved in Appendix A.

Preliminaries
We now define the discrete framework used in the subsequent sections. We also present some preliminary results on Tikhonov regularization under the discrete setup and make some important assumptions.
Assumption 1. The regularizing functional f_{x_0} : D(f_{x_0}) → R_+ is weakly lower semi-continuous, convex, coercive, and proper. We also assume that D(F) is contained in the interior of D(f_{x_0}).

Assumption 2. The forward operator F is continuous with respect to the strong topologies of X and Y. We also assume that the level sets

M_α(ρ) := {x ∈ D(F) : F^δ_{α,x_0}(x) ≤ ρ}

are weakly pre-compact and weakly closed. Moreover, the restriction of F to M_α(ρ) is weakly continuous with respect to the weak topologies of X and Y.

Definition 1. Let LS denote the set of least-squares solutions of Problem 1. An element x† is called an f_{x_0}-minimizing solution of Problem 1 if

x† ∈ L := argmin{f_{x_0}(x) : x ∈ LS}.
We always assume that L ≠ ∅.
Note that the sets LS and L depend on the noiseless data y.
Assumption 3. Let x† be an f_{x_0}-minimizing solution of Problem 1 and let x_0 ∈ D(F) be fixed. We assume that:

Note that Assumption 3 is satisfied by many classes of operators, e.g., locally Hölder continuous operators with exponent greater than 1/2, with p = 2. See [11,21] and references therein.
In the remaining part of this section we define the discrete setup of the present article. Let the sequence {X_m}_{m∈N} of finite-dimensional subspaces of X satisfy

X_m ⊂ X_{m+1} for every m ∈ N, with ∪_{m∈N} X_m dense in X.   (7)

Definition 2. Define the finite-dimensional sets D_m := D(F) ∩ X_m.

The set D_m is convex, since it is the intersection of a subspace of X with a convex set. Note that, had we instead defined D_m as the orthogonal projection of D(F) onto the finite-dimensional subspace X_m, we could possibly have D_m ⊄ D(F), since F is not necessarily linear and D(F) is not necessarily a subspace of X. The present definition ensures that D_m ⊂ D(F) for every m ∈ N.
From now on, we assume that D_m ≠ ∅ for every m. Thus, we want to find x^δ_{m,α} ∈ D_m minimizing (4), with m and α appropriately chosen.
The analysis that follows depends on how fast the restriction of the operator F to D_m converges to F as m → ∞. Thus, we have the following definition:

Definition 3. Let P_m denote the projection onto X_m. For each m ∈ N, define γ_m := ‖F(P_m x†) − F(x†)‖_Y.

Lemma 1. γ_m → 0 as m → ∞.

Proof. From (7) it follows that ‖x − P_m x‖ → 0 as m → ∞ for every x ∈ D(F). Since the operator F is continuous, the assertion follows.
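The convergence ‖x − P_m x‖ → 0 guaranteed by (7) can be observed numerically for any nested family with dense union. A hypothetical sketch using the orthonormal DCT-II vectors of R^n as the subspaces X_1 ⊂ X_2 ⊂ ⋯:

```python
import numpy as np

n = 256
j = np.arange(n)
# Orthogonal DCT-II vectors: v_k[j] = cos(pi * k * (j + 0.5) / n), k = 0..n-1.
basis = np.cos(np.pi * np.outer(np.arange(n), j + 0.5) / n)
basis /= np.linalg.norm(basis, axis=1, keepdims=True)

x = np.exp(-j / n)  # a smooth element of R^n, standing in for x in D(F)

def projection_error(m):
    """|| x - P_m x || for the orthogonal projection P_m onto span(basis[:m])."""
    coeffs = basis[:m] @ x
    return np.linalg.norm(x - basis[:m].T @ coeffs)

errors = [projection_error(m) for m in (1, 4, 16, 64, 256)]
```

Since the subspaces are nested, the errors are monotonically nonincreasing, and they vanish (up to rounding) once m reaches the ambient dimension.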

Existence and Stability of Tikhonov minimizers
We consider the following optimization problem:

Problem 4. For given m ∈ N, α > 0 and data y^δ, find x^δ_{m,α} ∈ D_m minimizing the Tikhonov functional (4) over D_m.

We present below some well-known results concerning the existence and the stability of the solutions of Problem 4. See [19, Proposition 2.3].
Theorem 1 (Existence). Let m ∈ N and δ > 0 be fixed. Then, for any y δ ∈ Y , it follows that Problem 4 has a solution.
Definition 4. For given data y^δ, we call a solution of Problem 4 stable if, for any strongly convergent sequence {y_k}_{k∈N} ⊂ Y with limit y^δ, the corresponding sequence {x_k}_{k∈N} ⊂ X of solutions of Problem 4, with y^δ replaced by y_k in the functional, has a weakly convergent subsequence {x_{k_l}}_{l∈N} whose limit x̄ is a solution of Problem 4 with data y^δ.
Theorem 2 (Stability). For each m ∈ N, the solutions of Problem 4 are stable in the sense of Definition 4. Moreover, the convergent subsequence {x_{k_l}}_{l∈N} with limit x̄ from Definition 4 satisfies f_{x_0}(x_{k_l}) → f_{x_0}(x̄).

Convergence
The following theorem shows that the finite-dimensional Tikhonov minimizers converge to some f_{x_0}-minimizing solution of Problem 1.
Theorem 3. Assume that α = α(δ, γ_m) > 0 satisfies the limits

α → 0 and (δ + γ_m)^p / α → 0, as δ, γ_m → 0.   (11)

Let {x_k}_{k∈N} be a sequence of solutions of Problem 4 with x_k = x^{δ_k}_{m_k,α_k} and δ_k, γ_{m_k} → 0 as k → ∞. Then, it has a weakly convergent subsequence {x_{k_l}}_{l∈N} whose weak limit is x†, an f_{x_0}-minimizing solution of Problem 1.

Proof. Let us choose a sequence {x_k}_{k∈N} of solutions of Problem 4 with data y^{δ_k}, where x_k ∈ X_{m_k} and α_k = α(δ_k, γ_{m_k}) satisfies (11). The existence of this sequence follows from Theorem 1. Then, comparing the Tikhonov functional at x_k and at P_{m_k} x†, we have the estimate

‖F(x_k) − y^{δ_k}‖_Y^p + α_k f_{x_0}(x_k) ≤ (δ_k + γ_{m_k})^p + α_k f_{x_0}(P_{m_k} x†).

Recall that the level sets M_α(ρ) are weakly pre-compact and X_m ⊂ X_{m+1} for every m. The above estimate then implies that {x_k}_{k∈N} is bounded, and thus it has a weakly convergent subsequence, also denoted by {x_k}_{k∈N}, with limit x̄ ∈ D(F). It also follows that

‖F(x_k) − y^{δ_k}‖_Y^p ≤ (δ_k + γ_{m_k})^p + α_k f_{x_0}(P_{m_k} x†) → 0.

On the other hand, we have the estimate

f_{x_0}(x_k) ≤ (δ_k + γ_{m_k})^p / α_k + f_{x_0}(P_{m_k} x†).

Recall that the functional f_{x_0} and the norm of Y are weakly lower semi-continuous. Then, the weak continuity of F implies that F(x̄) = y and

f_{x_0}(x̄) ≤ lim inf_{k→∞} f_{x_0}(x_k) ≤ f_{x_0}(x†).

Therefore, x̄ is an f_{x_0}-minimizing solution of Problem 1.

The Discrepancy Principle
In this section we consider the simultaneous choice of the discretization level in the domain and of the regularization parameter, based on the same discrepancy principle:

Definition 5. Given 1 < τ < λ, choose m = m(δ, y^δ) ∈ N and α = α(δ, y^δ) > 0 such that

τδ ≤ ‖F(x^δ_{m,α}) − y^δ‖_Y ≤ λδ,   (12)

where x^δ_{m,α} is a minimizer of (4) in D_m.

Proposition 1. There exist m ∈ N and α > 0 satisfying the discrepancy principle (12).

In Section 6 we present an alternative discrepancy principle, where only one of the inequalities of (12) must be satisfied. Thus, even if the functional α ↦ ‖F(x^δ_{m,α}) − y^δ‖_Y has discontinuities, the remaining inequality is satisfied by some α, since, as we shall see, lim_{α↘0} ‖F(x^δ_{m,α}) − y^δ‖_Y = 0. The existence of a regularization parameter and a discretization level satisfying the discrepancy principle (12) follows from the well-posedness of the modified Morozov discrepancy principle of Definition 7 below. More precisely, we choose m ∈ N such that γ_m satisfies a modified version of (12). For this same m, we choose α > 0 through Definition 7, given that it is well-posed. Then, these α and m satisfy the desired discrepancy principle. The well-posedness proof of this problem, for each discretization level m ∈ N, is the aim of the following paragraphs.
In what follows, we assume that x_0 in the penalization f_{x_0} is an element of the finite-dimensional sub-domain D_{m_0}, for some m_0 ≤ m, for every m considered in the analysis.

Discrete Morozov's Principle
We now present a criterion for choosing the regularization parameter α by a modified version of Morozov's principle, for a given discretization level m ∈ N.
In what follows we assume that

τ_2(δ + γ_m) < ‖F(x_0) − y^δ‖_Y.

Definition 6. Let δ, y^δ and the domain discretization level m be fixed. For α ∈ R_+, we define the functionals

L(α) := ‖F(x^δ_{m,α}) − y^δ‖_Y^p,  H(α) := f_{x_0}(x^δ_{m,α}),  I(α) := L(α) + αH(α),

where x^δ_{m,α} is a solution of Problem 4. We also define the set of all solutions of Problem 4, for each α ∈ (0, ∞) and m ∈ N:

M_{α,m} := {x ∈ D_m : x minimizes (4) over D_m}.

Note that, in what follows, we assume that the Tikhonov functional is defined for every x ∈ X_m, i.e., it assumes finite values if x ∈ D_m and the value +∞ if x ∉ D_m. Note also that the definition of the sets M_{α,m} implies that every local minimizer of the Tikhonov functional is a global minimizer in X_m.
In the following results, some properties of the functionals L, H and I are presented. See [24, Section 2.6].

Lemma 2 ([24, Lemma 2.6.1]). As functions of α ∈ (0, ∞), the functional H(·) is non-increasing and the functionals L and I are non-decreasing. More precisely, for 0 < α < β we have L(α) ≤ L(β), H(β) ≤ H(α), and I(α) ≤ I(β).

Lemma 3 ([24, Section 2.6]). The sets of discontinuity points of L and of H are countable and coincide. Moreover, for each m ∈ N, the maps L and H are continuous outside this countable set.

Remark 2. Even though we are in a discrete setting, the proofs of Lemmas 2, 3 and 4 remain valid if the functionals L, H and I are restricted to the finite-dimensional subspace X_m, with m ∈ N fixed.
In the present section we consider the following relaxed version of Morozov's discrepancy principle:

Definition 7 (Discrete Morozov Principle). Let δ, y^δ and the domain discretization level m be fixed. Define τ_1 := τ and let τ_2 be such that 1 < τ_1 ≤ τ_2 < λ. Then, find α = α(δ, y^δ, m) > 0 such that

τ_1(δ + γ_m) ≤ ‖F(x^δ_{m,α}) − y^δ‖_Y ≤ τ_2(δ + γ_m)

holds for x^δ_{m,α}, a solution of Problem 4. In Section 6 we present an alternative discrepancy principle.
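Since L is non-decreasing in α, the parameter of Definition 7 can be located by bisection on a logarithmic scale. The sketch below assumes a linear operator with a unique minimizer and, for simplicity, takes γ_m = 0, so the band is (τ_1δ, τ_2δ); all data are hypothetical.

```python
import numpy as np

def morozov_alpha(residual, band, alpha_lo=1e-12, alpha_hi=1e4, max_iter=200):
    """Bisect log(alpha) until residual(alpha) lies in band = (lo, hi).

    `residual` must be nondecreasing in alpha, below the band at alpha_lo and
    above it at alpha_hi; otherwise no parameter may be found."""
    for _ in range(max_iter):
        alpha = np.sqrt(alpha_lo * alpha_hi)   # geometric midpoint
        r = residual(alpha)
        if band[0] <= r <= band[1]:
            return alpha
        if r < band[0]:
            alpha_lo = alpha
        else:
            alpha_hi = alpha
    raise RuntimeError("band not reached; cf. the generalized case below")

# Hypothetical linear test problem.
rng = np.random.default_rng(2)
n = 30
A = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 5.0) / n
y = A @ np.sin(np.linspace(0.0, np.pi, n))
delta = 0.05
e = rng.standard_normal(n)
y_delta = y + delta * e / np.linalg.norm(e)

def residual(alpha):
    x = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)
    return np.linalg.norm(A @ x - y_delta)

tau1, tau2 = 1.2, 2.0
alpha_star = morozov_alpha(residual, band=(tau1 * delta, tau2 * delta))
```

For nonlinear F the residual map may jump across the band, which is exactly the situation handled by the generalized formulation discussed next.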
Proposition 2. Let m ∈ N be fixed. Then, we can find ᾱ, α̲ > 0 such that

‖F(x_1) − y^δ‖_Y ≤ τ_1(δ + γ_m) and ‖F(x_2) − y^δ‖_Y ≥ τ_2(δ + γ_m),

where we denote x_1 := x^δ_{m,α̲} and x_2 := x^δ_{m,ᾱ}.

Proof. Let m be fixed and let the sequence {α_k}_{k∈N} converge monotonically to zero. By Theorem 1, it is possible to find a sequence of Tikhonov minimizers of Problem 4, {x_k}_{k∈N}, with x_k = x^δ_{m,α_k}. By the definition of the functionals L and I, it follows that

L(α_k) ≤ I(α_k) ≤ ‖F(P_m x†) − y^δ‖_Y^p + α_k f_{x_0}(P_m x†) ≤ (δ + γ_m)^p + α_k f_{x_0}(P_m x†).

Note that f_{x_0}(P_m x†) is fixed, τ_1 > 1 and α_k → 0. It follows that L(α_k) < τ_1(γ_m + δ)^p for a sufficiently large k̄. This allows us to set α̲ := α_{k̄}.
On the other hand, let us assume that the sequence {α_k}_{k∈N} satisfies lim_{k→∞} α_k = +∞. Again, by Theorem 1, it is possible to choose a sequence of minimizers of Problem 4, {x_k}_{k∈N}, with x_k = x^δ_{m,α_k} for each k ∈ N. Since f_{x_0}(x) = 0 if, and only if, x = x_0, it follows that x_k ⇀ x_0 as k → ∞. By the weak continuity of F and the weak lower semi-continuity of the norm of Y, it follows that

lim inf_{k→∞} ‖F(x_k) − y^δ‖_Y ≥ ‖F(x_0) − y^δ‖_Y > τ_2(δ + γ_m).

This implies that there exists a sufficiently large k̄ such that ‖F(x_{k̄}) − y^δ‖_Y > τ_2(δ + γ_m). Then, set ᾱ := α_{k̄}.
Following [4], we say that α > 0 satisfies the discrepancy principle of Definition 7 in the generalized sense if

‖F(x̲) − y^δ‖_Y ≤ τ_2(δ + γ_m) and τ_1(δ + γ_m) ≤ ‖F(x̄) − y^δ‖_Y   (19)

is satisfied for some x̲, x̄ ∈ M_{α,m}.
The following result shows the well-posedness of the discrepancy principle of Definition 7.

Theorem 4. Let m ∈ N be fixed. Then, there exists α > 0 satisfying the discrepancy principle of Definition 7, possibly in the generalized sense (19).

Proof. Assume, to the contrary, that no α > 0 satisfies (19) (in particular, none satisfies Definition 7). Define the sets

A := {α > 0 : ‖F(x) − y^δ‖_Y < τ_1(δ + γ_m) for every x ∈ M_{α,m}},
B := {α > 0 : ‖F(x) − y^δ‖_Y > τ_2(δ + γ_m) for every x ∈ M_{α,m}}.   (18)

From Equation (18) it follows that A ∩ B = ∅. Since we are assuming that there is no α > 0 satisfying (19), it follows that A ∪ B = R_+.

Define ᾱ := sup A. By Proposition 2 and since L is non-decreasing, it follows that ᾱ < +∞. Then, ᾱ must belong to either A or B.

If ᾱ ∈ A, then it is possible to find a sequence {α_k}_{k∈N} ⊂ B converging to ᾱ with α_k > ᾱ, since A ∪ B = R_+ and L is non-decreasing. Thus, let us select a sequence of minimizers {x_k}_{k∈N} with x_k = x^δ_{m,α_k} ∈ M_{α_k,m}. By stability (Theorem 2), a subsequence converges weakly to some x̃ ∈ M_{ᾱ,m}, and we have the estimates

τ_2(δ + γ_m) ≤ lim inf_{k→∞} ‖F(x_k) − y^δ‖_Y = ‖F(x̃) − y^δ‖_Y < τ_1(δ + γ_m).

This is a contradiction, since τ_1 < τ_2.

On the other hand, assume that ᾱ ∈ B. Then, it is possible to find a sequence {α_k}_{k∈N} ⊂ A converging to ᾱ, with α_k < ᾱ; this follows by the same argument as above. Selecting a sequence of minimizers {x_k}_{k∈N} with x_k = x^δ_{m,α_k} ∈ M_{α_k,m} and arguing as before, we obtain

τ_1(δ + γ_m) ≥ lim sup_{k→∞} ‖F(x_k) − y^δ‖_Y = ‖F(x̃) − y^δ‖_Y > τ_2(δ + γ_m),

again a contradiction. We have thus established the well-posedness of the discrete Morozov principle. See Appendix A for the corresponding regularizing properties.
Under the present setup, if we choose m ∈ N sufficiently large, so that

γ_m ≤ (λ/τ_2 − 1)δ

is satisfied with λ > τ_2 > 1, then, for this same m, it follows that, when α is chosen through Definition 7, the discrepancy (12) is satisfied with x^δ_{m,α} a solution of (4). This follows since τ_1δ ≤ τ_1(δ + γ_m) and τ_2(δ + γ_m) ≤ λδ.
This leads us to the proof of Proposition 1:

Proof. Let us consider the sets M_{α,m} of solutions of Problem 4, corresponding to α and m. Recall that, by Theorem 1, these sets are nonempty. We also define the sets

A_{δ,m} := {x ∈ D_m : τδ ≤ ‖F(x) − y^δ‖_Y ≤ λδ}.

Note that it may occur that M_{α,m} ∩ A_{δ,m} = ∅. However, assuming that γ_m = O(δ), by Theorem 4 there exist some α > 0 and m ∈ N such that M_{α,m} ∩ A_{δ,m} ≠ ∅. More precisely, let m satisfy γ_m ≤ (λ/τ_2 − 1)δ, with τ < τ_2 < λ. Also, let α be chosen through Definition 7, with τ_1 = τ. Then, for such m and α, it follows that x^δ_{m,α} ∈ M_{α,m} ∩ A_{δ,m}.

Remark 3. Therefore, the problem of finding m and α through the discrepancy principle of Definition 5 is well-posed.
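The selection rule γ_m ≤ (λ/τ_2 − 1)δ amounts to taking the smallest discretization level whose approximation error is dominated by a multiple of the noise level. A toy computation with a hypothetical geometric sequence γ_m = 2^(−m):

```python
import numpy as np

# Hypothetical approximation errors gamma_m = 2^(-m), m = 1..39.
gammas = 2.0 ** -np.arange(1, 40)
delta, lam, tau2 = 1e-3, 3.0, 1.5

threshold = (lam / tau2 - 1.0) * delta          # here: 1e-3
m = 1 + int(np.argmax(gammas <= threshold))     # smallest admissible level
```

With these values the threshold equals 10⁻³, so the rule selects m = 10, the first level with γ_m = 2⁻¹⁰ ≈ 9.77 · 10⁻⁴ below the threshold.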

Regularizing Properties
In the previous section we established the well-posedness of the discrepancy principle (12) as a rule to select the parameters m and α, for fixed data y^δ and noise level δ. We now explore the corresponding regularizing properties.
For fixed ε ∈ (0, τ − 1), define the sets H_m := {x ∈ D_m : ‖F(x) − y^δ‖_Y ≤ (τ − ε)δ}. These sets shall be used in the proofs of the convergence results whenever we need to assume that H_m is nonempty. Observe that, for m sufficiently large, H_m is indeed nonempty. Indeed, since F(x†) = y and ‖y − y^δ‖_Y ≤ δ < (τ − ε)δ, the solution x† belongs to F⁻¹(B(y^δ, (τ − ε)δ)), where B(y^δ, (τ − ε)δ) is the open ball centered at y^δ with radius (τ − ε)δ. It also follows that this pre-image is open in D(F), by the continuity of F. Then, by (7), it is possible to find a sequence {x_k}_{k∈N}, with x_k ∈ H_{m_k}, converging strongly to x†. The following proposition states a connection between the discrete setting and the continuous one.
Proposition 3. Let us consider the limit m → ∞, with δ > 0 fixed. Select sequences {α_m} and {x^δ_{m,α_m}}, where x^δ_{m,α_m} is a solution of Problem 4 in D_m and α_m is the corresponding regularization parameter. Assume that, for each m, x^δ_{m,α_m} satisfies the discrepancy principle (12). Then, any weak cluster point x̄ of {x^δ_{m,α_m}} is a Tikhonov minimizer in D(F), with regularization parameter ᾱ := lim inf_{m→∞} α_m, satisfying the discrepancy principle

τδ ≤ ‖F(x̄) − y^δ‖_Y ≤ λδ.

Proof. Choose a sufficiently large m_0 ∈ N such that, for every m ≥ m_0, the set H_m is nonempty. Then, whenever m ≥ m_0, we choose α_m = α(δ, y^δ, m) satisfying (22), with x^δ_{m,α_m} a corresponding Tikhonov solution in D_m. By Lemma 3, the functionals defined in Equations (14), (15) and (16) are continuous outside a countable set. On the other hand, since the level sets of the Tikhonov functional are weakly pre-compact, it follows that the sequence {x^δ_{m,α_m}}_{m∈N} of corresponding Tikhonov minimizers is weakly pre-compact.

Let us define ᾱ := lim inf_{m→∞} α_m. We select convergent subsequences {α_{m_k}}_{k∈N} and {x^δ_{m_k,α_{m_k}}}_{k∈N}, the latter with weak limit x̄. Recall that F is weakly continuous. Then, passing to the limit in the discrepancy inequalities satisfied at each level, we obtain

τδ ≤ ‖F(x̄) − y^δ‖_Y ≤ λδ.

We now claim that x̄ is a Tikhonov minimizer in D(F), with regularization parameter ᾱ. Indeed, for an arbitrary fixed x ∈ D(F), we choose a sequence {x_k}_{k∈N}, with x_k ∈ D_{m_k} for each k ∈ N and x_k → x strongly. Since D(F) is contained in the interior of D(f_{x_0}) and the operator F is continuous, it follows that

F^δ_{α_{m_k},x_0}(x_k) → F^δ_{ᾱ,x_0}(x).   (23)

By the (weak) lower semi-continuity of f_{x_0}(·) and of ‖F(·) − y^δ‖_Y, it follows that

F^δ_{ᾱ,x_0}(x̄) ≤ lim inf_{k→∞} F^δ_{α_{m_k},x_0}(x^δ_{m_k,α_{m_k}}).   (24)

Since, for each k, x^δ_{m_k,α_{m_k}} is a Tikhonov minimizer in D_{m_k} with regularization parameter α_{m_k}, we have F^δ_{α_{m_k},x_0}(x^δ_{m_k,α_{m_k}}) ≤ F^δ_{α_{m_k},x_0}(x_k) for every k. Therefore, applying lim inf on both sides and considering Equations (23) and (24), it follows that

F^δ_{ᾱ,x_0}(x̄) ≤ F^δ_{ᾱ,x_0}(x).

Since x was arbitrarily chosen in D(F), the assertion follows.

Convergence
We now present results concerning the convergence of the approximate solutions.
Theorem 5. Let m and α satisfy the discrepancy principle (12). Consider a sequence of real numbers {δ_k}_{k∈N} with δ_k > 0 and δ_k → 0. Then, every sequence of regularized solutions {x_k}_{k∈N}, with x_k = x^{δ_k}_{m_k,α_k}, has a subsequence converging weakly to some least-squares solution of Problem 1. Moreover, if there exists a unique solution x† of Problem 1, then the whole sequence converges weakly to x†.
Proof. Let us choose the sequence {δ_k}_{k∈N} with δ_k → 0 monotonically. For each δ_k, it follows from Proposition 1 that there exist m_k, α_k and some regularized solution x_k = x^{δ_k}_{m_k,α_k} satisfying the discrepancy principle (12). Then, we can find a sequence {x_k}_{k∈N} associated with {δ_k}_{k∈N}.
According to Assumption 2, the level sets of the Tikhonov functional (4) are weakly pre-compact. Since D(F) is convex, the sequence {x_k}_{k∈N} has a weakly convergent subsequence {x_{k_l}}_{l∈N} with limit x̄ ∈ D(F).
By the weak lower semi-continuity of the norm and the weak continuity of F, it follows that

‖F(x̄) − y‖_Y ≤ lim inf_{l→∞} (‖F(x_{k_l}) − y^{δ_{k_l}}‖_Y + δ_{k_l}).

Since, for each l ∈ N, x_{k_l} satisfies the discrepancy principle, in particular ‖F(x_{k_l}) − y^{δ_{k_l}}‖_Y ≤ λδ_{k_l}, we obtain

‖F(x̄) − y‖_Y ≤ lim_{l→∞} (λ + 1)δ_{k_l} = 0.

This leads to F(x̄) = y. Therefore, x̄ is a least-squares solution of Problem 1. If the inverse problem has a unique solution x†, then x† = x̄ and γ_{m_{k_l}} → 0. Furthermore, the whole sequence {x_k}_{k∈N} converges weakly to x̄, since x̄ is the unique cluster point of the bounded sequence {x_k}_{k∈N}.
Remark 4. The existence of the parameters m and α satisfying the discrepancy principle (12) is guaranteed by Proposition 1. Intuitively, m is the largest discretization level satisfying such discrepancy principle and α is the associated Morozov's regularization parameter. When implementing the Tikhonov regularization numerically, the discrepancy principle can be used as a stopping criterion in the minimization procedure.
Observe also that, if we are looking for a least-squares solution of Problem 1, or if the inverse problem has a unique solution, then no further assumption or restriction on the choice of m is needed, and the convergence result holds.
Proof. (i) Following the same arguments as in the proof of Theorem 5, we choose sequences {δ_k}_{k∈N}, with δ_k ↓ 0, and {x_k}_{k∈N}, the corresponding sequence of Tikhonov minimizers, with x_k = x^{δ_k}_{m_k,α_k}. We can assume that the latter has a weak limit x̄ ∈ D(F). By Equation (25), x̄ is a least-squares solution of Problem 1. Let us also assume that each element of the sequence of regularization parameters {α_k}_{k∈N}, associated with {x_k}_{k∈N}, satisfies Morozov's principle.
Since H_{m_k} ≠ ∅ for each k ∈ N, it is possible to choose a sequence {x̃_k}_{k∈N} such that x̃_k ∈ H_{m_k} for each k ∈ N and x̃_k → x† strongly. Moreover, the estimates in (28), comparing f_{x_0}(x_k) with f_{x_0}(x̃_k), hold. Thus, by the weak lower semi-continuity of f_{x_0} and these estimates, it follows that

f_{x_0}(x̄) ≤ lim inf_{k→∞} f_{x_0}(x_k) ≤ f_{x_0}(x†).   (29)

Since x̄ is a solution of the inverse Problem 1, it follows from (29) that x̄ is also an f_{x_0}-minimizing solution. Hence, f_{x_0}(x̄) = f_{x_0}(x†). Therefore, the inequalities in (28) imply that f_{x_0}(x_k) − f_{x_0}(x̃_k) → 0. Then, the second limit in (27) holds.
(ii) Under the same hypotheses and notation of (i), assume that there exist a constant c > 0 and a subsequence {α_{k_l}}_{l∈N}, with α_{k_l} = α(δ_{k_l}, y^{δ_{k_l}}, m_{k_l}), such that α_{k_l} > c for every l ∈ N. Define ᾱ := lim inf_{l→∞} α_{k_l}.
Let us consider the corresponding subsequence of Tikhonov minimizers {x_{k_l}}_{l∈N}. Since the original sequence converges weakly to x̄, the subsequence {x_{k_l}}_{l∈N} also converges weakly to x̄. On the other hand, considering the subsequence {x̃_{k_l}}_{l∈N}, with x̃_{k_l} → x† as in (i), we have estimates comparing the Tikhonov functional at x_{k_l} and at x̃_{k_l}. Since x̄ is the weak limit of {x_{k_l}}_{l∈N} and it is an f_{x_0}-minimizing solution of Problem 1, it follows from these estimates that x̄ is a solution of Problem 4 with (noiseless) data y and regularization parameter ᾱ.
Remark 5. If we assume that H_m is nonempty, then x̄, the weak limit of the sequence of minimizers defined in the proof of Theorem 6, is an f_{x_0}-minimizing solution of Problem 1. Note that it is always possible to increase the discretization level m so that H_m ≠ ∅. On the other hand, if Problem 1 has a unique solution, then this assumption is unnecessary. See Proposition 3.

Convergence Rates
The first theorem of the present section states convergence rates, with respect to δ, for the regularized solutions of Problem 1 associated with m and α satisfying the discrepancy principle (12). The subsequent results generalize this theorem to more general forward operators; however, further restrictions on the choice of m are then necessary.
In the first part of this section we introduce some definitions, assumptions and auxiliary lemmas that are necessary to establish the convergence-rate results.

Definition 9 ([21], Definition 3.15). Let U denote a Banach space and let f : D(f) ⊆ U → R ∪ {+∞} be a convex functional with sub-differential ∂f(u) at u ∈ D(f). The Bregman distance (or divergence) of f at u ∈ D(f) and ξ ∈ ∂f(u) ⊂ U* is defined by

D_ξ(ũ, u) := f(ũ) − f(u) − ⟨ξ, ũ − u⟩,

for every ũ ∈ U, where ⟨·, ·⟩ is the dual pairing between U* and U. Moreover, the set

D_B(f) := {u ∈ D(f) : ∂f(u) ≠ ∅}

is called the Bregman domain of f.
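As a concrete instance of this definition, for the quadratic penalty f(x) = ‖x − x_0‖² the Bregman distance reduces to the squared norm ‖ũ − u‖². A small numerical check (the vectors below are arbitrary):

```python
import numpy as np

def bregman_distance(f, grad_f, u_tilde, u):
    """D_xi(u_tilde, u) = f(u_tilde) - f(u) - <xi, u_tilde - u>, xi = grad f(u)."""
    return f(u_tilde) - f(u) - grad_f(u) @ (u_tilde - u)

x0 = np.zeros(3)
f = lambda x: float(np.sum((x - x0) ** 2))   # quadratic penalty f_{x0}
grad_f = lambda x: 2.0 * (x - x0)            # its (unique) subgradient

u = np.array([1.0, 2.0, 3.0])
u_tilde = np.array([0.5, 1.0, -1.0])
d = bregman_distance(f, grad_f, u_tilde, u)  # equals ||u_tilde - u||^2
```

This identity is exactly the 2-coercive case used in the convergence-rate discussion later in this section.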

Lemma 5. Let m and α satisfy the discrepancy principle (12). Assume that, for this m, H_m is nonempty, and let x^δ_{m,α} be the respective minimizer of (4). Then, the estimates below hold for any ξ† ∈ ∂f_{x_0}(x†).

Proof. Let m = m(δ, y^δ) satisfy (12). By the same arguments as in the proof of Theorem 6, and assuming that H_m ≠ ∅, it follows that f_{x_0}(x^δ_{m,α}) ≤ f_{x_0}(x_m), where x_m ∈ H_m and x_m → x† strongly. For any ξ† ∈ ∂f_{x_0}(x†), the claim then follows from Assumption 3 and the definition of the Bregman distance.

Lemma 6. Let α and m be chosen through the discrepancy principle (12). Let also ε̄ ∈ (ε, τ − 1) be fixed and let H_m be nonempty.

Proof. For every x ∈ H_m we have ‖F(x) − y^δ‖_Y ≤ (τ − ε)δ, while the discrepancy principle (12) gives ‖F(x^δ_{m,α}) − y^δ‖_Y ≥ τδ. Then, comparing the Tikhonov functional at x^δ_{m,α} and at x ∈ H_m, it follows that

f_{x_0}(x^δ_{m,α}) ≤ inf_{x∈H_m} f_{x_0}(x) + ε̄^p δ^p / α.

Note that in Lemma 6 we have assumed that ε̄ > ε. Recall that ε̄^p δ^p / α > 0, f_{x_0} is continuous in the interior of D(f_{x_0}), and D(F) is contained in the interior of D(f_{x_0}). Then, the estimate inf_{x∈H_m} f_{x_0}(x) ≤ κ + ε̄^p δ^p / α is satisfied for every sufficiently large m ∈ N.
Inspired by [21, Chapter 3], we have the following assumption:

Assumption 5. There exist constants β_1 ∈ [0, 1) and β_2 ≥ 0 such that

|⟨ξ†, x† − x⟩| ≤ β_1 D_{ξ†}(x, x†) + β_2 ‖F(x) − F(x†)‖_Y.

Theorem 7 (Convergence Rates). Let m and α be chosen through the discrepancy principle (12) and let Assumption 5 be satisfied. In addition, suppose that H_m is nonempty. If x^δ_{m,α} is a minimizer of (4) and x_m ∈ H_m, then the estimates

‖F(x^δ_{m,α}) − y^δ‖_Y = O(δ) and D_{ξ†}(x^δ_{m,α}, x†) = O(δ)   (35)

hold. Moreover, if the hypotheses of Lemma 6 also hold, we have the estimates (36).

Proof. (i) The estimate ‖F(x^δ_{m,α}) − y^δ‖_Y = O(δ) follows directly from the discrepancy principle (12). (ii) Let us prove the second estimate in (35). By Assumption 5, if x_m is an element of H_m, then

D_{ξ†}(x_m, x†) ≤ β_1 D_{ξ†}(x_m, x†) + β_2 ‖F(x_m) − F(x†)‖_Y.

Analogously, it follows that D_{ξ†}(x^δ_{m,α}, x†) ≤ β_1 D_{ξ†}(x^δ_{m,α}, x†) + β_2(λ + 1)δ. Lemma 5 and the above estimates yield the second estimate in (35).
(iii) We now prove the second estimate in (36); the first one follows by the same arguments as the first estimate of (35). By Lemma 6 and Assumption 5, there exist constants β_1 ∈ [0, 1) and β_2 ≥ 0 such that

(1 − β_1) D_{ξ†}(x^δ_{m,α}, x†) ≤ β_2(λ + 1)δ + ε̄^p δ^p / α.

This implies that D_{ξ†}(x^δ_{m,α}, x†) = O(δ).

Proposition 4. Let m and α satisfy the discrepancy principle (12). Assume that, for the same m and α, H_m is nonempty and x^δ_{m,α} is a minimizer of (4). Assume that F is Fréchet differentiable at x† and let the source condition ξ† ∈ R(F′(x†)*) hold. Let also the estimate

‖F(x) − F(x†) − F′(x†)(x − x†)‖_Y ≤ C‖x − x†‖²_X

hold with a constant C and x ∈ B(x†, η), for some η > 0. Again, let x_m be an element of H_m. Then, we have the convergence rates

‖F(x^δ_{m,α}) − y^δ‖_Y = O(δ) and D_{ξ†}(x^δ_{m,α}, x†) = O(δ).

Proof. The first estimate follows from the discrepancy principle (12). Let x_m be an element of H_m. Recall that f_{x_0} is convex, and hence locally Lipschitz continuous in the interior of D(f_{x_0}). Since x† and x_m are interior points of D(f_{x_0}), a Lipschitz estimate for f_{x_0}(x_m) − f_{x_0}(x†) holds. A similar argument yields the estimate with the roles of x_m and x† interchanged. Then, from the above estimates and Lemma 5, the second rate follows.

Definition 10 (q-Coerciveness). Let 1 ≤ q < ∞ and u ∈ D(f) be fixed. The Bregman distance D_ξ(·, u) is called q-coercive with constant ζ > 0 if the inequality

D_ξ(ũ, u) ≥ ζ‖ũ − u‖^q

holds for every ũ ∈ D(f).

Example 1. Let X be a Hilbert space and let f_{x_0}(x) = ‖x − x_0‖²_X be the quadratic Tikhonov penalization. The corresponding Bregman distance is 2-coercive, since D_ξ(x̃, x) = ‖x̃ − x‖²_X. Then, the estimate (36) of Theorem 7 implies L²-convergence of order O(√δ). See [21].

Remark 6. In addition to the hypotheses of Theorem 7, let us assume that the Bregman distance associated with f_{x_0} is q-coercive. Then, (35) and (36) yield the estimates ‖x^δ_{m,α} − x†‖_X = O(δ^{1/q}), respectively.

Discrete Forward Operator
We now present some aspects to be considered when the continuous forward operator is replaced by a finite-dimensional approximation. Let us consider a sequence {Y_n}_{n∈N} of finite-dimensional subspaces of Y. Then, we replace the continuous forward operator F by a finite-dimensional approximation F_n. In Section 7 we consider, as an illustrative example, the discretization of the parameter-to-solution map that associates a diffusion parameter with the solution of a parabolic Cauchy problem. The discretization is then defined by the Crank-Nicolson scheme that numerically solves the associated parabolic partial differential equation.
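As an illustration of such a finite-dimensional forward map, the sketch below applies the Crank-Nicolson scheme to the 1D heat equation u_t = a u_xx with homogeneous Dirichlet conditions; a constant coefficient a is a deliberately simplified stand-in for the space-dependent diffusion parameter of Section 7, and all grid sizes are hypothetical.

```python
import numpy as np

def crank_nicolson_heat(a, u0, dx, dt, steps):
    """Crank-Nicolson time stepping for u_t = a * u_xx, zero Dirichlet boundary.

    u0 holds the interior grid values; each step solves
    (I - r L) u_new = (I + r L) u_old, with r = a*dt/(2*dx^2)."""
    n = len(u0)
    r = a * dt / (2.0 * dx ** 2)
    # Tridiagonal second-difference matrix with Dirichlet boundaries.
    L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    A_imp = np.eye(n) - r * L      # implicit half-step
    B_exp = np.eye(n) + r * L      # explicit half-step
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u = np.linalg.solve(A_imp, B_exp @ u)
    return u

# Sanity check against the exact decay of the first Fourier mode:
# u(x, t) = exp(-a*pi^2*t) * sin(pi*x) on (0, 1).
n = 50
dx = 1.0 / (n + 1)
x = dx * np.arange(1, n + 1)
u0 = np.sin(np.pi * x)
u = crank_nicolson_heat(a=1.0, u0=u0, dx=dx, dt=1e-3, steps=100)
```

For this initial condition the computed amplitude after t = 0.1 matches the exact decay factor e^(−aπ²t) to within the scheme's second-order accuracy in time and space.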
In the present discrete setting, we consider the following alternative discrepancy principle:

Definition 11. Let δ > 0 and y^δ be fixed. For λ > τ > 1, we choose m, n ∈ N and α > 0, with m = m(δ, y^δ), n = n(δ, y^δ) and α = α(δ, y^δ), such that

τδ ≤ ‖F_n(x^{δ,α}_{m,n}) − y^δ‖_Y ≤ λδ   (40)

holds for x^{δ,α}_{m,n}, a solution of

min_{x ∈ D_m} ‖F_n(x) − y^δ‖_Y^p + α f_{x_0}(x).

In the present context, all the results of the previous sections hold. However, some additional calculations are needed when F is replaced by F_n. The main argument in the convergence analysis is based on the existence of a diagonal subsequence converging weakly to an f_{x_0}-minimizing solution of Problem 1 as the limits δ → 0 and m, n → ∞ are taken.
More precisely, with δ > 0 fixed, the limit n → ∞ is taken while the discrepancy principle (40) holds for every n. Then, we can find a sequence of minimizers {x^{δ,α}_{m,n}}_{n∈N} converging weakly to some minimizer of (4) satisfying (12). By Proposition 3, if we also take the limit m → ∞, the resulting sequence has a weakly convergent subsequence whose limit satisfies the continuous version of the Morozov discrepancy principle presented in [4]. For this reason, we can always assume the existence of a diagonal subsequence converging weakly to an f_{x_0}-minimizing solution of Problem 1 as δ → 0.
The proof of these results in the specific example of local volatility calibration by Tikhonov regularization can be found in Section 4 of [1].

An Alternative Discrepancy Principle
In general, Assumption 4 does not hold for nonlinear forward operators. See [22, Remark 4.7]. More precisely, one of the inequalities of the discrepancy principle (12) may fail for prescribed constants 1 < τ ≤ λ or 1 < τ_1 ≤ τ_2. Thus, as an alternative, whenever ensuring (12) is not possible, we base our choice of α, for a fixed m, on the sequential discrepancy principle presented in [3]. It goes as follows:

Definition 12 (Sequential Morozov Criterion). For prescribed τ̄ > 1, α_0 > 0 and 0 < q < 1, choose k ∈ N such that α_k := q^k α_0 satisfies the discrepancy

‖F(x^δ_{m,α_k}) − y^δ‖_Y ≤ τ̄δ < ‖F(x^δ_{m,α_{k−1}}) − y^δ‖_Y   (42)

for some x^δ_{m,α_k} ∈ M_{α_k,m} and x^δ_{m,α_{k−1}} ∈ M_{α_{k−1},m}.
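The sequential criterion of Definition 12 is a geometric walk on α, stopping at the first level that crosses the threshold τ̄δ. A minimal sketch assuming a nondecreasing residual map induced by a linear operator (the operator and all constants are hypothetical):

```python
import numpy as np

def sequential_discrepancy(residual, tau_bar, delta, alpha0=1.0, q=0.5,
                           max_iter=200):
    """Return the first alpha_k = q**k * alpha0 with
    residual(alpha_k) <= tau_bar*delta < residual(alpha_{k-1})."""
    prev = residual(alpha0)
    for k in range(1, max_iter):
        alpha = q ** k * alpha0
        cur = residual(alpha)
        if cur <= tau_bar * delta < prev:
            return alpha
        prev = cur
    raise RuntimeError("sequential discrepancy not satisfied")

# Hypothetical linear test problem.
rng = np.random.default_rng(3)
n = 30
A = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 5.0) / n
y = A @ np.sin(np.linspace(0.0, np.pi, n))
delta = 0.05
e = rng.standard_normal(n)
y_delta = y + delta * e / np.linalg.norm(e)

def residual(alpha):
    x = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)
    return np.linalg.norm(A @ x - y_delta)

alpha_seq = sequential_discrepancy(residual, tau_bar=1.5, delta=delta)
```

Note that only a one-sided bound is enforced at the accepted α; the complementary bound holds at the previous level α/q, which is exactly the trade-off discussed below.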
The existence of α and m satisfying the discrepancy (42) follows directly from Proposition 2, assuming that m is sufficiently large. More precisely, we can replace, for instance, τ̄δ in (42) by (1 + ε)(γ_m + δ). Then, the estimate always holds by Proposition 2, for every fixed m and ε ∈ (0, τ̄ − 1). Thus, for a sufficiently large m, it follows that τ̄δ ≈ (τ̄ − ε)(γ_m + δ). Theorems 5 and 6 remain valid if the discrepancy principle (12) is replaced by the sequential discrepancy principle (42). This follows by noting that, whenever the lower inequality in the discrepancy principle (12) holds, we can replace it by τ̄δ ≤ ‖F(x^δ_{m,α/q}) − y^δ‖, where α satisfies the lower inequality of the sequential discrepancy principle (42) and α/q satisfies the upper one. See [3, Section 3].
The discrepancy principle (12) is always preferable, since its lower inequality implies that ‖F(x^δ_{m,α}) − y^δ‖ ≥ τδ. This prevents the Tikhonov solutions from overfitting and reproducing noise. The same conclusion does not necessarily hold if the sequential discrepancy principle (42) is used.
When using the sequential discrepancy principle (42), it is not always possible to achieve the convergence rate D_{ξ†}(x^δ_{m,α}, x†) = O(δ). The technical point is that, if α satisfies the lower inequality of (42), then the estimate f_{x_0}(x^δ_{m,α}) − f_{x_0}(x†) ≤ 0 does not necessarily hold. Such an estimate holds for α/q instead. An additional condition for achieving the rate D_{ξ†}(x^δ_{m,α}, x†) = O(δ) with the sequential discrepancy principle (42) is to assume that α = O(δ). For a more detailed discussion of convergence rates under the sequential discrepancy principle (42), see [3, Section 4] and [14].

Numerical Examples
We shall now illustrate the theoretical results of the previous sections with some numerical examples based on the calibration of a diffusion coefficient in a parabolic problem. See [1,7,8,9]. More precisely, let a_1, a_2 ∈ R be scalar constants such that 0 < a_1 ≤ a_2 < +∞, let a_0 ∈ H^{1+ε}(R_+ × R) be fixed, and define the set Q accordingly. Assuming that the data u was generated by the parabolic problem (43), our problem is to find the diffusion parameter a ∈ Q.
We define the forward operator accordingly, with a_0 ∈ Q fixed and chosen a priori. The choice of H^{1+ε}(R_+ × R) is justified in [8,9]. The forward operator under consideration fulfills the hypotheses of the theorems of the previous sections. Here, we implement numerically the Tikhonov regularization for this specific problem with synthetic data. For the technical details, see [1,7,8,9].
In the calibration, we take as the true (known) diffusion coefficient the one given in Equation (44) and set a = σ²/2. We also assume that b = 0.03 in Equation (43). We illustrate the discrepancy principle and the convergence-rate results of the previous sections by varying the noise and discretization levels.
We generate the data as follows. On a given mesh, we numerically solve the Cauchy problem of Equation (43) with the diffusion coefficient given in Equation (44), evaluated on the same mesh. We add zero-mean Gaussian noise with standard deviation 0.01 to this numerical solution and interpolate the resulting data onto a coarser mesh. We then use this data to calibrate the corresponding diffusion coefficient.
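The data-generation loop can be sketched as follows. The explicit heat-equation solver below is a simplified stand-in for the Cauchy problem (43) (no drift term, homogeneous Dirichlet boundaries), and the grids and coefficient are illustrative:

```python
import numpy as np

def solve_heat_explicit(a, u0, dt, dy, n_steps):
    # Explicit finite differences for u_t = a * u_yy with zero Dirichlet
    # boundaries; stable when a * dt / dy**2 <= 1/2.
    u = u0.copy()
    for _ in range(n_steps):
        lap = np.zeros_like(u)
        lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dy**2
        u = u + dt * a * lap
    return u

def noisy_coarse_data(u_fine, y_fine, y_coarse, sigma=0.01, seed=0):
    # Add zero-mean Gaussian noise (sd = sigma) on the fine mesh, then
    # interpolate the noisy values onto the coarser mesh, as in the text.
    rng = np.random.default_rng(seed)
    u_noisy = u_fine + rng.normal(0.0, sigma, size=u_fine.shape)
    return np.interp(y_coarse, y_fine, u_noisy)
```

The noisy coarse-grid samples returned by `noisy_coarse_data` play the role of the measured data u^δ in the calibration.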
If ∆τ and ∆y denote the time and space mesh sizes, respectively, the calibration of the diffusion coefficient is solved numerically with different values of ∆τ, ∆y and the regularization parameter α, until the discrepancy principle (46) is satisfied, where u^δ is the noisy data and δ > 0 is the noise level. Note that, by the definition of F, ‖u(a) − u^δ‖ = ‖F(a) − (u^δ − u(a_0))‖. Thus, instead of using F(a^δ_{m,α}) in the discrepancy principle of Equation (46), we can simply use u(a^δ_{m,α}), with no loss of generality. We use this data to calibrate the diffusion coefficient by Tikhonov regularization with the smoothing penalization given by the weights β_1 = 0.5, β_2 = 0.25∆y and β_3 = 0.25∆τ, and a_0 ≡ 0.08.
The minimization of the Tikhonov functional is performed iteratively by the gradient method. More precisely, if a_k denotes the diffusion coefficient at the k-th iteration, the next step is given by a_{k+1} = a_k − λ_k ∇F^{u^δ}_{α,a_0}(a_k), until the Morozov discrepancy principle is satisfied, the maximum number of iterations is reached, or the relative change in the residual is less than 1.0 × 10⁻⁴. We base the choice of the step length λ_k on the Wolfe rule, initialized with the step length λ^0_k = F^{u^δ}_{α,a_0}(a_{k−1})² / F^{u^δ}_{α,a_0}(a_k)². See Algorithms 3.5 and 3.6 in Chapter III of [18]. The parameters used in the Wolfe conditions are c_1 = 10⁻⁸ and c_2 = 0.95. The iterations begin with the initial guess a_0 ≡ 0.08.
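The iteration above can be sketched as follows. For brevity, the Wolfe line search of [18] is replaced here by simple Armijo backtracking (c₁ plays the same role), so this is an illustrative stand-in rather than the exact algorithm used in the experiments:

```python
import numpy as np

def gradient_descent(f, grad, x0, max_iter=2000, tol=1e-10, c1=1e-8):
    # Gradient descent with Armijo backtracking; stops on the relative
    # change of the functional value or on max_iter.
    x = x0.copy()
    f_old = f(x)
    step = 1.0
    for _ in range(max_iter):
        g = grad(x)
        s = step
        # backtrack until the sufficient-decrease (Armijo) condition holds
        while f(x - s * g) > f_old - c1 * s * np.dot(g, g):
            s *= 0.5
            if s < 1e-16:
                return x
        x = x - s * g
        f_new = f(x)
        if abs(f_old - f_new) <= tol * max(abs(f_old), 1.0):
            return x
        f_old, step = f_new, 2.0 * s   # mild step growth between iterations
    return x
```

Applied to a quadratic Tikhonov functional ‖Aa − u^δ‖² + α‖a − a_0‖², this loop recovers the closed-form minimizer, which makes the sketch easy to verify.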
The data is generated with step sizes ∆τ = 0.0025 and ∆y = 0.01, and the coarser grid is given by the step lengths ∆τ = 0.02 and ∆y = 0.1. In the numerical solution of the inverse problem, Equation (43) is solved on the same mesh onto which we interpolate the data, i.e., we use ∆τ = 0.02 and ∆y = 0.1 in both cases. We vary the mesh used to evaluate the diffusion coefficient in order to highlight the discrepancy principle (46). The step sizes used in the tests were the following: Figures 1 and 2 present, respectively, the residuals and the error estimates associated with the regularized solutions for these meshes. We stress that, in the present set of examples, we chose τ = 1.025 and λ = 1.125 in the discrepancy principle (46); an illustration of this discrepancy principle can be found in Figure 1.
If ũ and P_{n_0}(ũ + e) denote the full noiseless data and the noisy data on the coarser grid with noise e, respectively, then the noise level can be estimated by δ = ‖ũ − P_{n_0}(ũ + e)‖_{L²(D)}. In Figures 1, 2, 3 and 4, we have chosen the reconstructions whose regularization parameter presents the lowest residual satisfying the discrepancy principle of Equation (46). We also calculate the L²-error, i.e., the L²(D) distance between the regularized solution and the original diffusion coefficient. The resulting L²-errors for the regularized solutions used in Figure 1 can be found in Figure 2. Note that reconstructions on coarser meshes satisfying the discrepancy principle of Equation (46) presented satisfactory L²-error estimates, illustrating the reliability of this principle for finding the appropriate discretization level in the domain and the regularization parameter. Figures 3 and 4 present reconstructions satisfying the discrepancy principle of Equation (46). Note that the reconstructions on the coarser grid satisfying the discrepancy in Figure 1 presented better L²-error estimates. Moreover, the surfaces displayed in Figure 3 are smoother than those of Figure 4.
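The noise-level estimate can be computed with a simple cell-by-cell quadrature of the discrete L²(D) norm; the uniform-grid weighting below is an assumption for illustration, and the map P_{n_0} (interpolation back to the fine grid) is problem-specific:

```python
import numpy as np

def l2_norm(v, dt, dy):
    # Discrete L2(D) norm on a uniform space-time grid: each nodal value
    # is weighted by the cell area dt * dy.
    return np.sqrt(dt * dy * np.sum(np.asarray(v) ** 2))

def noise_level(u_clean, u_noisy_on_fine, dt, dy):
    # delta = || u_tilde - P_{n0}(u_tilde + e) ||_{L2(D)}, where the noisy
    # coarse-grid data has already been mapped back onto the fine grid.
    return l2_norm(u_clean - u_noisy_on_fine, dt, dy)
```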

Conclusions
Finding appropriate discretization levels is a well-known challenge when solving Tikhonov-type regularization problems. In this work, we have shown that the Morozov discrepancy principle can also be used to find them appropriately. Since we are working in a discrete setting, some additional assumptions ought to be made in order to establish the theoretical results.
Under the above-mentioned discrepancy-based choices, we also presented a convergence analysis with convergence rates in terms of the noise level. In addition, we presented some guidelines on how to apply these results when the forward operator is replaced by a discrete approximation. We also applied the sequential discrepancy principle given by Equation (42) in this discrete setting.
A numerical example illustrated the discrepancy principle when the noise level and the discretization level of the forward operator are kept fixed and the discretization level in the domain is varied.
Summing up, the Morozov discrepancy principle is a robust rule for appropriately determining the regularization parameter and the discretization levels in Tikhonov regularization.
Proof. Consider sequences {δ_k}_{k∈N} and {γ_{m_k}}_{k∈N} converging monotonically to zero. Define the sequence {α_k}_{k∈N} by setting α_k to be the regularization parameter α(δ_k, γ_{m_k}) satisfying Definition 7 for each k. Thus, for each α_k we can find x_k = x^{δ_k}_{m_k,α_k}, a solution of Problem 4. We thus define the sequence {x_k}_{k∈N}. By the pre-compactness of the level sets of the Tikhonov functional, it follows that {x_k}_{k∈N} has a weakly convergent subsequence, denoted by {x_l}_{l∈N}, with weak limit x̄ ∈ D(F). By the weak lower semi-continuity of the norm and the weak continuity of F, the corresponding estimates hold. Note that in these estimates we have used l instead of k_l to ease notation. Note also that x̄ is a least-squares solution of Problem 1. We also have the estimates: τ₁^p (δ_l + γ_{m_l})^p + α_l f_{x_0}(x_l) ≤ (δ_l + γ_{m_l})^p + α_l f_{x_0}(P_{m_l} x†).
Since τ₁ > 1, it follows that f_{x_0}(x_l) ≤ f_{x_0}(P_{m_l} x†), which implies that f_{x_0}(x̄) ≤ f_{x_0}(x†). Hence, x̄ is an f_{x_0}-minimizing solution of Problem 1. Assume that there exist ᾱ > 0 and a subsequence {α_{l_n}}_{n∈N} such that α_{l_n} ≥ ᾱ. Then, take the respective subsequence of minimizers {x_{l_n}}_{n∈N} and define the sequence of minimizers {x_n}_{n∈N}, with x_n := x^{δ_{l_n}}_{m_{l_n},α_{l_n}}. Since L is non-decreasing, it follows that ‖F(x_n) − y^{δ_{l_n}}‖ ≤ ‖F(x_{l_n}) − y^{δ_{l_n}}‖ ≤ τ₂(δ_{l_n} + γ_{m_{l_n}}) → 0.
Note that, since x† and x̄ are f_{x_0}-minimizing solutions of Problem 1, it follows that f_{x_0}(x†) = f_{x_0}(x̄). On the other hand, lim sup_{n→∞} ᾱ f_{x_0}(x_n) ≤ ᾱ f_{x_0}(x†).
By the weak pre-compactness of the level sets M_α(ρ), it follows that {x_n}_{n∈N} has a weakly convergent subsequence with limit x̄. By the above estimates, x̄ is an f_{x_0}-minimizing solution for Problem 4. Denoting this subsequence again by {x_n}_{n∈N}, it follows from the above estimates that x̄ is a solution of Problem 4 with noiseless data y and regularization parameter ᾱ.
Since f_{x_0} is convex and f_{x_0}(x) = 0 if, and only if, x = x_0, it follows that, for every t ∈ [0, 1), ᾱ t f_{x_0}(x̄) ≤ ‖F((1 − t)x̄ + t x_0) − y‖^p.
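The omitted intermediate estimates can be reconstructed as follows (a sketch, using the minimality of x̄ for the noiseless Tikhonov problem, the convexity of f_{x_0}, f_{x_0}(x_0) = 0, and F(x̄) = y):

```latex
\begin{align*}
\|F(\bar{x}) - y\|^p + \bar{\alpha}\, f_{x_0}(\bar{x})
  &\le \|F((1-t)\bar{x} + t x_0) - y\|^p
     + \bar{\alpha}\, f_{x_0}\big((1-t)\bar{x} + t x_0\big) \\
  &\le \|F((1-t)\bar{x} + t x_0) - y\|^p
     + \bar{\alpha}\,\big[(1-t)\, f_{x_0}(\bar{x}) + t\, f_{x_0}(x_0)\big] \\
  &=   \|F((1-t)\bar{x} + t x_0) - y\|^p
     + \bar{\alpha}\,(1-t)\, f_{x_0}(\bar{x}).
\end{align*}
```

Since F(x̄) = y, the first term on the left vanishes, and rearranging gives ᾱ t f_{x_0}(x̄) ≤ ‖F((1 − t)x̄ + t x_0) − y‖^p.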
Therefore, the first limit in (47) holds. In order to prove the second limit, we proceed as follows. Since {x_l}_{l∈N} converges weakly to x̄ with f_{x_0}(x_l) → f_{x_0}(x̄), it follows that τ₁^p (δ_l + γ_{m_l})^p + α_l f_{x_0}(x_l) ≤ (δ_l + γ_{m_l})^p + α_l f_{x_0}(P_{m_l} x†). This estimate, combined with the limit f_{x_0}(P_{m_l} x†) → f_{x_0}(x†), leads to an estimate whose right-hand side converges to zero when l → ∞. Note that we have used again the fact that f_{x_0}(x†) = f_{x_0}(x̄) whenever x†, x̄ ∈ L.
Moreover, we have the convergence result of Theorem 8 with this choice of m.
Then, following the same arguments as in the proof of Equation (47) in Theorem 8, substituting τ₁(δ + γ_m) by τ₁δ and dominating δ + γ_m by (λ/τ₂)δ based on (20), it follows that the limits in Equation (48) hold. Consider the sequence of positive constants {δ_k}_{k∈N} converging monotonically to zero and define the sequence {m_k}_{k∈N}, with m_k := m(δ_k, y^{δ_k}) satisfying (20). Thus, we can choose a sequence {x_k}_{k∈N} of solutions of Problem 4, with x_k := x^{δ_k}_{α_k,m_k} and α_k satisfying Definition 7. Then, the convergence of a subsequence, denoted by {x_l}_{l∈N}, to an f_{x_0}-minimizing solution x̄ follows by arguments similar to those in the proof of Theorem 8; we just have to substitute τ₁(δ + γ_m) by τ₁δ and dominate δ + γ_m by (λ/τ₂)δ based on (20).