Quasi-best approximation in optimization with PDE constraints

We consider finite element solutions to quadratic optimization problems, where the state depends on the control via a well-posed linear partial differential equation. Exploiting the structure of a suitably reduced optimality system, we prove that the combined error in the state and adjoint state of the variational discretization is bounded by the best approximation error in the underlying discrete spaces. The constant in this bound depends on the inverse square root of the Tikhonov regularization parameter. Furthermore, if the operators of control action and observation are compact, this quasi-best approximation constant becomes independent of the Tikhonov parameter as the mesh size tends to 0 and we give quantitative relationships between mesh size and Tikhonov parameter ensuring this independence. We also derive generalizations of these results when the control variable is discretized or when it is taken from a convex set.


Introduction
Optimization problems with PDE constraints are ubiquitous, in particular in inverse problems. A basic and regularly considered example is the Tikhonov-regularized inverse source problem

min_{(q,u) ∈ L² × H¹_0} ½ |u − u_d|_0² + (α/2) |q|_0²  subject to −Δu = q,   (1.1)

where |·|_0 denotes the L²-norm over some underlying domain, u_d is the desired state, and α > 0 scales the cost of the control or, in the case of a regularized inverse source problem, is the Tikhonov parameter tending to 0. In contrast to PDE-constrained optimization problems, for inverse problems α is not fixed. It is therefore of interest to analyze the precise interplay between its value and the discretization parameter h used to approximate the PDE.
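To make the interplay in (1.1) concrete, the following self-contained numpy sketch (our illustration, not taken from the paper; all names and parameter values are assumptions) discretizes the one-dimensional analogue −u'' = q on (0,1) with finite differences, eliminates the state via the solution operator, and solves the resulting Tikhonov-regularized normal equations for two values of α.

```python
import numpy as np

def fd_laplacian(n):
    # Standard 3-point finite-difference Laplacian on (0,1) with n interior
    # nodes and homogeneous Dirichlet boundary conditions.
    h = 1.0 / (n + 1)
    A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    return A

def solve_inverse_source(n, alpha, u_d):
    """Minimize 0.5*||u - u_d||^2 + 0.5*alpha*||q||^2 subject to -u'' = q."""
    A = fd_laplacian(n)
    S = np.linalg.inv(A)              # discrete solution operator q -> u
    # Normal equations of the control-reduced problem:
    q = np.linalg.solve(S.T @ S + alpha * np.eye(n), S.T @ u_d)
    return q, S @ q

n = 99
x = np.linspace(0, 1, n + 2)[1:-1]
u_d = np.sin(np.pi * x)               # illustrative desired state
_, u_small = solve_inverse_source(n, 1e-8, u_d)
_, u_large = solve_inverse_source(n, 1e-2, u_d)
# Smaller Tikhonov parameter -> better fit of the state to u_d.
print(np.linalg.norm(u_small - u_d) < np.linalg.norm(u_large - u_d))
```

As α decreases, the misfit of the state shrinks while the control norm grows; in the inverse-problem regime analyzed in this paper it is exactly this interplay between α and the discretization that matters.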
Of course, additional constraints on the control q and/or the state u can be imposed, and the errors due to a discretization of the state equation, and possibly of the control, have been analyzed. For piecewise constant discretizations of the control, this was done by Falk [1] and Geveci [2], including possible box constraints on the control variable; see also the summary of obtainable convergence orders, including Neumann control, in Malanowski [3]. Element-wise linear functions for the control were considered by Casas and Tröltzsch [4] and by Rösch [5] in the presence of control constraints.
Hinze [6] observed that the minimization problem can be solved numerically without prescribing a discretization of the control, since the control can first be eliminated and then recovered through the optimality conditions. For this so-called variational discretization, he established O(h²) convergence for the control in L², even in the presence of box control constraints. Meyer and Rösch [7] observed that the same convergence order can be obtained if a discretized control is used and a post-processing step based upon the optimality conditions is applied.
Due to the structure of the objective in (1.1), the above-mentioned estimates make use of the 'natural norm'

( |u|_0² + α |q|_0² )^{1/2}.

Although this norm is natural in view of the functional, it induces a scaling √α in all estimates involving the control. Further estimates, for instance of H¹-norms of the state, thereby also contain this scaling. Moreover, the above natural norm is not balanced in terms of approximation accuracy: the error of the state in L² will typically decay at least as fast as the error of the control.
The latter effect, however, is invisible as long as the approximation accuracy of both terms is limited by the selected discrete spaces and not by the regularity of the solutions, as is typically the case for the model (1.1). However, in the presence of pointwise constraints on the state, see, e.g. [8-12], or on the gradient of the state [13-16], optimal-order estimates can only be obtained for the control variable. Yet numerical results indicate a faster convergence of the error in the state variable in L².
As an alternative to the aforementioned works, one may combine the error in the state with the error in the (suitably rescaled) adjoint state p, measuring both in the norms given by the functional-analytic set-up of the PDE constraint. For problem (1.1), this leads to the norm

( |u|_1² + α^{−1} |p|_1² )^{1/2},   (1.2)

where |·|_1 denotes the H¹_0-norm. For respective counterparts of (1.2), Chrysafinos and Karatzas [17, 18] prove so-called symmetric error estimates or quasi-best approximation results. The growth of the quasi-best approximation constant is limited by α^{−2} and α^{−3/2}, respectively.
In this article, we prove abstract quasi-best approximation results, where the discretization error is measured in a counterpart of (1.2). In order to illustrate our results, assume that the underlying domain is convex, let (V_h)_h be a sequence of conforming finite-dimensional spaces that approximates H¹_0, and consider the variational discretization of (1.1). If we denote by x_h = (u_h, p_h) the pairs of approximate primal and dual states, our results yield (see theorem 3.3 and example 3.9) a bound of the form

‖x − x_h‖ ≤ ν_h min_{y_h} ‖x − y_h‖,

where the quasi-best approximation constant ν_h admits a non-asymptotic bound growing at most like α^{−1/2} and an asymptotic bound that is independent of α. These bounds involve the constant C_F of the Friedrichs inequality and an interpolation constant C_I depending on the shape regularity of the underlying meshes. In contrast to the first, non-asymptotic relationship, the second, asymptotic one exploits the compactness of the observation and control action operators and elliptic regularity theory. Notably, the latter reveals that Céa's lemma, which holds for the constraint discretization, is recovered as h → 0 and, in particular, ensures an approximation quality independent of α for h = O(√α).

The rest of the paper proceeds as follows. In section 2, we state precisely the considered problem class, allowing for any linear, bounded, and inf-sup-stable operator in the constraint. Furthermore, we reduce the optimality system by eliminating the control, and we lay the groundwork for our results with a careful discussion of the continuity and nondegeneracy properties of the associated bilinear form.
Section 3 constitutes the core of this work and establishes the quasi-best approximation for the variational discretization. To this end, the variational discretization is viewed as a Petrov-Galerkin method and we employ the formula for the quasi-best approximation constant in Tantardini and Veeser [19]. For the asymptotic behavior of the quasi-best approximation constant, we additionally invoke a duality argument, which is similar to, but simpler than, Schatz [20].
The last two sections center on generalizations of these results. In section 4, we consider approximate control action operators, covering in particular the discretization of the control variable. Finally, section 5 deals with nonlinear optimality systems arising from additional convex constraints for the control. The derived results complement those of the linear case and the simplification of Schatz' argument comes in quite useful.
Let us conclude this introduction with a table providing an overview of a selection of our results. For each selected quasi-best approximation result, it shows its main features and the leading interplay, in the quasi-best approximation constant ν_h, of the Tikhonov parameter α, the mesh size h, and the quasi-best approximation constant µ_h (see (3.5)) of the constraint discretization, as α → 0, h → 0, and µ_h → ∞.

Main features (the corresponding constants ν_h and references are stated in the respective results):
• Variational discretization with continuous control action and observation
• Variational discretization with compact control action and observation
• Variational discretization with δ-compact control action and observation as well as δ-regularizing PDE constraint
• Discretization with δ-approximate control action as well as δ-compact observation and δ-regularizing PDE constraint
• Convex constraints for the control with δ-compact control action and observation as well as δ-regularizing PDE constraint

Model optimization problem and reduced optimality system
We introduce our model optimization problem. Assume that the control variable q is taken from a real Hilbert space Q with scalar product (·,·)_Q and induced norm ‖·‖_Q. Its corresponding state u ∈ V_1 is determined by solving a linear boundary value problem of the form

Au = Cq   (2.1)

with the following setting:
• The state space V_1 is a Hilbert space with induced norm ‖·‖_1. Its dual and the corresponding duality pairing are indicated with V_1* and ⟨·,·⟩_1, respectively.
• The differential operator A is induced by a bilinear form a : V_1 × V_2 → ℝ, where V_2 is a second Hilbert space with induced norm ‖·‖_2, dual space V_2*, and duality pairing ⟨·,·⟩_2. In sections 3-5 we shall make a special choice (3.6) for the norm ‖·‖_2. We assume that the bilinear form a is bounded,

|a(v_1, v_2)| ≤ M_a ‖v_1‖_1 ‖v_2‖_2 for all v_1 ∈ V_1, v_2 ∈ V_2,   (2.2a)

and satisfies the following nondegeneracy conditions:

for every 0 ≠ v_2 ∈ V_2 there exists v_1 ∈ V_1 with a(v_1, v_2) ≠ 0,   (2.2b)
inf_{0≠v_1} sup_{0≠v_2} a(v_1, v_2) / (‖v_1‖_1 ‖v_2‖_2) =: m_a > 0.   (2.2c)

Employing well-known inf-sup theory (see, e.g. Babuška [21]), we see that the operator A : V_1 → V_2* is linear and boundedly invertible.
• The control action operator C : Q → V_2* is linear and bounded with constant M_C.

Our goal is then to numerically solve the constrained optimization problem

min_{(q,u) ∈ Q × V_1} ½ ‖Iu − u_d‖_W² + (α/2) ‖q‖_Q²  subject to Au = Cq,   (2.3)

where we assume in addition:
• The desired 'state' u_d is an element of a Hilbert space W with scalar product (·,·)_W and induced norm ‖·‖_W.
• The observation operator I : V 1 → W is linear, and bounded with constant M I .
• The cost of the control, which can be viewed as a Tikhonov regularization, is scaled with the parameter α > 0.
Problem (2.3) is a quadratic minimization problem with a linear constraint. The objective function is convex in (q, u) and strictly convex in q. Consequently, standard arguments ensure the existence of a unique solution; see, e.g. Lions [22, theorem 1.1] or Tröltzsch [23, chapter 2.5].
If Q = W = L², V_1 = V_2 = H¹_0, A = −Δ is the weak Laplacian, and C and I are the canonical compact immersions L² → (H¹_0)* and H¹_0 → L², then (2.3) simplifies to the optimization problem (1.1) in the introduction. Notice that, in this case, the operators C and I are related by C* = I.
To formulate the optimality system for (2.3), it is useful to define the adjoint operators A*, C*, I* of A, C, I by

⟨A*z, v⟩_1 = a(v, z),  (C*z, q)_Q = ⟨Cq, z⟩_2,  ⟨I*w, v⟩_1 = (Iv, w)_W.

Thanks to the convexity of the problem (2.3), a pair (q, u) ∈ Q × V_1 is a minimum point if and only if there exists p ∈ V_2 such that

Au − Cq = 0,  −I*Iu + A*p = −I*u_d,  αq + C*p = 0.   (2.4)

We may eliminate q by inserting the last equation into the first one and multiplying the second equation by β > 0. We thus obtain the following reduced optimality system for the pair (u, p):

−βI*Iu + βA*p = −βI*u_d,  Au + (1/α) CC*p = 0.   (2.5)

Notice that the second row of equations, Au + (1/α) CC*p = 0, suggests scaling the adjoint state p by the factor 1/α, while the first row, −βI*Iu + βA*p = −βI*u_d, suggests no scaling at all. As a compromise, we propose to use z = (1/√α) p and β = 1/√α. We thus transform the optimality system (2.4) into

Au − Cq = 0,  −(1/√α) I*Iu + A*z = −(1/√α) I*u_d,  √α q + C*z = 0,   (2.6)

and the reduced optimality system (2.5) into

−(1/√α) I*Iu + A*z = −(1/√α) I*u_d,  Au + (1/√α) CC*z = 0.   (2.7)

This rescaled and reduced optimality system deviates from the usual KKT formulation, but has an interesting structure. Like the KKT formulation, it is symmetric also for non-symmetric A. The off-diagonal consists of two interrelated invertible operators, while the diagonal entries are (semi-)definite, symmetric operators. Notice that, upon swapping the rows, the roles of the diagonal and the off-diagonal can be exchanged. For the optimization problem (1.1), the operator matrix is then diagonally dominant in the sense that CC* and I*I are compact operators.

Let us give a weak formulation of the rescaled and reduced optimality system. Its rows are equivalently written as

−(1/√α)(Iu, Iϕ_1)_W + a(ϕ_1, z) = −(1/√α)(u_d, Iϕ_1)_W for all ϕ_1 ∈ V_1,
a(u, ϕ_2) + (1/√α)(C*z, C*ϕ_2)_Q = 0 for all ϕ_2 ∈ V_2,   (2.8)

and so we are led to introduce the Hilbert space

V := V_1 × V_2 with norm ‖(v_1, v_2)‖ := ( ‖v_1‖_1² + ‖v_2‖_2² )^{1/2}   (2.9)

and the bilinear form

b(x, ϕ) := â(x, ϕ) + (1/√α) c(x, ϕ), where, for x = (u, z) and ϕ = (ϕ_1, ϕ_2),
â(x, ϕ) := a(u, ϕ_2) + a(ϕ_1, z) and c(x, ϕ) := (C*z, C*ϕ_2)_Q − (Iu, Iϕ_1)_W.   (2.10)
In this notation, the variational formulation of the rescaled and reduced optimality system (2.7) simply reads:

x ∈ V:  b(x, ϕ) = −(1/√α)(u_d, Iϕ_1)_W for all ϕ = (ϕ_1, ϕ_2) ∈ V.   (2.11)

A pair x = (u, z) ∈ V is a solution of (2.11) if and only if (u, z) is a solution of (2.8) if and only if the triple (u, z, −(1/√α)C*z) ∈ V × Q verifies the rescaled optimality system (2.6). Consequently, thanks to the convexity of (2.3), if x = (u, z) ∈ V is the unique solution of (2.11), then (−(1/√α)C*z, u) is the unique solution of (2.3).

Let us analyze the bilinear form b = â + (1/√α)c, where â and c denote the constraint and objective parts of (2.10). We readily see that â, c, and so b are symmetric (2.12). Moreover, even if a is coercive, b is not coercive in general. Consider, for example, a set-up where there exists 0 ≠ u ∈ V_1 with Iu ≠ 0: for x = (u, 0), we have c(x, x) = −‖Iu‖_W² < 0. This entails that c is not coercive. As a consequence, b is not coercive for α > 0 sufficiently small.

In order to obtain further properties, let us first consider the contributions â and c separately. The bilinear form c is closely related to the original minimization problem (2.3). To see this, observe that, if (u, z) ∈ V and √α q = −C*z, we have the correspondence ‖Iu‖_W² + ‖C*z‖_Q² = ‖Iu‖_W² + α‖q‖_Q², which motivates us to introduce the 'energy seminorm'

|x| := ( ‖Iu‖_W² + ‖C*z‖_Q² )^{1/2},  x = (u, z),

on V. Thus, denoting by Z the kernel of |·| and realizing that the bilinear form c is well-defined on the quotient space V/Z, we see that

|c(x, ϕ)| ≤ |x| |ϕ| and sup_{|ϕ| ≤ 1} c(x, ϕ) = |x|,   (2.14)

where the second identity relies on the sign structure of c. Since |x| ≤ M ‖x‖ with M := max{M_I, M_C}, the form c is also continuous in V, with constant M.

The bilinear form â inherits its continuity and nondegeneracy properties from a. More precisely, we have

m_a ‖x‖ ≤ sup_{‖ϕ‖ ≤ 1} â(x, ϕ) ≤ M_a ‖x‖ for all x ∈ V,   (2.17)

with M_a and m_a from (2.2). While the upper bound is straightforward, the lower one hinges on the inf-sup duality (see Babuška [21])

inf_{0≠v_1} sup_{0≠v_2} a(v_1, v_2)/(‖v_1‖_1 ‖v_2‖_2) = inf_{0≠v_2} sup_{0≠v_1} a(v_1, v_2)/(‖v_1‖_1 ‖v_2‖_2).   (2.18)

Turning to the complete bilinear form b, we may sum up the continuity properties as follows: for all v, ϕ ∈ V, we have

|b(v, ϕ)| ≤ M_b ‖v‖ ‖ϕ‖_α,   (2.19)

with the test norm ‖·‖_α defined in (2.20) below. Here we have equipped V as trial space with ‖·‖ and as test space with ‖·‖_α.
The former is in accordance with our aims in the error analyses below, and the latter avoids, in particular, a dependence on M/√α of the continuity constant of b and of the following bound for the right-hand side in (2.11), valid for all ϕ = (ϕ_1, ϕ_2) ∈ V.

The derivation of the nondegeneracy properties of the bilinear form b is more subtle. In order to establish the crucial inf-sup condition (2.2c), let ϕ = (ϕ_1, ϕ_2) ∈ V be given. We combine the nondegeneracy properties of â and c in the test function construction (2.22), where γ ≥ 0 and w = (w_1, w_2) ∈ V is chosen with the help of (2.17) such that ‖w‖ = ‖ϕ‖, and we exploit the continuity (2.14) of c and m_a ≤ M_a. Using the inequality 2st ≤ s² + t², we may bound the critical term. Thus, if we define κ by the relations (2.23), we arrive at the inf-sup inequality (2.24), where the norms on the right-hand side coincide with those in the continuity bound (2.19). We therefore have the following basic result.

Theorem 2.1 (Bilinear form of reduced optimality system). If we equip V as trial space with · from (2.9) and as test space with · α from (2.20), then the inf-sup constant m b and the continuity constant M b of the bilinear form (2.10) satisfy
where κ is defined by the relations (2.23).
The inequalities of theorem 2.1 show that the condition number of the bilinear form b (i.e. the ratio M_b/m_b of its continuity constant to its inf-sup constant) exceeds the condition number M_a/m_a of the bilinear form a associated with the constraint by at most the factor κ. The factor M_a/m_a is expected to be a kind of lower bound. In this vein, we may view the factor κ as a bound for the possible amplification of the constraint conditioning, resulting from the interplay of the constraint and the objective in the constrained optimization problem (2.3). Inspecting (2.23), we see that κ is a function of the parameters α, M, m_a, and M_a. The next three remarks discuss asymptotic behaviors of κ that will play major roles in what follows or are of independent interest.
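The symmetry and lack of coercivity of b noted above can be checked in a small finite-dimensional model. The sketch below (our own illustration, under the simplifying assumption that both observation and control action are modeled by identity matrices) assembles the block matrix of the rescaled reduced system (2.7) for a finite-difference Laplacian and verifies that it is symmetric yet indefinite, i.e. a genuine saddle-point operator.

```python
import numpy as np

n, alpha = 30, 1e-2
h = 1.0 / (n + 1)
# Symmetric positive definite finite-difference Laplacian A.
A = (np.diag(2*np.ones(n)) - np.diag(np.ones(n-1), 1)
     - np.diag(np.ones(n-1), -1)) / h**2

s = 1.0 / np.sqrt(alpha)
# Block operator of the rescaled reduced optimality system (2.7), with
# I*I and CC* both replaced by the identity for illustration:
#   [ -s*I'I   A' ]
#   [    A   s*CC' ]
B = np.block([[-s*np.eye(n), A.T], [A, s*np.eye(n)]])

assert np.allclose(B, B.T)            # b is symmetric ...
eigs = np.linalg.eigvalsh(B)
print(eigs.min() < 0 < eigs.max())    # ... but indefinite, hence not coercive
```

Per eigenmode λ of A, the eigenvalues of B come in pairs ±√(s² + λ²), so positive and negative spectrum are perfectly balanced; coercivity-based arguments are unavailable and inf-sup theory is needed, as in theorem 2.1.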

Remark 2.2 (Amplification for pure constraint case).
Consider the special case C = 0 and I = 0. Then the rescaled and reduced optimality system (2.7) is a well-posed 'double' boundary value problem. Its condition number with respect to (V, · ) × (V, · ) is M a /m a ; see (2.17). As C = 0 and I = 0 imply M = 0, L = 0, and so γ = 0 and κ = 1, this is reproduced by theorem 2.1.
It is worth mentioning that this limiting case of 'pure constraint' is attained in a continuous manner: κ → 1 as L → 0, where L = M/√α is essentially the operator norm of the perturbation.

Remark 2.3 (Amplification for degenerating constraint).
While the continuity constant M_a of the bilinear form a does not critically enter κ, its inf-sup constant m_a does: κ blows up as the constraint degenerates, i.e. as m_a → 0. Notice that the fraction involving L takes values only in the interval [1, 2].

Remark 2.4 (Amplification for vanishing regularization).
Consider the limit α → 0 of the Tikhonov regularization parameter (while I and C are fixed). Then L = M/√α → ∞, so that the bound for κ in (2.25) grows proportionally to α^{−1/2}. Let us see with a simple example that the inf-sup constant m_b in theorem 2.1 can indeed degenerate at this rate, so that the lower bound therein cannot be improved for small α without further assumptions on the structure of b.
Consider a scalar set-up with V_1 = V_2 = Q = W = ℝ and α > 0. The symmetric bilinear form b of the optimality system is then given by a 2 × 2 matrix, and a direct computation shows that its inf-sup constant with respect to the norms of theorem 2.1 behaves like √α. Hence, the asymptotic behavior of α in (2.25) is attained.

The chosen norms for V as trial and test space are not always the most convenient ones. This follows from the following remark considering a special case.

Remark 2.5 (Coercive constraints with C * = I).
Suppose that V_1 = V_2 and Q = W with coinciding scalar products and norms, that the bilinear form a is coercive with constant m_a, and that C* = I. It is worth noting that, as a is not necessarily symmetric, the best coercivity constant may be much smaller than the inf-sup constant m_a. Given ϕ ∈ V, we proceed as in (2.22) and obtain a coercivity-type lower bound with respect to suitably modified norms (2.27). This fits well with the following variant of the continuity bound (2.19). Hence, in this case, the condition number of b with respect to the norms in (2.27) is independent of the Tikhonov regularization parameter α. Nevertheless, even if C* = I, also this choice of norms cannot in general offer an asymptotic behavior better than 1/√α as α → 0. In fact, recomputing the example in remark 2.4 with the norms in (2.27) does not change the behavior of its inf-sup constant.
Let us conclude this section with the following side product of our discussion of the bilinear form b.

Corollary 2.6 (Existence and uniqueness). The rescaled and reduced optimality system (2.11) and thus (2.4) has a unique solution.
Proof. Inequality (2.24) ensures (2.2c) for the bilinear form b and, thanks to the algebraic symmetry of b, also (2.2b). □

Analysis for variational discretization
In this section, we analyze the error of the variational discretization of the optimization problem (2.3) according to Hinze [6]. Our key tool is the rescaled and reduced optimality system (2.7), whose Galerkin solution coincides with the approximate solution of the variational discretization.

Variational discretization and reduced optimality system
We start by discretizing the PDE constraint (2.1) of the optimization problem (2.3). Recalling its variational formulation

u ∈ V_1:  a(u, ϕ_2) = ⟨Cq, ϕ_2⟩_2 for all ϕ_2 ∈ V_2,

we choose conforming finite-dimensional subspaces V_{h,1} ⊂ V_1 and V_{h,2} ⊂ V_2. The corresponding Petrov-Galerkin method then reads

u_h ∈ V_{h,1}:  a(u_h, ϕ_{h,2}) = ⟨Cq, ϕ_{h,2}⟩_2 for all ϕ_{h,2} ∈ V_{h,2}.

Using this for the constraint in (2.3), we arrive at the (semi-)discrete optimization problem (3.1), where we, in addition, assume that I can be exactly evaluated for any function from V_{h,1}. As in the continuous case, its solution is characterized by an optimality system (3.2). Also here, we may eliminate the approximate control q by inserting the third equation into the first one. Setting V_h := V_{h,1} × V_{h,2}, the variational formulation of the ensuing discrete rescaled and reduced optimality system is

x_h ∈ V_h:  b(x_h, ϕ_h) = −(1/√α)(u_d, Iϕ_{h,1})_W for all ϕ_h = (ϕ_{h,1}, ϕ_{h,2}) ∈ V_h.   (3.3)

Its solution x_h is the Galerkin approximation in V_h of the solution x of the variational formulation (2.11) of the rescaled and reduced optimality system. Applying corollary 2.6 to the discrete spaces therefore yields the following approach to uniqueness and existence for the variational discretization of (2.11).

Lemma 3.1 (Discrete well-posedness). The discrete reduced optimality system (3.3) has a unique variational solution x_h = (u_h, z_h) ∈ V_h, and the triple (q, u_h, z_h) with q = −(1/√α)C*z_h is the unique solution of the semidiscrete optimization problem (3.1).

Remarkably, the approximate solutions (q, u_h, z_h) of the variational discretization (3.2) are computable whenever C*|_{V_{h,2}} and I|_{V_{h,1}} can be evaluated exactly.
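The equivalence between the reduced optimality system and the original control formulation can be tested numerically. The sketch below (our own P1 finite-element model of (1.1) in one dimension; the matrices and names are assumptions, not the paper's notation) solves the discrete reduced system in the unscaled pair (u_h, p_h), recovers the control from the adjoint, and checks the result against the normal equations of the control-reduced problem.

```python
import numpy as np

def p1_matrices(n):
    # P1 stiffness and mass matrices on (0,1), n interior nodes, Dirichlet BCs.
    h = 1.0 / (n + 1)
    K = (np.diag(2*np.ones(n)) - np.diag(np.ones(n-1), 1)
         - np.diag(np.ones(n-1), -1)) / h
    M = h * (np.diag(4*np.ones(n)) + np.diag(np.ones(n-1), 1)
             + np.diag(np.ones(n-1), -1)) / 6
    return K, M

n, alpha = 49, 1e-3
K, M = p1_matrices(n)
x = np.linspace(0, 1, n+2)[1:-1]
u_d = x * (1 - x)                      # illustrative desired state

# Reduced optimality system in (u, p), control eliminated via q = p/alpha:
#   M u + K' p = M u_d,   K u - (1/alpha) M p = 0.
B = np.block([[M, K.T], [K, -M/alpha]])
up = np.linalg.solve(B, np.concatenate([M @ u_d, np.zeros(n)]))
u_h, p_h = up[:n], up[n:]
q_h = p_h / alpha                      # control recovered from the adjoint

# Cross-check: normal equations in the control space (u = S q, S = K^{-1} M).
S = np.linalg.solve(K, M)
q_ref = np.linalg.solve(S.T @ M @ S + alpha * M, S.T @ M @ u_d)
print(np.allclose(q_h, q_ref))
```

Both routes produce the same control, which mirrors lemma 3.1: the reduced system determines (u_h, z_h), and the optimal control is then recovered exactly from the adjoint without ever discretizing the control space.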

Non-asymptotic quasi-best approximation
We shall assess the quality of the Galerkin approximation x h = (u h , z h ) ∈ V h from (3.3), assuming that we are interested particularly in the · 1 -error of the approximate state u h . For this purpose, we compare it with a suitable best error in V h . Let us first recall some basic results in Petrov-Galerkin approximation, which we already formulate for the discretization of the constraint.
The error of the Petrov-Galerkin approximation of the constraint satisfies a quasi-best approximation bound

‖u − u_h‖_1 ≤ µ_h min_{v_{h,1} ∈ V_{h,1}} ‖u − v_{h,1}‖_1;

see, e.g. Babuška [21]. We refer to the smallest possible choice of µ_h as the quasi-best approximation constant of the constraint discretization. Xu and Zikatanov [24] show the identities

µ_h = sup_{ϕ_{h,2} ∈ V_{h,2}} ( sup_{v_1 ∈ V_1} a(v_1, ϕ_{h,2})/‖v_1‖_1 ) / ( sup_{v_{h,1} ∈ V_{h,1}} a(v_{h,1}, ϕ_{h,2})/‖v_{h,1}‖_1 ),   (3.5)

where v_1 varies in V_1 and v_{h,1} varies in V_{h,1} and, for the sake of notational simplicity, a tedious ϕ_{h,2} ≠ 0 is avoided. A perhaps striking feature of these formulas is that they are not affected by the choices of the norms in the test spaces V_{h,2} and V_2. This comes in quite useful in our context, as the adjoint state is an auxiliary variable and, in the original approximation problem (2.3), the norm ‖·‖_2 is free as long as (2.2) continues to hold with ‖·‖_1. Exploiting this freedom, we henceforth assume

‖v_2‖_2 = sup_{0 ≠ v_1 ∈ V_1} a(v_1, v_2)/‖v_1‖_1 for all v_2 ∈ V_2   (3.6)

and so, in particular, measure the error of the approximate adjoint state z_h in this norm. The convenience of the choice (3.6) lies in

‖ϕ_{h,2}‖_2 = sup_{v_1 ∈ V_1} a(v_1, ϕ_{h,2})/‖v_1‖_1 for all ϕ_{h,2} ∈ V_{h,2}   (3.7)

and the following consequences thereof. The numerator in (3.5) is then ‖ϕ_{h,2}‖_2, which, together with the inf-sup duality, see (2.18), yields

1/µ_h = inf_{ϕ_{h,2} ∈ V_{h,2}} sup_{v_{h,1} ∈ V_{h,1}} a(v_{h,1}, ϕ_{h,2}) / ( ‖v_{h,1}‖_1 ‖ϕ_{h,2}‖_2 ),   (3.8)

i.e. 1/µ_h coincides with the discrete inf-sup constant. Moreover, the generalized Ritz projection R_h : V_1 → V_{h,1}, given by a(R_h v_1, ϕ_{h,2}) = a(v_1, ϕ_{h,2}) for all ϕ_{h,2} ∈ V_{h,2}, satisfies

‖v_1 − R_h v_1‖_1 ≤ µ_h min_{v_{h,1} ∈ V_{h,1}} ‖v_1 − v_{h,1}‖_1.   (3.9)

Remark 3.2 (Without special choice of the error norm for the adjoint state). One may want to retain an original choice ‖·‖_{2,org} for the norm of V_2. In this case, the results below continue to hold, but their constants have to be revisited. The changes can be determined from the equivalence of the two norms, where an additional index 'org' refers to the setting with ‖·‖_{2,org} and no such index refers to the setting with (3.6).

After these preparations, we are ready to derive a first result about quasi-best approximation of the variational discretization (3.1).

Theorem 3.3 (Non-asymptotic quasi-best approximation)
. Let x = (u, z) be the solution of the optimality system (2.11) corresponding to any desired state u_d ∈ W, and denote by ‖·‖ the norm from (2.9) with (3.6) as norm in V_2. The combined error in the corresponding approximate state u_h and its adjoint z_h of the variational discretization is quasi-best in V_h with

‖x − x_h‖ ≤ ν_h min_{ϕ_h ∈ V_h} ‖x − ϕ_h‖ and ν_h ≤ κ_h µ_h.

Here κ_h is the discrete counterpart of the amplification factor κ from theorem 2.1 and µ_h is the quasi-best approximation constant of the constraint discretization.
Proof. Let ν_h denote the quasi-best approximation constant of the variational discretization. Thanks to theorem 2.1 and lemma 3.1, we can use the counterpart of (3.5) for the characterization (3.3) of the variational discretization. Let ϕ_h ∈ V_h. The continuity bound (2.19) provides the bound for the numerator in (3.10). For the denominator, we use (2.22), where V is replaced by V_h and, therefore, with 1/µ_h in place of m_a in view of (3.8). We thus obtain ν_h ≤ κ_h µ_h, and the proof is finished. □

In the special situation of remark 2.5, we can obtain the following quasi-best approximation result.

Remark 3.4 (Quasi-best approximation for coercive constraints and C * = I).
Suppose that V_1 = V_2 and Q = W with coinciding scalar products and norms, that the bilinear form a is V_1-coercive with constant m_a, and that C* = I. Exploiting the coercivity and continuity properties of remark 2.5, we derive a quasi-best approximation bound for the error of the variational discretization of (2.11).

The quasi-best approximation constant in the preceding remark 3.4 does not blow up for vanishing regularization. Nonetheless, when measuring the error merely with ‖·‖, it does not exclude an α^{−1/4} blow-up of the quasi-best approximation constant even in the special case C* = I considered in remark 2.4 and, in the light of the example therein, it does not exclude an α^{−3/4} blow-up for general operators I and C. As we shall see, the α-dependence in theorem 3.3 is less severe.

Remark 3.5 (Vanishing regularization and quasi-best approximation).
As in remark 2.4, we consider the limit α → 0 for the Tikhonov regularization parameter. Similarly to there, the bound (3.12) for the quasi-best approximation constant then blows up like α^{−1/2}. This blow-up arises from the lower bound of the inf-sup constant in theorem 2.1, which cannot be improved because of (2.26). Note, however, that the equivalence of the norms ‖·‖_α and ‖·‖ is not uniform in α. In the light of (3.5), it is therefore conceivable that (3.12) could be improved by using sup_{‖v‖=1} b(v, ·) as test space norm. However, the determination of the discrete inf-sup constant with respect to this abstract norm appears to be much more involved than the approach (2.22), which directly carries over to discrete spaces. In any case, we shall show below that, under refinement, the α-dependence disappears for many instances of the optimality system (2.6).

Asymptotic quasi-best approximation
In this section, we complement theorem 3.3. To be more precise, let ν_h denote the quasi-best approximation constant of the variational discretization as in the proof of theorem 3.3 and consider a sequence (V_h)_h of discrete spaces leading to a uniformly stable constraint discretization in the sense that

µ := sup_h µ_h < ∞,   (3.13)

which is equivalent to discrete inf-sup stability in view of (3.8). Theorem 3.3 then ensures the existence of a constant ν such that

ν_h ≤ ν for all h.   (3.14)

This upper bound may be pessimistic. To motivate this assessment, represent the bilinear form b by the operator matrix

[ A              (1/√α) CC* ]
[ −(1/√α) I*I    A*         ],

which is the one in (2.7) with swapped rows. If C and I are compact, this matrix is diagonally dominant in an operator sense and can be viewed as a compact perturbation of the diagonal matrix with the entries A and A*. Therefore, in order to improve on (3.14), we mimic somewhat the argument in Schatz [20], introducing some new twist. Let us first observe that, in accordance with remark 2.2, theorem 3.3 yields ν_h ≤ µ_h whenever M_I = 0 = M_C. More precisely and generally, we have the following relationship between the two quasi-best approximation constants.

Lemma 3.6 (Comparison of quasi-best approximation constants). The quasi-best approximation constant ν_h of the variational discretization is bounded by µ_h up to a multiplicative perturbation involving κ_h and the energy seminorm of the error of the generalized Ritz projection, where κ_h is as in theorem 3.3 and R_h is the generalized Ritz projection in (3.9).
Proof. As in the proof of theorem 3.3, we will make use of (3.5) with a replaced by b.
Thanks to (2.14), (2.20) and (3.11), this proves the claimed inequality. □

In order to deploy lemma 3.6, we need additional assumptions on our optimization problem and its discretization. We shall consider two settings: a 'qualitative' and a 'quantitative' one. The former additionally assumes the compactness of the operators C and I (3.15a) for the optimization problem, and an approximation property (3.15b) for the constraint discretization. Notice that, owing to remark 3.2, the condition (3.15a) is independent of our choice to equip V_2 with the norm (3.6).

Lemma 3.7 (Qualitative asymptotic quasi-best approximation). Under the assumptions (3.13) and (3.15), the quasi-best approximation constant ν_h satisfies

ν_h ≤ µ_h (1 + κ ε_h) with ε_h → 0 as h → 0,

where κ is a constant independent of h.
Proof. In the light of lemma 3.6 and (3.13), it suffices to verify the uniform convergence

sup_{‖v‖ = 1} |(1 − R_h)v| → 0 as h → 0.

This follows from a standard argument; we provide details for the sake of completeness. Let (h_k)_k be any sequence with lim_{k→∞} h_k = 0 and choose v_k with ‖v_k‖ = 1 such that |(1 − R_k)v_k| is at least half the supremum, where we write k instead of h_k whenever the latter appears as an index. Exploiting (3.13) another time, we see that the sequence d_k := (1 − R_k)v_k is bounded in V and so, after extracting a subsequence, converges weakly to some d ∈ V. Moreover,

a(d_k, ϕ) = a(d_k, ϕ − ϕ_k) for any ϕ ∈ V and ϕ_k ∈ V_k.

Choosing ϕ_k by means of (3.15b), we derive a(d, ϕ) = 0 by letting k → ∞. Consequently, (2.17) yields d = 0. Thanks to (3.15a), the operator I : V_1 → W and the adjoint C* : V_2 → Q are compact. This turns the weak convergence d_k ⇀ 0 in V into strong convergence, entailing |d_k| → 0, and the proof is finished. □

In order to quantify the convergence in lemma 3.7, we shall use a duality argument. This requires a second, more specific setting of additional assumptions involving the Sobolev spaces H^s, s ≥ 0, and their norms |·|_s over some domain. We use |·|_s instead of ‖·‖_s in order to avoid confusion with the norms ‖·‖_1 and ‖·‖_2 of V_1 and V_2. For s < 0, we denote by H^s the (topological) dual space of H^{−s}, and |·|_s stands for the dual norm of |·|_{−s}.
We suppose that the spaces V_1 and V_2 relate to Sobolev spaces in the following way: there are s_i ∈ ℝ, i = 1, 2, and a constant C_S ≥ 1 such that V_i is a closed subspace of H^{s_i} and

C_S^{−1} |v|_{s_i} ≤ ‖v‖_i ≤ C_S |v|_{s_i} for all v ∈ V_i.   (3.17a)

Furthermore, we suppose that there is δ > 0 such that the following three conditions hold. First, the operators C and I have the boundedness properties

C : Q → H^{−s_2+δ} and I : H^{s_1−δ} → W bounded, with operator norms M_C and M_I.   (3.17b)

Thus, the canonical embeddings H^{−s_2+δ} → H^{−s_2} and H^{s_1} → H^{s_1−δ} quantify the compactness assumption (3.15a). Second, the differential operator of the constraint and its adjoint offer the following regularity estimates: there is a constant C_R > 0 such that, for all admissible f and g, the solutions of Au = f and A*z = g satisfy

|u|_{s_1+δ} ≤ C_R |f|_{−s_2+δ} and |z|_{s_2+δ} ≤ C_R |g|_{−s_1+δ}.   (3.17c)

Third and last, the approximation spaces V_h verify

min_{v_{h,1} ∈ V_{h,1}} ‖v_1 − v_{h,1}‖_1 ≤ C_I h^δ |v_1|_{s_1+δ} and min_{v_{h,2} ∈ V_{h,2}} ‖v_2 − v_{h,2}‖_2 ≤ C_I h^δ |v_2|_{s_2+δ}   (3.17d)

for some constant C_I > 0, which quantifies the approximation property (3.15b).

Theorem 3.8 (Quantitative asymptotic quasi-best approximation). Under the assumptions (3.13) and (3.17), the quasi-best approximation constant ν_h satisfies

ν_h ≤ µ_h (1 + κ h^δ),
where κ is as in lemma 3.7. For the α-dependence of κ, see remark 3.5.
Proof. Similarly as in the first step of the proof of lemma 3.7, inserting (3.13) and (3.18) into lemma 3.6 establishes the claim. To show (3.18), let v ∈ V with ‖v‖ = 1 and define ϕ ∈ V as the solution of a 'dual' problem associated with the bilinear form a, so that the quantity of interest splits into two factors, where ϕ_h ∈ V_h is arbitrary. For the first factor, (3.9) and (3.13) imply the bound (3.20). For the second factor, we employ (3.17d) with a suitable ϕ_h ∈ V_h, and it remains to show that the norms on the right-hand side are suitably bounded. Let us consider the first one. Making use of the regularity estimate (3.17c) and the definition of ϕ_1, we deduce a bound involving the operator norm M_C of C from (3.17b). A similar argument yields a bound involving the operator norm M_I of I in (3.17b). We insert the previous estimates into the first one and conclude (3.18). □

Let us exemplify theorem 3.8 with two applications. The first one (example 3.9) considers the optimization problem (1.1) of the introduction on a convex domain, discretized with linear finite elements; the second one is more involved in the sense that the constraint does not allow for a coercive set-up.

For example 3.9, taking Sobolev seminorms instead of norms in (3.17a), we have C_S = 1 for the relevant cases and C_R = 1 thanks to elliptic regularity, as well as M_I = 1 = M_C. Standard approximation theory shows (3.17d), with δ = 1 and C_I depending on the shape regularity of the underlying meshes. Since µ_h = 1, we conclude ν_h ≤ 1 + κh for the quasi-best approximation constant of the variational discretization in this case.

The second application concerns a counterpart of (1.1) whose state equation involves point sources, where the underlying domain Ω ⊂ ℝ² is planar, polygonal, Lipschitz, but not necessarily convex, {x_j}_{j=1}^J ⊂ Ω are distinct points, δ_{x_j} denotes the Dirac functional at the point x_j, and 0 < σ < 1/2. The bilinear form a(v, w) = ∫_Ω ∇v · ∇w dx, v, w ∈ C_0^∞(Ω), has a continuous and inf-sup-stable extension on V_1 = H_0^{1−σ}(Ω) and V_2 = H_0^{1+σ}(Ω) and allows for a standard discretization with linear finite elements S_h for both trial and test space; see, e.g. [25].
For the verification of the discrete inf-sup condition, denote by R_h and Λ_h the Ritz projection and the Scott-Zhang interpolation operator, respectively. The continuous inf-sup condition then yields, for any s_h ∈ S_h, a discrete inf-sup bound with a constant µ_h that depends only on the continuous inf-sup constant and on the shape regularity of the underlying mesh, where we switched to (3.6) for the norm on V_2. To complete the setting, we set W = L²(Ω), Q = ℝ^J, and let I be the canonical embedding H^{1−σ}(Ω) → L²(Ω) and C : ℝ^J → H^{−(1+σ)}(Ω) be given by Cq = Σ_{j=1}^J q_j δ_{x_j}. The continuity constant M_I is of order 1, while M_C depends on σ and deteriorates as σ → 0. Notice that, for σ = 0, C is not continuous because functions in H¹_0(Ω) do not have point values in general. Choosing δ ∈ (0, σ), we have (3.17) with s_1 = 1 − σ and s_2 = 1 + σ, and therefore theorem 3.8 provides an asymptotic quasi-best approximation bound also in this nonconvex, noncoercive setting.

Analysis with approximate control action operator
In this section, we shall analyze the approximation properties of a variational discretization, where the control action operator is approximated. This includes the case of a discretized control space.

Approximate variational discretization
Let V_h = V_{h,1} × V_{h,2} be the same finite-dimensional conforming spaces introduced in section 3.1 and assume that the linear operator C*_h : V_2 → Q approximates C*. Then the (semi-)discrete optimization problem (4.2) is obtained by replacing C* with C*_h. As before, we may eliminate q_h. If we define the bilinear form b_h by replacing C* with C*_h in (2.10), then the reduced version of (4.2) is the following perturbation of the optimality system (3.3):

x_h ∈ V_h:  b_h(x_h, ϕ_h) = −(1/√α)(u_d, Iϕ_{h,1})_W for all ϕ_h ∈ V_h.   (4.3)

Before we proceed to analyze its discretization error, let us give an important class of examples.

Example 4.1 (Discretized controls).
We consider a conforming discretization of the control variable. More precisely, replacing Q in (3.1) with a finite-dimensional subspace Q h ⊂ Q leads to the discrete optimality system

(4.4)
If we denote by P_h the Q-orthogonal projection onto Q_h, then the third equation means q_h = −(1/√α) P_h C* z_h and, therefore, the right-hand side of the first equation can be rewritten accordingly. Hence, the reduced version of (4.4) is a special case of (4.3) with C*_h = P_h C*.

As the bilinear form b_h coincides with b except for using C*_h in place of C*, the non-asymptotic continuity and nondegeneracy properties of b in sections 2 and 3, e.g. theorem 2.1, immediately carry over by replacing M_C with the operator norm M_{C_h} of C*_h. In particular, setting M_h := max{M_I, M_{C_h}} and defining κ_h accordingly in (4.8), we obtain continuity and inf-sup bounds for b_h valid for all v, ϕ ∈ V. Furthermore, (3.11) and the inf-sup duality (2.18) provide the corresponding discrete bounds for all v_h ∈ V_h, where µ_h is the quasi-best approximation constant of the constraint discretization.
Since the structures of the discrete problems (4.3) and (3.3) are the same, well-posedness of (4.3) follows from lemma 3.1.

Approximation
As in the error analysis of section 3.2, we adopt the convenient choice (3.6) for the norm on V_2. We start our analysis by splitting the error into an approximation part and a consistency part.

Lemma 4.2 (Approximation and consistency error)
. Let x = (u, z) be any solution of the optimality system (2.11) and let x_h be its approximation from (4.3). Then the error satisfies the following estimate, where κ_h is defined by (4.8) and μ_h is the quasi-best approximation constant of the constraint discretization from (3.9).

Proof. For all φ_h ∈ V_h we have certain identities; in view of (4.6) and (4.7), these identities imply the asserted estimate. The claim then follows from the obvious inequalities. □

For the next corollary, it is necessary to consider a class of optimization problems in which all elements of V_h are solutions for suitable data. The class P, consisting of the optimization problems of the above type, has this property whenever I^* is surjective in addition to the assumptions of section 2.

Corollary 4.3 (Necessary condition for quasi-best approximation). If the approximate variational discretization (4.3) is quasi-best in the class P , then
Proof. Let v_{2,h} ∈ V_{2,h} be arbitrary and take some v_{1,h} ∈ V_{1,h}. The resulting pair is a possible solution in the class P. Since (4.3) is quasi-best in P, the discrete solution is exactly v_h ∈ V_h. Hence, the claim follows by lemma 4.2. □
Although possible, it is difficult to imagine that a practical approximation C_h^* satisfies the condition in corollary 4.3 without coinciding with C^*. We therefore consider in what follows only assumptions on C_h^* that lead to asymptotic quasi-best approximation. In view of lemma 4.2, this requires that the consistency error vanishes at least as fast as the best approximation error, i.e.
Moreover, to capture in the limit the compactness of C^* resulting from assumption (3.15a), we assume the limiting compactness (4.10). This implies that the operator norms M_{C_h} are bounded uniformly in h: if not, there would be a sequence which, in view of (4.10), yields a contradiction. Consequently, M_h = max{M_I, M_{C_h}} remains bounded as h → 0.

Lemma 4.4 (Qualitative asymptotic quasi-best approximation with approximate control action). Let x = (u, z) ∈ V be a solution to problem (2.11) and let x_h = (ũ_h, z̃_h) ∈ V_h, h > 0, be the corresponding approximations given by (4.3). Furthermore, assume uniform stability (3.13), approximability (3.15b), limiting compactness (4.10), and that I : V_1 → W is compact. If the exact solution x satisfies (4.9), we have convergence of the error.

Proof. As in the proof of lemma 4.2:
We deduce the required bound by replacing b with b_h and x_h with x_h^* in lemma 3.6 and by using the limiting compactness (4.10) instead of the compactness of C^* : V_2 → Q in the proof of lemma 3.7. Next, proceeding as in the proof of lemma 4.2, assumption (4.9) on the exact solution gives control of the consistency term. We therefore conclude by inserting the two preceding relationships into the triangle inequality. □

We now turn to proving a quantitative quasi-best approximation result. To this end, we need to replace the qualitative assumptions (4.9) and (4.10) by quantitative counterparts. We shall assume the quantitative conditions (4.12) and (4.13), where δ > 0 is suitably chosen. Note that, for C_h = C, (4.13) reduces to the part regarding C in the quantitative counterpart (3.17b) of the qualitative compactness (3.15a).

Theorem 4.5 (Quantitative asymptotic quasi-best approximation with approximate control action)
. Let x, x_h, h > 0, and κ be as in lemma 4.4. In addition, assume uniform stability (3.13) and that there exists δ > 0 such that we have (3.17), with (4.13) replacing the assumption on C in (3.17b). If the exact solution x also satisfies (4.12) with the same δ, we have the corresponding quantitative estimate.

Proof. We follow the lines of the proof of lemma 4.4, but replacing (4.9) with (4.12) and (4.11) with a quantitative argument in the spirit of theorem 3.8. To this end, it suffices to use (4.13) instead of (3.17b). □

We conclude this section by assessing the key assumptions (4.9) and (4.12) with a remark and an example.

Remark 4.6 (Ensuring dominated consistency error). As
we may verify assumptions (4.9) and (4.12) using relationships for

Example 4.7 (Simple model optimization and piecewise constant controls).
Consider the setting of example 3.9, but now for problem (1.1) discretized with linear finite elements for the constraint and piecewise constants for the control variable. In light of example 4.1, this full discretization can be cast into the form (4.3) with C_h^* = P_h C^*, where P_h is the L^2-projection onto piecewise constants. By duality, we obtain an estimate in which c_1 depends on the shape regularity of the underlying meshes. Suppose that there is a constant c_2 providing a corresponding lower bound. This holds, for example, if the matrix norm of the Hessian of the exact state or of its adjoint state is bounded away from 0 on a fixed subdomain. We conclude that (4.12) holds with δ = 1 and a constant depending on the exact solution under consideration.
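The first-order decay of the piecewise constant projection error invoked above can be checked numerically. The following sketch is our own (not from the paper): it computes ‖f − P_h f‖_{L²(0,1)} for f = sin and the L²-orthogonal projection P_h onto piecewise constants on a uniform mesh, then estimates the convergence rate.

```python
import numpy as np

def pc_projection_error(n, m=400):
    """L2(0,1) error ||f - P_h f|| for f = sin, where P_h is the L2-orthogonal
    projection onto piecewise constants on n uniform cells. On each cell the
    projection equals the cell average, which we compute exactly for sin."""
    h = 1.0 / n
    err2 = 0.0
    for i in range(n):
        a, b = i * h, (i + 1) * h
        mean = (np.cos(a) - np.cos(b)) / h        # exact average of sin on [a, b]
        x = a + (np.arange(m) + 0.5) * h / m      # composite midpoint quadrature
        err2 += np.sum((np.sin(x) - mean) ** 2) * h / m
    return np.sqrt(err2)

e1, e2 = pc_projection_error(16), pc_projection_error(32)
rate = np.log2(e1 / e2)   # close to 1: ||(I - P_h)f|| = O(h) for smooth f
```

Halving the mesh size roughly halves the error, matching the first-order bound used in the duality argument above.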

Analysis with control constraints
This section generalizes our approach to optimization problems that are nonlinear because of constraints on the control.

Control constraints and discretization
Let K ⊂ Q be the set of admissible controls. We assume that K is nonempty, closed, and convex (5.1) and denote by Π_K : Q → K the projection onto K, which is characterized by ‖q − Π_K q‖_Q = inf_{p∈K} ‖q − p‖_Q or, equivalently, by the variational inequality (q − Π_K q, p − Π_K q)_Q ≤ 0 for all p ∈ K. The latter characterization implies (q − p, Π_K q − Π_K p)_Q ≥ ‖Π_K q − Π_K p‖_Q^2 for all q, p ∈ Q (5.2), which in turn shows that the operator Π_K is strongly monotone and Lipschitz continuous, in both cases with constant 1.
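For the common case of box constraints K = {q : lo ≤ q ≤ hi}, the projection is pointwise clamping. The following sketch is ours (a discrete L²-type surrogate with random vectors, not the paper's setting); it checks the two constant-1 properties stated above.

```python
import numpy as np

def proj_box(q, lo=-0.5, hi=0.5):
    """Projection onto the box K = {q : lo <= q <= hi}: pointwise clamping."""
    return np.clip(q, lo, hi)

rng = np.random.default_rng(0)
q, p = rng.normal(size=1000), rng.normal(size=1000)
dP, d = proj_box(q) - proj_box(p), q - p

# Lipschitz continuity with constant 1:  ||Pi_K q - Pi_K p|| <= ||q - p||
assert np.linalg.norm(dP) <= np.linalg.norm(d) + 1e-12
# strong monotonicity with constant 1: (q - p, Pi_K q - Pi_K p) >= ||Pi_K q - Pi_K p||^2
assert np.dot(d, dP) >= np.dot(dP, dP) - 1e-12
```

Both inequalities hold pointwise for scalar clamping and hence for the whole vector, mirroring the consequence of inequality (5.2).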
The generalization of problem (2.3) incorporating convex control constraints is then the convex optimization problem (5.3). Thanks to (5.1), a solution (q, u) is characterized by the existence of z ∈ V such that the following counterpart of the rescaled optimality system (2.6) is satisfied: As in section 2, we insert the third equation into the first one and consider the corresponding weak formulation of the rescaled and reduced optimality system (5.5), where b_K := a + c_{K,α} and c_{K,α} already incorporates the 1/√α-scaling. In contrast to the previous sections, c_{K,α}, and hence b_K, are in general not linear in the first argument. Nonetheless, if we introduce a suitable pseudometric, inequality (5.2) leads to the following replacement of the properties (2.14) of the bilinear form c: if v, w ∈ V and φ = (−(v_1 − w_1), v_2 − w_2), then the monotonicity estimate holds, while, for arbitrary v, w, φ ∈ V, we have the continuity estimate. Hence, B_K is strongly monotone and Lipschitz continuous and therefore boundedly invertible by [26, theorem 25.B]. In light of (5.10), we can conclude by noting the corresponding property of T^{−*}.

In order to discretize the optimization problem (5.3) with control constraints, we proceed as in section 3.1. Introducing the discrete space V_h = V_{h,1} × V_{h,2} as therein, the variational discretization can be characterized as follows: Here we need that Π_K(−C^* v_{h,2}/√α) can be evaluated exactly for v_{h,2} ∈ V_{h,2}. This is the case, for example, when we consider (1.1) with box constraints and discretize with linear finite elements. If Π_K has to be approximated, the subsequent error analysis involves additional technicalities, similar to those addressed in section 4.
Existence and uniqueness of solutions to (5.11) can be established in a similar way as in corollary 5.2. Using (3.6) as the norm in V_2, the major change is to replace the operator (5.9) by T_{K,h} : V_h → V_h, where A_h v_{h,1} := a(v_{h,1}, ·)|_{V_{h,2}} for v_{h,1} ∈ V_{h,1} is the discrete counterpart of A, 1/μ_h is its inf-sup constant, γ is as in (2.23), and J_{h,i} : V_{h,i} → V_{h,i}^* is the Riesz map for V_{h,i}, i = 1, 2.

Quasi-best approximation
We analyze the quasi-best approximation properties of the nonlinear variational discretization (5.11), adopting again (3.6) as norm in V 2 .
The following non-asymptotic result draws heavily on theorem 5.1, which required an α-dependent error notion on the trial space V.

Theorem 5.3 (Non-asymptotic quasi-best approximation with control constraints)
. If x_h is the approximation given by (5.11) to an arbitrary solution x of (5.5), then its error is quasi-best in V_h, where κ_h and μ_h are as in theorem 3.3.

Proof. Given any v_h ∈ V_h, we split the error as in (5.13). To bound the second term, we employ theorem 5.1 with, respectively, V_h, T_{K,h}, 1/μ_h, 1, and κ_h in place of V, T_K, m_a, M_a, and κ. Writing φ_h = T_{K,h}(v_h − x_h), the definitions of x and x_h then yield the claim. □

The following asymptotic statement requires only weaker assumptions on (V_h)_h. Let (h_k)_k be any sequence with lim_{k→∞} h_k = 0 and, writing k whenever h_k is an index, consider the sequence (d_k)_k defined casewise.
The sequence (d_k)_k is bounded in the Hilbert space V by definition. For a weak limit d ∈ V of a subsequence, we have a limiting identity for arbitrary φ ∈ V and φ_k ∈ V_{h_k}. Consequently, (3.15b), the passage k → ∞, and (2.17) yield d = 0. In view of (3.15a), d_k → 0 weakly in V then implies |d_k| → 0.
For the second statement, we just note that the main step of the proof of theorem 3.8 carries over with the obvious modifications. In view of the inverse triangle inequality, theorem 5.5 readily yields the following asymptotic quasi-best approximation result.
Corollary 5.6 (Asymptotic quasi-best approximation with control constraints). Let ν_{K,h} be the quasi-best approximation constant for the nonlinear variational discretization (5.11) with respect to ‖·‖. Moreover, assume (3.13) and define κ as in lemma 3.7. If (3.15) holds, then ν_{K,h} is asymptotically bounded in terms of κ; more specifically, if (3.17) holds, this bound becomes quantitative in h. For the α-dependence of κ, see remark 2.4.
In comparison with lemma 3.7 and theorem 3.8, corollary 5.6 features an additional M/√α-factor. This factor stems from the fact that our derivation employed an error notion that itself incorporates it.