An adaptive finite element method for distributed elliptic optimal control problems with variable energy regularization

We analyze the finite element discretization of distributed elliptic optimal control problems with variable energy regularization, where the usual L 2 (Ω) norm regularization term with a constant regularization parameter ϱ is replaced by a suitable representation of the energy norm in H − 1 (Ω) involving a variable, mesh-dependent regularization parameter ϱ ( x ). It turns out that the error between the computed finite element state (cid:101) u ϱh and the desired state u (target) is optimal in the L 2 (Ω) norm provided that ϱ ( x ) behaves like the local mesh size squared. This is especially important when adaptive meshes are used in order to approximate discontinuous target functions. The adaptive scheme can be driven by the computable and localizable error norm ∥ (cid:101) u ϱh − u ∥ L 2 (Ω) between the finite element state (cid:101) u ϱh and the target u . The numerical results not only illustrate our theoretical findings, but also show that the iterative solvers for the discretized reduced optimality system are very efficient and robust.


Introduction
Optimal control [2,10,17,22] and inverse problems [8,12,20] subject to partial differential equations often involve some parameter-dependent cost or regularization terms, see also the recent special issue [7] on optimal control and inverse problems. While in optimal control problems the regularization parameter is often considered as a given constant, in inverse problems, the parameter-dependent convergence as → 0 is well studied, see, e.g., [1]. For tracking type cost functionals subject to elliptic partial differential equations, the regularization error was analyzed in [18] which depends on the regularity of the given target. In [15], and in the case of energy regularization, we have considered a related finite element analysis which resulted in an optimal choice of the regularization parameter = h 2 , or vice versa, where h is the mesh size of the globally quasi-uniform finite element mesh. While the latter can be used to approximate smooth target functions in the state space, adaptively refined finite element meshes should be used when considering less regular target functions, e.g., discontinuous or singular, or violating Dirichlet boundary conditions which are involved in the definition of the state space. This motivates the use of a variable regularization parameter function which can be defined by using the local finite element mesh sizes h . We are not aware of any paper on optimal control problems dealing with such an approach. However, there are several papers in imaging considering a similar approach: In [11], a variable regularization parameter is used for an adaptive balancing of the data fidelity and the regularization term, see equation (3) in [11]. A variable L 2 (Ω) (TV) regularization is considered in [6, equation (1.1)]. Finally, a spatially adapted total generalized variation model was already used in [4, equation (1.4)], see also the more recent work [9].
As model problem we consider the optimal control problem to minimize the cost functional We assume that Ω ⊂ R n , n = 1, 2, 3, is a bounded Lipschitz domain, and ∈ R + is some, at this time constant, regularization parameter, on which the solution (u , z ) depends on. Our particular interest is in the behavior of u − u L 2 (Ω) as → 0, see [18]. Note that the energy norm as used in (1.1) is given by When eliminating the adjoint state p we can determine the optimal state u as the solution of the singularly perturbed Dirichlet boundary value problem − ∆u + u = u in Ω, u = 0 on ∂Ω, (1.5) which is also known as differential filter in fluid mechanics [13]. The variational formulation of (1.5) is to find u ∈ H 1 0 (Ω) such that For a finite element discretization of the variational formulation (1.5) we may use the ansatz space V h := S 1 h (Ω) ∩ H 1 0 (Ω) of piecewise linear and continuous basis functions which are defined with respect to some admissible and globally quasiuniform decomposition of Ω into simplicial finite elements of mesh size h. The Galerkin variational formulation of (1.6) is to find u h ∈ V h such that (1.7) When combining the regularization error estimates for u − u L 2 (Ω) as given in [18] with finite element error estimates for the approximate solution u h , i.e., for u h − u L 2 (Ω) , we were able to derive estimates for the error u h − u L 2 (Ω) , see [15]. In particular for the optimal choice = h 2 this gives (1,2]. This error estimate remains true when considering optimal control problems with energy regularization subject to time-dependent partial differential equations, see [16] in the case of the heat equation. However, when considering less regular target functions u, e.g., singular or discontinuous targets, the use of adaptive finite elements seems to be mandatory in order to gain optimal computational complexity. In this case it is not obvious how to choose a constant regularization parameter , e.g., = h 2 min . Instead, one may use a locally adapted regularization function (x), x ∈ Ω. The energy norm in H −1 (Ω) can be realized by duality when solving a Poisson equation with zero Dirichlet boundary conditions. When including the (constant) regularization parameter , we can generalize this approach by considering a diffusion equation with (x) −1 as diffusion coefficient in order to realize the variable energy regularization within an adaptive finite element discretization.
The rest of this paper is organized as follows. In Section 2, we derive the analogon of the optimal control problem (1.1) when using a regularization function (x) instead of a constant regularization parmeter , and the corresponding reduced optimality systems. Section 3 provides estimates of the derivation of the state u from the desired state u with respect to the L 2 -norm in terms of the regularization function (x), and the regularity of the target u. In Section 4 we analyze the L 2 -norm u h − u L 2 (Ω) of the error between the computed finite element state u h and the desired state u leading to an elementwise adaption of the regularization function (x) to the local mesh size h . The first part of Section 5 is devoted to numerical studies of the error u h − u L 2 (Ω) for benchmark problems with discontinuous targets u, whereas the second part provides numerical studies of the proposed iterative solvers in the three-dimensional case n = 3. Finally, in Section 6, we draw some conclusions, and give some outlook on further research work.

Distributed optimal control problems with variable energy regularization
To give a motivation for the optimization problem to be solved, let us consider an alternative representation of the energy norm, still using a constant regularization parameter : where w ∈ H 1 0 (Ω) is the weak solution of the Dirichlet boundary value problem − ∆w = √ z in Ω, w = 0 on ∂Ω. (2.1) Now we may introduce w = √ w to conclude the diffusion equation −div −1 ∇ w = z in Ω, w = 0 on ∂Ω, and the norm representation Instead of using a constant regularization parameter we now consider a diffusion equation with a variable diffusion coefficient ∈ L ∞ (Ω), that is uniformely bounded from above and below, i.e., there exists positive constants and such that 0 < ≤ (x) ≤ < ∞ for almost every x ∈ Ω. More precisely, we consider the Dirichlet boundary value problem and its variational formulation to find w ∈ H 1 0 (Ω) such that is satisfied for all v ∈ H 1 0 (Ω), i.e., we have w = A −1 1/ z . Instead of (1.3) we now consider the reduced cost functional 3) whose minimizer is given as the unique solution of the gradient equation p + w = 0 in Ω. (2.4) The optimality system to be solved is now given by the primal problem (1.2), the adjoint problem (1.4), the gradient equation (2.4), and (2.2). When eliminating w and z , we end up with a coupled system of the primal problem − div 1 (x) ∇p (x) − ∆u (x) = 0 for x ∈ Ω, u (x) = 0 for x ∈ ∂Ω, (2.5) and the adjoint boundary value problem (1.4). The related variational formulation is to find (u , p ) ∈ H 1 0 (Ω) × H 1 0 (Ω) such that for all q ∈ H 1 0 (Ω). When introducing we can write the coupled variational formulation (2.6) and (2.7) in operator form as and eliminating p results in the Schur complement system to find u ∈ H 1 0 (Ω) such that is bounded, self-adjoint, and H 1 0 (Ω) elliptic. Note that for a constant regularization parameter (x) = , (2.8) coincides with (1.5).

Regularization error estimates
In this section, we consider estimates for the regularization error u −u L 2 (Ω) when u ∈ H 1 0 (Ω) is the weak solution of the operator equation and where S is as defined in (2.9). Note that S induces a norm, satisfying First we follow the general approach as given in [16,Section 2] in the case of a constant regularization parameter. The variational formulation of the operator is satisfied for all v ∈ H 1 0 (Ω). Unique solvability of the variational formulation (3.2) follows from the boundedness and ellipticity of S . Theorem 1. Let u ∈ H 1 0 (Ω) be the unique solution of the variational formulation (3.2). For u ∈ L 2 (Ω) there holds the estimate
Let us now consider the operator S as defined in (2.9). For u ∈ H 1 0 (Ω) let

be the unique solution of the variational formulation
Moreover, using a weighted Cauchy-Schwarz inequality, this gives When combining this with the regularization error estimate (3.5) this gives, for It remains to consider S u L 2 (Ω) when S is given as in (2.9), i.e., We first compute For the first part we further have and hence, i.e., When taking the square and applying Hölder's inequality we obtain follows. Using (3.8) we finally obtain When assuming and combining this with (3.6) we obtain While for a constant regularization parameter (x) = we obviously have c = 0, in the more general situation of a, e.g., piecewise linear parameter function (x) we finally obtain a similar error estimate as in (3.9) when assuming u ∈ H 1 0 (Ω) only. Hence we restrict our considerations to u ∈ H 1 0 (Ω), and to less regular target functions, where we can formulate the following result.
Theorem 2. Let u ∈ H 1 0 (Ω) be the unique solution of the Schur complement system (2.8), where the regularization function ∈ L ∞ (Ω) is assumed to be bounded and uniform positive. For u ∈ H 1 0 (Ω) there holds the regularization error estimate Remark 1. Due to the assumption ∈ L ∞ (Ω) we can write (3.12) as and using (3.3) together with an interpolation argument we conclude where the eigenfunctions {v i } i∈N form an orthonormal basis in L 2 (Ω), and the eigenvalues λ i = λ i ( ) ∈ R + are positive and tend to infinity as i → ∞. Hence we can define, for s ∈ [0, 1], the weighted Sobolev norms With this we can write the regularization error estimates (3.3) and (3.12) as for s = 0 and s = 1, respectively. Using an interpolation argument, this remains true for all s ∈ (0, 1) when assuming u ∈ H s (Ω

Finite element error estimates
be the finite element space of piecewise linear and continuous basis functions ϕ k , which are defined with respect to some admissible locally quasi-uniform decomposition of the computational domain Ω into simplicial shape-regular finite elements τ of local mesh size h = ∆ 1/n , where ∆ is the volume of the finite element τ . As regularization we consider the piecewise constant function The Galerkin variational formulation of the abstract operator equation Using standard arguments we conclude Cea's lemma, from which we further obtain . When assuming u ∈ H 1 0 (Ω), and using the regularization error estimates (3.4) and (3.5), this gives .
Let Π h u ∈ V h be a quasi-interpolation of u ∈ H 1 0 (Ω), e.g., using the Scott-Zhang operator Π h , see, e.g., [5], and satisfying the error estimates and Here, ω := {τ j : τ ∩ τ j = ∅} is the simplex patch of τ . Then we can estimate, using (3.8) and (4.6), Moreover, using (4.5), we also have Together with (3.8) we then obtain and with (3.9) this finally gives The variational formulation (4.2) requires, for any given u ∈ H 1 0 (Ω), the evaluation of S u = B * A −1 Hence we can define the approximate solution p uh ∈ V h satisfying and therefore we can introduce an approximation S u = B * p uh of S u = B * p u . Instead of (4.2) we now consider the perturbed variational formulation to find u h ∈ V h such that Unique solvability of (4.11) follows since the stiffness matrix of S is positive semidefinite, while the mass matrix related to the inner product in L 2 (Ω) is positive definite.

Lemma 1.
Let u h ∈ V h be the unique solution of the perturbed variational formulation (4.11). Then there holds the error estimate Proof. The difference of the variational formulations (4.2) and (4.11) first gives the Galerkin orthogonality which can be written as When chosing v h = u h − u h ∈ V h , and using S u, u Ω ≥ 0 for all u ∈ H 1 0 (Ω), this gives Using (4.1) and an inverse inequality locally, we further have and hence, Note that we have and Hence we conclude We can further obtain, inserting the Scott-Zhang interpolation Π h u, Using an inverse inequality locally, we further estimate the first term by Recall that p u ∈ H 1 0 (Ω) solves Hence we conclude the Galerkin orthogonality

Now the assertion follows from
Now we are in the position to state the main result of this paper.
Theorem 3. Let u h ∈ V h ⊂ H 1 0 (Ω) be the unique solution of the perturbed variational formulation (4.11) where the regularization function (x) is given as in (4.1), and where the underlying finite element mesh is assumed to be locally quasi-uniform. Then there holds the error estimate Proof. The estimate (4.13) follows from the finite element error estimates (4.8) and (4.12), since the finite element mesh is assumed to be locally quasi-uniform.
Remark 2. Similar as before we also have the error estimate when assuming u ∈ H s (Ω) for some s ∈ [0, 1].
The perturbed Galerkin finite element formulation (4.11) can be written as coupled system to find for all q h ∈ V h . Note that this system corresponds to the finite element discretization of the coupled variational formulation (2.6) and (2.7). The finite element variational formulation (4.14) and (4.15) is equivalent to a coupled linear system of algebraic equations, where we use the standard finite element stiffness and mass matrices defined as Since the finite element stiffness matrix K h is invertible, we can eliminate the adjoint p to end up with the Schur complement system Since all involved stiffness and mass matrices are symmetric and positive definite, unique solvability of the Schur complement system and therefore of the Galerkin variational formulation (4.14) and (4.15) follows.

Convergence studies
As a first numerical example we consider the two-dimensional domain Ω = (0, 1) 2 , and the discontinuous target function u 2D (x) = 1 for x ∈ (0.25, 0.75) 2 , 0 else. The initial mesh consists of 32 triangular finite elements and 9 degrees of freedom, see Figure 1. For a given mesh we compute the approximate solution u h , the global error and the local error indicators We mark all elements τ when η > θ max =1,...,N η is satisfied, with θ = 0.5. After 14 refinement steps we obtain the mesh as shown in Fig. 1 Fig. 2 where in addition to the present approach we also present the convergence results when considering energy regularization [16] with the optimal choice = h 2 , and the regularization in L 2 (Ω) with = h 4 , see [14]. As expected, we observe a linear order of convergence when using the variable energy regularization in the adaptive version described above, while both the energy regularization and the regularization in L 2 (Ω) almost coincide with half the order of convergence. For a comparison of the different approaches, see also the computed states as shown in Fig. 3. Next we consider the three-dimensional domain Ω = (0, 1) 3 , and the target function u 3D (x) = 1 for x ∈ (0.25, 0.75) 3 , 0 else.
As shown in Fig. 4 we still observe a h 1/2 convergence for a uniform refinement in the case of both the L 2 (Ω) and energy regularizations, but a h 3/4 convergence for the adaptive diffusion approach, where h = N −1/3 . To explain the different convergence behaviour for the adaptive refinement in two and three space dimensions, we first consider the 2D case for the example u 2D ∈ H 1/2−ε (Ω), ε > 0 for a uniform refinement of the triangulation with N triangles (see Fig. 1 (left)). Let further m denote approximately the number of elements in each row/column of the mesh grid, i.e., N ∼ m · m = m 2 . Due to the regularity of u 2D , the optimal order of convergence is de facto 1/2. Thus, refining all of the N elements uniformly, will lead to a an error reduction of order h 1/2 , as observed. Now we aim for an adaptive refinement. Since for the particular test example the singularity of u 2D is only along the boundary of the square (0.25, 0.75) 2 , it is sufficient (after an initializing phase), to refine only O(m) = O( √ N ) elements in the neighborhood of this boundary in order to have the optimal rate of 1/2. Note, that for a uniform refinement, the number of elements would grow by a factor of 4 in each step, for the adaptive scheme, we only refine O(m) = O( √ N ) elements in the neighborhood of the discontinuity. Therefore, the number of elements only grows by a factor of 2. Hence, if we adaptively refine O(N ) elements, we can expect an error reduction of order h, which is exactly what we see in the numerical example as given in Fig. 2. Now let us look at the 3D case. Here again counting the elements along each edge, denoted by m, we get the relation N ∼ m 3 . For a uniform refinement we get a convergence rate h 1/2 . In order to get the same rate with an adaptive scheme, we need to refine at least the elements in the neighborhood of the boundary of the cube For u 1D ∈ H 1/2−ε (Ω), ε > 0, and for uniformly refining N elements, we will get an error reduction of order h 1/2 . Using an adaptive refinement though, it is enough to refine exactly 4 ∼ O(log(N )) elements in each step to get the optimal order of 1/2. Thus we can expect exponential convergence, which is also what we observe in the numerical example in Fig. 5.

Solver studies
While all the numerical results as presented in the previous subsection were computed using Matlab with a sparse direct solver, we finally discuss the use of preconditioned iterative solution strategies which are robust with respect to the regularization parameter function (x). Here we will restrict our considerations to the three-dimensional case with the target u 3D . We first consider the preconditioned conjugate gradient (PCG) solver applied to the Schur complement equation (4.17) with a proper preconditioner. When performing a uniform refinement, the diffusion coefficients (the inverse of the regularization parameters) are constant on all elements, and we may replace (x) by h 2 with h being the global mesh size. Therefore, the Schur complement is simplified as Robust preconditioners for such a Schur complement have been studied in our previous work [15], and the spectral equivalence of this Schur complement to the mass matrix was analyzed in our recent work [14]. Since, in this case, the Schur complement M h + K h K −1 h K h = M h +h 2 K h is spectrally equivalent to the mass matrix M h , we can use a simple diagonal approximation to the mass matrix such as diag[M h ] or lump[M h ] as cheap preconditioners for the Schur complement. In the case of an adaptive refinement, both the local mesh refinement and the use of varying diffusion coefficients (x), which are piecewise constant, play a role in developing robust Schur complement preconditioners with respect to the mesh size and diffusion coefficient jumps. At least, it is not straightforward to show the robustness of the preconditioner for the Schur complement M h + K h K −1 h K h in this situation. However, thanks to the particular choice of the diffusion coefficient (4.1), the negative effect resulting from the varying mesh size seems to be compensated by the local diffusion coefficient (the inverse of the local regularization parameter) during the adaptive procedure. The spectral equivalence of the Schur complement to the mass matrix still seems to hold on the adaptive refinements. A rigorous analysis of the robust preconditioners needs to be further studied. The robustness with respect to the minimal and maximal local mesh size (h min and h max ) is numerically confirmed by the constant iteration numbers of the PCG method preconditioned by the lumped mass matrix (Its (PCG)) on the adaptive refinements in Table 1, in comparison with the increasing conjugate gradient (CG) iteration numbers without using the preconditioner (Its (CG)). Note that we solve the Schur complement equation (4.17) until the relative preconditioned residual error is reduced by a factor 10 6 . For the inverse operation of K −1 h applied to  a vector v within the PCG iteration, we have used the classical Ruge-Stüben algebraic multigrid (AMG) [19] preconditioned CG method to solve K h w = v until the relative preconditioned residual error reaches 10 −12 in order to perform a sufficiently accurate multiplication with the Schur complement. Alternatively, one can here use a sparse factorization in a preprocessing step. Furthermore, we provide some numerical results concerning robust solvers for the coupled optimality system (4.16): Since the system matrix is non-symmetric and positive definite, we apply the GM-RES method with the following proposed block diagonal preconditioner: The number of GMRES iterations (Its) using such a preconditioner are given in Table 2. We solve the system until the relative preconditioned residual error is reduced by a factor 10 6 . In the preconditioner P h , we have utilized the Ruge-Stüben AMG preconditioner K h for K h , whereas the lumped mass matrix lump[M h ] has been used as preconditioner for the Schur complement. We observe almost constant iteration numbers of the GMRES method preconditioned by P h on the uniform refinement as well as on the adaptive refinements as given in Table 2. We only see slightly higher iteration numbers on the adaptive meshes than on the uniform ones.
On the other hand, we may reformulate the coupled system (5.1) in the following equivalent form When applying the Bramble-Pasciak transformation to the symmetric but indefinite system (5.2), this leads to the equivalent system  with the symmetric and positive definite system matrix Here, I h denotes the identity, and C h is some symmetric and positive definite (spd) matrix that is assumed to be spectrally equivalent to the matrix K h , and less than K h , i.e., C h < K h . In particular, we can again take the classical spd Ruge-Stüben AMG preconditioner K h = δK h (I h − AM G h ) −1 with a proper scaling δ > 0 as C h , where AM G h denotes the AMG iteration matrix. Now using C h = K h and the lumped mass matrix lump[M h ] to the Schur complement M h + K h K −1 h K h , we arrive at the following (inexact) BP preconditioner: Details on the BP-CG can be found in the original paper [3]; see also [23] for improved convergence rate estimates. The number of BP-CG iterations (Its) using the preconditioner P h are provided in Table 3. The solver stops the iteration when the relative preconditioned residual error is reduced by a factor 10 6 . From the constant PB-CG iteration numbers on both the uniform and adaptive mesh refinements, we observe the robustness of the proposed preconditioner for the coupled optimality system.

Conclusion and outlook
We have studied finite element discretizations of the reduced optimality system for the standard distributed space-tracking elliptic optimal control problem, but using a new variable energy regularization technique. It has been shown that the choice of the local mesh-size squared as local regularization parameter (x) leads to optimal rates of convergence of the computed finite element state u h to the prescribed target u in the L 2 norm. In particular, this approach allows us to adapt the local regularization parameter to the local mesh-size when using an adaptive mesh refinement, where the adaptivity is driven by the localization of the L 2 norm of  Table 3: Comparison of the preconditioned PB-CG solver on both the adaptive and uniform refinements.
the error between u and u h as computable local error indicator. Numerical studies made for discontinuous targets in one, two and three space dimensions illustrate that these simple adaptive schemes show a significantly better performance than the uniform refinement. We have also proposed iterative solvers for the finite element equations corresponding to the reduced optimality system. The numerical studies have shown that these solvers are robust and efficient at the same time. Contrary to quasi-uniform meshes where we can choose = h 2 , the numerical analysis of these solvers is still not complete in the case of variable regularization parameters. However, all numerical experiments have shown that the mass matrix M h , and, therefore, also the lumped mass matrix lump[M h ] are spectrally equivalent to the Schur complement K h K −1 h K h + M h , and can be used as robust preconditioners in the preconditioned BP-CG, MINRES, or GMRES solvers. Obviously, the classical plane Ruge-Stüben AMG preconditioner is doing this job for K h . Here we may develop more efficient and robust preconditioners that are especially adapted to the diffusion coefficient (x) that changes from element to element according to the mesh-sizes. Moreover, the parallelization of such iterative solution strategies should be implemented in order to solve really large-scale systems in three space dimensions. Finally, the variable energy regularization and the robust solvers for the corresponding linear algebraic equations can be extended to optimal control problems with state or control constraints, and subject to other state equations like elasticity, Maxwell, and Stokes equations.