MODEL REDUCTION TECHNIQUES WITH A-POSTERIORI ERROR ANALYSIS FOR LINEAR-QUADRATIC OPTIMAL CONTROL PROBLEMS

. The main focus of this paper is on an a-posteriori analysis for dif-ferent model-order strategies applied to optimal control problems governed by linear parabolic partial diﬀerential equations. Based on a perturbation method it is deduced how far the suboptimal control, computed on the basis of the reduced-order model, is from the (unknown) exact one. For the model-order reduction, H 2 ,α -norm optimal model reduction (H2), balanced truncation (BT), and proper orthogonal decomposition (POD) are studied. The proposed approach is based on semi-discretization of the underlying dynamics for the state and the adjoint equations as a large scale linear time-invariant (LTI) system. This system is reduced to a lower-dimensional one using Galerkin (POD) or Petrov-Galerkin (H2, BT) projection. The size of the reduced-order system is iteratively increased until the error in the optimal control, computed with the a-posteriori error estimator, satisﬁes a given accuracy. The method is illustrat-ed with numerical tests.


1.
Introduction. Model reduction is a powerful tool widely used for solving partial differential equations (PDEs) or large-scale ordinary differential equations (ODEs) where the latter may arise from semi-discretization of PDEs in space. Many model reduction methods are based on the idea of projecting the state space to a much lower-dimensional one such that the obtained so-called reduced system can be solved much faster. There are several different methods such as Proper Orthogonal Decomposition (POD) [13,21,37], Balanced Truncation (BT) (see, e.g. [6,26]), H 2 -norm model reduction (H2) [7,11,18,25], Reduced Basis (RB) [22,27], Hankel norm approximation [9] and many more out of which the first three, POD, BT and H2, will be the focus of this paper. Roughly speaking, model reduction methods can be divided into two groups as follows. On the one hand, methods such as BT and H2 compute the reduced system on basis of an approximation of the operator mapping from inputs (i.e. usually timedependent parameters to control the system) to outputs (i.e. observations of the system). This approach is applicable for linear time-invariant (LTI) large-scale ODE systems which can be obtained by semi-discretization of a PDE. For an overview we refer to [2]. Note that newer approaches for bilinear systems are available [4]. On the other hand, methods such as POD and RB approximate the state and/or adjoint variable itself. This requires solving the system once or even several times for different parameters to obtain the reduced system. However, it provides other advantages such as the possibility of treating nonlinear and time-varying systems.
Nevertheless, in both groups, model reduction is particularly of interest if one seeks a solution of the system for different parameters which can be finite-dimensional (particularly suitable for RB methods) or infinite-dimensional. In many applications, the system has to be solved during a parameter variation or even during an optimization or an optimal control procedure for many parameters, inputs or controls. Hence, even if the full system has to be solved in advance during a so-called off-line phase to compute the reduced system, the decrease of the computational costs for the complete parameter variation or optimization can be very large.
POD is mostly used for approximation in the context of (nonlinear) PDEs for functions depending on time and space. To be more precise, using a POD Galerkin scheme, one seeks a solution as a linear combination of coefficient functions depending on time and basis functions depending on space. Contrary to standard basis functions in a Finite Element Method (FEM), the spatially dependent so-called POD basis functions have a global support and involve some information about the system which is obtained out of the solution at certain time points (called snapshots).
In numerical practice, the snapshots (and hence, the POD basis functions) are generated by first semi-discretizing the system in space using, for instance, finite elements and then a time-integration of the obtained large-scale ODE system. This idea of semi-discretization is hence similar to the procedure employed if methods for LTI systems such as BT or H2 are used for model reduction in the context of PDE simulation.
Hence, from the numerical point of view, the three reduction methods POD, BT and H2 are based on the same idea. The approach involves a semi-discretization of the problem in space (resulting in a high-dimensional time-dependent ODE system) and the projection to a lower dimensional state vector. POD usually involves a Galerkin projection where the projection matrix is generated by snapshots of a solution of the PDE obtained for a certain reference control. The other two methods involve a Petrov-Galerkin projection where the projection matrices are obtained by considering a suitable LTI system with the controls as input and those parts of the state being relevant for the cost functional as output variables. Of course, the topology of the underlying PDE problem (e.g., L 2 or H 1 ) has to be taken into account when carrying out the projection.
Though POD is an excellent method of model reduction for many time-varying or nonlinear differential equations, it lacks an a priori error analysis that is in some sense uniform in the right-hand side of the underlying partial differential equation, say with respect to the control function. There are results on a priori estimates for POD that depend on certain assumptions on the orthogonal basis generated by the selected snapshots. We refer to Kunisch and Volkwein [19], Sachs and Schu [31], or [34]. However, such estimates will, in general, depend on the control used for generating the snapshots. In [12] an a-priori error analysis is presented for linearquadratic optimal control problems. If the POD basis is computed utilizing the optimal state and associated adjoint variable, a convergence rate can be shown. But in the actual computation we do not know the optimal solution in advance. In view of this, we are interested in a-posteriori estimates for assessing the precision of optimal controls for reduced control problems set up by POD. For the reduced-basis method a-posteriori error estimates for linear-quadratic optimal control problems we refer to [10]. An extension to nonlinear problems was recently given in [17].
BT and H2 are very rarely used in the context of optimal control; see [29] for an example. In a recent work [39], BT and H2 have been applied to optimal control problems subject to linear evolution equations. Accuracy of the optimal control has been compared between BT and H2 for different fixed sizes of the reduced system using a first-discretize-then-optimize (FDTO) approach.
There are three goals in this paper. Firstly, BT and H2 model reduction will be combined with a first-optimize-then-discretize (FOTD) approach which requires additional reduction of the adjoint equation. Here, we choose a rather simple method, a gradient descent technique, as the numerical optimization technique since the focus of this paper is on the model reduction techniques. Indeed, there are other techniques such as a primal-dual active-set approaches which are widely used in the context of optimal control problems. Secondly, a-posteriori error analysis is applied for the optimal control obtained for the reduced problem using BT and H2 as model reduction technique. Therefore, a technique developed in [34] for POD is adapted to BT and H2. Let us refer the reader to [33], where the a-posteriori analysis was successfully applied to the reduced-basis method. Thirdly, the three model reduction techniques (BT, H2 and POD) are compared by means of numerical experiments. As an overall goal, we aim for a framework which describes the three techniques in a similar manner providing the possibility of comparison with respect to accuracy and performance. Up to the authors' knowledge, such a comparison can not yet be found in the literature.
The paper is organized as follows. Section 2 introduces the problem class to be considered in this paper and summarizes well-known necessary optimality conditions. The model reduction methods POD, BT and H2 are given in Section 3. Individually, the three methods are well known; however, we will present the computational procedure to obtain the reduced system in one framework with a focus on correlations and differences of the three methods. Section 4 deals with the application of the reduction techniques to optimal control theory using a-posteriori error estimates for the optimal control. Results for numerical test are presented in Section 5.
2. Linear-quadratic parabolic optimal control problems. In this paper, we will consider optimal control problems subject to parabolic PDEs with the control variable appearing linearly and with cost functionals of tracking type with respect to the state and the control.
2.1. Problem formulation. The task is to minimize the quadratic cost functional

GEORG VOSSEN AND STEFAN VOLKWEIN
subject to the linear parabolic problem where y (depending on ( is the spatial domain with boundary Γ = ∂Ω, T is the final time, α is a coefficient, β Q and β Σ are coefficient functions (depending on (x, t) in Q and Σ, respectively), λ Ω , λ Q , λ Σ , λ v , λ u are nonnegative constants with λ u + λ v > 0. Furthermore, y 0 is an initial distribution, ν denotes the normal vector on Γ and R = ((r ij )) ∈ L ∞ (Ω; R d×d ) is a given coefficient matrix. The operator A denotes the usual elliptic operator of the form A solution to (2) is understood in the weak sense. Notice that problem (1) and (2) can be written in the abstract form considered in [35].

2.2.
Necessary optimality conditions. Existence and uniqueness are discussed in, e.g., [35] where the following first order necessary optimality conditions can also be found.
Theorem 2.1. Let (ȳ,v,ū) be the optimal solution of (1)- (2). Then it satisfies the variational inequalities for all v and u which satisfy (2d) and (2e), respectively, and where the associated adjoint variablep is the solution of the adjoint system where A * denotes the adjoint operator of A satisfying Second order optimality conditions based on classical results from Maurer and Zowe [24] on optimization in Banach spaces can also be found in [35]. 3. Numerical model reduction by projection. As mentioned in the introduction, we will focus on the three model reduction methods POD, BT and H2. The methods share the feature that the state is projected on a lower-dimensional space to obtain a reduced system that can be solved much more efficiently. In a first step, the PDE problem is semi-discretized in space to obtain a large-scale LTI ODE system. The second step includes computation of the projection matrices -either out of snapshots obtained from an FEM solution (POD) or out of the transfer function which describes the input/output relation of the LTI system (BT and H2).
3.1. Semi-discretization to an LTI system using finite elements.
3.1.1. Semi-discretization of the state equations. Generate a grid onΩ with n (for instance, equidistant) discretization points x j , j ∈ N := {1, . . . , n}. Define basis functions ϕ j :Ω → R, j ∈ N ; e.g., hat functions such that ϕ j (x i ) = δ ji and ϕ j is piecewise linear. A solution y of the PDE (2) shall be approximated by y n with y n (x, t) = n j=1 z j (t)ϕ j (x) and the vector function z = (z 1 , . . . , z n ) T is called the semi-discretized state.
Analogously, we approximate the controls by v n (x, t) = j∈N \J w j (t)ϕ j (x) and u n (x, t) = j∈J w |N \J|+j (t)ϕ j (x), where J := {j ∈ N : x j ∈ Γ} and denote w = (w 1 , . . . , w n ) T as the semi-discretized control. Of course, we can also use different basis functions for the control variable, e.g., piecewise constant step functions. Note that often some components of the vector function w are irrelevant for computation of the cost functional J. For instance, if β Q ≡ 0 (i.e., we have a boundary control problem) it is sufficient to consider only the indices j ∈ J.
Similarly, in many applications, it is not necessary to know the complete semidiscretized state for computation of the (discrete) cost functional, but only a linear combination of the state, for instance, the state at the (or some parts of the) boundary if λ Ω = λ Q = 0. Those linear combinations shall be called output variables of the semi-discretized system.
To summarize, we end up with an implicit linear time-invariant dynamical system given by with the (semi-discretized) state z : R → R n , the (semi-discretized) control w : R → R m and the (semi-discretized) output a : R → R q . Here, m is the number of those discretization points "where v and u act" (i.e. where the corresponding factor β Q or β Σ , respectively, in (2) does not vanish) and q is the number of linear combinations of state variables "required to compute the cost functional" (i.e. where the corresponding factor λ Ω , λ Q or λ Σ , respectively, in (1) does not vanish). Furthermore, z 0 is the initial vector with components z 0,j = y 0 (x j ) for j ∈ N and E, A ∈ R n×n , B ∈ R n×m , C ∈ R q×n are constant matrices. Hereby, E, A, B are determined by the type of chosen FEM basis functions. For instance, using hat functions, E and A are tridiagonal for a one-dimensional spatial domain Ω. The matrix E is the (positive definite, symmetric) mass matrix and −A the (positive semi-definite, symmetric) stiffness matrix. In the context of system theory, the matrices E, A, B, C are called system matrices. If E = I n (where I n is the identity matrix of dimension n), the system is implicit and often called a descriptor system. Note that many model reduction methods which approximate the input/output operator such as BT and H2 are well-established for explicit systems, i.e. systems of the formż which can formally be obtained from (5) by multiplication with E −1 from the left since E is positive definite. Hence, 3.1.2. Semi-discretization of the adjoint equations. The structure of the adjoint system (4) is very similar to the structure of the state equations (2). More precisely, the state occurring on the right hand side of (4a) and (4b) can be seen as an adjoint input. Furthermore, those parts of the state occurring in the variational inequality (3a)-(3b), can be interpreted as an adjoint output. Hence, to solve the adjoint system, the same procedure can be applied. Semidiscretization provides an implicit LTI system for the semi-discrete adjoint z a , the semi-discrete adjoint input w a and the semidiscrete adjoint output a a . Hereby, the terminal condition is obtained from condi- Analogously, this can formally transformed to an explicit system. Note that one can apply a time transformationť = T − t to obtain an ODE system with given initial value.
3.1.3. Basics on LTI systems. In the following discussions, we need some basic ingredients from system theory. Applying Laplace transformation to the system equations (6) we obtain for z 0 = 0 the relation where the subscript L denotes the corresponding quantity in Laplace space. Equation (8) says that the output depends linearly on the input with the so-called transfer function H : C → R q,m as factor. An important property of an LTI system is that representation (6), also called the realization of the system, is not unique. In fact, there are infinitely many other system matrices, namelỹ with an arbitrary regular matrix S (called state space transformation matrix) which map all inputs w to the same output a as for the original realization. Note that the transfer functions of these two realizations are identical. As mentioned, we will investigate systems where E is positive definite. Hence, stability can be discussed by means of the eigenvalues of Since −A is positive semi-definite, the system is α-stable for all α > 0. If (9) is fulfilled for α = 0 (which is the case if −A is positive definite), the system is called stable. For stable systems, we finally introduce the generalized Gramians P and Q of reachability and observability, respectively, which satisfy the Lyapunov equations To numerically solve the state system (5) or (6) as well as the adjoint system (7), respectively, one additionally has to generate a time grid on [0, T ] with k (for simplicity, equidistant) time steps . . , k} resulting in time step size h t = T /k. The full discretized state and control are denoted by z i = z(t i ) ∈ R n and w i = w(t i ) ∈ R m . The solution can be obtained, for instance, by a standard θ-method The vector z i , i ∈ K, is called discrete snapshot at time instance t i . The same has to be applied to the LTI system obtained from semi-discretization of the adjoint equations.
Remark 1. We note that other semi-discretization techniques such as finite differences or discontinuous Galerkin are also applicable.

3.2.
Projection to a reduced system. Instead of the large-scale dimensional trajectories z of the system (5), we consider trajectoriesẑ which evolve in an rdimensional subspace. We therefore consider a state space transformation parti- the so-called projection matrices Z, V ∈ R n×r . With this partition, Z and V satisfy Z T V = I r (where I r is the identity matrix of dimension r) which implies that Π := V Z T is a projection. We furthermore partition the statez = Sz by z = (z T 1 , z T 2 ) T with z 1 ∈ R r and z 2 ∈ R n−r and obtain, regarding the first r equations in the LTI system, . (11) Note that this formula is exact. Approximation by model reduction comes into play if we assume that S 1 z 2 is small and can therefore be neglected. We hence definê z ∈ R r as solution of the approximated ODE system in (11) where S 1 z 2 vanishes; i.e., (6) becomes with initial conditionẑ(0) =ẑ 0 = Z T z 0 which is also a linear time-invariant system with so-called reduced system matricesÊ := Z T EV ∈ R r×r ,Â := Z T AV ∈ R r×r , B := Z T B ∈ R r×m andĈ := CV ∈ R q×r . If Z = V , the procedure is called a Galerkin projection. In the general case, we obtain a Petrov-Galerkin projection. POD is usually combined with a Galerkin projection whereas BT and H2 are applied by means of a Petrov-Galerkin projection. The construction of the projection matrices will be summarized in the next paragraph.
Remark 2. POD can be used for implicit systems (indeed for nonlinear problems). BT is also applicable but we note that the standard method in Matlab called balreal is implemented only for explicit systems. For extensions to implicit systems, we mention Stykel [32], Saak [30] and references therein. H2 is usually considered for explicit systems. However, the theory and the algorithms can easily be extended to implicit systems (see Section 3.2.3 for details).
This gives rise to two approaches for H2 and BT. On the one hand, the system can firstly be transformed to an explicit one which can be reduced with standard H2 and BT methods. This approach involving reduction of the explicit system is called H2-RE or BT-RE, respectively. On the other hand, it is possible to directly reduce the implicit system and this is referred to as the H2-RI or H2-RI approach. Regarding the first approach, E is a large matrix which can make its inversion numerically infeasible. Nevertheless, since E is sparse its inverse may be computed with moderate numerical effort (or even given analytically; see numerical example). The second approach seems to avoid this problem. However, standard reduction methods are usually more robust for explicit systems and often designed for stable systems. In the numerical tests, we compare POD with both approaches applied to H2 and BT.
3.2.1. Projection matrix for POD. The first step is to solve the ODE system in (12) for a certain reference control u ref which may also be chosen as the starting guess for the optimization routine. Following Section 3.1.4, we obtain k discrete snapshots z i ∈ R n out of which we construct the so-called snapshot matrix It can be observed that the eigenvalues decay very rapidly such that the idea of model reduction is to use only r k basis functions. The projection matrix is then given by Note that due to construction of Z, we obtainÊ = I r .
We mention again that the original large-scale system has to be solved once for a certain reference u ref control. Hence, the projection matrix Z depends on the choice of this control (see our numerical experiments).

Projection matrices for Balanced Truncation.
The idea of BT model reduction for stable systems is to firstly find a so-called balanced realization E B , A B , B B , C B characterized so that the corresponding Gramians are diagonal and satisfy P B = Q B =: D H . The diagonal entries σ 1 ≥ . . . ≥ σ n ≥ 0 in the matrix D H are called Hankel singular values of the system. In the second step, less significant parts corresponding to n − r small Hankel singular values of the system are truncated. A (non-unique) form of the projection matrices is given by with singular value decomposition and Cholesky factorizations where Υ L and Υ R are lower left and upper right triangular matrices, respectively. For explicit systems, there are further transformations which can be found in [2,5]. Unstable systems are considered in [16]. Note that many methods (also for implicit systems) usually provideÊ = I r .
Contrary to POD, the original system need not be solved in advance. The matrices Z and V do not depend on the choice of any reference control.

3.2.3.
Projection matrices for H 2,α -norm optimal model reduction. For stable systems, the reduced system constructed in such a way that the output of the reduced system approximates the output of the original system optimally w.r.t. the L 2norm for the impulse input. Equivalently, the transfer function of the reduced system approximates the transfer function of the original system optimally w.r.t. the H 2 -norm given as the special case α = 0 of where tr(M ) denotes the trace of a matrix M . For α-stable systems with α > 0, one can shift the system by α (i.e., consider a system with perturbed system matrix A − αI n ) to obtain a stable system or, equivalently, search the optimal reduced system whose transfer function approximates the original one w.r.t. the H 2,α -norm. The choice of this norm is reasonable since many problems comprise L 2 expressions of the state in the cost functional. Correlations between this norm and the error in the optimal control have been investigated in [39].
for j = 1, . . . , r, whereλ j are the eigenvalues ofÂ andB,Ĉ are represented aŝ Sinceλ j ,B andĈ are not known, the projection matrices are found by an iterative algorithm. Note that formulae (14) are an extension to the known expressions for explicit systems where E = I n . For systems with m = q = 1, the method IRKA [11] is applicable, whereas the method MIRIAm [7] can be used in the multi-dimensional case. Note that the projection matrices and hence the reduced system matrices may have complex entries. However, for systems with m = q = 1 it has been shown that Z and V can be transformed so that the reduced system matrices are real [39].
As for BT, the projection matrices are independent on the choice of any reference control.

4.
Model reduction for optimal control problems.

4.1.
Reduced optimal control problem. Using semi-discretization as in Section 3, the PDE optimal control problem can be transformed to a large-scale ODE problem which reads as with some suitable function f in the cost functional depending on the output a induced by the tracking terms for the state and the input w induced by the regularization terms. For a fixed number r n, we can formulate the so-called reduced optimal control problem of order r as follows.
(P r 1 ) Remark 3. Note that the expression of the "reduced problem" is sometimes also used for the problem which is obtained if one considers the cost functionalĴ(u) = J(Su, u) where S is the control-to-state mapping. Here, the notation is related to the reduced system which is the well-known notation in the context of model reduction.
In [39], the reduced problem is considered for BT and H2 model reduction. Using an FDTO approach, the solution has been investigated for different fixed sizes r and compared with the solution of the original problem for a fine discretization.
Here, we will combine model reduction with an FOTD approach which requires model reduction of the adjoint system also. This may lead to additional reduction computations. In simple examples, however, the adjoint equation can have the same structure as the state equation so that the same reduced system may be used. In the numerical tests, a gradient projection method will be used for solving the reduced optimal control problems.

4.2.
A-posteriori error analysis for the control. Of course, if the reduced system (12) approximates the original system (6) well, we expect the solutionw of the reduced optimal control problem (P r 1 ) to be close to the optimal solutionw of the original control problem (P 1 ). Indeed, correlations between the approximation error in the H 2,α -norm of the model reduction and the error in the optimal control have been demonstrated for several numerical examples in [39], where BT and H2 were investigated and compared. However, although the error in the optimal control has been observed to be rather small (even for very small numbers of r), it is a-priori not clear how large r should be chosen.
To estimate the error in the optimal control a-posteriori, we will apply a technique also called perturbation method which was first used by Dontchev et al. [8] and Malanowski, Büskens and Maurer [23] for optimal control of ODEs. The method has later been adapted to PDEs [3,33,34], where in the latter two references the error of the solution of the reduced problem obtained by POD and/or reduced-basis model reduction was estimated. Here, we apply this technique in the same problem class, but for the different reduction methods, namely BT and H2.
The idea will be summarized here for a class of boundary control problem; the case of a distributed control is analoguous. A non-optimal controlũ = u together with corresponding stateỹ andp for problem (1)-(2) does not satisfy the variational inequality (3b) but the perturbed inequality for all u satisfying the control constraints (2e) with a perturbation ζ given by We stress that it is possible to estimate the error in the control without knowledge of the exact (unknown) control. However, the estimator requires the solution of the full primal and dual equations (to computep(x, t)); e.g., by standard FEM with fine discretization which we assume to provideỹ andp with negligible error. Otherwise, the estimator should also include the numerical discretization error forỹ andp.
In [34], an algorithm for solving the optimal control problem up to a desired accuracy was presented. The idea is to iteratively increase the number of POD basis functions for the reduced problem until the estimator ensures the desired accuracy of the optimal control. In each iteration, the reduced control problem was solved by a primal-dual active set strategy, which is locally superlinearly convergent; see, e.g., [14].
Here, we use the same idea of error estimation for the solution of the reduced problem with iteratively increased size r for all three reduction methods POD, BT and H2. In each iteration, the reduced problem is solved by a gradient projection method. Of course, one can also apply a primal-dual active set strategy. However, since we are mostly interested in comparing the errors for different numbers r of basis functions, our main focus in the present paper is not on the optimization method. Starting with r = 1 and some starting guess for the optimal control, we will use the obtained optimal control for the reduced problem of size r as the starting guess for the problem of size r + 1.

Remark 4.
To improve the approximation quality for the POD Galerkin scheme one can combine the change of the number r of basis functions with a change of the POD basis. In [38] a combination of the a-posteriori analysis with Optimality-System POD [20] was presented. Here we combine our a-posteriori error analysis with the following update strategy (compare [1,28]): Assuming that the main computational effort in determining the POD basis is the solution of the full state and adjoint PDE, one may update the POD basis for each r since these solutions are necessary for the error estimator in each step.

Numerical tests: Optimal boundary control of a convection-diffusion equation.
5.1. Problem formulation and optimality conditions. Consider the following optimal control problem of minimizing the L 2 -norm difference of the temperature y = y(x, t) of a rod at the boundary x = 0 to a desired temperature y d = y d (t) involving a regularization term with regularization parameter λ u > 0 for the boundary control u = u(t) acting at the boundary x = 1: y(x, 0) =y 0 (x) for x ∈ Ω = (0, 1), with u a , u b ∈ R satisfying u a ≤ u b and some convection parameter η ∈ R. The adjoint equations are given bȳ The variational inequality comes to which implies the weak minimum condition for the optimal control, where Proj U ad denotes the projection operator onto the set U ad .

Semi-discretization and FEM solution.
Let us semi-discretize the problem using finite elements. For j = 1 . . . , n, we define x j := ∆x(j−1), ∆x = 1/(n−1), w = u and hat functions ϕ j by else, for j = 1, . . . , n. Considering the weak formulation of the problem, we end up by an implicit LTI system for the state in the form (5) with system matrices Hereby, A D is related to the diffusion term in the state equation, and A C is related to the convection term. After the transformatioň we obtain a rather similar LTI system for the adjoint system with the same system matrices E a = E, B a = B, C a = C as for the state LTI system and the matrix A a = A + diag[−η, 0, . . . , 0, η] which is slightly different for η = 0. The obtained ODE systems are solved with a semi-implicit method for θ = 1/2 in (10).
Note that following the RE method discussed in Remark 2, this can be transformed to an explicit system. Hereby, we deduce from [36] that, indeed, the inverse of E can be given analytically in compact form as

5.3.
Non-convective systems. We first discuss the case η = 0; i.e., there is no convection in the system. We choose the data and discretizations as We apply a gradient projection method with a starting guess for the control of u ≡ 0.5. The stopping criterion is chosen so that (15) is fulfilled in the discrete L 2 -sense with error less than tol G =1e-05 is satisfied after 41 iterations. The obtained optimal controlū (solid line) and the function −p(1, t)/λ u (solid line with dots) appearing in the minimum condition (15) are given in Figure 1 (left).
The optimal stateȳ at x = 0 (solid line) and the desired temperature (dashed line) are given in Figure 1 (right). The reasonable control strategy suggests to heat the rod with maximal power u = 1 (i.e., the control constraint is active until the exit time t e = 0.4343) in the beginning (whereȳ y d ) and, after continuously decreased heating, to keepū around zero at the end (whereȳ ≈ y d ) of the time interval. Note thatū vanishes at time t v = 0.6869, it is minimal at time t m = 0.7677 withū(t m ) = −0.0168 and then monotonically increases until t = T withū(T ) = −1.5517e-05.

Solution using model reduction.
Let us now apply the proposed methods. For each reduced problem, we apply the gradient projection method with a combined Armijo and tri-section method for step size determination. The gradient method stops if the minimum condition is fulfilled in the L 2 -sense with error less than tol G =1e-05 or if the number of iterations exceeds 15 (which does indeed happen for low values of r). Furthermore, the stopping criterion for the outer loop (i.e., the optimization) is that the estimator fulfills ζ L 2 (0,T ) /λ u <tol E =1e-05 in the discrete sense. Since 0 is the eigenvalue with largest real part for the eigenvector [1, . . . , 1] T of A, we choose α = 0.1. The stopping criterion for the IRKA method to construct the reduced system for the H2 technique is given by a relative distance of the largest eigenvalue magnitude ofÂ between two iterations less than 1e-08. This results in 10-15 iterations depending on the value of r. For POD, different reference controls are tested: u ref (t) = u 1 (t) ≡ 0.5, u ref (t) = u 2 (t) = t and u ref =ū(t) wherē u is the optimal control obtained from the FEM solution depicted in Figure 1. The algorithm is tested for POD without and with basis update in each iteration; cf. Remark 4. H2 and BT are tested for the RE and the RI approach; cf. Remark 2. The BT-RE approach is realized with the Matlab routine balreal whereas BT-RI is applied using the Generalized Low-rank Cholesky factor ADI iteration (G-LRCF-ADI) by Saak [30]. As mentioned above, the adjoint and the state LTI system are equivalent after transformation (19) resulting in the same reduced system for H2 and BT. However, the sets of POD basis functions differ since the input u for the state is different to the adjoint input y − y d resulting in different snapshots. As starting guess for the control, we take u ≡ 0.5. All tests have been carried out in Matlab using a standard PC. The obtained optimal control is denoted byū.
The results for the three methods (POD, BT and H2) are shown in Tables 1-5. The tables show the values of the control error estimator, the effectivity (where, again,ū is taken from the FEM solution depicted in Figure 1), the number of gradient method iterations and the relative H 2,α -norm error of the reduced system in each iteration step of the algorithm. Hereby, the latter error is computed by means of the corresponding explicit systems.

5.3.3.
Discussions. The algorithm converges for all three reduction methods after 14 iterations or less. As a general observation we find that the method converges faster using H2 and BT than using POD. The choice of the reference control to construct the POD basis functions seems to have only a marginal effect in the number of iterations for the chosen tolerance accuracy. However, in the first few iterations, the error estimator provides large  The choice of α in the H2 reduction is heuristic and numerical tests for different values of α show some dependency on this choice. Furthermore, the accuracy in the number of IRKA iterations affect the form of the reduced system and also the total number of optimization iterations.
The estimator provides an effectivity η eff ≤ 8. Note that the effectivity turns out to be lower (between 1 and 3) in most of the iterations using POD. Despite this good effectivity for all methods, the error estimator requires two FEM solutions and hence, the main numerical effort in each iteration for 2D or 3D domains. r ζ /λ u η eff Iter MR error ζ /λ u η eff Iter MR error 1 3.683e+00 6.720 15 9.380e-01 3.683e+00 6.720 15    H2-RE and B2-RE give similar results in each iteration, regarding control error estimation (and hence, number of iterations) as well as the model reduction error. H2-RI however, converges must faster than all other methods whereas BT-RI works only slightly better than BT-RE.
We observe that the relative H 2,α -norm error is related to the estimated error in the control in the sense that methods with similar model reduction errors (such as BT-RE and BT-RI or POD with and without update) provide control errors of the same magnitude. Furthermore, the POD reduced system has a larger reduction error than BT-RE and H2-RE and requires more outer loop iterations. However, BT-RI has the largest reduction error in the last iteration (where the control error is low). Furthermore note that considering a given model reduction error, a corresponding system allows for a lower control error using POD than that obtained by using H2 (which is nearly identical to that obtained by BT).

FEM solution.
Let us now consider the case η = 0. As above, we start the discussion with the FEM solution. The data and discretization is chosen as in (20) and the accuracy for the gradient method is tol G =1e-05. We compute the solution for η = η i , i = 1, . . . , 4, with η 1 = 0.1, η 2 = 0.5, η 3 = −0.1, η 4 = −0.5. The solution is depicted in Figure 2. It can be observed that the structure of the optimal control for η i , i = 4, is the same as for η = 0. However, the exit time t e , the vanishing control time t v as well as the time t m of the minimal value andū(t m ) increase for increasing η. Indeed, for η = η 4 = 0.5, the numerical solution reveals no negative values for the optimal control butū monotonically decreases for t ≥ t e withū(T ) = 1.3467e-04. The stateȳ(0, t) approaches the desired state y d faster for smaller values of η revealing that despite of less heating, the convection leads to higher values ofȳ at x = 0.  Table 6 where the first, second, third and fourth block (each consisting of five rows) is related to a tolerance of tol E =1e-02, tol E =1e-03, tol E =1e-04 and tol E =1e-05, respectively. 0.  However, for strong convection (i.e., |η| = 0.5), convergence of the model reduction error as well as the value of the error estimator turned out to be so slow such that after 40 iterations some small tolerances could not be achieved. This is indicated by a dash in Table 6. Roughly speaking, H2 and BT converge faster than both versions of POD with and without update. Updating the POD basis brings a certain speed-up in many test cases. In some cases for POD (especially but not exclusively without update), the corresponding eigenvalue for determination of the POD basis vanishes at some saturation value r = r sat . Hence, the algorithm fails to compute the POD basis and small desired tolerances cannot be achieved. In this case, the entry in the table is also given by a dash. Furthermore note that the decay of the error estimator turned out to be not strict at certain iteration numbers for POD.
It can be observed that for increasing values of η, all algorithms converge slower or even fail to converge. Furthermore, the differences in the number of iterations between H2 and BT on the one hand, and POD on the other hand, increase. POD without updating seems to work rather poorly for increasing |η|. However, updating the basis provides a solution method which can be applied successfully in most test cases. 6. Conclusions and outlook. This paper deals with numerical solution methods for linear-quadratic parabolic optimal control problems. The focus is on projection based model reduction, in particular Proper Orthogonal Decomposition (POD), H 2,α -norm reduction (H2) and Balanced Truncation (BT) to solve the underlying dynamics efficiently.
We presented a framework for applying the three different model reduction methods based on semi-discretization of the state (and adjoint) equation to an LTI system with occuring control variables (and state variables) as input and relevant states for the cost functional (and for the minimum condition) as output variables. These two LTI systems are reduced to lower-dimensional reduced systems by applying certain Galerkin (for POD) or Petrov-Galerkin (for H2 and BT) projections. The projection matrices are summarized in this paper.
To solve the optimal control problem, an approach from [34] which was used for POD model reduction has been extended to H2 and BT model reduction. The approach involves iterative solutions of certain reduced control problems with increasing dimension r until some desired accuracy in the optimal control is achieved. The latter is verified in an outer loop with a-posteriori error estimation in each iteration. With this general framework, we were able to compare the three model reduction methods in terms of efficiency and performance of the optimal control solution technique by means of numerical tests for a convection-diffusion equation.
Since H2 and BT have so far been used without error estimation and are based on a first-discretize-then-optimize approach (which accounts for only the state equation), this paper makes three contributions. It extends the idea of using H2 and BT model reduction to a first-optimize-then-discretize approach, combines it with a-posteriori error estimation for the optimal control and implements these for comparison with POD which represents as one of the most popular model reduction techniques.
As result of the comparison in the numerical tests, we observe that all three model reduction techniques are well-suited for application. There are, however, some differences such as in the number of iterations of the outer loop. Roughly speaking, H2 and BT provide faster convergence than POD. On the other hand, POD can also be used for nonlinear problems. Furthermore, the reduction can be applied faster than for BT and H2 since the latter involve operations on matrices which depend on the spatial discretization number whereas POD works with matrices which depend on the time discretization number (which is often lower, especially for problems with 2D or 3D domains).
Regarding the performance of the outer loop, H2 and BT provide the possibility of reduction of an implicit or explicit system. The implicit approach H2-RI turned out to be the fastest of all methods and for all test cases whereas BT-RI provided slower convergence and robustness problems in the reduction algorithm for some test cases. The performance of POD depends rather strongly on the chosen reference control to construct the POD basis functions. Furthermore, convection in the system provided poor accuracies in the optimal control. However, by updating the POD basis, we obtain better performance independent on the chosen reference control with increased accuracies in the outer loop. Indeed, convection in the system turned out to decrease efficiency and performance of all solution methods.
In the future, further numerical tests are planned. In particular, it will be interesting to investigate 2D or 3D problems since the differences in the model reduction computational times will be more significant. Furthermore, problems with multidimensional inputs and outputs are of interest. The general framework also gives rise to the idea of applying other reduction techniques to obtain the reduced problem. Further work results from the fact that the error estimator requires solution of the full state and adjoint system which is the main computational effort, especially for 2D and 3D problems.