Convergence analysis of Euler discretization of control-state constrained optimal control problems with controls of bounded variation

We study convergence properties of the Euler discretization of optimal control problems with ordinary differential equations and mixed control-state constraints. Under suitable consistency and stability assumptions, a convergence rate of order $1/p$ of the discretized control to the continuous control is established in the $L^p$-norm. Throughout, it is assumed that the optimal control is of bounded variation. The convergence proof exploits the reformulation of the first order necessary optimality conditions as nonsmooth equations.


1. Introduction. Direct discretization methods are widely used to solve challenging optimal control problems numerically, and many different implementations are available. The theoretical convergence properties of direct discretization methods, however, are not yet fully understood, e.g. for problems with discontinuous or singular controls, and only a few papers address this issue. We refer to [23] for a survey on the discretization of optimal control problems.
Malanowski et al. [22] show the convergence of an Euler discretization for optimal control problems subject to mixed control-state constraints. The convergence rates for state, control, adjoints, and multipliers turn out to be linear with respect to the supremum norm under the assumption (amongst others) that the optimal control is continuous. Likewise, assuming, amongst others, that the optimal control is Lipschitz continuous, Dontchev et al. [9] investigate an Euler discretization of a state and mixed control-state constrained optimal control problem and establish a linear convergence rate for the state in the $W^{1,2}$-norm and for the control in the $L^2$-norm. These papers rewrite necessary conditions as generalized equations and exploit stability results based on Robinson's implicit function theorem for generalized equations to derive the convergence results. Higher order Runge-Kutta discretizations are considered in Hager [15] and in Dontchev et al. [10], where the discussion is restricted to second order Runge-Kutta methods. It is shown that a higher order convergence rate is possible if the optimal control is sufficiently smooth.
Discontinuous controls are permitted in Alt et al. [3,4] and Veliov [28], wherein the discussion is restricted to linear and linear-quadratic optimal control problems, respectively. These papers assume the optimal control to be of bang-bang type and exclude singular subarcs.
Mesh independence results for Newton's method and the Lagrange-Newton method can be found in Alt [1,2]. Herein, consistent and stable discretizations for generalized equations play a central role and, as we shall see, are related to our approach.
Recently, Loxton et al. [25] investigate convergence properties for the control parametrization enhancing technique subject to control-state constraints and show convergence of the objective function values. Herein, the objective contains a term measuring the total variation of the control.
The purpose of this paper is to obtain convergence results for discretized optimal control problems subject to mixed control-state constraints without assuming continuity of the optimal control. Instead we will merely assume the optimal control to be of bounded variation, so that discontinuities in the control are permitted. The proof technique follows the general discretization theory for operator equations of Stetter [26], extends it towards nonsmooth equations, and aims at showing that consistency and stability imply convergence. Instead of considering generalized equations we will make use of nonsmooth reformulations of first order necessary optimality conditions. The assumptions in the convergence proof, however, exclude problems with pure state constraints and are not automatically satisfied for problems with the control appearing linearly.
Note that this paper is an abridged version of Chapter 5 in Kunkel [18]. In this article we discuss the following optimal control problem, which we call the continuous optimal control problem in the sequel:

Problem 1 (ODE-OCP). Let $J : \mathbb{R}^{n_x} \times \mathbb{R}^{n_x} \to \mathbb{R}$, $f : T \times \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}^{n_x}$, $\psi : \mathbb{R}^{n_x} \times \mathbb{R}^{n_x} \to \mathbb{R}^{n_\psi}$, and $c : T \times \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}^{n_c}$ be Lipschitz continuous w.r.t. $t$ and twice continuously differentiable w.r.t. all other arguments. Find a state variable $x \in W^{1,\infty}(T, \mathbb{R}^{n_x})$ and a control variable $u \in L^\infty(T, \mathbb{R}^{n_u})$ such that the objective function $J(x(t_0), x(t_f))$ is minimized subject to the ODE $\dot x(t) = f(t, x(t), u(t))$ a.e. in $T$, the boundary conditions $\psi(x(t_0), x(t_f)) = 0$, and the mixed control-state constraints $c(t, x(t), u(t)) \le 0$ a.e. in $T$.
Without loss of generality the discussion is restricted to problems on the fixed and bounded time interval $T := [t_0, t_f] \subset \mathbb{R}$. As usual, the Banach space $L^\infty(T, \mathbb{R}^n)$ consists of all essentially bounded functions $v : T \to \mathbb{R}^n$, and the Banach spaces $W^{1,p}(T, \mathbb{R}^n)$ with $p \in \mathbb{N}$ or $p = \infty$ consist of all absolutely continuous functions $v : T \to \mathbb{R}^n$ with bounded derivative $\dot v$ in the $L^p$-norm. The numbers $n_x, n_u \in \mathbb{N}$ and $n_c, n_\psi \in \mathbb{N}_0$ denote the numbers of states, controls, and constraints.
Equipped with the norm $\|v\|_{BV} := |v(a)| + \mathrm{Var}_a^b(v)$, the space $BV([a,b], \mathbb{R})$ is a Banach space, see Heuser [16]. The space $NBV([a,b], \mathbb{R})$ is the space of all normalized functions of bounded variation, i.e. all functions $v \in BV([a,b], \mathbb{R})$ which are continuous from the right and satisfy $v(a) = 0$. Furthermore we define the space $BV^1(T, \mathbb{R})$ of absolutely continuous functions whose derivative is of bounded variation.

Our aim is to show convergence of the Euler discretization of Problem 1 for controls of bounded variation. The main tool in our discussion is the reformulation of necessary optimality conditions as a system of equations. The optimality conditions consist of equations and complementarity conditions. The complementarity conditions will be reformulated as equations by applying an NCP (nonlinear complementarity problem) function $\varphi : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^m$ with the characteristic property

$\varphi(a, b) = 0 \iff a \ge 0,\ b \ge 0,\ a^\top b = 0$  (1)

for $a, b \in \mathbb{R}^m$. One popular choice for $\varphi$ in the special case $m = 1$ is the convex and Lipschitz continuous Fischer-Burmeister function $\varphi(a, b) := \sqrt{a^2 + b^2} - a - b$, compare Geiger and Kanzow [12]. Componentwise application of the Fischer-Burmeister function yields a vector-valued NCP function as in (1). Amongst others, Sun and Qi [27], Chen et al. [7], and Geiger and Kanzow [12] define and investigate alternative NCP functions.
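The defining property (1) of the Fischer-Burmeister function, and the elements of Clarke's generalized gradient used later in the paper, can be illustrated with a short self-contained sketch (the function names `fb` and `fb_gradient_element` are ours, not the paper's):

```python
import math

def fb(a: float, b: float) -> float:
    """Fischer-Burmeister NCP function: phi(a, b) = sqrt(a^2 + b^2) - a - b."""
    return math.hypot(a, b) - a - b

def fb_gradient_element(a: float, b: float, s: float = 0.0, r: float = 0.0):
    """An element of Clarke's generalized gradient of phi.

    Away from the origin phi is differentiable and the gradient is unique;
    at the kink (0, 0) every pair (s - 1, r - 1) with s^2 + r^2 <= 1 is a
    valid element, so (s, r) parametrizes the choice there.
    """
    norm = math.hypot(a, b)
    if norm > 0.0:
        return (a / norm - 1.0, b / norm - 1.0)
    assert s * s + r * r <= 1.0
    return (s - 1.0, r - 1.0)

# characteristic property (1): phi(a, b) = 0  iff  a >= 0, b >= 0, a*b = 0
assert fb(0.0, 3.0) == 0.0 and fb(2.0, 0.0) == 0.0   # complementary pairs
assert fb(1.0, 1.0) != 0.0                           # both strictly positive
assert fb(-1.0, 2.0) > 0.0                           # sign condition violated

# gradient at a smooth point and one admissible element at the kink
assert fb_gradient_element(3.0, 4.0) == (3.0 / 5.0 - 1.0, 4.0 / 5.0 - 1.0)
assert fb_gradient_element(0.0, 0.0) == (-1.0, -1.0)
```

Note that $\varphi(a, b) < 0$ when both arguments are strictly positive, so the zero set of $\varphi$ is exactly the complementarity set.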
The Fischer-Burmeister function $\varphi$ is not differentiable at $(a, b) = (0, 0)$. As a substitute for the derivative of $\varphi$ we use Clarke's generalized Jacobian, compare [8].

This paper is organized as follows. In Section 2 we summarize necessary conditions for local minimizers of Problem 1. These conditions are rewritten as an equivalent nonsmooth system of equations. In Section 3 we discretize Problem 1 using Euler's method for the differential equation and obtain a finite dimensional optimization problem. The first order necessary conditions for minimizers of this finite problem are transformed into an equivalent nonsmooth system of equations using the same techniques as in Section 2. The convergence of the discretization method for controls of bounded variation is established in Theorem 4.6 in Section 4, which, together with Theorem 4.7, contains the main contribution of the paper. Finally, a numerical illustration is presented in Section 5.
2. The continuous problem. The analysis of the discretization method which we define in Section 3 is based on necessary optimality conditions for the continuous and discretized optimal control problems. In this section we discuss the continuous problem.
Let $(x^*, u^*)$ be a weak local minimizer of Problem 1. The Jacobian matrices of $f$ and $c$ with respect to $x$ and $u$ are denoted by $f_x$, $f_u$, $c_x$, and $c_u$, respectively. Throughout, we use the notation $f_x[t]$ as an abbreviation for $f_x(t, x^*(t), u^*(t))$, and similarly for functions with the same structure, e.g. the Hamilton function defined below. We define the time dependent index set $I_\alpha(t)$ by $I_\alpha(t) := \{i \in \{1, \ldots, n_c\} \mid c_i[t] \ge -\alpha\}$, the functions $c_{i,\alpha}(t)$ by $c_{i,\alpha}(t) := \min\{c_i[t] + \alpha, 0\}$, and the diagonal matrix $S_\alpha(t)$ by $S_\alpha(t) := \mathrm{diag}(c_{i,\alpha}(t),\ i = 1, \ldots, n_c)$. Let $c_{I_\alpha(t),u}[t] := (c_{i,u})_{i \in I_\alpha(t)}$ denote the submatrix of the Jacobian matrix $c_u$ that consists of the rows $i \in I_\alpha(t)$.
The necessary optimality conditions involve the Hamilton function $H : T \times \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \times \mathbb{R}^{n_x} \times \mathbb{R}^{n_c} \to \mathbb{R}$, $H(t, x, u, \lambda, \eta) := \lambda^\top f(t, x, u) + \eta^\top c(t, x, u)$. Furthermore, the complementarity conditions hold a.e. in $T$: $\eta_i(t) \ge 0$, $c_i[t] \le 0$, and $\eta_i(t)\, c_i[t] = 0$ for $i = 1, \ldots, n_c$. Using the Fischer-Burmeister function defined in Section 1 and due to (1), the necessary conditions (2)-(8) can be equivalently written as a nonlinear equation $F(z) = 0$ with $F : Z_\infty \to Y_\infty$.

3. The discretized problem. In this section we introduce the Euler discretization of Problem 1. Let $G_N := \{t_0 < t_1 < \cdots < t_N = t_f\}$ be a grid on $T$ with step sizes $h_i := t_{i+1} - t_i$ and $h := \max_i h_i$, and let there exist constants $c_1 > 0$ and $c_2 > 0$ (independent of $N$) such that $c_1 h \le h_i$ for all $i$ and $h \le c_2 / N$. The space $W^{1,p}_h(T, \mathbb{R}^n)$ is defined to be the space of continuous functions which are affine on each interval $[t_i, t_{i+1}]$, equipped with the norm $\|\cdot\|_{1,p}$. The space $L^p_h(T, \mathbb{R}^n)$ is defined to be the space of functions $g$ which are constant and continuous from the right on each interval $[t_i, t_{i+1}]$ with $g(t_N) := g(t_{N-1})$, equipped with the norm $\|\cdot\|_p$.
We discretize the control $u$ by a piecewise constant function $u_h$, the state $x$ by a piecewise affine function $x_h$, and focus on the explicit Euler method for the differential equation, i.e.

$x_h(t_{i+1}) = x_h(t_i) + h_i f(t_i, x_h(t_i), u_h(t_i)), \quad i = 0, \ldots, N-1.$

Note that $\dot x_h$ equals $f(t_i, x_h(t_i), u_h(t_i))$ in the interval $(t_i, t_{i+1})$. The discretized problem reads as follows, and we call it the discrete problem throughout:

Problem 2 (Discretized ODE-OCP). Find $x_h \in W^{1,p}_h(T, \mathbb{R}^{n_x})$ and $u_h \in L^p_h(T, \mathbb{R}^{n_u})$ such that $J(x_h(t_0), x_h(t_N))$ is minimized subject to the Euler scheme above, the boundary conditions $\psi(x_h(t_0), x_h(t_N)) = 0$, and the mixed control-state constraints $c(t_i, x_h(t_i), u_h(t_i)) \le 0$, $i = 0, \ldots, N-1$.

Problem 2 is a finite dimensional nonlinear program. Note that the continuous mixed control-state constraints $c(t, x(t), u(t)) \le 0$ and the differential equations are satisfied at the grid points only. Again, we assume that some constraint qualification holds, so that we can normalize the Lagrange multiplier $l_0$ associated with the objective function to $l_0 = 1$.
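The explicit Euler scheme with a piecewise constant control can be sketched in a few lines. This is a generic illustration with a hypothetical scalar example, not one of the paper's problems:

```python
def euler_discretize(f, u_h, x0, t0, tf, N):
    """Explicit Euler for x'(t) = f(t, x, u) with piecewise constant control.

    u_h is a list of N control values; u_h[i] is used on [t_i, t_{i+1}).
    Returns the grid and the values x_h(t_i) of the piecewise affine state.
    """
    h = (tf - t0) / N
    t = [t0 + i * h for i in range(N + 1)]
    x = [x0]
    for i in range(N):
        # x_h(t_{i+1}) = x_h(t_i) + h * f(t_i, x_h(t_i), u_h(t_i))
        x.append(x[i] + h * f(t[i], x[i], u_h[i]))
    return t, x

# hypothetical example: x' = u with a discontinuous (bang-bang) control
N = 1000
u = [1.0 if i < N // 2 else -1.0 for i in range(N)]
t, x = euler_discretize(lambda t, x, u: u, u, 0.0, 0.0, 1.0, N)
assert abs(x[-1]) < 1e-9  # the two phases cancel for this control
```

The scheme handles the jump in $u_h$ without difficulty, which is exactly the setting of the convergence analysis in Section 4.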
Theorem 3.1. Let $(x_h^*, u_h^*)$ be a local minimum of Problem 2 and let some constraint qualification be satisfied, e.g. the Mangasarian-Fromowitz constraint qualification, compare [12]. Then there exist discrete Lagrange multipliers such that the KKT conditions of Problem 2 hold.

Proof. Evaluate the KKT conditions for Problem 2, see Kunkel [18].
Theorem 3.1 leads to the discrete counterpart of Equation (9), a nonsmooth equation $F_h(z_h) = 0$. By $\partial F_h$ we denote Clarke's generalized Jacobian of the locally Lipschitz continuous map $F_h$, see Clarke [8]. According to Clarke [8, Thm 2.6.6, Prop. 2.6.2] the generalized Jacobian $\partial F_h$ is a subset of the larger set $\partial^* F_h$, which is defined to be the set-valued mapping $\partial^* F_h : Z_{h,p} \rightrightarrows L(Z_{h,p}, Y_{h,p})$. As elements $(s_i, r_i) \in \partial \varphi[\cdot]$ we may choose, for $t_j \in G_N$ and $i \in \{1, \ldots, n_c\}$, elements of Clarke's generalized gradient of the Fischer-Burmeister function.

4. Convergence analysis for controls of bounded variation. In this section we show convergence of sequences of solutions $\{z_h\}$ of the discrete problem $F_h(z_h) = 0$ to the solution $z$ of the continuous problem $F(z) = 0$. Note that we have to select the domain and image space of $F$ carefully to obtain convergence. Here, we only consider controls of bounded variation and measure the error in $L^p$-norms with $1 \le p < \infty$. We cannot expect convergence in $L^\infty$ since $u$ may possess jumps.
Assumption 4.1. Throughout this section we assume that the control $u^*$ and the multiplier $\eta^*$ of the mixed control-state constraints satisfy $u^* \in BV(T, \mathbb{R}^{n_u})$ and $\eta^* \in BV(T, \mathbb{R}^{n_c})$.

The following lemma can be found in [17, Thm 4].

Lemma 4.2. Let $v : \mathbb{R}^n \to \mathbb{R}$ be a Lipschitz continuous function with Lipschitz constant $L$ and let $z : [a, b] \to \mathbb{R}^n$ be a function of bounded variation. Then the composition $v \circ z$ is of bounded variation.

Note that the absolutely continuous functions $x^*$ and $\lambda^*$ are functions of bounded variation.
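Lemma 4.2 can be checked numerically. The estimate $\mathrm{Var}(v \circ z) \le L \cdot \mathrm{Var}(z)$ used below is the standard bound from the proof of such results; it is an assumption of this sketch, since the lemma itself only asserts bounded variation:

```python
def total_variation(values):
    """Total variation of a function from its samples: sum of |increments|.
    Exact when the samples capture all jumps and monotonicity changes."""
    return sum(abs(values[i + 1] - values[i]) for i in range(len(values) - 1))

# z: a discontinuous step function on [0, 1], bounded variation with Var = 2
grid = [i / 1000 for i in range(1001)]
z = [1.0 if t < 0.5 else -1.0 for t in grid]

# v: Lipschitz continuous with constant L = 2 (kink at x = 1/2)
v = lambda x: abs(2.0 * x - 1.0)

var_z = total_variation(z)
var_vz = total_variation([v(x) for x in z])
assert var_z == 2.0
# standard estimate behind Lemma 4.2: Var(v o z) <= L * Var(z)
assert var_vz <= 2.0 * var_z
```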
Under Assumption 4.1, Lemma 4.2 implies that the compositions $f(\cdot, x^*(\cdot), u^*(\cdot))$ and $c(\cdot, x^*(\cdot), u^*(\cdot))$ as well as the first and second derivatives $f_x$, $f_u$, etc., are functions of bounded variation. Furthermore, since the product and the sum of two functions of bounded variation are of bounded variation, see Natanson [24, Chapter VIII], the Hamilton function and its first and second derivatives are also of bounded variation. This shows that $z^*$ is of bounded variation. We define the spaces $\tilde Z_p$ and $\tilde Y_p$ as the restrictions of $Z_p$ and $Y_p$ to functions of bounded variation. The discretization operators are combined in an obvious way, so that we obtain restriction operators $\Delta^{Z_p}_h$ and $\Delta^{Y_p}_h$ for the spaces $\tilde Z_p$ and $\tilde Y_p$.
We exploit the following basic approximation properties for the discretization operators.
For $v \in BV^1(T, \mathbb{R})$ the restriction operator $\Delta^1_h$ satisfies $\lim_{h \to 0} \|\Delta^1_h(v) - v\|_p = 0$. We prove consistency and stability of the discretization method.

Theorem 4.4 (Consistency). Let $z^* \in \tilde Z_p$ be a locally optimal solution satisfying $F(z^*) = 0$. Then the discretization method is consistent of order $1/p$ at $z^*$, i.e. $\|F_h(\Delta^{Z_p}_h(z^*))\|_{Y_{h,p}} = O(h^{1/p})$.
We exploit Lemma 4.3 (ii) and obtain the first estimate; a similar estimate holds for the next equation. It remains to discuss the equation which arises from the optimality condition, from which we finally obtain the asserted order $1/p$. In the subsequent considerations let $F_{h,j}$ denote the $j$-th functional component of $F_h$ and let $M := 4n_x + n_\psi + n_u + n_c$ denote the number of components of $F_h$.
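The order $h^{1/p}$ is natural for functions of bounded variation with jumps: a piecewise constant restriction of a step function differs from it on at most one interval of length $h$, giving an $L^p$ error of order $h^{1/p}$. A small illustration with a hypothetical step function, not the paper's example:

```python
def lp_restriction_error(c, N, p):
    """L^p error of approximating the step function v = 1_{[c,1]} on [0,1]
    by the piecewise constant function taking the value v(t_i) on
    [t_i, t_{i+1}), with t_i = i/N (left-endpoint restriction)."""
    h = 1.0 / N
    err_p = 0.0
    for i in range(N):
        t_i = i * h
        v_i = 1.0 if t_i >= c else 0.0
        if t_i < c < t_i + h:
            # the only interval where the approximation differs from v:
            # |v - v_i| = 1 on a subinterval whose length is at most h
            length = (t_i + h - c) if v_i == 0.0 else (c - t_i)
            err_p += length  # |1|^p * length
    return err_p ** (1.0 / p)

for p in (1, 2, 4):
    for N in (10, 100, 1000):
        h = 1.0 / N
        # the error is bounded by h^{1/p}: order 1/p, and no better in general
        assert lp_restriction_error(0.305, N, p) <= h ** (1.0 / p)
```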
Theorem 4.5 (Stability). Let $z^* \in \tilde Z_p$ be a locally optimal solution of Problem 1 with $F(z^*) = 0$. Let the inverse operators $W_h^{-1}$ exist and let there exist constants $C > 0$ and $\delta > 0$ (independent of $h$) such that $\|W_h^{-1}\| \le C$ for all $W_h \in \partial^* F_h(z_h)$ with $\|z_h - \Delta^{Z_p}_h(z^*)\|_{Z_{h,p}} \le \delta$, for $j = 1, \ldots, M$, and every $h > 0$ sufficiently small. Then the discretization method is stable at $\Delta^{Z_p}_h(z^*)$, and there exist solutions $z_h$ of $F_h(z_h) = 0$ in a neighborhood of $\Delta^{Z_p}_h(z^*)$.

Proof. Under these assumptions, Lempio [19, Corollary 6.1.20] shows the stability at $\Delta^{Z_p}_h(z^*)$.

Theorem 4.6 (Convergence). Let the assumptions of Theorems 4.4 and 4.5 hold. Then the discretization method is convergent of order $1/p$ at $z^*$.

Proof. Theorem 4.5 guarantees the existence of solutions $z_h$ of $F_h(z_h) = 0$ for sufficiently small $h > 0$. Combining the stability estimate with the consistency estimate of Theorem 4.4 yields the asserted order $1/p$.

Stability at $\Delta^{Z_p}_h(z^*)$ remains to be established. We will show the assertions for all $W_h \in L(Z_{h,p}, Y_{h,p})$ in the larger set $\partial^* F_h$. The linear operator equation $W_h(\delta z_h) = w_h$ consists of the equations (16)-(18). We solve (17) with respect to $\delta u_h(t_i)$ and $\delta \eta_h(t_i)$ and insert the resulting expression for $(\delta u_h(t_i), \delta \eta_h(t_i))$ into (16). This yields the reduced system (19)-(20). Note that the system (19)-(20) resembles the system which is discussed in Gerdts [13, Eq. (3.6)]. Also the subsequent theorem can be considered as a discrete version of Gerdts [13, Theorem 3.4], and the techniques used in the proof are similar.
(iv) Let there exist constants $C_1 \in (0, 1)$, $C_2 > 0$, and $C_3 > 0$, and furthermore a constant $C_4 > 0$, such that the corresponding bounds hold. Note that all assumptions in (iv) are satisfied if $\hat A_h(t_i)$ is uniformly bounded.
(v) Let the matrix be nonsingular and let there exist constants $C_5 > 0$ and $C_6 > 0$ such that the corresponding estimates hold.
Then $W_h$ is nonsingular and there exists a uniform bound on $\|W_h^{-1}\|$.

Proof. The matrix $D_h$ is nonsingular by Assumption (i), and we first consider the reduced system (19)-(20). For this system we show the uniform nonsingularity. Thus we have to show that there exists a constant $K > 0$ such that $\max\{\|\delta x_h\|_{1,p}, \|\delta \lambda_h\|_{1,p}, \|\delta \sigma_h\|\} \le K \max\{\|b_h\|_p, \|w_{h,r}\|\}$.

Reordering of Equation (19) with respect to $\xi_h(t_{i+1})$ yields Equation (21). The matrices $M_h(t_i)$ and $P_h(t_i)$ are perturbed identity matrices. The nonsingularity of the matrix $M_h(t_i)$ is provided by the perturbation lemma, see [20, Thm. 3]. Furthermore, the perturbation lemma yields a uniform bound on $\|M_h(t_i)^{-1}\|$. Thus, Equation (21) can be solved for $\xi_h(t_{i+1})$, and by repeated application $\xi_h(t_N)$ is uniquely defined by $\xi_h(t_0)$. With the same argument we show that the homogeneous problem from Assumption (v) has a unique solution $\Phi_h$. We obtain an estimate for $\Phi_h(t_i)$ if we set $b_h = 0$ and exploit $\Phi_h(t_0) = I$ in (22), using Assumption (iv). One can easily verify that $\xi_h$ may be represented as in (23), where $\bar\xi_h$ is the solution of the inhomogeneous initial value problem for $i = 0, \ldots, N-1$ with a constant $\kappa_1 := \frac{C_2}{C_3(1 - C_1)}$. We introduce $\xi_h(t_N)$ from Equation (23) into the boundary conditions (20). The matrix $R_h$ is nonsingular and $\bar\xi_h(t_N)$ does not depend on the boundary conditions. This shows that the reduced system has a unique solution. Furthermore, exploiting (25) and Assumption (v), we conclude the estimate with constants $\kappa_2 = \frac{C_2 C_6}{C_3}$ and $\kappa_3 = 1 + \frac{C_2 C_5 C_6}{C_3}$. This yields the bound with a constant $\kappa_4 > 0$. It remains to analyze the $p$-norm of the derivative of $\xi_h$. We apply Minkowski's inequality and conclude the corresponding estimates with constants $\kappa_5 > 0$, $\kappa_6 > 0$, and $\kappa_7 > 0$. This yields the uniform nonsingularity of the reduced system. The full system (16)-(18) has a unique solution: the solution $x_h$, $\lambda_h$, and $\sigma_h$ of the reduced system together with $(\delta u_h, \delta \eta_h)$ recovered from (17). By Assumptions (i) and (iii) we show in a similar way the corresponding estimate with a constant $\kappa_8 > 0$.
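The perturbation lemma (Banach lemma) invoked in the proof can be illustrated numerically: if $\|E\| \le C_1 < 1$, then $I + E$ is nonsingular with $\|(I + E)^{-1}\| \le 1/(1 - C_1)$. A sketch with a random perturbed identity matrix, NumPy assumed available:

```python
import numpy as np

# Perturbation lemma: if ||E|| <= C1 < 1, then I + E is nonsingular and
# ||(I + E)^{-1}|| <= 1 / (1 - C1).  This is the tool used above to invert
# the perturbed identity matrices M_h(t_i) and P_h(t_i).
rng = np.random.default_rng(0)
n, C1 = 5, 0.5
E = rng.standard_normal((n, n))
E *= C1 / np.linalg.norm(E, 2)          # scale so that ||E||_2 = C1 < 1
M = np.eye(n) + E                       # a "perturbed identity" matrix

inv_norm = np.linalg.norm(np.linalg.inv(M), 2)
assert inv_norm <= 1.0 / (1.0 - C1) + 1e-12
```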
The right hand side of the reduced system is bounded by Assumptions (i) and (ii). Summarizing, we have shown that there exists $K > 0$ such that the asserted estimate holds.

Remark 2. The discrete error $\|z_h - \Delta^{Z_p}_h(z^*)\|_{Z_{h,p}}$ has the drawback that it does not necessarily see jumps in $z^* \in \tilde Z_p$. One would rather like to work with the continuous error $\|z_h - z^*\|_{\tilde Z_p}$. This error is well defined, since $Z_{h,p} \subset \tilde Z_p$. We also get convergence in $\tilde Z_p$, since the quantity $\|\Delta^{Z_p}_h(z^*) - z^*\|_{\tilde Z_p}$ is again of order $h^{1/p}$, which can be seen from the proofs of Lemma 4.3 and Corollary 1.
Remark 3. The assumptions in Theorem 4.7, and especially the assumption on the nonsingularity of the matrices $D_h$, exclude problems with pure state constraints and are not automatically satisfied for problems with the control appearing linearly. To this end, a more detailed investigation of subclasses that satisfy the assumptions of Theorem 4.7 would be desirable and is the subject of future research.

5. Numerical illustrations. Problem 2 is a finite dimensional optimization problem and various established software packages for its solution are available; see http://plato.asu.edu/guide.html for an overview and classification. Note that although the convergence proof is motivated by a nonsmooth reformulation of optimality conditions, it is not necessary to solve the discrete problems with Newton's method. The subsequent results are obtained with the solver WORHP, see Büskens et al. [6]. WORHP is an implementation of the sequential quadratic programming (SQP) method designed for smooth, large-scale, and sparse nonlinear optimization problems. The solver implements two strategies to tackle the quadratic subproblems: a nonsmooth Newton method and an interior-point method. Both methods lead to large-scale and sparse linear equation systems which are solved by the direct decomposition solver MA57, see Duff [11]. WORHP implements several fall-back strategies for difficulties (like ill-conditioning, breakdown of line search strategies, etc.) that might occur while solving the optimization problem.
We consider the following optimal control problem.

Problem 3. Let $T := [t_0, t_f]$, let $a \in BV(T, \mathbb{R})$ be continuous with only countably many roots, and let $b \in BV(T, \mathbb{R}^+_0)$ be continuous. Find $x \in W^{1,\infty}(T, \mathbb{R})$ and $u \in L^\infty(T, \mathbb{R})$ such that $x(t_f)$ is minimized subject to $x(t_0) = 0$, the state equation, and the control constraints $0 \le u(t) \le b(t)$ a.e. in $T$.

We set the parameters $t_0 := -2$ and $t_f := 2$. The globally optimal control $u$ of Problem 3 can be stated explicitly. The state $x$ associated with the optimal control is obtained by integrating the state equation. The optimal objective value is approximately $x(2) \approx -4.3349$. From the local minimum principle we derive $\lambda(t) = 1$, $\sigma = -1$, and the complementarity relations, where $\eta_1$ and $\eta_2$ are the multipliers associated with $-u(t) \le 0$ and $u(t) - b(t) \le 0$, respectively. The control $u$ is piecewise monotone and has jumps at $s_k := 5/k$, $k = 3, 4, 5, \ldots$, and at $-s_k$. Let $s_2 := 2$. The total variation of $u$ is given by the corresponding series (compare Figure 1).

Figure 1. The optimal control $u$ has countably infinitely many jumps.
The representation is valid if all series converge. Herein, $s_k^-$ and $s_k^+$ denote left- and right-hand limits at $s_k$. All single series converge since they are dominated by $C \sum_{k=1}^{\infty} 1/k^2$ for some sufficiently large $C > 0$. This shows that $u$ is of bounded variation. The total variation of $u$ can be computed with a computer algebra software. Similarly we prove that the multipliers $\eta_1$ and $\eta_2$ are of bounded variation with $\mathrm{Var}_{-2}^{2}(\eta_1) = \mathrm{Var}_{-2}^{2}(\eta_2) = -\frac{731}{36} + \frac{25\pi}{6}$. The multipliers are plotted in Figure 2.

Let us now verify the assumptions of Theorem 4.7 for solutions $z_h$ of the discrete problems $F_h(z_h) = 0$. The matrices in (16)-(17) and the matrix $\hat A_h(t)$ in the reduced system (19) can be stated explicitly; the relevant matrix is nonsingular, and the matrices in (18) are given accordingly.

The order of convergence may be estimated by determining the optimal $C_p$ and $q_p$ such that $\|z_h - z^*\|_{Z_p} \approx C_p h^{q_p}$ in the least-squares sense: we solve the discretized problem for various step sizes $h_i$, $i = 1, \ldots, M$, and then solve for $C_p$ and $q_p$. The continuous $L^p$-norms are integrated by an adaptive integration scheme. Table 1 summarizes the orders $q_p$, which are in good agreement with the theoretical orders $1/p$.

6. Conclusion. We studied control-state constrained optimal control problems discretized by the explicit Euler method and proved convergence of order $1/p$ in the $L^p$-norm. In the proof we assumed the optimal control to be merely of bounded variation. The proof technique follows and extends the general convergence theory in [26] towards nonsmooth equations, since the classic approach cannot be used for discontinuous controls. The main convergence result in Theorem 4.6 can be summarized as follows: consistency and stability imply convergence.
There are several directions for future research. The convergence proof was conducted for state discretization by the explicit Euler method. Further research towards higher order discretization methods has to show whether higher order approximations are possible under suitable smoothness assumptions. Another topic for future research concerns the technical assumptions in Theorem 4.5 regarding the uniform nonsingularity of the operators $W_h$. It would be desirable to have assumptions that can be verified in the optimal solution of the continuous problem rather than in the optimal solutions of the discretized problems. Finally, the approach used in this paper cannot be extended in a straightforward way to problems with pure state constraints. A potential approach for pure state constraints could be to use regularization techniques similar to [14] in order to embed the problem with pure state constraints into a family of problems with mixed control-state constraints. Then, the uniform convergence with regard to the regularization parameter has to be shown. This will be the subject of future research.
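The least-squares fit used in Section 5 to estimate the convergence order $q_p$ from $\|z_h - z^*\|_{Z_p} \approx C_p h^{q_p}$ amounts to linear regression in log-log coordinates. A minimal sketch with synthetic data; the data are hypothetical, not the paper's results:

```python
import math

def estimate_order(hs, errors):
    """Least-squares fit of errors ~ C * h^q in log-log coordinates,
    as used in the text to estimate the convergence order q_p."""
    xs = [math.log(h) for h in hs]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    q = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    C = math.exp(mean_y - q * mean_x)
    return C, q

# synthetic data following the theoretical order q = 1/p for p = 2
p = 2
hs = [2.0 ** -k for k in range(3, 10)]
errors = [0.7 * h ** (1.0 / p) for h in hs]
C, q = estimate_order(hs, errors)
assert abs(q - 1.0 / p) < 1e-10 and abs(C - 0.7) < 1e-10
```

For exact power-law data the fit recovers $C_p$ and $q_p$ exactly; for measured errors the residual of the fit indicates how well the power-law model describes the data.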
Appendix A. Proof of Lemma 4.3.

Proof.
(i) The assertion follows from norm equivalence in the finite dimensional space $\mathbb{R}^N$.
(ii) Application of the fundamental theorem of calculus yields the assertion, see Natanson [24, Chapter IX, 4].
(iv) Let $v \in BV(T, \mathbb{R})$. Then $v$ restricted to the interval $[t_i, t_{i+1}]$ is also a function of bounded variation, and by Natanson [24, Chapter VIII, 3] the following holds for $t \in [t_i, t_{i+1}]$:

$|v(t)| \le |v(t_i)| + \mathrm{Var}_{t_i}^{t_{i+1}}(v).$

The function $v(t_i) - v(\cdot)$ is of bounded variation, and we exploit this assertion piecewise to obtain the estimate.
(v) Let $v \in BV^1(T, \mathbb{R})$. Then the corresponding estimate holds for $t \in [t_i, t_{i+1}]$, and the assertion follows.
(vi) For all $v \in BV^1(T, \mathbb{R})$ the corresponding estimate holds.
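The pointwise bound in (iv) can be checked numerically for a sampled step function (a hypothetical example; `total_variation` sums the absolute increments of the samples):

```python
def total_variation(values):
    """Total variation of a sampled function: sum of absolute increments."""
    return sum(abs(values[j + 1] - values[j]) for j in range(len(values) - 1))

# v: a step function sampled on a grid; check |v(t)| <= |v(t_0)| + Var(v)
samples = [0.5 if t < 40 else -1.5 for t in range(101)]   # one jump of size 2
var = total_variation(samples)
assert var == 2.0
assert all(abs(v) <= abs(samples[0]) + var for v in samples)
```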