Symplectic Runge-Kutta discretization of a regularized forward-backward sweep iteration for optimal control problems

Li, Chen, Tai and E (Journal of Machine Learning Research, 2018) proposed a regularization of the forward-backward sweep iteration for solving the Pontryagin maximum principle in optimal control problems, and proved global convergence of the iteration in the continuous-time case. In this article we show that their proof extends to discretizations by symplectic Runge-Kutta pairs. We demonstrate the convergence with a simple numerical experiment.

Recently, Li et al. [1] proposed a new indirect iteration for optimal control problems in the context of deep neural networks, which combines the 'method of successive approximations', i.e. forward and backward integrations, with an 'augmented Lagrangian' regularization that ensures global convergence. The authors argue that this approach is particularly suitable for high-dimensional optimal control problems as encountered in deep learning. Large-scale optimal control problems figure centrally in a number of modern applications such as deep neural networks [1], reinforcement learning [2,3], filtering and data assimilation methods [4,5] and mean field and stochastic differential games [6,7,8]. In this paper we describe how the iteration of Li et al. combines naturally with symplectic/variational integrators to yield a convergent numerical scheme.
Optimal control problems possess a natural variational structure that gives rise to Hamiltonian dynamics which may be exploited in a numerical treatment [9]. Symplectic methods for Hamiltonian initial value problems have been much studied since the mid-1990s due to their superior long-time behaviour: they conserve quadratic first integrals exactly and nearly conserve energy over exponentially long time intervals [10,11,12]. On the other hand, optimal control problems lead to boundary value problems, and it is unclear that the advantages of symplectic integrators for IVPs should translate to the BVP setting. Recent papers that address the use of symplectic Runge-Kutta methods for optimal control stress the conservation of quadratic invariants [13,14] and the persistence of critical orbits in modified equation expansions [15]. See also recent work on the preservation of bifurcations under symplectic discretization of boundary value problems [16].
In the first three sections of the paper we review the Hamiltonian structure of optimal control problems (§1), the regularized forward-backward sweep iteration proposed by Li et al. [1] (§1.2) and the discrete variational approach to constructing symplectic Runge-Kutta methods (§2). In Section 3 we prove the convergence of the discrete regularized forward-backward sweep iteration, which follows closely the proof of [1] for the continuous case. It is the symplectic structure of the discretization that facilitates this proof. Finally, in Section 4 we demonstrate the convergence of the method for a simple example using two symplectic discretizations.

Background
In this section we define continuous optimal control of differential equations, discuss the Hamiltonian structure of such problems, and review the regularized forward-backward sweep iteration of Li et al. [1].

Hamiltonian structure of optimal control problems
The state of the system to be controlled is described by a function x : T → R^d, where T = [0, T] is the time interval. The control u(t) is for each t an element of the set of admissible controls U ⊂ R^m. The motion of the system is described by the differential equation

ẋ(t) = f(x(t), u(t)),  x(0) = ξ,  (1)

where f : R^d × U → R^d and ξ ∈ R^d is the initial state. The control u(t) is chosen to minimize the objective functional

J[u] = Φ(x(T)) + ∫_0^T h(x(t), u(t)) dt,  (2)

where Φ : R^d → R is the end cost and h : R^d × U → R is the running cost.
As in [1] (cf. assumptions (A1) and (A2) of that article) we assume that Φ, f and h are twice continuously differentiable with respect to x and satisfy the Lipschitz conditions

|f(x, u) − f(x′, u)| + |f_x(x, u) − f_x(x′, u)| ≤ K |x − x′|,
|h(x, u) − h(x′, u)| + |h_x(x, u) − h_x(x′, u)| ≤ K |x − x′|,  (3)
|Φ(x) − Φ(x′)| + |Φ_x(x) − Φ_x(x′)| ≤ K |x − x′|,

for all x, x′ ∈ R^d, u ∈ U and t ∈ T. Note that the solution x(t) of (1) is then well defined for admissible u(t), so that we may regard J as a functional depending essentially only on u(t).
The problem can be reformulated as a constrained optimization problem by introducing a Lagrange multiplier function λ : T → R^d and the Lagrangian functional

L[x, λ, u] = Φ(x(T)) + ∫_0^T [ h(x(t), u(t)) + λ(t)^T (ẋ(t) − f(x(t), u(t))) ] dt.  (4)

(Throughout the paper we use the transpose and dot product notations interchangeably, whichever is more convenient.) The variational derivatives of the functional L with respect to the functions x(t), λ(t) and u(t), denoted L_x, L_λ and L_u, are defined with respect to the L² inner product ⟨·, ·⟩ and a class of admissible variations in an appropriate (dual) function space, possibly via integration by parts. For instance,

⟨L_x, δx⟩ = lim_{ε → 0} (L[x + ε δx, λ, u] − L[x, λ, u]) / ε,  (5)

where δx(t) denotes an arbitrary variation of the function x(t). Substituting into (4), expanding terms in Taylor series, and taking the limit yields

⟨L_x, δx⟩ = Φ_x(x(T))^T δx(T) + ∫_0^T [ h_x(x, u)^T δx + λ^T δẋ − λ^T f_x(x, u) δx ] dt,

where h_x denotes the vector of partial derivatives of h with respect to x and f_x denotes the Jacobian matrix of partial derivatives of f with respect to x. To express the right side as an inner product with δx(t), we integrate the term λ^T δẋ by parts:

∫_0^T λ^T δẋ dt = λ(T)^T δx(T) − λ(0)^T δx(0) − ∫_0^T λ̇^T δx dt.

The boundary terms vanish if we require λ(T) = −Φ_x(x(T)) and note that δx(0) = 0 for the fixed initial condition, leaving an inner product with δx(t).
In this case there are no further boundary restrictions on the variation δx(t), and we identify

L_x = h_x(x, u) − f_x(x, u)^T λ − λ̇.

The variational derivatives L_λ and L_u can be determined analogously (in fact, no integration by parts is needed for these):

L_λ = ẋ − f(x, u),  L_u = h_u(x, u) − f_u(x, u)^T λ.

The first order necessary conditions for an optimum of (4) are given by the Euler-Lagrange equations (L_x ≡ L_λ ≡ L_u ≡ 0):

ẋ = f(x, u),  (6)
λ̇ = −f_x(x, u)^T λ + h_x(x, u),  (7)
f_u(x, u)^T λ = h_u(x, u).  (8)

These equations are supplemented by the boundary conditions x(0) = ξ and λ(T) = −Φ_x(x(T)).
In particular, if f and h are smooth and u is an optimal control in the interior of U, then it satisfies (6)-(8). It is useful to denote the right side of (7) by

g(x, λ, u) = −f_x(x, u)^T λ + h_x(x, u).  (9)

The Hamiltonian formulation can be derived from the Lagrangian formulation by introducing a Legendre transformation. In the particular case of optimal control, the adjoint variable λ is identically equal to the conjugate momentum associated to the state variable x. The Hamiltonian function is

H(x, λ, u) = λ^T f(x, u) − h(x, u).  (10)

Hamilton's equations are

ẋ = H_λ(x, λ, u),  (11)
λ̇ = −H_x(x, λ, u),  (12)
H_u(x, λ, u) = 0,  (13)

and these are equivalent to the Euler-Lagrange equations (6)-(8). Note that minimizing the objective functional J corresponds to maximizing the Hamiltonian with respect to u. The condition (13) can be generalized to controls u(t) constrained by the boundary of U by replacing it with Pontryagin's maximum principle:

ẋ = H_λ(x, λ, u*),  x(0) = ξ,  (14)
λ̇ = −H_x(x, λ, u*),  λ(T) = −Φ_x(x(T)),  (15)
u*(t) = argmax_{u ∈ U} H(x(t), λ(t), u).  (16)

Regularized forward-backward sweep iteration
Solution of (14)-(16) is challenging due to the split boundary conditions. One approach is to solve in succession (14) for x(t), (15) for λ(t) and (16) for u*(t), and iterate. Such a forward-backward sweep iteration typically diverges unless the Lipschitz constant K and the time interval T are small. In a recent article, Li et al. [1] proposed a modified iteration based on a regularized Lagrangian approach. They introduce the augmented Hamiltonian function

H̃(x, λ, u, x′, λ′) = H(x, λ, u) − (ρ/2) |x′ − H_λ(x, λ, u)|² − (ρ/2) |λ′ + H_x(x, λ, u)|²,  (17)

where ρ > 0 is a regularization parameter. Subsequently, the forward-backward sweep iteration is modified to solve consecutively

ẋ^k = H_λ(x^k, λ^k, u^k),  x^k(0) = ξ,  (18)
λ̇^k = −H_x(x^k, λ^k, u^k),  λ^k(T) = −Φ_x(x^k(T)),  (19)
u^{k+1}(t) = argmax_{u ∈ U} H̃(x^k(t), λ^k(t), u, ẋ^k(t), λ̇^k(t)).  (20)

It is important to note that along solutions to (14) and (15) the two regularization terms of (17) are zero. Consequently, only (20) is modified with respect to (16). However, Li et al. show that this modification is sufficient to ensure convergence [1].
Li et al. introduce the regularized forward-backward sweep iteration to train deep neural networks [1] and argue that an advantage of this approach is that it is suitable for application to high dimensional systems.

Variational integrators and symplectic Runge-Kutta pairs
Symplectic Runge-Kutta methods possess two properties that make them attractive for numerical integration of Hamiltonian initial value problems: they conserve certain quadratic first integrals and they conserve a modified Hamiltonian function over exponentially long time intervals. See the monographs [10,11,12] for a complete discussion. Symplectic Runge-Kutta methods can be derived using a discrete variational formalism, see [17].
Variational methods are also well known in the optimal control literature; see e.g. the work of Marsden, Leok and Ober-Blöbaum [18] and references therein. In a recent review, Sanz-Serna [13] argues that it is the property of conservation of quadratic integrals that is most relevant in the adjoint context.
For optimal control, use of the variational integrator framework may have additional advantages: first, by discretizing the integral before optimizing, one constructs a discrete problem for which an optimum may be established, whereas directly discretizing the Euler-Lagrange equations relies on the approximation property in the limit τ → 0, where τ > 0 is the step size, to guarantee an optimum. Second, backward error analysis implies the existence of a modified Hamiltonian, near the continuous Hamiltonian, which may have consequences for optimality in the presence of nonunique minima. Backward error analysis may also be applicable for control problems on long time intervals, or for problems with multiple time scales for which the time interval is long on a fast time scale.
We discretize the interval T into N > 0 equal steps of size τ = T/N. An s-stage Runge-Kutta method for the state equation (1) reads

X_i = x_n + τ Σ_{j=1}^s a_{ij} f(X_j, U_j),  i = 1, …, s,  (21)
x_{n+1} = x_n + τ Σ_{i=1}^s b_i f(X_i, U_i),  (22)

where n = 0, …, N − 1 denotes the time step index and the coefficients b_i and a_{ij}, i, j = 1, …, s, are chosen to ensure accuracy, stability, and additional properties. See the monographs [19,20] for a thorough treatment. Consistency requires that the coefficients satisfy Σ_i b_i = 1. In this paper we will also assume that b_i ≥ 0, i = 1, …, s.
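For concreteness, one step of the scheme (21)-(22) with an explicit tableau can be sketched as follows; the classical RK4 tableau and the uncontrolled test equation ẋ = x are illustrative assumptions.

```python
import math

def rk_step(f, x, U, tau, A, b):
    """One step of the s-stage Runge-Kutta method (21)-(22) for an explicit
    (strictly lower triangular) tableau:
      X_i = x + tau*sum_j a_ij*f(X_j, U_j),  x_next = x + tau*sum_i b_i*f(X_i, U_i).
    U holds the stage control values U_i."""
    s = len(b)
    K = []
    for i in range(s):
        Xi = x + tau * sum(A[i][j] * K[j] for j in range(i))
        K.append(f(Xi, U[i]))
    return x + tau * sum(b[i] * K[i] for i in range(s))

# Classical RK4 tableau (an illustrative choice; sum_i b_i = 1, b_i >= 0).
A4 = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]]
b4 = [1/6, 1/3, 1/3, 1/6]

# Uncontrolled test equation dx/dt = x: one step of size 0.1 from x = 1,
# which should approximate e^0.1 to fourth order.
x1 = rk_step(lambda x, u: x, 1.0, [0.0] * 4, 0.1, A4, b4)
```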
To simplify notation we will frequently suppress the time step index n in the internal stage variables X_{i,n} and U_{i,n}. In all formulas the stage variables are evaluated at time level n, so there should be no ambiguity.
The associated discretization of the cost function (2) is

J^τ[u] = Φ(x_N) + τ Σ_{n=0}^{N−1} Σ_{i=1}^s b_i h(X_{i,n}, U_{i,n}),

and the discrete Lagrangian is obtained by appending the constraints (21)-(22) with multipliers λ_{n+1}, in analogy with (4). One can formally construct a discrete variational derivative with respect to discrete function spaces and a discrete inner product; however, for a uniform time step τ it is sufficient to consider just partial derivatives of L, and the discrete Euler-Lagrange equations follow by setting these to zero. The relations (25)-(26) are clearly equivalent to (21)-(22). Solving (27) for λ_{n+1}, substituting into (28) and defining the coefficients

ã_{ij} = b_j − b_j a_{ji} / b_i,

we obtain the adjoint scheme; similarly (29) may be rewritten in the same form. It is useful to introduce an auxiliary stage variable Λ_i to represent the term in square brackets in the resulting expressions, after which the condition (30) takes the analogous form. In terms of the new variable, the variational Runge-Kutta discretization of Pontryagin's maximum principle consists of the state equations (31) and (32), the adjoint equations (33) and (34), and the optimality condition (35).
Recalling the Hamiltonian (10), we can also write the above relations in a form that emphasizes the Hamiltonian structure. In some cases it is appropriate to replace the optimality condition with the more general maximization over u ∈ U. As noted in [13], a pair of RK methods defined by coefficients {b_i, a_{ij}} and {b_i, ã_{ij}}, where

b_i ã_{ij} + b_j a_{ji} = b_i b_j,  i, j = 1, …, s,

is symplectic. That is, if these methods are applied to a pair of differential equations ẋ = H_λ(x, λ), λ̇ = −H_x(x, λ), then the resulting map from t_n to t_{n+1} is a symplectic map. Hence we obtain the well-known result that the discrete variational approach automatically produces a symplectic (partitioned Runge-Kutta) integrator for the Euler-Lagrange equations.
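The adjoint tableau and the symplectic-pair property are easy to check numerically. The sketch below builds ã_{ij} = b_j − b_j a_{ji}/b_i for the implicit midpoint rule and the symplectic Euler pair, and verifies for the quadratic Hamiltonian H = (x² + λ²)/2 (an illustrative choice) that the symplectic Euler pair produces a map with unit Jacobian determinant.

```python
import numpy as np

def adjoint_tableau(A, b):
    """Coefficients a~_ij = b_j - b_j*a_ji/b_i of the adjoint method, which
    satisfy the symplectic-pair condition b_i*a~_ij + b_j*a_ji = b_i*b_j
    (this requires b_i != 0)."""
    s = len(b)
    return [[b[j] - b[j] * A[j][i] / b[i] for j in range(s)] for i in range(s)]

At_mid = adjoint_tableau([[0.5]], [1.0])  # implicit midpoint pairs with itself
At_se = adjoint_tableau([[0.0]], [1.0])   # explicit Euler pairs with a~_11 = 1

# For H = (x^2 + lam^2)/2 the symplectic Euler pair gives the explicit map
#   x_{n+1} = x_n + tau*lam_{n+1},  lam_{n+1} = lam_n - tau*x_n,
# i.e. (x, lam) -> ((1 - tau^2)*x + tau*lam, -tau*x + lam).
tau = 0.3
M = np.array([[1 - tau ** 2, tau], [-tau, 1.0]])
detM = np.linalg.det(M)  # determinant 1: an area-preserving (symplectic) map
```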

Symplectic Euler method
The elementary example of a symplectic variational integrator is the symplectic Euler method, which corresponds to the RK pair with s = 1, b_1 = 1, a_11 = 0 and ã_11 = 1. In this case all of the internal stage relations can be eliminated, leaving the discrete Lagrangian

L^τ = Φ(x_N) + τ Σ_{n=0}^{N−1} [ h(x_n, u_n) + λ_{n+1}^T ((x_{n+1} − x_n)/τ − f(x_n, u_n)) ].

The discrete Pontryagin maximum principle is

x_{n+1} = x_n + τ f(x_n, u_n),  (43)
λ_n = λ_{n+1} + τ ( f_x(x_n, u_n)^T λ_{n+1} − h_x(x_n, u_n) ),  (44)
u_n = argmax_{u ∈ U} H(x_n, λ_{n+1}, u),  (45)

with boundary conditions x_0 = ξ and λ_N = −Φ_x(x_N). Note that (43)-(44) can also be written in terms of the Hamiltonian H:

x_{n+1} = x_n + τ H_λ(x_n, λ_{n+1}, u_n),  λ_n = λ_{n+1} + τ H_x(x_n, λ_{n+1}, u_n).
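The sweeps (43)-(44) are straightforward to implement. The following sketch, for an illustrative scalar problem of our own choosing (f = u − x, h = (x² + u²)/2, Φ = x²/2), also demonstrates a useful consequence of the variational construction: the backward sweep is the exact adjoint of the discrete cost, so λ_0 agrees with −dJ^τ/dξ to rounding error.

```python
import numpy as np

# Symplectic Euler sweeps (43)-(44) for a scalar test problem (our own
# illustrative choice):  f(x,u) = u - x,  h = (x^2 + u^2)/2,  Phi = x^2/2.
T, N = 1.0, 50
tau = T / N

def forward(xi, u):
    x = np.empty(N + 1); x[0] = xi
    for n in range(N):                        # (43)
        x[n + 1] = x[n] + tau * (u[n] - x[n])
    return x

def backward(x, u):
    lam = np.empty(N + 1); lam[N] = -x[N]     # lam_N = -Phi_x(x_N)
    for n in reversed(range(N)):              # (44): f_x = -1, h_x = x
        lam[n] = lam[n + 1] + tau * (-lam[n + 1] - x[n])
    return lam

def J(xi, u):
    x = forward(xi, u)
    return 0.5 * x[N] ** 2 + 0.5 * tau * np.sum(x[:N] ** 2 + u ** 2)

# Discrete-adjoint property: lam_0 = -dJ/dxi, checked by central differences.
xi = 0.7
u = np.sin(np.linspace(0.0, T, N, endpoint=False))
lam0 = backward(forward(xi, u), u)[0]
eps = 1e-6
fd = (J(xi + eps, u) - J(xi - eps, u)) / (2 * eps)
```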

Reduced notation for Runge-Kutta methods
Hager [21] introduced notation that casts general symplectic Runge-Kutta methods (36)-(40) in a form consistent with the symplectic Euler method. Define

f^τ(x, u) = Σ_{i=1}^s b_i f(X_i(x, u), U_i),  h^τ(x, u) = Σ_{i=1}^s b_i h(X_i(x, u), U_i),  (49)

where we view the stage values X_i and U_i as functions of the grid point value x and the discrete control u = {U_1, …, U_s} according to

X_i(x, u) = x + τ Σ_{j=1}^s a_{ij} f(X_j(x, u), U_j).  (50)

Similarly, define the Hamiltonian

H^τ(x, λ, u) = λ^T f^τ(x, u) − h^τ(x, u).  (51)

With this notation, the discretization of Pontryagin's maximum principle with any symplectic Runge-Kutta pair can be written as

x_{n+1} = x_n + τ H^τ_λ(x_n, λ_{n+1}, u_n),  (52)
λ_n = λ_{n+1} + τ H^τ_x(x_n, λ_{n+1}, u_n),  (53)

together with the maximization of H^τ(x_n, λ_{n+1}, u) over u. To see the equivalence, note that evaluating (50) at x_n yields the implicit relations (32). Taking the derivative of (51) with respect to λ and substituting (49) shows (52) to be equivalent to (31). The proof of the relation (53) is more involved; we adapt the proof from [21] to our notation.
Let Ψ_i(x) = ∂_x X_i(x, u) and denote Ψ_i = Ψ_i(x_n). Computing the derivative of (50) at x_n yields a linear system for the Ψ_i, and the derivative on the right side of (53) can be expanded accordingly. Rearranging (28), premultiplying by Ψ_j^T and summing over j gives an identity in which the last equality follows from (56). Changing the index of summation in the first sum on the left, we obtain an expression whose second equality follows from (55), thus confirming (53).

Convergence analysis
In this section we prove the convergence of the regularized forward-backward sweep iteration (18)- (20) for symplectic Runge-Kutta methods. The proof here follows closely that of Li et al. for the continuous case [1]. It is the symplectic/variational structure that facilitates this analogy.
Using the compact notation (49) and (51), we define the discrete regularized Hamiltonian function

H̃^τ(x, λ, u, x′, λ′) = H^τ(x, λ, u) − (ρ/2) |x′ − H^τ_λ(x, λ, u)|² − (ρ/2) |λ′ + H^τ_x(x, λ, u)|².

In iterate k, the symplectic Runge-Kutta discretization of the regularized forward-backward sweep iteration (18)-(20) solves, in sequence,

x^k_{n+1} = x^k_n + τ H^τ_λ(x^k_n, λ^k_{n+1}, u^k_n),  x^k_0 = ξ,  (59)
λ^k_n = λ^k_{n+1} + τ H^τ_x(x^k_n, λ^k_{n+1}, u^k_n),  λ^k_N = −Φ_x(x^k_N),  (60)
u^{k+1}_n = argmax_{u ∈ U} H̃^τ(x^k_n, λ^k_{n+1}, u, δ_t x^k_n, δ_t λ^k_n),  (61)

proceeding as follows: (59) by forward integration with u and λ fixed, then (60) by backward integration with x and u fixed, and finally (61) solved for each time step independently (e.g. in parallel), with x and λ fixed.
It is important to recall that with u fixed, along solutions of (59) and (60) the extra regularization terms in the extended Hamiltonian H̃^τ are identically zero, so that H̃^τ coincides with H^τ(x_n, λ_{n+1}, u_n); i.e., the regularization terms only affect the maximization step (61).

Notation and identities
In the following we consider a single iteration of (59)-(61). We think of H^τ, x and λ as functions of u. Consequently we denote by x^u_n and λ^u_n the numerical solutions to (52) and (53) given a candidate control u.
It is convenient to define the composite notation z_n = (x_n, λ_{n+1}).
We consider two control sequences u and v, and we are interested in bounding the change in H̃^τ when u is replaced by v. To that end we define an operator δ_u that denotes the difference between quantities dependent on u and v: δ_u x_n = x^v_n − x^u_n. We use this notation also for functions, e.g. δ_u f^τ|_n = f^τ(x^v_n, v_n) − f^τ(x^u_n, u_n).
We denote by δ̄_u H^τ the change due to an update in u with x and λ fixed as functions of u:

δ̄_u H^τ|_n = H^τ(x^u_n, λ^u_{n+1}, v_n) − H^τ(x^u_n, λ^u_{n+1}, u_n).

We denote the temporal forward difference operator by δ_t:

δ_t x_n = (x_{n+1} − x_n)/τ,

and remark that δ_t commutes with δ_u when applied to variables, i.e. δ_u δ_t x_n = δ_t δ_u x_n.
Next we note the discrete integration by parts (summation by parts) formula:

τ Σ_{n=0}^{N−1} λ_{n+1} · δ_t x_n = λ_N · x_N − λ_0 · x_0 − τ Σ_{n=0}^{N−1} δ_t λ_n · x_n.

This formula holds for any discrete functions defined for n = 0, …, N, and in particular we may insert the difference operator δ_u to obtain two useful alternatives.
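The discrete integration by parts formula is elementary to verify numerically:

```python
import numpy as np

# Check of the summation-by-parts identity
#   tau*sum_{n<N} lam_{n+1}*dt(x)_n
#     = lam_N*x_N - lam_0*x_0 - tau*sum_{n<N} dt(lam)_n*x_n,
# with forward difference dt(y)_n = (y_{n+1} - y_n)/tau, on random data.
rng = np.random.default_rng(0)
N, tau = 17, 0.1
x = rng.standard_normal(N + 1)
lam = rng.standard_normal(N + 1)
dt = lambda y: (y[1:] - y[:-1]) / tau
lhs = tau * np.sum(lam[1:] * dt(x))
rhs = lam[N] * x[N] - lam[0] * x[0] - tau * np.sum(dt(lam) * x[:-1])
```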

Estimates
In the Appendix we show that, possibly with a restriction on step size, the Lipschitz conditions (3) on f and h translate into related Lipschitz conditions (65) on f^τ and h^τ. Henceforth, choosing K to be a generic Lipschitz constant, we obtain the corresponding bounds. Note also that the Lipschitz inequalities, together with the analogous ones of (3), imply global bounds on the derivatives (which may be relaxed, see [1]):

|f^τ_x(x, u)| ≤ K,  |h^τ_x(x, u)| ≤ K,  |Φ_x(x)| ≤ K.  (66)

We use two discrete forms of Grönwall's lemma [22]. Let {b_n} be a given monotone sequence and τ, K > 0. Then the following implication holds:

a_{n+1} ≤ (1 + τK) a_n + τ b_n  ∀n  ⇒  a_n ≤ e^{τnK} a_0 + K^{−1} e^{τnK} b_{n−1}.  (67)
Under the same conditions, the following implication holds:

a_n ≤ b_n + τK Σ_{m=0}^{n−1} a_m  ∀n  ⇒  a_n ≤ e^{τnK} b_n.  (68)
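Both lemmas can be sanity-checked numerically; here is a check of the first, saturating the recursion with equality for a monotone increasing sequence {b_n}:

```python
import numpy as np

# Check of the first discrete Gronwall lemma: if
#   a_{n+1} <= (1 + tau*K)*a_n + tau*b_n   for all n,
# with {b_n} monotone increasing, then
#   a_n <= e^{tau*n*K}*a_0 + K^{-1}*e^{tau*n*K}*b_{n-1}.
tau, K, N = 0.05, 2.0, 100
b = np.linspace(0.1, 1.0, N)          # monotone increasing sequence
a = np.empty(N + 1); a[0] = 0.5
for n in range(N):                    # saturate the recursion with equality
    a[n + 1] = (1 + tau * K) * a[n] + tau * b[n]
bound = [np.exp(tau * n * K) * (a[0] + b[n - 1] / K) for n in range(1, N + 1)]
ok = all(a[n] <= bound[n - 1] + 1e-12 for n in range(1, N + 1))
```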
From (56) and (94), and using the bounds (66) on f^τ_x and h^τ_x (where we have absorbed the constant from (94) into K), followed by the Grönwall bound (67) and the bound (66) on Φ_x(x), we obtain an estimate for δ_u λ_N. From δ_u x_{n+1} = δ_u x_n + τ δ_u f^τ|_n and δ_u x_0 = 0 we calculate a recursion for |δ_u x_n|, and the Grönwall bound (68) yields the estimate (70). Similarly, from δ_u λ_n = δ_u λ_{n+1} + τ δ_u H^τ_x(x_n, λ_{n+1}, u_n) we obtain a recursion in which the last term uses (3) and the Lipschitz condition (65) on H^τ_x; the discrete Grönwall lemma then bounds |δ_u λ_n|, and making use of (70) gives the estimate (71). The remaining estimates make use of Taylor's theorem in mean value form,

H^τ(z^u_n + δ_u z_n, v_n) = H^τ(z^u_n, v_n) + H^τ_z(z^u_n, v_n) · δ_u z_n + ½ δ_u z_n · H^τ_zz(z^u_n + r_1 δ_u z_n, v_n) · δ_u z_n

for some r_1 ∈ [0, 1], where H^τ_zz denotes the Hessian matrix of second partial derivatives of H^τ with respect to z, together with analogous expansions for some r_2 ∈ [0, 1] and r_3 ∈ [0, 1].

Convergence of the iteration
Convergence of the regularized forward-backward sweep iteration relies on Lemma 2 of [1], the proof of which we adapt for the symplectic RK method here. The result we want states that under the assumptions (3) there exists a constant C > 0 such that for any two discrete controls u, v ∈ U, the discrete cost function (24) satisfies (76). Define the discrete functional

I[x, λ, u] = τ Σ_{n=0}^{N−1} λ_{n+1} · (δ_t x_n − H^τ_λ(x_n, λ_{n+1}, u_n)),

which is identically zero for sequences x and λ satisfying (52)-(53). Note the identity

δ_u(λ_{n+1} · δ_t x_n) = λ^u_{n+1} · δ_t δ_u x_n + δ_u λ_{n+1} · δ_t x^u_n + δ_u λ_{n+1} · δ_t δ_u x_n.
Computing δ_u I and expressing the result in our notation yields (79). Remark. This is the point where the symplectic/variational property of the symplectic RK method is important. Since x_n and λ_n are discretized by a symplectic partitioned Runge-Kutta method, I is also equivalent to the constraint part of the discrete Lagrangian, which is identically zero along a solution of the state dynamics (52). Of course, one could define I as above for an arbitrary choice of the λ_n; then I would still be identically zero, but one would not be able to translate this into a statement about the Hamiltonian.
Using (63), the first two terms on the right side of (79) are equal to (80). Similarly, using (64), the third term on the right side of (79) is equal to (81). Remark. Again the symplectic property of the discretization allows us to express this as the gradient of the Hamiltonian evaluated at the numerical solution of the forward and backward equations, which in turn will allow cancellation with the second term of the Taylor expansion in (84).
Combining (79), (80) and (81) gives (82). Given that δ_u x_0 = 0, the boundary term in (82) reduces to (83). We substitute (72) and (75) into the second and third summands of (82), (83) into the boundary term, and subsequently apply the estimates (73) and (74) to yield (84), whose remainder term is

δ_u z_n · (H^τ_zz(z^u_n + r_1 δ_u z_n, v_n) − H^τ_zz(z^u_n + r_4 δ_u z_n, v_n)) · δ_u z_n.

Next, we use the estimates (70) and (71) and the fact that the quadratic terms are bounded by some constant K_3 to calculate the bound, which is the result sought (cf. (76)).
It now remains to show that the regularized forward-backward sweep iteration converges. We first show that an estimate of the same form as (76) holds for δ u H τ when the regularized Hamiltonian is maximized. These can be combined to show monotone decay of the objective function J τ [u]. Thereafter, it is shown that the sum of the decrements is finite, which implies convergence of the differences.
Let v denote the improved control obtained by solving (61). The resulting change in H̃^τ must be nonnegative. The last term in square brackets vanishes since x^u_n and λ^u_n satisfy (52)-(53); consequently, the expression is equivalent to (86). Combining this with Lemma 2 gives (87). The summation on the right side is nonnegative, as a consequence of (86). Therefore, choosing ρ > 2C ensures that J^τ is nonincreasing. Next suppose we iterate (59)-(61), and let u^(k) denote the control variable in iteration k. Then the decrease of the cost in iteration k is bounded below by a positive multiple of the increments, with constant D = 1 − 2C/ρ > 0. Consequently, in the limit M → ∞ the sum of the decrements is bounded, which implies that the increments tend to zero, proving convergence of the iteration.

Numerical illustration
In this section we study numerically the convergence of the discrete regularized forward-backward sweep iteration. As a test problem we control the motion of a damped oscillator in a double-well potential. The controlled motion, for x = (q, p), is governed by

q̇ = p,  ṗ = −q³ + q − μp + u,

where μ > 0 is a damping parameter; the control u(t) acts only on the velocity. As initial condition we choose ξ = (−1, 0), in the left potential well, and we seek to minimize a cost function that penalizes the control effort with weight ν and the distance of the final state from the target x_f = (1, 0), in the right potential well, with weight α. For the numerical computations we take T = 6, ν = 1, and α = 10.
We solve the optimal control problem using the discrete regularized forward-backward sweep iteration (59)-(61) and the symplectic Euler scheme (43)-(45). We iterate until the update to the control variable u is less than a prescribed tolerance ε = 10⁻⁸. The computed optimal path x(t) = (q(t), p(t)) is shown as a solid blue curve in the left plot of Figure 1. The background contours are level sets of the total energy function E = p²/2 + q⁴/4 − q²/2. The optimal control must accelerate the motion of the particle to reach an energy level above that of the saddle point, allowing it to cross to the potential well on the right.
For this computation we chose ρ = 100 for the regularization parameter. Figure 2 shows the convergence of the discrete cost function (24) for the values ρ = 50, ρ = 100 and ρ = 200. For ρ = 100, the convergence is monotone as predicted by the theory of the previous section (cf. (87)). For ρ = 50, we observe an initial reduction in cost, which eventually oscillates and does not converge. For ρ = 200, the iteration converges but at a slower rate than for ρ = 100. Hence, our experience suggests there is a critical value of ρ below which the regularized forward-backward sweep iteration does not converge, and above which the convergence becomes steadily slower as ρ increases.
The minimal cost obtained using the symplectic Euler method with N = 160 was J = 0.7712. We also computed the optimal solution for N = 20 time steps, shown as the red dash-dot line in the left plot of Figure 1. As noted in Section 2, by discretizing the Lagrangian we obtain a discrete optimal control problem for each N. For N = 20 the optimal path deviates significantly from that for N = 160. Because the Lipschitz constant is larger for this solution, it was necessary to take ρ = 400 for convergence. The optimal cost is J = 0.7006, lower than that found for N = 160.
We also solved the optimal control problem using the implicit midpoint rule, a second order symplectic Runge-Kutta method with s = 1 and coefficients b_1 = 1, a_11 = 1/2. The solutions for N = 20 and N = 160 are shown in the right plot of Figure 1. Here we see that the discrete optimum at low resolution is much closer to that at high resolution. The computed optimal costs were J = 0.7837 for N = 20 and J = 0.7769 for N = 160. Both resolutions converged with ρ = 100.

Summary
In this article we have extended the convergence proof of a regularized forward-backward sweep iteration [1] for solving optimal control problems to the discrete setting. We showed that if the continuous problem is discretized by a symplectic partitioned Runge-Kutta pair (using a variational integrator approach), then the convergence proof of [1] may be easily adapted. Numerical experiments with the first order, explicit symplectic Euler method and the second order implicit midpoint rule demonstrate monotone convergence of the cost function if the regularization parameter ρ is chosen large enough. For insufficiently large ρ the cost undergoes bounded oscillations, whereas for excessively large ρ the convergence is slower. In our experiments, convergence was observed even with large step sizes; however, the resulting discrete optimization problem is then an inaccurate approximation of the continuum problem. The total number of iterations required for convergence may be dramatically reduced using an acceleration method such as Anderson acceleration [23].
where X′_i satisfies X′_i = x′ + τ Σ_{j=1}^s a_{ij} f(X′_j, U_j).
Denoting ΔX_i = X_i − X′_i and using the Lipschitz condition on f (cf. (3)), we find a componentwise bound (90). Denote by |A| the matrix with elements |a_{ij}|, by |ΔX| the vector with elements |ΔX_i|, and let 1 be the vector of dimension s with all elements equal to 1. Then the bound becomes (91). For explicit Runge-Kutta methods, the matrix on the left of (91) always has a positive inverse, given by a terminating Neumann series. For implicit Runge-Kutta methods, the matrix on the left of (91) is an M-matrix with positive inverse if we impose the step size restriction (92). In either case we find (93), and returning to (90) we obtain (94), proving the first bound in (65).
To prove the second bound, recall (55). Taking norms and using the bound (3) yields a bound on |Ψ_i|, from which we conclude boundedness of the stage derivatives. A further estimate follows, in which the last inequality is obtained by inverting the matrix of (91), in the case of implicit RK methods under the step size restriction (92). A similar computation then proves the second bound in (65).
The bounds on h^τ and h^τ_x in (65) follow by the same reasoning.