Douglas-Rachford Algorithm for Control- and State-constrained Optimal Control Problems

We consider the application of the Douglas-Rachford (DR) algorithm to solve linear-quadratic (LQ) control problems with box constraints on the state and control variables. We split the constraints of the optimal control problem into two sets: one involving the ODE with boundary conditions, which is affine, and the other a box. We rewrite the LQ control problems as the minimization of the sum of two convex functions. We find the proximal mappings of these functions which we then employ for the projections in the DR iterations. We propose a numerical algorithm for computing the projection onto the affine set. We present a conjecture for finding the costates and the state constraint multipliers of the optimal control problem, which can in turn be used in verifying the optimality conditions. We carry out numerical experiments with two constrained optimal control problems to illustrate the working and the efficiency of the DR algorithm compared to the traditional approach of direct discretization.


Introduction
Many of the problems we encounter in the world that can be presented as optimal control problems contain constraints on both the state and control variables. Imagine a scheduling problem where the aim is to maximize profits, such as in [26], in which the control variable is the production rate and the state variable is the inventory level. Naively, one may want to greatly increase both the inventory level and the production rate to maximize profits, but in practice there is a limit to the amount of inventory a factory can hold and a limit on how quickly humans or machines can work. In order to accurately model this problem, and many others from a wide range of application areas such as manufacturing, engineering, science and economics, we must use an optimal control problem with both state and control constraints. Although these problems are commonly found in applications, they are much more difficult to solve than optimal control problems which have purely control constraints.
We focus our attention on linear-quadratic (LQ) state- and control-constrained optimal control problems. These are infinite-dimensional optimization problems that involve the minimization of a quadratic objective function subject to linear differential equation (DE) constraints, affine constraints on the state variables, and affine constraints on the control variables. There is extensive literature studying LQ control problems as they model many problems from a wide variety of disciplines; see [1, 14-16, 24, 27, 28]. Though many of the LQ control problems posed in the literature contain control constraints along with the linear DE constraints, it is rarer to see state constraints included since, as mentioned above, they are much more difficult to deal with. This paper proposes a unique approach for solving state-constrained problems by applying the Douglas-Rachford (DR) algorithm.
The DR algorithm is a projection algorithm used to minimize the sum of two convex functions. In order to apply the algorithm, one needs the proximal operators of the two convex functions. Splitting and projection methods such as the DR algorithm are a popular area of research in optimization, with a variety of applications; see [2, 3, 6, 9, 20] for their use in sphere packing problems, protein reconstruction, etc. The use of these methods to solve discrete-time optimal control problems is not new, but there are very few applications of these methods to continuous-time optimal control problems. The paper [7] is one in which projection methods were used to solve the energy-minimizing double integrator problem with control constraints. The papers [11-13] address more general energy-minimizing LQ control problems with control constraints. While a collection of general projection methods is used in [11], the DR algorithm is used in [12, 13].
Following the promising numerical results observed in [12, 13], the present paper will use the DR algorithm for addressing the more challenging control- and state-constrained LQ control problems.
The current technique for solving these control- and state-constrained LQ problems is a direct discretization approach, where the infinite-dimensional problem is reduced to a large-scale finite-dimensional problem by use of a discretization scheme, e.g. a Runge-Kutta method [5, 22]. This discretized problem is then solved through the use of an optimization modelling language such as AMPL [19] paired with a (finite-dimensional) large-scale optimization software such as Ipopt [33].
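The first step of such a direct discretization can be illustrated with a small, self-contained sketch. The scalar system below is an assumption made purely for illustration (not one of the problems treated later): an explicit Euler scheme replaces the ODE constraint by finitely many algebraic equalities, after which the whole problem is finite-dimensional and can be handed to a large-scale optimizer.

```python
def euler_discretize(a, b, x0, u, t0, tf):
    """Replace the ODE constraint x'(t) = a*x(t) + b*u(t) by the
    algebraic equalities x[k+1] = x[k] + h*(a*x[k] + b*u[k]) on a
    uniform grid -- the first step of a direct discretization.
    (Scalar toy system; the paper's problems are vector-valued.)"""
    N = len(u)
    h = (tf - t0) / N
    x = [x0]
    for k in range(N):
        x.append(x[k] + h * (a * x[k] + b * u[k]))
    return x

# x' = -x + u with constant u = 1 approaches the steady state x = 1.
x = euler_discretize(-1.0, 1.0, 0.0, [1.0] * 2000, 0.0, 10.0)
print(round(x[-1], 3))  # 1.0
```

In an optimization modelling language such as AMPL, the recursion above would appear as equality constraints linking the discretized state and control variables.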
In the present paper (as was done in [7,11] for control constraints only) we make the following contributions for the numerical solution of LQ control problems with state and control constraints.
The two convex functions mentioned above are defined as follows: the information from the ODE constraints appears in one function, while the information from the control and state constraints, along with the objective functional, appears in the other. We derive the proximal mappings of the two convex functions without reducing the original infinite-dimensional optimization problem to a finite-dimensional one, though we need to discretize the state and control variables over a partition of time when implementing the DR algorithm, since a digital computer cannot iterate with functions.
The paper is structured as follows. In Section 2 we give the problem formulation and optimality conditions for the optimal control problem. In Section 3 we derive the proximal mappings used in the implementation of the DR algorithm. Section 4 introduces the DR algorithm. Then Section 5 begins with an algorithm for computing one of the proximal mappings (namely the projection onto an affine set) and a conjecture used to obtain the costate variable of the original LQ control problem. Next, this section introduces two example problems, a harmonic oscillator and a mass-spring system. At the end of this section, numerical experiments for the DR algorithm and the AMPL-Ipopt suite, and their comparisons, are given for these two problems. Finally, concluding remarks and comments for future work are provided in Section 6.

Optimal Control Problem
In this section we formulate the general optimal control problem that will be the focus of this paper. We give some necessary definitions and provide conditions for optimality from optimal control theory.
Before introducing the optimal control problem we give some standard definitions. Unless otherwise stated, all vectors are column vectors. Let L^2([t_0, t_f]; R^q) be the Banach space of Lebesgue measurable functions z : [t_0, t_f] → R^q with finite L^2 norm, namely ||z||_2 := ( ∫_{t_0}^{t_f} ||z(t)||^2 dt )^{1/2} < ∞, and let W^{1,2}([t_0, t_f]; R^q) be the Sobolev space of absolutely continuous functions z : [t_0, t_f] → R^q such that ż ∈ L^2([t_0, t_f]; R^q). With these definitions we can now state the general LQ control problem (P), in which the state variable x ∈ W^{1,2}([t_0, t_f]; R^n) and the control variable u ∈ L^2([t_0, t_f]; R^m). For every t ∈ [t_0, t_f], the matrices Q(t) and R(t) are symmetric, and respectively positive semi-definite and positive definite. For clarity of arguments, these matrices are assumed to be diagonal, namely Q(t) := diag(q_1(t), ..., q_n(t)) and R(t) := diag(r_1(t), ..., r_m(t)). These diagonality assumptions particularly simplify the proximal mapping expressions that appear later. The initial and terminal states are given as x_0 and x_f, respectively.

Optimality conditions
In this section we state the Maximum Principle by using the direct adjoining approach from [21]. We start by defining the extended Hamiltonian function, in which appear the adjoint variable vector λ : [t_0, t_f] → R^n with λ(t) := (λ_1(t), ..., λ_n(t)) ∈ R^n and the state constraint multiplier vectors µ^1(t), µ^2(t) ∈ R^n. For brevity, we use the notation introduced above. The adjoint variable vector is assumed to satisfy λ̇(t) = −H_x(t) for a.e. t ∈ [t_0, t_f], where H_x := ∂H/∂x.
Maximum Principle. Suppose the pair (x, u) is optimal for Problem (P). Then there exists a piecewise continuous adjoint variable vector λ ∈ W^{1,2}([t_0, t_f]; R^n) such that the stationarity condition holds for i = 1, ..., m, where b_i(t) is the ith column of the matrix B(t) and r_i(t) is the ith diagonal element of R(t). Moreover, the multipliers µ^1(t), µ^2(t) must satisfy the complementarity conditions for all i = 1, ..., n.

Splitting and Proximal Mappings
In this section, we rewrite Problem (P) as the minimization of the sum of two convex functions f and g, and give the proximal mappings for these functions in Theorems 1-2.
We split the constraints from (P) into two sets A and B. (Despite previously defining A(t) and B(t) as the system and control matrices, we also use A and B for the constraint sets; the intended meaning will be clear from the context.) We assume that the control system ẋ(t) = A(t)x(t) + B(t)u(t) is controllable; in other words, the control system can be driven from any x_0 to any other x_f; for a precise definition of controllability and the tests for controllability, see [29]. Then there exists a (possibly not unique) u(·) such that, when this u(·) is substituted, the boundary-value problem given in the expression for A has a solution x(·). In other words, A ≠ ∅. Also, clearly, B ≠ ∅. We note that the constraint set A is an affine subspace. Given that B is a box, the constraints turn out to be two convex sets in a Hilbert space. Since every sequence converging in L^2 has a subsequence converging pointwise almost everywhere, it is straightforward to see that the set B is closed in L^2. The closedness of A will be established later on as a consequence of Theorem 2 (see Remark 2).
Fix β > 0 and let f := β J + ι_B and g := ι_A as in (11), where J denotes the objective functional of (P) and ι_C is the indicator function of the set C, namely ι_C(z) := 0 if z ∈ C and ι_C(z) := +∞ otherwise. Problem (P) is then equivalent to the problem of minimizing f + g.
In our setting, we assume that we are able to compute the projection operators P_A and P_B. These operators project a given point onto the constraint sets A and B, respectively. Recall that the proximal mapping of a functional h is defined by Prox_h(z) := argmin_w { h(w) + (1/2)||w − z||^2 } [8, Definition 24.1]. In our setting, the proximal mappings act on pairs (x, u). Note that Prox_{ι_C} = P_C. In order to implement the Douglas-Rachford algorithm we must derive the proximal mappings of f and g. The proofs of Theorems 1 and 2 below follow the broad lines of the proof of Lemma 2 in [13]. In both theorems, the major difference from [13] is that the proximal operators in the current paper have two variables, x^- and u^-. Thanks to separability, the proof of Theorem 1 is a straightforward modification of the corresponding part of the proof of [13, Lemma 2]. We include a full proof of Theorem 1 for the convenience of the reader. On the other hand, the proof of Theorem 2 deals with the solution of a more involved optimal control subproblem, namely Problem (Pg).
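The identity Prox_{ι_C} = P_C can be checked numerically on a toy set. The sketch below (an illustration only, with C taken as a box in R^3) shows that the proximal mapping of the indicator of a box is exactly componentwise clipping, i.e. the projection onto the box.

```python
def prox_box_indicator(z, lo, hi):
    """Proximal mapping of the indicator of the box [lo, hi]^n:
    argmin_w  iota(w) + (1/2)*||w - z||^2, which is exactly the
    componentwise projection (clipping) onto the box."""
    return [min(max(zi, lo), hi) for zi in z]

print(prox_box_indicator([-2.0, 0.3, 5.0], -1.0, 1.0))  # [-1.0, 0.3, 1.0]
```

Components already inside the box are left untouched; components outside are moved to the nearest bound, which is the defining property of the projector.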
Theorem 1. The proximal mapping of f is given as Prox_f(x^-, u^-) = (y, v), where the components of y and v are expressed as in (14)-(15).

Proof. From (13) and the definition of f in (11) we have that, in order to find Prox_f(x^-, u^-), we need to find (y, v) that solves Problem (Pf). Problem (Pf) is separable in the variables y and v, so we can consider the problems of minimizing w.r.t. y and v individually and thus solve the two subproblems (Pf1) and (Pf2). The solution to Problem (Pf1) is given by

y_i(t) = argmin_{\underline{x}_i ≤ w_i ≤ \overline{x}_i} β q_i w_i^2 + (w_i − x_i^-(t))^2,  i = 1, ..., n,

which, after straightforward manipulations, yields (14). The solution to Problem (Pf2) is obtained similarly as

v_j(t) = argmin_{\underline{u}_j ≤ w_j ≤ \overline{u}_j} β r_j w_j^2 + (w_j − u_j^-(t))^2,  j = 1, ..., m,

which yields (15) after straightforward manipulations.
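Each scalar subproblem in the proof above admits a closed form: the unconstrained minimizer of β q w^2 + (w − a)^2 is a/(1 + βq) (set the derivative to zero), and convexity allows the constrained minimizer to be obtained by clipping to the box. A brief sketch (the parameter values and the brute-force grid check are illustrative assumptions, used only to validate the formula):

```python
def prox_f_component(a, q, beta, lo, hi):
    """Minimizer of beta*q*w**2 + (w - a)**2 over [lo, hi]:
    the unconstrained minimizer a / (1 + beta*q), obtained from
    2*beta*q*w + 2*(w - a) = 0, then clipped to the box; clipping
    is valid because the objective is convex in w."""
    w = a / (1.0 + beta * q)
    return min(max(w, lo), hi)

# Brute-force check on a fine grid (arbitrary parameter values).
a, q, beta, lo, hi = 1.7, 2.0, 0.5, -1.0, 0.6
grid = [lo + i * (hi - lo) / 100000 for i in range(100001)]
best = min(grid, key=lambda w: beta * q * w**2 + (w - a)**2)
print(abs(prox_f_component(a, q, beta, lo, hi) - best) < 1e-4)  # True
```

This "shrink then clip" structure is what makes Prox_f cheap to evaluate pointwise in t for every component i and j.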
Theorem 2. The proximal mapping of g is given as in (17), where x(t), λ(t) are obtained by solving the two-point boundary-value problem (TPBVP) in (18).

Proof. Using [8, Example 12.25] and the definition of g in (11), Prox_g = P_A, which verifies the very first assertion. From (13) and the definition of g in (11) we have that, in order to find Prox_g(x^-, u^-), we need to find (y, v) that solves the problem (Pg). Problem (Pg) is an optimal control problem where y(t) is the state variable and v(t) is the control variable. The Hamiltonian for Problem (Pg) is formed in the usual way, and the associated costate equation is given in (19). If v is the optimal control for Problem (Pg) then, by the maximum principle, the Hamiltonian is minimized for all t ∈ [t_0, t_f], a re-arrangement of which yields (17). Collecting together the ODE in Problem (Pg) and the ODE in (19), substituting v(t) from (17), and assigning y(t) = x(t), results in the TPBVP in (18).
Remark 1.We note from Theorem 2 that Prox g is the projection onto the affine set A.
Unlike Prox_f, in general we cannot find an analytical solution to (18); we instead compute it numerically, as described in Section 5.

The Douglas-Rachford Algorithm

The application of the Douglas-Rachford (DR) algorithm to our problem is slightly different from that in [7, 12]. Since we are solving control- and state-constrained optimal control problems, we must define the proximal mappings at the pair (x, u), rather than just at u as in [7, 12]. Thus, in the implementation of the DR algorithm, we give the iterations for the pair of iterates (x^k, u^k) rather than for u^k alone.
Given β > 0, we specialize the DR algorithm (see [17], [25] and [18]) to the case of minimizing the sum of the two functions f and g as in (11)-(12). The DR operator associated with the ordered pair (f, g) is defined by T_{(f,g)} := Id − Prox_f + Prox_g ∘ (2 Prox_f − Id). Application of the operator to our case uses the proximal mappings of f and g provided in Theorems 1-2. Let X be an arbitrary Hilbert space. Now fix x^0 ∈ X. Given x^k ∈ X, k ≥ 0, the DR iterations are set as x^{k+1} := T_{(f,g)} x^k. The DR algorithm is implemented as follows. We define a new parameter γ := 1/(1 + β), where β is the parameter multiplying the objective as in (11) and Theorem 1. The choice of γ ∈ ]0, 1[ can be made because changing β does not affect the solution of Problem (P).
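To make the iteration concrete, here is a minimal sketch of the DR operator T = Id − P_f + P_g ∘ (2 P_f − Id) for two toy sets in R^2: a box standing in for B and a line standing in for the affine set A, with f and g taken as pure indicator functions for simplicity (in the paper, f also carries the objective). The sets, the starting point and the iteration count are illustrative assumptions; the governing sequence z^k is iterated and the shadow point P_f(z^k) approaches the intersection.

```python
def proj_box(z, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^2 (plays the role of Prox_f)."""
    return [min(max(c, lo), hi) for c in z]

def proj_line(z):
    """Projection onto the line z1 + z2 = 1.5 (plays the role of Prox_g)."""
    shift = (z[0] + z[1] - 1.5) / 2.0
    return [z[0] - shift, z[1] - shift]

# DR iteration: z <- z + P_g(2*P_f(z) - z) - P_f(z).
z = [5.0, -3.0]
for _ in range(200):
    x = proj_box(z)
    y = proj_line([2 * xi - zi for xi, zi in zip(x, z)])
    z = [zi + yi - xi for zi, yi, xi in zip(z, y, x)]

shadow = proj_box(z)  # the shadow sequence P_f(z^k)
print(round(shadow[0] + shadow[1], 6))
```

At a fixed point of the DR operator, the shadow point lies in both sets: it is inside the box and satisfies the line equation, which is what the printed sum confirms.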

Algorithm for projector onto A
We introduce a procedure for numerically projecting onto A that is an extension of Algorithm 2 from [12] to the case of LQ control problems with state and control constraints. The procedure below (Algorithm 2) can be employed in Step 3 of the DR algorithm. In the procedure we effectively solve the TPBVP in (18) by implementing the standard shooting method [4, 23, 31]. Throughout the steps of Algorithm 2 below, we will solve the ODEs in (18), rewritten in matrix form in (21), with various initial conditions (IC). In these equations, we use λ_DR, instead of just λ, to emphasize the fact that λ_DR is the costate variable emanating from solving Problem (Pg) to compute the projection onto A within the DR algorithm. We reiterate that Problem (Pg) is more involved than its counterpart in [13], which leads to the ODE in (21), which in turn is more complicated than its counterpart in [12].

Algorithm 2. (Numerical Computation of the Projector onto A)
Step 0 (Initialization) The following are given: the current iterate u^-, the system and control matrices A(t) and B(t), the numbers of state and control variables n and m, and the initial and terminal states x_0 and x_f, respectively. Define z(t, λ_0) := x(t).
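The shooting idea behind Algorithm 2 can be sketched in a few lines. The toy TPBVP below (x' = λ, λ' = 0, x(0) = 0, x(1) = 1) is an assumption made purely for illustration, standing in for (18): since the ODE is linear, the map λ_0 ↦ x(t_f) is affine, so the missing initial costate is recovered from two trial integrations and one scalar solve (for n states one would use n + 1 integrations and an n × n linear solve).

```python
def integrate(x0, lam0, n_steps=4000):
    """Explicit Euler for the toy TPBVP dynamics x' = lam, lam' = 0
    on [0, 1]; returns the terminal state x(1)."""
    h = 1.0 / n_steps
    x, lam = x0, lam0
    for _ in range(n_steps):
        x, lam = x + h * lam, lam
    return x

def shoot(x0, xf):
    """Single shooting: x(1) depends affinely on the unknown initial
    costate lam0, so two trial integrations identify the affine map
    and one scalar division inverts it."""
    base = integrate(x0, 0.0)            # response to lam0 = 0
    unit = integrate(x0, 1.0)            # response to lam0 = 1
    return (xf - base) / (unit - base)   # invert the affine map

lam0 = shoot(0.0, 1.0)
print(round(lam0, 6))  # 1.0
```

For the toy dynamics the exact answer is λ_0 = 1, which the shooting solve recovers; Algorithm 2 applies the same principle to the matrix ODE (21).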

A conjecture for the costates for Problem (P)
Recall that the optimal control for Problem (P) is given by cases in (8). A junction time t_j is a time when the control u_j(t) falls into two cases of (8) simultaneously, i.e. a point in time where a control constraint transitions from "active" to "inactive", or vice versa. This definition of a junction time becomes important in the following conjecture, which has been formulated and tested by means of extensive numerical experiments.
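On a discretization grid, junction times can be located by scanning for changes in the activity status of a constraint. A small sketch (the saturating ramp used as the sample control, and the tolerance `tol`, are illustrative assumptions):

```python
def junction_indices(u, lo, hi, tol=1e-8):
    """Grid indices at which the control constraint switches between
    active (u at a bound, within tol) and inactive (u strictly inside)."""
    active = [(v <= lo + tol) or (v >= hi - tol) for v in u]
    return [k for k in range(1, len(u)) if active[k] != active[k - 1]]

# A ramp clipped to [-0.5, 0.5]: the bounds are active at both ends.
t = [i / 10 for i in range(11)]
u = [min(max(2 * ti - 1.0, -0.5), 0.5) for ti in t]
print(junction_indices(u, -0.5, 0.5))  # [3, 8]
```

The grid points returned this way approximate the junction times t_j at which the conjecture below is evaluated.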
Conjecture 1. Let λ_DR(t) be the costate variable emerging from the projection onto A computed in Algorithm 2 and λ(t) be the costate variable emanating from Problem (P). Let t_j be a junction time for some u_j, j = 1, ..., m, such that b_j(t_j)^T λ_DR(t_j) ≠ 0. Define

α := −r_j(t_j) u_j(t_j) / ( b_j(t_j)^T λ_DR(t_j) ).

Then λ(t) = α λ_DR(t).

Remark 4. The ability to obtain the costate variable by Conjecture 1 is desirable as a tool for checking that the necessary condition of optimality in (8) is satisfied. Without this conjecture we are unable to verify whether the optimality condition is satisfied when using the DR algorithm, except when a dual version of the DR algorithm is employed, as in [13].
Once we have calculated λ in this way, we can also find a multiplier µ^k, k = 1, 2, numerically for the case when only one state constraint is active at a given time. Suppose that only the ith state box constraint becomes active. By rearranging Equation (1), using numerical differentiation to find λ̇ and assuming µ_i^2(t) = 0, we obtain the expression (24) for µ_i^1(t). If µ_i^1(t) = 0, then we compute µ_i^2(t) similarly, which yields (25). With (24), or with (25), the complementarity conditions in (3) or (4) can now be checked numerically.

Numerical Experiments
We will now introduce two example problems. Along with posing the optimal control problems, we also give plots of their optimal controls, states, costates and multipliers, with vertical lines signifying the regions where the state constraints become active.

Harmonic oscillator
Problem (PHO) below contains the dynamics of a harmonic oscillator, which is typically used to model a point mass with a spring. The dynamics are given as ÿ(t) + ω_0^2 y(t) = f(t), where ω_0 > 0 is known as the natural frequency and f(t) is some forcing. In a physical system, y represents the position of a unit mass, ẏ is the velocity of said mass, and the natural frequency is expressed as ω_0 = √k, where k is the stiffness of the spring producing the harmonic motion and f is the restoring force. In addition to the restoring force we introduce another force u_1 that affects the velocity ẏ directly. We let x_1 := y, x_2 := ẏ and u_2 := f to arrive at

ẋ_1(t) = x_2(t) + u_1(t),    ẋ_2(t) = −ω_0^2 x_1(t) + u_2(t).

In this example problem the objective contains the squared sum of all four variables in the system. It is common to see this problem with the objective of minimizing the energy of the control variable, but in this case we have also included the state variables to test the algorithm with a slightly more involved objective. The focus of this research is control- and state-constrained problems, so the constraints are added as in Problem (P). (PHO)
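The controlled oscillator dynamics above are easy to roll out numerically. A brief sketch (explicit Euler; the initial state and the zero controls are assumptions used purely to sanity-check the dynamics against the known free motion x_1 = cos t, x_2 = −sin t for ω_0 = 1):

```python
import math

def simulate_pho(omega0, u1, u2, x_init, t0, tf):
    """Explicit-Euler rollout of x1' = x2 + u1, x2' = -omega0^2*x1 + u2,
    with the controls given on the same uniform grid."""
    n = len(u1)
    h = (tf - t0) / n
    x1, x2 = x_init
    for k in range(n):
        # Tuple assignment evaluates both right-hand sides with the
        # old (x1, x2), as explicit Euler requires.
        x1, x2 = x1 + h * (x2 + u1[k]), x2 + h * (-omega0**2 * x1 + u2[k])
    return x1, x2

# Free motion from (1, 0) over one full period returns near (1, 0).
n = 100000
x1, x2 = simulate_pho(1.0, [0.0] * n, [0.0] * n, (1.0, 0.0), 0.0, 2 * math.pi)
print(abs(x1 - 1.0) < 1e-2 and abs(x2) < 1e-2)  # True
```

The same rollout with nonzero u_1, u_2 gives the state trajectories that the discretized versions of (PHO) constrain and optimize over.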

Simple spring-mass system
The simple spring-mass system is another physical system that can be easily visualised; see [30]. This problem contains two masses and two springs connected in sequence, with dynamics given in terms of m_1 ÿ_1(t) and m_2 ÿ_2(t), where m_1, m_2 are the two masses, k_1, k_2 are the spring coefficients (stiffness) and f_1(t), f_2(t) are the forces applied to m_1 and m_2, respectively. Letting k_2 = 2, together with the chosen values of the remaining parameters, we retrieve the system in Problem (PSM). This dynamical system furnishes an optimal control problem with four state variables and two control variables. As in (PHO), we add state and control constraints.

Figure 1: (PHO) Case 2 plots (see Table 1) using DR with N = 10^3, −0.025 ≤ x_1(t). Vertical lines indicate the interval in which the state constraint becomes active.
We compared the performance of the algorithm for different values of γ. A specific value of γ that would provide the smallest errors was not obvious, since many values provided similar performance, but values closer to 1 appeared optimal; thus the choice of γ = 0.95 was made for these experiments.
Graphical Results. In Figures 1-2 we have plots for (PHO) and (PSM) using DR. We have generated similar figures using Ipopt as well, but since they mostly overlap Figures 1-2 we will not show them all here; we will point out the differences instead. In Figure 3 we give the multiplier vector µ^2 components for Case 2 (PHO) using DR and Ipopt.

Figure 3: µ_1^2, µ_2^2 for Case 2 (PHO) with N = 10^3 using DR (solid lines) and Ipopt (dotted lines). Note that µ_1^2(t) found by Ipopt attains a maximum value of 33, which is not shown in the graph.

As indicated by the black vertical lines in the bottom left plot of Figure 1, when using DR the constraint on x_1 is active over a time interval that aligns with the interval where µ_1^2 is positive, as expected from Equations (3)-(4).
In the results from Ipopt we observe that the state constraint only became active at a single point in time, but Figure 3 shows that µ_1^2 from Ipopt (yellow dotted line) is positive over a larger interval of time; thus Equation (4) is violated. This appears to be a numerical error that is not present in the DR results, since there is an interval of points around the spike where the state constraint −0.025 ≤ x_1(t) becomes active. When N = 10^4, 10^5 for (PHO), we observe that when using Ipopt the lower bound on x_1 is never reached, though again we see the interval of points that are almost equal to the lower bound.
In Figure 3 we also see a discrepancy in µ_2^2. Since we have not imposed a constraint on the variable x_2, Equation (4) implies that µ_2^2(t) = 0 for all t ∈ [t_0, t_f]. We can see in Figure 3 that µ_2^2 from Ipopt (purple dotted line) fails to satisfy this requirement, while µ_2^2 from DR is, at least to the eye, equal to zero.
Another difference between the multipliers µ from DR and Ipopt is the maximum values reached by the functions. In Figure 3 we see that µ_1^2 obtained via DR and Ipopt have similar shapes in their plots, but the maximum value reached using DR is approximately 0.7 while from Ipopt the maximum value is approximately 33. For (PSM) Case 2 with N = 10^3, the maximum value of µ_1^2 obtained using DR was approximately 16 while Ipopt reached approximately 310. Along with the functions having very different maximum values, we noted that, when generating these plots for N = 10^3, 10^4, 10^5, the results from DR were clearly converging to a single function µ_1^2, while this was not obvious from the Ipopt results. For (PSM), the approximate maximum values of µ_1^2 obtained when using Ipopt were 310 for N = 10^3, 2500 for N = 10^4 and 3000 for N = 10^5. In the same example using DR, the maximum values of the function were 16 for N = 10^3, 14 for N = 10^4 and 14 for N = 10^5.
We also see some slight variation between the plots from DR and Ipopt at the junction times in the control variables (not shown in this paper in order to avoid an excessive amount of visual material). The intervals where the control variables attain their boundaries using DR always appear slightly larger than those from Ipopt. In the regions where the control variables transition between active and inactive constraints, the corners appear more rounded when using Ipopt and sharper when using DR.
Errors and CPU Times. Table 2 contains the errors in the controls, states and costates for DR and Ipopt, while Table 3 contains the errors in the multipliers, the objective values and the CPU times. The CPU times were computed as averages over 200 runs (up to 1,000 runs in the faster examples). The values within boxes are the smaller errors and CPU times between DR and Ipopt. At a glance we can see that, more often than not, DR produced smaller errors and faster CPU times when compared with Ipopt. Upon closer inspection we see that, in many of the Case 2 results, the errors from DR and Ipopt are comparable. We note that the "true" solutions we are using to calculate these errors are those explained earlier in this subsection, except for the multipliers µ^2 in the Case 1 examples; in those examples we take the zero vector as our "true" solution.
We observe that most of the results in Table 2 show a smaller error in the control variable from DR, especially in the Case 1 examples where there are no state constraints. In the state variables we see that Ipopt has the smaller errors when N = 10^3, 10^4, though there is little difference compared with DR, and we see an improvement in DR when N = 10^5. As with the error in the control variables, the error in the costates is smaller for DR in almost all examples.
From Table 3, the errors in the multipliers show an improvement in DR compared with Ipopt, though the "true" solution in the Case 2 examples was obtained using results from Ipopt which, as previously mentioned, did not appear to converge to a specific function as we increased N; so the quality of this "true" solution is not guaranteed. DR produced slightly smaller objective values that were closest to the "true" solution in almost all cases, though the difference compared with Ipopt is marginal. We see that the CPU times are faster for DR, especially in the examples where N = 10^5.

Conclusion
We have applied the Douglas-Rachford (DR) algorithm to find numerical solutions of LQ control problems with state and control constraints, after re-formulating these problems as the minimization of the sum of two convex functions and deriving expressions for the proximal mappings of these functions (Theorems 1-2). These proximal mappings were used in the DR iterations. Within the DR algorithm (Algorithm 1), we proposed a procedure (Algorithm 2) for numerically finding the projection onto the affine set defined by the ODEs. We carried out extensive numerical experiments on two nontrivial example problems and illustrated both the working of the algorithm and its efficiency (in both accuracy and speed) compared with the traditional approach of direct discretization. We observed that, in general, the DR algorithm produced smaller errors and faster run times for these problems, most notably in the examples where we increased the number of discretization points. From these numerical results, the DR algorithm could in general be recommended over Ipopt for generating high-quality solutions.
Based on further extensive experiments, we conjectured how the costate variables can be determined. We successfully used the costate variables constructed as in the conjecture, as well as the state constraint multipliers calculated using these costate variables, for the numerical verification of the optimality conditions.
We recall that Algorithm 2 involves repeated numerical solution of the ODEs in (21) with various initial conditions. For solving (21) we implemented the (explicit) Euler method, which requires only a continuous right-hand side of the ODEs. Algorithm 2 appears to be successful for the worked examples partly owing to the fact that, in these examples, the optimal control is continuous. We tried to use our approach for the machine tool manipulator problem from [16], which has 7 state variables, one control variable, and upper and lower bounds imposed on the single control variable and one of the state variables. However, our approach did not seem to yield a solution (so far) for this problem, conceivably owing to the fact that the optimal control variable, as well as the optimal costate variable vector, is not continuous, in that these variables have jumps a number of times during the process; see [16, Figure 5]. Note that discontinuities in the control make the right-hand side of (21) discontinuous, rendering the Euler method ineffective. Therefore, problems of the kind in [16] require further investigation.
We believe that our approach can be extended to more general convex optimal control problems, for example those with a non-quadratic objective function or mixed state and control constraints, as long as the pertaining proximal operators are not too challenging to derive.
It would also be interesting to employ and test, in the future, other projection-type methods, for example the Peaceman-Rachford algorithm [8, Section 26.4 and Proposition 28.8], which, to the best of the authors' knowledge, has not been applied to optimal control problems.

Table 2 :
Errors in controls u, states x and costates λ and CPU times for the DR algorithm and AMPL-Ipopt, with ε = 10^-8 and specifications from Table 1.

Table 3 :
Errors in multipliers µ, errors in objective values and CPU times for the DR algorithm and AMPL-Ipopt, with ε = 10^-8 and specifications from Table 1.