OPTIMIZATION FOR A

. In this article, we discuss the optimization of a linearized traf-ﬁc ﬂow network model based on conservation laws. We present two solution approaches. One relies on the classical Lagrangian formalism (or adjoint calculus), whereas another one uses a discrete mixed-integer framework. We show how both approaches are related to each other. Numerical experiments are accompanied to show the quality of solutions.

For optimization purposes, different applications such as optimal routing of traffic at intersections [17,23,24,36], traffic light control [21], evacuation planning [20,36] and air traffic control [1] are of interest. Since in all problems the underlying optimization problem is restricted by partial differential equations, relaxed models with simplified dynamics have been investigated. In this context, two different solution approaches emerge. On the one hand, continuous optimization techniques have been successfully applied to compute optimal solutions. The first order optimality system is derived and solved by a descent type method [30,39]. On the other hand, suitable discretizations of the dynamics lead to network flow models that have been widely considered in the field of combinatorial optimization [3].
In fact, there exist a few research results comparing both optimization tools, see [8,14,15,17,42]. One might assume that a direct relation between continuous and discrete optimization techniques exist. This is especially the case when the governing dynamics in the network are (closely) linear. Appropriate numerical discretizations can be chosen, such that the original optimization problem either On each edge e ∈ E the dynamics of traffic flow is described by the well-known Lighthill-Whitham-Richards equations [35] for the density ρ e (x, t), x ∈ [a e , b e ], t ≥ 0. Thus the following equations are assumed to hold on each edge e: ∂ t ρ e (x, t) + ∂ x f e (ρ e (x, t)) = 0 ∀e ∈ E, x ∈ (a e , b e ) , t ≥ 0 ρ e (x, 0) =ρ e (1) whereρ e are constant functions, i.e. we assume Riemann initial data. As flow function f e we consider a symmetric hat function f e (ρ e ) = v e ρ e 0 ≤ ρ e ≤ σ e f e (σ e ) − v e (ρ e − σ e ) σ e < ρ e ≤ ρ max,e (2) borrowed from [7,9,38], also sometimes called Newell-Daganzo-Flux.
Remark 1. The hat function (2) is only piecewise differentiable on the intervals [ 0, σ e ) and ( σ e , ρ max,e ] with the property f e (ρ e ) = ±v e . Since differentiability is needed for the adjoint calculus performed in section 4, the function has to be smoothed in an appropriate way.
Looking for a network solution, coupling conditions have to be posed for the problem (1) to be well-defined. For an ingoing edge e ∈ δ − (v) the density at a vertex v is denoted byρ e (t) = ρ e (x = b e , t) and an outgoing edgeẽ ∈ δ + (v) is referred to byρẽ(t) = ρẽ(x = aẽ, t). Then, to guarantee flow conservation for all inner vertices v ∈ V with |δ ± (v)| > 0, the following condition must be fulfilled: fẽ(ρẽ(t)) ∀t ≥ 0. ( Due to the choice of constant initial data as in [5,23,26], the computation of the coupling condition may be done using the theory about Riemann problems [27]. This means, to get admissible solutions at junctions, we look for waves of negative speed for incoming edges and waves of positive speed for outgoing edges. Thus, we impose the following restrictions to the density values at the vertex Resulting from (4) bounds for the density values at the boundaries of a road are known and the flow going in or out of an edge can be bounded from above by the following functions: de e (ρ e (a e )) = maxFlowIngoing(ρ e (a e )) = f e (σ e ) · min 1, 2 − ρ e (a e ) σ e , (5a) su e (ρ e (b e )) = maxFlowOutgoing(ρ e (b e )) = f e (σ e ) · min being illustrated in figures 1, 2, respectively.   However, condition (3) is not sufficient to obtain unique solutions. Thus, analogously to Coclite-Garavello-Piccoli [5] we define a time-dependent distribution matrix α = (α eẽ ( t)) for all dispersing vertices v ∈ V (i.e. |δ + (v)| > 1), where 0 ≤ α eẽ ≤ 1 is the percentage of flow from edge e going to edgeẽ and ẽ∈δ + (v) α eẽ = 1 for all e ∈ δ − (v). We get As explained in [5,26], this still leaves a degree of freedom. Assuming that all drivers like to move forward as fast as possible, the throughput at each vertex v shall be maximized. Therefore we solve the following optimization problem at each vertex max s.t. (3) and (6) Remark 2. For simulation purposes the distribution matrix α is predefined and we consider the traffic flow network model consisting of the equations However, considering the optimal routing of cars through the network, the entries of the matrix α will be the control parameters.
By imposing certain conditions on the matrix α and restricting to networks without merging vertices, Coclite-Garavello-Piccoli [5] proved the existence of a unique solution for the optimization problem (7) (Theorem 3.2 in [5]) and the whole traffic flow network problem (8). For a merging junction with two ingoing and one outgoing edge the unique solvability of the maximization problem (7) is shown, among others, by Klar-Herty [23] by proposing FIFO-conditions for the flow through the vertex or Daganzo [6] and Göttlich-Herty-Ziegler [19] by defining a priority road.
In our study we restrict to three types of junctions presented in figure 3. The solution of the coupling conditions is explicitly given in equations (9)- (11).  . Junction types considered. Types 3(a) and 3(c) are solved using the techniques described by Coclite-Garavelo-Picolli [5], whereas for type 3(b) we use the approach of Klar-Herty [23].
For a simple junction connecting two edges (cf. fig. 3(a)) the unique solution of the coupling is then Considering a merging junction (cf. fig. 3(b)), the approach of Klar-Herty [23] yields the optimal solution to (7). The coupling at the dispersing junction (cf. fig. 3(c)) is solved using the conditions of Coclite-Garavello-Piccoli [5] 2.1. Discretization. For the numerical solution of equation (1), there exist several standard discretization schemes. Since the focus of this work is on optimization and the comparison of two different optimization approaches, we apply the simple staggered Lax-Friedrichs scheme as in [21,29,33]. In particular, this scheme does not necessitate the solution of Riemann problems in the interior of the computational domain and can thus be handled (easier) within both considered optimization approaches. We apply a staggered Lax-Friedrichs scheme on the grid x e,j = a e + j∆x e = a e + jL e /Nx e , where L e = b e − a e denotes the length of edge e and Nx e the number of cells. The time horizon [0, T ] is divided into parts of equal length, t n = n∆t = nT /Nt, where the time step ∆t has to satisfy the CFL-condition Then, for a fixed edge (neglecting the index), with λ = ∆t/∆x the evolution of the discretized density reads (j ∈ {2, . . . , Nx − 1}, n ∈ {0, . . . , Nt − 1}) The initial conditions ρ 0 j−0.5 (j ∈ {1, . . . , Nx }) are calculated using

SIMONE GÖTTLICH, OLIVER KOLB AND SEBASTIAN KÜHN
The values f (ρ n 0 ) and f (ρ n Nx ) result from the coupling conditions (4) and are affected by ρ n 0.5 and ρ n Nx −0.5 , respectively. So for instance, at a vertex with one entering edge e and one leaving edgeẽ, we get f (ρ n e,Nx e ) = f (ρ ñ e,0 ) = min su e (ρ n e,Nx e−0.5 ), deẽ(ρ ñ e,0.5 ) .
Remark 3. In equation (15) another advantage of the staggered Lax-Friedrichs scheme can be seen: The flow values at the boundaries needed in (13a), (13c) are directly computed from the coupling conditions. In contrast to other schemes, like the standard Lax-Friedrichs scheme, no computation of densities has to be done. This fact speeds up the computation and saves a lot of technical effort in the optimization part (see sections 4 and 5).
Note that the boundaries of the network have to be treated separately. At the inflow boundary with a desired inflow rate f in e (t n ) the actual inflow to the edge e is given by i.e. either the desired inflow rate is allowed to enter the edge or the edge is already jammed and thus the inflow rate is reduced to the amount of flow being able to be processed. At the outflow boundary at edge e we have which ensures that the flow reaching the end of an outflow edge is able to leave the network without being stopped (Neumann boundary condition).
3. Optimization problem. For the traffic flow control in a road network, it is relevant to have an upper bound on the maximal flow in a certain time horizon. However, for evacuation purposes [20], a lower bound on the minimal evacuation time is desired. In both cases, these bounds may be calculated by the traffic flow network model (8) where at dispersing junctions the drivers/evacuees are assigned to the connected roads in an optimal way, i.e. control of the distribution matrix α. In this optimization problem the drivers do not choose their routes themselves but are directed to a special route. Therefore they do not act selfish and a system optimum is achieved rather than a Nash equilibrium [2]. In order to achieve such a system optimum, we solve the following optimization problem: (14), (15), (16), (17) where J(ρ) is the objective function depending on the density of drivers. There are actually two ways to solve the optimization problem (18). On the one hand, we can apply an adjoint approach based on the Lagrangian formalism, which is explained in section 4. This procedure is the natural choice for PDE-restricted optimization problems. On the other hand, we use a mixed integer programming (MIP) approach, see section 5. This means a reformulation of the traffic flow model using suitable discretizations. Although both approaches initially seem to be different, they are closely related to each other.
The goal of the remaining article is to show the connection between the adjoint and the mixed integer approach as illustrated in figure 4.
Starting from the discretized traffic flow network model and the corresponding optimization problem (18), we follow two strategies to compute an optimal solution. In a first step, (A in figure 4), the first order optimality system (KKT-system) based  Figure 4. Sketch of the interaction between the discrete first order optimality system (section 4) and the mixed integer program (MIP) model (section 5) for the traffic flow network model with hat-function (2).
on the discretization introduced in section 2.1 is formally derived, i.e. the forward equations, the backward equations (adjoints) and the gradient equations. This procedure is called discretize-then-optimize approach, see [8,22]. From a numerical point of view, gradient type methods are used to solve the KKT-system. Alternatively, (B in figure 4), a special discretization of the problem (18) yields a linear mixed integer program (MIP), see [8,15,16]. In the discretization of (1) with (2), one has to distinguish whether the flux derivative is positive (f (ρ e ) > 0) or negative (f (ρ e ) < 0). Therefore binary variables will enter the problem. Further differences to the adjoint approach appear in the representation of the coupling conditions (3). The entire optimization model resulting in the MIP approach can be solved using Branch-and-Bound-techniques, e.g. CPLEX [28].
The connection between the two approaches (C in figure 4) will be illustrated in section 6. The idea is to consider the dual problem of the MIP and to compare the resulting dual variables with the adjoint variables.
To refine our ideas we present the following example for a LP.
Example 1. We consider a linear program of the form where c ∈ R n , b ∈ R m , A ∈ R m×n and m, n ∈ N. The dual of (19) is given by

SIMONE GÖTTLICH, OLIVER KOLB AND SEBASTIAN KÜHN
Furthermore the Lagrangian formulation of the linear program (19) with Lagrangian multiplier ϕ is given by This leads to the following KKT-system (state and gradient conditions) for (19). Equation (20b) can be interpreted as the adjoint equation of (19). Since the objective function J(x) = c T x is linear, we get from the adjoint equation (20b) We directly see that the dual variables Φ and the adjoint variables ϕ obey the same restrictions.
4. Adjoint equations. Adjoint calculus has been used for optimization purposes in a wide variety of applications, among others traffic flow problems [22,23], supply chains [8], optimal control of gas and water supply networks [14,25,33,34]. In this section we present an approach to solve the optimization problem (18), i.e. finding optimal distribution rates at each vertex subject to the considered objective function, by explicitly solving the so-called discrete adjoint equations, which are part of the discrete first order optimality system.
Within this section we start with a description of the general way to approach the adjoint equations. Then we dedicate a section to the application of the general approach to the different parts of the traffic flow network model, beginning with propagation (eq. (1) and its discretization (13)). We go on with the coupling conditions for a simple junction (eq. (15)) and close with the treatment of boundary conditions (eqs. (16) and (17)).

4.1.
First-discretize adjoint approach. The discretized equations (13), (14), (15), (16) and (17) are of the form C(ρ, α) = 0. So, the optimization tasks we consider are of the form min J(ρ) with lower and upper bounds α min , α max for the distribution rates α. For a given control α, the state equations C(ρ, α) = 0 uniquely determine the state variables ρ. Therefore, the state variables can be considered as a function of the control variables, i.e. ρ = ρ(α), which is implicitly given via the state equations, C(ρ(α), α) = 0. The resulting so-called reduced optimization problem reads min J(ρ(α)) To solve (22) with derivative-based optimization techniques, one needs to compute the total derivative of J with respect to α. Formally, this leads to d dα J(ρ(α)) = −ϕ T ∂ ∂α C(ρ(α), α) with the so-called adjoint state ϕ being the solution of the adjoint equation Rewriting (23) in the form ∂ ∂ρ J(ρ(α)) − ϕ T ∂ ∂ρ C(ρ(α), α) = 0 directly shows that (23) is part of the KKT system of (21) with Lagrangian multipliers ϕ. Note that we need to smoothen the kink in the hat function (2) as well as the minima in the supply/demand functions (5) and in the coupling conditions (9)- (11) to make the state equations C differentiable with respect to ρ.
Since we consider a time-dependent problem, the state equations and therewith the adjoint equation (23) have a very special structure. In general, this can be easily exploited to reduce the computational effort in practice (cf. [33,34]) and will be used below to explicitly solve the adjoint equation for the comparison with the dual variables of the MIP formulation.

J(ρ(α)) . (33)
In the following, we summarize the discrete adjoint equations given above for a better referencing initialisation: (24) propagation: (25) coupling at a node: (29), (31) in-and outflow: (32), (33) 5. Mixed Integer Program (MIP). As already announced, an alternative way to solve the optimal control problem (18) might be the derivation of a mixed integer program (MIP). To do so, we proceed similarly as in section 4. We discretize the constraints of (18) in a straightforward way following the ideas in [8,16,17,18,21]. We start with the reformulation of the hat function (2) introducing binary variables.
Then we go on with the discretization of (1) using the staggered Lax-Friedrichs scheme, cf. section 2.1. The coupling conditions (3) and (7) are rewritten accordingly. We close the section with a summary of the complete mixed integer program. The latter can be solved using a commercial solver as for instance CPLEX [28]. (2) is a piecewise linear, symmetric function. This can be understood as a composition of two linear parts where a binary variable κ ∈ B indicates whether the density ρ is less (κ = 1) or greater (κ = 0) than the breakpoint σ. This is guaranteed by

Reformulation of the hat function. The hat function
Thus the binary variable κ enables us to write the hat function (2) in one line only However, for a standard MIP, we need all constraints to be linear. We introduce an auxiliary variableκ representing the product κρ. Following [18] we get some additional technical constraints and may rewrite the hat function as a linear function in the variables ρ, κ andκ 5.2. Propagation. As described in section 2.1, the discretization of the transport equation (1) is done applying a staggered Lax-Friedrichs scheme (13). In order to deduce a mixed integer formulation, we use this discretization with λ = min {1/2v e } being the largest possible time step. This yields (omitting the index e for the edge) where the index j describes the spatial discretization point and n the time step considered. The variable f n 1 denotes the inflow into the edge, i.e. f (ρ n 0 ) = f (ρ(a, t n )) and f n 2 describes the outflow of the edge, i.e. f (ρ n Nx ) = f (ρ(b, t n )). The expression f (ρ n j ) is an abbreviation for the expression of (36) for each j and n. Thus, we get the set of equations (37) for each time step n, each discretization point j and each edge e, making a total of (Nt − 1) × Nx × |E| constraints.
Note that the initial conditions of (1) have to be discretized as well: Furthermore we need to implement the constraint for maximal throughput (9) in order to guarantee a unique solution f n e,2 = min su n e,2 , de ñ e,1 .
This minimum is included in the MIP by introducing a binary variable deciding on which term is smaller. Hereby we follow the approach in Göttlich et al. [18,Proposition 2.2]. The equations (5) are rewritten in the same way by making use of the binary variable κ, which has already been defined in (35).
Remark 4. The coupling of more than two edges in a junction is a straightforward extension of equation (41) adding technical constraints and four more variables per junction.
For the boundaries of the network, boundary conditions have to be specified. First we consider the inflow boundaries, i.e. all edges going into the network, e ∈ E in . For these edges the inflow variable f e,1 has to be calculated by for all 0 ≤ n ≤ Nt, where f in e (t) is a function describing the inflow to this edge. For the edges leading out of the network, we assume that all flow is leaving the network.
Summarizing, the constraints of the mixed integer formulation of (18) are initialisation: (38), (39) propagation: (37) coupling at a node: (40), (41) in-and outflow: (42), (43) Remark 5. Note that the controls α are only implicitly given in the MIP (44). They will be computed from the optimized flow values f n e,i in a post processing step.
6.1. Dual problem and connection to discrete adjoints. We start with a formal comparison of the two optimization approaches. For simplicity, we stick to the situation depicted in figure 3(a). The computations are numerically validated in subsection 6.2 and extended in subsection 6.3. For our investigations, we stick to linear objective functions J, e.g. the maximization of outflow. In order to dualize the MIP we introduce a dual variable Φ for each constraint of (44). Applying complementary slackness conditions, the number of dual variables reduces and simplifies the problem. We mainly distinguish between the free flow (f (ρ) > 0) or the heavy traffic regime (f (ρ) < 0). This leads to the dual problem of the MIP (44) Propagation: Φ n e,j−0.5 = 1 4 ( Φ n+1 e,j−1.5 + 2Φ n+1 e,j−0.5 + Φ n+1 e,j+0.5 Coupling at a node: withX n e,1 ∈ B andX n e,2 ∈ B being the counter parts of the binary variables defined in equations (28) and (30), respectively. They take their values according to the optimal primal solution being in the according regimes. Obviously, the equations of (45) correspond to the adjoint problem in the following way: For the initialization and the coupling the restrictions directly correspond. For all other constraints to be compared we note that f (ρ) = ±v due to the use of the hat function. InsertingX n e,i also decides on the sign of the velocity v and thus the comparison is straightforward again.
Note that for the numerical solution of (45), we first solve the primal MIP (44) and then insert the values forX n e,i of the optimal solution. The numerical results can be found in subsection 6.2. Our investigations can be extended to general network topologies including junctions of merging and dispersing type. However, we omit the details of this representation in this work and solely refer to a numerical study in section 6.3.
Next, we solve the optimization problem (18) numerically with the adjoint approach presented in section 4 (to compute derivatives for the reduced problem), which is done within the software package ANACONDA [33], and applying DONLP2 [40,41] as well as the mixed integer program (MIP) presented in section 5 applying CPLEX [28]. For the latter we employ a barrier procedure as starting algorithm (as it performs better than the default setting of CPLEX) and a tolerance gap of 0.
6.2. Chain of roads. We present numerical examples for two different network scenarios. For all settings we give the properties of edges (maximal density ρ max , maximal flow f (σ), velocity v and length L := b − a) in tables 7, 10 and display the corresponding graphs in figures 6, 9. Furthermore, we choose a spatial discretization of Nx = 4, i.e. ∆x = L/Nx , and satisfy the corresponding CFL-condition with equality yielding ∆t = L/8v. The initial values as well as the inflows are given for each example separately.
In the first example, the chain of roads, the focus is on the numerical equivalence of the adjoint variables and the dual variables of the mixed integer program (MIP). Note that there are no control variables in this example, i.e. no degrees of freedom for optimization. Nevertheless, the objective function influences the adjoint and dual variables.
The second example deals with a more complex setting, which can be considered as a straightforward extension of our previous results. In contrast to the work of Fügenschuh et al. [17], backtravelling waves enter into the adjoint and MIP problem. To the best of our knowledge, this effect has not been treated in the network context so far.
We come to the first example: a chain of roads, i.e. a vertex with one ingoing and one outgoing edge (see fig. 6). The properties of the edges are given in table 7. Since the setting consists of a single junction we use the coupling conditions given in (15) and (40), (41) for the adjoint approach and the mixed integer program (MIP), respectively. We consider the following linear objective function for the MIP, and the corresponding objective function for the adjoint approach. The given objective function can be interpreted in the following way: The density on each edge shall be minimized, where the second edge has a larger influence than the first edge, and the throughput in each vertex shall be maximized. For this objective function we compare the solutions obtained with the two approaches, evaluated at the first and the last cell of both edges (figure 8). We observe that the match is perfect, i.e. the adjoint variables (solid line) and the dual variables of the MIP (dots) coincide for all time steps. The maximal error is 5 · 10 −7 . 6.3. Diamond network. Next we study a more complicated network (figure 9). There are both two dispersing (v 2 and v 3 ) and two merging vertices (v 4 and v 5 ). The properties of the edges are given in table 10. We stick to this kind of network since we want to show, at least experimentally, that the computations we did in sections 4, 5 still hold true for interlinked networks. This example is motivated by evacuation dynamics, where the objective is to maximize the total number of rescued evacuees while ensuring a safest possible evacuation. Thus we consider the following objective function, which on the one hand maximizes the throughput and on the other hand penalizes high densities on certain edges max J 2 = 5 The controls are the distribution rates α 2 and α 3 at the vertices v 2 and v 3 , respectively, and similar degrees of freedom in v 4 and v 5 (for backtravelling waves). The inflow is constant, i.e. f in (t) = 1.5, and the considered time horizon is T = 5. Now, we compare both approaches by looking at the optimal solutions (see table 11, figures 12, 13, 14, 15). The mixed integer program (MIP) is solved using CPLEX [28] and the adjoint approach is solved by the software packages ANACONDA [33] and DONLP2 [40,41] (the latter for the reduced problem).
Comparing the optimal solutions we see that the mixed integer program (MIP) yields (as expected) the best objective value, since the approach guarantees a global optimal solution, but at significantly longer running times. Due to the internal use of the Branch and Bound algorithm, the CPLEX solver tends to prefer bangbang-solutions, cf. figure 12. This phenomenon has been observed before in [31]. Solving the KKT-system and thus the adjoint equation is much faster and leads to a smoother solution, and in this case not reaching the global optimum, at least not with the same accuracy. In general, this cannot be expected since the optimization problem is not convex and hence the method may get stuck in a local optimum. Using the optimal MIP solution as a starting solution for the adjoint approach improves the results (see table 11). DONLP2 recognizes the optimality of the starting solution and stops after just three iterations.
Alternatively, using the solution of the adjoint approach as a starting solution improves the running time of the mixed integer program (MIP) enormously (about 40 times faster, see 1114.80 - Figure 11. Comparison of optimal solutions. In figure 13 we plot the density in the last cell of edge 7, i.e. the outflow of the network for both solution methods and a uniform distribution at the vertices, i.e. α 2 = α 3 = 0.5. Since we present the original solutions without using any starting solutions, the distribution rates and also the outflow slightly differ. This difference immediately vanishes when using the solution of the mixed integer program as a starting solution to the adjoint approach. Finally, we observe that both solution approaches yield an increased output compared to the non-optimal uniform distribution.   In figures 14, 15 we present the density on the last cell of the penalized edges 3 and 6, respectively. Consequently, the MIP and the adjoint method as well result in lower densities compared to the uniform distribution. This is due to the penalization of density on these edges in the objective function (47). The difference between the optimized values and those from the uniform distribution are larger on edge 3 in comparison with edge 6. This is on the one hand caused by the higher penalization of density on edge 3. On the other hand, edge 3 is congested, if the flow is uniformly distributed. This drawback is avoided by both optimization approaches.