Optimal Control of Linear Cost Networks

We present a method for optimal control with respect to a linear cost function for positive linear systems with coupled input constraints. We show that the optimal cost function and resulting sparse state feedback for these systems can be computed by linear programming. Our framework admits a range of network routing problems with underlying linear dynamics. These dynamics can be used to model traditional graph-theoretical problems like shortest path as a special case, but can also capture more complex behaviors. We provide an asynchronous and distributed value iteration algorithm for obtaining the optimal cost function and control law.


I. INTRODUCTION
In many real-world systems the quantities of interest, like amounts of goods or distributions of probabilities, are intrinsically positive.Such systems naturally lead to models that are positive systems, where the state x is confined to the positive orthant R n + .In the context of optimal control, positive systems exhibit many advantageous properties, as reviewed in [1].Of special note is that such systems admit linear Lyapunov functions.This guarantees positive stage costs for a linear cost function, giving significant computational advantages in applications where such a cost function can accurately capture the desired objective.In contrast, current methods for optimal control of general linear systems with quadratic cost [2] give rise to Riccati equations where the number of variables scales quadratically with the dimension of the state, as opposed to linearly.Additionally, as noted in [1], systems on this form allow for sparse optimal state feedback u = Kx, giving favorable scaling results for highdimensional inputs.
In this paper we exploit these advantages for the purpose of optimal control of a system where the magnitude of combinations of input signals are jointly constrained by a linear function of the state.These coupled constraints generalize the previous work [3], wherein all inputs are fully actuated in either the positive or negative direction, and give rise to tradeoffs between several different inputs.This novel formulation covers a range of problems excluded by the constraints in [3].In particular, it provides a natural framework for a general class of network routing and shortest path-like problems with underlying linear dynamics unfolding on a graph.Our formulation, which uses linear costs for the state and input, allows for an explicit solution to the associated Bellman equation, just like in the familiar quadratic cost case.We also provide a method for obtaining the optimal cost function J * (x) by means of linear programming.This paper focuses on control problems over networks, in which physical constraints and scalability concerns in applications typically call for distributed solution methods.We give a distributed implementation of the value iteration algorithm for the proposed problem formulation, similar to those treated in [4], and show that convergence results presented in [5] can be applied to the present setting.This allows for distributed and asynchronous computation of the local optimal state feedback, wherein nodes need only share their local estimates of the cost function and the cost of the optimal control action.An extended analysis of the problem setup in [3], leveraging established theory of Dynamic Programming and extending the problem to constraints based on other norms, can be found in [6], wherein upper bounds on the convergence rate of policy iteration for such systems are given.That analysis, however, also fails to capture the class of network routing problems treated here.
The formulation we present connects results from the field of optimal control to a rich history of works (e.g.[7], [8]) on graphs with finite state and action spaces, which becomes a special case of the problem as stated here.Our framework is, however, flexible enough to cover a wider range of problems with continuous state and action spaces.Notably, interpreting the continuous state as a probability distribution allows for the modeling of Markov Decision Processes (MDPs) as another special case where existing results on convergence of methods like policy iteration [9] can be applied.Most works in the related literature that treat dynamic versions of the shortest path problem use models where the structure of the graph changes [10].Similar problems with dynamics in a continuous state space evolving on a graph structure are, however, relatively unexplored.
In the next section we present relevant theory and define notation.Section III contains the problem statement, required assumptions and main result in the form of Theorem 1.In Section IV, an algorithm is given for distributed and asynchronous value iteration that finds the fixed point of the operator given in Theorem 1, corresponding to the optimal cost function.The results are illustrated by an example in Section V, with application to a simplified model of a large-scale cooling system.Finally, Section VI presents conclusions and sets out possible avenues for future study.

A. Notation
Inequalities are applied element-wise for matrices and vectors throughout.Further, the notation R n + is used to denote the closed nonnegative orthant of dimension n.The operator min{A, 0} extracts the minimum element of A, yielding zero if A has no negative elements.Let diag(a) denote a square matrix with the elements of the vector a along the diagonal.The expressions 1 p×q and 0 p×q signify a matrix of ones or zeros, respectively, of the indicated dimension, with subscript omitted when the size is clear from the context.If the dimension is zero, this is to be interpreted as the empty matrix.

B. Problem setup
Consider the infinite-horizon optimal control problem Minimize where is partitioned into M subvectors u i , each containing m i elements, so that m = M i=0 m i .In the upcoming treatment, we will let the matrices B i be further subdivided into individual columns as The costs connected to states and actions are s ∈ R n >0 and r ∈ R m ≥0 with r i ∈ R mi following the partition of u, and r ij denoting the jth element of r i .The constraints on the input u(t) are given by 1) is equivalent to finding some optimal value function J * (x) that satisfies the corresponding Bellman equation In the specific context of (1) the immediate cost is given by the value at each instant t of the cost function to be minimized, so g(x(t), u(t)) = s ⊤ x(t) + r ⊤ u(t).Note that stabilizability of the system in (1) is a necessary condition for the problem to have a finite value.In the previous work [3] the Bellman equation ( 2) has an optimal linear solution J * (x) = p ⊤ x with p ∈ R n for a certain class of positive linear systems.Our main result below proves that similarly favorable results can be obtained for systems on the form (1), which have a natural interpretation as network routing problems, but are not in general admissible in [3].

III. MAIN RESULT
In anticipation of our main result, we introduce two assumptions on the dynamics of the system in (1).
Assumption 1: Any negative element of B i is located on the ith row.
Assumption 2: The matrices A, B and E satisfy the element-wise inequality A ≥ |B − E| ≥ 0 where Assumptions 1 and 2 guarantee positivity of the dynamics in (1).While this may seem restricting, the partition of u is natural in many relevant applications.In the case of a network problem where the inputs u correspond to quantities transferred over edges in the network, Assumption 1 requires that all edges leading from node i be included in the same constraint partition u i .Since positivity of the system is often motivated by states corresponding to real quantities, Assumptions 1 and 2 mean that no more than the quantity already in node i at time t (or expected to be within one time step) can be transferred from it.These assumptions are sufficient to ensure invariance of R n + in the general case, but in special cases where parts of the state space are unreachable they may be relaxed.In such cases it is, however, possible to remodel the system in compliance with the assumptions, which can therefore be made without loss of generality.Given these assumptions on the matrices of the problem (1), we are now ready to state our main result: Theorem 1: Under Assumptions 1 and 2, the following three statements are equivalent: (i) The problem (1) has a finite value for every x 0 ∈ R n + .(ii) There exists p ∈ R n + satisfying the equation is bounded and has solution p satisfying (3).Given the existence of a p as in (ii), the minimal value of (1) is p ⊤ x 0 .The optimal linear state feedback law is then given by u i (t) = K i x(t) with where the vector E ⊤ i enters at the jth row with j being the index of the minimal element of r i + B ⊤ i p, provided it is negative.If all elements are nonnegative then K i = 0 mi×n .
Proof: We first show that R n + is invariant under the dynamics in (1).It holds that As a consequence of Assumption 1 and the constraints given in (1) this expression has a lower bound given by recalling that min{B i , 0} denotes the minimum element of B i , yielding zero if all elements are nonnegative.This final expression is contained in Given the ansatz p 0 = 0, we compute each subsequent iterate p k+1 by applying the Bellman equation where U (x) denotes the set of all inputs u satisfying the constraints in (1).We partition the vector r + B ⊤ p k in the same fashion as u into M subvectors r i + B ⊤ i p k ∈ R mi .Since each subvector u i is only bound by one of the upper input constraints, we can minimize the expression above independently for each u i : where the minimum in the final expression is taken over all elements in the vector r i + B ⊤ i p k and 0. The Bellman iteration becomes x.As a consequence, we have an increasing sequence of functions generated by iteration of the Bellman equation, and the cost functions all take the form J k (x) = p ⊤ k x.
Given that the problem (1) has a finite value according to (i), the sequence of value functions generated by this iteration has the upper limit J * (x) = p ⊤ x.This limit fulfills the equation in (ii).The converse also holds since, given that a p satisfying (ii) exists, the increasing sequence of {p k } ∞ k=0 must satisfy p k ≤ p for all k.This means that the Bellman equation has a nonnegative and finite solution, implying that (i) holds.To find the optimal controller we seek the policy µ(x) minimizing the Bellman equation for the optimal cost function J * (x) = p ⊤ x.Taking the expression from (5) we get µ(x) = arg min which, partitioned like the minimization in (5), can be performed separately for each subvector µ i yielding the expression where the nonzero row vector E ⊤ i enters on the jth row and j ∈ {1, ..., m i } is the index of the largest in magnitude negative element of r i + B ⊤ i p.If all elements are positive then µ i (x) = 0 mi×n is the minimizing argument.As a result, the state feedback law for each partition u i is stationary and given by u i (t) = K i x(t).
Finally, we address (iii) in the same way as treated in [3].If (ii) holds then the p fulfilling Equation 3 clearly also solves the program which is equivalent to the linear program in (iii).Any solution fulfilling the Bellman inequality gives a lower bound on the corresponding optimal cost function.Thus the solution of the linear program must also be the optimal cost function and, as a consequence, (ii) implies (iii).To show the converse, assume that the linear program in (iii) is bounded by some argument p k .Calculate We now have p k+1 ≥ p k and as a consequence which is exactly the constraint in the reformulated program above.Since we assumed that p k attains the maximum of the program and p k+1 ≥ p k , the only possible conclusion is that p k+1 = p k .Thus the maximizing argument of the program in (iii), if it is bounded, also solves Equation 3, implying (ii).This concludes the proof.To illustrate the above theorem we apply it to a simple shortest-path problem.A detailed example showing more general possibilities is presented in Section V. Example 1: Consider the (trivial) problem of finding the shortest path from node x 1 to node x 4 in Figure 1.This can be formulated as an optimal control problem on the form (1) with n = 4, m = 10 (one input for each direction of traversal of the edges) in the following way: Let the states x i correspond to the nodes, select E = A = I and let B be a node-link incidence matrix: We partition the input vector ⊤ to collect all actions corresponding to directed edges originating in node i into u i .The weights are selected as We now have the problem on the form required by Theorem 1 and finding a p satisfying (3) by fixed point iteration we get the optimal cost function as The optimal control law is given by ( 4) which yields the optimal routing in each node.We can easily verify that Assumption 1 and Assumption 2 are satisfied by the above choice of A, B and E. The shortest path problem assigns cost only to traversal of the edges, while the formulation (1) also associates a cost s i with each state x i and allows for the possibility of taking no action (u = 0).However, since the dynamics are not asymptotically stable (A = I), the optimal policy will never be to remain in i as long as the cost s i is strictly positive.This means the cost s i will be incurred once per edge traversal in the optimal policy.By reallocating some of the cost from the edges to all nodes except the endpoint x 4 (from r to s), we guarantee that the optimal policy will correspond to the shortest path.The weights (6) result in the same cost as in the original formulation for all paths that do not stay more than one time step in each node, i.e. all paths that are possible candidates for the optimal solution.

IV. DISTRIBUTED IMPLEMENTATION
As the dimensions of the state and input spaces grow in (1), it is desirable that the complexity of the solution method scales favorably.One method for finding the optimal control of ( 1) is to solve the linear program in Theorem 1.The number of parameters scales linearly with the dimension of the state and the program is thus computationally tractable even for high-dimensional problems.In many applications, however, global communication and access to information is not feasible.An alternative method is to distribute the the solution of the problem, requiring only local communication and knowledge of the local dynamics.Below we present an algorithm for distributed and asynchronous calculation of the optimal cost function for agents with limited knowledge of the global state.
Consider the dynamics (1) when each state x i belongs to a separate agent i with its own constraint 1 ⊤ u i ≤ E ⊤ i x, so that M = n.The optimal cost function J * (x) = p ⊤ x with p fulfilling (3) can be found by asynchronous distributed value iteration starting from the ansatz p 0 = 0. Agent i affects other agents through either the dynamics A, the constraints E or the inputs governed by the structure of B. Here, let W (i) ∈ R n×n denote the local incidence matrix of node i given by We define a set of neighbors of i for each of these interactions: with equality for the choice E = A. Note that these sets are defined to include only out-neighbors of i. Next, index the columns of A as A = A 1 • • • A n and let each agent i have local estimates p(i) , q(i) ∈ R. Further, let p = p(1) • • • p(n) ⊤ denote the vector containing all agents' estimates of the local value function.We have shown in the proof of Theorem 1 that (3) has a solution p in R n + .The positive orthant is invariant under iteration of the Bellman equation ( 2).As a consequence, if p satisfying (3) exists, then value iteration can be performed in a distributed fashion following Algorithm 1 with local estimates converging to the optimal cost function [5, Proposition 2.6.2],under the additional assumption that all agents update their estimate an infinite number of times as t → ∞.Note that steps 8 and 9 can (with slight abuse of notation) be evaluated despite agent i not having access to the full vector p, since the unknown terms in both cases are multiplied by zero.As stated in Algorithm 1, each agent stores only local estimates p(i) and q(i) which converge to the fixed point corresponding to the solution of (3).In order to extract the optimal policy, agents would additionally need to store the index of the element that minimizes the expression in step 5.
Algorithm 1 Asynchronous value iteration 1: local estimates p(i) , q(i) ← 0 for i = 1, . . ., n 2: local terminating conditions c i ← 1 for i = 1, . . ., n 3: while ∃i such that c i = 1 do 4: sample agent i ∈ {1, ..., n} 5: receive p(j) from each neighbor j ∈ N A i ∪ N B i 7: receive q(j) from each neighbor j ∈ N E i 8: : end if 15: end while To perform Algorithm 1, agent i needs to store and access, in addition to the sets of in-and out-neighbors, only the following: A ji for j ∈ N A i , E ji for j ∈ N E i and B i .From the perspective of privacy we note that agent i requires only the quantities pj and qj = min{r j + B ⊤ j p, 0} from its neighbors, which contains no information about which action attains the minimum.Agents are not required to reveal their optimal action in order to perform the algorithm.
Remark 1: The algorithm as stated here does not include a truly distributed terminating condition.In settings where the optimal controller is calculated online and system parameters are potentially subject to change, it is realistic to run the algorithm indefinitely.Various methods of termination detection for distributed and asynchronous algorithms exist dating back to the seminal work [11], but applying these to the results presented here in a computationally efficient manner is a non-trivial problem in its own right and beyond the scope of this paper.

V. EXAMPLE: LINEAR FLOW DYNAMICS
We now present a more intricate, yet physically motivated, example to demonstrate some of the nuances that can be captured by the presented framework beyond the simpler shortest path problem.In the following example an optimal controller is derived for a system where endpoints are not specified a priori and the routing problem is complicated by the underlying dynamics of the network.
Example 2: Consider a simplified liquid cooling system serving some type of large-scale industrial plant.The system is represented by a directed graph G(V, E) where the set of links E models travel across the pipes that transport the liquid coolant between nodes V.Each node i ∈ V corresponds to a different location in the plant and has an associated state x i that specifies the heat contained there.The dynamics are given by (1) with n = |V| and m = |E|.Let A ii model the loss of heat at location i due to both dissipation and diffusion to other states, while the off-diagonal elements A ij , j ̸ = i model the heat diffusing from i to other states j.For all  states we have j A ji ≤ 1, where equality means that no dissipation occurs in state i.
In each node i, the heat can be transferred by selecting the input vector u i , dictating how much heat is transported across each of the links e ji leading from i.The amount transferred is limited by the current amount present in state x i , giving the constraint 1 ⊤ u i ≤ E ⊤ i x where E ii = A ii and E ij = 0 for i ̸ = j.The matrix B has one column corresponding to each directed link e ij (from j to i), in which the jth element is -1 and the ith is a number in the interval [0.95, 1) modeling the heat loss over the pipe.The weights r and s signify the relative importance of cooling the different sections of the plant.Since each pipe is represented by two links e ij and e ji , the weights and heat dissipation for the two directions are set to be identical.
Figure 2 illustrates the dynamics of an example system with n = 26.The coloring of the nodes and links represent the relative costs r and s of transporting and storing one unit of heat respectively.The stationary control law found by applying Theorem 1 is illustrated in Figure 3 by indicating the optimal routing of heat across the graph.In any node i that has no arrow originating in it, the optimal control is u i = 0, letting the heat diffuse and dissipate passively.As the dissipation only occurs in certain nodes, all heat must necessarily be transported to these either through passive diffusion or active heat transfer in order to obtain a finite value of the optimization problem (1).
Figure 4a shows the elements of p k when applying value iteration (fixed point iteration of (3)) to find p for the above problem, with initialization p 0 = 0.In Figure 4b the evolution of the local cost estimates p(i) when performing Algorithm 1 for the same problem are displayed.Both methods converge to the fixed point corresponding to the optimal cost function in Figure 3 and the rate of convergence differs approximately by a factor n = 26, since only one agent updates their state in each iteration of Algorithm 1.In other words, the figures show both methods converging after approximately the same number of agent updates.

VI. CONCLUSION
We have presented a framework for calculating optimal control of positive network dynamics in a scalable way.The resulting controller is a state feedback law u(t) = Kx(t) where K is sparse given some inherent sparsity of the dynamics.Furthermore, we have given a distributed and asynchronous algorithm for finding the optimal cost function and corresponding controller.This framework admits a wide range of network problems, both well and lesser studied.This indicates a possible direction for further exploration in adapting tools developed on shortest path problems and Markov decision processes for use in optimal control.

Fig. 2 :
Fig. 2: Graph illustrating the cooling system in the example.The coloring of nodes and edges represent the cost vectors s and r, ranging from bright blue, corresponding to a cost of 0.2 per unit of heat transported, to bright red, corresponding to a cost of 1. Intermediate hues represent costs on that interval.Dashed red lines signify diffusion of heat between nodes and dashed blue lines represent dissipation.

Fig. 3 :
Fig. 3: The optimal cost function J * (x) = p ⊤ x given by Theorem 1 with each node showing the corresponding element of p. Dashed lines indicate the possible inputs (in both directions) and solid arrows show the direction of the optimal stationary control policy.

Fig. 4 :
Fig. 4: (a) Elements of the cost function iterate p k evolving under fixed point iteration of (3), starting from p 0 = 0.Each element corresponds to a node in Example 2. (b) Local cost function estimates p(i) when performing distributed value iteration according to Algorithm 1 for the nodes in Example 2. Note that the apparent difference in convergence rate compared to Figure 4a is a result of only updating one node in each iteration of Algorithm 1. Scaling the axis of iterations by a factor n = 26 illustrates that the rate of convergence for the two methods is similar.
The authors are with the Department of Automatic Control and ELLIIT Strategic Research Area at Lund University, Lund, Sweden.This work is partially funded by Wallenberg AI, Autonomous Systems and Software Program (WASP) and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under grant agreement No 834142 (ScalableControl).