GLOBAL OPTIMAL FEEDBACKS FOR STOCHASTIC QUANTIZED NONLINEAR EVENT SYSTEMS

Abstract. We consider nonlinear control systems for which only quantized and event-triggered state information is available and which are subject to random delays and losses in the transmission of the state to the controller. We present an optimization-based approach for computing globally stabilizing controllers for such systems. Our method is based on recently developed set oriented techniques for transforming the problem into a shortest path problem on a weighted hypergraph. We show how to extend this approach to a system subject to a stochastic parameter and propose a corresponding model for dealing with transmission delays.


Introduction
When the loop between a given control system (the plant) and an associated controller is closed via a digital network, it is often necessary to include certain properties of the network in the model of the overall (closed loop) system, cf. [38]. For example, the bandwidth of the network typically imposes restrictions on how much data can be transmitted during a given time span. Also, in networks where data is transmitted in "packets" (which is the typical situation, e.g. in networks using TCP), the transmitted data might arrive delayed, or not at all (the packet gets lost).
There are essentially two ways to reduce the amount of information which is transmitted from the plant to the controller: one can reduce (1) the frequency with which data is transmitted and/or (2) the size of each data packet. A common approach for (1) is to transmit data not at regular time intervals ("sampled data approach", cf. [2]), but only when necessary in order to guarantee stability, i.e. when a certain "event" requires new data to be transmitted. During the last years, this paradigm of event-based control has been the subject of considerable research effort, cf. [1,17]. A method for reaching goal (2) is to reduce the number of bits representing a state as much as possible. This is equivalent to using a quantization of the state space of the system (i.e. a partition). Whenever the current state is sent to the controller, only the information about which cell of the partition contains the current state is transmitted, cf. [8,14,26,31].
In addition to the requirement of reducing the amount of transmitted information as much as possible, typically any practically available network suffers from a further drawback, namely the fact that data is transmitted with varying delays, which may even be infinite in the sense that certain packets do not arrive at the controller at all, cf. [23]. This, of course, can noticeably deteriorate the behavior of the closed loop system, up to a complete loss of stabilizability for parts of the state space. There have been various attempts to cope with this: for linear and non-quantized systems we refer to, e.g., [13], for nonlinear systems with constant delay to [22], for modeling losses without delays to [27], and for a focus on network protocols to [28]. For a unified study of quantization and delay effects in nonlinear control systems see [21], where a quantized feedback control method is combined with the small-gain approach. The article [9] focuses on quantization and delay effects for linear systems, using an LMI approach to find a controller with saturation. Recent investigations also deal with the design of encoders and decoders in order to stabilize a quantized time-delay nonlinear system and use a Liapunov-Krasowskii functional approach and dynamic quantization to construct a stabilizing feedback [3,6,30], cf. also [24] for a small-gain approach. In [32], an event based triggering scheme is presented in order to construct a real-time scheduler for stabilizing control tasks. We also refer to [29] for the construction of symbolic models for systems with time delays and to [35] for a study of distributed networked control systems with delays and losses, proposing a decentralized event-triggering scheme.
In this paper, we extend a recently developed approach for the construction of global optimal feedbacks for nonlinear quantized event systems, which is based on a set oriented discretization of the optimality principle (cf. [10,11,12,16]), to the case where an additional external stochastic parameter is present in the system. We use this new construction together with an appropriately developed model for delays in order to design a global optimal feedback which, to a certain extent, is robust to delays. The underlying quantization is given by an arbitrary, but finite, partition of state space and yields a controller which practically stabilizes the system, i.e. it drives the system into a given target set. This is in contrast to, e.g., methods based on logarithmic quantization, which are able to asymptotically stabilize to a point, cf. [36,37]. We assume that the transmitted data is time stamped and that the plant and the controller possess synchronized clocks, such that the controller can compute the actual delay of the received data. As with any method which is based on a space discretization of the optimality principle, our approach is subject to the curse of dimension, i.e. the number of nodes in the hypergraph scales exponentially in the dimension of the state space. Consequently, our method is restricted to systems of small dimension (up to four on current standard hardware, say). On the other hand, the class of systems to which it applies is rather general.
The paper is structured as follows: after recalling the basics of our construction from [11,12] in Section 2 and giving an example which shows how severely delays can impact the stabilizable set in Section 3, we develop the theoretical framework for dealing with such problems in Section 4 by discretizing a stochastic Bellman equation using a set oriented approach. There, we also prove a result on the stochastic stability of the associated feedback closed loop system. In Section 5, we then propose a corresponding model for incorporating delays and illustrate our concept by reconsidering the example from Section 3.

Optimal feedbacks for quantized event systems
The plant is modeled by a nonlinear discrete time control system

x_{k+1} = f(x_k, u_k), k = 0, 1, 2, . . . , (1)

(which may, e.g., be derived from a continuous time system by time sampling), where f : X × U → X is continuous, x_k ∈ X is the state and u_k ∈ U is the control input at time k, with X ⊂ R^n and U ⊂ R^m compact. In addition to f, we are given a continuous running cost function c : X × U → [0, ∞) as well as a closed target set X* ⊂ X. We assume c to satisfy c(x, u) = 0 iff x ∈ X*. Our goal is to compute a feedback law for this system which drives the system into the target set X* (where a different, locally acting controller takes over) while accumulating the least cost possible. However, the information which is transmitted from the plant to the controller is restricted in the following two ways:

1. Event model: The controller only receives information on the state whenever an event occurs. That is, even though the system (1) moves from x_k to a new state x_{k+1}, this information will possibly not be transmitted to the controller. Instead, the plant "waits" until a certain condition on the new state is fulfilled (for example, until the new state exceeds a certain distance from the old one). We are going to check for this condition by introducing an event function r : X × U → N ∪ {∞}. For example, r(x, u) might be defined as the smallest r ∈ N such that ‖f^r(x, u) − x‖ > ε for some prescribed tolerance ε > 0, where the iterate f^r is defined by f^r(x, u) = f(f^{r−1}(x, u), u), f^0(x, u) = x. Note that we include the possibility r(x, u) = ∞, which handles the situation that for some x ∈ X, u ∈ U there is no r ∈ N for which the encoded condition is fulfilled (for example, if x = f(x, u)).
As in [12], based on the discrete time model (1) of the plant, we are now dealing with the discrete time system

x_{ℓ+1} = f̂(x_ℓ, u_ℓ), (2)

where f̂(x, u) = f^{r(x,u)}(x, u) and we set f^∞(x, u) = x. Accordingly, we define an associated running cost

ĉ(x, u) = Σ_{k=0}^{r(x,u)−1} c(f^k(x, u), u).

The natural number ℓ enumerates the events, and we can reconstruct the true time k_ℓ from ℓ via the event function r by k_{ℓ+1} = k_ℓ + r(x_ℓ, u_ℓ).
Remark: In practice, one will set r(x_ℓ, u_ℓ) = ∞ if r(x_ℓ, u_ℓ) > R for some upper bound R < ∞.
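As an illustration, the distance-based event function described above, together with the practical cap R, can be sketched as follows (a minimal sketch; the toy plant map at the end is ours, not from the text):

```python
import numpy as np

def make_event_function(f, eps, R):
    """Event function r(x, u): smallest number of steps r such that
    ||f^r(x, u) - x|| > eps, returned as inf if no event occurs within R steps."""
    def r(x, u):
        y = np.asarray(x, dtype=float)
        for k in range(1, R + 1):
            y = f(y, u)                      # iterate the plant map
            if np.linalg.norm(y - x) > eps:  # event condition fulfilled
                return k
        return np.inf                        # cap: treated as "no event"
    return r

# toy plant for illustration: a contraction toward the origin shifted by u
f = lambda x, u: 0.5 * x + u
r = make_event_function(f, eps=0.1, R=100)
```

For a fixed point of the plant (e.g. x = 0 with u = 0 above), the loop runs out and the event function returns infinity, exactly the situation discussed in the event model.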

Quantization model:
The controller only receives quantized information on the state. Formally, we are given a (finite) partition P = {P_1, . . . , P_d} of cells P_i ⊂ X (in our implementation, we use boxes aligned with the coordinate axes). For each x ∈ X, we denote by [x] ∈ P the cell containing x. At event time ℓ, only [x_ℓ] is transmitted from the plant to the controller. This fact is modeled by a choice function γ : P → X which chooses an arbitrary point from a given cell, i.e. γ fulfills [γ(P)] = P for all P ∈ P. We denote by Γ the set of all these choice functions. The quantized model of the plant is now given by the finite state system, cf. [11,12],

P_{ℓ+1} = F(P_ℓ, u_ℓ, γ_ℓ), (3)

defined by F(P, u, γ) = [f̂(γ(P), u)]. Computationally, an explicit construction of the choice function γ is not necessary. All we need to be able to compute (cf. the next section) is F(P, u, Γ) := {F(P, u, γ) | γ ∈ Γ}, which can be approximated either by mapping a finite set of sample points from P or by using interval arithmetic in the case that the partition P consists of rectangles. Both approaches can be made rigorous, cf. [15,33]. As a result, in each step, in addition to u, a choice function γ has to be chosen. Thus, we now have two control parameters, where γ should be viewed as having a perturbing effect on the dynamics. In fact, formally, together with a suitable cost function (cf. the next section), the system (3) constitutes a dynamic game.
Note that for any fixed u, the function x → r(x, u) is not necessarily constant on a cell.Accordingly, without further ado, it is not possible to recover the "true time k" from the transition events in (3).
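The sampling-based approximation of F(P, u, Γ) on a uniform box partition can be sketched like this (helper names and signatures are our own; a rigorous version would use Lipschitz estimates or interval arithmetic as noted above):

```python
import itertools
import numpy as np

def cell_index(x, lo, hi, divs):
    """Index of the uniform grid cell containing x on the box [lo, hi],
    with divs subdivisions per coordinate."""
    frac = (np.asarray(x) - lo) / (hi - lo)
    idx = np.minimum((frac * divs).astype(int), divs - 1)  # clamp boundary
    return tuple(idx)

def image_cells(f_event, P_lo, P_hi, u, lo, hi, divs, samples=3):
    """Approximate F(P, u, Gamma): map a grid of sample points from the
    cell [P_lo, P_hi] under the event map and collect the hit cells."""
    axes = [np.linspace(P_lo[d], P_hi[d], samples) for d in range(len(P_lo))]
    hits = set()
    for pt in itertools.product(*axes):
        y = f_event(np.array(pt), u)
        hits.add(cell_index(y, lo, hi, divs))
    return hits
```

The returned set of cell indices is exactly one hyperedge target set for the fixed control value u.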

Computing the optimal feedback
In order to be compatible with our quantization, from now on we assume that X* is given by the closure of the union of the cells from a subset 𝒳* ⊂ P, i.e. X* = ∪_{P∈𝒳*} P. For the quantized system (3) we define the cost function

C(P, u) = sup_{γ∈Γ} ĉ(γ(P), u).

For P_0 ∈ P, (u_ℓ) ∈ U^N and (γ_ℓ) ∈ Γ^N, the cost accumulated along a trajectory (P_ℓ) ∈ P^N of (3), starting in P_0, is

J(P_0, (u_ℓ), (γ_ℓ)) = Σ_{ℓ=0}^{L−1} C(P_ℓ, u_ℓ),

where L = L(P_0, (u_ℓ), (γ_ℓ)) := inf{ℓ ≥ 0 : P_ℓ ∈ 𝒳*}. Note that possibly L = ∞, in which case the series does not converge, since there are finitely many partition elements and so min_{P∈P∖𝒳*} inf_{u∈U} C(P, u) > 0 by the assumptions on c. The optimal value function is

V(P_0) = sup_{γ} inf_{(u_ℓ)} J(P_0, (u_ℓ), γ((u_ℓ))),

where γ : U^N → Γ^N is a nonanticipating strategy, and the sup in the definition of the optimal value function is over all strategies of this form. This construction models the fact that in each step of the dynamics, the choice function γ_ℓ is chosen after the control u_ℓ, i.e. the "perturbing player" has the advantage of knowing the choice of u_ℓ, cf. [7].
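The accumulated cost and the first hitting time L translate directly into code; in the following sketch, G and C stand for the quantized dynamics (3) and the cell-wise cost (names and signatures are our own):

```python
def accumulated_cost(P0, controls, choices, G, C, targets):
    """Cost accumulated along a trajectory of the quantized system (3),
    summed until the trajectory first hits the target set of cells;
    inf if the target is not reached within the given sequences."""
    P, total = P0, 0.0
    for u, gamma in zip(controls, choices):
        if P in targets:          # hitting time L reached
            return total
        total += C(P, u)          # running cost of the current cell
        P = G(P, u, gamma)        # one step of the quantized dynamics
    return total if P in targets else float('inf')
```

On a toy chain of cells where each step costs 1 and moves one cell toward the target, the accumulated cost equals the number of steps to the target.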
Typically, there will be cells P ∈ P with V(P) = ∞. For example, any cell P which contains a point x_0 ∈ X which is not stabilizable to X*, i.e. for which there is no control sequence (u_ℓ) such that the associated trajectory (x_ℓ) of (2) converges to X*, will have V(P) = ∞. Another example is a cell P which contains a point x with r(x, u) = ∞ for all u ∈ U.
We let 𝒮 = {P ∈ P | 0 < V(P) < ∞} denote the stabilizable subset of P and S = ∪_{P∈𝒮} P ⊂ X. Note that we exclude the target region 𝒳* from 𝒮 here, since we only want to control the system into X*. By standard arguments, cf. [4], the optimal value function V restricted to 𝒮 is the unique solution to the optimality principle

V(P) = min_{u∈U} [ C(P, u) + max_{γ∈Γ} V(F(P, u, γ)) ]

with the boundary condition V|_{𝒳*} = 0. Given V, we can construct a feedback for (3), resp. (1), by setting

u(x) = argmin_{u∈U} [ C([x], u) + max_{γ∈Γ} V(F([x], u, γ)) ].

Note that the minimum exists since V attains only finitely many values, and f and c were assumed to be continuous, thus u ↦ C([x], u) is continuous on the compact set U.
The optimal value function can be computed by an efficient shortest path algorithm applied to the hypergraph G = (P, E), whose nodes are the cells of the partition P and whose edges are given by

E = {(P, N) : N = F(P, u, Γ) for some u ∈ U},

where each edge (P, N) carries the weight

w(P, N) = min{C(P, u) : u ∈ U, F(P, u, Γ) = N}, (4)

cf. [11]. This hypergraph encodes local reachability information between the cells of P.
For a given control u ∈ U and a given choice function γ ∈ Γ, F(P, u, γ) is a single cell from P. Accordingly, F(P, u, Γ) = {F(P, u, γ) ∈ P | γ ∈ Γ} is the set of all cells which can be reached from P using this fixed u ∈ U. Since there are only finitely many cells in P, there are only finitely many subsets F(P, u, Γ) ⊂ P for varying u ∈ U (even if U is not finite). For each of these subsets, the hypergraph contains a corresponding edge (P, N), cf. Fig. 1. A shortest path in such a weighted hypergraph can be computed by an efficient Dijkstra-type algorithm, cf. [11,34].
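A Dijkstra-type algorithm on such a hypergraph settles cells in order of increasing value; a hyperedge becomes usable once all cells in its successor set are settled, and its contribution is its weight plus the worst (maximal) successor value. A minimal sketch, with a dictionary-based hypergraph encoding of our own design:

```python
import heapq

def hyper_dijkstra(hyperedges, targets):
    """Min-max Dijkstra on a weighted hypergraph: hyperedges[P] is a list
    of (weight, successor_set) pairs; V(P) = min over edges of
    weight + max over the successor set of V; target nodes have value 0."""
    V = {t: 0.0 for t in targets}
    remaining, best, incoming = {}, {}, {}
    for P, edges in hyperedges.items():
        for j, (cost, succ) in enumerate(edges):
            key = (P, j)
            remaining[key] = len(succ)   # successors not yet settled
            best[key] = 0.0              # max of settled successor values
            for Q in succ:
                incoming.setdefault(Q, []).append((key, P, cost))
    heap = [(0.0, t) for t in targets]
    heapq.heapify(heap)
    settled = set()
    while heap:
        v, Q = heapq.heappop(heap)
        if Q in settled:
            continue                     # skip stale heap entries
        settled.add(Q)
        for key, P, cost in incoming.get(Q, []):
            remaining[key] -= 1
            best[key] = max(best[key], v)
            if remaining[key] == 0:      # edge fully usable now
                cand = cost + best[key]
                if cand < V.get(P, float('inf')):
                    V[P] = cand
                    heapq.heappush(heap, (cand, P))
    return V
```

Cells that never get a finite value are exactly those outside the stabilizable set.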

Construction of the hypergraph
In an implementation, the easiest way to construct this hypergraph is by mapping finitely many sampling points from each cell P (corresponding to choosing a finite set Γ̃ of choice functions from Γ), using finitely many sampling points Ũ ⊂ U: for each cell P ∈ P and each ũ ∈ Ũ, compute

F(P, ũ, Γ̃) = {[f̂(x̃, ũ)] : γ̃ ∈ Γ̃},

where x̃ = γ̃(P), γ̃ ∈ Γ̃, is a sampling point from P. (Figure 1 illustrates a hyperedge (P, F(P, u, Γ)) of the hypergraph.) The minimization in computing the weights (4) can then be performed discretely. This is the approach we have been using in the numerical experiments in the following sections. Typically, of course, depending on how the sampling points are chosen, this sampling approach will result in some edges being improperly constructed or even missing. In practice, this problem can largely be avoided by repeating the computation with an increasing number of sampling points until the results no longer appear to change.
In principle, one could also construct the hypergraph in a rigorous way by using properly constructed Lipschitz estimates on the map f [15] or by using interval arithmetic [33].We refer to [16] and [11] for further details on how to construct the hypergraph.

A controller subject to delays and losses
In addition to the two restrictions modeled in the previous section, namely (1) the event based information transmission and (2) the transmission of quantized information only, we now additionally assume that the transmission of the state information from the plant to the controller is realized via a digital network and that this transmission is subject to delays and even losses. More precisely, we assume that the state information P_ℓ generated at time k_ℓ reaches the controller at time k_ℓ + δ_ℓ, where δ_ℓ is a random variable with a known distribution π on N_∞ := {0, 1, 2, . . . , ∞}, cf. Figure 2.
In order to exemplify the effect of this additional restriction, we perform the following experiment: we consider the classical inverted pendulum on a cart, cf. [16]. The dynamics of the pendulum is given by a continuous time control system (see [16] for the model equations), where ϕ ∈ [0, 2π] denotes the angle between the pendulum and the upright position and u ∈ U := [−64, 64] is the force acting on the cart. We have used the parameters m = 2 for the pendulum mass, m_r = m/(m + M) for the mass ratio with cart mass M = 8, ℓ = 0.5 for the length of the pendulum and g = 9.8 for the gravitational constant. The instantaneous cost is chosen as in [16]. Denoting the evolution operator of the system for constant control functions u(t) ≡ u by Φ^t(x, u), x = (x_1, x_2) = (ϕ, ϕ̇), we consider the discrete time system f(x, u) given by approximating the evolution Φ^T(x, u), T = 0.01, by the explicit Euler scheme with step size 0.0025 (and constant control u ∈ U). Likewise, the discrete time cost function c(x, u) is obtained by an associated numerical quadrature of the continuous time instantaneous cost. We choose X = [0, 2π] × [−8, 8] as the state space and a rectangular neighborhood X* of the origin as the target region. For the partition P of X we use a uniform grid of 2^6 × 2^6 rectangles. By means of this grid we define the event function r as follows: let s(x) ∈ R^n and ρ(x) ∈ R^n denote the center and the radius of the rectangle containing x, respectively. Then, by means of the event set

E(x) = {y ∈ X : |y_i − s_i(x)| ≤ e_r ρ_i(x), i = 1, 2}

with event radius e_r = 9.4, we define the event function

r(x, u) = min{t ∈ {0.01, 0.02, . . . , 10} : Φ^t(x, u) ∉ E(x)}.

In other words, the event function indicates when the corresponding event set is left. We note that an event set overlaps with other event sets, i.e. event sets do not form a partition of X. In fact, using the given partition cells as event sets would not yield a stabilizing feedback in this example. A corresponding numerical experiment shows that in this case the image of a given cell near the target set stretches too far along the unstable direction of the origin. In contrast, the chosen event radius is rather arbitrary and could also be chosen such that the event sets are aligned with the given partition (e.g., e_r = 9), which would enable the plant to emit events based on the given quantization of state space.
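For concreteness, the time-T map f can be obtained by the explicit Euler scheme as described. Since the model equations are only referenced above, the right-hand side below is the standard cart-pendulum model commonly used with exactly these parameters; it is an assumption here and should be checked against [16]:

```python
import math

# parameters from the text
m, M, l, g = 2.0, 8.0, 0.5, 9.8
mr = m / (m + M)

def pendulum_rhs(x, u):
    """Hypothesized right-hand side of the cart-pendulum model,
    x = (phi, phi_dot), u the force on the cart."""
    phi, dphi = x
    num = (g / l) * math.sin(phi) \
          - 0.5 * mr * dphi**2 * math.sin(2.0 * phi) \
          - (mr / (m * l)) * math.cos(phi) * u
    den = 4.0 / 3.0 - mr * math.cos(phi)**2
    return (dphi, num / den)

def f(x, u, T=0.01, h=0.0025):
    """Discrete time map: explicit Euler with step h over sampling time T."""
    phi, dphi = x
    for _ in range(int(round(T / h))):
        d1, d2 = pendulum_rhs((phi, dphi), u)
        phi, dphi = phi + h * d1, dphi + h * d2
    return (phi, dphi)
```

With this right-hand side, the upright position is an (unstable) equilibrium of the uncontrolled system, which is consistent with the stabilization task above.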
For the construction of the hypergraph, we use a grid of 5 × 5 equidistant sample points in each cell P as well as 17 equally spaced points in the control set U = [−64, 64]. The cost function C is computed by maximizing ĉ over the grid points in each cell P.
In order to illustrate the effect of delays on the closed loop system, we employ a feedback which was constructed for the model without delays, as described above. We now simulate the closed loop system, first without delays and then with random delays of up to 90 ms. The underlying distribution of the (independent) delays is depicted in Figure 2. The shape of this distribution is inspired by typical delay distributions as experimentally determined in [23]. Figure 3 shows the effect of the delays on the stabilizable set of the employed feedback.

Feedbacks for stochastic quantized event systems
In a digital network, packet delays and dropouts typically occur at random. In order to model this situation, we extend (2) by a third, random parameter δ (which will be used to model delays later on), i.e. we consider a stochastic event system

x_{ℓ+1} = g(x_ℓ, u_ℓ, δ_ℓ), (8)

where at each event instance the parameter δ_ℓ ∈ N_∞ is chosen independently from a given distribution π : N_∞ → [0, 1]. We assume the map g : X × U × N_∞ → X to be continuous in x and u. The running cost c and the target region X* are given as in Section 2, so we assume c(x, u) = 0 iff x ∈ X*. In Section 5, we will develop a specific model g in which the parameter δ denotes the time delay by which the state information reaches the controller. In this section, we first abstractly extend the framework of the previous section to the case of the stochastic system (8).

Quantization model
Again, the controller only receives quantized information on the state and, using the same construction as in Section 2, we model the plant by a stochastic finite state system

P_{ℓ+1} = G(P_ℓ, u_ℓ, γ_ℓ, δ_ℓ), P_ℓ ∈ P,

defined by G(P, u, γ, δ) = [g(γ(P), u, δ)].

A stability theorem
Clearly, due to the randomness of the parameter δ, in general one cannot expect the feedback (11) to render the closed loop system (asymptotically) stable in a deterministic sense. However, one can prove stability with a certain probability. The key results here come from stochastic stability theory using stochastic Liapunov functions, originally proved in [5,18] and [19]. We are going to use a version from [20]. In summary, our result is as follows: given a particular choice of λ > 0 bounding the attainable cost, one obtains the probability of actually achieving that cost (or a lower one). Furthermore, almost all of the trajectories achieving such a bound also converge to the target set. To be more precise, for λ > 0 and V from (10), let S_λ = {x ∈ X : V([x]) ≤ λ}. Then, using the optimal feedback (11), the closed loop system is stochastically stable in the sense of the following theorem.
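In formulas, the Kushner-type result we invoke can be paraphrased as follows (a hedged sketch of [20], Theorem 4.1, Chapter 4, written in our notation; it is not a verbatim restatement):

```latex
% Supermartingale condition for the closed loop system (12):
\mathbb{E}\bigl[\mathcal{V}(x_{\ell+1}) \,\big|\, x_\ell = x\bigr] - \mathcal{V}(x)
  \;\le\; -\alpha(x), \qquad \mathcal{V}(x) := V([x]),
% yields the probability bound
\mathbb{P}_x\Bigl(\sup_{\ell \ge 0} \mathcal{V}(x_\ell) \ge \lambda\Bigr)
  \;\le\; \frac{\mathcal{V}(x)}{\lambda},
% and almost every trajectory with \sup_{\ell} \mathcal{V}(x_\ell) < \lambda
% converges to the set \{\alpha = 0\} = X^*.
```

The first inequality is exactly what the proof below establishes, with α a continuous nonnegative function vanishing precisely on X*.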
Proof. We will show that 𝒱(x) := V([x]) is a stochastic Liapunov function for (12) in the sense of [20], Theorem 4.1 in Chapter 4. From (10) and (11) we have that

V([x]) = C([x], u(x)) + max_{γ∈Γ} Σ_{δ∈N_∞} π(δ) V(G([x], u(x), γ, δ)). (13)

Now for any u ∈ U,

Σ_{δ∈N_∞} π(δ) 𝒱(g(x, u, δ)) ≤ max_{γ∈Γ} Σ_{δ∈N_∞} π(δ) V(G([x], u, γ, δ)),

so that with x = x_ℓ and u(x) = u(x_ℓ) from (13) we get

E[𝒱(x_{ℓ+1}) | x_ℓ = x] − 𝒱(x) ≤ −C([x], u(x)).

Since by (11) the feedback is constant on partition elements, i.e. u(x) = u([x]), the right hand side is a function which is constant on the finitely many cells [x] and greater than 0 for x not in the target set X*. Hence, there exists a nonnegative continuous function α(x) ≤ C([x], u(x)) which is 0 exactly on the target set, and it immediately follows that

E[𝒱(x_{ℓ+1}) | x_ℓ = x] − 𝒱(x) ≤ −α(x),

which is the condition in [20], Theorem 4.1 (Chapter 4).

Implementation
In order to compute the optimal value function defined in the previous section, we perform a standard value iteration (in Section 2, a Dijkstra-type shortest path algorithm can be used due to the deterministic nature of the system). Based on (10), the value iteration reads

V_{j+1}(P) = min_{u∈U} [ C(P, u) + max_{γ∈Γ} Σ_{δ∈N_∞} π(δ) V_j(G(P, u, γ, δ)) ], (16)

with V_0(P) = 0 if P ∈ 𝒳* and V_0(P) = ∞ else. As in similar situations (cf., e.g., [10]), the use of graph algorithms still proves helpful here. The main reason is that the dynamic game is represented by a graph for which an evaluation is much faster than its construction. In particular, for the value iteration this means that the number of iterations does not influence the number of evaluations of G. For more details on how to construct the underlying hypergraph we refer to [10,11]. However, for our stochastic framework, we need a slightly more general concept. We note that a classical hyperedge (which we call a hyperedge of order 1 here) is a tree of depth 1 with the root being the start node and its children being sets of nodes reached by the one-step dynamics of the system under consideration. In order to be able to compute the optimal value function by (16), we introduce a new kind of hyperedge which we call a hyperedge of order 2 (cf. Figure 4). A hyperedge of order 2 is a tree of depth 2: the children at depth 1 correspond to different states y_i = γ_i(P) within the current cell P, whereas the children at depth 2 correspond to the different values of the stochastic parameter δ. Analogously to game trees (cf., e.g., [25]), it is now possible to first calculate the expectation over the values of the nodes at depth 2, collect the result at depth 1 and then calculate the maximum over the y_i to obtain the new value V_{j+1}(P) for cell P.

Figure 4: A hyperedge of order 2: the children at depth 1 correspond to the states in a cell, whereas the children at depth 2 correspond to the variation of δ.
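The evaluation of an order-2 hyperedge (expectation at depth 2, maximum over the sample points at depth 1, minimum over the edges) can be sketched as follows; the encoding of the edges as (cost, rows-of-successors) pairs is our own, not the paper's:

```python
def value_iteration(cells, targets, edges, pi, max_iter=1000):
    """Value iteration on order-2 hyperedges: edges[P] is a list of pairs
    (cost, succ), where succ[i][d] is the cell reached from sample point
    y_i under the d-th delay value and pi[d] is its probability."""
    INF = float('inf')

    def expect(row, V):
        # expectation of V over the delay distribution along one row
        s = 0.0
        for p, Q in zip(pi, row):
            if p > 0.0 and V[Q] == INF:
                return INF
            s += p * (0.0 if V[Q] == INF else V[Q])
        return s

    V = {P: (0.0 if P in targets else INF) for P in cells}
    for _ in range(max_iter):
        V_new = {}
        for P in cells:
            if P in targets:
                V_new[P] = 0.0
                continue
            # depth 2: expectation; depth 1: max over y_i; then min over edges
            cands = [cost + max(expect(row, V) for row in succ)
                     for cost, succ in edges.get(P, [])]
            V_new[P] = min(cands) if cands else INF
        if V_new == V:    # fixed point reached
            break
        V = V_new
    return V
```

Since each sweep only reads the stored edge data, the number of iterations indeed does not change the number of evaluations of G, as noted above.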

Numerical experiment
In this section, we first present an abstract model for an event system which incorporates delayed and lost transmissions of the state from the plant to the controller. We then apply the construction from the previous section in order to obtain a feedback which is robust to these delays and losses in a stochastic sense, i.e. the closed loop system will be stochastically stable in the sense of Theorem 4.1. We reconsider the example from Section 3 and experimentally demonstrate that our new feedback construction possesses almost the same stabilizable set as the original feedback for the system without delays and losses.

System and delay model
We consider a system modeled as described in Section 2, i.e. the plant is modeled by a nonlinear discrete time control system f and an event generator which implements an event function r. Whenever an event is generated in the plant, it is transmitted to the controller, but this transmission is subject to a delay δ ∈ N_∞ (where δ = ∞ corresponds to the possibility that the information does not reach the controller at all, i.e. a "packet loss"). Since the transmission of the events from the plant to the controller is subject to delays and losses, the plant will still operate for some time with the old control input computed from the previous event, cf. Figure 5. Formally, we model this situation by the stochastic event system

z_{ℓ+1} = g(z_ℓ, u_ℓ, δ_ℓ),

where the time index ℓ enumerates the events as generated in the plant (cf. Section 2), the delays δ_ℓ ∈ N_∞ are chosen i.i.d. from a given distribution π, the vector z_ℓ = (x_ℓ, w_ℓ) ∈ Z := X × U denotes the extended state (x_ℓ ∈ X the current state, w_ℓ ∈ U the old control input) and the mapping g is defined by

g((x, w), u, δ) = (f^s(f^t(x, w), w'), w'),

where

t = t(δ, z) = min{δ, r(z)},
s = s(δ, z, u, t) = r(f^t(x, w), u) if δ < r(z), and 0 if δ ≥ r(z),
w' = w'(δ, z, u) = u if δ < r(z), and w if δ ≥ r(z).
In this model, any delay δ ≥ r(z) is treated as δ = ∞, i.e. as if the corresponding data would never reach the controller.
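One step of this delay model can be sketched as follows; f_iter and r are placeholders for the plant iteration and the event function, and the toy plant at the end is ours:

```python
def delay_step(z, u, delta, f_iter, r):
    """One event step of the delayed system: z = (x, w) is the extended
    state (current state, stale control input), delta the transmission
    delay, f_iter(x, w, t) iterates the plant t steps under input w and
    r(x, w) is the event function (hypothetical signatures)."""
    x, w = z
    rz = r(x, w)
    t = min(delta, rz)              # steps run under the stale input w
    x_mid = f_iter(x, w, t)
    if delta < rz:
        s = r(x_mid, u)             # new input active until the next event
        return (f_iter(x_mid, u, s), u)
    # data arrives after the next event (or never): treated as a loss
    return (x_mid, w)

# toy plant for illustration: x_{k+1} = x_k + u
def f_iter(x, w, t):
    for _ in range(t):
        x = x + w
    return x
```

Note that delta = infinity automatically falls into the loss branch, since min(delta, r(z)) = r(z) then.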
Figure 5: Delay model: at time k_ℓ the ℓ-th event is generated and the system is in state x_ℓ. The transmission of the state information from the plant to the controller is delayed by t time units, during which the old control input w_ℓ is still operational. At time k_ℓ + t (when the plant is already in state f^t(x_ℓ, w_ℓ)), the state information x_ℓ reaches the controller, changing the input to its new value u_ℓ = u(x_ℓ).

The delayed inverted pendulum reconsidered
We reconsider our example from Section 3. However, we now compute a feedback law based on the framework of Section 4, i.e. the computation of the feedback already utilizes a model that includes the delay distributed according to Figure 2. In order to experimentally check for the stabilizable set in phase space, we randomly choose 100 finite sequences (δ^i_ℓ), ℓ = 0, . . . , 1000, i = 1, . . . , 100, by choosing each δ^i_ℓ i.i.d. according to the given distribution, and 25 sample points in each partition cell. If the feedback trajectory of at least one sample point associated to some delay sequence (δ^i_ℓ) leaves the given phase space X or does not reach the target set X*, then this cell is considered as not being stabilizable to the target region, cf. Figures 3 and 6. These figures illustrate that by incorporating the delay into the construction of the controller, a much larger region of the state space X remains stabilizable. In fact, the stabilizable set for the delayed system with the delay-based controller deteriorates hardly at all in comparison to the undelayed system with the standard controller. At the boundary of the controllable region, some discretization effects appear to occur due to the finite number of test points in each box.
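The Monte Carlo check described above can be sketched as follows; the simulate routine is a placeholder for the actual closed loop simulation, and the delay distribution shown is hypothetical, not the one from Figure 2:

```python
import random

def cell_stabilizable(samples, simulate, draw_delay, n_seq=100, horizon=1000):
    """Monte Carlo test of a cell: the cell counts as stabilizable only if
    every sample point reaches the target under every tested delay
    sequence. simulate(x, delays) -> True iff the closed loop trajectory
    from x stays in X and reaches X* (placeholder signature)."""
    for _ in range(n_seq):
        delays = [draw_delay() for _ in range(horizon)]
        if not all(simulate(x, delays) for x in samples):
            return False
    return True

# a hypothetical discrete delay distribution (NOT the one from Figure 2)
support = [0, 1, 2, 3, float('inf')]
weights = [0.4, 0.3, 0.2, 0.05, 0.05]
draw_delay = lambda: random.choices(support, weights=weights)[0]
```

Coloring each stabilizable cell by the average accumulated cost over its samples then reproduces plots of the kind shown in Figures 3 and 6.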

Figure 2: Probability distribution of the delays δ_ℓ in the inverted pendulum example. This discrete distribution is inspired by typical delay distributions as determined in [23].

Figure 3: The inverted pendulum controlled by the feedback construction from Section 2 for the system without delay model: in color are the regions of state space which are stabilizable to a neighborhood X* of the origin (black centered rectangle) for a simulation without delay (left) and for one with stochastic delays of up to 90 ms (right). The color of a cell indicates the average accumulated cost for initial states from that cell.

Figure 6: The inverted pendulum controlled by the feedback construction from Section 2 (left) and from Section 4 (right): in color are the regions of state space which are stabilizable to a neighborhood X* of the origin (black centered rectangle) for a simulation with stochastic delays. The color of a cell indicates the average accumulated cost for initial states from that cell.