Optimizing the stable behavior of parameter-dependent dynamical systems - maximal domains of attraction, minimal absorption times

We propose a method for approximating solutions to optimization problems involving the global stability properties of parameter-dependent continuous-time autonomous dynamical systems. The method relies on an approximation of the infinite-state deterministic system by a finite-state non-deterministic one - a Markov jump process. The key properties of the method are that it does not use any trajectory simulation, and that the parameters and objective function are in a simple (and except for a system of linear equations) explicit relationship.

• The third group consists of probabilistic approaches, with trajectory simulation [GNW04], or without [Kol]. Absorption probabilities for an approximate non-deterministic system reveal properties of the DOA. Probabilistic approaches for solving stochastic control problems have been studied by Kushner et al. [Kus77,KD92].
In order to view the stability behavior of the system as an optimization problem, we choose a simple setting: let a real valued objective function be given, which depends on the stability properties of the system in a desired way; such a function could be e.g. the Lebesgue volume of the DOA. To make the optimization more convenient to carry out, we assume the dependence of this objective function on the control parameter to be sufficiently smooth (at least differentiable), that we can use a simple gradient method to get to a (local) maximum/minimum. Along the lines of the above example (maximization of the volume of the DOA) we would like to highlight some difficulties arising if we try to use one of the above methods for optimization by the gradient method. For methods using direct simulation, computing the gradient of the objective function will require by the chain rule the computation of derivatives of the flow with respect to (w.r.t.) the parameters. Thus, variational equations for the underlying ordinary differential equation (ODE) have to be solved. These are computationally expensive, since they are in general higher dimensional than the original ODE 2 , and they also require the derivative of the vector field (which also has to be computed numerically). The Lyapunov function based techniques deliver only the boundary of the DOA as level set of some scalar function. It is not clear how to obtain the derivative of the level set w.r.t. the parameters, and if there is a way to devise an efficient method based on this. Further, the most Lyapunov function based techniques deliver merely a subset of the DOA which' topology depends on the characteristics of the chosen Lyapunov function class (e.g., the level sets of quadratic functions are always ellipsoids).
The considerations above should give the reader a first insight into what one has to deal with when trying to solve an optimization problem associated with the DOA. Nevertheless, the main purpose of this work is not the comparison of the different approaches one could use for optimizing the objective function, rather to analyze and show the feasibility of the method we chose.
The approach. Our method is based on a set-oriented approach to approximate properties of dynamical systems [DH97,DJ98]. We partition the state space X into disjoint sets, which serve to discretize the original deterministic dynamical system, and relate it to a finite state non-deterministic system -a Markov jump process. Then we construct the infinitesimal generator (matrix) of the latter by integrating the vector field of the ODE on the boundary of the partition elements. From this matrix we compute absorption probabilities for the finite state system, which yield the desired information about the DOA; or we compute expected absorption times, which serve as approximations of the absorption times of the original system. For example, the volume of the DOA is simply the integral over the absorption probability function associated with the original deterministic system. 3 All of the involved computations are explicit (except for the solution of a system of linear equations), and it is easy to obtain their derivatives. Composing these derivatives according to the chain rule yields the derivative of the objective function w.r.t. the parameters, and allows the application of local optimization methods, e.g. the gradient method.
Outline. This work is structured as follows. Section 2 collects the mathematical background on deterministic and non-deterministic dynamical systems which are going to be used afterwards. In Section 3 we introduce the discretization we use to make a finite-state non-deterministic system from the original infinite-state deterministic one. Section 4 is an outlook: it collects possible alternatives for the approximation of the absorption times of the deterministic system by quantities derived from the non-deterministic one. Section 5 is the main part of this work stating the optimization problem and showing how we propose to solve it. Numerical examples follow in Section 6, and some concluding remarks are collected in Section 7.

Dynamical systems
The temporal variation of states x(t) ∈ R d we are interested in is given by an (autonomous) ODĖ We are only looking at a compact subset X of the whole state space R d . If a trajectory leaves X it immediately terminates; in this case we write φ t x = ω, where ω is a fictive state representing everything outside X . We call the pair (X , φ t ) a dynamical system, and we refer to X as state space. Later, we will focus on systems for which some states shall be steered into a target region T ⊂ X in a prescribed way. Following objects will be at the center of our attention.
Definition 1. For the target region T ⊂ X we call the set the domain of attraction of T . The quantity is called the absorption time of x.
Note that we are only interested in trajectories which stay all the time in X , until being absorbed in T .

Markov chains
We use the theory of Markov chains to describe non-deterministic dynamical behavior. From now on in this section let Y = {1, 2, . . . , n} be a discrete state space. We also work with a given probability space (Ω, F, P).
Definition 2 (Stochastic process). Let I = N or I = R ≥0 , and let Y be the set of possible states. Then a family {Z t } t∈I , where Z t : Ω → Y for all t ∈ I, of Y-valued random variables is called a stochastic process.
Definition 3 (Discrete time Markov chain). We call a stochastic process {Z t } t∈T a discrete time Markov chain, if I = N, and P ∈ R n×n with (i) P ij ≥ 0 and (ii) n i=1 P ij ≤ 1, describes the transition probabilities, i.e.
for all t ∈ I and i, j ∈ Y. The matrix P is called the transition matrix. A matrix satisfying (i) and (ii) is called sub-stochastic. If (ii) holds with equality, P is called stochastic.
Definition 4 (Continuous time Markov chain). Let I = R ≥0 . Further, let G ∈ R n×n be such that Then P t is a sub-stochastic matrix for every t ≥ 0. 4 A process {Z t } t∈I with P t ji = P(Z t+s = j | Z s = i) for all t, s ≥ 0, and i, j ∈ Y is called a continuous time Markov chain. It is also often called a Markov jump process (MJP). The matrix G is called the (infinitesimal) generator of the process.
One may think of such MJPs as follows (cf. Section 2.6 in [Nor97]). Being in an arbitrary state i at an arbitrary time s, the process remains at the given state until some random time s + t, when it jumps, independently of t, at random to another state j. The jump time t has exponential distribution with parameter G ii , and the probability of jumping to state j (j = i) is −G ji /G ii , unless G ii = 0, in which case the process does not leave the state i ever.
Consider now a discrete time Markov chain with transition matrix P. If P is sub-stochastic but not stochastic, there is an i ∈ Y such that j∈Y P ji = p i < 1. In other words, if the process is currently in state i, there is a positive probability 1 − p i that it will not end up in Y in the next step -the process terminates. Analogous considerations can be done with a continuous time process. Its generator will satisfy j∈Y G ji < 0 for some i ∈ Y. We call such processes leaky. They reflect the restriction of our interest to the state space: everything leaving the state space is considered to be lost.
In the following we denote a Markov chain already terminated at time t by Z t = ω. Here ω represents "everything outside of the state space" -just as in the deterministic setting, Section 2.1.

Absorption probabilities and expected absorption times
Throughout this section we consider MJPs. Furthermore, we assume that the set T ⊂ Y is absorbing (i.e. P(Z t ∈ T | Z 0 ∈ T ) = 1), and that the process is possibly leaky. Then, under a reachability assumption, the process will either end in the absorbing set T , or "leak out" (i.e. terminate in the fictive state ω; cf. above). 5 Let p ∈ [0, 1] n denote the absorption probabilities for the absorbing set, i.e. p i = P Z t ∈ T for some t ≥ 0, provided Z 0 = i .

The corresponding absorption time
is a random variable itself; here we are interested in its expectation. The expected absorption times are defined as An important quantity will be the expected termination time. The termination time is a random variable measuring the time to termination either in the state ω or in the absorbing set T . We define We wish to compute p, a and t from the generator G of the process. Set G := G Y\T ,Y\T ; i.e. G is the matrix with the jump rates from Y \ T to Y \ T . The transiency of the set Y \ T is equivalent with lim t→∞ e t G = 0.
Proposition 5 ([Nor97] Theorem 3.3.1). Assume lim t→∞ e t G = 0. Then the absorption probabilities are the unique solution of The main difference between the expected absorption times and termination times can be seen without computation. It holds a ω = ∞, but t ω = 0. A representation of these quantities involving the generator has to respect this fact.
Proposition 6 ([Nor97] Theorem 3.3.3). Assume lim t→∞ e t G = 0. Then the expected termination times are the unique solution of Along the lines of the proof of Proposition 6 we obtain Corollary 7. Assume lim t→∞ e t G = 0. Then the expected absorption times are the unique solution of where G ωi := −G ii − j∈Y G ji , the termination rate from state i.
As one can see, the quantities p, a and t can be computed by solving systems of linear equations. There is a problem with a which will pose us difficulties when computing absorption times from a discretization of the system. Starting at i, if there is a non-zero probability (does not matter how small) of terminating in ω at some future time, the expected absorption time will be infinite. Even if these nonzero probabilities come from discretization errors, it makes the resulting absorption times not applicable for the approximation of absorption times of the original system. Note that a {p=1} = t {p=1} , i.e. a and t coincide for the states which have absorption probability one; clearly because these states terminate in T .
Alternative measures of absorption times are introduced in Section 4 below.

Discretization
The desired objects of the deterministic system defined byẋ = v(x) on X will be computed from the discretization given below. Partition the state space X into finitely many disjoint partition elements X 1 , . . . , X n , where each set X i has a piecewise smooth boundary ∂X i , such that the unit outer normal vector n j exists almost everywhere (measured by the d − 1 dimensional Lebesgue measure m d−1 on ∂X i ). Usually, the X i are rectangles or simplices.
Definition 8. The discrete generator matrix G n ∈ R n×n associated with the vector field v is defined by where x · y denotes the dot product of the vectors x and y, and f + denotes the positive part of the function f .
The discretization goes back to [FJK12], but apart from the probabilistic point of view, it is only the spatial discretization of the upwind method known from finite volume methods [LeV02]. A first application of this idea for the computation of the DOA appears in [Kol]. It is shown in the latter work that G n indeed generates a MJP on the state space with elements X 1 , . . . , X n , this MJP can be associated with a non-deterministic system on X which converges to φ t in distribution as n → ∞. This elucidates why we use absorption probabilities and expected absorption times of the MJP generated by G n in order to approximate the DOA and absorption times of φ t .
Due to this convergence in distribution if one starts the MJP on a sufficiently fine discretization with a distribution highly concentrated around a state x ∈ D, the distribution of Z t n is highly concentrated around φ t x. Now, if t is chosen such that φ t ∈ T (it is possible per definition of T ) then Z t n is already absorbed with high probability. This tells us intuitively that for a sufficiently fine partition p i ≈ 1 if X i ⊂ D; and similarly that the expected absorption times of Z t n approximate the absorption times of φ t well. 6 Remark 9. The consideration here should help to associate the system φ t with the MJP generated by G n . Let Z be a random variable distributed uniformly in X i . Then we have by Lemma 4.1 and Lemma 4.4 in [FJK12] d dt If now {Z t n } t≥0 is the MJP generated by G n , we also have from the definition that Note that unless the partition elements X i and X j have a fully d−1 dimensional intersection ∂X i ∩∂X j , it holds G n,ij = 0. Thus, G n is sparse. Another numerical advantage of the discretization is that it does not use trajectory simulation -only integrals of at least continuous functions (the (v·n j ) + ) on d−1 dimensional domains have to be computed.
Remark 10 (The "wrapping effect"). Having seen how the discretization works we can go back to the effect mentioned at the very end of Section 2. The figure to the right shows a trajectory of a stable system spiraling into one point, say, the origin. The stability is weak, since the decrease of the distance to the origin during one rotation is small. The rectangles in the figure indicate the elements X i of the chosen partition. Computing the discrete generator G n of this system, one can see, that no matter in which partition element one starts in, there is always a positive probability to leak out of the state space. This effect decreases as the partition gets finer, it does not vanish completely however, so the absorption times a n computed from G n will be infinity for all boxes except the target (per definition). To remedy this fact one can work with the termination times, or with other constructs; see Section 4 below.
The "wrapping effect": no matter how small the partition elements are, there is always a nonzero probability for the induced MJP to leak out.

Alternative measures for the absorption time
It has been already discussed that the expected absorption times a n of the discrete generator G n are not the right tool to approximate the absorption times τ of the deterministic system. Also the approximation through the expected termination times t n may be highly defective; e.g. when the time to absorption (if absorption occurs) and the time to leak-out (if leaking out occurs) are way apart.
We would like to discuss here two alternative approximations to τ by some quantities derived from the discrete generator which only depend on absorption times; i.e. on the random variable A. Both are solutions to a system of linear equations, thus their computation is not a harder task than that of the previously used quantities. Their incorporation into our approach presented below, and the analysis of their numerical properties are subject of future work.
Transformation. Let us recall that A denotes the random variable of absorption times. Since The advantage is that h attains always finite values, even if the expected absorption time a doesn't. The transformation τ → 1 − e −τ is called the Kružkov transform. Note that 1 − e −τ is continuous on the entire state space X if τ is continuous on D.
One can show that for a MJP with generator matrix G holds If h n is the solution of (5) corresponding to the discrete generator G n , we may use − log(h n ) to approximate τ .
Absorption times conditioned to absorption. After finishing this work the following idea of P. Pollett and D. M. Walker has been pointed out to us. If we expect a box to lie in the DOA but the corresponding absorption probability is smaller than 1, the expected absorption time conditioned to that absorption occurs in finite time [Wal98] is a very natural approximation to τ . Hence, define where p denotes the absorption probabilities. Then by Theorem 2 [Wal98] we have Since the magnitude of the p i may range over several orders of magnitude (in practice between machine precision to one), any numerical method using (6) has to address stability issues.

Parameter dependent system and objective function
The knowledge gained in the previous sections will be now applied to optimize the stable behavior of dynamical systems arising from the vector field v(x; b), where b ∈ R r is an adjustable parameter. The quality of the dynamical behavior is going to be measured by an objective function f : R r → R which will be subject to minimization or maximization w.r.t. b. This objective function will consist of two parts; one representing the desired behavior, and one representing the costs connected with the choice of the control parameter. Our two model problems are as follows.
Maximal DOA. The goal here is to obtain a subset of the state space with the biggest possible volume which is steered into the target region T . Correspondingly, one would like to maximize the volume m (D(b)) of the set D(b), the DOA associated with v(·; b). We define the objective function as with α > 0 being some penalty parameter, and |b| 2 : , the characteristic function of the set D(b), acts as a function of absorption probabilities for the deterministic system: if x is steered into T then χ D(b) (x) = 1, and 0 otherwise. Suppose a partition {X 1 , . . . , X n } satisfying the conditions of Section 3 is given. By computing the absorption probabilities p n from the discrete generator associated with v(·; b) for this partition, we obtain an approximation of the objective function: Our strategy is to use this approximation in order to carry out the maximization.
Minimal absorption times. Here we have a prescribed set D 0 ⊃ T , and the goal is to minimize the average absorption time over this set. The objective function is defined as with some penalty parameter α > 0; where τ (·; b) is the function of absorption times associated with v(·; b).
The approximation of this objective function is going to be discussed in Section 5.3 below.
Remark 11 (Other objective functions). Depending on the needs of the application one can work with other objective functions as well. Here we present two of them which can be handled by our method. They are not going to be discussed further in the following.
(a) If it is of higher importance that the DOA covers some specific regions, one could take some function θ : X → R to weight absorption probabilities. The objective function has a clear representation in terms of absorption probabilities.
(b) Average absorption speed maximization over a domain D 0 of interest:

Iterative optimization
In this section we describe the optimization procedure for finding (approximate) local extrema of the objective function f by performing a local optimization of its approximation f n . For the demonstration we choose the problem of DOA maximization. Suppose we have chosen a partition {X 1 , . . . , X n } of the state space X such that the target region T is the union of some partition elements. Later on we will use the short-hand notation i ∈ T for X i ⊆ T . Set f n (b) = n i=1 m(X i )p n,i − α|b| 2 . Next we show that under fairly general conditions the objective function f n is differentiable w.r.t. b. Then, by the gradient method The differentiability of all these mappings would imply the differentiability of f n , but milder conditions suffice as well. Let G n (v) denote the discrete generator associated with v. Since the number of partition elements, n, does not change throughout the computation, for simplicity we drop the subscript n.
• b → v A weaker assumption than the differentiability of v(·; b) w.r.t. b suffices here, see Remark 13 below.
• v → G Lemma 12. Let v be a continuous and δv a bounded vector field from X to R d . Further assume that m d−1 (∂X i ∩ {v · n i = 0}) = 0 for all i = 1, . . . , n, where n i denotes the outer normal on ∂X i . Then the directional derivative of G(v) in the direction δv, is given by where o(ε) denotes a function g(ε) such that g(ε)/ε → 0 as ε → 0. The second equation follows from the uniform continuity of v on compact sets and from the boundedness of δv, the third one from the condition m d−1 (∂X i ∩ {v · n i = 0}) = 0. The proof for i = j is the same.
Remark 13. The previous lemma still holds if δv is only defined on X \ N , and the set N satisfies m d−1 (∂X i ∩N ) = 0 for every i; i.e. there is no fully d−1 dimensional intersection between the boundaries of the partition elements and the set of points where δv is not defined. The application we have in mind is the case where v is continuous but only piecewise differentiable w.r.t. b. Such a case may arise if due to some technical limitation the control has a maximal amplitude; we will show this kind of an example later on.
• G → p Next we address the differentiability of p w.r.t. G. As in Section 2.3, let G denote the part of G with the jump rates between the X i which are not contained in T . Let q i = j∈T G ji . Ifp denotes the absorption probabilities corresponding to partition elements contained in X \ T , Proposition 5 shows that Hence p is arbitrarily smooth in the entries of G and we get by elementary calculus Lemma 14. The directional derivative ofp(G) in the direction δG is given by and Dp i (G) = 0 for i ∈ T .
• p → f (b) Finally, the first summand of f (b) is linear in p, thus differentiable; and the second summand is clearly differentiable w.r.t. b.
Summary: optimization procedure. Under the assumptions made in this section the objective function f is differentiable, and we sum up the optimization procedure by the gradient method as an algorithm. Initialization: Let some partition {X 1 , . . . , X n } of X be given. Choose some b 0 ∈ R r such that T D(b 0 ), a tolerance threshold TOL > 0, and step sizes γ 0 , γ 1 , . . .. Set For k = 0, 1, . . . perform the following steps: Remark 15. There are methods exploiting smoothness with higher performance than the gradient method (e.g. the Gauß-Newton method), however it is not clear at the first sight if the mapping v → G is differentiable more than once. Also, there are even more sophisticated first order methods than the simple gradient method [Nes83]. Using these methods for the optimization of our problem will be the subject of future work.

Minimal absorption times
We have already discussed at the end of Section 2.3 that the expected absorption times, a n , computed from G n may be inadequate for approximating τ . We also noted that a n,i = t n,i if p n,i = 1, hence we expect t n to be a good approximation of τ on D. Under the same assumptions as in Section 5.2, and by using the short-hand notation i ∈ D 0 for X i ⊂ D 0 , we aim to minimize the objective function f n (b) := i∈D0 m(X i )t n,i + α|b| 2 for some prescribed region D 0 . Just as above, we drop the subscript n for simplicity.
The strategy is analogous to the one in the previous section: we establish the differentiability of f w.r.t. b, compute the derivative Df (b), and apply the simple gradient descent method to compute a local minimum of the objective function.
The differentiability of f is proven along the same lines as in the previous section, except for Dt(G). Denote byt the vector with entries t i whith i ∈ X \ T .
Proof. We have from (2) thatt = − G −T e. Differentiation w.r.t. G yields the first claim. The second follows from t i (G) ≡ 0 for i ∈ T .
Note that unless there is a b such that D 0 ⊆ D(b), we cannot expect the objective function to obtain finite values, since τ (x; b) = ∞ for x / ∈ D(b). To exclude such a case we assume that we already start the optimization with a b 0 such that D 0 ⊆ D(b 0 ). 7 Further, we have to assure that none of the b k is such if D 0 is the union of partition elements; what we assume from now on. Thus, we have to assure that the sequence {g(b k )} k=0,1,... is constant, in particular non-decreasing. If the increment ∆b k := Df (b k ) in the iteration of the gradient method is small enough, we have and the condition Dg(b k ) · ∆b k ≥ 0 assures that the above sequence is essentially non-decreasing. Note that the computation of Dg can be done similarly to that of Df in Section 5.2. So if Dg(b k ) · ∆b k < 0, we use the projection of ∆b k onto Dg(b k ) ⊥ := {x ∈ R r | x · Dg(b k ) = 0} as increment.
Summary: optimization procedure. Under the assumptions made in this section the objective function f is differentiable, and we sum up the optimization procedure by the gradient descent method as an algorithm. Initialization: Let some partition {X 1 , . . . , X n } of X , and some D 0 ⊃ T be given. Choose some b 0 ∈ R r such that D 0 D(b 0 ), and a tolerance threshold TOL > 0. Set For k = 0, 1, . . . perform the following steps: 1. Compute ∆b k := Df (b k ) and Dg(b k ).

STOP if |∆b
Remark 17. There are more established methods for constrained optimization than the one we use here. Their application, however, is beyond the scope of this work.

Parameter-affine systems
Here we consider the model problems for the case where v(·; b) is affine- where v 0 and the columns of v c are vector fields from X to R d , and B is obtained from b by reshaping the vector to a matrix. 7 We may find such a b 0 by taking the maximization problem from Section 5.2 with the objective function f (b) = D 0 χ D(b) − α|b| 2 for some very small α > 0, with its discrete counterpart For this class of vector fields an additional approximation step may save a huge amount of computational efforts. Let e i ∈ R r , i = 1, . . . , r, be such that e i,i = 1 and e i,j = 0 for j = i. Then, by linearity, and we use instead of G n (v) in our computations. 8 The reason why G * n (v) is not simply a linear combination is the fact that linear combination of generator matrices does not have to be a generator matrix. Conical combination of generator matrices, however, is a generator matrix. Unfortunately, differentiability of G * n (v) w.r.t. b i at b i = 0 is not guaranteed any more.
The usage of G * n (v) has the advantage that the computationally expensive steps of computing the generator matrix for a vector field have to be done only once, at the beginning (2r + 1) times, and not for all iterates b k in the gradient method. This brings a massive speed-up in runtime at the expense of small loss in accuracy.
6 Numerical examples 6.1 Example: Domain of attraction maximization Consider the two dimensional dynamical system given bẏ The goal is to maximize D(b) − α b 2 2 , with α = 0.02. For this we use the method described in Section 5.2 (see Remark 13 on how to handle the above saturation condition) with step sizes γ k = 3. Note that the linearization of system (10) around the origin yieldsξ = −Bξ. We start the iteration with B 0 = 1 1 0 1 , and do 15 gradient steps.
In Figure 1 we show the absorption probabilities corresponding to the system with parameter matrix B 15 (for the one computed with the gradient method for the particular partition, respectively), from left to right for a 64 × 64, a 128 × 128 and a 256 × 256 uniform partition of the state space. Table 1 shows the change in the objective function after 15 gradient steps for the different partitions. For all three discretizations Df (b 15 ) 2 < 10 −3 . The objective function increased by 15% compared with the naive initial parameter array computed from linearization.
There is no reason to expect the objective function to have only one local minimum (which will be the global one as well). Thus we applied the algorithm to different initial parameter values. The resulting sequence, however, converged always to the same optimum.
To save computational efforts one could start an optimization with a coarse resolution, and successively change to a finer partition at some appropriate iteration step.

Example: Absorption time minimization
Consider system (10) with the same saturation condition as above.
The goal is to minimize with α = 0.02. For this we use the method described in Section 5.3 with step sizes γ k = 3, and 15 steps. We start the iteration at the optimal parameter value computed for the maximal DOA above, which is B 0 = 0.89 0.35 0.75 1.4 .
In Figure 2 we show the optimal absorption times computed with the gradient descent method for three different partitions. In every case the optimum seems to occur at the boundary of the feasible region b D 0 ⊂ D(b) . This matches well with the fact that the boundary of D 0 (indicated by the circular contour in the figure) is very close to the boundary of the set with high absorption probabilities (the plotted boxes indicate absorption probabilities greater than 0.9).
After 15 steps the gradients (more precisely, the gradients projected to Dg(b k ) ⊥ , since the iteration converges to the boundary of the feasible set; see Section 5.3) were for all discretizations smaller than 4 · 10 −3 .  (but very similar) local minima which all lie at the boundary of the feasible region b D 0 ⊂ D(b) . Also, for finer partitions these local minima are closer to each other. This suggests that the occurrence of multiple minima is only due to the finite discretization of the problem, and they all collapse to one minimum as the diameter of the partition elements tend to 0.

Example: Affine parameter dependence
We test the modification introduced in Section 5.4 on the systeṁ which has the desired affine dependence on the parameter array b.
First we compute the discrete generators corresponding to the system (11) with parameter B 0 = 0.1 10 0 15 by the direct discretization (4) and by the modified discretization (9), respectively.
The state space is [−1, 1] × [−1, 1], and we apply a 40 × 40 uniform partition in both cases. Next, we compute the absorption probabilities for the target T = [−0.05, 0.05] × [−0.05, 0.05] using the one and then the other discretization. Figure 3 shows the results. To understand why does the modified dis- Figure 3: Absorption probabilities computed using the standard discretization (left) and the modified discretization (9) (right). The larger diffusivity of the modified discretization is reflected in the milder descent of absorption probabilities.
cretization give more "blurred" absorption probabilities, note the following. The standard discretization takes a linear combination of some vector fields and computes the discrete generator from it. The modified method takes discrete generators of some vector fields and combines them linearly to one generator.
This means for one element in the partition, that if the component vector fields v(·, e i ) show locally in many different directions, the combined generator of the modified discretization will yield transition rates in many different neighboring partition elements. Hence, the generator from the modified discretization may have a larger "diffusivity", forcing absorption probabilities of neighboring boxes to be closer to each other. This is what we observe as blurring in the figure. This diffusivity effect decreases as we use finer partitions. Applying 15 steps of the gradient method for the problem of maximizing D(b) − 0.02|b| 2 with a 256 × 256 partition and B 0 as above, the difference in the objective function computed by the two discretizations is only 3.5%, and the corresponding absorption probabilities are shown in Figure 4. Note that the modified method computes 9 generators at the beginning, and then no more for the whole iteration. Since in this example approximately 45 steps are needed that the gradient of the objective function falls under 10 −3 (the step sizes are constant, γ k = 3), the modified method has a notable advantage over the standard one. For the examples presented here the computation of the discrete generator was always by far the computationally most expensive step, however for finer partitions, and especially in higher system dimensions, the computational costs of the solution of the linear equations may dominate. These computations can be carried out more quickly by hierarchical GMRES techniques (see [Kol10], Section 5.7.4), so that the modified discretization for the case of linear dependence on the parameters still induces a considerable speed-up in the optimization against the standard discretization (4).

Conclusions
We have proposed a method for the optimization of the global stability properties of time-continuous autonomous parameter-dependent systems. It uses an approximation of the deterministic dynamics by a Markov jump process (essentially the spatial discretization of the upwind method). The main computational properties of the method are that no trajectory simulation is needed, and that the computation of the objective function and of its derivative is simple and consist mostly of explicit steps. If the system is affine-linear in the parameters, a considerable speed-up is achieved by a slight modification.
A quantitative performance analysis of the method, in particular convergence statements, are subject of future work, just as the incorporation of adaptivity. The advantages of combining the basic idea with some more sophisticated optimization approaches are also to be investigated.