Towards Optimal Spatio-Temporal Decomposition of Control-Related Sum-of-Squares Programs

This paper presents a method for calculating the Region of Attraction (ROA) of nonlinear dynamical systems, both with and without control. The ROA is determined by solving a hierarchy of semidefinite programs (SDPs) defined on a splitting of the time and state space. Previous works demonstrated that this splitting could significantly enhance approximation accuracy, although the improvement was highly dependent on the ad-hoc selection of split locations. In this work, we eliminate the need for this ad-hoc selection by introducing an optimization-based method that performs the splits through conic differentiation of the underlying semidefinite programming problem. We provide the differentiability conditions for the split ROA problem, prove the absence of a duality gap, and demonstrate the effectiveness of our method through numerical examples.


INTRODUCTION
This work addresses stability and reachability analysis of controlled nonlinear dynamical systems. The method used in this paper assesses these properties by calculating the Region of Attraction (ROA) of a given target set. The reachable set is obtained by reversing time.
The majority of currently used methods deal with autonomous systems, where the approximation of the ROA is obtained from level sets of Lyapunov functions (see, e.g., [1]). In the case of polynomial systems, one can obtain the Lyapunov functions by solving a series of semidefinite programs (SDPs), similar to our method. However, these approaches do not consider control systems and are limited to the equilibria of the considered systems.
This paper builds on the works [2] and [3], which provide converging outer approximations of the ROA for controlled polynomial systems with respect to a given target set, not necessarily an equilibrium. The original version of this method, introduced in [2], calculates the ROA by transforming the problem into an infinite-dimensional Linear Program (LP) on measures, whose dual is then relaxed into a problem on sum-of-squares polynomials of a given degree, which can be equivalently written as an SDP.
The unsolved challenge of that approach was the rapid growth of the size of the SDP with respect to the degree of the used polynomials. The work [3] provided a remedy for this issue in the form of a spatio-temporal decomposition of the problem and its variables. The new problem was obtained by splitting the time and state space into several smaller subsets, resulting in multiple smaller, interconnected SDPs of lower complexity. The work provided significant improvement in terms of time and memory demands, but left one question unanswered: How to split the time and state space?
This work is a step toward answering this question by proposing an optimization-based method for finding the split configuration. We use the recent work on conic differentiation [4] to obtain a gradient of the related sum-of-squares problem with respect to the parametrization of the splits (e.g., the split positions). The parameters are then optimized by a first-order method. We provide the conditions for differentiability of the split problem and show the effectiveness of the method on the examples used in [3].
This work can be seen as complementary to sparsity-based approaches for complexity reduction of SDP-based methods [5, 6, 7] and could be combined with them. The same philosophy of splitting the state space can be applied to other problem classes amenable to these methods, such as invariant set [8] or invariant measure computation [9], extreme values [10], or partial differential equations [11].
Structure of the paper Section 2 defines the Region of Attraction (ROA) and motivates this work. Section 3 presents the split ROA problem and the conditions for strong duality. The differentiation is described in Section 4 along with the proof of differentiability. The numerical results are in Section 5 and we conclude in Section 6.
Notation The set of consecutive integers from 1 to n is denoted by Z n . The Lebesgue measure (the volume) of a set A is denoted by λ(A).

PROBLEM STATEMENT
We shall focus on the problem of calculating the region of attraction (ROA) of a controlled nonlinear dynamical system where x(t) ∈ R n is the state vector, u(t) ∈ R m is the control input vector, t is time, T > 0 is the final time, and f is the vector field, which is assumed to be polynomial in the variables x and u. The state and control inputs are constrained to lie in basic semialgebraic sets where g U j (u), g X j (x), and g XT j (x) are polynomials. The Region of Attraction (ROA) is then defined as where "a.e." stands for "almost everywhere" with respect to the Lebesgue measure and L([0, T]; U) denotes the space of measurable functions from [0, T] to U.
The work [2] introduced an algorithm for calculating an outer approximation of the ROA, based on relaxing a polynomial optimization problem into a semidefinite program via sum-of-squares (SOS) techniques. In [3], we improved the accuracy of the SOS relaxation by discretizing (or splitting) the time and state space, thus allowing for tighter approximations with favorable scaling properties. It was, however, unclear how to perform this splitting to obtain optimal results, which is the focus of this paper.
We build upon the algorithm from [3], which increased the accuracy by splitting the sets X and [0, T] into multiple subsets, and consider the positions of the splits as parameters that are then optimized using gradient descent. The following example shows the advantage of splitting the problem and then optimizing the splits.

Motivation example
Let us motivate the idea of optimizing the split positions. Consider a simple double integrator system. Figure 1 shows the difference between the original approach from [2] (salmon), the improved version from [3] (purple), and the version provided in this paper (blue). The real ROA is depicted in green. We can see that each version provides a significant improvement in accuracy. The most notable detail about the ROA calculated by the proposed algorithm (blue) is that it has exactly the same memory demands as the ROA calculated via the method of [3] (purple), which provides a substantially worse estimate.
FIGURE 1 ROA approximations of the double integrator. The salmon approximation was obtained with degree 4 polynomials and no splits [2], the purple approximation with degree 4 polynomials and 4 equidistantly placed splits [3], and the blue approximation with degree 4 polynomials and 4 splits with optimized positions. The green set is the ground truth. The memory demands for blue and purple are exactly the same.

DISCRETIZING TIME AND STATE SPACE
Although this work is mainly concerned with the SDP formulation of the ROA problem, we will first introduce the polynomial problem in order to give meaning to the parameters, which will be optimized through the SDP formulation.
We discretize the time axis as where and the state space as where K ≥ 2 is the number of time splits (including the boundaries) and I is the number of closed subsets X i . In this work, we assume that the subsets X i are created by splitting the state space into axis-aligned boxes. The interiors of the sets X i therefore do not intersect, but the sets can share a zero-volume boundary; the boundary between two neighboring cells X a and X b is X a ∩ X b . The time splits are the scalars T k , excluding the fixed boundary terms; we denote this set as We define the set of all neighboring subsets X i as The set of state-space parameters is denoted as θ X and depends on the chosen parametrization of the splits. In this work, the state space is divided by hyperplanes which are formed by splitting the axes into intervals; this restricts the X i 's to axis-aligned boxes as mentioned earlier. The locations of the splits are the parameters in θ X . An illustration is provided in Figure 2.
FIGURE 2 Illustration of the split sets X i , the set boundaries x a,b , their normal vectors h a,b , and the parameter set θ X , for which we get θ X = {ρ 1 , ρ 2 } in this example.
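The axis-aligned partition described above can be sketched in a few lines. This is a minimal illustration under the stated assumption that the splits θ X are given as sorted interior positions per axis; the helper `make_boxes` is hypothetical and not the authors' implementation.

```python
import itertools
import numpy as np

def make_boxes(lims, splits):
    """Build axis-aligned boxes X_i from per-axis split positions.

    lims   : list of (lo, hi) bounds of the state space per axis
    splits : list of sorted interior split positions per axis (theta_X)
    Returns a list of boxes, each a list of (lo, hi) intervals per axis.
    """
    # per-axis grid lines: the box edges are the bounds plus the interior splits
    edges = [np.concatenate(([lo], s, [hi])) for (lo, hi), s in zip(lims, splits)]
    # per-axis intervals between consecutive grid lines
    ivals = [list(zip(e[:-1], e[1:])) for e in edges]
    # Cartesian product of the per-axis intervals gives the boxes X_i
    return [list(box) for box in itertools.product(*ivals)]

# one split on the first axis, two on the second: 2 x 3 = 6 boxes
boxes = make_boxes([(-1.0, 1.0), (-1.0, 1.0)],
                   [np.array([0.2]), np.array([-0.4, 0.0])])
```

Neighboring boxes produced this way share exactly one face, which is the zero-volume boundary X a ∩ X b used in the constraints below.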
We shall now describe the steps required to turn the problem of finding the ROA into a semidefinite program (SDP). We will describe the polynomial formulation, then relax it via SOS, and then reformulate it as an SDP.

Polynomial formulation
The outer approximation of the ROA X 0 can be obtained as a super-level set X0 := {x : v(0, x) ≥ 0} of a piece-wise polynomial which is a solution to the problem where the polynomials w i and v i,k are the problem variables and h a,b is the normal vector of the boundary X a ∩ X b , which is assumed to be element-wise positive. For more insight, we refer the reader to [3]. The problem is parametrized by the time splits θ T and the state splits θ X (which define the boundaries X a ∩ X b ). Let us define the vector of the parameters as

Sum-of-squares relaxation
The SOS approximation of (10) reads where z = [t, x, u] ⊤ , w i (x) and v i,k (t, x) are polynomials, w i is a vector of coefficients of w i (x), and l i is a vector of Lebesgue measure moments indexed with respect to the same basis as the coefficients of w i . The decision variables in the problem are the polynomials v i,k and w i as well as the sum-of-squares multipliers q, s and s. The symbols g X i , g XT i , g U i and g τ k denote the column vectors of polynomials describing the sets X i , X T ∩ X i , U and [T k , T k+1 ], in that order. The degrees of all polynomial decision variables are chosen such that the degrees of all polynomials appearing in (12) do not exceed a given relaxation order d, which is a design parameter controlling the accuracy of the approximation.
The outer approximation of the ROA of degree d is defined by the piece-wise polynomial v d as It is worth mentioning that if some splits coincide and create a "degenerate" subset X d , the subset will simply lose dimensions; in the most extreme case it will become a point, which is still a valid semialgebraic set and no special treatment is necessary.

Strong Duality
We shall now show that the splitting does not break the strong duality property of the original, non-split ROA problem. The infinite-dimensional setting is addressed later in Theorem 1. Here we focus on the strong duality of the relaxed problem (12), for which we use the general proof from [12]. To use the results from [12], we need to show that our problem can be written as the augmented version of the generalized moment problem (GMP) where M(K i ) + is a set of positive measures supported on a set K i , A and B are index sets, a α and b β are scalars, is a vector of polynomials, and similarly for Ψ β and c. The vector notation is to be understood as The primal of (10) reads with decision variables µ i Tk , μi 0 , µ i k , and µ a∩b k . The normal vector h a,b of the boundary X a ∩ X b is element-wise positive. The set N in Xi contains the indices of neighbours of X i such that the normal of their common boundary h •,i points to X i ; similarly for N out Xi in the opposite direction. For our specific case of splits along the axes and element-wise positive normal vectors, we can write The blue area is the set X blue = i∈E(ha,b,a) X i and both the red and blue areas together are the set Considering polynomial test functions Ψ 1 = Ψ 1 (x) and Ψ 2 = Ψ 2 (t, x), we can write the equality constraints of (16) as and Since ∫ Ψ 1 dλ is a constant and f is assumed to be polynomial, we can see that (16) fits the general description (14).
To use the results from [12] for showing strong duality of the relaxation (12), we must satisfy the following two assumptions. Assumption 1. The description of the sets K i in (14) contains the ball constraint Although the problem (10) does not contain the ball constraints in the presented form, they can be easily added without changing the optimal value, so as to ensure the strong duality of the problem.
Note that if the subset K i is a hypercube, it is usually described by g Ki,p (x) = R 2 p - ||x p || 2 for all p = 1, . . ., n; in this case, the redundant ball constraint is not needed.
Lemma 1. If Assumption 2 is satisfied, the set X is connected, and h ⊤ a,b f is nonzero on the splits, then the feasible set of (16) is bounded, i.e., there exists a constant C > 0 such that ∫ 1 dµ < C for all measures µ appearing in (16).
Proof. To prove that all masses are bounded, we first sum the equations of (16) over time and space, which eliminates the additional boundary measures introduced by the time and space splits, respectively. This gives us a similar situation as in the original work [2] and simplifies the proof for the boundary measures. We shall denote ∫ 1 dµ as mass(µ).
The first constraint µ i T1 + μi 0 = λ implies that p ⋆ s is bounded and µ i T1 and μi 0 are bounded. By summing the second constraint over all i and k we obtain A test function Ψ(t, x) = 1 then gives us i mass(µ i TK-1 ) = mass(µ i T1 ), (21) meaning that µ i TK-1 are bounded. With a test function Ψ(t, x) = t we obtain which shows that µ i k are bounded. Let us sum the second constraint over all i for some time k = s with a test function Since µ i T1 are bounded, µ i T2 are bounded as well; by induction all µ i Ts are bounded. To show the boundedness of the remaining measures µ a∩b k , let us first introduce the following sets of indices: By summing the second constraint over the indices P(h, i), we will be left only with the boundary measures µ •∩: k corresponding to the normal vectors in the direction of h; their indices are characterized by the set E(h, i). For convenience, we also define the set of tuples which contains the set indices E(h, i) and the indices of their connecting neighbours in the direction h. See Fig. 3, where all the normal vectors have the same direction, i.e., h i,j = h a,b . The normal vectors with directions different from h a,b have been summed out, since they all appear in P(h a,b , a) exactly twice with opposite signs (once as an incoming and once as an outgoing vector). We can rewrite the equation (28) as where µ E s = (i,j)∈En(ha,b,a) Since the top part of (29) is bounded, the integral ∫ h ⊤ a,b f dµ E s is also bounded. The measure µ E s is supported on an intersection of X and a hyperplane with normal vector h a,b . Using the assumptions that X is connected and that h ⊤ a,b f is nonzero on the support of µ E s , we can conclude that µ E s is bounded.
This implies boundedness of all the measures µ i∩j s from (30), since they all have the same sign. Due to the construction of the set E(h a,b , a), the measure µ a∩b s is trivially one of the measures µ i∩j s and is therefore bounded. This procedure can be repeated for all the measures µ a∩b k . We can now conclude that, under the stated assumptions, the feasible set of (16) is bounded.
Theorem 1. If Assumption 2 holds, then there is no duality gap between (10) and (16), i.e., d ⋆ s = p ⋆ s . Proof. The proof is based on the classical infinite-dimensional LP duality theory result [13, Theorem 3.10], following the same arguments as in [2, Theorem 2]. The key ingredients to these arguments are the boundedness of masses established in Lemma 1 and the continuity of the operators L ′ appearing in (16), which holds trivially.
Lemma 2. The SOS relaxation (12) of (10) and its dual, the moment relaxation (not presented) of (16), have zero duality gap if Assumptions 1 and 2 hold.
Proof. The proof follows directly from [12, Proposition 6], where the only missing part is the boundedness of the masses of the relaxed problem, which follows from the boundedness of the masses of (16), since the proof of the boundedness of the masses of (16) in Lemma 1 uses only constant or linear test functions.

SDP DIFFERENTIATION
We will consider the SDP in the form of a primal-dual pair, parametrized by θ. To stay consistent with the usual notation, we shall abuse ours and use x as a vector of decision variables in the context of semidefinite programming.
with variables x ∈ R n , y ∈ R m , and s ∈ R m and with data A ∈ R m×n , b ∈ R m , and c ∈ R n . We can assume strong duality due to Lemma 2, and therefore p ⋆ = -d ⋆ . The KKT conditions are Notice that the conic constraints do not depend on θ; this reflects the fact that neither the number of the split regions X i nor the number of the time splits T k changes. We can describe the (primal) SDP concisely as a function of θ as where D(θ) is a shorthand for all the program data depending on θ. The goal of this paper is to find a (sub)optimal set of parameters θ ⋆ such that Note that we are dealing with multiple meanings of the optimal value p ⋆ ; let us clarify that p ⋆ (θ) is the minimal objective value of (32) for some parameters θ, while p ⋆ (θ ⋆ ) is the minimal objective value for the optimal parameters θ ⋆ , which is what we are after. In the context of the ROA, p ⋆ (θ ⋆ ) corresponds to the ROA with optimal splits. We shall tackle (35) by assuming differentiability of S, which will be rigorously proven later, in Lemma 3. Assuming the gradient exists, we can search for θ ⋆ via a first-order method, which iteratively updates the parameters θ by using the gradient of S as for some initial guess θ 0 (e.g., obtained from the recommendations in [3]) and a stepsize γ. The stepsize can be a function of k and/or some internal variables of the concrete gradient descent algorithm. The following section will present two ways of calculating ∇S(D(θ)).
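The iterative first-order update just described can be sketched as plain gradient descent (the experiments in Section 5 use ADAM instead). Both `optimize_splits` and the quadratic toy gradient below are hypothetical stand-ins for the split parameters θ and for ∇S(D(θ)); in the real pipeline each gradient evaluation involves a full SDP solve and differentiation.

```python
import numpy as np

def optimize_splits(theta0, grad_S, gamma=0.05, iters=100):
    """Plain first-order update theta <- theta - gamma * grad_S(theta),
    recording the parameter path across iterations."""
    theta = np.asarray(theta0, dtype=float)
    path = [theta.copy()]
    for _ in range(iters):
        theta = theta - gamma * grad_S(theta)
        path.append(theta.copy())
    return theta, path

# toy stand-in for the gradient of S: quadratic bowl with minimizer at 0.3
theta, path = optimize_splits([0.0], lambda t: 2.0 * (t - 0.3))
```

The recorded `path` corresponds to the parameter paths θ d evaluated in Section 5.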
The parameter θ is considered to be a column vector of size n θ . The perturbation of the vector θ in direction k is where e k is the kth vector of the standard basis, containing 1 at the kth coordinate and 0 everywhere else. The scalar ϵ k is the perturbation size.
The object D(θ) is to be understood as a vector of all the SDP data, for example where we dropped the dependence on θ to lighten the notation.

4.1 Methods for finding the derivative

Finite differences
The estimate of the derivative of (34) at the point θ in the direction e k is where the step ϵ f is a free parameter. The gradient is then estimated as
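A minimal sketch of this estimator follows, with a cheap toy function standing in for the solution map p⋆(θ); in the actual method each evaluation of `p_star` is a full SDP solve, which is what makes this approach expensive for many parameters.

```python
import numpy as np

def fd_gradient(p_star, theta, eps=1e-4):
    """Forward-difference estimate of grad p*(theta):
    one extra evaluation of the map per parameter."""
    theta = np.asarray(theta, dtype=float)
    p0 = p_star(theta)
    g = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = eps                       # perturb only the k-th coordinate
        g[k] = (p_star(theta + e) - p0) / eps
    return g

# toy stand-in for the SDP optimal value map
g = fd_gradient(lambda t: t[0]**2 + 3.0 * t[1], np.array([1.0, 2.0]))
```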

Analytical derivative
The gradient of S can be written by the chain rule as The first factor dS/dD signifies how the problem solution changes with respect to the input data. This problem has been tackled in [4] for general conic programs; here we shall address some specific issues tied to SOS-based SDPs.
The second factor dD/dθ shows how the problem data changes with the parameters θ. Modern tools (YALMIP [14], [15], GloptiPoly [16], the Sum of Squares Programming package for Julia [17], [18]) allow the user to directly write the polynomial problem (10) or its SOS representation, alleviating the need for constructing the problem data (A, b, c) directly. Despite the undeniable advantages this abstraction brings, it makes it more difficult to work on the problem data directly, since these parsers are usually not created to be autodifferentiable. For this reason, we shall estimate the derivative dD/dθ numerically, striking a tradeoff between convenience and accuracy.
For example, the sensitivity to θ k is obtained as where P k ϵ = {-ϵ d , . . ., ϵ d } is a set of perturbation step sizes of the parameter θ in the direction k. Each evaluation of D is to be understood as a call to one of the aforementioned programming tools, which constructs (32) from (10).
The following subsection will explain how to obtain the derivative dS/dD.

Obtaining dS/dD
This subsection summarizes the approach of [4] and [19], while focusing on the specific case of SOS-based SDPs. The generic approach presented in [4] is not immediately usable for our concrete problem; therefore we shall provide some remedies in the following subsections. We shall first quickly review the generic approach of [4]. In the following text, Π A shall denote the projection onto the set A and Π the projection onto R n × K ⋆ × R + . We shall also drop the dependence on θ to lighten the notation. Lastly, we abuse our notation again and use v and w to denote vectors corresponding to the primal-dual solution of (32) in the context of semidefinite programs.
The derivative of the solution can be written as where the primal-dual derivatives are obtained as where the variables u, v, and w are related to the solution by Note that w is a normalization parameter which is in our case always equal to 1, and thus not necessary; we only keep it to stay consistent with [4]. The meaning and other possible values of w are explained in [19]. The derivative of z is obtained as the solution to where M = ((Q - I) dΠ(z) + I)/w and g = dQ • Π(z/|w|). Note that the matrix M depends only on the current solution, not on the perturbations; we shall exploit this later in this section. The matrices Q and dQ are defined as Let us now write the relevant cone projections and their derivatives: where X = UΛU ⊤ is the eigenvalue decomposition of X, i.e., Λ is a diagonal matrix of the eigenvalues of X and U an orthonormal matrix. The matrix Λ + is obtained as max(Λ, 0), element-wise. Finally, the derivative of Π S+ at a non-singular point X in a given direction is where • is the element-wise product and where k is the number of negative eigenvalues of X, and U is chosen such that the eigenvalues λ i in the diagonal matrix Λ in the decomposition X = UΛU ⊤ are sorted in increasing order, meaning that the first k eigenvalues are negative.
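The PSD-cone projection and its directional derivative can be sketched as below. This is an illustrative reimplementation, not the paper's code: the derivative is written in the equivalent divided-difference (Daleckii-Krein) form for the spectral function f(λ) = max(λ, 0), valid at a nonsingular X with distinct eigenvalues.

```python
import numpy as np

def proj_psd(X):
    """Projection onto the PSD cone: clip negative eigenvalues to zero."""
    lam, U = np.linalg.eigh(X)
    return (U * np.maximum(lam, 0.0)) @ U.T

def dproj_psd(X, Xdot):
    """Directional derivative of the PSD projection at a nonsingular X
    with distinct eigenvalues, via divided differences of f(lam) = max(lam, 0):
    dPi(X)[Xdot] = U (Gamma o (U^T Xdot U)) U^T."""
    lam, U = np.linalg.eigh(X)             # eigenvalues in increasing order
    f = np.maximum(lam, 0.0)
    L = lam[:, None] - lam[None, :]
    np.fill_diagonal(L, 1.0)               # avoid 0/0 on the diagonal
    Gamma = (f[:, None] - f[None, :]) / L  # (f(l_i) - f(l_j)) / (l_i - l_j)
    np.fill_diagonal(Gamma, (lam > 0).astype(float))  # f'(l_i) off the kink
    return U @ (Gamma * (U.T @ Xdot @ U)) @ U.T
```

At a singular X (some eigenvalue exactly zero) the projection is not differentiable, which is exactly the case excluded by Lemma 3 below.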

Exploiting problem structure
The most demanding task in this approach is solving for dz. The paper [4] suggests the use of LSQR [20] instead of a direct solve via factorization when the matrix M is too large to be stored in dense form. Luckily, we can also factorize the matrix M in its sparse form, via free packages such as SuiteSparse [21] (the default factorization backend in MATLAB [22] and Julia [23]), Intel MKL Pardiso [24], and MUMPS [25], [26]. Moreover, recall that we have n θ parameters and have to solve (57) in n θ directions. Since the matrix M does not depend on the perturbed data, we need to factorize it only once to solve (57) for all n θ directions of θ.
In this paper, we factorize M by the QR factorization [27], [28] as M = QR, where Q is orthonormal and R is upper triangular. The equation (57) then becomes R dz = Q ⊤ g, which is solved by backward substitution due to R being a triangular matrix.
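The reuse of a single factorization across all n θ right-hand sides can be sketched as follows; this dense NumPy/SciPy version is only illustrative (the paper uses sparse factorizations via SuiteSparse), and the random, well-conditioned M and the stacked right-hand sides G are assumed placeholders.

```python
import numpy as np
from scipy.linalg import solve_triangular

def solve_many(M, G):
    """Factorize M = QR once, then solve M dz = g for every column g of G
    with one matrix multiply and one triangular back-substitution per column."""
    Q, R = np.linalg.qr(M)                            # O(n^3), done once
    return solve_triangular(R, Q.T @ G, lower=False)  # cheap per RHS

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50)) + 50.0 * np.eye(50)  # well-conditioned toy M
G = rng.standard_normal((50, 6))                       # n_theta = 6 directions
dZ = solve_many(M, G)
```

This mirrors why QR wins in Figures 4 and 5: LSQR pays its full iterative cost once per direction, while here only the cheap triangular solve is repeated.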
All the aforementioned methods (Finite differences, LSQR, and QR) are compared in Section 4.3.

Conditions of differentiability
As mentioned above, the analytical approach for obtaining the derivative is preferable to finite differences. However, the analytical approach also assumes differentiability of the ROA problem with respect to the split positions, which is proved in the following Lemma. We use the notion of genericity from [29, Definition 19]. We call a property P of an SDP generic if it holds for Lebesgue almost all parameters (A, b, c) of the SDP; in other words, the property fails to hold only on a set of zero Lebesgue measure. Concretely, we will use the genericity of uniqueness [29, Theorems 7, 10, and 14] and strict complementarity [29, Theorem 15] of the primal-dual solutions to (32).
Lemma 3. The mapping from the split positions to the infimum of the SOS relaxation (10) is differentiable at a point θ if Assumptions 1 and 2 hold and the primal-dual solution of (32) is unique and strictly complementary for the problem data D(θ).
Proof. The conditions of differentiability according to [4] are uniqueness of the solution and differentiability of the projection Π of the vector z, needed for the construction of (46). Since uniqueness is assumed, only the projection Π needs to be investigated. Assuming (x, y, s) to be the optimal primal-dual solution, the projection Π(z) can be written as where Π R n is differentiable everywhere and Π R+ is also differentiable since we are at the solution with w = 1. The only cause for concern is Π K ⋆ , where K ⋆ is a product of the positive semidefinite cone S + and the free/zero cone. Therefore Π K ⋆ is differentiable if and only if Π S+ is differentiable. Let us denote the semidefinite parts of y and s as matrices Y and S, respectively. The matrices Y and S commute, since YS = SY = 0. They also share a common set of eigenvectors Q such that Q ⊤ Q = I, making them simultaneously diagonalizable as where Λ Y and Λ S are diagonal matrices with the eigenvalues on the diagonal. The product YS can then be written as and for the i th eigenvalue we get the condition Strict complementarity of the SDP solution means that the ranks of Y and S sum up to full rank. Taking the sum, we can write and since Λ Y and Λ S are diagonal, we can claim that otherwise the rank of Y + S would decrease. By putting (65) and (67) together, we conclude that of the i th eigenvalues λ i s and λ i y , one has to be zero and the other nonzero. This implies that the matrix Y - S is not singular and thus the projection Π S+ (Y - S) is differentiable; therefore the whole SDP (32) is differentiable.
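The eigenvalue argument in the proof can be illustrated numerically on assumed toy data (not from the paper): if Y and S share an eigenbasis and their eigenvalues are strictly complementary (entrywise λ y λ s = 0 with λ y + λ s > 0), then YS = 0 while Y - S has no zero eigenvalue.

```python
import numpy as np

# shared orthonormal eigenbasis (simultaneous diagonalizability)
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((4, 4)))
lam_y = np.array([1.5, 0.7, 0.0, 0.0])  # rank 2
lam_s = np.array([0.0, 0.0, 2.0, 0.3])  # rank 2; ranks sum to full rank 4
Y = (Q * lam_y) @ Q.T
S = (Q * lam_s) @ Q.T
# lam_y * lam_s = 0 entrywise (complementary slackness) and
# lam_y + lam_s > 0 entrywise (strict complementarity), hence
# Y @ S = 0 and every eigenvalue of Y - S is nonzero.
```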

Comparison of differentiation approaches
Figure 4 shows the scaling of the proposed methods with an increasing number of parameters, and Figure 5 investigates their scaling with the degree of the approximation. We see that using the QR factorization to solve (46) clearly outperforms both LSQR, suggested in [4], and the finite differences of Section 4.1.1. QR is preferable since the factorization is done only once for all parameters, whereas LSQR needs to solve (58) n θ times; similarly, finite differences solve the ROA problem for each parameter individually.
The concrete software packages used are Krylov.jl [30] for LSQR and SuiteSparse [21] for the QR factorization, both used through their interfaces to the programming language Julia [23].
FIGURE 4 Computation time needed to obtain derivatives for the degree 6 double integrator with respect to the number of parameters. The LSQR method always reached the maximum number of iterations, which was set to 1000. The times for LSQR and QR include the cost of obtaining the matrices M and g in (46).
FIGURE 5 Computation time needed to obtain derivatives for the double integrator with 6 parameters and increasing degree. The LSQR method always reached the maximum number of iterations, which was set to 1000. The times for LSQR and QR include the cost of obtaining the matrices M and g in (46).

NUMERICAL EXAMPLES
This section presents the optimization results for the double integrator and the Brockett integrator. The section is divided into the optimization of low-degree and high-degree problems; the degree is to be understood as the degree of the polynomial variables in (10). All of the examples use ADAM [31] as the first-order method for the optimization; the stepsize of ADAM was 0.05 and the decay rates were 0.8 and 0.9. The results were obtained on a computer with a 3.7 GHz CPU and 256 GB RAM.
To simplify the following, let us define θ d as the parameter path obtained by optimizing degree d problems, i.e., θ d contains the split locations of each iteration of the optimization algorithm. Similarly, p ⋆ d (θ) will denote the vector of optimal values for the degree d approximation calculated along θ. For example, p ⋆ 6 (θ 4 ) denotes a vector of optimal values of the degree 6 approximation, evaluated on parameters obtained from optimizing the split locations of a degree 4 approximation.

5.1 Low degree

Double integrator
First, we consider the double integrator example from [2, 9.3], with X T = {0} and T = 1. Figure 6 shows the results for the optimization of the split locations with degree 4 polynomials. The initial conditions were equidistantly placed split positions. The dotted black line shows the estimated global optimum, which was attained by sampling the parameter space on a grid of square cells with sizes of 0.1. A total of 25116 unique split positions were evaluated in 23 hours.
The optimizer attained by the proposed method was θ ⋆ = [-0.059, 0.070, -0.017, 0.015] ⊤ , while the global estimate was θ ⋆ g = [0, 0.2, -0.4, 0] ⊤ . The obtained optimum improves the initial guess by 58% with respect to the global optimum estimate, and it was found in 2 minutes, whereas the global estimate took 23 hours.
Figure 7 shows a 1-dimensional line segment parametrized by r ∈ [0, 2], connecting the attained solution θ ⋆ (r = 0) to the global estimate θ ⋆ g (r = 1). We see that in this particular direction, the optimum is quite sharp, which makes it difficult to find precisely using gridding.
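Such a slice amounts to evaluating the objective along a segment between the two parameter vectors. A sketch follows, with a hypothetical stand-in objective; in the real experiment each evaluation of `p_star` solves the split SDP (12) at the interpolated split positions.

```python
import numpy as np

def slice_between(theta_a, theta_b, p_star, rs):
    """Evaluate the objective along theta(r) = (1 - r)*theta_a + r*theta_b,
    so r = 0 is theta_a, r = 1 is theta_b, and r > 1 extrapolates past it."""
    theta_a, theta_b = np.asarray(theta_a), np.asarray(theta_b)
    return [p_star((1.0 - r) * theta_a + r * theta_b) for r in rs]

vals = slice_between([-0.059, 0.070, -0.017, 0.015],  # attained optimizer
                     [0.0, 0.2, -0.4, 0.0],           # global estimate
                     lambda t: float(np.sum(t**2)),   # toy stand-in objective
                     np.linspace(0.0, 2.0, 5))
```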

Brockett integrator
The Brockett integrator is defined according to [32] as where X = {x ∈ R 3 : ||x|| ∞ ≤ 1}, X T = {0}, U = {u ∈ R 2 : ||u|| 2 ≤ 1}, and T = 1. This system usually serves as a benchmark for nonholonomic control strategies because it is one of the simplest systems for which there exists no continuous control law that would make the origin asymptotically stable [32].
Figure 8 shows the optimization results for the degree 4 approximation. The estimate of the global optimum was attained by sampling the parameter space on a grid with cell size 0.1. Furthermore, we assumed that the splits are symmetrical along all three axes, making the search space 3-dimensional. The computation time of the sampling was 16 hours over 1000 unique split positions. Without the symmetry assumption, the computation would be intractable, as the full search space has 6 dimensions. The optimizer attained by our method was θ ⋆ = [0.011, -0.011, 0.011, -0.011, 0.004, -0.004] ⊤ , while the global estimate was θ ⋆ g = [0, 0, 0, 0, 0, 0] ⊤ . Both minimizers are quite close in this case. The found optimum improves the initial guess by 62% with respect to the global optimum estimate, and it was found in 30 minutes, whereas the global estimate with the symmetry assumption took 16 hours. A brute-force search in the whole parameter space would take roughly 12 years on the same computational setup.
FIGURE 8 Brockett integrator: degree 4 polynomials and 6 splits. The volume of the ROA approximation is shown in the top plot. The bottom plot shows how the split positions evolved during the optimization process. There were 3 splits for each state variable. The black-and-yellow dot represents the attained minimum. We see that all the splits were close to the origin at the minimum.
FIGURE 9 Brockett integrator: degree 4 polynomials and 6 splits. Slice of the value function between the attained optimum (r = 0) and the estimated global optimum (r = 1).

High degree
This section contains optimization results for the same systems but with a higher degree of the SOS approximation. We do not provide the estimates of the global minima, because they would take very long to compute. The high-degree problems suffered from numerical difficulties, and their optimization was more demanding than in the low-degree case.
We provide two plots for each system. The first shows the application of the same method directly on the high-degree system. The second plots the objective values of the high-degree system for the parameter paths obtained from the low-degree optimization; the low-degree problems were optimized first, and the high-degree system was simply evaluated along their parameter paths. Given the positive results, this technique could be a possible measure to circumvent the inherently bad numerical conditioning of high-degree SOS problems.

Double integrator
Figure 10 shows the results for the double integrator with degree 8 polynomials. We see that the path of the objective values is not as smooth as in the low-degree case.
Figure 11 shows the objective values with degree 8 polynomials while using the parameter paths obtained with degree 4 and 6 polynomials. We see that we can obtain almost the same optimal values much faster (120 times faster for the degree 4 path and 24 times faster for the degree 6 path).
FIGURE 10 Double integrator: degree 8 polynomials and 4 splits. The volume of the ROA approximation is shown in the top plot. The bottom plot shows how the split positions evolved during the optimization process. There were two splits for each state variable. The black-and-yellow dot represents the attained minimum.

Brockett integrator
Figure 12 shows the results for the Brockett integrator with degree 6 polynomials. Again, we see that the objective path is not as smooth as in the low-degree case.
Figure 13 shows the results from using the split parameter path obtained with degree 4 polynomials.We see that in this case, the found minimum was improved by approximately 55% compared to the initial guess.

CONCLUSION
We have presented a method for optimizing the split locations for ROA computation via conic differentiation of the underlying SDP problem and first-order methods. We have adopted a differentiation method for multi-parametric SDPs and improved its scaling by a considerable margin, as can be seen in Figures 4 and 5. In Section 5, we demonstrated the viability of the method by optimizing the split ROA problem with an off-the-shelf first-order method without any problem-specific tuning. We have improved the objective values by roughly 60% across the presented examples, where 100% would mean attaining the (estimated) global optimum.
FIGURE 11 Double integrator: degree 8 polynomials and 4 splits, split parameters computed with lower degree polynomials. The computation times per iteration were 7.91 s, 41.6 s, and 990.5 s for degrees 4, 6, and 8, in this order. We can see that all the trajectories reach similar optimal values, while the lower-degree ones were calculated significantly faster.
FIGURE 12 Brockett integrator: degree 6 polynomials and 6 splits. The volume of the ROA approximation is shown in the top plot. The bottom plot shows how the split positions evolved during the optimization process. There were 3 splits for each state variable. The black-and-yellow dot represents the attained minimum.
Finally, we have discussed the possibility of saving time and avoiding numerical issues of high-degree approximations by using optimal solutions from low-degree approximations as starting points.
Possible generalizations of the presented method include using non-axis aligned splits or, more generally, a suitably parametrized semialgebraic partition of the constraint set.Another avenue for future research is merging the proposed approach with sparsity exploiting methods of [5,6,7], which are complementary means of reducing the computational complexity of the moment-sum-of-squares approaches for dynamical systems.
for graphical representation. Let us fix the time as k = s and focus only on space. We shall show the boundedness of a measure µ a∩b s which corresponds to the normal vector h a,b . Let us sum the equations corresponding to time s and the space indices P(h a,b , a); with a test function Ψ(t, x) = 1 we obtain 0 + i∈P(ha,b,a) mass(µ T i s+1 ) - mass(µ T i s ) + (i,j)∈En(ha,b,a)

FIGURE 6 Double integrator: degree 4 polynomials and 4 splits. The optimal values of the SDP relaxation of the ROA are shown in the top plot. The bottom plot shows how the split positions evolved during the optimization process. There were two splits for each state variable. The black-and-yellow dot represents the attained minimum. The estimated global optimum is shown by a black dotted line.
FIGURE 7 Double integrator: degree 4 polynomials and 4 splits. Slice of the value function between the attained optimum (r = 0) and the estimated global optimum (r = 1).