Formal synthesis of closed-form sampled-data controllers for nonlinear continuous-time systems under STL specifications

We propose a counterexample-guided inductive synthesis framework for the formal synthesis of closed-form sampled-data controllers for nonlinear systems to meet general STL specifications. Rather than stating the STL specification for a single initial condition, we consider an (infinite) set of initial conditions. Candidate solutions are proposed using genetic programming, which evolves controllers based on a finite number of simulations. Subsequently, the best candidate is verified using reachability analysis; if the candidate solution does not satisfy the specification, an initial condition violating the specification is extracted as a counterexample. Based on this counterexample, candidate solutions are refined until eventually a solution is found. The resulting sampled-data controller is expressed as a closed-form expression, enabling the implementation in embedded hardware with limited memory and computation power. The effectiveness of our approach is demonstrated for multiple systems.


Introduction
Recent years have seen a surge in interest in controller synthesis for temporal logic specifications, realizing complex behavior beyond traditional stability requirements, see, e.g., the recent literature survey in [1].Originally stemming from the field of computer science, temporal logic has been used to describe the correctness of complex behaviors of computer systems [2].As it originally dealt with finite systems, (bi-)simulation approaches have been proposed to abstract infinite systems to finite systems [3,4].However, as a downside, these approaches (e.g., [5][6][7][8]) typically suffer from the curse of dimensionality and return controllers in the form of enormous lookup tables [9].
Where certain temporal logics, such as linear temporal logic, reason over traces of finite systems, signal temporal logic (STL) reasons over continuous signals [10].Besides a Boolean answer to whether the formula is satisfied, quantitative semantics of STL has been introduced [11,12] This work is supported by NWO Domain TTW under the CADUSY project #13852, the ERC Starting Grant SENTIENT (755953) and the ERC Consolidator Grant justITSELF (817629).This paper was not presented at any IFAC meeting.providing a quantititive measure on how robustly a formula is satisfied.These robustness measures enable optimizationbased methods for temporal logic, such as model predictive control (MPC) [13][14][15][16][17], optimal trajectory planning [18], reinforcement learning [19], and neural networks [20,21].Apart from optimization-based methods, other proposed approaches for STL specifications rely on control barrier functions (CBF) [22,23].While the work in [22] does not optimize a robustness measure of the STL specification, the computation of the control input for every time step relies on online quadratic optimization.Alternatively, in [24,25] the synthesis for a fragment of STL is reformulated to a prescribed performance control problem, resulting in a continuous state feedback control law.
While (bi-)simulation approaches provide feedback strategies for all (admissible) initial conditions, only a limited number of optimization-based approaches consider a set of initial conditions [1], including [13,15,26].In [16], tube MPC is used, in which a tube around a nominal initial condition is found for which the robustness measure is guaranteed.Similarly, the control barrier functions in [22] provide a forward invariant set around the initial condition.
In this work, we utilize genetic programming (GP) [27] and reachability analysis [28] to synthesize controllers.The benefit of genetic programming is that it is able to automatically find a structure for the controller, as the right structure is typically unknown beforehand [1].Moreover, the resulting controllers can be verified using off-the-shelf verification methods and are generally easier to interpret than e.g.neural network controllers or look-up tables in the form of binary decision diagrams (BDDs).Genetic programming has been used for formal synthesis for reach-avoid problems in [29,30], in which controllers and Lyapunov-like functions are automatically synthesized for nonlinear and hybrid systems.Also, reachability analysis has been used in formal controller synthesis for reach-avoid problems, e.g., in [26], MPC is combined with reachability analysis, whereas [31][32][33] synthesizes a sequence of optimal control inputs [33] or linear controllers [31,32,34] for a sequence of time intervals.
To the best of our knowledge, there are no closed-form controller synthesis methods which guarantee general STL specifications for a set of initial conditions.The goal of this work is to synthesize correct-by-construction closed-form controllers for nonlinear continuous-time systems subject to bounded disturbances for finite-time STL specifications.Moreover, we consider a sampled-data implementation of the controller, i.e., the controller output is only updated periodically and is held constant between sampling times.To this end, we propose a framework based on counterexampleguided inductive synthesis (CEGIS) (see e.g.[13,[35][36][37]), combining model checking for STL [38], counterexample generation using reachability analysis [39], and genetic programming (GP) [27].Our CEGIS approach combines a learning step with a formal verification step, in this case GP and reachability analysis, respectively.Violations obtained during verification are used to improve the learning process, until a controller which formally satisfies the desired specification is found, or a user-defined maximum of iterations is reached.The synthesis of a closed-form sampled-data controller for general STL specifications is NP-complete.Unsurprisingly, the proposed method is not a complete method, i.e. existence of the solution does not guarantee that a solution will be returned in a finite number of iterations.Moreover, as the method relies on simulations, reachability analysis, and SMT solvers, the (offline) computational complexity of the proposed method is significant.However, costly computations are performed offline and the method results in an interpretable closed-form sampled-data controller, enabling digital implementation with small online computational costs and a small memory footprint.
The main contributions of this work are twofold: first of all, we propose a CEGIS framework combining genetic programming with reachability analysis for the synthesis of closed-form sampled-data controllers for STL specifications.To enable reasoning over reachable sets as opposed to singular trajectories, [38] introduced reachset temporal logic (RTL) and proposed a sound transformation from STL to RTL.Our second contribution is the definition of quantitative semantics for RTL, and proving that the quantitative semantics is sound and complete.Similar to the quantitative semantics of STL, these quantitative semantics provide a measure of how robustly a formula is satisfied.

Preliminaries
The set of real positive numbers is denoted by R ≥0 .The interior and power set of a set S is denoted by int(S) and 2 S , respectively.Finally, an n-dimensional zero vector is denoted by 0 n .

Signal temporal logic
We consider specifications expressed in signal temporal logic (STL) [10], using the following grammar: where ϕ, ϕ 1 , ϕ 2 are STL formulae, and h(s) ≥ 0 is a predicate over a signal s : R ≥0 → R n and a function h : R n → R. The Boolean operators ¬ and ∧ denote negation and conjunction, respectively, and U [a,b] denotes the bounded until operator, i.e. until between a and b, where a < b and a, b ∈ Q ≥0 .Note that since a, b ∈ Q ≥0 , the STL formula inherently reasons over finite-time signals.We can also define other standard (temporal) operators from (1), such as disjunction The satisfaction relation (s, t) |= ϕ indicates that the signal s starting at t satisfies ϕ.We consider the same definition of the semantics as in [38], which slightly deviates from e.g.[10] w.r.t. the until operator1 .Since we build upon the results in [38], we have adopted the corresponding definition.STL is equipped with quantitative semantics ρ(s, ϕ, t) that provides a robustness measure of how well a signal s starting at time t satisfies or violates the STL specification [11,12].If ρ(s, ϕ, t) is negative, lower values imply that ϕ is more strongly violated.Conversely, if ρ(s, ϕ, t) is positive, higher values imply that ϕ is satisfied more robustly.

Reachset temporal logic
Consider a closed-loop system described by: where ξ(t) ∈ R n denotes the state, ω(t) ∈ Ω ⊂ R l an external disturbance, I ⊂ R n is the set of initial conditions, I and Ω are compact, and f : R ≥0 × R n × R l → R n and ω : R ≥0 → R l are assumed to be Lipschitz continuous.In this work, we are not only interested in the STL performance of a single trajectory, but rather in the set of all trajectories satisfying system Σ, defined by That is, given an STL specification ϕ, we are interested in whether the system Σ satisfies ∀ξ ∈ S(Σ) : (ξ, t) |= ϕ.For general systems of the form (2), it is not possible to construct the set S(Σ).However, it is possible to construct a set that does not persevere individual trajectories, but stores which states can be reached at a given time, i.e. a reachable set: Definition 1 (Reachable set) Given a system Σ, a mapping R e : R ≥0 → 2 R n is an exact reachable set if and only if: A mapping R : R ≥0 → 2 R n is a reachable set if and only if ∀t ∈ R ≥0 : R e (t) ⊆ R(t).
That is, a reachable set satisfies that ∀t ∈ R ≥0 , ∀ξ ∈ S(Σ) : ξ(t) ∈ R(t).Reachability analysis tools such as CORA [40] can return a sequence of sets As reachable sets do not store the information of individual trajectories, it is not possible to use the STL formula directly over the reachable sets.However, [38] introduced reachset temporal logic (RTL), enabling the direct reasoning over reachable sets, and established a sound transformation between STL and RTL.The RTL fragment relevant for this work is: Here, ψ, ψ 1 , ψ 2 are propositional formulae over states x and φ, φ 1 , φ 2 are formulae over a reachable set R : R ≥0 → 2 R n .Additionally, A denotes the all operator.The semantics of RTL is defined as follows: Consider the following notion: Definition 2 (c-divisible) An STL formula ϕ is said to be c-divisible, if all interval bounds of the temporal operators of ϕ are divisible by c.
Note that since a, b ∈ Q ≥0 , there always exists a c such that an STL formula is c-divisible.[38].
The transformation Υ c yields RTL formulae of the form where I, J i , K ij are finite index sets and ψ ijk are nontemporal subformulae.As can be seen, j relates to a time step c/2, whereas i and k relate to the number of conjunctions and disjunctions.As an example, the transformation of

Genetic programming
The controllers in this work are synthesized using genetic programming (GP) [27], a variant of genetic algorithms (GA) [41], which evolves entire programs rather than optimizing parameters.In our case, the evolved program is a controller based on elementary building blocks consisting of state variables and basic functions, such as addition and multiplication.Within genetic programming, a candidate solution, called an individual, is represented by a data structure enabling easy manipulation, such as an expression tree.This data structure is called the genotype, whereas the individual itself, e.g., an analytic function, is referred to as the phenotype.A pool of individuals, called the population, is evolved based on a cost function, called the fitness function, which assigns a fitness score to all individuals.Depending on the fitness score, individuals can be selected to be recombined or modified using genetic operators, such as crossover and mutation.In the former case, two subtrees of individuals are interchanged, whereas in the latter case, a random subtree is replaced by a new random subtree.Each genetic operator has a user-defined rate, which determines the probability of the operator being applied to the selected individuals.A number of individuals are selected based on tournament selection: a fixed number of individuals are randomly selected from the population, and the individual with the highest fitness is returned.The process of selection and modification through genetic operators is repeated until a new population is created.The underlying hypothesis is that the average fitness of the population increases over many of these cycles, which are referred to as generations.The algorithm is terminated after a satisfying solution is found or a maximum number of generations is met.
We use the variant grammar-guided genetic programming (GGGP) [29,42], which utilizes a grammar to which all individuals adhere: the population is initialized by creating random individuals adhering to the grammar and the used genetic operators are defined such that the resulting individuals also adhere to the grammar.The grammar is defined by the tuple (N , S, P), where N is a set of nonterminals, S a starting tree, and P a set of production rules, which relate nonterminals to possible expressions.An example of a grammar is shown in Figure 1a.In this grammar, the nonterminals correspond to polynomials pol , monomials mon over time t, and constants const .The starting tree S restricts the class of controllers to time-varying state feedback laws, linear in the state x ∈ R2 .Given the grammar in Figure 1a, an example of a genotype is shown in Figure 1b, which has the corresponding phenotype of 9.5x 1 + 4.2tx 2 .

Problem definition and solution approach
We consider nonlinear systems subject to disturbances of the form: , and the set of initial conditions I ⊂ R n .The sets I and Ω are compact, and , and ω : R → R l are assumed to be Lipschitz continuous.We consider sampled-data timevarying state-feedback controllers κ : where t k denotes the k-th sampling instant, t 0 = 0, and η is the sampling time.This results in a closed-loop system of the form (2 The goal of this paper is formalized as follows: Problem 3 Given a c-divisible STL formula ϕ, the openloop system (8), and a sampling time η, synthesize a closedform sampled-data time-varying controller κ : R ≥0 ×R n → R m such that for all initial conditions and disturbances the resulting trajectories ξ of the closed-loop system satisfy ϕ, i.e.: In this work, we propose a counterexample-guided inductive synthesis (CEGIS) framework to synthesize a controller such that (R, 0) |= Υ c (ϕ), thereby solving Problem 3 as follows from Theorem 1.The framework consists of iteratively proposing a controller obtained through GGGP 2 and then formally verifying the RTL formula Υ c (ϕ) using reachability analysis.The proposed controller by GGGP is optimized w.r.t. a set of simulated trajectories (obtained through numerical integration), with the underlying idea that these are relatively fast to compute and provide a sensible search direction for the synthesis.The computationally more intensive reachability analysis verifies the resulting controller.
Given the initial data, the algorithm goes through the following cycle, illustrated in Figure 2, where each cycle is referred to as a refinement: A1) A candidate solution is proposed using GGGP, based on simulation trajectories corresponding to the set I. A2) For the given candidate controller, the reachable set is computed.A3) Based on the reachable set, either: (a) (R, t) |= φ, which is formally verified through SMT solvers, thus a controller solving Problem 3 is found.(b) (R, t) |= φ, and a counterexample is extracted in the form of an initial condition x and a corresponding disturbance realization w.This pair (x, w) is added to I and the algorithm returns to step A1).(c) (R, t) |= φ and a maximum of refinements is reached, therefore the algorithm is terminated.
To quantify the violation or satisfaction of an RTL formula, we introduce quantitative semantics for RTL in the next section.The proposal of a candidate controller in step A1) is discussed in Section 5.The verification and counterexample generation in step A3) is discussed in Section 6.

Quantitative semantics
Inspired by the quantitative semantics of STL [11,12], we define quantitative semantics for RTL in this section.These quantitative semantics provide a robustness measure on how well the formula is satisfied.For an RTL formula φ with propositional subformulae ψ, the quantitative semantics is given by functions P (R, φ, t) and (x, ψ), respectively, recursively defined as: The quantitative semantics of STL are sound and complete [12,43].The quantitative semantics of RTL also have these properties: Theorem 2 (Soundness and completeness) Let φ be an RTL formula, R a reachable set, and t a time instance, then: Remark 4 Note that P (R, φ, t) = 0 does not imply (R, t) |= φ nor (R, t) |= φ.This is because on the boundary of an inequality, the distinction between inclusion or exclusion is lost within the quantitative semantics.That is, if (x, ψ) = 0, we also have (x, ¬ψ) = 0, hence the quantitative semantics of two mutually exclusive logic formulae evaluate to the same value.
The proof of Theorem 2 can be found in Appendix A. Consider a c-divisible STL formula ϕ and the corresponding RTL formula φ = Υ c (ϕ) in the form of (7).Using the equivalences a (φ 1 ∧ φ 2 ) = a φ 1 ∧ a φ 2 and rewriting ψ ijk in disjunctive normal form, we can express the RTL formula (7) as: where A ijk and B ijk a denote finite index sets, ∼∈ {≥, >}, and h ijk ab (x) ∼ 0 is a predicate over x.Using the quantitative semantics, the robustness measure of this RTL formula for a reachable set R and time t = 0 is given by 5 Candidate controller synthesis In this section, we detail step A1) of the proposed algorithm in Section 3, i.e., the proposal of a candidate controller.The candidate controller is synthesized using GGGP, by maximizing an approximation of the robustness measure, based on a finite number of simulated trajectories.The sampling time is equal to c/2 to coincide with the time instances at which the robustness measure P (R, φ, 0) is evaluated.For an RTL formula of the form (7) and t = 0, the first and the final time instances of relevance τ 0 and τ f , are given by τ 0 = 0 and τ f = c 2 max i∈I |J i |, respectively.Let us denote the finite set of sampled-time instances T = {τ 0 , . . ., τ f }.Given a candidate controller κ : R ≥0 × R n → R m , a set of pairs of initial conditions and disturbance realizations I, we consider an approximated reachable set Rκ I : T → 2 R n formed by all corresponding simulated trajectories x : T → R n , such that for a given time instance τ q ∈ T : Provided Rκ I , we approximate the robustness measure by P ( Rκ I , φ, 0).

Outline of the candidate controller synthesis
The proposal of a candidate controller in step A1) is based on approximating an optimal controller that solves: If the optimum is positive, i.e. the optimal controller yields a positive robustness measure w.r.t. the worst-case approximated reachability set, it can be expected that P (R, φ, 0) is positive 3 , which would imply from Theorem 2 and Theorem 1 that the corresponding optimal controller solves Problem 3. To (approximately) solve this optimization problem, our algorithm alternatively updates the controller and the disturbances within I, as described in the following steps, which are also illustrated in Figure 3: A1.a) Given the set I, We synthesize an analytic expression κ : R × R n → R m by using GGGP to solve: If for the resulting controller κ the robustness measure approximation P ( Rκ I , φ, 0) is negative, this optimization step in (15) is repeated.Otherwise, the algorithm continues to the next step.A1.b)Given the controller κ, for each initial condition x i in I, an analytic expression for a disturbance realization ω i : R → Ω is synthesized using GGGP, in which the robustness measure approximation is minimized, i.e.: The set I is then updated by replacing (x i , ω i ) with If the corresponding robustness degree approximation P ( Rκ {(x i ,ω i )} , φ, 0) is negative, the algorithm returns to step A1.a).Otherwise, if for 3 Due to e.g.truncation errors in the numerical integration, there can be a mismatch between infI P ( Rκ I , φ, 0) and P (R, φ, 0).Therefore, the proposed framework always relies on additional formal verification.all updated disturbance realizations the robustness measure approximation is positive, i.e., ∀i, P ( Rκ {(x i ,ω i )} , φ, 0) > 0, the algorithm returns a candidate controller.

Reference-tracking controllers
To speed up the synthesis, it is possible to impose a structure to the solution.In this section we discuss the design of reference-tracking controllers, based on a nominal reference trajectory x ref (t) and a corresponding feedforward input u ff (t).That is, we consider a time-varying referencetracking controller of the form: where κ fb : R ≥0 × R n → R m is a time-varying feedback controller.This controller is then used in a sampled-data fashion as in (9).The feedforward input and reference trajectory can be computed beforehand as follows: R1) Given a point x 0 ∈ int(I), (e.g. the centroid of I if I is convex), an analytic expression for u ff : R → R m is synthesized using GGGP, by maximizing the approximated robustness measure for a nominal trajectory starting at x 0 , i.e. a trajectory with no disturbance: arg sup R2) Given the feedforward input u ff , an analytic expression for the corresponding nominal reference trajectory x ref : R → R n is synthesized using GGGP.Given the simulated solution x(τ k ) corresponding to x(0) = x 0 , ω(t) = 0 l , and u(t) = u ff (t), x ref is obtained by fitting for each state dimension i ∈ {1, . . ., n} an expression to x i (τ k ), based on the Euclidean norm of the error vector e i = [e i (τ 0 ), . . ., e i (τ f )], with e i (τ k ) = x i (τ k ) − x ref,i (τ k ), i.e., maximizing: Using the synthesized pair (u ff (t), x ref (t)), the user-defined grammar within GGGP can be used to enforce the structure of a time-varying reference controller in (17) within step A1), as demonstrated by the following brief example: Example 5 Let us consider a one-dimensional system with dimensions n = m = 1.The structure of (17), where we further restrict κ to be linear in state, can be enforced by taking the starting tree S = u ff (t) + pol (x − x ref ) and the production rules from Figure 1a.

Reachability analysis and verification
In this section we detail step A3) of the algorithm.We use polynomial zonotopes PZ as the set representation of the reachable set4 : Definition 6 (Polynomial zonotope) Given a generator matrix G ∈ R n×h and exponent matrix E ∈ Z p×h ≥0 , a polynomial zonotope PZ is defined as , where E (k,i) denotes the i-th entry the k-th row of E and G (•,i) denotes the i-th column of G.The vector α = [α 1 , . . ., α p ] T is referred to as the parameterization vector of the polynomial zonotope.
Consider a parameterization vector α and a reachable set R(t) expressed as a polynomial zonotope.The corresponding point in the reachable set z(α, R(t)) ∈ R n is given by: where E R and G R denote the exponent matrix and generator matrix of R(t), respectively.The benefit of polynomial zonotopes as set representation is that dependencies between points in subsequent reachable sets are maintained under the reachability analysis operations [39].That is, for a reachable set R : R ≥0 → 2 R n and parameterization vector α, we have ξ(t) = z(α, R(t)) =⇒ ξ(0) = z(α, R(0)).This enables the extraction of an initial condition corresponding to a point for which the specification is violated.Using this method, we construct a counterexample in the form of a pair of initial condition and disturbance realization (x 0 , ω), such that the corresponding trajectory results in a violation of the RTL formula.After reachability analysis, the algorithm undergoes the following steps: B1) For all subformulae φ ijk in (12b), the corresponding robustness sub-score (13b) is computed by solving the following nonlinear optimization problem5 over the corresponding set R(jc/2): B2) Given the robustness sub-scores p * ijk , compute the full robustness measure (13a): B3) As we rely on nonlinear optimization, we cannot guarantee to find the global optimum p * .However, as it is a minimization problem, a suboptimal solution is an upperbound p, such that P (R, φ, 0) = p * ≤ p.Given p, either: (a) p < 0, hence the RTL specification is violated.In this case, given the argument (ijk) * solving (20), we extract an initial condition x 0 corresponding to α (ijk) * , i.e. x 0 = z(α (ijk) * , R(0)).For this initial condition x 0 , a disturbance realization ω is synthesized similarly to step A1.b), i.e., GGGP is used to solve: The counterexample given by the pair (x 0 , ω) is subsequently added to I.This new set I is then used to improve upon the synthesized controller in step A1).(b) p ≥ 0, hence the RTL specification is potentially satisfied.However, to guarantee this, we perform an additional verification step, based on Satisfiability Modulo Theories (SMT) solvers [45], which are capable of verifying first-order logic formulae.
The subformula (12b) holds if the following firstorder logic formula holds: ∀x ∈ R j c 2 : where again ∼∈ {≥, >}.Suitable SMT solvers to verify (21) include Z3 [46] when R(jc/2) and h ijk ab are expressed as polynomials, and dReal [47] when these are expressed as general nonlinear expressions 6 .Given the Boolean answers to the subformulae in (12b) for all i, j, and k, it is trivial to compute the Boolean answer to (12a).
A synthesized controller, formally verified in step B3.b), solves Problem 3, as formalized in the following theorem: Theorem 3 (Correct-by-design controller) Given a cdivisible STL formula ϕ, an open-loop system (8), and a sampling time η, if the algorithm in Section 3 returns a controller before the maximum number of refinements, then the closed-loop system satisfies ∀ξ ∈ S(Σ) : (ξ, 0) |= ϕ. ( PROOF.If the algorithm terminates, the returned controller results in a reachable set R such that (R, 0) |= ψ, where φ = Υ c (ϕ).By Theorem 1, we have ∀ξ ∈ S(Σ) : (ξ, 0) |= ϕ. 2 7 Dealing with conservatism Conservatism, in the reachability analysis and the transformation from STL to RTL, may cause that (R, 0) |= φ, whereas ∀ξ(0) ∈ I, (ξ, 0) |= ϕ, i.e., the desired STL specification holds for all initial conditions, whereas based on the reachability set, the RTL specification is not met.This conservatism can be reduced by refining settings such as the time steps or Taylor order in the reachability tool (see [49]), or reducing the parameter c to obtain less conservative RTL formulae φ, at the cost of increased overall computational complexity.Similarly, truncation errors of the integration scheme, and conservatism within reachability analysis (introducing spurious trajectories) can lead to mismatches between Rκ I and R(t).This mismatch can be bridged considering an optional error signal ε added to the simulated trajectory x(τ q ), which is co-synthesized with the disturbance realizations, as will be shown in the subsequent case studies.
Issues due to conservatism can also be dealt with within the synthesis of a candidate controller in step A1), e.g. the controllers within GGGP could be further optimized w.r.t. the robustness, such that the added robustness could potentially compensate for conservative reachability analysis.Controller complexity (measured as the number of nonterminals) can also be used as a secondary optimization criterion to facilitate less conservative reachability analysis.The resulting multi-objective optimization problem is solved using Pareto-optimality ranking [50] that results in a rank that is used as the new fitness value used within the selection.
Finally, note that the optimization problems in steps A1.a), A1.b), R1), R2), B1), and B3.a) are non-convex and therefore finding a global optimum cannot be guaranteed.The optimization problems are used to propose candidate controllers or to provide counterexamples that constrain the solution space.As the goal is to find a qualitatively correct controller rather than an (quantitatively) optimal one, loss in optimality is of a lesser importance.Moreover, by the use of mutation within genetic programming, the algorithm is capable of exploring the search space, until a solution is found.

Case studies
In this section we demonstrate the effectiveness of the proposed framework on benchmarks from competing synthesis methods, i.e. reachability-based [31] (car example), MPCbased [17] (path planning), and abstraction-based [5] (airplane landing manoeuvre).Moreover, we consider the effect of input saturation, which is enforced through the STL specification.Additionally, we consider a platooning benchmark [34] to investigate the scalability w.r.t.system dimension.While we use for the aforementioned benchmarks the reference-tracking controller structure discussed in Section 5.2, we demonstrate the ability to synthesize controllers from scratch on a simplified spacecraft [51] in Section 8.6.
The case studies are performed using an Intel Xeon CPU E5-1660 v3 3.00GHz using 14 parallel CPU cores.The GGGP algorithm is implemented in Mathematica 12 and the reachability is performed using CORA in MATLAB.Motivated by the non-convex and discontinuous nature of the optimization problems, we use population-based optimization methods, but any suitable optimization tool can be used instead.For the optimization problem in (19), we use particle swarm optimization of the global optimization toolbox in MATLAB.
Within each generation of GGGP, parameters within an individual are optimized using Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [52], based on the same fitness function as used for GGGP.More specifically, we use the variant sep-CMA-ES [53], due to its linear space and time complexity.For the verification of (21), we use the SMT solver dReal with δ = 0.001.
Across all benchmarks, the probability rate of the crossover and mutation operators being applied on a selected individual are 0.2 and 0.8, respectively.Benchmark-specific settings are shown in Table 1, which include the number of simulations n s , number of individuals, and the number of GGGP and CMA-ES generations.Note that the number of GGGP generations for κ and ω i is the number of generations per step A1.a) and A1.b), and not the total of GGGP generations per proposal of a controller in step A1), which depends on the number of times step A1.a) and A1.b) are repeated.For each case study, we use a grammar with nonterminals and production rules as shown in Table 2.These nonterminals correspond to general time-dependent expressions expr t , time-dependent trigonometric functions trig t , time-and state-dependent polynomial expression pol t and pol x , respectively, time-and state-dependent monomials mon t and mon x , respectively, variables var , and constants const .The polynomials are restricted to polynomials over either time t or states x, where the state-dependent polynomials are further restricted to not contain zero degree monomials.The time-dependent expressions are formed by timedependent polynomials, a product of these polynomials, and time-dependent trigonometric functions, and a sum of two expressions.The trigonometric functions are restricted to hyperbolic tangents, sines and cosines with time-dependent polynomial arguments.Note that per case study, different starting trees are used, such that potentially only a subset of the grammar is available.E.g., if the starting tree is pol t , candidate solutions are restricted to time-dependent polynomial solutions.
We use Runge-Kutta as numerical integration scheme.To keep a constant number of initial conditions in I, counterexamples are added using a first-in, first-out principle.To compensate for the gap between the simulation and the reachability analysis (as discussed in Section 7), we consider an added error signal bounded by the scaled vector field of the dynamics f , parameterized by where δ is a constant and σ : R ≥0 → [−1, 1] n×n a timevarying diagonal matrix which determines the sign and magnitude of the error signal.The constant δ is optimized after each reachability analysis such that the mismatch between the robustness measure and the approximated robustness measure is minimized, i.e.: where {(x, ω)} is the counterexample pair computed in Section 6.
In reporting the synthesized controllers, its parameters are rounded from six to three significant numbers for space considerations.
Finally, in this section we denote the logic function indicating set membership of a set Y by ϕ Y , i.e. given a set Y in the form where h ij : R n → R, we have ϕ Y = i j h ij (x) ∼ 0.

Car benchmark
Let us consider a kinematic model of a car from [31]: where the states x 1 , x 2 , x 3 , x 4 denote the velocity, orientation, and x and y position of the car, respectively.Furthermore, u 1 and u 2 denote the inputs and w 1 and w 2 disturbances.The sampling time η of the sampled-data controller is set to be 0.025 seconds.Similarly to [31], we consider a "turn left" maneuver over a time interval T = [0, 1], where within T , the trajectories stay within the safe set S and at the final time instant, the system is in the goal set, captured by the STL specification: We consider the following safe set S and goal set G: To guide the synthesis, we impose the reference-tracking controller structure from Section 5.2 and therefore we first design a feedforward signal and reference trajectory using GGGP.For u ff , x ref , we use polynomial expressions as a function of time t, for the feedback law κ we restrict the search space to reference-tracking controllers which are linear in the tracking error and polynomial in time: and for ω i we consider saturated polynomials in time.This is done using the grammar with starting trees: Here, sat (ω i ,ωi) denotes a saturation function such that ω i (t) ∈ Ω, where sat (ω i ,ωi) (x) = max(ω i , min(x, ω i )).
Finally, for each disturbance realization, we co-evolve the error signal ε i in (23), which is dependent on the candidate controller κ and disturbance realization ω i : where diag denotes a diagonal matrix.For the simulations and reachability analysis, we use a sampling time of 0.025 seconds and 0.0125 seconds, respectively.
First, a feedforward control input and reference trajectory for a nominal initial condition are synthesized as described in Section 5.2.An example of a found feedforward controller and corresponding reference trajectory are shown in Table 4.For 10 independent runs, the average synthesis time of u ff and the reference trajectory per dimension x ref,i is shown in Table 3.Using these u ff and x ref as building blocks for the controller, κ is synthesized as described in step A1).An example of a synthesized K(t) in ( 26) is given by The corresponding reachable set is shown in Figure 4. We observe that the final reachable set is not within the goal set.The red dots represent the violation and the corresponding initial condition.After refining the controller iteratively, an example of a controller satisfying ϕ 1 after 3 refinements is shown in Table 4.
For 10 independent synthesis runs of κ, statistics on the number of generations, number of refinements, complexity in terms of number of non-terminals, and computation time is shown in Table 3 and Figure 7.In most cases, a solution was obtained around 3 refinements.However, due to the stochastic nature of the approach, in one case it took 20 refinements before a solution was found.

Input saturation
In our general framework, we do not canonically consider input saturation.Input saturation can be considered in multiple ways, such as restricting the grammar of the controller to include a saturation function, or even a continuous approximation using e.g. a sigmoid function.However, the downside of such an approach is that the reachability analysis under these functions is typically challenging for stateof-the-art reachability tools, due to the strong nonlinearity or hybrid nature.Instead, for illustrative purposes, we incorporate the constraint within the STL specification, such that for all states in the reachable set the saturation bounds are not exceeded.Let us revisit the car benchmark, where we consider the same input constraints as in [31], namely u ∈ U = [−9.81,9.81] × [−0.4,0.4].The STL specification is extended to: with The synthesis statistics are shown in Table 3 and Figure 7.
An example of a synthesized K(t) in ( 26) is given by In most cases, a solution was found in around 4 to 5 refinements, with the exceptions of two runs with 20 and 40 refinements, respectively.

Path planning for simple robot
Let us consider the path-planning problem for a simple robot adopted from [17].We deviate from [17] in considering the system in continuous time and consider bounded disturbances.The system is described by: where the state vector represents the x-velocity, y-velocity, x-position and y-position, respectively.The sampling time of the sampled-data controller η is set to be 0.5 seconds.
Similar to [17], we consider the specification in which the system needs to remain in a safe set S and eventually visit regions P 1 , P 2 and P 3 : with Similar to Section 8.2, we impose this constraint through the STL specification, yielding the following STL specification: where U is given by (28).We consider the same controller structure and grammar as the previous benchmark, with the exception of the grammar of the feedfoward input and reference trajectory.For these elements, we extend the grammar to expressions which can include trigonometric functions, by using the grammar in Table 2 and the starting trees S u ff = ( expr t , expr t ) and S x ref,i = expr t .For the simulations and reachability analysis, we use a sampling time of 0.5 seconds.The statistics on the synthesis is again shown in Table 3 and Figure 7.An example of the controller elements u ff , x ref and K(t) of a synthesized controller are shown in Table 4.The corresponding reachable set of the state and input is shown in Figure 5. Across 10 independent runs, commonly in 1 to 2 refinements a solution was found, with one run requiring 8 refinement.

Landing maneuver
Let us consider the landing aircraft maneuver, adopted from [5].The system model is given by where the states x 1 , x 2 , x 3 denote the velocity, flight path angle and the altitude of the aircraft, ν i denotes a disturbed input, where u 1 denotes the thrust of the engines and u 2 the angle of attack.Finally, D(ν, x 1 ) and L(ν, x 1 ) denote the lift and drag, respectively, and m = 60 • 10 3 kg, g = 9.81m/s 2 .The sampling time of the sampled-data controller is set at η = 0.25 seconds.Compared to [5], we do not consider measurement errors, but the proposed framework can be adapted arbitrarily to accommodate this type of disturbance.We define the following safe set, goal set and input bounds: and consider the following specification: where the set U is given by (28).That is, trajectories are always within the safe set and satisfy the input constraints, until between 18 and 20 seconds the goal set is reached.
We use the same controller structure and grammar as the path-planning problem.For the simulations and reachability analysis, we use a sampling time of 0.25 seconds.The algorithm settings are shown in Table 1.The statistics of 10 independent synthesis runs are again shown in Table 3 and Figure 7.An example of the controller elements u ff , x ref  and K(t) of a synthesized controller are shown in Table 4.
The corresponding reachable set of the altitude over time, as well as the reachable sets of the pitch angles at multiple time instances are shown in Figure 6.

Scalability: platoon
Consider a platooning system [34], described by: where N denotes the number of vehicles, x 1 , x 2 the position and velocity of the first vehicle, x 2i , x 2i+1 the relative position and relative velocities of vehicle i, and the input u i denotes the acceleration of vehicle i.We consider a sampling time η of 0.05 seconds.The specification involves the Table 3 Statistics over an average of 10 independent synthesis runs.Time FF: average computation time of the feedforward components; Total gen.: total number of GGGP generations for κ before a solution was found; Total ref.: total number of refinements; Complexity: number of total non-terminals within the genotype of the synthesized controller; GP κ: synthesis of candidate κ using GGGP; GP ω: disturbance realization optimization; RA: reachability analysis; CE: counterexample extraction; SMT: verifying the specification through an SMT solver; min: minimum; med: median; max: maximum.The average contribution percentages do not sum up to one, as the contribution of routines such as writing (SMT) files are not displayed.acceleration of the platoon up to a goal set within a second, subjected to input constraints, while each vehicle maintains a safe distance, which is captured by the STL formula where and U = [−10, 10] N .We use the feed forward signal u f f,i = 1.4 for all i ∈ {1, . . ., N }, where we exploit the fact that the desired control input for the agents after the first one is equal to the first one.The reference trajectory for this feedforward controller is given by: We impose the structure κ i (x, t) = κ i−1 (x, t) − κ i (x, t) for i = 2, . . ., N , where for κ i (x, t) and κ i (x, t) we use the same control the same controller structure and grammar as the path-planning problem.For the simulations and reachability analysis, we use a sampling time of 0.05 seconds.The algorithm settings are shown in Table 1.
Given a maximum of 5000 GGGP generations, for N = 2, in 10 independent runs a controller was found, for N = 3, in 9 out of 10 runs, and for N = 4, no solutions were found.The results statistics for the successful synthesis runs are again again shown in Table 3 and Figure 7.

Discovering structures from scratch: spacecraft
Let us consider a simplified model of spacecraft [51], described by: where states denote the angular velocity and the inputs denote control torques that are aligned with the principle axes.We consider a sampling time η of 0.1 seconds.For the simulations and reachability analysis, we use a sampling time of 0.1 seconds.Note that for stabilization, linearization methods are not appropriate, as the system linearized around points in the set {x ∈ R 3 | x 1 , x 2 = 0} are not controllable.The goal is to control the system in finite time to a set around the origin G = [−0.2,0.2] 3 and the control input is constrained s.t.u ∈ U = [−5, 5] 2 , which is captured by the STL specification For the disturbance, we use the same grammar as in the previous case studies.For the controller we consider polynomial state feedback controllers This is done using the starting tree S u = ( pol x , pol x ).The algorithm settings are shown in Table 1.The results of 10 independent synthesis runs are again shown in Table 3 and Figure 7.An example of a found controller is given by κ Of the 10 synthesized controllers, 8 controllers have the same structure as the above controller.

Discussion
We discuss now the results from Section 8 and relate them to the results in the literature.Recall that a GGGP generation is the cycle of creating a new population through fitness evaluation, selection and applying genetic operators.A refinement is defined as the cycle of proposing a candidate solution based on GGGP, validation using reachability analysis, and extracting counterexamples.Therefore, in each refinement, there are one or multiple GGGP generations.First of all, Figure 7a shows a polynomial relation between the number of refinements and the total number of GGGP generations.Secondly, Figure 7b shows a polynomial relation between the number of refinements versus the total computation time.Finally, Figure 7c illustrates that more refinements does not imply that complexity of the controller increases.However, the complexity of the found controller does seem to be dependent on the system and STL specification.
While the computation time is related to the number of refinements, this relationship depends on the STL tion and the dynamics.For the car benchmark without and with input constraints, we observe that the added constraints within the STL specification increased the required number of generations, and typically required more time per refinement.Hence, the total computation time heavily depends on the STL specification, as expected.Additionally, we observe an increase in the median of the complexity of the resulting controllers.
With the platoon example, we see that for an increase in state dimension the number of generations required to find a solution significantly increases.This is expected, as the search space is significantly larger.For N = 4, no solutions were found within 5000 GGGP generations.However, it is worth nothing that the optimal solution for N = 4 in [34] very tightly satisfies specification.However, general conclusions regarding computation time and system order cannot be drawn.For example, the input-constrained car and path planning benchmarks are both four-dimensional systems, where the STL specification of the latter is more involved.Regardless, the path-planning problem has a lower computation time and requires less generations and number of refinements, indicating a dependency between the computation time and the dynamics of the system, which is also as expected.
The resulting offline time complexity of our proposal is clearly higher than alternative methods like the one in [31], with synthesis time for the car benchmark of just around 10 seconds, or in [5], taking about 700 seconds for the aircraft benchmark controller synthesis.Both these alternative methods are faster offline, at the cost of potentially much larger controllers to be stored: a linear controller for each sampling time in [31]; an exponentially growing number of entries in a look-up table with increasing system dimension in discretization-based methods like that of [5].However, for the systems for which synthesis is successful in both ours and discretization-based methods, the size of the resulting controllers do not seem prohibitive.Nonetheless, a more clear advantage of our approach is the ability to constraint the controller structure so as to produce controllers that are easier to understand by end-users than e.g. the lookup-tables (or BDDs) of discretization-based methods.
An alternative to alleviate the memory footprint of controllers is the use of MPC approaches, such as [17] for the path-planning problem.These solutions require additional online computational complexity compared to the quick evaluation that our controllers enable.As an example, the most complex of our aircraft controllers just requires on average 4.7 • 10 −7 seconds to compute the control actions (evaluated using timeit in MATLAB running on an Intel i7-8750H CPU).
In summary, the tests performed indicate that this approach may be competitive with respect to competing alternatives whenever the application at hand requires both low memory and computational footprint, but more importantly whenever there is a need to impose specific controller structures to improve interpretability of the controller.
As most automated synthesis methods for the type of problems we handle, the proposed framework is not a complete method.That is, the method is not guaranteed to find a solution in a finite number of iterations, regardless of its existence.Nevertheless, for the presented case studies, in 10 independent runs a solution was always found.Since the search space is navigated nondeterministically, we observed that the number of GGGP generations, number of refinements and computation time can vary significantly for each run.
As it has been highlighted, the offline computation time of our current implementation is not competitive with those in other references.Note however, that the performance of our implementation has not been optimized for speed, being a mix of Matlab and Mathematica code, or parallelization of individuals in GGGP, which consume most of the computation, c.f. Table 3.Additionally, limiting the fragment of STL, e.g., to h(s) linear, the robustness degree computation can be considerably simplified.If the robustness measure is also upper bounded in a non-conservative manner, the usage of SMT solvers becomes redundant.This would significantly reduce the computation time for benchmarks such as the path-planning problem.Finally, input constraints, currently part of the STL formula, can also be captured using  saturation functions in our grammar, simplifying the synthesis.However, as a caveat, discontinuous functions such as saturation functions significantly complicate the reachability analysis.

Conclusion
We have proposed a framework for CEGIS-based correct-bydesign controller synthesis for STL specifications based on reachability analysis and GGGP.The effectiveness has been demonstrated based on a selection of case studies.While the synthesis time is outmatched by methods solving similar problems, the proposed method results in a compact closedform analytic controller which is provably correct when implemented in a sampled-data fashion.This enables the implementation in embedded hardware with limited memory and computation resources.

Nonterminals N and startingFig. 1 .
Fig. 1.Example of a grammar and a genotype adhering to it.The corresponding phenotype is given by 9.5x1 + 4.2tx2.

Fig. 3 .
Fig. 3. Schematic overview of the synthesis of candidate controller.

Fig. 4 .
Fig. 4. Reachable set for the first controller for the car benchmark, which violates the desired controller specification.Figures (c) and (d) illustrate the reachable set near the goal set.Red dots: a point in the final reachable set that is outside of the goal set and its corresponding initial state, yellow: initial set, green: goal set G, gray: reachable set, red: safe set S, blue: reachable set at t = 1, black: example of simulation traces.

Fig. 5 .
Fig. 5. Reachable set of a found controller for the path planning benchmark.(a) Reachable set of the x-y position.(b) Reachable set of the input over time.Yellow: initial set, gray: reachable set, red: safe set S and input constraints, green: target sets P1, P2, and P3, black: selection of simulated trajectories, blue: reachable sets at certain time instances within one of the target sets.

Fig. 6 .
Fig. 6.Time evolution of the reachable set of the altitude x3 under a synthesized controller for the landing maneuver.Gray: Reachable set over time of the altitude x3.Blue: the set of the aircraft pitch x2 + u2 for 7 time intervals.

Table 1
General settings for each of the case studies.The number of individuals, GGGP generations, and CMA-ES generations are shown for each controller component and disturbance realizations.

Table 4
Examples of synthesized controllers.Numerical values are rounded for space considerations.