Finite Elements with Switch Detection for Numerical Optimal Control of Nonsmooth Dynamical Systems with Set-Valued Heaviside Step Functions

This paper develops high-accuracy methods for numerically solving optimal control problems subject to nonsmooth differential equations with set-valued Heaviside step functions. An important subclass of these systems are Filippov systems. Writing the Heaviside step function as the solution map of a linear program and using its optimality conditions, the initial nonsmooth system is rewritten into an equivalent dynamic complementarity system (DCS). We extend the Finite Elements with Switch Detection (FESD) method [38], initially developed for Filippov systems transformed via Stewart’s reformulation into DCS [44], to the above-mentioned class of nonsmooth systems. The key ideas are to start with a standard Runge-Kutta method for the DCS and to let the integration step sizes be degrees of freedom. Next, we introduce additional conditions to enable implicit but exact switch detection and to remove possible spurious degrees of freedom if no switches occur. The theoretical properties of the method are studied. Its favorable properties are illustrated on numerical simulation and optimal control examples. All methods introduced in this paper are implemented in the open-source software package nosnoc [35].


Introduction
The set-valued version of the Heaviside step function, denoted by γ : R → P(R), is defined as follows:

γ(y) = {1} if y > 0, [0, 1] if y = 0, {0} if y < 0. (1)

Here P(R) represents the power set of R. The set-valued Heaviside step function, often referred to simply as the step function, provides an intuitive way to model Boolean relationships within a dynamical system.
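As a minimal illustration (not part of the paper's toolchain), the definition (1) can be evaluated componentwise by representing each set γ(y) as the interval of its selections:

```python
def gamma(y):
    """Set-valued Heaviside step function of Eq. (1), returned as the
    interval (lo, hi) of its selections."""
    if y > 0.0:
        return (1.0, 1.0)   # the singleton {1}
    if y < 0.0:
        return (0.0, 0.0)   # the singleton {0}
    return (0.0, 1.0)       # the whole interval [0, 1]
```

For y ≠ 0 the function is single-valued; only on the zero-level set does it return a nontrivial interval.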
In this paper, we study nonsmooth dynamical systems of the form:

ẋ ∈ F(x, u, Γ(ψ(x))). (2)

Here, u ∈ R^nu represents a known control function that could be obtained, for example, by solving an optimal control problem. The equation (2) is an instance of a differential inclusion (DI) since the right-hand side is set-valued due to the function Γ, which can enter F in a nonlinear way. Thus, the set F(x, u, Γ(ψ(x))) can be convex or nonconvex.
The nonsmooth and set-valued nature of F(x, u, Γ(ψ(x))) in the DI (2) introduces complexities in its numerical treatment that we address in this paper. In particular, we develop accurate and efficient numerical methods for solving simulation and optimal control problems subject to such DIs. An important subclass of nonsmooth systems with Heaviside step functions are Filippov systems [5,18,22]. Moreover, some classes of systems with state jumps can be reformulated into Filippov systems via the time-freezing reformulation [28,29,32,34,37]. Therefore, this modeling approach enables us to treat a broad class of practical problems.
In a Piecewise Smooth System (PSS), the state space is split into nonempty regions, each equipped with a different vector field.
Step functions are used in PSS modeling to determine in which region, or on which boundaries, the trajectory is. Smoothed single-valued versions of the Heaviside step function are often used in the numerical treatment of a PSS [11,24,31]. A prominent application example of step functions is the modeling of gene-regulatory networks [5,31]. Another common application of step functions is to compute selections of Filippov sets in sliding modes [17,18].
The event of x(t) becoming nondifferentiable is called a switch. In our case, this corresponds to one or more components of ψ(x) becoming zero or, if they were zero, becoming nonzero. Many mature numerical simulation methods for ODEs with switches exist, and the corresponding theory is well-established [4]. However, numerical methods for solving Optimal Control Problems (OCPs) subject to nonsmooth dynamical systems are not yet at such a mature stage. The key ingredient to high-accuracy methods and efficient numerical optimal control is to detect the time points when the switches occur [38]. In the control community, a popular approach to deal with the nonsmoothness is to introduce integer variables to label all modes of the nonsmooth system [10]. This leads to mixed-integer optimization problems for solving OCPs. However, they often become computationally intractable when nonconvexities appear or exact switching times need to be computed.
On the one hand, indirect methods for optimal control, which are usually computationally less favorable, are not widespread since Pontryagin-like conditions are not established for many classes of nonsmooth systems, cf. [25,12] for an overview. On the other hand, the application of direct methods, i.e., of a first-discretize-then-optimize approach, is not as straightforward as for OCPs with smooth ODEs and can lead to spurious solutions and wrong conclusions. This was first rigorously explained in the seminal paper of Stewart and Anitescu [49]. They show that the derivatives of an integration map obtained by time-stepping methods (e.g., the implicit Euler method) with respect to initial values and controls (called numerical sensitivities) do not converge to the corresponding continuous-time values (called sensitivities), no matter how small the integrator step size is. In practice, this can result in termination at feasible, but nonoptimal points [33]. Moreover, they show that the numerical sensitivities of smoothed approximations of a nonsmooth system are only correct if the step size shrinks faster than the smoothing parameter. This makes the smoothing approach impractical, as very small step sizes are needed even for moderate accuracy.
The limitations of direct methods based on time-stepping were recently overcome by the method of Finite Elements with Switch Detection (FESD) [38]. This method is based on standard Runge-Kutta (RK) discretizations of the Dynamic Complementarity Systems (DCS), where the integrator step sizes are left as degrees of freedom, as first proposed by Baumrucker and Biegler [9]. Additional constraints are introduced to have a well-defined system with exact switch detection. The discretization yields Mathematical Programs with Complementarity Constraints (MPCC). They are nonregular and nonsmooth Nonlinear Programs (NLP) [8,43]. Still, with suitable reformulations and homotopy procedures, they can be solved efficiently using techniques from smooth optimization.
Initially, we developed FESD for Stewart's reformulation of Filippov systems into DCS [44]. In this paper, we extend these ideas to nonsmooth systems of the form of (2) and corresponding OCPs. This enables us to cover a more general class of nonsmooth ODEs with a discontinuous r.h.s.

Contributions.
We provide a detailed study of the transformation of a Filippov system into a dynamic complementarity system (DCS) via set-valued Heaviside step functions. The key step in this transformation is to view the Heaviside step function as the solution of a parametric linear program and to use its optimality conditions. Furthermore, if the active set in the DCS is fixed, we obtain locally a smooth ODE or Differential Algebraic Equation (DAE). We study the well-posedness of these systems. In the theoretical analysis and algorithmic development, we focus on Filippov systems. Simple tutorial examples accompany all developments. Most importantly, we present the extension of the FESD method to nonsmooth systems that are described with step functions. We also adapt the convergence and well-posedness results of [38] to this case. If Filippov systems are modeled via step functions, one ends up with multi-affine expressions, consisting of products of step functions, on the r.h.s. of the ODE. We propose a lifting algorithm that introduces auxiliary variables and makes these expressions "less nonlinear". This can improve the convergence of the proposed method [6]. The dynamical system (2) is more general than the Filippov system. Therefore, the FESD method developed here applies to this broader class of systems. The performance of the new method is compared to the original FESD [38] and standard RK discretizations in terms of accuracy and computational time. All methods are implemented in the open-source software package nosnoc [1].
Outline. In Section 2, we define and relate various classes of nonsmooth systems, including Dynamic Complementarity Systems (DCS), Filippov systems, and systems with set-valued Heaviside step functions. It is also demonstrated that the latter is equivalent to a DCS. Section 3 explores the properties of this DCS for a fixed active set and during active-set changes. In Section 4, we introduce the FESD method for this class of problems. Section 5 presents convergence and well-posedness results for the new method. Additionally, in Section 6, we demonstrate how to efficiently model piecewise smooth systems with step functions and introduce a lifting algorithm to reduce nonlinearity in the DCS. In Section 7, we showcase the developments using several numerical examples.
Notation. The complementarity conditions for two vectors a, b ∈ R^n read as 0 ≤ a ⊥ b ≥ 0, where a ⊥ b means a⊤b = 0. For two scalar variables a, b, a so-called C-function [20] Ψ : R × R → R has the property Ψ(a, b) = 0 if and only if a ≥ 0, b ≥ 0 and ab = 0. All vector inequalities are to be understood element-wise; diag(x) ∈ R^{n×n} returns a diagonal matrix with x ∈ R^n containing the diagonal entries. The concatenation of two column vectors a ∈ R^na, b ∈ R^nb is denoted by (a, b) := [a⊤, b⊤]⊤; the concatenation of several column vectors is defined analogously. A column vector with all ones is denoted by e = (1, 1, . . ., 1) ∈ R^n, and its dimension is clear from the context. The closure of a set C is denoted by C̄, its boundary by ∂C. For a given vector a ∈ R^n and a set I ⊆ {1, . . ., n}, we define the projection matrix P_I ∈ R^{|I|×n}, which has zeros or ones as entries. It selects all components a_i, i ∈ I, from the vector a, i.e., a_I = P_I a. Given a matrix M ∈ R^{n×m}, its i-th row is denoted by M_i,• and its j-th column by M_•,j. For the left and right limits, we use the notation x(t_s^+) = lim_{t→t_s, t>t_s} x(t) and x(t_s^−) = lim_{t→t_s, t<t_s} x(t), respectively.

Overview and relations between different classes of nonsmooth dynamical systems
This section defines, reviews, and relates several classes of nonsmooth dynamical systems relevant to the algorithmic development in this paper. The class of Dynamic Complementarity Systems (DCS) is defined in Section 2.1. In Section 2.2, generic ODEs with a Discontinuous Right-Hand Side (DRHS) and their Filippov extension are considered. In Section 2.3, these results are applied to a more structured ODE with DRHS, namely to piecewise smooth systems. This is followed by Section 2.4, which treats another class of structured ODEs with DRHS, namely nonsmooth systems where Heaviside step functions appear in the dynamics. This class is the focus of this paper, but it is closely related to the other classes that we review. Section 2.5 relates these systems to Filippov systems and DCSs. The key tool is to rewrite the set-valued Heaviside step function in terms of the optimality conditions of an appropriate linear program. For comparison, we review Stewart's way of rewriting a Filippov system as a DCS in Section 2.6. We conclude with Section 2.7, where we summarize the relations between all system classes regarded in this section. The most important symbols are:

γ(y) — set-valued Heaviside step function, Eq. (1)
Γ(ψ(x)) — concatenation of all regarded step functions, Eq. (1)
f_i(x, u) — i-th mode of the total n_f modes of the PSS, Eq. (8)
F(x, u, Γ(ψ(x))) — r.h.s. of the nonsmooth ODE with set-valued Heaviside step functions, Eq. (2)
F_F(x, u) — r.h.s. of the Filippov differential inclusion, Eqs. (6), (11)
F_AP(x, u) — r.h.s. of the Aizerman-Pyatnitskii differential inclusion, Eq. (14)
F_H(x, u) — Filippov set obtained with set-valued Heaviside step functions, Eq. (20)
I(•) — active set for Filippov systems, Eq. (12)
T_n — n-th time interval with a fixed active set, T_n = (t_s,n, t_s,n+1)
K — index set of all switching functions ψ_i(x) that are zero for a given I, Sec. 3.2

This section assumes that a continuous control function u : [0, T] → R^nu is given. Discontinuous u(t) can be considered by partitioning the considered time interval [0, T] into pieces where u(t) is locally continuous, and considering a sequence of ODEs with continuous control functions.

Dynamic complementarity systems
A dynamic complementarity system (DCS) [13,39] is the problem:

ẋ(t) = f(x(t), y(t), z(t)), (3a)
0 = g_e(x(t), y(t), z(t)), (3b)
0 ≤ y(t) ⊥ g_c(x(t), y(t), z(t)) ≥ 0, (3c)

where x ∈ R^nx are the differential states, and y ∈ R^ny and z ∈ R^nz are the algebraic states of the DCS. The DCS consists of an ODE (3a) (defined by the function f : R^nx × R^ny × R^nz → R^nx) coupled with algebraic equality constraints (3b) (defined by g_e : R^nx × R^ny × R^nz → R^nz) and complementarity constraints (3c) (defined by y : [0, T] → R^ny and g_c : R^nx × R^ny × R^nz → R^ny). We assume that the problem functions f, g_e and g_c are at least twice continuously differentiable in all arguments. In the complementarity conditions (3c), both components y_i and g_c,i(x, y, z) have to be non-negative for i = 1, . . ., n_y, but only one of them is allowed to be strictly positive at a time t. These conditions encode combinatorial structure in the problem and make it nonsmooth. In contrast to many other nonsmooth system formulations, complementarity conditions can be efficiently treated with Newton-type methods, which makes them attractive from a computational point of view [20].
The formulation in Eq. (3) is very general, and further assumptions on f, g_e and g_c determine the existence, uniqueness, and qualitative properties of x, y and z. Several important models fit into the form of (3), most prominently rigid bodies with friction and impact [13,48] and electric circuits with electronic devices [3]. We refer the reader to [13,39,48] for an extensive collection of theoretical results and application examples.
If one writes (3c) equivalently via a C-function Φ, i.e., Φ(y, g_c(x, y, z)) = 0, the DCS (3) can be seen as a nonsmooth Differential Algebraic Equation (DAE). The DCS is an instance of a more general class of problems called Differential Variational Inequalities (DVI), formally introduced by Pang and Stewart [39]. One way to classify DCS (and DVIs) is their index [39], that is, how many times g_c(x, y, z) = 0 must be differentiated with respect to time t to find z and y as a function of x. A similar concept is the relative degree, cf. [13, Appendix C]. Index zero DVIs (and DCS) have continuously differentiable solutions [39, Proposition 5.1]. The existence and uniqueness of solutions of index zero systems are proven in [39]. Index one DVIs have in general absolutely continuous solutions [48]. Uniqueness for some index one DVIs is proven by Stewart [47]. Index two systems are systems with state jumps [48, Chapter 6]. For index two DVIs, only existence can be proven [46]. In general, for higher index DVIs solutions fail to exist [48], and if they exist they are distributions [13]. In this paper, we are interested in DCS where x(t) and y(t) are continuous functions of time, and z(t) is allowed to be discontinuous.
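The C-function reformulation mentioned above can be made concrete: the Fischer-Burmeister function Ψ_FB(a, b) = a + b − √(a² + b²) is a standard C-function [20]. The following sketch (illustrative only, not tied to any DCS solver) checks its defining property numerically:

```python
import math

def fb(a, b):
    # Fischer-Burmeister C-function:
    # fb(a, b) = 0  <=>  a >= 0, b >= 0 and a * b = 0
    return a + b - math.sqrt(a * a + b * b)
```

With this, each scalar complementarity condition in (3c) can be replaced by the nonsmooth equation fb(y_i, g_c,i(x, y, z)) = 0, which is amenable to Newton-type methods after suitable smoothing or regularization.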

Discontinuous ODEs and Filippov systems
Consider an ODE with a discontinuous right-hand side (DRHS):

ẋ(t) = f(x(t), u(t)), (4)

with a given initial value x(0) = x_0. More precisely, f is discontinuous in x but continuous in u.
Classical solutions x(t) to (4) on an interval [0, T] are continuously differentiable, which is impossible with a discontinuous f. Carathéodory solutions of an ODE satisfy the ODE in integral form, i.e., x(t) = x(0) + ∫_0^t f(x(τ), u(τ)) dτ for t > 0, where the integral is the Lebesgue integral. This allows discontinuities of f in x, and in some situations a Carathéodory solution might be of use, as x is absolutely continuous and (4) is required to hold only for almost all t ∈ [0, T]. However, this relaxation is not always sufficient, as can be seen from the following example [16, Example 5]:

ẋ(t) = −sign(x(t)), (5)

with x(0) = x_0. For x_0 > 0, there exists the solution x(t) = x(0) − t for t ∈ [0, x_0). Similarly, for x_0 < 0, there exists the solution x(t) = x(0) + t for t ∈ [0, −x_0). For t larger than |x_0|, in both cases the solution reaches the point x(t) = 0 and cannot leave it, as the vector fields from both sides push towards it. However, since ẋ = 0 ≠ −1, we have no solution in the classical or Carathéodory sense, and a more general concept is necessary. Deciding on which more general notion of solution to use is a modeling step. A popular option in the control community is Filippov's extension, which suggests embedding the ODE (4) into the following differential inclusion [21]:

ẋ(t) ∈ F_F(x(t), u(t)) := ∩_{δ>0} ∩_{μ(N)=0} conv f((x(t) + δB) \ N, u(t)), (6)

where B is the unit ball and the inner intersection is taken over all sets N of Lebesgue measure zero. The right-hand side is now a closed convex set. Given a point x(t) and the corresponding u(t), the idea behind this definition is to regard the closed convex hull of all neighboring values of f in a ball of radius δ around x(t), instead of only f(x(t), u(t)), and thereby to ignore all values of f on sets of measure zero. In practice, these sets are surfaces on which f becomes discontinuous or points/surfaces where f is not even defined. If f(•) is continuous at x(t), we obtain F_F(x(t), u(t)) = {f(x(t), u(t))}. The DI (6) is a so-called Filippov system. An absolutely continuous function x is said to be a Filippov solution if it is a solution to the initial value problem defined by (6). It is guaranteed that
at least one solution exists to the Filippov DI (6) [16,22]. For the uniqueness of solutions, more assumptions must be made; for explicit statements see [13,16,22].
For illustration, we revisit the example in (5) and apply Filippov's extension to it. We obtain

ẋ(t) ∈ F_F(x(t)) = {−1} if x(t) > 0, [−1, 1] if x(t) = 0, {1} if x(t) < 0. (7)

It can be seen that this DI has a solution for any x(0) and for all t ∈ [0, ∞), since for t > |x_0| we have that ẋ = 0 ∈ [−1, 1].

Piecewise smooth systems as Filippov systems
In the modeling of physical systems and control applications, the discontinuities in (4) usually appear in a very structured way. One such example are piecewise smooth systems, which read as:

ẋ(t) = f_i(x(t), u(t)), if x(t) ∈ R_i, i ∈ J := {1, . . ., n_f}, (8)

where the R_i ⊂ R^nx are disjoint, connected, and open sets. They are assumed to be nonempty and to have piecewise-smooth boundaries ∂R_i. We assume that ∪_{i∈J} R̄_i = R^nx and that R^nx \ ∪_{i∈J} R_i is a set of measure zero. The functions f_i(•) are assumed to be at least twice continuously differentiable and Lipschitz continuous on an open neighborhood of R̄_i. We regard PSS where the regions R_i are defined via a finite number of switching functions ψ_j(x), j = 1, . . ., n_ψ. It is assumed that the functions ψ_j(x) are Lipschitz continuous and at least twice continuously differentiable.
Definition 1 (Base regions). Given n_ψ scalar switching functions ψ_j(x), j ∈ C := {1, . . ., n_ψ}, we define the n_f = 2^{n_ψ} base regions R̃_i. These definitions are compactly expressed via a dense sign matrix S ∈ R^{n_f × n_ψ}, whose rows enumerate all possible sign combinations. The matrix S has no repeating rows and no zero entries. The sets R̃_i are defined using the rows of the matrix S:

R̃_i = {x ∈ R^nx | S_{i,j} ψ_j(x) > 0 for all j ∈ C}, i ∈ J. (10)

Note that the boundaries of the regions ∂R̃_i are unions of subsets of the zero-level sets of the corresponding functions ψ_j(x). For notational convenience and ease of exposition, we assume that the PSS regions are equal to the base regions, i.e., n_f = 2^{n_ψ} and R_i = R̃_i for i = 1, . . ., n_f. Other cases are discussed later in Section 6. Note that the ODE (8) is not defined on the region boundaries ∂R_i. However, if the vector fields from both sides of ∂R_i push towards it, the solution needs to evolve on ∂R_i. The cases when x(t) must evolve on ∂R_i [16,22] are so-called sliding modes. To define the sliding mode dynamics, one can apply Filippov's extension to the PSS (8). Furthermore, the special structure of the PSS allows a more explicit definition of (6) via a finite number of multipliers θ_i [22,44]. The corresponding DI reads as

ẋ ∈ F_F(x, u) = { Σ_{i∈J} θ_i f_i(x, u) | Σ_{i∈J} θ_i = 1, θ_i ≥ 0, and θ_i = 0 if x ∉ R̄_i, for all i ∈ J }. (11)

An important notion for the Filippov PSS (11) is the active set, which is defined as

I(x(t)) = {i ∈ J | θ_i(t) > 0}. (12)

Note that if x(t) is in the interior of a region, then I is a singleton, and for sliding modes, it consists of the indices of the regions neighboring the current sliding surface.
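The combinatorial structure of Definition 1 is straightforward to enumerate programmatically. The following sketch (helper names are hypothetical, not from nosnoc) builds the sign matrix S and locates the base region containing a point x at which no switching function vanishes:

```python
from itertools import product

def sign_matrix(n_psi):
    # Rows of S enumerate all 2**n_psi sign patterns; by construction
    # there are no repeated rows and no zero entries (Definition 1).
    return [list(row) for row in product((1, -1), repeat=n_psi)]

def region_index(S, psi_vals):
    # Index i of the base region containing x, assuming psi_j(x) != 0 for all j.
    for i, row in enumerate(S):
        if all(s * p > 0 for s, p in zip(row, psi_vals)):
            return i
    return None  # x lies on a region boundary (some psi_j(x) = 0)
```

For n_ψ = 2 this yields the four base regions of a PSS with two switching functions; the `None` branch corresponds to the boundaries where sliding modes may occur.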
Remark 2. A solution of the Filippov PSS (11) is an absolutely continuous function. However, systems with state jumps (and thus discontinuous x(t)) are often of practical interest. One way to transform such systems into a PSS, and thus to apply the methods developed in this paper, is to use the time-freezing reformulation [28,29,34,37,50]. In several classes of systems with state jumps, some parts of the state space are forbidden (e.g., a hopping robot is not allowed to penetrate the ground). The main idea of time-freezing is to define auxiliary regions R_i in the forbidden region and corresponding auxiliary dynamics f_i(x). By construction, the endpoints of the trajectory parts evolving in the auxiliary regions satisfy the state jump law. Furthermore, a clock state t(τ) is introduced, which evolves in the feasible region of the initial system (dt/dτ > 0) and is frozen in the initially forbidden region (dt/dτ = 0). Here, τ is the time of the new relaxed system. Now, by taking only the parts of the trajectory where the clock state t(τ) evolves, we recover the discontinuous solution of the original system. An example of such a system is treated in Section 7.2.

Aizerman-Pyatnitskii differential inclusions
So far, we have introduced dynamic complementarity and Filippov systems. We now introduce another special case of the discontinuous ODE (4) and relate it to the others later with the help of step functions. Regard an ODE with DRHS of the form:

ẋ = f(x, u, v(x)), (13)

where f : R^nx × R^nu × R^nv → R^nx is continuous in all arguments, but v(x) is discontinuous, e.g., it could be the usual single-valued Heaviside step function. Such systems appear commonly in sliding mode control or in the modeling of gene-regulatory networks [5]. It is assumed that at each point of discontinuity, for the components v_i(x), a closed convex set V_i(x) is given, out of which the corresponding arguments of (13) take their values. Hence, one can define the following differential inclusion:

ẋ ∈ F_AP(x, u) = {f(x, u, v) | v ∈ V(x)}. (14)

The set F_AP(x, u) is in general nonconvex, except in some special cases. For example, F_AP(x, u) is convex if v(x) enters the r.h.s. of (13) linearly and V(x) is closed and convex. The DI (14) does not have as rich a theory as Filippov DIs, and there are fewer results available on the existence of solutions and the convergence of numerical methods [5]. These systems are called Aizerman-Pyatnitskii DIs, cf. [5] and [22, page 55, Definition c]. Time-stepping methods for such systems were developed in [5].
In this paper, we focus on the case where the functions V_i(x) are given by set-valued Heaviside step functions. In particular, we regard the DI (2) introduced at the beginning of the paper, where V(x) = Γ(ψ(x)).

Rewriting Filippov PSS as DCS via set-valued Heaviside step functions
The next question we address is: How can we find an (implicit) function of x for computing the Filippov multipliers θ in (11)? To derive such a function, we will make use of the set-valued step function and the algebraic representations of the regions R_i in (10). Moreover, expressing the step function via the Karush-Kuhn-Tucker (KKT) conditions of a suitable Linear Program (LP), we can write the Filippov DI (11) as an equivalent DCS. This DCS will be the main formulation used in the development of the FESD method.

Heaviside step functions via linear programming
We start the described procedure with a closer look at the step functions. Let us denote by α ∈ R^{n_ψ} a selection α ∈ Γ(ψ(x)). A well-known way to express the function Γ(ψ(x)) [5,9] is the use of the solution map of the parametric linear program:

Γ(ψ(x)) = arg min_α −ψ(x)⊤α (15a)
s.t. 0 ≤ α ≤ e. (15b)

Note that all components of α are decoupled in this LP, i.e., every α_i can be obtained by solving a one-dimensional LP with the objective −ψ_i(x)α_i and the feasible set 0 ≤ α_i ≤ 1. Let λ_n, λ_p ∈ R^{n_ψ} be the Lagrange multipliers for the lower and upper bounds on α in (15b), respectively. The KKT conditions of (15) read as

ψ(x) = λ_p − λ_n, (16a)
0 ≤ λ_n ⊥ α ≥ 0, (16b)
0 ≤ λ_p ⊥ e − α ≥ 0. (16c)

Now we have a purely algebraic representation of the set-valued Heaviside step function. Let us look at a single component α_j and the associated function ψ_j(x). From the LP (15) and its KKT conditions, one can see that for ψ_j(x) > 0, we have α_j = 1. Since the upper bound is active, we have that λ_n,j = 0 and, from (16a), that λ_p,j = ψ_j(x) > 0. Likewise, for ψ_j(x) < 0, we obtain α_j = 0, λ_p,j = 0 and λ_n,j = −ψ_j(x) > 0. On the other hand, ψ_j(x) = 0 implies that α_j ∈ [0, 1] and λ_p,j = λ_n,j = 0. From this discussion, it can be seen that ψ(x), λ_n and λ_p are related by the following expressions:

λ_p = max(ψ(x), 0), λ_n = max(−ψ(x), 0), (17)

where the max is taken component-wise. That is, λ_p collects the positive parts of ψ(x) and λ_n the absolute values of the negative parts of ψ(x). From this relation, we can immediately conclude the following:

Lemma 3. Let ψ(x(t)) be a continuous function of time; then the functions λ_p(t) and λ_n(t) are continuous in time.
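Since the LP (15) decouples componentwise, its solution and multipliers have the closed form discussed above. A small sketch (illustrative only, scalar case):

```python
def step_kkt(psi):
    """Solve the scalar instance of LP (15) for a value psi = psi_j(x) and
    return (alpha, lam_n, lam_p). For psi == 0, alpha is any point in
    [0, 1]; here we return that interval."""
    lam_p = max(psi, 0.0)    # positive part of psi, cf. Eq. (17)
    lam_n = max(-psi, 0.0)   # absolute value of the negative part
    if psi > 0.0:
        alpha = 1.0
    elif psi < 0.0:
        alpha = 0.0
    else:
        alpha = (0.0, 1.0)
    return alpha, lam_n, lam_p
```

The identity psi = lam_p − lam_n (stationarity (16a)) holds for every input, and both multipliers vanish exactly when psi = 0, which is the continuity property exploited by Lemma 3.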
The continuity of the Lagrange multipliers λ_p(t) and λ_n(t) plays a crucial role in the switch detection in the FESD method. The exact switch detection is necessary for high-order accuracy and the correct computation of numerical sensitivities.

Filippov system as a dynamic complementarity systems
For the sake of clarity, we start by illustrating what we want to achieve on a simple example and give the general expressions in the sequel.
Example 1 (Step representation). We regard a PSS with four regions defined via two scalar switching functions ψ_1(x) and ψ_2(x). The regions are equal to the base sets from Definition 1 and read as

R_1 = {x | ψ_1(x) > 0, ψ_2(x) > 0}, R_2 = {x | ψ_1(x) > 0, ψ_2(x) < 0},
R_3 = {x | ψ_1(x) < 0, ψ_2(x) > 0}, R_4 = {x | ψ_1(x) < 0, ψ_2(x) < 0}.

The corresponding Filippov system (defined by (11)) reads as:

ẋ ∈ { Σ_{i=1}^4 θ_i f_i(x) | Σ_{i=1}^4 θ_i = 1, θ_i ≥ 0, θ_i = 0 if x ∉ R̄_i }. (18)

On the other hand, by direct evaluation of the step functions, one can see that

ẋ ∈ { α_1 α_2 f_1(x) + α_1 (1 − α_2) f_2(x) + (1 − α_1) α_2 f_3(x) + (1 − α_1)(1 − α_2) f_4(x) | α ∈ Γ(ψ(x)) }. (19)

Therefore, we conclude that the sets on the r.h.s. of (18) and (19) are the same, i.e., we can express the Filippov set via set-valued Heaviside step functions.
Furthermore, we can observe how the sign pattern in S determines how α_j enters the expression for θ_i: for S_{i,j} = 1 we have a factor α_j, and for S_{i,j} = −1 we have a factor (1 − α_j). In summary, the definition of θ_i consists of products of α_j and (1 − α_k), i.e., it is multi-affine in the selections α_j, j = 1, . . ., n_ψ.
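The pattern from Example 1 can be sketched generically for any sign matrix S (helper name hypothetical):

```python
def theta_from_alpha(S, alpha):
    # theta_i = prod_j (alpha_j if S_ij == 1 else 1 - alpha_j),
    # i.e., the multi-affine products observed in Example 1.
    thetas = []
    for row in S:
        t = 1.0
        for s, a in zip(row, alpha):
            t *= a if s == 1 else 1.0 - a
        thetas.append(t)
    return thetas
```

For any α ∈ [0, 1]^{n_ψ}, the resulting θ are nonnegative and sum to one, which is exactly the convex-multiplier property proven below via Lemma 4.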
We generalize the patterns observed in the previous example and define the set

F_H(x, u) := { Σ_{i∈J} θ_i f_i(x, u) | θ_i as in (21) for all i ∈ J, α ∈ Γ(ψ(x)) }, (20)

where each multiplier is given by the multi-affine expression

θ_i = Π_{j∈C} ((1 − S_{i,j})/2 + S_{i,j} α_j), i ∈ J, (21)

i.e., the product contains the factor α_j if S_{i,j} = 1 and the factor (1 − α_j) if S_{i,j} = −1. Similar definitions of F_H(x, u) as in (20) can be found in [18, Section 4.2] and [24, Section 2.1]. Observe that this set has the same form as the r.h.s. F_AP(x, u) in (14). Next, we show that F_H(x, u) is indeed the same set as F_F(x, u), i.e., the set on the r.h.s. of (11).
Lemma 4 (Lemma 1.5 in [18]). Let a_1, a_2, . . ., a_m ∈ R and consider the 2^m non-repeated products of the form Π_{i=1}^m (1 ± a_i). Then the sum of all these products equals 2^m.

Proposition 5. It holds that F_H(x, u) = F_F(x, u).

Proof. We only need to show that the multipliers θ_i defined in (21) satisfy θ_i ≥ 0 for all i ∈ J, θ_i = 0 if x ∉ R̄_i, and Σ_{i∈J} θ_i = 1. Every θ_i is nonnegative, since it consists of a product of terms that take values in [0, 1]. Without loss of generality, regard θ_1 and suppose that x ∉ R̄_1. This means that x ∈ R_i, i ≠ 1, and that at least one switching function has a sign opposite to the pattern S_1,•, which implies that the corresponding factor in (21) is zero. From (21) it follows that θ_1 = 0 if x ∉ R̄_1. By similar arguments, it follows that θ_i = 0 if x ∉ R̄_i for i = 1, . . ., n_f. Next, we show that Σ_{i∈J} θ_i = 1. We introduce the change of variables a_j = 2α_j − 1, i.e., α_j = (1 + a_j)/2 and 1 − α_j = (1 − a_j)/2. Then all θ_i are of the form 2^{−n_ψ} Π_{j∈C} (1 ± a_j). By applying Lemma 4, we conclude that Σ_{i∈J} θ_i = 1, and the proof is complete. □

To pass from the definition in Eq. (20) to a dynamic complementarity system, we state the KKT conditions of (15) to obtain an algebraic expression for Γ(ψ(x)). Combining this with the definition of the Filippov set in (20) and the expression for θ_i in (21), we obtain the following DCS:

ẋ = F(x, u)θ, (22a)
θ_i = Π_{j∈C} ((1 − S_{i,j})/2 + S_{i,j} α_j), i ∈ J, (22b)
ψ(x) = λ_p − λ_n, (22c)
0 ≤ λ_n ⊥ α ≥ 0, (22d)
0 ≤ λ_p ⊥ e − α ≥ 0, (22e)

where F(x, u) = [f_1(x, u), . . ., f_{n_f}(x, u)] ∈ R^{nx × n_f}. We group all algebraic equations into a single function and use a C-function Ψ(•, •) for the complementarity conditions to obtain a more compact expression. Finally, we obtain a compact representation of (22) in the form of a nonsmooth DAE:

ẋ = F(x, u)θ, 0 = G(x, θ, α, λ_p, λ_n). (23)

We can see that (22) is an instance of the generic DCS (3). Set z = (θ, α) and y = (λ_n, λ_p). It follows that f(x, y, z) = F(x, u)θ, the function g_e(x, y, z) is defined by the expressions in (22b) and (22c), and g_c(x, y, z) = (α, e − α).

Remark 6. In this section, we have focused on PSS, their Filippov extension, and their relation to Aizerman-Pyatnitskii DIs (14). However, we can also reformulate generic Aizerman-Pyatnitskii DIs with Heaviside step functions (2) into DCS by replacing the step function by (16). There is no need for equivalence to Filippov systems to apply this procedure, and the numerical method developed in this paper can also be applied directly to such DCS. Such a more general example is treated in Section 7.1.

Stewart's representation
The FESD method [38] was originally developed for Stewart's representation [44] of Filippov systems. Stewart's representation assumes a specific definition of the regions R_i and uses an LP to transform the Filippov system into an equivalent DCS, which is not the same DCS as (22). For comparison and completeness, we briefly introduce Stewart's DCS.
In Stewart's representation [44], the regions R_i are defined via so-called indicator functions g_i(x) for all i ∈ J = {1, . . ., n_f}. The definition reads as

R_i = {x ∈ R^nx | g_i(x) < g_j(x) for all j ∈ J, j ≠ i}. (24)

Arguably, this definition of the regions might be less intuitive than (10). However, if the regions R_i match the base sets R̃_i from Definition 1, it was shown in [38, Proposition 2] that the function g(x) = (g_1(x), . . ., g_{n_f}(x)) can be obtained as g(x) = −Sψ(x). The multiplier vector θ is expressed as the solution of an LP parameterized by x:

θ ∈ arg min_{θ̃ ∈ R^{n_f}} g(x)⊤θ̃ (27a)
s.t. e⊤θ̃ = 1, (27b)
θ̃ ≥ 0. (27c)

[Figure 1: relations between the system classes — ODE with DRHS (4); Filippov DI (6); DCS (3); Filippov PSS (11); Heaviside DCS (22); structured ODE with DRHS (13); Aizerman-Pyatnitskii DI (14); Stewart DCS (28).]

Using its KKT conditions, one can obtain a DCS equivalent to the Filippov system with the regions defined in (24), which reads as:

ẋ = F(x, u)θ, (28a)
0 = g(x) − λ − μe, (28b)
0 ≤ λ ⊥ θ ≥ 0, (28c)

where μ ∈ R and λ ∈ R^{n_f} are the Lagrange multipliers for the constraints (27b) and (27c), respectively. The DCS (28) is also an instance of the generic DCS (3). Set z = (θ, μ) and y = λ.
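For base regions, the indicator functions can be sketched as g(x) = −Sψ(x), which is consistent with Example 2 below, where ψ(x) = x gives g(x) = (−x, x). In the interior of a region, the active region is then the unique minimizer of g (a sketch; the helper name is hypothetical):

```python
def stewart_g(S, psi_vals):
    # Indicator functions for base regions: g(x) = -S @ psi(x).
    # In region i, every product S_ij * psi_j(x) is positive, so g_i(x)
    # is the unique smallest component of g(x).
    return [-sum(s * p for s, p in zip(row, psi_vals)) for row in S]

# Example: two switching functions, psi_1 = 0.5 > 0, psi_2 = -1.0 < 0,
# so x lies in the region with sign pattern (+, -).
S = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
g = stewart_g(S, [0.5, -1.0])
active = min(range(len(g)), key=g.__getitem__)
```

The LP (27) then selects θ as a unit vector at this minimizing index whenever x is in the interior of a region.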

Summary and relations between different formalisms
The diagram in Figure 1 summarizes the relationships between the nonsmooth systems studied in this paper. The first column consists of the different ODEs with DRHS that we have treated. After treating a generic ODE with DRHS in (4), we specialize to two structured cases: PSS in (8) and ODEs (13) whose dynamics contain some discontinuous expressions of x, e.g., Heaviside step functions.

These ODEs may not have a classical or Carathéodory solution in some cases, so we embed them in differential inclusions and obtain more general solution notions, such as Filippov solutions. These concepts are summarized in the second column. A generic ODE with DRHS is generalized to a Filippov DI (6). If the ODE is more structured, as a piecewise smooth system, then its Filippov extension becomes more explicit (11). In both cases, the r.h.s. is now a convex and compact set. If the discontinuous function in the structured ODE (13) is replaced by set-valued extensions, we obtain an Aizerman-Pyatnitskii DI (14). The set on the r.h.s. may even be nonconvex. In Proposition 5, we show that when we use PSS and Heaviside step functions, the Aizerman-Pyatnitskii DI and the Filippov extension for PSS are equivalent.

The third column consists of dynamic complementarity systems obtained from the DIs, which are useful representations for numerical computations. The Heaviside step DCS (22) and the Stewart DCS (28) are both instances of a generic DCS (3). They are derived from the corresponding differential inclusions using the KKT conditions of parametric LPs, as shown in Sections 2.5.2 and 2.6, respectively. The numerical methods developed in [38] and in this paper exploit the continuity properties of the Lagrange multipliers in the KKT conditions of these LPs. We conclude this section by illustrating the different formalisms with an example.
Example 2. Let us illustrate the different classes of nonsmooth systems on the discontinuous ODE (which is in the form of (4)):

ẋ(t) = 2 − sign(x(t)).

This is a PSS (as in (8)) with the switching function ψ(x) = x and the regions R_1 = {x ∈ R | x < 0} and R_2 = {x ∈ R | x > 0}, with the modes f_1(x) = 3 and f_2(x) = 1. The Filippov extension of this PSS (Eq. (11)) reads as:

ẋ ∈ F_F(x) = {3} if x < 0, [1, 3] if x = 0, {1} if x > 0.

In the form of an Aizerman-Pyatnitskii DI (2), i.e., (14), the system reads as ẋ ∈ 3 − 2γ(x).
We can also write it as a DCS of the form of (22):

ẋ = 3(1 − α) + α, x = λ_p − λ_n, 0 ≤ λ_n ⊥ α ≥ 0, 0 ≤ λ_p ⊥ 1 − α ≥ 0.

Similarly, by defining the indicator function g(x) = (−x, x), we can state the Stewart DCS (28):

ẋ = 3θ_1 + θ_2, 0 = g(x) − λ − μe, 0 ≤ λ ⊥ θ ≥ 0.
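For the system of Example 2 with x(0) < 0, the Filippov solution is available in closed form: the trajectory moves with slope f_1 = 3 until it reaches the discontinuity surface x = 0 at t = −x(0)/3 and then crosses with slope f_2 = 1. A sketch (helper name hypothetical):

```python
def x_example2(t, x0):
    """Closed-form Filippov solution of xdot = 2 - sign(x) for x0 < 0."""
    assert x0 < 0
    t_switch = -x0 / 3.0       # switching time: x reaches the surface x = 0
    if t <= t_switch:
        return x0 + 3.0 * t    # mode f_1 = 3 in R_1 = {x < 0}
    return t - t_switch        # mode f_2 = 1 in R_2 = {x > 0}
```

This is precisely the kind of kink at t_switch that a fixed-step RK discretization locates only to low order, while FESD places a finite-element boundary at it exactly.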

Properties of the step representation DCS
In this section, we study some properties of the DCS obtained via Heaviside step functions (22) for a fixed active set and at active-set changes. These properties will be useful for the algorithmic development in the subsequent sections. For a fixed active set, i.e., I(x(t)) = {i | θ_i(t) > 0} being constant on some time interval, there are no switches and the dynamics have locally no discontinuities. On the other hand, at an active-set change there is a switch and a discontinuity in the dynamics.

Active-set changes and continuity of λ p and λ n
Active-set changes are paired with discontinuities in some of the algebraic variables. We have seen in Lemma 3 that λ^p and λ^n are continuous functions of time.
At an active-set change, at least one of the switching functions ψ_j(x) either becomes zero or, if it was zero, becomes nonzero. It follows from ψ_j(x(t)) = λ^p_j(t) − λ^n_j(t) (in (17)) that both λ^p_j(t) and λ^n_j(t) must be zero at an active-set change. We now use the DCS formulation via step functions in (22) to illustrate the different switching cases that arise in Filippov systems.
Example 3. There are four possible switching cases which we illustrate with the following examples: (a) crossing a surface of discontinuity, ẋ(t) ∈ 2 − sign(x(t)) (same system as in Example 2), (b) entering a sliding mode, ẋ(t) ∈ −sign(x(t)) + 0.2 sin(5t), (c) leaving a sliding mode, ẋ(t) ∈ −sign(x(t)) + t, (d) spontaneously leaving a sliding mode, ẋ(t) ∈ sign(x(t)). In case (a), for x(0) < 0 the trajectory reaches x = 0 and crosses it. In example (b), for any finite x(0), the trajectory reaches x = 0 and stays there. On the other hand, in example (c), for x(0) = 0, the DI has a unique solution and leaves x = 0 at t = 1. In the last example, the DI has infinitely many solutions for x(0) = 0, and x(t) can spontaneously leave x = 0 at any t ≥ 0. The trajectories are illustrated in Figure 2. Whenever x(t) has a kink, which corresponds to a switch and a discontinuity in the dynamics, both Lagrange multipliers λ^p(t) and λ^n(t) are zero at that time.
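Case (b) can be reproduced in a few lines. The sketch below uses an implicit Euler time-stepping scheme for the DI, exploiting that the scalar implicit step has a closed-form solution; the helper name and numbers are illustrative, not part of the method developed in this paper:

```python
import math

def implicit_euler_sign_di(x0, h, T, c):
    """Implicit Euler for the DI  x' in -Sign(x) + c(t)  (Example 3, case (b)).
    The implicit step x+ = x + h*(c - s), s in Sign(x+), is solved in closed
    form: if neither sign choice is self-consistent, the step lands exactly
    on x = 0, i.e., the trajectory enters the sliding mode."""
    x, t = x0, 0.0
    while t < T - 1e-12:
        ct = c(t + h)
        x_pos = x + h * (ct - 1.0)   # candidate assuming x+ > 0 (s = 1)
        x_neg = x + h * (ct + 1.0)   # candidate assuming x+ < 0 (s = -1)
        if x_pos > 0:
            x = x_pos
        elif x_neg < 0:
            x = x_neg
        else:
            x = 0.0                  # sliding: some s = c in [-1, 1] keeps x+ = 0
        t += h
    return x

# The trajectory reaches x = 0 in finite time and then stays there exactly.
x_end = implicit_euler_sign_di(1.0, 1e-2, 2.0, lambda t: 0.2 * math.sin(5 * t))
print(x_end)
```
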

Fixed active set in the step formulation
We study the properties of the DCS (22) for a fixed active set I(x(t)). Without loss of generality, the corresponding time interval is T = (0, T). For a fixed active set, the DCS (22) reduces either to an ODE or to a Differential Algebraic Equation (DAE).
We start with the simpler ODE case. Let ψ_j(x) ≠ 0 for all j ∈ C := {1, . . ., n_ψ}; then x(t) is in the interior of some region R_i. It can be seen from the LP (15) that α_j ∈ {0, 1} for all j ∈ C. This implies that θ_i = 1 and θ_k = 0 for k ≠ i. Therefore, I(x(t)) = {i} and the Filippov DI reduces to ẋ ∈ F_F(x) = {f_i(x)}, i.e., we have locally an ODE.
Next, we regard the case when I(x(t)) is not a singleton, i.e., the trajectory evolves on the boundary of two or more regions. Consequently, we have at least one ψ_j(x) = 0. Let us associate with I(x(t)) the index set K(x(t)) = {j ∈ C | ψ_j(x) = 0}, i.e., the set of indices of all switching functions that are zero for a given active set I(x(t)).
In the sequel, we make use of the following notation. For a given vector a ∈ R^n and a set I ⊆ {1, . . ., n}, we define the projection matrix P_I ∈ R^{|I|×n}, which has zeros or ones as entries. It selects all components a_i, i ∈ I, from the vector a, i.e., a_I := P_I a ∈ R^{|I|}. Following the discussion from the previous section, for all nonzero ψ_j(x), i.e., j ∈ C \ K, we can compute α_j ∈ {0, 1} via the LP (15) and λ^p_j, λ^n_j via (17). Next, we have that λ^p_j = λ^n_j = 0 for all j ∈ K. It is left to determine α_j for all j ∈ K and thus implicitly all θ_i for all i ∈ I. Recall that θ_i = 0 for all i ∉ I. By fixing the already known variables in (22) we obtain the DAE (29), where we define F_I(x, u) := F(x, u) P_I^⊤ ∈ R^{n_x×|I|}, i.e., we select only the columns of F(x, u) with index i ∈ I. Note that α_j for all j ∈ C \ K are known and thus not degrees of freedom. We keep them for ease of notation. Thus we have a DAE with |I| + |K| unknowns, namely θ_I ∈ R^{|I|} and α_K ∈ R^{|K|}, and |I| + |K| algebraic equations in (29b) and (29c).
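The projection matrix P_I can be formed explicitly; a small sketch (0-based indices, illustrative helper name):

```python
import numpy as np

def projection_matrix(I, n):
    """Build P_I in R^{|I| x n} with 0/1 entries so that P_I @ a selects
    the components a_i, i in I (indices 0-based in this sketch)."""
    P = np.zeros((len(I), n))
    for row, i in enumerate(I):
        P[row, i] = 1.0
    return P

a = np.array([10.0, 20.0, 30.0, 40.0])
P = projection_matrix([1, 3], 4)
print(P @ a)  # selects the second and fourth components: [20., 40.]
```

Selecting columns of F(x, u) as in F_I(x, u) = F(x, u) P_I^⊤ corresponds to `F @ P.T` in this notation.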
Next, we investigate conditions under which the DAE (29) is well-posed. For this purpose, we define the matrix W_{K,I}(x, u) := ∇ψ_K(x)^⊤ F_I(x, u) ∈ R^{|K|×|I|}, where ∇ψ_K(x) = [∇ψ_j(x)]_{j∈K} ∈ R^{n_x×|K|} is a matrix whose columns are the gradients of the switching functions that are zero for the given active set I. Moreover, we define a compact notation for the partial Jacobian B_{K,I}(α) ∈ R^{|I|×|K|} of (29b) w.r.t. α_K, whose entries are the partial derivatives of the multi-affine expressions for θ_I w.r.t. the components of α_K.
Assumption 7. Given a fixed active set I(x(t)) = I for t ∈ T, it holds that the matrix functions W_{K,I}(x, u) and B_{K,I}(α) are Lipschitz continuous, and that W_{K,I}(x, u) B_{K,I}(α) has rank |K|, i.e., it is full rank, for all t ∈ T.
Proposition 8. Suppose that Assumption 7 holds. Given an initial value x(0), the DAE (29) has a unique solution for all t ∈ T.
Proof. First, we differentiate (29c) with respect to t, such that the algebraic variables appear explicitly in the algebraic equations (this corresponds to so-called index reduction in the theory of DAEs, cf. [27]). Next, we prove that the partial Jacobian of (30b)-(30c) w.r.t. the algebraic variables (θ_I, α_K) has rank |I| + |K|, i.e., it is an invertible matrix. We omit the dependencies on α and x for brevity. To prove that the Jacobian of (30b)-(30c) w.r.t. (θ_I, α_K) has full rank, we show that the only solution (v, w) ∈ R^{|I|} × R^{|K|} to the corresponding linear system (31) is the zero vector. From the first line we have v = B_{K,I} w, and substituting this into the second line we have W_{K,I} B_{K,I} w = 0. Since the matrix W_{K,I} B_{K,I} ∈ R^{|K|×|K|} has rank |K|, the only solution to (31) is w = 0, and v = B_{K,I} w = 0. Hence, the Jacobian has full rank. Now we can apply the implicit function theorem [19, Theorem 1B.1] to (30b)-(30c), which guarantees the existence of continuously differentiable functions θ_I(x) and α_K(x). Since the function θ_I(x) is continuously differentiable, it is also Lipschitz continuous for a fixed I(x(t)) on t ∈ T. By substituting θ_I(x) into (30a), we have a product of two Lipschitz continuous functions (all columns of F_I(x, u) are Lipschitz by assumption), and the DAE (30) reduces to an ODE with a Lipschitz continuous r.h.s. This enables us to apply the Picard-Lindelöf Theorem to obtain the assertion of the proposition.
□
We make a few comments on Assumption 7. The rank condition can be checked explicitly, since one can simply compute the matrix W_{K,I}(x, u) B_{K,I}(α). We have already assumed Lipschitz continuity of all f_i(x, u) and of the gradients ∇ψ_j(x). Here we additionally assume it for the matrix W_{K,I}(x, u), whose entries are computed as inner products of these vectors. The entries of the matrix B_{K,I}(α) are multi-affine terms, which are also Lipschitz continuous, at least on the bounded domains we consider here. In Stewart's reformulation, one considers a square matrix with entries ∇g_i(x)^⊤ f_j(x, u) (which is structurally similar to W_{K,I}(x, u)). For well-posedness with a fixed active set, the invertibility of this matrix is assumed [38,44].
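The rank condition of Assumption 7 can be verified numerically for given data. The sketch below uses hypothetical problem data with |I| = 2 regions and |K| = 1 active switching function, and a hypothetical multi-affine parametrization θ = (1 − α, α):

```python
import numpy as np

# Hypothetical data: n_x = 2, gradients of the active switching functions
# (columns) and the selected modes f_i(x, u) (columns of F_I).
grad_psi_K = np.array([[1.0], [0.0]])     # psi(x) = x_1, so grad psi = (1, 0)
F_I = np.array([[3.0, 1.0], [0.0, 0.0]])  # columns: f_1 = (3, 0), f_2 = (1, 0)
W_KI = grad_psi_K.T @ F_I                 # W_{K,I} in R^{|K| x |I|}

# Partial Jacobian of theta_I = (1 - alpha, alpha) w.r.t. alpha_K: here it is
# constant, since this multi-affine parametrization is linear in alpha.
B_KI = np.array([[-1.0], [1.0]])          # B_{K,I} in R^{|I| x |K|}

M = W_KI @ B_KI                           # must have rank |K| by Assumption 7
print(np.linalg.matrix_rank(M))
```
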
In [18], the authors make some assumptions on the signs of the entries of W_{K,I}(x) and prove the existence, but not uniqueness, of solutions with a fixed-point argument. For the case of |K| ≤ 2, i.e., sliding modes of co-dimension one or two, and with additional assumptions on the signs of the entries of W_{K,I}(x), they even prove the uniqueness of solutions.
Observe that for a given x(t), there might be several I(x(t)) that yield meaningful DAEs of the form of Eq. (29). This may happen when the Filippov DI does not have unique solutions, such as in Example 3, case (d). The trajectory may stay in sliding mode or leave it at any point in time. The trajectory pieces in either scenario are well-posed, even though the overall trajectory is not unique.
Figure 3: Example trajectories of an ODE with DRHS, with a switch at t = 1. The solid lines correspond to numerical approximations, the transparent ones to analytic solutions. The circle markers correspond to the stage values of an underlying Runge-Kutta (RK) method, the vertical dashed lines to the boundaries of the integration intervals. The left plot shows an approximation obtained with a time-stepping RK method, the middle one with FESD but without step equilibration, and the right one with FESD including step equilibration.

Main ideas
We outline the main ideas in the derivation of the FESD discretization; in the following sections we explain each step in more detail. The starting point is a standard time-stepping Runge-Kutta (RK) discretization of the DCS (22). In this case, a fixed integration step size is assumed. If a switch occurs within an integration interval, the high-accuracy properties of these methods are lost. This is shown in the left plot of Figure 3. The switch occurred at an RK stage point inside the integration interval, and thereafter there is a large discrepancy between the numerical approximation and the exact solution. Nevertheless, the standard RK methods are the starting point for the development of the FESD method, and we recall them in Section 4.2.
If the switches always coincided with the integration interval boundaries, the RK method would not lose its accuracy. This can be achieved by detecting the switches and adjusting the step size. To achieve this, in the FESD method [38], the integration step sizes are left as degrees of freedom. This idea was first proposed by Baumrucker and Biegler in [9] and extended and theoretically analyzed in [38]. Moreover, additional complementarity conditions, called cross complementarities, prohibit switches within an integration interval (finite element), which leads to their isolation at the boundaries and the recovery of high accuracy. An illustration is given in the middle plot of Figure 3. In contrast to the fixed step size discretization, the numerical and exact solutions are indistinguishable. We study these extensions in Section 4.3.
However, when there are no switches, the step sizes are not uniquely determined. As can be seen in the middle plot of Figure 3, the integration intervals all have different and somewhat random lengths. In Section 4.4 we introduce the step equilibration conditions, which make neighboring finite elements have the same length if no switches occur, and thus determine the step sizes uniquely. The right plot in Figure 3 illustrates this effect, where the grids before and after the switch are now equidistant. This is also a key ingredient for having locally unique solutions, as we will show later.
In Section 4.5, we summarize the developments of the previous sections and state the full FESD discretization in a compact way. We emphasize that in the FESD method, switch detection is fully implicit and based only on algebraic conditions. This makes it suitable for the discretization of optimal control problems. We conclude with Section 4.6, where we show how to apply the FESD method to discretize an optimal control problem subject to a nonsmooth dynamical system with Heaviside step functions.
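The accuracy loss and its cure can be illustrated on the scalar system from Example 2, ẋ = 2 − sign(x): a uniform grid that misses the switch incurs an O(h) error, while a grid with a point exactly at the switching time is exact for this piecewise-constant right-hand side. A minimal sketch (explicit Euler for simplicity; FESD instead determines such switch-aligned grids implicitly via complementarity conditions, and the grids here are given by hand):

```python
def euler_on_grid(grid, x0):
    """Explicit Euler for x' = 2 - sign(x) on a given time grid. The method is
    exact on each interval where the sign of x does not change, because the
    right-hand side is then constant."""
    x = x0
    for t0, t1 in zip(grid[:-1], grid[1:]):
        # x = 0 (within tolerance) is treated as entering the region x > 0.
        f = 3.0 if x < -1e-12 else 1.0
        x = x + (t1 - t0) * f
    return x

x0, T = -1.0, 1.0                       # switch at t* = 1/3, exact x(T) = 2/3
fixed = [0.1 * k for k in range(11)]    # uniform grid, misses the switch
aligned = [0.0, 1.0 / 3.0, 1.0]         # grid with a point at the switch
print(euler_on_grid(fixed, x0))         # 0.8: O(h) error from the missed switch
print(euler_on_grid(aligned, x0))       # 2/3: exact
```
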

Standard implicit Runge-Kutta discretization
As a starting point in our analysis, we regard a standard Runge-Kutta (RK) discretization of the DCS (22). For ease of notation, we work with the nonsmooth DAE formulation of the DCS (24), which we restate for convenience:
ẋ = F(x, u)θ, (32a)
0 = G(x, θ, α, λ^p, λ^n). (32b)
In the sequel, one should keep in mind that (32b) collects all algebraic equations, including the complementarity conditions (22d) and (22e).
In a discretized Optimal Control Problem (OCP) we usually have several control intervals. In our derivations here, it is sufficient to consider a single control interval [0, T] with a constant control input q ∈ R^{n_u}, i.e., we set u(t) = q for t ∈ [0, T]. The extension to more elaborate control parameterizations is straightforward [42]. In Section 4.6 we show how to discretize OCPs with multiple control intervals.
Let x(0) = s_0 be the given initial value. The control interval [0, T] is divided into N_FE finite elements (i.e., integration intervals) [t_n, t_{n+1}] via the grid points 0 = t_0 < t_1 < . . . < t_{N_FE} = T. On each of the finite elements we regard an n_s-stage Runge-Kutta method, which is characterized by the Butcher tableau entries a_{i,j}, b_i and c_i with i, j ∈ {1, . . ., n_s} [27]. The step sizes read as h_n = t_{n+1} − t_n, n = 0, . . ., N_FE − 1. The approximation of the differential state at the grid point t_n is denoted by x_n ≈ x(t_n). We regard the so-called differential representation of the Runge-Kutta method [27]. Thus, the derivatives of the states at the RK stage points t_{n,i} := t_n + c_i h_n, i = 1, . . ., n_s, are the degrees of freedom. For a single finite element, we summarize them in the vector V_n := (v_{n,1}, . . ., v_{n,n_s}) ∈ R^{n_s n_x}. The stage values for the algebraic variables are collected in the vectors Θ_n := (θ_{n,1}, . . ., θ_{n,n_s}), A_n := (α_{n,1}, . . ., α_{n,n_s}), Λ^p_n := (λ^p_{n,1}, . . ., λ^p_{n,n_s}) and Λ^n_n := (λ^n_{n,1}, . . ., λ^n_{n,n_s}). The vector x^next_n denotes the state value at t_{n+1}, which is obtained after a single integration step. Now, we can state the RK equations for the DCS (32) for a single finite element as (33). Next, we summarize the equations for all N_FE finite elements over the entire interval [0, T] in a discrete-time system format. To make it more manageable, we use some additional shorthand notation and group all variables of all finite elements for a single control interval into the following vectors: x = (x_0, . . ., x_{N_FE}) ∈ R^{(N_FE+1)n_x}, V = (V_0, . . ., V_{N_FE−1}) ∈ R^{N_FE n_s n_x} and h := (h_0, . . ., h_{N_FE−1}) ∈ R^{N_FE}. Recall that the simple continuity condition x_{n+1} = x^next_n holds. We collect the stage values of the Filippov multipliers in the vector Θ = (Θ_0, . . ., Θ_{N_FE−1}) ∈ R^{n_θ} with n_θ = N_FE n_s n_f. Similarly, we group the stage values of the algebraic variables in the vectors A, Λ^p, Λ^n ∈ R^{n_α}, where n_α = N_FE n_s n_c. Finally, we collect all internal variables in the vector Z = (x, V, Θ, A, Λ^p, Λ^n). All computations over a single control interval of the standard discretization (denoted by the subscript std in the corresponding functions) are summarized in (34), where s_1 ∈ R^{n_x} is the approximation of x(T).
In (34), h is a given parameter and implicitly fixes the discretization grid.In contrast to standard RK discretizations, we will now proceed by letting h be degrees of freedom and introduce the cross-complementarity conditions.
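Before adding the FESD-specific conditions, it is instructive to see the differential representation of one RK step in isolation. The sketch below implements the 2-stage Radau IIA method (which satisfies c_{n_s} = 1, as assumed in the next section) for the scalar linear test ODE ẋ = λx, where the implicit stage equations reduce to a linear solve; this illustrates the underlying smooth integrator, not the full DCS discretization (33):

```python
import math
import numpy as np

# Butcher tableau of the 2-stage Radau IIA method (order 3), c = (1/3, 1).
A = np.array([[5.0 / 12.0, -1.0 / 12.0],
              [3.0 / 4.0, 1.0 / 4.0]])
b = np.array([3.0 / 4.0, 1.0 / 4.0])

def radau_iia_step(x, h, lam):
    """Differential representation: the stage derivatives v_i are the unknowns,
    v_i = f(x + h * sum_j a_ij v_j). For f(x) = lam * x this is the linear
    system (I - h*lam*A) v = lam * x * e, solved directly."""
    v = np.linalg.solve(np.eye(2) - h * lam * A, lam * x * np.ones(2))
    return x + h * b @ v

x1 = radau_iia_step(1.0, 0.1, -1.0)
print(abs(x1 - math.exp(-0.1)))  # local error is O(h^4), here around 1e-6
```
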

Cross complementarity
For ease of exposition, suppose that the underlying RK scheme satisfies c_{n_s} = 1. This means that the right boundary point of a finite element is a stage point, since t_{n+1} = t_n + c_{n_s} h_n. For example, this assumption is satisfied by Radau and Lobatto schemes [27]. We provide extensions for c_{n_s} ≠ 1 at the end of the section.
The goal is to derive additional constraints that allow active-set changes only at the boundary of a finite element, compare the left and middle plots in Figure 3. Moreover, in this case, the step size h_n should adapt such that the switch is detected exactly. Recall that for the step reformulation, at every stage point we have the complementarity conditions:
0 ≤ λ^n_{n,m} ⊥ α_{n,m} ≥ 0,  0 ≤ λ^p_{n,m} ⊥ e − α_{n,m} ≥ 0, (35)
where n is the index of the finite element (integration interval) and m the index of the RK stage. We exploit the continuity of the Lagrange multipliers λ^p and λ^n, cf. Lemma 3. We regard the boundary values of the approximations of λ^p and λ^n on an interval [t_n, t_{n+1}]. They are denoted by λ^p_{n,0}, λ^n_{n,0} (which we define below) at t_n and λ^p_{n,n_s}, λ^n_{n,n_s} at t_{n+1}. Next, we impose a continuity condition for the discrete-time versions of λ^p and λ^n for all n ∈ {0, . . ., N_FE − 1}:
λ^p_{n+1,0} = λ^p_{n,n_s},  λ^n_{n+1,0} = λ^n_{n,n_s}. (36)
Note that λ^p_{0,0} and λ^n_{0,0} are not defined via (36), as we do not have a preceding finite element for n = 0. Nevertheless, they are crucial for determining the active set in the first finite element. They are not degrees of freedom but parameters determined by a given s_0. Using (17) we obtain λ^p_{0,0} = max(ψ(s_0), 0) and λ^n_{0,0} = − min(ψ(s_0), 0). We have seen in Section 3.1 that, due to continuity, λ^p_i(t) and λ^n_i(t) must be zero at an active-set change, see also Figure 2. Moreover, on an interval t ∈ (t_n, t_{n+1}) with a fixed active set, the components of these multipliers are either zero or positive on the whole interval. The discrete-time counterparts, i.e., the stage values λ^p_{n,m} and λ^n_{n,m}, should satisfy these properties as well. We achieve these goals via the cross complementarity conditions, which read, for all n ∈ {0, . . ., N_FE − 1}, m ∈ {1, . . ., n_s}, m' ∈ {0, . . ., n_s} and j ∈ C, as:
0 = λ^n_{n,m',j} α_{n,m,j}  and  0 = λ^p_{n,m',j} (1 − α_{n,m,j}). (37)
In contrast to (36), here we have conditions relating variables corresponding to different RK stages within a finite element. Equation (37) extends the complementarity conditions for the same RK stage, i.e., for m = m', which are part of the standard RK equations, cf. (35). Some of the claims about the constraints (37) are formalized by the next lemma. Recall that in our notation α_{n,m,j} is the j-th component of the vector α_{n,m}.
Lemma 9. Regard a fixed n ∈ {0, . . ., N_FE − 1} and a fixed j ∈ C. If any α_{n,m,j} with m ∈ {1, . . ., n_s} is positive, then all λ^n_{n,m',j} with m' ∈ {0, . . ., n_s} must be zero. Conversely, if any λ^n_{n,m',j} is positive, then all α_{n,m,j} are zero.
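The logic of Lemma 9 can be checked numerically with an aggregated form of the cross complementarity conditions; the sketch below pairs the stage values of α and λ^n for a single finite element and a single switching function (an illustrative aggregation, one of several equivalent forms):

```python
import numpy as np

def cross_comp_residual(alpha_stages, lam_n_stages):
    """Aggregated cross-complementarity residual for one finite element and one
    switching function: sum over all stage pairs (m, m') of
    alpha_{n,m} * lam_n_{n,m'}. It vanishes iff Lemma 9 holds, i.e., iff the
    active set does not change inside the element."""
    return float(np.sum(np.outer(alpha_stages, lam_n_stages)))

# No switch inside the element: alpha stays 1, lam_n stays 0 -> residual 0.
r_fixed = cross_comp_residual([1.0, 1.0], [0.0, 0.0, 0.0])
# Active-set change inside the element: some alpha and some lam_n are both
# positive at different stages -> residual is positive, the constraint bites.
r_switch = cross_comp_residual([1.0, 0.0], [0.4, 0.0, 0.0])
print(r_fixed, r_switch)
```
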
It is now left to discuss why the boundary points λ^p_{n+1,0} = λ^p_{n,n_s} and λ^n_{n+1,0} = λ^n_{n,n_s} of the previous finite element are included in the cross complementarity conditions (37). It turns out that they are the key to switch detection. A consequence of Lemmata 9 and 10 is that, if the active set changes in the j-th component between the n-th and (n + 1)-th finite element, then it must hold that λ^p_{n,n_s,j} = λ^p_{n+1,0,j} = 0 and λ^n_{n,n_s,j} = λ^n_{n+1,0,j} = 0. Since x^next_n = x_{n+1}, we have from (33) the condition ψ_j(x_{n+1}) = 0, which defines exactly the switching surface between two regions. Therefore, we have implicitly a constraint that forces h_n to adapt such that the switch is detected exactly.
Given a, b ∈ R^p, the complementarity conditions 0 ≤ a ⊥ b ≥ 0 mean that a, b ≥ 0 and a_i b_i = 0, i = 1, . . ., p. Due to the non-negativity of a and b, the last conditions can be replaced by a^⊤ b = 0. Similar aggregations can be made with the cross complementarity conditions. For clarity, we stated (37) in their most sparse form, without any aggregation. However, the non-negativity of α_{n,m}, λ^p_{n,m} and λ^n_{n,m} allows more compact forms. In the sequel, we use a formulation such that, together with the constraint
∑_{n=0}^{N_FE−1} h_n = T,
we have the same number of new equations as new degrees of freedom obtained by varying h_n. Thus, we combine the constraints of two neighboring finite elements and arrive at the compact formulation (38), whose entries are given for all n ∈ {0, . . ., N_FE − 2}. We remind the reader that we use this seemingly complicated form to obtain a square system of equations. This simplifies the study of the well-posedness of the FESD equations later. However, in an implementation one can use any of the equivalent sparser, or denser, formulations. Many possible variants are implemented in nosnoc [35], and the user can control the sparsity.

Step size equilibration
To complete the derivation of the FESD method for the Heaviside step reformulation (32), we need to derive the step equilibration conditions. Here, step refers to the integration step size and should not be confused with the set-valued Heaviside step function.
If no active-set changes happen, the cross complementarity constraints (37) are implied by the standard complementarity conditions (35). This can easily be verified by looking at the stage points in the middle plot of Figure 3. Therefore, we end up with a system of equations with more degrees of freedom than conditions. The step equilibration constraints aim to remove the degrees of freedom in the appropriate h_n if no switches happen. This results in a piecewise uniform discretization grid for the differential and algebraic states on the considered time interval.
We achieve the goals outlined above via the equation:
0 = (h_n − h_{n−1}) η_n,  n = 1, . . ., N_FE − 1,
where η_n is an indicator that is zero only if a switch occurs and otherwise strictly positive. This provides a condition that removes the spurious degrees of freedom. In the remainder of this section, we derive a possible expression for η_n. The derivations below are motivated by the following facts. Let t_n be a switching point with ψ_j(x(t_n)) = 0 for some j ∈ C. Consequently, it holds that λ^p_j(t_n) = λ^n_j(t_n) = 0. If, for example, a switch occurs at t_n such that ψ_j(x(t_n^−)) < 0 and ψ_j(x(t_n^+)) > 0, we have that λ̇^n_j(t_n^−) < 0 and λ̇^p_j(t_n^+) > 0. This can be verified by looking at the plots in Figure 2. The symmetric case is possible as well. The absolute values of these directional derivatives help us to encode the switching logic. Now, instead of looking at the time derivatives, in the discrete-time case we exploit the non-negativity of λ^p_{n,m}, λ^n_{n,m}, and the fact that no switches occur within a finite element. For n ∈ {1, . . ., N_FE − 1}, we define the following backward and forward sums of the stage values over the neighboring finite elements [t_{n−1}, t_n] and [t_n, t_{n+1}]:
σ^{λ^p,B}_n := ∑_m λ^p_{n−1,m},  σ^{λ^p,F}_n := ∑_m λ^p_{n,m},  σ^{λ^n,B}_n := ∑_m λ^n_{n−1,m},  σ^{λ^n,F}_n := ∑_m λ^n_{n,m},
where m ranges over the stage indices of the respective finite element. Moreover, for all n ∈ {1, . . ., N_FE − 1} we define the following variables to summarize the logical dependencies:
π^{λ^p}_{n,j} := σ^{λ^p,B}_{n,j} σ^{λ^p,F}_{n,j},  π^{λ^n}_{n,j} := σ^{λ^n,B}_{n,j} σ^{λ^n,F}_{n,j},  and  υ_{n,j} := π^{λ^p}_{n,j} + π^{λ^n}_{n,j},  j ∈ C.
The switching cases and the sign logic are summarized in Table 2. For readability, we put a one in the table if a variable is positive and a zero if it is zero. Let us discuss how the variables above encode the switching logic; for this purpose, we regard the j-th switching function ψ_j(x). If no switch occurs and, for example, we have that ψ_j(x(t)) < 0 during the regarded time interval, it follows that λ^n_j(t) > 0 and λ^p_j(t) = 0 during this time interval. In the discrete-time setting, we have σ^{λ^n,B}_{n,j}, σ^{λ^n,F}_{n,j} > 0 and σ^{λ^p,B}_{n,j} = σ^{λ^p,F}_{n,j} = 0. This means that π^{λ^n}_{n,j} > 0, π^{λ^p}_{n,j} = 0 and υ_{n,j} > 0.
It can be seen that the symmetric case with ψ_j(x(t)) > 0 also leads to υ_{n,j} > 0; hence we do not enumerate all symmetric cases in Table 2.
On the other hand, if we have a switch of the crossing type (cf. top plots in Figure 2) with ψ_j(x(t)) < 0 for t < t_{s,n} and ψ_j(x(t)) > 0 for t > t_{s,n}, it follows that λ^n_j(t) > 0, λ^p_j(t) = 0 for t < t_{s,n} and λ^n_j(t) = 0, λ^p_j(t) > 0 for t > t_{s,n}. In the discrete-time setting we obtain the sign pattern as in the second row of Table 2, with υ_{n,j} = 0.
In general, if there is an active-set change in the j-th complementarity pair, then in each of the pairs (σ^{λ^p,B}_{n,j}, σ^{λ^p,F}_{n,j}) and (σ^{λ^n,B}_{n,j}, σ^{λ^n,F}_{n,j}) at most one entry is nonzero. In these cases, we obtain that υ_{n,j} = 0, and if no switch happens we have that υ_{n,j} > 0.
In other words, υ_{n,j} is zero only if there is an active-set change in the j-th complementarity pair at t_n; otherwise, it is strictly positive. We summarize the logical relations for all switching functions into a single scalar expression and define
η_n := ∏_{j∈C} υ_{n,j}.
It is zero only if an active-set change happens at the boundary point t_n; otherwise, it is strictly positive.
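A possible realization of this switching indicator can be sketched numerically. The code below forms the backward/forward sums σ, the products π, the per-function indicators υ_{n,j}, and η_n for hypothetical stage values of the multipliers; the exact stage index ranges are an assumption of this sketch:

```python
import numpy as np

def eta(lam_p_prev, lam_p_next, lam_n_prev, lam_n_next):
    """Switch indicator for the boundary point t_n between two finite elements.
    Inputs: arrays of nonnegative stage values of lam_p / lam_n on the backward
    element [t_{n-1}, t_n] (prev) and the forward element [t_n, t_{n+1}] (next),
    with one switching function per column."""
    # Backward/forward sums sigma over the stages of each neighboring element.
    s_p_B, s_p_F = np.sum(lam_p_prev, axis=0), np.sum(lam_p_next, axis=0)
    s_n_B, s_n_F = np.sum(lam_n_prev, axis=0), np.sum(lam_n_next, axis=0)
    # pi > 0 only if the multiplier is positive on BOTH neighboring elements.
    pi_p, pi_n = s_p_B * s_p_F, s_n_B * s_n_F
    upsilon = pi_p + pi_n            # zero iff the j-th active set changes at t_n
    return float(np.prod(upsilon))   # zero iff any active-set change occurs

# psi_j < 0 on both elements (no switch): lam_n > 0 throughout -> eta > 0.
eta_no_switch = eta(np.zeros((2, 1)), np.zeros((2, 1)),
                    0.3 * np.ones((2, 1)), 0.2 * np.ones((2, 1)))
# Crossing at t_n: lam_n > 0 before, lam_p > 0 after -> eta = 0.
eta_crossing = eta(np.zeros((2, 1)), 0.5 * np.ones((2, 1)),
                   0.3 * np.ones((2, 1)), np.zeros((2, 1)))
print(eta_no_switch, eta_crossing)
```
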

The FESD discretization
We have now introduced all extensions needed to pass from a standard RK discretization (34) to the FESD discretization for the step reformulation. With a slight abuse of notation, we collect all equations in a discrete-time system form (40), where F_fesd(x) = x_{N_FE} is the state transition map and G_fesd(x, h, Z, q, T) collects all other internal computations, including all RK steps within the regarded time interval. Here, the control variable q, the horizon length T, and the initial value s_0 are given parameters.
Remark on RK methods with c_{n_s} ≠ 1. The extension to the case of an RK method with c_{n_s} ≠ 1 follows similar lines as in Stewart's formulation [38]. We have that t_n + c_{n_s} h_n < t_{n+1}. Hence, the variables λ^p_{n,n_s} and λ^n_{n,n_s} do not correspond to the boundary values λ^p(t_{n+1}) and λ^n(t_{n+1}) anymore. We denote the true boundary values by λ^p_{n,n_s+1} and λ^n_{n,n_s+1}. They can be computed from the KKT conditions of the step reformulation LP (15) for all n ∈ {1, . . ., N_FE − 2}, and these equations are appended to the FESD equations in (41). However, to make the switch detection work, we must update the continuity conditions for the discrete-time versions of the Lagrange multipliers and adapt the cross-complementarity conditions accordingly. For all n ∈ {0, . . ., N_FE − 1}, (36) is replaced by:
λ^p_{n+1,0} = λ^p_{n,n_s+1},  λ^n_{n+1,0} = λ^n_{n,n_s+1}.
We append to the vectors A, Λ^p and Λ^n the new variables α_{n,n_s+1}, λ^p_{n,n_s+1} and λ^n_{n,n_s+1} accordingly. For the whole control interval, we have in total 3(N_FE − 1)n_c new variables. It is only left to state the modified cross complementarity conditions, which now include the (n_s + 1)-th stage values; more explicitly, the n-th component of (38) is adapted accordingly for all n ∈ {0, . . ., N_FE − 2}.

Discretizing optimal control problems with FESD

Regard an optimal control problem subject to a nonsmooth dynamical system (2), of the form (44), where L : R^{n_x} × R^{n_u} → R is the running cost, R : R^{n_x} → R is the terminal cost, and s_0 ∈ R^{n_x} is a given initial value. The path and terminal constraints are grouped into the functions G_path : R^{n_x} × R^{n_u} → R^{n_{G_p}} and G_terminal : R^{n_x} → R^{n_{G_t}}, respectively. In this paper, we consider a direct approach [42], i.e., we first discretize the continuous-time OCP (44) and then solve a finite-dimensional nonlinear program (NLP). Here we discretize the OCP using the FESD method. First, for the discretization of the control function, we consider N ≥ 1 control intervals of equal length, indexed by k. We use a piecewise constant control discretization, where the control variables are collected in q = (q_0, . . ., q_{N−1}) ∈ R^{N n_u}. Such a discretization is typically used in feedback control, but extensions to more sophisticated control parameterizations are straightforward. All internal variables are additionally indexed by k.
Second, we discretize the cost function (44a) and the dynamics (44c). To apply the FESD method, the differential inclusion in (44c) is transformed into an equivalent DCS, as described in Section 2.5. The DCS, now of the form of (22), is discretized with the FESD method summarized in the previous section. For each control interval k we use (40) with N_FE internal finite elements. The state values at the control interval boundaries are grouped in the vector s = (s_0, . . ., s_N) ∈ R^{(N+1)n_x}. In Z = (Z_0, . . ., Z_{N−1}) we collect all internal variables and in H = (h_0, . . ., h_{N−1}) all step sizes. For the cost discretization, one can derive a quadrature formula [42, Chapter 8], or introduce a scalar quadrature state with ℓ̇(t) = L(x(t), u(t)), ℓ(0) = 0, integrate it together with the dynamics, and use ℓ(T) in the objective, which approximates the integral term in (44a).
Third, the path constraints are relaxed and evaluated only at the control interval boundary points, i.e., at the variables s_k and q_k, k = 0, . . ., N − 1. If necessary, the path constraints can be evaluated on a finer grid, using the values computed at the internal integration intervals within a control interval or at the RK stage points. The terminal constraint and cost are simply evaluated at s_N, which is an approximation of x(T). In summary, we obtain a discrete-time variant of (44), stated in (45), where L : R^{n_x} × R^{(N_FE+1)n_s n_x} × R^{n_u} → R is the discretized running cost. Due to the complementarity constraints in the FESD discretization, this NLP is a mathematical program with complementarity constraints (MPCC). These are degenerate NLPs, since the complementarity constraints lead to the violation of standard constraint qualifications at all feasible points. In practice, they can often be solved efficiently via a sequence of related, relaxed NLPs within a homotopy approach. Such an approach with some of the standard reformulations [43,8,41] is implemented in nosnoc. A survey and comparison of state-of-the-art methods for solving MPCCs arising in nonsmooth optimal control is given in [36]. All these homotopy solution methods are implemented in nosnoc [35]. In practice, Scholtes' relaxation [43] together with IPOPT [51], called via its CasADi interface [7], often works very well. We use this method in the numerical experiments in this paper when solving problems like (45).
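The homotopy idea can be illustrated on a toy MPCC. The sketch below drives a Scholtes-type relaxation parameter σ to zero on min (x−1)² + (y−2)² s.t. x, y ≥ 0, xy ≤ σ, warm-starting each relaxed NLP at the previous solution; SciPy's SLSQP stands in for the IPOPT/CasADi setup used by nosnoc, and the problem data are purely illustrative:

```python
from scipy.optimize import minimize

def solve_mpcc_homotopy(sigma0=1.0, kappa=0.1, n_homotopy=6):
    """Minimal Scholtes-type homotopy sketch for the toy MPCC
        min (x-1)^2 + (y-2)^2  s.t.  x >= 0, y >= 0, 0 <= x _|_ y,
    where the complementarity x*y = 0 is relaxed to x*y <= sigma and sigma
    is driven towards zero, each NLP warm-started at the previous solution."""
    z = [0.5, 0.5]
    sigma = sigma0
    for _ in range(n_homotopy):
        res = minimize(lambda z: (z[0] - 1) ** 2 + (z[1] - 2) ** 2, z,
                       method="SLSQP",
                       bounds=[(0, None), (0, None)],
                       constraints=[{"type": "ineq",
                                     "fun": lambda z, s=sigma: s - z[0] * z[1]}])
        z = res.x
        sigma *= kappa
    return z

x, y = solve_mpcc_homotopy()
print(x * y)  # complementarity residual, driven towards zero
```

The exact solution of the toy MPCC is (x, y) = (0, 2); the homotopy tracks it as the relaxed feasible set shrinks.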

Convergence theory of FESD for the step representation
In this section, we present the main convergence results of the FESD method for the step representation outlined in (40). Specifically, we show that: (1) the solutions to the FESD problem are locally isolated; (2) the solution approximations and (3) the numerical sensitivities obtained via the FESD method converge with the same order of accuracy as the underlying RK method. The proofs are similar to those for Stewart's case in [38]; hence, we do not repeat them here. The main difference lies in the assumptions we make.

Main assumptions
We start by stating all assumptions. The first assumption relates to the underlying RK method: Assumption 11. (Runge-Kutta method) A Butcher tableau with the entries a_{i,j}, b_i and c_i, i, j ∈ {1, . . ., n_s}, related to an n_s-stage Runge-Kutta (RK) method is used in the FESD discretization (40). Moreover, we assume that: (a) If the same RK method is applied to the differential algebraic equation (29) on an interval [t_a, t_b] with a fixed active set, it has a global accuracy of O(h^p) for the differential states.
(b) The RK equations applied to (29) have a locally isolated solution for a sufficiently small h n > 0.
The purpose of this assumption is to describe the properties of an underlying RK method used in FESD when applied to a smooth ODE or DAE. Both requirements (a) and (b) are standard and are satisfied by many RK methods used in practice, e.g., Radau IIA or Gauss-Legendre methods, cf. [27]. The second assumption concerns the existence of solutions to the FESD problem outlined in (40).
Assumption 12. (Solution existence) For given parameters s 0 , q and T , there exists a solution to the FESD problem (40), such that for all n ∈ {0, . . ., N FE − 1} it holds that h n > 0.
The problem (40) is a nonlinear complementarity problem [20]. The proof of existence of solutions for the standard RK equations (34) can probably be done with standard tools from complementarity theory [20], as was done, for example, in [5, Proposition 15] for the implicit Euler method. With the additional cross complementarity and step equilibration conditions appearing in (40), the same proof technique is no longer applicable. The existence of solutions is still an open problem. Motivated by empirical observations, we assume the existence of solutions in this paper. The next assumption is slightly more technical and relates to the regularity of the problem under consideration.
This assumption is made to ensure the correct rank of the partial Jacobians of (40) (with a fixed active set). It is used to prove the local uniqueness of solutions of the FESD problem, and similar assumptions are made in [38]. The first part of the assumption requires that at least one complementarity pair at the stage points within an integration interval (finite element) satisfies strict complementarity. Looking at the common switching cases in Figure 2, one can see that this assumption always holds and is thus not restrictive. The second part requires that at least one term of G_rk(x_{n+1}, Z_n, h_n, q), multiplied by h_n, is nonzero. Both assumptions can be checked computationally once a candidate solution has been computed.
Before stating the final assumption, let us introduce some notation. We use a suitable interpolation scheme to construct a continuous-time approximation of the solution based on the stage values obtained by solving equation (40). We denote the continuous-time approximation on every finite element [t_n, t_{n+1}] by x̂_n(t; h_n). To approximate the solution over the entire time interval [0, T], we append the local approximations from each finite element and denote the result by x̂_h(t), where h = max_{n∈{0,...,N_FE−1}} h_n. The subscript h for a solution approximation x̂_h(t) indicates that we get different solution approximations on the entire interval [0, T] as the maximum step size changes, and not that the entire approximation is parameterized by a single step size h. As the maximum step size shrinks, we expect x̂_h(t) to converge to an exact solution x(t). The set of all grid points is defined as G = {t_0, . . ., t_{N_FE}}.
We can also use this approach to construct continuous-time representations of the algebraic variables, which we denote by θ̂_h, α̂_h, λ̂^n_h and λ̂^p_h. The n-th switching point of the true solution is denoted by t_{s,n}, and the one corresponding to a solution approximation by t̂_{s,n}. Similarly, the active sets (cf. (12)) of the solution approximation are denoted by I(x̂_h(t)) = Î_n for t ∈ (t̂_{s,n}, t̂_{s,n+1}), and the active set at the switching point t̂_{s,n} by Î^0_n.
Assumption 14. Let I^0_n be the active set at the switching point x(t_{s,n}) of the true solution and Î^0_n the active set at the switching point x̂(t̂_{s,n}) of a solution approximation. If t̂_{s,n} is sufficiently close to t_{s,n} and I^0_n = Î^0_n, then I_{n+1} = Î_{n+1}. Furthermore, if there are several possible new active sets, they are identical for the true solution and its approximation.
This assumption requires that a given solution approximation and the corresponding true solution predict the same active sets. In other words, the solution approximation enters the same region or sliding mode as the true solution after a switching event. In Stewart's reformulation, this statement can be directly proved using an auxiliary linear complementarity problem constructed with the problem data [44]. In the case of the DCS (22), such auxiliary problems are not available and the property is assumed directly. The requirement that a sufficiently good solution approximation predicts the same active set as the true solution is needed to prove the high-accuracy convergence of a switch detection method such as FESD, cf. [44, Theorem 4.3] and [38, Theorem 16].

Solutions of the FESD problem are locally isolated
In this section, we present a theorem on the regularity of solutions to the FESD problem (40). For brevity, we focus on the case where c_ns = 1. For the reader's convenience, we restate the FESD problem in (47). Furthermore, we provide a summary of the dimensions of all key functions and variables: • Degrees of freedom: Z = (x, V, Θ, A, Λ^p, Λ^n) and h.
• Total number of degrees of freedom: n_Z + N_FE, where n_Z = (N_FE + 1) n_x + N_FE n_s n_x + n_θ + 3n_α.
• Dimension of Θ:
• Cross complementarity:

The vectors s_0 ∈ R^{n_x}, q ∈ R^{n_u} and T ∈ R are given parameters. As a result, the system of equations (47) has a total of n_Z + N_FE unknowns and n_Z + 2N_FE − 1 equations, which means it is always over-determined.
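For concreteness, the dimension bookkeeping above can be evaluated programmatically. The following Python helper is a hypothetical illustration (not part of nosnoc) that evaluates the formulas for n_Z and the unknown/equation counts stated in the text:

```python
def fesd_dimensions(N_FE, n_s, n_x, n_theta, n_alpha):
    """Bookkeeping for the FESD problem (47): unknowns vs. equations.

    n_Z follows the formula in the text; the extra N_FE unknowns are the
    step sizes h_n, and the system carries N_FE - 1 additional conditions,
    so it is over-determined for N_FE > 1.
    """
    n_Z = (N_FE + 1) * n_x + N_FE * n_s * n_x + n_theta + 3 * n_alpha
    n_unknowns = n_Z + N_FE           # stage variables plus step sizes h_n
    n_equations = n_Z + 2 * N_FE - 1  # over-determined by N_FE - 1
    return n_unknowns, n_equations

# Hypothetical sizes, e.g. n_x = 9 states, Radau IIA with n_s = 2 stages
print(fesd_dimensions(N_FE=3, n_s=2, n_x=9, n_theta=3, n_alpha=3))  # → (105, 107)
```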
Theorem 15. Suppose that Assumptions 11, 12 and 13 hold. Let s_0, q and T be some fixed parameters such that G_fesd(Z*, h*, s_0, q, T) = 0 in (47). Let P* ⊆ R^{n_x} × R^{n_u} × R be the set of all parameters (ŝ_0, q̂, T̂) such that Z ∈ R^{n_Z}, the solution of G_fesd(Z, h, ŝ_0, q̂, T̂) = 0, has the same active set as Z*. Additionally, suppose that G_fesd(·) is continuously differentiable in s_0, q and T for all (s_0, q, T) ∈ P*. Then there exists a neighborhood P ⊆ P* of (s_0, q, T) on which there exist continuously differentiable single-valued functions Z*: P → R^{n_Z} and h*: P → R^{N_FE}.
Proof.The proof follows similar lines as the proof of [38,Theorem 14] and we omit it for brevity.
□ The main steps in the proof are as follows. One observes that if there is no active-set change (i.e., no switch) between two adjacent finite elements, then the cross-complementarity conditions are implied by the standard complementarity conditions; they are thus redundant and can be removed. In this case, the step size is locally determined by the step equilibration condition. Conversely, if there is a switch, then the cross-complementarity conditions are binding and the corresponding step equilibration condition can be removed. In summary, the FESD problem (47) reduces to a square system with n_Z + N_FE equations and unknowns. Under the given assumptions, one can prove that the Jacobian of (47) has full rank and apply the implicit function theorem to obtain the above result.

Convergence of the FESD method
We proceed by stating the results on the convergence of the FESD method. We show that, under suitable assumptions, the sequence of approximations x̂_h(·) generated by the FESD method converges to a solution of (8). In particular, the FESD method has the same order of accuracy as the underlying RK method for smooth ODEs.
Theorem 16. Let x(·) be a solution of (8) with finitely many active-set changes for t ∈ [0, T] and x(0) = x_0. Suppose the following is true: (a) Assumptions 7 and 14 are satisfied.
Then x(·) is a limit point of the sequence of approximations x̂_h(·), defined in Eq. (46), as h ↓ 0. Moreover, for sufficiently small h > 0, the solution of (40) generates a solution approximation x̂_h(t) on [0, T] such that: ∥x̂_h(t) − x(t)∥ = O(h^p) for all t ∈ G.
Proof. The proof follows similar lines as the proof of [38, Theorem 16]. The primary distinction lies in the prediction of new active sets. In [38, Theorem 16], we use [38, Assumption 8] to be able to apply [44, Lemma A.2] and demonstrate that both the approximation and the exact solution share the same active set in the vicinity of a switching point. In this theorem, the assertion follows directly from Assumption 14. □

The proof of [38, Theorem 16] is inspired by the proof of [44, Theorem 4.3] and is quite involved. The main idea is to consider intervals [t_{s,n}, t_{s,n+1}] with fixed active sets I_n, where the dynamics are locally smooth and we recover the accuracy of the underlying RK method. At a switching point t̂_{s,n}, respectively t_{s,n}, one has to prove (48a) and that the solution approximation can continue to evolve with the same active set I_{n+1} as the true solution. Then the argument can be applied inductively. Some distinction must be made for the cases where the approximation switches before or after the true solution. Once (48a) is proved, (48b) can be obtained from some algebraic manipulations and the Lipschitz continuity of the local dynamics and its solution. The error is dominated by the maximum step size h; hence, it is used in the error estimate.
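As an aside, the claimed order can be checked empirically from a sequence of runs with decreasing step sizes. The following Python sketch (with hypothetical data, not from the paper's experiments) estimates the observed order p from error/step-size pairs via a log-log least-squares fit:

```python
import math

def observed_order(h, err):
    """Estimate the convergence order p from pairs (h_i, E_i), assuming
    E(h) ~ C * h**p: least-squares slope of log E against log h."""
    n = len(h)
    lx = [math.log(v) for v in h]
    ly = [math.log(v) for v in err]
    mx, my = sum(lx) / n, sum(ly) / n
    num = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    den = sum((x - mx) ** 2 for x in lx)
    return num / den

# Hypothetical errors consistent with a 3rd-order method (e.g., Radau IIA, n_s = 2)
hs = [0.1, 0.05, 0.025, 0.0125]
errs = [2.0 * h**3 for h in hs]
print(round(observed_order(hs, errs), 2))  # → 3.0
```

Plain time-stepping applied to a system with switches would instead show a slope close to one in such a fit, which is the behavior FESD is designed to avoid.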

Convergence of discrete-time sensitivities
This section concludes by demonstrating that the numerical sensitivities (cf. Section 1 for a definition) obtained using the FESD method for the step reformulation converge to the correct values with a high order of accuracy. We remind the reader that numerical sensitivities of standard time-stepping methods, e.g., (34), do not converge to the correct values, no matter how small the step size becomes [49]. The convergence of the sensitivities is crucial for the success of direct optimal control methods [33].
Before stating the result, we assume that the time derivatives of the solution approximation x̂_h converge accordingly. This assumption extends Assumption 11 and allows us to consider a wide range of RK methods.

Assumption 17 (RK derivatives). Regard the RK methods from Assumption 11 applied to the differential algebraic equations (29). The derivatives of the numerical approximation for the same RK method converge with order 1 ≤ q ≤ p, i.e., ∥ẋ̂_h(t) − ẋ(t)∥ = O(h^q), t ∈ G.
We remind the reader that for collocation-based implicit RK methods for ODEs, in general it holds that q = p − 1 [26, Theorem 7.10].
Proof. The proof is essentially the same as the proof of [38, Theorem 18]; one only has to replace the local switching functions ψ_{i,j}(x) by an appropriate switching function ψ_k(x). □

The key to this proof is the fact that the cross-complementarity conditions imply ψ_j(x̂_h(t̂_s)) = 0 at a switching point t̂_s, cf. Section 4.3. Most of the proof then applies the chain rule and the implicit function theorem to obtain a similar expression for the discrete-time numerical sensitivities as in the continuous-time case.

Efficient modeling with set-valued Heaviside step functions
In this section, we show how to efficiently represent common geometries of the PSS regions with the use of Heaviside step functions. This is useful for reducing the complexity of the modeling process. Moreover, we introduce a lifting algorithm, which makes the multi-affine expressions for θ_i in (22b) "less nonlinear" with the help of auxiliary variables.

Overview of expressions for θ via Heaviside step functions
We regard the following two sets: A = {x | ψ_A(x) > 0} and B = {x | ψ_B(x) > 0}, and let α_A ∈ γ(ψ_A(x)) and α_B ∈ γ(ψ_B(x)). Note that these sets do not have to correspond to the base regions in Definition 1, but we can use them to define such regions. Table 3 provides an overview of how the elementary algebraic expressions for the multipliers θ_i are related to the geometric definition of a region R_i. More complicated expressions can be obtained by combining the ones listed in Table 3.
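To illustrate the kind of entries appearing in Table 3, the following Python sketch evaluates standard indicator algebra for complements, intersections, and unions in terms of α_A and α_B. The exact expressions of Table 3 are those given in the paper; this snippet is only a hypothetical illustration of the underlying logic:

```python
def step_complement(a):       # x not in A      ->  1 - alpha_A
    return 1.0 - a

def step_intersection(a, b):  # x in A and in B ->  alpha_A * alpha_B
    return a * b

def step_union(a, b):         # x in A or in B  ->  inclusion-exclusion
    return a + b - a * b

# Away from the boundaries, alpha in {0, 1} and the expressions
# reproduce Boolean logic:
aA, aB = 1.0, 0.0             # x in A, x not in B
print(step_intersection(aA, step_complement(aB)))  # membership in A \ B → 1.0
```

On the switching surfaces, α takes values in [0, 1] and the same formulas produce the convex multipliers needed for the Filippov convexification.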
Remark 19 (Sum of Filippov systems). In practice, one often encounters DIs that arise from the sum of several Filippov systems. This occurs, for example, if we have multiple surfaces with friction, or multiple objects touching the same frictional surface [45]. All developments from this paper can be extended to this case and are implemented in nosnoc. For the sake of brevity, we omit the corresponding equations and refer the reader to [38, Section 2.3].

Representing unions of sets
The regions R_i may be given as unions of base sets R̃_j. Consequently, the number of multipliers θ ∈ R^{n_f} decreases and it holds that n_f < 2^{n_ψ}. For example, in the extreme case, we may have R_1 = R̃_i and R_2 = ∪_{j=1, j≠i}^{2^{n_ψ}} R̃_j for some i, which significantly reduces the number of variables, since we are left with only two regions, i.e., n_f = 2. We illustrate such a case with a simple example and discuss the general case in the sequel.
Example 4 (Union of sets). We regard an example with two scalar switching functions ψ_1(x) and ψ_2(x), with the base regions

The PSS is defined by the two regions: According to the second row of Table 3, if a region R_i consists of the union of two sets, the expression for θ_i is the sum of the two corresponding indicators. In the current example, R_1 consists of the union of three base regions, so the expression for θ_1 must be a sum of three terms corresponding to the three base sets. On the other hand, all base sets are constructed from the intersection of two sets defined by ψ_1(x) and ψ_2(x), see the third row of Table 3. Thus, the products in the expressions correspond to the intersection of the sets. Therefore, the related Filippov system reads as ẋ = θ_1 f_1(x) + θ_2 f_2(x). By direct calculation, we verify that θ_1, θ_2 ≥ 0 and θ_1 + θ_2 = 1. In contrast to the previous examples, the union of sets introduces a sum in the expression for θ_1.
We generalize the reasoning above as follows. Given n_f total regions and the matrix F(x, u) ∈ R^{n_x × n_F}, where n_F = 2^{n_ψ} is the number of possibly repeating columns, we define the index sets R_k = {i | F_{·,i}(x, u) = f_k(x, u)}, i.e., the sets of the indices of the columns of F(x, u) equal to f_k(x, u). Note that if we have no unions, then n_F = n_f and R_k = {k} for all k ∈ J. Using these definitions, the expression for θ_k in (21) reduces to:

θ_k = Σ_{i ∈ R_k} Π_{j=1}^{n_ψ} ((1 − S_{i,j})/2 + S_{i,j} α_j).
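The reduced expression can be evaluated directly from the sign matrix S. The following Python sketch (an illustration, not the nosnoc implementation) computes the multipliers θ_k for given α and index sets R_k:

```python
def theta_from_signs(S, alpha, region_index_sets):
    """Evaluate theta_k = sum_{i in R_k} prod_j ((1 - S[i][j]) / 2 + S[i][j] * alpha[j]),
    where S has entries +1/-1: an entry +1 selects alpha_j, an entry -1
    selects 1 - alpha_j."""
    thetas = []
    for Rk in region_index_sets:
        tk = 0.0
        for i in Rk:
            prod = 1.0
            for j, s in enumerate(S[i]):
                prod *= (1 - s) / 2 + s * alpha[j]
            tk += prod
        thetas.append(tk)
    return thetas

# Two switching functions, four base sets; R_1 unites the first three base sets.
S = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
theta = theta_from_signs(S, alpha=[0.3, 0.8], region_index_sets=[[0, 1, 2], [3]])
print(round(sum(theta), 12))  # partition of unity → 1.0
```

Away from the switching surfaces, α ∈ {0, 1}^{n_ψ} and exactly one base-set product is nonzero; in all cases the multipliers sum to one.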

A lifting algorithm for the multi-affine terms
As can be seen from (21), the expression for θ_i consists of a product of n_ψ affine terms (of the form α_j and 1 − α_k). If n_ψ is large, then this expression is very nonlinear. To reduce the nonlinearity, we introduce auxiliary lifting variables as in the lifted Newton method [6], which iterates on a larger but less nonlinear problem. Whenever there are more than two terms in the multi-affine expression for θ_i, we introduce lifting variables β_k and derive an equivalent formulation which has only bilinear terms. Now, instead of having n_f multi-affine expressions, we have n_f + n_β expressions, which are less nonlinear. We exploit the structure of the matrix S and derive an easy-to-implement algorithm that automates the lifting procedure. To give an idea of the final result we aim to obtain, we illustrate the lifting procedure with an example.
We proceed by outlining a general lifting algorithm. The expressions for θ_i consist of the product of n_ψ affine terms. Our goal is to have at most n_d terms in each multi-affine expression for θ_i. For example, if we pick n_d = 2, we have only bilinear expressions in the equations defining θ and β. Thus, for a given parameter n_d < n_ψ, the total number of new lifting variables is n_β = 2^{n_ψ} − 2^{n_d}.
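The following Python sketch chain-lifts a single multi-affine product so that each expression has at most n_d factors. It is only an illustration evaluated numerically; in the reformulation, each β_k becomes an algebraic variable with an associated equality constraint, and the paper's algorithm additionally shares the β variables across all θ_i via the structure of S, which yields the total count n_β = 2^{n_ψ} − 2^{n_d}:

```python
def lift_product(terms, n_d=2):
    """Rewrite prod(terms) so that no expression has more than n_d factors.
    Whenever the running product would exceed n_d factors, it is stored as a
    new lifting variable beta_k (symbolically: an extra equality constraint)
    and the chain continues with beta_k as a single factor."""
    betas = []
    acc = terms[0]
    factors = 1
    for t in terms[1:]:
        if factors == n_d:
            betas.append(acc)        # introduce beta_k := acc
            acc = acc * t            # next expression: beta_k * t (2 factors)
            factors = 2
        else:
            acc = acc * t
            factors += 1
    return acc, betas

# Product of n_psi = 4 affine terms, lifted to bilinear form (n_d = 2):
theta, betas = lift_product([0.5, 0.5, 0.5, 0.5], n_d=2)
print(theta, len(betas))  # → 0.0625 2
```

For a single product, this chain introduces n_ψ − n_d auxiliary variables; the exponential count quoted above arises because all prefix products over the 2^{n_ψ} sign patterns are lifted and shared.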

Comparisons of Stewart's and the Heaviside step reformulation
We compare Stewart's reformulation (28) and the Heaviside step reformulation (22) based on their total number of algebraic variables, complementarity constraints, and equality constraints for a given number of switching functions n_ψ. The total number of regions (and multipliers θ_i) in Stewart's reformulation is always n_f = 2^{n_ψ}. In contrast to the Heaviside step reformulation, we cannot reduce the number of variables if the regions R_i are defined as unions of the base regions R̃_i. On the other hand, in the Heaviside step reformulation, depending on the geometry of the regions R_i, n_f is an integer in [2, 2^{n_ψ}]. In the step reformulation, we may introduce n_β lifting variables to reduce the nonlinearity. If n_d < n_ψ, this leads to n_β = 2^{n_ψ} − 2^{n_d} additional lifting variables and equations.
We compare now the number of algebraic variables. In Stewart's reformulation, we have θ, λ ∈ R^{2^{n_ψ}} and μ ∈ R. In the Heaviside step reformulation, we have α, λ^p, λ^n ∈ R^{n_ψ}. Thus, the total number of algebraic variables in the former case is n^S_alg = 2 · 2^{n_ψ} + 1, and in the latter case n^H_alg = n_f + 3n_ψ + n_β. The number of complementarity constraints in Stewart's case is n^S_comp = 2^{n_ψ}, and in the Heaviside step case n^H_comp = 2n_ψ, i.e., we have exponential versus linear complexity. Finally, in Stewart's reformulation, we have in total n^S_eq = 2^{n_ψ} + 1 equality constraints (g_i(x) = λ_i − μ and e^⊤θ = 1). In the Heaviside step case, there are n^H_eq = n_f + n_β + n_ψ equality constraints, for the definitions of θ_i, β_i and the constraints ψ_i(x) = λ^p_i − λ^n_i, respectively. The numbers of variables and constraints are summarized in Table 4. Figure 4 illustrates the different quantities for several n_ψ. We plot two extreme scenarios for the Heaviside step reformulation:

1. Worst complexity case - every base set defines a PSS region, n_f = 2^{n_ψ}, and we lift to have only bilinear terms, i.e., n_d = 2 (maximizing the number of lifting variables) - red line in the plots.

2. Best complexity case - no lifting and only two regions (n_f = 2) for all n_ψ - yellow line in the plots.
Note that in both cases the Heaviside step reformulation has the same number of complementarity constraints. For smaller values of n_ψ, both reformulations have similar complexity. For a large number of switching functions, the step reformulation has fewer variables. However, if there is no lifting, the problem can become very nonlinear for large n_ψ. An extensive numerical comparison of the two approaches on simulation and optimal control problems has been carried out in [40, Section 5.8]. It turns out that, statistically, both formulations have similar performance, but on specific examples one can outperform the other. In addition, the Heaviside step reformulation provides more modeling flexibility.
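The counts from Table 4 can be reproduced with a few lines of Python. The helper functions below are hypothetical, using only the formulas stated above:

```python
def stewart_sizes(n_psi):
    """Problem sizes of Stewart's reformulation (28) for n_psi switching functions."""
    n_f = 2 ** n_psi                 # always all 2^{n_psi} regions
    return {"alg": 2 * n_f + 1,      # theta, lambda in R^{2^{n_psi}}, mu in R
            "comp": n_f,             # 2^{n_psi} complementarity constraints
            "eq": n_f + 1}           # g_i(x) = lambda_i - mu and e^T theta = 1

def heaviside_sizes(n_psi, n_f, n_beta=0):
    """Problem sizes of the Heaviside step reformulation (22)."""
    return {"alg": n_f + 3 * n_psi + n_beta,  # theta, alpha, lambda^p, lambda^n, beta
            "comp": 2 * n_psi,                # linear, not exponential, in n_psi
            "eq": n_f + n_beta + n_psi}       # theta_i, beta_i, psi_i = lambda^p_i - lambda^n_i

# Best complexity case of the step reformulation: n_f = 2, no lifting.
print(stewart_sizes(5)["comp"], heaviside_sizes(5, n_f=2)["comp"])  # → 32 10
```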

Numerical examples
In this section, we consider two numerical examples. First, we consider a numerical simulation example of a gene regulatory network and empirically demonstrate that the FESD method has higher-order integration accuracy. Second, we consider an optimal control example of a planar two-link monoped that must reach a certain goal in the horizontal direction. We compare the Heaviside step reformulation with Stewart's reformulation in terms of total computational time and cost per iteration.

Gene regulatory networks
In this simulation experiment we consider a gene regulatory network model, which is described by a DI of the form (14). The model is not a Filippov system, but a more general DI. More numerical examples with FESD for the step reformulation can be found in [32]. In this reference, we have generated integration order plots for a Filippov system and empirically confirmed the results of Theorem 16. We regard the IRMA example, a synthetic network composed of five genes, originally proposed in [14]. This example is inspired by [5], which includes more examples with Heaviside step functions that are implemented in nosnoc [2]. It can be seen that the FESD step formulation preserves the integration order of the underlying Runge-Kutta method, while with the standard time-stepping approach, methods that typically have higher-order integration accuracy degrade to order one. This experiment shows that, if a feasible solution is found, the integration order of the underlying Runge-Kutta method is preserved. The implementation of the example is publicly available.

Optimal control example with state jumps
Now we further investigate the use of FESD for the Heaviside step reformulation by applying it to an optimal control problem. We regard the problem of synthesizing dynamic motions of the two-link Capler robot with state jumps and friction [15]. Systems with state jumps do not directly fit the form of ODEs with set-valued Heaviside step functions (14). However, using the time-freezing reformulation we can transform the system with state jumps into a PSS of the form (8) [37] and use the methods developed in this paper, see also Remark 2. The monoped's configuration is described by four degrees of freedom q = (q_x, q_z, ϕ_knee, ϕ_hip), where (q_x, q_z) are the coordinates of the monoped's base at the hip, and ϕ_knee, ϕ_hip are the knee and hip angles. The robot is actuated by two direct-drive motors at the hip and knee joints. The control variables are the torques of these motors, u(t) = (u_knee(t), u_hip(t)). Denote by p_foot(q) = (p_foot,x(q), p_foot,z(q)) and p_knee(q) = (p_knee,x(q), p_knee,z(q)) the kinematic positions of the robot's foot and knee, respectively. We model a single contact point, the tip of the robot's foot touching the ground, which is expressed via the unilateral constraint f_c(q) = p_foot,z(q) ≥ 0. Moreover, we denote the expression summarizing the Coriolis, control, and gravitational forces by f_v(q, u), the inertia matrix by M(q), the normal contact Jacobian by J_n(q) ∈ R^{n_q}, and the contact tangent by J_t(q) ∈ R^{n_q}. The detailed derivation of all mentioned functions, i.e., the model equations, kinematic expressions, and all parameters for the robot, can be found in [23, Appendix A].
After the time-freezing reformulation, the PSS state consists of x(τ) = (q(τ), v(τ), t(τ)) ∈ R^9, with q(τ) being the position, v(τ) the velocity, and t(τ) a clock state needed for the time-freezing reformulation, where τ is the time of the ODE, cf. [37]. For the time-freezing PSS, we have in total three switching functions: the gap function, as well as the normal and tangential contact velocities [37]: ψ(x) = (f_c(q), J_n(q)^⊤ v, J_t(q)^⊤ v).
This allows the definition of eight base sets via the sign matrix (cf. Definition 1). However, the time-freezing PSS has only three regions, where the first one consists of the union of the first six base sets, and the other two match the two remaining base sets, i.e.:

R_1 = ∪_{i=1}^{6} R̃_i = {x ∈ R^{n_x} | f_c(q) > 0} ∪ {x ∈ R^{n_x} | f_c(q) < 0, J_n(q)^⊤ v > 0},
R_2 = R̃_7 = {x ∈ R^{n_x} | f_c(q) < 0, J_n(q)^⊤ v < 0, J_t(q)^⊤ v > 0},
R_3 = R̃_8 = {x ∈ R^{n_x} | f_c(q) < 0, J_n(q)^⊤ v < 0, J_t(q)^⊤ v < 0}.

The initial state is x(0) = (0, 0.4, 0, 0, 0, 0, 0, 0, 0). Given a reference x_ref(t), which is a spline interpolation between the initial and final position and includes two jumps, we define a least-squares objective with running and terminal costs. Collecting all of the above, we can define an OCP of the form (44), which we discretize with the FESD Radau IIA scheme of order 3 (n_s = 2), with N_FE = 3 finite elements on every control interval. This OCP is discretized and solved with nosnoc in a homotopy loop with IPOPT [51]. Figure 7 illustrates several frames of an example solution (N = 60). Figure 8 shows the relevant vertical and horizontal positions and velocities, and Figure 9 shows the optimal controls. From these plots we can see that the robot makes several small jumps, mostly landing with low vertical velocities to reduce energy dissipation, and successfully reaches the target position with zero terminal velocity.
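For illustration, the region definitions above amount to a simple sign test on ψ(x). The following Python sketch (with hypothetical numerical values) classifies a state by the signs of (f_c, J_n^⊤ v, J_t^⊤ v); only region interiors are classified, since on the switching surfaces the Filippov convexification applies:

```python
def region(psi):
    """Classify a state by the signs of psi = (f_c(q), J_n(q)^T v, J_t(q)^T v),
    following the definitions of R_1, R_2, R_3 above. Returns None on a
    switching surface, where no single region applies."""
    fc, vn, vt = psi
    if fc > 0 or (fc < 0 and vn > 0):
        return 1                  # free motion or separating contact
    if fc < 0 and vn < 0:
        if vt > 0:
            return 2              # contact, slipping in positive tangent direction
        if vt < 0:
            return 3              # contact, slipping in negative tangent direction
    return None                   # some psi_k = 0: switching surface

print(region((0.1, -0.2, 0.0)))    # f_c > 0: region 1
print(region((-0.01, -0.3, 0.5)))  # penetrating, approaching, slipping: region 2
```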
Next, we solve this OCP for different values of N (the number of control intervals) from 30 to 90 in increments of 10 and compare FESD for the Heaviside step reformulation (this paper) to FESD derived for Stewart's reformulation [38]. We compare the CPU time per NLP iteration and the total CPU time for both approaches in Figure 10. As expected, due to the smaller number of variables and constraints, the Heaviside step reformulation leads to faster NLP iterations than Stewart's reformulation. In our experiments, we try different linear solvers from the HSL library [30]. The linear solver choice has a particularly strong influence on the performance of both reformulations. After evaluation, we select the best linear solver for each one: ma27 for Stewart's reformulation and ma97 for the Heaviside step reformulation. We can see that the linear algebra costs increase significantly for N > 60, as the linear solvers need more time to factorize larger systems. The total computation time is influenced by several factors: the homotopy algorithm, initialization, NLP solver performance, linear algebra solver, etc., and as such shows a less clear trend. In this example, Stewart's reformulation achieves consistently lower total computation times despite a higher cost per iteration. On the other hand, in our experiments in [32], we used a similar setting and the Heaviside step reformulation led to lower total computation times. We can conclude that which reformulation is better depends strongly on the example and the chosen optimization algorithm parameters. Further comparisons of the two approaches can be found in [40, Section 5.2].

Summary
This paper extends the Finite Elements with Switch Detection (FESD) method to nonsmooth dynamical systems with set-valued Heaviside step functions. The step functions make it possible to encode logical relations within a dynamical system. These systems can be equivalent to Filippov systems, and this paper focuses on this case. The set-valuedness of the Heaviside step functions leads to a differential inclusion. However, by using the optimality conditions of a parametric linear program whose solution map corresponds to the step function, one obtains an equivalent dynamic complementarity system (DCS). Now the nonsmoothness and combinatorial structure are expressed algebraically and are thus suitable for numerical computations. In the derivation of FESD, we exploit the continuity of the Lagrange multipliers in the DCS, which in turn enables the accurate detection of the nonsmooth transitions in time. This is also necessary for the correct computation of sensitivities of the discretized nonsmooth system. We show how to apply the FESD method to discretize optimal control problems subject to nonsmooth systems with set-valued step functions and provide convergence results for the method on simulation problems. Compared to Stewart's reformulation used in [38], the reformulation considered here allows for a more compact representation of the same system, with a smaller number of algebraic variables. The comparison on an optimal control example shows that the new approach has a lower cost per iteration. In summary, we extend the FESD method to a problem class that allows more modeling flexibility than in our initial study in [38]. Our claims are verified by numerical examples. An implementation of the new method is provided in the open-source software package nosnoc [35].

… eq. in the step DCS as nonsmooth DAE (23)
g(x) Stewart's indicator function, (25)
W_{K,I}(x, u) auxiliary matrix used in the study of the DCS (22), Sec. 3.2
B_{K,I}(x, u) auxiliary matrix used in the study of the DCS (22), Sec. 3.2
R_i regions of the PSS, Eq. (8)
R̃_i base sets, Def. 1
J index set of PSS modes

Figure 1 :
Figure 1: Summary of relations between nonsmooth systems treated in this paper.

Figure 2 :
Figure 2: Illustration of example solution trajectories for different switching cases. The rows from top to bottom show x(t), α(t), λ^p(t) and λ^n(t) for the cases (a)-(d) in Example 3, respectively.

Figure 3 :
Figure 3: … zero if the left and right time derivatives are zero, respectively; likewise, positive when the left and right time derivatives are nonzero.

Figure 4 :
Figure 4: Comparison of the complexities of Stewart's and the step reformulation.

Figure 9 :
Figure 9: The optimal controls for the example of N = 60 control stages.

Figure 10 :
Figure 10: The total CPU time of the homotopy loop for different N (left plot) and the CPU time per NLP solver iteration (right plot).

Table 1 :
Key symbols used in this paper.

Table 2 :
Overview of switching cases for the step size equilibration

Table 4 :
Comparison of the problem sizes in Stewart's and the step reformulation for a fixed n ψ .