Optimal Switching for Hybrid Semilinear Evolutions

We consider the optimization of a dynamical system by switching at discrete time points between abstract evolution equations composed of nonlinearly perturbed strongly continuous semigroups, nonlinear state reset maps at mode transition times and Lagrange-type cost functions including switching costs. In particular, for a fixed sequence of modes, we derive necessary optimality conditions using an adjoint-equation-based representation for the gradient of the costs with respect to the switching times. For optimization with respect to the mode sequence, we discuss a mode-insertion gradient. The theory unifies and generalizes similar approaches for evolutions governed by ordinary and delay differential equations. More importantly, it also applies to systems governed by semilinear partial differential equations, including switching the principal part. Examples from each of these system classes are discussed.


Introduction
We consider hybrid dynamical systems on some infinite (or finite) dimensional space $Z$ and a finite set of modes $M$. For a given family $\{A_j\}_{j \in M}$ of densely defined linear operators on $Z$, families of nonlinear functions $\{f_j\}_{j \in M}$ and $\{g_{j,j'}\}_{(j,j') \in M \times M}$ on $Z$ and a finite time horizon $[0, T]$ with initial condition $z(0) = z_0 \in Z$, the dynamics are governed by abstract continuous-time evolution equations combined with discrete events involving state resets,
$$\dot z = A_j z + f_j(z), \qquad z = g_{j,j'}(z^-),$$
whenever the mode $j \in M$ is held constant or whenever $j$ with associated state $z^-$ is switched to the new mode $j' \in M$ with new state $z$ at switching times $(\tau_k)_{k \in \mathbb{N}} \subseteq [0, T]$, respectively. Supposing that the sequence of switching times $(\tau_k)_k$ and the modal sequence $(j_k)_k$ are subject to our control and that we have a cost function $J = J(z)$ integrating running and switching costs associated with the respective continuous or discrete evolution, we may consider the minimization of $J$ over any such sequences of finite length as an optimal control problem. The precise setting and main hypotheses are introduced in Section 2 below.
This optimal control problem and variants of it have been extensively addressed for ordinary differential equations (ODEs), e. g., based on dynamic programming principles [6,13], non-smooth programming [4], control parametrization enhancing techniques [15] and relaxation techniques [3,19]. Moreover, if the modal sequence $(j_k)_k$ is fixed a priori, the control problem reduces to switching-time optimization and can be solved using gradient-based methods [7,8,14,23]. The latter approach has also been extended using gradients with respect to mode-insertions into a given sequence [8]. Switching-time optimization and mode-insertions can be combined into conceptual algorithms to tackle the original problem [1,21]. We refer to [25] for a more detailed survey of available results for the ODE case.
Much less work has been done on similar optimal control problems in the context of ordinary delay differential equations (DDEs) and partial differential equations (PDEs). Such problems arise, for example, in optimal control of gas networks, where switching of valves is an essential part of the control mechanism for the gas flow governed by algebraically coupled PDEs on a graph representing the network of pipes [10,11]. Switching-time optimization has been considered for ordinary DDEs in [20,22] and, when switching only affects boundary data, for scalar hyperbolic PDEs in the semilinear case [9] and in the non-linear case [18]. In a more abstract fashion based on semigroup theory, covering both certain DDEs and PDEs, dynamic programming extends to problems where $A_j$ is a generator of a strongly continuous semigroup independent of $j$ and switching only affects the non-linear perturbation [16]. In the same setting, relaxation techniques can sometimes be applied [12]. Our contribution in this paper is to extend the concept of switching-time optimization and mode-insertion from ODE problems in [8] to the abstract setting of non-linearly perturbed strongly continuous semigroups. Unlike in [8], we consider non-autonomous dynamics and state resets at switching times, and we include switching costs. Moreover, in addition to switching of the non-linear perturbation, our theory explicitly considers switching of the generators, which (in non-trivial cases) cannot be handled with the results available in the literature so far. This allows, under certain technical restrictions, the treatment of switching, e. g., the delay parameter of a DDE or the principal part of a PDE in the hybrid dynamical system represented by the above equations. Our analysis focuses on the differentiability properties of the cost function and the representation of the derivative using solutions to appropriate adjoint problems.
The analysis of gradient-descent algorithms using such derivative information as well as applications for example to gas network optimization will be considered in future work. In Section 2 we introduce our abstract problem setting including the hypotheses concerning the regularity of the system parameters. In Section 3 we consider differentiation of the costs with respect to the switching times for a given mode sequence.
In Section 4 we discuss differentiation of the costs with respect to the insertion of a new mode into a given sequence of modes. In Section 5 we show that one can recover the result of [8] for the ODE case from our theory under rather mild technical assumptions on the system parameters. Moreover, we show that the results can be used, for example, to obtain efficient gradient representations for integro-type DDEs and that the theory is consistent with stability analysis for a PDE switching between a transport equation and a diffusion equation.

Notation, Basic Hypotheses and Preliminaries
In the presentation of our results, we mainly use standard notation from the theory of strongly continuous semigroups as, for example, in [17]. Nevertheless, for clarity, we mention the following notation and conventions used in the context of a Banach space $Z$. We denote by $Z^*$ the topological dual space of $Z$ and, for every $z^* \in Z^*$, the dual pairing by $\langle z^*, z \rangle_{Z^*, Z}$. Our basic hypotheses in this paper are as follows.
(A1) Z is a reflexive Banach space and z 0 ∈ Z, M is a finite set and j 0 ∈ M.
We then define $z^-(\tau_n) := z_{n-1}(\tau_n)$ for all $n \in \{1, \dots, N\}$. Depending on whether we wish to emphasize the dependence of a mild or classical solution $z$ to (1) on $(j, \tau)$, we will use both the notations $z(\cdot)$ and $z(\cdot, j, \tau)$ interchangeably in the following, keeping in mind, however, not to confuse this with the value $z(\tau_k) = z(\tau_k, j, \tau)$ of $z$ at the time $t = \tau_k$.
Remark 1: According to the above definition, $z$ is a mild/classical solution to (1) if and only if $z_n$ is the mild/classical solution to the abstract Cauchy problem
$$\dot z_n(t) = A_{j_n} z_n(t) + f_{j_n}(t, z_n(t)), \quad t \in (\tau_n, \tau_{n+1}),$$
for every $n \in \{0, \dots, N\}$. In the case where $\tau_n = \tau_{n+1}$ this problem degenerates to the one-point map $z_n(\tau_n) = g_{j_{n-1}, j_n}(z_{n-1}(\tau_n))$, and if multiple switching times coincide, for instance $\tau_n = \dots = \tau_k < \tau_{k+1}$ for some $n, k \in \{0, \dots, N\}$ with $n < k$, the map $z$ only adopts the value of the last function defined at that time point, that is, $z(\tau_n) = z(\tau_k) = z_k(\tau_k)$. Therefore, $z$ is in either case a right-continuous map on $[\tau_0, \tau_{N+1}]$, continuous on $[\tau_n, \tau_{n+1})$ for every $n \in \{0, \dots, N\}$. For given sequences $j$ and $\tau$ the maps $z_0, \dots, z_N$ are uniquely defined by $z$ and vice versa.
⋄ We have the following well-posedness result.
Lemma 2: Under Assumptions (A1)-(A3), there exists a unique maximal $T_{\max} > 0$ such that (1) has a unique mild solution on $[0, T_{\max})$ for every sequence of modes $(j_n)_{n=0,\dots,N} \subseteq M$ and every monotonically increasing sequence of switching times $(\tau_n)_{n=0,\dots,N+1} \subseteq [0, T_{\max})$. $T_{\max}$ is lower semicontinuous as a function of the initial state $z_0 \in Z$. If, furthermore, Assumption (A4) is satisfied, then the solution is classical.
Proof: We proceed by induction over the number $N$ of switching points.
Basis case: if $N = 0$, that is, if there is no switching point, then system (1) reduces to
$$\dot z(t) = A_{j_0} z(t) + f_{j_0}(t, z(t)), \quad t \ge 0.$$
According to [17, Chapter 6, Theorem 1.4], if Assumptions (A1)-(A3) are satisfied, there is a unique maximal $T_{\max} > 0$ such that this equation has a unique mild solution for $t \in [0, T_{\max})$. If furthermore (A4) holds, then the mild solution is classical by [17, Chapter 6, Theorem 1.5]. Moreover, $T_{\max}$ is lower semicontinuous as a function of the initial value $z_0 \in Z$, see for instance [5, p. 59, Proposition 4.3.7].
Induction hypothesis: if the system has $N - 1$ switching points $\tau_1, \dots, \tau_{N-1}$, then there is a unique maximal $T_{\max} = T_{\max}(z_0) > 0$ such that the following holds: if $0 \le \tau_1 \le \dots \le \tau_{N-1} < T_{\max}$, then the system has a unique mild solution on $[0, T_{\max})$. Furthermore, $T_{\max}$ is lower semicontinuous as a function of the initial value $z_0$.
Induction step: now suppose the system has $N$ switching points and first fix $z_0 \in Z$.
Recalling the basis case, we find a maximal $T^1_{\max} > 0$ such that for every choice $\tau_1 \in [0, T^1_{\max})$ the first equation has a unique mild solution on $[0, \tau_1]$. Fix $\tau_1$; then, applying the induction hypothesis, we further get a maximal existence time $T^2_{\max} = T^2_{\max}(z(\tau_1)) > 0$ such that for every choice $\tau_1 \le \tau_2 \le \dots \le \tau_N < T^2_{\max}$ the rest of the system has a unique mild solution on $[\tau_1, T^2_{\max})$. The combined end time $T_{\max}(\tau_1) = \tau_1 + T^2_{\max}(z(\tau_1))$ as a function of $\tau_1$ is thus lower semicontinuous. Now choose any $\theta \in [0, T^1_{\max})$, then we have [...]. Finally, using the basis case and the induction hypothesis, we know that $T^1_{\max}$ is lower semicontinuous as a function of $z_0$ and $T^2_{\max}(z_1)$ is lower semicontinuous as a function of $z_1 \in Z$. Since $z(\tau_1)$ depends continuously on $z_0$ (even Lipschitz-continuously, see [17, Chapter 6, Theorem 1.2]), we also find that the combined end time and thus $T_{\max}$ are lower semicontinuous with respect to $z_0$.
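The piecewise construction of Remark 1 and of the induction above can be illustrated in finite dimensions. The following is a minimal sketch, assuming a scalar state so that the operator part is absorbed into the right-hand sides; the names `rk4_step` and `simulate`, the two modes and the reset maps are hypothetical choices made for illustration and do not come from the paper. On each interval the current mode is integrated, and at each switching time the reset map is applied, so coinciding switching times reduce to a composition of reset maps.

```python
import math

def rk4_step(f, t, z, h):
    """One classical Runge-Kutta step for z' = f(t, z)."""
    k1 = f(t, z)
    k2 = f(t + h / 2, z + h / 2 * k1)
    k3 = f(t + h / 2, z + h / 2 * k2)
    k4 = f(t + h, z + h * k3)
    return z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(z0, modes, taus, f, g, steps=1000):
    """Final state of a switched scalar ODE with reset maps.

    modes: j_0, ..., j_N; taus: tau_0 <= ... <= tau_{N+1};
    f[j](t, z): right-hand side of mode j; g[(i, j)](z): reset from i to j.
    """
    z = z0
    for n, mode in enumerate(modes):
        if n > 0:
            # State reset at the switching time tau_n.
            z = g[(modes[n - 1], mode)](z)
        t0, t1 = taus[n], taus[n + 1]
        if t1 > t0:  # a degenerate interval only applies the reset map
            h = (t1 - t0) / steps
            for i in range(steps):
                z = rk4_step(f[mode], t0 + i * h, z, h)
    return z

# Hypothetical example: mode 0 grows, mode 1 decays, resets rescale the state.
f = {0: lambda t, z: z, 1: lambda t, z: -z}
g = {(0, 1): lambda z: 2.0 * z, (1, 0): lambda z: 0.5 * z}

# tau_1 = tau_2 = 1: mode 1 is active on a degenerate interval only.
z_end = simulate(1.0, modes=[0, 1, 0], taus=[0.0, 1.0, 1.0, 2.0], f=f, g=g)
print(z_end, math.exp(2.0))
```

In this example the two resets cancel, so the result should agree with the unswitched mode-0 flow over the whole horizon.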
Without loss of generality we set $\tau_0 = 0$ and, in view of Lemma 2, can add the following assumptions:
(A5) Let $T \in (0, T_{\max})$ be given with $T_{\max}$ as in Lemma 2, and let $\mathcal{T}(0, T)$ denote the set of admissible switching times.
(A6) Let $l : [0, T] \times Z \to \mathbb{R}$ be continuous and continuously differentiable with respect to the second argument. Furthermore, let $l_{m,n} : [0, T] \times Z \to \mathbb{R}$ be continuously differentiable for every $m, n \in M$ with $m \neq n$.
We then define the cost function $J$ and the reduced cost function $\Phi$ by $\Phi(j, \tau) = J(\tau, z(\cdot, j, \tau))$.

Switching Time Gradient
In this section, we fix a sequence $j = (j_n)_{n=0,\dots,N}$ of modes for the hybrid evolution (1) and address the subproblem of determining optimal switching times in order to minimize (3). The problem can then be summarized as the parametric optimization problem (5) in the switching times. Motivated by similar approaches for ODEs in [7,8], we consider in the following the differentiability of $J$ with respect to admissible switching times $\tau \in \mathcal{T}(0, T)$ and prove an adjoint-equation-based representation of the gradient $\partial \Phi / \partial \tau$. Analogously to the ODE case in [8], this leads to first-order optimality conditions and makes this subproblem accessible to gradient-based optimization methods.
Lemma 3: [...] mapping $(t, \tau)$ onto the classical solution $z(t, \tau)$ to (1) at the time $t$ and switching times $\tau$ is continuously differentiable on the subset [...]. Moreover, for any fixed $\tau \in \mathcal{T}(0, T)$ and $k \in \{1, \dots, N\}$ the partial derivative $z^k := \partial z / \partial \tau_k$ can be continued on $[\tau_k, T]$ as a right-continuous function and then is the mild solution to the system (6).
Proof: Applying the given assumptions to Lemma 2 yields a continuously differentiable solution $z$ to (1) for every fixed $\tau \in \mathcal{T}(0, T)$ and we get [...] for $t \in (\tau_n, \tau_{n+1})$ and all $n \in \{k + 1, \dots, N\}$. Since the right-hand sides of these equations are differentiable with respect to $\tau_k$, so are the left-hand sides, and differentiating (8) using (7) yields [...]. In particular, [...]. Then differentiating (9) furthermore leads to [...] exists for all $n \in \{k + 1, \dots, N\}$. Therefore $z^k$ is a mild solution to (6). Moreover, if $z$ is the given solution to (1) [...]
Remark 4: Note that $z$ is in general not differentiable with respect to $\tau_k$ as a function on the whole time interval $[\tau_0, T]$ and, in particular, the above derivative on the boundary $t = \tau_k$ has to be understood as one-sided. Indeed, since $z(t)$ does not depend on $\tau_k$ for $t < \tau_k$, we then get $z^k(t) = 0$, thus the left and right derivatives at $t = \tau_k$ do not match.
⋄ Problem (5) is equivalent to the minimization of the reduced cost function and, since $\mathcal{T}(0, T) \subset \mathbb{R}^N$ is compact, a minimum exists if $\Phi$ is continuous. If $\Phi$ is even differentiable, we can ask for first-order optimality conditions. Formally applying the chain rule yields an expression for $\partial \Phi / \partial \tau_k$ that involves the sensitivities $\partial z / \partial \tau_k$. In order to evaluate it by applying Lemma 3, however, we would need to solve $N$ individual systems. Instead, we will seek a computationally more efficient representation and will express the above derivative by means of the solution to (1) and the solution to the adjoint problem (10) on the dual space $Z^*$.
Remark 5: We can motivate these equations by applying the Lagrange formalism to the minimization problem (5): Define the Lagrange function with the terms
$$\dots + \Big( \big\langle \dot p_n(t) + (A_{j_n})^* p_n(t), z_n(t) \big\rangle_{Z^*, Z} + \big\langle p_n(t), f_{j_n}(t, z_n(t)) \big\rangle_{Z^*, Z} \Big)\, dt .$$
Theorem 8: Assume $z$ is the unique classical solution to (1) and $z^k$ and $p$ are the unique mild solutions to (6) and (10), respectively.
(i) The reduced cost function $\Phi$ is continuously differentiable on $\mathcal{T}(0, T)$ with respect to the $k$-th switching time, for every $\tau \in \mathcal{T}(0, T)$ and every $k \in \{1, \dots, N\}$.
Proof: Applying the chain rule and Lemma 7 yields that $\Phi$ is a differentiable map, with a derivative expressed in terms of the $k$-th unit vector $e_k \in \mathbb{R}^N$. As a composition of continuous functions, $\partial \Phi / \partial \tau_k$ is continuous. This concludes the proof of (i).
The assumptions in (ii) yield that $\tau$ is a local minimum of $\Phi$ under the constraint $\tau \in \mathcal{T}(0, T)$. Applying the classical necessary optimality conditions of Karush-Kuhn-Tucker, we find that there is a Lagrange multiplier $\lambda \in [0, \infty)^{N+1}$ such that the stationarity condition holds for any fixed $k \in \{1, \dots, N\}$. If we define, for the sake of simplicity, $\lambda_{-1} = \lambda_{N+1} = 0$, then the claim follows.

Remark 9:
The adjoint problem (10), due to its dependency on z in general, only admits a mild solution if z is a classical solution to (1). We are not aware of weaker concepts in order to derive a gradient representation as in Theorem 8. However, in special cases, for instance if the function f in (1) is in fact linear, the results in Theorem 8 can be generalized to mild solutions z to (1), if problem (10) admits a classical solution. ⋄

Mode Insertion Gradient
In this section, we consider an infinitesimal insertion of a new mode into a given sequence of modes for the hybrid evolution (1) and provide a representation for the sensitivity of the cost function (3) with respect to this perturbation. This concept has been introduced for ODEs in [8] and makes the subproblem of determining optimal sequences of modes for the hybrid evolution (1) in order to minimize (3) again accessible to gradient-based optimization methods. To this end, we assume:
(B1) The transition functions $g_{i,j}$, $g_{k,j}$, $g_{i,k}$ mapping between the modes $i, j, k \in M$ satisfy $g_{i,j} = g_{k,j} \circ g_{i,k}$.
(B2) $j = (j_n)_{n=0,\dots,N} \subseteq M$ is a given sequence, $k \in \{0, \dots, N\}$ is fixed, and the mode to be inserted is an element of $M$.

Examples
In this section, we present some applications of the theory developed above. We first state the results that our theory yields for the special case of ordinary differential equations. Moreover, we apply our theory to a system of delay differential equations and finally to a system of partial differential equations.
5.1. Ordinary Differential Equations.
The above results also cover the case of switched systems of ODEs. Set $Z = \mathbb{R}^m$ and $A_j = 0$ for all $j \in M$. If $(j_n)_{n=0,\dots,N} \subseteq M$ and $(\tau_n)_{n=0,\dots,N+1} \subseteq [0, \infty)$ with $0 = \tau_0 \le \tau_1 \le \dots \le \tau_{N+1}$, then (1) reduces to the switched ODE system (16) and the adjoint equation (10) becomes the system (17). Suppose $\Phi$ is defined as in (A6) and, again, we want to find sequences $j$ and $\tau$ that minimize $\Phi$. Then there is a $T_{\max} > 0$ such that (16) has a unique classical solution and (17) has a unique Carathéodory solution for every $T \in (0, T_{\max})$ and all $\tau \in \mathcal{T}(0, T)$. Furthermore, $\Phi$ is differentiable with respect to the $k$-th switching time $\tau_k$ with
$$\frac{d\Phi}{d\tau_k}(\tau) = l^1(\tau_k, z^-(\tau_k)) - l^1(\tau_k, z(\tau_k)) + (l^2_k)_\tau(\tau_k, z^-(\tau_k))$$
and Theorem 8 (ii) holds. If, furthermore, Assumptions (B1)-(B2) hold, then the mode insertion gradient defined in (14) for (16) and an insertion mode in $M$ is given by [...] (8) and (10) to derive the above formulae. Since $z$ and $p$ depend continuously on the semilinearities $f_{j_n}$, see the variation of constants formula, passing to the limit $l \to \infty$ yields the claim.
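For the ODE case of Section 5.1, the adjoint-based switching-time gradient can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's formula (16)-(17), which did not survive in the text above: it uses a scalar system with one switch, identity reset, no switching cost and running cost $\frac12 x^2$, together with the classical ODE formula $d\Phi/d\tau = p(\tau)\,(f_1(x(\tau)) - f_2(x(\tau)))$ with costate $\dot p = -a_2 p - x$, $p(T) = 0$. All names and parameter values are hypothetical.

```python
import math

A1, A2 = 0.5, -1.0   # hypothetical modes: x' = A1*x before tau, x' = A2*x after
X0, T = 1.0, 2.0     # hypothetical initial state and time horizon

def state(t, tau):
    """Closed-form state of the switched scalar ODE with identity reset."""
    if t < tau:
        return X0 * math.exp(A1 * t)
    return X0 * math.exp(A1 * tau) * math.exp(A2 * (t - tau))

def cost(tau):
    """Phi(tau): integral of 0.5 * x(t)^2 over [0, T], in closed form."""
    first = (math.exp(2 * A1 * tau) - 1.0) / (2 * A1)
    second = math.exp(2 * A1 * tau) * (math.exp(2 * A2 * (T - tau)) - 1.0) / (2 * A2)
    return 0.5 * X0 ** 2 * (first + second)

def costate_at_switch(tau, steps=2000):
    """Integrate p' = -A2*p - x backward from p(T) = 0 down to t = tau (RK4)."""
    def rhs(t, p):
        return -A2 * p - state(t, tau)
    h = (T - tau) / steps
    t, p = T, 0.0
    for _ in range(steps):
        # RK4 with negative step size -h.
        k1 = rhs(t, p)
        k2 = rhs(t - h / 2, p - h / 2 * k1)
        k3 = rhs(t - h / 2, p - h / 2 * k2)
        k4 = rhs(t - h, p - h * k3)
        p -= h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t -= h
    return p

tau = 1.0
# Adjoint representation: costate times the jump of the vector field at tau.
grad_adjoint = costate_at_switch(tau) * (A1 - A2) * state(tau, tau)
# Central finite difference of the closed-form cost, for comparison.
eps = 1e-5
grad_fd = (cost(tau + eps) - cost(tau - eps)) / (2 * eps)
print(grad_adjoint, grad_fd)
```

A single backward solve yields the derivative with respect to the switching time, whereas the sensitivity approach would need one forward system per switching time. The positive sign of the gradient in this toy problem reflects that staying longer in the unstable mode increases the cost.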
Let, for instance, $f_1(t, z(t)) = z(t)$ and $f_2(t, z(t)) = 0$. Then, with the transition function $g_{1,2}(z) = z$, we get the system
$$\dot z(t) = \begin{cases} A_1 z(t) + z(t) & \text{for } t \in (0, \tau), \\ A_2 z(t) & \text{for } t \in (\tau, T). \end{cases} \tag{23}$$
We note that $D(A_2)$ is an $A_1$-admissible subspace of $D(A_1)$, thus the part of $A_1$ on $D(A_2)$, again denoted by $A_1$ in the following, is the generator of a $C_0$-semigroup with domain $D(A_2)$. Therefore suppose $z_0 \in D(A_2)$, then (23) has a unique classical solution for every choice of $\tau \in [0, T]$. Assume we want to minimize the $L^2$-norm of $z$ at the final time, then an appropriate cost function could have the form $J(z) = \frac{1}{2} \|z(T)\|^2$. If we compare this with (3), we get $l(t, z(t)) = \frac{1}{2} \delta_T(t) \|z(t)\|^2 = \frac{1}{2} \|z(T)\|^2$ and $l_z(t, z(t)) = \delta_T(t) z(t) = z(T)$, where $\delta_T$ denotes the delta distribution evaluating at $t = T$, and the adjoint equation
$$\dot p(t) = \begin{cases} -A_1 p(t) - p(t) & \text{for } t \in (0, \tau), \\ -A_2 p(t) & \text{for } t \in (\tau, T). \end{cases}$$
Since the first evolution in (23) is unstable, while the second one is asymptotically stable, we would expect the optimum to be $\tau = 0$. Applying Theorem 8 indeed yields [...], where we used that $\{S_2(t)\}_{t \ge 0}$ commutes with $A_1$ and $A_2$ and that $z(T) = S_2(T - \tau) S_1(\tau) z_0 \in D(A_2) = H^2(\mathbb{R}, \mathbb{R})$, thus $\tau = 0$ is a global minimum.
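The monotonicity of the cost in $\tau$ can also be checked directly on the Fourier side. The following is a sketch under an assumption that is not stated explicitly in the text above, namely $A_1 = \partial_x$ (transport) and $A_2 = \partial_{xx}$ (diffusion) on $L^2(\mathbb{R})$, in line with the transport/diffusion example announced in the introduction. It is a direct computation with the unitary Fourier transform and Plancherel's identity, not the adjoint-based derivation of Theorem 8.

```latex
% Assumption (hypothetical, for illustration): A_1 = \partial_x, A_2 = \partial_{xx} on L^2(\mathbb{R}).
\begin{align*}
\hat z(T, \xi) &= e^{-\xi^2 (T - \tau)} \, e^{(1 + i\xi)\tau} \, \hat z_0(\xi), \\
\Phi(\tau) &= \tfrac{1}{2} \| z(T) \|_{L^2}^2
  = \tfrac{1}{2} \int_{\mathbb{R}} e^{2\tau - 2\xi^2 (T - \tau)} \, |\hat z_0(\xi)|^2 \, d\xi, \\
\frac{d\Phi}{d\tau}(\tau) &= \int_{\mathbb{R}} (1 + \xi^2) \, e^{2\tau - 2\xi^2 (T - \tau)} \, |\hat z_0(\xi)|^2 \, d\xi \;\ge\; 0 .
\end{align*}
```

Under this assumption the derivative is strictly positive for $z_0 \neq 0$, so $\Phi$ is increasing on $[0, T]$, which is consistent with $\tau = 0$ being the global minimum. Differentiation under the integral is justified for $z_0 \in H^2(\mathbb{R})$, since then $\xi^2 |\hat z_0(\xi)|^2$ is integrable.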

Conclusions
This paper presents solution theory and sensitivity formulae for dynamical systems switching between abstract evolutions. The results can be used in descent methods for optimization applied to a broad variety of differential equations of ordinary, delay or partial type. In the case of partial differential equations, the presented theory also covers constant boundary conditions such as homogeneous Dirichlet or Neumann conditions by including these in the domain of the semigroup generator. More general boundary conditions require an extension of the presented theory to unbounded perturbations. Further directions for future work are second-order necessary conditions, sufficient conditions and the convergence behavior of algorithms such as coordinate-descent or alternating-direction methods using the provided gradient information on the level of appropriate discretizations.