Methodology for robust multi-parametric control in linear continuous-time systems

This paper presents an extension of the recent multi-parametric (mp-)NCO-tracking methodology by Sun et al. [Comput. Chem. Eng. 92 (2016) 64–77] for the design of robust multi-parametric controllers for constrained continuous-time linear systems in the presence of uncertainty. We propose a robust-counterpart formulation and solution of multi-parametric dynamic optimization (mp-DO), whereby the constraints are backed-off based on a worst-case propagation of the uncertainty using either interval analysis or ellipsoidal calculus and an ancillary linear state feedback. We address the case of additive uncertainty, and we discuss approaches to dealing with multiplicative uncertainty that retain tractability of the mp-NCO-tracking design problem, subject to extra conservativeness. In order to assist with the implementation of these controllers, we also investigate the use of data classifiers based on deep learning for approximating the critical regions in continuous-time mp-DO problems, and subsequently searching for a critical region during on-line execution. We illustrate these developments with the case studies of a fluid catalytic cracking (FCC) unit and a chemical reactor cascade.


Introduction
On-line optimization and real-time control have received significant attention over the past few decades, driven by the need to improve performance and reduce economic costs in industrial processes. The strategy employed in classical model predictive control (MPC) entails the repeated solution of an optimal control problem that predicts the system's future behavior over a finite, receding time horizon, using the current state measurements or estimates as initial conditions [42]. The optimized control actions are implemented until the next measurements become available, effectively creating a feedback control loop. This process can be computationally demanding and can introduce significant delays for fast dynamic systems, thereby leading to suboptimal performance or even instability.
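The receding-horizon mechanism described in this paragraph can be sketched in a few lines of code. The discrete-time double integrator, the weights, and the projected-gradient solver below are illustrative choices of ours, not part of the methodology developed in this paper:

```python
import numpy as np

# Receding-horizon MPC sketch for x+ = A x + B u with |u| <= u_max:
# at each step, the horizon-N input sequence is re-optimized from the
# current state and only the first input is applied (feedback loop).

A = np.array([[1.0, 0.2], [0.0, 1.0]])   # discrete-time double integrator
B = np.array([[0.02], [0.2]])
Q, R, u_max, N = np.eye(2), 0.1 * np.eye(1), 1.0, 20

def predict(x0, U):
    """Roll the model forward over the horizon for the input sequence U."""
    xs, x = [x0], x0
    for k in range(N):
        x = A @ x + B @ U[k]
        xs.append(x)
    return xs

def cost_grad(x0, U):
    """Gradient of sum(x'Qx) + sum(u'Ru) via a backward (adjoint) sweep."""
    xs = predict(x0, U)
    g, lam = np.zeros_like(U), 2 * Q @ xs[N]
    for k in range(N - 1, -1, -1):
        g[k] = 2 * R @ U[k] + B.T @ lam   # dJ/du_k
        lam = 2 * Q @ xs[k] + A.T @ lam   # dJ/dx_k
    return g

def mpc_step(x0, U, iters=400, step=0.01):
    """Projected gradient descent on the input-box-constrained QP."""
    for _ in range(iters):
        U = np.clip(U - step * cost_grad(x0, U), -u_max, u_max)
    return U

x, U = np.array([2.0, 0.0]), np.zeros((N, 1))
for t in range(30):          # closed-loop simulation
    U = mpc_step(x, U)       # warm-started re-optimization
    x = A @ x + B @ U[0]     # apply the first input only
print(np.linalg.norm(x))     # the state is driven toward the origin
```

Explicit MPC replaces the repeated calls to `mpc_step` with a precomputed off-line map from `x` to the optimal input, which is precisely the motivation for the multi-parametric approach discussed next.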
Several strategies have been developed to mitigate the computational burden caused by the on-line solution of optimization problems in MPC. In the explicit MPC paradigm [38], the optimization is performed off-line, resulting in an explicit mapping between the control actions and the initial state or other measurable quantities. For continuous-time systems in particular, this approach leads to the mp-NCO-tracking methodology of Sun et al. [47]. A first limitation of this methodology is that the future of the system is optimized as if neither external disturbances nor model mismatch were present, despite the fact that such disturbances and mismatch are the reason why feedback is needed in the first place. This situation is similar to classical MPC [4,42] and is a clear call for the development of robust mp-NCO-tracking controllers. Popular approaches to designing robust MPC controllers for linear systems rely on the construction of ancillary feedback laws using linear matrix inequalities (LMI) and min-max optimization [23,24,31,43,45]; the propagation of robust forward invariant tubes enclosing the state trajectories for all possible disturbance realizations [29,33,34,39–41]; and the reformulation of MPC as a multi-stage optimization problem using dynamic programming and subsequent robustification of the subproblems [25].

The approach to robustifying mp-NCO-tracking controllers developed through this paper is inspired by tube-based MPC. It involves backing-off the path and terminal state constraints based on a worst-case uncertainty propagation, for instance using interval analysis or ellipsoidal calculus. An important advantage of this approach is that it retains the same complexity as the nominal mp-NCO-tracking controllers. Preliminary results were presented in [48]; they are extended herein by introducing ancillary feedback laws in order to reduce the conservativeness.
A second limitation of the mp-NCO-tracking methodology in [47] is the lack of a systematic procedure for characterizing the critical regions in continuous-time mp-DO problems and searching for a critical region during on-line execution. This is due to the fact that the critical regions in mp-DO may be non-convex, and closed-form expressions describing their boundaries may not be available in general. This contrasts with explicit NLP, e.g. based on mp-QP, for which powerful techniques for detecting and mapping the critical regions are available, including geometrical techniques [3,5,46], combinatorial approaches [13,15,17], and, more recently, graph-theoretic approaches [1,37]. Herein, we investigate the use of data classifiers based on deep learning [2,14] in order to characterize the critical regions in mp-DO. Such classifiers take the control problem parameters as inputs and map them to the corresponding critical regions in terms of their switching structures. Similar applications of machine learning within explicit MPC have been proposed for approximating the solution of both linear and nonlinear MPC [16,28]. Another feature of data classifiers lies in their ability to estimate the likelihood that a given parameter value belongs to a certain critical region, thus providing a basis for the point-location problem during the on-line execution of mp-NCO-tracking.
The rest of the paper is organized as follows. Section 2 gives a formulation of the optimal control problem of interest and provides some background on the mp-NCO-tracking methodology. The two main contributions of the paper, namely the use of data classifiers for mapping the critical regions in mp-DO and the robust-counterpart formulation and solution of mp-NCO-tracking, are detailed in Sections 3 and 4, respectively. The case study of a fluid catalytic cracking (FCC) unit is used throughout these two sections to illustrate the theoretical developments. A second case study concerned with the control of a series of chemical reactors is presented in Section 5. Finally, Section 6 concludes the paper and discusses future research directions.
Notation. The set of compact subsets of R^n is denoted by K^n, and the subset of compact convex subsets by K^n_C. Given two subsets W, Z ∈ K^n, their Minkowski sum is W ⊕ Z := {w + z | w ∈ W, z ∈ Z}, and their Minkowski (or Pontryagin) difference is W ⊖ Z := {v ∈ R^n | ∀z ∈ Z, v + z ∈ W}.
An interval vector [y^L, y^U] ∈ K^n_C has interval radius and midpoint defined as rad[y^L, y^U] := (y^U − y^L)/2 and mid[y^L, y^U] := (y^U + y^L)/2. The set of n-dimensional positive semi-definite [resp. positive definite] matrices is denoted by S^n_+ [resp. S^n_++]. An ellipsoid with center q ∈ R^n and shape matrix Q ∈ S^n_+ is given by E(q, Q) := {q + Q^{1/2} v | v^T v ≤ 1}, with the shorthand E(Q) := E(0, Q). The support function V[Z] : R^n → R of a set Z ∈ K^n is defined as V[Z](c) := max{c^T z | z ∈ Z}. In particular, the support functions of the interval [y^L, y^U] and the ellipsoid E(q, Q) are V[[y^L, y^U]](c) = c^T mid[y^L, y^U] + abs(c)^T rad[y^L, y^U] and V[E(q, Q)](c) = c^T q + (c^T Q c)^{1/2}, with abs(c) := (|c_1|, ..., |c_n|)^T. The ith row [resp. column] of a matrix A ∈ R^{m×n} is denoted by A_{(i,·)} [resp. A_{(·,i)}]. e_k ∈ R^n denotes the vector with a 1 in the kth coordinate and 0's elsewhere.
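For concreteness, the two support functions above can be evaluated numerically; the vectors and sets below are arbitrary examples:

```python
import numpy as np

# Support function V[Z](c) = max{ c^T z | z in Z } for the two set
# representations used in this paper.

def support_interval(c, y_lo, y_hi):
    """V[[y_lo, y_hi]](c) = c^T mid + abs(c)^T rad."""
    mid, rad = (y_hi + y_lo) / 2, (y_hi - y_lo) / 2
    return c @ mid + np.abs(c) @ rad

def support_ellipsoid(c, q, Q):
    """V[E(q, Q)](c) = c^T q + sqrt(c^T Q c)."""
    return c @ q + np.sqrt(c @ Q @ c)

c = np.array([1.0, -2.0])
print(support_interval(c, np.array([-1.0, 0.0]), np.array([1.0, 2.0])))  # 1.0
print(support_ellipsoid(c, np.zeros(2), np.eye(2)))                      # sqrt(5)
```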

Problem formulation and background
We consider constrained linear-quadratic optimal control problems under uncertainty, in the form of

  min_u  x(T)^T Q_f x(T) + ∫_0^T [ x(t)^T Q x(t) + u(t)^T R u(t) ] dt
  s.t.  ẋ(t) = F_x x(t) + F_u u(t) + F_Â Â + F_w w(t),  x(0) = B_Â Â + b_0,
        G_x x(t) + G_u u(t) ≤ g,  a.e. t ∈ [0, T],
        H_x x(T) ≤ h,   (1)

where u(t) ∈ R^{n_u} and x(t) ∈ R^{n_x} are the control input and the state response at a given time t; Â ∈ Θ, with Θ ∈ K^{n_Â}_C, is the parameter vector; w(t) ∈ W, with W ∈ K^{n_w}_C, is the additive time-varying uncertainty; Q_f ∈ S^{n_x}_+, Q ∈ S^{n_x}_+, and R ∈ S^{n_u}_++ are given weighting matrices; and G_x ∈ R^{n_g×n_x}, G_u ∈ R^{n_g×n_u}, g ∈ R^{n_g} and H_x ∈ R^{n_h×n_x}, h ∈ R^{n_h} define the path and terminal constraints.

The main objective herein is to construct robust mp-NCO-tracking controllers that can guarantee feasibility against the worst-case scenario of the time-varying uncertainty w(t). The focus is on developing robust formulations that are amenable to numerical solution at a similar computational effort as the nominal mp-NCO-tracking controllers in Sun et al. [47], i.e. with F_w = 0 in (1). Another principal objective is the use of data classifiers based on deep learning to approximate the critical regions in continuous-time mp-DO problems, and subsequently search for a critical region during on-line execution. Before presenting these two contributions, we provide an overview of mp-DO and mp-NCO-tracking in the following subsections.

Multi-parametric dynamic optimization
In the absence of uncertainty (F_w = 0), an optimal solution to Problem (1) for a given parameter value Â can be characterized by means of the first-order necessary conditions for optimality (NCOs) in the form of a multi-point boundary value problem [7]; see [47] for a complete list of conditions and related controllability and regularity assumptions. It consists of the tuple (u(t), x(t), λ(t), μ(t), ν, π), where λ(t) ∈ R^{n_x} are the co-state (adjoint) variables; μ(t) ∈ R^{n_g}, the multipliers for the path constraints; ν ∈ R^{n_h}, the multipliers for the terminal constraints; and π, the multipliers for extra interior-point constraints in the presence of high-order state constraints. A solution typically consists of finitely many (N_t(Â)) arcs, which define the so-called optimal switching structure S(Â). The switching times t_k(Â), k = 1, ..., N_t(Â) − 1, between consecutive arcs in the optimal solution correspond either to the activation or deactivation times for a given path constraint, or to touch-and-go points in the case of higher-order state constraints.
Each critical region Θ_i ⊆ Θ, i = 1, ..., N_C, of the mp-DO problem (1) comprises those parameter values Â ∈ Θ_i for which the optimal control solutions all share a common switching structure S_i. An explicit parameterization of the optimal solutions in a critical region can be derived from the equality conditions in the parametric NCOs, and comes in the form of a multi-parametric feedback law [47]

  u(t) = K_i(t, Â),  ∀Â ∈ Θ_i,   (2)

where K_i is an explicit function of time and of the parameter Â along each arc of the switching structure S_i. The remaining inequality conditions in the parametric NCOs yield an implicit description of the boundary of a critical region. At the boundary between two critical regions, at least one terminal or path constraint will thus change from active to inactive. This is similar in essence to the idea in region-free explicit model predictive control [27].

mp-NCO-tracking methodology
The mp-NCO-tracking methodology proceeds in two steps, as illustrated in Fig. 1: (i) The off-line step defines the multi-parametric control structure, which entails a partitioning of the parameter domain into N_C critical regions, Θ_1 ∪ · · · ∪ Θ_{N_C} ⊆ Θ, each corresponding to a unique switching structure S_1, ..., S_{N_C}. The feedback law in each critical region is furthermore expressed in the parameterized form (2). (ii) The on-line step applies the feedback law (2) in a receding-horizon manner. Each sampling time entails the location of the critical region Θ_i containing the current parameter value Â. For a given switching structure candidate S_i, this step starts by computing the switching times t_1(Â), ..., t_{N_t^i − 1}(Â) so as to enforce continuity of the Hamiltonian function at these times, for instance by applying a Newton iteration. A verification that Â is contained within Θ_i then consists of a simple check that all of the primal and dual feasibility conditions are satisfied under the feedback law K_i.

Classifier-based implementation of mp-NCO-tracking
A central, and particularly arduous, task for the application of the mp-NCO-tracking methodology is detecting all of the critical regions of the mp-DO problem (1), and representing those regions in a form that can be easily exploited in the on-line point location problem. Because the critical regions are generally non-convex and lack a closed-form representation, classification methods from the fast-developing field of machine learning, where the task is to predict a discrete class for a given input, represent a powerful alternative. In the context of mp-NCO-tracking, multinomial classifiers [2] can provide the desired mapping between the control problem parameter Â and the set of possible switching structures {S 1 , . . ., S NC }.
Our focus in this paper is on multilayer perceptrons (MLPs) [14], a class of neural networks featuring multiple (hidden) layers and nonlinear activation functions, which have the ability to distinguish data sets that are not linearly separable. The MLPs of interest comprise exactly N_C neurons in their output layers, each one corresponding to the unique switching structure for certain Â ∈ Θ. A natural objective for MLP training is minimizing misclassification with respect to the training data set. In particular, we use a so-called softmax function, whose output P := [p_1, ..., p_{N_C}] is a vector representing the probability of each switching structure. The normalized probabilities returned by the softmax classifier can subsequently be exploited on-line, in the point-location problem (Step ii in Section 2.2), for guiding the search for the correct switching structure and thereby avoiding a naïve enumeration. Loss functions that can only recognize the class with the highest score, such as those used in support vector machines, are less suitable in this context.
The specifics of MLP training for their application in mp-NCO-tracking are detailed in Section 3.1, followed by an illustrative case study in Section 3.2. Although the focus is on MLP and softmax classifiers, the methodology could of course be transposed to other classification techniques.

MLP training for mp-NCO-tracking
The off-line training of an MLP requires generating a labeled training set, learning from this training set, and checking the evaluation accuracy against a labeled testing set (generated independently from the training set). So long as the evaluation accuracy is lower than a user-defined threshold, the training set keeps being enlarged and both the training and evaluation steps are repeated. This overall procedure is summarized in Algorithm 1 below. The outcome is a trained MLP, which encapsulates a compact representation of the critical regions for the mp-DO problem at hand and can be passed to the mp-NCO-tracking controller for on-line use (see Fig. 1).

Several remarks are in order. The initial sample set in Step i can be generated with a space-filling technique, such as Sobol or Latin hypercube sampling. The labeling of these samples calls for detecting the switching structure of the optimal control solution corresponding to every sample Â^(1), ..., Â^(2M). This detection can be based on a numerical solution of the optimal control problem (1). By parameterizing the control trajectory as u(t) = φ(t, p), where p is a finite set of parameters and φ is, e.g., a piecewise constant or piecewise linear function, and by discretizing the path constraints as interior-point constraints, one can approximate the infinite-dimensional problem (1) with a convex quadratic program (QP) for which fast and reliable solvers are available. The active constraints in the QP solution correspond to active path constraints at given times and to active terminal constraints. The switching structure of the actual optimal control can then be inferred by inspecting the KKT multipliers and collecting the adjacent active times for the path constraints, as described in [44].
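The last step of the labeling procedure, collecting adjacent active time points into constraint arcs, can be sketched as follows; the data layout and the activity threshold are our own illustrative choices, not from [44]:

```python
# Hedged sketch of the labeling step: after solving the discretized QP, the
# switching structure is read off by grouping adjacent time points at which
# a path constraint is active (multiplier above a threshold).

def switching_structure(multipliers, tol=1e-8):
    """multipliers[j][k]: KKT multiplier of path constraint j at time point k.
    Returns, per constraint, the index ranges of its contiguous active arcs."""
    structure = []
    for j, mu in enumerate(multipliers):
        arcs, start = [], None
        for k, m in enumerate(mu):
            if m > tol and start is None:
                start = k                       # constraint becomes active
            elif m <= tol and start is not None:
                arcs.append((j, start, k - 1))  # constraint deactivates
                start = None
        if start is not None:
            arcs.append((j, start, len(mu) - 1))
        structure.extend(arcs)
    return structure

mu = [[0, 0, 0.3, 0.5, 0.2, 0, 0], [0, 0, 0, 0, 0, 0, 0.1]]
print(switching_structure(mu))  # [(0, 2, 4), (1, 6, 6)]
```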

Algorithm 1. MLP training procedure for mp-NCO-tracking

Step i. Generate an initial set of 2M samples Â^(1), ..., Â^(2M) ∈ Θ and label each sample with the switching structure of the corresponding optimal control solution. Assign the samples Â^(1), ..., Â^(M) to the testing set and Â^(M+1), ..., Â^(2M) to the training set.
Step ii. Train the MLP by minimizing the softmax loss criterion over the training set.
Step iii. Evaluate the misclassification rate of the trained MLP on the testing set. If the stopping criterion (3) is satisfied, terminate. Otherwise, generate and label additional samples, append them to the training set, and return to Step ii.
The learning process in Step ii aims to minimize a softmax loss criterion representing the mismatch between the predicted and actual switching structures across the entire training set. Let s : R^{n_Â} → R^K denote the score function of the MLP, where s_j(Â) is the score of the jth class (or output neuron) at a point Â ∈ Θ. The classical softmax function σ normalizes the score vector s into the range [0, 1], so as to yield a probability distribution over the K possible outcomes,

  σ_k(s) := exp(s_k) / Σ_{j=1}^K exp(s_j),  k = 1, ..., K.

The softmax loss criterion L for the classifier is then given by

  L := −(1/M) Σ_{i=M+1}^{2M} log σ_{k^(i)}(s(Â^(i))),

where T_train := {(Â^(M+1), e_{k^(M+1)}), ..., (Â^(2M), e_{k^(2M)})} denotes the training set, with one-hot labels e_{k^(i)} ∈ {0, 1}^K and unique matching switching structure k^(i) ∈ {1, ..., K} for each sampling point i = M + 1, ..., 2M (see Algorithm 1). A regularization term is normally added to the softmax criterion L, e.g. the sum of squared weights in the MLP, in order to diffuse the weights throughout the MLP neurons and prevent over-fitting.

An important, yet arduous, decision prior to applying Algorithm 1 is selecting the number of hidden layers and hidden neurons in the MLP. Notice that at least one hidden layer is necessary since the critical regions are nonconvex in general, and therefore may not be linearly separable. Although a single-hidden-layer MLP is capable of approximating any continuous function under mild assumptions on the activation function [11,18], it may require an arbitrarily large number of neurons in that hidden layer to meet a desired accuracy. Deep learning mitigates this problem by including additional hidden layers. As a rule of thumb, we consider hidden layers with 20 neurons here since the mp-DO problems of interest have no more than a few dozen critical regions. Furthermore, we start with a single hidden layer and keep adding layers so long as the prediction accuracy improves significantly.
A more systematic analysis is beyond the scope of this paper and is left for future work.
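A minimal implementation of the softmax normalization and loss described above, with toy scores and labels of our own choosing:

```python
import numpy as np

# Softmax normalization and the cross-entropy (softmax) loss over a labeled
# training set; the scores and labels below are illustrative toy values.

def softmax(s):
    e = np.exp(s - s.max())          # shift for numerical stability
    return e / e.sum()

def softmax_loss(scores, labels):
    """scores[i]: MLP score vector for sample i; labels[i]: true class index.
    L = -(1/M) * sum_i log softmax(scores[i])[labels[i]]."""
    M = len(labels)
    return -sum(np.log(softmax(s)[k]) for s, k in zip(scores, labels)) / M

scores = [np.array([2.0, 0.5, -1.0]), np.array([0.0, 3.0, 0.0])]
labels = [0, 1]
print(softmax_loss(scores, labels))  # about 0.168
```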
The test for stopping the MLP training compares a worst-case estimate of the classifier error rate at the confidence level 1 − α with the user-defined misclassification threshold ε_0. In practice, this hypothesis testing can be recast in terms of the misclassification rate for the testing set, given by

  ε̂ := (1/M) Σ_{i=1}^{M} 1[ k̂(Â^(i)) ≠ k^(i) ],

with k̂(Â) := arg max{σ_k(s(Â)) | k = 1, ..., K} referring to the class of the MLP's predicted (i.e. most likely) switching structure at a point Â; 1[·], the indicator function; and M, the cardinality of the testing set. This leads to the following stopping criterion [35],

  M ε̂ ≤ ⌊B^{-1}(α; M, ε_0)⌋,   (3)

where B^{-1}(·; M, ε_0) denotes the inverse of the Binomial(M, ε_0) cumulative probability function; and ⌊·⌋ denotes the floor function. Notice that such a termination criterion for the MLP training ensures that the sample size is sufficiently large for the test to be statistically significant, but it does not provide any guarantee that the trained MLP will describe the mp-DO critical regions to a desired accuracy. Rather, the ability of an MLP to describe a large number of nonconvex critical regions can improve with an increasing number of hidden layers and of neurons in these layers. The caveat is of course that a greater (off-line) computational burden is necessary for training a bigger MLP and for labeling the samples in a larger training set. There is furthermore no guarantee that all of the possible switching structures within the parameter set Θ have been uncovered.
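Our reading of this stopping test can be sketched with an exact binomial quantile; the function names are ours, and the precise form of criterion (3) should be checked against [35]:

```python
from math import comb

# Hedged sketch of the stopping test: with M test samples, threshold eps0 and
# confidence 1 - alpha, training stops once the number of misclassified test
# samples does not exceed the alpha-quantile of a Binomial(M, eps0) law.

def binom_cdf(m, M, p):
    """P[X <= m] for X ~ Binomial(M, p), computed exactly."""
    return sum(comb(M, i) * p**i * (1 - p)**(M - i) for i in range(m + 1))

def max_allowed_errors(M, eps0, alpha):
    """Smallest m with binom_cdf(m) >= alpha, i.e. the alpha-quantile."""
    m = 0
    while binom_cdf(m, M, eps0) < alpha:
        m += 1
    return m

# With M = 500 test points, eps0 = 0.1% and alpha = 5%, no misclassification
# is tolerated, consistent with the FCC study in Section 3.2:
print(max_allowed_errors(500, 0.001, 0.05))  # 0
```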
A simple strategy to reduce the odds of missing one or more switching structures entails increasing the initial sample size. For instance, if the number of critical regions could be pre-computed, e.g. via the application of complete-search techniques [8], one could use this information to increase the sample size until all of the critical regions have indeed been uncovered. In order to reduce the overall number of samples needed to meet the required accuracy, and to enable a better representation of the boundaries between critical regions, one could also apply a biased sampling approach in Step iii, or other active learning strategies [9]. For instance, a higher concentration of sampling points could be generated around points that are either misclassified by the MLP or correctly classified but with a probability below a certain threshold; see, e.g., [10,12].

FCC case study
For illustration, we consider a fluidized catalytic cracking (FCC) unit operated in partial combustion mode [21]. The objective is to steer the system to a given operating point, defined in terms of the mass fraction of coke on regenerated catalyst, C_rc, and the regenerator dense-bed temperature, T_rg. The manipulated variables are the flow rate of air sent to the regenerator, F_a, and the catalyst flow rate, F_s. The linear input-output dynamic model results from the linearization and reduction of a first-principles nonlinear model around the equilibrium point C*_rc = 5.207 × 10^-3, T*_rg = 965.4 K and T*_f = 400 K, where the latter denotes the feed oil temperature. The control and state variables in this linear dynamic system are the deviations of (F_a, F_s) and (C_rc, T_rg) from the equilibrium point, whereas the initial conditions of the states C_rc and T_rg are treated as uncertain parameters, and the temperature T_f acts as (unmeasured) disturbance. The resulting mp-DO problem of interest is denoted by (4) subsequently. In the present subsection, the focus is on the nominal case where the disturbance is absent (F_w = 0). The robustification against the time-varying disturbance w(t) will be considered later in Section 4.4. The problem where the temperature T_f is treated as a third uncertain parameter, whose value is revealed during on-line execution, was addressed in Sun et al. [47].
We apply Algorithm 1 to construct an MLP that describes the switching structures of the mp-DO (4). Both the training and testing sets are initialized with M = 500 points using Sobol sampling, and an additional 100 points are added to the training set at each iteration. The labeling of all these points proceeds by parameterizing the control trajectories as piecewise constant and discretizing the path constraints at interior points over 100 stages, and then solving the resulting QP problems using the Gurobi MATLAB library. The selected MLP is a fully connected network, with 2 hidden layers, 20 neurons in each layer, and tanh as the activation function in each neuron; we use the TensorFlow toolbox (https://www.tensorflow.org/) for the MLP training. Furthermore, we impose a misclassification error rate below 0.1% with 95% confidence on the MLP as a termination criterion for Algorithm 1, which implies that none of the points in the testing set may be misclassified in order to meet the stopping criterion (3).
After the first iteration of Algorithm 1, 9 critical regions are detected in the training set. A total of 10 elements in the testing set are misclassified by the trained MLP, which corresponds to a misclassification rate of 2.0%. Algorithm 1 takes 15 iterations to produce an MLP with no misclassification within the testing set. The final training set comprises 1900 sample points, and the trained MLP represents 11 critical regions as shown in Fig. 2:
• Θ_1 comprises unconstrained optimal controls;
• Θ_2 and Θ_7 comprise controls with two arcs, a boundary arc where u_2 reaches its lower and upper bound, respectively, followed by an interior arc;
• Θ_6 and Θ_11 comprise controls with three arcs, a boundary arc where x_1 reaches its upper and lower bound, respectively, located in between an initial interior arc and a final one;
• Θ_5 and Θ_10 comprise controls with the same constrained arc as in Θ_6 and Θ_11, respectively, yet without the final interior arc;
• Θ_3 and Θ_8 combine the previous two cases in Θ_2 + Θ_6 or Θ_7 + Θ_11, respectively, and give rise to four arcs, starting with a boundary arc for u_2 (like Θ_2 or Θ_7), followed by an interior arc, a boundary arc for x_1 (like Θ_6 or Θ_11), and a final interior arc;
• Θ_4 and Θ_9 comprise controls with the same first three arcs as in Θ_3 and Θ_8, respectively, but without the final interior arc.
For the parameter value Â = (9 × 10^-4, 14), the structure with highest probability predicted by the softmax classifier is that of Θ_6 (100.00%). In turn, we can use the corresponding feedback law K_6 in the form of (2) to compute the switching times t_1(Â), t_2(Â), and check that all of the corresponding primal and dual feasibility conditions are satisfied. The resulting optimal control trajectory is shown in Fig. 3(a).
One of the misclassified samples in the training set corresponds to the parameter value Â = (9.29 × 10^-4, 15.3). The structure with highest probability predicted by the softmax classifier is again that of Θ_6 (70.19%). But in calculating the optimal switching times, we find that the switching between the second and final arcs should now occur at t_2 ≈ 10.0087, which is beyond the time horizon T = 10. Therefore, this point does not belong to Θ_6. The structure with second-highest probability is that of Θ_5 (29.81%), which does not include the final interior arc, and turns out to be the correct one. The resulting optimal control trajectory is shown in Fig. 3(b).
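The fallback illustrated by this example, trying candidate regions in order of decreasing softmax probability, can be sketched as follows; `validate` is a hypothetical stand-in for the computation of the switching times and the primal/dual feasibility check of Step ii in Section 2.2:

```python
# Sketch of the on-line point-location fallback: candidate regions are tried
# in order of decreasing softmax probability, and the first one whose
# switching times and feasibility conditions check out is retained.

def locate_region(probabilities, validate):
    """probabilities: softmax output per region; validate(i) -> bool."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    for i in ranked:
        if validate(i):
            return i
    raise RuntimeError("no critical region validated")

# Mimicking the misclassified sample above: the top-ranked region (index 5)
# fails validation (switching time beyond the horizon), so the second-ranked
# region (index 4) is returned instead.
probs = [0.0, 0.0, 0.0, 0.0, 0.2981, 0.7019]
print(locate_region(probs, validate=lambda i: i != 5))  # 4
```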

Robust mp-NCO-tracking controllers
This section presents a robustification of the mp-NCO-tracking methodology by Sun et al. [47], which was summarized in Section 2. The main idea sustaining this construction is the use of worst-case uncertainty propagation [8] as a means to back-off the terminal and path constraints in the nominal mp-DO problem, and thereby guarantee feasibility of the multi-parametric control laws against all possible uncertainty scenarios. An attendant advantage of backing-off the constraints is that the resulting mp-DO problem retains the same complexity as the nominal mp-DO problem [48]. However, this method suffers from the fact that, absent feedback in the optimal control problem, the back-offs can be large, especially for long prediction horizons. Such conservativeness could significantly reduce the range of feasible parameter values in the robustified mp-DO problem, and it motivates the inclusion of feedback in the optimal control problem.
Ideally, the decision variable in the mp-DO problem would be a control policy π : R^{n_x} → R^{n_u}, but this would lead to a similar complexity as in (continuous-time) dynamic programming. Tube-based MPC circumvents this complexity problem by using a sub-optimal control policy, e.g. an affine feedback law for linear MPC [30,32]. Building on this analogy, we consider an affine state feedback law herein. The robust-counterpart mp-DO formulation for the additive uncertainty case is developed in Section 4.1, including the rigorous computation of constraint back-offs using interval and ellipsoidal reachability tubes. An extension of this approach to multiplicative uncertainty is treated in Section 4.2, and the determination of the feedback gain is discussed in Section 4.3. Finally, the same FCC case study as previously is used to illustrate the robust mp-NCO-tracking control methodology in Section 4.4.

Robust mp-DO formulation with additive uncertainty
We consider an ancillary state feedback law in the form of

  u(t) = û(t) + K (x(t) − x̂(t)),   (5)

where x̂(t) ∈ R^{n_x} and û(t) ∈ R^{n_u} are the state and control values of the nominal system that generates the reference trajectory at time t; and K ∈ R^{n_u×n_x} is a linear feedback gain matrix. By a slight abuse of notation, we set K = 0 for 0 ≤ t < t_s, and K ≠ 0 for t ≥ t_s, in order to account for the fact that the control actions are open loop over a sampling period [0, t_s]. The closed-loop dynamics of the uncertain system in (1) are described by

  ẋ(t) = F_x x(t) + F_u [û(t) + K (x(t) − x̂(t))] + F_Â Â + F_w w(t).

For a given uncertainty reference ŵ(t) ∈ W, we can then split the state and the disturbance into nominal and perturbed components as

  x(t) = x̂(t) + d_x(t),  w(t) = ŵ(t) + d_w(t),   (6)

so that the dynamics of the state reference x̂ and perturbation d_x are given by

  x̂˙(t) = F_x x̂(t) + F_u û(t) + F_Â Â + F_w ŵ(t),  x̂(0) = B_Â Â + b_0,   (7)
  ḋ_x(t) = (F_x + F_u K) d_x(t) + F_w d_w(t),  d_x(0) = 0.   (8)

Observe that the initial value problem (8) is independent of the nominal control û and the parameter Â. By linearity, a unique solution to (8) exists for each disturbance realization with w(t) ∈ W, denoted by δ_x(·, w, K) subsequently. The reachability tube Ξ_K : [0, T] → K^{n_x} describing the solution set for all possible realizations of the time-varying uncertainty w is defined as

  Ξ_K(t) := { δ_x(t, w, K) | w(s) ∈ W, a.e. s ∈ [0, t] }.   (9)

Provided that a compact reachable-tube enclosure Ξ̄_K(t) ⊇ Ξ_K(t) is available on [0, T], a conservative robust counterpart to the uncertain mp-DO problem (1) that minimizes the nominal cost may be formulated as [19]

  inf_{x̂,û}  x̂(T)^T Q_f x̂(T) + ∫_0^T [ x̂(t)^T Q x̂(t) + û(t)^T R û(t) ] dt
  s.t.  x̂˙(t) = F_x x̂(t) + F_u û(t) + F_Â Â + F_w ŵ(t),  x̂(0) = B_Â Â + b_0,
        G_x x̂(t) + G_u û(t) ≤ g − δ_G(t),  a.e. t ∈ [0, T],
        H_x x̂(T) ≤ h − δ_H,   (10)

with the back-off terms

  δ_{G,i}(t) := V[Ξ̄_K(t)]((G_{x,(i,·)} + G_{u,(i,·)} K)^T),  i = 1, ..., n_g,
  δ_{H,i} := V[Ξ̄_K(T)](H_{x,(i,·)}^T),  i = 1, ..., n_h,   (11)

where G_x, G_u, g and H_x, h define the affine path and terminal constraints in (1). An inherent advantage of this formulation is that the mp-NCO-tracking methodology from [47] applies readily to the problem (10) in order to devise a robust mp-NCO-tracking controller. It is clear that the back-off terms on the right-hand side of the robustified constraints will modify the boundary of the nominal critical regions, and possibly the number and type of critical regions too.
Such changes in activation level of the constraints will furthermore propagate through the optimality conditions of the robustified mp-DO problem, thereby resulting in different feedback laws (2) in those critical regions with active path or terminal constraints.
In general, any set-valued function Ξ̄_K : [0, T] → K^{n_x}_C satisfying the following generalized differential inequality for every c ∈ R^{n_x} yields a valid enclosure of the reachable tube (9) [49]:

  (d/dt) V[Ξ̄_K(t)](c) ≥ max { c^T [(F_x + F_u K) d + F_w d_w] | d ∈ Ξ̄_K(t), c^T d = V[Ξ̄_K(t)](c), ŵ(t) + d_w ∈ W },  {0} ⊆ Ξ̄_K(0).

The special cases of tractable interval and ellipsoidal enclosures are treated below, after which we discuss an approach to computing the gain matrix K.
Case of ellipsoidal tube enclosures. In practice, the choice of ellipsoidal reachable tubes over interval tubes may be dictated by the fact that the former are more efficient at mitigating the wrapping effect, thereby reducing the overall conservativeness. Under the assumption that the time-varying uncertainty w(t) is bounded within the ellipsoid E(ŵ(t), Q_w(t)), an ellipsoidal tube enclosure Ξ̄_K(t) := E(Q_x(t)) ⊇ Ξ_K(t), parameterized by the matrix-valued function Q_x : [0, T] → S^{n_x}_+, can be precomputed by solving the auxiliary ODEs

  Q̇_x(t) = (F_x + F_u K) Q_x(t) + Q_x(t) (F_x + F_u K)^T + κ(t) Q_x(t) + κ(t)^{-1} F_w Q_w(t) F_w^T,   (13)

a.e. t ∈ [0, T], with Q_x(0) = 0, and for any weighting function κ : [0, T] → R_++. For instance, κ may be chosen to minimize the trace of Q̇_x(t),

  κ(t) = ( tr(F_w Q_w(t) F_w^T) / max{tr(Q_x(t)), ϵ} )^{1/2},   (14)

for some finite tolerance ϵ > 0. Therefore, an alternative choice for the back-offs δ_G(t), 0 ≤ t ≤ T, and δ_H in the robust counterpart problem (10) is

  δ_{G,i}(t) := ((G_{x,(i,·)} + G_{u,(i,·)} K) Q_x(t) (G_{x,(i,·)} + G_{u,(i,·)} K)^T)^{1/2},
  δ_{H,i} := (H_{x,(i,·)} Q_x(T) H_{x,(i,·)}^T)^{1/2}.

The case of a time-varying uncertainty w(t) which is bounded within an interval vector [w^L(t), w^U(t)] can be treated likewise, e.g. by regarding this interval vector as the Minkowski sum of n_w one-dimensional ellipsoids [19].
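A minimal sketch of this ellipsoidal tube propagation, assuming w(t) ∈ E(0, Q_w), the trace-minimizing weight with tolerance described above, and an explicit Euler discretization of the shape-matrix ODE; the system matrices are illustrative, not the FCC model:

```python
import numpy as np

# Hedged sketch of ellipsoidal tube propagation: Euler integration of
#   dQx/dt = A_K Qx + Qx A_K' + kappa Qx + (1/kappa) Fw Qw Fw'
# with A_K = Fx + Fu K and kappa chosen to minimize the trace growth.

Fx = np.array([[0.0, 1.0], [-1.0, -0.5]])
Fu = np.array([[0.0], [1.0]])
Fw = np.eye(2)
K  = np.array([[-1.0, -1.0]])
Qw = 0.01 * np.eye(2)                # w(t) bounded in E(0, Qw)

def propagate_tube(T=5.0, h=1e-3, eps=1e-12):
    A_K, W = Fx + Fu @ K, Fw @ Qw @ Fw.T
    Qx = np.zeros((2, 2))            # Qx(0) = 0
    for _ in range(int(T / h)):
        kappa = np.sqrt(np.trace(W) / max(np.trace(Qx), eps))
        Qx = Qx + h * (A_K @ Qx + Qx @ A_K.T + kappa * Qx + W / kappa)
    return Qx

Qx_T = propagate_tube()
# Back-off for a state constraint c^T x <= b is then sqrt(c^T Qx(T) c):
c = np.array([1.0, 0.0])
backoff = np.sqrt(c @ Qx_T @ c)
print(backoff)
```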

Extension to multiplicative uncertainty
As an extension to the additive uncertainty case in problem (1), we consider a broader class of uncertain linear dynamic systems with uncertain state coefficient matrix, in the form of

  ẋ(t) = [ F_x^0 + Σ_{i=1}^{n_ω} ω_i(t) F_x^i ] x(t) + F_u u(t) + F_Â Â + F_w w(t),

where ω(t) ∈ W_ω, with W_ω ∈ K^{n_ω}_C, is the multiplicative time-varying uncertainty; and F_x^0, ..., F_x^{n_ω} ∈ R^{n_x×n_x} are given scaling matrices. Notice that a further extension to the case of uncertain coefficient matrices F_u and F_Â is also possible, for instance by invoking similar arguments as in [45].
Applying a similar splitting of the state and the disturbances into nominal and perturbed components as in (6), the dynamics of the state reference x̂ and perturbation d_x become

  x̂˙(t) = F_x^0 x̂(t) + F_u û(t) + F_Â Â + F_w ŵ(t),  x̂(0) = B_Â Â + b_0,
  ḋ_x(t) = (F_x^0 + F_u K) d_x(t) + Σ_{i=1}^{n_ω} d_{ω,i}(t) F_x^i [x̂(t) + d_x(t)] + F_w d_w(t),  d_x(0) = 0.   (16)

Observe that the dynamics (16) are now dependent on the control reference û and the parameter Â via the nominal state trajectory x̂. Therefore, a strategy similar to that for additive uncertainty, consisting of precomputing a reachable tube enclosure Ξ̄_K for the state perturbations, will rely on the availability of an a priori enclosure X̂(t) ∈ K^{n_x} for the nominal state. For instance, such enclosures could be computed using state-of-the-art set-valued integrators [19,49]. Another complication arises due to the presence of bilinear terms d_x(t) d_ω(t) in (16), which adds to the conservativeness of the enclosures Ξ̄_K. The construction of tractable interval and ellipsoidal tube enclosures is discussed below.
Case of interval tube enclosures. An interval tube enclosure Ξ̄_K(t) := [y^L(t), y^U(t)] ⊇ Ξ_K(t) can be precomputed via a modified system of auxiliary ODEs propagating the facewise worst-case bounds, a.e. t ∈ [0, T], ∀i ∈ {1, ..., n_x}:

  ẏ_i^L(t) = min { e_i^T [ (F_x^0 + F_u K) d + Σ_{j=1}^{n_ω} d_{ω,j} F_x^j (x̂ + d) + F_w d_w ] | d ∈ [y^L(t), y^U(t)], d_i = y_i^L(t), x̂ ∈ X̂(t), d_ω ∈ W_ω, ŵ(t) + d_w ∈ W },
  ẏ_i^U(t) = max { e_i^T [ (F_x^0 + F_u K) d + Σ_{j=1}^{n_ω} d_{ω,j} F_x^j (x̂ + d) + F_w d_w ] | d ∈ [y^L(t), y^U(t)], d_i = y_i^U(t), x̂ ∈ X̂(t), d_ω ∈ W_ω, ŵ(t) + d_w ∈ W },

with y^L(0) = y^U(0) = 0, in the spirit of the generalized differential inequality above.

Case of ellipsoidal tube enclosures. Precomputing an ellipsoidal tube enclosure Ξ̄_K(t) := E(Q_x(t)) ⊇ Ξ_K(t) is more involved in the presence of mixed additive-multiplicative uncertainty. For simplicity of exposition and without loss of generality, we shall assume here that the multiplicative uncertainty is bounded within the unit ball, d_ω(t)^T d_ω(t) ≤ 1, i.e. Q_ω(t) = I_{n_ω} for all t; and that the state reference is bounded within an ellipsoid E(Qx̂(t)) centered at the origin. Using recent developments in multiplicative ellipsoidal uncertainty from [20], an ellipsoidal enclosure E(Q_x^0(t)) can be propagated for the solutions of the homogeneous part of (16), for any matrix-valued weighting function Σ : [0, T] → S^{n_x}_++. Moreover, the product term between the multiplicative uncertainty d_ω and the states x̂ + d_x can be bounded by ellipsoids parameterized in Qx̂ and Q_x. By using similar superposition ideas as in [26,19], auxiliary ODEs can thus be derived that propagate the desired shape matrix Q_x(t) ∈ S^{n_x}_+, a.e. t ∈ [0, T], with initial condition Q_x(0) = 0, and for any scalar weighting functions η_1, η_2 : [0, T] → R_++.

Selection of the linear gain matrix
The gain matrix K in the linear state feedback (5) can have contradictory effects on the back-off terms (11), and care must therefore be exercised in selecting these gains. Notice that large gain values (in magnitude) can reduce the spread of the reachability tube enclosure K by shifting the poles of the linear system (8) to the left. For constraints that do not depend directly on the control in particular, such as terminal state constraints and pure state path constraints, a smaller reachability tube enclosure results in smaller back-off terms δ_G(t) and δ_H, and hence reduces the conservativeness in problem (10). But a large gain matrix could as well lead to larger back-off terms in mixed control-state path constraints, due to the product terms K d, with d ∈ K(t), in (11). For instance, the robustified input-bound constraints u^L ≤ u(t) ≤ u^U come in the form u^L + δ_u(t) ≤ û(t) ≤ u^U − δ_u(t). The control domain thus shrinks in the presence of a linear state feedback, and could even become empty when the gain entries become too large (in magnitude).
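The contribution of the feedback term K δx to the input back-off can be evaluated in closed form for both enclosure types. A short sketch, assuming the ellipsoid shape matrix Q (or the interval radii) is available from the precomputed tube:

```python
import numpy as np

def input_backoff_ellipsoid(K, Q):
    """Back-off on each input bound due to the ancillary feedback u = u_hat + K dx,
    for perturbations dx in the ellipsoid E(Q) = {d : d' Q^{-1} d <= 1}:
    delta_u[i] = max_{d in E(Q)} |(K d)_i| = sqrt((K Q K')_ii)."""
    return np.sqrt(np.clip(np.diag(K @ Q @ K.T), 0.0, None))

def input_backoff_interval(K, rad):
    """Same back-off for a symmetric interval enclosure [-rad, rad]:
    delta_u[i] = sum_j |K_ij| * rad_j."""
    return np.abs(K) @ rad
```

The robustified bounds then read u^L + δ_u(t) ≤ û(t) ≤ u^U − δ_u(t), and the control domain becomes empty whenever some component of δ_u(t) exceeds half the corresponding bound width.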
In the case of an unstable dynamic system, choosing K such that [F_x + F_u K] is Hurwitz prevents the reachable tube (9) from growing exponentially as time advances; but even a stable tube may still be impractically large depending on the amount of uncertainty. By analogy with the target set in robust tube MPC [29,40], a better choice for K is one for which the reachable set K(T) is robustly positively invariant (RPI) for the linear system (8). In the case of ellipsoidal tubes with additive disturbance for instance, we may choose K as a (global) optimum of problem (23), with the weights κ(t) as in (14) above. Notice that such problems are nonconvex in general, thus rendering the determination of K a difficult task. A further complication in the case of interval tubes is that the dynamics may be nonsmooth.
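To make the trade-off concrete, a local search for K may minimize the size of the terminal ellipsoidal cross-section, propagating the standard shape-matrix ODE Q̇ = A_K Q + Q A_Kᵀ + κ Q + κ⁻¹ F_w Q_w F_wᵀ with A_K = F_x + F_u K. The sketch below uses a constant weight κ (rather than the time-varying weights κ(t) of (14)) and a derivative-free local solver, so only a local optimum of the nonconvex problem is obtained:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def terminal_tube_size(k_flat, Fx, Fu, Fw, Qw, T, kappa=1.0):
    """Trace of the ellipsoidal shape matrix Q(T) propagated under the
    closed-loop perturbation dynamics with gain K (constant weight kappa)."""
    nu, nx = Fu.shape[1], Fx.shape[0]
    K = k_flat.reshape(nu, nx)
    A = Fx + Fu @ K
    W = Fw @ Qw @ Fw.T
    def rhs(t, q):
        Q = q.reshape(nx, nx)
        dQ = A @ Q + Q @ A.T + kappa * Q + W / kappa
        return dQ.ravel()
    sol = solve_ivp(rhs, (0.0, T), np.zeros(nx * nx))
    return np.trace(sol.y[:, -1].reshape(nx, nx))

def optimize_gain(Fx, Fu, Fw, Qw, T, K0, kmax=5.0):
    """Local, derivative-free search over the gain entries; the problem is
    nonconvex in general, so only a local optimum is returned."""
    bnds = [(-kmax, kmax)] * K0.size
    res = minimize(terminal_tube_size, K0.ravel(), args=(Fx, Fu, Fw, Qw, T),
                   method='Nelder-Mead', bounds=bnds)
    return res.x.reshape(K0.shape)
```

On a scalar unstable system, the search drives the gain negative so as to shrink the terminal tube, in line with the RPI-motivated choice above.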

FCC case study (continued)
This subsection revisits the FCC case study introduced in Section 3.2, by accounting for either additive or multiplicative uncertainty in the mp-DO problem (4). A sampling time of t_s = 1 is considered throughout, and we reiterate that no feedback action (K = 0) is exerted during the first sampling period [0, t_s].
Additive uncertainty case. The feed oil temperature T_f acts as an additive disturbance w(t) ∈ W := [−10, 10] in problem (4), with reference value ŵ(t) = 0 for all t. Precomputed interval and ellipsoidal tube enclosures K of the state perturbation dynamics (8) for this scenario are shown in Fig. 4. Despite the state coefficient matrix being Hurwitz, the open-loop tube enclosures in Fig. 4(a) and (b) are found to grow significantly over the prediction horizon due to the accumulated uncertainty. An optimized linear feedback gain K* is therefore introduced to counteract this growth.
The span of the resulting closed-loop reachability tube enclosure in Fig. 4(c) is significantly reduced. For comparison, we also show the tube enclosure corresponding to an intermediate feedback gain of 0.2 K* in Fig. 4(d).
Following the discussion in Section 4.3, a more negative gain can lead to smaller reachability tubes, and therefore smaller back-offs for the pure state path constraints in problem (4); but the control domain shrinks concurrently with a larger gain matrix, as indicated in (22). For instance, the control input u_2 reaches its lower or upper bound in several critical regions (CR_2, CR_3, CR_4, CR_7, CR_8, CR_9), and the linear gain K* reduces its domain to [−15 + δ_{u_2}(t), 15 − δ_{u_2}(t)].

The critical regions of the robust mp-DO problems are compared to those of the nominal mp-DO problem (F_w = 0) in Fig. 5, for two linear feedback gains. Both the robust and nominal mp-DO solutions consist of the same 11 critical regions; see Section 3.2 for details. The discrepancy between the robust and nominal critical regions is reduced upon increasing the feedback gain, here from 0.2 K* to K* in Fig. 5(a) and (b), respectively. The effect of the constraint back-offs is seen mainly in the lower-left and upper-right corners, where the boundaries of the regions CR_4, CR_5, CR_9 and CR_10, and to a lesser extent of the regions CR_3, CR_6, CR_8 and CR_11, are pushed towards the central unconstrained region CR_1 due to the state constraints being tightened. By contrast, the boundaries of the regions CR_2 and CR_7, where the upper or lower bound of the control u_2 is initially active, remain essentially unchanged despite a tighter control domain; this behavior is consistent with the fact that δ_{u_2}(t) = 0 for 0 ≤ t ≤ t_s due to the absence of feedback during the first sampling period.
Several robust optimal control and response trajectories, as obtained with different linear gains, are compared against the corresponding nominal optimal control in Fig. 6. The nominal control trajectory in Fig. 6(a) is for the initial condition (9 × 10⁻⁴, 14) ∈ CR_6 and consists of a boundary arc with x_1 = 10⁻³ between two interior arcs. Without the ancillary state feedback (K = 0), the robust control trajectory departs significantly from this nominal control. In fact, the initial condition (9 × 10⁻⁴, 14) belongs to the critical region CR_5 (instead of CR_6) for this robust mp-DO problem, since the final interior arc is no longer present. This behavior is caused by the rather large span of the reachability tube K in Fig. 4(b). Increasing the linear feedback gain to K* progressively reduces the conservativeness of the robust optimal control strategy. We also note that the robust optimal control trajectories can exhibit a discontinuity at t_s insofar as the constraint back-offs are themselves discontinuous at this point. Another comparison is presented in Fig. 6(b) for the initial condition (−1 × 10⁻³, 20) ∈ CR_2, where the input u_2 is initially saturated at its upper bound. Since the input saturation is shorter than the sampling period, the constraint back-offs do not have any effect on the (nominal or robust) optimal input and response trajectories in this scenario.
Finally, closed-loop control and response trajectories obtained from the application of the nominal and robust mp-NCO-tracking controllers are compared in Fig. 7. The state measurements are considered noise-free in this comparison, and we use the additive disturbance realization (24), expressed in terms of the unit-step (Heaviside) function H(t). The results in Fig. 7(a) confirm that the robust mp-NCO-tracking controller keeps the response feasible at all times, for all three linear feedback gains K = 0, K = 0.2 K* and K = K*; whereas the state constraints may become violated with the nominal mp-NCO-tracking controller under certain uncertainty scenarios. Despite large differences in the control trajectories obtained with different feedback gains (compare Fig. 6), the closed-loop responses of all three robust mp-NCO-tracking controllers are about the same in Fig. 7(a). This behavior illustrates the low sensitivity of the mp-NCO-tracking feedback controller with respect to the extra conservativeness introduced by the constraint back-offs. With the initial condition (−1 × 10⁻³, 20) ∈ CR_2 in Fig. 7(b), all the robust closed-loop optimal input and response trajectories are again identical to the nominal ones since the constraint back-offs do not modify the active constraints.
Mixed additive-multiplicative uncertainty case. We add multiplicative uncertainty on top of the additive uncertainty considered previously, corresponding to ±10% variations in the nominal entries of the state coefficient matrix, with the extra uncertainty bounded within the unit ball. Precomputed tube enclosures K of the state perturbation dynamics (16) for this scenario are shown in Fig. 8. Only ellipsoidal tubes are reported, since their interval counterparts end up being much more conservative due to the wrapping effect of interval arithmetic. Similar to the additive uncertainty case in Fig. 4, defining an ancillary feedback law with an optimized linear gain K* results in a much smaller reachability tube (Fig. 8(b)) than without this ancillary feedback (Fig. 8(a)). Furthermore, the robust mp-DO solutions with either K = 0 or K = K* consist of the same 11 critical regions as in Fig. 2(b).
Several robust optimal control and response trajectories are compared against the corresponding nominal optimal control in Fig. 9(a), for the initial condition (9 × 10⁻⁴, 14) ∈ CR_6. Predictably, the addition of multiplicative uncertainty on top of the additive disturbance gives larger constraint back-offs, and therefore leads to extra conservativeness compared with Fig. 6(a). The closed-loop control and response trajectories obtained from the application of the mp-NCO-tracking controllers in Fig. 9(b), under the same uncertainty scenario (24) as earlier, provide a good illustration of the robustness and effectiveness of the proposed methodology.

Robust mp-NCO-tracking control of a CSTR cascade
We consider the cascade of two non-isothermal CSTRs presented in Fig. 10 [36] as a final case study. Three irreversible exothermic reactions take place in parallel in each CSTR, where A is the main reagent, B is the desired product, and U and R are (undesired) by-products. The feed stream to CSTR1 is pure A at the flow rate F_1, molar concentration C^in_{A1}, and temperature T^in_1; whereas CSTR2 is fed with both the outlet stream of the first reactor and a second stream of pure A at the flow rate F_2, molar concentration C^in_{A2}, and temperature T^in_2. The liquid holdups V_1 and V_2 of both reactors are constant, and a jacket is used to remove heat from, or supply heat to, each reactor. A set of mass and energy balance equations (25) describes the concentration of A, C_{Aj}, and the temperature, T_j, within each CSTR j = 1, 2, where Q_j denotes the rate of heat input/removal for reactor j; ΔH_i, k⁰_i and E_i denote the enthalpies, pre-exponential constants and activation energies of the three reactions i = 1, 2, 3, respectively; and c_p and ρ denote the heat capacity and density of the liquid mixture. Numerical values for the constant parameters are reported in Table A.1 (Appendix A).
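To fix ideas, the structure of the balance equations (25) can be sketched as follows; all numerical parameter values are placeholders, not those of Table A.1, and the exact functional form should be taken from the original model in [36]:

```python
import numpy as np
from scipy.integrate import solve_ivp

R = 8.314  # universal gas constant, J/(mol K)

def cstr_cascade(t, y, u, p):
    """Mass and energy balances for the two-CSTR cascade: three parallel
    first-order Arrhenius reactions consume A in each reactor, and CSTR2 is
    fed by the outlet of CSTR1 plus a fresh stream of pure A."""
    CA1, T1, CA2, T2 = y
    F1, F2, Q1, Q2 = u
    def rates(CA, T):
        k = p['k0'] * np.exp(-p['E'] / (R * T))   # three rate constants
        r = k * CA                                # first-order in A
        return r.sum(), (-p['dH'] * r).sum()      # A consumption, heat release
    rA1, qr1 = rates(CA1, T1)
    rA2, qr2 = rates(CA2, T2)
    dCA1 = F1 / p['V1'] * (p['CAin1'] - CA1) - rA1
    dT1 = F1 / p['V1'] * (p['Tin1'] - T1) + (qr1 + Q1 / p['V1']) / (p['rho'] * p['cp'])
    dCA2 = (F1 * (CA1 - CA2) + F2 * (p['CAin2'] - CA2)) / p['V2'] - rA2
    dT2 = (F1 * (T1 - T2) + F2 * (p['Tin2'] - T2)) / p['V2'] + (qr2 + Q2 / p['V2']) / (p['rho'] * p['cp'])
    return [dCA1, dT1, dCA2, dT2]
```

The inputs u collect the two feed flow rates and the two heat duties; integrating this right-hand side with `solve_ivp` gives the nonlinear closed-loop simulations used later to generate the state measurements.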
The process model (25) features multiple steady states. The control objective herein is to stabilize the reactor cascade at an open-loop unstable steady state. Numerical expressions for the (nonzero) matrices and vectors participating in the quadratic objective and linearized dynamics of problem (1) are reported in Appendix A for completeness. Bound constraints are imposed on the state and input variables, along with bounded domains for the disturbances. The parameters θ correspond to the initial values x(0) = θ ∈ Θ, where the domain Θ is identical to the feasible state domain. Lastly, the prediction horizon in the control problem is set to T = 5, and we consider infrequent measurements at a sampling time of t_s = 1 in order to demonstrate the robustness of the proposed controllers.
The robust mp-NCO-tracking methodology starts with the determination of a suitable gain matrix. Since the steady state of interest is unstable, and so is the linearized dynamic system, the cross-section of any open-loop reachability tube enclosure (K = 0) grows exponentially over the time horizon. We instead use a linear feedback gain K*, as obtained from the numerical solution of the optimization problem (23). A comparison between the projections of ellipsoidal reachability tube enclosures computed without and with the feedback gain K* is presented in Fig. 11. While this comparison clearly shows the benefits of the ancillary state feedback (5) in terms of keeping the span of the reachability tube enclosure under control, it also illustrates the need for the linear feedback gain to compromise among the four state variables; for instance, the ancillary feedback is at first detrimental to the temperature T_1 enclosure soon after the sampling period compared with the open-loop tube, before showing a stabilizing effect towards the end of the prediction horizon.
As a next step, we construct an MLP that describes the switching structures of the robust mp-DO problem using Algorithm 1. Both the training and testing sets are initialized with M = 5000 points, generated using Sobol quasi-random sampling. A fully connected MLP, with 4 neurons in the input layer and 5 hidden layers comprising 20 neurons each, gives an accuracy of 99.76% on the testing set, which corresponds to a misclassification error rate below 0.5% with 95% confidence.
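The classifier construction can be mimicked with an off-the-shelf library. Below is a sketch using scikit-learn with the same architecture (4 inputs, 5 hidden layers of 20 neurons), but with uniformly sampled parameters and hypothetical region labels standing in for the Sobol samples and the actual switching structures of Algorithm 1:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
theta = rng.uniform(-1.0, 1.0, size=(10000, 4))   # parameter samples (4 inputs)
# hypothetical labels standing in for the switching-structure index of each sample
labels = (theta[:, 0] > 0).astype(int) + 2 * (theta[:, 1] > 0).astype(int)

# fully connected MLP: 5 hidden layers of 20 neurons each
clf = MLPClassifier(hidden_layer_sizes=(20,) * 5, activation='relu',
                    max_iter=2000, random_state=0)
clf.fit(theta[:5000], labels[:5000])               # M = 5000 training points
accuracy = clf.score(theta[5000:], labels[5000:])  # M = 5000 testing points
```

During on-line execution, `clf.predict` returns a candidate critical-region index, so that only the corresponding parametric control law needs to be evaluated and verified.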
The robust mp-DO problem (with feedback gain K*), and likewise its nominal counterpart (with F_w = 0), is found to comprise a total of 17 critical regions. These regions correspond to optimal controls having between 1 and 4 arcs, with the input u_4 (rate of heat supply/removal for CSTR2) and the state x_1 (concentration of A in CSTR1) reaching their bounds:
• CR_1 comprises the unconstrained optimal controls;
• CR_2 and CR_3 comprise controls with two arcs, a boundary arc where u_4 reaches its lower or upper bound, respectively, followed by an interior arc;
• CR_4 and CR_5 comprise controls with three arcs, a boundary arc where x_1 reaches its upper or lower bound, respectively, in between two interior arcs;
• CR_6, . . ., CR_9 combine the last two cases, and give rise to four arcs, starting with a boundary arc for u_4 (as in CR_2 and CR_3), followed by an interior arc, a boundary arc for x_1 (as in CR_4 and CR_5), and a final interior arc;
• CR_10, . . ., CR_13 also comprise controls with four arcs, differing from CR_6, . . ., CR_9 in that the upper or lower bounds for u_4 and x_1 are now both active on the second arc;
• CR_14, . . ., CR_17 again comprise controls with four arcs, differing from CR_6, . . ., CR_9 in that the upper or lower bound for u_4 is active on the first three arcs.
The left plots in Fig. 12 show the robust optimal control and response trajectories, corresponding to ŵ and x̄ in the mp-DO (10), for an initial condition in the critical region CR_8; the switching times between the 4 arcs are around t ≈ 0.29, t ≈ 0.54, and t ≈ 1.30. Also shown around these optimal trajectories are the back-off values ±δ_x(t) and ±δ_u(t) of the corresponding bound constraints, which provide an indication of the conservativeness of the robust solution. In particular, the feedback gain K* appears to reach a good compromise between the contraction of the state and control feasible domains, per the discussion in Section 4.3.
The closed-loop behavior resulting from the application of the robust mp-NCO-tracking controller is shown on the right plots in Fig. 12, from the same initial condition within the critical region CR_8. The sampling period is set to t_s = 1, and a nonzero realization of the disturbances is applied. Also note that the state measurements at sampling times are simulated here using the nonlinear model (25), instead of the linearized model. The robust mp-NCO-tracking controller is seen to keep the response feasible at all times, in particular the active state constraint x_1(t) ≥ −0.1, while a nominal mp-NCO-tracking controller would result in large constraint violations.

Conclusions and future research directions
This paper has introduced a methodology for the design of robust mp-NCO-tracking controllers for continuous-time linear dynamic systems subject to time-varying uncertainty. The extension of the mp-NCO-tracking approach by Sun et al. [47] entails backing off the path and terminal state constraints based on a worst-case uncertainty propagation, in the form of interval or ellipsoidal reachability tubes, in order to enforce feasibility between sampling times. Herein, the use of an ancillary state feedback as a means of reducing the conservativeness of the resulting controllers has been investigated, including the selection of suitable gain matrices in the linear feedback law that compromise between the reduction of the state and control feasible domains. An inherent advantage of this approach is that the complexity of solving the robust-counterpart mp-DO problem remains the same as for the nominal mp-DO problem; in particular, the off-line computational effort is independent of the number of disturbances. Another main contribution of the paper has been the use of data classifiers based on deep learning for approximating the critical regions in continuous-time mp-DO problems. Not only do these classifiers provide a practical way of designing the (robust) mp-NCO-tracking controllers, but they can also be used to efficiently search for a critical region during on-line execution. The overall methodology has been illustrated with the detailed case study of an FCC unit, and the final case study of a more challenging CSTR cascade with four controlled inputs and four additive disturbances. Future work will consider applications to higher-dimensional problems, where model reduction techniques can be used for reducing the order of the dynamic system subject to an acceptable performance loss [38]. Another research direction entails the extension of robust mp-NCO-tracking to problems having linear time-varying dynamics, and eventually nonlinear dynamics.
We shall also consider economic objectives as part of the optimization formulation in order to encompass dynamic real-time optimization problems in the spirit of the original NCO-tracking methodology [6].