Robust Data-Driven Predictive Control of Unknown Nonlinear Systems using Reachability Analysis

This work proposes a robust data-driven predictive control approach for unknown nonlinear systems in the presence of bounded process and measurement noise. Data-driven reachable sets are employed for the controller design instead of using an explicit nonlinear system model. Although the process and measurement noise are bounded, the statistical properties of the noise are not required to be known. By using the past noisy input-output data in the learning phase, we propose a novel method to over-approximate reachable sets of an unknown nonlinear system. Then, we propose a data-driven predictive control approach to compute safe and robust control policies from noisy online data. The constraints are guaranteed in the control phase with robust safety margins through the effective use of the predicted output reachable set obtained in the learning phase. Finally, a numerical example validates the efficacy of the proposed approach and demonstrates comparable performance with a model-based predictive control approach.


I. INTRODUCTION
Model Predictive Control (MPC) is one of the most powerful and well-established methods because of its ability to handle system constraints, nonlinear dynamics efficiently, and trajectory tracking [1]. The MPC scheme generally requires a sufficiently accurate model of the dynamical system to guarantee optimal control performance while satisfying system constraints. However, in practice, acquiring an accurate model, especially for complex systems, is often time-consuming and too costly [2], [3]. Thus, recent datadriven approaches for predictive control have attracted a considerable amount of interest in the control community [4].
A remarkable contribution to data-driven predictive control for linear time-invariant (LTI) systems can be found in [5], where the authors addressed an optimal trajectory tracking problem for unknown LTI systems by the dataenabled predictive control (DeePC) algorithm. DeePC relies on behavioral systems theory and Willems' Fundamental Lemma (WFL) [6] to learn a non-parametric system model from noise-free data. Although DeePC is a seminal approach, it was initially developed only for noise-free systems, and later it was extended to stochastic LTI systems in [7]. However, when the statistics of the uncertainties and noise in the LTI systems are not known, but it is assumed that they are bounded, the data-driven set-based approach presented in [8] is more appropriate for guaranteeing robust constraint satisfaction. This approach relies on using datadriven reachable set prediction within a predictive control scheme to compensate for the complexity of capturing an accurate model. Unlike LTI systems, data-driven control for nonlinear systems is quite challenging, and there are many open challenges in this research area.
Recent works on data-driven predictive control for uncertain nonlinear systems include [9], which provides a chanceconstrained MPC approach to control a nonlinear system with a known nominal part and unknown additive dynamics modeled as a Gaussian process. However, the nonlinear system is not completely unknown in [9]; in addition, the proposed approach is not robust and allows a certain amount of constraint violations. Extensions of the WFL to nonlinear systems have also been proposed to make it analogous to data-driven control approaches in LTI systems. In this regard, [10] proposes a data-driven predictive control approach for unknown nonlinear, control-affine systems using available noise-free data. This approach relies on local linearization of the underlying system using noise-free data in order to apply WFL. In [11], WFL is used to reproduce kernel Hilbert space and extend its applicability to a class of nonlinear systems. However, it is important to emphasize that all the works cited above either assume noise-free data or do not provide robust safety guarantees under system constraints.
This paper presents a robust data-driven predictive control approach to control unknown nonlinear systems under bounded process and measurement noise. The proposed approach uses noisy data to derive an implicit data-driven model of the system in real-time and obtains an optimal control input guaranteeing robust constraint satisfaction. We employ a zonotopic set representation in this approach to present an implicit data-driven system representation.
To this end, two online phases are considered: the learning and control phases. The goal of the learning phase is to compute an implicit data-driven representation of the unknown nonlinear system using zonotopes at each time step. In this phase, using Taylor series expansion, the unknown nonlinear system is approximated by a data-driven linear model using the past input-output data of the finite horizon. Then, the model mismatch and the Lagrange remainder term of the Taylor series are bounded by a zonotope using the data. In the control phase, we propose a robust, data-driven approach called nonlinear zonotopic predictive control (NZPC). NZPC utilizes the learning phase and zonotope recursion at each time step to predict the reachable output set over a finite horizon. The optimal control problem solved by NZPC in this phase yields an optimal control input that minimizes the given cost function and satisfies the specified constraints. In this method, the input-output data set is updated over time as the close-loop system evolves, i.e., old data is discarded each time new data is collected and added to the data set to enhance the implicit system representation and, accordingly, to improve the controller performance. The code to recreate our findings is publicly available in a GitHub repository 1 .
The remainder of this paper is organized as follows: Section II provides preliminaries and formally defines the problem. In Section III, we present our robust data-driven predictive control approach for unknown nonlinear systems. An illustrative example demonstrates the proposed approach in Section IV, and its performance is compared to a robust model predictive control approach in the same section. We conclude the paper with Section V and indicate future prospects of this work.

II. PRELIMINARIES AND PROBLEM STATEMENT A. Notations
The set of natural numbers, non-negative integers, and real n-dimensional space are denoted by N, Z ≥0 , and R n , respectively. Moreover, x ∈ R n is an n×1 vector, and x ∞ and x 2 respectively represent the infinity-norm and the 2norm of x, and x 2 P = x P x. Furthermore, R n×m denotes the real n × m matrix space. We write 1 n , and 0 n for ndimensional column vectors with all entries equal to 1 and 0, respectively. For simplicity, where no confusion arises, we use 0 and 1 to denote matrices with the proper dimensions. For a given vector x, x (i) denotes its i-th element. For a given matrix A, the column j of A is denoted by (A) .,j , and the element at row i and column j of A is denoted by (A) i,j . Finally, diag(.) is the diagonal operator making a diagonal matrix by its arguments. For X ∈ R n×m , X and X † indicate the transpose and Moore-Penrose pseudoinverse of a matrix X, respectively. If matrix X is full-row rank, then X † = X (XX ) −1 is the right inverse, i.e., XX † = I n , where I n ∈ R n×n is the identity matrix. The Kronecker product is denoted by ⊗.
A stacked window of sequence {x(k)} t k=t−T is For notational simplicity, we write X + = X [t−T +1,t] and X − = X [t−T,t−1] to denote the shifted stacked windows. We use the subscript t + k|t to highlight predictive quantities, e.g., x t+k|t is the (t + k)-step-ahead prediction of the vector 1 https://github.com/aalanwar/Data-Driven-Predictive-Control x initialized at time t as x t|t = x(t).

B. Set Representations
A zonotope is an affine transformation of a unit hypercube, [12], and it is defined as follows.
Definition 1 (Zonotope): Given a center c Z ∈ R n and a number γ Z ∈ N of generator vectors g i Z ∈ R n , a zonotope is defined as We use the short notation Z = c Z , G Z to denote a zonotope, where G Z = [g 1 Z . . . g γ Z Z ] ∈ R n×γ Z . A zonotope Z could be over-approximated by a multidimensional interval by [13,Proposition 2.2]: The conversion of an interval I = [I, I] to a zonotope is denoted by Z = zonotope(I, I).
Given L ∈ R m×n , the linear transformation of a zonotope Z is a zonotope LZ = Lc Z , LG Z . Given two zonotopes Z 1 = c Z1 , G Z1 and Z 2 = c Z2 , G Z2 , the Minkowski sum is obtained as For simplicity, we use the notation + instead of ⊕ for Minkowski sum and write Z 1 − Z 2 to denote Z 1 + (−Z 2 ). The Cartesian product of two zonotopes Z 1 and Z 2 is obtained as A matrix zonotope is defined as follows [13, p. 52]. Definition 2 (Matrix Zonotope): Given a center matrix C M ∈ R n×T and a number γ M ∈ N of generator matrices G 1 M , . . . , G γ M M ∈ R n×T , a matrix zonotope is defined as We use the short notation Consider a discrete-time nonlinear control system where f : R nx × R nu − → R nx is an unknown nonlinear function, x(k) ∈ R nx and y(k) ∈ R ny are respectively the state and the output of the system with n y ≤ n x at time k ∈ Z ≥0 , u(k) ∈ Z u = c Zu , G Zu ⊂ R nu is the control input, X 0 ⊂ R nx is the set of initial states, H ∈ R ny×nx is the system output matrix which is assumed to be known and full row-rank, i.e., rank(H) = n y , w(k) denotes the process noise bounded by a zonotope w(k) ∈ Z w = c Zw , G Zw , v(k) is the measurement noise bounded by a zonotope v(k) ∈ Z v = c Zv , G Zv for all time steps. At time k, input and output constraints for a controller design are given by Notice that the constraints are time-varying to account for operational limits under different circumstances.
In this paper, we aim to solve a receding horizon optimal control problem by employing only the input-output data of the system (1) without explicitly knowing the nonlinear function f (x(k), u(k)). To be precise, we assume that at each time step t ∈ Z ≥0 , we have access to the past K input-output trajectories of different lengths T i + 1, i = 1, . . . , K denoted by {u(k)} t k=t−Ti and {y(k)} t k=t−Ti . We consider a single trajectory and collect all the input and 'noisy' output data in the following matrices If a data point index t is negative, it refers to the data obtained from offline experiments, i.e., before considering NZPC in the loop. We indicate the set of all available data Given the set of possible inputs u(k) ∈ Z u , process noise w(k) ∈ Z w , for k = 0, ..., N , and initial state set X 0 ∈ R nx , the state reachable set R x N is defined as the set of all possible states in the state space that a system can reach after N time steps: In this paper, however, we are particularly interested in the output reachable set, which is defined below. Definition 3 (Output reachable Set): The output reachable set R y N is defined as the set of all possible outputs that a system can reach after N time steps: (3) and Z v is the measurement noise zonotope. We denote by {w(k)} t k=t−T and {v(k)} t k=t−T the sequences of the unknown actual process and measurement noise corresponding to the available input-output trajectories. From the bounded noise assumptions, it follows directly that the corresponding stacked matrices are matrix zonotopes resulting from the concatenation of multiple noise zonotopes Z w and Z v as described in [14]. We further denote the actual, but unknown, state trajectory by X [t−T,t] .

D. Main Assumptions
In this subsection, we present the required assumptions for the data-driven reachability analysis and the NZPC approach.
Assumption 1: For any k ∈ Z ≥0 , it is assumed that x(k) ∞ ≤ η, where η ≥ 0 is known. According to Assumption 1, the zonotope Z η = 0 nx , ηI nx contains x(k) at all times. We define M η = C Mη , G Mη to denote a matrix zonotope resulting from the concatenation of multiple zonotopes Z η . We also define For simplicity, we omit the time index k when no confusion may arise, and in order to introduce a concise notation, we define a new extended state vector Assumption 2: The unknown nonlinear function f is twice differentiable in x and u. Assumption 2 implies that f is locally Lipschitz continuous in x and u at each dimension, i.e., there exists a constant L (i) f > 0, with i = 1, . . . , n x for each dimension, such that for any ξ 1 , ξ 2 ∈ F, it holds that Remark 1: Considering a separate Lipschitz constant for each element of the function f decreases the conservatism, especially when the data has a different scale for each dimension.
III. ROBUST DATA-DRIVEN PREDICTIVE CONTROL In this section, we propose a data-driven nonlinear zonotopic predictive control approach for the nonlinear system (1). The proposed algorithm, abbreviated as NZPC, consists of two online phases: the learning phase and the control phase, which are described in detail in the following subsections.

A. Learning Phase
In this phase, we propose a data-driven algorithm to overapproximate the reachable sets of a nonlinear control system given input and noisy output data D [t−T,t] . This algorithm is a key technical result for predicting all possible reachable output trajectories within the prediction horizon in the NZPC approach. Using (1a), we rewrite the output equation (1b) as Our NZPC approach does not involve state measurement x(k), and, therefore, we need to construct a zonotope that contains all possible x(k) consistent with the output measurement y(k). Lemma 1 ( [15]): Let Assumption 1 holds. Then, given the output measurement y(k) of the system (1), it holds that Let ξ = x u . Then, Taylor series of f around ξ can be represented (see [16]) by where L H is the Lagrange remainder given by To present a standard notation of the linearized system, we can separate vector ξ into x and u in (5) and rewrite it as follows: If a model of a nonlinear system is available, the Lagrange remainder L H could be over-approximated as described in [13,Section 3.4.3]. However, in this paper, the model is unknown. Therefore, we over-approximate L H from the available input-output data using Algorithm 1, which is described below.
Algorithm 1 computes over-approximated data-driven reachable sets of the nonlinear control system (1). At time t, the over-approximated data-driven reachable set and the exact model-based reachable set within k steps are denoted byR y t+k|t , and R y t+k|t , respectively. The initial output set is denoted by R y t|t . Then, the procedure of Algorithm 1 is as follows: Set k = 0. First, we obtain an approximate linearized modelM H given the data D [t−T,t] in line 1 of Algorithm 1; [14], [17]. Then, for the chosen linear model, we derive a zonotope Z L that over-approximates the modeling mismatch ∆M H together with the Lagrange remainder L H following lines 2-4. By computing a zonotope Z , we over-approximate the modeling mismatch along with the Lagrange remainder for all data points in F in line 5. Eventually, we compute the over-approximated output reachable setsR y t+k+1|t for k = 1, . . . , N − 1 according to lines 6-9.
Proof: We can rewrite (6) as follows: whereM H is an approximation of M H and ∆M H = M H −M H is the model mismatch. We aim to obtainM H , which is an approximate linearization of f H (x(k), u(k)), and over approximate the model mismatch ∆M H along with the Lagrange remainder L H . Let Assumption 2 hold. Then, by neglecting the Lagrange remainder and substituting (6) in (4), the following holds for the collected data and its corresponding unknown actual state and noise realizations: where, by Lemma 1, we set x = H † (y −v ) and v = c Zv with y a known linearization point. Now, let Assumption 1 hold. Then, again by Lemma 1, it holds that To get an approximated linear modelM H , we set X − = H † (Y − − C Mv ), which is the center of matrix zonotope M x|y . Also, in (8), we choose V + = C Mv and W − = C Mw . By substituting these values in (8), we obtain because v = c Zv and C Mv = 1 T ⊗ c Zv . Thus, an approximated linear modelM H can be derived according to line 1 of Algorithm 1 by using the least-squares solution of (8).
The second step of the proof involves over-approximating the model mismatch ∆M H and the Lagrange remainder L H for the data. For this, we define Ξ = (X − ) .,1 . . . (X − ) .,T (U − ) .,1 . . . (U − ) .,T with (Ξ) .,j , for j = 1, . . . , T , denotes the j-the column. By substituting (7) in (4), the following relation holds for any point (Ξ) .,j and its corresponding noise realization Rearranging equation (9) and considering the zonotopes bounding the noise realizations Z w , Z v , and the zonotope bounding the state Z x|y yields to bounding the model mismatch and Lagrange remainder for a single point (Ξ) .,j as follows: (10) Now, we can over-approximate the right-hand side of (10) for all (Ξ) .,j , j = 1, . . . , T corresponding to the data D [t−T,t] together with their state trajectory and noise realizations by Z L according to lines 2-4 of Algorithm 1. In particular, we have proved that for all (Ξ) .,j , j = 1, . . . , T , corresponding to the data D [t−T,t] the following holds Next, the model mismatch and the Lagrange remainder should be over-approximated for all ξ ∈ F. To this end, we assume Z η and Z u to be compact. Due to this assumption, all points (Ξ) .,j , j = 1, . . . , T corresponding to D [t−T,t] are dense in F, i.e., there exists some δ ≥ 0 such that, for any ξ ∈ F, there is a (Ξ) .,j corresponding to D [t−T,t] such that ξ − (Ξ) .,j 2 ≤ δ [18]. Given Assumption 2 and a known δ, we know that for every ξ ∈ F, there exists a (Ξ) .,j corresponding to D [t−T,t] such that with representing a box with length H 2 L (i) f δ in each dimension i. Finally, according to (4) and (11), the model-based reachable set can be over-approximated using line 8 of Algorithm 1, i.e.,R y t+k+1|t ⊇ R y t+k+1|t , which completes the proof. Note that as T → ∞, i.e., the δ approaches zero, we observe that Z − → 0, and, for every ξ ∈ F, Z L accounts for both the model mismatch and the Lagrange remainder. For computing the Lipschitz constant and the δ, [18] and [19] have proposed methods that work well in practice.
Remark 2: In model-based reachability analysis of nonlinear systems, [20] proves that the norm of the Lagrange remainder is minimized by choosing the center of the reachable set as the linearization point. Therefore, based on this result, we simply choose the centers of R y t|t , Z v , and Z u , as the linearization points y , v , and u , respectively.

B. Control Phase
In this subsection, we provide an extension of the results on zonotopic data-driven predictive control presented in [8] to unknown nonlinear systems. At each time step t, we use past input-output measurements to predict future reachable regions over the time horizon N ∈ N using the learning phase described in the previous Subsection III-A. Then, we compute the control input trajectories such that the predicted outputs stay within the computed reachable regions and the cost is minimized over the horizon N . At time t ∈ Z ≥0 , given the past T + 1 input-output data D [t−T,t] of the nonlinear system (1) with constraints (2), and bounds on the process and measurement noise, we define the following data-driven optimal control problem min u,y where the initial point y t|t = y(t). Here, u = (u t|t , . . . , u t+N −1|t ) and y = (y t+1|t , . . . , y t+N |t ) are input and output trajectories predicted at time t over the finite horizon N , respectively, and y(t) is the measured output at the current time t. The cost (12a) containing a weighted tracking terms with respect to the desired references y r = (y r (t + 1), . . . , y r (t + N )) and u r = (u r (t), . . . , u r (t + N − 1)) is minimized online. The control and output tracking weights are represented by positive definite matrices Q ∈ R ny×ny and R ∈ R nu×nu , respectively. The constraint (12b) is derived from Algorithm 1 in which Z u is replaced with u t+k|t . Notice that the Cartesian product of a zonotope and a vector in equation (12b) is given bŷ R x t+k|t × u t+k|t =R x t+k|t × u t+k|t , 0 nu . The constraint (12c) ensures that the output constraints are satisfied by predicted reachable sets (12b). This ultimately limits the choice of u t+k|t in (12b), and implicitly tightening the input constraints. The constraint (12d) guarantees that y t+k+1|t is contained within the allowable reachable region. We denote the optimal solution of (12) at time t by u * and y * , and apply the first optimal control input u * t|t to the system. As in standard MPC, (12) is solved in a receding horizon fashion summarized in Algorithm 2.

Algorithm 2 Nonlinear Zonotopic Data-Driven Predictive Control
Input: A time horizon N , weighting matrices Q, R, constraints U k , Y k , desired references y r , u r , control input zonotope Z u , and the initial input-output data D [t−T,t] . Online Phase: 1: while t ∈ Z ≥0 do 2: Learning Phase: Compute the over-approximated reachable setsR y t+k+1|t at time t for k = 0, . . . , N − 1 using Algorithm 1. 3: Control Phase: Solve (12) and apply the first input u * t|t to the system. 4: Increase the time step t = t + 1. 5: Collect the new data and update input-output data D [t−T,t] . 6: Update the initial output set R y t|t = y(t), 0 ny . 7: end while Then, one should verify the following constraints: In the remainder of this section, we prove the robust satisfaction of the specified control problem subject to the feasibility of (12). Notice here that, by assuming feasibility, we mean, it is assumed that an admissible solution to (12) exists that satisfies the constraints. Then, the following result guarantees that such a solution can be found by Algorithm 2.
Theorem 2: Consider a discrete-time nonlinear system (1) together with input and output constraints (2). Suppose Assumptions 1 and 2 hold, and process and measurement noise are bounded by Z w and Z v , respectively. If problem (12) is feasible at each time step, then the data-driven controller obtained from (12) guarantees the robust satisfaction of the closed-loop constraints, i.e., y(k) ∈ Y k and u * (k) ∈ U k , for all time k ∈ Z ≥0 and all possible realizations of bounded process and measurement noise.
Proof: Based on Theorem 1, the data-driven reachable sets computed in (12b), over-approximates the exact modelbased reachable sets. According to the constraint (12c), the optimal control input sequence is chosen such that the output is within the intersection of the over-approximated output reachable sets and the output constraints 12c in the presence of bounded noise, along with satisfying input constraints (12e). This, therefore, guarantees the robust constraint satisfaction of Y k at each time step subject to the feasibility of (12).
This theorem enables an optimal predictive control directly from the available noisy data without a prerequisite offline system identification step. The designed controller provides robust constraint guarantees against all possible bounded noise realizations. While we focused on robust constraint satisfaction, we did not tackle the issue of how to guarantee recursive feasibility and closed-loop stability in this paper. Addressing this issue is a relevant future research direction.
Furthermore, without the model knowledge and direct state measurements, verifying controllability and observability of the system are impossible. However, similar to [10], we can implicitly assume that the nonlinear system (1) is controllable and observable so that the data-driven predictive control approach proposed in this paper is feasible and well-posed. This amounts to assuming that the system is equipped with sufficient amount of sensing and actuating equipment so that the output of the nonlinear system can be steered easily and the input-output data is informative of the system's state. Formally exploring notions of controllability and observability in a data-driven setting is not considered in the current paper and is deferred for the future work.

IV. NUMERICAL EXAMPLE
In this section, we verify our proposed NZPC algorithm through an illustrative example. First, we demonstrate the effectiveness of the proposed data-driven reachability analysis by considering Algorithm 1 as a self-contained algorithm. We consider the discrete-time nonlinear system given in [21,Section 8.3.5], whose model is given by: where τ = 0.015, α = 7.2×10 10 , β = −8750, and ρ = 1.5× 10 13 . Note that the function f (x, u) is the ground truth which is not known by the data-driven reachability analysis and the NZPC algorithm. Also, note that this illustrative example differs from the original example given in [21,Section 8.3.5], in the sense that the process noise and the control input are applied differently, and we consider the system output matrix H = 1 0.001 −0.01 1 . Furthermore, we assume η = 22. The initial available input-output data contain 50 trajectories of a length of 10 (i.e., T = 500).
Next, we apply the proposed NZPC approach (Algorithm 2) to the same system and demonstrate its applicability. We set the following parameters: y r = 0 ny , u r = 0 0.007 , Q = 5I ny , R = 0.02I nu , X 0 = −2 −20.5 , diag(0.01, 1) , and N = 3. The constraints are U = u r , diag(5, 3) , Y l = −3 −22 , Y u = 0.25 2.7 , and the Z = 0 2 , diag(0.08, 0.71) . The reachable setsR y t+k+1|t , system trajectory y(k), and the predicted output y pred (k) of the closed-loop system are plotted in Figure 2a over 150 time steps. This Figure  illustrates that the output trajectory of the closed-loop system and the predicted outputs by NZPC are inside the reachable sets. For comparison, we apply NZPC assuming that the accurate model of the system is available, and we denote this approach by the RMPC-zono approach. The input-output trajectories of one dimension of the closed-loop system under NZPC and RMPC-zono approaches are depicted in Figures  2b and 2c. These results show that the closed-loop outputs' convergence towards the output reference y r in NZPC is slower compared to RMPC-zono, which can be due to the less accurate prediction. However, the input and output constraints are satisfied using both control approaches.
We presented the effectiveness of the NZPC approach through a simple example for illustration purpose. However, this approach can be applied to higher-dimensional systems and also when the number of outputs is less than the number of states. In particular, in our future work, we will exploit the presented results to design a data-driven predictive scheme for complex systems like smart buildings to optimize energy use for space heating while maintaining thermal comfort levels of the occupants.
V. CONCLUSION In this paper, we presented a zonotopic data-driven robust predictive control approach to control unknown nonlinear systems based on input-output data, where the output data might be corrupted by sensor noise. We proposed an algorithm that uses available data to over-approximate reachable sets over a finite horizon through a learning phase. Then, in a control phase, we utilized the over-approximated reachable sets as an implicit data-driven system representation in the receding horizon optimal control problem. The proposed control algorithm updates the input-output data as the closed-loop system evolves. This algorithm computes optimal control inputs to drive the system along the desired reference while providing robust system constraint satisfaction.
As compared to the existing literature, the distinctive features of our method are twofold. First, we provide robust safety guarantees in the presence of bounded process and measurement noise. Second, these guarantees do not require explicit knowledge of the nonlinear system model. There are several open problems that will be addressed in future work. For instance, providing necessary and/or sufficient conditions for the recursive feasibility of the NZPC algorithm is an ongoing work that will be presented in another venue. Also, extending the proposed approach to nonlinear systems whose output map is not known is an interesting problem that will be pursued. The current paper will serve as the foundation for the aforementioned prospects.