Data‐enabled predictive control for quadcopters

Abstract We study the application of a data‐enabled predictive control (DeePC) algorithm for position control of real‐world nano‐quadcopters. The DeePC algorithm is a finite‐horizon, optimal control method that uses input/output measurements from the system to predict future trajectories without the need for system identification or state estimation. The algorithm predicts future trajectories of the quadcopter by linearly combining previously measured trajectories (motion primitives). We illustrate the necessity of a regularized variant of the DeePC algorithm to handle the nonlinear nature of the real‐world quadcopter dynamics with noisy measurements. Simulation‐based analysis is used to gain insights into the effects of regularization, and experimental results validate that these insights carry over to the real‐world quadcopter. Moreover, we demonstrate the reliability of the DeePC algorithm by collecting a new set of input/output measurements for every real‐world experiment performed. The performance of the DeePC algorithm is compared to Model Predictive Control based on a first‐principles model of the quadcopter. The results are demonstrated with a video of successful trajectory tracking of the real‐world quadcopter.


INTRODUCTION
The analysis and design of control systems is traditionally addressed using a model-based control approach where a model for the system is first identified from data, and the control policy is then designed based on the identified model. The system identification step is often the most time-consuming and challenging part of model-based control approaches. 1,2 System identification often requires expert knowledge and partial system models, 3 and unless the control objective is taken into account during the identification process, the obtained model may not be useful for control. 4 These observations as well as the advancements in sensing and computation technologies have motivated a tendency toward data-driven control methods yielding many successes. [5][6][7][8] Such methods bypass the traditional model-based control approach, and design control inputs directly from data. These so-called direct data-driven methods for control design benefit from ease of implementation on complex systems where system identification is too time-consuming and cumbersome. Among these data-driven methods are learning-based and adaptive Model Predictive Control (MPC) approaches, where the unknown system dynamics are substituted with a learned model which maps inputs to output predictions. [9][10][11][12][13] However, such methods still require learning an input/output model and often involve (stochastic) function approximation by means of neural networks or Gaussian processes, which come with their own tuning challenges and can be inconsistent across applications. 14 One algorithm that does not require any function learning or system identification is the so-called data-enabled predictive control (DeePC) algorithm. 15 Instead, this algorithm directly uses previously measured input/output data to predict future trajectories. The previously measured input/output data from the system act as motion primitives that serve as a basis for the subspace of possible system trajectories. 
The DeePC algorithm builds on the seminal work on linear time invariant (LTI) systems by Willems et al., specifically what is known as the fundamental lemma in behavioral systems theory. 16 This result was used by Markovsky et al. for the first time for control purposes allowing for the synthesis of data-driven open loop control for LTI systems. 17 The DeePC algorithm extended this method to closed-loop control and was implemented in a receding horizon optimal control setup. This algorithm was shown to be equivalent to MPC for deterministic LTI systems, 15 and was later extended giving guarantees on recursive feasibility and closed-loop stability. 18 Additionally, numerical case studies have illustrated that the algorithm performs robustly on some stochastic and nonlinear systems and often outperforms system identification followed by conventional MPC. [19][20][21] Several other data-driven control methods have been proposed that make use of input/output data in similar ways as DeePC. One method uses the fundamental lemma to synthesize stabilizing output feedback controllers solving the linear quadratic regulation problem using only input/output data. 22 Other methods use previously measured input/output trajectories as motion primitives to compute minimum energy inputs, 23 or produce new control inputs for LTI systems. 24 All of these methods, including the DeePC algorithm, rely on the linearity property. For nonlinear systems, data-driven control methods that make use of motion primitives to synthesize new trajectories have been proposed. 25,26 Common to these methods is a nonlinear data-fitting step in the generation of the motion primitives. One approach uses sparse identification to fit the raw data to a predefined library of nonlinear primitives. 25 An approach tailored to robotics applications learns motion primitives from demonstration trajectories by estimating the parameters of nonlinear Gaussian basis functions. 
By contrast, DeePC directly uses raw data sequences as motion primitives, with no data-fitting step. It relies on a robustifying regularization, incorporated directly in the optimal control objective, to address nonlinearity and stochasticity. 15,19,27 The focus of this paper is on implementing this robustified, regularized variant of the DeePC algorithm for the first time on a real-world system. In particular, we seek to analyze how the algorithm can be applied for real-time control of a quadcopter whose dynamics are nonlinear and whose measurements are corrupted by noise. The quadcopter is a common benchmark system for verifying data-driven control methods. 7,8,[28][29][30] It makes for an interesting benchmark because it is nonlinear, open-loop unstable, and has fast dynamics. Our main aim is to gain valuable insights on this benchmark which will assist in the implementation of our novel data-driven control method across multiple systems.
Contributions: The DeePC algorithm is implemented for the first time on a real-world system bridging the gap from theory to application. Through this, we gain key insights into choices of the algorithm's hyperparameters, providing tuning guidelines. We demonstrate that the DeePC algorithm is computationally tractable and suitable for real-time control. A video of the DeePC algorithm performing figure 8 trajectory tracking on the real-world quadcopter is provided here: https://doi.org/10.3929/ethz-b-000493419. Outline: The real-world quadcopter system, problem statement, and DeePC algorithm are introduced in Section 2. The main contributions appear in Section 3, where we present simulation analysis and experimental results, as well as a video of successful trajectory tracking of the quadcopter. We conclude in Section 4 stating some future directions of research.

SETTING
We first present the quadcopter system in Section 2.1, providing details about its input/output channels, and the first-principles modeling that is used for simulation-based analysis. We then formally state in Section 2.2 the quadcopter control goal as a general finite-horizon, discrete-time, optimal control problem. Section 2.3 recalls the DeePC algorithm, showing how it can be used to address both LTI and nonlinear stochastic control problems in a data-driven way.

Quadcopter
For the purpose of simulation, we use a nonlinear, continuous-time quadcopter model. Full details of the model derivation are provided in other works. 31,32 Here we highlight the key definitions, equations, and control architecture. The model presented is also the starting point for the model-based control methods that are used for comparison in Section 3.4. We define the model in terms of an inertial frame of reference, denoted (I), and a body frame of reference attached to the quadcopter, denoted (B), with the origin of frame (B) fixed at the quadcopter's center-of-gravity. The position of the body frame with respect to the inertial frame is denoted by p⃗ = (p_x, p_y, p_z). We use Euler angles to describe the orientation of the body frame relative to the inertial frame, and following the ZYX intrinsic Euler angle convention, we denote the roll, pitch, and yaw angles by (φ, θ, ψ), respectively. The angular rates about the body frame axes are denoted by ω⃗ = (ω_x, ω_y, ω_z). Thus, the model has 12 states, (p⃗, ṗ⃗, (φ, θ, ψ), ω⃗), and the inputs to the model are the thrust forces from the propellers, denoted f_i, i = 1, …, 4. The parameters required for the quadcopter model are the mass m, the mass moment of inertia J, the body frame coordinates for the center-of-thrust of each propeller (d_{x,i}, d_{y,i}), and the constant of proportionality d_i that approximates a linear relation between the torque due to propeller drag and the thrust force f_i. Figure 1 visualizes this definition of the quadcopter. The nonlinear, continuous-time equations of motion (1) are readily derived, where a_g is the acceleration due to gravity. An important feature of these equations is that the equilibrium inputs are the same at all positions p⃗ and at all yaw angles ψ. Most off-the-shelf quadcopters are equipped with an on-board controller that allows the user to specify references instead of directly specifying the thrust force for each propeller; we refer to this as the inner controller.
Often the manufacturer does not provide details of the inner controller and does not allow the user to bypass it. We consider a quadcopter with an inner controller that uses the data from the onboard inertial measurement unit (IMU) to track user-provided references for the angular rates about the x_(B) and y_(B) axes of the body frame and maintains a constant yaw angle. We leave the inner controller as implemented by the manufacturer, and consider the following three inputs to the system:
• the body rate references about the x_(B) and y_(B) axes, denoted by ω_ref,x and ω_ref,y respectively, and
• the total thrust force from the propellers combined, denoted by f_tot.
Figure 2: Block diagram of the cascaded control architecture used for the simulations and experiments. In an off-the-shelf quadcopter system, the inner controller is typically already implemented. Here we focus on the synthesis of the outer controller.
The outer controller adjusts these three inputs to ensure that the quadcopter tracks a position reference provided by the user, based on feedback of position and orientation measurements, p⃗, φ, and θ, provided by an external motion capture system. 33,34 Our aim is to design a data-driven outer controller for this 3-input, 5-output off-the-shelf quadcopter system (see Figure 2 for a schematic of the architecture). Previous work has demonstrated that it is possible to design in simulation a data-driven controller from position and orientation measurements to propeller thrusts directly. 15 As the goal of this work is to control a real-world quadcopter, we consider that the inner controller on the off-the-shelf quadcopter constitutes a part of the black-box system under investigation.

Problem statement
Let us consider a discretized version of the quadcopter dynamics (1), which we denote by

x(t + 1) = f(x(t), u(t)), y(t) = h(x(t), u(t)), (2)

where x(t) ∈ R^n, u(t) ∈ R^m, and y(t) ∈ R^p are, respectively, the state, control input, and output at time t ∈ Z_≥0. Note that even though the continuous-time dynamics (1) are known, an analytic expression does not exist for the nonlinear discretized dynamics described by the mappings f : R^n × R^m → R^n and h : R^n × R^m → R^p in (2). We purposefully abstract notation above to highlight the fact that the problem statement is not unique to a quadcopter, but can be applied to many systems with nonlinear dynamics whose linearization about the operating point is a controllable and observable LTI system (see Section 2.3). For the quadcopter, we have u(t) = (f_tot, ω_ref,x, ω_ref,y) ∈ R^3 and y(t) = (p_x, p_y, p_z, φ, θ) ∈ R^5.
The state x(t) includes the quadcopter position, velocity, Euler angles, angular velocities, motor currents, and the states of the inner controller. From these quantities, u(t) and y(t) are what we have available for controller synthesis, while the state x(t) is regarded as unknown.
The problem of constrained finite-horizon optimal control is considered. Given the current time t ∈ Z_≥0, a time horizon T_f ∈ Z_≥0, and input and output constraint sets 𝒰 ⊆ R^m and 𝒴 ⊆ R^p, the goal is to design a sequence of admissible control inputs {u(t + i)}_{i=0}^{T_f − 1} ⊂ 𝒰 such that, when applied to system (2), the resulting outputs {y(t + i)}_{i=0}^{T_f − 1} ⊂ R^p lie in the constraint set 𝒴 and minimize the stage costs given by the cost function c : R^p × R^m → R_≥0. More formally, we wish to solve the following optimization problem:

minimize_{u, x, y}  Σ_{i=0}^{T_f − 1} c(y(t + i), u(t + i))
subject to  x(t + i + 1) = f(x(t + i), u(t + i)),
            y(t + i) = h(x(t + i), u(t + i)),
            u(t + i) ∈ 𝒰, y(t + i) ∈ 𝒴, i = 0, …, T_f − 1,
            x(t) = x̂(t), (3)

where x̂(t) is an estimate of the state at time t, typically computed by filtering the sequence of past inputs and outputs. Problem (3) is solved in a receding horizon fashion and is widely known as output-feedback MPC. The cost function c can be designed by the user to attain various control objectives (e.g., regulation or trajectory tracking). Without knowledge of system (2), solving problem (3) is no longer possible as we are unable to predict forward trajectories of the system, or estimate the current state x(t). To resolve these issues, we approach the problem in a data-driven manner. In particular, we use the DeePC algorithm, 15 which replaces the constraints requiring system knowledge by raw input/output data to solve an optimization problem similar to (3), and, under assumptions to be recalled next, directly equivalent to (3).

2.3
Data-enabled predictive control

DeePC for deterministic LTI systems
The DeePC algorithm has been shown to be an equivalent data-driven method for solving (3) when the unknown system (2) is a deterministic LTI minimal realization, that is, when the dynamics in (2) are of the form

x(t + 1) = A x(t) + B u(t), y(t) = C x(t) + D u(t), (4)

where A, B, C, D are matrices of appropriate dimensions. Note that (4) being a minimal realization implies controllability and observability properties of the system. Several modifications have also been proposed for robustifying the algorithm against stochastic disturbances. 19,27 We first introduce the necessary preliminaries, then recall the DeePC algorithm as applied to LTI systems of the form (4), followed by the robustifying regularizations that allow the algorithm's adaptation to the nonlinear quadcopter system (2) with noisy measurements. Let the Hankel operator which maps a sequence of signals u = {u(i)}_{i=1}^T ⊂ R^m to a Hankel matrix with L ∈ Z_>0 block rows be denoted by

ℋ_L(u) :=
[ u(1)   u(2)   ⋯  u(T − L + 1)
  u(2)   u(3)   ⋯  u(T − L + 2)
   ⋮      ⋮            ⋮
  u(L)  u(L + 1) ⋯  u(T) ].

Definition 1 (Persistency of Excitation 16). Let L ∈ Z_>0. The sequence of signals u = {u(i)}_{i=1}^T ⊂ R^m is called persistently exciting of order L if the Hankel matrix ℋ_L(u) has full row rank.
Note that the property of being persistently exciting of order L requires the length of the sequence of signals to be large enough; in particular, the length must be such that T ≥ (m + 1)L − 1. Intuitively, a persistently exciting sequence of signals must be sufficiently long and sufficiently rich to excite all aspects of the dynamics (4). The DeePC algorithm relies on the following fundamental result.

Theorem 1 (Fundamental Lemma 16). Consider the minimal LTI system (4), and let (u^d, y^d) be an input/output trajectory of (4) such that u^d is persistently exciting of order L + n. Then (u, y) is a length-L trajectory of (4) if and only if there exists a vector g such that [ℋ_L(u^d); ℋ_L(y^d)] g = [u; y].
The result above states that the subspace spanned by the columns of the stacked Hankel matrix built from the input/output data corresponds exactly to the subspace of possible length-L trajectories of (4). Hence, the Hankel matrix may serve as a nonparametric model for (4), one that is simply constructed from raw time-series data and does not require any learning.
In what follows, we will see how the above theorem allows us to perform implicit state estimation as well as predict forward trajectories of the unknown system allowing us to solve an optimization problem equivalent to (3) when the system is of the form (4).
Data collection: Let T_d, T_ini ∈ Z_>0 be the length of data collection and the time horizon used for initial condition estimation, respectively. Suppose (u^d, y^d) is a sequence of T_d input/output measurements collected from (4) during an offline procedure. Suppose further that the input is persistently exciting of order T_ini + T_f + n. We partition the input/output measurements into the Hankel matrices

[U_p; U_f] := ℋ_{T_ini + T_f}(u^d), [Y_p; Y_f] := ℋ_{T_ini + T_f}(y^d), (5)

where U_p consists of the first T_ini block rows of ℋ_{T_ini + T_f}(u^d) and U_f of the last T_f block rows (similarly for Y_p and Y_f). The data in U_p and Y_p will be used in conjunction with past data to perform implicit initial condition estimation, and the data in U_f and Y_f will be used to predict future trajectories.
Data-driven control and estimation: Let (u_ini, y_ini) be the T_ini most recent past input/output measurements from the system. By Theorem 1, any length-(T_ini + T_f) trajectory of (4) satisfies

[U_p; Y_p; U_f; Y_f] g = [u_ini; y_ini; u; y] (6)

for some vector g. Every column of the Hankel matrix is a trajectory of the system (motion primitive), and any new trajectory (right-hand side of (6)) can be synthesized by a linear combination of these motion primitives. Hence, given an input sequence u to be applied to the system, one can solve the first three block equations of (6) for g, and the corresponding output sequence is given by y = Y_f g. The top two block equations in (6) implicitly fix the initial condition from which the future trajectory departs. To fix this initial condition uniquely, one must set T_ini ≥ ℓ, where ℓ is the lag of the system (i.e., the number of past measurements required to uniquely identify the current state of the system through back-propagation of the dynamics (4)). This in turn implies that the predicted trajectory given by y = Y_f g is unique. 17 Note that the lag of the system is a priori unknown, but is upper bounded by n. Hence, knowing an upper bound on the state dimension n of the system is sufficient to obtain unique predictions.
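The data-collection machinery above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code; the function names and the row-per-sample array convention are our own assumptions.

```python
import numpy as np

def block_hankel(w, L):
    """Block Hankel matrix H_L(w) with L block rows, built from a
    T x q array w that holds one sample of the signal per row."""
    T = w.shape[0]
    return np.column_stack([w[i:i + L].reshape(-1) for i in range(T - L + 1)])

def is_persistently_exciting(u, L):
    """Definition 1: u is persistently exciting of order L iff
    H_L(u) has full row rank."""
    H = block_hankel(u, L)
    return np.linalg.matrix_rank(H) == H.shape[0]

def partition_data(ud, yd, T_ini, T_f):
    """Split offline data into the past/future blocks of (5): U_p holds
    the first T_ini block rows of H_{T_ini+T_f}(u^d), U_f the last T_f
    block rows (similarly for Y_p and Y_f)."""
    m, p = ud.shape[1], yd.shape[1]
    U = block_hankel(ud, T_ini + T_f)
    Y = block_hankel(yd, T_ini + T_f)
    return U[:m * T_ini], Y[:p * T_ini], U[m * T_ini:], Y[p * T_ini:]
```

For a T_d x m input array and a T_d x p output array, `partition_data` returns U_p, Y_p, U_f, Y_f with (m T_ini), (p T_ini), (m T_f), and (p T_f) rows, respectively, each with T_d − T_ini − T_f + 1 columns.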
The Hankel matrix in (6) simultaneously performs state estimation and prediction, and can thus be used as a predictive model for system (4). Substituting (6) for the unknown dynamics (4) in the optimization problem (3) gives rise to the following data-driven optimization problem allowing for the computation of optimal control inputs without knowledge of a system model:

minimize_{g, u, y}  Σ_{i=0}^{T_f − 1} c(y_i, u_i)
subject to  [U_p; Y_p; U_f; Y_f] g = [u_ini; y_ini; u; y],
            u ∈ 𝒰^{T_f}, y ∈ 𝒴^{T_f}, (7)

where 𝒰^{T_f} is the T_f-fold Cartesian product of 𝒰 (similarly for 𝒴^{T_f}). The optimization problem (7) was shown to be equivalent to the MPC problem given in (3) when the unknown system is of the form (4). 15 Note that the optimization problem (7) does not include any parameters that need to be estimated from data. The Hankel matrix directly uses raw data without further processing, the cost function c is specified by the practitioner, and the optimization variable g is solved for in every online iteration of the algorithm. There is no separate model-fitting or denoising step.
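The prediction step described above — solving the first three block equations of (6) for g and reading off y = Y_f g — can be sketched as follows. The helper names and the toy second-order LTI system used for testing are our own; for a noiseless LTI system with persistently exciting data, a least-squares solve recovers an exact g.

```python
import numpy as np

def block_hankel(w, L):
    """Block Hankel matrix with L block rows from a T x q sample array."""
    return np.column_stack([w[i:i + L].reshape(-1)
                            for i in range(w.shape[0] - L + 1)])

def deepc_predict(U_p, Y_p, U_f, Y_f, u_ini, y_ini, u):
    """Solve the first three block equations of (6) for g in the
    least-squares sense and return the predicted outputs y = Y_f g."""
    g, *_ = np.linalg.lstsq(np.vstack([U_p, Y_p, U_f]),
                            np.concatenate([u_ini, y_ini, u]),
                            rcond=None)
    return Y_f @ g
```

With T_ini at least the lag of the system, the returned prediction matches the true output trajectory exactly in the deterministic LTI case, which is what makes (7) equivalent to (3).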

Regularized DeePC for nonlinear noisy systems
The goal of this paper is to implement the above DeePC optimization problem to control the real-world quadcopter described in Section 2.1. As the quadcopter dynamics do not satisfy the deterministic LTI assumption necessary to show the equivalence of the MPC optimization problem (3) and the DeePC optimization problem (7), regularizations are needed. Indeed, when the input/output data used for the Hankel matrix in (7) is obtained from a nonlinear system or is corrupted by process or measurement noise (as is the case with any real-world application), the subspace spanned by the columns of the Hankel matrix no longer coincides with the subspace of possible trajectories of the system. In fact, in any real-world problem setting the Hankel matrix used for predictions in (7) will generally be full rank. Hence, the Hankel matrix constraint will imply that any trajectory is possible, leading to poor closed-loop performance of the DeePC algorithm. Furthermore, the online measurements y_ini used to set the initial condition from which the predicted trajectory departs are corrupted by measurement noise, and thus may cause poor predictions. Including a 2-norm penalty on the difference between the estimated initial condition Y_p g and the measured initial condition y_ini coincides roughly with a least-squares estimate of the true initial condition. Regularization has been proposed as one method to deal with these difficulties and extend the DeePC algorithm to nonlinear noisy systems. 15 We present a variation of these regularizations in the following regularized DeePC optimization problem:

minimize_{u, y, g}  Σ_{i=0}^{T_f − 1} c(y_i, u_i) + r(g) + λ_s ||Y_p g − y_ini||_2^2
subject to  [U_p; U_f; Y_f] g = [u_ini; u; y],
            u ∈ 𝒰^{T_f}, y ∈ 𝒴^{T_f}, (8)

where λ_s ≥ 0, and r : R^{T_d − T_ini − T_f + 1} → R_≥0 is a function used to regularize g. In comparison to the original regularized DeePC formulation, 15 we use abstract stage cost and regularization functions c and r, respectively. These will be made concrete in Section 3.2.
We also use the 2-norm instead of the 1-norm to penalize the difference between the estimated initial condition Y p g and the measured initial condition y ini . Algorithm 1 below summarizes the DeePC procedure where (8) is implemented in a receding horizon fashion.

Algorithm 1. Regularized DeePC
1. Solve the regularized DeePC optimization problem (8) for an optimal g⋆.
2. Compute the optimal input sequence u⋆ = U_f g⋆.
3. Apply the inputs (u(t), …, u(t + s − 1)) = (u⋆(0), …, u⋆(s − 1)) for some s ≤ T_f.
4. Set t to t + s and update u_ini and y_ini to the T_ini most recent past input/output measurements.
5. Return to 1.
It has been shown that when r(g) = λ_g ||g||_q, where λ_g ≥ 0 and q ∈ Z_>0 ∪ {+∞}, problem (8) coincides with a distributionally robust problem formulation. Using such a q-norm regularization for the decision variable g induces robustness to all systems (nonlinear or stochastic) that could have produced the data in the Hankel matrices (5) within an s-norm induced Wasserstein ball around the data samples used, where 1/q + 1/s = 1. 19,27 The computational complexity of (8) can be characterized by the number of decision variables and constraints. There are T_d − T_ini − T_f + 1 + (m + p)T_f decision variables, mT_ini + (m + p)T_f equality constraints, and 2(m + p)T_f inequality constraints when 𝒰^{T_f} and 𝒴^{T_f} are box constraint sets. As is expected of a finite-horizon optimal control method, the computational complexity grows with the time horizon T_f. Furthermore, T_ini and T_d also affect the computational complexity. The former is related to the observability of the unknown system (2), the latter to the system's dimensionality.
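For intuition, a heavily simplified variant of (8) can be solved in closed form: with a quadratic tracking cost, the q = 2 regularization of Section 3.2, no input/output box constraints, and the hard constraint U_p g = u_ini replaced by a large penalty, the whole problem collapses to one least-squares solve. This is our own sketch under those simplifying assumptions, not the solver used in the paper (which handles box constraints and 1-norm regularization via a general-purpose QP/LP solver).

```python
import numpy as np

def regularized_deepc_step(U_p, Y_p, U_f, Y_f, u_ini, y_ini,
                           u_r, y_r, T_f, m, lam_s, lam_g, g_r,
                           q_w=1.0, r_w=0.1, lam_u=1e6):
    """One receding-horizon step of a simplified (8): scalar output/input
    tracking weights q_w and r_w, 2-norm regularization toward g_r, no
    box constraints, and U_p g = u_ini enforced via the penalty lam_u.
    Returns the first input of the optimal plan."""
    ur, yr = np.tile(u_r, T_f), np.tile(y_r, T_f)
    n_g = U_p.shape[1]
    # Stack all weighted residual blocks into a single least-squares problem.
    M = np.vstack([np.sqrt(lam_u) * U_p,
                   np.sqrt(lam_s) * Y_p,
                   np.sqrt(q_w) * Y_f,
                   np.sqrt(r_w) * U_f,
                   np.sqrt(lam_g) * np.eye(n_g)])
    b = np.concatenate([np.sqrt(lam_u) * u_ini,
                        np.sqrt(lam_s) * y_ini,
                        np.sqrt(q_w) * yr,
                        np.sqrt(r_w) * ur,
                        np.sqrt(lam_g) * g_r])
    g, *_ = np.linalg.lstsq(M, b, rcond=None)
    return (U_f @ g)[:m]   # apply only the first input (receding horizon)
```

A useful sanity check on this formulation: if the system already sits at the steady-state reference and g_r reproduces that steady state exactly, the unique minimizer is g = g_r and the computed input equals u_r.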

RESULTS
In this section, we present the results and insights gained by applying the DeePC Algorithm 1 described in Section 2.3 for trajectory tracking of the quadcopter system described in Section 2.1. The challenges posed by this application are:
1. The nonlinear and stochastic nature of the quadcopter system requires that the regularization function in (8) and the other hyperparameters offered by the DeePC Algorithm 1 be chosen appropriately for the application at hand. This is addressed by the simulation-based analysis in Section 3.2.
2. The simulation model is a simplification of the real-world quadcopter system which neglects complex aerodynamic phenomena, drag, delays in actuation, communication and sensing, and process noise. Essentially, the simulation model contains merely the bare Newtonian dynamics, and even those are subject to parametric uncertainties. Therefore, it is not clear that simulation-based parameter selection can be directly transferred to real-world experiments. This is addressed by the experimental results in Section 3.3.
The real-world results were collected from laboratory experiments conducted using a motion capture system to provide measurements of the position and orientation of the quadcopter at a frequency of 25 Hz. Thus, the sampling time in the discrete-time dynamics (2) is 40 ms. The laboratory setup was developed as part of a previous work. 35 To give the reader an idea of the scale of the setup, the Crazyflie 2.0 36 quadcopter weighs 28 grams and a 12 cubic meter flying space was available. Further details on the setup are given in Section 3.3, where the experimental results are presented. The simulation environment uses the model presented in Section 2.1 and the model parameters identified in a previous work. 37 These model parameters do not match the specific Crazyflie 2.0 used for the experiments, partially due to additional hardware required for detection by the motion capture system.

Data collection
As described in Section 2.3, the input signal used in the Hankel matrices appearing in (7) must be persistently exciting of sufficient order. This data can be collected by injecting a random input sequence, or by performing a manual flight experiment where a human performs the function of the outer controller. For repeatability of results, we chose the former. Two possible choices of random input signals to be applied during the data collection phase are a pseudorandom binary sequence (PRBS) designed for multiple inputs, 38 or a white noise signal. Both types of perturbations were tested in simulations and showed a negligible difference in the performance of the DeePC algorithm. The results in this paper are presented using a PRBS input signal during the data collection phase because it generally provides better performance for classical system identification techniques. 39 The input signals applied for data collection consist of the PRBS excitation signal added to an existing controller that maintains the quadcopter around the hover state. The data collected was used to populate the Hankel matrices in (5).
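A simple multi-channel binary excitation of the kind used here can be generated as follows. This is a plain stand-in sketch, not the formally designed multi-input PRBS of the cited reference; the function name, amplitude, and hold length are our own assumptions.

```python
import numpy as np

def prbs_excitation(T, m, amplitude, hold=4, seed=0):
    """Pseudorandom binary excitation: each of the m channels switches
    independently between +/- amplitude, holding every level for
    `hold` samples before the next switch."""
    rng = np.random.default_rng(seed)
    n_levels = -(-T // hold)  # ceil(T / hold) switching intervals
    levels = amplitude * rng.choice([-1.0, 1.0], size=(n_levels, m))
    return np.repeat(levels, hold, axis=0)[:T]
```

In the setup described above, such a signal would be added to the output of the hover-stabilizing controller during data collection, with the amplitude chosen small enough to keep the quadcopter near hover.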

Simulation-based analysis and insights
The aim of our controller is to track a steady-state reference (u_r, y_r) ∈ R^m × R^p. We therefore consider as the cost function c the quadratic tracking error between the prediction and the given steady-state reference, that is,

c(y, u) = ||y − y_r||_Q^2 + ||u − u_r||_R^2, (9)

where Q ⪰ 0 and R ≻ 0. This cost function generalizes the one in the original regularized DeePC formulation 15 by considering a nonzero steady-state reference control input u_r. The values chosen for Q, R, and u_r are given in Appendix A. The time horizon was chosen as T_f = 25, which corresponds to 1 second in real time. Furthermore, we choose the regularization function in (8) as

r(g) = λ_g ||g − g_r||_q, with g_r := [U_p; Y_p; U_f; Y_f]^† [1_{T_ini} ⊗ u_r; 1_{T_ini} ⊗ y_r; 1_{T_f} ⊗ u_r; 1_{T_f} ⊗ y_r], (10)

where λ_g ≥ 0, q ∈ Z_>0 ∪ {+∞}, the vector 1_{T_ini} ⊗ u_r denotes the stacked column vector consisting of T_ini copies of u_r (similarly for the other Kronecker products), and † denotes the pseudo-inverse. The vector g_r in (10) can be thought of as a "steady-state trajectory mapper" which linearly combines columns of the Hankel matrix to match the given steady-state reference trajectory. Among the possibly infinite number of vectors g that match the steady state, it is the one with the smallest 2-norm.
In the case when no g matches the steady state exactly, g_r matches it as closely as possible in the 2-norm sense. However, this case is unlikely in practice since the Hankel matrix is generally full rank, as discussed in Section 2.3. Penalizing the difference between g and g_r ensures that the cost in (8) is zero when the quadcopter is at the steady-state reference (u_r, y_r). This is another generalization of the original regularized DeePC, 15 where only g is penalized and the regularization norm q is chosen to be the 1-norm. We will consider both the 1-norm and the 2-norm in this paper. Under these design choices, the regularized DeePC optimization problem (8) offers several hyperparameters:
• T_d, the total number of data points used to construct the Hankel matrices in (5),
• T_ini, the time horizon used for initial condition estimation,
• λ_s, the weight on the softened initial condition constraint,
• λ_g, the weight on the regularization of g,
• q, the norm used to regularize g in (10), and
• p, the number of outputs used to construct the Hankel matrices in (5).
Although p may seem fixed by the output measurements available, in the case of quadcopter control, it is reasonable to consider whether to use all measurements for position control, that is, set p = 5, or use only the position measurements, that is, set p = 3.
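The steady-state trajectory mapper g_r defined in (10) can be computed directly with a pseudo-inverse. The sketch below assumes the same row-stacked Hankel blocks as before; the function name is our own.

```python
import numpy as np

def steady_state_mapper(U_p, Y_p, U_f, Y_f, u_r, y_r, T_ini, T_f):
    """g_r from (10): the minimum 2-norm combination of motion
    primitives (columns of the stacked Hankel matrix) that reproduces
    the constant steady-state trajectory (u_r, y_r) over the whole
    T_ini + T_f window."""
    H = np.vstack([U_p, Y_p, U_f, Y_f])
    b = np.concatenate([np.tile(u_r, T_ini), np.tile(y_r, T_ini),
                        np.tile(u_r, T_f), np.tile(y_r, T_f)])
    return np.linalg.pinv(H) @ b   # pseudo-inverse: min 2-norm solution
```

When the steady-state trajectory lies in the column space of the stacked Hankel matrix, H g_r reproduces it exactly; otherwise g_r is the closest match in the 2-norm sense, as noted above.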
Note that if one were to approach the control problem through system identification followed by MPC, a number of hyperparameters would also need to be selected. For example, the MATLAB subspace system identification method N4SID requires choosing a model order, weighting scheme, forward estimation and backward prediction horizons, weighting prefilter, output weighting matrix, and other hyperparameters. More generally, system identification for quadcopters requires significant engineering, and previous works resort to the use of partial model knowledge, such as the presence of integrators 40 or the decoupled nature of the dynamics. 41,42 This is in addition to the use of full model knowledge in simulating the system and generating the input/output data for identification in these works. Further, the DeePC hyperparameters affect the closed-loop control performance directly and not through an offline system identification step, which means that they can be easily adapted online on the arrival of new data.
To investigate the effect of the hyperparameters for DeePC, we perform a grid search over the hyperparameter ranges in (11), together with a range of T_d values that satisfy the minimum data length prescribed by the persistency of excitation requirement from Definition 1. Note that the prediction horizon T_f and the cost matrices Q and R are not parameters unique to the regularized DeePC optimization problem (8), but are also parameters for MPC. For the sake of clarity, we do not consider them as hyperparameters in the simulation-based analysis. Moreover, fixing T_f = 25, and Q and R as in Appendix A, was sufficient for achieving good closed-loop performance, and allows for a focus on the other hyperparameters of DeePC. The time horizon used for initial condition estimation, T_ini, and the number of outputs, p, are also not unique to DeePC, since they are used in some model-based control approaches which, for example, perform receding horizon state estimation. We consider them as hyperparameters in the simulation-based analysis in order to gain insights on the implicit state estimation capability of DeePC. For each combination of hyperparameters, the following procedure is carried out in simulation. The same procedure is used for the real-world experiments presented in Section 3.3.
Procedure 1 (Procedure for collecting results in simulation and real-world experiments). For simulation, the system used was a model of the off-the-shelf quadcopter system with dynamics (1) and architecture as in Figure 2, where measurements were affected by zero-mean Gaussian noise with covariance matrix Σ_y as in Appendix A. For the real-world experiments, the system used was the Crazyflie 2.0.
1. The quadcopter is brought to hover at y = (0, 0, 1) with a stabilizing controller. The system is excited by adding a PRBS signal to the output of the stabilizing controller, as per Section 3.1, for the input/output data collection step of the DeePC algorithm.
2. The DeePC optimization problem (8) is set up with the input/output data collected in step 1.
3. The DeePC controller is turned on and the quadcopter is commanded to track a diagonal step up from y(0) = (−0.5, −0.5, 0.5) to y_r = (0.5, 0.5, 1.5).

The resulting closed-loop tracking error is measured as Σ_{t=0}^{T_e − 1} ||y(t) − y_r||_2^2, where t = 0 is the time index at the start of the step trajectory and T_e = 250 is the chosen experiment length, which corresponds to 10 seconds in real time.
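The comparison metric above is a one-liner; we include it for completeness since every hyperparameter combination in the grid search is scored with it. The function name and the row-per-sample convention for the recorded outputs are our own.

```python
import numpy as np

def closed_loop_tracking_error(Y, y_r):
    """Sum over t of ||y(t) - y_r||_2^2, where Y stacks the T_e recorded
    output samples row-wise (a T_e x p array)."""
    return float(np.sum(np.sum((Y - y_r) ** 2, axis=1)))
```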

Sensitivity to T d and T ini
As discussed in Section 2.3, for LTI systems the DeePC algorithm requires a minimum number of data points to satisfy the persistency of excitation property. Since we apply the DeePC algorithm to a nonlinear system subject to measurement noise, it is unclear how many data points are needed to construct the Hankel matrices in (5). Figure 3 shows the sensitivity analysis of T_d and T_ini on the tracking error. Figure 3 (left) shows the influence of T_d on the tracking error, where for each value of T_d considered we show the smallest tracking error achieved over all combinations of the other hyperparameters in the grid given by (11) with T_ini = 6. Similarly, Figure 3 (right) shows the influence of T_ini on the tracking error, where for each value of T_ini considered we show the smallest tracking error achieved over all combinations of the other hyperparameters in the grid given by (11) with T_d = 331.
The key insight from the grid-search result in Figure 3 (left) is the distinct improvement in the tracking error of the regularized DeePC algorithm when the number of data points is chosen such that the Hankel matrix appearing in the DeePC optimization problem (8) has at least as many columns as rows. Since the Hankel matrix is generically full rank when the data is obtained from a nonlinear, noisy system, having a square Hankel matrix ensures that the subspace spanned by its columns contains the actual subspace of possible trajectories of the system. When the Hankel matrix is slim (i.e., has fewer columns than rows), this property may not hold: the subspace spanned by the columns of a slim Hankel matrix may not contain the subspace of possible trajectories of the system. This insight is summarized by the following inequality, which states that T_d should be chosen to be larger than both the minimum amount needed for persistency of excitation in the LTI case and the minimum amount such that the Hankel matrix in (8) is square:

T_d ≥ max{ (m + 1)(T_ini + T_f + n) − 1, (m + p)(T_ini + T_f) + T_ini + T_f − 1 }.   (12)

F I G U R E 4 Influence of λ_g and λ_s on the tracking error. All other hyperparameters are fixed to the values described in the text. The coloured shading is restricted to the interval (36, 120) to sufficiently display the shape of the region shown. The cost increases steeply in regions where the cost is greater than 120, thus the plot is clipped at 120 for the sake of clarity.
Here n = 8 is the number of states corresponding to a minimal realization of (1) linearized about hover. Note that the minimum number of data points such that the Hankel matrix in (8) is square is directly affected by the number of outputs p. Hence, a larger p requires more data points to satisfy the lower bound in (12) and thus results in more decision variables in problem (8). The distinct improvement in the tracking error when T_d is chosen such that the Hankel matrix in (8) is square is also observed in a power system application of DeePC. 21 A similar trend is observed in Figure 3 (right) for T_ini, where good tracking performance is achieved for values larger than T_ini = 2 for p = 5, and T_ini = 3 for p = 3. This suggests that more past measurements are needed to estimate the initial condition of the unknown system when p = 3. We observed, however, that setting T_ini = 6 gives steadier flight of the quadcopter. Under noisy measurements, increasing T_ini leads to better initial condition estimates. For the remaining results (simulation and experimental), Procedure 1 was conducted with the number of data points T_d = 331 and with T_ini = 6. This resulted in good tracking error performance for both p = 3 and p = 5, while keeping the size of the DeePC optimization problem (8) small enough to be computationally tractable in real time.
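The two lower bounds on T_d (persistency of excitation and a square Hankel matrix) can be checked numerically. The formulas below follow the standard behavioral-systems bounds and should be read as a sketch of the inequality discussed above, not a verbatim transcription of (12):

```python
def min_data_length(m, p, n, T_ini, T_f):
    """Smallest T_d satisfying (i) persistency of excitation of order
    T_ini + T_f + n for an m-input LTI system and (ii) a square stacked
    Hankel matrix: columns T_d - (T_ini + T_f) + 1 >= rows (m+p)(T_ini + T_f)."""
    L = T_ini + T_f
    pe_bound = (m + 1) * (L + n) - 1
    square_bound = (m + p) * L + L - 1
    return max(pe_bound, square_bound)

# Quadcopter values from the text: m = 3 inputs, n = 8 states,
# T_ini = 6, T_f = 25.
lo_p3 = min_data_length(m=3, p=3, n=8, T_ini=6, T_f=25)
lo_p5 = min_data_length(m=3, p=5, n=8, T_ini=6, T_f=25)
```

Both bounds are satisfied by the choice T_d = 331 used in the paper, with the p = 5 case requiring the larger data length, consistent with the discussion above.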

3.2.2 Sensitivity to λ_s, λ_g, q, and p

Figure 4 shows the results from the grid search as a heat map over (λ_s, λ_g) with fixed values q = 2 and p = 3 for the purpose of visualization, and fixed values T_d = 331 and T_ini = 6 for the reasons described above. The figure provides the insight that there is a threshold for λ_s (approximately λ_s ≥ 10^7) beyond which small tracking error can be achieved. The intuitive explanation is that a large enough penalization on the softened initial condition constraint ensures that the future predicted trajectory departs from an initial condition close to the actual initial condition. A similar trend of the tracking performance as a function of λ_s is observed in other numerical case studies of DeePC. 15,21 This suggests a tuning guideline for λ_s: choose it as large as possible without causing the optimization solver to encounter numerical issues. Figure 4 also exposes a range of λ_g in which small tracking error is achieved. To investigate this further we consider the grid-search results for all combinations of q ∈ {1, 2} and p ∈ {3, 5}. Figure 5 shows the results from the grid search over λ_g for a fixed value of λ_s = 7.5 × 10^8 and for all four combinations of q and p; for example, the line for q = 2, p = 3 is the slice of Figure 4 at the fixed value of λ_s. In all cases a small tracking error is achieved for a range of λ_g, although the combination q = 1, p = 3 performs relatively poorly. This range of λ_g with acceptable tracking error is wider for q = 2 than for q = 1, which suggests that, for the setup under consideration, 2-norm regularization is less sensitive to hyperparameter selection than 1-norm regularization. This observation is supported by the heat maps for all four combinations of q ∈ {1, 2} and p ∈ {3, 5} provided in Appendix B.
Based on these insights, for the remainder of the results we fix the values λ_s = 7.5 × 10^8 and q = 2, and now investigate in more detail the influence of λ_g and the choice of output measurements p ∈ {3, 5}.
To provide some intuition for how λ_g influences the optimal solution of the regularized DeePC optimization problem (8), we now take a closer look at the closed-loop trajectories resulting from λ_g ∈ {0, 500}. Figure 6(A, B) shows the p_z coordinate of the closed-loop trajectory (solid lines), the reference, and the DeePC predictions (dashed lines) in simulation for λ_g = 0 and λ_g = 500, respectively. When the input/output data in (8) is obtained from a nonlinear system and is corrupted by measurement noise, the subspace spanned by the columns of the Hankel matrix is all of R^((m+p)(T_ini+T_f)). Hence, without regularization on the decision variable g, the Hankel matrix predicts that every trajectory is possible. The value λ_g = 500 is selected from the grid-search result where the DeePC algorithm achieved the smallest tracking error (see Figure 5). We see in Figure 6(B) that desirable reference tracking is achieved and that more physical predictions are computed by the regularized optimization problem (8).

F I G U R E 5 Influence of λ_g on the tracking error for all four combinations of q and p; the line for q = 2, p = 3 (solid thick line) is the respective slice of Figure 4. The main observation is that the choice q = 2, that is, a 2-norm regularization on decision variable g, provides a wider range of λ_g for which acceptable tracking error is achieved.
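The role of the regularization weight can be sketched with a simplified, unconstrained variant of (8) on a toy noisy LTI system (an assumption for illustration only: input costs and inequality constraints are omitted, and the initial-input constraint is softened with a large weight).

```python
import numpy as np

def deepc_step(Up, Yp, Uf, Yf, u_ini, y_ini, y_r, lam_g, lam_s):
    """One step of a simplified DeePC with 2-norm regularization (q = 2):
    minimize over g
        ||Yf g - y_r||^2 + lam_s ||Yp g - y_ini||^2 + lam_g ||g||^2,
    with Up g = u_ini enforced softly. A sketch of (8), not the full problem."""
    n_g = Up.shape[1]
    w_u = 1e8                                  # soft weight on the input match
    A = np.vstack([Yf,
                   np.sqrt(lam_s) * Yp,
                   np.sqrt(w_u) * Up,
                   np.sqrt(lam_g) * np.eye(n_g)])
    b = np.concatenate([y_r,
                        np.sqrt(lam_s) * y_ini,
                        np.sqrt(w_u) * u_ini,
                        np.zeros(n_g)])
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    return Yf @ g, g

# Toy data from a noisy scalar LTI system x+ = 0.9 x + u (illustrative,
# not the quadcopter model).
rng = np.random.default_rng(1)
T_d, T_ini, T_f = 60, 2, 5
u = rng.standard_normal(T_d)
y, x = np.empty(T_d), 0.0
for t in range(T_d):
    y[t] = x + 0.01 * rng.standard_normal()    # noisy output measurement
    x = 0.9 * x + u[t]

def hank(w, L):
    return np.column_stack([w[i:i + L] for i in range(len(w) - L + 1)])

L = T_ini + T_f
Up, Uf = hank(u, L)[:T_ini], hank(u, L)[T_ini:]
Yp, Yf = hank(y, L)[:T_ini], hank(y, L)[T_ini:]
u_ini, y_ini = u[-T_ini:], y[-T_ini:]
y_r = np.ones(T_f)                             # step reference

y_pred0, g0 = deepc_step(Up, Yp, Uf, Yf, u_ini, y_ini, y_r, 1e-6, 1e6)
y_pred1, g1 = deepc_step(Up, Yp, Uf, Yf, u_ini, y_ini, y_r, 500.0, 1e6)
# Larger lam_g shrinks g, ruling out combinations that merely fit the noise.
```

The shrinkage of g with growing λ_g is what trades optimistic, noise-fitting predictions for robust ones in the regularized problem.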
An important distinction between the λ_g hyperparameter and the T_d, T_ini, and λ_s hyperparameters discussed above is that λ_g cannot be increased arbitrarily, as also shown in Figure 5. The reason is that at a certain level the regularization term r(g) in (8) dominates the tracking error term, leading to poor tracking performance and eventually instability of the system. However, the range of λ_g resulting in small tracking error is large (e.g., λ_g ∈ [100, 10000] for q = 2, p = 3 in Figure 5), indicating robustness to the choice of λ_g.
Hyperparameters λ_g and q, which parameterize the regularization function r(g) in optimization problem (8), are the main parameters of the regularized DeePC algorithm that are not present in model-based control approaches. These hyperparameters provide distributional robustness against the uncertainty in the system generating the input/output data. 19,27 Increasing the regularization weight λ_g provides an increased level of robustness at the cost of conservatism. For intuition, the counterpart of λ_g in a model-based approach is the model-order selection parameter that decides how much of the data should be attributed to the model and how much to noise. Similarly, the choice of the regularization norm q corresponds to the choice of a loss function in system identification, such as the average or the worst-case cost. The range of values of λ_g and q which result in a small tracking error depends on the nature of the uncertainty in the system, and the analysis above does not indicate a general guideline that we would expect to apply across multiple systems. Interestingly, however, we observe here and in other applications 20,21 that the combination λ_g = 500 and q = 2 performs well. We will explore this empirical observation further in future work.

Real-world DeePC implementation
We now investigate how the insights gained through the simulation analysis of Section 3.2 transfer to laboratory experiments on a real-world quadcopter, with the details of the experimental setup provided at the start of Section 3. The experiments are performed as per Procedure 1 (see Section 3.2), and through the results we investigate: (a) whether the insights from the simulation-based analysis are validated in experiments; (b) whether the hyperparameter values identified from the simulation-based analysis can be directly transferred to the laboratory environment; and (c) the reliability of the tracking performance achieved. Figure 7 provides a schematic of the laboratory setup used to collect the experimental results. The motion capture system consists of multiple cameras placed around the flying space and connected to a dedicated computer. The software running on the motion capture computer provides accurate measurements 34 of the position and attitude of the Crazyflie 2.0 36 quadcopter. These measurements are available to an offboard laptop where the outer controller from Figure 2 is implemented. The control decisions of the outer controller, that is, the total thrust f_tot and the body-rate references ω_ref,x and ω_ref,y, are sent via the Crazyradio link to the Crazyflie 2.0, where the firmware provided with the quadcopter runs an onboard controller to track them.
The following analysis of performance on the real-world system focuses on hyperparameters λ_g and p, as these are the hyperparameters for which the simulation-based analysis of Section 3.2 did not provide clear tuning guidelines. On the other hand, the tuning guidelines found in Section 3.2 for hyperparameters T_d, T_ini, and λ_s generalized well to the real-world quadcopter, and no significant new insights were observed when varying these hyperparameters in the real world. Hyperparameter q was set to the 2-norm because it reduces the number of decision variables in the optimization problem (8) solved online and hence the online computation time required. Moreover, the simulation-based results in Section 3.2 suggest that similarly low tracking error is achievable with both q ∈ {1, 2}. Figure 6(C, D) shows the p_z coordinate of the closed-loop trajectory, reference, and DeePC predictions when implemented on the quadcopter using the same hyperparameter values as Figure 6(A, B), respectively. The main feature of Figure 6 is that the simulation and experimental results show qualitatively similar closed-loop trajectories (solid lines) and predictions computed by the DeePC optimization problem (8) (dashed lines). This provides experimental validation of the insight that regularization is required to predict physically reasonable trajectories when applying DeePC to a real system. Moreover, a direct transfer of the hyperparameters selected via simulation to the experiments was possible, and tracking performance was not significantly improved by adjusting the regularization weight λ_g. Appendix C provides a similar comparison for hyperparameter values above and below λ_g = 500, indicating that the real-world implementation also achieves the best tracking performance at approximately λ_g = 500.
To investigate the reliability of the performance observed in Figure 6(D), and also to investigate the influence of hyperparameter p, Procedure 1 was repeated in 28 experiments for each of p = 3 and p = 5. To capture different operating conditions, 14 trials were performed with a fully charged battery and 14 with a partially depleted battery. Figures 8 and 9 and Table 1 summarize the results. Figure 8 shows the position time series data (solid grey) of all 28 trajectories for p = 3 (A, B, C) and for p = 5 (D, E, F), with the average at each time point (dashed) shown to assist with visualization. Figure 9 shows the same data as a top view.
Quantitatively, Table 1 shows that p = 3 achieves a lower tracking error than p = 5 in terms of mean, median, and standard deviation. This is likely due to the orientation measurements being noisier than the position measurements. This could be addressed by performing a weighted penalization of Y_p g − y_ini using the covariance matrix of the measurement noise. Qualitatively, Figures 8 and 9 suggest that there is less variation in the closed-loop trajectories with p = 3 than with p = 5. This result on the real-world quadcopter suggests that, when applying DeePC to other systems, performance may be improved by discarding measurements with higher noise as long as the system remains observable with the remaining measurements.
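The weighted penalization mentioned above can be sketched as follows; the covariance values are placeholders, not the values used in the experiments (the orientation channels are assumed 100× noisier than the position channels):

```python
import numpy as np

# Weighted softened initial-condition penalty: replace
#   lam_s * ||Yp g - y_ini||^2  by  lam_s * ||Sigma_y^{-1/2} (Yp g - y_ini)||^2
# so that noisier channels contribute less to the penalty.
Sigma_y = np.diag([1e-4, 1e-4, 1e-4, 1e-2, 1e-2])  # placeholder covariances
W = np.linalg.inv(np.linalg.cholesky(Sigma_y))     # Sigma_y^{-1/2} (diagonal case)

residual = np.array([0.01, 0.01, 0.01, 0.1, 0.1])  # hypothetical Yp g - y_ini
weighted = float(np.sum((W @ residual) ** 2))      # each channel weighted equally
unweighted = float(np.sum(residual ** 2))          # dominated by noisy channels
```

With this weighting, a residual of one standard deviation contributes the same penalty in every channel, so the noisy orientation measurements no longer dominate the initial-condition fit.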
From the online computation perspective, Table 1 shows that optimization problem (8) is solved sufficiently fast for both p = 3 and p = 5, considering that output measurements are provided for real-time implementation at 25 Hz. For the case of p = 3, there were 451 optimization decision variables, 168 equality constraints, and 300 inequality constraints. As a point of reference, the optimization problem in the output-feedback MPC approach of Section 3.4 had 283 decision variables, 208 equality constraints, and the same number of inequality constraints as the DeePC problem.
A video of the quadcopter successfully tracking step trajectories and a figure 8 using the DeePC algorithm can be found here: https://doi.org/10.3929/ethz-b-000493419.

Summary of hyperparameter selection insights
Through the simulation-based analysis of Section 3.2 and the real-world implementation, we gained insights on the selection of the DeePC hyperparameters that we expect to assist with applying DeePC to other systems. They are summarized as:
• Choose T_d as per (12), that is, larger than both the minimum needed for persistency of excitation in the LTI case and the minimum such that the Hankel matrix in (8) is square.
• Choose T_ini by incrementally increasing it until steady tracking is observed. This coincides with a value which both exceeds the lag of the system in the LTI case and provides good initial condition estimates in the presence of noisy measurements.
• Choose λ_s as large as possible without causing the optimization solver to encounter numerical issues.
• Regarding p, performance may be improved by discarding measurements with higher noise as long as the system remains observable with the remaining measurements.
The selection of the regularization function r(g), parameterized by hyperparameters λ_g and q, depends on the nature of the uncertainty in the system generating the input/output data and is expected to vary from one application to another. Preliminary empirical observation suggests that the combination λ_g = 500 and q = 2 serves as a good initial choice. The 2-norm regularization is advantageous for real-time control because it reduces the number of decision variables in the optimization problem (8) solved online and hence the online computation time required.

Comparison with model-based control
The results in Section 3.3 show that DeePC Algorithm 1 achieves good performance for the step reference tracking task specified in Procedure 1 in a data-driven fashion. We now present a model-based point of comparison developed for linear systems. We take a first-principles approach that linearizes the quadcopter dynamics (1) about the hover equilibrium point, and we assume that the inner controller tracks the body-rate reference signal without dynamics or delays. We use a sampling time of 0.04 seconds, that is, 25 Hz, to convert the continuous-time linear model to discrete time. The resulting linear system model can be readily derived. 44 Hence we consider a model-based controller with eight states, namely the position p⃗, the velocity ṗ⃗, and the roll and pitch angles, and three inputs, namely the total thrust f_tot and the body-rate references ω_ref,x and ω_ref,y.

F I G U R E 10 Experimental comparison of data-enabled predictive control (DeePC) and model predictive control (MPC), shown as a top view.
The model-based control method we implement is output-feedback MPC, as described in Section 2.2. Optimization problem (3) is solved in a receding horizon fashion with the dynamics function f replaced by the linear time-invariant system model described above, the cost function c given by (9), and all parameters, including T_f, Q, R, and u_r, set to the same values as used for DeePC, as given in Appendix A. The state estimate x̂(t) is constructed by directly taking the measurements of the position and attitude, and ṗ⃗ is estimated as the discrete-time derivative of subsequent p⃗ measurements. Figure 10 compares a trajectory of this first-principles MPC approach with that of DeePC. Figure 10(A) shows the time series of the vertical position p_z, and Figure 10(B) shows the trajectory in the (p_x, p_y)-plane. Figure 10(A) shows that DeePC and MPC achieve qualitatively similar tracking performance for the vertical position p_z. Both have a similar rise time and settling time, with the most distinct feature being that the DeePC controller overshoots the reference but then settles to a smaller steady-state offset. For MPC, this offset is present because there is a mismatch between the model's steady-state input, u_r, and the input needed to maintain the real-world quadcopter at steady state. As the DeePC controller is provided with the same u_r, this indicates that the structure of the DeePC controller is able, to some extent, to correct for a mismatch in the provided steady-state input u_r. Figure 10(B) shows a clear disparity in the tracking performance in the horizontal (p_x, p_y)-plane. Where the MPC follows an almost straight-line trajectory from the starting point to the target, the DeePC controller by contrast has quite different tracking behavior in the p_x and p_y directions, a trend also observed in Figure 9 and in our simulation-based tests. Understanding why the DeePC controller produces a faster rise time in the p_x direction than in the p_y direction is an interesting direction for further investigation.
Overall, for the quadcopter application we see that DeePC performs similarly to MPC when a first-principles model is available. This indicates the potential of DeePC to tackle applications where a first-principles model is not available or where identifying all the necessary model parameters is not feasible.

Model mismatch
In all of our analyses, the off-the-shelf quadcopter is maintained at a zero yaw angle by the inner controller. At that yaw angle, the quadcopter body-frame x^(B), y^(B) axes are aligned with the inertial-frame x^(I), y^(I) axes, as shown in the top view (right) of Figure 1. Therefore, the x^(I) and y^(I) dynamics are decoupled from each other with respect to the body-rate reference control inputs of the outer controller, ω_ref,y and ω_ref,x, respectively. In the real-world experimental setup, the zero reference point of the yaw angle measurement must be calibrated by carefully aligning the quadcopter body frame with the inertial frame, and some calibration error is expected. We now consider a case where there is a yaw calibration error of approximately 30°, which is exaggerated for the purpose of demonstration. The quadcopter body frame is rotated by 30° around the inertial z^(I) axis at the yaw measurement zero reference point, leading to a misalignment between the inertial and body frames that is unknown to the controller.
To capture this yaw miscalibration in simulation, an offset of 30° between the true yaw angle and the yaw angle measurement available to the controller is induced. Figure 11 shows the simulation results of the quadcopter tracking a 1 meter step in the p_x direction with DeePC and output-feedback MPC. With no knowledge of the coupling between the p_x and p_y dynamics induced by the misalignment of the inertial and body frames, the output-feedback MPC controller causes the quadcopter to deviate considerably in the positive p_y direction and then spiral around the target in the (p_x, p_y)-plane. By contrast, the quadcopter takes a more direct path to its target under DeePC control. This suggests that the DeePC controller implicitly learns a good mapping between the body-rate references ω_ref,x, ω_ref,y and the p_x, p_y dynamics from the data collected in the misaligned frames of reference. The mapping is not perfect; a slight spiraling effect is observed as the quadcopter approaches its target, but the improvement over the model-based approach, which equally lacks knowledge of the frame misalignment, is apparent. The yaw angle mismatch is an example of a bias error that can occur when adopting a linear model-based control approach for a nonlinear system. Such a bias error is present when the linearization is performed at an incorrect operating point. The DeePC algorithm provides some robustness to such a bias error, since it is able to adapt to unknown operating conditions of the system from the data, and also by virtue of the regularization in optimization problem (8). One can further consider a case where the yaw angle measurement calibration drifts slowly over time, so that periodic recalibration is required for a model-based control approach to perform well. Instead of recalibration, the data in the Hankel matrix in (8) can be updated online in the DeePC approach. We will explore this concept further in future work.
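The coupling induced by the yaw calibration error can be sketched with a planar rotation (a minimal illustration; only the frame misalignment is modeled, not the full quadcopter dynamics):

```python
import numpy as np

# A 30-degree yaw calibration error rotates the body frame relative to the
# inertial frame, so a command intended to act purely along the inertial
# x-axis also excites the y dynamics, without the controller's knowledge.
psi = np.deg2rad(30.0)
R = np.array([[np.cos(psi), -np.sin(psi)],
              [np.sin(psi),  np.cos(psi)]])

cmd = np.array([1.0, 0.0])        # intended: pure x-direction command
effect = R @ cmd                  # actual effect in the inertial frame
# effect ≈ [0.866, 0.5]: roughly half the command spills into y.
```

This spill-over is exactly the unmodeled p_x/p_y coupling that degrades the linear MPC in Figure 11, while DeePC partially absorbs it because the miscalibration is already baked into the collected data.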

CONCLUSION
We demonstrated that the regularized DeePC algorithm is suitable for real-time control of a real-world quadcopter, thereby bridging the gap between theory and practice. In the process, we performed a sensitivity analysis on the hyperparameters of the DeePC algorithm in simulation, gaining key insights into their effects. These simulation takeaways generalized well to the real-world quadcopter system, where minimal hyperparameter refinement was required. The real-world implementation demonstrated that the DeePC algorithm is computationally tractable, with solve times well below the real-time requirement. The insights from the simulation and real-world experiments were condensed into a set of hyperparameter selection guidelines expected to assist with applying the DeePC algorithm to other systems (see Section 3.3.1). Future work includes applying the DeePC algorithm to other real-world systems for which no first-principles model can be derived.

APPENDIX A. PARAMETERS FOR IMPLEMENTATION OF THE DEEPC ALGORITHM
The following lists the hyperparameters of the DeePC algorithm and the design choices required to specify the quadcopter tracking goal. The values specified in this list are used for all results unless otherwise indicated in the text.
• T_d = 331, the total number of data points used to construct the Hankel matrices in (5),
• T_ini = 6, the time horizon used for initial condition estimation,
• λ_s = 7.5 × 10^8, the weight on the softened initial condition constraint,
• λ_g = 500, the weight on the regularization of g,
• q = 2, the norm used to regularize g in (10),
• p = 3, the number of outputs used to construct the Hankel matrices in (5),
• T_f = 25, the prediction horizon (corresponds to 1 s in continuous time),
• Q and R, the cost matrices in (9),
• Σ_y, the covariance matrix of measurement noise in simulation when p = 5; when p = 3, the covariance matrix is the top-left 3 × 3 block of Σ_y.

APPENDIX B. FURTHER RESULTS FOR THE GRID SEARCH ANALYSIS
For completeness, we include here the results of the grid search analysis described in Section 3.2 for all hyperparameters considered. Figure B1 (bottom left) is the same plot as shown in Section 3.2, and the other plots in Figure B1 cover the remaining combinations of q ∈ {1, 2} and p ∈ {3, 5}.

F I G U R E B1 Influence of λ_g and λ_s on the tracking error for the four combinations of 1-norm or 2-norm regularization (q ∈ {1, 2}, respectively) on the decision variable g, and p ∈ {3, 5} outputs measured, as labeled on the axes. All other hyperparameters are fixed to the values described in Section 3.2. The coloured shading is restricted to the interval (36, 120) to sufficiently display the shape of each plot. All plots increase steeply for values greater than 120 and are clipped at 120.

APPENDIX C. COMPARISON OF CLOSED-LOOP TRAJECTORIES FOR VALUES OF λ_g ABOVE AND BELOW 500

Figure C1 compares the closed-loop trajectories (solid lines) and the predictions computed by the DeePC optimization problem (8) (dashed lines), similarly to Figure 6. The same trend appears: the performance observed in the simulation-based analysis, Figure C1(A-C), is qualitatively similar to that observed in the real-world experiments, Figure C1(D-F). Qualitatively, the best λ_g chosen in simulation also performs best in reality and results in a similar closed-loop trajectory. A smaller value of λ_g results in a faster but more oscillatory response, and a larger value of λ_g results in a sluggish response. This figure demonstrates that, despite dynamics unmodeled in simulation, the real-world system behaves similarly to the simulation model when applying DeePC Algorithm 1. Consequently, the simulation-based hyperparameter selection was adopted on the real system with minimal adjustments required.