Zero-Order Optimization for Gaussian Process-based Model Predictive Control

By enabling constraint-aware online model adaptation, model predictive control using Gaussian process (GP) regression has exhibited impressive performance in real-world applications and received considerable attention in the learning-based control community. Yet, solving the resulting optimal control problem in real-time generally remains a major challenge, due to i) the increased number of augmented states in the optimization problem, as well as ii) computationally expensive evaluations of the posterior mean and covariance and their respective derivatives. To tackle these challenges, we employ i) a tailored Jacobian approximation in a sequential quadratic programming (SQP) approach, and combine it with ii) a parallelizable GP inference and automatic differentiation framework. Reducing the numerical complexity with respect to the state dimension $n_x$ for each SQP iteration from $\mathcal{O}(n_x^6)$ to $\mathcal{O}(n_x^3)$, and accelerating GP evaluations on a graphics processing unit, the proposed algorithm computes suboptimal, yet feasible solutions at drastically reduced computation times and exhibits favorable local convergence properties. Numerical experiments verify the scaling properties and investigate the runtime distribution across different parts of the algorithm.


I. INTRODUCTION
Real-world applications of model predictive control using Gaussian processes (GP-MPC), such as vision-based robot path-tracking [1], trajectory tracking using a robotic arm [2], autonomous racing [3], [4], or high-speed quadrotor flight [5], have showcased its potential to leverage closed-loop data for constraint-aware online model adaptation.
Yet, the computational cost associated with GP inference remains a major challenge. To fully exploit the rich, state-dependent uncertainty description induced by the posterior covariance of the GP, it should be included in the optimal control problem (OCP) formulation and propagated through the dynamics model over the prediction horizon [6]. However, incorporating the covariance propagation into the OCP constitutes a major limitation for real-time implementation of the algorithm on embedded hardware, both in terms of additional optimization variables capturing the state covariance, as well as computationally expensive evaluation and differentiation of the GP posterior covariance.
As a result, practical implementations resort to various heuristics to speed up the algorithm. Popular approaches include GP approximations with a fixed number of basis functions [7], [8], reduction of optimization variables by fixing the state covariances in the optimization problem based on their predicted value at the last MPC iteration [3], [6], or completely ignoring the uncertainty description provided by the GP posterior covariance in the control algorithm [1], [5].

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 953348, ELO-X.

A. Contributions
This paper addresses the aforementioned challenges by i) applying a tailored inexact sequential quadratic programming (SQP) algorithm [9], [10] to solve the GP-MPC optimal control problems, returning suboptimal, yet feasible solutions at convergence; ii) speeding up GP computations by efficient inference and automatic differentiation (AD) [11], [12], as well as parallelization on a graphics processing unit (GPU); iii) showing that the resulting algorithm maintains favorable local convergence properties if the GP posterior covariance and process noise covariance at the solution are sufficiently small.

B. Related Work
A tailored Jacobian approximation for stochastic and robust nonlinear model predictive control (NMPC) was originally developed in [9], [10], where it is referred to as a zero-order method. Based on this approach, [13] presents a robust NMPC algorithm that improves the ellipsoidal uncertainty propagation by efficiently optimizing the linear feedback policies used in the predictions. In the stochastic setting, [14] couples the tailored Jacobian approximation with a more accurate uncertainty propagation based on linear-regression Kalman filtering, showing real-time feasible timings for an automotive example. With respect to the aforementioned works, this paper presents a direct extension of [9], [10] to the setting of Gaussian process-based MPC. As such, the improvements presented in [13], [14] can be straightforwardly applied to our results as well.
To the best of the authors' knowledge, [8] is the only publication that applies the tailored Jacobian approximation to a variant of GP-MPC. Therein, the tailored stochastic NMPC algorithm is used in conjunction with an approximate GP based on a finite set of basis functions for an autonomous driving application, where the distributions of the weights are estimated jointly with the system state using a particle filter. Conceptually, the presented work differs from [8] insofar as we apply exact GP inference with an approximate uncertainty propagation, while in [8], a finite-dimensional GP approximation is used and the uncertainty propagation is performed using a particle filter. Moreover, we additionally focus on a theoretical analysis of the convergence properties and provide a high-performance implementation of the proposed algorithm.

II. PROBLEM FORMULATION
We consider discrete-time, nonlinear dynamics of the form
$$x(k+1) = \psi(x(k), u(k)) + B\big(\eta(x(k), u(k)) + w(k)\big), \qquad (1)$$
where $\psi : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}^{n_x}$ and $\eta : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}^{n_w}$ describe the known and unknown parts of the system dynamics as functions of the state $x(k) \in \mathbb{R}^{n_x}$ and input $u(k) \in \mathbb{R}^{n_u}$ at time step $k$, respectively, and are assumed to be twice continuously differentiable. The process noise $w(k) \sim \mathcal{N}(0, \Sigma_w)$ is assumed to be i.i.d., i.e., with diagonal covariance matrix $\Sigma_w = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_{n_w}^2)$; it affects the states via the matrix $B \in \mathbb{R}^{n_x \times n_w}$, which allows for modeling the process noise in a lower-dimensional space. The system is subject to $n_h$ individual chance constraints,
$$\Pr\big(h_j(x(k), u(k)) \leq 0\big) \geq p_j, \qquad (2)$$
for all constraints $j = 1, \ldots, n_h$ and time steps $k$, with satisfaction probability $0 < p_j \leq 1$. This formalism also captures hard constraints, for example on the inputs, by setting $p_j = 1$ for the respective constraint.
The key idea of GP-MPC is to model the unknown dynamics $\eta$ as a Gaussian process $d(x, u)$ with posterior mean $\mu_d : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}^{n_w}$ and posterior covariance $\Sigma_d : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}^{n_w \times n_w}$, respectively. Note that by $d(x, u)$ we denote a GP conditioned on data points already; see, e.g., [15, Chap. 2] for the inference formulas. At every sampling time, the problem to be solved can be formulated as a stochastic OCP¹, where a sequence of control policies $\boldsymbol{\pi} := \{\pi_i\}_{i=0}^{N-1}$, $\pi_i : \mathbb{R}^{n_x} \to \mathbb{R}^{n_u}$, is determined over a given set $\Pi_{\pi_i}$ of admissible policies, minimizing the expected value of a user-defined stage cost $l$ and terminal cost $m$ over a finite time horizon of length $N$:
$$\begin{aligned}
\min_{\boldsymbol{\pi}} \quad & \mathbb{E}\Big[\textstyle\sum_{i=0}^{N-1} l(x_i, \pi_i(x_i)) + m(x_N)\Big] && (4a) \\
\text{s.t.} \quad & x_{i+1} = \psi(x_i, \pi_i(x_i)) + B\big(d(x_i, \pi_i(x_i)) + w_i\big), && (4b) \\
& x_0 = x(k), && (4c) \\
& \pi_i \in \Pi_{\pi_i}, && (4d) \\
& \Pr\big(h_j(x_i, \pi_i(x_i)) \leq 0\big) \geq p_j, && (4e)
\end{aligned}$$
for all $i = 0, \ldots, N-1$ and $j = 1, \ldots, n_h$. As computing exact solutions to the stochastic OCP (4) is generally intractable, in the following we present a common approach to approximate (4) by a deterministic problem, as previously proposed by [6].

A. Mean and covariance propagation
Propagating the uncertainty introduced by the GP model through the nonlinear dynamics model in a computationally efficient way generally proves to be very challenging: as Gaussianity of the state distribution is lost after a nonlinear transformation, so is the possibility to capture the entire distribution by its first two moments. To retain computational tractability, we use a common linearization-based approximation of the propagation of the expected state and covariance, which results in the following deterministic update equations, starting from $\mu^x_0 := x(k)$ and $\Sigma^x_0 := 0$:
$$\begin{aligned}
\mu^x_{i+1} &= \psi(\mu^x_i, u_i) + B\,\mu_d(\mu^x_i, u_i), && (5) \\
\Sigma^x_{i+1} &= \tilde{A}_i\, \Sigma^x_i\, \tilde{A}_i^\top + B\big(\Sigma_d(\mu^x_i, u_i) + \Sigma_w\big)B^\top, && (6)
\end{aligned}$$
where $\tilde{A}_i$ denotes the Jacobian of the mean dynamics $\psi + B\mu_d$ with respect to the state, evaluated at $(\mu^x_i, u_i)$.

¹In (4), terminal constraints have been omitted for simplicity.

B. Chance constraints

Given the covariance propagation above, it is possible to efficiently formulate the individual chance constraints (4e) as deterministic constraints on the mean prediction by a suitable tightening, i.e.,
$$\bar{h}_j(\mu^x_i, u_i, \Sigma^x_i) := h_j(\mu^x_i, u_i) + \alpha_j \sqrt{\nabla_x h_j(\mu^x_i, u_i)^\top\, \Sigma^x_i\, \nabla_x h_j(\mu^x_i, u_i)} \leq 0.$$
For general probability distributions, the tightening factor $\alpha_j := \sqrt{p_j / (1 - p_j)}$ can be chosen based on the Chebyshev inequality; for Gaussian distributions, setting $\alpha_j := \Phi^{-1}(p_j)$, where $\Phi^{-1}(\cdot)$ is the inverse cumulative density function of a standard normal Gaussian variable, is a less conservative choice.
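As a minimal illustration (Python, with hypothetical stand-in dynamics and GP terms, not those of this paper), one scalar propagation step and the two tightening factors look as follows:

```python
import math
from statistics import NormalDist

# Tightening factors for an individual chance constraint with
# satisfaction probability p_j.
def alpha_chebyshev(p):
    # distribution-free choice via the Chebyshev inequality
    return math.sqrt(p / (1.0 - p))

def alpha_gaussian(p):
    # inverse CDF of a standard normal variable (less conservative)
    return NormalDist().inv_cdf(p)

# One linearization-based propagation step for a scalar state:
#   mu+    = psi(mu, u) + B * mu_d(mu, u)
#   Sigma+ = A^2 * Sigma + B^2 * (Sigma_d(mu, u) + Sigma_w)
# where A is the derivative of psi + B*mu_d at (mu, u).
# psi, mu_d, Sigma_d below are illustrative stand-ins.
def propagate(mu, Sigma, u, Sigma_w=0.01, B=1.0):
    psi = 0.9 * mu + 0.5 * u           # known part (hypothetical)
    mu_d = 0.05 * mu ** 2              # GP posterior mean (hypothetical)
    Sigma_d = 0.02 * (1.0 + mu ** 2)   # GP posterior variance (hypothetical)
    A = 0.9 + B * 0.1 * mu             # derivative of psi + B*mu_d w.r.t. mu
    mu_next = psi + B * mu_d
    Sigma_next = A ** 2 * Sigma + B ** 2 * (Sigma_d + Sigma_w)
    return mu_next, Sigma_next

p = 0.95
print(alpha_chebyshev(p))   # ≈ 4.359
print(alpha_gaussian(p))    # ≈ 1.645
```

For $p_j = 0.95$, the Gaussian factor is roughly 2.6 times smaller than the Chebyshev one, which directly translates into a less conservative constraint tightening.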

C. Optimal control problem
With the above simplifications and an approximate formulation of the expected cost, we arrive at the following deterministic approximation of the stochastic OCP (4) in terms of the predicted state mean $\boldsymbol{\mu} := \{\mu^x_i\}_{i=0}^{N}$, state covariance $\boldsymbol{\Sigma} := \{\Sigma^x_i\}_{i=0}^{N}$ and control input sequence $\boldsymbol{u} := \{u_i\}_{i=0}^{N-1}$, cf. [6]:
$$\begin{aligned}
\min_{\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{u}} \quad & \textstyle\sum_{i=0}^{N-1} l(\mu^x_i, u_i) + m(\mu^x_N) && (9a) \\
\text{s.t.} \quad & \mu^x_0 = x(k), \quad \Sigma^x_0 = 0, && (9b) \\
& \mu^x_{i+1} = \psi(\mu^x_i, u_i) + B\,\mu_d(\mu^x_i, u_i), && (9c) \\
& \Sigma^x_{i+1} = \tilde{A}_i\, \Sigma^x_i\, \tilde{A}_i^\top + B\big(\Sigma_d(\mu^x_i, u_i) + \Sigma_w\big)B^\top, && (9d) \\
& \bar{h}_j(\mu^x_i, u_i, \Sigma^x_i) \leq 0, && (9e)
\end{aligned}$$
for all $i = 0, \ldots, N-1$ and $j = 1, \ldots, n_h$, where $\tilde{A}_i$ denotes the Jacobian of the mean dynamics $\psi + B\mu_d$ with respect to the state at $(\mu^x_i, u_i)$.

Remark 1: While optimizing over control input sequences in (9) has computational advantages compared to optimization over a sequence of feedback policies in (4), it can lead to underconfident predictions and an overly conservative controller. As a remedy, linear state feedback can be incorporated into the controller [6]. In that case, hard input constraints can generally not be satisfied and should be replaced with individual chance constraints; the adaptation of the OCPs (4) and (9) is straightforward and can be found in [6].
Fig. 1: Infeasibility arising from fixing the covariances based on the previous MPC instance. The predicted state trajectory and covariances from the previous time step are drawn with solid lines in blue; predicted covariances around the current predicted state trajectory in red; references for the previous/current time step with dashed lines; the infeasible region in light red. When the linearized dynamics at the shooting nodes vary strongly from one time step to another, in this example due to a reference change, fixing the covariances based on the previous MPC instance might lead to significant prediction errors.

Solving (9) in a receding horizon fashion leads to a highly performant and adaptive, yet uncertainty-aware control strategy, as demonstrated by real-world applications such as [3], [5]. Nevertheless, for a real-time GP-MPC implementation, major computational challenges remain.
First, the OCP (9) has $n_x + n_u + (n_x + n_x^2)/2$ optimization variables at each prediction stage, structurally equivalent to OCPs arising from a direct multiple shooting discretization [16]. For this set of problems, among the most competitive solvers at the time of this writing are interior point algorithms exploiting the block-sparse structure of the multiple shooting approach [17]. Still, these algorithms' cubic computational complexity in the number of optimization variables per shooting node leads to a total computational complexity of $\mathcal{O}(n_x^6)$, which becomes prohibitive even for a moderate number of states.
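To make the growth concrete, a quick count of per-stage optimization variables (with $n_u = 3$, as in the benchmark of Section V) can be sketched:

```python
# Per-stage optimization-variable counts for the augmented OCP (9),
# where the symmetric covariance contributes (n_x + n_x^2)/2 states,
# versus the nominal formulation without covariances.
def n_vars_augmented(nx, nu=3):
    return nx + nu + (nx + nx ** 2) // 2

def n_vars_nominal(nx, nu=3):
    return nx + nu

for nx in (9, 21, 39):
    print(nx, n_vars_nominal(nx), n_vars_augmented(nx))
```

With a per-stage solver cost that is cubic in the variable count, the augmented count growing like $n_x^2$ is exactly what produces the overall $\mathcal{O}(n_x^6)$ scaling.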
Second, even without updating the GP data points online, evaluating the GP posterior mean and covariance and their Jacobians in a gradient-based optimization generally scales quadratically with the number of data points [15], posing strict limitations on the number of data points amenable to online inference. In this regard, solving (9) is particularly demanding, as computing the Jacobians of the covariance propagation constraints (9d) requires not only the Jacobians of the nominal and GP dynamics, but also their respective second-order derivatives.
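For intuition on these costs, consider a minimal pure-Python sketch of exact GP regression in one dimension (squared-exponential kernel, made-up data, not the paper's implementation): after a one-off $\mathcal{O}(D^3)$ Cholesky factorization, each posterior mean query costs $\mathcal{O}(D)$ and each variance query $\mathcal{O}(D^2)$:

```python
import math

def k_se(a, b, ell=1.0, sf2=1.0):
    # squared-exponential kernel
    return sf2 * math.exp(-0.5 * ((a - b) / ell) ** 2)

def cholesky(K):
    # dense Cholesky factorization K = L L^T, O(D^3)
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L

def solve_chol(L, rhs):
    # solve L L^T x = rhs by forward/backward substitution, O(D^2)
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(L[i][m] * y[m] for m in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x

X = [0.0, 0.5, 1.0, 1.5]      # training inputs (illustrative)
Y = [0.0, 0.8, 0.9, 0.1]      # training targets (illustrative)
sn2 = 1e-8                    # observation noise variance
K = [[k_se(a, b) + (sn2 if i == j else 0.0)
      for j, b in enumerate(X)] for i, a in enumerate(X)]
L = cholesky(K)
alpha = solve_chol(L, Y)      # K^{-1} Y, precomputed once

def posterior(xq):
    ks = [k_se(xq, xi) for xi in X]                        # O(D)
    mean = sum(ki * ai for ki, ai in zip(ks, alpha))       # O(D)
    v = solve_chol(L, ks)                                  # O(D^2)
    var = k_se(xq, xq) - sum(ki * vi for ki, vi in zip(ks, v))
    return mean, var

m, v = posterior(0.5)
```

Derivatives of `posterior` (and, for (9d), second derivatives of the mean) inherit the same per-query scaling in $D$, which is what motivates offloading these evaluations to a parallel AD framework.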
A popular heuristic to ensure computational tractability despite the limitations stated above has been to propagate the state covariance (9d) outside the optimizer, based on input and state predictions from the last MPC iteration [3], [6]. Fixing the covariance matrices $\Sigma^x_i$ at each stage in (9) eliminates the corresponding $(n_x + n_x^2)/2$ augmented states, as well as the need to compute the equality constraint Jacobian derived from (9d). However, this heuristic generally does not lead to feasible solutions for problem (9), as Fig. 1 illustrates.
To address the aforementioned challenges, we propose a tailored sequential quadratic programming (SQP) algorithm that computes suboptimal, yet feasible, solutions to (9) with drastically reduced computational footprint.

III. ZERO-ORDER METHOD FOR GP-MPC
In the following, we present a tailored optimization method for Gaussian process-based MPC based on inexact sequential quadratic programming. To avoid excessive notation in the following exposition, let us rewrite the GP-MPC optimal control problem (9) in the compact form
$$\begin{aligned}
\min_{y, P} \quad & f(y) && (10a) \\
\text{s.t.} \quad & 0 = f_\mu(y), && (10b) \\
& 0 = g(y, P), && (10c) \\
& 0 \geq h(y, P). && (10d)
\end{aligned}$$
Thereby we have concatenated all mean states and inputs into $y := (y_0, \ldots, y_{N-1}, \mu^x_N)$, with $y_i := (\mu^x_i, u_i)$, and vectorized the covariances into $P := (\mathrm{vec}(\Sigma^x_1), \ldots, \mathrm{vec}(\Sigma^x_N))$, where $\mathrm{vec}(\cdot)$ is the column-wise vectorization operator. The equality constraints (10b) and (10c) collect the mean propagation (9c) and the covariance propagation (9d) together with the initial conditions (9b), respectively, and (10d) collects the tightened inequality constraints (9e).

A. Sequential quadratic programming
Sequential quadratic programming [18] finds a solution to (10) by iteratively solving quadratic programs that locally approximate the original NLP at the current solution estimate $(\hat{y}, \hat{P})$. At each iteration, the solution $(\Delta y, \Delta P)$ of the quadratic subproblem
$$\begin{aligned}
\min_{\Delta y, \Delta P} \quad & \tfrac{1}{2} \Delta y^\top M_{yy} \Delta y + \nabla_y f(\hat{y})^\top \Delta y && (11a) \\
\text{s.t.} \quad & 0 = g(\hat{y}, \hat{P}) + \tfrac{\partial g}{\partial y} \Delta y + \tfrac{\partial g}{\partial P} \Delta P, && (11b) \\
& 0 = f_\mu(\hat{y}) + \tfrac{\partial f_\mu}{\partial y} \Delta y, && (11c) \\
& 0 \geq h(\hat{y}, \hat{P}) + \tfrac{\partial h}{\partial y} \Delta y + \tfrac{\partial h}{\partial P} \Delta P, && (11d)
\end{aligned}$$
with all Jacobians evaluated at the current linearization point, is then used to update the current solution estimate, $(\hat{y}, \hat{P}) \leftarrow (\hat{y} + \Delta y, \hat{P} + \Delta P)$, where $M_{yy}$ denotes the chosen approximation of the Hessian of the Lagrangian of (10). The core SQP algorithm is summarized in Alg. 1. Computationally, the SQP approach is ideally suited for NMPC, as the computational load can be reduced significantly by running only a fixed number of iterations before applying the control input to the plant, usually with negligible effect on the control performance, cf. the Real-Time Iteration [19]. Still, as the number of optimization variables at each shooting node from (9) carries over to the quadratic subproblems in (11) when employing a sparsity-exploiting interior point solver, their cubic scaling with respect to the number of augmented states, i.e., $\mathcal{O}(n_x^6)$ with respect to the state dimension, poses a major limitation on the nominal system dimension amenable to a GP-MPC application.

B. Zero-order algorithm
To alleviate the limited scalability of optimizing over the state covariance matrices in GP-MPC, we employ an inexact SQP method that has been initially presented for stochastic [9] and robust NMPC [10]. The core idea is that, by using a tailored Jacobian approximation in the SQP algorithm, the equality constraints in (11) corresponding to the covariance propagation (10c) can be decoupled from the optimization problem. This allows for subsequent elimination of the associated optimization variables ∆P from the QP (11), which we present in the following.
The tailored Jacobian approximation is obtained by neglecting the Jacobian of the covariance propagation (10c) with respect to the nominal variables $y$, i.e., by setting $\frac{\partial g}{\partial y} := 0$ in (11b). This leads to a zero-order approximation of the corresponding equality constraints at each SQP iteration. Under the approximate Jacobian, rearranging (11b) yields an approximation of $\Delta P$ based solely on the current linearization point $(\hat{y}, \hat{P})$, i.e.,
$$\Delta \tilde{P} = -\Big(\tfrac{\partial g}{\partial P}(\hat{y}, \hat{P})\Big)^{-1} g(\hat{y}, \hat{P}), \qquad (12)$$
which can be used to eliminate $\Delta P \approx \Delta \tilde{P}$ from the QP subproblems (11). Instead of solving the linear system in (12), however, let us show how $\Delta \tilde{P}$ may be obtained more efficiently and intuitively by making use of the fact that $g(y, P)$ corresponds to a vectorized version of the covariance propagation (6). We start by noting that the corresponding vectorized equality constraints (10c) may be written as
$$0 = A(y) P - b(y), \qquad (13)$$
where $A(y) \in \mathbb{R}^{n_x^2 N \times n_x^2 N}$ is invertible and contains the system dynamics, and $b(y) \in \mathbb{R}^{n_x^2 N}$ contains the vectorized process noise and GP posterior covariances for all prediction steps; see Appendix A for an explicit construction. Due to linearity of $g(y, P)$ in $P$, obtaining $\Delta \tilde{P}$ in (12) based on the current linearization point $(\hat{y}, \hat{P})$ is equivalent to solving
$$A(\hat{y})\, \hat{P}^+ = b(\hat{y}) \qquad (14)$$
for $\hat{P}^+$ and setting $\Delta \tilde{P} := \hat{P}^+ - \hat{P}$. Since (13) is obtained by vectorization and stacking of (6) for all stages, it can easily be verified that solving for $\hat{P}^+$ corresponds to propagating the predicted covariance as given by equation (6) based on the current iterate $\hat{y}$ and initial covariance $\Sigma^x_0 := 0$, followed by an application of the vectorization operator.
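The block-triangular structure that makes (13) solvable by forward propagation can be illustrated for a scalar state; the stagewise coefficients below are made up, and the dense solve is included only to verify the equivalence with the covariance recursion (6):

```python
# For a scalar state, the stacked constraint 0 = A(y)P - b(y) of (13)
# with Sigma_0 = 0 reads Sigma_{i+1} = a_i^2 * Sigma_i + b_i, i.e. A has
# a unit diagonal and -a_i^2 on the subdiagonal. Solving A P+ = b is
# therefore just a forward recursion; no general linear solve is needed.
N = 5
a = [0.9, 1.1, 0.8, 1.0, 0.95]     # stagewise Jacobians of the mean dynamics
b = [0.10, 0.05, 0.20, 0.15, 0.08]  # process noise + GP covariance terms

# explicit lower-bidiagonal A and generic forward substitution
A = [[0.0] * N for _ in range(N)]
for i in range(N):
    A[i][i] = 1.0
    if i > 0:
        A[i][i - 1] = -a[i] ** 2
P_solve = [0.0] * N
for i in range(N):
    P_solve[i] = b[i] - sum(A[i][j] * P_solve[j] for j in range(i))

# equivalent covariance recursion (6), starting from Sigma_0 = 0
P_recur, Sigma = [], 0.0
for i in range(N):
    Sigma = a[i] ** 2 * Sigma + b[i]
    P_recur.append(Sigma)

assert all(abs(p - q) < 1e-12 for p, q in zip(P_solve, P_recur))
```

In the matrix-valued case the same argument applies blockwise, which is why $\hat{P}^+$ can be obtained at the cost of one covariance propagation sweep per SQP iteration.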
After eliminating $\Delta P \approx \Delta \tilde{P}$, problem (11) reduces to a QP in the nominal variables only:
$$\begin{aligned}
\min_{\Delta y} \quad & \tfrac{1}{2} \Delta y^\top M_{yy} \Delta y + \nabla_y f(\hat{y})^\top \Delta y && (15a) \\
\text{s.t.} \quad & 0 = f_\mu(\hat{y}) + \tfrac{\partial f_\mu}{\partial y} \Delta y, && (15b) \\
& 0 \geq h(\hat{y}, \hat{P}^+) + \tfrac{\partial h}{\partial y} \Delta y. && (15c)
\end{aligned}$$
The modified iteration of the SQP algorithm alternates the solution of QP (15) with the covariance propagation based on (6) until convergence, as summarized in Alg. 2. Due to the reduced size of the QPs, we recover the computational complexity of $\mathcal{O}(n_x^3)$ when employing sparsity-exploiting interior point QP solvers. This provides a drastic improvement compared with the complexity of $\mathcal{O}(n_x^6)$ for the QPs in the original problem (11), as also demonstrated by the numerical examples in Section V.
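The resulting alternation (propagate covariances at the current iterate, then solve a reduced problem in the nominal variables only) can be sketched on a hypothetical scalar toy problem; the cost, the covariance model, and the closed-form "QP" step below are all stand-ins, not the paper's formulation, and only the fixed-point structure mirrors Alg. 2:

```python
import math

# Toy problem: minimize (y - y_ref)^2 subject to the tightened constraint
# y + alpha * sqrt(P(y)) <= y_max, where the "propagated covariance"
# P(y) = s2 * (1 + 0.1 * y^2) depends on the linearization point.
def zero_order_iteration(y_ref=2.0, y_max=1.5, alpha=1.645, s2=0.01,
                         tol=1e-10, max_iter=100):
    y = 0.0
    for _ in range(max_iter):
        P = s2 * (1.0 + 0.1 * y ** 2)            # covariance propagation step
        y_bound = y_max - alpha * math.sqrt(P)   # constraint tightening
        y_new = min(y_ref, y_bound)              # trivial "QP": clipped target
        if abs(y_new - y) < tol:
            return y_new, P
        y = y_new
    return y, P

y_star, P_star = zero_order_iteration()
# At convergence, the iterate is feasible for the coupled problem,
# i.e. y_star + alpha * sqrt(P(y_star)) <= y_max (up to tolerance).
```

The fixed point couples the nominal solution and the tightening through $P(y)$, which is precisely the property that makes the converged iterate feasible for the original coupled problem despite the inexact Jacobian.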

IV. LOCAL CONVERGENCE PROPERTIES
At convergence, the solution obtained from Alg. 2 is guaranteed to be feasible, albeit suboptimal, for the original NLP (9) [10], [19]. This is in contrast to the existing heuristics discussed above and illustrated in Fig. 1, for which feasibility of the obtained solution with respect to (9) cannot be guaranteed.
Whether Alg. 2 converges to a feasible point of (9), however, depends on the error in the Jacobian approximation. In [10], this has been studied for equality constraints of the form $0 = A(y)P - \bar{\sigma}\, b(y)$, with a scalar uncertainty parameter $\bar{\sigma} > 0$. In particular, it has been shown that for $\bar{\sigma} \to 0$, the tailored Jacobian approximation does not deteriorate the local convergence properties of the SQP algorithm. In the following, we transfer these results to the case where $b(y)$ encodes uncertainty based on the GP posterior covariance by defining a parameter $\sigma := \lVert b(\bar{y}) \rVert$, which implicitly bounds the process noise and GP posterior covariance at a fixed point $(\bar{y}, \bar{P})$ of Alg. 2. This way, for GPs based on a twice continuously differentiable kernel, we can show that for small enough process noise and GP posterior covariance, applying the tailored Jacobian approximation to GP-MPC preserves the local convergence properties of the SQP algorithm.
To this end, we first provide a brief review of the standard arguments for local convergence analysis of inexact Newton-type methods using strongly regular generalized equations.

A. Generalized equations
Generalized equations [20] allow us to reformulate the KKT conditions of (10) compactly as the set inclusion
$$0 \in F(z) + N_K(z), \qquad (16)$$
where
$$F(z) := \begin{pmatrix} \nabla_{(y,P)} L(z) \\ f_\mu(y) \\ g(y, P) \\ h(y, P) \end{pmatrix} \qquad (17)$$
and $z := (y, P, \lambda_\mu, \lambda_\Sigma, \nu)$ contains the primal variables $y$, $P$, and the Lagrange multipliers $\lambda_\mu$, $\lambda_\Sigma$, $\nu$ corresponding to the constraints (10b)-(10d), respectively. Thereby, $N_K(z)$ is defined as the normal cone to the set $K := \mathbb{R}^{n_y + n_P} \times \mathbb{R}^{n_f} \times \mathbb{R}^{n_g} \times \mathbb{R}^{n_h}_+$ at $z$. While solutions to (16) are KKT points of the original problem (9), the suboptimal solution obtained by Alg. 2 will instead satisfy the perturbed generalized equation
$$0 \in \tilde{F}(z) + N_K(z), \qquad (18)$$
where
$$\tilde{F}(z) := \begin{pmatrix} \nabla_{(y,P)} \tilde{L}(z) \\ f_\mu(y) \\ g(y, P) \\ h(y, P) \end{pmatrix}. \qquad (19)$$
Evidently, the suboptimality is thereby caused by the inexact Jacobian approximation used in the QP subproblems, leading to a perturbed Lagrangian gradient
$$\nabla_{(y,P)} \tilde{L}(z) = \nabla_{(y,P)} L(z) - \begin{pmatrix} \frac{\partial}{\partial y}\big(A(y)P - b(y)\big)^{\!\top} \lambda_\Sigma \\ 0 \end{pmatrix} \qquad (20)$$
in the stationarity conditions of (19), as opposed to the exact gradient in (17). Every iterate of Alg. 2 solves the linearized generalized equation
$$0 \in \tilde{F}(\hat{z}) + J(\hat{z})(z - \hat{z}) + N_K(z), \qquad (21)$$
where $J(\hat{z}) \approx \frac{\partial \tilde{F}}{\partial z}\big|_{z = \hat{z}}$ is the Jacobian approximation around the linearization point $\hat{z}$.
To obtain a sufficient local convergence criterion for Alg. 2 to a solution $\bar{z} := (\bar{y}, \bar{P}, \bar{\lambda}_\mu, \bar{\lambda}_\Sigma, \bar{\nu})$ of (18), we need the following key requirements to be satisfied. Thereby, $B(z, r)$ denotes a ball of radius $r \in \mathbb{R}$ centered at $z \in \mathbb{R}^{n_z}$.

Assumption 1: The perturbed generalized equation (18) is strongly regular at $\bar{z}$ in the sense of [20].

Assumption 2: The error of the Jacobian approximation, $\lVert J(\hat{z}) - \frac{\partial \tilde{F}}{\partial z}(\hat{z}) \rVert$, is sufficiently small in a neighborhood of $\bar{z}$.
Lemma 1 (cf. [22], Lemma 2): Let Assumptions 1 and 2 hold. Denote by $z_+$ a solution to (21) constructed at the linearization point $\hat{z}$. Then, there exist strictly positive constants $\kappa < 1$ and $r_\kappa$ such that, for any $\hat{z} \in B(\bar{z}, r_\kappa)$, it holds that
$$\lVert z_+ - \bar{z} \rVert \leq \kappa\, \lVert \hat{z} - \bar{z} \rVert.$$
Note that for Newton-type optimization, Assumption 1 is standard and could be replaced by the stronger, but more frequently encountered, assumptions of the linear independence constraint qualification and the strong second-order sufficient condition, see [20, Thm. 4.1]. Regarding Assumption 2, the goal of the following section is to show that it is satisfied for sufficiently small process noise and GP posterior covariance matrices, ensuring local convergence of Alg. 2 by Lemma 1.

B. Local convergence for small uncertainties
The difference between the approximate and the exact Jacobian of (19), evaluated at the linearization point $\hat{z}$ and a solution $\bar{z}$ of (18), respectively, splits into the error of the Hessian approximation $M_{yy}$ and the error induced by the neglected Jacobian $\frac{\partial g}{\partial y}$ of the covariance propagation (24). Hence, from Lemma 1 and equation (24) we can deduce local convergence in a neighborhood around $\bar{z}$ if the errors in the Hessian and tailored Jacobian approximations are sufficiently small. As only the latter is essential to our method, we focus on the Jacobian difference induced by the tailored Jacobian approximation; see, e.g., [18] for a discussion of the Hessian approximation's role in SQP methods.
To derive the result, let us introduce an auxiliary generalized equation, parametrized by $\bar{b} \in \mathbb{R}^{n_x^2 N}$,
$$0 \in \tilde{F}(z; \bar{b}) + N_K(z), \qquad (25)$$
where $\tilde{F}(z; \bar{b})$ is obtained from $\tilde{F}(z)$ by replacing $b(y)$ with the fixed parameter $\bar{b}$. The parameter $\bar{b}$ serves as a proxy for the uncertainty at a solution $\bar{z}_b$ of (25) induced by the process noise $\Sigma_w$ and the GP posterior covariances $\Sigma_d(\bar{y}_i)$ at all shooting nodes $i = 0, \ldots, N-1$ in (9). As such, if $\bar{b} = b(\bar{y})$, then a solution $\bar{z}$ of (18) is also a solution of (25). Furthermore, letting $\bar{b} \to 0$ allows us to study the local convergence properties of Alg. 2 for vanishing process noise and GP posterior covariances in the neighborhood of $\bar{z}$, denoting by $\bar{z}_b : \mathbb{R}^{n_x^2 N} \to \mathbb{R}^{n_z}$, $\bar{b} \mapsto \bar{z}_b(\bar{b})$, a single-valued solution map of (25). The main result makes use of the following two regularity assumptions.
Assumption 3: Let (25) be strongly regular at $\bar{z}_b(0)$.

Assumption 4: Let $b(y)$ be twice continuously differentiable for all $y \in \mathbb{R}^{n_y}$.

Assumption 3 is standard and analogous to Assumption 1 for vanishing process noise and GP posterior covariances at a solution of (18). Assumption 4 can also be rephrased in terms of the regularity of the GP kernel function: as $b(y)$ denotes the vectorized process noise and GP posterior covariances, the latter of which are a linear combination of kernel function evaluations [15], we restrict ourselves to twice continuously differentiable kernels. This captures many of the kernels commonly used in practice, such as squared exponential, linear, or Matérn kernels with $\nu \geq 5/2$.

Lemma 2: Let Assumptions 3 and 4 hold and define $\sigma := \lVert b(\bar{y}) \rVert$. Then, for any $\varepsilon > 0$, there exists a $\delta \in \mathbb{R}$ such that if $\sigma \leq \delta$, it holds that
$$\Big\lVert \tfrac{\partial}{\partial y}\big(A(y)\bar{P}\big)\big|_{y = \bar{y}} \Big\rVert + \Big\lVert \tfrac{\partial b}{\partial y}(\bar{y}) \Big\rVert \leq \varepsilon. \qquad (27)$$

Proof: We will show the above implication by considering both summands inside (27) separately.
For the second summand, consider the Taylor expansion of the $i$-th component of $b(y)$ around the expansion point $\bar{y}$, evaluated at $y_t := \bar{y} + \delta_y \cdot t$ for any unit vector $\delta_y$ with $\lVert \delta_y \rVert = 1$ and strictly positive scalar $t \in \mathbb{R}$:
$$b_i(y_t) = b_i(\bar{y}) + t\, \nabla b_i(\bar{y})^\top \delta_y + \tfrac{t^2}{2}\, \delta_y^\top \nabla^2 b_i(\xi)\, \delta_y. \qquad (28)$$
Thereby, the Hessian is evaluated at some $\xi \in \Xi$, with the set $\Xi := \{\bar{y} + s(y_t - \bar{y}) \mid s \in [0, 1],\ \bar{y}, y_t \in \mathbb{R}^{n_y}\}$. Since $b_i(y)$ is twice continuously differentiable, there exists some $M_2 \in \mathbb{R}$ such that, for all $\xi \in \Xi$,
$$\lVert \nabla^2 b_i(\xi) \rVert \leq M_2. \qquad (29)$$
From (28), by taking the absolute value, dividing by $t > 0$, and inserting the definitions of $\sigma$ and $M_2$, we then obtain the bound (30) on $|\nabla b_i(\bar{y})^\top \delta_y|$. Since (28) holds for all $t > 0$ and all $\delta_y$ with $\lVert \delta_y \rVert = 1$, taking the supremum over $\delta_y$ on the left-hand side and the infimum over $t$ on the right-hand side preserves inequality (30), yielding (31). By applying the definition of the induced matrix norm on the left, and solving for the value of the infimum on the right-hand side, (31) simplifies to (32). Thus, we have shown that the norms of both summands in (27) scale with $\mathcal{O}(\sigma) = \mathcal{O}(\lVert b(\bar{y}) \rVert)$, which proves the assertion by the triangle inequality and the definition of $\mathcal{O}(\sigma)$.
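The mechanism behind the bound on the second summand, namely that a nonnegative, twice continuously differentiable function with a small value at a point must also have a small gradient there, can be checked numerically. The quadratic below is a hypothetical stand-in for one component $b_i(y)$; its Hessian is globally bounded by $M_2 = 2$, and the classical estimate $|b_i'(\bar{y})| \leq \sqrt{2 M_2\, b_i(\bar{y})}$ follows from a Taylor expansion like (28) by optimizing over the step length $t$:

```python
import math

def check(t_shift, s):
    # b(y) = (y - t_shift)^2 + s is nonnegative everywhere (s >= 0),
    # with minimum value s and |b''| = 2 globally, so M2 = 2.
    M2 = 2.0
    val = t_shift ** 2 + s       # b(0): the uncertainty proxy sigma
    grad = abs(-2.0 * t_shift)   # |b'(0)|
    assert grad <= math.sqrt(2.0 * M2 * val) + 1e-12
    return val, grad

# shrinking sigma = b(ybar) shrinks the admissible gradient magnitude
for t_shift in (0.5, 0.05, 0.005):
    print(check(t_shift, 1e-4))
```

As the value at the expansion point shrinks, the gradient bound shrinks with it, which is exactly how a small $\sigma$ forces the neglected Jacobian term to become small.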
To summarize, in tandem with Lemma 1 and a sufficiently accurate Hessian approximation $M_{yy}$ across all SQP iterates, Lemma 2 establishes guaranteed convergence of Alg. 2 to a solution $\bar{z}$ of (18) for sufficiently small process noise and GP covariances in a local neighborhood, and practical convergence depending on the regularity of (18) in terms of the Lipschitz constant $\gamma$.

V. NUMERICAL RESULTS
Alg. 2 has been prototyped in Python using the corresponding acados interface. For the nominal dynamics, evaluation and sensitivity computation are thereby performed using an acados integrator and just-in-time compiled CasADi functions [23]; for the GP mean and covariance, the corresponding computations are carried out with PyTorch [11] using the GPyTorch library [12]. In the following, we compare different variants of the proposed algorithm against a "naïve" GP-MPC implementation and nominal MPC using a scalable benchmarking example.

A. Hanging chain example
To test the scaling properties of Alg. 2, we apply it to a slightly modified variant of the hanging chain example, a popular benchmark for numerical methods for NMPC [10], [17]. Parameter values used in the following description are given in Table I. The system is defined by a chain of masses $m$ connected by linear springs with stiffness $k$. The mass at one end of the chain is tied to the origin; the $n_u = 3$ velocity components of the mass at the other end constitute the control inputs of the system. The system state is given by the position components of the controlled mass as well as the position and velocity components of the intermediate masses, resulting in a state space dimension of $n_x = 6(n_{\text{mass}} - 2) + 3$. The initial state of the chain is computed based on its resting position, where the controlled end is placed at $(x_{\text{init}}, y_{\text{init}}, z_{\text{init}}) := (6l(n_{\text{mass}} - 1), 0, 0)$. After applying a control input of $u_{\text{init}} := (1, 1, 1)$ for $T_{\text{init}} := 1\,\mathrm{s}$, the control task is to restore the chain's resting position while ensuring that none of the masses violates a wall constraint at $y_{\text{wall}}$. The example is modified by adding a latent force, a function of the position $x$ and velocity $v_x$ along the $x$-axis of the frame, to the $y$-acceleration of each intermediate mass. The continuous-time dynamics are discretized using an implicit Runge-Kutta integrator with time step $T_s$. A Gauss-Newton Hessian approximation $M_{yy}$ is employed. The model mismatch on each of the $n_w := 3(n_{\text{mass}} - 2)$ velocity states of the intermediate masses is captured using independent GPs with a squared exponential kernel. Training data is generated by recording $D := 15 N_{x_0}$ samples of the model mismatch from closed-loop simulations with a nominal model predictive controller that only considers the nominal dynamics and no constraint tightening, starting from $N_{x_0}$ perturbed initial conditions.
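For reference, the dimension bookkeeping implied by the description above can be tabulated directly:

```python
# State, input, and GP output dimensions of the hanging-chain benchmark:
# positions of the controlled mass plus positions and velocities of the
# intermediate masses.
def nx(n_mass):
    return 6 * (n_mass - 2) + 3

def nw(n_mass):
    return 3 * (n_mass - 2)

for n_mass in range(3, 9):
    print(n_mass, nx(n_mass), nw(n_mass))
```

In particular, the state dimensions reported in the experiments below (e.g., $n_x = 33$ and $n_x = 39$) follow from this formula for increasing numbers of masses.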
The following experiments were performed using an Intel i9-7940X processor running at 3.10 GHz and an NVIDIA GeForce RTX 2080 Ti GPU.
B. Timings for increasing state and GP output dimension

Fig. 2 shows the scaling of the mean computation times per SQP iteration as the number of states $n_x$ is increased by adding masses to the chain, for the following controller implementations:
• "nominal": using only the nominal model and no uncertainty description,
• "naïve": solving (9) exactly without any data points (no GP-related computations³), including the covariances as optimization variables in an augmented state,
• "alg2-cpuC-D": Alg. 2 with D data points and GP inference and AD on C CPU cores, and
• "alg2-gpu-D": Alg. 2 with D data points and GP inference and AD performed on the GPU.
It becomes evident that the scaling of order $\mathcal{O}(n_x^6)$ for the "naïve" method quickly becomes prohibitive in terms of computation time. In comparison, for $n_x = 39$, a roughly 1000-fold speed-up by means of the zero-order optimization strategy can be observed. Comparing the computation times of Alg. 2 for different numbers of data points clearly shows the effect of GPU acceleration, which accounts for a 5-10-fold speed-up and better scaling properties per SQP iteration for $n_x \geq 33$ and the associated GP output dimension.

³Due to software limitations, for a fair comparison the "naïve" GP-MPC implementation only makes use of the GP prior, i.e., $\mu_d(y) \equiv 0$ and $\Sigma_d(y) \equiv \text{const.}$; it thus corresponds to a stochastic NMPC implementation with the process noise covariance inflated by the GP prior covariance. The presented timings can therefore be viewed as a lower bound on the actual timings when solving (9) with a GP conditioned on data, which would additionally require not only GP posterior evaluations and derivative computations, but also expensive computations associated with the Hessian of the GP posterior mean.

C. Runtime distribution

Fig. 3 breaks down the runtime per SQP iteration across the following parts of the algorithm:
• "acd itf": acados interface for get/set operations,
• "acd itg": acados implicit Runge-Kutta integrator,
• "acd qp": acados QP solve of (15) using HPIPM [24],
• "con tight": NumPy covariance propagation (6) and constraint tightening for (15c),
• "gpytorch": GPyTorch GP inference and AD.
For 150 data points, Fig. 3 shows that the overhead introduced by parallelizing gpytorch operations is higher than the computational speed-ups, especially on the GPU. For 1500 data points, however, the parallelization leads to significantly lower computation times, saving about 60-70% of the computational costs associated with the GPs, which at this point dominate the computational footprint of the method.
For 0 data points, the constraint tightening computations leave room for improvement; they could be accelerated by switching to a C/C++ implementation.

VI. CONCLUSIONS
To make GP-MPC computationally tractable, both the complexity of solving the OCP and the cost of evaluating and differentiating the GP posterior mean and covariance need to be addressed. To tackle these challenges, this paper has presented an inexact SQP approach with a tailored Jacobian approximation, while parallelizing GP inference and differentiation routines on a GPU. The results demonstrate that drastically reduced computation times can be achieved while ensuring feasibility of the converged iterates and maintaining favorable convergence properties, pushing the computational boundaries for applying GP-MPC in real-world scenarios.