Stochastic data-driven model predictive control using Gaussian processes

Nonlinear model predictive control (NMPC) is one of the few control methods that can handle multivariable nonlinear control systems with constraints. Gaussian processes (GPs) present a powerful tool to identify the required plant model and quantify the residual uncertainty of the plant-model mismatch given its probabilistic nature . It is crucial to account for this uncertainty, since it may lead to worse control performance and constraint violations. In this paper we propose a new method to design a GP-based NMPC algorithm for finite horizon control problems. The method generates Monte Carlo samples of the GP offline for constraint tightening using back-offs. The tightened constraints then guarantee the satisfaction of joint chance constraints online. Advantages of our proposed approach over existing methods include fast online evaluation time, consideration of closed-loop behaviour, and the possibility to alleviate conservativeness by accounting for both online learning and state dependency of the uncertainty. The algorithm is verified on a challenging semi-batch bioprocess case study, with its high performance thoroughly demonstrated.


Introduction
Model predictive control (MPC) describes an advanced control method that has found a wide range of applications in industry. MPC employs an explicit dynamic model of the plant to determine a finite sequence of control actions to take at each sampling time. The main advantage of MPC is its ability to deal with multivariate plants and process constraints explicitly [1]. Linear MPC is relatively mature and well-established in practice, however many systems display strong nonlinear behaviour motivating the use of nonlinear MPC (NMPC) [2]. NMPC is becoming progressively more popular due to the advancement of improved non-convex optimization algorithms [3], in particular in chemical engineering [4].
The performance of MPC is however greatly influenced by the accuracy of the plant model, which is one of the main reasons why MPC is not more widely used in industry [5]. The development of an accurate plant model has been cited to take up to 80% of the MPC commissioning effort [6]. NMPC algorithms exploit various types of models, commonly developed by first principles or based on process mechanisms [7]. Many mechanistic and empirical models are however often too complex to be used online and in addition have often high development costs. Alternatively, black-box identification models can be exploited instead, such as support vector machines [8], fuzzy models [9], neural network models [10], or Gaussian processes (GPs) [11].
GPs are an interpolation technique developed by Krige [12] that were popularized by the machine learning community [13]. While GPs have been predominantly used to model static nonlinearities, there are several works that apply GPs to model dynamic systems [14,15,16]. GP predictions are given by a Gaussian distribution. The mean of this distribution can be viewed as a deterministic prediction, while the variance can be interpreted as a measure of uncertainty for this deterministic prediction. This uncertainty measure is generally difficult to obtain by nonlinear parametric models [11] and may in part explain the relative popularity of GPs. For control applications this uncertainty measure can be utilized to efficiently learn a dynamic model by exploring unknown regions or avoiding regions with too high uncertainty to improve robustness [17]. So far, GPs have been exploited in a multitude of ways in the control community, including reinforcement learning [18], designing robust linear controllers [19], or adaptive control [20]. In particular, GPs have been shown to be an efficient approach to attain approximate plant models for NMPC.
The use of GPs for NMPC was first proposed in Murray-Smith et al. [21], which updates a GP model online for reference tracking without constraints. In Kocijan et al. [11], Kocijan and Murray-Smith [22] the GP is instead identified offline and utilized online, in which the variance is constrained to prevent the NMPC from steering the dynamic system into regions with high uncertainty. A GP plant model is updated online in Klenske et al. [23] and in Maciejowski and Yang [24] to overcome unmodeled periodic errors or changes to the dynamic system after a fault has occurred respectively. GPs have been shown in Wang et al. [25] to be an efficient means for disturbance forecasting for a linear stochastic MPC approach applied to a drinking water network. GPs have further been applied to approximate the mean and variance required in stochastic NMPC [26]. Grancharova et al. [27] derived an explicit solution for GP-based NMPC. In Cao et al. [28] a GP dynamic model is employed for the control of an unmanned quadrotor, while in Likar and Kocijan [29] a GP dynamic model is exploited to control a gas-liquid separation process. While these and other works show the feasibility of GP-based MPC, there is a lack of efficient approaches to account for the uncertainty measure provided. Model uncertainty can lead to constraint violations and worse performance. To mitigate the effect of uncertainty on MPC, robust MPC [30] and stochastic MPC [31,32] methods have been developed.
The majority of works for GP-based MPC indeed consider the uncertainty measure provided, however most proposed algorithms employ stochastic uncertainty propagation to achieve this, for example [11,22,33,28,27,25]. Hewing and Zeilinger [34] give an overview of the various stochastic propagation techniques available. These approaches have some considerable disadvantages, which are: • No known methods to exactly propagate stochastic uncertainties through GP models. Instead, only approximations are available relying on linearization or statistical moment-matching.
• Increased computational time of GP-based MPC due to the propagation approach itself.
• Most works consider only open-loop propagation of uncertainties, which is often prohibitively conservative due to open-loop growth of uncertainties.
Recently some papers have proposed different robust GP-based MPC algorithms. In Koller et al. [35] a NMPC algorithm is introduced based on propagating ellipsoidal sets using linearization, that provides closed-loop stability guarantees. This approach may however suffer from increased computational times, since the ellipsoidal sets are propagated online. Furthermore, the method may be relatively conservative due to the use of Lipschitz constants. Maiworm et al. [36] propose the use of a robust MPC approach by bounding the one-step ahead error, while the determination of the required parameters seems to be relatively difficult. Soloperto et al. [37] suggest a robust control approach for linear systems, in which the GP is used to represent unmodeled nonlinearities. The approach is shown to stabilize the linear system despite these uncertainties, which however may have no solution if the difference between the linearized system and the actual nonlinear system is too large.
In this paper we extend an algorithm first introduced in Bradford et al. [38], for which the following extensions were made: • Inclusion of uncertainty for the initial state.
• Adding additive disturbance noise to the problem definition.
• Accounting for state dependency on the GP noise.
• Improved algorithm to obtain the required back-offs.
The aim of this approach is to take into account the uncertainty given by a GP state space model for a NMPC finite-horizon control problem. Due to the issues using stochastic uncertainty propagation for the NMPC formulation as highlight previously, we base the NMPC only on cheap evaluations of the GP. This leads to considerably faster evaluation times with little effect on the performance. The proposed method utilizes explicit back-offs, which were recently proposed in [39,40] to account for stochastic uncertainties in NMPC. These methods generally rely on generating closed-loop Monte Carlo (MC) samples offline from the plant to attain the required back-off values. To obtain exact MC samples of the GP dynamic models we exploit results from Conti et al. [41], Umlauft et al. [42]. There are several important advantages of this new method: • Back-offs are attained using closed-loop simulations, therefore the issue of open-loop growth of uncertainties is avoided.
• Required computations are carried out offline, such that the online computational times are nearly unaffected.
• Independence of samples allows some probabilistic guarantees to be given.
• Explicit consideration of online learning and state dependency of the uncertainty to alleviate conservativeness.
The algorithm proposed is aimed at finite-horizon control problems, for which batch processes are a particularly important example. They are utilized in many different chemical engineering sectors due to their inherent flexibility to deal with variations in feedstock, product specifications, and market demand. Frequent highly nonlinear behaviour and unsteady-state operation of batch processes have led to the increased acceptance for advanced control solutions, such as NMPC [43]. Works on batch process NMPC accounting for uncertainties include an extended and Unscented Kalman filter based algorithms for uncertainty propagation [44,45], a NMPC algorithm using min-max successive linearization [46], NMPC algorithms that employ PCEs to account for possible parametric uncertainties [47,48], and multi-stage NMPC [49].
The paper is comprised of the following sections. In Section 2 the problem definition is given. In Section 3 a general outline of GPs is given including the sampling procedure used. Section 4 shows how the GPs can be exploited to solve the defined problem. In Section 5 the semi-batch bioprocess case study is described, while in Section 6 the results and discussions for this case study are given. Section 7 concludes the paper.

Notation
N and R represent the sets of natural numbers and real numbers respectively. The variable δ i j denotes the Kronecker delta function, such that: The notation diag(a 0 , a 1 , . . . , a n ) is used to represent the following diagonal matrix: diag(a 0 , a 1 , . . . , a n ) := We represent the Gaussian distribution with mean µ and covariance Σ as N(µ, Σ). Further, φ ∼ N(µ, Σ) denotes that the random variable φ follows a Gaussian distribution with mean µ and covariance Σ.
The expected value of a random variable φ is denoted as: where p φ the probability density function of φ over the sample space Ω. Further, we define the indicator function and the probability measure of random variable φ as follows: where A is a set defining an event on φ and P{φ ≤ c} is the probability that φ is less than or equal to c. Lastly, we require the definition of the beta inverse cumulative distribution function (cdf) for a random variable φ. This function betainv(P, A, B) returns a value C of φ following a beta distribution with parameters A, B that has a probability of P to be less than or equal to C. The definition is as follows:

Problem definition
The dynamic system in this paper is given by a discrete-time nonlinear equation system with additive disturbance noise: where t is the discrete time, x ∈ R n x is the state, u ∈ R n u are the control inputs, f : R n x × R n u → R n x are nonlinear equations, and ω represents Gaussian distributed additive disturbance noise with zero mean and diagonal covariance matrix Σ ω . The initial condition x 0 is assumed to be Gaussian distributed with mean µ x 0 and covariance matrix Σ x 0 . We assume measurements of the states to be available with additive Gaussian noise expressed as: where y ∈ R n x is the measurement of f(x, u) perturbed by additive Gaussian noise ν ∼ N(0, Σ ν ) with zero mean and a diagonal covariance matrix Σ ν = diag(σ 2 ν 1 , . . . , σ 2 ν nx ). The aim of the control problem is to minimize a finite-horizon cost function: where T ∈ N is the time horizon, U = [u 0 , . . . , u T −1 ] T ∈ R T ×n u is a joint vector of T control inputs, : R n x × R n u → R is the stage cost, and f : R n x → R denotes the terminal cost.
The control problem is subject to hard constraints on the inputs: The states are subject to a joint chance constraint that requires the satisfaction of a nonlinear constraint set up to a certain probability, which can be stated as: where X t is defined as: The joint chance constraints are formulated such that the joint event over all t ∈ {0, . . . , T } of all x t fulfilling the nonlinear constraint sets X t has a probability greater than 1 − . For convenience we define the tuple z = (x, u) ∈ R n z with joint dimension n z = n x + n u . The dynamic system in Equation 1 is assumed to be unknown. Instead, we are only given a finite number of noisy measurements according to Equation 2. The available data can then be denoted by the following two matrices: where z (i) represents the input of the i-th data point with corresponding noisy observation y (i) , N denotes the overall number of training data points, Z is a collection of input data, and the corresponding noisy observations are collected in Y.
It should be noted that the uncertainty in this problem arises partially from the uncertain initial condition x 0 and the additive disturbance noise ω. Most of the uncertainty however comes from the fact that we do not know f(x, u) and are only given noisy observations of f(x, u) instead. To solve this problem we train a GP to approximate f(·) using the available data in Equation 6. The GP methodology is introduced for this purpose in the next section. This GP then represents a distribution over possible functions f(·) given the available data, which can be exploited to attain stochastic constraint satisfaction of the closed-loop system.

Regression
In this section we introduce the use of GPs to infer a latent function f : R n z → R from noisy data. For a more complete overview refer to Rasmussen and Williams [13]. Let the noisy observations y of f (·) be given by: where z ∈ R n z is the argument of f (·) and y is a perturbed observation of f (z) with additive Gaussian noise ν ∼ N(0, σ 2 ν ) with zero mean and variance σ 2 ν . GPs can be considered a generalization of multivariate Gaussian distributions to describe a distribution over functions. A GP is fully specified by a mean function and a covariance function. The mean function represents the "average" shape of the function, while the covariance function specifies the covariance between any two function values. We write that a function f (·) is distributed as a GP with mean function m(·) and covariance function k(·, ·) as: The prior GP distribution is defined by the choice of the mean function and covariance function. In this study we apply a zero mean function and the squared-exponential (SE) covariance function defined as: where z, z ∈ R n z are arbitrary inputs, ζ 2 denotes the covariance magnitude, and Λ −2 diag(λ −2 1 , . . . , λ −2 n z ) is a scaling matrix.
Remark (Prior assumptions). Note the zero mean assumption can be easily achieved by normalizing the data beforehand. The SE covariance function is smooth and stationary, such that choosing the SE covariance function assumes the latent function f (·) to be smooth and stationary as well.
From the additive property of Gaussian distributions the measurements of f (·) also follow a GP accounting for measurement noise: We denote the hyperparameters defining the prior jointly by Ψ [ζ, λ 1 , . . . , λ n z , σ ν ] T , in which the variance σ ν of the measurement noise is included in case it is unknown. Commonly the hyperparameters are unknown, such that they need to be inferred from the available data using for example maximum likelihood estimation (MLE).
Assume we are given N noisy function evaluations according to Equation 7 denoted by Y [y (1) , . . . , y (N) ] T ∈ R N as the result of the inputs given in Z = [z (1) , . . . , z (N) ] T ∈ R N×n z . According to the prior GP assumption, the data follows a multivariate Gaussian distribution: The log-likelihood of the observations is consequently given by (ignoring constant terms): The MLE estimate of the hyperparameters Ψ is determined by maximizing Equation 13. Once the hyperparameters are known, we need to determine the posterior GP distribution of the latent function f (·). From the prior GP assumption we know that the training data and the value of f (·) at an arbitrary input z follow a joint multivariate normal distribution: where k(z) [k(z, z (1) ), . . . , k(z, z (N) )] T . The posterior Gaussian distribution of f (z) given the data (Z, Y) can then be found by using the conditional distribution rule for multivariate normal distributions based on the joint normal distribution in Equation 14, which leads to: where D = (Z, Y) denotes the training data available to obtain the posterior Gaussian distribution. The mean function µ f (z; D) in this context is the prediction of the GP at z, while the variance function σ 2 f (z; D) is a measure of uncertainty. In Figure 1 we illustrate a prior GP in the top graph and the posterior GP in the bottom graph.

Recursive update
Often we are given a training dataset D initially to build a posterior GP and then obtain data points individually afterwards. For example in GP-based MPC we may build an initial GP offline following the procedure shown in Section 3.1, and then also update this model online as new data becomes available. In this work we keep the hyperparameters constant but update the mean function and variance function in Equations 15b-15c recursively given the new data points. Furthermore, the approach used for MC sampling of the GPs to be introduced in Section 3.4 requires recursive updates of this form as well.
Let the new dataset be given by D + = (D, (z + , y + )), where D is the training data for the initial GP, while z + is the new input and y + the new corresponding output measurement. Then we will refer to the updated mean function and variance function as: The updated terms in Equations 16a-16b can be expressed as: where k(z), Z and Y refer to quantities of the initial GP. The noise term σ 2 ν in the lower diagonal is shown in brackets, since the new "measurement" y + may be noiseless as is the case for GP MC samples. In this case the noise term should not be added to the new diagonal element.
Note the updates k + (z), Z + , and Y + are trivial, however the update of the inverse covariance matrix Σ +−1 Y is more involved. In essence we require the inverse of the previous covariance matrix after adding a horizontal row and a vertical row to the covariance matrix of the initial GP, see Equation 17c. For this process there are efficient formula available, one of which is introduced in Appendix A. These take advantage of the fact that we already know the inverse covariance matrix Σ −1 Y of the initial GP. Once the update has been carried out, these terms then define the new initial GP. This update procedure is then repeated for the next measurement.

State space model
In this section we briefly show how the previously introduced GP methodology can be utilized to identify unknown state space models in the form of Equation 1 based on the measurements (data) according to Equation 2. GPs are commonly applied to model scalar functions with vector inputs as shown in Section 3.1. To extend this to the multi-input, multi-output case as required it is common to build a separate independent GP for each output dimension, see for example Deisenroth and Rasmussen [18]. Let the function in Equation 1 be given by . . , f n x (z)] T . We aim to build a separate GP for each function f i (z) ∀i ∈ {1, . . . , n x } according to Section 3.1. For this purpose we are given observations . . , n x } and corresponding inputs Z = [z (1) , . . . , z (N) ] T , where y i refers to the i-th dimension of measurements obtained according to Equation 2. Let Y = [Y 1 , . . . , Y n x ] correspond to the overall measurements available. The posterior Gaussian distribution of f(·) at an arbitrary input z = (x, u) is: where µ f (z; D i ) and σ 2 f (z; D i ) are the mean function and variance function built according to Section 3.1 with datasets Remark (Additive disturbance noise). Note the additive disturbance noise defined in Equation 1 is simply added to the posterior covariance matrix due to the additive property of multivariate Gaussian distributions.
In addition, given an initial GP state space model built with a dataset D and a new data point (z + , y + ), we can update it recursively utilizing the method introduced in Section 3.2:

Monte Carlo sampling
GPs are distribution over functions and hence their realizations yield deterministic functions, see for example the GP samples shown in Figure 1. In this section we show how to attain independent samples of GP state space models over a finite time horizon. Generating a MC sample of a GP would require sampling an infinite dimensional stochastic process, while there is no known method to achieve this. Instead, approximate approaches have been applied such as spectral sampling [50]. Exact samples of GPs are however possible if the GP MC sample needs to be known at only a finite number of points, which is exactly the situation for state space models over a finite time horizon. This technique was first outlined in Conti et al. [41] and has been employed in Umlauft et al. [42] for the optimal design of linear controllers. We next outline how to obtain an exact sample of a GP state space model over a finite time horizon.
Assume we are given a GP state space model as shown in Section 3.3 from the input-output dataset D = (Z, Y). The initial condition x 0 is assumed to follow a known Gaussian distribution as defined in Equation 1. A GP state space model represents a distribution over possible plant models, for which each realization will lead to a different state sequence. The aim of this section is therefore to show how to obtain a single independent sample of such a state sequence, which can then be repeated to obtain multiple independent MC samples of the GP. Let We assume the control inputs to be the result of a feedback control policy, which we represent as κ : R n x × R → R n u . The control actions u (s) i are then given as follows: where i is the current discrete time. Note the control policy depends on the discrete time directly, since it is a finite horizon control policy. Consequently, the control actions over the finite time horizon T are a function of X (s) and denoted jointly as Note these control inputs are different for each MC sample s due to feedback.
We start by sampling the Gaussian distribution of the initial state x 0 ∼ N(µ x 0 , Σ x 0 ) to obtain the realization χ (s) 0 . The posterior Gaussian distribution of the next state in the sequence x 1 is subsequently given by the GP of f(·) as defined in Equation 18 dependent on χ (s) 0 : The realization of x 1 is obtained by sampling the above normal distribution, which we will denote as χ (s) 1 . To obtain the next state in the sequence χ (s) 2 we need to first condition on χ (s) 1 , since this is part of this sampled function path. This requires to treat χ (s) 1 similarly to a new training point, however without observation noise (i.e. no σ 2 ν is added to the kernel evaluation k(Z 0 , Z 0 )) and without changing the hyperparameters. Note that if the sampled function were to return to the same input it would lead to the exact same output, since it is conditioned on a noiseless output. This shows that the sampled function is deterministic, since it is the result of sampling. Next we draw χ (s) 2 according to the posterior Gaussian distribution obtained from adding the previously sampled data point to the training dataset D as a noiseless observation. This sample is then again added to the training dataset as a noiseless observation, from which the GP is updated and the next state is drawn. This process is repeated until the required time horizon T has been reached.
This sampling approach is summarized in Algorithm 1 below and is illustrated in Figure 2. Each GP MC sample is defined by a state and corresponding control action sequence. Note that this gives us a single MC sample and subsequently needs to be repeated multiple times to obtain multiple realizations.
. Note updating µ f (·; D) with mean values has no effect.

Solution approach
From the input-output dataset D = (Z, Y) defined in Equation 6, we fit a GP state space model as outlined in Section 3.3. The aim is to solve the problem defined in Section 2 based on this GP state space model. In this context the GP represents a distribution over possible plant models for the process given the available dataset. In Section 3.4 we have shown how to create a sample of this plant model over a finite time horizon T , which each lead to different state sequences and corresponding control sequences based on a control policy. In this paper we aim to design a NMPC algorithm based on the GP that acts as this control policy. The MC samples are utilized to tune the NMPC formulation by adjusting so-called back-offs to tighten the constraints and attain the closed-loop probabilistic constraint satisfaction. We next state the GP NMPC formulation, which is based on the tightened constraint set and predictions from the mean function µ f (·; D) and covariance function Σ f (·; D).

Finite-horizon Gaussian process model predictive control formulation
In this section we define the NMPC OCP based on the GP nominal model given the dataset D = (Z, Y). Let the optimization problem be denoted as P T µ f (·; D), Σ f (·; D); x, t for the current known state x at discrete time t based on the mean function µ f (·; D) and covariance function Σ f (·; D): wherex,û, andV T (·) refers to the states, control inputs, and control objective of the MPC formulation,Û t:T −1 = [û t , . . . ,û T −1 ] T , η k are weighting factors to penalize uncertainty, and X k is a tightened constraint set denoted by: j represent so-called back-offs, which tighten the original constraints X t defined in Equation 5. Remark (Objective in expectation). It should be noted that the above objective in Equation 22 does not exactly optimize the objective in Equation 3, since it is difficult to obtain the expectation of a nonlinear function. Approximations of this can be found in Hewing and Zeilinger [34], however these generally are considerably more expensive and often only lead to marginally improved performance.
Remark (Scaling for state dependency factors). Note we have opted for scalar scaling factors η k to account for state dependency. For this to work reliably it is therefore necessary to normalize the data to ensure that all data has approximately the same magnitude, for example normalizing the data to have zero mean and unit variance.
The NMPC algorithm solves P T µ f (·; D), Σ f (·; D); x t , t at each sampling time t given the current state x t to obtain an optimal control sequence: Only the first optimal control action is applied to the plant at time t before the same optimization problem is solved at time t + 1 with a new state measurement x t+1 . This procedure implicitly defines the following feedback control law, which needs to be repeatedly solved for each new measurement x t : It is explicitly denoted that the control actions depend on the GP model used. There are several important variations of the GP NMPC control policy. Firstly, it may seem reasonable to update the mean and covariance function using the previous state measurement and corresponding input by applying the recursive update rules introduced in Section 3.2. We will refer to this as learning. This may however lead to a more expensive and less reliable NMPC algorithm.
Secondly, the algorithm may want to avoid regions in which there is great uncertainty due to sparsity of data. This can be achieved by assigning some of the η k with non-zero values to penalize the algorithm moving into these regions with high variance. This explicitly takes advantage of the state dependency of the noise covariance function and is hence referred as state dependent. It should be noted that evaluation of the covariance function is computationally expensive. It was determined that setting only η t+1 to a non-zero value is often sufficient due to the continued feedback update, i.e. penalizing only the variance for the one-step ahead prediction at time t. These variations can be summarized as follows: • Learning: Update the mean and variance function of the GP using the previous state measurement and the known corresponding input.
• State dependent: Set some η k not equal to zero, which will lead to the NMPC algorithm trying to find a path that has less variance and hence exploiting the state dependent nature of the uncertainty.
• Non learning: Keep the mean and variance function the same throughout the run.
• Non state dependent: Set all η k to zero and hence ignoring the possible state dependency of the uncertainty.

Probabilistic guarantees
In this section we illustrate how to obtain probabilistic guarantees for the joint chance constraint introduced in Section 2 in Equation 5 based on independent samples of the GP plant model. For convenience we define a single-variate random variable C(·) representing the satisfaction of the joint chance constraint [51]: Equation 25 is intractable, however a good nonparametric approximation is often achieved utilizing the so-called empirical cumulative distribution function (ecdf). We define the cdf to be approximated as follows:

The probability in
Assuming we are given S independent and identically distributed MC samples of X and hence of C(X), the ecdf estimate of the true cdf in Equation 26 is given by: where X (s) is the s-th MC sample andF C(X) (c) is the ecdf approximation of the true cdf F C(X) (c). The quality of the approximation in Equation 27 strongly depends on the number of samples used and it is therefore desirable to quantify the residual uncertainty of the sample approximation. This problem has been studied to a great extent in the statistics literature [52]. In addition, there are several works applying these results for chance constrained optimization, see for example Alamo et al. [53]. In general, these results rely on the following observations. Firstly, 1{C(X (s) ) ≤ c} for a fixed value of c describes a Bernoulli random variable, in which either C(X (s) ) exceeds c and takes the value 0 or otherwise takes the value 1 with probability F C(X) (c). Secondly, the ecdf describes the number of successes of S realizations of these Bernoulli random variables divided by the total number of samples and hence follows a binomial distribution, i.e. F C(X) (c) ∼ 1 N S Bin(S , F C(X) (c)). The confidence bound for the ecdf can then be determined from the Binomial cdf. This method was first introduced by Clopper and Pearson [52] as "exact confidence intervals". Due to the close relationship between beta distributions and binomial distributions, a simplified expression can be obtained using beta distributions instead. This leads to the following theorem that gives us a lower and upper confidence bound [54].
Theorem 1 (Confidence interval for empirical cumulative distribution function). Assume we are given a value of the ecdf, β =F C(X) (c), as defined in Equation 27 based on S independent samples of C(X), then the true value of the cdf, β = F C(X) (c), as defined in Equation 26 has the following lower and upper confidence bounds: Proof. The proof uses standard results in statistics and can be found in Clopper and Pearson [52], Streif et al. [54].
In other words the probability of β exceeding the valueβ ub has a probability of α and the probability of β being less than or equal toβ lb has also a probability of α. In particular,β lb for small α represents a conservative lower bound on the true probability β. An illustration of the confidence bound for the ecdf is shown in Figure 3. In general more samples will lead to a tighter confidence bound as expected. The theorem provides a lower boundβ lb that accounts for the statistical error due to the finite sample estimate made, i.e. it gives us a conservative value that is less than or equal to the true probability of feasibility with a confidence level of 1 − α.
We assume we are given S independent samples of the trajectory X generated according to Section 3.4. Let the approximate ecdf be given byβ =F C(X) (0) according to Equation 27, andβ lb the corresponding lower bound according to Theorem 1 with confidence level 1 − α. From this the following Corollary follows: Corollary 1 (Feasibility probability). Assuming the GP representation of the plant model to be a correct description of the uncertainty of the system and given a value of the ecdfβ =F C(X) (0), as defined in Equation 27 based on S independent samples and a corresponding lower boundβ lb ≥ 1 − with a confidence level of 1 − α, then the original chance constraint in Equation 5 holds true with a probability of at least 1 − α.

Determining back-off constraints
In this section we describe how to determine the required back-off values to tighten the constraints of the GP NMPC algorithm in Equation 22. The aim is to choose these values to obtain probabilistic guarantees for the state constraints defined in Equation 5 despite not knowing the exact dynamics. The GP provides a nominal model for the GP NMPC formulation in Equation 22 using the mean function and a distribution of possible plant models given the initial dataset in Equation 6. This distribution is exploited to attain different possible plant model realizations to simulate the closed-loop response of the GP NMPC algorithm and then uses the response values to tighten the constraints, such that the original constraint set is satisfied with a high probability according to Equation 5.
We have shown how to obtain the closed-loop trajectory of a control policy according to the realization of a plant model GP distribution in Section 3.4. To this we apply the GP NMPC control policy defined in Equation 24. We propose to utilize S independent MC samples of the GP distribution generated according to Section 3.4, which then in turn describe S different possible plant models with corresponding state and control trajectories. The goal now is to adjust the back-offs, such that the S different state trajectories adhere the original constraint set for all but a few samples to attain the required probability of constraint satisfaction. Let X (s) = [χ (s) 0 , . . . , χ (s) T ] T refer to the state trajectory of sample s. The update rule for adjusting the back-offs is based on two steps: First, we define an approximate constraint set, which is then adjusted by a constant factor to obtain the required constraint satisfaction using the ecdf of the joint chance constraint in Equation 30. The approximate constraint set in essence needs to reflect the difference in constraint values for the nominal model of the MPC and the realizations of the GP plant model. We first set all the back-off values to zero and run S MC samples of the GP plant model. As defined in Section 3.4, let χ t refer to the states according to the nominal trajectory with the back-offs set to zero as well. Now assume we aim to obtain back-off values that imply the satisfaction of the following individual chance constraints: where δ is a tuning parameter and should be set to a reasonably low value. The rule in other words aims to find approximate back-offs for the nominal predictions χ t utilized in the MPC formulation in Equation 22, such that the chance constraints holds for any possible GP plant model MC sample with a probability of 1 − δ. The parameter δ in this case is a tuning parameter.
To accomplish this we define the following ecdf based on the S MC samples available: (0) is a sample approximation of the chance constraint given in Equation (29) on the RHS. In Paulson and Mesbah [40] it is proposed to employ the inverse ecdf to approximately fulfill the requirement given in Equation (29) using the S MC samples available. The back-offs can then be stated as follows: j is the inverse of the ecdf defined in Equation 30 andb (t) j refers to these initial back-off values. Note the inverse of an ecdf is given by the quantile function of the discrete S values with probability δ. This gives us the initial back-off values as required for the first step. Note that both the nominal trajectory χ t and the GP realizations depend on the back-off values, which is however ignored since we are only interested in obtaining some reasonable initial values. In the next step these back-off values are further adjusted using a constant back-off factor γ. The new back-offs are then defined as: We aim to change γ until the lower bound of the ecdfβ lb as defined in the previous section for the joint chance constraint is equal to 1 − in Equation (5). This is a root finding problem, in which γ is adjusted untilβ lb reaches the required value: where the aim is to determine a value of γ, such that h(γ) is approximately zero.
To attain the required γ we use the so-called bisection method [55]. This method determines the root of a function in an interval a γ and b γ , where h(a γ ) and h(b γ ) have opposite signs. In our case this is relatively easy. Setting the value of γ too low returns generally a negative value of h(γ) due to the constraint violations using low back-offs, while setting it too high leads to positive values leading to a too conservative solution. Note we generally set the initial a γ to zero since this corresponds to the S MC samples used to determineb (t) j . The bisection method consists of repeatably bisecting the interval, in which the root is contained. The overall algorithm to determine the required back-offs in n b back-off iterations is summarized below as Algorithm 2. The output of the algorithm are the required back-offs with the corresponding lower bound on the probability of satisfying the state chance constraint. Note for learning = true the mean and covariance function of the GP are recursively updated utilizing the same procedure as for the update of the GP plant model MC sample.
Remark (Conservativeness of chance constraint). Note to adjust the back-offs we use the ecdf, which does account for the true shape of the underlying probability distribution. This avoids the problem that is often faced in stochastic optimization utilizing Chebyshev's inequality to robustly approximate chance constraints, which is often excessively conservative [56].

Algorithm
The overall algorithm proposed in this paper is summarized in this section. Firstly, the problem to be solved needs to be defined as outlined in Section 2. Thereafter, it needs to be decided if the GP NMPC should learn online or exploit the state dependency of the uncertainty as shown in Section 4. Once the GP NMPC has been formulated the back-offs are determined by running closed-loop simulations of the defined problem as shown in Section 4.3. Lastly, these back-offs then give us the tightened constraint set required for the GP NMPC online. This GP NMPC is then run online solving the problem initially outlined. An overall summary can be found in Algorithm 3.

Algorithm 3: Back-off GP NMPC
Offline Computations 1. Build GP state-space model from data-set D = (Z, Y) and additive disturbance Σ ω . 2. Choose time horizon T , initial condition mean µ x 0 and covariance Σ x 0 , stage costs and f , state dependent factor η t , constraint sets X t , U t ∀t ∈ {1, . . . , T }, chance constraint probability , ecdf confidence α, tuning parameter δ, decide if learning should be carried out, the number of back-off iterations n b and the number of Monte Carlo simulations S to estimate the back-offs. 3. Determine explicit back-off constraints using Algorithm 2. 4. Check final probabilistic valueβ lb from Algorithm 2 if it is close enough to .

Case study
The case study utilized in this paper deals with the photo-production of phycocyanin synthesized by cyanobacterium Arthrospira platensis. Phycocyanin is a high-value bioproduct and its biological function is to enhance the photosynthetic efficiency of cyanobacteria and red algae. It has been considered as a valuable compound because of its applications as a natural colorant to replace other toxic synthetic pigments in both food and cosmetic production. Furthermore, it has shown great promise for the pharmaceutical industry because of its unique antioxidant, neuroprotective, and anti-inflammatory properties. Using a simplified dynamic model we verify our GP NMPC algorithm by operating this process using a limited dataset.

Semi-batch bioreactor model
The simplified dynamic system consists of three ODEs describing the evolution of the concentration of biomass, nitrate, and bioproduct. The dynamic model is based on the Monod kinetics, which describes microorganism growth in nutrient sufficient cultures, where intracellular nutrient concentration is kept constant because of the rapid replenishment. We assume a fixed volume fed-batch. Control inputs are given by the light intensity (I) in µmol.m −2 .s −1 and nitrate inflow rate (F N ) in mg.L −1 .h −1 [57]. To capture the effects of light intensity on microalgae growth and bioproduction (photolimitation, photosaturation, and photoinhibition phenomena) the Aiba model is used [58]. The balance equations are given as follows: where C X is the biomass concentration in g/L, C N is the nitrate concentration in mg/L, and C q c is the phycocyanin (bioproduct) concentration in the culture in mg/L. The corresponding state vector and control vector are given by x = [C X , C N , C q c ] T and u = [I, F N ] T respectively. The initial condition is denoted as x 0 = [C X 0 , C N 0 , C q c 0 ] T . The missing parameter values can be found in Table 1.

Problem set-up
The time horizon T was set to 12 with an overall batch time of 240h, and consequently the sampling time is 20h. Based on the dynamic system in Equation 34 we define the objective and the constraints according to the general problem definition in Section 2. The measurement noise matrix Σ ν and disturbance noise matrix Σ ω were set to: The mean µ x 0 and covariance Σ x 0 of the initial condition are given by: The control algorithm aims to maximize the amount of bioproduct produced C q c with a penalty on the change of control actions. The corresponding stage and terminal cost can be stated as follows: where ∆ u t = u t − u t−1 and R = diag(3.125 × 10 −8 , 3.125 × 10 −6 ). The overall objective is then defined by Equation 3.
For the case of state dependency we set all η i to zero except η 0 , see Equation 22. The value of η 0 was set to 15. Note that for these factors to work properly it is important to normalize the data as we did in this case study.
There are two path constraints in the problem. The amount of nitrate is constrained to remain below 800 mg/L, while the ratio of bioproduct to biomass may not exceed 11.0 mg/g for high density biomass cultivation. These constraints can be stated as: Lastly, there is a terminal constraint on nitrate to reach a final concentration of below 150 mg/L: The maximum probability for violating the joint chance constraint was set to = 0.1. The control inputs light intensity and nitrate inflow rate are constrained as follows: For the back-off iterations we employed S = 1000 MC iterations with the initial back-offs computed according to Equation 31 with δ = 0.1 and α = 0.01. The maximum number of back-off iterations for the bisection algorithm was set to n b = 16.

Implementation details and initial dataset generation
The optimization problem for the GP NMPC in Equation 22 is solved using Casadi [59] to obtain the gradients of the problem using automatic differentiation in conjunction with IPOPT [60]. The "real" plant model was simulated using IDAS [61]. In the next section, different variations of the proposed algorithm are presented, for which two different type of datasets were collected. For the first type of dataset we designed the entire input data matrix Z according to a Sobol sequence [62] 40]. The ranges were chosen for the data to cover the expected operating region. The corresponding outputs Y were then obtained from the IDAS simulation of the system perturbed by Gaussian noise as defined in the problem setup. In the second approach only the control inputs were set according to the Sobol sequence in the range u i ∈ [120, 400] × [0, 40] and the corresponding states Y were obtained from the trajectories of the "real" system perturbed by noise using samples of the initial condition and the time horizon as defined in the problem setup based on these control inputs. For both datasets the input data Z and output data Y are normalized to zero mean and a standard deviation of one. The reason we use two different types of datasets is to highlight the advantages of accounting for state dependency in two of the algorithm variations. In the first dataset the data is relatively evenly distributed and hence considering the state dependency of the uncertainty to avoid regions with high data sparsity has essentially no effect, while in the second approach there are clearly defined trajectories that can be followed by accounting for state dependency.

Results and discussions
In this section we present and discuss the results from the case study described in the previous section. For comparison purposes we compare six different variations of the proposed GP NMPC approach, which are as follows: • GP NMPC 50, 60, 100: GP NMPC approach without learning and without taking into account state dependency for dataset sizes of 50, 60, and 100 points using the first type of dataset.
• GP NMPC learning 50: GP NMPC approach with learning and without state dependency for a dataset size of 50 points, which will be compared to the above case of 50 data points without learning. The first type dataset is utilized.
• GP NMPC SD/NSD 50: GP NMPC approach with and without accounting for the state dependency for a dataset size of 50 points employing the second type of dataset.
In addition, we compare the approaches to a nominal NMPC algorithm based on the GP model to show the importance of employing back-offs to prevent constraint violations, i.e. we run the GP NMPC on the "real" plant model, while setting the back-offs to zero. The results of the outlined runs are summarized in Figures 6-12, and in Tables 2-3. In Figures 6-7 we show the evolution of the back-off factor and the probability of constraint satisfactionβ lb over the 16 back-off iterations from Algorithm 2. The next two Figures 4-5 show the 1000 MC trajectories of the constraints with a line to highlight the nominal prediction of the GP NMPC. Next the GP NMPC was applied to the "real" plant with back-offs from the final iteration and without back-offs referred to as nominal as shown in Figures 8-9. Figure 10 shows the probability density function of the objective values obtained from the "real" plant, while Figure 11 shows representative control trajectories for GP NMPC 50, 60, 100 compared with the optimal trajectory obtained from solving the OCP of the "real" plant ignoring uncertainties. Figure  12 shows the back-off values for the nitrate constraints g 1 and g 2 for GP NMPC 50 and GP NMPC 50 learning. Lastly, Table  2 shows the mean values for the back-offs averaged over time for the final back-off iteration, while Table 3 shows the attained probability of satisfactionβ lb together with the average computational times for solving a single GP NMPC optimization problem. We can draw the following conclusions from these results: • Figures 6-7 and Table 2 show that apart from GP NMPC 50 the other variations reach the requiredβ lb and hence successfully converge to a reasonable back-off factor. For these, as expected, a low back-off value leads to too lowβ lb values near zero, while too high back-off values lead to too highβ lb values. The value ofβ lb does vary by ±0.1 even on convergence, which is due to the randomness of the MC samples. Nonetheless, sinceβ lb is a sample robust value, it is high enough if at least 0.9 is reached once over the 16 iterations, which is the case for all of them. Note that GP NMPC 50 does not converge, since even without back-offs the NMPC remains feasible for all MC trajectories and hence the bisection procedure fails. The performance of GP NMPC 50 is however also by far the worst. This is due to insufficient amounts of data in crucial areas for the control problem.
• From GP NMPC 50, 60 to 100 the objective value steadily improves with increased number of data points as shown in Figure 10. This is as expected, since more data points should lead to a more accurate GP plant model and hence more optimal control actions. Lastly, a more accurate GP plant model should also require less conservative back-offs. For the constraint g 2 the mean of the back-off values steadily decreases from 0.022 mg/L for GP NMPC 50 to 0.003 for GP NMPC 100 as shown in Table 2. For the constraints g 1 /g 3 on the other hand the mean of the back-offs decreases dramatically from GP NMPC 50 with a value of 250mg/L to a value of 34.2mg/L for GP NMPC 60, while slightly increasing again for GP NMPC 100 to 38.8 mg/L. This is further illustrated in Figure 4, for which the spread of the trajectories decreases steadily from GP NMPC 50, 60 to 100. All in all, larger datasets lead to improved solutions.
• The learning approach GP NMPC 50 learning leads to a reasonable solution with an objective value that is on average higher than GP NMPC 60 and has a sharper peak, which suggests a more reliable performance as can be seen in Figure  10. In contrast, GP NMPC 50 without learning is unable to determine a good solution and therefore has an objective value that is considerably lower by on average over 30% than the remaining scenarios. GP NMPC 50 is also unable to reach aβ lb value of 0.9 and has instead a much higher value as shown in Table 3, but at the expense of performance. This is highlighted in Figure 11 in which the control trajectory of GP NMPC 50 decreases the nitrate flowrate early to satisfy the terminal constraint, which leads to a sub optimal solution. This is believed to be due to the high uncertainty of the g 1 /g 2 /g 3 constraint trajectories of GP NMPC 50, which can be seen by the large spread of the trajectories in Figure 4. GP NMPC 50 learning on the other hand has a spread of the constraints g 1 /g 2 /g 3 that is significantly less than GP NMPC 50. This is further highlighted by the considerably higher back-off values of GP NMPC 50 compared to GP NMPC 50 learning as can be seen in Table 2, where the g 1 /g 3 back-off values are nearly 400% larger, while the g 2 back-off values are over 300% larger. In conclusion, accounting for online learning has lead to a significantly better solution, although it should be noted that at larger datasets the effect is nearly negligible.
• It can be seen in Figure 5 that GP NMPC NSD 50 has a larger spread of trajectories than GP NMPC SD 50. This is as expected, since GP NMPC SD 50 aims directly in its objective to minimize uncertainty. Consequently, the mean of the back-off values of constraints g 1 /g 3 and g 2 in Table 2 are over 50% larger and over 100% greater than those for GP NMPC SD 50, respectively. Nonetheless, the attained average objective of GP NMPC SD 50 is marginally lower as can be seen in Figure 10. This is expected, since accounting for state dependency reduces conservativeness, but may also lead to a sub optimal solution due to conflicting objectives.
• In Table 3 the average computational times of a single GP NMPC evaluation are shown, which range from 135ms to 48ms. Overall, it can be seen that by far the largest computational times are attributed to GP NMPC 100, which is reasonable since the complexity of GP plant models grows exponentially with the number of data points. GP NMPC 50 and GP NMPC 50 learning can be seen however to have higher computational times, which is due to a more complex optimization problem from the reduced amount of data. This is further highlighted by GP NMPC 50 SD and GP NMPC 50 NSD attaining the lowest average computational times due to the second type of dataset leading to easier optimization solutions.
• Lastly, in Figures 8-9 trajectories of the constraints are shown by applying the GP NMPC variants to the "real" plant. It can be seen that the nominal variations of GP NMPC 50, 60, 100, and GP NMPC 50 learning with back-offs set to zero violate the nitrate constraint g 1 to remain below 800mg/L by a substantial amount of up to 50mg/L for GP NMPC 50 learning and GP NMPC 60. With back-offs on the other hand the approaches remain feasible throughout the run, which illustrates the importance of employing back-offs. GP NMPC NSD 50 nominal can be also seen to violate the nitrate constraint g 1 by 50mg/L, while GP NMPC SD 50 nominal does not violate this constraint. This is likely due to GP NMPC SD 50 nominal following a feasible trajectory in the dataset. GP NMPC NSD 50 with back-offs remains feasible. Overall, it can be seen that back-offs are important to achieve feasibility given the presence of plant-model mismatch.

Conclusions
In conclusion, a new approach is proposed for finite-horizon control problems using NMPC in conjunction with GP state space models. The method utilizes the probabilistic nature of GPs to sample deterministic functions of possible plant models.
Tightened constraints using explicit back-offs are then determined, such that the closed-loop simulations of these possible plant models are feasible to a high probability. In addition, it is shown how probabilistic guarantees can be derived based on the number of constraint violations from the simulations. Furthermore, it is shown that online learning and state dependency of the uncertainty can be taken into account explicitly in this method, which leads to overall less conservativeness. Moreover, the computational times are shown to be relatively low, since constraint tightening is performed offline. Finally, through the comprehensive semi-batch bioprocess case study, the efficiency and potential of this method for the optimisation of complex stochastic systems (e.g. biological processes) is well demonstrated.
In this paper we are often concerned with inverting matrices as shown in Section 3.2 in Equation 17c: This is an recursive update formula and hence Σ −1 Y is known a priori. In this Section we introduce a recursive formula to exploit this fact to obtain Σ +−1 Y in a cheaper fashion taken from Strassen [63] adjusted to our case. The following quantities need to be computed to obtain Σ +−1 Y : The inverted matrix is then given by: