Emulating dynamic non-linear simulators using Gaussian processes

The dynamic emulation of non-linear deterministic computer codes whose output is a, possibly multivariate, time series is examined. Such computer models simulate the evolution of some real-world phenomenon over time, for example models of the climate or of the functioning of the human brain. The models we are interested in are highly non-linear and exhibit tipping points, bifurcations and chaotic behaviour. However, each simulation run may be too time-consuming to perform analyses that require many runs, such as quantifying the variation in model output with respect to changes in the inputs. Therefore, Gaussian process emulators are used to approximate the output of the code. To do this, the flow map of the system under study is emulated over a short time period; it is then used in an iterative way to predict the whole time series. A number of ways are proposed to take into account the uncertainty of the inputs to the emulators, which arises after the fixed initial conditions, and the correlation between them through the time series. The methodology is illustrated with two examples: the highly non-linear dynamical systems described by the Lorenz and Van der Pol equations. In both cases, the predictive performance is relatively high and the measure of uncertainty provided by the method reflects the extent of predictability in each system.


Introduction
Computer models, e.g. numerical simulators, are sophisticated mathematical representations of some real-world phenomena implemented in computer programs [37]. Such models are widely used in many fields of science and technology to aid our understanding of physical processes, or because conducting physical experiments is too costly, highly time-consuming or even impossible in some cases [47]. In some cases, simulators are available as commercial packages and the underlying functions are unknown to the user. In most applications, it is crucial to understand the sensitivity of model outputs to variation or uncertainty in inputs [37]. Performing such quantitative studies requires a large number of simulation runs, see for example [12]. This becomes impractical when each simulation run is time-consuming.
Emulators, also known as surrogate models, metamodels or response surfaces [24], provide a "fast" approximation of complex simulation models using a limited number of training runs. Popular classes of emulators include neural networks, splines, and regression models. We refer the reader to [17,7,13] for more information on different types of emulators and their properties. Among the diverse types of emulators, Gaussian processes (GPs) have become increasingly popular over the last two decades in the field of the design and analysis of computer experiments [47,48,19]. Also known as Kriging, especially in geostatistics [10], GPs have been effectively used in many real-world applications including wireless communication [49], metallurgy [2], biology [54,21,28], environmental science [29,6], and sensor placement [25].
There are several reasons for the popularity of GPs. Firstly, they can be used to fit any smooth, continuous function (with different degrees of smoothness) thanks to the variety of covariance kernels available [36]. See Sect. 2 for more details on kernels. Secondly, GPs are non-parametric models, i.e., no strong assumptions about the form of the underlying function are required, unlike polynomial regression [42]. Moreover, the prediction performance of GPs is comparable to (if not better than) that of other methods such as neural networks [40,22]; indeed, the limit of a single-layer neural network as the number of neurons tends to infinity is a Gaussian process [33,35]. The main advantage of GPs is that they provide not only a mean predictor but also a quantification of the associated uncertainty. This uncertainty reflects the prediction accuracy and can serve as a criterion to enhance the prediction capability of the emulator [18].
This paper deals with the emulation of dynamic computer models that simulate phenomena evolving with time. The output of a dynamic simulator is a time series for each input. The time series represents the change in the model state variables from one time step to the next. Such models are often expressed by a system of differential equations.
Dynamic simulators appear in many applications. For instance, Stommel's box model [52] simulates the evolution of temperature and salinity to determine the ocean density. In [4] a dynamic model is developed whose output is a time series of general practice consultations for the 2009 A/H1N1 influenza epidemic in London. Since this model is computationally expensive, a GP emulator is developed for calibration [11]. Another real-world example of dynamic computer models is presented in [26] where a saturated path hydrology model simulates the movement of water at catchment scales. In [56] large climate models with time series output that exhibit chaotic behaviour are emulated using Bayesian dynamic linear model Gaussian processes. We refer to [9] for more examples on such simulators.

Gaussian processes as emulators
Let f be the underlying function of an expensive simulator that we want to approximate (or predict), defined as f : X → F. Here, X ⊂ R^d and F ⊂ R^q are the input and output spaces respectively. We further assume that f is a "black-box" function: there is no analytic expression for it, and additional information such as gradients is not available. Also, throughout this paper we assume the simulator to be deterministic (vs. stochastic); i.e. if it is run twice with the same inputs, the outputs will be identical.
A GP defines a distribution over functions and can be regarded as a generalisation of the normal distribution to infinite dimensions. Formally, a GP indexed by X is a collection of random variables {Z_x, x ∈ X} such that for any N ∈ N and any x_1, ..., x_N ∈ X, (Z_{x_1}, ..., Z_{x_N}) follows a multivariate Gaussian distribution [41]. GPs are fully characterised by their mean function µ(·) and covariance kernel k(·,·), which are defined as

µ(x) = E[Z_x],   (1)
k(x, x') = E[(Z_x − µ(x))(Z_{x'} − µ(x'))].   (2)

The mean function reflects our prior belief about the form of f; that is why µ(·) is also called the "prior" mean within the Bayesian framework. While µ(·) can be any function, k(·,·) must be symmetric positive definite. The most commonly used kernel is the squared exponential (SE), which has the form

k(x, x') = σ² exp( − Σ_{l=1}^{d} (x_l − x'_l)² / (2θ_l²) ).   (3)

In the above equation, the parameter σ², referred to as the process variance, controls the scale of the amplitude of the sample paths. The parameter θ_l is called the characteristic length-scale and controls the degree of smoothness of the sample paths along coordinate l, 1 ≤ l ≤ d. Usually the kernel parameters are unknown and need to be estimated; choosing appropriate kernel parameters has a huge impact on the accuracy of emulators. Cross validation and maximum likelihood estimation are common methods for this purpose.

Covariance kernels play an important role in GP modelling: they determine the structure of the sample paths of GPs. As an example, three different kernels (exponential, Matérn 3/2, and SE, see [41] for more information) and the associated sample paths are illustrated in Fig. 1. While the sample paths of a process with the SE kernel are smooth (infinitely differentiable), they are only continuous (not differentiable) when the exponential kernel is used. Herein, we consider stationary covariance kernels, which are translation invariant: the value of a stationary kernel depends only on the difference between input vectors.
In other words, k(x, x') = k(x − x').

Figure 1: The structure of GP sample paths is determined by the covariance kernel. Left: graphs of three (stationary) kernels: exponential (solid), Matérn 3/2 (dashed), and squared exponential (dotted). Right: three sample paths corresponding to the covariance kernels shown in the left panel. The process with the squared exponential kernel is infinitely differentiable, whilst with Matérn 3/2 the process is only once differentiable. The process with the exponential kernel is not differentiable.
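The effect of the kernel choice illustrated in Fig. 1 can be reproduced numerically: a GP restricted to a finite grid is just a multivariate Gaussian, so sample paths can be drawn from N(0, K). The sketch below is in Python (the paper's own implementation, described later, is in R), with arbitrary illustrative parameter values σ² = 1 and θ = 0.5:

```python
import numpy as np

def exp_kernel(r, sigma2=1.0, theta=0.5):
    # exponential kernel: continuous but non-differentiable paths
    return sigma2 * np.exp(-np.abs(r) / theta)

def matern32_kernel(r, sigma2=1.0, theta=0.5):
    # Matern 3/2 kernel: once-differentiable paths
    a = np.sqrt(3) * np.abs(r) / theta
    return sigma2 * (1 + a) * np.exp(-a)

def se_kernel(r, sigma2=1.0, theta=0.5):
    # squared exponential kernel: infinitely differentiable paths
    return sigma2 * np.exp(-r**2 / (2 * theta**2))

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
r = x[:, None] - x[None, :]        # pairwise differences (stationarity)
for kern in (exp_kernel, matern32_kernel, se_kernel):
    K = kern(r) + 1e-8 * np.eye(len(x))    # jitter for numerical stability
    # one sample path: a draw from the multivariate Gaussian N(0, K)
    path = rng.multivariate_normal(np.zeros(len(x)), K, check_valid='ignore')
```

Plotting each `path` against `x` reproduces the qualitative behaviour of Fig. 1: rough paths for the exponential kernel, smooth ones for the SE kernel.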
To fit a GP, the true function f is evaluated at n locations X_n = {x^1, . . . , x^n} with the corresponding outputs (observations) y = (f(x^1), . . . , f(x^n)). Together, X_n and y form the set of training samples/data denoted by D = {X_n, y}. The conditional distribution of Z_{x*} given D is then Gaussian. If the mean function µ(·) is known, the prediction (conditional mean, m(·)) and its uncertainty (conditional variance, s²(·)) at a generic location x* are of the form

m(x*) = µ(x*) + k(x*)ᵀ K⁻¹ (y − µ(X_n)),   (4)
s²(x*) = k(x*, x*) − k(x*)ᵀ K⁻¹ k(x*),   (5)

where k(x*) = (k(x*, x^i))_{1≤i≤n} is the vector of covariances between the output at x* and the outputs at the x^i's, and K = (k(x^i, x^j))_{1≤i,j≤n} denotes the matrix of covariances between sample outputs. Also, µ(X_n) is the vector of mean function values at the training samples. The mean predictor given by Eq. (4) interpolates the points in the training data. Moreover, the prediction uncertainty vanishes at the training points and grows as we move away from them. An illustrative example is shown in Fig. 2.
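As a concrete, simplified illustration, Eqs. (4) and (5) can be implemented in a few lines. The sketch below is in Python rather than the R used later in the paper, with a zero prior mean and fixed, hand-picked kernel parameters (in practice these would be estimated):

```python
import numpy as np

def se_kernel(A, B, sigma2=1.0, theta=0.3):
    # squared exponential kernel between the rows of A and B (Eq. (3))
    d2 = ((A[:, None, :] - B[None, :, :])**2 / theta**2).sum(-1)
    return sigma2 * np.exp(-0.5 * d2)

def gp_predict(Xn, y, Xstar, mu=lambda X: np.zeros(len(X))):
    # conditional mean and variance (Eqs. (4) and (5)) with known prior mean mu
    K = se_kernel(Xn, Xn) + 1e-10 * np.eye(len(Xn))   # jittered K
    kstar = se_kernel(Xstar, Xn)                      # rows are k(x*)ᵀ
    m = mu(Xstar) + kstar @ np.linalg.solve(K, y - mu(Xn))        # Eq. (4)
    v = se_kernel(Xstar, Xstar).diagonal() - np.einsum(
        'ij,ij->i', kstar, np.linalg.solve(K, kstar.T).T)         # Eq. (5)
    return m, np.maximum(v, 0.0)

# the mean interpolates the data, and the variance vanishes there
Xn = np.array([[0.0], [0.5], [1.0]])
y = np.sin(Xn).ravel()
m, v = gp_predict(Xn, y, Xn)
```

Evaluating `gp_predict` at the training points confirms the interpolation property stated above: `m` recovers `y` and `v` is (numerically) zero.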

Emulating dynamical simulators
There are many different approaches for emulating dynamic simulators. According to [44], these approaches can be divided into four categories:
1. One method is to use a multi-output emulator for predicting time series output [9]. In this case, the dimension of the output space is q = T, where T is the length of time the simulator is run for. When T is large, however, efficiency is reduced and numerical problems may arise. In addition, prediction is possible only for a fixed time horizon, and one needs to repeat the prediction procedure for different time horizons.
Building q separate emulators, one for each time point, has the drawback of losing information, as the correlation between the outputs (which we expect to be high) is not considered. To take such correlation into account, one can employ multivariate emulators [14,45], which use nonseparable covariance kernels. However, multivariate emulators are not efficient when the simulator's output is highly multivariate. A common approach to alleviate this problem is to apply dimension reduction techniques to the output space, such as principal component analysis [16] and wavelet decomposition [3]. A potential drawback of these techniques is that we may lose information by leaving out some components.

2. A second approach is to treat time as an additional model input [23]. This method is computationally demanding because the size of the covariance matrix (see Eqs. (4) and (5)) becomes nT, since the inputs are now X_n × {1, . . . , T} [9], and inversion of an n × n covariance matrix has a computational cost of O(n³). Moreover, it is shown in [9] that the performance of multi-output emulators exceeds that of emulators with time as an extra input.
3. One-step ahead emulation is another approach, in which the basic assumption is that the model output at a given time depends only on the output at the previous time step. The transition function then needs to be approximated. This method is reported to be efficient [8].
4. Finally, methods have been described that combine stochastic dynamic models with innovation terms in the form of GPs. For example, in [30] a time-varying autoregressive time series model, which is a type of dynamic linear model, is combined with GPs to emulate a dynamic computer code in a hydrological system. Similar work is carried out in [56] with application to climate models.
Here, we propose a methodology based on iterative one-step ahead predictions. We assume that simulating a long time series from the system is expensive. To overcome this, our strategy is to simulate only a short period of time, and to use this simulation to build an emulator that can rapidly approximate longer time simulations. Our method is similar to the work in [8]. However, we propose a methodology to propagate uncertainty through the time series, including both the uncertainty of the inputs to the emulators at time t and the correlation between them. This is an important aspect of one-step ahead prediction because the input to the GP model is uncertain after the first emulation. Moreover, the propagated uncertainty can be used as a criterion to estimate the predictability horizon of an emulator.

Methodology
We focus on a d-dimensional system of ordinary differential equations (ODEs) whose state variable is given by the real-valued vector x(t) = (x_1(t), . . . , x_d(t)), which we wish to predict over time. This system gives rise to the flow map Φ, which advances the state from the initial condition x(t_0) to the state at a later time:

Φ : x(t_0) ↦ x(t).

To emulate x(t), we begin by writing Φ(·) in d components denoted by f_1, . . . , f_d such that each f_l maps x(t_0) to x_l(t):

f_l : x(t_0) ↦ x_l(t),  1 ≤ l ≤ d.

In this work, we use a variable order method to integrate the system to t = t_1, . . . , t_T to provide the next step ahead. More precisely, we use the LSODA (Livermore solver for ordinary differential equations with automatic switching between stiff and nonstiff methods) method [38], as implemented in the R package deSolve. Each function f_l, 1 ≤ l ≤ d, is then treated as a black-box function that maps x(t_0) to x_l(t_1). In the next step, the f_l's are replaced with their GP emulators, denoted by f̂_l, which are used iteratively for one-step ahead predictions over the time horizon T. These instructions are summarised in Algorithm 1.

Algorithm 1 Emulation of dynamic non-linear computer models
1: Select n samples of initial conditions: X_n = {x^1(t_0), . . . , x^n(t_0)}
2: Run the simulator to obtain x^1(t_1), . . . , x^n(t_1)
3: for l = 1 to d do
4:   Set y = (x^1_l(t_1), . . . , x^n_l(t_1))
5:   Build the l-th emulator f̂_l based on D = {X_n, y}
6: end for
7: Predict over the time horizon T given the initial condition x*(t_0) by applying f̂_1, . . . , f̂_d iteratively, one step at a time

Note that in Algorithm 1 only the initial input to the emulators is certain. Thereafter, the inputs are themselves outputs of the emulators at the previous step. For example, to predict x at t_2, the input is the emulator output at t_1, which is uncertain with mean and variance given by Eqs. (4) and (5). We therefore need to incorporate the input uncertainty in our modelling. The following section deals with this issue.
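Algorithm 1 can be prototyped end-to-end on a toy one-dimensional system whose flow map is known in closed form (dx/dt = −x, so x(t_1) = x(t_0)·e^{−h}). The sketch below is a deliberately minimal Python version with a zero-mean GP and fixed kernel parameters, not the DiceKriging-based implementation used in the paper:

```python
import numpy as np

h = 0.1                                    # time step t_{v+1} - t_v
flow = lambda x0: x0 * np.exp(-h)          # exact flow map of dx/dt = -x

# steps 1-2: design over initial conditions, one short simulation each
Xn = np.linspace(-2.0, 2.0, 12)[:, None]   # n = 12d with d = 1
y = flow(Xn).ravel()

# steps 3-6: build the emulator of the flow map (zero-mean GP, SE kernel)
def se(A, B, theta=0.5):
    return np.exp(-0.5 * (A - B.T)**2 / theta**2)

K = se(Xn, Xn) + 1e-10 * np.eye(len(Xn))
alpha = np.linalg.solve(K, y)              # K^{-1} y, reused at every step

def emulate_step(x):
    # posterior mean (Eq. (4) with zero prior mean) at a new point
    return (se(np.array([[x]]), Xn) @ alpha).item()

# step 7: iterative one-step ahead prediction over the horizon T
x, T = 1.0, 50
for _ in range(T):
    x = emulate_step(x)                    # emulator output feeds back in
```

After T = 50 iterations the emulated state should stay close to the exact value e^{−hT}; in this toy setting each iterate remains well inside the training range, which is exactly the situation the bounding-box design of Sect. 5 aims to ensure.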

Emulation with uncertain input
GPs with uncertain inputs have been studied in [15,5,27]. Suppose we want to predict f at a random point x* drawn from a distribution with mean µ* and covariance Σ*. The probability distribution of the prediction at x* with the GP emulator f̂ is determined by

p(f̂(x*) | D) = ∫ p(f̂(x*) | x*, D) p(x*) dx*,   (6)

where

p(f̂(x*) | x*, D) = N(m(x*), s²(x*)).   (7)

The integral in (6) is analytically intractable [15]. However, it can be approximated using a Monte Carlo (MC) method, which relies on samples repeatedly drawn from a probability distribution and statistical analysis to infer the results [43], as demonstrated by an illustrative example in Fig. 3. An alternative approach is to compute the first and second moments of p(f̂(x*) | D) using the laws of iterated expectation and conditional variance. If f̂(x) ∼ N(m(x), s²(x)), the mean and variance of f̂(x*), assuming x* is normally distributed, are given by

E[f̂(x*)] = E_{x*}[m(x*)],   (8)
Var(f̂(x*)) = E_{x*}[s²(x*)] + Var_{x*}(m(x*)).   (9)

Still, computing the quantities in (8) and (9) is not straightforward because they are functions of the random variable x*. One can approximate m(x*) and s²(x*) using Taylor expansions around µ*, first order for the former and second order for the latter. They are given by [5]:

m(x*) ≈ m(µ*) + ∇m(µ*)ᵀ (x* − µ*),   (10)
s²(x*) ≈ s²(µ*) + ∇s²(µ*)ᵀ (x* − µ*) + ½ (x* − µ*)ᵀ H_{s²}(µ*) (x* − µ*),   (11)

where ∇ and H denote the gradient and Hessian, respectively. Rewriting Eqs. (8) and (9) based on the first and second order Taylor expansions leads to [15]

E[f̂(x*)] ≈ m(µ*),   (12)
Var(f̂(x*)) ≈ s²(µ*) + ½ Tr(H_{s²}(µ*) Σ*) + ∇m(µ*)ᵀ Σ* ∇m(µ*),   (13)

where Tr is the trace operator. The derivatives of the conditional mean and variance in Eqs. (12) and (13) can be calculated analytically; for detailed information, we refer the reader to [34]. It is also possible to use the MC method. For example, to approximate E_{x*}[m(x*)] in Eq. (8), samples are repeatedly drawn from the random variable x* and propagated through the function m(·) defined in Eq. (4). The desired quantity is then approximated by

E_{x*}[m(x*)] ≈ (1/n_MC) Σ_{i=1}^{n_MC} m(x*_i),   (14)

where n_MC denotes the number of MC samples.
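The MC route to Eqs. (8), (9) and (14) can be sketched with stand-in functions for the conditional mean m(·) and variance s²(·); the forms chosen below are arbitrary illustrations, not output of a fitted emulator:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy conditional mean and variance, standing in for Eqs. (4) and (5)
m = lambda x: np.sin(x)
s2 = lambda x: 0.01 * (1 + x**2)

mu_star, var_star = 0.5, 0.2**2        # x* ~ N(mu*, Sigma*), here 1-D
n_mc = 100_000
xs = rng.normal(mu_star, np.sqrt(var_star), n_mc)

# Eq. (8): E[f(x*)] = E_{x*}[m(x*)], approximated as in Eq. (14)
mean_pred = m(xs).mean()

# Eq. (9): Var[f(x*)] = E_{x*}[s^2(x*)] + Var_{x*}(m(x*))
var_pred = s2(xs).mean() + m(xs).var()
```

Because m and s² here have closed forms, the MC estimates can be checked against the exact moments (e.g. E[sin(x*)] = sin(µ*)·e^{−Σ*/2} ≈ 0.470 for these values), which is a useful sanity check before applying the same scheme to a real emulator.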

One-step ahead predictions with uncertain inputs
In this section we first describe the emulator that is applied to predicting dynamic models. We then examine the prediction capability of the emulator on two well studied dynamical systems, the Lorenz and the Van der Pol systems, which are described in subsequent sections. The GP emulator we use in our experiments consists of a first order polynomial regression for the mean function in Eq. (1) (i.e., µ(x) = β_0 + β_1ᵀx) and the squared exponential kernel, given in Eq. (3), for the covariance kernel k. These choices of µ and k are recommended in [8]. A set of training samples of size n = 12d, as recommended in [20,31], is drawn over the space of initial conditions. Our training sample is constrained to lie in a cube. For example, a suitable boundary for the Lorenz attractor, as described in [1,55], is given by the unstable manifold of the origin. To define a bounding box that contains the attractor, we therefore simulated the system with initial conditions close to the origin and chose as boundaries in each coordinate the extremes of the simulation. Note we do not assume that the system is constrained to lie in the same volume, although clearly we would like the box to capture most of the variation. The points should be selected based on a space-filling sampling scheme, and we therefore use a Latin hypercube design [51,39]. The goal of a space-filling design is to spread the points evenly within the input space. No attempt is made to have the points lie along the stable manifold; we simply try to 'fill' space.
To build each emulator f̂_l (1 ≤ l ≤ d), the training data consist of X_n = {x^1(t_0), . . . , x^n(t_0)} with the corresponding outputs y = (x^1_l(t_1), . . . , x^n_l(t_1)). The R packages DiceKriging [46] and deSolve [50] are employed to fit the GP emulators and to solve the differential equations, respectively. The unknown parameters of the SE kernel k (i.e. σ² and the θ_l's) and of the mean function µ (i.e. β_0 and β_1) are estimated by maximum likelihood as implemented in DiceKriging. To solve the ODEs, we use the default solver of deSolve, called "ode", which is based on the LSODA method [38]. A full Jacobian matrix is used, calculated internally by LSODA. The error control (the relative and absolute tolerances) performed by the solver is 10⁻⁶. In the two examples below, we use a time step equal to 0.01, i.e. t_{ν+1} − t_ν = 0.01 for all ν ∈ {1, 2, . . . , T − 1}.
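The space-filling design over the bounding box can be generated, for instance, with SciPy's quasi-Monte Carlo module. The sketch below is in Python (the paper's implementation uses R), and the box limits are placeholders, not the actual Lorenz bounds obtained by the authors:

```python
import numpy as np
from scipy.stats import qmc

d = 3                          # e.g. the Lorenz system
n = 12 * d                     # n = 12d training points, as in [20,31]

# hypothetical bounding box, as might result from a preliminary simulation
lower = np.array([-20.0, -25.0, 0.0])
upper = np.array([20.0, 25.0, 50.0])

sampler = qmc.LatinHypercube(d=d, seed=0)
unit = sampler.random(n)                   # n points in [0, 1]^d
Xn = qmc.scale(unit, lower, upper)         # space-filling design in the box
```

Each row of `Xn` is an initial condition x^i(t_0) to be run through the simulator for one time step, giving the training outputs for the d emulators.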

Lorenz system
The Lorenz model was first proposed by Edward Lorenz in 1963 [32] as a mathematical representation of atmospheric convection. It is a three-dimensional system of ordinary differential equations. Under certain choices of parameters it displays chaotic behaviour, i.e., its behaviour is highly sensitive to initial conditions. The evolution of the three state variables is described by [50]

dx_1/dt = a x_1 + x_2 x_3,
dx_2/dt = b (x_2 − x_3),
dx_3/dt = −x_1 x_2 + c x_2 − x_3,

where a, b and c are parameters [53]. Here, we assume a = −8/3, b = −10 and c = 28. We focus on the case with initial conditions x(t_0) = (x_1(t_0), x_2(t_0), x_3(t_0)) = (1, 1, 1). To propagate input uncertainty, we have implemented two alternative methods, Taylor expansion and Monte Carlo, to approximate Eqs. (8) and (9). Since they yield similar results, for the sake of brevity we only present the emulations obtained using the MC approach.
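The true trajectories against which the emulator is compared can be generated with any LSODA-based solver; the paper uses R's deSolve, but the same integration can be sketched in Python with SciPy's `odeint` (which also wraps LSODA), using the deSolve parameterisation of the Lorenz system with a = −8/3, b = −10, c = 28:

```python
import numpy as np
from scipy.integrate import odeint   # odeint wraps LSODA, like deSolve's "ode"

a, b, c = -8/3, -10.0, 28.0

def lorenz(x, t):
    # Lorenz system in the deSolve parameterisation
    x1, x2, x3 = x
    return [a * x1 + x2 * x3,
            b * (x2 - x3),
            -x1 * x2 + c * x2 - x3]

t = np.linspace(0.0, 10.0, 1001)     # time step of 0.01, as in the paper
traj = odeint(lorenz, [1.0, 1.0, 1.0], t, rtol=1e-6, atol=1e-6)
```

`traj` is a 1001 × 3 array of states; its rows from a short initial window provide exactly the kind of one-step training pairs used in Algorithm 1.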
Emulation of the Lorenz model using the iterative one-step ahead predictions considering the input uncertainty, as described in Algorithm 1, is demonstrated in Fig. 4. We show the evolution of predictions for each system variable over time, as well as a three-dimensional picture showing the evolution of the whole system, (x_1(t), x_2(t), x_3(t)). The solid line represents the true model and the dashed blue line is the GP prediction. It can be seen that the prediction precision is high at the beginning of the time course, for example t ≤ 14. However, the emulator deviates from the true model as time progresses. Fig. 4 suggests that the emulator is well suited to describing the evolution of the system within a "wing" of the Lorenz attractor, but that predictions break down upon switching to the other part of the attractor.

The corresponding prediction uncertainties are shown in Fig. 5, where they are compared with the case in which the input uncertainty is not considered (dashed red line). Generally, if emulation is carried out with uncertain inputs, the uncertainties are of a higher order. Nevertheless, they are still too small and, contrary to our expectations, do not increase with time as the uncertainty builds up from step to step. The true model is not inside the credible intervals, which are defined as m(x*) ± 2s(x*). Note the credible intervals are not shown in Fig. 4, but can easily be derived from Figs. 4 and 5. In particular, we would expect the uncertainty to "blow up" when we reach the bifurcation point (at about t = 14), where our emulator can be on a different branch to the true model yet still has very small uncertainty.

Figure 5: Prediction standard deviations for the Lorenz system (Fig. 4) with and without considering input uncertainty, solid black and dashed red lines respectively. Incorporating input uncertainty augments the prediction SD; contrary to our expectations, the uncertainties do not grow with time.

Van der Pol oscillator
The Van der Pol model was first introduced by the Dutch electrical engineer Balthasar van der Pol in 1920. The Van der Pol oscillator models the behaviour of nonlinear vacuum tube circuits. In its two-dimensional form, it is given by the following ordinary differential equations [53]:

dx_1/dt = x_2,
dx_2/dt = α (1 − x_1²) x_2 − x_1.

Here, the scalar α > 0 determines the nonlinearity and the strength of the damping. The results of predicting the state variables of the Van der Pol oscillator are illustrated in Fig. 6, and the corresponding uncertainties are given in Fig. 7. The initial condition is x(t_0) = (x_1 = 1, x_2 = 1) and α = 5. The difference between the emulation and the true model is indistinguishable up to about t = 100. Again, considering the input uncertainty augments the prediction uncertainties everywhere. But, since they are small, the true model is not inside the credible intervals once the prediction accuracy declines. In the next section, we propose a method to better propagate the uncertainty.

Propagating the uncertainty
In the previous section, each element of x(t) = (x_1(t), x_2(t), . . . , x_d(t)) was emulated separately: d different GP emulators denoted by f̂_1, f̂_2, . . . , f̂_d are employed independently, such that the l-th emulator f̂_l emulates the transition function f_l : x(t_0) ↦ x_l(t_1). However, we may lose some information if the correlation between the emulators is neglected. As we will see later, this is the reason why the uncertainties are small and do not grow with time.

Let x* ∼ N(µ*, Σ*) be an uncertain input to the d emulators. As a result, x** = (f̂_1(x*), . . . , f̂_d(x*)) is a random vector whose mean is determined componentwise by Eq. (8):

µ** = (E[f̂_1(x*)], . . . , E[f̂_d(x*)]).   (15)

Notice that x** is not necessarily normally distributed, see Fig. 3; we nevertheless approximate its distribution by a Gaussian. Let Σ** be the covariance matrix of x**. In order to include the correlation between the emulators, Σ** must be of the form

Σ** = ( Cov(f̂_l(x*), f̂_k(x*)) )_{1≤l,k≤d},   (16)

with diagonal elements Var(f̂_l(x*)). The diagonal elements of Σ** are calculated using Eq. (9), approximated by the MC method. The off-diagonal elements, i.e. the cross covariances, are given by

Cov(f̂_l(x*), f̂_k(x*)) = E_{x*}[m_l(x*) m_k(x*)] − E_{x*}[m_l(x*)] E_{x*}[m_k(x*)],   (17)

where m_l(·) is the conditional mean of the l-th emulator. The cross covariances can be obtained analytically when the kernel is the squared exponential, see [34]. Herein, we use the MC method to approximate the cross covariances in Eq. (17):

Cov(f̂_l(x*), f̂_k(x*)) ≈ (1/n_MC) Σ_{i=1}^{n_MC} m_l(x*_i) m_k(x*_i) − ((1/n_MC) Σ_{i=1}^{n_MC} m_l(x*_i)) ((1/n_MC) Σ_{i=1}^{n_MC} m_k(x*_i)).   (18)

To shed more light on this method, we describe the mean and the covariance matrix of the input at each stage of the one-step ahead predictions:
• at t = 0, the input x*(t_0) is deterministic;
• at t = 1, the input is the vector of emulator outputs obtained from the deterministic initial condition, whose means and variances are given by Eqs. (4) and (5);
• from t = 2 onwards, the inputs to the emulators are no longer deterministic, and the method described in this section (Eqs. (15)-(18)) is applied at each step up to the time horizon T.
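The MC estimate of the cross covariances in Eq. (18) is a one-liner once samples of x* are available. The sketch below uses two linear stand-ins for the conditional means m_l(·), chosen so the exact covariance is known (here Var(x_1) − 0.5 Var(x_2) = −0.005), rather than fitted emulators:

```python
import numpy as np

rng = np.random.default_rng(2)

# two toy conditional means standing in for the emulators' m_l(.) (Eq. (4))
m1 = lambda x: x[:, 0] + 0.5 * x[:, 1]
m2 = lambda x: x[:, 0] - x[:, 1]

# uncertain input x* ~ N(mu*, Sigma*) with independent components
mu_star = np.array([0.0, 0.0])
Sigma_star = np.array([[0.04, 0.0],
                       [0.0, 0.09]])

n_mc = 200_000
xs = rng.multivariate_normal(mu_star, Sigma_star, n_mc)

# off-diagonal of Sigma** via Eq. (18): Cov = E[m1 m2] - E[m1] E[m2]
cov12 = (m1(xs) * m2(xs)).mean() - m1(xs).mean() * m2(xs).mean()
```

Repeating this for every pair (l, k) fills the off-diagonal of Σ**, while the diagonal comes from the MC version of Eq. (9); the resulting Gaussian N(µ**, Σ**) is then passed on as the input distribution for the next time step.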
The results of emulating the Lorenz and Van der Pol models, considering the input uncertainties together with the correlation between the emulators, are illustrated in Figs. 8 and 9, respectively. The predictive capability of these emulators is high at the beginning of the time course, say up to t = 25, and tends to the average of the system afterwards (dashed blue lines). In the case of the Lorenz model, the true model predominantly remains inside the credible intervals (prediction ± 2 × prediction standard deviation). The uncertainty grows and, interestingly, reaches its maximum when the deviation from the true model begins. From this point onwards the expected value of the emulator is the long term average of the underlying function, and the uncertainty is large enough to encompass most values of the system. We interpret this as the emulator in effect "giving up" on trying to emulate the value at a particular time t and instead falling back on a prediction for a random time. Such a prediction is useless in practice but makes statistical sense. This point can be used as a measure of the predictability horizon of dynamic emulators, which is illustrated in Fig. 10. The picture on the left of Fig. 10 shows the prediction uncertainties of the variables in the Lorenz system, while that on the right shows the variables of the Van der Pol model. The y-axis in Fig. 10 is on a logarithmic scale.

We now propose an alternative approach to emulating dynamic models. In this method, correlations between emulators are not considered in the prediction; however, they are incorporated in the uncertainty propagation. At each step, the prediction and the associated uncertainty are obtained from different inputs, denoted by x*_P and x*_SD respectively. To better illustrate this approach, the emulation procedure at t = 2 and t = 3 is given below. Note that x*_P = x*_SD up to time t = 2.
The predictive capability of this approach is illustrated in Figs. 11 and 12. The results are consistent with this approach being a combination of emulation with uncertain inputs and the method explained earlier in this section. As can be seen, the predictions in this case do not tend to the time series mean once the discrepancy between the emulator and the true model increases, which occurs when the prediction uncertainty reaches its maximum. Rather, the emulator persists in approximating trajectories on the attractor. However, this method can be relatively time-consuming because the prediction and its uncertainty are calculated separately.

Figure 11: Emulating the Lorenz system (solid black): prediction (dashed blue) and the credible intervals (dotted red). Correlations between emulators are not considered in the prediction (x*_P is used as input to the emulators) but are incorporated in the uncertainty estimation (x*_SD is used as input to the emulators).

Conclusion
In this paper we develop a general framework to emulate highly non-linear dynamic simulators with time series outputs using Gaussian processes. Such simulators describe the behaviour of phenomena evolving with time. One advantage of our method is that it is easy to implement in comparison to alternative methods: it uses a GP emulator to perform one-step ahead predictions in an iterative way over the whole time series. Moreover, we propose a number of ways to propagate uncertainty through the time series based on both the uncertainty of the inputs to the emulators and the correlation between them. This allows us to measure the horizon, which we term the "predictability horizon", within which the prediction accuracy is high. The capability of our method is illustrated on two non-linear dynamical systems: the Lorenz and Van der Pol systems. In both cases, we obtain very good predictive performance and an accurate measure of the predictability horizon.