Emulating the dynamics of complex systems using autoregressive models on manifolds (mNARX)

,


Introduction
Modelling and predicting the behaviour of dynamical systems is a fundamental yet challenging task encountered in engineering and applied sciences.The goals in modelling such systems are diverse and encompass for instance gaining deeper insight into the system dynamics to uncover its governing equations, e.g., for system control purposes Levin and Narendra (1996).
For predictive maintenance Langeron et al. (2021), digital twins Edington et al. (2023), or damage detection Mattson and Pandit (2006), the goal is to predict the future state of a system based on past observations and measurements.For the purpose of uncertainty quantification Mai and Sudret (2017); Bhattacharyya (2022); Bhattacharyya et al. (2020), reliability analysis Garg et al. (2022), or design optimization Deshmukh and Allison (2017), an attempt is made to reduce the cost of system response analysis.This is achieved by replacing the system with an inexpensive-to-evaluate surrogate model.
Regardless of their specific goals, all of these examples share autoregressive modelling as a tool for describing the evolution in time of the system under study.This is due to the ability of autoregressive models to represent time-dependent data using past observations or predictions.
A special class of autoregressive models is that of exogenous input autoregressive models (ARX), which incorporate exogenous inputs such as time-dependent loads or control signals to improve their predictive power.
Many variants of ARX models have been developed and successfully applied in various fields.
For example, nonlinear polynomial models with exogenous inputs have been used to describe the evolution of magnetic activity due to solar winds Balikhin et al. (2001), gas turbine shaft speeds Chiras et al. (2001), or frictional dynamics Wan et al. (2008).Mai et al. (2016) and Spiridonakos and Chatzi (2015) used them to create flexible surrogates for modelling dynamical systems under uncertain excitation.Neural network-based ARX models have been used with great success to reduce the measurement error of microelectromechanical systems Li et al. (2021), in the context of fault detection in wind turbines Schlechtingen and Ferreira Santos (2011), or for vibration control Song et al. (2022).Other popular algorithms used in the context of dynamical systems are state-space models Zhang et al. (2019), or vector autoregressive models Lütkepohl (2005).
Despite the relatively rich literature on the topic, emulating the response of dynamical systems over extended time periods can still pose a major challenge.The difficulty often comes from the complexity of real-world systems, which can involve nonlinear springs, dampers, coupling, and controllers.These factors can result in a system response that is nonlinear, non-differentiable, or even discontinuous with respect to the exogenous input, making the approximation of the response challenging Kerschen et al. (2006).In these cases, classical ARX algorithms often fail, or require a significant amount of data to make accurate predictions.In most cases, this is because these algorithms rely on some form of regularity in the input-output state-space mapping, such as smoothness, symmetry, or stationarity.
In practice, the regularity assumptions may not hold when the system dynamics are complex, rendering traditional modelling methods ineffective.This challenge can be addressed by working in a different space in which these assumptions are satisfied.For example, Calandra et al.Calandra et al. (2016) demonstrated the effectiveness of a classical surrogate model even in the case of discontinuous functions, when a suitable feature space with smooth input-output mappings is constructed.However, identifying such a space can be difficult.
To solve this problem, various methods have been proposed to identify nonlinear transforms that map the input data into spaces better suited for surrogate modelling.Recently, autoencoders have emerged as a powerful tool for this purpose.For instance, Champion et al. Champion et al. (2019) used autoencoders to identify a reduced set of coordinates that simplifies the identification of system dynamics, while Lee and Carlberg employed deep convolutional autoencoders to project dynamical systems onto nonlinear manifolds Lee and Carlberg (2019).Autoencoders were also used by Simpson et al. Simpson et al. (2021) to construct reduced-order models of nonlinear dynamical systems.
A comprehensive review of other popular methods for identifying linear or nonlinear subspaces in the context of surrogate modelling can be found in Lataniotis et al. (2020).However, while lowerdimensional spaces are often preferred for surrogate modelling due to the curse of dimensionality Verleysen and François (2005), they can sometimes result in unfavorable topologies for surrogate modelling Lataniotis et al. (2020).In summary, identifying an input manifold that facilitates state-space surrogate modelling is known to significantly improve the predictive capabilities of classical emulation techniques.
With this in mind, we propose a novel surrogate modelling technique, called manifold nonlinear autoregressive with exogenous input (mNARX) modelling.This approach combines ARX modelling with a supervised incremental construction of a nonlinear exogenous input manifold.mNARX can accurately approximate complex dynamical systems, even when their response is dependent on high-dimensional exogenous inputs.Moreover, mNARX achieves this with high computational efficiency and stability over extended time periods.
The organization of the paper is as follows: In Section 2, we describe the general rationale of mNARX, including its individual components and the details of the algorithm.In Section 3, we address the construction of mNARX from an existent training dataset (the experimental design).
To validate its performance, we present two case studies: a coupled spring-mass system with a one-dimensional exogenous input, and a numerical aero-servo-elastic wind turbine simulator with a control system and high-dimensional exogenous input (three-dimensional wind field sampled over 10 minutes).Finally, in Section 4, we provide concluding remarks on the mNARX algorithm and discuss potential extensions to enhance its performance and applicability.

Manifold autoregressive with exogenous input modelling
Our goal is to introduce a novel algorithm, namely mNARX (for manifold nonlinear autoregressive with exogenous input) modelling, which can emulate the response of deterministic systems, excited by time-varying exogenous inputs.We describe such systems mathematically as where y(t) ∈ R is one scalar component of the response of the system at time t ∈ T , and x(t) ∈ R M is the M -dimensional vector of exogenous inputs.Although the system response can also be multivariate, for notation simplicity, we assume a univariate response.Further, we focus on discrete-time problems, which means that each element of the time axis T = {0, δt, 2δt, . . ., (N − 1)δt} is an integer multiple of the time increment δt.However, the algorithm presented can also be applied to continuous-time problems.The notation •(T ≤ t) indicates that the system is causal, meaning the response depends on past and current inputs, but not on future inputs.
Our objective with the mNARX algorithm is to construct a surrogate on a finite set of system inputs and corresponding system outputs called the experimental design (ED) The surrogate M shall emulate the system response (hereinafter the model prediction) over a long time period, based solely on the exogenous inputs: This can be challenging when the mapping from the exogenous input to the system output is highly nonlinear, the system has a long memory, the dimensionality of x(t) is high, or when the ED is small.To address these issues, mNARX uses a multi-step surrogate modelling approach based on two key components.The first is NARX modelling, introduced in Section 2.1.The second is the incremental construction of an exogenous input space, referred to as the exogenous input manifold, described in Section 2.2.The full mNARX algorithm is summarized in Section 2.3.

Autoregressive with exogenous input modelling
We employ AutoregRessive with eXogenous input (ARX) models as the basic tool to approximate the response of dynamical systems.At each time instant t on a discretized time axis T = {t 0 , t 0 + δt, . . ., (N − 1)δt}, the ARX model M maps an input vector φ(t) ∈ R Mφ to its corresponding model output y(t) ∈ R, which we express as The input vector φ(t) contains information on the system state variables at different time steps, and M is a parametric model characterized by the finite set of model parameters c.Typical examples of parametric ARX models include e.g.polynomials and Gaussian processes, or any mathematical function that is relatively simple and fast to evaluate.Note that when the model is nonlinear, it is often referred to as a Nonlinear ARX model (NARX).
Once the specific structure of the autoregressive model in Eq. ( 4) is chosen, the optimal set of parameters c needs to be calibrated from a dataset representative of the specific problem at hand, a process detailed in Section 2.1.1.

Calibration of a ARX model
To calibrate the coefficients c in Eq. ( 4), we first define the so-called design matrix: which is constructed by stacking the input vectors of all time steps.It is formed from the time-dependent exogenous input x(t) ∈ R M and the output of the model itself y(t) ∈ R.
In its extended form, every single row of the design matrix reads where ℓ y i ∈ {δt, 2δt, . . ., (N − 1)δt} are called autoregressive lags, and ℓ x j i ∈ {0, δt, 2δt, . . ., (N − 1)δt} are the exogenous input lags.We use the term lag to refer to the time delay between the output at the current time and the value of a variable included in the model.The minimum possible autoregressive lag is strictly larger than zero to preserve causality (ℓ y i > 0).On the other hand, no such limitation is needed for the exogenous input, which means that an immediate effect of the exogenous input on the system response can be modelled (ℓ . It is also important to note that both the autoregressive and exogenous input lags can be non-contiguous, e.g., ℓ y 1 + δt ̸ = ℓ y 2 , which means that it is not necessary to include the full set of possible lags in the model. There is no strict limit on the maximum lag (except the number of time steps available), but in practice, it is restricted to a reasonable size to keep the cardinality of φ(t), and hence typically that of c in Eq. ( 4) manageable.Since we use lagged time steps of the exogenous input and the autoregressive input, the number of time steps in φ is less than or equal (if no lags are included) to the number of time steps in x ( T ⊆ T ).
From Eq. ( 4) and ( 6) it becomes clear that we can map each input vector φ(t) to its corresponding model output y(t) and therefore can formulate a time-dependent problem as an ordinary regression problem.In this setting, these input-output pairs {φ(t), y(t)}, herein samples, no longer need to follow a temporal ordering.This has two important implications for the ARX training process: first, samples from different simulations of the system, which we refer to as model realizations, and therefore also the design matrices Φ (i) can be concatenated to form a larger and more informative design matrix: Φ (1)  . . .
where the superscript (i) indicates the index of the realization within the full experimental design (ED) of size N ED .Analogously we define an output vector y ED , whose component y (i)   correspond to Φ (i) , as y (1)  . . .
The regression task is then performed on Φ ED and y ED .
A second implication of the lack of chronological ordering in the rows of the design matrix, is that it allows one to subsample from Φ ED and y ED : where the k indices r i ∈ {1, 2, . . ., |y ED |} are randomly or deterministically drawn.Here, Φ ED i refers to the i th row of Φ ED and y ED i refers to the i th element of y ED .
The model fitting is then performed on Φ S and y S .Working on a subset of all the possible samples can facilitate the model fitting when the number of samples in each realization is large and the samples are highly correlated, without compromising the quality of the surrogate.

ARX model prediction
As outlined in Section 2.1.1,ARX models make use of past output time steps to generate predictions for future time steps.While the actual output of the simulations in the experimental design is used when fitting the model coefficients, this is not possible when predicting on unseen input data.During the prediction phase, the ARX surrogate M generates a new time step prediction by using its previous predictions ŷ(T < t): To initiate this prediction process, the first max({ℓ ŷ 1 , . . ., ℓ ŷ ny }) time steps of the output must be initialized.In practice, the starting conditions for the prediction can vary, but it is common to set the initial output time steps to zero, or to any physically meaningful initial conditions for the application in question.Some systems can exhibit a high level of sensitivity to their initial conditions, and even a small change in these conditions can result in vastly different system evolutions.However, this is true also for the corresponding full-physics-based model, and not a specific issue introduced by the ARX methodology.

Constructing an exogenous input manifold
Constructing a surrogate in the form of Eq. (3) for complex models M : x(T ≤ t) → y(t) can be a challenging task.To address this, we introduce a feature space, or input manifold, denoted by Our objective is to find a transformation F : x → ζ, that creates an input manifold that is more suitable for surrogate modelling than the original input space x ∈ R N ×M .Transforming the input space is common in e.g.dimensionality reduction, where the goal is to find a low-dimensional representation of x that makes the process of surrogate construction more tractable Lataniotis et al. (2020).Our objective, however, is not only to reduce the input dimensionality, as smaller spaces may yield even more complex input-output mappings, as demonstrated in Lataniotis et al. (2020).Instead, we aim at constructing a feature space that facilitates and simplifies the training of an accurate surrogate model.
In Section 2.2.1, we describe a supervised method for incrementally constructing such an input manifold ζ, by decomposing the transformation ζ = F(x) as a sequence of simpler transforms.
In Section 2.2.2, we provide further details on handling high-dimensional time-dependent inputs within the context of dynamical systems and discuss how this can be integrated into the manifold construction process.

Auxiliary quantities
Modelling the response of a dynamical system based solely on its raw exogenous inputs can be difficult due to the presence of strong nonlinearities and coupling effects.Such nonlinear or even discontinuous responses with respect to the raw input can be the result of control modules, nonlinear springs, dampers or coupling effect between system components.
mNARX addresses this issue by breaking down the modelling task into a series of intermediate steps, each with lower complexity than the full problem.Each modelling step corresponds to creating a new time-dependent feature, called an auxiliary quantity, that provides valuable information about the system state.The choice of the specific auxiliary quantities for a given system are typically based on prior expert knowledge about the system itself.For instance, a domain expert may know that a certain transformation of the raw input or other auxiliary quantity is more informative than the original input time series.Relevant auxiliary quantities can be, for example, moving averages of selected system outputs, integrals/derivatives, or the energy or frequency content of the signal up to a certain delay, or even the output of an active control system.Especially, feeding exogenous variables like control system outputs as additional exogenous input to the NARX model can significantly reduce its nonlinearity or eliminate discontinuitous in the input-output mapping.
The value of a new auxiliary quantity, z i (t) ∈ R, can be generated by applying any transformation F i on all information available, which includes already created auxiliary quantities z <i (T ≤ t), the raw exogenous input x(t) ∈ R M or past values of the auxiliary quantity z i (T < t) itself, as denoted in Eq. ( 12) and illustrated in Fig. 1.We use the subscript to indicate the construction order of auxiliary quantities, where z i represents the i th constructed quantity.Note that each step in Eq. ( 12) is closely related to Eq. ( 10) and that the auxiliary quantities can be viewed as an incremental extension of the exogenous input manifold: In principle, auxiliary quantities can also be interdependent, so there is no clear order in which they have to be created.This scenario is shown in Eq. ( 13), where the i th feature is the result of the transformation F i to the j th feature and the j th feature is the result of the transformation F j to the i th one: In such a situation, one must choose a leading quantity and alternately predict one time step at a time.For notational simplicity though, in this paper we will assume uncoupled auxiliary quantities, with a clear construction order.Along the same lines, not all auxiliary quantities z i (t) require all of the other quantities z j (t), j < i.Nevertheless, we will maintain the formulation in Eq. ( 12) and Eq. ( 10) in its general form for consistency.
The incrementally growing set of exogenous inputs is what we refer to as the exogenous input manifold.Each auxiliary quantity can be thought of as a time-dependent system itself that depends on this manifold and can be modelled using a NARX model.This stepwise enrichment of the manifold allows mNARX to be viewed as a series of physics-informed autoregressive models that break down the full problem into more easily solvable subproblems.
This process of reducing nonlinearity in the problem by constructing a more informative exogenous inputs manifold, can not only improve the accuracy of the final surrogate model, but it can also reduce the need for a large training dataset, without compromising prediction accuracy.
Therefore, mNARX is ideal for situations where data is scarce, but prior information on the physics of the system is available.In particular, mNARX can predict stably over long time periods, because all the components of the input manifold depend on the original exogenous input either explicitly or implicitly.

Reduction of non-temporal coordinates
NARX models require input data over multiple time steps, which often leads to high-dimensional regression problems, even when the raw exogenous input of the system is of moderate dimensionality.This can result in the curse of dimensionality, which refers to the growth of model complexity with an increasing number of input features.The complexity of the model not only slows down its training process, but it also increases the risk of overfitting and obtaining a model that generalizes worse than a simpler one.To mitigate this issue, dimensionality reduction techniques can be applied to the original input to reduce its dimensionality, prior to the construction of the the exogenous input manifold (Section 2.2.1) and design matrix (Section 2.1.1).In this paper, we focus on compressing the raw exogenous input x along its non-temporal coordinates using a transform G, to obtain a lower dimensional representation x: where x ∈ R N ×m , x ∈ R N ×M and ideally m ≪ M .We do not compress the input in the time domain as the aim is to still model the system in its original time scale.This approach allows us to keep Eq. ( 4), and to use x instead of the raw x as part of the exogenous input manifold.
This concept is illustrated in Fig. 2, which shows the reduction of an input with high spatial dimensionality to a small set of time-dependent features.Note that this compression step is compatible with the construction of auxiliary quantities as shown in Section 2.2.1, Eq. ( 12), where we now replace x(T ≤ t) with x(T ≤ t).
The mapping from the raw input data to the compressed input data can be performed using various techniques, such as auto-encoders Rumelhart and McClelland (1986), principal component analysis (PCA) Pearson (1901) and discrete cosine transform (DCT) Ahmed et al. (1974), but also non-invertible and nonlinear techniques such as Isomap Tenenbaum et al. (2000) or Kernel PCA Schölkopf et al. (1997).
Nonlinear transforms can often capture nonlinear relationships within the data better and preserve more of the local structure.However, they tend to be computationally more complex than linear methods, and can be sensitive to the choice of their hyperparameters Lee and Verleysen (2007).
The optimal choice of a dimensionality reduction technique depends on the specific problem being addressed, as well as on the characteristics of the input data.In general, it may be necessary to evaluate the performance of various techniques to determine the best approach for a given mNARX problem.

The mNARX algorithm
In summary, the mNARX algorithm consists of three main steps.A first optional data preprocessing step can be performed, for example, to reduce the dimensionality of the exogenous input, to perform interpolation in the time domain (resampling), or for scaling the input data.
In a second step, auxiliary variables are constructed sequentially to create the exogenous input manifold that facilitates the construction of an efficient and accurate surrogate model.In the third and last step, the time-domain NARX surrogate is trained, using the manifold as exogenous input.These steps are described in the algorithm 1.

Manifold construction (auxiliary quantities)
end for

Synergy with polynomial NARX models
Polynomial NARX models are a special case of the ARX models introduced in Section 2.1.For a given time instant t, they approximate y(t) as a sum of monomials formed from the input vector φ(t) (see Eq. ( 6)).We denote such a monomial P as with α ∈ N Mφ being an integer multi-index that defines the degree of the monomial of total degree ||α|| 1 = Mφ i=1 α i .We truncate the multi-index to a finite set α ∈ A Mφ,d,r : where d is the maximum allowed polynomial degree, and r constrains the maximum interaction order ||α|| 0 = Mφ i=1 1 {α i >0} .Using A as shorthand notation for the multi-index domain A Mφ,d,r and given Eq. ( 16) we can rewrite the general ARX formulation from Eq. (4) as where we now express y(t) as a sum of all monomials weighted by a set of real-valued coefficients c α .For brevity, from now on we will denote the multivariate polynomial basis P α (φ(t)) as P t .
Further, we will use the super script P (i) t to indicate that P t belongs to the i th realization in the experimental design with size N ED .This allows us to define a regression matrix Ψ comprising the P t of all time steps contained in the experimental design: Analogously, we define an output vector y corresponding to the input matrix Ψ as y = {y (1) 0 , . . ., y (1) tmax , . . ., y Note that, despite the initial temporal coherence of the exogenous input and output, the rows of the design matrix Φ and hence also of Ψ do not need to follow any temporal order, as explained in Section 2.1.Therefore, determining the set of model coefficients reduces to a linear regression problem y = Ψc + ε in which c = {c α , α ∈ A} collects the polynomial coefficients and ε is the residual error.We compute these coefficients by means of ordinary least squares minimization, which is computationally efficient: The fast process of training polynomial NARX models and their simple parametrization make them ideal for use in mNARX, which typically requires fitting multiple NARX models during the construction of the input manifold.Therefore, we will use them in the applications of the following Section 3.

Applications
3.1 Coupled spring-mass system

Problem statement
In our first application, we consider the coupled spring-mass system sketched in Fig. 3, which consists of two masses m 1 and m 2 connected by a linear spring with stiffness k 2 .The lower mass is fixed to the ground by a spring with stiffness k 1 .
The system is described by a system of ordinary differential equations with two degrees of freedom: Figure 3: Coupled spring-mass system The numerical values for the system parameters are listed in Tab. 1.The upper mass m 2 is about two orders of magnitude smaller than the lower mass m 1 , while the two ratios k 1 m 1 and k 2 m 2 are similar.Consequently, the displacement of the upper mass y 2 is strongly dependent on the displacement of the lower mass y 1 , while the displacement of the lower mass is largely unaffected by the upper mass.We study these two displacements under a random excitation acting on m 1 via the lower spring.The excitation is the arithmetic mean of N ω sinusoidal terms.The number of terms is modelled as a discrete uniform random variable N ω , with ..,10 = 1 10 .Each of the terms has random amplitude A i and frequency B i , both of which follow a uniform distribution ∼ U(−1, 1).The phase C i also follows a uniform distribution, C i ∼ U(−π, π).Besides being easily interpretable, Eq. ( 22) allows us to generate monochromatic as well as frequency-rich excitations.
On this spring-mass system we compare mNARX and classical NARX models, in terms of their ability to predict both displacements y 1 (t) and y 2 (t) from an unseen excitation x(t).Both approaches are trained and validated on 30-second realizations of the system starting at rest and sampled with a time step δt = 0.01 s.Training of the models is performed on all samples from the design matrix (see Section 2.1.1)built from N ED = 5 random simulations of the system.
Validation is performed on a large set of N val = 10,000 out-of-sample simulations.

mNARX and NARX configuration
In the NARX approach, we build two polynomial NARX models to predict y 1 (t) and y 2 (t) independently as a function of the exogenous excitation x(t).The model structures including the truncation scheme, the included lags and the total number of polynomial coefficients are listed in Tab. 2.
With mNARX, we first predict y 1 (t) as a function of x(t), identically to the NARX approach.
We then predict y 2 (t) with two exogenous inputs, namely x(t) and the prediction ŷ1 (t).This allows us to keep the total number of model coefficients small, as shown in Tab. 2, and helps avoid overfitting on the small training set.
For both approaches we initialize the NARX models with the first few time steps of the true output as detailed in Section 2.1.2.Although not necessary, this allows us to better validate and compare the two methods because the temporal evolution of the system is sensitive to its initialization.

Results
The performance of the NARX and mNARX method on the mass-spring system validation set are shown in Fig. 4. The results for the prediction of the lower mass displacement are shown in The comparison of mNARX and the standard NARX approach on y 2 is given in Fig. 4b.We depict the mNARX results in orange and the NARX results in purple.In the top left panel we see generally good agreement of the true and predicted peak displacement for both methods.
However, for the NARX approach there is a trend toward underestimating the true displacement especially when the peak displacement becomes large.The inferior performance of the NARX approach compared to mNARX becomes even more apparent when looking at the RMSE shown in the top right panel.The accuracy of the NARX approach is considerably lower and less consistent for rare events as it can be seen from the long tail of the histogram (note that the ordinate is in log scale).In the bottom panel it can be seen that the standard NARX surrogate is not capable of modelling the system response accurately.It shows a clear mismatch in magnitude and phase.Even though the mNARX model consists of only 8 terms compared to the 24 terms of the standard NARX model, its prediction closely follows the true system response.Furthermore, the prediction ŷ2 is stable over the entire 30 seconds and virtually no error accumulates over time.

Problem statement
After demonstrating mNARX on an analytical case study with low exogenous input dimensionality, we now use mNARX to emulate a realistic engineering scenario: an aero-servo-elastic (ASE) wind turbine simulator.The input to the simulator is a four-dimensional, temporally-coherent random field that represents the wind speeds across the 2D-area spanned by the turbine rotor: called a turbulence box.At every time instant on a discrete time axis T , the turbulence box is described by ν w = 3 wind speed components (longitudinal x, transversal y and vertical direction z) at every one of the ν y × ν z spatial grid points.Assuming a typical spatial discretization of O(10 1 ) in either direction, this results in a total spatial dimensionality of O(10 2−3 ).The outputs of the ASE simulator are multiple univariate time series such as the evolution of the blade-or tower-internal forces (e.g.bending moments), blade pitch, or the instant power production.Our goal is to construct an mNARX surrogate M that predicts the quantity f i at every time step t solely based on the wind speeds up to and including time t: Besides the high-dimensional exogenous wind input, additional complexity arises in the ASE simulations from the highly nonlinear and non-differentiable relationship between input wind and some of the output quantities.This nonlinearity and non-differentiability are introduced by the turbine controller system which manipulates multiple degrees of freedom (DOFs) of the turbine blades and the nacelle.Moreover, the problem is further complicated by the fact that many output quantities depend on the orientation of the rotor blades due to gravity and wind shear.An illustration of an onshore wind turbine with its DOFs is given in Fig. 5.

Computational model
In this study, we conduct ASE simulations on the well known reference 5MW NREL baseline onshore wind turbine Jonkman et al. (2009), using the NREL ROSCO (Reference Open-Source Controller) Abbas et al. (2022) and the open-source low-fidelity ASE simulator OpenFAST NREL (2021), with a time step of 0.00625 s.A summary of the turbine specifications can be found in Tab. 3. The standard defines two random parameters: • the reference wind speed V hub , which follows a Rayleigh distribution and is defined as the mean longitudinal wind speed at hub height over the total duration of the generated turbulence box • and the turbulence standard deviation of the longitudinal wind speed at hub height, σ 1 , which follows a lognormal distribution conditional on V hub .
The turbulence box is constructed using the Kaimal spectral model and an exponential coherence model, as suggested by the design standard.As a result, the turbulence box is coherent both in time and space for the longitudinal wind speed component but incoherent in space for the transverse and upwards wind speed components.Additionally, the three components of the wind speed differ in terms of their magnitude, with the longitudinal component being about one order of magnitude larger than the transverse and upwards ones.The longitudinal component v x is also superimposed by a wind shear profile (varying with the altitude z) with a constant shear coefficient α = 0.2, according to the standard, resulting in generally higher wind speeds at higher altitudes.The air density is modelled as a constant with a value of 1,225 kg/m 3 .For Temporal discretization s 0.05

Output quantities of interest
In this study, we focus on two quantities of interest (QoI) for the design of wind turbines, namely the blade root bending moment (more specifically the flapwise moment M Bld (t)), and the generated power of the turbine P (t).The accurate prediction of the flapwise bending moment is crucial for fatigue and ultimate limit state design, thus for the reliability of the turbine.From expert knowledge, we know that these two QoIs depend both on the rotor speed ω(t) and the blade pitch ϕ(t) imposed by the controller as soon as the wind speed exceeds the rated wind speed (see Fig. 5).Thus ω(t) and ϕ(t) will be our auxiliary quantities.
As outlined in Tab. 3, the 5MW NREL turbine has a rated wind speed of 11.4 m/s, at which it reaches its maximum power output of 5 MW and adjusts the blade pitch to maintain this level.
To take advantage of this knowledge, we construct two separate mNARX surrogates, namely one for below-rated wind speeds and another one for above-rated wind speeds.We classify each turbulence box used for prediction and validation based on the reference wind speed V hub .As this is a known characteristic of the turbulence box, the exogenous input, it is also available for the out-of-sample wind boxes in the validation set.

Training and validation data
In this study, we conducted a total of 1,050 ASE simulations, each with a duration of 12 minutes, using the computational model detailed in Section 3.2.2.To ensure the validity of our data, we truncated the first two minutes of each simulation, which include the start-up phase of the turbine.These 1,050 simulations were divided into two regimes, with 600 simulations belonging to the below-rated wind speed regime and 450 to the above-rated wind speed regime.To ensure robustness, we randomly selected 100 simulations from each regime to serve as training datasets for each emulator, while the remaining simulations were used for validation.

mNARX structure
The The dimensionality of v x is reduced by selecting n i and n j such that n i < ν y = 19 and n j < ν z = 19.For simplicity, we choose the same number of coefficients in the two spatial direction (n i = n j ).As the system response is mostly governed by the low spatial frequency components of the wind, we only keep the 2-5 coefficients corresponding to the lowest frequencies in each direction.This reduces the spatial dimensionality of each trajectory v x (t) by 1-2 orders of magnitude, from 361 to 4-25.
We use the time-dependent spectral coefficients ξ in Eq. ( 26) as the exogenous input to build a first polynomial NARX model (see Section 2.4) to model the blade pitch ϕ(t) of the turbine: where the subscript ϕ in ξ denotes that we only use a subset of ξ as the exogenous input to Mϕ .Modelling the blade pitch is a crucial first step as it determines the angle of attack and the amount of wind striking the blades, and therefore, how strongly the turbine responds to the wind.Because the ROSCO controller adjusts the blade pitch mostly based on the inflowing wind, using only ξ as an exogenous input is sufficient for this QoI.
When building the NARX model, the wind speed component v x is generated at 20 Hz, but the simulator output has a sampling frequency of 160 Hz, so we upsample ξ to match the sampling frequency of the simulator.With 100 training simulations, each 600 s long, this results in a large amount of data, with a total of O(10 7 ) time steps, which would cause the regression matrix (see Section 2.4) in the linear regression step to become extremely large.To avoid this issue, we use a random subset of the design matrix as described in Section 2.1.1 and the corresponding output vector to perform the least-square fitting of the polynomial NARX.By doing this, we can ensure that the model is well-conditioned and not overfitting, while at the same time reducing the computational costs associated with training.Note that the number of training samples can still be high relative to the number of regression coefficients.Although it is possible to further reduce the number of samples or choose a more complex model structure, we have found that increasing the number of lags or polynomial degree does not significantly improve accuracy.Moreover, using fewer samples also does not provide a significant benefit, since the computational cost of fitting the model is already low.The number of random time steps used to train Mϕ and the model configuration can be found in Table 5.In the second modelling step, we construct a NARX model to predict the rotor speed ω(t).The rotor speed is a crucial quantity in wind turbine operation, as it determines the power generation and the evolution of the rotor azimuth.The latter is strongly connected to the blade loads because of gravity and because a blade pointing upwards usually experiences higher wind speeds.
The rotor speed depends in turn on the blade pitch φ(t) that we modelled in the first step.
Therefore, we create an input manifold ζ ω (t) = {ξ ω (t), φ(t)} as the exogenous input for the NARX model.The NARX model is then represented by: The configuration of the NARX model and the number of random subsamples used for the least-square fit are listed in Tab. 6.Finally, to predict the generated power P , we again create a new exogenous input manifold } which includes a subset of the spectral coefficients (Eq.( 26)), the predicted blade pitch and the predicted rotor speed.The NARX model based on this input then reads and its specific structure is reported in Tab. 7. Our second main QoI, the flapwise blade root moment M Bld strongly depends on the blade pitch and the azimuth α(t) of the blades.When α is zero (the first blade pointing upwards) the blade is axially under compression because of gravity and typically experiences higher winds at this higher altitude.With α = 180°the blade points down, is under tension and exposed to lower wind speeds.Consequently, M Bld is periodic when the rotor speed is constant and quasi-periodic in practice when the rotor speed varies.M Bld shows even higher periodicity due to tower shadowing effect when any of the three blades pass the turbine tower.To account for these phenomena we integrate the predicted rotor speed in time to obtain a prediction for the rotor azimuth α(t) = t 0 ω(t)dt, and construct the higher harmonics ẑ(t) = {cos (k α(t)), sin (k α(t))}, k = {1, . . ., 4} from it.These harmonics, in conjunction with a large set of spectral coefficients and the blade pitch prediction, build a new manifold that allows us to model the complex evolution of M Bld : As the blade moments depend on the wind speeds in its close proximity, a relatively large number of 25 spectral coefficients is used.The full NARX configuration is given in Tab. 5 The results for the auxiliary quantities and main quantities of interest are presented in the following sections, namely Section 3.2.6-3.2.9.Similar to the spring-mass system (Section 3.1), the autoregressive predictions are initialized with the true initial conditions for better validation.This is particularly important for the blade moment, as it oscillates and its phase depends on the rotor azimuth.Starting the prediction with a different initial state can lead to a different evolution of the blade moment.It is worth noting, however, that this is also the case for the ASE simulator, where different initial conditions will yield different responses, and not specific to the mNARX surrogate.

Blade pitch
The performance of the NARX model on the first auxiliary quantity, namely the blade pitch ϕ(t), is shown in Fig. 6.Fig. 6a shows the results in the below-rated wind speed regime and Fig. 6b the above-rated wind speed regime.
We report the root-mean-squared error (RMSE) in degrees in the top panels.In both regimes, the RMSE is low with a maximum of 1.5°and does not show any clear outliers.The high peak for the below-rated wind speed regime at zero RMSE can be explained by mostly having a zero pitch angle during the full 10 min simulation at low wind speeds.This region of the input space is also well captured by the surrogate.
In the middle panels we show the traces with the lowest RMSE (best-case point of the validation set) for either regime.In the below-rated wind speed regime this is a simulation without any pitch controller action (ϕ = 0 over 600 s), and therefore modelled precisely.In the high wind speed regime, we see constantly high pitch angles between 10°and 20°which is replicated by the surrogate with very high accuracy.Similarly, we show the two samples with the highest RMSE (worst-case point of our validation set) in the bottom panels.It becomes clear that the surrogate for the low wind speed regime is not able to mimic the fast actions of the controller and returns a much smoother response.In the high wind speed regime where we see only a slightly more active controller, the surrogate exhibits a lower RMSE.This may be caused by the higher polynomial degree of the NARX model in this regime (see NARX configuration in Tab. 2) or because there is more data in this transition region from inactive to active pitch controller in the high wind speed training dataset.

Rotor speed
Similarly to the blade pitch results we present the performance of mNARX on the second auxiliary quantity, i.e., the rotor speed ω(t).The top panels show a very low RMSE for the rotor speed prediction in both wind speed regimes.The relative frequency of the RMSE in the low wind speed regimes decreases with increasing RMSE, whereas for high wind speeds the RMSE is almost symmetrically distributed around an RMSE of 0.1 rpm.This very different distribution of the RMSE is related to the fact that at low wind speeds the rotor speed takes a wide range of values whereas it is kept at about the rated rotor speed of 12.1 rpm (see Tab. 3) at high wind speeds (meaning about 1 % accuracy).Nevertheless, the middle panels, which show the simulations with the lowest RMSE (best-case), highlight the high accuracy of the surrogate.

Generator power
The results for the generator power P are given in Fig. 8.To provide a general overview of the results we plot the produced energy E during the 10 min simulation of the true power production and the predicted one for the above-(Fig.8a) and below-rated wind speed regime (Fig. 8b) in the top left panels.The heatmap represents the reference wind speed V hub of each simulation.In the low wind speed regime, the energy production increases with increasing V hub .In contrast, in the high wind speed regime in most simulations the energy production reaches its saturation level of 0.83 MWh, which corresponds to the turbine operating for 10 min at its rated power of 5 MW.In the low wind speed regime, the surrogate tends to underpredict the power output, as can be seen clearly in the top right panel in Fig. 8b, where we plot the difference between the true and predicted energy production ( Ê − E).In the high wind speed regime, there is a slight overprediction of the energy production as shown in the top right panel in Fig. 8b.On average the energy production is overpredicted by 0.0021 MWh which is 0.25 % of the saturation level.
The two middle panels display the simulations with the lowest discrepancy in the E. The surrogates predict the true power output well in both subplots.The bottom panel in Fig. 8a confirms the findings from the discrepancy plot that for high errors the surrogate consistently underestimates the power output in the below-rated wind speed regime.This mostly happens at wind speeds close to the classification boundary which also corresponds to the region in which the turbine reaches its rated power.This supports the indications from Section 3.2.6 that there is a lack of training samples in this input space region.The bottom panel in Fig. 8b which shows the worst prediction in the above-rated wind speed validation dataset shows still good agreement with the true output even when the power output drops well below rated power.The fact that the simulations in the two bottom panels have a similar V hub again supports the assumption that the below-rated wind speed data set is lacking enough samples close to or at rated power.

Flapwise blade root moment results
The results for the flapwise blade root moment M Bld in the below-rated wind speed regime are displayed in Fig. 9a and we present the result for the above-rated wind speed regime in Fig. 9b.
We quantify the accuracy of the surrogate in terms of the absolute peak moment |M Bld | max , which is one of the main quantities of interest in the wind turbine design process.Note that the peak value is extracted from the surrogated time series as | M Bld | max (that is, we do not surrogate this scalar quantity directly, as it is common in the wind energy literature).
In The error on the below-rated wind speed validation dataset is about twice lower than on the above-rated wind speed validation dataset.
In the two middle panels the traces with the lowest error (best-case of the validation set) in the peak moment are displayed.The 90 s long sections marked in grey cover the true peak moment and are shown in more detail below the full 600 s traces.In the low wind speed regime the predicted peak value matches the true one almost exactly because the highest moment occurs at the beginning of the prediction.Note that as explained in Sec 3.2.5 we initialize the prediction with the true values for a better comparison and therefore no error built up yet.In the high wind speed regime the true peak value appears later in the simulation after around 150 s while the predicted peak value occurs about 10 s after the true one.However, because the two peaks in the data are of similar magnitude, the error remains small.
In the two bottom panels, the traces with the highest error (worst-case of the validation set) are displayed.From the trace of the low wind speed regime it becomes clear that an error in the rotor speed can propagate to the prediction of the blade moment as stated in Sec 3.2.5.There is a clear mismatch in the phase of the true and predicted moment which is caused by a wrong azimuth, which is derived from the predicted rotor speed (see Section 3.2.5).A different pattern can be observed in the above-rated wind speed regime where the phases do match even after more than 400 s but the sharp peak in the response is still underpredicted.Nevertheless, the two traces with the highest error still show stable predictions over the full 600 s and visually agree well with the true response.

Discussion and conclusions
In this paper, we introduced mNARX, a novel surrogate modelling technique that enables the efficient and accurate emulation of the response of complex dynamical systems, even in the presence of e.g.active controllers.To do so, mNARX sequentially builds a chain of Nonlinear AutoRegressive with eXogenous inputs (NARX) models.These models are trained on an incrementally built exogenous input manifold that can contain not only the raw exogenous input but also the prediction of the NARX models earlier in the chain and features derived from these predictions.Therefore, the number of available features, and thus the information content of the exogenous input, increases as the modelling chain becomes longer, allowing for the modelling of more intricate quantities of interest.
We demonstrate the effectiveness of mNARX on different case studies.In the first example, we emulate a coupled two-mass-two-spring system with a one-dimensional exogenous input, and show that mNARX is capable of emulating the response of both system components with high accuracy, despite being trained on an extremely small dataset.In the second case study, a full aero-servo-elastic wind turbine simulator, we demonstrate that mNARX can handle dynamical systems with high exogenous input dimensionality when combined with dimensionality reduction techniques.This case study also highlights that mNARX provides long-term stable predictions with relatively small error accumulation over time, even when multiple auxiliary quantities, such as the turbine control system and turbine state variables, are constructed.Because this remarkable efficiency is achieved by decomposing the original problem into a set of relevant sub problems, mNARX works best when some prior knowledge of the system properties and underlying physics is available.Especially in many applications in physics and engineering, extensive knowledge about the system is usually available, and additional knowledge can be obtained through data processing or measurements, making mNARX well-suited for these applications.In cases where this condition is not met, mNARX falls back to standard NARX modelling, i.e. gives at least as good results.It should be noted, however, that mNARX inherits some limitations of standard NARX modelling, e.g., it may fail in resonant systems when the system response is no longer governed by the exogenous excitation.
Our methodology is readily extendible in multiple directions.High-sampling rate time-series are often highly correlated, making it possible to subsample from the design matrix of the experimental design data (as explained in Section 2.1.1).This not only speeds up the model calibration but also enhances accuracy in specific regions in the input or output domain.For instance, one may select more samples that are in proximity to extreme responses, if the primary focus is to capture extreme values (e.g. in view of reliability analysis).
To improve the performance of the mNARX surrogate in cases with high-dimensional exogenous inputs, nonlinear dimensionality reduction techniques can be employed.Additionally, alternative NARX models, as an example neural-network based, may also enhance the predictive performance of the mNARX surrogate, albeit likely with an associated increase in computational cost and data consumption compared to polynomial models.
To reduce the need for prior knowledge, ongoing research is focused on the automatic detection and selection of auxiliary quantities, as well as construction of the input manifold.A more data-driven setting can increase applicability in cases with limited knowledge of the system being modelled.

Figure 1 :
Figure 1: The figure shows the incremental construction of auxiliary quantities z i starting from the observed exogenous input x.With each iteration, the set of available features increases and the newly generated features provide more informative data, improving the mapping to the output y.

Figure 2 :
Figure 2: Illustration showing the creation of a small set of time-dependent features xi (t) ∈ R, i = 1, . . ., m, from the original high-dimensional input x(t) ∈ R M where M = n 1 × n 2 and m = 3.

Fig. 4a and
Fig. 4a and the result for the upper mass displacement in Fig. 4b.In the top left panel of Fig. 4a we provide a visual comparison between the true absolute peak displacements of the lower mass |y 1 | max and the predicted one |ŷ 1 | max .The root-mean-squared error (RMSE) on this quantity is shown in the top right panel.The full trace of the realization marked by a blue cross is displayed in bottom panel.These plots show that a linear model with only 6 terms predicts well the peak displacement even for the extremely low and high values.Further, the model prediction is accurate and stable over the full 30 seconds duration of the

Figure 4 :
Figure 4: Coupled spring-mass system results.(a) (Top left) Predicted maximum displacement of the lower mass |ŷ 1 | max vs. the true displacement |y 1 | max identical by construction for the mNARX and NARX surrogates.(Top right) Histogram of the root-mean-squared error (RMSE) for the prediction of the displacement y 1 .(Bottom) Exemplary trace of the lower mass displacement.The true displacement is depicted as a solid black line while the prediction is shown as a dashed blue line.The trace corresponds to the point and RMSE marked by a blue cross in the scatter plot and histogram.(b) (Top left) Predicted vs. true maximum displacement of the upper mass y 2 .The results for the NARX approach are shown in purple and the mNARX results are shown in orange.(Top right) Histogram of the RMSE for the prediction of the displacement y 2 .(Bottom) Exemplary trace of the upper mass displacement.The true displacement is depicted in black, the mNARX prediction as a dashed orange line and the NARX prediction in purple.The traces correspond to the orange and purple crosses in the scatter plot and histogram.

Figure 5 :
Figure 5: Sketch of an onshore wind turbine.The following degrees of freedom and loads are annotated: rotor azimuth angle α (red), blade pitch angle ϕ (orange), yaw angle θ (green), flapwise blade root bending moment M Bld (blue).
high spatial dimensionality of the turbulence box (M = 3 • 19 • 19 = 1,083) makes it infeasible to use it directly as an exogenous input to model any of the turbine response quantities.To reduce its dimensionality, we only keep the longitudinal wind speed component of the turbulence box, v x , as it is the most relevant in this context (M = 19 • 19 = 361) .Additionally, the transverse and vertical wind speeds, which have no spatial coherence by construction (see Section 3.2.2),can be considered as noise.To further reduce the dimensionality of the data, each 2D slice v x (t) of the turbulence box, characterized by pixels v κ,ℓ x (t), k, l = 1, . . .19, is represented by its 2D discrete cosine transform (DCT) coefficients ξ(t): Figure 6: Wind turbine simulation -Results for the blade pitch prediction in the below-rated (a) and above-rated wind speed regime (b).The top panel shows the root-mean-squared error (RMSE) of the blade pitch prediction in degrees on the validation dataset.The middle panel displays the true (black) and predicted (green) trace corresponding to the simulation with the lowest RMSE.The bottom panel illustrates the traces corresponding to the simulation with the highest RMSE.The prediction is depicted in red and the true response in black.
Figure 7: Wind turbine simulation -Results for the rotor speed prediction in the below-rated (a) and above-rated wind speed regime (b).The top panel shows the root-mean-squared error (RMSE) of the rotor speed prediction in revolutions per minute on the validation dataset.The middle panel displays the true (black) and predicted (green) trace corresponding to the simulation with the lowest RMSE.The bottom panel illustrates the traces corresponding to the simulation with the highest RMSE.The prediction is depicted in red and the true response in black.
Figure 8: Wind turbine simulation -Results for the generator power prediction in the belowrated (a) and above-rated wind speed regime (b).The left top panel shows the produced energy of the true output E and of the prediction Ê in MWh for each simulation in the validation dataset.The color of each scatter represents the reference wind speed V hub of that simulation.The right top panel shows the histogram of the difference between predicted and true produced energy.The middle panel displays the true and predicted trace of the power output corresponding to the simulation with the lowest difference Ê − E. The true response is shown in black and the prediction in green.The bottom panel illustrates the traces corresponding to the simulation with the highest error.The prediction is shown in red and the true values in black.

Figure 9 :
Figure 9: Wind turbine simulation -Results for the flapwise blade root moment M Bld prediction in the below-rated (a) and above-rated wind speed regime (b).The left top panel shows the absolute peak moment |M Bld | max of the true response and of the prediction in MNm for the validation dataset.The color shows the reference wind speed V hub .The right top panel shows the histogram of the difference between predicted and true peak moment.The middle panel displays the true and predicted trace of the moment corresponding to the simulation with the lowest error in the maximum moment.The true response is shown in black and the prediction in green.The bottom panel illustrates the traces corresponding to the simulation with the highest error.The prediction is shown in red and the true values in black.
mNARX has the favourable property of requiring a small training dataset, since it capitalizes on several intermediate NARX models, each of which modelling a simpler sub-problem compared to the full problem.Since each subproblem has relatively low complexity, we were able to use low-degree polynomial NARX models, and therefore keeping the computational cost of training and evaluating the full mNARX surrogate low.

Table 1 :
Coupled spring-mass system -System parameters

Table 2 :
Coupled spring-mass system -Model configurations of the mNARX and classic NARX

Table 4 :
discretization, we use a spatial grid resolution of 19 by 19 (ν y = ν z = 19) grid points, resulting in an exogenous input with a spatial dimensionality of 1,083.The wind speed is sampled at 20 Hz (time step of 0.05 s) while the ASE solver uses a time step of 6.25 ms.Thus the turbulence box is interpolated to this 8-time higher rate.The full set of parameters used for TurbSim is listed in TurbSim parameters (detailed distributions can be found in IEC (2019))

Table 5 :
Wind turbine simulation -Configuration of the NARX surrogate for the prediction of the blade pitch ϕ(t)

Table 6 :
Wind turbine simulation -Configuration of the NARX surrogate for the prediction of the rotor speed ω(t)

Table 7 :
Wind turbine simulation -Configuration of the NARX surrogate for the prediction of

Table 8 :
Wind turbine simulation -Configuration of the NARX surrogate for the prediction of the flapwise blade root moment M Bld