Data-driven model reduction by moment matching for linear and nonlinear systems

Theory and methods to obtain reduced order models by moment matching from input/output data are presented. Algorithms for the estimation of the moments of linear and nonlinear systems are proposed. The estimates are exploited to construct families of reduced order models. These models asymptotically match the moments of the unknown system to be reduced. Conditions to enforce additional properties, e.g. matching with prescribed eigenvalues, upon the reduced order model are provided and discussed. The computational complexity of the algorithms is analyzed and their use is illustrated by two examples: we compute converging reduced order models for a linear system describing the model of a building and we provide, exploiting an approximation of the moment, a nonlinear planar reduced order model for a nonlinear DC-to-DC converter. © 2017 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
The availability of mathematical models is essential for the analysis, control and design of modern technological devices. As computational power has advanced, the complexity of these mathematical descriptions has increased, keeping the computational demands at, or beyond, the limit of the available resources (Åström & Kumar, 2014). A solution to this problem is offered by model reduction, which consists in finding a simplified mathematical model that maintains some key properties of the original description.
In the linear framework several techniques have been developed by the systems and control community (Antoulas, 2005). These techniques can be divided into two main groups: approximation methods based on the singular value decomposition (SVD), such as the proper orthogonal decomposition (POD), see e.g. Astrid, Weiland, Willcox, and Backx (2008), Hinze and Volkwein (2005), Volkwein (1999, 2008), Rowley, Colonius, and Murray (2004) and Willcox and Peraire (2002), and approximation methods based on Krylov projectors, also known as moment matching methods. A nonlinear enhancement of the moment matching approach has eluded researchers until recently. In Gallivan et al. (2004, 2006), a characterization of the moments in terms of the solution of a Sylvester equation has been given. Exploiting this relation, in Astolfi (2007, 2010) a new interpretation of the moment matching approach, based on a steady-state description of the moment, has been proposed. This has led to further developments in the model reduction field, e.g. the extension of the model reduction theory to linear and nonlinear time-delay systems (Scarciotti & Astolfi, 2016b); see also Ionescu and Astolfi (2013b), Ionescu, Astolfi, and Colaneri (2014) and Scarciotti and Astolfi (2015a, 2016a). When dealing with model reduction, one usually starts with a high-dimensional model to be reduced. In fact, most of the methods assume the knowledge of a state-space model of the system to be reduced. However, in practice this model is not always available. Among the moment matching methods, a data-driven approach to reduce linear systems has been proposed under the name of Loewner framework (Mayo & Antoulas, 2007). This method constructs reduced order models by using matrices composed of frequency-domain measurements, which makes it intrinsically difficult to extend the method to nonlinear systems. In this paper, inspired by the learning algorithms given in Bian, Jiang, and Jiang (2014) and Jiang and Jiang (2012, 2014) to solve model-free adaptive dynamic programming problems (see also the references therein, e.g. Baird, 1994; Vrabie, Pastravanu, Abu-Khalaf, & Lewis, 2009), we propose time-domain data-driven online algorithms for the model reduction of linear and nonlinear systems. Collecting time-snapshots of the input and output of the system at a given sequence of time instants t_k, two algorithms to define families of reduced order models (in the framework introduced in Astolfi, 2010) at each instant t_k of the iteration are devised. The reduced order models asymptotically match the moments of the unknown system to be reduced. These algorithms have several advantages with respect to an identification-plus-reduction technique: there is no need to identify the system, which is expensive both in terms of computational power and storage memory; since the reduced order model matches the moments of the unknown system, it is not just the result of a low-order identification but it actually retains some properties of the larger system; since the proposed algorithm determines directly the moment of a nonlinear system from the input and output data, it does not involve the computation of the solution of a partial differential equation, which is usually a difficult task; in addition, for linear systems the capability of determining reduced order models from data matrices (of order proportional to ν) is computationally efficient when compared with model reduction obtained by manipulating the system matrices (of order n ≫ ν).
Thus, the method is of computational value, for both linear and nonlinear systems, even when the system to be reduced is known.
This work is at the intersection of large and active research areas, such as model reduction and system identification. For instance, the use of time-snapshots and data matrices is reminiscent of the methods of POD, see e.g. Lorenz (1956) and Noack, Afanasiev, Morzynski, Tadmor, and Thiele (2003), and of subspace identification, see e.g. van Overschee and de Moor (1996) and Verhaegen and Verdult (2007). Moreover, while the Loewner framework is a frequency-domain data-driven method proposed in model reduction, various time-domain data-driven techniques have been presented in system identification, such as the eigensystem realization algorithm, see e.g. Cooper (1999), Houtzager, van Wingerden, and Verhaegen (2012), Majji, Juang, and Junkins (2010) and Rebolho, Belo, and Marques (2014), and the dynamic mode decomposition, see Hemati, Williams, and Rowley (2014). Hence, this paper can be seen as part of the model reduction literature or of the system identification literature. However, since the results of the paper have strong connections with the notion of moment, we present the results using the terminology of model reduction.
The rest of the paper is organized as follows. In Section 2.1 (3.1) the definition of moment and the model reduction techniques, as developed in Astolfi (2010), are recalled for linear (nonlinear) systems. In Section 2.2 we give a preliminary analysis to compute on-line estimates of the moments of a linear system. In Section 2.3 (3.2) approximations which converge asymptotically to the moments of the linear (nonlinear) system are given. Therein, a discussion of the computational complexity associated with the evaluation of these approximations is presented, a recursive least-square formula is given and a moment estimation algorithm is provided. In Section 2.4 (3.3) we give a family of reduced order models for linear (nonlinear) systems. In Section 2.5 we discuss how several properties, e.g. matching with prescribed eigenvalues or zeros, can be enforced in the present scenario. In Section 2.6 a linear reduced order model describing the dynamics of a building is estimated with the method proposed in the paper. In Section 3.4 a nonlinear reduced order model constructed using an approximation of the moment of a DC-to-DC converter provides a further example. Finally Section 4 contains some concluding remarks.
Preliminary versions of this paper have been published in Scarciotti and Astolfi (2015b,c). The additional contributions of the present paper are as follows: the results are presented in a more formal and organized way (by means, e.g. of theorems and propositions) and all the results are directly proved; two recursive algorithms are given; properties of the exponentially converging models, such as matching with prescribed eigenvalues, are discussed; the reduction of the linear model describing the dynamics of a building provides a new example based on a physical system; the so-called ''U/Y'' variation of the nonlinear algorithm is given. Finally, the example in Scarciotti and Astolfi (2015c) is extended providing a nonlinear planar reduced order model estimated with the presented techniques.
While this paper was under review, the proposed algorithm has been applied to the problem of model reduction of power systems (Scarciotti, 2017). Therein the authors provide an extensive testing of the linear algorithm. This suggests that the results of the present paper can be of relevance for applications in diverse research domains.
Notation. We use standard notation. R≥0 (R>0) denotes the set of non-negative (positive) real numbers; C<0 denotes the set of complex numbers with negative real part; C0 denotes the set of complex numbers with zero real part. The symbol I denotes the identity matrix and σ(A) denotes the spectrum of the matrix A ∈ R^{n×n}. The symbol ⊗ indicates the Kronecker product and ∥A∥ indicates the induced Euclidean matrix norm. The symbol ϵ_k indicates a vector with the kth element equal to 1 and all other elements equal to 0. The vectorization of a matrix A ∈ R^{n×m}, denoted by vec(A), is the nm × 1 vector obtained by stacking the columns of A one on top of the other, namely vec(A) = [a_1^⊤, a_2^⊤, …, a_m^⊤]^⊤, where a_i ∈ R^n is the ith column of A and the superscript ⊤ denotes transposition.
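The vectorization identity vec(ΠW) = (W^⊤ ⊗ I) vec(Π), used repeatedly in the proofs below, can be checked numerically. The following minimal Python sketch (random stand-in matrices, not data from the paper) verifies it with NumPy; note that NumPy flattens row-wise, so the column-stacking vec is implemented as the flattening of the transpose.

```python
import numpy as np

# Verify vec(Pi @ W) = (W^T kron I_n) vec(Pi) on random matrices.
rng = np.random.default_rng(0)
n, nu = 3, 2
Pi = rng.standard_normal((n, nu))
W = rng.standard_normal((nu, 4))

def vec(M):
    # column-stacking vectorization (NumPy flattens row-wise, hence the .T)
    return M.T.reshape(-1)

lhs = vec(Pi @ W)
rhs = np.kron(W.T, np.eye(n)) @ vec(Pi)
err = np.linalg.norm(lhs - rhs)
```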

Model reduction by moment matching -recalled
To render the paper self-contained, in this section we recall the notion of moment for linear systems as presented in Astolfi (2010). Consider a linear, single-input, single-output, continuous-time system described by the equations

ẋ = Ax + Bu,  y = Cx,  (1)

with x(t) ∈ R^n, u(t) ∈ R, y(t) ∈ R, A ∈ R^{n×n}, B ∈ R^{n×1} and C ∈ R^{1×n}. Let W(s) = C(sI − A)^{−1}B be the associated transfer function and assume that (1) is minimal, i.e. controllable and observable.
In Astolfi (2010), exploiting the observation that the moments of system (1) can be characterized in terms of the solution of a Sylvester equation (see Gallivan et al., 2004, 2006), it has been noted that the moments are in one-to-one relation with the well-defined steady-state output response of the interconnection between a signal generator and system (1). This interpretation of the notion of moment, which relies upon the center manifold theory, has the advantage that it can be extended to nonlinear systems, and it is of particular interest for the aims of this paper.
Consider an observable pair (L, S), with S ∈ R^{ν×ν} non-derogatory and L ∈ R^{1×ν}, such that σ(A) ∩ σ(S) = ∅, and let Π ∈ R^{n×ν} be the unique solution of the Sylvester equation

AΠ + BL = ΠS.  (2)

The moments of system (1) at the eigenvalues of S then coincide, in suitable coordinates, with the elements of CΠ. Consider also a signal generator described by the equations

ω̇ = Sω,  u = Lω,  (3)

with ω(t) ∈ R^ν. System (1) driven by (3) has a global invariant manifold described by M = {(x, ω) ∈ R^{n+ν} : x = Πω}. Hence, for all t ∈ R,

x(t) = Πω(t) + e^{At}(x(0) − Πω(0)).  (4)

Finally, as shown in Astolfi (2010), the family of systems

ξ̇ = (S − GL)ξ + Gu,  ψ = CΠξ,  (5)

with G any matrix such that σ(S) ∩ σ(S − GL) = ∅, contains all the models of dimension ν interpolating the moments of system (1) at the eigenvalues of S. Hence, we say that system (5) is a model of (1) at S.
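The recalled construction can be illustrated numerically. The sketch below (all matrices are arbitrary stand-ins chosen for the example, not data from the paper) computes Π from the Sylvester equation with SciPy and checks that a member of the reduced-model family interpolates the transfer function of the full system at an eigenvalue of S.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# Stand-in system (stable) and generator with eigenvalues +/- i.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
L = np.array([[1.0, 0.0]])

# A Pi + B L = Pi S  <=>  A Pi + Pi (-S) = -B L
Pi = solve_sylvester(A, -S, -B @ L)

# One member of the family: xi' = (S - G L) xi + G u, psi = C Pi xi,
# with G such that sigma(S) and sigma(S - G L) are disjoint.
G = np.array([[1.0], [2.0]])
F = S - G @ L

def tf(Amat, Bmat, Cmat, s):
    # transfer function value Cmat (sI - Amat)^{-1} Bmat at the point s
    return (Cmat @ np.linalg.solve(s * np.eye(Amat.shape[0]) - Amat, Bmat))[0, 0]

s0 = 1j                      # eigenvalue of S
w_full = tf(A, B, C, s0)
w_red = tf(F, G, C @ Pi, s0)
```

Here P = I solves the Sylvester equation associated with the reduced model, so its moment at S is exactly CΠ and the two transfer functions coincide at ±i.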

A preliminary analysis
In this section we provide a preliminary analysis assuming to know the matrices A, B, C and the state x(0) in Eq. (1). This analysis is used in the following section for the development of an estimation algorithm which, this time, does not use the matrices A, B, C and the state x(0). To this end we make the following assumptions.
Assumption 1. The input u of system (1) is given by the signal generator (3), with S such that σ(S) ⊂ C0 with simple eigenvalues. In addition, assume that the triple (L, S, ω(0)) is minimal.

Assumption 2. System (1) is asymptotically stable, i.e. σ(A) ⊂ C<0.

1 A matrix is non-derogatory if its characteristic and minimal polynomials coincide.
2 The matrices A, B, C and the points s_i identify the moments. Then, given any observable pair (L, S) with s_i ∈ σ(S), there exists an invertible matrix T ∈ R^{ν×ν} such that the elements of the vector CΠT^{−1} are equal to the moments.

Assumption 1 has a series of implications. The hypothesis on the eigenvalues of S entails no real loss of generality, since the contribution of modes associated with eigenvalues with negative real part to the response of the system decays to zero. The minimality of the triple (L, S, ω(0)) implies the observability of the pair (L, S) and the ''controllability'' of the pair (S, ω(0)). This last condition, called excitability of the pair (S, ω(0)), is a geometric characterization of the property that the signals generated by (3) are persistently exciting, see Åström and Wittenmark (1995) and Verhaegen and Verdult (2007). The choice of the particular structure (3) for the input u is limiting in applications in which the input cannot be arbitrarily chosen. We suggest in Section 2.3 a possible way to deal with alternative input signals. Note that the two assumptions imply that σ(A) ∩ σ(S) = ∅, which in turn implies that Eq.
(2) has a unique solution or, equivalently, that the response of system (1) driven by (3) is described by (4). We evaluate Eq. (4) over a set of sample times T_k^p = {t_{k−p+1}, …, t_{k−1}, t_k}, with 0 ≤ t_0 < t_1 < ⋯ < t_{k−p} < ⋯ < t_k < ⋯ < t_q, p > 0 and q ≥ p. The set T_k^p represents a moving window of p sample times t_i, with i = 0, …, q. We call Π_k the estimate of the matrix Π at T_k^p, namely the estimate computed at the time t_k using the last p instants of time t_{k−p+1}, …, t_k. The estimate can be computed as follows.
Theorem 2. Let the time-snapshots Q_k ∈ R^{np×nν} and χ_k ∈ R^{np}, with p ≥ ν, be defined by stacking, for t_i ∈ T_k^p, the blocks

ω(t_i)^⊤ ⊗ I_n − ω(0)^⊤ ⊗ e^{At_i}  and  x(t_i) − e^{At_i}x(0),

respectively. Assume the matrix Q_k has full column rank; then

vec(Π_k) = (Q_k^⊤ Q_k)^{−1} Q_k^⊤ χ_k.  (6)

Proof. Eq. (4) can be rewritten as

x(t) = Πω(t) − e^{At}Πω(0) + e^{At}x(0).  (7)

Using the vectorization operator and the Kronecker product on Eq. (7) yields

x(t) − e^{At}x(0) = (ω(t)^⊤ ⊗ I_n − ω(0)^⊤ ⊗ e^{At}) vec(Π).  (8)

Computing Eq. (8) at all elements of T_k^p yields

χ_k = Q_k vec(Π_k).  (9)

If the matrix Q_k has full column rank, then we can compute Π_k from the last equation, yielding Eq. (6).
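A numerical sketch of this construction (stand-in matrices, not data from the paper; the exact response (4) is used to generate noise-free state snapshots, so the recovery of Π is exact):

```python
import numpy as np
from scipy.linalg import expm, solve_sylvester

# Stand-in system and generator.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
L = np.array([[1.0, 0.0]])
n, nu = 2, 2

Pi_true = solve_sylvester(A, -S, -B @ L)     # reference solution of (2)
x0 = np.array([1.0, -1.0])
w0 = np.array([1.0, 0.5])

times = [0.3, 0.7, 1.1, 1.6]                 # p = 4 >= nu sample times
Q_blocks, chi_blocks = [], []
for t in times:
    eAt = expm(A * t)
    wt = expm(S * t) @ w0                    # generator state omega(t)
    xt = Pi_true @ wt + eAt @ (x0 - Pi_true @ w0)   # response (4)
    # block of Q_k: omega(t)^T kron I  -  omega(0)^T kron e^{At}
    Q_blocks.append(np.kron(wt, np.eye(n)) - np.kron(w0, eAt))
    # block of chi_k: x(t) - e^{At} x(0)
    chi_blocks.append(xt - eAt @ x0)

Q = np.vstack(Q_blocks)
chi = np.concatenate(chi_blocks)
vec_Pi, *_ = np.linalg.lstsq(Q, chi, rcond=None)
Pi_k = vec_Pi.reshape(nu, n).T               # undo the column-stacking vec
```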
Note that the selection of the set T p k can affect the quality of the data and the rank of the matrix Q k . Thus, to assure that T p k is nonpathological (Chen & Francis, 1995) we introduce the following technical assumption.
Assumption 3. The elements of T_k^ν, i.e. the sampling times, are such that the time-snapshots ω(t_{k−ν+1}), …, ω(t_k) are linearly independent for all k.

If Assumption 1 holds then it is always possible to choose the elements of T_k^ν, i.e. the sampling times, such that Assumption 3 holds (see Lemma 1 in Padoan, Scarciotti, & Astolfi, in press, for a proof of this fact). We now show that this assumption can be used to make the matrix Q_k full rank.

Lemma 1. Suppose Assumptions 1-3 hold. If p = ν, then Q_k is square and full rank.
Proof. Since σ(A) ⊂ C<0 and σ(S) ⊂ C0, the excitability of (S, ω(0)) implies that each n × nν block of the matrix Q_k has rank n. Since the blocks are chosen corresponding to the elements of T_k^ν, by Assumption 3 they are linearly independent for all k. As a result, Q_k is a square full rank matrix.
Since real data are affected by noise, the assumptions of Lemma 1 may not hold. In this case p, i.e. the number of samples in the moving window, can be selected larger than ν and, as is well known from linear algebra and remarked in Åström and Wittenmark (1995) and Jiang and Jiang (2012), the solution of Eq. (6) is the least-squares solution of (9). We now prove that the estimate Π_k is actually equal to Π.

Theorem 3. Suppose Assumptions 1-3 hold. Let Π be the solution of Eq. (2). Then Π_k computed by Eq. (6) is equal to Π.
Proof. The matrix Π_k defined in Eq. (6) is such that the equations

x(t_k) = Π_k ω(t_k) + e^{At_k}(x(0) − Π_k ω(0))  (10)

and

u(t_k) = Lω(t_k)  (11)

hold. Consider the first equation of system (1) computed at t_k, namely

ẋ(t_k) = Ax(t_k) + Bu(t_k).  (12)

Substituting Eqs. (10) and (11) in Eq. (12) yields, up to exponentially decaying terms, (AΠ_k + BL − Π_k S)ω(t_k) = 0; by Assumption 3 this implies that Π_k solves the Sylvester equation (2), i.e. Π_k = Π.

The discussion carried out so far has the drawback that information on the state of the system is required. In practice this is usually not the case and only the output y is available. Thus, we look for a counterpart of Theorem 2 in which the output is used in place of the state.

Theorem 4. Let the time-snapshots R_k ∈ R^{w×nν} and Υ_k ∈ R^w, with w ≥ nν, be defined by stacking, for the last w sample times t_i, the rows

ω(t_i)^⊤ ⊗ C − ω(0)^⊤ ⊗ (Ce^{At_i})  and  y(t_i) − Ce^{At_i}x(0),

respectively. Assume the matrix R_k has full column rank; then

vec(Π_k) = (R_k^⊤ R_k)^{−1} R_k^⊤ Υ_k.  (13)

Proof. The result can be proved following the same steps used to obtain Eq. (6).
Similarly to Lemma 1, the following result guarantees that the matrix R k is full rank.
Lemma 2. Suppose Assumptions 1-3 hold. If w = nν, then R k is square and full rank.
Proof. The proof is omitted because it is similar to that of Lemma 1, although this time the observability of the pair (C, A) is also used.

On-line moment estimation from data
Eq. (6) contains terms that depend upon the matrix A and the initial states x(0) and ω(0). However, exploiting the fact that these terms enter the response as exponentially decaying functions of time, i.e. ω(0)^⊤ ⊗ e^{At} and e^{At}x(0), we present now an approximate version of the results of the previous section.
Define the time-snapshots Q̃_k ∈ R^{np×nν} and χ̃_k ∈ R^{np} as Q_k and χ_k with the exponentially decaying terms neglected, i.e. by stacking, for t_i ∈ T_k^p, the blocks ω(t_i)^⊤ ⊗ I_n and x(t_i), respectively. Assume the matrix Q̃_k has full column rank; then, following the same steps used to obtain Eq. (6), we define

vec(Π̃_k) = (Q̃_k^⊤ Q̃_k)^{−1} Q̃_k^⊤ χ̃_k.

Note that if Assumption 3 holds and p = ν, then Q̃_k is square and full rank (the proof of this fact is similar to the proof of Lemma 1 when x(0) = Πω(0), thus it is omitted). We now prove that Π̃_k converges to Π.
Theorem 5. Suppose Assumptions 1-3 hold. Then there exists a sequence {t_k}, with lim_{k→∞} t_k = ∞, such that lim_{k→∞} Π̃_k = Π.

Proof. By Assumptions 1 and 2 there exists a matrix Π such that the steady-state response x_ss(t) of the interconnection of system (1) and the generator (3) is described by the equation x_ss(t) = Πω(t). Hence x(t_k) = Πω(t_k) + e^{At_k}(x(0) − Πω(0)), i.e. vec(Π) satisfies χ̃_k = Q̃_k vec(Π) + ε̃_k, with ε̃_k an exponentially decaying term. Using Eq. (2) and Assumption 2, the contribution of ε̃_k to the estimate vanishes as t_k → ∞. By Assumption 1 there exists a sequence {t_k}, with lim_{k→∞} t_k = ∞, such that for any t_i ∈ {t_k}, ω(t_i) ≠ 0 and Assumption 3 holds; along this sequence Q̃_k has full column rank and the claim follows.

The time-snapshot matrices used above are reminiscent of those used in the POD, in which the data matrix ∆ = [x(t_1) ⋯ x(t_q)] is decomposed as ∆ = Σ_i σ_i μ_i ν_i^*, where the superscript * indicates the complex conjugate transpose. The dimensions of ∆ are related to the number of samples, whereas the dimensions of Π are related to the order of the system to be reduced and of the signal generator.
In fact, the POD is a decomposition of the entire cloud of data {x(t i )} along the vectors µ(t i ), called principal directions of {x(t i )} (Antoulas, 2005). By contrast, in the technique proposed in this paper the oldest data are discarded as soon as new data satisfying Assumption 3 is collected. As a consequence, while ∆ is built to describe the entire dynamics of {x(t i )}, Π is built to describe only the steady-state response of the system to be reduced. The result is that the POD is usually used with the Petrov-Galerkin projection for a SVD-based approximation (Rowley et al., 2004;Willcox & Peraire, 2002), whereas this technique is a moment matching method.
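The convergence mechanism of the approximate estimate can be illustrated numerically. The sketch below (stand-in matrices, not data from the paper) neglects the decaying terms and compares the estimate computed on an early sampling window with the one computed on a late window; the latter is far closer to Π because the transient has died out.

```python
import numpy as np
from scipy.linalg import expm, solve_sylvester

# Stand-in system and generator.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
L = np.array([[1.0, 0.0]])
n, nu = 2, 2

Pi = solve_sylvester(A, -S, -B @ L)
x0 = np.array([1.0, -1.0])
w0 = np.array([1.0, 0.5])

def estimate(window):
    # least-squares estimate with the exponentially decaying terms neglected
    rows, rhs = [], []
    for t in window:
        wt = expm(S * t) @ w0
        xt = Pi @ wt + expm(A * t) @ (x0 - Pi @ w0)   # exact response (4)
        rows.append(np.kron(wt, np.eye(n)))           # transient ignored
        rhs.append(xt)
    sol, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return sol.reshape(nu, n).T

err_early = np.linalg.norm(estimate([0.2, 0.5, 0.9, 1.3]) - Pi)
err_late = np.linalg.norm(estimate([20.2, 20.5, 20.9, 21.3]) - Pi)
```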
A similar discussion can be carried out for Eq. (13), which also contains terms that depend upon the matrix C. In this case note that Eq. (4) can be written as y(t) = CΠω(t) + ε(t), with ε(t) = Ce^{At}(x(0) − Πω(0)) an exponentially decaying signal.
Thus, an approximate version of Theorem 4 follows.
Theorem 6. Define the time-snapshots R̃_k ∈ R^{w×ν} and Υ̃_k ∈ R^w, with w ≥ ν, as

R̃_k = [ω(t_{k−w+1}) ⋯ ω(t_k)]^⊤,  Υ̃_k = [y(t_{k−w+1}) ⋯ y(t_k)]^⊤.

Assume the matrix R̃_k has full column rank; then

vec(C̃Π_k) = (R̃_k^⊤ R̃_k)^{−1} R̃_k^⊤ Υ̃_k  (16)

is an approximation of the on-line estimate CΠ_k, namely there exists a sequence {t_k}, with lim_{k→∞} t_k = ∞, such that lim_{k→∞} C̃Π_k = CΠ.

Proof. Eq. (16) can be derived following the same steps used to obtain Eq. (6). The convergence of the limit to CΠ is proved repeating the proof of Theorem 5.
Note that if Assumption 3 holds and w = ν, then  R k is square and full rank (the proof of this fact is similar to the proof of Lemma 1, thus, it is omitted). Since  R k is smaller than R k , the determination of  C Π k is computationally less complex than the computation of  Π k .
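A numerical sketch of the output-only estimate (stand-in matrices, not data from the paper; a late sampling window is used so that the neglected transient is negligible):

```python
import numpy as np
from scipy.linalg import expm, solve_sylvester

# Stand-in system and generator.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
L = np.array([[1.0, 0.0]])

Pi = solve_sylvester(A, -S, -B @ L)
CPi = (C @ Pi).ravel()                        # moments to be estimated
x0 = np.array([1.0, -1.0])
w0 = np.array([1.0, 0.5])

times = [15.1, 15.6, 16.0, 16.7, 17.3]        # late window: transient ~ e^{-15}
R = np.array([expm(S * t) @ w0 for t in times])          # rows omega(t_i)^T
Y = np.array([(C @ (Pi @ (expm(S * t) @ w0)
                    + expm(A * t) @ (x0 - Pi @ w0))).item() for t in times])

CPi_hat, *_ = np.linalg.lstsq(R, Y, rcond=None)
err = np.linalg.norm(CPi_hat - CPi)
```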
Note also that from Eq. (16) we are not able to retrieve the matrix Π̃_k, but only C̃Π_k. However, as shown in Eq. (5), we only need CΠ to compute the reduced order model, i.e. Π is not explicitly required. Eq. (16) is a classic least-squares estimation formula. Thus, we can provide a recursive formulation.

Theorem 7. The estimate vec(C̃Π_k) given by Eq. (16) can be computed recursively: at each step the new measurement pair (ω(t_k), y(t_k)) is acquired, the oldest pair (ω(t_{k−w}), y(t_{k−w})) is discarded, and the resulting recursion holds for all t ≥ t_r, with t_r the time at which the recursion is initialized.
Proof. The formula is obtained adapting the results in Åström and Wittenmark (1995) (see also Ben-Israel & Greville, 2003; Greville, 1960; Wang & Zhang, 2011) to the present scenario, in which at each step we acquire a new measurement and we discard an old one; the update equations are obtained applying recursively the matrix inversion lemma (Åström & Wittenmark, 1995).

Note that the construction of the initial values vec(C̃Π_r), Φ_r and Ψ_r needed to start the recursion can be done in two ways. The first consists in using Eq. (16) to build vec(C̃Π_r), Φ_r and Ψ_r and then updating the estimate with the equations in Theorem 7; however, this method has the drawback of requiring a matrix inversion to initialize the recursion. The second method consists in starting with dummy initial values vec(C̃Π_r), Φ_r and Ψ_r. Since the formulas ''forget'' the oldest measurements, after a sufficient number of iterations all the dummy measurements are forgotten.
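A generic moving-window recursive least-squares scheme in the spirit of Theorem 7 can be sketched as follows. This is not the paper's exact recursion but a standard variant (all names and data are made up): the inverse information matrix is updated with the Sherman-Morrison form of the matrix inversion lemma once when a measurement is acquired and once when the oldest one is discarded, so no matrix is re-inverted inside the loop.

```python
import numpy as np

rng = np.random.default_rng(1)
nu, w = 3, 8
theta_true = np.array([1.0, -2.0, 0.5])      # plays the role of vec(C Pi)

rows = rng.standard_normal((30, nu))         # regressors (role of omega(t_i)^T)
ys = rows @ theta_true                       # noise-free measurements

# Initialize from the first w samples by direct inversion.
P = np.linalg.inv(rows[:w].T @ rows[:w])     # inverse information matrix
b = rows[:w].T @ ys[:w]                      # information vector

def rank_one(P, r, sign):
    # Sherman-Morrison: inverse of (Phi + sign * r r^T), given P = Phi^{-1}
    Pr = P @ r
    return P - sign * np.outer(Pr, Pr) / (1.0 + sign * (r @ Pr))

for k in range(w, len(rows)):
    P = rank_one(P, rows[k], +1.0)           # acquire the new measurement
    b = b + rows[k] * ys[k]
    P = rank_one(P, rows[k - w], -1.0)       # discard the oldest measurement
    b = b - rows[k - w] * ys[k - w]
    theta = P @ b                            # windowed least-squares estimate
```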
Since for single-input, single-output systems the terms of the form (I + ·)^{−1} appearing in the recursion are scalar, the recursive update is computationally inexpensive. In comparison, the Arnoldi or Lanczos procedure for model reduction by moment matching has a computational complexity of O(νn²) (Antoulas, 2005, Section 14.1) (or O(ανn) for a sparse matrix A, with α the average number of non-zero elements per row/column of A). In addition, note that these procedures require a model to be reduced and thus further expensive computation has to be considered for the identification of the original system.
The approximations Π̃_k and C̃Π_k can be computed with the following algorithm.

Algorithm 1. Let k be a sufficiently large integer. Select η > 0 sufficiently small and p ≥ ν (w ≥ ν, respectively).
1: Construct the matrices Q̃_k and χ̃_k (R̃_k and Υ̃_k, respectively).
2: If the matrix constructed at step 1 has full column rank, compute Π̃_k (C̃Π_k, respectively). Else increase p (w, respectively). If k − w < 0, then restart the algorithm selecting a larger initial k.
3: If the variation of the estimate between two consecutive iterations is larger than η, then set k = k + 1 and go to 1.
4: Stop.
As already noted, it is more realistic to approximate C Π k , using output measurements, than to approximate Π k , which needs state measurements. Moreover, the determination of  C Π k is computationally more efficient since the number of unknown elements is smaller. Nevertheless, the determination of  Π k is not irrelevant. From a theoretical point of view,  Π k provides a way to determine the solution of the Sylvester equation from measurements. Note also that  Π k contains more information than  C Π k . In particular, it provides information regarding the order of the unknown system. Thus, the estimation of Π k paves the way for an extension of this work in the direction of system identification.
3 This is the computational complexity of the fastest known algorithm (Le Gall, 2014) for the inversion and multiplication of matrices. If the classical Gauss-Jordan elimination is used, then the computational complexity is O(ν³).

Remark 2.
It is not always possible to arbitrarily select the input of the system to be reduced. For instance, the input signal may be composed of several frequencies. Instead of system (3), consider the input described by the equations

ω̇ = Sω + v,  u = Lω,

with v(t) ∈ R^ν an unknown signal. In this case the output response of system (1) contains, in addition to the terms in (4), the contribution of v, i.e. it can be written as y(t) = CΠω(t) + y_v(t) + ε(t), with y_v(t) the response due to v and ε(t) = Ce^{At}(x(0) − Πω(0)) an exponentially decaying signal.
One can then apply the filtering techniques given in Åström and Wittenmark (1995, Chapter 11): we filter out v from y and u with a band-pass filter and apply the results of the paper to the filtered y f and u f .
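A minimal sketch of this prefiltering step (the signals, cutoff frequencies and filter order are made up for the example): the generator-frequency component is extracted from a measurement corrupted by an out-of-band disturbance with a band-pass filter; zero-phase forward-backward filtering is used so that the phase of the retained component is not distorted.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 100.0                                    # sampling frequency [Hz]
t = np.arange(0.0, 60.0, 1.0 / fs)
y = np.sin(1.0 * t) + 0.5 * np.sin(30.0 * t)  # in-band term + disturbance [rad/s]

# Band-pass around 1 rad/s (~0.159 Hz); second-order sections for numerical
# robustness, forward-backward filtering for zero phase distortion.
sos = butter(3, [0.05, 0.5], btype="bandpass", fs=fs, output="sos")
y_f = sosfiltfilt(sos, y)

# Compare with the in-band reference away from the window edges.
inner = slice(1000, -1000)
ref = np.sin(1.0 * t)
resid = np.linalg.norm(y_f[inner] - ref[inner]) / np.linalg.norm(ref[inner])
```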

Remark 3.
When the class of inputs is not given, it may be necessary to carry out a preliminary analysis to estimate specific frequencies at which we would like to interpolate the transfer function. This analysis requires an additional step. However, note that the estimation of some of the frequency peaks of the transfer function is not as computationally expensive as a full system identification procedure for two reasons: on one hand we are interested in ν ≪ n frequency features; on the other hand, since our aim is to obtain a reduced order model, we have the additional benefit of obtaining directly a model of order ν (in contrast with the additional computational cost of determining a reduced order model after a high-order identification procedure).
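A minimal sketch of such a preliminary analysis (the signal, window choice and peak threshold are arbitrary choices for the example): the dominant output frequencies are located with an FFT and taken as candidate interpolation points ±jω for the signal generator.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 50.0
t = np.arange(0.0, 40.0, 1.0 / fs)
y = 2.0 * np.sin(3.0 * t) + 0.7 * np.sin(10.0 * t)   # two spectral peaks

win = np.hanning(len(y))                      # Hann window tames spectral leakage
spectrum = np.abs(np.fft.rfft(y * win))
omega = 2.0 * np.pi * np.fft.rfftfreq(len(y), 1.0 / fs)   # [rad/s]

# Dominant peaks -> candidate interpolation frequencies.
peaks, _ = find_peaks(spectrum, height=0.2 * spectrum.max())
candidates = omega[peaks]
```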

Families of reduced order models
Using the approximations given by Algorithm 1 a reduced order model of system (1) can be defined at each instant of time t k .
Definition 3. Consider system (1) and the signal generator (3). Suppose Assumptions 1-3 hold. Then the system

ξ̇ = F_k ξ + G_k u,  ψ_k = H_k ξ,  (22)

with ξ(t) ∈ R^ν, is a model of system (1) at S at time t_k if there exists a unique solution P_k of the equation

F_k P_k + G_k L = P_k S  (23)

such that

H_k P_k = C̃Π_k,  (24)

where C̃Π_k is the solution of (16).
With this definition we can formulate the following result.
Proposition 1. Select P_k = I, for all k ≥ 0. If σ(F_k) ∩ σ(S) = ∅ for all k ≥ 0, then the model

ξ̇ = (S − G_k L)ξ + G_k u,  ψ_k = C̃Π_k ξ,  (25)

is a model of system (1) at S for all t_k.
Note that for most purposes, models (22) and (25) can be defined using directly the asymptotic value of C̃Π_k. However, defining the reduced order model at each instant of time t_k allows one to implement the algorithm online. This is particularly advantageous when the unknown system has a parameter which is subject to variation. If the variation is sufficiently slow, then the algorithm is able to produce updated reduced order models at each t_k.
Remark 4. The so-called Loewner framework represents an alternative data-driven approach to model reduction by moment matching (Mayo & Antoulas, 2007). One of the main differences between our approach and the Loewner framework is that we use time-domain measurements, while the Loewner framework makes use of frequency-domain measurements. Thinking in a classical interpolation/Krylov fashion, in the Loewner framework the measurements are used to build projectors with a particular structure. In the framework introduced in Astolfi (2010), instead, the moments are characterized by the steady-state response of the interconnection of the system with a signal generator, which is naturally estimated from time-domain data.

Remark 5. The results developed so far can be easily extended to linear time-delay systems. In fact, Algorithm 1 can be used to estimate the moments of linear time-delay systems without any modification. On the other hand, the choice of the structure of the time-delay reduced order model is more difficult. We refer the reader to Scarciotti and Astolfi (2015b, 2016b) for more details.

Properties of the exponentially converging models
In Astolfi (2010), Ionescu and Astolfi (2013a), Ionescu et al. (2014) and Scarciotti and Astolfi (2016b) the problem of enforcing additional properties and constraints on the reduced order model has been studied. In this section we go through some of these properties to determine if they hold for the families (25) and under which conditions.
(1) Matching with prescribed eigenvalues: Consider system (25) and the problem of determining at every k the matrix G k such that σ (F k ) = {λ 1,k , . . . , λ ν,k } for some prescribed values λ i,k . The solution of this problem is well-known and consists in selecting G k such that σ (S − G k L) = σ (F k ). This is possible for every k and for all λ i,k ̸ ∈ σ (S) because G k is independent from the estimate  C Π k . Note also that by observability of (L, S), G k is unique at every k.
(2) Matching with prescribed relative degree, matching with prescribed zeros, matching with compartmental constraints: these problems can be solved at every k as detailed in Astolfi (2010), provided that a non-degeneracy condition, condition (26), holds for all s ∈ σ(S) at k. Even though the asymptotic value of C̃Π_k satisfies this condition, there is no guarantee that the condition holds for all k. However, if the condition holds for the asymptotic value, then there exists k̄ ≫ 0 such that for all k ≥ k̄ condition (26) holds.
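For property (1), the computation of G_k can be sketched as follows (S, L and the prescribed eigenvalues are stand-in choices): pole placement on the dual pair (S^⊤, L^⊤) returns a gain whose transpose assigns the spectrum of S − G_kL; by observability of (L, S) the result is unique.

```python
import numpy as np
from scipy.signal import place_poles

S = np.array([[0.0, 1.0], [-1.0, 0.0]])      # generator matrix, eigenvalues +/- i
L = np.array([[1.0, 0.0]])                   # (L, S) observable

desired = np.array([-1.0, -2.0])             # prescribed eigenvalues, not in sigma(S)
res = place_poles(S.T, L.T, desired)         # K with sigma(S^T - L^T K) = desired
G = res.gain_matrix.T                        # transposing: sigma(S - G L) = desired

F = S - G @ L
eigs = np.sort(np.linalg.eigvals(F).real)
```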

Los Angeles University Hospital building model
In this section we apply Algorithm 1 to the model of a building (Los Angeles University Hospital) with 8 floors, each having three degrees of freedom (Antoulas, Sorensen, & Gugercin, 2001; Chahlaoui & Van Dooren, 2005). The model is described by equations of the form (1) with a state of dimension n = 48. The output of the system is the motion of the first coordinate. Note that this model has been reduced with various methods in Antoulas (2005), obtaining a reduced-order model of order ν = 31. In this paper we reduce the system with a model of order ν = 19, interpolating at the points 0, ±5.22ι, ±10.3ι, ±13.5ι, ±22.2ι, ±24.5ι, ±36ι, ±42.4ι, ±55.9ι and ±70ι, corresponding to the main peaks in the magnitude of the frequency response of the system. A reduced order model (25) at time t_k has been constructed assigning the eigenvalues of F_k (determined with the data-driven technique given in Scarciotti, Jiang, & Astolfi, 2016). Fig. 1 shows the Bode plot of the system (solid/blue line) and of the estimated reduced-order model at t = 100 (dotted/red line). The green circles indicate the interpolation points. The surface in Fig. 2 (Fig. 3, respectively) represents the magnitude (phase, respectively) of the transfer function of the reduced order model as a function of t_k, with 4.1 ≤ t_k ≤ 23.7 s. The solid/blue line indicates the magnitude (phase, respectively) of the transfer function of the reduced order model for the exact moments CΠ. The figures show how the approximated magnitude and phase of model (25) at S of system (1) evolve over time and approach the exact reduced order model as t_k → ∞. We also check the qualitative behavior of the reduced order model when the input is not an interpolating signal. To this end, we select the input

u = c_1 cos(0.001t) + c_2 sin(0.1t) + c_3 sin(3t) + c_4 cos(15t) + c_5 sin(23t) + c_6 sin(37t) + c_7 sin(50t),  (27)

where c_i, with i = 1, …, 7, are randomly generated weights such that Σ_{i=1}^7 c_i = 1. Fig. 4 (top graph) shows the output response of the building model in solid/blue line and the output response of the asymptotic reduced order model in dotted/red line when the input to the two systems is selected as in (27). The bottom graph shows the normalized absolute error between the two output responses, i.e. |y(t) − ψ_∞(t)|/max(y(t)). We see that the error is fairly small even though the input is not an interpolating signal.

Model reduction by moment matching -recalled
Similarly to the linear case, to render the paper self-contained we recall the notion of moment for nonlinear systems as presented in Astolfi (2010). Consider a nonlinear, single-input, single-output, continuous-time system described by the equations

ẋ = f(x, u),  y = h(x),  (28)

with x(t) ∈ R^n, u(t) ∈ R, y(t) ∈ R, f and h smooth mappings, a signal generator described by the equations

ω̇ = s(ω),  u = l(ω),  (29)

with ω(t) ∈ R^ν, s and l smooth mappings, and the interconnected system

ω̇ = s(ω),  ẋ = f(x, l(ω)),  y = h(x).  (30)

In addition, suppose that f(0, 0) = 0, s(0) = 0, l(0) = 0 and h(0) = 0.
The following two assumptions play, in the nonlinear framework, the role that Assumptions 1 and 2 play in the linear case. These two assumptions can be used to give a description of the steady-state response of the nonlinear system (28).
Lemma 4 implies that the interconnected system (30) possesses an invariant manifold described by the equation x = π(ω), where the mapping π solves the partial differential equation

(∂π/∂ω) s(ω) = f(π(ω), l(ω)),  (31)

which can be used to define the moment of system (28) at (s, l) as the mapping h ∘ π.

On-line moment estimation from data
Solving Eq. (31) with respect to the mapping π is a difficult task even when there is perfect knowledge of the dynamics of the system, i.e. of the mapping f. When f is not known, Eq. (31) can be solved numerically; however, this has the additional drawback of requiring information on the state of the system. In practice, this is usually not the case and only the output y is available, with the consequence that also the mapping h would have to be known. Note that, given the exponential stability hypothesis on the system and Lemma 4, the equation

y(t) = h(π(ω(t))) + ε(t),  (32)

where ε(t) is an exponentially decaying signal, holds. We introduce the following assumption.
The assumption that the mapping to be approximated can be represented by a family of basis functions is standard, see e.g. Toth (2010). For some families of basis functions, e.g. radial basis functions, there exist results of ''universal'' approximation (Park & Sandberg, 1991; Rocha, 2009). Practically, a trial and error procedure can be implemented, for instance starting with a polynomial expansion or with an expansion based on functions belonging to the same class as the ones generated by the signal generator (e.g. sinusoids, for sinusoidal inputs). Thus, let

h(π(ω)) = Σ_{j=1}^M γ_j φ_j(ω),  (33)

with N ≤ M. Using a weighted sum of basis functions, Eq. (33) can be written as

y(t) = Σ_{j=1}^N γ_j φ_j(ω(t)) + e(t) + ε(t),  (34)

where e(t) = Σ_{j=N+1}^M γ_j φ_j(ω(t)) is the error resulting from terminating the summation at N. Consider now the approximation

h ∘ π_N(ω) = Σ_{j=1}^N γ_j φ_j(ω),  (35)

which neglects the error e(t) and the transient error ε(t). Let Γ_k be an on-line estimate of the vector Γ = [γ_1, …, γ_N]^⊤ computed at T_k^w, namely computed at the time t_k using the last w instants of time t_i, assuming that e(t) and ε(t) are known. Since this is not the case in practice, define Γ̃_k as the approximation, in the sense of (35), of the estimate Γ_k. Finally, we can compute this approximation as follows.
Theorem 8. Define the time-snapshots Ũ_k ∈ R^{w×N} and Υ̃_k ∈ R^w, with w ≥ ν, as

Ũ_k = [Ω(ω(t_{k−w+1})) ⋯ Ω(ω(t_k))]^⊤,   (36)

Υ̃_k = [y(t_{k−w+1}) ⋯ y(t_k)]^⊤,   (37)

where Ω(ω) = [φ_1(ω) ⋯ φ_N(ω)]^⊤. If Ũ_k is full column rank, then

Γ̃_k = (Ũ_k^⊤ Ũ_k)^{−1} Ũ_k^⊤ Υ̃_k   (38)

is an approximation of the estimate Γ_k.
To ensure that the approximation is well-defined for all k, the elements of T_k^w must be selected such that Ũ_k^⊤ Ũ_k is nonsingular. This condition expresses a property of persistence of excitation, which is guaranteed by the following assumption.
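As an illustration, the computation in Theorem 8 reduces to an ordinary least-squares fit over the window of samples. The following Python sketch uses a hypothetical polynomial basis and synthetic data in place of measured samples of ω(t_i) and y(t_i):

```python
import numpy as np

# Hypothetical polynomial basis Omega(w) = [w1, w2, w1*w2, w1^2, w2^2]
def phi(w):
    w1, w2 = w
    return np.array([w1, w2, w1 * w2, w1**2, w2**2])

rng = np.random.default_rng(0)
N = 5          # number of basis functions
w_len = 50     # window length w (large enough for U_k to have full column rank)

# Samples of omega(t_i) over the window T_k^w (stand-in data)
omega = rng.uniform(-1.0, 1.0, size=(w_len, 2))

# Ground-truth coefficients, used here only to synthesize y(t_i)
gamma_true = np.array([1.0, -0.5, 0.3, 0.2, -0.1])

U_k = np.array([phi(w) for w in omega])   # time-snapshot U_k in R^{w x N}
Y_k = U_k @ gamma_true                    # output snapshots Upsilon_k in R^w

# Least-squares approximation: Gamma_k = (U_k^T U_k)^{-1} U_k^T Upsilon_k
gamma_est = np.linalg.solve(U_k.T @ U_k, U_k.T @ Y_k)
```

With noise-free synthetic data the recovered coefficients coincide with the true ones; with measured data the residual reflects the neglected errors e(t) and ε(t).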
Assumption 7. The initial condition ω(0) of system (29) is almost periodic and all the solutions of the system are analytic. In addition, system (29) satisfies the excitation rank condition at ω(0).
To ease the notation we introduce the following definition.
Definition 5. The estimated moment of system (28) is defined as

h ∘ π_{N,k}(ω(t)) = Ω(ω(t))^⊤ Γ̃_k

for all t ∈ R, with Γ̃_k computed using (38).
Note that a nonlinear counterpart of Theorem 7 can also be formulated. The recursive least-squares algorithm is obtained with Ω(ω(t_k)) playing the role of ω(t_k), Ũ_k playing the role of R̃_k and (Ω(ω(t_k))Ω(ω(t_k))^⊤)^{−1} playing the role of (ω(t_k)ω(t_k)^⊤)^{−1}. Similarly, Algorithm 1 can be adapted to the present scenario to determine the approximation h ∘ π_{N,k} (if the system is linear, Algorithm 1 is recovered selecting N = ν and φ_j(ω) = e_j^⊤ ω).
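A minimal recursive least-squares sketch in this spirit (Python; the basis vector and the data are hypothetical stand-ins, and the rank-one update shown is the textbook RLS recursion rather than the paper's exact formulation):

```python
import numpy as np

def phi(w):
    # hypothetical basis vector playing the role of Omega(omega(t_k))
    w1, w2 = w
    return np.array([w1, w2, w1 * w2, w1**2, w2**2])

rng = np.random.default_rng(1)
gamma_true = np.array([1.0, -0.5, 0.3, 0.2, -0.1])

gamma = np.zeros(5)            # running estimate of Gamma
P = 1e3 * np.eye(5)            # inverse "covariance" of the regressors

for _ in range(200):
    w = rng.uniform(-1.0, 1.0, size=2)
    u = phi(w)                 # regressor at time t_k
    y = gamma_true @ u         # measured output sample (noise-free here)
    # standard RLS rank-one update
    Pu = P @ u
    k_gain = Pu / (1.0 + u @ Pu)
    gamma = gamma + k_gain * (y - gamma @ u)
    P = P - np.outer(k_gain, Pu)
```

The recursion avoids re-inverting Ũ_k^⊤Ũ_k at every step, which matters when the estimate is updated on-line sample by sample.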
Algorithm 2.
1: Let k be a sufficiently large integer. Select η > 0 sufficiently small. Select w ≥ ν.
2: If Ũ_k is full column rank, then compute Γ̃_k with (38) and the estimated moment with (40). Else increase w. If k − w < 0, then restart the algorithm selecting a larger initial k.
3: If ‖Γ̃_k − Γ̃_{k−1}‖ > η, then set k = k + 1 and go to 2.
4: Stop.
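The structure of the algorithm can be sketched as follows (Python; the data stream, rank test and stopping rule are simplified stand-ins for the steps above, with synthetic noise-free samples):

```python
import numpy as np

def phi(w):
    w1, w2 = w
    return np.array([w1, w2, w1 * w2])

rng = np.random.default_rng(2)
gamma_true = np.array([0.8, -0.4, 0.25])

# stand-in measured data stream: samples of omega(t_i) and y(t_i)
omegas = rng.uniform(-1.0, 1.0, size=(500, 2))
ys = np.array([gamma_true @ phi(w) for w in omegas])

eta = 1e-9        # convergence tolerance
w_len = 10        # window length w
k = w_len         # start once a full window of samples is available
gamma_prev = None
while True:
    U = np.array([phi(w) for w in omegas[k - w_len:k]])   # snapshots over T_k^w
    if np.linalg.matrix_rank(U) < U.shape[1]:
        w_len += 1                                        # increase w
        k = max(k, w_len)
        continue
    gamma_k = np.linalg.lstsq(U, ys[k - w_len:k], rcond=None)[0]
    if gamma_prev is not None and np.linalg.norm(gamma_k - gamma_prev) <= eta:
        break                                             # estimate has converged
    gamma_prev = gamma_k
    k += 1                                                # next time instant
```

With noise-free data two consecutive windows give the same coefficients and the loop stops almost immediately; with real measurements the tolerance η trades accuracy against run time.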
The convergence of the estimated moment is guaranteed by the next result.
Theorem 9. Suppose Assumptions 4-7 hold. Then

lim_{t_k→∞} ‖Γ̃_k − Γ_k‖ = 0.

Proof. Assumption 7 guarantees that the approximation Γ̃_k is well-defined for all k, whereas Assumptions 4 and 5 guarantee that Lemma 4 holds and thus that h ∘ π is well-defined. The quantity ‖ε(t_k)‖ vanishes exponentially by Assumption 5.
Note that if we try to apply the linear algorithm to data generated by a nonlinear system, then the algorithm does not converge. In fact, the algorithm would try to encode in a linear term information regarding higher order terms. Hence, conditions (20) or (21) would not be satisfied.
Up to this point we have always considered a single trajectory ω(t). While this is sufficient in a linear setting, in which local properties are also global, it may be restrictive in the nonlinear setting. To solve this issue, Algorithm 2 can be easily modified to operate with multiple trajectories. To this end, it suffices to implement the algorithm replacing the matrices Ũ_k and Υ̃_k with the matrices

Ũ_k = [(Ũ_k^1)^⊤ ⋯ (Ũ_k^q)^⊤]^⊤,  Υ̃_k = [(Υ̃_k^1)^⊤ ⋯ (Υ̃_k^q)^⊤]^⊤,

respectively, where Ũ_k^i and Υ̃_k^i are the matrices in (36) and (37), respectively, sampled along the trajectory of system (29) starting from the initial condition ω(0) = ω_0^i, with i = 1, …, q and q ≥ 1. We refer to this method as the ''U/Y'' variation.
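The multiple-trajectory variation only changes how the snapshot matrices are assembled: the per-trajectory matrices are stacked vertically before the least-squares step. A Python sketch, with the same kind of hypothetical basis and synthetic trajectories as in the previous examples:

```python
import numpy as np

def phi(w):
    w1, w2 = w
    return np.array([w1, w2, w1 * w2])

rng = np.random.default_rng(3)
gamma_true = np.array([0.8, -0.4, 0.25])

# q = 3 trajectories of the signal generator from different omega(0)
U_blocks, Y_blocks = [], []
for i in range(3):
    omega_i = rng.uniform(-1.0, 1.0, size=(20, 2))  # samples along trajectory i
    U_i = np.array([phi(w) for w in omega_i])       # snapshot matrix U_k^i
    Y_blocks.append(U_i @ gamma_true)               # output snapshots Upsilon_k^i
    U_blocks.append(U_i)

U_stacked = np.vstack(U_blocks)       # replaces U_k
Y_stacked = np.concatenate(Y_blocks)  # replaces Upsilon_k
gamma_est = np.linalg.lstsq(U_stacked, Y_stacked, rcond=None)[0]
```

Stacking several trajectories enlarges the region of the ω-space in which the fitted basis expansion is informative, which is precisely the nonlinear motivation for the variation.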

Families of reduced order models
Using the approximation given by (39) a reduced order model of system (28) can be defined at each instant of time t k .
Definition 6. Consider system (28) and the signal generator (29). Suppose Assumptions 4-7 hold. Then the system

ξ̇ = φ_k(ξ, u),  ψ = κ_k(ξ),

with ξ(t) ∈ R^ν, u(t) ∈ R, ψ(t) ∈ R and φ_k and κ_k smooth mappings, is a model of system (28) at (s, l) at time t_k, i.e. system (43) has the same moment as system (28) at (s, l), if the equation

(∂p_k/∂ω)(ω) s(ω) = φ_k(p_k(ω), l(ω))

has a unique solution p_k such that

κ_k(p_k(ω)) = h ∘ π_{N,k}(ω),

where h ∘ π_{N,k}(ω) is obtained by (38).
The next result is a direct consequence of Definition 6 and Astolfi (2010).

Proposition 2. Suppose Assumptions 4-7 hold. Let δ be any smooth mapping such that the equation in Definition 6 has the unique solution p_k(ω) = ω. Then the system described by the equations

ξ̇ = s(ξ) − δ(ξ) l(ξ) + δ(ξ) u,  ψ_k = h ∘ π_{N,k}(ξ),

is a model of system (28) at (s, l) for all t_k.
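One common family of reduced order models achieving moment matching (Astolfi, 2010) has the form ξ̇ = s(ξ) − δ(ξ)l(ξ) + δ(ξ)u, with the output given by the (estimated) moment evaluated at ξ. The following Python sketch simulates a planar model of this form with entirely hypothetical choices of s, l, δ and estimated coefficients (none of them taken from the paper's examples):

```python
import numpy as np

def s(xi):                      # hypothetical signal-generator dynamics (harmonic)
    return np.array([-75.0 * xi[1], 75.0 * xi[0]])

def l(xi):                      # hypothetical nonlinear input mapping
    return 0.5 + 0.1 * xi[0]**2

def delta(xi):                  # hypothetical free mapping delta (constant here)
    return np.array([220.0, 220.0])

gamma_est = np.array([1.0, -0.5, 0.3])   # stand-in estimated coefficients
def h_pi_est(xi):                         # stand-in estimated moment h∘π_{N,k}
    return gamma_est @ np.array([xi[0], xi[1], xi[0] * xi[1]])

def reduced_model_step(xi, u, dt=1e-4):
    # one explicit-Euler step of xi' = s(xi) - delta(xi) l(xi) + delta(xi) u
    xi_dot = s(xi) - delta(xi) * l(xi) + delta(xi) * u
    return xi + dt * xi_dot, h_pi_est(xi)

xi = np.array([0.1, 0.0])
for _ in range(1000):
    xi, psi = reduced_model_step(xi, u=0.5)   # constant test input
```

The free mapping δ is the design parameter of the family; different choices trade off transient behaviour while leaving the matched moment unchanged.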
Similarly to the linear case (see Section 2.5), the conditions given in Astolfi (2010) to enforce additional properties upon the reduced order model can be adapted to hold in the present scenario. Moreover, the results of this section can be extended to time-delay systems. In fact, Algorithm 2 can be used to estimate the moments of nonlinear time-delay systems without any modification, see Scarciotti and Astolfi (2015c, 2016b).

An approximated nonlinear reduced order model of the DC-to-DC Ćuk converter
In this section we revisit the example given in Scarciotti and Astolfi (2015c). Therein, using a linear signal generator, a reduced order model of the DC-to-DC Ćuk converter with linear state dynamics has been given. Herein, we obtain a planar reduced order model with nonlinear state dynamics using an input generated by a nonlinear mapping l. Note that since the system is of low order, the example serves illustrative purposes only. In particular, we provide a proof of principle that the method allows one to obtain a nonlinear reduced order model without solving a partial differential equation.
The averaged model of the DC-to-DC Ćuk converter is given by the equations (Rodriguez, Ortega, & Astolfi, 2005), where x_1(t) ∈ R_{≥0} and x_3(t) ∈ R_{≤0} describe the averaged currents, x_2(t) ∈ R_{≥0} and x_4(t) ∈ R_{≤0} the averaged voltages, L_1, C_2, L_3, E and G are positive parameters and u(t) ∈ (0, 1) is a continuous control signal which represents the slew rate of a pulse width modulation circuit used to control the switch position in the converter. In the remainder of the paper we use the numerical values given in Rodriguez et al. (2005) to simulate system (47). Consider the input generated by the equations

ω̇_1 = −75 ω_2,  ω̇_2 = 75 ω_1,

which generates a positive input signal with higher order harmonics.
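The signal generator can be simulated in closed form, since it is a harmonic oscillator at 75 rad/s. The Python sketch below uses a hypothetical quadratic mapping l to produce a positive input with a higher order harmonic; the actual mapping used in the paper is not reproduced here:

```python
import numpy as np

# Signal generator: omega1' = -75*omega2, omega2' = 75*omega1.
# The exact solution is a rotation of omega(0) at 75 rad/s.
def omega(t, omega0):
    c, s_ = np.cos(75.0 * t), np.sin(75.0 * t)
    return np.array([c * omega0[0] - s_ * omega0[1],
                     s_ * omega0[0] + c * omega0[1]])

# Hypothetical nonlinear mapping l: the quadratic term injects a second
# harmonic and keeps u(t) in (0, 1) for the initial condition used here.
def l(w):
    return 0.5 + 0.2 * w[0] + 0.2 * w[0]**2

t = np.linspace(0.0, 0.5, 5001)
omega0 = np.array([-0.45, -0.45])
u = np.array([l(omega(tk, omega0)) for tk in t])
```

Keeping u(t) strictly inside (0, 1) matters because u represents the slew rate of the pulse width modulation circuit driving the converter.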
This polynomial approximation fits well for u < 0.8, whereas it does not decrease as fast as the actual output of the system when the input is close to 1: for this value the output of the converter becomes negatively unbounded. This suggests that the following results can be improved if other basis functions are used; simulations showed that polynomial surfaces of higher order are necessary to have a satisfactory approximation in a larger region W. The reduced order model is chosen as in Proposition 2, with δ_η = 220. The top graph in Fig. 5 shows the time histories of the output of system (47) (solid/blue line) and of the reduced order model (dotted/red line) for the input sequence represented in the bottom graph. The input is obtained switching ω(0) every 0.5 s (the switching times are indicated by solid/gray vertical lines). ω(0) takes, in order, the values (−0.45, −0.45), (−0.25, −0.45), (0.15, 0.05) and (0.5, 0.5). The middle graph in Fig. 5 shows the absolute error (dashed/green line) between the two outputs. We note that the error is mostly due to the poor transient performance. This problem could be alleviated with a selection of δ_η as a function of ξ. The already small steady-state error could be further reduced with a different selection of basis functions.

Conclusion
We have presented a theoretical framework and a collection of techniques to obtain reduced order models by moment matching from input/output data for linear and nonlinear systems. The approximations proposed in the paper have been exploited to construct families of reduced order models. We have shown that these models asymptotically match the moments of the unknown system to be reduced. Conditions to enforce additional properties upon the reduced order model have been discussed. The use of the algorithms has been illustrated by means of two examples.