Fundamental Limitations of Ad Hoc Linear and Quadratic Multi-level Regression Models for Physical Systems

Abstract. A central issue in contemporary applied mathematics is the development of simpler dynamical models for a reduced subset of variables in complex high dimensional dynamical systems with many spatio-temporal scales. Recently, ad hoc quadratic multi-level regression models have been proposed to provide suitable reduced nonlinear models directly from data. The main results developed here are rigorous theorems demonstrating the non-physical finite time blow-up and large time instability in statistical solutions of general scalar multi-level quadratic regression models, with corresponding unphysical features of the invariant measure. Surprising intrinsic model errors due to discrete sampling are also shown to occur rigorously even for linear multi-level regression dynamic models. All of these theoretical results are corroborated by numerical experiments with simple models. Single level nonlinear regression strategies with physical cubic damping are shown to have significant skill on the same test problems.


1. Introduction.
A central issue in contemporary applied mathematics is the development of simpler dynamical models for a reduced subset of variables in complex high dimensional dynamical systems with many spatio-temporal scales. This is an important issue in systems ranging from biomolecular dynamics to climate atmosphere ocean science to engineering turbulence. The approaches to address these issues range from analytic stochastic mode reduction strategies [1, 2, 3, 4] to purely data driven strategies [5, 6, 7, 23] to methods which use physical analytic properties to constrain data driven methods [8, 9, 10, 21]. The references listed above extensively describe other related earlier work on these issues.
Another straightforward approach relies on adapting multi-level regression techniques for nonlinear time series to fit the data output of some variables of a given nonlinear system without regard to any physical constraints. Single level linear regression models of this type for a reduced subset of variables have some skill for low frequency variability ([11] and references therein), and suitable improved versions have very high skill for active data assimilation of turbulent signals ([12] and references therein), as well as some skill for the low frequency response to changes in external forcing [13, 14]. Recently, quadratically nonlinear multi-level regression models have been utilized to determine suitable reduced nonlinear models from data [15, 16, 17], and significant skill of these models for fitting the time series as well as for prediction has been reported in that work. All of these ad hoc regression strategies suffer from severe model error; in other words, they do not model the exact nonlinear dynamics of the physical system on the reduced subset of variables [14, 22].
The goal of the present work is to develop a series of rigorous mathematical theorems which illustrate unambiguously the fundamental limitations of such ad hoc linear and quadratic multi-level regression models and also to demonstrate these effects in numerical experiments with simple models. These limitations include non-physical finite time blow-up and large time instability in statistical solutions of general scalar multi-level quadratic regression models with corresponding unphysical features of the invariant measure. These results are presented in section 3 below, while the preliminary section 2 contains a brief introduction to physics constrained stochastic mode reduction models and ad hoc multi-level linear and quadratic regression models. With the background from section 2, simple theorems are developed in section 4 which point out some remarkable fundamental limitations in multi-level regression modeling for reduced sets of variables, even for linear models, due to model error involving discrete sampling of the time series. Other interesting recent work on this topic involves discrete sampling of systems with widely separated time scales [18, 19, 20]; the examples and theory here apply to situations without such a wide separation of time scales and highlight the role of non-Markovian features. Section 5 is a brief concluding discussion.
2. Physical stochastic mode reduction and ad hoc multi-level nonlinear regression models. Here we introduce physical stochastic mode reduction and both linear and quadratic multi-level nonlinear regression models in the context of an observed scalar variable which arises from a true or perfect model with more degrees of freedom and with nonlinear dynamics which is not known in detail; in other words, there is model error. The emphasis here is on simple models which illustrate the typical features of vastly more complicated systems.
2.1. Physical stochastic mode reduction: A prototype model. The three-mode dyad model discussed next is a simple nonlinear stochastic model in the family of prototype models [3, 4, 8] which mimic the structure of dynamical cores in large-scale climate models. One of the modes in the model, x_1, is intended to represent a slowly-evolving variable accessible to observation, whereas the remaining modes, x_2, x_3, act as surrogate variables for the unresolved (subgrid scale) degrees of freedom in the climate model. The unresolved modes are coupled to x_1 linearly and via a dyad interaction between x_1 and x_2, and the model also allows for external forcing to act on mode x_1. Specifically, the governing stochastic differential equations are

dx_1 = (I x_1 x_2 + L_12 x_2 + L_13 x_3 + F + L_11 x_1) dt, (2.1a)
dx_2 = (-I x_1^2 - L_12 x_1 - (γ_2/ε) x_2) dt + (σ_2/√ε) dW_1, (2.1b)
dx_3 = (-L_13 x_1 - (γ_3/ε) x_3) dt + (σ_3/√ε) dW_2, (2.1c)

where {W_1, W_2} are independent Wiener processes, and the parameters I, {L_11, L_12, L_13}, and F respectively measure the dyad interaction, the linear couplings, and the external forcing. Model (2.1), as well as the associated reduced scalar model below in equation (2.2), has been used as a prototype model to develop methods based on the fluctuation-dissipation theorem (FDT) for assessing the low-frequency climate response to external perturbations and model error [13, 14, 22]. The parameters of the three-mode model provide a physical correspondence with a number of important dynamical processes in the cores of comprehensive climate models ([4], chap. 3). Specifically, F represents a large-scale external forcing (e.g., solar heating), while L_11 describes linear dissipative effects, such as viscosity and surface drag (for this reason, L_11 would generally be viewed as a component of a negative-definite operator). The combined effect of F and L_11 is referred to as the bare truncation, as it would still be part of the equation of motion for x_1 if one were to completely ignore its interaction with the unresolved modes. The coefficients L_ij are components of a
skew-symmetric operator representing, e.g., the β-effect associated with the Coriolis force, and the parameter I measures the interaction between modes x_1 and x_2. Notice that this nonlinear dyad interaction has the physical property of conservation of energy, and a physics based reduced model for x_1 alone should reflect this important physical feature [3, 8, 21]. In geophysical flows, this type of interaction is expected to arise whenever the state variables are expansion coefficients in general bases of empirical orthogonal functions (EOFs) for the core dynamical fields [8].
Aside from the above deterministic components, a key ingredient of model (2.1) is stochastic forcing in equations (2.1b) and (2.1c), which, together with the linear damping parameterized by γ i , acts as a surrogate for the nonlinear interaction between the unresolved modes.The parameter ε controls the time-scale separation of the dynamics of the slow and fast modes, with the fast modes evolving infinitely fast relative to the climate-variable time scale in the limit ε → 0.
We now examine the scalar stochastic model associated with the three-mode model in the limit ε → 0. This reduced version of the model is particularly useful in exposing in a transparent manner the influence of the unresolved modes, (x_2, x_3), on the resolved mode, x_1, when there exists a clear separation of time scales in their respective dynamics (i.e., when ε is small). As follows by applying the MTV mode reduction procedure [1, 2, 3, 8, 13] to the coupled system (2.1), the resolved mode, x_1 ≡ x, in the reduced model is governed by a nonlinear stochastic differential equation (2.2) which may also be expressed in the normal form

dx = (F + a x + b x^2 - c x^3) dt + (A - B x) dW_C + σ dW_A, (2.3)

with parameter values determined explicitly by those in (2.1). In particular, the cubic damping coefficient c is positive and proportional to B^2, which is a manifestation of the fact that in scalar models of the form (2.3), whose origin lies in multivariate models with multiplicative energy conserving dyad interactions [e.g. the three-mode model (2.1)], a non-zero multiplicative-noise parameter, B, is accompanied by a non-zero cubic damping, c. As noted in [8] and proved rigorously in [21], cubic damping has the important role of suppressing power-law tails of the probability density function (PDF) which are not compatible with climate data [9, 24]. The explicit derivation above can be generalized substantially using general averaging theorems [25, 26].
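To make the role of the cubic damping in a normal form of the type (2.3) concrete, the following sketch integrates such a scalar model by Euler-Maruyama; the parameter values are hypothetical illustrations, not values taken from the text:

```python
import numpy as np

def simulate_normal_form(F, a, b, c, A, B, sigma,
                         x0=0.0, dt=1e-3, n_steps=200_000, seed=0):
    """Euler-Maruyama simulation of a scalar normal form of the type (2.3):
    dx = (F + a x + b x^2 - c x^3) dt + (A - B x) dW_C + sigma dW_A.
    The cubic damping c > 0 suppresses large excursions of x."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    sq = np.sqrt(dt)
    for n in range(n_steps):
        dWc, dWa = rng.normal(size=2) * sq
        drift = F + a * x[n] + b * x[n] ** 2 - c * x[n] ** 3
        x[n + 1] = x[n] + drift * dt + (A - B * x[n]) * dWc + sigma * dWa
    return x

# hypothetical parameters: destabilizing quadratic term b > 0,
# but cubic damping c > 0 keeps the trajectory bounded
traj = simulate_normal_form(F=0.1, a=-0.5, b=0.3, c=0.4, A=0.2, B=0.3, sigma=0.5)
```

Despite the positive quadratic coefficient, the trajectory remains bounded for all times simulated, in line with the suppression of power-law tails noted above.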
2.1.1. Physics based single level regression strategies. The above stochastic mode reduction strategy and the associated normal form in (2.3) yield a physics-based parametric regression strategy where the coefficients F, a, b, c, A, B, σ in (2.3) are all estimated from data in a time series for x_1, even though the dynamics of the true system being modeled is vastly more complicated but with the general physical structural features of (2.1), and the parameter ε is not necessarily small. Physical examples of this procedure are presented in primitive form in [10] and in [8], using the normal form in (2.3) explicitly.
For simplicity in exposition here, we set A ≡ B ≡ 0 in the regression models in (2.3) and utilize Euler's method for (2.3), resulting in the discrete model

x_{n+1} = x_n + (F + a x_n + b x_n^2 - c x_n^3) Δt + σ √Δt ξ_n,  ξ_n ~ N(0, 1). (2.5)

Since the coefficients F, a, b, c all enter linearly in (2.5) given the input time series, x_n, n = 0, 1, 2, ..., we follow [15, 16, 17] and use linear regression least squares estimates for the coefficients F, a, b, c.
The resulting scalar cubic regression model (2.5) has parameter estimators given by the least squares solution of

min over (F, a, b, c) of Σ_n [ (x_{n+1} - x_n)/Δt - (F + a x_n + b x_n^2 - c x_n^3) ]^2,

whose normal equations involve the Gram matrix with entries given by the sample moments µ_{j+k}, 0 ≤ j, k ≤ 3, where µ_k is the kth moment of x_n, µ_k = ⟨x^k⟩. Therefore, we need a long enough time series (compared to the decorrelation time) to have accurate 6th moment estimation.
Since the polynomial regression problem is so severely ill-conditioned, we always replace the input time series by the centered and normalized one, x̃ = (x - x̄)/std(x), with x̄ the numerical mean and std(x) the standard deviation. This centering and normalizing reduces the condition number of the least squares problem and gives better estimates of the parameters.
The physical considerations developed earlier require that c satisfy c > 0 in a time series analysis.The limitation of the above physics-based regression models is that they involve only a single level of the time series and cannot include memory effects.
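The single level procedure just described can be sketched as follows: a synthetic series is generated from a cubic-damped scalar model with hypothetical coefficients, and F, a, b, c are then recovered by linear least squares on the Euler increments. In this well-scaled example the raw series is used directly; centering and normalizing as in the text would further reduce the condition number:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical "true" scalar model: dx = (F + a x + b x^2 - c x^3) dt + sigma dW
F, a, b, c, sigma = 0.0, -1.0, 0.5, 0.5, 0.5
dt, n = 1e-2, 500_000

x = np.empty(n + 1)
x[0] = 0.0
noise = rng.normal(size=n) * sigma * np.sqrt(dt)
for i in range(n):
    x[i + 1] = x[i] + (F + a * x[i] + b * x[i] ** 2 - c * x[i] ** 3) * dt + noise[i]

# least squares for the drift: regress (x_{n+1} - x_n)/dt on [1, x, x^2, x^3];
# the normal equations involve sample moments of x up to order six
z = np.diff(x) / dt
V = np.column_stack([np.ones(n), x[:-1], x[:-1] ** 2, x[:-1] ** 3])
F_hat, a_hat, b_hat, m_hat = np.linalg.lstsq(V, z, rcond=None)[0]
c_hat = -m_hat  # the fitted coefficient of x^3 is -c
```

With a time series long compared to the decorrelation time, the estimates land close to the generating coefficients, and in particular the fitted cubic damping is positive.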

2.2. Ad hoc multi-level quadratic regression models. The ad hoc quadratic multi-level regression models proposed and used in [15, 16, 17] for the scalar variable x have the form

dx = (F + a x + b x^2) dt + r_1 dt, (2.9a)
dr_n = (q_n x + Σ_{j=1}^{n} A_{nj} r_j) dt + r_{n+1} dt, 1 ≤ n ≤ N - 1, (2.9b)
dr_N = (q_N x + Σ_{j=1}^{N} A_{Nj} r_j) dt + σ dW, (2.9c)

which can be written in the compact form

dx = (F + a x + b x^2 + e_1 · r) dt, (2.10)
dr = (q x + A r) dt + Σ dW, (2.11)

with r = (r_1, ..., r_N)^T, where the matrix A collects the coefficients A_{nj} together with the superdiagonal entries equal to one which couple level n to level n + 1, and the noise Σ dW acts only on the last component, r_N. The motivation for these models, which have N levels, is that they incorporate nonlocal linear memory effects in the time history of x(t) through the N memory levels in (2.9), as follows by successively integrating the equations for r_N, r_{N-1}, ..., r_1 and substituting the formula for r_1 into the equation for x. This is a potential advantage of these methods beyond the Markovian methods in section 2.1.1. All the coefficients F, a, b, q, A, Σ are estimated by a similar linear regression procedure from the time series of successive time differences of x_n, n = 0, 1, ..., Ñ. One disadvantage of the approach is that a huge number of non-unique coefficients needs to be estimated by the least squares procedure, of order 10^3 for fifteen variables with three levels [15]. Another disadvantage of the quadratic regression model in (2.9) is that the quadratic term with b ≠ 0 is introduced in an ad hoc fashion and has no direct physical meaning, in contrast to the physical quadratic terms in (2.1), for example, that arise from conservation of energy in the nonlinear dynamics; these physical terms resulted in the normal form in (2.3) with the corresponding physics based regression model discussed in section 2.1.1 above. Even more troubling is that the quadratic coefficient, b ≠ 0, in (2.9) might create non-physical blow-up and instability of statistical solutions of the multi-level regression model, in contrast to the correct physical behavior of the underlying system [8, 9, 21, 24]. This can dramatically alter the predictive skill of the models in (2.9). Such unphysical features are established rigorously in section 3. Such possibilities were noted in [15, 16, 17] but discussed as being insignificant. Next we set b = 0 and N = 1 and introduce two-level linear regression models where the issues of model error and non-uniqueness in regression fitting strategies are transparent yet non-trivial.

2.3. Instructive test models for two-level linear regression. Consider a true signal x(t) to be modeled which is the first component of the solution of the 2 × 2 system of linear stochastic equations

dx = (ā x + y) dt, (2.12a)
dy = (q̄ x + Ā y) dt + σ̄ dW. (2.12b)

The dynamics in (2.12) is linear and has a Gaussian invariant measure provided the stability conditions

ā + Ā < 0,  ā Ā - q̄ > 0 (2.13)

are satisfied. Furthermore, under these circumstances, the zero-mean statistically stationary Gaussian climatological time series for the variable x(t) is completely determined by the correlation function C(τ) = ⟨x(t) x(t + τ)⟩. Now, suppose we observe only x(t) and do not know the detailed model in (2.12). We follow the ad hoc strategy in section 2.2 above and consider the family of two-level linear regression models as a special case of (2.9) with b = 0, with the form

dx = (a x + r) dt, (2.14a)
dr = (q x + A r) dt + σ dW. (2.14b)

The models in (2.14) are candidate two-level regression models for the unknown dynamics in (2.12) which arise from fitting a long time series of x(t) alone. Which of the models in (2.14) have equilibrium marginal dynamics for x(t) which exactly recover the equilibrium statistics, ⟨x(t) x(t + τ)⟩ = C(τ)? This is easy to answer through the following result.

Proposition 2.1. A two-level regression model in (2.14) exactly reproduces the equilibrium marginal statistics of x(t) in (2.12) provided that its stability matrix L = (a 1; q A) has the same eigenvalues as the corresponding matrix for (2.12), i.e. a + A = ā + Ā and aA - q = ā Ā - q̄, and the noise amplitudes coincide, σ = σ̄.
Thus, given the observed climatological distribution in (2.12), any two-level regression model in (2.14) with matched eigenvalues and noise automatically has the same equilibrium marginal dynamics for x(t); since only two combinations of the three drift parameters are constrained, this is intrinsic non-uniqueness in the two-level regression model in a transparent fashion. The proof of Proposition 2.1 is a direct calculation as follows.
The realizability conditions make sure that the real parts of the eigenvalues of L, Re(λ 1,2 ), are negative, which implies that the two-level model has a stable equilibrium measure.
Under the realizability conditions, this two-level model determines a 2D Gaussian process with zero mean and autocovariance function C(τ) = Σ e^{L^T τ}, where Σ is the equilibrium covariance matrix solving the Lyapunov equation L Σ + Σ L^T + Q = 0 with Q = diag(0, σ^2). The equilibrium marginal dynamics, x(t), is also Gaussian with zero mean ⟨x⟩ = 0 and autocovariance function C_11(τ) = (Σ e^{L^T τ})_{11}. It is known that x(t) is uniquely determined by the mean and the autocovariance function. We calculate the autocovariance function C_11(τ) in two cases.

Case 1: the stability coefficient matrix L has two distinct eigenvalues λ_1 ≠ λ_2. Then the autocovariance of the equilibrium marginal dynamics, x(t), is

C_11(τ) = c_1 e^{λ_1 τ} + c_2 e^{λ_2 τ}.

Case 2: the stability coefficient matrix L has a repeated eigenvalue λ. Then the autocovariance of the equilibrium marginal dynamics, x(t), is

C_11(τ) = (c_1 + c_2 τ) e^{λ τ}.

In both cases C_11(τ) solves the second order equation C_11'' - (a + A) C_11' + (aA - q) C_11 = 0 with C_11(0) = Σ_11 = σ^2 / (2(a + A)(q - aA)) and C_11'(0) = 0. Therefore, in order to reproduce the autocovariance function of x(t), we need to keep the values of {λ_1, λ_2, σ}, equivalently, {a + A, aA - q, σ}; in the repeated eigenvalue case these reduce to {λ, σ}, equivalently, {a + A, σ}.
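The non-uniqueness can be checked numerically: the following sketch evaluates C_11(τ) = (Σ e^{L^T τ})_{11} through the Lyapunov equation for two hypothetical parameter triples (a, q, A) sharing the same trace a + A and determinant aA - q, with the same σ:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def c11(a, q, A, sigma, taus):
    """Marginal autocovariance of x for dx = (a x + r)dt, dr = (q x + A r)dt + sigma dW:
    C_11(tau) = (Sigma e^{L^T tau})_{11}, with Sigma solving
    L Sigma + Sigma L^T = -Q, Q = diag(0, sigma^2)."""
    L = np.array([[a, 1.0], [q, A]])
    Q = np.diag([0.0, sigma ** 2])
    Sigma = solve_continuous_lyapunov(L, -Q)
    return np.array([(Sigma @ expm(L.T * t))[0, 0] for t in taus])

taus = np.linspace(0.0, 5.0, 21)
# two hypothetical models sharing a + A = -3 and aA - q = 2, same sigma
cov1 = c11(a=-1.0, q=0.0, A=-2.0, sigma=0.5, taus=taus)
cov2 = c11(a=-2.5, q=-0.75, A=-0.5, sigma=0.5, taus=taus)
```

The two autocovariance curves agree to machine precision, and their common value at τ = 0 matches the closed form σ^2/(2(a + A)(q - aA)).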

2.3.1. Two-level regression algorithms. For completeness, here we introduce the two-level regression procedure suggested in [15].
For the linear case, the two-level regression model is given by

dx = (ã x + y) dt,
dy = (q̃ x + Ã y) dt + σ dW. (2.20)

According to the multi-level regression procedure [15], in practice the increments dx, dy can be approximated by the finite differences x_{i+1} - x_i and y_{i+1} - y_i, where ∆t is assumed to be equal to the data set's sampling interval. We also assume that the white noise residual dr = σ dW can be approximated by σ √∆t ξ_i, ξ_i ~ N(0, 1). So, the two-level regression model becomes

(x_{i+1} - x_i)/∆t = ã x_i + y_i,
(y_{i+1} - y_i)/∆t = q̃ x_i + Ã y_i + (σ/√∆t) ξ_i.

Note that this procedure makes model errors compared to the original linear two-level model (2.20). In section 4 below, we will show that this kind of model error leads to an intrinsic error when we use this regression procedure to recover the marginal dynamics of the mode x in (2.20). At the first level, we need to solve a least squares problem for

(x_{i+1} - x_i)/∆t = ã x_i + y_i,

where ã is the unknown parameter and {y_i} is the residual. Denoting Z_i = (x_{i+1} - x_i)/∆t, Z = {Z_i}, X = {x_i}, Y = {y_i}, we get the compact form of the least squares problem for estimating the unknown parameter ã:

Z = ã X + Y.
So, by the standard least squares method, we have the parameter estimate ã = (X · Z)/(X · X) and the residual Y = Z - ã X. In the second level regression, we need to solve a least squares problem for

(y_{i+1} - y_i)/∆t = q̃ x_i + Ã y_i + η_i,

where q̃, Ã are the unknown parameters and {η_i} is the residual. Here, the residual {η_i} is assumed to have the form σ ξ_i/√∆t, ξ_i ~ N(0, 1). Denoting H_i = (y_{i+1} - y_i)/∆t and η = {η_i}, we have the compact form for the second level least squares problem

H = q̃ X + Ã Y + η,

whose solution gives the parameter estimates q̃, Ã, together with the noise amplitude σ estimated from the sample variance of √∆t η_i. Similarly, the discrete version of the two-level quadratic regression model from (2.9), (2.10) becomes

(x_{i+1} - x_i)/∆t = F + a x_i + b x_i^2 + r_i,
(r_{i+1} - r_i)/∆t = q x_i + A r_i + η_i.

Here, we apply the linear regression procedure at each level. The residual forcings r_i are determined by ordinary least squares as described above in detail for the linear model (see [15]).
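A minimal sketch of the linear two-level procedure of [15] just described, applied to a series generated by an Euler scheme for a hypothetical hidden model (a, q, A, σ); per Theorem 4.2 below, the fitted stability matrix should then recover the eigenvalues (though not the individual coefficients) of the hidden model:

```python
import numpy as np

def two_level_linear_regression(x, dt):
    """Two-level linear regression in the style of [15].
    Level 1: (x_{i+1}-x_i)/dt = a~ x_i + y_i   (least squares for a~, residual y_i)
    Level 2: (y_{i+1}-y_i)/dt = q~ x_i + A~ y_i + eta_i."""
    z = np.diff(x) / dt
    xs = x[:-1]
    a_hat = (xs @ z) / (xs @ xs)          # scalar least squares, no intercept
    y = z - a_hat * xs                    # level-1 residual series
    h = np.diff(y) / dt
    D = np.column_stack([xs[:-1], y[:-1]])
    coef = np.linalg.lstsq(D, h, rcond=None)[0]
    eta = h - D @ coef
    sigma_hat = eta.std() * np.sqrt(dt)   # residual assumed to be sigma*xi/sqrt(dt)
    return a_hat, coef[0], coef[1], sigma_hat

# hypothetical hidden model simulated with the Euler scheme
a, q, A, sigma = -2.0, 1.0, -1.0, 0.5
dt, n = 0.005, 400_000
rng = np.random.default_rng(2)
x = np.empty(n + 1)
x[0], y = 0.0, 0.0
for i in range(n):
    x[i + 1] = x[i] + (a * x[i] + y) * dt
    y = y + (q * x[i] + A * y) * dt + sigma * np.sqrt(dt) * rng.normal()

a_hat, q_hat, A_hat, sigma_hat = two_level_linear_regression(x, dt)
eigs = np.sort(np.linalg.eigvals(np.array([[a_hat, 1.0], [q_hat, A_hat]])).real)
```

The fitted coefficients differ markedly from (a, q, A), yet the eigenvalues of the fitted matrix and the noise amplitude are recovered to within sampling error.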
3. Finite time blow up, unstable statistical solutions, and pathological invariant measures for ad hoc quadratic multi-level regression models. With the background in section 2.1, it is a natural question to ask if the ad hoc multi-level quadratic regression models from [15] introduced in section 2.2 exhibit similar properties as are known to occur both theoretically [21] and in time series sampled from low frequency data of comprehensive models [9, 24]; in these situations there is a smooth unique ergodic invariant measure with at most Gaussian tails [9, 21, 24] for the large fluctuations. Here, in contrast, we demonstrate that the ad hoc quadratic multi-level regression models in (2.9), (2.10) have statistical solutions which exhibit pathological finite time blow-up, long-time instability of the mean under rather general hypotheses, and corresponding pathological invariant measures (if they exist), often with an infinite mean and at the least fatter than exponential tails in general. Numerical experiments confirm this pathological behavior of the two-level regression model as well as the skill of the physics based scalar regression model described in section 2.1.1, which has damping cubic terms.

Below, we study statistical solutions of the general multi-level regression models in (2.9)-(2.11), so it is natural to consider the Fokker-Planck equation for the corresponding probability density, p(x, r, t), given by

p_t = -∂_x[(F + a x + b x^2 + e_1 · r) p] - div_r[(q x + A r) p] + (1/2)(Σ Σ^T)_{NN} ∂^2_{r_N} p. (3.1)

Without loss of generality, we assume that b > 0 in (3.1). The average of any nice functional f(x, r) is defined by ⟨f⟩(t) = ∫ f(x, r) p(x, r, t) dx dr, and the first moments satisfy

d⟨x⟩/dt = F + a ⟨x⟩ + b ⟨x^2⟩ + e_1 · ⟨r⟩,
d⟨r⟩/dt = q ⟨x⟩ + A ⟨r⟩. (3.4)

The equations in (3.4) are not closed equations since the equation for ⟨x⟩(t) involves the higher moment, ⟨x^2⟩(t); however, since ⟨x^2⟩(t) ≥ ⟨x⟩^2(t), we always have the differential inequality

d⟨x⟩/dt ≥ F + a ⟨x⟩ + b ⟨x⟩^2 + e_1 · ⟨r⟩. (3.5)

With these preliminaries we state two of the main results proved below. Assume A in (2.11) is invertible and consider the linear function h(x, r) defined by

h(x, r) = x + α · r, with α determined by A^T α = -e_1, (3.6)

and set λ = a + α · q.

Theorem 3.1. Unbounded growth of the statistical mean for large forcing: For all statistical solutions with nice initial data,

⟨h⟩(t) ≥ ⟨h⟩(0) + (F - λ^2/(4b)) t → ∞ as t → ∞,

provided that F is large enough so that

F > λ^2/(4b). (3.9)

This has the immediate consequence.
Corollary 3.1.With the assumptions in (3.6) and (3.9), if the equilibrium measure for (3.1) exists, the equilibrium mean, x eq , cannot be finite.
A natural requirement for the regression model in (2.9)-(2.11) is that the regression process for r alone is stable without coupling to x, i.e. the matrix A in (2.11) satisfies the stability condition all eigenvalues of A have negative real parts. (3.10) With this hypothesis alone in (3.10), necessarily there are families of smooth statistical solutions which blow-up in finite time as stated next.
Theorem 3.2. Finite-time blow-up of statistical solutions for multi-level regression models stable for r alone: Assume that the stability condition in (3.10) for r alone is satisfied, and assume that the mean of the initial data satisfies ⟨x⟩(0) ≥ M_2 with M_2 large enough as in (3.18) below. Then the statistical solution from (3.1) blows up in finite time, and in fact the mean satisfies ⟨x⟩(t) ≥ ϕ(t), where ϕ(t) is the solution of the ODE with finite time blow up for b > 0,

dϕ/dt = b ϕ^2 - C_1 ϕ - C_2,  ϕ(0) = ⟨x⟩(0), (3.13)

with non-negative constants C_1, C_2 determined by the stability condition (3.10). As the proof below shows, it is straightforward to track the explicit dependence of C_1, C_2 on the requirement in (3.10) and thus use the exact solution of (3.13) to obtain an explicit finite time blow-up bound. Next we prove all of the above results.
The proof of Theorem 3.1 is extremely simple. With (3.5) and (3.6), ⟨h⟩(t) = ⟨x⟩ + α · ⟨r⟩ satisfies

d⟨h⟩/dt = F + λ ⟨x⟩ + b ⟨x^2⟩ ≥ F + λ ⟨x⟩ + b ⟨x⟩^2 ≥ F - λ^2/(4b) > 0,

with λ as in (3.9), and this completes the proof. The proof of Corollary 3.1 is immediate since at equilibrium, the second equation in (3.4) implies ⟨r⟩_eq = -A^{-1} q ⟨x⟩_eq, and applying Theorem 3.1 with initial data p_eq(x, r) establishes that ⟨x⟩_eq + α · ⟨r⟩_eq is infinite, so necessarily ⟨x⟩_eq is infinite.
The proof of Theorem 3.2 is slightly more complicated. First, the stability condition in (3.10) immediately guarantees the uniform bounds

|e_1 · ⟨r⟩(t)| ≤ C_1' + C_2' sup_{0 ≤ s ≤ t} |⟨x⟩(s)|, (3.15)

with constants depending only on A, q and the initial data for ⟨r⟩. Thus, on any interval [0, T] where ⟨x⟩(t) is increasing for 0 ≤ t ≤ T, the inequality (3.5) yields ⟨x⟩(t) ≥ ϕ(t), where ϕ(t) satisfies the explicit finite time blow-up differential equality in (3.13). Thus, it remains for us to show that, a priori, ⟨x⟩(t) is increasing for the entire lifetime of the statistical solution provided that the initial data ⟨x⟩(0) is chosen suitably. With C_1, C_2 determined through (3.15) and b > 0, choose M_2 large enough so that

b y^2 - C_1 y - C_2 ≥ ε_0 > 0 for all y ≥ M_2, (3.18)

and consider initial data with ⟨x⟩(0) ≥ M_2. Then (3.17), (3.18) guarantee that initially ∂⟨x⟩/∂t |_{t=0} ≥ ε_0 > 0 and a priori ∂⟨x⟩/∂t ≥ ε_0 > 0 for all later time in the life span; in other words, the set y ≥ M_2 in (3.18) is invariant for the statistical mean under the dynamics, and automatically solutions with mean in this set have a monotone increasing mean in time. This concludes the proof of Theorem 3.2.
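The finite time blow-up in (3.13) can be exhibited numerically; the sketch below assumes the quadratic form dϕ/dt = b ϕ^2 - C_1 ϕ - C_2 suggested by the proof, with hypothetical values b = C_1 = C_2 = 1 and ϕ(0) = 10 chosen so that b ϕ(0)^2 dominates the lower order terms:

```python
import numpy as np

def blow_up_time(b, C1, C2, phi0, dt=1e-6, t_max=0.5, threshold=1e6):
    """Forward-Euler integration of d(phi)/dt = b*phi^2 - C1*phi - C2
    until phi exceeds a large threshold; the hitting time is a numerical
    proxy for the finite blow-up time.  Returns None if no blow-up occurs."""
    phi, t = phi0, 0.0
    while t < t_max:
        phi += (b * phi ** 2 - C1 * phi - C2) * dt
        t += dt
        if phi > threshold:
            return t
    return None

t_star = blow_up_time(b=1.0, C1=1.0, C2=1.0, phi0=10.0)
# comparison bound: phi' >= 0.88 phi^2 for phi >= 10, so the true
# blow-up occurs before t = 1/8.8, i.e. roughly t = 0.114
```

The numerical hitting time lands just above 0.1, consistent with the comparison bound, while for small ϕ(0) below the positive root of the quadratic no blow-up occurs.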
Note that in the proofs of Theorems 3.1 and 3.2, we have made the tacit assumptions that the means of local statistical solutions of (3.1) are continuously differentiable in time with finite second moment, ⟨x^2⟩(t); of course, if these mild tacit conditions are already violated by the regression strategy, it already has unphysical pathological behavior. Theorems 3.1 and 3.2 show unphysical pathological behavior in the multi-level quadratic regression strategy but impose additional requirements like (3.9), (3.10) on the values of the regression parameters. The final theoretical result stated next does not restrict the structure of the regression coefficients in any fashion yet leads to pathological dynamics.

Theorem 3.3. Instability of statistical solutions for general ad hoc quadratic multi-level regression: Assume the mild conditions that there is non-zero random noise in (2.9), i.e. σ ≠ 0, and that A in (2.11) is invertible, as well as b > 0. Choose α as in (3.6) and consider the functional

h(x, r) = e^{ᾱ(x + α · r)} (3.19)

for a scalar ᾱ > 0. Then for ᾱ large enough there is general statistical instability,

⟨h⟩(t) ≥ ⟨h⟩(0) e^{ct}, c > 0. (3.20)

In particular, if the equilibrium measure, p_eq, exists, necessarily ⟨h⟩_eq is infinite as a consequence of (3.20) since h is positive.
Note that the equilibrium distribution of the ad hoc quadratic multi-level regression strategy necessarily has pathological fat tails in contrast to the physical behavior of actual low frequency dynamics [9,24,8,21] as discussed earlier.
Here is the proof of Theorem 3.3. We use the stochastic equations in (2.9). By Ito's lemma applied to h = e^{ᾱ(x + α · r)}, we have

dh = ᾱ h [ (F + a x + b x^2 + e_1 · r) + α · (q x + A r) + (ᾱ/2) α_N^2 σ^2 ] dt + ᾱ α_N σ h dW,

where (3.6) has been utilized to cancel the terms linear in r in the last equality. Simplifying (3.22) further, we get

dh(x, r) = ᾱ h (F + (a + α · q) x + b x^2 + (ᾱ/2) α_N^2 σ^2) dt + ᾱ α_N σ h dW, (3.23)

and note that the form of the matrix A in (2.11) guarantees automatically α_N > 0. Since the quadratic F + (a + α · q) x + b x^2 is bounded below by F - (a + α · q)^2/(4b), for ᾱ large enough the drift coefficient of h in (3.23) is bounded below by a positive multiple of h.
We get the lower bound for ⟨h⟩(t) by averaging and integrating (3.23), so ⟨h⟩(t) grows exponentially as in (3.21), finishing the proof.

3.1. Pathology in ad hoc quadratic multi-level regression: A case study. Here we illustrate the pathological behavior in Theorem 3.2 as it arises naturally in an instructive case study of a time series, x(t), associated with a physics motivated problem [8, 14, 22]. In [8], the one-dimensional normal form in (2.3) was utilized for such data; here the ad hoc two-level quadratic regression procedure of section 2.3.1 is applied instead, yielding the parameter estimates in (3.25). The time series length, Ñ, was increased systematically from Ñ = 10^9 to Ñ = 7 × 10^9, and Figure 1 shows the corresponding statistical convergence of the parameters F, a and b; we observe that there is statistical significance of the quadratic interaction parameter b > 0 in Figure 1. Note that the regression parameters in (3.25) are consistent with the hypotheses of Theorem 3.2 since (3.10) is satisfied with b > 0; thus, we expect finite time blow-up of some statistical solutions.
Figure 2 is a plot of statistical convergence of the coefficients of the physics based single level regression strategy of section 2.1.1 with cubic damping terms and shows the statistical significance of the coefficients as the time series length Ñ increases. For the time series of shorter length, Ñ = 2 × 10^7, we have the estimates F = -0.0045, a = -0.0185, b = 0.00615, c = 0.0032, σ = 0.226, which crudely approximate those in (3.24). Thus, for this physics based single level regression model, there is a statistically significant positive quadratic term but also statistically significant cubic damping.
Individual trajectories of the ad hoc two-level quadratic regression model and the single level regression strategy are depicted in Figure 3 starting from the same initial value, x_0 = 10, at the top and x_0 = 12 at the bottom of the figure. The trajectories of the two-level regression model are stable for x_0 = 10 but blow up in finite time for x_0 = 12, as predicted by Theorem 3.2. On the other hand, the trajectories of the one-level regression model with cubic damping are always stable; furthermore, as shown in Figure 4, the trajectory and equilibrium probability density function of the original model are well-captured by those from the one-level regression model. These numerical results are robust and apply to all five of the examples described in detail in [14] and [22], with explicit finite time blow up of the quadratic multi-level regression model, although further detailed discussion is omitted here.
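The qualitative dichotomy in Figure 3 can be reproduced with a small sketch (all coefficients below are hypothetical, not the fitted values from the case study): an Euler trajectory of a two-level quadratic model with b > 0 diverges for large initial data, while the single level model with cubic damping relaxes from the same initial condition:

```python
import numpy as np

rng = np.random.default_rng(3)
dt, n = 0.01, 20_000
a_lin, b_quad, q_r, A_r, sig = -0.5, 0.1, 0.0, -1.0, 0.1   # hypothetical

def run_two_level_quadratic(x0):
    """Euler trajectory of dx = (a x + b x^2 + r)dt, dr = (q x + A r)dt + sig dW;
    returns (max |x| reached, diverged flag)."""
    x, r, m = x0, 0.0, abs(x0)
    for _ in range(n):
        x, r = (x + (a_lin * x + b_quad * x * x + r) * dt,
                r + (q_r * x + A_r * r) * dt + sig * np.sqrt(dt) * rng.normal())
        m = max(m, abs(x))
        if m > 1e6:          # numerical proxy for finite time blow-up
            return m, True
    return m, False

def run_cubic(x0, c=0.1):
    """Euler trajectory of dx = (a x + b x^2 - c x^3)dt + sig dW."""
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] + (a_lin * x[i] + b_quad * x[i] ** 2
                           - c * x[i] ** 3) * dt + sig * np.sqrt(dt) * rng.normal()
    return x

m_quad, diverged = run_two_level_quadratic(30.0)
x_cubic = run_cubic(30.0)
```

From the same large initial value, the quadratic two-level trajectory runs away while the cubic-damped trajectory relaxes toward the origin and stays bounded.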
4. Discrete sampling model errors in linear multi-level regression strategies. In section 2.3, we introduced elementary test models for two-level regression strategies that exhibit transparent non-uniqueness with the same marginal equilibrium dynamics. We also introduced the algorithm from [15] for two-level regression in section 2.3.1. The question we address here is the following one: can the two-level regression algorithm of section 2.3.1 recover the original marginal equilibrium dynamics for an infinitely long discrete time series and small sampling time ∆t ≪ 1? Surprisingly, the answer to this question is no, as shown by the following.

Theorem 4.1. Assume we have infinitely many observations {x_i = x(t_i), i = 0, 1, 2, ..., t_i = i∆t} of the true marginal dynamics of the hidden model (4.1). Then the two-level regression model for x(t) for ∆t ≪ 1 is given by (4.2), whose coefficients differ intrinsically from those of the hidden model; nevertheless, the regression model can capture the variance of the marginal dynamics of x(t),

C(0) = σ^2 / (2(a + A)(q - aA)).

When we use the multi-level regression procedure, we are making model errors; there are intrinsic errors when we estimate the variance of the local change of the trajectories from the discretely sampled true dynamics. A theoretical way to correct these model errors is not to use the time series of the hidden model directly but instead to apply Euler's method to the hidden model and generate a new auxiliary discrete time series which is not the time series of the perfect model. This procedure is confirmed by the following.
Theorem 4.2. Assume the Gaussian random process x_i is the marginal dynamics of the Euler scheme for the two-level linear hidden model,

x_{i+1} = x_i + (a x_i + y_i) ∆t,
y_{i+1} = y_i + (q x_i + A y_i) ∆t + σ √∆t ε_i, (4.3)

where ε_i is a Gaussian random number N(0, 1), and assume we have infinitely many observations {x_i, i = 0, 1, 2, ...}. Then, as ∆t → 0, the two-level regression model for x(t) is given by (4.4). Therefore, the regression model can reproduce the marginal dynamics of the Euler scheme for the two-level linear hidden model, since according to Proposition 2.1, (4.4) has the same eigenvalues as the original hidden model. However, clearly the model in (4.4) has different coefficients from the hidden model.
The proofs of Theorems 4.1 and 4.2 are tedious exact calculations using ergodicity and time averaging and are relegated to the appendix. While Theorem 4.2 is interesting in identifying the model error in two-level linear regression strategies, this result is not very useful from a practical viewpoint since we do not know the actual hidden model generating the dynamics for x(t) very well, so the auxiliary time series using Euler's method is not available. The predictability properties for ensembles of x(t) which are not the marginal equilibrium dynamics can be very different for the two-level regression model compared with the original model [22]; this is an important comment to keep in mind for [16, 17] and is readily quantified in the present simplified setting, but lack of space prevents a detailed discussion here.

4.1. Simple numerical experiments for two-level linear regression. In Theorems 4.1 and 4.2, we have assumed an infinitely long discrete time series for x(t) and then taken the limit as ∆t → 0 (see the appendix for the proofs, which do this explicitly). In practice we have a finite length time series, N, and a finite sampling interval, ∆t, so it is interesting to see if the behavior predicted by Theorems 4.1 and 4.2 actually arises in a numerical example. This is done next.
In the numerical example presented below, the hidden model in (4.1) has coefficients (a, q, A, σ^2) = (-2, 1, -1, 0.25), so the eigenvalues of the perfect stability matrix are

λ_{1,2} = (-3 ± √5)/2 ≈ -0.382, -2.618. (4.5)

The eigenvalues of the stability matrix for the two-level regression model predicted by Theorem 4.1, using the true dynamics of the hidden model discretely sampled in time, are given in (4.6). Below we will produce auxiliary time series for the dynamics of x(t) from two different procedures:
1. True dynamics of the 2D linear model, where (x_{i+1}, y_{i+1})^T = e^{L∆t} (x_i, y_i)^T + η_i with η_i drawn from the exact Gaussian one-step transition distribution; (4.7)
2. Numerical integration by the Euler method,

x_{i+1} = x_i + (a x_i + y_i) ∆t,
y_{i+1} = y_i + (q x_i + A y_i) ∆t + σ √∆t ε_i,

where ε_i is an i.i.d. Gaussian random process N(0, 1).
We will vary the observation time for the time series x(t), ∆T = m∆t, so that with m = 1 and method 1 in (4.7) we have the true dynamics for Theorem 4.1, while with m = 1 and method 2 we have the Euler dynamics for Theorem 4.2. By increasing the observation time, ∆T, for the time series, we briefly explore the role of subsampling in this context [19, 20]; this slight abuse of notation using ∆T should not confuse the reader.
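The two procedures in (4.7) can be sketched as follows, with the exact one-step Gaussian transition of method 1 built from the matrix exponential and the stationary Lyapunov covariance, using the parameter values quoted above:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

a, q, A, sigma2 = -2.0, 1.0, -1.0, 0.25
L = np.array([[a, 1.0], [q, A]])
Q = np.diag([0.0, sigma2])
Sigma = solve_continuous_lyapunov(L, -Q)       # stationary covariance matrix

def true_dynamics(n, dt, seed=0):
    """Method 1: exact sampling, z_{i+1} = e^{L dt} z_i + eta_i with
    eta_i ~ N(0, Sigma - e^{L dt} Sigma e^{L^T dt})."""
    P = expm(L * dt)
    cov = Sigma - P @ Sigma @ P.T
    cov = 0.5 * (cov + cov.T)                  # symmetrize against round-off
    chol = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    z = np.zeros((n + 1, 2))
    for i in range(n):
        z[i + 1] = P @ z[i] + chol @ rng.normal(size=2)
    return z[:, 0]

def euler_dynamics(n, dt, seed=0):
    """Method 2: forward Euler with noise sigma*sqrt(dt) on the second component."""
    rng = np.random.default_rng(seed)
    z = np.zeros((n + 1, 2))
    s = np.sqrt(sigma2 * dt)
    for i in range(n):
        x, y = z[i]
        z[i + 1] = (x + (a * x + y) * dt,
                    y + (q * x + A * y) * dt + s * rng.normal())
    return z[:, 0]

x_true = true_dynamics(n=200_000, dt=0.05, seed=4)
x_euler = euler_dynamics(n=200_000, dt=0.05, seed=4)
```

After a burn-in, the sample variance of the exactly sampled series matches the stationary value σ^2/(2(a + A)(q - aA)) = 0.25/6 from Theorem 4.1 and the Lyapunov solution, as a consistency check on both generators.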
In Figure 5 we present the computed autocovariance function from the Euler scheme and perfect model two-level regression strategies compared with the autocovariance for the marginal equilibrium dynamics of the perfect hidden model. In the upper panel of Figure 5 we have the small time step ∆t = 0.0005 and ∆T = ∆t with the long time series interval T = 5000, and the results confirm the predictions of Theorems 4.1 and 4.2 very nicely. In the lower panel of Figure 5, we have a curious result; we used the large time step ∆t = 0.7, which is unstable for Euler's method for long time integration, and a long time series interval T = 70000 with ∆T = ∆t and N = T/∆T = 10^5. In this case, all the parameter estimates have large departures from the values predicted for the Euler method, but nevertheless a very accurate marginal covariance for x(t) has been calculated, and it is more accurate than using the perfect dynamics as the time series input! In Figure 6, the integration time step is fixed at ∆t = 0.005 and the length of the time series, N = T/∆T, and the subsampling window, ∆T = m∆t, m = 1, 10, 100, 1000, are varied. The first column of Figure 6 shows that the Euler method recovers the exact covariance while the perfect hidden model input yields model errors, as in Theorems 4.1 and 4.2, even with undersampling m = 10, 100, for a long time series. The second and third columns show comparable behavior of the two methods with shorter time series and sampling with m = 1, 10, 100 and significant model errors in each case. The fourth column shows the complete inaccuracy of both methods with a very short time series, while the bottom row demonstrates the superiority of generating the time series through the perfect model for strongly undersampled situations with m = 1000.
5. Concluding discussion. The authors have demonstrated rigorously in section 3 of this paper that recently proposed [15, 16, 17] ad hoc quadratic multi-level regression strategies can have unphysical artifacts such as finite time blow-up and instability of statistical solutions, with corresponding pathological behavior of the attractor of the regression model compared to the physical time series actually being modeled [8, 9, 21, 24]. Section 3 also contains a series of numerical experiments confirming this predicted pathological behavior. It is also demonstrated in section 3 that a physics-based single-level regression model described in section 2.1 with cubic nonlinear damping has significant skill as a regression model [8, 21]. In section 4, the surprising effects of discrete sampling in creating model error [22] in simple two-level regression models were elucidated through both rigorous theorems and simple numerical experiments.
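As a numerical illustration of why cubic damping prevents the pathologies of the quadratic models, the sketch below integrates a scalar model of the general flavor of (2.5), dx = (f + ax + bx² − cx³) dt + σ dW with c > 0. The coefficients are illustrative, not the paper's fitted values, and (2.5) may contain additional noise terms omitted here.

```python
import numpy as np

# Hedged sketch of a single-level regression model with physical cubic
# damping (illustrative coefficients, not the fitted values of the paper):
#     dx = (f + a x + b x^2 - c x^3) dt + sigma dW,   c > 0.
# The cubic term dominates for large |x|, so trajectories cannot blow up.
def simulate_cubic(x0=12.0, f=0.0, a=-1.0, b=0.1, c=0.01, sigma=0.5,
                   dt=1e-3, n_steps=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    xs = np.empty(n_steps)
    for i in range(n_steps):
        drift = f + a * x + b * x * x - c * x ** 3
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        xs[i] = x
    return xs
```

Starting from x0 = 12, where a bare quadratic drift would push the trajectory to infinity, the cubic term yields drift −4.88 and pulls the trajectory back toward the bounded attractor.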
One important research direction is to find stable multi-level nonlinear regression models that can incorporate memory effects and nonlinearity in time in a simple fashion. One possibility is to use the three-equation models as in (2.1) as simple multi-level regression models for data arising from large-dimensional dynamical systems with a similar physical structure. Another interesting problem is to generalize the results of this paper to the case when x(t) is a vector time series.

The two-level regression model has the form introduced in section 2.3.1. Here, we will apply the two-level regression procedure discussed in section 2.3.1 to the true marginal dynamics of the two-level linear hidden model, x(t).
Step 1: Solve the true dynamics. Solving the hidden model exactly yields the true discrete-time dynamics (A.8), together with the approximations used below.

Step 2: Statistics of the true dynamics. By section 2.3, the equilibrium mean is 0 and the covariance matrix is Σ. By ergodicity, time averages along one trajectory converge to the corresponding equilibrium expectations, so we have the following relationships, where ε(t) can be any stationary and ergodic random process with mean 0 that is independent of the random processes x(t) and y(t).
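The ergodicity step can be checked numerically in a simpler setting: for a scalar Ornstein-Uhlenbeck process dx = ax dt + σ dW with a < 0, the equilibrium variance is σ²/(−2a), and the time average of x² along one long trajectory should converge to it. A quick sketch (function name and parameters are ours):

```python
import numpy as np

# Ergodicity check for a scalar Ornstein-Uhlenbeck process
#     dx = a x dt + sigma dW,   a < 0,
# whose equilibrium variance is sigma^2 / (-2 a).  The time average of x^2
# along a single long Euler trajectory should approach that value.
def ou_time_average_var(a=-1.0, sigma=1.0, dt=1e-3, n_steps=400_000, seed=1):
    rng = np.random.default_rng(seed)
    x = 0.0
    acc = 0.0
    for _ in range(n_steps):
        x += a * x * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        acc += x * x
    return acc / n_steps
```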
Step 3: Calculate the parameter estimators ã, q̃ under the true dynamics.
By the two-level regression procedure as discussed in section 2.3.1, we have the expressions for the parameter estimates in (2.22) and (2.24), so we need the expressions below. We plug the true dynamics (A.8) into (2.22) and (2.24) and consider the limiting behavior: take N → ∞ first, then take ∆t → 0.

Step 4: Calculate the parameter estimator Ã under the true dynamics.
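The structure of the estimators in (2.22) and (2.24) follows the standard regression of finite differences on the current state. A one-level sketch (our own simplification, not the paper's two-level formulas) shows the N → ∞, ∆t → 0 limiting procedure concretely:

```python
import numpy as np

# One-level sketch (our simplification): estimate the drift coefficient a in
# dx = a x dt + sigma dW by least squares on the finite differences,
#     a_hat = sum_i x_i (x_{i+1} - x_i) / (dt * sum_i x_i^2).
# The paper's two-level estimators (2.22), (2.24) additionally regress the
# residuals at a hidden level.
def fit_drift(x, dt):
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    return np.dot(x[:-1], dx) / (dt * np.dot(x[:-1], x[:-1]))

def ou_series(a=-1.0, sigma=0.5, dt=0.01, n_steps=200_000, seed=2):
    # Euler-discretized scalar OU process used as synthetic input data.
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = 0.0
    for i in range(1, n_steps):
        x[i] = (x[i - 1] + a * x[i - 1] * dt
                + sigma * np.sqrt(dt) * rng.standard_normal())
    return x
```

For data generated by the same Euler discretization the estimator is consistent; the point of Theorems 4.1 and 4.2 is precisely that feeding in the *true* continuous-time dynamics sampled discretely changes the limiting values.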
Let us discuss each term. By the expansion of the covariance matrix (A.5) and the expansion of the matrix Q(∆t) (A.4), the terms of independent interactions are equal to zero for N large, and the remaining terms converge as N → ∞.

Step 1: The Gaussian random process x_i is the marginal dynamics of the Euler scheme for the two-level linear hidden model, where ε_i is a Gaussian random number N(0, 1). The equilibrium covariance matrix of this model has the form

Σ = σ² / (−2(a + A)(aA − q)) · [ 1, −a ; −a, aA − q + a² ] + O(∆t).
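To leading order in ∆t, the covariance formula above can be sanity-checked against the stationary Lyapunov equation B Σ + Σ Bᵀ + Q = 0 for a hidden model with assumed stability matrix B = [[a, 1], [q, A]] and noise covariance Q = diag(0, σ²); the function name and model form are ours:

```python
import numpy as np

# Sketch: leading-order equilibrium covariance quoted in the text, for an
# assumed hidden model with stability matrix B = [[a, 1], [q, A]] and noise
# acting only on the second component, Q = diag(0, sigma^2).
def equilibrium_covariance(a, q, A, sigma2):
    pref = sigma2 / (-2.0 * (a + A) * (a * A - q))
    return pref * np.array([[1.0, -a],
                            [-a, a * A - q + a ** 2]])
```

A direct substitution confirms that this matrix solves the Lyapunov equation exactly, so the O(∆t) correction comes entirely from the Euler discretization.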

so the mean of x(t), x̄(t), blows up in finite time, as shown in (3.13). The constants C₁ and C₂ in (3.13) depend on the stability matrix A in (3.10), with M₁ depending accordingly on C₁, C₂, and M₂.
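The blow-up mechanism can be illustrated directly in a scalar quadratic model: for dx/dt = ax + bx² with a < 0 < b, the unstable fixed point x* = −a/b separates decaying trajectories from trajectories that blow up in finite time. The parameters below are a hypothetical illustration, not the fitted regression coefficients of Figure 1 (with a = −1, b = 0.1 the threshold happens to be x* = 10, matching the flavor of Figure 3):

```python
# Sketch: finite-time blow-up for the scalar quadratic ODE
#     dx/dt = a x + b x^2,   a < 0 < b.
# Trajectories starting below x* = -a/b decay toward zero; those starting
# above x* blow up in finite time.  Forward Euler with a small step is
# enough to detect the explosion.
def blowup_time(x0, a=-1.0, b=0.1, dt=1e-4, t_max=20.0, cap=1e6):
    x, t = x0, 0.0
    while t < t_max:
        x += (a * x + b * x * x) * dt
        t += dt
        if abs(x) > cap:
            return t      # approximate blow-up time
    return None           # no blow-up detected within t_max
```

For this choice of coefficients, `blowup_time(9.0)` returns `None` while `blowup_time(12.0)` detects an explosion, mirroring the sharp threshold behavior seen in Figure 3.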

Figure 1. Statistical convergence of parameter estimates for the two-level quadratic regression model (2.25) from the centered and normalized time series, PC-1, with integration time step ∆t = 0.005. The circles are parameter estimates and the stars are the ends of the 95% confidence intervals. We observe statistical significance of the quadratic nonlinearity b.

Figure 2. Statistical convergence of parameter estimates for the physics-based single-level regression model (2.5) from the centered and normalized time series, PC-1, with integration time step ∆t = 0.005. The crosses are the exact values, the circles are the parameter estimates, and the stars are the ends of the 95% confidence intervals. We observe statistical significance of the quadratic term b as well as statistically significant cubic damping.

Figure 3. Instability of the two-level quadratic regression model (2.25) with the parameters obtained in Figure 1. The trajectory is stable for the initial condition x0 = 10 and blows up in finite time for the initial condition x0 = 12.

Figure 4. The trajectory and equilibrium probability density function of the original model are well captured by those of the physics-based single-level regression model (2.5). Here we choose parameter set PC-1, length of the time series N = 2 × 10⁷, and integration time step ∆t = 0.005.

Figure 5. Comparison of the autocovariance and variance with the exact values. Upper panel: observation time lag ∆T = ∆t = 0.0005, interval length T = 5000. Lower panel: observation time lag ∆T = ∆t = 0.7, interval length T = 70000. The dashed line is the true autocovariance, the solid line is from the time series generated by the Euler scheme, and the solid line with dots is from the true dynamics.

Figure 6. Comparison of the autocovariance and variance with the exact values with fixed time step ∆t = 0.005, varied length of the time series, N = T/∆T, and varied subsampling window, ∆T = m∆t, m = 1, 10, 100, 1000. The dashed line is the true autocovariance, the solid line is from the time series generated by the Euler scheme, and the solid line with dots is from the true dynamics.