Lessons in Uncertainty Quantification for Turbulent Dynamical Systems

The modus operandi of modern applied mathematics in developing very recent mathematical strategies for uncertainty quantification in partially observed high-dimensional turbulent dynamical systems is emphasized here. The approach involves the synergy of rigorous mathematical guidelines with a suite of physically relevant and progressively more complex test models which are mathematically tractable while possessing such important features as the two-way coupling between the resolved dynamics and the turbulent fluxes, intermittency and positive Lyapunov exponents, eddy diffusivity parameteri-zation and turbulent spectra. A large number of new theoretical and computational phenomena which arise in the emerging statistical-stochastic framework for quantifying and mitigating model error in imperfect predictions, such as the existence of information barriers to model improvement, are developed and reviewed here with the intention to introduce mathematicians, applied mathematicians , and scientists to these remarkable emerging topics with increasing practical importance. 1. Introduction. The 'inevitable reality' when it comes to predicting the dynam-ical behavior of turbulent, high-dimensional systems from nature is that the employed mathematical and numerical models need to properly account for propagation of uncertainty arising due to the limited understanding and partial observations of the true dynamics. Uncertainty Quantification (UQ) is undoubtedly an important issue in many physics, engineering, neural science and geoscience applications where complex nonlinear interactions between the resolved and unresolved processes, as well as various parametric uncertainties, need to be properly treated in order to reliably estimate the uncertain evolution of the true system on some subset of coarse-grained variables. In UQ for dynamical systems, one necessarily measures evolving estimates for the means, variances, etc. of principal quantities of interest. Thus, the mathematical framework of statistical solutions of complex dy-namical systems is a starting point for UQ. The development of techniques for UQ in high-dimensional systems, attempting to estimate the evolution of uncertainty and minimize the model error in order to improve predictions, has been in its infancy until recently. There is much current activity in disparate areas of mathematics, statistics, engineering and computer science leading to ideas and techniques which are relevant for UQ in dynamical systems; examples range from Bayesian hierarchical space-time models (e. and traditional Monte Carlo methods. However, despite the rapid improvements in numerical algorithms and availability of computing power, the fundamental mathematical problems for UQ in high-dimensional multi-scale systems remain unsolved. One well known obstacle in high-dimensional applications is the 'curse of dimensionality' which renders various UQ or data assimilation techniques based …


1.
Introduction. The 'inevitable reality' when it comes to predicting the dynamical behavior of turbulent, high-dimensional systems from nature is that the employed mathematical and numerical models need to properly account for propagation of uncertainty arising due to the limited understanding and partial observations of the true dynamics. Uncertainty Quantification (UQ) is undoubtedly an important issue in many physics, engineering, neural science and geoscience applications where complex nonlinear interactions between the resolved and unresolved processes, as well as various parametric uncertainties, need to be properly treated in order to reliably estimate the uncertain evolution of the true system on some subset of coarse-grained variables. In UQ for dynamical systems, one necessarily measures evolving estimates for the means, variances, etc. of principal quantities of interest. Thus, the mathematical framework of statistical solutions of complex dynamical systems is a starting point for UQ. The development of techniques for UQ in high-dimensional systems, attempting to estimate the evolution of uncertainty and minimize the model error in order to improve predictions, has been in its infancy until recently. There is much current activity in disparate areas of mathematics, statistics, engineering and computer science leading to ideas and techniques which are relevant for UQ in dynamical systems; examples range from Bayesian hierarchical space-time models (e.g., [121]) to variational approximations (e.g., [118,53]), to statistical-stochastic hybrids [80,81], to Polynomial Chaos Expansions (e.g., [36,125,123,63,43,98]) and traditional Monte Carlo methods.
However, despite the rapid improvements in numerical algorithms and availability of computing power, the fundamental mathematical problems for UQ in high-dimensional multi-scale systems remain unsolved. One well known obstacle in high-dimensional applications is the 'curse of dimensionality' which renders various UQ or data assimilation techniques based on brute-force sampling of the underlying state space unrealistic in such configurations. A more fundamental issue, however, arises when dealing with natural systems which are only partially observable yet are vastly more complicated than any conceivable model. A very useful and timely example is the the Earth's climate which is undoubtedly an extremely complex system coupling physical processes for the atmosphere, ocean, and land over a wide range of spatial and temporal scales (e.g., [21,100]). The dynamical equations for the actual climate system are obviously unknown. Moreover, all that is available from the true climate are some coarse-grained observations of functions such as mean or variance of temperature or tracer greenhouse gases, or the large scale horizontal winds. Thus, a fundamental practical difficulty in estimating sensitivity of the climate system to external or internal perturbations lies in predicting the coarse-grained response of an extremely complex high-dimensional system from partial observations of its present equilibrium combined with imperfect models.
It is clear that UQ techniques for partially observed high-dimensional turbulent dynamical systems require a synergistic framework capable of quantifying the model error and systematic model improvement leading to reliable predictions for a subset of coarse-grained variables of the perfect system. In this context basic questions and new issues arise such as the following: (A) How to measure, in an unbiased fashion, the statistical accuracy of a given imperfect model for reproducing the equilibrium statistics of the true system? (B) How to make the best possible estimate of response of the true system to changes in external or internal parameters by utilizing the imperfect knowledge available of the present equilibrium? (C) Given uncertain initial conditions, how to make the best possible predictions of the true system dynamics at short times, and at medium-long time ranges when both the initial data response and the asymptotic system properties are important? (D) How do coarse-grained measurements of different functionals of the true equilibrium on the subset variables which are available for measurements affect the assessments in (A)-(C)? What are the weights which should be assigned these functionals to improve the performance of the imperfect models? Which new functionals of the true equilibrium should be observed in order to improve the above assessments? The problem of quantifying and mitigating model error in coarse-grained imperfect models of complex high-dimensional systems is an important and a challenging one from both the practical and theoretical viewpoint. This task is especially difficult in turbulent systems in which energy often flows intermittently from the smaller unresolved or marginally resolved scales to impact much larger and longer spatiotemporal scales of motion of interest [72]. Atmospheric sciences, and meteorology in particular, have long been at the forefront of numerical and theoretical developments in modeling and forecasting complex turbulent systems. While the initial focus of work on numerical general circulation modeling was concerned with the short-term behavior of the atmosphere (i.e., the "weather"), it was soon realized that the investigation of long-term trends (i.e., the "climate") could also be attacked using a blend of computational techniques and reduced models.
On the other hand, the medium range forecasts, where both the initial conditions and the asymptotic system characteristics are important, present a substantially more challenging task and improvements in this area lag behind the other two regimes. These are the regimes of large contemporary interest in climate atmosphere ocean science where extended range forecasting including the behavior of extreme events within a season or the effects of global warming over the next ten to fifty years have major societal importance worldwide. In fact, these difficulties were recognized over five decades ago by John von Neumann in the context of numerical weather prediction who famously stated "The approach [...] is to try first shortrange forecasts, then long-range forecasts of those properties of the circulation that can perpetuate themselves over arbitrarily long periods, and only finally to attempt to forecast medium-long time periods ..." [101].
Recently, a stochastic-statistical framework for a systematic improvement of imperfect models and linking the statistical equilibrium fidelity of imperfect models with their sensitivity for capturing the forced response of the perfect system was proposed in [79,80,81,32]. This newly emerging viewpoint blends detailed, physicsconstrained dynamical modeling, stochastic parameterization and purely statistical analysis by combining empirical information theory with an appropriate fluctuation dissipation theorem, and it has at least two mathematically desirable features: (i) the approach is based on a skill measure given by the relative entropy which, unlike other metrics in UQ, is unbiased and invariant under the general change of variables [62,56,84,87], and (ii) the optimization principles based on the relative entropy systematically minimize the loss of information in the imperfect model which generally does not imply minimizing the error in individual moments of the associated probability densities like the mean or covariance; this is particularly important for UQ in high-dimensional systems.
The main goal of this research expository paper is to describe these recent and ongoing developments emphasizing the remarkable new mathematical and physical phenomena that emerge from the modern applied mathematics modus operandi applied to uncertainty quantification in partially observable high-dimensional dynamical systems. The use of these ideas in applied mathematics and numerical analysis for quantifying model error, as well as in climate change science and various engineering applications for improving imperfect predictions, is only beginning but already the wealth of new important concepts and approaches warrants a detailed treatment. Here, we adopt a systematic framework for discussing these issues based on a suite of physically relevant and progressively more complex test models which are mathematically tractable while possessing such important features as the two-way coupling between the resolved dynamics and stochasticity due to the interactions with unresolved processes, intermittency and positive Lyapunov exponents, eddy diffusivity parameterization and overdamped turbulent spectra. The two recurring themes in the following discussion are: • The utility of the unified stochastic-statistical framework for UQ and improvement of imperfect predictions with model error in turbulent dynamical systems with intermittency, positive Lyapunov exponents and hidden instabilities; • The existence of information barriers to model improvement in classes of imperfect models, their hallmarks and consequences for UQ in turbulent dynamical systems.
The mathematical toolkit utilized below includes empirical information theory, fluctuation-dissipation theorems and systematic physics-constrained, statisticalstochastic modelling for large-dimensional turbulent dynamical systems. While the nomenclature of this paper is biased towards the climate science applications, there are many immediately obvious analogies throughout the text to problems involving high-dimensional nonlinear and nonautonomous stochastic dynamical systems with non-trivial attractors in neural science, material science, or engineering.
The important points discussed and illustrated throughout the paper for UQ in turbulent high-dimensional systems with intermittency include the following: • The information-theoretic optimization advocated here can dramatically improve predictive performance and sensitivity of imperfect models (see also [79,80,81,32]). • Statistical equilibrium fidelity of imperfect models on the coarse-grained subset of resolved variables is necessary but not sufficient for high predictive skill and sensitivity to perturbations (see also [81]). • There exist explicit information barriers to model improvement within a given class of imperfect models beyond which the loss of information can be only reduced by expanding the class of models to allow for more degrees of freedom (see also [73,32]). • The information-theoretic optimization of imperfect models requires tuning the marginal probability densities for suitable coarse-grained variables with those of the perfect system on the unperturbed attractor. In the simplest Gaussian framework such a procedure implies simultaneous tuning of means and covariances variances [80,81]. • The sensitivity of imperfect models to capturing the effects of perturbations of the perfect system attractor can be tested via algorithms exploiting a suitable fluctuation dissipation theorem and experiments in the training phase in the unperturbed climate (see also [81]). • Nonlinear, non-Gaussian models can have long memory of initial conditions, including the initial conditions for the unresolved processes. Linear Gaussian imperfect models cannot reproduce the response in the variance to forcing perturbations in climate change scenarios. Consequently, the change in variability due to perturbations of the system's attractor remain undetected by linear imperfect models of a nonlinear system (see also [82,80,81]). • Intermittency and the presence of fat-tailed PDFs arising through the complex interactions with the unresolved processes leads to fundamental limitations in traditional UQ techniques, such as Polynomial Chaos Expansions and Monte Carlo simulations [10].
The authors hope that this article inspires other mathematicians, scientists, and engineers to explore the use of modern applied mathematics in developing strategies for UQ in reduced models of high-dimensional turbulent dynamical systems. The plan for the remainder of the paper is the following one. In Section 2, we discuss the general principles of information theory in the context of improving the statistical fidelity of imperfect models, as well as the origins of information barriers ( §2.2) and the important link [79,80,81] between the statistical equilibrium fidelity of imperfect models and their sensitivity ( §2.3). In Section 3 we introduce the simplest linear Gaussian models with hidden dynamics which are used to illustrate the utility of the information-theoretic framework for model optimization of §2 and transparent information barriers for short and intermediate range forecasting. Other central issues arising in the context of imperfect coarse-grained predictions are developed in this elementary and instructive context. In Section §4 we discuss UQ via the statistical-stochastic framework of §2 in two physically relevant nonlinear and non-Gaussian test models with hidden dynamics and intermittency; these mathematically tractable models for the dynamics of a single mode in a turbulent signal allow for examining the effects of model error and information barriers due to moment closure approximations and dimensional reduction in an unambiguous setting. In section 5 we discuss fundamental limitations of truncated Polynomial Chaos Expansions [10] as a method for UQ in systems with intermittency and fat-tailed PDFs; two examples based on the models discussed in §4 are used there to show the failure of such methods in the simplest possible setting. Section §6 discusses UQ based on the information-theoretic framework of §2 in the complex case of spatially extended systems with turbulent energy spectra and non-trivial interactions between the mean dynamics and the turbulent fluxes. Finally, Section 7 discusses a number of instructive examples of purely numerical artifacts which might lead to severe bias in UQ techniques; these range from fake fat-tailed numerical estimates for the marginal PDFs of the true statistics to situations where unbounded growth in time of the statistical moments remains undetected by various UQ techniques. Concluding remarks and final comments on a number of topics not discussed in this article are presented in §8.
Below is the table of contents for the remainder of the article: 2. Statistical dynamics of turbulent dynamical systems and imperfect model improvement via empirical information theory. Successful predictions in complex nonlinear systems encountered in contemporary climate change, atmosphere-ocean, or engineering applications are difficult since the actual dynamics is a turbulent large-dimensional system with positive Lyapunov exponents and intermittency on essentially all spatio-temporal scales. Moreover, the true system is only partially observable based on some coarse-grained statistical measurements of functionals on the system's attractor; here, we assume that the true system has a global attractor in the general nonautonomous setting (i.e., the global time forward and pullback attractors exist and coincide; see, e.g., [4,87] for a detailed discussion of these issues). Applications of empirical information theory and fluctuation dissipation theory to systematically improve imperfect model predictions have been addressed at length in the context of climate change applications in [79,80,81,32].
Here, we highlight the most important concepts which are necessary for the following exposition, including the information-theoretic framework for improving imperfect model fidelity to the unperturbed attractor, as well as the important link between the attractor fidelity and sensitivity. As already mentioned, the goal of such a stochastic-statistical framework is to estimate sensitivity of the true system to external forcing and/or initial data response and predict with imperfect models the coarse-grained response of an extremely complex high-dimensional system from partial observations of the present unperturbed attractor/climate.
Consider here for concreteness the Earth's climate system. While the actual equations governing climate dynamics on earth are unknown, it is natural to assume that these dynamics are Markovian, i.e., the future state depends only on the present state, on a suitably large space of variables v v v ∈ IR P , P 1. Thus, it is reasonable to assume that the perfect dynamical system for the climate is given bẏ where σ is a P × K noise matrix andẆ (t) ∈ IR K is K-dimensional white noise defined for t ∈ IR. The use of statistical descriptions for the climate system dates back to early predictability studies for simplified atmosphere models [66,67,68,22]. The high-dimensionality of such problems necessitates the development of reduced imperfect models; these are given by a known dynamical systeṁ which has a similar structure to (1) but its phase space, IR M , is typically completely different from that of the perfect system with M P ; the perfect system and its model share, however, the coarse-grained variables u u u ∈ IR N where N P − M . Throughout the following analysis we are interested in characterizing the statistical departures of the imperfect model dynamics relative to the perfect model on the subspace of the coarse-grained, resolved variables u u u.
The evolution of the probability densities p(v v v) and p(v v v m ) associated with the perfect and imperfect models satisfy appropriate Fokker-Planck equations (FPE) which we omit here for brevity. However, it is important to note that solving FPE in geophysical or engineering applications involving turbulent flows is unrealistic, despite the linearity of the problem, due to a massive number of relevant degrees of freedom in the dynamics. Therefore, a systematic procedure for constructing and validating reduced models on the subspace of relevant coarse-grained variables is of great importance in many contemporary AOS and engineering applications.
The assessment of the predictive performance of an imperfect stochastic model requires appropriate metrics for determining how well the marginal density π(u u u) of the perfect system on the common coarse-grained variables u u u is approximated by the imperfect model marginal density π m (u u u). The natural way [62,87] to measure the lack of information in one probability density, here the marginal π m (u u u), compared with the other, here π(u u u), is through the relative entropy, P(π, π m ), given by with the integration assumed over the whole subspace of the common coarse grained variables u u u ∈ IR M . Despite the lack of symmetry in its arguments, the relative entropy P provides an attractive framework for assessing model error in highdimensional turbulent systems, and in particular in AOS applications [56,84,17,19,2,79,12,117,37,38,80,81], due to its two 'distance-like' properties: • (i) P(π, π m ) is always positive unless π = π m [62,87], • (ii) it is invariant under any invertible change of variables [84,87]. Thus, the relative entropy (3) provides the following useful diagnostic definitions in the context of uncertainty quantification in high-dimensional turbulent systems: Model error Model error) characterizes the lack of information in the marginal probability density of the imperfect model, π M , relative to the true marginal density, π, on the coarse-grained variables u u u [79]. For given Gaussian statistical initial conditions with mean and variance u u u 0 , R 0 so that π t0 ≡ π(u u u, t|ū u u 0 , R 0 ), π m t0 ≡ π m (u u u, t|ū u u 0 , R 0 ), the model error is E t |ū u u 0 , R 0 = P(π t0 , π m t0 ).
Definition 2.2. (Internal prediction skill Internal prediction skill Internal prediction skill) of a perfect/imperfect model quantifies the role of initial conditions in the forecast of a future state of a system [56,37,38] and it represents the gain of information beyond the perfect/imperfect climate (i.e., the probability density on the attractor). The prediction skill for the perfect model and the internal prediction skill for the imperfect model are where π t0 ≡ π(u u u, t|ū u u 0 , R 0 ) is the marginal density on the coarse-grained variables u u u conditioned on the initial data and π att (u u u, t) ≡ lim t0→−∞ π t0 is the climatological marginal density of the perfect model; analogous definitions hold for the imperfect model.

Definition 2.3. (Model sensitivity
Model sensitivity Model sensitivity [79]) quantifies the gain of information in the marginal probability density of the perfect/imperfect system on the coarse-grained variables u u u in response to external perturbations from its climate (the probability density on the attractor); it is expressed via the relative entropy as where the climatological marginal densities π att , π m att are defined as in Definition 2.2 and π δ , π m δ are the perfect/imperfect model marginal densities perturbed from the original attractor and conditioned on the particular perturbation.
Note that most forced dissipative high-dimensional turbulent dynamical systems have marginal densities for coarse-grained variables on the attractor that are smooth despite the dissipative fractal pieces of the attractor so Definitions 2.1-2.3 make sense (see [75] for many examples).
The most practical setup for utilizing the framework of empirical information theory in AOS applications arises when both the measurements of the perfect system and its imperfect model involve only the mean and covariance of the resolved variables u u u so that π L is Gaussian with climate meanū u u and covariance R, whereas π m is Gaussian with model meanū u u m and covariance R m . In this case, P(π L , π m ) has an explicit formula [56,87] which is extensively used throughout this study The first term in brackets in (7) is the signal, reflecting the model error in the mean but weighted by the inverse of the model variance, R m , whereas the second term in brackets, the dispersion, involves only the model error covariance ratio, R R −1 m . The signal and dispersion terms in (7) are individually invariant under any (linear) change of variables which maps Gaussian distributions to Gaussians; this property is very important for unbiased model calibration.
2.1. Systematically improving imperfect models through empirical information theory. As argued earlier in [80,81,73], the framework of empirical information theory provides a convenient and unambiguous way for characterizing the model error and for improving the performance of imperfect models, as well as for assessment of the model forced response via appropriate tests in the unperturbed statistical equilibrium and appropriate fluctuation dissipation theorems [81,11].
Consider first a class of imperfect models, M; the best imperfect model on the coarse-grained variables u u u is characterized by the marginal density π m * , m * ∈ M, so that the perfect model with the marginal density π has the smallest additional information beyond the imperfect model density [79,80,81,73], i.e., P(π, π m * ) = min m∈M P(π, π m ). An important issue to contend with in any realistic scenario is the fact that the perfect model density, π, in (8) is not known and only its best unbiased estimate, π L , based on L measurementsĒ E E L of the perfect system during the training phase is available. The following general principle [75,79] facilitates the practical calculation of (8) P(π, π m L ) = P(π, π L ) + P(π L , π m L ), L L, (9) where L denotes the number of measurements of the perfect system. The first term P(π, π L ) in (9) represents the lack of information in the least biased density based on the L measurements π L relative to the true density π and it cannot be reduced unless more measurements are incorporated. As shown in [75], the information barrier in (9) is given by where is the entropy or uncertainty in the densityπ; we will return to this important concept in §2.2.
Note also that the second term, P(π L , π m L ), in (9) exactly gives the lack of information in the "coarse-grained" model probability density associated with fewer constraints, π m L relative to the least biased L-measurement estimate of the true density. Consequently, the optimization principle (8) can be computed by replacing the unknown π by the hypothetically known π L so that the optimal model within the considered class of imperfect models satisfies P(π L , π m * L ) = min m∈M P(π L , π m L ), L L.  [79,80,20] of an imperfect model consistent with the L measurements of the coarse-grained variables u u u arises when P(π L , π m * L ) 1, (13) with perfect climate fidelity for P(π L , π m * L ) ≡ 0. Remarks: (i) in an effort to minimize unnecessary technicalities in the following exposition, we frequently abuse the terminology and do not distinguish between statistical equilibrium in the autonomous case and time-dependent statistics on the attractor in the nonautonous case; see, e.g., [4,87], for a detailed discussion of these issues, (ii) for notational simplicity we skip the subscripts L and L in (8) and (12) in the following discussion with the tacit assumption that the number of the coarse-grained measurements of the imperfect model does not exceed the number of analogous measurements of the perfect system (i.e., L L).
It turns out, as illustrated in the subsequent sections, that even when the timeaveraged climate fidelity is achieved, i.e., with T the period on the attractor of the perfect system, a significant improvement in the model prediction skill and sensitivity can be achieved. Climate fidelity of imperfect models is a necessary but not a sufficient condition for good model sensitivity (see [80,81] for details), as illustrated in the following sections based on a suite of increasingly complex models. Below, we provide a simple example of this important link which is described in more detail, exploiting the fluctuation-dissipation theorem in [81,11].
2.2. Information barriers. The notion of an information barrier will prove to be an important concept in the following discussion and it requires special attention. Intuitively, information barriers arise in classes of imperfect models which fundamentally limit the skill of any model in the class to reproduce the correct statistical behavior from nature. As introduced earlier in (9) and (10) the first instance of information barrier was associated with the intrinsic lack of information P(π, π L ) in the least biased density π L based on L measurements of the true density π.
The intrinsic lack of information (10) cannot be reduced within the class of the least biased L-measurement densities and, consequently, only a part of the total lack of information is available for optimization in this configuration through (12). The information barrier (10) can only be overcome by expanding the framework to incorporate more measurements of the true density. Note also that additional information barriers can exist in the given L-measurement framework if one considers a subclass of the coarse grained model densities, say π mα L ⊂ π m L , L L, which does not contain m * minimizing (12); this case will be illustrated in §4 and §6.
Another situation in which the information barriers arise naturally is associated with the sensitivity of imperfect models with climate fidelity to internal or external perturbations. As shown in [81], the model error between the true perturbed density and the model density can be expressed in such a case as P(π δ , π m δ ) = P(π δ , π L,δ ) + where δπ L , δπ m denote the perturbations of the respective densities and P(π δ , π L,δ ) represents the information barrier which cannot be overcome unless more measurements of the perfect systems are taken into consideration. More details regarding the last formula (15) are given in the next section based on a simple example in the Gaussian framework.
2.3. The link between statistical equilibrium fidelity and sensitivity of imperfect models. The crucial issue in climate change science as well as many other areas involving high-dimensional turbulent systems regards estimating the accuracy with which the statistical evolution of an imperfect model predicts the perturbed coarse-grained statistics of the perfect model. The framework developed in [80,81] directly links the uncertainty of the imperfect model forecast in response to external perturbations with its fidelity to the unperturbed climate; this link can be further exploited to asses the sensitivity of imperfect models to forcing perturbations based on the fluctuation dissipation theorem and appropriate tests carried out in the training phase [81,32,11] (see also remarks in §8).
In order to understand this link, assume that the perfect system or the imperfect model or both are perturbed so that π δ (u u u, t) the unknown perfect distribution, π δ,L (u u u, t), its least-biased density based on L measurements, and π m δ (u u u, t) the model distribution all vary smoothly with the parameter δ, i.e., π δ (u u u) = π(u u u) + δπ(u u u), δπ(u u u)du u u = 0, with analogous expressions for the perturbed imperfect marginal density π m δ in terms of π m and δπ m ; the explicit time dependence in (16) was skipped for clarity. For stochastic dynamical systems rigorous theorems guarantee this smooth dependence under minimal hypothesis [41]. It is instructive to consider here an elementary but informative example assuming the Gaussian framework based on the measurements of the means and covariances which are assumed to be strictly diagonal for simplicity. Then, assuming climate fidelity of the imperfect models, the leading order Taylor expansion in the (small) parameter δ of the error P(π δ , π m δ ) of the imperfect predictions (see Definition 2.1) leads to P(π δ , π m δ ) = S(π δ,G ) − S(π δ ) The entropy difference, S(π δ,G ) − S(π δ ), in (17) corresponds to the intrinsic error due to measuring only the mean and covariance of the perfect system. This entropy difference cannot be reduced within the class of Gaussian models and represents an information barrier to improving the imperfect model sensitivity; as already signaled in §2.2, we will return to this important concept repeatedly throughout the paper. The first (second) summation in (17) is the signal (dispersion) contribution to the model error (see (7)). For more general results, including the discussion of the link between climate fidelity and sensitivity of imperfect models via the fluctuation dissipation theorem, see [80,32]. Given a class of imperfect models M, the error (17) in predicting the true system's response to external perturbations is minimized for the model which is the most consistent with the unperturbed climate, i.e., the model m * ∈ M satisfying (8) or (12). In fact, climate consistency of an imperfect model is a necessary but not a sufficient condition for its predictive skill, as shown in [80], where simple yet instructive examples reveal the possibility of intrinsic barriers to improving model sensitivity even with perfect climate fidelity. The only way to overcome such barriers is by extending the class of imperfect models to account for more degrees of freedom. In the following sections, we will present further examples of such situations in more complex non-Gaussian models.
3. UQ and forecasting with model error in linear systems with hidden dynamics. Here, we introduce a simple two-dimensional linear stochastic model which is very useful for elucidating many important effects of model error in short and medium range imperfect predictions on a common subset of resolved variables. This is the simplest exactly solvable model which allows for studying these features. An important source of model error in dynamical forecasts of complex high-dimensional turbulent systems using imperfect models on the coarse-grained subset of resolved variables is introduced due to neglecting, or misrepresenting, the interactions with unresolved processes which are effectively hidden from the coarse-grained imperfect models.
Here, we utilize the simplest possible two-dimensional linear framework and assume that one of the two dynamical variables is observed/resolved, and the second variable is hidden from observations while nontrivially affecting the resolved dynamics. We will also study imperfect models with correctly prescribed statistical properties of the marginal equilibrium dynamics for the observed variable and quantify their short and intermediate range predictive skill. In particular, we focus on the role of initial conditions in short range imperfect forecasts, and the subtle interplay between the initial data and the asymptotic system characteristics in the medium range forecasts for the resolved dynamics.
Important issues illustrated in this section are: • There exist many imperfect models with climate fidelity on the resolved variables which have very different short and medium range prediction skill; • Models with high skill for short range predictions are not necessarily those with high skill for intermediate or long range predictions; moreover, models with unphysical negative damping in the resolved dynamics can outperform models with positive damping for short range predictions; • In the linear Gaussian framework tuning imperfect models to reproduce the marginal climatological distribution at all time lags guarantees the correct response to any time dependent external forcing. Models with unphysical parameter values and inferior initial data response may have high skill in the forced response, exceeding that of imperfect models which are tuned to reproduce equilibrium statistics in the marginal climate; • There exist explicit information barriers for model improvement which require extending the class of imperfect models in order to achieve further improvement; • There exist important limitations for estimating the prediction skill based on the the superensemble framework and mutual information.
As will be shown below and in the following sections, this simple linear Gaussian framework with hidden/unresolved dynamics provides a very revealing testbed for analyzing and understanding many important issues associated with short, and medium range predictions of much more complex nonlinear natural systems where the partial observability of the true system and the subtle interplay between the initial conditions and model error are much more difficult to capture. Problems of this nature arise even in the relatively simple framework of multilevel linear regression models which attempt to tune imperfect model parameters based on the marginal dynamics of the resolved variables (see [89] for an extensive treatment); some remarks regarding this issue are presented in §8.
3.1. The linear system for the perfect model. Consider a situation when the resolved/observed dynamics x(t) is given by the marginal dynamics of the following 2 × 2 linear stochastic system which serves as the perfect model The system matrixL and its eigenvalues λ 1,2 are, respectively, There are four coefficients (a, q, A, σ) in (18) where a is the damping in the resolved dynamics x(t), and q is the damping in the unresolved dynamics y(t) with A the coupling parameter in the unresolved dynamics. Clearly, in the above formulation the resolved dynamics x(t) is affected by the hidden process y(t). We assume that the deterministic forcing F (t) acts only in the resolved subspace x and the stochastic noise generated by the scalar Wiener process W (t), t ∈ IR, W (0) = 0 is present only in the unresolved subspace with the variable y. Since the system (18) is linear with additive noise, it can be easily shown that for constant forcing F it has a Gaussian invariant measure provided that The above stability conditions guarantee that the eigenvalues ofL have negative real parts so that the equilibrium mean of (18) is given bȳ The autocovariance at equilibrium, C att (τ ) = x x x(t + τ ) ⊗ x x x T (t) att , depends only on the lag τ regardless of the nature of the forcing and it is given by where the covariance Σ ≡ C att (τ = 0, t) on the attractor of (18) is given by Extensions to the nonautonomous case with time-periodic forcing are trivially accomplished provided that the stability conditions (20) are satisfied so that there exists a time-periodic Gaussian measure on the attractor (defined in the nonautonomous sense; see, e.g., [87,4]) with the time-periodic mean,x x x att (t) ≡ lim t 0 ), and autocovariance (22). It is important to realize that even for the class of simple linear systems (18) the dynamics during the approach to the same attractor can be quite different depending on the coefficients {a, q, A, σ}. There exist three distinct regimes of stable dynamics of the linear Gaussian system (18) with different properties of the statistical evolution of the 2 × 2 system (18): Purely real eigenvalues λ 1,2 : • Normal mean dynamics: In this regime the two eigenvectors of the system matrixL with purely real eigenvalues λ 1,2 are nearly orthogonal. Provided that Λ ≡ λ 1 /λ 2 = 1, this configuration occurs for • Non-normal mean dynamics: For large eigenvalue ratio Λ 1 and weak damping, a Λ, in the resolved dynamics the angle between the eigenvectors is sensitive to variations of a but only one of the eigenvectors significantly changes its orientation. For Λ ∼ 1 corresponding to (a − A) 2 + 4q ≈ 0, the eigenvectors are nearly collinear and the direction of both eigenvectors depends strongly on the damping a in the resolved dynamics.
We will show in the next section that the marginal statistical properties for the resolved variable x(t) on the attractor can be reproduced by many different imperfect models. The crucial observation is that some of these models may be characterized by a normal mean dynamics while others will display a high sensitivity to model parameters and the statistical initial conditions due to non-normal character of their dynamics.
3.2. Imperfect models. One obvious source of model error in imperfect predictions of natural high-dimensional systems stems from the fact the imperfect models can be only tuned to the true dynamics on a subset of resolved, coarse-grained variables on the attractor of the true system. Here, we introduce two classes of imperfect models of (18) which we use to illustrate the following important facts: • There exist families of imperfect models of the linear system (18) with the same statistical characteristics of the resolved dynamics on the attractor (see §3.2.1). • Different imperfect models with the same attractor/climate fidelity (13) on the subspace of resolved variables can have significantly different skill for short and medium range prediction as well as the forced response (see §3.4).
The imperfect models described below will be subsequently used in combination with the stochastic-statistical framework of §2 to illustrate a number of important issues arising in imperfect predictions of vastly more complex nonlinear natural systems, such as the interplay between forced response and initial data response and limitations of the superensemble framework. The issues related to predictive performance of these imperfect models and their forced response sensitivity are discussed in the subsequent sections.

× linear models with model error
The imperfect models in this class have the same structure as the perfect system (18) but with different coefficients (a m , A m , q m , σ m ) and forcing F m (t). The pathwise solutions and statistics of the imperfect 2×2 models have the same structure as those for the perfect system (18), as shown in Appendix A.1.
Here, the model error is introduced by the incorrect coefficients (a m , A m , q m , σ m ) which are tuned so that the marginal statistics of the resolved dynamics, x m (t), in the model and the truth, x(t), coincide on the attractor (see §3.2.1).

Mean Stochastic Model (MSM) for the resolved dynamics
These models are the simplest examples representing under-resolution of a perfect model through a lower dimensional dynamical system. Here, these imperfect models represent the simplest possible reduced models of the perfect system (18) where the interactions between the unresolved dynamics y(t) and the resolved dynamics x(t) are represented by the white noise. Consequently, the resolved dynamics x(t) in (18) is modelled by the scalar Mean Stochastic Model (MSm) given by the linear SDE (e.g., [28]) where γ msm > 0 is the damping, F msm denotes the deterministic forcing, and σ msm is the amplitude of the white noise forcing defined on the whole real line. The pathwise solutions and statistics of the Gaussian process satisfying (24) are the standard solutions for the Ornstein-Uhlenbeck process and are given in Appendix A.2. The dynamics of x msm is stable for γ msm > 0 so that the statistics on the attractor arē The model error in MSm arises due to the dimensional reduction and representing the interactions with the unresolved processes through white noise forcing. It is important to note that MSm cannot reproduce the marginal autocovariance, (C att ) 11 (τ ), of the perfect system (18) at all lags and it cannot be tuned to reproduce the marginal equilibrium statistics; this affects the skill for predicting the forced response (see also Proposition 3 and §3.3).
Finally, we mention that another class of imperfect models of (18) can be obtained via a multilevel linear least-squares regression of the model coefficients (a m , q m , A m , σ m ) and the forcing F m from long time series of the marginal dynamics x(t). The limitations of the multilevel approach for constructing two-dimensional models and the advantages of physics-constrained scalar models were discussed at length in [89]; some remarks regarding this approach are relegated to §8.

Families of linear models with the same marginal statistics on the attractor.
Here, we show that there exist families of imperfect linear models of the system (18) with perfect climate fidelity (cf. Definition 2.4). Moreover, there also exist families of imperfect models with perfect climate fidelity and correct marginal twopoint statistics at all time lags. This non-uniqueness will be reflected in different short and medium range forecast skill of imperfect models with climate fidelity and different forced response properties. The discussion below builds on the results reported in [89] in the unforced case; more details are given in Appendix A.3 which outlines straightforward extensions to the time-periodic configuration.
The two definitions below distinguish between two degrees of statistical consistency on the attractor of a stochastic system: Marginal two-point equilibrium statistics Marginal two-point equilibrium statistics). Consider the solution (x(t, t 0 ), y(t, t 0 )) of the linear stochastic system (18) satisfying (20) and with a Gaussian invariant measure at equilibrium with mean (21), and autocovariance (22). Then, the marginal mean and autocovariancē is called the marginal two-point statistics on the attractor for the process x(t).

Definition 3.2 (Marginal climatology Marginal climatology
Marginal climatology (marginal one-point equilibrium statistics)). Consider the solution x(t, t 0 ), y(t, t 0 ) of the system (18) with Gaussian invariant measure at equilibrium with mean (21), and autocovariance (22). The marginal equilibrium mean and variancē are called the marginal climatology for the Gaussian process x(t).
Remarks: (i) Based on definitions (3.1) and (3.2) it is clear that linear systems with the same marginal two-point statistics on the attractor are contained within the family of systems with the same marginal climatology. (ii) The above definitions can be trivially extended to the nonautonomous case with a time-periodic attractor [87,4] provided that the stability conditions conditions (20) are satisfied.
It can be easily verified with the help of straightforward algebra and Definitions 3.1-3.2 that there exist families of imperfect models of (18) with same equilibrium marginal statistics, or the same climatology as that of the resolved variable x(t) satisfying the perfect system (18); we summarize these results below: [Family of imperfect 2×2 linear models with the same marginal twopoint equilibrium statistics for x(t)]: Consider the linear 2×2 system (18) satisfying the stability conditions (20) with constant forcing and coefficients (a, q, A, σ, F ). Then, the marginal two-point equilibrium statistics of (18) is controlled by three parameters λ 1,2 , σ, AF .
Since there are five independent coefficients in the (18) and four constraints (28), there exists a one-parameter family of 2×2 linear models (18) and coefficients with the same marginal two-point equilibrium statistics for x(t).
Proposition 2 (Family of imperfect 2×2 linear models with the same marginal climatology for x(t) ). Consider the linear system (18) satisfying the stability conditions (20) with constant forcing and coefficients a, q, A, σ, F . Then, the marginal climatology for x(t) is determined by Since there are five coefficients in the system (18) and two constraints (29), there exists a three-parameter family of linear models with structure (18) and coefficients a m (w w w), q m (w w w), A m (w w w), σ m (w w w), F m (w w w) , w w w ∈ IR 3 , with the same marginal climatology for x(t).
Proposition 3 (MSMs with correct marginal climatology). Consider the Mean Stochastic Model (24) with constant forcing and coefficients γ msm , σ msm , F msm . Provided that γ msm > 0, the climatology of (24) is controlled by two parameters The process x msm (t) satisfying (24) has the same climatology as x(t) satisfying the perfect system (18) provided that where a, A, q, σ are the coefficients of the perfect system (18) with constant forcing F . Since there are three coefficients in the system (24) with constant forcing and two constraints (31), there is a one-parameter family of MSm's with coefficients with the same climatology as that of the perfect system (18). In general, MSm cannot reproduce the marginal two-point equilibrium statistics of the perfect system (18).
The proofs of the above propositions are elementary and rely on algebraic manipulations of the expressions for the marginal second-order statistics at equilibrium of the systems (18) and (24); we omit them here for brevity. Time-averaged extensions of the above propositions to the nonautonomous case with time-periodic statistics on the attractor can be easily implemented but they are unnecessary for the following discussion.
3.3. Imperfect model optimization and information barriers. In many practical situations, such as actual experiments in climate science, it is important to understand the response of the true system to changes in external forcing, δF , and to construct imperfect models whose response captures the features of the true system's response on the resolved coarse-grained variables. Here, following [80,73], we discuss the first example of imperfect model optimization via the informationtheoretic principles of §2 which shows unambiguously the existence of information barriers in the simplest yet revealing case; here, the true system is given by the linear Gaussian system (18) and the imperfect models have perfect unperturbed climate fidelity (13). It can be also easily shown that in this linear Gaussian framework only the imperfect models with correct marginal equilibrium statistics at all time lags (cf. Definition 3.1) have the correct forced response at all times.
Consider the imperfect models of (18) discussed in §3.2 with perfect climate fidelity (13) on the resolved subspace x. In the linear Gaussian framework this is easily achieved by matching the equilibrium means and variances, i.e, wherex m att ,x msm att are the marginal imperfect model means and E m , E msm are the corresponding marginal variances (33) The asymptotic, infinite-time response of the perfect system to forcing perturbations from the statistically steady state can be obtained by replacing F in (18) by F + δF while the same experiment in the imperfect models for (18) involves replacing F m by F m + δF . Since the considered models are linear and Gaussian, the only change in the expected response to forcing perturbations is through the change in mean while the variance of x, x m and x msm for the perfect and imperfect models remains unchanged with the values E m and E msm tuned to E as in (53). Now, assume that the perfect linear system (18) with parameters (a, q, A, σ) and constant forcing F satisfies the stability conditions (20), with A > 0. As shown in [80,73] in such a case no MSm (24) with γ m > 0 can match the sensitivity of the perfect system (18) since the perfect and imperfect model sensitivity are always anti-correlated; this is easy to see by noticing that for A > 0, sign(δx) = −sign(δF ) in (34a) but for all MSm models sign(δx msm ) = sign(δF ) in (34c). The formula in (17) applies exactly to these models with perfect climate fidelity with In this situation with A > 0 there is an intrinsic barrier to sensitivity improvement for the MSm models (24), since any attempt to minimize the information theoretic model error in the sensitivity through the general principle in (8) is futile because no finite minimum over γ msm of (35) is achieved and necessarily γ msm → ∞ in the approach to this minimum value. This information barrier can only be overcome by enlarging the class of models beyond (24) by introducing more degrees of freedom in the model; in this simple framework expanding the class of imperfect models to include the 2 × 2 linear models allows for a trivial minimization of (17) to achieve the perfect forced response.
On the other hand, if the perfect system satisfies (20) with A < 0, then using (35) to optimize the sensitivity of MSm leads to Note that while for A < 0 in (18) the MSm with perfect climate fidelity and γ * msm also captures the infinite time forced response of the perfect model on the resolved variable, it does not correctly reproduce the forced response at finite times; this can be easily seen using the standard 'fluctuation-dissipation' formulas for the mean linear response of a stochastic system to external perturbations (e.g., [75,87,88,11]) which for the response in the mean are given by where · att denotes the ensemble average at the unperturbed equilibrium; the response in the variance is identically zero for linear Gaussian systems. Thus, the correct forced response at all times in the Gaussian framework requires correct marginal autocorrelations at all lags τ which cannot be achieved for the MSm (see Proposition 3). On the other hand, the imperfect 2×2 models with correct two-point marginal equilibrium statistics have the correct forced response from Proposition 1.

3.4.
Prediction skill of imperfect models with perfect statistical equilibrium fidelity. The simple linear framework of two-dimensional Gaussian models with hidden dynamics provides a useful testbed for analyzing and understanding many important issues associated with short, and medium range predictions of much more complex nonlinear natural systems where the partial observability of the true system and the subtle interplay between the initial conditions and model error are much more difficult to capture. Here, we use the linear Gaussian system (18) and its imperfect models to illustrate the following issues: • Initial data response at short and intermediate ranges for the imperfect models with climate fidelity; • Limitations of superensemble framework and mutual information for estimating prediction skill.
As shown in §2, the prediction skill and model error for imperfect models can be quantified using the information-theoretical metrics given in Definitions 2.1-2.3. Note first that for imperfect models with perfect statistical equilibrium fidelity (see (13) or (14)) and statistical initial conditionsū u u 0 ≡ (x 0 ,ȳ 0 ), R 0 , the following relationships hold Asymptotic model error : lim Unsurprisingly, this implies that all imperfect models with perfect climate fidelity are equivalent for long range forecasts where the sensitive dependence on the initial conditions vanishes. However, as already discussed in §3.2.1 imperfect models with climate fidelity can have very different short and medium range prediction skill due to different dynamical behavior during the approach to equilibrium from a given initial condition. Here, the sensitivity of imperfect predictions for the resolved variable x(t) to the unresolved initial conditionsȳ 0 is especially interesting since the uncertainties in these initial conditions are impossible to control in practice.
In order to mimic realistic forecast situations with uncertainties in initial conditions as well as the model error, we construct ensembles of perfect/imperfect forecasts with initial conditions normally distributed around the initial ensemble mean,ū u u u u u u u u 0 = (x 0 ,ȳ 0 ), and with covariance R 0 of the initial conditions in the ensemble smaller (in a suitable norm) than the climate variance at t 0 . We use the analytical formulas for the second-order statistics of the perfect system and the imperfect models (see Appendix A) and represent every ensemble member by its mean and covariance.
The examples discussed below are based on a suite of imperfect models introduced in §3.2 with either the correct marginal two-point statistics or correct marginal climatology (see Definitions 3.1-3.2 in §3.2.1). Given the perfect model (18) with coefficients (a, q, A, σ) and constant forcing F , we consider three imperfect models tuned to have the correct marginal two-point equilibrium statistics and two models with the correct marginal climatology (see Definitions 3.1, 3.2 and Proporitions 1-3); the cumbersome formulas for the coefficients a m , A m , q m , σ m , F m incorporating the above constraints are relegated to Appendix A.3. The models in the suite are given by:

Models with correct marginal two-points statistics on attractor
(i) 2×2 model with overestimated damping in x m (t): a m = 1.5a and A m , q m , σ m , F m given by (104), given by (104).
Note here that the model (iii) incorrectly assumes negative damping in the resolved dynamics while reproducing the correct two-point marginal equilibrium statistics and has, therefore, the correct response for all forcing perturbations. The last two models (iv) and (v) only have the correct marginal climatology (cf Definition 3.2) and they lack the correct forced response. Despite these obvious differences, the choice of the best model depends largely on the forecast lead time, as discussed below.
Figures 1, 2, 3, 4 illustrate the model error (4) and internal prediction skill (5) of the probabilistic forecasts from the imperfect models (i)-(v) in six distinct configurations of the perfect model: (1) nearly normal dynamics with the fast direction perpendicular to the resolved subspace (figure 1), (2) nearly normal dynamics with the fast direction parallel to the resolved subspace (figure 2), (3) non-normal dynamics with non-degenerate eigenspaces (figure 3), (4) non-normal dynamics with nearly degenerate eigenspaces (figure 4), (5-6) dynamics with complex eigenvalues (figure 5 with small real parts, and figure 6, with comparable real and imaginary parts). Note that in of all these regimes the nature of the dynamics in the imperfect models (i)-(v) can differ drastically from that of the perfect system. High predictive skill of a given imperfect model is achieved for a small model error E and high potential predictive skill Sk. The main points regarding the predictive skill of the optimized imperfect models (i)-(v), illustrated in the figures 1, 2, 3, 4, are: • The choice of the best imperfect model for initial data response depends on the nature of the true dynamics, the range of initial conditions, and the forecast range. Moreover, the model sensitivity to initial conditions for the unresolved dynamics is important. • When the perfect system dynamics is nearly normal with the strongest attracting direction along the unresolved subspace y, imperfect models with (at least) correct marginal climatology (Definition 3.2) have good overall predictive skill (figure 1). • The imperfect model (iv) tends to have the best overall skill in our test suite and good short range prediction skill in highly oscillatory systems (see figure  5); note that this model only reproduces the correct marginal climatology and it does not have the correct forced response (see §3.3). • The predictive skill of MSm, representing the simplest reduced model, is generally poor and is highly dependent on the initial conditions and the perfect system geometry. • Imperfect models with unphysical negative damping but correct marginal twopoint statistics (and forced response) can outperform models with positive damping for short range forecasts (see model (iii) in the bottom panels of figures 1, 3, 6) or even medium ranges (model (iii) in bottom panels of figures 4, 5). This model usually has better skill than MSm for predictions with all lead times. Si gnal te rm   Dispersion term Figure 2. Model error (4) and potential prediction skill (5) for forecasts using the imperfect 2 × 2 linear systems (18) and the MSm (24) with statistical equilibrium fidelity (13) (models (i)-(iii) also have the correct two-point equilibrium statistics; see Def. 3.1). The perfect system dynamics is nearly normal with strongest attracting direction nearly parallel to x with coefficients (a = −3.7, λ1 = −1, λ2 = −4, σ = 0.5, F = 0.1). The two examples are for different statistical initial conditionsx0,ȳ0, R0.  Model error Potential prediction skill   Figure 5. Model error (4) and potential prediction skill (5) for forecasts using the imperfect 2×2 linear systems (18) and the MSm (24) with statistical equilibrium fidelity (13) (models (i)-(iii) also have the correct two-point equilibrium statistics; see Def. 3.1). The perfect system and the imperfect models (except (iv)), have complex-conjugate eigenvalues λ1,2 = α + βi, α β λ1,2 = α + βi, α β λ1,2 = α + βi, α β; the imperfect model (iv) has purely real eigenvalues and correct marginal climatology and parameters (a = −2, λ1,2 = −0.5 ± 2i, σ = 0.5, F = 0.1). The two examples are shown for different statistical initial conditionsx0,ȳ0, R0.

Elementary Non-Gaussian test models with intermittency.
Here, we discuss two mathematically tractable nonlinear and non-Gaussian models with intermittency which are very useful for analyzing and understanding many nontrivial aspects of UQ and predictive skill in more complex systems; the discussion of UQ in the spatially extended systems presented in §6 builds on these simple models by combining them in an appropriate fashion in the Fourier domain.
The first system with intermittency [30,29,94,9] discussed below ( §4.1) combines turbulent dynamics of a complex scalar with unresolved processes affecting the resolved dynamics through colored noise fluctuations in the damping and colored noise fluctuations in the forcing. The second system [76], discussed in §4.2, is a one-dimensional reduced climate model for low-frequency variability with cubic nonlinearities and correlated additive and multiplicative white noise forcing; this model displays regime switching despite a unimodal PDF. Both of these models are very useful for studying various imperfect Gaussian or non-Gaussian approximations obtained either via the direct simplification of the true dynamics (e.g., linearization of the underlying dynamics, dimensional reduction) and/or moment closure approximations of the turbulent fluxes. This stochastic approach allows for analyzing many properties which are relevant for UQ and prediction in high-dimensional turbulent systems in a greatly simplified one-mode setting.
Important points discussed here include: • Effects of various types of model error on the prediction skill in nonlinear systems with non-trivial mean-turbulent flux interactions; • Utility of the information-theoretic framework for systematic model improvement in the presence of intermittency and positive Lyapunov exponents; • Existence of information barriers for improving the sensitivity response of imperfect models. • False multiple equilibria in the imperfect models; • Physically consistent white noise limits of complex dynamical systems as test models with judicious model error for UQ and filtering/data assimilation.

4.1.
Simple test model with hidden dynamics and intermittency. Consider a single Fourier mode u(t) of a turbulent signal modeled by the following stochastic system (see [30,29,9,11]) where W u , W b are independent complex Wiener processes and W γ is a real Wiener process defined for all time t ∈ IR. There are eight parameters in the system (38): two damping parameters d b , d γ > 0, two oscillation frequencies ω and ω b , the stationary mean of the dampingγ, and noise amplitudes σ u , σ b , σ γ > 0; F (t) is a deterministic forcing.
Here, we regard u(t) as one of the resolved modes in a turbulent signal where the nonlinear mode-interaction terms are replaced by a stochastic drag γ(t) and an additive noise term b(t), as is often done in turbulence models [75,18]. The nonlinear system (38), introduced first in [30] for filtering multiscale turbulent signals with hidden instabilities has a rich dynamics mimicking turbulent signals in various regimes of the turbulent spectrum, including intermittently positive finite-time Lyapunov exponents, as discussed below (see figure 7 and [9,11]). Moreover, due to the particular structure of the nonlinearity in (38), exact path-wise solutions and exact second-order statistics of this non-Gaussian system can be obtained analytically, as discussed in [33,34,30]. The mathematical tractability of this model and its rich dynamical behavior provides a perfect testbed for analyzing effects of errors due to various moment closure approximations and dimensional reduction in a suite of imperfect models introduced in §4.1.1.
The physically relevant dynamical regimes of (38) satisfying the mean-stability condition (see [9]) are (see also figure 7): (I) Regime of plentiful, short-lasting transient instabilities in the resolved component u(t) with fat-tailed marginal equilibrium PDF. This is a regime characteristic of the turbulent energy transfer range and is associated with rapidly decorrelating damping fluctuations γ(t); typical parameter values are σ γ , d γ 1, σ γ /d γ ∼ O(1) and positiveγ ∼ O(1).
(II) Regime of intermittent large-amplitude bursts of instability in u(t) with fattailed marginal equilibrium PDF. This regime is characteristic of turbulent modes in the dissipative range and it occurs for small Laminar' regime with nearly Gaussian equilibrium PDF. This regime is characteristic of the laminar modes in the turbulent spectrum. Here, fluctuations in u(t) decorrelate rapidly compared to the damping fluctuations γ(t) and the transient instabilities in u(t) occur very rarely. This type of dynamics occurs in (38) for 4.1.1. Imperfect models, information-theoretic optimization and information barriers. We describe here four classes of imperfect models derived from (38) which introduce model errors through various moment closure approximations or a dimensional reduction of the system (38); two of these models are nonlinear with 'Gaussianized' statistics while the simplest model, the MSm discussed already in §3, is both linear and Gaussian. The effects of model errors introduced by these approximations and ways of mitigating these errors are discussed in §4.1.2 where we apply the information-theoretic framework discussed in §2 to highlight the following important points: • The information-theoretic optimization ( §2.1) can dramatically improve predictive performance and sensitivity of imperfect models. In particular, for systems with too much dissipation a simple inflation of the stochastic forcing in order to optimize the time-averaged climate fidelity (14) leads to a significant point-wise reduction of model error and greatly improved prediction skill in models with consistent moment closure approximations. • Climate fidelity (cf. (13), (14) ) of imperfect models on the coarse-grained subset of resolved variables is necessary but not sufficient for high skill in forced response predictions. Climate fidelity requires the optimization of the whole model density and not its individual moments. • There exist information barriers to model improvements within a given class of imperfect models. • Nonlinear, non-Gaussian models can have long memory of initial conditions, including the memory of initial conditions for the unresolved processes. • Linear Gaussian models cannot reproduce the response in the variance to forcing perturbations in nonlinear systems [82]. The above issues, alongside those discussed in the context of the linear Gaussian framework in §3, will be important in the spatially extended framework discussed in §6; the examples discussed here provide an unambiguous illustration of analogous phenomena occurring in vastly more complex natural systems where they are much more difficult to capture.
Note first that, with the state vector v v v = (u, b, γ) T of the system (38) with one resolved component u and two hidden components (b, γ), the deterministic part in (38) can be written as withL a linear operator, B a bilinear function, and F a spatially uniform term representing generalized deterministic forcing; the exact form of these terms can be easily obtained by comparing (40) with (38) and (1). In what follows we will skip the explicit dependence on time in f f f in order to simplify the notation. It can be easily shown (e.g., [52,9]) that by adopting an analogue of the averaged Reynolds decomposition of the state vector and the overbar denotes an ensemble average.
Due to the particular form of the quadratic nonlinearity in the perfect system (38), exact formulas for the second-order statistics and path-wise solutions can be derived [30] without the explicit knowledge of the associated time-dependent probability density. Derivation of imperfect models in this framework utilizes some type of moment closure approximation applied to the turbulent fluxes in (41). Below we describe two nonlinear and two linear imperfect models of the system (38) which introduce model error due to various moment closure approximations applied to (41) and/or due to a dimensional reduction. These are standard ways to introduce model error in vastly more complex turbulent systems with the form (40) (e.g, [22]). The effects of errors introduced by these models on the short, medium and long range prediction skill, as well as ways of mitigating these errors via the informationtheoretic framework of §2, are discussed in the following sections. It is important to foreshadow the following discussion and stress that while the first three imperfect models below have Gaussian statistics due to the moment closure approximations, the first two models remain nonlinear in the state variables v v v = (u, b, γ). On the other hand, the last imperfect model is linear and non-Gaussian due to the presence of multiplicative noise which arises through appropriate white noise limits of the true dynamics.
Gaussian Closure model (GCm) Gaussian Closure model (GCm) Gaussian Closure model (GCm). This model is nonlinear in the state variables but has a 'Gaussianized' statistics with nontrivial and consistent mean-fluctuation interactions. The quasi-Gaussian closure approximation (e.g., [52]), which is familiar from turbulence theory, implies neglecting the third moment of the true probability density p(v v v, t) in the equation in (41b) for the covariances. Thus, we assume The closure (42) results in a fully coupled nonlinear dynamical system for the second-order statistics and it introduces model error due to neglecting the turbulent fluxes in the covariance (41b) of (38). Thus, the GCm closure for a general quadratic system is given by with

Deterministic-Mean model (DMm) Deterministic-Mean model (DMm)
Deterministic-Mean model (DMm). This model, similarly to GCm, is nonlinear in the state variables and has a Gaussianized statistics but, apart from neglecting the turbulent third-order flux in the covariance, it also neglects the turbulent backscatter into the mean. This ad-hoc closure corresponds to assuming in (41). Note that the constraint (44b) is inconsistent with the nontrivial evolution of the covariance in (41b) and effectively implies that . Thus, the DMm closure for a general quadratic system is given by with

Mean
The meanū msm and covariance R msm for this linear model can be computed analytically in a standard fashion leading tō whereÂ ≡ ∇f f f u | (ūmsm,b=0,γmsm) , Σ msm = 1 2 diag[σ 2 msm , σ 2 msm ], and the covariance R msm = u u u ⊗ũ u u T is considered in the real variablesũ u u = e[u msm −ū msm ], m[u msm −ū msm ] T .
Note that the linearity of MSm implies that the covariance R msm is insensitive to forcing perturbations. Consequences of this fact on improving the model fidelity and sensitivity are discussed in the next section.
White noise limits of the test system (38) White noise limits of the test system (38) White noise limits of the test system (38). One alternative to obtaining reduced models via the moment closure approximations or the direct dynamics linearization is to consider appropriate white noise limits of the unresolved (or poorly understood) components of the perfect system. Such a procedure leads to simplified stochastic models with reduced dimensionality and judicious model error. This approach will prove particularly useful and revealing in §7 where we discuss various numerical artifacts in complex spatially extended models.
Here, we illustrate this strategy on the single-mode system (38) by assuming that both the damping fluctuations γ(t) and the additive forcing uncertainty b(t) affecting the resolved dynamics u(t) decorrelate very fast, i.e., we take the formal limit in (38) As shown in more detail in Appendix B, the formal limit preserving the nonvanishing correlations u(t)γ(t) leads to the Stratonovitch SDE (e.g., [28]) for the complex scalar u(t), given by where W γ and the components of W u = 1 √ 2 (W 1u +iW 2u ) and W b = 1 √ 2 (W 1b +iW 2b ) are independent Wiener processes defined for all time t ∈ IR. We rewrite (111) in Ito form as (52) where α =γ − 1 2σ 2 γ is the damping term in the white noise limit. In contrast to the original system (38) with colored multiplicative and additive noise terms, the marginal equilibrium PDF for the components u R ≡ e[u], u I ≡ m[u] for the system (112) in the unforced case F ≡ 0 can be obtained analytically (see Appendix B) in the form Although this is not shown, the marginal equilibrium PDF (53) for u(t) obtained in the white noise limit of γ(t), b(t) provides a very good approximation of the marginal equilibrium statistics of u(t) in regime I (abundant transient instabilities) of the original system (38).

4.1.2.
Using stochastic parameterization to improve prediction skill and forced response of imperfect models. Here, we illustrate the utility of the information-theoretic framework discussed in §2 to systematically and simultaneously improve the climate fidelity and prediction skill of imperfect models GCm, DMm, and MSm of the system (38) in the presence of intermittency and positive Lyapunov exponents. We illustrate this approach in this nonlinear and non-Gaussian configuration by discussing: (i) model optimization and the existence of information barriers, (ii) forced response of optimized models, and (iii) the prediction skill of optimized imperfect models when both the initial conditions and the forced response are important; for more details see [11].
Note that the moment closures employed in the derivation of GCm and DMm discussed in the previous section, or the dimensionality reduction used in MSm, do not preserve a-priori the statistical attractor of the perfect system. In fact, as was shown in [11], these imperfect models have too much dissipation and underestimate both the equilibrium mean and variance relative to the perfect system (see figure 8).
Here, following the methodology of [80,11], we focus on improving the model fidelity by inflating the stochastic forcing in the resolved dynamics of the imperfect models in order to minimize the annually averaged information content, P(π att , π m att ) as in (14) of §2. In figure 8 we show an example of such an optimization procedure carried out in regime II (see §4.1) of mean-stable dynamics of the perfect system (38); we point out the following: • For all imperfect models the attractor fidelity is significantly improved by inflating the amplitude of the stochastic forcing σ M * u . The best results can be achieved for GCm which is based on consistent moment closure approximations.
• The total lack of information in the imperfect model attractor dynamics is greatly reduced at all times in the time-periodic setting even though only the  (38). Model error in the climate, P(πatt, π m att ), before (left) and after (right) the optimization of the imperfect models of the system (38) by stochastic noise inflation aimed at minimizing the time-averaged relative entropy (14). Note the significant improvement in the optimized imperfect models reflected in the point-wise decrease of the relative entropy. This example is typical of regime II in (38); the parameters used in computations arê γ = 1.2, σu = 0.5, σγ = dγ = 0.5. (14) is used in the optimization procedure; this was observed earlier for the Gaussian models in [32].

time-averaged formula
In the examples discussed below we use the time periodic forcing: where φ 1 = 1, φ 2 = 0 and ω = π/6 which ensures that the period T 0 = 12 so that it can be interpreted as a year consisting of 12 months; the forcing has two frequencies so that the equilibrium statistics of (38) has a more variable structure.

Information barriers to imperfect model improvement
The important issue in model optimization concerns the extent to which the climate/attractor fidelity of the imperfect models can be improved. This issue is particularly important and interesting in situations when the true system has a wide range of dynamical regimes, including turbulent regimes with intermittency. In figure 9, which is complementary to figure 8, we show results of the informationtheoretic optimization of the imperfect models of (38) which is achieved by the stochastic noise inflation for different parameters and dynamical regimes of the perfect system (38). As before, the optimal noise amplitude σ m * u is determined by minimizing the time-averaged lack of information (14) in the imperfect models. The following remarks are based on figure 9: • There exist barriers to model improvement by noise inflation for DMm and MSm as the hidden, transient instabilities become more abundant (see the transition between regimes II and I of (38) for increasing d γ in figure 9). . Information-theoretic optimization of imperfect models of the quadratic intermittent system (38). Optimal noise amplitude σ m * u for the time-averaged climate fidelity (14) and the corresponding model error as functions of the mean dampingγ and damping of fluctuations dγ for the imperfect models GCm, DMm and MSm. Note the barriers to model improvement by stochastic forcing inflation in DMm and MSm for increasing dγ as the hidden, transient instabilities become more abundant in u(t) on the approach to Regime I (see figure 7).
• For weak mean dampingγ, associated with very intermittent dynamics of u(t) and fat-tailed marginal equilibrium PDFs (see Regime II in figure 7), perfect attractor/climate fidelity cannot be achieved by inflating the stochastic forcing • GCm retains the best climate fidelity throughout the parameter range.

Forced response of imperfect models with optimal noise
Here, the main focus is on elucidating the link between the attractor/climate fidelity of imperfect models and their sensitivity to perturbations δF (t) of the external forcing. Consequently, we consider here long lead times so that essentially all knowledge of the initial conditions is lost and only the perturbations to the attractor of the perfect system are important. The sensitivity of the perfect system and its imperfect models to such perturbations (see Definition 2.3) is quantified via the relative entropies P(π δF , π att ), P(π m * δF , π m * att ); this is illustrated in figure 10 in a typical configuration with intermittent transient instabilities with where A δ 0 is the perturbed mean forcing, A 0 is the unperturbed mean forcing, and a   Figure 11. Interplay between the initial data response and the forced response for the quadratic model with intermittency (38). (Right) Model error P(π δF,t 0 , π m δF,t 0 ) (4) and (left) the internal prediction skill, P(π δF,t 0 , π m att ), for the predictions of the resolved component u(t) using the imperfect models optimized for unperturbed climate fidelity (14). The initial conditions at t0 are off the attractor and the subsequent climate change is induced by ramp-type forcing perturbations δF (55). Note the high skill of GCm at all ranges, including the short range skill in the dispersion. This situation is typical of regime II (see figure 7) of the perfect model (38) where both GCm and DMm achieve perturbed climate consistency; MSm has no short range skill and only a marginal skill for the long range forecasts. The perfect system parameters used in computations areγ = 1.2, σu = 0.5, σγ = dγ = 0.5.
dispersion. Unlike DMm and MSm, the sensitivity of GCm for the response in the mean remains very good in all dynamical regimes (not shown); this stems from the consistent moment closure approximations used in GCm.
• The linear Gaussian model MSm completely fails to detect the covariance response to the forcing perturbations. The linearization inherent in the MSm has relatively little effect in regimes with no transient instabilities in the resolved dynamics (e.g., nearly-Gaussian regime III of (38)) when all the discussed models perform well (not shown).
Interplay between initial data and forced response with optimal noise The most difficult forecasting scenarios are those involving the short and medium range imperfect model predictions when both the memory of the initial conditions, model error, and the response due to forcing perturbations are important in the forecasts [12,117,37,38].
In figure 11 we illustrate the predictive skill of the imperfect models with optimal noise in a scenario where the statistical initial condition is away from the attractor and the external forcing is perturbed at some later time during the evolution. As described in §2, the predictive skill of the imperfect models optimized for climate fidelity can be assessed by monitoring the model error, P(π δF,t0 (t), π m * δF,t0 (t)), and the internal prediction skill P(π m * δF,t0 (t), π m * att (t)). The following remarks are based on the results illustrated in figure 11: • GCm with optimal stochastic forcing has good prediction skill in dynamical regimes with intermittent large-amplitude transient instabilities (e.g., regime II of (38)), including the short term initial data response and the long term perturbed climate/attractor consistency. DMm has a good short range prediction skill and MSm has no skill in this regime. • All optimized imperfect models achieve perturbed attractor consistency in regimes with essentially no transient instabilities like, for example, regime III of (38 (not shown see [11]). GCm and DMm are essentially the same in this regime and have good prediction skill for all lead times. MSm, which completely neglects the dynamics of the unresolved variables has no short range prediction skill due to a poor initial data response. • In dynamical regimes characterized by abundant short-lasting transient instabilities (e.g., regime I of (38)) GCm has a good skill for predicting the mean dynamics but it fails, like all the other imperfect models, at predicting the covariances (the dispersion part in these cases dominates the total model error).

4.2.
Cubic scalar SDE with regime switching and unimodal PDF. The systematic development of reduced low-dimensional stochastic models from observations or comprehensive high-dimensional models is an important topic for atmospheric low-frequency variability, climate sensitivity, and improved extended range forecasting; other areas where such issues are important include neural science and engineering applications in situations where low-frequency variability can be efficiently described by just a few large-scale teleconnection patterns dominating the coarse-grained dynamics. Systematic strategies are essential in successfully developing reduced models and the recently developed stochastic mode reduction strategy provides a systematic procedure for the derivation of reduced stochastic models [86,85,75,77,78]. Systematically reduced stochastic models are attractive for sensitivity studies of the climate and other complex high-dimensional systems because: (i) they are computationally much more efficient than state-of-the-art climate models and have been shown to have comparable prediction skill, and (ii) they allow for a better understanding of the climate system due to the reduced complexity. Here, we consider an important example of a nonlinear scalar reduced model with correlated additive and multiplicative noise arising from advection of the large scales by the small scales and simultaneously strong cubic damping. This normal form for a single low frequency variable possesses an interesting dynamical configuration with regime switching despite unimodality of the associated PDF resembling the situation occurring in comprehensive climate models [77,74]. Two imperfect models, one Gaussian and one non-Gaussian, are utilized below to highlight the following issues: • Gaussian approximations of non-Gaussian systems can have, in certain circumstances, a smaller model error and better climate fidelity than non-Gaussian imperfect models. • Moment closure approximations of nonlinear systems can have false multiple equilibria which can be removed by information-theoretic optimization.
The cubic scalar framework utilized here provides the simplest possible illustration of both these issues which are important in complex high-dimensional situations where they are much more difficult to isolate.
The one-dimensional reduced climate model [76] for low-frequency variability with cubic nonlinearities and correlated additive and multiplicative white noise forcing is given by where a, b, c, A, B, σ u are arbitrary real constant coefficients, and W AB (t), W u (t) are independent Wiener processes defined for t ∈ IR; the correlated additive and multiplicative noise associated with W AB arises through the systematic stochastic mode reduction procedure [76]. Even for time-independent forcing this model possesses a wide range of interesting dynamical regimes which can be deduced from its equilibrium PDF given by (see [76]) where N 0 is a normalizing constant and The dynamical regimes of interest are illustrated in figure 12 and include: (i) Nearly Gaussian regime; (ii) Regime with symmetric fat-tailed algebraic tails; (iii) Regime with highly skewed unimodal PDF and no regime switching; (iv) Regime with highly skewed unimodal PDF and regime switching; (v) Bimodal regime.
The non-Gaussian cubic system (56) serves as an instructive and revealing model for comparing nonlinear Gaussian and linear non-Gaussian approximations and understanding the origins of information barriers to model improvement in the regimes (i)-(iv) of the cubic test system (56). Here, we consider two imperfect models, one linear non-Gaussian, and one nonlinear Gaussian; both these models are described below: Non-Gaussian linear model with multiplicative noise Non-Gaussian linear model with multiplicative noise Non-Gaussian linear model with multiplicative noise This model is obtained via the linearization of (56) in the state variable u which leads to where the model coefficients (a m , b m , c m , B m , A m , σ m u ) are, in principle, different from those of the perfect system (56). For time-independent forcing the equilibrium PDF associated with (58) can be easily derived as This nonlinear model with Gaussianized statistics is obtained in an analogous way to that discussed in §4.1.1 by consistently neglecting odd moments in the equations for the mean and variance of u(t); the resulting closed system of ODEs for the approximate second-order statistics is given by where the model coefficients (a m , b m , c m , B m , A m , σ u m ) are not necessarily identical to those in the perfect system (56). Here, the time-dependent Gaussian PDF for GCm is given by p gcm (t) = N ū m , u 2 m ; it is important to note that GCm for the cubic system (56) can have non-unique invariant measures due to false multiple equilibria in (60). This important new consequence of model error is discussed in more detail in the next section and is illustrated in figures 16 and 15.

Information barriers for model optimization.
The presence of the correlated multiplicative and additive noise and the cubic nonlinearity in the non-Gaussian system (56) for the real scalar u(t) provides a natural complement of the previous quadratic test model (38) with colored damping fluctuations discussed in §4.1. Here, we use the system (56) and its imperfect models discussed in the previous section to address in a simple unambiguous framework two new issues which are important for UQ in more complex high-dimensional systems: (i) the skill of non-Gaussian linear models and quasi-Gaussian nonlinear models for approximating non-Gaussian dynamics, (ii) existence of false multiple equilibria in the statistics of nonlinear imperfect models. The information-theoretic framework described in §2 is utilized again to illustrate these new issues in the simple one-dimensional setting. Figure 13 illustrates the results of information-theoretic optimization of the imperfect nonlinear Gaussian (60) and linear non-Gaussian (58) models of the cubic system (56) discussed in the previous section; here, the optimization ( §2) is carried out in various dynamical regimes of the system (56) by a simple additive stochastic noise inflation to achieve climate fidelity (13) through (12). The coefficients in the perfect system (56)  The results in figure 13 show the optimal noise values σ m * u and the minimum model error at equilibrium, P(π att , π m * att ), as a function of the true additive noise level σ u in the perfect system (56); the results for the Gaussian and non-Gaussian imperfect models are superimposed in each of the examined dynamical regimes (see figure 12) for easy comparison. The information barriers for the linear non-Gaussian model are prominent in the fat-tailed regime (ii) and the bimodal regime (v), while GCm has information barriers in the highly skewed regimes (iii)-(iv) where the linear non-Gaussian model performs marginally better. The relative entropies are computed numerically based on the analytical expressions for the equilibrium PDFs, since the Gaussian framework (7) is inappropriate in the highly skewed regimes. optimal noise for climate fidelity

Systematic model optimization and information barriers
optimal noise for climate fidelity optimal noise for climate fidelity optimal noise for climate fidelity  Figure 13. Information-theoretic optimization of imperfect models of the cubic system (56). The four insets corresponding to different regimes of (56) show (from top to bottom) the optimal additive noise amplitude σ m * u for the attractor/climate fidelity (13) obtained via (12), the minimum model error at equilibrium, and the PDFs for the optimized imperfect models for σ m * u corresponding to σu = 1. Note the information barriers to the improvement of the linear non-Gaussian model in regimes (ii) and (v), and the barriers for GCm in the highly skewed regimes (iii), (iv). Figure 14 illustrates the attractor/climate fidelity of the two imperfect models with time-periodic forcing in the phase space spanned by the mean and variance; Figure 14. Attractor fidelity of the imperfect models of the cubic system (56) with time-peroidic forcing. Limit cycles in the second-order statistics of the system (56) and the imperfect models before and after the information-theoretic optimization (cf. §2). In the fat-tailed PDF regime (ii) the optimization is carried out via the additive noise inflation σ m u , while in the highly-skewed PDF regime (iii) the additive noise and the mean forcing are optimized for the minimum model error. the two insets, corresponding to the situation before and after the optimal noise inflation, show the limit cycles in the time-periodic second-order statistics for the perfect system and the imperfect models. The two examples shown correspond to the fat-tailed PDF regime (ii) and the highly skewed PDF regime (iii). In both cases GCm has a significantly better attractor/climate fidelity for the first two statistical moments even before the optimal noise inflation, although in the highly skewed PDF regimes the model error in the full PDF of GCm often exceeds that of the linear non-Gaussian model (see figure 13); in the nearly Gaussian regime (i) both models perform similarly well.

Based on these instructive examples illustrated in figures 13-14, we make the following points:
• GCm has, in general, a better attractor/climate fidelity, including the timedependent configurations with nontrivial attractors; GCm can also be better optimized for climate fidelity by additive stochastic noise inflation except for highly skewed regimes where the linear non-Gaussian model performs better. • There exist barriers to imperfect model improvement which can be easily detected within the information-theoretic framework ( §2).

False multiple equilibria in imperfect models
Existence of false equilibria in imperfect models can have profound consequences for UQ in complex high-dimensional turbulent systems where such artifacts might not be easily identifiable. In figures 16 and 15 we show two distinct examples of such artifacts.
First, in figure 15 we show the configuration with two stable and one unstable equilibria in the Gaussian statistics of GCm when the perfect system (56) has a bimodal PDF (regime v). The information-theoretic optimization of §2 via the stochastic noise inflation lads to a one-equilibrium configuration for the statistics of GCm; note that the equilibrium mean and variance in the optimized GCm is  (4) for the two imperfect models in the regime (ii) with fat-tailed algebraic tails, (right) the sensitivity and model error for the two imperfect models in the regime (iii) with highly skewed equilibrium PDFs (see figure 12). Note the poor skill of the linear non-Gaussian model, this is particularly important in the regime with highly skewed PDFs where the linear non-Gaussian model has better attractor/climate fidelity ( figure 13).
close to the mean and variance of the perfect system (see black and magenta dots in figure 15).
A more complicated situation is illustrated in figure 16 showing the configuration with two stable equilibria in the Gaussian statistics of GCm when the perfect system (56) has a highly-skewed PDF (regime (iii)). In this case the globally optimal amplitude σ m u of the additive stochastic noise in GCm corresponds to the twoequilibrium configuration right before one of the stable equilibria disappears in the saddle-node bifurcation; note that the fat tail of the perfect model PDF is captured to some extent. The locally optimal configuration with one equilibrium corresponds to much larger noise values as shown in the right column of figure 16; note that in this case the single remaining equilibrium mean and variance of the optimized GCm is quite different from that of the perfect model (see black and magenta dots in figure 16) with the maximum in the PDF of GCm close to the in the trough in the perfect model PDF.
This new phenomenon, arising as a consequence of model error induced by symmetrization of the model PDF, deserves the following comments: • While the simple Gaussian imperfect models of nonlinear non-Gaussian systems can have high skill and climate fidelity when appropriately optimized, there exist dynamical regimes, usually associated with multimodality, where such imperfect unoptimized models display false multiple equilibria; • Information-theoretic optimization ( §2) is capable of mitigating these errors and improving the predictive skill even for the simple Gaussian closure models.

Forced response of imperfect models with optimal noise
Here, we consider the sensitivity of optimized linear non-Gaussian (58) and nonlinear Gaussian (60) imperfect models to perturbations δF (t) of the external forcing Consequently, we consider here long lead times so that essentially all knowledge of the initial conditions is lost and only the perturbations to the attractor of the perfect system are important. The imperfect models have the optimal additive stochastic forcing for attractort/climate fidelity according to (12). As before, the sensitivity (see Definition 2.3) of the perfect system and the imperfect models to such perturbations is quantified via the relative entropies, P(π δF , π att ), P(π m * δF , π m * att ). We illustrate the sensitivity of the optimized imperfect models in figure 17 in two intermittent regimes of the perfect cubic system (56); the left panel shows the sensitivity of the two imperfect models in the regime (ii) with symmetric fat-tailed PDFs while the right panel shows the results an analogous experiment in the regime (iii) with highly skewed PDF; for the sake of brevity we do not show the results in the nearly Gaussian regime (i), where the sensitivity is good for both models, and in the bimodal regime (v) where both models have poor sensitivity.
The following observations are worth pointing out in this context (see also figure  17): • The optimized GCm with attractor/climate fidelity has good sensitivity in the nearly Gaussian and the fat algebraic tail regimes while the optimized linear non-Gaussian model (58) has an acceptable skill only in the nearly Gaussian regime. • Despite the better attractor/climate fidelity in the highly skewed regimes (iii), (iv), the linear non-Gaussian model has a much worse forced response than GCm. • In regimes where the sensitivity of both optimized imperfect models is poor, including the configuration (iv) with regime switching and the unimodal PDF (see figure 12), GCm still performs better than the linear non-Gaussian model (58). • Unsurprisingly, the sensitivity of both optimized imperfect models is poor in the bimodal regime of the perfect model (not shown). In the nearly Gaussian regime (v) the sensitivity of both imperfect optimized models is good, as expected.

5.
Fundamental limitations of polynomial chaos expansions for UQ in systems with intermittency. The theory of homogeneous chaos of Wiener [96], proposed initially for use in the statistical theory of turbulence [97] and widely discarded some thirty years later (e.g. [103,15,16]), has recently regained its popularity as a method for quantifying propagation of uncertainty in nonlinear dynamical systems. Practical implementations of the Wiener Chaos framework for UQ, commonly referred to as Polynomial Chaos Expansions (PCE), rely on a truncated spectral expansion of the system variables via a (truncated) Galerkin-type projection onto a space spanned by a fixed orthonormal polynomial basis in random variables (see, e.g., [36,122,125,124,123,63,43] for but a few examples).
The PCE framework has many attractive features which are potentially well suited for numerical computations including the desirable separation of the random and deterministic components of the dynamics and 'realizability' of the resulting approximations as probability densities. Here, following [10] we utilize a simple unambiguous test model, related to the system (38) with intermittency discussed already in §4.1, to show that methods exploiting truncated spectral expansions have fundamental limitations for UQ in systems with intermittency or parametric uncertainty in the damping and fat-tailed probability densities. These shortcoming are important even in the absence of explicit time dependence, or nontrivial attractors which is a well known problem for PCE. The model used to illustrate these limitations is given by a scalar with uncertain dampingu whereγ is the mean damping and γ(t) represents the damping fluctuations which are given by either a time-independent random variable in the parametric uncertainty case, or by a Gaussian Ornstein-Uhlenbeck process in the intermittent case, similarly to (38). Intermittency and fat-tailed probability densities are abundant in the inertial and dissipation range of stochastic turbulence models (e.g., [19]) and we show that in such important dynamical regimes PCE performs, at best, similarly to the simple Gaussian moment closure technique utilized earlier in a different context for UQ within a framework of Empirical Information Theory [11]. The limitations of such truncated spectral expansions arise from: • The finite truncation of the spectral expansions of either the solution process; • Non-uniform convergence in time of PCE which leads to a rapidly increasing number of terms that are necessary to get a good approximate representation of the random process as time evolves; • Constant flux of randomness due to white noise forcing and fundamental problems with capturing this flux via finite truncations of the spectral representation of the associated Wiener process. • Slow decay of the PCE coefficients in the presence of intermittency which hampers implementation of sparse truncation methods widely used in elliptic problems or in low Reynolds number flows.
As discussed in detail in [10], the above issues lead to a number of fundamental limitations of the PCE approximations in systems with fat-tailed PDFs and intermittency; we illustrate these in §5.2 after a brief description of the PCE framework.

Truncated polynomial chaos expansions. Polynomial Chaos Expansion
(PCE) of a solution u of an SDE (e.g., [102]) or SPDE (e.g., [108]) separates the deterministic effects from the randomness arising due to the nonlinear interactions with unresolved processes. This is achieved by expanding the system variables via a Galerkin-type projection of the model variables onto the space spanned by the fixed orthonormal polynomial basis in random variables. In the case of the socalled Wiener-Hermite expansion, the orthonormal polynomials are given by infinite products of normalized Hermite polynomials where the multi-indices α≡{α 1 , α 2 , . . . } with integer entries satisfy ∞ i=1 α i < ∞; the polynomials (63) are orthonormal with respect to the Gaussian measure generated by iid normal random variables ξ ξ ξ = (ξ 1 , ξ 2 , . . . ), ξ i ∼ N (0, 1), ξ i ξ j = δ ij , which are used to represent the stochasticity of the solution process.
According to the Cameron-Martin theorem [13], any process with |u(t, ξ ξ ξ)| 2 < ∞ has the following expansion in terms of the orthonormal Wick polynomials T α (see Appendix C for more details), While the convergence of the infinite expansion (64) is guaranteed by the Cameron-Martin theorem, it is however, not uniform in time and the expansion coefficients are related by an infinite-dimensional set of ODEs in an infinite number of unknowns. Thus, the doubly infinite expansion (64) is useless in applications unless a suitable truncation is employed. The need for finite truncations of these expansions required in applications leads to important limitations for UQ in systems with intermittency, as discussed in the following section. The simplest such truncation is obtained by retaining only the first K random Gaussian variables ξ i and the Wick polynomials up to order N , leading to the truncated expression so that the resulting approximation has altogether N n=0 K + n − 1 n terms; clearly the number of coefficients in the truncated expansion grows rapidly with increasing K and/or N . In order to reduce the number of expansion coefficients and the computational overhead, Luo [69,70] and Hou et al. [49] proposed a sparse truncation method for reducing the number of PCE coefficients at a given truncation order. Similar sparse truncation methods were proposed for elliptic problems in [114,25]. We will show in the next section (see also [10]) that such truncation methods are not adequate for turbulent systems with energy transfer on the attractor and/or intermittency due to a slow decay of amplitudes of the expansion coefficients.

Intermittency, fat-tailed PDFs and shortcomings of truncated PCE.
We illustrate the limitations of the truncated PCE approximations in systems with fat-tailed PDFs and intermittency based on two instructive examples. The first example corresponds to stochastic dynamics of a scalar variable with the parametric uncertainty in the damping due to time-independent Gaussian fluctuations. This dynamics is described by an ODE with random initial values of the damping and  can be easily shown to have fat-tailed PDFs (see figure 18 and [10]). In the second example the intermittent dynamics of u(t) is driven by a purely deterministic forcing and stochastic damping fluctuations given by an OU process ( figure 19). In both cases the truncated PCE approximations fail to capture the intermittency which results in grossly underestimated approximations of low-order statistics (see [10] for more details).

PCE approximation of dynamics with uncertain damping PCE approximation of dynamics with uncertain damping PCE approximation of dynamics with uncertain damping and fat-tailed PDFs and fat-tailed PDFs and fat-tailed PDFs
This simple configuration is frequently used for testing PCE techniques for systems with simple parametric uncertainty in the damping fluctuations γ of (62) which leads to fat-tailed PDFs (see figure 18); this simple system is given bẏ Contrary to the second case discussed below, the statistically exactly solvable dynamics in this configuration is non-mixing and the autocorrelation of the damping fluctuations is constant and equal to σ 2 γ . Moreover, it can be easily shown that the mean and variance grow unboundedly in time, despite a possible initial metastable phase (see [10] for details).
The Cameron-Martin theorem [13] rigorously guarantees that the solutions of (66) admit the PCE expansion (64) on any finite time interval and the truncated equations for the PCE coefficients of the system (62) arė where α 1 = (1, 0, 0, . . . ) and the PCE coefficients for the time-independent damping fluctuations are given by γ α 1 = σ γ and γ α = 0 for α = α 1 . In deriving (67)we utilized the orthonormality of the Wick polynomials T α together with the important identity which allows for expressing the nonlinear terms in (62) via infinite expansions in the Wick polynomials (see [95,69,49,70] and Theorem 2 of Appendix C) Here, we additionally require that which leads to the following initial conditions in (67): where α 0 = (0, 0, . . . ).
Examples of truncated PCE approximations for this system are shown in figure  18 which we use to illustrate the following shortcomings of PCE in this simple configuration with parametric uncertainty: • Despite a good performance of truncated PCE approximations for sufficiently short times, even very high-order truncations fail to reproduce the inevitable unbounded growth of the statistics at later times; see figure 18. Thus, the good accuracy of the PCE approximations at short times can be misleading. • The unbounded growth of the statistics remains essentially undetected below certain truncation order which is parameter dependent but often high (in figure 18 the threshold is approx. N = 9); this shortcoming is also associated with the Gaussian Closure or Monte Carlo simulations.

PCE approximations in systems with intermittent transient PCE approximations in systems with intermittent transient PCE approximations in systems with intermittent transient instabilities instabilities instabilities
Here, the intermittency in the dynamics of u is induced by the stochastic damping fluctuations γ given by an Ornstein-Uhlenbeck process; the corresponding system is given by whereγ, d γ re the mean damping parameters, f is deterministic forcing, and σ γẆγ (t) represents stochastic white noise forcing defined for all time with amplitude σ γ .
Based on the Cameron-Martin theorem [13] the solutions of (66) admit the PCE expansion (64) with the truncated equations for the PCE coefficients of the system where for fixed i, I αj =δij equals 1 if α j = δ ij and equals zero otherwise; the orthonormal basis functions m i (t) are used in the truncated spectral representation of the Wiener process W γ (t) (see [49,69,10] and the Appendix C). In deriving (72) we utilized the orthonormality of the Wick polynomials T α and the identity (68). The white-noise forcing in the above truncated deterministic system of ODEs for the coefficients u α , γ α is represented by the K basis functions m i (t). Note that truncating (68) results in neglecting, at least partially, the effects of small scale fluxes on the 'resolved' dynamics u(t) which is particularly important in the presence of intermittency and non-vanishing energy transfer between Fourier modes of u at equilibrium.
Here, we assume that and we also require that the white noiseẆ γ is uncorrelated with the initial conditions, i.e., u 0Ẇ (t 0 ) = 0, γ 0Ẇ (t 0 ) = 0. (74) The constraints (73)-(74) lead to the following initial conditions and the truncated representation of the Wiener process Limitations of the truncated PCE method for solving white noise driven dynamics were noted in [49] where reliable integration of the uncertain dynamics required an unreasonably large dimension of the random vector ξ ξ ξ, except for sufficiently short times. The examples of PCE approximations of the second-order statistics of the system (71) show that the performance of this method is diminished further for UQ in systems with intermittency driven by unresolved, white noise processes. The following remarks are in place based on the examples shown in figure 19: • Finite truncations of the PCE approximations of the dynamics driven by white noise require a compromise between the order of truncation N and the number K of random variables used in the spectral approximation of the Wiener process. • Truncated PCE approximation performs well in nearly Gaussian regimes with strong mean damping provided that the white noise driving the evolution of damping fluctuations γ is sufficiently well resolved; however, in such cases the Gaussian closure performs similarly well and is more efficient. • PCE performs poorly in intermittent regimes, except for short times ( figure  19 with weak mean dampingγ), when both the high order of approximation and a good spectral resolution of the the white noise forcing are required (see also [49]). In the intermittent example shown in figure 19 long integration is needed when the statistics of u is of interest and the number of expansion coefficients necessary for accurate approximation grows to unacceptable levels. • Sparse truncation methods in the intermittent regimes driven by white noise are hampered by the slow decay of expansion coefficients. • For low order approximation (N ∼ 1) with sufficiently large number of random variables K ∼ 50 only short-time PCE approximation acceptable • For higher order truncations N ∼ 4 longer time approximation improves but there are insufficient number of random variables, realistically K ∼ 6, to obtain a decent approximation at either short or long times 6. Turbulent diffusion: New exactly solvable test models for UQ in complex high-dimensional systems. One of the important paradigm models for the behavior of turbulent systems [7,71] and the associated uncertainty quantification involves a passive tracer T (x x x, t) which is advected by a velocity field v v v(x x x, t) with dynamics given by where κ > 0 is molecular diffusion and the velocity field v v v is incompressible, ∇·v v v = 0.
When v v v(x x x, t) is a turbulent velocity field, the statistical properties of solutions of (79) such as their large scale effective diffusivity, energy spectrum, and probability density function (PDF) are all important in applications. These range from, for example, the spread of pollutants or hazardous plumes in environmental science to the behavior of anthropogenic and natural tracers in climate change science [26,27,99], to detailed mixing properties in engineering problems such as nonpremixed turbulent combustion [107,105,8]; some more specific important topics include: • Transitions between Gaussian and fat tailed highly intermittent PDFs for the tracer in laboratory experiments as the Peclet number varies with a mean background gradient for the tracer [40,50,51], • Fat tail PDFs for anthropogenic and natural tracers with highly intermittent exponential tails in observations of the present climate [99], • The nature of the sustained turbulent spectrum for scalar variance with a background gradient for the tracer [116], • Eddy diffusivity approximations for tracers in engineering and climate change science [26,27,79].
Recent advances addressing the above issues [80,92] and complementing the earlier work mentioned above utilized elementary stochastic models for turbulent diffusion of tracers with a background mean gradient in (79) and which led to new physically important intermittent regime with complex statistical features mimicking crucial aspects of laboratory experiments and atmospheric observations. These statistical features include the transition to intermittent scalar probability density functions with fat exponential tails as certain variances of the advecting mean velocity are increased while satisfying important physical constraints, and exact formulas for tracer eddy diffusivity which is nonlocal in space and time, as well as exact formulas and simple numerics for the tracer variance spectrum in a statistical steady state [92]. The recent use of such simple models with complex statistics as unambiguous test models for central contemporary issues in both climate change science and the real time filtering of turbulent tracers from sparse noisy observations and other important applications [35].
In this section we utilize a suite of Gaussian and non-Gaussian turbulent tracer models to discuss and illustrate a number of issues encountered in many other turbulent high-dimensional spatially extended systems; similarly to the previous sections, the emphasis is on the role of intermittency with nontrivial mean-turbulent flux interactions in UQ applications, and methods for mitigating imperfect model deficiencies through the use of the information-theoretic framework discussed in §2. The approach we assume for the imperfect models is motivated by the contemporary AOS numerical models which have too much dissipation in the model velocity field and employ various parameterizations of the unresolved processes, including eddy diffusivity parameterization. Thus, the imperfect models, which are discussed in more detail below, have the following general structure where v v v m is the coarse-grained model velocity, κ m is the model eddy diffusivity for the tracer, and σ(x, t)Ẇ (t), t ∈ IR, is the judiciously introduced stochastic forcing which can be spatially correlated; we will show based on detailed examples how the stochastic forcing can improve the climate fidelity and prediction skill for various coarse-grainings in the imperfect models with too much dissipation in this spatially extended case.
Important new issues for UQ discussed here (in addition to those discussed in §3 and §4 for the single Fourier mode) are: • The interplay between coarse-graining and uncertainty in imperfect models, • Consequences of overdamped/underdamped turbulent velocity spectra on the predictive skill of spatially extended models, • Mean-wave interactions, eddy diffusivity parameterization, and mitigation of model error via stochastic forcing, • Information-theoretic optimization and information barriers in spatially extended systems.
6.1. New exactly solvable test models for turbulent diffusion with hidden instabilities and intermittency. Here, we utilize instructive models for turbulent diffusion of a scalar tracer with nontrivial eddy-diffusivity, variance spectrum, and intermittent non-Gaussian statistics like tracers in the atmosphere [26] to provide a highly nontrivial demonstration of improving the fidelity of imperfect models through stochastic forcing. The two classes of statistically exactly solvable models introduced below build on the models discussed earlier for the single Fourier mode in §4; the resulting spatially extended models generalize all the models discussed earlier in [79,35,80,81,92,32] by incorporating additional intermittency due to (i) transient instabilities in the dynamics of the shear flow in the turbulent velocity field driving the tracer dynamics and/or (ii) intermittent fluctuations in the concentration of a reactive scalar with large scale instabilities. Next, we present a geophysical interpretation of these models although versions with engineering applications can be easily set-up [35]. Both of these systems have a zonal (east-west) turbulent jet, U (t), a family of planetary and synoptic scale waves with north-south velocity v(x, t) with x, a spatially periodic variable representing a fixed mid-latitude circle in the east-west direction, and tracer fluctuations T (x, t) with a north-south mean gradient α; the governing equations for these two models can be written compactly as where f (t), f v (x, t) are known deterministic functions, b v (x, t) represents a stochastic finitely correlated uncertainty in the forcing of the one-dimensional shear v, whileẆ U ,Ẇ v represent two independent white noise fluctuations defined for all time. The zonal jet U (t) =Ū (t) + U (t), whereŪ (t) > 0 is the climatological periodic mean which is strongly eastward while the random fluctuations, U (t), have a standard deviation consistent with such eastward dynamical behavior. The above system is solved by Fourier series with independent scalar complex variable versions of (80) for each different wave number k; in Fourier space the pseudo-differential operators P and Q have the , ω T k known functions of time which can be given by stochastic processes, as discussed below.
Non-Gaussian models with intermittent instabilities in the velocity Non-Gaussian models with intermittent instabilities in the velocity Non-Gaussian models with intermittent instabilities in the velocity field field field This non-Gaussian statistically exactly solvable system for the turbulent tracer with the mean gradient α is driven by a correlated turbulent velocity field with intermittency due to transient instabilities in the turbulent shear v(x, t). This model is obtained by adoptinĝ for the symbols of P and Q in (80) with each γ v k (t) given by a Gaussian stochastic process and ω v k , ω T k prescribed functions of wavenumber k. For the passive tracer mode T k in correlated velocity field with turbulent Rossby waves in the shear, we have with the values of the real coefficients a k , b k depending on the physical situation considered, as in [92]. In particular, for uncorrelated baroclinic Rossby waves a k ≡ 0 and b k ∝ k/(k 2 + F s ) where and F s is the stratification; this configuration is considered in one of the examples below.
The equations for the k-th Fourier mode of the tracer fluctuations T k and the shear flow v k with a spatially uniform zonal flow U are where γ v k and b v k are, respectively, the damping fluctuations and the forcing fluctuations in the k-th mode of the turbulent shear flow v k . This statistically exactly solvable system (see Appendix D for the analytical solutions) reduces to the model studied in [79,35,80,81,92,32] for γ v k ≡ 0, b v k ≡ 0, i.e., in the absence of stochastic fluctuations in both the damping and forcing in the dynamics of the Fourier modes of the v k . There exist two important and distinct sources of tracer intermittency in this model, both of them occur for a strongly eastward mean jetŪ . First, similarly to the simpler model with no transient instabilities in the velocity, i.e., γ v k ≡ 0, the intermittent tracer behavior occurs for a turbulent mean jet U (t) =Ū (t), and finitely correlated in time turbulent shear waves v in the presence of the large scale tracer gradient α (see [80] for details). The second source of intermittency arises due to the transient instabilities in the shear waves v for nontrivial stochastic solutions in (82) and a sufficiently weak mean damping d v k (see [9,11]). Combination of these two sources of intermittency with exact statistical solvability makes the system (82) a very attractive test model for UQ and data assimilation with judicious model error in realistic situations with the hidden instabilities due to γ v k mimicking, for example, geophysical boundary layers.
Non-Gaussian models for reactive tracer with intermittent Non-Gaussian models for reactive tracer with intermittent Non-Gaussian models for reactive tracer with intermittent instabilities in the velocity field and in the tracer instabilities in the velocity field and in the tracer instabilities in the velocity field and in the tracer This non-Gaussian, statistically exactly solvable model for a reactive turbulent tracer with intermittent large scale tracer instabilities and a correlated velocity field with intermittent instabilities in the turbulent shear flow is obtained by settinĝ for the symbols of P and Q in (80) with the dispersion relations ω v k , ω T k given by the general formulas (81). The Fourier modes of the turbulent tracer and velocity in this model satisfy where γ v k and b v k are, respectively, the damping fluctuations and the forcing fluctuations in the k-th mode of the turbulent shear flow v k , and γ(t) represents the spatially uniform damping fluctuations in the k-th mode of the tracer fluctuations T k . In addition to the two physically important sources of intermittency outlined in the previous model, the system (83) contains a third independent source of tracer intermittency due to the large scale reactive events in the tracer occurring at the rate r in (83a). The statistical exact solvability of this model (see Appendix D) makes it potentially very useful in realistic atmospheric applications with externally driven intermittent large scale tracer depletion and production.
Below we provide a highly nontrivial demonstration of improving the fidelity of imperfect models through information-theoretic optimization by inflating the stochastic forcing in the imperfect models.
6.2. Optimization of coarse-grained models for improved prediction skill and forced response. For spatially extended systems, the issue of coarse-graining becomes important in the context of model climate fidelity and sensitivity. Here, we discuss how coarse-graining is reflected in the quantification of model error using information metrics discussed in §2. This issue is compounded further by the Figure 20. Information-theoretic optimization of turbulent Gaussian Rossby waves. Rows 1-4: Relative entropy of the imperfect model with too much damping before optimizing noise (solid line), P(π, π m ), and after optimizing noise (dashed line), P(π, π m * ), as a function of coarse-graining. Left side corresponds to coarse-graining up to 15 modes, right side shows coarse-grainings up to 100 modes. Note different scales of y-axis on the left and right sides. Row 5: Additional noise ∆σ as a function of coarse-graining obtained by minimizing the lack of information.
presence of overdamped velocity spectra and eddy diffusivity parameterization in the imperfect models discussed below.
The first example [32] deals only with the turbulent shear flow which is assumed linear and Gaussian and we show that the uncertainty in the imperfect models grows with the number of Fourier modes considered (i.e., the coarse-graining); the information-theoretic optimization (see §2) by stochastic noise inflation greatly reduces the uncertainty for various coarse-grainings. The second example discusses optimization the Gaussian tracer model [32,81] which can be obtained from the more general model (82) as briefly explained below. In this more complex case, involving the effects of coarse-graining, eddy diffusivity and overdamped velocity spectra, a significant gain can be achieved using the stochastic-statistical optimization advocated in §2.
Optimization of coarse-grained Gaussian models for turbulent Optimization of coarse-grained Gaussian models for turbulent Optimization of coarse-grained Gaussian models for turbulent velocity velocity velocity Here, as a simple illustration, we consider a linear Gaussian model for the turbulent shear flow given for each mode v k by (82c) with γ v k = β v k ≡ 0, constant damping and noise amplitude d v k , σ v k and constant forcing f v k . The perfect model has K = 100 modes and the energy spectrum with E = 1, K 0 = 10 and θ = 3 which is appropriate for large scale turbulence in the atmosphere; the forcing is chosen to be roughly twice the standard deviation of Most models for turbulence in spatially extended systems have too much dissipation (Palmer 2001) due to inadequate resolution and deterministic parameterization of unresolved features involving wave-mean flow interaction and stochastic backscatter. We mimic this overdamping in the imperfect models for the turbulent velocity by imposing E k > E m k through E m = 0.8 and θ m = 3.5 in (84). The information-theoretic optimization discussed in §2 of the overdamped model for the turbulent shear can be carried out by inflating the stochastic forcing in order to maximize the climate fidelity (8); the results before and after such an optimization procedure are shown in figure 20 where, for simplicity, the noise amplitude is the same for all modes (see [32] for details). The first four rows illustrate the uncertainty content in the signal and dispersion components of the total uncertainty for different coarse-grainings, and the bottom row shows the difference between the optimal and true noise amplitudes for different coarse-grainings.
We note the following facts based on the example shown in figure 20: • The uncertainty in unoptimized models grows with increasing resolution.
• The model improvement is significant in both the signal and dispersion components of the total uncertainty even when the noise amplitude σ k = σ is assumed the same for all modes. • The difference between the true and optimal noise amplitudes increases with the number of modes in the coarse graining.
Optimization of coarse-grained Gaussian models for turbulent Optimization of coarse-grained Gaussian models for turbulent Optimization of coarse-grained Gaussian models for turbulent Gaussian or non-Gaussian tracer Gaussian or non-Gaussian tracer Gaussian or non-Gaussian tracer This example is based on the model for the turbulent tracer discussed in [81,32] which can be obtained from (82) by disregarding the transient instabilities in the uncorrelated velocity field (U (t), v(x, t)), i.e., we set γ v k = b v k ≡ 0 in (82); moreover we set ω v k = βk/(k 2 + F s ) corresponding to the dispersion relation of baroclinic Rossby waves where β is the north-south gradient of rotation, and F s is the stratification. The perfect system is non-Gaussian for σ U = 0 in (82a) and it is Gaussian otherwise.
Here, the imperfect models are Gaussian with the same dynamics for the zonal jet and Rossby waves and coarse-grained components (U m (t), v m (x, t)) and the tracer equation is given by where κ m = κ + κ eddy m with κ eddy m the eddy diffusivity coefficient, often utilized for parameterization of unresolved turbulence in climate science [26,27] whileẆ (x, t) denotes space-time white noise forcing with variance σ T . Contrary to the standard deterministic parameterizations in climate science with σ T = 0, we consider here an extended class of stochastic models in (85) with σ T = 0 to be improved by stochastic forcing inflation [104]. We also assume that the eddy diffusivity is given by κ eddy m = θκ * m with 0 θ 1 where κ * m = σ 2 /2d γ is the exact value of the eddy diffusivity in the white noise limit of the jet fluctuations fluctuations U (t) the perfect system [80,81,92]. Here, we focus on model error for time-periodic statistical steady states for the tracer which are coarse-grained to a given number of spatial Fourier modes; the larger the number of these modes, the larger the demands on the imperfect model to represent smaller scale behavior accurately. Thus, the natural information metric we utilize here for model fidelity consists of (12) utilized spatially but averaged over the period on the attractor similarly to (14).
The most important aspects of UQ in spatially extended systems, in addition to those discussed for the single Fourier mode in the previous sections, are illustrated in figures 21 and 22 (see [80,81,92] for a more detailed treatment of these issues). Figure 21 shows an example of improvements in the forced response of Gaussian imperfect models (85) of the Gaussian tracer by optimizing the unperturbed climate fidelity (14) via inflation of the stochastic forcing; see [32] for details. Here, both the perfect system and the imperfect models are Gaussian with no transient instabilities in the shear dynamics and no fluctuations in the zonal jet, i.e., γ v k ≡ 0 and σ U ≡ 0. Consequently, the imperfect Gaussian models for the tracer are given by (85) with κ eddy m ≡ 0 due to the absence of the mean-wave interaction in the Gaussian perfect model for the tracer. The external perturbations are imposed by increasing the strength of the mean jetŪ through a perturbation δf U (t) of the forcing which is of the ramp type and increases linearly over a finite time interval to reach a new constant value at the end of the process. However, similar to the previously discussed example for the turbulent Gaussian shear flow, the imperfect Gaussian tracer models are driven by overdamped Gaussian velocity fields (here with E = 0.7, θ = 4, K 0 = 10, in (84) and the model diffusivity κ m = 0.6 instead of the true κ = 0.001.) The measure of the predictive skill, P(π δ , π m δ ), is shown in figure 21 next to the corresponding values of uncertainty for the original imperfect model. For the coarse-grainings of K = 1, 3, and 5 modes, the improvement is roughly four fold, while for higher coarse-grainings of up to K = 20 modes, a nearly 20-fold improvement is achieved by the simple optimal stochastic noise inflation. Figure 22 illustrates the case of non-Gaussian tracer dynamics with σ U = 0 in (82); we show several facets of the systematic information-theoretic improvement over the deterministic models with σ T ≡ 0 in (85) obtained using the imperfect Gaussian tracer models with optimal stochastic forcing σ * T = 0 according to (14). First, the optimal value σ * T increases as the deterministic eddy diffusivity increases from zero to κ * m in the white noise limit of the jet fluctuations; secondly, there is a significant gain of information by utilizing the optimal stochastic imperfect models compared with the deterministic ones and this information gain is necessarily in the dispersion of the tracer. Finally, the right panel in figure 22 illustrates existence of information barriers for optimizing the class of models (85) by noise inflation; optimal value with the smallest information discrepancy occurs for κ eddy • The predictive skill of the overdamped imperfect models with optimal stochastic noise is greatly improved for various coarse grainings even for simple noise inflation with the same amplitude across all modes and in the time-averaged framework (14). • The uncertainty in unoptimized imperfect models grows with increasing resolution (see also figure 20). • Optimal noise σ * T depends on the coarse-graining, and the time interval used for averaging; σ * T increases as the deterministic eddy diffusivity increases from zero to the white noise limit κ * m . • In systems with nontrivial mean-wave interactions, the optimal noise σ * T depends on the number of components used for minimizing the relative entropy in (8) or (14), e.g., the tracer or the tracer and the velocity (see [32] for details).
• There exist information barriers to improving imperfect models of spatially extended systems ( figure 22). In this complex setting the nature of such barriers depends on the deterministic parameterizations and it is scale dependent, often allowing for uncertainty reduction at the coarse scales while hampering improvements at smaller scales.
7. Numerical artifacts in UQ with long time integration. Incorrect or inappropriate numerical algorithms can introduce significant numerical errors into UQ algorithms and bias both the effects of model error and techniques for mitigating this error, such as those discussed in the previous sections. The main difficulty in dealing with numerical artifacts in UQ applications is that even in low-dimensional configurations they often lead to plausible results without unphysical singularities, instabilities, etc., and their consequences can be much more elusive than in the purely deterministic framework. Nevertheless, detection and proper handling of such numerical artifacts is crucial for reliable UQ procedures and imperfect model optimization. Here, we illustrate these issues based on the models already discussed in the previous sections and we discuss a number of examples of numerical artifacts which are particularly misleading for long time integration of complex nonlinear systems with unresolved degrees of freedom. The examples used below are: • Fake intermittency in numerical PDFs; • Divergence of PCE approximations for long time integrations; • Failure of MC with large sample due to rare events.
The time scales for the 'long time integration' vary depending on the problem; however, in all cases 'long time' means comparable with or exceeding the time scales associated either with a transient adjustment onto the attractor or time scales associated with some transient dynamics in the absence of an attractor.
Fake intermittency in numerical PDFs for stochastic systems Fake intermittency in numerical PDFs for stochastic systems Fake intermittency in numerical PDFs for stochastic systems: In §6 we showed, following [92], how the presence of the mean gradient in the dynamics of a passive tracer can lead to intermittent PDF with fat tails provided that the velocity field driving the tracer dynamics is turbulent but with a finitely correlated in time Rossby waves shearing the turbulent zonal jet. However, it can be shown (see Appendix D.3) that in the white noise limit of (82) with rapidly decorrelating shear waves v that the tracer dynamics is given by wheref v k is the forcing in the dynamics of v k in the white noise limit, other parameters are defined as in (82). It can be shown analytically (see Appendix D.3) that whenf v k = 0 all tracer modes have Gaussian statistics even in the presence of mean gradient α = 0 and turbulent zonal jet, i.e., for σ U = 0. We use these analytical results to show that numerical estimates of these Gaussian PDFs can lead to fat-tailed PDFs if the integration time step is too large.
In figure 23 we show various fake fat-tailed PDFs obtained from numerical approximations of (86) with too large time-steps and for the same Fourier mode with k = 1; the forward Euler method was used in the computations but it is well known (e.g., [57]) that both the Euler and Milstein methods have the same weak order of convergence, i.e., they have the same order of accuracy of approximating the statistics of the solutions. The first inset in figure 23 shows a superposition of all the erroneous PDFs and the subsequent insets compare the true Gaussian PDF with the numerical approximations obtained with different integration time steps. Note that for time steps too small for the presence of fake fat tails, one can obtain nearly Gaussian PDFs with incorrect variance; such errors might be difficult to spot even if the Gaussian nature of the true PDF is known.  Figure 24 illustrates the appearance of fake fat tails in numerically estimated PDFs for increasing wave number of the tracer (top) and increasing noise amplitude σ U in the jet (bottom). The two insets in the top row show results of computations with two different integration time steps using the forward Euler method and for different wave numbers k. For sufficiently small time step (relative to the system parameters) the numerical approximations remain nearly Gaussian throughout the considered wavenumbers k = 1, . . . , 4; however, this situation changes dramatically for larger time steps when approximating the PDFs for high wave numbers, as can be seen in the inset computed with ∆t = 10 −4 . Similar fat-tailed artifacts occur when increasing the noise amplitude, σ U , in U (t) as shown in the npottom two insets of figure 24.
The following points deserve highlighting based on the above examples: • Insufficiently small time step (relative to the system parameters) in the numerical integration scheme, be it forward Euler or Milstein, leads to erroneous fat-tailed PDFs ( figure 23). The variance is first affected by the errors leading to nearly Gaussian PDFs with incorrect variance.
• The time-step value yielding accurate approximations for the statistics of turbulent modes at small wave numbers may be insufficiently small for larger wave numbers in the spectrum ( figure 24). Note that the above artifacts might be difficult to detect in high-dimensional nonlinear systems since they are not associated with any spectacular divergences, singularities, etc. There is a great need for mathematical theory and the systematic assessment of numerical algorithms which capture the long-time statistical dynamics of turbulent dynamical systems with high accuracy. Recent work in this important area was carried out in [119,120] for the example of a turbulent dynamical system arising in the large Prandtl number limit of classical Rayleigh-Benard convection; this work serves as a model for further research.

Divergence of PCE approximations for long time integrations
Divergence of PCE approximations for long time integrations Divergence of PCE approximations for long time integrations of stochastic systems of stochastic systems of stochastic systems.
These limitations of the Polynomial Chaos Expansion approximation were already discussed in §5 and at length in [10] and are associated with the finite order truncations in systems with fat-tailed PDFs and intermittency; the most important of these artifacts are: • Failure of truncated PCE approximations to detect unbounded growth of statistics in systems with parametric uncertainty in the damping (see figure 18 and [10]). • Failure of truncated PCE approximations in systems with nontrivial attractors (e.g., [106,64,3]). • Divergence of truncated PCE in systems driven by white noise, except at sufficiently short times ( figure 19 and [10]). Similarly to the issues raised in the previous paragraph, the artifacts arising in the context of the truncated PCE approximations are not associated with spectacular blow-up of the approximations; rather, they lead to underestimated statistical moments which are usually more difficult to detect in complex configurations.

Failure of Monte Carlo simulations with insufficiently large
Failure of Monte Carlo simulations with insufficiently large Failure of Monte Carlo simulations with insufficiently large ensemble size due to rare events ensemble size due to rare events ensemble size due to rare events.
This shortcoming of the MC was already illustrated in figure 18 of §5 where, similarly to the PCE approximations and to the Gaussian moment closure, MC completely failed to detect the unbounded growth in low order statistics despite large sample sizes for this toy problem with 50000 realizations. The problems associated with MC simulations and large scale phenomena triggered by rare events are well known (e.g. [54]) but it is important to stress their role in creating various artifacts in UQ applications.
8. Concluding remarks. Here, we reviewed the recent developments and discussed a range of new important mathematical issues arising in the emerging stochastic-statistical framework to quantifying uncertainty and mitigating model error in imperfect predictions of partially observed complex high-dimensional turbulent dynamical systems. The systematic discussion of issues in imperfect model predictions for uncertain initial data at short and medium-long ranges, as well as the forced response to system perturbations, was accomplished based on a suite of physically relevant and progressively more complex but mathematically tractable test models. These instructive test models possessed such important features as the two-way coupling between the resolved dynamics and the stochasticity due to the interactions with unresolved processes, intermittency and positive Lyapunov exponents, eddy diffusivity parameterization and overdamped turbulent spectra. In particular, the following two themes were emphasized throughout the article: • The utility of the unified stochastic-statistical framework for UQ and improvement of imperfect predictions with model error in turbulent dynamical systems with intermittency, positive Lyapunov exponents and hidden instabilities; • Existence of information barriers to model improvement, their hallmarks and consequences for UQ in turbulent dynamical systems.
Other topics developed here were concerned with the role and mitigation of model error due to coarse-graining and dimensional reduction, moment closure approximations and associated turbulent flux parameterization, the effects of memory of initial conditions in producing medium and long range forecasts and the sensitivity of imperfect models to external perturbations. The mathematical toolkit utilized empirical information theory, fluctuation-dissipation theorems and systematic physics-constrained, statistical-stochastic modelling for large-dimensional turbulent dynamical systems. New mathematical and computational phenomena in addressing these problems have been discussed throughout this article. Many theoretical issues remain in UQ and prediction of turbulent dynamical systems and the authors hope that this review inspires mathematicians, applied mathematicians, and scientists to contribute to this increasingly important contemporary science discipline.
Finally, we mention several important topics for mathematical research directly connected with this expository article and not discussed in detail.
(1) Data assimilation and filtering with judicious model error (1) Data assimilation and filtering with judicious model error (1) Data assimilation and filtering with judicious model error for turbulent systems for turbulent systems for turbulent systems A key feature of turbulence in inertial and energy dissipation regimes is bursts of energy across multiple scales with intermittent instabilities and effectively stochastic forcing due to complicated interactions between the turbulent modes; the resulting dynamics is highly non-Gaussian with associated fat-tailed PDFs. Stochastic Parameterization Extended Kalman Filters (SPEKF) have been introduced and analyzed recently [30,29,94,93] as computationally cheap algorithms which make judicious model errors while retaining high filtering skill for complex turbulent signals [42,55,9,94]. For example, aliasing is usually viewed as a bad feature of numerical algorithms; in the present context, judicious use of aliasing yields stochastic superresolution [94,55,93]. The basis for the SPEKF algorithms is the system (38), used earlier in a different context, for the complex scalar partially observed turbulent mode u(t) (the reader can think of a Fourier amplitude of turbulence at a given spatial wavenumber) coupled with stochastic additive forcing and multiplicative damping/instability coefficients, γ(t), b(t), which are learned on the fly from the observed turbulent signal. The advantage of the system (38) is that is has non-Gaussian dynamics but nevertheless exactly solvable first and second-order statistics, so they are readily implemented practically in a filtering algorithm. The rich dynamical behavior of (38), as already illustrated in figure 7, can be utilized to test the filter performance of a wide variety of Gaussian filter approximations [9]. See the recent book [93] for an elementary introduction to these topics.
(2) Fluctuation-Dissipation Theorems for high-dimensional (2) Fluctuation-Dissipation Theorems for high-dimensional (2) Fluctuation-Dissipation Theorems for high-dimensional turbulent systems turbulent systems turbulent systems There is a great need for further developments exploiting the appropriate fluctuation dissipation theorems for linking statistical equilibrium fidelity and sensitivity of imperfect models of high dimensional turbulent systems within the framework of information theory for model error and sensitivity. Moreover, much more rigorous work, in addition to [41], should be done for time-periodic systems, general multiplicative noise, and rigorous FDT representation formulas. The recent papers [1,79,74,80,81,82,11] and, in particular, [81,88,31] contain much of the formal research program and demonstrate it on exactly solvable test models. Some recent results [81,11] open up enticing prospects for developing techniques for improving imperfect model sensitivity based on specific tests carried out in the unperturbed climate and using the kicked response FDT.
The fundamental concepts justifying the linear response theory for forced dissipative stochastic dynamical systems through the fluctuation dissipation theorem have been substantially refined since the pioneering work of Leith [65] and are well known in the literature (see [41] and the references above). Despite this fact, minor modifications of this general concept appear intermittently in the literature in various guises. One recent example proposed in [14] is the method for predicting the low-frequency variability of El Nino-Southern Oscillation (ENSO) based on a particular noise sampling technique applied to the reduced model obtained by fitting a quadratic model to the ENSO data through a multilevel linear regression. We note that such an approach reduces to the classical FDT framework in the limit of taking infinitely many sample paths of the suggested noise 'snippets', provided that the reduced ad hoc quadratic model does not yield unphysical solutions with a finite-time blow-up (see [89] and (4) below). The authors in [14] seem to be unaware of this connection to earlier work [1,88,82,31] yet they empirically take this limit.
(3) Blended Dynamic Subspace techniques for UQ in turbulent (3) Blended Dynamic Subspace techniques for UQ in turbulent (3) Blended Dynamic Subspace techniques for UQ in turbulent systems systems systems Development of computationally efficient and accurate methods for propagation of uncertainty in turbulent systems is difficult since the complex intermittent interactions between the resolved and unresolved processes are important across a wide range of spatial and temporal scales with nontrivial energy transfer on the attractor. Due to this strong, time-dependent, two-way coupling between the mean dynamics and stochasticity, most attempts aimed at reducing the underlying complexity of the problem by separating the mean evolution from the dominant dimensionality of the uncertain subspace associated with the most energetic uncertain unresolved processes usually end in failure.
Contrary to the truncated Polynomial Chaos Expansions which retain a finite dimensional 'uncertain subspace' spanned by a fixed basis set, the methods exploiting dynamically orthogonal (DO) field equations applied to spatially extended systems with uncertainty attempt to capture the evolution of the most energetic stochastic modes in a dynamically consistent fashion [109,110]. Nevertheless, this improved method suffers from deterioration of UQ skill in time for turbulent dynamical systems and even non-normal linear systems [112]. New recently developed methods by T. Sapsis and one of the authors, which combine the DO framework with an approximate but statistically consistent mechanism for turbulent energy transfer between the reduced uncertain subspace and the under-resolved complementary dynamics offers a very promising way forward. However, this work is reported elsewhere [112,111,113]. Development of data-driven reduced stochastic-statistical models of turbulent dynamical systems for uncertainty quantification and long range forecasting is an extremely important issue. Standard linear regression models can have some skill but they suffer from inherent mathematical limitations and intrinsic barriers in skill [82]. Ad-hoc nonlinear regression models can exhibit improved skill in a training time series (see references in [89]) but can suffer unphysical finite-time blow-up of statistical solutions, as well as unphysical pathology in their invariant measure [89]. There are rigorous proofs [126] that physics-constrained stochastic mode reduction models which are Markovian have the physically correct asymptotic behavior for their invariant measure for low-frequency variability; however, further generalizations are required to include non-Markovian memory effects which are crucial in many applications. There is a recent new approach to designing such methods [83]. There is a wide array of data-driven clustering algorithms [77,46,47,44,23,24,45,48] to develop multiple regime Markov models for use in prediction. Giannakis and one of the authors [37,38,39] apply empirical information theory to assess the skill of coarse-grained partitions of phase space and reduced Markov models for long-range prediction. The methods of Horenko [46,47,44,45,48,39] are especially promising in this context but need further physical constraints to be more useful for long range forecasting. This is an exciting area for future interdisciplinary research. and the eigenvalues ofL are given by Clearly, the conditions (20) imply that the eigenvalues (89) have negative real parts and the system converges to a statistical attractor (see, e.g. [4,87], for a detailed definition of the attractor in the general nonautonomous case).
The mean dynamics for the linear Gaussian system (18) can be easily found by taking the ensemble average of (87) which leads to wherex 0 ,ȳ 0 denote the mean initial conditions at t 0 . The mean on the attractor can be obtained by taking the limit t 0 → −∞ in (90) which exists for any bounded forcing F (t) provided that the stability conditions (20) hold. It can be easily seen with the help of (88) that for time-periodic forcing F (t) = f 0 + f 1 sin ω ω ωt the time averaged mean on the attractor is given by where T = 2π/ω in the time-periodic case (f 1 = 0); for constant forcing taking an arbitrary non-zero T reduces (91) to (21) in §3.
The covariance matrix of the process x x x(t, t 0 ) satisfying (18) is given by so that, using (92) and (88), the the covariance Σ on attractor is . (93) and the autocovariance on the attractor is given by (94) Note that in this linear Gaussian framework the autocovariance on the attractor is independent of the forcing F (t).
where x 0 msm is the initial condition at t = t 0 . Consequently, the statistics of this Gaussian process is fully determined bȳ so that for γ msm > 0 the statistics on the attractor in the general nonautonomous case (see, e.g., [87,4]) is so that the variance on the attractor becomes R att ≡ C msm att (τ = 0) = σ 2 msm /2γ msm .
Note that x M S (t) cannot reproduce the marginal autocovariance (94) for x(t) since the necessary condition cannot be satisfied for all lags τ except for the degenerate case λ 1 = λ 2 . A particular choice of γ msm which guarantees correct decorrelation time for x msm (t) is and leads to has the same equilibrium marginal statistics for x(t). Thus, there is a oneparameter family of models with structure (18) with the same marginal statistics for x(t) and x m (t) provided that the coefficients of the imperfect models satisfy a m = w, A m = a + A − w, q m = q − aA + w(a + A − w), where w is a free parameter; for w = a this system coincides with (18). Remark. For time-periodic forcing F = f 0 + f 1s sin ωt + f 1c cos ωt in the perfect system (18) the four-parameter family of 2×2 models with correct time-averaged two-point marginal statistics on the attractor and coefficients is given by (104) with three additional free parameters f 1s m , f 1c m , ω m .
(II) Constant forcing; identical marginal climatology. Consider the linear model (18) with coefficients (a, q, A, σ) and forcing F = f 0 . According to Prop. 2, there is a three-parameter family of models with structure (18) and coefficients (a m , q m , A m , σ m , f 0 m ), with the same marginal climatology for x(t) and x m (t) provided that the coefficients of the imperfect models satisfy: a m = w 1 , q m = w 2 , A m = w 3 , σ m = σ (w 1 + w 3 )(w 1 w 3 − w 2 ) (a + A)(aA − q) , where {w 1 , w 2 , w 3 } are the free parameters. Remark. For time-periodic forcing F = f 0 + f 1s sin ωt + f 1c cos ωt in the perfect system (18)  MSm's with the same climatology can be described as follows.
(III) Constant forcing; correct marginal climatology for x msm (t). Consider the linear model (18) with coefficients (a, q, A, σ) and constant forcing F = f 0 . According to Proposition (3), there exists a one-parameter family of MSm's (24) with coefficients (γ msm , σ msm , F msm ) and the same marginal climatology for x msm and x; these coefficients satisfy γ msm = w, σ 2 msm = −w σ 2 (a + A)(aA − q) , F msm = −w Af 0 aA − q , where w is a free parameter. The MSm with correct marginal climatology and correct decorrelation time for x msm is the one with w = −(aA − q)/(a + A) consistent with (102).
Remark. For time-periodic forcing F = f 0 + f 1s sin ωt + f 1c cos ωt in the perfect system (18) the four-parameter family of MSm models with correct time-averaged marginal climatology for x msm has coefficients given by by (106) with three additional free parameters f 1s msm , f 1c msm , ω msm .
Appendix B. White noise limit for elementary systems with intermittency. Construction of imperfect models by considering special white noise limits was discsussed in §4.1.1. Here, we consider the limiting dynamics of the quadratic complex scalar system (38) assuming that the stochastic fluctuations of the damping γ(t) and the forcing fluctuations b(t) decorrelate infinitely fast. Similar procedure for the turbulent tracer is discussed in §7 and Appendix D.3 in order to unambiguously illustrate some numerical artifacts in long time numerical integrations of stochastic systems.
Consider the dynamics of the system (38) in the following limit so that the absolute value of the autocovariance of γ(t) and b(t) formally approach the Dirac delta distributions, i.e, lim dγ ,t→∞ σγ /dγ =const | γ(t + τ )γ(t) | =σ γ 2 δ 0 (τ ), lim and the processes γ(t), b(t) in (38) approach the white noise processes dγ(t) →σ γ dW γ (t), db(t) →σ b dW b (t), (110) where W γ (t), W b (t) are two independent Wiener processes defined for all time, as in (38). As usual in physics and engineering, the white noise limit of colored noise leads to the Stratonovich integral (e.g., [28]). This fact is a consequence of requiring non-vanishing correlations u(t)γ(t) , u(t)b(t) before the limit and after the limit. The limiting process leads to the Stratonovitch SDE for the complex scalar u(t), du(t) = −γ+iω(t) u(t)+F (t) dt−σ γ u(t)•dW γ (t)+σ b dW b (t)+σ u dW u (t), (111) where W γ and the components of W b and W u are all independent Wiener processes. We rewrite (111) in the Ito form as du(t) = −α+iω(t) u(t)+F (t) dt−σ γ u(t)dW γ (t)+σ b dW b (t)+σ u dW u (t), (112) where α =γ − 1 2σ 2 γ is the damping term in the white noise limit. (s)ds dW u (t), (113) followed by a subseqeuent transformation to the polar coordinates using where φ is the polar angle and µ ≡ ln . Standard manipulations using (113), (114) and the Ito calculus lead to the following system with the same statistics as (113) and (112) a) d = −α +σ whereσ u ≡ σ u / √ 2 the new noise terms are defined via the orthogonal transformations dW u,b (t) dW φ u,b (t) = cos(φ + Ω) sin(φ + Ω) − sin(φ + Ω) cos(φ + Ω) where Equation (118b) implies that p eq = p eq ( ) which implies that (118a) is solved by Finally, the marginal probability distributions for u 1 ≡ e[u] is given by For an arbitrary δ > 0 the integral in (120) can be expressed through the Hypergeometric function whose properties can be used to simplify the above expression p eq (u 1 ) = N 0 (σ 2 γ u 2 1 +σ 2 u ) κ , However, a simpler although incomplete justification of the above fact can be obtained by evaluating the integral in (120) atγ = mσ 2 γ , m a positive integer, so that δ = 1 + m and standard solutions apply.
Appendix C. Truncated polynomial chaos expansions. Here, we augment the discussion of the truncated Polynomial Chaos Expansions (PCE) in §64 with a few additional technical details; all of the information below is well known in the PCE literature.
The following result, often referred to as the Cameron-Martin theorem, forms the foundation of Wiener Chaos theory Theorem 1 (Cameron-Martin [13]) Assume that for fixed x and s t, u(x, s) is a functional of the Wiener process W on the interval [0, s] with |u(x, s)| 2 < ∞, then u(x, s) has the following Fourier-Hermite expansion: u(x, s) = α∈J u α (x, s)T α , u α (x, s) = u(x, s)T α , where T α are the Wick polynomials defined via the Hermite polynomials H αi , α i ∈ N by The multiindex α in (123) with a finite number of nonzero components is defined as so that the product in the right hand side of (123) has a finite number of factors and it is well defined. Wick polynomials are defined over a in infinite dimensional space with variables ξ ξ ξ = (ξ 1 , ξ 2 , . . . ), where ξ i = 0, ξ i ξ j = δ ij , and they form a complete orthonormal basis in L 2 on the probability space with respect to the Gaussian measure generated by ξ ξ ξ. In particular T α T β = δ αβ , T 0 = 1, T α = 0 when α = 0.
The order of the Wick polynomial T α is defined as |α| = α i . Furthermore, the first two statistical moments of u(x, s) are given by: and u 2 (x, s) = α∈J |u α (x, s)| 2 .
The following theorem due to [95] is very useful for our analysis since it allows for exact treatment of nonlinearities where C(θ, β, p) = θ β β + p p θ − β + p p 1 2 .
The operations on multi-indices are defined as and The doubly infinite expansion (122) is useless in applications unless a suitable truncation is introduced through the truncated index set so that the truncated PCE in (122) becomes The resulting approximation has altogether N n=0 K + n − 1 n terms. Various sparse truncation methods have been proposed in this context [49,114,25]; such methods are problematic in systems with intermittency [10].
C.1. Spectral representation of the white noise. when considering systems driven by white noise a suitable spectral representation of the associated Wiener process is required in the PCE approximations. Here, we follow here the approach of [69,70] and [49].
It is easy to see that the random variables defined by The expansion (138) converges in the mean-square sense Appendix D. Analytical formulas for the statistics of the turbulent tracer models. Two new classes of non-Gaussian statistically exactly solvable test models for the turbulent diffusion of a tracer were introduced and discussed in §6. Below, we present the analytical formulas for the second-order statistics in these two systems. First, the statistics of the turbulent velocity field with transient instabilities in the shear waves is derived; this velocity field drives both models (82) and (83) in §6. Next, the statistics of the tracer driven by the turbulent velocity field is derived. The general formulas for the non-Gaussian reactive tracer statistics are derived first and the statistics for the passive tracer are obtained by neglecting the binary reaction term in the general expressions. D.1. Statistics for the the velocity field in with intermittent transient instability. The dynamics of the two-dimensional turbulent velocity field (U (t), v(x, t)) with intermittent transient instabilities driving the two turbulent tracer models is given by (82b-e) or (83c-f). The second-order statistics is derived systematically below.
Path-wise solutions for the cross-sweep U (t) Path-wise solutions for the cross-sweep U (t) Path-wise solutions for the cross-sweep U (t) The path-wise solutions for the cross sweep U (t) =Ū (t) + U (t) in (82b) and (83c) are given byŪ where we use the short-hand notation for the initial condition U 0 ≡ U (t = t 0 ) and for the Green's function associated with U (t) The initial conditions U 0 in the zonal jet are assumed Gaussian so that U (t) is Gaussian.