Analysis of the 3DVAR Filter for the Partially Observed Lorenz '63 Model

The problem of effectively combining data with a mathematical model constitutes a major challenge in applied mathematics. It is particular challenging for high-dimensional dynamical systems where data is received sequentially in time and the objective is to estimate the system state in an on-line fashion; this situation arises, for example, in weather forecasting. The sequential particle filter is then impractical and ad hoc filters, which employ some form of Gaussian approximation, are widely used. Prototypical of these ad hoc filters is the 3DVAR method. The goal of this paper is to analyze the 3DVAR method, using the Lorenz '63 model to exemplify the key ideas. The situation where the data is partial and noisy is studied, and both discrete time and continuous time data streams are considered. The theory demonstrates how the widely used technique of variance inflation acts to stabilize the filter, and hence leads to asymptotic accuracy.


Introduction
Data assimilation concerns estimation of the state of a dynamical system by combining observed data with the underlying mathematical model. It finds widespread application in the geophysical sciences, including meteorology [15], oceanography [2] and oil reservoir simulation [24]. Both filtering methods, which update the state sequentially, and variational methods, which can use an entire time window of data, are used [1]. However, the dimensions of the systems arising in the applications of interest are enormous -of O(10 9 ) in global weather forecasting, for example. This makes rigorous Bayesian approaches such as the sequential particle filter [6], for the filtering problem, or MCMC methods for the variational problem [29], prohibitively expensive in on-line scenarios.
For this reason various ad hoc methodologies are typically used. In the context of filtering these usually rely on making some form of Gaussian ansatz [32]. The 3DVAR method [18,26] is the simplest Gaussian filter, relying on fixed (with respect to the data time-index increment) forecast and analysis model covariances, related through a Kalman update. A more sophisticated idea is to update the forecast covariance via the linearized dynamics, again computing the analysis covariance via a Kalman update, leading to the extended Kalman filter [13]. In high dimensions computing the full linearized dynamics is not practical. For this reason the ensemble Kalman filter [7,8] is widely used, in which the forecast covariance is estimated from an ensemble of particles, and each particle is updated in Kalman fashion. An active current area of research in filtering concerns the development of methods which retain the computational expediency of approximate Gaussian filters, but which incorporate physically motivated structure into the forecast and analysis steps [22,21], and are non-Gaussian.
Despite the widespread use of these many variants on approximate Gaussian filters, systematic mathematical analysis remains in its infancy. Because the 3DVAR method is prototypical of other more sophisticated ad hoc filters it is natural to develop a thorough understanding of the mathematical properties of this filter. Two recent papers address these issues in the context of the Navier-Stokes equation, for data streams which are discrete in time [5] and continuous in time [4]. These papers study the situation where the observations are partial (only low frequency spatial information is observed) and subject to small noise. Conditions are established under which the filter can recover from an order one initial error and, after enough time has elapsed, estimate the entire system state to within an accuracy level determined by the observational noise scale; this is termed filter accuracy. Key to understanding, and proving, these results on the 3DVAR filter for the Navier-Stokes equation are a pair of papers by Titi and co-workers which study the synchronization of the Navier-Stokes equation with a true signal which is fed into only the low frequency spatial modes of the system, without noise [25,12]; the higher modes then synchronize because of the underlying dynamics. The idea that a finite amount of information effectively governs the large-time behaviour of the Navier-Stokes equation goes back to early studies of the equation as a dynamical system [10] and is known as the determining node or mode property in the modern literature [27]. The papers [5,4] demonstrate that the technique of variance inflation, widely employed by practitioners in high dimensional filtering, can be understood as a method to add greater weight to the data, thereby allowing the synchronization effect to take hold.
The Lorenz '63 model [19,28] provides a useful metaphor for various aspects of the Navier-Stokes equation, being dissipative with a quadratic energy-conserving nonlinearity [9]. In particular, the Lorenz model exhibits a form of synchronization analogous to that mentioned above for the Navier-Stokes equation [12]. This strongly suggests that results proved for 3DVAR applied to the Navier-Stokes equation will have analogies for the Lorenz equations. The purpose of this paper is to substantiate this assertion.
The presentation is organized as follows. In section 2 we describe the Bayesian formulation of the inverse problem of sequential data assimilation; we also present a brief introduction to the relevant properties of the Lorenz '63 model and describe the 3DVAR filtering schemes for both discrete and continuous time data streams. In section 3 we derive Theorem 3.2 concerning the 3DVAR algorithm applied to the Lorenz model with discrete time data. This is analogous to Theorem 3.3 in [5] for the Navier-Stokes equation. However, in contrast to that paper, we study Gaussian (and hence unbounded) observational noise and, as a consequence, our results are proved in mean square rather than almost surely. In section 4 we extend the accuracy result to the continuous time data stream setting: Theorem 4.1; the result is analogous to Theorem 4.3 in [4] which concerns the Navier-Stokes equation. Section 5 contains numerical results which illustrate the theory. We make concluding remarks in section 6.

Set-Up
In subsection 2.1 we formulate the probabilistic inverse problem which arises from attempting to estimate the state of a dynamical system subject to uncertain initial condition, and given partial, noisy observations. Subsection 2.2 introduces the Lorenz '63 model which we employ throughout this paper. In subsections 2.3 and 2.4 we describe the discrete and continuous 3DVAR filters whose properties we study in subsequent sections.

Inverse Problem
Consider a model whose dynamics is governed by the equation with initial condition u(0) = u 0 ∈ R p . We assume the the initial condition is uncertain and only its statistical distribution is known, namely the Gaussian u 0 ∼ N (m 0 , C 0 ). Assuming that the equation has a solution for any u 0 ∈ R p and all positive times, we let Ψ(·, ·) : R p × R + → R p be the solution operator for equation (2.1). Now suppose that we observe the system at equally spaced times t k = kh for all k ∈ Z + . For simplicity we write Ψ(·) := Ψ(·; h). Defining u k = u(t k ) = Ψ(u 0 ; kh) we have We assume that the data {y k } k∈Z + is found from noisily observing a linear operator H applied to the system state, at each time t k , so that y k+1 = Hu k+1 + ν k+1 , k ∈ N.
Here {ν k } k∈N is an i.i.d. sequence of random variables, independent of u 0 , with ν 1 ∼ N (0, Γ) and H denotes a linear operator from R p to R m , with m ≤ p. If the rank of H is less than p the system is said to be partially observed. The partially observed situation is the most commonly arising in applications and we concentrate on it here. The over-determined case m > p corresponds to making more than one observation in certain directions; one approach that can be used in this situation is to average multiple observations to reduce the 2 effective observational error variance by the square root of the number of observations in that direction, and thereby reduce to the case where the rank is less than or equal to p. We denote the accumulated data up to time k by Y k := {y j } k j=1 . The pair (u k , Y k ) is a jointly varying random variable in R p ×R km . The goal of filtering is to determine the distribution of the conditioned random variable u k |Y k , and to update it sequentially as k is incremented. This corresponds to a sequence of inverse problems for the system state, given observed data, and it has been regularized by means of the Bayesian formulation.

Forward Model: Lorenz '63
When analyzing the 3DVAR approach to the filtering problem we will focus our attention on a particular model problem, namely the classical Lorenz '63 system [19]. In this section we introduce the model and summarize the properties relevant to this paper. The Lorenz equations are a system of three coupled nonlinear ordinary differential equations whose solution u ∈ R 3 , where u = (u x , u y , u z ), satisfieṡ Note that we have employed a coordinate system where origin is shifted to the point 0, 0, −(r + α) as discussed in [30]. Throughout this paper we will use the classical parameter values (α, b, r) = (10, 8 3 , 28) in all of our numerical experiments. At these values, the system is chaotic [31] and has one positive and one negative Lyapunov exponent and the third is zero, reflecting time translation-invariance. Our theoretical results, however, simply require that α, b > 1 and r > 0 and we make this assumption, without further comment, throughout the remainder of the paper.
In the following it is helpful to write the Lorenz equation in the following form as given in [9], [12]: We use the notation ·, · and | · | for the standard Euclidean inner-product and norm. When describing our observations it will also be useful to employ the projection matrices P and Q defined by (2.6) We will use the following properties of A and B: We will also use the following:

3DVAR: Discrete Time Data
In this section we describe the 3DVAR filtering scheme for the model (2.1) in the case where the system is observed discretely at equally spaced time points. The system state at time t k = kh is denoted by u k = u(t k ) and the data upto that time is Y k = {y j } k j=1 . Recall that our aim is to find the probability distribution of u k |Y k . Approximate Gaussian filters, of which 3DVAR is a prototype, impose the following approximation: (2.8) Given this assumption the filtering scheme can be written as an update rule To determine this update we make a further Gaussian approximation, namely that u k+1 given Y k follows a Gaussian distribution: Now we can break the update rule into two steps of prediction (m k , C k ) → (m k+1 ,Ĉ k+1 ) and analysis (m k+1 ,Ĉ k+1 ) → (m k+1 , C k+1 ). For the prediction step we assume thatm k+1 = Ψ(m k ) whilst the choice of the covariance matrixĈ k+1 depends upon the choice of particular approximate Gaussian filter under consideration. For the analysis step, (2.10) together with the fact that y k+1 |u k+1 ∼ N (Hu k+1 , Γ) and application of Bayes' rule, implies that As mentioned the choice of update rule C k →Ĉ k+1 defines the particular approximate Gaussian filtering scheme. For the 3DVAR scheme we imposeĈ k+1 = C ∀k ∈ N where C is a positive definite p × p matrix. From equation (2.12b) we then get is called Kalman gain matrix. The iteration (2.13) is analyzed in section 3. Another way of defining the 3DVAR filter is by means of the following variational definition: This coincides with the previous definition because the mean of a Gaussian can be characterized as the minimizer of the negative of the logarithm of the probability density function and because the analysis step corresponds to a Bayesian Gaussian update, given the assumptions underlying the filter; indeed the fact that the negative logarithm is the sum of two squares follows from Bayes' theorem. From the variational formulation, it is clear that the 3DVAR filter is a compromise between fitting the model and the data. The model uncertainty is characterized by a fixed covariance C, and the data uncertainty by a fixed covariance Γ; the ratio of the size of these two covariances will play an important role in what follows.

3DVAR: Continuous Time Data
In this section we describe the limit of high frequency observations h → 0 which, with appropriate scaling of the noise covariance with respect to the observation time h, leads to a stochastic differential equation (SDE) limit for the 3DVAR filter. We refer to this SDE as the continuous time 3DVAR filter. We give a brief derivation, referring to [4] for further details and to [3] for a related analysis of continuous time limits in the context of the ensemble Kalman filter. We assume the following scaling for the observation error covariance matrix: Γ = 1 h Γ 0 . Thus, although the data arrives more and more frequently, as we consider the limit h → 0, it is also becoming more uncertain; this trade-off leads to the SDE limit. Define the sequence of variables {z k } k∈N by the relation z k+1 = z k + hy k+1 and z 0 = 0. Then Here γ k ∼ N (0, I). By rearranging and taking limit as h → 0 we get

Analysis of Discrete Time 3DVAR
In this section we analyse the discrete time 3DVAR algorithm when applied to a partially observed Lorenz '63 model; in particular we assume only that the u x component is observed. We start, in subsection 3.1, with some general discussion of error propagation properties of the filter. In subsection 3.2 we study mean square behaviour of the filter for Gaussian noise. Recall the projection matrices P and Q given by (2.6), we will use these in the following. We will also use {v k } to denote the exact solution sequence from the Lorenz equations which underlies the data; this is to be contrasted with {u k } which denotes the random variable which, when conditioned on the data, is approximated by the 3DVAR filter.

Preliminary Calculations
Throughout we assume that H = (1, 0, 0), so that only u x is observed, and we choose the model covariance C = η −1 2 I. We also assume that Γ = 2 . The Kalman gain matrix is then G = 1 1+η H * and the 3DVAR filter (2.13) may be written The scalar parameter is a design parameter whose choice we will discuss through the analysis of the iteration (3.1). Note that we are working with rather specific choices of model and observational noise covariances C and Γ; we will comment on generalizations in the concluding section 6. We define v to be the true solution of the Lorenz equation (2.5) which underlies the data, and we define v k = v(kh), the solution at observation times. Note that, since Γ = 2 , it is consistent to assume that the observation errors have the form where ξ k are i.i.d. random variables on R. We will consider the case ξ 1 ∼ N (0, 1) for simplicity of exposition. Note that we may write We are interested in comparing m k , the output of the filter, with v k the true signal which underlies the data. We define the error process δ(t) as follows: Observe that δ is discontinuous at times t j which are multiples of h, since m k+1 = Ψ(m k ; h). In the following we write δ(t − j ) for lim t→t − j δ(t) and we define δ j = δ(t j ). Thus δ j = δ(t − j ). Subtracting (3.3) from (3.2) we obtain Now consider the time interval (t k , t k+1 ). Since δ(t) is simply given by the difference of two solutions of the Lorenz equations in this interval, we have and hence 1 2 In order to use (3.4) we wish to estimate the behaviour of δ(t − k+1 ) in terms of δ k . The following is useful in this regard and may be proved by using (3.8) together with Properties 2.1(4). Note that K is defined by equation (2.7) and is necessarily greater than or equal to one, since b, α > 1.

Accuracy Theorem
In this subsection we assume that ξ 1 ∼ N (0, 1) and we study the behaviour of the filter in forward time when the size of the observational noise, O( ), is small. The following result shows that, provided variance inflation is employed (η small enough), the 3DVAR filter can recover from an O(1) initial error and enter an O( ) neighbourhood of the true signal. The results are proved in mean square. The reader will observe that the bound on the error behaves poorly as the observation time h goes to zero, a result of the overweighting of observed data which is fluctuating wildly as h → 0. This effect is removed in section 4 where the observational noise is scaled appropriately, in terms of h → 0, to avoid this effect. For this theorem we define a norm · by u 2 = |u| 2 + |P u| 2 , where | · | is the Euclidean norm.
Consequently lim sup Proof. Recall that we have Eν k+1 = 0 and E|ν k+1 | 2 = 2 . On application of the projection P to the error equation (3.4) for 3DVAR we obtain (3.12) Define M 1 and M 2 by and (3.14) Adding (3.11) to (3.12) and using Lemma 3.3 shows that Now we observe that Thus there exists an h c > 0 and a λ > 0 such that, for all η sufficiently small Hence the theorem is proved. 8 The following lemma is used in the preceding proof.

Lemma 3.3. Under the conditions of Theorem 3.2 for
and Proof. Taking inner product of (3.5) with P δ, instead of with δ as previously, we get By rearranging and applying Proposition 3.1 we get Multiplying by integrating factor e α(t−t k ) and integrating from t k to t gives equation (3.18 Multiplying by the integrating factor e (t−t k ) and integrating from t k to t gives Rearranging the above equation gives (3.19).

Analysis of Continuous Time 3DVAR
In this section we analyse application of the 3DVAR continuous filtering algorithm for the Lorenz equation (2.5). We will use {v(t)} t∈[0,∞) to denote the exact solution sequence from the Lorenz equations which underlies the data; this is to be contrasted with {u(t)} t∈[0,∞) which denotes the random variable which, when conditioned on the data, is approximated by the 3DVAR filter.
We study the continuous time 3DVAR filter, again in the case where H = (1, 0, 0), Γ 0 = 2 and C = η −1 2 I. To analyse the filter it is useful to have the truth v which gives rise to the data appearing in the filter itself. Thus (2.17) gives We then eliminate z in equation (2.19) by using (4.1) to obtain In the specific case of the Lorenz equation we get From equation (4.2) with the choices of C, H and Γ 0 detailed above we get where we have extended w from a scalar Brownian motion to an R 3 -valued Brownian motion for notational convenience. This SDE has a unique global strong solution m ∈ C([0, ∞); R 3 ). Indeed similar techniques used to prove the following result may be used to establish this global existence result, by applying the Itô formula to |m| 2 and using the global existence theory in [23]; we omit the details. Recall K given by (2.7).
where λ is defined by Thus Proof. The true solution follows the model where we include the last term, which is identically zero, for clear comparison with the filter equation 10 Using Itô's formula gives Using Lemma 4.2 and the definition of λ gives Rearranging and taking expectations gives Use of the Gronwall inequality gives the desired result.
The following lemma is used in the preceding proof.
Proof. On expanding the inner product (4.14) We now use the Properties 2.1(1), (5) and the fact that true solution lies on the global attractor so that |v| ≤ K. As a consequence we obtain Using Young's inequality with parameter θ (4.16) Taking θ = ηK 2 yields the desired result (4.17)

Numerical Results
In this section we present numerical results illustrating Theorems 3.2 and 4.1 established in the two preceding sections. All experiments are conducted with the parameters (α, b, r) = (10, 8 3 , 28). Both the theorems are mean square results. However, some of our numerics are based on a single long-time realization of the filters in question, with time-averaging used in place of ensemble averaging when mean square results are displayed; we highlight when this is done.

Discrete case
Under the assumptions of Theorem 3.2 we expect the mean square error in δ = |v − m| to decrease exponentially until it is of the size of the observational noise squared. Hence we expect the estimate m to converge to a neighbourhood of the true solution v, where the size of the neighbourhood scales as the size of the noise which pollutes in observation, in mean square. The following experiment indicates that similar behaviour is in fact observed pathwise (Figure 2), as well as in mean square over an ensemble (Figure 3). We set up the numerical experiments by computing the true solution v of the Lorenz equations using the explicit Euler method, and then adding Gaussian random noise to the observed x-component to create the data. Throughout we fix the parameter η = 0.1. In Figure 2 the observational noise is fixed and in Figure  3 we vary it over a range of scales. Figure 2 concerns the behaviour of a single realization of the filter. Note that the initial error |v(0) − m(0)| is around E|v| ≈ 10 and it decays exponentially with time, converging to O( ); for this particular case we chose = 1. A consequence of the second part of Theorem 3.2 is that the logarithm of the asymptotic mean squared error log E|δ| 2 varies linearly with the logarithm of the standard deviation of noise in the observations ( ) and this is illustrated in Figure 3. To compute the asymptotic mean square error we take two approaches. In the first, for each , we time-average the error incurred within a single long trajectory of the filter. In the second approach, we consider spatial average over an ensemble of observational noises ν, at a single time after the error has reached equilibrium. In Figure 3 we observe the log-linear decrease in the asymptotic error as the size of the noise decreases; furthermore, the slope of the graph is approximately 2 as predicted by (3.10). Both temporal and spatial averaging deliver approximately the same slope.

Continuous case
In the case of continuous observations we again compute a true trajectory of the Lorenz equation using the explicit Euler scheme. We then simulate the SDE (4.3) using the Euler-Maruyama method. 1 Similarly to the discrete case, we consider both pathwise and ensemble illustrations of the mean square results in Theorem 4.1. Figures 4 and 5 concern a single pathwise solution of (4.3). Recall from Theorem 4.1 that the critical value of η, beneath which the mean square theory holds, is η c = 4/K. In Figure 4 we have η = 1 2 η c whilst in Figure 5 we have η = 10η c ; in both cases the pathwise error spends most of its time at O( ), after the initial transient is removed, suggesting that the critical value of η derived in Theorem 4.1 is not sharp. In Figure 6 we vary the size of observational error and take η = 1 8 η c . The initial error is expected to decay exponentially towards something of order , and this is what is observed in both the case where averaging is performed in time and in space. Indeed we observe the log-linear decrease in the asymptotic error as the size of the noise decreases, and the slope of the graph is approximately 2, as predicted by equation (4.5).

Conclusions
The study of approximate Gaussian filters for the incorporation of data into high dimensional dynamical systems provides a rich field for applied mathematicians. Potentially such analysis can shed light on algorithms currently in use, whilst also suggesting methods for the improvement of those algorithms. However, rigorous analysis of these filters is in its infancy. The current work demonstrates the properties of the 3DVAR algorithm when applied to the partially observed Lorenz '63 model; it is analogous to the more involved theory developed for the 3DVAR filter applied to the partially observed Navier-Stokes equations in [5,4]. Work of this type can be built upon in four primary directions: firstly to consider other model dynamical systems of interest to practitioners, such as the Lorenz '96 model [20]; secondly to consider other observation models, such as pointwise velocity field measurements or Lagrangian data for the Navier-Stokes equations, building on the theory of determining modes [14]; thirdly to consider the precise relationships required between the model covariance C and observation operator H to ensure accuracy of the filter; and finally to consider more sophisticated filters such as the extended [13] and ensemble [7,8] Kalman filters. We are actively engaged in studying other models, such as Lorenz '96, by similar techniques to those employed here; our work on Lorenz '63 and Navier-Stokes models builds heavily on the synchronization results of Titi and coworkers and we believe that generalization of synchronization properties is a key first step in the study of other models. Regarding the second direction, Lagrangian data introduces an additional auxiliary system for the observed variables through which the system of interest is observed, necessitating careful design of correlations in the design parameters C, meaning that the analysis will be considerably more complicated than for Eulerian data. This links to the third direction: in general the relationship between the model covariance and observation operator required to obtain filter accuracy may be quite complicated and is an important avenue for study in this field; even for the particular Lorenz '63 model studied herein, with observation of only the x component of the system, this complexity is manifest if the covariance is not diagonal. Relating to the fourth and final direction, it is worth noting that 3DVAR is outdated operationally and empirical studies of filter accuracy have recently been focused on the more sophisticated methods such as ensemble Kalman filter and 4DVAR [16,17]. These empirical studies indicate that the more sophisticated methods outperform 3DVAR, as expected, and therefore suggest the importance of rigorous analysis of those methods.