Long-time Asymptotics of the Filtering Distribution for Partially Observed Chaotic Dynamical Systems

The filtering distribution is a time-evolving probability distribution on the state of a dynamical system, given noisy observations. We study the large-time asymptotics of this probability distribution for discrete-time, randomly initialized signals that evolve according to a deterministic map $\Psi$. The observations are assumed to comprise a low-dimensional projection of the signal, given by an operator $P$, subject to additive noise. We address the question of whether these observations contain sufficient information to accurately reconstruct the signal. In a general framework, we establish conditions on $\Psi$ and $P$ under which the filtering distributions concentrate around the signal in the small-noise, long-time asymptotic regime. Linear systems, the Lorenz '63 and '96 models, and the Navier-Stokes equation on a two-dimensional torus are within the scope of the theory. Our main findings come as a by-product of computable bounds, of independent interest, for suboptimal filters based on new variants of the 3DVAR filtering algorithm.

1. Introduction. The evolution of many physical systems can be successfully modeled by a deterministic dynamical system for which the initial conditions may not be known exactly.
In the presence of chaos, uncertainty in the initial conditions will be dramatically amplified even in short time intervals. However, when observations of the system are available, they may be used to ameliorate this growth in uncertainty and potentially lead to accurate estimates of the state of the system. In this work we provide sufficient conditions on the observations of a wide class of dissipative chaotic differential equations that guarantee long-time accuracy of the estimated state variables. The equations covered by our theory include the Lorenz '63 and '96 models as well as the Navier-Stokes equation on a two-dimensional torus. The importance of these model problems within geophysical applications is highlighted in [22], and their use for testing the efficacy of filtering algorithms is exemplified in [21], [17].
It is often natural to acknowledge the uncertainty in the initial condition by viewing it as a probability distribution which is propagated by the dynamics. Whenever a new observation of the state variables becomes available, this distribution is updated to incorporate it, reducing uncertainty. This process is performed sequentially in what is known as filtering [8]. Unfortunately, in almost all situations of applied relevance (with the exception of finite-state signals and the linear Gaussian case) the analytical expression for these filtering distributions involves integrals that cannot be computed in closed form. It is thus necessary to employ a numerical algorithm to sequentially approximate the filtering distributions. In order to develop good algorithms, a thorough understanding of the properties of these distributions is desirable. The interplay between properties of the filtering distributions and those of their numerical approximations is perhaps best exemplified by the case of filter stability and particle filtering: the long-time behavior of particle filtering algorithms depends crucially on the filtering distributions' sensitivity to their initial condition [9], [7], [19]. The main result of this paper shows long-time concentration of the filtering distributions towards the true underlying signal for partially observed chaotic dynamics. The proofs combine the asymptotic boundedness of a new suboptimal filter with the mean-square optimality of the mean of the filtering distribution as an estimator of the signal [32]. All of our examples rely on synchronization properties of dynamical systems. This tool underlies the study of noise-free data assimilation initiated in [10] for the Lorenz '63 and the Navier-Stokes equation.
The paper [10] motivated studies of the 3DVAR filter (three-dimensional variational method) for a variety of dissipative chaotic dynamical systems, conditioned on noisy observations, in [3] (Navier-Stokes), [16] (Lorenz '63), and [15] (Lorenz '96). The 3DVAR filter from meteorology [20], [23] is a method which, iteratively in time, solves a quadratic minimization problem representing a compromise between matching the model and the data. Here we study the filtering distribution itself, using modifications of the 3DVAR filter which exploit dissipativity to obtain upper bounds on the error made by the optimal filter. We also provide a unified methodology for the analysis. Furthermore, whereas previous work in [3], [15] required the observation noise to have bounded support, here only finite variance is assumed.
The suboptimal modified 3DVAR filter that we use in our analysis can also be interpreted using ideas from nonlinear observer theory [31], [29]. Its asymptotic boundedness is proved by a Lyapunov-type argument. Although more sophisticated suboptimal filters could be used to gain insight into the filtering distributions, our choice of modified nonlinear observers is particularly well-suited to deal with high (possibly infinite) dimensional signals, as indicated by the fact that the theory includes the Navier-Stokes equation. Filtering in high dimensions is not, in general, well understood. For example, the question of whether some form of particle filtering could be robust with respect to dimension has received much recent attention [27], [24], [1]. By understanding properties of the filtering distribution in high and infinite dimensions we provide insight that may inform future development of particle filters. This paper is organized as follows. In section 2 we set up the notation and formulate the questions we address in the rest of this paper. Section 3 reviews the 3DVAR algorithm from data assimilation and its relation to more general nonlinear observers from the control theory literature. A new truncated nonlinear observer is also introduced. In section 4 we prove long-time asymptotic results for these suboptimal filters, and thereby deduce long-time accuracy of the filtering distributions. Section 5 contains some applications to relevant models, and we close in section 6.
Figure 1. Graphic representation of the dependence structure assumed throughout this paper. Conditional on $v_0, \dots, v_j$, the distribution of $v_{j+1}$ is completely determined by $v_j$ via a deterministic map $\Psi$; therefore, the signal process forms a Markov chain. Similarly, conditional on $\{v_j\}_{j \ge 0}$, $\{y_j\}_{j \ge 1}$ is a sequence of independent random variables such that the conditional distribution of $y_j$ depends only on $v_j$.
2. Setup. Throughout, $\{v_j\}_{j \ge 0}$ denotes the signal process and $\{y_j\}_{j \ge 1}$ the observation process. The signal is not observed directly: we have access only to outcomes of the observation process. We suppose that both take values in a separable Hilbert space $H = (H, \langle \cdot, \cdot \rangle, |\cdot|)$ and that the signal is randomly initialized with distribution $\mu_0$, $v_0 \sim \mu_0$. We assume further that there is a deterministic map $\Psi$ such that
(2.1) $v_{j+1} = \Psi(v_j), \quad j \ge 0,$
and therefore all the randomness in the signal comes from its initialization. The observation process is given by
(2.2) $y_j = P v_j + \epsilon w_j, \quad j \ge 1,$
where $P$ denotes some linear operator that projects the signal onto a proper subspace of $H$, $\{w_j\}_{j \ge 1}$ is an independently and identically distributed (i.i.d.) noise sequence (independent of $v_0$), and $\epsilon > 0$ quantifies the strength of the noise. A graphic representation of the assumed dependence structure is given in Figure 1. We define $Q = I - P$. For mathematical convenience, and contrary to usual convention, we view both observations and noise as taking values in the same space $H$ as the signal, with the standing assumptions $Q y_j = 0$, $Q w_1 = 0$, and $P w_1 = w_1$ a.s. Thus $Q$ is a projection operator onto the unobserved part of the system. For $j \ge 0$, we let $Y_j := \sigma(y_i, i \le j)$ be the $\sigma$-algebra generated by the observations up to the discrete time $j$.
Note that the law of $\{v_j, y_j\}_{j \ge 0}$ is completely determined by four elements: the law of $v_0$, the map $\Psi$, the law of $w_1$, and the observation operator $P$. We will denote by $\mathbb{P}$ the law of $\{v_j, y_j\}_{j \ge 0}$ and by $E$ the corresponding expectation. It will be assumed throughout that $E|v_0|^2 < \infty$ and that the observation noise satisfies $E w_1 = 0$ and $E|w_1|^2 < \infty$. For convenience and without loss of generality we normalize the latter so that $E|w_1|^2 = 1$.
The main objects of interest in filtering theory are the conditional distributions of the signal at discrete time $j \ge 1$ given all observations up to time $j$. These are known as filtering distributions and will be denoted by
$\mu_j := \mathbb{P}(v_j \in \cdot \mid Y_j), \quad j \ge 1.$
The mean $\hat{v}_j$ of the filtering distribution $\mu_j$ is known as the optimal filter,
$\hat{v}_j := E[v_j \mid Y_j].$
By the mean-square minimization property of the conditional expectation [32], this filter is optimal in the sense that, among all $Y_j$-measurable random variables, it is the only one (up to equivalence) that minimizes the $L^2$ distance to the signal $v_j$:
(2.3) $E|v_j - \hat{v}_j|^2 = \min\bigl\{ E|v_j - z|^2 : z \text{ is } Y_j\text{-measurable} \bigr\}.$
In other words, $\hat{v}_j$ is the best possible estimator (in the mean-square sense) of the state of the signal at time $j$ given information up to time $j$. The optimal filter is usually, like the filtering distributions, not analytically available. However, by studying suitable suboptimal filters $\{z_j\}_{j \ge 0}$ and using (2.3) we can find sufficient conditions under which the optimal filter is close to the signal in the long-time horizon. We thus provide sufficient conditions under which the observations counteract the potentially chaotic behavior of the dynamical system and allow predictability on infinite time horizons.
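The optimality property (2.3) can be illustrated numerically in a scalar Gaussian toy model, where the conditional mean is available in closed form; the model and all parameter values below are purely illustrative and are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.5
n = 200_000

v = rng.standard_normal(n)            # signal v ~ N(0, 1)
y = v + eps * rng.standard_normal(n)  # observation y = v + eps * w

# Posterior mean E[v | y] for this conjugate Gaussian model.
post_mean = y / (1.0 + eps**2)

mse_opt = np.mean((v - post_mean) ** 2)  # = eps^2 / (1 + eps^2) in theory
mse_raw = np.mean((v - y) ** 2)          # a competing Y-measurable estimator

assert mse_opt < mse_raw
print(mse_opt, eps**2 / (1 + eps**2))
```

Any other $Y_j$-measurable estimator (here, $y$ itself) has a strictly larger mean-square error, in agreement with (2.3).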
The main objective of this paper is to investigate the long-time asymptotic behavior of the filtering distribution for discrete-time chaotic signals arising from the solution to a dissipative quadratic system with energy-conserving nonlinearity,
(2.4) $\frac{dv}{dt} + A v + B(v, v) = f, \quad v(0) = v_0,$
which is observed at discrete times $t_j = jh$, $j \ge 1$, $h > 0$. The bilinear form $B(\cdot, \cdot)$ will be assumed throughout to be symmetric. We denote by $\Psi_t$ the one-parameter solution semigroup associated with (2.4), i.e., for $v_0 \in H$, $\Psi_t(v_0)$ is the value at time $t$ of the solution to (2.4) with initial condition $v_0$. Furthermore, we introduce the abbreviation $\Psi = \Psi_h$. Our theory, developed in section 4, relies on two assumptions that we now state and explain.
Assumption 2.1.
1. (Absorbing ball property.) There are constants $r_0, r_1 > 0$ such that, for every $u \in H$ and $t \ge 0$,
(2.5) $|\Psi_t(u)|^2 \le e^{-r_1 t}|u|^2 + \frac{r_0}{r_1}\bigl(1 - e^{-r_1 t}\bigr).$
In particular, the ball $B := \{u \in H : |u|^2 \le 2 r_0 / r_1\}$ is forward-invariant and absorbing: solutions started in $B$ remain in $B$, and every solution eventually enters $B$.
2. (Squeezing property.) There is a function $V : H \to [0, \infty)$ such that $V(\cdot)^{1/2}$ is a Hilbert norm equivalent to $|\cdot|$, a bounded operator $D$, an absorbing set $B_V = \{u \in H : V(u)^{1/2} \le R\} \supset B$, and a constant $\alpha \in (0, 1)$ such that, for all $u \in B$, $v \in B_V$,
$V\bigl((I - DP)(\Psi(u) - \Psi(v))\bigr) \le \alpha\, V(u - v).$
The absorbing ball property concerns only the signal dynamics. It is satisfied by many dissipative models of the form (2.4); see section 5. The squeezing property involves both the signal dynamics and the observation operator $P$. It is satisfied by several problems of interest provided that the assimilation time $h$ is sufficiently small and that the "right" parts of the system are observed; see again section 5 for examples. We remark that several forms of the squeezing property can be found in the dissipative dynamical systems literature. They all refer to the existence of a contracting part of the dynamics. Their importance for filtering has been explored in [10], [3], and [5]. It also underlies the analysis in [12] and [15], as we make apparent here. We have formulated the squeezing property to suit our analyses and with the intention of highlighting its similarity to detectability for linear problems, as explained in subsection 4.2. The function $V$ will represent a Lyapunov-type function in section 4. For all the chaotic examples in section 5 the operator $D$ will be chosen as the identity, but other choices are possible. As we shall see, the absorbing ball property is not required when a global form of the squeezing property, as may arise for linear problems, is satisfied.
We will construct suboptimal filters $\{m_j\}_{j \ge 0}$ that are forced to lie in $B_V$. By the absorbing ball property the signal $v_j$ is contained, for large $j$ and with high probability, in the forward-invariant ball $B$. Therefore, intuitively, the squeezing property can be applied, for large $j$, to the pair $(v_j, m_j)$. The main result of this paper, Theorem 4.8, shows that, when Assumption 2.1 holds, the optimal filter accurately tracks the signal. Specifically we show that there is a constant $c > 0$, independent of the noise strength $\epsilon$, such that
(2.6) $\limsup_{j \to \infty} E|v_j - \hat{v}_j|^2 \le c\,\epsilon^2.$
Note that (2.6) not only guarantees that in the low-noise regime the optimal filter (i.e., the mean of the filtering distribution) is, on average, close to the signal, but also that the variance of the filtering distribution is, on average, small. Indeed, since
$E\bigl[|v_j - \hat{v}_j|^2 \mid Y_j\bigr] = \operatorname{tr} \operatorname{Cov}(\mu_j),$
it follows by taking expectations and using linearity of the trace operator that
$E|v_j - \hat{v}_j|^2 = E\bigl[\operatorname{tr} \operatorname{Cov}(\mu_j)\bigr],$
and therefore (2.6) implies
$\limsup_{j \to \infty} E\bigl[\operatorname{tr} \operatorname{Cov}(\mu_j)\bigr] \le c\,\epsilon^2.$
We hence see that (2.6) guarantees that the variance of the filtering distributions scales as the squared size of the observation noise, like $O(\epsilon^2)$. Thus the uncertainty in the initial condition, which is $O(1)$, is reduced, in the large-time asymptotic, to uncertainty of $O(\epsilon)$: the observations have overcome the effect of chaos. Small variance of the long-time filtering distribution had been previously proposed as a condition for successful data assimilation [4].

3. Suboptimal filters.
The aim of this section is to introduce a suboptimal filter designed to track dynamics satisfying Assumption 2.1. This filter is based on the 3DVAR algorithm from data assimilation and nonlinear observers from control applications. We give the necessary background on these in subsection 3.1 before introducing the new filter in subsection 3.2.

3.1. 3DVAR filter.
The 3DVAR filter approximates the filtering distribution $\mu_{j+1}$ by a Gaussian $N(z_{j+1}, C)$ whose mean can be found recursively, starting from a deterministic point $z_0 \in H$, by solving the variational problem
(3.1) $z_{j+1} = \operatorname*{arg\,min}_{z \in H} \Bigl\{ \bigl|C^{-1/2}\bigl(z - \Psi(z_j)\bigr)\bigr|^2 + \epsilon^{-2}\bigl|\Gamma^{-1/2}\bigl(y_{j+1} - P z\bigr)\bigr|^2 \Bigr\},$
where $C$ is a fixed model covariance that represents the lack of confidence in the model $\Psi$, and $\Gamma$ is the covariance operator of the observation noise $w_1$.
In 3DVAR the model covariance $C$ is held fixed over time; this is in contrast with the Kalman filter, where the covariance is updated sequentially through the Kalman update formula. It is immediate from (3.1) that $z_j$ is $Y_j$-measurable for all $j \ge 0$, and it can be shown [18] that the solution $z_{j+1}$ to this variational problem satisfies
(3.2) $z_{j+1} = \Psi(z_j) + K\bigl(y_{j+1} - P\Psi(z_j)\bigr),$
where $K$ is the Kalman gain
$K = C P^{*}\bigl(P C P^{*} + \epsilon^2 \Gamma\bigr)^{-1}.$
The 3DVAR filter was introduced, and has been widely applied, in the meteorological sciences [23], [20]. Long-time asymptotic stability and accuracy properties, which guarantee that the means $z_j$ become close to the signal $v_j$, have recently been studied for the Lorenz '63 model subject to additive Gaussian noise [16], and for the Lorenz '96 and Navier-Stokes equations observed subject to bounded noise [15], [3].
It will be convenient to allow for other choices of the operator $K$ in the above definition, and to consider the more general recursion
(3.3) $z_{j+1} = \Psi(z_j) + D\bigl(y_{j+1} - P\Psi(z_j)\bigr),$
where $D$ is some linear operator that we are free to choose as desired. Filters of the form (3.3) are known as nonlinear observers [31], [29]. The 3DVAR filter can be seen as an instance of these where the operator $D$ is determined by the model and noise covariances, and by the observation operator. We now derive a recursive formula for the error made by nonlinear observers when approximating the signal. To that end note, first, that the signal $\{v_j\}_{j \ge 0}$ satisfies $v_{j+1} = \Psi(v_j)$. Second, using (2.2) at time $j+1$, combined with the assumption that $P w_{j+1} = w_{j+1}$, we have $y_{j+1} = P\Psi(v_j) + \epsilon w_{j+1}$. Therefore, subtracting (3.3) from the signal dynamics, we obtain that the error $\delta_j := v_j - z_j$ satisfies
(3.4) $\delta_{j+1} = (I - DP)\bigl(\Psi(v_j) - \Psi(z_j)\bigr) - \epsilon D w_{j+1}.$
Despite their simplicity, nonlinear observers are known to accurately track the signal under suitable conditions [29], [31]. Equation (3.4) plays a central role in such analyses, and will underlie our analysis too. It demonstrates the importance of the operator $(I - DP)\Psi$ in the propagation of error; this operator combines the properties of the dynamical system, encoded in $\Psi$, with the properties of the observation operator $P$.
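The recursion (3.3) and the error identity (3.4) can be sketched for a toy two-dimensional map; the map $\Psi$, the operators $P$ and $D$, the initial points, and the noise strength below are illustrative choices, not taken from any of the examples treated later:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.1

def Psi(v):
    # Toy nonlinear map standing in for the time-h solution operator.
    return np.array([0.9 * v[0] + 0.1 * np.sin(v[1]),
                     0.9 * v[1] + 0.1 * v[0] ** 2])

P = np.diag([1.0, 0.0])  # observe the first coordinate only
D = np.diag([0.8, 0.0])  # observer gain, free to choose

v = np.array([1.0, -1.0])  # signal
z = np.array([5.0, 5.0])   # observer, initialized far from the signal

for _ in range(50):
    w = np.array([rng.standard_normal(), 0.0])  # noise supported on range(P)
    v_new = Psi(v)
    y_new = P @ v_new + eps * w                  # observation model (2.2)
    z_new = Psi(z) + D @ (y_new - P @ Psi(z))    # observer update (3.3)
    # Error identity (3.4): delta_{j+1} = (I - DP)(Psi(v_j) - Psi(z_j)) - eps*D@w
    delta = (np.eye(2) - D @ P) @ (Psi(v) - Psi(z)) - eps * (D @ w)
    assert np.allclose(v_new - z_new, delta)
    v, z = v_new, z_new

print(np.abs(v - z))
```

For this particular choice of $\Psi$, $P$, and $D$ the error contracts to a small residual driven by the observation noise, illustrating the role of $(I - DP)$.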

3.2. Nonlinear observers and truncated nonlinear observers.
In the remainder of this section we introduce a truncated nonlinear observer that is especially tailored to exploit the absorbing ball property of the underlying dynamics.
Given a nonempty closed convex subset $\mathcal{C} \subset H$, take $m_0 \in \mathcal{C}$ and, for $j \ge 0$, define the $\mathcal{C}$-truncated nonlinear observer $m_{j+1}$ by
(3.5) $m_{j+1} = P_{\mathcal{C}}\Bigl(\Psi(m_j) + D\bigl(y_{j+1} - P\Psi(m_j)\bigr)\Bigr),$
where $P_{\mathcal{C}}$ is the orthogonal (with respect to a suitable inner product) projection operator onto the set $\mathcal{C}$; this is well-defined for any nonempty closed convex set [26]. In the next section we will analyze the long-time behavior of this filter when $\mathcal{C}$ is chosen as $B_V$ and the inner product is induced by $V^{1/2}$ (see Assumption 2.1). The main advantage of this truncated filter is that $m_j \in B_V$ for all $j \ge 0$, so that large uninformative observations $y_j$, corresponding to large realizations of the observation noise $w_j$, will not hinder the performance of the filter. Examples of other truncated stochastic algorithms can be found in [13].
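One step of the truncated observer (3.5) can be sketched when the convex set is a Euclidean ball; the paper projects in the $V^{1/2}$ norm onto $B_V$, whereas here, purely for illustration, we take the ambient norm so that the projection has the closed form $x \mapsto x \min(1, R/|x|)$ (the map, gain, and radius are illustrative):

```python
import numpy as np

R = 2.0  # radius of the truncation ball (illustrative)

def proj_ball(x, R):
    # Orthogonal projection onto the closed convex ball {|x| <= R}:
    # the closest point to x is x scaled back to the boundary when |x| > R.
    nrm = np.linalg.norm(x)
    return x if nrm <= R else (R / nrm) * x

def truncated_observer_step(m, y_next, Psi, P, D, R):
    # One step of (3.5): observer update followed by projection onto the ball.
    m_pred = Psi(m) + D @ (y_next - P @ Psi(m))
    return proj_ball(m_pred, R)

# A huge, uninformative observation cannot throw the filter out of the ball.
Psi = lambda v: 0.5 * v
P = np.eye(2)
D = np.eye(2)
m = np.array([1.0, 0.0])
y_huge = np.array([1e6, 1e6])
m_next = truncated_observer_step(m, y_huge, Psi, P, D, R)
assert np.linalg.norm(m_next) <= R + 1e-12
```

The final assertion makes the stated advantage concrete: however large the noise realization, the truncated filter remains inside the ball.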

4. Stochastic stability of suboptimal filters and filter accuracy.
In this section we prove long-time accuracy of certain suboptimal filters under different assumptions on the underlying dynamics and observation model. These results are used to establish long-time concentration of the filtering distributions. We start in subsection 4.1 by recalling the Lyapunov method for proving asymptotic boundedness of stochastic algorithms. In subsection 4.2 we employ this method to show asymptotic accuracy of nonlinear observers when a global form of the squeezing property is satisfied, as happens for certain linear problems. Finally, in subsection 4.3 we use truncated nonlinear observers to deal with chaotic models where only the weaker Assumption 2.1 holds.

4.1. The Lyapunov method for stability of stochastic filters. Consider a Markov chain $\{\delta_j\}_{j \ge 0}$ and think of it as the random sequence of errors made by some filtering procedure. The next result, from [29], underlies much of the analysis in the following subsections.
Lemma 4.1. Let $\{\delta_j\}_{j \ge 0}$ be an $H$-valued Markov chain, and suppose that:
1. There is a function $V : H \to [0, \infty)$ with $V(x) \ge \theta |x|^2$ for all $x \in H$ and some $\theta > 0$.
2. There are real numbers $K > 0$ and $\alpha \in (0, 1)$ such that, for all $\Delta_j \in H$,
$E\bigl[V(\delta_{j+1}) \mid \delta_j = \Delta_j\bigr] \le \alpha V(\Delta_j) + K.$
Then, for any $a \in H$,
$E\bigl[V(\delta_j) \mid \delta_0 = a\bigr] \le \alpha^j V(a) + \frac{K}{1 - \alpha}.$
Therefore, regardless of the initial state $\Delta_0$,
$\limsup_{j \to \infty} E|\delta_j|^2 \le \frac{K}{\theta(1 - \alpha)}.$
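The mechanism of Lemma 4.1 can be checked on a scalar AR(1) error chain $\delta_{j+1} = a\,\delta_j + \xi_{j+1}$ with $V(x) = x^2$: then $E[V(\delta_{j+1}) \mid \delta_j] = a^2 \delta_j^2 + E\xi^2$, so the second condition holds with $\alpha = a^2$ and $K = E\xi^2$, and $\theta = 1$. The chain and its constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
a, sigma = 0.7, 0.3
alpha, K = a**2, sigma**2          # contraction factor and additive constant
n_paths, n_steps = 100_000, 60

delta = np.full(n_paths, 5.0)      # initial error delta_0 = 5, so V(delta_0) = 25
for j in range(n_steps):
    delta = a * delta + sigma * rng.standard_normal(n_paths)
    # Lyapunov bound from Lemma 4.1 with V(x) = x^2:
    bound = alpha ** (j + 1) * 25.0 + K / (1 - alpha)
    assert np.mean(delta**2) <= bound * 1.05  # 5% Monte Carlo slack

# Long-time behavior: E V(delta_j) approaches K / (1 - alpha) for this chain.
print(np.mean(delta**2), K / (1 - alpha))
```

For this AR(1) chain the limit is attained with equality, so the Monte Carlo average hugs the bound, while for general chains the lemma gives only an upper bound.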

4.2. Filter accuracy with global squeezing property.
The following results show that if, for some suitable operator D, the map (I − DP )Ψ satisfies a global Lipschitz condition, then it is possible to use nonlinear observers to deduce long-time accuracy of the filtering distributions. Although such a global condition does not typically hold for dissipative chaotic dynamical systems arising in applications, the following discussion serves as a motivation for the more general theory in subsection 4.3. Moreover, the results in this subsection are of interest in their own right. In particular they are enough to deal with the important case of linear signal dynamics.
Theorem 4.2. Assume that there is a Hilbert norm $V(\cdot)^{1/2}$ in $H$, equivalent to $|\cdot|$, a bounded operator $D$, and a constant $\alpha \in (0, 1)$ such that, for all $u, v \in H$,
$V\bigl((I - DP)(\Psi(u) - \Psi(v))\bigr) \le \alpha\, V(u - v),$
and let $\{z_j\}_{j \ge 0}$ be given by (3.3). Then there is a constant $c > 0$, independent of the noise strength $\epsilon$, such that
$\limsup_{j \to \infty} E|v_j - z_j|^2 \le c\,\epsilon^2.$
Proof. By assumption, $V$ satisfies the first condition in Lemma 4.1. Set $\delta_j = v_j - z_j$. Then, using (3.4), the independence structure, the equivalence of norms, and the fact that $D$ is bounded,
$E\bigl[V(\delta_{j+1}) \mid \delta_j = \Delta_j\bigr] \le \alpha V(\Delta_j) + C\epsilon^2,$
where $C > 0$ is independent of $\epsilon$. Thus the second condition in Lemma 4.1 holds and the proof is complete.
The following corollary is an immediate consequence of the $L^2$ optimality property (2.3) of the optimal filter.
Corollary 4.3. In the setting of Theorem 4.2 there is a constant $c > 0$, independent of the noise strength $\epsilon$, such that $\limsup_{j \to \infty} E|v_j - \hat{v}_j|^2 \le c\,\epsilon^2$.
In the remainder of this subsection, we apply, for the sake of motivation, the previous theorem to the case of linear finite-dimensional dynamics. We take $H = \mathbb{R}^d$ and let the signal be given by
$v_{j+1} = L v_j, \quad j \ge 0,$
for a matrix $L \in \mathbb{R}^{d \times d}$, observed according to (2.2). This framework has been widely studied within the control theory community, mostly (but not exclusively) in the case where both the initial distribution of the signal and the observation noise are Gaussian. Other than its modeling appeal, this linear Gaussian setting has the exceptional feature that the filtering distributions are themselves again Gaussian. Moreover, their means and covariances can be iteratively computed using the Kalman filter [11]. Since the optimal filter is the mean of the filtering distribution, the explicit characterization of the Kalman filter yields an explicit characterization of the optimal filter. Suppose that, for some given $\hat{v}_0 \in \mathbb{R}^d$ and $C_0 \in \mathbb{R}^{d \times d}$, $\mu_0 = N(\hat{v}_0, C_0)$, and suppose further that $w_1 \sim N(0, \Gamma)$. Then the filtering distributions are Gaussian, $\mu_j = N(\hat{v}_j, C_j)$, $j \ge 1$, and the means and covariances satisfy the recursion (see [18])
(4.1) $\hat{v}_{j+1} = L\hat{v}_j + K_{j+1}\bigl(y_{j+1} - P L \hat{v}_j\bigr), \qquad C_{j+1} = (I - K_{j+1} P)\, C_{j+1|j},$
where the predictive Kalman covariance $C_{j+1|j}$ and Kalman gain $K_{j+1}$ are given by
(4.2) $C_{j+1|j} = L C_j L^{*}, \qquad K_{j+1} = C_{j+1|j} P^{*}\bigl(P C_{j+1|j} P^{*} + \epsilon^2 \Gamma\bigr)^{-1}.$
Similar formulae are available when the covariance operator $\Gamma$ is not invertible in the observation space [18].
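The Kalman recursion can be sketched directly; the matrices below (a mildly expanding rotation $L$, partial observation $P$, unit $\Gamma$) are illustrative choices, not taken from the paper. Since the model is deterministic, the predictive covariance carries no process-noise term:

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.1
th = 0.5
L = 1.02 * np.array([[np.cos(th), -np.sin(th)],
                     [np.sin(th),  np.cos(th)]])  # unstable: rho(L) = 1.02
P = np.array([[1.0, 0.0]])         # observe the first coordinate
Gamma = np.array([[1.0]])          # unit observation-noise covariance

v = rng.standard_normal(2)         # v_0 ~ N(0, I)
vhat, C = np.zeros(2), np.eye(2)   # prior mean and covariance

for _ in range(200):
    v = L @ v                                         # signal step
    y = P @ v + eps * rng.standard_normal(1)          # observation
    C_pred = L @ C @ L.T                              # predictive covariance
    K = C_pred @ P.T @ np.linalg.inv(P @ C_pred @ P.T + eps**2 * Gamma)
    vhat = L @ vhat + K @ (y - P @ L @ vhat)          # mean update
    C = (np.eye(2) - K @ P) @ C_pred                  # covariance update

# The long-time filter covariance is O(eps^2) despite rho(L) > 1.
assert np.trace(C) < 10 * eps**2
print(np.trace(C), np.linalg.norm(v - vhat))
```

Note that, in agreement with Remark 4.4 below (4.2), the covariance loop never touches the data $y$, so $C_j$ could be precomputed offline.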
Remark 4.4. It is clear from (4.2) that the Kalman filter covariance $C_j$, which is the covariance of the filtering distribution $\mu_j$, is deterministic and in particular does not make use of the observations. It follows from the discussion in section 2 that, in the linear Gaussian setting,
$\limsup_{j \to \infty} \operatorname{tr}(C_j) \le c\,\epsilon^2.$
In the linear setting the global squeezing property in Theorem 4.2 reduces to the control theory notion of detectability, as we now recall.
Definition 4.5. The pair $(L, P)$ is called detectable if there exists a matrix $D$ such that $\rho(L - DP) < 1$, where $\rho(\cdot)$ denotes the spectral radius.
We remark that the condition $\rho(L - DP) < 1$ guarantees the existence of a Hilbert norm in $\mathbb{R}^d$ in which the linear map defined by the matrix $L - DP$ is contractive. It therefore yields a global form of the squeezing property. Note that detectability may hold for unstable dynamics with $\rho(L) > 1$. However, the observations need to contain information on the unstable directions. It is not necessary that these are directly observed, but only that we can retrieve information from them by exploiting any rotations present in the dynamics. This is the interpretation of the matrix $D$ in the definition. The next result states the abstract global theorem of this subsection in the setting of linear dynamics. Our aim in including it here is to make apparent the connection between classical control theory [14], ideas from data assimilation concerning the 3DVAR filter [3], [12], [15], and the new results for chaotic systems observed with unbounded noise in subsection 4.3.
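Detectability in the sense of Definition 4.5 can be checked numerically. Below, an expanding rotation ($\rho(L) = 1.1$) observed only in its first coordinate is stabilized by a suitable gain $D$, found here by a brute-force grid scan; the specific matrices and grid are illustrative:

```python
import numpy as np

th = 0.5
L = 1.1 * np.array([[np.cos(th), -np.sin(th)],
                    [np.sin(th),  np.cos(th)]])
P = np.array([[1.0, 0.0]])   # only the first coordinate is observed

def rho(M):
    # Spectral radius of a square matrix.
    return np.max(np.abs(np.linalg.eigvals(M)))

assert rho(L) > 1.0          # the uncontrolled dynamics are unstable

# Scan a small grid of gains D = (d1, d2)^T for one making L - DP a contraction.
best = min(
    (rho(L - np.array([[d1], [d2]]) @ P), d1, d2)
    for d1 in np.linspace(-2, 2, 41)
    for d2 in np.linspace(-2, 2, 41)
)
assert best[0] < 1.0         # (L, P) is detectable: some D gives rho(L - DP) < 1
print(best)
```

The rotation carries the unobserved second coordinate into the observed first one, which is exactly the mechanism described above: unstable directions need not be observed directly.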

4.3. Filter accuracy for chaotic deterministic dynamics.
In this section we study filter accuracy for signals satisfying Assumption 2.1. Our analysis now makes use of truncated nonlinear observers (3.5), which are forced to lie in the absorbing ball $B_V$. The idea is that once the signal gets into the absorbing ball, projecting the filter into $B_V$ reduces the distance from the signal, as measured by the Lyapunov function $V$. This is the content of the following lemma.
Lemma 4.7. Let $P_{B_V} x$ denote the point (in the $V^{1/2}$ norm) closest to $x \in H$ in the set $B_V$. Then, for every $x \in H$ and $v \in B_V$,
$V\bigl(P_{B_V} x - v\bigr) \le V(x - v).$
Proof. The case $x \in B_V$ is obvious, so we assume $V(x)^{1/2} > R$. Let $\langle \cdot, \cdot \rangle_V$ denote the inner product associated with the norm $V^{1/2}$. We claim that
$\langle x - P_{B_V} x,\; v - P_{B_V} x \rangle_V \le 0 \quad \text{for all } v \in B_V.$
To see this, recall the elementary fact that, for an arbitrary nonempty closed convex set $\mathcal{C}$, the projection $P_{\mathcal{C}} x$ is characterized by $\langle x - P_{\mathcal{C}} x, v - P_{\mathcal{C}} x \rangle_V \le 0$ for all $v \in \mathcal{C}$ [26], and the claim is proved. Therefore,
$V(x - v) = V(x - P_{B_V} x) + V(P_{B_V} x - v) - 2\langle x - P_{B_V} x,\; v - P_{B_V} x \rangle_V \ge V(P_{B_V} x - v).$
Using the fact established in Lemma 4.7 we are now in a position to prove positive results about the truncated nonlinear observer, and hence the optimal filter, in the long-time asymptotic regime.
Theorem 4.8. Suppose that Assumption 2.1 holds. Let $\{m_j\}_{j \ge 0}$ be the sequence of $B_V$-truncated nonlinear observers given by (3.5). Then there is a constant $c > 0$, independent of the noise strength $\epsilon$, such that
$\limsup_{j \to \infty} E|v_j - m_j|^2 \le c\,\epsilon^2.$
Proof. By Lemma 4.9 below, for arbitrary $\delta > 0$ there is $J > 0$ such that, for every $j \ge J$,
(4.5) $E\bigl[\mathbf{1}_{\{v_J \notin B\}}\, V(v_{j+1} - m_{j+1})\bigr] \le \delta.$
Now, for $j \ge J$ we have by the absorbing ball property that $v_J \in B$ implies $v_{j+1} \in B$, and hence by Lemma 4.7
$E\bigl[\mathbf{1}_{\{v_J \in B\}}\, V(v_{j+1} - m_{j+1})\bigr] \le E\bigl[\mathbf{1}_{\{v_J \in B\}}\, V\bigl((I - DP)(\Psi(v_j) - \Psi(m_j)) - \epsilon D w_{j+1}\bigr)\bigr].$
Using the independence structure the cross term vanishes, and for the remaining terms we can employ the squeezing property with $v_j \in B$, $m_j \in B_V$, together with the boundedness of $D$, to deduce
$E\bigl[\mathbf{1}_{\{v_J \in B\}}\, V(v_{j+1} - m_{j+1})\bigr] \le \alpha\, E\bigl[\mathbf{1}_{\{v_J \in B\}}\, V(v_j - m_j)\bigr] + C\epsilon^2.$
Since $\alpha \in (0, 1)$, Gronwall's lemma starting from $J$ gives (for a different constant $c > 0$)
(4.6) $\limsup_{j \to \infty} E\bigl[\mathbf{1}_{\{v_J \in B\}}\, V(v_j - m_j)\bigr] \le c\,\epsilon^2.$
Finally, combining (4.5) and (4.6) yields
$\limsup_{j \to \infty} E\, V(v_j - m_j) \le c\,\epsilon^2 + \delta,$
and since $\delta > 0$ was arbitrary and the norms $V(\cdot)^{1/2}$ and $|\cdot|$ are assumed to be equivalent, the proof is complete.
The following lemma was used in the preceding proof.
Lemma 4.9. Let $\delta > 0$. Then, with the notation and assumptions of the previous theorem, there is $J = J(\delta)$ such that, for every $j \ge J$,
$E\bigl[\mathbf{1}_{\{v_J \notin B\}}\, V(v_{j+1} - m_{j+1})\bigr] \le \delta.$
Proof. First, by the assumed equivalence of norms there is $\theta > 0$ such that $V(\cdot)^{1/2} \le \theta|\cdot|$. Second, using the absorbing ball property it is easy to check that $\mathbb{P}[v_J \notin B]$ can be made arbitrarily small by choosing $J$ large enough. Therefore, since we work with the standing assumption that $E|v_0|^2 < \infty$, it is possible to choose $J$ large enough so that
$E\bigl[\mathbf{1}_{\{v_J \notin B\}}\, \bigl(\theta|v_0| + R\bigr)^2\bigr] \le \delta.$
Then, for $j > J$,
$E\bigl[\mathbf{1}_{\{v_J \notin B\}}\, V(v_{j+1} - m_{j+1})\bigr] \le E\bigl[\mathbf{1}_{\{v_J \notin B\}}\, \bigl(\theta|v_{j+1}| + R\bigr)^2\bigr] \le \delta,$
where we used that $V(v_{j+1} - m_{j+1})^{1/2} \le \theta|v_{j+1}| + V(m_{j+1})^{1/2} \le \theta|v_{j+1}| + R$, since $m_{j+1} \in B_V$, and that, for $j > J$ and $v_J \notin B$, $|v_j| \le |v_0|$ by (2.5).

5. Applications.
5.1. Finite dimensions (Lorenz '63 and '96 models).
We study first the finite-dimensional case $H = \mathbb{R}^d$. Our aim is to introduce a general setting for which Assumption 2.1 holds, and thus the theory of the previous section can be applied. In order to do so we need to introduce suitable norms, and some conditions on the general nonlinear dissipative equation (2.4). We start by setting $|\cdot|$ to be the Euclidean norm, and
$V(u) = |Pu|^2 + |u|^2.$
Next, we introduce a set of hypotheses on the general system (2.4) and the observation matrix $P$.
Assumption 5.1 is satisfied by the Lorenz '63 [10] (as used in [16]) and Lorenz '96 [15] models. Assumptions 5.1.3 and 5.1.5 are fulfilled when the "right" parts of the system are observed. Examples of observation matrices $P$ that fit our theory are given, both for the Lorenz '63 and '96 models, in subsections 5.1.1 and 5.1.2.
The first two items of Assumption 5.1 are enough to ensure the absorbing ball property Assumption 2.1.1. Indeed, if these conditions hold, then taking the inner product of (2.4) with $v$, and using that the nonlinearity is energy-conserving, gives
$\frac{1}{2}\frac{d}{dt}|v|^2 + \langle Av, v \rangle = \langle f, v \rangle \le \frac{1}{2}|f|^2 + \frac{1}{2}|v|^2.$
Finally, Gronwall's lemma yields Assumption 2.1.1 with $r_0 = |f|^2$ and $r_1 = 1$, and the absorbing ball
(5.1) $B = \bigl\{u \in \mathbb{R}^d : |u| \le r := \sqrt{2}\,|f|\bigr\}.$
We now show that the squeezing property is also satisfied provided that the time $h$ between observations is sufficiently small. The proof is based on the analysis of the Lorenz '63 model in [10]. Recall that $Q = I - P$ is the operator that projects onto the unobserved part of the system.
Lemma 5.2. Suppose that Assumption 5.1 holds and let $r' > 0$. Then there is $h^* > 0$ with the property that, for all $h < h^*$, $v \in B$, and $u \in H$ with $|u - v| \le r'$, there exists $\alpha = \alpha(r') \in (0, 1)$ such that
$V\bigl(Q(\Psi(u) - \Psi(v))\bigr) \le \alpha\, V(u - v).$
Proof. Write $\delta(t) := \Psi_t(u) - \Psi_t(v)$. By Lemma 5.3,
$|\delta(t)|^2 \le b_1(t)|\delta_0|^2 + b_2(t)|P\delta_0|^2,$
where $b_1(t)$ and $b_2(t)$ are also defined in Lemma 5.3. Therefore, noting that $V(Q\delta(t)) = |Q\delta(t)|^2 \le |\delta(t)|^2$,
$V\bigl(Q\delta(h)\bigr) \le \max\{b_1(h), b_2(h)\}\,\bigl(|\delta_0|^2 + |P\delta_0|^2\bigr) = \max\{b_1(h), b_2(h)\}\, V(\delta_0).$
Since $b_1(0) = 1$, $b_2(0) = 0$, and $b_1'(0) = -1 < 0$, it follows that, for all sufficiently small $t$, $\max\{b_1(t), b_2(t)\} \in (0, 1)$, and the lemma is proved.
The following result was used in the preceding proof.
Lemma 5.3. Suppose that the notation and assumptions of the previous lemma are in force, and that $|\delta_0| \le r'$. Then, for $t \in [0, h)$, the bounds (5.2) and (5.3) hold, with constants $k$ and $k_i$, $1 \le i \le 5$, defined in the proof ($k_3$ and $k_5$ depend on $r'$); in particular
$|\delta(t)|^2 \le b_1(t)|\delta_0|^2 + b_2(t)|P\delta_0|^2,$
where the functions $a_1$, $b_1$, and $b_2$ are defined in the obvious way from these expressions.
Proof. First, it is not difficult to check (see, for example, [12]) that Assumptions 5.1.1, 5.1.2, and 5.1.3 imply that there exists a constant $k > 0$ such that, for $u \in H$, $v \in B$, and $t > 0$, $|\delta(t)|^2 \le e^{kt}|\delta_0|^2$.
Next, using the definition of $\delta$ and the symmetry of $B(\cdot, \cdot)$, it is possible to derive from [16] the error equation
(5.4) $\frac{d\delta}{dt} + A\delta + 2B(\delta, v) + B(\delta, \delta) = 0.$
Taking the inner product of (5.4) with $\delta$ yields the differential inequality (5.5). We now bound $|P\delta|^2$: taking the inner product of (5.4) with $P\delta$, integrating from $0$ to $t$, and using that $|\delta(t)|^2 \le |\delta_0|^2 e^{kt}$, gives (5.2), where the last equality defines $k_4$ and $k_5$. Then, going back to (5.5) and denoting $k_1 = c_1^2 r'^2$, $k_2 = k_1 k_4$, and $k_3 = k_1 k_5$, Gronwall's lemma gives (5.3).
The previous lemmas show that Assumption 5.1 implies the squeezing property Assumption 2.1.2, provided that the assimilation time $h$ is sufficiently small. Indeed, taking
(5.6) $B_V = \bigl\{u \in \mathbb{R}^d : V(u)^{1/2} \le \sqrt{2}\, r\bigr\},$
with $r$ as in (5.1), we have that $|u - v| \le (1 + \sqrt{2})\, r$ for $u \in B$, $v \in B_V$, and we are in the setting of Lemma 5.2 with $r' = (1 + \sqrt{2})\, r$. Moreover, the requirement $B \subset B_V$ in Assumption 2.1.2 is also fulfilled. Therefore, the following result is a direct application of Theorem 4.8.
Theorem 5.4. Assume that the signal dynamics are defined via a general dissipative differential equation on $\mathbb{R}^d$ with quadratic energy-conserving nonlinearity of the form (2.4), and that Assumption 5.1 is satisfied. Then there is $h^* > 0$ such that Assumption 2.1 is also satisfied for all $h < h^*$. Therefore, if $\{m_j\}_{j \ge 0}$ denotes the sequence of $B_V$-truncated nonlinear observers given by (3.5) and (5.6), then there is a constant $c > 0$, independent of the noise strength $\epsilon$, such that, for all discrete assimilation times $h < h^*$,
$\limsup_{j \to \infty} E|v_j - m_j|^2 \le c\,\epsilon^2.$
Consequently,
$\limsup_{j \to \infty} E|v_j - \hat{v}_j|^2 \le c\,\epsilon^2.$
5.1.1. The Lorenz '63 model. It is immediate from the definitions that the first, second, and fourth items of Assumption 5.1 are satisfied [10]. A verification of the third and fifth items can be found in the proof of Theorem 2.5 of [10].
To provide insight, in Table 1 we show a Monte Carlo estimate of the mean square error (MSE) $E|m_j - v_j|^2$ made by a truncated nonlinear observer for different values of the observation noise strength $\epsilon$. The results suggest that the MSE of this suboptimal filter decreases as $O(\epsilon^2)$, in agreement with our theoretical analyses. This provides an upper bound for the error made by the optimal filter. It is worth mentioning that the values of $h$ for which we observe accurate signal reconstruction are often far larger than the upper limits required by our theory.
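An experiment of this kind can be sketched as follows, using the standard Lorenz '63 parameters $\sigma = 10$, $\rho = 28$, $\beta = 8/3$; the observed coordinate, the gain $D = P$, the ball radius, the step sizes, and the run length are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(4)
SIG, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def f(u):
    x, y, z = u
    return np.array([SIG * (y - x), x * (RHO - z) - y, x * y - BETA * z])

def Psi(u, h=0.05, n_sub=10):
    # Time-h solution map, approximated by RK4 substeps.
    dt = h / n_sub
    for _ in range(n_sub):
        k1 = f(u); k2 = f(u + 0.5 * dt * k1)
        k3 = f(u + 0.5 * dt * k2); k4 = f(u + dt * k3)
        u = u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return u

P = np.diag([1.0, 0.0, 0.0])   # observe the x coordinate only
R = 100.0                      # truncation radius, comfortably containing the attractor

def proj(u):
    nrm = np.linalg.norm(u)
    return u if nrm <= R else (R / nrm) * u

def run(eps, n_steps=400):
    v = np.array([1.0, 1.0, 1.0])
    m = np.array([-10.0, 10.0, 30.0])   # observer starts far from the signal
    errs = []
    for j in range(n_steps):
        v = Psi(v)
        y = P @ v + eps * np.array([rng.standard_normal(), 0.0, 0.0])
        m = proj(Psi(m) + P @ (y - P @ Psi(m)))  # (3.5) with D = P
        if j >= n_steps // 2:                    # discard the transient
            errs.append(np.sum((v - m) ** 2))
    return np.mean(errs)

print([run(eps) for eps in (0.1, 0.01)])
```

Reducing $\epsilon$ by a factor of 10 should reduce the time-averaged squared error by roughly two orders of magnitude, mirroring the $O(\epsilon^2)$ trend reported in Table 1.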
5.1.2. The Lorenz '96 model. Define the projection matrix $P$ by replacing every third column of the identity matrix $I_{d \times d}$ by the zero column vector:
(5.7) $P = \bigl(e_1, e_2, 0, e_4, e_5, 0, \cdots\bigr).$
For a proof that the first, second, and fourth items of Assumption 5.1 are satisfied, see Property 2.1.1 in [15]. The third item results from combining Property 2.1.1 and Property 2.2.2 in [15]. Finally, since $A = I$, the fifth item holds with $c_3 = 1$, $c_4 = 0$.
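The observation matrix (5.7) and the complementary projection $Q$ can be built in a couple of lines; the dimension $d$ below is illustrative:

```python
import numpy as np

d = 9
P = np.eye(d)
P[:, 2::3] = 0.0   # zero out every third column: e_3, e_6, e_9, ...
Q = np.eye(d) - P  # projector onto the unobserved directions

assert np.allclose(P @ P, P) and np.allclose(P @ Q, 0.0)  # P, Q are complementary projections
assert P[:, 0].sum() == 1.0 and P[:, 2].sum() == 0.0      # e_1 kept, e_3 dropped
```

Two of every three state variables are observed, which is the pattern for which the squeezing property is verified in [15].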
As for the Lorenz '63 model, we show a Monte Carlo estimate of the error made by a truncated nonlinear observer in Table 2.
5.2. The Navier-Stokes equation on a two-dimensional torus. Let $\hat{H}$ be the space of zero-mean, divergence-free, vector-valued polynomials $u$ from $\mathbb{T}^2$ to $\mathbb{R}^2$. Let $H$ be the closure of $\hat{H}$ with respect to the $L^2$ norm. Finally, let $P_H : (L^2(\mathbb{T}^2))^2 \to H$ be the Leray-Helmholtz orthogonal projector [30]. Then, the operator $A$ and the symmetric bilinear form $B$ in (2.4) are given by
$A = -\nu P_H \Delta, \qquad B(u, v) = \tfrac{1}{2} P_H\bigl[(u \cdot \nabla)v + (v \cdot \nabla)u\bigr],$
where $\nu$ is the viscosity.
We assume that $f \in H$, so that $P_H f = f$. In the periodic case considered here, functions in $H$ admit the Fourier expansion
$u(x) = \sum_{k \in \mathbb{Z}^2 \setminus \{0\}} u_k\, e^{i k \cdot x}.$
The Fourier coefficients encode the divergence-free property and hence may be written as $u_k = \hat{u}_k\, k^{\perp}/|k|$ for scalar coefficients $\hat{u}_k$, where $|\cdot|$ is the Euclidean norm and, for $k = (k_1, k_2)$, $k^{\perp} = (k_2, -k_1)$. We now define the observation operator $P = P_\lambda$ in the general observation model (2.2) as
$P_\lambda u = \sum_{|k|^2 \le \lambda} u_k\, e^{i k \cdot x},$
and set $Q_\lambda = I - P_\lambda$. Several choices of noise fit our theory, and a natural one is given by
$w_1 = \sum_{|k|^2 \le \lambda} \xi_k\, e^{i k \cdot x}\, k^{\perp}/|k|,$
where $\xi_k \sim N\bigl(0, |k|^{-2} n(\lambda)^{-1}\bigr)$ and $n(\lambda) := \#\{k : |k|^2 \le \lambda\}$.
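The spectral cutoff $P_\lambda$ keeps only the Fourier modes with $|k|^2 \le \lambda$; this can be sketched for a scalar periodic function on a grid (acting componentwise on a velocity field is analogous; the grid size and the value of $\lambda$ are illustrative):

```python
import numpy as np

N, lam = 32, 9.0  # grid points per direction; cutoff |k|^2 <= lam

def P_lambda(u):
    # Spectral cutoff projection on the 2D torus via the FFT.
    k = np.fft.fftfreq(N, d=1.0 / N)          # integer wavenumbers
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    mask = (k1**2 + k2**2) <= lam
    return np.real(np.fft.ifft2(np.fft.fft2(u) * mask))

x = 2 * np.pi * np.arange(N) / N
X1, X2 = np.meshgrid(x, x, indexing="ij")
u = np.sin(2 * X1) + np.cos(5 * X2)           # modes with |k|^2 = 4 and 25

v = P_lambda(u)
# The |k|^2 = 4 mode survives; the |k|^2 = 25 mode is removed.
assert np.allclose(v, np.sin(2 * X1), atol=1e-10)
assert np.allclose(P_lambda(v), v, atol=1e-10)  # P_lambda is a projection
```

Only the low-wavenumber, large-scale part of the field is observed, while $Q_\lambda$ retains the small scales on which the dynamics are contracting.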
We have already defined $L^2$ divergence-free functions as an appropriate closure of $\hat{H}$, and denoted this space by $H$; we now define $H^1$ divergence-free functions as the closure of $\hat{H}$ with respect to the $H^1$ norm, and we denote this space $\mathcal{H}$. It is in $\mathcal{H}$ that we will apply our general theory. We define a norm in $\mathcal{H}$, equivalent to the $H^1$ norm, by
$\|u\|_{H^1}^2 := \sum_{k} |k|^2 |u_k|^2.$
Note that with this definition $E\|w_1\|_{H^1}^2 = 1$. Existence and uniqueness of strong solutions to this problem with initial conditions in $\mathcal{H}$ are guaranteed by standard theory; see [30], [6], or [25]. Take $|\cdot| = V(\cdot)^{1/2} = \|\cdot\|_{H^1}$. It is not difficult to prove the absorbing ball property for the Navier-Stokes equation with initial conditions in $\mathcal{H}$ [25]. Indeed, there is $\theta = \theta(\nu) > 0$ such that, for every $u \in \mathcal{H}$, $|Au|^2 \ge \theta |u|^2$. Then Assumption 2.1.1 is satisfied with $r_0 = |f|^2/\theta$ and $r_1 = \theta$. We hence set
(5.9) $B = \bigl\{u \in \mathcal{H} : |u| \le r := \sqrt{2}\,|f|/\theta\bigr\}.$
The following squeezing property is taken from [3], which uses the analysis in [10].
Lemma 5.7. For every $r' > 0$ there are constants $\alpha = \alpha(r') \in (0, 1)$ and $\lambda^* = \lambda^*(r') > 0$ with the property that, for $\lambda > \lambda^*$, there exists $h^* = h^*(r', \lambda)$ such that, for all $u, v \in B(r') := \{x \in \mathcal{H} : V(x)^{1/2} \le r'\}$ and assimilation time $h < h^*$,
$V\bigl(Q_\lambda(\Psi(u) - \Psi(v))\bigr) \le \alpha\, V(u - v).$
The previous lemma yields Assumption 2.1.2 for sufficiently small assimilation time $h$ by choosing $B_V = B$ and $r' = 2r$. The next result is then a straightforward application of Theorem 4.8.
Theorem 5.8. Take $|\cdot|$ and $V$ as above, and let $\{m_j\}_{j \ge 0}$ be the sequence of $B_V$-truncated nonlinear observers with $B_V = B$ given by (5.9). Then there are $h^*, \lambda^* > 0$ such that, for all $h < h^*$ and $\lambda > \lambda^*$, Assumption 2.1 is satisfied, and therefore there exists a constant $c > 0$, independent of the noise strength $\epsilon$, such that
$\limsup_{j \to \infty} E|v_j - m_j|^2 \le c\,\epsilon^2.$

Consequently, $\limsup_{j \to \infty} E|v_j - \hat{v}_j|^2 \le c\,\epsilon^2.$
6. Conclusions. We conclude by summarizing our work and highlighting future directions.
• Noisy observations can be used, in the long-time asymptotic regime, to compensate for uncertainty in the initial conditions of unstable or chaotic dynamical systems.
• We have determined conditions on the dynamics and observations under which the optimal filter accurately tracks the signal (and the variance of the filtering distributions becomes small) in the long-time asymptotic.
• These properties of the true filtering distribution are potentially useful for the design of improved algorithmic approximations of the filtering distributions.
• We have introduced a modification of the 3DVAR filter as a tool to prove our results. This new filter is potentially of interest in its own right as a practical algorithm.
• It would be interesting to study similar questions in continuous time, and to investigate the impact of other sources of uncertainty, such as those arising from incomplete knowledge of the parameters in the model.