Analysis of a localised nonlinear Ensemble Kalman Bucy Filter with complete and accurate observations

Concurrent observation technologies have made high-precision real-time data available in large quantities. Data assimilation (DA) is concerned with how to combine this data with physical models to produce accurate predictions. For spatio-temporal models, the Ensemble Kalman Filter with proper localization techniques is considered a state-of-the-art DA methodology. This article proposes and investigates a localized Ensemble Kalman-Bucy Filter (l-EnKBF) for nonlinear models with short-range interactions. We derive dimension-independent and component-wise error bounds and show that the long-time path-wise error has only logarithmic dependence on the time range. The theoretical results are verified through some simple numerical tests.


Introduction
With the advancement of technology, we now have access to vast amounts of high-precision data in many areas of science. It is important to develop robust and efficient tools to combine the available data with refined large-scale physical models. This study is known as data assimilation (DA) and typically the goal is to produce accurate real-time estimations of the current state of the system.
In geophysical problems, the considered models often have vast spatial scales; therefore, millions of state variables are needed to store information at different locations. Such high dimensionality poses a severe challenge to DA methodologies, since the associated computations are expensive and direct global uncertainty quantification tends to be erroneous. Over the last two decades, various computationally feasible approaches have been developed with practical success [22,12,7,19]. One of the most popular algorithms among these is the Ensemble Kalman filter (EnKF). It was first derived in [8] and has been heavily advanced and employed in the field of numerical weather prediction. To combat dimensionality issues arising due to the extent of the spatial domain, so-called localization techniques are often employed for the EnKF [21,9]. The key motivation behind localization is that many systems exhibit a natural decrease in spatial correlation. This can guide artificial tunings of the empirical covariance matrix to avoid spurious correlations.
The empirical success of the EnKF has aroused great interest in understanding its underlying theoretical properties [13,24,2,1]. EnKFs can be interpreted as Monte Carlo implementations of the Kalman Filter [11,10], which is derived for linear prediction and observation models. Therefore, most theoretical studies of EnKFs assume a linear setting [17,4,6,16,23]. Existing analyses of EnKFs for nonlinear models mostly concern the boundedness of algorithm outputs [14,24,13], which is not helpful in understanding EnKF performance. The only exception is a recent work [5], where accuracy and stability results have been derived assuming abundant and accurate observations. However, the results there do not consider localization, and hence they require the sample size to be larger than the state dimension, which is infeasible in practice.
This paper intends to close the aforementioned gaps, i.e. nonlinearity and high dimensionality in filter performance analysis, by investigating a localised Ensemble Kalman-Bucy Filter (l-EnKBF). Following [5], we assume abundant and accurate observations are available. Since most geophysical models are formulated through partial differential equations or their discretizations, the associated prediction dynamics often have a short interaction range. This is often paired with a short decorrelation length in the localization technique to reduce potential spurious long-range correlations. Under these assumptions, we show that the l-EnKBF estimation error for each component is bounded independently of the overall dimension, both in the mean-square sense and in terms of the moment generating function. To the best of our knowledge, no such result exists in the DA analysis literature. Some related dimension-independent error analyses can be found in [16,23], but the error estimates there are implicit and the models are assumed to be linear. Moreover, we also show that the long-time path-wise error has a logarithmic dependence on the time range, which is much weaker than the square-root dependence in [5]. All these results indicate that the l-EnKBF has stable and accurate estimation skills.
In Section 2 the underlying setting is outlined and the considered l-EnKBF is defined. Upper and lower bounds for the empirical second moment are derived in Section 3. Then point-wise and path-wise bounds for the mean squared error and a Laplace type condition are derived in Section 4 in the l2 sense, and in Section 5 in the component-wise sense. In Section 6, the numerical sensitivity of an implementation of the considered l-EnKBF with respect to the underlying assumptions is tested for the Lorenz 96 system. Throughout the article, ‖·‖ denotes the l2-norm and ⟨·,·⟩ its corresponding inner product. Given a matrix A ∈ R^{m×n}, the l2-operator norm is defined as ‖A‖ = √(λ_max(A^T A)), where λ_max denotes the largest eigenvalue of a matrix. The following two matrix norms are also useful to us: ‖A‖_max = max_{i,j} |[A]_{i,j}| and ‖A‖_1 = max_i Σ_j |[A]_{i,j}|, where both A_{i,j} and [A]_{i,j} denote the entries of the matrix A; the bracket notation is necessary to denote matrix entries such as [A^{−1}]_{i,j} or [AB]_{i,j}. Given two symmetric matrices A and B, A ⪰ B implies that the matrix A − B is positive semidefinite, which is equivalent to v^T (A − B) v ≥ 0 for all v ∈ R^n. Given a covariance matrix Γ, the Mahalanobis norm is defined by ‖v‖²_Γ = v^T Γ^{−1} v. Lastly, in order to describe the smallness of certain quantities, we use the big-O notation: a quantity a_ε is O(ε^p) if there is an ε-independent constant C > 0 so that, for sufficiently small ε, C^{−1} ε^p ≤ a_ε ≤ C ε^p.
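The norms and the Mahalanobis distance above are straightforward to compute numerically; the following is a minimal NumPy sketch (the function names are ours, chosen for illustration):

```python
import numpy as np

def op_norm(A):
    # l2 operator norm: square root of the largest eigenvalue of A^T A
    return float(np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A))))

def max_norm(A):
    # ||A||_max: largest matrix entry in absolute value
    return float(np.abs(A).max())

def one_norm(A):
    # ||A||_1: largest absolute row sum
    return float(np.abs(A).sum(axis=1).max())

def mahalanobis_sq(v, Gamma):
    # ||v||_Gamma^2 = v^T Gamma^{-1} v, without forming the inverse explicitly
    return float(v @ np.linalg.solve(Gamma, v))

A = np.array([[2.0, 1.0], [1.0, 2.0]])
print(op_norm(A))   # ~ 3.0: the largest eigenvalue of this symmetric matrix
print(one_norm(A))  # 3.0
```

For symmetric matrices the two printed values coincide, which is exactly the inequality ‖A‖ ≤ ‖A‖_1 used repeatedly in Section 3.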

Problem setup
In this paper, we consider a continuous-time filtering problem, formulated by

dX_t = f(X_t) dt + √2 σ dW_t,
dY_t = H X_t dt + R dB_t.    (1)

In (1), X_t ∈ R^{N_x} represents the system we try to recover. We assume its initial distribution is given by X_0 ∼ π_0. Its dynamics is driven by a deterministic forcing described by a map f : R^{N_x} → R^{N_x} and a stochastic forcing term √2 σ dW_t. We assume linear noisy observations Y_t ∈ R^{N_y} of the system are available. In (1), the matrices σ and R are positive definite, and W_t ∈ R^{N_x} and B_t ∈ R^{N_y} are independent Wiener processes.
In many spatial models, each model component represents the state at one spatial location. This introduces a natural distance between two indices, which we denote by d. For example, if the indices represent points on the interval [1, n], then d(i, j) can be taken as |i − j|. For another example, if the indices represent equally spaced points on a circle of length n, then d(i, j) can be taken as min{|i − j|, |i + n − j|, |j + n − i|}.
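Both example distances are one-liners in code; a small sketch (the function names are ours):

```python
def dist_line(i, j):
    # distance for indices on the interval [1, n]
    return abs(i - j)

def dist_circle(i, j, n):
    # distance for equally spaced points on a circle of length n
    return min(abs(i - j), abs(i + n - j), abs(j + n - i))

# On a circle of length 40, indices 1 and 40 are direct neighbours:
print(dist_circle(1, 40, 40))  # 1
print(dist_line(1, 40))        # 39
```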
We will use x_i(t) to denote the i-th component of X_t, so X_t = [x_1(t), . . . , x_{N_x}(t)]^T. We will also use f_i and w_i(t) to denote the i-th components of f and W_t. For notational simplicity, we will often write x_i(t) as x_i and w_i(t) as w_i, whenever their dependence on time is evident. Then the SDE that x_i follows is given by

dx_i(t) = f_i(X_t) dt + √2 [σ dW_t]_i.

Note that different components interact through the drift term, as f_i(X_t) could depend on x_j(t) for j ≠ i. But in many physical processes, such interactions are of short range, meaning the dependence of f_i(X_t) on x_j(t) decays with d(i, j). More generally, this can be formulated as the following assumption.

Assumption 2.1 There are constants F_k ≥ 0 such that |f_i(x) − f_i(y)| ≤ Σ_j F_{d(i,j)} |x_j − y_j| for all indices i and all x, y ∈ R^{N_x}.

To have a single number controlling the overall stability of the system, we will consider the largest row sum of these Lipschitz constants and define C_f = max_i Σ_j F_{d(i,j)}. We will assume that C_f is a constant independent of the dimension N_x. This can be verified if F_k decays to zero exponentially with increasing k. In Section 6, we demonstrate how to verify Assumption 2.1 on the Lorenz 96 model, assuming all components are bounded.

Localized ensemble Kalman-Bucy filter
In the EnKBF, an ensemble of particles, denoted by {X^i_t}_{i=1,...,M}, is used to describe the uncertainty of X_t. To run this algorithm, each particle is initialized at a random location drawn from π_0 and then driven by the following dynamics

dX^i_t = f(X^i_t) dt + √2 σ dW^i_t + P_t H^T (RR^T)^{−1} ( dY_t − ½ H (X^i_t + X̄_t) dt ).    (4)

In (4), the sample mean and covariance are defined by

X̄_t = (1/M) Σ_{i=1}^M X^i_t,    P_t = (1/(M−1)) Σ_{i=1}^M (X^i_t − X̄_t)(X^i_t − X̄_t)^T.

The posterior distribution of X_t conditioned on Y_{s≤t} is then approximated by the Gaussian distribution N(X̄_t, P_t). When the dimension is high, the EnKBF is in general ill-defined and it can perform poorly, for two reasons. First, the rank of P_t is at most M − 1. So if M ≪ N_x, P_t is singular and its numerically approximated inverse is usually unstable. Second, by random matrix theory, it is known that if the X^i_t are i.i.d. samples from a Gaussian distribution N(0, P), then in order for the covariance sampling error in l2-norm, ‖P − P_t‖, to be small, one needs M = O(N_x). In other words, P_t is a very inaccurate approximation of the true posterior covariance when M ≪ N_x [23].
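The rank deficiency of the sample covariance for M ≪ N_x is easy to observe numerically; a minimal sketch with the dimensions used later in Section 6:

```python
import numpy as np

rng = np.random.default_rng(0)
N_x, M = 40, 10                           # state dimension and ensemble size

ensemble = rng.standard_normal((M, N_x))  # rows play the role of particles X^i_t
mean = ensemble.mean(axis=0)              # sample mean
dev = ensemble - mean                     # deviations sum to zero over the ensemble
P = dev.T @ dev / (M - 1)                 # sample covariance, N_x x N_x

# Since the M deviations satisfy one linear constraint, rank(P) <= M - 1,
# so P is singular whenever M <= N_x and cannot be inverted directly.
print(np.linalg.matrix_rank(P))  # 9
```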
In practice, one popular way to resolve the issues mentioned above is to apply covariance localization. Mathematically, this operation can be formulated as replacing P_t in (4) with P^L_t = P_t ∘ φ. Here ∘ denotes the component-wise product or Schur product, so the components of P^L_t are defined as [P^L_t]_{i,j} = [P_t]_{i,j} φ_{i,j}. The symmetric matrix φ here is called a localization matrix. Its components are nonnegative, of value 1 on the diagonal, and decay to zero extremely fast in the off-diagonal direction. One popular choice takes the form φ_{i,j} = ρ(d(i,j)/l), where ρ is the function from (4.10) in [9],

ρ(x) = 1 − (5/3)x² + (5/8)x³ + (1/2)x⁴ − (1/4)x⁵ for 0 ≤ x ≤ 1,
ρ(x) = 4 − 5x + (5/3)x² + (5/8)x³ − (1/2)x⁴ + (1/12)x⁵ − 2/(3x) for 1 ≤ x ≤ 2,
ρ(x) = 0 for x ≥ 2,    (6)

and l denotes the typical decorrelation length, which we assume to be independent of N_x. We will consider again the largest row sum of φ and define C_φ = max_i Σ_j φ_{i,j}. We will assume C_φ is a constant independent of the dimension N_x. This is true for most practical localization matrices, including (6). When the true covariance matrix is spatially localized, P^L_t is a much better covariance estimator, because the localization operation eliminates spurious long-distance correlation errors [3]. Moreover, the localization operation improves the rank, so P^L_t is often full rank and invertible. But this is not guaranteed in general, so for the rigorousness of this exposition we use the following inversion.

Definition 2.2 If all diagonal entries of P_t are nonzero, then its diagonal inverse (DI) is given by P^†_t = diag([P_t]_{1,1}, . . . , [P_t]_{N_x,N_x})^{−1}. Note that it satisfies [P_t]_{i,i} [P^†_t]_{i,i} = 1 for all i = 1, . . . , N_x.

In the original EnKBF formulation (4), we replace P_t with P^L_t and P_t^{−1} with P^†_t, and we obtain the localized EnKBF (l-EnKBF):
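A sketch of the localization matrix built from the Gaspari-Cohn function, for indices on a circle (the helper names are ours):

```python
import numpy as np

def gaspari_cohn(x):
    # Compactly supported correlation function from (4.10) in [9]:
    # a fifth-order piecewise rational that vanishes for x >= 2
    x = abs(x)
    if x <= 1:
        return 1 - (5/3)*x**2 + (5/8)*x**3 + (1/2)*x**4 - (1/4)*x**5
    if x <= 2:
        return (4 - 5*x + (5/3)*x**2 + (5/8)*x**3
                - (1/2)*x**4 + (1/12)*x**5 - 2/(3*x))
    return 0.0

def localization_matrix(n, l):
    # phi_{i,j} = rho(d(i,j)/l) with the circular distance d
    d = lambda i, j: min(abs(i - j), abs(i + n - j), abs(j + n - i))
    return np.array([[gaspari_cohn(d(i, j) / l) for j in range(n)]
                     for i in range(n)])

phi = localization_matrix(40, 1.4)
print(phi[0, 0])                      # 1.0: the diagonal is untouched
print(np.abs(phi).sum(axis=1).max())  # C_phi, the largest row sum
```

For l = 1.4 the largest row sum stays below 2 regardless of n, so C_φ is indeed independent of N_x, and the off-diagonal row sums stay below 1.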

Abundant and accurate observations
When the observation sources are abundant, H in (1) can be assumed to be of rank N_x, and there is an invertible linear transformation of the state and observation processes; the transformed processes X̃_t and Ỹ_t then follow an SDE of the same form (10). If we apply the l-EnKBF (9) to the transformed system (X̃_t, Ỹ_t), then the sample mean and covariance matrices follow the correspondingly transformed equations. It is evident that the theoretical properties of X̃^i_t will be the same as those of X^i_t. Note that (10) corresponds to the original model (1) with σ = 1 and H = I. This is a much simplified parameter setting for the follow-up discussion, and from the above derivation there is no sacrifice of generality in focusing on it. Under this setting, the l-EnKBF formula is simplified as well. When the observations are accurate and independent, the observation noise covariance RR^T is a diagonal matrix with small components. We will use ε to describe their order. In summary, we have made the following assumption.

Assumption 2.3 Through a linear transformation, we assume (1) is transformed to

dX_t = f(X_t) dt + √2 dW_t,    dY_t = X_t dt + R dB_t;

moreover, Ω = ε(RR^T)^{−1} is diagonal and bounded by constants: ω_min I ⪯ Ω ⪯ ω_max I.
By replacing (RR^T)^{−1} with ε^{−1}Ω, the l-EnKBF formula is written as (11); the sample mean process then follows the dynamics (12), and the components of the sample covariance P_t = (1/(M−1)) Σ_{i=1}^M (X^i_t − X̄_t)(X^i_t − X̄_t)^T evolve according to a Riccati-type equation, (13).

Wellposedness and Stability
Before the accuracy of the filter can be addressed it is crucial to check if the l-EnKBF can blow-up or collapse.
In other words, we will demonstrate that the filter is stable, in the sense that there are upper and lower bounds for the entries of P_t.

Matrix norms and Riccati equation
To start, we have several norm inequalities which are utilized in this paper.
Lemma 3.1 For any N × N matrix A, the following holds:

‖A‖_max ≤ ‖A‖,    (14)
‖A‖ ≤ √( ‖A‖_1 ‖A^T‖_1 ).    (15)

Proof Inequality (14) follows via |[A]_{i,j}| = |⟨e_i, A e_j⟩| ≤ ‖A‖, where e_i and e_j are the i-th and j-th standard Euclidean basis vectors. Inequality (15) is the classical bound of the operator norm by the maximal row and column sums.

Proof Claim 1) follows directly from the definitions. For claim 2), by the linearity of the Schur product, it suffices to show that 0 ⪯ P ∘ φ. This is known as the Schur product theorem, which can be verified using the following identity, which holds for all N_x-dimensional vectors u, with D_u being the diagonal matrix whose diagonal entries are the same as u:

u^T (P ∘ φ) u = tr(D_u φ D_u P) ≥ 0.

For claim 3), since P is positive semidefinite, for each i and j it follows that 2⟨e_i, P e_j⟩ ≤ ⟨e_i, P e_i⟩ + ⟨e_j, P e_j⟩ ≤ 2 max_i [P]_{i,i}, where e_i and e_j are the i-th and j-th standard Euclidean basis vectors. In other words, in a positive semidefinite matrix the maximal values are attained on the diagonal. Note that the Schur product P ∘ φ is positive semidefinite as well, so its maximal matrix entries are also attained on the diagonal. Since φ satisfies [φ]_{i,i} = 1 for all i, the Schur product does not alter the diagonal entries of P, thus ‖P ∘ φ‖_max = ‖P‖_max. Recall that inequality (15) implies ‖A‖ ≤ ‖A‖_1 for any symmetric matrix A, which yields the first half of claim 4), since P ∘ φ is symmetric. The other half can be obtained by ‖P ∘ φ‖_1 ≤ C_φ ‖P‖_max. In this paper, we are often concerned with the component-wise maximum of processes driven by Riccati-type stochastic equations. The following lemma is a tool to analyze such quantities.
Lemma 3.3 Suppose m_t satisfies (d/dt) m_t ≤ g(m_t, t), and y_t satisfies (d/dt) y_t = g(y_t, t) + δ_0 for a fixed δ_0 > 0 with y_0 > m_0. Then the following hold: 1) For all t > 0, y_t > m_t.
Proof For claim 1), let t_1 = inf{t > 0 : y_t ≤ m_t}. By continuity of m_t and y_t, t_1 > 0. Suppose t_1 is finite; then y_{t_1} = m_{t_1}, and the drift comparison gives (d/dt)(y_t − m_t) ≥ δ_0 > 0 at t_1. This indicates that for sufficiently small δ > 0, y_t > m_t on (t_1, t_1 + δ], which contradicts the definition of t_1. Therefore t_1 = ∞. For claim 2), denote the root of g(x, t) + δ_0 = 0 by y_+. Next note that y_t is the solution of a Riccati differential equation, whose solution has the explicit formulation (16). If y_0 < y_+, it is easy to check that (16) always takes negative values, meaning y_t < y_+ < y_+ − y_− = ∆_ε for all t > 0. When y_0 > y_+ and t > t_*, (16) leads to the first part of the claim. Next note that when 0 ≥ x ≥ −c_ε and t > t_*, g(x, t) ≤ −γ/3. So y_t will be decreasing with rate at least γ/3 when 0 ≥ y_t ≥ −c_ε and t > t_*, which leads to our claim.

Upper bounds for sample covariance
Lemma 3.4 Under Assumptions 2.1 and 2.3, suppose P^L_t, evolving in time according to (13), exists. Then for all t > 0, ‖P_t‖_max ≤ max{‖P_0‖_max, λ_max}. It is clear that when C_f and ω_min are constants, so is λ_max. In [5] the corresponding bound depends explicitly on M (as the Frobenius norm is used to derive the bound); here a different route is taken, which results in a bound independent of M.
Proof Recall that P_t is positive semidefinite; therefore, by Lemma 3.2, ‖P_t‖_max is attained on the diagonal, where the components of P_t follow the ODE (13). Therefore, in order to apply Lemma 3.3 claim 1), it suffices to investigate the ODE driving the component with the maximal value. Suppose at time t that ‖P_t‖_max = [P_t]_{k,k}. This leads to (18). Lastly, we have (20). Inserting (18), (19) and (20) into (17), we find that Lemma 3.3 claim 1) applies. Letting δ_0 = 1, Lemma 3.3 claim 2) yields the result of this lemma.

Lower bounds for sample covariance
To ensure that the filter does not collapse, it is crucial to have a lower bound on the covariance. This comes as the reverse of Lemma 3.4. For this purpose, we denote ‖P_t‖_min = min_{1≤i≤N_x} [P_t]_{i,i}. It should be noted that ‖P_t‖_min is not a norm; we choose this notation just for its symmetry with ‖P_t‖_max.
Lemma 3.5 Under Assumptions 2.1 and 2.3, suppose P^L_t evolves in time according to (13). Then the stated lower bound holds for sufficiently small ε > 0, and it is clear that when C_f and ω_min are constants, so is the resulting bound.

Proof The proof is similar to the one of Lemma 3.4, but we need to change sign. By Lemma 3.3 claim 1), we assume at time t that ‖P_t‖_min = [P_t]_{k,k} and investigate the ODE that −[P_t]_{k,k} follows, which is given by the negative of (17). Following the same procedure as prior to (18), we obtain (21), while (19) remains the same. Finally, recall that in (20) we have the bound (22). Inserting (21), (19), and (22) into (17), we find that we can apply Lemma 3.3 claim 3) with δ_0 = 1 and the corresponding parameters. This gives us the claimed result.

Wellposedness of l-EnKBF
Since P^†_t is well defined as long as ‖P_t‖_min > 0, using the same proof as in Theorem 2.3 of [5], we can show that the l-EnKBF given by (9) has a strong solution.

Corollary 3.6 Suppose the initial ensemble is selected so that ‖P_0‖_min > 0. Then the l-EnKBF is well defined for all t > 0.

Error analysis in the l2 norm
As the next step, we consider the accuracy of the l-EnKBF in terms of the l2 norm. Since the filter estimates the state with the ensemble mean, the error is its deviation from the truth, e_t = X̄_t − X_t. While it has already been shown in [5] that ‖e_t‖² is of order N_x √ε through tail probabilities, our new result extends this estimate to Laplace transforms. Moreover, we show that the path-wise maximum has logarithmic scaling with time, indicating the filter is highly stable in terms of error.
Theorem 4.1 Let e_t = X̄_t − X_t be the filter error of the l-EnKBF (11). Under Assumptions 2.1 and 2.3, for any fixed t_0 > 0 there are constants c and C such that for sufficiently small ε > 0: 1) For any t ≥ t_0, the mean squared error satisfies E_{t_0} ‖e_t‖² ≤ C N_x ε^{1/2}. 2) For any 0 < λ < c ε^{−1/2}, the Laplace transform E_{t_0} exp(λ ‖e_t‖²) is bounded. 3) For any T > t_0, the path-wise maximum of ‖e_t‖² over [t_0, T] admits a bound that grows only logarithmically in T. Here E_{t_0} denotes conditional expectation with respect to information available at time t_0.
Note that the ε^{1/2} scaling is sharp. This can be understood best if one applies the Kalman-Bucy filter to (1) with f(X) = 0, H = I_{N_x} and R = √ε I_{N_x}; the posterior covariance P_t then follows the ODE (d/dt) P_t = 2 I_{N_x} − ε^{−1} P_t². It is easy to show that P_t will converge to the limit P_∞ = √(2ε) I_{N_x}, which is of order ε^{1/2} as well.
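This scalar Riccati fixed point is easy to verify numerically; a minimal sketch with parameter values of our choosing:

```python
import math

eps = 0.01
p, dt = 1.0, 1e-4             # initial variance and Euler step

# Integrate dP/dt = 2 - P^2 / eps up to time 20
for _ in range(200_000):
    p += dt * (2.0 - p * p / eps)

print(p)                      # ~ 0.1414
print(math.sqrt(2 * eps))     # sqrt(2*eps) ~ 0.1414, the predicted limit
```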

Evolution of component-wise error
Before we prove the statements of Theorem 4.1 we consider the following auxiliary lemma which will be used several times throughout the remainder of the paper.
Lemma 4.2 Let e_j(t) be the j-th component of the filter error e_t. Then the decomposition (23) holds. Note here we write e_j(t) as e_j for notational simplicity. In (23), M_t is an N_x-dimensional martingale, and α_t and β_t are two real-valued processes; by Lemmas 3.4 and 3.5, both admit bounds holding for all t ≥ 0, and when t ≥ t_* these bounds can be further improved.

Proof Recall that the evolutions of X_t and X̄_t are given by dX_t = f(X_t) dt + √2 dW_t and (12), respectively. The evolution of the error e_t = X̄_t − X_t is given by the difference between the two. The j-th component of this differential equation is (24), where f_j denotes the j-th component of f. Ito's formula then yields (25). To continue, note that by Assumption 2.1 the second part of (25) can be bounded easily. To bound the first part of (25), we note by Assumption 2.1 and Cauchy-Schwarz that (27) holds; multiplying (27) by |e_j| yields (28). Plugging these into (25), we obtain the drift bound. Next, we deal with [P^L_t Ω e_t]_j in (24). Define φ̃ := φ − ρI and obtain the equality (29); also note (30). Plugging (30), (29), and (28) into (24), we obtain the claim.
When α and β are both positive, we further have the following bound.

Proof Consider u′_t = exp(α(t − t_0)) u_t. Then its evolution follows du′_t = α u′_t dt + exp(α(t − t_0)) du_t ≤ exp(α(t − t_0)) β dt + exp(α(t − t_0)) dM_t. Integrating both sides from t_0 to t and then taking the conditional expectation leads to our claim.
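The integrating-factor step above is the standard Gronwall argument: for the deterministic part du/dt = −αu + β it forces u_t toward β/α. A quick numeric sketch (parameter values are ours):

```python
import math

alpha, beta = 2.0, 3.0
u, dt, n = 10.0, 1e-4, 100_000      # start at u_0 = 10, integrate to t = 10

for _ in range(n):
    u += dt * (-alpha * u + beta)   # du/dt = -alpha*u + beta

exact = beta / alpha + (10.0 - beta / alpha) * math.exp(-alpha * n * dt)
print(u, exact)  # both ~ 1.5 = beta / alpha
```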

Proof of Theorem 4.1
Claim 1). Note that φ̃ ⪰ 0, and thus Σ_{i,j} [P_t ∘ φ̃]_{j,i} e_i e_j ≥ 0, and Σ_i F_{d(i,j)} ≤ C_f. So utilizing Lemma 4.2 and summing over all j on both sides of (23) yields (32), with a martingale term M′_t. For t ∈ [0, t_*], (32) can be further upper-bounded using the crude bounds on α_t and β_t; for t ≥ t_*, (32) can be further upper-bounded using the improved ones. Since α′_* = α_* − C_f = O(ε^{−1/2}), β_* = O(1), and ε^{−1} exp(−λ ε^{−1/2}) = o(1) for any λ > 0, claim 1) is proved. Claim 2). First we note that the quadratic variation of the martingale term M′_t is given by (33). So by Ito's formula applied to exp(λ ‖e_t‖²), the bound (34) holds. By Lemmas 3.4 and 3.5, we have bounds valid for all t > 0. For t ≤ t_*, by Gronwall's inequality we have the claimed estimate, and when t ≥ t_* we distinguish two cases: when (1/4) λ α′_* ‖e_t‖² ≤ λ β_* N_x, and otherwise, when (1/4) λ α′_* ‖e_t‖² ≥ λ β_* N_x.
In summary, we always have the drift bound (35). After applying Grönwall's inequality and (34), we obtain the desired estimate, which as t → ∞ leads to claim 2). Claim 3). We consider the function (36); by finding its critical point, it is easy to see the stated inequality, and combining this with (35) gives the drift control. So by Dynkin's formula, if we let τ = min{t ≥ t_0 : ‖e_t‖² ≥ M}, we obtain a bound on the stopped process. By the Markov inequality and then Lemma 4.4, taking λ = λ_* = O(ε^{−1/2}) yields our claimed result.

Analysis for component-wise error
While Theorem 4.1 provides an estimate of ‖e_t‖², the estimate scales with N_x because ‖e_t‖² is the sum of N_x component errors. From Theorem 4.1, it is impossible to infer the error of one specific component, or whether this component's error is independent of the dimension N_x. This section shows that, with a stronger structural assumption on the localization matrix, we can derive dimension-independent bounds for each individual component.

Assumption 5.1
The localization matrix φ is diagonally dominant; in other words, there is a q < 1 such that Σ_{j≠i} φ_{i,j} ≤ q for all i. Moreover, the interaction between components can be dominated by a constant C_F-multiple of the matrix structure φ: F_{d(i,j)} ≤ C_F φ_{i,j} for all i, j. Since F_d usually decays to zero quickly in practice, such a constant C_F is likely to exist. Using Lemma 3.1, it is easy to show that a φ satisfying Assumption 5.1 will have φ ⪰ (1 − q)I, meaning φ̃ = φ − qI is positive semidefinite. In other words, Assumption 5.1 is stronger than the assumption for φ imposed in Theorem 4.1. In general, φ is not always diagonally dominant; however, this can hold if one chooses a small localization length l. For example, the Gaspari-Cohn [9] distance matrix φ will be diagonally dominant if l ≤ 1.4. In other words, Assumption 5.1 is likely to hold if the components of the model represent spatial information at locations far apart.
With Assumption 5.1, we can reproduce Theorem 4.1 type of result for individual component.
Theorem 5.2 Let e_t = X̄_t − X_t be the filter error of the l-EnKBF (11). Under Assumptions 2.1, 2.3, and 5.1, for any fixed t_0 > 0 there are constants c and C such that for sufficiently small ε > 0 the following hold. 2) For any 0 < λ < c ε^{−1/2} and index i, the component-wise Laplace transform is bounded. 3) For any T > t_0, the corresponding path-wise bound holds for all i. Here E_{t_0} denotes conditional expectation with respect to information available at time t_0.

4) For any T > t 0 ,
Remark If Z_1, . . . , Z_n are i.i.d. samples of a standard Gaussian distribution, a rough estimate of max_i {Z_i} is of order √(2 log n), so the maximal squared value is of order log n. And when a system has short-range interactions, its components tend to be nearly independent when they are far apart. Likewise, when a system is stationary, it is close to independent of itself in the distant past. The filter error process happens to have both of these properties. That is why we have the scaling of log(N_x T) in claim 4).
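The √(2 log n) heuristic for the maximum of n i.i.d. standard Gaussians can be checked with a quick Monte Carlo sketch:

```python
import math
import random

random.seed(1)
n = 100_000
m = max(random.gauss(0.0, 1.0) for _ in range(n))

# The maximum concentrates near sqrt(2 log n) ~ 4.8 for n = 1e5,
# so the squared maximum is of order log n.
print(m, math.sqrt(2 * math.log(n)))
```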

Component-wise Lyapunov weights
In order to bound e²_i(t) over long times, it is necessary to build a Lyapunov function for it. The main challenge here is that the dynamics of e²_i(t) are coupled with the errors of other components. The idea is to find a weight vector v^i so that Σ_j v^i_j e²_j(t) is a Lyapunov function. The design of v^i happens to relate to the structure of φ, and can be expressed through the Green's function of a Markov chain. Consider a Markov chain X̃_t on the points {1, . . . , N_x}, with transition probabilities determined by the entries of φ. Fix an index i ∈ {1, . . . , N_x} and define the vector v^i through this chain. Then v^i satisfies the following properties (Lemma 5.3): 2) For all indices j, Σ_{l≠j} φ_{j,l} v^i_l ≤ v^i_j.

3) The total weight Σ_{j=1}^{N_x} v^i_j is bounded independently of N_x.

Proof The first property follows from the definition of v^i; this is claim 1). Next, by a first-step analysis of the Markov chain, we find the recursion (37), which, together with the symmetry of φ, yields claim 2). For claim 3), we sum (37) over all j; the resulting bound leads to our claim.

Proof of Theorem 5.2
Claim 1). Recall that Lemma 4.2 has shown the decomposition (38). Recall that φ̃ = φ − ρI. In the following, we use P_{j,i} to denote the (j, i)-th component of P_t. Then, by Cauchy-Schwarz and Young's inequality, (38) leads to (39). We denote the vector E_t = [e²_1, e²_2, · · · , e²_{N_x}]^T and use the weight vector v^i from Lemma 5.3. Then the SDE of the weighted error ⟨v^i, E_t⟩ can be bounded by a linear combination of (39), which is (40)-(41); we have used claims 3) and 2) of Lemma 5.3 at (40) and (41). Between time 0 and t_ε, recalling the upper bound in Lemma 4.2 and applying Gronwall's inequality gives the first estimate. After t_ε, for any t, applying Gronwall's inequality again yields the claim.

Numerical investigation
Lastly, the theoretical findings are numerically verified by means of the stochastically perturbed Lorenz 96 system (L96) [15]. The evolution of each spatial component is given by

dx_s(t) = ( (x_{s+1}(t) − x_{s−2}(t)) x_{s−1}(t) − x_s(t) + F ) dt + √2 dw_s(t),    (46)

with constant forcing F = 8 (the standard choice), and spatial periodicity is assumed, i.e., x_{−1}(t) = x_{N_x−1}(t), x_0(t) = x_{N_x}(t) and x_{N_x+1}(t) = x_1(t). Numerically generated trajectories of (46) are typically bounded in the l∞ norm, uniformly over s and t. In other words, the solution of (46) is largely indifferent from a soft-truncated version dX_s(t) = f̃_s(X(t)) dt + √2 dW_s(t), where f̃ coincides with the L96 drift on the bounded region and is truncated outside of it; we refer to this truncated system as (48). Therefore, Assumption 2.1 is fulfilled with F_{d(i,j)} = 0 for d(i, j) > 2, where d(i, j) = min{|i − j|, |i + n − j|, |j + n − i|}, and (48) has only short-range interactions. While we will only simulate (46) below, we expect the associated filter behavior to be similar to that of (48). Further, the entries of the localization matrix φ are set to φ_{i,j} = ρ(d(i, j)/l), using the Gaspari-Cohn function (6) for ρ and setting the localization radius to l = 1.4. Note that this choice of localization radius ensures that φ is diagonally dominant, i.e., Assumption 5.1 is fulfilled. It is important to note that this choice is not necessarily the optimal value for the considered system, yet the chosen value is sufficient to obtain reasonable MSE values of the expected order. Further, we choose the model noise variance to be σ = 1 and the observation operator H to be the identity matrix, which is in line with Assumption 2.3. Three test scenarios are considered to numerically verify the sensitivity of the l-EnKBF with respect to the dimension N_x, the time interval size T and the measurement error ε.
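The L96 drift and its short interaction range can be checked directly; a minimal sketch using the periodic indexing above (the standard forcing F = 8 is assumed):

```python
import numpy as np

def l96_rhs(x, F=8.0):
    # Lorenz 96 drift: (x_{s+1} - x_{s-2}) x_{s-1} - x_s + F, periodic indexing
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

# The constant state x = F is a fixed point of the drift.
print(np.allclose(l96_rhs(np.full(40, 8.0)), 0.0))  # True

# Short-range interactions: perturbing component 10 only changes the drift
# of components at distance at most 2, so F_{d(i,j)} = 0 for d(i,j) > 2.
rng = np.random.default_rng(0)
x = rng.standard_normal(40)
y = x.copy()
y[10] += 1.0
changed = np.nonzero(l96_rhs(y) - l96_rhs(x))[0]
print(changed)  # [ 9 10 11 12]
```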

Sensitivity with respect to ǫ
In the first test scheme, the expected filtering error is approximated via a time-averaged MSE for different measurement error values ǫ ∈ {0.003125, 0.00625, 0.025, 0.05, 0.1}.
In order to emulate a continuous setting, the step size is chosen to be dt = 10^{−7} and the number of steps 10^7. The dimension of the state space is set to N_x = 40, which is a standard choice for the Lorenz 96 model. The l-EnKBF is implemented with M = 10 ensemble members. The results are displayed in the left panel of Figure 6.1. Note that the MSE is normalised with respect to the dimension, i.e., divided by N_x. The test run confirms that the numerical growth rate with respect to an increasing ε is in line with the theoretical order of the expected error derived in claim 1) of Theorem 4.1.

High dimensional testcase
In the second test scheme, the robustness with respect to the state space dimension is investigated. In particular, we consider the case where the number of ensemble members M is comparatively small and kept fixed for increasing dimensions; thus the imbalance between ensemble size and dimension of the state space grows with increasing N_x. More precisely, we run the filter for a range of increasing dimensions N_x and observe that the component-wise errors are dimension independent (see right panel of Figure 6.2), as stated in claim 1) of Theorem 5.2. Note that we fixed the considered component of the state vector to be i = 11, while other index choices produce largely the same results.

Uniform error for bounded time interval
In the final test scheme, we consider a setting with a growing number of steps, 10^6 to 10^7, for a fixed step size dt = 10^{−5}, resulting in filter runs for different time values T ∈ {10, . . . , 100}. Note that the step size is set to be slightly larger than in the previous examples so that the range of considered T values is more interesting. Further, the measurement error variance is set to ε = 0.01 and the dimension of the state space is N_x = 40. We simulate the filtering process 30 times and record the filter error e^{(j)}(t), j = 1, . . . , 30, for each simulation. We plot the path-wise l2-squared error up to T, averaged over the 30 runs.

Conclusion
In this paper, the earlier derived stability and accuracy results for the EnKBF are extended to systems with N_x ≫ M via localization. Further, the upper bound for the covariance is independent of the number of ensemble members M, and the derived path-wise bounds have a better scaling with respect to the time T. Moreover, it is shown that the accuracy in the individual components is independent of the state dimension N_x, and a Laplace type condition is obtained. Natural extensions include partially observed processes and misspecified drift functions f(x_t, λ) with unknown parameter λ. Moreover, the presented ideas can be used for the analysis of properties of consistent filters, such as the Ensemble Transform Particle Filter [20] or the Feedback Particle Filter [25], for a finite number of ensemble members.