On the consistency of ensemble transform filter formulations

In this paper, we consider the data assimilation problem for perfect 
differential equation models, i.e. models without model error, and for either 
continuous or intermittent observational data. The focus is on the popular class of 
ensemble Kalman filters, which rely on a Gaussian approximation in the 
data assimilation step. We discuss the impact of this approximation on 
the temporal evolution of the ensemble mean and covariance matrix, as well as 
options for reducing the arising inconsistencies, which turn out to be more severe 
for the intermittent data assimilation problem. The inconsistencies 
cannot, however, be eliminated completely due to the classic moment 
closure problem. It is also found for the Lorenz-63 model that the 
proposed corrections only improve the filter performance for 
relatively large ensemble sizes.


SEBASTIAN REICH AND SEOLEUN SHIN

1. Introduction. Let us consider an ordinary differential equation (1) with state vector x ∈ R^N. We assume that there is a reference trajectory x_ref(t) for t ≥ t_0 which we would like to approximate. However, contrary to standard initial value problems, we do not have access to the initial value x_ref(t_0). Instead we treat the model states x(t) as realizations of a random variable with known initial probability density function (PDF) π_0(x) at time t_0. The PDF for the model states x(t) at t > t_0 is then provided by the solution of the Liouville equation with initial PDF π(x, t_0) = π_0(x). In general, the PDF π(x, t) will have little in common with the measure π_ref(x, t) = δ(x − x_ref(t)) induced by the reference solution for t ≫ t_0, even if π_0 and π_ref(·, t_0) are close. Here δ(·) denotes the Dirac delta function, and closeness should be measured with an appropriate distance function such as the Wasserstein distance of two measures. See, e.g., [17].
In this paper, we consider the situation where, in addition to the model (1) and the initial PDF π_0, we have access to partial observations of the state variable x. If the observations are taken continuously in time, we speak of a continuous data assimilation problem; if the observations are taken at discrete points in time, we have an intermittent data assimilation problem. Details of the mathematical setting will be discussed in the following section. See also the excellent introduction by Jazwinski [11].
Numerical methods for approximating either the continuous or the intermittent data assimilation problem are the subject of ongoing research. Particle filters (also called sequential Monte Carlo methods) are a common choice [3], which, however, do not seem to work well for high-dimensional problems and moderate ensemble sizes. Alternative methods such as the ensemble Kalman filter (EnKF) have become popular in recent years [10]. The EnKF is an example of a particle transform filter (in contrast to the importance sampling approach of sequential Monte Carlo methods). However, the EnKF does not provide a consistent approximation to the data assimilation problem unless the PDFs π(x, t) are Gaussian. In this paper, we take the ensemble transform filter idea as a starting point (see recent work by Crisan and Xiong [8], Yang et al. [19], and Reich [14]) and discuss the consistency of ensemble transform filters with regard to their ensemble mean and covariance matrix propagation. Our work was inspired by Lei and Bickel [12], who proposed modifications to the EnKF which lead to a consistent update of ensemble moments during a single intermittent data assimilation step. The alternative analysis proposed in this paper also presents a unified view on ensemble transform filters for continuous and intermittent data assimilation.
2. Mathematical background. We assume that observations are available either in the form of continuous measurements y(t) ∈ R^K, t ≥ t_0, satisfying the stochastic differential equation

dy = h(x) dt + R^{1/2} dη(t), (2)

where η(t) denotes K-dimensional standard Brownian motion and R ∈ R^{K×K} is a symmetric positive-definite matrix, or in the form of intermittent measurements y_q ∈ R^K at discrete times t_q ≥ t_0, q ≥ 1, satisfying

y_q = h(x(t_q)) + R^{1/2} η_q,

where η_q ∈ R^K are i.i.d. Gaussian random variables with mean zero and covariance matrix equal to the identity matrix. In both cases, we call h : R^N → R^K the forward operator. Note that y, h, η, and R take different meanings depending on whether we consider the continuous or the intermittent data assimilation problem.
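As a concrete illustration of the intermittent observation model, the following sketch draws one observation y_q for an assumed three-dimensional state, a projection forward operator, and a diagonal R. All concrete choices (dimensions, noise levels, the form of h) are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

K = 2                               # observation dimension (illustrative)
R = np.diag([0.1, 0.2])             # symmetric positive-definite noise covariance
R_sqrt = np.sqrt(R)                 # matrix square root; valid because R is diagonal

def h(x):
    """Forward operator mapping the state to observation space (illustrative choice)."""
    return x[:K]

def intermittent_observation(x):
    """Draw y_q = h(x) + R^{1/2} eta_q with eta_q ~ N(0, I)."""
    return h(x) + R_sqrt @ rng.standard_normal(K)

x = np.array([1.0, -0.5, 0.3])      # current state, N = 3
y_q = intermittent_observation(x)
```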
The time evolution of an initial probability density function π_0(x) is provided by Kushner's equation (3) in the case of the continuous filtering problem (see, e.g., [11]) and by (4) in the case of the intermittent (continuous-discrete) filtering problem [6,14]. Here h̄ denotes the expectation value of h with respect to π, i.e., h̄ = ∫ h(x) π(x, t) dx. Eq. (4) is to be understood as the limit of smooth approximations to the Dirac delta function (mollification) [6]. We will consider particle methods for approximating the filtering problem, i.e., we will consider (conditional) empirical measures of M > 1 particles x_i(t), for which we need to find appropriate evolution equations.
In the absence of observations, the appropriate ensemble evolution (6) is simply given by propagating each ensemble member x_i under the model equation (1). To illustrate this point, we introduce the empirical measure (5),

π_em(x, t) = (1/M) Σ_{i=1}^M δ(x − x_i(t)),

and note that the empirical expectation value ĝ = (1/M) Σ_{i=1}^M g(x_i) under equation (6) is consistent with the continuity equation after integration by parts and setting π = π_em. We now turn to the observation-driven contributions to the filter equations (3) and (4), respectively, and their treatment under a particle method. Contrary to standard sequential Monte Carlo methods (see, e.g., [3]), we follow recent work by Crisan and Xiong [8] and Reich [14], i.e., we seek a vector field du satisfying (7) in the case of continuous filtering, or vector fields u_q satisfying (8) in the case of intermittent filtering. The complete particle filter is then given by combining the model dynamics with du in the case of continuous observations, or with u_q in the case of intermittent observations, respectively. There are different options for finding these vector fields du and u_q; see, for example, [15], [19], and [9]. In this paper, the focus will be on vector fields which satisfy (7) or (8), respectively, in a weak form using the empirical measure (5). More precisely, the vector fields should satisfy the weak formulations (9) for appropriate choices of the test function g. Here ĥ is defined in the obvious manner, i.e., ĥ = (1/M) Σ_{i=1}^M h(x_i). In this paper, the set of admissible test functions g will be restricted to polynomials of degree at most two in x. We emphasize at this point that the empirical measure π_em(x, t) is always conditioned on the available measurements up to time t. This dependence is not explicitly expressed in the subsequent discussions.
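In code, working with the empirical measure simply means replacing expectations by ensemble averages. The following minimal helper (notation assumed, not taken from the paper) computes the empirical mean and the empirical expectation value of a test function g.

```python
import numpy as np

def empirical_mean(X):
    """Empirical mean over an ensemble X of shape (M, N)."""
    return X.mean(axis=0)

def empirical_expectation(g, X):
    """Empirical expectation (1/M) sum_i g(x_i) of a test function g."""
    return np.mean([g(x) for x in X], axis=0)

X = np.array([[0.0], [1.0], [2.0], [3.0]])       # M = 4 scalar members
assert np.allclose(empirical_mean(X), [1.5])
# expectation of g(x) = x**2 over {0, 1, 2, 3}: (0 + 1 + 4 + 9) / 4 = 3.5
assert np.allclose(empirical_expectation(lambda x: x**2, X), [3.5])
```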
We will discuss first-order (mean) and second-order (covariance matrix) consistency of the ensemble Kalman-Bucy formulation [7],

dx_i = −P_xx H^T R^{-1} ((Hx_i + Hx̂)/2 dt − dy), (10)

for continuous data assimilation. Here x̂ denotes the empirical mean and P_xx the empirical covariance matrix of the ensemble. Note that we also used a linear forward operator, i.e., h(x) = Hx, in (2). We will also consider ensemble square root filter formulations [16] for intermittent data assimilation and their formulation in terms of an embedded continuous formulation

dx_i/ds = −P_xx H^T R^{-1} ((Hx_i + Hx̂)/2 − y_q), s ∈ [0, 1]. (11)

The rest of the paper is organized as follows. In Section 3, we analyse the propagation of the ensemble mean and the ensemble covariance matrix under ensemble transform filters for both the continuous and the intermittent data assimilation problem. It is demonstrated that currently available ensemble transform filters suffer from inconsistency issues, which can be reduced by appropriate modifications while not being completely avoidable due to the familiar moment closure problem. The problem appears to be more serious for the intermittent data assimilation problem. In Section 4, we present two types of numerical results. First we study two simplified 1D test problems with trivial model dynamics dx/dt = 0, while the Lorenz-63 model [13] is used as a more challenging test problem for assessing the different assimilation schemes.
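For linear forward operators, one explicit Euler step of a deterministic update of Kalman-Bucy type can be sketched as follows. The discretization of the observation increment and the 1/(M−1) normalization of the covariance are implementation choices of this sketch, not the paper's exact scheme.

```python
import numpy as np

def enkbf_step(X, y, H, Rinv, dt):
    """One explicit Euler step of a deterministic ensemble Kalman-Bucy update
    for a linear forward operator h(x) = Hx: each member is nudged by
    -P_xx H^T R^{-1} ((H x_i + H xhat)/2 - y) dt.

    X: ensemble of shape (M, N); y: observation of shape (K,).
    """
    M = X.shape[0]
    xhat = X.mean(axis=0)
    A = X - xhat                                 # deviations
    P = A.T @ A / (M - 1)                        # empirical covariance matrix
    innov = 0.5 * (X + xhat) @ H.T - y           # (H x_i + H xhat)/2 - y, row-wise
    return X - dt * innov @ Rinv @ H @ P

# With trivial model dynamics dx/dt = 0, repeated calls move the ensemble
# mean toward the observation while contracting the ensemble spread.
```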
3. Consistency of ensemble transform filters. To simplify the derivation of our ensemble transform filters, we restrict the discussion in this section to the trivial model equation dx/dt = 0. This restriction is justified by the fact that the empirical measure (5) is consistent with respect to all smooth test functions g under the model equations (1) alone.

3.1. Continuous filtering problem. Under Kushner's equation (3), the expectation value ḡ = E_π[g] of a smooth scalar-valued test function g satisfies the evolution equation (12), where the overbar denotes expectation with respect to the PDF π(x). We now take g(x) equal to the l-th component of the state vector x ∈ R^N and set π = π_em, i.e., we effectively derive an evolution equation for the ensemble mean x̂ of the complete state vector. Straightforward calculations yield an evolution equation for x̂, which, using the abbreviation ĥ, can be brought into an equivalent formulation in terms of the empirical covariance matrix P̂_xh between the state vector and its forward operator values. This formulation is consistent with the evolution equation for the mean obtained from the ensemble Kalman-Bucy filter (10) for linear forward operators, i.e., h(x) =
Hx. This consistency property is also valid for the ensemble Kalman-Bucy filter with "perturbed" observations in the limit M → ∞, which replaces the deterministic update (10) by a stochastically perturbed version in which η_i(t), i = 1, . . . , M, denote independent realizations of standard Brownian motion.
A similar calculation yields an evolution equation for the covariance matrix (see [11] for details). Here we have also introduced the covariance matrix P_xh between the state vector x and its forward operator value h(x). Upon replacing the PDF π by the ensemble-induced empirical measure π_em, we obtain a vector field du(x_i, t) which leads to a consistent ensemble Kalman-Bucy filter formulation, i.e., this choice of du(x_i, t) satisfies, while not being unique, the weak form (9) of (7) for g equal to first and second-order polynomials. The complete modified ensemble Kalman-Bucy filter equations are provided by (14).

3.2. Intermittent filtering problem. Following Lei and Bickel [12] (but see also [11]), we define consistent estimators for the posterior mean and covariance matrix for an intermittent data assimilation problem by

x̂^a = Σ_{i=1}^M w_i x_i,  P̂^a = Σ_{i=1}^M w_i (x_i − x̂^a)(x_i − x̂^a)^T,

respectively, with normalized weights w_i ∝ π(y|x_i). Here π(y|x) denotes the likelihood function for observing y given x, and the ensemble members x_i are assumed to follow the prior distribution. Then, using y = y_q at t = t_q and an ensemble Kalman filter analysis step to produce posterior proposals x_i^p, the corrected posterior ensemble members x_i^a, defined by

x_i^a = x̂^a + (P̂^a)^{1/2} (P^p_xx)^{-1/2} ∆x_i^p, (16)

are consistent with regard to the posterior mean and covariance matrix at time t_q.
Here ∆x_i^p = x_i^p − x̂^p and P^p_xx denote quantities derived from the proposal ensemble. A related formula has been given by Lei and Bickel [12] for the ensemble Kalman filter formulation with perturbed observations [10]. However, while the formulation of Lei and Bickel [12] requires M + 1 calculations of a matrix square root, formulation (16) requires only two. See also the particle filter resampling scheme of Xiong et al. [18].
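A moment-matching correction in this spirit can be sketched as follows: likelihood weights yield consistent posterior mean and covariance estimators, and a single linear map built from two symmetric matrix square roots transfers these moments onto a proposal ensemble. The normalizations and the use of symmetric square roots are assumptions of this sketch rather than the paper's exact formulas.

```python
import numpy as np

def sym_sqrt(P, inv=False):
    """Symmetric (inverse) square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(P)
    w = np.sqrt(w)
    if inv:
        w = 1.0 / w
    return (V * w) @ V.T

def moment_consistent_correction(Xp, X, logw):
    """Map a proposal ensemble Xp onto importance-weighted posterior moments.

    X: prior ensemble (M, N); logw: log-likelihood values log pi(y | x_i);
    Xp: proposal ensemble, e.g. an EnKF analysis ensemble, shape (M, N).
    The returned ensemble has empirical mean and (1/M-normalized) covariance
    exactly equal to the weighted estimators, at the cost of two matrix
    square roots.
    """
    w = np.exp(logw - logw.max())
    w /= w.sum()
    xa = w @ X                                   # weighted posterior mean
    D = X - xa
    Pa = (D * w[:, None]).T @ D                  # weighted posterior covariance
    Dp = Xp - Xp.mean(axis=0)
    Pp = Dp.T @ Dp / Xp.shape[0]                 # proposal covariance
    A = sym_sqrt(Pa) @ sym_sqrt(Pp, inv=True)
    return xa + Dp @ A.T
```

By construction, the linear map A satisfies A P^p A^T = P̂^a, so the corrected ensemble reproduces the weighted moments exactly regardless of how the proposals were generated.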
We now investigate the inconsistency of an ensemble square root filter with regard to its ensemble mean and covariance matrix propagation in more detail. We recall that, instead of applying Bayes' theorem [11] directly, one can solve at any observation instance t_q, q ≥ 1, an evolution equation in an artificial time s ∈ [0, 1] with y = y_q at t = t_q. The initial value is given by the prior PDF, while the solution at s = 1 yields the posterior PDF according to Bayes' theorem. See [14] for a derivation. The time evolution of the expectation value of a scalar-valued function g is then given by (17). This formula should be compared to the corresponding formula (12) for the continuous data assimilation problem. Using the empirical measure for π and applying (17) component-wise, we obtain a consistent update for the ensemble mean. We find that the continuous ensemble Kalman filter formulation (11) in [5,6] is not consistent with this evolution equation for the ensemble mean even for linear forward operators; more precisely, a third-order moment correction term arises, which vanishes for Gaussian PDFs. We mention that the same inconsistency problem arises for the ensemble Kalman filter for intermittent data assimilation [12]. Note that the inconsistency can simply be removed by recentering the analysed ensemble about the posterior mean x̂^a. The corresponding equation for the empirical covariance matrix is (18), which one obtains from (17) with g = ∆x∆x^T using the abbreviation L from (17); L̄ denotes the ensemble mean of L. Again we find that the continuous ensemble Kalman filter equation (11) is not consistent with this evolution equation for the empirical covariance matrix. Let us discuss the source of inconsistency in more detail for the simple problem x ∈ R and y_q = x + η_q with η_q ∼ N(0, 1).
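In weighted (non-transform) form, the embedded evolution in s amounts to likelihood tempering: the density π_s ∝ π_prior · (likelihood)^s interpolates between the prior at s = 0 and the Bayes posterior at s = 1. The following sketch illustrates this interpolation for a scalar Gaussian example; it is an illustration of the s-dependence only, not the paper's transform equations.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, 5000)          # samples from the prior N(0, 1)
y = 1.0
loglik = -0.5 * (y - X) ** 2            # log-likelihood for y = x + eta, eta ~ N(0, 1)

def tempered_mean(s):
    """Weighted mean under pi_s proportional to prior * likelihood**s."""
    w = np.exp(s * (loglik - loglik.max()))
    w /= w.sum()
    return w @ X

# s = 0 recovers the prior mean; s = 1 approximates the posterior mean,
# which is y/2 = 0.5 for this conjugate Gaussian example.
```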
For a Gaussian PDF π = N(x̂, P_xx) the Kalman update is recovered, since the third-order moment vanishes. In all other cases, the consistent estimation of the fourth-order moment is required to satisfy the second-order moment evolution equation. Hence we face a classic moment closure problem.
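The scalar case can be checked numerically: for a Gaussian prior, the exact Bayes update of mean and variance coincides with the Kalman update. The grid quadrature below is an illustrative verification with arbitrary parameter values.

```python
import numpy as np

# Scalar problem x in R, y_q = x + eta_q, eta_q ~ N(0, 1).
# Gaussian prior N(m, P); illustrative parameter values.
m, P, y = 0.5, 2.0, 1.7
x = np.linspace(-15.0, 15.0, 20001)
dx = x[1] - x[0]

prior = np.exp(-0.5 * (x - m) ** 2 / P)
lik = np.exp(-0.5 * (y - x) ** 2)
post = prior * lik
post /= post.sum() * dx                     # normalize on the grid

bayes_mean = (x * post).sum() * dx
bayes_var = ((x - bayes_mean) ** 2 * post).sum() * dx

# Kalman update with gain K = P / (P + 1):
K = P / (P + 1.0)
kalman_mean = m + K * (y - m)
kalman_var = (1.0 - K) * P
```

For a non-Gaussian prior the two answers would differ, which is exactly the inconsistency discussed above.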
We now derive an alternative, consistent filter formulation in terms of the ensemble mean x̂ and deviations ∆x_i using (17). We first note that the mean satisfies dx̂/ds = −(1/M) Σ_i L_i ∆x_i, which we have already analysed explicitly. Next we set g = ∆x∆x^T and find that a corresponding choice of the transform vector field leads to an update which is consistent with (18). We finally obtain the assimilation equations (19) in terms of the ensemble members x_i, i = 1, . . . , M. Note that instantaneous consistency with regard to first and second-order moments does not imply that the update is consistent over the whole interval s ∈ [0, 1], since higher than second-order moments are not consistently propagated.

4. Numerical examples. We conduct two types of experiments. In the first set, we assume trivial model dynamics and only assess the behavior of ensemble transform filters with respect to the assimilation of continuous or intermittent data under non-Gaussian PDFs and non-linear forward operators. The second experiment is based on the Lorenz-63 model [13] and examines the interplay of non-linear dynamics with intermittent data assimilation with regard to moment consistency.

4.1. 1D test problems.
We test our formulation first for a single intermittent assimilation step where the prior is a bimodal Gaussian mixture and the likelihood is Gaussian. The posterior distribution is then again a bimodal Gaussian mixture and can be computed analytically; the same applies to the mean and the second-order moment of the posterior distribution. Here we demonstrate how an EnKF and the continuous ensemble transform filter fail to reproduce the correct mean and covariance; see Table 1. We note that the formulation (19), which we denote by consistent ensemble transform filter (CEnTF), does not improve the results. This is again due to the moment closure problem, i.e., (19) fails to reproduce the higher than second-order moments correctly. In contrast, both the second-order modified filter of [12] as well as the formulation (16) reproduce the correct mean and covariance as M → ∞ by construction. However, both these filters are difficult to implement for high-dimensional problems, since they require computing the square roots of N × N covariance matrices. Furthermore, ensemble sizes M ≫ N = 1 are required to see the improved behavior of (16).
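For reference, the analytic posterior moments of such a mixture example take only a few lines to compute: each Gaussian component undergoes a Kalman update, and the mixture weights are re-weighted by the marginal likelihoods. The specific parameter values below are illustrative, not those behind Table 1.

```python
import numpy as np

# Equal-weight bimodal Gaussian prior with component means -2, 2 and variance 0.5;
# Gaussian likelihood y = x + eta, eta ~ N(0, R). (Illustrative parameters.)
means = np.array([-2.0, 2.0])
var, R, y = 0.5, 1.0, 1.0

# Per-component Kalman update with gain K = var / (var + R):
K = var / (var + R)
post_means = means + K * (y - means)
post_var = (1.0 - K) * var

# Mixture weights are re-weighted by the marginal likelihoods N(y; mean_j, var + R):
w = np.exp(-0.5 * (y - means) ** 2 / (var + R))
w /= w.sum()

post_mean = w @ post_means
post_cov = w @ (post_var + post_means ** 2) - post_mean ** 2

# A large-M EnKF instead applies one Kalman update to the overall prior moments:
prior_mean = 0.5 * means.sum()
prior_var = var + 0.5 * (means ** 2).sum() - prior_mean ** 2
enkf_mean = prior_mean + prior_var / (prior_var + R) * (y - prior_mean)
```

The gap between `post_mean` and `enkf_mean` is the kind of mean error the Gaussian approximation produces for bimodal priors.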
As a second example, we consider the continuous filtering problem with trivial model dynamics dx/dt = 0, a reference trajectory x_ref(t) = 1, t ≥ 0, continuous observations of type (2) with η(t) denoting standard Brownian motion, and the nonlinear forward operator h(x) = x^3/2. The initial PDF π_0 is Gaussian with mean zero and variance P_xx = 4. We compute x̄(t) for t → ∞ using a high-resolution approximation to Kushner's equation and compare this result to x̂(t) from the ensemble Kalman-Bucy filter (10) and the modified formulation (14) with M = 1000 ensemble members. We plot the evolution of x̄ and the two approximations x̂ in Figure 1. Both approximations x̂(t) approach x̄(t) ≈ x_ref = 1 for t sufficiently large. It should be noted that the underlying PDFs π(x, t) are strongly non-Gaussian in the initial phase; see Figure 2. We conclude that the modified formulation (14) does not offer improved behavior for this particular test problem.
4.2. Lorenz-63 model. In Table 2 we present results for M = 10, 40, 400 ensemble members over 6000 data assimilation cycles. The ensemble size M = 400 corresponds to the setting of [12], and the results from Table 2 should be compared to those from Table 1 in [12]. We note that we obtain a significantly lower mean RMSE for the standard EnKF as compared to the one presented in [12] for their implementation.
We also implemented non-perturbative ensemble (square root) filter algorithms [10] as well as the modified update (19). However, we found that none of the filters worked for this example without additional modifications such as ensemble inflation.
5. Discussion. The consistency of ensemble transform filter algorithms has been discussed with regard to their mean and covariance evolution. Under both assimilation scenarios, i.e. continuous as well as intermittent data assimilation, one encounters the familiar closure problem for moment truncations. Due to the higher-order terms in (17) compared to those in (12) (compare the definitions for L and dK), this problem is more severe for intermittent data assimilation. Various correction methods have been discussed; only those for the mean are easy to implement. Overall, the ensemble Kalman-Bucy filter (10) is found to work well also for non-Gaussian PDFs, while standard ensemble Kalman filters for intermittent data assimilation are found to be more problematic under non-Gaussian PDFs. However, even if computationally affordable, our results for the Lorenz-63 model indicate that the correction (16) is only advantageous for ensemble sizes that are large compared to the dimension of phase space, in which case one could also resort to traditional sequential Monte Carlo methods.
Even though we found that (14) as well as (19) did not offer improved results for the tests performed in this paper, an online comparison with (10) and (11), respectively, could serve as an indicator of non-Gaussianity in the underlying PDFs and hence as a form of error indicator. This topic needs to be explored further.