Particle filters for applications in geosciences

Particle filters contain the promise of fully nonlinear data assimilation. They have been applied in numerous science areas, but their application to the geosciences has been limited due to their inefficiency in high-dimensional systems in standard settings. However, huge progress has been made, and this limitation is disappearing fast due to recent developments in proposal densities, the use of ideas from (optimal) transportation, the use of localisation and intelligent adaptive resampling strategies. Furthermore, powerful hybrids between particle filters and ensemble Kalman filters and variational methods have been developed. We present a state of the art discussion of present efforts of developing particle filters for highly nonlinear geoscience state-estimation problems with an emphasis on atmospheric and oceanic applications, including many new ideas, derivations, and unifications, highlighting hidden connections, and generating a valuable tool and guide for the community. Initial experiments show that particle filters can be competitive with present-day methods for numerical weather prediction suggesting that they will become mainstream soon.


Introduction
Data assimilation for geoscience applications, such as weather or ocean prediction, is a slowly maturing field.Even the linear data-assimilation problem cannot be solved adequately because of the size of the problem.Typically, global-scale numerical weather prediction needs estimation of over 10 9 state variables, assimilating over 10 7 observations every 6-12 hours.Existing methods like 4DVar do not provide accurate uncertainty estimates and need efficient pre-conditioners, and Ensemble Kalman Filters heavily rely on somewhat ad-hoc fixes Although these problems are formidable, another difficulty arises from the fact that the problem is typically nonlinear, and, with increasing model resolution and more complex observation operators, increasingly so.Both variational and Kalman-filter-like methods have difficulty handling nonlinear problems.Variational methods can easily fail when the cost function is multimodal, and are hampered by the assumption that the prior probability function (pdf) of the state is assumed to be Gaussian.Ensemble Kalman filters make the explicit assumption that the prior pdf and the likelihood of the observations as function of the state are Gaussian, or, somewhat equivalently, assume that the analysis is a linear combination of prior state and observations.Both methods have been shown to fail for nonlinear dataassimilation problems in low-dimensional systems, and both have been reported to have serious difficulties in numerical weather prediction at the convective scale where the model resolution is only a few km.Particle filters hold the promise of fully nonlinear data assimilation without any assumption on prior or likelihood, and recent text books like Reich and Cotter (2015), Nakamura and Potthast (2015), and van Leeuwen et al. (2015) provide good introductions to data-assimilation in general, and particle filters in particular.
The standard or bootstrap particle filter can be described as follows.The starting point is an ensemble of size N of model states x n i ∈ ℜ Nx , called particles, that represent the prior probability density function (pdf) p(x n ), as: Between observations, each of these particles is propagated forward from time n − 1 to time n with the typically nonlinear model equations in which f (..) denotes the deterministic model, and β n is a random forcing representing missing physics, discretisation errors, etc.In this paper we assume this model noise to be additive, but one could also consider multiplicative noise in which β n is a function of the state of the system.We assume that the pdf from which the β n are drawn is known; typically a Gaussian N (0, Q).
At observation times the true system is observed via: in which the observation errors ǫ n are random vectors representing measurement errors and possibly representation errors.Again we assume that these errors have known characteristics, often Gaussian, so e.g.ǫ n ∼ N (0, R).These observations y n ∈ ℜ Ny are assimilated by multiplying the prior pdf above with the likelihood of each possible state, i.e. the probability density p(y n |x n ) of the observation vector given each possible model state, following Bayes Theorem: the observation error ǫ shifted by H(x n ), we find: (5) If we insert our particle representation of the prior into this theorem we find: in which the particle weights w i are given by: Since all terms are known explicitly we can just calculate this as a number.This self-normalising is consistent with the notion that for a proper representation of a pdf the sum of the weights should be equal to one, so that the integral over the whole state space of the particle representation of the pdf is equal to one.
Propagating the particles x n i to the next observation time n + 1 gives a weighted representation of the prior at time n + 1.
Assimilating the observation at time n + 1 by Bayes Theorem leads to a modification of the weights: Even in low-dimensional applications, the variation of the weights increases with the number of assimilation steps.Eventually one particle has a much higher weight than all the others.To prevent this, resampling can be used before propagation to obtain equally weighted particles.This duplicates high-weight particles and abandons low-weight particles.After resampling, some of the particles have identical values, but if the model contains a stochastic component and independent random forcings are used for different particles, diversity is restored.
In high-dimensional problems the weights vary enormously even at one observation time, and typically one particle obtains a much higher weight than all the others.Snyder et al. (2008Snyder et al. ( , 2015) ) have shown that the number of particles needed to avoid weights collapse, in which one particle gets weight 1 and the rest weights very close to zero, has to grow exponentially with the dimension of the observations y for a large class of particle filters.If the weights collapse, all particles are identical after resampling, and restoring diversity could take a long propagation time.
From the discussion above it becomes clear that for particle filters to work we need to ensure that their weights remain similar.In this article we will discuss four basic ways to make progress on this fundamental problem.In the first one, we need to assume that the model contains a stochastic component.As soon as that is the case we can explore the so-called proposaldensity freedom to steer particles through state space such that they obtain very similar weights.This is the traditional way to try to solve the degeneracy problem in the particle filter literature, see e.g.Doucet et al. (2001).As pointed out by e.g.Snyder et al. (2008) there are fundamental problems when applying these techniques to the high-dimensional geoscience applications.We will examine the issue in detail and discuss other ways to formulate the problem, leading to so-called equal-weight particle filters.Although hey are not the full solution to the problem, they do point towards new ways to formulate and attack the degeneracy problem.
In the second one, we consider methods that transform the prior particles into particles from the posterior, either in one go, or via a more smooth transformation process.This is a much newer development following basic ideas of the ensemble Kalman filter and recently transferred to particle filters by Reich (2013).
While the one-step approaches can be shown to fail in highdimensional settings, they do lend themselves very naturally to localisation, as discussed below.However, the smooth transition variants, although more expensive, seem to be able to avoid the degeneracy problem, and are an interesting new development.
The third, more straightforward from the geoscience experience, approach is to introduce localisation in particle filters.While initial implementations were discouraging (e.g.Van Leeuwen, 2009), new formulations have shown remarkable successes, such that localised particle filters are now tested in global operational numerical weather prediction systems, e.g.Potthast et al. (2018).
It is important to note that the challenges that come with localisation and inflation in Ensemble Kalman Filters are inherited when applying localisation in particle filters, with an extra complication c 2017 Royal Meteorological Society Prepared using qjrms4.clsdue to the nature of the non-Gaussianity in the distributions for strongly nonlinear systems.
The fourth approach is to abandon the idea of using pure particle filters and combine them with Ensemble Kalman Filters.
This should not be confused with using Ensemble Kalman Filters in proposal densities, as that would still be a pure particle filter.
Instead, one uses both filters.Several variants exist, such as second-order exact filters, in which only the first two moments are estimated, sequential versions in which first an EnKF is used and the posterior EnKF ensemble is used as input for the particle filter, or vice versa, and combinations in which localised weights are calculated and dependent on the effective ensemble size a full particle filter, an EnKF, or a combination of the two is used.An application to a full convective scale weather model has been successfully realised for the COSMO-KENDA system in Robert et al. (2017).
These four variants form the basis of the following four chapters.Each chapter contains a critical discussion of the approximations and remaining major issues.The paper is closed with a concluding section and an outlook of what possible next steps could be.

Proposal density particle filters
Ideally we draw independent samples directly from the posterior pdf because the samples would all have equal weight automatically.This can only be done, however, when the shape of the posterior pdf is known and when it is easy to draw from the posterior.An example of this is a Gaussian prior combined with a linear Gaussian likelihood.Under these assumptions the posterior is also Gaussian and the mean and covariance can be calculated directly from the prior using the Kalman update equations.Ensemble Kalman filters make use of this result and draw directly from that pdf, which is why all posterior particles have equal weights in an Ensemble Kalman Filter.
Markov-Chain Monte-Carlo methods draw directly from the posterior in a sequential way, so one sample after the other, after a burn-in period, see e.g.Robert and Cassela (2004), or van Leeuwen et al. (2015) for a geophysics-friendly introduction.
The samples are correlated, often 100% when the new sample is not accepted, making them very inefficient in high-dimensional systems.
The standard particle filter draws particles from the prior.These then have to be modified to become particles of the posterior via the weighting with the likelihood.This is a general procedure in statistics called importance sampling: one draws from an approximation of the pdf one is interested in, and corrects for this via so-called importance weights.
In the introduction we argued that drawing from the prior leads to weights that vary too much: typically, in high-dimensional problems with numerous independent observations one particle gets weight 1, and all other particles have a weight very close to zero.However, we could explore the idea of importance sampling on the transition from one time to the next.When the numerical model is not deterministic but stochastic we have the freedom to change the model equations to move the particles to those parts of state space where we want them to be, for instance closer to the observations.
Mathematically this works as follows.Assume we have observations at time n, so Bayes Theorem at time n is given by ( 4).If the model is stochastic, we can write the prior as where p(x n |x n−1 ) is the transition density, the pdf of the state at time n when the state at time n − 1 is known.For instance, if the model error is additive, so the model equation is given by (2), it holds that Often the model errors are assumed to be Gaussian, so β ∼ N (0, Q) and we find Assume now that at time n − 1 we have a set of weighted particles as in ( 1), but with weights w n−1 i instead of 1/N .We can evaluate the expression (9) for the prior as a weighted mixture of transition densities In the following we neglect the approximation error at time n − 1 and assume that ( 12) is exact.By Bayes formula (4), the posterior can then be written as: In the standard particle filter one makes one draw from ) for each i, and we know that this leads to ensemble collapse for high-dimensional systems.However, now the prior particles at time n are allowed to arise from following a different model equation.For instance when the model is given by (2), we can use in which g(., .) is now the deterministic part and βn is the stochastic part.Note that we allowed g(..) to depend on the observations at the future time.This means that we generate the prior particles at time n by making one draw from In general, we draw the particles at time n from an alternative model q(x n |x n−1 , y n ) and correct this by changing the weights of the particles.The pdf q is called the proposal density.We are now in the position to find the change in weights related to using such a proposal density.Multiplying and dividing each the term under the sum in (12) with the transition density q(x n |x n−1 i , y n ) leads to: This can always be done as long as the support of q(.|x n−1 ) is at least as large as that of p(.|x n−1 ).In a reinterpretation of this equation, if x n i is drawn from the alternative model q(x n |x n−1 i , y n ) and we define weights by then the weighted set of particles nevertheless represents the original prior p(x n ): This shows that a different model can be used in a particle filter, as long as we keep track of the weights.If we now combine this expression with Bayes Theorem we find for the posterior pdf: where the weights are given by: We see that the weights now contain two factors, the likelihood weight, which also appears in the standard particle filter, and a proposal weight.These two weights have opposing effects.If we use a proposal density that strongly pushes the model towards the observations, the likelihood weight will be large because the difference between observations and model states becomes smaller, but the proposal weight becomes smaller because the model is pushed away from where it wants to go, so p(x n |x n−1 i ) will be small.On the other hand, a weak pushing towards the observations keeps the proposal weight high, but leads to a small likelihood weight.This suggests that there is an optimum weight related to an optimal position x n i for each particle as function of its position at time n − 1.This will be explored in equal-weight formulations of the particle filter.

A simple relaxation scheme
To illustrate the idea of a proposal density we consider the following simple example.We could add a relaxation or nudging term to the original equation to steer the particles towards the c 2017 Royal Meteorological Society Prepared using qjrms4.clsobservations and make their weights more similar, as pioneered by van Leeuwen (2010) for geoscience applications.The model equation is written as: where we used time index m for the state vector to emphasise that there are several model time steps between observation times.
T a relaxation matrix of our choice.It is important to make a distinction between the deterministic part of the proposal and the random part of the proposal.In this example, the deterministic part consists of the first two terms on the right-hand side of the equation, while the third term denotes the random part.Let's assume the pdf of the random forcing is Gaussian with mean zero and covariance Q.Then we can immediately write for the proposal density since the pdf of x m is just a shift in the mean of the pdf of βm .For the original model, we assume that the random part is Gaussian with zero mean and covariance Q, so that The change in the model equations is compensated for in particle filters by a change in the relative weight of each particle, and the expression for this change in weight for this case is: in which, for Gaussian model errors, and Note that the normalisation factors of the Gaussians do not have to be calculated explicitly if we use that the sum of the weights has to be equal to one.
Simple as the scheme is, it does not solve the degeneracy problem.However, it can be used as a simple scheme when several model time steps are used between observation times, in combination with other schemes to be discussed next.

Weighted Ensemble Kalman Filter
One could also use other existing data-assimilation methods in proposal densities, like Ensemble Kalman filters or variational methods.In the Weighted Ensemble Kalman filter the stochastic EnKF of Burgers et al. (1998) is used as follows.The Ensemble Kalman Filter update can be written as: Kalman gain and ǫ i ∼ N (0, R), with R the observational error covariance.Using the expression for the forecast x f i in the Kalman filter update equation we find: which we can rewrite as the sum of a deterministic and a stochastic part as: and Therefore, we find for the proposal density: c 2017 Royal Meteorological Society Prepared using qjrms4.clswith Strictly speaking, this is correct only if the Kalman gain is calculated using the ensemble covariance of f (x n−1 ), so without the model errors β n , otherwise the proposal is not Gaussian.We can calculate the weights of the particles in a similar way as in the previous example.As experimentation shows, when the number of independent observations is high this filter will be degenerate, consistent with the theory of Snyder et al. (2015), and as proven in the next section.The only way to make this work is to include localisation, not only at the EnKF level, but also at the level of the particle filter.We will discuss local particle filters in a later section.

Optimal proposal density
In the class of particle filters in which the proposal density of each particle is dependent on only that particle, an optimal proposal density was derived by Snyder et al. (2015).They defined optimality as the proposal density that gives minimal variance of the weights, see e.g.Doucet et al. (2001).It turns out we can actually generalise this result and show that this optimal proposal density is optimal even when each particle has its own proposal density which is allowed to depend on all previous particles, so a proposal of the form q(x n |i, x n−1 1:N , y n ).Snyder et al. (2015) concentrate on the case that one is interested in an optimal representation of p(x n , x n−1 |y n ) in a sequential algorithm, so in a sequential smoother.To this end they introduce the random variable and determine that proposal density q that minimises the variance in the weights w * , with the expectation taken over the density from which we draw the particles, so the proposal q.
Here we show that the optimal proposal density is also optimal for the strict filtering case, so when we are interested in minimal variance of the weights at time n only.Specifically, the question is: given the set of particles at t = n − 1 drawn from p(x n−1 |y 1:n−1 ), which proposal density of the form q(x n |i, x n−1 1:N , y n ) gives minimal variance of the weights at time n?
Using Bayes formula, we can write the expression for the weight of particle i as function of the state at time n as: where we assume, without loss of generality, an equally weighted ensemble at time n − 1.
Consider the pair of random variables (I, x n ) where P rob( N and, conditionally on Furthermore, define the random variable where In order to find the proposal q that minimizes the variance of W , we use the well-known law of total variance (derived in the appendix for completeness): where we used that W = W (X) when I is given.First, we see that and thus the first term in V ar W (W ) is which is non-negative because it is a variance.For the second term we use that V ar X|I (W ) ≥ 0 with equality iff W is almost surely Because both p and q are densities (in x n ), cst = 1.Combining these results, we have a lower bound for V ar(W ) that is determined by the variance of p( ¯yn |x n−1 i ) over i, with equality iff q(x n |i, x n−1 Note that this is a new result as previous proofs only considered proposal densities of the form q(x n |x n−1 i , y n ), and we extended it to more general proposal densities of the form q(x n |i, x n−1 This remarkable result shows that firstly the optimal proposal density, so p(x n |x n−1 i , y n ), does indeed lead to the lowest variance in the weights for the class of particle filters in which the transition density is of the form q(x n |i, x n−1 Secondly, it shows that we can predict the variance in the weights without doing the actual experiment, for any number of particles, provided we can compute p(y n |x n−1 i ), and thirdly the weights are independent of the position of the particles x n .Any draw of N particles of the optimal proposal density will give the same weights ) and hence the same variance in the weights.Unfortunately, this variance is zero only when the observations are not dependent on the state at time n − 1, which is never the case.
A simple case where we can compute both the optimal proposal density and the weights p(y ) is given by ( 11) and the observation operator H = H is linear.By the same argument that is used to derive the Kalman filter update, we find where T is the Kalman gain for the background covariance Q, and This shows two things: First, in this special case, the simple relaxation scheme of Section 2.1 is optimal.Second, we can compare the weights of the optimal proposal with the weights of the standard filter: Both depend on the squared distance ||y n − Hf (x n−1 i )|| 2 , but in the standard particle filter the distance is defined w.r. to R and in the optimal proposal the distance it is defined is w.r. to HQH T + R. Hence the weights of the optimal proposal are more equal, but the improvement is substantial only if Q is large and the analysis of weight collapse by Snyder et al. (2008) still applies.
One can extend the optimal proposal density idea to more than one time step.Snyder et al. (2015) show that the optimal proposal is the proposal of this form with minimal variance in the weights in this case too, which can also easily be sen by applying the above to Looking back at the filters described in the previous sections we find the following.The relaxation scheme uses a simple proposal density that is of the form q(x n |x n−1 i , y n ), so the theory holds, and that proposal will lead to degenerate results.This is indeed the finding of van Leeuwen (2010).The Weighted Ensemble Kalman Filter has a proposal that depends on all particles at time n − 1 through the Kalman gain K, so the proposal is of the Hence also this filter will perform worse than the optimal proposal and hence will be degenerate for highdimensional systems.

Implicit Particle filter
The implicit particle filter is an indirect way to draw from the optimal proposal, even over several time steps.Often the assumption is made that the model errors of both original model and proposal density are Gaussian, and the observation operator H is linear.In that case a draw from the optimal proposal is a draw from a multivariate Gaussian, and we know how to do that.
However, when H is nonlinear, or when the proposal is used over several model time steps the density to draw from is not Gaussian anymore.Chorin et al. (2010) realised that one could still draw from a Gaussian and then apply a transformation to that draw to find samples from the optimal proposal density.The method is explained here for one time step, but the extension to multiple time steps is straightforward.
As mentioned in Sec. 2 on the proposal density the posterior pdf can be written as: c 2017 Royal Meteorological Society Prepared using qjrms4.cls The scheme draws from a Gaussian proposal q(ξ) = N (0, I), and we can write the transformation as q(x n |x n−1 i , y n ) = q(ξ)J −1 i in which J i is the Jacobian of the transformation from x n to ξ.That transformation is found implicitly, hence the name of the filter, by defining and, after drawing ξ i for each particle, solving for x n in for each particle, in which The weights of the particles become: Interestingly, while the optimal proposal density shows that the weights are only dependent on the position of the particles at previous time, so on x n−1 i via φ i , the implicit map makes the weights also dependent on the positions at time n, so on x n i via the Jacobian of the transformation between ξ and x.Only when the Jacobian is a constant, so when F i is quadratic in x i , this dependence disappears.
Solving ( 36) is not straightforward in the general case.Morzfeld et al. (2012) suggest a random map of the form in which P is a chosen covariance matrix, ideally the covariance of the posterior pdf, x a i = arg min F i (x n ) and λ i is a scalar.This transforms the problem into solving a highly nonlinear scalar equation for λ i , which is a much simpler problem than finding x n i directly.These map can be shows to be onto when F i (x n i ) has only closer contours in the high-probability regions; otherwise one would have to first choose a closer contour area and then perform the map.In general, when the optimal proposal (over several time steps if needed) is multimodal the transformation from the state variable to a Gaussian is not monotonic, and the implicit particle filter needs to be adapted, e.g. by using a separate Gaussian for each mode.
Of further interest is that x a i is the same as the solution to a 4DVar problem well known in meteorology.But it is a special 4DVar as the initial position of each particle is fixed and it has to be a weak-constraint 4DVar.The latter condition is needed as a strong-constraint 4Dvar would have no possibility to move a particle in state space as its initial condition is fixed.
However, also this filter will suffer from weight collapse in high-dimensional applications as it is still a sampling scheme for the optimal proposal density.The following sections will discuss ways to improve on the optimal proposal.

Equal weights by resampling at time n − 1
As noted already in equation ( 33), we can write equation ( 13) as where This says that the analysis distribution is a mixture of the optimal proposal pdf's p(x n |x n−1 i , y n ) with mixture weights α i .
If we can compute the optimal proposal density and the weights α i in closed form, we can also draw samples directly from this mixture density.For this, we first draw an index I from the discrete distribution with weights α i , P rob(I = j) = α j , followed by a draw from the corresponding pdf p(x n |x n−1 I , y n ).Doing this N times will lead to N different particles with equal weights because each of them is an independent draw directly from the posterior.If the index I is equal to a value j more than once, the particle x n−1 j is propagated independently several times.However, this does not solve the problem of weight collapse because drawing the index I is nothing else than resampling the particles at time n − 1 with weights proportional to N , the variance of these weights is c 2017 Royal Meteorological Society Prepared using qjrms4.clsexactly equal to the lower bound that we found in Section 2.3.The main difference is that the collapse now happens at time n − 1, and all particles will be different at time n.
If we cannot compute the optimal proposal density and the weights α i in closed form, we can still use the importance sampling idea to draw from the mixture p(x n |y n ) by drawing pairs (I, X n ) consisting of an index I and a state X n at time n.We choose a proposal distribution β i = β i (y n ) for the index and proposal distributions q(x n |x n−1 i , y n ) for the state.Then we draw the index I i with P rob(I i = j) = β j (y n ) and conditionally Finally, we compute weights w n i by The particles x n i with weights w n i provide the desired approximation of p(x n |y n ) whereas the indices I i can be discarded after the weights have been computed.We could produce an evenly-weighted approximation by a further resampling step, or take the weights w n i into account during the next iteration.
In this approach we can obtain equal weights w n i by choosing and With this choice, we draw directly from the mixture (39).As mentioned before, although the weights w n i are then equal to 1 N , the algorithm contains a hidden weighting and resampling step of particles at time n − 1.It remains thus susceptible to weight collapse in high dimensions.
This approach of using importance sampling for the joint distribution of (I, X n ) is due to Pitt and Shephard (1999) who called it "Auxiliary Particle Filter" (the index I is an auxiliary variable that is discarded at the end).They discuss in addition approximations of the optimal proposal density and the optimal weights α i .One of their suggestion is to use for the index I the proposal with weights where µ n i is a likely value of the distribution p(x n |x n−1 i ), e.g.
the mean or median or simply a draw from it.Typically, µ n i is found by a probing step where particles at time n are propagated by a simplified model, e.g. by omitting stochastic terms or with simplified subgrid-scale parameterisations or thermodynamics.If the weights become They will vary less provided x n i is close to µ n j , i.e. provided the simplified model is a good approximation to the full model and the stochastic part of the full model is small.

The Equivalent-Weights Particle Filter
The EWPF (van Leeuwen 2010; Ades andvan Leeuwen 2013, 2015a,b) is an idea to obtain a more evenly weighted set of particles by not sampling from the exact posterior, but allowing for a small error.It starts with determining the weight of each particle at the mode of p( Note that these weights are equal to the weights obtained in the optimal proposal density.In the optimal proposal density case the weights do not depend on the position x n of the particle, but note that the proposal used here will be different.
The particles are not moved to these modes, but the weights are used to define a target weight.This target weight w target is chosen such that a certain fraction ρ of particles can reach that weight.To this end we sort the weights in magnitude from high to low in an array w * i , i = {1, 2, ..., N ]} and set w target = w * N * ρ .
For instance, with 100 particles and a fraction of ρ = 0.8 we would The next step is to find a position in state space for the fraction of particles that can reach this weight such that its weight is exactly equal to the target weight.This means we solve for x n in c 2017 Royal Meteorological Society Prepared using qjrms4.cls for each particle i that can reach this weight.Denote this position as x * i .Note that this is purely deterministic move, so a stochastic part still has to be added.The final position of these particles is then determined by adding a very small random perturbation ξ from a chosen density, so This stochastic move ensures that the proposal has full support and is not a delta function centred at x * i .The density of ξ i should on the one hand have most of its mass concentrated around 0 in order not to change the weights of the particles too much, and on the other hand it should be relatively constant since we divide by the value of the proposal density.Both requirements cannot be fulfilled exactly, but we can take some error in the sampling into account and choose a narrow uniform distribution.
A formal way to avoid such an error has been described by Ades and van Leeuwen (2015b).They choose the proposal to be a mixture of a uniform density and a Gaussian.Both have small variance, and the mixture coefficient of the uniform density is chosen to be much larger than that of the Gaussian.This means that drawing from the Gaussian and also drawing from its tails becomes highly unlikely.In practice, since we always work with small ensemble sizes the chance of filter degeneracy by drawing from the tail of the Gaussian is indeed highly unlikely.In this way we obtain a proposal distribution that satisfies the requirements and thus the variance of the weights will be larger than the minimum derived in Section 2.3.However, the large variance arises from a highly unlikely event.With high probability the weights for a fraction ρ of particles will be almost equal.
Finally, the full weights for the new particles is calculated and whole ensemble is resampled, including those particles that were unable to reach the target weight.Because of the target-weight construction the weights of the particles are very similar, and filter degeneracy is avoided.
To analyse the scheme further, we can again look at the variance of the weights.Before we do that it is important that this scheme does not see the weight of a particle as a function of the state X and particle index I, but rather the state as function of the weight W and index I, so X(W, I).Specifically, W |I has values in two ranges.For the particles with I = i that can reach the target weight we find w|I = w target + ǫ i in which ǫ i is a small perturbation from the target weight due to the small stochastic move discussed above.For those particles that cannot reach the target weight their weights are very close to zero.So we find: If H is linear and the errors in observations and model equations are Gaussian we find ǭ = 0, but if any of these three conditions does not hold this is not necessarily so.
However, we do know that by construction |ǭ| << 1.Since the sum of the weights should be equal to 1, we find that w target ≈ 1/(N ρ), and hence E I [W ] = 1/N , as expected.Furthermore This expression shows that the variance in the weights ranges between 0 for ρ = 1, so when all particles are kept, to (N − 1)/N 2 ≈ 1/N for ρ = 1/N , so when one particle is kept.We can compare this with the optimal proposal when the number of independent observations is large.In that case one particle will have a weight very close to one, and the rest will have weights very close to zero.The variance in the weights is then (N − 1)/N 2 ≈ 1/N , indeed equal to the ρ = 1/N case in the EWPF scheme, as expected.The EWPF can, however, reduce that variance, even to zero, depending on the choice of the tuning parameter ρ.
On the other hand, as mentioned above, there is a very small but nonzero change to draw from the tails of the Gaussian in the proposal.Since the final weight of the particle is proportional to the inverse of the proposal, and the proposal is very small in the tail of the Gaussian, this would lead to a very large change in the weight from the target weight which in turn would mean that ǫ is not small.The way the filter avoids that is by targeting small ensemble sizes, so that this possibility remains very small indeed.
In practise this means that the filter will work well, with filter degeneracy as a very rare event.Alternatively, one could choose the proposal as a narrow uniform density and accept a bias in the c 2017 Royal Meteorological Society Prepared using qjrms4.clsfilter, which then should be smaller than the Monte-Carlo error resulting from the small ensemble size.
As can be deduced from the above, the filter has a number of tuning parameters, namely the fraction of particles used to determine the target weight, and the details of the density of the small-amplitude random perturbations added near the end.A large fraction will lead to a lower target weight, and hence particles will be moved further away from the mode of the optimal proposal density.This in practise will mean that the particles are pushed further away from each other, leading to a wider posterior pdf.A small value for the fraction will have the opposite effect.Since we do not know a-priori what the width of the posterior should be, this is a clear drawback of this method.We come back to this later.

The Implicit Equal-Weights Particle Filter
In the Implicit Equal Weights Particle Filter (IEWPF) we set the target weight equal to the minimum of the optimal proposal weights for al particles.Then, the position of each particle is set to the mode of the optimal proposal density plus a scaled random perturbation.The scale factor is chosen such that the weight of each particle is equal to the target weight.Note that in the standard setting no resampling is needed, but see Zhu et al. (2016) for other possibilities.
The implicit part of the scheme follows from drawing samples implicitly from a standard Gaussian distributed proposal density q(ξ) instead of the original q(x n |x n−1 , y n ), following the same procedure as in the implicit Particle Filter.We define a relation where x a i is the mode of p(x n |x n−1 i , y n ), P is a measure of the width of that pdf, ξ n i ∈ ℜ Nx is a standard Gaussian-distributed random vector, and α i is a scalar.
The IEWPF scheme is different from the implicit Particle Filter in that it chooses the α i such that all particles get the same weight w target , so the scalar α i is determined for each particle from: This target weight is equal to the lowest weight over all particles in a optimal proposal.This ensures that the filter is not degenerate in systems with arbitrary dimensions and an arbitrary number of independent observations.The resulting equation for each α i is nonlinear and complex because it will contain the Jacobian of the transformation from ξ n to x n , similar to the implicit particle filter.
That will contain the derivative of α i to ξ i , which is the main source of the complexity.
The scheme is similar to the optimal proposal density using the implicit particle filter by first determining the mode of the proposal and then adding a random vector.The difference is that in the IEWPF the size of the vector is determined such that the each particle reaches the target weight.It turns out that this construction excludes part of state space for all but one particle.For each particle the excluded part is different, so the ensemble samples the whole space, but the individual particles do not.Details of the method can be found in Zhu et al. (2016).
Analysing the scheme in more detail, the proposal density used in this scheme is of one dimension lower than that of the state itself.The direction of the random vector in state space is determined by the proposal density, but the size of the random vector is then determined deterministically, dependent on that direction.So the proposal density misses one degree of freedom for all but one particle, the particle with the lowest weight that has Although missing one degree of freedom in a very high dimensional system might seem acceptable it does lead to a bias.

Discussion
We first note that the optimal proposal is not optimal, as has been known a long time with the invention of the auxiliary particle filter.However, it turns out that it is not difficult to generate particle filters that even have zero variance in the weights.In the optimal proposal setting one forces P rob(I = i) = 1/N , while the simple choice P rob( ) leads to an equalweight particle filter.Furthermore, schemes have been introduced that consider the state as function of the state at previous time and the weight that state should obtain, so instead of working with W (X, I) we choose X(W, I), which opens up a whole new range of efficient particle filters in high dimensional systems.
c 2017 Royal Meteorological Society Prepared using qjrms4.cls It is noted that in implementation these two latter schemes presented above are strongly related to the optimal proposal density scheme.This is clearly visible when the observation errors and the model equation errors are Gaussian and the observation operator is linear.In that case each new particle can be written as the sum of a deterministic and a stochastic move, as The EWPF and the IEWPF are particle filters that are not degenerate in high-dimensional systems (by construction) and that do not rely on localisation.However, it is easy to see that both filters are biased, or inconsistent.In the limit of an infinite number of particles the target-weight constructions will prevent the schemes to converge to the full posterior pdf.While this might make them uninteresting for some, the situation is more subtle.We know that it will always be impossible to have enough particles in any realistic high-dimensional geophysical system.So the limit of an infinite ensemble size is of no relevance to real geophysical applications.We need to infer the error in the estimate for a finite number of particles, and, given the present computer capabilities, we are talking in the range of 10-1000 particles.As long as the bias from the target-weight construction is smaller than the Monte-Carlo error the bias is of no direct consequence.
Furthermore, we have to realise that we cannot store the full pdf, so we will have to choose what aspects of that pdf we are interested in.That might mean that a large bias in one variable might be irrelevant for another.This, in turn, then suggests that different particle filters will be best for different problems.
Coming back to the equal-weight formulations, it will be clear that the number of possible methods that have this property is huge, and much more research is needed to explore the best possibilities and to understand the nature and magnitude of the approximations made.

Transportation Particle Filters
In resampling particle filters the prior particles are first weighted to represent the posterior and then transformed to unweighted particles simply by duplicating high-weight particles and abandoning low-weight particles.This can be done in several ways, see e.g.Doucet et al. (2001).The problem is that if one particle has a much higher weight then all the others the unweighted particles will all be identical to that high-weight particle, and the filter is degenerate.In transformation particle filters one tries to find a transformation that move particles from the prior to particles of the posterior in a deterministic manner.
A related approach, which uses random transformation steps, is based on tempering the likelihood, which we also discuss in this section.

One-step transportation
In the Ensemble Transform Particle Filter (ETPF) Reich (2013) the unweighted particles are linear combinations of the weighted particles, so one writes: and similar for X a , and in which D is a transformation matrix.The only conditions on These three conditions leave a lot of freedom for all N 2 elements of D, and a useful way to determine them is to ensure minimal overall movement in state space of the particles from prior to posterior.
This leads to an optimal transportation problem and is typically solved by minimizing a cost function that penalises movement of particles.
We can see immediately that this method will not work when the weights are degenerate as the solution will be degenerate and all particles have no other choice than move to the prior particle c 2017 Royal Meteorological Society Prepared using qjrms4.clswith weight (close to) one.However, the strength of this filter is that allows for localisation in a very natural ways by making the weights, and hence the matrix D, space dependent.The method will be discussed in more detail in the section on localisation.
The ETPF provides a direct map from prior to posterior particles without explicitly constructing a transformation map.An alternative approach has been suggested in Moselhy and Marzouk (2012), where an approximate transportation map T is constructed such that T belongs to certain family of maps and T is chosen such that the Kullbeck-Leibler divergence between the pdf generated by T and the posterior pdf is minimized.See Spantini et al. (2017) for an efficient implementation in the context of filtering and smoothing.

Tempering of the likelihood
Instead of trying to transform the particles from the prior to particles from the posterior in one go one can also make this a more smooth transition.In tempering (Neal (1996), see also with 0 < γ i < 1 and ensuring that the sum of the γ's is equal to 1.
Then the weighting of the particle filter is first done with the first factor, so The particles are resampled, and now the weighting is performed using the second factor, followed by resampling, etc.Of course, after resampling several particles will be identical, so one needs to jitter the particles, so perturb them slightly, to regain diversity.
This jittering should be a move of the particles that preserves the posterior pdf.It could be implemented as a Markov-Chain Monte-Carlo method with the posterior as the target density, e.g.
A problem is, however, that in sequential filtering we only have a representation of the posterior density in terms of the present particles, and this representation is very poor due to the small number of particles.Possible avenues are to fit a pdf of a certain shape to the present particles, e.g. a Gaussian mixture model, and use that as target density.
A problem in the geosciences is that this posterior fit needs to preserve the delicate balances between the model variables that are present in each particle, and an extra complication is that these balances can even be nonlinear.Also the transition kernel of the Markov Chain should somehow preserve these balances.
An example of its use in the geosciences is the Multiple Data Assimilations (MDA) method of Emerick and Reynolds (2013), in which the intermediate pdf's are assumed to be Gaussian.
If, however, one allows for model error in the model equations, and these errors are indeed non-negligible in geophysical problems, the following scheme proposed by Beskos et al. (2014) does not have this problem.In that case the prior at observation time can be written as (see equation 9): in which we assume equal-weight particles at time n − 1 for ease of presentation.In this case the MCMC method that has the posterior as invariant density is easy to find as the transition densities defined above, followed by an accept/reject step.
When This means that we have to compensate for the weights created by this sampling, so we need to introduce particle weights Then, the scheme moves on to the next observation time m ′ and repeats the weighting and the resampling.The weights contain two factors, the weight gathered at time m and the new weight that ensures that the particle moves in the right direction, so When the effective ensemble size, typically defined as becomes too low the scheme resamples and repeats the procedure, until the actual observation time.
The scheme generates extra weights during the model integration, but corrects for them at each new time we do the resampling, ensuring much better positioned particles at the actual observation time n.It has not been used in high-dimensional settings but can easily be combined with any of the particle filters described in the section on proposal densities, and with localised particle filter schemes.

Particle flow filters
There is a recent surge in methods that dynamically move the particles in state space from equal-weight particles representing the prior, p(x), to equal-weight particles representing the posterior, p(x|y).In other words, one seeks a differential equation with cs = log π(y|x)dx.Explicit expression for fs are available for certain pdfs such as Gaussians and Gaussian mixtures (Reich 2012).These particle flow filters can be viewed as a continuous limit of the tempering methods described in the previous subsection avoiding the need for resampling and jittering.
Note that the elliptic partial differential equation ( 54) does not determine fs uniquely.Optimal choices in the sense of minimizing the L 2 (ps)-norm of fs lead to the theory of optimal transportation, see Villani (2008) and Reich and Cotter (2015).
Alternatively, one can seek vector fields fs, such that as the desired vector field in ( 52).An alternative approach, called, Stein variational decent, has recently been proposed by Liu and Wang (2016).Stein variational descent can be viewed as a numerical approximation to a particle flow (52) with vector field fs(x) := ps (−∇xV (x) − ∇x log ps(x)) (58) (Lu et al. 2018).
In general, to use (63) we need to be able to evaluate ps(x i ), which is typically unknown as we only know the particle representation of ps(x).A numerical implementation of the two formulations ( 57) and ( 58) can be based on a reproducing-kernel Hilbert space F with reproducing kernel K(., .),typically taken as a Gaussian.In the sequel, we will therefore assume that K(x, z) = K(z, x) (symmetric kernel).The inner product g, f F in F satisfies the reproducing property A computational approximation to (57) can now be obtained as follows (Russo 1990;Degond and Mustieles 1990).One approximates the pdf ps by the vector field fs by and the N particles x j move under the differential equations We still need to chose the right-hand side of (62) appropriately.
But we already remark that (53) holds independent of any specific choice for u j s .Since the drift term ( 57 where the last identity defines the variational or functional gradient of KL.Maximising this change in KL as function of the flow field fs is not trivial in general.However, with the reproducing kernel property of fs we have in which K is a vector-valued kernel, typically taken as K = IK. Using this in (65), the variational derivative of the KL divergence is found as The important point is that this variational derivative is independent from fs.One now chooses fs along this direction, c 2017 Royal Meteorological Society Prepared using qjrms4.clswhich gives the steepest descent, as Finally, one replaces the integral in (67) by its empirical approximation, to obtain for the dynamics (52) in the N particles x j .
The intuition behind Stein variational descent is that the first term in (69) pulls the particles towards the mode of the posterior, while the second term acts as a repulsive force that allows for particle diversity.Liu and Wang (2016) derived this formulation for a steady-state problem, and Pulido and van Leeuwen (2018) have extended the method to sequential particle filters.
The free parameter of these methods is the reproducing kernel K(., .), which needs to be chosen such that the particles sample the posterior and that physical (and potentially other) balances are retained.One also needs to select a proper time stepping scheme, typically chosen as a forward Euler scheme with variable time step ǫ, which can now be viewed as the step length in a gradient descent optimisation algorithm.

Discussion
Viewing particle filters as a transportation problem from equalweight particles of the prior to equal-weight particles of the posterior has led to an interesting set of filters.None of them have been implemented yet in high-dimensional settings, but some of them are ready to do so.The strong involvement of the machine learning community in problems of this kind also suggests rapid progress here.Finally we mention that the equal-weight particle filters from section 2 can be viewed as one-step transportation filters that explore the proposal density freedom, and in fact transform prior particles at time n − 1 to posterior particles at observation time n.

Localisation in Particle Filters
Localisation is a standard technique in Ensemble Kalman filtering to increase the rank of the ensemble perturbation matrix, allowing for more observations to be assimilated, and to suppress spurious correlations where real correlations are very small, but ensemble correlations are larger because of sampling noise.It limits the influence each observation can have to a localisation area that is much smaller than the full model domain.That idea can easily be incorporated when calculating the particle weights locally, as pioneered by Bengtsson et al. (2003), andvan Leeuwen (2003), and used in a high-dimensional parameter estimation problem in Vossepoel and van Leeuwen (2006).The difficulty, as we shall see, lies in the resampling step: how does one generate 'smooth' global particles from local resampled particles.Smooth here is not well defined, but it is related to the particles having realistic physical relations (balances) between the model variables.For example, if geostrophic balance is dominant, the resampling procedure should not generate particles that are completely out of geostrophic balance as that would lead to spurious adjustment processes via spurious gravity waves.
The formal way localisation can be introduced in particle filtering is as follows.Let us denote the state at grid point k as x k .Hence in contrast to other sections a superscript here denotes not the time index, but the grid point.Note that in geoscience applications each grid point typically has several model variables, so x k is a vector in general.Physically it makes sense to assume that the posterior of the state at this grid point depends only on a subset of the observations.Let us denote that subset as y [k] .We can then write: In turn, these observations are dependent not on the whole state vector but only on part of the state vector, denoted by x (k) : Introduce the notation x (k)\k to denote all those grid points in that part of the state vector excluding grid point k.Then we can rewrite the above as an integral over the joint pdf: c 2017 Royal Meteorological Society Prepared using qjrms4.cls Exploring Bayes Theorem we find: Taken together, this shows that The weights w k i thus depend only on the local observations y [k]   and the local prior particles x (k) i , so that the variance of the weights will be much smaller.
The approximation ( 70) is not unrealistic: a temperature observation in New York is not expected to change our pdf of the temperature in London at the moment of the observation.There will be, of course, an effect at later times, but that is not relevant here.The same assumption underlies the use of localisation in Ensemble Kalman Filters, and in variational methods when the background error covariance is constructed.
However, mathematically it does not follow from the assumption that under the prior the values of the state at grid points separated by more than a certain distance are independent.There can be an indirect flow of information from observations far apart over observations between neigboring grid points.In Ensemble Kalman Filters, the Kalman gain is generally a dense matrix even Repeating this procedure for all grid points, we obtain all marginals of the posterior pdf.However, because the weights w This means that to obtain global particles that can be forwarded with the model equations one would need to somehow glue different particles together.This is a major problem and has hampered localisation in particle filtering since the early 2000's.However, recently clever smoothing schemes have been constructed that seem to work well in high-dimensional geophysical applications.We will report on those below.
Another issue is that the localisation area cannot be too large to avoid filter collapse.As a rule of thumb, when there are more than say 10 independent observations inside a local area the particle filter will still tend to be degenerate for the number of particle one can typically afford, of O(10 − 1000).This means that when the observation density is high the localisation areas have to become unphysically small, or observations have to be discarded.This issue might be solved using tempering techniques as discussed earlier, but is often avoided by artificially enforcing a minimal weight of the particles, or by changing the observations, for instance by projecting them on a lower dimensional space favoured by the prior.
Setting a minimal weight or projecting observations to a lower dimensional space favoured by the prior has as consequence that not all information will be extracted from the observations, as observations that are very different from from the existing particles will be largely ignored.This is not directly equivalent to the standard quality control measures used by operational weather forecasting centres, in which observations that are a few standard deviations away from the forecast are ignored.The issue here is that a distance of less then one standard deviation for a few observations can already lead to weight collapse, and artificially setting minimum values for the weights avoids that.

Localisation based on resampling
Several localisation schemes have been proposed and discussed in the review by van Leeuwen ( 2009) and those will not be repeated here.The most obvious thing to do is to weight and resample locally, and somehow glue the resampled particles together via averaging at the edges between resampled local particles (van Leeuwen 2003).In the following several schemes in this category are discussed.

The Localized Particle Filter
Recently, Penny and Miyoshi (2016) used this idea with more extensive averaging, and their scheme runs as follows.First, for each grid point j the observations close to that grid point are found and the weight of each particle i is calculated based on the c 2017 Royal Meteorological Society Prepared using qjrms4.clslikelihood of only those observations: in which y j denotes the set of observations within the localisation area.This is followed by resampling via Stochastic Universal Resampling to provide ensemble members x a i,j with i = 1, ..., N for each grid point j.As mentioned before, the issue is that two neighbouring grid points can have different sets of particles, and smoothing is needed to ensure that the posterior ensemble consists of smooth particles.
This smoothing is performed for each grid point j for each particle i by averaging over the N p neighbouring points within the localisation area around grid point j: in which j k for k = 1, ..., N p denotes the grid point index for those points in the localisation area around grid point j.The resampling via Stochastic Universal Resampling is done such that weights are sorted before resampling, so that high-weight particles are joined up to reduce spurious gradients.
While this scheme does solve the degeneracy problem in simple one-dimensional systems it is unclear if it will work well in complex systems such as the atmosphere in which fronts can easily be smoothed out, and nonlinear balances broken, see e.g. the discussion in van Leeuwen (2009).

The Local Particle Filter
A different scheme has recently been proposed in Poterjoy (2016), that involves a very careful process of ensuring smooth posterior particles and retaining nonlinear relations.It proceeds as follows.
First, adapted weights are calculated for the first element of the observation vector y 1 , as wi = αp(y These weights are then normalised by their sum W . Then we resample the ensemble according to these normalised weights to form particles x ki .
The scalar α is an important parameter is this scheme, with α = 1 leading to standard weighting, and α = 0 leading to all weights being equal to 1 (before normalisation).Its importance lies in the fact that the weights are always larger than 1 − α, so even a value close to 1, say α = 0.99, leads to a minimum weight of 0.01 that might seem small, but it means that particles that are more then 1.7 observational standard deviations away from the observations have their weights cut off to something close to 1 − α.This seriously limits the influence the observation can have on the ensemble.Furthermore, the influence of α does depend on the size of the observational error, which is perhaps not what one would like.It is included to avoid loosing any particle.Now we do the following for each grid point j.For each member i we calculate a weight ωi = αρ(y 1 , x j , r)p(y in which ρ(..) is the localisation function with localisation radius r.We normalise these weights with their sum over the particles, so we obtain a normalised weight ω i for this grid point.Note, again, the role played by α.Then the posterior mean for this observation at this grid point is calculated as in which x i,j is the state at grid point j of particle i.Next a number of scalars are calculated that ensure smooth posterior fields (Poterjoy 2016): in which .. denotes a distance measure in the grid point.The final estimate becomes: c 2017 Royal Meteorological Society Prepared using qjrms4.cls where k i is the index of the i's sampled particle.This procedure is followed for each grid point so that at the end we have an updated set of particles that have incorporated the first observation.As a next step the whole process is repeated for the next observation, with a small change that ωi is multiplied by ωi from the previous observation, until all observations have been assimilated.In this way the full weight of all observations together is accumulated in the algorithm.Now the importance of α comes to full light: without α the ensemble would collapse because the ω's would be degenerate when observations are accumulated.
The final estimate shows that each particle at grid point j is the posterior mean at that point plus a contribution from the deviation of the posterior resampled particle from that mean and a contribution from the deviation of the prior particle from that mean.So each particle is a mixture of posterior and prior particles, and departures from the prior are suppressed.When α = 1, so for a full particle filter, we find for grid points at the observation location, for which ρ(y 1 , x j , r) = 1, that c j = 0, so r 2j = 0, and r 1j ≈ 1, so indeed the scheme gives back the full particle filter.
Between observations it can be shown that the particles have the correct first and second order moments, but higher-order moments are not conserved.To remedy this a probabilistic correction is applied at each grid point, as follows.The prior particles are dressed by Gaussians with width 1 and weighted by the likelihood weights to generate the correct posterior pdf.The posterior particles are dressed in the same way, each with weight 1/N .Then the cdf's for the two densities are calculated using a trapezium rule integration.A cubic spline is used to find the prior cdf values at each prior particle i, denoted by cdf i .Then a cubic spline is fitted to the other cdf, and posterior particle i is found as the inverse of its cdf at value cdf i .See Poterjoy (2016) for details.The result of this procedure is that higher-order moments are brought back into the ensemble between observation points.
This scheme, although rather complicated, is one of the two local particle filter scheme that has been applied to a highdimensional geophysical system based on primitive equations in Poterjoy and Anderson (2016).(van Leeuwen (2003) applied a local particle filter to a high-dimensional quasi-geostrophic system, but that system is quite robust to sharp gradients as it does not allow gravity waves.)

The Localised Adaptive Particle Filter
The localized adaptive particle filter (LAPF) is based on the localized version of the ensemble transform (48) following the LETKF described in Hunt et al. (2007), see also Reich (2013), with localization in observation space, and resampling in the spirit of Gaussian Mixture filters (Stordal et al. 2011).Localization is carried out around each grid point, and a transform matrix D is calculated for each localization box.We note that as for the LETKF the weights given by ( 7) depend continuously on the box location and the observations.
In a first step, the observations are projected into the space spanned by the prior particles.As mentioned above, this will reduce the information extracted from the observations, but is perhaps less ad-hoc than setting a lower bound on the weights, as in the LPF.The LAPF carries out local resampling in the sense that a list of particles with weight above a threshold of 1/N is calculated, with multiplicities according to the relative weights of each remaining particle.In this step, typically filter divergence appears, since only a selected number of unique particles carry most of the weight.This is still the case even in the localized setup.We note, hoewever, that due to continuity of the weights the selection of particles changes only modestly according to the changes of the weights.The outcome of the first step in each box is a set of Ñ ≤ N particles, which by taking into account their multiplicity yields N temporary local particles.
In a second step, a careful adaptive sampling is carried out in ensemble space around each of the N temporary particles.This approach is understood as a form of Bayesian sampling where each particle represents a local part of the posterior described by a basis function centered around the particle, a special case is the Gaussian mixture (88).We note, however, that the Gaussian mixture here is explicitely constructed for the posterior.This The LAPF is the first particle filter that has been implemented and tested in an operational numerical weather prediction context, and we provide a short description of the procedure.
The method has been implemented in the data assimilation show that the quality of the LAPF does not yet reach the scores of the operational global LETKF-EnVAR system, but the system runs stably and forecast scores are about 10-15% behind the current operational system.

The Local Ensemble Transform Particle Filter
This filter uses a classic sequential importance resampling particle filter from a set of forecast particles x f i , which can be obtained employing either the standard or the optimal proposals (or any other) and their associated importance weights w f i .The particles are then resampled in a statistically consistent manner, which can be characterized by an N × N stochastic transition matrix D with the following properties: (i) all entries d ij of D are non-negative and Let us denote the set of all such matrices by D. Then any D ∈ D leads to a resampling scheme by randomly drawing an element j * ∈ {1, . . ., N } according to the probability vector p j = (p 1j , . . . ,p N j ) ∈ R N for each j = 1, . . ., N .The jth forecast particle x f j is then replaced by x f j * and the new particles x n j = x f j * , j = 1, . . ., N , provide an equally weighted set of particles from the posterior distribution.Note that monomial resampling corresponds to the simple choice The ensemble transform particle filter (ETPF) (Reich 2013; (84) It has been shown under appropriate conditions that the variance of a resampling step based on D vanishes as N → ∞ (McCann 1995;Reich 2013).This fact is utilized by the ETPF and one defines even for finite particles numbers.Of course, by its very construction, the ETPF underestimates the posterior covariance.
However, there are corrections available that lead to second-order accurate implementations (de Wiljes et al. 2017).See Section 5.3 for more details.
Following previously introduced notations, localization can now be implemented into the ETPF as follows.For each grid point k, we extract the values of the forecast particle x f i at that grid point and denote them by x k i .We also associate an appropriately localized importance weight w k i to x k i .Then (84) gives rise to an associated localized transformation matrix at grid point k with the set D k defined by Note that the transport cost (distance)

Space-Time Particle Filters
The idea to run a particle filter over the spatial domain was introduced by van Leeuwen ( 2009), and the first algorithm, the Location Bootstrap Filter, was published by Briggs et al. (2013).
The Space-Time Particle Filter by Beskos et al. (2017) improves on this algorithm by removing the jitter step, as explained below.
In the following we assume observations at every grid point, but the algorithms can easily be adapted to other observation networks.
The Location Particle Filter of Briggs et al. (2013) runs as follows.First the ensemble of particles is run until the observation time n.From a spatial or grid-point perspective, this ensemble represents the prior joint pdf over all grid points.
Then the grid points are ordered 1, ..., L, such that points l and l + 1 are neighbouring grid points for each l ∈ 1, ..., L. In each grid point l we have a sample x l i for i ∈ 1, ..., N , in which the upper index denotes grid point number, not time, in this section.
We start the spatial particle filter at location l = 1 by calculating the weight p(y 1 |x 1 i ) (where the time index is suppressed) for each prior particle i, and perform resampling using these weights over the whole spatial domain.This means that the resampled particles are now samples of p(x 1:L |y 1 ).A small amount of jitter is added to avoid identical particles.The choice of this jitter density is again not clear for geophysical applications, more research is needed on this issue.
Then, the algorithm moves to the next grid point, calculates the weights p(y 2 |x 2 i ), and resamples the full state particles using this weight, generating samples from p(x 1:L |y 1 , y 2 ).Again some jitter is needed to avoid ensemble collapse, and the algorithm moves to the next grid point, until all grid points are treated this way.
Note that the algorithm does not suffer from artificial sharp gradients due to different particles being resampled at different grid points, but the algorithm will be very sensitive to the choice of jitter density used after updating the ensemble in each grid point.
Furthermore, when prior and posterior are very different, the algorithm will perform poorly, and Briggs et al. (2013) propose a smoother variant that employs copulas for numerical efficiency.
We will not discuss that variant here.Beskos et al. (2017) introduce the Space-Time Particle Filter.
Instead of using a jitter density to avoid identical particles they exploit the spatial transition density p(x n,l |x n,1:l−1 , x n−1,1:L ), in which n is the time index and l the spatial index.(In fact, Beskos et al. (2017) allow for a proposal density, but we will c 2017 Royal Meteorological Society Prepared using qjrms4.clsexplain the algorithm with using the prior spatial pdf as proposal.) So they exploit the pdf of the state at time n and grid point l, x n,l , conditioned on all previous grid points x n,1:l−1 at the same time, and conditioned on all grid points at time n − 1, denoted x n−1,1:L .The latter conditioning is needed when an explicit time step is used, in which case the explicit dependence on x n,1:l−1 disappears, and the spacial dependence comes in via the transition density from time n − 1 to time n.They do this by introducing a set of M local particles j, for each global particle i, with i ∈ 1, ..., N .
For each of the global particles i they run the following algorithm over the whole grid: 1 Starting from location l = 1 the M local particle filters grow in dimension when moving over the grid towards the final position L. At the first grid point the prior particles at that grid point are used, weighted with the local likelihood p(y 1 |x 1 ) and resampled.Let is call these particles x1 j , in which j is the index of the local particle, and 1 is the index of the grid point.
2 The mean G 1 of the unnormalised weights is calculated.
3 For the next grid point each of these M resampled particles are propagated to that grid point by drawing from p(x 2 |x 1 j , x n−1,1:L j ).Since each of the M particles is drawn independently they will differ and no jittering is needed.
4 Then the unnormalised weights p(y 2 |x 2 ) are calculated, and its mean G 2 , followed by a resampling step.
5 This process is repeated until l = L, so until the whole space is covered.
6 Finally, the total weight which is the unnormalised weight of the 1st global particle.
This procedure is followed N times for each global particle i independently.These global particles are then resampled according to the weight G i It is still possible that this filter is degenerate, see Beskos et al. (2017) for details and potential solutions.
The importance of this filter lies in the fact that there is a formal proof that it converges to the correct posterior for an increasing number of particles, unlike any of the other algorithms discussed.
Furthermore, the authors show that degeneracy can be avoided if the number of particles grows as the square of the dimension of the system, indeed much faster convergence than e.g. the optimal proposal density.

Discussion
Following into the footsteps of Ensemble Kalman Filters, exploring localisation is a rapidly growing field.But localisation in particle filters is not trivial as there is no automatic smoothing via smoothed sample covariances as in Ensemble Kalman filters.This jittering pdf can be chosen arbitrarily, for instance a smooth Gaussian, but it does violate Bayes Theorem.As mentioned above, this error might be negligible when the ensemble size is small.The latter method explores the transition density over space and time, leading to much better estimates of the spatial relations between grid points.Another potential issue of both methods is that if the spatial field is two or higher dimensional, as in geoscience applications, it is unclear how to order the grid points, and potentially large jumps might be created between neighbouring grid points that are treated as far apart by the algorithm.This needs further investigation.
c 2017 Royal Meteorological Society Prepared using qjrms4.cls

Filters
As mentioned in the previous section, there are two issues with localisation.Firstly, particle filters that employ resampling need to ensure smooth updates in space so that the newly formed global particles do not encounter strong adjustments to physical balances due to artificial gradients from glueing particles together.Presentday localised particle schemes concentrate on this issue.
Secondly, the localisation area cannot contain too many independent observations, and as a rule of thumb 10 independent observations is often too many, to avoid weight collapse.As mentioned, this demand can be in strong contrast with physical considerations of appropriate length scales.This is one of the main reasons to consider hybrids between particle filters and ensemble Kalman filters within a localisation scheme.In the following several recent hybrid methods are presented.

Adaptive Gaussian Mixture Filter
A bridging formulation allows to smoothly transition between an ensemble Kalman filter and a particle filter analysis update.
One such formulation is the adaptive Gaussian mixture filter (Stordal et al. 2011).
In an Gaussian mixture filter, the distribution is approximated by a combination of normal distributions centered at the values of the particles.Thus we have where N (x i , P) is a Gaussian Kernel with mean x i and covariance P.This covariance is initialized from the sample covariance matrix P of the ensemble by multiplying with a so-called bandwidth parameter 0 < h ≤ 1 such that At the analysis time, the filter computes a two-step update: In the first step we update the ensemble members and the covariance matrix according to the Kalman filter equations given by and For computational efficiency the analysis equations in the (adaptive) Gaussian mixture filter (Hoteit et al. 2008;Stordal et al. 2011) were proposed to use a factorized covariance matrix in the form P = LUL T as can be obtained from a singular value decomposition of the ensemble perturbation matrix and used, e.g. in the SEIK filter (Pham 2001) and error-subspace transform Kalman filter (ESTKF, Nerger et al. 2012).However, the particular form of the Kalman filter update equations is not crucial here.
In the second step we update the weights of the particles according to and then normalise these so that the sum of the weights is one.
The bridging is now done by interpolating the analysis weight with a uniform weight N −1 as where α is the bridging parameter.We obtain a transition between the ensemble Kalman filter and the particle filter by varying both α and h.For α = 0 and h = 1 we obtain the uniform weights of the ensemble Kalman filter, while for α = 1 and h = 0 we obtain the particle filter weights.Stordal et al. (2011) proposed to adaptively estimate an optimal value of α by setting α = N −1 Neff where Neff = ( i w 2 i ) −1 is the effective sample size.
The update formulation of the adaptive Gaussian mixture filter reduces the risk of ensemble degeneracy, but cannot fully avoid it.
To this end, we can combine the filter with a resampling step as in other particle filters.
c 2017 Royal Meteorological Society Prepared using qjrms4.clsweight distribution.Finally, the starting points of the centres of the prior Gaussians will be closer the observations, suggesting more uniform weights.
In an extension of the scheme, Frei and Künsch (2013) suggest to form a tempering scheme, alternatively using the ensemble Kalman filter and the particle filter.The resampling step of the particle filter is not problematic in this case as the Kalman filter will diversify identical particles in each next iteration.The paper also discusses approximate schemes for non-Gaussian observation errors and nonlinear observation operators.
In Robert et al. (2017), a variant of this method has been introduced which is based on the transform variant of the EnKF instead of the stochastic variant and in which the update is in ensemble space: where the column sums of W equal 1.The matrix W can be split into where W µ corresponds to computing the centers µ i , W α to the resampling and W ξ to the added noise ξ i .In the transform variant W ξ is deterministic and chosen such that the sample covariance of X P I is equals the covariance of the Gaussian mixture (101).
It thus belongs also to the class of second-order exact filters discussed in the next section.

Robert et al. (2017) apply a localized transform Ensemble
Kalman Particle Filter in the KENDA (Kilometer-Scale Ensemble Data Assimilation) with a setup similar to the one used operationally by MeteoSwiss.This system computes the weight matrices W only on a coarse grid and then interpolates these matrices to the original grid.Therefore the discontinuities introduced by resampling are smoothed out, but in a way that is possibly optimal for the EnKF and not for the EnKPF.
In Robert and Künsch (2017) a different localization method for the EnKPF was developed which proceeds by sequentially assimilating observations y k , limiting the state components influenced by y k to a subset.It smoothes out the discontinuities that occur when we connect a resampled particle in the region influenced by y k is connected to a background particle outside of this region.The smoothing is done in such a way that the secondorder properties of the smoothed particle remain correct.

Second-order exact filters
A second-order exact filter ensures that the posterior ensemble mean and ensemble covariance matrix are equal to those obtained from the particle filter weights.Thus, the requirement for the mean of the analysis ensemble is where the superscript f denotes the forecasted state vector.
Likewise, the posterior ensemble covariance matrix is required to fulfil (112)

Merging Particle Filter
The merging particle filter by Nakano et al. (2007) explores the sampling aspect of the resampling step.The method draws a set of q ensembles each of size N from the weighted prior ensemble at the resampling step.Then these sets are merged via a weighted average to obtain a new set of particles that has the correct mean and covariance but is more robust than the standard particle filter.
Define x i,j as ensemble member i in ensemble j.The new merged ensemble members are generated via To ensure the the new ensemble has the correct mean and covariance the coefficients α j have to be real and need to fulfil the two conditions c 2017 Royal Meteorological Society Prepared using qjrms4.cls When q > 3 there is no unique solution for the α's, while for q = 3 one finds: We can make the weights space-dependent in high-dimensional systems and since the new particles are merged older particles the resulting global particles are expected to be smooth.

Nonlinear Ensemble Transform Filter
A simple formulation of a second-order exact filter can be obtained by using Eq. ( 110) to compute the mean of the posterior ensemble (Xiong et al. 2006;Tödter and Ahrens 2015).For the associated ensemble perturbations, we can derive from Eq. ( 112) with w = (w 1 , . . ., w N ) T and W = diag(w) that where is the matrix of forecast (prior) ensemble perturbations.Posterior ensemble perturbations can now be obtained by factorizing A = W − ww T , e.g. by a singular value decomposition as A = VΛV T .This leads to A 1/2 = VΛ 1/2 V T .and posterior perturbations are then given by Finally, the full posterior particles are given by The computation of this filter are very similar to those in ensemble square-root Kalman filters like the ETKF (Hunt et al. 2007) or ESTKF (Nerger et al. 2012).As such, we can can also localize the filter in the same way.The localized NETF has been successfully applied to a high-dimensional geophysical system based on primitive equations in Tödter et al. (2016).In addition, the filter can be easily extended to a smoother by applying the filter transform matrix (the term in parenthesis in Eq. 118) to previous analysis times (Kirchgessner et al. 2017).

Nonlinear Ensemble Adjustment Filter
There is also a stochastic variant of this algorithm (Lei and Bickel 2011), which is motivated from the Stochastic Ensemble Kalman filter (Burgers et al. 1998;Houtekamer and Mitchell 1998).In this filter, we generate a set of perturbed model observations which represents the observation probability distribution.We now obtain an analysis mean of each particle analogous to Eq. ( 110) by where each weight w i (y k ) is computed from the likelihood of the perturbed measured ensemble member H(x i ) .When we now define we obtain the posterior ensemble members as where x n is given by Eq. ( 110) and P a is given by Eq. ( 112).This update equation only yields the correct first and second moments of the posterior distribution in the limit of a large ensemble.

Second-order exact ETPF
Also the ETPF (see Sec. 4.2) can be formulated to be second-order accurate (de Wiljes et al. 2017).For this, we approximate where the matrix D is obtained through (84).To ensure the second-order accuracy, we introduce a correction term such that c 2017 Royal Meteorological Society Prepared using qjrms4.clswith ∆ being a symmetric N × N matrix.Using ∆ in Eq. ( 123) and requiring that the result is equal to A leads to the condition Note that D still satisfies (82).However, d ij ≥ 0 does not hold anymore, in general.

Hybrid LETPF-LETKF
The hybrid LETPF-LETKF is also based on the simple idea of splitting the likelihood function into two factors at each grid point with α ∈ (0, 1), but now the particle filter is employed first, followed by the ensemble Kalman filter.This is similar to tempering in just two steps.When the likelihood is Gaussian the posterior is expected to be more Gaussian than the prior.Hence it makes sense to use a particle filter in the fist step, and to try to use an EnKF in the second step of the tempering procedure.
If the likelihood is Gaussian with localized error covariance matrix R k , then the factorization is equivalent to scaling this matrix by 1/α and 1/(1 − α), respectively.Hence, one can, for example, first apply an LETPF to the forecast particles x f i with inflated covariance matrix R k /α in order to obtain new particle values refined selection criteria for the parameter α are needed to make the hybrid LETPF-LETKF method widely applicable.

Hybrid EnVar PF
Based on the localized adaptive particle filter (LAPF) described in Section 4.1.3,a hybrid particle filter based ensemble variational data assimilation system (PfVar) can also be constructed.The idea is to replace the LETKF based ensemble in an EnVar by an LAPF based ensemble.
We briefly discuss a practical numerical weather prediction example here.Following Buehner et al. (2013), the operational EnVAR system of DWD for the ICON model with 13km global resolution and 6.5km resolution of its two-way nested area over Europe is using the ensemble of the global 40 member LETKF for its dynamic covariance matrix with a ratio of 70:30 towards the classical NMC based covariance matrix of the three-dimensional variational data assimilation system with 3h cycling interval.The LETKF ensemble is replaced by the LAPF ensemble, where the quality control of the variational high-resolution run is used for the ensemble data assimilation system under consideration.In the current system, no recentering of the ensemble with respect to the variational mean estimator is carried out, leading to a form of weak coupling of the systems.
In a quasi-operational setup (without a high-resolution nest), the hybrid PfVAR is running stably for a period of one month.
The observation minus background statistics show very promising behaviour in several case studies which are under investigation at DWD (Walter et al. 2018).In the current state of tuning, the forecast quality of the PfVAR seems comparable to the forecasts based on the LETKF based EnVAR.These new results studied in combination with Robert et al. (2017) show that today's particle filters are approaching the quality of state-of-the-art operational ensemble data assimilation systems and are already becoming important tools on all scales of NWP.

Discussion
Hybrid particle-ensemble Kalman filter schemes, especially when implemented adaptively, can avoid weight collapse in the particle filter part of the hybrid in any situation.The price paid is that not all information from the observations is extracted when the c 2017 Royal Meteorological Society Prepared using qjrms4.clsposterior pdf is severely non-Gaussian, but in many situations this is not the dominant source of error.The reason why these schemes are competitive is that they do take into account some non-Gaussianity via the particle filter, while the particle filter alone is very inefficient compared to the Ensemble Kalman Filter when the posterior is actually close to a Gaussian.So the objective is not necessarily to make the α as small as possible, but indeed to find an optimal α to ensure that the Ensemble Kalman Filter is used whenever we can.The same is true for the bridging parameter in the Adaptive Gaussian Mixture Filter.
The second-order exact filters are hybrids of a different kind, focussing on obtaining the posterior mean and the covariance correct given the limited prior ensemble.These methods are expected to be quite competitive to the hybrid filters discussed above, and the relative performance will depend strongly on the measure used to define what is best.For instance, RMSE are expected to be better for the second-order exact filters, while full ensemble measures like rank histograms and continuous ranked probability scores might benefit from the hybrid schemes.which order is best when, much more research is needed.

Conclusions and discussion
The largest issue of standard particle filters was until recently their degeneracy in high-dimensional settings: when the number of independent observations is large and the number of particles is limited (of order 10-1000 for geophysical applications), one particle gets weight one, and all others get weight zero.
Two developments have revived the interest in particle filters: efficient proposal densities and localisation, while hybrids with Ensemble Kalman Filters and recently transportation filters enhance confidence in the usefulness of particle filters in highdimensional settings.The new kid on the block are particle flow methods.Their popularity in the large machine learning community ensures rapid progress here too.It is unclear at this moment how competitive these new ideas will be.It is clear that developments on particle filters have been very fast, and the first tests of both localised and hybrid particle-EnKF filters in operational numerical weather prediction have been performed, with highly encouraging results.
This paper discussed these new developments and demonstrates that particle filters are useful in even the largest dimensional geophysical data-assimilation problems, and will allow us to make large steps towards fully nonlinear data assimilation.
From the presentation it has become clear that the field is too young to provide solid guidance on which method will be most fruitful for which problem.Given that most data-assimilation practitioners will have an implementation of a local Ensemble Kalman Filter in some form, localised particle filters seem to be the fastest way to make progress.However, one has to keep in mind that the resampling step needs smoothing that is more complex than in an Ensemble Kalman Filter, although exciting new variants like the ETPF and LAPF allow for smooth updates in a very natural way.Furthermore, with the small ensemble sizes now practical (10-100), more then 10 independent observations in a localisation area may already lead to filter degeneracy, forcing us to look into methods that limit the weights from below.This is another ad-hoc procedure that limits information extraction from observations, but it is unclear how severe this issue is.
Even more easy are implementations of hybrid PF-EnKF filters, but it is still unclear what these filters target.At the moment their value lies in bringing more non-Gaussianity into Ensemble Kalman Filters, but at the same time ensure that an Ensemble Kalman Filter is used when that is warranted.
We discussed two main variants that try to avoid localisation because of the issues discussed above: the equal-weight particle filters and transportation particle filters.The equal-weight variants, which avoid weight collapse by construction, do not have a complete mathematical foundation yet.We know these schemes are biased, but since they are tailored to high-dimensional problems with small ensemble sizes the bias error might be c 2017 Royal Meteorological Society Prepared using qjrms4.clssmaller than the Monte-Carlo error from the small ensemble size.Transportation particle filters still have to demonstrate their full potential in geoscience applications, but initial experiments with e.g.mapping particle filters on low-to-moderate dimensional systems together with the way they are formulated suggest they could become mainstream competetive schemes.
All, in all, a huge progress has been made in particle filtering, and initial attempts to implement them into full-scale numerical weather prediction models have succeeded, with promising initial results.This shows that particle filters cannot be ignored and will soon become part of mainstream data-assimilation systems in the geosciences.

Appendices A. Law of total variance
The law of total variance is an elementary theorem in statistics and probability.It can be proven as follows.First we need the Law of total expectation, which reads, using Using this equality on V ar X [X] leads to: which proves the theorem.
like localisation and inflation to find accurate estimates.Hybrids of variational and ensemble Kalman filter methods are a step forward, although localisation and inflation are still needed in realistic applications.An extra complication is localisation over time needed in ensemble smoothers like the Ensemble Kalman Smoother and 4DEnsVar when the fluid flow is strong: what is local at observation time is not necessary local at the start of the assimilation window because the observation influence is advected with the flow.Furthermore, the recent surge of papers on accurate treatment of observation errors shows that a long way is still ahead of us to solve even the (close to) linear data-assimilation problem.

DelMoral
et al. (2006) and Beskos et al. (2014)) one divides the likelihood up as follows: p(y|x) = p(y|x) γ1 ...p(y|x) γm several model time steps are performed between observation times one can also perform tempering in the time domain, as explored in van Leeuwen (2003) and van Leeuwen (2009) in the Guided Particle Filter.The idea is to assimilate the observations ahead of time, with a larger observation error covariance, and resample the particles.This can be done several times during the forward integration of the particles decreasing the observation error covariance each time.At the observation time the normal observation error covariance is used.This will force the particles towards the observations, and does not need extra jittering because each particle will see a different model noise realisation β in the model integration after the resampling steps.Of course one has to compensate for the fact that we have changed the model, and the way to do that is to realise that we have used importance sampling.Specifically, at time m we weight each particle i according to p(y n |x m i ) αm in which αm is a chosen factor between 0 and 1, increasing with m.For instance, one could use a linear increase c 2017 Royal Meteorological Society Prepared using qjrms4.clswith m, such as αm = (n − m)/(n − n ′ ), in which n ′ is the time of the previous set of observations.After weighting with w m i ∝ p αm (y n |x m i ) and resampling we attach a weight 1/w m j to each resampled particle, in which the j index is the new index of the resampled particle.The particle will keep this weight to the next artificial observation time.The connection with importance sampling is clear, instead of sampling from p(x m |x m−1 i ), we sample from a pdf q(x m |x m−1 i , y * ) ∝ p(x m |x m−1 i )p(y * |x m ) αm , in which y * is equal to y n taken at time m, and with larger uncertainty related to αm.
) in artificial time s ≥ 0 with the flow map defining the desired transformation.If the initial conditions of the differential equation (52) are chosen from a pdf p 0 (x), then the solutions follow a distribution characterized by the Liouville equation ∂sps = −∇x • (psfs) .(53)Upon setting p 0 (x) = p(x) two classes of particle flow filters arise.The particle flow filters ofDaum and Huang (2011, 2013);Reich (2011) find a sequence of vector fields fs(x), s ∈ [0, 1] such that p s=1 is equal to the desired posterior pdf p(x|y).The vector fields fs has to satisfy the elliptic partial differential equation ∇x • (psfs) = −ps(log π(y|x) − cs) ). Again several choices are possible.For example, an appropriate vector field fs can be derived from a reformulation of the Fokker-Planck equation for Brownian dynamics with potential V (x) = − log p(x|y) for example,Reich and Cotter (2015)), i.e. ∂sps = ∇x • (ps∇xV (x)) + ∇x • ∇xps = −∇x • (ps {−∇xV (x) − ∇x log ps}) and, hence, one obtains fs(x) := −∇xV (x) − ∇x log ps(x) = −∇x log ps ) gives rise to a gradient flow in the space of pdfs with respect to the Kullback-Leibler divergence KL = KL(ps||p(•|y)) between ps and the posterior pdf(Reich and Cotter 2015), it is natural to introduce the following particle approximation of the Kullback-Leibler divergenceV({x l }) := ps, log ps p(x|y) F. which leads to a gradient flow in the particles {x l } minimising V.The above formulation restricts the pdf ps, and hence the prior and the posterior, to be of the form (60).Alternatively, one can embed the vector field of the flow in an appropriate reproducing kernel Hilbert space and not the density itself.With that we can derive a practical implementation of the Stein variational formulation (58) as follows.First, note that the change in KL due to the flow field fs can easily be found as: dKL = lim ǫ→0 KL(p s+ǫ ) − KL(ps) ǫ = − ps(x) [fs(x)∇x log p(x|y) + ∇xfs(x)] dx.
grid point to the next, it is non-trivial to obtain a consistent posterior for pairs of state values (x k , x ℓ ) (and similarly for triplets etc.).
scheme runs as follows: system DACE (Data Assimilation Coding Environment) of Deutscher Wetterdienst (DWD) Potthast et al. (2018).The DACE environment includes an Local Ensemble Transform Kalman Filter (LETKF) based on Hunt et al. (2007) both for the global ICON model system and the convection permitting COSMO model system of DWD, compare Schraff et al. (2016)), both of which are run operationally at DWD * and build a basis, framework and reference for the LAPF particle filter implementation.The EDA system is equipped with a variety of tools to control the spread of the ensemble, such as multiplicative inflation and additive inflation, relaxation to prior spread (RTPS), relaxation to prior perturbations (RTPP) and stochastic schemes to add spread to soil moisture and sea surface temperature (SST) when needed (details are described in Schraff et al. (2016)).These tools are indispensible parts of the operational DWD ensemble data assimilation system and have been part of the large-scale particle * since January 20, 2016 for the global ICON model with 40km global ensemble resolution including a 20km resolved two-way nest over Europe; and since March 21, 2017 for the COSMO model with 2.8km resolution over central Europe filter testing, for careful comparisons keeping the setup as close to the operational settings as possible.Tests with the LAPF for the global ICON model with 40 particles of 40km global resolution have been successfully and stably run over a duration of one month.Extensive tests on how many particles form the basis for resampling in each localization box have been carried out, the numbers vary strongly over the globe and all heights of the atmosphere, ranging from 1 to N , with relatively flat distribution.Diagnostics and tuning of the system is under development and discussed in Potthast et al. (2018).Results other localized cost function.See Chen and Reich (2015) for more details.Solving the optimal transport problem (86) at each grid point can be computationally expensive.Less expensive approximations, such as the Sinkhorn approximation, and their implementation into the localized ETPF (LETPF) are discussed in de Wiljes et al. (2017).
Most local particle filters impose explicit spatial smoothing, which can affect delicate balances in the system.The Ensemble Transform Particle Filter and the Localized Adaptive Particle Filter come closest to the Ensemble Kalman Filter by using a linear transportation matrix to transforms the prior ensemble into a posterior ensemble, and this matrix can be made smoothly varying with space.All of these smoothing operations rely on forming linear combinations of particles, so can potentially harm nonlinear balances in the model.Furthermore, it should be noted that the smoothing operation does not necessarily follow Bayes Theorem, so it might result in an extra approximation of the true posterior pdf.When the ensemble size is small this approximation might be negligible compared to the Monte-Carlo noise, however.The Location Particle Filter and the Space-Time Particle Filteravoid this smoothing and rely on statistical connections between different grid points.The former does this via the prior pdf, defined by the prior particles.When the number of particles is low this pdf is estimated rather poorly.Furthermore, the method needs jittering of the global particles to avoid ensemble collapse after every resampling step after each new observation is assimilated.
) which is a quadratic equation in ∆ in the form of a continuoustime algebraic Riccati equation and there are known solution methods for this type of equation (see, e.g., de Wiljes et al. 2017).
point k.One then applies the LETKF to these intermediate particles xi with inflated covariance matrix R k /(1 − α).The choice of α is, of course, crucial.Numerical experiments indicate (Chustagulprom et al. 2016) that α > 0 can lead to substantial improvements over a purely LETKF-based implementation and that the choice of α can be based on the effective sample size of the associated LETPF.However, more One question that emerges when comparing the Ensemble Kalman Particle Filter and the LETPF-LETKF hybrid is what should one use first, the particle filter or the ensemble Kalman filter.Different experimental results seem to indicate that both orderings can be superior to the other.The PF-first methods have the advantage of a theoretical justification via a two-step tempering interpretation in which the particle filter step makes the prior for the EnKF much more Gaussian.Applying the EnKF first will bring the particles closer to the observations, leading to better weight balance in the particle filter.At this moment it is unclear E A [B] as denoting the expectation of B under pdf p(a):E Y [E X|Y [f (X)]] = f (x)p(x|y)p(y) dx dy = x y f (x)p(x, y) dy dx = f (x)p(x) dx = E X [f (X)]