A hybrid particle-stochastic map ﬁlter

in nonlinear state-space models is known to be a challenging task due to the posterior distribution being either intractable or expressed in a complex form. One of the most successful methods, particle ﬁltering (PF), although generally outperforming traditional ﬁlters, suffers from sample degeneracy. Drawing from optimal transport theory, the stochastic map ﬁlter (SMF) accommodates a solution to this problem


Introduction
The state-space formulation for time-dependent models has been long used in various applications in science and engineering.For example, in target tracking, the state vector represents the kinematics of the target [1][2][3] ; in weather prediction, it is related to temperature, pressure, humidity, etc.; in oceanography, it could refer to the spatial pattern of surface currents; and in economics, it concerns interest rates, inflation, etc.Nevertheless, filtering based on such formulation still faces numerous challenges, especially in nonlinear and high-dimensional situations.In the literature, various types of filters have been proposed, to cope with the various state estimation issues encountered by different models.
Indeed, in the case of linear Gaussian signal and observation models, the minimum mean squared error (MMSE) estimator underpins the Kalman filter (KF) [4] .However, despite its optimality in the aforementioned scenario, the KF is not applicable when either model is nonlinear.To overcome this, various extensions of the KF have been proposed in the literature, such as Extended Kalman filter (EKF) [5] and Unscented Kalman filter (UKF) [6,7] .Even though both EKF and UKF constitute improvements on the classical KF, they still have some drawbacks.Among those, satisfactory performance can only be achieved when they are used in mild nonlinear conditions.The reason behind this is the model linearization in EKF and the unscented transform in UKF have a weak capability to approximate high-nonlinear dynamic behaviours.
To address solutions to the state estimation for highly nonlinear and non-Gaussian models, the particle filter (PF) [8][9][10] has been proposed and is one of the most common choices in the literature.The key idea behind this important filtering approach is to represent the posterior density function in terms of a set of random samples along with their associated weights and to compute estimations based on these weighted samples.In cases when the number of samples becomes large (i.e., the Monte Carlo (MC) simulation), the distribution of these samples corresponds to an equivalent representation to the functional description of the posterior probability density function (pdf).In other words, large number of samples help to represent posterior distribution perfectly and the PF estimates converge to the optimal Bayesian estimates.Despite its ability to model non-linear and non-Gaussian cases, the PF suffers from sample degeneracy especially in large-scale systems [11] , where all but one particle weights tend to zero after a few iterations [8] .Thus the particles along with their corresponding weights cannot accurately represent the posterior distribution any longer.To alleviate the particle degeneracy, many variants of PF have been presented based on two strategies: (1) the use of effective importance proposals [8,12] , such as auxiliary PF [13] , unscented PF [14] , and PF based on multiple importance sampling [15] ; (2) employing resampling techniques [16,17] , such as multinomial resampling [18] , systematic resampling [16] or residual resampling [19] .Despite these improvements, in the highly nonlinear and high-dimensional cases, a large number of particles are still required due to the scale of the degeneracy problem and the computation load becomes prohibitive.
The Ensemble Kalman filter (EnKF) [20][21][22] has been proposed as another solution for the highly nonlinear cases.The EnKF also has a deterministic version which is the Ensemble Square Root filters (ESRF) [23] .In particular, for EnKF, the linear transformation is performed statistically by treating the observations as random variables.By contrast, in ESRF, analysis perturbations follow the KF analysis error covariance equation.Different from the PF, both EnKF and ESRF just require a small number of particles as they perform posterior estimation through linear particle movements and avoid the computation of importance sampling.However, despite these advantages over the PF, their performance in highly nonlinear environments is limited due to the intrinsic bias resulting from the low flexibility of their linear updates.
In an attempt to increase the flexibility of filtering techniques and address the drawbacks mentioned above, two kinds of nonlinear-transportation-based particle filters have been proposed.First, there are particle flow filters, which assume that the particle flow is embedded in a reproducing kernel Hilbert space and efficient solutions can be designed [24][25][26] .Although these approaches can provide flexible particle movements, they suffer from the slow transition from prior to posterior space.To improve transport efficiency, for the second method, optimal transport (OT) theory has also been utilised in conjunction with Bayesian methods to design one-step transportation methods.Optimal transport is an important mathematical theory, which was first introduced by Monge and then developed by Kantorovich [27] .Optimal transport maps define couplings that minimize a particular integrated transport cost corresponding to rearranging samples.Several filters based on optimal transport theory have been proposed.In [28] , the ensemble transform particle filter (ETPF) has been introduced, whereby the optimal transport problem is resolved by linear programming.Despite its robust filtering performance, it suffers from a large computational load.In [29] , two efficient generalizations of EnKF and ESRF have been presented, which are the stochastic map filter (SMF) and the deterministic transport map filter (DMF).For both of these approaches, the nonlinear movements of particles are captured by stochastic or deterministic maps based upon Knothe-Rosenblatt (KR) coupling which is also an optimal transport coupling [30] .Compared with other OT maps, the KR map has a triangular structure, so it is simple to invert, and computationally easier to parameterize and learn [27,31] .Benefiting from the flexibility of the nonlinear maps, the SMF has produced better performance for nonlinear models when compared to the EnKF which is based on linear transformation.
In recent years, motivated by the aforementioned drawbacks of PF and EnKF, three hybrid filters have been proposed and explored, namely the Gaussian mixture model-EnKF hybrid filter (GMM-EnKF) [32,33] ; the hybrid of ETPF and ESRF (ETPF-ESRF) [28,34] ; and the SIR-ESRF [35] via combining the standard PF and ESRF with a mean-preserving random orthogonal resampling.For all the aforementioned hybrid filters, the likelihood function is first separated into two parts, then, EnKF/ESRF and PF-based filters are applied sequentially to assimilate each likelihood part.As a result, the hybrid filters can yield more precise non-Gaussian approximations than PF since they alleviate the particle degeneracy issue through the use of EnKF/ESRF.However, these filters also have some drawbacks.In GMM-EnKF, since the EnKF is implemented in the first stage, it is not suitable for moderately non-Gaussian models where prior distributions are non-Gaussian and posterior distributions are close to Gaussian.ETPF-ESRF suffers from a large computational load due to the ETPF component.SIR-ESRF provides an efficient solution to the moderately non-Gaussian models and the mean-preserving random orthogonal transformation in the ESRF update stage produces the Gaussian approximation for the posterior distribution.However, in cases when the posterior is not close to a Gaussian form, this resampling technique causes large sampling errors.
In this paper, we propose a new nonlinear filter by combining ideas from the PF and the SMF.In particular, the main contributions of this work consist in: 1. We propose a hybrid particle-stochastic map filter, referred to as PSMF, which combines the particle filter and the stochastic map filter.In order to build the new hybrid filtering approach, the likelihood function is separated into two parts, with the PF and SMF being then employed sequentially.In the proposed approach, we utilise SMF and not DMF since the former only requires the optimisation of a convex function.2. To address the particle degeneracy issue, we employ the systematic resampling method [16,17] and a subsequent smoothing step by adding a low-variance Gaussian noise.Compared with the ETPF, the computation load of systematic resampling is relatively low and proportional to the number of particles.Also, it is not based on Gaussian approximation, hence when the posterior distribution is not close to Gaussian, the adopted resampling technique is still efficient.3. To evaluate the influence of the nonlinearity of the transport maps on hybrid filters, we introduce PSMF-L and PSMF-NL, which are based on linear and nonlinear transport maps, respectively.These two variants were compared in the experimental analysis, and their advantages and disadvantages are analysed in details.
The rest of the paper is organised as follows: we begin in Section 2 with a review of the theoretical preliminaries in the form of state-space models, particle filters, and stochastic map filter.In Section 3 , we present the details of the proposed PSMF algorithm.The experimental analysis is performed in Section 4 , whilst Section 5 concludes the paper with a summary.

State-space model
A state-space model can be described as a probabilistic graphical model, which defines the relationship between the hidden state variables and the measurements.It includes two equations, which are for states (signals) and their measurements.The signal model describes the state changes over time, whilst the measurement model explains the relationship between the states and measurements.A generic state-space model is expressed as where X k refers to the hidden state, Z k is the observation, the random model error is denoted by U k , and W k refers to the observation noise at time k .The random model and observation noise errors are independent of each other and the states.Figure 1 depicts the general conditionally independent structure of state-space models, which shows the independence of observations conditioned on the state.

Sequential Monte-Carlo methods
Particle filters, also referred to as Sequential Monte Carlo (SMC) algorithms are a class of Monte Carlo algorithms used to solve some filtering problems arising in signal processing and in Bayesian statistical inference.They represent a popular solution to estimate the expectations of the posterior distribution by representing the density with a large number of weighted samples.The key step of the PF algorithm is to design proper importance proposals for the posterior distribution and use the importance sampling method to calculate its mean.Thus, estimating the expectation of the posterior distribution becomes a tractable problem through the samples and their corresponding weights.At any time k , the posterior distribution can be expressed as Generally, the posterior distribution has a complex form, and sampling from it cannot be done.Hence, we usually obtain the sampled particles of posterior distributions using the importance sampling methodology.Let the importance proposal be q (X 0: k | Z 1: k ) , then we take the sequential form of q (X 0: where we firstly sample from the importance proposal and obtain the particles X i 0: k , i = 1 . . .N.Then, the corresponding weights w i k can be expressed as Hence, the posterior density is approximated as For the standard particle filter formulation, the importance proposal is chosen as which relies only on the prior information and leads to the weight updating equation as Degeneracy problem As mentioned in the above sections, a common problem in particle filtering applications is the degeneracy phenomenon, where a large part of the particles will have negligible weights after a few iterations.It has been proved that the variance of the importance weights can only increase over time [8] , and thus it is impossible to avoid the degeneracy phenomenon.Possibly, the most effective way to reduce the effects of this drawback is to design an efficient importance proposal [9] .Also, while PF will always degenerate in the long term, this problem can be alleviated by resampling techniques [16,17] , which replicate the particles with large weights and eliminate the ones with small weights.

Stochastic map filtering
In this section, the general formulation of the SMF is introduced.We first provide some brief, necessary background on OT.Then, we explain the Knothe-Rosenblatt rearrangement, which is an optimal transport map designing procedure and constitutes the basis of the SMF approach.The cost function for the parameter optimisation of the KR rearrangement is introduced subsequently.Finally, the assimilation step of the SMF concludes this subsection.

Optimal transport fundamentals
The optimal transport [27,36] is a historical mathematical theory, which exploits the most efficient way to move one distribution to another.It was first presented by French mathematician Gaspard Monge in 1781.Specifically, Monge's optimal transportation problem pursues a transport map g which pushes one probability distribution p(X ) , X ∈ 0 to another q (Y ) , Y ∈ 1 , i.e., where B is an arbitrary area in the domain 1 .In general, there exists an infinite number of maps that solve Eq. ( 9) .One way to select a map is to define a particular cost function and solve the optimal transport problem.For example, g can be restricted by minimising the p -Wasserstein metric below where MP denotes measure preserving maps meeting Eq. (9) .
is a cost function, and denotes the transport cost from X to g(X ) .
In 1942, Monge's transport problem was generalised by Leonid V. Kantorovich, whose formulation pursues a transportation plan γ .One can regard γ as the joint distribution which has marginals p(X ) and q (Y ) , i.e., Its corresponding objective function can be expressed as , where MP meets Eq. (11) , and c ( X, Y ) = | X − Y | p , p ≥ 1 .We should note that the optimization problem over the joint distribution is over a different space than the MP used in Eq. (10) .Specifically, for Monge's problem, a transport map assigns one element of X to exactly one of Y .By contrast, Kantorovich's problem is more general and supports one-to-many movements.
In recent years, the regularisation of many optimal transport maps/plans has been presented and applied to filtering problems, such as ETPF [28] , the mapping particle filter [25] , and filtering based on the Knothe-Rosenblatt (KR) rearrangement [29] .

Knothe-Rosenblatt rearrangement
Given any pair of positive densities, there exists a unique monotone triangular transport map, which defines a deterministic coupling between two distributions and is called the KR rearrangement [36] .This strategy belongs to Monge's problem, i.e., a one-toone transport map.Due to its ability to create connections between two probability densities, it has been widely used in Bayesian inference, importance sampling, SMC, etc. [37][38][39] .
A parametric strategy for the KR rearrangement is presented in Moselhy and Marzouk [37] .Assume X = [ x 1 , x 2 , . . ., x n ] T and Y = [ y 1 , y 2 , . . ., y n ] T are n -dimensional variables with distributions p(X ) and q (Y ) , respectively.Then, a standard triangular transport map can be expressed as where the map S transports the distribution p(X ) to q (Y ) and each component of S is monotone with respect to its last input.The crucial property of the KR rearrangement for its application to the SMC algorithm is that it provides an implicit characterization of the marginal conditional distributions.In the example above, S 1 transports p(x 1 ) to q (y 1 ) , S 2 transports p(x 2 | x 1 ) to q (y 2 | y 1 ) and so on.This conditional distribution transformation property is used in the design of the SMF algorithm.
The parameterisation of KR rearrangement is an important part of the design of transport maps.One way to parameterise each component of the map S is via multivariate polynomials which could either involve Hermite or Legendre polynomials [31] .Spantini et al. [29] also provides a specific computational parameterisation of triangular maps, which is based on radial basis functions (RBFs), and we adopt the same form in this work.

Constructing KR rearrangement from samples
Following the definition of the triangular transport maps above, we now explain the cost function definition based on the Kullback-Leibler (KL) divergence.We aim to optimise the parameters of the transport map polynomials which connect an arbitrary distribution p(X ) to a reference distribution q (Y ) .Following Eq. ( 9) , since S is a monotone and differentiable transformation, we have where ˆ p (X ) is an approximation of p(X ) .The map S can be obtained by minimizing the difference between ˆ p (X ) and p(X ) , and the KL divergence can be used to measure this.It can be expressed as Then, the transport map can be expressed in terms of the KL divergence as where H is a function space for map S. The unknown term ] is neglected in the objective function above because it is not related to S. We should note that in the above cost function, there is no term related to transport cost like Eq. ( 10) , because the uniqueness of the transport map is guaranteed by the triangular structure and monotonicity of KR rearrangement [37] .Assume that we have N samples X i , i = 1 . . .N, from p(X), then the cost function based on the discrete samples is expressed as Newton's method provides an efficient solution to the minimisation of the cost function in (16) and has been widely used in the literature [29] .

Data assimilation
In this section, we introduce the structure of the SMF.For a filtering problem, at any time k , we initially have samples . ., N are obtained.Then we have . Following [29] , we define the function S X : to a normal distribution.The resulting map is continuous and can be defined as From [29] , we have that the analysis map T which transports the joint distribution to the posterior can be expressed as where , and • denotes the composition of the two maps.In the above structure, S X transforms the samples from the joint distribution to the standard normal which is then pushed by S X (Z * k , ∼) −1 to the posterior.In principle, we can directly employ S X (Z * k , ∼) −1 to produce posterior approximations by pushing forward samples from the standard normal density.However, from Spantini et al. [29] , transporting samples from through T yields more accurate results which is attributed to the cancellation of errors in the composition of ˆ S X and its inverse.
In practice, we can obtain the estimator of T using the samples, and it can be expressed as where ˆ S X denotes the estimated map of S X .In general, the process of stochastic map filtering can be divided into two steps, i.e., the forecast step and the analysis step.First, we get samples of p(X k , Z k ) (forecasting).Second, the samples of the estimated posterior distribution can be obtained by transforming the joint prior samples through the map ˆ T (analysis).

Stochastic map-augmented particle filtering
In this section, we present the proposed hybrid filter that combines PF with the SMF.At any time k , we have samples . After the prior samples are obtained, the likelihood function is separated as For α values of 0 and 1, the hybrid filter boils down to the SMF and the PF, respectively.
When α is small, PSMF is close to the SMF.In this case, the sample degeneracy can be alleviated effectively, but the estimation error increases due to the limited flexibility of the maps.We should note that theoretically, the degree of non-linearity of a map can be arbitrarily high and there is no limit to its flexibility.However, with a certain number of particles, highly non-linear maps suffer from larger map variances and may produce worse estimations than more linear maps.So, the flexibility of nonlinear maps should be controlled according to the ensemble size.Specially, for a small number of particles, less non-linear maps, should be used.By contrast, a larger number of particles can restrict the variance of a highly non-linear map [29] .
By contrast, for large α, PSMF approaches closely the PF.Then, the error from sample degeneracy increases.In this case, as 1 − α So, there is no need for a highly flexible transport to complete the second assimilation step, and the SMF-part does not cause a rough estimate.With the adaptability mentioned above, the proposed hybrid filter achieves a trade-off between the PF and the SMF.
Before we implement the assimilation step, α needs to be chosen.For a given θ , we choose α as the largest value which satisfies the condition ESS θ N (21) where ESS is the effective sample size and can be approximated by the sample weights.For a given α, based on (8) , we have the weight update equation which can be expressed as Since a resampling step is taken at every iteration, we have w i k −1 = 1 /N and the normalized particle weights can be written as Then, the ESS can be approximated by From [33] , searching the value of 0 α 1 satisfying the condition in ( 21) can be achieved by a root finding method.For large θ , the searched α is small, and then the hybrid filter is closer to SMF.For a smaller θ , the hybrid filter moves towards the PF.The optimal values of θ depend on the nonlinear models.Currently, there is no automatic way to select parameter θ , and it needs to be selected manually based on the models.After θ is selected and α is calculated, the PF is implemented to update p(Z * k | X k ) α .Based on (23) , we obtain the normalized particle weights corresponding to the selected α.
To alleviate the particle degeneracy problem, we utilise a systematic resampling followed by a smoothing step.Different strategies for the alleviation of particle degeneracy [33][34][35] have been proposed in the literature.For example, EnKF-GMM avoids the degeneracy by sampling from the posterior density approximated by a Gaussian Mixture; the ETPF-ESRF filter employs the optimal transport strategy to implement the resampling; a meanpreserving random orthogonal transformation is used in SIR-ESRF to break the degeneracy.Although the performance of systematic resampling is proved in previous work, one did not pay enough attention to its embedding in hybrid filters, because the discontinuity update influences its application in the high-dimensional scenario.In this work, we only consider the low-dimension cases, but with the developments of the local PF [24,40] and its combination with the EnKF [41] , we can reasonably consider applying our hybrid filter to high-dimension models.This however is out of the scope of this paper.
The proposed resampling process [17] starts by generating N ordered numbers u i , i = 1 : N by which are used to select X i k according to the multinomial distribution.That is where function X (i ) = X i k .F denotes the cumulative probability distribution of the normalised particle weights, i.e., F (i denotes the inverse of F .After resampling, all the particle weights become equal, After resampling, to increase the diversity of particles, we add a small Gaussian noise to smooth the samples, which is helpful to decrease the variance of transport maps in the SMF assimilation stage. where 0 < β < 1 is the smoothing factor and v ar(X k ) is the variance of the resampled samples.ζ = 1 − β 2 is the shrinking factor, which is employed to remove the excess variance caused by the added noise [32] .
Next, (p(Z * k | X k ) 1 −α is assimilated by the SMF.First, to obtain the samples [ Z i k , X i k ] from the joint distribution, we sample the In this work, we only consider Gaussian and Gaussian Mixture likelihood.For Gaussian cases, as- where is the covariance matrix.Then For Gaussian mixture cases, the likelihood can be expressed as p where M is the number of mixtures.P j and j are the mixing factor and the covariance matrix of the ith Gaussian component, respectively.Similarly, we can sample to obtain Z i k .Then, the map ˆ S X can be designed from the samples [ Z i k , X i k ] according to (17) and the transport map ˆ T can be designed by (19) .In the last step, the samples [ Z i k , X i k ] are transported with the transport map ˆ T as In our work, thanks to the generalisation of the EnKF, the transport map ˆ T can be designed to be both linear and nonlinear which can then respectively be called PSMF-L and PSMF-NL.As also noted in Spantini et al. [29] , the nonlinear map achieves a better performance compared to the linear map, but suffers from larger map variances.The overall procedure of the proposed PSMF algorithm is summarised in Algorithm 1 .

State-space models
In this section, we introduce the low-dimensional nonlinear state-space models used in the experimental analysis.There are four benchmark models considered, i.e., (1) the univariate nonstationary growth model [42] , (2) Henon map [35] , (3) Lorenz-63 model [29] , and (4) a target tracking model also used in Henke et al. [43] , 44 ].The first model is one-dimensional, the Henon map is a two-dimensional model, whilst Lorenz-63 and the target tracking models are of dimensions 3 and 4, respectively.The details of these state-space models are presented in the sequel.
; α ← root-finding method; if α > 0 then for i ← 1 to N do w i k ← particle weights by (23); for i ← 1 to N do X i k ← systematic resampling by ( 25) and ( 26); for i ← 1 to N do X i k ← smoothing by (27); (16).S X is defined by (17);

Henon map
Different from the other models, for the Henon map, the number of steps is set to 1.This helps to focus on a single Bayesian assimilation update.Hence, the complication related to varying non-Gaussianity along cycled steps can be eliminated.
From [35] , the prior is the joint density of U and V whose samples are obtained by pushing samples of normal variables U 0 and V 0 forward through the Henon map, i.e., where . The true values of U and V are set to −4 and 0.6.10 0 0 Monte Carlo runs were simulated in the experiments.
For the observation model, we assume the states are measured every t obs = 0 .5 time units, then the measurement model can be represented as where Z k = Z(k t obs ) and W k ∼ N (θ 2 I d ) is zero-mean white Gaussian observation noise.In our experiment, the states are fully measured, i.e., d = 3 , and the observation noise variance is set to For the initialisation, the true state is set by sampling from The true model runs 60 0 0-time steps to generate the states, and the observations are produced by the measurement models.The initial ensemble is produced by a spin-up stage.N samples are obtained from the initial condition N (0 , I d ) and then the stochastic EnKF is run for the first 20 0 0 steps.The produced particles are used as initial samples.In the last 40 0 0 steps, several filters are run.Only the filter results for the last 20 0 0 steps are used for the analysis of the filter quality, and the middle 20 0 0 steps are used for eliminating the influence of switching between EnKF and other filters at the end of the 20 0 0th iteration.

Target tracking with heavy-tailed measurement noise
In this subsection, we describe a nonlinear target tracking model [43,44] .As it is shown below, the state of the target X is a four-dimensional vector, where P east and P north are the map coordinates of the object, V abs is the value of velocity, and ϕ is the direction of movement.
The dynamics of the target tracking model can be described by where t obs is the time interval from current state to the next state, and is the heavy-tailed process noise.Different from Henke et al. [43] , 44 ], in our experiment, the heavy-tailed noise is approximated by a Gaussian mixture as is used in various other target tracking models [2,3] , and can be expressed as ∼ N(0 , Q ) with probability 0 .85 N(0 , ηQ ) with probability 0 .15 (35) where Q is the covariance matrix, and η is a diagonal matrix used to enlarge Q.Then, the measurement equation is given by where W ∼ N (0 , R ) is the measurement noise with covariance R and H = 1 0 0 0 0 1 0 0 (37) is the transformation matrix.Following the measurement model, at every time instant, the position coordinates of the target, P east and P north , are observed.For the complete definition, we also need to specify the observation interval t obs , the values for the initial state X 0 with the covariance matrix C 0 , the processing noise covariance Q, and the measurement noise covariance R .In the     experiment, without loss of generality, the target starts to move from the initial point and the real initial state X 0 is set to 0 m 0 m 30 m / s 0 rad , where the initial speed of 30 m / s is a reasonable value for a vehicle.For simplicity, we assume the initial state subject to a Gaussian distribution with the covariance C 0 where the deviation about the position, velocity, and direction is set to 10 m 10 m 3 m / s π / 10 rad , so we have C 0 = diag 100 m 2 100 m 2 9( m / s ) 2 π 2 / 100 rad 2 .
In our experiments, we set The whole tracking process lasts for 120 s and 50 Monte Carlo runs were simulated.

Benchmark filters
In our experiments, the proposed hybrid PF-SMF filter is compared to 6 state-of-the-art filters, including the classic PF, EnKF, ESRF, SMF and two-hybrid filters of SIR-ESRF and GMM-EnKF.In this section, we introduce the configuration of these filters.
For PSMF-L and PSMF-NL, we set both the smoothing parameter β to 0.2.Following [29] , linear and nonlinear transport maps take the "linear" and "linear + 2 RBFs" forms, respectively.Also, the SMF part in the proposed hybrid filter processes the observations at a given time sequentially.
For a fair comparison, other filters' configurations follow that of PSMF according to their structures.First, systematic resampling is performed in PF, SIR-ESRF and GMM-EnKF.Second, the same smoothing step is adopted in PF.Third, the map design configurations are the same as in SMF.Finally, the observations are sequentially processed in EnKF, ESRF, SMF, and the ESRF/EnKF parts in hybrid filters.

Performance evaluation in relation to ensemble size
In the first set of experiments, we numerically investigated the performance of filters under different ensemble sizes over a range of state-space models.The number of particles used was successively N = [20 , 40 , 60 , 100 , 200 , 400 , 600] .For the hybrid filters, the parameter θ has influences on their filtering performance.In this experiment, the value of θ was taken from a vector [0.0 01,0.0 02,0.0 04,0.0 06,0.0 08, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.92, 0.94, 0.96, 0.98, 0.99, 0.992, 0.994, 0.996, 0.998, 0.999], and we selected the best performing θ and its corresponding results as the final results.The coordinates in the vector range from 0.001 to 0.999.In the scales near 0 and near 1, a smaller sampling interval is chosen.The reason behind this is that for some of the state-space models, the performance of the hybrid filters is more sensitive to the change of θ within those two scales.The minimum θ in the vector can ensure the effective numbers are less than 1, and maximum larger than N − 1 , which can ensure the best parameter values are in the tested range.Alternatively, we can use a root finding method to search the best performing hybrid filters and their corresponding values of θ [35] .
The root mean-square error (RMSE) was used to analyse the per- formance of each filter.For the UNGM and Lorenz-63, estimations on all variables were used to calculate RMSE values.For Henon map model, following [35] , the variable u and v were analysed separately whilst the location estimation precision was considered for the target tracking model.In addition, the average continuous ranked probability score (CRPS) is also computed.Different from RMSE, it quantifies the spread of the ensemble.Lower CRPS indicates that the ensemble concentrates around the true values, and is more precise for state estimation.

Evaluating PSMF vs. traditional filters
RMSE comparisons between the PSMF and the traditional filters are plotted in Fig. 2 .When compared to the PF for a small number of particles (N < 100) , the proposed approach provides improved results in all cases.For large ensemble sizes, although PSMF still yields better tracking performance under higher dimensional models, similar estimations are obtained under the one-dimension model.These comparisons indicate that different from the classic PF, the proposed hybrid filter can effectively alleviate particle degeneracy due to employing the parameter θ .Note however that in one-dimensional cases, because of the narrow sampling space, for medium and large number of particles, the classic PF does not suffer from particle degeneracy seriously, and PSMF does not make a significant difference in these cases.
For small N, the PSMF has similar results with the EnKF/ESRF, whereas when increasing the number of particles, the PSMF starts to show considerable gain for models.This shows that with few number of samples, the PSMF utilises a larger θ value to maintain the effective number of particles, so that α tends to be smaller leading to the PSMF performing closely to the SMF.By contrast, with large N, the PSMF adopts a smaller θ , and α increases.Then, the PF part occupies a larger assimilation proportion, and more nonlinear information can be extracted.On the other hand, due to their limited flexibility, the EnKF/ESRF does not yield better results with increased number of particles.
Specifically speaking for the linear and nonlinear versions of the proposed approach, for small N, the PSMF-L yields better performance compared to the PSMF-NL.In most cases, with an increase in the number of particles, the advantages of the PSMF-L gradually reduces.The reasons behind this are related to the quality of transport maps.From [29] , we can recall that a linear map has more robust performance with small N but can have limited flexibility.By contrast, a nonlinear map holds higher non-linearity but might yield unacceptable variances without a sufficient number of particles.As a result, for small N, nonlinear maps cause large variance, and the PSMF-NL performs worse than its linear counterpart, the PSMF-L.For large N, the variance produced by the nonlinear maps can be suppressed and the difference between PSMF-L and PSMF-NL becomes smaller.
The results above do not imply that the PSMF-L can always achieve better results than PSMF-NL.From the results under the variable v of the Henon map, PSMF-NL provides better results.
As mentioned before, compared with the nonlinear maps, linear maps have very limited flexibility, which causes poor estimation results.In some cases, a large θ is needed to maintain the effective number of particles, and then the value of α decreases.Consequently, the SMF occupies a larger assimilation part and needs to perform more complex transportations.If the bias caused by the linear transport approximation is larger than that of nonlinear-map variance, the PSMF-NL can perform better than the PSMF-L.To complete this section, CRPS comparisons between the PSMF and the traditional filters are also plotted in Fig. 3 and they follow the same characteristics similar to the RMSE results in Fig. 2 .This suggests that our new hybrid filter shows advantages over the traditional filters in both mean estimations and ensemble concentration.

PSMF vs. hybrid filters
RMSE comparisons between the PSMF and other hybrid filters are plotted in Fig. 4 .
When compared to the SIR-ESRF, the proposed PSMF performs worse when the particle number is small.In most cases, for increasing number of particles, the proposed PSMF's performance achieved is closer to or better than that of SIR-ESRF.
The difference between the performance of the SIR-ESRF and that of PSMF is primarily due to their distinct strategies of alleviating the particle degeneracy.In the PSMF, systematic resampling with a smoothing step is adopted.By contrast, SIR-ESRF relies on the mean-preserving random orthogonal transformation which resamples the ensemble during the ESRF assimilation stage.This resampling technique performs well under medium-level nonlinear models.However, since it is based on the Gaussian approximation, when the posterior distribution gets further from Gaussian, more errors will be introduced.In all the experiments, for small N, the bias caused by the systematic resampling and transport map variances is larger than that of the Gaussian approximation in SIR-ESRF, which thus determines the proposed PSMF to achieve closer or worse results.However, for medium and large N, the bias from the systematic resampling and transport map variance is less sig-nificant.As a result, it becomes smaller than the error from the resampling in the SIR-ESRF.
For small N, the PSMF and GMM-EnKF achieve considerably lower performance compared to the SIR-ESRF.However, the GMM-EnKF achieves worse results than the proposed PSMF for a medium number of particles.When increasing the number of particles, its performance gradually approaches that of PSMF, although under the nonlinear non-Gaussian target tracking model, for N = 600 , the PSMF still offers obvious advantages.These results are caused by the structure of the GMM-EnKF.Different from the other hybrid filters, the GMM-EnKF implements the EnKF assimilation first, which is based on the Gaussian assumption for the prior distributions.In all the state-space models in this paper, all the measurement models are linear and Gaussian, which means that the posterior has a closer form to a Gaussian than the prior.Thus, implementing ensemble-based filters first introduces more error.For small N, the EnKF update occupies a large part to keep the ESS.As a result, the Gaussian approximation introduces more errors.For large N, the GMM-EnKF just needs a small θ value to ensure the ESS is large enough.Consequently, a smaller α will be selected and the GMM occupies a large proportion.Hence, more nonlinear transform information between the prior and posterior can be obtained.However, the 4-dimensional target tracking model has a larger sampling space and needs more effective particles.So, even for N = 600 , the EnKF occupies a larger assimilation proportion than the other cases.As a result, the GMM-EnKF is significantly outperformed by the proposed PSMF.
As in previous section, we also provide comparisons between the PSMF and the other hybrid filters in terms of CRPS and these are plotted in Fig. 5 .The results are consistent with the RMSE results in Fig. 4 .This means that the differences among the hybrid filters in the mean estimation can be extended to the ensemble concentration.Finally, for further objective evaluation, numerical results obtained for both RMSE and CRPS are also shown in Tables A.1 and A.2 , respectively, in Appendix A .Due to the limited space, a sparse selection of ensemble sizes, [20,60,20 0,60 0], is provided.Results indicate that in most cases, hybrid filters work better than traditional filters.Also, SIR-ESRF tends to perform better than the others with few particles, while PSMF obtains robust performance with large ensemble sizes.

The effect of θ on filter performance
In the second set of experiments, we investigated the influence of θ on the hybrid filters.We used the same values of θ as in Section 5.3 , i.e., between 0.001 and 0.999.The number of particle used were 20 , 200 , and 600.
In Fig. 6 , the influence of parameter θ is evaluated, for an ensemble size of N = 20 .It can be observed that, in general, the best results are achieved by the SIR-ESRF.As was mentioned in the previous sections, the reason behind this is that the resampling technique of the SIR-ESRF, i.e., the mean-preserving random orthogonal transformation, has a more robust performance for small N.In addition, the large variance of nonlinear maps for small number of particles makes the proposed nonlinear approach be inefficient in this case.
The PSMF-L and GMM-EnKF achieve inconsistent results and lower performance than the SIR-ESRF in most cases.This is due to the known drawbacks of these two filters: when using few particles, the systematic resampling technique in PSMF-L is not as ro-bust as that in SIR-ESRF.By contrast, the GMM-EnKF runs by first performing the EnKF component, which leads to large errors.
In Fig. 7 , the influence of the parameter θ is evaluated for N = 200 .It can be seen that for the first 3 models, for the same reason as above, the GMM-EnKF yields the worst results.It is clear that the proposed PSMF-NL achieves the best performance for large θ values (close to 1).The improved results benefit from the larger flexibility of the nonlinear maps and reduced map variances in the case of a medium number of particles.However, when θ is small, the advantage of PSMF-NL is not obvious anymore, the reason being that in such case, the values of α tend to be large, and the SMF part occupies a small proportion of the assimilation process.Consequently, there is no need for a flexible nonlinear map to complete the assimilation.Also, we should notice that the performance of the PSMF-NL is just slightly worse than that of PSMF-L in this case.The small effective number of particles caused by small θ do not cause large nonlinear map variances.Because of our smoothing strategy, the diversity of particles can be ensured and the variance can be suppressed.
For the first 3 models, the PSMF-L and SIR-ESRF achieve comparable results with lower performance compared to the proposed PSMF-NL.The disadvantages of the SIR-ESRF stem from the Gaussian assumption of the resampling techniques, whilst the disadvantages of the PSMF-L are due to the lower flexibility of linear maps.
For the last model (4D tracking), the results are slightly different.First, the SIR-ESRF performs relatively worse than the proposed PSMF-L since the posterior distributions under this model are far from the standard Gaussian assumption.Second, when θ is close to 1, the advantage of the proposed PSMF-NL can still be seen.Nevertheless, on a large scale, the PSMF-NL performs worse than its linear counterpart.This is due to increased variance of the nonlinear map in this case.According to the structure of the triangular maps, it is known that the number of map parameters increases with the dimension of models.As a result, a medium number of particles cannot effectively suppress the variance of nonlinear maps that causes PSMF-NL to perform worse than the linear PSMF-L.
Finally, in Fig. 8 , the influence of parameter θ is evaluated, for N = 600 .The results are similar to those obtained for N = 200 , with some slight differences.It can be seen that For large N, the proposed PSMF-NL benefits more from the increase of the number of particles.For a larger scale of values of θ , it outperforms the others for the first three models.Even under the 4D target tracking model, the variance of nonlinear maps can be suppressed more efficiently, and its performance becomes closer to that of the PSMF-L.
We can also notice that under the target tracking model, when 0 . 2 < θ < 0 .8 , the location estimation log (RMSE) of the proposed PSMF varies very little.We attribute this to the fact that the transformation between a prior distribution and corresponding posterior distribution is not particularly complex under this model.
When θ changes between 0.2 and 0.8, the assimilation proportion of PF changes as well.However, the SMF can always complete the remaining assimilation update and the final estimation results remain stable.When θ is very close to 1, transport maps cannot extract all the remaining nonlinear information and the performance is reduced.By contrast, when θ is very close to zero, the PSMF becomes closer to the classical PF and hence suffers from particle degeneracy.

The effect of the smoothing step on filter performance
In this section, the influence of the smoothing step on the filtering performance of PSMFs is investigated.The parameter set is the same as in the first experiment, and the experiment results are shown in Fig. 9 .Under most models, the smoothing step does not influence the filtering performance.However, under Lorenz-63, with a large number of particles, PSMF performs better than PSMFnsm.The reason is that with large ensemble sizes, the optimal value of θ is relatively small under Lorenz-63 as shown in Fig. 8 .Thus, ESS is small, which causes large map variances.A smoothing step can increase the diversity of samples, and then the map variance is decreased.Hence, in this case, PSMF works better than PSMF-nsm.

Computational complexity analysis
For an exhaustive evaluation of our proposed PSMF method, the filter efficiency with the change of particle numbers is also evaluated.To this end, we fix the parameter θ to 0.5, and employ ensemble sizes of [10 0 , 20 0 , 40 0 , 60 0 , 10 0 0] .Small particle numbers are not considered because some filters diverge with few samples and then the precise computational cannot be obtained.The rest of the parameters are the same as in the first set of experiments presented above.The computational time is shown in Fig. 10 .
Compared with traditional methods, PSMF is less efficient than PF, EnKF and ESRF.However, the comparison between PSMF and SMF does not lead to consistent conclusions, being dependent on the state-space models investigated.Specifically, under the onedimensional UNGM model, the PF part of PSMF can keep the ESS high, and the SMF part is omitted in many iterations.Consequently, PSMF is more efficient than SMF.By contrast, under higher-dimensional models, the SMF part of PSMF can be omitted in fewer iterations.Thus, the inclusion of both PF and SMF steps in PSMF slows the filtering process.
Also, GMM-EnKF is more time-consuming than PSMF for all the models.This is due to its ESS calculation for searching α.Specifi-cally, because GMM-EnKF implements EnKF first, the whole hybrid filter should be implemented to calculate ESS.However, for PSMF, only the PF part is implemented to obtain ESS, as the SMF step does not change ESS.Therefore, GMM-EnKF spends more time on the search of α by the root-finding method.
Finally, PSMF has a similar computational load to SIR-ESRF when the ensemble size is small.However, the computational time of SIR-ESRF increases faster than PSMF along the x -axis and becomes larger than that of PSMF when particle numbers are large.This is caused by the mean-preserving random orthogonal transformation in SIR-ESRF, which has nonlinear computational complexity with respect to the particle number as explained in Grooms and Robinson [35] .By contrast, PSMF has linear computational complexity.

Conclusion
In this paper, we proposed a novel hybrid filtering approach, the PSMF, which enhances the standard particle filter by using ideas from stochastic map filters.To break the particle degeneracy issue, which is peculiar to PF, systematic resampling followed by a smoothing step was adopted.In order to analyse the impact of the nonlinearity of transport maps, we presented PSMF-L and PSMF-NL by adopting linear and nonlinear transport maps, respectively.
Two sets of experiments involving four widely employed statespace models were implemented to validate the proposed hybrid filters.In the first experimental setup, we investigated the performance of the PSMF with optimal parameters of θ for different ensemble sizes.Experiment results showed that for medium and large number of particles the linear version of the proposed hybrid filter, PSMF-L, yields better results than the benchmark approaches.Nevertheless, in some cases, the error inherently introduced by the nonlinear map variance was smaller than that due to the linear approximation of the linear map.Consequently, the nonlinear hybrid filter provided better results than the PSMF-L.We should also note that, with a small number of particles, the advantages that the PSMF offers were not obvious when compared to the other filters, since the PSMF suffers errors caused by the systematic resampling and transport map variance.
For the second series of simulations, the relationship between the performance of the PSMF and the parameter θ was investigated.It was demonstrated that in most cases the PSMF-NL is more tolerant to changes of the parameter θ for medium and large numbers of particles.This owes to the higher flexibility of nonlinear maps.In addition, for small numbers of particles, due to the large nonlinear map variance, the PSMF-NL achieves lower performance with respect to benchmark approaches.Our future work will consider extending the proposed approaches to higher dimensional problems.Moreover, we will also investigate the deterministic transport map filter (DMF) [29] within the context of the proposed hybrid filters.

Fig. 2 .
Fig. 2. The comparison of the proposed PSMF to the traditional filters under different number of particles.The horizontal axis represents the number of particles, and the vertical axis represents the log (RMSE).

Fig. 3 .
Fig. 3.The comparison of the proposed PSMF to the traditional filters under the different number of particles.The horizontal axis represents the number of particles, and the vertical axis represents the log (CRPS).

Fig. 4 .
Fig. 4. The comparison of the PSMF to the other hybrid filters under the different number of particles.The horizontal axis represents the number of particles, and the vertical axis represents log ( RMSE ) .

Fig. 5 .
Fig. 5.The comparison of the proposed PSMF to the traditional filters under the different number of particles.The horizontal axis represents the number of particles, and the vertical axis represents log ( CRPS ) .

Fig. 6 .
Fig. 6.The comparison among hybrid filters with different values of parameter θ under different models.The ensemble size N is set to 20.Horizontal axis represents the value of θ, and vertical axis represents the estimation log ( RMSE ) .

Fig. 7 .
Fig. 7.The comparison among hybrid filters with different values of parameter θ under different models.The ensemble size N is set to 200.Horizontal axis represents the value of θ, and vertical axis represents the estimation log ( RMSE ) .

Fig. 8 .
Fig. 8.The comparison among hybrid filters with different values of parameter θ under different models.The ensemble size N is set to 600.Horizontal axis represents the value of θ, and vertical axis represents the estimation log ( RMSE ) .

Fig. 9 .
Fig. 9.The comparison of the proposed PSMFs between with and without the smoothing step under the different number of particles.PSMF-nsm-L and PSMF-nsm-NL represent linear and nonlinear PSMFs without the smoothing step, respectively.The horizontal axis represents the number of particles, and the vertical axis represents log ( RMSE ) .

Fig. 10 .
Fig. 10.The comparison of the proposed PSMF to the other filters under the different number of particles.The horizontal axis represents the number of particles, and the vertical axis represents log ( time ) .

Table A2
CRPS results of the filters under different models.C and N represent CRPS and the ensemble size, respectively.M1:M5 represent UNGM, Henon-u, Henon-v, Lorenz-63, and Target tracking, respectively.F1:F9 represent PF, EnKF, ESRF, SMF-L, SMF-NL, SIR-ESRF, GMM-EnKF, PSMF-L, PSMF-NL.The first and second minimums under different particle numbers and models are highlighted in red and blue bold font, respectively