Non-linear Gaussian smoothing with Taylor moment expansion

This letter is concerned with solving continuous-discrete Gaussian smoothing problems by using the Taylor moment expansion (TME) scheme. In the proposed smoothing method, we apply the TME method to approximate the transition density of the stochastic differential equation in the dynamic model. Furthermore, we derive a theoretical error bound (in the mean square sense) for the TME smoothing estimates, showing that the smoother is stable under weak assumptions. Numerical experiments are presented in order to illustrate the practical use of the method.


I. INTRODUCTION
The aim of this paper is to consider smoothing problems (see, e.g., [1, 2]) on non-linear continuous-discrete state-space models of the form

\mathrm{d}X(t) = a(X(t)) \, \mathrm{d}t + b(X(t)) \, \mathrm{d}W(t),
Y_k = h(X_k) + \xi_k,    (1)

on a temporal domain \mathbb{T} := [t_0, \infty), where X : \mathbb{T} \to \mathbb{R}^d is an Itô process driven by a standard Wiener process W : \mathbb{T} \to \mathbb{R}^s, modelling the state of the system. The random variable Y_k \in \mathbb{R}^{d_y} stands for the measurement of X_k := X(t_k) via a (non-linear) function h : \mathbb{R}^d \to \mathbb{R}^{d_y} with Gaussian noise \xi_k \sim \mathrm{N}(\xi_k \mid 0, \Xi_k). The drift a : \mathbb{R}^d \to \mathbb{R}^d and the dispersion coefficient b : \mathbb{R}^d \to \mathbb{R}^{d \times s} of the stochastic differential equation (SDE) above are assumed to be regular enough so that the SDE is well-defined in Itô's SDE construction (see, e.g., [3, Ch. 5]). Suppose that at times t_1 \le t_2 \le \cdots \le t_T \in \mathbb{T} we have a set of measurement data y_{1:T} := \{y_k\}_{k=1}^T. The aim is to approximate the so-called smoothing posterior densities using Gaussian densities,

p(x_k \mid y_{1:T}) \approx \mathrm{N}(x_k \mid m_k^s, P_k^s),    (2)

for all k = 1, 2, \ldots, T. The goal is thus to compute the means and covariances \{m_k^s, P_k^s\}_{k=1}^T using the measurement data. This is known as a continuous-discrete Gaussian smoothing problem [1, 2, 4-6].
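For concreteness, this class of models can be simulated by integrating the SDE on a fine Euler-Maruyama grid and collecting noisy measurements at the discrete instants t_k. The sketch below is illustrative only; the scalar Ornstein-Uhlenbeck drift, constant dispersion, and identity measurement function are hypothetical placeholders, not the models studied in this letter.

```python
import numpy as np

def simulate_ssm(a, b, h, x0, dt, n_steps, meas_every, noise_std, rng):
    """Simulate a continuous-discrete state-space model:
    dX = a(X) dt + b(X) dW, with Y_k = h(X(t_k)) + xi_k, xi_k ~ N(0, noise_std^2)."""
    x = np.asarray(x0, dtype=float)
    states, measurements = [], []
    for step in range(1, n_steps + 1):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)   # Wiener increment
        x = x + a(x) * dt + b(x) * dw                     # Euler-Maruyama step
        if step % meas_every == 0:                        # measurement instant t_k
            states.append(x.copy())
            measurements.append(h(x) + rng.normal(0.0, noise_std, size=x.shape))
    return np.array(states), np.array(measurements)

# Hypothetical example: scalar Ornstein-Uhlenbeck state, identity measurements.
rng = np.random.default_rng(0)
xs, ys = simulate_ssm(a=lambda x: -0.5 * x, b=lambda x: 0.3,
                      h=lambda x: x, x0=[1.0], dt=1e-3,
                      n_steps=2000, meas_every=100, noise_std=0.1, rng=rng)
```

With 2000 integration steps and a measurement every 100 steps, this produces T = 20 state/measurement pairs.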
There exists a plethora of methods that can be used to obtain the Gaussian approximated density in Equation (2). When the functions a, b, and h are linear, the approximation in Equation (2) becomes exact, and one can use (continuous-discrete) Kalman filters and smoothers to solve the problem exactly.

This work was supported by the Aalto ELEC doctoral school. Zheng Zhao and Simo Särkkä are with the Department of Electrical Engineering and Automation, Aalto University, Finland (e-mail: zheng.zhao@aalto.fi).
One major challenge in solving the filtering and smoothing problems for non-linear models lies in recursively propagating filtering and smoothing solutions through the SDE in Equation (1). To that end, we need to be able to compute the transition densities for k = 1, 2, \ldots, T in order to carry out the computations. However, the transition density of the SDE in Equation (1) is, in general, analytically intractable and needs to be approximated. In order to obtain an approximation to the transition density, one solution is to locally approximate the non-linear SDE by a linear SDE [11]. Another commonly used approach is to discretise the SDE by using Itô-Taylor expansions [12-14] (e.g., Euler-Maruyama or Milstein's method). However, low-order Itô-Taylor expansions are accurate only when the discretisation time step is sufficiently small, while high-order expansions are numerically efficient only when the dispersion coefficient b satisfies a commutativity condition [12, Ch. 10], which is too restrictive for many practical models.
The main contribution of this letter is to develop a Taylor moment expansion (TME) based Gaussian smoother for computing smoothing density approximations as in Equation (2). In the approach, TME is used to approximate the transition density as

p(x_k \mid x_{k-1}) \approx \mathrm{N}\big(x_k \mid g_M(x_{k-1}), Q_M(x_{k-1})\big),    (3)

where g_M and Q_M are determined by the expansion so that they approximate the exact conditional mean and covariance of the SDE transition, respectively, up to an arbitrary order of precision M. This idea builds on our previous work that applies the TME method to Gaussian filters [15]. In addition, we analyse the stability of the proposed smoother by theoretically showing that the smoother error is bounded in a mean square sense under weak assumptions on the SDE and the filtering estimates. The experimental results show that the proposed smoother outperforms a number of baseline smoothers that are commonly used in practice.
This letter is organised as follows. In Section II, we present the Taylor moment expansion (TME) based approach for solving the continuous-discrete Gaussian smoothing problem.

arXiv:2110.01396v2 [math.NA] 4 Nov 2021
In Section III, we derive the error bound for the smoothing estimates. Experimental results are presented in Section IV, followed by conclusions in Section V.

II. TAYLOR MOMENT EXPANSION GAUSSIAN SMOOTHER
Let us suppose that the filtering densities for k = 1, 2, \ldots, T have already been approximated as Gaussians p(x_k \mid y_{1:k}) \approx \mathrm{N}(x_k \mid m_k^f, P_k^f), where the means and covariances are given by \{m_k^f, P_k^f\}_{k=1}^T. These approximations can be computed with many different Gaussian filters (e.g., EKFs or sigma-point filters) [13, 15-18]. To be able to implement the standard non-linear Gaussian smoother (see, e.g., [4, 19]) for the problem, we need to be able to compute the conditional mean and covariance of the SDE transition,

g(x) := \mathbb{E}\big[X(t_{k+1}) \mid X(t_k) = x\big], \quad Q(x) := \mathrm{Cov}\big[X(t_{k+1}) \mid X(t_k) = x\big].

Methods for approximating the functions/expectations g and Q include, for example, SDE linearisations [11] or Itô-Taylor expansions (see Sec. I). In the following, we show how to use the Taylor moment expansion (TME) method [15] to approximate these quantities.
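To make the roles of g and Q concrete, they can be estimated by brute-force Monte Carlo for a model whose transition moments are known in closed form. The sketch below uses a hypothetical Ornstein-Uhlenbeck SDE (not a model from this letter) so that the simulated estimates can be checked against the exact conditional mean and variance.

```python
import numpy as np

def mc_transition_moments(a, b, x, delta, n_sub=200, n_mc=20000, seed=1):
    """Brute-force Monte Carlo estimate of the transition mean g(x) and
    variance Q(x) of the scalar SDE dX = a(X) dt + b(X) dW over a gap delta,
    using Euler-Maruyama substeps."""
    rng = np.random.default_rng(seed)
    dt = delta / n_sub
    xs = np.full(n_mc, float(x))
    for _ in range(n_sub):
        xs = xs + a(xs) * dt + b(xs) * rng.normal(0.0, np.sqrt(dt), n_mc)
    return xs.mean(), xs.var()

# Ornstein-Uhlenbeck: dX = -theta X dt + sigma dW has closed-form moments.
theta, sigma, x0, delta = 0.5, 0.3, 1.0, 0.1
g_mc, q_mc = mc_transition_moments(lambda x: -theta * x, lambda x: sigma, x0, delta)
g_exact = x0 * np.exp(-theta * delta)
q_exact = sigma**2 * (1 - np.exp(-2 * theta * delta)) / (2 * theta)
```

The TME approach of the following paragraphs replaces such expensive sampling with closed-form series approximations g_M and Q_M.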
Consider a sufficiently regular target function \varphi : \mathbb{R}^d \to \mathbb{R}. The TME [15, 20, 21] states that for every t \ge s \in \mathbb{T},

\mathbb{E}\big[\varphi(X(t)) \mid X(s) = x\big] = \sum_{r=0}^{M} \frac{(t - s)^r}{r!} \, (\mathcal{A}^r \varphi)(x) + r_M(x, t, s),

where the operator \mathcal{A} is defined by (\nabla and \mathrm{H} denote gradient and Hessian, respectively)

(\mathcal{A} \varphi)(x) := (\nabla_x \varphi(x))^{\mathsf{T}} a(x) + \frac{1}{2} \mathrm{tr}\big( b(x) \, b(x)^{\mathsf{T}} \, \mathrm{H}_x \varphi(x) \big),

and the remainder is given by

r_M(x, t, s) := \int_s^t \frac{(t - \tau)^M}{M!} \, \mathbb{E}\big[ (\mathcal{A}^{M+1} \varphi)(X(\tau)) \mid X(s) = x \big] \, \mathrm{d}\tau.

The remainder above is in general intractable; hence, we usually discard it in order to compute the TME numerically.
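A minimal numerical sketch of the operator \mathcal{A} for a scalar SDE, with the derivatives of \varphi taken by central finite differences. The Beneš-type drift a(x) = tanh(x) with b = 1 is borrowed from the example mentioned later in the text; the check A(x^2) = 2 x a(x) + b(x)^2 is a hand computation.

```python
import numpy as np

def generator(phi, a, b, x, eps=1e-5):
    """Apply the generator A phi = phi' * a + 0.5 * b^2 * phi'' for a scalar
    SDE dX = a(X) dt + b(X) dW, using central finite differences for phi."""
    d1 = (phi(x + eps) - phi(x - eps)) / (2 * eps)              # phi'(x)
    d2 = (phi(x + eps) - 2 * phi(x) + phi(x - eps)) / eps**2    # phi''(x)
    return d1 * a(x) + 0.5 * b(x)**2 * d2

# Check against the hand computation A(x^2) = 2 x a(x) + b(x)^2.
a = lambda x: np.tanh(x)   # Benes-type drift, used as an example in the text
b = lambda x: 1.0
x = 0.7
val = generator(lambda x: x**2, a, b, x)
exact = 2 * x * np.tanh(x) + 1.0
```

Iterating this operator r times gives the coefficients \mathcal{A}^r \varphi of the expansion above.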
If we do so, the differentiability conditions on \varphi and the SDE coefficients are then reduced to orders 2M and M - 1, respectively. Therefore, by choosing the identity function \varphi_I(x) = x \in \mathbb{R}^d, we can form an M-th order TME approximation to g by

g_M(x) := \sum_{r=0}^{M} \frac{(\Delta t)^r}{r!} \, (\mathcal{A}^r \varphi_I)(x),

where \Delta t := t_{k+1} - t_k and \mathcal{A} is applied elementwise for any vector/matrix-valued target function. As for the covariance function Q, we need to introduce \varphi_{II}(x) = x \, x^{\mathsf{T}} \in \mathbb{R}^{d \times d}. Based on this, one possible M-th order TME approximation to Q is

Q_M(x) := \sum_{r=0}^{M} \frac{(\Delta t)^r}{r!} \, (\mathcal{A}^r \varphi_{II})(x) - g_M(x) \, g_M(x)^{\mathsf{T}}.

Remark 1. There is a fundamental difference between EKFSs and TME in the use of Taylor expansion. In EKFSs, the Taylor expansion is taken on the SDE coefficients, while in TME, the expansion is taken on the expectations (e.g., mean or covariance) with respect to the SDE distribution.
Putting together all the results above, we define the TME Gaussian smoother in the following algorithm.
for k = T - 1, T - 2, \ldots, 1. The recursion should be started from m_T^s := m_T^f and P_T^s := P_T^f at k = T. Sigma-point methods (e.g., the unscented transform [22], Gauss-Hermite [17], spherical cubature [13], or sparse-grid quadratures [23, 24]) can be used to numerically approximate the expectations in Algorithm 2. Specifically, for any integrand z : \mathbb{R}^d \to \mathbb{R}^n, the sigma-point methods approximate Gaussian integrals of the form \int z(x) \, \mathrm{N}(x \mid m, P) \, \mathrm{d}x by

\int z(x) \, \mathrm{N}(x \mid m, P) \, \mathrm{d}x \approx \sum_{i=1}^{K} w^i \, z\big(\chi_{m,P}^i\big) =: S_{m,P}(z),    (5)

where \{\chi_{m,P}^i := m + \sqrt{P} \, \beta^i\}_{i=1}^K are the integration nodes, and \{w^i \ge 0\}_{i=1}^K are the associated weights of the nodes. We can define sigma-point TME Gaussian smoothers as methods where the integrals in Algorithm 2 are computed via Equation (5).
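A minimal sketch of the sigma-point rule in Equation (5), using the one-dimensional three-point Gauss-Hermite rule: unit sigma points beta = (-sqrt(3), 0, sqrt(3)) with weights (1/6, 2/3, 1/6), which integrates polynomials up to degree five exactly against a Gaussian.

```python
import numpy as np

def gh3_expectation(z, m, p):
    """Three-point Gauss-Hermite approximation of the 1-D Gaussian integral
    of z(x) against N(x | m, p). Nodes are chi_i = m + sqrt(p) * beta_i,
    as in Equation (5)."""
    betas = np.array([-np.sqrt(3.0), 0.0, np.sqrt(3.0)])   # unit sigma points
    weights = np.array([1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0])  # associated weights
    nodes = m + np.sqrt(p) * betas
    return np.sum(weights * z(nodes))

# The rule is exact for polynomials up to degree 5, e.g. E[x^2] = m^2 + p.
m, p = 1.5, 0.4
approx = gh3_expectation(lambda x: x**2, m, p)
```

In the multivariate case the nodes are built with a matrix square root of P, but the structure of Equation (5) is identical.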

III. STABILITY ANALYSIS
The (sigma-point) TME Gaussian smoother defined via Algorithm 2 is not an exact smoother due to the errors introduced by the Gaussian approximations, TME approximations, and numerical integrations. Because these errors accumulate in time, it is important to analyse whether the TME smoother is stable in the sense that the mean square error is finite,

\mathbb{E}\big[ \| X_k - m_k^s \|_2^2 \big] < \infty, \quad 1 \le k \le T,    (6)

where \| \cdot \|_2 stands for the Euclidean and spectral norms for vectors and matrices, respectively. Please note that m_k^s indeed depends on T; hence, the error above is a function of both k and T. Equation (6) essentially means that the smoothing errors are bounded over k for every number of measurements T \ge 1.
We show that as long as the filter is stable and the SDE coefficients are suitably regular, the smoother is stable. In addition, the result is independent of the choice of the measurement model in Equation (1). We use the following assumptions in order to obtain the main result in Theorem 6.

Assumption 3 (Stable filtering). There exist c_f(k) and c_P such that

\mathbb{E}\big[ \| X_k - m_k^f \|_2^2 \big] \le c_f(k) < \infty

for all k \ge 1, and that \sup_{k \ge 1} \mathrm{tr}(P_k^f) \le c_P < \infty almost surely.

Assumption 4 (System regularity). There exists c_S > 0 such that

\| g_M(x) - S_{m,P}(g_M) \|_2^2 \le \| \nabla_x g_M(x) \|_2^2 \, \| x - m \|_2^2 + c_S \, \mathrm{tr}(P),

for all x \in \mathbb{R}^d, m \in \mathbb{R}^d, and positive semi-definite P \in \mathbb{R}^{d \times d}. Furthermore, there exists c_Q > 0 such that \inf_{x \in \mathbb{R}^d} \lambda_{\min}(Q_M(x)) \ge c_Q^2, where \lambda_{\min} denotes the minimum eigenvalue.
Assumption 5. There exists c_m such that \| S_{m_k^f, P_k^f}(g_M) - m_{k+1}^f \|_2 \le c_m < \infty for all k \ge 1.

Assumption 3 ensures that the filtering result is stable. This is necessary for ensuring stable smoothing, as the smoother takes the filtering result as input and produces the smoothing results by amending the filtering results. It is also worth noting that Assumption 3 is based on the standard definitions of stable filters (see, e.g., [15, 16, 25-28] for the definitions, and for models and filters that satisfy the stability assumptions).
Assumption 4 ensures that the sigma-point integration of the TME approximant g_M does not deviate significantly from g_M, with respect to a Lipschitz constant c_g. This is a commonly used assumption in analysing the stability of sigma-point filters [28, Assump. 4.1]. In addition, Assumption 4 requires the TME covariance approximant Q_M to be strictly positive definite. The Beneš model [15], where a(x) = tanh(x) and b = 1, is an example that satisfies Assumption 4.
Assumption 5 says that the distances between the sigma-point mean predictions and the filtering means are uniformly bounded. In other words, the sigma-point mean prediction from m_k^f produced by Algorithm 2 should stay around m_{k+1}^f for all k \ge 1. This assumption is needed because the residual between m_k^f and m_{k+1}^f is the key quantity in producing m_k^s. Moreover, we argue that it is hard to relax this assumption unless Algorithm 2 is aware of how the filtering means are obtained (i.e., the relationship between m_k^f and m_{k+1}^f is not known by the algorithm).
Theorem 6. Suppose that Assumptions 3, 4, and 5 are satisfied, and that the smoothing gains are uniformly bounded with \sup_k \| G_k \|_2^2 \le c_G < 1. Then Algorithm 2 with the sigma-point integration in Equation (5) produces estimates whose mean square errors are bounded in the sense of Equation (6).

Proof. When k = T, by the definition m_T^s := m_T^f, we recover the filtering bound c_f(T) at T. When k \le T - 1, by expanding Equation (4) we arrive at

X_k - m_k^s = \big( X_k - m_k^f \big) - G_k \big( m_{k+1}^s - S_{m_k^f, P_k^f}(g_M) \big),

and hence the error at step k is controlled by the filtering error, the gain bound, and the error at step k + 1.
By Assumptions 3 to 5 and triangle inequalities, we have, almost surely,

\| X_k - m_k^s \|_2 \le \| X_k - m_k^f \|_2 + \| G_k \|_2 \big( \| m_{k+1}^s - m_{k+1}^f \|_2 + \| m_{k+1}^f - S_{m_k^f, P_k^f}(g_M) \|_2 \big),

where the right-hand side is bounded in terms of c_f(k), c_G, c_m, and the error at step k + 1. By putting together all the bounds above, we conclude the result.

IV. EXPERIMENTS
In this section, we test the TME smoother defined in Algorithm 2 on a simulated model and compare it against other commonly used baseline smoothers¹. The test model we choose here is a Lorenz '63 system [30, 31] defined by

\mathrm{d}X(t) = \begin{pmatrix} \kappa \, (X_2(t) - X_1(t)) \\ X_1(t) \, (\lambda - X_3(t)) - X_2(t) \\ X_1(t) \, X_2(t) - \mu \, X_3(t) \end{pmatrix} \mathrm{d}t + \sigma \, \mathrm{d}W(t),

where \kappa = 10, \lambda = 28, \mu = 2, \sigma = 5, and W is a three-dimensional standard Wiener process with a unit spectral density. This model is considered challenging for non-linear filters and smoothers because making accurate predictions from its SDE can be hard [32]. We simulate (numerical) ground-truth trajectories and their measurements uniformly at 100 time instants 0.02, 0.04, \ldots, 2 s, using 10,000 Euler-Maruyama integration steps between each pair of measurements. In order to verify the efficacy of TME smoothers, we run combinations of commonly used filters and smoothers. These include a continuous-discrete extended Kalman filter (EKF-RK4) and smoother (EKS-RK4) with 4th-order Runge-Kutta integration [2, Alg. 10.35], and 3rd-order Gauss-Hermite sigma-point filters (GHFs) and smoothers (GHSs). In particular, we compare TMEs (of orders 2 and 3) against the Euler-Maruyama (EM) scheme in GHFs and GHSs in order to show the benefits of using the TME scheme in sigma-point Gaussian smoothing.
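The test model's drift and a single Euler-Maruyama step can be sketched as below, assuming the standard form of the stochastic Lorenz '63 equations with the parameters stated above (the drift layout is an assumption, not taken verbatim from this letter).

```python
import numpy as np

kappa, lam, mu, sigma = 10.0, 28.0, 2.0, 5.0  # parameters stated in the text

def lorenz_drift(x):
    """Drift a(x) of the (assumed standard) stochastic Lorenz '63 system."""
    return np.array([kappa * (x[1] - x[0]),
                     x[0] * (lam - x[2]) - x[1],
                     x[0] * x[1] - mu * x[2]])

def em_step(x, dt, rng):
    """One Euler-Maruyama step; the Wiener increments have unit spectral
    density and are scaled by the constant dispersion sigma."""
    return x + lorenz_drift(x) * dt + sigma * np.sqrt(dt) * rng.normal(size=3)

# Drift at the point (1, 1, 1): (0, 26, -1) by direct substitution.
d = lorenz_drift(np.array([1.0, 1.0, 1.0]))
```

In the experiment, many such substeps (10,000 in the text) are taken between consecutive measurement instants to produce a near-exact ground truth.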
We evaluate the performance of the smoothers by computing their root mean square errors (RMSEs) with respect to the simulated true trajectories. The means and standard deviations of the RMSEs are computed from 1,000 independent Monte Carlo (MC) simulations of the model and runs of the smoothers.
The RMSE results are shown in Table I. We see that for all filters (except EKF-RK4), the smoothers that use the TME scheme have the best mean RMSEs and lowest standard deviations, followed by EM and EKS. Moreover, the 3rd-order TME results in better RMSEs compared to those of the 2nd-order TME. The combination of EKF-RK4 and EKS-RK4 is found to be the worst, while the smoother that uses EM turns out to be the best under the EKF-RK4 filter.

¹ The implementation codes (in both Python and Matlab) of the experiments can be found at https://github.com/zgbkdlm/tmefs.
In order to verify the stability analysis in Theorem 6, we estimate the error bound \mathbb{E}[\| X_k - m_k^s \|_2^2] of the TME-2 filter and smoother by using 10,000 MC runs. Since c_G is a reciprocal of c_Q^2, we hereby check whether the error bound tends to be contractive as c_Q increases. This is done by tuning the \sigma of the test model, as c_Q is lower bounded by \sigma^2.
The error bounds with two different \sigma settings are shown in Figure 1. We see that when \sigma = 0.2 is small (i.e., c_Q is small and c_G is large), the error bound increases as k approaches one. On the contrary, the error bound becomes contractive as k goes to one when \sigma = 15 is large (i.e., c_Q is large and c_G is small). This numerical result verifies the property of the theoretical error bound in Theorem 6, which shows that the bound is indeed increasing and contractive (as k \to 1) when the smoothing gain bound c_G is sufficiently large and small, respectively.

V. CONCLUSION
We have proposed a Taylor moment expansion (TME) based Gaussian smoother for non-linear continuous-discrete state-space models. The key is to use TME to approximate the statistical properties (i.e., mean and covariance) of SDEs, resulting in asymptotically exact propagation of Gaussian smoothing estimates through the SDEs. Moreover, we have theoretically analysed the stability of the proposed TME smoothers, showing that the smoothers are stable if their preceding filtering procedures are stable and the SDE satisfies certain weak assumptions. The numerical experiments verify that the proposed TME smoothers outperform a number of commonly used smoothers, and that the theoretical error bound matches its numerical estimates in a Monte Carlo experiment.

Algorithm 2 (TME-M Gaussian smoother). Let \{m_k^f, P_k^f\}_{k=1}^T be a collection of filtering means and covariances. The TME Gaussian smoother obtains \{m_k^s, P_k^s\}_{k=1}^{T-1} by recursively computing

m_{k+1}^- = \int g_M(x) \, \mathrm{N}(x \mid m_k^f, P_k^f) \, \mathrm{d}x,
P_{k+1}^- = \int \big[ Q_M(x) + (g_M(x) - m_{k+1}^-)(g_M(x) - m_{k+1}^-)^{\mathsf{T}} \big] \, \mathrm{N}(x \mid m_k^f, P_k^f) \, \mathrm{d}x,
D_{k+1} = \int (x - m_k^f)(g_M(x) - m_{k+1}^-)^{\mathsf{T}} \, \mathrm{N}(x \mid m_k^f, P_k^f) \, \mathrm{d}x,
G_k = D_{k+1} \, (P_{k+1}^-)^{-1},
m_k^s = m_k^f + G_k \, (m_{k+1}^s - m_{k+1}^-),
P_k^s = P_k^f + G_k \, (P_{k+1}^s - P_{k+1}^-) \, G_k^{\mathsf{T}},    (4)
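The backward recursion of Algorithm 2 can be sketched as follows; the predicted moments and cross-covariances are assumed to be precomputed (e.g., by sigma-point TME predictions), and the scalar example numbers at the end are hypothetical.

```python
import numpy as np

def backward_pass(mf, Pf, m_pred, P_pred, D):
    """Backward recursion of a Gaussian (RTS-type) smoother.

    mf, Pf:         filtering means (T, d) and covariances (T, d, d)
    m_pred, P_pred: predicted mean/covariance of x_{k+1} given y_{1:k},
                    indexed so that m_pred[k] predicts step k + 1
    D:              cross-covariances Cov[x_k, x_{k+1} | y_{1:k}]
    """
    T = mf.shape[0]
    ms, Ps = mf.copy(), Pf.copy()          # m^s_T = m^f_T, P^s_T = P^f_T
    for k in range(T - 2, -1, -1):
        G = D[k] @ np.linalg.inv(P_pred[k])              # smoothing gain G_k
        ms[k] = mf[k] + G @ (ms[k + 1] - m_pred[k])
        Ps[k] = Pf[k] + G @ (Ps[k + 1] - P_pred[k]) @ G.T
    return ms, Ps

# Hypothetical scalar example with hand-picked predicted quantities.
mf = np.array([[1.0], [2.0], [3.0]])
Pf = np.ones((3, 1, 1))
m_pred = np.array([[0.9], [1.8]])          # predicted means for steps 2, 3
P_pred = np.full((2, 1, 1), 0.91)          # predicted covariances
D = np.full((2, 1, 1), 0.9)               # cross-covariances
ms, Ps = backward_pass(mf, Pf, m_pred, P_pred, D)
```

With the TME scheme, m_pred, P_pred, and D would be obtained by integrating g_M and Q_M against the filtering Gaussians as in Equation (5).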

Figure 1. MC estimates of \mathbb{E}[\| X_k - m_k^s \|_2^2]. Shaded areas stand for the 0.95 confidence interval of the MC samples.

Table I. RMSEs (in mean and standard deviation) of the smoothing estimates in combinations of different filters and smoothers. Bold numbers represent the best in each of their rows.