NLOS mitigation in indoor localization by marginalized Monte Carlo Gaussian smoothing

One of the main challenges in indoor time-of-arrival (TOA)-based wireless localization systems is to mitigate non-line-of-sight (NLOS) propagation conditions, which degrade the overall positioning performance. The positive skewed non-Gaussian nature of TOA observations under LOS/NLOS conditions can be modeled as a heavy-tailed skew t-distributed measurement noise. The main goal of this article is to provide a robust Bayesian inference framework to deal with target localization under NLOS conditions. A key point is to take advantage of the conditionally Gaussian formulation of the skew t-distribution, thus being able to use computationally light Gaussian filtering and smoothing methods as the core of the new approach. The unknown non-Gaussian noise latent variables are marginalized using Monte Carlo sampling. Numerical results are provided to show the performance improvement of the proposed approach.


Introduction
The knowledge of position is ubiquitous in many applications and services, playing an important role.The widely diffused Global Navigation Satellite System (GNSS) offers a worldwide service coverage due to a network of dedicated satellites [1].GNSS is recognized to be the de facto system in outdoor environments when it is available.Under the assumption that its reception is not obstructed or jammed [2][3][4], there is no doubt that GNSS is the main enabler for location-based services (LBS).One of such situations is indoor positioning and tracking, where satellite signals are hardly useful (unless extremely large integration times are considered).In indoor scenarios, a plethora of alternative and complementary technologies can be considered [1,5,6].
We are interested in a particular propagation phenomena encountered in most positioning technologies (both outdoor and indoor), known as non-line-of sight (NLOS).It is one of the most challenging problems for tracking.Particularly, when considering time-of-arrival (TOA) measurements as range estimates, the measured *Correspondence: jvila@cttc.cat 1 Centre Tecnològic de Telecomunicacions de Catalunya (CTTC/CERCA), 08860 Castelldefels, Barcelona, Spain Full list of author information is available at the end of the article distance can be severely degraded.These ranges are typically positively biased with respect to the true distances, therefore seen as outliers at the receiver.It is of interest to develop NLOS mitigation techniques, providing enhanced robustness to tracking methods based on TOA measurements [6].
In general, the problem under study concerns the derivation of new robust methods to solve the Bayesian filtering and smoothing problem in challenging applications such as the LOS/NLOS propagation conditions in indoor localization systems.The state-space models (SSM) of interest are expressed as where x k ∈ R n x and y k ∈ R n y are the hidden states of the system and measurements at time k.f k−1 (•) and h k (•) are known to be the possibly nonlinear functions of the state; and both process and observation noises, u k and n k , assumed to be mutually independent.In real-life applications, we may not have a complete knowledge of the system conditions, thus the measurement noise statistics are assumed to be unknown to a certain extent.In contrast, we consider a known process noise covariance Q k .
Regarding the measurement noise, we assume that it is distributed according to a parametric heavy-tailed skew t-distribution, n k ∼ ST φ k , with φ k representing the set of possibly unknown parameters of the non-Gaussian distribution.The probability density function (pdf ) of the univariate skew t distribution of interest can be written as [19] ST z; μ, σ 2 , λ, ν = 2T z; μ, λ 2 + σ 2 , ν T(z; 0, 1, ν + 1), with μ ∈ R, σ 2 ∈ R + , λ ∈ R, and ν ∈ R + , referring to the distribution location, scale, skewness, and degrees of freedom, respectively.T z; μ, σ 2 , ν is the pdf of the Student's t distribution, with (•) the gamma function.T(z; 0, 1, ν) is the cumulative distribution function (CDF) of the Student's t distribution with ν degrees of freedom and

State-of-the-art
The skew t distribution has been recently shown to provide a reasonable fit to realistic indoor TOA measurements.For instance, characterizing range measures in NLOS conditions in ultra-wideband (UWB) localization [7] or in multipath channels when ranging is computed with long-term evolution (LTE) networks [8].Interestingly, this distribution allows a Gaussian meanscale mixture (GMSM) representation, which implies that the distribution can be reformulated as hierarchically (conditionally) Gaussian [9,10].Mathematically, if we have the skew t-distributed random variable z ∼ ST (z; μ, σ 2 , λ, ν), then we can write [41] with τ ∼ G τ ; ν 2 , ν 2 , γ |τ ∼ N + γ ; 0, τ −1 , and N + (•) and G(•) as the positive truncated normal and gamma distributions.This is a key point in our problem formulation, because under the knowledge of the noise parameters (i.e., μ, σ 2 , λ, ν, γ and τ in (3)), both the conditional marginal filtering and smoothing posterior distributions of the states, p(x k |y 1:k ) and p(x k |y 1:N ), turn to be Gaussian and thus we are able to use computationally light Gaussian smoothing methods to infer the states of the system.
In the literature, some contributions dealing with conditionally Gaussian SSMs corrupted by both heavy-tailed symmetric and skewed noise distributions were proposed.A particle filter (PF) solution for linear SSMs in symmetric α-stable (SαS) noise was presented in [11].This idea was further explored in [12] for nonlinear systems and generalized to other symmetric distributions in [13].The key idea was to take advantage of the conditionally Gaussian form and use a sigma-point Gaussian filter [14,15] for the nonlinear state estimation.A robust filtering variational Bayesian (VB) approach was considered for linear systems in [16] and further extended to nonlinear SSMs in [17] considering a symmetric Student t measurement noise.But symmetric distributions may not always be appropriate to characterize the system noise.Recently, two interesting approaches to deal with linear SSMs under skewed noise were proposed, the first one uses a marginalized PF [18] and the other considers a VB solution [7,19].It is important to point out that (i) these contributions deal with either nonlinear systems corrupted by symmetric distributed noises or linear SSMs under skewed noise and (ii) the core of these methods use standard Bayesian filtering algorithms, then the smoothing problem needs to be further analyzed within this context.
Related to the problem under study, it is worth saying that several contributions deal with the filtering problem in nonlinear/non-Gaussian SSM under model uncertainty using sequential Monte Carlo (SMC) methods, for instance, joint state and parameter estimation solutions [20], model selection strategies using interacting parallel PFs [21,22], or model information fusion within the SMC formulation [23].The main drawback of SMC methods is their high computational complexity and the curse-ofdimensionality [24].That is the reason why we propose to take advantage of the underlying conditional Gaussian nature of the problem and use more efficient methods in this context.

Contributions
The main contributions of the article, which generalize the preliminary results in [25], are summarized as: The article is organized as follows: first, we provide a discussion on Gaussian filtering and smoothing in nonlinear/Gaussian systems, together with the sigma-pointbased approximation of the multidimensional integrals in the conceptual solution, being computationally more efficient than SMC methods under the Gaussian assumption; then, we provide the conditionally Gaussian formulation of the measurement noise and a method to deal with the unknown non-Gaussian noise latent variables, and finally, we propose a NLOS indoor localization solution, based on the Gaussian smoother and the sequential noise latent variables marginalization.Numerical results are provided in realistic scenarios using UWB signals.

Gaussian filtering and smoothing
This section reviews the general filtering and smoothing solutions in the case of nonlinear/Gaussian systems, this material corresponds to Sections 2.1 and 2.2, respectively.Then, in Section 2.3, we provide the implementation details when sigma-points are used to solve the filtering/smoothing equations.Notice that when the system is linear/Gaussian, the optimal solutions are given by the standard Kalman filter (KF) [35] and Kalman smoother (KS) [36], and for general nonlinear/non-Gaussian systems, one should consider more sophisticated SMC techniques [29].
For the formal derivation of the Gaussian filter/smoother, we assume that the measurement noise in (2) is zero-mean Gaussian with known covariance R k .Later, in Section 3, we discuss how the method can be used in the context of conditionally Gaussian models.

Bayesian Gaussian filtering
From a theoretical point of view, all necessary information to infer information of the unknown states resides in the marginal posterior distribution of the states, p(x k |y 1:k ).Thus, the Bayesian filtering problem is one of evaluating this distribution.It can be recursively computed [26] in two steps: (1) prediction of p(x k |y 1:k−1 ) using the prior information and the previous filtering distribution and (2) update with new measurements y k to obtain.
The recursive solution provides an estimation framework that is optimal in the Bayesian sense, that is, the characterization of the posterior distribution allows us to compute the minimum mean-squared error (MMSE), the maximum a posteriori (MAP) or the median of the posterior (minimax) estimators, addressing optimality in many senses.The multidimensional integrals in the prediction and update steps are analytically intractable in the general case.Actually, there are few cases where the optimal Bayesian recursion can be analytically solved.This is the case of linear/Gaussian models, where the KF yields to the optimal solution [27].In more general models, one must resort to suboptimal algorithms.A plethora of methods can be found in the literature [28].A popular tool are particle filters (PF) [29][30][31][32], a set of simulation-based methods which are applicable in nonlinear/non-Gaussian setups.Under the Gaussian assumption of interest, the quadrature KF (QKF) [14,15,33] and cubature KF (CKF) [34] are typically the methods of choice.In this case, the marginal predictive and posterior distributions are In the prediction step, we compute the marginal predictive distribution mean and covariance as 1 In the update step, the mean and covariance of the marginal posterior are given by the KF Equations [35] where the Kalman gain is The predicted measurement and both innovation and crosscovariance matrices are computed as The problem reduces to the approximation of these integrals.

Gaussian smoothing
In the previous section, we summarized the general Gaussian Bayesian filtering solution but sometimes it may be interesting to obtain an estimate of the smoothing posterior and not its filtering counterpart.In the problem under study, we consider a forward-backward smoother formulation [36] to obtain the marginal smoothing posterior, p(x k |y 1:N ), where we used the state that is Markovian and then The forward-backward smoothing [36] performs two filtering passes, that is, first a standard forward filtering from time k = 1 to N, and then, the backward filtering from k = N to 1, backwards in time.Notice that the predictive and filtering distributions may be obtained from the standard Bayesian filtering solution.At time k, if we consider that we know the filtering distribution, N x k ; xk|k , x,k|k , and the predictive distribution, N x k+1 ; xk+1|k , x,k+1|k , from the forward filtering, together with the smoothed density at k + 1, N x k+1 ; xk+1|N , x,k+1|N , because the smoother is running backwards, then the analytical solution to the marginal smoothing posterior is obtained as follows: using the Markovian properties of states, we have that p(x k |x k+1 , y 1:N ) = p(x k |x k+1 , y 1:k ), and then we can obtain the conditional smoothing distribution of x k as with which can be used to obtain the smoothing distribution by marginalization over x k+1 , Under the Gaussian assumption, the problem is to recursively obtain the mean and covariance of the Gaussian marginal smoothing posterior distribution, which is given by [37,38] x,k+1|k and k,k+1|k referring to the cross-covariance between x k and x k+1 .Note that in practice we do not require the computation of the smoothing estimation error covariance, x,k|N , for the smoother recursion.However, it is useful in order to have a measure of the smoothing uncertainty.The smoother gain D k can be easily obtained from the standard forward filtering pass, therefore adding very few extra computation.

Sigma-point Gaussian filtering and smoothing
An appealing class of filters and smoothers within the nonlinear Gaussian framework are the sigma-point Gaussian filters (SPGF) [14,15,34,39,40] and smoothers (SPGS) [37,38], a family of derivative-free algorithms which are based on a weighted sum of function values at specified (i.e., deterministic) points within the domain of integration, as opposite to the stochastic sampling performed by particle filtering methods.The idea is to use a set of so-called sigma-points to efficiently characterize the propagation of the normal distribution over the nonlinear system.In the sequel, we detail the formulation of such approximation and how it can be used to perform filtering or smoothing.

Filtering
Consider a set of sigma-points, {ξ i , ω i } i=1,...,L M .Then, construct the transformed set which captures the mean and covariance of the posterior distribution, The integrals in the prediction step can be approximated as In the following update step, first compute the transformed set to capture the mean and covariance of the predictive marginal distribution, x i,k|k−1 = S x,k|k−1 ξ i + xk|k−1 , with x,k|k−1 = S x,k|k−1 S x,k|k−1 .Then, we approximate the integrals of interest as,

Smoothing
The smoothed state ( 17) is obtained using the predicted filtering and smoothing states, xk+1|k and xk+1|N , respectively.Define again a set of sigma-points and weights, {ξ i , ω i } i=1,...,L M , and the transformed set which captures the corresponding mean and covariance, x i,k|k = S x,k|k ξ i + xk|k .Use this transformed sigma-points to estimate the predicted subspace state, its prediction error covariance, and the cross-covariance as x,k+1|k = Finally, estimate the smoothed subspace and covariance as x,k|N = x,k|k x,k+1|k .Notice that the smoother gain can be embedded into the prediction step of the forward filtering, then only the last step is performed in the backward recursion.At time k = N, both filtering and smoothing estimates are the same, then the backward pass runs from time N − 1 to 1. Compared to the filtering process, implementation of the smoothing solution only impacts in having additional steps 6 and 12 in Algorithm 1, where we use the notation x = S x S x for the factorized covariances.

Hierarchically Gaussian measurement noise formulation
In the previous Section 2, we assumed a Gaussian measurement noise with known covariance matrix.But in Algorithm 1 Sigma-point Kalman filter/smoother Factorize covariance and evaluate sigma-points: Estimate the predicted subspace state and covariance: Estimate the cross-covariance and smoother gain: Measurement update 8: Factorize covariance and evaluate the sigma-points: Predicted measurement, cross and innovation covariance: Estimate the updated subspace state and error covariance: challenging applications such as the NLOS propagation conditions of interest here, the Gaussian assumption does not hold and noise parameters may be unknown to a certain extent.In such scenarios, one may have outliers or impulsive behaviors that produce biased estimates, for instance, under NLOS conditions the receiver is likely to estimate distances to the anchors larger than the true ones [6]; therefore, we must account for more accurate observation models.In general, these non-Gaussian behaviors can be effectively characterized by parametric heavy-tailed and positive-skewed noise distributions.It has been recently shown experimentally that TOA-based positioning under NLOS conditions [7] and multipath ranging error distributions in LTE networks [8] can be well approximated by a skew t-distribution [9].Taking into account the problem at hand, we are interested in measurement models with independent observation components and measurement noise models where the noise components are independently univariate skew t-distributed with ST n k,i ; μ, σ 2 , λ, ν defined in Section 1.
A key point on the problem formulation is to take advantage of the hierarchically (conditionally) Gaussian formulation of the measurement noise distribution.The hierarchical Gaussian representation of the skew tdistribution is written as [41] While τ k,i controls the heavy-tailed behavior, γ k,i controls the skewness of the distribution.
We can define the vector with 2 × n y noise distribution latent variables,
The measurement noise in (24) can be written as where The distribution hyperparameters are application dependent and typically assumed a priori known.The standard Gaussian filter/smoother in charge of the state estimation assumes a zero-mean Gaussian measurement noise with known parameters.In the skew t-distributed case, at every time step, the filter requires an estimate of the corresponding mean and covariance, m k (φ k ) and R k (φ k ), respectively.In the following, we consider the marginalization of the noise latent variables in the general filter/smoother formulation.

Noise latent variables marginalization
In the problem of interest, the measurement noise is conditionally Gaussian with unknown noise latent variables.Therefore, the filtering/smoothing formulation in Section 2 must be modified to take such uncertainty into account.We assume known measurement noise distribution hyperparameters, and thus, we want to marginalize the state estimation with respect to the noise latent variables, γ k,i and τ k,i .We can write the marginalized posterior distribution as Notice that the measurement noise parameters only affect the computation of the innovation in the measurement update of the filter/smoother.Within this context, the predicted measurement is reformulated as where the first term corresponds to the standard Gaussian case, and the second is a marginalization of the noise term over the last available distribution (i.e., we write n k (φ k ) to explicitly emphasize the dependency on the noise latent variables, φ k ).Using a similar formulation, we can rewrite the innovation covariance as For the marginalization of the noise latent variables, a key point is to obtain the posterior distributions of γ k,i and τ k,i .The joint posterior is given by As the observation components are assumed to be independent, the likelihood function is We can define a normalized observation and γk,i λγ k,i /σ .Then, we have the normalized likelihood is given by Using the conjugate nature of the prior distributions [42], it is possible to obtain the analytical solution for the posterior of γ k,i and τ k,i .In this case, we have from (28) that the a priori distributions are with γ 0 = 0, κ 0 = σ 2 /λ 2 , and α 0 = β 0 = ν/2.We are interested in updating with the new measurements to get the posterior distributions with from basic conjugate analysis results.Interestingly, the posteriors at k in ( 38) and ( 39) can be used as the priors in k + 1 instead of ( 36) and (37).In this way, the algorithm is learning the environment as it progresses over time.However, given the assumed model, it is more meaningful to reset the prior at each time instant instead of sequentially using the latest posterior.The reason is that measurements are assumed independent, so there is no benefit in carrying out information from one time instant to the other.Instead, under these conditions, we suggest to use the values γ 0 , κ 0 , α 0 , and β 0 at k − 1 before updating the distribution with ỹk,i .Sequential use of the posterior will be interesting when the generative model is known to have some memory.
In [25], we proposed to use a point estimate for γ k,i and τ k,i from a single observation using their posterior marginals.The corresponding modes of these distributions were used as point estimates γk,i = where we took into account that γk,i ∈ R + by construction.In this contribution, instead of using a point estimate, we consider a Monte Carlo-based marginalization drawing L samples from the joint posterior given by (38) and (39).Using these distributions, we propose to compute the two integrals of interest as k being random samples drawn from the joint posterior distribution of the noise latent variables, φ k .In practice, this can be easily implemented by first drawing a sample from (39) and then, using that sample, draw from (38).These expressions can be further expanded as follows where γ (j) k,i are random samples drawn from the posterior of γ (j) k,i , and Rk is approximated by a diagonal matrix, where the p-th element of the diagonal is Finally, we have that the marginalized Monte Carlo sigma-point Gaussian filter and smoother (MSPGF/S) proposed in this contribution is given by Algorithm 1 with step 9 modified as A further improvement of standard SPGF/S schemes comes from the fact that the filter should preserve the properties of a covariance matrix, namely, its symmetry and positive-definiteness.In practice, however, due to lack of arithmetic precision, numerical errors may lead to a loss of these properties.To circumvent this problem, a squareroot filter can be considered to propagate the square root of the covariance matrix instead of the covariance itself [33,34].We propose to use square-root cubature and quadrature Kalman filters/smoothers (named SCKF/S and SQKF/S, respectively) [38,43] as the core implementation of the new square-root MSPGF/S.These methods resort to cubature [34] and Gauss-Hermite quadrature rules [15] to approximate the integrals in the optimal solution.While the SCKF/S uses L c = 2n x sigma-points, in the SQKF/S we have L q = α n x , where α determines the number of sigma-points per dimension, which is typically set to α = 3.A straightforward solution to avoid the exponential computational complexity increase of the standard QKF in high-dimensional systems is the use of sparse-grid quadrature rules, which reduce the computational complexity with negligible penalty in numerical accuracy [44,45].

SSM for the TOA-based localization problem
To show the performance of the proposed approach, we consider a TOA-based localization problem, where a set of N anchor nodes at known locations, x k,i =[ x k,i , y k,i ] , provide range information.If we define the state to be inferred as position and velocity components of the target, p k (x k , y k ) and v k (ẋ k , ẏk ) , respectively, the observed range from each node i to the target, is modeled as ρk,i = ρ i (x k ) + n k,i , i ∈ {1, . . ., N}, with n k,i denoting the ranging error and ρ i (x k ) ρ k,i = x k − x k,i the true distance from the i-th node to the target node at time k.The complete measurement equation is given by . . .
In standard localization applications, the measurement noise is nominally distributed according to n t ∼ N n t ; 0, σ 2 • I N , where σ depends on the technology used to obtain the ranging estimates.In the case of UWB devices, this is typically on the order of 0.1 to 1 meter.But the Gaussian distribution does not capture the NLOS propagation conditions [7]; thus, we must account for more accurate measurement models such as the skew tdistribution introduced in the previous sections.Using the noise n k defined in (29), the measurement equation is defined as Considering the state x k =[ x k , y k , ẋk , ẏk ] , the process equation is defined as a linear constant acceleration model with The Gaussian process noise u k ∼ N u k ; 0, σ 2 u I 2 models an acceleration of σ u m/s 2 .

Numerical results
For the numerical evaluation of the proposed method, the root mean square error (RMSE) of position is used as the measure of performance, which is obtained from 1000 Monte Carlo runs.The new method was validated in a realistic scenario composed of N = 6 anchor nodes, circularly deployed in a 40 × 40 m 2 area, and considering σ u = 0.03 m/s 2 .We compare the tracking performance obtained with four methods: 1. SQKF/S operating under the Gaussian assumption without accounting for the non-Gaussian nature of the measurement noise (SQKF/S-G).
2. SQKF/S using point estimates of the noise latent variables ψ k,i as proposed in [25] (SQKF/S-P).3. New square-root SPGF/S-based solution with marginalized noise latent variables φ k within the filter/smoother formulation via Monte Carlo sampling (MSPGF/S).4. A clairvoyant SQKF/S that knows exactly the realization of the latent variables φ k at each instant k and thus can use m k (φ k ) and R k (φ k ).This is the performance benchmark for the new methodology (SQKF/S-K).
We also considered a sampling importance resampling PF with 81 particles (i.e., equivalent to the number of sigma-points in the SQKF/S), but as already shown in [46], the filter is in general not able to correctly localize the target (i.e., the filter diverges).Moreover, to obtain the same performance than the clairvoyant SQKF/S, we must consider a much larger number of particles.This is the reason why these results are not shown in the figures, since for the fair comparison in terms of number of particles the PF does not provide convergent result.
The proposed MSPGF/S can be implemented using cubature [34] and Gauss-Hermite approximations [15], then using respectively L c = 2n x = 8 and L q = α n x = 81 deterministic samples to approximate the integrals of the general solution.In the proposed indoor localization scenario, we tested both cubature and quadrature approximations, and the performance obtained was found strictly equivalent.In practice, the method of choice is the cubature-based solution, which has the lowest computational complexity.
Notice that all the methods consider known distribution hyperparameters, which are application dependent.We consider an UWB TOA-based indoor localization realistic scenario, with hyperparameters given in [7] and adjusted to match real data: μ = −0.1 m, σ = 0.3 m, and λ = 0.6 m and ν = 4.The corresponding Gaussian approximation is given by μ G = 1.3 and σ G = 1.6.Although the clairvoyant filter/smoother (SQKF/S-K) with fully known measurement noise parameters outperforms the rest, we have a small performance loss with the proposed methodology considering unknown noise latent variables.The new marginalized approach is more robust and outperforms the SQKF/S-P using point estimates first proposed in the filtering context in [25].The SQKF/S-G operating under the full Gaussian assumption, even if the parameters of the underlying Gaussian noise (i.e., μ G and σ G ) are correctly obtained to fit the real data, shows the worst performance.This is because this filter/smoother does not take into account the NLOSinduced outliers in the measurement noise.For the sake of completeness, we assess the impact of the Monte Carlo sample size in the MSPGF/S performance.The mean RMSE of position and velocity, for the different methods and several representative values of L, are given in Table 1.The performance of the proposed approach is not degraded when using a sample size as low as L = 50 samples, therefore being possible to keep a low overall computational complexity.
Notice that the parameter ν of the skew t distribution controls the tails of the distribution.Lower ν implies  heavier tails, thus more outliers and impulsive behaviors.
To fully characterize the new method, the performance obtained in the UWB TOA-based localization scenario but now with ν = 2, to induce more outliers in the measurements, is shown in Figs. 3 and 4. The proposed method correctly deals with the non-Gaussian noise and approaches the optimal solution.In this case, the performance given by the filter/smoother under the Gaussian assumption (SQKF/S-G) is really poor.This is because the underlying noise distribution is more heavy-tailed, then the Gaussian approximation is no longer valid.

Conclusions
This article presented a new Bayesian filtering and smoothing framework to deal with nonlinear systems corrupted by parametric heavy-tailed skew t-distributed measurement noise.The new method takes advantage of the conditionally Gaussian form of the skew tdistribution, which allows to use a computationally light

Table 1
Mean RMSE of position and velocity versus Monte Carlo sample size L in the MSPGS