Optimized Self-Localization for SLAM in Dynamic Scenes Using Probability Hypothesis Density Filters

In many applications, sensors that map the positions of objects in unknown environments are installed on dynamic platforms. As measurements are relative to the observer's sensors, scene mapping requires accurate knowledge of the observer state. However, in practice, observer reports are subject to positioning errors. Simultaneous localization and mapping addresses the joint estimation problem of observer localization and scene mapping. State-of-the-art approaches typically use visual or optical sensors and therefore rely on static beacons in the environment to anchor the observer estimate. However, many applications involving sensors that are not conventionally used for Simultaneous Localization and Mapping (SLAM) are affected by highly dynamic scenes, such that the static world assumption is invalid. This paper proposes a novel approach for dynamic scenes, called GEneralized Motion (GEM) SLAM. Based on probability hypothesis density filters, the proposed approach probabilistically anchors the observer state by fusing observer information inferred from the scene with reports of the observer motion. This paper derives the general, theoretical framework for GEM-SLAM, and shows that it generalizes existing Probability Hypothesis Density (PHD)-based SLAM algorithms. Simulations for a model-specific realization using range-bearing sensors and multiple moving objects highlight that GEM-SLAM achieves significant improvements over three benchmark algorithms.


I. INTRODUCTION
Scene mapping equips dynamic observers with the ability to explore and create three-dimensional representations of unknown environments, impacting autonomous as well as guided systems, including robots, unmanned aerial vehicles, ad-hoc mobile networks, and planetary rovers. In practice, the detection of objects from the observer's sensor measurements is subject to uncertainty, whilst multipath effects in, e.g., communications, SONAR, and acoustic sensors lead to false alarms. Moreover, dynamic objects result in spatiotemporal variations. Tracking algorithms [1]–[3] can be used to exploit temporal models of the object dynamics and to estimate their smoothed trajectories from the distorted detections. Since object detections are relative to the observer's positional state, accurate knowledge of the observer trajectory in the environment is required in order to track objects across time and space. Reports of the observer position and motion can be obtained from, e.g., Global Positioning System (GPS) or gyroscopic sensors. However, in practice, the observer reports are subject to errors due to physical and mechanical limitations. For example, the accuracy of inertial sensor reports of robots decreases with hardware age due to the wear of the mechanical parts [4]. Dead reckoning [5], which propagates the initial position using the reports, therefore leads to divergence between the estimated and ground truth observer trajectories.
Observer localization can be improved by identifying the observer state that optimally aligns the sensor measurements with the estimated objects' positions. Conditional on the estimated observer position, the objects are mapped using tracking algorithms. Observer localization and object mapping are therefore jointly dependent and present the estimation problem of SLAM.
SLAM developments to date led from the basic formulation of the simultaneous estimation problem [6], [7], through convergence and consistency studies [8], to the current focus on robust perception [9]. An overview of the open problems and future directions of SLAM research is provided in [10].
The aim and objective of this paper focus on two of the challenges highlighted in [10]. The aim of this paper is to provide a theoretical framework for SLAM that is suitable for sensors that facilitate perception beyond vision but are not yet conventionally used for SLAM, such as acoustic microphone arrays [11], [12]. Since many non-conventional sensors are deployed in environments where the positions of the observer and the objects are highly time-varying, e.g., underwater, the objective of this paper is the development of an approach that is specifically designed for dynamic environments.
SLAM was originally developed for the use of LIDAR range finders. Research rapidly focused on visual sensors as monocular, stereoscopic, and RGB-D cameras became available as commodity hardware [13]–[15]. Visual SLAM can be broadly classified into approaches that use either local or global descriptors. Local descriptors [16], [17] involve detection algorithms that abstract objects into salient features. Global descriptors [18] process the image without a detection phase, and are hence particularly suitable for place recognition.
However, global descriptors are often difficult to acquire for non-visual sensors. For example, acoustic signals correspond to the superposition of multiple interfering source signals, each of which is the result of a convolution with the acoustic transmission channel. Processing of audio content without a detection phase is therefore difficult. As such, feature-based SLAM is often a more appropriate choice than global descriptors for modalities other than visual sensors.
The majority of feature-based SLAM approaches are graph-based [19]–[21]. Graph-based SLAM typically relies on non-convex or non-linear optimization techniques. Hence, convergence to local minima may result in estimates unsuitable for navigation [10]. Moreover, pose graph optimization is susceptible to false feature detections. For improved robustness, loop closure [22] is typically used to recalibrate the observer trajectory when place recognition detects a previously visited scene. Fundamentally, loop closure relies on the assumption that each scene must contain immovable objects, of which at least some are visible each time the scene is revisited. However, this static-world assumption is invalid in dynamic environments. For example, underwater scenarios are subject to severe changes in both the positions of objects, such as vessels, as well as the seabed, i.e., the transmission channel for passive or active SONAR. In fully dynamic environments, loop closure is therefore not applicable, causing significant limitations for large-scale autonomy. Even on short scales, SLAM in dynamic scenes to date remains challenging [10].
For dynamic scenes, probabilistic filtering is a natural alternative to graph-based approaches. Perhaps the most widely used probabilistic feature-based approach is FActored Solution To Simultaneous Localization and Mapping (FastSLAM) [23]–[25]. A particle filter [26] samples multiple candidates of the observer state from the prior dynamical model of the observer motion. For each observer candidate, or particle, an ensemble of Extended Kalman Filters (EKFs) is used to estimate the feature positions, one at a time. Data association is used to establish a one-to-one mapping between the detections and features. However, in realistic conditions affected by false alarms and missing detections, robust data association is one of the fundamental challenges of SLAM algorithms [27].
For robust estimation against false alarms and missing detections, the Rao-Blackwellized PHD (RB-PHD) filter [28], [29] formulates the SLAM problem using Random Finite Sets (RFSs) to avoid the need for heuristic data association altogether. For each observer particle, one realization of a PHD filter [30], [31] is used to track the set of features. The most likely observer state is identified based on the Probability Density Function (pdf) of the set of detections conditional on the feature estimates. However, the expression of this pdf as proposed in [28], [29] is of combinatorial nature and hence computationally intractable. Therefore, the RB-PHD filter approximates the multi-detection pdf by considering the contribution of a single feature only. Alternatively, [32] approximates the intractable likelihood by a randomized computation of the equivalent matrix permanent. The work in [33] avoids approximating the multi-detection pdf by extending [28] to a Labeled Multi-Bernoulli (LMB) filter [34]. However, as reported in [34], the feature update for each observer particle and time step can require several seconds of computational time. Hence, the LMB filter remains computationally prohibitive for many large-scale SLAM problems.
Rather than approximating the multi-detection pdf, the Single Cluster PHD (SC-PHD) filter [35]–[38] uses the closed-form solution. The SLAM problem is considered as the estimation of a cluster process of features, where the cluster centre corresponds to the observer state. Thus, the observer state is estimated as the centroid that aligns all feature detections with the feature estimates in the map. However, as the SC-PHD filter relies only on feature information for observer localization, applications where features are highly dynamic and subject to uncertainty can lead to short-term ambiguities and hence long-term divergence of the observer estimates.
For robust SLAM performance in dynamic and uncertain environments, this paper proposes a novel approach, called GEM-SLAM, that fuses the knowledge inferred from feature mapping based on the SC-PHD filter with reports of the observer motion, thereby generalizing existing PHD-based SLAM approaches. Such observer reports are typically obtained from inertial sensors, but may also include path planning instructions, motor outputs, or GPS coordinates. By considering the reports as measurements of the observer position, GEM-SLAM exploits uncertainties in both the observer dynamics and report accuracy to propagate the posterior observer pdf in time. Observer localization is optimized by intersecting the space of likely observer positions corresponding to the posterior pdf with the space of observer positions corresponding to the closed-form multi-detection likelihood of the features.
The novel contributions of this paper are therefore as follows. 1) We derive the theoretical framework and show that GEM-SLAM fuses observer information from features with the observer reports. Furthermore, illustrative examples demonstrate that GEM-SLAM reduces ambiguities in observer localization arising from dynamic scenes, leading to improved accuracy in the observer and feature estimates. 2) The proposed framework is derived as a general, model-independent formulation. GEM-SLAM is therefore sensor-agnostic and can be realized for different sensor-specific models, including sensors that are not conventionally used for SLAM. For illustrative purposes and performance evaluation, this paper presents, in addition to the general formulation, a model-specific realization for range-bearing sensors in dynamic scenes. 3) A detailed discussion of the relationship between GEM-SLAM, the RB-PHD and SC-PHD filters, and FastSLAM shows that GEM-SLAM generalizes existing PHD approaches, expressing the RB-PHD and SC-PHD filters as special cases. Using the model-specific realization, the estimation accuracy of GEM-SLAM is compared against the three benchmark approaches using simulated data. Simulations are used due to the lack of available open-source datasets appropriate for SLAM in dynamic scenes. The results demonstrate that GEM-SLAM achieves significant improvements in observer localization and feature mapping accuracy for increasing report errors, False Alarm Rates (FARs), and numbers of dynamic features.
This paper is structured as follows: Section II provides the problem formulation. The necessary background theory on RFSs is summarized in Section III. The proposed approach is derived and its relationships analyzed in Section IV. A model-specific realization is provided in Section V. The experimental setup is described in Section VI, results are discussed in Section VII, and conclusions are drawn in Section VIII.

II. PROBLEM FORMULATION
A general formulation describes the positional observer state, r_t, at time step t, as a function of the control input, u_t, and the process noise, v_t, i.e.,

$$r_t = f(r_{t-1}, u_t, v_t), \qquad (1)$$

where f(·) captures the observer dynamics. Similarly, the state of feature n evolves as

$$s_{t,n} = d(s_{t-1,n}, n_{t,n}), \qquad (2)$$

where d(·) captures the dynamics of the feature and the process noise, n_{t,n}, allows for variations of the feature state from the dynamical model. Objects may appear in and leave the scene over time, therefore leading to feature birth and death. The number of features hence is time-varying and unknown a priori, such that the N_t feature states can be described as a RFS [39] with realizations, S_t, modelled as

$$S_t = B_t \cup \bigcup_{n=1}^{N_{t-1}} P(s_{t-1,n}), \qquad (3)$$

where s_{t,n} is the state of feature n relative to the observer state, r_t, the set cardinality is |S_t| = N_t, the newborn feature process is B_t, and P(s_{t-1,n}) is the survival process:

$$P(s_{t-1,n}) = \begin{cases} \{ d(s_{t-1,n}, n_{t,n}) \}, & \text{with probability } p_s(s_{t-1,n}), \\ \emptyset, & \text{otherwise}, \end{cases} \qquad (4)$$

where ∅ denotes the empty set. Given the sensor measurements, M_t detections, ω_{t,m}, m = 1, ..., M_t, of the N_t features are obtained relative to r_t, i.e.,

$$\omega_{t,m} = \bar{g}(s_{t,n}, e_{t,m}), \qquad (6)$$

where the mapping from the feature to the detection space is given by \bar{g}(·), and e_{t,m} describes the detection error.
In practice, detection algorithms often result in missing detections and false alarms. The feature detections thus are modelled by a RFS with realizations, Ω_t, such that

$$\Omega_t = K_t \cup \bigcup_{n=1}^{N_t} D(s_{t,n}), \qquad (7)$$

where D(s_{t,n}) models the detection process as

$$D(s_{t,n}) = \begin{cases} \{ \bar{g}(s_{t,n}, e_{t,m}) \}, & \text{with probability } p_d(s_{t,n}), \\ \emptyset, & \text{otherwise}, \end{cases} \qquad (8)$$

for m ∈ 1, ..., M_t. The term K_t in (7) is the RFS of false alarms, commonly modelled as a Poisson point process [30], [40], [41] of N_{t,c} Independent and Identically Distributed (IID) false alarms distributed uniformly over the sensor Field of View (FoV) [29], [36]. The likelihood, κ(ω_{t,m} | r_t), of a single detection corresponding to a false alarm is hence

$$\kappa(\omega_{t,m} | r_t) = \lambda_c\, \mathcal{U}(\omega_{t,m}), \qquad (9)$$

where U(·) denotes a uniform pdf, and λ_c is the expected number of false alarms in the surveillance region.
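To make the detection model concrete, the sketch below draws one realization of the detection RFS in (7) for a one-dimensional field of view: each feature is detected with probability p_d and perturbed by noise, and a Poisson-distributed number of false alarms is scattered uniformly. The feature positions, noise scale, and FoV are illustrative assumptions, not the paper's models.

```python
import math
import random

def simulate_detections(features, p_d, lam_c, fov, seed=0):
    """One realization of the detection RFS: each feature in `features`
    (1-D positions) is detected with probability p_d and perturbed by
    Gaussian noise; a Poisson-distributed number of false alarms is
    scattered uniformly over the field of view `fov` = (lo, hi)."""
    rng = random.Random(seed)
    detections = []
    for s in features:
        if rng.random() < p_d:                 # detection process D(s)
            detections.append(s + rng.gauss(0.0, 0.1))
    # Sample the number of false alarms from a Poisson(lam_c) distribution
    # by inverse-transform sampling.
    u, n_c, p = rng.random(), 0, math.exp(-lam_c)
    cum = p
    while u > cum:
        n_c += 1
        p *= lam_c / n_c
        cum += p
    for _ in range(n_c):                       # false alarm RFS K_t
        detections.append(rng.uniform(fov[0], fov[1]))
    return detections

# With p_d = 1 and no clutter, every feature produces exactly one detection:
dets = simulate_detections([1.0, 2.0], p_d=1.0, lam_c=0.0, fov=(0.0, 6.0))
```

The missing-detection and clutter hypotheses that complicate data association are both visible in such a realization: the cardinality of the returned set generally differs from the number of features.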

III. FISST PREREQUISITES
This section provides a summary of Finite Set STatistics (FISST) required for the derivation of GEM-SLAM.

A. Multi-feature Probability Density Function
Assuming the observer state, r_t, is known, the posterior feature pdf, p(S_t | r_t, Ω_{1:t}), is given by Bayes's theorem as

$$p(S_t | r_t, \Omega_{1:t}) = \frac{p(\Omega_t | r_t, S_t)\, p(S_t | r_t, \Omega_{1:t-1})}{\int p(\Omega_t | r_t, S_t)\, p(S_t | r_t, \Omega_{1:t-1})\, \delta S_t}, \qquad (10)$$

where the predicted multi-feature pdf is given by the Chapman-Kolmogorov equation

$$p(S_t | r_t, \Omega_{1:t-1}) = \int p(S_t | S_{t-1}, r_t)\, p(S_{t-1} | r_{t-1}, \Omega_{1:t-1})\, \delta S_{t-1}. \qquad (11)$$

The set integral, ∫ · δW, for any RFS with realizations, W, is defined as [30]

$$\int p(W)\, \delta W = \sum_{n=0}^{\infty} \frac{1}{n!} \int p(\{w_1, \ldots, w_n\})\, \mathrm{d}w_1 \cdots \mathrm{d}w_n. \qquad (12)$$

The summation in (12) enumerates the hypotheses that any number of objects can be contained in W. The set integral is hence equivalent to marginalizing over all subsets {w_1, ..., w_n} ⊂ W, ∀n = 0, ..., ∞. However, as a consequence, (10) is combinatorially intractable.

B. Probability Hypothesis Density
The first-order multi-object moment, referred to as the PHD, λ(w), can be used to approximate (10). The PHD expresses the probability that one of the multiple objects in W has the state w. If p(W) is a Poisson point process [40], i.e., the number of objects, L, in W is Poisson distributed and the feature states are IID, the pdf, p(W), can be expressed in terms of its PHD, λ(w), and vice versa as [30], [39]

$$p(W) = e^{-\int \lambda(w')\, \mathrm{d}w'} \prod_{w \in W} \lambda(w), \qquad (13)$$

$$\lambda(w) = \int \delta_W(w)\, p(W)\, \delta W, \qquad (14)$$

where δ_W(w) = Σ_{w' ∈ W} δ_{w'}(w) is the sum of Dirac delta functions concentrated at w' ∈ W. By definition, the PHD is not a pdf but rather an intensity, since [30]

$$\mathbb{E}[L] = \int_S \lambda(w)\, \mathrm{d}w, \qquad (15)$$

where L is the number of objects in the region, S, of the state space, and E[·] is the expectation operator. The expression in (15) is a crucial property, as estimation of λ(w) simultaneously results in an estimate of L.
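The intensity property in (15) can be illustrated with a one-dimensional Gaussian-mixture PHD: integrating the intensity over a region returns the expected number of objects in that region. The mixture parameters below are arbitrary illustrative values.

```python
import math

def gm_phd_cardinality(weights, means, stds, region):
    """Integral of a 1-D Gaussian-mixture PHD over `region` = (a, b),
    i.e., the expected number of objects in that region."""
    a, b = region
    def ncdf(x, m, s):  # Gaussian cumulative distribution function
        return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
    return sum(w * (ncdf(b, m, s) - ncdf(a, m, s))
               for w, m, s in zip(weights, means, stds))

# Two unit-weight components inside the region, one far outside:
n_hat = gm_phd_cardinality(weights=[1.0, 1.0, 0.3],
                           means=[2.0, 5.0, 50.0],
                           stds=[0.1, 0.1, 0.1],
                           region=(0.0, 10.0))
# n_hat is approximately 2: only the first two components lie in the region
```

Unlike a pdf, the intensity does not integrate to one; its integral is the expected object count, which is why estimating λ(w) simultaneously yields an estimate of L.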

C. Multi-Feature PHD for Known Observer Positions
The PHD filter was originally proposed in [30] for a static observer with known position. The results can be straightforwardly extended to a dynamic observer with known states as summarized below.
The posterior PHD, λ(s_t | r_t, Ω_{1:t}), for any s_t ∈ S_t, conditional on the a priori known state of a moving observer, models the hypotheses that 1) any detection may be due to a newborn source, 2) any detection may originate from any of the existing features, or 3) any existing feature may not be detected, i.e., [30]

$$\lambda(s_t | r_t, \Omega_{1:t}) = \lambda_b(s_t | r_t, \Omega_t) + \lambda_m(s_t | r_t, \Omega_{1:t}) + \lambda_d(s_t | r_t, \Omega_{1:t}), \qquad (16)$$

where the PHDs of newborn, missed, and detected features are, respectively,

$$\lambda_b(s_t | r_t, \Omega_t) = b(s_t | r_t, \Omega_t),$$
$$\lambda_m(s_t | r_t, \Omega_{1:t}) = (1 - p_d(s_t))\, \lambda(s_t | r_t, \Omega_{1:t-1}),$$
$$\lambda_d(s_t | r_t, \Omega_{1:t}) = \sum_{m=1}^{M_t} \frac{p_d(s_t)\, p(\omega_{t,m} | s_t, r_t)\, \lambda(s_t | r_t, \Omega_{1:t-1})}{\ell(\omega_{t,m} | r_t)}, \qquad (17)$$

where p_d(s_t) is the detection probability, b(·) is the newborn PHD, and the single-detection evidence is

$$\ell(\omega_{t,m} | r_t) = \kappa(\omega_{t,m} | r_t) + \int p_d(s_t)\, p(\omega_{t,m} | s_t, r_t)\, \lambda(s_t | r_t, \Omega_{1:t-1})\, \mathrm{d}s_t. \qquad (18)$$

The predicted PHD, λ(s_t | r_t, Ω_{1:t-1}), is given by

$$\lambda(s_t | r_t, \Omega_{1:t-1}) = \int p_s(s_{t-1})\, p(s_t | s_{t-1}, r_t)\, \lambda(s_{t-1} | r_{t-1}, \Omega_{1:t-1})\, \mathrm{d}s_{t-1}, \qquad (20)$$

where p_s(s_{t-1}) is the survival probability, p(s_t | s_{t-1}, r_t) is the single-feature transition density, and the PHD, λ(s_{t-1} | r_{t-1}, Ω_{1:t-1}), is the posterior feature PHD at t − 1.

IV. PROPOSED GEM-SLAM ALGORITHM

SLAM sequentially estimates the joint marginal posterior pdf, p(r_t, S_t | Ω_{1:t}, y_{1:t}). In this section, the proposed GEM-SLAM algorithm is derived.

A. Posterior pdf
Defining X_t ≜ (r_t, S_t) and Z_t ≜ (y_t, Ω_t) as realizations of marked point processes [40] corresponding to the joint states and measurements respectively, the posterior SLAM pdf, p(X_t | Z_{1:t}), can be written using Bayes's theorem as

$$p(X_t | Z_{1:t}) = \frac{p(Z_t | X_t)\, p(X_t | Z_{1:t-1})}{p(Z_t | Z_{1:t-1})}, \qquad (21)$$

where p(Z_t | X_t) is the likelihood function of the observations, and the predicted density, p(X_t | Z_{1:t-1}), is obtained by marginalizing X_{t-1} from the transition density, p(X_t | X_{t-1}), and the posterior at t − 1, i.e.,

$$p(X_t | Z_{1:t-1}) = \int p(X_t | X_{t-1})\, p(X_{t-1} | Z_{1:t-1})\, \delta X_{t-1}. \qquad (22)$$

Similar to (10), the posterior SLAM pdf is combinatorially intractable, but can be approximated by its PHD as detailed in the following subsections. The derivations can be found in Appendix A.
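The predict-update recursion above can be sketched on a discrete state grid, where the set integrals reduce to finite sums. The three-state transition matrix and likelihood below are toy values, not the paper's models.

```python
def bayes_recursion(prior, transition, likelihood):
    """One predict-update cycle of the Bayes recursion on a discrete
    state grid: predict with the transition model, then update with the
    measurement likelihood and renormalize."""
    n = len(prior)
    predicted = [sum(transition[i][j] * prior[j] for j in range(n))
                 for i in range(n)]                        # prediction step
    unnorm = [likelihood[i] * predicted[i] for i in range(n)]  # update step
    z = sum(unnorm)                                        # evidence
    return [u / z for u in unnorm]

post = bayes_recursion(prior=[0.5, 0.3, 0.2],
                       transition=[[0.8, 0.1, 0.0],
                                   [0.2, 0.8, 0.2],
                                   [0.0, 0.1, 0.8]],
                       likelihood=[0.1, 0.7, 0.2])
# The measurement favours the middle state, which dominates the posterior
```

The intractability discussed in the text arises because, for RFS-valued states, the sums above become set integrals over all possible cardinalities and configurations.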

B. Predicted GEM-SLAM PHD
Using the probability chain rule, the predicted pdf in (22) can be written as

$$p(X_t | Z_{1:t-1}) = p(S_t | r_t, \Omega_{1:t-1}) \int p(r_t | r_{t-1})\, p(r_{t-1} | y_{1:t-1})\, \mathrm{d}r_{t-1}, \qquad (23)$$

where p(S_t | r_t, Ω_{1:t-1}) is defined in (11), and p(r_t | r_{t-1}) and p(r_{t-1} | y_{1:t-1}) are the observer prior pdf and the posterior pdf at t − 1, respectively. As derived in Appendix A-A, the joint predicted PHD of (23) is given by

$$\lambda(x_t | Z_{1:t-1}) = \lambda(s_t | r_t, \Omega_{1:t-1}) \int p(r_t | r_{t-1})\, p(r_{t-1} | y_{1:t-1})\, \mathrm{d}r_{t-1} \qquad (24)$$

for any x_t ∈ X_t, where λ(s_t | r_t, Ω_{1:t-1}) is given in (20). The joint predicted PHD therefore extrapolates p(r_{t-1} | y_{1:t-1}) using the observer dynamics specified by the prior, p(r_t | r_{t-1}), and marginalizes the previous observer state, r_{t-1}, from the result as well as the predicted feature PHD.

C. Updated GEM-SLAM PHD
Similar to (23), the probability chain rule can be applied to (21), leading to

$$p(X_t | Z_{1:t}) = \frac{p(\Omega_t | r_t)\, p(r_t | y_{1:t})}{p(\Omega_t | y_{1:t})}\, p(S_t | r_t, \Omega_{1:t}). \qquad (25)$$

Note that Bayes's theorem for the posterior observer pdf, p(r_t | y_{1:t}), and similarly p(S_t | r_t, Ω_{1:t}), can be arranged to

$$p(r_t | y_{1:t}) = \frac{p(y_t | r_t)\, p(r_t | y_{1:t-1})}{p(y_t | y_{1:t-1})}, \qquad (26)$$

where p(y_t | r_t) and p(r_t | y_{1:t-1}) are the likelihood and predicted observer pdf respectively. Inserting (26) into (25):

$$p(X_t | Z_{1:t}) \propto p(\Omega_t | r_t)\, p(y_t | r_t)\, p(r_t | y_{1:t-1})\, p(S_t | r_t, \Omega_{1:t}), \qquad (27)$$

where the multi-detection evidence, p(Ω_t | r_t), is given by the Chapman-Kolmogorov equation [42] as

$$p(\Omega_t | r_t) = \int p(\Omega_t | r_t, S_t)\, p(S_t | r_t, \Omega_{1:t-1})\, \delta S_t. \qquad (28)$$

For tractability of (27), p(S_t | r_t, Ω_{1:t}) is approximated by its PHD, leading to the updated GEM-SLAM PHD

$$\lambda(x_t | Z_{1:t}) \propto L(\Omega_t | r_t)\, p(y_t | r_t)\, p(r_t | y_{1:t-1})\, \lambda(s_t | r_t, \Omega_{1:t}), \qquad (29)$$

where λ(s_t | r_t, Ω_{1:t}) is provided in (16) and (17). The multi-detection evidence, L(Ω_t | r_t), is given by

$$L(\Omega_t | r_t) = e^{-N_{t,c} - N_{t|t-1}} \prod_{m=1}^{M_t} \ell(\omega_{t,m} | r_t), \qquad (30)$$

where ℓ(ω_{t,m} | r_t) is defined in (18), and N_{t,c} and N_{t|t-1} respectively are the number of false alarms and predicted features obtained by applying (15). The result in (29) is crucial as it allows for full exploitation of the joint dependency between the features and the observer. The features are dependent on the observer through λ(s_t | r_t, Ω_{1:t}). Simultaneously, the observer pdf is dependent on the features through the multi-detection evidence, L(Ω_t | r_t), facilitating probabilistic anchoring of the observer using the features. This point is further elaborated on in Section V-C and Section V-E for the model-specific GEM-SLAM realization. The necessary equations for the general formulation of GEM-SLAM are summarized in Algorithm 1.
To provide insight into the physical meaning of the multi-detection evidence in (30), recall from (18) that the single-detection evidence, ℓ(ω_{t,m} | r_t), models the probability of the detection being either due to a false alarm with probability κ(ω_{t,m} | r_t), or due to a feature with probability p(ω_{t,m} | r_t). For reasons of illustration, consider expanding the closed-form solution in (30) using (18), leading to the combinatorial expression:

$$L(\Omega_t | r_t) = e^{-N_{t,c} - N_{t|t-1}} \left[ \prod_{\omega \in \Omega_t} \kappa(\omega | r_t) + \sum_{\substack{W \subseteq \Omega_t \\ W \neq \emptyset}} \prod_{\omega \in W} p(\omega | r_t) \prod_{\omega' \in \Omega_t - W} \kappa(\omega' | r_t) \right], \qquad (31)$$

where W ⊆ Ω_t denotes the subsets of the set Ω_t not including the empty set, and Ω_t − W denotes the set difference. The expression in (31) highlights that GEM-SLAM enumerates explicitly all hypotheses of the origin of detections, i.e., 1) all M_t detections are false alarms; 2) all detections are due to features at W = Ω_t; and 3) each subset W ⊂ Ω_t is due to features, whilst the remaining detections, Ω_t − W, are false alarms. As the expression in (31) involves combinatorial terms, it is computationally intractable. This problem is avoided by the closed-form expressions in (30) and (18). In fact, the formulation of the multi-detection likelihood is one of the fundamental differences between the RB-PHD and SC-PHD filters and GEM-SLAM, as discussed in the following subsection.
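The equivalence between the closed-form product in (30) and its combinatorial expansion in (31) can be checked numerically for a small set of detections; the single-detection likelihood values below are arbitrary illustrative numbers.

```python
from itertools import combinations
from math import isclose, prod

# Hypothetical single-detection terms for M_t = 3 detections:
kappa = [0.2, 0.5, 0.1]   # false-alarm likelihoods
pfeat = [1.3, 0.7, 2.0]   # feature likelihoods

# Closed form: product of single-detection evidences
closed = prod(k + p for k, p in zip(kappa, pfeat))

# Combinatorial form: sum over all subsets W of feature-originated
# detections; the remaining detections are false alarms
idx = range(len(kappa))
combi = sum(prod(pfeat[i] for i in W) *
            prod(kappa[i] for i in idx if i not in W)
            for n in range(len(kappa) + 1)
            for W in combinations(idx, n))
# closed and combi agree, but the subset sum has 2**M_t terms
```

The product form requires only M_t multiplications, whereas the subset sum grows exponentially in the number of detections, which is exactly the intractability the closed-form expression avoids.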

D. Relationship to Existing Approaches

1) SC-PHD SLAM:
The SC-PHD filter [36], [43] was the first approach based on the closed-form solution of the multi-detection evidence in (30). The SC-PHD filter formulates the SLAM PHD as:

$$\lambda(x_t | Z_{1:t}) = p(r_t | \Omega_{1:t})\, \lambda(s_t | r_t, \Omega_{1:t}), \qquad (32)$$

where the observer posterior pdf in (32) is given by

$$p(r_t | \Omega_{1:t}) = \frac{L(\Omega_t | r_t)\, p(r_t | \Omega_{1:t-1})}{\int L(\Omega_t | r_t)\, p(r_t | \Omega_{1:t-1})\, \mathrm{d}r_t}. \qquad (33)$$

Therefore, the SC-PHD filter propagates the observer estimates using the predicted pdf, p(r_t | Ω_{1:t-1}), only. Recalling the predicted SLAM PHD in (24) and the dynamical model in (1), the SC-PHD filter therefore exploits the uncertainty in the dynamical model of the observer, but cannot incorporate the uncertainty in the reports.
In contrast, GEM-SLAM incorporates the uncertainty in the observer dynamics as well as the reports via the posterior pdf of the observer states. In (29), the observer information is optimally fused with information inferred from the features by intersecting the space of likely observer states of the posterior pdf with that of the multi-detection evidence. The optimal exploitation of additional information about the observer motion is crucial for dynamic scenes in order to disambiguate information inferred from the time-varying feature states.
2) RB-PHD SLAM: Similar to the SC-PHD filter, the RB-PHD filter [28], [29] relies on the predicted pdf of the observer state only. Moreover, the RB-PHD filter is based on the computationally intractable multi-detection evidence in (31). To address the combinatorial nature of the expression, the RB-PHD filter relies on an approximation that considers the contribution of a single feature only (see Appendix B):

$$L(\Omega_t | r_t) \approx e^{-N_{t,c} - N_{t|t-1}} \left[ \prod_{m=1}^{M_t} \kappa(\omega_{t,m} | r_t) + \sum_{m=1}^{M_t} p(\omega_{t,m} | \hat{s}_t, r_t) \prod_{m' \neq m} \kappa(\omega_{t,m'} | r_t) \right], \qquad (34)$$

where ŝ_t is the single, selected feature. The multi-detection evidence of the RB-PHD filter in (34) therefore accounts for two hypotheses: either all detections are false alarms, or all detections are false alarms with the exception of a single detection that is due to a feature. In contrast, GEM-SLAM in (30) accounts for the hypotheses that any subset of detections may be due to features. GEM-SLAM therefore reduces to the RB-PHD filter if N_t = 1 and p(y_t | r_t) = const.
3) FastSLAM: FastSLAM [23], [24], as an approach based on Random Variables (RVs) rather than RFSs, assumes that all features are static and exist permanently in the scene. Furthermore, it is assumed that at most one feature detection, ω_t, is observed at each time step. The joint posterior pdf of the observer state and features is therefore factorized as [23]:

$$p(r_t, S_t | \omega_{1:t}, y_{1:t}) = p(r_t | \omega_t, y_t) \prod_{n=1}^{N_t} p(s_{t,n} | r_t, k_t, \omega_{1:t}),$$

where k_t is the detection-to-feature assignment, obtained by heuristic [44] or probabilistic [25], [45] data association, and p(r_t | ω_t, y_t) is the observer posterior pdf, evaluated using the current, single detection ω_t. Using Bayes's theorem, the single-feature update is hence expressed by

$$p(s_{t,n} | r_t, k_t, \omega_{1:t}) \propto p(\omega_t | s_{t,n}, r_t, k_t)\, p(s_{t,n} | r_t, \omega_{1:t-1}). \qquad (35)$$

The feature PHD in (16) reduces to (35) for a detection probability of p_d = 1, survival probability p_s = 1, false alarm intensity κ(ω_t | r_t) = 0, and detection cardinality of M_t = 1. Furthermore, assume that the multi-detection RFS, Ω_t, can be considered as a sequence of M_t single detections observed between ]t − 1, t]. The multi-detection evidence in (30) thus reduces to

$$L(\omega_t | r_t) = e^{-N_{t|t-1}}\, \ell(\omega_t | r_t).$$

Therefore, under the above assumptions, GEM-SLAM reduces to FastSLAM up to a scaling factor of e^{−N_{t|t−1}}.

V. MODEL-SPECIFIC REALIZATION
This section provides a model-specific realization of GEM-SLAM for range-bearing sensors and dynamic features.

A. Model
To provide a model-specific realization of GEM-SLAM, consider as an illustrative example a range-bearing sensor installed in the body of a humanoid robot. The observer state is defined as r_t ≜ [r̄_t^T, v_t, γ_t]^T, where r̄_t = [x_t, y_t, z_t]^T is the 3D position, v_t denotes the velocity, and γ_t is the orientation. Assuming that the observer moves in the direction of its orientation, the position, r̄_t, becomes a non-linear function of the orientation. The observer state can hence be separated into a linear subspace, p_t ≜ [r̄_t^T, v_t]^T, and its non-linear dependency, γ_t, such that r_t ≜ [p_t^T, γ_t]^T. Hence:

$$p_t = F(\gamma_t)\, p_{t-1} + v_{t,v}, \quad v_{t,v} \sim \mathcal{N}(0_{P \times 1}, \Sigma_{t,v_v}), \qquad (36a)$$
$$\gamma_t = \vartheta(\gamma_{t-1} + v_{t,\gamma}), \quad v_{t,\gamma} \sim \mathcal{N}(0, \sigma^2_{t,\gamma}), \qquad (36b)$$

where 0_{P×1} is the P × 1 zero vector, N(·) denotes the Gaussian pdf, v_{t,v} is the process noise with covariance Σ_{t,v_v}, the orientation in (36b) obeys a random walk with process noise, v_{t,γ}, and process noise variance, σ²_{t,γ}, and ϑ(α) = mod(α, 2π). The matrix F(γ_t) in (36a) propagates the position along the heading, i.e.,

$$F(\gamma_t) = \begin{bmatrix} I_3 & \Delta_t\, [\sin \gamma_t, \cos \gamma_t, 0]^T \\ 0_{1 \times 3} & 1 \end{bmatrix}. \qquad (37)$$

Furthermore, the observer reports, y_t ≜ [y_{t,v}, y_{t,γ}]^T, of the speed and orientation are modelled as

$$y_{t,v} = v_t + w_{t,v}, \quad w_{t,v} \sim \mathcal{N}(0, \sigma^2_{t,w_v}), \qquad (38a)$$
$$y_{t,\gamma} = \vartheta(\gamma_t + w_{t,\gamma}), \quad w_{t,\gamma} \sim \mathcal{N}(0, \sigma^2_{t,w_\gamma}). \qquad (38b)$$

The feature states evolve according to the linear Gaussian model

$$\bar{s}_{t,n} = D\, \bar{s}_{t-1,n} + n_{t,n}, \quad n_{t,n} \sim \mathcal{N}(0, Q_t), \qquad (39)$$

where D is a constant-velocity transition matrix and Q_t is the process noise covariance, and the range-bearing detections are given by

$$\omega_{t,m} = g(s_{t,n}) + e_{t,m}, \qquad (40)$$

where g(·) is the Cartesian-to-spherical transformation, e_{t,m} is the detection error with covariance, R_{t,m}, and s_{t,n} is the sensor-relative feature state:

$$s_{t,n} = \bar{s}_{t,n} - [\bar{r}_t^T, 0_{1 \times 3}]^T. \qquad (41)$$
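Under this dynamical model, the observer moves along its heading while the orientation follows a wrapped random walk. A minimal simulation sketch, using the noise values from the illustrative example of Section V-E as assumed parameters:

```python
import math
import random

def simulate_observer(r0, v0, gamma0, steps, dt, sigma_g, sigma_v, seed=1):
    """Propagate the observer state: wrapped random walk on the
    orientation, and motion along the current heading in the x-y plane.
    Returns a list of (x, y, z, v, gamma) tuples."""
    rng = random.Random(seed)
    x, y, z = r0
    v, g = v0, gamma0
    traj = [(x, y, z, v, g)]
    for _ in range(steps):
        g = (g + rng.gauss(0.0, sigma_g)) % (2.0 * math.pi)  # theta(a) = mod(a, 2*pi)
        v += rng.gauss(0.0, sigma_v)
        x += dt * v * math.sin(g)   # move in the direction of the orientation
        y += dt * v * math.cos(g)
        traj.append((x, y, z, v, g))
    return traj

traj = simulate_observer((5.0, 2.0, 1.8), v0=0.5, gamma0=0.0, steps=100,
                         dt=0.1, sigma_g=0.08, sigma_v=0.1)
```

The z-coordinate stays constant here, reflecting a sensor moving in a horizontal plane; the non-linear coupling between position and orientation is what motivates the Rao-Blackwellized factorization of Section V-C.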

B. GM-PHD Filter for Feature Mapping
For the Gaussian state-space in (39) and (40), the feature PHD in (16) reduces to a Gaussian Mixture Model (GMM) [31]:

$$\lambda(s_{t-1} | r_{t-1}, \Omega_{1:t-1}) = \sum_{j=1}^{J_{t-1}} w^{(j)}_{t-1}\, \mathcal{N}\!\left(s_{t-1};\, m^{(j)}_{t-1}, \Sigma^{(j)}_{t-1}\right), \qquad (42)$$

where J_{t-1} is the number of Gaussian Mixture (GM) components at t − 1, and w^{(j)}_{t-1}, m^{(j)}_{t-1}, and Σ^{(j)}_{t-1} are the weight, mean, and covariance of component j. The updated means and covariances are given by the EKF [46], i.e.,

$$m^{(j,m)}_t = m^{(j)}_{t|t-1} + K^{(j,m)}_t \left( \omega_{t,m} - g\!\left(m^{(j)}_{t|t-1}\right) \right), \qquad (43)$$
$$\Sigma^{(j,m)}_t = \left( I - K^{(j,m)}_t G^{(j)}_t \right) \Sigma^{(j)}_{t|t-1}, \qquad (44)$$

where the gain, K^{(j,m)}_t, is given via the measurement Jacobian, G^{(j)}_t, and innovation covariance, S^{(j,m)}_t, as:

$$K^{(j,m)}_t = \Sigma^{(j)}_{t|t-1} \left(G^{(j)}_t\right)^T \left(S^{(j,m)}_t\right)^{-1}, \quad S^{(j,m)}_t = G^{(j)}_t\, \Sigma^{(j)}_{t|t-1} \left(G^{(j)}_t\right)^T + R_{t,m}. \qquad (45)$$

The previous mean and covariance, m^{(j)}_{t-1} and Σ^{(j)}_{t-1}, relative to the current r_t, are obtained using (41). The GM weights are given by

$$w^{(j,m)}_t = \frac{p_d\, w^{(j)}_{t|t-1}\, \mathcal{N}\!\left(\omega_{t,m};\, g\!\left(m^{(j)}_{t|t-1}\right), S^{(j,m)}_t\right)}{\ell(\omega_{t,m} | r_t)}, \qquad (46)$$

where ℓ(ω_{t,m} | r_t) is defined in (18) with

$$\ell(\omega_{t,m} | r_t) = \kappa(\omega_{t,m} | r_t) + \sum_{j=1}^{J_{t|t-1}} p_d\, w^{(j)}_{t|t-1}\, \mathcal{N}\!\left(\omega_{t,m};\, g\!\left(m^{(j)}_{t|t-1}\right), S^{(j,m)}_t\right). \qquad (47)$$

Furthermore, the newborn PHD can be modelled as a detection-driven process [47], such that J_b newborn components are sampled from each of the M_t detections. The newborn GM mean is m_b^{(j,m)} = [g^{-1}(ω_{t,m})^T, ẋ_0, ẏ_0, ż_0]^T, with weight w_b^{(j)} = w_b/J_b, where g^{-1}(·) is the spherical-to-Cartesian transformation, w_b is the a priori newborn weight, and (ẋ_0, ẏ_0, ż_0) are constant a priori initialization values of the feature velocity.
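The weight update can be illustrated in a simplified one-dimensional, linear-Gaussian setting: each component weight is scaled by its detection likelihood and normalized by the single-detection evidence, analogous to (18). All numeric values below are illustrative assumptions.

```python
import math

def gm_phd_weight_update(weights, means, stds, omega, p_d, kappa):
    """Updated GM-PHD component weights for a single detection `omega`
    in a 1-D linear-Gaussian sketch:
        w_j' = p_d * w_j * q_j(omega) / ell(omega),
    where ell(omega) = kappa + sum_j p_d * w_j * q_j(omega) is the
    single-detection evidence and q_j is the component likelihood."""
    def gauss(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    q = [gauss(omega, m, s) for m, s in zip(means, stds)]
    ell = kappa + sum(p_d * w * qi for w, qi in zip(weights, q))
    return [p_d * w * qi / ell for w, qi in zip(weights, q)]

new_w = gm_phd_weight_update(weights=[0.9, 0.8], means=[1.0, 4.0],
                             stds=[0.5, 0.5], omega=1.1, p_d=0.95, kappa=0.01)
# The component near omega = 1.1 absorbs almost all of the weight
```

Because the clutter term enters the denominator, a detection in a region of high clutter intensity contributes less weight to nearby components, which is how the filter remains robust to false alarms without explicit data association.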

C. Rao-Blackwellized Particle Filter for Observer Localization
Recalling (36), the orientation, γ_t, is described by a wrapped Gaussian pdf [48], whilst p_t is modelled by a Gaussian state-space that is non-linearly dependent on γ_t. To separate the linear Gaussian subspace from the non-Gaussian orientation, p(r_t | y_{1:t}) is factorized as

$$p(r_t | y_{1:t}) = p(\gamma_t | y_{1:t,\gamma})\, p(p_t | \gamma_t, y_{1:t,v}), \qquad (48)$$

where p(γ_t | y_{1:t,γ}) and p(p_t | γ_t, y_{1:t,v}) are the posterior pdfs of the orientation and the linear subspace respectively. The formulation of GEM-SLAM facilitates estimation of r_t using a Rao-Blackwellized particle filter [49], where γ_t is estimated by optimal Importance Sampling (IS) and p_t is obtained from its optimal estimator formulated below. Using optimal IS, I particles, γ^{(i)}_t, i = 1, ..., I, of the orientation are drawn from

$$\gamma^{(i)}_t \sim \mathcal{N}_w(\gamma_t;\, \mu_{t,\gamma}, \varsigma_{t,\gamma}), \qquad (49)$$

where N_w(·) denotes a wrapped Gaussian pdf, and the mean, μ_{t,γ}, and variance, ς_{t,γ}, are obtained by the wrapped Kalman Filter (KF) [50].
As p_t is described by a linear Gaussian state-space in (36a) and (38a), the optimal estimator of p_t is the KF. Therefore, for each γ^{(i)}_t:

$$p\!\left(p_t \,\middle|\, \gamma^{(i)}_t, y_{1:t,v}\right) = \mathcal{N}\!\left(p_t;\, \mu^{(i)}_{t,p}, \varsigma^{(i)}_{t,p}\right), \qquad (50)$$

where the mean, μ_{t,p}, and covariance, ς_{t,p}, are provided by the KF equations based on the dynamical model in (37) evaluated at γ^{(i)}_t. Applying (49)-(51) to (29):

$$\lambda(x_t | Z_{1:t}) \approx \sum_{i=1}^{I} \alpha^{(i)}_t\, \delta_{r^{(i)}_t}(r_t)\, \lambda\!\left(s_t \,\middle|\, r^{(i)}_t, \Omega_{1:t}\right), \qquad (52)$$

where r^{(i)}_t ≜ [(p^{(i)}_t)^T, γ^{(i)}_t]^T, λ(s_t | r^{(i)}_t, Ω_{1:t}) is given by (16), and α^{(i)}_t are the normalized IS weights. Using (29):

$$\alpha^{(i)}_t = \frac{L\!\left(\Omega_t \,\middle|\, r^{(i)}_t\right) p\!\left(y_t \,\middle|\, r^{(i)}_t\right)}{\sum_{i'=1}^{I} L\!\left(\Omega_t \,\middle|\, r^{(i')}_t\right) p\!\left(y_t \,\middle|\, r^{(i')}_t\right)}, \qquad (53)$$

where p(y_t | r^{(i)}_t) is the observer report evidence, given as

$$p\!\left(y_t \,\middle|\, r^{(i)}_t\right) = \mathcal{N}_w\!\left(y_{t,\gamma};\, \gamma^{(i)}_t, \sigma^2_{t,w_\gamma}\right) \mathcal{N}\!\left(y_{t,v};\, v^{(i)}_t, \sigma^2_{t,v} + \sigma^2_{t,w_v}\right), \qquad (54)$$

where σ²_{t,v} is the velocity component of Σ_{t,v_v}. Furthermore, using (30), (18), and (47):

$$L\!\left(\Omega_t \,\middle|\, r^{(i)}_t\right) = e^{-N_{t,c} - N_{t|t-1}} \prod_{m=1}^{M_t} \ell\!\left(\omega_{t,m} \,\middle|\, r^{(i)}_t\right). \qquad (55)$$

Therefore, the observer particles are weighted by the multi-detection and observer report evidence terms. The detection uncertainty in the observer orientation and speed, as well as the rotational and translational agility, are explicitly accounted for via the observer evidence in (54). The multi-detection evidence in (55) facilitates exploitation of the existence of multiple features for observer localization. The product in (55) approaches zero for an increasing number of detections corresponding to low likelihood values. The optimal feature-to-detection alignment thus corresponds to the observer particle that maximizes (55). Therefore, the observer is probabilistically anchored by identifying the location and orientation that corresponds to the optimal alignment between the features and detections. The GEM-SLAM pseudo-code is provided in Algorithm 2.
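The particle weighting can be sketched as follows: each particle's log multi-detection evidence and log report evidence are summed and normalized, working in the log domain for numerical stability since the evidence product can underflow for many detections. The per-particle evidence values below are hypothetical.

```python
import math

def weight_particles(log_multi_ev, log_report_ev):
    """Normalized importance weights for observer particles: each
    particle is scored by the product of its multi-detection evidence
    and its observer report evidence, computed in the log domain."""
    logw = [a + b for a, b in zip(log_multi_ev, log_report_ev)]
    mx = max(logw)
    w = [math.exp(l - mx) for l in logw]   # subtract max to avoid underflow
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical log-evidence values for three particles:
w = weight_particles(log_multi_ev=[-3.2, -1.0, -7.5],
                     log_report_ev=[-0.5, -0.4, -0.1])
best = w.index(max(w))  # particle that best aligns features and detections
```

This mirrors the fusion in (53): a particle that agrees with the reports but misaligns the features (or vice versa) receives little weight, so high weight requires both evidence terms to be large simultaneously.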

D. Relationship to Existing Approaches
As PHD-based approaches, the RB-PHD and SC-PHD filter use the same feature model as GEM-SLAM, estimated using the Gaussian Mixture PHD (GM-PHD) filter in Section V-B. In contrast, FastSLAM, based on RVs rather than RFSs, assumes that feature detections are processed one-by-one with known data association, and uses a single EKF for each feature.
The fundamental difference between the benchmark algorithms and GEM-SLAM is the approach to observer localization. The RB-PHD and SC-PHD filter as well as FastSLAM model the observer state, r_t, as the Cartesian coordinates, r̄_t, and use the observer reports as parameters of the control input, u_t = Δ_t y_{t,v} [sin(y_{t,γ}), cos(y_{t,γ}), 0]^T [25]. Hence,

$$p(\bar{r}_t | \bar{r}_{t-1}) = \mathcal{N}(\bar{r}_t;\, \bar{r}_{t-1} + u_t, \Sigma_{t,r}), \qquad (56)$$

where Σ_{t,r} is the process noise covariance. As (33) incorporates the predicted pdf, but not the updated posterior, observer localization for the RB-PHD, SC-PHD filter, and FastSLAM is realized by prior IS, i.e., [36]

$$\bar{r}^{(i)}_t \sim p\left(\bar{r}_t \,\middle|\, \bar{r}^{(i)}_{t-1}\right). \qquad (57)$$

The particle state, r^{(i)}_t, is evaluated from (57) using (36). Based on (57), prior IS hence incorporates the uncertainty in the dynamical model of the observer, but cannot model uncertainty in the reports. Therefore, prior IS is expected to lead to increased estimation variance and hence decreased performance compared to the optimal IS used in GEM-SLAM [52]. Furthermore, the GEM-SLAM particle weights in (53) compensate for the report evidence in addition to the multi-feature evidence. As discussed in Section IV-D1, this fusion is equivalent to evaluating the intersection between the areas of likely observer positions corresponding to the report and multi-detection evidence respectively. GEM-SLAM is therefore expected to result in a significantly reduced area of likely observer positions compared to the benchmark approaches.
Table I summarizes the model-specific differences between GEM-SLAM, the RB-PHD and SC-PHD filter as well as FastSLAM. The following subsection highlights these differences using an illustrative example.

E. Illustrative Comparison
Consider the initial observer position at (5, 2, 1.8) m in a 6 × 6 × 2.5 m³ room with γ_0 = 0 rad and v_0 = 0.5 m/s. The observer trajectory is simulated using (36) and (38) with σ_{t,v_γ} = 0.08 rad, σ_{t,v_v} = 0.1 m/s, σ_{t,w_γ} = 0.18 rad, and σ_{t,w_v} = 0.5 m/s. Two features are initially located at (4, 3, 1.8) m and (2, 2, 1.8) m, both moving with a velocity of (0.1, 0.1, 0) m/s in the x-y-z directions. The feature states and detections are simulated using (39) and (40) for Q_t = 10⁻⁹ I_6 and R_t = 0.1 I_3. The number of uniformly distributed false alarms was simulated from a Poisson distribution with λ_c = 1. The room is divided into a grid with 5 cm resolution in the x- and y-coordinates. The z-positions of the co-located observer sensor and sources are assumed known a priori.
For each observer position in the grid, the EKF equations are used to update the two feature states with each of the two detections, resulting in four new feature states. The resulting importance weights of GEM-SLAM, the SC-PHD filter, the RB-PHD filter, and FastSLAM are displayed as contour plots in Fig. 1. The "confidence area" corresponding to the outermost contour level is evaluated numerically based on the distances between vertices along the contour lines. The resulting area for FastSLAM assuming known data association is shown in Fig. 1(a) and corresponds to a confidence area of 5.35 m². In practice, data association leads to false detection-to-feature assignments. The impact of false assignments is illustrated in Fig. 1(b), clearly showing that the ground truth observer position is outside of the confidence area. Therefore, FastSLAM leads to unreliable observer estimates for false data associations. The results for the RB-PHD filter are shown in Fig. 1(c) and correspond to a confidence area of 7.87 m². The SC-PHD filter leads to an area of 6.22 m² (see Fig. 1(d)) by using the closed-form multi-detection evidence rather than the approximation of the RB-PHD filter. By fusing observer information inferred from the features with the observer reports, GEM-SLAM in Fig. 1(e) leads to a reduced confidence area of 2.24 m².
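An area enclosed by ordered contour vertices can be evaluated with the shoelace formula; the sketch below is one way such a confidence area might be computed, using a hypothetical rectangular contour.

```python
def polygon_area(vertices):
    """Area enclosed by an ordered list of (x, y) contour vertices,
    computed with the shoelace formula."""
    n = len(vertices)
    acc = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # wrap around to close the contour
        acc += x1 * y2 - x2 * y1
    return abs(acc) / 2.0

# A 2 m x 1 m rectangular contour encloses 2 m^2:
area = polygon_area([(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)])
```

For non-convex contours the formula still applies as long as the vertex list traces the boundary without self-intersection, which is the typical output of a contour-extraction routine.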

F. Computational Complexity
The computational complexity of the GEM-SLAM realization in Algorithm 2 is O(M t N t I), i.e., linear in the number of feature detections, M t , the number of features, N t , and the number of observer particles, I. Therefore, the complexity of GEM-SLAM is comparable to that of the RB-PHD filter and multi-hypothesis FastSLAM as shown in [53], as well as the SC-PHD filter [54]. As also mentioned in [53], all three PHD-filter approaches can be reduced to O(M t I log(N t )) using binary tree enhancements [25].
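The stated complexity follows from the nested per-time-step update structure. The toy counter below (name and structure hypothetical) merely illustrates how the cost scales with the three loop bounds; the inner body stands in for one EKF-style component update:

```python
def phd_update_cost(num_particles, num_features, num_detections):
    """Counts inner-loop iterations of a per-particle GM-PHD update,
    illustrating the O(M_t * N_t * I) complexity stated above."""
    ops = 0
    for _ in range(num_particles):        # I observer particles
        for _ in range(num_detections):   # M_t feature detections
            for _ in range(num_features): # N_t GM feature components
                ops += 1                  # one EKF-style weight update
    return ops
```

Doubling any one of the three quantities doubles the count, consistent with the linear dependence stated above.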
The crucial advantage of GEM-SLAM in terms of complexity is the required number of particles. As discussed in the previous subsection and highlighted in Fig. 1, GEM-SLAM needs to sample from a significantly reduced area of likely observer positions compared to the SC-PHD and RB-PHD filters as well as FastSLAM. Therefore, fewer particles are required for GEM-SLAM to achieve observer point estimates with accuracy comparable to the benchmark approaches.

A. Simulation Setup
SLAM performance in static scenes is typically verified using open-source datasets, which provide sensor data and accurate ground truth positions of the moving observer and nearby static features (see, e.g., [9, Table 46.1] for an overview). However, open-source datasets that provide ground truth positions of dynamic features are not currently available. The experimental setup is therefore designed to decouple the GEM-SLAM performance from any specific detection algorithm. Hence, an 'oracle localizer' is used so that the Root Mean Square (RMS) localization error and the rates of missed detections and false alarms can be controlled. The sensor path is simulated for 300 waypoints from (36) with γ 0 = π rad and Δ T = 0.1 s, with σ² t,γ = 0.52 rad² and Σ t,v v = 10⁻⁹ I 4. The observer reports are simulated from (38).

B. Experimental Description
Experiment 1: Evaluation for increasing RMS sensor report errors in velocity of σ t,w v = {0, 5, 10} m/s. Experiment 2: Evaluation for increasing FARs of λ c = {0, 3, 5} with σ t,w v = 5 m/s. Experiment 3: Evaluation for an increasing number of dynamic features, N t,d = {0, 1, 3}. Dynamic features are simulated [55] with a constant velocity at an arbitrarily chosen example value of 1 m/s. The orientation is simulated as a random walk with a process noise standard deviation of 0.35 rad.

C. Implementation Details
The GEM-SLAM realization in Section V-C is evaluated using I = 100 particles, L b = 10 newborn feature components per detection, N b = 10⁻⁴, and Σ b = diag[2, 2, 1]. The initial sensor state is assumed known a priori with Σ 0,p = diag[0.01, 0.01, 0.0025]. The sensor pdf is approximated by evaluating the bimodal Kernel Density Estimate (KDE) over the x- and y-coordinates of the particles [56], where H is the 2 × 2 symmetric, positive definite bandwidth matrix.
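As an illustration of this KDE step, a weighted bivariate Gaussian KDE with a full 2 × 2 bandwidth matrix H can be evaluated as follows. This is a minimal sketch; the bimodal KDE of [56] may use a different kernel or bandwidth selector, and the function name is hypothetical:

```python
import numpy as np

def kde_2d(x, particles, weights, H):
    """Weighted Gaussian KDE over particle x-y positions with a 2x2
    symmetric positive definite bandwidth matrix H (illustrative sketch)."""
    Hinv = np.linalg.inv(H)
    # Normalization constant of a bivariate Gaussian kernel with covariance H.
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(H)))
    diffs = x - particles                                # (I, 2) offsets
    mahal = np.einsum('ij,jk,ik->i', diffs, Hinv, diffs) # squared Mahalanobis
    return float(np.sum(weights * norm * np.exp(-0.5 * mahal)))
```

Evaluating the density at a query point sums one Gaussian kernel per particle, scaled by that particle's importance weight.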
To mitigate an exponential explosion in the number of components, the feature GMM is pruned to a maximum of 250 components using [31, Table II] with merging and truncation thresholds of 0.05 and 10⁻⁷, respectively. Systematic resampling [49] is applied to the sensor particles to avoid particle depletion [57].
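Systematic resampling itself is standard; a compact sketch of the scheme in [49] draws a single uniform offset and takes evenly spaced positions through the cumulative weight distribution:

```python
import numpy as np

def systematic_resample(weights, seed=0):
    """Systematic resampling: one uniform offset, then evenly spaced
    positions through the CDF of the normalized particle weights.
    Returns the indices of the particles to keep."""
    rng = np.random.default_rng(seed)
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n  # evenly spaced samples
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0                                   # guard against rounding
    return np.searchsorted(cdf, positions)
```

Compared to multinomial resampling, this uses a single random draw per step and guarantees that a particle with weight w is selected either ⌊nw⌋ or ⌈nw⌉ times, which reduces resampling variance.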
The sensor point estimate, r̃ t , is extracted as the weighted average of all particles. The number of features, Ñ t , is obtained as the weighted average of the number of features across all sensor particles. To extract feature point estimates, all feature GM means are transformed relative to r̃ t . The Expectation-Maximization (EM) algorithm in [58] is used to cluster the feature means into Ñ t + 1 clusters, with one diffuse cluster to absorb outliers [59, Section 10.1].
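The point-estimate extraction reduces to a weighted mean over the particle set; a minimal sketch (helper name hypothetical) is:

```python
import numpy as np

def weighted_point_estimate(particles, weights):
    """Point estimate as the importance-weighted mean of the particle
    states, matching the extraction step described above."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalize importance weights
    return w @ np.asarray(particles, dtype=float)
```

The same helper can produce the expected number of features by passing the per-particle feature counts in place of the particle states.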
The GEM-SLAM results are compared against FastSLAM implemented as detailed in [25, Table 3.2]; the SC-PHD filter as per [43, Figures 6 and 7]; and the RB-PHD filter as in [28, Table 1] for a single-feature map. All algorithms are initialized with J 0 = 0, i.e., without knowledge of the initial feature states.

D. Performance Metrics
Accuracy of the sensor estimates, reports, and ground truth is evaluated using the Euclidean distance between two positions, x t ≜ [x t,x , y t,x , z t,x ]^T and y t ≜ [x t,y , y t,y , z t,y ]^T:

d(x t , y t ) = ‖x t − y t ‖,

where ‖ · ‖ denotes the Euclidean norm. The OSPA distance [60], [61], Δ(X t , Y t ), is used as a measure of feature accuracy, continuity, and false track initialization. Given X t ≜ {x t,1 , . . ., x t,N } and Y t ≜ {y t,1 , . . ., y t,M } with N ≤ M:

Δ(X t , Y t ) = ( (1/M) [ min_{π ∈ Π M} Σ_{i=1}^{N} d c (x t,i , y t,π(i) )^p + c^p (M − N) ] )^{1/p},

where 1 ≤ p < ∞ is the OSPA metric order parameter, Π M is the set of permutations of {1, . . ., M}, and d c (x t,i , y t,π(i) ) ≜ min(c, d(x t,i , y t,π(i) )) is the cutoff distance between two tracks with cutoff value c.
The OSPA parameters were chosen as c = 30 m and p = 2 to emphasize detection errors as discussed in [62].
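For reference, the OSPA distance with cutoff c and order p can be computed by brute force over assignments for the small feature counts used here. This is an illustrative sketch; production code would use an optimal assignment solver (e.g. the Hungarian algorithm) instead of enumerating permutations:

```python
import itertools
import numpy as np

def ospa(X, Y, c=30.0, p=2):
    """OSPA distance between finite sets of points X and Y, computed by
    enumerating assignments (feasible only for small sets)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m > n:                         # convention: |X| <= |Y|
        X, Y, m, n = Y, X, n, m
    # Best cutoff-distance assignment of the m smaller-set points.
    best = min(
        sum(min(c, np.linalg.norm(X[i] - Y[perm[i]])) ** p for i in range(m))
        for perm in itertools.permutations(range(n), m)
    )
    # Cardinality mismatch is penalized at the cutoff value c.
    return ((best + (n - m) * c ** p) / n) ** (1.0 / p)
```

With c = 30 m and p = 2 as above, each unmatched feature contributes the full cutoff penalty, which is what emphasizes detection (cardinality) errors.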

A. Experiment 1 -Velocity Error
The average Euclidean distance between the ground truth observer path and the GEM-SLAM estimates is shown in Fig. 2(a) for each setting of σ t,w v . The results are compared against the SC-PHD and RB-PHD filters, FastSLAM, and dead reckoning.
Across the three velocity error settings, GEM-SLAM achieves average Euclidean distances of 0.37 m, 0.41 m, and 0.25 m with standard deviations of 0.14 m, 0.17 m, and 0.11 m respectively. In comparison, dead reckoning results in an average localization error of 0.37 m with a standard deviation of 0.14 m for σ t,w v = 0 m/s. The error increases to 2.70 m with a standard deviation of 1.46 m for σ t,w v = 5 m/s, and to 12.86 m with a standard deviation of 6.48 m for σ t,w v = 10 m/s. GEM-SLAM hence results in observer localization improvements compared to dead reckoning of 84.81% and 98.05% for σ t,w v = 5, 10 m/s respectively.
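The reported percentage improvements follow the usual relative-error formula, 100(1 − e_method / e_baseline); a quick numeric check (helper name hypothetical) reproduces the quoted figures up to rounding:

```python
def improvement(err_method, err_baseline):
    """Relative localization improvement of a method over a baseline, in %."""
    return 100.0 * (1.0 - err_method / err_baseline)
```

For example, improvement(0.41, 2.70) ≈ 84.81 and improvement(0.25, 12.86) ≈ 98.06, consistent with the values quoted above.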
The SC-PHD filter results in average Euclidean distances of 5.23 m for σ t,w v = 0 m/s, 5 m for σ t,w v = 5 m/s, and 8.48 m for σ t,w v = 10 m/s, with standard deviations of 1.95 m, 2.39 m, and 4.1 m respectively. The RB-PHD filter results in Euclidean distances of 5.18 m, 4.91 m, and 8.46 m with standard deviations of 1.92 m, 2.45 m, and 4.08 m respectively for the three settings of σ t,w v . Therefore, GEM-SLAM results in performance improvements of 92.93%, 91.8%, and 97.05% compared to the SC-PHD filter, as well as 92.85%, 91.64%, and 97.04% compared to the RB-PHD filter for σ t,w v = 0, 5, 10 m/s respectively.
The OSPA distances corresponding to the feature mapping accuracy of GEM-SLAM and the benchmark algorithms are shown in Fig. 2(b). To provide further insight into the results, Fig. 3 shows the estimated scene map obtained by GEM-SLAM at t = 100 for σ t,w v = 0 m/s and for σ t,w v = 10 m/s. A comparison with the estimated map obtained by the SC-PHD filter is also provided. For the baseline scenario of σ t,w v = 0 m/s, the results in Fig. 3(a) show that the GEM-SLAM estimates of the observer trajectory and feature positions accurately reflect the ground truth. In contrast, the map estimates of the SC-PHD filter in Fig. 3(b) diverge from the true observer path, resulting in increasing errors in the feature estimates.
This divergence is attributed to two factors. Firstly, by relying on only the feature information for observer localization, the SC-PHD filter leads to ambiguities in the estimated observer position when no features exist in the map, and hence cannot probabilistically anchor the observer. In our experiments, all algorithms are initialized without prior knowledge of the feature positions. Features therefore need to be initialized from the observer-relative detections whilst simultaneously estimating the observer position itself. Ambiguity in the observer position can be resolved by optimally exploiting the observer reports, as demonstrated by the GEM-SLAM results. Secondly, uncertainty in the feature detections leads to uncertainty in observer localization. In the baseline scenario of σ t,w v = 0 m/s, the observer reports are affected only by the orientation error of σ t,w γ = 1 deg. Report noise is therefore almost negligible, as indicated by the accurate dead reckoning estimates shown in Fig. 3(a) and 3(b). Nevertheless, the variance of prior importance sampling for the SC-PHD filter is determined by the process noise, i.e., σ² t,γ = 0.52 rad², leading to a large estimation variance. In contrast, the use of the Rao-Blackwellized particle filter in GEM-SLAM leads to a reduced estimation variance, as both the report and process noise variances are explicitly used for importance sampling. The results are verified for a realistic velocity error of σ t,w v = 10 m/s in Fig. 3(c) and 3(d).
The results in Fig. 2(a) also compare the GEM-SLAM performance against FastSLAM. The results highlight that FastSLAM achieves an observer localization performance of 0.75 m, 2.69 m, and 7.23 m with standard deviations of 0.34 m, 1.45 m, and 4.35 m for σ t,w v = 0, 5, 10 m/s respectively. Observer localization of FastSLAM is therefore less robust to report uncertainty than that of GEM-SLAM. Nevertheless, despite the increasingly inaccurate observer estimates, FastSLAM's performance in feature mapping remains relatively constant, with OSPA distances of 14.33 m, 14.86 m, and 15.33 m and standard deviations of 7.59 m, 7.6 m, and 6.53 m for the three velocity error settings. The OSPA performance of FastSLAM is therefore comparable to that of the SC-PHD and RB-PHD filters for scenarios where uncertainty is dominated by observer report errors. Nevertheless, GEM-SLAM outperforms FastSLAM by up to 96.54% in observer localization accuracy and 94.13% in feature mapping accuracy.

B. Experiment 2 -False Alarms
The results for increasing FARs, as discussed in Section VI-B, are summarized in Fig. 2(c). The results highlight the robustness of observer localization for GEM-SLAM, with Euclidean distances of 0.28 m, 0.26 m, and 0.29 m and standard deviations of 0.1 m, 0.09 m, and 0.1 m for λ c = 0, 3, 5 respectively. As only the FAR is varied in this experiment, dead reckoning results in an observer localization error of 4.69 m with 1.56 m standard deviation across all three settings of λ c . GEM-SLAM results in an improvement in observer localization of at least 93.8% compared to dead reckoning. The performance improvements of GEM-SLAM compared to the SC-PHD filter correspond to 95.96%, 96.02%, and 95.44%. The performance improvement of GEM-SLAM compared to the RB-PHD filter is between 95.91% and 96.5%, and against FastSLAM between 92.03% and 93.14%.
GEM-SLAM also outperforms the benchmark algorithms in terms of feature mapping accuracy, with OSPA distances of 1.12 m, 10.97 m, and 16.11 m for λ c = 0, 3, 5 respectively. These results correspond to improvements of between 59.95% and 95.58% compared to the SC-PHD filter; between 55.16% and 95.51% compared to the RB-PHD filter; and between 59.20% and 93.78% compared to FastSLAM.

C. Experiment 3 -Number of Dynamic Features
The observer localization performance of GEM-SLAM is compared to the benchmarks in Fig. 2(e), whilst Fig. 2(f) shows the corresponding OSPA results for feature mapping. The results highlight that GEM-SLAM achieves observer localization with an accuracy of 0.23 ± 0.01 m on average across all three scenarios. Feature mapping is achieved with a 2.2 m OSPA distance for a fully dynamic scene where all three features as well as the observer are moving. For a single moving feature, GEM-SLAM achieves an OSPA distance of 1.99 m, whilst a fully static scene results in 0.97 m feature mapping accuracy.
For three moving features, the SC-PHD and RB-PHD filters result in OSPA distances of 19.71 m and 19.55 m respectively, whilst FastSLAM results in an OSPA distance of 16.06 m. A fully static scene results in OSPA distances of 16.65 m for the SC-PHD filter, 16.59 m for the RB-PHD filter, and 14.46 m for FastSLAM. GEM-SLAM therefore results in a feature mapping performance improvement for fully dynamic scenes of 88.83%, 88.74%, and 86.30% compared to the SC-PHD filter, the RB-PHD filter, and FastSLAM respectively.

VIII. CONCLUSION
This paper proposed a novel approach to SLAM, called GEM-SLAM, that is targeted at dynamic scenes in which both the observer and the features are moving. Based on probability hypothesis density filters, GEM-SLAM probabilistically anchors the observer state by fusing observer information inferred from the scene with reports of the observer motion.

Fig. 1. Probabilistic anchoring of the observer for dynamic features: map of the areas of likely observer positions for (a) FastSLAM assuming known data association; (b) FastSLAM with incorrect data association; (c) the RB-PHD filter, L RB (Ω t | r t ); (d) the SC-PHD filter, L(Ω t | r t ); and (e) GEM-SLAM, L(Ω t | r t ) p(y t ).
The default observer report noise values in (38) are σ t,w v = 5 m/s and σ t,w γ = 0.02 rad. The sources are simulated from (39) within a 50 × 50 × 3 m³ surveillance volume with Q t = diag[10⁻², 10⁻², 10⁻⁹, 10⁻³], survival probability p s = 1, and initial positions at the centre of three randomly selected quadrants inside the surveillance volume. The height of each feature is randomly drawn from a uniform distribution on [1.5, 1.95] m. Feature detections are simulated from (6) with R t,m = diag[25, 25, 9]. False alarms are modelled by a Poisson process with a default FAR of λ c = 0 and are uniformly distributed within the surveillance volume.
The results in Fig. 2(b) indicate that GEM-SLAM achieves almost constant average OSPA distances of 1 m, 1.09 m, and 0.9 m with standard deviations of 4.04 m, 4.05 m, and 4.06 m for σ t,w v = 0, 5, 10 m/s respectively. In comparison, the SC-PHD filter results in OSPA distances of 15.76 m, 15.95 m, and 17.98 m with standard deviations of 8.18 m, 8.20 m, and 7.96 m respectively for the three velocity error settings. The RB-PHD filter corresponds to OSPA distances of 15.89 m, 15.91 m, and 18.23 m, with standard deviations of 8.13 m, 8.12 m, and 7.81 m respectively. Therefore, GEM-SLAM outperforms the SC-PHD and RB-PHD filters by up to 16.99 m and 17.33 m respectively. These results are clearly correlated with the significant improvement in observer localization.

Fig. 2. Performance evaluation of GEM-SLAM and comparison to the SC-PHD and RB-PHD filters, FastSLAM, and dead reckoning. (a), (b) Experiment 1 for increasing reported velocity error; (c), (d) Experiment 2 for increasing FAR; (e), (f) Experiment 3 for an increasing number of moving features. Bars show the distance metrics and whiskers indicate the standard deviation across time. Top row: average Euclidean distance of the observer state estimates compared to the ground truth across all time steps. Bottom row: average OSPA distance of the multi-feature states.

Fig. 3. Experiment 1: map estimates after t = 100 time steps for one Monte Carlo iteration using σ t,w v = 0 m/s comparing (a) GEM-SLAM and (b) the SC-PHD filter, and using σ t,w v = 10 m/s for (c) GEM-SLAM and (d) the SC-PHD filter.

Fig. 4. GEM-SLAM scene map for λ c = 5, showing that 7 detections, including 4 false alarms, cause a single false initialization.

The increasing OSPA distance with increasing λ c is mainly attributed to temporary false feature initializations. To illustrate this point, Fig. 4 shows the estimated scene map at t = 200 for a single Monte Carlo run with λ c = 5. The results show that, from 7 detections, a single feature is falsely initialized at (46.29, 33.29) m. Nevertheless, the features are accurately mapped, with estimates at (37.13, 38.08) m, (12.46, 12.75) m, and (12.44, 37.93) m compared to the ground truth at (37.46, 37.65) m, (12.57, 12.68) m, and (12.65, 37.73) m. False feature initialization could be reduced by decreasing the number of birth components, J b , or the birth weight, w b . Either setting reduces the contribution of birth components, thereby reducing the estimated number of features. As a tradeoff, reducing the birth weight also results in slower initialization of features. Adverse effects of slow initialization on observer localization can be avoided if the initial positions of some of the features are known a priori.
The feature posterior, p(S t | r t , Ω 1:t ), is propagated sequentially via

p(S t | r t , Ω 1:t ) = p(Ω t | r t , S t ) p(S t | r t , Ω 1:t−1 ) / ∫ p(Ω t | r t , S t ) p(S t | r t , Ω 1:t−1 ) δS t ,

where I P is the P × P identity matrix and Δ t is the time delay between t − 1 and t. The observer reports, y t ≜ [y t,v , y t,γ ]^T, containing the reported speed, y t,v , and orientation, y t,γ , are noisy measurements of the state, r t . Each feature state evolves as

s t,n = D t s t−1,n + n t,n , n t,n ∼ N(0 6×1 , Q t ), (39)

where n t,n is the process noise with covariance Q t and the dynamical model, D t , follows a constant-velocity model. Using a range-bearing sensor, each detection is defined as ω t,m ≜ [r t,m , φ t,m , θ t,m ]^T, with sensor-to-feature range, r t,m , azimuth, φ t,m , and elevation, θ t,m .
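The range-bearing detection defined above corresponds to the spherical coordinates of the sensor-to-feature displacement; a minimal sketch follows (the paper's exact angle conventions may differ, and the function name is illustrative):

```python
import numpy as np

def range_bearing(sensor_pos, feature_pos):
    """Range-bearing detection omega = [r, phi, theta]^T of a feature
    relative to the sensor: range, azimuth, and elevation."""
    d = np.asarray(feature_pos, dtype=float) - np.asarray(sensor_pos, dtype=float)
    r = np.linalg.norm(d)            # sensor-to-feature range
    phi = np.arctan2(d[1], d[0])     # azimuth in the x-y plane
    theta = np.arcsin(d[2] / r)      # elevation above the x-y plane
    return np.array([r, phi, theta])
```

In the simulations above, zero-mean Gaussian noise with covariance R t,m would be added to this noiseless detection.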

TABLE I
MODEL-SPECIFIC DIFFERENCES BETWEEN GEM-SLAM AND THE BENCHMARK ALGORITHMS