LiDAR Inertial Odometry Based on Indexed Point and Delayed Removal Strategy in Highly Dynamic Environments

Simultaneous localization and mapping (SLAM) is considered a challenge in environments with many moving objects. This paper proposes a novel LiDAR inertial odometry framework for dynamic scenes, LiDAR inertial odometry based on indexed point and delayed removal strategy (ID-LIO), which builds on LiDAR inertial odometry via smoothing and mapping (LIO-SAM). To detect the point clouds on moving objects, a dynamic point detection method based on pseudo occupancy along the spatial dimension is integrated. Then, we present a dynamic point propagation and removal algorithm based on indexed points to remove more dynamic points on the local map along the temporal dimension and to update the status of the point features in keyframes. In the LiDAR odometry module, a delayed removal strategy is proposed for historical keyframes, and the sliding window-based optimization includes the LiDAR measurements with dynamic weights to reduce the error from dynamic points in keyframes. We perform experiments on both public low-dynamic and high-dynamic datasets. The results show that the proposed method greatly increases localization accuracy in high-dynamic environments. Additionally, the average root mean square error (RMSE) of the absolute trajectory error (ATE) of our ID-LIO can be improved by 67% and 85% on the UrbanLoco-CAMarketStreet dataset and the UrbanNav-HK-Medium-Urban-1 dataset, respectively, when compared with LIO-SAM.


Introduction
Mobile robots and autonomous driving both require accurate and reliable state estimation. It is challenging for traditional GNSS/INS systems to meet high-accuracy localization requirements in indoor and urban environments.
In recent years, considerable progress has been made in both visual and LiDAR SLAM. Due to the wide field of view and detailed texture information that cameras provide, vision-based SLAM can estimate a six-degree-of-freedom (6-DOF) state. However, it is easily affected by lighting shifts and by low-texture and repetitive-texture environments [1,2]. Compared to cameras, LiDAR provides more direct, accurate and reliable depth information for SLAM systems, allowing for robust state estimation even in challenging environments [3].

Localization and Mapping Based on LiDAR and Inertial
The most widely used SLAM technique in urban areas is localization and mapping based on LiDAR [4], because LiDAR can accurately estimate the depth of the point cloud. These investigations can be categorized by the representation of the LiDAR measurements. The most widely used and basic method, the iterative closest point (ICP) approach [5], uses LiDAR point clouds directly. Recent studies have proposed a variety of representations, such as grid cells, features of the LiDAR point cloud and surface elements. The primary distinction between these algorithms is the matching search strategy. In particular, to increase computational efficiency, variants of the ICP approach match associated points using downsampled points and various data structures, such as kd-trees, ikd-trees, incremental voxels and projection depth maps [6][7][8]. The use of projective geometry for the nearest-neighbor search was proposed in [7,9]. SuMa, a powerful 3D LiDAR-based SLAM system that employs a shader language for parallel processing, was proposed by Behley et al. [9]. They projected the input point clouds onto a spherical image and used this projection for correspondence search, treating projectively associated points as the closest points. In addition, that work indicated that motion drift can be greatly decreased through surface-element-based map creation and matching. Deep learning has made significant advancements in recent years, and many scholars have developed deep learning-based approaches [10,11] for LiDAR motion estimation using projected LiDAR features.
Zhang et al. [12] proposed LiDAR odometry and mapping (LOAM), a LiDAR SLAM solution based on LiDAR features. Instead of using the complete point clouds as in ICP, planar and edge point features are matched in LOAM to provide real-time and low-drift state estimation. LOAM consists of an odometry module and a mapping module. The odometry module performs point-to-edge/plane matching at a high frequency, and the mapping module runs at a lower frequency to perform scan-to-map matching, which obtains more precise pose estimates. During the nearest-neighbor search in the former module, kd-trees are built for each feature set from each LiDAR scan. The mapping module uses these LiDAR features to construct a global map and further optimize the poses. On the KITTI odometry benchmark, LOAM achieved the best performance [13]. To further enhance performance, numerous LiDAR SLAM frameworks based on LOAM have been proposed. F-LOAM [14] uses a non-iterative, two-stage distortion compensation approach to boost the system's real-time performance and computational efficiency. Nevertheless, it still lacks a sub-module for global optimization, such as a loop detection strategy, which leads to significant cumulative errors over long periods of time in large-scale environments. HDL SLAM [15] is a multi-sensor fusion framework that can combine LiDAR, IMU and GNSS sensors; however, the reliability of its frame-to-frame registration is low because it is based on the NDT algorithm. LeGO-LOAM [16] extracts two types of points in the feature extraction module, ground and non-ground points, which significantly improves feature registration efficiency. The LiDAR mapping module then receives its initial value via a two-step registration. Moreover, a keyframe selection strategy and a loop closure detection method are added to the back end for better global localization consistency and real-time performance.
However, because of the loose coupling of IMU and LiDAR, running in a wide range of scenarios still leads to large cumulative errors. Liu et al. [17] proposed a feature extraction method with descriptors computed by deep learning; this method uses two-stage state estimation and is evaluated in a long-range environment. Ye et al. [18] proposed a tightly coupled LiDAR SLAM algorithm, LIO-Mapping, built on VINS and LOAM. For feature extraction and state estimation, a LOAM-style LiDAR front-end component takes the role of the visual front-end component in VINS-Mono. Unfortunately, due to the size of the optimization problem, real-time execution on an unmanned vehicle is challenging. LiLi-OM [19] presents an adaptive keyframe selection strategy that can be used for both solid-state and conventional LiDAR, and it introduces a sensor fusion method with metric weights. To further increase processing speed and trajectory accuracy, LIO-SAM [20] represents a tightly coupled LiDAR-inertial odometry system based on the framework of incremental smoothing and mapping (iSAM2) [21]; its factor graph optimization can also incorporate GPS and loop closure factors. A loosely coupled adaptive covariance estimation algorithm was proposed by Zhang et al. [22], achieving better experimental results than tightly coupled LIO systems. Liu et al. [3] proposed a feature extraction algorithm that extracts rod-shaped and planar features, which reduces computational consumption and improves the accuracy of the LIO.
In addition to optimization-based LIO systems, filter-based LIO systems have emerged in recent years. The error-state Kalman filter is used in [23] to propose a filter-based LiDAR-inertial odometry method that estimates the robot's pose state. FAST-LIO [24] introduces a new way to calculate the Kalman gain to accelerate the system. FAST-LIO2 [25] directly uses the original LiDAR points to perform scan-to-map registration and proposes the ikd-Tree structure to reduce the time consumption of map updates compared with a static k-d tree. Faster-LIO [26] uses an iVox structure that organizes voxels through a hash table and an LRU cache for map point addition and deletion; its parallel accelerated version surpasses the performance of the ikd-Tree. However, these existing SLAM systems assume a static environment. In fact, when an autonomous system navigates in realistic environments, the spatial structure becomes more complex as more moving objects, such as moving people and cars, enter the environment. Therefore, for pose estimation and mapping, the online removal of dynamic objects is essential.

Dynamic Point Removal Approaches in SLAM
In the real world, moving objects such as cars and pedestrians are common. However, the excellent SLAM systems mentioned above are designed assuming a static environment and cannot perform robustly in dynamic scenes. Moving objects must be recognized and eliminated from the LiDAR point clouds in order to provide accurate positioning and navigation. The relevant methods can be summarized as follows:
• Model-based approaches: These approaches rely on a simple prior model. For example, the ground is required to be removed first in [27,28]. Ref. [29] is based on the concept that most moving objects in urban scenes inevitably come into contact with the ground.
• Voxel map-based approaches: These approaches construct a voxel map and trace the rays emitted by the LiDAR. When the end point of a LiDAR ray hits a voxel, that voxel is considered occupied, while the voxels the beam travels through are regarded as free. The occupancy probability of each voxel in the map can be computed in this way. However, these methods are computationally expensive: even with engineering acceleration in the latest method [30], processing a large number of 3D points online is still difficult [31]. In addition, these methods require highly accurate localization information, which is itself a challenge for SLAM. In [32], an offline approach for labeling dynamic points in LiDAR scans based on occupancy maps is introduced, and the labeled results are used as training datasets for deep learning-based LiDAR SLAM methods.
• Visibility-based approaches: In contrast to building a voxel map, visibility-based approaches only need to compare visibility differences rather than maintain a large voxel map [33][34][35]. Specifically, the observed point should be considered dynamic if the view from the previously observed point blocks out the view from the current point. RF-LIO [36] proposed an adaptive dynamic object rejection method based on Removert, which can perform SLAM in real time.

• Learning-based approaches: The performance of semantic segmentation and detection methods based on deep learning has improved significantly. Ruchti et al. [37] integrated a neural network and an octree map to estimate the occupancy probability; point clouds in a grid with a low occupancy probability are considered dynamic points. Chen et al. [38] proposed a fast moving-object segmentation network to divide a LiDAR scan into dynamic and static objects; the network is able to operate even faster than the LiDAR frequency. Wang et al. [39] proposed a 3D neural network, SANet, and added it to LOAM for semantic segmentation of dynamic objects. Jeong et al. [40] proposed CNN-based 2D LiDAR odometry and mapping, which uses fault detection of scan matching in dynamic environments.

New Contribution
This paper mainly aims to improve the accuracy of LIO in dynamic urban environments. The main contributions of this paper are as follows:
1. An online and effective dynamic point detection method in the spatial dimension is optimized and integrated. This approach fully utilizes the height information of the ground in the point clouds to detect dynamic points.
2. An indexed point-based dynamic point propagation and removal algorithm is proposed to remove more dynamic points in a local map along the spatial and temporal dimensions and to detect dynamic points in historical keyframes.
3. In the LiDAR odometry module, we propose a delayed removal strategy for keyframes. Additionally, a lightweight sliding window method is utilized to optimize the poses from the scan-to-map module. We assign dynamic weights to the well-matched LiDAR feature points of the historical keyframes in the sliding window.

Materials and Methods
An overview of our ID-LIO is shown in Figure 1. Our system includes three main modules. The first module is IMU pre-integration, which is employed to estimate the system motion and to obtain an initial pose for point cloud de-skewing and LiDAR odometry. Second, the feature extraction module includes point cloud de-skewing and feature extraction. In the de-skewing part, the point clouds from a raw LiDAR scan are de-skewed using the pose from the IMU integration. In the feature extraction part, the roughness of the points is computed in order to extract the planar and edge features. Last, the key component of our LIO system is the mapping module. To remove the dynamic points from the local map online before the scan-to-map module, several critical steps are applied, as follows: (i) IMU odometry is used to estimate the initial pose. (ii) The raw LiDAR scan and the matching local map are each represented as pseudo occupancies. (iii) The main moving objects of the local map are detected by comparing the height difference in the pseudo occupancy of the scan and the local map. (iv) By comparing the dynamic observation numbers of map points with a threshold, we finally remove the primary moving points from the local map; after the historical keyframes are updated, the dynamic points in the previous keyframes can also be labeled. (v) If the current frame is determined to be a keyframe, only the LiDAR scan-to-map residuals with dynamic weights are optimized via the lightweight sliding window optimization method.

IMU Pre-Integration
The raw IMU measurements include the acceleration $\hat{\mathbf{a}}_B(t)$ and the angular velocity $\hat{\boldsymbol{\omega}}_B(t)$.
The measured values are all expressed in the IMU coordinate system B, which coincides with the frame of the motion body. The IMU measurement model is expressed as:
$$\hat{\boldsymbol{\omega}}_B(t) = \boldsymbol{\omega}_B(t) + \mathbf{b}_g(t) + \boldsymbol{\eta}_g(t)$$
$$\hat{\mathbf{a}}_B(t) = \mathbf{R}_{WB}^{T}(t)\left(\mathbf{a}_W(t) - \mathbf{g}_W\right) + \mathbf{b}_a(t) + \boldsymbol{\eta}_a(t)$$
where b is the bias and η is white noise; both affect the raw measured values of the IMU. The gravity vector is $\mathbf{g}_W = [0, 0, g]^T$ in the world frame W; it only affects the measurement of the accelerometer. In our work, we assume that i + 1 is the current frame and i is the previous frame. When $\hat{\boldsymbol{\omega}}_B^{i+1}$ and $\hat{\mathbf{a}}_B^{i+1}$ arrive, the initial pose state between frame i and frame i + 1 can be estimated using the IMU pre-integration approach:
$$\mathbf{R}_{WB}^{i+1} = \mathbf{R}_{WB}^{i}\,\mathrm{Exp}\!\left((\hat{\boldsymbol{\omega}}_B^{i+1} - \mathbf{b}_g)\Delta t\right)$$
$$\mathbf{v}_{WB}^{i+1} = \mathbf{v}_{WB}^{i} + \mathbf{g}_W \Delta t + \mathbf{R}_{WB}^{i}(\hat{\mathbf{a}}_B^{i+1} - \mathbf{b}_a)\Delta t$$
$$\mathbf{p}_{WB}^{i+1} = \mathbf{p}_{WB}^{i} + \mathbf{v}_{WB}^{i}\Delta t + \frac{1}{2}\mathbf{g}_W \Delta t^2 + \frac{1}{2}\mathbf{R}_{WB}^{i}(\hat{\mathbf{a}}_B^{i+1} - \mathbf{b}_a)\Delta t^2$$
where $\mathbf{R}_{WB}^{i+1}$ is the attitude rotation, $\mathbf{v}_{WB}^{i+1}$ is the velocity and $\mathbf{p}_{WB}^{i+1}$ is the position.
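The propagation above can be sketched as a simple discrete-time update. The following is a minimal Euler-integration sketch, not the authors' implementation: noise terms are dropped, biases are passed in explicitly, and a particular gravity sign convention is assumed.

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: map a rotation vector to a rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-10:
        return np.eye(3)
    a = phi / theta
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def imu_propagate(R, v, p, acc, gyr, b_a, b_g, g_W, dt):
    """One Euler step of IMU state propagation between frames i and i+1
    (bias-corrected measurements, noise terms dropped)."""
    a_W = R @ (acc - b_a) + g_W                 # acceleration in the world frame
    R_next = R @ so3_exp((gyr - b_g) * dt)      # attitude update
    v_next = v + a_W * dt                       # velocity update
    p_next = p + v * dt + 0.5 * a_W * dt * dt   # position update
    return R_next, v_next, p_next
```

In practice, the pre-integrated quantities are recomputed only when the bias estimates change; this sketch simply propagates the state one measurement at a time.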

Indexed Point Initialization
In our work, we concentrate on feature point clouds in keyframes. We denote the edge and planar features extracted from a LiDAR scan at time step t + 1 as $F^e_{t+1}$ and $F^p_{t+1}$, respectively. All the features extracted at time t + 1 compose a LiDAR keyframe $F_{t+1}$. A number of keyframes of edge points close to the current moment t + 1 together constitute the local edge point cloud map $M^e_t$ corresponding to the current frame, where $M^e_t = \{F^e_{t-i}, F^e_{t-i+1}, \cdots, F^e_{t-1}, F^e_t\}$ and i + 1 indicates the number of frames that make up the current local edge point map. The local planar point cloud map $M^p_t$ is defined similarly. In order to better locate a point and then update its status, a new type of point is defined. This operation of storing extra information in addition to the 3D position of a point is motivated by the directed geometry point [41]. In our work, a point P is a vector of size 7, which stores, in addition to the 3D position, index information (such as the source keyframe of the point and its index within that keyframe) and status information (such as its dynamic observation number). We do not initialize every LiDAR scan. When a current LiDAR scan is selected as a keyframe, we initialize the points of this keyframe, because the local map is composed of multiple keyframes and the state of the feature points in keyframes is what we are concerned with. The difference between a common point and an indexed point is shown in Figure 2. In Figure 2a, the local map consists of multiple keyframes composed of common points. In Figure 2b, because of the indexed points, we can know from which keyframe the points on the local map come. This structure is used in Section 2.4.
Figure 2. The difference between common points and indexed points. In (a), the local map consists of multiple keyframes composed of common points. In (b), the local map consists of multiple keyframes composed of indexed points. In this way, we can know from which keyframe the points on the local map come.
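As an illustration, an indexed point can be modeled as a small record. The field names below are assumptions (the paper specifies only a 7-element vector); they are chosen so that a local-map point can be traced back to its source keyframe and its status updated in place.

```python
from dataclasses import dataclass

@dataclass
class IndexedPoint:
    """A feature point that stores index and status fields alongside its
    3D position; field names are illustrative, not from the paper."""
    x: float
    y: float
    z: float
    keyframe_id: int       # which keyframe the point was extracted from
    point_id: int          # index of the point within that keyframe
    don: int = 0           # dynamic observation number (see Section 2.4)
    is_dynamic: bool = False

def source_point(keyframes, pt):
    """Locate the original point in its keyframe via the stored indices."""
    return keyframes[pt.keyframe_id][pt.point_id]
```

With common points this back-reference is impossible: once points are merged into the local map, their keyframe of origin is lost, which is exactly what the indexed point is designed to preserve.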

Dynamic Point Detection
For dynamic point detection in high-dynamic urban environments during the SLAM process, the offline post-processing ERASOR [29] method is optimized and integrated into the online SLAM system. Figure 3 gives an overview of the online dynamic point detection method.

Problem Definition
We first denote the current LiDAR frame and the corresponding voxel submap at time step t + 1 as $F^B_{t+1}$ and $M^W_t$, respectively. The former is a full query LiDAR scan and the latter is a feature map composed of keyframes around $F_{t+1}$, found by the sliding window method. Note that $M^W_t$ is in the world frame W, whereas $F^B_{t+1}$ is in frame B. In the ERASOR algorithm, the point clouds of the local map need to be transformed to frame B, and the offline local map needs to be constructed in advance. This takes substantial computational time because of the large number of point clouds in the local map. In order to improve the real-time performance of the system, we use the same local map as the scan-to-map module, transform the frame $F^B_{t+1}$ to the world coordinate system W before the moving point detection to obtain $F^W_{t+1}$, and record the current position of the motion robot $p^W_{t+1}$ in frame W. This can be formulated as follows:
$$F^W_{t+1} = {}^W_B T_{t+1} \cdot F^B_{t+1}$$
where ${}^W_B T_{t+1}$ is the relative transformation between $F^B_{t+1}$ and $M^W_t$ from IMU odometry; Equation (7) represents the conversion of all point cloud coordinates from frame B to frame W.
Figure 3. Overview of the online dynamic point detection method. First, we construct the R-POD for the current frame and the local map, respectively. Then, we compare the height difference in the pseudo occupancy of the scan R-POD and the local map R-POD and obtain the dynamic pseudo occupancy of the local map. Last, we use the R-GPF method to obtain the ground plane, and the points above the ground are dynamic points.
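The frame conversion can be sketched as follows, assuming the IMU odometry pose is given as a 4x4 homogeneous matrix:

```python
import numpy as np

def transform_scan_to_world(points_B, T_WB):
    """Transform an (N, 3) scan from body frame B to world frame W using a
    4x4 homogeneous transform T_WB (e.g., from the IMU odometry)."""
    pts_h = np.hstack([points_B, np.ones((len(points_B), 1))])  # (N, 4)
    return (pts_h @ T_WB.T)[:, :3]
```

Transforming the single scan into W (rather than transforming the whole local map into B, as offline ERASOR does) is what keeps this step cheap.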

Pseudo Occupancy-Based Removal Method
First, we divide the LiDAR frame $F^W_{t+1}$ and the submap $M^W_t$ into region-wise pseudo occupancy descriptors (R-POD) $O^F_{t+1}$ and $O^M_t$, respectively, in a process similar to Scan Context, where $r_N$ is the ring number and $\theta_N$ is the sector number. A bin $O_{(i,j)}$ of $O^F$ or $O^M$ includes the point clouds that meet the following conditions:
$$O_{(i,j)} = \left\{ p = (x, y, z) \;\middle|\; (i-1)\frac{R_l}{r_N} \le \rho(p) < i\frac{R_l}{r_N},\; (j-1)\frac{2\pi}{\theta_N} \le \phi(p) < j\frac{2\pi}{\theta_N} \right\}$$
where $\rho(p) = \sqrt{x^2 + y^2}$, $\phi(p) = \arctan2(y, x) + \pi$ and $R_l$ is the maximum detection distance of the LiDAR. Then, we calculate the maximum height difference $\Delta h^M_{(i,j)}$ and $\Delta h^F_{(i,j)}$ in each bin. Thereafter, we can obtain the potentially dynamic bins of the local map by comparing these height differences: a bin is regarded as potentially dynamic when the height difference in the map descriptor is significantly larger than that in the scan descriptor. Finally, we use region-wise ground plane fitting (R-GPF) [29] to obtain the ground points and the ground plane in each dynamic bin. The dynamic points of a dynamic bin m are the points in the bin excepting the ground points, which form the set $M^m_D$, and all dynamic bins together yield the final dynamic point set $M_D = \bigcup_m M^m_D$.
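A minimal sketch of the R-POD construction and the bin comparison follows. The bin counts, range and thresholds are illustrative assumptions, not the paper's parameter values.

```python
import numpy as np

def rpod_height_diff(points, r_n=20, theta_n=60, r_l=80.0):
    """Region-wise pseudo occupancy descriptor: for every (ring, sector) bin,
    the maximum height difference of the (N, 3) points falling inside it."""
    rho = np.hypot(points[:, 0], points[:, 1])
    phi = np.arctan2(points[:, 1], points[:, 0]) + np.pi   # in [0, 2*pi]
    keep = rho < r_l
    ring = np.minimum((rho[keep] / r_l * r_n).astype(int), r_n - 1)
    sector = np.minimum((phi[keep] / (2 * np.pi) * theta_n).astype(int), theta_n - 1)
    z_min = np.full((r_n, theta_n), np.inf)
    z_max = np.full((r_n, theta_n), -np.inf)
    for i, j, h in zip(ring, sector, points[keep, 2]):
        z_min[i, j] = min(z_min[i, j], h)
        z_max[i, j] = max(z_max[i, j], h)
    return np.where(np.isfinite(z_min), z_max - z_min, 0.0)

def potentially_dynamic_bins(dh_map, dh_scan, min_h=0.2, ratio=0.2):
    """Flag bins whose scan height difference is much smaller than the map's
    (threshold values here are assumptions)."""
    return (dh_map > min_h) & (dh_scan < ratio * dh_map)
```

A car that has driven away leaves tall non-ground points in the map descriptor but not in the scan descriptor, so its bin is flagged and handed to R-GPF.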

Dynamic Point Propagation and Removal
ERASOR [29] is an offline method that uses a long-sequence local map. However, we need to eliminate dynamic points online during SLAM, where only short sequences are available, which decreases the performance of dynamic point removal. Therefore, we propose a dynamic point removal method based on the spatial and temporal dimensions to improve the removal performance.
In our work, unlike RF-LIO, we do not remove dynamic points from the submap immediately after obtaining the dynamic point set $M_D$ from the dynamic point detection module. We judge whether a feature point is dynamic not only according to the difference between the current LiDAR scan and the submap but also via the history of such differences. We propagate these points to the historical keyframes based on the indexed points of Section 2.2 and then perform the dynamic point removal. If a feature point is truly dynamic, the frequency with which it is judged to be dynamic will be high. Thus, the dynamic observation number (DON) of a point is used to determine whether the point is dynamic in the temporal dimension. The dynamic point propagation and removal process is summarized in Algorithm 1. In this algorithm, $\tau_D$ is a threshold used to distinguish dynamic feature points from static feature points. If $\tau_D$ is small, many static points will be misclassified as dynamic points; if $\tau_D$ is large, many dynamic feature points cannot be removed. In our system, $\tau_D$ is set to 3, an empirical value that ensures a balance between the removal of dynamic points and the retention of static points.
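The propagation and removal loop can be sketched as follows. The exact bookkeeping of Algorithm 1 is assumed here; points are represented as plain dictionaries for brevity.

```python
def propagate_and_remove(local_map, keyframes, dynamic_ids, tau_d=3):
    """Every local-map point flagged as dynamic in the current detection has
    its DON incremented in its source keyframe; once a point's DON reaches
    tau_d it is labeled dynamic there and dropped from the local map."""
    kept = []
    for pt in local_map:                 # pt: {'kf': keyframe id, 'idx': point id}
        src = keyframes[pt['kf']][pt['idx']]
        if (pt['kf'], pt['idx']) in dynamic_ids:
            src['don'] += 1              # one more dynamic observation
        if src['don'] >= tau_d:
            src['dynamic'] = True        # delayed labeling in the keyframe
        else:
            kept.append(pt)
    return kept
```

With tau_d = 3, a point must be observed as dynamic in three detections before it is removed, which is what protects briefly misclassified static points.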

LiDAR Odometry
This module includes scan-to-map matching and front-end optimization. To describe this section clearly, we start with the scan-to-map matching module, which is followed by the front-end optimization module.

Feature-Based Scan-to-Map Matching
The sensor motion between two consecutive LiDAR scans is estimated using the front end of SLAM. There are a variety of scan-matching methods, and the fundamental method of this module is the same as LOAM. We compute the roughness of each point in each LiDAR frame and then extract edge and planar features for the new scan. The formula for calculating roughness is as follows:
$$c = \frac{1}{|S| \cdot \|p_i\|}\,\Big\| \sum_{j \in S,\, j \neq i} (p_i - p_j) \Big\|$$
where $p_i$ represents the target point and the points $p_j$ in the neighborhood S lie in the same ring of the LiDAR scan. The edge and planar features from LiDAR frame t + 1 are denoted as $F^e_{t+1}$ and $F^p_{t+1}$, respectively. Then, the feature points in $F^e_{t+1}$ and $F^p_{t+1}$ are transformed to W to obtain $^W F^e_{t+1}$ and $^W F^p_{t+1}$; IMU pre-integration provides the initial transformation. By using the nearest-neighbor search method, we locate the correlating feature points in $M^e_t$ and $M^p_t$. Thus, the following formula can be used to calculate the distance between a target point and its related edge:
$$d_e = \frac{\left| (p^{e_i}_{t+1} - m^{e_1}_t) \times (p^{e_i}_{t+1} - m^{e_2}_t) \right|}{\left| m^{e_1}_t - m^{e_2}_t \right|}$$
where $p^{e_i}_{t+1} \in {}^W F^e_{t+1}$ is an edge feature point and $m^{e_1}_t, m^{e_2}_t \in M^e_t$ are two different points on the corresponding edge line. Similarly, the distance from a target planar point $p^{p_j}_{t+1}$ to its associated plane through three points $m^{p_1}_t, m^{p_2}_t, m^{p_3}_t \in M^p_t$ can be calculated as follows:
$$d_p = \frac{\left| (p^{p_j}_{t+1} - m^{p_1}_t) \cdot \left( (m^{p_1}_t - m^{p_2}_t) \times (m^{p_1}_t - m^{p_3}_t) \right) \right|}{\left| (m^{p_1}_t - m^{p_2}_t) \times (m^{p_1}_t - m^{p_3}_t) \right|}$$
At last, we can estimate the pose between the current frame and the local map by solving the optimization problem:
$$\min_{T_{t+1}} \Big\{ \sum_{p^{e_i}_{t+1}} d_e + \sum_{p^{p_j}_{t+1}} d_p \Big\}$$
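The two LOAM-style residuals can be written compactly; the following sketch implements the point-to-edge and point-to-plane distances used above:

```python
import numpy as np

def point_to_edge_dist(p, m1, m2):
    """Distance from point p to the edge line through map points m1 and m2."""
    return np.linalg.norm(np.cross(p - m1, p - m2)) / np.linalg.norm(m1 - m2)

def point_to_plane_dist(p, m1, m2, m3):
    """Distance from point p to the plane through map points m1, m2 and m3."""
    n = np.cross(m2 - m1, m3 - m1)          # plane normal (unnormalized)
    return abs(np.dot(p - m1, n)) / np.linalg.norm(n)
```

In the optimization, these scalar distances are evaluated at the current pose estimate and driven toward zero over all edge and planar correspondences.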

Front-End Optimization
Within the scan-to-map part, the local map is used with the dynamic points removed. However, the LiDAR scan itself still contains dynamic points, because a single LiDAR frame is too sparse to apply the dynamic point removal method described above [42]. We therefore propose a delayed removal strategy for keyframes and, motivated by VINS-Mono, build a cost function that includes only LiDAR measurements with a dynamic observation-related weight. To obtain more accurate odometry for each LiDAR keyframe, we iteratively optimize the pose states in a lightweight sliding window. In our LIO system, the states to be optimized in the sliding window are defined as
$$\mathcal{X} = [x_{t-n}, x_{t-n+1}, \cdots, x_{t}]$$
where we optimize the poses of the n keyframes before the current moment t. For a sliding window of size n, these states are obtained by minimizing the final cost function, which is described as:
$$\min_{\mathcal{X}} \sum_{k=t-n}^{t} \Big\{ \sum_{i} w_i \left\| r^e_L(x_k) \right\|^2 + \sum_{j} w_j \left\| r^p_L(x_k) \right\|^2 \Big\}$$
In this cost function, $r^e_L(x_k)$ denotes the edge-line geometric terms of the LiDAR and $r^p_L(x_k)$ the planar LiDAR error terms. Based on the per-scan LiDAR term calculation above, the LiDAR term incorporates geometric constraints with dynamic weights from the dynamic point propagation and removal module (see Section 2.4) into the cost function scheme, where the residuals are the distance $d^e_i$ from the i-th target edge point to its corresponding edge and the distance $d^p_j$ from the target planar point to its corresponding plane in Section 2.5.1. The dynamic observation-related weight $w_i$ is a monotonically decreasing function of $n^i_{do}$, the number of dynamic observations of the i-th feature point. In this paper, we set the value of the dynamic weight according to the DON: the higher the DON, the less important the corresponding residual d is in the cost function r.
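A sketch of the dynamic-weighted LiDAR cost follows. The exact form of the weight function is an assumption; the paper specifies only that the weight decreases as the DON grows.

```python
def dynamic_weight(n_do):
    """Illustrative weight, assumed form 1/(1 + DON): a feature observed as
    dynamic more often contributes less to the cost."""
    return 1.0 / (1.0 + n_do)

def weighted_lidar_cost(edge_res, edge_don, planar_res, planar_don):
    """Sum of squared edge/planar residuals, each scaled by its DON weight."""
    cost = sum(dynamic_weight(n) * d * d for d, n in zip(edge_res, edge_don))
    cost += sum(dynamic_weight(n) * d * d for d, n in zip(planar_res, planar_don))
    return cost
```

Down-weighting rather than hard-deleting lets residuals from not-yet-removed dynamic points fade out gracefully until the delayed removal drops them from the keyframe.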

Experimental Setup
We validate our ID-LIO system by conducting a set of experiments on public datasets, and ID-LIO is contrasted with FAST-LIO2, Faster-LIO and LIO-SAM. ID-LIO employs the same feature extraction module, loop closure detection strategy and GTSAM-based back-end optimization method as LIO-SAM. We use evo [43] to evaluate the accuracy. An ablation experiment demonstrates the effect of our proposed methods. The proposed methods are executed on a computer with an Intel i5 CPU and 32 GB RAM, running Ubuntu 18.04 with the Robot Operating System (ROS) Melodic [44]. The datasets used for validation include the UrbanNav datasets in Hong Kong (UNHK) [45] and the UrbanLoco datasets in California (ULCA) [46]. UrbanNav-HK-Data20190428 is a dataset containing some movable objects and is defined as a low-dynamic dataset. We define the others in UNHK and ULCA as high-dynamic datasets because they contain masses of movable objects. The data information is listed in detail in Table 1. The sensor suite of the UNHK dataset is shown in Figure 4.

Dynamic Observation Number Analysis
The choice of an appropriate DON value is critical to the removal results for dynamic objects. Consequently, the purpose of this experiment is to demonstrate that the approach described in Section 2.4 is accurate. Setting the correct DON threshold is important: if the DON value is too low, static points may be incorrectly regarded as dynamic points; if it is too large, many moving points may be wrongly identified as static points. In the implementation of our algorithm, we set the DON value to 3, which ensures that static points and dynamic points are correctly distinguished. To further prove the rationality of this threshold value, we chose three dynamic urban datasets and tested our system using thresholds from 1 to 6, where DON1 to DON6 denote DON values of 1 to 6. The root mean square error (RMSE) of the absolute trajectory error (ATE) is shown in Table 2, which demonstrates that the fixed threshold of 3 provides more reliable and accurate pose estimation. ULCA-MarktStreet was chosen for further analysis. Figure 5 shows the trajectories of different DONs with our system on ULCA-MarktStreet in an X-Y plot for thresholds of 1, 3 and 6, illustrating the total error for the three distinct thresholds. It can be seen that the trajectory corresponding to DON3 is closest to the ground truth because the DON3 system achieves a balance between dynamic point removal and static point retention. The box diagram of the deviation is shown in Figure 6, which gives a better view of the magnitude of the deviation; the DON3 system has the smallest error.
As shown in Figure 7, there are two kinds of red point sets in the figures. Red points on the street indicate dynamic points from moving cars; the more of these red points, the better. However, red points in other places indicate static points that have been incorrectly removed; the fewer of these, the better. If the threshold is 1, more dynamic points are removed, but at the same time many static points are incorrectly removed. If the threshold is 6, more static points are preserved, but many dynamic points are also preserved in comparison with DON1. Neither of these two extreme cases is suitable for dynamic point removal. DON3, however, achieves better rejection results: it retains most of the static points of the local map while removing more dynamic points.

Comparison of Dynamic Removal Strategies for Local Map
To further demonstrate the efficacy of the proposed removal strategy based on pseudo occupancy and indexed points in the spatial and temporal dimensions (denoted as LIO-SAM + Spatial + Temporal), we compare it with the removal strategy based on pseudo occupancy in the spatial dimension only (denoted as LIO-SAM + Spatial). The ATEs of the mapping thread are shown in Table 3. Figure 8a shows the dynamic point removal results obtained by these two methods; the red points on the street are dynamic traces left by moving cars. The spatial-only method identifies only part of the dynamic feature points (Figure 8b). In contrast, our method based on the spatial and temporal dimensions successfully removes most of the dynamic traces. After the trajectories of both systems are aligned with the ground truth, we acquire the error maps for these two approaches, which are illustrated in Figure 9.
Table 3. RMSE of ATE for LIO-SAM with the dynamic removal method based on the spatial dimension (denoted as LIO-SAM + Spatial) and LIO-SAM with the dynamic removal method based on the spatial and temporal dimensions (denoted as LIO-SAM + Spatial + Temporal).

Comparison of Delayed Removal Strategy for Keyframe
During scan-to-map matching, dynamic points both in the map and in the LiDAR scan can result in significant deviation. From the moving points on the local map, we can obtain the dynamic points in keyframes. When we perform the sliding window optimization, the dynamic LiDAR terms obtain the dynamic weights. Although we cannot remove dynamic points in the current keyframe, they will be removed in the next sliding window optimization. Therefore, we conducted this experiment to prove the efficiency of the delayed removal strategy. The ATEs of the mapping thread are shown in Table 4. Figure 10 shows the trajectories of our system with and without the delayed removal strategy. Figure 11 shows the error maps of our system with and without the strategy. Figure 12 shows the removal result: Figure 12a,b shows the real environment with moving cars and people corresponding to the current frame; Figure 12d shows a keyframe with red dynamic points after the delayed removal strategy, where the dynamic cars and people are marked with red points; and Figure 12c shows a static keyframe after removing the dynamic points of this keyframe.
The goal of this experiment is to demonstrate the effectiveness of the delayed removal method for dynamic points in historical keyframes. We use the dynamic point propagation method to obtain the dynamic points in historical keyframes based on the dynamic local-map feature points. Figure 12f shows the dynamic traces on the local map. Figure 12e shows the dynamic points in a keyframe obtained by the dynamic point propagation method, for example, the red points marked by the yellow rectangle, where five people are walking on the road. These red points are part of the red points in the yellow rectangle in Figure 12f. Similarly, the car in the yellow oval in Figure 12e corresponds to part of the red points in the yellow oval in Figure 12f. The error maps of ID-LIO with and without the delayed removal strategy are shown in Figure 11; the trajectory of our system with this method is closer to the ground truth.

Results on Low- and High-Dynamic Datasets
We evaluate our ID-LIO system on six datasets with ground truths from the UrbanNav and UrbanLoco datasets, and we compare the results with LIO-SAM, FAST-LIO, Faster-LIO and the ground truths. For quantitative evaluation, we computed the ATE for these methods on the different datasets using the evaluation tool evo, and the results are shown in Table 5. After the trajectories of these four systems are aligned with the ground truth, we acquire the error maps for these approaches. These are illustrated in Figure 13, from which we can see that the error of each position calculated by our system is less than 1 m. Finally, we compare the differences between the point cloud maps of FAST-LIO and ID-LIO in Figure 14. FAST-LIO is the least effective because it does not have back-end loop closure detection, and LIO-SAM is unable to detect the loop closures accurately because there are many dynamic objects in the urban environment, resulting in inaccurate poses obtained by scan-to-map matching. Our ID-LIO adopts a dynamic point detection and rejection algorithm and achieves better loop closure results compared to LIO-SAM.

Runtime Performance Analysis
Real-time performance is also a critical criterion when evaluating SLAM systems in practice. We record the time consumption of ERASOR, LIO-SAM and ID-LIO on the six datasets. The runtime of ERASOR is more than 100 ms because of the time spent on map construction and point cloud coordinate conversion. LIO-SAM is a real-time SLAM system that processes each frame in less than 100 ms. ID-LIO is a system with a dynamic point removal module that still runs in real time on five of the datasets. ULCA-MarktStreet is a highly dynamic and large-scale dataset; therefore, the runtime of ID-LIO on it is more than 100 ms. The runtime results are shown in Table 6.
Table 6. Average runtime of the three systems for processing a scan (ms).

Discussion
In this paper, we proposed an improved and robust LIO system for dynamic environments. A dynamic point removal method based on pseudo occupancy and indexed points is applied to the local map before scan-to-map registration; it reduces the influence of dynamic points on feature matching and improves pose accuracy. In addition, we proposed a delayed removal strategy for keyframes, and the sliding window-based optimization incorporates the LiDAR measurements with dynamic weights to reduce the error from dynamic points in keyframes, which further improves the pose accuracy. ID-LIO meets the real-time, accuracy and robustness requirements of SLAM. Compared with the offline ERASOR algorithm, our algorithm is 40% faster. In addition, the results of our dynamic object removal algorithm are better than those of ERASOR: we remove more dynamic points while preserving more static points, as shown in Figure 9. The average RMSE of the ATE of our ID-LIO is improved by 67% and 85% on the UrbanLoco-CAMarketStreet dataset and the UrbanNav-HK-Medium-Urban-1 dataset, respectively, compared with LIO-SAM. The dynamic point removal module enhances the robustness and accuracy of ID-LIO in highly dynamic environments. Compared with FAST-LIO and Faster-LIO, the average RMSE of the ATE of our ID-LIO is improved by 88% and 89%, respectively, on the UNHK-TST dataset.
For future work, we note that it is important to improve the real-time performance of the LIO system in highly dynamic and large-scale environments. On the ULCA-MarktStreet dataset, the runtime of our system is more than 120 ms, because we repeatedly construct local maps for scan-to-map registration and pseudo occupancy maps for dynamic point removal, which is time-consuming. In the future, we will focus on spatial data structures for local map addition and deletion to improve the system's efficiency. It is also important to note that the LIO system is susceptible to Z-direction drift in large-scale scenarios; in the future, a ground constraint will be added to reduce drift along the Z-axis.