Short-range Lidar SLAM utilizing localization data of monocular localization

Simultaneous localization and mapping (SLAM) is a widely used technology in autonomous mobile robots, where sensors such as Lidar or cameras are typically used. Sensor fusion using multiple sensors has been employed to compensate for the shortcomings of each sensor in SLAM. However, the sensor cost cannot be ignored when considering its practical usage. Therefore, this study aims at realizing a high-precision SLAM using a sensor switching system, combining multiple low-cost sensors. The sensor switching system consists of a low-cost Lidar SLAM and a monocular localization. Since a low-cost Lidar has a short laser range, degeneracy often occurs due to the fact that they cannot capture features while building maps. The proposed system uses localization data from monocular localization to ensure precision in regions where degeneracy occurs. The proposed system was evaluated through the simulation assuming the museum environment where the degeneracy occurred. The accuracy of the robot trajectory and the built map proved the effectiveness of the proposed system.


Introduction
Autonomous mobile robots are used in various environments and applications, such as transport robots in factories and service robots in facilities. These robots can solve the labor shortage issue caused by the declining birth rate and aging population, save labor, and improve efficiency by automating tasks. In recent years, due to the COVID-19 pandemic, the demand for autonomous mobile robots that can replace human labor is expected to increase. In addition, a more accurate movement of the robots and a reduction in the installation costs are expected.
A high-accuracy simultaneous localization and mapping (SLAM) system [1] is required for stable and accurate autonomous movement in practical applications. Furthermore, low-cost sensors are desirable for SLAM. Typical sensors used in SLAM are Lidar and cameras.
Lidar can easily realize high-accuracy SLAM as they capture the distance to the surrounding environment by laser irradiation to acquire point cloud data of the surrounding environment. However, Lidar with a wide range (hereinafter referred to as long-range Lidar) is expensive.
In contrast, low-cost cameras can acquire information over long distances because they identify the surrounding environment based on information available through images. However, a monocular camera cannot directly measure the distance to the surrounding environment; thus, it is necessary to calculate the distance, which contributes to the inaccuracy of SLAM. RGB-D cameras can acquire depth information; however, the accuracy is lower than that of Lidar and is more expensive than monocular cameras.

Open Access
*Correspondence: snakamura@hosei.ac.jp 1 Faculty of Science and Engineering, Hosei University, 3-7-2, Kajino-cho, Koganei, Tokyo 184-8584, Japan Full list of author information is available at the end of the article Some studies improved the SLAM performance with a monocular camera (hereinafter referred to as monocular SLAM) using Lidar. For example, in [7][8][9] the problem of uncertain map scale in monocular SLAM was solved using lasers. In contrast, several studies improved the performance of Lidar SLAM using a monocular camera. For example, focusing on the fact that the monocular camera has a high environment recognition performance, there are some studies that utilized the visual information of the camera for loop detection and scan matching in Lidar SLAM [10][11][12][13].
In addition, as an approach similar to the authors work described below, a study utilized the pose information from the monocular SLAM [14]. In reference [14], a calibration method that focuses on the robot trajectory is applied to the monocular SLAM, and therefore, could use the pose information of the monocular camera when Lidar SLAM is degraded. However, this method was based on the assumption that the map built by SLAM is accurate to a certain extent; this method can not be used in supplementing the poses in real time during SLAM. Therefore, when degeneracy occur during the map build, the accuracy of the map built by Lidar SLAM is degraded, which results in poor accuracy of the calibration; the supplement by the pose information of the monocular camera will not function accurately. Nevertheless, an approach to actively use Lidar with a narrow range (hereinafter referred to as "short-range Lidar") is quite practical in terms of reducing the cost.
This study proposes a low cost yet highly accurate Lidar SLAM under following two conditions: (1) Limited to extremely low-cost sensor configuration of short-range 2D Lidar and monocular camera, (2) Assume an indoor space where monocular SLAM is basically well performed but short-range Lidar SLAM degenerates in some areas.
Specifically, this is a method that maintains the accuracy of a short-range 2D Lidar SLAM operated as a base, by using the pose information from the monocular SLAM as the supplement in case the degeneracy is automatically detected. In this respect, the proposed method is a sensor switching method rather than a fusion in terms of localization.
Degeneracy refers to a situation in which scan-based pose estimation, such as scan matching [16], is not accurately performed. Degeneracy is more likely to occur in short-range Lidar with a narrow scan range because it is difficult to capture the features needed for posing. When degeneracy occurs, the robot misunderstand that a scan was obtained at the same pose even if the robot is moving; thus, a map that is shrunk in the direction of translation compared to the real environment is obtained. Although it is possible to suppress degeneracy to certain extent by using odometry, the accuracy of odometry cannot be ensured because of the cumulative error.
The structure of this paper is as follows. In "Localization supplement system by monocular localization", the authors propose a system that utilizes the pose information of the monocular localization in case the degeneracy occurs to maintain the accuracy of mapping build of the short-range Lidar SLAM. In "Simulation", the performance of the proposed system is evaluated through simulations. Finally, in "Conclusion", a summary of the study and future works are described. Figure 2 shows the configuration of the proposed system. The goal of the system is to generate the accurate map based on the short-range Lidar SLAM which could be adequately used for autonomous navigation of the robot  afterwards, even in case the degeneracy occurs. Thus, in environments where degeneracy occurs in short-range Lidar SLAM, the system considers using the pose information from monocular localization, after applying the coordinate system calibration method described below, instead of the pose information from Lidar SLAM. Figure 2 shows that the proposed system operates based on the short-range Lidar SLAM, and the calibrated pose from monocular localization is prepared as a substiture for Lidar pose in case the degeneracy occurs. It should be noted that this calibrated pose is automatically caliculated in realtime during the SLAM. In this sense, the proposed system could be categorized as the improved Lidar framework which improves the accuracy of Lidar SLAM owing to the use of camera in localization, as shown in Fig. 1. The details of the algorithm is as follows.

Algorithm
The algorithm of the proposed system is shown in Fig. 3. The proposed system can be divided into three units corresponding to Lidar SLAM, monocular localization, and calibration. Each unit is processed parallelly, and the pose information from the Lidar SLAM (Lidar pose), monocular localization (monocular pose), calibrated monocular localization (calibrated pose), and degeneracy detection information is transmitted and received between each unit. In the Lidar SLAM unit, scan and odometry are acquired, and scan matching is used to estimate the pose. In case degeneracy occurs, the calibrated pose is set to the Lidar pose (see Localization 1 of Figs. 2, 3) and used for map construction. On other hand, when degeneracy does not occur, the map is built based on the pose information obtained by scan matching.
The degeneracy detection is performed by focusing on the fact that the displacement of the pose is under estimated in the direction of translation or rotation when degeneracy occurs. In detail, the amount of displacement of the Lidar pose and the calibrated pose from previous time step to the present time step is calculated respectively, and in case their difference is larger than a threshold value, the degeneracy is assumed to be detected. However, since the calibrated pose is set to the Lidar pose in case degeneracy occurs as mentioned above (see Localization 1 of Figs. 2, 3), the problem that raw Lidar pose can not be used for the degeneracy detection arise. To deal with this problem, the raw Lidar pose which is not overwritten in any case (see Localization 2 of Figs. 2, 3) is also prepared and it was used for degeneracy detection.
Thus, the degeneracy detection based on the Lidar pose (of Localization 2) and the calibrated pose at the timing when the robot acquires the k th pose is expressed (1) and (2), and the rotation distance is calculated by Eqs.
(3) and (4), using (x l k , y l k , θ l k ) and (x c k , y c k , θ c k ) , respectively. Here, k − 1 represents the information from previous time step.
Then, the past n times average of the difference in travel distances between the Lidar pose and the calibrated pose is calculated, as shown in Eq. (5) and Eq. (6).
In case the estimated travel distance derived by the Lidar pose becomes smaller due to degeneracy, there will be a certain difference from that derived by calibrated pose. Therefore, when r k in Eq. (5) or �φ k in Eq. (6) becomes larger than a certain value, it indicates that degeneracy has occurred. Note that we applied a quite simple degeneracy detection method in this study and, if necessary, methods such as [17] and [18] could be applied to improve the performance.
In this paper, the pose of n times is required for the degeneracy detection, and there is a difference between the time when robot entered the degenerate environment and the time when the detection is made. This leads to using inaccurate pose while this n times delay. To solve this problem, the calibrated pose is always stored up to n time steps ago, and when the detection result switches from "non-degenerate" to "degenerate", the Lidar pose is reset up to n time steps ago with these stored calibrated poses.
The monocular localization unit estimates the robot's pose from the acquired image and transmits the monocular pose to the calibration unit. In this case, the monocular SLAM also prepares and executes the map in advance.
The calibration unit obtains the degeneracy detection information from the Lidar SLAM unit. If degeneracy does not occur, the estimated values of the transformation parameters are updated using the Lidar pose obtained from the Lidar SLAM unit and monocular pose obtained from the monocular localization unit according to the coordinate calibration method described below. In contrast, if degeneracy occurs, the estimated values of the transformation parameters are not updated. Finally, the estimated transformation parameters are applied to the monocular pose to derive the calibrated pose, which is transmitted to the Lidar SLAM unit.

Coordinate calibration method
The coordinate systems of the map built by Lidar SLAM and monocular SLAM are different. A monocular camera cannot estimate the physical distance due to its structure and is processed as a dimensionless quantity. Moreover, the origin and direction of each axis of the coordinate systems depend on pose of the sensors with respect to the robot. Hence, calibration between the coordinate systems is required in order to convert the monocular pose into the Lidar pose.
In this study, an automatic calibration method that uses only the robot's pose information is developed based on practicality. Although several coordinate calibration methods have been currently proposed [14,15,[19][20][21], an automatic calibration method was developed referring to the method [21] and adding a scale estimation function to it.
A visualization of the automatic calibration is shown in Fig. 4. The coordinate systems of Lidar map and monocular map are represented as O l and O m , and the position coordinates by (x l , y l ) and (x m , y m ) , respectively. The robot trajectory in each coordinate system was recorded to automatically estimate the transformation parameters between each coordinate system. The transformation parameters were estimated using the same pose Here, the calibration from the monocular map coordinate system to the Lidar map coordinate system is considered. Using the monocular map coordinate system, the Lidar map coordinate system can be expressed as Eq. (7) with S as the scaling matrix, R as the rotation matrix, and T as the translation vector.
Therefore, in detail, the Lidar map coordinate system has a relation with the pose coordinates of the monocular map coordinate system as shouwn in Eq. (8).
where S lm x and S lm y represent the scaling in the x and y directions, respectively; θ lm represents the rotation direction; T lm x and T lm y represent the translation in the x and y directions, respectively. Here, the transformation parameters are estimated by the least-squares method shown in Eq. (9) [19], where (x l k , y l k ) and (x m k , y m k ) represent the poses of the k robot in each coordinate system of Lidar map and monocular map, respectively.
If two or more pairs (n ≥ 2) of corresponding points are obtained from the previous robot trajectory, ∂ε 2 ∂T lm ∂θ lm = 0 becomes valid and the transformation parameters can be estimated in principle by solving this system of equations. However, this nonlinear system of equations cannot be solved analytically and can be computed iteratively using the Newton method. Therefore, the approximate solutions are obtained using Eq. (12), where the i-th approximate solution is Eq. (10) and the correction values is Eq. (11).
Here, the appropriate initial value is set to P 0 = (T lm x,0 , T lm y,0 , S lm x,0 , S lm y,0 , θ lm 0 ) , and the correction value P is calculated by Eq. (13), which is derived from Newton's x �T lm y �S lm x �S lm y �θ lm T (12) P i+1 = P i + P method. Y and X of Eq. (13) are calculated using Eq. (14) and (15), respectively.
The iteration is terminated when all of the element of correction value P becomes sufficiently small. Finally, the approximate solution thus obtained is substituted into Eq. (8) in order to be used for calibration of the coordinate system.

Simulation setup
In this simulation, GMapping [22] for short-range Lidar SLAM and ORB-SLAM2 [23] for monocular SLAM was used. The PC used for the simulation had the following specifications. CPU: Intel Core i7-8700K @ 3.70GHz, memory: 64GB, OS: Ubuntu 18.04, and the system was built using a robot operating system (ROS) [24]. Gazebo [25] was used for evaluating the performance of the proposed method.
The robot and sensors used in Gazebo are shown in Table 1 and Fig. 5. The robot used is a two-wheel drive with rear casters. The measurement distance of the short-range Lidar was set to 4 m, measurement range to 270deg, resolution of the monocular camera to 640 480, and fps to 30.
The simulated environment in Fig. 6 is assuming the museum and the mobile robot is expected to be used as guide robot. The blue line in Fig. 6a shows the visualization of the laser irradiation of Lidar. The environment in Fig. 6b represents the case the degeneracy occurs in the direction of translation, and that is Fig. 6c represents the case the degeneracy occurs in the direction of rotation. The robot was driven along the trajectory shown in black in Fig. 6b, c and the environmental map was built using the long-range Lidar, short-range Lidar, and proposed method, respectively. Here, the measurement distance of the long-range Lidar was set to 30 m. Moreover, the map in Fig. 7a, b was prepared in advance for the monocular localization.
In the environment shown in Fig. 6b, the robot first drives through the non-degenerate environment where the degeneracy does not occur, and then drives through the degenerate environment where degeneracy occurs in the short-range Lidar SLAM in the direction of translation (the area in the red box in Fig. 6b). Then it took the path of driving through the non-degenerate environment again. In the environment shown in Fig. 6c, the robot first drives through the non-degenerate environment as in Fig. 6b, and then performs rotational motion at a location where degeneracy occurs in the short-range Lidar SLAM in the direction of rotation (the area shown in the red frame in Fig. 6c). Then, it took the path of driving through the non-degenerate environment again.
In this simulation, n in Eq. (3) was set to 3, and degeneracy was considered to occur when r k is 0.1m or higher and phi k is 3deg or higher, based on a preliminary simulation results. The estimation of the transformation parameters begins when 30 or more corresponding poses are obtained, and the iterative calculation is terminated when all of the elements of the correction value �P = (�T lm x , T lm y , S lm x , S lm y ,�θ lm ) is less than 0.01. Figure 8 shows the created map and trajectory of the Lidar pose during the driving. Figure 8a, b, and c show the results obtained by the long range Lidar only, short range Lidar only, and the proposed method, respectively, when driving in the environment of Fig. 6b. Figure 8d, e, and f show those results when driving in the environment of Fig. 6c. In the built map, the white area indicates no obstacle, the black area indicates an obstacle, and the gray area indicates that the area has not been searched    Fig. 8a, was successful because the Lidar could measure the entire environment. However, the map built by the short-range Lidar SLAM did not match the wall on the right side, Fig. 8b. This is due to the degeneracy in the translation in the area in the red frame in Fig. 6b, resulting in a shrinking map. In contrast, Fig. 8c shows that the proposed method can maintain accuracy even in the degenerate environment. This is because degeneracy can be accurately predicted by the proposed system, and the pose supplemented by the calibrated pose prevents the map from shrinking.

Simulation results and evaluation
The distance error between the Lidar pose and true value is shown in Table 2. Table 2 shows that the MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error) values of the proposed method are smaller by 2.80 m and 3.86 m, respectively, compared to the short-range Lidar method, which confirms the improved localization accuracy of Lidar SLAM. Moreover, the MAE and RMSE of the proposed method were larger than those of the long-range Lidar by only 0.20 m and 0.25 m, respectively. Although the accuracy of the proposed method was slightly worse than that of long-range Lidar SLAM, it is expected that the proposed method can reach the accuracy of long-range Lidar with further improvements.
Next, the results of simulations conducted in Fig. 6c environment were as follows. The map built by the longrange Lidar SLAM, shown in Fig. 8d, was successful because the Lidar could measure the entire environment. However, the map built by the short-range Lidar SLAM is distorted as shown in Fig. 8e. This is due to the degeneracy in the direction of rotation in the area in the red frame in Fig. 6c, resulting in the reduced amount of rotation. In contrast, Fig. 8f shows that the proposed method can maintain accuracy even in the degenerate environment. This is because degeneracy can be accurately detected by the proposed system, and the pose supplement by the calibrated pose prevents the map from being distorted.
The distance error between the Lidar pose and true value is shown in Table 3. Table 3 shows that the MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error) values of the proposed method are smaller by 1.50 m and 3.40 m, respectively, compared to the short-range Lidar method, which confirms the improved localization accuracy of Lidar SLAM. Moreover, compared with long-range Lidar, the MAE of the proposed method was smaller by 0.01 m, and the RMSE was equal to the proposed method, indicating that the accuracy of the proposed method is almost equivalent to that of long-range Lidar SLAM.
Finally, further analysis of the internal behavior was conducted as follows.
First, we start from the case of simulation in Fig. 6b environment. Figure 9 shows the overall landscape of the robot and the environment, sensor information (laser scans) from the short-range Lidar SLAM and the longrange Lidar SLAM, and sensor information (images from the monocular camera) from the monocular localization at a few specific locations along the movement trajectory (indicated by in Fig. 9a). In particular, two locations are     Figure 10 shows the transition of the degeneracy detection index r k and the transformation parameters. Figure 12a shows the results of degeneracy detection at each location along the trajectory. The location inside the red frame in Fig. 6b, which is considered as the environment where degeneracy occurs, correponds to the area in the red frame in Fig. 12a, indicating that degeneracy occurs over a long period of time. The detail situation where the degeneracy occurs could be seen in Fig. 9. From Fig. 9k, 9o, it can be seen that the laser scan of the short-range Lidar has not changed in this area in terms of movement. This is a situation where pose translation cannot be estimated from the laser scan, which is nothing but degeneracy. On the other hand, from Fig. 9l, 9p and Fig. 9m, 9q, the laser scan of the long-range Lidar and the image of the monocular camera change with movement indicates that no degeneracy is occuring even in this area.
Next is the case of simulation conducted in Fig. 6c environment. Figure 11 shows the transition of the degeneracy detection index �φ k and the transformation parameters. Figure 12b shows the results of degeneracy detection at each location along the trajectory. The location inside the red frame in Fig. 6c, which is considered as the degenerate environment, corresponds to the area in the red frame in Fig. 12b, indicating that degeneracy occurs at those points. Therefore, the proposed method, which automatically detects degeneracy immediately and switches to the calibrated pose based on the pose information from the monocular localization with higher accuracy, is expected to work in these situations.

Conclusion
In this study, we proposed a short-range Lidar SLAM that utilizes localization data from the monocular localization as a supplement, with the goal of realizing a high-accuracy SLAM with low-cost sensors in environments where degeneracy occurs. Then, the accuracy of the proposed short-range Lidar SLAM was evaluated via a simulation in an environment where degeneracy occurs.