Scale Estimation and Correction of the Monocular Simultaneous Localization and Mapping (SLAM) Based on Fusion of 1D Laser Range Finder and Vision Data

This article presents a new sensor fusion method for visual simultaneous localization and mapping (SLAM) through integration of a monocular camera and a 1D-laser range finder. Such as a fusion method provides the scale estimation and drift correction and it is not limited by volume, e.g., the stereo camera is constrained by the baseline and overcomes the limited depth range problem associated with SLAM for RGBD cameras. We first present the analytical feasibility for estimating the absolute scale through the fusion of 1D distance information and image information. Next, the analytical derivation of the laser-vision fusion is described in detail based on the local dense reconstruction of the image sequences. We also correct the scale drift of the monocular SLAM using the laser distance information which is independent of the drift error. Finally, application of this approach to both indoor and outdoor scenes is verified by the Technical University of Munich dataset of RGBD and self-collected data. We compare the effects of the scale estimation and drift correction of the proposed method with the SLAM for a monocular camera and a RGBD camera.


Introduction
Simultaneous localization and mapping (SLAM) may help robots create their own maps while locating themselves in unknown areas by using specific sensors. Recently, there has been a huge scientific interest in a new artificial intelligence research topic, where the robot is capable of sensing the environment. In particular, the monocular SLAM has attracted a widespread attention thanks to its low cost, wide application, and rich information, and has recently made great strides. Similar to more popular cases, e.g., EKF-SLAM [1,2], ORB-SLAM [3,4] and LSD-SLAM [5], a complete monocular SLAM system is formed by the standard extended Kalman filter method, the feature point method or the direct method.
The monocular SLAM suffers from a particular disadvantage, whereby the camera cannot estimate the absolute scale and therefore, the scale will drift. Figure 1a shows the phenomenon of the scale drift in a very popular ORB-SLAM [1] based on a monocular camera, where no closed loop is detected. In order to solve this issue, we introduce a point distance information into the visual reconstruction to establish a relationship between the visual and real distance. The sensor is assembled by integration of the monocular vision and the 1D laser range finder (LRF). Figure 1b depicts the effect of the scale correction. However, this solution is different from others so that it does not suffer from high hardware costs, exhibits a large detection range, and its accuracy will not be affected by its volume. In the meanwhile, it will not be subject to additional restrictions, so it may be widely used in the construction and positioning process based on the visual 3D motion. In order to achieve this goal, we focus on the following two issues;

•
The absolute scale is estimated based on the fusion result of the 1D distance of the LRF and image of the monocular camera.

•
Correcting the scale drift of the monocular SLAM using the laser distance information which is independent of the drift error. depicts the effect of the scale correction. However, this solution is different from others so that it does not suffer from high hardware costs, exhibits a large detection range, and its accuracy will not be affected by its volume. In the meanwhile, it will not be subject to additional restrictions, so it may be widely used in the construction and positioning process based on the visual 3D motion. In order to achieve this goal, we focus on the following two issues; • The absolute scale is estimated based on the fusion result of the 1D distance of the LRF and image of the monocular camera.

•
Correcting the scale drift of the monocular SLAM using the laser distance information which is independent of the drift error. This paper is organized as follows: The existing methods related our work is outlined in the following section. Section 3 illustrates the proposed algorithm by mathematical description. In Sections 4 and 5, we evaluate the solution of the scale estimation and correction by deploying it on the indoor sequences from the Technical University of Munich (TUM) dataset [6] and actual sequences collected by our proposed approach. Finally, conclusions and future are provided in Section 6.

Related Work
As mentioned earlier, monocular SLAM method has specific weaknesses, where it cannot rely on the structure-from-motion (SFM) technique to restore the absolute scale of the object due to without extra benchmark in the real world [7]. In addition, due to the error accumulation, there is bound to be a drift between the current and the previous reconstructed scales. The drift level will increase in the subsequent process. If it is not corrected in time, the final result will be more serious. Many scholars have put forward constructive proposals for this defect as follows.
It is proposed to reduce the scale drift through uniform optimization of all data. The main idea is to combine the vision SLAM system with the front-end and back-end solutions, the front-end provides initial estimates, and the back-end solves the optimal solution of the overall solution by integrating all the data. For example, Strasdat et al. [8] constructed a monolithic iterative nonlinear optimizer that includes two parts of both bundle-adjustment to densely connected parts of the graph and pose-graph SLAM to weakly connected parts. In [9], Carlone used a closed-form optimization approach called linear approximation for the pose graph optimization (LOAGO). It converts the pose-graph problem and formulate it into a quadratic optimization task by relative orientation representation. On this basis, Dubbelman et al. [10] introduced the COP-SLAM method and extended it to an optimization problem in the 3D scene. These tasks represent significant improvements to the back-end and require high computational consumption. It is difficult to achieve good results for areas that have not yet formed a closed loop. Most importantly, it suffers from lack of estimating the absolute. This paper is organized as follows: The existing methods related our work is outlined in the following section. Section 3 illustrates the proposed algorithm by mathematical description. In Sections 4 and 5, we evaluate the solution of the scale estimation and correction by deploying it on the indoor sequences from the Technical University of Munich (TUM) dataset [6] and actual sequences collected by our proposed approach. Finally, conclusions and future are provided in Section 6.

Related Work
As mentioned earlier, monocular SLAM method has specific weaknesses, where it cannot rely on the structure-from-motion (SFM) technique to restore the absolute scale of the object due to without extra benchmark in the real world [7]. In addition, due to the error accumulation, there is bound to be a drift between the current and the previous reconstructed scales. The drift level will increase in the subsequent process. If it is not corrected in time, the final result will be more serious. Many scholars have put forward constructive proposals for this defect as follows.
It is proposed to reduce the scale drift through uniform optimization of all data. The main idea is to combine the vision SLAM system with the front-end and back-end solutions, the front-end provides initial estimates, and the back-end solves the optimal solution of the overall solution by integrating all the data. For example, Strasdat et al. [8] constructed a monolithic iterative nonlinear optimizer that includes two parts of both bundle-adjustment to densely connected parts of the graph and pose-graph SLAM to weakly connected parts. In [9], Carlone used a closed-form optimization approach called linear approximation for the pose graph optimization (LOAGO). It converts the pose-graph problem and formulate it into a quadratic optimization task by relative orientation representation. On this basis, Dubbelman et al. [10] introduced the COP-SLAM method and extended it to an optimization problem in the 3D scene. These tasks represent significant improvements to the back-end and require high computational consumption. It is difficult to achieve good results for areas that have not yet formed a closed loop. Most importantly, it suffers from lack of estimating the absolute.
In addition, some scholars have studied the use of specific prior information to estimate the absolute scale. The basic idea is to estimate the pose of a well know pattern or physical points in the real world. For this, the inverse EKF method [11,12] is more representative, and its key concept is the direct parametrization of the inverse depth of features relative to the camera locations from which they were first viewed. Currently, a classic approach assumes the camera is located at a fixed height above the ground plane to estimate the absolute scale of the environment [7,13,14]. Duncan [15] estimated the absolute scale and integrated it seamlessly into the beam adjustment by recognition of a prior on the size of the objects, (similar [16,17]). There is also a method for correcting the scale drift based on recognizing similar targets again [18]. However, the aforementioned methods require explicit a priori information and cannot be applied when the scene changes. In order to mitigate these constraints, some scholars have introduced the deep learning method into the visual SLAM [19][20][21]. Based on these predecessors, Laina [22] modeled the ambiguous mapping between the monocular images and the depth maps and realized the depth map of the scene given a single RGB image by a fully convolutional architecture encompassing residual learning. Next, Keisuke's results, published in 2017 [23], integrated the depth map predicted by the convolutional neural network and the depth map acquired by the monocular SLAM to estimate the absolute scale method. These methods are still in their infancy and have high requirements for computing platforms. In order to get more robust measurements, one require to train in advance through large amounts of data.
To achieve more extensive practicality and reduce the dependence on the scene in the real-time SLAM, some scholars prefer another scale estimation and drift correction scheme to employ additional sensors. Martinelli [24] estimated the absolute scale by merging a monocular camera, three orthogonal accelerometers and three orthogonal gyroscopes, analyzed all the observable modes, and established a fast closed-form solution. The similar work has been before [25,26]. One may also integrate other types of non-visual sensors, e.g., GPS [27,28], and wheeled odometers [29][30][31]. These combinations effectively exploit different measurement properties of multiple sensors, providing more dimensional support for the scale estimation. However, each sensor has a limitation on the scene, and the fusion will further restrict the application scenario.
Other people have explored ways to build the stereo vision. For instance, the widely used stereo-SLAM, which uses the real-scale information contained in the fixed baseline between the two cameras to locate and map [32,33]. The depth estimation and the mapping range of the binocular vision heavily rely on the baseline length and the calibration accuracy between the cameras [34]. On the other hand, with the advent of the RGB-D sensors which capture RGB images along with the per-pixel depth information. For instance, the Real Sense, Kinect and Xtion, rely on the combination of the depth image and the RGB image to realize the purpose of scale estimation and drift correction. Zhang et al. [35] applied the bidirectional Fast Library algorithm and the General Iterative Closest Points (GICP) to improve the accuracy of SLAM. Di and Zhao [36] proposed a method based on extended bundle adjustment with integrated 2D and 3D information on the basis of a new projection model. The SLAM method may be formed based on the RGB-D camera [37][38][39]. However, the depth image resolution of the RGBD cameras is not ideal and the depth of the imaging is limited. For example, Microsoft's Kinect v2 possesses a detection range of 0.5~4.5 m, so it may only be used in the small-scale scene as indoors.
Our goal is to design a SLAM providing a wide range of detection capabilities on the premise of satisfying low power consumption and small volume. Based on lidar odometry, laser ranging requires a high-precision distance sensor with large depth information. The 3D scanning radar in the real-time SLAM suffers from various disadvantages, e.g., huge point cloud data, high power consumption, large size and expensive price. Here we overcome these issues by combining the 1D laser distance information (LRF) and the image information. The small volume of the LRF model is shown as Figure 2a. Figure 2b shows the system with integrated sensors. Afterwards, we show how to use the distance information of one non-feature point in each frame to achieve absolute scale reduction and implement such a new SLAM scheme.

Method Description
This section systematically elaborates the proposed approach and the initialization including the scale estimation and the drift correction based on laser distance information. The main idea is to establish a correspondence between the one point distances and the matched feature points via quasi-dense reconstruction of the local surface around the laser spot. According to the correspondence, we then estimate the absolute scale and effectively correct the scale drift.

Initialization
The core idea of estimating the absolute scale is to estimate the relative pose of the camera using the SFM [40] and accurately estimate the depth of the space around the laser ranging point. As a result, we establish the absolute scale factor between the initial structure and the real world. However, the sparse point clouds based on the feature reconstruction cannot satisfy this correspondence. An accurate and dense reconstruction of the surface around the laser spot is required. Here, we use a method based on Furukawa's [41] dense reconstruction to accelerate the calculation of regions.
For a known collection of multi-frame set ={F } i Γ , where F i represents the data of i -th frame, including an image I i and distance of the laser i laser l . We select a frame in the middle as reference frame ref F and consider the world coordinate system as the camera coordinate system of the reference frame. Prior to this, we extracted the ORB-feature [42] of image {I } i and completed the matching. Following the SFM [40] method, the estimation of the fundamental matrix or homography is performed on the image sequence [4], and the pose relationship of all frames [ , ] i i s R t ( s is the scale factor) are optimized for achieving the bundle adjustment (BA) [43].

Initial Reconstruction
The coordinates of the laser spot is expressed as in the reference camera coordinate system. It may be calculated from the relative external parameters [ , ] laser laser R t of the camera and laser that have been previously calibrated, and the distance i laser l from the laser spot.
The image projection point may be represented as . We then create an Next, the area is divided into a grid with β pixels per side of the unit In the process of dense reconstruction, we ensure that each patch contains at least one successfully reconstructed patch, or multiple patches (see Figure 3).

Method Description
This section systematically elaborates the proposed approach and the initialization including the scale estimation and the drift correction based on laser distance information. The main idea is to establish a correspondence between the one point distances and the matched feature points via quasi-dense reconstruction of the local surface around the laser spot. According to the correspondence, we then estimate the absolute scale and effectively correct the scale drift.

Initialization
The core idea of estimating the absolute scale is to estimate the relative pose of the camera using the SFM [40] and accurately estimate the depth of the space around the laser ranging point. As a result, we establish the absolute scale factor between the initial structure and the real world. However, the sparse point clouds based on the feature reconstruction cannot satisfy this correspondence. An accurate and dense reconstruction of the surface around the laser spot is required. Here, we use a method based on Furukawa's [41] dense reconstruction to accelerate the calculation of regions.
For a known collection of multi-frame set Γ = F i , where F i represents the data of i-th frame, including an image I i and distance of the laser l i laser . We select a frame in the middle as reference frame F re f and consider the world coordinate system as the camera coordinate system of the reference frame. Prior to this, we extracted the ORB-feature [42] of image I i and completed the matching. Following the SFM [40] method, the estimation of the fundamental matrix or homography is performed on the image sequence [4], and the pose relationship of all frames [R i , st i ] (s is the scale factor) are optimized for achieving the bundle adjustment (BA) [43].

Initial Reconstruction
The coordinates of the laser spot is expressed as X We then create an α × α(pixels 2 ) area named Cell re f , and perform a dense reconstruction of the targets in the Cell re f . Next, the area is divided into a grid with β pixels per side of the unit C re f l,m (0 ≤ (l, m) < n, α = n•β). In the process of dense reconstruction, we ensure that each patch contains at least one successfully reconstructed patch, or multiple patches (see Figure 3).
Next, we use the projection relation to find the visible image of the detected patch p in a specific position of all images. We screen the visibility image I j of the patch p through the local area NCC score and add it to set ( ) V p . The NCC score is a function that evaluates the local similarity which will be elaborated in the Appendix A: Due to the presence of noise, this initialized value is inaccurate, therefore, we need to subsequently minimize the NCC score to estimate their optimal solution. An evaluation function based on the NCC score may be expressed as: are optimization variables to maximize the score The positional parameter ( ) Next, we use the projection relation to find the visible image of the detected patch p in a specific position of all images. We screen the visibility image I j of the patch p through the local area NCC score and add it to set V(p). The NCC score is a function that evaluates the local similarity which will be elaborated in the Appendix A: Due to the presence of noise, this initialized value is inaccurate, therefore, we need to subsequently minimize the NCC score to estimate their optimal solution. An evaluation function based on the NCC score may be expressed as: The Least square method may be used to achieve the optimal solution for the parameters o( p) and v( p). We may update the visible collection V( p) as: . Then, it is assumed that the initialization of the patch p is successful, we add it to the patch set P = { p}.

Patch Proliferation
We next reconstruct patches p for cells C re f l ,m (the index (l , m )) which do not contain an ORB-feature based on the continuity of the object surface. These patches do not satisfy the hypothesis relationship and are introduced by filtering. A dense reconstruction of the area near the target laser spot is achieved. Given an existing patch p and the accordingly cell C re f l,m (the index (l, m)), we look for its neighbor patch p in the direction of the laser projection point x re f l . We next describe the index where the laser projection as: One may initialize p by the patch p parameter, We then take the initial parameters o(p ) and v(p ) into the optimization function (4) and solve the optimized optimal solution about o( p ) and v( p ), for V( p ) ≥ γ 3 where the expansion is considered successful. It will then be added to the collection P and the collection N( p) of the neighborhood patches p. This process will be repeated until it is expanded for each existing patch around the neighborhood Near re f l of C re f l 0 ,m 0 , as Science the reconstruction is based on the assumption of continuity, we filter out the reconstructed patches that do not satisfy this assumption. The patch set U(p) is selected from the adjacent N(p) of the patch p, which is not continuous with the patch p. p (p ∈ U(p)) will make a unified judgment: Among them, γ 4 represents the upper limit of the vertical offset between the two patches. If p does not satisfy (13), then it is added to the set denoted by U(p). Two main characteristics of the error patches, combining the experimental experience, are as follows.
The first filter focuses on the outliers that lie outside the real surface. p will be removed from the set P as an error patch when: Most of the patches that are incorrectly reconstructed are individual patches, connected by few patches as: When the discontinuity patch is discarded, if neighbor Near re f l of the laser cell C re f l 0 ,m 0 still satisfies Equations (14) and (15), then the local dense reconstruction for the reference frame work is completed.

Scale Estimation
We now compare the spatial location x where X re f l is the data from the laser range finder in the system, and X We next select a new frame as the reference frame and repeat the above process to estimate the scale factor λ re f 2 , while: λ The scale estimation is judged correct, and the absolute scale λ 0 of the system initialization reads: Based on the scale factor, we restore the real scale λ 0 of the map and adjust the corresponding absolute position of frames. At this point, the system initialization is completed.

Scale Correction
After successful initialization, we estimate the absolute scale. In order to maintain stability of the SLAM system, the scale drift needs to be detected and corrected during the subsequent measurement. Here, we propose the idea of key frame bundling. The key frame selection method is similar to ORB-SLAM [2] and the key frame bundle (KF-Bundle) is the aggregation of multiple key frames combined into a similar initialization Γ i . We solve the absolute scale of the key frame λ i bundling using the same method as in the previous section.

Drift Estimation
Since the scale drift is generated through accumulation of errors, no sudden change occurs in the scale factor relative to the scale of the previous key frame cluster. Therefore, we remove the wrong estimation scale factor via filtering. Once the following two conditions are concurrently satisfied, the scale drift reaches the threshold and the scale update process will be triggered.
(1) The scale of the last three estimates is offset by more than a certain amount relative to the current map scale: (2) The scale drift of the current image beam is still expanding: The value ρ 2 σ is related to the accuracy of the map and the update frequency of the scale.
In the actual experiment, we set it to 0.05 0.01 making a trade-off between the computational cost and the positioning accuracy (see Figure 4). The reason is as follows, • It is not necessary to update each KF cluster since the scale drift is relatively slow and an accurate scale could be followed for a long period. • Too frequent updating scale takes up computing resources of the system and does not yield corresponding benefits.

•
Although we effectively exclude the vast majority of wrong scale estimates, it cannot be rule out there will still be individual errors and frequent updates making the system less stable.

Drift Estimation
Since the scale drift is generated through accumulation of errors, no sudden change occurs in the scale factor relative to the scale of the previous key frame cluster. Therefore, we remove the wrong estimation scale factor via filtering. Once the following two conditions are concurrently satisfied, the scale drift reaches the threshold and the scale update process will be triggered.
(1) The scale of the last three estimates is offset by more than a certain amount relative to the current map scale: (2) The scale drift of the current image beam is still expanding: The value ( ) 2 ρ σ is related to the accuracy of the map and the update frequency of the scale.
In the actual experiment, we set it to ( ) 0.05 0.01 making a trade-off between the computational cost and the positioning accuracy (see Figure 4). The reason is as follows, • It is not necessary to update each KF cluster since the scale drift is relatively slow and an accurate scale could be followed for a long period.

•
Too frequent updating scale takes up computing resources of the system and does not yield corresponding benefits.

•
Although we effectively exclude the vast majority of wrong scale estimates, it cannot be rule out there will still be individual errors and frequent updates making the system less stable.

Scale Correction
Once a proper scale estimation is achieved and the updating criteria are met, the correcting flow will be initiated at the next key frame insert. After entering the local optimization process, the key frames having the common view relationship with the new key frame and the map points observed by these key frames will be re-scaled. The detailed process reads: (1) Inserting a new key frame and enabling the local BA optimization process, the key frames, directly connected to the key frame, will be grouped into a KF-Bundle i Γ . (2) The map points observed by the key frames in the i Γ will be optimized.

Scale Correction
Once a proper scale estimation is achieved and the updating criteria are met, the correcting flow will be initiated at the next key frame insert. After entering the local optimization process, the key frames having the common view relationship with the new key frame and the map points observed by these key frames will be re-scaled. The detailed process reads: (1) Inserting a new key frame and enabling the local BA optimization process, the key frames, directly connected to the key frame, will be grouped into a KF-Bundle Γ i . (2) The map points observed by the key frames in the Γ i will be optimized.
(3) All map points and key frames will be transformed in to the current local coordinate and will be re-scaled by λ i . (4) Convert the local coordinates to the world coordinate and these information will be used for the following tracking.
In summary, the 1D laser distance information is used to estimate of the SLAM's scale parameters and to correct the drift. In this process, we patch the area around the laser spot, and such a scale reconstruction is achieved at low cost.

TUM Dataset Experiments
Our proposed method requires a distance information provided by a fixed 1D LRF and the image captured by camera. However, in reality, no relevant data set exist for direct use, therefore, we adopt the TUM dataset to this method. The depth information of the center point of the image is then extracted from the depth image to simulate the data of the LRF located at the origin of the camera coordinate system, and directed along the camera's optical axis. To evaluate the performance of the proposed method, the popular RGB-D dataset (TUM) [4] is used. The TUM dataset exhibits various advantages, i.e., the relative movement of the measurement system is known which allows quantitative evaluation of the proposed method. Moreover, the TUM dataset contains a rich set of scenes and is very challenging because of including fast rotating motion, intense motion blur, and rolling shutter tails. All experiments are performed using a computer equipped with an Intel Core i7-4900MQ @2.80 GHz CPU and 16 Gb RAM. The proposed method run in four main threads, introducing some randomness in the task. For this reason, we report the median from several runs.

Initial Estimation
First, we compared the initialization efficiency and the scale estimation bias under different scenarios in the data set. The number of passed frames before the initialization and the initial scale error is equal to the evaluation standard. The scale error err ini is expressed as: revealing the percentage of the error between the measured distance d M i and the true distance d truth i of the adjacent six frames. To demonstrate the superior performance of our method, we compare it with the RGBD camera of ORB-SLAM [3]. This is a very challenging job since the RGB-D uses a different initialization strategy than ours (iterative closest point (ICP) registration, where there is no limit on the relative rotation), and in the measurement process provides more direct access to the depth of each point information. In order to evaluate the impact of the dense reconstruction on the monocular vision, we introduce the monocular SLAM method as a contrast for the number prior to the initial passed frames.
From Table 1, we may see that for different scenes, similar to the monocular camera method, our method is successfully initialized. This result is related to the SFM [40] initialization method. Since initialization of both the monocular camera method and the proposed method require a rigid condition with parallax between the frames, the initialization success corresponds to the camera movement. Since the camera has started relative motion, we may quickly complete the initialization in fr1/teddy data and others. However, for fr1/xyz due to the beginning of a violently rotation, until above 170 frame of our proposed to complete the initialization. The RGB-D camera method does not require disparity between two frames since it uses the feature point cloud registration algorithm, and may be initialized at every start. For errors, our method is capable of effectively achieving the scale estimation, but not as good as the SLAM for RGBD camera. Since we have greatly reduced the demand for the depth images, we believe that this error range is completely acceptable. In the subsequent measurement, we may eliminate the reconstruction error by filtering the large-scale error estimation through Equations (21) and (22).

Scale Correction
Let us now evaluate the measurement results achieved from different methods in each scene. In the evaluation process, we mask all methods from the closed-loop optimization threads, in order to verify effectiveness of the proposed scaling update method. We multiply the results of the monocular vision by the true scale of the initial moments due to the lack of the absolute scale in the ORB-SLAM for monocular camera. Figure 5 plots the trajectories corresponding to the measurement results in both stereo and plane scenarios. It may be seen from these figures that in both cases, the proposed method (the red line) is closer to the true value (the black line) than the monocular camera (the blue line), and it is very close to that of the RGBD camera (the green line).

Scale Correction
Let us now evaluate the measurement results achieved from different methods in each scene. In the evaluation process, we mask all methods from the closed-loop optimization threads, in order to verify effectiveness of the proposed scaling update method. We multiply the results of the monocular vision by the true scale of the initial moments due to the lack of the absolute scale in the ORB-SLAM for monocular camera. Figure 5 plots the trajectories corresponding to the measurement results in both stereo and plane scenarios. It may be seen from these figures that in both cases, the proposed method (the red line) is closer to the true value (the black line) than the monocular camera (the blue line), and it is very close to that of the RGBD camera (the green line). In order to more objectively assess contribution of the proposed method to the scale correction, we refer to the calculation method of the rotation error and mileage error [44]. We set the true value of the rotation matrix and mileage of the camera's current frame relative to the world coordinate system as In this process, we have aligned the results of three methods in time and recorded the changes in the scale factors achieved during the measurement process. Figure 6 plots the result. This figure clearly shows the optimization effect of the proposed method compared to the Monocular SLAM of the relative distance errors. By sudden change of the scale of the monocular vision, our algorithm timely detects the change and maintains the drift within a reasonable range. As shown in Figure 6a, around 9th frame, marked by watery blue, the monocular SLAM rises rapidly. However, the proposed method effectively corrects the scale drift and reduces the distance errors accumulation. In order to more objectively assess contribution of the proposed method to the scale correction, we refer to the calculation method of the rotation error and mileage error [44]. We set the true value of the rotation matrix and mileage of the camera's current frame relative to the world coordinate system as R i true and d i true , where R true = [ r 1 true r 2 true r 3 true ] T , is the total distance of the track, and: In this process, we have aligned the results of three methods in time and recorded the changes in the scale factors achieved during the measurement process. Figure 6 plots the result. This figure clearly shows the optimization effect of the proposed method compared to the Monocular SLAM of the relative distance errors. By sudden change of the scale of the monocular vision, our algorithm timely detects the change and maintains the drift within a reasonable range. As shown in Figure 6a, around 9th frame, marked by watery blue, the monocular SLAM rises rapidly. However, the proposed method effectively corrects the scale drift and reduces the distance errors accumulation. In Figure 6b,d, there are also significant situations like this. A similar effect to RGBD camera is achieved in the final distance estimation error. For the rotation errors, all three methods are based on Figure 6. Error assessment results. The horizontal rows of three charts respectively indicate, the relative distance error, the rotation error, and the change curve of the scale estimate. Each row of the data corresponds to a data sequence. (a) fr1/xyz; (b) fr3/office; (c) fr3/near; (d) fr3/xyz; (e) fr1/teddy. In Figure 6b,d, there are also significant situations like this. A similar effect to RGBD camera is achieved in the final distance estimation error. For the rotation errors, all three methods are based on the corrected matching of the ORB-features. Therefore, similar results are achieved due to the error matches correspond to the image features.

Self-Collected Dataset Experiments
Finally, we evaluate the proposed methods on the self-collected data sets. Figure 7a plots the data acquisition system which consists of a camera and a LRF module. We use a "mer131-210" camera with 1280 × 1024 (pixels 2 ) resolution and a SDK-100D 1D-LRF module that the measurement range reads 0.3~100 m and the accuracy reads (1 + 0.5% × distance) (mm). As shown in Figure 7b, we use markers to pre-determine the conversion relationship between the laser coordinate system and the camera coordinate system. Placed on the mobile platform during the trial and evaluated for three different scene. the corrected matching of the ORB-features. Therefore, similar results are achieved due to the error matches correspond to the image features.

Self-Collected Dataset Experiments
Finally, we evaluate the proposed methods on the self-collected data sets. Figure 7a plots the data acquisition system which consists of a camera and a LRF module. We use a "mer131-210" camera with 2 1280 1024 ( ) pixels × resolution and a SDK-100D 1D-LRF module that the measurement range reads 0.3~100 m and the accuracy reads (1+0.5% distance) ( ) mm × . As shown in Figure 7b, we use markers to pre-determine the conversion relationship between the laser coordinate system and the camera coordinate system. Placed on the mobile platform during the trial and evaluated for three different scene. Since the true value of the pose of the Mono-LRF during the measurement is unknown, we decide to project the estimated trajectory of the camera to Google Maps. Due to the larger scene, the measurement range of the RGBD-camera has been exceeded. Therefore, here we only compare the monocular camera method [2]. We has given the monocular SLAM exactly the same scale information as ours Mono-LRF SLAM within 100 key frames after initialization and shielded closed loop detection to better illustrate the corrective effect on the scale drift. Table 2 lists the measured distances.

Scene 1
Scene 1 is a round of small buildings, Figure 1 shows its reconstruction effect. The GPS has measured distance of 407 m which is close to our result of 405 m. Taking into account the error of the GPS system, this difference is within an acceptable range. Through projection of the trajectories on the Google Maps (Figure 8), we may see the corrective effect of the proposed method on the scale drift. Since the true value of the pose of the Mono-LRF during the measurement is unknown, we decide to project the estimated trajectory of the camera to Google Maps. Due to the larger scene, the measurement range of the RGBD-camera has been exceeded. Therefore, here we only compare the monocular camera method [2]. We has given the monocular SLAM exactly the same scale information as ours Mono-LRF SLAM within 100 key frames after initialization and shielded closed loop detection to better illustrate the corrective effect on the scale drift. Table 2 lists the measured distances.

Scene 1
Scene 1 is a round of small buildings, Figure 1 shows its reconstruction effect. The GPS has measured distance of 407 m which is close to our result of 405 m. Taking into account the error of the GPS system, this difference is within an acceptable range. Through projection of the trajectories on the Google Maps (Figure 8), we may see the corrective effect of the proposed method on the scale drift.

Scene 2
Scenario 2 is resulted from the measurement of the surround of a larger scene, similar to the result of the scene 1. It is in the course of 1.23 km, considering irregularity of the movement and the influence of the building on the GPS positioning accuracy. The distance measured by the proposed method is slightly smaller than the GPS data. The monocular SLAM is exhibited a larger drift. Figure  9 plots the details and clearly show that our measurements are slightly offset.

Scene 3
We next investigate the long-distance scenes that experienced multiple rotations. In the motion, we randomly turn the SLAM system and the results are shown in Figure 10. Increasing the number of the rotations, deviations have occurred in the direction estimation of both algorithms. The scale drift of the monocular ORB-SLAM system has rapidly accumulated, resulting in a large inaccuracy with the real environment. It may be seen from Figure 10 that, the proposed method still maintains a more accurate estimate of the distance traveled than monocular.

Scene 2
Scenario 2 is resulted from the measurement of the surround of a larger scene, similar to the result of the scene 1. It is in the course of 1.23 km, considering irregularity of the movement and the influence of the building on the GPS positioning accuracy. The distance measured by the proposed method is slightly smaller than the GPS data. The monocular SLAM is exhibited a larger drift. Figure 9 plots the details and clearly show that our measurements are slightly offset.

Scene 2
Scenario 2 is resulted from the measurement of the surround of a larger scene, similar to the result of the scene 1. It is in the course of 1.23 km, considering irregularity of the movement and the influence of the building on the GPS positioning accuracy. The distance measured by the proposed method is slightly smaller than the GPS data. The monocular SLAM is exhibited a larger drift. Figure  9 plots the details and clearly show that our measurements are slightly offset.

Scene 3
We next investigate the long-distance scenes that experienced multiple rotations. In the motion, we randomly turn the SLAM system and the results are shown in Figure 10. Increasing the number of the rotations, deviations have occurred in the direction estimation of both algorithms. The scale drift of the monocular ORB-SLAM system has rapidly accumulated, resulting in a large inaccuracy with the real environment. It may be seen from Figure 10 that, the proposed method still maintains a more accurate estimate of the distance traveled than monocular.

Conclusions
This study introduced a novel SLAM method for achieving the absolute scale estimation and the scaling drift correction through the fusion of 1D laser distance information and monocular vision. We first describe integration of the novel fusion. Such a SLAM method does not have to increase the baseline distance in order to measure a large scene like the binocular vision, and is no longer limited by sensors such as RGBD that are applicable to small-scale measurements. Thanks to the simple structure of the 1D laser ranging system, it is convenient to realize the miniaturization and high integration of the measurement system. The latter part of the paper validates the effectiveness of the proposed method through datasets and collected data. Experimental results using the TUM dataset effectively prove the reliability of our scale estimation, which may be acceptable, compared to the increase of the data in one dimension and the wide spread of the measurement range, although there are still many disparities in comparison with the ORB-SLAM for RGBD cameras. In addition, through the collection of the mobile data, we clearly observed the huge improvement of the proposed method compared to the original ORB-SLAM for monocular cameras.
However, the method presented in this article is not perfect. We have tried to overcome the difficulties and tried to make the method more robust. Errors may occur due to purely rotational motion states and for the lack of texture or extreme discontinuity due to the dependence on the SFM method and the local dense reconstruction. In the future, we will combine other depth estimation methods to further improve the fusion of the laser and vision data. This approach may eventually find applications in the increasingly important robotics industry.

Scene 3
We next investigate the long-distance scenes that experienced multiple rotations. In the motion, we randomly turn the SLAM system and the results are shown in Figure 10. Increasing the number of the rotations, deviations have occurred in the direction estimation of both algorithms. The scale drift of the monocular ORB-SLAM system has rapidly accumulated, resulting in a large inaccuracy with the real environment. It may be seen from Figure 10 that, the proposed method still maintains a more accurate estimate of the distance traveled than monocular.

Conclusions
This study introduced a novel SLAM method for achieving the absolute scale estimation and the scaling drift correction through the fusion of 1D laser distance information and monocular vision. We first describe integration of the novel fusion. Such a SLAM method does not have to increase the baseline distance in order to measure a large scene like the binocular vision, and is no longer limited by sensors such as RGBD that are applicable to small-scale measurements. Thanks to the simple structure of the 1D laser ranging system, it is convenient to realize the miniaturization and high integration of the measurement system. The latter part of the paper validates the effectiveness of the proposed method through datasets and collected data. Experimental results using the TUM dataset effectively prove the reliability of our scale estimation, which may be acceptable, compared to the increase of the data in one dimension and the wide spread of the measurement range, although there are still many disparities in comparison with the ORB-SLAM for RGBD cameras. In addition, through the collection of the mobile data, we clearly observed the huge improvement of the proposed method compared to the original ORB-SLAM for monocular cameras.
However, the method presented in this article is not perfect. We have tried to overcome the difficulties and tried to make the method more robust. Errors may occur due to purely rotational motion states and for the lack of texture or extreme discontinuity due to the dependence on the SFM Figure 10. Trajectory in scene 3. The red trajectory is basically the same as the real one, while the blue trajectory has a large scale drift problem.

Conclusions
This study introduced a novel SLAM method for achieving the absolute scale estimation and the scaling drift correction through the fusion of 1D laser distance information and monocular vision. We first describe integration of the novel fusion. Such a SLAM method does not have to increase the baseline distance in order to measure a large scene like the binocular vision, and is no longer limited by sensors such as RGBD that are applicable to small-scale measurements. Thanks to the simple structure of the 1D laser ranging system, it is convenient to realize the miniaturization and high integration of the measurement system. The latter part of the paper validates the effectiveness of the proposed method through datasets and collected data. Experimental results using the TUM dataset effectively prove the reliability of our scale estimation, which may be acceptable, compared to the increase of the data in one dimension and the wide spread of the measurement range, although there are still many disparities in comparison with the ORB-SLAM for RGBD cameras. In addition, through the collection of the mobile data, we clearly observed the huge improvement of the proposed method compared to the original ORB-SLAM for monocular cameras.
However, the method presented in this article is not perfect. We have tried to overcome the difficulties and tried to make the method more robust. Errors may occur due to purely rotational motion states and for the lack of texture or extreme discontinuity due to the dependence on the SFM method and the local dense reconstruction. In the future, we will combine other depth estimation methods to further improve the fusion of the laser and vision data. This approach may eventually find applications in the increasingly important robotics industry.