Panoramic Visual SLAM Technology for Spherical Images

Simultaneous Localization and Mapping (SLAM) is one of the most effective approaches to fast 3D reconstruction and mapping. However, its accuracy is not always sufficient, and improving it is currently the subject of much research. Panoramic vision offers a wide field of view, many feature points, and rich information. Because a panoramic camera images the scene from multiple directions at once, it can acquire omnidirectional spatial information instantaneously, which can improve the positioning accuracy of SLAM. In this study, we investigated panoramic visual SLAM positioning technology, focusing on three core research points: (1) the spherical imaging model; (2) spherical image feature extraction and matching methods, including the Spherical Oriented FAST and Rotated BRIEF (SPHORB) algorithm and a ternary variant of the scale-invariant feature transform (SIFT); and (3) the panoramic visual SLAM algorithm. The experimental results show that panoramic visual SLAM can improve the robustness and accuracy of a SLAM system.


Introduction
Simultaneous Localization and Mapping (SLAM) is an advanced technology used in robot navigation, autonomous driving, unmanned aerial vehicle surveying and mapping, and virtual reality (VR)/augmented reality (AR). It refers to using a sensor in an unfamiliar environment to estimate the sensor's own motion from its observations while simultaneously building a map of the surrounding environment. SLAM technology can be divided into LiDAR SLAM and visual SLAM. For historical reasons, research into LiDAR SLAM began earlier than research into visual SLAM, and LiDAR SLAM is more mature in theory, algorithms, and commercial products. However, LiDAR is more expensive than cameras and has a limited detection range, whereas cameras cost less and are not restricted to a fixed detection range. At present, visual SLAM solutions are mainly based either on RGB-D cameras or on monocular, stereo, or panoramic cameras. The biggest difference between these two categories is that RGB-D cameras are equipped with depth sensors, whereas ordinary monocular, stereo, and panoramic cameras are not. Since RGB-D cameras are generally more expensive than ordinary cameras, studying visual SLAM based on ordinary cameras (i.e., monocular, stereo, or panoramic cameras without depth sensors) is of great significance for reducing cost. Among ordinary cameras, panoramic cameras have gradually become an active topic in visual SLAM research because they perceive a wide range of information and acquire it quickly and completely.
A common monocular camera has a horizontal field of view of about 60 degrees and a vertical field of view of about 45 degrees. When a mobile platform moves continuously, the small field of view means that extracted feature points remain visible for only a short time. As a result, the platform cannot observe feature points continuously and effectively, which limits the development of SLAM based on visual sensors. The longer feature points are continuously observed, the more this benefits system state correction and updating [1]. Davison and Murray [2] also noted that the longer features are continuously observed, the faster the error converges […] based on ORB-SLAM2 [6]. Fisheye-SLAM [9] and ORB-SLAM3 [28,29] implement fisheye visual SLAM. PAN-SLAM [30] implements panoramic visual SLAM based on a multicamera system. Caruso et al. [31] proposed large-scale direct SLAM for omnidirectional cameras based on LSD-SLAM. Liu et al. [32] and Matsuki et al. [33] proposed fisheye-stereo DSO and omnidirectional DSO, respectively, based on DSO [12]. Forster et al. [34] and Heng et al. [35] proposed multi-camera SVO and fisheye-stereo SVO, respectively, based on SVO [13]. OpenVSLAM [36] implements a versatile visual SLAM framework with high usability and extensibility that supports various camera models, such as perspective, fisheye, and equirectangular.
The main aim of this work is to make full use of the omnidirectional field of view of panoramic vision in SLAM to achieve higher positioning accuracy than monocular visual SLAM, focusing on the spherical imaging model and on the problems of feature extraction and matching. The main contributions of this paper are as follows.
(1) The panoramic imaging model. We study the pixel representation of spherical images and derive the relationship between pixel coordinates and camera coordinates. (2) Feature extraction and matching of panoramic images. Because panoramic images are heavily distorted and their imaging model differs from that of an ordinary monocular camera, we compare and analyze the feature extraction performance of various algorithms, and identify the Spherical Oriented FAST and Rotated BRIEF (SPHORB) algorithm as the most suitable for a panoramic visual SLAM positioning system. In addition, we improve the scale-invariant feature transform (SIFT) algorithm by implementing binary SIFT and ternary SIFT, which greatly increase the speed of SIFT while maintaining sufficient accuracy. (3) A SLAM algorithm for panoramic vision and the implementation of a positioning system. We improve the front-end odometry and back-end optimization of ORB-SLAM2 [6] to obtain a SLAM positioning system suited to panoramic vision.

Overview of Our Method
Mobile panoramic imaging mainly adopts one of three modes: multi-lens combination, rotation, or refraction [16,37]. The current mainstream approach is to capture panoramas with a multi-lens combination, such as Point Grey's Ladybug series, which consists of six fisheye lenses with very high resolution; however, its high price has limited its popularity. The Ricoh panoramic camera used in this work is a consumer-grade device composed of two fisheye lenses, and its resolution is sufficient for the experiments in this paper. The experimental data consist of two parts: simulated data from the InteriorNet dataset [38] and measured data collected with the Ricoh camera. The overall flow of the proposed panoramic visual SLAM method is shown in Figure 1. Our system is developed from ORB-SLAM2 [6], which we extended to spherical images. First, the collected data are transformed with the spherical imaging model (see Section 4) to synthesize a 360-degree panoramic image. The SPHORB algorithm is then used in the front end of the SLAM system to extract features from the panoramic images and realize panoramic visual odometry. Next, the position and orientation of the panoramic camera are optimized in the back end with g2o [39], while loop closure detection is carried out at the same time. The experimental results show that the proposed method is more efficient, accurate, and robust than monocular vision for pose estimation.

The Spherical Imaging Model
Unlike ordinary perspective cameras, fisheye and panoramic cameras produce severely distorted images, so the traditional perspective model no longer applies. Many researchers have proposed dedicated models based on the imaging principles of fisheye and panoramic cameras, while others have proposed unified models that can describe perspective, fisheye, and panoramic images. In 2000, Geyer et al. provided a unified theory for all central catadioptric systems [40]. In 2001, Barreto et al. extended this model into what is known as the spherical camera model [41], which can describe both central catadioptric systems and conventional cameras. In 2015, Khomutenko et al. further extended the model and proposed the enhanced unified camera model (EUCM) [42], which applies to catadioptric systems and wide-angle fisheye cameras. It does not require an additional mapping to model distortion; instead, it represents radial distortion with just two projection parameters on top of a simple pinhole model. This model was used in Fisheye-SLAM [9] with good results. In 2018, Usenko et al. proposed the double sphere camera model [43], which fits large field-of-view lenses well, is computationally friendly, and has a closed-form inverse.
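The double sphere projection and its closed-form unprojection, as published by Usenko et al. [43], can be sketched as follows. This is an illustrative sketch, not code from this paper. The parameter names (fx, fy, cx, cy, xi, alpha) follow the notation commonly used for that model, the numeric values below are arbitrary rather than calibrated, and the checks on the model's valid projection and unprojection regions are omitted.

```python
import math


def ds_project(x, y, z, fx, fy, cx, cy, xi, alpha):
    """Project a camera-frame point to pixels with the double sphere model."""
    d1 = math.sqrt(x * x + y * y + z * z)      # distance to the first sphere's centre
    k = xi * d1 + z                            # z after shifting to the second sphere
    d2 = math.sqrt(x * x + y * y + k * k)
    denom = alpha * d2 + (1.0 - alpha) * k
    return fx * x / denom + cx, fy * y / denom + cy


def ds_unproject(u, v, fx, fy, cx, cy, xi, alpha):
    """Closed-form inverse: pixel -> unit bearing vector on the first sphere."""
    mx = (u - cx) / fx
    my = (v - cy) / fy
    r2 = mx * mx + my * my
    mz = (1.0 - alpha * alpha * r2) / (
        alpha * math.sqrt(1.0 - (2.0 * alpha - 1.0) * r2) + 1.0 - alpha)
    scale = (mz * xi + math.sqrt(mz * mz + (1.0 - xi * xi) * r2)) / (mz * mz + r2)
    return scale * mx, scale * my, scale * mz - xi


# Round trip with illustrative parameters: the result is the unit vector along the point.
params = (300.0, 300.0, 320.0, 320.0, -0.2, 0.6)
u, v = ds_project(1.0, 0.0, 1.0, *params)
bearing = ds_unproject(u, v, *params)
```

The closed-form inverse avoids iterative solving, which matters when many points must be unprojected per frame.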
In this paper, we use a spherical imaging model called the "longitude and latitude expression", which avoids complicated descriptive parameters. It compares the panoramic spherical image to the Earth, and the pixel coordinates to latitude and longitude. As shown in Figure 2, $O-xyz$ is the camera coordinate system. The pixel coordinates of the projection of object point $P(X_w, Y_w, Z_w)$ in the planar image are $p(u, v)$. The projection point on the spherical image is $p_s$, which can be expressed in latitude and longitude as $p_s(\theta, \phi)$. $P(X, Y, Z)$ denotes the same object point as $P(X_w, Y_w, Z_w)$, $p_s(x, y, z)$ is the corresponding projection point on the sphere, and $p(u, v)$ is the corresponding point on the plane.
Let $\alpha$ be the angle between the projection of vector $\overrightarrow{Op_s}$ onto plane $O-yz$ and the z-axis, and let $\beta$ be the angle between vector $\overrightarrow{Op_s}$ and plane $O-yz$. In real images, u and v correspond to the rows and columns of the image pixels, respectively, and take finite values. Equation (1) follows from the spatial geometric relations, where f is the focal length of the camera and $(u_0, v_0)$ are the pixel coordinates of the principal point; it expresses the mapping between the panoramic planar image and the panoramic spherical image.
A panoramic spherical image is the panoramic image acquired by the camera mapped onto a virtual sphere in space; the term emphasizes the imaging process. A panoramic planar image is the image output by the camera, similar to a planar image printed on paper; the term emphasizes the appearance of the image. In this paper, "spherical image" refers to the panoramic spherical image and "panoramic image" refers to the panoramic planar image. The two terms describe the same data and differ only in emphasis.
When the spherical image is mapped completely onto a plane, the aspect ratio of the planar image must be 2:1 (see Figure 3), because longitude spans $2\pi$ while latitude spans only $\pi$ and both are sampled at the same angular resolution. The mapping between the planar image and the spherical image is like that between a map and the Earth: the latitude and longitude $(\theta, \phi)$ of the spherical image correspond to the rows and columns $(u, v)$ of the planar image. The latitude $\theta \in [0, \pi]$ is divided into H equal intervals, corresponding to rows $u \in [0, H]$ of the planar image, and the longitude $\phi \in [0, 2\pi]$ is divided into W equal intervals, corresponding to columns $v \in [0, W]$. In this way, the spherical image is mapped to a planar image with a resolution of W × H, where W and H are the width and height of the panoramic image, and the spherical pixels can be stored in a two-dimensional array. According to Equation (1), $p_s$ can be expressed as $p_s(\alpha, \beta)$; therefore, $p_s(\alpha, \beta)$ can be used to express both $p(u, v)$ and $p_s(x, y, z)$, as shown in Equations (2) and (3).
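The two-dimensional array of spherical pixels can be precomputed once for each image size. The NumPy sketch below stores a unit vector for every pixel centre of a W × H panorama. Two choices here are our own illustrative assumptions rather than the paper's:

- The axis convention is not taken from Figure 2. It uses x right, y down, z forward, with θ measured from the upward direction and ϕ chosen so that the forward axis lands in the centre column.
- Sampling is done at pixel centres, which is what the +0.5 offsets do.

```python
import numpy as np


def spherical_pixel_grid(W, H):
    """Return an (H, W, 3) array holding the unit bearing vector of every pixel.

    Assumed convention: x right, y down, z forward; theta (colatitude) is measured
    from the upward (-y) direction and phi from the backward (-z) direction, so the
    forward axis falls in the centre column.
    """
    if W != 2 * H:
        raise ValueError("a full equirectangular panorama has a 2:1 aspect ratio")
    theta = (np.arange(H) + 0.5) * np.pi / H         # latitude of each row centre
    phi = (np.arange(W) + 0.5) * 2.0 * np.pi / W     # longitude of each column centre
    th, ph = np.meshgrid(theta, phi, indexing="ij")  # both arrays have shape (H, W)
    return np.stack([-np.sin(th) * np.sin(ph),
                     -np.cos(th),
                     -np.sin(th) * np.cos(ph)], axis=-1)


grid = spherical_pixel_grid(128, 64)   # grid[u, v] is the bearing of row u, column v
```

Because W = 2H, one pixel spans the same angle, π/H = 2π/W radians, both horizontally and vertically. That equal spacing is the 2:1 aspect-ratio requirement stated above.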
Let $P_c(X_c, Y_c, Z_c)$ be the camera coordinates of $P(X, Y, Z)$. Because the optical center O, the spherical projection point $p_s$, and $P_c$ are collinear, Equation (4) can be obtained, and Equation (5) then holds, where R is the distance between the object-space point and the optical center of the camera.
According to Equations (2) and (3), the relationship between the panoramic spherical coordinates and the pixel coordinates can be derived, as shown in Equations (6) and (7).
By combining Equations (4) and (7), the relationship between the pixel coordinates $p(u, v)$ and the camera coordinates $P_c(X_c, Y_c, Z_c)$ can be derived, as shown in Equation (8).
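The collinearity of O, $p_s$, and $P_c$ means that a camera point is its unit bearing vector scaled by R. The minimal sketch below implements both directions of the pixel/camera mapping, in the spirit of Equation (8). It uses an axis convention that is our assumption and not necessarily that of Figure 2:

- x points right, y down, and z forward.
- θ ∈ [0, π] is measured from the upward direction.
- The forward axis sits at ϕ = π.
- u is the row and v is the column.

```python
import math


def camera_to_pixel(Xc, Yc, Zc, W, H):
    """Map a camera-frame point to equirectangular (u row, v column) plus its range R."""
    R = math.sqrt(Xc * Xc + Yc * Yc + Zc * Zc)        # distance to the optical centre
    theta = math.acos(max(-1.0, min(1.0, -Yc / R)))   # latitude, 0 = straight up
    phi = math.pi + math.atan2(Xc, Zc)                # longitude in [0, 2*pi]
    return theta * H / math.pi, phi * W / (2.0 * math.pi), R


def pixel_to_camera(u, v, R, W, H):
    """Inverse mapping: scale the unit bearing of pixel (u, v) by the range R."""
    theta = u * math.pi / H
    phi = v * 2.0 * math.pi / W
    return (-R * math.sin(theta) * math.sin(phi),
            -R * math.cos(theta),
            -R * math.sin(theta) * math.cos(phi))


u, v, R = camera_to_pixel(1.0, -0.5, 2.0, 2048, 1024)
P_back = pixel_to_camera(u, v, R, 2048, 1024)   # recovers (1.0, -0.5, 2.0)
```

Going from a pixel back to camera coordinates requires R, which a single image does not provide. In SLAM it comes from triangulation across frames.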

Feature Extraction and Matching of Spherical Images
A few feature extraction algorithms have been designed for spherical images, such as spherical SIFT [44] and PCA-SIFT [45]. Although these methods mitigate the effect of spherical image distortion on feature extraction to some extent, their extraction speed is not ideal. This paper focuses on panoramic visual SLAM positioning, which requires the system to output the camera pose in real time; algorithms with poor real-time performance are therefore not discussed.
In a real-time visual SLAM system, in order to ensure that the speed of the feature extraction matches that of the system, it is usually necessary to reduce the quality of the feature extraction. One solution for monocular vision SLAM systems is to use the Oriented FAST and Rotated BRIEF (ORB) algorithm [46] to complete the feature extraction and matching. However, in panoramic vision, because of the influence of the image distortion, and the fact that the camera imaging model differs from that of monocular vision, the ORB algorithm is not ideal for the feature extraction of panoramic images.
The SPHORB algorithm builds on the geodesic grid, an approximately equal-area hexagonal parametrization of the sphere used in climate modeling. In topology, it has been proved that any closed surface can be triangulated. A sphere can therefore be approximated by triangles, which can be combined into a hexagonal mesh (possibly containing a small number of pentagons). The idea of SPHORB is to approximate the spherical image with such a hexagonal spherical mesh (similar to a football). Fine-grained and robust features are then constructed directly on the hexagonal spherical grid. This avoids the time-consuming computation of spherical harmonics and the related bandwidth constraints, and thus yields very fast performance and high descriptive quality (the process is shown in Figure 4). We therefore use the SPHORB algorithm for feature extraction.
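The geodesic grid can be illustrated with a short sketch of our own. It repeatedly subdivides an icosahedron and projects the new vertices onto the unit sphere. The result is a nearly uniform triangulation whose dual cells are hexagons, except for exactly 12 pentagons at the original icosahedron vertices. This only illustrates the grid; it is not the SPHORB implementation, which additionally builds its feature detector and descriptor on the grid cells.

```python
import itertools
import math
from collections import Counter

PHI = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio


def _normalize(p):
    n = math.sqrt(sum(c * c for c in p))
    return tuple(c / n for c in p)


def icosahedron():
    """Return the 12 unit-sphere vertices and 20 triangular faces of an icosahedron."""
    raw = []
    for a in (-1.0, 1.0):
        for b in (-PHI, PHI):
            raw += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    # With these coordinates every edge has length 2, and the faces are exactly
    # the vertex triples whose three pairwise distances all equal that length.
    faces = [tri for tri in itertools.combinations(range(12), 3)
             if all(abs(math.dist(raw[i], raw[j]) - 2.0) < 1e-9
                    for i, j in itertools.combinations(tri, 2))]
    return [_normalize(p) for p in raw], faces


def subdivide(vertices, faces):
    """Split every triangle into four and push the new edge midpoints onto the sphere."""
    vertices = list(vertices)
    midpoint_index = {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:  # a shared edge gets a single midpoint vertex
            m = tuple((a + b) / 2.0 for a, b in zip(vertices[i], vertices[j]))
            vertices.append(_normalize(m))
            midpoint_index[key] = len(vertices) - 1
        return midpoint_index[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return vertices, new_faces


def vertex_degrees(faces):
    """Edges meeting at each vertex, i.e. the number of sides of its dual cell."""
    edges = {tuple(sorted(e)) for f in faces for e in itertools.combinations(f, 2)}
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


def geodesic_grid(level):
    vertices, faces = icosahedron()
    for _ in range(level):
        vertices, faces = subdivide(vertices, faces)
    return vertices, faces


vertices, faces = geodesic_grid(3)
cells = Counter(vertex_degrees(faces).values())  # pentagonal (5) vs hexagonal (6) cells
```

After n subdivisions the grid has 10·4^n + 2 cells, and only 12 of them are pentagons. This is the "small number of pentagons" mentioned above.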

The Panoramic Visual SLAM Algorithm
The SLAM problem can be described by two equations, the motion equation (Equation (9)) and the observation equation (Equation (10)):

$$x_k = f(x_{k-1}, u_k, w_k) \tag{9}$$

$$z_{k,j} = h(y_j, x_k, v_{k,j}) \tag{10}$$
In the motion equation, the subscript $k$ denotes the current time step and $k-1$ the previous one. $u_k$ is the sensor reading and $w_k$ is the noise. $x_k$ is the position of the sensor at the current time, a three-dimensional vector, and $x_{k-1}$ is its position at the previous time step.
In the observation equation, the subscript $j$ is the index of the currently observed landmark. $y_j$ is the landmark observed by the sensor at position $x_k$, which is also a three-dimensional vector, $z_{k,j}$ denotes the observation data corresponding to landmark $y_j$, and $v_{k,j}$ is the measurement noise.
These two equations are the most basic equations of the SLAM problem: they describe the motion and observation models of the sensor. The problem can therefore be abstracted as follows. Given the motion readings u and the sensor observations z, how can we solve the localization problem (estimate x) and the mapping problem (estimate y)? SLAM is thus modeled as a state estimation problem, i.e., estimating hidden internal state variables from noisy measurements [47].
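The state estimation view can be made concrete with a deliberately simplified, hypothetical instance. It uses one-dimensional poses and landmarks, linear motion and observation models with additive noise, and a known initial pose. Stacking one residual per motion equation and one per observation equation, then solving in the least-squares sense, estimates x and y jointly. Real visual SLAM uses nonlinear 3D models and iterative solvers such as g2o [39], so the sketch illustrates only the structure of the problem.

```python
import numpy as np


def solve_1d_slam(u, z, x0=0.0):
    """Toy linear SLAM: jointly estimate poses x_1..x_K and landmarks y_0..y_{J-1}.

    u[k-1]    : odometry reading, motion model x_k = x_{k-1} + u_k (+ noise)
    z[(k, j)] : observation of landmark j from pose k, model z = y_j - x_k (+ noise)
    x0        : known initial pose, which fixes the otherwise free global offset
    """
    K = len(u)
    J = 1 + max(j for _, j in z)
    n = K + J                      # unknown vector: [x_1..x_K, y_0..y_{J-1}]
    rows, rhs = [], []
    for k in range(1, K + 1):      # one row per motion equation
        a = np.zeros(n)
        a[k - 1] = 1.0
        if k > 1:
            a[k - 2] = -1.0
        rows.append(a)
        rhs.append(u[k - 1] + (x0 if k == 1 else 0.0))
    for (k, j), zkj in z.items():  # one row per observation equation
        a = np.zeros(n)
        a[K + j] = 1.0
        a[k - 1] = -1.0
        rows.append(a)
        rhs.append(zkj)
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol[:K], sol[K:]


# Noise-free example: three poses, two landmarks, each landmark seen from every pose.
x_true, y_true = np.array([1.0, 2.0, 3.5]), np.array([5.0, -1.0])
z = {(k, j): y_true[j] - x_true[k - 1] for k in (1, 2, 3) for j in (0, 1)}
x_est, y_est = solve_1d_slam([1.0, 1.0, 1.5], z)
```

With noisy readings, the same least-squares solution balances odometry against observations. The more a landmark is observed, the more it constrains the poses, which mirrors the point made in the introduction about long feature tracks.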
In this paper, we mainly address the localization problem of panoramic SLAM, i.e., how to solve for the vector x (the position and attitude of the panoramic camera) in the above state estimation problem, and how to make full use of the wide field of view of the panoramic camera to optimize x.
The framework of classical visual SLAM is shown in Figure 5. First, the visual sensor data (video or images) are input. Second, features are extracted from the images and matched. The transformation matrix T (consisting of the rotation matrix R and the translation vector t) is calculated by minimizing the reprojection error, which gives the estimated change in camera pose; at the same time, a local map and the initial pose graph are constructed. Next, in back-end optimization, the transformation matrices T and the three-dimensional landmark coordinates X are optimized jointly with a nonlinear optimization method, taking loop closure information into account. Finally, a sparse three-dimensional point cloud is generated.

Front-End Visual Odometry
Compared with the classical SLAM framework, SLAM based on panoramic vision faces three problems: (1) the distortion of the spherical image makes feature extraction and matching difficult; (2) the mapping between pixel coordinates and camera coordinates used for planar images does not apply to a sphere; and (3) the method of solving for pose with the epipolar constraint of planar images does not apply directly to a sphere.
Therefore, to address panoramic visual SLAM positioning, we improve the front-end visual odometry of the classical visual SLAM framework, as shown in Figure 6. To deal with the distortion of spherical images, we adopt the SPHORB algorithm, which extracts and matches features directly on the sphere and thus effectively reduces the influence of image distortion on feature extraction and matching. For the mapping from pixel coordinates to camera coordinates, a planar image is described by an intrinsic matrix, whereas a panoramic image is a sphere. The mapping between the pixel coordinates (u, v) and the spherical coordinates (θ, ϕ) must therefore be described by the latitude and longitude expression.

Back-End Optimization
Since the epipolar geometry of a spherical panorama is consistent with that of a planar image, the essential matrix E between two spherical coordinate systems can be calculated directly from the spherical coordinates of corresponding panoramic image points. The epipolar constraint of planar images, $\mathbf{x}_2^\top E \mathbf{x}_1 = 0$, can therefore be extended to the sphere. Here $\mathbf{x}_1 = (x_1, y_1, z_1)^\top$ and $\mathbf{x}_2 = (x_2, y_2, z_2)^\top$ are the panoramic spherical coordinates of a pair of corresponding points $p_1$ and $p_2$. These spherical coordinates can be calculated directly with Equations (6) and (7).
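Because the constraint involves only unit bearing vectors, E can be estimated from spherical coordinates with the standard linear eight-point method, without any pixel normalization. The sketch below is our illustration, not the paper's implementation. It assumes the convention $P_2 = R P_1 + t$, for which $E = [t]_\times R$, and checks the result on synthetic, noise-free data. Real matches would additionally need RANSAC and nonlinear refinement.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def essential_from_bearings(x1, x2):
    """Linear eight-point estimate of E from N >= 8 pairs of unit bearing vectors.

    x1, x2: (N, 3) spherical coordinates of corresponding points p1, p2,
    which ideally satisfy x2[i] @ E @ x1[i] == 0.
    """
    # Each pair gives one equation that is linear in the 9 entries of E (row-major).
    A = np.stack([np.kron(b, a) for a, b in zip(x1, x2)])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Project onto the set of essential matrices: singular values (s, s, 0).
    U, S, Vt = np.linalg.svd(E)
    s = (S[0] + S[1]) / 2.0
    return U @ np.diag([s, s, 0.0]) @ Vt


# Synthetic check with points all around the camera, including behind it (z < 0),
# which a pinhole model cannot represent but a spherical model can.
rng = np.random.default_rng(0)
P1 = rng.uniform(-5.0, 5.0, size=(30, 3))
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.5, 0.1, -0.2])
P2 = P1 @ R.T + t                                   # P2 = R @ P1 + t for every point
x1 = P1 / np.linalg.norm(P1, axis=1, keepdims=True)
x2 = P2 / np.linalg.norm(P2, axis=1, keepdims=True)
E = essential_from_bearings(x1, x2)
residuals = np.einsum("ij,jk,ik->i", x2, E, x1)     # x2^T E x1 for each pair
```

E is recovered only up to scale and sign, so comparisons with a reference E should normalize both first.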
In this study, the back-end optimization of ORB-SLAM2 [6] was modified to handle the spherical model. In the spherical back end, we still use the pixel reprojection error, expressed as in Equation (11), where $\hat{x}_p$ is the pixel coordinate of the reprojected point and $x_p$ is the pixel coordinate of the matching point.
To minimize the overall reprojection error, a least-squares problem is constructed, and all camera poses and point positions are adjusted to minimize e. Combining Equations (8) and (11) gives the Jacobian of the reprojection error with respect to the point $P_c(X_c, Y_c, Z_c)$, shown in Equation (12). The Jacobian with respect to the pose $\xi$ is shown in Equation (13).
Here e is the reprojection error, $P_c$ is the camera coordinates of the object point, and $\xi$ is the Lie-algebra representation of the pose. We have thus derived the Jacobians of the panoramic camera's observation equation with respect to the camera pose and the feature points, which are an important part of back-end optimization. They are also the only part of back-end optimization that differs between a panoramic camera and a monocular camera.
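Equations (12) and (13) are not reproduced here. To show the structure of such Jacobians, the sketch below derives them analytically for an equirectangular projection and checks them against finite differences. Several conventions are our assumptions, not necessarily the paper's:

- The axes are x right, y down, z forward.
- The latitude θ = arccos(−Y/r) maps to row u = θH/π.
- The longitude ϕ = π + atan2(X, Z) maps to column v = ϕW/(2π).
- The error is observed minus predicted.
- The pose uses a left perturbation with translation-first ordering; libraries differ in this ordering.

```python
import numpy as np


def skew(p):
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])


def project(P, W, H):
    """Equirectangular projection of a camera-frame point to (u row, v column)."""
    X, Y, Z = P
    r = np.linalg.norm(P)
    theta = np.arccos(np.clip(-Y / r, -1.0, 1.0))
    phi = np.pi + np.arctan2(X, Z)
    return np.array([theta * H / np.pi, phi * W / (2.0 * np.pi)])


def jacobian_wrt_point(P, W, H):
    """de/dP_c for e = x_p - project(P_c); undefined on the vertical axis (X = Z = 0)."""
    X, Y, Z = P
    r2 = X * X + Y * Y + Z * Z
    rho2 = X * X + Z * Z                  # squared distance from the vertical (y) axis
    rho = np.sqrt(rho2)
    dtheta = np.array([-X * Y / (r2 * rho), rho / r2, -Y * Z / (r2 * rho)])
    dphi = np.array([Z / rho2, 0.0, -X / rho2])
    J_proj = np.vstack([dtheta * H / np.pi, dphi * W / (2.0 * np.pi)])
    return -J_proj                        # minus sign: error is observed minus predicted


def jacobian_wrt_pose(P, W, H):
    """de/dxi for a left perturbation P_c <- exp(xi^) P_c, xi = (translation, rotation)."""
    dP_dxi = np.hstack([np.eye(3), -skew(P)])   # first order: dP = d_rho + d_phi x P
    return jacobian_wrt_point(P, W, H) @ dP_dxi


W, H = 2048, 1024
P = np.array([0.7, -0.4, 2.0])
x_obs = project(P, W, H) + np.array([1.0, -2.0])   # an arbitrary observed pixel
J_point = jacobian_wrt_point(P, W, H)               # 2 x 3, cf. Equation (12)
J_pose = jacobian_wrt_pose(P, W, H)                 # 2 x 6, cf. Equation (13)
```

One detail specific to panoramas is that the column residual should be wrapped into (−W/2, W/2], because longitude wraps around at the image seam. The sketch omits this wrapping.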

Experimental Data
To test the robustness and accuracy of panoramic visual SLAM in different environments, four datasets were selected (see Figure 7a). The first two were our measured data, and the last two were InteriorNet data. The trajectory of the measured data was roughly rectangular, and the camera moved relatively smoothly. The InteriorNet data were generated by Li et al. [38] in a simulated environment. There, the viewpoint can be changed arbitrarily to render panoramic images, so the trajectories are irregular and the motion is more violent. Each InteriorNet dataset contains panoramic data together with the corresponding monocular and fisheye data (as shown in Figure 7b), each with 1000 frames. Using data from these various scenes and motion states, we evaluate the robustness and accuracy of panoramic visual SLAM and monocular visual SLAM.

Because SIFT is robust to scale and rotation and has high accuracy but low speed, we attempted to speed it up so that it could be used in SLAM. The main improvement was to binarize the 128-dimensional floating-point SIFT descriptor (128 × 32 = 4096 bits) using the median as the threshold. Components greater than the median were encoded as 1 and components less than the median as 0, compressing the descriptor to 128 bits. This greatly reduces memory consumption and improves matching speed while maintaining the robustness of SIFT.
Similarly, to quantize the original 128-dimensional floating-point vector more finely, we implemented "ternary" SIFT. It uses two thresholds, the value at 1/4 and the median, to split the components into three levels, encoded from small to large as 00, 10, and 11. Each 32-bit floating-point component is thus compressed into 2 bits, giving 256 bits in total.
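The binary and ternary quantization can be sketched with NumPy as follows, together with Hamming-distance matching and the 0.8 ratio test used in the experiments. Two details are our assumptions rather than statements in the text:

- The median, and the value at 1/4 (read here as the first quartile), are computed per descriptor over its 128 components.
- The bits are packed with `np.packbits`, so that matching reduces to XOR plus a bit count.

```python
import numpy as np


def binary_sift(desc):
    """(N, 128) float SIFT descriptors -> (N, 16) uint8, i.e. 128 bits each."""
    med = np.median(desc, axis=1, keepdims=True)     # per-descriptor median
    return np.packbits(desc > med, axis=1)            # 1 above the median, else 0


def ternary_sift(desc, low_q=0.25, high_q=0.5):
    """(N, 128) float -> (N, 32) uint8: 2 bits per component, 256 bits in total."""
    lo = np.quantile(desc, low_q, axis=1, keepdims=True)
    hi = np.quantile(desc, high_q, axis=1, keepdims=True)
    first = desc >= lo                  # set for the middle and upper levels
    second = desc > hi                  # set only for the upper level
    codes = np.stack([first, second], axis=2).reshape(len(desc), -1)  # 00, 10, 11
    return np.packbits(codes, axis=1)


def hamming_matrix(a, b):
    """Pairwise Hamming distances between two sets of packed binary descriptors."""
    return np.unpackbits(a[:, None, :] ^ b[None, :, :], axis=2).sum(axis=2)


def ratio_test_matches(a, b, ratio=0.8):
    """Keep (i, j) when the best distance is below `ratio` times the second best."""
    d = hamming_matrix(a, b)
    order = np.argsort(d, axis=1)
    matches = []
    for i, (j1, j2) in enumerate(order[:, :2]):
        if d[i, j1] < ratio * d[i, j2]:
            matches.append((i, int(j1)))
    return matches


rng = np.random.default_rng(1)
desc = rng.random((50, 128))          # stand-in for real SIFT descriptors
b_codes, t_codes = binary_sift(desc), ternary_sift(desc)
```

The 00/10/11 code is a thermometer code, so the Hamming distance between two ternary codes equals the sum of their level differences. Real SIFT descriptors contain many zeros, so ties at the thresholds are common; this sketch encodes a tie at the median as 0.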
In the experiments, feature extraction and descriptor computation were identical for all three methods, and the time needed to quantize the descriptors was negligible, so the matching speed and accuracy of the three methods could be compared directly. Coarse matches were screened with a ratio test with a threshold of 0.8. The fundamental matrix was then estimated with random sample consensus (RANSAC), and a reprojection error threshold of 3 pixels was used for fine matching. After several groups of experiments, three typical pairs of panoramic images were selected for analysis:

- The first pair consists of indoor images with many feature points and no large rotation, which is the common situation in SLAM.
- The second pair consists of images with a 90 degree rotation.
- The third pair consists of outdoor images with fewer feature points.

The experimental results are shown in Figure 8, where the left, middle, and right columns show the results of SIFT, binary SIFT, and ternary SIFT, respectively. The evaluation of the matching results is listed in Table 1. With many feature points, as in the first pair, the three methods give almost the same number of matches and fine matching rate, but SIFT is significantly slower than binary SIFT and ternary SIFT. As the second pair shows, ternary SIFT matches better than binary SIFT and, under rotation, even better than SIFT, while also being faster. With fewer feature points, binary SIFT and ternary SIFT give worse matching results than SIFT. A possible reason is that when the number of matching points is small, the proportion of wrong coarse matches is high, which increases the number of RANSAC iterations required. In general, ternary SIFT has the fastest matching speed, and with many feature points it gives superior matching results even when the image is rotated.
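A higher proportion of wrong coarse matches forces more RANSAC iterations. This follows from the standard iteration-count formula $N = \log(1-p) / \log(1-w^s)$, where w is the inlier ratio, s is the minimal sample size, and p is the desired probability of drawing at least one all-inlier sample. In the sketch below, the confidence of 0.99 and the sample size of 8 (an eight-point fundamental-matrix solver) are illustrative assumptions, not settings reported in the text.

```python
import math


def ransac_iterations(inlier_ratio, sample_size=8, confidence=0.99):
    """Iterations needed to draw at least one all-inlier minimal sample with the given confidence."""
    p_good_sample = inlier_ratio ** sample_size    # chance that one random sample is all inliers
    if p_good_sample >= 1.0:
        return 1
    if p_good_sample <= 0.0:
        raise ValueError("no inliers: RANSAC cannot succeed")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))


for w in (0.9, 0.7, 0.5, 0.3):
    print(f"inlier ratio {w:.1f}: {ransac_iterations(w)} iterations")
```

Because w is raised to the power s, the required count grows very quickly as the inlier ratio drops. This is consistent with the slowdown observed for the image pair with few feature points.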

SPHORB and ORB
The ORB algorithm is one of the fastest feature extraction algorithms available and has good matching accuracy. However, it is designed mainly for planar images and does not work as well on spherical images. SPHORB is a feature extraction algorithm for spherical images that adapts ORB to the characteristics of spherical images (see Section 5), with the aim of fast processing and higher accuracy on the sphere.
In the panoramic image matching experiments, the three image pairs described in Section 7.2.1 were used again, and features computed by ORB and by SPHORB were matched in each of the three experiments. Figure 9 shows the matching results of ORB on the left and SPHORB on the right.
As shown in Figure 9, in the first and third experiments the SPHORB matching lines are more consistent and cross less often, indicating better matching quality than ORB. In the second experiment, the image was rotated by 90 degrees, so ORB matched only the central part of the image and missed the corresponding features near the edges. SPHORB, by contrast, matched most of the corresponding feature points in both the center and the edges.
The evaluation of the ORB and SPHORB matching results is listed in Table 2; the filtering rules for coarse and fine matching are the same as in Section 7.2.1. However, in the first and second experiments, ORB achieved a higher matching precision than SPHORB. Notably, in the second experiment SPHORB had a fine matching rate of only 24.86%, which clearly does not reflect its actual matching quality. The most likely reason is that RANSAC removed a large number of correct matches. As described in Section 7.2.1, the OpenCV RANSAC implementation was used, and it is designed mainly for planar images. For panoramic images, it often removes mismatches poorly, especially when a pair of panoramic images has a large rotation angle (as in the second pair). A dedicated spherical RANSAC method is therefore needed to obtain a reliable fine matching rate; this will be addressed in our future research.
In summary, the visual matching results indicate that SPHORB is more accurate than ORB. Because the current filtering rules for fine matching of spherical images are unreliable, the values in Table 2 do not reflect the true accuracy of SPHORB.

Panoramic Visual SLAM Experiment
According to their characteristics, the data were used in two groups of experiments. The first group used the measured data, for which no ground-truth trajectories were available. These data were used to evaluate ORB and SPHORB in SLAM in terms of initialization speed, the number of matches per frame, and the tracking time per frame:

- Initialization speed was measured by the ID of the frame at which initialization succeeded, so a smaller ID means faster initialization.
- We recorded the number of successfully matched points in each frame and computed the mean; in general, more matched points lead to higher SLAM accuracy.
- The average tracking time per frame was recorded.

The second group used the InteriorNet simulation data, which provide ground-truth trajectories and could therefore be used to evaluate trajectory accuracy. These data also provide the monocular images corresponding to the panoramic images (see Figure 7b), which highlights the advantages of using panoramic images in SLAM.
The experimental results for the first group are shown in Figure 10 and listed in Table 3. The figure shows that the covisibility graph of SLAM with SPHORB is much denser than with ORB. This is because SPHORB yields more matched points, which strengthens the constraints between frames and increases the final accuracy.
(a) Running screenshot of measured data 1.

The experimental results for the second group are listed in Table 4. The two InteriorNet simulation datasets were used in three groups of experiments: SLAM on panoramic images with SPHORB and with ORB, and SLAM on monocular images with ORB. Because of the violent motion in the simulated data, tracking failed on the monocular images, whereas no tracking failure occurred in the SLAM experiments with panoramic images. These comparative experiments demonstrate the advantage of panoramic images for SLAM.

In Table 4, all entries except "Monocular ORB" used panoramic images. Among the three groups of experiments, SPHORB gives the best initialization, the highest average number of matches per frame, and the largest total number of final map points. Its main drawback is its low speed. Table 5 and Figure 11 show the evaluation results obtained with the EVO Python package [48]. The headers max, mean, min, rmse, and std in Table 5 denote the maximum, mean, minimum, root mean square error, and standard deviation of the positioning error, respectively. For the simulation 1 data, SLAM with SPHORB has the lowest rmse, and its trajectory is closest to the ground truth. In contrast, the monocular ORB trajectory is incomplete: tracking was lost for many frames, leaving only a short tracked segment.
The scene in the simulation 2 data is more complex, and none of the three experiments achieved good results. As shown in Table 4, monocular ORB suffered tracking failures, so its results are not comparable with those of the other two experiments; the corresponding row in Table 5 is marked with a symbol. Panoramic SPHORB was slightly more accurate than panoramic ORB but took about four times as long. Thus, in complex scenes, SPHORB shows no great accuracy advantage over ORB while taking considerably more time.
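Statistics such as those in Table 5 are computed on position errors after the estimated trajectory has been aligned to the ground truth. A similarity (Umeyama) alignment is commonly used because a single-camera SLAM trajectory has no absolute scale. The sketch below shows how max, mean, min, rmse, and std of the absolute position error can be obtained after such an alignment. It is our illustration, not evo's code, and the text does not state which alignment options were used for Table 5.

```python
import numpy as np


def umeyama_alignment(src, dst, with_scale=True):
    """Least-squares similarity (s, R, t) with dst ~ s * R @ src + t (Umeyama, 1991)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:    # avoid returning a reflection
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (xs ** 2).sum(axis=1).mean()
    s = np.trace(np.diag(D) @ S) / var_src if with_scale else 1.0
    t = mu_d - s * R @ mu_s
    return s, R, t


def position_error_stats(est, gt, with_scale=True):
    """Align est (N, 3) to gt (N, 3), then summarise the per-pose position errors."""
    s, R, t = umeyama_alignment(est, gt, with_scale)
    err = np.linalg.norm(gt - (s * est @ R.T + t), axis=1)
    return {"max": err.max(), "mean": err.mean(), "min": err.min(),
            "rmse": np.sqrt(np.mean(err ** 2)), "std": err.std()}


# Example: an estimate that differs from ground truth only by a similarity transform.
rng = np.random.default_rng(2)
gt = np.cumsum(rng.normal(size=(100, 3)), axis=0)
a = 0.4
R0 = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
s0, t0 = 2.5, np.array([1.0, -3.0, 0.5])
est = (gt - t0) @ R0 / s0            # so that gt = s0 * R0 @ est + t0
stats = position_error_stats(est, gt)
```

Since the rmse weights large errors more heavily than the mean does, it is always at least as large as the mean error. A trajectory with a few large drifts therefore shows a clearly higher rmse.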

Conclusions
In this paper, we have studied the spherical imaging model and a panoramic visual SLAM method, and developed a SLAM positioning system suitable for panoramic vision. The following conclusions can be drawn. (1) For the spherical model, we compared the sphere to the Earth and expressed the pixel coordinates on the sphere in latitude and longitude. The resulting equations are concise and easy to understand, which simplifies the back-end optimization of panoramic SLAM. (2) Experiments show that in most cases ternary SIFT outperforms binary SIFT and SIFT in accuracy and efficiency. Its precision is slightly lower than that of SIFT only when the number of feature points is very small (i.e., fewer than 500), which is acceptable. (3) Spherical images have a higher resolution and more feature points than monocular images, which gives them a clear advantage, but they are severely distorted. Weighing accuracy against speed, we found SPHORB to be the most suitable of the feature extraction and matching algorithms considered in this paper for panoramic visual SLAM positioning systems.