A unified SLAM solution using partial 3D structure

1 Abstract —Good quality of environment mapping demands modelling the associated environment nearly to its 3D originality. This paper presents a unified Simultaneous Localisation And Mapping (SLAM) solution based on partial 3D structure. As compared to existing representations such as grid based mapping, the novelty of the proposed unified approach lies in estimation, representation and handling of compact partial 3D features-based map model for a team of robots that are working in an unknown environment with unknown poses. The approach replies on a camera to perceive the environment and a 2D laser sensor to generate a SLAM solution with partial 3D features based representation. Extended Kalman Filter (EKF) estimates the robot pose based on its motion model and map of the explored environment. The solution has been tested in an indoor environment on two identical custom-developed robots. Experimental results have demonstrated efficacy of the approach. The presented solution can be easily applied on a distributed/centralized robotic system with ease of data handling and reduced computational cost.

1 Abstract-Good quality of environment mapping demands modelling the associated environment nearly to its 3D originality. This paper presents a unified Simultaneous Localisation And Mapping (SLAM) solution based on partial 3D structure. As compared to existing representations such as grid based mapping, the novelty of the proposed unified approach lies in estimation, representation and handling of compact partial 3D features-based map model for a team of robots that are working in an unknown environment with unknown poses. The approach replies on a camera to perceive the environment and a 2D laser sensor to generate a SLAM solution with partial 3D features based representation. Extended Kalman Filter (EKF) estimates the robot pose based on its motion model and map of the explored environment. The solution has been tested in an indoor environment on two identical custom-developed robots. Experimental results have demonstrated efficacy of the approach. The presented solution can be easily applied on a distributed/centralized robotic system with ease of data handling and reduced computational cost.

I. INTRODUCTION
Simultaneous Localization and Mapping (SLAM) is considered as the fundamental problem in mobile robotics [1]. Initially both the map and the mobile robot position are not known and the robot moves through the environment, perceives it using on-board sensors and finally generates a map model. Due to sensor noises and process noises associated with motion of the robot, various probabilistic estimation techniques are used to explicitly model different sources of noises and their impact on measurements for efficient robotic map building.
The applications of SLAM extend from land environments to aerial and underwater scenarios [2]. A number of latest findings have been reported in the present decade. SLAM solutions have been successfully integrated with path planning and navigation issues to enhance autonomy of the moving robot. SLAM problem, at present, is considered as a solved problem with different estimation Manuscript received March 23, 2014; accepted July 18, 2014. This research was funded by grant from University of Genova, Italy.
approaches [1]. Many researchers have proposed distinct solutions starting from late 80's when stochastic techniques have started to replace deterministic based approaches in solving localization problem because of associated uncertainties in moving robots [3]. Due to variation in stochastic techniques, a number of SLAM versions have been found in the literature. Researchers have used Extended Kalman Filter (EKF), Information Filter (IF), Sparse Extended Information Filter (SEIF) and Rao-Blackwellised Particle Filter (RBPF) to propose SLAM solution [4]. Various combinations of integrated on-board sensors (e.g. 2D laser scanner, ultrasonic sensor, camera etc) have been used for testing such solutions [4]. In [5], EKF based SLAM solution has been presented using features for map modeling while SEIF based solution has been proposed in [6]. RBPF based approaches rely on map features modeling [7] or grids based map modeling [8].
In the present research, feature based EKF SLAM algorithm is proposed to solve localization and mapping issues. A 2D laser sensor and a camera are used to perceive the environment nearer to its 3D originality.
The organization of the paper is as follows: Section II discusses the related work. Section III highlights the proposed feature based representation scheme. Section IV presents unified EKF SLAM framework and its results have discussed in Section V. Finally Section VI comments on conclusion and potential applications of this research.

II. RELATED WORK
Scientific community reports several contributions to present 3D solutions using camera [9] since it is considered as cheapest and useful sensor to detect 3D features. Camera provides rich set of information about the environment which can be used for robot localization and mapping. Numerous techniques employed to use single camera or a group of cameras to efficiently solve SLAM problem are reported in [10]. Most of these approaches have used point features (using patches) extracted from the image. This approach has been considered useful; however it cannot efficiently produce a 3D model of the explored environment. Another major challenge offered by this approach is in the features initialization [9], [11]. Camera provides bearing A Unified SLAM Solution Using Partial 3D Structure S. Riaz un Nabi Jafri 1,2 , J. Iqbal 3 , H. Khan 1 , R. Chellali 1 1 only information of the feature and it is almost impossible to extract the depth information of the features using only one image (single view) [12]. An inverse depth concept has been used to initialize features with depth uncertainty reaching to infinity [12], [13]. It has been successfully demonstrated that this scheme works well not only to localize the camera but also to map very distant features. In an attempt to model lines using end points, use of line segments has been proposed in [14], [15]. Some drawbacks associated with this solution include delayed initialization and uncertain end points [9].
To solve SLAM problem, the present work exploits simplicity and computational efficiency of MonoSLAM i.e. single camera based SLAM solutions [16]. The distinguishing features of the proposed approach include building compact map model and sharing/ processing of information among team of robots.

III. PROPOSED APPROACH
In indoor environments, geometric shapes are common such as tables, walls, chairs and cupboards etc., which need to be modeled. In this work we are detecting these shapes from integrated sensors and extracting lines/corners features from them to model the environment.  horizontal Field of View (FoV) than the camera that is why we are observing some objects with horizontal features whose vertical counterparts are not detected. Though the camera has smaller horizontal view but its wide vertical sensing range permits it to observe window limits not detected by the laser. Figure 1(b) illustrates the desired representation of the environment with the robot (represented by triangle) and extracted features from both sensors. Laser sensor can display extracted feature immediately but it is almost impossible to display camera extracted feature exactly on the actual location because of unavailability of depth information. These limitations offer several challenges to use both sensors and demand a solution to initialize camera based features immediately and to converge them after getting sufficient parallax [12]. Here we are proposing a unified approach for presenting the features in a common way and to update them in EKF SLAM.

A. Camera Features
After image capturing from a calibrated camera, inverse depth parameterization of vertical lines has been incorporated in EKF framework. Consider a robot moving in an indoor environment from point A to point B with initial pose (xA ,yA , θA) to final pose (xB , yB , θB) as shown in Fig. 2. During its motion, it observes a vertical line feature (as shown by solid cyan line), which is projecting on X-Y plane at point V.
The bearing of the first vertical line feature is denoted as in the robotic frame of reference as shown by arrow from point A to point V in Fig. 2. The camera frame of reference and robotic frame of reference (shown by XR-YR axis with dashed arrows) are considered as same in this work. A 2D unit vector describing the direction of the line feature from the camera can be represented as (1) where is the u-axis (horizontal camera axis) central coordinate in the image, is the focal length of the camera, is the physical spacing of pixel on the optical sensor of the camera, is the rotation matrix from the camera coordinate frame to the world coordinate frame and is the detected u-axis coordinate of the vertical line feature. The bearing can be represented using directional vector as The world frame is shown by XYZ arrows. The bearing of the feature in world frame ( ) can be computed using The inverse depth representation for the projected feature point can be given by where ( , ) is the location of the robot in world frame, α is the bearing of the line in world frame and is the inverse of the range (depth) information of the line feature ( ) and is initialized with a guess 1/ .
Projected coordinates of V ( , ) can be written as During its motion from A to B, the robot is detecting again vertical line with different local bearing in its view as shown by arrow from B to V in Fig. 2. Point coordinates of V take form of (7) All the robot poses at A and B are known (with nearest global values) from the motion model. (6) and (7) show that only inverse depth ( ) is unknown and we need to find its value using significant changes of robotic pose as well as feature bearing angles. The proposed approach is based on taking the advantage of the robot successive movement and iteratively predicting a nearer estimate for the point V coordinates. For future matching in upcoming images, an image patch covering all vertical portion of the image with a small width around the line is stored to associate line in future observations. When the robot moves to B and takes an image then the line appears in the image on different horizontal pixel value. To associate this line with the older line, we need to predict a guess for the older line which shows that the older line will probably appear in this horizontal region of the new image shown in pixels. This guess is generated using motion model of the robot. Currently it is assumed that we are getting the pose prediction and = , which gives the location of the robot at point B in world frame. The older feature location can be predicted using (8) .
Rewriting (8) in inverse depth form as Equation (9) is used to find the horizontal pixel value of the older line for the image feature. Measurement model given by (10) shows the representation for finding the value of predicted older line position ( ) To associate new observed lines with the old ones, nearest neighbor method is used. For each new line, it is evaluated that new observation and old prediction satisfy (11) , p z u u u th   (11) where is the new observation for current feature and thu is the threshold (in pixels). After getting matched pair, for more robust association, new image patch belonging to a new current feature is compared with previously stored patch through cross correlation. In case at any step, if no line is matching with new line and new line is far from older lines (greater than predefined threshold) then new line is considered as fresh candidate and is initialized immediately.

B. Laser Features
This work uses laser based horizontal geometrical (line and corner) features and parameterize them as inverse depth features. Consider the same scenario of the robot movement as shown in Fig. 3. At point A, the robot observes a corner C (as pointed by brown arrow) with range (distance from robot) as and bearing as . The laser sensor frame of reference is fixed and is same as the robotic frame of reference. Bearing of the corner feature can be given by (12) .
The corners can be initialized and represented by   , , , , where symbolizes laser corner feature, ( , ) is the location of robot at point A, and are bearing (in world frame) and range of the corner from the robot respectively. Corner point coordinates C ( , ) in terms of (13) can be written as During motion from A to B, the robot is detecting again corner with different range and bearing (say = 5 ELEKTRONIKA IR ELEKTROTECHNIKA, ISSN 1392-1215, VOL. 20, NO. 9, 2014 ) as represented by arrow from B to C in Fig. 3. We need to predict old corner feature and to associate this with new observed corner feature. The prediction ℎ of old corner feature is given by where dhlc and αhlc are the predicted range and bearing for the old corner from point B respectively, (xB, yB, θB) is the predicted new robot pose at point B and (xC, yC) is the old corner coordinates as determined by (14). In this work, nearest neighbor method is used which is detailed in [17].
Each new corner measurement is tested with (old) prediction against threshold as given by . lc lc lc h z th   (16) If the re-observed corner is found within a predefined threshold then it firmly associates with the previously seen corner. If no match is found then the corner is treated as new feature and is initialized as given by (13).
At the starting location (point A), the robot is observing horizontal lines (in brown) shown in Fig. 4. One such line is pointed by brown arrow from point A in the figure. The detected line is making a global bearing of with range of . Feature initialization is given by where represents the laser line feature, ( , ) is the location of the robot and and are bearing and range (in world frame) of the observed line respectively. The end points global coordinates for the lines are saved to associate them in future. As the robot moves, the line feature will re observe with different range and bearing (say = ) as indicated by arrow from point B in Fig. 4. The prediction ℎ of old line feature is given by where dhll and αhll are the predicted range and bearing for the old line from point B respectively, (xB, yB, θB) is the predicted new robot pose at point B, and are the old line global range and bearing respectively. Each new line feature measurement is tested with old line feature prediction for validating threshold given by IV. UNIFIED EKF SLAM Based on fusion of features extracted from both kinds of sensors in the same algorithm, a unified EKF SLAM solution has been proposed.
The motion prediction only affects the robot state during motion without changing the features state. F and G are represent the Jacobians of motion function , , , w.r.t. system state and w.r.t. noise , , = , , ∆ respectively, Q is represents the covariance matrix for the noise , , as stated in [12].

C. Measurement Update
After prediction, in the measurement update step of the algorithm, Kalman gain K is determined by [3] where is Jacobian of the measurement prediction function w.r.t. the state vector. Three different functions for predicting different sensors as given by (10), (15) In (24), is the observation noise matrix, which deals with uncertainty of the observations [14]. After determining Kalman gain , state vector and covariance matrix are updated: where h and z represent the predictions associated with the features and current measurements respectively [12]. This procedure involves final state estimations and its covariance in the current iteration. New features are added after these steps. Though all kinds of features are distinguished w.r.t. different measurements and associated models but a balanced state and covariance updates result in a peaceful and unified solution.

V. EXPERIMENTS AND RESULTS
An indoor office like environment is selected for experiments having objects with orthogonal and nonorthogonal orientations. The robots are moved manually from different start-end positions to get different trajectories and corresponding laser range data and images have been recorded. The frequency of both sensor inputs is fixed to 10 Hz. Maximum linear speed of each robot is around 0.1 m/sec and maximum angular speed is around 0.03 rad/sec. The different data sets obtained are then used independently in EKF algorithm. The algorithm is developed by combining standard MATLAB functions with various open source functions such as openSLAM developed by research community. Hough transformation function is used to extract image lines while Split and Merge algorithm is used to extract laser lines.
In first experiment, the robot is initialized with pose (0, 0, 0). Figure 5(a) illustrates the first image from the camera recorded at the starting location with the vertical lines (shown in green) observed from the image. Figure 5(b) indicates the corresponding results of laser lines (in blue) and corner features (in cyan) extracted from the first scan. In this figure, vertical image lines representing inverse-depth projected points (in green with numbering) and current robot location (in red) are also shown. All the vertical lines are just initialized with a predefined value and these positions are not showing the actual locations. Laser scan is covering 210 degrees around the robot, so number of horizontal features is more than extracted vertical line features (which have only 120 degree view). The robot continues its motion. Figure 6(a) shows final location of the robot (having black square as its identity) while Fig. 6(b) illustrates 2D map with trajectory. The updated laser lines (in blue), laser lines corners (in cyan), vertical image lines showing inverse-depth points (in green) and current robot pose (in red) can be seen in the figure. New vertical lines (in green) are also observed which were not detected at the beginning. Each inverse-depth point, initially initialized with a default value, has now converged to its nearest world location. Successive observations lead to depth estimation and points shift to almost correct locations. Some lines are not converged correctly due to low parallax and problems in data association. Figure 7 presents the resultant partial 3D updated map of the environment and trajectory within it as shown by red marks.   In second experiment, another similar robot is also deployed. Assuming that this robot is moving from a known location with a known orientation in the same environment, the first image view from the second robot is shown in Fig. 8(a) with the corresponding laser scan illustrated in Fig. 8(b). In the image, the first robot can also be seen in the view of the second robot (red mark).  Second robot continues its motion and keeps building its map. Due to known location of the second robot w.r.t. the first one, the second robot poses and its map can be plotted in the same map model of the first robot (shown in Fig. 7). Figure 9 shows trajectories of the first and second robot in red and blue respectively in the resultant partial 3D global map of the environment. It is important to note that vertical and horizontal lines are increased as second robot has explored environment with different views as compared to first robot exploration. Table I shows the mean features position error when compared with their actual locations in the environment with the developed map model.

VI. CONCLUSIONS
A unified EKF SLAM framework is proposed and tested in an indoor environment for building partial 3D map by using more than one robot. The proposed method provides a compact map model as compared to other representations. It is easy to process and to handle during computation and sharing. Based on experiments, results are found nearly accurate for both kinds of sensors extracted features. For better performance of the proposed method, it is noticed that for image features, extraction and association of more features are required. The developed map model finds its potential in various applications requiring real-time survey of environment, 3D modeling, building map for robot navigation and path planning. Results can also beneficiate human assistance to accomplish various tasks.