Traversable Ground Surface Segmentation and Modeling for Real-Time Mobile Mapping

A remote vehicle operator must quickly decide on motion and path, so rapid and intuitive feedback about the real environment is vital for effective control. This paper presents a real-time traversable ground surface segmentation and intuitive representation system for the remote operation of a mobile robot. Firstly, a terrain model using a voxel-based flag map is proposed for incrementally registering large-scale point clouds in real time. Subsequently, a ground segmentation method based on a Gibbs-Markov random field (Gibbs-MRF) model is applied to detect ground data in the reconstructed terrain. Finally, we generate a texture mesh for ground surface representation by mapping the triangles in the terrain mesh onto the captured video images. To speed up the computation, we program a graphics processing unit (GPU) to run the proposed system on large-scale datasets in parallel. The proposed methods were tested in an outdoor environment. The results show that ground data is segmented effectively and that the ground surface is represented intuitively.


Introduction
Mobile robots must be able to navigate and interact with unknown environments without colliding or encountering other dangers, by determining traversable terrain regions, reconstructing terrain models, and employing related technologies. Dynamic terrain reconstruction and modeling with multiple sensors has been researched to give mobile robots the ability to detect free space and navigate without collision [1].
Datasets received from multiple sensors, such as 3D point clouds, video images, GPS data, and rotation states, are integrated to produce accurate and reliable terrain information [2,3]. It is necessary to integrate the instantaneously received datasets into a terrain model, such as a voxel map or a textured mesh.
Conventional terrain modeling methods have difficulty processing large-scale datasets, which can grow beyond the memory capacity of mobile robots. Moreover, the huge computational cost of processing large-scale datasets slows down terrain modeling and visualization.
In remote operation applications, ground segmentation is necessary for assessing traversable regions. To accomplish this objective, we apply a ground segmentation method based on the Gibbs-Markov random field (Gibbs-MRF) model. To create a photorealistic visualization of the traversable ground surface, textured terrain models are built from the datasets sensed in each frame. By mapping captured video images onto the 3D ground surface mesh, the visualization system provides intuitive imagery of a 3D geometric model for easy terrain perception. Conventionally, the captured images are registered into the terrain model frame by frame. However, when mobile robots navigate a large-scale environment, the sensed images are registered incrementally, and the amount of data becomes so large that it exceeds the robots' memory capacity.
To improve traversable ground surface reconstruction toward rapid and intuitive representation, real-time terrain modeling and photorealistic visualization systems have been developed [4,5]. Photorealistic visualization attempts to represent the 3D terrain and object models in a virtual world as they appear in the real world [6,7]. This type of visualization provides perceptive geospatial information for path planning and decision-making. To improve speed, real-time terrain visualization methods have been researched [8,9]. A level-of-detail (LOD) method is commonly used to render the near-field regions of the terrain model in real time. In far-field regions, billboard rendering methods, which place a texture in front of the terrain model, are commonly applied to realize photorealistic visualization. For a large-scale terrain model, the sequential computation of data registration requires substantial computational power.
In this paper, we propose a real-time traversable ground surface segmentation and intuitive representation system. To speed up the computation of the proposed methods, we used graphics processing unit (GPU) programming to implement the segmentation and modeling processes for large-scale datasets in parallel [10].
To reconstruct an intuitive terrain model in real time using limited memory, we first create a voxel-based flag map in GPU memory to register point clouds without redundancy. We then apply the Gibbs-MRF method to segment the traversable ground surface in the flag map and create a ground mesh from the segmentation results. Next, we build a node-based texture mesh for ground surface representation; each node in the mesh contains a certain quantity of vertices and a node texture. The node texture is generated by mapping the triangles of the node mesh onto the captured images. Finally, we represent the reconstructed terrain model by overlaying the ground mesh with the node textures.

This paper is organized as follows. In Section 2, we discuss related work on large-scale terrain reconstruction and traversable ground segmentation. In Section 3, we describe the real-time traversable ground surface segmentation and intuitive representation system. In Section 4, we analyze the results of the proposed segmentation and modeling methods. In Section 5, we present our conclusions.

Related Works
During navigation and interactive tasks, rapid feedback on intuitive representations of a robot's surrounding terrain is required for real-time remote operation. To provide a remote operator with such representations, robots need to reconstruct a terrain model using multiple sensors [11]. Sukumar et al. [12] proposed an unmanned terrain modeling method using a multisensor system. This method integrates the sensed datasets from each frame into a textured terrain mesh and provides a convenient visualization. However, when robots explore outdoor environments for long periods of time, this system cannot process large-scale datasets.
To prevent the terrain model from exceeding the physical memory capacity, the terrain storage system requires a highly compressed terrain model and fast memory loading performance [13]. Chang et al. [14] proposed a panorama image generation method, which combines a video sequence into a panoramic scene. With this method, overlapping regions of the video sequence are removed so that video storage is effectively compressed. Gingras et al. [15] reconstructed an unstructured surface from a 360° point cloud scan and represented the traversable areas using a compressed irregular triangular mesh. They applied a mesh simplification algorithm to reduce the number of triangles on the large-scale terrain surface. Zhuang et al. [16] proposed an edge-feature-based ICP algorithm to extract edge points, which are registered into a 3D roadmap integrated with planar features and elevation information. This method removes a large number of redundant pseudoedge points so that large-scale point clouds can be registered quickly. However, the visualization in these works is not intuitive enough for remote operation; it is necessary to overlay the 3D terrain model with the captured video images.
Other researchers have enhanced the performance of photorealistic visualization to provide real-time terrain modeling with low CPU consumption. Huber et al. [8] and Kelly et al. [17] describe methods for real-world representation using video-ranging modules. Near-field scenes are rendered using a 3D color mesh, whereas far-field scenes are rendered using a billboard texture placed in front of the robot. However, these methods allocate one color per grid cell, which causes distortion.
Meanwhile, ground segmentation is an important task for determining the traversability of a terrain. To segment ground data in a 2D image or a 3D terrain model, we need to calculate each pixel's or voxel's probability of being in the ground or nonground configuration [18,19]. Vernaza et al. [20] presented a prediction-based structured terrain classification method for the DARPA Grand Challenge; they used an MRF model to classify the pixels of 2D images into obstacle or ground regions. Because the computation of MRFs is too complicated for large-scale datasets, redundant elements must be removed from the MRF to reduce the computational cost of ground segmentation.

Traversable Ground Surface Reconstruction
In this section, the real-time traversable ground surface segmentation and intuitive representation system is presented. Firstly, a GPU-based ground surface reconstruction system is described. Then, a voxel-based flag map is proposed for generating a terrain model without redundancy. Next, a Gibbs-MRF model is applied to detect the ground surface in the reconstructed terrain. Finally, we create a node-based texture mesh for ground surface representation.

GPU-Based Ground Surface Reconstruction System.
In a large-scale environment, a large number of triangles spread across several nodes must be mapped from the captured video images. Hence, to realize real-time traversable ground surface segmentation and reconstruction, we implement the proposed methods in parallel using GPU programming. The framework of the GPU-based terrain modeling system is shown in Figure 1.
After converting 3D points into global positions based on the GPS and rotation states, we copy the global positions and the current captured images in CPU memory to GPU memory. Subsequently, we create a voxel map by registering the global positions into the voxel-based flag map to remove redundancies. The color information of the voxel map is computed by projection from the voxels to the captured 2D images. Using the Gibbs-MRF method, we segment traversable ground data from the voxel map and insert them into the ground mesh. Next, we project each triangle of the mesh onto the captured image in order to acquire the mapped triangles within the captured image. We then duplicate the mapped triangle to the node texture buffer. Finally, we copy the node meshes and the node textures in GPU memory to the node-based texture mesh of the terrain model in CPU memory. By overlaying each node mesh with its node texture, we render an intuitive ground surface.

Voxel-Based Flag Map.
After the vehicle collects several consecutive frames of 3D point clouds, some points are inserted into the same voxel. This causes wasteful duplication of memory if we register these points into the terrain model. To remove redundant points, we developed a voxel-based flag map to register 3D points into the terrain model without reduplication.
The sensed point clouds are quantized into a space of regular voxels. We specify a bitstream to define a voxel-based flag map, allocating a 1-bit variable b(v) for each voxel v. The flag b(v) is set the first time a point falls into voxel v; subsequent points mapped to v are discarded as duplicates. However, the flag map can only cover a limited range around the vehicle. To solve this problem, we shift the center of the flag map to the position of the vehicle whenever the vehicle has moved a certain distance. Passed information must then be dropped from the memory of the flag map so that newly sensed points can be stored. In this manner, we utilize a flag map with limited range to represent the information about the dynamic environment surrounding the vehicle.
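As a minimal illustration of this data structure, the sketch below implements the 1-bit-per-voxel flag semantics in Python (the class name, voxel size, range limit, and recentring policy are hypothetical simplifications; the actual system stores the flags as a bitstream in GPU memory):

```python
class VoxelFlagMap:
    """1-bit-per-voxel flag map: a point is registered only if its voxel
    has not been seen before, removing duplicates across frames."""

    def __init__(self, voxel_size=0.5, half_range=50.0):
        self.voxel = voxel_size
        self.half_range = half_range   # map covers a cube around the center
        self.center = (0.0, 0.0, 0.0)
        self.flags = set()             # indices of voxels whose flag is set

    def _key(self, p):
        return tuple(int(c // self.voxel) for c in p)

    def register(self, p):
        """Return True (and set the flag) the first time a voxel is hit."""
        if any(abs(c - o) > self.half_range for c, o in zip(p, self.center)):
            return False               # outside the map's limited range
        k = self._key(p)
        if k in self.flags:
            return False               # duplicate: flag already set
        self.flags.add(k)
        return True

    def recenter(self, vehicle_pos):
        """Shift the map to the vehicle and drop flags that fall outside."""
        self.center = vehicle_pos
        self.flags = {k for k in self.flags
                      if all(abs((i + 0.5) * self.voxel - o) <= self.half_range
                             for i, o in zip(k, vehicle_pos))}
```

A hash set stands in here for the bitstream: both answer the same membership query, but the fixed-size bitstream gives the constant memory footprint the paper relies on.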

Ground Segmentation Based on Gibbs-MRF Model.
To segment ground data, we need to calculate each voxel's probability of being in the ground or nonground configuration. The voxels in the voxel-based flag map are strongly affected by their neighboring voxels. This local dependence follows the Markov property; therefore, MRF models are widely used in segmentation. In this section, we introduce a ground segmentation method using the Gibbs-MRF model.

MRF and Gibbs Distribution.
We define S as a set of voxels. The random vector X = {X_v} on S consists, for each voxel v, of a color observation variable o_v and a configuration variable c_v. The configuration variable takes one of the values ground, object, or background.

A neighborhood system for S contains, for each voxel v, all voxels within a distance r (r ≥ 0) of v; it is defined as N = {N_v | v ∈ S}, where N_v is the neighbor set of voxel v. We define a clique as a voxel set neighboring a given voxel. In our application, a clique contains the given voxel and its neighboring voxels. The clique set 𝒞 is the collection of single-voxel cliques 𝒞1 and pair-voxel cliques 𝒞2, as follows:

  𝒞 = 𝒞1 ∪ 𝒞2, 𝒞1 = {v | v ∈ S}, 𝒞2 = {(v, v′) | v′ ∈ N_v, v ∈ S}. (1)

We define the observation o_v as a joint value computed from the height value and the color information of voxel v. The color information of a voxel is mapped from the voxel center position to the captured image. The variable c_v represents only the configuration. Given the observations o and configurations c, the segmentation process aims to find the best possible configuration c*, which gives the following optimum solution:

  c* = arg max_c P(c | o). (2)

Based on the MRF property, the configuration at voxel v depends only on the configurations of its neighboring voxels. The random field X on S therefore has the conditional probability density function (PDF)

  P(c_v | c_{S∖{v}}) = P(c_v | c_{N_v}). (3)

However, it is difficult to specify the joint probability density P(c | o) in practice. To evaluate P(c | o), we apply the Gibbs distribution, following the Hammersley-Clifford theorem.

We use a potential function V_C(c) to evaluate the impact of the neighboring voxels in a clique C, C ∈ 𝒞. The energy function U(c) in (4) is defined as the sum of the impacts over the clique set 𝒞:

  U(c) = Σ_{C∈𝒞} V_C(c). (4)

The PDF is calculated using the Gibbs distribution form, as follows:

  P(c) = (1/Z) exp(−U(c)/T), Z = Σ_c exp(−U(c)/T). (5)

The constant T is referred to as the temperature factor in Gibbs' theory, and it controls the deviation of the distribution of P(c) in the MRF. The posterior probability of c under the Bayesian rule is as follows:

  P(c | o) = P(o | c) P(c) / P(o). (6)

The solution of (2) is as follows:

  c* = arg max_c P(c | o) = arg max_c P(o | c) P(c). (7)

We assume that the probability functions P(o | c) and P(c) are in the form of a Gibbs distribution, as expressed in (5). Thus, the process for determining the maximum value of P(c) is the same as that for determining the minimum value of the energy function U(c). Therefore, the solution to (7) is equivalently obtained by minimizing the energy functions, as follows:

  c* = arg min_c U(c | o) = arg min_c {U(o | c) + U(c)}. (8)

In this application, we consider the impact of the neighboring voxels in single-voxel and pair-voxel potential cliques. The energy function U(o | c) + U(c) is expressed as follows:

  U(o | c) + U(c) = Σ_{v∈𝒞1} [V1(c_v) + V1(o_v | c_v)] + Σ_{(v,v′)∈𝒞2} [V2(c_v, c_v′) + V2(o_v, o_v′ | c_v, c_v′)]. (9)

The values of the clique potential functions V1(c_v) and V1(o_v | c_v) depend on the local configuration and observations of a clique in 𝒞1. The clique potential functions V2(c_v, c_v′) and V2(o_v, o_v′ | c_v, c_v′) reflect the pair-voxel consistency of a clique in 𝒞2.
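The minimization in (8) can be approximated by a simple coordinate-descent scheme such as iterated conditional modes (ICM). The Python sketch below runs ICM on a 1-D strip of voxels with a binary ground/nonground label; the data and pair potentials are illustrative Ising-style stand-ins, not the paper's actual potential functions:

```python
# Illustrative ICM minimizer for a binary (ground = 1 / nonground = 0) MRF
# on a 1-D strip of voxels. The potentials are simple stand-ins: a data term
# penalizing disagreement with the observed height, and an Ising pair term
# rewarding neighboring voxels that share the same label.

def data_energy(label, height, h=1.2, delta=0.3):
    """Single-voxel term: low energy when the label matches the height evidence."""
    looks_ground = abs(height + h) <= delta      # near -h below sensor → ground-like
    return 0.0 if (label == 1) == looks_ground else 1.0

def pair_energy(la, lb, beta=0.3):
    """Pair-voxel Ising smoothness term."""
    return -beta if la == lb else beta

def total_energy(labels, heights):
    e = sum(data_energy(l, h) for l, h in zip(labels, heights))
    e += sum(pair_energy(labels[i], labels[i + 1])
             for i in range(len(labels) - 1))
    return e

def icm(heights, iters=10):
    """Greedily relabel one voxel at a time until the energy stops dropping."""
    labels = [0] * len(heights)
    for _ in range(iters):
        changed = False
        for i in range(len(labels)):
            best = min((0, 1), key=lambda l: total_energy(
                labels[:i] + [l] + labels[i + 1:], heights))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels
```

ICM only finds a local minimum of the energy, but each sweep touches every voxel once, which is what makes per-voxel parallelization on the GPU natural.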

Ground Segmentation in the Global Voxel Map.
In the 3D segmentation process, we initially segment the 3D ground data using the robot vehicle's height h as the standard. We assume that if a sensed 3D point lies in the height band from −h − Δ to −h + Δ below the sensor, then this point is ground data. In our experiment, Δ = 0.05 m. This rough segmentation produces an initial ground dataset. The dataset does not contain all of the ground data, because the configuration is determined using only a local height value. To refine the segmentation result computed by this height histogram method, we apply the Gibbs-MRF model to ground segmentation in the global voxel map, which yields a result containing only few errors.
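As a minimal sketch, the height-band test reads as follows (assuming points are (x, y, z) tuples with y as the height relative to the sensor; h and Δ are the vehicle height and tolerance from the text):

```python
def rough_ground(points, h, delta=0.05):
    """Initial segmentation: a point whose height lies in the band
    [-h - delta, -h + delta] (one vehicle height below the sensor)
    is labeled as ground data."""
    return [p for p in points if -h - delta <= p[1] <= -h + delta]
```

This cheap test supplies the initial labeling that the Gibbs-MRF refinement then corrects.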
The sum of the clique potential functions V1(o_v | c_v) and V1(c_v) is formulated as follows, where the constant α in (10) is a positive numerical value:

  V1(c_v) + V1(o_v | c_v) = { 0, if observation o_v supports configuration c_v; α, otherwise }. (10)

The clique potential functions V2(o_v, o_v′ | c_v, c_v′) and V2(c_v, c_v′) are formulated as follows:

  V2(c_v, c_v′) + V2(o_v, o_v′ | c_v, c_v′) = { 0, if c_v = c_v′; β exp(−‖o_v − o_v′‖/γ), if c_v ≠ c_v′ }. (11)

In (11), ‖o_v − o_v′‖ is the color and height difference between observations o_v and o_v′. The constants β and γ in (11) are positive numerical values.
Evaluating (9) with the potential functions defined in (10) and (11), we can label the configuration of each voxel. The voxels with a ground configuration are grouped into the refined ground dataset, which is shown as the blue region in Figure 2.

Texture Mesh Generation for Ground Surface.
In the voxel map visualization results, we find that holes exist between adjacent voxels. To solve this problem, we apply a textured terrain mesh to produce a photorealistic visualization with intuitive and sufficient information. A terrain mesh is a kind of 2.5D elevation map that represents each x-z cell by the top point in that cell. The texture terrain mesh is generated by mapping the texture onto the terrain mesh.
In this section, we describe a node-based texture mesh that provides an intuitive representation of the reconstructed traversable ground surface. The mesh consists of several nodes, each of which has a certain quantity of vertices and a texture. The height value of each vertex in the mesh is updated with the top voxel of an x-z cell in the global voxel map. If a new voxel is to be inserted into the mesh but lies outside the existing nodes, we create a new node to register this voxel. Figure 3 shows the process of node texture generation. For each 3D triangle (P0, P1, P2) in the mesh of a node, we create a triangle (T0, T1, T2) in the node texture, which covers a set of texture pixels. Subsequently, a triangle (I0, I1, I2) in the captured image is obtained by projecting (P0, P1, P2). We then copy the pixels of triangle (I0, I1, I2) into triangle (T0, T1, T2). After all of the triangles in a node mesh have been mapped from the current image, the node texture is updated.
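The triangle-by-triangle copy can be sketched on the CPU as follows; the pinhole projection, buffer layout, and function names are hypothetical simplifications of the paper's GPU implementation:

```python
# Illustrative CPU version of the node-texture generation step: each mesh
# triangle is projected into the captured image, and every texel inside the
# texture triangle copies the color of its barycentric counterpart in the
# image. Camera parameters and buffer layout are placeholder assumptions.

def project(p, f=300.0, cx=329.5, cy=246.5):
    """Hypothetical pinhole projection of a camera-frame 3D point (x, y, z);
    used to obtain the image triangle (I0, I1, I2) from (P0, P1, P2)."""
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)

def barycentric(pt, a, b, c):
    """Barycentric weights of 2D point pt with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx_, cy_) = pt, a, b, c
    d = (by - cy_) * (ax - cx_) + (cx_ - bx) * (ay - cy_)
    w0 = ((by - cy_) * (px - cx_) + (cx_ - bx) * (py - cy_)) / d
    w1 = ((cy_ - ay) * (px - cx_) + (ax - cx_) * (py - cy_)) / d
    return w0, w1, 1.0 - w0 - w1

def fill_triangle(texture, tex_tri, image, img_tri):
    """Copy the image triangle into the texture triangle, texel by texel."""
    h, w = len(texture), len(texture[0])
    for v in range(h):
        for u in range(w):
            w0, w1, w2 = barycentric((u + 0.5, v + 0.5), *tex_tri)
            if min(w0, w1, w2) < 0.0:
                continue                      # texel lies outside the triangle
            ix = w0 * img_tri[0][0] + w1 * img_tri[1][0] + w2 * img_tri[2][0]
            iy = w0 * img_tri[0][1] + w1 * img_tri[1][1] + w2 * img_tri[2][1]
            texture[v][u] = image[int(iy)][int(ix)]
```

On the GPU, the per-texel loop body maps directly onto one CUDA thread, which is why the duplication step parallelizes well.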
In a large-scale terrain environment, more than a thousand such triangles are sensed in the terrain mesh. It is therefore difficult to implement this node texture generation method in real time using CPU programming. We instead perform the triangle duplication in parallel using GPU programming, implemented with CUDA kernels.
We first copy the current captured images, an updated node mesh, and its texture into GPU memory. Next, each triangle of the node mesh is projected onto its destination triangle in the captured image. Then, we duplicate the destination triangle to the corresponding triangle in the node texture. Finally, the generated node texture in GPU memory is copied back into CPU memory (see Pseudocode 1).

Experiments
Experiments were performed to test the proposed traversable ground surface segmentation and modeling method. The mobile robot shown in Figure 4 was used to gather data using its integrated sensors, including a LiDAR sensor, a GPS receiver, a gyroscope detector, and three video cameras. The valid data range of the LiDAR sensor was approximately 70 m from the robot. The multiple sensors collect terrain information in the form of 3D point clouds, 2D images, GPS data, and rotation states. The proposed algorithms were implemented on a laptop with a 2.82 GHz Intel Core2 Quad CPU, a GeForce GTX 275 graphics card, and 4 GB RAM. The terrain model is reconstructed and represented using Microsoft's DirectX application programming interface.

Figure 5 shows the registration of the sensed point clouds and captured video images into the texture terrain mesh. The bottom images were captured by the three cameras. The robot captured one image of 659 × 493 pixels every 0.1 s. The node size was 12.8 × 12.8 m². The resolution of each node texture was 512 × 512 pixels. We compared the texture buffer sizes of the node textures and the video images, as shown in Figure 6. After 20 s, 200 images had been captured and 82 nodes had been registered in the terrain model. The node texture buffer for these nodes was 82.0 Mbit, generated from 247.8 Mbit of video images. The results therefore demonstrate that large-scale video images were registered into the node textures with low memory overhead using the proposed GPU-based ground surface mesh generation method.
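The reported buffer sizes are mutually consistent if both buffers store the same number of bits per pixel. A quick check, assuming 4 bits per pixel (e.g., a DXT1-compressed DirectX texture; the paper does not state the format) and Mbit = 2^20 bits, reproduces the reported figures:

```python
# Sanity check of the reported buffer sizes. The 4-bits-per-pixel figure and
# Mbit = 2**20 bits are assumptions that happen to reproduce the paper's
# numbers; they are not stated in the text.

BITS_PER_PIXEL = 4
MBIT = 2 ** 20

node_pixels = 82 * 512 * 512          # 82 node textures of 512 x 512
image_pixels = 200 * 659 * 493        # 200 captured images of 659 x 493

node_mbit = node_pixels * BITS_PER_PIXEL / MBIT    # 82.0, as reported
image_mbit = image_pixels * BITS_PER_PIXEL / MBIT  # ~247.9, vs. reported 247.8
```

Independently of the per-pixel assumption, the pixel-count ratio (about 0.331) matches the reported 82.0/247.8 ratio, confirming that the saving comes purely from the smaller texture area.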
We compare the GPU-based rendering speed with the CPU-based terrain modeling results, as shown in Figure 7. After the voxels were registered in the terrain model, we segmented the points into ground and nonground datasets using the Gibbs-MRF method. In our project, we executed the ground segmentation procedure once for every 180 packets of registered voxels. The ground segmentation took 0.5271 ms on average, which is much faster than the 0.1 s frame interval and satisfies the real-time requirement.

Pseudocode 1:
  Memcpy(cuda_image, cpu_captured_image, HostToDevice);
  Memcpy(cuda_mesh_XYZ, cpu_updating_node_mesh, HostToDevice);
  Memcpy(cuda_node_texture, cpu_updating_node_texture, HostToDevice);
  cuda_Kernel_Projection<<<Dg, Db>>>(cuda_mesh_XYZ, cuda_mesh_UV, projection_matrix);
  cuda_Kernel_Duplication<<<Dg, Db>>>(cuda_node_texture, cuda_mesh_UV, cuda_image, cuda_node_texture);
  Memcpy(cpu_updating_node_texture, cuda_node_texture, DeviceToHost);
We compute the rendering frame counts every second. After 5 seconds, 21,134 triangles were generated in the ground mesh. The rendering speed is approximately 38 fps (frames per second) using GPU programming. After 35 seconds, 105,723 triangles were generated. The rendering speed slows to 26 fps using GPU programming. Using the CPU, the terrain rendering speed is 6 fps after 30 seconds of data collection. In our study, we aimed to render more than 15 fps to achieve real-time visualization. Therefore, the results demonstrate that we meet the real-time requirements for dataset registration and visualization using GPU programming.

Conclusions
In this study, we developed an intuitive terrain modeling technique with a traversable ground surface segmentation method for a mobile robot. The mobile robot collects 3D point clouds, 2D images, GPS data, and rotation states through multiple sensors. We constructed a voxel-based flag map in order to register the sensed 3D point clouds into the terrain model without redundancy. We segment ground data from the reconstructed voxel map with few errors using a Gibbs-MRF model. The ground voxels are registered into a node-based texture mesh to provide rapid and intuitive information about the visualized terrain with low memory usage. In order to realize a real-time approach, we employed a GPU to implement the proposed methods in parallel. We tested the proposed system using a mobile robot equipped with integrated sensors. The results demonstrated intuitive visualization performance and low memory requirements in a large-scale environment.
However, in a large-scale environment, nonground objects such as persons, trees, and vehicles have complex shapes. In this paper, we represent these objects using a point-based rendering method, and the visualization results are not intuitive enough. In the future, we will research feature-based object classification algorithms to group the sensed objects into different types. We will then represent the different object types using suitable modeling methods.