Classification of terrestrial laser scanning data based on relative projection density

Point cloud classification is a critical step in 3D ground scene analysis. The density of large-scale terrestrial laser scanning data decreases rapidly with increasing distance, which affects feature extraction. To address this problem, we propose grid features based on the relative projection density for point cloud classification. Geometric features are constructed from the eigenvalues of the neighborhood covariance of each point. In grid feature extraction, the relative projection density replaces the raw number of projected points as the grid density feature. On an outdoor scene acquired by panoramic scanning with a RIEGL VZ-400 scanner, grid features based on the relative projection density are compared with traditional projection density features. Using Random Forest for classification, the results show that the relative projection density features achieve an overall accuracy of 96.51%. Compared with the traditional projection density feature, they classify more accurately and also perform relatively well in extracting cars, pedestrians, and poles.


Introduction
Terrestrial laser scanning (TLS) is a new measurement technology in surveying and mapping that efficiently captures accurate 3D information of a target object. In recent years, TLS has been widely used in areas such as building reconstruction [1], vegetation detection [2], road extraction [3], and forest resource surveys [4]. Because the original scanning data are a discrete point cloud without semantic information, point cloud classification, which assigns a semantic label to each 3D point, is a necessary processing step before TLS data can be applied.
Feature extraction is a significant step in point cloud classification, and many features have been proposed in previous studies, including geometric features, RGB colors, and echo-based features. Geometric features are calculated from the 3D coordinates of all points. To describe the shape of the neighborhood, dimensionality features can be constructed from the eigenvalues of the local structure tensor [5]. Weinmann et al. extended the eigenvalue-based features to both 3D and 2D space and combined them with projection density, height difference, and height standard deviation for classification [6][7][8]. The grid projection density, however, is strongly affected by point cloud density, which makes it difficult to capture the geometric characteristics of the ground objects themselves. To address this density problem, Chen et al. [9] presented polar grid features, which can replace the regular grid to adapt to density changes and further improve the classification accuracy of distant buildings. Single-point description based on the point feature histogram (PFH) provides multi-valued features that are robust at the overall scale, but its computational complexity is high [10][11]. To reduce this cost, a similar approach called Fast Point Feature Histograms (FPFH) [12] has been proposed.
RGB color is usually obtained by the built-in camera of the scanner and assigned to each point by registering the image to the point cloud. In [14], taking advantage of the different colors of objects in a mining area, the RGB colors of the point cloud are converted to HSV space to classify the surface point cloud data. In [15], RGB color and geometric features are combined into a 23-dimensional feature vector and classified with random forest. In [16], RGB colors were combined with intensity information corrected for angle and distance for classification.
Echo features mainly include the number of echoes and the echo intensity. Pirotti et al. [17] proposed a method that identifies non-ground points and classifies TLS data based on the number of echoes. Echo intensity, one of the most commonly used features of TLS point clouds, is the maximum amplitude of the returned pulse [18]. The intensity of artificial objects such as buildings and roads is often higher than that of natural features such as vegetation, so intensity can be used to extract buildings or roads directly [19]. In [20], the optimal neighborhood is selected based on the intensity feature and intensity outliers are eliminated; geometric features such as the normal vector and dimensionality features are then combined into a multi-dimensional feature vector and classified with an SVM.
Because RGB colors and echo information are not available for all datasets, relying on non-geometric features limits the generality of a classification method. Geometric features depend only on the coordinates of the points, so we use only geometric features in our investigations. Grid features, a subset of geometric features, have been examined in many studies [7][21], with the rectangular grid being the most commonly used. A grid feature is influenced by both the local structure and the point density of the target object. The number of projected points in each sub-grid generally decreases as the scanning distance increases. Accordingly, using the number of projected points directly as the grid density feature may not be suitable for large-scale TLS data. To address this problem of density variation, we propose the relative projection density to improve the classification accuracy of TLS data.

Neighborhood selection
Describing a point requires finding the subset of neighboring points that best represents the local characteristics of the point cloud. However, point clouds of different sizes and densities affect the definition of the neighborhood and the choice of the corresponding scale parameter. To overcome the disadvantages of a fixed neighborhood, this paper uses the method in [6] to determine the optimal neighborhood size based on K-nearest neighbors (KNN), gradually increasing the number of neighboring points k (k ∈ [10, 100], Δk = 10).
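The search over k can be sketched as follows. This is a minimal illustration, not the implementation of [6]: it assumes that the selection criterion is the minimum eigenentropy of the neighborhood covariance (the text above does not spell out the criterion), it uses a brute-force neighbor search instead of a spatial index, and the names `eigenentropy`, `optimal_k`, and the toy cloud are hypothetical.

```python
import numpy as np

def eigenentropy(neighbors):
    """Shannon entropy of the normalised covariance eigenvalues of a neighbourhood."""
    cov = np.cov(neighbors.T)            # 3x3 covariance matrix of the neighbourhood
    ev = np.linalg.eigvalsh(cov)         # real eigenvalues of a symmetric matrix
    ev = np.clip(ev, 1e-12, None)        # guard against zeros and tiny negatives
    e = ev / ev.sum()                    # normalise so the eigenvalues sum to 1
    return float(-(e * np.log(e)).sum())

def optimal_k(points, index, k_values=range(10, 101, 10)):
    """Return (k, entropy) for the candidate k with minimal eigenentropy (assumed criterion)."""
    dists = np.linalg.norm(points - points[index], axis=1)
    order = np.argsort(dists)            # brute force; order[0] is the query point itself
    best_k, best_e = None, np.inf
    for k in k_values:                   # k in [10, 100] with step 10, as in the text
        e = eigenentropy(points[order[:k + 1]])
        if e < best_e:
            best_k, best_e = k, e
    return best_k, best_e

# Toy data: a roughly planar patch with small noise in z.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(500, 3)) * np.array([5.0, 5.0, 0.05])
k_opt, e_opt = optimal_k(cloud, index=0)
```

For real TLS data with millions of points, a k-d tree or similar index would replace the brute-force distance computation.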

Feature extraction
Most public datasets contain the geometric information of 3D spatial coordinates, so this paper mainly uses geometric features. First, the ground points are removed with the Cloth Simulation Filter (CSF) [22]. Second, the features are calculated from the remaining points. Finally, the geometric features (3D features, 2D features, and grid features) of each point are generated using the optimal neighborhood selection method.
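As background for the eigenvalue-based features used in this section, the following sketch computes several descriptors from the three covariance eigenvalues of a neighborhood. The formulas are the definitions commonly given in the literature and are an assumption here, since this text does not restate them; the function name `eigen_features` is hypothetical. Shannon entropy DE, local point density D, and verticality V are omitted because their exact definitions are not given in this section.

```python
import math

def eigen_features(l1, l2, l3):
    """Eigenvalue-based descriptors for eigenvalues l1 >= l2 >= l3 > 0."""
    if not (l1 >= l2 >= l3 > 0):
        raise ValueError("expected l1 >= l2 >= l3 > 0")
    s = l1 + l2 + l3
    e1, e2, e3 = l1 / s, l2 / s, l3 / s  # normalised eigenvalues, sum to 1
    return {
        "linearity": (e1 - e2) / e1,
        "planarity": (e2 - e3) / e1,
        "scattering": e3 / e1,
        "omnivariance": (e1 * e2 * e3) ** (1.0 / 3.0),
        "anisotropy": (e1 - e3) / e1,
        "eigenentropy": -sum(x * math.log(x) for x in (e1, e2, e3)),
        "change_of_curvature": e3 / (e1 + e2 + e3),
    }

# A nearly planar neighbourhood: two large eigenvalues and one small one.
planar = eigen_features(1.0, 0.9, 0.01)
```

With these definitions, linearity, planarity, and scattering always sum to 1, which is why they are often read as soft memberships in 1D, 2D, and 3D structure.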

3D features.
In this paper, we use the method in [6] to construct 3D features. Linearity $L_\lambda$, planarity $P_\lambda$, scattering $S_\lambda$, Shannon entropy $DE$, eigenentropy $EE$, omnivariance $O_\lambda$, anisotropy $A_\lambda$, change of curvature $C_\lambda$, the local point density $D$, and the verticality $V$ of each point are used to describe the 3D geometric properties of the point cloud. The 10-dimensional 3D feature vector can be written as
$F_{3D} = [L_\lambda, P_\lambda, S_\lambda, DE, EE, O_\lambda, A_\lambda, C_\lambda, D, V]$

Grid features.
2D grid feature extraction takes a 2D grid, usually a rectangular one, as the processing object. Grid features are extracted by projecting the 3D points onto the 2D grid and analyzing the points in each sub-grid. Projection density is a crucial grid feature. In a large-scale scene, the point density decreases rapidly as the scanning distance increases, so a sub-grid near the scanner usually contains more points than one farther away, as shown in Figure 1. The projection density based on the number of points therefore depends not only on the height difference within a sub-grid but also directly on the point density. For example, the projection density of tall buildings at a distance may be lower than that of low vegetation or tree canopies close to the scanner, which reduces the classification accuracy. Directly taking the number of projected points as the grid density feature may thus be unsuitable for large-scale laser scanning data. In this paper, the relative projection density is proposed as the grid density feature based on a rectangular grid.
In addition to the density term, the grid feature includes the maximum height difference and the standard deviation of the elevation values within each sub-grid. Hence, the grid feature based on relative projection density consists of three components: the relative projection density, the maximum height difference, and the height standard deviation.
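For illustration only, the sketch below computes the traditional rectangular-grid quantities per sub-grid: the raw number of projected points, the maximum height difference, and the height standard deviation. It does not implement the relative projection density, whose formula is not reproduced in this section; in the proposed feature, that relative density would take the place of the raw count. The function name `grid_features` and the sample points are hypothetical, and the 1 m cell size follows the setting reported in the Data section.

```python
import math
from collections import defaultdict
from statistics import pstdev

def grid_features(points, cell=1.0):
    """Map each occupied (ix, iy) cell to (count, max height difference, height std)."""
    cells = defaultdict(list)
    for x, y, z in points:
        # Project the point onto the XY plane and find its rectangular cell.
        cells[(math.floor(x / cell), math.floor(y / cell))].append(z)
    return {
        key: (len(zs), max(zs) - min(zs), pstdev(zs))
        for key, zs in cells.items()
    }

pts = [(0.2, 0.3, 0.0), (0.8, 0.1, 2.0), (0.5, 0.5, 4.0), (1.4, 0.2, 1.0)]
cell_feats = grid_features(pts)
```

Each point then inherits the feature tuple of the cell it falls into. The raw count in the first component is exactly the quantity that varies with scanning distance.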

Classification method
The reference label of each point is assigned manually, while ground points are obtained by CSF filtering. An equal number of sample points is randomly selected from each category as training samples, and Random Forest (RF) [23] is used for classification. We use the package in [24] and set the number of trees in the forest to 500.
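The class-balanced selection of training samples can be sketched as below. This is an assumed, minimal version using only the Python standard library; the function name `balanced_sample` and the toy labels are hypothetical, and the Random Forest package of [24] is not reproduced here.

```python
import random
from collections import defaultdict

def balanced_sample(labels, per_class, seed=None):
    """Return sorted indices with exactly per_class randomly drawn points per label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    chosen = []
    for lab, idx in by_class.items():
        if len(idx) < per_class:
            raise ValueError(f"class {lab!r} has only {len(idx)} points")
        chosen.extend(rng.sample(idx, per_class))  # sampling without replacement
    return sorted(chosen)

# Toy labels with a strong class imbalance.
labels = ["building"] * 50 + ["vegetation"] * 30 + ["car"] * 5
train_idx = balanced_sample(labels, per_class=5, seed=0)
```

The selected indices would then be used to fit a Random Forest with 500 trees, and the fitted model would predict labels for all remaining points.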

Data
The experimental data in this section were acquired in an urban scene with a RIEGL VZ-400 scanner at an angular resolution of 0.057°. The information on each category in the scene is shown in Table 1. The ground points are filtered out by CSF and do not participate in feature extraction or classification. Therefore, 5,723,196 non-ground points participate in feature extraction and classification, and 1,000 points are randomly selected from each of the six categories as samples.
The relative projection density is compared with the projection density based on the polar grid, which was also proposed to adapt to density variation in TLS data [9]. Apart from the grid features, the 3D and 2D features are the same as in [9]. Because the 3D and 2D neighborhoods of both methods are generated adaptively by the method in [7], the only parameters that require manual setting relate to the grid features. The main parameter of the grid feature proposed in this paper is the grid size, which is set to 1 m × 1 m in this test. For the polar grid, the main parameters are the angular width and the polar grid size Δ. The angular width is set to N times the angular resolution (0.057°) with N = 10, and Δ is set to 0.5.
Because the samples are selected randomly, the result of each run may differ, so each method is run 100 times. The evaluation metrics are overall accuracy (OA), recall, precision, and F1-score. The experimental results are shown in Tables 2 and 3.
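The four evaluation metrics can be computed from reference and predicted labels as in the following sketch. It uses the standard definitions of overall accuracy, precision, recall, and F1-score; the function name `classification_metrics` and the toy labels are hypothetical and are not taken from the experiments.

```python
def classification_metrics(y_true, y_pred):
    """Return overall accuracy and per-class (precision, recall, F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    oa = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly labelled as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly labelled as c
        fn = sum(t == c and p != c for t, p in pairs)  # points of c that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    return oa, per_class

y_true = ["car", "car", "pole", "building", "building", "building"]
y_pred = ["car", "pole", "pole", "building", "building", "pole"]
oa, per_class = classification_metrics(y_true, y_pred)
```

In this toy example the pole class has perfect recall but a precision of only 1/3, because points from other classes are labelled as pole. Averaging these values over repeated runs with different random samples gives the kind of summary reported in the tables.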
Our method obtains a better overall accuracy than the polar grid features, with the corresponding overall accuracy values above 90%. The best overall accuracy of a single run is 96.75% for the relative projection density grid features, and their average overall accuracy is 3.2% higher than that of the polar grid features.
At the class level, the relative projection density grid features outperform the polar grid features in every class. The two methods have similar precision on the building class, but the relative projection density grid features retain higher recall and F1-score. In addition, as shown in Figure 3, with the polar grid features many internal structures of nearby buildings and facades of distant buildings are misidentified as car and vegetation. In the vegetation class, recall, precision, and F1-score are close to each other for the relative projection density grid features, and the false detections come mainly from the car and pole classes. With the polar grid features, however, more false detections come from the car and building classes, which has a greater impact on building extraction. Moreover, there is an obvious gap between the accuracy indicators of the two methods in the vegetation class. In the car and pole classes, the recall of both methods is higher than 95%, indicating that most surface points of cars and poles are correctly identified, but the precision is lower than 50%; the precision of pole under the polar grid features is only 18.57%, the lowest value over all classes. These two classes also have lower F1-scores than the other classes. In the pedestrian and others classes, the relative projection density grid features are superior to the polar grid features. Overall, our method outperforms the polar grid features on all classes, with the largest improvements for car, pole, and pedestrian.
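The low F1-score of the pole class follows directly from its low precision. As a rough worked bound that uses only the figures quoted in the text (precision P = 18.57% under the polar grid features and recall R above 95%), and noting that F1 increases with R for fixed P:

```latex
F_1 = \frac{2PR}{P+R}, \qquad P = 0.1857,\; R \in (0.95,\, 1]
\;\Longrightarrow\;
\frac{2(0.1857)(0.95)}{0.1857 + 0.95} \approx 0.311
\;<\; F_1 \;\le\;
\frac{2(0.1857)(1)}{0.1857 + 1} \approx 0.313
```

Even with nearly perfect recall, the harmonic mean stays close to the smaller of the two values, so the F1-score cannot rise much above 0.31 at this precision.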