Detection of Man-Made Constructions using LiDAR Data and Decision Trees

Real estate monitoring is a very important aspect of a country's economy, but the old manual methods of land survey are time- and resource-consuming processes for geodata actualization tasks. Actual, precise, multidimensional and detailed information is the main instrument of geospatial intelligence for understanding the current economic situation and making effective decisions. Actualization of geoinformation using remote sensing is the modern approach of the computer age to complete Earth observation and human environment monitoring. This article describes a multi-stage classification model, which detects man-made constructions in a LiDAR point cloud. The proposed classification model applies a decision tree and geometrical features of shapes to remove noise. The goal of the study is to experimentally compare decision trees with crisp and fuzzy logic (the ID3 and FID3 algorithms) to select the more suitable algorithm for the noise reduction task. The algorithms are compared using total accuracy and Cohen's Kappa coefficient.


Introduction
Land and rural development is an important part of human existence; however, natural resources must be used efficiently, considering different factors like environmental protection, cultural heritage, the potential for development of tourism and manufacturing, legal and economic conditions, etc. Geospatial intelligence can make a correct decision about the effective usage of Earth resources only if it has precise and sufficiently detailed information about the actual geospatial situation. Therefore, geospatial data actualization must be completed on an ongoing basis.
Remote sensing is the modern approach of the computer age to complete Earth observation and monitoring, providing relatively fast and cheap solutions for geospatial data actualization; however, the obtained data must be preprocessed to extract statistical and semantic information for decision-making. Remote sensing data can be analysed manually, but it is a time-consuming process due to the massive amount of data, which makes it necessary to develop automatic data actualization systems with high-performance computing (HPC) solutions.
This research is a follow-up to the development of an HPC system for real-estate actualization using LiDAR point clouds and computer vision. The proposed system consists of three stages: the 1st stage filters the last return points in the LiDAR point cloud to remove vegetation and other noise; the 2nd stage detects and segments surface facilities using the min-cut method; the 3rd stage classifies surface facilities to identify buildings among them. The goal of each stage is to remove additional noise-objects (see Fig. 1).

Fig. 1. Classification system with multi-stage noise filtering
The initial classification model used an area filter to identify buildings among noise-objects. The result of LiDAR point cloud processing is a vector layer with building shapes prepared for geographical information systems (GIS). The obtained vector layer is compared with a previous layer to detect geospatial changes using the intersection of shapes. When geospatial changes are detected, an image analyst must verify them using orthophotos, spectral images or LiDAR before making data actualization.
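The change detection described above can be sketched in a few lines. This is a minimal illustration, not the system's actual GIS routine: shapes are reduced to axis-aligned bounding boxes (xmin, ymin, xmax, ymax), whereas the real comparison uses exact polygon intersection; function names are illustrative.

```python
# Simplified sketch of change detection between two building layers,
# assuming shapes are reduced to axis-aligned bounding boxes.

def bbox_overlap_area(a, b):
    """Area of intersection of two bounding boxes; 0 if disjoint."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def detect_changes(old_layer, new_layer, min_overlap=0.5):
    """Flag new shapes that do not sufficiently intersect any old shape."""
    changes = []
    for shape in new_layer:
        area = (shape[2] - shape[0]) * (shape[3] - shape[1])
        matched = any(bbox_overlap_area(shape, old) / area >= min_overlap
                      for old in old_layer)
        if not matched:
            changes.append(shape)   # candidate change: verify with orthophoto/LiDAR
    return changes

old = [(0, 0, 10, 10)]
new = [(1, 1, 9, 9), (20, 20, 30, 30)]   # first matches an old building, second is new
print(detect_changes(old, new))          # → [(20, 20, 30, 30)]
```

The flagged shapes would then be passed to the image analyst for verification, as described above.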
Geospatial data belongs to big data. Therefore, despite the high recognition accuracy of the system, even an error smaller than 1% produces too many false objects. To improve classification accuracy, it was decided to replace the area filter with a decision tree. The previous study was related to geometric feature selection to separate buildings from walls, robust trees, large cars and other surface objects using a random forest of decision trees with crisp logic. 11 geometric features were studied and 5 features were selected as the most effective for classification, providing a solution with a total accuracy of 99% and a Cohen's Kappa coefficient of 0.90 (Kodors, 2017). However, completing classification tasks, some authors obtain better accuracy results using fuzzy decision trees compared with crisp decision trees (Idri and Elyassami, 2011). Therefore, the goal of this study is to compare the classification accuracy of two decision tree models, with crisp logic and with fuzzy logic, as a solution for building recognition using the geometric features of shapes. An additional task of the study is to measure the influence of the correct classification probability on recognition accuracy and the loss of data, which can be applied to set verification priorities for image analysts.

Decision Trees and Remote Sensing
Decision trees are classification methods and algorithms with a relatively long history. The idea of using decision trees to identify and classify objects was first mentioned by Hunt et al. in 1966 (Sharma, 2013). Decision trees successfully find application in classification tasks using remote sensing data. For example, decision trees are applied to classify land covers using spectral images due to the natural approach, when each pixel is analysed independently (Sharma, 2013), (Kulkarni and Shrestha, 2017), (Pooja et al., 2011), (Kulkarni and Lowe, 2016). However, pixel-based methods become ineffective as resolution increases (Veljanovski et al., 2011); this does not reduce the significance of decision trees as a classification method, which has found a renaissance in the processing of shape or segment features. For example, a LiDAR point cloud can be projected into a 2D grid using voxel indices with subsequent classification of each pixel (Nesrine et al., 2009); or a shape can be described using mathematical parameters compatible with the input of a decision tree (Jamil and Bakar, 2006), which was applied in the study (Kodors, 2017).
Fuzzy decision trees are based on fuzzy logic, introduced by Zadeh in 1965 (Idri and Elyassami, 2011). Completing an experimental comparison, some authors obtain better accuracy results using fuzzy decision trees in place of decision trees with crisp logic (Idri and Elyassami, 2011). Fuzzy decision trees do not work directly with the input data: each value is first preprocessed by a membership function, which identifies the strength of belonging to some subcategory of a feature, called an event. Fuzzy trees have been applied to LiDAR processing before: a pixel-based solution for forest boundary detection (Zhang et al., 2017) and an object-based solution for land cover classification (Syed et al., 2005).

Dataset
25 samples of LiDAR point cloud have been applied in the experiment. The dataset of LiDAR data was provided by the State Land Service of Latvia for research tasks. The data was collected considering the following technical parameters (WEB, a):
- the total minimal point density must be 4 p/m², the DEM must have a minimum of 1.5 p/m²;
- the vertical precision must be 0.12 m with a confidence level of 95%;
- the horizontal precision must be 0.36 m with a confidence level of 95%.
The collected data was preprocessed, filtered and classified; each sample contained a point cloud with an area of 1 km² and a minimal point density equal to 1 p/m².
The provided dataset was processed using the following algorithm:
1st step: the LiDAR point cloud is filtered to retain only surface points (single and last return points).
2nd step: the LiDAR point cloud is projected into a 2D grid recording the maximally high point in each cell of area 1 m².
3rd step: the points with strong elevation (1.8 m) are marked as seed points for the min-cut segmentation algorithm.
4th step: surface facilities are segmented using Dinic's algorithm with Dijkstra's path finding algorithm.
5th step: the obtained segments are vectorised using the 4-path (rook type) Theo Pavlidis' algorithm to get the shapes of objects.
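The grid projection of the 2nd step can be sketched in a few lines, assuming points are (x, y, z) triples; the function name and the sparse-dictionary representation are illustrative, not the system's actual implementation:

```python
# Minimal sketch of the 2nd step: projecting surface points into a 2D grid,
# keeping the maximal height per cell (cell size 1 m^2 by default).

def project_to_grid(points, cell_size=1.0):
    """Return a dict mapping (col, row) cell index -> maximal z inside the cell."""
    grid = {}
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size))
        if key not in grid or z > grid[key]:
            grid[key] = z
    return grid

points = [(0.2, 0.7, 3.1), (0.9, 0.1, 5.4), (1.5, 0.5, 2.0)]
print(project_to_grid(points))   # → {(0, 0): 5.4, (1, 0): 2.0}
```

Cells whose maximal height exceeds the 1.8 m elevation threshold would then become the seed points of the 3rd step.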
Each shape was manually classified into two classes, "buildings" and "noise", verifying each object using the cadastral map, orthophoto and classified LiDAR points. The total number of shapes is 844 284 with 99.68% noise-objects. The total number of unique shapes is 19 999, where 2 428 (12.14%) belong to buildings and 17 825 (89.13%) are noise. 5 geometric features (see Table 1) were calculated for each shape. The features were selected considering the previous study (Kodors, 2017).

Features of Dataset
The traditional classification methods with crisp logic try to find hyperplanes, which separate one class from another; however, classical logical reasoning is not effective due to the intersection of feature values (see Fig. 2-3).
The unique samples of the dataset (19 999) have been analysed with the goal to identify how many common samples belong to both classes "buildings" and "noise" depending on the number of features (see Table 2). The analysis of the common sample decrease depending on the set of features (see Table 2) has shown that the feature C1 does not minimize the number of common samples. The Spearman correlation between C1 and A is 0.626 for the class "noise" and 0.423 for the class "buildings", according to the source (Kodors, 2017). If C1 is compared with C2, the equations have a similar form. The correlation analysis (Spearman) for the 254 common samples has shown that C1 has a very weak correlation with A (0.039), a moderate correlation with C2 (0.456) and a strong correlation with F (-0.707) and R (-0.693). The feature A has a very strong correlation with C2 (-0.838) and a weak correlation with F (0.223), R (0.183) and C1 (0.039).
Completing the analysis of entropy (see Eq. 1-2 (Sharma, 2013), (Pooja et al., 2011), (Kulkarni and Lowe, 2016) and Table 3), the higher information gain is provided by the features { C2, A, R }, which complies with the features' importance; but { C1, F } are weak features for the cluster split. The conclusion is that the features A and C1 do not replace each other; C1 is simply too weak in this case to decrease the number of common samples.
E(D) = -\sum_{i=1}^{n} p(c_i) \log_2 p(c_i),    (1)

where E(D) is the entropy of dataset D; n is the number of classes; c_i is a class; D is a dataset; p(c_i) is the probability of class c_i.

G(a) = E(D) - \sum_{j} p(a_j) E(D \mid a_j),    (2)

where G(a) is the information gain of feature a; a is a feature; j is a band of feature a (a subgroup of the value range); p(a_j) is the probability that a sample of feature a belongs to band j (see Eq. 3); E(D | a_j) is the entropy of the subdataset (D | a_j) (see Eq. 1); E(D) is the entropy of the whole dataset (see Eq. 1).

p(a_j) = |(D \mid a_j)| / N,    (3)

where p(a_j) is the probability that a sample of feature a belongs to band j; N = |D| is the size of the whole dataset; (D | a_j) is the set of samples which belong to band j of feature a.

So, there is no possibility to uniquely classify 254 shapes using the set of features { C2, R, F, A, C1 }; however, each sample of a shape has a different probability to belong to each class (see Fig. 4), which can be applied by the classification system, when users get a classified layer with a probability coefficient for each shape. The frequency analysis is completed for each feature (see Fig. 5).
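The entropy and information-gain computations of Eq. 1-3 can be sketched as follows. This is a minimal illustration; the feature bands and class labels in the example are hypothetical, not taken from the experiment dataset:

```python
import math
from collections import Counter

# Sketch of Eq. 1-3: entropy of a labelled dataset and information gain of a
# banded feature. A sample is assumed to be a (band, class) pair for one feature.

def entropy(labels):
    """E(D) = -sum p(c) * log2 p(c) over classes (Eq. 1)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(samples):
    """G(a) = E(D) - sum_j p(a_j) * E(D | a_j) (Eq. 2-3)."""
    labels = [c for _, c in samples]
    gain = entropy(labels)
    for band in {b for b, _ in samples}:
        subset = [c for b, c in samples if b == band]
        gain -= len(subset) / len(samples) * entropy(subset)
    return gain

data = [("compact", "building"), ("compact", "building"),
        ("extended", "noise"), ("extended", "noise")]
print(information_gain(data))   # → 1.0 (the feature separates the classes perfectly)
```

A weak feature such as C1, whose bands mix the two classes, would yield a gain close to 0.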
Frequency and distribution analysis is applied to define the membership functions for the fuzzy decision tree and to better understand each feature.

Interactive Dichotomizer 3 (ID3)

Function: Create node
Goal: to generate a decision tree. Input: D - a training dataset, A - the set of attributes, C - the set of classes. Output: node.

Start
If ( all samples of D belong to one class c ∈ C ) Return the leaf node of class c with probability 1.0.
Else-if ( |A| = 1 ) Return the leaf node of feature a ∈ A with probability p(c_i) of each class.
Else
Calculate the entropy E of dataset D using Eq. 1;
Calculate the information gain G(a) using Eq. 2 for each attribute a ∈ A;
Select the attribute a_max with the maximal information gain G(a_max);
Construct a new node with the feature a_max;
For each band b of the feature a_max obtain the subdataset D';
Remove the feature a_max from the set: A' = A - a_max;
For each band of the new node call the function "Create node" with the parameters ( D', A', C );
Return the node of the decision tree with probability p(c_i) of each class.
End

Function: Classify sample
Goal: to classify a sample. Input: n - a node of the decision tree, s - a sample. Output: class and its probability.

Start
If ( n is a leaf node ) Return the class with the maximal probability in the current node n.
Else
Select the next node n' by the band of the node feature;
If ( n' does not exist ) Return the class with the maximal probability in the current node n;
Else Return the output of the function "Classify sample" using the parameters ( n', s ).
End
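The "Create node" and "Classify sample" pseudocode above can be condensed into a short runnable sketch. It is an illustration only: the tree is a plain dictionary, the features and bands in the example are hypothetical, and no pruning or tie-breaking rules of the actual system are modelled.

```python
import math
from collections import Counter

# Compact sketch of ID3 with crisp bands. A sample is a dict of feature -> band;
# training rows are (sample, class) pairs.

def entropy(rows):
    total = len(rows)
    counts = Counter(c for _, c in rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def create_node(rows, features):
    classes = Counter(c for _, c in rows)
    probs = {c: n / len(rows) for c, n in classes.items()}
    if len(classes) == 1 or not features:
        return {"leaf": True, "probs": probs}
    def gain(f):   # information gain of feature f (Eq. 2)
        g = entropy(rows)
        for band in {s[f] for s, _ in rows}:
            sub = [(s, c) for s, c in rows if s[f] == band]
            g -= len(sub) / len(rows) * entropy(sub)
        return g
    best = max(features, key=gain)   # attribute with the maximal information gain
    children = {}
    for band in {s[best] for s, _ in rows}:
        sub = [(s, c) for s, c in rows if s[best] == band]
        children[band] = create_node(sub, [f for f in features if f != best])
    return {"leaf": False, "feature": best, "children": children, "probs": probs}

def classify(node, sample):
    while not node["leaf"]:
        child = node["children"].get(sample[node["feature"]])
        if child is None:          # unknown band: answer from the current node
            break
        node = child
    return max(node["probs"], key=node["probs"].get)

rows = [({"C2": "compact", "A": "small"}, "building"),
        ({"C2": "compact", "A": "large"}, "building"),
        ({"C2": "extended", "A": "small"}, "noise")]
tree = create_node(rows, ["C2", "A"])
print(classify(tree, {"C2": "compact", "A": "small"}))   # → building
```

Each node keeps the class probabilities, so a sample with an unknown band is still answered from the deepest reached node, mirroring the pseudocode above.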

Fuzzy Interactive Dichotomizer 3 (FID3)
FID3 is based on the ID3 algorithm; the difference is the star entropy calculated using membership functions. A linguistic group of a feature is called an event and its probability is defined by the membership function (see the example in Fig. 6).
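A membership function of this kind can be illustrated with a simple trapezoid. The events, feature name and breakpoints below are hypothetical; the real functions in Fig. 6 were defined manually from the frequency analysis of the data:

```python
# Illustrative trapezoidal membership functions for a hypothetical "compactness"
# feature split into the events "compact" and "extended".

def trapezoid(x, a, b, c, d):
    """Membership rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

membership = {
    "compact":  lambda x: trapezoid(x, -0.1, 0.0, 0.5, 0.7),
    "extended": lambda x: trapezoid(x, 0.5, 0.7, 1.0, 1.1),
}

x = 0.6   # a shape lying between the two events
print({event: round(mu(x), 2) for event, mu in membership.items()})
# → {'compact': 0.5, 'extended': 0.5}
```

In FID3 a sample is assigned to the event with the maximal membership output, but the partial memberships carry the "strength of belonging" into the star entropy below.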

Function: Create fuzzy node
Goal: to generate a fuzzy decision tree. Input: D - a training dataset, A - the set of attributes, C - the set of classes, M - membership functions. Output: fuzzy node.

Start
If ( all samples of D belong to one class c ∈ C ) Return the leaf node of class c with probability 1.0.
Else-if ( |A| = 1 ) Return the leaf node of feature a ∈ A with the star probability p*(c_i) of each class.
Else
For each class c_i ∈ C calculate the star probability p*(c_i | g_j) using Eq. 6;
For each feature calculate the star entropy E* using Eq. 4;
Select the feature a_min ∈ A with the minimal star entropy E*(a_min);
Construct a new fuzzy node with the feature a_min;
For each event g of the feature a_min obtain the subdataset D';
Remove the feature a_min from the set: A' = A - a_min;
For each event of the new fuzzy node call the function "Create fuzzy node" with the parameters ( D', A', C, M );
Return the fuzzy node with the star probability p*(c_i) of each class.
End

Function: Classify sample
Goal: to classify a sample. Input: n - a node of the decision tree, s - a sample, M - membership functions. Output: class and its probability.

Start
If ( n is a leaf node ) Return the class with the maximal probability in the current node n.
Else
Select the next node n' by the membership function with the maximal output;
If ( n' does not exist ) Return the class with the maximal probability in the current node n;
Else Return the output of the function "Classify sample" using the parameters ( n', s ).
End

E^{*}(a) = -\sum_{j=1}^{m} p^{*}(g_j) \sum_{i=1}^{n} p^{*}(c_i \mid g_j) \log_2 p^{*}(c_i \mid g_j),    (4)

where E*(a) is the star entropy of feature a; n is the number of classes; c_i is a class; m is the number of membership functions (events) of feature a ∈ A; p*(g_j) is the star probability of event g_j (see Eq. 5); p*(c_i | g_j) is the star probability of event g_j for class c_i (see Eq. 6).

p^{*}(g_j) = \sum_{d \in (D \mid g_j)} \mu_j(d) / N,    (5)

where p*(g_j) is the mean star probability of event g_j; N = |(D | g_j)| is the number of samples, which belong to event g_j; μ_j(d) is the membership function j of attribute a; d are the samples, which belong to event g_j. A sample belongs to the event with the maximal output of a membership function.

p^{*}(c_i \mid g_j) = \sum_{d' \in (D \mid g_j, c_i)} \mu_j(d') / N',    (6)

where p*(c_i | g_j) is the mean star probability of class c_i in event g_j; N' = |(D | g_j, c_i)| is the size of the subdataset of D, where all samples belong to class c_i and event g_j; μ_j(d') is the membership function j of attribute a; d' are the samples, which belong to class c_i and event g_j.
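The star entropy of Eq. 4-6 can be sketched as follows. This is a minimal interpretation, assuming star probabilities are mean membership outputs as defined above; the data rows and their membership values are illustrative, not the system's actual computation:

```python
import math

# Sketch of Eq. 4-6. `data` rows are (membership value of the assigned event,
# event, class), where a sample is assigned to the event with the maximal
# membership output.

def star_entropy(data, events, classes):
    """E*(a): star entropy of one feature over its events (Eq. 4)."""
    e_star = 0.0
    for g in events:
        rows = [(mu, c) for mu, ev, c in data if ev == g]
        if not rows:
            continue
        p_g = sum(mu for mu, _ in rows) / len(rows)      # Eq. 5: mean membership
        for c in classes:
            cls = [mu for mu, cc in rows if cc == c]
            if cls:
                p_cg = sum(cls) / len(cls)               # Eq. 6
                e_star -= p_g * p_cg * math.log2(p_cg)
    return e_star

# a feature whose events perfectly separate the classes has star entropy 0
data = [(1.0, "compact", "building"), (1.0, "compact", "building"),
        (1.0, "extended", "noise")]
print(star_entropy(data, ["compact", "extended"], ["building", "noise"]))   # → 0.0
```

FID3 selects the feature with the minimal star entropy, so a perfectly separating feature as in the example would be chosen first.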

Results and discussions
The obtained dataset was processed using the ID3 and FID3 algorithms. Each algorithm was analysed using two training approaches: a) the algorithms are trained using the dataset with unique samples (see Tables 4a and 5a); b) the algorithms are trained using the full dataset with repeating samples (with the probability of each sample) (see Tables 4b and 5b). Validation was completed using the full dataset with repeating samples, where the area of objects is greater than 10 m² (this condition is defined in the previous study (Kodors, 2017)).
The histogram bands of Fig. 2-3 and Fig. 5 were used by the ID3 algorithm to construct the decision tree with crisp logic. The FID3 algorithm used membership functions manually defined using the distribution and frequency analysis of the data (see Fig. 6).
The accuracy of each algorithm is evaluated using the confusion matrix, the total accuracy (A) and Cohen's Kappa coefficient (K) (see Tables 4 and 5). Additionally, the results of the experiments are compared with the random forest algorithm applied in the previous study (Kodors, 2017).
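The two evaluation metrics can be computed directly from a 2x2 confusion matrix. The counts in the example are invented for illustration and do not reproduce Tables 4-5:

```python
# Total accuracy and Cohen's Kappa from a confusion matrix [[TP, FN], [FP, TN]].

def accuracy_and_kappa(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    acc = (tp + tn) / total
    # expected agreement by chance, from the marginal distributions
    p_yes = ((tp + fn) / total) * ((tp + fp) / total)
    p_no = ((fp + tn) / total) * ((fn + tn) / total)
    pe = p_yes + p_no
    kappa = (acc - pe) / (1 - pe)
    return acc, kappa

acc, kappa = accuracy_and_kappa(tp=90, fn=10, fp=10, tn=890)
print(round(acc, 3), round(kappa, 3))   # → 0.98 0.889
```

The example shows why Kappa is reported alongside the total accuracy: with 99.68% noise-objects in the dataset, a high accuracy alone can hide a weak detector of the rare "buildings" class.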
The result of the experiment has shown that a fuzzy decision tree can process unique samples, when the frequency of each sample is unknown (see Table 5a). However, the ID3 algorithm is more precise (see Table 4b), if there is a sufficiently large dataset to obtain the probability of unique samples. This comparison of the algorithms identifies the importance of sample probability for the ID3 algorithm. In contrast, a fuzzy decision tree applies knowledge about sample probability hidden in the membership functions. To verify the ability of FID3 to identify unknown samples, the dataset of unique samples was split into a training dataset (20%) and a validation dataset (80%). The measurements were completed 1000 times and provided the following results:
- A_min = 0.90687, A_mean = 0.90988, A_max = 0.91292;
- K_min = 0.54366, K_mean = 0.55919, K_max = 0.57290.
Therefore, fuzzy decision trees are preferable, when there is no dataset with the probability of samples, but experts can identify linguistic groups, which generalize biases and probability. Compared with the random forest algorithm applied in the previous study (Kodors, 2017), both algorithms ID3 and FID3 have smaller precision; their errors are 1.4% and 1.6% versus 1.1% for the random forest algorithm.
Selecting an answer, a decision tree verifies the probability of each class in the node. Considering that, the probability can be applied to filter incorrect answers; however, it leads to the loss of data (see Fig. 7). According to Fig. 7, ID3 is more robust: 60% of buildings are classified with probability 99%, unlike 40% for FID3. The parameter "probability" can be provided together with a shape; it will be useful to accelerate manual data verification using the filter of a GIS.
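The trade-off of Fig. 7 can be sketched as a simple threshold filter. The predictions in the example are invented for illustration; the function name is hypothetical:

```python
# Sketch of the probability filter: answers below a confidence threshold are
# rejected, which lowers the error rate but increases the loss of data.
# Predictions are (true_class, predicted_class, probability) triples.

def filter_by_probability(predictions, threshold):
    kept = [(t, p) for t, p, prob in predictions if prob >= threshold]
    loss = 1 - len(kept) / len(predictions)
    errors = sum(1 for t, p in kept if t != p)
    error_rate = errors / len(kept) if kept else 0.0
    return error_rate, loss

preds = [("building", "building", 0.99), ("noise", "building", 0.60),
         ("building", "building", 0.95), ("noise", "noise", 0.99)]
print(filter_by_probability(preds, 0.9))   # → (0.0, 0.25)
```

Sweeping the threshold reproduces the two curves of Fig. 7: the error decreases while the data loss grows, which motivates the semi-automatic use discussed in the conclusions.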

Fig. 7. Error decrease and data loss increase depending on probability of correct answer
Analysing the constructed fuzzy decision tree, 15 rules with 99% probability of the category "Buildings" were obtained (see Table 6). All rules except the 9th one contain compactness (C2) equal to the value "compact". This identifies a relatively strong "linear dividing" of classes using this feature, which correlates with the result of the feature analysis in the previous research (Kodors, 2017).
Looking globally at the building detection and classification problem using remote sensing data with high resolution, ISPRS provides a benchmark test (WEB, b) with spectral and DSM data, which has a ground sampling distance equal to 9 cm. The Vaihingen 2D labelling challenge provides building classification precision F1 from 0.82 to 0.96 with a mean value of 0.93, where the unit of calculation is a pixel. The proposed method with the 16 m² filter (Kodors et al., 2015) had an F1 score of 0.95, while the proposed method with the improved filter by the random forest algorithm provides F1 equal to 0.985. However, the different resolution and landscape of the ISPRS and experiment datasets must be considered. Therefore, the precise comparison, of course, must be completed using the ISPRS benchmark dataset.

Table 6. Classification rules of buildings
Rules with probability 99%:
1) Area="small" AND R="regular" AND C2="compact" AND F="long"
2) Area="small" AND R="regular" AND C2="compact" AND F="compact"
3) Area="small" AND R="rectangular" AND C2="compact" AND C1="elongated"
4) Area="small" AND R="rectangular" AND C2="compact" AND C1="rectangular"
5) Area="middle" AND C2="compact" AND R="oblique" AND F="extended"
6) Area="middle" AND C2="compact" AND R="regular" AND C1="compact"
7) Area="middle" AND C2="compact" AND R="rectangular" AND C1="compact"
8) Area="middle" AND C2="compact" AND R="rectangular" AND C1="extended"
9) Area="middle" AND C2="extended" AND R="rectangular"
10) Area="large" AND C2="compact" AND R="oblique" AND C1="extended"
11) Area="large" AND C2="compact" AND R="regular" AND C1="compact"
12) Area="large" AND C2="compact" AND R="rectangular" AND F="long"
13) Area="large" AND C2="compact" AND R="rectangular" AND F="elongated"
14) Area="large" AND C2="compact" AND R="rectangular" AND F="compact"
15) Area="large" AND C2="compact" AND R="rectangular" AND F="square"

The well-developed libraries like TensorFlow, Keras, Caffe, etc., supporting GPU calculations, have increased the number of machine learning engineers. The understandable supervised solution, expected high precision, available open data, plenty of training courses and a simple tuning model only increase the number of deep learning scholars. Therefore, nowadays, deep learning is massively used for image classification, including LiDAR data processing (Yang et al., 2017; Rizaldy et al., 2018; Sun et al., 2018a, 2018b). Deep learning is based on the application of artificial neural networks, and it is intuitive to use a 2D projection of LiDAR data as input, which is actually applied in practice. As a result, deep learning deserves attention, because the proposed semantic segmentation algorithm based on the energy minimization approach processes a 2D projection of LiDAR data too.
The deep learning scholars report the following results: overall kappa = 0.89 (Sun et al., 2018a), F1 of roofs = 0.93 and F1 of impervious surfaces = 0.90 (Yang et al., 2017), F1 score of buildings = 0.95 (Sun et al., 2018b); that is close to the mean F1 score for the ISPRS data (WEB, b). Therefore, it can be concluded that the proposed method has a potential, which is comparable to deep learning methods, but it does not require training and high-performance computing as deep learning solutions do. However, deep learning is applicable to process orthoimages and shows good results for building detection, providing F1 equal to 0.95 (Liu et al., 2018), which is important considering the fact that airborne and satellite imaging are more cost-effective services than airborne laser scanning.
But regardless of the precision of deep learning and the proposed segmentation-classification algorithm, the ID3, FID3 and random forest filters, which analyse the geometric features of shapes, can extend all classification algorithms, providing a different precision improvement for each method. Considering this experiment, the improvement is equal to ΔF1 = 0.035, which moves the method from the category "middle" (0.82 < x < 0.96) to "high" (x > 0.96). Of course, it must be considered that the proposed method only detects buildings.
Another application of geometric feature filters is quality control and tuning. It is time-consuming to analyse the visual features of the correctly and incorrectly classified objects, but the conversion of adjectives like "compact", "large", "long", etc. into a digital/mathematical form provides the possibility to apply computers for big data analysis.

Conclusions
The analysis of the common sample decrease (see Table 2) has shown that the 5 features { C2, R, F, A, C1 } do not separate all samples. Considering only classification accuracy, the correct solution is to add features for a stronger division of classes, which can be obtained, for example, using features of spectral images. However, firstly, an increase of the feature number requires additional performance; secondly, these additional features can be restricted, for example, if the end user has only LiDAR data. Therefore, the increase of classification accuracy using a more powerful algorithm remains an important task.
The comparison of the algorithms identifies that the random forest algorithm provides better classification accuracy than ID3 and FID3. The errors among the 3 algorithms are not very different; however, the difference is palpable in the case of big data. The extension with the random forest algorithm increased the precision of the previously published method (Kodors et al., 2015) from 0.95 to 0.985 (F1 score), showing the high potential of the method compared with the modern solutions, which have an average precision equal to 0.93 (the deep learning solutions approximately 0.95). The close classification results show that the intelligent system must be extended with additional services like land cover classification, 3D model generation, etc., or the 1st and 2nd classification stages must be improved, which is confirmed by the 254 common shapes of both classes, which are constantly classified incorrectly.
However, tuning of the current system is possible. Decision trees work using linear separators, which are provided by bands defined by the logical expressions "less than" and "greater than". Fuzzy trees have overlays of membership functions, but the rule "event with maximal probability" introduces biases too at the intersection points between two events. Samples are distributed in a multidimensional space and clusters can have custom forms. Linear separators can draw custom forms out only if the pixilation is sufficiently small. Therefore, the usage of a PCA transformation can provide a better space for the linear split of classes, which can improve classification accuracy and simplify the decision tree structure; and the bi-plots provide additional information about the relations among the features.
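The suggested PCA transformation can be illustrated for two correlated features. This is a minimal two-dimensional sketch with invented data, not the proposed tuning itself:

```python
import math

# Minimal 2-feature PCA sketch: rotating two correlated features into
# decorrelated principal axes, as suggested for a better linear split of classes.

def pca_2d(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    # rotation angle of the principal direction of a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [((x - mx) * cos_t + (y - my) * sin_t,
             -(x - mx) * sin_t + (y - my) * cos_t) for x, y in points]

# perfectly correlated features collapse onto the first principal axis
transformed = pca_2d([(0, 0), (1, 1), (2, 2)])
print(all(abs(pc2) < 1e-9 for _, pc2 in transformed))   # → True
```

After such a rotation the axis-aligned bands of a decision tree cut along the directions of maximal variance, which is the effect suggested above.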
The significance of the feature "compactness" (C2) for the classification task was proved by the entropy and fuzzy decision tree structure analysis, which correlates with the results of the previous study (Kodors, 2017).
Speaking about the type of logic, the crisp logic was more effective in the present case, when the frequency distribution between classes is known for each sample.
The error decrease and data loss increase depending on the probability of the correct answer identify that the data loss increase is too strong to apply this filter for automatic classification. However, it is useful for a semi-automatic solution, when a part of the data is accepted without manual verification, but the other data are verified considering the decrease of the building probability according to the classifier.