A Simple but Effective Approach to Building Footprint Extraction for Topographic Mapping Acceleration

Topographic mapping using stereo plotting is not efficient, because it is time-consuming and labour-intensive. Thus, this research was conducted to find an effective way to extract building footprints from LiDAR data for mapping acceleration. The building extraction method comprises four steps: ground/non-ground filtering, building classification, segmentation, and building extraction. Classification of ground and non-ground classes was performed using the Adaptive TIN Surface algorithm. Non-ground points from the filtering process were classified as building using an algorithm based on multiscale local dimensionality, which separates points at the plane of maximum separability. Segmentation using segment growing was used to separate individual buildings, so that boundary detection could be conducted per segment to create the boundary of each building. Lastly, building extraction was conducted through three steps: boundary point detection, building delineation, and building regularization. With ten samples and a step of 0.5, classification resulted in a quality of 0.597 and a miss factor of 0.524. The segmentation process improved the quality to 0.604, while the miss factor worsened to 0.561. Meanwhile, the average shape index of the extracted buildings differed from the reference by 0.02, and the number of errors was 30% for the line segment comparison. Regarding positional accuracy assessed through building centroids, this method produced an RMSE of 1.169 m.


Introduction
The availability of high-quality geospatial data, including building footprints as part of digital topographic maps, is necessary for larger purposes, including 3D city models (Sugihara et al., 2012; Park and Guldmann, 2019) and spatial planning. A building footprint in a topographic map depicts the boundary of the building roof at a specified level of detail. Building footprint information is important in spatial planning for locating settlement areas or identifying building functions at larger scales.
Traditionally, topographic data production uses a stereo plotting process based on the photogrammetry technique. However, manual stereo plotting is time-consuming and labour-intensive, so it is not effective for large scale topographic mapping across the whole of Indonesia. Due to the high demand for large scale topographic maps, mapping acceleration must be pursued; innovation in large scale mapping methods should therefore be explored.
Automation of map feature extraction is one of the solutions for mapping acceleration. Automation reduces the required amount of time and human resources, increasing the speed of topographic map production. One of the map features evaluated for extraction in this research is the building. Developing an algorithm for automatic building extraction is not easy, because most buildings in Indonesia are very dense and close to each other, and many have irregular shapes, which makes the research more challenging.
Research on automatic building extraction is not new; many effective and accurate methods have been developed. A recent line of work is based on deep learning algorithms, for example using neural networks and Markov Random Fields (MRF) (Davydova et al., 2016), an end-to-end trainable gated residual refinement network (GRRNet) (Huang et al., 2019), a Fully Convolutional Network (Antasari, 2019), a Convolutional Neural Network (Yang et al., 2018), a Deep Convolutional Network (Setiaji and Harintaka, 2019), or Deep Fully Convolutional Networks (Persello and Stein, 2017). In addition, the development of LiDAR has created new approaches for automatic building extraction, for instance using OBIA (Object Based Image Analysis) with class modeling methods (Tomljenovic et al., 2016), data fusion of point and grid-based features using a graph cut algorithm (Du et al., 2017), a novel ordered point-aided Hough Transform (OHT) (Widyaningrum et al., 2019), a modified LEGION segmentation (Liu et al., 2012), an automatic process with emphasis on top-down approaches (Huang et al., 2013), and rule-based segmentation of non-ground LiDAR point clouds (Awrangjeb and Fraser, 2013).
The purpose of this research is to find the most effective approach to extract building footprints automatically for topographic mapping acceleration in Indonesia. What Indonesia needs most is a simple but effective approach to extract topographic map features, since completing the topographic map of a country as large as Indonesia is still challenging. This research is expected to make the topographic mapping workflow in Indonesia cheaper, faster, and more effective.

Data and Materials
This research used LiDAR data acquired in 2016 (for building footprint extraction) and a 1:5,000 scale topographic map of Indonesia created in 2017 (as the reference dataset). More specifically, the research was conducted in Sayang-sayang Village, located in Cakranegara District, Mataram City, Lombok Island, West Nusa Tenggara Province, Indonesia (Figure 1). This area was chosen because it represents various building characteristics, containing both dense and sparse building groups, so that the analysis could be conducted based on building density. Regarding data specification, the point cloud density was 4 points per m² or higher, meaning that each square meter of the area contained at least four points. This specification is suitable for Indonesian topographic mapping at the 1:5,000 scale.

Ground/Non-ground Filtering
Ground/non-ground filtering was one of the important steps in this research, because it affects the building classification result. Ground and non-ground classification is useful not only for DTM extraction but also for identifying buildings, because it reduces the number of points (Ramiya et al., 2017). Accordingly, the subsequent classification of point clouds into vegetation and building classes was conducted on non-ground points only. Therefore, the separation of the point cloud into ground and non-ground classes determined the quality of the building classification.
Classification of ground and non-ground classes was performed using the Adaptive TIN Surfaces algorithm in Terrasolid software. It uses iterative densification of a TIN, in which a point is added to the ground surface if it satisfies the distance and angle criteria (Axelsson, 2000). Inputs such as maximum building size, terrain angle, iteration angle, and iteration distance were crucial; the values used were 100 m for maximum building size, 88.0° for terrain angle, 5.00° to plane for iteration angle, and 1.40 m for iteration distance. The maximum building size defines the area that contains at least one initial ground point, the terrain angle sets the steepest allowed angle between points in the generated TIN, the iteration angle depends on the terrain condition, where a smaller iteration angle is better for flat terrain and vice versa, and the iteration distance is the maximum distance between a point and a triangle (Rizaldy and Mayasari, 2016).
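The densification logic can be illustrated with a simplified prototype (our own sketch, not the Terrasolid implementation; the function and parameter names are hypothetical, and the seed step assumes the area spans at least three non-collinear grid cells). Parameter defaults mirror the settings above:

```python
import numpy as np
from scipy.spatial import Delaunay

def adaptive_tin_filter(points, cell=100.0, max_dist=1.4, max_angle_deg=5.0, iters=20):
    """Ground/non-ground filtering by iterative TIN densification
    (a simplified sketch of an Axelsson-style algorithm).
    points: (N, 3) array of x, y, z; returns a boolean ground mask."""
    xy, z = points[:, :2], points[:, 2]
    ground = np.zeros(len(points), dtype=bool)
    # Seed points: the lowest point in each coarse grid cell
    # (cell size ~ maximum building size, so every cell holds bare ground).
    keys = np.floor((xy - xy.min(axis=0)) / cell).astype(int)
    for key in np.unique(keys, axis=0):
        idx = np.where((keys == key).all(axis=1))[0]
        ground[idx[np.argmin(z[idx])]] = True

    for _ in range(iters):
        gxy, gz = xy[ground], z[ground]
        tin = Delaunay(gxy)
        cand = np.where(~ground)[0]
        simplex = tin.find_simplex(xy[cand])
        accepted = []
        for i, s in zip(cand, simplex):
            if s < 0:                                   # outside the current TIN
                continue
            tri = tin.simplices[s]
            v = np.column_stack([gxy[tri], gz[tri]])    # triangle vertices (3, 3)
            n = np.cross(v[1] - v[0], v[2] - v[0])      # plane normal
            norm = np.linalg.norm(n)
            if norm == 0:                               # degenerate triangle
                continue
            d = abs(np.dot(points[i] - v[0], n / norm))            # distance to plane
            nearest = np.linalg.norm(points[i] - v, axis=1).min()  # nearest vertex
            angle = np.degrees(np.arcsin(min(1.0, d / (nearest + 1e-12))))
            if d <= max_dist and angle <= max_angle_deg:
                accepted.append(i)                      # close and flat enough: ground
        if not accepted:
            break                                       # TIN stopped growing
        ground[accepted] = True
    return ground
```

On a synthetic flat terrain with an elevated roof cluster, the roof points are rejected because their distance to the ground TIN exceeds the iteration distance.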

Building Roof Classification
Building roof classification used the algorithm developed by Brodu and Lague (2012). It is based on multiscale local dimensionality, separating points into two classes at the plane of maximum separability. The classification considers how the cloud looks geometrically within a ball of a given scale, and then selects the best combination of scales at which dimensionality is measured. The diameter of the ball is defined as the scale, while the point of interest is located at its centre, as shown in Figure 3.
To determine whether the cloud appeared locally 1D, 2D, or 3D at a given scale, the proportion of variance of the eigenvalues resulting from Principal Component Analysis (PCA) was used. Equation (1) defines the proportion of variance (Brodu and Lague, 2012):

pᵢ = λᵢ / (λ₁ + λ₂ + λ₃),  i = 1, 2, 3  (1)

The proportions of variance in the neighbourhood ball determined whether the points were distributed in one, two, or three dimensions around the reference scene point. The algorithm used a linear Support Vector Machine (SVM) created with the CANUPO training plugin in CloudCompare software. A linear classifier proposes one solution in the form of the hyperplane that best discriminates and separates two classes (e.g. vegetation against building). It uses a weight vector w and a bias b, the hyperplane being defined by the equation wᵀX − b = 0. w and b were used to project the feature space onto the hyperplane; the distance to the hyperplane d₁ = wᵀX − b₁ was then calculated for each point. The second distance d₂ was obtained by repeating the process to find the second-best direction orthogonal to the first; both distances d₁ and d₂ were then used as coordinates in the 2D plane of maximum separability.
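Equation (1) can be evaluated directly from the covariance eigenvalues of a neighbourhood ball. The sketch below (illustrative, with NumPy) returns the three proportions of variance, whose balance indicates a locally 1D, 2D, or 3D point distribution:

```python
import numpy as np

def dimensionality(neigh):
    """Proportion of variance p_i = lam_i / (lam1 + lam2 + lam3) from PCA of a
    neighbourhood ball (Equation 1): ~[1, 0, 0] for a line, ~[0.5, 0.5, 0]
    for a symmetric plane, roughly equal thirds for a 3D volume."""
    centred = neigh - neigh.mean(axis=0)
    cov = centred.T @ centred / len(neigh)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]  # lam1 >= lam2 >= lam3
    return lam / lam.sum()
```

For example, points sampled along a straight line give a first proportion near 1, while a planar patch concentrates the variance in the first two components.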
The plane of maximum class separability kept only two main components, so in this research point clouds were separated into two categories: building and vegetation. For this classification step, a training area representing both categories was first created. Then, several parameter values were decided for creating the training data, including the minimum and maximum scales as limits of the multiscale dimensionality. The unit of the scale refers to the unit of the point cloud. In this research, the chosen scale range was 2 to 10 m, because the building lengths in the research area, a settlement area, were considered to be between 2 and 10 m. Another parameter was the step, defined as the distance interval from the minimum to the maximum scale used to achieve multiscale features. A small step means more features; conversely, a larger step means fewer features. In this research, the chosen step was 0.5 m.
In short, the whole classification workflow is explained in Figure 4. It started from ground and non-ground classification; non-ground points were then classified into building and vegetation points. All processes here were binary classifications. The building points were then used in the next stage, namely segmentation.

Segmentation
Segmentation of a point cloud is used to distinguish groups of points based on feature similarity. In order to improve classification results, Vosselman (2013) used segment-based features rather than traditional point-based features for point-based classification. The segment-based features led to a more accurate classification result. In this research, segmentation was used to separate individual buildings, so that boundary detection could be conducted per segment to create the boundary of each building. The data used at this stage was the previous classification result, namely the classified building points.
The segment growing algorithm (Vosselman, 2013) was chosen for the segmentation process, implemented in Point Cloud Mapper software. In this algorithm, the test for accepting neighbouring points as extensions of a segment is based on the similarity of feature values and on normal vectors scaled by the planarity to distinguish between surfaces with different orientations (Vosselman, 2013). Seed points with similar feature values are extended and merged with their neighbouring points if their values are close to the average feature values of each segment. The normal vector direction scaled by planarity is used to distinguish between surfaces with different orientations, in which, when λ₁ > λ₂ > λ₃ are the eigenvalues of the covariance matrix, the planarity can be expressed as (λ₂ − λ₃) / λ₂ (Vosselman, 2013). Segmentation was also used to remove small segments and unsegmented points, thereby eliminating building errors in the building extraction process. Unsegmented points, coloured as white points (Figure 5) and without a segment number, were points without sufficient nearby similar points to start a new seed (Vosselman, 2013).
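The planarity term used in the growing test can be sketched as follows (an illustrative NumPy version of (λ₂ − λ₃)/λ₂ as given above, not the Point Cloud Mapper implementation):

```python
import numpy as np

def planarity(neigh):
    """Planarity (lam2 - lam3) / lam2 from the covariance eigenvalues of a
    point neighbourhood: close to 1 on a planar roof patch, low for
    volumetric clutter such as vegetation."""
    centred = neigh - neigh.mean(axis=0)
    lam = np.sort(np.linalg.eigvalsh(centred.T @ centred / len(neigh)))[::-1]
    return (lam[1] - lam[2]) / lam[1] if lam[1] > 0 else 0.0
```

A tilted planar patch yields a planarity near 1, while a uniformly filled volume yields a much lower value, which is what allows roof surfaces to be separated from vegetation during growing.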

Building Footprint Extraction
The building extraction process started with boundary detection for each segment, delineating a coarse building footprint from the boundary points. After segmented points were obtained using segment growing, the boundary points of each segment were extracted using the boundary package in MATLAB, and a coarse building delineation was then created from those boundary points (Figures 6a, 6b, and 6c). The boundary point detection algorithm returns points that represent the boundary around the building roof points. Unlike the convex hull, the boundary can shrink towards the interior of the hull to envelop the points (https://www.mathworks.com/help/matlab/ref/boundary.html). The points of each segment were then connected with lines using the Points to Line tool, producing raw building footprints. Lastly, to produce smooth and sharp building footprints, building boundary regularization was conducted using the Building Regularization tool, which applies the algorithm developed by Gribov (2015) (Figure 6d). The boundary delineation and building footprint regularization were implemented in ArcGIS Pro software (https://www.esri.com/en-us/arcgis/products/arcgis-pro/resources).
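The per-segment boundary step can be imitated outside MATLAB. The stand-in below (a hypothetical helper) uses a convex hull, so unlike MATLAB's `boundary` it cannot shrink into concavities and is adequate for convex roof outlines only:

```python
import numpy as np
from scipy.spatial import ConvexHull

def segment_boundary(points_xy):
    """Ordered boundary ring for one building segment's (N, 2) x, y points.
    Simplified stand-in for MATLAB's `boundary`: returns the convex hull
    vertices in counter-clockwise order, so concave outlines are not
    recovered."""
    hull = ConvexHull(points_xy)
    return points_xy[hull.vertices]  # ring of corner points
```

Applied to a segment whose extreme points form a rectangle, the returned ring contains just the four corners, from which a raw footprint polygon can be built.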
In order to achieve good geometry in building regularization, reconstruction of the coarse building footprint required support for 90° and 45° angles. The algorithm is also suitable for arcs, in which an arc passing through the considered locations differs from a line segment by the need to define a radius (Gribov, 2015). Building regularization required setting a tolerance parameter, which was set to 1 m. This value was based on experiments with several candidate values, considering the difference between the number of segments in the extracted and reference data, so that the extracted objects could be assumed to have a level of detail similar to the reference objects. Figure 7 illustrates the result of each process, from ground/non-ground classification to building footprint extraction. It can be seen that each process affected the quality of the next. For instance, because some very close buildings were segmented as one segment, they became a single building in the extraction result. Thus, to produce a good extraction result, the quality of each process must be ensured from the beginning.

Result
The comparison between the extracted data and the reference data (topographic map) is presented in Figure 8. The green circles show extracted buildings similar to the reference data, while the yellow ones show extracted buildings that still need improvement. Some errors occurred; for instance, very close and dense buildings were extracted as one single building (Figures 8a and 8b). Besides dense areas, another challenge was vegetation: buildings covered by vegetation had problems in their geometric results (Figures 8c and 8d), or were even not extracted at all (Figure 8e).

Classification Analysis
The classification result of the training data was shown in a plane of maximum separability with a separation line (Figure 9). Balanced accuracy (ba) and the Fisher Discriminant Ratio (fdr) were used to quantify the performance of the classifier, where a large ba value indicated a good recognition rate, while a large fdr value indicated that the classes were well separated.

To test the effect of the number of samples and the step parameter on the ba and fdr results, experiments with various parameters were conducted. The number of samples was varied over 10, 15, 20, 25, and 30, and the step parameter was set to 0.5 and 1; the results are shown in Table 1. It can be seen that adding samples and changing the step parameter does not improve the ba and fdr results much.

Besides comparing the ba and fdr values, classification accuracy was analysed using matched rate analysis, consisting of completeness, correctness, quality, and miss factor, defined by Equations (4)-(7) (Zeng et al., 2013). Originally, those equations were used to assess extracted buildings rather than classified building points, but in this research they were adapted to examine the accuracy of the building classification result. Reference points here were non-ground point clouds that overlapped with building geometry from the topographic map, which was obtained by manual stereo plotting. It was assumed that those points were the correct building points and that there was no geometric offset between the point cloud and the topographic map.

The results are presented in Table 2. With ten samples and step 0.5, the classification produced the highest correctness but at the same time the lowest completeness and quality compared to the other parameter settings. The best result was produced using twenty-five samples and step 1, with a quality of 0.666 and a miss factor of 0.263. However, since the purpose of this research is to find a simple and effective method for building extraction, the smallest number of samples was chosen.
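The four matched-rate measures of Equations (4)-(7) reduce to ratios of true positives (TP), false positives (FP), and false negatives (FN). The sketch below follows the definitions in Zeng et al. (2013); the counts in the usage example are hypothetical values, chosen only to be consistent with the ten-sample figures reported here:

```python
def matched_rates(tp, fp, fn):
    """Matched rate analysis (Eqs. 4-7) from true-positive, false-positive
    and false-negative point counts."""
    completeness = tp / (tp + fn)   # share of reference points recovered
    correctness = tp / (tp + fp)    # share of classified points that are right
    quality = tp / (tp + fp + fn)   # combined measure of the two above
    miss_factor = fn / tp           # missed points relative to hits
    return completeness, correctness, quality, miss_factor
```

For example, the hypothetical counts TP = 656, FP = 99, FN = 344 reproduce a completeness of 0.656, correctness of 0.869, quality of 0.597, and miss factor of 0.524.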

Segmentation Analysis
The segment growing result showed that the point clouds were segmented per building, so that the segmentation result could be treated as building separation. This process was also used to remove small segments and unsegmented points.

Compared to the classification result, segmentation increased the quality of the building points based on the assessment method described in the previous section. The quality rose from 0.597 to 0.604, while the correctness improved from 0.869 to 0.913. Nevertheless, the completeness and miss factor became worse: the completeness was reduced from 0.656 to 0.641, while the miss factor increased from 0.524 to 0.561. This was because more points were eliminated at this stage, so the percentage of missing objects was higher.

Building Extraction Analysis
Three methods were used to assess the geometry of the extracted building footprints: line segment comparison, shape index similarity (Veltkamp and Hagedoorn, 2000; Veltkamp, 2011b; Kweon et al., 2019), and edge point correspondence (Ramirez and Ali, 2003; Kweon et al., 2019). The line segment comparison was used to evaluate the number of sections of every line in the samples against the reference dataset. Statistically, both datasets were compared to count the differing line segments, expressed as a percentage that indicated the level of error. The matrix of the line segment comparison is shown in Table 3.

The shape index similarity was used to quantify the shape ratio between two objects. This method expresses the ratio of an area to the length of its circumference (Veltkamp, 2011a). Each sample of an extracted building was treated as a polygon, and the shape index was calculated from the perimeter and the internal area of the polygon. If the shape index was 1, the polygon was circular, while a value higher than 1 indicated a more elongated, bar-shaped polygon; a square polygon has an index of approximately 1.13. The shape index can be expressed as

SI = P / (2√(πA))  (8)

where P is the perimeter and A is the area of the polygon.

Table 3 shows the number of line segments in each data sample, arranged downward as sample ID, the number of segments in the reference dataset, and the number of segments in the extracted data. The number of segments in the extracted data is shown in four columns corresponding to the tolerance intervals used in the building regularization process: 1 m, 1.5 m, 2 m, and 2.5 m. Comparing the extracted data with the reference dataset gave the maximum number of errors in each sample; the error is expressed as a percentage, illustrated in Figure 10a.
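The shape index is a one-line computation from perimeter and area; the reference values quoted above (1 for a circle, about 1.13 for a square) follow directly from the formula SI = P / (2√(πA)):

```python
import math

def shape_index(perimeter, area):
    """Shape index SI = P / (2 * sqrt(pi * A)): 1 for a circle,
    about 1.13 for a square, larger for elongated polygons."""
    return perimeter / (2.0 * math.sqrt(math.pi * area))
```

For a unit circle (P = 2π, A = π) the index is exactly 1; for a unit square (P = 4, A = 1) it is 2/√π ≈ 1.13.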
Based on the graph, the maximum error was found in sample N, which had the largest error in the line segments of the extracted building. The most suitable tolerance value in the regularization process was 1 m. Overall, the average number of errors was 30% (0.3), meaning the line segments of this building extraction are acceptable.
The shape similarity assessment resulted in an index value for every sample; the results are shown in Figure 10b. For most samples, the difference in index between the extracted data and the reference dataset was low. On average, the whole sample set had a difference of 0.02 in shape index value. Across all data, sample S generated the highest shape index difference, quantitatively 0.3. Thus, the extracted buildings had shapes nearly similar to the reference data, particularly in building edge features.
The last test was positional accuracy, using the Root Mean Square Error (RMSE). The accuracy was assessed by comparing centroid positions between extracted and reference sample objects. Centroids were used because the Euclidean distance between the centres of mass of an extracted object and a reference object can serve as a positional accuracy metric alternative to building corner points (Zeng et al., 2013). Centroid coordinates were calculated for both datasets, as shown in Table 4, and the RMSE was then obtained from Equation (9).
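The centroid-based RMSE can be sketched as follows (illustrative; `extracted` and `reference` are assumed to be matched arrays of centroid coordinates):

```python
import numpy as np

def centroid_rmse(extracted, reference):
    """RMSE over Euclidean distances between matched extracted and
    reference building centroids, each given as an (N, 2) array of x, y."""
    d = np.linalg.norm(np.asarray(extracted) - np.asarray(reference), axis=1)
    return float(np.sqrt(np.mean(d ** 2)))
```

For instance, two centroid pairs with offsets of 5 m and 0 m give an RMSE of √12.5 ≈ 3.54 m.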

Time Processing Analysis
This work was conducted using a laptop with an Intel(R) Core(TM) i7-7500U processor, 8.00 GB of RAM, and a 64-bit operating system. The result of this analysis can thus serve as a practicality evaluation for this hardware specification; the process could be faster on a higher computer specification. Time processing analysis was conducted for each step, namely ground/non-ground classification, building classification, segmentation, and building footprint extraction, for an area of 2.478 km². Based on the existing topographic map, this area contains 3,637 buildings. The result of the time processing analysis is presented in Table 5.
To assess the effectiveness of the proposed method, manual digitization was conducted in the same area. The manual digitization took 5 hours and 20 minutes, with an output specification similar to that of the automatic extraction (for instance, in a very dense settlement area, delineation was conducted along the boundary of the settlement area). The time processing analysis indicated that the method was almost nine times faster than the manual method, and although the proposed method is simple, it produced good results with high effectiveness. The comparison of manual and automatic extraction for the whole research area is presented in Figure 11.

Conclusions
This research aims to support mapping acceleration, especially for topographic map production. It focused on automatic building extraction from LiDAR data using a combination of several methods, namely classification, segmentation, and building footprint extraction.
Classification of ground and non-ground classes was performed using the Adaptive TIN Surface algorithm. Building classification used multiscale local dimensionality to separate points into two classes at the plane of maximum separability. Then, segmentation was conducted using the segment growing algorithm. Building footprint extraction comprised three steps: boundary point detection, building delineation, and building regularization.
With ten samples and a step of 0.5, classification resulted in a quality of 0.597 and a miss factor of 0.524. The segmentation process improved the quality to 0.604, while the miss factor worsened to 0.561. Meanwhile, the average shape index of the extracted buildings differed from the reference by 0.02, and the number of errors was 30% for the line segment comparison. Regarding positional accuracy assessed through building centroids, this method produced an RMSE of 1.169 m. From these statistical analyses, it can be concluded that the method produced acceptable quality for building footprint extraction, especially in sparse settlement areas.