Structural and Geometrical Vegetation Filtering - Case Study on Mining Area Point Cloud Acquired by UAV Lidar

DOI: https://doi.org/10.46544/AMS.v26i4.06

Abstract
Filtering vegetation from point clouds is one of the basic steps in processing the products of bulk data collection. Commonly used filtering methods were developed for large areas, usually scanned from an aircraft at high altitude, where the point clouds have little detail and the terrain is essentially flat. Nowadays, point clouds are generated not only by aerial and ground scanning but mainly by photogrammetry from UAVs and, more recently, by scanners mounted on UAVs. Various objects are measured, including anthropogenic objects, rugged areas with large elevation differences, rocks, pits, buildings, etc. Therefore, the aim of filtering is no longer to remove everything from the cloud except the ground surface but to remove vegetation as such and some unnecessary objects. For this task, structure filters, which classify each point based on the structure of its surroundings, seem advantageous. Since many different filtering algorithms have been developed and their behaviour is controlled by the chosen parameters, it is necessary to test suitable filters and their settings for each type of area. In this paper, selected freely available filtering methods based on a geometric approach are tested for comparison with the CANUPO-based structure filter, which is the main subject of the study. Ground filtering procedures were tested on real data, representative of a mining area, acquired by the DJI L1 UAV 3D scanner. Test results were evaluated by type I error, type II error, and total error, where type I error represents points that were incorrectly retained, type II error represents points that were incorrectly removed, and total error is the sum of the two. The tested geometric filters CSF, PMF and SMRF showed a total error of about 7.5% in the best case, of which type I error constitutes a significantly larger part (about 6%) than type II error (about 2%).
In contrast, the tested CANUPO structural filter achieved a total error of 5.2% in basic use and as low as 4.1% when a defined probability bound was used. Errors of type I and type II are distributed almost evenly here. The specific probability chosen has a relatively small effect on the result, about 0.1% of the total error. Some additional insights into the design and use of filters emerged from the testing. Geometric filters are significantly faster, but CANUPO is significantly more reliable at removing vegetation, which appears as points with the character of noise. In particular, the maximum radius used and the total number of radii must be considered when creating a filtering (training) prescription.

With the rapid development of UAV technology and digital photogrammetry, the Structure from Motion (SfM) method is now commonly used, as it makes it possible to obtain a point cloud very economically (Štroner et al., 2019). However, correct photogrammetric processing requires correct placement of ground control points (GCP), a suitable flight strategy (Štroner et al., 2020), (Štroner et al., 2021) and a correct computational algorithm. The accuracy and usability of the resulting point cloud further depend on the ground sample distance (GSD) of the images and the density of the resulting point cloud (Moudrý et al., 2019).
In contrast, the laser scanning method is more expensive, but it yields a point cloud directly. Laser scanning was first performed with ground-based instruments (Křemen, 2020) or with a scanner carried by an aircraft (Siwiec, 2018). In recent years, mounting a laser scanner on a fixed-wing or rotary-wing UAV platform (Hu et al., 2020), (Torresan et al., 2018) or on other flying platforms, such as an airship (Koska et al., 2017), (Urban et al., 2016), (Jon et al., 2013), has become increasingly popular. The accuracy of laser scanners is usually specified by the manufacturer and can be tested using special reflective control points or a sufficiently accurate point cloud obtained by another method (Štroner et al., 2021). Today it seems very advantageous to combine both methods (Koska and Křemen, 2013), as they complement each other. The advantages and disadvantages of both methods are described, for example, in (Shaw et al., 2019).
This article deals with the mass collection of data in mining areas, which are characterized by very rugged terrain, as can be seen from (Kovanič, 2013), (Urban et al., 2016), (Ren et al., 2019). For the correct interpretation of the surface, it is necessary to remove unwanted elements such as parked vehicles, buildings, structures, and especially vegetation.
Many vegetation filtering procedures are known and have been published in recent years (Meng et al., 2009), (Rashidi and Rastiveis, 2017), (Shi et al., 2018), (Li et al., 2020). Some are included in commercial software or are provided as free or open-source software (Tinkham et al., 2011), (Montealegre et al., 2015), (Yilmaz et al., 2018). Common to many of these filters is that they were developed for Lidar data acquired from an aircraft, where the carrier is relatively far from the earth's surface (Polat and Uysal, 2015), (Wei et al., 2017), (Kumar et al., 2016), (Susaki, 2012), but filters designed for ground-based laser scanning (Brodu and Lague, 2012) can also be found. A very interesting alternative is the use of filters based on PCA algorithms (Cheng et al., 2021). However, various geometric filters are commonly used and have been tested many times (Cai et al., 2019), (Silva et al., 2018).
With the development of UAVs, it is typical for both photogrammetry and laser scanning that the carrier flies relatively close to the surface, and the point cloud density makes it possible to record even very rugged surfaces realistically. As shown in (Štroner et al., 2021), geometric filters are not suitable for rugged areas, and a better result was achieved with a structural filter based on the CANUPO tool, which allows users to train their own filter. Its use for vegetation filtering has not yet been explored, especially as regards the process of creating a filter definition. The aim of the article is to test the effectiveness of various variants of trained CANUPO filters on typical mining area data obtained by a UAV scanning system and to compare their success with the results of selected geometric filters.

Testing data
The test data were acquired by a DJI Matrice 300 UAV with a DJI Zenmuse L1 scanning system. The measurements were performed in June 2021 under full vegetation cover in the vicinity of the village of Sedlice in the Košice Region, Slovak Republic. The flight height was 100 m above the lowest level of the area, a single-grid flight pattern was used, and the onboard GNSS RTK receiver was connected to the SKPOS network of permanent reference stations. This information is given for completeness; it is not significant from the testing point of view.

Fig. 1. Test data: point cloud used for testing: a) cloud with vegetation, b) cloud manually de-vegetated
The scanned data were decimated to a density of about 100 points/m², the cloud used for testing contained 1 457 259 points, the area covered was 9 955 m², and the elevation difference between the lowest and highest point of the terrain was 25 m. The data were selected because they contain both a rugged surface with steep slopes and more or less dense vegetation in the form of both thickets and trees. In order to reliably test the success of vegetation removal, the test cloud (see Fig. 1a) was manually cleaned to obtain a reference cloud (see Fig. 1b). To illustrate the nature of the data, Fig. 1c shows the problematic part of the cloud with dense vegetation of different heights, with the heap in the background (grey).

Tested ground filtering algorithms
For testing, ground filtering algorithms implemented in freely available software packages were used. Specifically, CSF (Cloth Simulation Filter), PMF (Progressive Morphological Filter) and SMRF (Simple Morphological Filter) were used; their principles are described in detail in (Zhang et al., 2003), (Pingel et al., 2013), (Zhang et al., 2003). These filters first generate an approximation of the surface and then filter the point cloud based on the distance of each point from this approximating surface. The PMF and SMRF filters were used as implemented in the PDAL software package (http://pdal.io), and the CSF as implemented in CloudCompare version 2.12.
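As an illustration of how PMF and SMRF runs of this kind could be set up, the following minimal sketch assembles a PDAL pipeline description in Python. The stage names (readers.las, filters.smrf, filters.pmf, filters.range, writers.las) and their option names are PDAL's; the helper `ground_pipeline`, the file names and all numerical values are hypothetical placeholders and are not the settings used in this study.

```python
import json

def ground_pipeline(filter_type, options, src="cloud.las", dst="ground.las"):
    """Build a PDAL pipeline (JSON text) that keeps only ground points.

    filter_type: "filters.smrf" or "filters.pmf"
    options:     dict of stage options (illustrative values only)
    src, dst:    hypothetical input and output file names
    """
    stages = [
        {"type": "readers.las", "filename": src},
        dict({"type": filter_type}, **options),  # ground classification stage
        # Keep only points labelled as ground (class 2) by the previous stage.
        {"type": "filters.range", "limits": "Classification[2:2]"},
        {"type": "writers.las", "filename": dst},
    ]
    return json.dumps({"pipeline": stages}, indent=2)

# Placeholder settings, chosen only to show the option names.
smrf_json = ground_pipeline(
    "filters.smrf",
    {"cell": 0.5, "slope": 0.3, "window": 10.0, "threshold": 0.3, "scalar": 1.25},
)
pmf_json = ground_pipeline(
    "filters.pmf",
    {"cell_size": 0.5, "max_window_size": 10, "slope": 1.0,
     "initial_distance": 0.3, "max_distance": 2.5},
)
print(smrf_json)
```

The resulting text can be saved to a file and executed with the `pdal pipeline` command; looping over several option dictionaries gives one pipeline per tested setting.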
CANUPO (described in detail in (Brodu and Lague, 2012)) was used as the structural filter; its outstanding feature is the possibility of creating a custom filter for specific data. To create the filter, two separate training clouds are needed: one representing the vegetation class, the other representing the surface class. The calculation is based on principal component analysis (PCA) with different radii of spherical neighbourhoods and subsequent statistical processing. Thus, when creating a custom filter, in addition to the custom training clouds, it is necessary to define how many and how large (defined by radius) neighbourhoods (dimensions) will be used. This is not a simple choice; in general, a higher number of dimensions means better quality results but also longer processing time. According to the conclusions of (Štroner et al., 2021), this filter is very useful in distinguishing scattered parts of a point cloud from points forming "smooth and continuous" surface areas.
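To make the idea of PCA over several spherical neighbourhoods more concrete, the following sketch computes, for one query point, the proportions of variance along the three principal axes at each radius. It is only an illustration under simplifying assumptions (brute-force neighbour search, hypothetical function names) and is not the CANUPO implementation; the subsequent statistical processing that turns such features into a trained classifier is not shown. Proportions near (1, 0, 0) indicate a line-like neighbourhood, near (0.5, 0.5, 0) a planar one, and near (1/3, 1/3, 1/3) a scattered, volume-like one such as vegetation.

```python
import math

def covariance(points):
    """3x3 covariance matrix of a list of (x, y, z) points."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[k] - mean[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov

def sym3_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix, largest first (closed form)."""
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    if p == 0.0:  # matrix is a multiple of the identity
        return [q, q, q]
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2.0))  # guard against rounding outside [-1, 1]
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]

def multiscale_features(cloud, query, radii):
    """Variance proportions per radius; None where the neighbourhood is too small."""
    features = []
    for radius in radii:
        nbrs = [p for p in cloud if math.dist(p, query) <= radius]
        if len(nbrs) < 3:
            features.append(None)
            continue
        # Covariance matrices are positive semi-definite; clamp rounding noise.
        ev = [max(0.0, e) for e in sym3_eigenvalues(covariance(nbrs))]
        total = sum(ev)
        features.append(tuple(e / total for e in ev) if total > 0.0 else None)
    return features

# A small flat patch: every scale should look planar.
plane = [(0.1 * i, 0.1 * j, 0.0) for i in range(-2, 3) for j in range(-2, 3)]
print(multiscale_features(plane, (0.0, 0.0, 0.0), [0.15, 1.0]))
```

For real clouds with millions of points, the brute-force neighbour search would have to be replaced by a spatial index; the sketch is meant only to show what one "dimension" (one radius) contributes to the feature vector.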
Each filter used can be used with different settings (CSF, PMF, SMRF with numerical constants defining the filter behaviour; for CANUPO, it is the different number and size of spherical surroundings used for evaluation).
However, the optimal setting is not generally known. Therefore, determining the best setting is also part of the testing. The tested settings are listed in Table 1; for settings not listed, the default values were used. CANUPO filtering was tested on the same sample data with the different sizes and numbers of neighbourhoods listed in Table 2. For both the geometric filters (PMF, SMRF, CSF) and the structural filter (CANUPO), all combinations of these parameters were tested. In addition to the classification itself, CANUPO also provides the reliability (probability) of the classification, so points can be classified in such a way that a point is assigned to a given class only if the reliability of its identification reaches at least a chosen threshold. This method of evaluation was also tested, with points not assigned to any class being considered surface points.
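The probability-based variant described above can be sketched as follows. The function name, the label strings and the representation of the classifier output as two parallel lists are assumptions made for illustration; only the rule itself (a point is removed as vegetation only if its classification reliability reaches the threshold, otherwise it is kept as surface) follows the text.

```python
def apply_probability_threshold(labels, confidences, threshold):
    """Relabel points so that low-confidence points fall back to 'surface'.

    labels:      per-point class from the classifier ('vegetation' or 'surface')
    confidences: per-point reliability of that class, in the range 0..1
    threshold:   minimum reliability required to accept the class
    """
    result = []
    for label, confidence in zip(labels, confidences):
        if label == "vegetation" and confidence >= threshold:
            result.append("vegetation")
        else:
            # Unclassified points (reliability too low) are treated as surface.
            result.append("surface")
    return result

labels = ["vegetation", "vegetation", "surface", "surface"]
confidences = [0.99, 0.80, 0.97, 0.60]
print(apply_probability_threshold(labels, confidences, 0.95))
```

Raising the threshold can therefore only move points from the vegetation class to the surface class, never the other way round.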

Filter quality evaluation
Testing the efficiency of a particular filter was always carried out using the same procedure:
1. Apply a specific filtering procedure with a specific setting to the test data (data_orig); this creates data_filter.
2. Determine the first type of error (Error type I) by comparing data_filter with the reference (manually cleaned) data, data_etal, in CloudCompare.
3. Determine the second type of error (Error type II) by comparing the reference (manually cleaned) data, data_etal, with data_filter in CloudCompare.
4. Calculate the evaluation criteria.
5. Repeat steps 1-4 for all filters and variations tested.
Error type I is determined by calculating the distance between data_filter and data_etal (cloud-to-cloud distance function in CloudCompare). Points of data_filter that are further from data_etal than the selected value of 0.01 m (chosen with regard to the data resolution of 0.1 m) remained in the cloud although they should have been discarded; they are evaluated as error type I.
Error type II is determined by the same procedure, only the comparison is made in reverse order (data_etal against data_filter). Points of data_etal that are further from data_filter than the selected value (0.01 m) were discarded although they should have remained; they are evaluated as error type II. A single summary criterion was then determined for the overall evaluation by simply summing errors of type I and II (hereafter referred to as the total error). For each filter, an evaluation was performed for all combinations of parameter settings, and the combinations that gave the best results in terms of total error were then selected.
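The evaluation described above can be sketched in a few lines. The following is a minimal illustration, assuming clouds are given as lists of (x, y, z) tuples and using a brute-force nearest-neighbour search in place of CloudCompare's cloud-to-cloud distance; the function names are hypothetical, and errors are expressed as percentages of the number of points in data_orig, as in the Results section.

```python
import math

def count_far_points(source, target, tol):
    """Number of points in `source` whose nearest point in `target` is farther than `tol`."""
    if not target:
        return len(source)
    return sum(1 for p in source
               if min(math.dist(p, q) for q in target) > tol)

def evaluate_filter(data_orig, data_filter, data_etal, tol=0.01):
    """Return (type I, type II, total) error in percent of the original cloud size."""
    n = len(data_orig)
    # Type I: points kept by the filter that are absent from the reference.
    e1 = 100.0 * count_far_points(data_filter, data_etal, tol) / n
    # Type II: reference points that the filter removed.
    e2 = 100.0 * count_far_points(data_etal, data_filter, tol) / n
    return e1, e2, e1 + e2

# Toy example: 6 terrain points and 4 vegetation points 2 m above them.
ground = [(float(i), 0.0, 0.0) for i in range(6)]
vegetation = [(float(i), 0.0, 2.0) for i in range(4)]
data_orig = ground + vegetation
data_etal = ground                          # manually cleaned reference
data_filter = ground[:5] + vegetation[:1]   # one terrain point lost, one vegetation point kept
print(evaluate_filter(data_orig, data_filter, data_etal))
```

For a cloud of the size used here, the brute-force search would be far too slow and a spatial index would be needed; the sketch is meant only to make the direction of the two comparisons explicit.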

Results
The results of the testing are shown in the following tables. Because of the large amount of data, the results are divided into those obtained by geometric filters and those obtained using CANUPO. For clarity, the errors are expressed as percentages (100% is the total number of points of the filtered data_orig cloud, 1 457 259 points). The five best results are shown for each filter to indicate how much the results vary and whether the detected optimum is flat or sharp, because in practical use it is not possible to check the quality of the result in such an exact way, and the parameters have to be chosen based on experience or estimation. Since some filters have many setting parameters, abbreviations (listed below each table) are used. In all tables, E I, E II and TE stand for Error type I, Error type II and Total Error, respectively.
Table 3 shows the top 5 results obtained by the PMF along with the settings used. Fig. 2 shows the correctly evaluated terrain in natural colour, points that should have been removed and were not (error type I) in blue, and points that should not have been removed but were (error type II) in red. It is clear that the filter does not remove low vegetation or vegetation below which the terrain is not covered by points. Conversely, parts of the point cloud were removed where the character of the terrain changes significantly. This is clearly evident in the middle part of the image on the tops of the heap and in the very steep area at the top left. The erroneously retained points amount to 5.9%, the erroneously removed points to about 1.5%, and the total error is thus 7.4% at best. The top 5 results presented differ very little in terms of total error.
The results of the CSF filter are very similar to those of the PMF filter in terms of total error (7.4%); the type I and type II errors are also similarly distributed, and the reported results are consistent. The parameters used, both the cloth resolution and the classification threshold, are about 0.3 m (three times the resolution of the filtered cloud itself). Compared to the PMF results, the CSF (see Fig. 3) adapts better to the rugged terrain (as seen again in the middle part), but it performs worse in the left part, which is steep. It also removes extraneous features better (for instance, the car in the middle left).

Fig. 3. Quality evaluation of ground filtering: the best CSF variant
The SMRF filter (see Fig. 4) achieved a total error similar to the other geometric filters (7.3%); the error is dominated even more strongly by type I error (6.7%), whereas type II error is significantly smaller here (0.7%). The filter adapts quite well to steep terrain (top left in the figure). However, where the nature of the terrain changes significantly, terrain points are erroneously removed throughout the area.
Overall, the geometric filters can be said to achieve comparable total errors in the given rugged area, although the errors of each filter are distributed differently, which is probably due to the geometric nature of the process.

Fig. 4. Quality evaluation of ground filtering: the best SMRF variant
CANUPO filtering works on a different principle. As can be seen in the results, it reliably distinguishes points on surfaces from scattered points characterizing vegetation. CANUPO filtering without the probability option achieves a total error of 5.2%, i.e. about 2% better than the geometric filters, with a type I error of about 2.1% and a type II error of 3.1%. Compared to the geometric filters, this means that it tends to erroneously remove points rather than erroneously retain them. The erroneously removed points are mainly on the edges of solid surfaces. Using the CANUPO filter with the probability option (75% up to 99%) reduces the total error by an additional 1%, with the type I error approximately equal to the type II error (both about 2%). The differences between the results using probability are minimal; 0.1% cannot be taken as a significant difference, as it corresponds to about 1.4 thousand points. Since the results are very similar between the different variants of the CANUPO filter with probability selection, only the best results of each variant are reported directly in the text (Table 6).

Fig. 5. Quality evaluation of ground filtering: the best CANUPO variant (95% probability)
Table 7 lists the five best results achieved by the CANUPO structural filter. One side of the evaluation is the resulting filtering error; the other side is the computational complexity. Especially in the case of CANUPO filtering, it is necessary to consider how to set up the filter generation. The number of radii determines the computational cost, and it is therefore advisable, in terms of filtering speed, to choose a larger step. The maximum number of 100 radii corresponds to a filter with a minimum of 0.1 m, a step of 0.1 m and a maximum of 10 m. As can be seen from Table 7, practically the same results can be achieved with half the number of radii. The best results for variants with an even lower number of radii are also presented (see Table 8). It is evident that although the number of radii, and thus the computational effort, is greatly reduced, the results obtained are still very similar to the best ones.
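The relation between the minimum radius, the step, the maximum radius and the resulting number of radii can be checked with a short calculation. The sketch below assumes that the radii form a simple arithmetic ramp starting at the minimum and not exceeding the maximum; the function name is hypothetical, and the CANUPO plugin may count the radii differently (for a minimum and step of 0.3 m and a maximum of 10 m this rule gives 33 radii, not the 34 quoted with the processing times).

```python
import math

def radii_ramp(r_min, step, r_max):
    """Radii r_min, r_min + step, ... that do not exceed r_max."""
    # A small tolerance protects the count against floating-point rounding.
    count = math.floor((r_max - r_min) / step + 1e-9) + 1
    return [round(r_min + i * step, 10) for i in range(count)]

for r_min, step, r_max in [(0.1, 0.1, 10), (0.1, 0.2, 10), (1, 1, 5)]:
    radii = radii_ramp(r_min, step, r_max)
    print(r_min, step, r_max, "->", len(radii), "radii, largest", radii[-1])
```

Doubling the step roughly halves the number of radii, which is the main lever on processing time discussed in this section.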
The above are still the top results in terms of the success rates achieved. Table 9 then shows the results obtained with a minimalist filter with only five radii, namely a minimum of 1 m, a step of 1 m and a maximum of 5 m. Here, the results for probabilities of 85%, 90% and 95% are quite close to the best ones (and still better than the geometric filters) but require much less computation.
The individual filters should also be evaluated in terms of processing time. For the test cloud of 1 457 259 points, the processing times of the best-performing variants of the individual methods are PMF: 7.5 s; SMRF: 7.8 s; CSF: 3.9 s; CANUPO 80% (0.1;0.1;10; 100 radii): 8 min 8 s; CANUPO 80% (0.1;0.2;10; 50 radii): 4 min 44 s; CANUPO 95% (0.3;0.3;10; 34 radii): 3 min 55 s; CANUPO 95% (1;1;5; 5 radii): 23.4 s. The same computer was used in all cases, a laptop with an AMD Ryzen 9 5900HX processor and 32 GB RAM.
From the testing performed, it is useful to draw general conclusions on how to choose the filter definition parameters in terms of maximum radius and number of radii. The following graphs partially answer this question. Data from testing the CANUPO filter with a 95% probability are shown; for the other variants, the results are very similar. The data have to be interpreted with the understanding that there are three variables in the system (minimum radius, maximum radius, and radius step or number of radii), and their influence cannot be separated in the individual filters. In Fig. 6, the maximum radius of each filter is plotted against the total error obtained. The data show that a larger maximum radius tends to give a smaller total error. The values on the far right exceed 30% error; these belong to filters that use at most three radii. This is followed by Fig. 7, which shows the number of radii used as a function of the total error. It is clear that a small number of radii increases the probability that the total error will be unfavourable. The results correspond to the logical expectation that more resources spent, i.e. a larger maximum radius and a higher total number of radii, yield better quality results but also increase the computation time.

Conclusions
Ground filtering procedures were tested on real data representative of a mining area. The tested geometric filters CSF, PMF and SMRF showed a total error of about 7.5% in the best case, of which type I error constitutes a significantly larger part (about 6%) than type II error (about 2%). In contrast, the tested CANUPO structural filter achieved a total error of 5.2% in basic use and as low as 4.1% when a defined probability bound was used. Errors of type I and type II are distributed almost evenly here. The specific probability chosen has a relatively small effect on the result, about 0.1% of the total error.
Geometric and structural filters operate differently. Geometric filters essentially approximate the terrain by a surface and remove those points that are further from the surface than a threshold. Because this surface is relatively coarse compared to the resolution of the filtered cloud itself, the threshold must be set relatively large (with a cloud resolution of about 0.1 m, a suitable threshold for the test data was, for instance, 0.3 m for CSF). Thus, all points within this distance remain, even those that represent, for example, low vegetation. Therefore, the bulk of the total error here is made up of type I error, i.e., points that were erroneously not removed. Terrain points are considered to be those that are lowest, without consideration of their nature.
In contrast, the CANUPO-based structure filter classifies points based on their neighbourhood (multiple neighbourhoods with different radii) and thus removes all points that have a noise character (or do not match the character of the surface), regardless of their location. Thus, it can be advantageously applied to data where rugged surfaces (even with a large slope) are present. For these reasons, the structure filter does not make errors on rugged surfaces, as can be seen in the test data on the heap in the middle of the area. In terms of the design of the CANUPO classifier, it can be concluded from the experiment that the quality of the result does not fundamentally depend on the number and interval of the parameters. However, a higher number of radii and a larger maximum radius increase the probability of a good quality result. When designing filters, it is advisable to consider the resolution of the cloud relative to the size of the radius and not to take the reported radii as absolute numbers.