DIGITAL TERRAIN FROM A TWO-STEP SEGMENTATION AND OUTLIER-BASED ALGORITHM

We present a novel ground filter for remotely sensed height data. Our filter has two phases: the first phase segments the DSM with a slope threshold and uses gradient direction to identify candidate ground segments; the second phase fits surfaces to the candidate ground points and removes outliers. Digital terrain is obtained by a surface fit to the final set of ground points. We tested the new algorithm on digital surface models (DSMs) for a 9600km region around Perth, Australia. This region contains a large mix of land uses (urban, grassland, native forest and plantation forest) and includes both a sandy coastal plain and a hillier region (elevations up to 0.5km). The DSMs are captured annually at 0.2m resolution using aerial stereo photography, resulting in 1.2TB of input data per annum. Overall accuracy of the filter was estimated to be 89.6% and on a small semi-rural subset our algorithm was found to have 40% fewer errors compared to Inpho’s Match-T algorithm.


INTRODUCTION
Digital Elevation Models (DEMs) are important for studies of earth surface processes such as geomorphology, hydrology (Wilson and Gallant, 2000) and vegetation (Moore et al., 1991). They also have applications in wireless communication (Sawada et al., 2006) and land cover map generation (Caccetta et al., 2015). It is popular to generate DEMs using remote sensors including light detection and ranging sensors (LiDAR), digital stereo (multiple view)-photography, and interferometric synthetic aperture radar (InSAR). Such sensors may be airborne or spaceborne and provide dense observations of the land surface. From the data acquired, positional (e.g. x,y,z) and surface property (e.g. reflectance from optical photogrammetry or backscatter from SAR) information is directly available, or with some processing, may be derived. The positional information is directly relevant to the generation of a Digital Surface Model (DSM), which records heights of any feature relative to a reference (typically taken to be zero at mean sea level). In this paper we focus on the generation of Digital Terrain Models (DTM), which provide estimates of ground elevation, through filtering and then removing above ground objects from DSMs. We note that given a DSM and a DTM, then we may derive a so called "normalised Digital Surface Model" (nDSM) by simple subtraction of the DTM from the DSM. DTMs are typically used for estimating ground properties such as derivatives (slopes and curvatures) or flow path prediction in runoff models. nDSMs are typically used for building and/or tree height extraction.
The published literature is mostly focused on LiDAR data, however many of the concepts and results generalise to stereo photography derived point clouds due to the similarities of the sensed objects. Meng et al. review and comment on the existing methods for ground filtering LiDAR point clouds (Meng et al., 2010). They categorised algorithms into six classes, segmentation/clustering, morphological, directional scanning, contour-based, TINbased and interpolation-based. We prefer a coarser classification: algorithms either label individual points, or label clusters or segments of points. Furthermore this labelling is performed using neighbourhood based comparisons (e.g. morphological and direction scanning) and/or surface-based comparisons (contour, TIN and interpolation based). Alternative classifications are given by (Shan and Aparajithan, 2005) and (Tóvári and Pfeifer, 2005).
Neighbourhood comparisons typically use operators derived from the morphological opening and closing operations to quickly label points as ground. In comparison, algorithms that label segments/clusters such as (Belkhouche et al., 2015) and (Tóvári and Pfeifer, 2005), divide the point cloud into homogeneous groups of points and then use neighbourhoods, fitted surfaces or properties of the segments to discriminate ground segments from nonground segments.
To improve accuracies, neighbourhood methods have been supplemented with other secondary filters such as repetitive sampling by (Kobler et al., 2007), surface fitting robust to outliers (Woestyne et al., 2004) and multiscale neighbourhoods (Abo Akel et al., 2004). Kobler et al. (Kobler et al., 2007) first used a pointbased neighbourhood algorithm similar to Vosselman's classic slope-based filter (Vosselman, 2000) to remove most non-ground points and then applied a point-based surface filter to obtain a good DTM approximation. This latter filter repeatedly fitted surfaces to random samples of the point cloud and relied on the point cloud containing very few non-ground measurements. The final DTM was found using the distribution of all the surfaces. Wang and Tseng (Wang and Tseng, 2014) pre-processed point clouds with a segment-based filter and then used 1-dimensional neighbourhoods to label the points. Abo Akel et al. (Abo Akel et al., 2004) used a segment-based filter (using only intrinsic properties of the segments) to extract a road network and then performed multiscale neighbourhood comparisons to find all other ground points. Woestyne et al. (Woestyne et al., 2004) also use a segment-based filter but follow it with a surface-based filter.
The review article (Sithole and Vosselman, 2004) lists common difficulties for ground filters of LiDAR data: • Outlying points far above or below the true surface. These are usually caused by sensor or matching errors, or birds or aircraft in the image. Most filters have a bias towards low points so low outliers are typically more problematic than high outliers.
• Large objects. These are especially problematic for filters that use fixed-sized neighbourhoods because the object's extents may exceed the neighbourhood.
• Complex shapes and configurations. For example, disconnected patches of bare earth (such as courtyards), buildings connected to other buildings, roofs that have multiple levels, or roofs overshadowed by trees.
• Attached objects are objects that are seamlessly connected to the ground on one edge, but are above the ground on another edge. Some examples of this are roofs that touch the ground, bridges and ramps.
• Vegetation on slopes or low vegetation. It is difficult to determine between natural variation in the Earth's surface and variation due to vegetation.
• Discontinuities and abrupt height changes in the ground. Many filters remove or smooth these terrain features.
For stereo photography data, canopies also present difficulties because stereo photography relies on observations from two or more views potentially producing fewer ground points than LiDAR.
There are two types of errors in classifying ground points: omission errors (rejection of true bare-Earth points) and commission errors (false labelling of objects as bare-Earth).
This paper presents a novel algorithm for generating DTMs from DSMs. We concentrate our description on using elevation models presented in raster/gridded form, and note that the concepts generalise to non-gridded points. The algorithm first labels segments and then applies a surface-based outlier removal. It differs from the algorithm of (Woestyne et al., 2004) by using (1) a neighbourhood to label the segments so that small isolated patches of ground are correctly filtered and (2) a multiresolution outlier removal that can quickly remove large regions of non-ground points. The algorithm is described in detail in Section 2.
We applied the algorithm to a 9600km 2 decimetre resolution dataset and the quality of the resulting DTMs are discussed in Section 3. The filter accuracy was 89.6% and the DTMs are already in use by local governments (Western Australian Planning Commission, 2013).

MATERIALS AND METHODS
Here we describe our 2-stage algorithm for generating DTMs from DSMs. The complete work flow is presented in Figure  1. The data used for developing and testing the algorithm is described in Section 2.1. The surface fitting algorithm used in both the surface-based filter and the final DTM interpolation is described in Section 2.2. The first stage comprises a segmentlabelling neighbourhood filter and is described in Section 2.3. The second stage, a point-labelling surface-based filter, is described in Section 2.4. Computational aspects of producing the DTMs are described in Section 2.5.

Data
The algorithm can be applied directly to point clouds generated by LiDAR or stereo photography, however for simplicity we use DSMs that are already available.
The DSMs were generated as part of a wide-region, fine-scale urban monitoring study (Caccetta et al., 2015). See Figure 2 for an example DSM. This project leverages state-funded annual stereo aerial photography capture of a 9600km 2 region around Perth, Australia. Henceforth we will refer to this region as the Urban Monitor (UM) region.
The annual stereo aerial photography data set, which contained approximately 35,000 frames per annum, was acquired using a Microsoft UltraCAM-D (Leberl and Gruber, 2003) flown at a height of about 1300m above the ground, capturing multispectral data (red, green, blue and near infrared bands) along with panchromatic data. The ground sample distance (GSD) was approximately 0.3m and 0.1m for the multispectral and panchromatic data respectively. The dates of each capture corresponded to Perth's dry hot Mediterranean-type summers. The captured frames had at least 65% forward and 30% side overlaps. The overlaps among frames allowed detailed DSM to be extracted using stereo photogrammetric techniques.
All frames went through a rigorous photogrammetric process including image calibration (Collings et al., 2011;Collings and Caccetta, 2013), aerial triangulation (block adjustment), generation of DSM, and ortho-rectification using the derived DSM. High performance computing based aerial digital photogrammetry was employed in the creation of a DSM at 20cm resolution for the 9600km 2 region. The in-house DSM generation method was developed based on one of the authors early work (Wu, 1996(Wu, , 1995, and the processing was performed at Western Australia's Pawsey Supercomputing Centre. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLI-B3, 2016 XXIII ISPRS Congress, 12-19 July 2016, Prague, Czech Republic In the nDSM cool colours correspond to low elevations, hot colours are high elevations (red = taller than 10m). Noticeable are tall canopies of dense plantation forests in the North, and non-uniform canopy of native forests in the East. Some height errors in water are also visible due to DSM issues over water. In (a) elevations range from 0m (blue) to approximately 0.5km (white) above sea level.

Multi-Resolution Surface Fitting
The surface fitting algorithm was derived from Terzopoulos' work (Terzopoulos, 1988) and is quite similar to the surface fitting algorithms of Bolitho et al. (Bolitho et al., 2007) and Zoraster (Zoraster, 2003). The algorithm can be applied directly to a point cloud to generate a DSM. In principle other surface fitting methods might also be sufficiently fast and accurate.
The algorithm operated at multiple resolutions using coarse resolution results as starting estimates for fine resolution surface fitting. The surface fitting at a coarse resolution provides approximate surfaces with relatively small computational costs. The finer resolution surface fitting adjusts this approximate surface to improve detail and accuracy. Multiple resolution algorithms are well known for reducing computation times.
At each resolution the initial surface guess was iteratively evolved to minimise an energy function that combined a data matching cost with a roughness cost. The former was the difference between the estimated surface and the input points averaged to the current resolution. The latter was based on the magnitude of the second-order derivatives of the surface.

Segmentation Filter
This filter used an algorithm known as inflows that was originally developed by one of the authors (Caccetta, 1997). Similar algorithms were more recently presented for point clouds (Belkhouche et al., 2015) and DSMs (Beumier and Idrissa, 2015). The filter required a slope S parameter, an area A parameter and a relative gradient R parameter to be provided by the user. The algorithm first labelled pixels having slope less than S as candidate ground points (cgp), and rejected all other pixels. Connected components of cgp were treated as regions. Regions with area smaller than A were rejected, largely for the purpose of removing numerous commission errors at the expense of some omission errors. Remaining regions were then labelled according to the percentage of neighbouring boundary pixels higher than the nearest region pixel, or in watershed terminology, the percentage of neighbouring pixels that would flow into the region (hence the term "inflow"). Regions having inflow scores greater than, for example R = 0.5, were retained and all other regions were rejected. An example is shown in Figure 3. Using the DSM (b) (hot/grey colours correspond to high DSM values), the area was segmented into pixels above or below the slope threshold (c). In (c) pixels that would 'flow' into a neighbouring region are yellow, and pixels that would 'flow' away from a neighbouring region are blue. In (c) the level segments are coloured with various saturations of red; the less saturated (darker) a segment the higher the proportion of inward flowing pixels. The segments with more than 50% of their boundary pixels flowing inwards were then labelled ground (d) (ground pixels are coloured pink).
The parameters used for the UM region were S = 25 o , A = 0.4m 2 and R = 0.5. Initial testing indicated that the algorithm performed well on outliers in the DSM, discontinuities, sharp ridges, low vegetation, vegetation on slopes, large objects, and even low objects. However it often misclassified saw-toothed roofs, attached objects (such as roof-top car parks) and more complex configurations such as roofs with many objects on top of the roof and roofs neighboured by even higher roofs (such as in high density commercial districts). It also had difficulty with large horizontal segments in canopies (where the areas of segments could be greater than the area threshold of A) and extremely steep terrain (greater than S slopes).
With inflows as the only filter these few (by area) commission errors resulted in DTMs with mounds every few metres, see Figure 4 for an example. Initial testing suggested that this was an unacceptable error rate, however inflows was extremely fast.
Removal of further non-ground points required a method that compared candidate points to the closest true-ground points, which were potentially many metres to tens of metres away. This suggested either extremely large neighbourhoods (larger than most non-ground objects) or surface fitting. In the next section we describe our solution which was based on the latter.

Surface-based filter
This filter improves the candidate ground point (cgp) set by removing points that cause large roughness in the fitted surface. At each iteration a surface, u, was fitted to the current set of cgp and then a cgp point located at (i, j) was removed if the sum of the  Figure 4. An example of a surface fitted to an inflows candidate ground mask. Shown is a false colour display of the region (a), the DSM (b), the set of cgp from inflows (c) and the fitted surface (d). The sub-figures (b), (c) and (d) are coloured according to the same height scale (blue = low, red = high). The surfaces (a) and (d) are also sun-shaded to accentuate texture. In (c) the black pixels are those that were not candidate ground. A few high, nonground, pixels (coloured green or yellow due to their height) can be seen in (c); they are causing the large mound-like features in (d).
second-order derivatives, , evaluated at (i, j) (calculated by discrete approximation from the heights of neighbouring pixels) was above a user-defined threshold and the cgp was above the fitted surface.
The protection of below-surface cgp prevented true-ground points from being removed when nearby above-ground points caused rough surfaces. In the present algorithm this caveat was relaxed on some iterations to remove below-ground DSM errors which occasionally resulted in loss of true ground points.
2.4.1 Parameter Selection. The user defined thresholds were chosen by trial-and-error on small subsets of the data. The region contained two broad terrain types, a smooth coastal plain that contained most of the urban area and a hillier area with little urban development and lots of forests. In Figure 2a the hillier area is mostly orange and white, whilst the plains are green and blue. Exploratory testing gave a strong indication that it would be beneficial to use different parameters for each of these terrain types. On the coastal plain there were many complicated aboveground structures (due to the higher urbanisation) leading to more commission errors by inflows, however the greater smoothness of the ground allowed stricter thresholds and thus enabled removal of more non-ground points.
Parameter choices in the surface fitting algorithm affected both the computation time and the quality of the filtering. A smoother surface used coarser resolutions, required less computation time, and enabled tighter roughness thresholds which removed larger amounts of commission error. However smoother surfaces were unable to detect small commission errors that were close to trueground points. Heuristically, surface fitting at 3.2m resolution (which was 2 4 times the input DSM resolution) is ideal for removal of residential roof sized patches of commission error because at this resolution a residential roof is only a few pixels wide and there are similar sized patches of ground (backyards, roads) to constrain the surface close to true ground. Outlier removal that used 1.6m resolution or finer surface fits were useful because the surface fitting could detect a greater number of small commission errors in trees. Outlier detection at the full 0.2m resolution was not considered due to the greater computational time it required. Furthermore additional care was taken when candidate ground points below the surface could be removed because there was a risk that true ground points on the floors of valleys and depressions would also be removed.
We used two areas to guide parameter selection, a sharp, well vegetated valley for the hills ( Figure 5) and a large region containing urban, suburban and parkland areas for the coastal plain ( Figure 6). These regions were chosen to be challenging with the expectation that the filter will work better on any other region.
Only six iterations were used due to the complexity of determining parameters. The parameters selected, and details of the parameter search can be found in (Hingee, 2013). We do not include them here because they were specific to the surface fitting program and do not add much conceptually.
(a) false colour display (b) sun-shaded DSM Figure 5. The region used to guide parameter choices for the hilly terrain. Dimensions: 1km by 1km. Grey colours correspond to high elevations (∼ 270m), cool colours correspond to low elevations (blue ∼ 63m). Figure 6. The region used to guide parameter choices for the coastal plain. Shown is a false colour display (above) and sunshaded DSM (below). Dimensions: 3.4km by 1.4km. Grey colours correspond to high elevations (∼ 63m), cool colours correspond to low elevations (blue ∼ 10m).

Processing
Filtering of the DSM and surface fitting to create the DTM was performed using Western Australia's Pawsey Supercomputing Centre. Due to very large memory requirements of the surface fitting, the DSM was cut into 20 000 x 20 000 pixel tiles (4km by 4km), with an overlap of 1000 pixels (200m). This resulted in a total of around 1000 tiles to be processed.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLI-B3, 2016 XXIII ISPRS Congress, 12-19 July 2016, Prague, Czech Republic The surface-based outlier removal together with the final surface fit to produce a DTM required around 16GB of memory and just over 4 hours of CPU time for each tile resulting in over 4000 hours of processing in aggregate. In comparison the segmentlabelling filter used very little memory or other resources.
The resulting DTM tiles were mosaicked back together to form the final DTM taking around 36 hours to complete. The 1000pixel tile overlap allowed the feathering of edges through a simple linear weight ramp, from a weight of 0 at the edge of a tile, to 1 where the overlap finished. Without this feathering the final mosaic would have contained extensive tile edge effects, due to surface fitting being calculated independently on each tile.
The nDSM in Figure 2, which was generated by simply subtracting the DTM from the DSM, required around 5 hours to complete.

RESULTS AND DISCUSSION
The quality of the final DTMs generated for the region were investigated with a manual inspection of the entire region (Section 3.1), arrays of manually classified points (Section 3.2) and a comparison to the commercially available match-T algorithm (Section 3.3).

Manual Inspection
The DTMs of the entire UM area were manually inspected for large errors and large patches of errors. Here we focus on the DTM for 2007. The results of the other year, 2009, were similar -see (Hingee, 2013). Due to time constraints the DTM was only checked at scales larger than the typical residential house.
Industrial roofs were often problematic because inflows was confused by their complex shapes (saw-toothed roofs, rooftop car parks attached to the ground, and roofs with objects on them are all common in industrial/commercial zones) and the large roof areas hindered removal by the smoothing filter. Bridges were not classed as commission errors because in some real-life situations they are regarded as ground, however it was noticed that our filter partially commissioned most bridges as ground -inflows labelled them as ground and the smoothing filter eroded the edges.
Some dense forest issues also occurred, primarily in two large plantation forests, and were due to a lack of visible ground to constrain the surface-based filter.
Discontinuities and sharp changes in true ground such as cliffs, gullies and occasional peaks of sand dunes were overly smoothed however these issues were either small or rare, and deemed a reasonable price for reduced commission error.
Two large gentle hill-sides were interpolated instead of fitting to ground points present in the DSM. These were caused by large ground segments with inflow scores lower than the threshold of R = 0.5.
The effects of the piecewise generation of the DTM (recall that the size of the data necessitated surface fitting on 4km by 4km subsets) were only noticed three times, all in large data-less regions such as rivers or lakes.

Comparison to Arrays of Manually Interpreted Points
In clear, flat, open ground, manual inspection showed that the DTM was nearly identical to the DSM and consequently inaccuracies of the DTM were attributed to two sources, the input DSM and any commission/omission errors of the ground filter (a third source, occlusions, was also possible in other parts of the UM region where there was very dense forest).
Accuracy was estimated by arrays of points in three separate regions. A complex urban region (the same region used to train parameters for the coastal plains), a simpler coastal suburb and a farming district that contained a nature reserve. Manual interpretation of these points as either ground, non-ground, or unknown (could not distinguish between ground or non-ground) was performed using orthorectified images. Due to time restrictions no manual interpretation of points occurred in the hilly region. The proportion of predictions that matched the manual interpretations (the overall accuracy) was 89.6%, the percentage of manually interpreted pixels that were incorrectly predicted as ground (the commission error) was 2.2% and the fraction of manually interpreted pixels that were incorrectly predicted as non-ground (the omission error) was 8.1% (these accuracies and the sums of each row above are ignoring the unknown pixels).

Comparison to an Inpho match-T DTM
Inpho's match-T algorithm (Lothhammer, 2005) was used to create both a DSM and a DTM for a 4.5km by 5.7km region representing a fairly simple, but prevalent landscape. The region was on the flat coastal plain and contained clear ground, sparse forest and large, highly vegetated residential blocks. Due to time and data transfer limits it was only possible to compare algorithms on this small region.
Exactly the same raw photographs from the 2009 data were used for both algorithms. The default parameters were used for the Inpho algorithm. It is likely Inpho's DTM could have been improved with better parameters however this risked over fitting.
In clear open ground both DTMs closely matched the DSMs so numerical height accuracies were not compared. However there were clear differences in the abilities of each method to filter nonground objects.
Each DTM was evaluated by manually scanning through the region at high resolution, noting any commission or omission errors. For an error to be noticeable there usually needed to be an area of unexpected roughness of approximately 300m 2 or larger (e.g. the surface contained objects, protrusions or small mounds where nothing in the DSM or orthophotos indicated a change in height), or was smoother than it should be (e.g. loss of gullies and sharp landscape features).
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLI-B3, 2016 XXIII ISPRS Congress, 12-19 July 2016, Prague, Czech Republic The height of the two DTMs were then differenced (Figure 7, bottom) to highlight further differences, this was especially useful for discerning omission errors in our DTM. Differences that the user could not easily attribute as error, such as inclusion/exclusion of a large mound of dirt, were ignored.
Images of the results are shown in Figure 7. A total of 40 errors were found in our DTM. There were 20 commission errors of trees, 15 omissions of dams and gullies, 3 roofs commissioned as ground and 2 incidences where the DTM smoothed over true bulges in the ground.
A total of 67 errors were found in Inpho's DTM. There were 59 errors caused by trees, 6 roof errors, and 2 erroneous protrusions in the DSM that the DTM failed to remove. No omission errors were found; it did not smooth over gullies or dams.
Overall our algorithm resulted in 40% fewer errors. Inpho's DTM contained almost three times as many vegetation commission errors as our DTM, and twice as many roof errors. However it captured many sharp changes in the ground, which were mostly caused by gullies and dam walls, that our DTM consistently missed. In dense urban areas and forests Inpho is likely to create many more above-ground errors because there are many more above-ground objects in these regions. The improvement gained from our algorithm implies a significant reduction in costly manual editing for the region.
Assuming the accuracy of our DTM here is similar, on average, to the whole UM region the error counts suggest at least 400 unrecorded small-scale commission errors.

CONCLUSION
Our algorithm's estimated accuracy for the Perth region was 89.6% and in a peri-urban region it produced 40% fewer errors than Inpho's match-T algorithm. Qualitative manual inspection suggested that the algorithm produced good estimates of ground height in well-vegetated dense suburbs, new suburbs, peri-urban areas and farmland. However it had difficulties with a few rarer terrains: industrial buildings, dense forests, extremely steep hills, sharp depressions and discontinuities.
The algorithm, which combines a segmentation filter with a surface based smoothness filter, shows how improvements to single method algorithms can easily be obtained through a secondary filter. It is also a rare demonstration of DTM estimation from digital stereo photography.
The results of this algorithm applied to the UM region are already in use by urban decision makers (Western Australian Planning Commission, 2013, page 13).

FUTURE WORK
In the future we would like to gain a better estimate of our algorithm's accuracy using manually interpreted points over a broader region. We would also like to investigate improvements to the filter through a specialist below-ground filter and/or morphological removal of outliers after surface fitting. Applications to very steep terrains could use an inflows variation wherein segments of slowly changing slope are separated by high curvatures.
Our algorithm was sufficiently fast and accurate to produce DTMs for multiple dates. Future comparisons between these DTMs could provide further valuable information to a wide range of professionals. Figure 7. Comparison with the match-T generated ground elevation. Shown from top: a false colour display; the DTM generated by Inpho; our DTM; and a grey scale difference of the two DTMs with the errors noted in polygons. In the last image dark shades correspond to locations where the Inpho DTM was higher than our DTM. Many more errors were found in the Inpho DTM (blue polygons) than our DTM (yellow polygons). Note that the Inpho DTM contained some stereo photography matching issues at the edge of the region, these were ignored.