Is waveform worth it? A comparison of LiDAR approaches for vegetation and landscape characterization

Light Detection and Ranging (LiDAR) systems are frequently used in ecological studies to measure vegetation canopy structure. Waveform LiDAR systems offer new capabilities for vegetation modelling by measuring the time‐varying signal of the laser pulse as it illuminates different elements of the canopy, providing an opportunity to describe the 3D structure of vegetation canopies more fully. This article provides a comparison between waveform airborne laser scanning (ALS) data and discrete return ALS data, using terrestrial laser scanning (TLS) data as an independent validation. With reference to two urban landscape typologies, we demonstrate that discrete return ALS data provided more biased and less consistent measurements of woodland canopy height (in a 100% tree covered plot, height underestimation bias = 0.82 m; sd = 1.78 m) than waveform ALS data (height overestimation bias = −0.65 m; sd = 1.45 m). The same biases were found in suburban data (in a plot consisting of 100% hard targets e.g. roads and pavements), but discrete return ALS were more consistent here than waveform data (sd = 0.57 m compared to waveform sd = 0.76 m). Discrete return ALS data performed poorly in describing the canopy understorey, compared to waveform data. Our results also highlighted errors in discrete return ALS intensity, which were not present with waveform data. Waveform ALS data therefore offer an improved method for measuring the three‐dimensional structure of vegetation systems, but carry a higher data processing cost. New toolkits for analysing waveform data will expedite future analysis and allow ecologists to exploit the information content of waveform LiDAR.


Introduction
The spatial and volumetric structure of vegetation in ecosystems is a key driver of function (Shugart et al. 2010), and Light Detection and Ranging (LiDAR) instruments provide critical data for describing and modelling vegetation structure (Vierling et al. 2008). LiDAR instruments can be operated from the ground (e.g. Terrestrial Laser Scanning; TLS), from airborne platforms (e.g. Airborne Laser Scanning; ALS) or from satellites (e.g. freely available data from ICESat), and come in two forms: discrete return and full waveform systems (Lefsky et al. 2002; Vierling et al. 2008). The difference between these is the way in which data are recorded. Discrete return systems (most commonly used) measure the time taken for a laser pulse to travel to an object and are used to determine height. In products derived from ALS data, there are usually two datasets: a digital surface model (DSM) provides an estimate of the top-of-canopy height, while the digital terrain model (DTM) shows topographic variability in the neighbouring ground surface. Such data can be used to describe canopy patterns (Anderson et al. 2010; Luscombe et al. 2014), model hydrological flow paths (Jones et al. 2014), monitor wildlife habitat, or produce carbon inventories at patch (Calders et al. 2015) or landscape scales. Waveform ALS data (Fig. 1), however, have the potential to provide much richer spatial information about canopy characteristics in three dimensions. This is because these systems record the range to multiple targets within the canopy (Danson et al. 2014). By measuring the time-varying signal of the laser pulse as it illuminates different elements of the canopy, these systems can be used to model the spatial character and arrangement of structures that drive canopy biophysical processes such as canopy architecture and size and woody biomass (Mallet and Bretar 2009; Armston et al. 2013), and can provide useful data for studies requiring tree species discrimination (Alonzo et al. 2014).
It is only since around 2010 that waveform systems have begun to be heavily explored in ecological contexts (with limited earlier examples by Anderson et al. (2006), and Hyde et al. (2005), for example). This is probably because of the high data volumes requiring high computing power, and the complexity of analysing the return signal [e.g. rather than a few 'hits' (typically, up to five) from a discrete return system, waveform systems give a near-continuous pulse; Fig. 1]. Waveform data represent a significant signal processing task: tracing the photon from the sensor to the ground and understanding what the interactions represent is a potential barrier to their application in ecology and beyond. Extracting 3D canopy information from the waveform is challenging because the pulse can be perturbed on its path through the canopy; for example, the electromagnetic radiation in the pulse can be redirected within the canopy and is known to suffer 'multiple scattering' between different elements (e.g. leaves and woody biomass). This leads to highly complex signals requiring denoising and correction using signal processing approaches, followed by product validation. Despite this challenge, there are a variety of new waveform signal processing approaches emerging, particularly for vegetation applications, with most studies following one of three methods: (1) decomposition into points and attributes using function fitting (Hofton et al. 2000; Wagner et al. 2008); (2) decomposition into points using deconvolution (Hancock et al. 2008; Jiaying et al. 2011; Roncat et al. 2011); (3) extracting metrics such as height of median energy (Drake et al. 2002). The points or metrics from the resulting models can then be used to infer plot-level characteristics or calculate canopy height (Boudreau et al. 2008), fit geometric primitives to crowns (Lindberg et al. 2012), or fill voxels to enable construction of three-dimensional models from a regular grid of cubes (e.g. as in Minecraft) where canopy structure can be optimally modelled (Hosoi et al. 2013).
Waveform laser scanning technology is now at a tipping point, evidenced by NASA's forthcoming 'Global Ecosystem Dynamics Investigation LiDAR' space mission, due for launch in 2018 [GEDI (Krainak et al. 2012; NASA, 2014)]. It is hoped that the enhanced capability of the waveform system on GEDI will provide superior global estimates of vegetation carbon stocks.
In this article, we address the pragmatic research question of what benefits waveform ALS data can offer ecologists over more easily obtainable discrete return ALS products, using urban systems as an exemplar. Quantitative description of the pattern and 3D structure of urban vegetation demands fine-scale spatially distributed information describing canopy architecture (Yan et al. 2015). This is because the pattern and extent of green infrastructure (e.g. street trees, parks, domestic yards and gardens) is a key determinant of the provision of ecosystem services in cities and towns, including nutrient cycling, temperature and flood risk regulation, reduction in atmospheric pollution, aesthetics, and multiple dimensions of human health (Gaston et al. 2013). Most examples of remote sensing approaches for mapping urban green space rely on either optical classification of aerial photographs, or height-based classification of discrete return ALS to determine the spatial distribution of basic classes such as trees, bushes and grass (Chen et al. 2014; Yan et al. 2015). While these data suit the fine spatial scale at which urban vegetation varies, and allow the small patch sizes of urban green space to be mapped (e.g. in yards and gardens), they neglect to characterize the vertical distribution of vegetation and photosynthetic material through the depth of the canopy and its spatial form. Furthermore, they cannot account for habitat features such as the understorey, which are important in driving urban ecological connectivity. This work sought to establish the impact of those omissions in describing urban vegetation complexity.
Figure 1. [Start of caption missing from the source.] In contrast, a discrete return system would not provide details of the pulse, but would instead report a series of 'hits' from various components of the landscape being monitored, typically from near to the top of the tree and from somewhere close to the ground surface (sometimes with further returns from points in between). Simulated discrete returns are shown on the plot in the left of the figure.
Here, we compare a simply processed waveform ALS product with discrete return ALS data from the perspective of ecologists working in urban environments. We validate the findings using a ground-based TLS survey, quantify differences in each approach and evaluate the relative processing costs of each. Finally, we discuss the wider implications for using waveform ALS data for vegetation monitoring in other ecological settings.

ALS survey data
An ALS survey was carried out over the town of Luton, UK on 5 and 6 September 2012 (Fig. 2) when the urban vegetation was in full leaf-on stage. The survey utilized the UK Natural Environment Research Council (NERC) Airborne Research and Survey Facility (ARSF) Dornier 228 aircraft platform and the Leica ALS50-II ALS system with a WDM65 full-waveform digitizer, measuring at 1064 nm. Georegistration of the scans was achieved using differential global positioning system (GPS) data from the aircraft and from a linked GPS ground station. All ALS data were collected by a single instrument with separate discrete return and waveform output streams. The ALS data (both waveform and discrete return) were collected with a footprint density of between one point per 25 cm² and one point per 4 m²; this variability is normal and is dependent on scan angle and overlap between flight lines. The discrete return ALS data had up to four returns per pulse. Raw ALS data were processed into a geolocated point cloud with associated waveforms using Leica ALSPP software (version 2.75). More detailed documentation about the data processing can be found online (NERC ARSF, 2014a,b).
Two data products from the ALS survey were compared: a discrete return ALS point cloud describing x, y, z spot heights and intensity; and a waveform ALS dataset, which required pre-processing before it could be used.

Field site description
Data from two field validation sites (both within an area of Luton, UK, called Little Bramingham Woods) are presented in this article (Fig. 2). The first site was in an area of dense and varied tree cover with a clear understorey (referred to as the 'woodland' site) and the second was from a residential area (referred to as the 'suburban' site). A very simple 2 m resolution land cover map (LCM) was generated for these sites using data from an airborne hyperspectral survey (with the AISA Eagle 12 bit pushbroom scanner) carried out at the same time as the ALS survey. The LCM was generated by applying an unsupervised classification algorithm to discrete return ALS data and a Normalized Difference Vegetation Index (NDVI) product. The NDVI was calculated using equation 1:
$$\mathrm{NDVI} = \frac{\rho_{nir} - \rho_{vis}}{\rho_{nir} + \rho_{vis}} \qquad (1)$$
where $\rho_{vis}$ was the mean visible reflectance in channels from 500 to 680 nm, and $\rho_{nir}$ was the mean infrared reflectance between 761 and 961 nm.
A 70-cm height threshold for discriminating tall from short vegetation and an NDVI threshold of 0.2 for discriminating vegetated from non-vegetated areas were used. In the woodland area, the LCM showed that the majority of the site was covered by tall vegetation. In the suburban area, as was expected, there was a mix of tall and short vegetation and vegetated and non-vegetated areas. For both woodland and suburban sites, the discrete return and waveform ALS data were extracted for a 20 m by 20 m square at the centre of each TLS ground validation site for comparison. These comparison areas were chosen because they were proximal to sampling sites where complementary ecological data were being collected, specifically bird feeders where population counts were being collected and where flows of biodiversity through urban systems were being measured. These sites were also evaluated in the waveform LiDAR datasets prior to collection of the TLS validation data, and were found to be representative areas with a variety of waveform shapes and widths.
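The NDVI calculation and the two thresholds quoted above can be illustrated with a short sketch. This is not the unsupervised classifier used to build the LCM; it is a simplified, threshold-only labelling that assumes per-cell height and reflectance arrays are already available. The function names, the class labels and the choice of strict inequalities at the thresholds are assumptions made for illustration.

```python
import numpy as np

# Class names indexed by 2 * is_tall + is_vegetated (illustrative labels).
CLASS_NAMES = np.array([
    "short non-vegetation",
    "short vegetation",
    "tall non-vegetation",
    "tall vegetation",
])


def ndvi(rho_nir, rho_vis):
    """Normalized difference of near-infrared and visible reflectance."""
    rho_nir = np.asarray(rho_nir, dtype=float)
    rho_vis = np.asarray(rho_vis, dtype=float)
    total = rho_nir + rho_vis
    out = np.full(total.shape, np.nan)  # NaN where the denominator is zero
    np.divide(rho_nir - rho_vis, total, out=out, where=total != 0)
    return out


def classify_cover(height_m, ndvi_values, height_threshold=0.7, ndvi_threshold=0.2):
    """Label each cell using the 70 cm height and 0.2 NDVI thresholds.

    Strict '>' comparisons are an assumption; the text does not say how
    values exactly at a threshold were treated.
    """
    tall = np.asarray(height_m, dtype=float) > height_threshold
    vegetated = np.asarray(ndvi_values, dtype=float) > ndvi_threshold
    code = 2 * tall.astype(int) + vegetated.astype(int)
    return CLASS_NAMES[code]


labels = classify_cover([10.0, 0.1, 8.0, 0.0], [0.6, 0.5, 0.0, 0.05])
```

In this toy call the four cells fall into the four possible classes, one each.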

Method for processing waveform ALS data
The ALS50-II system recorded the intensity of reflected light as an eight-bit value every 1 ns. The first step in signal processing the waveform data was to remove background electronic noise, which is known to be very stable in the Leica ALS50-II. Here, we used a simple method to extract canopy signals from the waveform ALS data. The first peak in the waveform above the noise threshold was traced back to the mean noise level (DN = 12, derived from a histogram) to provide a consistent estimate of the canopy maxima. Histograms of signal intensity were then used to set the simple noise threshold at DN = 16 (see Hancock et al. 2015, figure 5b) to remove all background noise, and the result was a product showing point height information that could be used to compare datasets quantitatively. Further processing (for example, using function fitting, deconvolution or pulse width subtraction) may have further improved the retrieval of the 'true' canopy top (Hofton et al. 2000). These more complex signal processing methods were not the focus of this article and will be discussed in a subsequent article, which develops a validated voxel-based approach for 3D canopy description in urban settings.
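The thresholding and trace-back step can be sketched as follows. This is a minimal illustration rather than the processing chain actually used: it assumes a single waveform held as an array of DN values sampled every 1 ns, and the function names and the exact stopping rule for the trace-back are assumptions.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # m/s
# One 1 ns sample corresponds to about 0.15 m of range (two-way travel).
BIN_SPACING_M = SPEED_OF_LIGHT * 1e-9 / 2.0


def remove_noise(waveform, noise_threshold=16):
    """Zero every sample at or below the noise threshold."""
    wf = np.asarray(waveform)
    return np.where(wf > noise_threshold, wf, 0)


def canopy_top_bin(waveform, mean_noise=12, noise_threshold=16):
    """Index of the start of the first return, or None if there is no signal.

    The first sample above the noise threshold is located, then the leading
    edge is followed backwards while samples remain above the mean noise level.
    """
    wf = np.asarray(waveform)
    above = np.flatnonzero(wf > noise_threshold)
    if above.size == 0:
        return None
    i = int(above[0])
    while i > 0 and wf[i - 1] > mean_noise:
        i -= 1
    return i


example = [12, 12, 11, 13, 15, 20, 40, 20, 12]
top = canopy_top_bin(example)            # leading edge starts at index 3
range_offset_m = top * BIN_SPACING_M     # offset from the first recorded sample
```

Tracing back to the mean noise level, rather than stopping at the threshold crossing, places the canopy top at the foot of the leading edge instead of part-way up it.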

Validation data from TLS survey
To validate the information content of the two ALS products, a waveform TLS system was deployed [Riegl VZ-400, operating at 1545 nm (near infra-red)] to measure vegetation structure (from the ground up) on 5 and 7 August 2014. The TLS instrument had a reported 5-mm accuracy and 3-mm repeatability, far better than that of the ALS data. Previous work by Calders et al. (2015) has shown that this approach provides a good validation (accurate tree heights were obtained and attenuation was not found to be significant). The dates of field sampling with TLS were chosen to ensure that the vegetation was in a similar state to the time of the ALS survey. Validation sites were chosen to cover a range of observed habitat structures, and a variety of ALS waveform shapes and urban typologies. As a result, the TLS scan methodology had to be adapted for each site so as to capture the variability in canopy structure appropriately. The plot sizes also varied, with small (5 m) plots sometimes requiring three scan positions to capture variability in the dense vegetation, while sparsely vegetated plots measuring tens of metres in size only required two scan positions due to reduced occlusion. Each site was scanned from two or three different positions so as to infill shadowed areas, and multiple scans were co-registered using reflector targets. TLS point clouds were then manually translated to align the roofs of buildings with the geolocated ALS data to within 10 cm vertically and <30 cm horizontally.

Quantitative comparison
To compare quantitatively the consistency of the height estimate error in the datasets, the mean difference between the ALS and TLS-derived ranges to the tallest object, and the standard deviation (SD) of those differences, were calculated for a 5 × 5 m area around the plot centres of the 20 × 20 m extracts. In the woodland area, this 5 × 5 m measurement area was covered with dense trees. The LCM classification indicated that the woodland plot comprised 100% tall vegetation. In the suburban zone, the 5 × 5 m measurement area was a road surface with neighbouring pavement and lamp posts with no green elements. The LCM classification indicated that this plot comprised 75% short non-vegetation (e.g. roads, footpaths, gravel driveways or cars), and 25% tall non-vegetation (e.g. buildings or lamp posts). These comparison plots therefore represent end members of urban structural diversity and so offer the most effective insight into the relative merits of waveform versus discrete return ALS products.
The ALS waveform-derived canopy top was calculated using the method described in Hancock et al. (2011) using a mean noise level of 12 and a noise threshold of 16. Calders et al. (2015) have demonstrated that TLS-derived estimates of canopy height are very reliable (see figure 6 in Calders et al. 2015) and our comparisons therefore rely on TLS being able to provide a robust validation of true canopy height. Biases between TLS measuring the leaf underside versus the ALS measuring the leaf topside are treated as negligible here.
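The two summary statistics can be written down compactly. The sketch below assumes that ALS and TLS ranges to the tallest object have already been paired and expressed relative to a common reference; the function name is hypothetical, and the use of the sample standard deviation (ddof = 1) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np


def range_bias_and_sd(als_range_m, tls_range_m):
    """Mean and standard deviation of the paired ALS minus TLS range differences.

    A positive bias means the ALS range is too long, i.e. the object height
    is underestimated; a negative bias means height is overestimated.
    """
    diff = np.asarray(als_range_m, dtype=float) - np.asarray(tls_range_m, dtype=float)
    return float(diff.mean()), float(diff.std(ddof=1))


# Toy values for illustration only, not survey data.
bias, sd = range_bias_and_sd([10.5, 11.0, 9.5], [10.0, 10.0, 10.0])
```

Reporting the bias and the SD separately distinguishes a systematic offset, which could be calibrated out, from scatter, which could not.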

Results
Validation of airborne discrete return and waveform ALS data with TLS
Figure 3 shows the results of comparing waveform and discrete return ALS data with TLS data. Over hard surfaces with little spatial complexity in height and structure, such as roads and buildings in the suburban area [Fig. 3(A) and (B)], the discrete return data provided a height model that indicated basic trends, while the waveform data showed pulse blurring caused by the 3.55 ns system pulse. Conversely, the waveform pulses (coloured green) in Figure 3(B) travelled through urban green space components like bushes and shrubs, and so provided potentially useful within-canopy structural information, while the discrete return points failed to capture the detail of the canopy profile. In the woodland setting, the ALS waveform system recorded returns from throughout the canopy and could be used to provide useful information on the canopy understorey (e.g. presence/absence, density and structure). In some settings, there was penetration of the ALS waveform all the way to the ground, allowing the urban habitat to be described much more accurately than with discrete return data [Fig. 3(C) and (D)]. In some places, however, there were data shadows, for example, beneath the centre of a large tree [Fig. 3(D)]. This same figure shows that in a few places, the discrete return ALS heights of the tree tops appear to be underestimated relative to the height derived from TLS. A few further issues are evident with the waveform data: in Figure 3(B) and (D) some of the waveform returns appear below the TLS-derived ground surface. These errors are caused by the combination of multiple scattering of photons in the canopy and automatic instrument settings applied at the point of data collection. These erroneous points can be corrected using signal processing approaches (see 'Introduction'), but these are computationally complex and require extensive testing and validation.

Quantitative comparison
Applying the methods described above, statistics were generated that showed that discrete return ALS data consistently overestimated the range (and so underestimated height), with a bias of 0.82 m (SD = 1.78 m) in the 5 × 5 m woodland test area. Conversely, the waveform ALS data consistently underestimated range (and so overestimated height), but with a smaller bias, and provided a more consistent estimate of height (i.e. smaller SD) than the discrete return data (bias = −0.65 m; SD = 1.45 m). In the 5 × 5 m suburban test area, the biases showed similar patterns (discrete return bias = 0.78 m; waveform bias = −0.29 m), but the discrete return data had a lower SD (0.57 m) compared to the waveform data (0.76 m), indicating that more consistent results were achieved with discrete return data where vegetation was not present. This analysis adds weight to the suggestion that the discrete return algorithms are optimized for hard surfaces (such as roads), where they outperform simply processed waveform data, and that waveform data provide more accurate results over vegetation. It should be noted that the waveform ALS product could be processed to generate a product which performed as well as the discrete return data over hard surfaces, but the computational costs of doing so would be high.

ALS intensity measures
Further issues with discrete return ALS products are apparent when evaluating discrete return ALS intensity values over vegetated surfaces. Figure 4 demonstrates this by comparing the intensity measured from the discrete return ALS product with the reflected energy from the waveform data (the integral of the waveform intensity with time) over a mixed urban landscape in Luton. Areas of high intensity appear brighter than those with lower intensity. At 1064 nm, healthy green vegetation would be expected to reflect radiation strongly and yet some of the vegetated areas in Figure 4(A) show low intensity (indicated by dark areas), which is an artefact of the diffuse return containing a large amount of energy, but having a low, broad peak. Therefore, there are often non-physical effects caused by signal distortion, and these could lead to large errors in interpretation of discrete return ALS data if used for automated land cover determination. This is frequently overlooked; for example, studies by Antonarakis et al. (2008) and Donoghue et al. (2007) both utilized discrete return ALS intensity as an additional measure to derive a supervised classification of vegetation types. The discrete return intensity is a function of vegetation structure (e.g. foliage profile), albedo (e.g. phenology) and the processing algorithm applied, so will confound classification accuracy if one or more of those variables is changed. Waveform ALS data are much less prone to such limitations, being able to record a much more accurate measure of reflected radiation and shape of the signal response of the target, allowing the same discrimination using the physically based shape rather than an artefact [Fig. 4(B)].
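The difference between a peak-based intensity and the integrated waveform energy can be illustrated with two synthetic returns. This sketch assumes Gaussian-shaped returns with hypothetical widths and a 1 ns sample spacing; it treats the peak amplitude as a stand-in for a discrete return intensity, which is a simplification, and none of the numbers are taken from the survey.

```python
import numpy as np


def gaussian_return(t_ns, centre_ns, width_ns, energy):
    """Gaussian return whose integral over time equals `energy`."""
    norm = energy / (width_ns * np.sqrt(2.0 * np.pi))
    return norm * np.exp(-0.5 * ((t_ns - centre_ns) / width_ns) ** 2)


sample_spacing_ns = 1.0
t = np.arange(0.0, 100.0, sample_spacing_ns)

# Same reflected energy, different spread in time (hypothetical widths).
hard_surface = gaussian_return(t, 50.0, 1.5, 100.0)  # narrow return
canopy = gaussian_return(t, 50.0, 6.0, 100.0)        # diffuse, broad return

peak_hard, peak_canopy = hard_surface.max(), canopy.max()
energy_hard = hard_surface.sum() * sample_spacing_ns
energy_canopy = canopy.sum() * sample_spacing_ns
```

Both returns integrate to the same energy, yet the broad canopy return has a peak roughly four times lower, so an intensity derived from peak amplitude alone would make the canopy look dark.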

Computational requirements
When deciding which ALS product to use, one must consider data volumes and computational requirements underpinning information extraction. Data volume and processing costs are currently much higher with waveform data than with discrete return data. The waveform files used here [LAS1.3 format (ASPRS, 2015)] were 6 to 10 times larger than the discrete return (LAS1.0 format) files. For example, one strip of discrete return ALS data would occupy 700 MB of disc space, while the same spatial extent of waveform ALS data would occupy 4.2 GB. Much of this additional data volume is occupied by wave bins that contain no usable signal, but which must be retained for postprocessing. Once the background noise is removed, file sizes can be reduced by roughly an order of magnitude by simple run-length encoding. The signal processing needed to extract target properties is computationally expensive: applying the method described in Hancock et al. (2008) took 25 processor days on a computer with a 3-GHz CPU, although this could be parallelized on a cluster workstation to expedite processing time. In comparison, the discrete return point cloud is processed by the instrument during collection and, typically, is ready for use in geographical information systems or other image processing software on delivery (although some users will subsequently choose to apply additional topographic normalization techniques or post-process the data using other tools).
Figure 4. The impact of using discrete return intensity versus waveform airborne laser scanning (ALS) in the near infrared (1064 nm) is shown for a mixed zone in the focal area of Luton. In (A) the intensity of the discrete return ALS data is shown, while (B) shows the difference when waveform ALS intensity is used. The major differences in intensity appear in zones with dense vegetation. These data show that relying on discrete return intensity would lead to bias: the area of dense trees appears as having low intensity (low reflectance at 1064 nm) when it should have high reflectance (the two are related). This bias is not present in waveform intensity, which shows both the mown grass and the dense trees as having high intensity, which is correct given the known strong vegetation reflectance response in this region of the spectrum.
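Run-length encoding of a noise-removed waveform can be sketched in a few lines. The example assumes the background bins have already been set to zero, so that long empty stretches collapse into single (value, count) pairs; the function names are illustrative and this is not the storage scheme of any particular file format.

```python
import numpy as np


def run_length_encode(samples):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    values = np.asarray(samples)
    if values.size == 0:
        return []
    # Indices where a new run starts (value differs from its predecessor).
    change = np.flatnonzero(values[1:] != values[:-1]) + 1
    starts = np.concatenate(([0], change))
    lengths = np.diff(np.concatenate((starts, [values.size])))
    return [(int(values[s]), int(n)) for s, n in zip(starts, lengths)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original samples."""
    return [value for value, count in pairs for _ in range(count)]


denoised = [0, 0, 0, 0, 17, 40, 22, 0, 0, 0]
encoded = run_length_encode(denoised)
```

The saving depends on how much of each waveform is empty: runs of zeros compress to one pair each, while the signal bins, which rarely repeat, each still cost a pair.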
While considering the various costs of extracting information from waveform ALS data, it is also important to highlight the recent development of new software tools for expeditious analysis of such data. Not all of these tools are mature, but they offer a means by which most users could extract useful information from both discrete return and waveform-capable LiDAR systems (from both ALS and TLS systems). Such tools [we list only free-to-use (FTU) or open-source (O/S) options] are briefly summarized in Table 1.

Summary and Conclusions
The results shown here suggest that discrete return ALS data are optimized for use in measurement of simple hard targets (i.e. roads), and that the methods and assumptions used to generate discrete return ALS products do not permit accurate description of the three-dimensional structural complexity of vegetated areas. Using two urban landscape typologies, we have shown that if discrete return data were used alone, measurements of the vegetation system would be biased in terms of canopy height (underestimation), inaccurate in terms of intensity (likely resulting in physical misclassifications of green space) and missing vital data on the characteristics of the canopy understorey. Inaccuracies arising from the use of discrete return ALS data in measuring tree canopy height have been reported previously, for example, by Zimble et al. (2003), who showed bias in deriving canopy height models from discrete return ALS (in this example, the underestimation was caused by the points missing tree tops and hitting the shoulders of tree crowns instead). The bias in canopy height in the discrete return ALS data reported in our study is most likely caused by the signal processing algorithms used to generate the discrete return products and has also been reported previously by Gaveau and Hill (2003). This is a different, and additional, effect to that described by Zimble et al. (2003). Such biases in discrete return ALS data could be addressed on a site-by-site basis using an empirical calibration against ground data, although using the waveform allows this bias to be removed in a more consistent way (Hancock et al. 2011).
By adopting a waveform ALS approach, there are benefits and costs for the ecologist. The major benefit is a more complete three-dimensional description of the vegetation canopy. With waveform data, we show how ecologists can obtain improved canopy height models, which are critical for improving understanding of spatial carbon assessment and biomass, for example (Lefsky et al. 2005; Hilker et al. 2010). We also show the potential of the waveform approach for improved detection and description of understorey characteristics, which are important if spatial models of biodiversity, resource availability (Decocq et al. 2004), and variables such as propagule abundance and connectivity (Jules and Shahani 2003) are to be determined. To date, there have only been a limited number of studies that have investigated canopy understorey characteristics with LiDAR systems, and none currently exist which use waveform ALS for this purpose. For example, Hill and Broughton (2009) used leaf-off and leaf-on discrete return ALS data to map the spatial characteristics of suppressed trees and shrubs growing beneath an overstorey canopy, and Ashcroft et al. (2014) have demonstrated the capability of TLS to capture three-dimensional vegetation structure, including understorey. With waveform data, we have shown that there exists an unexplored capability to model canopy understorey in leaf-on stage, over large areal extents: an exciting scientific opportunity. The costs are a high data storage and processing demand (see 'Computational requirements'), and there is certainly a great need for more work to improve and optimize the processing of waveform data to account for multiple scattering effects and for the waveform pulse shape. It is also worth noting that currently there are many LiDAR systems (both ALS and TLS systems) that are waveform capable, but the waveforms are often discarded during the automated process of generating discrete return data [e.g. Riegl LMS-Q1560 (Disney et al. 2010)].
In answering the question posed in the title of the article, we therefore conclude that there is a hidden and rich resource in data from waveform ALS systems that would provide added value for spatial ecologists investigating vegetation systems and dynamics across a range of ecological systems. The 'costs' of processing waveform data should not be overlooked, but a growing suite of processing tools (Table 1) will reduce the processing costs and the technical requirements for users of waveform data to have signal processing expertise. As waveform data become more readily available [e.g. through new global missions such as NASA's GEDI (NASA, 2014; Krainak et al. 2012)] and tools become available to make those data easier to process, we suggest that these will provide a rich source of accurate, three-dimensional spatial information for describing vegetation canopies. This will improve scientific understanding of the functional relationships between vegetation structure and related, important ecological and environmental parameters in a wide range of settings.