Simulated impact of sample plot size and co-registration error on the accuracy and uncertainty of LiDAR-derived estimates of forest stand biomass
Research highlights
► Field plot size is a key design parameter in lidar forest surveys.
► Large plots produce more reliable ground and lidar measurements.
► Large plots are more resilient to the ill effects of co-registration error.
► Optimal plot size depends on tree size and stand heterogeneity.
Introduction
Light Detection and Ranging (LiDAR) systems are able to capture the complex three-dimensional (3-D) structure of forest canopies and underlying ground surface topography at very high spatial resolutions, and have proven to be highly effective for the estimation of forest biomass and other biophysical properties across a broad range of forest ecosystems (Lefsky et al., 2002b, Reutebuch et al., 2003). When using LiDAR to undertake forest surveys, data are typically collected either as a complete (‘wall-to-wall’) spatial coverage or as a series of discrete, non-overlapping flight lines (also known as swaths or strips). Field estimates of the target variable (e.g., total aboveground biomass, TAGB) are acquired before or after LiDAR data collection using a network of ground-reference (field) plots established in areas with planned or known LiDAR coverage. Finally, a regression model (or suite of models) is developed (1) to describe the statistical relationship between the field-measured variable of interest (response) and one or more LiDAR metrics (predictors), and (2) to spatially extend model predictions of the target variable across all areas of the land base where LiDAR data were captured (García et al., 2010, Hudak et al., 2006, Li et al., 2008, Næsset and Gobakken, 2008).
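The two-step, model-based procedure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the two predictors (mean return height and canopy cover), the TAGB units, and the helper names `fit_ols` and `predict` are hypothetical, and ordinary least squares stands in for whichever regression model a given study actually uses.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to a matrix of predictor values."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

rng = np.random.default_rng(0)

# Step 1: relate field-measured TAGB to plot-level LiDAR metrics.
# Hypothetical metrics: mean return height (m) and canopy cover (fraction).
n_plots = 60
mean_height = rng.uniform(5, 40, n_plots)
cover = rng.uniform(0.3, 1.0, n_plots)
X_plots = np.column_stack([mean_height, cover])

# Synthetic "field" TAGB (assumed units: Mg/ha) with additive noise.
tagb = 20 + 12 * mean_height + 150 * cover + rng.normal(0, 25, n_plots)

coef = fit_ols(X_plots, tagb)
rmse = np.sqrt(np.mean((tagb - predict(coef, X_plots)) ** 2))

# Step 2: extend predictions to every grid cell with LiDAR coverage.
# Here the "grid" is 1000 synthetic cells drawn from the same metric ranges.
X_grid = np.column_stack([rng.uniform(5, 40, 1000), rng.uniform(0.3, 1.0, 1000)])
tagb_map = predict(coef, X_grid)

print("coefficients:", coef)
print("in-sample RMSE:", rmse)
```

Because the grid cells are drawn from the same metric ranges as the plots, no prediction in this sketch is an extrapolation; with real data that condition has to be checked rather than assumed.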
Successful application of this model-based estimation procedure is dependent on several underlying assumptions. First, selected LiDAR metrics must be uniquely and well correlated with the biophysical property of interest (Lefsky et al., 2005, Li et al., 2008). Second, the spatial scale (plot size and shape) and location of the LiDAR samples must exactly match the scale and location of ground-reference samples (Patterson & Williams, 2003). Third, plot size must be sufficiently large to minimize potential edge effects (i.e., unplanned exclusion or inclusion of ground-based or remotely sensed forest measurements along the plot boundary), as well as to capture an adequate amount of on-ground (in situ) structural variability (Curtis and Marshall, 2005, Zenner, 2005). Last, the network of ground-reference plots must cover the full range of variability in both response and predictor variables to minimize the occurrence of extrapolation errors when prediction models are applied across the entire LiDAR dataset (Montgomery et al., 2006).
Field plots are typically located using global positioning systems (GPS), with positional accuracies determined by instrument-, user-, satellite-, atmosphere-, and environment-related factors (Johnson & Barton, 2004). Forest canopies in particular block or scatter the GPS signal (known as multipathing), which makes it difficult to locate field plots with planimetric (horizontal) accuracies better than 1–5 m (Bolstad et al., 2005, Næsset and Jonmeister, 2002, Wing and Karsky, 2006). LiDAR (x, y, z)-return coordinates (point clouds) are also subject to positional errors, but the magnitude of these errors is generally considered to be minor (< 0.5 m in the horizontal domain) relative to field plots and other types of remotely sensed data (Gatziolis & Andersen, 2008). Planimetric errors associated with field plots and/or LiDAR data result in an imperfect spatial overlap (i.e., co-registration error) between these two datasets, and this can potentially weaken the prediction accuracy of regression estimators (Gobakken & Næsset, 2009).
Plot size is an important design parameter in forest surveys, because it has the potential to either dampen or inflate the impact of edge effects and co-registration error. The edge effect (noise) associated with LiDAR metrics is largely unavoidable, and arises because trees located just outside the plot boundary (not included in the ground sample) may still have some portion of their crowns falling within the plot. Likewise, trees tallied within the plot may have part of their crowns lying outside the plot boundaries (not measured by LiDAR). Sample plots that have a large perimeter-to-area ratio will, in theory, produce LiDAR metrics that are less precise and less accurate, due to the inclusion of substantially more edge-induced measurement error. Because the perimeter-to-area ratio of a square or circular plot declines nonlinearly with increasing plot size, we expect that larger plots may substantially reduce the negative impact of the edge effect on the magnitude and stability of LiDAR metrics. Furthermore, larger plots maintain a higher degree of spatial overlap in the presence of GPS positional errors (Flewelling, 2009), exhibit less between-plot variance (Zeide, 1980), and are therefore less affected by the co-registration error that inevitably occurs between ground and LiDAR samples.
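Both geometric arguments in this paragraph can be checked with a short calculation. The sketch below assumes circular plots and a purely horizontal offset between the ground plot and the LiDAR sample; the function names, the radii (10 m and 25 m), and the 5 m offset are illustrative choices, not values taken from the study design. The overlap is the standard lens area of two equal circles divided by the plot area.

```python
import math

def perimeter_to_area(radius_m):
    """Perimeter-to-area ratio of a circular plot: 2*pi*r / (pi*r^2) = 2/r."""
    return 2.0 / radius_m

def overlap_fraction(radius_m, offset_m):
    """Fraction of a circular plot still shared with an identical plot
    whose centre is displaced horizontally by offset_m."""
    r, d = radius_m, abs(offset_m)
    if d >= 2 * r:
        return 0.0  # the two circles no longer intersect
    # Lens area of two equal circles with centre distance d.
    lens = 2 * r**2 * math.acos(d / (2 * r)) - 0.5 * d * math.sqrt(4 * r**2 - d**2)
    return lens / (math.pi * r**2)

# Illustrative comparison: small vs. large plot under the same 5 m offset.
for radius in (10.0, 25.0):
    area = math.pi * radius**2
    print(
        f"r = {radius:4.1f} m, area = {area:7.1f} m^2, "
        f"P/A = {perimeter_to_area(radius):.3f} 1/m, "
        f"overlap at 5 m offset = {overlap_fraction(radius, 5.0):.3f}"
    )
```

Under these assumptions the ratio falls as 2/r, so each added metre of radius buys less edge reduction than the last, and the same positional error removes a much smaller share of a large plot than of a small one.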
Despite the emerging popularity of airborne LiDAR in forest research and operational forest management, few published studies have examined the impact of sample plot size and co-registration error on the prediction accuracy of least-squares regression estimators. Gobakken and Næsset (2009), for example, found that larger field plots (plot size ranged from 200 to 400 m²) established with minimal GPS positional error (< 2–3 m) generally produced more accurate least-squares predictions of stand height, basal area, and volume. They also observed that LiDAR metrics were less sensitive to variations in plot size and co-registration error in short, dense, spatially homogeneous stands than in tall, sparse, heterogeneous stands. Mauro et al. (2009) similarly showed that GPS positional errors had a greater impact on LiDAR metrics when extracted from small plots (6 m radius) than from larger plots (10 m radius). Both studies caution that specific findings could be strongly scale-dependent and therefore relevant only for the forests studied.
In this paper, we investigate the impacts of plot size and co-registration error on the precision and accuracy of LiDAR height and density metrics, regression model coefficients, and the prediction accuracy of least-squares estimators. We focus on TAGB as the target variable of interest, because of its importance as a key indicator of ecosystem productivity, carbon storage, and biodiversity (Houghton et al., 2009). Also, TAGB is jointly determined by tree mass and size-frequency (Li et al., 2008, Næsset and Gobakken, 2008, Ni-Meister et al., 2010), and so allows a more comprehensive analysis of the way in which plot size and co-registration error interact and propagate in regression-based prediction models. We analyzed simulated 3-D forest canopies and synthetic LiDAR point clouds rather than real data, so that we could (1) control the exact amount of spatial overlap between ground-reference and LiDAR samples, and (2) investigate the impact of plot size and co-registration error across a much broader range of TAGB estimates and forest canopy structures. We introduce several possible options to improve regression estimators for LiDAR forest biomass surveys, based on notions of model parsimony, sampling efficiency, and the existence of an ‘optimal’ plot size. Here, we strictly define ‘optimal’ as the minimum plot size required to produce a regression estimator with ‘good’ or ‘acceptable’ statistical properties, rather than in the classical sense of a cost–benefit trade-off (i.e., cost of field data collection versus prediction accuracy).
Biomass of simulated forest canopies
We made use of a relatively large (n = 179), well-documented dataset of ‘virtual’ (computer-generated) coniferous (Douglas-fir) forest canopies previously developed and described by Frazer et al. (2005). The maximum geometric extent of each canopy volume was 50 m (width) × 50 m (length) × 60 m (height), with a voxel (volume element) resolution of 10 cm (0.001 m³) (Fig. 1). Tree crowns and boles were modelled as parabolas and cones, respectively (Van Pelt & North, 1996). Stand tables used to construct
PCA and cluster analyses of LiDAR metrics
PCA indicated that 99.8% of the total variance in the original 45 LiDAR metrics could be explained by the first 10 PCs. However, stopping rules based on randomization (i.e., Rnd-Lambda and Avg-Rnd) and broken-stick indices suggested that only the first three PCs were significant (Peres-Neto et al., 2005). These three PCs explained 90.2% of the total variance, with PC1, PC2, and PC3 accounting for 58.1, 21.8, and 10.3% of the total variance, respectively. PC1 was positively correlated with Lhq
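The broken-stick stopping rule mentioned above can be sketched as follows. This is a generic illustration of the rule, not the authors' code: the function names are hypothetical, and only the three variance proportions reported in the text (58.1, 21.8, and 10.3%) are available, so the check can confirm that each reported PC exceeds its broken-stick expectation but cannot show where the rule would stop.

```python
import numpy as np

def broken_stick(p):
    """Expected proportion of variance for each of p components under the
    broken-stick model: b_k = (1/p) * sum_{i=k}^{p} 1/i."""
    inv = 1.0 / np.arange(1, p + 1)
    # Reverse cumulative sum gives sum_{i=k}^{p} 1/i for k = 1..p.
    return np.cumsum(inv[::-1])[::-1] / p

def n_significant(explained, p):
    """Count leading components whose observed proportion of variance
    exceeds the broken-stick expectation; stop at the first failure."""
    expected = broken_stick(p)
    count = 0
    for observed, threshold in zip(explained, expected):
        if observed <= threshold:
            break
        count += 1
    return count

# Proportions reported in the text for PC1-PC3, with p = 45 LiDAR metrics.
reported = [0.581, 0.218, 0.103]
expected = broken_stick(45)

for k, (obs, exp) in enumerate(zip(reported, expected), start=1):
    print(f"PC{k}: observed {obs:.3f} vs broken-stick {exp:.3f}")
print("reported PCs exceeding expectation:", n_significant(reported, 45))
```

A full application would pass the complete vector of 45 explained-variance proportions, which the text does not report.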
Importance of plot size as a key sampling design parameter
Our findings demonstrate that plot size can be a critically important sampling design parameter in LiDAR forest surveys. We found that regression model fit and prediction accuracy improved markedly as plot size increased from 314 to 1964 m²; however, marginal improvements in model fit and prediction accuracy were diminishing and asymptotic as plot size approached 2500 m². Gobakken and Næsset (2009) similarly reported improvements in R² and RMSE for stand-level basal area and volume predictions as
Acknowledgements
We thank the Pacific Forestry Centre (PFC) and the Canadian Wood Fibre Centre (CWFC) of the Canadian Forest Service (through the Lodgepole Pine Partnership Project) for the funding support that made elements of this research possible. We gratefully acknowledge Dr. Brice Mora, CFS, PFC, for assistance in preparation of the R source code, and two anonymous reviewers and the RSE Editor-in-Chief for their thoughtful editorial comments and suggestions.
References (54)
- et al. Simulation and quantification of the fine-scale spatial pattern and heterogeneity of forest canopy structure: A lacunarity-based method designed for analysis of continuous canopy heights. Forest Ecology and Management (2005)
- et al. Estimating biomass carbon stocks for a Mediterranean forest in central Spain using LiDAR height and intensity data. Remote Sensing of Environment (2010)
- et al. Patterns of covariance between forest stand and canopy structure in the Pacific Northwest. Remote Sensing of Environment (2005)
- Predicting forest stand characteristics with airborne laser using a practical two-stage procedure and field data. Remote Sensing of Environment (2002)
- et al. Comparing regression methods in estimation of biophysical properties of forest stands from two different inventories using laser scanner data. Remote Sensing of Environment (2005)
- et al. Estimation of above- and below-ground biomass across regions of the boreal forest zone using airborne laser. Remote Sensing of Environment (2008)
- Estimation of tree heights and stand volume using an airborne lidar system. Remote Sensing of Environment (1996)
- et al. How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Computational Statistics & Data Analysis (2005)
- Investigating scale-dependent stand heterogeneity with structure-area-curves. Forest Ecology and Management (2005)
- et al. A comparison of autonomous, WAAS, real-time, and post-processed global positioning system (GPS) accuracies in northern forests. Northern Journal of Applied Forestry (2005)