Area-based vs tree-centric approaches to mapping forest carbon in Southeast Asian forests from airborne laser scanning data Remote Sensing of Environment

Tropical forestsare akey componentof theglobalcarbon cycle, and mapping their carbon density isessentialfor understanding human in ﬂ uences on climate and for ecosystem-service-based payments for forest protection. Discrete-returnairbornelaserscanning(ALS)isincreasinglyrecognisedasahigh-qualitytechnologyformapping tropical forest carbon, because it generates 3D point clouds of forest structure from which aboveground carbon density (ACD) can be estimated. Area-based models are state of the art when it comes to estimating ACD from ALS data, but discard tree-level information contained within the ALS point cloud. This paper compares area-basedandtree-centricmodelsforestimatingACDinlowlandold-growthforestsinSabah,Malaysia.Theseforests are challenging to map because of their immense height. We compare the performance of (a) an area-based model developed by Asner and Mascaro (2014), and used primarily in the neotropics hitherto, with (b) a tree- centric approach that uses a new algorithm ( itcSegment ) to locate trees within the ALS canopy height model, measures their heights and crown widths, and calculates biomass from these dimensions. We ﬁ nd that Asner and Mascaro's model needed regional calibration, re ﬂ ecting the distinctive structure of Southeast Asian forests. We also discover that forest basal area is closely related to canopy gap fraction measured by ALS, and use this ﬁ nding to re ﬁ ne Asner and Mascaro's model. Finally, we show that our tree-centric approach is less accurate at estimating ACD than the best-performing area-based model (RMSE 18% vs 13%). Tree-centric modelling is ap- pealingbecauseitisbasedonsummingthebiomassofindividualtrees,butuntilalgorithmscandetectundersto-ry trees reliably and estimate biomass from crown dimensions precisely, areas-based modelling will remain the method of choice. under the CC BY (http:// creativecommons.org/licenses/by/4.0/).


Introduction
Forests are an important component of the global carbon cycle, and their future management is key to international efforts to abate climate change. During the 1990s, about 89,000 km 2 of tropical forests were lost to agriculture each year, and a further 24,000 km 2 were degraded (Nabuurs et al., 2007). Estimates of deforestation rates vary, but somewhere in the region of 230 million hectares of forest were lost per year between 2000 and 2012 (Hansen et al., 2013). Furthermore, some 30% of tropical forests were degraded by logging and/or fire during that period (Asner et al., 2009). These changes resulted in significant releases of greenhouse gases (GHGs) to the atmosphere, constituting approximately 10% of global emissions (Baccini et al., 2012) and emphasising the significance of forests in the terrestrial carbon cycle (Pan et al., 2011). Forest conservation and restoration are increasingly recognised as critical for mitigating climate change (Agrawal et al., 2011). The climate change agreement brokered at COP21 in Paris, and signed by over 200 nations, may be significant in this respect. It is now recognised that, as well as harbouring biodiversity and supporting a billion livelihoods, tropical forests are essential for climate change abatement. Even if nations de-carbonise their energy supply chains within agreed schedules, we are unlikely to avoid 2°C global warming unless 500 million ha of degraded tropical forests are protected, and land unsuitable for agriculture is afforested (Houghton et al., 2015). Forest protection can offset emissions over the next 40 years, buying time for humanity to reduce its dependency on fossil fuels (Houghton et al., 2015).
Accurate monitoring of forest extent and carbon density is essential for these renewed efforts to protect forests, because this information is the basis of programmes to reduce emissions from deforestation and forest degradation and industrial zero-net-emissions programmes, and airborne laser scanning (ALS) is widely recognised as an essential component of these projects. Regional ALS maps of carbon density are all currently generated using "area-based" approaches (Naesset, 2002). These approaches, applied in over 70 publications (Zolkos et al., 2013), relate live-wood aboveground carbon density (ACD) estimates obtained from field plots to simple summary statistics, such as mean canopy height, derived from the ALS point cloud through statistical models (Fig. 1). These approaches for mapping structural attributes of complex multi-layered forests, as outlined by Drake et al. (2002Drake et al. ( , 2003, have since been applied to carbon mapping in several tropical regions (Englhart et al., 2011;Englhart et al., 2013;Asner et al., 2012Asner et al., , 2013Vincent et al., 2012;Jubanski et al., 2013;Laurin et al., 2014;Réjou-Méchain et al., 2015). However, a well-recognised problem is that many different ALS structural metrics can be used to construct the multiple regression equations, and so these models are idiosyncratic by virtue of their local fine-tuning and cannot be applied more widely than the region in which they were calibrated. An alternative approach, advocated by Asner and Mascaro (2014), uses a simple power-law function of mean top-of-canopy height (TCH) to predict carbon density within tropical landscapes. This generalised approach has obvious advantages when it comes to mapping carbon across the tropics. Yet this power-law model hinges on the assumption that (i) forest basal area and TCH are closely related and (ii) that average (i.e., 1-ha resolution) between-plot variation in basal area-weighted wood density is low at regional scales (Asner and Mascaro, 2014;Vincent et al., 2014;Duncanson et al., 2015). Currently, however, we do not have a clear understanding of situations in which these assumptions are supported.
In response to these potential issues with area-based approaches for estimating ACD from airborne laser scanning, there is current interest in developing individual-tree-based approaches to make greater use of the 3D information contained in ALS data ( Fig. 1; Eysn et al., 2015;Ferraz, Saatchi, Mallet, and Meyer, 2016;Vauhkonen et al., 2012). Advances in sensor technology and computational power have generated a proliferation of approaches for detecting individual tree crowns within discrete-return ALS point cloudsincluding those working with the rasterized upper surface of the ALS point cloud (e.g. Hyyppa et al., 2001;Chen et al., 2006;Yu et al., 2011), and those exploiting the entire point cloud (Morsdorf et al., 2004;Reitberger et al., 2009;Duncanson et al., 2014;Ferraz et al., 2016). There are several advantages of individualtree-based mapping compared to area-based approaches: (i) it has a strong fundamental basis because it is conceptually similar to allometric approaches used in field-based inventories; (ii) uncertainty in the estimation model is much less dependent on plot size, allowing calibration using individual trees and small plots (Dalponte and Coomes, 2016); (iii) narrow patches of forest with high conservation value, such as riparian strips, can be mapped; (iv) growth and death of individual trees can, in principle, be tracked and this information used to Fig. 1. Schematic diagram illustrating the key differences between area-based and tree-centric approaches used to estimate aboveground carbon density (ACD) from ALS data. Area-based approaches rely on summary statistics calculated from the ALS point cloud (e.g., top canopy height in a) to develop statistical relationships for estimating ACD (b). In contrast, tree-centric mapping aims to identify and measure the crown dimensions of individual trees within the ALS point cloud (c), and then use these to estimate their ACD (d). Data shown in panels (b) and (d) were taken from Asner and Mascaro (2014)  parameterise individual based models of forest dynamics (Shugart et al., 2015). The main disadvantages are that tree-centric approaches are computationally intensive, most delineation methods can only distinguish trees in the upper canopy, and over-or under-segmentation of trees can lead to biases in biomass estimation. Nevertheless, if these issues can be resolved, individual based modelling could mark a fundamental shift in the way forests are monitored remotely (Shugart et al., 2015). To our knowledge, only one group has used tree-centric approaches and airborne ALS data to map carbon in dense tropical forest, where obscured trees and over/under segmentation may result in substantial bias (Ferraz et al., 2016).
This paper reports on a generalised tree-centric approach for mapping the carbon density of tropical forestsi.e. one where a single model can be applied reliably across contrasting forest types. The work focusses on old-growth forests in the Sepilok Forest Reserve in lowland Sabah, Malaysian Borneo. Sabah has lost much of its lowland forests in recent decades (Gaveau et al., 2014), but has become an important testbed for global initiatives to protect and restore forests. We recently showed that tree-centric approaches to carbon mapping perform well in Alpine coniferous forests (r 2 = 0.98 when fieldand ALS estimate of carbon stocks are compared) and that a correction factor can be applied to account for the small obscured trees that were invisible from the air (Dalponte and Coomes, 2016). However, we recognise that applying this approach to Sabah's forests is likely to be challenging, as it contains some of the tallest and densest forests in the tropics. We therefore compare the tree-centric approach with Asner and Mascaro's (2014) generalised area-based method, and critically evaluate whether the advantages of working with individual trees outweigh any disadvantages associated with the accuracy of tree detection. Sepilok Forest Reserve, within which the study focusses, contains three distinctive forest types within close proximity, providing an outstanding opportunity to test the universality of carbon mapping approaches.

Study site
Sepilok Forest Reserve (5°10′ N 117°56′ E) is a remnant of lowland tropical rainforest on the east coast of Sabah, Malaysia, close to the town of Sandakan (Fox, 1973). This 4294-ha protected area, which ranges in elevation from 0 to 250 m a.s.l., was founded by the Sabah Forest Department's research centre in 1931. Extensive areas have never been commercially exploited, although about 670 ha in the northeast and south of the reserve were selectively logged up until 1957, and several hundred hectares in the north had lianas and non-commercial trees thinned out in 1958 (Dent et al., 2006;DeWalt et al., 2006). Four distinct forest types are recognised within the reserve: Sandstone hill dipterocarp forest (hereafter "sandstone hill") grows on dissected hillsides and crests; alluvial dipterocarp forest is found in the valleys (hereafter "alluvial"); heath forest is located on podzols associated with the dip slopes of cuesta (shallow-sloping slopes following bedding planes; hereafter "kerangas"); and mangroves flank the bay. The first three of these species-rich forests have been the focus of much previous research (Dent et al., 2006;DeWalt et al., 2006;Fox, 1973;Nilus et al., 2011).

Field data
There are nine permanent forest plots of 4 ha within the Sepilok reserve, three for each forest type (Alluvial, Sandstone hills and Kerangas). Three of these plots (one alluvial and two sandstone) were established in 1968 (Fox, 1973). An additional six plots were established in 2000-2001 (Nilus, 2004). All plots have been regularly re-censused since, with data standardised, quality controlled and curated within the ForestPlots.net web application and database (Lopez-Gonzalez et al., 2011). The latest census of these plots carried out between 2013 and 2015 (the "2014 resurvey") was used for this analysis. For the purposes of this analysis, each 4 ha plot was subdivided into 1 ha subplots which were analysed separatelygiving a total 36 1-ha plots. We recognise that spatial autocorrelation could impact the 1-ha plot data; however, we emphasize that the plots are not being used for landscape sampling, rather they are intended for calibration and validation purposes only. In each of the 1-ha plots, all stems ≥ 5 cm in diameter were permanently tagged, their stem diameter (D, in cm) was measured, they were identified to species (or closest taxonomic unit), and their position was marked to the nearest 10 × 10 m subplot (45,214 trees in total). The standard point of measurement (POM) for diameter was at 1.3 m from the base of the tree; for trees with buttress or deformity at 1.3 m, POM was above the non-cylindrical feature. The corners of the plots were geolocated using a Geneq Sx Blue II GPS, which differentially corrects using Space Based Augmentation and has a positional accuracy of b 2 m.

ALS acquisition and processing
Airborne ALS data were acquired on 5 November 2014 using a Leica ALS50-II ALS flown at 1850 m altitude on a Dornier 228-201 travelling at 135 knots. The ALS sensor emitted pulses at 83.1 Hz with a field of view of 12.0°, and a footprint of about 40 cm diameter. The average pulse density was 7.3/m 2 . The total area surveyed was of 26 km 2 . The Leica ALS50-II sensor records full waveform ALS, but for the purposes of this study the data were discretised by the sensor system, with up to 4 returns recorded per pulse. Accurate georeferencing of ALS point cloud was ensured by incorporating data from a Leica base station running in the study area concurrently to the flight. The ALS data were pre-processed by NERC's Data Analysis Node and delivered in standard LAS format. All further processing was undertaken using LAStools (http:// rapidlasso.com/lastools/). Points were classified as ground and nonground, and a digital elevation model (DEM) was fitted to the ground returns using a TIN (LAStools module las2dem), producing a raster of 1 m resolution. The DEM elevations were subtracted from elevations of all non-ground returns to produce a normalised point cloud, and a canopy height model (CHM) was constructed from this on a 0.5 m raster by averaging the first returns. Finally, holes in the raster were filled by averaging neighbouring cells. Given the very high point density, this last procedure was applied to very few pixels (0.02% of the total).

Height-diameter allometry and crown area measurements
We manually delineated the crowns of 147 trees in images of the CHM, and measured their maximum heights (H, m) using the CHM. By walking around in the plots with print outs of the delineated CHM, it was possible to geo-locate 91 of these trees with confidence, giving us 91 trees for which we had H estimated from ALS and D measured on the ground. This dataset spanned the full range of tree sizes present at Sepilok (D: 12-165 cm; H: 16-72 m). Power-law and 3-parameter Weibull functions were fitted to the H-D relationships using maximum likelihood estimation, assuming residuals to be normally distributed and increasing with tree size and the model with the strongest statistical support was found using the Akaike Information Criterion (AIC) approach. AIC is founded on information theory and offsets the complexity costs of including additional parameters against likelihood improvements to select models (see Supporting information for code). Variation in allometry among soil types was also tested using AIC. The best-supported model (i.e., the one with the lowest AIC) was then used to estimate the heights of all trees recorded in the plots. We compared our H-D allometry with another dataset of 644 tree heights collected from a random subset of trees in the plot as part of the 2014 resurvey, using a Vertex III hypsometer (Haglöf, Sweden). For the 91 trees that we had hand delineated, crown area (CA, in m 2 ) was calculated in the imagery, and compared with CAs estimated on the ground from two orthogonal crown diameter measurements.

Estimating tree-and plot-level aboveground carbon density
Field data from the 36 1-ha plots were used to estimate the aboveground biomass (AGB, in kg) of each individual tree using Chave et al.'s (2014) pantropical equation: where WD is wood density (dry mass / wet volume, in g cm −3 ), which we obtained from the global wood density database (Chave et al., 2009;Zanne et al., 2009), D is stem diameter (cm) and H is the estimated height (m) obtained from ground-based allometry, because heights were recorded for only a fraction of trees in the field survey. Trees were assigned species-specific WD values (51% of stems) or closest taxonomic unit (92% of stems matched to genus). For each 1 ha plot, aboveground carbon density (ACD, in Mg C ha −1 ) was then estimated by summing the AGB of all trees within the plot and applying a carbon content conversion factor of 0.47 (Martin and Thomas, 2011).
The dataset of 91 manually-delineated trees was used to obtain a relationship between AGB Field values and H ALS and CA ALS measurements. Regression of manually vs automatically delineated CA produced a near 1:1 relationship, with R 2 = 0.85 and RMSE = 43.3, so we regarded these as equivalent. Jucker et al. (2016) showed that the biomass of 2395 harvested trees in a global dataset was related to height and ) according to a power-law function: Jucker et al. (2016) showed that this function delivered unbiased estimates of AGB across six orders of magnitude variation in tree mass, while α and β varied among plant clades, so is most accurate when fitted to local data. This function was fitted to data from the 91 segmented trees by least-squares regression of log-log transformed data. Goodness of fit was evaluated in comparison to a function based on height alone and on H × CA, the model adopted by Ferraz et al. (2016), using AIC.
2.6. Area-based approach 2.6.1. Generalised ACD equation Asner and Mascaro (2014) proposed a generalised approach for estimating ACD using a single ALS metrictop canopy height (TCH, in m)and minimal field data inputs. The method relates ACD to TCH, stand basal area (BA; in m 2 ha − 1 ) and the community-weighted mean wood density (WD; in g cm −3 ) as follows: This function was fitted to data from the 36 Sepilok plots. TCH was the mean height of CHM pixels within each plot extracted using the raster package of R. As recommended by Asner and Mascaro (2014), leastsquares regression was used to relate field-measured BA to TCH, and field-estimated WD to TCH. These two regression relationships were then included as sub-models in Eq. (3), such that ACD was predicted solely as a function of ALS derived TCH.

Local ACD equation
To reduce uncertainty in ACD estimates, Asner and Mascaro (2014) suggest that, when possible, regionally-calibrated equations for estimating ACD from TCH, BA and WD should be developed. The data presented in this manuscript comes from a single reserve, albeit with diverse forest types, so we refer to our models as "local" rather than "regional", which is more appropriate if plots are widely dispersed in the region of interest. Least-squares regression was used to estimate the parameters of the following power-law model relating ACD to TCH, BA and WD values in the Sepilok dataset: As for the generalised ACD model, BA and WD were first estimated as a function of TCH, and these functions used as sub-models in Eq. (4).

Gap fraction as an estimator of basal area
Central to robustly estimating ACD using ALS data is identifying a metric which captures variation in BA among stands, which is considerable in Sepilok (Fig. S1). Asner and Mascaro's (2014) power-law models make the assumption that TCH can be used as a strong predictor of BA. However, this may not universally be the case (Duncanson et al., 2015;Spriggs, 2015). We tested whether gap fraction (GF)the proportion of area not occupied by crowns at a given height abovegroundcould be used as an alternative ALS metric for predicting BA (Fig. 2b). Gap fraction GF h is calculated by creating a plane horizontal to the ground in the canopy height model (CHM), at height h above ground, and counting the number of pixels for which the CHM lies beneath the plane, divided by the total number of pixels in the plot. GF was calculated for heights of 1 to 50 m.
Goodness of fit of global, local and gap-fraction-modified models are compared by reporting %RMSE, calculated as ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ∑ðy−ŷÞ 2 q =ðNyÞ, where y are the data, y is the mean,ŷ are the model estimates, N is the number of observations. %Bias is reported as ∑ðy−ŷÞ=ðNyÞ.

Tree-centric approach
As an alternative to area-based approaches, we tested whether ACD could be accurately estimated by summing the biomass of individual tree crowns (ITCs; Fig. 1). The itcSegment algorithm was used to delineate trees within each of the 36 plots and a 10 m buffer zone around them. itcSegment is implemented in R and made freely available on CRAN (https://cran.r-project.org/web/packages/itcSegment/index. html). It works by finding local maxima in the raster CHM, regarding these maxima as tree tops, then growing crowns around them by local searching of the raster CHM and point cloud. This approach was initially developed for coniferous forests (Dalponte et al., 2014;Eysn et al., 2015;Dalponte and Coomes, 2016) following the concept of Hyyppa et al. (2001). It has three stages (see Supporting information for further details): 1.
Smoothing the Canopy Height Model. A Gaussian low-pass filter is applied to the raster CHM to smooth it, improving the reliability of the region-growing code by removing sharp changes in height within crowns that might otherwise be inferred to be crown edges.
2. Locating local maxima in the CHM. To function effectively in tropical forests, we found it necessary to introduce variable window sizes when searching for local maxima around the central pixel of the window. If a particular pixel in the CHM is high above the ground, then it is assumed to belong to a large tree, so a relatively large search window is needed to find the top of that tree. The dimension of the search window for a CHM pixel of elevation z is equivalent to crown radius of a wider-than-average tree of that height, ensuring that even the widest crowns in the images were not over-segmented. This crown radius was obtained by fitting an allometric function to the relationship between crown radius and height of 147 manually segmented trees, using quantile regression (quantreg package of R), with tau = 0.9 such that 90% of data-points sat below the regression line (Fig. S4). A look-up table was created so that a window size could be obtained quickly for a given value of z based on this allometry.
3. Region growing around local maxima: The algorithm then searches iteratively around each local maximum for neighbouring pixels that are slightly lower in elevation but not greatly lower (i.e. off the crown's edge) or higher (i.e. belonging to a neighbouring tree), and regards these as being part of the same crown. The algorithm has various stopping rules to constrain the growth of each crown: crown width cannot exceed those observed in manually delineated crowns and crown depth cannot exceed a fixed percentage of height. Once regions are fully grown, the algorithm explores the ALS point cloud and uses the data to create a 2-D convex-hull polygon representing the crown.
Once trees are delineated in the imagery, the next step is to estimate their individual biomass. This was achieved by predicting biomass using Eq. (2), which was parameterised using the crown diameters and heights of the 91 trees. We then summed the AGB of segmented trees, and multiplied by a carbon density of 0.47, to obtain ACD estimates for each of the 1 ha plots. The final step is to calculate a correction factor to account for understorey trees that were not segmented because they are obscured by the upper canopy, and large trees that were over-segmented. This correction factor was obtained by fitting a regression line, passing through the origin, to the relationship between ACD ALS and ACD Field (Dalponte and Coomes, 2016).
The accuracy of the ITC delineation was assessed by comparing the numbers of delineated trees within stem diameter classes with the numbers observed in the field plots. To do this, the heights of the delineated trees were used to estimate stem diameters, using an allometric function derived from the 91 trees co-located on the ground and in the aerial imagery.

Comparison of forest types
Alluvial, sandstone hill and kerangas forest types are clearly visible in the CHM and DEM of Sepilok Reserve (Fig. 2). In the three panels showing magnified-CHMs of the forest types, the alluvial forest is seen to contain the tallest trees (some reaching 70 m in height), while also exhibiting large gaps between emergent trees (Fig. 2d). The sandstone hill forest appears less tall (50-60 m) and is more densely packed (Fig. 2c); while the kerangas is much shorter with densely packed trees (Fig. 2e). Summary statistics from the plots corroborate these descriptions: the sandstone forests have the greatest above-ground carbon density, not because they are the tallest but because they have high basal areas (Fig. 2g-i), whereas the kerangas has low biomass by virtue of its height, despite having high wood densities (Fig. S1). The sandstone hill and kerangas forests have unimodal distribution of ALS returns with height, which is typical of the signal returned from forest canopies (Fig. 3). However, the alluvial has a distinctive bimodal distribution, resulting from a dense subcanopy sitting beneath the scattered emergent trees (Fig. 3). This strong variation in forest structure makes it challenging to map carbon density using a single model, as is our aim.

Allometric relationships
ALS-estimated heights (H ALS ) were related to field-measured diameters (D) as follows: H LiDAR = 5.29 × D 0.50 (red circles and curve; Fig.  3a). This power-law function was better-supported statistically than a 3-parameter Weibull (see Supporting information). The ALS-estimated heights are mostly within the upper quartile of data points obtained from field-measurements, and the power-law regression line through the field data is systematically different to the ALS dataset (grey circles and curves; Fig. 3a). The allometric function obtained from the field data is more similar to previous reports than our ALS-based function estimates (See Fig. 3a; Morel et al., 2011, Feldpausch et al., 2011and Chave et al., 2014, so it seems likely that our approach for segmenting ALS imagery is preferentially sampling emergent trees that have systematically different HD allometries. For this reason, we used field-based allometries to estimate heights of trees, and generate forest-type-specific allometries (see Supporting information). If we had used the ALS-based allometry, ACD would have been 1.19× greater than the ones presented; this underscores the need to collect terrestrial LiDAR, ALS and field datasets across multiple sites in the tropics, to get a better handle on tree allometry.
As expected, a linear relationship was found between field and remotely measured crown areas, although field estimates are 30% smaller on average, with large uncertainty (Fig. 3b), probably because estimates based on field data assume that crowns are symmetrical ellipses, based on just two diameter measurements, whereas the ALS approach allows the margins to be determined more accurately (e.g. Fig. S4).  (2014) The generalised model (Eq. (3)) gave accurate estimates of ACD when sub-models of the relationships between BA, WD and TCH were fitted to the Sepilok datasets. We obtained the following relationship between basal area and top canopy height (Fig. 4a): Note the relationship is forced through the origin for reasons explained by Asner and Mascaro (2014). The basal area-weighted mean WD is estimated from TCH as follows (Fig. 4c): By entering Eqs. (5) and (6) into Asner and Mascaro's (2014) generalised model (Eq. (3)), we get ACD generalised = 7.37 × TCH 0.870 . This function produces biased estimates of biomass when applied to the Sepilok Forest Reserve, underestimating ACD field by 19% on average (Fig. 5a).

Local ACD equation
A local model, obtained by fitting Asner and Mascaro's (2014) power-law model to the data available from Sepilok yielded the following relationship: where TCH was obtained from the ALS survey, and BA and WD from the field data collected from the 36 plots. Entering the BA and WD relationships into Eq. (7), gives the following relationship: ACDLocal = 1.030 × TCH 1.535 . As shown in Fig. 5b, the local model is unbiased, but its accuracy remains low (%RMSE =27%). It is clear that the sandstone hill and kerangas plots fall along a line, but that the alluvial plots deviate from the line, and that there is no correlation between fieldand ALSpredicted ACD among these plots.

Gap fraction as an estimator of basal area
The primary reason why Eq. (7) is inaccurate in its prediction is that BA is so poorly correlated with TCH (Fig. 5a). However, we find that gap fraction at 19 m above ground (GF 19 ) is closely related to BA (Fig. 5b)   estimates (Fig. 5c). The RMSE was 29 Mg C ha −1 , corresponding to a percentage RMSE of 13%.

Individual tree crown (ITC) approach
AGB (kg) is related to the height and crown diameter of 91 delineated stems as follows: where CD is crown diameter. The allometric model has R 2 = 0.74 and %RMSE = 11.8% and had lower AIC than alternative models (AIC = 190,197,209 for H × CD, H × CA and H, respectively). Introducing variable window sizes when searching for local maxima allowed detection of both large and small trees within the rainforest canopies (Fig. 6ac). Without this flexibility, the algorithm either omits small trees when the window was large, or over-segments large ones when the window was small. Nevertheless, because itcSegment only searches for maxima in the canopy height model, it cannot find subcanopy trees, detecting only 9.5% of stems in the 10-30 cm size class. Trees in the 10-30 cm size class contribute 23% of ACD, so represent a significant pool of carbon (Fig. 6e). Furthermore, carbon stored in very large trees was overestimated. Only 63 trees exceeded 110 cm diameter within the 36 plots, representing a tiny fraction of stems (0.2%) but a more significant percentage of carbon (10%). The ITC analysis recognised 65 trees over 100 cm diameter, but overestimated their carbon by 124%. These problems were mostly restricted to the four plots at a single alluvial site, within which the very tallest trees are located.
In our previous work with conifers in the Italian Alps, we found that ITC-based estimates of ACD were consistently less than field estimates, because obscured trees were by far the most important source of bias in the tree-centric approach (Dalponte and Coomes, 2016). This allowed us to use a single correction factor to all plots. However, here we find that ACD ALS is greater than the field estimates in five of the 36 plots (Fig. 7a). The bias is removed when ACD ALS values are multiplied by a single correction (1.21) but the RMSE remains high (20%). Even when   7. Relationship between aboveground carbon density (ACD) estimated from field data and ACD estimated using the tree-centric approach; (a) without correcting for undetected subcanopy trees; (b) applying a universal correction factor; (c) applying forest-specific correction factors to the ACD estimates obtained from the tree-centric approach to account for missing stems. The one-to-one line is given for reference in all panels (dashed black lines); the solid red line in (a) is a regression line fitted to all plots from which the universal correction factor (1.21) is obtained. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) separate correction factors are applied to the three forest types (alluvial = 1.07, sandstone hill = 1.30, kerangas = 1.32), the %RMSE of tree-centric approach remains greater than that obtained by area-based modelling (RMSE = 18 vs 13%). Note once again that the alluvial forest is problematic to model, because over-segmentation in one site causes the correction factor to differ greatly from the other forest types.

Area-based approaches
The power-law modelling approach developed by Asner and Mascaro (2014) was effective for mapping forest carbon in the oldgrowth lowland rainforests of Sabah once adjustments had been made for use in this region. Their generalised model parameterised mostly from Neotropical datasets (Eq. (3)) gave biased results when applied to the Sepilok data, even when local BA-TCH and WD-TCH sub-models were used, so locally parameterised power-law functions were developed. Furthermore, basal area was better modelled as a function of gap fraction rather than top-of-canopy height.

Distinctiveness of SE Asian forests
Asner and Mascaro's (2014) paper shows that tropical forests from 14 regions differ greatly in structure, exemplified by the TCH-BA scatterplot in Fig. 8. Remarkably, they found that a generalised powerlaw relationship could be fitted that transcended all these contrasting forests types, once regional differences in structure were incorporated as sub-models relating BA and WD to TCH. However, that generalised model delivered biased results when applied to Sepilok forest datasets, underestimating ACD by 19% when the models were fitted with local data (Fig. 5a). The explanation for this bias is the dominance of lowland forests by trees in the Dipterocarpaceae, which make them distinctive from all other tropical forests (Corlett, 2009). The height-diameter allometry of these dipterocarp dominated forests are different from those found elsewhere -they have particularly narrow stems for their height (Feldpausch et al., 2011;Banin et al., 2012). The basal area of the plots in Sabah is not especially high compared with those in the Asner and Mascaro dataset (Fig. 8a), but ACD is in the upper quantiles of other forest of comparable basal area (Fig. 8a): the extraordinary height of the alluvial and sandstone hill forests gives rise to high ACDs, while high wood density in the stunted kerangas trees are the underlying cause of their high ACD. The ACD values in Sepilok Reserve (sandstone hill, alluvial and kerangas contain on average 226,190,150 Mg C/ha,respectively) are not dissimilar to those reported from other unlogged plots in lowland Sabah (176 Mg C/ha; Morel et al., 2011) or to regional estimates derived remotely (160 Mg C/ha Avitabile et al., 2016), but they are among the greatest ACD recorded anywhere on the planet (Avitabile et al., 2016;Sullivan et al., 2017). The kerangas forests at Sepilok contain many stems of Tristaniopsis merguensis and Cotylelobium melanoxylon, which have dense wood (0.94 and 0.83 g cm −3 , respectively) reflecting adaptation to the nutrient-poor soils. The distinctive structure of the Sepilok forest compared with anything used to calibrate the power-law model previously meant that it needed be custom-fitted with data only from that forest. Rainforests of SE Asia are structurally diverse (Corlett, 2009), and it remains to be seen whether a generalised model can be produced for this region, as found previously for 14 forest types in the Asner and Mascaro dataset.

Predicting basal area by gap fraction
ACD is more closely related to basal area than to height (Fig. 8), so it is essential that the basal-area sub-model predicts basal area accurately from ALS statistics. Here we found that gap fraction at 19 m above ground (GF 19 ) was a better predictor of basal area than TCH. This makes intuitive sense if one considers an open forest comprised of just a few trees -the crown area of each tree scales with its basal area so the gap fraction at ground level of a plot is negatively related to the basal area of its trees. A similar principle applies in denser forests, but now that canopies overlap, the best-fitting relationship between gap fraction and basal area is no longer at ground level, but is instead further up the canopy. Our study is the first to formally introduce gap fraction into the modelling framework of Asner and Mascaro (2014), but several other studies have concluded that gap fraction is an important variable to include in multiple regression models of forest biomass (Colgan et al., 2012a;Li et al., 2014;Spriggs, 2015;Singh et al., 2016). Additionally, (Colgan et al., 2012b) used fractional canopy cover (CC = 1 − GF) to modify the relationship between aboveground carbon density and TCH for African woodland savannas. In a follow-on paper to this one, we develop a state-wide carbon model for Sabah, including secondary as well as old-growth forests. Modelling basal area as a function of canopy cover at 19-m height is once again shown to improve goodness of fit, although the functional form is different (Coomes et al. archived). In the specific case of Sepilok, GF 19 proved particularly effective for predicting basal area, as it was able to clearly distinguish between alluvial forestswhere large emergent crown tower over a dense understory layerand the more compact vertical distribution of crowns in sandstone hill and kerangas forests (Fig. S2). Incorporating GF 19 into the basal area sub-model yielded the best predictions of ACD (Fig. 5c), with an RMSE of 29.0 Mg C ha − 1 . This is slightly greater than the RMSE 24.7 Mg C ha −1 reported by Asner and Mascaro (2014) for their generalised model fitted to 14 forest types, but their study included secondary forest, which have lower variance in ACD (Fig. 8), so a lower RMSE was anticipated.

Tree-centric approach
ITC delineation has the potential to make much fuller use of the information within 3D ALS point clouds compared to area-based approaches, yet in the present study, the end result was less accurate ACD estimates than those achieved by area-based approaches (RMSE = 18% vs 13%). Our previous work in the Italian Alps showed tree-centric approaches to give marginally more accurate results (R 2 = 0.98 vs 0.96), but that paper focussed on structurally simple conifer-dominated forests. Another recent study working in dense tropical forests also found the tree-centric approach to underperform compared to areabased methods. Ferraz et al. (2016) developed an algorithm capable of detecting both sub-canopy as well as canopy crowns, yet it too delivered less accurate results than an area-based model using TCH as in input variable when applied to the 50-hectare plot on Barro Colorado island in Panama (RMSE = 13.8% vs 12.5% at the 1-hectare scale).
Three key factors contributing to uncertainty in ACD estimates are over-segmentation of large trees, incomplete detection of sub-canopy trees, and uncertainty in AGB estimates from ITC information. These are discussed in turn: Over-segmentation of emergent dipterocarps, such as the one shown in Fig. S4, was observed. This is concerning given that 90% of biomass in alluvial plots is held in just 12% of large trees, compared to 36% in the kerangas plots and 23% in the sandstone hill plots. Yet over-segmentation is not actually such a problem, because of the H × CD term in the tree biomass function (i.e., AGB = 0.136 × (H × CD 1.52 ). To illustrate this point, imagine we split a tree in two, vertically. If we only modelled AGB as a function of H, we would severely overestimate ACD. However, by including crown diameter (CD) in the equation this error is reduced, because the two resulting trees would each have half the CD of the original. This functional form has been shown previously to provide unbiased estimates of biomass when applied to a global dataset of AGB, H and CD values (Jucker et al., 2016). In contrast, Ferraz et al. (2016) found that the sum of WD × ∑CA × H was the function that gave the closest relationship between fieldand ALS-estimates of ACD. Including crown dimensions in regression modelling may be useful when mapping regions with diverse forest types, because Blanchard et al. (2016) report that CA-D allometry is less variable among forest types than H-D relationships.
Incomplete detection of understory trees accounted for 46 Mg C ha −1 of missed carbon in the Sepilok forest, a significant fraction of total ACD. Detecting more understory trees will help reduce bias in carbon estimation. Recent years have seen substantial progress in segmentation, as both ALS instruments and the algorithms used to delineate trees from ALS data have improved considerably (Popescu et al., 2003;Yao et al., 2012;Duncanson et al., 2014;Paris, Valduga and Bruzzone 2016;Shendryk et al., 2016). Several recent papers point the way forward. Duncanson et al. (2014) developed an approach that first finds trees in the upper canopy, then strips points from the 3D point cloud that are associated with these trees, before searching the remaining point cloud for further trees. Ferraz et al. (2016) developed a "bandwidth model", which estimates the likely crown dimensions of trees of a certain height, based on information gained by manual delineation of crowns from the imagery, and uses this information to search the entire point cloud for trees. Paris et al. (2016) developed a segmentation method that was able to correctly delineate the crowns of 97% and 77% of canopy dominant and understorey trees, respectively, as well as accurately measuring the crown dimensions of all segmented trees. Equally promising is Shendryk et al.'s (2016) algorithm which segments trees from the bottom up, mimicking the approach used to process terrestrial laser scanning data (Calders et al., 2015). Nevertheless, our less-sophisticated approach of segmenting only the trees in the upper canopy, and applying a correction factor to account for missed carbon, may deliver similar precision provided the size distribution of forests in known, such as a power-law or Weibull distribution (Spriggs, 2015). Comparisons of the effectiveness of different algorithms when applied to the same tropical forest plots will help the refinement of approaches.
Estimates of biomass from ITC information have high variance, because the allometric relationship between tree-AGB and (H × CD) has high residual variance (see also Jucker et al., 2016): our model predicts that a 50 m tall tree contains 2300 kg C with 95% confidence intervals of 1870-2920 kg C, which is very wide. One reason for this uncertainty is that ALS measures tree volumes and is generally poor at estimating wood density: we found a strong relationship between TCH and WD at plot level, but the relationship between individual H and WD values was very weak.

Conclusions: is it worth trying to see both the forest and the trees?
Individual-based modelling performed less well than area-based approaches, as also found by Ferraz et al. (2016) working in Panama. Given that ITC approaches are computationally demanding, is there any justification in using them? We argue that researchers should certainly persist in their development for several reasons. First, maps of trees are of practical value. For example, farm owners in several tropical countries are legally obliged to protect strips of riparian forests on their land, but enforcing such laws has proven difficult for lack of fine-scale maps of riparian zone extents and the fate of trees within them: individual based mapping is an excellent tool for this. Similarly, the high carbon stock (HCS) approach developed in partnership with the Roundtable on Sustainable Palm Oil commits oil palm companies to protect and restore forests on their estates to make them carbon neutral (http:// highcarbonstock.org/), and fine-scale mapping of carbon in these forest fragments could be valuable in this context. Secondly, there is great interest in tracking the dynamics of the largest trees in tropical forest, because they account for a large fraction of forest carbon and are known to be especially vulnerable to drought, acting as early-warning signals of the effects of climate change (Nepstad et al., 2007;Bennett et al., 2015). Repeat surveying with ALS could prove valuable for this, but only area-based approached have been used to date (e.g., Simonson et al., 2016).
Another advantage of the ITC approach is that it is directly analogous to well-established field-based approaches for measuring carbon, and so has a strong theoretical basis in terms of identifying and minimising sources of uncertainty and bias, and propagating error. For example, differences in height estimates between ALS and ground measurements (Fig. 3) shine a light on a major source of bias in ACD estimates (Hunter et al., 2013). With area-based approaches, prediction error decreases with increasing plot sizes due to reduced edge effects (Goetz and Dubayah, 2011). The same is true of ITC approaches but the errors are less dependent on plot size (Dalponte and Coomes, 2016). Intriguingly, Ferraz et al. (2016) showed the coefficients of models developed by a ITC approach were invariant of scale, whereas the coefficients of power-law models varied with scale. ITC approaches could allow a more explicit characterisation of such issues. For the moment, areabased approaches are likely to remain the method of choice for mapping tropical forest carbon, but ITC approaches are advancing rapidly and could offer a viable alternative as algorithms, sensors and analysis platforms mature.