A comprehensive framework for assessing the accuracy and uncertainty of global above-ground biomass maps

Over the past decade, several global maps of above-ground biomass (AGB) have been produced, but they exhibit significant differences that reduce their value for climate and carbon cycle modelling, and also for national estimates of forest carbon stocks and their changes. The number of such maps is anticipated to increase because of new satellite missions dedicated to measuring AGB. Objective and consistent methods to estimate the accuracy and uncertainty of AGB maps are therefore urgently needed. This paper develops and demonstrates a framework aimed at achieving this. The framework provides a means to compare AGB maps with AGB estimates from a global collection of National Forest Inventories and research plots that accounts for the uncertainty of plot AGB errors. This uncertainty depends strongly on plot size, and is dominated by the combined errors from tree measurements and allometric models (inter-quartile range of their standard deviation (SD) = 30 – 151 Mg ha (cid:0) 1 ). Estimates of sampling errors are also important, especially in the most common case where plots are smaller than map pixels (SD = 16 – 44 Mg ha (cid:0) 1 ). Plot uncertainty estimates are used to calculate the minimum-variance linear unbiased estimates of the mean forest AGB when averaged to 0.1 ∘ . These are used to assess four AGB maps: Baccini (2000), GEOCARBON (2008), GlobBiomass (2010) and CCI Biomass (2017). Map bias, estimated using the differences between the plot and 0.1 ∘ map averages, is modelled using random forest regression driven by variables shown to affect the map estimates. The bias model is particularly sensitive to the map estimate of AGB and tree cover, and exhibits strong regional biases. Variograms indicate that AGB map errors have map-specific spatial correlation up to a range of 50 – 104 km, which increases the variance of spatially aggregated AGB map estimates compared to when pixel errors are independent. After bias adjustment, total pantropical AGB and its associated SD are derived for the four map epochs. This total becomes closer to the value estimated by the Forest Resources Assessment after every epoch and shows a similar decrease. The framework is applicable to both local and global-scale analysis, and is available at https://github.com/arnanaraza/PlotToMap. Our study therefore constitutes a major step towards improved AGB map validation and improvement.


Introduction
Above-ground biomass (AGB) is the total mass of material stored in the living stems, branches and leaves of vegetation, and is often described as a biomass density, with units of mass per unit area. AGB is recognised by the Global Climate Observing System (GCOS) as an Essential Climate Variable (ECV), primarily because it is intimately related to both emissions of CO 2 to the atmosphere arising from land use change and fire, and uptake of CO 2 from the atmosphere due to vegetation growth (GCOS, 2016). However, it has much wider significance because of its value to human societies for energy, materials and other ecosystem services, and is also important in forest management and for policy initiatives such as Reducing Emissions from Deforestation and forest Degradation (REDD+). As a result, there have been major efforts to map forest AGB using Earth Observation (EO) data ; at least 15 AGB maps for five epochs have been derived at pantropical to global scales according to meta-analyses by Rodríguez-Veiga et al. (2017) and Zhang et al. (2019).
Further maps are anticipated because of new missions dedicated to measuring forest structure and AGB, including the Global Ecosystem Dynamics Investigation (GEDI) LiDAR mission (Dubayah et al., 2020), the NASA-ISRO Synthetic Aperture Radar (NISAR) (Kellogg et al., 2020) and the ESA-BIOMASS SAR mission .
Current AGB maps were derived using different methods and data sources (Langner et al., 2014;Rodríguez-Veiga et al., 2017;Mitchard et al., 2013;Réjou-Méchain et al., 2019). This leads to significant disagreements between them that reduce their value for estimating carbon stocks in global and national applications. In addition, the maps have specific individual error properties, rendering them unreliable for biomass change analysis, despite representing different epochs .
The accuracy of AGB estimates is normally quantified by characterising their error, i.e., the difference between the estimated and true AGB; this is normally unknown unless trees are destructively harvested to obtain their true weight. Ideally, the full error distribution would be known, but accuracy is commonly described statistically using various moments of the error distribution. Often, only two moments of the error are considered: the bias, which is the mean value of the error, and the precision, which quantifies the spread in the distribution of random errors around this mean (Dieck, 2007). These random errors are often less troublesome since their dispersion can be reduced by averaging, but this does not reduce bias.
There are many potential sources of bias in AGB estimation, including methodological, human and equipment biases when measuring tree dimensions; the use of incorrect allometric models when estimating tree AGB from these measurements; factors affecting EO signals such as saturation at high biomass, and mixed soil and vegetation components influencing signals from low biomass areas ; and variations in the EO signal due to environmental effects such as rain and snow (Santoro et al., 2015). As a result, a consistently observed pattern in current AGB maps derived from space data is overestimation of low biomass and underestimation of high biomass Rodríguez-Veiga et al., 2019).
Assessment of the accuracy of an AGB map has to take into account both errors in the map itself and in the reference data used to validate it (Duncanson et al., 2021). Reference data are commonly in situ plot measurements (plot data), whose uncertainty can be quantified using methods described in the Committee for Earth Observing Satellites (CEOS) AGB validation protocol (Duncanson et al., 2021). Plot uncertainties originating from tree measurements have rigorously been assessed at local scales (McRoberts et al., 2016;Chave et al., 2004;Harmon et al., 2015;Réjou-Méchain et al., 2017), and their propagation into AGB maps has been assessed at local (Chen et al., 2015) and regional scales (Rodríguez-Veiga et al., 2016). At larger scales, Avitabile and Camia (2018) accounted for both plot and map uncertainties in four pan-European maps using independent plot data to evaluate map bias and precision.
Consistent accuracy and uncertainty assessment of continental and global scale AGB maps are hampered by the lack of a global reference dataset (Schimel et al., 2015;Rodríguez-Veiga et al., 2017). Sampling the world's forests is highly labour-intensive and expensive, and forest inventory data are often not open access. National Forest Inventories (NFIs) have been established only in a limited set of countries, of which a minority are in tropical areas (McRoberts and Tomppo, 2007). In addition, NFIs in the tropics are often incomplete or unrepresentative because forest regions may be remote, inaccessible, or located in conflict areas. Efforts have recently started to centralize and standardize AGB plot data. The Forest Observation System (FOS) provides access to thousands of plot data (Schepaschenko et al., 2019), and the standardization of plot data from large plots and LiDAR-derived transects has been advocated . Other data sources, such as the Global Forest Biodiversity Initiative (GFBI) (Liang et al., 2016), provide data to researchers by request and GFBI also encourages the contribution of data. Fine-resolution AGB maps from LiDAR can also be used as an alternative to plot data for AGB map validation (McRoberts et al., 2019a) and provide high-quality AGB estimates over more extended areas than forest inventories (Labriere et al., 2018). Nevertheless, there are only a few fine-resolution AGB maps, open access data sources and proprietary plot data that can be used under data-use agreements. The consequent lack of a consistently sampled reference AGB dataset has consequences for statistical inference, which is only possible under certain assumptions and requires the data to be accompanied by uncertainty estimates (de Bruin et al., 2019;McRoberts et al., 2020;Duncanson et al., 2021).
Using a collection of plot datasets with uncertainty estimates across the globe offers several opportunities. Firstly, differences between plot measurements and global AGB maps can reveal regional patterns that may be explainable by environmental and/or ecological variables (de Bruin et al., 2019). This would allow these differences to be predicted using model-based approaches. For example, Tsutsumida et al. (2019) identified geographical areas with high AGB errors, attributing error hotspots to local land use practices. Secondly, global data can be used to investigate bias and develop bias reduction methods. As examples, Xu et al. (2016) and Zhang and Liang (2020) applied random forest regression to model and remove AGB map bias, whilst Avitabile et al. (2016) combined weighted linear averaging with bias removal methods when fusing the Saatchi et al. (2011b) and Baccini et al. (2012) pantropical maps. Thirdly, the availability of plot-level uncertainties allows evaluation of the extent to which plot-map differences can be attributed to map error.
A key GCOS principle for climate monitoring is that random errors and time-dependent biases in satellite observations and derived products should be identified (GCOS, 2016), and more generally map users prefer AGB maps to be unbiased and to have spatially explicit uncertainty information (Quegan and Ciais, 2018). The latter should include information on the spatial correlation of map errors since this is needed to model the precision of AGB estimates derived by averaging and summing map pixel values at coarser grid or country scales (de Bruin et al., 2019). In this paper, we propose a model-based framework designed to meet these needs using a global opportunistic sample of plot data to assess four AGB maps. This allows four questions to be addressed: (1) What is the error contribution from different plot error sources? (2) How can map bias be assessed? (3) How can map users and producers benefit from this framework? (4) How can the framework be applied to derive the total AGB and its uncertainty in the pantropics in different periods?

A framework for comparing plot and map estimates of AGB
The framework first pre-processes plot data to minimize forest area (where "forest" is set to be 30-m pixels with >10% tree cover (Hansen et al., 2013)) and temporal mismatches with the AGB maps (Section 2.2), and then has three main analysis steps highlighted in Fig. 1. Although plot estimates of AGB may be biased if an incorrect allometric model is used, this bias will tend to be small if local allometric models are used, as is often the case for NFIs and research plots . Hence we here assume they are unbiased and, after quantifying their uncertainties (Section 2.3), use them to calculate the minimumvariance linear unbiased (MVLU) estimates of the mean AGB within 0.1 ∘ grid cells, together with their uncertainties (Section 2.4). This allows the biases in the maps to be quantified. Map bias and the spatial correlation of random map errors are then modelled, respectively using spatial covariates and variograms of AGB residuals as inputs (Section 2.5), and applied to four global AGB maps (Baccini, GEOCARBON, GlobBiomass and CCI Biomass; see Section 2.2). Finally, we estimate the total pantropical AGB for each map epoch, together with their confidence intervals, and compare them with the values from the 2020 Forest Resource Assessment (FRA) (UN-FAO, 2020) (Section 2.6).

Data inputs
Three types of input data are needed to implement the framework: Fig. 1. Schematic of the framework indicating the three main inputs and three analysis steps leading to the comparison between plot and map estimates of AGB and the totals of the latter. Note the two-way link between the plot-to-map comparison and uncertainty modelling, indicating the assessment of map accuracy.
(1) plot estimates of AGB, which we refer to as plot data; (2) plot data with tree-level measurements (at least tree diameter), referred to as plots with tree data; and (3) global AGB maps.

Plot data
We used a global collection of plot data from NFIs (often derived using systematic sampling) and research network plots; the latter were mainly in the tropics, where they make up a quarter of the tropical plots. Most plot data were obtained under data-use agreements (see Table S1 for the plot metadata). From them, we selected a subset meeting the following criteria. Plots should: 1. not have been used for AGB map calibration; 2. have precise coordinates to at least four decimal places in decimal degrees; 3. have been measured within ten years of the map epoch ; 4. have plot-level AGB estimated using all trees with a diameter at breast height (DBH) of 10 cm or greater; 5. not have been deforested in the period between the inventory and the map epoch, according to forest loss data (Hansen et al., 2013); 6. have comprehensive metadata that contains information about the field measurements, allometric model and sampling scheme used; 7. have an associated report or other publication.
This yielded a total of at most 116,181 (out of a possible 225,698) globally distributed plots to be used as reference data. Their coverage was assessed against: (a) world regions and tree cover; (b) biomes defined by specific precipitation and temperature regimes (Iremonger and Gerrand, 2011); (c) strata derived from tree cover and population density, in order to assess plot coverage in forests with and without human disturbance (Fig. 2). The plot data cover all biomes (though some, such as portions of boreal and tropical rainforest, are underrepresented) and they extend over all the population density and tree cover strata.
The size of individual plots ranges from 0.02 ha to 25 ha, with a median size of 0.20 ha, and they cover a total area of 18,192 ha. The plot measurements took place between 1996 and 2018, and their AGB values were estimated using allometric models deemed appropriate for their forest area by the data providers. The mean plot AGB is 100.70 Mg ha − 1 with a standard deviation (SD) of 158.31 Mg ha − 1 . More comprehensive plot summary statistics are shown in Table S2.

Harmonizing plot and map data
Comparisons between plot and map data are only meaningful if they share common spatial and temporal characteristics. This requires applying two pre-processing steps to the plot data. For plots surveyed either before or after the map epoch, the first uses forest growth data and the number of years between the plot and map estimates to adjust the plot AGB to the date of the map data (Avitabile and Camia, 2018). We used the growth data from model-based estimates derived from chronosequences and permanent plots for the tropics and subtropics (Requena Suarez et al., 2019), and for temperate and boreal regions (Buendia et al., 2019); these are improvements of the estimates in the Fig. 2. Distribution of the plot data: (a) within countries and areas with >10% tree cover (Hansen et al., 2013); (b) within ecological regions or biomes (Whittaker, 1975) as a function of rainfall and temperature; (c) within strata with and without possible man-made forest disturbances as indicated by a scatterplot of log2-scaled human population density (Balk and Yetman, 2004) against tree cover percentage. IPCC 2006 report. Specific growth data are used depending on the forest type, ecological region, forest age and continent. The second step deals with the different areas of the forest plots and the AGB map support unit (i.e., the original pixel size or a coarser grid cell). Note that the map provides an estimate of AGB within each support unit, but this may include non-woody and non-forest areas, especially in heterogeneous and fragmented landscapes (Chave et al., 2004;Nascimento and Laurance, 2002). AGB maps including non-forest woody vegetation are also preferred by some users, including climate modellers (Quegan and Ciais, 2018). To provide an estimate of the same quantity from plot data, we assume that the plot data properly represent the forested part of the support unit and other types of land cover have negligible AGB. Then for plots smaller than the support unit, the plot-based estimate of the AGB in the support unit (or average AGB for coarser grid cells) is given by multiplying the plot AGB by the forest fraction (0-1), where this is derived by defining forests as areas with at least 10% tree cover (following Food and Agriculture Organization guidelines (UN-FAO, 2010)) and using the 30-m tree cover layer by Hansen et al. (2013).

AGB maps
From the AGB maps listed in Table S3, four were selected on the basis of three criteria: (1) global extent; (2) open access; (3) accompanying maps of uncertainty (referred to as an SD layer). The Baccini map (epoch 2000), is based on the two-step method of Baccini et al. (2012) that first establishes a statistical model relating spaceborne LiDAR metrics to AGB reference plots, allowing AGB estimation at the LiDAR footprints. These AGB estimates are then used to calibrate a statistical model which estimates AGB from Landsat reflectance, thus generating a global AGB map accessible at https://www.globalforestwatch.org/. The available SD layer of the Baccini map includes estimates of errors from allometric models, the LiDAR-based model and the Landsat-based model, and is currently limited to the pantropics, but the provision of the global layer on the Global Forest Watch platform is planned (Table S3). The GEO-CARBON 2007-2010 map (Avitabile et al., 2014) was produced by combining a refined pantropical map (Avitabile et al., 2016) (a fusion of the Saatchi et al. (2011b) and Baccini et al. (2012) maps) with the boreal map of Santoro et al. (2015), to obtain global coverage. The SD layer of the refined pantropical map is estimated using a procedure based on error stratification, whereas that of the boreal map accounts for random variation in the radar backscatter intensity used to predict growing stock volume (GSV) and AGB. The GlobBiomass 2010 and the CCI Biomass 2017 version 1 maps were produced from spaceborne synthetic aperture radar (SAR) images of backscattered intensity Santoro et al., 2021), from which GSV was estimated using physically-based models and then converted to AGB by scaling for wood density and biomass expansion factors estimated from empirical models . The SD layers of the two maps account for random errors in radar data and from the biomass retrieval process and its parameters using a first-order Taylor series approach. The Baccini and GEOCARBON maps are limited to forest areas, while GlobBiomass and CCI Biomass maps include non-forest areas.

Uncertainty in estimating AGB from plot data
The plot harmonization described in Section 2.2 involves adjusting plot values to minimize temporal and areal mismatches between the plot and map estimates, both of which involve uncertainty. Another cause of difference between the true AGB in a map pixel and its estimate from plot data comes from sampling errors since plots are typically smaller than map pixels (Baccini et al., 2007). This section describes the methods used to estimate the SDs of these error sources and the SD when estimating plot-level AGB itself.

Plot measurement and allometric model errors
Non-destructive forest inventory is the traditional method used to estimate tree AGB, but has an uncertainty of 5-44% at the stand scale (Burt et al., 2020). Tree measurement errors originate from uncalibrated surveying tools and human errors, which propagate into the allometric model used to estimate AGB from tree diameter (and height) per tree, and then into aggregation at plot-level (Réjou-Méchain et al., 2017). The cumulative error from tree measurement and allometric models is termed "measurement error" in this study.
To estimate the uncertainty in measurement errors (SD me ), plots with tree data were used to estimate how errors in individual tree measurements propagate into biomass estimates at plot-level. We used data from 8457 plots, ranging from 0 to 25 ha in size and with a total of 267,907 trees. These plots are within all the major climatic zones (tropics, subtropics, temperate and boreal) and across eight countries, of which the majority are in the tropics. However, plots with tree data constitute only 7.4% of all plot data used in our analysis. A two-step approach was therefore implemented to predict SD me for all plots using a model calibrated on the plots with tree data.
The first step estimated tree-level errors due to uncertainty in tree parameters (wood density, diameter and tree height). Tree wood density data and their uncertainty were obtained from global and regional databases (Chave et al., 2009). For trees without height data, stem diameter was used to estimate height, H, using the Weibull height-diameter model (Eq. (1)). This three-parameter estimator of H has been tested within tropical forests (Feldpausch et al., 2012), temperate coniferous forests in Norway (Mahanta and Borah, 2014) and China (Zhang et al., 2014), boreal forests (Zhang et al., 2018), and coniferous forests in the Philippine Highlands (Anacioco et al., 2018): where DBH is diameter at breast height and a, b and c are fitted coefficients.
Errors arising from the parameters in the biomass allometric model, such as model coefficients and residual standard errors, were also considered. These data are derived from the dataset of destructive tree measurements provided in Chave et al. (2014). The overall error propagation included the probability distributions of the tree and allometric model parameters and was implemented by running 1000 Monte Carlo simulations using the AGBmonteCarlo function of the BIOMASS R package (Réjou-Méchain et al., 2017). The outputs are estimates of AGB and its SD me for each tree, which were aggregated and scaled to plotlevel.
In the second step, a random forest (RF) model trained on the SD me of the 8457 plots using climatic zones, AGB and plot size as covariates was used to predict SD me for all plots. The RF model predictions were tested using an independent random subset containing a third (n = 2819) of the plots with tree data. The evaluation of the model resulted in an R 2 of 0.86 and Root Mean Square Error (RMSE) of 22.1 Mg ha − 1 .

Temporal differences
The correction for plot and map temporal mismatch introduces errors caused by uncertainties in the growth rate, for which data are available per forest type, biome and continent. To derive the temporal uncertainty for the plots, SD td , we multiplied the SD of the growth data in Table 4.9 of IPCC 2019 (Buendia et al., 2019), SD gr , in Mg ha − 1 yr − 1 , by the difference between the plot survey year (PY) and the map epoch (MY):

Sampling error
The sampling error can be significant, especially when the AGB exhibits large local variability since plots are often smaller than map pixels (Baccini et al., 2007). To estimate this within-pixel sampling error, spatial configurations using measured data from 8 to 60 ha plots and 5-250 m EO footprints were simulated by Réjou-Méchain et al. (2014). For each simulation, configurations of both plot and pixel were randomly located. For simulations where plots are smaller than pixels, the RMSE was computed and normalized by the mean AGB of the footprint to derive a Coefficient of Variation (CV). We adopted the results of Fig. 6 and Table S2 of Réjou-Méchain et al. (2014) to train an RF model to predict CV as a function of plot size and AGB map pixel size. We evaluated the model using one-third of the total set of plots (n = 38,727) which yielded an R 2 of 0.81 and an RMSE of 0.07. The CV was then converted into the SD of sampling error (SD se ) by multiplying by the mean AGB of all the plot data (μAGB) (Eq. (3)).

Plot-level uncertainty
Assuming the three error sources are independent, the uncertainty in the estimate of the mean AGB within a map pixel using plot data, SD p , is then given by: The SDs of the three main plot error sources and the maps were analyzed for each map over three groups of plot sizes: large plots (1-25 ha), plots of moderate size (0.3-1 ha) and smaller plots, usually from NFIs (<0.3 ha); and over biomass ranges (<150, 150-300, and >300 Mg ha − 1 ). The same grouping was also used to summarize the SD layers of the maps.

Aggregation to 0.1 ∘ grid cells
The plot data were compared with the AGB maps in 0.1 ∘ grid cells (referred to below as default AGB maps), which is a spatial scale comparable to those typically used in global carbon cycle and climate models. Non-forest pixels (taken to have AGB = 0) were included for Baccini and GEOCARBON prior to aggregation to avoid biasing the 0.1 ∘ AGB averages if only forest pixels are used. For each grid-cell i, its average AGB was estimated from the map data by averaging the AGB estimates at each pixel in the cell; and from the plot data by the MVLU estimate under the assumption that the plot data are unbiased. The MVLU estimate is the weighted mean of each plot estimate of AGB, x, inside the grid cell i, where the weight is inversely proportional to the variance of x. Hence the plot-based estimate of the AGB of the grid cell has uncertainty SD pG (i) given by Eq. (5), where the summation is over all plots in the grid cell.
Only grid cells containing at least 5 plots were selected; on average these cells contained 15 plots, with an average total area of 2.3 ha. Around 46% of the total number of available grid cells were excluded from this selection process. Similar studies have also set minimum plot numbers to select grid cells for map assessment, e.g., Fazakas et al. (1999); Baccini et al. (2012Baccini et al. ( , 2017; Xu et al. (2021). Although in some cases these plots may not properly represent the grid cell, notably when they are research plots lying in particular types of forest, only 4% globally and 24% of the tropical plots used for analysis are research plots. This issue is discussed further in Section 4.3, and Figs. S1 and S2.

Evaluation of differences between plot and map estimates of AGB
Plot and map estimates of AGB were tabulated and compared in AGB bins of width 50 Mg ha − 1 . For each bin, the following accuracy metrics were computed: Root Mean Squared Difference (RMSD) between map and plot AGB at 0.1 ∘ (AGB mG and AGB pG ) for all grid cells within the bin (n) (Eq. (6)); and the Mean Difference MD (Eq. (7)), which is interpreted as map bias or simply "bias". Scatterplots were also used to locate transitions from map overestimation to underestimation and AGB ranges exhibiting little bias. The scatterplots have a higher number of AGB bins than the tabulated results.

Map error conformity
We assessed whether the reported map SD is consistent with the plot SD and the other uncertainty components at 0.1 ∘ using a metric denoted as map error conformity (EC). Map uncertainty is classified as optimistic (OP) or pessimistic (PE) according to Eq. (8).

Spatial uncertainty
This section details the model-based approach to predicting bias and the geostatistical approach to modelling precision when aggregating map SD over the tropics. The restriction to the tropics is because the extra-tropical SD layer is not currently available for the Baccini map (Section 2.2), and since most plots with tree-level data used for measurement error estimation are in the tropics.

The random forest algorithm
We used RF (Breiman, 2001) to model and predict bias. RF is a nonparametric ensemble model of decision trees from bootstrapped samples of the training data and produces averaged predictions (RF regression). We implemented RF using the ranger R package (Wright and Ziegler, 2017). This provides a standard error (SE) of the RF model, calculated using the infinitesimal jackknife approach (Wager et al., 2014) along with a function case.weights that prioritizes data with higher weights when forming the bootstrap samples, and hence the trees of the RF model.

Bias modelling
We modelled bias using RF regression and data at 0.1 ∘ to form weighted bootstrap samples. The model used open access sources of spatially exhaustive covariates that were considered to have a possible influence on bias (Chave et al., 2004;Réjou-Méchain et al., 2014;Santoro et al., 2015), all averaged to 0.1 ∘ . We first tested 11 covariates, including the AGB map itself, its reported uncertainty, slope, aspect, tree cover, elevation, rainfall, temperature, biome, longitude and latitude. Using all and partial combinations of the covariates, we created multiple RF models using the default RF hyperparameters. The models were evaluated using a randomly held-out 30% of the 0.1 ∘ data to assess the proportion of the variance of residuals explained by the model. We then visually inspected the bias for indications of geographic correlation among covariates, as suggested in Meyer et al. (2019). After this initial investigation, we limited the covariates to the five listed in Table 1, which also gives brief metadata on the final covariates.
The predictive power of the covariates in the RF model was assessed by the Variable Importance Measure (VIM) and Partial Dependence Plots (PDP). VIM is the mean decrease in accuracy of an RF model after data permutation of a covariate, while a PDP shows the marginal effect of covariates on bias prediction. We normalized and ranked the VIM for every AGB map. The PDPs are displayed as matrices, color-coded with bias and with the axes labelled by the values of a covariate pair, e.g., bias plotted against AGB map and tree cover (Fig. S3).
Under the assumption that the error in the estimated bias is normally distributed, we derived the 95% confidence intervals (CI) of the predicted bias (MD) using the SE from the RF model (Eq. (9)). The estimated bias was then subtracted from the default 0.1 ∘ AGB map at all grid cells where the 95% CI of bias does not include zero. The corrected AGB maps (referred to as bias-adjusted) were then compared with the plot estimates at 0.1 ∘ using a third of the total grid cells independent from the data used for bias modelling (Fig. S4).

Uncertainty of the aggregated AGB map over the tropics
Model-based inference (as used in this study) has to account for spatial correlation in map errors when summing or averaging over an area. Furthermore, the variance of map errors may vary over space (heteroscedasticity). To account for the latter, the AGB residual, AGB R (x), defined as map-plot difference at plot location x, was scaled by the map SD; this assumes the SD of the residuals is proportional to the map SD at that point (Eq. (10)): where SR(x) is the scaled residual and SD m (x) is the map SD. This scaling was assumed to transform the residuals to homoscedasticity. We then generated variogram models, γ(h) in Eq. (11), to estimate the spatial correlation of SR at spatial lag h, where x is a plot location, and the errors are assumed to be statistically stationary: As proposed in Christensen (2011), the variograms were adjusted for the variance of plot errors by subtracting the mean SD p /SD m from the nugget of the variograms. Using the adjusted variograms, SD m (i) was computed using the covariances estimated at the original map scale within each grid cell. An identical procedure was adopted when estimating the SD in the total pantropical AGB for each map (Section 3.4).
This step is based on the covariances σ i, j of grid cell pairs i and j (1…n), derived after convoluting the adjusted variograms from the original map pixel size to 0.1 ∘ following the procedure of Kyriakidis (2004). These covariances of the map error component yielded the variance and hence SD of the total estimated AGB within the tropical belt (SD trop ) (Eq. (12)): For each AGB map, this was transformed to a 95% CI of the subsequent pantropical AGB corresponding to each map epoch.

Total pantropical AGB
The total pantropical AGB (− 25 ∘ to 25 ∘ latitude) estimated from biasadjusted maps was compared to that given by the default 0.1 ∘ maps i.e., not adjusted for bias. The 2020 FRA (UN-FAO, 2020) data for 2000, 2010 and 2017 (2017 as the average of 2015 and 2020) were also used to assess how AGB data compares with map estimates over time. Since the FRA provides AGB only in forest areas, we used 0.1 ∘ tree cover maps to remove 0.1 ∘ grid cells whose forest cover was less than a given threshold, chosen to produce a pantropical forest area close to that reported in the FRA. We used the Hansen et al. (2013)

Uncertainty as a function of plot size and AGB
For plots <0.3 ha, the inter-quartile range of SD was 30-151 Mg ha − 1 for measurement error and 16-44 Mg ha − 1 for sampling error (Fig. 3). The SD of measurement error decreased sharply for larger plots and increased slightly as AGB increased. The sampling errors were also affected by map pixel size. For instance, GEOCARBON (1 km pixels) had consistently higher SD than Baccini (30 m pixels). Plots that were temporally adjusted for epoch 2000 (Baccini) and 2017 (CCI Biomass) exhibited slightly higher temporal SDs than for the other two maps because of the longer periods that had to be bridged between the map epoch and plot inventory date (Table S1), but the temporal error had the lowest SD of the three error sources. On average among the maps, the estimated SD of each error was 93.9 Mg ha − 1 (measurement), 51.6 Mg ha − 1 (sampling) and 24.6 Mg ha − 1 (temporal), equivalent to 73, 22 and 5% of the total variance, respectively. Map SDs exhibited very different magnitudes, but high map AGB values were always associated with higher map SD.

Plot-to-map comparison
Comparisons between map and plot estimates of AGB at 0.1 ∘ in Fig. 4 and Table 2 show that, while all the maps overestimate lower biomass and underestimate higher biomass, the transition point from over-to underestimation differs. For example, the Baccini map starts to underestimate AGB at around 150 Mg ha − 1 , whereas for the other maps this occurs around 200 Mg ha − 1 . For all maps, the largest underestimation of AGB was in the highest biomass bin. GEOCARBON has the smallest underestimation for values of AGB >300 Mg ha − 1 , while the GlobBiomass and CCI Biomass maps have the lowest absolute MD over the range 50-200 Mg ha − 1 . The inter-quartile ranges of the binned AGB map values do not overlap with the 1:1 line below 50 Mg ha − 1 and above 300 Mg ha − 1 , indicating bias dominates random errors for those bins.
The map error conformity (EC) in Table 2 shows that overall, GEO-CARBON is optimistic about map precision whilst CCI Biomass is pessimistic. The precision estimates for the Baccini and GlobBiomass maps tend to be optimistic for low AGB and pessimistic for high AGB.

Spatial bias
The fraction of the variance of the bias explained by the RF models ranged from 24 to 36% over the AGB maps. Map AGB and tree cover were the most important predictors in the models (Fig. S3 and Table S4). The proportion of 0.1 ∘ grid cells for which the 95% CI of the bias prediction included 0 Mg ha − 1 ranged from 4 to 15% across the AGB maps, so most grid cells were corrected for bias.

Table 1
Covariates used in bias modelling, with a brief description, unit and spatial resolution. Systematic underestimation is particularly obvious over the tropical rainforests of the Amazon, the Congo basin and insular Southeast Asia (Fig. 5), but also occurs in parts of other climatic zones, particularly the sub-tropical zone of China and southeast Australia, the temperate zone of Spain and USA, and the boreal zone of Russia and Canada. There is no obvious common spatial pattern of overestimation among the maps. The GEOCARBON map has the smallest underestimation in the tropics, followed by the CCI Biomass map. The GlobBiomass map exhibits the largest overestimation in the temperate regions, while the Baccini map has the largest overestimation in the boreal zone.

Estimates of total pan-tropical AGB
The default and bias-adjusted maps, and the FRA all show a decrease in the total pan-tropical AGB from 2000 to 2017 (Table 3), but with important differences. The bias-adjusted maps give higher total AGB than the default maps for all years since they correct for map underestimation in high AGB regions. The 95% CIs for the estimated total AGBs take account of the map-specific spatial correlation in AGB map errors (see convoluted variograms in Fig. S5, which exhibit sills from 0.11-0.38 and nugget = 0, and the modified SDs in 0.1 ∘ grid cells in Fig. S6). The differences between the three estimates decrease from 2000 to 2017, though the bias-adjusted estimate is still 12.4 Petagrams (Pg) greater than the FRA estimate in 2017.

Uncertainty drivers in plot-to-map comparison
The largest contributor to plot SD (73%) was measurement error (which includes allometric model errors), which is much larger for smaller plots and increases slightly with higher biomass. Most of the plot data come from small plots (mostly NFIs) in which there are fewer trees. Combined with geolocation errors causing trees near the plot boundary to be included or excluded, this produces large uncertainties . Moreover, around 34% of the plots smaller than 0.3 ha are extra-tropical, and may be subject to erroneous uncertainty estimates due to the use of the wrong allometric model (Chen et al., 2015). Generic allometric models similar to Chave et al. (2014) for nontropical forests are not yet developed. The uncertainty of measurement error is probably best estimated by the data producers themselves and its provision would be a useful addition to the data quality requirements for current and upcoming plot data (see Section 2.2).
Sampling errors in the range 16-44 Mg ha − 1 were estimated for small plots when map pixel size and forest cover were taken into account (see Section 2.2). They tend to be amplified when small plots are compared with large map pixels , as observed in the GEOCARBON results (1 km pixel size, Fig. 3), which has the highest SD for this error. This occurs partly because forest structure tends to be nonuniform over short distances (Chave et al., 2004;Saatchi et al., 2011a), especially on slopes  and when there are very big trees (de Castilho et al., 2006). However, the influence of forest structure variability tends to decrease as plot size increases, e.g., Saatchi et al. (2011a) found that the CV of sampling error was 80% smaller for 1 ha plots than for 0.01 ha plots in the tropics. As well as plot size, the number and spatial spread of plots inside a map pixel affects sampling errors Bradford et al., 2010). Several randomly placed samples may be better at capturing the mean AGB of a forest region than a single large sample covering the same total area (Nascimento and Laurance, 2002). LiDAR data may also be useful as reference data since LiDAR-based AGB maps typically cover substantial areas and hence provide samples covering the whole range of AGB in a landscape. This will prevent biases arising from preferential sampling, which is often implicit in the selection of research plots (Duncanson et al., 2021). Whenever available, these maps would be preferred over the research plots themselves as reference data.
Uncertainty from plot temporal adjustment was largest for the AGB maps of 2000 and 2017 because of the longer periods between the map epoch and plot inventory date (Table S1), but was a small contributor to total plot uncertainty. Improvements might be possible by stratifying the Fig. 3. Boxplots of plot-level SD for measurement error, sampling error and temporal adjustment as a function of plot size and AGB, color-coded and labelled according to the AGB map they are compared with; the horizontal bar indicates the median and the boxes show the inter-quartile range. Also depicted are boxplots of map SD as a function of plot size and AGB. growth data to capture the growth of disturbed forests under different management intensities and natural disturbances (Requena Suarez et al., 2019). Such estimates could be complemented by forest age maps (Besnard et al., 2021).
Map SDs exhibited very different magnitudes as a result of the use of different data, different AGB estimation methods and different ways of propagating uncertainty (Section 2.2). However, high map AGB values are always associated with higher map SD (though not necessarily a higher CV) (Rodríguez-Veiga et al., 2017). If plots are used for calibration, such as in the Baccini map, large measurement errors may contribute a significant part of the total error propagated to the map (Réjou-Méchain et al., 2017). AGB maps produced without in situ calibration avoid such errors, but are vulnerable to model uncertainties (Santoro et al., 2011). For example, the SDs for GlobBiomass and CCI Biomass arise mainly from limitations in the model converting backscatter to GSV and uncertainty in its parameters .
All AGB maps tended to overestimate low AGB and underestimate high AGB. Numerous studies, summarized in Réjou-Méchain et al.
and Duncanson et al. (2021), show similar effects, as a result of several intertwined factors. Both optical and radar sensors are known to saturate for higher values of AGB (Zhao et al., 2016;Rodríguez-Veiga et al., 2019), which inevitably leads to underestimation of AGB. However, the factors causing map overestimation for lower AGB are more complex. For radar-based maps, it is largely driven by imperfect allometric models (Santoro, 2020) and the influence of soil moisture and roughness (Santoro et al., 2011). For maps derived using optical data, it is possibly a result of fitting saturated EO data to plot data in the regression models, particularly if plots are limited to certain forest conditions but used to calibrate models predicting AGB globally . The smaller map bias in mid-range AGB values is expected for regression methods, which often force the mean of the training data and predictions to be equal. Similar behaviour for the model-based approaches used by GlobBiomass and CCI Biomass may reflect higher sensitivity of radar backscatter to AGB and reduced soil  Table S2). Each circle represents an AGB bin and its size denotes the number of grid cells in the bin, while the whiskers correspond to the 25 th and 75 th percentile range of the map AGB.
effects in this AGB range. GEOCARBON has the closest match to plot data for AGB >200 Mg ha − 1 , possibly because its bias removal method used a plot dataset in the tropics which may overlap with our plot data. However, this effect is not easy to quantify as here the comparison is with plot data after the harmonization process (Section 2.2).

Bias and precision modelling
The model-based approach to predicting bias at 0.1 ∘ yields broadscale spatial patterns of map over-and underestimation that exhibit significant similarities between the four maps (Fig. 5), and are also similar to patterns observed in Avitabile et al. (2016) for two global maps and Tsutsumida et al. (2019) for regional maps. Error hotspots, mainly of map underestimation, stand out in the regions of agreement and disagreement between the maps (Fig. 6). Such hotspots occur regardless of the methods used to produce the maps (Xu et al., 2016) and whenever there are sufficient reference data to compare with the maps (Avitabile et al., 2016;Rodríguez-Veiga et al., 2019). However, insufficient and unrepresentative reference data may cause incorrect estimation of map bias and hence erroneous map correction (Avitabile et al., 2016). We attempted to counteract this effect by reducing the plot-based estimates of AGB at 0.1 ∘ when non-forest areas exist in grid cells (Section 2.2). The plots used here cover all major ecological zones, though some zones are under-sampled, and are subject to large measurement errors. These plot errors are accounted for when creating the training data, as explained in Section 2.5. For example, training data within areas with high map underestimation, such as the Tasmanian forests, were mostly small plots, so received lower weights and hence had a lower chance to become training data. A similar situation is observed in Sweden.
The Variable Importance Measure (Table S4) and Partial Dependency Plots (PDP; Fig. S3) indicate that the predicted bias is most sensitive to map AGB and tree cover, but this is also clear from Fig. 5. In particular, the PDPs show that map underestimation of at least 60 Mg ha − 1 mostly occurs when AGB >300 Mg ha − 1 and canopy cover is in the range 60-90%. Furthermore, bias in the radar-based maps, e.g., CCI Biomass, is sensitive to steep northeasterly slopes because of the look geometry of the sensor and incorrect pre-processing of the SAR data for moderate and steep terrain . These observations may help in developing improved AGB estimators that combat such deficiencies.
Spatial autocorrelation analysis revealed spatial dependency in errors up to lags of 50-104 km, depending on the map (Fig. S5). Shortrange autocorrelation of residuals (<5 km) comes from localized forest structure (Guitet et al., 2015;Mascaro et al., 2014). However, longerrange autocorrelation is found from our plot data, which are mostly from NFIs that are configured to sample the forest over short distances (e.g., using nested plots) while also representing regional to country scales. Similar effects were found in other large-scale studies (Baccini et al., 2012;Avitabile et al., 2011;Ploton et al., 2020b). Large scale AGB mapping often uses environmental variables, e.g., topography (Baccini et al., 2012) and climate (Hernández-Stefanoni et al., 2020), as predictors, and these exhibit long-range spatial dependency that may transfer into the AGB error structure (Ploton et al., 2020b). This may also affect the two radar-based maps as the GSV and biomass expansion Table 2 Summary statistics of the plot-to-map comparison: mean plot and map AGB, MD (bias), RMSD, and mean variances of plot and map AGB errors per biomass bin at 0.1 ∘ . The EC column lists whether the SD layer provided with the map is optimistic (OP) or pessimistic (PE) about map precision.   factor used for AGB estimation are mapped with climatic variables as inputs Santoro, 2020). The variograms also indicate that the map SD layers need to be improved. In the variogram models shown in Fig. S5, the residuals are scaled by the map SD, so the SDs are incorrect when the sills deviate from 1 (see "default variogram" in Fig. S5). Further evidence for the need to adjust the SDs is given by the map error conformity measures (Table 2). Overly pessimistic estimated SD for the CCI Biomass 2017 map has already been corrected in its updated version (Santoro, 2020).

Strengths and limitations of the framework
A core GCOS principle is that estimates of AGB should as far as possible be unbiased. This study provides a comprehensive framework for meeting this principle by estimating the bias and uncertainty in AGB maps (Fig. 1). It can be adapted to the requirements of different map users or producers  but requires estimates of uncertainty in both plots and maps, and careful vetting of the quality and suitability of plot data. Open source tools estimating plot uncertainty (e. g., BIOMASS and PlotToMap) are of great value for this. BIOMASS has not been widely tested other than in tropical regions, though is currently being tested in extra-tropical forests. We also provide an interactive online tool for users of the framework and open source software (which can be readily updated) that offers more flexibility in pre-processing plot data and comparing plot and map AGB estimates (https://github.com/a rnanaraza/PlotToMap). Countries with constraints on sharing plot data could use such tools while maintaining national data privacy.
Bias-adjusted maps of AGB and its uncertainty in 0.1 ∘ or larger grid cells can be used in climate and carbon modelling (see Fig. S4). They can also provide estimates of national AGB over time, and its uncertainty, to assist carbon accounting based on NFI sampling (McRoberts and Tomppo, 2007) and to enhance local AGB estimates Le Toan et al., 2011). In addition, they can provide baseline AGB values when more frequent estimation of carbon emissions is desirable (Csillik and Asner, 2020).
Information on local to regional map biases and their dependence on terrain and forest variables (Table S4, Fig. S3) may help to trace the factors causing such bias. Moreover, map producers should find the analyses of spatial error structure from variograms and map error conformity informative. For more precise maps, our variograms that account for measurement errors (i.e., with zero nugget) and our optimized method of SD aggregation can be adapted.
The application of the uncertainty framework to estimating pantropical AGB showed a persistent temporal decline in stocks and increasing agreement between the map-based estimates and the estimate from FRA over time. This may reflect both increasing quality of forest AGB data from countries (Nesha et al., 2021) and improving map accuracy, and suggests that we can have more confidence in more recent estimates of pantropical forest AGB. However, the large disagreements between the map-based and FRA estimates of AGB in 2000 indicate that Table 3 Pantropical forest AGB with 95% CI estimated from the default and bias-adjusted maps. Also shown are estimates derived from FRA data for these three years. The forest areas of map-based AGB estimates are set as the closest possible area to the FRA forest area based on a 0.1 ∘ tree cover threshold. The analysis includes all pantropical countries but without the pantropical portions of China and Australia.  long-term AGB change estimation based on differences in AGB maps is likely to be unreliable. Further analysis of AGB change from these estimates will be addressed in a follow-up study. The limitations in the uncertainty framework reported here pertain to the plot dataset, plot selection bias, methodological limitations and data requirement issues. Most plot data with tree-level measurements lack tree height, so we estimated height with the Weibull model, but this model may not apply in all biomes, such as woodlands and mangrove forests. We also lacked tree-level data in non-tropical regions, so estimates of measurement errors for plots in these regions are subject to revision. The fact that the plot dataset in the boreal regions is concentrated in two countries may also limit global application of the framework.
Plot selection bias will arise if the plots inside a 0.1 ∘ grid cell do not provide a random sample of the AGB within the cell. For example, if they are selected to lie within high AGB areas within a diverse forest landscape, their weighted average would overestimate the AGB at 0.1 ∘ . Research plots, which make up 4% of the total dataset but 24% in the tropics, are particularly prone to this effect. To analyze how this might affect our analysis we examined the variability of tree cover at grid scale and plot locations, and treated this as a proxy for AGB variation (Avitabile and Camia, 2018). This analysis yielded a set of grid cells without preferential samples (referred to as strict filtering of the plot dataset; see Fig. S1 for the specific steps). Assessment of the GlobBiomass map at pantropical and global scales against the filtered plot dataset (based on the tree cover for the same epoch (Hansen et al., 2013)) gave results differing only slightly from use of the current dataset (see Fig. S1). This suggests that preferential sampling had little effect on our analysis. Possible reasons for this are that almost the same number of grid cells were excluded under the current approach and the strict filter (56% and 57% in the pantropics, respectively), and that many of the grid cells selected were the same under both approaches, particularly in tropical high AGB areas e.g., 77% of the tropical grid cells where GlobBiomass >250 Mg ha − 1 used in the current approach were also used after strict filtering (Fig. S2). Though we used several grid cells containing research plots, these are mainly plots with area >0.60 ha located in forests that visually exhibit homogeneous canopy cover. Nonetheless, the bias seen in the corrected maps when AGB >300 Mg ha − 1 may be exacerbated by the lack of representative plot data, even if the minimum number of plots inside grid cells is increased. This AGB range was only covered by research plots in Tasmanian and Amazonian forests, and most of them were excluded from the bias modelling since the weighted bootstrapping limits the use of small plots with high AGB.
A number of possibilities exist for improving the bias modelling. Using additional covariates, such as AGB texture and canopy height, may help (Xu et al., 2021). Additional plot data, preferably large plots, are desirable both to compensate for those with high variance and, more importantly, to increase spatial coverage since a large training dataset is needed to capture local AGB error patterns (Xu et al., 2016). Selected values from local AGB maps, if they exist, can be added to the training data in under-sampled areas (McRoberts et al., 2019b). We also plan to assess different cross-validations of the bias model. In addition, the use of harmonized plots at coarser scales, e.g., 1 km from forestry concessions (Ploton et al., 2020a) and 25 km from NFIs (Menlove and Healey, 2020) are also options, given their extensive forest coverage.

Conclusions
1. The comprehensive uncertainty framework developed in this paper can correct existing AGB maps for bias, within the limitations of the bias model. Such maps, with their associated SDs at coarser scales, are particularly useful in the context of climate and carbon cycle modelling. The analysis of spatial bias and models of spatial error correlation provide valuable information to both map users and producers on local to regional map errors.
2. The estimates of bias in the AGB maps exhibit spatial patterns that largely reflect AGB itself. The bias models would benefit from additional plot data and local AGB maps within poorly represented regions i.e., LiDAR-based maps will be preferred over plot data whenever available. 3. The spatial uncertainty modelling was hindered by plot AGB uncertainty arising principally from measurement and sampling errors, which tends to be especially large in regions where only small plots are available. It would be helpful if NFIs included some larger plots (e.g., >1 ha) to serve multiple purposes, including the assessment of global AGB maps. 4. Both map-based and FRA estimates of pantropical AGB show a decline from 2000 to 2017, and become increasingly close with time, despite the datasets and methods used being quite different for different map epochs. However, there is still a difference of at least 6.9 Pg between the map-based and FRA estimate in 2017.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.