Int J Appl Earth Obs Geoinformation Forest biomass retrieval approaches from earth observation in di ﬀ erent biomes

The amount and spatial distribution of forest aboveground biomass (AGB) were estimated using a range of regionally developed methods using Earth Observation data for Poland, Sweden and regions in Indonesia (Kalimantan), Mexico (Central Mexico and Yucatan peninsula), and South Africa (Eastern provinces) for the year 2010. These regions are representative of numerous forest biomes and biomass levels globally, from South African woodlands and savannas to the humid tropical forest of Kalimantan. AGB retrieval in each region relied on di ﬀ erent sources of reference data, including forest inventory plot data and airborne LiDAR observations, and used a range of retrieval algorithms. This is the widest inter-comparison of regional-to-national AGB maps to date in terms of area, forest types, input datasets, and retrieval methods. The accuracy assessment of all regional maps using independent ﬁ eld data or LiDAR AGB maps resulted in an overall root mean square error (RMSE) ranging from 10 t ha − 1 to 55tha − 1 (37% to 67% relative RMSE), and an overall bias ranging from − 1 t ha − 1 to +5 t ha − 1 at pixel level. The regional maps showed better agreement with ﬁ eld data than previously developed and widely used pan-tropical or northern hemisphere datasets. The comparison of accuracy assessments showed commonalities in error structures despite the variety of methods, input data, and forest biomes. All regional retrievals resulted in overestimation (up to 63tha − 1 ) in the lower AGB classes, and underestimation (up to 85 t ha − 1 ) in the higher AGB classes. Parametric model-based algorithms present advantages due to their low demand on in situ data compared to non-parametric algorithms, but there is a need for datasets and retrieval methods that can overcome the biases at both ends of the AGB range. The outcomes of this study should be considered when developing algorithms to estimate forest biomass at continental to global scale level.


Background
Forests cover around one third of the Earth's land surface, are an essential socio-cultural element of modern society, support biodiversity and influence the climate system via coupled carbon-water-energy cycles (Bonan, 2008). Quantifying forest aboveground woody biomass (AGB), i.e. the amount of woody matter within a forest, has profound social and economic importance, since it is a source of materials and energy for direct human use, and its structure and temporal dynamics exert substantial influence on the functioning of terrestrial ecosystems, with direct impacts on biodiversity, as well as on the carbon and energy cycles and consequently the whole Earth system (e.g. Bonan, 2008;Le Quéré et al., 2018;Pan et al., 2011). As such, AGB can be used to evaluate the dynamics of global vegetation and Earth system models (e.g. Thurner et al., 2017;Carvalhais et al., 2014), was recognised by the Global Climate Observing System (GCOS) as an Essential Climate Variable (ECV) (Bojinski et al., 2014), and plays an important role in several essential biodiversity variables (EBV) (Pereira et al., 2013). However, quantification of AGB still presents a scientific challenge with significant implications for our current knowledge about the Earth system (Pan et al., 2011;Le Quéré et al., 2018).
Knowledge of the spatial distribution of forest AGB is typically derived from ground measurements collected by national forest inventories. From these, regional-to national-scale summary data are generated for the FAO's quinquennial Global Forest Resource Assessment (FRA) reports (FAO, 2005(FAO, , 2010FAO, 2015), aiming at giving a global portrait of biomass stocks and their changes in time. Vast areas covered by forests mean that ground-based forest inventories need a large amount of resources to provide accurate information on the extent, spatial distribution and dynamics of forest AGB. However, forest inventory data in developing countries can be fairly inaccurate (Saatchi and Moghaddam, 2000) and often many years out of date . A review of the country FRA reports (FAO, 2010) showed that 45 countries (i.e. around 20%) indicated high quality for the reference data used (mostly located in Europe and North America), while 171 did not report on quality (most African and Asian countries). In addition, forest inventory data are not always comparable and biomass estimates may be biased due to differing national forest definitions and differences in methods used to produce the estimates, such as the choice of the minimum tree diameter sampled (Searle and Chen, 2017) and plot size . The only practical approach for consistent global or regional woody biomass estimation therefore lies in systematic use of Earth Observation (EO) data, either in parameterised model-based approaches or in combination with high-quality reference data. Satellite data have long been used for forest cover mapping, clear-cut or burnt area monitoring and detection of disturbances (Hansen et al., 2013;Healey et al., 2005;Fraser and Li, 2002;Rignot et al., 1997). However, without biomass information this is insufficient to quantify the role of forests in the global carbon and energy cycles and other biogeochemical cycles. In addition, financial mechanisms aiming to reduce emissions and enhance carbon stocks, such as the Reducing Emissions from Deforestation and Forest Degradation (REDD+) initiative and carbon trading schemes, require credible and consistent measurement, reporting and verification (MRV) systems that are spatially explicit with a wall-to-wall extension and provide a full carbon account of forest ecosystems (Steffen et al., 1998).

Current status of biomass estimation from space
Studies aiming at wall-to-wall estimation of AGB at regional and global scale have used passive optical, active or passive microwave, and LiDAR data obtained from Earth Observation space platforms either stand-alone or in synergy (e.g. Saatchi et al., 2011;Baccini et al., 2012;Thurner et al., 2014;Gallaun et al., 2010;Hu et al., 2016, Liu et al., 2015. Multispectral optical imagery contains information on the photosynthetic parts of vegetation rich in chlorophyll, while microwave active sensors, such as Synthetic Aperture Radar (SAR), contain information on the dielectric (essentially moisture content) and structural properties of objects, soil surface and plants. The main advantage of microwave radar sensors is that, unlike optical imagery, radar images are unaffected by cloud cover, allowing usable image acquisitions even in the cloudiest places on Earth. Spaceborne LiDAR sensors, on the other hand, give a sampled retrieval pattern along the orbit and to measure elapsed time between emitted and received light pulses which can be used to estimate forest canopy height at each footprint location. However, these datasets present different degrees of saturation to AGB, where saturation refers to the AGB level at which the sensitivity of the signal (i.e. backscatter, reflectance) becomes too small to be measurable, or where the signal fails to penetrate the canopy (Fagan and DeFries, 2009). This is particularly relevant for dense tropical forest, which is a key biome where accurate biomass information is needed.
The search for consistent approaches over forested areas in the tropics prompted the use of satellite data calibrated against in situ biomass, with special emphasis on forest height estimates derived from the first spaceborne LiDAR, the Geoscience Laser Altimeter System (GLAS) on board the Ice, Cloud and land Elevation Satellite (ICESat) (Lefsky, 2010). This led to the development of two pan-tropical biomass maps Baccini et al., 2012) at reporting grid size of 1 km and 500 m respectively. The former reported a relative RMSE at pixel level of approximately 30%, while the latter reported similar figures in terms of RMSE (38-50 t ha −1 ). These maps exhibited significant regional differences, although these decreased when biomass estimates were aggregated to country or biome scale (Mitchard et al., 2013(Mitchard et al., , 2014Rodriguez-Veiga et al., 2016). Avitabile et al. (2016) fused Saatchi et al. (2011), andBaccini et al. (2012) datasets into a 1 km pan-tropical map using a bias-removal approach by incorporating additional field observations and locallycalibrated high-resolution biomass maps. The bias in the overall mean AGB was reduced to + 5 t ha −1 , compared with the biases in the input maps of + 21 t ha −1 and + 28 t ha −1 respectively. Using very long time series of C-band radar data from Envisat ASAR, Santoro et al. (2015a) produced a Growing Stock Volume (GSV) map for the northern hemisphere at 1 km spatial resolution. The relative RMSE of the retrievals at provincial level was between 12% and 45% (average 29%) when compared to National Forest Inventory data from the major forested countries. This map provided the basis for a carbon stock map of the boreal and temperate forests .
A first composite global dataset of forest AGB was developed within the European Commission-funded GEO-CARBON project. The product merged, at a pixel size of 0.01°, the Saatchi et al. (2011), andBaccini et al. (2012) pan-tropical datasets with the boreal and temperate dataset (Santoro et al., 2015a;Thurner et al., 2014) and used the IPCC Tier 1 biomass values for the few remaining areas not covered by these datasets (Avitabile et al., 2014(Avitabile et al., , 2016. This exercise, despite being hindered by limitations in the input EO data used by individual biomass maps, approximations in the retrieval approaches and the fact that the individual maps were based on data acquired at different times between 2000 and 2010, is still the most consistent global AGB map to date. Hu et al. (2016) also published a global AGB map at 1 km resolution derived using GLAS metrics interpolations, MODIS NDVI and Land Cover products and the SRTM DEM, together with climate data. However, the dataset used to calibrate the map consisted of 3348 forest inventory plots of different sizes (including very small plots of 0.05 ha). The calibration dataset was also geographically biased as the plots were mostly located in continental China (> 55% of plots) and Brazil (23% of plots), while almost no plots were used from Europe, North America, Australia, and Africa. These issues might explain the large differences observed in this map when compared to previous global and pan-tropical maps (Rodríguez-Veiga et al., 2017;Hu et al., 2016). Liu et al. (2015) used vegetation optical depth (VOD) retrieved from several passive microwave satellite sensors to map time series of AGB for all vegetation types globally over the period 1993-2012 at 27.5 km resolution. Unfortunately, this approach was calibrated using the Saatchi et al. (2011) map, which added the uncertainties from this product to the final map, and make it difficult to validate due to the coarse resolution of the product.
At continental scale, MODIS data and forest inventory plots have been used to map AGB over Europe (Gallaun et al., 2010) at 500 m resolution, and Africa (Baccini et al., 2008) at 1 km resolution. The woodlands and savannas of Africa were also mapped at 25 m spatial resolution using ALOS PALSAR data (Bouvet et al., 2018).
At national and regional scales, several examples have been published, such as for Mexico Cartus et al., 2014), Canada (Beaudoin et al., 2014), Cameroon (Mermoz et al., 2014), China (Yin et al., 2015;Piao et al., 2005;Liu et al., 2012), the Amazon basin (Saatchi et al., 2007), Russia , USA (Kellndorfer et al., 2011) and Colombia (Anaya et al., 2009), with spatial resolutions ranging from 30 m to 1 km and in most cases using a combination of optical and SAR imagery. Regional approaches use field AGB measurements to calibrate the algorithms, often complemented with airborne LiDAR datasets Perrin et al., 2016;Lu et al., 2012). These regionally-calibrated products can use a wider variety of datasets, as well as regional expertise, to provide the best possible estimates of biomass. In contrast, global, pantropical or continental products suffer from limitations in the amount and representativeness of data available for calibration and validation. Pantropical maps from Saatchi et al. (2011), andBaccini et al. (2012) circumvented the paucity of ground data for calibrating their nonparametric machine learning approaches at large scale by using AGB estimated from LiDAR footprints from the space-borne ICESAT-GLAS instrument. However, they still used a fair amount of ground-based values of AGB to calibrate the relationship between AGB and LiDAR footprint metrics. An algorithm that avoids the use of in situ data for model training is the BIOMASAR algorithm (Santoro et al., 2015a(Santoro et al., , 2011Cartus et al., 2012); the algorithm is, however, constrained with information on maximum biomass which are derived from inventory data, regional and national statistics, as well as remote sensing-based biomass estimates (Santoro et al., 2015a(Santoro et al., , 2011Cartus et al., 2012). Inaccurate data sources ultimately translate into local estimation biases (Santoro et al., 2011).
The long list of AGB datasets presented above highlights that biomass mapping methods are largely driven by data availability and are scale-dependent. National and regional products can be generated by different parametric and non-parametric approaches. Non-parametric methods, such as machine learning techniques, usually out-perform parametric approaches (Evans and Cushman, 2009) and are preferred at national and regional level if enough ground data are available. At global or continental level the lack of representative in situ measurements is the motivation for using physically-based approaches that require few ground data (if any).
This paper describes a diverse set of regional approaches to AGB mapping in different biomes carried out during the European Space Agency (ESA) Data User Element GlobBiomass project (GlobBiomass, 2015;Schmullius et al., 2015;Balzter et al., 2016;Schmullius, 2017). This study aimed to produce spatially consistent and accurate maps of AGB, using all available EO data and regional knowledge with the objective of supporting the development of global biomass retrieval algorithms and the assessment of thereof resulting estimates. These maps can be used for direct estimation of carbon emission factors or emissions contributing to greenhouse gas inventories. Further aims were: i) to better understand the strengths and weaknesses of existing methods to map AGB using available EO datasets, ii) to establish how differences in forest structure and reference data affect methods to invert EO data to AGB. Five regional AGB maps derived using reference data and EO imagery and various retrieval methods were generated for the year 2010 at 25-100 m spatial resolution. The regions were selected to encompass a wide range of biomes and forest types. Each region was at least 300,000 km 2 in size; it was either nationwide for Poland (temperate forest) and Sweden (boreal forest), or covered a substantial part of Indonesia (Kalimantan, tropical forest), Mexico (the Yucatan peninsula & Central Mexico, tropical forest-woodland transition), and South Africa (Eastern forest belt, subtropical dry forest). All maps were evaluated quantitatively against an independent dataset, and qualitatively by local experts. They were also compared with existing continental scale AGB maps Baccini et al., 2012;Thurner et al., 2014) where these overlapped the study areas.

Study regions
The study regions cover the most common range of woody AGB from low (< 50 t ha −1 ) to high (> 300 t ha −1 ) and are representative of major climates and forested biomes, including boreal, temperate, dry tropical savanna and wet tropics (Fig. 1).
Sweden is mostly situated in the boreal region, while Poland lies in the temperate forest zone. Sweden and Poland occupy approximately 447,000 km² and 313,000 km², of which 60% and 30% are forests, respectively. Coniferous forests predominate, though broadleaved forests occupy a significant area in Poland.
The study areas in Central Mexico, the Yucatan peninsula and Kalimantan represent a wide variety of tropical and subtropical forest  (Fick and Hijmans (2017) for global forests and study sites sampled at 0.5°grid scale. The climate space is divided into global terrestrial biomes (Whittaker, 1962(Whittaker, , 1970. The global distribution of forests is according to the Global Land Cover (GLC2000) map (Bartholomé and Belward, 2005).
ecosystems. The Yucatan peninsula study area with approximately 160,000 km 2 comprises a mix of tropical moist, tropical dry forests and mangroves, whilst an area of 353,000 km 2 in Central Mexico covers subtropical coniferous forest, tropical dry forest, tropical moist forest, xeric shrublands, and includes some of the forests with the highest biomass density in Mexico (the Oyamel forest). The Indonesian part of Borneo (Kalimantan) covers 73% of Borneo's land mass (approximately 540,000 km²). The ecosystems of Kalimantan include different forest types: mangrove forests, peat swamp and freshwater swamp forests, the most extensive extent of heath forests in Southeast Asia, lowland dipterocarp forests, ironwood forests, forests on limestone and ultrabasic soils, hill dipterocarp forests and various montane formations (MacKinnon et al., 1996).
The South African study area of approximately 334,000 km 2 is situated along a 1300 km North-South transect running to the East of the country next to Zimbabwe, Mozambique, Swaziland, and the Indian Ocean, and is dominated by forested landscape. This area contains various forest types: savanna (68% of the area), commercial plantations, and scattered remnants of indigenous dense forests (Mucina and Rutherford, 2006).

Data
Remote sensing imagery from different airborne and satellite sensors (optical, LiDAR and SAR) were utilized in this study (Table 1). Except for Sweden, the main dataset used was the freely available ALOS PALSAR 2009 and 2010 mosaics of gamma nought (γ°=σ°/cosθ, where σ°is the radar backscattering coefficient and cosθ is the local incidence angle) produced by JAXA at 25 m pixel spacing in HH and HV polarizations (Shimada et al., 2014). The ALOS PALSAR mosaics are processed according to a standard protocol (Shimada et al., 2014) which involves calibration, multi-looking (16 looks), projection, orthorectification and slope correction using the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) data. A destriping process (Shimada and Isoguchi, 2002) was also applied to try and equalise intensity differences between neighbouring strips normally attributed to seasonal and daily differences in soil moisture conditions. As part of our methods, if significant strip effects still remained locally, substitution by another year's mosaic or histogram matching of the problematic strip with neighbouring strips was performed. A multitemporal multichannel filter (Quegan and Yu, 2001) using a 7 × 7 window was also applied to all the annual mosaics. At this point, the remaining speckle effect was considered negligible.
Landsat 5 and 7 ETM + Surface Reflectance (SR) imagery computed by the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) method (http://ledaps.nascom.nasa.gov/) ( Masek et al., 2006) were used to generate annual ( ± 1 yr) median value composites from good quality pixels for all spectral bands in Mexico and Poland. Landsat Percent Tree Cover (PTC) products (Hansen et al., 2013;Sexton et al., 2013) for the year 2010 were acquired from USGS Land Cover Institute (https://landcover.usgs.gov/), and the Global Land Cover Facility (http://glcf.umd.edu/) for Mexico and Eastern South Africa, respectively. Additionally, freely available 30 m spatial resolution elevation data from the Shuttle Radar Topography Mission (void-filled SRTM Plus NASA V3) was obtained for Mexico, Eastern South Africa and Kalimantan from the USGS Earth Explorer repository (http:// earthexplorer.usgs.gov/). SPOT-4 High Resolution Visible Infrared (HRVIR) and SPOT-5 High Resolution Geometric (HRG) data were acquired between 2008 and 2010 (approximately 80% from 2010) for Sweden; all images were geometrically precision-corrected to the Swedish National Grid, and the pixel size for all bands was resampled to 25 m using cubic convolution.
The accuracies of the resulting maps were evaluated using either AGB forest inventory plots or airborne LiDAR-derived AGB maps, collected and produced according to different protocols (Table 1), and with characteristics specific to each region. Airborne LiDAR-derived Table 1 Datasets and algorithms used for each 2010 regional map.  AGB maps were used in Kalimantan for calibration and validation, but only for validation in Eastern South Africa (Naidoo et al., 2015). A subset of LiDAR-derived AGB maps or the field data (15-30%) was excluded in each region and used as an independent validation dataset. The subset was extracted by stratifying the reference dataset into different AGB classes in order to have a similar distribution of AGB in both the calibration and validation datasets.

Biomass estimation methods
Both parametric and non-parametric methods were used to predict AGB. These can be further grouped into data-driven and model-based methods. The method for each region was selected based on data availability and the expertise of each regional research group. Teams working in areas with forest inventory and other in situ data of sufficient number and quality for calibration purposes used non-parametric machine learning algorithms, while areas with insufficient in situ data used parametric models, such as model-based regression, and when available, complemented the ground observations with airborne LiDAR biomass predictions (Table 1).
The probabilistic outputs from the non-parametric MaxEnt algorithm (Phillips et al., 2006(Phillips et al., , 2004 were used for Mexico . Machine learning algorithms Random Forest (Breiman, 2001;Cartus et al., 2014) and a k-Nearest Neighbours (kNN) (Tomppo et al., 2008;Reese et al., 2003) were applied in Poland and Sweden respectively, both requiring large amounts of field plots for calibration. The parametric method used in Kalimantan used a two-step calibration approach where field plots are first used to calibrate airborne LiDAR measurements covering a larger area, and these were then used to calibrate a multivariate linear regression model with backscatter intensity and texture parameters from the SAR imagery as predictors (Englhart et al., 2011(Englhart et al., , 2012. Bayesian inversion of a semi-empirical model (the water cloud model -WCM) was used to relate PALSAR backscatter to AGB in South Africa (Bouvet et al., 2018). This method relies on a small number of at least 1 ha in situ AGB plots, ancillary data and simulations from the Multi-static Interferometric and Polarimetric Electromagnetic model for Remote Sensing (MIPERS) (Villard and Borderies, 2007;Villard, 2009) for parameterization.

Accuracy assessment methods
A standardised accuracy assessment was carried out for all regional AGB maps by making use of the independent reference data. The assessment was based on stratifying the reference AGB into contiguous ranges of values and quantifying the estimation bias, the standard deviation of the error and the Root Mean Square Error (RMSE) within each range. The selected ranges varied with test site, depending on the maximum value of biomass for the site and the need to have a sufficient number of reference data within each range.
In / . When the CV exceeds 1, the RMSE is dominated by random error, but when it is less than 1 the dominant error source is bias in the estimator. In particular, if CV = 10, then bias makes up 10% of the RMSE, while if CV = 0.48 it contributes 90%.

Results
The constructed AGB maps for the year 2010 were generated with a pixel size of 25 m for Mexico-Yucatan Peninsula, Central Mexico,   (Olson et al., 2001) derived from the combined regional maps in this study (light grey) aggregated to 1 km spatial resolution compared to the AGB histogram from the GEO-CARBON global map (dark grey) within the study areas. Flooded grasslands and savannas biome is not included in this analysis due to the small amount of data available from the study regions, while the temperate grasslands, savannas, and shrublands biome was not encountered in the study regions.
The regional maps cover the whole range of expected woody AGB densities from low to high biomass. The histogram of the combined regional maps was comparable to the histogram of AGB extracted from the global AGB map of the GEO-CARBON project in the different forest biomes (Olson et al., 2001) covered by this study (Fig. 3). The most substantial differences are on the tropical and subtropical grasslands, savannas and shrublands, and in the montane grasslands and shrublands biomes, where the GEO-CARBON map showed a strongly skewed histogram towards low AGB (< 50 t ha −1 ) and very low frequency of higher AGB classes, while this study showed a more distributed declining trend from low to high AGB classes in those biomes. Additionally, the GEO-CARBON map shows the same skewed histogram towards low AGB in the Mediterranean forests, woodlands, and scrub biome, while this study's histogram showed to be slightly skewed towards higher AGB (50-100 t ha −1 ). In the tropical and subtropical moist broadleaf forests biome histograms are similarly skewed towards high AGB, but GEO-CARBON's histogram showed higher frequency at the highest AGB range (> 250 t ha −1 ).
The accuracies of the regional AGB maps were assessed using the independent validation datasets (  Fig. 5). LiDAR AGB datasets were aggregated to the corresponding satellite map resolution for validation, while plot datasets used the average value of the pixels within the plot boundaries.
The accuracy analysis reveals several commonalities but also some important differences between study regions. All regions over-estimate AGB for the lower AGB ranges and, with the exception of Eastern S. Africa which covers a very limited biomass range, under-estimate in the upper ranges, especially in the highest AGB class. The bias in the lowest AGB class is substantial in absolute terms for all regions except Central Mexico and Eastern S. Africa, but even in these regions it has values that are 47% and 40% of the mid-range values (i.e., 15 and 10 t ha −1 ), respectively. For most regions the bias decreases in absolute value before increasing again. This is expected for regression-based approaches, which ensure that the regression curve passes through the point defined by mean of the reference and estimated data, but occurs for all methods.
There are striking differences in the balance between bias and random error in the RMSE, as is clear from Table 2 and Fig. 4. In Kalimantan, bias and random error are of similar magnitude except in the middle AGB ranges, where random error dominates. For Central Mexico and Eastern S. Africa, random error is dominant except for the highest AGB class in Central Mexico, where it is comparable to bias. Note that in these two regions the bias is roughly constant across all ranges (except for the highest range in Central Mexico) so it decreases sharply relative to the mid-range values. In the Yucatan Peninsula and Sweden, bias and random error are comparable in the lower biomass ranges, the middle ranges are dominated by random error, while bias is the largest component of error in the highest AGB ranges. For Poland, bias is by the Table 2 Accuracy assessment of the regional AGB maps stratified by reference AGB range: Sample size (N), Root Mean Square Error (RMSE), Relative RMSE (Rel. RMSE), Standard Deviation of the error (SD), Bias, and Coefficient of Variation (CV) of the error (when CV > 1 the random error dominates, when CV < 1 the bias does).    Outliers are not included. All the maps were aggregated to 1 km spatial resolution.
far the dominant error source apart from the two middle ranges of AGB. It is also noticeable that the SD of the error does not vary greatly across the different AGB ranges for each region, though is markedly different between regions (see Table 2 and Fig. 4). Hence, roughly speaking, the random error is not strongly dependent on the true AGB, and its value relative to the true AGB decreases as AGB increases (Fig. 5). It can also be seen that the contribution of the random error to the RMSE increases as AGB increases up to approximately 100 t ha −1 , then reduces sharply with increasing AGB (Fig. 5).
The values given for the overall bias in each region are close to zero (Table 2). This implies that all the methods cause the fitting curve (or its equivalent) to go through the point defined by the averages of the reference and estimated data, in common with normal regression methods which force this to happen. This explains why the overall RMSE and SD are nearly the same, since SD 2 = (RMSE) 2 -(overall bias) 2 and the overall bias is constrained to be nearly zero (Table 2).
Previously published pan-tropical Baccini et al., 2012;Avitabile et al., 2016) and northern hemisphere  mapping studies show AGB distributions and spatial patterns that are different from those for the regional maps from this study and the independent validation data (Figs. 6 and 7). In particular, in Mexico the distributions from Baccini et al. (2012) are shifted towards much higher values than is found in the present study and the validation data.
Similar estimates can be found in areas with high AGB levels such as Kalimantan, but they deviate from the reference AGB in areas of lower AGB such as Eastern South Africa, Yucatan and Central Mexico. In Central Mexico, the Saatchi et al. (2011) AGB distribution is similar to those in our study and the validation data, but for the Yucatan peninsula and Eastern South Africa it is shifted towards higher values while for Kalimantan is shifted towards lower values. In Avitabile et al. (2016) and this study, the AGB distribution is similar to that obtained from the validation data in Yucatan and Central Mexico but in Eastern South Africa the Avitabile et al. (2016) data are shifted toward lower values while this study is shifted towards higher values. In Kalimantan, the distributions of the validation data and all the maps, except for Avitabile et al. (2016), are highly skewed towards high values. Only the Avitabile et al. (2016) AGB map provides estimates similar to the true AGB in the Mexican sites. In Sweden, the AGB distributions estimated in this study and in Thurner et al. (2014) largely agree with the validation data. In Poland, the AGB distribution from Thurner et al. (2014) looks highly skewed and shifted towards lower values than the estimates from this study and in the validation subset.

Evaluation of the maps
The analysis of the regional maps affirms important properties of AGB estimation methods, some of which have been previously reported in the literature (e.g. Cartus et al., 2014;Carreiras et al., 2012;Englhart et al., 2012;Baccini et al., 2008;Sandberg et al., 2011;Rauste, 2005;Avitabile et al., 2016;Avitabile and Camia, 2018). The accuracy assessment shows underestimation in the upper AGB ranges in which the major error component of the RMSE originates from bias ( Table 2). The exception is in Eastern South Africa, where the reference values of AGB are below 80 t ha −1 . Additionally, although less apparent in Eastern South Africa and Central Mexico, AGB values below 100 t ha −1 are overestimated, regardless of the choices of data and method used to retrieve AGB (Table 2 & Fig. 4). Fig. 4 is derived from an independent dataset not used in model fitting, but similar behaviour is seen for data used in model fitting. This means that, although direct linear regression is not being used, all model fits used to predict AGB have an intercept at zero AGB that is too high, and/or the gradient of the model is too large for lower AGB values.
A major problem with the observed biases is that they depend on the true value of the AGB. If not, the data could easily be calibrated to remove them. Even though the analysis quantifies how the bias depends on the true AGB in each region, this does not lead to any way to correct the estimated values. Although some of the methods incorporate bias reduction measures, e.g. MaxEnt (Xu et al., 2016;Saatchi et al., 2011), and post-processing bias reduction techniques are also available, there is a risk of undesirable effects such as inflation of the overall mean square error due to an increase of the variance (Kosmidis, 2014;Xu et al., 2016). Addressing this problem requires new algorithms that intrinsically remove the bias (if this is possible), new data that do not  suffer saturation of the signal for higher biomass (assuming this is the primary cause of the observed underestimation in this range of biomass), or accepting that such bias will occur because AGB is only indirectly related to the remote sensing observables considered in these studies. However, as has been shown, for some regions and methods bias is not the dominant effect (i.e. Kalimantan and Central Mexico); instead, the most important type of error is random scatter of the data points in the model inversions, i.e., the model inversions are noisy. Some of this scatter may be reducible if its source can be identified, e.g., there may be errors due to inaccuracies in the ground data, geolocation errors (so that the reference data and inversions are spatially mismatched), radar speckle may have been insufficiently reduced (though our methodology ensures this should be of negligible importance in our analysis), or the remote sensing signal may be weakly correlated to AGB due to the limited sensitivity to biomass of the sensor. Moreover, if the dominant error term comes from scatter, it can be reduced by spatial averaging (whereas bias cannot) at the expense of reduced spatial resolution and aggregation to coarser spatial units. Pan-tropical (Saatchi et al., 2011;Baccini et al., 2012;Avitabile et al., 2016) and northern hemisphere maps  were compared to in situ data and the estimates from this study. Some similarities were found between our study and Avitabile et al. (2016) in the Mexico sites and Thurner et al. (2014) in Sweden. Even though Avitabile's map is a fusion of Saatchi's and Baccini's data, the fusion method used additional ground measurements and higher resolution regional maps to correct for the bias of the original estimates. This is the case in Mexico, where the Avitabile et al. (2016) product was calibrated with another regional AGB product (i.e. Cartus et al., 2014). However, in Eastern South Africa, Avitabile's map does not have calibration data and systematically underestimates AGB with maximum predicted AGB values just above 20 t ha −1 , close to a factor of three less than the higher AGB values reported in the validation dataset (60 t ha −1 ). In general, the distributions of the AGB estimates from this study are closer to those of the independent validation data than those of the published AGB global maps.

Strengths and weaknesses of proposed retrieval methods and available datasets
Several methods were used to predict AGB and showed specific strengths and weaknesses (The requirement for only a limited number of ground data points is an advantage of semi-empirical methods, such as the Water Cloud Model (WCM) + Bayesian inversion used in Eastern South Africa where only one parameter of the regression needs to be derived from in situ biomass data. However, the formulation of the WCM for this region does not produce estimates for AGB above˜100 t ha-1, and it is tuned for regions such as savannas with biomass below the saturation level at L-band . A method for global mapping, BIOMASAR (Santoro et al., 2011), is also based on the WCM and does not need in situ data to fit the model parameters. However, the BIOMASAR algorithm is understood to estimate growing stock volume (GSV), so a further step is needed to estimate AGB, which requires spatial information on specific wood density and biomass expansion factors, or similar proxies. Table 3). A key factor in the performance of regional methods was the amount and quality of available in situ data. In regions with abundant in situ data from a forest inventory, non-parametric datadriven methods, such as k-NN and MaxEnt were chosen, but in regions where data were scarce a model-based parametric approach was selected. Large numbers of forest plots were available in Sweden (22,548 plots) and Mexico (5140 plots). Poland (285 plots) and Kalimantan (247 forest plots) had fewer, but in Kalimantan airborne LiDAR biomass maps were developed to increase the size of the training and validation dataset. In the data-scarce region of the South Africa savannas only 37 1-ha AGB plots were available, but they were complemented with airborne LiDAR-based biomass maps derived from locally developed LiDAR models calibrated against field data (Naidoo et al., 2015) for validation. LiDAR airborne data was used for calibration and validation in Kalimantan, while only as validation in Eastern South Africa. This study assumed the AGB predicted by the LiDAR airborne to be errorfree. However, the use of LiDAR data might introduce substantial errors in the AGB prediction originated from the ground-to-LiDAR model used (Saarela et al., 2016;Holm et al., 2017), which are not accounted for in this study.
The requirement for only a limited number of ground data points is an advantage of semi-empirical methods, such as the Water Cloud Model (WCM) + Bayesian inversion used in Eastern South Africa where only one parameter of the regression needs to be derived from in situ biomass data. However, the formulation of the WCM for this region does not produce estimates for AGB above˜100 t ha −1 , and it is tuned for regions such as savannas with biomass below the saturation level at L-band . A method for global mapping, BIOMA-SAR (Santoro et al., 2011), is also based on the WCM and does not need in situ data to fit the model parameters. However, the BIOMASAR algorithm is understood to estimate growing stock volume (GSV), so a further step is needed to estimate AGB, which requires spatial information on specific wood density and biomass expansion factors, or similar proxies.
The most used dataset in this study was the L-band SAR ALOS PALSAR annual mosaics (Shimada et al., 2014;Shimada and Ohtaki, 2010), which were used in all regional methods except for Sweden. The saturation level of L-band SAR was found in previous studies to vary between 40 t ha −1 and 150 t ha -1 (Balzter et al., 2002a, b, Tansey et al.,   2004, Lucas et al., 2010;Hame et al., 2013) and generally HV gave higher saturation levels than HH polarization (Le Toan et al., 1992;Saatchi et al., 2007;Mitchard et al., 2009;Hamdan et al., 2011;Mitchard et al., 2012;Saatchi et al., 2011;Englhart et al., 2011;Hamdan et al., 2014). The saturation level at L-band depends on the geometry of the radar measurements, and therefore on forest type and environmental effects , as it can be observed in the different relationships found in the Yucatan peninsula and Central Mexico (Fig. 8). Use of the annual mosaics was also a limitation, as better results have been obtained with the use of multi-temporal SAR datasets due to the decrease in the retrieval error in AGB ranges to which the sensitivity of the SAR signal is weak Santoro et al., 2015bSantoro et al., , 2006Thiel and Schmullius, 2016;Antropov et al., 2017;Cartus et al., 2012). However, this was necessary due to the cost of acquiring the multiple ALOS PALSAR images used to generate the mosaics over large areas and for a given year. For the model-based approaches, information provided by optical sensors, such as Landsat Percent Tree Cover (e.g. in Eastern South Africa), was needed for parameterisation (Bouvet et al., 2018). Such datasets were also used in the non-parametric machine learning methods, where they contributed towards improving model performance. Three out of the five regional methods used optical data as predictor variables. DEM data from SRTM were also used in Mexico and Poland as a predictor variable and in South Africa and Kalimantan for correcting or masking terrain effects, respectively. The use of topographic information by machine learning approaches for forests located in mountainous areas contributed to the estimation of AGB in Central Mexico.
The evaluation of the maps showed that a crucial limitation of the retrievals is that underestimation occurs at high AGB ranges, and overestimation at low AGB ranges. Remote sensing of AGB (using either reflectance or radar backscatter) is subject to decreasing sensitivity to AGB as biomass increases. Hence changes in AGB above a saturation level result in changes in the remotely sensed variable that are small compared to the variability in the signal. In these circumstances it is readily understood how linear regression would lead to these effects. The regression line always passes through the point defined by the mean of the reference data and the mean of the estimates, and the fitting effectively rotates the line about this point in order to reduce the sum of squared differences between the linear estimates and the reference data. For a concave curve, such as is produced by saturation, it is then inevitable that over-estimation will occur for low biomass and under-estimation for high biomass: getting a good fit for low biomass tends to steepen the line, while for high biomass it reduces the slope, and the regression line trades one against the other. In the case just discussed the model does not properly capture the relationship between the signal and the reference AGB, either due to insufficient calibration data in the upper AGB range, or by fitting an inappropriate model to the AGB observations. However, if instead a fitting curve is used that correctly represents the saturation, bias is still to be expected for higher values of biomass. This occurs because, by definition, the backscatter values in the saturation zone are the result of random variation around the saturation level, with at most a weak dependence on biomass. Hence, although it is possible to use an estimator that assigns values of biomass above the saturation level, these represent the scatter in the data and the estimator will be biased towards the saturation level.
However, for correctly fitted data the overestimation at low AGB ranges is more difficult to explain. For SAR datasets, it could be connected to the high variability of the signal under soil moisture changes, as well as soil roughness (Mattia et al., 2009), but these do not apply to an optical-based method or to methods using SAR and optical data in synergy. Alternatively, it could be linked to the underestimation at high AGB ranges, as models provide the best fit by minimising an overall error term or cost function. The overestimation in the lower reference AGB ranges may stem from the model compensating for its inability to predict high AGB values accurately. In Kalimantan, the estimations were the highest, above 300 t ha −1 , and underestimation can only be observed above 200 t ha −1 . That underestimation occurs at such a high AGB level might be due to the large number of reference data points at high AGB levels compared to all other regions, which resulted in a fitted model with the largest errors and biases in the mid AGB ranges, between 50 and 150 t ha −1 , for which fewer calibration data were available (Table 2 and Fig. 4). Avitabile et al. (2016) also found overestimations in the low AGB range and underestimations in the high AGB range when validating pan-tropical datasets Baccini et al., 2012) against reference data. Several studies using the Random Forests regression algorithm have found similar behaviour at both ends of the AGB range, and report it as an effect of the averaging of tree-based algorithms (Baccini, 2004;Baccini et al., 2008;Avitabile et al., 2011;Urbazaev et al., 2018). Blackard et al. (2008) reported the same using a treebased method (i.e. recursive partitioning regression), but suggested that saturation of optical data could explain the underestimation for high AGB densities, and scaling issues between plot and pixel could explain the underestimations at low AGB. This effect, characteristic of treebased algorithms, could also explain the results in Poland, but cannot explain them in the other regions. Additionally other studies, such as Tsui et al. (2013) which used Kriging, Chopping et al. (2011) which used a geometric-optical canopy reflectance model, Del Frate and Solimini (2004) which used Neural Networks, or Sun et al. (2011), and Chi et al. (2015 which used multiple linear regression methods showed the same effects. Kattenborn et al. (2015) also reported the same effect for four semi-or non-parametric regression models (Random Forest, Generalized Additive Model, Generalized Boosted Regression Models  and Boosted Generalized Additive Model), and suggested insufficient calibration data at the low and high AGB ends as the main cause. However, the consistency across all these studies suggests that there is a fundamental problem in retrieving biomass from the available data, which may only be solved by the use of SAR data with higher sensitivity to small scattering elements such as C-band (e.g. Sentinel-1) on the lower AGB range, and data with greater sensitivity to large scattering objects (i.e. high biomass) such as the P-band  to be exploited by the ESA BIOMASS mission.

Conclusions
The regional forest AGB mapping methods presented here reflect both the variety of training data available in different regions and the diverse range of algorithmic choices of each regional team. However, the retrieved AGB values agree better with independent in situ data than those published recently Baccini et al., 2012;Avitabile et al., 2016;Thurner et al., 2014). As the different methods were not compared on the same site, we cannot comment on relative performance. However, we can form conclusions based on the commonalities observed from the comparison of standardized accuracy assessments.
The key EO dataset used in most methods was the L-band ALOS PALSAR mosaics, which provides the highest sensitivity to AGB of the currently available spaceborne sensors. However, it is clear that all current spaceborne sensors (SAR and optical) are inadequate for accurately estimating AGB beyond 100-150 t ha −1 . The case studies presented here highlight challenges of using sub-optimal datasets for this task. Any estimation beyond this range was dominated by negative bias or presented large errors for any of the given study sites. The assessment indicates, however, that one could push this limit in certain conditions, as seen in Kalimantan or Central Mexico. This could be linked to the use of large amounts of in situ data in the case of Kalimantan. In Central Mexico, a forest structure which leads to a higher L-band saturation level, or the contribution of additional datasets (i.e. the DEM) could be the cause.
There is also a general problem with overestimation at low AGB densities which cannot be entirely explained by the datasets used, but rather as an intrinsic problem of the proposed algorithms to correctly capture the relationship between EO data and AGB in the low AGB range. This means that we might have to consider alternative regression schemes, or accept the biases at both ends of the biomass range, provided that the modelling framework captures the relationship between observations and biomass.
This aspect shall deserve substantial attention in future studies as currently existing models for large-scale biomass estimation rely on simplifying assumptions that may not fully encompass the complex interaction of the remote sensing signal with vegetation The amount and type of reference data is also very relevant in terms of achieving the most reasonable AGB prediction model. Eastern South Africa used large plots (i.e. 1 ha), more suitable for calibration of EO methods , which might explain the good results for the low AGB ranges in this study area. Mexico, Sweden, Poland and Kalimantan relied on datasets of numerous small plots for calibration, supplemented with LiDAR in the case of Kalimantan. In the future, similar research should be based on homogeneous field-based datasets to avoid possible discrepancies resulting from the training data.
Better quality and more abundant large plots for calibration of the algorithms , the use of SAR time series (from Sentinel-1, ALOS 2 PALSAR 2, or future NovaSAR and NISAR) (Santoro et al., 2011;Antropov et al., 2017), the increasing availability of airborne or spaceborne LiDAR sensors like GEDI (Dubayah et al., 2014;Goetz et al., 2015) and MOLI (Kimura et al., 2017), satellites specifically designed for biomass estimation such as NISAR (Rosen et al., 2016(Rosen et al., , 2015, TanDEM-L ( Moreira et al., 2015), and the P-band BIOMASS SAR mission , and algorithms capable of reducing the errors in the low and high AGB ranges, are a promising way forward to improve global biomass estimates and reduce biases and errors in the map products in all biomass ranges.

Author contributions
P. Rodríguez-Veiga drafted the manuscript, and made the overall analysis and interpretation of the data. P. Rodríguez-Veiga and Heiko Balzter made the study conception and design. S. Quegan did the accuracy analysis, supervised the teams, and interpreted the results. Data acquisition, field work, mapping, and accuracy analysis, in each region was performed by the different teams organized as follows: Yucatan peninsula and central Mexico (P. Rodríguez-Veiga, K. Tansey Herold provided support for the teams on standardized definitions, and characterization of the uncertainties. C. Thiel, C. Pathe, and C. Schmullius were responsible for the conceptual design of the GlobBiomass project proposal, as well as for the overall coordination of the project. Moreover, C. Pathe assisted in SAR data acquisition and pre-processing. S. Quegan, H. Balzter, C. Schmullius, and F. M. Seifert supervised the project, and provided science advice on the work of the regional teams. S. Quegan, J. Carreiras, Y. Rauste, M. Santoro, N. Carvalhais, V. Avitabile, and R. Mathieu made critical revisions of the paper. All authors discussed the results, provided critical feedback, and helped shape the research, analysis and manuscript.