Estimation of the total dry aboveground biomass in the tropical forests of Congo Basin using optical, LiDAR, and radar data

ABSTRACT In this investigation, optical (SPOT-7 NAOMI), airborne LiDAR, and PolInSAR L-band data, along with forest inventories, were employed to develop models for estimating total dry aboveground biomass (AGB) over the tropical forests in the Congo Basin (Gabon) of Central Africa. Remote sensing-based variables like texture (from SPOT), median canopy height (from LiDAR), and backscattering coefficient along with canopy surface heights (from PolInSAR) were used to estimate the AGB. These variables were used individually (or combined) to develop the AGB models based on the multivariate adaptive regression splines (MARS) approach. Validation indicated that in case of the single variable models, the LiDAR-based model yielded the lowest estimation root-mean-square error (RMSE = 28%). The error decreased further when the median canopy height was combined with the texture or with the radar variables (RMSE <25%). The texture derived from the Fourier transform textural ordination (FOTO) approach was more effective in improving the results as compared to the gray level co-occurrence matrices (GLCM) approach. Model validation indicated that the best performance in AGB estimation was achieved by combining optical, LiDAR, and radar data (R2 = 0.89 and RMSE = 24%).


I. Introduction
Forests play a key role in mitigating the effects of climate change (Picard et al. 2018). After the oceans, forests constitute the second largest atmospheric carbon sink, which accounts for more than 70% of the total vegetation biomass (Villard et al. 2016). Carbon is captured and stored by the different components of the trees, which includes branches, trunks, leaves, and roots (Jones et al. 2019). Carbon content assessment routinely considers the total dry above-and belowground biomass (Scharlemann et al. 2014;Wade et al. 2019). Total dry aboveground biomass (AGB) of trees is the total quantity of dry organic matter that is located above the ground, which is accommodated by the branches, leaves, stems, stumps, bark, and fruits (Neumann et al. 2016;). This dry matter constitutes more than 80% of total forest biomass (Villard et al. 2016).
Traditionally, AGB is estimated using allometric equations, which are based upon forest inventory data. The critical variables that are generally used include diameter, height, and wood density (Chave et al. 2014;Ngomanda et al. 2014;Fayolle et al. 2018). AGB can also be estimated using a combination of remotely sensing data and ground measurements (Migolet et al. 2007;Coulibaly et al. 2008;Cartus, Santoro, and Kellndorfer 2012;Chaparro et al. 2019;Knapp et al. 2020). In this regard, remotely sensed information from a variety of sensors (for example, optical, radar, and LiDAR) at different spatial and temporal resolution have been used over the last several decades (Xue and Su 2017;Cracknell 2018).
However, remote sensing approaches confront various challenges, particularly over the tropical forests. These challenges can include saturation of indices extracted from optical and radar data, or effects of complex topography (Fung 1994;Ploton et al. 2017;Bouvet et al. 2018;García et al. 2018;Marselis et al. 2018;Chaparro et al. 2019;El Hajj et al. 2019;Hiatshwayo et al. 2019;Pourshamsi et al. 2021). Saturation of an index renders it insensitive to characterize changes in the target (Bréda, Soudani, and Bergonzini 2002;Aubert 2012). For example, due to the density of the vegetation, the C-band backscattering coefficient saturates at AGB values evaluated between 70 and 150 t ha −1 . Saturation can lead to an underestimation of forest variables, such as canopy height or AGB (Mermoz et al. 2015;García et al. 2018;Bouvet et al. 2018;Pourshamsi et al. 2021). Some indices are relatively less prone to saturation than the others. For instance, FOTO (Fourier transform textural ordination) based indices seem to better resist saturation in stands where AGB can attain 500 Mg ha −1 (Bastin et al. 2014;Tapamo et al. 2014). Synthetic Aperture Radar (SAR) data acquired in L-and P-bands (longer wavelengths) also offer excellent potential in tropical forests (e.g. Sinha et al. 2015;Cartus and Santoro 2019;Pourshamsi et al. 2018Pourshamsi et al. , 2021. LiDAR offers promising performance in estimating tree height (up to 40 m) and biomass (Zolkos, Goetz, and Dubayah 2013;Chen, Vaglio Laurin, and Valentini 2015;Réjou-Méchain et al. 2015;El Hajj et al. 2018;Silva et al. 2018). Nevertheless, LiDAR data might as well be affected by the complexity of the topography/vegetation and the extent of the spatial coverage being considered (Lefsky et al. 2005;Gopalakrishnan et al. 2015;Urbazaev et al. 2018). The links between AGB and remotely sensed indices are generally established using statistical approaches, such as multiple linear regressions (MLR) or power functions (Singh et al. 2015;Chong et al. 2017;Kamir, Waldner, and Hochman 2020). These approaches often have difficulty modeling high-dimensional data from diverse or redundant sources (Almeida et al. 2019b; Ku and Popescu 2019). Almeida et al. (2019b), while formulating AGB models over the Brazilian forests using a combination of LiDAR and hyperspectral data, showed that machine learning (ML) models performed better than linear regressions. In order to overcome the inadequacies of classical regression methods, ML techniques are increasingly being considered (Domingues et al. 2020;Knapp et al. 2020). Multivariate adaptive regression splines (MARS) is one such technique that models interactions and non-linearities in large datasets, which are regularly encountered in remote sensing (Abdulelah Al-Sudani et al. 2019) and ecology (García Nieto et al. 2019). In this technique, the model for estimating the dependent variable (like AGB) is established as a function of the most relevant biomass-sensitive variables (like forest height determined from the remotely sensed information), while the less significant and redundant ones are eliminated. MARS is well known for its performance of estimation. Investigations have reported better performance while employing MARS as compared to the other statistical approaches (Pramila and Mahesh 2015;Park et al. 2017), logistic regressions, and neural networks (Li et al. 2019). Since the early 2000s, several models have been proposed to produce AGB maps at various resolutions (ranging from 30 m to 1 km) and spatial scales: global (Chen, Peng, and Fei 2019); pan-tropical (Avitabile et al. 2016;Baccini et al. 2017); continental (Bouvet et al. 2018); regional (Baccini et al. 2008;Almeida et al. 2019b); and local (Bastin et al. 2014;Liao et al. 2019;Van Pham et al. 2019).
Over the tropical Central African nation of Gabon, Landsat Enhanced Thematic Mapper Plus (ETM+) (Goïta, Mouloungou, and Goze 2017), LiDAR , and radar (Cartus and Santoro 2019) or merged LiDAR-radar (Mitchard et al. 2012) data were used to predict AGB over selected sites. However, large uncertainties have been reported (Gopalakrishnan et al. 2015;Loubota Panzou et al. 2016;Li et al. 2017;Gonçalves, Malico, and Sousa 2018;Reichstein and Carvalhais 2019;Hansen, Potapov, and Tyukavina 2019). One of the major difficulties associated with AGB retrieval in tropical forests is the scarcity of both in-situ and remote sensing data, which is partly due to the complexity of these environments, their accessibility, and the presence of persistent cloud cover, among other obstacles. This situation is particularly true over the Congo Basin, where large-scale experimental remote-sensing data are limited. The AfriSAR 2016 campaign in Gabon, jointly conducted by NASA, ESA, and AGEOS, attempted to address this gap. The campaign enabled the acquisition of airborne LiDAR and radar data along with forest measurements (available at https:// daac.ornl.gov/). Many approaches that have been proposed for estimating AGB in tropical forests since 2017 are based upon these experimental data (El Hajj et al. 2018;Labriere et al. 2018;Saatchi et al. 2019). Nevertheless, important uncertainties remain in the estimated AGB. For example, Labriere et al. (2018), reported RMSE errors varying between 40.2 t ha-1 (23.3%) and 85.6 t ha-1 (49.7%), and bias between 9.61% and 15.9% for AGB map produced at a spatial resolution of 1 ha and 0.25 ha, respectively, in the Mondah forest (including Raponda Walker Arboretum) in Gabon. El Hajj et al. (2019), also produced such maps in Gabon at a spatial resolution of 0.25 ha using different algorithms and reported initially a RMSE of 63.3 t ha −1 (38.1%) with a bias of 19.1 t ha-1 (11.5%). The level of uncertainties still affecting AGB estimation, especially in tropical forests, motivates the development of new approaches in order to better address issues related to climate change and sustainable development Bouvet et al. 2018;Saatchi et al. 2019;El Hajj et al. 2019;Epple et al. 2016;Mulongoy and Cung 2011). These concerns serve as the main motivation for this research. Our study proposes new models for estimating AGB and indicates their effectiveness over the Congo Basin. We hypothesized that a combination of multisource remote sensing data (like optical, LiDAR, radar etc.) will offer the best potential for predicting AGB in the complex forest environments, which is often the case over the Congo Basin. A field campaign was conducted as part of this study to collect relevant forest variables from a number of test plots. These datasets were combined with those acquired during the AfriSAR campaign for this investigation. Unlike other investigations conducted over the Congo Basin, all models that were developed in this work are based on the MARS approach. Despite its promising potential, this approach has been under-utilized for estimating tropical forest AGB (Vaglio Laurin et al. 2016). The key contribution of this paper is the development of MARS-based models to better estimate AGB, based on a combination of multi-source data. The paper is organized in the following manner: Section I: Introduction, Section II: Data and Methodology, Section III: Results, Section IV: Discussion, and Section V: Conclusion.

Study area
The study site is located in Gabon (Congo Basin, Central Africa), which is a country covered by dense tropical rainforest. In 2010, Gabon's forests represented 88% of the country's total area i.e. 266,667 km 2 (Sannier, Fichet, and Massard 2014). The site chosen for this investigation is situated between 9°18ʹ42.29" E and 9°25ʹ53.10" E, and 0° 35ʹ19.72" N and 0°32ʹ1.58" N and has an area of 8092 ha (Figure 1). The area is characterized by low elevations (up to 41 m asl), and includes mostly mature and secondary forests, forest regrowth, and floodplains including mangroves. A portion of the site is included within the Raponda Walker Arboretum Conservation Area (formerly known as Forêt de la Mondah).

Data collection
This study is based on in-situ measurements that were acquired from two different field campaigns. The first is the AfriSAR campaign in Gabon, which was conducted in 2016 as a collaborative effort between NASA, ESA, and AGEOS (Fatoyinbo et al. 2018). This campaign included 15 one-hectare sample plots, but only 14 of them were used due to data availability. A publicly available database provides access to different measured variables, including the number of trees (Na) per plot and their diameters at breast height (D). All trees with a diameter greater than or equal to 10 cm have been counted, and their D measurements were measured with a tape at 1.30 m above the ground. Calculated variables are also available in the database, especially modeled maximum height (H T ) and AGB. For more details on how the in-situ measurements were taken during the AfriSAR campaign, please see Fatoyinbo et al. (2018) and Labriere et al. (2018). H T is calculated locally based upon measured diameters ) as equation 1: The standard residual error of the height in the Mondah forest is estimated around 7.4 m, i.e. about 23.2% of average maximum height in the region . The AGB of individual trees that were geo-referenced in the field within each 1-ha sample plot were estimated with the equation of Chave et al. (2014) (Fatoyinbo et al. 2018): where ρ is wood density (g cm −3 ), D is the diameter at breast height (cm), and H T is the modeled maximum height (m) of the tree. The variable ρ is derived with the BIOMASS package of R, based upon tree species, genus or family (Fatoyinbo et al. 2018;Donegan et al. 2014). According to Chave et al. (2014) the standard residual error of AGB estimation using equation 2 is evaluated to 0.357 kg, i.e. 0.03% of the average AGB from 4004 sample plots. The authors reported underestimation of 20% for AGB > 30 t ha −1 and overestimation of 2.7% for AGB between 10 and 30 t ha −1 .
The second campaign was carried out in this study specifically to increase the in-situ measurements. It was undertaken over the same site as the AfriSAR campaign from December 2017 to January 2018. A total of 27 plots were randomly sampled. Each measured 30.8 × 30.8 m, the size of which differed from those of the AfriSAR campaign (1-ha). Within each plot, the total number of trees (Na) were counted. Bole (trunk) diameters (D, cm) were measured with a DBH tape at 1.3 m above the ground surface. For each tree, its scientific binomial (genus, species, and authority) and family were recorded. Height H T and AGB were calculated, respectively, over the different plots using equations 1 and 2, similar to the AfriSAR campaign. Wood density values (ρ) of the trees were extracted from the International Center for Research in Agroforestry (ICRAF, Nairobi, Kenya) database (http://db.worldagro forestry.http://db.worldagroforestry.org//wd), based upon species, genus and family of each tree. The ICRAF and the BIOMASS package of R employ the same relationship to calculate the density.
Since the plots of the AfriSAR campaign had a different size (1 ha) compared to those used during the second campaign (30.8 x 30.8 m), a grid of 30.8 × 30.8 m sample plots was superimposed on each of the 14 AfriSAR plots (100 x 100 m each). In some cases, the 30.8 × 30.8 m grids were cut or did not fully align within the 1-ha AfriSAR campaign plots. These grids were excluded from the investigation to keep only those completely located inside the AfriSAR plots, which allowed random selection of 48 plots of 30.8 × 30.8 m. In addition to this, there were 27 plots from the second campaign. Therefore, a total of 75 sample plots of 30.8 × 30.8 m were available, which were randomly distributed across the study site. These plots contained a total of 3076 trees. The mean values of the related attributes are presented in Table 1.
In addition to the in-situ measurements, this investigation employed the following remotely sensed datasets. i) SPOT-7 New AstroSat Optical Modular Instrument (NAOMI) data: This data was acquired in 2015 and provided by l'Agence Nationale des Parcs Nationaux (ANPN) and AGEOS for research purposes. The image contains four spectral bands, including Blue band (450-525 nm), Green (530-590 nm), Red (625-695 nm), and Infrared (760 à 890 nm) at 6 m spatial resolution. In addition to this, there is a panchromatic band (450-745 nm) at 1.5 m resolution. The image is geo-referenced and corrected for unwanted radiometric effects. The four spectral bands are pansharpened with the help of the panchromatic band to enhance the spatial resolution to 1.5 × 1.5 m. As reported in previous studies, such a fine spatial resolution allows better characterization of texture and canopy crowns structure (Ploton et al. 2012;Migolet and Kalifa 2020).
ii) Full waveform airborne LiDAR data: This data was acquired by NASA's Land Vegetation and Ice Sensor (LVIS) during the AfriSAR campaign, with a density of 2.3 points per square meter . More specifically, this study uses the median heights (RH50, in m) of the canopy extracted from the LiDAR acquisitions. Past investigation observed that RH50 was more effective than top canopy RH100 heights in estimating AGB . Details on how canopy height information is derived from the AfriSAR LiDAR measurements is available in Blair and Hofton (2017).
iii) Polarimetric and interferometric SAR data (PollnSAR): This data was acquired at L-Band during the AfriSAR campaign (see details in Lavalle et al. 2018). The measurements were taken with the UAVSAR (Uninhabited Aerial Vehicle Synthetic Aperture Radar flown by Jet Propulsion Laboratory, Pasadena, CA). The radar variables considered here are the backscatter coefficient (σ 0 , in decibels dB) and the height to top of canopy (H R , in m). Backscattering coefficient σ 0 corresponds to the normalized value of the radar return from a distributed target. It depends on several factors such as radar frequency, surface type and dielectric state, as well as sensor geometry. L-Band σ 0 has been employed by different studies to estimate forest biomass (Villard et al. 2016;Agrawal and Tolpekin 2019). Canopy height H R was derived in the AfriSAR campaign using the Random Volume over Ground (RVoG) model applied to PolInSAR data ). More information on the model and its inversion can be found in Denbina, Simard, and Hawkins (2018) and Papathanassiou and Cloude (2001). H R is an estimation of forest height from PolInSAR that could be a relevant variable for the computation of AGB. The choice of σ 0 was dictated by its sensitivity to vegetation physical characteristics Mermoz et al. 2015;Agrawal and Tolpekin 2019;Pourshamsi et al. 2021).

Methodology
The methodology of the study focused on the following three components: The computation of texture indices, the development and validation of the AGB models, and the comparison of the estimated AGB with the existing AGB maps. The methodological flowchart of the study is presented in Figure 2.

Computation of the texture indices
The initial step involves the computation of texture indices from the SPOT-7 pan-sharpened image at 1.5 m resolution. The choice of texture indices was determined by their resistance to saturation under high biomass conditions as compared to spectral indices (Bastin et al. 2014;Tapamo et al. 2014;Ploton et al. 2017;Chen and Wang 2020;Li, Zhou and Xu, 2021). Texture quantifies the relative arrangement of gray levels, spatial patterns, and geometric structures within the image (Haralick1979). Its properties are exploited in a variety of image analysis and computer vision applications such as object recognition and classification. In the recent past, texture information has been employed for forest biomass estimation (e.g. Alberto et al. 2012;Kelsey and Neff 2014;Singh, Malhi, and Bhagwat 2014;Singh et al. 2015;Li, Zhou, and Wenbin 2021). It can be quantified using different statistical, frequency, and geometrical measures. In this investigation, texture parameters are derived using, i) A frequency approach based on the Fourier transform by textural ordination (FOTO). The FOTO approach combines two-dimensional discrete Fast-Fourier Transform (FFT-2D) and Principal Component Analysis (PCA). Its implementation requires different steps, which have been reported by several investigations (Couteron, Barbier, and Gautier 2006;Proisy, Couteron, and Fromard 2007;Ploton et al. 2012;Barbier and Couteron 2015;Migolet and Kalifa 2020). The initial step involves converting the spectra from the spatial domain to the frequency domain by calculating a periodogram (i.e. spectral density of the signal) for each pair of spatial frequencies with Fourier coefficients. This operation is generally performed with different window sizes within the image. The mean values of the periodograms must then be calculated in all possible sliding window directions to generate an averaged radial spectrum (r-spectrum). The r-spectra synthesizes the characteristics of texture variations in terms of grain fineness, which is important for characterizing forests. Coarse grain textures are related to forests with higher heights and larger crowns, while fine grains correspond more to forests with small to medium crowns. In tropical forests, and particularly in our study area, coarse and fine texture grains characterize mature and secondary forests, respectively (Couteron, Raphael Pelissier, and Paget 2005;Couteron, Barbier, and Gautier 2006;Barbier et al. 2012;Ploton et al. 2012;Migolet and Kalifa 2020). The next step involves applying textural ordination to the r-spectrum values in order to characterize their variability, measure the dispersion between them, and reduce the dimension of the data. To realize the ordination, the r-spectrum values are integrated into a two-dimensional table, where each row corresponds to the r-spectrum of a given window and each column contains the amplitudes of the r-spectrum. The column values are then normalized and subjected to PCA. The PCA scores of the r-spectra correspond to the FOTO indices. Texture grain types are usually interpreted from the factor scores that are derived from principal components (PC) 1 and 2. Negative scores on the PC1 axis indicate coarse texture (mature forests), while positive scores are related to fine texture (secondary forest). On the other hand, PC2 explains the openness of the canopy, with negative and positive scores for closed and open canopies respectively. This ability to describe forest structures appears useful for estimating structural parameters (height and diameter) or AGB (e.g. Proisy, Couteron, and Fromard 2007;Barbier et al. 2012;Migolet and Kalifa 2020).
In this investigation, the FOTO approach was applied on the pansharped SPOT-7 near-infrared band due to its high sensitivity to vegetation (Proisy, Couteron, and Fromard 2007). Unsupervised classification by K-Means clustering was applied to the image to mask all areas except those covered by forests. By considering the previous studies in forest environments (Couteron, Raphael Pelissier, and Paget 2005;Couteron, Barbier, and Gautier 2006;Proisy, Couteron, and Fromard 2007;Ploton et al. 2012;Barbier and Couteron 2015), four increasing window  sizes were considered for the application of FFT-2D, namely, 75 m, 100 m, 150 m and 175 m. The r-spectra were then generated and analyzed for the different window sizes.
ii) The gray level co-occurrence matrix (GLCM) proposed by Haralick (1979) (Coulibaly and Gwyn 2005;Alberto et al. 2012;Kelsey and Neff 2014;Moya et al. 2019;Liao, He, and Quan 2019). In addition to the FOTO approach, the GLCM technique was employed to generate the texture parameters. Six parameters were selected (Table 2) based on their performances reported in the previous studies on forest environments (e.g. Alberto et al. 2012;Kelsey and Neff 2014;Singh, Malhi, and Bhagwat 2014). GLCM indices can be extracted from either the red or the near infrared band. Based on past investigations, the red band was chosen over the infrared band to generate the GLCM parameters (Alberto et al. 2012;Singh, Malhi, and Bhagwat 2014;Hiatshwayo et al. 2019). The reflectance in the red band is sensitive to the absorption by the chlorophyll in plant tissues for photosynthesis (Devineau 1990;Briottet 2016). The contrast in the light absorption requirements of the different forest species probably leads to better sensitivity of texture features in forested environments. For the computation of GLCM indices, four sampling directions (0°, 45°, 90°, and 135°) and a single inter-pixel distance were adopted in order to maximize discrimination among the land cover classes (Messier, Cavayas, and Pierre 2001). Several window sizes were tested during the parameter computation. Based on the analysis of curves of the coefficients of variation, the optimal window size was chosen to be 5 × 5 m (Coulibaly and Gwyn 2005).

Model development and validation 2.3.2.1 Preliminary processing and analysis.
The randomly selected ground samples (75 sample plots) were analyzed with respect to the remote sensing measurements (RH50, H R , σ 0 , and texture indices from FOTO and GLCM). Total in-situ AGB for each sample plot was determined by summing the AGB value for each tree located inside the plot. Moreover, for each of the 75 plots, we calculated the average modeled maximum height H T and the diameter at breast height D. Mean values of the remote sensing parameters considered were also calculated for each plot. Using basic allometric principles (Gould 1966;King 1996;Picard, Saint-André, and Henry 2012;Chave et al. 2014;Ngomanda et al. 2014), fieldbased variables were compared to remotely sensed information to analyze their relationships that  (2005) Standard deviation Standard deviation characterizes the gray level dispersion of the basic elements or patterns from which the texture is formed Standarddeviation ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi

Coulibaly and
Goita (2006) Homogeneity Homogeneity quantifies the level of texture uniformity. Homogeneity ¼ Singh, Malhi, and Bhagwat (2014) Entropy Entropy measures the degree of organization or disorder of the texture. Entropy ¼ Measures the linear dependence of gray levels in an image Correlation ¼ ði À jÞ 2 P i; j; d; θ ð Þ P(i,j,d,θ) is the probability of moving from gray-level pixel i to gray-level pixel j that are located at a distance d from one another in an orientation θ. The values µ x and σ x are respectively the mean and standard deviation of the rows of the co-occurrence matrix, while µ y and σ y define the same statistics for the columns. L corresponds to the maximum quantification level of the gray levels.
supported the AGB model development process. Before developing the AGB models, a preliminary dispersion analysis on field data (H T , D, AGB) was performed in order to have insights on their distributions. Outliers were analyzed in order to identify potential sources of field errors related to measurement records and calculations. Spatial autocorrelation analysis was performed to characterize the existing spatial relationships between the different AGB sample plots. Both global Moran's I (Moran 1950) and Anselin local Moran's I (Anselin 2010;Ou et al. 2019) statistics were employed in this investigation. Global Moran I index evaluates the global autocorrelation at fixed distance bandwidth. It indicated the spatial autocorrelation over the whole study area. Anselin local Moran Index was employed to identify the statistically significant clusters and outliers, and to describe the local spatial autocorrelation between the AGB sample plots. Typically, two potential AGB clusters (High-High (HH), Low-Low (LL)) and two potential outliers (High-Low (HL), Low-High (LH)) could be identified from the analysis. Moran's local (or Global) index values vary between −1 and +1. Positive value indicates a strong correlation and a significant clustered pattern over the AGB sample plots. A negative index shows negative correlation between neighboring AGB and a grouping of dissimilar values. Index value near or equal to zero indicates an absence of correlation and clustering. In this study, P-value (statistical significance at 95% confidence level) were used to interpret the calculated spatial statistics and determine the presence or absence of autocorrelation in the AGB sample plots data (Bouayad Agha and Bellefon, 2018; Baloloy et al. 2018;Ou et al. 2019). As in the work of Ou et al. (2019), the spatial autocorrelation was analyzed using different distance bandwidths. Here, the minimum, mean, and the maximum distances between neighboring sample plots were 28 m, 4073 m and 12,515 m, respectively. A bandwidth of 12,000 m was fixed initially for local autocorrelation analysis.

AGB models development and validation.
Following the autocorrelation analysis, 62 out of the 75 available sample plots were used in the development and validation of the AGB models. Among these, 45 plots were randomly selected to train the models. Different statistical metrics (correlation, P-value, etc.) were used to analyze the strength of the modeled relationship. The remaining 17 plots were used to validate the models and compare the estimated AGBs with the existing AGB maps. The data values of these sample plots are presented in Appendices 1 and 2. Models linking ground-based AGB to remotely sensed variables were established with the MARS algorithm (Friedman 1991). The choice of this approach is based on the fact that it considers the non-linearities and multi-dimensionality, while remaining simple in terms of its formulation and interpretation. In order to achieve this, MARS employs a sum of basis functions, the general form of which is given as follows (e.g. López-Serrano et al. 2016): where y i are the observed values of the independent variable, n 1 is the number of observations, f M (x i ) is the MARS model with basis functions M, x are the observed values of the predictors that are included in the MARS model, and P M is the number of parameters in the MARS model. In this research, several well-known statistical metrics are used to evaluate and validate the models. These include the coefficient of determination (R 2 ), the P-value, root-mean-square error (RMSE) and relative error (RE). Details on these metrics can be found in Ngomanda et al. (2014), Yang et al. (2017) or .

III. Results
The major results are summarized in this section, which includes the development of models based on the MARS approach and their validation. Even if the various texture indices were used during the process of developing the AGB models, only FOTO indices are explicitly explained here for the sake of brevity. Since the GLCM parameters are wellknown in the literature, they are not presented. Furthermore, the results of the comparison between the best model developed here and the existing AGB maps are not shown, but are discussed in Section IV.

Characterization of the FOTO indices
The r-spectra for each of the five analysis windows (75, 100, 125, 150, and 175 m) were generated following the application of the FOTO method (Section 2.3.1). The values were initially highest with the smallest window, and they decreased progressively with increasing window size, particularly for wave number less than 4 (Figure 3 (a)), and for spatial frequencies lower than 30 km −1 cycles (Figure 3 (b)). The highest proportion of explained variance (86%) was obtained with the 175 m window size (Figure 3 (c)), which captures more information in the r-spectra. Scatter analysis of the PC1 and PC2 scores from PCA of the r-spectra allowed four types of forest textures to be distinguished for each window size ( Figure 4). The 175 m window size displays the greatest dispersion, starting from the origin of the orthogonal plane, which meant greater discrimination between textures ( Figure 4 (e)). Thus, the 175 m window size was chosen, considering the best-explained variance proportion that it achieved and the dispersion of its PCA scores. Figure 5 illustrates the different texture types of the study area generated at the 175 m window size. Red represents a fine and closed texture. This texture is more consistent with the young-adult or floodable secondary forests (according to the field observations). Green indicates coarse and open texture, which corresponds best to the mature forests. Blue highlights coarse and closed texture, i.e. the old-growth secondary forests. Yellow indicates fine and open texture, which is associated with mature and secondary youngadult or floodable forests over the study area. A closed texture indicates the absence of gaps in the forest stand, which is not the case with open texture. This presence or absence of gaps in the canopy also indicates the existence or nonexistence of AGB on the ground. The closed or open texture is an indicator of the continuous presence or absence of AGB. Figure 6 illustrates the results of autocorrelation according to Anselin local Moran's index. The quantitative results are indicated in Table 3, including P-values (P). Around 2/3rd (61.3%), i.e. 46 of the 75 AGB sample plots indicated no spatial autocorrelation (P > 0.05). The results also show two significant clusters (P between 0.002 and 0.05), including a LL cluster (28% of the data) and HH cluster (about 9% of the data). One main LH outlier group is also found in the data with 21 AGB sample plots (P between 0.002 and 0.05). The results of the global Moran I index (shown in Table 4) indicates a global positive spatial autocorrelation of AGB data over the study area. In order to ensure that the autocorrelations are not sensitive to the approach used to derive sample plots from AfriSAR, we conducted the same analysis over the original AfriSAR campaign plots. Both Anselin local Moran I and global Moran I were calculated and analyzed. According to the results (Figure 7, Table 4), 66.7% of the original AfriSAR AGB sample plots indicated no autocorrelation. However, two clusters were observed that included a HH cluster containing 8.3% of the data and a LL cluster with 25% of the plots. As in our case, the global Moran I indicated a positive global autocorrelation over the study area. In summary, the spatial autocorrelations in the original AfriSAR AGB data is quite similar to the ones observed over the 75 sample plots used in this study. There is no spatial autocorrelation in the majority of the data considered. Accordingly, we used them in conjunction with the remotely sensed data to develop and validate the AGB models.

Developed AGB models
Before the development of the AGB models, the correlations and P-values at 95% confidence level were first computed between the different variables under study (Table 5). This analysis identified the remote sensing variables that could contribute the most to the AGB estimates, as well as highlighting co-linearities among predictors. AGB is significantly correlated with RH50 height of LiDAR (R 2 = 0.77, P < 0.0001) to a larger degree than to any other remotely sensed variables.
The importance of the FOTO analysis was also demonstrated; after RH50 of the LiDAR, the FOTO PC2 index is the second most strongly correlated variable with AGB (R 2 = 0.54, P < 0.0001). The two indicators from the  L-Band PolInSAR dataset (σ 0 and H R ) follow, with similar R 2 ranging between 0.45 and 0.47. Overall, the GLCM texture indices exhibited weaker correlations with AGB (the four most significant are shown in Table 5). Therefore, for the rest of the study, we retained only two GLCM indices, i.e. the mean and the contrast, which indicated correlations of R 2 = 0.22 and 0.13 with AGB respectively. Appendix 3 presents the scatter plots of the relationships between AGB and the different biomass-sensitive variables considered for this investigation (RH50, FOTO PC2 index, σ 0 , and GLCM mean texture index). AGB models were developed following the correlation analysis summarized in Table 5 and Appendix 3. The models with the highest estimation performance are tabulated in Table 6. They are classified into two categories. The first category presents the set of seven models that were developed using the individual or combined variables from either the SPOT-7 optical image, or from radar, or LiDAR. The second category contains the seven models developed by combining the variables from the different sources of remotely sensed data.
The statistical metrics associated with each model are summarized in Table 7. Models (1 and 2) using only the individual GLCM texture indices (contrast or mean) yielded the lowest, albeit significant, results. The errors associated with these models are very large (RMSE > 84%). The use of the individual variables extracted from the radar data (σ 0 and H R ) in Models 3 and 4 yielded improved results compared to those of the texture indices. Nevertheless, the errors on AGB remained large (R 2 : 0.47 to 0.51; RMSE: 69% to 72%). Model 5 combined the FOTO indices (PC1, PC2, and PC3), and Model 6 was based upon the radar variables σ 0 and H R . The two models yielded comparable  performances (R 2 : 0.60 to 0.64; RMSE: 60% to 63%). The results thus obtained surpassed those of the aforementioned single-variable models, but the errors were still high. Model 7 was developed using only the RH50 height of the airborne LiDAR. It produced significantly better results than Models 1 to 6, which were based upon either the optical image or PolInSAR data (R 2 = 0.77; RMSE = 47%). Regardless of the combinations used, optical data alone always produced substantial errors. The best model that was based solely on the optical image is Model 8, which combined FOTO PC2 and PC3 indices with the GLCM mean texture index (R 2 = 0.67; RMSE = 57%). The results are greatly improved when the radar indices (σ 0 and H R ) are combined with either the GLCM texture indices (Model 9) or the FOTO indices (Model 10). These two models yielded the same error levels (R 2 ~0.76; RMSE ~49%), which is comparable to the error produced by the RH50-based Model 7. Finally, our results indicated that the introduction of LiDAR RH50 into any combination produced the best results, regardless of the combination considered, i.e. with GLCM parameters (Model 11), FOTO indices (Model 13), PolInSAR (Model 12), or with a combination of optical and PolInSAR data (Model 14). These four models (11 to 14) were the most efficient ones. They yielded almost similar metrics (R 2 : 0.82 to 0.83; RMSE: 40 to 43%).

Biomass model validation
In order to validate the AGB models, the estimated values were compared to the AGBs of the independent sample plots set aside for validation purposes. The results reported in Table 8 confirmed that models based solely on the GLCM texture indices (Models 1 and 2) produced large errors (RMSE > 59%). Model 3 and 4, which are based on the radar indices σ 0 and H R respectively, yielded significantly better results, with much lower errors than those observed during the development phase (RMSE = 50 to 53%). Moreover, the combination of the two variables (σ 0 and H R ) in the same equation (Model 6) reduced the error by more than 10% compared to Models 3 and 4. Model 5, based solely on FOTO indices, yielded higher errors during the validation process (RMSE about 74% versus 63% in the training). Overall, the validation phase confirmed that the best single-variable model for estimating AGB is Model 7, which employed the RH50 of LiDAR (R 2 = 0.84; RMSE = 28%).
Examination of the results of variable associations (Table 8) revealed that the combination of texture indices derived solely from the optical image (FOTO and GLCM indices) in the same equation did not improve the AGB estimates (Model 8). However, a complementarity was observed between optical and PolInSAR data. The results improved significantly when radar indices were combined with GLCM texture indices or with FOTO indices (Models 9 and 10: R 2 = 0.44 to 0.59; P ≤ 0.004; RMSE = 55% to54%). This improvement was also observed during the development phase. Finally, the validation result in Table 8 indicate that any combination involving the LiDAR variable RH50 (Models 11 to 14) allowed for a better retrieval of the AGB. This is particularly noticeable for the combination of RH50 and radar indices (Model 12: R 2 = 0.90; RMSE = 24.1%), the FOTO indices of the optical image (Model 13: R 2 = 0.88; RMSE = 24.2%), and the combination of RH50 with all other variables (Model 14: R 2 = 0.89; RMSE = 23.7%). The combinations employing the FOTO texture indices (Models 10 and 13) gave better results than those integrating the GLCM indices (Models 9 and 11). The scatterplots of the observed and the estimated AGBs with the four models are presented in Appendix 4. The results from Models 12 and 13 are quite close to those of Model 14.
Considering both training and validation phases, Model 14 emerged as the most successful predictor of AGB in this investigation. This model was employed to     (Continued) produce AGB maps at a spatial resolution of 30.8 × 30.8 m as can be observed in Figure 8. AGB is shown in four-color ranges (Figure 8 (a)), while the locations of the sample plots are illustrated on SPOT-7 true color composite image (Figure 8 (b)). The maximum estimated AGB values in the area attained values as high as 585 t ha −1 . Areas with the highest AGB values (≥240 t ha −1 ) are more prevalent in the northwest and north-central part of the study site (dark-green color, Figure 8 (a)). They correspond to areas covered mostly by mature forests that are under conservation and are included in the Raponda Walker Arboretum. In the same areas, there are old secondary forests with biomass values between 146 and 240 t ha −1 (medium-light green color, Figure 8 (a)). Youngadult or floodplain secondary forests (medium-dark green color, Figure 8 (a)) are scattered over much of the area, with biomass values ranging from 52 to 146 t ha −1 . Finally, the lowest AGB (<52 t ha −1 ) coincided with bare soils or forest regrowth (light-yellow color, Figure 8 (a)). These areas are generally concentrated outside the conservation zone to the south, east, and northeast.

IV. Discussion
This investigation combined a set of new and past datasets (from the AfriSAR campaign) acquired over the tropical forests (both in-situ and various remote sensing measurements) in the Congo Basin. The MARS-based methodology proposed in this work demonstrates the potential to combine multi-source remotely sensed variables to improve the AGB estimation. The study also explored combinations of different derived information like texture (FOTO and GLCM indices), forest height, and radar backscattering coefficients. The overall statistics indicated that the AGB estimation error remained high (≥23% , Table 8), which can be attributed to uncertainties that may be contributed by different sources like quality of the insitu measurements and sensitivity of the remotely sensed data. For instance, Chave et al. (2009) reported mean errors of about 27% on wood density    .
In this study, standard residual errors between modeled forest height (HT, Equation 1) and corresponding heights derived from LiDAR and PolInSAR are 2.6 m (17.0%) and 3.07 m (20.3%), respectively. All these factors are critical in the computation of AGB using Equation 2. Moreover, the difficulties related to data collection in tropical forests may influence the sampling framework in the field, as well as the scale and representativity of the datasets. The analysis of Moran's Global index (Table 3) computed over sample plots (30.8 x 30.8 m) at the study site and those from the AfriSAR campaign at 1 ha (Table 4) reveals the existence of spatial autocorrelation in the AGB data. This global index characterizes the overall distribution and spatial structure of AGB in the whole study area. Spatial autocorrelation and heterogeneity are rather frequent phenomena in forests Méthot et al. 2014;Ou et al. 2019). In contrast, the use of local Anselin Moran's I to evaluate autocorrelation at local scales shows no spatial autocorrelation for 61% of AGB samples (from sample plots) used in this study (Table 3), and for about 67% of the data in the AfriSAR campaign (Table 4). Therefore, the strategy employed here to derive sample plots from the existing AfriSAR samples (1 ha plots) did not introduce unexpected spatial autocorrelations. The level of observed autocorrelation is comparable to several other investigations reporting the development of the AGB models (Ou et al. 2019;Ploton et al. 2020).
The GLCM texture indices indicated low to moderate correlations with biomass, compared to other indicators (e.g. FOTO PC2, H R , σ 0 or RH50, see Table 5). In the past investigations, these indices have often yielded mixed results, depending upon several factors such as image type, window size, interpixel distances, or sampling directions (Coulibaly and Gwyn 2005;Nichol and Sarker 2011;Alberto et al. 2012;Kelsey and Neff 2014;Pandit, Tsuyuki, and Dube 2020). In this study, we did not test several interpixel distances while estimating the indices. Only one fixed value was used. This may explain the relatively poor performance of the GLCM. Unlike the GLCM parameters, the results obtained with FOTO indices represented the vegetation better. The FOTO PC2 index described the forest structure better, which is also supported by the strong correlation with stand diameters (Table 5). However, this result differed from those reported in the previous studies, where the strongest correlations were found with the FOTO PC1 (Proisy, Couteron, and Fromard 2007;Singh et al. 2015;Migolet and Kalifa 2020). In this investigation, FOTO PC1 best described young-adult or floodprone secondary forests (notably mangroves). Unfortunately, the limited number of sample plots over these stands did not permit further investigation.
The strong link observed between PC2 and AGB enabled the development of relatively efficient MARS models, using only the SPOT-7 optical image (see Table 7, Model 5: R 2 = 0.60; P < 0.0001; RMSE = 63%). This was favored certainly by the fine spatial resolution of the pansharpened SPOT-7 image (1.5 m × 1.5 m), which allows better discrimination of the tree crowns. Moreover, the choice of the window size to derive the r-spectra is also crucial. Here, a window size of 175 × 175 m showed better distinctions between fine and coarse textures in closed and open forest canopies. Although, increasing the window size tend to improve the discrimination of fine and coarse textures, it could reduce the scoring for each principal component (Couteron, Barbier, and Gautier 2006;Proisy, Couteron, and Fromard 2007;Ploton et al. 2012;Barbier and Couteron 2015;Migolet and Kalifa 2020). In agreement with the previous investigations, this work also indicated the strong potential of L-band PolInSAR variables for estimating AGB in tropical forests. The sensitivity of the HV polarization channel to forest cover and volume scattering can explain this behavior (Ho Tong Minh et al. 2018;El Hajj et al. 2019). The backscatter coefficient σ 0 also appeared to be significantly related to the forest parameters (diameter and height) and AGB (Table 5). This permitted the development of Models 3, 4 and 6 (Tables 8 and 9) using the MARS approach with only L-band PolInSAR data. Validation results (Table 8) indicated that combining σ 0 and H R reduced the error by at least 10%.
The limitations on using the L-band data may arise from possible saturation effects in very dense forest areas, such as the north-western part of the study area. The introduction of texture information to the radar backscattering (Models 9 and 10) does not improve the AGB estimation errors. Among the individual variables that were considered, the LiDAR derived median canopy height (RH50) demonstrated the greatest potential for estimating forest AGB with low validation errors (Table 8, Model 7: R 2 = 0.84; P < 0.0001; RMSE = 28%). This result is consistent with previous studies Saatchi et al. 2019). Leitold et al. (2015) emphasized the importance of the density of LiDAR point cloud measurements in improving the accuracy of height estimates from LiDAR and AGB data. However, the density of 2.3 LiDAR points per square meters did not seem to be a limiting factor in this investigation. The results obtained solely with the RH50 height are quite interesting and significant, but our analysis indicated that they can be substantially improved by introducing additional variables extracted from radar or optical data. The best performance was actually observed for Models 11 to 14 that included LiDAR derived heights (Tables 8,9). The combinations with the radar σ 0 (Model 12) or with the FOTO indices (Model 13) proved to be particularly efficient, with RMSE errors of the order of 41% in the training phase (Table 7) and 24% in the validation phase (Table 8). Radar and optical data provide complementary information on the spatial structure of the forest (volume, crown, and density), thereby favoring better estimation of the AGB. Previous investigations also reported improvement in the results after combining multi-source variables (Sun et al. 2011;Zhang et al. 2014;Kumar et al. 2015;Almeida et al. 2019b;Li, Zhou, and Wenbin 2021).
Our research focused solely upon the use of the MARS approach in developing the AGB models. This was motivated by the fact that the approach has been seldom employed despite its potential being recognized by several investigations (Vaglio Laurin et al. 2016;Baloloy et al. 2018;Migolet and Kalifa 2020). The ability of the MARS approach to take into account the non-linearities in modeling is particularly valuable in the context of modeling the tropical forests biomass. In most of the models that have been developed, MARS circumvents the necessity of including non-linearities by summing linear basis functions. Interaction terms among variables have been rarely used here, except with optical data, which have much more indirect relationships with AGB. The relatively low number of the training dataset used may also contribute to the absence of the interaction terms (Li, Bakshi, and Goel 2009). In some cases, RMSE errors observed during the model validation appeared to be lower than those obtained during the training phase. The differences in the representativity of validation and the training dataset can probably explain this behavior (Zhang and Anthony 2016;López-Serrano et al. 2016;Vaglio-Laurin, 2015;Park et al. 2017;Li et al. 2019). This calls for further validations of the developed AGB models with more detailed forest datasets.
To further investigate, we compared the results of Model 14 to existing AGB maps. The seventeen independent validation sample plots were used in the evaluation in each case. Figure 9 indicates the comparison. Our model has a small bias and can reasonably estimate both lowest (<250 tons per hectare) and highest AGB (>400 tons per hectare). In contrast, most existing models overestimate AGB when biomass approaches 250 tons per hectare. Overall, the results are better for all the maps produced using local models developed over the Mondah forest site, compared to the ones at country (El Hajj et al. 2019) or regional (Bouvet et al. 2018) scales. This is most likely due to the representativity of the insitu measurements employed in different investigations. However, due to the limited number of samples used as well as the scale and spatial resolutions considered by various investigations, the comparisons should be interpreted with caution (Saatchi et al. 2011;Baccini et al. 2012;Ngomanda et al. 2014;Baccini et al. 2015;Baccini et al. 2017).

V. Conclusion
This investigation proposed the MARS-based modeling approach employing multi-source datasets to estimate aboveground biomass and demonstrated its utility over the tropical forests in the Congo Basin. The results indicated that the LiDAR derived median height is the most efficient predictor of AGB as compared to all other individual variables (RMSE = 28%). The performance of AGB estimation was enhanced when median height was combined with optical and radar observations (RMSE between 23% and 33%). The best model combined different variables extracted from the remotely sensed observations (LiDAR median height, PolInSAR target backscattering coefficient and canopy surface height, and Optical texture indices) to achieve a better performance (RMSE = 24%). This research Figure 9. Comparison of AGB estimated using Model 14 with AGBs from existing maps. The dotted gray line represents the 1:1 correspondence line. SR corresponds to the original spatial resolution of the AGB maps. ** is for P ≤ 0.0001; * for P < 0.05; no star indicates not significant at 95% confidence level. RMSE error is indicated in each case. demonstrated the efficiency of the MARS approach in estimating tropical forest AGB with or without the combination of different types of remotely sensed data. Compared to the results of the best model developed in this research, the errors in most existing AGB maps appear high. Future works will focus on further refinement and validation of the AGB models using extensive data precisely representing the vast majority of classes and tropical forest conditions.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Highlights
(1) MARS approach was used with multi-source remote sensing data to estimate AGB. (2) Median canopy height is better correlated to AGB as compared to other variables. (3) FOTO PC2 texture is better correlated to forest biomass as compared to GLCM. (4) Best AGB models were achieved by combining multisource remote sensing variables.

Appendices
Appendix 1 Data used to train the AGB models proposed in this investigation.