Integrating Sentinel-1 and 2 with LiDAR data to estimate aboveground biomass of subtropical forests in northeast Guangdong, China

ABSTRACT Accurate monitoring of forest aboveground biomass (AGB) is vital for sustainable forest management. Generally, the AGB is estimated by combining satellite images and field measurements. Nevertheless, field plots are inaccessible in complex forest areas. Due to the limitations related to either optical or SAR data alone, combining the two data types is indispensable. Here, we explored the LiDAR-derived biomass as training/testing samples for the combination of yearly time-series of Sentinel-1 and Sentinel-2 images to estimate AGB in north-eastern Conghua, Guangdong province, China. The AGB reference map derived from airborne LiDAR data and field plots could provide more samples for satellite-based AGB estimation with a limited amount of sampling plots. We designed four groups of experiments based on diverse Sentinel-1 and Sentinel-2 variables, including backscatter, backscatter indices, texture features, spectral bands, vegetation indices, and biophysical features. Four different prediction methods (support vector regression, multi-layer perceptron neural network, K-nearest neighbour, and random forest (RF)) were separately used to estimate AGB. Results showed that the RF model achieved the best accuracy for AGB mapping. All Sentinel-1 and Sentinel-2 experiment (R2: 0.72, RMSEr: 17.65%) performed better than the monthly complementary experiment (R2: 0.66, RMSEr: 19.01%). Additionally, the arid period images were observed to be sensitive for estimating AGB. The most contributing Sentinels predictors were determined via a sequential forward selection. Consequently, the proposed methodology has great potential for low-cost, large-scale, and high-precision AGB estimation.


Introduction
Subtropical forests are crucial carbon reservoirs in terrestrial ecosystems, and their spatial distribution and changes are significantly related to climate variability (Wang et al. 2019;Zhao et al. 2019).Climate change means the possibility of increased global warming and land use change (Kim, So, and Bae 2020).These hazards call for the sustainable management of forests.
Subtropical forests are a key factor in tackling climate change (Moomaw, Law, and Goetz 2020).Furthermore, aboveground biomass (AGB) is indicative of the carbon sequestration capacity of forest ecosystems (Deb et al. 2017).The aim of ensuring sustainable management of forests entails monitoring the temporal and spatial variation in biomass.Remote sensing has been regarded as an efficient means in estimating large-scale forest AGB (Lu et al. 2016).Nevertheless, implementing field measurement as an important prerequisite is still confronted with many hurdles, especially in subtropical forest areas.On one hand, widespread fieldwork requires substantial time and human costs (Timothy et al. 2016).On the other hand, not all tree species have available allometric equations which is needed to convert field measurements into AGB values.Environmental and climatic factors which might affect the growth status of tree diameter-at-breast height (DBH) and height (H) may influence the feasibility of existing models.This uncertainty exists most of the time and affects the results of AGB prediction.It must be noted, furthermore, that acquiring sufficient field data to meet statistical sampling rules is almost insurmountable owing to the heterogeneous characteristics of subtropical forests (Hawbaker et al. 2009).A more parsimonious yet feasible method to acquire enough forest reserves in far-off, complex and uncharted forest areas may be applied to improve the map of forest properties using forest structure information.A series of threedimensional structural characteristics that are significantly conducive to AGB estimation can be delivered from airborne light detection and ranging (LiDAR), such as the stand structural complexity, stand height, and forest cover (Bolton, Coops, and Wulder 2015;Kane et al. 2010).Forest structural attributes including biomass, have been acquired with high accuracy by combining LiDAR and plot-based samples of the forest conditions (LaRue et al. 2020;Lin et al., 2022a;Qin et al. 2022).The declining costs have led to LiDAR acquisitions for growing large areas within the bounds of possibility, whereas multi-temporal and large-area applications for LiDAR coverage are costly and logistically prohibitive compared with accessible satellite images (Zald et al. 2016).
The most promising and economical method to characterize subtropical forest properties is to employ a combination of LiDAR and satellite data (Tsui et al. 2013).Consequently, the LiDARderived forest parameters are utilized as dependable training and testing datasets to accomplish subsequent satellite-based wall-to-wall AGB estimation (Stojanova et al., 2010;Shao, Zhang, and Wang 2017).The study by Zald et al. (2016) used LiDAR plots as a substitute for conventional field and photo plots and they were subsequently used as response variables for multi-temporal change predictors and tasseled cap indices derived from Landsat imagery.The satisfactory accuracy (R 2 : 0.69, RMSE: 13.45%) acquired by the approach demonstrated the feasibility of using LiDAR data as a substitute for field-measured data.Other studies that integrating the LiDAR-derived parameters with optical metrics confirmed their suitability for estimating AGB (Shao, Zhang, and Wang 2017;Wang et al. 2020).
Sentinel-2 (S-2) optical satellite images are medium-resolution, freely-and globally-available, making them conducive to low-cost, large-scale and high-precision AGB mapping.The Sentinel-2 mission comprises two identical satellites; thus, S-2 has shorter revisit intervals of five days, which can produce more available images that are immune to weather conditions (Li et al. 2020;Shoko and Mutanga 2017).The S-2 imagery provides 13 multispectral bands containing two near-infrared bands, three red-edge bands, and two short-wavelength infrared bands, besides visible bands (Drusch et al. 2012).Moreover, S-2 is unique open access data comprising three red-edge bands, which is beneficial to monitoring vegetation information (Delegido et al. 2011).This knowledge notwithstanding, the available optical data are difficult to acquire because they are susceptible to cloud coverage, especially for the rainy period in subtropical forest areas, and the data have saturation phenomena in forests with high biomass (Lin et al., 2022b).The deficiency, however, can be compensated to some extent by SAR data, which are free from weather conditions due to the characteristics of the imaging technique and can obtain more vertical structure information owing to its ability to penetrate a certain depth of forest canopy (Han, Wan, and Li 2022).The Sentinel-1 (S-1) with dual-polarization SAR in C-band offers more opportunities for large-scale AGB estimation owing to its global coverage, free data access, and short repeat cycles of 12 days, especially in the subtropics where optical data are often not available (Karki et al. 2021;Li et al. 2022).Nevertheless, for SAR data, saturation also occurs in dense vegetation areas.Many studies have been conducted using these datasets, individually or complementarily, for forest AGB estimation.Chen et al. (2019) used S-1, S-2, their derivatives (such as biophysical parameters, vegetation indices, and texture), and elevation data to estimate biomass in Jilin province, north-eastern China.To compare the performance of different algorithms, two parametric (geographically weighted regression (GWR), stepwise regression (SWR)) and three non-parametric methods (support vector regression (SVR), artificial neural networks (ANN), and random forest (RF)) were applied to map AGB.They found that texture features from S-1 data were more suitable for AGB estimation than direct use of backscatter information, especially the texture based on a larger window of VV polarization.The study revealed that spectral indices related to chlorophyll characteristics were sensitive to AGB.The combination of S-1, S-2, and digital elevation model (DEM) data was implemented by Liu et al. (2019) to predict AGB using RF algorithm in Yichun, northeast China.They found that the synergistic use of multi-source data acquired satisfactory results (R 2 : 0.74, RMSE: 23.38 Mg/ha).Furthermore, the study observed the contribution of biophysical parameters (such as the fraction of photosynthetically active radiation (FAPAR), fractional of vegetation cover (FVC)) and red-edge derived spectral indices in improving model estimation performance.Wang et al. (2020) utilized field samples, unmanned aerial vehicle (UAV) LiDAR data, and S-2 to map AGB using RF method in northeast Hainan Island, China.The results of variable selection showed that approximately half of the remaining predictors were associated with rededge bands, and the possible contribution of short-wavelength infrared (SWIR) indices was also noted.The suitability of S-1 and/or S-2 for biomass estimation has also been affirmed by other research (Luo et al. 2022;Nuthammachot et al. 2022;Theofanous et al. 2021).
Many challenges are encountered when combining multi-source information, for example, excessive metric dimensionality, information redundancy, and the selection of the optimal estimating model.Parametric methods are the most commonly used AGB estimation models, but they are not suitable for complex subtropical forest areas (Bjork et al. 2022;Poorazimy et al. 2020).Nonparametric methods, for example, SVR, RF, K-nearest neighbours (KNN), and multi-Layer perception neural network (MLPNN), were selected owing to their ability to establish the complicated non-linear relationships effectively between biomass and remote sensing predictors and to handle high data dimensionality (McRoberts et al. 2017;Wolanin et al. 2019;Zhang et al. 2022).Furthermore, feature selection, an important procedure in the combination of different data sources, was embedded to choose the most informative yet parsimonious variables (Torabzadeh, Morsdorf, and Schaepman 2014).In our study, sequential forward selection was used due to its power to optimize model performance and improve the generalization ability of the models (Ludwig et al. 2019;Yu et al. 2021).The selected independent set of variables can promote the interpretation and applicability of the models.
This study aims to detect a more economical, high-precision, and large-scale estimation method for modeling subtropical forest AGB in north-eastern Conghua, Guangdong province, China; thus, airborne LiDAR data, yearly monthly time-series S-1 and multi-temporal S-2 images along with field data were utilized in a cost-effective way.The reason why the S-1 and S-2 datasets were chosen is that S-1 and S-2 are relatively novel, openly available and have superior resolution in terms of space, spectrum, and time, which is incomparable with other free-access datasets.Four experiments were implemented to compare the performance of diverse data sources (S-2, S-1, yearly time-series of S-1 and S-2, along with all S-1 and S-2 data) and prediction algorithms (RF, SVR, KNN, and MLPNN).For this objective, a large variety of optical and SAR variables were calculated to explore the most informative metrics related to aboveground biomass derived from each data source.By employing a sequential forward selection method, we performed a dimensionality reduction, and the optimal variables of each model were acquired.
To our knowledge, this is the first study that combines LiDAR-derived biomass and time-series of S-1 and S-2 images for subtropical forest AGB estimation in north-eastern Conghua, Guangdong province, China.Therefore, the purposes of this study were (1) to test the reliability of the LiDARderived AGB derived from field-based AGB and LiDAR-derived predictors as training and validation samples for subsequent satellite-based modeling; (2) to explore the performance of S-1, S-2 and their combination in estimating AGB; (3) to determine the best prediction model to map AGB; (4) to detect the variables that contributed significantly to mapping AGB from a large number of predictors; and (5) to investigate the most suitable image acquisition period to estimate AGB.

Study area
The study was conducted in the subtropical forest areas situated in north-eastern Conghua, Guangdong province, China.The area covering the extent of the airborne LiDAR data acquisition is geographically located between latitude 23°50 ′ 9 ′′ to 23°55 ′ 1 ′′ N and longitude 113°50 ′ 57 ′′ to 113°5 8 ′ 36 ′′ E (Figure 1).The elevation ranges from approximately 198 m in the northeast to 619 m in the southwest.The climate is subtropical monsoon, with yearly mean temperature of 21.3 °C and yearly mean rainfall of 1952.4 mm, and the rainy season lasts from April to September.The main vegetation types in this study site are broadleaf and coniferous mixed forest and the dominant tree species include Pinus massoniana, Schima superba, Cinnamomum porrectum, and Castanopsis fissa.

Field data
The field campaigns were conducted from August to September 2017.There were 60 sample plots in total, and each plot was square in shape with an area of 30 m × 30 m.The plot positions covered by the extent of LiDAR flights were selected on the basis of a subjective assessment.The centers of these plots were recorded using a GPS (Garmin MAP 60CS, accuracy ±3 m), and the tree species, along with the corresponding type (deciduous vs. evergreen), were also taken down.At each field plot, all the measurements (e.g.DBH and H) indispensable for calculating biomass were obtained.Furthermore, all the trees with a DBH <3 cm were abandoned during the measurements.The vegetation type (i.e.coniferous, deciduous, or mixed forest) of each plot was determined via a threshold of 75% in advance of the analysis (Singh et al. 2015).If deciduous (coniferous) trees accounted for more than 75%, the vegetation type of the plot would be marked as deciduous (coniferous), otherwise, it would be marked as mixed forest (Godwin, Chen, and Singh 2015).
In this study, we followed the methodology of Fang, Liu, and XU (1996) for biomass accounting in each plot.On the basis of the DBH and H, the volume of every tree was acquired through a volume table.Then, the summation of each tree volume was obtained as the total volume (TV) at plot scale.The total AGB (TAGB) in each plot was calculated using the relationship (Eq. 1) between the TV and TAGB: where a and b are known constants from Fang, Liu, and XU (1996), and their values hinge on the forest types of the plots.The plot biomass was ultimately obtained at a megagrams per hectare (Mg/ ha) conversion unit.Finland).The LiDAR data were pre-processed by first recognizing and removing individual noisy points.We classified the point cloud data into two categories ground and non-ground returns, and then the interpolation procedure was separately conducted for the ground returns and all first returns in order to produce a digital surface model (DSM) and a DEM with a 1 m resolution.To obtain the crown height model (CHM) raster data, the DEM was subtracted from the DSM.The CHM pixels with values between 2 and 35 m were reserved based on the maximum and minimum of measured tree heights within this area to remove the undergrowth vegetation and objects above the height of the trees (Hyyppa et al., 2012;Wulder et al. 2012).

Satellite data acquisition and pre-processing
Two satellite image types (i.e.S-1 and S-2) were utilized in this study.According to previous studies, the integrated use of SAR and optical images could improve the estimation performance of the vegetation parameters (Castillo et al. 2017;Forkuor et al. 2020).Due to the near independence of S-1 data from weather circumstances, in the instances of excessive cloud cover, S-2 could not provide available data, and the S-1 imagery can be used to fill the gap.S-2 data, which have three red-edge bands related to vegetation information, are frequently used high-resolution data for large-scale AGB mapping.S-2 has a 5-day temporal resolution.The potential of S-2 data to augment the AGB prediction has been shown by several studies (Puliti et al. 2020;Theofanous et al. 2021).However, the S-2 image is often susceptible to cloud cover, and integrating S-1 and S-2 provides year-round images to improve the estimation accuracy.S-1 images for the period from January to December 2017 were obtained from the Google Earth Engine (GEE) (https://code.earthengine.google.com)(Figure 2).The S-1 data were gathered in the Interferometric Wide Swath (IW) mode with the VV (Vertical transmit -Vertical receive) and VH (Vertical transmit -Horizontal receive) polarizations, and they were the Multi-Look Ground Range-Detected (GRD) products.Twelve scenes (a scene monthly) covering the study site were acquired.In this study, the API interface based on JavaScript language was used to load the GRD product dataset, i.e. 'COPERNICUS/S1_GRD', in the 'Code Edit' module of the GEE platform.The dataset was pre-processed for each scene using ESA's Sentinel Applications Platform (SNAP) software, including orbit files applying, thermal noise processing, speckle filtering, radiometric calibration, terrain correction, and geocoding.Therefore, the processed data (i.e.backscatter coefficients) completed by GEE could be used directly.Subsequently, all the S-1 data were resampled to 10 m resolution using the bilinear interpolation method and resized based on the spatial coverage of the study site.
S-2 images were obtained from the Copernicus Open Access Hub (https://scihub.copernicus.eu/dhus/#/home).S-2 data were acquired for the months of January, February, March, October, November and December 2017 (Figure 2).Owing to an excess of cloud coverage for a considerable part of the study site, the other months of 2017 failed to provide effective data.For each image, atmospheric correction was implemented by employing the Sen2cor plugin of SNAP version 9.0 software (SNAP 2022) to transform the Level 1C to Level 2A.The coastal aerosol, water vapour, and SWIR-Cirrus bands were abandoned due to their contribution to atmospheric correction and lower spatial resolution, with ten spectral bands remaining.Three red-edge bands, one near infrared (NIR) band, and two short-wavelength infrared (SWIR) bands of 20 m resolution were resampled using the bilinear interpolation approach to a spatial resolution of 10 m.Each S-2 image was clipped to cover the study region precisely.

Methods
Figure 3 illustrates the proposed method for forest AGB estimation in the study.Five major steps were followed to acquire the AGB map: (1) predictors extraction, (2) reference map of AGB establishment, (3) training and testing datasets production, (4) feature selection procedure and (5) model calibration and assessment.

LiDAR variables
Two classes of LiDAR variables were derived: the canopy height and canopy cover metrics (Table 1).Three types of canopy height variables describing the distribution condition of heights were calculated, including the percentile height (H.p: p60, 70, 80, 90, and 95), mean height (H.mean ), and interquartile distance of percentile height (HIQ).For the canopy cover variables, the point densities (PD), which were of diverse height intervals (such as the proportion of returns above 10 m or between 10 and 20), were derived (Almeida et al. 2019).The whole area along the LiDAR transects was resampled to 10 m, and the 14 LiDAR variables were extracted for every cell.3.1.2.Sentinel-2 and Sentinel-1A variables Three classes of optical variables were derived from the pre-processed Sentinel-2 images: spectral bands, vegetation indices, and biophysical parameters (Table 2).The red, green, blue, red-edge, near-infrared, and SWIR bands were employed for subsequent analysis.For each scene, nine spectral indices were chosen based on previous research (Castillo et al. 2017;Wang et al. 2020;Wu et al. 2022) (Table 2).In addition to the vegetation indices, three biophysical parameters, including FAPAR, FVC, and leaf area index (LAI), were derived for all the images.These biophysical metrics were shown to be sensitive to forest AGB due to their description for vegetation status and dynamics information (Forkuor et al. 2020;Ronoud et al. 2021).This step was performed for each scene by the biophysical processor of the SNAP software.The processor derived biophysical parameters using tested, general approaches based on PROSAIL radiative transfer model.
According to published studies (Bhatt and Nain 2021;Forkuor et al. 2020;Mayamanikandan et al. 2022), two polarization bands (VH, VV), the quotient (VV/VH), the sum (VH + VV), and the difference (VH-VV) were calculated as the SAR inputs.Furthermore, research based on texture measures has shown good potential to predict forest structure parameters (Chen et al. 2018;Liao, He, and Quan 2020;Thapa et al. 2015).Eight texture features comprising the mean, homogeneity, variance, dissimilarity, second moment, entropy, contrast, and correlation, were produced from the  radar backscatter band (VH) based on the grey level co-occurrence matrices (GLCM) (Table 2).The superiority of texture features extracted from small window sizes to fine changes in pixel brightness compared with those computed from a large window size had been shown (Kelsey and Neff 2014).Therefore, we selected a 3 × 3 window size to acquire texture variables.Table 2 shows a list of all the variables (metrics extracted from S-1 and S-2 data) employed in the study.

Reference map of AGB
The 60 field-measured AGB values were randomly split into 70-30% training and testing samples.
Based on the biomass measurements and LiDAR variables (Table 1), AGB modeling was implemented using the SVR method.The computational complexity of SVR algorithm does not Homogeneity depend on the dimensionality of the sample space, which avoids the curse of dimensionality at some level.Moreover, it is impervious to outliers and thus has relatively better robustness.In Chen et al. (2018), the SVR model reached the best performance for AGB prediction using limited field data.
To reduce the information redundancy and improve the efficiency during AGB retrieval, the backward feature exclusion method was performed to filter the LiDAR variables sensitive to biomass.When removing a metric leads to a decrease in the RMSE value of the SVR model, it means that this variable contributes little to AGB, and vice versa.The optimal group of LiDAR variables was obtained when the RMSE no longer decreased, and then they were utilized for AGB prediction, thereby forming the LiDAR-derived AGB reference map with 10 m spatial resolution.

Training and testing data
288 variables in total were utilized to estimate forest AGB.The selected predictors consisted of 60 S-2 optical bands (10 spectral bands each from six images), 54 S-2 vegetation indices, 18 S-2 biophysical metrics, 24 SAR polarization bands (2 bands each from 12 scenes), 36 SAR difference, quotient, sum bands, and 96 texture features.660 samples that were almost evenly distributed within the AGB reference map were adopted (Table 3).The number of samples selected based on the extent of the region (100 km 2 ) was comparable to that in similar studies (Shao, Zhang, and Wang 2017).The extracted biomass values from the AGB reference map were appended to the calculated backscatter and optical information.The total samples were randomly divided into training and testing samples at a ratio of 7:3.

Regression algorithms
In our study, four diverse regression methods including SVR, MLPNN, KNN, and RF were considered to map AGB.An introduction of each algorithm is provided as follows.
With the support vector machine (SVM), regressions are accomplished depending on the major principle that mapping the input vector to a high-dimensional feature space using some well-chosen nonlinear mapping kernel function (Almeida et al. 2019).SVM comprises three diverse implementations, namely, SVR, NuSVR, and LinearSVR.The SVR method was selected because it is frequently used.The SVR method is derived from the support vector classification method.The Gaussian Kernel, namely, radial basis function (RBF), was chosen to predict forest AGB.Many studies have compared different kernels and the superiority of RBF kernel was found (Monnet, Chanussot, and Berger 2011;Nguyen et al. 2020).The SVR (RBF) was further optimized by using the gird search approach to detect a more appropriate combination of C and epsilon that maximized the R 2 .C is a regularization parameter.Epsilon specifies an epsilon-tube whose training loss function has no penalty related to points forecasted within a distance epsilon from observed values.The SVR algorithm has demonstrated the superiority of the generalization ability (Monnet, Chanussot, and Berger 2011).
Figure 4. Schematic diagram of an instance MLPNN model structure to estimate forest AGB.The neurons N i , N j , N k in the input layer, hidden layers, and output layer, as well as their internal connections and weightings (W ij , W jk ) were shown.
MLPNN is a non-parametric algorithm.The rationale of the neural network is to simulate and facilitate biological neurons, and multi-layer perceptron (MLP), which is the most fundamental neural network model, was used in this research (Chen et al. 2018).With the exception of an input layer and an output layer, at least one hidden layer is incorporated into MLP, and its different layers are fully connected (Figure 4).The training method generally adopts the backpropagation (BP) algorithm.Compared to single-layer perceptron, multi-layer perceptron can learn nonlinear functions.The regression model was optimized by looking for appropriate activation functions, along with the number of hidden layers and nodes.The MLPNN method has been demonstrated to be a feasible algorithm for estimating AGB (Chen et al. 2018).
For K-nearest neighbour (k-NN) algorithm, the main idea is that the estimated value of each query point to be measured is acquired via the weighted average of the corresponding attribute values of the k nearest neighbours (Xie et al. 2022).Uniform was selected as a weight parameter because each point in the local neighbourhood contributes equally to uniform weights in the prediction of a query point.This characteristic is suitable for our datasets.The parameters n_neighbours and leaf_size of KNN were set to keep the default values.The advantages of the KNN algorithm have been proven (for example, in terms of their effectiveness and convenience) (Persson, Jonzén, and Nilsson 2021).
RF is considered the most frequently used and efficient algorithm.It belongs to the ensemble learning method, which is approximately divided into bagging and boosting methods (Almeida et al. 2019).The general idea behind ensemble learning is to train many relatively weak models (e.g.decision tree, SVM, etc.) and package them to yield a relatively strong model (Forkuor et al. 2020).The strong model outperformed a single weak model.Decision trees are used as weak models in RF.During the training phase, RF employ bootstrap sampling to gather many diverse sub-training sets from the input training set for training multiple diverse decision trees in turn.The aim of using bootstrap sampling is to utilize multiple diverse sub-models to augment the robustness and stability of the final models.In the prediction phase, the RF predicts a response variable by averaging the results predicted by multiple interior decision trees to acquire the final outcome.In this study, RF was optimized by using the gird search approach to explore the more proper combination of a number of trees (ntree) and depth of trees (mdepth) to improve the performance.Moreover, the previous research (Zald et al. 2016) supported the superiority of the RF which has no demand for the feature dimension of the dataset compared with the other machine learning algorithms.

Experimental design
Consistent with this study's purpose, four experiments were implemented with diverse datasets to estimate forest AGB: (1) all optical data; (2) all SAR data; (3) yearly time-series of SAR and optical data; and (4) the integration of all SAR and optical data.Table 4 presents the details about the experiments.The modeling framework was built by applying the four datasets to the four prediction models (RF, KNN, SVR, and MLPNN).Prior to modeling, a feature selection procedure was used to acquire the optimal predictors for each model.Section 3.6 details the variable selection method used here.

Feature selection
The aim of this step was to detect the optimal predictors of each model for estimating AGB.For this objective, we employed the sequential procedure to eliminate comparatively inferior predictors using the LiDAR-derived AGB value as the independent testing dataset (N = 198).The model performance was evaluated by the coefficient of determination (R 2 ).The sequential forward selection (SFS) began with an available variable to build a prediction model, and then the model was validated using testing samples of the corresponding variable.The predictor that had the highest R 2 was retained.In the next step, the saved predictor with another variable from the remaining set (n-1) were used to run as the training set, and the model was subsequently validated based on the same principle.The SFS took into account all the potential combinations of variables as much as possible and selected the subset with the maximum R 2 , which meant the predictors having the best contribution to the model were determined (Figure 5).The SFS was applied to each dataset (S2all, S1all, S1S2yearly, and S1S2all), and the selected variables from each dataset were respectively input into the four prediction models (RF, KNN, SVR, and MLPNN).

Accuracy assessment
The accuracy of each experiment was evaluated by independent testing samples (N = 198) not involved in the process of model building.Three accuracy evaluation indices (Eq.2-4) were calculated to contrast the forecasted values with observed values (LiDAR-derived AGB values).They are the coefficient of determination (R 2 ), root mean squared error (RMSE, Mg/ha), and relative RMSE (RMSEr, %) assuming the form of a percentage. (2) Where 'E' is the estimated value, and 'T' is the true value.

Performance assessment of the AGB reference map
Based on the backward feature exclusion method, the optimal LiDAR variables, including the H. mean , H. p80 , H. p90 , H. p95 , PD 20_30 , PD 14 , PD 22 , and PD 26 , were applied to produce the AGB reference map.The accuracy of the SVR model validated using the independent testing dataset (18 fieldmeasured AGB) was R 2 of 0.81 and RMSE of 20.71 Mg/ha (Figure 6).

Performance of diverse data inputs and prediction models for AGB estimation
Table 5 shows the testing outcomes of sixteen experiments implemented with S-1 and/or S-2 data.All of the experiments were assessed using the same independent testing dataset (N = 198).Except for experiment B, the RF model outperformed the other prediction models, and the worst performance was afforded by the SVR model, which delivered a relatively higher RMSE for each experiment.The performance of the KNN and MLPNN models varied in different experiments.For instance, KNN outperformed MLPNN in experiment D, while MLPNN outperformed KNN in experiment C. In terms of all four regression methods, the results in experiments S2all and S1all suggested that S-2 data, which were obtained in six months, assumed better performance compared to the S1all acquired in twelve months in general.Even if the KNN model acquired the best performance with an RMSE of 46.49Mg/ha compared to other models in experiment B, it was still higher than the worst performance with an RMSE of 37.59 Mg/ha reached by the SVR model in experiment A. With respect to R 2 , coefficients of determination of 0.33 and 0.48 were achieved for S1all in the KNN model and S2all in the SVR model, respectively.The combination of yearly monthly time-series of S-1 and S-2 images improved the estimation accuracy of the models for all the prediction algorithms by increasing the R 2 and decreasing the RMSE r .The improvements in the R 2 (increase of 0.10-0.14)and RMSE r (decrease of 2.38-4.88%)were reached by experiment S1S2all relative to models with S2all.Relative to the counterpart in experiment S1S2yearly, experiment S1S2all acquired a higher accuracy with four models.Overall, the best performance was afforded by the RF model and experiment S1S2all.
The reference AGB versus model predicted AGB derived from the different data sources and methods are plotted in Figure 7.All the models showed an overestimation of low AGB and an underestimation of high AGB.However, the RF model with all optical and SAR data whose points were closer to the 1:1 line compared with the other models showed slight overestimation and underestimation.

Optimum variables and data acquisition periods for AGB estimation
The most critical predictors selected by the SFS procedure for each model in four experiments are presented in Tables S1-4 (Supplementary).With respect to the amount of selected optimal predictors, the importance of S-1 and S-2 data to the S1S2yearly and S1S2all experiments hinged on the prediction method considered.For the combined experiments (S1S2yearly, S1S2all), the optical variables were selected more than the SAR variables except for the MLPNN model in experiment S1S2all.Moreover, the MDI2 metric was only chosen by the MLPNN method, and the sum, variance, along with the mean predictors, were not selected by any model in combined experiments (S1S2all and S1S2yearly).
The optimal variables and image acquisition period for the RF model assuming the best performance for AGB estimates in each experiment are shown in the predictor importance graphs (Figure 8).Compared with the backscatter bands along with conventional vegetation indices, derivatives such as the texture features (entropy, contrast, and correlation), radar indices, biophysical parameters, and red-edge and shortwave infrared indices were discovered to be preferable variables for AGB estimation in all the experiments.For S1S2yearly and S1S2all, the top four most crucial predictors were the shortwave infrared index (Mar_MDI1), biophysical products (Feb_LAI, Nov_-LAI/Oct_LAI), and SAR index (Apr_VHVVc).Among the first ten most contributing predictors, in S1all, texture measures accounted for six out of ten (Dec_variance, Oct_dissimilarity, Jun_correlation, Mar_correlation, Jun_variance, and Jun_homogeneity), and in S2all, the top six most informative metrics were derivatives (Oct_NDVIre1, Mar_LAI, Nov_MDI2, Feb_IRECI, Jan_NDVI, and Jan_IRECI).In addition, the graphs reveal that the quotient indices were more useful for AGB modeling than the difference and sum indices, although the quotient indices were the worst predictor of three SAR indices in S1all.Overall, the biophysical parameters, near-infrared and shortwave infrared indices of optical data, along with the quotient indices of SAR data, were found to contribute more to the AGB prediction.
According to the predictor importance ranking of experiments S1all, S1S2yearly, and S1S2all with respect to the RF model (Figure 8), the optimal image acquisition stages for estimating subtropical forest AGB were investigated.The results of ranking in experiment S1S2all revealed that the top ten most important predictors were almost all obtained during the arid periods (October to March), except for one metric (Apr_quotient).In experiment S1all, the most important metric (Dec_variance) was acquired during the arid season.Moreover, the inclusion of arid season SAR data (S1S2all) improved the AGB estimation accuracy compared to the experiment S1S2yearly.Consequently, the images (S-1 and S-2) of arid seasons were important for the subtropical forest AGB estimating in the study site.

Forest AGB predictive mapping
The final AGB map of the study region derived from integrated S-1 and S-2 images (S1S2all) using the RF model is presented in Figure 9.The estimated map is characterized by AGB values ranging from a low of 91.35 Mg/ha to a high of 192.59 Mg/ha, and the average values is 163.95Mg/ha.Additionally, about 70% of the study region has AGB values between 140 and 170 Mg/ha.The spatial distribution of forest AGB predictions, such as high and low biomass areas, is consistent with field observations.Nevertheless, there was an overestimation or underestimation occurring in sparse or dense forest areas, while the result that approximately 70% of the final AGB map was between 140 and 170 Mg/ha is in line with the study area.

The LiDAR-derived AGB as a substitute for field-measured data
To our knowledge, the AGB reference map derived from LiDAR data and field plots as ground reference data was relatively rarely employed to characterize subtropical forest information.The feasibility of using LiDAR data as a substitute for conventional forest inventories has been demonstrated by previous research (Shao, Zhang, and Wang 2017;Zald et al. 2016).Apart from further confirming the suitability of the LiDAR-derived reference map, the follow-up modeling work which reached satisfactory accuracy (R 2 : 0.72; RMSE: 28.63 Mg/ha; RMSE r : 17.65%, for the best model), also demonstrated the applicability of yearly multi-temporal time-series of SAR and multispectral data for AGB mapping.There were trends to overestimate AGB in sparse forests and underestimate AGB in dense forests.The overestimation may be pertaining to the variation of background land cover and undergrowth vegetation, which leads to an increase in spectral variation.Compared to other ground cover, the relationship between spectral information of the background classes and AGB becomes more complicated.The explanation for underestimation could be that saturation phenomena occurs, which is critical limitation for optical data when estimating AGB in dense vegetation areas.The H. mean , H. p80 , PD 20_30 , and other LiDAR variables selected by the backward feature exclusion method were consistent with some similar studies (Almeida et al. 2019;Wang et al. 2020), which indirectly supported the correctness of this routine.Furthermore, the good validation accuracy (R 2 : 0.81, RMSE: 20.71 Mg/ha) (Figure 6) also demonstrated the reliability of the reference map of AGB as a substitute for ground reference data.The subsequent modeling results indicated that employing the LiDAR-derived AGB as  a surrogate for field measurements is relatively economical and reliable, and thus, it can be applied in future studies when the amount of representative inventories in large forest areas is restricted.Despite its superiority, the method had a limitation which requires LiDAR data available.

S-1 versus S-2 for modeling AGB
The outcomes of our research showed that S-2 data are more applicable for mapping AGB in the study site than S-1 data.Although this discovery is consistent with previous studies comparing the contribution of SAR and optical data in estimating AGB (Forkuor et al. 2020;Vafaei et al. 2018), others presented inverse results (Chen et al. 2018;Navarro et al. 2019).This result can be attributed to the short wavelength (C-band), which has a limited ability to capture vertical structural information of forests compared with longer wavelength SAR images (e.g.L and P).Previous studies have found that the involvement of interferometric synthetic aperture radar (InSAR) can weaken the saturation effect to some degree (Saatchi et al. 2011).Thiel and Schmullius (2016) utilized Pband PolSAR and dual antenna X-band InSAR data to estimate AGB and found that the inclusion of more dimensional SAR features obtained more precise results than unidimensional SAR metrics.The contribution of textural information based on S-1 for AGB mapping has been found both in this study and Chen et al. (2018).Furthermore, our study observed the SAR indices of sum, difference, and quotient (Table 2) to be beneficial for AGB modeling, which was in line with the findings of Forkuor et al. (2020) with the exception of the quotient.Therefore, the SAR indices, textural information, and longer wavelength SAR data (e.g.L-and P bands) coupled with vegetation height information from InSAR should be applied for AGB mapping in the near future.
Although there was a saturation phenomenon with optical data, especially in dense vegetation areas, the performance of S-2 data was still better than the S-1 data in the study.Similar results were reported in other research that compared SAR and optical data to predict AGB.Forkuor et al. (2020), for instance, compared S-1 and S-2 for forest AGB prediction and acquired preferable estimation accuracy with S-2 data (multispectral bands, spectral indices, and biophysical parameters) (R 2 : 0.83, RMSE: 60.6 Mg/ha, for the S2 model; R 2 : 0.61, RMSE: 86.6 Mg/ha, for the S1 model).Besides the relatively high spatial resolution and multiple red-edge bands of S-2, the scholars regarded the inclusion of S-2 extracted biophysical variables as functional for the results of S-2.In our study, the involvement of S-2 derived spectral indices, such as NDVIre3 and MDI1, along with biophysical variables (LAI, FAPAR, and FVC), made S-2 as conducive to the estimation performance as possible.Although the contribution of these parameters has been observed by other research (Wang et al. 2020), the importance ranking graphs of experiments S1S2yearly and S1S2all offer new perspectives for the importance and sensitivity of S-2 derived SWIR indices, red-edge indices and biophysical parameters to AGB estimates in the study region.The SWIR indices (MDI1) and biophysical parameters (LAI) were in the top three important predictors in experiments S1S2yearly and S1S2all.The fifth most important variable in experiment S1S2yearly was red-edge indices (NDVIre3).The results of importance ranking graphs further demonstrate the superiority of SWIR indices, red-edge indices and biophysical parameters for AGB estimation in subtropical forest areas.
The combination of SAR and optical data obtained the best results compared to the other experiments implemented here.Similar outcomes were acquired in other studies using SAR and optical data (Forkuor et al. 2020;Shao, Zhang, and Wang 2017).This finding can be attributed to the complementarities of both data in the data characteristics and imaging technique, and thus combining two types of data is conducive to predicting AGB.Furthermore, the integration of yearly monthly time-series of SAR data and optical data according to different seasons provides novel insights for the combination of S-1 and S-2 for future forest AGB estimation.In our study, experiments S1S2all (R 2 : 0.72; RMSE: 28.63 Mg/ha; RMSE r : 17.65%, for the best S1S2all model), along with S1S2yearly (R 2 : 0.66; RMSE: 29.80 Mg/ha; RMSE r : 19.01%, for the best S1S2yearly model), accomplished the most satisfactory estimation results compared to the other experiments.However, the performance of experiment S1S2all was better than that of S1S2yearly.A possible explanation for this result is the augmentation of derivative information from S-1 data (e.g.texture metrics, quotient, difference, etc.) during the dry season.Therefore, for the study area, the AGB mapping precision is better when as many S-1 derivative features in the dry period as possible are used.

Performance of SVR, MLPNN, KNN, and RF prediction algorithms
The superior performance of the RF approach has been revealed by many previous studies (Chen et al. 2018;Wang et al. 2020).Chirici et al. (2020) utilized four approaches to yield a wall-to-wall map of growing stock volume and the RF method acquired the best estimation result (R 2 : 0.69, RMSE r : 37.2%) compared to the other methods (multiple linear regression, KNN, and GWR).Another study conducted by Fassnacht et al. (2014) applied five methods, including stepwise linear regression, SVR, RF, KNN, and gaussian processes, to estimate biomass, and these methods were selected on the basis of a systematic review of the accessible literature.The outcomes showed that the experiment using the RF method produced the best accuracy (R 2 > 0.6 and RMSE < 40 t/ha).For this study, the lowest error of AGB estimation (R 2 : 0.72, RMSE r : 17.65%, for the best model) was also generated by the RF algorithm in almost all experiments.The satisfactory performance of RF model can be attributed to the robustness and flexibility in terms of noise and outliers.The performance of the KNN and MLPNN changed in different experiments.For example, KNN performed better than MLPNN in experiment S1all, while the opposite outcomes appeared in experiment S1S2yearly.This phenomenon may be caused by the combination of feature selection and data source, and the result of feature selection is related to the characteristics of the model.The reason why the SVR model obtained the worst performance may be that the selected kernel function was not suitable for the input data.The performance of different kernel functions should be determined in future AGB estimations.Aside from picking an appropriate regression method, diverse variable subsets attuned to different models were delivered through the feature selection procedure, which reduced the feature dimension, decreased information redundancy, and improved the modeling efficiency.The satisfactory results suggested the feasibility of our feature selection procedure.Other feature selection approaches should be investigated in follow-up studies to improve the AGB modeling performance.

Contribution of S-1 and S-2 derivatives in estimating AGB
The importance ranking graphs suggested that on the whole, derivatives of S-1 and S-2 data are preferable for predicting AGB than the original information (backscatter and spectral bands).Similar results were found by other studies using SAR and optical data to estimate AGB (Castillo et al. 2017;Forkuor et al. 2020).For instance, Forkuor et al. (2020) found that backscatter indices (S-1), spectral indices, and biophysical parameters (S-2) were more efficient than multispectral (S-2) and backscatter (S-1) bands in mapping AGB.Castillo et al. (2017) conducted a total AGB estimation study on diverse coastal land uses and found that the model based on LAI (S-2) had a higher accuracy than that using the raw bands (S-1 and S-2) and the vegetation indices.They observed the importance of using derivatives.In reference to S-1, valuable information was provided by extracted indices of the difference (VV-VH) and quotient (VH/VV), along with textural features of correlation and contrast in this study.The quotient indices were more important variables than the difference in the S1S2all and S1S2yearly experiments.Overall, the S-1 derivatives contribute more than the raw backscatter bands.
In the case of S-2, the vegetation indices MDI1 and NDVIre3 were shown to be the optimal predictors for AGB estimates in the study.MDI1 was noted to be the most important metric rather than the other red-edge derived spectral indices, followed by the biophysical predictors (LAI) in the S1S2all and S1S2yearly experiments.Wang et al. (2020) acquired similar results and found that MDI1 was as important a predictor for AGB modeling after variable selection.The effectiveness of both NIR bands and SWIR bands in AGB mapping was further verified by this research.This knowledge notwithstanding, the performance of MDI2, which was excluded from the variable set after predictor selection in experiments S1S2all and S1S2yearly concerning the RF method, was inferior to MDI1.On the other hand, the potential of red-edge derived spectral index NDVIre3 was observed, which can be attributed to its capacity to relieve the saturation problem in AGB mapping.This finding is in line with previous research which used red-edge derived vegetation indices for AGB estimation (Mutanga and Skidmore 2004).Concerning biophysical parameters, this study suggested that LAI had better performance than the other two (FVC, FAPAR).The results were in line with those observed by Chen et al. (2018), who used the LAI, FAPAR, and FVC for estimating AGB, and they detected a higher correlation between AGB and LAI.However, Baloloy et al. (2018) observed a relatively lower correlation for LAI in AGB estimation, and Forkuor et al. (2020) also revealed that the LAI made a lower contribution compared to FAPAR and FVC.The reason for these differences may be attributed to the distinctions in the vegetation structure and density.For example, there was poor forest cover for the study area in Forkuor et al. (2020), while Chen et al. (2018) and our study regions had relatively higher stand density with a mean AGB value of more than 100 Mg/ha.

Influence of seasonality on model performance
Concerning the influence of seasonal factors, the graphs of predictor importance ranking delivered by the RF model for the four experiments revealed that data obtained in the arid period (October to March) contribute to AGB estimation in this study region.Bouvet et al. (2018) and Laurin et al. (2013) observed that the sensitivity of SAR images to AGB is susceptible to seasonal factors.The increase in vegetation and surface moisture during the rainy season could lead to the reduced sensitivity of SAR images to AGB (Nguyen et al. 2016).S-1 with relatively low canopy penetration compared to P-and L-bands could be more suitable for the area with a great percentage of deciduous vegetation where the branches could be emerged in the arid period (Laurin et al. 2013).By comparing experiments C with D, it is also clear that the inclusion of SAR data during the dry season enhanced the model performance.The findings on seasonality in our study coincide with those of Forkuor et al. (2020), who utilized S-1 data to establish a prediction model of the forest AGB in the Sudanian Savanna (SS) agroclimatic region of West Africa.They analysed the variable importance ranking including both arid and rainy season data and they found that the selected SAR variables were all in the arid season, among the top ten optimal features.Overall, when time-series optical and SAR data are combined to predict subtropical forest AGB where optical data available during the rainy season are difficult to obtain, the inclusion of SAR data during the dry season can improve the estimation accuracy.

Conclusion
In this study, we determined the potential of integrating LiDAR-derived biomass with the combination of yearly multi-temporal (monthly) time-series of Sentinel-1 (SAR) and Sentinel-2 (multispectral) for estimating subtropical forest AGB in north-eastern Conghua, Guangdong province, China.Four experiments that included diverse feature combinations were designed, and four prediction models (RF, SVR, KNN, and MLPNN) were used for our objectives.We drew the following conclusions from the study.First, the reference AGB map derived from LiDAR with the synergy of yearly time-series of S-1 and S-2 images yielded satisfying estimation performance (R 2 : 0.72, RMSE: 28.63 Mg/ha, RMSE r : 17.65%) for the forest AGB.This scheme provides a potential alternative for large-scale mapping of forest structure attributes when field data are limited and multi-month optical and SAR images are available.Second, the experiment of monthly S-2 (S2all) accomplished preferable precision for AGB mapping compared to that using all the S-1 data (S1all), despite the supplementary use of the two data types (S1S2yearly, S1S2all) obtaining the best performance.Moreover, all the optical and SAR data (S1S2all) outperformed the yearly time-series of the optical and SAR data (S1S2yearly).Regardless of the prediction method used here, the AGB mapping accuracy was improved by at least 9% with regard to the RMSE and RMSE r after using all the data (S1S2all) relative to all the optical data (S2all).Third, the RF model presented the best performance compared to the other algorithms in this study, which further demonstrates the flexibility and robustness of the RF algorithm.Fourth, the most highly contributing Sentinel-1 variables for AGB mapping were related to the backscatter indices (e.g.quotient, difference, etc.) and texture features (e.g.correlation, contrast, etc.), and the most important Sentinel-2 variables were regarded as the MDI1, LAI, and NDVIre3.Finally, dry season data are sensitive for the AGB estimation of subtropical forests, and rainy season data are also indispensable.This research was devoted to determining the capability for synergy of airborne LiDAR and yearly monthly time-series of SAR and multispectral data for AGB estimation and a satisfying result was obtained.Overall, the proposed methodology provides important insights for low-cost, large-scale, and high-precision structural attribute monitoring in subtropical forest areas.
Benefit the People Demonstration and Guidance Program, China, under Grant 22-3-7-cspz-1-nsh;Open Research Fund Program of LIESMARS under Grant 19R07;Open Research Fund Program of Key Laboratory of Ocean Geomatics, Ministry of Natural Resources, China, under Grant 2021B03;and Introduction Plan of High-end Foreign Experts under Grant G2021025006L.

Figure 1 .
Figure 1.Study area.(a) Mapping showing the study site location in Conghua, Guangdong province, China; (b) S-2 image of the study site obtained on 27 October, 2017.

Figure 3 .
Figure 3.The workflow of the proposed approach for the modeling and mapping of the subtropical forest AGB.

Figure 4 .
Figure 4. Schematic diagram of an instance MLPNN model structure to estimate forest AGB.The neurons N i , N j , N k in the input layer, hidden layers, and output layer, as well as their internal connections and weightings (W ij , W jk ) were shown.
Satisfactory accuracy (R 2 : 0.81, RMSE: 20.71 Mg/ha) was acquired compared with similar research (Shao, Zhang, and Wang 2017; Tsui et al. 2013), and thus laid a dependable foundation for follow-up modeling work.In the LiDAR-derived biomass map, the AGB values ranged from 19.32-263.74Mg/ha, with an average value of 132.69 Mg/ha and a standard deviation of 64.81 Mg/ha.The biomass values in the reference map were subsequently utilized as training and testing data for the satellite-based wall-to-wall AGB estimating.

Figure 5 .
Figure 5. Flow chart of the sequential forward selection.

Figure 7 .
Figure 7.The scatterplots of the reference AGB and model predicted AGB produced from the S-1 and/or S-2 sources and diverse methods.

Figure 8 .
Figure 8. Importance ranking of the 10 highest predictors for RF model with four different datasets.

Figure 9 .
Figure 9. AGB map from the RF model with all the S-1 and S-2 integrated variables (S1S2all).

Table 1 .
List of LiDAR variables for AGB estimation.
h Number of first returns above a height h (10, 14, 18, 22, or 26) divided by the amount of all first returns.a_b Number of first returns between the height interval a_b (10_20, or 20_30) divided by the amount of all first returns.Almeida et al. (2019)

Table 2 .
The list of variables extracted from S-1 and S-2 images.

Table 3 .
Summary of the AGB values of 660 samples (Mg/ha).

Table 4 .
Experimental design for AGB estimation.

Table 5 .
Modeling performance for diverse prediction algorithms and data sources.