Combining Radiative Transfer Model and Regression Algorithms for Estimating Aboveground Biomass of Grassland in West Ujimqin, China

: Grassland aboveground biomass (AGB) is a crucial indicator when studying the carbon sink of grassland ecosystems. The exploration of the grassland AGB inversion method with viable reproducibility is signiﬁcant for promoting the practicability and efﬁciency of grassland quantitative monitoring. Therefore, this study provides a novel retrieval method for grassland AGB by coupling the PROSAIL (PROSPECT + SAIL) model and the random forest (RF) model on the basis of the lookup-table (LUT) method. These sensitive spectral characteristics were optimized to signiﬁcantly correlate with AGB (ranging from 0.41 to 0.68, p < 0.001). Four methods were coupled with the PROSAIL model to estimate grassland AGB in the West Ujimqin grassland, including the LUT method, partial least square (PLSR), RF and support vector machine (SVM) models. The ill-posed inverse problem of the PROSAIL model was alleviated using the MODIS products-based algorithm. Inversion results using sensitive spectral characteristics showed that the PROSAIL + RF model offered the best performance (R 2 = 0.70, RMSE = 21.65 g/m 2 and RMES r = 27.62%), followed by the LUT-based method, which was higher than the PROSAIL + PLSR model. Relatively speaking, the PROSAIL + SVM model was more challenging in this study. The proposed method exhibited strong robustness and universality for AGB estimation in large-scale grassland without ﬁeld measurements.


Introduction
Grassland ecosystems, as one of the largest ecosystems in the world, are an important part of natural ecosystems, playing a significant role in maintaining ecological balance, producing fodder crops and providing carbon sink functions [1,2]. Aboveground biomass (AGB), referring to the total dry weight of plants per unit area, is the key to our proper understanding of the interactions between the biosphere and atmosphere [3]. Thus, the accurate estimation and dynamic inspection of grassland AGB can evaluate the carrying capacity of grassland for livestock, comprehensively understand the process and influencing factors of global climate change, and help managers develop policies for the rational utilization of grassland to achieve sustainable development [4,5]. Although field measurements are the most accurate at obtaining grassland AGB, they are laborious, time-consuming, expensive and damaging to grasslands in large-scale areas [6,7]. Remote sensing has been widely used to effectively estimate grassland AGB owing to its large-scale earth observation image with high spatiotemporal resolution [8,9]. AGB inversion can be performed based on diverse types of remote sensing data, including active sensor data, optical data and multisource data [10]. Wang et al. [11] demonstrated the potential of airborne light detection and ranging (LiDAR) for retrieving Remote Sens. 2023, 15, 2918 2 of 22 AGB in the Hulunber grassland ecosystem in China. Schlund and Davidson [12] combined L-band and P-band SAR (Synthetic Aperture Radar) to estimate the boreal forest AGB. Jin et al. [13] used MODIS-based various vegetation indices to estimate AGB in temperate grasslands in Northern China. The accuracy of this model based on field data and MODIS data was greater than 73%. Lu et al. [14] offered a detailed description and generalization about the main features of the above data types. As an active remote sensing technology, LiDAR can offer detailed three-dimensional structural information on vegetation and has demonstrated significant potential for estimating vegetation's structural attributes, such as vegetation AGB, height and volume [15,16]. However, compared with satellite images with frequent and wall-to-wall coverage, it is not impractical for airborne LiDAR data in multitemporal and large-area applications due to its high costs [17]. Other active sensor SAR data have shown great advantages in estimating vegetation AGB by using the intensity, polarization and phase information of radar waves. Furthermore, compared with optical satellite data, SAR data have very low requirements for weather conditions during their satellite transit [8,18]. Therefore, many researchers have explored the application and performance of SAR data in AGB mapping [19,20]. However, the application of SAR data is constrained because of their high sensitivity to factors of the underlying surface, such as topographic relief, soil moisture and vegetation coverage, which may greatly reduce the accuracy of AGB mapping [21,22].
The methods of estimating grassland AGB using optical satellite data are dominantly empirical methods, including parametric and nonparametric methods. These parametric methods assume that there are explicit functional relationships between response variables (i.e., biomass) and predictors (i.e., potential variables from optical data). They can be basically grouped into linear and nonlinear models [23,24]. Although parametric approaches are simple and easy to operate, it is difficult for them to capture complex relationships. By contrast, nonparametric methods, such as artificial neural networks (ANN), random forest (RF) and support vector machines (SVM), are capable of exploring complicated nonlinear relationships [25][26][27][28]. Moreover, they have superiority when dealing with high-dimensional issues associated with large numbers of predictors [28]. However, their dependence on a large number of field-measured data restricts their versatility [29]. Hence, these empirical algorithms are generally limited to a certain time and area and have poor transferability.
The vegetation optical radiative transfer model (RTM) is based on accurate physical information that contains detailed relationships between vegetation parameters reflecting canopy characteristics and canopy spectra [30]. The RTM-based method provides advantages of transferability, parameter variability, and robustness compared with empirical statistical methods [31]. The inversion method using optical satellite images and RTM has broad applications in estimating vegetation biochemical and biophysical attributes, such as chlorophyll, leaf area index (LAI) and leaf water content [32][33][34][35]. Grassland AGB can be expressed as the product of LAI and leaf dry matter content (DMC) (AGB = DMC × LAI) [18], and DMC and LAI can be inversely calculated by the PROSAIL (PROSPECT + SAIL) RTM model. Based on the above definition, some studies have carried out the inversion of grassland AGB. Quan et al. [18] inversed LAI and DMC to estimate grassland AGB using the PROSAIL model and spectral data from Landsat-8 imagery. The statistical relationship between the measured equivalent water thickness (EWT) and DMC was adopted to filter unrealistic parameter combinations to minimize the RTM ill-posed problem and achieved good accuracy (R 2 = 0.64, RMSE = 42.67 g/m 2 ). He et al. [31] retrieved grassland AGB in the Zoige Plateau of China based on the PROSAIL model with the MODIS product (RMSE = 60.06 g/m 2 and R-RMSE = 18.1%).
A new method that couples RTM and regression models have been proposed for retrieving vegetation biochemical parameters owing to the complexity of RTM [36,37]. In this method, a simulated dataset derived from the RTM model becomes the primary data source and is regarded as the training data of the regression models for performing the prediction of vegetation attributes. This methodology can clarify the relationship between the vegetation parameters and spectral information while acquiring a high inversion accuracy by effectively combining the advantages of these two methods. This algorithm has shown unique advantages in some studies for estimating the biochemical parameters of vegetation [38,39]. However, few studies have investigated the use of this inversion technique when coupling RTM and regression models for mapping grassland AGB.
Therefore, in order to explore the grassland AGB estimation method with high accuracy and viable universality based on the combination of RTM and regression models, the specific objectives of this study were as follows: (1) perform PROSAIL model forward modeling to establish the simulated multispectral reflectance dataset of the grassland, (2) determine the sensitive bands and optimize the spectral indices based on the simulated dataset, which can be used as the predictors of subsequent inversion, and (3) compare the performance of the PROSAIL model separately when coupled with different regression algorithms (partial least square (PLSR), RF, and SVM) and that of the lookup-table (LUT) method for grassland AGB estimation using the AGB measurements.

Study Area
The study area was the West Ujimqin grassland, situated on the east of Xilin Gol League, Inner Mongolia Autonomous Region, China (43 • 57 -45 • 23 N, 116 • 21 -119 • 31 E) with a total area of approximately 22,500 km 2 ( Figure 1). The elevation of the West Ujimqin grassland ranges from 835 m to 1957 m. This area has a temperate continental climate with an annual average temperature of approximately 1 • C. The annual mean precipitation varies between 345 mm and 350 mm and is mostly concentrated in July and August. The study area is abundant in grass species, including Artemisia frigida, Artemisia halodendron, Caragana microphylla Lam, Carex spp. and Leymus chinensis.

Field Data Collection
The field measurements were carried out in the study area from 5 to 18 July 2015. Around the central location of the study region, sample plots were randomly selected according to subjective evaluations and distributed as evenly as possible. Forty-six 30 × 30 m plots were determined. Five subplots (1 × 1 m) were separately set at the center; four corners of each plot, and five pictures were taken in each of them. In each subplot, a GPS was used for positioning, and then destructive sampling was conducted for the aboveground grass by cutting all the grass. After removing the soil, gravel and other sundries adhering to the samples, they then were put into sealed plastic bags and transported to the laboratory, dried to a constant weight at 65 • C, and finally weighed to obtain the AGB. The AGB of each sample plot was expressed by the average value of the AGB of the five subplots.

Satellite Images Acquisition and Preprocessing
During the field measurements, the multispectral data observed in the study area were from Landsat-8 OLI (Operational Land Imager) products, which were obtained from the United States Geological Survey (USGS) (http://glovis.usgs.gov/ (accessed on 12 March 2022)). A total of two 185 × 185 km Landsat-8 OLI scenes were obtained to fully cover the study area. Only images with less than 10% cloud were selected as remote sensing data (5 July 2015 and 12 July 2015). The images were processed using the ENVI 5.3 software. Radiometric calibration was performed to obtain the top of the atmosphere radiance transformed from the digital number (DN). The FLAASH tool was subsequently used to perform atmospheric correction to convert the radiance value into surface reflectance [40]. Additionally, the FMASK algorithm was operated to detect and mask clouds and cloud shadows in the image. Finally, the two tiles were mosaicked together into one image, which was then clipped according to the spatial extent of the study area.

Field Data Collection
The field measurements were carried out in the study area from 5 to 18 July 2015. Around the central location of the study region, sample plots were randomly selected according to subjective evaluations and distributed as evenly as possible. Forty-six 30 × 30 m plots were determined. Five subplots (1 × 1 m) were separately set at the center; four corners of each plot, and five pictures were taken in each of them. In each subplot, a GPS was used for positioning, and then destructive sampling was conducted for the above-

Methods
A schematic representation of the methodology in this study is given in Figure 2. The global sensitivity analysis for the input parameters in the PROSAIL_5B model was first performed to recalibrate their values. On this basis, a simulated canopy reflectance dataset was established (Section 2.3.1). The simulated canopy spectra were converted to the pixel spectra of Landsat-8 OLI, and the subsequent analyses were implemented using simulated multispectral data (Section 2.3.1). The sensitive spectral bands and indices were selected and optimized according to the correlation analysis (Section 2.3.1). The LUT-based method was utilized to invert grassland AGB based on the observed and simulated spectral The PROSAIL model [41], coupled with the PROSPECT model [42] and the SAIL model [43], calculated the vegetation canopy reflectance in the whole spectral direction. In this study, the version of PROSAIL_5B resulting from a combination of PROSPECT_5B and 4SAIL was adopted [44]. In the PROSPECT_5B leaf optical properties model, the leaf reflectance and transmittance were calculated as a function of six parameters (Equation (1)) [45]. Then, these two parameters were input into the 4SAIL canopy reflectance model to simulate the canopy spectral (Equation (2)) [46]: The PROSAIL model [41], coupled with the PROSPECT model [42] and the SAIL model [43], calculated the vegetation canopy reflectance in the whole spectral direction. In this study, the version of PROSAIL_5B resulting from a combination of PROSPECT_5B and 4SAIL was adopted [44]. In the PROSPECT_5B leaf optical properties model, the leaf reflectance and transmittance were calculated as a function of six parameters (Equa-Remote Sens. 2023, 15, 2918 6 of 22 tion (1)) [45]. Then, these two parameters were input into the 4SAIL canopy reflectance model to simulate the canopy spectral (Equation (2)) [46]: (ρ l , τ l ) = PROSPECT_5 B(N, C ab , C car , C brown , DMC, EWT) (1) where ρ 1 is the leaf reflectance, τ 1 is the leaf transmittance and ρ c is canopy reflectance. The full names, symbols and units of all parameters required for the model are given in Table 1.
The PROSAIL model requires the involvement of many parameters. To reduce the amount of computation in the forward process, improve the inversion speed and ensure the accuracy of the simulation simultaneously, in this study, fixed values were set for some parameters based on the previous literature or field surveys, including the Leaf inclination distribution function (LIDF), leaf brown pigment (C brown ), carotenoid content (C ar ) and geometric characteristics. The LIDF was determined by the average leaf slope (LIDFa) and the bimodality of the distribution (LIDFb). Six leaf inclination modes could be obtained according to the different combinations of the two parameters. LIDF does typically not influence the canopy reflectance; hence, it was fixed in a spherical mode (LIDFa = −0.35, LIDFb = −0.15) based on the field measurements of this study. The C brown was fixed to 0 since field sampling was conducted during the grass growth stage. Car was set at 8 µg/cm 2 (the default value in the PROSAIL model) because its impact on LAI and DMC was negligible [18]. The geometric characteristics described by θ s , θ v , and θ z were determined according to the Landsat-8 metadata information. The ranges of other parameters (LAI, N, C ab , DMC, EWT, P soil and H spot ) were obtained from the literature and MOD15A2H product (Table 1). Their settings were recalibrated by sensitivity analysis (see Section 3.1) owing to the lack of measured data: the step size of the sensitive parameters was identified, and the values of the less sensitive parameters were fixed.

• Sensitivity analysis
Global sensitivity analysis (GSA) was conducted to study the influences of the input parameters (LAI, N, C ab , DMC, EWT, P soil and H spot ) and the interactions between different parameters on simulated canopy reflectance. The Sobol method is a global sensitivity analysis that applies the Monte Carlo algorithm for sampling, which is capable of obtaining the first-order, second-order and higher-order sensitivity of parameters based on the model decomposition [51]. In this study, the first-order sensitivity indices and total sensitivity indices were calculated, where the former represented the contribution of individual parameters to the model outputs, and the latter reflected the interactive influence among parameters [45]. To prevent the influence of insufficient samples and uneven distribution on the sensitivity analysis, 5000 samples were randomly generated by the Monte Carlo algorithm; then, the Sobol method was used for parameter sensitivity calculation.

• Spectral scale conversion
According to different measurement scales, spectral data could be classified into three types: material spectrum, end element spectrum and pixel spectrum [52]. On the basis of our research's purpose, it was necessary to convert the spectral scales. In this study, the simulated canopy reflectance in the study area was transformed into the pixel reflectance of Landsat-8 OLI. The simulated Landsat-8 OLI reflectance was derived with Equation (3) [53]: where R PROSAIL is the simulated hyperspectral reflectance of the PROSAIL model and λ 1 and λ 2 are the band wavelength limits. RSR λ is the spectral response of the Landsat-8 OLI sensor ( Figure 3). Monte Carlo algorithm; then, the Sobol method was used for parameter sensitivity calculation. •

Spectral scale conversion
According to different measurement scales, spectral data could be classified into three types: material spectrum, end element spectrum and pixel spectrum [52]. On the basis of our research's purpose, it was necessary to convert the spectral scales. In this study, the simulated canopy reflectance in the study area was transformed into the pixel reflectance of Landsat-8 OLI. The simulated Landsat-8 OLI reflectance was derived with Equation (3) [53]: where PROSAIL R is the simulated hyperspectral reflectance of the PROSAIL model and  •

Spectral indices
The vegetation indices could reduce the effects of interference factors such as the atmosphere, terrain, shadow and soil background with the purpose of highlighting vegetation information [54,55]. Therefore, RVI (ratio vegetation index) and NDVI (normalized difference vegetation index), which are widely used, were selected in this study. To enhance the performance of these two-band spectral indices and find the indicators that are significantly related to AGB, the band combination was changed from the original fixed red and near-infrared (NIR) band to all the possible two bands from simulated multispectral reflectance. Two types of spectral indices, RSI (ratio spectral index) and NDSI (normalized difference spectral index) [39], were considered. The calculation formulas were as follows (Equations (4) and (5)) [56]: •

Spectral indices
The vegetation indices could reduce the effects of interference factors such as the atmosphere, terrain, shadow and soil background with the purpose of highlighting vegetation information [54,55]. Therefore, RVI (ratio vegetation index) and NDVI (normalized difference vegetation index), which are widely used, were selected in this study. To enhance the performance of these two-band spectral indices and find the indicators that are significantly related to AGB, the band combination was changed from the original fixed red and near-infrared (NIR) band to all the possible two bands from simulated multispectral reflectance. Two types of spectral indices, RSI (ratio spectral index) and NDSI (normalized difference spectral index) [39], were considered. The calculation formulas were as follows (Equations (4) and (5)) [56]: where ρ is the spectral reflectance, and the subscripts i and j refer to the bands of Landsat-8 OLI (i = j).

LUT-Based Method
The LUT method is a widely used inversion methodology of vegetation biophysical and biochemical parameters. It is a traversal optimization algorithm that effectively avoids the problem of local minima by directly comparing the simulated spectra with the remotesensing image spectra [57]. During this inversion, a cost function was required to search for the optimal match between the optical image and simulated spectral features to identify the parameter combinations per pixel. RRMSE (relative root mean squared error) was used as the cost function (Equation (6)) [31]: where n is the dimension of the spectral feature vector used. R LUT i and R OLI i are the simulated and observed spectral features (Landsat 8 OLI-based variables), respectively.
Due to the ill-posed problem during the inversion process, the parameter combination (DMC and LAI) corresponding to spectral features in the LUT that yielded the minimum RRMSE was not always the best solution [58]. Therefore, to alleviate this problem, we investigated the use of the multiple solutions method, which considers the best 50 to 500 LUT entries (a step size of 50) as the retrieved parameter combinations. However, there may still be some unrealistic combinations in the retrieved parameter combinations obtained by the LUT-based method because of the ill-posed problem. Following the study of Quan et al. [18], which obtained good accuracy by estimating grassland AGB when LAI was constrained at a range to filter unrealistic combinations using MODIS products. The MOD15A2H product is an 8-day comprehensive dataset with 500 m spatial resolution [59]. The accuracy that the LAI value provides has been evaluated by [60][61][62]. In this study, therefore, the LAI range of the Landsat-8 image was estimated based on MOD15A2H and MOD13A1 products. MOD15A2H and MOD13A1 products (spatial resolution: 500 m) in the study area during the field measurements were downloaded from the USGS. LAI and EVI within the vegetation coverage in the study area were first extracted from these two products to yield a dataset consisting of 10,947 entries, of which 70% were used as the training set and the rest as a testing set. LAI and EVI within the vegetation coverage in the study area were first extracted from these two products to yield a dataset consisting of 10,947 entries, of which 70% were used as the training set and the rest as a testing set. Additionally, the model between LAI and EVI was trained by the RF algorithm, where LAI was the response variable, and EVI was the predictor. The trained model had R 2 with 0.89 and RMSE with 0.32. Then, the EVI value for each pixel (spatial resolution: 30 m) was derived with the bands of Landsat-8 OLI (Equation (7)): where ρ NIR , ρ red and ρ blue are the NIR band, the red band and the blue band, respectively. The calculated EVI values were input into the trained RF model to estimate the corresponding LAI values of Landsat-8 OLI (spatial resolution: 30 m). Finally, an LAI range was calculated by the estimated LAI value per pixel (0.8 × LAI, 1.2 × LAI) [18], and the combi-nations placed out of this range were removed from the retrieved parameter combinations. The average value of AGB (LAI × DMC) obtained from the remaining combinations was taken as the estimated AGB value of the pixel.

Regression Algorithms
Three regression algorithms were used to estimate the grassland AGB. These algorithms were PLSR, RF and SVM. To prevent a large uncertainty of the prediction results caused by the unrealistic spectral features in the regressing process, the practical simulated spectral features reserved in the LUT-based method were applied to train the models.
• PLSR PLSR is a commonly used multivariate regression algorithm for vegetation parameter estimation [63,64]. This technique is designed to alleviate multicollinearity and noise problems in the data [65]. PLSR concentrates many collinear variables on a few independent latent variables while eliminating variables with less information [66]. It has a similar principle to principal component regression (PCR), but with the difference that PCR only considers predictors in the decomposition process [66], while PLSR iteratively decomposes the response variables and predictors simultaneously to achieve the best fitting [63,66]. In this study, PLSR was used to develop a linear model between the AGB (response variable) and spectral characteristics (predictors). To avoid the over-fitting problem, we used the leave-one-out cross-validation approach [63].
• RF RF is a member of ensemble machine learning regression algorithms that predict a response variable (AGB) through a set of predictors (spectral characteristics) by generating many regression trees (forest) and taking the average value of the responses predicted from all trees as the result. In the forest, each regression tree is independently built by the bootstrap samples that only reserve two-thirds of the sample data [67]. At each node, the selection of predictors is based on a randomly chosen subset instead of all the predictors, which is a key feature for improving prediction performance [67]. RF has certain antiinterference to the noise in the training dataset and robustness against nonlinearity as well as over-fitting problems [68].
The model was optimized by tuning the following parameters with the aim of minimizing the error: ntree (the number of trees generated by the RF) and mtry (the number of predictors at each node). In this study, RF was implemented using the RF package in the R programming environment.
• SVM Support vector regression (SVR) is a method of SVM that can solve the problem of regression fitting, which minimizes the error of all training samples from the optimal hyperplane by finding an optimal classification surface [69]. Both linear and nonlinear regression predictions could be performed using SVR. It directly conducts linear regression to solve the linear problem. However, the kernel function was required to map the training samples to a high-dimensional space when solving the nonlinear problem, and then linear regression was performed. SVR can produce a relatively better prediction accuracy with a small sample compared to other algorithms [70]. In this study, the type of kernel function was the radial basis function (RBF), and the epsilon-SVR was selected as the SVM model.

Model Validation
To evaluate the performance of the above methods for estimating grassland AGB, the LUT-based method and the three constructed regression models were applied to the Landsat-8 image and were assessed using field observations. The coefficient of determination (R 2 ), the root mean square error (RMSE) and relative root mean square error (RMSE r ) (different from RRMSE in the LUT-based method) were used to evaluate the relationships between the measured and estimated AGB, and the evaluation indicator formulas were as follows (Equations (8)-(10)): where y est is the estimated AGB, y mea and y mea are the measured AGB and the average value of the measured AGB, respectively. n is the number of field observations.

Sensitivity Analysis
The global sensitivity curves for seven parameters (LAI, N, C ab , DMC, EWT, P soil and H spot ) were obtained from Figure 4, including the first-order sensitivity coefficients ( Figure 4a) and total sensitivity coefficients ( Figure 4b). The results showed that the firstorder sensitivity index of each parameter was slightly below the total sensitivity index. The sum of the Sobol first-order sensitivity contributions for these parameters was close to 100%, and the sum of the Sobol total sensitivity coefficients was marginally more than 100%, which indicated that their interactions had no evident effect on the variation of canopy reflectance.    The regions of the sensitive spectra varied significantly among the parameters (Figure 4a). The sensitivity of C ab fluctuated in the visible range (400-780 nm) and was mainly sensitive to 530-642 nm and the red range (692-744 nm) with an index above 0.3. The sensitivity of EWT had an opposite trend compared to that of C ab . Obvious changes existed in the NIR region (780-2500 nm), in which its contribution was more than 0.4 within 1149-1882 nm. As two key parameters for calculating grassland AGB, LAI was highly sensitive to 400-525 nm, 658-689 nm, 1893-1998 nm and 2329-2500 nm with a contribution above 0.7. DMC's sensitivity index, which fluctuated continuously between 0-0. 35 and was more sensitive to 753-1152 nm and 1236-1301 nm in the NIR range, with the contribution rate reaching more than 20%. The sensitivity indices of P soil , N, and H spot within the whole spectral range were below 0.1, indicating that they were extremely insensitive to the overall canopy reflectance (400-2500 nm).

Spectral Scale Conversion and Correlation Analysis
Based on the above analysis, C ab , LAI, DMC and EWT were identified as sensitive parameters that contributed greatly to the grassland canopy reflectance; their values were set in certain step sizes: the step size of LAI and DMC was set as small as possible, and the settings of C ab and EWT were relatively large because they had less influence on the target variable (AGB). Whereas N, P soil and H spot were determined to be insensitive parameters, they were set as fixed values according to [31,46,50], respectively. The final calibration of the seven model parameters is shown in Table 2. As a result, the fixed value or step size of all input parameters required for the PROSAIL model was identified (Tables 1 and 2). Additionally, the simulated spectral dataset was generated based on the PROSAIL model, containing 46,800 simulated spectral and corresponding parameter combinations.

Spectral Scale Conversion and Correlation Analysis
The simulated multispectral reflectance of Landsat-8 OLI (B1, B2, B3, B4, B5, B6, B7) was generated from the simulated hyperspectral reflectance based on the spectral response functions. The reflectance of seven bands in the simulated Landsat-8 OLI spectral dataset is shown in Figure 5 (500 randomly selected entries). Correlations were analyzed between the simulated spectral bands and AGB. B1, B2 and B7 revealed a high correlation with AGB among the single bands with correlation coefficients (R) that reached more than 0.45 at a p-value < 0.001 ( Figure 6).

AGB Model Performance
As described in Section 3.2, the selected spectral bands and indices and corresponding grassland AGB from the simulated multispectral reflectance dataset, which was constructed based on the PROSAIL model, were used to form a new dataset to estimate AGB in the study area based on LUT-based and regression algorithms. Their estimation accuracies were validated against 46 AGB field measurements.

Estimation Accuracy Based on LUT-Based Method
In the LUT-based method, RRMSE (Equation (6)) was used as a cost function, where the feature vector referred to the 11 spectral features listed in Table 3 (i.e., n = 11, three spectral bands and eight spectral indices). For the observed spectral characteristics per pixel, LUT was sorted in ascending order of RRMSE. We tested the best 50 to 600 LUT entries (a step size of 50) as the retrieved parameter combinations (a total of 12 multiple solutions). The accuracies of the different solutions for estimating AGB are presented in Figure 8. The results demonstrated that the first 450 solutions showed the best accuracy with R 2 = 0.62, which was regarded as the best measure regarding the LUT-based method in this study. In summary, the three spectral bands (B1, B2 and B7) and eight spectral indices (NDSI b3, b4 , NDSI b5, b6 , NDSI b5, b7 , NDSI b6, b7 and RSI b3, b4 , RSI b5, b6 , RSI b5, b7 , RSI b6, b7 ) that significantly correlated with AGB were chosen to establish the models for estimating AGB in this study (Table 3).

AGB Model Performance
As described in Section 3.2, the selected spectral bands and indices and corresponding grassland AGB from the simulated multispectral reflectance dataset, which was constructed based on the PROSAIL model, were used to form a new dataset to estimate AGB in the study area based on LUT-based and regression algorithms. Their estimation accuracies were validated against 46 AGB field measurements.

Estimation Accuracy Based on LUT-Based Method
In the LUT-based method, RRMSE (Equation (6)) was used as a cost function, where the feature vector referred to the 11 spectral features listed in Table 3 (i.e., n = 11, three spectral bands and eight spectral indices). For the observed spectral characteristics per pixel, LUT was sorted in ascending order of RRMSE. We tested the best 50 to 600 LUT entries (a step size of 50) as the retrieved parameter combinations (a total of 12 multiple solutions). The accuracies of the different solutions for estimating AGB are presented in Figure 8. The results demonstrated that the first 450 solutions showed the best accuracy with R 2 = 0.62, which was regarded as the best measure regarding the LUT-based method in this study.  Figure 9a shows the validation accuracy based on the LUT-based method with an R 2 value of 0.62, an RMSE of 24.28 g/m 2 and an RMSEr of 30.97%. However, it was found that the retrieved AGB values were marginally underestimated in the range of more than approximately 100 g/m 2 . This was probably because the spectra of the soil used in the PROSAIL model did not exactly match the actual soil situation in this region.  Figure 9a shows the validation accuracy based on the LUT-based method with an R 2 value of 0.62, an RMSE of 24.28 g/m 2 and an RMSE r of 30.97%. However, it was found that the retrieved AGB values were marginally underestimated in the range of more than approximately 100 g/m 2 . This was probably because the spectra of the soil used in the PROSAIL model did not exactly match the actual soil situation in this region.

Estimation Accuracy Based on PROSAIL and Regression Algorithms
A total of 18544 simulated spectral features and corresponding parameter combinations (spectral features are listed in Table 3) were reserved in the LUT-based method, of which 70% were used for training and 30% for testing to obtain the best estimation model. Figure 9b-d shows the accuracy results of the PROSAIL model when coupled with these regression algorithms using the 46 measured AGB values. The scatter plots of measured AGB and estimated AGB showed obvious differences among the three algorithms for estimating AGB. The top-performing method was the PROSAIL + RF model, which obtained the highest R 2 (0.70) and the lowest RMSE (21.65 g/m 2 ) and RMSE r (27.62%) compared to the other two regression models. Additionally, the accuracy of the PROSAIL + RF model was also better than that of the LUT-based method. The differences in R 2 and RMSE between the PROSAIL + RF and LUT-based methods were approximately 0.08 and 2.63 g/m 2 , respectively. Figure 9b shows that the measured versus estimated AGB based on the PROSAIL + RF model was relatively and evenly distributed around the 1:1 line. This was followed by the PROSAIL + PLSR algorithm (R 2 = 0.51, RMSE = 27.52 g/m 2 , RMSE r = 35.10%), which was poorer than the LUT-based method. In addition, the predicted AGB values based on the PROSAIL + PLSR model were mostly underestimated to various degrees, indicating that the saturation problem occurred (Figure 9c). The PROSAIL + SVM model performed relatively worse compared to all other algorithms (R 2 = 0.30, RMSE = 32.88 g/m 2 , RMSE r = 41.94%). However, the saturation phenomenon was not observed for the high AGB values (Figure 9d).

Estimation Accuracy Based on PROSAIL and Regression Algorithms
A total of 18544 simulated spectral features and corresponding parameter combinations (spectral features are listed in Table 3) were reserved in the LUT-based method, of which 70% were used for training and 30% for testing to obtain the best estimation model. Figure 9b-d shows the accuracy results of the PROSAIL model when coupled with these regression algorithms using the 46 measured AGB values. The scatter plots of measured AGB and estimated AGB showed obvious differences among the three algorithms for estimating AGB. The top-performing method was the PROSAIL + RF model, which obtained the highest R 2 (0.70) and the lowest RMSE (21.65 g/m 2 ) and RMSEr (27.62%) compared to the other two regression models. Additionally, the accuracy of the PROSAIL + RF model was also better than that of the LUT-based method. The differences in R 2 and RMSE between the PROSAIL + RF and LUT-based methods were approximately 0.08 and 2.63 g/m 2 , respectively. Figure 9b shows that the measured versus estimated AGB based on the PROSAIL + RF model was relatively and evenly distributed around the 1:1 line. This was followed by the PROSAIL + PLSR algorithm (R 2 = 0.51, RMSE = 27.52 g/m 2 , RMSEr = 35.10%), which was poorer than the LUT-based method. In addition, the predicted AGB values based on the PROSAIL + PLSR model were mostly underestimated to various degrees, indicating that the saturation problem occurred

Spatial Distribution of Grassland AGB
The estimation map of grassland AGB was mapped by coupling the PROSAIL model and RF algorithm on the basis of the LUT method in the study area ( Figure 10). The estimated values varied from 8.573 g/m 2 to 293.378 g/m 2 . We masked the pixels (white part of Figure 10) in non-grassland areas such as clouds, rivers and towns, which were not involved in AGB inversion. The spatial distribution of retrieved AGB had significant heterogeneity and showed the characteristic of increasing from west to east. model and RF algorithm on the basis of the LUT method in the study area ( Figure 10). The estimated values varied from 8.573 g/m 2 to 293.378 g/m 2 . We masked the pixels (white part of Figure 10) in non-grassland areas such as clouds, rivers and towns, which were not involved in AGB inversion. The spatial distribution of retrieved AGB had significant heterogeneity and showed the characteristic of increasing from west to east.

The Simulated Dataset Based on PROSAIL Model
In this study, LAI, Cab, EWT and DMC were determined as sensitive to canopy reflectance among the PROSAIL model parameters through global sensitivity analysis, while N, Psoil and Hspot had fewer sensitivity indices within the whole spectral range. The sensitivity index is an effective indicator for measuring the impact of the input parameters on the output of a model. Sawut et al. [39] fixed DMC, Hspot, N and EWT during simulations of the Suaeda salsa canopy reflectance based on the PROSAIL model, while sensitive parameters such as Bt (Betalain content), Psoil, LAI, and Cab set to different values by means of step size. The Sentinel-2A reflectance was further simulated based on the canopy reflectance, and reliable Bt inversion results were obtained (R 2 =0.68). Jing et al. [71] fixed five insensitive parameters based on the sensitivity analysis of the PROSAIL model to reduce the complexity and instability of LAI estimation. A total of 46800 simulated grassland canopy reflectance of our study were yielded based on different combinations of PROSAIL input parameters. To perform the inversion of AGB from the satellite images, the simulated Landsat-8 OLI reflectance dataset was further generated with spectral response functions. The simulated B1, B2 and B7 bands showed a high correla-

The Simulated Dataset Based on PROSAIL Model
In this study, LAI, C ab , EWT and DMC were determined as sensitive to canopy reflectance among the PROSAIL model parameters through global sensitivity analysis, while N, P soil and H spot had fewer sensitivity indices within the whole spectral range. The sensitivity index is an effective indicator for measuring the impact of the input parameters on the output of a model. Sawut et al. [39] fixed DMC, H spot , N and EWT during simulations of the Suaeda salsa canopy reflectance based on the PROSAIL model, while sensitive parameters such as Bt (Betalain content), P soil , LAI, and C ab set to different values by means of step size. The Sentinel-2A reflectance was further simulated based on the canopy reflectance, and reliable Bt inversion results were obtained (R 2 = 0.68). Jing et al. [71] fixed five insensitive parameters based on the sensitivity analysis of the PROSAIL model to reduce the complexity and instability of LAI estimation. A total of 46,800 simulated grassland canopy reflectance of our study were yielded based on different combinations of PROSAIL input parameters. To perform the inversion of AGB from the satellite images, the simulated Landsat-8 OLI reflectance dataset was further generated with spectral response functions. The simulated B1, B2 and B7 bands showed a high correlation with AGB, reaching more than 0.45 (p < 0.001). Additionally, RSI and NDSI were calculated based on simulated multispectral reflectance. The inversion accuracy based on traditional vegetation indices largely relied on the selected bands [72]. Although many vegetation indices have been studied, some bands are still seldom or never used together. In order to identify the bestperforming spectral bands in the inversion of grassland AGB, therefore, it was necessary to test all the possible two-band combinations from simulated multispectral reflectance. Many studies have proved that spectral indices show great advantages in estimating vegetation biophysical and biochemical attributes [39,[73][74][75]. The correlation analysis results of our study showed that the spectral indices NDSI b3,b4 , NDSI b5,b6 , NDSI b5,b7 , NDSI b6,b7 and RSI b3,b4 , RSI b5,b6 , RSI b5,b7 , RSI b6,b7 provided strong correlation(R > 0.4, p < 0.001). The 11 sensitive spectral features of AGB were successfully obtained (Table 3).

Dataset Optimization Based on LUT Method
Some progress has been attained in the retrieval of vegetation parameters, such as LAI, C ab and EWT, based on the RTM model [32,76,77]. However, the inversion accuracy of the traditional RTM was limited owing to the complexity of the model algorithm: the difference between the simulated and actual spectral information caused by many surface uncertainty factors and the influence of an ill-posed inversion process. Therefore, many studies have been performed to alleviate the problems encountered in the inversion of RTM. Some researchers have attempted to use prior knowledge to constrain the number and range of the input parameters and reduce the influence of ill-posed inversion problems [78]. Additionally, the ill-posed problem was also solved to some extent by using multiple solutions instead of a unique solution. Darvishzadeh et al. [57] analyzed the unique solution and multiple solutions methods separately and finally considered that the average value of the first 100 solutions was the most favorable in their study. Quan et al. [18] used MODIS products to filter out the unrealistic combinations in the LUT and obtained a good estimation result of AGB with an R 2 of 0.64 and RMSE of 42.67 g/m 2 . Thus, for the LUT-based method in this study, the method of multiple solutions and the utilization of MODIS products to eliminate unrealistic parameter combinations was also considered, thereby achieving the optimization of the simulated dataset. The first 450 solutions were filtered and then averaged. The estimation accuracy R 2 reached 0.62, and the RMSE and RMSE r were 24.28 g/m 2 and 30.97%, respectively, which confirmed the effectiveness of the LUT-based method in this study.

Estimation Performance
In this study, the inversion of grassland AGB was achieved by combining the PROSAIL model with LUT using the sensitive bands and spectral indices (Table 3) as explanatory variables. Furthermore, the PROSAIL model was separately coupled to the three regression algorithms (PLSR, RF and SVM) to perform the same predictions using simulated spectral features reserved in the LUT-based method as predictors, and the accuracies of the four methods were compared. The PROSAIL + RF model showed the best performance (R 2 = 0.70, RMSE = 21.65 g/m 2 , RMSE r = 27.62%), followed by the LUT-based method (R 2 = 0.62, RMSE = 24.28 g/m 2 , RMSE r = 30.97%). The accuracy level of grassland AGB estimated by the PROSAIL + PLSR model (R 2 = 0.51, RMSE = 27.52 g/m 2 , RMSE r = 35.10%) was distinctly better than the PROSAIL + SVM model (R 2 = 0.30, RMSE = 32.88 g/m 2 , RMSE r = 41.94%) but lower than the LUT-based method. However, a saturation phenomenon occurred in the PROSAIL + PLSR, and this problem was not discovered in the other methods. The outstanding performance of the RF model could be attributed to its distinctive variable selecting and regression method promoting robustness against noise [67]. PLSR was powerful at dealing with collinearity and noisy data [65]. However, its performance was limited due to the fact that it only constructed a linear model, which could not clarify the complex relationship between the variables and AGB very well. The poor estimation performance of the SVM model was possibly caused by the deviation of the optimal hyperplane generated by SVR due to the noise in the training dataset.
The method combining the PROSAIL model with regression algorithms to estimate the grassland AGB was based on the simulated spectral dataset constructed by the RTM, which contained an explicit relationship between canopy attributes and spectral features. Traditional empirical methods using AGB field data and remote sensing features have been considerably mature for grassland AGB estimation. Liang et al. [79] established regression models between the NDVI derived from MODIS and the measured grassland AGB, with a minimum RMSE of 87.512 g/m 2 . Quan et al. [18] constructed PLSR models between Landsat-8 OLI multispectral data and AGB measurements and attained a good result (R 2 = 0.55, RMSE = 37.79 g/m 2 ). Compared with previous empirical methods, the proposed method, especially the RTM combined with the RF model in this study, achieved similar or even higher accuracies. This methodology is rarely applied to the estimation of grassland AGB, although similar studies have been conducted to retrieve vegetation biochemical parameters. Lü [80] constructed the PROSAIL + GBM (gradient boosting machine) model and the PROSAIL + RF model to estimate the chlorophyll content at the canopy scale and found that the latter acquired better accuracy and was less timeconsuming. Sawut et al. [39] integrated the PROSAIL model with SVM to estimate Bt in Suaeda salsa at different spectral scales, and its prediction accuracy (R 2 ) reached 0.78 at the hyperspectral scale and 0.68 at the multispectral scale. Traditional empirical methods are highly dependent on sufficient field-measured data and can only be applied at a specific time and site. By contrast, the simulated dataset based on the RTM was used by the data source to estimate grassland AGB. The measured data were only used to evaluate the accuracy of each estimation model. Thus, it was not limited by the measured data and could be applied to grassland AGB estimation across regions. Furthermore, the simulated dataset was further optimized based on MODIS products rather than used directly to make it more representative of the study area; this achieved the aim of reducing the uncertainty of the inversion result. Therefore, the method based on RTM (especially the RTM + RF model) was more reproducible and promising. The estimation results indicated that the PROSAIL + RF model with optimized spectral characteristics displayed great potential for estimating grassland AGB (R 2 = 0.70, RMSE = 21.65 g/m2, RMSE r = 27.62%). It was illustrated that the coupling of the PROSAIL and the RF model could alleviate this ill-posed problem to a certain extent and successfully estimate grassland AGB. This methodology should be further investigated and validated in other grassland areas with different climate conditions and grassland types before extensive application.

Conclusions
In this study, a novel method of combining the PROSAIL model and regression models based on the LUT algorithm was proposed to estimate grassland AGB. To overcome this ill-posed problem, the relationship between the EVI and LAI of MODIS products was determined and was then applied to filter out the unrealistic combinations of spectral characteristics and their corresponding parameters. The filtered dataset used in the LUTbased method was applied to the other three algorithms (PLSR, RF and SVM) to estimate AGB. The main findings of this study were as follows: (1) LAI, C ab , EWT and DMC were determined as sensitive to canopy reflectance among the PROSAIL model parameters through global sensitivity analysis.
Therefore, the proposed method in this study demonstrated significance in monitoring the spatiotemporal distribution and changes in grassland AGB in large-scale areas. Future work should focus on extending this estimated workflow into other regions to further verify its effectiveness.