A Two-Stage Approach to the Estimation of High-Resolution Soil Organic Carbon Storage with Good Extension Capability

: Soil organic carbon storage (SOCS) estimation is a crucial branch of the atmospheric– vegetation–soil carbon cycle study under the background of global climate change. SOCS research has increased worldwide. The objective of this study is to develop a two-stage approach with good extension capability to estimate SOCS. In the ﬁrst stage, an artiﬁcial neural network (ANN) model is adopted to estimate SOCS based on 255 soil samples with ﬁve soil layers (20 cm increments to 100 cm) in Luoding, Guangdong Province, China. This method is compared with three common methods: The soil type method (STM), ordinary kriging (OK), and radial basis function (RBF) interpolation. In the second stage, a linear model is introduced to capture the regional differences and further improve the estimation accuracy of the Luoding-based ANN model when extending it to Xinxing, Guangdong Province. This is done after assessing the generalizability of the above four methods with 120 soil samples from Xinxing. The results for the ﬁrst stage show that the ANN model has much better estimation accuracy than STM, OK, and RBF, with the average root mean square error (RMSE) of the ﬁve soil layers decreasing by 0.62–0.90 kg · m − 2 , R 2 increasing from 0.54 to 0.65, and the mean absolute error decreasing from 0.32 to 0.42. Moreover, the spatial distribution maps produced by the ANN model are more accurate than those of other methods for describing the overall and local SOCS in detail. The results of the second stage indicate that STM, OK, and RBF have poor generalizability ( R 2 < 0.1), and the R 2 value obtained with ANN method is also 43–56% lower for the ﬁve soil layers compared with the estimation accuracy achieved in Luoding. However, the R 2 of the linear models built with the 20% soil samples from Xinxing are 0.23–0.29 higher for the ﬁve soil layers. Thus, the ANN model is an effective method for accurately estimating SOCS on a regional scale with a small number of ﬁeld samples. The linear model could easily extend the ANN model to outside areas where the ANN model was originally developed with a better level of accuracy.


Introduction
Against the background of global warming, the accurate estimation of soil organic carbon storage (SOCS) has become a massive task faced by the forestry and ecological circles. Due to the different data acquisition platforms available in China, putting forward an accurate and useful SOCS estimation standard and method [1] is urgently needed for SOCS-related research to progress. In recent years, the amount of research on soil organic carbon (SOC) and SOCS has increased, both domestically and overseas [1,2], with a mean annual growth rate of 9%. Emerging SOCS estimation methods have been promoted by a few researchers. The existing SOCS estimation methods can be roughly classified into three categories: Soil-type methods, life-zone methods, and modelling methods. A recent study showed that 93 of the 500 previously published papers on carbon storage focus on soil carbon estimation, among which 79 (84.9%) used the soil-type method [2]. The The main tree species present were Cunninghamia lanceolata (Lamb.) Hook., Eucalyptus robusta Smith, and Pinus massoniana Lamb. [23]. The soil parent materials included granite, sand shale, and erosive parent materials [21]. The soil zone of the study area was the lateritic red earth zone of the south Asian subtropical monsoon rain forest. The mean soil thickness in the Luoding and Xinxing study areas was 107.8 and 99.3 cm, respectively.

Field Sample
Soil profile data for Luoding (Estimation area) and Xinxing (Extension area) were obtained from Guangdong Academy of Forestry Sciences as part of an investigation and evaluation in a forest soil nutrient survey project that began in 2015 [23]. In total, 345 soil profiles were collected, of which 225 and 120 samples came from Luoding and Xinxing, respectively. Each profile was divided into five layers: 0-20 cm (L1), 20-40 cm (L2), 40-60 cm (L3), 60-80 cm (L4), and 80-100 cm (L5). The excavation depth of the profiles was either 1 m or to the parent material layer if a profile's thickness was less than 1 m. A total of 1695 samples were collected from 345 soil profiles with 3-5 layers. The bulk density (BD) was measured by the ring (core) method [24,25]. The BD was calculated with Equation (1), and its statistics are listed in Table 1. The soil samples were air-dried in the laboratory stage and then sieved through 3, 2, and 0.25 mm sieves. The weight and volume of the gravel (diameter > 2 mm) were recorded. The soil organic matter(SOM) content of soil samples was measured by the potassium dichromate-titration method [26]. Then, the SOM values of the soil samples were transformed into the SOCD using Equation where is the bulk density of the samples (g·cm −3 ), is the original weight of the soil (g), is the soil water content (as a percentage), is the cylinder's volume (cm 3 ). is the soil organic carbon density (kg·m −2 ) of the samples, T is the thickness of the soil layer

Field Sample
Soil profile data for Luoding (Estimation area) and Xinxing (Extension area) were obtained from Guangdong Academy of Forestry Sciences as part of an investigation and evaluation in a forest soil nutrient survey project that began in 2015 [23]. In total, 345 soil profiles were collected, of which 225 and 120 samples came from Luoding and Xinxing, respectively. Each profile was divided into five layers: 0-20 cm (L1), 20-40 cm (L2), 40-60 cm (L3), 60-80 cm (L4), and 80-100 cm (L5). The excavation depth of the profiles was either 1 m or to the parent material layer if a profile's thickness was less than 1 m. A total of 1695 samples were collected from 345 soil profiles with 3-5 layers. The bulk density (BD) was measured by the ring (core) method [24,25]. The BD was calculated with Equation (1), and its statistics are listed in Table 1. The soil samples were air-dried in the laboratory stage and then sieved through 3, 2, and 0.25 mm sieves. The weight and volume of the gravel (diameter > 2 mm) were recorded. The soil organic matter (SOM) content of soil samples was measured by the potassium dichromate-titration method [26]. Then, the SOM values of the soil samples were transformed into the SOCD using Equation (2): where BD s is the bulk density of the samples (g·cm −3 ), M is the original weight of the soil (g), w is the soil water content (as a percentage), V is the cylinder's volume (cm 3 ). SOCD s is the soil organic carbon density (kg·m −2 ) of the samples, T is the thickness of the soil layer (cm), SOM s is the SOM content of the samples (g·kg −1 ), 0.58 is the Bemmelen coefficient [27], and α is the percentage of rock fragments (gravel) in the soil with diameters greater than 2 mm.

Other Data and Materials
The coarse resolution 0-20 cm soil organic matter map (CSOM) was obtained from the Atlas of Guangdong soil on a scale of 1:2,800,000 and was compiled by the Guangdong soil survey team (Guangdong Soil Survey Team, 1993). Then, it was transferred to a digital map within a geographic information system. The content of coarse-resolution soil organic matter (CSOM) in the map was delineated at six levels (Level 1: greater than 40 g/kg; Level 2: 30-40 g/kg; Level 3: 20-30 g/kg; Level 4: 10-20 g/kg; Level 5: 6-10 g/kg; Level 6: <6 g/kg; National soil survey office, 1998). This study applied the CSOM as the required input parameter for the ANN model to represent the general conditions of SOM in a region. For the purposes of modelling and map production, the CSOM map was converted from features to a raster map with a 10 m resolution after the levels of the maps were numbered. The digital elevation model (DEM) data obtained from Guangdong Academy of Forestry Sciences were derived from stereo images of Cartosat-1 (IRS-P5) with a 12.5 m resolution [28] and resampled to a 10 m resolution to meet the map production needs. The spatial analyst extension and developed Forest Hydrology Tools of ArcGIS [29,30] (ArcGIS 10.2 for Desktop. Version 10.2.0.3348. RedLands, Esri Inc., Redlands, CA, USA) were used to derive four topographic variables, slope, aspect, topographic position index (TPI), and potential solar radiation (PSR), and five hydrological variables, the soil terrain factor (STF), sediment delivery ratio (SDR), depth to water (DTW), flow length (FL), and flow direction (FD). The TPI refers to the relative topographic position of the central point, which is the difference between the elevation of this point and the mean elevation of a predetermined neighborhood [31,32]. The SDR is the ratio of the sediment transported to the outlet to the total erosion in the watershed area [33]. The DTW refers to the elevation differences between the land and the nearest water surfaces [34,35]. This study applied nine variables derived from the DEM model as the candidate parameters for the ANN model input.
A soil-type map ( Figure 2) with a 1:1,000,000 scale was downloaded from the Soil SubCenter, National Earth System Science Data Center, National Science and Technology Infrastructure of China (http://soil.geodata.cn (accessed on 1 June 2020)) [36]. The classification units used in this study were subtypes. Lateritic earth was present in the greatest frequency in both Luoding and Xinxing, accounting for 37.1% and 69.4% of the total area, respectively. Additionally, for the purpose of map production, the soil-type map was converted from features to a raster map with a 10 m resolution after the classes of the maps were numbered.  [37]. This ANN model was trained with a back-propagation technique, which adjusted the weight and bias values along a negative gradient descent to minimize the mean squared error (MSE) between the network outputs (predicted values) and the targeted values (measured values) [38]. The BP-ANN model had a threelayer structure: an input layer, an output layer, and a hidden layer. The input layer was made up of several predictors (required parameters and candidate parameters) with 2-10 nodes according to the parameters' combination. The output layer contained one node, SOCD. The number of hidden layer nodes was 35, and this was determined based on previous studies [39,40]. An early stopping method was used to quickly confirm the fittest model coefficients and avoid over-fitting [41]. The early stopping method divided the training data into a training subset and test subset, in which the latter was used to test the training error by the end of a training period. In this process, if the training root mean square error (RMSE) decreased while the testing RMSE increased, the early stopping method stopped the training [42]. The programming was implemented in Matlab software (Matlab (R2016a). Version 9.0.0.341260. Natick, The MathWorks Inc., Natick, MA, USA).
2. Selection of model inputs. This study took the CSOM maps as the required parameter and the nine topographic and hydrological variables derived from the 10 m resolution DEM data as the candidate parameters. These parameters were used because the CSOM was able to capture the SOCS at regional scales, providing basic data support for the ANN model. The DEM-derived variables were able to modify the SOCS on the local scale [39]. Each combination of candidate parameters corresponded to an ANN model. For the nine parameters the number of combinations, as calculated by + … + , was 511 [39,40]. The required parameter, the candidate parameters, and 90% of soil samples in Luoding were input into the ANN model. We chose the optimal combination of candidate parameters with the best accuracy through 10-fold cross-validation. In the 10-fold crossvalidation mode, the modeling dataset (90% of soil samples) was divided into 10 equal  This study applied the back-propagation ANN (BP-ANN) because it can accommodate nonlinearity when limited discontinuous points are available between the input and output data [37]. This ANN model was trained with a back-propagation technique, which adjusted the weight and bias values along a negative gradient descent to minimize the mean squared error (MSE) between the network outputs (predicted values) and the targeted values (measured values) [38]. The BP-ANN model had a three-layer structure: an input layer, an output layer, and a hidden layer. The input layer was made up of several predictors (required parameters and candidate parameters) with 2-10 nodes according to the parameters' combination. The output layer contained one node, SOCD. The number of hidden layer nodes was 35, and this was determined based on previous studies [39,40]. An early stopping method was used to quickly confirm the fittest model coefficients and avoid over-fitting [41]. The early stopping method divided the training data into a training subset and test subset, in which the latter was used to test the training error by the end of a training period. In this process, if the training root mean square error (RMSE) decreased while the testing RMSE increased, the early stopping method stopped the training [42]. The programming was implemented in Matlab software (Matlab (R2016a). Version 9.0.0.341260. Natick, The MathWorks Inc., Natick, MA, USA).
2. Selection of model inputs. This study took the CSOM maps as the required parameter and the nine topographic and hydrological variables derived from the 10 m resolution DEM data as the candidate parameters. These parameters were used because the CSOM was able to capture the SOCS at regional scales, providing basic data support for the ANN model. The DEM-derived variables were able to modify the SOCS on the local scale [39]. Each combination of candidate parameters corresponded to an ANN model. For the nine parameters the number of combinations, as calculated by C 1 9 + . . . + C 9 9 , was 511 [39,40].
The required parameter, the candidate parameters, and 90% of soil samples in Luoding were input into the ANN model. We chose the optimal combination of candidate parameters with the best accuracy through 10-fold cross-validation. In the 10-fold cross-validation mode, the modeling dataset (90% of soil samples) was divided into 10 equal subsets. A Land 2021, 10, 517 6 of 20 model was built with data in nine subsets as the calibration data, and this was validated with the remaining subset as validation data; this process was repeated 10 times until all subsets were used in the validation set [39].
3. SOCD estimation. After obtaining the optimal SOCD prediction values and the spatial distribution maps with 10 m resolution for five soil layers, we were able to calculate the SOCS of each layer by multiplying the forest area of Luoding with the mean SOCD of each layer. RedLands, Esri Inc., Redlands, CA, USA) to extract soil subtype information from 90% of soil samples from a soil type map at a scale of 1:1,000,000 ( Figure 2). After obtaining the information on the corresponding subtypes from 90% of the soil samples, the SOCD of each subtype at the Ln layer was calculated using Equation (3) (for subtypes not covered by samples, the SOCD was replaced by the mean SOCD of all subtypes in the mapping process). Then, the SOCD and SOCS of a soil layer were determined using Equations (4) and (5), respectively: where SOCD tn is the soil organic carbon density (kg·m −2 ) of a subtype (refers to "t" in the equation) at the Ln soil layer, SOCD st is the soil organic carbon density (kg·m −2 ) of samples belonging to subtype t, x is the number of samples within subtype t, and α is the percentage of rock fragments (gravel) in the soil with diameters greater than 2 mm. SOCD n is the soil organic carbon density of the Ln soil layer. SOCS n is the soil organic carbon storage (Tg) of the Ln soil layer and A is the area of Luoding (km 2 ). 2. Ordinary Kriging (OK): The samples used for OK interpolation had to meet the intrinsic hypothesis or second stationary assumption. Additionally, the samples had to obey a normal distribution. This study adopted the number field sieve method to identify the extremum by using samples' mean values to add or subtract three-fold standard deviations [43]. The extremum, which is outside the number field, was replaced by extreme values within the number field. For the first step, a semivariogram (Equation (6)) using the field samples was expressed as a function of the distance between sampled points, and the integrity of spatial continuity was calculated in one or multiple directions. We chose the optimal model from the Gaussian, exponential, and spherical models to form the experimental semivariograms [44][45][46]. The semivariogram analysis of OK was conducted using GS+9.0 software. The second step of OK interpolation was to calculate the values of unknown points as weighted sums of the adjacent samples' SOCD values (Equation (7)) [47,48]: where γ(h) is the semivariogram for the distance interval class h known as lag, N(h) is the number of paired samples in the lag interval, and Z(x i ) and Z(x i + h) are the sample's SOCD values at two points separated by the distance interval h. Z(x) is the value to be Land 2021, 10, 517 7 of 20 estimated at location x, and n is the number of sites within the search neighborhood used for the SOCD estimation.

Basis Function (RBF):
The RBF is suitable for calculating a large number of samples. It can achieve smooth interpolation by calculating a set of weighted coefficients of the nodes through the basis function. The RBF method involves the use of a family of five deterministic exact interpolation techniques: The inverse multi-quadric function, the multi-quadric function, the spline with tension, the completely regularized spline, and the thin-plate spline [6,44]. Of these five functions, the spline with tension with the lowest error was selected for SOCD interpolation.

Extension Test of the Four Estimation Methods
This part of the study was conducted to determine the extension capability of the four methods when facing a nearby area with similar hydrological and geological conditions to the estimation area (Luoding). The four estimation methods only extended their results estimated in Luoding to Xinxing, and no soil profile data from Xinxing were adopted. The accuracy assessment of the extension results used 80% of the soil profile data from Xinxing.

The Second Stage: The Linear Model for Extending the ANN Estimation Method
The aim of building the linear model was to modify the SOCD estimated by the ANN method to adapt the geological differences over a large extended area, so we used Xinxing as the test area. Besides improving the ANN model's estimation accuracy at Xinxing, the linear model used a small number of field samples to reduce the sampling cost. In line with Zhao's research [39], 1/5 soil samples can be used to build a steady linear model; hence 20% of soil samples from Xinxing were randomly selected as the modelling data for the linear model in this study. The other 80% of the samples were applied for validation. In the first step, we used Matlab software (Matlab (R2016a). Version 9.0.0.341260. Natick, The MathWorks Inc., Natick, MA, USA) to produce maps of Xinxing by adopting the Luodingbased ANN model. In this process, we did not use any data from Xinxing. In the second step, linear model building used SOM levels from the CSOM map for classification, which meant that one level corresponded to a linear model. In the CSOM map, the proportions of SOM from Xinxing from high to low were 45.2%, 44.0%, 10.6%, and 0.2% of the total area for Level 3, Level 5, Level 4, and Level 2, respectively. In the calculation process, we merged Level 2 with Level 3 and classified the 120 soil profiles of Xinxing according to their corresponding levels. Subsequently, 20% of the soil samples from each level were selected to build a corresponding linear model (Equation (8)): where SOCD e is the adapted SOCD. SOCD ANN is the initial SOCD estimated by the Luoding-based ANN model. i is an SOM Level of Xinxing. a i is the shifting parameter, which describes the average difference in SOCD between the ANN model area (Luoding) and the extended linear model area (Xinxing). b i is the stretching parameter, which describes the rate of change in SOCD between Luoding and Xinxing.

Accuracy Assessment
The estimation results for Luoding were validated by 10% of the Luoding soil samples, and the extended accuracy validation used 80% of the soil samples from Xinxing. Three indices were adopted to describe the accuracy: the root mean square error (RMSE), the determination coefficient (R 2 ), and the mean absolute error (MAE). The calculations used were as follows: Land 2021, 10, 517 8 of 20 where X m,i is the measured value, X pre,i is the predicted value, N is the number of samples, X pre,i is the fit value, and X pre,i is the mean of predicted value.

Statistical Characteristics of the Field SOCD
As Table 2 shows, the mean SOCD of the soil profiles decreased from L1 to L5, and the mean SOCD of L1 was double that of L5 in both study areas. The sharpest decrease in the mean SOCD, a decline of approximately 1 kg·m −2 , occurred from L3 to L4. Additionally, the mean SOCD of Xinxing at each layer greater than that in Luoding. The CV shows that all soil layers had a moderate level of variation in both study areas under the CV classification standard [46]. The skewness near 0 and kurtosis near 3 showed that the data followed a normal distribution. As Table 2 shows, the data had a skewed distribution at each layer.

Accuracy of SOCS Estimation Methods
The accuracy levels of the four methods verified by the estimation samples were generally higher than the validation accuracy levels, as shown in Table 3. For the ANN method, the R 2 verified by the estimation samples at all soil layers exceeded 0.8, which was the best modelling accuracy. The accuracy of OK verified by the estimation samples was better than that of the RBF for each soil layer. The STM, which generally had R 2 values lower than 0.3, showed a poor estimation capability. The accuracy levels of the four methods are shown in Table 3. For the RMSE and MAE indices, the STM, OK, and RBF had higher values than the ANN method. The low R 2 of the STM indicates that the estimation results have little reference significance. For the two interpolation methods, the R 2 , which was between 0.13 and 0.37 at L1-L3, declined dramatically at L4 and L5. The R 2 of the ANN was generally higher than that for the three other methods, especially at L1, L4, and L5. For the STM, its R 2 was lower than for the ANN by 0.77, 0.57, 0.50, 0.70, and 0.74 at L1-L5, respectively, with a mean of 0.66. For the OK, its R 2 was lower than for the ANN by 0.69, 0.33, 0.27, 0.77, and 0.76 at L1-L5, respectively, with a mean of 0.56. For the RBF, its R 2 was lower than for the ANN by 0.65, 0.36, 0.22, 0.68, and 0.78 at L1-L5, respectively, with a mean of 0.54. Figure 3 represents the SOCD distribution estimated by the four methods at L1-L5. The results display a decreasing trend for SOCD with an increase of soil depth. The SOCD estimation results for the whole soil profile are higher with the RBF than with the other methods and are 9.5% higher than the ANN estimation method. The results with the OK are 3.1% higher than with the ANN estimation method. Thus, we can reasonably speculate that the SOCS values estimated by the RBF and OK may lead to higher values. The results estimated by the STM are quite similar to those obtained with the ANN method, only 1.0% lower than the results for the ANN across the whole soil profile. Although the STM is not an effective method for the estimation of samples, the results are still reliable when the mean value is considered.   In terms of the SOCS estimation results (Figure 4), all methods show that the SOCS values at L1 and L2 account for approximately half of the 0-100 cm soil profile. The mean SOCS proportions in L3, L4 and L5 are 21%, 16% and 14%, respectively, and the total proportion is 51%. Therefore, the other half of the SOCS exists in L3-L5. This result is consistent with Batjes's [17] conclusion that the SOCS in the soil profile of 30-100 cm accounts for 46-63% of the whole profile. This result demonstrates that the SOCS is divided in half at a depth of about 40 cm in relation to the whole profile (0-100 cm).

The Correlation between ANN Model Parameters and the Predicted SOCD
As Table 4 (blot fonts) shows, the combination of candidate parameters adopted by the ANN model is not in accordance with the Pearson correlation, because machine learning methods like ANN are black box methods [7] that cannot reveal the functional relationships between the target and predictor variables [10]. Therefore, we performed the Pearson correlation to analyze the functional relationships between the parameters and SOCD predicted with the ANN model. The CSOM was significantly correlated with the predicted SOCD at a soil profile of 0-100 cm. For the candidate parameters, the slope had a significant positive correlation with SOCD at L1-L4, which agreed with previous research [47,48] showing that the slope is a strong determinant of the SOCD distribution. Researchers have mostly discussed how the slope affects the SOC distribution in horizontal space, and there are still opposing viewpoints for different factors [21,49]. Our study focused on the correlation between the slope and SOCD with a vertical distribution. According to Table 4, the positive correlation proves that the larger the slope is, the weaker the human influence appears, which provides a favorable environment for SOC aggregation. The slope has no correlation with the SOCD under the 80 cm soil layer, which indicates that the SOCD under the 80 cm soil layer is not affected by the slope. The STF was found to be negatively correlated with the SOCD at L1-L4. The STF is used to define the hydrologic similarity between points [50], which only describes the influence of terrain on streamflow formation [51]. This study deduced that a higher STF represents higher streamflow activity, and higher streamflow activity prevents the SOC accumulation process occurring. It is worth pointing out that the nine hydrological and topographic parameters of L5 did not correlate with SOCD. We deduced that the hydrological and topographic parameters had little influence on the SOCD under the 80 cm soil profile where the SOCD was low and the SOM had been deposited on the bottom for a long period of time.

Spatial Mapping of SOCD Estimation Methods
For the ANN estimation method, the SOCD at L1 was found to be comparatively high among the five layers, as shown in Figure 5a, with an evident fluctuation trend, which shows that the SOCD of the southwest grid exceeded 5.0 kg·m −2 . Under L1, similar to the estimation results, the SOCD in L1-L4 decreased evenly. However, the maps of L2-L4 were slightly indistinct across the overall SOCD spatial distribution. At L2 and L3, an apparent difference in the spatial distribution maps appeared in the western hilly areas with elevations above 150 m, where the SOCD was generally greater than 3.1 kg·m −2 . The main difference between the ANN and the other three methods was that the ANN maps can portray a change of detail in grid resolution. However, except for L1, regional differences were not distinctly displayed on the maps, which is a weakness of ANN model mapping.
For the soil type method, the classification of patches is completely based on the soil-type map, so the SOCD distribution cannot be clearly described. Moreover, mapping is also restricted by the number of samples, which means that when fewer samples are owned by the subtype, the estimation result is less reliable. Taking the map of L4 as an example (Figure 5b), the map shows that the SOCD is higher in the northeast than in the surrounding areas, which is contrary to the results of the other three methods. In practice, areas with abnormal values all belong to one soil subtype, which has merely one corresponding soil sample within 90% of soil samples. The abnormal values resulted in a notable rise in the CV, 63.2% at this layer, while the CVs of the other layers were lower than 30%. This reveals the defect of STM in spatial mapping.
For the OK and RBF, the bull's eye effect of the RBF (Figure 5d) resulted in an unsmooth surface, especially at L4. However, the OK maps were smooth and not affected by extreme values of samples. This shows that the RBF has a strict demand for sample values, which significantly affects the map's smoothness. In terms of the variation tendency from L1 to L5, the SOCD values continuously declined. The SOCD in the west region of Luoding was lower than in the northeast region. However, the overall SOCD at L5 predicted by the two interpolation methods was less than 3 kg·m −2 , showing a relatively flat variation in SOCD. The 10 m resolution maps produced by the ANN estimation method have the capability to precisely portray the detail of the SOCD distribution. The RBF and OK can only describe the SOCD in grid resolution. Additionally, the RBF map displayed a bull's eye effect, leading to an unsmooth surface. Similarly to the two interpolation methods, the STM only describes the difference in SOCD in grid resolution, where a soil subtype corresponds to one result. Therefore, it is difficult to determine the precise distribution of SOCD through the STM map. Table 5 shows the accuracy of the extension results. The R 2 index of STM, OK, and RBF did not exceed 0.1, which suggests that the three methods have little generalizability when there is no data from the target area to support the methods' operation. The accuracy of estimation results obtained by extending the Luoding-based ANN model to Xinxing was relatively high. However, compared with the validation accuracy in Luoding (Table 2), the R 2 decreased by 43-56%, the RMSE rose by 0.60-0.82 kg·m −2 , and the MAE rose by 35-78%. The level of error increased significantly, and the mean R 2 was lower than 0.5. For the soil type method, the classification of patches is completely based on the soiltype map, so the SOCD distribution cannot be clearly described. Moreover, mapping is also restricted by the number of samples, which means that when fewer samples are owned by the subtype, the estimation result is less reliable. Taking the map of L4 as an example (Figure 5b), the map shows that the SOCD is higher in the northeast than in the surrounding areas, which is contrary to the results of the other three methods. In practice, areas with abnormal values all belong to one soil subtype, which has merely one corresponding soil sample within 90% of soil samples. The abnormal values resulted in a notable rise in the CV, 63.2% at this layer, while the CVs of the other layers were lower than 30%. This reveals the defect of STM in spatial mapping.

Extension Results of SOCD Estimation Methods
For the OK and RBF, the bull's eye effect of the RBF (Figure 5d) resulted in an unsmooth surface, especially at L4. However, the OK maps were smooth and not affected by extreme values of samples. This shows that the RBF has a strict demand for sample values, which significantly affects the map's smoothness. In terms of the variation tendency from L1 to L5, the SOCD values continuously declined. The SOCD in the west region of Luoding was lower than in the northeast region. However, the overall SOCD at L5 predicted by the two interpolation methods was less than 3 kg·m −2 , showing a relatively flat variation in SOCD. The 10 m resolution maps produced by the ANN estimation method have the capability to precisely portray the detail of the SOCD distribution. The RBF and OK can only describe the SOCD in grid resolution. Additionally, the RBF map displayed a bull's eye effect, leading to an unsmooth surface. Similarly to the two interpolation methods, the STM only describes the difference in SOCD in grid resolution, where a soil subtype corresponds to one result. Therefore, it is difficult to determine the precise distribution of SOCD through the STM map.   Compared with the high accuracy and low error rate of the ANN method in Luoding, the accuracy level in Xinxing was unacceptable. Because of the low estimation accuracy under similar topographic and hydrologic circumstances between Luoding and Xinxing, we suggest that the accuracy of the ANN estimation method may decrease greatly when the two target areas have a larger distance and different circumstances. Therefore, the ANN model performs poorly when extended to a new area. Nevertheless, of all the estimation methods, only the ANN method has the ability to improve. To further improve the accuracy, the linear model was added to tackle this weakness of the ANN estimation method to improve the generalizability.

Coefficients and Accuracy of the Linear Model
The linear model of each layer was built based on the CSOM Levels of Xinxing presented in Table 6. By using 20% of soil samples from Xinxing to build the linear model, the accuracy improved significantly compared with the results shown in Table 5. From L1 to L5, the mean RMSE declined by 0.65, 0.33, 0.32, 0.59, and 0.39 kg·m −2 , respectively. The mean MAE declined by 33%, 17%, 20%, 34%, and 28%, respectively. The mean R 2 increased by 0.29, 0.23, 0.25, 0.27, and 0.27, respectively. As shown in Table 6, only two sample points from level 4 were used for model building, which led to lower R 2 values for the three CSOM levels at each layer. Nevertheless, despite using a small number of samples to build a linear model, which may have resulted in lower accuracy, we conclude that the linear model still effectively improved the ANN estimation method s estimation accuracy for an extended area.

Spatial Distribution Maps of the Extended Area
The Xinxing SOCD spatial distribution map at L1 modified by the linear model for which the R 2 index was the highest, shown in Figure 6c, improved the R 2 by 0.29. The original SOCD map of Xinxing extended by the Luoding-based ANN model is shown in Figure 6a, and it did not involve the input of any data from Xinxing. To allow a distinct contrast, the water bodies were not removed from the maps.
According to Table 5, the accuracy of Level 4 was lower than at the other two levels, and it is shown in Figure 6c that the SOCD of Level 4 patches was distinctly lower than that of other levels. On the one hand, the linear model based on CSOM level was able to effectively modify the original extended data to give a higher level of accuracy, which allowed is to produce 10 m resolution maps and results using only the linear model and 24 soil samples. We should emphasize that the linear model operates under the condition of having one already-built ANN model. On the other hand, the small number of input samples and the use of coarse-resolution SOM maps could lead to deviations on the map. As shown in Figure 6c, the original boundary of the CSOM map was still visible. We should point out that the variation in the patches at level 4 was not accurate because the boundaries between the levels in the CSOM map were portrayed artificially, and they do not exist in reality.

Estimation Capability of the Four Methods
1. The ANN model method. The estimation accuracy of the ANN decreased at L2 and L3 because of the low CV of samples. We speculate that when applying profile data with low CV values, the ANN method's estimation accuracy is relatively inferior. However, the accuracy of the ANN method was higher than that of the other three methods, and the mean R 2 of the five soil layers was 0.78. Meanwhile, the ANN method produced SOCD spatial distribution maps that could reveal the local-scale variation with a resolution of 10 m. One drawback was that the overall variation of SOCD at L2-L4 on the regional scale was somewhat vague, which means we could not obtain more distribution information from these maps. This drawback may become our next focus in improving the mapping capability and evaluating the impact of parameters on mapping.
2. The Soil type method. In the accuracy assessment, The STM had the worst level of accuracy in every soil layer. It performed the worst because the STM is a taxonomic statistical method, which means that its estimation results are based on soil taxonomy but estimated value of sample. As the estimation result showed, the 0-100 cm SOCD result of the STM was the nearest to the ANN method, revealing that the STM's estimation results could be credible when the mean value is considered. The STM's accuracy would be significantly affected by the profile data and number of samples, making the STM an imprecise SOCS estimation method at the point dimension.
3. The two interpolation methods. The accuracy of the two interpolation methods showed a similar pattern: the accuracy declined sharply at L4 and L5. The following factors may cause this decline: (i) The spatial distribution of file samples was uneven and aggregated and (ⅱ) some samples at L4 and L5 layers may reach the parent material layer, leading to a high CV value. Especially for deterministic interpolation methods like the

Estimation Capability of the Four Methods
1. The ANN model method. The estimation accuracy of the ANN decreased at L2 and L3 because of the low CV of samples. We speculate that when applying profile data with low CV values, the ANN method's estimation accuracy is relatively inferior. However, the accuracy of the ANN method was higher than that of the other three methods, and the mean R 2 of the five soil layers was 0.78. Meanwhile, the ANN method produced SOCD spatial distribution maps that could reveal the local-scale variation with a resolution of 10 m. One drawback was that the overall variation of SOCD at L2-L4 on the regional scale was somewhat vague, which means we could not obtain more distribution information from these maps. This drawback may become our next focus in improving the mapping capability and evaluating the impact of parameters on mapping.
2. The Soil type method. In the accuracy assessment, The STM had the worst level of accuracy in every soil layer. It performed the worst because the STM is a taxonomic statistical method, which means that its estimation results are based on soil taxonomy but estimated value of sample. As the estimation result showed, the 0-100 cm SOCD result of the STM was the nearest to the ANN method, revealing that the STM's estimation results could be credible when the mean value is considered. The STM's accuracy would be significantly affected by the profile data and number of samples, making the STM an imprecise SOCS estimation method at the point dimension.
3. The two interpolation methods. The accuracy of the two interpolation methods showed a similar pattern: the accuracy declined sharply at L4 and L5. The following factors may cause this decline: (i) The spatial distribution of file samples was uneven and aggregated and (ii) some samples at L4 and L5 layers may reach the parent material layer, leading to a high CV value. Especially for deterministic interpolation methods like the RBF, a significant fluctuation of data within a short distance greatly affects the interpolation accuracy, which illustrates the relatively strict data requirements of the interpolation method [7]. We suggest that the interpolation method is not suitable for estimating the SOCS of bottom layer soil. As the estimation results showed, the SOCD values at 0-100 cm estimated by OK and RBF, especially the values of the RBF, were higher than those obtained with the ANN method. We conclude that the ability of the two interpolation methods to perform SOCD estimation of the profile data of this study is lower.

Extension Capability of ANN Model Estimation Method
In this paper, the linear model is chosen as the extension model to meet the requirements of a small number of samples and simple operability. Theoretically, a stable linear model could be established with a number of samples twice that of its independent variables. Therefore, only 20% of the soil samples were selected to establish the linear model in this study. As the results showed, the ANN model extension results modified by the linear model met the expectations of this study. However, because SOCS changes are affected by human activities, natural conditions [52,53], and many other factors, simply describing the statistical relationship between samples and predicted values with a linear model might cause a poor fitting result. The prediction results are also directly affected by the sample quality. Once the sample outliers appear, the linear model s prediction result will deviate from the actual situation. This study found that the linear models built at level 4 had lower modeling accuracy because of the small number of samples, but the relationship between number of samples and accuracy was not further discussed. Thus, the next task related to the use of the linear model in extension is to find the relationship between number of samples and model accuracy and to determine the upper limit value of the accuracy. For the linear model building classification criteria, the CSOM causes a distinct boundary in the distribution maps. This result demonstrates that taking the CSOM map as the model building criteria was effective but may affect our judgment on the real SOCS distribution law. To avoid this condition, finer or other classification maps could be introduced as the linear model building criteria and the potential of the linear model in the ANN model generalization could be further explored.

SOCS Vertical Distribution of the Estimation Area
The results show that the SOCS of all soil layers in Luoding decreased with the soil depth, and the SOCS of L1 accounted for 30% of the 0-100 cm soil layer. We verified that Luoding's SOCS significantly gathers at the soil surface (0-20 cm) by comparing our results with those of other research. For example, Xie et al. [54] concluded that the SOCS at a soil layer of 0-20 cm accounted for 32.5% of the 0-100 cm soil layer in China; Li et al. [55] found that the SOCS at 0-20 cm accounted for 38.7% of the total in Eastern China; and Chen et al. [56] estimated the SOCS results in Fujian Province and concluded that the SOCS at 0-20 cm accounted for 39.3% of the 0-100 cm soil layer. We suggest that the proportion of Luoding's surface SOCS is relatively low. In terms of the SOCS in the 0-100 cm soil layer, on a global scale, according to previous studies [4,17,57,58], the global average SOCD of the 0-100 cm soil layer is about 10.6 kg·m −2 . Using the scale of terrestrial ecosystems in China, Liang et al. [59] predicted that the average SOCD in the 0-100 cm soil layer in China is about 10.31 ± 3.39 kg·m −2 , and Xu et al. [9] predicted that the terrestrial SOCD in China is about 9.13 ± 0.87 kg·m −2 . Compared with the SOCD of the 0-100 cm soil layer (12.1 kg·m −2 ) predicted by the ANN model in Luoding, it is evident that Luoding has a higher level than the national level. The comparison results indicate that the SOCD is relatively high in Luoding, but it does not concentrate in the surface layer (0-20 cm).

Conclusions
Our results demonstrate that the ANN estimation method had the best performance in terms of the estimation accuracy and spatial distribution mapping when used to assess data obtained from Luoding City, Guangdong Province, China. The generalization test demonstrated that the soil-type method, Ordinary Kriging, and Radial basis function have little generalizability. The extension results for the linear model built with only 24 samples show that it is practical to improve the estimation results of the ANN model. Thus, the ANN estimation method is an effective method for obtaining accurate SOCS estimation results on a regional scale. By adopting a linear model approach, the ANN estimation method can effectively extend the estimation region with only a small number of field samples with a preferable level of accuracy. However, one drawback of this study is the classification of the linear model. In future research, we will try to use a higher resolution and more precise mapping as the classification criteria to build the linear model to remove the artificial boundary.