Global Optimization of Soil Texture Maps From Satellite‐Observed Soil Moisture Drydowns and Its Implementation in Noah‐MP Land Surface Model

Soil moisture (SM) plays an important role in regulating regional weather and climate. However, the simulations of SM in current land surface models (LSMs) contain large biases and model spreads. One primary reason contributing to such model biases could be the misrepresentation of soil texture in LSMs, since current available large‐scale soil texture data are often generated from extrapolation algorithm based on a scarce number of in‐situ geological measurements. Fortunately, recent advancements in satellite technology provide a unique opportunity to constrain the soil texture data sets by introducing observed information at large spatial scales. Here, two major soil texture baseline data sets (Global Soil Data sets for Earth system science, GSDE and Harmonized World Soil Data from Food and Agriculture Organization, HWSD) are optimized with satellite‐estimated soil hydraulic parameters. The optimized soil maps show increased (decreased) sand (clay) content over arid regions. The soil organic carbon (SOC) content increases globally especially over regions with dense vegetation cover. The optimized soil texture data sets are then used to run simulations in one example LSM, that is, Noah LSM with Multiple Parameters. Results show that the simulated SM with satellite‐optimized soil texture maps is improved at both grid and in‐situ scales. Intercase comparison analyses show the SM improvement differs between simulations using different soil maps and soil hydraulic schemes. Our results highlight the importance of incorporating observation‐oriented calibration on soil texture in current LSMs. This study also joins the call for a better soil profile representation in the next generation of Earth System Models (ESMs).

Plain Language Summary Soil moisture (SM) is important for weather and climate but is often poorly simulated by Land Surface Models (LSMs).One possible reason could be the inappropriate representation of soil texture maps utilized in LSMs since current gridded soil texture maps are often derived from a limited number of in-situ measurements.In this study, we leverage the benefits of modern satellite products and land surface theories to improve several major global soil texture maps, and use the calibrated soil maps to improve SM simulation in one example LSM.Results show increased sand content over arid areas while the results for clay content show the opposite pattern.The SOC result shows an overall increase over the entire globe but is more evident in dense vegetation land covers.The model simulated SM using the calibrated soil maps generally outperforms those with the baseline soil maps.The improvement is more significant in the experiment with soil maps considering SOC.Our results here provide successful evidence for constraining soil texture data from large-scale observations.We also show that observation-oriented calibration on soil texture maps is necessary for a better land surface simulation, which is critically important for the development of ESMs.

Introduction
Soil moisture (SM) is a crucial component in terrestrial water cycles (Oki & Kanae, 2006).It can have substantial influence on regional weather and climate through its role as the intermediate medium for land-atmosphere water and energy flux exchanges (Seneviratne et al., 2010).Persistent negative anomaly in SM could induce severe droughts, causing great loss in agriculture production and socio-economic activities (Miralles et al., 2019;Mukherjee et al., 2018).It can also cause hazardous extreme events that threaten human life (Miralles et al., 2014;Seneviratne et al., 2010;Zhou et al., 2019).Accurate predictions of SM is thus essentially useful for decisionmakers to take prevention and mitigation measures for such natural disasters in time.
SM is usually simulated by land surface models (LSMs) and contributes to the boundary layer condition in Earth System Models (ESMs) through regulations on surface water flux exchanges.However, evaluation studies have shown that the SM simulation in current LSMs contains significant biases, for example, wide SM spreads were found within a diverse ensemble of LSMs (Boone et al., 2004;Dirmeyer, Gao, et al., 2006, Dirmeyer, Koster, et al., 2006b, Dirmeyer et al., 2016).The large SM biases could then be amplified through land-atmosphere interaction processes and result in substantial air temperature and precipitation biases in ESMs (Dong, Lei, & Crow, 2022).In previous studies, successful efforts have been addressed to improve SM simulations in LSMs, including calibrating the models' soil hydrothermal parameterization schemes (Koster et al., 2017(Koster et al., , 2018;;Zheng et al., 2015), essential land surface parameters such as Leaf Area Index (LAI), surface albedo (Kumar et al., 2019;Malik et al., 2012;Yin et al., 2016), etc.However, these improvement studies are model specific, that is to say, SM biases cannot be improved if the models already use the optimal parameterization schemes or land surface parameters.The systematical SM biases (i.e., the long-term equilibrium SM) determined by soil texture between models should be improved through the calibration of soil texture data sets utilized in current LSMs (He et al., 2023;Teuling et al., 2009).
Soil texture is defined as a certain soil composition of sand, clay and loam content, and determines the soil hydrothermal properties such as saturated soil matric potential, saturated soil hydraulic conductivity and diffusivity, etc.The Soil Organic Carbon (SOC), as an additional component of soil texture composition, is also recently attracting increasing interests within the scientific community (Hugelius et al., 2020;Rawls et al., 2003;Sun et al., 2021).Generally, there are two options to define soil texture in LSMs: one is to use the modelprescribed look-up table, in which a certain value obtained from in-situ geological surveys (e.g., particle size analyses (PCA) (Black, 1965;Gee & Bauder, 2018)) is assigned for each soil type; The other is to incorporate gridded soil maps externally, which is favored in recent land surface simulation studies since it is spatially more representative compared to the look-up table approach (Xu et al., 2023).However, the gridded soil maps themselves contain biases, since they are often generated by extrapolating the point-scale measurements into polygons (Dai et al., 2019;Shangguan et al., 2014).Such mismatch in spatial scales might lead to unreal soil texture results, particularly when the number of in-situ measurements is limited.However, since large-scale soil texture observation is unavailable, it is difficult to improve the quality of current gridded soil maps.
Recent advancements in remote sensing technologies have provided unique opportunities to observe Earth's land surface from large spatial scales.Development of satellite-observed land surface products such as land surface temperature (Wan, 2013), SM (Entekhabi et al., 2010;Kerr et al., 2001;Njoku et al., 2003), and vegetation indices (e.g., LAI, Fractional Vegetation Cover (FVC), etc.) (Cohen et al., 2006;Garrigues et al., 2008;Xiao et al., 2016) facilitates improvement of land surface modeling by providing real-time and realistic land surface information (He et al., 2021;Kolassa et al., 2020;Kumar et al., 2015Kumar et al., , 2019;;Li et al., 2019).Data assimilation (DA) technology is one typical way of incorporating satellite data sets to improve land surface simulations (Gettelman et al., 2022;Yang, Chen, et al., 2020, Yang, Zhao, et al., 2020).While this approach is effective in improving models' biases, to maintain the persistent improvement of LSM simulations, the DA approach requires continuous input of satellite signals.The improvement thus cannot be extended to future predictions when satellite measurements are unavailable.An alternative and much simpler approach for satellite-based model improvement is using satellite information to calibrate LSMs' physical schemes and parameters (Balsamo et al., 2018;Lu et al., 2020;Yang et al., 2009).Efforts using this approach have demonstrated that the parameter calibration approach can also substantially improve land surface modeling biases (Chen et al., 2010;Kolassa et al., 2020;Sun et al., 2021), and more importantly, it can complement the DA-based model improvements (Koster et al., 2018).The model improvement from the calibration approach can be efficiently transferred into future scenarios compared to the DA approach.
In land surface modeling studies, empirical functions (i.e., pedo-transfer functions, PTFs, e.g., Wösten et al., 2001) are often used to link soil texture data and soil hydraulic parameters (e.g., wilting point, field capacity, saturated point, etc.)However, since the acquisition of soil hydraulic parameters could be more labor-intensive and more costing than soil texture measurements, the typical way is to first collect soil texture information, then estimate the soil hydraulic parameters by applying PTF equations.Instead of directly obtaining them from laboratory measurements or field observations, recent studies have shown that some soil hydraulic parameters can be estimated from SM data only (Akbar et al., 2018;Dong, Akbar, et al., 2022).With satellite observations of SM being increasingly available and the empirical functions being known, the newly developed SM-soil hydraulic parameter estimation approach brings a new opportunity to derive soil texture from soil hydraulic parameters in turn.Such an approach could also provide new insight into improvement on currently existing soil texture products.
In this light, this study aims to use soil hydraulic parameters (i.e., soil wilting point θ w and critical point θ c ) estimated from long-term satellite SM data set to calibrate two major soil texture baseline data sets, that is, Global Soil Data sets for Earth system science (GSDE) developed by Land-Atmosphere Interaction Laboratory of Sun Yat-Sen University (Shangguan et al., 2014), and Harmonized World Soil Data developed by Food and Agriculture Organization (FAO) (Nachtergaele et al., 2009).The calibration is realized by applying an optimization procedure, where the soil hydraulic parameters θ w and θ c calculated from baseline soil maps and PTF functions are compared with those from satellite estimations.Additionally, since SOC plays an important role in regulating soil texture and SM content (Rawls et al., 2003;Sun et al., 2021), two PTF schemes with and without considering SOC are used to investigate the influence of SOC on the calibration results.The baseline and calibrated soil maps are then incorporated into an example LSM (i.e., Noah LSM with Multiple Parameters, Noah-MP (Niu et al., 2011;Yang et al., 2011)) to conduct simulations.The simulated SM with baseline and calibrated soil maps are then compared and validated with observational data at both regional and in-situ scales.In doing so, this study could offer an observation-based reference for improving SM simulations in LSMs.This study also provides a method for calibrating gridded soil texture maps with satellite observations, which joins the call for updating the characterization of the soil profile for the development of next generation LSMs (as well as ESMs).
Recall that θ c and θ w refers to the start and end point of the water-limited surface water loss process, respectively.In this light, it is reasonable to expect θ c corresponds to the initial SM value (θ dd0 ), and θ w refers to the minimum SM value (θ min ) of the water-limited drydown curve (solid red line in Figure 1b).To characterize this part of drydown curve, the linear relation between the rate of water loss and SM can be described as: where, k refers to the slope of the curve (d 1 ).Integrating both sides of Equation 1 as a function of time, an empirically derived equation can be derived, as: where, Δθ refers to the SM change during each soil drying event (m 3 m 3 ); θw refers to the estimated minimum SM value (m 3 m 3 ); t refers to the time duration of the SM drydown event (days); and τ L refers to the time scale when SM returns back to an equilibrium state, that is, SM memory time (days).Details of the derivation from Equations 1 and 2 can be found in (McColl et al., 2017).
To obtain estimations of θ c and θ w , the water-limited SM drydown event is first identified as the observed SM timeseries with consistent SM decrement.θ c thus corresponds to the observed initial SM value of each drydown event.Then Equation 2 is used to fit the identified drydown event to derive θ w .We note that the estimated parameters can be different for each identified drydown event (i.e., there are statistical variations in θ c and θ w estimations).To select a representative statistic from the estimated ensembles, we first mapped the standard deviation and coefficient of variation (CV) of each threshold.From Figure S1 in Supporting Information S1 it can be seen that the statistical uncertainty of both θ c and θ w are within a reasonable range.However, there are geographical differences observed between arid and humid regions.The uncertainty of thetaw is much higher over humid areas, while the θ c shows much higher variations over arid areas, which aligns with the assumption that SM is difficult to dry down to θ w , and similarly difficult to reach to θ c over arid areas.Therefore, for grids with low statistical uncertainty (i.e., CV<40%), we chose to use the statistical median as "representative" estimations for the parameters, while expect 5-percentile of the estimated θ w and 95-percentile of θ c ensembles to be more representative over the two areas respectively (Pablos et al., 2017;Zhang et al., 2024).The comparison between median-and percentile-based maps of θ c and θ w are shown in Figure S2 in Supporting Information S1.We should also note that there could be more effective statistics for each grid rather than the 5-and 95-percentiles we used here, but comprehensively selecting the most effective representative statistics at each station is beyond the scope of this study.
The analyses in this study are conducted by using 19-year (2002-2020) remote sensing SM data, that is, Neural Network Soil Moisture (NNSM (Yao et al., 2021)).Compared to other remote sensing SM data sets such as Soil ); E max is the maximum evapotranspiration rate (LT 1 ). Figure 1b is the projection of 1a in the time domain, where the x-axis refers to time (e.g., days) and y-axis refers to SM content (m 3 m 3 ).θ w , θ c , and θ fc refers to soil wilting point, critical point, and field capacity, respectively.Figures adapted from (He et al., 2023).
Active and Passive (SMAP) and Soil Moisture and Ocean Salinity (SMOS), NNSM provides a much longer temporal span and larger spatial coverage over frozen and heavy-vegetated areas with high data accuracy (ubRMSE ≤0.04 m 3 m 3 ).NNSM has a daily temporal resolution and a spatial resolution of 36 km.The daily temporal resolution of NNsm makes both θ w and θ c more observable compared to previous estimations using 3day SM products from SMAP (Akbar et al. (2018).The identified drydown events containing less than three observation samples and events with the determination coefficient (R 2 ) of the fitting less than 0.7 are filtered, consistent with (McColl et al., 2017).Here, we clarify that although the ancillary soil texture data used for SM retrieval could potentially determine the ranges of SM and presumably the wilting point and field capacity, such an effect could be minor since the sensitivity of the satellite SM retrival algorithm to soil texture variations is overall low.Taking SMAP SM (which serves as the target product for NNSM) algorithm for an example, changes of 20% clay content actually only result in up to 0.03 m 3 m 3 SM change, which is within the acceptable ubRMSE range (Figure 6 in Das & O'Neill, 2020).Such low sensitivity is also proven in a previous study on soil texture estimation (Zhao et al., 2022).Therefore we concluded that SMAP prescribed soil texture has minor effect on the final estimations of θ w and θ c , and further, the optimization procedure.

Soil Texture Optimization Through SCE-UA Algorithm
Once θ w and θ c are estimated from satellite observation, and relations between them and soil texture have been known through PFT functions, the most straightforward way to derive soil texture data is to directly solve the empirical equations.However, in doing so the underlying hypothesis would be to assume satellite estimations are the truth.However, theoretically the truth could never be known for any variable or system even with abundant observation records.To avoid such a problem, we here apply another method, that is, the SCE-UA (Shuffled Complex Evolution-University of Arizona (Duan et al., 1992(Duan et al., , 1993(Duan et al., , 1994))) algorithm to obtain soil texture from the satellite-estimated θ w and θ c without explicitly solving the PTF equations.The basic working flow of this algorithm is depicted in Figure 2.
The basic working flow of applying SCE-UA in this study is that for a certain grid, a variation range of each soil texture component will be first prescribed (Step 1 in Figure 2).In this study, the variation range of ±20% for sand, ±10% for clay, and ±5% for SOC is defined.The SCE-UA algorithm will then generate a sample pool based on the input baseline soil texture data (Step 2).For each sample, fitness values of wilting point and critical point are calculated through PTF (θ′ wi and θ′ ci , i = 1,n where n refers to the total number of soil texture combinations, Step 3).Then the generated θ′ wi and θ′ ci are shuffled (Step 4) and the cost function between the fitness values and satellite estimation of θ w and θ c is calculated (Step 5).Then the evolution procedure is conducted using the Constraint-handling using Constrained Evolutionary (CCE) optimization algorithm (Step 6) to ensure the cost function is minimum between the fitness values and satellite estimations.The soil texture combination corresponding to the time when the cost function is minimum is chosen as the optimal soil texture composition for this grid.
Two baseline soil texture data sets, that is, GSDE that includes SOC and HWSD that includes only the basic components of soil texture (i.e., sand, clay, and loam) are chosen in this study because we want to investigate the effect of different PTF schemes on SM improvement.SOC has attracted increasing interests in recent land surface modeling studies (Chen et al., 2016;Lawrence & Slater, 2008) since it can have a significant influence on soil hydraulic properties through its effect on soil structure and adsorption properties (Rawls et al., 2003(Rawls et al., , 2004)).Two PTF schemes with and without consideration of SOC effect (i.e., SR06 (Saxton & Rawls, 2006) and CH78 (Clapp & Hornberger, 1978), correspondingly) are therefore chosen to apply for GSDE and HWSD, respectively.Descriptions of the two PTF schemes are listed in Table S1 in Supporting Information S1.We use the first soil layer of both data set, which is 5 cm for GSDE and 30 cm for HWSD, respectively.The original spatial resolution of GSDE and HWSD is 0.25°.
Three experiment designs for soil texture optimization in this study are shown in Table 1.Opti_exp1 uses sand, clay and SOC from GSDE as baseline soil texture data and SR06 PTF scheme to account for the SOC effect; Opti_exp3 uses sand and clay from the HWSD data set and CH78 as the PTF scheme that does not account for SOC effect.However, since different soil texture data sets are used, comparing results between these two experiments cannot give an explicit indication of how SOC affects soil texture optimizations.As such, an additional experiment, Opti_exp2, which uses only soil sand and clay content from GSDE and the non-SOC effect PTF scheme is also conducted.Comparing results from Opti_exp2 and Opti_exp1 therefore indicates the effects of   2020) for SOC content.Optimization results exceed this range is regarded as ineffective and will be masked out.The analysis in the main context is conducted on 3,000 iteration steps, that is, for each experiment there are 3,000 suits of soil sand, clay (and SOC) combinations.The optimization result will then be chosen as the one combination that can produce the closest θ w and θ c to the satellite estimations.
SOC on soil texture optimization, and comparing results from Opti_exp2 and Opti_exp3 can inform the global spatial patterns of soil texture before and after optimization in two different data sets.

Simulation Designs in Noah-MP LSM
As aforementioned in our research purposes, the final goal of optimizing soil texture data is to improve SM simulation in LSMs.Here we use Noah-MP (Niu et al., 2011;Yang et al., 2011) as an example LSM to evaluate how the optimized soil texture data can help improve SM simulations.Note that although θ c is not directly used in Noah-MP, it is the soil texture data optimized from SMAP-based θ w and θ c that has been incorporated in the land surface simulations here.For each optimization experiment listed in Table 1, we design two simulation experiments-one uses the baseline soil texture data set, and the other one uses the optimized soil texture-to run simulations in Noah-MP (Table 2).As we mentioned previously, since the most representative statistics of θ w and θ c are unknown, which might degrade the accuracy of optimization procedure and therefore the model simulations, we here applied a ground-based calibration strategy to minimize the uncertainty.Where in-situ SM measurements are available, the satellite-estimated soil thresholds statistics, whether in median or 5-and 95-percentiles, are chosen based on the SM simulation biases against in-situ measurement.Specifically, we first run two model simulations, with one using soil texture maps optimized from median soil thresholds, and the other one using soil texture maps optimized from percentile-based soil thresholds.Then we compare the Root-Mean-Square-Error (RMSE) between simulated SM from each case above and the simulated SM from another model run using baseline soil texture maps.If the RMSE is degraded, it means the statistics we chose from satellite soil thresholds cannot be representative enough compared to the in-situ observations, and the median (percentile-based) soil thresholds are then judged to be misrepresentative and are replaced by percentilebased statistics (median) for further optimization and simulations.
China is chosen here as the simulation region because high quality meteorological forcing data is necessary to constrain the additional simulation biases coming from forcing data.A high-quality meteorological data set (CMFD, China Meteorological Forcing Data set) (He et al., 2020) that has been widely used is chosen to drive the model runs, different from the global analyses in the other part of this study.Another reason of the choice of study region is because the stations of the in-situ SM observations we used in this study are much more dense over this study area, compared to other areas in the world (see next paragraph).The simulation period is from 01 Jan 2008 to 31 Dec 2010, in which the first 2 years are run for the model's spin-up.The model is run at the 3-hourly time step and at a spatial resolution of 0.1°.All the soil texture data sets used in the model simulation are downscaled into this spatial resolution, and the first soil layer depth in models are set to be consistent with GSDE and HWSD for each case respectively.
To evaluate how much the simulated SM can be improved from the optimized soil texture data, assessments at both grid and in-situ scales are conducted.At the grid scale, the difference between SM from simulations using baseline soil texture and NNSM, and the difference between SM simulation using optimized soil texture and NNSM are compared.Since NNSM also serves as the primary satellite data set to extract soil hydraulic parameters, the other remote sensing SM product that is independent of the optimization procedure, namely, ITPLDAS (SM data set of China based on microwave DA (Yang et al., 2016;Yang, Chen, et al., 2020, Yang, Zhao, et al., 2020)), is also used for evaluation.For the in-situ evaluation, the biases between SM from simulations using baseline and optimized soil texture data are also compared by using 616 in-situ SM observations with quality control procedure applied (Wang & Shi, 2019).The available soil depths of the in-situ observations are 10, 20, 50 and 100 cm.To ensure the consistency of comparison with the model simulations, we chose 10 cm for the GSDE cases and 20 cm for the HWSD case, respectively.

Global Patterns of Soil Hydraulic Parameters From Satellite Estimations
Global medians of the two soil thresholds, namely, θ w and θ c estimated from satellite-identified drydown events are shown in Figures 3a and 3d, respectively.Both θ w and θ c show clear geographical gradient globally, for example, they both show high values in climatologically humid regions (e.g., eastern U.S., southern Sahara, and southern China) whereas show low values in climatologically arid regions (e.g., western U.S., central Asia, and central Australia).In terms of magnitude, θ c with the mode of ∼0.23 m 3 m 3 , is overall higher than θ w , which has a mode of 0.13 m 3 m 3 (x-axis value corresponding to the peak of probability density function (PDF) function in Figure 3).The results above suggest that our estimations of θ w and θ c using NNSM data set in this study are reasonable, where their global patterns compare consistently to the soil drydown time in several previous studies (McColl et al., 2017(McColl et al., , 2019)).
Compared to satellite estimations, the soil thresholds calculated from one of the baseline soil texture data sets (i.e., GSDE, using SR06 PTF) without any satellite correction show much less spatial gradient, for example, both θ w and θ c are relatively uniform (e.g., North America), showing scattered hotspots across climate zones (e.g., Amazon, eastern Africa, and northwestern India) (Figures 3b and 3e).Results are similar when using different combinations of baseline soil texture data and PTF functions (Figure S1 in Supporting Information S1).This uniform-with-hotspots pattern could be a product of the extrapolation algorithm of the baseline soil texture data set, where the limited point-scale geological survey samples can barely represent the soil characteristics to a much larger spatial extent, which indicates that the spatial pattern of SM simulations might be biased if baseline soil texture data set were directly used in LSMs without corrections from satellite information.In addition, the uniform distribution of soil thresholds in baseline soil texture also results in overestimation in the magnitude of θ w and θ c over arid areas while underestimation over humid areas, which may add additional biases to SM simulations.
The soil thresholds from baseline soil texture with satellite corrections, that is, θ w and θ c corresponding to the minimum value of the cost function in the SCE-UA procedure, compare much closer to satellite estimations in both spatial pattern and magnitude, than to those from baseline soil texture only (Figures 3c and 3f, and Figure S3 in Supporting Information S1 for results using different baseline soil texture data sets and PTF functions).The East-West gradient over North America is reproduced, and the left-skewed PDF of θ w is largely corrected.A similar correction is also made to the dual-peak distribution of θ c .The results suggest that the SCE-UA procedure can efficiently reproduce θ w and θ c that are closest to satellite drydown estimations, and the soil texture components derived under this condition can also serve as an improved data set to represent soil characteristics at large spatial scales.
To validate the optimized soil hydraulic parameters, we also compared the results against in-situ records (Figure 4).The data we used are from Gupta et al. (2022a).To our best knowledge, this is so far the most complete soil hydraulic measurement record publicly accessible synthesized from a comprehensive review of literature.The total number of records is 136,990.However, to ensure the consistency of soil depth (5 cm, 30 cm for HWSD case), the remaining reads are 5,826 (4,376 for HWSD case), and provide only soil hydraulic record of wilting point (θ w ).From the figure it is observed that θ w calculated based on the baseline soil texture maps (left column in Figure 4) are highly overestimated almost everywhere compared to the in-situ measurements (Figure 4g).By comparison, in maps of θ w calculated based on optimized soil maps, the overestimation is mitigated, especially over eastern United States and eastern and high-latitude Euroasian area, although at other stations there remains large overestimation (e.g., western Europe and South America).Nevertheless, this result indicates our optimization algorithm is overall effective to improve the baseline soil texture maps.

Global Patterns of Soil Texture Optimized From SCE-UA Algorithm
Before discussing their spatial patterns, the sensitivity of the optimization results to one important parameter of the SCE-UA algorithm, that is, the iteration steps n, is analyzed.We conducted 4 optimization experiments with n equaling 300, 3,000, 10,000 and 30,000 to conduct analyses.Figure S4 in Supporting Information S1 shows that the optimization results of sand, clay and SOC are insensitive to the iteration steps.The optimized sand and clay content both remain consistent among the four experiments.The SOC content in the four experiments is also highly consistent, only except that the experiment with 300 Iteration steps shows minorly lower SOC content.The analyses in the context below are based on optimization results with iteration steps of 3,000.Soil texture derived from the SCE-UA procedure with satellite information (we will refer to them as "optimized" data set hereafter) as well as comparison with their baseline soil texture data sets are shown in Figures 5-7.
Overall, the soil texture components from the SCE-UA algorithm show a similar geographical pattern to the baseline maps, for example, sand content from both optimized and baseline data sets are high in climatologically arid areas (e.g., great deserts such as California, Sahara and central Australia), while low in climatologically humid areas (Figure 5a-5f), and vice versa for the soil clay component (Figures 6a-6f).Soil organic carbon, however, is overall evenly distributed across climate zones, but shows close relations to vegetation density.For example, in areas with dense vegetation coverage, SOC is comparably high (Figures 7a-7b).The above results suggest that the SCE-UA procedure performs reasonably in reproducing the overall geographical pattern of global soil component content.
However, the optimized soil texture data sets differ from the baseline soil maps in magnitude (Figures 5-6, right columns and Figure 6c).Each soil content from the optimized data set shows up to 20%, 10%, and 15% changes in absolute values respectively (the three numbers correspond to the prescribed range in Table 1).Similar to their spatial patterns, the magnitude changes in sand and clay content show dependence on climate zones, that is, the sand content from the optimized data set increases over arid regions whereas decreases over humid regions (the other way around for the clay content changes), while SOC shows overall increase globally, although the increase is comparably more over the densely covered vegetation areas.
The spatial dependence of the sand and clay changes corresponds well to the satellite-baseline soil thresholds difference (Section 3.1), further verifying that the optimized soil texture results do contain satellite information from satellite observations, such that they could perform better in representing large-scale soil characteristics (compared to the baseline soil maps).We should also note that such spatial dependence of sand changes is also related to PTF functions, for example, the decrease of sand content in humid areas in the cases Opti_exp2 and Opti_exp3 is not as much as that in Opti_exp1 (even showing an increase in some areas such as eastern U.S. and southern China).This indicates that appropriately choosing PTF functions is essential to obtain reasonable soil sand estimations.

Soil Moisture Simulations From Noah-MP LSM
JJA SM simulations from Noah-MP LSM using baseline and optimized soil texture maps are then compared with contemporary NNSM SM product and shown in Figure 8. SM simulated using baseline soil maps (i.e., exp1, exp3 and exp5) show overall positive biases (biases of ∼0.1 m 3 m 3 compared to NNSM, inset PDF distribution) over the entire study area.The positive biases are especially large in southeastern China while substantial negative biases are observed in the northwestern part, with maximum biases larger than 0.15 m 3 m 3 (in absolute value).By comparison, SM using optimized soil data (i.e., exp2, exp4 and exp6) show reduced biases, with the rightskewed PDF distribution corrected left-forward.Noticeably, the substantial positive and negative biases over southeastern and northwestern regions are largely corrected, with the maximum biases constrained within 0.1 m 3 m 3 respectively.Scattered hotspots with extremely large SM biases are still observed, possibly due to local land surface conditions (e.g., vegetation properties) that may not be accounted for by soil characteristics.
In addition to the overall SM improvement in experiments using the optimized soil texture data set, the SM improvements are also case-dependent.In exp4 and exp6, where different baseline soil data and PTF functions without considering SOC are used, the negative SM biases-meaning less SM retained by the soil in the simulations compared to satellite observation-over southeastern regions do not show as much improvement as in exp2.The SM simulation difference between exp2 and the other two experiments corresponds reasonably to the discrepancy in the spatial distribution of soil sand content-the decrement of soil sand percentage in exp2 will consequently enhance the soil water-holding capability, thus that more water can be retained in the soil column, while in the other two cases the soil sand percentage is increased, resulting in degradation of the soil waterholding capacity.Detailed inter-case comparisons will be discussed in Section 3.4.
Since NNSM serves as input (although not directly) to the optimization procedure, using it as a reference to validate SM simulations might lead to the comparison itself being biased.To further validate the simulated SM independently, a comparison using SM from ITPLDAS is conducted.Results show that SM biases over southeastern (negative biases) and northwestern regions (positive biases, more inland compared to Figure 7) are still substantial between simulations using baseline soil maps and ITPLDAS, and biases are both reduced, although relatively moderate when compared to SM comparison with NNSM (Figure S3 in Supporting Information S1).Another thing noticed in particular when compared with ITPLDAS is that large biases in northeastern areas also exist in exp1, exp3 and exp5, and using the optimized soil texture data set does not visibly help to reduce the biases.The relative moderate improvement over southeastern and northeastern areas could be again because over these areas where the land surface is covered by dense vegetation, the soil texture may play a secondary role in regulating SM dynamics.However, we should also note that the different performance of how soil texture can improve SM in comparison with ITPLDAS may also be caused by the behavior of the reference SM data itself.
ITPLDAS is an assimilation system where the SiB2 LSM is incorporated (Yang, Chen, et al., 2020, Yang, Zhao, et al., 2020), therefore the output SM product may unavoidably be influenced by the prescribed model parameters (e.g., land surface parameters such as vegetation height, leaf reflectance, surface roughness, etc.), especially over areas with high vegetation density.For example, SiB2 only uses 9 vegetation types (Table 2 in (Sellers et al., 1996)) while the vegetation category in Noah-MP is 27 (this option is based on USGS's land cover classification; the other option is based on MODIS product, which has 20 categories).This means the representation of vegetation spatial heterogeneity in SiB2 is much coarser, which may yield biases in the ITPLDAS product when compared with simulations from Noah-MP.Even for similar vegetation types, their parameters differ a lot.For example, the canopy base and top height for Broadleaf Evergreen Forests (BEF) are 1 and 35 m in SiB2 (Table 5 in (Sellers et al., 1996)), while 8 and 20 m in Noah-MP.Similar differences are also found in other vegetation biomes.These differences indicate that improving soil texture only may not necessarily help to reduce the simulated SM biases.Instead, additional efforts should be addressed to improve the characterization of other important land surface parameters.
To further validate the simulation performance, we compared the simulated SM from each experiment with 616 in-situ observations.The validations are conducted at both monthly and JJA-mean time scales.From the statistics in Table S2 in Supporting Information S1, it is clearly seen that the soil texture has larger effect on surface SM simulations at longer time scales, shown by the less in-situ stations with in-significant RMSE changes.Within the stations with significant RMSE changes, we also observed that at both time scales, the RMSEs between simulations using optimized soil maps against in-situ observations are improved: In the case where SOC effect is considered, 255 out of 264 (364 out of 429) show reduced RMSE at monthly (JJA-mean) scales; In the cases without considering SOC, 203 out of 322 (255 out of 325) and 380 out of 476 (429 out of 537) show reduced RMSE at monthly (JJA-mean) scales, respectively (Figure 9).The comparative improvement of the HWSD case (Figures 9e and 9f) as compared to GSDE cases (Figures 9a-9d) demonstrate that even though the nominal soil depth of remote sensing soil product is 3∼5 cm whereas the depth of HWSD soil maps are as deep as 30 cm, the impact of such mismatch is minor.

Intercomparison of Results Between Different Optimization and Simulation Cases
In previous sections we describe the overall behaviors of soil texture, soil thresholds and SM improvement in different optimization/simulation cases and show that they are comparably consistent.However, there are still some inter-case differences that may be caused by the different baseline soil data or PTF functions used, and therefore need to be examined in detail.In this section, we will elucidate how the optimized soil texture, soil thresholds, and simulated SM results behave between different optimization and simulation cases.
Figure 10 shows the intercomparison of PDF distribution of soil texture data, soil thresholds and SM simulations between different cases.Compared to all other cases, the optimization case using the baseline soil data of GSDE and PTF function considering SOC effect (Opti_exp1) and the corresponding simulation case (exp2) shows the closest performance of soil thresholds and SM simulation compared to satellite observations respectively (Figures 10c-10e, solid red lines).The optimization case using GSDE soil data but the PTF function without considering the SOC effect shows the second closest result to satellite observations in terms of SM simulation and soil thresholds (Figures 10c-10e, green solid lines).The comparison between the above two cases suggests that the choice of PTF is essential in the soil texture optimization procedure, and a PTF considering the SOC effect may help improve the soil texture optimization performance.Since there is no directly available satellite-observed soil texture data for comparison, the above analyses may also suggest the soil sand and clay distribution from the optimized soil texture using GSDE and SOC-based PTF-that is, unimodal distribution of both sand and clay but with sand content more frequently occurring between 30% and 60% while clay content 8%-30% (Figures 10a-10b, solid violet lines)-should outperform other results and serve as a better reference for soil texture assessment studies.
The optimization case uses the same non-PTF scheme as in Opti_exp2 but the HWSD soil data show the most inferior performance of soil thresholds estimation as well as SM simulation compared to satellite observations (Figures 10c-10e, orange solid lines; only among cases using optimized soil data).Comparison between cases using soil data without optimization, however, shows consistent results (Figures 10c-10e, dash lines).The comparison here reflects inconsistent behaviors (thus large uncertainties) of current soil texture data sets, and our results show that utilization of GSDE soil texture data could produce more realistic soil thresholds and SM simulation in LSM.

Effect of Soil Texture on Key Land Surface Variables
Since soil texture is expected to have impact beyond SM, we have also assessed additional key land surface variables, that is, ET, Soil Evaporation, Plant transpiration, surface runoff, sensible heat flux and surface soil temperature (ST), between simulations using baseline and optimized soil maps.The resulting difference map from simulations of GSDE baseline map considering SOC effect is shown in Figure 11.The corresponding soil texture changes are shown in Figure S6 in Supporting Information S1.From Figure 11 we observed that using optimized soil texture data, the model simulated ET is higher (lower) over areas where sand content decreases (increases) (Figure 11a), and such an increase in ET is mainly contributed by the increase (decrease) in soil evaporation rather than plant transpiration (Figures 11b and 11c).This is reasonable because less (more) sand content means more (less) soil water available for ET to happen.However, since plant transpiration also relies on soil matric potential (higher potential more difficult for plant to withdrawal water), the effect of increasing soil matric potential on transpiration cancels out with the effect of increasing water availability, leading to an overall minor change of simulated plant transpiration.In contrast, the simulated runoff is reduced (increased) over areas of decreasing (increasing) sand content (Figure 11d).This is because the precipitation arriving on land surface between the two simulations remains unchanged, therefore more water partitioned to ET would consequently lead to decrease in runoff (drainage neglected).Similarly, the increase (decrease) of ET also lead to increase (decrease) in sensible heat flux and surface ST over more (less)-sand areas (Figures 11e and 11f).Similar results are also shown in experiments without considering SOC effect (Figure S7-S9 in Supporting Information S1).
These results correspond reasonably with the textbook soil hydrological theories as well as with several previous studies in which the soil texture effect on surface hydrometeorological variables is also evaluated (Dennis & Berbery, 2021;Tafasca et al., 2020).For example, in Tafasca et al. (2020), they show similar correspondence of the above variables to soil texture changes in a LSM ORCHIDEE.In Dennis and Berbery (2021), using a mesoscale weather forecast model (WRF), they further show such changes in surface hydrometeorological variables caused by soil texture can influence 2-m air temperature as well as the Planetary Boundary Layer Height (PBLH) simulation.Altough in both studies together with ours, the magnitude of changes are within a limited range (perhaps cannot exceed the influence of uncertainty embedded in meteorological forcing data in some regions), the fact that soil texture can induce non-negligible changes over certain areas suggests that the soil texture should not be downplayed in regional land surface modeling studies.

Limitations of This Study
While the method this study provides has many advantages and does indeed improve the land surface simulation results, several limitations of the method should be noted as precautions for further application.First, the two soil thresholds, θ w and θ c , may contain high uncertainty over extremely humid and arid areas, respectively.In humid regions, the soil could be persistently wet so that it may never dry down to its wilting point; similarly the soil in arid regions can be consistently dry so that it never rises up to reach θ c .Therefore over these areas the derived soil texture results may still be subject to large biases, although using long-term (i.e., 20-year) SM products may help reduce such uncertainty.
Second, to provide a strict constraint on the soil hydraulic parameters-soil texture relationship, more soil hydraulic parameters should be incorporated with the optimization procedure, for example, incorporating all the 7 parameters on the lefthand side of the PTF functions (Table S1 in Supporting Information S1).However, only two of them were applied here because of the non-availability of other parameters when this study was conducted.We did consider using the 20-year maximum satellite SM as the saturated point (θ sat ) for further analyses.The idea was finally abandoned since satellite sensors can hardly detect the moment when SM achieves its saturation point.
Finally, we note that the mismatch of soil depths between different products used in this study can introduce additional biases in the optimization procedure as well as model simulations for the HWSD case.The nominal soil depth of NNSM SM is only 3 ∼ 5 cm, whereas the first soil layer depth of HWSD soil map is down to 30 cm.However, studies have reported that in many conditions (e.g., dry areas), the actual penetrating depth is much deeper than the nominal 5 cm (Feldman et al., 2023;Short Gianotti et al., 2019).Therefore, the remote sensingbased soil thresholds may still be effective for the HWSD optimization case.The comparative RMSE improvement at the in-situ level between HWSD and GSDE cases help demonstrate this point.Nevertheless, the mismatch of soil depths could be one reason why the soil texture effect on surface hydrometeorological variables are less significant than the other two cases incorporating GSDE, since the shallower soil layer usually tends to be more correlated with near surface atmosphere.In fact, the soil texture content in this case is overtuned toward shallower soil layer, and the actual distribution of soil composition in the optimized HWSD maps should be slightly skewed toward that of the baseline maps.To further improve on this issue, we expect a more layered HWSD product, or more advanced satellite sensing technologies that can detect deeper SM conditions in future.

Implications for Soil Texture Calibration Using Satellite Data Sets
As we noted in the introduction part, soil texture serves as the fundamental parameter in LSMs, while current soil texture data sets may be prone to large biases when applied to a large spatial extent since they are often extrapolated from point-scale in-situ measurements, and therefore need constraint by large-scale soil information.Satellite-observed SM products indeed provide the large-scale observation-based constraint on soil texture, yet seldom has been done to bridge the missing link, likely because the issue of scale gaps has not gained sufficient attention within the land surface modeling community in the past.Nowadays as we are in the big data era where spatial heterogeneity of model outputs can be especially important for local environmental solutions, such scale gap problem has to be considered carefully in the model development.
So far, the only study (to our best knowledge) on incorporating satellite SM to soil texture calibration in LSMs is (Zhao et al., 2022), in which the calibrated soil texture is derived by converting the best-screened soil type from the Noah-MP look-up table that can produce the closest SM simulation to SMAP through FAO soil fraction triangle.Using GSDE soil texture as the reference data set ("truth"), the calibrated soil texture outperforms its baseline soil texture data set (i.e., GLDAS-Noah soil texture, on which the Noah-MP soil type look-up table is based).Our study resembles Zhao et al., 2022 in that we both apply an "effect-to-cause" procedure to derive soil texture, which can be seen as a backward process compared to the traditional "from soil texture to soil hydraulic parameters to SM" approach.Such a backward approach should be valued in future large-scale soil texture calibration studies since the availability of the effect variables can be more easily accessed with the development of new land surface theories and data products.
However, we should note that although Zhao et al., 2022 does serve as a pioneering study that motivates the conceptualization of our work, it does also contain several limitations, which may impede the applications of the derived soil texture data set.The biggest disadvantage is that the result is very much limited by the soil type lookup table, where only 12 available soil types are presented for each model grid.That is to say the soil texture fraction at each grid can only be selected from a fixed number of soil sand, clay and loam combinations.In the real world, it is certainly possible that there can be other soil fraction combinations beyond the 12 available choices.In comparison, the optimization procedure our study used here can have as many soil fraction combinations as possible (within the variation upper and lower bounds), therefore the derived results may be closer to the real soil conditions.

Conclusions
This study set out to improve SM simulation in Noah-MP LSM by using soil texture data sets constrained by satellite-derived soil thresholds (i.e., θ w and θ c ). Soil moisture thresholds are first estimated from satelliteidentified soil drydown events.The SCE-UA algorithm and two PTF schemes are then chosen to conduct optimization procedure based on two baseline soil texture data sets, that is, GSDE and HWSD.The baseline and optimized soil texture data sets are then incorporated with Noah-MP LSM to conduct SM simulations.The model simulation results show that using the optimized soil texture data can indeed improve SM simulations in Noah-MP, and utilization of GSDE soil data optimized with SOC-based PTF scheme can produce the most significant SM improvement.
The inter-case comparison shows that there is large uncertainty in the current global soil texture database.Since soil texture often serves as the prescribed parameter in LSMs, such uncertainty will lead to systematical model biases, that is, even though all the other physical parameterizations or parameters were perfect, the model simulations could still be unrealistic due to the biased representation of soil texture.Our study, by using 20-year NNSM SM data, shows that satellite SM data has the potential to constrain such uncertainty.The improved soil texture in this study could therefore serve as a new database to be employed in LSMs for a more accurate land surface simulation.However, we recognize this study could only serve as a testbed.To build a more robust satellite-optimized soil texture database, further experiments using different satellite products (e.g., SMAP, SMOS, CCI, etc.), global soil database (e.g., Data and Information System of International Geosphere-Biosphere Program,IGBP-DIS), and PTF schemes are required and inter-case comparison analyses are highly expected.
Our study also provides a method to improve land surface simulations from the perspective of land-atmosphere interactions.Previous LSM improvement studies prefer incorporating the "state" variable (e.g., land surface temperature, LAI, SM, etc.) to improve land surface simulations.However, such improvement may be limited (especially for improvement in flux simulations) since the state variables could only poorly capture the nearsurface atmosphere's impact on or response to the land surface.By comparison, SM drydown used in this study directly reflects the coupling between SM and the key flux variable ET, therefore it provides more "Dynamic" information to be incorporated into LSM.Although only the baer soil evaporation is significantly improved in the results, this study does show that incorporating land-atmosphere coupling information should serve as an essential part in LSM improvement studies.Recent research also shows that the wrong representation of the SM-ET coupling regime is the dominant reason for land surface simulation biases (Dong, Lei, & Crow, 2022;Zhou et al., 2023), which further confirms our conclusion here.Joint efforts considering both state variables and land-atmosphere interactions could therefore complement each other and facilitate the better LSM simulations in future.
Finally, we highlight the benefit of satellite-optimized soil texture maps in macro-scale land surface modeling, while we did not argue that the optimized soil texture maps in this study is more true against the observationupscaled ones.The major conceptual differences between these two techniques are: the observation soil maps are derived through bottom-up approach, that is, the soil texture is by definition classified based on soil particle sizes through the USGS soil triangle model.As we discussed earlier in the context, such an approach can be seen as true at the point scale, since the input to the model is directly measured from field or laboratory experiments.However, to apply this method to the gridded soil hydrological modeling and usually in macro-scale resolution, the uncertainty lies on that the input data are unavailable and need to be obtained by extrapolating the limited measurements.Our approach, in comparison, uses a top-down method which in turn derives soil texture from soil retention curves, where the unique benefit is the input data to the top-down model are directly available at the spatial resolution more relevant to macro-scale hydrological and land surface modeling.

Figure 1 .
Figure 1.Diagram for surface water loss function (a) and soil moisture (SM) drydown curve (b).The x-axis in (a) refers to SM (m 3 m 3 ), and y-axis refers to surface water loss rate (L(θ), in unit of LT 1); E max is the maximum evapotranspiration rate (LT 1 ).Figure1bis the projection of 1a in the time domain, where the x-axis refers to time (e.g., days) and y-axis refers to SM content (m 3 m 3 ).θ w , θ c , and θ fc refers to soil wilting point, critical point, and field capacity, respectively.Figures adapted from(He et al., 2023).

Figure 2 .
Figure 2. Working flow of soil texture optimization by using SCE-UA algorithm in this study.Parameters in each block are: (1) S, C, and soil organic carbon (SOC) are for soil sand content (%), soil clay content (%) and SOC content (%) from the input baseline soil texture data; ∆s, ∆c and ∆soc are the absolute value of prescribed variation range for sand, clay and SOC respectively; (2) S i , C i and S i indicate the sand, clay and SOC content generated by SCE-UA program; i refers to the sample index; (3) PTF indicates the PedoTransfer Function (in this study, both SR06 and CH78 schemes are used), θ′ xi indicates the soil hydraulic parameters calculated using the soil composition samples generated by SCE-UA program; x = w,c indicates both θ c and θ w will be calculated; i refers to the sample index.

Figure 3 .
Figure 3. Global distribution of θ w (top) and θ c (bottom) from satellite estimations (a and c), baseline soil texture map (b and e) and optimized soil texture map (c and f).Insets refer to the probability density distribution of each map.Areas with vegetation water content larger than 5 kg m 2 are masked.

Figure 4 .
Figure 4. Global validation of soil wilting point against in-situ records.

Figure 5 .
Figure5.Global patterns of sand content from baseline soil maps (left column), optimized soil maps (middle column) and their difference (right column).For the middle and right columns, From top to bottom are results from GSDE soil map with SR06 PTF scheme, GSDE soil map with CH78 PTF scheme and HWSD soil map with CH78 PTF scheme.Insets refer to probability density distribution.Areas with vegetation water content larger than 5 kg m 2 are masked.

Figure 6 .
Figure 6.Same as Figure 4 but for the soil clay content.

Figure 7 .
Figure 7. Global patterns of soil organic carbon from baseline soil maps (a), optimized soil maps (b) and their difference (c).Insets refer to probability density distribution.Areas with vegetation water content larger than 5 kg m 2 are masked.

Figure 8 .
Figure8.Difference of JJA soil moisture from simulations with baseline soil maps and NNsm (left column), and simulations with optimized soil maps and NNsm (right column).Panels from top to bottom refers to simulations using GSDE soil map with SR06 PTF scheme, GSDE soil map with CH78 PTF scheme, and HWSD soil map with CH78 PTF scheme.Areas with water bodies and vegetation water content larger than 5 kg m 2 are masked.

Figure 9 .
Figure 9. Difference of SSM RMSEs against in-situ observation between experiments using optimized and default soil maps at monthly (left column) and JJA-mean (right column) time scales.Triangles in mageta color indicate increase in RMSEs; reversed triangles in yellow color indicate decrease in RMSEs.Stations with absolute RMSE difference less than 0.005 m 3 m 3 are filtered.

Figure 10 .
Figure 10.Intercomparison of soil texture (a) and (b) and soil thresholds (c) and (d) for difference cases (global).(e) and (f) denote the intercomparison of simulated Soil moisture (SM) and remote sensing-based products (China).For (a) and (b), dash lines indicate probability density function of soil texture from default soil maps and solid lines indicate result from optimized soil maps.For (c) and (d), dash lines indicate soil thresholds calculated using default soil maps and PTF functions, black solid lines indicate estimations from NNsm product, and solid lines in color indicate results from optimized soil maps and PTF functions.For (e) and (f), dash lines in color indicates results simulated with default soil maps and PTF functions; solid lines in color indicate results simulated with optimized soil maps and PTF functions.Black solid line in (e) is SM from NNsm while in (f) is from ITPLDAS.See legend in each panel for explanation of lines in different color.Areas with vegetation water content larger than 5 kg m 2 are masked.

Figure 11 .
Figure 11.Changes of land surface hydrometeorological variables between simulations caused by soil texture.From (a) to (f) are evapotranspiration (ET, mm d 1 ), baer soil evaporation (E, mm d 1 ), plant transpiration (T, mm d 1 ), surface runoff (Runoff, mm d 1 ), sensible heat flux (H, W m 2 ), and surface soil temperature (ST, K).Results are for simulations using GSDE soil maps considering soil organic carbon effect.The other two cases are shown in Supporting Information S1.Areas with vegetation water content larger than 5 kg m 2 are masked.

Table 1
Experiment Designs for Global Soil Texture Optimization a The lower and upper bound for Sand, Clay and SOC are 6%-98%, 3%-58%, 0%-40% respectively.The statistics for sand and clay are based onZhao et al. (2022), and Hugelius et al. (

Table 2
Experiment Designs for Noah-MP Model Simulations Over China HE ET AL.