Sensitivity of joint atmospheric-terrestrial water balance simulations to soil representation: Convection-permitting coupled WRF-Hydro simulations for southern Africa

Regional weather and climate models play a crucial role in understanding and representing the regional water cycle, yet the accuracy of soil data significantly affects their reliability. In this study


Introduction
Soil plays a vital role in Earth's critical zone and significantly affects the climate systems of many sub-continental regions. It serves as the primary medium for plant growth and acts as a natural reservoir of carbon, nutrients, and water; in addition, soil mediates energy and water exchanges between the atmosphere and the land surface, making it a crucial component of Earth's hydrological cycle (Hamidov et al., 2018; Jost et al., 2021; Lal et al., 2017). For simulating past or future climate change and its feedbacks, Earth system models and coupled regional land-atmosphere models are effective tools for predicting how the hydrological cycle changes globally and regionally (Dai et al., 2019; Gleeson et al., 2020; Lin et al., 2020; Zhang et al., 2022). Because soil layers predominantly control terrestrial water variations, changes in soil conditions, such as soil moisture content and water retention characteristics, directly affect the atmosphere (Verhoef and Egea, 2014; Vogel et al., 2018; Zhou et al., 2021). Therefore, a deeper understanding of soil characteristics is crucial for studying the hydrological cycle, accurately predicting the effects of climate change, and ensuring sustainable land resource management (Dai et al., 2019; Dennis and Berbery, 2022; Wang et al., 2015).
Many soil survey projects and programs have been initiated worldwide to investigate soil properties and to develop high-resolution soil maps at regional to global scales (Batjes et al., 2017; Hengl et al., 2017). Challenges remain, such as harmonizing complex, spatiotemporally heterogeneous data and mapping soil properties accurately at scales ranging from individual soil profiles to territorial units and the globe (Batjes et al., 2017; Dai et al., 2019; Liakos and Panagos, 2022; Sanchez et al., 2009). Estimating the physical properties and hydrological parameters of soils remains difficult, because direct measurements are usually profile-based and, in most cases, it is impractical to collect enough samples to capture spatial variation (Cosby et al., 1984; Kishné et al., 2017). Consequently, discrepancies in soil properties are common among the global soil datasets used today. For instance, Dennis and Berbery (2021) compared the widely used Food and Agriculture Organization soil database (FAO; FAO, 2013), the USDA State Soil Geographic Database (STATSGO; Soil Survey Staff (NRCS), 2017), and the Global Soil Dataset for Earth System Model (GSDE; Shangguan et al., 2014), and found that GSDE prescribes finer soils than STATSGO over the U.S. Great Plains, and vice versa over central Mexico. Dy and Fung (2016) identified large, spatially blocky differences in soil type between FAO and GSDE over the Tibetan area and southern China. Additionally, Zhang et al. (2023) highlighted substantial differences among alternative global soil datasets in southern Africa, particularly in the central region, where soil profile samples are sparse.
Sophisticated land surface models (LSMs) have been shown to reliably reproduce individual components of the hydrological cycle (Liu et al., 2020; Ma et al., 2017; Wei et al., 2021; Xue et al., 2001), including soil moisture, evapotranspiration, and runoff, as well as precipitation when coupled with numerical weather and climate models (Arnault et al., 2018; Shang et al., 2022). Two-way coupled land-atmosphere models integrate the processes occurring in the terrestrial and atmospheric domains, capture their feedback mechanisms, and provide a more holistic view of water and energy interactions, thus allowing a more realistic representation of the water cycle. In very high-resolution modeling, such as convection-permitting modeling that resolves steep topographic gradients in complex terrain, advanced fully coupled atmosphere-hydrology models have been employed to better represent the joint terrestrial and atmospheric water cycle at watershed (Quenum et al., 2022; Rummler et al., 2019; Senatore et al., 2015; Zhang et al., 2019) and sub-continental scales (Arnault et al., 2021a; Zhang et al., 2023). Nevertheless, land surface modeling remains an important source of uncertainty in coupled model simulations. Most LSMs rely on parametric assumptions and a large number of parameters to represent soil, vegetation, and hydrological processes, which can affect the accuracy of modeling results.
In most LSMs, the hydrophysical properties of soil are prescribed according to soil type classifications, with empirically derived values from experimental investigations stored in a lookup table (Chen and Dudhia, 2001; Dennis and Berbery, 2021; Marthews et al., 2022). This approach is computationally efficient, model-compatible, and easily transferable across regions. However, it suffers from deficiencies such as inaccurate region-specific parameters and a dependence on soil type maps. The uncertainties introduced by soil properties in LSMs are therefore largely due to how soil types are delineated in soil databases. As soil type classifications in global soil databases vary regionally, the choice of dataset can lead to significant variations in coupled modeling results. Several studies have concluded that modeled surface water and energy fluxes vary with changes in soil type and the associated hydrophysical parameters (e.g., Campoy et al., 2013; Gao et al., 2008; Pedruzzi et al., 2022; Zhang et al., 2023). Such changes may further affect the atmospheric and terrestrial water balance through a fully coupled land surface, hydrology, and atmosphere modeling framework. Although such relationships have been studied extensively for land cover change (Jach et al., 2020; Wang et al., 2023), soil moisture initialization (Lin and Cheng, 2016; Schär et al., 1999), and surface hydrological processes (Rummler et al., 2019; Zhang et al., 2019), less attention has been paid to the impact of soil hydrophysical parameters. Dennis and Berbery (2022) demonstrated the influence of soil types and parameters on boundary layer thermodynamics and their modulation of atmospheric water budgets in North America. However, the specific roles of soil properties in the regional climate and water cycle require further investigation across different regions, given variations in climate characteristics, water resource distribution, and land-atmosphere feedback mechanisms.
Southern Africa faces substantial risks from climate change and water scarcity (Engelbrecht et al., 2024; Rouault et al., 2024). Hydrological dynamics, such as droughts and precipitation patterns, are influenced both by external atmospheric factors, such as the El Niño Southern Oscillation (Hoell et al., 2021), and by land-atmosphere feedback loops (Cook et al., 2006; Mwanthi et al., 2023). Previous research on land-atmosphere coupling has found that soil moisture plays a crucial role in modulating wet-season precipitation in various regions of southern Africa, with both positive and negative effects (Cook et al., 2006; Yang et al., 2018; Zhou et al., 2021). This implies that changes in soil moisture may either increase or decrease precipitation. Nevertheless, existing investigations of these feedback mechanisms in southern Africa have mostly relied on global monitoring or modeling frameworks and, to a lesser extent, on regional coupled modeling. It is therefore important to explore the impact of these feedback mechanisms on the regional water cycle with regionally coupled model simulations that account for the uncertainties in soil types and hydrophysical parameters. In this context, a prior study by Zhang et al. (2023) underscored notable differences among global soil datasets commonly used in southern Africa and quantified the internal variability of the simulated surface hydrometeorological variables. Their findings also suggest significant potential implications for atmospheric modeling results. This study extends Zhang et al. (2023) by focusing specifically on atmospheric thermodynamics and the implications for modeling the regional atmospheric and terrestrial water cycles.
This study addresses the following questions: How do changes in soil type and the associated hydrophysical parameters affect the regional water cycle? To what extent can different global soil datasets modulate the regional water cycle in coupled land-atmosphere modeling over southern Africa? We hypothesize that soil hydrophysical properties linked to soil type significantly affect surface fluxes, which alters atmospheric thermodynamic instability and thereby moisture transport and regional water budgets. This focus on atmospheric dynamics and their regional variations is a key difference from Zhang et al. (2023), which did not address them. To test this hypothesis, we use different soil datasets in the fully coupled Weather Research and Forecasting Hydrological Modeling system (WRF-Hydro) for convection-permitting ensemble modeling over southern Africa. Following Zhang et al. (2023), four widely used, open-access global soil datasets, derived respectively from the Food and Agriculture Organization (FAO), the Harmonized World Soil Database (HWSD), the Global Soil Dataset for Earth System Modeling (GSDE), and the global gridded soil information (SoilGrids), are selected because of their widespread acceptance and use in current climate and weather modeling. The coupled WRF-Hydro model is chosen for its ability to represent sophisticated land surface hydrological processes and lateral terrestrial water flow, which is crucial for modeling complex terrain at very high resolution, such as at convection-permitting scales (< 4 km). This investigation focuses on the impact of soil type and hydrophysical properties on surface fluxes, as well as their influence on atmospheric processes and regional water budgets.
The remainder of this article is structured as follows: Section 2 outlines the coupled modeling approach, experimental design, data, and methodology. Section 3 presents and discusses the results of the model evaluation and comparison, and Section 4 presents the conclusions.

Study area and soil datasets
The study area for our model simulations is the southernmost region of Africa, south of 19°S, with a primary focus on South Africa and its surrounding regions (Fig. 1). This area is characterized by varied land cover, complex topography, and a seasonally and spatially variable climate that ranges from subtropical arid and semi-arid to temperate and Mediterranean, with different precipitation patterns across the region (Baade et al., 2024; Rouault et al., 2024). The terrain varies remarkably, with mountainous regions in the east and south and relatively flat plateaus in the interior (Fig. 1a). The predominant land cover consists of grasslands, savannas, and barren land, with small fractions of forests, wetlands, and urban areas (Fig. 1b).
Following Zhang et al. (2023), four widely used global soil datasets are used in this study. The first dataset is provided by the Food and Agriculture Organization (FAO) of the United Nations (FAO, 2013; FAO-UNESCO, 1981) and was developed from soil surveys by merging various soil-type datasets into a single database at a grid resolution of 3 arc-minutes (~9 km). It has been used extensively for decades in agricultural planning and environmental assessments and is still the default soil dataset for regional climate modeling applications, including the WRF and WRF-Hydro models.
The first alternative dataset is the Harmonized World Soil Database (HWSD) version 1.2 (FAO/IIASA/ISRIC/ISSCAS/JRC, 2012). HWSD harmonizes soil information from diverse sources worldwide, combining data from national and regional soil databases with the FAO soil map of the world into a consistent, standardized dataset at a resolution of 30 arc-seconds (~1 km).
The Global Soil Dataset for Earth System Modeling (GSDE) is another high-resolution global dataset, developed primarily for Earth system modeling (Shangguan et al., 2014). It is also based on the FAO soil map of the world and incorporates data from HWSD, other national and regional soil databases, and local soil maps. However, GSDE uses advanced statistical methods and mapping procedures to predict soil parameters in unsurveyed areas. GSDE has the same resolution as HWSD.
Lastly, SoilGrids (SGD) is used as an alternative source of soil property information. It is a recent global soil mapping project led by the International Soil Reference and Information Centre (ISRIC; Hengl et al., 2017) that uses state-of-the-art machine learning models and a large number of soil observations to generate high-resolution maps of various soil properties. SoilGrids has a resolution of 250 m, the most detailed spatial estimate of soil distribution among the four global datasets. All of these datasets are open access and widely used in environmental modeling studies.

Coupled WRF-Hydro model and experiment design
The fully coupled Weather Research and Forecasting Hydrological Modeling system (WRF-Hydro) is employed to investigate the impact of soil data on weather and climate modeling in southern Africa. This modeling system comprises the Weather Research and Forecasting (WRF) model version 4.2 (Skamarock et al., 2019) coupled with the Noah land surface model with multi-parameterization options (Noah-MP; Niu et al., 2011) and the hydrological module of WRF-Hydro version 5.1 (Gochis et al., 2018).
In the model configuration, the atmospheric component of WRF-Hydro uses an Arakawa-C grid with 35 terrain-following vertical levels up to 50 hPa. The model domain covers the study area in southern Africa (Fig. 1) with 650 × 500 horizontal grid points at a grid spacing of 4 km. This setup enables convection-permitting dynamical downscaling and hence a more detailed representation of atmospheric processes. The choice of physical parameterizations is informed by previous literature (Abba Omar and Abiodun, 2021; Arnault et al., 2021b; Crétat et al., 2012; Ratnam et al., 2013) and numerous downscaling test runs; the final parameterization combination is listed in Table 1. The model system is initialized and driven by 3-hourly ERA5 reanalysis (Hersbach et al., 2020), and model outputs are saved every 3 h. Static land surface conditions are based on the land use and land cover map from the Moderate Resolution Imaging Spectroradiometer (MODIS; Friedl et al., 2010; Fig. 1b). In addition to the default FAO soil types, the other three global soil datasets described in Section 2.1 are implemented separately in coupled WRF-Hydro simulations, denoted WRFH-HWSD, WRFH-GSDE, and WRFH-SGD in this paper. During model preprocessing, all soil data are interpolated from their native grids to the 4 km model grid using the default nearest-neighbor method. The data are then classified into soil categories on the model grid according to the 16-category USDA soil classification, based on the percentages of sand, silt, and clay. The classified results are shown in the left column of Fig. 2. The soil type distributions vary significantly among the four global datasets, and their statistical differences were discussed in Zhang et al. (2023).
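The texture classification step can be illustrated with a small Python sketch. It assumes sand, silt, and clay are given as percentages for one grid cell and applies the standard USDA texture-triangle boundaries; the category indices follow the usual ordering of the 16-category USDA table in WRF/Noah-MP, where only categories 1 to 12 are texture classes. The function names are hypothetical, and the WRF preprocessing may implement this step differently.

```python
"""Sketch: USDA soil texture class from sand/silt/clay percentages."""

# Category indices of the 12 mineral texture classes in the 16-category USDA
# table used by WRF/Noah-MP; 13-16 (organic material, water, bedrock, other)
# cannot be derived from texture fractions alone.
USDA_CATEGORY = {
    "sand": 1, "loamy sand": 2, "sandy loam": 3, "silt loam": 4,
    "silt": 5, "loam": 6, "sandy clay loam": 7, "silty clay loam": 8,
    "clay loam": 9, "sandy clay": 10, "silty clay": 11, "clay": 12,
}


def usda_texture(sand: float, silt: float, clay: float) -> str:
    """Return the USDA texture class for fractions given in percent."""
    total = sand + silt + clay
    if abs(total - 100.0) > 1.0:
        raise ValueError(f"sand + silt + clay must be ~100 %, got {total}")
    # Checks are ordered so that each test only sees points not yet classified.
    if silt + 1.5 * clay < 15:
        return "sand"
    if silt + 2.0 * clay < 30:
        return "loamy sand"
    if (7 <= clay < 20 and sand > 52) or (clay < 7 and silt < 50):
        return "sandy loam"
    if 7 <= clay < 27 and 28 <= silt < 50 and sand <= 52:
        return "loam"
    if silt >= 80 and clay < 12:
        return "silt"
    if silt >= 50 and clay < 27:
        return "silt loam"
    if 20 <= clay < 35 and silt < 28 and sand > 45:
        return "sandy clay loam"
    if 27 <= clay < 40 and 20 < sand <= 45:
        return "clay loam"
    if 27 <= clay < 40 and sand <= 20:
        return "silty clay loam"
    if clay >= 35 and sand > 45:
        return "sandy clay"
    if clay >= 40 and silt >= 40:
        return "silty clay"
    return "clay"  # remaining region: clay >= 40, sand <= 45, silt < 40


def usda_category(sand: float, silt: float, clay: float) -> int:
    """Return the 1-12 USDA category index for one grid cell."""
    return USDA_CATEGORY[usda_texture(sand, silt, clay)]
```

Applied cell by cell on the model grid, this produces a categorical map of indices 1 to 12 that can then be passed to the parameter lookup table.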
The Noah-MP LSM and the WRF-Hydro hydrological module are coupled with the atmospheric model to represent land-atmosphere interactions as a lower boundary condition. Noah-MP is known for its flexible parameterization options and realistic representation of land surface processes, and it is widely used in weather and climate simulations to represent the exchanges of energy, water, and momentum fluxes dynamically (Ma et al., 2017; Niu et al., 2011). However, it does not explicitly represent certain land hydrological processes, such as overland flow generation and lateral flow, which are critical for accurately predicting hydrological responses in high-resolution modeling. To overcome this limitation, the WRF-Hydro hydrological module is coupled with Noah-MP and explicitly represents lateral water processes through a distributed hydrological model. It provides a detailed representation of land hydrological processes, including routed surface and subsurface runoff, horizontal redistribution of soil moisture, and streamflow, and has therefore gained popularity in research applications. The soil column in WRF-Hydro is divided into four layers with depths of 0–10, 10–40, 40–100, and 100–200 cm, and water and heat exchanges are parameterized and computed within this 2 m soil depth. Horizontal overland and subsurface water flows are computed on a 400 m hydrological subgrid based on a refined terrain gradient. This subgrid interacts with the 4 km WRF and Noah-MP grid through an aggregation-disaggregation procedure that rescales the hydrological moisture state. This two-way interaction between land surface hydrological modeling and atmospheric modeling improves the model's ability to simulate water-related phenomena (e.g., Rummler et al., 2019; Senatore et al., 2015; Zhang et al., 2021).
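The aggregation-disaggregation step between the 400 m routing subgrid and the 4 km land surface grid can be pictured with the Python sketch below. It assumes an aggregation factor of 10 (4 km / 400 m) and a simple mass-conserving, pattern-preserving rescaling; this is an illustrative assumption with hypothetical function names, not a transcription of the WRF-Hydro source code, whose exact procedure may differ.

```python
import numpy as np


def aggregate(fine: np.ndarray, factor: int) -> np.ndarray:
    """Block-average a routing-subgrid field onto the coarse LSM grid."""
    ny, nx = fine.shape
    if ny % factor or nx % factor:
        raise ValueError("subgrid shape must be a multiple of the aggregation factor")
    blocks = fine.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


def disaggregate(coarse_new: np.ndarray, fine_old: np.ndarray, factor: int) -> np.ndarray:
    """Spread updated coarse values onto the subgrid.

    The previous subgrid pattern is kept as relative weights, so each coarse
    cell's mean is conserved exactly; blocks whose old mean is zero are
    filled uniformly.
    """
    fine_old = np.asarray(fine_old, dtype=float)
    expand = np.ones((factor, factor))
    old_mean = np.kron(aggregate(fine_old, factor), expand)
    new_mean = np.kron(np.asarray(coarse_new, dtype=float), expand)
    weights = np.divide(fine_old, old_mean, out=np.ones_like(fine_old), where=old_mean > 0)
    return weights * new_mean
```

Because the weights average to one within each coarse cell, aggregating the disaggregated field returns the updated coarse values, so no water is created or lost in the exchange.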
Default physical parameterization options with low uncertainty are chosen for Noah-MP and WRF-Hydro. For instance, surface exchange with the atmosphere is parameterized with the default Monin-Obukhov scheme with identical roughness lengths (Brutsaert, 1982; Chen et al., 2019). This option is selected instead of options using a tunable constant (Chen et al., 1997) or a vegetation-dependent (Chen and Zhang, 2009) Zilitinkevich coefficient, to reduce model complexity. Note that simulating river streamflow requires activating an additional channel routing module, which requires more computational resources and is not the focus of this study. Lateral hydrological processes in this WRF-Hydro setup therefore only include surface and subsurface routing for soil moisture redistribution, as in Arnault et al. (2021a). Table 1 lists the physical schemes of Noah-MP and the WRF-Hydro module adopted in this study. Detailed descriptions of Noah-MP and the WRF-Hydro module are available in Niu et al. (2011) and Gochis et al. (2018), respectively.
Soil hydrophysical parameters and routing roughness parameters are kept at their default values in Noah-MP and WRF-Hydro, so that differences between simulations arise solely from the soil data. The soil hydrophysical parameters are given in a lookup table (Table 2), derived empirically and experimentally from soil surveys in the United States (Cosby et al., 1984). Note that the parameters assigned to each soil type are location-independent and may therefore misrepresent the true soil parameters at the grid scale. Nevertheless, this approach requires only generic soil type maps, is computationally inexpensive, and is easily transferable to different regions. Based on the soil type distributions and the lookup table, Figs. 2b and 2c display the spatial patterns of two selected soil hydrophysical parameters, wilting point and porosity, for the default FAO data. The wilting point is the soil water content below which plants can no longer extract water from the soil, and the porosity indicates the maximum water-holding capacity of the soil. The differences in these parameters between the three alternative soil datasets and the FAO default are illustrated in the lower three rows of Fig. 2.
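The lookup-table approach can be sketched in Python as follows: each grid cell's soil category indexes a parameter table, so differences between datasets follow entirely from differences in category. The parameter values below are hypothetical placeholders, not the Table 2 entries, and the function name is illustrative.

```python
import numpy as np

# Hypothetical placeholder values for illustration only; they are NOT the
# Table 2 entries. Keys are USDA soil category indices.
PARAMS = {  # category: (porosity theta_s, wilting point theta_wp), m3 m-3
    1: (0.34, 0.01),
    3: (0.43, 0.05),
    6: (0.44, 0.07),
    12: (0.47, 0.14),
}


def map_parameters(soil_type: np.ndarray, params: dict, n_categories: int = 16):
    """Map a categorical soil-type grid to porosity and wilting-point grids."""
    porosity = np.full(n_categories + 1, np.nan)  # index 0 unused
    wilting = np.full(n_categories + 1, np.nan)
    for category, (theta_s, theta_wp) in params.items():
        porosity[category] = theta_s
        wilting[category] = theta_wp
    # Fancy indexing applies the table lookup to every grid cell at once.
    return porosity[soil_type], wilting[soil_type]


# Two hypothetical 2x2 soil-type maps, e.g. default vs. alternative dataset.
default_soil = np.array([[1, 3], [6, 12]])
alternative_soil = np.array([[3, 3], [6, 1]])
_, wp_def = map_parameters(default_soil, PARAMS)
_, wp_alt = map_parameters(alternative_soil, PARAMS)
wp_difference = wp_alt - wp_def  # alternative minus default
```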
In general, the three alternative soil datasets show more heterogeneous soil hydrophysical properties. In the western and northern parts of the study domain, the wilting point is generally lower in HWSD, GSDE, and SoilGrids than in the FAO default. Within the western part of South Africa, however, HWSD and GSDE usually have higher wilting points, while SoilGrids is similar to the FAO default (DEF). Over the Drakensberg Mountains, the north-south-oriented range in northeastern South Africa (north of Eswatini), the differences in wilting point are highly variable. For soil porosity, the alternative datasets generally have lower values, indicating a reduced maximum water-holding capacity in HWSD, GSDE, and SoilGrids. The spatial pattern of field capacity differences resembles that of wilting point differences and is therefore not shown.
Because the atmospheric boundary forcing is identical in all experiments, the differences between simulations reveal the role of altered surface boundary conditions in the coupled land-atmosphere modeling system. The simulations cover the period from January 2015 to March 2018, encompassing three austral summers. The analysis focuses on the five summer months from November to March, which favor thermal and hydraulic variations given adequate precipitation (> 45 mm month⁻¹) and high temperatures over the study region (Supplementary Material, Figure S1). The initial 10 months, prior to November 2015, were therefore designated as model spin-up. The simulations were run continuously, including the non-summer months, so that soil moisture and energy conditions could reach equilibrium each year. Simulating three years reduces the influence of internal variability and model dispersion and thus isolates the impacts of surface conditions better than a single year would. Among the three selected years, 2015/16 featured one of the strongest El Niño events on record, 2016/17 was a typical La Niña year, and 2017/18 was a normal year. Averaging the ensemble across these three distinct climate conditions helps separate the influence of external climate forcing from internal feedback mechanisms in the regional climate simulations.
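Selecting the November to March analysis window and discarding the spin-up can be sketched in Python as below. The sketch assumes 3-hourly output stored with time as the first array axis; the function name and the season-labelling convention are illustrative assumptions.

```python
import numpy as np


def summer_means(times, values, analysis_start="2015-11-01T00"):
    """Average a time series over each November-March season.

    times: datetime64 values; values: array with time as the first axis.
    Samples before `analysis_start` (the spin-up) are discarded. Seasons are
    labelled by their start year, e.g. '2015/16' for Nov 2015 - Mar 2016.
    """
    times = np.asarray(times, dtype="datetime64[h]")
    values = np.asarray(values, dtype=float)
    month = times.astype("datetime64[M]").astype(int) % 12 + 1
    year = times.astype("datetime64[Y]").astype(int) + 1970
    keep = (times >= np.datetime64(analysis_start)) & ((month >= 11) | (month <= 3))
    # Nov-Dec belong to the season starting that year, Jan-Mar to the previous one.
    season_start = np.where(month >= 11, year, year - 1)
    means = {}
    for start in np.unique(season_start[keep]):
        start = int(start)
        sel = keep & (season_start == start)
        means[f"{start}/{(start + 1) % 100:02d}"] = values[sel].mean(axis=0)
    return means
```

The three seasonal means can then be averaged, for example with `np.mean(list(means.values()), axis=0)`, to form the multi-year ensemble mean.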

Observation based reference datasets
To enable an intercomparison of the modeling experiments and to demonstrate the representativeness of the model configuration, the key hydrometeorological variables must first be simulated appropriately. For this purpose, the simulated climate characteristics are evaluated against observation-based gridded products for temperature, precipitation, and evapotranspiration, hereafter referred to as reference data (REF).
The gridded products used for evaluation are the Climatic Research Unit (CRU) temperature dataset version 4 at 0.5° horizontal resolution (Harris et al., 2020); the Multi-Source Weighted-Ensemble Precipitation (MSWEP) version 2.8 at 0.1° resolution (Beck et al., 2019); and the land evapotranspiration dataset of the Global Land Evaporation Amsterdam Model (GLEAM) version 3.6 at 0.25° resolution (Martens et al., 2017). These datasets are used because they are observation-based and have been widely and successfully used to assess regional model performance, including over southern Africa (e.g., Arnault et al., 2021b; Zhang et al., 2024). For the model evaluation and the calculation of biases and correlations, the simulation results are bilinearly regridded onto the grid of the corresponding reference dataset and matched to its temporal resolution, allowing direct comparison.
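Bilinear regridding onto a reference grid can be sketched with SciPy's `RegularGridInterpolator`. The sketch assumes the model field is already on a regular latitude-longitude grid; WRF output on a map-projected (e.g. Lambert conformal) grid would instead need a regridder that handles curvilinear coordinates. The function name is hypothetical.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def regrid_bilinear(src_lat, src_lon, field, dst_lat, dst_lon):
    """Bilinearly interpolate a (lat, lon) field onto a target lat-lon grid.

    Assumes both grids are regular and the 1-D coordinates are strictly
    increasing. Target points outside the source grid are set to NaN.
    """
    interpolator = RegularGridInterpolator(
        (src_lat, src_lon), field, method="linear",
        bounds_error=False, fill_value=np.nan,
    )
    lat2d, lon2d = np.meshgrid(dst_lat, dst_lon, indexing="ij")
    points = np.column_stack([lat2d.ravel(), lon2d.ravel()])
    return interpolator(points).reshape(lat2d.shape)
```

Bilinear interpolation reproduces fields that vary linearly in latitude and longitude exactly, which makes it a convenient sanity check for the regridding step.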

Joint atmospheric-terrestrial water balance
The water cycle at the regional scale can be characterized by the joint atmospheric-terrestrial water balance. The atmospheric water budget is based on the balance equation for atmospheric water:

$$\frac{\partial W}{\partial t} = -\nabla\cdot\vec{Q} + ET - P + Res \tag{1}$$

where P and ET are the precipitation and evapotranspiration rates, respectively, W is the atmospheric water storage, and $-\nabla\cdot\vec{Q}$ denotes the vertically integrated water convergence. W and $\vec{Q}$ are calculated as

$$W = \frac{1}{g}\int_{p_t}^{p_s} q\,dp \tag{2}$$

$$\vec{Q} = \frac{1}{g}\int_{p_t}^{p_s} q\,\vec{V}\,dp, \qquad \vec{V} = (u, v) \tag{3}$$

with pressure (p), integrated from the surface pressure ($p_s$) to the model-top pressure ($p_t$), horizontal wind components (u, v), specific humidity (q), and gravitational acceleration (g). The residual term Res in Eq. (1) represents the non-closure of the water balance in the modeling system. Such residuals are systematically present in numerical weather modeling, largely owing to the interpolation and integration of atmospheric data on pressure levels, numerical errors of the model discretization, the 3-hourly sampling frequency, and imperfections in the representation of atmospheric processes (e.g., Kurkute et al., 2020; Roberts and Snelgrove, 2015).
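The atmospheric budget terms can be evaluated on model output with a short Python sketch like the one below. It assumes pressure-level fields ordered from the surface upward, trapezoidal vertical integration, and a uniform Cartesian grid spacing with map factors ignored; the function names are hypothetical, and the diagnostics in this study may use a different discretization.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m s-2


def vertical_integral(p, field):
    """Trapezoidal (1/g) * integral of `field` over pressure.

    p: pressure levels in Pa, ordered from the surface upward (decreasing);
    field: array with the vertical dimension as axis 0.
    """
    p = np.asarray(p, dtype=float)
    field = np.asarray(field, dtype=float)
    dp = (p[:-1] - p[1:]).reshape((-1,) + (1,) * (field.ndim - 1))
    return np.sum(0.5 * (field[:-1] + field[1:]) * dp, axis=0) / G


def moisture_budget_terms(p, q, u, v, dx, dy):
    """Return W (kg m-2), the two components of Q (kg m-1 s-1) and div Q.

    q, u, v have shape (level, y, x); dx, dy are grid spacings in m. Map
    factors of the model projection are ignored in this sketch.
    """
    w = vertical_integral(p, q)
    qu = vertical_integral(p, q * u)
    qv = vertical_integral(p, q * v)
    div_q = np.gradient(qu, dx, axis=-1) + np.gradient(qv, dy, axis=-2)
    return w, qu, qv, div_q


def budget_residual(w_start, w_end, dt, div_q, et, precip):
    """Res from Eq. (1): dW/dt + div Q - ET + P (fluxes in kg m-2 s-1)."""
    return (w_end - w_start) / dt + div_q - et + precip
```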
The terrestrial water budget is expressed by the balance equation

$$\frac{\partial TWS}{\partial t} = P - ET - R \tag{4}$$

where the terrestrial water storage (TWS) is the sum of soil water, surface ponded water, and groundwater, and R is the simulated runoff.
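The terrestrial storage and the closure of its budget can be sketched in Python, assuming volumetric soil moisture per layer for the four layers of 0–10, 10–40, 40–100, and 100–200 cm, and ponded water and groundwater already expressed in mm. Variable and function names are hypothetical.

```python
import numpy as np

# Soil layer thicknesses of the four Noah-MP/WRF-Hydro layers (0-10, 10-40,
# 40-100 and 100-200 cm), in metres.
LAYER_THICKNESS_M = np.array([0.1, 0.3, 0.6, 1.0])


def terrestrial_water_storage(soil_moisture, ponded_mm, groundwater_mm):
    """TWS in mm: soil water of the 2 m column plus ponded water and groundwater.

    soil_moisture: volumetric water content (m3 m-3) with the layer as axis 0.
    """
    soil_moisture = np.asarray(soil_moisture, dtype=float)
    soil_mm = np.tensordot(LAYER_THICKNESS_M, soil_moisture, axes=(0, 0)) * 1000.0
    return soil_mm + ponded_mm + groundwater_mm


def terrestrial_residual(tws_start, tws_end, precip_mm, et_mm, runoff_mm):
    """Non-closure of dTWS/dt = P - ET - R over one interval (all in mm)."""
    return (tws_end - tws_start) - (precip_mm - et_mm - runoff_mm)
```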

Model performance
The results of the control simulation with the coupled WRF-Hydro model and the default FAO soil data are evaluated against the observation-based reference data. Fig. 3 compares the default simulation (DEF) with the reference data (REF) for summer 2 m air temperature, precipitation, and evapotranspiration. The simulated variables are displayed at the 4 km model resolution, showcasing the added value of high-resolution dynamical downscaling. The coupled model captures the spatial variability of these variables well. It clearly represents the temperature gradients from the central Highveld to the Drakensberg Mountains and the east coast, as well as the east-west gradients in precipitation and evapotranspiration over land.
After interpolating the simulation results onto the reference grids, the spatial correlations between the default WRF-Hydro simulation and the references are 0.94 for temperature, 0.89 for precipitation, and 0.85 for evapotranspiration, all significant (p < 0.05). The scatter plots in Fig. 3 show small biases for air temperature (Fig. 3c), mostly between −2.6 °C and 0.9 °C, with an overall cold bias of −1.03 °C. Precipitation is slightly overestimated in regions with high precipitation (Fig. 3f), leading to an overall overestimation of about 0.46 mm day⁻¹ during summer. Simulated evapotranspiration agrees well with the reference data (Fig. 3i), with an overall bias of −0.097 mm day⁻¹ during summer.
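The summary statistics above, mean bias and spatial correlation with its p-value, can be computed on co-located fields with a Python sketch like the following. It uses `scipy.stats.pearsonr`, treats each valid grid cell as one sample, and applies no area weighting; the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def evaluate(model, reference):
    """Mean bias and spatial Pearson correlation of two co-located fields.

    NaNs (e.g. ocean or missing cells) are excluded. The bias is an
    unweighted grid-cell mean; area weighting (e.g. cos(latitude) on a
    lat-lon grid) is a possible refinement not applied here.
    """
    model = np.asarray(model, dtype=float).ravel()
    reference = np.asarray(reference, dtype=float).ravel()
    valid = np.isfinite(model) & np.isfinite(reference)
    m, r = model[valid], reference[valid]
    bias = float(np.mean(m - r))
    corr, p_value = stats.pearsonr(m, r)
    return bias, float(corr), float(p_value)
```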
The simulated precipitation is further evaluated against the MSWEP reference data at the subdaily scale to validate the spatial representation of simulated convection. The 3-hourly model outputs and reference data are accumulated to 6-hourly intervals to reduce the number of intervals without precipitation. Fig. 4 presents maps of the correlation coefficients between the model and reference data, together with the corresponding histograms. The simulated subdaily precipitation is highly and significantly (p < 0.05) correlated with the reference data over the vast majority of the domain, except for very dry regions such as the Namib Desert (Fig. 4a). Regions with abundant and stable summer precipitation, such as the Drakensberg Mountains, the Highveld, and the southern coastal areas, show high correlation coefficients (r > 0.5). The histograms of subdaily precipitation are also comparable, indicating similar frequency distributions of 6-hourly accumulated precipitation (Fig. 4b). In terms of precipitation intensity, the model generally simulates more moderate precipitation events (> 1 mm (6 h)⁻¹) relative to light events (< 1 mm (6 h)⁻¹). This indicates that the slight overall wet bias in summer precipitation is associated with an overestimation of moderate-intensity precipitation. Overall, the evaluation suggests that the coupled WRF-Hydro model successfully reproduces the key surface water and energy variables over the study region, which supports its use to examine the impact of alternative soil datasets.

Table 2. Soil hydrophysical parameters prescribed in the model lookup table for each soil texture class: porosity (θs, m³ m⁻³), field capacity (θf), wilting point (θwp), saturated matric potential (Ψs), saturated hydraulic conductivity (Ks), and slope of the retention curve fit (b).
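The 6-hourly aggregation and the grid-cell correlation maps can be sketched in Python as follows. The sketch assumes model and MSWEP fields are already on a common grid, aligned in time, and stored as 3-hourly accumulations with time as the first axis; the function names are hypothetical.

```python
import numpy as np


def to_6hourly(precip_3h: np.ndarray) -> np.ndarray:
    """Sum consecutive pairs of 3-hourly accumulations (time is axis 0).

    A trailing unpaired time step is dropped.
    """
    nt = precip_3h.shape[0] - precip_3h.shape[0] % 2
    pairs = precip_3h[:nt].reshape(nt // 2, 2, *precip_3h.shape[1:])
    return pairs.sum(axis=1)


def temporal_correlation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pearson correlation along time (axis 0) for every grid cell.

    Cells where either series is constant (e.g. no rain at all) get NaN.
    """
    a_anom = a - a.mean(axis=0)
    b_anom = b - b.mean(axis=0)
    numerator = (a_anom * b_anom).sum(axis=0)
    denominator = np.sqrt((a_anom ** 2).sum(axis=0) * (b_anom ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denominator > 0, numerator / denominator, np.nan)
```

Returning NaN for constant series mirrors the behavior expected in hyper-arid cells, where a correlation coefficient is undefined.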

Impact on land-atmosphere interfaces
Changes in land surface conditions are the primary driver of atmospheric changes in this study. Upward moisture and heat fluxes at the surface alter atmospheric thermodynamics through land-atmosphere feedbacks. Fig. 5 therefore illustrates the spatial patterns of surface latent and sensible heat fluxes and surface soil moisture. The heat fluxes and soil moisture exhibit a pronounced east-west contrast within the study region (Fig. 5a-c), with the highest latent heat flux (~150 W m⁻²) and soil moisture in the forested regions of the Drakensberg Mountains in the east. These heat flux patterns are consistent with the precipitation distribution, suggesting a predominantly water-limited regime in the study region.
When the soil dataset is replaced, surface energy and moisture conditions change consistently with the altered soil type and associated hydrophysical parameters. In the central and western drylands, differences in surface soil moisture (right column of Fig. 5) are closely related to variations in wilting point (middle column of Fig. 2): lower wilting points correspond to drier surface soils, and higher wilting points to wetter soils. Regions with lower soil moisture generally show lower latent heat fluxes and higher sensible heat fluxes. In contrast, higher soil moisture in the western drylands of South Africa does not significantly affect the latent heat flux, because soil moisture there is extremely low and close to the wilting point, leaving little water available for evapotranspiration. Consequently, differences in soil properties only slightly modify the sensible heat flux there. In the eastern mountainous areas with moderate precipitation, differences in soil moisture are more closely related to variations in soil porosity. There, relatively drier soils tend to coincide with higher latent heat and lower sensible heat fluxes (Fig. 5), owing to the reduced water-holding capacity associated with lower porosity (Fig. 2). Impacts on other variables at the land-atmosphere interface, such as soil and air temperature and air humidity, were analyzed in Zhang et al. (2023).
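Why soil moisture changes close to the wilting point barely affect latent heat can be illustrated with a generic linear soil-moisture stress factor that scales evapotranspiration between the wilting point and field capacity. This formulation is assumed here purely for illustration; Noah-MP provides its own selectable soil-moisture factors, which may differ, and the parameter values are hypothetical.

```python
import numpy as np


def soil_moisture_factor(theta, theta_wp, theta_fc):
    """Linear evapotranspiration stress factor in [0, 1].

    0 at or below the wilting point, 1 at or above field capacity.
    """
    theta = np.asarray(theta, dtype=float)
    beta = (theta - theta_wp) / (theta_fc - theta_wp)
    return np.clip(beta, 0.0, 1.0)


# Hypothetical dryland cell: wilting point 0.10, field capacity 0.30 (m3 m-3).
# Raising soil moisture from 0.05 to 0.08 leaves the factor at zero,
# whereas the same +0.03 change in a wetter soil raises it by 0.15.
dry = soil_moisture_factor([0.05, 0.08], 0.10, 0.30)
wet = soil_moisture_factor([0.20, 0.23], 0.10, 0.30)
```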
Note that the flux differences in Fig. 5 are averages over the entire summer, in which larger daytime differences are diluted by smaller nighttime differences. During the daytime, differences in turbulent fluxes could be much larger, possibly up to twice these values, as suggested by Dennis and Berbery (2022) and Lee et al. (2023). Dashed areas in Fig. 5 denote differences that are significant at the 95 % confidence level (p < 0.05). In general, the impacts of the alternative global soil data on surface soil moisture and latent heat flux are significant across almost the entire domain, and those on sensible heat flux are significant in some areas.
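The text does not state which test underlies the 95 % significance masks. One common choice, assumed here purely for illustration, is a paired Student's t-test on daily values at each grid cell, sketched with `scipy.stats.ttest_rel`; the function name is hypothetical, and serial correlation of daily values is ignored, which can overstate significance.

```python
import numpy as np
from scipy import stats


def significance_mask(experiment, control, alpha=0.05):
    """Mean difference and a per-cell significance mask.

    experiment, control: daily fields with shape (day, y, x), paired by day.
    Returns (experiment - control) averaged over days and a boolean mask
    that is True where the paired t-test gives p < alpha.
    """
    experiment = np.asarray(experiment, dtype=float)
    control = np.asarray(control, dtype=float)
    mean_diff = (experiment - control).mean(axis=0)
    _, p_value = stats.ttest_rel(experiment, control, axis=0)
    return mean_diff, p_value < alpha
```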
Comparing the simulation results, disparities exist in overall values as well as in spatial and temporal distributions. The histograms of the differences, counted over grid cells and days, are displayed in the right column of Fig. 6. Compared to WRFH-DEF, WRFH-HWSD and WRFH-GSDE exhibit distributions skewed towards lower values of latent heat flux (Fig. 6a) and soil moisture (Fig. 6c). Notably, 99 % of the differences in latent heat flux fall within the range of −30 to 10 W m⁻², with the majority (86 % to 94 %) concentrated between −10 and 5 W m⁻². Spatial differences for WRFH-SGD are generally smaller, displaying a relatively tight distribution. For the daily statistics, the spatially averaged soil moisture from all three alternative soil datasets is lower than that of WRFH-DEF throughout the entire summer (Fig. 6d). Averaged latent heat fluxes from WRFH-HWSD and WRFH-GSDE are also consistently lower on every day of the summer. These histogram statistics indicate that changes in soil data directly impact moisture and energy exchanges at the land-atmosphere interface, both spatially and temporally.
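The percentages quoted above (the share of differences within −30 to 10 W m⁻² and within −10 to 5 W m⁻²) amount to counting the pooled grid-cell and daily differences that fall inside a range. A minimal sketch, assuming the differences have already been flattened into one list; the function name and the sample values are made up for illustration.

```python
def fraction_within(diffs, lo, hi):
    """Share of pooled (grid cell x day) differences d with lo <= d <= hi."""
    if not diffs:
        raise ValueError("no differences supplied")
    return sum(1 for d in diffs if lo <= d <= hi) / len(diffs)


# Made-up latent heat flux differences (perturbed minus default, W m-2).
diffs = [-35.0, -25.0, -12.0, -8.0, -3.0, -1.0, 0.0, 2.0, 4.0, 8.0]
print(f"within -30..10: {fraction_within(diffs, -30, 10):.0%}")
print(f"within -10..5:  {fraction_within(diffs, -10, 5):.0%}")
```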

Atmospheric thermodynamic response to surface changes
Fig. 7 presents the changes in atmospheric instability induced by the altered land surface conditions. The diagnostic indicators included here are convective available potential energy (CAPE), convective inhibition (CIN), and planetary boundary layer height (PBLH). CAPE and CIN characterize the degree of local instability, with higher CAPE and lower CIN indicating environments more conducive to moist convection. Changes in CAPE and CIN depend on the vertical distribution of temperature and humidity in the lower atmosphere and are also related to the PBL structure.
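For reference, CAPE and CIN are vertical integrals of parcel buoyancy, b = g (T_v,parcel − T_v,env) / T_v,env. CAPE sums the positively buoyant area above the level of free convection (LFC), and CIN sums the negatively buoyant area between the surface and the LFC. The sketch below shows only these integral definitions. It assumes the parcel's virtual temperature profile is given, rather than lifting a parcel moist-adiabatically as WRF's diagnostics do, treats the first positively buoyant layer as the LFC, and reports CIN as a positive magnitude, so a "reduction in CIN" means weaker inhibition. The profile values are hypothetical.

```python
G = 9.81  # gravitational acceleration (m s-2)


def cape_cin(z, tv_parcel, tv_env):
    """Return (CAPE, CIN) in J kg-1 from profiles on ascending heights z (m).

    Buoyancy b = g * (Tv_parcel - Tv_env) / Tv_env is integrated layer by layer
    with the trapezoidal rule, splitting a layer where b changes sign. Positive
    area counts toward CAPE; negative area below the first positive area (a crude
    LFC) counts toward CIN. Negative area above the LFC is ignored.
    """
    b = [G * (tp - te) / te for tp, te in zip(tv_parcel, tv_env)]
    cape = cin = 0.0
    above_lfc = False
    for i in range(len(z) - 1):
        z0, z1, b0, b1 = z[i], z[i + 1], b[i], b[i + 1]
        if b0 * b1 < 0:  # zero crossing inside the layer (linear interpolation)
            zc = z0 + (z1 - z0) * b0 / (b0 - b1)
            parts = [(z0, zc, b0, 0.0), (zc, z1, 0.0, b1)]
        else:
            parts = [(z0, z1, b0, b1)]
        for za, zb, ba, bb in parts:
            area = 0.5 * (ba + bb) * (zb - za)
            if area > 0:
                cape += area
                above_lfc = True
            elif area < 0 and not above_lfc:
                cin -= area
    return cape, cin


# Idealized profile: parcel 1 K colder than the environment up to 1 km, 1 K warmer from 2 km.
z = [0.0, 1000.0, 2000.0, 3000.0]
tv_env = [300.0, 300.0, 300.0, 300.0]
tv_parcel = [299.0, 299.0, 301.0, 301.0]
cape, cin = cape_cin(z, tv_parcel, tv_env)
print(f"CAPE = {cape:.1f} J kg-1, CIN = {cin:.1f} J kg-1")
```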
Differences in moisture and energy at the land-atmosphere interface (Fig. 5) directly influence atmospheric instability (Fig. 7). The spatial difference maps of summer-averaged CAPE are highly correlated with the disparities in surface latent and sensible heat fluxes, with CAPE differences of −50 to −120 J kg⁻¹ over the central region. However, the changes in atmospheric instability are secondary effects of the changes in soil properties and are therefore less statistically significant than those in the surface fluxes (Fig. 5). The spatial pattern of CIN differences is generally consistent with that of CAPE but shows additional variations. WRFH-HWSD, WRFH-GSDE and WRFH-SGD all simulate higher sensible heat flux than WRFH-DEF over the central flat area (Fig. 5), i.e., the Kalahari Desert, so the overall reduction in CIN is pronounced (about −30 to −40 J kg⁻¹) in this region. The drier and warmer surface conditions resulting from the changed soil properties reduce CAPE in the interior region, creating conditions unfavorable for convection. Additionally, the enhanced sensible heat flux deepens the mixed layer, as seen in the increase in PBLH (Fig. 7). This, in turn, raises the lifting condensation level (LCL), narrowing the distance between the LCL and the level of free convection and ultimately reducing CIN. In the western part of the domain, the change in CAPE is quite small, but CIN decreases considerably along the coastal Namib Desert and increases over the nearby mountains (Fig. 1a).
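The chain from a warmer, drier surface to a higher LCL can be made concrete with two textbook approximations, used here as assumptions rather than the model's own diagnostic: the Magnus formula for dewpoint (coefficients 17.625 and 243.04 °C) and Espy's rule that the LCL rises roughly 125 m per kelvin of dewpoint depression. The near-surface temperatures and humidities below are hypothetical.

```python
from math import log

MAGNUS_B = 17.625     # Magnus coefficient (dimensionless)
MAGNUS_C = 243.04     # Magnus coefficient (deg C)
ESPY_M_PER_K = 125.0  # approximate LCL height gain per K of dewpoint depression


def dewpoint_c(temp_c, rh_percent):
    """Dewpoint (deg C) from air temperature and relative humidity via the Magnus formula."""
    gamma = log(rh_percent / 100.0) + MAGNUS_B * temp_c / (MAGNUS_C + temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def lcl_height_m(temp_c, rh_percent):
    """Approximate LCL height above the surface (m) with Espy's rule."""
    return ESPY_M_PER_K * (temp_c - dewpoint_c(temp_c, rh_percent))


# Hypothetical near-surface air: reference versus slightly warmer and drier.
for label, t, rh in [("reference", 30.0, 30.0), ("warmer/drier", 31.0, 25.0)]:
    print(f"{label}: LCL ~ {lcl_height_m(t, rh):.0f} m")
```

A larger dewpoint depression lifts the LCL; if the LFC changes less, the layer of negative buoyancy between them shrinks, which is the CIN reduction described above.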
The eastern region, encompassing the Highveld and the Drakensberg Mountains, experiences varying changes in PBL depth attributable to changes in soil hydrophysical properties and surface conditions. Across most of this region, WRFH-HWSD and WRFH-GSDE exhibit more energy available for convection than WRFH-DEF, with CAPE increased by up to 20 J kg⁻¹. Meanwhile, the cooler surface also increases CIN. Such increases in CAPE and CIN indicate more energy for sustaining convection, but energy that is more difficult to access. The eastern region has relatively abundant moisture sources and precipitation, so these competing effects on convective development are likely to alter precipitation processes and the regional water budget. In the arid western region, although the atmospheric thermodynamic changes are significant, changes in the water budget may be less pronounced owing to low precipitation.

Joint atmospheric and terrestrial water budgets
The atmospheric water budget terms, precipitation (P), atmospheric moisture convergence (CONV) and atmospheric water storage (AWS), are calculated according to Eqs. (2) and (3), and their differences are displayed in Fig. 8. Owing to the varying moisture conditions across the region, differences in the atmospheric moisture budget variables P and CONV also exhibit east-west and north-south gradients. In the north-central region of the domain, where the soil property changes produce drier soils and higher sensible heat flux (Fig. 5), the atmospheric water budget primarily shows an increase in CONV, while AWS increases slightly across the region. Changes in P are somewhat scattered over this area (Fig. 8). Around the area of decreased CONV, AWS mainly decreases, reflecting moisture divergence. Over the wetter eastern region, spatial differences in P are more consistent with those in CONV, as CONV is one of the important factors influencing convective precipitation.
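Eqs. (2) and (3) are not reproduced in this section. Assuming the standard column balances, the atmospheric budget reads ΔAWS = CONV + ET − P and the terrestrial budget ΔTWS = P − ET − R (R: runoff), all as regional means in mm day⁻¹; adding the two, P and ET cancel, so ΔAWS + ΔTWS = CONV − R. The sketch below encodes these assumed forms with a hypothetical container class and made-up numbers; the paper's own sign conventions may differ.

```python
from dataclasses import dataclass


@dataclass
class RegionalBudget:
    """Regional-mean water fluxes in mm day-1 (hypothetical container)."""
    p: float       # precipitation
    et: float      # evapotranspiration
    conv: float    # vertically integrated moisture flux convergence
    runoff: float  # surface plus subsurface runoff leaving the region

    def d_aws(self):
        """Atmospheric storage change: convergence plus ET minus precipitation."""
        return self.conv + self.et - self.p

    def d_tws(self):
        """Terrestrial storage change: precipitation minus ET minus runoff."""
        return self.p - self.et - self.runoff


default = RegionalBudget(p=3.0, et=2.0, conv=1.5, runoff=0.4)
perturbed = RegionalBudget(p=3.0, et=1.7, conv=1.6, runoff=0.6)

for name, run in [("default", default), ("perturbed", perturbed)]:
    joint = run.d_aws() + run.d_tws()  # P and ET cancel in the joint budget
    print(f"{name}: dAWS={run.d_aws():.2f}, dTWS={run.d_tws():.2f}, joint={joint:.2f}")
```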
Spatial differences in the terrestrial water compartments (Fig. 9) show a strong signal of the changes in soil hydrophysical parameters. In general, evapotranspiration (ET) and terrestrial water storage (TWS) are markedly reduced over the central area owing to the drier conditions analyzed in Section 3.2, primarily driven by differences in simulated soil moisture. Differences in runoff are predominantly found over the eastern area and the central interior. In the eastern mountainous region, the differences are mixed, reflecting the complex influence of infiltration excess and overland flow, which depend on precipitation amount and intensity. In the central interior, runoff differences are directly related to changes in TWS and ET. Overall, the spatial distribution of runoff changes is generally inverse to that of the changes in TWS and ET: changes in soil properties that decrease the water-holding capacity (reduced wilting point and porosity) generate more runoff, with lower soil water storage and reduced ET. It should also be noted that WRF-Hydro routes surface runoff and subsurface water on a 400-m subgrid, so the topographic gradient could further influence the spatial pattern of generated runoff in the model experiments. Nonetheless, as evident in Fig. 9, the spatial distribution of runoff is substantially influenced by the soil hydrophysical characterization. Compared to the atmospheric water budgets (Fig. 8), changes in the terrestrial water compartments are more pronounced, with statistically significant differences at the 95 % level over most regions. For terrestrial variables, particularly TWS, even areas with minor spatial differences are statistically significant, suggesting differences in their temporal evolution.
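The inverse relation between runoff and TWS/ET can be illustrated with a deliberately simple single-layer bucket, which is neither the Noah-MP multilayer soil nor WRF-Hydro routing. All of the following are hypothetical assumptions: storage capacity equals porosity times layer depth, water above capacity leaves as saturation-excess runoff (not the infiltration-excess process mentioned above), and ET removes a fixed fraction of storage each day.

```python
def bucket_run(rain_mm, porosity, depth_mm=100.0, s0_mm=30.0, et_fraction=0.1):
    """Toy single-layer bucket; returns (total runoff, total ET, final storage) in mm."""
    capacity = porosity * depth_mm  # assumed storage capacity
    storage = min(s0_mm, capacity)
    runoff_total = et_total = 0.0
    for rain in rain_mm:
        storage += rain
        runoff = max(0.0, storage - capacity)  # saturation-excess runoff
        storage -= runoff
        et = et_fraction * storage             # storage-limited ET
        storage -= et
        runoff_total += runoff
        et_total += et
    return runoff_total, et_total, storage


rain = [20.0, 0.0, 15.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0, 10.0]  # hypothetical daily rain (mm)
for porosity in (0.45, 0.35):
    q, et, s = bucket_run(rain, porosity)
    print(f"porosity {porosity:.2f}: runoff {q:.1f} mm, ET {et:.1f} mm, storage {s:.1f} mm")
```

Because the bucket closes its water balance exactly, the extra runoff produced by the lower-porosity soil equals its combined deficit in ET and storage.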
Therefore, the monthly variations of the joint regional water budgets are investigated further. Given the wide diversity of moisture conditions between the eastern and western parts, the domain is divided into two subregions following the IPCC climate reference regions for climate model data (Iturbide et al., 2020): the Eastern southern Africa region (ESAF) and the West southern Africa region (WSAF), separated at 25°E (illustrated in Figs. 3d and 3g). The water budget values for all ensemble members and their changes are shown in Figs. 10 and 11 for the two subregions. All ensemble members exhibit consistent monthly variations in the water budget terms, because they are downscaled from identical atmospheric forcing. In the wetter ESAF region, differences in P and CONV among the ensemble members are relatively large, with monthly values varying by 0.1 to 0.4 mm day⁻¹. This corresponds to differences of around 5 % in P and 15 % in CONV (Fig. 10). In the drier WSAF region, where P is low, relative changes in CONV are considerably larger (10 to 50 %), while P again changes within 5 % (Fig. 11). Changes in the terrestrial water budgets are evident in both subregions. ET changes in ESAF and WSAF reach up to 0.3 mm day⁻¹ and 0.5 mm day⁻¹, respectively, corresponding to relative changes of up to 10 % and 30 %. Changes in runoff are clearly visible in both subregions, especially during the later summer months from January to March. This is mainly attributed to the differences in overall soil water storage produced by the different soils, which appear as clear changes in ΔTWS during the first two summer months.
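The subregional statistics amount to splitting grid cells at 25°E, averaging, and expressing the perturbed-minus-default change relative to the default. A minimal sketch under stated assumptions: cells are equally weighted (reasonable on an equally spaced 4-km projected grid, ignoring map-factor variations), the actual IPCC reference-region polygons are replaced by the simple longitude split described above, and all function names and numbers are hypothetical.

```python
def regional_means(lons, values, split_lon=25.0):
    """Equal-weight means of values west and east of split_lon (degrees east)."""
    west = [v for lon, v in zip(lons, values) if lon < split_lon]
    east = [v for lon, v in zip(lons, values) if lon >= split_lon]
    return sum(west) / len(west), sum(east) / len(east)


def relative_change(perturbed, default):
    """Percent change of a perturbed value relative to the default value."""
    return 100.0 * (perturbed - default) / default


# Hypothetical summer-mean ET (mm day-1) at six grid cells.
lons = [18.0, 20.0, 24.0, 26.0, 30.0, 32.0]
et_default = [0.8, 1.0, 1.2, 2.8, 3.0, 3.2]
et_perturbed = [0.5, 0.7, 0.9, 2.5, 2.7, 2.9]

w_def, e_def = regional_means(lons, et_default)
w_pert, e_pert = regional_means(lons, et_perturbed)
west_change = relative_change(w_pert, w_def)
east_change = relative_change(e_pert, e_def)
print(f"west: {west_change:.0f} %, east: {east_change:.0f} %")
```

Here the same absolute change of 0.3 mm day⁻¹ gives −30 % in the drier western mean but only −10 % in the wetter eastern one; small denominators are one likely reason why relative changes appear larger in WSAF.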

Discussion and conclusions
This study explored the influence of soil hydrophysical properties on land-atmosphere coupling and on joint atmospheric-terrestrial water budgets using coupled regional land-atmosphere modeling. Given the large disparities among the soil datasets used in modeling practice, we examined their impact on modeling uncertainty using four of the most widely used global soil datasets, namely FAO, HWSD, GSDE and SoilGrids. We performed convection-permitting, fully coupled WRF-Hydro ensemble simulations over southern Africa for the austral summer months from 2015 to 2018. The spatial patterns of soil hydrophysical properties were altered by varying the soil dataset and the lookup-table assignment of soil hydrophysical parameters, thereby modulating the regional water cycle through land-atmosphere feedback.
Using these global soil datasets entails substantial variations in soil type distribution and soil hydrophysical parameters across southern Africa. Compared to the default FAO soil data in the coupled modeling, significant alterations in soil water and in the partitioning of turbulent fluxes are found due to the changes in soil hydrophysical properties. Going beyond the findings of Zhang et al. (2023), the present investigation demonstrates that these modifications in near-surface water and energy distribution result in differences in atmospheric thermodynamic instability during the austral summer. Specifically, over the central interior of the study region, shifts in soil types led to broad changes in soil hydrophysical parameters, i.e., a reduction in soil water-holding capacity. The resulting significant decrease in soil moisture and upward latent heat flux contributed to a statistically significant drop in averaged CAPE and an overall reduction in CIN. These simulated changes indicate that less energy is available to sustain convection in the regional environment, while that energy is more readily accessible. These results are broadly in line with sensitivity experiments conducted by Dennis and Berbery (2022) over the Mideast United States and by He et al. (2016) over eastern China. In our study, local precipitation did not consistently correlate with atmospheric thermodynamic instability because convection rarely occurs in the arid interior region.
Changes in surface conditions related to soil hydrophysical parameters were found to dynamically affect both the atmospheric and terrestrial water budgets. These effects were especially pronounced in terrestrial processes, where variations in average soil moisture directly influenced evapotranspiration, runoff generation, and changes in terrestrial water storage. These impacts exhibited considerable spatial variability, with changes exceeding 1 mm day⁻¹ in several areas. In contrast, the influence of soil conditions on the atmospheric water budgets was noticeable but less significant. In the arid interior region, reduced soil moisture appeared to enhance atmospheric moisture convergence, which favors increased precipitation. Such a potential negative soil moisture-atmosphere feedback has also been reported in previous modeling studies for southern Africa (Cook et al., 2006; New et al., 2003; Yang et al., 2018), indicating that dry soil, in combination with surface heating, reduces local moisture recycling, which is however compensated by moisture advection and convergence. Nevertheless, our results did not consistently show a significant increase in precipitation, with only slight increases in some local patches. This could be attributed to the difficulty of triggering precipitation from the increased moisture convergence, given the general dryness of the region. Additionally, the soil moisture perturbations remain relatively small, and the prescribed regional model boundaries may also constrain the internal feedback mechanisms. Further extending fully coupled land surface-hydrology-atmosphere modeling can therefore contribute to the understanding and characterization of land-atmosphere feedbacks in southern Africa.
By using four commonly used global soil datasets, we identified uncertainties in how numerical weather and climate models represent regional water balances. The simulated water budgets displayed consistent seasonal variations but were highly sensitive to the choice of soil dataset. Uncertainties in the soil description had a more pronounced impact on terrestrial fields, with monthly ET changes on the order of 5 to 13 % in the wetter eastern subregion (ESAF) and 5 to 30 % in the arid western subregion (WSAF).
Acknowledging the critical role of soil hydrophysical properties in partitioning precipitation into surface runoff and infiltration, our findings indicate that soil type uncertainties lead to runoff differences of up to 90 % in arid subregions. For precipitation and atmospheric moisture convergence, the modeling uncertainties attributable to soil perturbation are usually within 5 % and 30 %, respectively. These soil data-related uncertainties have significant implications for current modeling practice.
Numerous studies have shown that land cover and large-scale vegetation changes substantially affect regional climate, including regional water cycles. In Africa, Glotfelty et al. (2021) found that implementing broad classes of land cover change in regional climate models resulted in precipitation rate changes mostly within 0.5 mm day⁻¹ in sub-Saharan Africa. Similarly, Smiatek and Kunstmann (2023) found that a potential reforestation band over the Sahel led to nonsignificant changes in wet-season precipitation (<0.5 mm day⁻¹) and runoff (<4 mm day⁻¹). These changes in regional water budgets lie within the range of variability induced by soil data uncertainties in our simulations. Therefore, our study, in accordance with other research on soil data perturbation (Dennis and Berbery, 2022; He et al., 2016; Lin and Cheng, 2016; Pedruzzi et al., 2022), suggests that soil data play an equally important or even more pronounced role in regulating the regional water cycle than land cover changes. Notably, the effects of certain large-scale vegetation changes highlighted in regional climate modeling, such as widespread deforestation worldwide (e.g., Arnault et al., 2023; Chen and Coauthors, 2019; Lee and Berbery, 2012; Zhang et al., 2024) and long-term vegetation restoration and afforestation in northern China (e.g., Yu et al., 2020; Wang et al., 2023), have been shown to be comparable in both magnitude and extent to the impacts of soil data changes presented in this study. These changes affect atmospheric thermodynamics and moisture flux convergence, as well as the terrestrial water budgets. Furthermore, observational studies indicate that the rate at which soil moisture increases in response to precipitation, and the occurrence of subsurface water flow, are highly dependent on the type of vegetation cover (Cheng et al., 2020; Tian et al., 2019). Therefore, it is crucial to fully consider changes in land surface boundary conditions in future modeling practice. Emphasizing the compounding influences of soil type and hydrophysical properties, as well as of vegetation cover and its parameters, will enhance the accuracy and reliability of current climate and water cycle modeling.
Lastly, it is important to highlight that the uncertainty in soil representation is just one part of the broader landscape of uncertainties inherent in current Earth system and climate modeling strategies. Soil represents only one fraction of the boundary conditions that contribute to the variance of model simulations. Many other sources of uncertainty arise from structural considerations (Howland et al., 2022; Karypidou et al., 2023; Zheng et al., 2021), such as the representation of unresolved physical processes and the design of the model itself. Moreover, internal variability, arising from stochasticity due to nonlinear dynamic processes inherent to the atmosphere, introduces further uncertainty into the modeling process (Bassett et al., 2020; Marthews et al., 2020; Palmer, 2001; Quenum et al., 2022). Nevertheless, refined soil mapping and accurate representation of soil properties are crucial for enhancing the predictive capacity of current modeling systems. By comprehensively considering the complexity of these uncertainties, we can better understand and predict the intricate dynamics of the Earth's climate system.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. (a) Topography map and (b) MODIS land use and land cover map of the WRF-Hydro model domain, encompassing the southern Africa region.

Fig. 2. Dominant soil types in WRF-Hydro at 4 km horizontal resolution according to (a) the FAO default dataset (DEF), with assigned hydrophysical parameters for (b) wilting point and (c) soil porosity. Soil types determined from the perturbed soil datasets, including (d) HWSD, (g) GSDE, and (j) SoilGrids (SGD), as well as difference maps of the assigned wilting point (e, h, k) and soil porosity (f, i, l) relative to the default soil data.

Fig. 3. Spatial distributions and scatter plots comparing simulated (a-c) 2-m air temperature, (d-f) precipitation, and (g-i) evapotranspiration during the summertime of 2015-2018 between the observation-based reference (REF) and WRF-Hydro with the FAO default soil dataset (DEF).

Fig. 4. Comparison of 6-hourly precipitation during the summertime of 2015-2018 between the observation-based reference data (REF) and WRF-Hydro with the FAO default soil dataset (DEF), presented as (a) a map of correlation coefficients and (b) a histogram of spatially averaged precipitation. Dashed areas in the correlation map indicate statistical significance at the 95 % confidence level.

Fig. 5. Spatial distributions of simulated surface (a) latent heat flux, (b) sensible heat flux, and (c) soil moisture for the WRFH-DEF experiment, and the differences between the soil-data-perturbed experiments WRFH-HWSD (d-f), WRFH-GSDE (g-i) and WRFH-SGD (j-l) and WRFH-DEF during the summertime of 2015-2018. Dashed areas in the difference maps indicate statistical significance at the 95 % confidence level.

Fig. 6. Histograms of (a) temporally averaged and (b) spatially averaged daily latent heat flux for the WRFH-DEF experiment (left column), along with the differences between each perturbed experiment and WRFH-DEF (right column) during the summertime of 2015-2018. (c-d) Same as (a-b) but for soil moisture.

Fig. 10. Boxplots depicting the variation in components of the monthly joint atmospheric-terrestrial water budgets for the Eastern southern Africa region (ESAF). Each panel displays the monthly budget values (upper) derived from all ensemble simulations and the changes (bottom) between the perturbed experiments WRFH-(HWSD, GSDE, SGD) and WRFH-DEF.

Fig. 11. Boxplots depicting the variation in components of the monthly joint atmospheric-terrestrial water budgets for the West southern Africa region (WSAF). Each panel displays the monthly budget values (upper) derived from all ensemble simulations and the changes (bottom) between the perturbed experiments WRFH-(HWSD, GSDE, SGD) and WRFH-DEF.

Table 1
Physical parameterization options of the coupled WRF-Hydro model adopted in the study.