Improving regional climate simulations based on a hybrid data assimilation and machine learning method

Abstract. The energy and water vapor exchange between the land surface and atmospheric boundary layer plays a critical role in regional climate simulations. This paper implemented a hybrid data assimilation and machine learning framework (DA-ML method) into the Weather Research and Forecasting (WRF) model to optimize surface soil and vegetation conditions. The hybrid method can integrate remotely sensed leaf area index (LAI), multi-source soil moisture (SM) observations, and land surface models (LSMs) to accurately describe regional climate and land–atmosphere interactions. The performance of the hybrid method on the regional climate was evaluated in the Heihe River basin (HRB), the second-largest endorheic river basin in Northwest China. The results show that the estimated sensible (H) and latent heat (LE) fluxes from the WRF (DA-ML) model agree well with the large aperture scintillometer (LAS) observations. Compared to the WRF (open loop – OL), the WRF (DA-ML) model improved the estimation of evapotranspiration (ET) and generated a spatial distribution consistent with the ML-based watershed ET (ETMap). The proposed WRF (DA-ML) method effectively reduces air warming and drying biases in simulations, particularly in the oasis region. The estimated air temperature and specific humidity from WRF (DA-ML) agree well with the observations. In addition, this method can simulate more realistic oasis–desert boundaries, including wetting and cooling effects and wind shield effects within the oasis. The oasis–desert interactions can transfer water vapor to the surrounding desert in the lower atmosphere. In contrast, the dry and hot air over the desert is transferred to the oasis from the upper atmosphere. The results show that the integration of LAI and SM will induce water vapor intensification and promote precipitation in the upstream region of the HRB, particularly on windward slopes.
In general, the proposed WRF (DA-ML) model can improve climate modeling by implementing detailed land characterization information in basins with complex underlying surfaces.


Introduction
Land-atmosphere interactions are an essential component of the hydrological cycle and a factor that influences climate change (Nelli et al., 2020; Zhou et al., 2022). Terrestrial components, such as soil and vegetation, play a crucial role in atmospheric processes, for example through changes in evapotranspiration (ET), which affect the water vapor content in the atmosphere (Gentine et al., 2019; Sawada et al., 2015; Wu et al., 2023). Soil and vegetation processes directly affect surface water vapor transport and energy circulation, particularly in arid vegetated areas (Erlandsen et al., 2017; Gao et al., 2008; R. Liu et al., 2018; X. Zhang et al., 2017; Zhang et al., 2019; Zhao et al., 2021). Such surface variability affects the available energy distribution at the land surface and has additional effects on the sensible and latent heat fluxes (H and LE), surface temperature, and water vapor (Sawada et al., 2015; Wen et al., 2012). Although land surface models (LSMs) have been improved incrementally in the past few decades, it is still challenging to effectively couple LSMs with atmospheric models to improve the description of land-atmosphere interactions (Chen and Dudhia, 2001; Liu et al., 2021). The success of the coupling depends not only on the sophisticated physical processes of the LSMs but also on soil and vegetation characteristics and the accurate characterization of the water vapor fluxes at the land-atmosphere interface (Gentine et al., 2019; Zhang et al., 2021b; Z. Zhang et al., 2022).

Published by Copernicus Publications on behalf of the European Geosciences Union.
The development of Earth observation technology has provided important opportunities to study land-atmosphere interactions using the data assimilation (DA) method (Liang et al., 2021). The Land Data Assimilation System (LDAS) has been widely developed and applied in recent years under various hydrological and vegetation conditions (Wu et al., 2022; Xia et al., 2019). It uses remotely sensed observations to constrain model physical processes and empirical parameters to improve water-energy-carbon flux simulations (Tian et al., 2022; Zhao and Yang, 2018). A series of studies have assimilated satellite-retrieved leaf area index (LAI), land surface temperature (LST), soil moisture (SM), and microwave brightness temperature observations into LSMs and improved simulations of ET, runoff, and gross primary productivity (GPP; Ahmad et al., 2022; X. He et al., 2020, 2021; Ling et al., 2019; Seo et al., 2021; Xie et al., 2017; Xu et al., 2019, 2021). In addition, DA can improve the initial conditions of regional climate models (RCMs) and enhance the capability of the models in the simulation of land-atmosphere interactions (Pan et al., 2017; Yi et al., 2021). Several studies have also shown that the assimilation of air pressure, air temperature, humidity, wind speed, and lightning observations into the Weather Research and Forecasting (WRF) model can improve the simulation of atmospheric state variables and the accuracy of weather prediction (Campo et al., 2009; Cazes Boezio and Ortelli, 2019; Comellas Prat et al., 2021; Grzeschik et al., 2008; Pilguj et al., 2019).
Machine learning (ML) algorithms have been increasingly applied in Earth and environmental modeling studies to predict land surface variables at various spatial and temporal scales (Jung et al., 2020; Reichstein et al., 2019; Xu et al., 2018). Compared to physical models, ML technology can efficiently and accurately establish nonlinear and complex relationships between diverse independent variables (Koppa et al., 2022; Nearing et al., 2018). Thus, ML-based approaches can create beneficial pathways for knowledge discovery in process models, based on extensive data (Moosavi et al., 2021; Reichstein et al., 2019). The main improvements are focused on model approximation, parameterization, bias correction, and hybrid modeling (Brajard et al., 2020; He et al., 2022; Jia et al., 2021; Xu et al., 2014; Zhao et al., 2019). Several studies have shown that the integration of the DA and ML methods can enhance the reliability of predictions and reduce simulation errors by including physical information in observed data (Brajard et al., 2020; Buizza et al., 2022; Forman and Xue, 2017; Gottwald and Reich, 2021; He et al., 2022). Forman and Xue (2017) integrated an ML model (as a measurement operator) into a DA system to improve the estimation of the snow water equivalent. Zhao et al. (2019) used a physics-constrained ML method to improve the LE estimates. He et al. (2022) proposed a hybrid model that can integrate remotely sensed LAI and multi-source SM observations to improve the estimation of ET within the coupled DA and ML framework. In general, the hybrid DA and ML approach has been used in land surface modeling, but it has received little attention in regional climate simulations.
The Heihe River basin (HRB) is a typical endorheic river basin in the arid and semi-arid regions of Northwest China (Li et al., 2013). The upstream mountain region is mainly covered by alpine meadow and Qinghai spruce, has a complex topography, and receives abundant precipitation. The midstream oasis of the HRB is mainly characterized by irrigated croplands, while the downstream oasis is characterized by riparian forests, and at the periphery of the oasis, there is vast desert (Xu et al., 2020). In the HRB, precipitation is the main water resource input in mountainous areas and determines the growth of vegetation in the oasis region, in addition to supporting urban and population development (Li et al., 2018, 2021). Precipitation, snow, and changes to permafrost in the upstream mountains can affect mountain runoff and SM, evaporation, and the groundwater table in the mid- and downstream oases. Strong land-atmosphere interactions in the HRB affect the water and energy exchange between the surface and atmosphere and influence the sustainability of the oasis (Gao et al., 2008; Pan et al., 2021b). The oasis-desert local circulation in the HRB can produce distinct microclimate features in oasis-desert areas, which include the cooling and wetting effect and wind shield effect of the oasis and the humidity inversion effect within the surrounding desert (Liu et al., 2020).
In recent decades, several comprehensive experiments have been implemented over the HRB to study land-atmosphere interactions, including the Heihe River basin field experiment (Hu et al., 1994), Watershed Allied Telemetry Experimental Research (WATER; Li et al., 2009), and Heihe Watershed Allied Telemetry Experimental Research (HiWATER; Li et al., 2013). In recent years, many mesoscale climate models and high-resolution computational fluid dynamics (CFD) models have been used to analyze the effects of land-atmosphere interactions on the regional climate (R. Liu et al., 2018; Liu et al., 2020; Xie et al., 2018; X. Zhang et al., 2017; Zhang et al., 2021a). X. Zhang et al. (2017) added an irrigation scheme to the WRF model and identified strong cooling and wetting effects on irrigated cropland in the midstream of the HRB. Liu et al. (2020) investigated the oasis-desert microclimate effects based on an improved CFD model and found that the oasis had a cold and wet island effect and wind shield effect. Zhang et al. (2021b) applied the WRF-Hydro model in the HRB and emphasized the role of lateral flow in the regional precipitation circulation. These studies illustrate that mesoscale climate models can be used as essential tools to better understand regional climate and land-atmosphere interactions in the HRB. However, the benefit of improving the representation of soil and vegetation processes for regional climate simulation via the coupled DA and ML framework has not been fully exploited, especially in basins with complex underlying surfaces. Therefore, this study aims to investigate the improvements that the hybrid DA and ML framework brings to the simulation of regional climate and land-atmosphere interactions in the HRB, based on the WRF model, and to further reveal the underlying physical mechanisms.
The goals of this study were to (1) couple the hybrid DA and ML (DA-ML) framework to the WRF model and improve the estimation of LAI and SM, (2) validate the H and LE with the large aperture scintillometer (LAS) observations and compare the ET estimates with the ML-based watershed ET, (3) investigate the performance of the air temperature and specific humidity estimates and evaluate the effects of the hybrid framework on near-surface atmospheric conditions in the mid- and downstream oasis regions, and (4) discuss the effects of the hybrid framework on wind speed and precipitation in the HRB.

Study area and dataset
The HRB (37.7-42.7° N, 97.1-102.0° E) is the second-largest endorheic river basin in Northwest China and has an area of approximately 143 000 km², and the elevation ranges from 800 to 5000 m (Fig. 1). The annual precipitation is approximately 400 mm, and it gradually decreases from the upstream region of the HRB (south) to the downstream region (north). Land cover types exhibit spatial zonation in the HRB. The upstream region is a typical mountainous environment, including extensive alpine meadows, a few Qinghai spruces, glaciers, snow, and permafrost. The midstream region is spatially composed of oasis-desert ecosystems, and irrigated cropland is the main component of the oasis in this area. The downstream region of the HRB is covered mainly by desert and riparian ecosystems (Populus euphratica and tamarisks). Water vapor transport in this region is predominantly controlled by midlatitude westerly and polar northerly winds (Pan et al., 2021b). As a result, precipitation over the HRB shows strong spatial variability, with more than 70 % occurring in the upstream mountains (L. Wang et al., 2018; X. Wang et al., 2018). Nine flux and meteorological stations selected from the Heihe integrated observatory network were used for comparison with the model results (Fig. 1). Among them, the Arou, Dashalong, and Hulugou stations in the upstream region of the HRB are covered by alpine meadows, while the Daman, wetland, and Huazhaizi stations in the midstream region are covered by irrigated cropland, wetlands, and desert, respectively. The Sidaoqiao, mixed forest, and desert stations in the downstream region are covered by Populus euphratica, tamarisk, and desert, respectively. More details on the in situ information and measurement instruments can be found in Chen et al. (2014), Li et al. (2013), and S. Liu et al. (2018).
To provide a high-resolution land cover and soil texture dataset that matched the WRF simulation period, the regional land cover and soil texture products generated by Zhong et al. (2014) and Song et al. (2016), with a spatial resolution of 30 m, were employed. These datasets were downloaded from the National Tibetan Plateau Data Center (TPDC; Pan et al., 2021a; https://data.tpdc.ac.cn/en/, last access: 8 January 2022). The elevation data were generated by NASA's Shuttle Radar Topography Mission (SRTM3; 90 m), and these data were obtained from the Geospatial Data Cloud (http://www.gscloud.cn/, last access: 8 January 2022). The assimilated LAI data were retrieved from the Global Land Surface Satellite (GLASS) product with a spatial resolution of 1 km (Xiao et al., 2014; http://www.glass.umd.edu/, last access: 5 December 2020). Daily LAI observations were generated by linearly interpolating the original 8 d GLASS LAI product. The GLASS product has been demonstrated to have better accuracy than the Moderate Resolution Imaging Spectroradiometer (MODIS) and Advanced Very High Resolution Radiometer (AVHRR) products and provides a continuous time-space LAI estimation (Xiao et al., 2014). The daily Soil Moisture Active Passive (SMAP) SM product (https://appeears.earthdatacloud.nasa.gov/, last access: 5 December 2020), with a spatial resolution of 9 km, was integrated into the hybrid framework.
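The step from 8 d composites to daily LAI values can be illustrated with a short sketch. It assumes that each composite value is assigned to a single day of year and that days outside the composite range take the nearest composite value; the function name and the sample values are hypothetical and are not taken from the GLASS product.

```python
import numpy as np

def daily_lai_from_composites(composite_doy, composite_lai, first_doy, last_doy):
    """Linearly interpolate 8 d composite LAI values to daily values.

    composite_doy: day of year assigned to each composite (assumption).
    composite_lai: LAI value of each composite (m2 m-2).
    """
    composite_doy = np.asarray(composite_doy, dtype=float)
    composite_lai = np.asarray(composite_lai, dtype=float)
    days = np.arange(first_doy, last_doy + 1)
    # np.interp holds the end values constant outside the composite range.
    return days, np.interp(days, composite_doy, composite_lai)

# Hypothetical composite values for illustration only (not GLASS data).
doy = [121, 129, 137]
lai = [0.8, 1.2, 2.0]
days, daily = daily_lai_from_composites(doy, lai, 121, 137)
```

In this example the value on day 125 lies halfway between the composites on days 121 and 129.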
In this study, SM observations from the ecohydrological wireless sensor networks (WSNs) in the up- and midstream regions of the HRB are used as an independent validation to evaluate the SM estimates from the WRF (DA-ML). The upstream validation SM dataset, from an area mainly covered by grassland, was obtained by averaging SM observations from 40 nodes. Nine network nodes installed in the LAS source area at Daman station measured SM at a depth of 10 cm every 5 min (Che et al., 2019; S. Liu et al., 2018; https://data.tpdc.ac.cn/en/, last access: 8 January 2022). The half-hourly H was measured by the LAS instrument at the Arou, Daman, and Sidaoqiao sites. LE at these sites was obtained as the residual of the energy balance equation (S. Liu et al., 2016). Compared to eddy covariance (EC) observations, the scintillometer provides kilometer-scale H and LE and is widely used for the validation of remote sensing products and model simulations (Zheng et al., 2023). [...] 2014) generated gridded atmospheric forcing data using the WRF model over the HRB at an hourly 0.05° resolution. These datasets have been widely used as input data for various models and for environmental and climate change analyses (Xu et al., 2019; Zhang et al., 2016).

WRF model setup
The advanced research WRF model version 4.0.3 (Skamarock et al., 2019) was used in this study. The WRF is a state-of-the-art numerical weather and climate model designed by the National Center for Atmospheric Research (NCAR) for meteorological research and numerical weather prediction (Wang et al., 2021). The model source code is available at the official repository for WRF (https://github.com/wrf-model/WRF, last access: 28 December 2021). The model domain covering the HRB consisted of two-way nested domains with 9 and 3 km grid spacing. Only the simulation results for the 3 km grid were used in this study (Fig. 1). This high-resolution setting excludes the uncertainty in the cumulus parameterization and thus simulates the soil-precipitation feedback more realistically (Prein et al., 2015). In the vertical direction, 28 sigma levels from the surface to 50 hPa were used. The atmospheric lateral boundary conditions in the WRF model were provided by the ERA5 reanalysis dataset, with a 0.25° spatial resolution and hourly temporal resolution (https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset, last access: 8 January 2022). ERA5 is widely used in WRF simulations to provide boundary and initial conditions (Liu et al., 2021; Ma et al., 2022). The land cover, soil texture, elevation, and GLASS LAI datasets were resampled to 3 km to be consistent with the model simulation resolution. The physical parameterization schemes selected in this study included the rapid radiative transfer model (RRTM) longwave and shortwave radiation scheme (Mlawer et al., 1997), Thompson microphysics scheme (Thompson et al., 2008), Mellor-Yamada-Janjic planetary boundary layer scheme (Janjic, 1994), and Noah-MP land surface scheme (Yang et al., 2011). The time step was 30 s, and the time resolution of the model output was hourly. Further details regarding the WRF model setup are presented in Table 1.

Table 1 (excerpt):
Microphysics: Thompson scheme (Thompson et al., 2008)
Planetary boundary layer (PBL): Mellor-Yamada-Janjic scheme (Janjic, 1994)
Surface layer: Eta similarity scheme (Janjic, 1994)
Land surface model (LSM): Noah-MP land surface model (Yang et al., 2011)

The dynamic vegetation parameterization scheme was turned on in the Noah-MP model to generate dynamic LAI simulations. We resampled the WRF simulations to a spatial resolution of 1 km with the bilinear interpolation method for comparison with the station observations.
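The bilinear interpolation used to resample the simulated fields can be sketched as follows. This is a minimal illustration on a regular grid addressed by fractional grid indices; the function name, the index convention, and the sample temperatures are assumptions and do not reproduce the actual resampling code.

```python
import numpy as np

def bilinear(field, x, y):
    """Bilinearly interpolate a 2-D field at fractional grid indices (x, y).

    field[j, i] holds the value at column i (x direction) and row j (y direction).
    """
    field = np.asarray(field, dtype=float)
    # Lower-left corner of the enclosing cell, clipped to stay inside the grid.
    i0 = min(max(int(np.floor(x)), 0), field.shape[1] - 2)
    j0 = min(max(int(np.floor(y)), 0), field.shape[0] - 2)
    tx, ty = x - i0, y - j0
    # Interpolate along x on the two bounding rows, then along y.
    row0 = (1 - tx) * field[j0, i0] + tx * field[j0, i0 + 1]
    row1 = (1 - tx) * field[j0 + 1, i0] + tx * field[j0 + 1, i0 + 1]
    return float((1 - ty) * row0 + ty * row1)

# Hypothetical 2 x 2 patch of 2 m air temperature (K) on the 3 km grid.
patch = [[290.0, 292.0],
         [294.0, 296.0]]
centre_value = bilinear(patch, 0.5, 0.5)
```

At the cell centre the result is the mean of the four surrounding grid values.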

Hybrid model
In this study, the hybrid model proposed by He et al. (2022) based on the DA and ML methods was incorporated into the WRF model to improve the LAI, SM, and ET simulations. The hybrid approach relies on the DA method to update the vegetation dynamics of the Noah-MP model and the ML method to construct a three-layer SM surrogate model. Compared with the direct assimilation of coarse-resolution remotely sensed SM, the hybrid model can improve the estimation of SM and ET on the heterogeneous land surface. This is because in situ SM profile observations are used to construct an ML-based surrogate model to improve SM and ET estimation on complex underlying surfaces.
In the DA part, the remotely sensed LAI was assimilated using the ensemble Kalman filter (EnKF) method to update the leaf biomass (LFMASS) and optimize the specific leaf area (SLA) in the Noah-MP model. LAI is estimated as the product of leaf biomass predictions and SLA (LAI = LFMASS × SLA) in the Noah-MP model. Model ensembles were generated by adding normally distributed random errors to the model states (LFMASS) and parameters (SLA). The ensemble size is set as 40 to ensure an accurate approximation of the error covariances while maintaining computational efficiency (Seo et al., 2021). Normally distributed errors with a mean of zero and a standard deviation of 10 g m⁻² were added to the LFMASS (Ahmad et al., 2022; Xu et al., 2021). The standard deviation of the SLA was set to 10 % of the default parameter (Xu et al., 2021). Furthermore, a uniform observation error standard deviation of 0.1 (-) was added to the remotely sensed LAI (He et al., 2022; Xu et al., 2021). These relevant statistical values have been widely used in previous LAI DA studies (Ahmad et al., 2022; Ling et al., 2019; Rahman et al., 2022).
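The analysis step described above can be illustrated with a minimal stochastic (perturbed-observation) EnKF sketch for a single grid cell. It is not the implementation used in this study: the augmented state [LFMASS, SLA], the prior mean values, the LAI observation, and the expression of SLA in m² g⁻¹ (so that LFMASS × SLA gives LAI directly) are assumptions made for illustration, while the ensemble size (40) and the error standard deviations follow the text.

```python
import numpy as np

def enkf_update_lai(lfmass, sla, lai_obs, obs_std, rng):
    """One stochastic EnKF analysis step for the augmented state [LFMASS, SLA]."""
    state = np.vstack([lfmass, sla])              # shape (2, n_members)
    predicted = lfmass * sla                      # observation operator: LAI = LFMASS x SLA
    n = state.shape[1]
    state_anom = state - state.mean(axis=1, keepdims=True)
    pred_anom = predicted - predicted.mean()
    cov_state_pred = state_anom @ pred_anom / (n - 1)   # cov(state, LAI), shape (2,)
    var_pred = pred_anom @ pred_anom / (n - 1)          # var(LAI)
    gain = cov_state_pred / (var_pred + obs_std ** 2)   # Kalman gain, shape (2,)
    # Each member assimilates its own perturbed copy of the observation.
    perturbed_obs = lai_obs + rng.normal(0.0, obs_std, n)
    analysis = state + gain[:, None] * (perturbed_obs - predicted)
    return analysis[0], analysis[1]

rng = np.random.default_rng(0)
n_members = 40                                     # ensemble size from the text
lfmass = 50.0 + rng.normal(0.0, 10.0, n_members)   # g m-2; prior mean is hypothetical
sla_default = 0.03                                 # m2 g-1; hypothetical default value
sla = sla_default * (1.0 + rng.normal(0.0, 0.10, n_members))  # 10 % of the default
lfmass_a, sla_a = enkf_update_lai(lfmass, sla, lai_obs=3.0, obs_std=0.1, rng=rng)
```

Because the gain is built from the ensemble covariance between each state variable and the predicted LAI, a single LAI observation adjusts both the leaf biomass and the SLA parameter.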
In the ML part, the normalized soil texture (ST), land cover (LC), air temperature and humidity (Ta and RH), wind speed (U), precipitation (P), solar radiation (Rs), LAI, and SM observations were used to construct the SM surrogate model. ST, LC, Ta, RH, U, P, Rs, and LAI are the predictor variables. The in situ SM profile observations (from 19 automatic weather stations) and SMAP SM products in the HRB are used as target variables to train and test the SM surrogate model. The extreme gradient boosting (XGBoost) method was chosen in the SM surrogate model to improve multilayer SM simulations. The first layer (the top 0.1 m) of in situ SM observations and SMAP SM were used to train the surface layer ML model. The averaged second (0.1-0.4 m) and third (0.4-1.0 m) layers of in situ SM observations were used to construct the root zone ML models. The SM observations at different depths were averaged to be consistent with the Noah-MP model soil layers. The numbers of SM training samples in the first, second, and third layers are 9824, 7804, and 7793, respectively. A 10-fold cross-validation (testing) method is employed to examine the performance of each ML method. In each fold, 90 % of the training samples are used to train the model, and the remaining 10 % of the data is used to test the model. The SM surrogate model can consider the effects of midstream irrigation events and downstream shallow groundwater tables on SM and improve Noah-MP ET estimates. More details regarding this method can be found in He et al. (2022).
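The 10-fold testing procedure can be sketched in a self-contained way. To avoid external dependencies, the sketch replaces XGBoost with an ordinary least-squares regressor as a stand-in, and the predictors and target are synthetic; only the 90 %/10 % fold structure mirrors the description above. All function and variable names are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds, rng):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold testing."""
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx

def cross_validated_rmsd(X, y, n_folds=10, seed=0):
    """Return the test RMSD of each fold for a stand-in linear regressor."""
    rng = np.random.default_rng(seed)
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), n_folds, rng):
        # Stand-in for XGBoost: least squares with an intercept column.
        A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
        pred = A_test @ coef
        scores.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    return np.array(scores)

# Synthetic predictors and SM-like target for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 0.25 + X @ np.array([0.05, -0.02, 0.01]) + rng.normal(0.0, 0.01, 200)
fold_rmsd = cross_validated_rmsd(X, y)
```

Each sample is used for testing exactly once, so the spread of the fold scores gives a rough indication of how sensitive the surrogate is to the choice of training subset.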
The coupled land-atmosphere DA-ML system consists of two steps. In the first step, the meteorological forcing data were generated from the WRF model at time t. Then, the meteorological forcing data and initial states were input into the Noah-MP model to simulate the LAI, SM, and ET. In the  A1. The differences between the WRF (DA-ML) and WRF (OL) simulations were used to investigate the effects of LAI and SM integration. The root mean square deviation (RMSD) and coefficient of determination (R²) statistical metrics were used to evaluate the performance of the WRF (DA-ML) model, as follows:

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2},$$

$$R^2 = \frac{\left[\sum_{i=1}^{n}\left(P_i - \bar{P}\right)\left(O_i - \bar{O}\right)\right]^2}{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2 \sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2},$$

where $P_i$ and $O_i$ are the predicted and observed values at time step $i$, respectively, $n$ is the number of time steps, and $\bar{P}$ and $\bar{O}$ represent the mean values of $P_i$ and $O_i$.
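The two evaluation metrics can be computed with a few lines of NumPy. The sketch assumes that R² is the squared Pearson correlation between predictions and observations, which is consistent with the use of both mean values in its definition; the function names and sample values are hypothetical.

```python
import numpy as np

def rmsd(pred, obs):
    """Root mean square deviation between predictions and observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def r_squared(pred, obs):
    """Coefficient of determination, taken here as the squared Pearson correlation."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    dp = pred - pred.mean()
    do = obs - obs.mean()
    return float((dp @ do) ** 2 / ((dp @ dp) * (do @ do)))

# Hypothetical values: predictions with a constant offset of -1 from the observations.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 3.0, 4.0, 5.0]
example_rmsd = rmsd(predicted, observed)
example_r2 = r_squared(predicted, observed)
```

In this example the constant offset gives an RMSD of 1 while R² remains 1, which illustrates why the two metrics are reported together: R² measures agreement in variability, whereas RMSD also penalizes bias.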

Results and discussion

Validation of the hybrid model
Figure 3 shows the monthly averaged LAI estimates from the WRF (OL), WRF (DA-ML), and GLASS products. As indicated, the WRF (OL) model failed to capture the magnitude and seasonality of the LAI. This is because the simulation of LAI dynamics in Noah-MP is controlled by the planting date, harvest date, and growing degree days in the cropland (X. Liu et al., 2016). In addition, an inaccurate specification of the SM saturation, Vcmax25, and Clapp-Hornberger b parameter affects photosynthesis and biomass accumulation in vegetation (Cuntz et al., 2016; Levis et al., 2012). All these parameters are site-specific and empirical and cannot be easily applied across regions. The assimilation systematically increased LAI during the growing season, and a significant increment in LAI was observed in midsummer (June-August). The seasonal pattern of the WRF (DA-ML) was more consistent with that of the GLASS LAI than that of the WRF (OL) model, which indicates that the WRF (DA-ML) provides essential information for modeling vegetation dynamics. The simulated LAIs from WRF (OL) in the cropland, grassland, forest, and shrubland areas were 1.12, 1.05, 1.49, and 0.33 m² m⁻², respectively, all of which were lower than that of the GLASS LAI. After assimilation, the bias in the simulated LAI from WRF (DA-ML) in the HRB was reduced from 0.94 to 0.11 m² m⁻². The results also show that the WRF (DA-ML) systematically overestimates the LAI, especially in the cropland. This is because, in addition to LAI assimilation, the integration of multi-source SM observations also affects the LAI dynamics.
The SM estimates from the WRF (OL) and WRF (DA-ML) are validated over the up- and midstream WSNs in Fig. 4. The SM estimates from the WRF (OL) model were markedly lower than WSN observations for cropland because the impacts of irrigation events on SM estimates are not fully considered in the Noah-MP model (He et al., 2022; Zhang et al., 2020). The Noah-MP model also slightly underestimates SM in the upstream regions because it ignores the effects of dense root systems and soil organic matter on SM estimation (Chen et al., 2012; Sun et al., 2021). As anticipated, SM predictions from the WRF (DA-ML) are closer to the measurements than those of WRF (OL). WRF (DA-ML) SM retrievals indicate a reasonable response to the precipitation and irrigation events in the midstream cropland. Similarly, the WRF (DA-ML) SM dynamics show a characteristic response to precipitation in the upstream regions. The results also indicate that the SM simulations from the WRF (DA-ML) model struggle to capture the observed peak values. This is because the prediction accuracy of the ML methods is limited by the training dataset. It also means that if the model is applied under extremely wet conditions, for which training data are sparse, the performance of the hybrid model will degrade. In general, the WRF (DA-ML) can use the information contained in remotely sensed LAI and multi-source SM observations to improve land surface conditions.
Figure 5 shows the spatial patterns of the averaged LAI and SM estimates from May to September 2015. The WRF (OL) simulation significantly underestimated the LAI, particularly in the up- and midstream vegetation areas of the HRB. In addition, it underestimated the SM in the mid- and downstream vegetation regions. The integration of LAI and SM into the WRF model improved the estimation of leaf biomass and SM and increased LAI and SM in the HRB. The maps of estimated LAI and SM from the DA-ML method consistently resembled the rainfall, vegetation cover, irrigation event, and shallow groundwater table features (Xu et al., 2018, 2020). The precipitation in the upstream mountains, irrigation in the midstream oasis, and shallow groundwater in the downstream oasis enhance SM and provide the necessary water supply for vegetation growth (Li et al., 2022). Figure 5 also shows the LAI and SM differences between the WRF (DA-ML) and WRF (OL) simulations. The maximum LAI (SM) difference between the WRF (DA-ML) and WRF (OL) simulations reaches approximately 2.24 m² m⁻² (0.16 m³ m⁻³) and is present in the midstream oasis of the HRB. There, the difference between the WRF (OL) and WRF (DA-ML) was due to the effects of irrigation and crop growth. Similarly, the difference in the downstream oasis is a result of the shallow water table and the growth of riparian forests. In the upstream alpine meadows, the LAI simulated by the WRF (DA-ML) was greatly increased compared to that of the WRF (OL). However, the SM enhancement in the WRF (DA-ML) was not significant due to the sufficient precipitation in mountainous areas.

[...] still higher and lower than the observed values at the Sidaoqiao site. This is because the spatial representation of the model simulation (3 km) is inconsistent with the LAS measurements (path length of 2350 m). This mismatch will introduce uncertainty in the validation results, especially over heterogeneous land surfaces (Y. Zhang et al., 2022). The LE measurements of the LAS instrument are obtained from the residuals of the surface energy balance equation, which may lead to uncertainties in the LE observations. In addition, the higher surface heterogeneity and complex hydrological processes in the downstream oasis affect the training accuracy of the ML method, which further affects the performance of the WRF (DA-ML) model (He et al., 2022).
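The residual approach to LE can be illustrated with a short sketch. It assumes the common form of the surface energy balance, LE = Rn - G - H, where Rn is net radiation and G is the ground heat flux; these symbols, the function name, and the sample fluxes are assumptions for illustration and are not taken from the observation processing chain.

```python
import numpy as np

def latent_heat_residual(net_radiation, ground_heat_flux, sensible_heat_flux):
    """Latent heat flux (W m-2) as the residual of the surface energy balance.

    Assumes LE = Rn - G - H; any error in Rn, G, or H is passed on to LE.
    """
    rn = np.asarray(net_radiation, dtype=float)
    g = np.asarray(ground_heat_flux, dtype=float)
    h = np.asarray(sensible_heat_flux, dtype=float)
    return rn - g - h

# Hypothetical half-hourly fluxes (W m-2) for illustration only.
le = latent_heat_residual([500.0, 420.0], [50.0, 40.0], [150.0, 120.0])
```

Because LE is not measured directly, measurement errors in any of the three terms accumulate in the residual, which is one source of the uncertainty in the LE observations noted above.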

Sensible and latent heat fluxes
Figure 7 shows the spatial distribution of ET estimates from the WRF (OL), WRF (DA-ML), and ETMap over the HRB. The results indicate that the ET values from the WRF (OL) model were underestimated, especially in the midstream oasis region, which was mainly because the WRF (OL) model underestimated the SM and LAI (see Figs. 3 and 4) during the growing season. Compared with the WRF (OL) model, the WRF (DA-ML) method improves the estimation of ET, and the spatial distribution is consistent with that of ETMap because of the effective information contained in the remotely sensed LAI and multi-source SM observations. The estimation of ET in the WRF (OL) is sensitive to SM and vegetation dynamics, especially in semi-arid regions. Therefore, the WRF (DA-ML) model produces larger improvements over the WRF (OL) model in the mid- and downstream oasis regions. The spatial patterns of ET from the DA-ML method showed a significant gradient from wet to dry, owing to variations in the precipitation and vegetation cover. In the upstream regions of the HRB, the spatial pattern of retrieved ET was mainly controlled by precipitation and vegetation cover. The ET values were higher in areas with heavier precipitation and denser vegetation. In the midstream region, the spatial pattern of ET was well aligned with the oasis, reflecting crop growth and irrigation. Meanwhile, the ET values were higher in the downstream oasis because of shallow water tables and transpiration from riparian forests (Xu et al., 2018, 2020). The sparsely vegetated areas covered by desert and Gobi in the mid- and downstream regions had the lowest ET values. The results show that the integration of remotely sensed LAI and multi-source SM observations is essential for studying land-atmosphere water vapor fluxes (ET) because it provides realistic land surface conditions.

Air temperature and specific humidity
The monthly averaged 2 m air temperature and specific humidity from the WRF (OL), WRF (DA-ML), and corresponding observations at nine sites are shown in Figs. 8 and 9. As indicated, the WRF (OL) model overestimated (underestimated) the air temperature (specific humidity) in the HRB, especially in the midstream oasis (Daman and wetland stations), which was mainly because the WRF (OL) model underestimated the SM and LAI (see Figs. 3 and 4) in the HRB. Compared to the WRF (OL) model, the WRF (DA-ML)-simulated seasonal cycles of air temperature and specific humidity at the nine sites were closer to the measurements, which was because the integration of remotely sensed LAI and multi-source SM observations improves the estimation of vegetation dynamics and SM, decreases the air temperature, and increases the specific humidity. The increased specific humidity was due to the enhanced evaporation from the soil and stronger transpiration from the expanded vegetation cover. Simultaneously, evaporation absorbs a large amount of energy, thereby reducing the air temperature (Wen et al., 2012). The discrepancy between the WRF (OL) and WRF (DA-ML) was amplified in the middle of the growing season (June, July, and August) due to dense growing vegetation and higher SM caused by several irrigation events. After integrating LAI and SM, the simulated air temperature at the Daman station decreased by approximately 1.75 K, and the simulated specific humidity increased by approximately 1.86 g kg⁻¹. At Sidaoqiao, the air temperature decreased by about 0.59 K, and the specific humidity increased by about 0.41 g kg⁻¹. The results show that the midstream artificial oasis exhibits a stronger cold and wet island effect than the downstream natural oasis. The estimated air temperature and specific humidity increased from May to July and decreased from August to September. The specific humidity estimated at Daman exhibited significant seasonal variations due to irrigation events and crop phenology.
Tables 3 and 4 further compare the simulated air temperature and specific humidity with the station observations. The WRF (OL) results show a dry bias in the HRB region, which is reduced in the WRF (DA-ML) simulation. The statistical metrics (i.e., R² and RMSD) of the daily air temperature and specific humidity estimates from the WRF (OL) and WRF (DA-ML) methods are shown in Tables 3 and 4. For the nine sites, the average [...]

Table 3. Averaged air temperature and R² and RMSD of the WRF (OL) and WRF (DA-ML) compared with the measurements at the nine sites. Note that Obs stands for observation and Sim for simulation.
Figure 10 compares the spatial patterns of the air temperature and specific humidity maps from the WRF (OL) and WRF (DA-ML). Compared with the WRF (OL), significant differences were observed in the WRF (DA-ML). The integration of LAI and SM decreases air temperature and increases specific humidity in the vegetated area of the HRB, particularly in the midstream oasis region. The spatial distribution of specific humidity from the WRF (DA-ML) is consistent with the LAI and SM maps in Fig. 5. The simulations lead to different land surface dynamic and thermal characteristics between the oasis and desert. This difference leads to oasis-desert interactions and produces microclimatic effects, including the cooling and wetting effects of the oasis. The average simulated air temperatures from the WRF (OL) and WRF (DA-ML) methods in the midstream oasis were 293.64 and 291.32 K, respectively. In contrast, the near-surface air temperatures over the desert are approximately 294.13 and 293.54 K, respectively. The difference in air temperature between the oasis and desert areas indicates that the oasis areas represent a cold and wet island compared to the surrounding desert. This difference is amplified after the implementation of the DA-ML method. The significant wetting and cooling effects propagate in desert areas to a maximum distance of approximately 5-10 km from the edge of the oasis. In the midstream oasis, the dominant vegetation is irrigated cropland, and the vegetation cover was only approximately 42 % in the original WRF model; however, the vegetation cover was updated to approximately 70 % in the WRF (DA-ML). The different land surface dynamic and thermal characteristics between the oasis and desert can produce oasis-desert interactions and enhance local circulation. The oasis-desert interactions create a water vapor flux from the oasis to the surrounding desert. This transport process is beneficial for increasing desert water vapor and maintaining the sustainability of desert vegetation (Li et al., 2016; Liu et al., 2020). A similar pattern was observed in the downstream oasis. However, because of the decreased SM and vegetation cover (Fig. 5), the downstream oasis exhibited a weaker wet island effect. The results also indicated that enhanced vegetation transpiration increases specific humidity and reduces air temperature owing to increased LAI in the upstream region of the HRB.
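The amplification of the cold-island signal can be checked directly from the midstream oasis and desert temperatures quoted above. The short sketch below is only worked arithmetic on those four numbers; the variable names are illustrative.

```python
# Midstream oasis and desert mean air temperatures (K) quoted in the text.
oasis = {"OL": 293.64, "DA-ML": 291.32}
desert = {"OL": 294.13, "DA-ML": 293.54}

# Oasis-minus-desert contrast; negative values mean the oasis is cooler.
contrast = {run: round(oasis[run] - desert[run], 2) for run in oasis}

for run, value in contrast.items():
    print(f"WRF ({run}): oasis - desert = {value:+.2f} K")
```

The oasis-minus-desert contrast grows from about -0.49 K in the WRF (OL) to about -2.22 K in the WRF (DA-ML), which is the amplification described in the text.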
The abovementioned findings show that the proposed WRF (DA-ML) method exhibits strong wetting and cooling effects in the mid- and downstream oasis. These wetting and cooling effects reduce the air warming bias and dry bias in the simulation. Therefore, the WRF (DA-ML) simulation is much closer to the observations than the WRF (OL) simulation. Two vertical profiles were selected in Fig. 10 to further analyze the effect of the DA-ML on the local climate in the mid- and downstream areas. The difference between the WRF (DA-ML) and WRF (OL) methods is used to represent the enhanced cooling and wetting effects after improving the LAI and SM simulations. As illustrated in Fig. 11, the enhanced wetting and cooling effects of the midstream oasis were the strongest in the southern irrigated cropland and gradually decreased in the northern desert areas. The magnitudes of the surface wetting and cooling effects were consistent with the differences in LAI and SM estimates from the WRF (DA-ML) and WRF (OL). For example, the difference in LAI and SM peaks at 38.08° N, and the wetting and cooling effects of midstream irrigated cropland were also stronger in this region. These results suggest that the wetting and cooling effects caused by irrigation and vegetation growth occur mainly in the oasis region and do not affect more distant non-oasis areas. Moreover, the wetting and cooling effects of the oasis were mainly concentrated in the boundary layer, gradually decreased from the land surface upward, and were replaced by slight warming and drying effects. Such warming and drying effects may be related to the enhanced subsidence over the oasis. Similar results have been demonstrated in several previous studies (Liu et al., 2020; Wen et al., 2012; X. Zhang et al., 2017; M. Zhang et al., 2017).
Figure 12 shows the same wetting and cooling effects in the downstream oasis. Compared to the midstream irrigated cropland, the downstream oasis wetting and cooling effects were mainly influenced by the growth of riparian forests and shallow groundwater tables. The wetting and cooling effects showed maximum values at 42.01° N owing to the strong LAI and SM shifts. The results also indicate that the wetting and cooling effects of the downstream oasis were weaker than those of the midstream oasis. By integrating LAI and SM, the air temperature in the mid- and downstream oases decreases by 0.96 and 0.12 K and the specific humidity increases by 0.52 and 0.06 g kg⁻¹, respectively. In general, the integration of the LAI and SM data can produce more realistic land surface conditions in the oasis region and lead to stronger wetting and cooling effects.

Wind speed and precipitation
The mean wind vectors at 10 m during the growing season from the WRF (OL) and WRF (DA-ML) in the mid- and downstream oases are shown in Fig. 13. By comparing the simulated wind speeds in the oasis and the surrounding desert, we found that crops, shelterbelts, and residential areas in the midstream oasis produced a wind shield effect. The wind speed within the oasis is less than that of the surrounding desert because the drag force of crops, shelterbelts, and residential areas reduces the wind speed and also changes the wind direction (Liu et al., 2020). In Fig. 13, the heat transfer coefficient (C_h) from the WRF (DA-ML) was used to compare the surface roughness in the oasis with that of the surrounding desert. C_h is an important parameter for calculating the heat transfer between the land and atmosphere, and it is mainly related to the surface roughness length and the stability of the atmospheric surface layer (Smedman et al., 2007). The results show that C_h is higher, and the wind speed lower, in the midstream oasis than in the surrounding desert. Ozdogan and Salvucci (2004) and Liu et al. (2020) also showed that crop growth enhanced the surface roughness and slowed the wind speed. Figure 13 also shows that the wind speed values from the WRF (DA-ML) scheme are slightly lower than those from the WRF (OL) scheme. The average wind speed in the midstream oasis was reduced from 1.92 to 1.23 m s⁻¹ by integrating the LAI and SM. As mentioned earlier, the different dynamic and thermal characteristics between the oasis and desert can produce oasis-desert interactions and generate local circulation, which drives the cold and moist airflow from the oasis to the surrounding desert in the lower atmosphere. As shown in Fig. 13, the area to the south of the midstream irrigated cropland is the Qilian Mountains, whereas the area to the north is a large desert. Therefore, the southerly airflow generated by the midstream oasis weakened the background north wind. The mean wind vectors in the downstream oasis are shown in Fig. 13. The wind speed in the downstream desert regions is slightly higher than in the oasis regions. The lower wind speed in the mid- and downstream oasis is helpful to plant growth, people's survival in the environment, and the maintenance of the oasis and desert ecosystem (Wang and Cheng, 1999). The results also indicated that the wind vector in the downstream oasis was mainly controlled by the background northerly wind and the effects of LAI and SM integration on the wind vectors were weaker.

https://doi.org/10.5194/hess-27-1583-2023 Hydrol. Earth Syst. Sci., 27, 1583-1606, 2023
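The dependence of C_h on surface roughness can be illustrated with the textbook neutral-stability bulk formula, C_h = k² / [ln(z/z0m) ln(z/z0h)]. The sketch below is an assumption-laden illustration and not the surface-layer scheme used in the WRF runs: it ignores stability corrections, and the roughness lengths, the z0h = z0m / 10 rule of thumb, and the function name are all hypothetical choices made here.

```python
import math

VON_KARMAN = 0.4  # von Karman constant

def neutral_heat_transfer_coefficient(z_ref, z0m, z0h):
    """Neutral-stability bulk coefficient: k^2 / (ln(z/z0m) * ln(z/z0h))."""
    if not (0.0 < z0m < z_ref and 0.0 < z0h < z_ref):
        raise ValueError("roughness lengths must be positive and below z_ref")
    return VON_KARMAN ** 2 / (math.log(z_ref / z0m) * math.log(z_ref / z0h))

# Hypothetical roughness lengths (m); z0h is taken as z0m / 10 for illustration.
z_ref = 10.0
ch_desert = neutral_heat_transfer_coefficient(z_ref, z0m=0.01, z0h=0.001)
ch_cropland = neutral_heat_transfer_coefficient(z_ref, z0m=0.10, z0h=0.01)

print(f"desert   C_h = {ch_desert:.4f}")
print(f"cropland C_h = {ch_cropland:.4f}")
```

With these illustrative inputs the rougher cropland surface yields the larger coefficient, in line with the higher C_h reported over the oasis.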
The integration of the LAI and SM affected the wind speed at the land surface and the local circulation through oasis-desert interactions. Figure 14 shows the zonal mean vertical velocity and local meridional circulation in the midstream oases from the WRF (OL) and WRF (DA-ML). Compared with the flat topography of the downstream oasis, the topography of the midstream oasis generally varies from plains to mountains (from low to high altitude) from north to south. The surface dynamic and thermal characteristics of the oasis and surrounding desert differed significantly; therefore, strong horizontal temperature and humidity field gradients were observed at the intersection of the boundary layer of the mountains, oasis, and surrounding desert (Meng et al., 2015; Wen et al., 2012). The air humidity and vegetation cover in the midstream oasis were enhanced by integrating the LAI and SM, which resulted in stronger evaporation from irrigated cropland than from the surrounding desert. As shown in Fig. 14, the divergence of the lower atmosphere over the midstream oasis is enhanced, and the wet and cold air masses are transferred to the surrounding desert through advection, whereas the dry and hot air is transferred into the oasis from the upper atmosphere. In the upper atmosphere, the desert-to-oasis air masses enhance the background northerly winds, which promote atmospheric water vapor transport in the HRB. However, oasis-desert interactions are weaker in the downstream region (Fig. A1) than in the midstream region under actual weather or climate conditions, which is attributed to the local circulation being weakened by stronger background northerly winds. Overall, integrating LAI and SM improves the simulation of soil and vegetation characteristics and enhances land-atmosphere interactions in the mid- and downstream oases.
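A zonal mean cross-section of the kind described above is obtained by averaging a three-dimensional field over longitude, which leaves a level-by-latitude slice. The following is a minimal sketch under stated assumptions: the nested-list layout [level][latitude][longitude], the function name, and the sample vertical velocities are hypothetical and do not come from the WRF output.

```python
def zonal_mean(field):
    """Average a [level][lat][lon] nested list over its last (longitude) axis."""
    return [[sum(row) / len(row) for row in level] for level in field]

# Hypothetical vertical velocity (m/s) on 2 levels x 2 latitudes x 3 longitudes.
w = [
    [[0.01, 0.02, 0.03], [-0.01, 0.00, 0.01]],
    [[0.02, 0.02, 0.02], [-0.03, -0.02, -0.01]],
]

w_bar = zonal_mean(w)  # 2 levels x 2 latitudes

for k, level in enumerate(w_bar):
    print(f"level {k}: " + ", ".join(f"{value:+.3f}" for value in level))
```

Applying the same averaging to the meridional wind gives the second component needed to draw circulation vectors in the latitude-height plane.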
Figure 15 shows the influence of the DA-ML on precipitation in the HRB. The results show that the integration of LAI and SM led to increased precipitation in the upstream regions of the HRB and that the spatial variation in precipitation was very heterogeneous. The increase in precipitation was mainly concentrated in the southeastern part of the HRB, where it reached approximately 1.5 mm d⁻¹, which represented 32 % of the simulated value of the WRF (OL) experiment. In contrast, precipitation increased insignificantly in the mid- and downstream oasis regions. The increased precipitation in the upstream region may have been due to the additional water vapor supply. Water vapor fluxes in the mountain areas and midstream oasis regions were enhanced by integrating the LAI and SM. Driven by background northerly winds (Fig. 14), more water vapor fluxes from the midstream oasis region were carried to the upstream region. The wind speed and precipitation estimates in the upstream region (around the Babao River basin) are shown in Fig. 15b and c. As shown, the WRF (DA-ML) enhanced the estimation of precipitation on windward slopes compared with valleys. After integrating the LAI and SM, the land-atmosphere interactions are altered. The DA-ML increased the latent heat flux and decreased the sensible heat flux. The water vapor carried by the air masses and lifted on the sloped surface was more likely to condense and produce precipitation (Yue et al., 2021). In general, the DA-ML enhanced the precipitation estimates in the upstream mountain areas, mainly on windward slopes.
The simulated daily precipitation from the WRF (OL) and WRF (DA-ML) was compared with that of the AFD and CMFD references in Fig. 16. Because the water vapor from the East Asian monsoon was blocked by the Tibetan Plateau, most of the precipitation was concentrated in the southeastern part of the Qilian Mountains. Figure 16 shows that the high precipitation zone of the HRB was mainly located in the mountainous areas below 39.5° N due to orographic lifting and convection (Zhang et al., 2021b). The main precipitation events were consistent between the WRF (OL) and WRF (DA-ML). The WRF (DA-ML) had higher precipitation in the southern domain because peak precipitation was enhanced and few additional precipitation events occurred (red rectangle). Both the WRF (OL) and WRF (DA-ML) captured the temporal and spatial variability in precipitation well and were consistent with the reference data, indicating that the 3 km high-resolution grid contains information on topography-related heterogeneity and accurately estimates the precipitation distribution. The estimated precipitation in the upstream regions of the HRB was more consistent with the AFD reference but was overestimated by approximately 0.43 mm d⁻¹ compared to the CMFD. The discrepancy between the precipitation estimated by the WRF and CMFD schemes occurred because the China Meteorological Administration sites fused by the CMFD product were mainly distributed at elevations below 3500 m (J. He et al., 2020). Therefore, there are some uncertainties in the precipitation simulation of the CMFD products in high-altitude mountainous areas (Zhang et al., 2021b).

Compared to the WRF model, the seasonal mean air temperature and specific humidity simulated by the WRF (DA-ML) at the nine sites were closer to the station measurements. For the WRF model, the nine-site-averaged root mean square deviation (RMSD) of the air temperature and specific humidity estimates was 1.79 K and 1.08 g kg⁻¹, respectively. The WRF (DA-ML) reduces the aforementioned RMSDs by 21.23 % and 24.07 %. Strong wetting and cooling effects on vegetated areas were observed through the integration of LAI and SM, especially in the midstream oasis. The magnitude of the surface wetting and cooling effects corresponded well with the differences in the LAI and SM estimates from the WRF (DA-ML) and WRF (OL). These results indicate that the wetting and cooling effects gradually decrease from the land surface upwards and are replaced by slight warming and drying effects.
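The relative reductions quoted above can be turned back into approximate RMSD values for the WRF (DA-ML). The sketch below is worked arithmetic only: the resulting values are derived here from the rounded figures in the text rather than quoted from Tables 3 and 4, so small rounding differences are possible, and the helper name is hypothetical.

```python
def apply_relative_reduction(baseline, reduction_percent):
    """Return the value left after reducing `baseline` by `reduction_percent` %."""
    return baseline * (1.0 - reduction_percent / 100.0)

# Nine-site-averaged RMSDs of the baseline WRF run and the reported reductions.
rmsd_t_ol, cut_t = 1.79, 21.23  # air temperature: K, %
rmsd_q_ol, cut_q = 1.08, 24.07  # specific humidity: g/kg, %

rmsd_t_da = apply_relative_reduction(rmsd_t_ol, cut_t)
rmsd_q_da = apply_relative_reduction(rmsd_q_ol, cut_q)

print(f"air temperature RMSD:   {rmsd_t_ol:.2f} -> {rmsd_t_da:.2f} K")
print(f"specific humidity RMSD: {rmsd_q_ol:.2f} -> {rmsd_q_da:.2f} g/kg")
```

Under these figures the implied RMSDs are roughly 1.41 K for air temperature and 0.82 g kg⁻¹ for specific humidity.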
The crops, shelterbelts, and residential areas in the midstream oasis produce a wind shield effect because of the greater surface roughness. The different land surface dynamic and thermal characteristics between the oasis and desert can produce oasis-desert interactions and generate local circulation. In the lower atmosphere, wet and cold air masses are transferred to the surrounding desert by advection, while the dry and hot air over the desert is transferred to the oasis from the upper atmosphere. The results show that the integration of LAI and SM will induce water vapor intensification and promote precipitation in the upstream regions of the HRB. The WRF (DA-ML) simulation captured the temporal and spatial variability in precipitation well and was consistent with the reference data. The results indicate that the 3 km high-resolution grid can account for topographic information and produce accurate precipitation distribution estimates.
Appendix A

Table A1. The computation details about the WRF (OL) and WRF (DA-ML).

Author contributions. XH developed the model code and completed the draft paper, with support from all co-authors. YL, SL, TX, FC, ZL, RL, and LS revised the paper. ZZ provided the original WRF code. ZX, ZP, and CZ provided the methods for processing the observational data. All authors contributed to the synthesis of the results and key conclusions.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figure 1. (a) Land surface elevation, (b) meteorological stations, and land cover types in the study area. The solid blue and black lines represent the river and the boundary of the HRB, respectively.

Figure 2. (a) Details of the hybrid DA and ML method and (b) a flowchart of the coupling with the WRF model.

Figure 3. Seasonal variations in the LAI estimates for cropland, grassland, forest, and shrubland in the HRB.

Figure 6 compares the daily H and LE estimates from the WRF (OL) and WRF (DA-ML) models with the LAS at the Arou, Daman, and Sidaoqiao sites. As indicated, the retrieved H and LE from the WRF (DA-ML) model agree well with the observations and mainly fall around the 1:1 line. The WRF (DA-ML) model performs better than the WRF (OL) model because of the improved LAI and SM simulations. The statistics of turbulent heat flux estimates at the three sites are summarized in Table 2. The three-site-averaged RMSD of daily H and LE predictions for the WRF (OL) model are 53.61 and 63.73 W m⁻², respectively. The WRF (DA-ML) model decreases the abovementioned RMSDs by 43.74 % and 23.98 %. The relatively low RMSD values indicate that the WRF (DA-ML) model can accurately estimate turbulent heat fluxes over different sites with contrasting environmental conditions. The results also show that the simulated H and LE of the WRF (DA-ML) model are

Figure 4. The time series of SM estimates from the WRF (OL) and WRF (DA-ML) models against the up- and midstream WSN observations in 2015.

Figure 5. The LAI and SM estimates from the WRF (OL) and WRF (DA-ML) models during the growing season in 2015 and the average difference in the LAI and SM between the WRF (DA-ML) and WRF (OL) (i.e., WRF (DA-ML) minus WRF (OL)).

Figure 6. Scatterplot of daily sensible and latent heat flux estimates from the WRF (OL) and WRF (DA-ML) models versus measurements at the Arou, Daman, and Sidaoqiao sites.

Figure 7. Spatial distribution of evapotranspiration estimates obtained from the WRF (OL), WRF (DA-ML), and ETMap during the growing season in 2015.

Figure 8. Monthly averaged air temperature simulations from the WRF (OL) and WRF (DA-ML) versus the observations at nine sites in 2015 (error range denotes the standard deviation).

Figure 9. Monthly averaged specific humidity simulations from the WRF (OL) and WRF (DA-ML) versus the observations at nine sites in 2015 (error range denotes the standard deviation).

Figure 10. Spatial distribution of the air temperature and specific humidity estimates from the WRF (OL) and WRF (DA-ML) during the growing season in 2015 and the average difference in air temperature and specific humidity between the WRF (DA-ML) and WRF (OL) (i.e., WRF (DA-ML) minus WRF (OL)). The blue line indicates the mid- and downstream oasis vertical profile used in Figs. 11 and 12.

Figure 11. Mean vertical profile of differences in air temperature and specific humidity between the WRF (DA-ML) and WRF (OL) (i.e., WRF (DA-ML) minus WRF (OL)) and mean LAI and SM during the growing season in 2015 in the midstream oasis. The dashed and solid lines represent the WRF (OL) and WRF (DA-ML), respectively. The shaded white area represents the change in elevation. The orange bar represents the oasis area.

Figure 12. Mean vertical profile of differences in air temperature and specific humidity between the WRF (DA-ML) and WRF (OL) (i.e., WRF (DA-ML) minus WRF (OL)) and mean LAI and SM during the growing season in 2015 in the downstream oasis. The dashed and solid lines represent the WRF (OL) and WRF (DA-ML), respectively. The orange bar represents the oasis area.

Figure 13. (a, d) Mean heat transfer coefficient (C_h) from the WRF (DA-ML) and (b, c, e, f) wind vectors at 10 m during the growing season from the WRF (OL) and WRF (DA-ML) in the midstream (a-c) and downstream (d-f) oasis. Colored contours indicate elevations above ground level, and shading indicates the extent of the oasis. The black line is the boundary of HRB.

Figure 14. The zonal mean vertical velocity and meridional circulation from the WRF (OL) and WRF (DA-ML) models during the growing season in 2015 in the midstream oasis. The shaded white area represents the change in elevation. The orange bar represents the oasis area.

Figure 15. (a) Average difference in precipitation between the WRF (DA-ML) and WRF (OL) (i.e., WRF (DA-ML) minus WRF (OL)) in the Heihe River basin and (b) the upstream region (around Babao River basin) and (c) wind vectors at 10 m from the WRF (DA-ML) in the upstream region. The black line is the boundary of HRB.

Figure A1. The zonal mean vertical velocity and meridional circulation from the WRF (OL) and WRF (DA-ML) models during the growing season in 2015 in the downstream oasis. The orange bar represents the oasis area.

Table 1. WRF model setup. Note that RRTM is for the rapid radiative transfer model.

Table 2. Statistical indices of daily H and LE estimates from the WRF (OL) and WRF (DA-ML) models at the Arou, Daman, and Sidaoqiao sites.

Table 4. Averaged specific humidity and R² and RMSD of the WRF (OL) and WRF (DA-ML) compared with the measurements at the nine sites.