Understanding root-zone soil moisture in agricultural regions of Central Mexico using the ensemble Kalman filter, satellite-derived information, and the THEXMEX-18 dataset

ABSTRACT An Ensemble Kalman Filter (EnKF)-based assimilation algorithm was implemented to estimate root-zone soil moisture (RZSM) using a Soil-Vegetation-Atmosphere Transfer (SVAT) model during a complete growing season of corn in Central Mexico. Synthetic and field soil moisture (SM) observations and NASA SMAP SM retrievals were used to understand the effect of vertical spatial updates and uncertainties in meteorological forcings on RZSM estimates. Assimilation every 3 days using SM observations at 4 depths lowered the averaged standard deviation (ASD) and the root mean square error (RMSE) of RZSM by 60% and 50%, respectively, compared to the open-loop simulation. The assimilation of synthetic SM at the top 0-5 cm yielded RZSM closer to observations than the assimilation of THEXMEX-18 SM measurements or SMAP SM retrievals. Differences between EnKF estimates and SM observations and SMAP SM retrievals are mainly due to misrepresentation of vegetation conditions. Assimilating SMAP SM retrievals improved SM estimates down to 10-cm depth; however, additional studies are needed to improve SM at deeper layers. The implemented methodology can estimate SM in the top 10 cm of the soil every 3 days, which can help mitigate the impact of climate change on agricultural production over rainfed areas, particularly in developing countries.


Introduction
Information on root-zone soil moisture (RZSM) is crucial in hydrology, micrometeorology, and agriculture studies (Hanson, Rojas, and Schaffer 1999) to estimate energy and moisture fluxes at the land surface. In agricultural applications, the RZSM is defined as the soil moisture in a column from the soil surface down to 1-m depth (Reichle et al. 2018). Soil moisture (SM) plays a significant role in simulating the sensible and latent heat at the ground surface and infiltration and runoff in the soil (Western and Bloschl 1999). Soil Vegetation Atmosphere Transfer (SVAT) models are used to estimate energy and moisture transport in soil and vegetation at the land surface and in the root zone (Casanova and Judge 2008). Most SVAT models rely on measurements or empirical functions to simulate the effects of growing vegetation. However, the RZSM estimates from SVAT models still diverge from in situ measurements due to errors in model conceptualization, computation, and numerical implementation, and due to uncertainties in model parameters, forcings, and initial conditions. Various efforts have been made worldwide to obtain long-term in situ SM measurements to correct these errors and improve the knowledge of RZSM (e.g. Dorigo et al. 2021). Among these efforts, field experiments have provided valuable information to improve the understanding of temporal and spatial variations of SM. For instance, the Australian Airborne Calibration/Validation Experiments for the Soil Moisture and Ocean Salinity mission (AACES) (Peischl et al. 2012) were performed in South-East Australia and contributed to evaluating the SM estimates from the European Space Agency (ESA) Soil Moisture and Ocean Salinity (SMOS) mission, particularly over agricultural regions during summer seasons. Similarly, the ELBARA SM network (Wigneron et al. 2012; Fernández-Morán et al. 2015; Rautiainen et al. 2011) was implemented by ESA to obtain in situ SM measurements and improve the SMOS estimations of surface parameters over Europe. Recently, the US National Aeronautics and Space Administration (NASA) has implemented a series of field experiments called Soil Moisture Active Passive Validation Experiments (SMAPVEXs) to validate and improve the SM estimates derived from the Soil Moisture Active-Passive (SMAP) mission over agricultural regions in the USA and Canada (McNairn et al. 2015; Bhuiyan et al. 2018; Colliander et al. 2019; Judge et al. 2021). Among developing countries, China has been conducting field experiments in recent years to help evaluate satellite products (e.g. Li et al. 2013; Zhao et al. 2020). These databases are crucial to represent changing interactions in the vegetation layer and compute realistic estimates of the fluxes when using the SVAT models. In developing countries, information characterizing local conditions is still lacking, which limits the improvement of SM estimates at these latitudes. When no information is available for a region, the vegetation conditions are usually obtained using empirical functions or allometric equations based on indirect observations (Kolassa et al. 2020). However, the global representation over agricultural areas does not correctly characterize phenological changes in the plant (Kolassa et al. 2020; Gao et al. 2021). This indicates that there is still a need for studies focused on calibrating values that appropriately represent changes in land surface conditions in SVAT models over developing countries and that improve the RZSM estimates. Such studies can provide unique insights into the biophysics implemented in the model.
The RZSM estimates can also be significantly improved by assimilating SM observations into an SVAT model (Reichle and Koster 2005; Reichle et al. 2007; Monsiváis-Huertero et al. 2010; Nagarajan et al. 2011). These SM observations can be obtained from in situ sensors such as Time Domain Reflectometers (TDR) or from satellite retrievals such as those from the NASA SMAP mission (Entekhabi et al. 2010) and the ESA SMOS mission (Kerr et al. 2016). Ensemble-based assimilation techniques such as the Ensemble Kalman filter (EnKF) and the Particle Filter (PF) have been widely used for land data assimilation research and applications since they can be applied to nonlinear and discontinuous models (Huang et al. 2008; Brandhorst, Erdal, and Neuweiler 2017; Yan and Moradkhani 2016). Nagarajan et al. (2011) compared the performance of a PF-based algorithm with the EnKF to improve RZSM estimates over a crop field during a complete growing season. They found that both filtering techniques offered significant improvement in RZSM estimates compared to the open loop; nevertheless, the presence of unaccounted biophysical errors in the model affects the PF performance more than that of the EnKF, leading to higher errors when the crop plant is fully developed. Furthermore, the EnKF performance has been reliable enough to generate global SM products using microwave satellite observations (De Lannoy and Reichle 2016). The assessment of SM and RZSM products based on assimilation frameworks and satellite observations over croplands shows a difference higher than 0.04 m³ m⁻³ when compared to in situ measurements (De Lannoy and Reichle 2016; Reichle et al. 2017; Nair and Indu 2019). Among the different sources of uncertainty that cause this error, previous studies have pointed to the need for regional information on forcings, particularly precipitation, and for long-term datasets at different soil depths over diverse climates to build reliable reference data over agricultural areas and calibrate the SVAT models.
In Latin America, due to the lack of datasets with high temporal and vertical spatial density over agricultural lands, such as those in López-Lambraño et al. (2020) and Monsiváis-Huertero et al. (2019), rigorous studies applying the EnKF to SM observations are rare. When databases describing SM conditions are available, these studies generally describe the near-surface (0-5 cm) conditions of the soil by using sparse SM networks or SM retrievals from microwave satellite sensors. It has been demonstrated that, although the improvement in the RZSM estimates when assimilating near-surface SM is lower than when assimilating SM information describing the complete SM profile, the assimilation of near-surface SM observations is very helpful to generate horizontally resolved estimates of the full SM profile with complete spatio-temporal coverage, often with better results than those of the model or satellite observations alone (Monsiváis-Huertero et al. 2010; Kolassa et al. 2017; Reichle et al. 2017; Mladenova et al. 2020). Ines et al. (2013) developed an EnKF framework using a modified Decision Support System for Agrotechnology Transfer Cropping System Model (DSSAT-CSM) to assimilate SM retrievals from the AMSR-E instrument and improve RZSM and yield forecasting over a growing season of corn. The authors found that it was possible to improve the estimation of yield and RZSM; however, the RZSM estimates were not accurate enough during wet conditions, mainly because of a bias in the satellite SM retrievals. Mladenova et al. (2019) assimilated SMAP SM retrievals using an EnKF-based algorithm and the Palmer model from the US Department of Agriculture (USDA) Foreign Agricultural Service (FAS) to operationally monitor SM conditions in the root zone. They concluded that the lowest improvement in RZSM is obtained in data-poor areas because of high levels of random error in the soil and vegetation parameters. 
For regions with limited reliable databases describing the land surface conditions, such as Latin America, the evaluation of RZSM estimates from an EnKF-based framework using a calibrated SVAT model to assimilate bias-corrected SM retrievals from satellite operational missions can provide valuable insights for hydrological applications. This complete framework can account for the relationship between the observed SM near land surface and SM deeper in the soil profile.
The goal of this study is to compare the performance of RZSM estimates using an EnKF framework and an SVAT model when assimilating SM from in situ measurements and microwave satellite retrievals at different depths over an agricultural region located in Central Mexico. We use observations from the Terrestrial Hydrology Experiment in Mexico conducted in 2018 (THEXMEX-18) (Monsiváis-Huertero et al. 2019) to compare the results from the EnKF using synthetic observations and SM retrievals from the NASA SMAP mission on a 36-km grid. Specifically, we aim to: (1) calibrate the SVAT model, called the Land Surface Process (LSP) model (Judge, Abriola, and England 2003; Judge et al. 2008), for the soil and vegetation conditions in Central Mexico, (2) provide insights into the RZSM estimates when assimilating SM at depths of 0-5, 10, 20, and 30 cm and when assimilating near-surface SM only, and (3) evaluate the accuracy of RZSM when assimilating SM from the SMAP L2SMP product.

General description
Mexico is a large country made up of different ecosystems and diverse climates, which makes it exceptionally suitable for agricultural activities. Specifically, an area with a large presence of rain-fed agriculture and a temperate subhumid climate is located between the parallels 22° and 18° N. This region is characterized by the presence of volcanoes and mountains. The soil types in this region are loam and sandy loam, which have a permanent wilting point of about 0.15 m³/m³ based on field observations (Inzunza-Ibarra et al. 2018), classing the area as a region with atypical or extreme conditions for crop development. Within this area lies the municipality of Huamantla in the State of Tlaxcala, Mexico (see Figure 1). This municipality is considered the main producer of Mexican native corn, which is the most important crop and covers approximately 65% of the total cultivated land in the zone (Torres-Salcido et al. 2015). This positions Huamantla as one of the most important regions of the agricultural sector in Mexico. The zone has a temperate subhumid climate with a rainy season that usually lasts from April to September. In winter, temperatures range between 2 and 18 °C within the same day. The main relief feature is the Malinche volcano, and the altitude of the area varies between 2300 and 3000 meters above sea level.
In this region, two corn varieties are mainly cultivated: creole and hybrid. The creole cultivar is the native corn in Central Mexico, whereas the hybrid corn has been planted recently to withstand the drier conditions caused by climate change in the region. Both varieties have a high root density down to 50-60 cm depth in the soil, where most (about 80%) of the roots are located, and the corn plant has a growing season of 6 months on average (Altieri and Trujillo 1987; Maria-Ramírez, Volke-Haller, and Guevara-Romero 2017; Velázquez-Cardelas et al. 2018).

The THEXMEX-18 dataset
The Terrestrial Hydrology Experiments in Mexico (THEXMEXs) are a series of field experiments conducted over different Mexican biomes to monitor the dynamics of soil and vegetation (Monsiváis-Huertero et al. 2019). The Terrestrial Hydrology Experiment in Mexico 2018 (THEXMEX-18) was carried out in Huamantla to monitor surface parameters in rainfed fields of corn from April 14th to October 14th in 2018. During the THEXMEX-18, an area located between the parallels 19°11' and 19°27' North latitude and meridians 97°47' and 98°02' West longitude was covered. The following measurements were collected:
1. TDR and soil temperature stations. Ground stations were programmed to collect data every 20 min. Each station included sensors of SM and soil temperature. Soil moisture was measured using CS616 TDR sensors from Campbell Scientific installed horizontally at depths of 2.5, 5, 10, 20, and 30 cm. The SM measurements were calibrated at each depth and for each site using soil samples collected monthly to cover different SM conditions. The calibrated SM measurements showed a difference below 0.03 m³ m⁻³ when compared to gravimetric SM. For soil temperature, 108L thermistors from Campbell Scientific were used at the same depths. One station containing a set of sensors was located at each monitored site.
2. Theta probe measurements. Surface SM (0-5 cm) was measured vertically using an ML3 ThetaProbe SM device. The measurements were uniformly distributed within the corn fields to characterize the spatial distribution of the surface SM. The surface soil temperature was measured at the same points and at the same time as the soil moisture.
3. Gravimetric sampling and soil texture. Soil samples were extracted at depths of 2.5, 5, 10, 20, and 30 cm with a constant volume of 125 cm³ (for depths of 2.5 and 5 cm) and 250 cm³ (for depths of 10, 20, and 30 cm). The samples were placed in a plastic bag to minimize moisture loss, weighed wet, oven dried for 24 h at 100 °C, and reweighed dry to obtain the gravimetric soil moisture. Finally, using the dried soil samples and applying the sieving method, the soil texture in terms of percentages of sand, clay, and silt was determined.
4. Vegetation measurements. A vegetation measurement protocol was carried out every 3 weeks to obtain descriptive parameters of the vegetation such as vegetation water content (VWC), plant height, plant width, and leaf area index (LAI) (see Figure 2). To obtain a representative description, the measurements were carried out at 3 different sites within each crop field.
5. Meteorological forcings. An additional module of the ground stations included sensors measuring relative humidity, air temperature, and precipitation. These sensors collected information every 20 min on the fields to characterize meteorological conditions in the study area. Additionally, we included the information collected by the meteorological stations of CONAGUA (National Water Commission of Mexico) to complement the meteorological conditions in the Huamantla region.
6. Land cover map. The land cover map was computed using information from the National Institute of Statistics and Geography of Mexico (INEGI) and a classification algorithm based on a Genetic Algorithm (GA) and a Support Vector Machine (SVM) code to process Sentinel-2 images (Mountrakis, Im, and Ogole 2011). The training and validation steps used 50 agricultural fields. The final classification showed an overall accuracy higher than 80% at 36-km scale. Table 2 shows the percentage of the main land cover classes and Figure 1 shows the land cover map during the THEXMEX-18. Fraction cover of the vegetation classes was evaluated using the temporal information of the land cover map.
In addition to the in situ information, concurrent SMAP L2 soil moisture product (SMAP L2SMP) data (O'Neill et al. 2018) were acquired. NASA launched the Soil Moisture Active Passive (SMAP) satellite in January 2015 with a radar (1.26 GHz) and a radiometer (1.41 GHz). Due to a radar failure in July 2015, the active sensor stopped working; however, the SMAP radiometer continues acquiring observations in an ascending orbit at 6 pm and in a descending orbit at 6 am, local time. SM products are derived from brightness temperature (TB) observations at different spatial resolutions. The passive level-2 SM product (L2SMP, version 7) has a grid resolution of 36 km, based on the Equal-Area Scalable Earth (EASE) Grid, version 2. To retrieve SM, a passive τ-ω model is used as the forward model (Chan et al. 2016). For this work, we use the L2SMP product based on the V-polarization Single Channel Algorithm (SCA-V) from April 14th to October 14th in 2018 in the assimilation framework. Currently, the SCA-V algorithm is considered the default option to generate the SMAP SM retrievals (O'Neill et al. 2018).

LSP model and EnKF algorithm

Land surface process (LSP) model
The LSP model simulates in 1-d the transport of moisture and energy in the atmosphere, canopy, soil surface, and vadose zone (Judge, Abriola, and England 2003; Judge et al. 2008). This model simulates energy and moisture transport in soil and vegetation using a fuzzy-type equation, and estimates energy and moisture fluxes at the soil surface and in the root zone, including dynamic vegetation conditions. The main equations that describe the energy balance within the LSP model follow Casanova and Judge (2008). Equations (1) and (2) represent the energy balance between the soil and the canopy. In these equations, $Q_{net,c}$ is the energy flux at the canopy, $Q_{net,s}$ is the energy flux at the soil, $H_{sc}$ are the heat fluxes between canopy and soil, $H_{ca}$ are the heat fluxes between canopy and air, $H_{sa}$ are the heat fluxes between soil and air, $LE_{tr}$ are the latent heat fluxes of transpiration, $LE_{ev}$ are the latent heat fluxes of canopy evaporation, $LE_{es}$ are the latent heat fluxes of soil evaporation, $R_{s,c}$ is the net solar radiation intercepted by the canopy, and $R_{l,c}$ is the net long-wave radiation at the canopy. Equation (3) describes the moisture balance based on the infiltration of moisture at the soil surface, where $I_{net,s}$ is the net infiltration of moisture at the soil surface, $P_{fB}$ is the variable related to precipitation, D is the canopy drainage from the canopy to the soil, E is the soil evaporation, R is the runoff and D is the rate of change in moisture intercepted by the canopy. Equations (4), (5), and (6) define the soil processes based on heat and moisture transport in the soil, where θ is the volumetric soil moisture, T is the soil temperature, $C_{s,v}$ is the volumetric heat capacity of soil, $q_h$ is the heat flux, $q_l$ is the liquid flux, and $q_m$ is the moisture transport in the soil. In general, these equations depend on the amount of moisture contained in the soil, vegetation, and atmosphere, on the temperature, and on how they interact with each other. Although most of the parameters used were obtained from in situ measurements during the THEXMEX-18, some had to be taken from a literature review because they are difficult to measure in the field.

Ensemble Kalman filter (EnKF) algorithm
The assimilation algorithm implemented in this study is based upon the EnKF, as described in Evensen (2003), and has been used with favorable results in the assimilation of microwave observations over agricultural fields (e.g. Zhao, Chen, and Shen 2013; Monsivais-Huertero et al. 2016). The EnKF uses an ensemble of simulations composed of different members. Each of these members represents possible SM conditions. The variation among members indicates the propagation of uncertainty from different sources within the LSP model (see Table 4). Equation (7) represents the states of the EnKF that form the ensemble members, A, shown in Equation (8).
In these equations, $f(\cdot)$ represents a non-linear model, $x^{i-}_{t}$ indicates the prior state of the i-th ensemble member at time t before the update, $x^{i+}_{t-1}$ indicates the posterior state of the i-th ensemble member at time t−1, $u^{+}_{t-1}$ represents the parameters of the non-linear model, $u^{i}_{t-1}$ represents the meteorological forcings, and $v^{i}_{t-1}$ is the model error. The matrix A (Equation (8)) is formed by the ensemble members $x^{i}$, and N indicates the number of ensemble members.
Uncertainty in observations is mainly due to random perturbations inherent to the sensor. Equation (9) describes the construction of the ensemble of perturbed observations.
The elements d are observations at time t described by $d = h(x) + \varepsilon$, where $h(\cdot)$ is the operator relating the state space to the observation space and ε represents the observation error with zero mean. The ensemble of perturbed observations, D, is then built from the elements d, with γ denoting the ensemble of perturbations.
The representative equation of the EnKF can be written as $A^{+} = A^{-} + K(D - HA^{-})$ (Monsiváis-Huertero et al. 2010; Evensen 2003), where $A^{+}$ denotes the ensemble of posterior states, $A^{-}$ denotes the ensemble of prior states, K represents the Kalman gain, and H is the operator relating the ensemble of perturbed observations to the ensemble of states. The ensemble perturbation matrix, $A'$, is expressed as $A' = A - \bar{A}$, where $\bar{A}$ holds the ensemble mean in each column.
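The analysis step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the textbook perturbed-observation EnKF, not the implementation used in this study: the function name `enkf_update`, the linear observation operator `H`, the prescribed diagonal observation-error covariance, and the toy 35-node ensemble are all assumptions made for the example.

```python
import numpy as np

def enkf_update(A_prior, d, H, obs_std, rng):
    """One EnKF analysis step with perturbed observations (illustrative sketch).

    A_prior : (n_state, N) prior ensemble, one member per column
    d       : (n_obs,) observation vector
    H       : (n_obs, n_state) linear observation operator (assumed linear here)
    obs_std : observation error standard deviation (same for all observations)
    """
    n_obs, N = d.size, A_prior.shape[1]
    # Ensemble of perturbed observations: D = d + gamma
    gamma = rng.normal(0.0, obs_std, size=(n_obs, N))
    D = d[:, None] + gamma
    # Ensemble perturbation matrix: A' = A - ensemble mean
    A_pert = A_prior - A_prior.mean(axis=1, keepdims=True)
    HA = H @ A_prior
    HA_pert = H @ A_pert
    # Sample covariances P H^T and H P H^T estimated from the ensemble
    P_HT = A_pert @ HA_pert.T / (N - 1)
    HPHT = HA_pert @ HA_pert.T / (N - 1)
    R = (obs_std ** 2) * np.eye(n_obs)  # prescribed observation-error covariance
    K = P_HT @ np.linalg.inv(HPHT + R)  # Kalman gain
    # Posterior ensemble: A+ = A- + K (D - H A-)
    return A_prior + K @ (D - HA)

# Toy example: 35 soil nodes, 100 members, one observation of the top node
rng = np.random.default_rng(0)
A = 0.20 + 0.05 * rng.standard_normal((35, 100))
H = np.zeros((1, 35))
H[0, 0] = 1.0
A_post = enkf_update(A, np.array([0.25]), H, 0.02, rng)
```

In this toy case the update pulls the ensemble mean of the observed node toward the observation and narrows its spread, while unobserved nodes change only through their sample covariance with the observed node.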

Methodology
An EnKF-based assimilation framework was implemented to estimate RZSM using the LSP model in Huamantla, Tlaxcala. The methodology of this work is made up of several steps (see Figure 3). First, a database was assembled from meteorological information, soil and vegetation parameters obtained from the THEXMEX-18 campaign, and satellite information. Afterwards, the LSP model was calibrated to obtain the optimal parameters for the site. The next step was the coupling of the LSP model and the EnKF algorithm to form the assimilation algorithm, in order to compare the SM simulations with in situ measurements and SM retrievals from the SMAP L2SMP product.

LSP calibration
The LSP model is forced with seven micrometeorological parameters and requires 13 ground surface parameters and 15 parameters related to vegetation. For this work, the soil profile is defined by two layers with different properties. Based on soil texture, the first layer extends from the soil surface (0 m) to 0.20-m depth and contains the TDR measurements at 2.5, 5, and 10 cm. The second layer extends from 0.2 m to 1.7 m and contains the measurements at 20 and 30 cm. The optimal case was identified with the Pareto front method by comparing both strata. The parameters that could not be obtained from in situ measurements were calibrated using the Monte Carlo method with the ranges shown in Table 3, based on the literature. The parameter calibration was conducted using the mean square error as objective function. Soil parameters were calibrated under bare soil conditions and those describing the canopy throughout the vegetated period.
Table 3. Ranges of the calibrated parameters, from Goudriaan (1977); ranges for soil parameters were from Rossi and Nimmo (1994).
Base assimilation rate (kg CO2/m² s): −10⁻⁸ to −10⁻¹⁰
e_photo, photosynthetic efficiency (kg CO2/J): 10⁻⁷ to 10⁻⁵
Soil: a, slope parameter for r_s (m² s/kg)
To complement the meteorological conditions, the long-wave radiation was calculated following Idso and Jackson (1969) and Satterlund (1979), where $R_l$ is the long-wave radiation, σ is the Stefan-Boltzmann constant, and $T_A$ is the air temperature. The rest of the parameters were obtained from the in situ stations.
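The Monte Carlo calibration with a mean-square-error objective can be illustrated with a short sketch. The example below assumes uniform sampling within each parameter range (the sampling distribution is not stated in the text) and replaces the LSP model with a hypothetical exponential dry-down function `toy_model`; the parameter names `sm_res`, `sm_0`, and `tau` are invented for the illustration, and the Pareto-front comparison between strata is not included.

```python
import numpy as np

def monte_carlo_calibrate(model, observed, ranges, n_samples, rng):
    """Sample parameter sets uniformly within `ranges` and keep the lowest-MSE set."""
    best_params, best_mse = None, np.inf
    for _ in range(n_samples):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        mse = float(np.mean((model(params) - observed) ** 2))
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse

# Hypothetical stand-in for the SVAT model: an exponential soil dry-down
t = np.arange(30.0)  # days

def toy_model(p):
    return p["sm_res"] + (p["sm_0"] - p["sm_res"]) * np.exp(-t / p["tau"])

# Synthetic "observations" generated from known parameters
observed = toy_model({"sm_res": 0.10, "sm_0": 0.30, "tau": 8.0})

# Illustrative search ranges (not the ranges of Table 3)
ranges = {"sm_res": (0.05, 0.20), "sm_0": (0.20, 0.40), "tau": (2.0, 20.0)}
best, mse = monte_carlo_calibrate(toy_model, observed, ranges, 2000,
                                  np.random.default_rng(1))
```

With a real SVAT model each call to `model` is a full simulation, so the number of samples is usually limited by run time rather than chosen freely.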

Assessment of SMAP L2SMP product
The SMAP L2SMP product over the agricultural region of Huamantla is assessed by comparing the SMAP SM retrievals with the upscaled SM measurements. The upscaled SM is obtained by using the soil-weighted method, as recommended by the SMAP team to evaluate the SMAP SM products (Bhuiyan et al. 2018). The upscaled SM is computed as a weighted sum over the soil-texture classes, $SM_{S,t} = \sum_i w_i SM_i$, where $SM_{S,t}$ is the temporal upscaled SM, $s_{S,t}$ is the temporal standard deviation, $w_i$ indicates the weights based on the soil texture map, and $SM_i$ represents the averaged surface SM of all stations located in the i-th soil-texture class. To develop the soil-weighted scaling approach, each in situ surface SM value is classified by its corresponding soil texture. Then, soil texture data are intersected with the SMAP pixel, and percent area statistics are derived for each of the soil texture types. In this work, we use the soil texture map provided by the National Institute of Statistics and Geography of Mexico (INEGI) (2017). The Huamantla region is composed of fluvisol (7.05%), cambisol (47.61%), and regosol (45.34%) (see Table 1). The performance of the SM from the SMAP L2SMP product is evaluated by the bias (Bias), the Root Mean Square Difference (RMSD), the unbiased RMSD (ubRMSD), and the Pearson correlation coefficient (r). The Bias, RMSD, ubRMSD, and r are evaluated following Gruber et al. (2020), where $SM_{SMAP}$ is SM from the SMAP L2SMP product, $SM_{upscaled}$ is the upscaled SM, and N is the number of ($SM_{SMAP,i}$, $SM_{upscaled,i}$) pairs. These pairs are composed of coincident times between SMAP SM retrievals and upscaled SM. In equation (16), the overline indicates the mean, and $s_{SM_{SMAP,i}}$ and $s_{SM_{upscaled,i}}$ represent the standard deviation of the SMAP L2SMP product and the upscaled SM measurements, respectively.
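The soil-weighted upscaling and the four validation metrics can be sketched as follows. The metric definitions used here are the commonly used ones (Bias as the mean of SMAP minus reference, and ubRMSD² = RMSD² − Bias²) and are assumed to match those of the study; the function names and the SM values are illustrative only, while the weights are the soil-class percentages reported above.

```python
import numpy as np

def upscale_sm(class_means, weights):
    """Soil-texture-weighted average of per-class mean SM (rows: times, cols: classes)."""
    w = np.asarray(weights, dtype=float)
    return np.asarray(class_means, dtype=float) @ (w / w.sum())

def validation_metrics(sm_smap, sm_ref):
    """Bias, RMSD, unbiased RMSD, and Pearson r between retrievals and reference."""
    sm_smap = np.asarray(sm_smap, dtype=float)
    sm_ref = np.asarray(sm_ref, dtype=float)
    diff = sm_smap - sm_ref
    bias = float(diff.mean())
    rmsd = float(np.sqrt(np.mean(diff ** 2)))
    ubrmsd = float(np.sqrt(max(rmsd ** 2 - bias ** 2, 0.0)))
    r = float(np.corrcoef(sm_smap, sm_ref)[0, 1])
    return {"bias": bias, "rmsd": rmsd, "ubrmsd": ubrmsd, "r": r}

# Illustrative per-class mean SM (m3/m3) at four coincident times;
# columns follow the order fluvisol, cambisol, regosol
class_means = np.array([[0.20, 0.16, 0.14],
                        [0.22, 0.18, 0.15],
                        [0.18, 0.15, 0.12],
                        [0.25, 0.20, 0.17]])
weights = [7.05, 47.61, 45.34]  # percent area of each soil class in the pixel
upscaled = upscale_sm(class_means, weights)

# Made-up retrievals: a constant wet offset plus a small alternating error
smap = upscaled + 0.04 + np.array([0.01, -0.01, 0.01, -0.01])
stats = validation_metrics(smap, upscaled)
```

Separating the difference into Bias and ubRMSD makes it possible to tell a systematic offset, which a simple correction can remove, from the random component.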

Implementation of the EnKF
For this work, the nonlinear propagator mentioned in equation (7) is the LSP model, and the SM values describing the soil profile are obtained from the LSP model. The state vector, x, is formed by the LSP SM estimates from the 35 nodes, as shown in equation (17). The meteorological forcings are represented by $u^{i}_{t-1}$, and $u^{+}_{t-1}$ are the time-invariant inputs of the LSP model (see equation (7)). The SM observations from the THEXMEX-18 are obtained from measurements at 0-5, 10, 20, and 30 cm.
In equation (17), i is the index of the ensemble member and k represents the number of nodes representing the soil profile in the LSP model. One hundred (N = 100) ensemble realizations were used for assimilation to achieve reliable estimates (Nagarajan et al. 2011; Monsiváis-Huertero et al. 2010). The 0-5 cm SM represents near-surface soil moisture ($SM_{0-5\,cm}$), comparable to SM from the SMAP L2SMP product, and the SM at 0-100 cm represents the RZSM. The 0-5 cm SM and RZSM estimates and observations are calculated as the thickness-weighted average $SM = \sum_{i=1}^{k} SM_i \Delta z_i / \sum_{i=1}^{k} \Delta z_i$, where k indicates the total number of nodes (blocks) within 0-5 cm or the root zone, $\Delta z_i$ is the thickness of the i-th node, and $SM_i$ is the soil moisture at the i-th node. Among all the inputs/forcings to the LSP model, precipitation observations typically have the highest errors compared with other micrometeorological parameters. A Gaussian error with zero mean and standard deviation equal to 12% of the observed precipitation value was introduced during precipitation events (Habib, Krajewski, and Kruger 2001). An error with a Poisson distribution and a mean of 0.45 was introduced in the absence of events. In this paper, forcing variables vary within a physically reasonable range based on Dunne and Entekhabi (2006) and Monsivais-Huertero et al. (2016), as shown in Table 4.
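The layer averaging and the precipitation perturbation described above can be sketched as follows. The node thicknesses and SM values are hypothetical, the Poisson error is assumed to be expressed in the same units as the precipitation record, and clipping negative values to zero is an added assumption rather than a step stated in the text.

```python
import numpy as np

def layer_mean_sm(sm_nodes, dz):
    """Thickness-weighted mean SM over the given nodes: sum(SM_i*dz_i) / sum(dz_i)."""
    sm_nodes = np.asarray(sm_nodes, dtype=float)
    dz = np.asarray(dz, dtype=float)
    return float(np.sum(sm_nodes * dz) / np.sum(dz))

def perturb_precip(precip, rng, rel_std=0.12, dry_mean=0.45):
    """Perturb precipitation for one ensemble member.

    During events: Gaussian error, zero mean, std = rel_std * observed value.
    Without events: Poisson-distributed error with mean dry_mean.
    """
    precip = np.asarray(precip, dtype=float)
    wet = precip > 0
    gaussian = precip + rng.normal(0.0, rel_std * precip)
    poisson = rng.poisson(dry_mean, size=precip.shape)
    noisy = np.where(wet, gaussian, poisson)
    return np.clip(noisy, 0.0, None)  # assumption: no negative precipitation

# Hypothetical node thicknesses (m) and SM values (m3/m3)
dz = np.array([0.01, 0.02, 0.02, 0.05])
sm = np.array([0.12, 0.15, 0.18, 0.22])
sm_0_5 = layer_mean_sm(sm[:3], dz[:3])  # the first three nodes span 0-5 cm

precip = np.array([0.0, 2.0, 0.0, 5.5])
member_precip = perturb_precip(precip, np.random.default_rng(0))
```

Calling `perturb_precip` once per ensemble member, each with its own random draw, produces the spread in forcings that the EnKF propagates through the model.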

Synthetic and field observations
The synthetic truth was obtained from one of the realizations of an open-loop simulation of the LSP model. The truth was not included in the ensemble of 100 members during the assimilation. The field observations were obtained from the THEXMEX-18 experiment, described in Section 2.2. A Gaussian error with zero mean and standard deviation of 0.02 m³ m⁻³ was added to SM values for both synthetic and field observations, based on field information. Both the synthetic and field observations are assimilated every three days at 6 a.m., local time, corresponding to the descending pass of the SMAP satellite. Different assimilation scenarios with synthetic and field SM observations were performed. The assimilation algorithm was evaluated by assimilating SM at four depths (0-5, 10, 20, and 30 cm) and SM at 0-5 cm only. This evaluation was used to understand the improvements in the SM estimates over the complete SM profile for each assimilation scenario.
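The construction of the synthetic observations can be illustrated with a short sketch: values are picked from a truth series at 6 a.m. every three days and perturbed with Gaussian noise of 0.02 m³ m⁻³. The sinusoidal truth series, the 30-day length, and the function name `make_observations` are assumptions for the example; in the study the truth is one open-loop realization of the LSP model.

```python
import numpy as np

def make_observations(truth, times_min, rng, obs_std=0.02,
                      first_min=6 * 60, every_min=3 * 24 * 60):
    """Select truth values at 6 a.m. every 3 days and add N(0, obs_std) noise.

    times_min : integer minutes since local midnight of the first day
    Returns the selected indices and the noisy observations.
    """
    times_min = np.asarray(times_min)
    is_obs = (times_min >= first_min) & ((times_min - first_min) % every_min == 0)
    idx = np.flatnonzero(is_obs)
    noisy = np.asarray(truth, dtype=float)[idx] + rng.normal(0.0, obs_std, size=idx.size)
    return idx, noisy

# 30 days of 20-min time steps with a hypothetical truth series (m3/m3)
times_min = np.arange(0, 30 * 24 * 60, 20)
truth = 0.20 + 0.05 * np.sin(2 * np.pi * times_min / (10 * 24 * 60))
obs_idx, obs = make_observations(truth, times_min, np.random.default_rng(42))
```

Using integer minutes keeps the 6 a.m. selection exact; with fractional hours the modulo test would be sensitive to floating-point rounding.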

LSP calibration
The calibration of the LSP model was carried out for different depths within the study area. The depths considered for the calibration were 0-5, 10, 20, and 30 cm. Compared with previous values (Casanova and Judge 2008), the major differences are mainly due to the different cultivars of corn and soil type. Casanova and Judge (2008) calibrated the LSP model for sweet corn over a sandy soil, whereas in Central Mexico, farmers cultivate hybrid and creole corn over a sandy loam soil. This impacts the optimal values of the parameters related to plant growth, such as σ, $e_c$, $c_d$, $i_w$, $l_w$, $F_b$, and $e_{photo}$. Table 6 shows the RMSD at each depth. The LSP simulations are close to the in situ information, with an RMSD ranging between 0.0149 and 0.0195 m³ m⁻³ at all depths. Figure 4 shows the comparison between the SM estimates at depths of 0-5, 10, 20, and 30 cm from the LSP model and THEXMEX-18 observations. For the period of bare soil (DoY 105-160), a good agreement is observed between estimates from the LSP model and THEXMEX-18 observations at depths of 0-5 and 10 cm. However, during the vegetated period (DoY 160-280), differences up to 3.61% SM, 2.98% SM, 2.76% SM, and 1.58% SM at 0-5 cm, 10 cm, 20 cm, and 30 cm, respectively, are observed during periods of frequent rainfall and in the late season, when the plant is completely developed and the ears are in development. As reported in previous studies, most SVAT models fail to estimate SM during rainfall events, as observed in this case. At depths of 20 and 30 cm, the differences between LSP estimates and SM observations decrease, but differences remain during rainfall events. It is also noted that after DoY 260, when the corn plant started senescence, the differences between the LSP estimates and SM observations decreased because the effects of the vegetation on the energy balance also decrease. This indicates that the main sources of error in the LSP SM estimates during the vegetated period after calibration come from the vegetation parameters. In Figure 4, two anomalous dry periods are identified. The first one is from DoY 160 to 180, and the second one from DoY 200 to 220, for which the SM values for both LSP estimates and SM observations are below the wilting point of 0.15 m³ m⁻³ (Inzunza-Ibarra et al. 2018). From DoY 160 to 180, both LSP simulations and field observations indicated an extremely dry period. At depths of 0-5 and 10 cm, the SM estimates and SM observations are very close. However, at depths of 20 and 30 cm, SM observations show drier conditions than the LSP estimates. In contrast, during the second dry period from DoY 200 to 220, the SM observations show wetter conditions than the LSP estimates. As observed in Figure 4, around DoY 160, several rainfall events reported by the rain gauges were not captured by the TDR observations. At DoY 210, the rainfall intensities reported by the rain gauges are too low to reproduce the observed SM values using the calibrated LSP model. This indicates that the differences between the SM estimates and SM observations are mainly due to uncertainties in the meteorological forcings provided to the LSP model and misrepresentation of soil parameters during extreme conditions.

Performance of the SMAP L2SMP product
The upscaled SM showed a standard deviation < 0.40 m³ m⁻³ (40% SM) throughout the THEXMEX-18 experiment when using the soil-weighted method, which is in agreement with previous works (Bhuiyan et al. 2018; Caldwell et al. 2019). The mean value of the coefficient of variation of 0.314 indicates that the standard deviation is significantly lower than the mean upscaled SM, assuring representative SM conditions at 36 km. Table 7 presents the statistics of the evaluation for the SCA-V SMAP L2SMP product when compared to the upscaled SM measurements. Overall, the SMAP L2SMP product shows an ubRMSD of 0.043 m³ m⁻³ (4.3% SM) for the agricultural region in Huamantla; thus, the SMAP SM retrievals are close to the SMAP mission requirement of ubRMSD < 0.04 m³ m⁻³ (4% SM) (Entekhabi et al. 2010). It is also observed that the bias is the main component of the difference with in situ SM observations. Because the upscaled in situ SM measurements and the SMAP L2SMP product are at the same spatial scale, the RMSD is primarily due to a combined effect of inaccurate values of the input parameters used in the SMAP SM retrieval algorithm, such as surface temperature, vegetation representation, and surface soil roughness (Walker et al. 2019; Judge et al. 2021). When comparing the statistical metrics for bare soil and vegetated conditions, it is found that the bias and the ubRMSD increase by 0.034 m³ m⁻³ (3.4% SM) and 0.023 m³ m⁻³ (2.3% SM), respectively, demonstrating that they are time-variant. Variations in the performance of the SMAP L2SMP product have been identified when going from bare soil to vegetated conditions over agricultural regions (Zheng et al. 2020). During bare soil conditions, the bias is the result of uncertainty in effective soil temperature and soil roughness, whereas during vegetated conditions, it is mainly due to uncertainty in vegetation representation (Judge et al. 2021).
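The statistics reported in Table 7 can be illustrated with a minimal sketch. It assumes the usual definitions (bias as the mean difference, ubRMSD as the RMSD after removing the bias, so that RMSD² = bias² + ubRMSD²) and that retrievals and upscaled measurements are paired in time; the function name `sm_metrics` and the sample values are hypothetical and are not taken from this study.

```python
import numpy as np

def sm_metrics(retrieved, reference):
    """Return bias, RMSD, ubRMSD and Pearson r for paired SM series (m3 m-3)."""
    retrieved = np.asarray(retrieved, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = retrieved - reference
    bias = diff.mean()                               # systematic component
    rmsd = np.sqrt(np.mean(diff ** 2))               # total difference
    ubrmsd = np.sqrt(np.mean((diff - bias) ** 2))    # random component
    r = np.corrcoef(retrieved, reference)[0, 1]      # linear correlation
    return {"bias": bias, "rmsd": rmsd, "ubrmsd": ubrmsd, "r": r}

# Hypothetical values for illustration only.
in_situ = np.array([0.12, 0.18, 0.25, 0.21, 0.15])
satellite = np.array([0.17, 0.21, 0.31, 0.27, 0.18])
stats = sm_metrics(satellite, in_situ)
print({name: round(float(value), 4) for name, value in stats.items()})
```

Separating the two components in this way makes it explicit that a retrieval can be close to an ubRMSD requirement while still carrying a bias that dominates the RMSD.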
The characterization of the systematic component (bias) and the random component (ubRMSD) of the difference is not an easy task, and the SMAP SM retrievals are usually assumed to be unbiased when included in an assimilation framework (Mladenova et al. 2019). This is true for regions covered by the SMAP core validation sites; however, it has been demonstrated that this assumption is not valid for other regions (Singh et al. 2019; Zheng et al. 2020). Bias correction methods range from simple algorithms (Ryu et al. 2009) to more complex correction frameworks (Monsivais-Huertero et al. 2016) based on the sources causing the bias. Because the sources of bias in the SMAP SM retrievals do not degrade the mean performance of the retrieval algorithm according to the SMAP mission requirement for ubRMSD, it is possible to use a simple bias-correction method (Ryu et al. 2009). The bias can be considered constant within specific temporal intervals based on the land cover conditions over the agricultural region. In this study, we correct the bias using a constant value of 0.039 m³ m⁻³ (3.9% SM) for bare soil conditions and 0.073 m³ m⁻³ (7.3% SM) for vegetated conditions, based on Table 7. Table 7 also shows the statistics of the SM estimates at 0-5 cm from the calibrated LSP model when compared to upscaled in situ measurements. It is observed that the ubRMSD from the LSP model and the SMAP L2SMP product is < 0.045 m³ m⁻³ (4.5% SM) for all conditions, and both LSP SM estimates and SMAP SM show the same order of magnitude in ubRMSD. Nevertheless, a significant difference is observed when comparing the biases from the LSP SM estimates and the SMAP SM retrievals.
Table 7. Root Mean Square Difference (RMSD), Bias, unbiased RMSD (ubRMSD), and correlation coefficient (r) when comparing the upscaled SM measurements at 0-5 cm with the SM from SMAP L2SMP product and the 0-5 cm SM estimates from the calibrated LSP model at 36-km scale.
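The piecewise-constant bias correction described above can be sketched as follows. The two bias values are those stated in the text; the sign convention (bias = SMAP minus in situ, so the correction subtracts it), the assignment of DoY 160 to the vegetated period, and the function name `correct_smap_bias` are assumptions made for illustration.

```python
import numpy as np

BARE_SOIL_BIAS = 0.039        # m3 m-3, value stated in the text
VEGETATED_BIAS = 0.073        # m3 m-3, value stated in the text
VEGETATION_START_DOY = 160    # assumed boundary between the two periods

def correct_smap_bias(doy, smap_sm):
    """Subtract a land-cover-dependent constant bias from SMAP SM retrievals."""
    doy = np.asarray(doy)
    smap_sm = np.asarray(smap_sm, dtype=float)
    # Select the bias for each retrieval according to its day of year.
    bias = np.where(doy < VEGETATION_START_DOY, BARE_SOIL_BIAS, VEGETATED_BIAS)
    return smap_sm - bias

# Hypothetical retrievals: one in the bare soil period, one in the vegetated period.
print(correct_smap_bias([120, 200], [0.20, 0.30]))
```

A constant offset leaves the ubRMSD of the retrievals unchanged, which is why this kind of correction is only appropriate when the random component already meets the intended accuracy.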
Table 8 presents the RMSD and average standard deviations (ASD) in RZSM when assimilating depths of 0-5, 10, 20, and 30 cm and 0-5 cm only, using synthetic and THEXMEX-18 observations. In all cases, the ASD in RZSM decreases by 1.39-1.57% SM (50-57%) compared to open-loop when assimilating either 4 depths or 0-5 cm only. This indicates that the assimilation process significantly reduces the uncertainty in RZSM estimates. The highest reduction in the ASD occurs when assimilating 4 depths. For the synthetic case, the assimilation of the 4 depths improves the RZSM estimates by 2% SM (59%) compared to open-loop, whereas the assimilation of the top 5 cm of the soil improves them by 1.8% SM (54%). In contrast, the assimilation of 4 depths and 0-5 cm from the THEXMEX-18 experiment improves the RZSM by 0.7% SM (28%) and 0.3% SM (13%), respectively, compared to open-loop simulations. As expected, the assimilation of synthetic observations shows higher improvement compared to field observations. Synthetic observations were obtained from an LSP simulation; thus, these observations follow the same physics in soil and vegetation represented in the LSP equations (see section 3.1). Unlike synthetic observations, THEXMEX-18 SM observations captured the actual SM conditions in the soil, including heterogeneity in texture between sites and physical processes that could be misrepresented in the LSP model. Despite these sources of error, the percentage of improvement in RZSM when assimilating real observations over the agricultural area in Central Mexico is of the same order of magnitude as in previous studies (e.g. Monsiváis-Huertero et al. 2010; Nagarajan et al. 2011).

Assimilation of different SM depths
When comparing the assimilation of 4 depths to the assimilation of 0-5 cm only for synthetic observations, it is observed that the improvement in RZSM is about 1.9% SM (55%) in both cases, compared to open-loop. However, this is also a consequence of the synthetic observations following the same physics as the LSP model. For THEXMEX-18 observations, the improvement in RZSM estimates when assimilating 0-5 cm only is 3.7% SM (15%) lower than when assimilating 4 depths. When assimilating the SM at the top 5 cm of the soil, the Kalman equation (see Equation (10)) statistically propagates the improvement in the top layer into the deeper layers, based on the LSP simulations at the time of the assimilation. Thus, improvement at deeper layers could be affected under certain conditions, such as a misrepresentation of the soil strata. Table 9 compares the SM values at depths of 0-5, 10, 20, and 30 cm after assimilating observations at 4 depths and 0-5 cm only, for both synthetic and THEXMEX-18 observations. For synthetic observations, the improvement in RMSD at all depths is 1.77-2.12% SM (54-60%) and 1.59-1.96% SM (48-56%) when assimilating 4 depths and 0-5 cm, respectively, compared to open-loop.
Table 8. Root mean square differences (RMSD) and average standard deviation (ASD) in RZSM estimates when assimilating synthetic and field observations at depths of 0-5, 10, 20, and 30 cm (4 depths), 0-5 cm only, and SMAP SM retrievals. ubSMAP SM stands for unbiased (bias-corrected) SMAP SM retrievals.
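The way a near-surface observation updates deeper layers through the ensemble covariance can be illustrated with a generic stochastic (perturbed-observation) EnKF update. This is a textbook sketch, not necessarily the exact formulation of Equation (10); the four-layer state, the ensemble size, the observation error, and all numerical values are hypothetical.

```python
import numpy as np

def enkf_update(ensemble, obs, obs_std, H, rng):
    """Stochastic EnKF analysis step.

    ensemble: (n_state, n_members) prior states, e.g. SM per soil layer
    obs:      (n_obs,) observed values
    obs_std:  observation error standard deviation (same for all obs)
    H:        (n_obs, n_state) linear observation operator
    """
    obs = np.asarray(obs, dtype=float)
    n_members = ensemble.shape[1]
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (n_members - 1)      # sample state covariance
    R = np.eye(len(obs)) * obs_std ** 2                # observation error covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)       # Kalman gain
    # Each member assimilates its own perturbed copy of the observation.
    perturbed = obs[:, None] + rng.normal(0.0, obs_std, size=(len(obs), n_members))
    return ensemble + K @ (perturbed - H @ ensemble)

# Hypothetical 4-layer ensemble whose layers are strongly correlated.
rng = np.random.default_rng(0)
n_members = 200
base = rng.normal(0.25, 0.03, size=n_members)
prior = np.vstack([base + rng.normal(0.0, 0.005, size=n_members) for _ in range(4)])
H_top = np.array([[1.0, 0.0, 0.0, 0.0]])               # only the top layer is observed
posterior = enkf_update(prior, np.array([0.20]), 0.01, H_top, rng)
print(prior.mean(axis=1).round(3), posterior.mean(axis=1).round(3))
```

In this sketch, the deeper layers move only as far as their sample covariance with the top layer allows, which is why a misrepresented soil profile in the model limits what a surface observation can correct at depth.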

In order to propagate the improvement in SM estimates at the top 20 cm of the soil, it is necessary to use other assimilation techniques such as the simultaneous state-parameter update (e.g. Monsiváis-Huertero et al. 2010) and obtain a new value for some of the LSP parameters at each assimilation time. Figure 5(a,b) shows the comparison between the time series of the SM at 0-5 cm and the RZSM from open-loop simulations and when assimilating 0-5 cm observations from THEXMEX-18 measurements. This figure shows that after every assimilation point, the ensemble standard deviation decreases, and this enhancement is gradually lost as time goes on. It is also observed that during bare soil conditions (DoY 105-160), both open-loop simulations and SM estimates at 0-5 cm and in the RZSM follow a trend similar to THEXMEX-18 observations, even during extremely dry conditions (see for instance DoY 140 to 160). However, when the assimilated values occur close to a rainfall event, the estimates of SM at 0-5 cm and RZSM predict wetter conditions than those described by the in situ observations. This indicates that the drydown in the soil occurs faster than that represented in the LSP equations. This behavior is more evident in the RZSM because the effect of the misrepresentation of soil infiltration is propagated through all layers. For vegetated conditions (DoY 160-280), open-loop simulations predict wetter conditions than THEXMEX-18 observations for both SM at 0-5 cm and RZSM, particularly from DoY 180 to 220 and from DoY 260 to 280. After assimilating the 0-5 cm depth, the SM estimates are closer to in situ observations. Nevertheless, when the assimilated value is outside the region of one standard deviation (represented with a shaded area in the figure), the posterior values of SM improve only marginally after the assimilation process, and the physics represented in the LSP model brings both the SM at 0-5 cm and the RZSM back to the trend of the SM simulations.
Table 9. Root mean square differences (RMSD) and average standard deviations (ASD) of SM at different depths of the soil when assimilating SM observations at depths of 0-5, 10, 20, 30 cm and 0-5 cm only for synthetic and field observations and unbiased (bias-corrected) SMAP SM retrievals.
... and although the assimilation algorithm compensates for this difference at the 3 depths, the improvement is lost after some simulation steps and SM estimates continue to be biased high compared to field observations. As concluded in Monsivais-Huertero et al. (2016), the extended-state-vector technique to update states and parameters simultaneously could be useful to reduce the biases during the assimilation process. Figure 7 compares the soil moisture profile up to 1-m depth for open-loop simulations and when assimilating 0-5 cm observations during the THEXMEX-18 experiment every 3 days at 6 a.m. It is found that during bare soil conditions, the LSP model provides SM simulations similar to the assimilation of 0-5 cm from THEXMEX-18 observations up to 1 m. From DoY 140 to 160, both the LSP model and the assimilated SM estimates show dry conditions up to 65 cm, and wetter conditions are observed at deeper layers. During the vegetated period (DoY 160-280), the LSP simulations show wetter conditions for depths up to 60 cm compared to assimilated in situ observations. At depths deeper than 60 cm, the LSP simulations and the assimilated SM estimates are closer. This difference could be mainly due to the presence of high root density in the top 60 cm. It is also noted that the effect of fast infiltration is depicted by the assimilation of THEXMEX-18 observations up to a depth of 70 cm. This indicates that during long dry periods such as DoY 180-220, the plants may suffer from severe water stress if they do not develop roots deeper than 65 cm.
Overall, it was found that the assimilation of 0-5 cm SM observations improved the ASD of the RZSM to a degree comparable to the assimilation of depths at 0-5, 10, 20, and 30 cm. However, the improvement in the RMSD after assimilating 0-5 cm SM observations was smaller than the improvement from assimilating four depths. In order to reduce the RMSD in the RZSM, future work should investigate the combination of SM information at depths down to 30 cm with SM at 0-5 cm.
Table 8 presents the RMSD and ASD of the RZSM when assimilating unbiased (bias-corrected) SMAP SM at 36 km every 3 days. The ASD in RZSM estimates improves by 1.34% SM (49%) compared to open-loop simulations. However, the RZSM estimates do not improve when assimilating the SMAP SM product; moreover, the RZSM values are farther away from in situ SM observations. In Table 9, it is observed that only SM estimates at depths of 0-5 and 10 cm marginally improve, by 0.1% SM (4.5%), compared to open-loop simulations. In contrast, SM estimates at depths of 20 and 30 cm are farther away from THEXMEX-18 in situ observations. This lets us conclude that although the certainty in the SM estimates increases, the actual values only marginally improve at the top 10 cm of the soil. Figure 5 shows the estimates of SM at a depth of 0-5 cm and in the RZSM. When comparing the SM estimates at 0-5 cm and in the RZSM, it is seen that the use of SMAP SM retrievals in the assimilation framework improves only the top 5 cm of the soil. This result is in agreement with the values presented in Table 9. This suggests that, to exploit the SMAP SM retrievals and estimate RZSM conditions over agricultural areas in Central Mexico, it is still necessary to improve the satellite SM retrievals. Figure 6 presents the comparison between the SM estimates at depths of 0-5, 10, 20, and 30 cm from open-loop simulations and when assimilating the unbiased SMAP SM retrievals.
It is observed that for depths deeper than 10 cm, the assimilation does not improve the SM estimates. The trend in the SMAP SM retrievals produces an erratic trend at depths deeper than 10 cm, resulting in SM estimates far from field observations. In general, when assimilating SMAP SM retrievals, the trend of the SM estimates is more stable during dry periods than during rainy periods. This behavior explains the statistics presented in Table 9. The information provided by the SMAP SM retrievals is considered by the Kalman equation only when it is within the one-standard-deviation region; otherwise, the assimilation process mainly follows the physics computed in the LSP model. That is why, from DoY 220-240, the posterior SM conditions after assimilation do not represent real SM values according to the LSP equations and the LSP SM simulations go back to open-loop conditions, losing the improvement in the RZSM values. Figure 7(c) presents the soil moisture profile when assimilating unbiased SMAP SM retrievals every 3 days at 6 a.m. It is found that when using the SMAP information, SM values show wetter conditions for the complete soil moisture profile compared to the assimilation of THEXMEX-18 information. In addition, it is also noted that the SMAP SM retrieval algorithm provides wetter conditions than field observations during rainfall. This impacts the SM estimates because after low-intensity rainfall, there is a significant increase in the SM values throughout the complete soil moisture profile. The high values of SM from the SMAP retrieval algorithm after rainfall have already been reported in previous studies. Unlike the results presented in Tables 8 and 9 and Figures 5 and 7(c), studies such as Mladenova et al. (2019, 2020) have successfully assimilated SMAP information to improve SM estimates at different depths of the soil after correcting the bias. Colliander et al. (2017) showed that most of the core validation sites used to calibrate SMAP SM products show low bias. However, in this study, we monitored an agricultural area that is not included as a core validation site in the SMAP mission. As mentioned in Colliander et al. (2017) and Chan et al. (2016), the accuracy of the SMAP SM retrievals highly depends upon the calibration of the parameter values used in the τ-ω model, particularly on the parametrization to account for roughness and vegetation conditions. Nevertheless, this is not an easy task. For instance, when comparing results from Casanova, Judge, and Jones (2006), Walker et al. (2019), and Judge et al. (2021) over corn fields, it is evident that the h-empirical roughness parameter, the b factor, and the vegetation water content have a wide range of values depending upon the corn cultivar. Currently, the soil moisture community needs to exploit measurements from sparse soil moisture networks to characterize regions lacking information, such as Latin America.

Assimilation of SM from the SMAP L2SMP product
In this section, we showed that the assimilation of SM retrievals from the SMAP L2SMP product reduced the ASD of the RZSM estimates, compared to open-loop simulation. The RMSD decreased at depths of 0-5 and 10 cm; however, at depths of 20 and 30 cm it did not improve when compared to open-loop SM. By using SMAP SM retrievals at 36 km within an assimilation framework, it is possible to infer SM conditions at the top 10 cm, replacing the need for in situ SM sensors. To improve the assimilated SM at depths deeper than 20 cm, it is necessary to improve the SMAP SM retrievals and reduce the bias and ubRMSD over agricultural areas before going into an assimilation framework, as reported in Mladenova et al. (2019) and Wigneron et al. (2017). The improvement of SMAP SM retrievals requires collaborative efforts, particularly over regions lacking information. In the next section, we discuss the open questions remaining on this topic.

Discussion
Agricultural regions in developing countries, such as Mexico, affected by limited infrastructure require information on SM conditions in the root-zone column at regular intervals shorter than 2 weeks. More regular information about SM conditions is very useful to understand the effects of climate change on the precipitation pattern and, if necessary, re-evaluate the planting dates for economic activities based on rainfed local crops. This need can be partially covered by passive microwave sensors on board satellite missions such as the NASA SMAP (Forgotson et al. 2020). Because the penetration of these sensors is limited to the top 5 cm of soil (Ulaby et al. 2014), it is necessary to implement methodologies such as data assimilation to estimate the SM conditions in the root-zone (Reichle et al. 2018). In this work, we found that the standard deviation of RZSM estimates when assimilating in situ SM measurements at 0-5 cm is comparable to that when assimilating SMAP SM retrievals from the L2SMP product with a spatial resolution of 36 km (see Tables 8 and 9). The improvement in the standard deviation indicates that the uncertainty in the value is reduced; thus, the random component of the error is also reduced. However, the RMSD of the RZSM estimates increases by 0.015 m³ m⁻³ when assimilating SMAP SM retrievals compared to the assimilation of upscaled SM measurements, mainly due to the propagation of systematic errors within the LSP model and the EnKF algorithm. Section 5.4 shows the potential benefit of assimilating bias-corrected SMAP SM retrievals to provide actual conditions in the root-zone column, removing the dependence on in situ SM measurements.
In general, assimilation frameworks have the drawback that if the observations are more than one standard deviation away from the SM ensemble mean, the update could produce unrealistic new SM conditions and the improvement from the EnKF will be lost, since the physically based model will return to the open-loop behavior (Monsiváis-Huertero et al. 2010). For instance, from DoY 220-240, Figure 6 shows that SM observations from the SMAP SM retrievals are more than one standard deviation away from the SM ensemble mean from the LSP model, resulting in loss of the improved conditions as the LSP model brings the SM conditions back to the open-loop estimates. In this case, it is necessary to revise the validity of the assumption of constant parameters within the physically based model and the reliability of the SM retrievals in that specific period. Studies such as Monsivais-Huertero et al. (2016) have proposed the implementation of more complex algorithms to simultaneously correct random and deterministic errors in an assimilation framework based on the EnKF. Among different bias correction methods, the simultaneous update of states and parameters within the assimilation framework seems to be the best option due to the computing time and the propagation of the updated conditions within the physically based models. As with the state-only update within the EnKF, the main drawback of this technique is that if the observations are more than one standard deviation away from the SM ensemble mean, the update could produce unrealistic SM conditions and incorrect parameter values. Wigneron et al. (2017) analyze the sources of errors within SM retrieval algorithms based on passive microwave observations and show that most existing assimilation frameworks to improve SM assume bias-corrected observations (e.g. Mladenova et al. 2019).
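The one-standard-deviation condition discussed above can be checked before each update with a simple diagnostic. The sketch below assumes a scalar near-surface SM observation compared against the ensemble of modeled SM for the same layer; the function name `outside_one_sigma` and the sample values are hypothetical, and the check is offered as an illustration rather than as part of the implemented algorithm.

```python
import numpy as np

def outside_one_sigma(ensemble_sm, observation):
    """Return True if the observation lies more than one ensemble
    standard deviation away from the ensemble mean."""
    ensemble_sm = np.asarray(ensemble_sm, dtype=float)
    mean = ensemble_sm.mean()
    std = ensemble_sm.std(ddof=1)     # sample standard deviation
    return bool(abs(observation - mean) > std)

# Hypothetical ensemble of 0-5 cm SM (m3 m-3) at one assimilation time.
members = [0.20, 0.22, 0.24, 0.26, 0.28]
print(outside_one_sigma(members, 0.25))   # observation close to the ensemble mean
print(outside_one_sigma(members, 0.30))   # observation far from the ensemble mean
```

Flagging such dates makes it easier to identify the periods in which the constant-parameter assumption or the reliability of the retrievals should be revisited.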
The uncertainty of SM estimates from the SMAP radiometer is a combined effect of the calibration of in situ SM sensors (McNairn et al. 2015; Gruber et al. 2020), pixel heterogeneity at the satellite scale, the mismatch in spatial scale between in situ SM measurements and satellite observations (Bhuiyan et al. 2018; Caldwell et al. 2019), and incorrect values of the input parameters within the SMAP SM retrieval algorithms (Judge et al. 2021).
In order to minimize the errors due to sensor calibration in this study, we calibrated the SM sensors independently for each field and at each depth using soil samples, with a difference < 0.03 m³ m⁻³ when compared to gravimetric SM in all cases (see Section 2.2), similar to previous SM studies (McNairn et al. 2015; Bhuiyan et al. 2018). This methodology was developed during the design of the NASA SMAPVEXs (McNairn et al. 2015) and is suggested to be implemented before evaluating the performance of SM retrievals from satellite observations (Gruber et al. 2020). In these protocols, it is recommended to collect soil samples as frequently as possible during different meteorological conditions to cover a wide range of SM values. However, this condition could impact the representativeness of large regions and the duration of field experiments using temporary and/or sparse SM networks. International collaborative efforts such as the International Soil Moisture Network (Dorigo et al. 2021) and the Joint Experiment for Crop Assessment and Monitoring (JECAM) (Jolivot et al. 2021) have been building databases describing soil and vegetation over different biomes worldwide; unfortunately, there are still several regions lacking information, particularly at tropical latitudes, which highlights the need for additional field experiments contributing to the understanding of these regions.
As shown in Figure 1 and Table 2, the pixel heterogeneity at 36 km was accounted for by generating a specific land cover map for the region of Huamantla. Colliander et al. (2019) pointed out the effect of heterogeneity in agricultural regions on SM retrievals when varying the size of the satellite pixels. Efforts such as the collection of the MODIS land cover product (García-Mora, Mas, and Hinkley 2012; Sulla-Menashe et al. 2019) and the ESA Climate Change Initiative's Land Cover (ESA-CCI-LC) dataset (Mousivand and Jokar Arsanjani 2019) can be useful tools since they provide global classification maps; nevertheless, the classifiers implemented still need to be improved by incorporating additional control points over developing countries.
The upscaling method is another aspect that could produce bias in SM retrievals. We generated the upscaled SM using the soil-weighted method, as suggested in previous work. The standard deviation < 0.40 m³ m⁻³ throughout the THEXMEX-18 experiment and the mean value of the coefficient of variation of 0.314 in the 36-km upscaled SM confirm the representativeness of this method over the agricultural region of Huamantla. Bhuiyan et al. (2018) showed that either the soil-weighted method or the Voronoï diagrams can be used to upscale SM at 36 km when using dense SM networks. For sparse networks, Caldwell et al. (2019) show that it is necessary to check the standard deviation, the coefficient of variation, and the number of required locations to verify the representativeness of the upscaled SM at each spatial resolution to be evaluated. Because of the lack of representativeness in different sparse SM networks, there is a lack of in situ SM information to validate SM products at spatial scales smaller than 10 km.
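The representativeness checks mentioned above can be sketched in a few lines. The weighting scheme shown here (station means per soil class, weighted by the area fraction of each class within the 36-km pixel) is an assumed reading of the soil-weighted method and not a description of the exact procedure used in this study; the function names, soil classes, fractions, and SM values are hypothetical.

```python
import numpy as np

def upscale_soil_weighted(station_sm, station_soil_class, class_area_fraction):
    """Area-weighted mean of per-class station averages (assumed scheme)."""
    station_sm = np.asarray(station_sm, dtype=float)
    station_soil_class = np.asarray(station_soil_class)
    total, weight_sum = 0.0, 0.0
    for soil_class, fraction in class_area_fraction.items():
        in_class = station_soil_class == soil_class
        if not in_class.any():
            continue                  # class not sampled: skip and renormalize
        total += fraction * station_sm[in_class].mean()
        weight_sum += fraction
    return total / weight_sum

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

# Hypothetical stations at one time step (SM in m3 m-3).
sm = [0.20, 0.30, 0.10]
soil = ["loam", "loam", "sand"]
fractions = {"loam": 0.75, "sand": 0.25}
print(upscale_soil_weighted(sm, soil, fractions))
print(coefficient_of_variation(sm))
```

Computing the coefficient of variation at each time step and averaging it over the experiment gives one number that can be compared across networks of different density.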
Recent studies (Judge et al. 2021) have found that the assumption of constant parameters, such as soil roughness and the b factor used to estimate the vegetation optical depth, is not correct since these parameters are time varying, and this assumption needs to be re-evaluated to improve the SMAP SM retrievals. Currently, the NASA L2SMP product obtains the values for these parameters from a lookup table based on land cover (O'Neill et al. 2018). In order to obtain a new representation of these parameters, further studies are needed due to the diversity and heterogeneity of different biomes worldwide.

Conclusions
In this study, an EnKF-based assimilation algorithm was implemented to improve RZSM estimates using the LSP model over an agricultural area of Central Mexico. The objective of this study was to assimilate SM field observations and satellite SM retrievals during a complete growing season of corn to understand the effects of uncertainties in meteorological forcing on RZSM estimates when using the LSP model.
A comparison of EnKF performance using high spatio-temporal density observations from the THEXMEX-18 experiment, synthetic observations, and SM retrievals at 0-5 cm (near-surface SM) from the NASA SMAP mission at 36 km was conducted to differentiate model errors in biophysics from errors in forcings. The SMAP SM retrievals were bias-corrected by identifying the temporal variations of the bias between different land cover conditions. In order to reduce the error in the model representation of the conditions over Central Mexico, the soil and vegetation parameters of the LSP model were calibrated for the complete growing season using the Monte Carlo method. The calibrated values produced an RMSD of 1.69%SM and 1.94%SM in RZSM and near-surface SM, respectively. However, differences between SM estimates from the LSP and SM observations from the THEXMEX-18 remained during frequent rainfall events and during the presence of ears.
Assimilating synthetic observations produced estimates of SM close to the true values throughout the soil profile with an average error of about 1.38% SM (60% reduction of the open-loop RMSE) for the entire growing season. Assimilating THEXMEX-18 observations produced estimates with an average error of about 2.23% SM (36% reduction of the open-loop RMSE) in the top 5 cm. In layers up to 30 cm, the SM had an average error of 1.74% SM (30% reduction of the open-loop RMSE) during the growing season for the THEXMEX-18 observations. This difference between SM errors for the field experiment and the synthetic case indicated that the simplification of a homogeneous soil in the LSP model does not properly represent the physics in the soil. When assimilating bias-corrected SM retrievals from the SMAP mission (SMAP L2SMP product), an average error of 3.34% SM (5% reduction of the open-loop RMSE) was observed at the top 0-5 cm with an averaged standard deviation of 1.42% SM. In contrast, the average error in the SM at deeper layers was 3.25% SM. The improvement in the soil layers when assimilating SMAP SM retrievals was significantly lower than that from the assimilation of TDR observations at the top 5 cm of the soil. This indicates that the SM retrievals from satellite missions operating at L-band still need to be improved, particularly over developing countries, to infer SM conditions up to 100-cm depth using a data assimilation framework. The improvement in the SM retrievals requires the re-calibration of the main parameters governing the passive signatures from the τ-ω model, used as the baseline physical model within the SM retrieval algorithm (Chan et al. 2016).
The assimilation of four depths (0-5, 10, 20, and 30 cm) resulted in lower uncertainty in RZSM estimation compared to the assimilation of SM at 0-5 cm only. However, these results demonstrated that SM observations at 0-5 cm, such as those from the SMAP mission, could be very useful in an assimilation framework to estimate SM conditions up to 10 cm. Although this improvement covers only the top 10 cm of the soil, this information could be very valuable, particularly in rainfed agricultural regions lacking information about SM conditions, such as Latin America, including Mexico. Frequent information (at intervals shorter than 2 weeks) on soil moisture at the top 10 cm of the soil helps local decision-makers understand the local effects of climate change at tropical latitudes.
In this study, only the errors in soil and meteorological forcings were considered. Thus, the observed differences in EnKF performance between synthetic and SM observations may indicate errors in model biophysics that were not considered here, such as constant values in soil parameters or predictions in vegetation description and root distribution. In addition, the SMAP SM retrievals were used in the assimilation algorithm after bias correction based on field experiments. However, to improve RZSM estimates when exploiting operational satellite SM products for regions without field information available, it is necessary to use complex bias correction methods in the assimilation algorithms, such as those proposed in Moradkhani et al. (2005) and Monsivais-Huertero et al. (2016), or to implement methodologies to improve the parameterization in the SMAP SM retrieval algorithm using sparse soil moisture networks covering developing countries.