Irrigation characterization improved by the direct use of SMAP soil moisture anomalies within a data assimilation system

Prior soil moisture data assimilation (DA) efforts to incorporate human management features such as agricultural irrigation have shown only limited success. This is partly because the observational rescaling approaches used for bias correction in soil moisture DA systems are less effective when unmodeled processes such as irrigation are the dominant source of systematic biases. In this article, we demonstrate an alternative approach, anomaly correction, for overcoming this limitation. Unlike the rescaling approaches, the proposed method does not scale remote sensing soil moisture retrievals to the model climatology; instead, it extracts the temporal variability information from the retrievals. The study demonstrates this approach through the assimilation of soil moisture retrievals from the Soil Moisture Active Passive (SMAP) mission into the Noah land surface model. The results demonstrate that DA using the anomaly correction method can better capture the effect of irrigation on soil moisture in agricultural areas while providing comparable performance to the DA integrations using rescaling approaches in non-irrigated areas. These findings emphasize the need to reduce inconsistencies between remote sensing and the models so that assimilation methods can employ information from remote sensing more directly to develop representations of unmodeled processes such as irrigation.


Introduction
Soil moisture is an important environmental variable in that it modulates fluxes of water, energy, and carbon at the land-atmosphere interface (e.g. Hirschi et al 2011, Graf et al 2014, Trugman et al 2018, Green et al 2019). Therefore, accurate knowledge of the spatial and temporal distribution of soil moisture is critical for many environmental applications such as weather and climate forecasting (e.g. Wu and Dickinson 2004, Koster et al 2010), agricultural water resources management (e.g. Shin and Jung 2014, Jalilvand et al 2019), and drought and flood prediction (e.g. Hong and Kalnay 2000, Yuan et al 2011, Wanders et al 2014). Incorporating remote sensing retrievals of soil moisture into land surface models (LSMs) through data assimilation (DA) is often considered a viable approach to developing observation-informed, spatially-distributed estimates of soil moisture (Reichle et al 2002a, Heathman et al 2003). Though there have been numerous efforts to assimilate soil moisture retrievals from different satellite measurement platforms (e.g. Reichle et al 2007, Kumar et al 2009, 2014, Blyverket et al 2019, De Lannoy et al 2019, Seo et al 2021), most of these studies report relatively minor improvements from DA, particularly when high quality precipitation information is used to drive the model simulations. This is partly because the focus of most soil moisture DA studies is on improving short-term anomalies, as opposed to improving biases, which are often the dominant source of soil moisture errors. Further, these soil moisture DA studies and the evaluation metrics used in them are mostly focused on comparing to in-situ measurements, essentially ignoring the utility of remote sensing data to represent the significant spatial heterogeneity of soil moisture processes. Recent studies (e.g. Nearing et al 2018) have also shown that the coarse-scale remote sensing measurements may not have sufficient information about point-scale processes, making such comparisons to in-situ data inconclusive. On the other hand, a significant potential utility of satellite soil moisture observations should be in capturing the impact of processes that are influenced by factors outside precipitation or natural variability, which are ubiquitous on the land surface. For example, irrigation is one of the important and pervasive human activities across the world and has a direct effect on soil moisture in agricultural areas (Douglas et al 2009, Ozdogan et al 2010, Zhang et al 2017), yet it is difficult to capture in models. The vast majority of the current soil moisture DA studies have not examined the usefulness of DA in the context of capturing such unmodeled processes (Kumar et al 2015).
Because agricultural irrigation is often driven by subjective practices, irrigation is poorly represented in LSMs (Gibson et al 2017, Lawston et al 2017, Abolafia-Rosenzweig et al 2019). Recent studies (e.g. Lawston et al 2017, Brocca et al 2018) have demonstrated that microwave-based satellite remote sensing, particularly the Soil Moisture Active Passive (SMAP) satellite, can capture the effect of irrigation on soil moisture. Though assimilation of satellite soil moisture retrievals should offer the possibility of incorporating the irrigation signals into model estimates, technical limitations of DA methods have been shown to be restrictive in this context (Kumar et al 2015).
Specifically, most soil moisture DA systems are forced to employ observational rescaling strategies due to large differences in soil moisture representation between models and remote sensing retrievals. Rescaling approaches such as cumulative distribution function (CDF) matching are used to convert the remote sensing retrievals into the LSM's climatology prior to DA. While these approaches are effective in eliminating systematic differences in all statistical moments between the modeled soil moisture and the remote sensing retrievals, they have been shown to cause statistical errors when the observations contain signals that are not modeled (Kumar et al 2015). They have also been shown to be significant sources of information loss in DA (Nearing et al 2018). Further, CDF-matching is ineffective when unmodeled processes such as irrigation are the dominant source of systematic errors between LSMs and observations, as demonstrated in Kumar et al (2015). That is, CDF-matching, by design, results in most irrigation impacts being excluded prior to DA. Similar limitations also apply to other rescaling methods that have been proposed in the literature, such as variance matching, least squares regression-based rescaling, and triple collocation analysis-based rescaling (Yilmaz and Crow 2013). Reducing the reliance on rescaling methods is essential to improving the capability of DA systems to represent such processes.
Observation rescaling methods can be avoided if the representative and interpretation differences between modeled and remote sensing soil moisture are reduced. This is not easy to do, given that significant revisions in model parameters and conceptual formulations in land surface and retrieval models are needed. Alternatively, here we explore the utility of a simpler a priori bias correction approach that becomes possible if the assumptions about the model and remote sensing systematic errors are relaxed within reason. Unlike the CDF-matching method, this bias correction method does not scale soil moisture retrievals to the model climatology, but instead extracts the soil moisture anomaly information (i.e. temporal variability) from the retrievals and uses it directly. Hereafter, we refer to this method as 'anomaly correction'. Compared to CDF-matching, this approach better preserves the short-term anomaly information from the remote sensing retrievals. Our results demonstrate that employing the anomaly correction method can help the soil moisture DA framework to better preserve unmodeled processes such as irrigation from satellite-based soil moisture retrievals in agricultural areas while achieving comparable or better performance in estimating soil moisture than the CDF-matching method. This study serves as a benchmark for future DA approaches that aim to exploit more information by directly incorporating remote sensing soil moisture retrievals.

Soil moisture data assimilation experimental setup
In this study, we conduct DA experiments using the NASA Land Information System (LIS; Kumar et al 2006, 2008, Peters-Lidard et al 2007), which is a multiscale LSM and DA system for hydrologic applications. Among the various state-of-the-art LSMs available in LIS, we employ the Noah LSM version 3.9 (Ek et al 2003) as the forward forecast model. The LSM integrations are conducted over the continental United States (CONUS) domain at a grid resolution of about 10 km. We assimilate the SMAP Level 2 Radiometer Half-Orbit 36 km Equal-Area Scalable Earth-Grid Soil Moisture product (SPL2SMP; O'Neill et al 2020a) into the model soil moisture simulation using an ensemble Kalman filter scheme (Evensen 1994, Reichle et al 2002b). To remove climatological biases in soil moisture between the Noah simulation and SMAP retrievals within DA, we use the CDF-matching and anomaly correction methods (see section 2.2 for more details). Outputs from the SMAP soil moisture DA experiments are evaluated using in situ soil moisture measurements from the International Soil Moisture Network (ISMN; Dorigo et al 2011, 2013). Experimental setup details including the atmospheric forcing, land parameters, model initialization, perturbation settings, and quality-control (QC) procedure are provided in Text S1.

Bias correction methods
The CDF-matching follows the established approach in which the satellite soil moisture retrievals are scaled to the model-simulated soil moisture climatology so that the CDFs of the scaled retrievals and the model estimates match. This CDF-matching approach corrects all the statistical moments of the distribution of observations regardless of its shape.
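As an illustration of the rescaling idea described above, the following is a minimal sketch of CDF-matching by quantile mapping for a single grid cell. It is not the LIS implementation: the function name `cdf_match`, the choice of 101 quantile levels, and the synthetic data are assumptions made for this example only.

```python
import numpy as np

def cdf_match(obs, obs_ref, model_ref, n_quantiles=101):
    """Rescale observations so their empirical CDF matches the model's.

    obs       : retrievals to rescale (1-D array)
    obs_ref   : retrieval record used to build the observation CDF
    model_ref : model record used to build the model CDF
    """
    levels = np.linspace(0.0, 1.0, n_quantiles)
    obs_q = np.quantile(obs_ref, levels)      # observation quantiles
    model_q = np.quantile(model_ref, levels)  # model quantiles at the same levels
    # Piecewise-linear map from observation space to model space;
    # values outside the reference range are clamped to the end quantiles.
    return np.interp(obs, obs_q, model_q)

# Synthetic example (hypothetical values, not SMAP or Noah data)
rng = np.random.default_rng(0)
model = rng.normal(0.25, 0.05, 5000)
obs = rng.normal(0.15, 0.08, 5000)
scaled = cdf_match(obs, obs, model)
```

After the mapping, both the mean and the spread of `scaled` follow the model record, which is why any variability present only in the observations tends to be rescaled away. Applying the function separately to each calendar month would correspond to the monthly CDFs used in this study.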
The newly proposed bias correction method (i.e. anomaly correction) follows two simple steps: (a) the climatology of the soil moisture retrievals is removed from the original SMAP data to obtain the SMAP soil moisture anomaly, and (b) the SMAP anomaly is then added to modeled soil moisture climatology, which is then assimilated into the LSM. The anomaly correction method assumes that the climatological soil moisture difference between models and observations is the main source of the systematic bias while the differences in higher moments such as standard deviation are small. Through the anomaly correction method, we expect that the climatological bias is removed, but the temporal variability of the original SMAP soil moisture retrievals is maintained so that the unmodeled irrigation signals can be captured by the model via assimilation.
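The two steps above can be sketched for a single grid cell as follows. This is a hedged illustration rather than the code used in the experiments: the function name `anomaly_correct` and the synthetic data are assumptions, and the climatologies are taken here as long-term means of each record.

```python
import numpy as np

def anomaly_correct(obs, obs_clim, model_clim):
    """Shift retrievals to the model climatology while keeping their variability."""
    anomaly = np.asarray(obs) - obs_clim   # step (a): remove retrieval climatology
    return anomaly + model_clim            # step (b): add model climatology

# Synthetic example (hypothetical values, not SMAP or Noah data)
rng = np.random.default_rng(1)
obs = rng.normal(0.15, 0.08, 1000)
model = rng.normal(0.25, 0.05, 1000)
corrected = anomaly_correct(obs, obs.mean(), model.mean())
```

In this sketch the corrected series takes on the model mean but retains the standard deviation of the original retrievals, which is the property the method relies on to carry unmodeled signals into the assimilation.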
Both bias correction methods are applied to each model grid cell in the domain. The CDFs and climatologies of the model and observations are pre-generated for each grid cell using all available data from 2008 to 2020 (for the Noah LSM) and from 2015 to 2020 (for the SMAP retrievals), respectively. Based on the recommendations from prior studies (e.g. Kumar et al 2015) that show improved DA performance and reduced spurious statistical artifacts, CDFs are computed at a monthly scale. In the anomaly-based bias correction, the anomalies are computed at each grid cell by subtracting the long-term mean across the entire data record from each day's data.

Rationality of the anomaly correction assumption
The key assumption in CDF-matching is that systematic differences in all statistical moments, such as mean, standard deviation, skewness, and kurtosis, of the model and observation distributions must be corrected. In contrast, the anomaly correction method assumes that correcting only the first moment of the systematic differences in soil moisture between models and observations is sufficient, and that maintaining the temporal variability of observations can help models to better capture the irrigation signals via assimilation. Figure 1 shows the spatial distribution of the mean and standard deviation of surface soil moisture from the model (OL; without assimilation) and SMAP retrievals across 2015-2020. Both the model and SMAP exhibit a similar spatial pattern (i.e. relatively dry and wet soil moisture conditions in the western and central/eastern CONUS, respectively) of the temporal mean surface soil moisture (figures 1(a) and (b)). However, compared to the SMAP retrievals, the Noah LSM-simulated soil moisture shows wetter and drier conditions in the western and central/eastern CONUS, respectively (figure 1(c)). The standard deviation of surface soil moisture is relatively small in the dry western part of the study domain while it is larger in the wet central/eastern CONUS (figures 1(d) and (e)). The spatial pattern of the differences in the standard deviation of soil moisture (figure 1(f)) is similar to that of the mean soil moisture difference (figure 1(c)), but its magnitude is within ±0.04 m³ m⁻³ for 99% of grid cells in the CONUS domain (figure 1(g)). These smaller differences in the standard deviation between the model and observations support the key assumption of the anomaly correction method that the bias in the mean is the main source of the systematic error. Note that similar checks must be conducted for every combination of remote sensing retrieval and LSM before DA.
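A check of this kind could be sketched as below for any pairing of model and retrieval records. The function name `moment_differences`, the array layout (time by grid cell), and the synthetic data are assumptions for illustration; the 0.04 threshold is the magnitude quoted above.

```python
import numpy as np

def moment_differences(model, obs, std_tol=0.04):
    """Compare first and second moments per grid cell.

    model, obs : arrays of shape (time, cells); NaN marks missing values
    Returns the mean difference, the standard deviation difference, and the
    fraction of cells whose |std difference| is below std_tol.
    """
    mean_diff = np.nanmean(model, axis=0) - np.nanmean(obs, axis=0)
    std_diff = np.nanstd(model, axis=0) - np.nanstd(obs, axis=0)
    frac_within = float(np.mean(np.abs(std_diff) < std_tol))
    return mean_diff, std_diff, frac_within

# Synthetic example: the model differs from the observations by a constant offset
rng = np.random.default_rng(2)
obs = rng.normal(0.15, 0.05, (500, 20))
model = obs + 0.1
mean_diff, std_diff, frac = moment_differences(model, obs)
```

A fraction close to 1 together with a non-negligible mean difference would be consistent with the assumption that the bias lies mainly in the first moment.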
In the following sections, we demonstrate the feasibility of the anomaly correction method for improving the model estimates of soil moisture over the CDF-matching method via assimilation of the SMAP soil moisture retrievals while preserving useful information about human-induced processes such as irrigation captured by the SMAP data, but not represented in the model. Figure 2 presents an evaluation of soil moisture DA performance by comparing against in-situ ISMN soil moisture measurements with the two a priori bias correction methods, that is, CDF-matching (DA CDF) and anomaly correction (DA AC). Note that the in-situ comparison is used primarily to evaluate the impact of the relaxed assumptions of climatological differences with the anomaly correction. Figure 2 shows the unbiased root mean square errors (ubRMSE) and anomaly correlation coefficients (anomaly R) with the normalized differences in the ubRMSE and anomaly R between the DA and OL cases computed as (DA − OL)/OL × 100. Negative values in the ubRMSE normalized difference and positive values in the anomaly R normalized difference indicate improvements because of DA. The percentage of model grid cells where the ubRMSE and anomaly R are improved (i.e. reduced ubRMSE and increased anomaly R) or degraded via assimilation of the SMAP soil moisture retrievals is also presented.
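The evaluation metrics named above can be sketched as follows. The normalized difference uses the formula given in the text; the other definitions are common conventions and are assumptions here, in particular that the anomaly R is the Pearson correlation of the two series after subtracting a supplied climatology from each. All function names are illustrative.

```python
import numpy as np

def ubrmse(est, ref):
    """RMSE after removing the mean of each series (unbiased RMSE)."""
    est_dev = est - est.mean()
    ref_dev = ref - ref.mean()
    return float(np.sqrt(np.mean((est_dev - ref_dev) ** 2)))

def anomaly_r(est, ref, est_clim, ref_clim):
    """Pearson correlation between anomalies relative to given climatologies."""
    return float(np.corrcoef(est - est_clim, ref - ref_clim)[0, 1])

def normalized_difference(da, ol):
    """(DA - OL) / OL x 100, as used for the skill differences."""
    return (da - ol) / ol * 100.0

# Synthetic example: an estimate with a constant bias has zero ubRMSE
rng = np.random.default_rng(4)
ref = rng.normal(0.2, 0.05, 365)
biased = ref + 0.1
```

With these definitions, a negative normalized difference in ubRMSE and a positive one in anomaly R both indicate that DA improved on the OL run.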

Evaluations using in-situ soil moisture measurements
Overall, compared to the OL run, both DA CDF and DA AC lead to small improvements in the estimates of surface soil moisture in terms of the domain-averaged ubRMSE and anomaly R (figure 2). DA AC outperforms DA CDF in reducing the domain-averaged ubRMSE (figure 2(a)) while DA CDF results in a higher domain-averaged anomaly R than DA AC (figure 2(b)). Figure 2(c) shows that a 0.35% and a 2.54% reduction in ubRMSE are achieved by DA CDF and DA AC, respectively, and a 4.04% and a 2.02% increase in anomaly R are achieved by DA CDF and DA AC, respectively. When the normalized improvements in ubRMSE and anomaly R are combined, DA AC exhibits slightly better performance than DA CDF. Regarding the number of improved model grid cells, the DA AC case is superior to the DA case using the CDF-matching for both the ubRMSE and anomaly R (figure 2(d)). That is, DA AC reduces the ubRMSE and increases the anomaly R for 51.6% (27.7% with statistically significant changes) and 60.3% (25.0% with statistically significant changes) of the evaluated grid cells, respectively, while DA CDF improves 46.1% (14.1% with statistically significant changes) and 58.3% (16.2% with statistically significant changes) of the evaluated model grid cells with respect to the ubRMSE and anomaly R, respectively. Note that additional DA experiments using the lumped CDFs calculated from all data points exhibit degraded performance, especially for anomaly R, compared to other DA cases (figure S2) due to spurious artifacts in the observations after scaling. This supports the use of the monthly-based CDFs in subsequent analyses.
Figure 2 (caption, partially recovered): [...] and degraded grid cells by DA with (p < 0.05) and without (p ⩾ 0.05) statistically significant changes (p-values are obtained from a t-test). DA CDF and DA AC represent DA cases using the CDF-matching and anomaly correction methods, respectively. In the distribution plots of (a) ubRMSE and (b) anomaly R, white dots and solid horizontal lines indicate the median and mean (the exact mean values are indicated in the figure), respectively, and black boxes and vertical lines represent the interquartile range and 95% confidence interval, respectively.
These results are consistent with previous soil moisture DA studies that report small improvements (e.g. Kumar et al 2014, Blyverket et al 2019, De Lannoy et al 2019) in in-situ comparisons when high quality precipitation is used to drive the models. The domain-averaged improvements in the soil moisture estimates achieved from DA using both the a priori bias correction methods are marginal and are not statistically significant when compared against the ISMN soil moisture measurements. This is presumably due to the good performance of the OL integration driven by the high-quality meteorological forcing. That is, 55.2% and 43.5% of the evaluated model grid cells fall within the lower range of ubRMSE (less than 0.08 m³ m⁻³) and higher range of anomaly R (greater than 0.6), respectively.
In the context of this paper, these results confirm that the anomaly correction provides DA performance comparable to that of the CDF-matching approach in the evaluations against in-situ measurements, most of which are located in non-irrigated areas where soil moisture changes are dominated by natural variability (i.e. driven by precipitation alone). The utility of the anomaly correction method in capturing anthropogenic signals such as irrigation is demonstrated in the following sections.

Capturing irrigation signals
In this section, we discuss the ability of the soil moisture DA framework to capture the effects of agricultural irrigation on soil moisture variability. Figure 3 shows the time-series of precipitation rate and surface soil moisture from the uncorrected (original) and corrected (after the CDF-matching or anomaly correction) SMAP, and the OL and DA cases for a single year (2016) at five example agricultural locations: California (marked with a yellow circle), Idaho (Snake River basin, marked with an orange triangle), Nebraska (marked with a red square), Texas (marked with a green circle), and Georgia (marked with a green triangle). These locations are selected from the Moderate Resolution Imaging Spectroradiometer (MODIS)-based map (Salmon et al 2015) of the irrigated grid cell fraction (figure 3(a)).
Figure 3 (caption, partially recovered): [...] (Salmon et al 2015). The left and right columns present the DA results using the CDF-matching and anomaly correction, respectively, for the same locations and time period.
The example locations are in general characterized by the wet winter season and dry summer season based on the precipitation rate, and thus are areas where irrigation is the main water source for crops during the warm, dry summer season. The SMAP data show that wet soil moisture conditions are maintained during the dry summer season (figure 3), which is attributed to irrigation practices. The irrigation signals, i.e. deviation of the soil moisture pattern from the precipitation pattern, are well captured by the SMAP retrievals especially in the California region (figures 3(b) and (g)) where flood irrigation practices dominate, as discussed in Lawston et al (2017).
The bias-corrected SMAP soil moisture in figure 3(b) shows that the irrigation signals, which are present in the uncorrected SMAP data, are not retained well after the CDF-matching. Therefore, assimilating the CDF-matched SMAP soil moisture data does not produce significantly higher soil moisture values during the summer months, which is presumably the time period when irrigation practices are active. On the other hand, the bias-corrected SMAP data after the anomaly correction capture the effect of irrigation on soil moisture (figure 3(g)).
Consequently, merging the SMAP soil moisture anomaly information into the model via assimilation helps the model to represent the wet soil moisture features from the observations during the rain-free, dry summer season, as clearly seen in the flood irrigation-dominated region (figure 3(g)). Although only surface soil moisture is directly updated during the assimilation procedure, the impact is propagated to deeper soil layers by the model subsurface physics (figure 4).
Compared to the California region (figure 3(g)), the irrigation signal in the SMAP data is less evident at the other locations, i.e. the Idaho (figures 3(c) and (h)), Nebraska (figures 3(d) and (i)), Texas (figures 3(e) and (j)), and Georgia (figures 3(f) and (k)) regions, due to the simultaneous influence of irrigation and rainfall. Nevertheless, assimilating the SMAP data using the anomaly correction method (i.e. DA AC) results in wetter soil moisture conditions than both the OL and DA CDF during the dry summer season. Over the Snake River basin location, the uncorrected SMAP observations show soil moisture levels maintained during the April to September time period, presumably due to irrigation. While DA CDF does not capture these features, DA AC captures the persistence of the soil moisture at the same level seen in the observations.

Land surface temperature (LST) and latent heat flux estimates
Soil moisture conditions affect LST through various physical processes. Soil thermal diffusivity in general increases with wetter soil moisture, although their relationship is non-linear because both soil thermal conductivity and heat capacity increase with soil moisture (Al Nakshabandi and Kohnke 1965, Kwon and Koo 2017). Therefore, wetter soil moisture conditions may lead to more energy transfer from surface to subsurface soil layers during daytime. Increased soil moisture also increases the energy loss (i.e. latent heat flux to the atmosphere) from the land surface by enhancing evapotranspiration (ET). Many studies (e.g. Huang and Ullrich 2016, Thiery et al 2017, Chen and Dirmeyer 2019) demonstrate that irrigation leads to strong evaporative cooling of the land surface. On the other hand, increased soil moisture darkens the land surface (i.e. decreases the surface albedo) and consequently increases net radiation at the land surface. All these physical processes associated with soil moisture, especially soil moisture-atmosphere feedbacks, shape the probability distribution of LST (Berg et al 2014).
Here, we compute the Kullback-Leibler divergence (KLD; Kullback and Leibler 1951) by comparing the model-estimated LST distribution against that of the MODIS LST data (MOD11A1; Wan et al 2015) during the period of July to September (figure 5). The KLD measures the dissimilarity between two probability distributions; that is, a smaller KLD indicates that the probability density of the estimated LST is closer to that of the MODIS LST. This comparison is used as an independent evaluation of the ability of different soil moisture DA strategies to capture irrigation impacts. Figure 5 shows the difference in KLD between the DA AC and DA CDF cases. In the KLD difference map, negative values indicate that the LST estimates are improved by DA AC compared to the DA CDF case. Overall, DA AC is superior to DA CDF with respect to the KLD in the irrigated areas (see figure 3(a)), especially in the central/eastern CONUS including Nebraska, Kansas, Oklahoma, Texas, lower Mississippi, and Georgia. Compared to other irrigated areas, relatively few locations in California and Idaho exhibit clear differences in KLD between the DA cases, which is attributed to small changes in the LST estimates by DA with a limited number of SMAP data resulting from the QC procedures. These results provide additional evidence that the newly applied soil moisture bias correction method (i.e. anomaly correction) can be an effective alternative to the traditional method (i.e. CDF-matching) within the DA framework to capture the irrigation signals in the soil moisture estimates via assimilation of the SMAP retrievals. Improvements in ET estimates over the irrigated areas are also observed (figure 6) when the results are compared against the Atmosphere-Land Exchange Inverse (ALEXI; Anderson et al 2007) ET product. Except for the lower Mississippi region, the comparison to ALEXI indicates beneficial impacts from DA over irrigated areas in the Snake River basin, parts of Nebraska, and the Central Valley of California. Further evaluations using ground-based measurements need to be conducted in future studies.
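The KLD comparison described above can be illustrated with a histogram-based estimate at a single grid cell. This is a sketch under stated assumptions, not the procedure used in the study: the number of bins, the small constant `eps` that avoids division by zero in empty bins, the direction of the divergence (reference distribution first), and the function name `kl_divergence` are all choices made for this example.

```python
import numpy as np

def kl_divergence(ref_samples, est_samples, bins=50, eps=1e-12):
    """Histogram estimate of KLD(P || Q), with P from ref_samples and Q from est_samples."""
    lo = min(ref_samples.min(), est_samples.min())
    hi = max(ref_samples.max(), est_samples.max())
    edges = np.linspace(lo, hi, bins + 1)   # shared bin edges for both samples
    p, _ = np.histogram(ref_samples, bins=edges)
    q, _ = np.histogram(est_samples, bins=edges)
    p = (p + eps) / (p + eps).sum()         # normalize to probabilities
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic LST samples in kelvin (hypothetical values, not MODIS or model output)
rng = np.random.default_rng(3)
modis = rng.normal(300.0, 5.0, 5000)
lst_close = rng.normal(301.0, 5.0, 5000)
lst_far = rng.normal(306.0, 5.0, 5000)
kld_diff = kl_divergence(modis, lst_close) - kl_divergence(modis, lst_far)
```

Here `kld_diff` is negative because the first estimate is distributed more like the reference, mirroring the sign convention of the KLD difference map, where negative values favor DA AC over DA CDF.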

Summary and conclusions
This study demonstrates a new bias correction method (i.e. anomaly correction) within the soil moisture DA framework with the aim of improving model estimates of soil moisture in irrigated areas. The newly applied method differs from the commonly used CDF-matching in that it does not scale remotely-sensed soil moisture retrievals to the modeled soil moisture climatology. Instead, the anomaly correction approach directly uses the anomalies from the remote sensing retrievals. Thus, the anomaly correction method maintains the temporal variability of the soil moisture retrievals, which undergoes significant changes when the CDF-matching method is employed to correct systematic biases between models and observations. The difference in the soil moisture standard deviations between the model and the observations used in this study (from SMAP) is much smaller than the difference in the mean. This supports the key assumption of the anomaly correction method, that is, the bias in the mean is the primary source of the systematic differences in soil moisture between models and observations, and thus correcting the bias only in the first moment is sufficient for performing soil moisture DA.
The experimental results demonstrate that correcting biases using the anomaly correction method can ameliorate issues reported in previous studies (i.e. loss of irrigation signals from observations when the CDF-matching-based bias correction method is employed). The results show that DA employing the anomaly correction can better capture the effect of irrigation on soil moisture in agricultural areas. Further, the use of anomaly correction exhibits comparable or better performance in estimating soil moisture compared to the DA case using the CDF-matching method in the comparisons to in-situ measurements. These findings are encouraging and suggest the potential of the soil moisture DA framework to represent unmodeled processes in the soil moisture estimates at large spatial scales. This paper demonstrates the potential added utility of directly assimilating information from remote sensing retrievals, using SMAP data. This more relaxed assumption in DA is possible because the systematic errors between SMAP and the LSM are primarily in the first moment of the statistical distribution. For retrievals from other sensors with larger errors in the variability of soil moisture (e.g. standard deviation), this approach is unlikely to be viable. It should be emphasized that further evaluations of our methodology using other soil moisture retrievals need to be conducted in future studies.
The results of this paper also highlight the need to improve the consistency across soil moisture retrievals and model estimates so that they can be used more directly in DA or end-use application environments. Currently, soil moisture remote sensing retrievals are produced in volumetric water content or as estimates of water saturation, and conversion between the two physical spaces requires knowledge of soil properties such as bulk density, which are often not reported along with retrieval products. Similarly, LSM-based soil moisture estimates are model-specific quantities, representing a measure of the wetness of the soil. To assimilate satellite soil moisture estimates directly, the inconsistencies between models and remote sensing retrievals must be addressed. Though the soil moisture community has largely sidestepped this issue by focusing on anomaly-based evaluations, studies have shown that such evaluations underestimate the errors (Dorigo et al 2010). The results of this study also show the promise of smaller information loss and larger utility from DA if more direct approaches to assimilating remote sensing retrievals can be developed.

Data availability statement
The NASA Land Information System (LIS) is available for public access at https://github.com/NASA-LIS/LISF. The Soil Moisture Active Passive (SMAP) soil moisture retrievals can be obtained from 10.5067/F1TZ0CBN1F5N. The International Soil Moisture Network (ISMN) in situ soil moisture measurements can be downloaded from www.geo.tuwien.ac.at/insitu/data_viewer/. The Moderate Resolution Imaging Spectroradiometer (MODIS) land surface temperature can be accessed at https://e4ftl01.cr.usgs.gov/MOLT/MOD11A1.006/.
The data that support the findings of this study are available upon reasonable request from the authors.