Assimilation of remotely sensed soil moisture and vegetation with a crop simulation model for maize yield prediction

To improve the prediction of crop yields at an aggregate scale, we developed a data assimilation-crop modeling framework that incorporates remotely sensed soil moisture and leaf area index (LAI) into a crop model using sequential data assimilation. The core of the framework is an Ensemble Kalman Filter (EnKF) used to control crop model runs, assimilate remote sensing (RS) data and update model state variables. We modified the Decision Support System for Agro-technology Transfer – Cropping System Model (DSSAT-CSM)-Maize model (Jones et al., 2003) so that simulations can be stopped and restarted at any time in the growing season, allowing the EnKF to update model state variables as RS data become available. The data assimilation-crop modeling framework was evaluated against 2003-2009 maize yields in Story County, Iowa, USA, assimilating AMSR-E soil moisture and MODIS-LAI data independently and simultaneously. Assimilating LAI or soil moisture independently slightly improved the correlation of observed and simulated yields (R = 0.51 and 0.50) compared to no data assimilation (open-loop; R = 0.47), but prediction errors improved: MBE and RMSE were reduced by 0.5 and 0.5 Mg ha⁻¹, respectively, for LAI assimilation and by 1.8 and 1.1 Mg ha⁻¹ for soil moisture assimilation. Yield correlation improved more when both soil moisture and LAI were assimilated (R = 0.65), suggesting a cause-effect interaction between soil moisture and LAI; prediction errors (MBE and RMSE) were also reduced by 1.7 and 1.8 Mg ha⁻¹ with respect to open-loop simulations. Results suggest that assimilation of LAI independently might be preferable when conditions are extremely wet, while assimilation of soil moisture + LAI might be more suitable when conditions are more nominal.
AMSR-E soil moisture tends to be more biased in the presence of high vegetation (i.e., when crops are fully developed), and updating rootzone soil moisture by near-surface soil moisture assimilation under very wet conditions could increase the modeled percolation, causing excessive nitrogen (N) leaching and hence reducing crop yields even with water stress reduced to a minimum by soil moisture assimilation. However, applying the data assimilation-crop modeling framework strategically, by considering a priori information on the climate conditions expected during the growing season, may improve yield prediction performance substantially, in our case with higher correlation (R = 0.80) and larger reductions in MBE and RMSE (2.5 and 3.3 Mg ha⁻¹) compared to no data assimilation. Scaling AMSR-E soil moisture to the climatology of the model did not improve our data assimilation results because the model is also biased. Better soil moisture products, e.g., from the Soil Moisture Active Passive (SMAP) mission, may solve the soil moisture data issue in the near future. © 2013 The Authors. Published by Elsevier Inc.


Introduction
When a crop model is used to predict crop yields early in the growing season, two sources of uncertainty prevail: climate uncertainty and model uncertainty (Hansen, Challinor, Ines, Wheeler, & Moron, 2006). Climate-related uncertainty is greatest early in the growing season but tends to decrease as weather data become available as the growing season progresses. Model-related uncertainty, due to errors in model structure, modeling assumptions and other ancillary data, generally remains constant through the growing season. Skillful climate forecasts can reduce climate-related uncertainty in crop yield prediction, especially at the earlier stages of the growing season, while model-related uncertainty can potentially be reduced by assimilating remote sensing (RS) data during the growing season (de Wit & Van Diepen, 2007; Hansen et al., 2006; Vazifedoust, Van Dam, Bastiaanssen, & Feddes, 2009).
Remote sensing has been incorporated into crop simulation models either as a forcing function or through simulation steering (Bouman, Van Diepen, Vossen, & Van Der Val, 1997). A forcing function replaces a simulated state variable with the RS observation, while simulation steering re-initializes (e.g., sowing date, planting density) or re-parameterizes (e.g., canopy and growth parameters) the crop model in a way that minimizes the difference between simulated and measured data. Examples of the simulation steering approach include the works of Bouman (1992), Olioso et al. (2005), Fang, Liang, and Hoogenboom (2011) and Thorp et al. (2012), who linked radiative transfer models with crop models. Ines, Honda, Gupta, Droogers, and Clemente (2006) used remotely sensed evapotranspiration to reparameterize soil properties and crop and water management parameters of a pseudo-regional Soil-Water-Atmosphere-Plant (SWAP) model.
When RS data are used to replace the value of a model-simulated state variable or to infer some soil-plant-atmosphere-continuum properties, one assumes either that the RS data are free of error or that the level of data error is acceptable to be propagated within the simulated system (Fang, Liang, Hoogenboom, Teasdale, & Cavigelli, 2008; Ines & Mohanty, 2008a,b,c; Ines & Mohanty, 2009). Thorp, Hunsaker, and French (2010) assimilated measured Leaf Area Index (LAI) in the DSSAT-CSM-Wheat model using forcing and updating mechanisms. The updating mechanism is a forcing scheme that accounts for back propagation of the change in LAI to the system. Their simple assimilation procedure was more successful in minimizing errors in ET and canopy weight, but had difficulty improving yield simulations because yield is controlled by other factors aside from LAI. Vazifedoust et al. (2009) conducted a simple sequential data assimilation using a constant gain Kalman filter to assimilate LAI and the ratio of actual ET to potential ET (ET/ET_p) in SWAP-WOFOST and found significant improvements in simulated total dry matter, but only one of their three observation fields showed significant improvements in simulated yield. It should be noted that the timing and frequency of LAI data assimilation in a crop model are critical, as LAI (or NDVI) is more directly related to yield at silking and grain filling (Ozalkan, Sepetoglu, Daur, & Sen, 2010; Sehgal, Sastri, Karla, & Dadhwal, 2005; Teal et al., 2006).
Sequential data assimilation is a robust way of combining model and observations to minimize the uncertainty of a given modeled state, as it enhances the use of information between an imperfect model and observations. Of the several algorithms (e.g., particle filter, Kalman filter) capable of performing data assimilation to sequentially update model states and parameters, the Monte Carlo-based Ensemble Kalman Filter (EnKF) is the one that is widely used (Evensen, 2003). EnKF has received a lot of attention in the geosciences because of its ease of implementation, computational efficiency and optimum performance. It uses the Monte Carlo approach to approximate the conditional second-order moments of variables of interest using a finite number of randomly generated model replicates, then corrects the model forecast and error covariance (Evensen, 2003; Houtekamer & Mitchell, 1998). Many studies have implemented EnKF to assimilate RS data in meteorological and hydrological models with considerable success (e.g., Crow & Wood, 2003; Das & Mohanty, 2006; Das, Mohanty, Cosh, & Jackson, 2008; Dunne & Entekhabi, 2005; Evensen, 2003; Keppenne & Rienecker, 2002; Reichle, McLaughlin, & Entekhabi, 2002).
As with any other models, crop simulation models are subject to structural and data (input and forcing) errors; hence they are imperfect in simulating the truth. Sequential data assimilation can be used to improve crop model performance without altering its structure by periodically updating state variables within the growing season with RS observations. RS of vegetation (e.g., LAI) and soil moisture are potentially useful for sequential data assimilation because of their obvious influence on crop growth, and hence on crop yields. Their spatial and temporal coverage also allows data assimilation for crop forecasting at regional scale.
EnKF has recently been used with crop models, with some success and some challenges, especially when assimilating LAI (e.g., Curnel, de Wit, Duveiller, & Defourny, 2011). Most of these studies, however, were conducted under hypothetical conditions (so-called forward-backward simulations) and may be limited in fully explaining the strengths and limitations of the method under actual conditions, especially for predicting yield at aggregate scale (Curnel et al., 2011; Nearing et al., 2012). de Wit and Van Diepen (2007) showed the utility of an RS-derived rootzone soil wetness index for correcting some of the errors in the soil water balance associated with imperfect model inputs, e.g., gridded rainfall data, in crop yield prediction.
In this paper, we developed a data assimilation-crop modeling framework for assimilating remotely sensed data with a crop model that could be used to improve crop yield forecasting at a given lead time within the growing season. We present our implementation of an EnKF data assimilation system, the development of the stand-alone DSSAT-CSM-Maize model, and the testing and evaluation of the method under actual growing conditions in Story County, Iowa. The testing and evaluation aim to quantify how remotely sensed soil moisture and LAI, assimilated independently and simultaneously, improve simulated yields within the data assimilation-crop modeling framework. A variant of EnKF called the Ensemble Square Root Filter (Whitaker & Hamill, 2002), which we refer to as EnKF in general, was implemented for this study to simplify the use of RS data in the data assimilation, especially crop growth observations such as LAI, as the square root filter allows data assimilation without perturbing the observed data. This kind of work is important for improving the applications of data assimilation in crop yield forecasting.

EnKF data assimilation system
The core of data assimilation lies in the Kalman filter system, which assumes that observations are related to the true state $x_t$ (e.g., soil moisture or LAI at time t) as:

$$y = H x_t + \varepsilon \tag{1}$$

where y is the observation vector, ε is a Gaussian random error vector with a mean of zero and observation error covariance R, and H is the operator that maps the model variable space to the observation space. Furthermore, the forecast of $x_t$ at t = k is Gaussian with mean $x^f_{t=k}$ and error covariance $P^f_{t=k}$. Under these assumptions, the estimated state and error covariance are updated as:

$$x^a_{t=k} = x^f_{t=k} + K\left(y - H x^f_{t=k}\right) \tag{2}$$

$$P^a_{t=k} = (I - KH)\,P^f_{t=k} \tag{3}$$

where f and a are indices of the prior (called forecast) and posterior (called analysis) estimates, respectively, t is an index of time, I is the identity matrix, and K is the Kalman gain matrix defined as:

$$K = P^f_{t=k} H^T \left(H P^f_{t=k} H^T + R\right)^{-1} \tag{4}$$

The EnKF forecast and analysis error covariance come directly from an ensemble of model simulations:

$$P^f \approx \frac{1}{N_e - 1} \sum_{n=1}^{N_e} \left(x^f_n - \bar{x}^f\right)\left(x^f_n - \bar{x}^f\right)^T \tag{5}$$

where $N_e$ is the number of ensemble members, n is a running index for ensemble member, and $\bar{x}^f$ represents the ensemble mean calculated as:

$$\bar{x}^f = \frac{1}{N_e} \sum_{n=1}^{N_e} x^f_n \tag{6}$$

Usually, the ensemble is generated by perturbing the observed data, with the variance used in the perturbation based on the uncertainty of the data. Model parameters, on the other hand, are perturbed to generate the ensemble of model runs. In this system, ensemble members are integrated independently and updated in accordance with the Kalman filter method when new observations become available.
In our study, however, a variant of the EnKF approach (i.e., the Ensemble Square Root Filter) was adopted to ensure that the analysis error covariance does not become unrealistically low, because we intended not to perturb observations in order to minimize the risk of pairing LAI profiles with an unusual planting date. Burgers, Van Leeuwen, and Evensen (1998) demonstrated that $P^a$ is underestimated by a factor of (I − KH) when observations are not treated as random variables. This can cause the standard EnKF to reject observations in favor of the ensemble forecast, which can lead the analysis incrementally further away from reality, resulting in filter divergence (e.g., Burgers et al., 1998; Houtekamer & Mitchell, 1998; Mitchell & Houtekamer, 2000; Whitaker & Hamill, 2002). Whitaker and Hamill (2002) showed that adding random noise to observations further skews the distribution of $P^a$, resulting in a more erroneous analysis even though the covariance is increased. They suggested an alternative way of updating ensemble members, where the ensemble mean $\bar{x}^a_{t=k}$ is still updated by Eq. (2) but deviations $\tilde{x}^a_{t=k}$ from the mean are updated by:

$$\tilde{x}^a_{t=k} = \tilde{x}^f_{t=k} - \tilde{K} H \tilde{x}^f_{t=k} \tag{7}$$

where $\tilde{x}^f_{t=k}$ is a forecast value centered at the ensemble forecast mean, $\tilde{x}^f_{t=k} = x^f_{t=k} - \bar{x}^f_{t=k}$, and $\tilde{K}$ is the reduced gain of Whitaker and Hamill (2002), which for a single observation with error variance R is $\tilde{K} = \left(1 + \sqrt{R / (H P^f H^T + R)}\right)^{-1} K$. By this method the analysis error covariance is guaranteed to be exactly equal to that of Eq. (3), and perturbed observations are no longer necessary.

Fig. 1 shows the overall framework of the EnKF data assimilation system. First, ensemble members are generated (in our case, 40; see Section 2.3.5), then independent crop model runs are invoked. For each model run, each time a new RS observation becomes available, the run is interrupted, the EnKF updates the target model state variables, and the simulation is re-initialized with the updated states and re-run until the next observation is available. The data assimilation step also includes small inflation parameters (1.05 for soil moisture and 1.5 for LAI) to perturb the forecast ensemble members in case their variability becomes too low. This step ensures that the observations are not systematically rejected during assimilation.
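The square root update described above can be sketched for one scalar observation as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `ensrf_update`, the array layout (state variables by ensemble members), and the choice to apply inflation by scaling forecast deviations about the ensemble mean are all hypothetical.

```python
import numpy as np

def ensrf_update(ens, h, y_obs, r_obs, inflation=1.0):
    """Square root filter update for a single scalar observation.

    ens:   (n_state, n_members) forecast ensemble
    h:     (n_state,) row of the observation operator H
    y_obs: scalar observation; r_obs: its error variance R
    inflation: factor applied to forecast deviations (assumed scheme)
    """
    ens = np.asarray(ens, dtype=float)
    h = np.asarray(h, dtype=float)
    n_members = ens.shape[1]

    mean_f = ens.mean(axis=1)                      # ensemble forecast mean
    dev_f = inflation * (ens - mean_f[:, None])    # deviations from the mean
    hx_dev = h @ dev_f                             # deviations in observation space

    pf_ht = dev_f @ hx_dev / (n_members - 1)       # P^f H^T from the ensemble
    hpht = hx_dev @ hx_dev / (n_members - 1)       # H P^f H^T (scalar)
    gain = pf_ht / (hpht + r_obs)                  # Kalman gain K

    # Reduced gain factor of Whitaker and Hamill (2002) for the deviations
    alpha = 1.0 / (1.0 + np.sqrt(r_obs / (hpht + r_obs)))

    mean_a = mean_f + gain * (y_obs - h @ mean_f)        # mean update
    dev_a = dev_f - np.outer(alpha * gain, hx_dev)       # deviation update
    return mean_a[:, None] + dev_a
```

For a one-variable state with H = 1 and no inflation, the analysis ensemble variance produced by this sketch equals (1 − K) times the forecast variance, without any perturbation of the observation.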
Earlier, we conceptualized the integration by embedding the EnKF within the DSSAT-CSM code (Jones et al., 2003). However, we found that this is not feasible unless the data assimilation-crop modeling framework is developed on a parallel computing infrastructure, because the crop model would have to run multiple ensemble members simultaneously and then wait to update state variables and model parameters when RS data are available. We therefore developed a modified version of DSSAT-CSM-Maize (Section 2.1.2) that allows the EnKF to control an ensemble of independent crop model runs.
Given the time scales of soil moisture dynamics and the shallowness of the observed surface layer compared to deeper soil layers, propagating near-surface soil moisture information through vertical model physics alone is relatively inefficient. In contrast, updating the soil moisture of deeper layers based on the modeled surface-rootzone soil moisture error correlations expressed through the ensemble members can provide an efficient downward propagation of surface soil moisture information (observed through remote sensing), as long as errors in the surface layer soil moisture are statistically correlated with errors in the soil moisture of deeper layers via the model physics. The term $P^f_{t=k} H^T$ in Eq. (4) is the cross-covariance between errors in the model state (for example, surface and rootzone soil moisture) and errors in the observed variables (i.e., the near-surface soil moisture measurements). The DSSAT-CSM model propagation and update steps illustrated in Fig. 1 mean that surface information is propagated into the rootzone in two ways. First, in the DSSAT-CSM model propagation step, soil moisture interactions take place between the surface and deeper layers according to the modeled soil moisture dynamics. Second, in the presence of a surface soil moisture observation, an increment to deeper layer soil moisture is computed and applied in the EnKF update step, based on the innovation (i.e., the difference between predicted and observed values) and the surface-rootzone error correlation (as expressed in the Kalman gain). All of these are done on a daily time step (Fig. 1).
The assimilation of near-surface RS soil moisture updates the rest of the rootzone soil moisture using the updating equations in Eqs. (7) and (2). For a 9-layered soil profile, H would be equal to $(1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0)^T$, as the only layer with a measurement is the first layer (see Section 2.2.2). Only near-surface and sub-surface soil moisture were updated here. The only link with the soil nitrogen process is the updated profile soil moisture. The assimilation of RS LAI is also done using Eqs. (7) and (2), but now H = 1. However, LAI is related to other vegetation variables; using stage-state functions in DSSAT-CSM, we updated plant leaf area (PLA) and plant leaf weight (LFWT) using the updated LAI values. The updated plant variables are used to calculate carbohydrates (CARBO) for the next time step. In this study, the crop model propagates the link between the updated plant variables and the plant nitrogen process in the next time step.
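To make the downward propagation concrete, the gain can be written out layer by layer. The expression below is an illustrative consequence of the Kalman gain when H selects only the first soil layer and the observation is a scalar with error variance R; the layer index i and the Cov/Var notation are introduced here for illustration and are not taken from the original equations.

```latex
K_i = \frac{\operatorname{Cov}\left(x_i^f,\, x_1^f\right)}{\operatorname{Var}\left(x_1^f\right) + R},
\qquad
\bar{x}_i^a = \bar{x}_i^f + K_i \left(y - \bar{x}_1^f\right),
\qquad i = 1, \dots, 9
```

Under these assumptions, a deeper layer receives an increment only to the extent that its forecast errors covary with those of the surface layer across the ensemble; a layer whose ensemble values are uncorrelated with the surface layer is left unchanged by the update.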

Modified DSSAT-CSM-Maize
The DSSAT-CSM simulates growth, development and yield of a crop growing on a homogeneous land area, under prescribed or simulated management practices, as influenced by the dynamics of solar radiation, temperature, and soil water, carbon and nitrogen (Jones et al., 2003). Two dominant soil profiles in our study area were sampled to represent the landscape. Each realization was used as a homogeneous land unit to run the modified DSSAT-CSM-Maize. Crop models are generally designed to run continuously from sowing until maturity or harvest unless the crop fails due to extreme stress. However, the Ensemble Kalman Filter must be able to interrupt the simulation of an ensemble member as RS data become available, check and adjust target state variables, and re-start simulations with the updated state values as initial conditions. We therefore modified DSSAT-CSM-Maize to be able to stop and re-start at any point within the growing season, based on the timing and frequency of available RS data. Every time the modified DSSAT-CSM-Maize model stops, an ASCII file records the current values of model state variables and parameters. This file is used to access and modify assimilated state variables, and to re-initialize the crop model when it is invoked again. We verified that, in the absence of data assimilation, model outputs using the EnKF-DSSAT-CSM-Maize implementation matched the original DSSAT-CSM version, within the bounds of truncation and rounding errors associated with the intermediate text file. The nitrogen (N) process was included in the simulations because we wanted to simulate actual yields. However, the only link between the EnKF and the N process is the updated soil moisture states.
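The stop-and-restart mechanism can be illustrated with a toy example. The sketch below is not DSSAT-CSM code: `toy_daily_step` is a hypothetical stand-in for one day of crop model integration, and the one-variable-per-line text format is an assumed layout for the intermediate ASCII file.

```python
import os
import tempfile

def save_state(path, state):
    """Write state variables as 'name value' lines in a plain-text file."""
    with open(path, "w") as fh:
        for name, value in state.items():
            fh.write(f"{name} {value!r}\n")

def load_state(path):
    """Read back the 'name value' lines written by save_state."""
    state = {}
    with open(path) as fh:
        for line in fh:
            name, value = line.split()
            state[name] = float(value)
    return state

def toy_daily_step(state):
    """Hypothetical stand-in for one day of crop model integration."""
    return {"day": state["day"] + 1, "lai": state["lai"] * 1.05}

def run_segment(path, n_days):
    """Restart from the state file, advance n_days, then stop and save."""
    state = load_state(path)
    for _ in range(n_days):
        state = toy_daily_step(state)
    save_state(path, state)
    return state
```

Between two calls to `run_segment`, an assimilation step could read the file, modify the target state variables, and write it back. Because this sketch stores floats with `repr`, an interrupted run reproduces a continuous run exactly; a fixed-precision text format would instead introduce the small truncation and rounding differences mentioned above.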

Testing and evaluation
To test the data assimilation-crop modeling framework, we selected a large-scale, relatively homogeneous farming system in the Midwestern USA, Story County, Iowa, where the majority of the crops are maize and soybean, but maize dominates the landscape (Fig. 2). In our study, we only considered maize (Zea mays) in the analysis.

Crop yields
Maize production data were downloaded and processed for Story County from the USDA National Agricultural Statistics Service (NASS) website. Available data include area planted and harvested and the corresponding yield. Yields measured in bushels were converted to mass units (Mg ha⁻¹). The 2003-2009 mean yield was 11.12 Mg ha⁻¹ with a standard deviation of 0.7 Mg ha⁻¹. The NASS yield data are the most readily available estimate of maize yield for aggregate modeling in the county. Plot and field scale yield data averaged across the county might represent county yield more accurately but were not available.
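The unit conversion can be sketched as below. The text does not state the conversion factor that was used; this example assumes the standard maize bushel weight of 56 lb and yields reported in bushels per acre, and the function name is hypothetical.

```python
LB_TO_KG = 0.45359237          # kilograms per pound (exact by definition)
ACRE_TO_HA = 0.40468564224     # hectares per international acre
MAIZE_LB_PER_BUSHEL = 56.0     # assumed standard bushel weight for maize

def bu_per_acre_to_mg_per_ha(yield_bu_ac):
    """Convert a maize yield from bushels per acre to Mg per hectare."""
    kg_per_ha = yield_bu_ac * MAIZE_LB_PER_BUSHEL * LB_TO_KG / ACRE_TO_HA
    return kg_per_ha / 1000.0
```

Under these assumptions one bushel per acre corresponds to roughly 0.063 Mg ha⁻¹. The conversion does not adjust for grain moisture content, which matters when reported yields are compared with simulated dry weight.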

Soils
Soil data in shapefile format were downloaded and processed for Iowa from the Soil Survey Geographic (SSURGO) database operated by the Natural Resources Conservation Service (NRCS) of the United States Department of Agriculture (USDA). The tabular dataset contains estimated and measured physical and chemical soil properties and soil interpretations. Map units were dominated by a single soil or a mixture of soils.
We selected the two dominant soil profiles present within Story County. The soil profiles consisted mostly of loamy and clay loam soils of glacial till origin. The loamy soil is well drained with high water holding capacity, making it suitable for agricultural use. The clay loam soil has lower drainage capacity and high water holding capacity due to its clay texture. Soil profiles of these two dominant soil types were simulated using nine soil layers (0-5, 5-15, 15-30, 30-45, 45-60, 60-90, 90-120, 120-150, 150-180 cm) to a depth of 180 cm.

Remote sensing data
2.2.4.1. AMSR-E soil moisture. For this study, we used the AMSR-E soil moisture product (Njoku, Jackson, Lakshmi, Chan, & Nghiem, 2003). AMSR-E data were downloaded from the NSIDC web portal (http://nsidc.org/data/amsre/) and further processed at the Jet Propulsion Lab (JPL). The crop model runs on a daily time step, and the best soil moisture estimate available for assimilation is the AMSR-E descending (A.M. pass) product. A fundamental assumption of the soil moisture retrieval algorithm is the uniformity of the temperature profile between the soil and the vegetation stand; the morning retrieval period best satisfies this assumption and hence gives more accurate retrievals. An example of the AMSR-E soil moisture product (~0-2 cm depth) over North America gridded at 25 km is shown in Fig. 4. The AMSR-E soil moisture level-3 product retrieval algorithm uses polarization ratios (PR, i.e., the difference between vertical and horizontal brightness temperatures at a given frequency, divided by their sum) of the AMSR-E channel brightness temperatures at X-band (10.27 GHz) (Njoku & Chan, 2006).
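Written as an equation, the polarization ratio defined in words above takes the following form. The symbols $T_{B,V}$ and $T_{B,H}$ for the vertically and horizontally polarized brightness temperatures at frequency $f$ are notation assumed here for illustration.

```latex
PR(f) = \frac{T_{B,V}(f) - T_{B,H}(f)}{T_{B,V}(f) + T_{B,H}(f)}
```

Because the numerator and denominator are both brightness temperatures, the ratio is dimensionless.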
A time series of AMSR-E soil moisture data over Story County (Fig. 3) shows a distinct signature typical of AMSR-E X-band measurements. Fluctuations in soil moisture values due to wetting and drying are apparent earlier in the year, but they are dampened later in the year when vegetation cover is high. The AMSR-E X-band measurements cannot penetrate lush vegetation sufficiently to detect surface soil moisture; they are attenuated by high vegetation water content (VWC). During data assimilation, uncertainties (errors) linearly interpolated between 2% for bare soil and 7.5% for a VWC of 6 kg m⁻² (Bindlish, Jackson, Gasiewski, Klein, & Njoku, 2006) were assigned to the AMSR-E soil moisture estimates for this period.
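The linear interpolation of retrieval error can be sketched as follows. This example assumes that bare soil corresponds to a VWC of 0 kg m⁻² and that the error is held at 7.5% for VWC above 6 kg m⁻²; neither detail is stated in the text, and the function name is hypothetical.

```python
import numpy as np

def amsre_error_percent(vwc_kg_m2):
    """Retrieval error (%) as a linear function of vegetation water content.

    Anchors: 2% at VWC = 0 (assumed bare soil) and 7.5% at VWC = 6 kg m^-2.
    np.interp holds the end values outside this range (an assumption here).
    """
    return np.interp(vwc_kg_m2, [0.0, 6.0], [2.0, 7.5])
```

At the midpoint, a VWC of 3 kg m⁻², this gives an error of 4.75%.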

MODIS leaf area index (LAI).
We downloaded MOD15A2 from the NASA ECHO portal (http://reverb.echo.nasa.gov/reverb/) and processed MODIS-LAI for Iowa. We used the MODIS Reprojection Tool (MRT) to process 4 tiles of MODIS data for the state. The MODIS-LAI product is available as an 8-day composite at 1 km resolution. For testing and evaluation of the data assimilation-crop modeling framework we processed LAI from 2003 to 2009. In sampling county-wide LAI, we randomly selected maize pixels from the landuse-landcover maps of the USDA-NASS crop data layer (http://nassgeodata.gmu.edu/CropScape/) and extracted LAI profiles from the co-located, stacked MODIS-LAI images. The extracted LAI profiles were further processed (smoothed, then averaged) to arrive at an aggregate Story County LAI time series.
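The smooth-then-average step can be sketched as below. The smoothing method is not specified in the text; a centered moving average with an odd window and edge padding is assumed purely for illustration, and the function name is hypothetical.

```python
import numpy as np

def county_lai_series(pixel_profiles, window=3):
    """Smooth each pixel's LAI profile, then average across pixels.

    pixel_profiles: (n_pixels, n_composites) array of 8-day LAI values
    window: odd length of the moving average (assumed smoothing method)
    """
    profiles = np.asarray(pixel_profiles, dtype=float)
    kernel = np.ones(window) / window
    pad = window // 2
    smoothed = np.empty_like(profiles)
    for i, row in enumerate(profiles):
        padded = np.pad(row, pad, mode="edge")            # repeat end values
        smoothed[i] = np.convolve(padded, kernel, mode="valid")
    return smoothed.mean(axis=0)                          # aggregate time series
```

Smoothing each pixel before averaging limits the influence of isolated low values in individual profiles on the aggregate series.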

Data assimilation strategies
We used a hybrid maize variety in the crop simulations, roughly calibrated using 2005 data. The crop model was run in open-loop, and the genotype coefficients and planting date were adjusted manually to match the growth and phenological characteristics of the crops based on USDA-NASS data, remote sensing data and crop information from Iowa State University (ISU) Extension and Outreach (http://www.extension.iastate.edu). Sequential assimilation of remote sensing data could also correct some uncertainty associated with calibrated model parameters (Das et al., 2008).
We used DOY 130 as the sowing date (all years) for the data assimilation runs, based on examination of the MODIS-LAI profiles. This is an approximation, as the planting window in Central Iowa is between April 15 and May 18 (http://www.extension.iastate.edu). Curnel et al. (2011) noted that a phenological shift between model and observations can have a large impact; thus the sowing date should be chosen with care. Simulations were started at DOY 100 to allow the soil water balance model to initialize for 30 days. Fertilizer applications were fixed at 200 kg-N ha⁻¹ (based on typical practices for maize in Iowa), applied at sowing. Plant stand density was set at 7.2 plants m⁻². Evapotranspiration was modeled using the Priestley-Taylor method. No irrigation was applied.
The study compared an open-loop simulation and simulations that assimilated (i) LAI only, (ii) soil moisture only and (iii) soil moisture and LAI. All simulations were conducted for 2003-2009, reinitialized for each growing season. Multiple-year analyses ensure that the data assimilation-crop modeling framework was subjected to the actual variability of climate. A provision was made in the data assimilation-crop modeling framework so that if simulated LAI was above a threshold level (see Section 2.3.4), the EnKF increases the retrieval error of AMSR-E soil moisture (see Eqs. (2) and (7)).
2.2.5.1. Monte Carlo simulations. The success of data assimilation using EnKF depends greatly on the Monte Carlo setup. Crop model parameters that have a major influence on the DSSAT-CSM-Maize model include physical attributes of the soil profile (residual water content, field capacity and saturated water content), thermal time for seedling emergence, thermal time from silking to physiological maturity, maximum number of kernels per plant and phyllochron interval. They were all considered fixed parameters during data assimilation. An uncertainty level of 10% was introduced to each of the model parameters above, which were perturbed using a Gaussian distribution. Ensemble members were then generated by randomly sampling model parameter combinations from the perturbed arrays. Forty ensemble members were selected to optimize the EnKF framework performance in terms of accuracy and computational time. At the start of simulation, the modified DSSAT-CSM-Maize model also randomly sampled values of leaf weight at emergence and plant leaf area at emergence for each selected ensemble member to increase the variability of the ensemble.
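The ensemble generation can be sketched as follows. This is an illustration under assumptions: the 10% uncertainty is read as a Gaussian standard deviation equal to 10% of the nominal value, the size of each perturbed array (`pool_size`) is arbitrary, and the function and parameter names are hypothetical.

```python
import numpy as np

def generate_ensemble(base_params, n_members=40, rel_sd=0.10,
                      pool_size=1000, seed=0):
    """Build ensemble members by sampling from perturbed parameter arrays."""
    rng = np.random.default_rng(seed)
    members = [dict() for _ in range(n_members)]
    for name, value in base_params.items():
        # Perturbed array: Gaussian around the nominal value, sd = 10% (assumed)
        pool = rng.normal(value, rel_sd * abs(value), size=pool_size)
        # Each member draws its value for this parameter independently
        picks = rng.choice(pool, size=n_members)
        for member, pick in zip(members, picks):
            member[name] = float(pick)
    return members
```

A call such as `generate_ensemble({"field_capacity": 0.30, "phyllochron_interval": 40.0})` returns 40 dictionaries, one per ensemble member; the parameter names and nominal values in this call are placeholders rather than values from the study.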
2.2.5.2. Nitrogen and scaling effects. Coupling the nitrogen (N) process is necessary in actual yield estimation. In addition to our coupled N experiments, we ran an experiment in which N is decoupled from the process, i.e., simulations are not impacted by N stress, only by water stress, to test how the data assimilation-crop modeling framework performs. We also tested scaling the AMSR-E soil moisture data to the climatology of the crop model with the aim of reducing the RS data bias (Reichle & Koster, 2004). We used a cumulative distribution function (CDF) mapping to scale the RS soil moisture data to the distribution of the modeled soil moisture (Eq. (8)):

$$x' = F^{-1}_{\mathrm{model}}\left(F_{\mathrm{AMSR\text{-}E}}(x)\right) \tag{8}$$

where x and x' are the non-scaled and scaled AMSR-E soil moisture and F(.) is the CDF of the model or AMSR-E soil moisture; here we used normal distributions for the CDFs.
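With normal distributions on both sides, the CDF mapping reduces to a linear rescaling of the standardized anomaly. The sketch below shows both forms using Python's standard library; the function names and the example means and standard deviations in the usage note are hypothetical.

```python
from statistics import NormalDist

def cdf_match(x, obs_dist, model_dist):
    """Scale x by mapping it through the observed CDF, then the inverse model CDF."""
    return model_dist.inv_cdf(obs_dist.cdf(x))

def cdf_match_linear(x, obs_dist, model_dist):
    """Closed form of the same mapping when both distributions are normal."""
    z = (x - obs_dist.mean) / obs_dist.stdev       # standardized anomaly
    return model_dist.mean + model_dist.stdev * z
```

For example, with an assumed observed climatology `NormalDist(0.25, 0.05)` and model climatology `NormalDist(0.20, 0.04)`, an observation of 0.30 lies one standard deviation above the observed mean and is mapped to one standard deviation above the model mean, about 0.24. The mapping removes differences in mean and variance only; it cannot correct a bias that the model climatology itself carries.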
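Note, however, that if the model is also biased, there is a danger in scaling RS soil moisture to the climatology of the model. If available, correcting the RS data with ground observations upscaled to the proper resolution may be a better option (e.g., Ines & Hansen, 2006; Ines, Hansen, & Robertson, 2011). Better soil moisture measurements from …

Correlation (R) (Eq. (9)), Mean Bias Error (MBE, Mg ha⁻¹) (Eq. (10)) and Root Mean Squared Error (RMSE, Mg ha⁻¹) (Eq. (11)) were used to measure performance:

$$R = \frac{\sum_{i=1}^{M} \left(\bar{y}_i - \langle \bar{y} \rangle\right)\left(y_{obs,i} - \langle y_{obs} \rangle\right)}{\sqrt{\sum_{i=1}^{M} \left(\bar{y}_i - \langle \bar{y} \rangle\right)^2 \sum_{i=1}^{M} \left(y_{obs,i} - \langle y_{obs} \rangle\right)^2}} \tag{9}$$

$$MBE = \frac{1}{M} \sum_{i=1}^{M} \left(\bar{y}_i - y_{obs,i}\right) \tag{10}$$

$$RMSE = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left(\bar{y}_i - y_{obs,i}\right)^2} \tag{11}$$

where $\bar{y}_i = \frac{1}{N_e} \sum_{n=1}^{N_e} y_{n,i}$ is the average yield (Mg ha⁻¹) of the ensemble run, i is an index of year, M is the number of years, n is an index of ensemble member, $N_e$ is the size of the ensemble, $y_{obs,i}$ is the county level observed yield for year i, $y_{n,i}$ is an ensemble member yield for year i, and angle brackets denote the mean over the M years.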
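The three performance measures can be computed as sketched below. The sketch assumes that each measure compares the ensemble-mean yield with the observed yield for each year, with errors taken as simulated minus observed so that under-prediction gives a negative MBE; the function name and array layout are hypothetical.

```python
import numpy as np

def yield_metrics(ens_yields, obs_yields):
    """Return (R, MBE, RMSE) for ensemble-mean yields against observations.

    ens_yields: (n_years, n_members) simulated yields, Mg/ha
    obs_yields: (n_years,) observed county yields, Mg/ha
    """
    sim = np.asarray(ens_yields, dtype=float).mean(axis=1)   # ensemble mean per year
    obs = np.asarray(obs_yields, dtype=float)
    err = sim - obs                                          # simulated minus observed
    r = np.corrcoef(sim, obs)[0, 1]                          # Pearson correlation
    mbe = err.mean()
    rmse = np.sqrt((err ** 2).mean())
    return r, mbe, rmse
```

A simulation that tracks the year-to-year pattern perfectly but is uniformly 1 Mg ha⁻¹ too low would score R = 1 with MBE = −1 and RMSE = 1, which illustrates why correlation and error measures are reported together.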

Open-loop simulations
The crop model without data assimilation was able to capture a substantial part of the year-to-year variability of county-level yields (R = 0.47, Fig. 5a, Table 1). However, simulations generally under-predicted observed yields, leading to high MBE (−3.7 Mg ha⁻¹) and RMSE (4.7 Mg ha⁻¹) (Table 1; actual errors could be lower as the model predicted yields in dry weight). The yield mismatch was particularly pronounced in 2006 and 2009. Based on the weather station data, these years received the lowest rainfall (Fig. 3), but records show that yields were high in the county. We hypothesize that this performance of the open-loop simulation could be attributed to several factors, which may include uncertainties in forcing data, e.g., rainfall at the station (due to scale effects), and modeling assumptions, including model parameters. Assuming that the USDA-NASS yield data are accurate, which may be valid in the US, the open-loop simulation results suggest that the rainfall measured at the weather station in 2006 and 2009 may not have represented well what occurred in the county (scale effects), as the recorded yields of maize (fully rainfed) during those years are above average. A quick verification with CMORPH rainfall data supported this hypothesis, but one should note that there are also uncertainties associated with these data (http://iridl.ldeo.columbia.edu/expert/SOURCES/.NOAA/.NCEP/.CPC/.CMORPH). However, it is also possible that the crops' deeper root system may have allowed the crops to access deeper soil moisture during those years, hence the higher yields. During 2008, which was a very wet year, the slightly lower county yields could be attributed to the lower temperature and solar radiation associated with very wet conditions, and to water logging or leaching of nutrients due to runoff or deep percolation. The correlation of simulated yields and observations with open-loop simulation suggests that the weather station used in the simulations had captured some of the inter-annual variability of rainfall in the county.

Assimilating LAI
Assimilating MODIS-LAI showed a slight improvement in simulated yields compared with the open-loop simulation (R = 0.51), including a small reduction of systematic error (Table 1, Fig. 5b). Assimilation improved simulated yields in some years (2004, 2005, 2006, 2007 and 2008), but did not show improvement in 2009. For brevity, we present detailed analyses only for the contrasting years 2006, 2009, and 2008. Based on weather station records, 2006, 2008, and 2009 were moderately dry, extremely wet, and very dry, respectively (Fig. 3).
Fig. 6 shows the vegetation, near-surface soil moisture dynamics, and water and nitrogen stress in 2006, with open-loop simulation (a-d) and data assimilation of LAI (e-h). Enhanced canopy growth resulting from the assimilation of remotely sensed LAI is associated with the improved yield simulation that year. In both cases, the crops were subjected to high intermittent water stress due to low rainfall (Fig. 6c,g), but with no apparent nitrogen (N) stress (Fig. 6d,h). The small shifts in simulated near-surface soil moisture (Fig. 6b,f, from DOY 200 onward) with LAI assimilation show that the enhanced canopy also altered the simulated water balance, especially evapotranspiration and rootzone soil moisture (not shown).
In 2009, rootzone soil moisture was very low (data not shown) and the crop was subjected to extreme levels of prolonged water stress due to lack of rain after anthesis, as recorded by the weather station. This led to drastic reductions in LAI under the open-loop simulation (Fig. 7a,c). Instead of improving simulated yield, assimilation of remotely sensed LAI increased water stress and reduced simulated yield because the limited rootzone soil moisture could not satisfy the increase in water demand that resulted from increased LAI (Fig. 7e,g). This increase in canopy growth from assimilation of LAI data also increased N stress (Fig. 7d,h) because of the non-availability of labile N due to limited soil moisture and increased N demand by the crops.
The improvements in simulated yields with the assimilation of LAI in 2008 were due to enhanced canopy growth (Fig. 8a,e) combined with abundant soil moisture availability in a wet year (Fig. 8b,f). Because of the extremely wet conditions, the LAI simulated in open-loop was robust, but still lower than with data assimilation. With LAI assimilation, water stress was negligible until the later part of the growing season (Fig. 8c,g). Nitrogen stress is apparent due to higher N leaching and N loss associated with increased percolation and runoff (Fig. 8d,h).

Assimilating soil moisture
When assimilating RS soil moisture alone, we did not constrain vegetation dynamics. Assimilating AMSR-E soil moisture improved simulated yields compared with open-loop simulations, except for 2004 and 2008 (Table 1, Fig. 5c).
The median of the yield ensemble improved substantially when soil moisture was assimilated in 2006 (Fig. 5c). The assimilation of soil moisture led to an overall reduction in water stress, improved canopy growth (Fig. 9a,c) and increased near-surface soil moisture (Fig. 9b). The durations and intensities of water stress experienced by the crops in the open-loop simulation were minimized after soil moisture assimilation. de Wit and Van Diepen (2007) observed that RS soil moisture, when assimilated into a crop model, could correct some scale issues associated with rainfall. Note, however, that the AMSR-E signal is known to be attenuated by high vegetation moisture content (fully developed canopy), causing bias in the measurements (Fig. 9b). Because of the bias in AMSR-E soil moisture, the assimilation of near-surface soil moisture to update soil moisture in the rootzone increased cumulative percolation by roughly 4 orders of magnitude, which is linked to N stress in the later part of the growing season (Fig. 9d). The bias in AMSR-E could be corrected using upscaled ground observations. When in service in 2014, the SMAP mission may reduce the severity of soil moisture signal saturation due to high vegetation volume during the growing season (Das, Entekhabi, & Njoku, 2011; Entekhabi et al., 2010).
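How a near-surface observation can propagate to deeper soil layers through the ensemble covariance can be sketched as follows. This is a minimal perturbed-observation EnKF update under stated assumptions, not the coupling with DSSAT-CSM used in this study: the function name `enkf_update`, the three-layer state, the ensemble size, and the observation error are all hypothetical.

```python
import numpy as np

def enkf_update(states, obs, obs_err_std, obs_index=0, rng=None):
    """Perturbed-observation EnKF update for a single scalar observation.

    states      : (n_members, n_layers) soil moisture per layer (cm3 cm-3)
    obs         : observed soil moisture of layer `obs_index`
    obs_err_std : assumed standard deviation of the observation error
    """
    rng = np.random.default_rng() if rng is None else rng
    n_members = states.shape[0]

    # Ensemble anomalies of the full state and of the observed layer
    anomalies = states - states.mean(axis=0)
    predicted = states[:, obs_index]
    pred_anom = predicted - predicted.mean()

    # Covariance of every layer with the observed layer, and its variance
    cov_xy = anomalies.T @ pred_anom / (n_members - 1)
    var_y = pred_anom @ pred_anom / (n_members - 1)

    # Kalman gain: one weight per layer
    gain = cov_xy / (var_y + obs_err_std ** 2)

    # Each member assimilates its own perturbed copy of the observation
    perturbed_obs = obs + rng.normal(0.0, obs_err_std, size=n_members)
    innovation = perturbed_obs - predicted
    return states + np.outer(innovation, gain)

# Hypothetical 50-member ensemble: surface layer plus two correlated deeper layers
rng = np.random.default_rng(0)
surface = rng.normal(0.20, 0.03, size=50)
ensemble = np.column_stack([surface, 0.8 * surface + 0.08, 0.5 * surface + 0.15])
updated = enkf_update(ensemble, obs=0.30, obs_err_std=0.02, rng=rng)
```

Because the deeper layers are positively correlated with the surface layer in this toy ensemble, an observation wetter than the ensemble mean wets every layer. A positively biased observation would therefore add water throughout the profile, which is the kind of effect on percolation discussed above.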
The case of 2009 suggests the potential use of remotely sensed soil moisture to correct shortcomings, due to scale issues or instrument failure, in the station weather data used as inputs to simulate aggregate yield (Fig. 5c), as also mentioned by Bolten and Crow (2012). When soil moisture was assimilated, most of the water stress experienced by the crops during the open-loop simulation and LAI data assimilation was minimized, resulting in enhanced canopy growth and yield (Fig. 10a-c, Fig. 5c). But as observed in 2006, the assimilation of AMSR-E soil moisture also increased total percolation compared with the open-loop simulation, which is linked to N stress in the later part of the growing season (Fig. 10d).
The case of 2008 (Fig. 11) shows another aspect of soil moisture assimilation, especially with biased AMSR-E soil moisture data. This bias appears to be a disadvantage in very wet conditions. While assimilation of AMSR-E works well in nominal conditions, performance can degrade severely when conditions are extremely wet (Fig. 5c). Because excess water enters the soil profile as a result of rootzone soil moisture updating and excessive rainfall, N losses appear to be high due to excessive leaching as well as surface runoff, resulting in extreme N stress (in both duration and intensity) during the growing season (Fig. 11c,d). Even though water stress is completely eliminated after soil moisture assimilation (Fig. 11b,c), the crops in the simulations failed due to extreme N stress (Fig. 11a,d). In DSSAT-CSM, the law of the minimum is used to account for the effects of N and water stress on daily plant growth (i.e., min[(1 − WStress), (1 − NStress)]); thus, if N stress dominates, and depending on timing (the vegetative and anthesis growth stages are critical), crops are adversely impacted, resulting in lower yields or, worse, crop failure. This is also what happened in the 2004 simulations, although less severely than in 2008. It appears that when a system is not soil moisture-controlled, assimilating only LAI may give better results (Fig. 5b).
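The law-of-the-minimum behavior described above can be illustrated with a short sketch. It assumes that both stress indices run from 0 (no stress) to 1 (maximum stress); the function and argument names are hypothetical, and the actual DSSAT-CSM growth routines are considerably more detailed.

```python
def growth_reduction_factor(water_stress, n_stress):
    """Fraction of potential daily growth retained under the law of the minimum.

    Both stress indices are assumed to lie in [0, 1], where 0 means no stress.
    Only the most limiting factor determines the result.
    """
    for name, value in (("water_stress", water_stress), ("n_stress", n_stress)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return min(1.0 - water_stress, 1.0 - n_stress)

# Water stress fully removed, but strong N stress still limits growth
limited_by_n = growth_reduction_factor(water_stress=0.0, n_stress=0.7)

# Moderate water stress dominates mild N stress
limited_by_water = growth_reduction_factor(water_stress=0.4, n_stress=0.1)
```

In the first call, eliminating water stress does not help because N stress alone sets the growth factor, which mirrors the 2008 case in which soil moisture assimilation removed water stress yet the simulated crop still failed.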

Assimilating soil moisture and LAI
The performance of the data assimilation-crop modeling framework showed further improvement when both soil moisture and LAI were assimilated (R = 0.65, MBE = −2.0 Mg ha−1, RMSE = 3.9 Mg ha−1; Table 1, Fig. 5d), suggesting an interaction between the two.
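For reference, the three skill measures reported here can be computed as in the sketch below. The sign convention for MBE (simulated minus observed, so that negative values indicate under-prediction) is an assumption consistent with the negative bias reported in this section, and the yield values in the example are illustrative only, not data from this study.

```python
import numpy as np

def yield_skill(observed, simulated):
    """Return (R, MBE, RMSE) for paired observed and simulated yields."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    error = simulated - observed              # assumed sign convention
    r = np.corrcoef(observed, simulated)[0, 1]  # Pearson correlation
    mbe = error.mean()                        # mean bias error
    rmse = np.sqrt((error ** 2).mean())       # root mean square error
    return r, mbe, rmse

# Illustrative yields in Mg ha-1 (not from the study)
r, mbe, rmse = yield_skill([10.0, 11.0, 12.0], [8.0, 9.0, 10.0])
```

In this toy case the simulation tracks inter-annual variability perfectly but under-predicts every year by the same amount, which shows why correlation and error statistics need to be read together.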
Assimilating both LAI and soil moisture (Fig. 9e-h) tightened the ensemble yield distribution in 2006 compared to soil moisture assimilation alone (Fig. 5c,d). The resulting median was closer to the recorded county yield. Assimilating soil moisture reduced simulated crop water stress, while the simultaneous LAI assimilation corrected the LAI profiles (Fig. 9e,f).
Assimilating the combination of AMSR-E soil moisture and MODIS-LAI increased the cumulative percolation (results not shown), which led to N stress in the later part of the growing season (Fig. 9h). As noted earlier, assimilating AMSR-E soil moisture alone also increased percolation and N stress (Section 3.2.2). However, the effect of data assimilation on N stress was less severe when both LAI and soil moisture were assimilated (Fig. 9d,h). This was because more ensemble members had above-average LAI values when soil moisture alone was assimilated (Fig. 9a). We observed the same in the case of 2009 (Fig. 10a-h), but the simulated yield distribution shifted down when LAI and soil moisture were assimilated (Fig. 5d).
Simultaneous assimilation of soil moisture and LAI improved simulated yields slightly in 2004 and 2008 (Fig. 5c,d). However, assimilating LAI did not compensate for the over-prediction of leaching, N stress and yield reduction that apparently resulted from assimilating soil moisture data in these relatively wet years (Fig. 11e-h; see Section 3.2.2). We believe this could be an artifact of the bias in AMSR-E soil moisture data, with effects more pronounced under extremely wet conditions.

Composite analysis
With the current limitations of the data and of our data assimilation system, a strategic application of the data assimilation-crop modeling framework appears desirable to further improve performance (i.e., one should know when it is best to use LAI alone or LAI plus soil moisture in data assimilation). Our analysis showed that the climate conditions expected during the growing season could indicate which variable is best assimilated. As a base scenario, we exploited all sources of RS information for predicting crop yields. In this case, if we run data assimilation with soil moisture, LAI, and LAI + soil moisture and then aggregate all their results, we get a yield performance of R = 0.73, MBE = −2.4 Mg ha−1 and RMSE = 3.0 Mg ha−1 (composite all, Table 1), but the spread of the prediction is wider (Fig. 5e). In our earlier analysis, we found that when the system is not soil moisture-controlled (extremely wet conditions), assimilating LAI alone might be preferable because the plants will not experience extreme water stress anyway, and that when conditions are nominal, assimilating soil moisture and LAI might be better. This avoids the scenario in which the data assimilation-crop modeling framework depletes nitrogen from the soil through very high percolation. Using this information, we applied the data assimilation-crop modeling framework to predict crop yields, and the performance improved substantially, with R = 0.80, MBE = −1.2 Mg ha−1 and RMSE = 1.4 Mg ha−1 (composite best; Fig. 5f, Table 1). The challenge in applying this rule of thumb would be the availability of skillful seasonal climate forecasts. Better soil moisture products, e.g., from SMAP, would be an even better way to fully exploit this kind of methodology. Some components of the data assimilation-crop modeling framework could be improved as well, e.g., by using a variable inflation parameter.
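The rule of thumb described above can be written down as a small selection function. This is only a sketch of the decision logic: the two outlook categories, the enum, and the function name are hypothetical, and how a seasonal forecast would be mapped to a category is left open.

```python
from enum import Enum

class Outlook(Enum):
    """Hypothetical categories for the expected growing-season climate."""
    EXTREMELY_WET = "extremely wet"
    NOMINAL = "nominal"

def select_assimilated_variables(outlook):
    """Choose which RS variables to assimilate for a season.

    Extremely wet: LAI only, to avoid updating rootzone soil moisture
    when the system is not soil moisture-controlled.
    Otherwise: soil moisture and LAI together.
    """
    if outlook is Outlook.EXTREMELY_WET:
        return ("LAI",)
    return ("soil_moisture", "LAI")

# Hypothetical outlooks for two seasons
plan = {
    "season_A": select_assimilated_variables(Outlook.EXTREMELY_WET),
    "season_B": select_assimilated_variables(Outlook.NOMINAL),
}
```

The function only encodes the selection step; its usefulness in practice depends entirely on the skill of the seasonal climate forecast used to pick the category.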

Nitrogen and scaling effects
Accounting for the N process is important in simulating actual yields, even when fertilizer application is at a very high level, because of the interaction between water and N in crop growth. Fig. 12a shows the tendency to over-predict crop yields when N is non-limiting in the data assimilation; the overall performance is poor (Table 1).
In our case, scaling RS soil moisture with model climatology did not work well because our model was also biased in simulating county-scale soil-water-plant-atmosphere processes (Fig. 12b), as shown by the open-loop simulation results (Fig. 5a). Yield correlation was similar to that obtained without scaling, but the prediction errors were much higher (Table 1). However, the potential

Summary and conclusions
We developed and tested a data assimilation-crop modeling framework for assimilating RS soil moisture and LAI to predict crop yields at aggregate scale. We found that DSSAT-CSM-Maize run as a Monte Carlo ensemble without data assimilation was able to capture part of the inter-annual variability of maize yields in Story County, but with a strong negative mean bias (better model calibration could have improved this further). However, two interesting years (2006 and 2009) shed light on the risk of using only point-scale (station) weather data as inputs for pseudo-regional crop modeling applications that simulate aggregate yield. When MODIS-LAI was assimilated, we observed some improvements in simulated yields, including in 2006. However, under very dry conditions, as in 2009, LAI assimilation could not improve simulated yields because the rootzone soil moisture could not meet the increased water demand that results from improved canopy growth.
We also observed an incremental improvement in simulated yields when AMSR-E soil moisture was assimilated, especially for 2006 and 2009, suggesting the potential use of remotely sensed soil moisture data to correct the shortcomings of point-scale weather data as input for simulating aggregate yields, in addition to correcting model errors. However, assimilating near-surface soil moisture to update the rootzone wetness caused excessive vertical drainage, which under extremely wet conditions can result in the underestimation of yields due to excessive N leaching and loss. We believe that this is an artifact of the biased AMSR-E soil moisture data, whose impact is more pronounced under extremely wet climate conditions. Unfortunately, the AMSR-E signal becomes insensitive under dense vegetation at precisely the time when accurate soil moisture measurements are needed for data assimilation.
There were further improvements in simulated yields when soil moisture (SM) + LAI were assimilated, but because of the current limitations of the data assimilation system, a composite analysis was tested to decide when LAI or SM + LAI assimilation is best applied. We found that applying the data assimilation-crop modeling framework strategically, using a-priori information on the climate expected during the growing season, could improve the overall simulated yield performance. Under very wet conditions, SM assimilation may not be necessary and LAI assimilation may be more useful.
Applying the commonly used scaling of RS soil moisture data to model climatology did not improve our data assimilation results because our model is also biased (as shown in open-loop) in simulating county-scale soil-water-plant-atmosphere processes.
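To make the scaling step concrete, the sketch below rescales RS soil moisture so that its mean and standard deviation match those of the model climatology. Mean and variance matching is swapped in here as one simple, commonly used option; it is an assumption and not necessarily the scaling method applied in this study (CDF matching is another frequent choice). All names are hypothetical.

```python
import numpy as np

def rescale_to_model_climatology(rs_obs, rs_climatology, model_climatology):
    """Linearly rescale RS soil moisture to the model's mean and spread.

    rs_obs            : RS retrievals to rescale
    rs_climatology    : long-term RS record defining the RS mean and std
    model_climatology : long-term model record defining the target mean and std
    """
    rs_obs = np.asarray(rs_obs, dtype=float)
    rs_mean, rs_std = np.mean(rs_climatology), np.std(rs_climatology)
    model_mean, model_std = np.mean(model_climatology), np.std(model_climatology)
    # Standardize against the RS record, then map onto the model distribution
    return model_mean + (rs_obs - rs_mean) * (model_std / rs_std)

# Illustrative records (cm3 cm-3), not data from the study
rs_record = np.array([0.30, 0.35, 0.40, 0.45, 0.50])
model_record = np.array([0.15, 0.18, 0.20, 0.22, 0.25])
scaled = rescale_to_model_climatology(rs_record, rs_record, model_record)
```

By construction the rescaled observations inherit the model's mean and variance, so any bias in the model climatology is carried into the assimilated data, which is consistent with the lack of improvement reported here.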
The AMSR-E instrument aboard the Aqua satellite malfunctioned in October 2011 and is no longer in service. NASA's upcoming Soil Moisture Active Passive (SMAP) mission will be a new source of soil moisture estimates. The L-band radar and L-band radiometer measurements from the SMAP mission will help alleviate the problem of soil moisture detection under dense vegetation, with a penetration depth of ~0-5 cm. The SMAP radar and radiometer are designed to remotely sense soil moisture with high accuracy (RMSE ≤ 0.04 cm3 cm−3) under vegetation with water content up to 5.0 kg m−2 (Entekhabi et al., 2010). This will ensure that the SMAP measurements are effective throughout most of the growing season for many crops, and hence that better soil moisture data are available during the critical period of crop growth. For future work, we plan to use the SMAP radiometer-radar combined 9 km soil moisture product (Das et al., 2011) or the SMAP radar-only 3 km soil moisture product to test the data assimilation-crop modeling framework presented here. The Global Precipitation Measurement (GPM) rainfall product will also be of great value for aggregate-scale modeling of crop yields.

Fig. 9. LAI, soil moisture, water and nitrogen stress (unitless) simulations with (a-d) data assimilation of AMSR-E soil moisture and (e-h) data assimilation of AMSR-E soil moisture and MODIS-LAI in 2006.

Fig. 10. LAI, soil moisture, water and nitrogen stress (unitless) simulations with (a-d) data assimilation of AMSR-E soil moisture and (e-h) data assimilation of AMSR-E soil moisture and MODIS-LAI in 2009.

Fig. 11. LAI, soil moisture, water and nitrogen stress (unitless) simulations with (a-d) data assimilation of AMSR-E soil moisture and (e-h) data assimilation of AMSR-E soil moisture and MODIS-LAI in 2008.

Fig. 12. Performance of assimilation of AMSR-E soil moisture + MODIS-LAI (a) when nitrogen is not limiting (nitrogen module is OFF), and (b) when AMSR-E soil moisture is scaled to the climatology of the model (nitrogen module is ON). The flat boxplot shows Story County (IA) yield data from USDA-NASS from 2003 to 2009.

Table 1
Performance (average) of the EnKF data assimilation system for simulating maize yields, Story County, Iowa.
a Data assimilation with soil moisture + LAI with non-limited N (i.e., N-process is OFF).
b Data assimilation with soil moisture + LAI with AMSR-E scaling, N-process is ON.