Augmenting agroecosystem models with remote sensing data and machine learning increases overall estimates of nitrate-nitrogen leaching

Process-based agroecosystem models are powerful tools to assess performance of managed landscapes, but their ability to accurately represent reality is limited by the types of input data they can use. Ensuring these models can represent cropping field heterogeneity and environmental impact is important, especially given the growing interest in using agroecosystem models to quantify ecosystem services from best management practices and land use change. We posited that augmenting process-based agroecosystem models with additional field-specific information such as topography, hydrologic processes, or independent indicators of yield could help limit simulation artifacts that obscure mechanisms driving observed variations. To test this, we augmented the agroecosystem model Agricultural Production Systems Simulator (APSIM) with field-specific topography and satellite imagery in a simulation framework we call Foresite. We used Foresite to optimize APSIM yield predictions to match those created from a machine learning model built on remotely sensed indicators of hydrology and plant productivity. Using these improved subfield yield predictions to guide APSIM optimization, total NO3−N loss estimates increased by 39% in maize and 20% in soybeans when summed across all years. In addition, we found a disproportionate total amount of leaching in the lowest yielding field areas vs the highest yielding areas in maize (42% vs 15%) and a similar effect in soybeans (31% vs 20%). Overall, we found that augmenting process-based models with now-common subfield remotely sensed data significantly increased values of predicted nutrient loss from fields, indicating opportunities to improve field-scale agroecosystem simulations, particularly if used to calculate nutrient credits in ecosystem service markets.


Introduction
Agroecosystem services at the subfield scale have seen a resurgence in interest as efforts have increased to reduce agricultural greenhouse gas emissions and nutrient loss, with payments even going to farmers who implement more sustainable management practices. However, to assign value to ecosystem services, it is first necessary to measure their magnitude. Robust, quantitative measurement of ecosystem services can be difficult given the interactions between weather, topography, genetics, pests, management, and a range of other factors, including cost of measurement, indicating a need for better ecosystem tools Figure 1. Comparison of yield monitor data for soybeans and maize (left) and their respective APSIM simulated yields which account for soil, weather, and management (right). Grey vertical bars on the yield monitor axis represent the field mean yield as identified by yield monitor. The vertical gradient bars on the yield monitor axis (left) represent the range at which APSIM has simulated yield (right). and understanding of the within-field processes driving differences in measurement. Agroecosystem models such as the Agricultural Production Systems Simulator (APSIM) (Holzworth et al 2014) have been used to overcome measurement practicality challenges by simulating the interactions of soil type, weather, temperature, and management on crops and the environment. However, point-based agroecosystem models like APSIM tend to underperform in suboptimal field areas (McNunn et al 2019, Pasley et al 2020 and often make landscape assumptions that are not reflective of the real world, missing within-field yield variation due to factors like erosion (Thaler et al 2021), topography (Maestrini andBasso 2018, Martinez-Feria andBasso 2020), soil compaction (Sivarajan et al 2018), and seasonal flooding (Upadhyay et al 2019), among others.
Meanwhile, statistical and machine learning (ML) models are well-suited for making heterogeneous landscape predictions, but are often treated as black boxes (Pantazi et al 2016, Smidt et al 2016, Liakos et al 2018, Peng et al 2019, lacking the scientific underpinnings found in process-based agroecosystem models. The next generation of agroecosystem models combines these approaches: ML models to predict factors such as crop productivity, and process-based models whose parameters can be adjusted based on statistical model output to account for spatial heterogeneity (Roberts et al 2017, Reichstein et al 2019, Zhenong et al 2019, Zhao et al 2020, Debasish et al 2021, Shuai and Basso 2022. Outputs from these coupled models can be used to better estimate hard-to-measure ecosystem outcomes (e.g. nitrate-nitrogen (NO 3 − N) leaching) and create 'what-if ' scenarios for more sustainable management practices. With large areas of US farmland either degraded (Thaler et al 2021), unprofitable (Brandes et al 2016), underperforming (Bruno et al 2019), or a combination of the three, it is important to represent field areas that can have the greatest impact on the environment and human health.
Prior studies which combined publicly available soils and weather data with field-specific management data found APSIM to be well-suited for identifying field mean yields and responses to management changes, though poor at representing subfield yield (McNunn et al 2019). This finding is reasonable given the state-of-the-art in crop modeling to first parameterize for field mean values, not extremes. As anticipated, we found that our initial APSIM simulations based on subfield soil characteristics, weather, and management were also accurate in predicting field average yields, but did not accurately reflect subfield yield variation (figure 1). This finding is important given prior research linking within-field yield variation to nitrate loss (Bruno et al 2019), estimating nitrogen loss in low-yielding areas to be 63% higher than high-yielding areas. We then asked, can APSIM be augmented with other public data like satellite imagery and ML to better simulate within-field yield? How does using such subfield yield data to guide APSIM optimization change predictions of NO 3 − N leaching? What are strengths and limitations of this augmentation?
We used a multiscale Foresite framework (figure 2) which combines vegetative indices from satellite imagery, digital elevation models, public soil and weather data together with crop simulation and ML models to predict one commonly measured and available variable (yield) and simulate one variable that is difficult and uncommonly measured (NO 3 − N loss). Yield was chosen as the target variable because it is already used to evaluate within-field productivity by farmers and its relationship to nitrogen leaching (Bruno et al 2019). Twenty-one site years and ∼130 ha were analyzed in Boone, Story, and Jasper counties, Iowa, USA, with Foresite predictions tested against calibrated yield-monitor data. Similar studies on subfield productivity have been conducted, but few attempt to explain subfield variation using remote sensing, yield monitor data, and crop simulation models together, instead using proxies for yield monitor data like National Agricultural Statistics Service (NASS) county averages (Zhenong et al 2017), crop model predictions (Lobell et al 2015), or using remote sensing and yield monitor data but not crop simulations for latent ecosystem services (Martinez-Feria and Basso 2020, Shuai and Basso 2022).
It should be noted that we did not choose to optimize APSIM to exact yield monitor values given (a) the relative difficulty we found in obtaining high-quality, calibrated yield monitor data, and (b) given the inherent error in yield monitor data at a local scale (Lyle et al 2014, Leroux et al 2018, with some yield monitor cleaning procedures removing as much as 30% of the data (Vega et al 2019). Beyond this initial proof of concept, we plan to apply the Foresite methodology to multiple crops and regions, and expect that augmenting an agroecosystem model with remote sensing will make it more generalizable.
We expected that NO 3 − N loss estimates would increase for fields with chronic problem areas such as ephemeral spring flooding, and estimates would decrease in fields that were particularly high yielding. We believe this work is important in improving modeling efforts to more accurately represent the agricultural landscape, and call for future work to more tightly couple landscape characteristics with agroecosystem models to better study underlying landscape changes from these yield and environmental drivers.

Methods summary
We compared how subfield NO 3 − N estimates change in intensively cropped fields when using a standalone agroecosystem model (APSIM 'default' simulations), and an approach that augments these simulations with ML yield estimates from remote sensing ('Foresite augmented' simulations). We analyzed 21 field site years in maize-soybean rotations across ∼130 ha in Iowa, USA. Both the default APSIM and Foresite simulations combined disparate publicly available data on weather and soils, along with field-specific management received from farmers. Foresite augmentation accounted for additional spatial heterogeneity of field productivity by using satellite imagery and yield monitor data to train and test a ML model. Foresite augmented simulations were then optimized to four ML-predicted subfield yield quantiles (low to high). The final product is a 10 m resolution subfield map for simulated NO 3 − N loss across these four productivity quantiles (figure 2).

Yield monitor and management data
To evaluate the accuracy of both soil-based APSIM and remote-sensing-based ML yield predictions, twenty-one site years of calibrated yield monitor and management data were selected for model testing. Yield monitor data for 2018 and 2019 was obtained for three fields in Boone and Story Counties, IA, USA, and for 2016 and 2018-2020 for four field sites in Jasper County, IA, USA. The yield monitor data was cleaned using the RITAS algorithm (Damiano 2021). All fields were conventionally managed in a maizesoybean rotation. Detailed information about field management and yield monitor data cleaning can be found in the supplement.

ML subfield yield predictions
A random forest ML model was fit using green chlorophyll index (GCI) and topographic wetness index (TWI) as inputs to predict subfield yield at a 10-m resolution. GCI is a vegetative index and has been used in a range of crop measurement contexts including plant chlorophyll content, leaf area index (LAI) (Gitelson et al 2005) and yield . It is also less prone to saturation in plant canopies with a high LAI than other measurements like normalized difference vegetation index (NDVI) (Gitelson et al 2003, Gutman et al 2021. It is calculated using the green and near infrared bands (GCI = NIR Green − 1). Sentinel 2 satellite imagery (10 m resolution) was obtained from the Copernicus API (https://scihub.copernicus.eu/) and used to calculate GCI.
Given the potential effect of hydrology on some of the analyzed fields' yield, we also calculated the TWI for each field as in (Mattivi et  GIS (v. 2.3.2) software and a 3 m LiDAR-derived DEM (Iowa DNR 2018). TWI is primarily used to analyze how topography influences hydrologic processes, and has been used to predict local variables such as soil moisture and pH (Sørensen et al 2006). Unitless, TWI is typically defined as ln a tan b where a is the local upslope contributing area draining to a certain point and b is the slope in radians. More information on the ML model can be found in the supplement.

APSIM weather and soil inputs
The APSIM (Holzworth et al 2014) was chosen for simulating the interactions of weather, soil, and management on crop growth and development along with environmental effects. APSIM simulates agroecosystems on a daily time step and is composed of modules for field management, soil and water dynamics, and livestock. The 'plug-and-play' nature of APSIM's modules and their generalizability has allowed APSIM to be calibrated for different regions of the world, including the Midwest broadly and Iowa specifically (Archontoulis et al 2014, and different field management practices like cover crops or variable fertilizer applications (Basche et al 2016, Puntel et al 2016, McNunn et al 2019. Weather data for each field site-year was obtained for the centroid of each field using the NASA-POWER API (https://power.larc.nasa.gov). Subfield soil profiles for field boundaries were obtained from the US national SSURGO soils database (NRCS 2019). SSURGO identified soil properties were translated for APSIM as per Sørensen et al (2006).

Calibrating APSIM for subfield yields
The apsimx R package was used to calibrate APSIM simulations (https://github.com/femiguez/apsimx) to field-specific yearly low, medium-low, mediumhigh, and high yield quantiles. These four yield categories were identified on a field-year level by splitting the ML predictive yields for the entire site year into four quantiles and then taking the mean yield for each quantile. We chose to simulate subfield yield in this manner both for generalizability given potential yield monitor artifacts and computational savings of alternatively running Foresite for each 10 m yield pixel.
To achieve the targeted low and medium-low yields, APSIM parameters that represented the ability for roots to explore the soil profile (XF) and proportion of water the crop can take up in a day (KL) were optimized to simulate plant and soil stresses based on the presence of poorly drained areas in three of the analyzed fields. To reach medium-high yields, radiation use efficiency and harvest index were optimized, and for the highest yields end of grain filling was optimized in addition to radiation use efficiency and harvest index.
To account for uncertainty in the exact driving mechanism of yield drag, we averaged NO 3 − N loss over all adjusted parameters for each respective yield category. Gridded leaching estimates were then obtained by combining the averaged subfield soil profile leaching to their respective field-year yield quantiles. More information on this process can be found in the supplement.

General yield prediction performance and NO 3
− N leaching On average, yield predictions from the ML model that incorporated subfield information predicted grain yield more accurately than the 'default' , soil-based APSIM predictions alone (table 1). Average root mean square error (RMSE) for maize predictions compared to actual yields were 2.2 Mg ha −1 for the ML and 3.03 Mg ha −1 for default APSIM, with an average reduction in RMSE by the ML of 23% across all fields and site years compared to default APSIM. For soybeans, the average ML RMSE was 0.52 Mg ha −1 while the average default APSIM RMSE was 2.03 Mg ha −1 , with an average reduction in RMSE by the ML of 66%.
Foresite augmented simulations also increased predictions of total NO 3 − N leaching from fields compared to default APSIM simulations. On average, Foresite augmented simulations resulted in 39% greater simulated NO 3 − N leaching for maize and 20% more for soybeans. Though the total NO 3 − N leached across all fields increased, not all fields followed this trend. Seven site years of maize simulations increased in total leaching and seven site years decreased in total, while for soybeans two site years increased and five decreased in total leaching. However, the total percent increase is due to the significant increased leaching in two of the largest soybean fields (Accola, K. Finch) and all three of the largest maize fields (Accola, K. Finch, E. Bilsland).

Within-field yield prediction performance and NO 3
− N leaching 3.2.1. Maize Improvements in ML vs default APSIM yield predictions were most pronounced in the lowest yielding field areas for maize (figure 3), with an RMSE of 2.58 and 4.82 Mg ha −1 for the ML and default APSIM, respectively. There were relatively little differences between the ML and default APSIM in RMSE for the remaining medium-low to high quantiles.
Nitrate-nitrogen leaching across all maize fieldyears exemplified a disproportionate amount of leaching from the lowest yielding quantiles, with total contribution of leaching of 42%, 22%, 20%, and 16% from the lowest to highest quantiles, respectively. This indicates that nearly as much as half of the total NO 3 − N leaching could be contributed by only 25% of the field area.
For example, in fields Accola and E. Bilsland, this disproportionate leaching contribution becomes more apparent. These two fields were chosen due to their topographic and spatial yield differences: Accola, which is dominated by particularly floodprone field areas (i.e. high TWI), and E. Bilsland which is a relatively high-yielding field. For Accola, the percent contribution of total NO 3 − N leaching from optimized simulations was 39%, 22%, 21%, and 18% from the lowest to highest yielding quantiles, respectively, compared to the default APSIM contribution of 28%, 26%, 24%, and 22%. For E. Bilsland, the contribution of total NO 3 − N leaching was 51%, 22%, 15%, and 12% from the lowest to highest yield quantiles, respectively, and 26%, 26%, 24%, and 23% for the default APSIM simulations.
Disproportionate subfield NO 3 − N losses are visibly evident when comparing default APSIM leaching estimates to the Foresite augmented leaching estimates. In the bottom-left panel C of both fields (figure 4), we can see the default soil-based leaching estimates, and in the bottom-right panel D we can see the leaching estimates that account for yield optimization. Compared to the APSIM default estimates, leaching in the Foresite augmented simulations follow yield trends more closely than soil profiles alone, creating different areas of high and low concern for environmental degradation than would be identified in default APSIM simulations.

Soybeans
For soybeans, the ML performed markedly better in predicting yield across all quantiles, with RMSEs of 0.64, 0.45, 0.41, and 0.53 Mg ha −1 in the lowest to highest quantiles, respectively. As in maize, it should be noted that the ML performed worst in the lowest yield quantile, though still markedly better than default APSIM (figure 3). Regarding Foresite augmented NO 3 − N leaching, the lowest yielding quantiles again produced more leaching across all soybean field years, with 31%, 27%, 22%, and 20% leaching from the lowest to highest quantiles, respectively. While quantile leaching differences were less pronounced in soybeans than maize fields, we can still see that the lowest quantile contributes 11% more leaching than the highest yielding quantile. For Accola, the percent contribution of total NO 3 − N leaching from optimized simulations was 35%, 29%, 19%, and 17% from the lowest to highest yield quantiles, respectively, compared to the default contribution of 28%, 26%, 24%, and 22%. For E. Bilsland, the contribution of total NO 3 − N leaching for augmented simulations was 28%, 27%, 24%, and 22% from the lowest to highest yield quantiles, respectively, and 26%, 26%, 25%, and 23% for the default simulations. Visually inspecting two individual fields (figure 5), we can see that leaching estimates for soybeans are still fairly delineated by soil profiles (panel (C)), though yield does have an impact on spatial variation and NO 3 − N leaching, if minimized compared to maize fields.

Discussion
Evaluating agroecosystems for productivity, environmental impact, and benefits from management changes is a rapidly evolving area of research, especially given the renewed interest in ecosystem service markets. Three prominent approaches to quantifying agroecosystem services seem to be emerging: in-field sampling, ML and remote sensing, and process-based scientific models (Giovani et al 2019, Mohsen et al 2019, Norris et al 2020, Wongpiyabovorn and Plastina 2022. While all three approaches have limitations (e.g. different propagations of error, time, resources, input data, etc), in this research study we explored combining two of these approaches by augmenting a process-based agroecosystem model with remote sensing data and ML.
We asked, can agroecosystem models like APSIM be augmented with remotely sensed data and ML to better simulate subfield yield variation, and how does this change prediction of ecosystem processes like NO 3 − N leaching? What are strengths and limitations of this augmentation?
Using ML, subfield yield prediction error was reduced in maize by 23% and soybeans by 66% compared to the default APSIM simulations. The greatest improvement in our approach was seen in the lowest yielding quantiles; however, the ML model still had a higher RMSE for the lowest yielding quantiles compared to the highest (figure 3). We expect the higher error is due to the presence of weeds in the flooded areas of fields, giving positive GCI values while the actual yield was known to be zero. We initially expected the ML to outperform APSIM in the highest yield quantiles as well given APSIM's suitability for predicting field means but not extremes, but this only held true in soybeans. We expect that saturation of GCI in the maize canopy (i.e. high LAI) reduced the variability needed for the ML to differentiate maize plant health and yield differences. Both of these issues (weeds and GCI saturation) could likely be addressed with more frequent image collection via daily satellite platforms, a mosaic approach from multiple image sources, or a different ML technique (Dian Bah et al 2018. It was also not our goal to find the best ML algorithm for yield prediction as this has already been addressed (Liakos et al 2018, Reichstein et al 2019, van Klompenburg et al 2020. Regarding ecosystem service estimates, we found that NO 3 − N loss predictions changed both spatially and in summation using the Foresite approach compared to default APSIM simulations. Overall, Foresite predicted 39% more NO 3 − N loss in maize and 20% more in soybeans than default APSIM predictions. The disproportionate contribution of low-yielding subfield areas was most apparent in maize, where the lowest yielding quantiles contributed 41% of NO 3 − N loss versus 16% in the highest yielding quantiles. This effect was less apparent though still significant in soybeans, with the lowest yielding quantiles contributing 31% of total NO 3 − N loss versus 20% in the highest. The difference in leaching between the two crops likely being due to the fact that maize is fertilized with large amounts of nitrogen inputs. However, prior research has shown than both systems can be similarly leaky (Lawlor et al 2008, Helmers et al 2012. While soybeans are not fertilized, we believe that reduction in plant water uptake, sustained mineralization of nitrogen from soil organic matter, and increased drainage in areas of low crop performance contributed to the leaching variation among soybean yields (supplement table S1).
While the Foresite augmented approach is an improvement over default agroecosystem simulations, there is still much work to be done. Depending on the source of satellite imagery, availability of images can be impacted by satellite flyover frequency and cloud cover, though this obstacle is being alleviated with daily satellite platforms like Planet (www. planet.com/). In addition, while we chose TWI as it helped to explain some of the mechanisms behind yield differences in fields, interpretation of TWI values may change between topographic regions (flat, sloped) and weather patterns (wet, dry). It is important to think carefully about which remote sensing metric may be useful in different regions, such as TWI in a region that is irrigated vs rain fed. To couple this idea, we need to be able to better explain the mechanisms of yield drag/boost and be able to accurately model them (Kivi et al 2022). For example, one additional known yield drag mechanism in the Accola field is a diagonal line that follows the Bakken Pipeline installation (Tekeste et al 2021); an example of something that we may not currently be able to explain from available yield monitor and remote sensing data alone, and instead need on-field measurements and prior field knowledge to diagnose problems.
Lastly, it should be noted that we also attempted to fit an additional out-of-sample ML yield prediction model by leaving one entire field out as testing data and training the ML on all remaining fields' data for the desired crop (data not shown). However, the model resulted in high error when predicting yield for entire unknown fields, especially in maize with GCI saturated canopies. There are several possible answers to this problem, including insufficient sample size or consecutive years of data; the natural variation in yearly weather patterns; or a lack of sites with similar enough features. While identifying the best yield prediction model was not the explicit goal of this study, this is a research area we would like to explore more in the future with more field sites, years of data, and high-resolution images (Pham et al 2022). This also illustrates the need for more high quality, long-term, calibrated yield monitor data to identify within-field productivity trends (Bunselmeyer and Lauer 2015) and improve ecosystem service assessments.

Conclusion
What implications do inaccurate APSIM subfield yield predictions have on field-scale predictions of NO 3 − N leaching? We have demonstrated that APSIM default field-scale and subfield-scale simulations are likely underpredicting total NO 3 − N loss. The significant increase in total leaching across all fields for the Foresite augmented vs default APSIM runs indicates that regional simulations of environmental degradation may be significantly underestimating impact on water quality and other environmental variables (e.g. greenhouse gases), providing significance when analyzing fields to maximize productivity and minimize environmental impact from lowproducing areas (Lawrence et al 2021).
We provide evidence that augmenting agroecosystem models could lead to better ecosystem services estimates, and a framework that future work can build upon. To better account for agroecosystem services, it will be necessary to combine data sources (in-situ observation, remote sensing, interpolation) and approaches (statistical, ML, processbased) to better understand how to manage our landscapes for food, fuel, fiber, and the sustainability of our natural resources. While we focused on corn and soybeans in Iowa, we believe similar approaches can be applied elsewhere, though no one data source or model may fit for all regions and all crops. It will take a multidisciplinary, multisource effort to sufficiently address agroecosystem service measurement.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.