Machine learning improves predictions of agricultural nitrous oxide (N2O) emissions from intensively managed cropping systems

The potent greenhouse gas nitrous oxide (N2O) is accumulating in the atmosphere at unprecedented rates largely due to agricultural intensification, and cultivated soils contribute ∼60% of the agricultural flux. Empirical models of N2O fluxes for intensively managed cropping systems are confounded by highly variable fluxes and limited geographic coverage; process-based biogeochemical models are rarely able to predict daily to monthly emissions with >20% accuracy even with site-specific calibration. Here we show the promise for machine learning (ML) to significantly improve field-level flux predictions, especially when coupled with a cropping systems model to simulate unmeasured soil parameters. We used sub-daily N2O flux data from six years of automated flux chambers installed in a continuous corn rotation at a site in the upper US Midwest (∼3000 sub-daily flux observations), supplemented with weekly to biweekly manual chamber measurements (∼1100 daily fluxes), to train an ML model that explained 65%–89% of daily flux variance with very few input variables—soil moisture, days after fertilization, soil texture, air temperature, soil carbon, precipitation, and nitrogen (N) fertilizer rate. When applied to a long-term test site not used to train the model, the model explained 38% of the variation observed in weekly to biweekly manual chamber measurements from corn, and 51% upon coupling the ML model with a cropping systems model that predicted daily soil N availability. This represents a two to three times improvement over conventional process-based models and with substantially fewer input requirements. This coupled approach offers promise for better predictions of agricultural N2O emissions and thus more precise global models and more effective agricultural mitigation interventions.


Introduction
Atmospheric nitrous oxide (N 2 O) concentrations are currently increasing by 0.98 ppb yr −1 , a rate 44% higher than during the period 2000-2005 (0.68 ppb yr −1 ). This dramatic shift is largely driven by increased anthropogenic sources that lift current total global N 2 O emissions to ∼17 Tg N yr −1 (Syakila andKroeze 2011, Thompson et al 2019). This represents a global warming impact of ∼2 Pg C equivalents yr −1 based on N 2 O's global warming potential (Robertson 2004, IPCC 2013. Approximately 60% of the contemporary N 2 O increase comes from cultivated soils that receive nitrogen (N) fertilizer (IPCC 2014, Robertson 2014, Tian et al 2019, but this value involves large uncertainty because the episodic nature of soil N 2 O fluxes challenges our ability to quantify emissions accurately (Barton et al 2015, Parkin 2008, Saha et al 2017a. Soil N 2 O fluxes are of microbial origin, mainly from incomplete denitrification (the sequential reduction of nitrite or nitrate to N 2 O then N 2 ) and nitrification (the oxidation of ammonium to nitrate). A number of environmental and management factors affect N 2 O production, in particular soil N inputs and N availability, pH, soil moisture, tillage, and temperature (Butterbach-Bahl et al 2013). Furthermore, N 2 O can also be consumed in soil (Schlesinger 2013), and controlling factors often interact in a nonlinear fashion to add considerable spatial and temporal variability to fluxes Smith 2003, Jin et al 2017).
Quantitative predictions of daily to monthly N 2 O fluxes are especially challenging. Commonly used methods include (a) empirical equation-based parametric models (Sozanska et al 2002, Roelandt et al 2005, (b) emission factor models based on the proportion of N inputs emitted as N 2 O (De Klein et al 2006), and (c) process-based biogeochemical models (Brilli et al 2017, Ehrhardt et al 2018, Gaillard et al 2018. Each prediction method has its own limitations with accompanying bias. For example, equation-based parametric models require N 2 O fluxes to be normally distributed with homogeneous variance, conditions rarely met by N 2 O flux data even when transformed. The IPCC's 1% emission factor is a useful but simple estimate of annual N 2 O emissions based largely on annual fertilizer N inputs, thus ignoring intra-annual variation and the dynamic variable interactions that influence emissions. The IPCC approach often underestimates local (Philibert et al 2012, Shcherbak et al 2014 and regional (Grace et al 2011, Griffis et al 2013 N 2 O emissions, with variable success at the global scale (Crutzen et al 2008, Del Grosso et al 2008a, Tian et al 2020. In contrast, process-based models predict daily to weekly fluxes, important for designing management interventions to mitigate emissions; however, they are complex and heavily rely on site and version-specific parameterizations that are sometimes ad hoc tunings (Del Grosso et al 2008b, Gilhespy et al 2014. Parameterizations add additional uncertainty when extending such models to sites without reliable model calibration data (Jarecki et al 2008, Rafique et al 2013, Ehrhardt et al 2018, Gaillard et al 2018, Berardi et al 2020, Fuchs et al 2020. Moreover, the N 2 O algorithms of popular biogeochemical models such as DayCent (Del Grosso et al 2000, Parton et al 2001, DNDC (Li 2000(Li , 2007 and EPIC (Izaurralde et al 2012), widely used for predicting regional N 2 O budgets, are usually derived from laboratory-based responses of N 2 O to individual environmental factors. This necessarily (and by design) simplifies the complex variable interactions typical of field settings but adds to the need for calibration with intensive chamber-based field measurements.
Although the lack of agreement between observed and predicted N 2 O fluxes by current prediction methods are widely known (Ehrhardt et al 2018), the development of improved alternative prediction approaches has been limited. While top-down models have benefited annual N 2 O flux predictions (Philibert et al 2013, Perlman et al 2014, such approaches have not been available for daily to weekly predictions due to a general lack of N 2 O measurements at finer temporal scales. That said, the increasing availability of long-term high frequency observations from automated flux chambers (Grace et al 2020) creates new opportunities to develop data-driven top-down models that can improve predictability and our understanding of the interacting factors and threshold conditions controlling daily N 2 O fluxes.
Here we demonstrate a novel application of a data-driven machine learning (ML) model (Random Forest) to predict N 2 O fluxes from two sites in the upper U.S. Midwest growing corn (Zea mays L.). Corn is responsible for ∼60% of U.S. N fertilizer use, and ∼60% of U.S. corn is grown in the Midwest, a region of high N 2 O emissions (Larsen et al 2007, ERS 2019a, 2019b. We use multi-year automated N 2 O flux data at one site together with observations from less dense manual chambers at both sites to deduce causal relationships among key variables to predict fluxes with known confidence. This hierarchically divergent approach requires no knowledge of underlying process-level relationships and thus is less able than process-based models to predict the effects of novel management changes (e.g. broadcast vs injected N fertilizer). Model training is also data intensive. Nevertheless, because the ML system learns relationships from available data it can make unbiased predictions, and as well provide insights into functional relationships by discriminating among different predictor variables as it learns patterns and identifies critical response thresholds. We further test a coupled model that uses ML to predict N 2 O fluxes based on a process-level cropping systems model's provision of an additional predictor variable, soil N availability.

Experimental systems
We used data from three experiments in two geographic locations with different soil types. The first experiment is a continuous no-till corn system in the biofuel cropping systems experiment (  annual precipitation and temperature of 869 mm and 6.8 • C, respectively. Both BCSE experiments were established in 2008 as part of the US Department of Energy's Great Lakes Bioenergy Research Center and management, site histories, and other details are reported in detail elsewhere (Sanford et al 2016, Gelfand et al 2020. The third experiment is a no-till corn-soybean (Glycine max L.)-winter wheat (Triticum aestivum L.) rotation at the KBS Long-term Ecological Research (LTER) site (KBS-LTER; 42 • 24' N, 85 • 24' W, 288 m elevation), 2 km from KBS-BCSE, established in 1989 . Details on agronomic management can be found in https://lter.kbs.msu.edu/datatables.

Data generation
N 2 O fluxes from KBS-BCSE were measured using automated gas flux chambers (50 × 50 × 18 cm) as described in Ruan and Robertson (2017) from 2012 to mid-July of 2017, except not in 2015 due to instrument failure. Fluxes were measured in one replicate block from three chamber locations within a plot (27 × 43 m). Fluxes were measured from one location on any given day and locations were randomly re-located every 10-15 d to minimize bias due to small-scale spatial variability. Each chamber was sampled four times per day; the daily average flux was used for this analysis, previously shown to approximate diurnal flux variability in the Midwest (Parkin 2008). At each sampling time, the chamber was closed for 45 min and headspace samples of 1 L each were taken at 0, 15, 30, and 45 min after chamber closure. Headspace samples traveled through a Teflon sampling line to a nearby (<80 m distant) trailer housing a gas chromatograph with an electron capture detector (350 • C) (SRI 8610C with custom sample acquisition, Torrance, CA, USA) to measure N 2 O concentrations, and an in-line infra-red gas analyzer (LI-820, LI-COR Biosciences, NE, USA) to measure CO 2 concentrations. N 2 O fluxes from ARL-BCSE (five replicate blocks) and KBS-LTER (four replicate blocks) were measured using static chambers (28.5 cm diameter, ∼17 cm height) on a weekly to biweekly sampling frequency as described elsewhere (Gelfand et al 2016, Oates et al 2016, Duncan et al 2019.
The predictor variables avoid co-linearity as they were uncorrelated (table 2, figure S1 (available online at stacks.iop.org/ERL/16/024004/mmedia)), and their distribution is shown in figure S2. The water-filled pore space (WFPS), an approximation of soil O 2 limitation, was estimated from volumetric water content (VWC) and bulk density in the top 25 cm soil layer. The VWC was continuously measured at the KBS-BCSE site by a data logger-controlled sensor network as reported elsewhere ; at ARL-BCSE and KBS-LTER, VWC was estimated from bulk density and gravimetric soil moisture values measured at each manual gas sampling event. To test the additional power of adding to the model daily available soil N, measured only with manual sampling events, we used SALUS, a processbased model extensively validated at KBS (Basso and Ritchie 2015, Hussain et al 2019, to simulate values for soil NH 4 + -N and NO 3 --N in the top 25 cm layer at KBS-BCSE. SALUS is an easy to calibrate functional model designed to represent feedbacks and interactions among crop, soil, management, genotype, and climate; avoiding known problems with accurately simulating daily crop growth and soil water balances by many biogeochemical models such as DayCent and DNDC (Jarecki et al 2008, Brilli et al 2017, Fuchs et al 2020. For ARL-BCSE and KBS-LTER, we used measured soil inorganic N data interpolated between the soil sampling dates to match the manual gas sampling dates. Static soil variables included soil organic carbon and clay content. Weather variables included precipitation and air temperature, obtained from weather stations within 1 km of each site. Missing data on predictor variables were imputed using the proximity matrix within the Random Forest algorithm.

Data processing
Each flux event was checked for linearity of headspace N 2 O and CO 2 accumulations during the chamber closure period, and periods with leakage (<3% of all fluxes data) were deleted. We converted fluxes between zero and the automated chamber's N 2 O detection limit (±1.1 g N 2 O-N ha −1 d −1 ) to 50% of the detection limit. Approximately 11% of the total number of N 2 O fluxes (2246) were negative and the lowest value was −7.5 g N 2 O-N ha −1 d −1 . The highest measured flux was 593 g N 2 O-N ha −1 d −1 .

Machine learning algorithm
We used R statistical software (R Core Team 2018) to implement Random Forest, a supervised ML algorithm for classification and regression based on the principle of recursive partitioning (Breiman 2001), and independent of the assumption of functional relationships between the response and predictor variables. A detailed description of the Random Forest algorithm can be found in Hoffman et al (2018). Briefly, Random Forest analysis ensembles numerous regression and classification trees following a process called 'bootstrap aggregation' or 'bagging.' Classification trees are relatively uncorrelated in two ways. First, a random subset of the data space is drawn (with replacement) to grow a tree to its full length, and each node of the tree group's observations are characterized by certain conditions on the predictor variables to produce an average prediction for the response variable. Each tree growing process uses only two-thirds of the bootstrapped data and one-third of the observations (out-of-bag data, OOB) are used for estimating the prediction errors. Second, each node split in a tree considers a random subset of predictor variables, usually a square root of the total number of predictor variables. The predictions from all the trees are averaged to make final predictions.
The variable importance function within the Random Forest algorithm ranks predictor variables based on the increase in model error by randomly permuting the values of the predictor variables. Briefly, the mean square error of the OOB data (OOB-MSE) for each tree is the average squared deviations of OOB observations from the predictions. The difference in OOB-MSE before and after random permutation of a predictor variable is averaged over all trees to compute variable importance. Top predictors were visualized using partial dependence and feature contribution plots using the Forest Floor package to unravel linear or non-linear interactions (Welling et al 2016).
We used one dimensional individual conditional expectation (ICE) plots that show the effect of a predictor feature on the response variable for all instances as well as a global average effect i.e. the partial dependence of the response on that feature (Friedman 2001, Goldstein et al 2015. The partial dependence plot allows us to identify whether the relationship between the predictor and response variable is linear, monotonic, or interactive (i.e. the pattern changes after a certain threshold of the predictor variable). The feature contribution which computes the contribution of a variable to Random Forest prediction with numerical observations, belongs to the same family of partial dependence plots. Unlike the variable importance, feature contribution is separately computed for each observation and allows us to apply a color gradient based on a predictor variable to identify latent interactions and correlations between predictor variables (Kuz'min et al 2011, Welling et al 2016.

Model building
Data used for model building included 1576 daily observations from continuous corn rotations at KBS-BCSE and ARL-BCSE; we used 670 daily observations collected from 2002 to 2014 in a corn-soybeanwheat rotation at KBS-LTER for independent model testing (table 1, figure S3). The N 2 O fluxes within the model-building data were divided into eight bins by merging some of the bins in figure S4 containing only a few high flux observations to assure representative sampling from the entire range of N 2 O fluxes; 70% of the randomly selected observations within each bin were used to train the model (training data) and the model was validated with the remaining data. We used the Synthetic Minority Over Sampling Technique for regressions to correct multiclass data imbalance in the training set (text S1; Chawla et al 2002, Torgo et al 2013; however, the number of observations in the highest domain of N 2 O fluxes >100 g N ha −1 d −1 were still limited ( figure S4).
The balanced training data were used to train two Random Forest models based on their representation of soil N availability. A Coupled model included both measured and SALUS predicted soil available N (see above); the Standard model included measured data only and used Days after fertilizations (DAF) as a surrogate of soil N availability. For each model we optimized the number of trees used in the forest (n tree ) and the number of variables considered at each node (m try ). The n tree = 800 best stabilized error and m try = 4 resulted in the greatest error reduction for the OOB observations (figure S5). The Random Forest models were tested on KBS-LTER data and model performance was assessed by root mean square errors (RMSE), coefficients of determination (R 2 ), and correlation coefficients (r) between observed and predicted daily N 2 O fluxes.

N 2 O flux variability, soils, and environment
The three studied sites exhibited large inter-annual variability in N 2 O fluxes during their respective measurement years (figure 1). Long-term average daily N 2 O fluxes were in the order ARL-BCSE (18.3) > KBS-BCSE (5.2) > KBS-LTER (3.4 g N ha −1 d −1 ) and widely varied within sites from −0.5 to 593, −7 to 237, and −4 to 89 g N ha −1 d −1 , respectively ( figure 1, table 1). The highest longterm daily average N 2 O flux from corn at ARL-BCSE (18.3 g N ha −1 d −1 ) was perhaps due to this site's greater soil N availability stemming from a high soil organic matter content and due to greater anoxia in its finer textured soils (table 1). The Mollisols at the ARL-BCSE in Wisconsin had three times higher SOC content than the Alfisols at the KBS sites in Michigan. Unlike the highly fertilized continuous corn systems at the BCSE sites, diverse crop rotation including unfertilized soybean and low-input winter wheat at KBS-LTER site lowered daily average fluxes. Furthermore, bi-weekly sampling frequencies for manual chambers at ARL-BCSE and KBS-LTER may have missed peak emissions events, in particular at KBS-LTER with its relatively narrow range of fluxes.

Model performance
We trained two Random Forest models to predict N 2 O fluxes. A Standard model used readily available soil, agronomic, and weather variables as model inputs and a Coupled model included additional high frequency soil N data generated by SALUS (table 2). We expected simulated soil N pools to improve estimates of available N based otherwise on N fertilizer rate and DAF in the Standard model.
Overall, both the Standard and Coupled models performed well in predicting temporal N 2 O fluxes from corn at the KBS-BCSE and ARL-BCSE validation sites using 70% of the observations for model training (figure 1). The models explained 65%-89% of the variability between observed and predicted N 2 O fluxes that were highly correlated (r > 0.80, p < 0.001). However, the Standard model overpredicted N 2 O fluxes on several occasions at the KBS-BCSE site, which resulted in higher prediction error than the Coupled model (RMSE of 10.2 vs 8.3). Surprisingly, however, the Standard model outperformed the Coupled model at ARL-BCSE (RMSE 15.8 vs 21), where the Coupled model over-predicted fluxes in 2010 and 2011. At the KBS-LTER test site, the Coupled model explained 51% variability of observed N 2 O fluxes in corn at KBS-LTER, for which no data were used to train the model. The standard model explained 38% of observed N 2 O fluxes from corn. For the entire corn-soybeanwinter wheat rotation at KBS-LTER, the Coupled model explained 38% variability, whereas the Standard model predicted only 13% variability.

Critical predictor variables for N 2 O emissions
The most important variables influencing N 2 O fluxes for both Standard and Coupled models were soil moisture (WFPS) and N availability (DAF for the Standard model, inorganic N for the Coupled model), which together explained >80% of the N 2 O flux variance in the OOB samples (figure 2). Other soil, environment, and management related variables had relatively smaller predictive values.
The ICE plot visualizes the average predictor variable effect on prediction (partial dependence) along with the spread for each instance in the data Inorganic N availability is represented in the Standard model as DAF and in the Coupled model as both measured and modeled (by SALUS) NH4 + -N and NO3 --N availability. A 70% of the data from KBS-BCSE and ARL-BCSE were used to build the Random Forest models and 30% were used for validation. KBS-LTER data were used for testing.
(figure 3). Both models identified that for most instances, N 2 O fluxes sharply increased after WFPS exceeded 0.70. The feature contribution plot showed that the Standard model predicted an exponential decrease in N 2 O fluxes with increasing DAF in interactions with WFPS ( figure 4(b)). Occurrence of high WFPS within a month after N fertilization increased N 2 O fluxes. Soil clay content >250 g kg −1 increased N 2 O for some instances, but lack of range on this variable in our data limits our interpretation ( figure S2(h)). Average N 2 O fluxes showed threshold responses to NO 3 --N and NH 4 + -N contents with a much smaller value for NH 4 + -N (∼30 kg NH 4 + -N ha −1 , figures 4(d) and (e)). However, heterogeneous relationships between N 2 O and NO 3 --N were observed for many instances (figure 3(e)), indicating possible interactions with WFPS. Higher WFPS values variably influenced N 2 O fluxes up to 125 kg NO 3 --N ha −1 and sharply increased beyond that threshold (figure 4). At the same time, WFPS > 0.75 negatively contributed to N 2 O fluxes when soil ammonium was below 30 kg NH 4 + -N ha −1 ; however, both moderate to high soil moisture content (0.6-0.9 WFPS) positively contributed to fluxes beyond 30 kg NH 4 + -N ha −1 .

Discussion
Our results demonstrate a high utility for using ML to predict agricultural soil N 2 O emissions. The Random Forest models were developed using both automated and manual chamber based N 2 O flux data from corn production at two sites with different soils and explained 65%-89% of daily flux variance at the training sites. When extended to a long-term test site with a very different cropping system it explained up to 51% of emissions variation in corn, especially when coupled with a cropping systems model that provided daily inorganic N values.

Predictability and interpretability of ML models
Our ability to assimilate data in traditional ways has not kept pace with the increasing availability of long-term N 2 O data from automated chambers and advanced sensor technology. ML provides the potential to efficiently use such data to generate new insights and derive predictive models that emulate high resolution fluxes. When tested against 13 years of weekly to biweekly flux measurements from an independent test site (KBS-LTER), our Coupled ML model accounted for 51% of the variability of daily average N 2 O fluxes for corn phase emissions and 38% for the entire corn-soybean-wheat rotation. This effort represents a 2-3 times improvement over conventional process-based models and with substantially fewer input requirements. The most commonly used process-based models DayCent (Del Grosso et al 2000, Parton et al 2001, DNDC (Li 2000(Li , 2007, and their variants (table S1) explain, on average, 20% variability of temporal N 2 O fluxes in 15 cropping system studies (71 observations) that we reviewed and that explicitly reported model performance (median of 17%, figure  S6). A recent modeling study using 24 process-based models showed equally large uncertainties for an ensemble model (Ehrhardt et al 2018). The improved performance of the ML model is due to its depending on functional relationships between predictor and dependent variables as learned from the data (Breiman 2001), rather than depending on an underlying process-level understanding of flux variability. This does not mean that ML models are inherently superior-indeed, their predictive ability is strictly correlational, and thus of limited use for predicting the effects of novel conditions, an important feature of process-based models with their superior understanding of biophysical interactions. Rather, ML models might be used to better scale existing knowledge and perhaps help to optimize processbased models by better identifying the key variables interactions responsible for driving episodic fluxes.
Indeed, in our model, key predictor variables are few (table 2, figure 2): two static soil properties (soil organic carbon and clay content) and four dynamic properties (WFPS, N availability (expressed either as DAF or inorganic N pools), temperature, rainfall, and N fertilizer rate). All of these factors are well known drivers of N gas emissions (Groffman et al 1988, Firestone and Davidson 1989, Conen et al 2000. These input variables are easy to measure proxies for soil biophysical processes. WFPS, for example, is an integrative measure of water availability and soil gas diffusivity (Linn and Doran 1984), and stands out here as the strongest individual predictor of N 2 O fluxes (figures 2 and 3). The strong interaction of WFPS with N availability (either DAF or inorganic N pools; figure 4) reflects the greater risk of significant N 2 O emissions following precipitation events close to N fertilization, reported in many other studies (e.g. Parkin andKaspar 2006, Saha et al 2017b). The lower predictive value for fertilizer rate is surprising and probably reflects the relatively narrow range of fertilizer rates (170-200 kg N ha −1 yr −1 ) used in our training data, which include only corn (figure 2). We might expect a broader range of fertilizer managements to elevate the predictive capacity of N rate, well known for its ability to mitigate N 2 O emissions from fertilized cropping systems (IPCC 2014;Mikkelsen and Snyder 2012;Millar et al 2010;Shcherbak et al 2014).
On the other hand, inorganic N species availability (the second most important variable) may represent a sufficient proxy of the integrative effect of management (fertilization) and microbial N transformations. Future development of our knowledge and data availability on N 2 O response to microbial community composition and activity may provide additional predictive capacity for the ML models.

Challenges of machine learning models for predicting N 2 O fluxes
Prior efforts to use ML to predict N 2 O fluxes have used cumulative fluxes as estimated from literature reviews (Philibert et al 2013) or process-level models (Villa-Vialaneix et al 2012, Perlman et al 2014. To our knowledge, the present effort is the first to use ML to predict highly resolved daily fluxes-especially important for designing management interventions to mitigate high fluxes and perhaps for building better process-based models. Yet there remain important challenges before ML models can be used more broadly to predict regional N 2 O emissions. The first challenge is to increase the model's generalizability. It would be naïve to believe that our ML model might be extrapolable to diverse soils, climates, and production systems given that the model was trained on data from a single type of cropping system with narrow ranges for predictor variables measured within a constrained biogeographic context. ML predictions are bound to the data range in the training set, such that the model will, for example, underestimate N 2 O fluxes beyond 593 g N ha −1 d −1 , the maximum value in our training data (figure 1; figure S4). Moreover, DAF is relevant only for Nfertilized systems. These limitations are illustrated by the model's lower predictability for fluxes from the whole rotation relative to the corn phase at KBS-LTER. For example, the Standard model identified DAF as the second most important variable, ranging from 1 to 378 d in the training data (annually fertilized corn); however, DAF ranged from 1 to 678 d for the KBS-LTER corn-soybean-wheat rotation at the test site, with N fertilization not present during the soybean phase of this rotation. Likewise, overestimation of N 2 O fluxes from the wheat phase of this rotation is likely due to the model's low temperature range-winter wheat is fertilized in early spring when soil temperatures are relatively low. The Coupled Model's use of soil inorganic N instead of DAF overcame this problem to some extent (figure 1).
The second difficulty is imbalance in our training data. Only 2% of total observations were associated with N 2 O fluxes >50 g N ha −1 d −1 , an arbitrary threshold that represents high flux events ( figure S4). This limits the ML model's opportunity to learn critical variable interactions promoting episodic emission events. To avoid this limitation, we attempted to balance the distribution of N 2 O flux data by dividing the training data into bins based on N 2 O fluxes, and then respectively over-and under-sampled the minority (high N 2 O flux observations) and majority (low N 2 O flux observations) bins ( figure S4). However, this may not have eliminated the imbalance problem and may be a reason that the ML models over and under predicted N 2 O fluxes that exhibited murky relationships with all predictor variables ( figure S7).
Both of these problems can be addressed by the addition of data sets from more diverse managements and geographies. Greater variability in training data (rather than more data from similar sites) will be key to developing more generalizable models. This can be achieved by coordinating N 2 O research to combine cross-site observations that follow a consistent protocol for predictor variables (Borer et al 2014, Almaraz et al 2020.

Coupled machine learning and process-based models
Our results provide formative evidence for improving the efficiency and accuracy of N 2 O predictions by integrating ML and process-based modeling approaches. Traditionally, biogeochemical models Figure 5. A conceptual strategy of integrated process-based and ML modeling for N2O prediction. Daily time-step outputs on soil biophysical and biogeochemical variables predicted by the process-based models with a satisfactory level of confidence feed into a Random Forest model, trained on diverse soil, climate, and management scenarios, to predict N2O fluxes. are guided by process-level theory while ML models are data-driven. Their fusion is gaining attention in Earth and Geosciences (Karpatne et al 2017a, Brenowitz and Bretherton 2018, Reichstein et al 2019 but is in its infancy in terrestrial biogeochemistry. We show that ML predictions of a temporally dynamic soil biogeochemical processes such as N 2 O fluxes can be improved by incorporating data produced by a well-validated process-based model simulating soilplant-atmosphere processes-in our case, by predicting inorganic N pools. This approach provides two important synergies for N 2 O prediction: First, replacing the N 2 O subroutine in the process-based model with an ML subroutine could provide a coupled model with greater predictive power ( figure 5). This, however, requires the ML model to be trained on diverse input data with a wide prediction range. For example, processbased models typically predict soil water, temperature, and N availability with acceptable fidelity in a wide variety of cropping systems and geographies. This information could be contributed to the ML sub-model as input variables to predict N 2 O. This approach has been efficiently used in atmospheric and ocean science (de Bezenac et al 2017, Karpatne et al 2017b).
Second, ML models can be used to analyze N 2 O prediction bias by process-based models. That is, an ML model can automatically learn the pattern of prediction deviation from observed fluxes and identify the variables associated with the greatest contribution to the mismatch. This would lead to a better representation of the biogeochemical processes affecting the predictor variable through improved parameterization learned from the real-world variability of N 2 O, and thereby help to correct model bias.

Conclusions
Results show that regression-based ML models such as Random Forest can, with limited input data, substantially improve temporal N 2 O flux predictions from intensively managed cropping systems. Furthermore, ML facilitates interpreting non-linear interactions among N 2 O predictor variables, which is often challenging with contemporary statistical methods. While our ML model predicted up to 51% variability of daily N 2 O fluxes from corn at two upper Midwest sites, its application to other regions and crops requires further improvements in model training based on diverse data sources from various soils, climate, crop, and management conditions. Results also identify potential opportunities for integrating ML with process-based models to improve the overall predictability of soil N 2 O emissions-a new paradigm, perhaps, for N 2 O modeling.

Data availability
Data and R code are available online at datadryad.org (https://doi.org/10.5061/dryad.bnzs7h493). The supplementary data provides supporting information.