A comprehensive uncertainty quantification of large-scale process-based crop modeling frameworks

Regional and global impact assessment tools are increasingly used to explore and evaluate the impact of climate change and extreme events on crop yield and environmental externalities. However, the large uncertainties associated with the inputs or the parameters in crop models within these tools, limits their predictive ability, exceeding the spatiotemporal variability of observed yields. The objective of this study is to explore and quantify different sources of uncertainties and assumptions made behind initial conditions (IC), soil input, meteorological forcing, management practices and model cultivar parameters by running regional simulations for the time period between 2009 and 2019. Simulations were performed for maize and soybean using the pSIMS platform across the U.S Midwest by incrementally accounting for five sources of uncertainty with a 7 km×7 km resolution using the APSIM and DSSAT crop growth models. First, the relative contribution of different sources of uncertainty was estimated over time and space. Then, a series of nitrate leaching hotpots were identified and a regional maize yield productivity index was estimated by decomposing the uncertainty in the same scenario using a hierarchical Bayesian random-effect model. All factors showed a strong spatial pattern in their contribution to the total uncertainty and their contribution was found to be partially dependent on location. However, across the whole region, it was found that the uncertainty around management is larger than IC, soil and meteorological forcing while showing a strong correlation with each of these factors. Given the high spatial correlation, we hypothesize that constraining soil inputs and management uncertainty could allow for the largest reduction in predictive uncertainty for crop yield. Our results showed vast areas over northern IA, IL and IN with high potential for NO3 leaching and southern IA, IL and east NE with lower maize productivity index compared to the regional average.


Introduction
Many agricultural regions are experiencing changes in cropping area, intensity and yields driven by shifts in the timing and variability of precipitation, temperatures, and extreme weather, which threatens food security and rural livelihoods (Iizumi and Ramankutty 2015). Researchers are increasingly turning to process-based simulation models to examine how current and future conditions could impact crop production systems (Challinor et al 2014), and to guide farmers and policymakers in designing and implementing adaptation measures for the evolving climatic conditions (Wang et al 2018). Regional and global impact assessment studies have chiefly focused on crop yields and total production, although increasingly they have also been expanded to include environmental externalities (Bondeau et al 2007, Liu et al 2007 and agro-economic modeling (Folberth et al 2019).
However, the models used in impact assessment studies are imperfect approximations of the multitude of soil and crop processes and their interactions with the environment. Often times, large uncertainties associated with model structure, choices of model inputs and parameters limits the predictive ability of these models, exceeding the spatiotemporal variability of observed yields (Ramirez-Villegas et al 2017). Application of process-based crop models is also limited by the paucity of spatially detailed input data available (Folberth et al 2016a). In the absence of spatial data on the distribution of key model inputs such as information on crop cultivars and management (e.g. irrigation, planting, fertilization, tillage), modelers often make broad assumptions across large geographical regions that may or may not reflect onthe-ground decision-making of individual producers in the context of economic opportunities and policy incentives (Folberth et al 2019). Considering that yield estimates from process-based crop models often lay the foundation for subsequent downstream agroeconomic assessments and policy-making, uncertainties and errors originating from these assumptions can propagate throughout the whole assessment chain (Folberth et al 2016b). This highlights the need to develop more efficient techniques to uncover the most critical sources of uncertainty and their underlying drivers, so to improve the quality and transparency of future impact assessments (Lobell and Asseng 2017).
Recent studies have provided conflicting results when quantifying and identifying the dominant sources of uncertainty in large-scale simulations (Asseng et al 2013, Kassie et al 2015, Ramirez-Villegas et al 2017. Thus, there is little agreement on how these sources could affect crop yield prediction products and their interpretation. This could be the result of inconsistencies in the factors and sample spaces examined across studies. For example, Asseng et al (2013) in a regional study concluded that the main source of uncertainty in projections of wheat yield arises from the variability among crop models, with less uncertainty arising from global circulation model variation. In a similar regional study, Tao et al (2018) concluded that the contribution of crop model structure to total uncertainty is greater than that from climate models and crop model parameters using similar climate data. Folberth et al (2016b), however, found that the soil data used may lead to differences in both magnitude and direction of climate change impacts under extreme scenarios in a global impact study for wheat yield predictions. They also reported that climate data contributed more to the uncertainties, above soils or nitrogen (N) fertilizer treatments (Folberth et al 2016b). In addition, studies often rely on weather data derived from future climate projections, leaving little opportunity for rigorous assessment of such findings, with some in the scientific community questioning the applicability of these impact assessment studies (White et al 2011).
Impact assessment studies, and in general highlevel questions of applied research, are inherently multidimensional, yet they are often framed in reductionist ways. This has resulted in impact assessment efforts falling short of capturing the complexity in understanding food systems (Nat 2020). While a growing body of studies have attempted to quantify uncertainties associated with a few key sources, to our knowledge, a comprehensive evaluation of uncertainty in the context of impact assessments is still lacking (Folberth et al 2016b). Here, we fill this gap by characterizing the uncertainty associated with the range of decisions and assumptions that modelers make when setting up large-scale simulation experiments, such as initial conditions (IC), soil, climate drivers, model structure, model parameters and management practices. We analyzed the contribution of each of these sources and potential interactions to the overall uncertainty in a multidimensional, regional-scale simulation using historic weather data. Our aim is to provide guidelines for future model development and model application to assess crop production risks in the face of changing climate.

Area of study
The Corn-Belt in the Midwestern United States is one of the world's most intensive and extensive agricultural region. This area produces over 33% of the world's maize and 34% of the world's soybeans (Wang et al 2020). Our specific area of study covers 15.8 M ha of maize and soybean cropland, which represents 49% (maize) and 42% (soybean) of the total area devoted to these crops in 2019. The most common crop rotations in the study area include corn-soybean or corn-corn-soybean and the normal yield expectation varies between 5-14.1 Mg ha −1 for corn and 1.6-4.3 Mg ha −1 for soybean. The study area exhibited 2009-2019 strong climatic gradients, with precipitation showing a strong trend from southeastto-northwest, and temperatures on a south-to-north gradient (figure 1). During this period, 2012 and 2011 were years with lowest precipitation and 2019 recorded the highest rain while, 2012 and 2016 were the hottest years and 2014 and 2009 were the coldest years.

Simulation platform and input data
DSSAT V. 4.7.5 and APSIM V 7.10 are two of the most widely used cropping system modeling platforms were employed in this study. Both of these two models follow a modular architecture for integrating simulations across soil, crop, management and environment, and are capable of simulating crop yield, biomass, leaf area index, soil water content, soil temperature, nutrient cycling related variables as well as their interactions with different crop and management practices (e.g. irrigation, fertilization) on a daily time step (Jones et al 2003, Holzworth et al 2014. Specifically, DSSAT and APSIM have been extensively used in the Midwest and elsewhere to examine the direct and indirect effect of new management practices on crop production and environmental outcomes (Dokoohaki et al 2015, 2018, Martinez-Feria et al 2019, Archontoulis et al 2020, Saddique et al 2020. Maize growth and development in both platforms is simulated using crop-specific maize models, both of which derived from the CERES-Maize model (Jones 1986). On the other hand, soybean is simulated in both platforms using calibrated generic crop models; DSSAT uses CROPGRO (Jones et al 2003) while APSIM uses their native PLANT family of crop modules. Using these two different modeling platforms with their different internal structures, methodologies in crop related modules in addition to other relevant modules such as soil moisture, carbon, nitrogen and soil temperature allows to account for uncertainty in model structure.
To run both APSIM and DSSAT for the whole region, we used the pSIMS platform (Elliott et al 2014), a suite of tools that provides uniform protocols for input data aggregation and harmonization and post-processing outputs to spatial data types. pSIMS facilitates the labor-intensive process of creating and running parallel clusters of crop simulations on a spatial grid and allows for using high-performance computing to run simulations that extend over large spatial extents. pSIMS was modified in this study to randomly choose soil profiles if available in the input data product (Elliott et al 2014). SoilGrid (Hengl et al 2017) with 250 m resolution was used as an alternative to the global soil dataset for earth system modeling (GSDE) (Shangguan et al 2014) for ensemblizing soil properties. In addition 10 ensemble members from ERA5 re-analysis data products were used to account for uncertainty in weather forcing driving the crop models. ERA5 is a global girded dataset with 30 km spatial and hourly temporal resolution developed by European Centre for Medium-Range Weather Forecasts (Hersbach et al 2020).

Simulation setup
First, different scenarios were defined where each scenario simulates 100 ensemble members (permutation) of random samples from five classes of uncertainty including IC, soil, meteorological forcing (Met), managements (Mng) and cultivar parameters (Params) for maize and soybean (table 1). Then, simulations were independently and sequentially performed using both DSSAT and APSIM for all scenarios over IL, IA, IN and areas with high farming density in NE, MN resulting in approximately 20 M simulations in total. Scenarios were designed to sequentially increase the number of uncertainty factors included in the simulations from only IC to the last scenario with IC, soil, meteorological forcing, management practices and model parameters. Simulations cover the time period between 2009 and 2019 with 7 km × 7 km resolution for two crops of maize and soybean where Müller et al (2019) was used as the baseline scenario. In this scenario models were set-up with fixed planting date adopted from (Sacks et al 2010) crop calendar, with additional detail in the conterminous U.S provided by crop calendar data from the US Department of Agriculture. We intentionally assumed a series of wide prior distributions for most of the contributing factors with the goal of accounting for all possible scenarios. We intend to further constrain these priors in subsequent studies (table 1).
Maize cultivar parameters for APSIM were adopted from Archontoulis et al (2020) and 14 sets of soybean cultivar parameters were selected based on Table 1. Defining different scenario; GSDE: global soil dataset for earth system modeling, C/N: carbon to nitrogen ratio, MG: maturity group. U: stands for sampling from a uniform distribution. N: stands for sampling from a normal distribution.

Variance partitioning and statistical analysis
The contribution of each source of uncertainty was estimated by calculating the difference in variance between pairs of scenarios (Dietze 2017). This method is particularly useful to capture the nonlinear interaction between different factors that may affect predictions. In addition to more direct sources of uncertainty like meteorological forcing and soil, we assessed the sensitivity of both crop models to year and site effects for maize yield and NO 3 leaching. This enables us to answer questions such as 'Are these models more sensitive to a specific location or environmental condition?' . We fit a hierarchical Bayesian random-effect model to estimate the temporal and spatial sensitivity by partitioning the total uncertainty produced in the Params scenario (scenario with all the sources of uncertainty) into a global mean µ g with a site effect (α s ) and a year effect (α y ) for each pixel as follows: where µ s represent the site-level model estimates, σ, σ T , τ s and τ s represent the precision around site-level mean, global mean and site effect and year effect respectively. Uniform hyperpriors were also defined sampling from a uniform distributions for τ s , τ y , σ T and σ. We fit our hierarchical model within the R statistical software (R Core Team 2020) using the NIMBLE package (de Valpine et al 2017).

Results
In this section we first take a closer look at different sources of uncertainty and their contribution to the total uncertainty around different model outputs and then we estimate the environmental sensitivity of maize yield and NO 3 leaching across the study area.

Variance partitioning
The total uncertainty generated by each scenario over the simulation time period showed that IC, Soil and Met factors contribute less in propagating uncertainty in model outputs compared to the Mng scenario while, cultivar parameters showed a more complex behavior. The contribution of each factor was found to be partially dependent on location and there is no uniform pattern for defining single factor contribution. Both models were more sensitive to Mng throughout the region whereas, the rest of the factors showed more spatial dependence in their sensitivity (change in variance across space). Soil and IC factors were relatively more sensitive to lower quality soils in southern IL and IA, whereas Met did not exhibit a clear spatial pattern. Maize and soybean yields were expected to show an increase in total variance in model outputs as we added more sources of uncertainty. However, it was found that this trend mostly holds as we added more uncertainty from IC scenario to Mng scenario, while adding the variability associated with model parameters dramatically changed this behavior. The Params scenario showed a strong spatial and temporal structure where in large parts of the region lower  total uncertainty was found compared to the other scenarios. On the other hand, NO 3 leaching for both maize and soybean followed the expected behavior and adding new sources of uncertainty resulted in increase in total variance (figure 4). Total variance in both maize and soybean yield showed strong spatial structure, as higher uncertainty was found in higher latitudes, probably due to uncertainty in model parameters compared to other factors.
From the temporal perspective, uncertainty in NO 3 leaching across the region varied for different years, and 2013 exhibited highest precision, while 2018 showed the highest variability for maize. For both crop yield and NO 3 leaching, cultivar parameters showed the highest sensitivity or change in variance over time followed by management and climate drivers with the largest variance. Similar to the regional assessments (figures 2 and 3), adding uncertainty associated with model parameters did not result in direct increase in total variance for yield, whereas NO 3 leaching followed the expected behavior. Maize yield showed substantially lower uncertainty in the Params scenario compared to the IC scenario in years with extreme weather (excess rain vs

Environmental sensitivity
Next, we decomposed the total uncertainty into a series of year and site effects to identify 'hotspots' for NO 3 leaching and maize yield. We believe this index represents the inherent nitrate leaching and maize production potential for these areas given their soil properties and weather conditions over the past decade.
The year effect or the temporal sensitivity estimated from the Param scenario with all the sources of uncertainty, showed that the models were highly sensitive to years with extreme events such as drought or excess precipitation in 2012, 2015, 2016 and 2019, while we found no statistically significant upward trend in the year effects ( figure A1). Among all years, the estimated year effect in 2012 and 2016 for maize yield and 2012 and 2015 for NO 3 leaching simulation were significantly lower and higher than zero respectively.
The estimated site effect revealed that vast areas over northern IA, IL and IN have a high potential for NO 3 leaching (figure 5) with approximately 30 kg ha −1 above regional average, while southern IA was found to show less potential for NO 3 leaching with 10 kg ha −1 below regional average for NO 3 leaching. This is consistent with findings from similar studies designating northern IN, IL and IA (Risch and Cohen 1995, Nolan et al 2002 as areas with high risk and maximum vulnerability for ground water contamination due to nitrate leaching. This analysis reveals areas with highest need and urgency for practicing alternative management practices or other interventions such as cover crops to manage and control the NO 3 leaching from this area. We were unable to find any statistically significant relationship between soil properties and site effects for NO 3 leaching, pointing to the fact that more complex and confounding dynamics between soil, weather and management practices control these processes.
In addition, the site effect for maize yield which represents maize productivity showed that relatively small areas over southern IL, IA and NE perform below average, whereas, northern and central IA, IL, NE and IN show high production capacity. The largest site effect was found for eastern NE with 1.5 Mg ha −1 above average, while the smallest maize productivity was found for southern IL with approximately 1 Mg ha −1 below regional average possibly due to lower quality soils. The high productivity in IA could be associated with the high quality glacial tills soils of the Des Moines lobe, while the loess and drift soils in central IL and IN supports the high productivity in those areas.
This analysis is particularly important because as many states have developed different methods for evaluating crop productivity (e.g. SPI index in IL, CSR2 in IA and NCCPI2 in IN), no single index is developed and offered to standardized and compare maize productivity and land valuation across broad regions. Furthermore, from the modeling perspective this analysis helps to assess the flexibility of crop models at a regional scale and evaluate their ability to capture dominant environmental drivers such as drought or excessive water conditions.

Discussion
Crop cultivar parameters were the largest source of uncertainty for NO 3 leaching and showed a substantial change (decrease) in crop biomass for both crops (figure 4). Tao et al (2018) reported that the relative contribution of crop model parameters was found to be small compared to model structure and climate drivers. However the lower uncertainty in Param scenario does not suggest confidence in model predictions and it points to the sensitivity of crop models to model parameters as well as a strong interaction between crop model parameters and other factors. In order to unpack this complex pattern, we expanded equation (B2). In this equation, it is often assumed for simplicity that the COV between different factors is negligible and this results in expressing the total uncertainty as the sum variance in each factor modulated by the model sensitivity to each factor. However if the COV between two factors is not negligible then the general form of the equation (B2) could be written as: where X and Z are different factors such as meteorological forcing and parameters, and the overall magnitude of this term depends on the sensitivity of models to each factor, variance in each factor and also the COV between the factors. In one hand, the internal design of most crop models and their approach to account for the effect of crop parameters such as dependency on thermal time produces strong and inherent covariability with latitude and longitude. On the other hand, most other environmental factors such as model inputs including the climate variables and management practices such as planting date are highly correlated with geographical locations causing these models to generate complex spatial structure in modeling the effect of crop parameters. Negative correlation coefficients found in figure A2 clearly suggests that this inherent covariability generated by the models reduces the total uncertainty within equation (B2) in terms including the model parameters resulting in reducing the total uncertainty in Param scenario for maize yield. In addition, equation (3). depends on the sensitivity of the model state variables to the factor ( ∂f ∂Z ). For instance, NO 3 leaching (as f ) is more likely to show less sensitivity to cultivar parameters compared to crop biomass lowering the overall magnitude of equation (3). compared to crop yield or biomass.
As a general guideline, the scientific community should strive to shrink the uncertainty in different factors by continuously confronting model simulations with observed data. This may be relatively straight forward for ICs, soil properties and management practices, however, the complex patterns found in the Param scenario suggests that crop related parameters need to be approached in a different way. Overemphasis on site-level parameter optimization will decrease the generalizability of the process-based models to new sites and environments and is likely to lead to over confident predictions (Raiho et al 2020). Therefore, we suggest constraining model parameters in a hierarchical framework with observations from multiple sites and different years as follows: where cultivar parameters (θ) are calculated as an overall mean in addition to a series of year and site effects for different model parameters. The purpose of partitioning parameter uncertainty into the overall mean and a series of site and year effects is twofold. First, it adds an extra layer of flexibility to the models, as it allows for accounting variations that are only a function of geographical location or time. This is especially important for point-scale processbased crop models such as APSIM and DSSAT mainly because they intrinsically lack the capacity for capturing spatial patterns. Second, by further analyzing the site effect (α site ) or the year effects (α year ), there is an opportunity for exploring missing processes/mechanism from the models. For example, Fer et al (2021) found large site to site variability in the parameters related to temperature response to respiration and photosynthesis by exploring the relationship between the site effects with environmental variables. They concluded that this variability is associated with a lack of thermal acclimation and adaptation in their model. Efficient solutions are put forward to account for the spatial structure, such as hierarchical parameter data assimilation as mentioned above as well as state data assimilation to constrain the uncertainty in model parameters Var [θ] and consequently model state variables Var[Y t + 1 ]. We hypothesize that a joint dynamic parameter and state data assimilation in a regional scale seems to be the most promising pathway to both constrain and quantify the model uncertainty in parameters and state variables for future climate impact studies.
Most climate impact studies attempt to account for prediction uncertainty by examining the variation due to meteorological forcing inputs. Yet, here we found that the uncertainty in management practices was almost as important as meteorological forcing, accounting for 20%-30% of total uncertainty (figure 2) for maize yield and 10%-20% on soybean yield depending on the geographical location. In a previous study, Folberth et al (2016b) also reported that one of the keys for improved climate impact assessments is the representation of crop management practices. Here, the uncertainty surrounding management was found to be larger than IC, soil and meteorological forcing, but showed a significant correlation with those factors. This could mean that successful efforts in reducing the uncertainty in IC and soil could substantially lower the variability in management uncertainty as well ( figure A2). The fact that management explained a large proportion of the total variance among all factors for maize yield reveals a need for further harmonization of input data and model setup. Folberth et al (2016b) also found that the spatial patterns of whether soil or weather dominate yield variability, depend on crop management.
In contrast to Folberth et al (2016b), we found soil to be the second smallest source of uncertainty after IC (figure 4). Although soil contributed roughly ∼15%-25% to total uncertainty in southern IL and IN and parts of IA (figures 2 and 3), it accounted only for approximately 5%-10% in the rest of the region. However Folberth et al (2016b) found that the total range of possible soil type variation is substantially higher than the solely climate-driven 10-year yield variability under most crop management configurations across the different climate regions. They concluded that the dynamic interactions of soil texture, precipitation and plant water requirement can result in unexpected climate change impact responses (Folberth et al 2016b).
To assess whether lower variability in one factor (or constraining a factor in a hypothetical scenario) will lead to lower uncertainty in other factors we sequentially subtracted the total variance in all scenarios and for all pixels. As it was mentioned before, this approach helps to capture the nonlinear interactions between factors that may affect prediction (Dietze 2017). Looking closer at figure A2, it was found that the covariance between soil, management and meteorological forcing uncertainty maybe contributing significantly to the year to year variability in these factors. We also hypothesize that given the large correlation between soil with Met and management uncertainty, constraining soil and management uncertainty could allow for the largest reduction in predictive uncertainty.
There are numerous studies which either account only for uncertainty in model structure using a multimodel ensemble framework or only the uncertainty in driving forces of the crop models. However, as it was shown in figure 3, each of those factors may contribute a relatively small portion to the total uncertainty. Many studies have accounted only for the variability in projections of future climate scenarios (Tebaldi and Knutti 2007, Asseng et al 2013, Kassie et al 2015, Lobell and Asseng 2017, Wang et al 2018, Müller et al 2019, but in this study we show that uncertainty in management practices, soil properties and crop cultivars are as significant in determining the accuracy and precision of yield prediction for both maize and soybean. In addition, it is unclear how much uncertainty is accounted for in multi-model ensemble studies as there is a lack of independence in models selected for those simulations (Raiho et al 2020). This challenges the precision of long-term estimates generated by those frameworks (Tebaldi and Knutti 2007). We also found that when different sources of uncertainty crossover with other sources of uncertainty the relative contribution of those could significantly vary across a region. It is noteworthy that the relative contribution of any single source of uncertainty to climate impact assessment will change along with the crop model structure, the ensemble of models selected, and location of the study area.
One important implication of this study is that the outcome can be used for hypothesis testing and exploring different model designs with the goal of reducing the total variance or eliminating the complex spatial and temporal structure in the uncertainty due to interaction between different factors. Alternatively, the estimated variability in this study could potentially provide priors for how to propagate uncertainty in future studies. It is also critical to test different levels of complexity for different processes in crop models and to assess when and where adding more complexity is justified (Raiho et al 2020). Generally, the primary practical uses of uncertainty evaluation can be summarized as (1) it can provide more accurate and robust projections and (2) it is helpful to better convey information for evaluating the potential risk, which is of importance to gain the trust of the general public (Shrestha et al 2016, Wang et al 2018. The knowledge gained from this analyses can inform both modelers and the wider scientific community, making use of publicly available data (Mistry et al 2017) about the magnitude of process and parameterization induced uncertainty across a wide range of sources of uncertainty (Folberth et al 2019).
The quantification of uncertainty in this research represents only a portion of the known biophysical variables and their associated uncertainty. This brings focus on efforts that should aim at reducing different sources of uncertainty whenever possible. However, when making statements about the future, it is important to recognize that the uncertainty in socioeconomic factors is just as relevant (Simelton et al 2012). This is a complex interaction involving future markets, demographic changes, improvements in plant breeding and agronomic practices and changes in human behavior (Smith 2013).
We did not discuss the use of 'black box' machine learning (ML) models and the implications of using those models, because (1) It is difficult to prove the real adequacy of full black-box models for future projection of our food system as those applications would likely fall out of the scope of training data sets, (2) Modern ML algorithms may show satisfactory predictive capacity; however, they lack the insights which allow us to reduce uncertainty moving forward. This practical limitation requires careful consideration of the quality of data sources and processes which leads to broad agronomic assessments. Our goal is not to provide the best predictive model but rather to suggest a path forward for improving the modeling process.
To reduce these uncertainties, future efforts should focus on (1) developing harmonized management practices and soil data sets and (2) further partitioning of process uncertainty to reducible and irreducible sources (Raiho et al 2020). As a future direction, we will be working towards studying these different factors in more detail at a regional scale with the goal of providing constrains on each factor and producing more constrained priors for future studies. Also, this should be a common practice for scientists to share detail description of their model calibration procedure and optimum parameters found, so it could be used to perform a series of meta-analysis studies for further constraining model parameters.

Conclusions
Incrementally accounting for five sources of uncertainty allowed for estimating the relative contribution of IC, soil, meteorological forcing, management practices and cultivar parameters in total uncertainty for crop yield and nitrate leaching across the U.S Midwest. Large areas over IA, IL, IN and east NE were identified as the NO 3 leaching hotspots, and maize yield productivity index was found to be lower in southern IL, IA and NE compared to the regional average. Even though the APSIM and DSSAT models showed flexibility in capturing complex environmental conditions such as drought, their internal design makes the interpretation of uncertainties in model parameters difficult. Across the whole region, it was found that the uncertainty around management is larger than IC, soil and meteorological forcing while, the ensemble of both models showed substantially larger sensitivity (change in variance) to cultivar parameters compared to management practices, soil, climate data or IC. As a future direction, we will be working towards estimating the uncertainty associated with different soil, climate and management data products and identify data products with the lowest bias for future impact assessment studies. Further, it is necessary to explore and quantify the reducible versus irreducible portion of uncertainty associated with each factor.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.

Appendix A
where this framework suggests that state variable Y at time t + 1 is a function of Y at time t with some drivers/covariates X given some model parameters θ where, both Y and X could be vectors describing multiple state variables or model inputs. Assuming no covariance across different factors the variability around equation (B1) could be written as: where for example ( ∂f ∂θ ) 2 term describes the model sensitivity to driver θ and Var[θ] describes the variability in model parameters. Equation (B2) clearly suggests that the variability around the state variable Y is sum of uncertainty in ICs, driving forces and model parameters and each of these uncertainties are modulated by model sensitivity to that factor. For example, if one is interested in quantifying the uncertainty around soil evaporation, even if there is a large uncertainty in a parameter controlling the carbon mineralization in soil (θ 1 ), since soil evaporation will most likely show little sensitivity to θ 1 then the uncertainty in θ 1 may not contribute as much to the uncertainty in soil evaporation given the small sensitivity.