Maize yield loss risk under droughts in observations and crop models in the United States

The negative impacts of drought on crop yield are well recognized in the literature, but are evaluated mainly in a deterministic manner. Considering the randomness of droughts and the compounding effects of other factors, we hypothesize that drought effects on yields are probabilistic, especially for assessments over large geographical regions. Taking US maize yield as an example, we found that a moderate, severe, extreme and exceptional drought event (based on the standardized precipitation evapotranspiration index, SPEI) would lead to a yield loss risk (i.e. the probability of yield falling below its expected value) of 64.3%, 69.9%, 73.6%, and 78.1%, respectively, with hotspots identified in the Central and Southeastern US. Irrigation has reduced yield loss risk by 10%–27%, with the magnitude of the benefit depending on drought intensity. Evaluations of eight process crop models indicate that they reproduce observed drought risks well for the country as a whole, but have difficulty capturing the spatial distribution patterns. The results highlight the diverse risk pattern in response to a drought event of specific intensity, and emphasize the need for better representation of drought effects in process models at local scales. The analysis framework developed in this study is novel in that it allows for an event-based, risk-based assessment of drought effects in both observations and process crop models. Such information is valuable not only for robust decision-making but also for the insurance sector, which typically requires risk information rather than a single outcome value, especially given the uncertainty of drought effects.


Introduction
Drought is an extreme climate phenomenon that has destructive impacts on agricultural production at regional to global scales (Lobell et al 2014, Lesk et al 2016, Zipper et al 2016). Globally, droughts during 1964-2007 have caused a cereal loss of 1820 million Mg, equivalent to global maize and wheat production in 2013, and the loss during more recent droughts was twice as large as that during earlier droughts (Lesk et al 2016). Assessing drought impacts is critical for adaptation and mitigation, and has thus attracted considerable attention during the past decades (Hlavinka et al 2009, Lobell et al 2014, Shi and Tao 2014, Troy et al 2015, Araujo et al 2016, Potopová et al 2016, Zipper et al 2016, Madadgar et al 2017).
In general, there are two groups of models for evaluating climate impacts on crop yields: process and statistical crop models, each with its own strengths and weaknesses (Asseng 2017, Leng and Hall 2020). Process-based models represent the physiological and phenological dynamics of crop growth and yields, and allow for mechanistic separation of climate effects through numerical experiments that are impossible in large-scale field experiments. This unique capability has led to many useful impact assessments with process models, though most of them focus on the average and inter-annual variability of yields (Deryng et al 2011, Asseng et al 2013, Hawkins et al 2013, Iizumi et al 2013, Challinor et al 2014, Ray et al 2015, Schauberger et al 2017). Lack of observations combined with incomplete process understanding, however, often leads to substantial uncertainties in process model simulations (Bassu et al 2014, Leng et al 2016). Calibration has been proposed as an effective method to reduce such uncertainty. For example, Ceglar et al (2019) showed that calibration of phenology parameters leads to a substantial improvement of crop yields simulated by the world food studies simulation model (WOFOST) over Europe. Due to data limitations, process models often apply parameters calibrated at a few locations to a larger study region, or upscale them through agro-climatic zonation schemes (Van Wart et al 2013), thus introducing uncertainties especially when applied outside the locations/regions where they were developed (Elliott et al 2015, Müller et al 2017). The underlying uncertainties of process crop models challenge their application for evaluating the effects of climate extremes such as droughts, though large-scale evaluation of process models in simulating drought effects is rare.
By fitting a linear or non-linear relation between observed climate variables and census yield data, statistical models have been widely used for climate impact assessment given their simplicity and lower computational cost (Schlenker and Roberts 2009, Leng 2017a, 2017b). Troy et al (2015) investigated the empirical relations between crop yields and several climate extreme indices, and revealed a nonlinear sensitivity of US maize to droughts. Zipper et al (2016) showed a distinct pattern of US maize yield sensitivity to droughts by examining the slope of a linear relationship between yield and a drought index. Similar correlation analysis has also been applied in other regions (Liu et al 2018, Zampieri et al 2017, Kim et al 2019). Such deterministic approaches can provide valuable insights into the overall sensitivity of yield to droughts, but have some limitations in quantifying drought impacts on crops. First, drought is by definition rare, and its randomness and slow evolution make it hard to quantify its direct effects. Second, besides drought, crop yield is influenced by several other factors that occur concurrently, such as high temperature (Leng 2019), making it challenging to separate the role of drought over large geographic areas. Therefore, it is important to redefine drought effects in a risk-based manner. Indeed, a risk-based evaluation is valuable not only for robust decision-making but also for the insurance sector, which typically relies on the range of possible outcomes and their corresponding probabilities.
Recently, Leng and Hall (2019) investigated future drought impacts on global crop yields, but the historical performance of process models remains under-examined. Before projecting how future crop yield may change under droughts using a large number of models, it is critical to determine how each individual model performs compared to observations. Previous studies have evaluated the skill of global gridded crop models (GGCMs) in simulating the mean and variability of yields (Müller et al 2017, Leng and Hall 2020), but little is known about their performance in simulating the impacts of climate extremes. Lecerf et al (2019) validated the process model WOFOST over Europe, and showed promising skill in simulating the effects of water stress on crop yield. To the best of our knowledge, however, an ensemble of process models has not yet been evaluated in a probabilistic manner. In particular, the sensitivity of crop yields to droughts of various severities is not well reported.
To fill these gaps, we develop a framework to aid risk assessment of droughts on crop yields, with a focus on the validation of process-based crop models against observations. Here, risk is defined by the hazard (the probability of a drought event) and susceptibility/vulnerability (the probability of yield loss under the given drought intensity), following UN/ISDR (2004) and Yin et al (2014). A case study is conducted for maize yield in the United States, which accounted for 31% and 33% of global maize supply and export in 2019, respectively (https://www.usda.gov/). Specifically, this study contributes to the literature by addressing the following scientific questions: (a) how much risk of yield loss would be expected when experiencing a drought event of specific intensity? (b) How is such yield loss risk distributed across the country? (c) Can process models reproduce the patterns of yield loss risk under droughts? Here, the eight process models are evaluated in a probabilistic manner, regarding their skill in simulating yield response to four drought intensity categories (i.e. moderate, severe, extreme and exceptional droughts). The analysis framework developed in this study can be extended to other regions, crop types and drought indices. Section 2 describes the materials and methods, with the results and discussions presented in section 3. Section 4 summarizes the major conclusions obtained in this study.

Crop yields and climate data
County-level annual maize yields and state-level growing seasons are obtained from the National Agricultural Statistics Service (NASS) Quick Stats database maintained by the US Department of Agriculture (USDA) (www.nass.usda.gov/Quick_Stats). The NASS Cropland Data Layer is used to mask out the maize growing areas. Maize yields simulated by eight process-based crop models are obtained from the Agricultural Modelling Intercomparison and Improvement Project (AgMIP) (Rosenzweig et al 2013). These process models show various differences and similarities in terms of input, processes, calibration, management, etc (supplementary tables S1 and S2 (available online at stacks.iop.org/ERL/16/024016/mmedia)). For example, the environmental policy integrated model-IIASA (EPIC-IIASA) uses four varieties, while the GIS-based EPIC model (GEPIC) adopts a high-yielding and a low-yielding variety and the environmental policy integrated model-BOKU (EPIC-BOKU) uses the high-yielding variety. In this study, the eight crop models are harmonized in terms of climate inputs, crop area, and crop planting and harvesting calendar, and are run following consistent simulation protocols (Elliott et al 2015). Gridded climate data are obtained from the AgMERRA climate data set (Ruane et al 2015), as it is used for driving the AgMIP crop models. Adopting AgMERRA climate data thus allows for reasonable comparisons between statistical and process-based crop models. Similar patterns are obtained when using the observed climate from the Parameter-elevation Relationships on Independent Slopes Model (PRISM) (Schauberger et al 2017). The period 1980-2010 is selected because both census data and simulated yields are available during this period.

Probability modeling and uncertainty quantification
Copulas are functions that describe the dependence between variables, and are valuable for risk analysis (Nelsen 2007). Here, copula functions are used to fit the joint probability distribution between a drought index (x) and crop yields (y):

$$F_{X,Y}(x, y) = C\left(F_X(x), F_Y(y)\right) \qquad (1)$$

where $C$ is the cumulative distribution function (CDF) of the copula, while $F_X(x)$ and $F_Y(y)$ are the marginal distributions of x and y, respectively. The probability density function (PDF) of crop yields conditional on a given drought condition, $f_{Y|X}(y|x)$, is calculated as follows:

$$f_{Y|X}(y|x) = c\left(F_X(x), F_Y(y)\right) f_Y(y) \qquad (2)$$

where $c$ is the PDF of the copula and $f_Y(y)$ is the PDF of crop yield. Based on the conditional probability distribution, yield loss probability is estimated as the area under $f_{Y|X}(y|x)$ for yields lower than the long-term average value. The key features of copulas are their flexibility in joining random variables with different types of marginal distributions, and their capability of measuring non-linear dependence between variables (Nelsen 2007). In this study, five commonly used copulas are adopted (supplementary table S3), and the one with the highest statistically significant (at the 95% confidence level) maximum likelihood is selected as the best copula (Sadegh et al 2017).
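As a minimal sketch of how a conditional yield loss probability follows from a fitted copula, the Python snippet below uses a Gaussian copula, for which the conditional CDF has a closed form. This is an illustrative substitution rather than the study's implementation: the study selects the best of five copulas. The function name `conditional_loss_probability`, the correlation value `RHO`, the treatment of SPEI as standard normal and the use of the median as the loss threshold are all assumptions made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used by the Gaussian copula


def conditional_loss_probability(u_drought, v_threshold, rho):
    """Return P(V <= v_threshold | U = u_drought) for a Gaussian copula.

    u_drought   : marginal CDF value F_X(x) of the drought index, in (0, 1)
    v_threshold : marginal CDF value F_Y(y*) of the yield threshold, in (0, 1)
    rho         : copula correlation parameter, in (-1, 1)
    """
    if not (0.0 < u_drought < 1.0 and 0.0 < v_threshold < 1.0):
        raise ValueError("CDF values must lie strictly between 0 and 1")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    z_u = STD_NORMAL.inv_cdf(u_drought)
    z_v = STD_NORMAL.inv_cdf(v_threshold)
    # Closed-form conditional CDF of the Gaussian copula.
    return STD_NORMAL.cdf((z_v - rho * z_u) / sqrt(1.0 - rho ** 2))


# Hypothetical inputs: SPEI treated as standard normal, loss threshold at the
# median of the yield anomaly distribution, and an assumed correlation.
RHO = 0.35
for spei in (-1.0, -1.4, -1.75, -2.0):
    risk = conditional_loss_probability(STD_NORMAL.cdf(spei), 0.5, RHO)
    print(f"SPEI = {spei:5.2f} -> P(yield loss) = {risk:.3f}")
```

With a positive dependence between the drought index and yield, the conditional loss probability rises as the index becomes more negative, which is the qualitative behaviour the risk framework is designed to quantify.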
To quantify the uncertainties associated with the probability analysis, the hybrid-evolution Markov chain Monte Carlo (MCMC) algorithm within a Bayesian framework is adopted to derive the posterior distribution of the copula parameter, $p(\theta|\tilde{E})$, as follows:

$$p(\theta|\tilde{E}) \propto p(\theta)\, p(\tilde{E}|\theta) \qquad (3)$$

where $p(\theta)$ is the prior distribution of the copula parameter $\theta$, and $p(\tilde{E}|\theta)$ is the likelihood function given by equation (4), in which $\sigma$ denotes the standard deviation of the measurement error. The 5th and 95th percentiles of the posterior distribution are used to represent the lower and upper bounds of our probability estimates, respectively.
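The Bayesian step can be illustrated with a much simpler sampler than the hybrid-evolution MCMC used in the study. The sketch below runs a plain random-walk Metropolis algorithm for the parameter of a Clayton copula, with a uniform prior and a Gaussian-error likelihood between empirical and modeled joint probabilities. Everything specific here is an assumption for illustration: the synthetic data, the fixed error standard deviation `sigma`, the prior bounds, the step size and all function names.

```python
import math
import random
from statistics import quantiles


def clayton_cdf(u, v, theta):
    """Clayton copula CDF C(u, v; theta), valid for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def sample_clayton(n, theta, seed=1):
    """Draw n synthetic (u, v) pairs from a Clayton copula (conditional method)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def empirical_joint(pairs):
    """Empirical joint probability of each pair (count / (n + 1))."""
    n = len(pairs)
    return [sum(1 for a, b in pairs if a <= u and b <= v) / (n + 1.0) for u, v in pairs]


def log_posterior(theta, pairs, observed, sigma, lower=0.05, upper=10.0):
    """Uniform prior on (lower, upper) plus a Gaussian-error log-likelihood."""
    if not lower < theta < upper:
        return -math.inf
    sse = sum((e - clayton_cdf(u, v, theta)) ** 2 for (u, v), e in zip(pairs, observed))
    n = len(pairs)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - sse / (2.0 * sigma ** 2)


def metropolis(pairs, sigma=0.02, n_iter=4000, burn_in=1000, step=0.3, seed=0):
    """Random-walk Metropolis sampler; returns post-burn-in draws of theta."""
    rng = random.Random(seed)
    observed = empirical_joint(pairs)
    theta = 1.0
    current = log_posterior(theta, pairs, observed, sigma)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, pairs, observed, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        chain.append(theta)
    return chain[burn_in:]


pairs = sample_clayton(100, theta=2.0)  # synthetic data, not study data
samples = metropolis(pairs)
cuts = quantiles(samples, n=20)         # 19 cut points at 5%, 10%, ..., 95%
theta_p05, theta_p95 = cuts[0], cuts[-1]
print(f"5th-95th percentile range of theta: {theta_p05:.2f} to {theta_p95:.2f}")
```

The 5th and 95th percentiles of the retained draws play the role of the lower and upper bounds described above; propagating each draw through the conditional distribution would give the corresponding bounds on yield loss probability.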

Assessment of yield loss risk under drought
Although soil moisture affects crops directly, this study assesses drought impacts from a climatic perspective, owing to the lack of long-term continuous ground observations of soil moisture across a large geographical region. Using climate-based drought indicators allows for consistent comparison between statistical and process models, because both can use the same climate dataset for drought calculations. It is well recognized that temperature is a critical factor that regulates drought effects on crop growth and yield (Lobell et al 2014, Leng 2017a S4). To account for the effects of technological improvements, the linear trend of maize yield is removed using the least squares method (Hlavinka et al 2009, Lobell et al 2011) before the yield is used for fitting the risk model. Based on the fitted joint distribution function, the risk of yield loss (i.e. yield lower than its expected value) under the four categories of droughts is estimated. The same analysis is conducted for both irrigated and non-irrigated yields to explore the potential benefits of irrigation in mitigating yield loss risk under droughts of various severities. The analysis is also repeated using the yields simulated by the eight process crop models and compared with observation-based results, in order to evaluate the performance of current state-of-the-art crop models in simulating yield loss risk under droughts.
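The detrending step described above can be sketched in a few lines of Python. The snippet fits an ordinary-least-squares line to a yield series and returns the residuals as yield anomalies. The function name `detrend_linear` and the example yields are hypothetical and are not taken from the study's data.

```python
def detrend_linear(years, yields):
    """Remove an ordinary-least-squares linear trend and return yield anomalies."""
    n = len(years)
    if n < 2 or n != len(yields):
        raise ValueError("need at least two paired observations")
    mean_t = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, yields))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    # Anomaly = observed yield minus the fitted trend value for that year.
    return [y - (intercept + slope * t) for t, y in zip(years, yields)]


# Hypothetical yields (t/ha) with an upward technology trend and one poor year.
years = list(range(2000, 2006))
yields = [8.0, 8.2, 8.4, 7.1, 8.8, 9.0]
anomalies = detrend_linear(years, yields)
print([round(a, 2) for a in anomalies])
```

Negative anomalies mark years in which yield fell below the trend line, which is the sense of "yield loss" that the risk model then conditions on the drought index.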

Results and discussions
During the past three decades, maize yield for the country as a whole has exhibited substantial interannual variations, which are significantly (P < 0.05) correlated with drought conditions as measured by the SPEI (figure 1(a)). This significant relation between yield anomaly and SPEI allows for modeling their full dependence structure (see section 2), which is shown in figure 1(b). Comparing the modeled yield distributions with observed yields (red dots) shows that the majority of observed yields fall within the high-density region of the PDFs, suggesting that our model is reliable for describing maize yield anomaly under droughts. Despite the tendency towards higher yields with increasing SPEI, low yields are observed under extremely wet conditions (with SPEI close to +2). This is physically plausible because excessive rainfall can induce waterlogging that harms crop growth and yield. Li et al (2019) revealed the role of excessive rainfall in causing yield loss over the US. Our results confirm this and add value by quantifying the associated probabilities, which tend to be relatively small. The unique value of our model is that it allows for examination of all possible yield responses to an individual drought event, complementing previous studies that measure the overall relation between the annual time series of crop yield and a specific drought index. Figure 2 shows the estimated yield loss risk (i.e. the probability of yield falling below its expected value) under a moderate, severe, extreme and exceptional drought event in observations and process-based crop models. Based on census yield and observed climate data, the probability of yield loss for the country as a whole is expected to be 64.3%, 69.9%, 73.6%, and 78.1% under moderate, severe, extreme and exceptional droughts, respectively.
Comparing the estimates under the four drought events also indicates a non-linear sensitivity of yield loss risk to increasing drought intensity. For example, yield loss risk grows faster when drought severity shifts from moderate to severe than when it shifts from extreme to exceptional. The substantial increase in yield loss risk when drought intensity shifts from moderate to severe, extreme and exceptional points to the need for effective adaptive measures to ensure resilient agricultural production in a warming climate with a greater likelihood of more frequent and severe droughts (Sheffield and Wood 2008, Dai 2013, Huang et al 2017). Similarly, the process-based models simulate a consistent upward tendency of yield loss risk in response to increasing drought intensity. However, large discrepancies are observed among the eight process models, and the spread becomes larger with increasing drought intensity. For example, the highest estimate of yield loss risk is 99.3% in the GIS-based EPIC model (GEPIC) under the exceptional drought category, while the lowest estimate is 57.9% in EPIC-BOKU. Gridded crop models make a number of simplifications, and the large inter-model discrepancy could be due to the considerable differences among the models. For example, all crop models calculate a water stress coefficient ranging from 0 to 1, which affects processes such as canopy senescence, stomatal conductance, assimilation rate and grain yield. This may explain why crop models generally agree on the direction of yield change with increasing drought severity. However, the way drought exerts its impact differs among process models, depending on soil water content (e.g. predicting ecosystem goods and services using scenarios (PEGASUS)), the soil water supply to demand ratio (e.g. parallel agricultural production systems simulator (pAPSIM)), or the actual to potential transpiration ratio (e.g. CGMS-WOFOST, which also depends on soil moisture), which may contribute to the large spread in process model simulations. As in previous model evaluation work (Müller et al 2017), quantitatively examining the underlying reasons behind the diverse model performance is beyond the scope of this study, as it would require coordinated efforts by the modeling groups for attribution analysis.
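The non-linear response noted at the start of this discussion can be checked with simple arithmetic on the country-level figures. The short Python sketch below tabulates the step-to-step increments in yield loss risk, using the category order and values given in the abstract; the variable names are illustrative only.

```python
# Country-level yield loss risk (%) by drought category, as reported in the abstract.
risk = {"moderate": 64.3, "severe": 69.9, "extreme": 73.6, "exceptional": 78.1}

categories = list(risk)
# Increment in risk (percentage points) between consecutive categories.
increments = {
    f"{a} -> {b}": round(risk[b] - risk[a], 1)
    for a, b in zip(categories, categories[1:])
}
for step, delta in increments.items():
    print(f"{step}: +{delta} percentage points")
```

Under this ordering the moderate-to-severe step (5.6 percentage points) exceeds the extreme-to-exceptional step (4.5 percentage points), and the severe-to-extreme step is the smallest (3.7 percentage points).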
Figure 1. The relations between maize yield and droughts: (a) shows the temporal changes in maize yield anomaly and the drought index SPEI; (b) shows the modeled dependence structure between maize yield and SPEI. Here, five copula models are tested and the maximum likelihood statistic is 113.05, 116.89, 110.48, 112.01 and 113.11 for the Gaussian, t, Clayton, Frank and Gumbel copulas, respectively. The t model is selected for probabilistic modeling, because it has the highest maximum likelihood. The red dots in (b) denote the observed yield-SPEI pairs, while the color background shows the modeled probability. It is evident that the observed yield-SPEI pairs lie well within the high probability areas, demonstrating that our probabilistic model is reliable for describing the yield-drought relation.
How are yield loss risks spatially distributed across the country? Previous studies have examined the overall sensitivity of US maize yield to droughts (Zipper et al 2016), but it is unclear how yields respond to a drought event of specific severity. Here, we implement the risk model for each maize growing county (supplementary figure S1), based on which yield loss risks under a moderate, severe, extreme and exceptional drought event are estimated (figure 3). Spatially, when experiencing an exceptional drought, the probability of yield loss could exceed 90% in most of the country. The highest risk is observed in the central and southeastern US, while the lowest is in the western US and high production regions such as the state of Illinois. Comparing the yield loss risk under the four categories of droughts indicates that maize yields in the Southeastern US, Texas and the High Plains are more vulnerable to increasing drought severity than those in other areas. Notably, no county shows a 100% probability of yield loss under an exceptional drought event, implying that other factors such as technology and management may have reduced yield sensitivity to drought (Elliott et al 2018).
This further demonstrates the value of assessing drought impacts in a risk-based manner, rather than providing a deterministic evaluation. The revealed spatial patterns are found to be robust when examining the lower and upper bounds of the estimates derived with the MCMC technique, although uncertainties are considerable in some areas (supplementary figure S2). The spatial patterns of yield loss risk are also robust to the log yield effects (supplementary figure S3) and the choice of drought index (supplementary figure S4), which is valuable for informing targeted adaptation and mitigation measures.
However, process-based crop models fail to reproduce the distinct spatial distribution patterns of yield loss risk (figure 3); they simulate more uniform patterns of yield loss risk and of its sensitivity to increasing drought severity. This may be attributed to the lack of spatially variable representation of soil layers, root growth and the thresholds for reducing crop water uptake, which are crucial for simulating drought effects. For example, the critical threshold of water stress is 0.25 in EPIC and 0.33 in the crop growth monitoring system (CGMS-WOFOST), and remains spatially constant. Lack of spatially explicit crop management information could also contribute to the deficiency of crop models in reproducing the spatial heterogeneity of yield response to droughts (Deryng et al 2011, van Bussel et al 2015, Dobor et al 2016, Müller et al 2017, Elliott et al 2018). Using the crop model lund-potsdam-jena managed land (LPJmL), Jägermeyr and Frieler (2018) demonstrated that accounting for the observed spatial variations in sowing and harvesting dates can greatly improve the reproduction of observed maize yield loss under droughts. They highlight that a spatially explicit implementation of cultivar phenology could lead to an improvement in model performance that is even larger than that obtained by refining the model representation of water stress.
The increment of yield loss probability is statistically significant when drought intensity level shifts from moderate to severe, extreme and exceptional, from severe to exceptional, and from extreme to exceptional, while a non-significant increase is found between severe and extreme drought intensity.
Our evaluation of current state-of-the-art process models has important implications for understanding model strengths and weaknesses geographically across the country, especially regarding the representation of climate extreme effects in crop models. Indeed, previous crop model evaluation studies have mainly focused on yield averages and variability, while only a few have assessed the performance of process models in representing the effects of climate extremes. The evaluation of process-based models in our study is valuable for enhancing our understanding of their strengths and weaknesses, through (a) validating an ensemble of process models (rather than a single model) in a probabilistic manner; (b) assessing yield responses to droughts of various severities (i.e. moderate, severe, extreme and exceptional droughts), rather than a generic drought event; (c) revealing the contrasting model performance across scales. Overall, the validation results suggest that projections of yield loss risk using crop models would be reliable at the country scale, but have large uncertainty at fine scales.
The physical mechanisms behind the distinct spatial patterns remain an open question, since many factors could influence drought sensitivity in farmers' fields. In the southeastern US, the relatively larger sensitivity of maize yield to droughts may be due to the poor water retention capacity of the sandy soils (Zipper et al 2015). In Iowa, where tile drainage is extensively used, the relatively smaller sensitivity of maize yield to droughts may relate to the excess water, high water table and wet soils there (Schilling and Libra 2003). Maize yields also remain stable with increasing drought severity in the western arid areas and Central High Plains, where irrigation is extensively applied. Indeed, results confirm that irrigation has substantially mitigated drought impacts on maize yields in areas where separate data for irrigated and non-irrigated yields exist (supplementary figure S5). Overall, yield loss risk under a moderate drought would have been at least 10% higher without irrigation, and the difference between irrigated and non-irrigated yield loss risks is statistically significant based on the unpaired t test.
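For readers who want to reproduce this kind of comparison, the sketch below computes an unpaired t statistic in Python using Welch's unequal-variance form. Whether the study used Student's or Welch's variant is not stated, so the choice here is an assumption, and the sample values are hypothetical rather than study results. The p-value is omitted because the Python standard library has no t-distribution CDF.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical county-level yield loss risks (%) under a moderate drought.
non_irrigated = [68.0, 72.5, 70.1, 75.3, 69.4, 73.8]
irrigated = [58.2, 61.0, 57.5, 63.1, 60.4, 59.7]
t_stat, dof = welch_t(non_irrigated, irrigated)
print(f"t = {t_stat:.2f}, approximate degrees of freedom = {dof:.1f}")
```

The statistic would then be compared against a t distribution with the reported degrees of freedom, for example using a statistical package or printed tables.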
Despite the great benefits of irrigation, yield loss risk still grows with increasing drought severity (figure 4(a)), which implies either that water stress is not the sole factor affecting crop yields or that farmers cannot access sufficient irrigation water. The former suggests compound effects of other important factors such as temperature, radiation, vapor pressure deficit and CO2 (Lobell et al 2014, Deryng et al 2016, Siebert et al 2017) and confirms the value of our probabilistic assessment of drought effects, while the latter highlights the need to consider the constraints of irrigation water availability on agricultural production (Elliott et al 2014). Process crop models capture the role of irrigation in reducing yield loss risk under droughts well, although the simulated benefits tend to become smaller with increasing drought severity (figure 4(b)). The remaining risk under irrigation in process models may suggest that other important compounding factors/stresses (e.g. extreme high temperatures) have influenced the yield-SPEI relations, which in turn confirms the value of a probabilistic assessment of drought impacts. These event-based findings have important implications not only for ensuring food security but also for water resource management. Indeed, substantial irrigation water withdrawals would lead to severe environmental problems (e.g. depleting environmental flows and groundwater resources, decreasing other human water uses), and are expected to exceed the water planetary boundary, which defines the safe operating space for humanity (Rockström et al 2009, Steffen et al 2015, Gerten et al 2020). Therefore, the revealed marginal benefits of irrigation under a specific drought event are important for guiding sustainable water use within the water planetary boundary.

Conclusions
Previous studies have demonstrated drought impacts on crop yield at regional and global scales. However, the possible outcomes of crop yield in response to a specific drought event and their corresponding probabilities have been relatively under-examined, and an inter-comparison of observation-based statistical and process-based crop models in particular is lacking. In this study, we develop a probabilistic framework to enable risk assessment of US maize yield response to a drought event in observations and crop models.
Results show that a single moderate drought event (with the drought indicator SPEI ranging from −0.8 to −1.2) would lead to a 64.3% chance of yield loss (i.e. yield lower than its expected value) for the country as a whole. The risk, however, would rise to 69.9% under a severe drought and to 73.6% under an extreme drought event (i.e. SPEI ranging from −1.6 to −1.9). Under an exceptional drought (i.e. SPEI < −2.0), US maize yield would have a 78.1% probability of loss. The highest risk is observed in the central and southeastern US, while maize loss risk is relatively low in the western US, where irrigation has reduced yield loss risk by 10%, 17%, 22% and 27% under moderate, severe, extreme and exceptional droughts, respectively. Further analysis shows that current state-of-the-art process-based crop models capture the magnitude and the sensitivity of yield loss risk to droughts well for the country as a whole, but have difficulty reproducing the distinct spatial patterns across the country. This suggests that continued efforts are required to improve the skill of process models, especially at local scales, or to apply bias correction before they are used.
Information on how crop yield will change in response to an individual drought of specific severity is critical for targeted adaptation and mitigation. Moreover, a risk-based analysis of yield response to droughts is valuable not only for robust decision-making but also for the insurance sector, which typically requires risk information rather than a single outcome value, especially given the randomness of droughts. The analysis framework developed in this study allows for estimation of possible yield changes under a drought event of specific severity, and has the potential to be combined with existing drought monitoring systems, thus facilitating integrated and event-based risk assessment of drought effects on crop yield.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.