Stochastically modeling the projected impacts of climate change on rainfed and irrigated US crop yields

Food demands are rising due to an increasing population with changing food preferences, placing pressure on agricultural production. Additionally, climate extremes have recently highlighted the vulnerability of the agricultural system to climate variability. This study seeks to fill two important gaps in current knowledge: how irrigation impacts the large-scale response of crops to varying climate conditions and how we can explicitly account for uncertainty in yield response to climate. To address these, we developed a statistical model to quantitatively estimate historical and future impacts of climate change and irrigation on US county-level crop yields with uncertainty explicitly treated. Historical climate and crop yield data for 1970–2009 were used over different growing regions to fit the model, and five CMIP5 climate projections were applied to simulate future crop yield response to climate. Maize and spring wheat yields are projected to experience decreasing trends with all models in agreement. Winter wheat yields in the Northwest will see an increasing trend. Results for soybean and winter wheat in the South are more complicated, as irrigation can change the trend in projected yields. The comparison between projected crop yield time series for rainfed and irrigated cases indicates that irrigation can buffer against climate variability that could lead to negative yield anomalies. Through trend analysis of the predictors, the trend in crop yield is mainly driven by projected trends in temperature-related indices, and county-level trend analysis shows regional differences are negligible. This framework provides estimates of the impact of climate and irrigation on US crop yields for the 21st century that account for the full uncertainty of climate variables and the range of crop response. The results of this study can contribute to decision making about crop choice and water use in an uncertain future climate.


Introduction
The global population has increased rapidly in recent decades and is projected to continue growing in the coming decades (Lutz et al 1997, Keinan and Clark 2012, Freedman 2014. As a result, food demands are rising. Furthermore, climate change leads to more frequent climate extremes and increasing climate variability. This increases pressure on agricultural production, as crop growth and yields are sensitive to meteorological conditions during growing seasons (Monteith and Moss 1977, Rosenzweig et al 2001, Lobell and Field 2007, Kang et al 2009, Lobell and Gourdji 2012, Troy et al 2015, Zipper et al 2016. Many previous studies have focused on the impact of growing season climate conditions on crop yields. While irrigation is widely used in agricultural production, the actual contribution of irrigation to yields over large spatial scales has not been well studied.
A number of studies have connected the impact of climate change on crop yields. Average growing season air temperatures have increased for almost all major Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence.
Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. growing regions of maize, soybean, wheat, and rice (Lobell and Field 2007, Ortiz et al 2008, Zhu and Troy 2018. In cropping regions and growing seasons for most countries, temperature trends for 1980-2008 exceeded one standard deviation of the historical climatological variability (Lobell et al 2011), exerting a considerable drag on yield growth. Compared to the positive trends in air temperature, global precipitation trends were less consistent in both model projections and historical observations (Kumar et al 2013, Ren et al 2013). From 1980 to 2008, climate trends have caused a decline of 3.8% and 5.5% in global maize and wheat yields, respectively (Lobell et al 2011). More specifically, climate trends have led to a 2.5% decrease in European wheat yields and slightly increased maize yields (Moore and Lobell 2015).
To assess the agricultural risks of climate change in the 21st century, crop models have been used both globally and regionally (Deryng et al 2014, Rosenzweig et al 2014, Ahmed et al 2015, Li et al 2015. These studies provide insights into potential vulnerabilities of the agricultural system to a changing climate. However, significant effort is needed to calibrate and run the models. As an alternative, other studies have used statistical approaches, which can explicitly account for the uncertainty in yield responses without significant computational resources. Tebaldi and Lobell (2008) predicted that global yields for wheat and maize would change between 1.6% and −14.1% compared to current yields. These trends result from changing temperature, precipitation, and carbon dioxide. Lobell et al (2006) analyzed the impacts of future climate change on California perennial crop yields using statistical models, concluding that the warming effect from climate change will reduce yields for several major perennials. Lobell and Field (2007) applied regression models, showing that there was a clear negative response of global yields to increased temperatures for wheat, maize, and barley. More recently, Najafi et al (2018) presented a comprehensive global analysis of the changes in the crop yields and how they relate to different large-scale and regional climate variables, climate change variables and technology using a Bayesian multilevel model.
As an adaptation tool, irrigation is widely used to counter drought impacts and increase crop yields (Grassini et al 2009, Leng 2017). Irrigation makes crop cultivation possible in places where rainfall is insufficient and buffers against stress that comes from both climate variability and extremes. Global grain yields benefit significantly from irrigated agriculture: 17% of total cropped land is irrigated, yet they provide 40% of global cereals (Rosegrant et al 2002, Molden et al 2010, Siebert and Döll 2010 As mentioned above, former historical crop yield simulation studies are abundant. However, they either applied relatively more cumbersome process-based models or use statistical approaches on large scales, such as country-wide, that may average out details. The impact of irrigation was rarely studied due to limited data availability. As a comparison, this study seeks to account for uncertainty in yield responses to different projected climate conditions and how the widespread response of irrigated crops differs from rainfed over multiple US major growing regions. Using both irrigated and rainfed crop yields that are available at the county level in the US, we build statistical models for both rainfed and irrigated yields, fit with historical data. We then use the models to simulate projected crop yields with climate projections, both irrigated and rainfed, from 2010 to 2099. The simulation results allow for quantification of the possible impacts of future climate trends, accounting for the impact of irrigation. As a result, the crop yields in the 21st century can be simulated and analyzed comprehensively, providing insights to inform decision making within an increasingly uncertain future climate.

Data and methods
For historical climate data, we used daily precipitation and daily maximum and minimum temperature from the 1/8°spatial resolution dataset that covers the contiguous United States (CONUS) (Maurer et al 2002). This study uses 1970-2009 as the historical period, which is the period in common between the climate data and yield data, described below. To build the statistical model, we first calculated 23 growing season climate indices for maize, soybean, spring and winter wheat from this dataset (table 1) ucllnl.org/pub/dcp/archive/cmip5/bcca). The climate indices from table 1 were calculated using these projections.
The crop yield data were taken from the USDA NASS Quick Stats database (http://quickstats.nass. usda.gov/). Annual county-level crop yields for maize, soybean, spring and winter wheat from 1970 to 2009 were used. Additionally, this database provides separate estimates of both irrigated and rainfed yields for more than 800 counties. Among them, there are in total about 650 counties and 270 counties with more than 10 years and 30 years of yield report across all crops, respectively. The separate datasets for irrigated and rainfed crop yields enable us to build two statistical models that work for these two cases. As a result, we are able to simulate yields under both irrigated and rainfed conditions, allowing us to project the future impact of irrigation. Figure 1 shows the study regions by crop. We used counties with at least 30 years of historical yield record to build the model and applied the model for all counties with at least 10 years of historical yield record.
Due to technological advances and improved growing practices, crop yields have increased historically. To evaluate crop yield responses to climate, this human-induced trend has to be removed; the method of a seven-year moving window standardization in Troy et al (2015) was used in this study to detrend the crop yields. The 23 climate indices were standardized using their mean values and standard deviations from the reference period 1970-2009 to make those indices comparable. This means the trends in climate still exist in the time series.  To build our statistical model, we first performed stepwise regression to choose which of the 23 potential climate indices to include as predictors. We ran a form of stepwise linear regression between county-level crop yields and the time series of the 23 climate indices. The index that returned the largest R-squared was chosen as the first predictor. Then, we kept this predictor and added each of the other 22 indices and reran the regression. Whichever combination had the largest adjusted R-squared was then chosen. By repeating this process, we eventually obtained a combination of indices that returned the largest R-squared when regressed with the crop yield time series. The regression was terminated when the addition of other indices did not increase the R-squared value by at least 0.01. This results in a selection of climate indices as predictors that are unique by crop, growing region and irrigated state (table 2). For some crops, we exercised our judgment to select physically meaningful predictors. For instance, for spring wheat, although the model selects HD30 over GDD as one of the predictors in the set of best predictors, we prioritized GDD over HD30 due to its physical importance in predicting crop yields. After selecting the climate indices to use as predictors, we performed multivariate linear regression between the county-level crop yields and the chosen predictors, which returns regression coefficients for each predictor by county. Then, we regionally pooled the coefficients by the growing regions in figure 1 and fit probability distributions to those coefficients. If it passed the Kolmogorov-Smirnov test, a normal distribution was used, otherwise we fit kernel density to the pooled regression coefficients fitting. Pooling coefficients account for the range of crop yield responses to climate, and these distributions enabled us to stochastically sample coefficients for predictors, allowing for the full range of yield response. During each run, we generated 10 000 Monte-Carlo simulations. The supplementary material is available online at stacks.iop.org/ERL/14/074021/ mmedia contains more details on the methods.

Model performance
Using county-level historical climate indices and sampled coefficients, we simulated time series of crop yields by crop and region for both the irrigated and rainfed cases. The stochastic sampling of coefficients associated with different climate indices (predictors) accounted for the uncertainty of crop yield responses to different climate conditions within a certain growing region (figure 1), thus resulting in a spread of simulated crop yields under certain climate conditions. As a result, figure 2 shows the performance of our statistical model by comparing modeled standardized crop yields (boxplots) with USDA historical yields (blue lines) for both irrigated and rainfed cases for all crops. The simulated crop yields generally followed the year-to-year variability of the USDA crop yields report. Over 60% of the historical records occurred between the 25th and 75th percentiles of crop yields; 28% of historical yields were between the 45th and 55th percentiles. The rainfed case has more interannual variability compared to the irrigated because rainfed crops have less buffering capacity against climate variability. The mean R-squared values across counties calculated using the median simulated and historical detrended crop yields were provided in table S1.
We did ten cross validations to evaluate our statistical model. During each cross validation, a random thirty years were used to first build the model. Then we used the other ten years of historical yields to validate the model performance. The results of all cross validations were generally similar to those in figure 2: more than 60% of the historical records occurred within our simulated spread (between 25th and 75th percentiles) of crop yields and approximately 30% of historical yields were between 45th and 55th percentiles. The difference in mean R-squared values between each pair of ten cross validations were less than 0.01. Table S3 shows the average values of mean R-squared values calculated from all cross validations. This result was similar to R-squared values in table S1, as R-squared changes were less than 0.02. The results also showed that different subset of historical data between ten cross validations for the model setup did not result in different model performance. In the following section, we used this model to simulate future crop yields when given climate projections data.

Trend analysis and anomalies
To use our model for crop yield projections, we first compared the probability density function curves of the historical climate indices from the five CMIP5 models (CCSM4, CESM1-BGC, CSIRO-Mk3-6-0, GFDL-CM3, MPI-ESM-LR) with those calculated from Maurer et al (2002). After evaluating that the CMIP5 models replicated the historical climatology of the climate indices, we calculated the projected climate indices. To preserve the projected climate trends, we standardized the predictors using the mean value and standard deviation from the baseline period 1970-2009 of each model. We used the Mann-Kendall test with an estimate of the slope from the methods of Hirsch (1982) to detect statistical significance and quantify the magnitude of the trend of the median yield for 2010-2099. As figure 3 shows, the climate models result in similar trends for all crops with a few exceptions: CSIRO-Mk3-6-0 and GFDL-CM3 for rainfed soybean, CCSM4 for irrigated maize and CESM1-BGC and MPI-ESM-LR for irrigated winter wheat in the South. The trend magnitude should be analyzed together with the percentage of statistically significant trend values in the Monte Carlo simulations. Both irrigated and rainfed maize yields are expected to see a decreasing trend for all models although only 60% of Monte Carlo simulations had statistically significant trends. Similarly, both irrigated and rainfed soybean yields are projected to experience decreasing trends, and 80% of these trends were statistically significant. As a comparison, winter wheat in the Northwest will see an increasing trend during 2010-2099, and 88% of these trends are statistically significant. Irrigated soybean and rainfed winter wheat in the South also have consistent trends for all climate projections: a decrease for irrigated soybean and an increase for rainfed winter wheat yields in the South. Trends for rainfed soybean and irrigated winter wheat in the South are mixed, as different climate projections resulted in opposite trends. Figure 4 shows the time series of area-averaged, median values of simulated crop yields. The five climate models produce similar simulated yield results with only a few differences. As previously mentioned, irrigated and rainfed maize see a decreasing trend (−0.01 to −0.005 anomaly/year) in the 21st century. Similarly, both irrigated and rainfed spring wheat tend to decrease slowly, and more than 80% of these trends are statistically significant at the 10% level. In contrast, winter wheat in the Northwest is projected to increase, as all climate projections resulted in a positive trend except for the CCSM4 irrigated case. Comparing the irrigated and rainfed cases, rainfed crop yields tend to have more variability than irrigated crop yields. Yield anomalies are more likely to drop below 0.5 negative standard deviations from 2010 to 2099, with an increased likelihood after 2040. This also results in more concurrence of large crop yield declines across multiple crops under rainfed conditions. We define agriculturally bad years as having at least two crops with yield anomalies below −0.5 standard deviations calculated during the baseline period. These years are shown as dashed lines in figure 4. We find that agriculturally bad years occur more frequently when crops are rainfed, as we see more dashed lines in the right column than the left. Bad years tend to cluster within a decade such as the 2060s in MPI-ESM-LR and 2070s in CESM1-BGC. This is potentially due to a mode of internal variability such as ENSO and may require mitigation efforts to reduce the negative climate impacts. Irrigation is shown to be effective in mitigating the impacts of climate variability, as agriculturally bad years are much less likely to occur for irrigated agriculture projections compared to rainfed.

Potential climate drivers
By utilizing the statistical model and CMIP5 climate projections, we estimated the projected crop yields time series based on Monte Carlo simulation for 2010-2099. The simulated crop yields provide us with valuable insights about climate impacts on yields as well as the potential impacts of irrigation. The model structure allows for evaluation of the potential drivers of the yield trends.
Because the simulated yields were calculated by adding up the products of each predictor and its sampled coefficient, both the coefficient and predictor impact the trend results. The coefficients associated with temperature-related indices are mostly negative (figures S1-S10), such as DTR for maize, Tvar for soybean and SU for spring wheat. However, for indices like Tmin for rainfed winter wheat in the Northwest and FD for rainfed maize, the coefficients are sometimes positive. In contrast, the distribution of coefficients associated with precipitation-related indices is mostly centered near zero, indicating an uncertain impact on yields. After evaluating the coefficient, we calculated the trend values of each predictor at the county level and focused on the median values. As figure 5 shows, almost all precipitation-related indices have trend values that are close to zero and most of these trends are not statistically significant. On the contrary, most temperature-related indices have larger trend values with greater statistical significance. The coefficients associated with precipitation-related indices imply a minimal impact on crop yields, as more than 70% of them were centered at values ranging from −0.02 to 0.02. In contrast, more than 80% of coefficients associated with temperature-related indices were centered at values greater than 0.2 or less than −0.2, indicating a stronger impact on crop yields. For example, when sampling the coefficient associated with DTR for rainfed maize, the results are almost all negative, indicating that the increasing trend of DTR would contribute to decreasing rainfed maize yields. As a comparison, when sampling the coefficient associated with Tmin for rainfed winter wheat in the Northwest, the results are almost all positive, indicating that the increasing trend of Tmin would contribute to increasing irrigated winter wheat yields there. By implementing similar analysis, we found that all crops are impacted by climate change, mostly by the warming trend and greater diurnal temperature range. Therefore, the temperature-related indices are the major drivers that impact crop yields; and those with larger trends in figure 5, such as Tmax, FD, and Tmin, make the largest difference to the projected yield trends.
Because the model was at the county level and the above analysis was at the regional scale, it was possible some details within a region could be missed. Therefore, we compared county-level results as well. No differences in county-level trends were found, indicating that pooling is a reasonable approach to account for uncertainty in climate-yield response.

Discussion and conclusions
Using the Maurer historical dataset (Maurer et al 2002) and crop yield data from the USDA, we built a cross-validated statistical model that could simulate historical crop yields. With projected climate indices calculated from five downscaled CMIP5 climate projections, we ran 10 000 Monte Carlo simulations for each case, generating projections of crop yields for 2010-2099. Trend analysis indicates that maize and spring wheat yields are projected to decrease while rainfed winter wheat yields in the Northwest are projected to increase; irrigated soybean yields tend to decrease and rainfed winter wheat yields in the South Large anomalies tend to occur more frequently in the rainfed case, with more agriculturally bad years that have concurrent crop yield declines. Temperature-related indices will dominate the climate impacts on crop yields, with many of them experiencing larger trends, either negative or positive, compared to precipitation-related indices during 2010-2099. Temperature-related indices have regression coefficients with distributions centered on relatively larger negative or positive values that are far from zero, implying consistent impacts on crop yields. This is not the case for most precipitation-related indices.
Employing physically-based crop models involves significant effort, including parameter calibration and detailed local information. Srinivasan et al (2010) were able to precisely assess total Upper Mississippi River basin biofuel energy crop production using the Soil and Water Assessment Tool (SWAT) model. However, significant parameterization and calibration are required beforehand. Challinor et al (2005) also used process-based models and accounted for uncertainty by applying multiple weather data ensembles. Additionally, they introduced a single yield-gap parameter (YGP) which involves mean bias-correction. Similarly, Lobell and Burke (2010) used a process-based crop model to simulate historical maize yield variability, but they also trained statistical models on the simulated variability to capture crop response. However, the predictor variables in the model are limited to growing season precipitation and temperature. Simulation based on limited predictors could miss important details and result in insufficient robustness. In contrast to these studies, we considered the full uncertainty of yield response to climate by regionally pooing the possible 'impact patterns' that have occurred historically within a certain growing region. We have also considered many climate indices as potential predictors together with stepwise regression to identify which predictors matter the most. This enables us to further study the potential climate drivers for yield Figure 5. Median trend values of climate drivers (predictors) by crop. The color shows the percentage of counties whose trends were statistically significant at the 10% level for a two-sided test. Hollow circles indicate that no county has statistically significant trends. trends, revealing more details about how a changing climate may impact future yields.
Similar to the results in Grassini et al (2015) and Schauberger et al (2017), our study shows the impact of irrigation, especially for maize and soybean, in similar CONUS growing regions. In terms of projected yield trends, this is the first study to our knowledge using this dataset and approach. Compared to similar studies, both our study and several former studies (Lobell and Gourdji 2012, Lobell and Tebaldi 2014, Rosenzweig et al 2014, Blanc 2017 indicate that climate change would be a credible threat to sustaining crop productivity growth not only in the US but worldwide, if no adaptive management such as irrigation is made. This study is limited in some aspects, such as the assumption that the factors affecting historical crop yields will also affect future crop yields similarly. This could result in biased estimation of projected yields. In reality, adaptation measures (changing planting dates, substituting cultivars or crops, irrigation practice change, etc) are very likely to happen to counter climate change (Adams et al 1998). We used a static crop calendar in this project, which could impact the results. However, we tested the sensitivity of our results to the crop calendar by calculating the difference in climate predictors that would due to changes in the projected crop calendar following the method of Gourdji et al (2013), and we find the changes in climate indices are negligible. We also omitted some indirect impacts of climatic factors such as pests and diseases that varied both in severity and regional distribution in recent decades (Rosenzweig et al 2001, Oerke 2006). Variables such as relative humidity and wind speed are often used to model pests and diseases and also matter for crop growth. The results could potentially be improved if additional variables are included. In addition, regional pooling of regression coefficients could be more related to certain delineation of climate regions in future work, rather than where data are available. This would allow for consistent climate conditions spatially, thus similarly impacting the crop production within the region (Thornton et al 2009). For this paper, considering the small number of total counties, we pooled all the counties with available data together for each crop and divided winter wheat into two separate regions (figure 1). Generally, we analyzed the results at the regional level rather than the county level. Although the individual counties tend to have similar simulation results in terms of trends and anomalies, regional level analysis can average out some details that might be interesting (Gornott and Wechsung 2016). Some counties showed low R-squared values (less than 0.2) during the model validation process yet were still included. These counties could potentially show other intrinsic relationships such as the relatively weak response of crop yields to climate in certain areas. We chose five different CMIP5 climate model projections to simulate future crop yields, and these projections did cover results from various organizations/algorithms but were not all-inclusive. Supplementary analysis with more scenarios and climate projections would certainly enrich the modeled results, offering more possibilities for our simulations to be more realistic, as crop response is very sensitive to what actual global change will be (Lobell et al 2011, Lobell andGourdji 2012), especially considering the performance of different climate models could vary significantly by region.
Despite these limitations, this study provides insights into how future crop yields would change and the importance of irrigation as a climate adaptation tool. The model is fast to implement as new climate projections become available. Generally speaking, this study offers a statistical approach to simulate future crop yields given climate projections. The large collection of related climate indices (predictors) from which to choose increases the robustness of the model and contributes to the conclusion that temperature-related indices, such as Tmax, GDD, and Tmin, would be the major drivers that could impact crop yields. Additionally, impacts of irrigation are explicitly interpreted through the model which accounts for both irrigated and rainfed cases.