An empirical, Bayesian approach to modelling crop yield: Maize in USA

We apply an empirical, data-driven approach for describing crop yield as a function of monthly temperature and precipitation by employing generative probabilistic models with parameters determined through Bayesian inference. Our approach is applied to state-scale maize yield and meteorological data for the US Corn Belt from 1981 to 2014 as an exemplar, but would be readily transferable to other crops, locations and spatial scales. Experimentation with a number of models shows that maize growth rates can be characterised by a two-dimensional Gaussian function of temperature and precipitation with monthly contributions accumulated over the growing period. This approach accounts for non-linear growth responses to the individual meteorological variables, and allows for interactions between them. Our models correctly identify that temperature and precipitation have the largest impact on yield in the six months prior to the harvest, in agreement with the typical growing season for US maize (April to September). Maximal growth rates occur for monthly mean temperature 18 °C–19 °C, corresponding to a daily maximum temperature of 24 °C–25 °C (in broad agreement with previous work) and monthly total precipitation 115 mm. Our approach also provides a self-consistent way of investigating climate change impacts on current US maize varieties in the absence of adaptation measures. Keeping precipitation and growing area fixed, a temperature increase of 2 °C, relative to 1981–2014, results in the mean yield decreasing by 8%, while the yield variance increases by a factor of around 3. We thus provide a flexible, data-driven framework for exploring the impacts of natural climate variability and climate change on globally significant crops based on their observed behaviour. In concert with other approaches, this can help inform the development of adaptation strategies that will ensure food security under a changing climate.


Introduction
Establishing the climate risk to the global production of individual crops, and how that might change in the future, is an essential requirement for building a resilient and robust food system that ensures food security for all (FAO 2002). Decision-makers can then use this information to guide the development of suitable adaptation and mitigation strategies across different time frames. This requires the characterisation of the relationship between meteorological and food production variations.
There is a growing consensus that a range of methods are needed to accurately assess climate impacts on crop yield (e.g. Lobell and Asseng 2017, Tigchelaar et al 2018, Snyder et al 2018. Exploring different model formulations and assumptions (e.g. a multi-model ensemble) provides a way of assessing key uncertainties and biases in our understanding of crop-climate interactions. In turn, this can help evaluate our confidence in the direction and magnitude of climate change impacts on food production. Broadly, there are two complementary approaches to this: physiological processes-based models; or data-driven, statistical models. Physiological models are generally built upon an experiment-based understanding of the generic crop. Data-driven, statistical models can be developed when there is sufficient empirical yield data. Each approach can thus be developed and applied when the requirements for the other are not met.
Examples of statistical approaches include non-parametric models which are formulated in terms of meteorological variables rather than underlying physiological processes or critical thresholds. These models are calibrated using historical data and have demonstrated the ability to capture broad influences of weather on crop yield (e.g. Schlenker and Roberts 2009, Welch et al 2010, Lobell and Burke 2017. This approach differs to parameterised models, calibrated by field experiments, which do account for specific thresholds in quantifying the response to temperature (e.g. Cutforth and Shaykewich 1990, Yin et al 1995, Wang and Engel 1998, Yan and Hunt 1999, Streck et al 2007, Zhou and Wang 2018 or precipitation (e.g. Çakir 2004, Ge et al 2012, Carter et al 2016, Song et al 2019. Observations have also been used to constrain parameters of more complex process-based models (e.g. Iizumi et al 2009, Tao et al 2009. A physically motivated, but empirical, data-driven approach would complement both process-based crop models and existing statistical approaches. This formulation would allow models to be developed without extensive field trials, and with a greater range of validity. Our approach exploits Bayesian inference to derive an empirical and non-linear 'growth response function' that maps temperature and rainfall conditions to crop growth. This work is part of a trend to apply advanced statistics and machine learning methods to climatological and agricultural data sets. For example, You et al (2017) demonstrated the application of deep learning and Gaussian processes to predict yield based on remote imaging.
Bayesian inference, is an established approach for inferring the posterior values of model parameters, based on prior assumptions and new data. The advantage of Bayesian inference is it allows robust computation of errors, which is especially critical when the aim is to model the influence of predicted climate data which are themselves subject to large uncertainties. More generally, Bayesian inference is frequently applied to problems of model parameter estimation with noisy data in other fields such as astronomy (Hurley et al 2017). In this paper, we are determining generative probabilistic models, which have a greater ability to accurately capture uncertainty than the more common discriminative models in machine learning.
The aim here is to present a simple and robust model which captures the impact of mean temperature and precipitation changes on mean yield. Developing a flexible method for investigating the influence of present-day natural climate on yield allows us to explore the direction and scale of climate change impacts in the absence of adaptation. We apply these methods to model the response of US maize to temperature and precipitation, which has been widely studied (e.g. Schlenker and Roberts 2009, Hatfield et al 2011, Roberts et al 2012, Sánchez et al 2014, Hatfield and Prueger 2015, Partridge et al 2019. We aim to first demonstrate an ability to capture key aspects of present-day maize yield variability in the US, and secondly to explore the implications of climate change for the current maize varieties in the absence of adaptation.
The paper is structured as follows: section 2 describes the yield and meteorological data, followed by section 3 which presents the models, demonstrating their strengths and weaknesses. Section 5 applies the model to an ensemble of climate projections as a first step to predict the influence of climate change on yield for present day maize varieties. Finally, section 7 summarises the work and our main conclusions.

Data
Numerous studies have demonstrated the importance of temperature in maize development (e.g. Cross and Zuber 1972, Coelho and Dale 1980, Daughtry et al 1984, Cutforth and Shaykewich 1990, Bonhomme et al 1994, with quantities such as Growing Degree Days (GDD) offering better predictions of phenological changes and yield than calendar days after planting. Water stress is also associated maize yield reductions (e.g. Çakir 2004, Ge et al 2012, Carter et al 2016, Song et al 2019, suggesting that any model should incorporate the effects of both temperature and precipitation. We proceed assuming that climate variability is a major driver of observed yield anomalies, and do not attempt to separate direct physiological influences from impacts resulting from air quality or pests (Gornall et al 2010). We use annual maize yield and monthly temperature and rainfall data aggregated to the state scale. This minimises the impact of local variations in both meteorology and planting date (e.g. Roberts 2009, Lobell andBurke 2017). For future work, the model outputs are compatible with the large-scale atmospheric circulation patterns that can be reliably simulated by global climate models. The relatively low data requirements, compared to daily data, also support model flexibility and computationally efficiency. Constraining the model parameter posteriors depends on the sampling of temperature/precipitation space, the data can only constrain features which are present in the data. We therefore draw observations from across the US Corn Belt, covering Indiana, Illinois, Iowa, Ohio, Minnesota and Nebraska to maximise the sampling while being confident the regions use broadly similar agricultural techniques and have comparable climatologies.
The spatial distribution of maize cultivation (both irrigated and rained) was extracted from the MIRCA2000 dataset (Portmann et al 2010) and used to derive area-weighted climate variables. Monthly mean temperature and monthly total precipitation are based on CRU TS3.1/3.21 (Harris et al 2014) and GPCC v6 (Schneider et al 2014) and extracted for 1981-2014 from within the WFDEI dataset (Weedon et al 2014). An overview of the data is shown in table 1.
In all regions there has been a long-term increase in maize yield since 1960, on top of which year-to-year and multi-year variations are evident. To remove the long-term trend and decadal-scale variability, which can be driven by climate and non-climate factors (e.g. Hawkins et al 2013, Ray et al 2015, we use yield 'anomalies'. Removing the estimated long term trend gives an estimate of the yield anomaly relative to what was expected that year, which retains the units of tonnes per hectare (t/ha). We compare a range of methods for calculating yield anomalies: a) anomalies relative to the centred 5-year rolling median yield in each region which is subtracted from the time series; b) anomalies relative to the 5-year rolling mean; c) fractional anomalies relative to the 5-year rolling mean (which are unit-less); d) anomalies relative to the least squares linear trend across all regions. Finger (2010), applied robust detrending techniques, which are designed to perform regression in the presence of outliers. Such an approach could be used here to perform the linear de-trending; however, it would still rely on the assumption of a linear productivity increase. Removing outliers beyond 10% fractional yield anomaly influenced the linear fit parameters by less than 10% and had a negligible impact on parameter posteriors (changes to median values of less than 5 % of the variance).
The total yield anomaly in year j, ΔY j , is the difference between the yield for year j, Y j , and the rolling fiveyear median or mean ( ) , , , j j j j j j 2 1 1 2 . A feature of using the rolling 5-year median anomalies is that, on average, one in five anomalies will be exactly zero. For this reason we compare all four forms of yield anomaly. The choice of de-trending has some influence on the posterior parameter values and, therefore, the model performance as described in section 3.5. The temperature and precipitation anomalies for each month are calculated by subtracting the corresponding monthly climatological mean for 1981-2014. However, the main model presented uses actual temperature and precipitation values and not anomalies.
The relationship between the yield and the temperature and precipitation anomalies forms the basis of our linear models. These provide a useful comparator for more complex models that relate yield to temperature and precipitation directly. Table 1 summarises the variation around typical values for monthly temperatures and precipitation and annual maize yields. Figure 1 illustrates how these two measures are related, while figure 2 shows the relation between the 5-year median anomaly and the 5-year fractional mean anomaly. Later, we compare what impact the choice of target yield has on the performance of models.
Previous analysis of these data by Kent et al 2017 revealed that yield reductions greater than 10% are strongly associated with mean temperatures during June, July and August exceeding 23°C, combined with total precipitation less than 240 mm. In short, warmer and drier than normal conditions were associated with reduced maize yield in the US Cornbelt. There was also evidence that excess precipitation during the same period could predict yield reductions; however, this relationship was more tentative. Figure 3 shows the maize yield time series, normalised to 2007 linear trend levels, as a function of yearly average temperature and precipitation. Even at this level of temporal coarse-graining, (i.e. averaging temperature and precipitation anomalies over six months), there is evidence that warmer and drier conditions are associated with significantly reduced maize yield, and the basic features that any model should try to capture are evident.

Methods and models
It is well known that both temperature and water availability play influential roles in determining maize growth and yield. Temperature, in particular, affects the rate of phenological development (e.g. Cross and Zuber 1972, Coelho and Dale 1980, Daughtry et al 1984, Cutforth and Shaykewich 1990, Bonhomme et al 1994, with a range of studies demonstrating that growth and yield have a non-linear dependence on both daily temperature and accumulated thermal sums (e.g. Cutforth and Shaykewich 1990, Streck et al 2007, Schlenker and Roberts 2009, Zhou and Wang 2018. The non-linear response often incorporates cardinal temperatures which describe minimum, optimal and maximum thresholds for a particular crop (e.g. Yin et al 1995, Wang and Engel 1998, Zhou and Wang 2018. While many models make use of standard cardinal temperatures (e.g. Yin et al 1995), this work uses the observed data to estimate the optimal growing temperature.  Comparison between the 5-year median anomalies and the 5-year mean fractional anomaly. A feature of the 5-year median anomaly is that roughly one fifth of all anomalies are exactly zero, which introduces non-Gaussian behaviour. In contrast, a benefit of using the fractional difference to the 5-year mean is that we might expect anomalies to increase in proportion to the mean yield. The preferred model is the one which is most highly correlated with temperature and precipitation.
Water stress is also associated with maize yield reductions (e.g. , and can be affected by precipitation and temperature as well as soil and land management strategies. The joint dependence on precipitation and temperature can be understood in terms of the plant's demand for water, which is related to vapour pressure deficit (VPD) between the saturated plant leaf and the ambient air (e.g. Roberts et al 2012. Higher VPD tends to occur on warmer days with lower humidity, promoting higher transpiration rates from the plant. The plant may respond to reduce water loss by reducing stomatal conductance, but this can inhibit metabolic activity and carbon assimilation, potentially resulting in yield failure (e.g. . The empirical response of US maize yield to precipitation is weaker than for temperature (e.g. Lobell et al 2013), but water availability remains an important consideration and we, therefore, include precipitation in the models outlined below.
Building on the previous research outlined above, the models developed here explore yield dependence on monthly temperature and rainfall accumulated during the growing season, making use of both linear and nonlinear models trained on monthly weather observations. This approach allows us to explore the simultaneous influence of monthly temperature and precipitation variations on US maize, as well as the effect of interactions between these two variables. Using monthly resolution will also allow the model to applied in situations where the only data available on on that time scale such as is common from, for instance, satellite imaging derived data. As we add complexity to these models, we can capture more features in the data, but the model may also be subject to parameter degeneracies and over-fitting, given the limited volume of data we have chosen to fit against. The classes of model investigated here are: • Linear models, predicting yield using a sum over the growing season of the monthly contributions to growth based on a linear independent function of observed temperature and precipitation anomalies.
• Gaussian process regression, predicting yield using correlations between each monthly temperature and precipitation and their correlation with the target yield.
• Two-dimensional Gaussian model, predicting yield using a sum over the growing season of the monthly contributions to growth based on a two-dimensional Gaussian function (i.e. non-linear) of observed temperature and precipitation.
The linear model was used to determine which months were most closely correlated with yield. Using this as a basis, the parameters of the linear and two-dimensional Gaussian models were initially inferred using least squares minimisation, and subsequently using Bayesian inference to investigate the full posterior on the model parameters. In principle, it is possible to use the regression coefficients to extract information about the crop's response to temperature and precipitation, hereafter referred to as the growth response function. However, using the Gaussian models described below, we are able to make a more direct estimate of the growth response function.
Gaussian process regression was used as a baseline to measure the capacity of meteorological data to predict the yield. This is because it is a general form of regression which does not make any assumptions about the 'true' model as parameterised models must. 3.1. Growth as function of temperature and precipitation After initial investigations using the linear model and Gaussian process regression to identify those months where climatology is correlated with the yield, we extended the approach to develop a more physicallymotivated model which allows for non-linear responses to both temperature and precipitation. As a heuristic, we assume there is an optimum temperature and precipitation (subject to other variables being held constant, e.g. solar radiation, soil type and CO 2 concentration) away from which the plant's growth rate declines (c.f. Cutforth and Shaykewich 1990, Yin et al 1995, Wang and Engel 1998, Hatfield and Prueger 2015, Korres et al 2016, Tigchelaar et al 2018. Here we model the monthly contribution to yield as a time-independent twodimensional Gaussian function of mean monthly temperature and precipitation. This allows us to implicitly incorporate the effects of exposure to cold and heat as well as insufficient and excess precipitation, without explicitly needing to derive critical thresholds (e.g. cardinal temperatures) from the data. Since the function is time-independent it represents the crop's typical response to growing conditions averaged across the entire growing season. For an individual state, 30 years of monthly data is insufficient to fully sample the T, P plane around the peak of the Gaussian; to combat these data limitations we combine information from the different states. The model is described in full in section 3.4 and in the notebooks which execute the code and are available on GitHub.
The approach developed here explains the results in figure 3 through developing a more general and continuous description of the crop's response to temperature and precipitation, while maintaining consistency with the threshold-based approach demonstrated in Kent et al (2017). This approach shares some similarities with Snyder et al (2018) who developed an emulator of process-based crop models based on Agricultural Model Intercomparison and Improvement Project (AgMIP) Coordinated Climate-Crop Modeling Project (C3MP) data.
Freely fitting the Gaussian parameters for each month during the calendar year led to large unconstrained posteriors on the parameters. This is predominantly a consequence of the volume of data relative to the number of parameters, and means that further assumptions are needed to reduce the number of free parameters. One option is to assume the same functional shape for each month, while allowing the critical temperatures to change during different growth stages (e.g. Hatfield et al 2011, Sánchez et al 2014. This assumption reduces the model parameters by a factor of twelve. Based on the growing season for US maize, we also restricted equation (7) to sum over months April to September (months 4-9). This was determined using the linear response model which showed that months 1-3 had a correlation with yield that was consistent with zero. We experimented with various priors and settled on wide Gaussian priors around the means of the measurements, checking that the priors did not have a large influence on the posterior.
In the next two subsections we describe the models, while their performance is discussed in 3.5.

Linear model
The first model investigated here used multiple linear regression to predict yield anomalies as a function of monthly mean temperature and precipitation anomalies, with their contributions to maize growth summed over the growing season. The physical interpretation of this approach is that the regression coefficients capture the crop's underlying response to climate variables such that the yield anomaly is determined by that year's temperature and precipitation throughout the growing season. This method was used to empirically determine the months to include in training and prediction. The tightest correlation was between the months of April and September-as expected under the assumption that the months leading up to and including the harvest are critical for determining yield. In this linear model and the later two dimensional Gaussian model, the yield anomaly in year j is given by the sum of the monthly mean growth rates,  y i j , , for each month i during the growing season: Where Δt i is the duration of the monthly interval, and the growth rate is some function of the monthly temperature anomaly ΔT i and precipitation anomaly ΔP i , defined as the difference of the month i measurement relative to the 30-year mean for that month.
Under these assumptions, the growth rate can be expressed as The approximation assumes that the function is slowly changing on the scale of the temperature and precipitation anomalies such that a Taylor expansion to first order is sufficient and that the variations with temperature and precipitation are independent of each other. In this limit, the regression coefficients, s t and s p , will be related to the gradient of the growth response and are, themselves, functions of temperature and rainfall anomalies.

Gaussian process regression
A Gaussian process is a non-parametric distribution over an infinite collection of random variables such that any finite subset constitutes a multivariate normal distribution (Rasmussen and Williams 2005). Such models are completely specified by their second-order statistics, namely the mean and covariance, though most often assume the former to be zero everywhere and rely entirely on the latter to evaluate the predictive distribution. The covariance function measures similarity in the input space; a common choice for this is the squared exponential covariance function which we employ here. Gaussian processes serve as a general procedure for predicting non-linear behaviour; as such it provides a baseline against which to compare the predictive power of any physically-motivated parameterised model. Here, we use the Gaussian process model to provide a baseline for the power of temperature and precipitation to predict yield. As such, this helps test the sufficiency of linear models and our suggested generative model outlined below.

Two-dimensional Gaussian yield response function
This section outlines the assumptions and mathematical formulation that underpin the bivariate Gaussian yield response function. Validation metrics shown in section 3.5 demonstrate that yield predictions made by the bivariate Gaussian outperform those for both the linear and the Gaussian process models. For that reason, this discussion goes into more depth than the previous section.
Being non-linearly dependent on temperature and precipitation, the bivariate Gaussian model shares some common features with Cutforth and Shaykewich (1990), Streck et al (2007), Yin et al (1995), Zhou and Wang (2018). However, there are several major differences to previous work: firstly, the model incorporates both temperature and precipitation, allowing for potential interactions between their impacts on growth; secondly, the non-linear function is Gaussian rather than a Beta function (e.g. Yin et al 1995, Streck et al 2007, which avoids the need to estimate explicit minimum and maximum thresholds for temperature and precipitation since the growth rate tends to zero for large deviations from optimal growing conditions; thirdly, the contributions to maize growth are calculated and summed on monthly intervals, rather than daily as for Growing Degree Day models (e.g. Zhou and Wang 2018).
The mathematical structure of the model is as follows. The k data points for each year that we are fitting with the model are where T k and P k are vectors with the temperature and precipitation values, respectively, for each month of the year. Under this model the monthly yield response (increase in final yield due to that month's conditions) is The predicted yield is then where, n i is the normalisation for the ith month. There are also many variations on this general structure; for instance instead of using all twelve preceding months we can use six, as with the linear regression model. We can also force the parameters of the Gaussian shape to be constant across time while allowing the normalisation to vary. This latter option assumes that favourable conditions are a constant over the growing period but will allow for months closer to harvest to have a more significant impact on the yield. The final model we present assumes that the normalisation is fixed across the six months prior to harvest. Mathematically, the likelihood (of one data point) is the probability of the data given the model, In this case, the likelihood is Although the model is a function of monthly mean temperatures, figure 4 shows how the parameters can be related to daily maximum temperatures through their strong correlation with monthly mean temperatures. The likelihood and the prior jointly determine the posterior. The Stan language and its Python wrapper PyStan are used here to sample the posterior model parameter space (Carpenter et al 2017).

Model validation
This section outlines a selection of methods and procedures for assessing the predictive performance of the statistical models described in the previous section. We evaluate the robustness of our model using a suite of cross-validation schemes. The first method uses Bayesian p-values to quantify how unlikely the observed yield is, given the model. If the observed data is unlikely given the proposed model, the differences between them will be large compared to the uncertainty in the prediction. Figure 5 shows these p-values for each year and state. This method does not split the data set into training and testing sets so over-fitting is a possibility. We, therefore, investigate a number of more rigorous approaches below. Since our model is time-stationary by construction, we first consider methods which do not aim to remove temporal dependencies. The most straightforward of these is to randomly split the data such that eighty percent  of the years are used to infer the regression coefficients, while the remaining twenty percent are withheld to assess the predictions. This stochastic procedure is repeated with ten different seeds to obtain an average. In leave-one-out (LOO) validation, we test on each year in turn, retaining the data from all other years for training. The average annual predicted versus observed fractional yield for each year under this regime is shown in figure 6. We also include a variation of this in which we exclude adjacent yields when inferring the regression parameters, following (Iizumi et al 2018), to reduce potential temporal correlation.
A yet stricter approach is to follow a rolling-origin validation procedure, such that only observations prior to the one currently being tested on are available for training: in each iteration we advance forward a year and accumulate an additional training sample. We begin by employing the standard root-mean-square error (RMSE) and classical R 2 , indicating the degree of variance captured by the predictions, as quality measures of our cross-validated estimates. Table 2 summarizes the results for the three models obtained with the various validation schemes using the five year median anomaly. The RMSE values for each of the three schemes are all comparable with Gaussian process regression marginally performing best according to most metrics. The negative classical R 2 values show that all of the point predictions do not predict year to year variations if we ignore errors and factors that are not modelled in the measured yield. This means that the model point predictions can be used to predict mean yield changes resulting from climate changes but not to accurately predict yearly yields.
The classical R 2 values fail to make full or appropriate use of the model posteriors. Indeed, Gelman et al (2019) argued that classical R 2 is also not an appropriate metric of model performance in the context of Bayesian inference due to the possibility of it lying outside the [0,1] interval. Instead, they suggest a new metric, 'Bayesian R 2 ', which lies with the [0,1] interval by construction and involves using draws from the posterior rather than the mean or median values used by classical R 2 . The output of a Bayesian model is a probability distribution for every measurement, such that using a point prediction from the posterior median parameter values is arbitrary. The Bayesian R 2 is designed to provide a measure of the variance in the data accounted for by the model, and is defined as the predicted variance divided by predicted variance plus error variance: 1 is the variance of the predicted values for the draw from the posterior, s, and var s res is the expected residual variance. This equation therefore describes the proportion of variance explained for a given draw from the posterior sample. We therefore use these Bayesian R 2 values for the posteriors, computed here as the median of twenty draws from the posterior.
Using the Bayesian R 2 measure, the bivariate Gaussian model performs the best out of the suite of similar models considered here. The full results are shown in table 3. The results also imply that the 5-year mean fractional anomaly is the best captured of all the yield metrics. We, therefore, recommend the use of the bivariate Gaussian monthly growth model with the 5-year mean fractional anomaly values, and the remainder of the paper focuses on this model.

Results
In this section we discuss the inferred parameter values of the bivariate Gaussian generative model and summarise its general form. Results are plotted in figures 7-11. These figures show the posteriors on the parameters of the bivariate Gaussian generative model. Figure 11 shows the growth response function for the mean posterior parameter values. The mean parameter values for the posterior are μ T =19.1°C, σ T =6°C, μ P =114 mm, and σ P =75 mm. Statistics describing the posterior are presented in table 4. The Gaussian formulation means that modelled growth will fall to less than 10% of its maximal value for monthly temperatures that differ from the optimum (μ T =19.1°C) by more than roughly two standard deviations, i.e. 2σ T =12°C. This two-sigma approach can be used to infer information about the cardinal temperatures for US maize (e.g. Yin et al 1995), and corresponds to lower and upper monthly temperature thresholds of roughly 7°C and 31°C, respectively. Although they have been derived using monthly mean temperature and precipitation, the growth response functions can be expressed in terms of mean daily maximum temperature, giving a more direct comparison with previous work (e.g. Schlenker and Roberts 2009, Hatfield et al 2011, Sánchez et al 2014, Hatfield and Prueger 2015. This makes use of a strong linear relationship between mean daily maximum and monthly mean temperature (Pearson correlation 0.998) as shown in figure 4. One way of identifying the relationship between these related variables would be to fit their joint distribution. Here, we instead explore the conditional dependence by binning the monthly mean temperatures and calculating the associated mean daily maximum. The gradient of the best-fit line is 0.967±0.024, and the intercept is 6.863±0.34°C; consequently, an optimal monthly mean temperature of ≈18°C-19°C corresponds to an optimal daily maximum of ≈24°C-25°C. This is consistent with Schlenker and Roberts 2009 who applied a range of statistical models to US maize production and found increasing yield for daily temperatures up to 29°C, and yield decreases above this threshold. The review by Sánchez et al (2014) gives an optimal growing temperature of 30.8±1.6°C for the whole maize plant  life-cycle. However, the grain-filling stage of the cycle (typically June and July in the US) appears to be the most sensitive to temperature, with the optimal growing temperature given as 26.4±2.1°C, overlapping with the results presented here. For completeness, the inferred monthly minimum and maximum temperature thresholds correspond to daily maxima of ≈13°C-14°C and ≈36°C-37°C, respectively. The symmetry of these values either side the optimal temperature is a consequence of the Gaussian formulation of the model, and differs to other estimates (e.g. Zhou and Wang 2018, and references therein), but is broadly compatible. While this symmetry represents a limitation of the model, the Gaussian approach is still considered informative since it allows us to explore the joint non-linear response to temperature and precipitation.
As described earlier, Kent et al (2017) investigated temperature and rainfall thresholds during June, July and August that were associated with large (>10%) yield reductions, known as shocks. In that work, yield shocks were found to be associated with the simultaneous occurrence of mean temperatures above 23°C for the three month period, and total precipitation below 240 mm. Figure 7 shows that 23°C is significantly greater than the optimal growing temperature, suggesting consistency with Kent et al (2017), despite applying very different approaches to the same data. Applying the offset between monthly mean temperatures and the daily maximum suggests that shocks are associated with extended periods during the grain filling stage with daily maximum temperatures above 29°C, showing excellent agreement with Schlenker and Roberts (2009), and consistency with Sánchez et al (2014) and Hatfield and Prueger (2015). The 3-month precipitation threshold of 240 mm corresponds to an average of ≈80 mm per month. This is significantly below the optimal monthly precipitation, shown in figure 8, again indicating consistency between the two approaches. Within the Gaussian model described here, the thresholds identified by Kent et al (2017) correspond to a 25% reduction in yield relative to what would be expected under optimal temperature and precipitation conditions. Figures 9 and 10 show the prior and posterior distributions of mean and variance for temperature and precipitation respectively and show how the posteriors are not sensitive to the assumed wide Gaussian priors.  Figure 11 shows the global form of the growth response function, illustrating the joint response to both monthly mean temperature and precipitation. Based on this analysis, there is no evidence for correlated responses to temperature and precipitation; however, an absence of data in parts of the monthly T-P plane means that there is insufficient evidence to say that there is no correlated response. Had there been a strong  correlation, maximising growth rates at higher temperatures would likely require higher precipitation, as for the emulator of rainfed mid-latitudes C4 crops, developed by Snyder et al (2018).

Potential impacts of climate change on mean yield
The shape of the growth response function indicates the likely direction of climate impacts on maize yield in the US. In particular, as mean temperature increases, growing conditions will be further from the optimal for current varieties. This will tend to reduce yield in the absence of any successful adaptation strategies such as development of new maize varieties that are better suited to higher temperatures. An important caveat here is that we do not account for changing CO 2 concentrations, which are an input in the photosynthesis process. However, while increasing CO 2 levels might offset yield declines resulting from warmer temperatures, there is evidence that the photosynthetic rate for C4 crops starts to level out around 400 ppm (e.g. Leakey 2009).
To test the climate change response of the model, we used the derived growth response function to quantify the mean yield as a function of changes in mean temperature, while keeping precipitation and growing area fixed. To ensure a fair comparison with the present-day, we applied the delta change method by adding a constant increment to the observed temperature time series, and modelling the resultant yield. The temperature increment is varied between −5 and 5°C as shown in figure 12. In agreement with previous studies (e.g. Bassu et al 2014, Urban et al 2015, Lobell andAsseng 2017), this suggests that the US maize yield is likely to decrease by several percent in response to a 1°C temperature rise, with larger reductions expected for greater warming. Figure 12 provides a more complete exploration of potential climate change impacts within this framework, showing the expected yield as a function of constant changes in both temperature and precipitation, applying independent increments of −5 to 5°C and −100 to 100 mm to the observed temperature and precipitation time series, respectively. The precipitation increments are broadly in line with projected changes in the climate during the 21st century (USGCRP 2017), see also figure 14. Table 4. Summary of the posterior on the parameters of the bivariate Gaussian. For each parameter of the model we present the posterior mean, the standard error on the mean, the standard deviation, the 2.5%, 25%, 50%, 75%, and 97.5% percentiles the effective number of samples, n eff andR.R close to 1 is considered to be 'good' and indicates convergence. There are 2000 samples from the posterior, n eff takes correlations between chains into account so that it gives the equivalent number of completely independent chains. The correlation, ρ (defined in equation (6)), is consistent with zero, indicating no evidence for a correlated response to temperature and precipitation. The normalisation sets the mean yield at actual T and P values equal to our baseline yield. For a full description of the model parameters see section 3.4. This provides a look-up table of yield as a function of simultaneous changes in temperature and precipitation, subject to the caveats outlined previously. For example, figure 13 suggests that US maize is relatively well-suited to the current climate and that significant changes in either/both temperature and rainfall are likely to reduce mean yield. In contrast, locations where current temperatures are below the optimum (e.g. the UK), would be expected to see an increase in maize yield with a warming climate. Due to the formulation of the model, this analysis does not allow for changes in the frequency of extreme daily maximum temperatures or precipitation intensity which may have a significant influence on both average yield and yield variability.
Finally, to demonstrate the flexibility of the model, we have estimated yield based on projected changes in temperature and precipitation calculated by global climate model simulations provided by the Intergovernmental Panel on Climate Change Fifth Assessment Report, Working Group 1 (http://climatechange2013.org/report/full-report/ IPCC 2013). Figure 14 shows the ensemble of projected changes in summer temperature and precipitation for the Central North American Giorgi region (which encompasses the US Cornbelt) under the RCP8.5 scenario for 2041-2070 relative to 1981-2014. For each of the 39 climate model simulations, the projected change in temperature and precipitation is added to the historical climate data to create synthetic time series of future weather conditions. We then compute the yield for the full sample of 2000 parameter values from the posterior, giving a probability distribution of yields with a median reduction of 12%. Figure 15 shows the full distribution of yield changes arising from these changes using the yield model presented here.

Discussion
Understanding climate change impacts on food production is essential for developing effective policies and adaptation plans at local, national and international scales that will ensure food security for all. The broad aim of the work presented here is to explore the feasibility of deriving physically-realistic relationships between maize yield in the US and local monthly temperature and precipitation during the growing season. A key component of this was developing a computationally inexpensive generative model that captures the main impacts of monthly meteorology on maize growth rates, using this understanding to assess the plausible impacts of climate change on current varieties in the absence of adaptation measures. In turn, this new approach can contribute to the growing body of evidence that supports genetic breeding programs and the development of improved agronomic practices which will ensure high levels of agricultural productivity in the future. One caveat is that this data-driven approach is not designed to capture all of the relevant physiological processes that govern maize growth and will fail to identify high impact events associated with the plant's response to pests, disease, pollution or short-lived meteorological extremes (e.g. frost, hail, high winds, extreme rainfall, flash droughts, etc).
The reliability of the empirical growth response functions is dependent on the quality of both the yield and meteorological data. These data are well-documented in the US, making it an ideal location for exploring and validating the approach, i.e. comparing critical temperatures with laboratory measurements and other studies (e.g. Cutforth and Shaykewich 1990, Schlenker and Roberts 2009, Hatfield et al 2011, Sánchez et al 2014. In other regions, historical records may be sparser, potentially making the modelling more challenging; however, we have demonstrated here that it is possible to extract useful information about crop response functions from monthly data at only a few different locations. For that reason, this approach lends itself to locations where data is limited, and it will be of interest to assess how well the approach can capture the characteristics of different crops grown in different environments.
The model presented here has been developed firstly as a way of capturing the broad influence of natural climate variability on US maize yield, and secondly to explore potential climate change impacts on current maize varieties in the absence of adaptation. Because of this, our model is closely related to previous work (e.g. Schlenker and Roberts 2009, Roberts et al 2012, while our model also provides a method for the robust characterisation of model errors and permitting applications with differing time resolution measurements. We have also demonstrated how, given projections of monthly temperature and precipitation from global climate simulations, our model can be used to provide a probability distribution of the impact on mean yield. We emphasise that the model demonstrated here only considers area-average yield, rather than total production which may be more relevant for assessing potential climate change stresses on national food security. The reason for this is that total production is a function of both yield and the area harvested. The latter is known to be affected by a range of non-climate factors, including commodity and crude oil prices (e.g. Zafeiriou et al 2018), as well as water availability and soil suitability. In contrast, the yield is expected to be much more strongly correlated with weather conditions during the growing period.
Previous research has explored the influence of a range of adaptation strategies (e.g. Challinor et al 2014, and references therein). Examples include changes in crop varieties, species, planting times, irrigation, as well as more transformational changes such as crop relocation. Other studies have focused more on the economic implications of adaptation (Seo and Mendelsohn 2008, Carter et al 2018, Dalhaus et al 2018. In principle, models can also incorporate a number of adaptation options, such as different crop varieties or species, planting dates and irrigation.
The approach presented here contributes to the toolkit of methods that seek to inform adaptation decisions, particularly phenotyping and the objectives of crop genetic improvement programmes (e.g.http://wgin.org.uk/) such as helping crop breeders identify and target traits that will be more beneficial in the future (e.g. heat and drought tolerance, higher optimal growing temperatures), as well as assisting agronomists in developing improved practices that will maintain high levels of productivity. The results can also provide useful context for interpreting climate change impacts simulated by more complex models (e.g. Tigchelaar et al 2018, Ostberg et al 2018.
More generally, this approach could be used to identify regions where current crop varieties and agronomic practices are projected to come under stress in the future, indicating where and when incremental or transformational adaptation may be most effective.

Conclusions
We have developed and applied data-driven statistical models for exploring the dependence of US maize yield variations on monthly mean temperature and precipitation, using both linear and non-linear relationships. However, a particular aim of the approach presented here has also been to assess the feasibility of extracting physically plausible growth rate information from limited data, in this case state-scale yield and monthly meteorological information. We have demonstrated how these coarser grained models can be used with predicted global meteorological changes to compute yield reduction risks.
The linear model predicts maize yield using a sum over the growing season of the monthly contributions to growth based on a linear function of observed temperature and precipitation. The early months in a calendar year are weak predictors of maize yield, consistent with a planting date around April, c.f. AMIS crop calendar. In contrast, the following six months to harvest (i.e. April-September) are found to be strong predictors of maize yields, indicating that weather conditions during the growing season in the US Cornbelt are statistically much more important than antecedent conditions.
The non-linear model predicts maize yield using a sum over the growing season of the monthly contributions to growth based on a time-stationary two-dimensional Gaussian function of observed temperature and precipitation. The modelled growth rates are maximal at the peak of the function, and lower either side (c.f. Cutforth and Shaykewich 1990, Wang and Engel 1998, Streck et al 2007, Hatfield and Prueger 2015, Korres et al 2016, Tigchelaar et al 2018. As such, the growth response function represents the typical response of the crop, averaged over the growing season. The yield and meteorological data are then used to constrain the location of the peak and the width of the bivariate Gaussian function, which provides information about the optimal monthly temperature and rainfall for current US maize varieties. There are several major differences between this approach and previous work: firstly, our model incorporates non-linear responses to both temperature and precipitation, allowing for potential interactions to impact maize yield; secondly, the non-linear function is Gaussian rather than a Beta function (e.g. Yin et al 1995, Streck et al 2007, which avoids the need to estimate explicit minimum and maximum thresholds for temperature and precipitation since the growth rate tend to zero for large deviations from optimal growing conditions; thirdly, the contributions to maize growth are calculated and summed on monthly intervals, rather than daily as for Growing Degree Days (e.g. Zhou and Wang 2018).
This approach represents a simplification of the crop's true response function which is time-dependent and multivariate (e.g. Siebert et al 2017), reflecting changes in the crop's sensitivity to meteorological conditions during different growth phases (e.g. Hatfield et al 2011, Sánchez et al 2014. Despite this, the formulation shares similarities with other models that have explored non-linear yield responses to temperature and accumulated thermal units (e.g. Lobell et al 2013, Zhou and Wang 2018). In addition, it also allow us to straightforwardly explore the joint influence of temperature and rainfall variations on monthly timescales. Within this framework, we find maize growth rates are maximal for a monthly mean temperature of 19±0.7°C, and monthly total precipitation of 114±3mm. This corresponds to a daily maximum temperature of 24°C-25°C, in approximate agreement with Sánchez et al (2014), Hatfield and Prueger (2015) for the grain filling phase of growth. Due to the shape and fitted parameters of the bivariate Gaussian function the growth rates decline rapidly for temperatures above this threshold, in agreement with Schlenker and Roberts (2009).
Our analysis also suggests that current US maize varieties are relatively well optimised for present-day growing conditions in the US Cornbelt, but that growth rates would be maximised at slightly lower monthly temperatures (≈−1.5°C) and slightly higher monthly precipitation totals (≈+25 mm). Keeping precipitation at present-day levels and the growing area fixed, a 1°C temperature rise is expected to reduce the mean yield by 3%-5%, in broad agreement with previous findings (e.g. Bassu et al 2014, Urban et al 2015, Lobell and Asseng 2017. However, we note that this analysis does not allow for various adaptation strategies (such as changes in planting date, location or irrigation) or account for changes in carbon dioxide (which may not have a strong influence at current concentrations (e.g. Leakey 2009)). Similarly, changes in the frequency of extreme daily maximum temperatures or precipitation intensity may have a significant influence on both average yield and year-to-year variability.
The similarity with previous findings is encouraging given that, to our knowledge, this is the first attempt to directly estimate a growth response function from state-scale data, rather than from field trials. This suggests that the new approach could be applied across a range of spatial and temporal scales, and to distinguish growth responses of different maize varieties. The main constraint to this application is the need for sufficient yield and meteorological data to robustly estimate the model parameters outlined in table 4.
More broadly, this approach provides an intuitive and computationally inexpensive method for deriving data-driven crop indices that can be used in climate risk studies (e.g. Kent et al 2017), and can complement other approaches to modelling crops (e.g. Schlenker and Roberts 2009, Carter et al 2016, Lobell and Burke 2017, Zhou and Wang 2018.