Yield response to climate, management, and genotype: a large-scale observational analysis to identify climate-adaptive crop management practices in high-input maize systems

Sustaining food security under climate conditions expected for the 21st century will require that existing crop production systems simultaneously increase both productivity and resiliency to warmer and more variable climate conditions. In this study, we analyzed observational rainfed maize (Zea mays L.) yield data from major maize production areas of the US Corn Belt. These data included detailed information on crop management and genetics not typically available in observational studies, allowing us to better understand maize yield response to climate under variable management. Spatial variability in management variables across the study domain is coincident with spatial climate gradients. Regularized global and geographically weighted regression models were used to explore maize yield response to climate, management, genetics, and their interactions, while accounting for collinearity among them associated with corresponding scales of spatial variability. In contrast with recent analyses suggesting increased susceptibility to drought stress under higher plant populations in maize production, our analyses indicated that under moisture limitation, higher yields were achieved when high planting rates were coupled with delayed planting date. Maize genetic families that performed best with adequate moisture saw greater yield penalties under moisture limited conditions, while positive response to increased radiation was consistent among family lines. The magnitude of yield response to climate, management, and their interactions was also variable across the study domain, suggesting that information on crop management in spatial yield data can be used to better tailor local management practices to changes in yield potential resulting from agronomic advancements and changing local climate.


Introduction
Maize productivity in intensively managed systems has increased dramatically with the introduction of double-and single-cross hybrids, improved agronomic management and yield performance under higher plant densities, and increased use of synthetic nitrogen fertilizers (Duvick 2005). Combined with improvements in farm technology, these advances have resulted in a five-fold increase in average rainfed maize yields in the US over the past century (Barker et al 2005). Genetic selection for yield against the backdrop of increasing plant density has resulted in indirect selection for improved stress tolerance in maize hybrids, including drought tolerance (Cooper et al 2014, Pruitt 2016. Newer maize hybrids retain higher kernel numbers per plant under wider ranges of stress conditions, including drought, at increasing plant densities, when compared to older hybrids and open pollinated maize cultivars (Tokatlidis and Koutroubas 2004, Duvick 2005, Mansfield and Mumm 2014. Higher plant densities, which developed concomitantly with increased stress tolerance in maize genetics, is the agronomic trend that correlates most strongly with historical maize yield increases in the US Corn Belt (Hammer et al 2009). As a result of both improved genetics and management, average maize yields in the US Corn Belt in 2012, a year when there was a severe, region-wide drought, were higher than non-stress maize yields 25 years earlier (Boyer et al 2013). Despite increases in stress tolerance in maize hybrids, several recent observational studies have suggested that the higher plant densities required to maximize achievable yield potential in modern maize hybrids has led to an increase in crop sensitivity to air temperature and/or vapor pressure deficit (vpd), the latter being an important driver of evapotranspiration that is often associated with low local soil moisture (Lobell et al 2014, Ort and Long 2014, Meng et al 2016. Crop management recommendations are calibrated and distributed to the public domain regionally, through local seed distributors, equipment distributors, crop consultants, and extension experts. Variables such as cultivar maturity class and planting date are adjusted to account for local or regional average growing season thermal time. Planting rates (to achieve target plant densities) and cultivar selection are adjusted in response to climatic moisture limitations (Gaffney et al 2015). These management variables, therefore, represent a systemic control on crop yields that follows the same spatial gradients as climate. If management variables are not available in observational studies, omitted variable bias can obscure the identification of climate impacts on yield variability in spatially aggregated datasets (USDA NASS 2007, Challinor et al 2015, Carter et al 2018. If included, they can help explain some of the observed spatial heterogeneity in yield response to climate variables and improve the estimation of the climate effects (Schlenker and Roberts 2009, Tao and Zhang 2010, Li et al 2011, Butler and Huybers 2013, Carter et al 2018. Importantly though, the effects of management and climate are difficult to parse in spatially aggregated observational datasets that exhibit significant spatial covariability in management practices, crop genetics, climate, and crop yield. Novel analytical methods are required for these datasets to identify effective and scalable climate-adaptive management practices. This includes identification of agronomically important management (M) by climate (C) (M×C) and genotype (G) by climate (G×C) interactions.
The objective of this study was to quantify maize yield response to climate, management, crop genotype, and their interactions, and explore whether these responses are systematic across the Missouri, Iowa, Nebraska, and Kansas (MINK) region of the US Corn Belt. To address the analytical challenges listed above, we used penalized and geographically weighted regressions (GWR) to assess yield responses within a dataset provided by the National Corn Growers Association (NCGA) National Yield Contest, which contains record-level information on crop management not typically available in observational yield/climate studies. This study is offered as a proof of concept that, if more large-scale management by yield datasets were made available in the public domain, regional variability in management by climate interactions could be used to inform updates of regional crop management practices for climate smart farming.

Methods
Maize yield and climate data Rainfed maize grain yield and management data from 2005 to 2012 were obtained from the NCGA National Yield Contest (http://ncga.com/for-farmers/ national-corn-yield-contest) for the MINK region (approximately 3500 entries). Data for each entry included county location, maize yield, farm ID, planting date (PD), planting rate (PR), tillage practice (till), previous crop (PC), cultivar name, and cultivar maturity class (CultGDD) (supplementary data, text S1, figure S1 is available online at stacks.iop.org/ERL/ 13/114006/mmedia). Growing season duration for each entry was calculated using management data and accumulated thermal time as described in Carter et al (2016) (text S2). Yield contest entries were associated with hourly climate data from 70 automated weather stations operated by Iowa, Kansas, Missouri, and Nebraska (text S3). For each entry, we calculated mean growing season average daily maximum vpd (Svpd (kPa)) as a proxy for drought stress (Allen et al 1998), and growing season cumulative incident solar radiation (Srad (MJ m −2 )). VPD is thought to be a better proxy for drought than precipitation (Roberts et al 2012), although we recognize that drought is a complex phenomenon that is difficult to characterize with a single climate proxy. However, based on a previous analysis (Carter et al 2018), it was determined that Svpd and Srad were reasonable proxies for water and radiation limited growth, and unlike precipitation and temperature, were relatively uncorrelated and thus more useful in regression model inference (text S2, table S1). All management and climate variables exhibited strong spatial trends in local means and standard deviations (figure 1).

Statistical methods overview
This analysis was designed to, as much as possible, infer climate, management, and crop genomic effects on yield using a dataset that contained significant collinearity between climate and management variables. Methods were selected to manage this collinearity and to better understand how these effects explained yield variance across space and through time. To achieve this, we first used two different global, penalized (elastic net) regressions to assess the yield impacts of climate, management, crop genetics, and their interactions across the entire MINK domain, while controlling for the impacts of collinearity on model inference. Second, in the global regression model, we used random effects on location and year in order to partition unexplained variance in the model across space and time. This partitioning helps to explain the type of variability explained by the addition of new covariates when comparing two different, nested models. Finally, we used GWR and associated significance tests to determine if there was significant variation in climate, management, and interaction effects on yield across the MINK domain. If so, this would indicate that management variables had different impacts on yield that depended on local climate conditions. These three components are explained in more detail below, as well as in text S4.

Global regularized regression
We first developed two series of nested linear mixed effects models (LMEs) to estimate the impacts of climate (C), management (M), and genotype (G) on maize yields, with an emphasis on their interaction effects (figure 2; text S5). These models were fit with static fixed effects for the entire MINK region (i.e. global regressions). Management (Series M) LMEs were used to evaluate whether management variables (PD, PR, CultGDD, till, and PC) modified yield response to seasonal climate variables (Svpd, Srad). To Figure 1. Locally-calibrated summary statistics of standardized NGCA Yield Contest data and climate variables across the MINK region (corresponding variable means and standard deviations can be found in table S2). Local summary statistics were calculated using a bisquare kernel function with the adaptive bandwidth corresponding to the scale of geographically weighted regression (see below); (a) local means (LM), which indicate the spatial gradient in the standardized covariate relative to the global mean and (b) local standard deviations (LSD), which represent the amount local variability around the local mean. evaluate robustness of these effects, alternative climate variables were also evaluated (text S6, table S3, figures S2-S6). Genetic Family Grouping (Series G) LMEs were designed to evaluate the impact of genotype on yield response to climate, relative to the management variables. In Series G models, cultivar levels were grouped into 29 different 'families' based on Pioneer ® Hybrid Family ® groupings, which were represented by a categorical fixed effect (Fam).
In all models, random effects were included as analogs for spatial and temporal yield variance. Changes in yield variance assigned to the random effects among models were used to evaluate yield variance (spatial, temporal, or residual) explained by the inclusion of different groups of fixed effects relative to a NULL model with only random effects (text S7).
The LMEs were estimated using elastic net regression, a regularized regression procedure that combines lasso (L1) and ridge (L2) regression penalties to stabilize the estimation of model coefficients in the presence of significant collinearity in the dataset. This estimation procedure promotes group selection, i.e. the inclusion of multiple correlated covariates in the model and the exclusion of unimportant covariates (Zou and Figure 2. Structure of global penalized (elastic net) linear mixed effects models. Series M (management) models (blue) utilized the full NCGA dataset, but only looked at M and C (climate) effects. Series G (genotype) models (yellow) were run on the NCGA rainfed yield data subsetted to look only at Pioneer ® Hybrid Family ® groupings with 20+associated contest entry records, but explored management (M), climate (C), and genotype (G) effects on yields. Boxes indicate the nested structure of models. Blue text indicates effects containing CultGDD that were not present in models containing Fam in Series G (genotype) Models.
Hastie 2005). A description of elastic net regression and details of data standardization, penalty selection, spacetime stratified leave-k-out cross-validation, and evaluation of model performance can be found in text S8. We note that in the presentation and discussion of the coefficient estimates from the LME models, effects were considered 'weak' if the model coefficient was <0.05 (an effect less than 0.13 MT ha −1 per sd of the covariate).

Geographically weighted regression (GWR)
We re-analyzed several important terms from the M×C model (PR, PD, CultGDD, Svpd, Srad, and all pairwise M×C) using GWR to see if M×C relationships were consistent across the MINK region. In particular, we were interested in whether the positive yield response to PR was consistent across our study region, including the western MINK region where mean annual precipitation is lowest and lower PR are generally recommended to improve yield stability under water-limited conditions (i.e. Ciampitti 2015).
Importantly, although the global condition number (CN, a measure of collinearity) was low (3.1), local CNs were in excess of 5 in certain locations of the study area (shown later in figure 5(b)). The highest CNs occurred in regions of low yields, high Svpd and low PR (northwestern KS, southern NE), and high yields, low Svpd and high PR (northern IA) (see figure 1). As regions with the largest and smallest yields are going to exert the strongest influence over coefficient estimates, collinearity in these regions could exert strong control on relationships identified in the global regression. GWRs can isolate these regions and determine if coefficients varied from the global regressions and from regressions fit in regions where collinearity is low. Since GWR coefficients may be unstable in regions with high CN, which could be mistaken as evidence for spatial variability in these effects, a locally weighted ridge (L2) penalty was applied to regions where CNs were >4 (Gollini et al 2013). Information on bandwidth selection, kernel function, and ridge penalty calculation is provided in text S9. To determine if spatial variability in GWR coefficients was significantly different from the global LME model and not an artifact of sampling variability, we used parametrically bootstrapped test statistics to evaluate the significance of spatial variability in unpenalized GWR coefficients against a null hypothesis of global effects (Harris et al 2017, text S10). Since such testing is not applicable to a penalized GWR, we coupled a penalized GWR with statistical significance testing on an unpenalized GWR to provide two complimentary ways to determine if effects are varying across space.

Management (series M) models
The results of the three nested Series M models are shown in table 1. Including management variables (M model) resulted in a 0.291 sd reduction in marRMSE over the NULL model (table 1(a)), indicating that management variables alone explained about 0.74 MT ha −1 of yield variance. There was a 0.370 sd (0.94 MT ha −1 ) improvement in marRMSE over the NULL model when climate variables were added to the model as additive effects (M+C model), and a similar improvement in marRMSE (0.360 sd, or 0.92 MT ha −1 ) over the NULL model when interactive effects were considered (M×C Model, table 1(a)).
In all M Series models, variance component partitioning suggested that management variables explained the majority of spatial yield variance in the dataset (table 1(c), text S11). Two high magnitude effects stand out: a strong positive relationship between rainfed maize yield and PR (coefficient size ranging from 0.31 to 0.35) and a strong negative relationship between yield and Svpd (coefficients of −0.40 and −0.37) (table 1(b)). Coefficients on yield response to Srad in the global model were weak, suggesting that Svpd and management exerted a stronger limiting control on yield than Srad in aggregate across the MINK region. CultGDD also showed a consistent positive relationship with yield (coefficients from 0.08-0.11), while other management variables (PC, PD, till) had weak relationships with yield. While the M×C model did not show a marked improvement over the M+C model in terms of prediction accuracy, a high magnitude coefficient (0.18) on the interaction between PR and Svpd indicated that planting rates may have played an important role in modifying yield response to high Svpd (M×C model, table 1(b)). In particular, the positive Svpd:PR crossed effect indicates that higher planting rate sites were less susceptible to yield declines under increasing Svpd ( figure 3(a)). The coefficient estimate for the Svpd:PD crossed effect (0.09), though lower, could have some agronomic significance and suggests that that the lowest negative yield response to Svpd occurred at later planting dates ( figure 3(b)). The coefficient estimates for other interactions with Svpd (e.g. for Svpd: CultGDD, figure 3(c)) were small. Similarly, all interactions between Srad and management variables were small, with the exception of using soybean as a previous crop. These interaction effects were robust to alternative parameterizations of climate (table S3, figures S2-S6).
Genetic family grouping (series G) models The Series G models were designed to evaluate the impact of genotype ('Fam') on yield response to climate, relative to the management variables. All model results are shown in table S4, with the most important results highlighted here. First, as in the Series M models: PR, alone or as an interaction term with Svpd, had the largest penalized coefficient in the Series G models (e.g. 0.39 for PR, 0.28 for PR:Svpd in the C×(M+G) model, table S4(b)).
Second, we note that different distributions of yields were seen for different genetic families in the raw data ( figure 4(a)). However, the base Fam coefficients (⊕ in figure 4(a)) for the C×(M+G) model were relatively independent of these shifts in yield distribution, and the introduction of G×C effects did not improve prediction accuracy and explained only marginal yield variance (table S4(a) and (c), text S11). This indicates that other climate and management variables, which also co-vary with genetic family (e.g. different families have different recommended planting rates), better account for yield variance. Still, model results do highlight a more nuanced response of yield to different Fam: Svpd interactions ( figure 4(b)). In general, the interaction effects between Svpd and Fam suggest that cultivar families with the highest yields under low Svpd conditions were associated with greater yield reductions under high Svpd conditions. Interaction effects between Srad and Fam were small and unimportant ( figure 4(c)), compared to the Fam:Svpd interactions.  figure 5(d), table S5, figures S7 and S8), but usually only in magnitude and not in sign. For instance, focusing on the primary effects, GWR results showed that (1) there was slightly less yield sensitivity to Svpd in central IA than the rest of the domain; (2) positive yield response to PR was higher in NE, KS, central MO, and lower in northern IA (where PR was highest and least variable); and (3) positive yield response to increased CultGDD was higher in the central part of the study domain, where CultGDD was highest and moderately variable. For interaction effects, GWR results suggested that the buffering against yield losses under high Svpd by delayed PD was strongest in eastern NE/northeastern KS, while the greatest reduction in yield losses under high Svpd associated with increased PR occurred in southeastern IA and eastern MO, with a weaker association in northern IA.
Spatial variability in the primary effects on management variables indicates regions where altered management was associated with stronger or weaker yield response under mean Svpd and Srad conditions. That is, yield is limited more by a management variable in regions with larger local primary coefficients. Spatial variability in interaction effects indicates how the moderating impact of management on yield response to climate can change over space. When taken together, the magnitude of GWR coefficients on primary and interaction effects show which management variables, if changed, have the greatest potential to locally improve yields under different climate conditions. For example, in the drier, western portion of the MINK region, increases in PR are associated with higher magnitude increases in yields, more so under high Svpd (i.e. drought) conditions. This contrasts with general recommendations for lower PR under water-limited conditions to improve yield stability (Ciampitti 2015), and suggests that in the western MINK region, PR values may be generally below the threshold for terminal drought stress induced by high plant density. In contrast, in northern Iowa, yield improvements from higher PR were small and have little interaction with Svpd, suggesting that PR rates in that region are approaching optimal PR rates. Possible changes in management variables (PR, PD and CultGDD) to improve yield performance locally across the MINK region under different Svpd conditions are presented in text S12 and figure S9. These results suggest that there is spatial variability in the potential for management-mediated adaptation to yield decline under high Svpd across the MINK domain.

Discussion
A key finding of our analysis is that, on sites without significant underlying yield restrictions, the highest yields under high Svpd conditions were achieved at high planting rates and delayed planting datessuggesting that high PR, short-season plantings yield highest under moisture limited conditions. Once canopy closure has occurred, variability in plant density has little impact on maximum daily canopy ET (Hsiao et al 2009). Therefore, increasing plant density (increased PR) while reducing the amount of time that the crop is in the field (delayed PD) may be a viable strategy for achieving maximum yields before limited soil water reserves have been depleted below critical thresholds for physiological drought stress (Larson and Clegg 1999). This result, particularly for PR, contrasts with the conclusions of several recent observational studies, conducted without explicit data on reported PR, which suggested that increased PR was associated with increased sensitivity to drought stress (Lobell et al 2014, Ort and Long 2014, Meng et al 2016. The mechanism associated with catastrophic yield losses in maize under moisture stress historically, and the impetus for reduced planting density recommendations in drought-prone regions, has been maize's susceptibility to kernel abortion when per-plant daily water supply, and therefore perplant daily photosynthate supply, drops below a critical threshold Westgate 1991, Schussler andWestgate 1995). Yet a primary outcome of maize breeding in the last 40 years has been to reduce kernel abortion under increased competition for water and solar radiation, increasing the resilience of high-density plantings under variable climate (Barker et al 2005, Duvick 2005, Assefa et al 2016. Our results in no way indicate that there is no longer an upper limit beyond which planting density induces increased risk of kernel abortion under moisture stress, but it may suggest that this planting density threshold for modern cultivars has not been reached in regions where lower planting densities are recommended. In contrast, the results of the GWR suggest that in northern Iowa, where highest mean planting densities and reduced coefficients on the PR×Svpd relationship are observed, farmers may be paying a yield penalty for higher density plantings under waterlimited conditions. This indicates that in this region, the upper threshold of planting density may have been reached. Our analysis also highlights the nuanced impacts of maize genetics on maize yield response to climate and management, independent of maturity class (CultGDD). While significantly different means in yields can be seen across different Fam, the coefficients on individual Fam were quite small after statistically controlling for management and climate impacts (table S4, figure 4(a)). As genetic families have different optimal plant densities, this result suggests that the family-level tolerance for higher density plantings is an important component of family-level yield response. Although there were relatively small improvements in prediction accuracy and percent of yield variance explained when Fam was included as a fixed effect with and without G×C interactions (tables S4(a) and (c)), high magnitude coefficients on Fam×Svpd ranging from −0.11 to 0.16 (table S4(b)) indicated that Fam modified yield response to Svpd ( figure 4(b)). Maize cultivar response to high Svpd, therefore, may be an important component of observed variability in cultivar performance, independent of CultGDD and management (Westgate and Hatfield 2011). Coefficients on Fam×Svpd effects suggest that there is a tradeoff between cultivar performance at low versus high Svpd, with several of the highest performing cultivar families under low Svpd conditions showing marked yield declines under high Svpd conditions.
In observational studies, individual climate variables can serve as a poor proxy for climatic conditions that limit yields across large spatial scales for many reasons, including discrepancies in climate between climate station and field locations, spatial variability in soil water holding capacity, antecedent climate conditions, and interactions among climate variables (i.e. low moisture anomalies coincident with high temperature anomalies) , Carter et al 2018. These factors could be driving the observed spatial variability in the GWR-estimated relationship between maize yields and climate variables across the MINK region, which indicate that Srad and Svpd alone do not adequately represent the climate conditions that limit yields locally. For example, high Svpd with high underlying soil moisture would not be expected to limit yields , and may drive discrepancies between Svpd at field and climate station locations (Mueller et al 2017); while high Srad with low underlying soil moisture could lead to evaporative stress on crops, and therefore act as a proxy variable for drought stress under certain conditions. This is consistent with the stronger positive yield response to Srad and weaker negative yield response to Svpd seen in the most humid locations in the MINK region (e.g. Northwest and North-central Iowa). Similarly, we can only evaluate the impact of management variables that are explicitly measured in our dataset, although we know that other variables (including ones that are correlated with those used in this work) could be driving yield response (e.g. interactions between higher density plantings and row spacing, Shapiro and Wortmann 2006). Spatial variability in M and M×C effects could indicate that spatial management gradients are not locally optimized, and therefore yield response to the local portfolio of management practices is different from what is suggested by global patterns. This is of interest because it suggests that spatial data on management across large spatial scales can be used to identify sub-regions within MINK where local management practices are more or less effective than region-wide management practices. In particular, where GWR-estimated M×C effects are significantly different from the global model, there may be opportunities to modify management practices (increase PR, delay PD, reduce CultGDD) to improve yields under high Svpd.
As we continue to see spatially variable climatological shifts in growing season temperature and precipitation across major crop production regions under climate change, local crop management recommendations will need to adapt to new and changing agroclimatological regimes. These climatological shifts may outpace the ability of controlled experimental trials to provide crop managers with timely information to adjust local recommendations. Our analysis demonstrates how variability in yield/management data pooled across the MINK region can be used to locally improve maize yields under climate extremes. To our knowledge, this is the first analysis to examine variability in crop response to farm-level management at this spatial scale, likely in part because datasets which contain record-level information on crop management are rarely available in the public domain. Such data can assist crop managers to adapt to climate variability and climate change locally, underscoring the importance of collecting/retaining management information in observational yield datasets. However, a note of caution is required. Even if advanced statistical techniques are used to control for extensive, systemic collinearity in observational crop-climate data, these data characteristics limit the utility of observational analysis to identify causal mechanisms of climate-adaptive management within the highly coupled yield-climate-management system. In addition, in observational analysis we are only able to evaluate covariates that are available, suggesting that the potential for omitted variable bias is quite high, especially in spatial datasets covering large, diverse domains. Therefore, we argue that large observational studies are most effective for identifying emerging statistical discrepancies between current management recommendations and yield-management relationships, which could then be tested further using controlled experimental trials. In addition, because the NCGA Yield Contest data likely represents the best sites and agronomically (as opposed to economically) optimal management, these results are not necessarily generalizable to commercial maize production systems. Still, this study underscores the importance of spatially explicit crop management data when analyzing observational yield datasets for climate-yield interactions.

Conclusion
Using a unique maize yield dataset from the US Corn Belt that contained high-resolution management information, we analyzed how spatial gradients in management and crop genetics contributed to observed variability in crop yield response to climate variability. Contrary to recent analyses, we saw no evidence that increasing PR was associated with an increase in sensitivity to high Svpd conditions on highproduction sites. Highest yields under high Svpd conditions occurred with delayed PD and high PR, suggesting that intensive cropping with reduced season length can improve yields under water-limited conditions. Variability in cultivar response to increasing Svpd indicated the potential to exploit this genetic variability to improve cultivar performance under drought conditions. Spatial heterogeneity in yield response to climate, PD, and CultGDD was observed, but positive yield response to increased PR in the study area was nearly global, suggesting that a main driver of rainfed maize yields along spatial climatic gradients is PR, and, in the absence of underlying site limitations, rainfed maize yields in high-production systems in the western region of the US Corn Belt may be improved by increasing PR. Overall, spatial variability in management over large production regions, which was coincident with spatial climate gradients, was shown to provide useful information to inform management for yield stability under climatic variability.