How can machine learning help in understanding the impact of climate change on crop yields?

Ordinary least squares linear regression (LR) has long been a popular choice among researchers interested in using historical data for estimating crop yield response to climate change. Today, the rapidly growing field of machine learning (ML) offers a wide range of advanced statistical tools that are increasingly being used for more accurate estimates of this relationship. This study compares LR to a popular ML technique called boosted regression trees (BRTs). We find that BRTs provide a significantly better prediction accuracy compared to various LR specifications, including those fitting quadratic and piece-wise linear functions. BRTs are also able to identify break points where the relationship between climate and yield undergoes significant shifts (for example, increasing yields with precipitation followed by a plateauing of the relationship beyond a certain point). Tests we performed with synthetically simulated climate and crop yield data showed that BRTs can automatically account for not only spatial variation in climate–yield relationships, but also interactions between different variables that affect crop yields. We then used both statistical techniques to estimate the influence of historical climate change on rice, wheat, and pearl millet in India. BRTs predicted a considerably smaller negative impact compared to LR. This may be an artifact of BRTs conflating time and climate variables, signaling a potential weakness of models with excessively flexible functional forms for inferring climate impacts on agriculture. Our findings thus suggest caution while interpreting the results from single-model analyses, especially in regions with highly varied climate and agricultural practices.


Introduction
Statistical models are commonly used for studying the influence of short-term and long-term climate variability and change on crop yields. The choice of statistical technique(s) is dictated by the specific research questions, availability of data and computational resources, or statistical expertise. Given the increasing availability of high-quality data and advanced computational facilities, researchers now have an arsenal of techniques and tools to choose from. Consequently, it is critical to understand the advantages and disadvantages of these statistical tools to identify the most appropriate technique. Our study adds to this field of research by assessing and contrasting two widely used statistical techniques for crop yield analysis.
Among the most common techniques in the field of statistical crop yield modeling is ordinary least squares linear regression (LR hereafter) which minimizes the sum of squares in the differences between observed and predicted values. It models the dependent variable (DV) using linear predictor functions for relevant independent variables (IVs). LR is widely used to model crop yields as a function of climate variability (Lobell and Field 2007, Butler and Huybers 2013, Davis et al 2019. LR models, due to their relative simplicity and explicitly specified relationships between variables, are easy to interpret for understanding the role of IVs in explaining the DV 5 . The mathematical equations are easily understood, and the model fitting process is faster and often less computationally intensive compared to more advanced algorithms (James et al 2013). The conditions which influence LR performance are also well understood by the greater scientific community.
LR models require a priori specification of the functional form of the plant physiological response to climate, which can be problematic because climate and crop yield relationships are usually non-linear and contain complex interactions Roberts 2009, Hsiang et al 2013). Researchers often use polynomial forms for climate variables (Fishman 2016) and piecewise linear relationships (Schlenker and Roberts 2009) to better simulate physiological processes and improve model performance. These techniques, however, still rely on the modeler making a priori assumptions about the functional form. While variables like temperature or rainfall may have physiological justification for including polynomials of higher order, it may be hard to find empirical evidence for all climate variables. Polynomial forms are themselves approximations which can lead to erroneous predictions when pushed outside the training data limits. Piecewise, or segmented, functions need manual specification of the expected number and/or locations of the breakpoint(s) when automatic breakpoint selection fails; this too relies on the modeler making a priori assumptions about the true functional form. Additionally, climate variables seldom affect crop yield in isolation of each other, which requires modelers to specify ever more complex functional forms with interactions. While LR allows the specification of such interactions between variables of interest, they must be specified individually, which leaves room for missing important interactions 6 . Thus, building accurate LR models with high predictive performance across multiple crops, contexts, and/or geographies requires detailed contextual knowledge of the underlying physiological mechanisms in each case, which may not always be possible. 5 There are certain conditions that need to be met for this to be valid, the most important being that the IVs should not be strongly correlated, because interpretation of a specific IV's regression coefficient assumes all other IVs are held constant. This assumption may be voided if multiple correlated IVs vary together. 6 Theoretically, one could run models with all possible interactions between the IVs and then conduct model selection. However, this is contingent on the availability of a massive dataset. Even with a few IVs, combining all possible interactions can get unwieldy very quickly. For example, with 3 IVs, there are 3 single-IV terms and 3 two-IV interaction terms. These can be combined to create 17 different models; including 3-way interactions adds even more models to the set of potential model specifications. With more IVs, the number of models increases exponentially. Inclusion of every possible interaction can also lead to non-identifiability issues.
These limitations can potentially be addressed using flexible algorithms that do not need any a priori specification of the functional forms and start with fewer assumptions about the relationship between the DV and IVs. Machine learning (ML) offers many such options, and the field of crop yield modeling has begun to show considerable interest in ML methods (Gonzalez-Sanchez et al 2014, Chen et al 2016, Jeong et al 2016, Crane-Droesch 2018, Vogel et al 2019, Hoffman et al 2020, Yin et al 2022. Thus, it is important to understand how ML methods with automated feature selection and flexible functional forms compare to standard LR models with pre-specified features and functional forms, in terms of prediction accuracy as well as the underlying algorithms that determine the climate-yield relationships estimated by these models. There is an active debate on the distinction between traditional statistical methods and ML (Breiman 2001, Bzdok et al 2018, and the separation is blurry and subjective. In this study, we call LR a 'standard statistical method' , and use boosted regression trees (BRTs) as an illustration of ML.
BRTs are a form of tree-based ML regression procedure that create partitions in the predictor space using nested if-else conditions, with the goal of attaining the highest possible prediction accuracy. Because single-tree models are prone to overfitting and exhibit poor predictive performance (Elith et al 2008), BRTs combine multiple regression trees using a popular ensemble method called boosting, wherein several weak trees are sequentially trained to improve the predictive power of the full model without overfitting. Unlike LR, BRTs can handle non-linearities and discontinuities in DV-IV relationships without user input on functional forms, can easily accommodate missing data, can conduct inherent model selection by dropping unimportant variables, and are relatively immune to non-identifiability issues from correlated IVs (James et al 2013).
On the flipside, BRTs are less interpretable than LR, since the output contains no explicit coefficients for each IV as with LR models. Nonetheless, BRTs can be used to rank IVs in order of their relative contribution to predicting the DV (Elith et al 2008), making them somewhat interpretable. Researchers analyzing the relationship between crop yields and climate have used LR for years and are more comfortable interpreting their results; ML methods like BRTs are novel and therefore not widely understood or used in this field. As with all statistical methods, BRTs can be used for predictions and conditional inference using partial dependence plots (PDPs) (discussed later). BRTs can be computationally expensive and may need access to high performance computing: however, this is a diminishing concern with rapidly improving computing power.
The primary objective of this study is to compare LR and BRT as tools for eliciting the relationship between climate variability and crop yields. We use India as a case study, and focus on three major crops (rice, wheat, and pearl millet). For both tools, we (a) assess their out-of-sample prediction accuracy, (b) interpret the models with reference to various climate variables using partial dependence analysis and synthetic data experiments, and (c) conduct some historical climate change simulations. We conclude by highlighting the strengths and weaknesses of each technique under different conditions. While our study adds to the rapidly growing literature on using statistical models (including ML) for forecasting crop yields under a changing climate, its main objective is to analyze how the structure of different modeling algorithms leads to different interpretations of the relationship between climate and crop yields in those models.

Climate and crop production data
District-level yield data for three major cereal crops (rice, wheat, and pearl millet) of India, covering 311 districts from 1966 to 2011, was acquired from ICRISAT (2015). Rice and pearl millet are major kharif (summer monsoon) crops, while wheat is the biggest rabi (winter) crop in India. Publicly available daily-scale temperature (1961-2015; 634 districts) and precipitation data ; 651 districts) were acquired from the Indian Meteorological Department (Rajeevan et al 2006). The climate data uses current district boundaries, while the ICRISAT data is based on 1966 district boundaries, so we used area-weighted averaging to harmonize climate data for new districts to ICRISAT districts. Climate data was aggregated from daily to seasonal scale after masking it for the growing season using state-level crop calendars published by Ministry of Agriculture and Farmers Welfare (2016) for each crop.

Statistical software and methods
We conducted our analysis in R (R core team 2022). LR models used the stats R package (R core team 2022), and BRTs were constructed using the gbm R package (Greenwell et al 2020). The latter allows automated detection of optimum number of trees using k-fold cross-validation. The function needs two user-defined parameters: (a) the learning rate, which determines the contribution of each tree as the model grows, and (b) the tree complexity, which controls the number of interactions between IVs. Following the recommendations of Elith et al (2008), we selected five learning rates (0.1, 0.03, 0.01, 0.003, 0.001) and five tree complexity values (2, 4, 6, 8, 10) as possible candidates for our BRT models. We randomly sampled 80% of our data (stratified over years) to train a BRT model for each possible permutation and tested on the held-out 20% data. The learning rate (0.01) and tree complexity (6) that gave the most accurate results in terms of root mean squared error (RMSE) for the highest number of crop/model combinations were chosen for further analysis.

Models and climate variables
This study examined three different sets of climate variables as input to the models: noclim: a baseline model with no climate variables, Tavg_Psum: average seasonal temperature and total seasonal precipitation 7 , and Tavg_Psumday: average seasonal temperature, total seasonal precipitation and total precipitation days over the growing season.
All models built using these variable sets contained identical geography and time variables to account for non-climatic determinants of crop yield. While researchers have demonstrated that crop yields are sensitive to several other climatic factors such as duration of dry spells and heatwaves, growing degree days, and extreme degree days in crop yield models (Zaveri and Lobell 2019, Zhu et al 2019), we used only seasonal temperature and precipitation because we wanted to focus on comparing the algorithms' underlying mechanisms. Limiting our analysis to two climatic variables avoided excessive complexity and helped interpret the difference across methods.
Each of the three variable sets were used as input for four different modeling algorithms: lr_mono: LR with only monomial terms of time and climate, lr_quad: LR containing quadratic forms of time and climate, lr_sgm: segmented LR with one knot 8 for year and each climate term, and brt: BRT with the same variables used in the LR models 9 .
All combinations of the three variable sets and four models were run for the three crops (table 1). For clarity, we italicize variable set and model names (e.g. Tavg_Psum or lr_sgm). We use the non-italicized capitalized format when referring to LR and BRT algorithms in general. We measured model performance in terms of RMSE, while avoiding overfitting by using out-of-sample ten-fold cross-validation with 7 Precipitation as a variable ignores the initial conditions. For example, soil moisture could be low, medium, or high at the start of the season, which is not captured by seasonal precipitation. Please refer to (Ortiz-Bobea et al 2019) for a detailed analysis of soil moisture. 8 The knot signifies a point of abrupt change in the DV-IV relationship. Knot determination process is explained in greater detail in section 2.4.2. 9 Note that the brt model does not need any functional form as input. Only names of relevant IVs and DV are input into the algorithm.  (1): where y it is the crop yield in district i and year t; α i is the district specific intercept (dummy variable); β is the coefficient for linear time (harvest year) trend; γ n is the coefficient for n th climate variable (clim_var n ); ε it is the standard error. For lr_quad, equation (1) was modified to include quadratic terms for time and all climate terms. lr_sgm too used equation (1) as its functional form, but with a single knot for each variable (including time), allowing for a more flexible fit. brt does not need any formula as input.

Model inference 2.4.1. Partial dependence plots (PDPs)
To examine the marginal effect of each IV on the DV for all four model types, we plotted PDPs for each IV. Single-variable PDPs depict the DV as a function of an IV, marginalizing over observed values of all other IVs (Hastie et al 2017), so their interpretation is conditional on low correlation and/or interactions between IVs. The total seasonal precipitation and number of precipitation days are well correlated (R 2 of 0.27, 0.68, 0.24 for rice, wheat, pearl millet), and so we present both single and two-way PDPs from only the Tavg_Psum models.

Specification of segmented LR using BRT PDPs
We used PDPs for fitting some segmented LR models. The segmented R package (Muggeo 2008) requires an input of either the number of knots for each IV, or the starting location of the knots to start estimating the breakpoint locations. In our study, automated knot search sometimes failed for some crop-model combinations 10 . In such cases, we used the brt PDPs as a diagnostic tool to provide the most probable starting location of knots for the segmented algorithm; this reduced the proportion of failed runs from 38% to 14% for lr_sgm:Tavg_Psum.

Inference using synthetic data
A limitation of using observational data for model inference is that the true relationship is unknown, and it is difficult to assess a statistical model's ability to correctly unmask the true relationship. So, we created synthetic crop yield and climate data with hypothetical pre-determined DV-IV relationships. Three sets of experiments were conducted using LR and BRT models with this data, and their PDPs were examined. More details on the synthetic data creation process are available in SI.1.

Experiment one: conflation between time and climatic variables
Experiment one focused on the possibility of conflation between time and climatic variables in models built using observational climate and crop yield data (Sidhu et al 2022). We created three sets of synthetic mean seasonal temperature data: • Data SD.1A: no time trend in temperature (no climate change scenario) • Data SD.1B: temperature increasing linearly over time, and • Data SD.1C: temperature increasing quadratically over time For all three datasets, synthetic crop yield was calculated using the same coefficient for yield versus temperature. This synthetic data was then used to fit lr_mono and brt models to ascertain if they could determine this coefficient from each of the three synthetic datasets.

Experiment two: model ability to capture spatially varying yield-climate relationship
Experiment two aimed to highlight the flexible nature of non-parametric BRT models, compared to LR that requires user-specified functional forms. Consider the case where crop yield dependence on a climate variable varies across regions: for example, a multi-crop analysis in the Great Plains region of 10 Our segmented models were based on one-knot-per-variable optimization, as explained in table 1. When the segmented function (from the segmented R package) was unable to identify a knot that improved the model fit compared to the knot-less linear regression model (lr_mono in table 1), the segmented function returned a warning explaining that the knot search failed. When automated knot search failed in the one-knot mode, we reran the segmented function while allowing multiple knots (2, 3, or 4), but model runs that failed to identify discontinuity in single-knot specification also failed in multiple-knot runs. the US reported both positive and negative impacts of temperature increase on yields across different counties (Kukal and Irmak 2018). To test LR versus BRT in such a situation, we fit lr_mono and brt models to two synthetic datasets: • Data SD.2A: spatially invariant relationship between yield and temperature • Data SD.2B: yield sensitivity to temperature varied across districts 2.4.3.3. Experiment three: model ability to capture unexpected interactions between IVs While the effect of some interactions on crop yield, such as those between irrigation and precipitation are well understood, other obscure yet critical interactions between IVs may get ignored in the absence of a priori knowledge. For example, Zaveri and Lobell (2019) found that irrigation reduced crop sensitivity to extreme heat as well, an interaction less commonly used in models. Similarly, Anderson et al (2015) report a difference in yield response to temperature under high versus low soil moisture conditions. We created synthetic data for two hypotheticals: • Data SD.3A: low irrigated crop area (5%-10%) • Data SD.3B: high irrigated crop area (90%-95%) Crop yield was calculated by including an interaction between irrigation and temperature (such that irrigated areas exhibited lower yield sensitivity to temperature variability). We then built LR and BRT models with this data. Because we wanted to examine the effect of unknown interactions, we included temperature and irrigation as IVs without interaction in the LR formula (BRT does not need any formula specification).  Following the approach in Lobell et al (2011), we used scenario analysis to estimate the impacts climate change has had on India's crop yields over the study's reference period from 1966 to 2011. We linearly detrended all climate variables to create a counterfactual 'no climate change' scenario that retained annual variability. For the last decade in the dataset (2002-2011), we calculated the difference between yield predictions for the counterfactual and the observed climate data. This calculation was repeated using residual bootstrapping (n = 500) to quantify the impact climate change has already had on crop yields, with 1966 as the baseline 11 . Predictions of climate change impacts from all four modeling procedures (lr_mono, lr_quad, lr_sgm, and brt) for the two nonnull variable sets (Tavg_Psum, and Tavg_Psumday) were then compared for each crop.

Model accuracy
The BRT model outperforms all three LR models by a substantial margin in terms of yield prediction accuracy (figure SI.1), while the accuracy of the three LR models is in a similar range. For all crops, even the best performing LR models have higher RMSE than the corresponding noclim BRT model. This is because a BRT can fit complex and flexible relationships between yield and geography/time in the noclim model, while the LR is restricted by the functional form selected by the modeler.
Since we seek to infer the role of climate variability in determining crop yields, the contribution of climate variables in improving models, over and above geography and time, is of critical importance. We thus evaluated model performance in terms of percent reduction in RMSE compared to the corresponding baseline model type with no climate variables (noclim model) (figure 1). Adding climate variables significantly improved model performance for rice and pearl millet, but not for wheat (both LR and BRT). This pattern was consistent for all four model types. However, the absence of change in RMSE does not necessarily mean that climatic variability does not affect wheat yield; Sidhu et al (2022) showed that there may be compensatory effects between climatic and non-climatic variables. Wheat may also be exhibiting resilience to climatic variability as it is the most heavily irrigated out of the three crops analyzed in this study 12 . Zaveri and Lobell (2019) report that irrigated wheat in India exhibits only a quarter of the heat sensitivity of rainfed wheat. Wheat cultivation in India has also benefitted from other factors linked to India's Green Revolution such as improved varieties (Baranski 2015), energy subsidies for irrigation (Sidhu et al 2020), and guaranteed procurement (Chatterjee and Kapur 2017) which encouraged mechanization and fertilizer application, potentially weakening the link between climatic variability and wheat yield.
For rice and pearl millet, performance improvement with the addition of climate variables was highest for brt (figure 1). In other words, not only did BRTs outperform the linear models in predicting absolute yields for these two crops (figure SI.1), but also exhibit improved yield prediction accuracy with the addition of climate variables. This suggests that the BRTs establish a better relationship between climate variables and yields in a way that the LR algorithms are unable to.
The inclusion of precipitation days did not significantly improve model performance in any case except lr_mono for pearl millet (figure 1). Nonetheless, number of precipitation days could still be an important variable for our models; Sidhu et al (2022) showed an increase in 'relative importance' (Grömping 2006(Grömping , 2015 of climate when number of precipitation days is included. We therefore decided to include both Tavg_Psum and Tavg_Psumday models in our study.

PDPs
Partial dependence of pearl millet yield on time is consistent across all four model types (figure 2) 13 ; a linear approximation seems valid for representing the increase in pearl millet yield over time due to factors such as advancement in agricultural practices, better technology, mechanization, availability of improved cultivars among others (Lobell and Burke 2009).
However, the response of pearl millet yields to average seasonal temperature (middle) and total seasonal precipitation (right) is non-linear, a functional relationship that has been reported in previous studies using statistical and process-based models (Wang et al 2017, Agnolucci et al 2020). lr_quad, lr_sgm, and brt predict that pearl millet yield increases with increasing precipitation, reaches a maxima, and then declines with any further increase in precipitation, similar to the non-linear relationship between yield and precipitation reported in literature (Lobell and Asseng 2017). lr_mono, meanwhile, predicts continuously increasing yield with precipitation increases, an obvious artifact of this model type.
For rice, there is agreement between the partial dependence estimates for precipitation on yield between lr_sgm and brt: yield initially rises with increasing precipitation, but flattens beyond 1200-1300 mm (figure SI.2); lr_mono's linear fit deviates a bit, but the difference is not as stark as pearl millet. Wheat models, irrespective of the model type, estimate no significant impact of precipitation on yield (figure SI.3), denoting wheat's resilience to climatic variability due to reasons discussed earlier.
Similar differences between models are observed in the yield response to temperature. As expected, lr_mono estimates a monotonically decreasing pearl millet yield response to average seasonal temperature. brt, however, predicts little impact of temperature until around 28 degrees Celsius, beyond which there is a rapid decline in yield with rise in temperature. lr_sgm and lr_quad models also predict similar non-linear yield responses to temperature, although less sharp than the brt.
For both rice and wheat, all three LR specifications predict a strong and significant negative influence of temperature on yield, but the brt predicts a smaller effect (figures SI.2 and SI.3). There are temperature ranges where some LR and ML estimates Figure 3. PDPs for LR (red) and BRT (cyan) models fitted on synthetic data using user-defined coefficient for temperature. The LR is able to accurately model the relationship shown using black dotted line (expected because the underlying data creation used a linear function), but the BRT sensitivity to temperature change goes down (the blue plot becomes flatter) as temperature gets more correlated with time (top to bottom, right column). concur (all four have similar slopes in the high temperature range for rice; lr_sgm and brt estimate no significant impact of temperature change in the less than 22.5 degrees Celsius region for wheat). However, in general, brt predicts a lower sensitivity (evident from shallower slope of the PDP) to temperature compared to the three LR models, and this difference is most stark for rice. In SI.7, we further demonstrate the BRTs' ability to capture and represent non-linear relationships in a more automated fashion.
If the interaction of the IVs is expected to strongly influence the DV, PDPs with multiple IVs varying simultaneously would be significantly different compared to one-way PDPs like those depicted in figure 2. We present this analysis in SI.3; the important inferences stay the same.

Lessons from synthetic data 3.3.1. Experiment one: conflation between time and climatic IVs
The LR model is able to fit the expected functions for all three climate change scenarios (unchanging climate, linearly increasing temperature, quadratically changing temperature) (figure 3). This is unsurprising since yield was explicitly modeled as a linear function of temperature, and the LR is simply re-discovering that linear relationship. Conversely, as we move from a time-independent to linearly and then quadratically varying temperature data (or, when temperature contains a stronger time trend), the BRT model predicts a progressively lower sensitivity of crop yields to temperature, even though yield data was created from temperature using the same coefficient in all three cases. In this case, the increased flexibility of BRT becomes a weakness, as it is prone to conflating time and temperature when there is correlation between them. Additional tests with synthetic data are available in SI.4.
Based on this experiment, we can now better interpret results in section 3.2 where BRTs predicted a lower yield sensitivity to mean seasonal temperature. It is possible that the brt model in our analysis with actual data used the time variable to account for some of the yield decrease resulting from changing climate (i.e. conflating the climate signal with the time trend). Thus, the resulting yield predictions in later years by brt are lower compared to the other three LR models. Indeed, for rice and wheat, the year PDPs for brt (figures SI.2 and SI.3) show the sharpest reduction in yield increase over time, in comparison to the three LR models. This is a potential drawback of ML methods that fit non-parametric models without any functional input or a priori information from the user.

Experiment two: spatial flexibility of non-parametric models
Both LR and BRT are able to elicit the correct relationship for data SD.2A with spatially similar relationship between yield and temperature (figure 4; top row). However, for data SD.2B, where yieldtemperature relationship varied across districts, the BRT model outperforms LR by a big margin (figure 4; bottom row), because the LR model was constrained to a single coefficient for yield versus temperature in all districts. Thus, BRT models are better at capturing spatial heterogeneity in the response to climate. While LR performance can be improved with districtspecific coefficients for climate variables which have region-dependent relationship with the DV, that needs identification of those climate variables and manual tweaking of the functional form which may not always be possible with multiple climate variables. Also, fitting interactions between district dummies and climate variables is an all-or-nothing proposition, since it can either be initialized with a single parameter for all districts, or individual parameter for each district. Highly parameterized LR models are prone to overfitting and can have detrimental effects on statistical power (for example, by lowering the model's degrees of freedom); we provide an example of this in SI.8, where the number of parameters (p) approach the number of observations (n). BRT, on the other hand, was able to elicit the relationship without any extra input from the user even when p approaches n (SI.8). This experiment demonstrates the utility of fitting a more flexible ML technique compared to a more constrained LR.

Experiment three: accounting for unknown interactions
Since we did not explicitly include an interaction between temperature and irrigation, the LR model fit the same slope to yield-temperature curve for low as well as high irrigation districts (middle panel in figure 5). The BRT, in contrast, was able to deduce the interactive relationship between irrigation and temperature and mimicked the true relationship with high accuracy by fitting a functional form that changed depending on irrigation availability (right panel in figure 5). This may seem obvious for irrigation where interactions are expected, but it is important: as new data and climate variables become available, their interactions may not be clearly understood, so non-parametric methods like BRTs that do not need an a priori functional form specified might be useful in uncovering those interactions and doing so in different settings (see experiment two).
To summarize the lessons from tests using synthetic data, BRTs have the potential to elicit the true relationship between the DV and IVs without any a priori input from the user, especially when the relationship varies spatially (section 3.3.2) or there are unanticipated interactions between IVs (section 3.3.3). However, the extra flexibility also means that BRTs may conflate signals from unrelated IVs such as time/geography and climate (section 3.3.1). LR, in contrast, needs manual specification of expected functional form, which can get unwieldy with multiple IVs. LR is also prone to Figure 5. Crop yield as a function of seasonal temperature, as predicted by LR (middle) and BRT (right) models compared to the true relationship (left). The synthetic data contained two hypothetical regions with highly varied irrigation availability: low(red) and high (blue). The BRT is able to capture the interaction between irrigation and temperature encoded in the synthetic data very well, as opposed to LR which ignored the interactive impacts because they were not explicitly specified while building the model. The scatter points depict the underlying synthetic data, and the solid lines show the modeled (left) or predicted (middle and right) functional form for each district in the low and high irrigation states. misspecification in the absence of complete domain knowledge but given close user control, can be less prone to confounding which is harder with BRT.

Climate change impacts over reference period (1966-2011)
The effects of climate change from 1966 to 2011, as predicted by the four model types, vary across the three crops analyzed. For pearl millet (figure 6), lr_mono predicts a larger negative impact of climate change compared to the other three (lr_quad, lr_sgm, brt) models in the comparatively cooler regions of the country (central and western districts). This is commensurate with the PDPs (figure 2) where lr_mono has the biggest effect size for temperature in the lower temperature range. But in the northwest, the warmest region of India that includes Rajasthan (India's biggest pearl millet producer state), lr_quad and lr_sgm estimate a bigger influence of climate change than either lr_mono or brt. This trend also matches the PDP patterns seen earlier (figure 2). Similar to LR, BRT predicts the north-western region to be the hotspot of yield losses, although the magnitude of the predicted effect of historical climate change on pearl millet yield is smaller than the LR models.
For rice, the biggest contrast is between the LR and BRT predictions: the latter shows little to no impact of historical climate change across most parts of India (figure SI.8). Climate change impact estimates are similar for all three LR models: rice yield in most regions, except parts of south-east India, has been negatively affected by climate change. Climate impact hotspots are observed in all six LR simulations, the biggest of which is in the central Indian state of Madhya Pradesh (an important rice producer). The hottest districts in the south show a smaller effect of climate change in lr_quad and lr_sgm compared to the predictions made by lr_mono. This matches the shallower slopes observed for lr_quad and lr_sgm (compared to lr_mono) on the extreme right side of rice PDPs (figure SI.2). However, these differences are not as stark as with pearl millet.
Wheat (figure SI.9) shows similar patterns as rice: all three LR models exhibit similar effects of climate change. While lr_mono predicts a more consistent impact throughout the country, more variation is observed in lr_quad and lr_sgm models. There are some districts in the southern state of Karnataka that are estimated to have experienced larger yield losses due to climate change in the T_avg_Psumday:lr_sgm panel compared to the corresponding predictions in the T_avg_Psumday:lr_mono panel. Regardless, the trends are similar across the three LR methods, and the biggest difference is observed between LR and BRT panels. As with rice, brt predicts very little wheat yield loss due to climate change. This could be due to signal confounding, as illustrated in section 3.3.1. Also, wheat and rice were big beneficiaries of India's Green Revolution, which may have played a role in increasing these crops' resilience to climatic variability.
The key takeaway from this analysis concerns the range of predictions depending on the statistical technique used. The BRT predictions of historical climate effect on yields of all three crops is considerably smaller than the LR models. As the BRTs showed better accuracy compared to the LR models (in terms of RMSE reduction, figure 1), one could argue that the BRT results of smaller climate effects should be trusted. But the lower yield sensitivity to climatic variability as predicted by BRTs could be due to the conflation of time with climate variables in BRTs (section 3.3.1). It is also possible that spatial flexibility of BRTs explains some of the difference between LR and BRT predictions (we present evidence of this in section 3.3.2 and SI.6), but potential variable conflation by BRTs calls for caution in their use for investigating climate impacts on yield.

Conclusion
ML techniques like BRTs offer promising tools for use in climate-crop models. They can be more accurate in predicting yields than LR, especially as they allow the use of flexible functions in cases where the relationship between yields and climate variables varies across regions. When used as a diagnostic tool, they can uncover interactions between IVs that may otherwise be missed if users employing LR models do not anticipate them. They can also help identify breakpoints in IV-DV relationships for use in fitting the segmented LR models. Many benefits that BRTs provide automatically, can also be derived using LR, but this needs manual specification of the model, and requires a priori knowledge and assumptions about possible functional forms. Even if such specifications are possible, they may add unnecessary complexity to LR, and reduce statistical power or cause parameter instability when multiple climate variables are being analyzed simultaneously.
Conversely, BRTs can be hard to interpret, and users have less control on the functional form of the model. We showed through an example with synthetic data that BRTs can conflate two correlated IVs and lead to potentially erroneous interpretations. This conflation may have led BRTs to underestimate the climate change impacts on crop yields in our study. Thus, any novel learnt relationships need to be carefully analyzed and compared either with more traditional approaches or using domain knowledge to ensure that the estimated relationships are robust.
The stark differences in climate change impact estimates based on model choice requires resolution through future research that could assess other flexible methods including heavily penalized LR, such as elastic nets, or multivariate adaptive regression splines (Friedman 1991). The latter have previously been used in crop yield analyses (Lobell et al 2014), and contrasting them and other methods to BRTs would be a logical follow-up to our study. More generally, studies analyzing climate-yield relationships often choose one kind of modeling approach. Care must be taken in interpreting single-model analyses, especially in regions with varied climate and agricultural practices. This study shows that using a combination of models on the same datasets and utilizing their complementary strengths, while being clear about their weaknesses, will help advance work in this field. Previous studies have also shown that while mean seasonal weather conditions like average temperature and total precipitation are critical, crop yields are also sensitive to within-season climate variability like short-term extreme heat, excessive rain, and/or dry spells, especially during crucial crop growth stages (Ortiz-Bobea et al 2019). Our models with seasonal temperature and precipitation do not account for such sub-seasonal climatic variations; examining various algorithms using subseasonal climatic variables would improve our understanding of the impact of climatic variability on crop yields.

Data availability statement
All data used in this study is available open-source.