Combining historical agricultural and climate datasets sheds new light on early 20th century barley performance

Barley (Hordeum vulgare ssp. vulgare) is cultivated globally across a wide range of environments, both in highly productive agricultural systems and in subsistence agriculture, and provides valuable feedstock for the animal feed and malting industries. However, as the climate changes, there is an urgent need to identify adapted spring barley varieties that will consistently yield highly under increased environmental stresses. In this research we combined recently released historical weather data with published early 20th century Irish spring barley trials data for two heritage varieties, Archer and Goldthorpe, following an analysis first published by Student in 1923. Using linear mixed models, we show that interannual variation in observed spring barley yields can be partially explained by recorded weather variability. We find that whilst Archer generally yields more, Goldthorpe is more stable under wetter growing conditions, highlighting the importance of considering growing climate in variety selection. Furthermore, this study demonstrates the benefits of access to historical trials and climatic data and the importance of incorporating climate data in modern day breeding programmes to improve climate resilience of future varieties.


Spring barley (Hordeum vulgare ssp. vulgare) is the most widespread spring crop in Ireland and approximately 120,000 ha are sown each year (TEAGASC 2020). It has been grown in Ireland since the 1800s and is well suited to the Irish soils and long growing season, which offer high yield potential (TEAGASC 2017). As the climate changes and extreme weather events become more frequent, identification of spring barley varieties that prosper and consistently produce high yields is a priority.

[...] strength (Hunter 1913). Goldthorpe is a 2-row wide-eared barley known for its high malting quality. In 1889 a single wide ear was found in a field of Chevalier near Goldthorpe, Yorkshire and was selected and propagated to become Goldthorpe (Gothard et al. 1983; Malcolm 1983; Reid et al. 1929).

Analysis of these trials data by William Gosset in Student (1923) concluded that the chief difficulty in comparing variety performance was that differences between varieties are small compared with [...]

In addition to station data, ECMWF's twentieth century reanalysis dataset (ERA-20C) (Poli et al. [...]

Growing season average temperature (Figure 1c), total rainfall (Figure 1d) and total photosynthetically active radiation (PAR) (Figure 1e) for 1901-1930 confirm that the driest and sunniest region is the south-east of Ireland.

Growing degree days (GDD) were calculated using daily temperature data for growing season months March to August and the following equation: [...] where $T_{min}$ is daily minimum temperature and $T_{max}$ is daily maximum temperature.

The spring barley variety trials are located across different sites, creating a clustered dataset where trial yields are not independent and not all farms were used each year. At a given site, the yields are all dependent on the same environmental factors such as rainfall and soil type, as well as the same farmer and agronomy. Therefore, the following linear mixed-effect model was used so that both farm and year could be included as random effects, using REML through lmer from lme4 (Bates et al. 2020) in R:

$$y_{ijk} = \mu + T_{jk} + P_{jk} + v_i + a_j + (vT)_{ijk} + (vP)_{ijk} + (af)_{jk} + e_{ijk}$$

where $y_{ijk}$ is the yield of variety $i$ in year $j$ at farm $k$, $\mu$ is the overall trial series mean, $T_{jk}$ is the effect of monthly temperature in year $j$ at farm $k$, $P_{jk}$ is the effect of monthly precipitation in year $j$ at farm $k$, $v_i$ is the effect of variety $i$, $a_j$ is the effect of year $j$, with year included as a factor variable, $(vT)_{ijk}$ is the interaction between variety $i$ and monthly temperature in year $j$ at farm $k$, $(vP)_{ijk}$ is the interaction between variety $i$ and monthly precipitation in year $j$ at farm $k$, and $e_{ijk}$ is the residual term.
$(af)_{jk}$ is the effect of site within years, representing the interaction between year term $a_j$ and farm term $f_k$. This term means each farm is treated as different each year, which is a more accurate representation given the exact location of fields is unknown and may have varied.
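As an illustration of the growing degree day calculation described earlier in this section, the following is a minimal Python sketch rather than the calculation actually used in the study. It assumes the common averaging form, in which the daily mean of minimum and maximum temperature is accumulated above a base temperature. The function name `growing_degree_days`, the `t_base` parameter, its default value and the truncation at zero are all assumptions made for illustration, and the three-day record is invented.

```python
from typing import Sequence


def growing_degree_days(
    t_min: Sequence[float],
    t_max: Sequence[float],
    t_base: float = 0.0,
) -> float:
    """Accumulate daily mean temperature above t_base over a season.

    t_base and the truncation at zero are assumptions of this sketch.
    """
    if len(t_min) != len(t_max):
        raise ValueError("t_min and t_max must have the same number of days")
    total = 0.0
    for lo, hi in zip(t_min, t_max):
        daily_mean = (lo + hi) / 2.0
        # Days with a mean at or below the base contribute nothing (assumption).
        total += max(0.0, daily_mean - t_base)
    return total


# Hypothetical three-day record in degrees Celsius, not data from the trials.
example_t_min = [2.0, 4.0, 6.0]
example_t_max = [10.0, 12.0, 14.0]
example_gdd = growing_degree_days(example_t_min, example_t_max, t_base=5.0)
print(example_gdd)
```

In practice the two sequences would hold one value per day from March to August at a given station, so that one GDD total is obtained per station and growing season.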

To reduce the dimensionality of the data and identify the most significant monthly temperature and precipitation variables in determining yield to include in [2], best subset selection, forwards and backwards stepwise selection, the lasso (Tibshirani 1996) and elastic net (Zou & Hastie 2005) were used on the linear model run using lm in R: [...]

These were implemented in R using the functions and arguments detailed in Table S1. Significant variables (p < 0.05) in each of the selected models were identified using an analysis of variance (ANOVA). For each method, the RMSE and adjusted R² were calculated for the selected model.

[...] Ireland consistently receives high rainfall, water deficits are unlikely to be a yield-limiting factor.

Over the 1874-2020 period, significant long-term increases in growing degree days of 0.76 °C yr⁻¹ (r = 0.26, p = 0.003) and 2.3 °C yr⁻¹ (r = 0.66, p = 3e-19) have been seen at Birr Castle and Glasnevin respectively (Figure 6). The more extreme increase in GDD seen at Glasnevin (Dublin) is likely due to increased urbanisation and industrialisation in the city (Dublin City Council 2017), which decrease the city's albedo and increase absorption of solar radiation and local temperatures. In addition to being the wettest of the 6 years, the 1903 growing season has the 11th lowest GDD recorded at both Birr Castle and [...] for 1901, 1902, 1903, 1904, 1905 and 1906. The 1891-1920 [...]

Roches Point's coastal location is evident from the less extreme temperature values, with higher mean minimum temperatures and lower mean maximum temperatures (Figures 8 and 9).

[...] 1901, 1902, 1903, 1904, 1905 and 1906 (Table S2). A combination of using too many highly correlated variables (Figure S1) and too few farm growing seasons likely contributed to this. The worst performing method was backwards stepwise selection, which did not drop any variables. The two lasso methods reduced the model complexity significantly, from 19 to fewer than 7 climate variables, but these models had very low adjusted R² values of close to 0, indicating a poor model fit (Table S3). Using the mixed-model backwards elimination approach, we found that all the climate variables were dropped. In all methods except this one, July maximum temperature was kept in the model.

Using PCA we found that the first 6 principal components (PCs) explained ~90% of the observed variation in yield (Table S4). These were then input into equation [3], replacing the climate variables.

Pearson's correlation analysis
In the yield-climate correlation analysis, we found that July rainfall and July maximum temperature have the largest absolute correlations with yield: -0.49 and 0.45, respectively (Figure S2). These two variables are also strongly negatively correlated with each other.
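For readers unfamiliar with the statistic, the correlation coefficients quoted above are Pearson's r. The following is a minimal Python sketch of how r is computed for a pair of variables. The function name `pearson_r` and the four toy observations are hypothetical and are not taken from the trials or weather datasets.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations about the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0.0 or s_yy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical July rainfall totals (mm) and yields (t/ha), for illustration only.
toy_july_rain = [40.0, 60.0, 80.0, 100.0]
toy_yield = [3.0, 2.8, 2.5, 2.4]
toy_r = pearson_r(toy_july_rain, toy_yield)
print(round(toy_r, 2))
```

Applying the same function to every pairing of yield with a monthly climate variable would give a correlation table of the kind summarised in Figure S2.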

To understand if adding temperature or rainfall climate variables to the mixed model [3] improves the fit, the AIC of the mixed model of [3] without any climate covariates was first calculated (Table S5). We then added each climate variable and its interaction with variety to the model one at a time. As none of the interactions with variety were significant, the variety x climate interaction term was dropped from the model and the models with each climate variable were looped through again.

Only three variables (July maximum temperature, August maximum temperature and July total rainfall) were significant when included in the model. The models which included either July maximum temperature or August maximum temperature improved the AIC and model fit. Notably, all the models that contained temperature had a lower AIC and better fit than any of the rainfall models, including the significant July rainfall model (Table S5).

Both July mean maximum temperature and August mean maximum temperature had a positive relationship with yield (Table S5), such that yield increased by ~1/4 t/ha per 1 °C increase in July mean maximum temperature and by ~1/5 t/ha per 1 °C increase in August maximum temperature.
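The one-variable-at-a-time comparison described above can be sketched in Python as follows. This is an illustration only: the study fitted its models with lmer in R, where the standard `AIC()` function can be applied to the fitted models. The sketch uses the standard definition AIC = 2k - 2 ln L. The log-likelihoods, parameter counts and variable names below are hypothetical placeholders, not values from Table S5.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical baseline model without climate covariates.
baseline_aic = aic(log_likelihood=-50.0, n_params=4)

# Hypothetical candidates: name -> (log-likelihood, number of parameters).
candidates: Dict[str, Tuple[float, int]] = {
    "july_tmax": (-46.0, 5),
    "july_rain": (-49.5, 5),
}

# Keep only the candidates whose AIC is lower than the baseline.
improved = {
    name: aic(log_lik, k)
    for name, (log_lik, k) in candidates.items()
    if aic(log_lik, k) < baseline_aic
}
print(baseline_aic, improved)
```

The extra parameter in each candidate model is penalised, so a covariate only lowers the AIC when it raises the log-likelihood by more than one unit.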

Comparison of standard error of difference between means
The mean difference in the variety values is £1.52/ha (12 shillings/acre) with a standard deviation of £2.95/ha (23.9 shillings/acre) and corresponding standard error of the mean difference £0.41/ha (3.3 shillings/acre), in accordance with Student (1923). This corresponds to a t-statistic of 3.680, which was statistically significant (p = 0.0006) at the 95% confidence level (df = 50). This provided strong evidence that there was a difference in varietal performance.

Calculating the standard error of difference between variety values in the three mixed models containing significant climate effects (Table S5) [...] with low GDDs (Figure 6). This coincides with the year of lowest mean yields and greatest yield variability for Archer, but much lower variability for Goldthorpe (SD = 0.22 t/ha) (Figure 2).

A more recent experiment detailed by Gothard et al. (1983) found that Goldthorpe outperformed Archer when spring and summer rainfall was high. Along with our results, this suggests Goldthorpe may be able to withstand much higher soil moisture and waterlogging. Hunter (1929) notes that Goldthorpe requires plenty of moisture to produce the best yields and quality, supporting this theory.

In contrast, the 1905 growing season was warmer than average (Figure 12) with high growing season GDDs (Figure 6). There was low growing season PAR (Figure 11), but high PAR in July, when high solar radiation is important for grain fill. The growing season was drier than average, starting wet but drying in June and July. These favourable conditions likely contributed to the relatively high yields seen in 1905 for both varieties.

Of the farms with 6 years of trials data, Farmer Wolfe performed the best on average (Figure 3). This farm was located ~30 km south-west of Birr Castle and experienced higher summer temperatures and less summer rainfall than the other two farms. Other factors such as favourable agronomy, farm management and soil type may also have encouraged higher yields here.
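The paired comparison of variety values reported in the Results (mean difference £1.52/ha, standard deviation £2.95/ha, df = 50) can be checked with a few lines of arithmetic. The Python sketch below assumes a paired t-test, so that df = 50 corresponds to 51 paired differences; that count is inferred from the reported degrees of freedom rather than stated in the text, and the function names are hypothetical.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean of n differences with standard deviation sd."""
    return sd / math.sqrt(n)


def t_statistic(mean_diff: float, sd: float, n: int) -> float:
    """One-sample (paired) t-statistic for a mean difference against zero."""
    return mean_diff / standard_error(sd, n)


n_pairs = 51       # assumption: df = 50 implies 51 paired differences
mean_diff = 1.52   # mean difference in variety value, pounds per hectare
sd_diff = 2.95     # standard deviation of the differences, pounds per hectare

se = standard_error(sd_diff, n_pairs)
t = t_statistic(mean_diff, sd_diff, n_pairs)
print(round(se, 2), round(t, 2))
```

Under these assumptions the rounded standard error (0.41) and t-statistic (3.68) agree with the reported £0.41/ha and 3.680.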

Through trialling various variable selection methods, we have highlighted the importance of identifying collinearity early on in analysis involving multiple covariates. The use of these methods and PCA was limited by the high correlation between covariates within a small dataset, but it was still possible to extract information on the most important variables using simple mixed models. We were able to show that July maximum temperature and August maximum temperature had a positive relationship with yield and that July total rainfall had a negative relationship with yield (Table S5). July rainfall can also be used as a proxy for solar radiation, so a wet July would usually be associated with more cloud cover, reducing solar radiation interception during grain fill. Likewise, wet weather during grain filling can encourage ear and grain diseases, such as fusarium ear blight and ergot, which can cause shrivelled grain and mycotoxins (AHDB 2018). Hence the plant benefits from more solar radiation and less rainfall in July. Higher July maximum temperature implies less daytime cloud cover intercepting solar radiation, hence the correlations of these two July variables with yield have opposite signs. In future analysis of more recent crop yield data, inclusion of solar radiation data in the models would be desirable to directly quantify the relationship between solar radiation and yield.

Our finding that July temperatures are positively correlated with spring barley yield contrasts with other published research which shows that warmer temperatures during anthesis and grain fill can have a detrimental effect (Addy et al. 2021; Hakala et al. 2020). This result is most likely due to July maximum temperatures in Ireland in the early 20th century falling well short of those more regularly seen today in some major UK spring barley growing areas.
Specifically, maximum temperature did not exceed 28 °C during the 6-year trials period whereas those in South-East England now regularly exceed 30 °C in summer months. This finding shows the importance of region-specific crop-climate research: despite the proximity of the UK to Ireland, their climates differ and the same relationships between weather variables and yield cannot be assumed.

We were unable to detect any GxE within the mixed models used. The lack of significance of climate x variety interactions throughout may well be related to the relatively small trials dataset, approximation of site locations and sometimes large distances to weather stations. However, it is clear from the [...] UKRI 2016). How these varieties perform in the current and future climate is of interest given the performance of these varieties in the 1901-1906 trials. It is hoped that Archer and Goldthorpe will be trialled on large scale field plots to allow for comparisons with the yields from 1901-1906, but also to test the models in the current climate on larger datasets.

Student states that the advantage for the farmer of large scale trials is that s/he "…always has a healthy contempt for gardening" and … "some varieties which have come out well on the small scale have not done as well in the field, though this is not at all common". That said, two-acre plots (0.4 ha) are very large, as Student recognises, even for large-scale plots, though the produce was also intended to provide seed for subsequent manufacturing tests, presumably including malting, though we have no record that they took place.

The importance of collaboration is also commented on, here between farmers in carrying out large scale experiments: "… it is only by co-operation [between farmers] that enough evidence can be obtained to be of any value", though he sees such co-operation as being most likely co-ordinated by government bodies.
It is unfortunate that, to date, this has not happened.

Another laudable feature of Student's paper is that he made the data available. Admittedly this was largely to illustrate the method of analysis, but full data release is still not the norm. Subsequently, the data were reanalysed by Patterson (1997), also for educational purposes. We do not know whether Student had soil and weather records available to him (he was analysing the yield data when it was already twenty years old) or whether he would have felt it advantageous to include them. In fact, we find near identical results to Student: Archer yields more than Goldthorpe. In the absence of any detectable variety x climate variable interactions (as here), this is expected. The climatic variables which are available to us have, however, been used to identify drivers of yield differences between sites and years in a dataset ~120 years old.

Through combining recently published historical rainfall and temperature data with spring barley trials data, it has been possible to identify climatic influences on spring barley yield variability seen in [...]

Although this small trials dataset has been available for ~100 years, we have demonstrated that there is value in adding historical climate data to it. Today's large-scale trial datasets provide a great opportunity for further insight on crop-climate interactions in a changing climate.