Catching-up with genetic progress: Simulation of potential production for modern wheat cultivars in the Netherlands

2013 – 2015 data set. Parameter values of both calibrations were compared. Sensitivity analysis was used to assess to what extent climate change, elevated CO 2 , changes in sowing dates, and changes in cultivar traits could explain yield increases. Results: The estimated reference light use efficiency and the temperature sum from anthesis to maturity were higher in 2013 – 2015 than in 1982 – 1984. PCSE-LINTUL3, calibrated on the 1982 – 1984 data set, underestimated the yield potential of 2013 – 2015. Sensitivity analyses showed that about half of the simulated winter wheat yield increase between 1984 and 2015 in the Netherlands was explained by elevated CO 2 and climate change. The remaining part was explained by the increased temperature sum from anthesis to maturity and, to a smaller extent, by changes in the reference light use efficiency. Changes in sowing dates, biomass partitioning fractions, thermal requirements for anthesis, and biomass reallocation did not explain the yield increase. Conclusion: Recalibration of PCSE-LINTUL3 was necessary to reproduce the high wheat yields currently obtained in the Netherlands. About half of the reported winter wheat yield increase was attributed to climate change and elevated CO 2 . The remaining part of the increase was attributed to changes in the temperature sum from anthesis to maturity and, to a lesser extent, the reference light use efficiency. Significance: This study systematically addressed to what extent changes in various cultivar traits, climate change, and elevated CO 2 can explain the winter wheat yield increase observed in the Netherlands between 1984 and 2015.


Introduction
Crop growth and yield are the result of interactions between crop genetic factors, environmental conditions, and crop management (often referred to as G x E x M; Hatfield and Walthall, 2015). Crop growth models simulate the development and growth of crops in a dynamic way and thereby take these factors, and their interactions, into account (Boote et al., 1996;Wallach et al., 2019). This makes crop modelling a useful tool for a wide-range of applications (van Ittersum et al., 2003) including yield-gap analysis (Schils et al., 2018), optimisation of crop management practices (McNunn et al., 2019), yield forecasting (Paudel et al., 2021), decision-support Divya et al., 2021), and climate change impact assessments on food security (Semenov et al., 2014). Yet, crop models can only be relevant for real-world applications if they are thoroughly calibrated and evaluated within research cycles linking simulation and experimentation (Silva and Giller, 2020). The latter encompasses crop models being able to reproduce the potential yield of a crop (i.e., the maximum yield that can be achieved by a crop genotype) in a well-defined biophysical environment (van Ittersum and Rabbinge, 1997).
Winter wheat was the third most cultivated crop in the Netherlands during the growing season of 2021, occupying 16% of the cultivated area in the country (CBS, 2021). There has been considerable genetic progress for wheat in the Netherlands (0.10 Mg ha -1 year -1 between 1978 and 2010; Rijk et al., 2013) with highest-yielding fields in the country reaching nearly 11 Mg fresh weight (FW) ha -1  in recent years. Such yield levels are considerable higher than the grain yield of 9.2 Mg FW ha -1 (assuming a 15% grain moisture content) obtained in experiments conducted under potential growth conditions in 1982-1984). Yet, the experiments documented by Groot and Verberne (1991) are still widely used by the crop modelling community. For instance, the data set was used to calibrate 14 different models at the workshop in which the data were first presented . Thereafter, the data set was used in various other crop modelling studies (Zhang et al., 2010;Yang et al., 2009;Li et al., 2009;Yin et al., 2001;van Delden et al., 2001;Asseng et al., 2000;Wang and Engel, 1998;Kleemola et al., 1998), including several recent ones (Palosuo et al., 2011;Ratjen and Kage, 2013;Olin et al., 2015;Kassie et al., 2016;Yang et al., 2017). Various wheat crop models have also been compared against these data in the Agricultural Model Intercomparison Project (AgMIP; (Ruane et al., 2016). The results of such exercises are likely inappropriate to inform decisions on wheat production in the Netherlands given the considerable genetic progress observed for this crop over the past decades (Rijk et al., 2013). Similar yield progress was documented for wheat crops in the UK (Mackay et al., 2011), France (Brisson et al., 2010), and Finland (Peltonen-Sainio et al., 2009).
Field experiments suitable for crop modelling are rare as the costs involved in field monitoring and data collection are high, explaining the heavy reliance of many recent studies on old data sets. Recently, however, a comprehensive data set has been collected for winter wheat growth and development in the Netherlands (Wiertsema, 2015) during the growing seasons 2013-2014 and 2014-2015. This recent data set provides opportunities to investigate which crop model parameters for winter wheat calibrated using the 1982-1984 data set of Groot and Verberne (1991) remain representative for modern winter wheat varieties and which ones changed over the last 35 years.
This study aimed: (1) to assess to what extent winter wheat traits, as captured directly or indirectly (through aggregation) in the form of crop growth model parameters, changed over the last 35 years in the Netherlands, (2) to identify which traits explained the observed increase in wheat yields over the same period, and (3) to quantify to what extent elevated CO 2 and climate change could explain the observed yield increase.

Material and methods
The PCSE LINTUL3 model is used for all simulations presented in this study. In summary, first the model was calibrated and evaluated against field experiments conducted in 1982-1984. Second, it was used to simulate the growth of winter wheat in field experiments conducted in [2013][2014][2015] to investigate if that model parametrization could simulate the growth of modern wheat varieties accurately. Third, PCSE LINTUL3 was recalibrated using the 2013-2015 data set and the parameter values obtained in this re-calibration were compared to the values obtained with model calibration using the 1982-1984 data set. Finally, simulations were conducted to examine for each recalibrated crop parameter whether its change explained the observed increase in wheat yields over time. In the next sections, these steps are described in more detail.

Description of data sets used for model calibration and evaluation
2. 1.1. 1982-1984 data set  Winter wheat cultivar Arminda was sown in three different sites across the Netherlands (De Bouwing, De Eest, and PAGV, which stands for "Proefstation voor de Akkerbouw en de Groententeelt in de Vollegrond"; see Table 1) under three different N fertilization treatments (Table A1). No irrigation was supplied. The release year of this cultivar was 1977 (Schot et al., 2000). In the present study, only data from the high fertilization rate treatment N3 was used. Destructive measurements were conducted in each site; 10 and 11 measurements were made during the growing season of 1982-1983 and 1983-1984, respectively. Measurements included accumulated aboveground dry matter and its partitioning over green leaves, dead leaves, stems, chaff, and grains. Crop phenology was recorded using the Zadoks decimal scale (Zadoks et al., 1974). (Wiertsema, 2015) The second data set was obtained during a field experiment conducted in the growing seasons 2013-2014 and 2014-2015 in Wageningen, the Netherlands (Table 1), under different N fertilization levels (Table A1). Again, only data from the high fertilization treatment (N3) were used in this study. During the 2013-2014 growing season, 15 mm of water was applied twice as irrigation on July 1 and July 4. No irrigation was supplied in the 2014-2015 growing season. Destructive measurements of aboveground dry matter, and its partitioning to the different organs, and leaf area index were taken at 10 and 9 different moments in 2013-2014 and 2014-2015, respectively. Crop phenology was recorded in the Feekes scale (Feekes, 1941). Three different cultivars (Julius, Tabasco, and Ritmo) were tested in 2013-2015. Julius and Tabasco were released in 2009 and 2008, respectively, whereas Ritmo is an older cultivar released in 1992.  (Groot, 1987). In the 2013-2014 growing season, minimum temperature, maximum temperature, and global radiation were measured daily with a weather station located in the field. In the 2014-2015 growing season, these data were obtained from the nearby Veenkampen weather station located 2 km north of the experimental site (51 o 58' N; 5 o 37' E). Daily weather data for the simulations for the sensitivity analysis for the period 1978-1988 were obtained from the Haarweg weather station and for the analysis in the period 2008-2018 from the Veenkampen weather station. These weather data contained a number of missing values in 2016. For these missing values, weather data from the nearby KNMI (Royal Dutch Meteorological Institute) weather station Deelen (52 o 3 N'; 5 o 53' E) were used instead.

Overview of LINTUL-3
The crop growth model LINTUL-3 was originally developed to simulate the growth of flooded rice in Asia (Shibu et al., 2010). The model also includes parameter values for spring wheat, to which we refer to as default parameter values in this manuscript, and was originally implemented in the Fortran Simulation Translator (FST) framework (Rappoldt and Van Kraalingen, 1996). LINTUL-3 has a relatively small number of parameters, several of which can be directly calculated from field observations. A new version of the model was recently released as part of the PCSE (Python Crop Simulation Environment, de Wit, 2023) framework, which is the version employed in this study. An overview of the state variables and parameters of this model, which is referred to as 'PCSE-LINTUL3' in this manuscript, is provided in Appendices A2 and A4. All simulations in this study assume that winter wheat was grown under potential growth conditions, without water and nutrient limitations, and without pests, diseases and weeds.

Adjustments of PCSE-LINTUL3 and its source code
The source code of PCSE-LINTUL3 with the adjustments described in this study can be found at https://github.com/ajwdewit/pcse/tree/ develop_lintul3/pcse.

Sensitivity of phenology to vernalization and photoperiodicity.
The original LINTUL-3 did not consider the sensitivity of the development stage to vernalization and photoperiodicity (Shibu et al., 2010).
Recently, the phenology model in the PCSE framework was extended with modules that calculate reduction factors of the development rate in case of suboptimal daylengths or in case the vernalization requirement has not been met yet (Ceglar et al., 2019;de Wit et al., 2020). All simulations done in this study considered the effects of daylength and vernalization on crop development.

2.2.2.2.
Reallocation of dry matter to grains. PCSE-LINTUL3 was further extended with routines simulating the biomass reallocation from the stems and chaff to the grains. For this purpose, PCSE-LINTUL3 was extended with a state variable for the dry matter that can be reallocated, WREALLOC(t) (g m -2 ), where t represents time. Before the development stage at which reallocation starts, DVS_REALLOC (-), is reached, there is no reallocation of dry matter to the grains and WREALLOC(t) = 0. At the day t realloc at which this development stage is reached, the initial amount of dry matter that can be reallocated is calculated as a fraction REALLOC_FRAC of the stem and chaff dry matter at that time: where WST(t) is the stem and chaff dry matter (g m -2 ) at time t realloc . From time t realloc until harvest, a fixed fraction REALLOC_RATE_REL (d -1 ) of this initial amount is reallocated. It is daily transferred from the stem and chaff to the grains until there is no dry matter left that can be reallocated. The rate of reallocation of dry matter from the stems and chaff, REALLOC_RATE(t), is calculated as: where Δt is the time step (d), which is always 1 d in PCSE-LINTUL3.

2.2.2.3.
Response of LUE to ambient CO 2 . PCSE-LINTUL3 was expanded with the possibility to adjust the light use efficiency (LUE) to rising CO 2 levels. To do so, the model calculates a factor f CO2 , which is the increase in LUE relative to a reference light use efficiency LUE ref at ambient CO 2 level of 350 ppm. The factor f CO2 and actual LUE were calculated as (O'Leary et al., 2015): where C a (t) is the ambient CO 2 level (ppm) at time t.

Model calibration and evaluation on the 1982-1984 data set
Data from the 1982-1983 growing season were used to estimate crop parameters and calibrate PCSE-LINTUL3. Data from the 1983-1984 growing season were used as independent data to evaluate model performance. Fig. 1. Function for biomass partitioning to the roots. An negative exponential growth function (line) was fitted to observed biomass partitioning fractions to the roots from various data sets (dots) compiled by van Keulen and Seligman (1987).  Overview of PCSE-LINTUL3 parameters that were either added or re-estimated for cultivars Arminda, Julius, Ritmo, and Tabasco. Re-estimated functions of biomass partitioning are displayed in Fig. 3 and Table A3.      and the saturated vernalization requirement (VERNSAT; d), daylength below, which there is no crop development (DLC; h) and for the daylength above which there is no further reduction of crop development (DLO; h) were adopted from Ceglar et al. (2019). We also assumed parameter TSUMAG ( o C d), the temperature above which leaf senescence starts, equals TSUM1 (van Oijen and Leffelaar, 2010). TSUM2 was calculated separately for each site in the calibration data set as the temperature sum between the dates of anthesis and maturity.

Biomass partitioning to roots and total dry matter.
The 1982-1984 data set did not contain data on root dry matter. Nevertheless, PCSE-LINTUL3 requires the partitioning of newly produced biomass to the roots as input. Therefore, the root dry matter was calculated from data compiled from various experiments of spring wheat and winter wheat (van Keulen and Seligman, 1987). The following equation was fitted to those data ( Fig. 1): where f rt (t) is the fraction of newly produced biomass that is partitioned to the roots at time t, f rt,0 is the biomass partitioning fraction to the roots at DVS = 0, and c is the relative growth rate of FRT. DVS(t) is the development stage at time t. The values of f rt,0 = 0.58 and c = − 2.58 were found by fitting Eq. 4 to the data from van Keulen and Seligman (1987) (r 2 = 0.53, Fig. 1). For each day between sowing and the harvest date in the calibration data set, the total aboveground dry matter was estimated by linear interpolation between the measurement points. From these values, the daily increase in total aboveground dry matter (ΔTAGB) was calculated.

This daily increase was multiplied with
( 1 − f rt (t) ) − 1 to estimate the daily increase in total dry matter, including root dry matter.

Determining the actual and reference light use efficiencies.
For each day between sowing and harvest date in the N3 treatment of the calibration data set, the leaf area index was estimated by linear interpolation between the field measurements. It was assumed for these calculations that the leaf area index was 0 at emergence date and at harvest date. For each estimated leaf area index, the daily amount of intercepted PAR (PARINT, MJ PAR d -1 ) was calculated following the Lambert-Beer law: where f PAR is the fraction of PAR in the global radiation (MJ PAR MJ -1 radiation; 0.5 MJ PAR MJ -1 radiation was assumed), DTR(t) is the daily global radiation (MJ radiation d -1 ) at time t, K is the extinction coefficient (m 2 ground m -2 leaf; 0.6 m 2 m -2 was assumed) and LAI(t) is the leaf area index (m 2 leaf m -2 ground) at time t. For each measurement date, the cumulative amount of intercepted PAR was calculated using Eq. 5. For each site in the calibration data set, the actual light use efficiency was estimated as the slope of a linear model, with the intercept forced at the orgin, fitted between the estimated total dry matter and the cumulative amount of intercepted PAR at each measurement day. Subsequently, the reference light use efficiency for each site was calculated. This was done by determining the CO 2 level of the harvest year from Fig. 6. Measured (dots) and simulated (lines) aboveground dry matter (a-c) and grain yield (d-f) for three different cultivars during two growing seasons in Wageningen. On this field site, culitvars Julius (a, c), Ritmo (b, e) and Tabasco (c, f) were grown. For each simulated cultivar, the crop parameters that were estimated for that cultivar were used in the simulations. NOAA (https://gml.noaa.gov/ccgg/trends/global.html#global), using this value to calculate f CO2 (Eq. 3), and calculate the reference light use efficiency (at 350 ppm) by dividing the actual light use efficiency by f CO2 .
2.3.1.5. Reallocation parameters. REALLOC_DVS was estimated for each site in the 1982-1983 data as the development stage at which the highest stem and chaff dry weight was measured. REALLOC_FRAC was estimated as the ratio of the measured stem and chaff dry matter at the date that REALLOC_DVS was reached to the measured stem and dry weight at harvest. REALLOC_RATE_REL was estimated as the inverse of the time difference between the harvest date and the date that development stage REALLOC_DVS was reached.
2.3.1.6. Biomass partitioning fractions. PCSE-LINTUL3 simulates the fraction of biomass that is partitioned to an organ as a so-called tabular function of the development stage. In tabular functions for biomass partitioning, fractions of biomass partitioning to an organ are given for specific development stages and the remaining biomass partitioning fractions are obtained by linear interpolation between the known values (Rappoldt and Van Kraalingen, 1989). If the elements of these table functions would be estimated directly through optimization, each element would represent a single parameter which makes the number of parameters to be estimated rather large. To avoid this, the biomass partitioning coefficients to the leaves were estimated using a biomass partitioning model (Berghuijs et al., 2020), which was extended to take into account that a part of the newly produced dry matter in PCSE-LINTUL-3 is assigned to the root. The fractions of newly produced dry matter partitioned to leaves (f lvg (t)), grains (f so (t)), and stems and chaffs (f st (t)) were calculated as: where f lvg,0 is the fraction of newly produced biomass that is assigned to the leaves as long as DVS(t) is below a threshold value d 1 . d 1 is the development stage above which the biomass partitioning to the leaves decreases. d 2 is the development stage above which there is no biomass partitioning to the leaves. d 3 is the development stage at which the biomass partitioning to the grains starts. d 4 is the development stage above which there is no more biomass partitioning to stems and chaff.
Only during this calibration step, the table functions for biomass partitioning to roots, leaves, stems and chaffs, and grains were replaced by the model in Eqs. 4 and 6-8 assuming that d 3 = 1 (i.e. the production of grains starts at anthesis). The other parameters were estimated in two steps for each site in the calibration data set. In the first step, parameters f lv0 , d 1 , and d 2 were estimated for each location separately by minimizing the sum of normalized root mean squared errors (ten Den et al., 2022) of leaf area index and leaf dry weight. These estimates were used as input for the second step in which d 4 was estimated by minimizing the sum of normalized root mean squared errors for leaf dry weight, stem and chaff dry weight, and grain dry weight.  [1982][1983], were used to simulate the growth of winter wheat at each location in the 1982-1983 data. The same parameter values were used to simulate the growth of winter wheat in 1983-1984 to determine the model's ability to reproduce observations that were not used for calibration. Simulated and measured values of total aboveground dry matter and grain yield were compared.

Statistical metrics for model evaluation
Two indicators of model performance were used to quantify the model's ability to reproduce the measured aboveground dry matter and yields. These were the root mean squared error (RMSE; g m -2 ) and the mean bias error (MBE; g m -2 ). These statistical indices were calculated as follows: where y i is the measured value of a variable y with index i, and ŷ i is the value of y i predicted by PCSE-LINTUL3. N is the total number of measurement days. The RMSE is a measure of the overall difference between measured and simulated values. The MBE is a measure of the extent of overestimation (if negative) or underestimation (if positive) of the simulated values compared to the measured values.

Model evaluation on the 2013-2015 data set, after calibration on the 1982-1984 dataset
The crop parameter estimates from the 1982-1984 data set were used as input to simulate the growth of each cultivar in the 2013-2015 data set. These simulations used sowing dates, harvest dates, CO 2 and weather data from 2013 to 2015 as input. Measurements and In all other scenarios, all parameters for cv Arminda except for parameters for biomass partitioning parameters (BP), light use efficiency (LUE), phenology (PH), reallocation (RL) or combinations of these parameter types, were used as input. The percentages indicate the increase of the average simulated yield relative to the baseline scenario. The abbreviation n.s. indicates that the average yield in a scenario is not significantly different from the average yield of the baseline scenario. Fig. 9. Average values of a) duration of the vegetative phase, and b) duration of the reproductive phase, c) total PAR, d) amount of intercepted PAR, and e) harvest index. The percentages indicate the increase of the average simulated yield relative to the baseline scenario (Arminda harvested in the period 1979-1988). The abbreviation n.s. indicates that the average yield in a scenario is not significantly different from the average yield of the baseline scenario. simulations of total aboveground dry matter and grain yield were compared. The performance of the model to reproduce these variables for the 2013-2015 data set was quantified using the RMSE (Eq. 9) and the MBE (Eq. 10).

Model calibration and evaluation on the 2013-2015 data set
Observations from the 2013-2014 growing season were used for further model calibration. Observations from the 2014-2015 growing season were used for model evaluation. For each cultivar in the 2013-2015 data set, the crop parameters and tabular functions that were estimated from the 1982-1984 data set were estimated with the data from the 2013-2014 growing season as well. The same procedure was followed as in the 1982-1984 data set, with the exception of the aggregation step. This latter step was not necessary, as the cultivars in the 2013-2015 data set were, unlike Arminda grown in 1982-1984, only grown at a single location ( Table 1). The parameters were estimated separately for each cultivar. The obtained estimates were used to simulate both the 2013-2014 data (calibration) and the 2014-2015 data (evaluation). Simulated and measured total aboveground dry matter and grain yields were compared and the RMSE and MBE were used to assess model performance.

Yield response to changes in climate, ambient CO 2 , sowing dates, and crop parameters
Differences between wheat yields in the Netherlands during the period at which the 1982-1984 data set was collected and the period that the 2013-2015 data set was collected might be explained by various factors. These factors can be subdivided into changes in cultivar traits between these periods and changes in other factors (climate change, increase of ambient CO 2 levels, sowing dates). The effects of noncultivar related factors and cultivar-related factors were investigated in two separate sets of sensitivity analyses as described below.

Sensitivity analyses set 1: Quantification of yield change due to noncultivar related factors
The potential yield was simulated over a period of 10 years and averaged. Harvest was assumed to take place when the crop reached physiological maturity. The baseline scenario, called "1979-1988", consisted of winter wheat harvested between 1979 and 1988. The previously estimated crop parameters of cv. Arminda were used as input. The input weather data were from the weather station Haarweg (Wageningen). For this scenario, the average ambient CO 2 concentration for the period 1979-1988 was used as input. The sowing date for each year in this baseline scenario was calculated by averaging the days of the years of sowing from the various locations in the 1982-1984 data set. For scenario "2009-2018", weather data from the weather station Veenkampen (Wageningen) for this period were used as input to simulate the average potential yields, but further all other input data were the same as in the baseline scenario. The remaining scenarios were "2009-2018 + CO2", "2009-2018 + SDATE", and "2009-2018 + CO2 + SDATE". The results of these scenario indicate the sensitivity of the potential yields to the average CO 2 concentration of the period 2009-2018 (CO2), or the sowing date of that period (SDATE) or both. For each scenario, student t-test (significance level 0.05) was used to test for significant differences between the baseline scenario and each of the other scenarios.

Sensitivity analyses set 2: Quantification of yield change due to cultivar related factors
The baseline scenario for the second set of sensitivity analyses is called "Arminda" and it is identical to the scenario "2009-2018 + CO 2 + SDATE" (see Section 2.6.1). New scenarios were devised to assess changes in yield potential with the crop parameters estimated from the 2012-2014 data set for Julius, Ritmo, an Tabasco. The investigated cultivar parameters were subdivided into (1) biomass partitioning parameters BP (FLVTB, FRTTB, FSOTB, FSTTB), (2) LUE related parameters (only LUE ref ), (3) phenological parameters PH (TSUM1, TSUM2, TSUMAG) and (4) reallocation related parameters (REALLOC_DVS, REALLOC_FRAC REALLOC_RATE_REL). For each possible combination of parameter groups, the average simulated yield across the growing seasons 2008-2009 and 2017-2018 was calculated if the values for that parameter group combination for Arminda would be replaced by estimates from cultivars parameters from the 2012-2014 data set. Student t-test was used to test for significant yield differences between the potential yields of each scenario and that of the baseline simulations.

Additional variables
A number of additional variables were calculated from the model output for Arminda grown between 1978and 1979-1988and Julius, Ritmo, and Tabasco simulated between 2008and 2017-2018. These variables included the lengths of the vegetative growth phase (sowing to anthesis) and the reproductive phase (anthesis to harvest), the annual PAR sum (between sowing dates of 2 consecutive seasons), the amount of intercepted PAR (using Eq. 5), and the harvest index.

Crop parameters
The estimated value of temperature sum from emergence to anthesis, if there would be no vernalization or daylength sensitivity (TSUM1), from the calibration data set was 889 ± 12 o Cd (Table A5). The temperature sum (without vernalization or daylength sensitivity) at which leaf senescence starts (TSUMAG) was assumed to equal TSUM1. The estimates of the temperature sum from anthesis to maturity (TSUM2) varied less (830 ± 7 o Cd; Table A5). Fitting Eq. 5 to the coupled observations of root dry matter and development stage resulted in f rt,0 = 0.533 kg kg -1 and c = − 2.145 (Fig. 1). For each location in the 1982-1983 growing season, there was a strong linear relationship between the estimated total dry matter and estimated cumulative intercepted amount of PAR. The slope of these relationships, which represents the actual light use efficiencies, varied from 2.51 g MJ -1 in PAGV to 2.94 g MJ -1 in De Bouwing (Fig. 2a,c,e, Table A2). Since the reference level of ambient CO 2 (350 ppm) was rather close to the CO 2 level in the harvest year of 1983 (343 ppm), the calculated values for reference light use efficiencies LUE ref (2.51-2.94 g MJ -1 ) were close to the actual light use efficiency (Table A7). The development stage from which reallocation starts, REALLOC_DVS, varied little (1.31-1.35). There was more variation in the fraction REALLOC_FRAC of stem dry matter at development stage REALLOC_DVS that is made available for reallocation (0.105-0.275). The estimates of the relative rate of reallocation REALLOC_RATE_REL were identical (0.0357 d -1 ) across all locations (Table A9) Table A11. For all aforementioned parameters, the values that were used for further simulations for cultivar Arminda were equal to their location based averages (Table 2). From the estimates of f rt,0 , c (Eq. 4, Fig. 1) and the averages of d 1 , d 2 , d 4 , and f lv,0 (Table A11, Eqs. 6-8), the tabular functions of the biomass partitioning to the roots (FRTTB), leaves (FLVTB), storage organs (FSOTB), and stems and chaffs (FESTS) were determined ( Fig. 3; Table A3).

Assessment of model performance
The model overestimated the total aboveground dry matter in the early growing season of 1982-1983 that was used for calibration (Fig. 4). Later in the growing season, this overestimation was not observed. The model performed well in reproducing the grain dry matter in the calibration growing season (Fig. 4d-f, Table A13).
Also in the 1983-1984 growing season used for evaluation, the aboveground dry matter was overestimated in the early growing season and the degree of overestimation was higher in De Bouwing than at the other locations (Fig. 4a-c, Table A13). In this season, the grain dry matter was well simulated (Fig. 4d-f).

Model evaluation on the 2013-2015 data set after calibration on the 1982-1984 data set
When the crop parameters of cultivar Arminda were used to simulate the growth of the more modern cultivars Julius, Ritmo, and Tabasco in the 2013-2015 data set, the model performed well in simulating the aboveground dry matter in both 2013-2014 and 2014-2015 (Fig. 5a-c, Table A14). During the early growing season, the aboveground dry matter was overestimated. The model also performed well in simulating the grain dry matter in 2013-2014. However, it substantially underestimated the grain dry matter in 2014-2015; the MBEs for Julius, Ritmo, and Tabasco were respectively 2.27 Mg ha -1 , 3.97 Mg ha -1 , and 4.01 Mg ha -1 .

Estimation of crop parameters
TSUM1 and TSUM2 had the same value for all three cultivars (TSUM1 = 932 o C d and TSUM2 = 993 o C d; Table 2) respectively, because the 2013-2015 data set did not contain separate phenological observations per cultivar. There was a strong linear relationship between estimated total dry matter and total intercepted PAR at the measurement days (Fig. 2b,d,e). There was variation in the estimates of the actual LUE between the three cultivars (2.97-3.13 Mg ha -1 ). Therefore, the reference LUE also varied per cultivar (2.83-2.98 Mg ha -1 ) (Fig. 2b,d,e, Table A8). However, the reference values of LUE were considerably lower than the actual ones, because the CO 2 level in 2014 (399 ppm) was considerably higher than the reference CO 2 level (350 ppm).

Assessment of model performance
After calibrating PCSE-LINTUL3 for cultivars Julius, Ritmo, and Tabasco, the model performed still well in reproducing the total aboveground dry matter during the later part of the growing season 2013-2014 used for calibration and slightly overestimated the aboveground dry matter in the earlier stage of this growing season (Fig. 5a-c, Table A15). The model also performed well in reproducing the grain yield of 2013-2014 ( Fig. 6a-c, Table A15).
The model performed well in reproducing the aboveground dry matter in the growing season 2014-2015, although the early aboveground dry matter is overestimated (Fig. 6a-c, Table A15). The grain yield was underestimated in this growing season (MBE between 1.18 and 2.72 Mg ha -1 ) ( Fig. 6d-f, Table A15), but to a lesser extent than when the crop parameters for Arminda would have been used (see Section 3.2, Table A15).

Sensitivity analyses set 1: Quantification of yield change due to noncultivar related factors
PCSE-LINTUL3 predicted that the average yield of cultivar Arminda over the harvest years 1979-1988 (baseline scenario "1979-1988") was 8.83 ± 0.42 Mg ha -1 (Fig. 7). If Arminda would be grown between 2009 and 2018, but the CO 2 levels and sowing date were the same as in the baseline scenario, the relative increase of average simulated yield was 14% (scenario "2009-2018"). Running the same simulations with the sowing dates from the growing season 2008-2009 to the growing season 2017-2018 ("2009-2018 + SDATE") resulted in almost the same yield as obtained in scenario "2009-2018". However, running scenario "2009-2018" with the CO 2 levels from this period resulted in a simulated yield increase of 22% relative to the baseline scenario ("2009-2018 + CO2") and using both the CO 2 levels and the sowing dates from 2009 to 2018 ("2009-2018 + SDATE + CO2)" resulted in an increase in simulated yield of 23%. All investigated scenarios resulted in a significantly higher average yield than in the baseline scenario. Thus, climate change and elevated CO 2 levels were responsible for substantial increases in wheat potential yield, while the effect of differences in sowing date was negligible.

Sensitivity analyses set 2: Quantification of yield change due to cultivar related factors
For this set of sensitivity analyses, the baseline scenario "Arminda" was identical to the scenario "2009-2018 + SDATE + CO2" from the previous set of sensitivity analyses. This scenario resulted in a yield of 10.8 ± 0.65 Mg ha -1 . Next, the increase in average yield relative to the baseline was calculated for each modern cultivar (Julius, Ritmo, Tabasco) and each group of re-estimated parameters (Fig. 7). For none of the cultivars, using the new parameter values for biomass partitioning ("BP") or for reallocation ("RL") resulted in a significant increase of the average of the simulated yields. In Tabasco, replacing the reference LUE ("LUE") did also not lead to a significant yield increase. The yield increase for scenario LUE was small but significant for Julius (8%) and Ritmo (7%). Changing the phenology parameters resulted in a larger yield increase for all three cultivars (11%). The strongest yield increase for all three cultivars was obtained when just the phenological parameters and the reference LUE would have been replaced ("LUE + PH"). When all re-estimated parameters were used as input, the relative yield increases for Julius, Ritmo and Tabasco were 15%, 11%, and 9%. The corresponding final yields were 12.4 ± 0.69 Mg ha -1 , 12.0 ± 0.65 Mg ha -1 , and 11.8 ± 0.63 Mg ha -1 .

Additional calculations explaining cultivar effects
The differences in duration of the vegetative phase were not significantly different between Arminda and Julius, Ritmo and Tabasco, regardless the period in which Arminda was grown (Fig. 9a). The duration of the reproductive phase of Arminda simulated from the 1978-1979 growing season to the 1987-1988 growing season was significantly shorter than for Arminda simulated from the 2008-2009 growing season to the 2017-2018 growing season Additionally, the duration of the reproductive phase of Arminda simulated in either of the periods was significantly shorter than for Julius, Ritmo, and Tabasco. The amount of intercepted PAR was significantly lower for Arminda simulated between the harvest years 1979 and 1988 than if the same cultivar was harvested between 2009 and 2018. Julius, Ritmo and Tabasco intercepted more radiation than Arminda during the same period (Fig. 9c). The total amount of PAR between the sowing date of 1978-1979 and the harvest date of 1987-1988 was significantly smaller than the amount between the sowing date of 2008-2009 and harvest date of 2017-2018 (Fig. 9d). The harvest index was not significantly different between the cultivars, except for Julius grown between harvest years 2009 and 2018, which had a slightly higher harvest index (0.54) than Arminda (0.51) in the same period.

Re-calibration of PCSE-LINTUL3 to catch-up with genetic yield progress
PCSE-LINTUL3 was extended to simulate 1) the response of crop development to daylength and vernalization, 2) the response of light use efficiency to ambient CO 2 levels, and 3) reallocation of stem and chaff dry matter to the storage organs. This extended model was subsequently calibrated and evaluated for Arminda grown on field trials during two growing seasons (1982-1983 and 1983-1984) at three different sites in The Netherlands. The model performed well in simulating both the yield and the aboveground dry matter during both growing seasons at all locations (Fig. 4a). Subsequently the parameters from cultivar Arminda were used as model input to simulate the growth of the newer cultivars Julius, Ritmo, and Tabsco during field trials in 2013-2014 and 2014-2015. The model performed well in simulating the 2013-2014 growing season, but considerably underestimated the yields in the 2014-2015 growing season, in which higher yields were obtained than in 2013-2014 by all investigated cultivars (Fig. 5). This underestimation for the 2014-2015 growing season was considerably reduced after the model was calibrated for each cultivar on the 2013-2014 data set (Fig. 6). The results indicate that a model, calibrated on the 1982-1984 data, underestimates yields of recent growing seasons due to the cultivation of newer cultivars in The Netherlands.

Parameters driving genetic progress of winter wheat in the Netherlands
There are various differences between the values of the re-estimated parameters of Arminda and of the newer cultivars Julius, Ritmo, and Tabasco. However, the differences in individual parameters between old and new cultivars do not necessarily explain the reported yield increase in Netherlands between 1978 and 2010 (Rijk et al., 2013). Additionally, climate change (temperature, radiation), elevated CO 2 , and changed management practices can affect the potential yields as well. Therefore, two sets of sensitivity analyses were conducted. Changes in sowing date did not explain the yield increase (Fig, 8). In contrast, changes in minimum and maximum temperature and global radiation explain a substantial part of the yield increase. If we also consider the change in ambient CO 2 concentrations, the yield increase is even greater (Fig. 8).
We also found that differences in biomass partitioning and reallocation related parameters did not significantly affect crop yield.
We found a fairly small effect of the relative LUE to explain potential yield increases in winter wheat, which is seemingly in contrast to some other studies (Foulkes et al., 2007;Shearman et al., 2005). These studies reported a substantial increase of the LUE of winter wheat cultivars released in the UK between 1972 and 1995 and suggest that this increase was associated with increases in yield. However, the LUE calculated in these studies was the actual LUE, rather than the LUE at a reference CO 2 level of 350 ppm. Applying the linear model that these authors fitted to calculate the actual LUE for the year 1977 (release year Arminda) and for 2009 (release year Julius), we would predict an increase in LUE of 16%. However, if we apply Eq. 3 to calculate the reference LUEs of Julius and Arminda and make the same comparison, the increase would be considerably less (9%). We conclude that a change in reference LUE in the Netherlands can explain some of the increase in yield and that almost half of the increase in actual LUE (Fig. 2) can be explained by elevated CO 2 levels rather than by changes in crop traits.
From a breeder's perspective, it can be interesting to breed for cultivars with increased LUE, but this is challenging since LUE is a complex trait and is therefore hard to select for. LUE lumps the effects of important plant traits and environmental conditions on crop growth, including biochemical constants for leaf photosynthesis (Farquhar et al., 1980), leaf optical properties (Jacquemoud and Baret, 1990), maximum CO 2 assimilation rates at various depths in the canopy (de Wit, 1965;Goudriaan, 1977Goudriaan, , 1986, N distribution in the canopy (Anten et al., 1995), O 2 concentrations (Farquhar et al., 1980), and leaf temperatures (Bernacchi et al., 2001). One way to understand LUE is the use of more complex crop growth models that simulate some of the processes that determine reference LUE. Examples of such crop models are DAISY (Hansen et al., 1991;Abrahamsen and Hansen, 2000), GECROS (Yin and van Laar, 2005), SUCROS (van Keulen et al., 1997;Spitters et al., 1989), NWHEAT (Groot, 1987;Groot and de Willigen, 1991), and WOFOST (de Wit et al., 2020).
Wheat yield was most sensitive to phenological parameters (TSUM1 and TSUM2). There was a difference of 35 o Cd between the TSUM1 determined for Arminda and for the newer cultivars in the 2013-2015 data set (Table A4). This difference is so small that the duration of the vegetative stage (Fig. 9) did not differ significantly between the cultivars, regardless in which years the cultivars were simulated. In contrast, the difference between the values of TSUM2 of Arminda and the newer cultivars were considerably larger (i.e., 154 o Cd). Indeed, the simulated reproductive period was considerably shorter for Arminda than for the newer cultivars, regardless of whether the growth of Arminda was simulated from the growing season 2008-2009 to 2017-2018 or from the growing season 1978-1979 to1987-1988. Yield increases between cultivars from different periods can either be explained by an increase of the harvest index or of dry matter production and both can potentially increase by a longer reproductive phase. There have been considerably increases in the harvest index due to the introduction of semi-dwarf genotypes throughout Europe (Foulkes et al., 2007) in the 1960 s and 1970 s throughout Europe (Foulkes et al., 2007;Schot et al., 2000), which reduced the biomass allocation to the stem during early flowering, and therefore, there was more biomass partitioning to the grains. This increased the harvest index (Foulkes et al., 2007). However, many recent cultivars are approaching their upper limit. Therefore, the recent increases of the yields in Europe were more likely obtained by increasing the dry matter production while maintaining the harvest index (Foulkes et al., 2007). Our results confirm this (Fig. 9b,e). There were no significant differences between the average simulated harvest indices of Arminda, regardless the simulated period, and of Ritmo and Tabasco. The average harvest index of Julius (0.54) was significantly larger than of Arminda (0.51 in both periods), but this difference is rather small. In contrast, the amount of intercepted PAR differed considerably between Arminda and the other cultivars. This indicates that the advantage of the recent cultivars due to a higher TSUM2 is mostly explained by the greater amount of intercepted PAR over a longer period of the growing season than Arminda.

Implications and recommendations for future research
After calibration, PCSE-LINTUL3 performed well in simulating the grain yield of Arminda in 1982-1983 (used for calibration) and in 1983-1984 (used for evaluation) at any site. When the same crop parameters were to simulate the growth of the cultivars in the 2013-2015 data set, the model substantially underestimated the grain yield of these cultivars. We also found that the average simulated yield of modern cultivars is 15% higher than of Arminda grown under recent conditions, even though the model considers the elevated CO 2 concentrations and difference in climate between the 1980s and the 2010s (Fig. 8). This has implications for the use of models calibrated on the 1982-1984 data set. For instance, yield gaps, calculated as the difference between the simulated potential yields and actual yields (van Ittersum and Rabbinge, 1997), would be underestimated when such a model calibrated on the 1982-1984 data set is used. This would hamper the use of crop models as a decision support tool . Therefore, we suggest that other models calibrated on the widely used 1982-1984 data set should be evaluated on the 2013-2015 data set or similar recent data sets and, if necessary, recalibrated as well.
The advantage of crop growth models in comparing yields of old and new cultivars is that they are capable of separating the effect of cultivar traits and changes in climate and CO 2 levels on yield. Reference LUE is a compounded and complex trait. Therefore, the use of more complex crop growth models that calculate the biomass production from the leaf photosynthesis could provide additional insights in which traits that determine the reference LUE can explain its increase. Subsequently, it can then be investigated which reference LUE traits should be selected for by breeders to increase yields under specific conditions. Unfortunately, the calibration of the photosynthesis modules in these models requires additional measurements. These consist at least of photosynthetic light response curves for each cultivar (ten Den et al., 2022) and, for some highly mechanistic models like GECROS (Yin and van Laar, 2005), photosynthetic light and CO 2 response curves measured under two oxygen levels as well (Yin et al., 2009). Since neither the 1982-1984 data set, nor the 2013-2015 data set, includes these types of measurements, it would be very challenging to calibrate the photosynthesis modules in these complex models in a reliable way. In contrast, the advantage of using a LUE driven model, like PCSE-LINTUL3, is that the actual LUE can be determined directly from the available data (Fig. 1). We suggest future research to conduct photosynthesis measurements on old and new cultivars, grown under the same circumstances. Analysis of these measurements would then reveal in which aspects the leaf photosynthesis traits differ between the cultivars. Although this type of comparison was done for Chinese cultivars (Wu et al., 2014), a similar analysis under Dutch conditions is lacking. Only after these measurements are available, complex crop growth models can be used to investigate which changes in leaf photosynthetic traits can explain the observed wheat yield increase in the Netherlands (Rijk et al., 2013).

Conclusion
PCSE-LINTUL3 was extended with 1) a phenological module that considers the sensitivity of crop development to daylength and vernalization, 2) a CO 2 response of the light use efficiency, and 3) reallocation of dry matter.
PCSE-LINTUL3, calibrated on the 1982-1984 data set underestimated wheat grain yields from the 2013-2015 data set. After model recalibration against field data from a more recent data set (2014)(2015), model performance to reproduce wheat yields improved.
Sensitivity analysis showed that increases in CO 2 levels and climate change between 1985 and 2015 alone would increase the simulated yields with 23% and, if genetic progress in the form of changes in crop parameters would also be considered, with 41%.
Yield increases due to genetic progress were mostly explained by higher values for the temperature sum from anthesis to maturity and, to a lesser extent, reference light use efficiency.
We recommend more research on comparing leaf photosynthetic traits of old and new cultivars to better understand the mechanisms of yield increase due to changes in references light use efficiency.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Data will be made available on request.