Calibrating and testing APSIM for wheat-faba bean pure cultures and intercrops across Europe

Cereal-legume intercropping can increase yields, reduce fertilizer input and improve soil quality compared with pure culture. Designing intercropping systems requires the integration of plant species trait selection with choice of crop configuration and management. Crop growth models can facilitate the understanding and prediction of the interactions between plant traits, crop configuration and management. However, no existing crop growth model has been calibrated and tested for cereal-legume intercrops throughout Europe. We calibrated the Agricultural Production Systems sIMulator (APSIM) for pure cultures of wheat and faba bean using data from Dutch field trials, and determined the phenological parameters to simulate pure cultures and intercrops from seven field experiments across Europe. APSIM successfully reproduced aboveground dry matter and, for wheat only, grain yields in pure cultures. In intercrops, APSIM systematically overestimated the aboveground dry matter and grain yield of faba bean and underestimated those of wheat. APSIM was reasonably capable of simulating plant heights in pure cultures, but overestimated the height of faba bean and underestimated the height of wheat in intercrops. To simulate wheat-faba bean intercrops better, APSIM should be improved regarding the calculation of biomass partitioning to grains in faba bean and of height growth in both species.


Introduction
Global food production needs to satisfy increasing demands while reducing its environmental footprint by lowering anthropogenic inputs and spillovers. Diversification in agriculture can support achieving this complex goal (Tamburini et al., 2020). One way to diversify is intercropping, i.e. cultivating two or more species in the same field for a significant part of their growing period (Willey and Rao, 1980). Intercropping can increase and stabilize crop yield and reduce the environmental impact of arable farming (Malezieux et al., 2009). Yield advantages can only occur if at least one of the species in the intercrop experiences less combined intra- and interspecific competition in the intercrop than it would experience from intraspecific competition in pure culture (Loreau, 2010). For instance, cereals and legumes grown in intercrops may complement each other regarding nitrogen use. While legumes are capable of fixing nitrogen from the atmosphere, cereals obtain all their nitrogen from the soil (Jensen, 1996). Therefore, the cereal experiences less competition for nitrogen from a neighboring legume than it would from a neighbor of its own species. As a result of such mechanisms for complementarity, intercrops often, but not always, show overyielding (Yu et al., 2015, 2016; Martin-Guay et al., 2018; Li et al., 2020; Xu et al., 2020). Here, overyielding is defined as a situation where the land area that is required for two species to obtain a certain yield is lower if these species are intercropped than if they are grown in pure cultures. Moreover, intercropping may result in an increase in yield stability and improved resource use (Raseduzzaman and Jensen, 2017). Intercropping can reduce the abundance of herbivores and the extent of crop damage by herbivory and enhance the abundance of natural enemies (Letourneau et al., 2011). Intercropping can also result in stronger weed suppression than a pure culture (Liebman and Dyck, 1993).
Further, organic carbon and nitrogen contents were shown to be higher in soils with a history of intercropping (Cong et al., 2015), due to increased organic matter input and improved nutrient retention. Because of all these potential advantages, intercropping can support the ecological intensification of agriculture (Bommarco et al., 2013).
Global agriculture is dominated by cereals, with maize, wheat and rice representing the crops with the largest growing areas, but they need high inputs of nitrogen via fertilizer application. Incorporation of legumes in cropping systems can reduce the need for anthropogenic nitrogen input. One way to incorporate legumes in cereal systems, and meet the nitrogen demand of cereals, is by using cereal-legume mixtures. In such mixtures, the legume can predominantly utilize nitrogen that is fixed biologically from air through symbiotic bacteria in the root nodules, thus reducing the competition with the cereal, which relies on inorganic nitrogen from the soil (Jensen, 1996). Designing sustainable and productive cereal-legume intercropping systems is, however, challenging. It is necessary to account for aboveground and belowground interactions between the cereal and legume species, including competition for light, water and nutrients, and the mechanisms behind such competition.
The extent to which one species dominates another is determined by various crop traits, including differences in rooting systems (Corre-Hellou et al., 2007), canopy structure, height and leaf angles (Goudriaan, 1988; Keating and Carberry, 1993; Pronk et al., 2003; Gou et al., 2017b), and nutrient uptake capacity by the cereal and the legume (Corre-Hellou et al., 2006). Additionally, the degree of interspecific competition on both species is also determined by management decisions like row configuration, plant density, fertilization timing and amount, and sowing dates (Yu et al., 2016).
Crop growth models quantitatively integrate the interactions among intercropped species over time and provide predictions of production outcomes (Gaudio et al., 2019). They can thus help to explore effectively the net outcome of all the aspects affecting the growth of intercrops: pedoclimatic conditions, crop and cultivar traits, and management practices. As such they can support choices of species and management for the ecological intensification of agriculture. Several crop growth models have been developed and successfully applied to simulate the growth, development and yield of a wide variety of crop species and cultivars in pure cultures in various crop production systems and environments (Wallach et al., 2018). Some crop growth models can also simulate intercropping systems. For instance, the crop growth models FASSET (Jacobsen et al., 1998) and STICS (Brisson et al., 2003, 2004) were used to simulate pea-barley intercrop systems in Denmark (Berntsen et al., 2004) and France (Corre-Hellou et al., 2009), respectively. Wheat-maize intercropping was modelled by Gou et al. (2017b), and Tan et al. (2020) added competition for water to this model. The M³ model was used to simulate the nitrogen-limited growth of wheat-faba bean strip intercrops in the Netherlands (Berghuijs et al., 2020). Yet, thorough parameterization and validation of crop growth models for cereal-legume intercrops remain rare, limiting the applicability of models to design the most effective intercropping systems.
The Agricultural Production Systems sIMulator (APSIM) has also been used for simulating intercropping systems (Carberry et al., 1996; Knörzer et al., 2011; Chimonyo et al., 2016). APSIM is potentially interesting as a tool for designing sustainable cereal-legume production systems because it has modules for a broad range of crop species, and can simulate their growth under diverse conditions. However, APSIM has been infrequently tested and applied under European conditions, in particular when considering intercropping systems. To the best of our knowledge, APSIM has been applied to cereal-legume intercropping systems in Europe only once (Knörzer et al., 2011), i.e. to wheat-field pea and maize-field pea mixtures grown in Southern Germany. Furthermore, there are only a few studies available in which APSIM WHEAT (crop module in APSIM for modelling wheat) has been calibrated and validated for pure cultures of wheat in Europe (Asseng et al., 2000; Knörzer et al., 2011) and none for APSIM FABABEAN (crop module in APSIM for modelling faba bean) outside Australia. Given the potential broad applicability of APSIM, it is of interest to further assess its power in simulating cereal-legume intercrops under a variety of pedoclimatic conditions.
To advance our capabilities to evaluate intercropping as a tool to support ecological intensification of agriculture, here we assess to what extent APSIM can be used to simulate intercrops of cereals and legumes in Europe. Specifically, we consider spring wheat-faba bean intercrops grown under European growing conditions. This study aims to 1) calibrate and validate APSIM for pure cultures of spring wheat and faba bean under temperate European conditions, exploiting detailed data from experiments in the Netherlands (Kropff, 1989;Boons-Prins et al., 1993;Gou et al., 2016); and 2) evaluate the performance of APSIM in predicting the crop yield in various European sites for both pure cultures and intercrops of spring wheat and faba bean, based on several field experiments recently carried out throughout Europe.

Field experimental data
We exploited two groups of experimental data. The first group consists of previously published crop measurements on pure cultures of spring wheat (Gou et al., 2017b) and faba bean (Kropff, 1989) from experiments conducted in Wageningen, the Netherlands. These data are comparatively detailed, including results from periodic harvests and measurements of leaf area index (LAI) and plant height at different times within the growing season. Thanks to this higher level of detail, these data (Table 1) lend themselves to calibrating and validating APSIM WHEAT and APSIM FABABEAN for pure cultures of wheat and faba bean. The second group of data consists of final aboveground dry matter and yield measurements of spring wheat and faba bean in pure culture and intercrops. The data were collected in 2017 and 2018 at seven locations along a latitudinal gradient in Europe, as part of the EU Horizon 2020-funded project DIVERSify (Tables 2 and 3). The experiments in Dundee (UK) in 2018 and in Gleisdorf in both 2017 and 2018 were unfertilized. The experiment in Ancona in 2018 was fertilized, but it comprised only a single, low fertilization regime. In all the other cases, two different management treatments were considered, hereafter referred to as "conventional management" and "low input management". Input levels differed among the locations, but at each location nitrogen was applied at a higher rate at the conventional management level than at the low input level. In all intercropping experiments, the crop species were sown and harvested at the same date and both crop species were mixed within the row (i.e. not alternate row intercropping). We did not consider the experiment conducted in 2017 in Córdoba, because the crops were heavily affected by aphids, most likely as the result of an unusually late sowing date. Aphid infestation cannot be reproduced by APSIM. 
We also excluded the data from Dundee collected in 2017, because measurements of aboveground dry matter and grain yield separated per species were not available.

Soil water module
We used the SoilWat module to simulate the soil water dynamics for each combination of experiment and treatment. This module requires as input for each layer the thickness (Δz), the bulk density (ρ), the volumetric soil water contents at air dry (θ_ad), field capacity (θ_dul) and saturation (θ_s), and the 15-bar lower limit soil moisture content (θ_LL15) (Probert et al., 1998).
Each of these volumetric water contents can range from 0 to soil porosity. In order to determine these parameters, we first obtained ρ and the mass fractions of clay (f_clay), sand (f_sand), silt (f_silt), and organic carbon (f_oc) from the freely available SoilGrids database (Hengl et al., 2014, 2017). We calculated the fraction of organic matter f_om from f_oc following Pribyl (2010) (Eq. 1). We used the pedotransfer function described by Wösten et al. (1999) and Wösten and Nemes (2004) to calculate the Van Genuchten-Mualem parameters α, n, and θ_s (Mualem, 1976). Those parameters describe a soil water retention curve (Van Genuchten, 1980) as:
$$\theta(\psi) = \theta_r + \frac{\theta_s - \theta_r}{\left[1 + (\alpha |\psi|)^{n}\right]^{1 - 1/n}} \qquad (2)$$
where θ_r is the residual volumetric soil moisture content and ψ is the water potential (hPa). We assumed that θ_r = 0.025. We further adopted the assumptions from Leffelaar (2014) that the pF value (10-base logarithm of |ψ|) of a soil layer equals 5.0 at air dry, 4.2 at the wilting point and 2.0 at field capacity; and that the soil moisture content at wilting point is the same as θ_LL15. With these assumptions, we used Eq. 2 to calculate θ_ad, θ_LL15, and θ_dul.
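The calculation of θ_ad, θ_LL15 and θ_dul from the retention curve can be illustrated with a minimal Python sketch. It assumes the standard Van Genuchten form with the Mualem constraint m = 1 − 1/n; the function name and the values of θ_s, α and n are hypothetical placeholders, not outputs of the pedotransfer function used in this study.

```python
def van_genuchten_theta(pf, theta_r, theta_s, alpha, n):
    """Volumetric water content at a given pF (log10 of |psi| in hPa)."""
    psi = 10.0 ** pf              # suction |psi| (hPa)
    m = 1.0 - 1.0 / n             # Mualem constraint
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * psi) ** n) ** m

# Hypothetical layer parameters (illustrative only, not from the paper);
# theta_r = 0.025 is the value assumed in the text.
params = dict(theta_r=0.025, theta_s=0.43, alpha=0.02, n=1.3)

theta_ad = van_genuchten_theta(5.0, **params)    # air dry, pF 5.0
theta_ll15 = van_genuchten_theta(4.2, **params)  # wilting point, pF 4.2
theta_dul = van_genuchten_theta(2.0, **params)   # field capacity, pF 2.0
```

With any physically sensible parameter set, the three water contents are ordered θ_ad < θ_LL15 < θ_dul and lie between θ_r and θ_s, which is a convenient sanity check before passing them to SoilWat.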

Meteorological input data
APSIM requires daily global radiation, minimum and maximum temperature (T_min(t) and T_max(t)), and precipitation as input data. For each field trial and growing season, these data were obtained from the NASA Langley Research Center Atmospheric Science Data Center Surface meteorological and Solar Energy (SSE) web portal supported by the NASA LaRC POWER Project (https://power.larc.nasa.gov/data-access-viewer/).

Management data
APSIM requires sowing dates, nitrogen application amounts, and harvest dates as inputs. Tables 2 and 3 show the input values of these variables.

APSIM calibration and validation procedure
Calibration procedure for spring wheat
Table 1 shows an overview of the data that were used to calibrate and validate the crop module APSIM WHEAT (Meinke et al., 1997, 1998a, 1998b; Wang et al., 2003; Zheng et al., 2015). APSIM's cultivar Hartog was used as a starting point for the calibration procedure. Sensitivities for photoperiodicity (photop_sens) and vernalization (vern_sens) were set to 0. The parameter shoot_lag was calculated from the sowing depth, the default value of shoot_rate (°C d), the observed emergence dates and daily temperature data. Phenological parameters (i.e. thermal times from one stage to the next) were determined based on phenological observations and the corresponding temperatures in the experimental data as:
$$tt_{stage} = \sum_{t = t_{start}}^{t_{end}} \max\left(0, \frac{T_{min}(t) + T_{max}(t)}{2} - T_b\right) \Delta t \qquad (3)$$
where tt_stage is the thermal time (°C d) from the start of a certain phenological stage (at time t_start) until the start of the next stage (at time t_end), Δt is the time step (d), and T_b is the base temperature (°C), i.e., the temperature below which the thermal time does not increase with temperature. In this way, we determined tt_end_juvenile, tt_floral_initiation, tt_flowering, tt_start_grain_fill, and tt_end_grain_fill, i.e., the thermal times from the end of the juvenile stage until floral initiation, floral initiation to flowering, flowering to the start of grain filling and from the start of grain filling until the end, respectively. After determining the parameter values mentioned above, we ran APSIM to determine the stem weight on the day of flowering for wheat for the experiment from Wageningen conducted in 2013. From the simulated stem weight and the measured yield components, we calculated the parameters max_grain_size (maximum grain size; g seed⁻¹) and grain_per_gram_stem (number of kernels per gram of stem at flowering). Finally, we manually optimized the parameter y_frac_leaf (fraction of newly produced biomass allocated to leaves).
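As an illustration of how a thermal time between two observed phenological stages can be derived from daily temperatures, the sketch below accumulates degree-days above a base temperature. It assumes that the daily mean temperature is the average of T_min and T_max and that thermal time increases linearly above T_b; APSIM's internal temperature response may differ. The function name and the temperature series are hypothetical.

```python
def thermal_time(t_min, t_max, t_base=0.0, dt=1.0):
    """Accumulated thermal time (degC d) over the days of one stage."""
    total = 0.0
    for lo, hi in zip(t_min, t_max):
        t_avg = (lo + hi) / 2.0                 # assumed daily mean temperature
        total += max(0.0, t_avg - t_base) * dt  # no contribution below t_base
    return total

# Hypothetical five-day stage (degC); the third day is below the base temperature
t_min = [2.0, 4.0, -3.0, 6.0, 8.0]
t_max = [10.0, 12.0, 1.0, 16.0, 18.0]
tt_stage = thermal_time(t_min, t_max, t_base=0.0)
```

In practice, the series would run from the observed start date of a stage to the observed start date of the next one, and the resulting sum would be entered as the corresponding tt_ parameter.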

Calibration procedure for faba bean
Table 1 shows an overview of the data that were used to calibrate the crop module APSIM FABABEAN (Turpin et al., 2002, 2003). We used parameters for the cultivar Fjord as starting point. We switched off the sensitivity to vernalization (vern_sens = 0). We also switched off the sensitivity to photoperiodicity, by assuming that the phenological parameter values at short daylengths equal their values at long daylengths. We calculated the values of y_tt_end_of_juvenile, y_tt_flowering and y_tt_start_grain from phenological observations and temperature data using Eq. 3. We thereby assumed that the ratio of the y_tt_emergence and y_tt_floral_initiation to the thermal time from emergence to flowering, and the ratio of tt_flowering and tt_start_grain_fill to the thermal time from flowering to maturity, were the same in the cultivars that were used in this study and in the cultivar Fjord that was used as the starting point of calibration. We calculated the parameter shoot_lag for faba bean in the same way as for the cereals, assuming a sowing depth of 40 mm. Unfortunately, although the emergence dates for the Wageningen data from Kropff (1989) were known, the sowing dates were not. Therefore, we ran the simulations for the Kropff (1989) experiments using the emergence date as the start date of simulation, with the sowing depth set to 0 so that the plants emerged immediately.
We estimated the radiation use efficiency y_rue as the slope of the relationship between the sum of the daily intercepted radiation and the observed biomass, calculated as explained by Gou et al. (2017a). Finally, we adjusted the values of the maximum harvest index that can be reached (y_hi_max_pot) and the daily growth rate of the harvest index (y_hi_incr) to higher values than those representing the cultivar Fjord.
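The slope-based estimate of radiation use efficiency can be sketched as follows. This is only an illustration under stated assumptions: daily intercepted radiation is computed with Beer's law from LAI and an extinction coefficient k, and the slope is fitted through the origin. The exact procedure of Gou et al. (2017a) may differ, and all names and numbers below are hypothetical.

```python
import math

def cumulative_intercepted(radiation, lai, k):
    """Running sum of daily intercepted radiation (MJ m-2), Beer's law."""
    total, out = 0.0, []
    for rad, leaf_area in zip(radiation, lai):
        total += rad * (1.0 - math.exp(-k * leaf_area))
        out.append(total)
    return out

def slope_through_origin(x, y):
    """Least-squares slope of y = b * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical daily radiation (MJ m-2 d-1), LAI and extinction coefficient
radiation = [15.0, 18.0, 20.0, 17.0]
lai = [0.5, 1.5, 3.0, 4.0]
k = 0.5

cum = cumulative_intercepted(radiation, lai, k)
biomass = [1.2 * c for c in cum]          # synthetic biomass (g m-2), RUE = 1.2
rue = slope_through_origin(cum, biomass)  # recovers the slope in g MJ-1
```

With real data, biomass would come from the periodic harvests and LAI would be interpolated between measurement dates before the intercepted radiation is summed.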

Validation procedure
We validated APSIM WHEAT and APSIM FABABEAN by comparing measured and simulated values of LAI, biomass and, in the case of APSIM WHEAT, the height of the plants. Table 1 shows the data that were used for the validation procedure.

Local model recalibration and model evaluation on DIVERSify data
We used the newly calibrated and validated APSIM crop modules for each species based on the Wageningen data to predict the grain yields, total aboveground dry matter and plant heights at the end of the growing season that were measured in pure cultures and intercrops in the DIVERSify experiments (Tables 2 and 3). Since the cultivars differed from one location to the other and sometimes also between different years at the same location, we recalculated the phenological parameters for each combination of crop species, cultivar and location from phenological observations in these field trials. All other parameters remained the same as calibrated based on the Wageningen data. The field trial of Córdoba lacked phenological measurements. Therefore, we determined the phenological parameters from estimates that were done in Spain on wheat monocultures in a previous study (Boons-Prins et al., 1993). We assumed that the phenological parameters from Córdoba were equal to those from Italy.
Table 1 Overview of the data used to calibrate and validate APSIM WHEAT and APSIM FABABEAN.
Reference | Location | Crop | Cultivar | Years | Treatment
Kropff (1989) and Boons-Prins et al. (1993) | Wageningen (the Netherlands) | Faba bean | Monica | 1985, 1986, 1988 | No exposure to SO2
Gou et al. (2017a) | Wageningen (the Netherlands) | Wheat | Tybalt | 2013, 2014 | Pure culture
Following Salo et al. (2016), we consider three indices to quantify the performance of APSIM: i) the mean bias error (MBE); ii) the root-mean squared error (RMSE); and iii) the index of agreement (IA) (Willmott and Wicks, 1980; Willmott, 1981). The MBE represents the average difference between the model predictions and the measured values and is usually called bias in the statistical literature.
It is an average over all the locations, cultivars and treatments and is calculated as:
$$MBE = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right) \qquad (4)$$
where y_i is either the observed average yield, average aboveground dry matter, or maximum plant height for a certain combination i of location and management treatment; ŷ_i is the corresponding simulated yield, aboveground dry matter, or plant height for combination i; and N is the total number of combinations of location and management treatment. The RMSE quantifies the absolute deviation between observed and simulated values:
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2} \qquad (5)$$
RMSE is a standard deviation, and is expressed in the same units as the variable of interest.
The Index of Agreement (IA) is used as an indicator of model efficiency:
$$IA = 1 - \frac{\sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{N} \left(\left|\hat{y}_i - \bar{y}\right| + \left|y_i - \bar{y}\right|\right)^2} \qquad (6)$$
where ȳ is the average yield or average aboveground dry matter of all observations. IA is an index that varies from 0 to 1, with higher values representing better predictions. Its interpretation is similar to that of the coefficient of determination R² (proportion of variance explained) in ordinary regression.
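The three performance indices can be computed with a few lines of Python. The sketch below assumes the conventional definitions (simulated minus observed for the MBE, and Willmott's 1981 form of the IA); the function names and example values are hypothetical and only serve to show the calculation.

```python
import math

def mbe(obs, sim):
    """Mean bias error: positive values indicate overestimation."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)

def rmse(obs, sim):
    """Root-mean squared error, in the units of the variable."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def index_of_agreement(obs, sim):
    """Willmott's index of agreement (0 to 1, higher is better)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2
              for o, s in zip(obs, sim))
    return 1.0 - num / den

# Hypothetical observed and simulated yields (kg ha-1)
obs = [4000.0, 5000.0, 6000.0]
sim = [4500.0, 5500.0, 6500.0]
bias, error, ia = mbe(obs, sim), rmse(obs, sim), index_of_agreement(obs, sim)
```

In this example the model overestimates every observation by the same amount, so the MBE and RMSE coincide while the IA remains high because the simulated values track the variation among observations.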

APSIM WHEAT
The calibration procedure started with considering the effects of vernalization (exposure to low temperature) and daylength on the temperature sums required to reach different phenological stages (Zheng et al., 2015). We set the sensitivity to vernalization (vern_sens) to zero because this process is largely irrelevant for spring wheat. Moreover, even if it had some effect, it could not easily be quantified separately from the parameter tt_end_juvenile, as this would require data from various experiments during which the cultivar of interest was exposed to different low temperature regimes. We varied the parameter photop_sens between its minimum (0) and maximum value (5.0) and found that this parameter did not have any appreciable effect on the simulated phenology. Therefore, this parameter was also set to zero. The next step in the calibration procedure of the spring wheat was to determine two of the main parameters known to affect yield: the number of grains per unit of stem weight (grains_per_stem), driving how many kernels are formed; and the maximum grain weight (max_grain_size), defining the maximum weight a grain can reach. This calibration was necessary because the default values for these parameters were not high enough to allow APSIM WHEAT to simulate the high yields that were reported in the experiments of Gou et al. (2016).
Table 2 Overview of the DIVERSify experiments conducted in 2017. The treatment levels consisted of a combination of management (C: conventional management; L: low input management) and cropping system (PC: pure culture; IC: intercrop). Fertilization rates and sowing densities are reported for wheat (W) and faba bean (F).
Figs. 1-3 summarize the results of the calibration and validation, and Table 4 shows the original parameter values of the APSIM WHEAT crop module and their re-estimated values after the first calibration step using the Wageningen data (Gou et al., 2016). There was a good agreement between the simulated and measured biomass partitioning in the calibration experiment from 2013 (Fig. 1A), but LAI was somewhat overestimated after flowering (Fig. 2). There was also good agreement between the measured and simulated biomass of leaf and ear as well as LAI in the validation experiment from 2014 (Figs. 1B and 2B), but the stem biomass was underestimated after flowering. Finally, although there was a good agreement between the simulated and measured plant height in the second half of the season in the calibration, the model systematically underestimated plant height in the validation (Fig. 3).

APSIM FABABEAN
APSIM FABABEAN uses thermal times to determine the timing of key phenological stages (y_tt_end_of_juvenile, y_tt_floral_initiation, y_tt_flowering, and y_tt_start_grain_fill). The thermal times are defined as a function of daylength (x_pp_end_of_juvenile, x_pp_floral_initiation, x_pp_flowering, and x_pp_grain_fill). By default, APSIM assumes a rapid decline of the value of these parameters with increasing daylength. Although these daylength-dependent functions have been successfully applied to simulate the growth of faba bean in Australia (Turpin et al., 2003), the use of these functions resulted in a growing season duration that was too short under European conditions, which include longer daylengths than at lower latitudes, such as in Australia. Consequently, the yields and the aboveground dry weights of faba bean were underestimated substantially when the default APSIM parameter values for faba bean were used. Therefore, in the calibration procedure, we assumed that the listed parameters were independent of daylength. We calculated the thermal times from the emergence dates, flowering dates, and maturity dates reported by Boons-Prins et al. (1993). Finally, we calculated the radiation use efficiency (y_rue; g MJ⁻¹) from the assumed extinction coefficients in APSIM, the measured aboveground biomass and LAIs, and radiation using the method described by Gou et al. (2017a). Table 5 summarizes the original parameter values of the APSIM FABABEAN crop module and their re-estimated values after calibration. Figs. 4 and 5 compare observed and simulated aboveground dry matter, partitioned over different organs, LAI and crop height for faba bean. In general there was a good agreement between the simulated and measured biomass partitioning, although the stem biomass was slightly underestimated in 1985 and 1988 (Fig. 4).
There was a good agreement between the measured and simulated LAI for both the dataset used for calibration (from 1985) and that used for validation (1986 and 1988). For both 1985 and 1988, APSIM FABABEAN overestimated plant height early in the growing season and underestimated it at the end of the growing season (Fig. 5).

Biomass and yield in pure cultures and intercrops
Tables 6 and 7 show the adjusted values of the phenological parameters for each location in the DIVERSify experiments. APSIM WHEAT performed reasonably well in reproducing most of the measurements on pure stands of wheat in the seven locations across Europe of the DIVERSify experiments in 2017 and 2018, although with a few outliers (Fig. 6). The simulated aboveground dry matter in Dundee (UK) in 2018 was about 6.5 times larger than the observations for both wheat and faba bean. The observed biomass production of the crop and the yield were particularly low at this location in 2018 due to severe drought stress. Nevertheless, APSIM WHEAT was not capable of fully reproducing these low yields, despite including the soil water balance and at least some aspects of the effects of low water availability on crop development. A possible explanation is that the interpolated soil data that we obtained from SoilGrids for this location were not representative of this site and overestimated the soil water holding capacity.
Similar to APSIM WHEAT, APSIM FABABEAN overestimated the aboveground dry matter in Dundee (UK). However, APSIM FABABEAN was also not able to reproduce the high total aboveground biomass that was measured in Taastrup (Denmark) in 2017. While the performance of APSIM FABABEAN in simulating the aboveground biomass was [...]
Table 3 Overview of the DIVERSify experiments conducted in 2018. The treatment levels consisted of a combination of management (C: conventional management; L: low input management) and cropping system (PC: pure culture; IC: intercrop). Fertilization rates and sowing densities are reported for wheat (W) and faba bean (F).
Regarding intercrops, the model substantially underestimated the aboveground dry matter of wheat (MBE = −3878 kg ha⁻¹) and overestimated the aboveground dry matter of faba bean (MBE = 7295 kg ha⁻¹), although some underestimated values occurred for some locations (Fig. 7).

Plant height in pure cultures and intercrops
APSIM performed poorly in explaining the variation of the observed plant heights both in pure culture and intercrop (Fig. 8). For example, the index of agreement was low for the plant heights in wheat pure cultures (IA = 0.43). Nevertheless, the model error was small (MBE = 19 cm, RMSE = 20 cm) relative to the simulated heights, which were all between 79 cm and 85 cm. The IA was considerably higher in faba bean pure cultures (0.67) and, there too, the model error was relatively small (MBE = 3 cm, RMSE = 25 cm). Similarly to the results for the aboveground dry matter (Fig. 6C), APSIM FABABEAN strongly underestimated the plant height in pure cultures in Taastrup under both low input and conventional input management. In contrast to the pure cultures, APSIM WHEAT strongly underestimated the plant height of wheat in intercrops (MBE = −54 cm, RMSE = 55 cm), while APSIM FABABEAN strongly overestimated the plant height of faba bean (MBE = 45 cm, RMSE = 49 cm).

Discussion
This study is the first application of APSIM to simulate cereal-legume intercrops throughout Europe. We calibrated and validated APSIM WHEAT and APSIM FABABEAN on wheat and faba bean pure cultures based on detailed datasets from Dutch field trials. Next, we adjusted the phenological parameters for different locations in Europe and simulated pure cultures and intercrops of wheat and faba bean throughout Europe. We evaluated the performance of APSIM to simulate these systems.

Performance of APSIM WHEAT for wheat monocultures
APSIM WHEAT could be successfully applied to simulate the yield and the aboveground dry matter of spring wheat in pure cultures in the Netherlands under well-fertilized conditions. This required adjustments of the phenological and yield component parameters with respect to those of the APSIM WHEAT default cultivar. Given that APSIM had mostly been calibrated and validated in subtropical and tropical areas for wheat, our results demonstrate the robustness of the APSIM WHEAT module in other climatic regions. The newly calibrated APSIM WHEAT was also capable of reproducing the biomass and yields of wheat in pure cultures of the DIVERSify field trials across Europe reasonably well, after the phenological parameters were adjusted according to the location.

Performance of APSIM FABABEAN in pure cultures
Similarly to APSIM WHEAT, APSIM FABABEAN mainly required adjustments of the phenological parameters to simulate the biomass production of faba bean in the Netherlands. Making phenology independent of daylength was an essential part of the calibration. Despite the calibration, the resulting APSIM FABABEAN performance was mediocre at best in reproducing the aboveground dry matter from pure cultures of the DIVERSify field trials across Europe and poor for grain yields (Fig. 6). This indicates that the model for biomass partitioning to the grains of APSIM FABABEAN needs to be further improved. We note that APSIM WHEAT calculates the number of grains from the stem weight per plant at flowering and then determines the increase in the weight per grain from the grain filling rate. Grain filling stops once either the thermal requirement of grain filling has been fulfilled or a certain weight per grain has been reached (Zheng et al., 2015). In contrast, APSIM FABABEAN assumes that harvest index is 0 until flowering and then starts to increase linearly with time until either the thermal requirement of grain filling or a preset maximum harvest index has been reached. Hence, unlike APSIM WHEAT, this approach does not consider the grain number and requires parameters (like daily increase of the harvest index and the maximum harvest index) that are difficult to obtain and can depend on both cultivar and the environment where the plant is grown.
Table 4 (rows recovered from extraction; columns: parameter, description, two sets of values, unit)
[...] | Vector with phenological stages for which y_frac_leaf contains a corresponding value | 1,2,3,4,5,5.4,6,6.9,7,8,9,10,11 | 1,2,3,4,5,5.4,6,6.9,7,8,9,10,11 | -
y_frac_leaf | Vector with phenology dependent fractions of newly produced biomass partitioned to leaves | 0,0,0.6,0.6,0.6,0.42,0,0,0,0,0,0,0 | 0,0,0.45,0.45,0.45,0.1,0,0,0,0,0,0 | -
x_stem_wt | Vector with phenological stages for which y_stem_wt contains a corresponding value | 0,6 | 0,1.56 | g
y_stem_height | Vector with stem weight dependent heights | 0,1500 | 0,730 | mm
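The harvest-index rule described for APSIM FABABEAN can be illustrated with a short sketch. It is a simplification under stated assumptions: the thermal requirement of grain filling is expressed here as a fixed number of days rather than as thermal time, and the function name and parameter values are hypothetical, not those of the APSIM source code or of the calibrated cultivar.

```python
def harvest_index(days_after_flowering, hi_incr, hi_max, grain_fill_days):
    """Harvest index under a linear-increase rule with two stopping criteria."""
    if days_after_flowering <= 0:
        return 0.0                        # no grain before flowering
    # Stop increasing once grain filling is complete (duration criterion)
    days = min(days_after_flowering, grain_fill_days)
    # Cap at the preset maximum harvest index
    return min(hi_max, hi_incr * days)

# Hypothetical parameters: 0.01 per day, maximum 0.5, 60 days of grain filling
hi_mid = harvest_index(30, hi_incr=0.01, hi_max=0.5, grain_fill_days=60)
hi_end = harvest_index(100, hi_incr=0.01, hi_max=0.5, grain_fill_days=60)
# Grain yield then follows as harvest index times aboveground dry matter
grain_yield = hi_end * 8000.0             # kg ha-1, hypothetical biomass
```

The sketch makes the limitation discussed above visible: yield depends only on total biomass and on two empirical parameters, with no representation of grain number.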
In the light of these results, the APSIM FABABEAN parameters that determine the harvest index and the biomass partitioning need to be determined for a broader range of cultivars and growing conditions. Additionally, it may be necessary to adjust the way in which APSIM FABABEAN calculates the grain dry matter production. APSIM FABABEAN performed considerably better in predicting the plant heights in pure cultures than APSIM WHEAT at most locations, except for Taastrup in 2017, where it strongly underestimated the plant height at both high and low management input treatments (Fig. 8).
Fig. 4. Simulated and measured biomass partitioning and leaf area index in faba bean for the experiments from Kropff (1989) that were used to calibrate (A-B) and validate (C-F) APSIM FABABEAN. Measured fruit (combined pod and grain), stem and leaf biomass are represented as red diamonds, blue circles, and green triangles, respectively. Simulated fruit, stem and leaf biomass are represented as red solid lines, blue dashed lines, and green dotted lines, respectively (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).

Performance of APSIM WHEAT and APSIM FABABEAN in intercrops
APSIM had limited capabilities to reproduce the yields of wheat-faba bean intercrops. This is not surprising as APSIM FABABEAN poorly reproduced the faba bean yields in pure cultures. However, APSIM also performed poorly in reproducing the total aboveground dry matter of both species when intercropped. It underestimated the total aboveground dry matter of wheat for each DIVERSify field trial and overestimated the total aboveground dry matter of faba bean. A possible explanation of this pattern is the way that APSIM simulates crop height. APSIM assumes that the crop height is proportional to the stem dry weight per plant until grain filling starts or until the plant has reached its maximum height (i.e. the last element of the parameter vector y_height; Table 4). After that, the plant height remains constant. This can work reasonably well for pure cultures. However, the direct link between crop height and stem production may be problematic in intercrops, as the plant heights of the two species grown in intercrop can differ, unlike in a pure culture. Since APSIM lacks a mechanism that allows plants to enhance the growth of the stem if they are shaded (Knörzer et al., 2011), the shorter crop species, which was almost always wheat in this study, will be increasingly shaded by the taller species over time without having a mechanism to adapt. This will result in an overestimation of the competitive ability of the taller crop (faba bean in our case) to intercept radiation relative to wheat (Fig. 8). This likely explains why wheat aboveground dry matter is systematically underestimated and faba bean aboveground dry matter is systematically overestimated.
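The height rule described above can be sketched in Python as follows. This is a simplified illustration, not APSIM source code: the function names are hypothetical, the lookup is assumed to be a piecewise-linear interpolation between the points of two parameter vectors (named here after x_stem_wt and y_stem_height), and the numbers in the usage note are illustrative only.

```python
def height_from_stem_weight(stem_wt_g, x_stem_wt, y_stem_height):
    """Interpolate plant height (mm) from stem dry weight per plant (g).

    Heights are clamped to the first and last elements of y_stem_height,
    so the last element acts as the maximum plant height.
    """
    if stem_wt_g <= x_stem_wt[0]:
        return float(y_stem_height[0])
    points = list(zip(x_stem_wt, y_stem_height))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if stem_wt_g <= x1:
            return y0 + (y1 - y0) * (stem_wt_g - x0) / (x1 - x0)
    return float(y_stem_height[-1])


def simulate_height(stem_weights_g, grain_fill_started,
                    x_stem_wt, y_stem_height):
    """Return daily heights; height is frozen once grain filling starts."""
    height = 0.0
    heights = []
    for stem_wt, filling in zip(stem_weights_g, grain_fill_started):
        if not filling:
            height = height_from_stem_weight(stem_wt, x_stem_wt, y_stem_height)
        heights.append(height)
    return heights
```

With `x_stem_wt = [0, 6]` and `y_stem_height = [0, 1500]`, a plant with 3 g of stem dry weight is assigned a height of 750 mm. Because height follows stem weight alone, a shaded plant that accumulates less stem biomass is also simulated as shorter, which is the feedback the text identifies as problematic in intercrops.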
The performance of APSIM in intercrops could thus be substantially improved by assuming that plant height growth is independent of the stem weight per plant in both APSIM WHEAT and APSIM FABABEAN, for instance by implementing in the source code a logistic relation between thermal time from emergence and height growth (Kropff and Van Laar, 1993; Gou et al., 2017b). Another solution would be to implement a relationship in APSIM that increases the partitioning of dry matter to the stem of one species if that species is shaded by its companion species. This solution would also require changes in the APSIM source code. An alternative approach that does not require a change to the way in which APSIM simulates plant growth would be to calibrate the model directly in intercrops (Chimonyo et al., 2016). While this may improve the performance of APSIM in intercrops, it still does not explicitly consider a potential adjustment in plant features caused by interspecific competition. Crop and cultivar parameters then depend not only on the crop and cultivar traits, but also on the cropping system in which they are grown. This will likely make it hard to use the parameter values of a cultivar that were calibrated in APSIM for one cropping system to simulate another system.

[Table fragment: phenological parameters. The column headers and the label and leading values of the first row are missing from the source.]
(name missing) 393 248 237 215 201 237 357
y_tt_end_of_juvenile 342 342 216 106 187 175 206 316
tt_flowering 208 208 131 125 114 106 125 181
tt_start_grain_fill 535 535 338 223 293 273 322 467
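A logistic relation between thermal time from emergence and plant height, as proposed above, could look like the following Python sketch. The functional form is the standard logistic curve; the function name, the parameter names (`h_max_mm`, `rate`, `tt_mid`) and the values in the usage note are assumptions for illustration and are not taken from APSIM or from the cited studies.

```python
import math


def logistic_height(tt_since_emergence, h_max_mm, rate, tt_mid):
    """Plant height (mm) as a logistic function of thermal time.

    h_max_mm: asymptotic maximum height (mm), hypothetical parameter
    rate:     steepness of the curve (per degree-day), hypothetical
    tt_mid:   thermal time at which half of h_max_mm is reached
    """
    return h_max_mm / (1.0 + math.exp(-rate * (tt_since_emergence - tt_mid)))
```

For instance, `logistic_height(500, 1000, 0.01, 500)` returns 500.0, half of the assumed maximum height. In this formulation height depends only on thermal time, so a species that is shaded and produces less stem biomass would still gain height, unlike in the stem-weight-based rule.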

Conclusions and recommendations for future research
Crop growth models offer a powerful tool to explore the performance of a variety of species and management choices under different pedoclimatic conditions, including future climates and the use of yet-to-be-bred varieties. For intercropping, this means quantifying the net effects of intra- and inter-specific competition between the two species and varieties, and determining when facilitation effects prevail over competition, leading to higher yields than the corresponding pure crops and, in general, reducing the negative environmental effects of agriculture. This is a necessary step when aiming to use intercropping as one measure to support the ecological intensification of agriculture. A clearer understanding of the advantages and disadvantages can also facilitate the uptake of intercropping by farmers; currently, this is not a preferred diversification practice for most farmers (Kleijn et al., 2019). Further, by testing different parameters and aspects of crop interactions, models can provide insight into the mechanisms driving intra- and inter-specific competition in field crops. Yet, relatively few crop growth models have been tested for their ability to reproduce intercrops under a variety of pedoclimatic conditions. We successfully calibrated and validated APSIM WHEAT and APSIM FABABEAN on field trials of pure cultures of wheat and faba bean in the Netherlands. We then adjusted the phenological parameters of APSIM WHEAT and APSIM FABABEAN to various locations in Europe, in order to simulate field trials of pure cultures and intercrops of wheat and faba bean.
In pure cultures, APSIM WHEAT performed well in reproducing the observed grain yields and aboveground dry matter. APSIM FABABEAN was, to a certain extent, capable of reproducing most of the aboveground dry matter observations of pure cultures, but it performed poorly for grain yields, indicating a limited capability of APSIM FABABEAN to simulate partitioning to grain under European growing conditions. In wheat-faba bean intercrops, APSIM WHEAT systematically underestimated the aboveground dry matter of wheat and APSIM FABABEAN systematically overestimated the aboveground dry matter of faba bean.
Further evaluation of the results suggested some possible explanations for the limited performance of APSIM in simulating growth and yield of faba bean in pure stands and intercrops. We recommend further investigation of the following aspects:
1) Can the performance of APSIM FABABEAN in simulating yield be improved by examining how the harvest-index-related parameters differ among cultivars and regions in Europe and/or by making the model for biomass partitioning to grains more mechanistic?
2) As competition for light is a key driver in the model, the simulation of crop height is decisive for the partitioning of light and the growth of species in intercrops. In APSIM, height growth is linked to biomass growth. A model for height growth using a logistic growth function could improve the capability of APSIM WHEAT and APSIM FABABEAN to simulate dry matter production in wheat-faba bean intercrops. Alternatively, APSIM could be extended with a module that increases the biomass partitioning to the stem of one species in an intercrop if it is shaded by its companion species.
These aspects directly link to mechanisms currently not well captured by APSIM. Their proper inclusion is a necessary step to improve APSIM performance at least in the cereal-legume intercrop system and under the environmental conditions considered here. Only a model able to realistically represent pure cultures and intercrops using different cultivars and under a variety of pedoclimatic conditions such as those covered by our data would allow the design and evaluation of ecologically intensive cropping systems based on intercrops.

CRediT authorship contribution statement
Herman N.C. Berghuijs