Simulation of crop yield using the global hydrological model H08 (crp.v1)

. Food and water are essential for life. A better understanding of the food–water nexus requires the development of an integrated model that can simultaneously simulate food production and the requirements and availability of water resources. H08 is a global hydrological model that considers human water use and management (e.g., reservoir operation and crop irrigation). Although a crop growth sub-model has been included in H08 to estimate the global crop-specific calendar, its performance as a yield simulator is poor, mainly because a globally uniform parameter set was used for each crop type. Here, 5 through country-wise parameter calibration and algorithm improvement, we enhanced H08 to simulate the yields of four major staple crops: maize, wheat, rice, and soybean. The simulated crop yield was compared with the Food and Agriculture Organization (FAO) national yield statistics and the global data set of historical yield for major crops (GDHY) gridded yield estimates with respect to mean bias (across nations) and time series correlation (for individual nations). The improved simulations showed good consistency with FAO national yield. The mean biases of the major producer countries were 10 considerably reduced to -4%, 3%, -1%, and 1% for maize, wheat, rice, and soybean, respectively. The corresponding coefficients of determination (R 2 ) of the simulated and FAO statistical yield increased from 0.01 to 0.98, 0.21 to 0.99, 0.06 to 0.99, and 0.14 to 0.97 for maize, wheat, rice, and soybean, respectively; the corresponding root mean square error (RMSE) decreased from 7.1 to 1.1, 2.2 to 0.6, 2.7 to 0.5, 2.3 to 0.3 t/ha. Comparison with the reported performances of other mainstream global crop models revealed that our improved simulations have comparable ability to capture the temporal yield variability

where  is a crop-specific parameter of radiation use efficiency,  is photosynthetically active radiation, and  is the crop regulating factor. is calculated using shortwave radiation () (W m −2 ) and leaf area index (), as follows: where  is the fraction of growing season when growth declines, dpl1 and dpl2 are shape parameters of the LAI growth 120 curve (see the definition in Table 1 in Ai et al., 2020), and blai is the maximum leaf area index.
is calculated as:  = min(, , , ) where , , , and  are the stress factors for temperature, water, nitrogen, and phosphorous, respectively.The details of water and temperature stress are provided in the work of Ai et al. (2020).Nitrogen and phosphorous stress were not considered 125 because of the lack of available information regarding fertilizer application (Hanasaki et al., 2008a).
The aboveground biomass () (kg ha −1 ) is estimated with the accumulated biomass (∑ ∆) as: where ℎ is the heat unit index, which is calculated as the ratio of accumulated daily heat units ∑ () and the potential heat unit (): The daily heat units Huna(t) is expressed as the difference between the daily mean air temperature ( 5 ) and the the crop's specific base temperature (Tb; provided as a crop-specific parameter): The crop yield () (kg ha −1 ) is finally estimated from the aboveground biomass () using the crop-specific harvest index

135
() on the date of the harvest as: where  is the ratio of  (accumulated actual plant evapotranspiration in the second half of the growing season) to SWP (accumulated potential evapotranspiration in the second half of the growing season): 140 Differences in crop type are expressed by the differences in crop parameters (e.g., be, blai, and Tb).Currently, the crop submodel can simulate the yield for 18 food crops.The globally uniform default parameters for the food crops were collected from the default parameters of the SWIM model (Krysanova et al., 2000).

Algorithm improvement
Here, the crop sub-model was improved as follows.First, the effects of CO2 fertilization and vapor pressure deficit change on 145 radiation use efficiency were added to the H08 crop sub-model, using the equations and parameters adopted in SWAT (Neitsch et al., 2011;Arnold et al., 2013).In general, elevated CO2 has a positive impact on crop yield, whereas increased vapor pressure deficit has a negative impact.The CO2 fertilization effect is larger for C3 crops (e.g., wheat, rice, and soybean) than for C4 crops (e.g., maize).Specifically, the radiation use efficiency (be) is adjusted according to the concentration of CO2 as:

150
where be is the radiation use efficiency,  3 is the CO2 concentration in the atmosphere (ppmv),  % and  3 are shape coefficients.
where  35LJ is the ambient atmospheric CO2 concentration (ppmv),  3-6 is an elevated atmospheric CO2 concentration 155 (ppmv),  5LJ is the be of the crop at  35LJ , and  -6 is the be of the crop at  3-6 .
Additionally, the be is adjusted with the vapor pressure deficit (vpd) (kPa) as:

160
where  O#"P% is the be for the plant at a vpd of 1 Kpa, Δ "Q$ is the rate of be decline per unit increase in vpd,  :-I is the threshold vpd above which a plant will exhibit reduced radiation use efficiency. :-I is assumed to be 1 Kpa.

Parameter calibration
Next, we calibrated the key parameter of maximum leaf area index () and adjusted harvest index () accordingly countries in the world, the historical annual crop yield from FAO shows an apparent increasing trend.Hence, the usual way of splitting data into two periods (i.e., former for calibration, the latter for validation) didn't work.Therefore, we used the mean of even years for calibration and the mean of odd years for confirmation.Specifically, we calibrated the maximum leaf area index by iterating the values from 0.5 to 7.1, with an interval of 0.3, under both rainfed and irrigation conditions in the even 170 years from 1986 to 2015.The crop-specific best maximum leaf area index in each country was then determined as the value that can minimize the bias between the mean simulated yield and mean FAO statistical yield.When FAO statistical yield or simulated yield data are missing for a country, we took the original crop-specific default values.The calibration and confirmation results showed good agreement with the FAO statistics (Fig. S1). 175

Reference yield data
To calibrate and validate the simulated crop yield, several yield data sets with different spatial resolutions were collected.The country-level yield data from FAO (available at https://www.fao.org/faostat/en/#data,final accessing date is May 9, 2022) and 185 grid-level (0.5°) yield data from the Global Dataset of Historical Yield (GDHYv1.2+v1.3)(Iizumi et al., 2020) (available at https://doi.pangaea.de/10.1594/PANGAEA.909132) for the period of 1986 to 2015 were used to evaluate model performance.
FAO statistical yield was reported as fresh matter, whereas the model simulated yield denotes the dry matter.For consistency in the comparisons, as reported by Farder et al. ( 2010) and Müller et al. (2017), the FAO statistical yield was converted to dry mater with a crop-specific factor (e.g., 0.88, 0.88, 0.87, and 0.91 for maize, wheat, rice, and soybean) in accordance with 190 Wirsenius (2000).The global data set of historical yield for major crops (GDHY) yield data is a spatially explicit data set that converts the FAO annual national statistical yield to grid-level yield based on gridded net primary production estimated from several satellite products (Iizumi et al., 2020).The FAO statistical yield and GDHY yield provide valuable information for evaluation of crop model performances at country and grid levels, respectively (Müller et al., 2017;Iizumi et al., 2020). 195

Simulation setting and yield processing
After algorithm improvement and parameter optimization, two different simulations for maize, wheat, rice, and soybean were run under both rainfed and irrigation conditions from 1986 to 2015 on a daily scale.The simulation was performed with the default model and the improved model under the assumption that the four crops were planted and harvested in a hypothetical cropland of each grid cell.Under rainfed condition, the crop growth was subject to water stress; under irrigation condition,

200
there was no effect of water stress on crop growth.The yield processing is as follows: First, the gridded yield (Yld) was aggregated from simulated yield as follows: where  I56/ and  6II6 are the simulated yield under rainfed and irrigation conditions, respectively To further validate the above conjecture, we investigated the impacts of climate variables (i.e., precipitation and air temperature) on interannual yield variation by analyzing the correlations of total precipitation/mean air temperature in the growing season with the annual yield per crop.Using maize as an example (Fig. 4), there were no statistically significant relationships (p > 255 0.05) between precipitation and FAO statistical yield for most of the top 20 largest producer countries (17/20).Significant positive correlations between precipitation and the FAO statistical yield (p < 0.05) were found in only three countries: Romania, Hungary, and Serbia.The crop yield estimation relies on water availability; therefore, the variation in yield simulation largely reflects variation in precipitation.Accordingly, we observed good simulation performance in those three countries (Fig. 2) with a clear correlation between FAO yield and precipitation (Fig. 4).Also, there were no statistically significant relationships 260 between air temperature and FAO statistical yield for most of the top 20 largest producer countries (12/20) (Fig. 5).Similarly, there were no statistically significant correlations between precipitation/air temperature and FAO statistical yield in most countries for wheat, rice, and soybean (see Supplementary Figs.S5-10).

265
Spatially explicit yield data enabled us to more fully evaluate the spatial distribution of model simulations.We compared the spatial distribution between simulated crop yield (before and after improvement) and the GDHY yield data set.Using maize as an example, apparent overestimation was detected in many parts of the world (e.g., China, Argentina, Brazil, India, Indonesia, Thailand, Mexico, and most countries in Africa) in the default simulation (Fig. 6a).In contrast, the improved simulation (Fig. 6b) showed a spatial pattern similar to the GDHY yield data (Fig. 6c).For the yields of wheat, rice, and 270 soybean, the spatial distribution after improvement also showed a pattern similar to the GDHY yield data (Supplementary Figs.

S11-13).
In accordance with the method of Müller et al. (2017), we conducted grid-level time series analysis of the correlations of the detrended yield between simulated and GDHY data (Fig. 7) to further identify the differences in the two yield data sets.Using 275 maize as an example (Fig. 7a), statistically significant correlations (p < 0.1) were observed in a wide of range of regions (e.g., northeast USA, southern Europe, northeastern China, southern Brazil, eastern Argentina, southern Africa, and eastern Australia) (Fig. 7a).Notably, there were also substantial differences in a considerable number of locations without statistically significant correlations (p > 0.1) (e.g., southeastern USA, western and central Asia, Brazil, and central Africa) (Fig. 7a).Similar characteristics were found for wheat, rice, and soybean (Fig. 7b-d).

280
Such similarities or discrepancies between two yield data sets have been observed previously (see Fig. 9 in Müller et al., 2017).
For example, there were statistically significant correlations (p < 0.1) and no statistically significant correlations (p > 0.1) between two data sets developed by Iizumi et al. (2014b; an earlier version of GDHY used in this study) and Ray et al. (2012) in a wide of regions.Such comparisons can help to identify considerable disagreements in global estimates of the spatial 285 distribution of crop yield (Kim et al., 2021).Because it is difficult to determine whether one of these estimates is better than the others, the disagreement between our simulation and the GDHY data does not necessarily indicate that our simulation quality is low.

Limitations 290
Although crop yield simulations were improved, there were several limitations because of the assumptions, methods, and data sets used in this study.First, in accordance with the methods of previous studies (Müller et al., 2017;Jägermeyr et al., 2021), yield calculation and aggregation were conducted with the assumption that the irrigated harvest area and total harvest area per crop did not change throughout the study period; this assumption was based on data availability.However, these aspects do  2010), rather than using finer spatial scale (e.g., subnational or gird-level), which increased the uncertainty of the yield simulations within each country.As shown in Figure 6, the yield distribution is highly variable within a specific country.To incorporate the spatial heterogeneity in crop yield, ideally, parameter calibration should be conducted at grid-cell level (e.g., Iizumi et al., 2009;Sakurai et al., 2014).Although this approach has long-term promise, it is technically 300 challenging because of uncertainty in the global gridded yield products and the potential for inflation in the parameter optimization calculation.In addition, the calibrated parameter reflected the mean average sate, therefore might ignore the yearby-year variation.Third, the reference data set from GDHY does not represent purely observation-based yield, therefore, it is subject to errors or uncertainty resulting from its own methodology (e.g., errors in gross primary production and crop stress response) (Müller et al., 2017).Nonetheless, at current stage, both FAO and GDHY data sets remain good references for 305 evaluating the performances of crop models, as suggested or widely used in previous studies (Müller et al., 2017;Iizumi et al., 2020;Jägermeyr et al., 2021).Finally, our crop model is a simple model that does not fully represent the factors influencing crop growth.For example, we did not explicitly simulate N and P processes, although these effects are now reflected in the calibrated parameters (Fader et al., 2010).Additionally, the waterlogging effect is underrepresented in most crop models, including our model (Jägermeyr et al., 2021).Such physical mechanisms should be addressed in the development of future 310 models.

Case study to estimate the contribution of irrigation to global food production
Finally, to demonstrate the improved model can be applied for various food-water nexus study, a well-recognized study by Döll and Sibert (2010) which estimated the contribution of irrigation on global food production is revisited and traced.To

315
trace their work, a global crop yield model is needed which is capable to estimate crop yield reasonably well and deal with the effect of irrigation explicitly.
Irrigation plays a critical role in global food production.The literature usually indicates that approximately 40% of global total food production is from irrigated land (Postel et al., 2001;Siebert et al., 2005;Abdullah et al., 2006;Khan et al., 2006;Wada 320 et al., 2013;Perrone et al., 2020;Ringler et al., 2020;Borsato et al., 2020), but the rationale and country-specific variation have not been fully explained.To our knowledge, Postel (1992) reported one of the first estimates, whereby approximately 36% of the global food production was from irrigated land based on statistical data.Then, Siebert and Doll (2010) reported that irrigation contributed to approximately 33% of the global total production.Here, we revisited the irrigation contributions for global production of maize, wheat, rice, and soybean using our improved model.Irrigation contribution in percentage (I)

325
in a country (c) is defined as: *100%, where  6II6,Q and  I56/ ,  are the irrigated and rainfed yields for a country, respectively;  6II6,Q and  I56/,Q are the total irrigated and rainfed harvest areas for a country, respectively.
https://doi.org/10.5194/gmd-2022-285Preprint.Discussion started: 20 January 2023 c Author(s) 2023.CC BY 4.0 License.should be developed in future studies.Second, our calibration was conducted at the national scale in accordance with the method of Fader et al. (

Fig. 1 Fig. 2
Fig. 1 Comparison of mean simulated yield and mean FAO yield for the top 20 largest producer countries from 1986 to 2015.Default and improved indicates simulation using the default and improved model, respectively.Dashed green and yellow lines indicate ±10% and ±20% differences, respectively.SIM denotes simulated yield and FAO denotes reported yield from FAO. Panel (a) for maize, (b) for wheat, (c) for rice, and (d) for soybean, respectively.

Fig. 4
Fig. 4 Relationship between maize yield (blue: simulated; red: FAO) and total precipitation in the growing season from 1986 to 2015 for the top 20 largest producer countries.Y, yield; P, precipitation; R, correlation coefficient.520

Fig. 5
Fig. 5 Relationship between maize yield (blue: simulated; red: FAO) and mean air temperature in the growing season from 1986 to 2015 for the top 20 largest producer countries.Y, yield, T, air temperature; R, correlation coefficient.

Fig. 8
Fig. 8 Percentages of irrigation contribution to the global production of maize, wheat, rice, and soybean, respectively.