Calibration of the Crop model in the Community Land Model


1 Introduction

Development of Earth system models (ESMs) is a challenging process, involving complex models, large input datasets, and significant computational requirements. As models evolve through the introduction of new processes and through improvement of traditional algorithms, the ability of the models to accurately simulate feedbacks between coupled systems improves, although results may not have the desired impact on all areas. For example, Lawrence et al. (2012) estimate that changes to the hydrology parameterization may be responsible for the warm bias in high-latitude soils in the Community Land Model (CLM) version 3.5 becoming a cold bias in CLM4.0. Although testing of ESMs is extensive, ensuring that the model still performs with limited (if any) degradation after new developments are merged, on rare occasions model behavior can be negatively affected. The strong nonlinearity of such models also makes parameter fitting a difficult task; and as global models are developed by several different user groups simultaneously, combinations of multiple alterations make identifying the specific cause that leads to of how climate will affect crop production and resulting carbon fluxes, and additionally, how cultivation will impact climate.
2 The CLM-Crop model

CLM-Crop was designed and tested in the CLM3.5 model version (Drewniak et al., 2013). The crop model was created to represent crop vegetation similarly to natural vegetation for three crop types: maize, soybean, and spring wheat. The model simulates GPP and yield driven by climate, in order to evaluate the impact of climate on cultivation and the impact of agriculture on climate. Crops are modeled within a grid cell shared with natural vegetation; however, the two are independent (i.e., they do not share the same soil column). This approach allows management practices, such as fertilizer, to be administered without disturbing the life cycle of natural vegetation.
Although the design of the crop model fits within the framework of natural vegetation, crops have a significantly different growing scheme, separated into four phases: planting, emergence, grain fill, and harvest. Each phase of growth changes how carbon and nitrogen are allocated to the various plant parts: leaves, stems, fine roots, and organs. During planting, carbon and nitrogen are allocated to the leaf, representative of seed. This establishes a leaf area index (LAI) for photosynthesis, which begins during the emergence phase. The emergence phase allocates carbon and nitrogen to leaves, stems, and roots using functions from the Agro-IBIS model (Kucharik and Brye, 2003). During the grain fill stage, less carbon is allocated to leaves, stems, and roots in order to fulfill organ requirements. When maturity is reached, harvest occurs: all organs and 60-70% of the leaves and stems are harvested, and the remaining leaves, stems, and roots are transferred to the litter pool.
The allocation of carbon to each plant part is driven largely by the carbon-nitrogen (CN) ratio parameter assigned to each plant segment. CLM first calculates the potential photosynthesis for each crop type based on the incoming solar radiation and the LAI. The total nitrogen needed to maintain the CN ratio of each plant part is calculated as plant demand. If soil nitrogen is sufficient to meet plant demand, potential photosynthesis is met; however, if soil nitrogen is inadequate, the total amount of carbon that can be assimilated is downscaled.
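The nitrogen limitation described above can be illustrated with a minimal sketch. It assumes a simple proportional downscaling of assimilated carbon when soil nitrogen falls short of demand; the function name, the allocation fractions, and the CN ratios are hypothetical and are not taken from the CLM source code or from Table 1.

```python
def nitrogen_limited_carbon(potential_c, alloc_fraction, cn_ratio, soil_n):
    """Return (assimilated carbon, nitrogen demand) for one time step.

    Assumption: carbon uptake is scaled by the ratio of available soil
    nitrogen to plant demand whenever demand exceeds supply.
    """
    # Nitrogen needed to keep each plant part at its prescribed CN ratio.
    demand = sum(potential_c * alloc_fraction[part] / cn_ratio[part]
                 for part in alloc_fraction)
    # Full potential photosynthesis if supply covers demand, else downscale.
    scale = min(1.0, soil_n / demand) if demand > 0.0 else 1.0
    return potential_c * scale, demand


# Illustrative values only.
fractions = {"leaf": 0.5, "stem": 0.5}
cn = {"leaf": 25.0, "stem": 50.0}
limited_c, demand = nitrogen_limited_carbon(10.0, fractions, cn, soil_n=0.15)
unlimited_c, _ = nitrogen_limited_carbon(10.0, fractions, cn, soil_n=1.0)
```

With these illustrative numbers the demand is about 0.3 units of nitrogen, so a supply of 0.15 roughly halves the assimilated carbon, while a supply of 1.0 leaves potential photosynthesis unchanged.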
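During the grain fill stage, a nitrogen retranslocation scheme is used to fulfill nitrogen demands by mobilizing nitrogen in the leaves and stems for use in organ development. This scheme uses alternate CN ratios for the leaf and stem to determine how much nitrogen is transferred from the leaves and stems into a retranslocation storage pool. The total nitrogen transferred at the beginning of the grain fill stage from the leaf and stem is represented by

$$N_{\mathrm{retrans,leaf}} = \frac{C_{\mathrm{leaf}}}{\mathrm{leafcn}} - \frac{C_{\mathrm{leaf}}}{\mathrm{fleafcn}},$$ (1)

$$N_{\mathrm{retrans,stem}} = \frac{C_{\mathrm{stem}}}{\mathrm{stemcn}} - \frac{C_{\mathrm{stem}}}{\mathrm{fstemcn}},$$ (2)

where $N_{\mathrm{retrans,leaf}}$ and $N_{\mathrm{retrans,stem}}$ are the nitrogen transferred from the leaf and stem, respectively; $C_{\mathrm{leaf}}$ and $C_{\mathrm{stem}}$ are the total carbon in the leaf and stem, respectively; leafcn and stemcn are the pre-grain fill CN ratios for the leaf and stem; and fleafcn and fstemcn are the post-grain fill CN ratios for the leaf and stem. All of the CN ratios are fixed parameters which vary with crop type; initial values are reported in Table 1.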
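As a worked illustration of the retranslocation step, the sketch below assumes that the nitrogen moved to the storage pool is the difference between the nitrogen held at the pre-grain fill CN ratio and the nitrogen held at the post-grain fill CN ratio. The function name and the numerical values are hypothetical and are not the Table 1 values.

```python
def retranslocated_nitrogen(carbon, cn_pre, cn_post):
    """Nitrogen released when a plant part moves from cn_pre to cn_post.

    Assumption: carbon is unchanged at the transition, so the released
    nitrogen is carbon/cn_pre - carbon/cn_post.
    """
    return carbon / cn_pre - carbon / cn_post


# Illustrative leaf: 100 units of carbon, CN ratio rising from 25 to 50.
leaf_n = retranslocated_nitrogen(100.0, cn_pre=25.0, cn_post=50.0)
# No transfer occurs if the two ratios are equal.
no_transfer = retranslocated_nitrogen(100.0, cn_pre=25.0, cn_post=25.0)
```

Here the leaf holds 4 units of nitrogen before grain fill and 2 units after, so 2 units become available for organ development. A larger post-grain fill ratio therefore mobilizes more nitrogen.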
In addition to the above, CLM-Crop has a fertilizer application, dynamic roots, and soybean nitrogen fixation, described by Drewniak et al. (2013). Planting date and time to maturity are based on the Crop Calendar Dataset (Sacks et al., 2010). For the calibration procedure, we used the actual planting date reported for the Bondville site for the year 2004. Crops are not irrigated in the model, nor do we consider crop rotation. Although rotation will have an impact on the carbon cycle both above and below ground, CLM does not support crop rotation at this time.
The version of CLM-Crop detailed by Drewniak et al. (2013) was calibrated against AmeriFlux plant carbon measurements at the Mead, NE, and Bondville, IL, sites, for both maize and soybean, using optimization techniques to fit parameters. When available, parameter values were taken from the literature or other models. Remaining parameters were derived through a series of sensitivity simulations designed to match modeled carbon output with AmeriFlux observations of leaf, stem, and organ carbon at the Bondville, IL, site and total plant carbon at the Mead, NE, (rainfed) site.
When CLM-Crop was ported into the CLM4 framework, the parameter values were no longer optimal as a result of various changes in model processes that affected how crops fit into the model framework. Therefore, we needed to retune the model parameters that represent crops, using the more sophisticated approach described later in this paper.

2.1 Parameters affecting the crops
Over 100 parameters are defined in CLM4 to represent crops. Many of these parameters are similar to those that govern natural vegetation, but some are specific to crops. These parameters define a variety of processes, including photosynthesis, vegetation structure, respiration, soil structure, carbon-nitrogen dynamics, litter, mortality, phenology, and more. To add further complication, parameters are assigned in various parts of the model; some parameters are defined in an external physiology file, some are defined in surface datasets, and others are hardcoded in the various subroutines of CLM4.
Performing a full model calibration for all parameters would be a monumental task, so we began our calibration process by narrowing the list down to the parameters that are used only in crop functions or might have a large influence on crop behavior. Within this list, parameter values can be fixed across all vegetation types (or crop types), vary with crop type, or vary spatially and by crop type. We chose to limit the parameters to those that are either constant or vary with crop type.
Crop parameters are nominally taken from the literature (when available), which is also used to define a range of values appropriate for each crop type. In cases when parameters are not available, optimization techniques are used to estimate parameter values based on CLM performance. Determining a full range of acceptable parameter values was difficult for several parameters, and in some cases not possible. Of the full list of parameters in need of calibration, we began our approach with the six parameters listed in Table 1, which have a large influence on crop productivity and have the greatest uncertainty because their values are based on optimization from a previous model version. These six parameters are the carbon-nitrogen ratios for the various plant parts (leaf, stem, root, and organ). Since the leaf and stem account for nitrogen relocation during grain fill, they are each represented by two separate CN ratios, to separate the pre- and post-grain fill stages of plant development. These ratios influence how carbon and nitrogen are allocated, thereby impacting growth, nutrient demand, photosynthesis, and so on, and are included as part of the physiology data file.

2.2 Description of the observational data set
We used observations from the Bondville, IL, AmeriFlux tower located in the midwestern United States (40.01° N, 88.29° W), a site under an annual no-till corn-soybean rotation; a full site description is given by Meyers and Hollinger (2004). The site has been collecting measurements of wind, temperature, humidity, pressure, radiation, heat flux, soil temperature, CO2 flux, and soil moisture since 1996. Soybeans were planted in 2002 and 2004, and corn was planted in 2001, 2003, and 2005. We used daily averaged eddy covariance measurements of NEE and derived GPP in our model calibration procedure. These are categorized as Level 4 data published on the AmeriFlux site and are gap filled by using the Marginal Distribution Sampling procedure outlined by Reichstein et al. (2005). GPP is derived as the difference between ecosystem respiration and NEE, where ecosystem respiration is estimated following Reichstein et al. (2005). In addition, biomass information (which we convert to carbon) and LAI have been collected for years 2001-2005 for the various plant segments, including leaf (LEAFC), stem (STEMC), and organ (ORGANC), and are reported on the AmeriFlux website (http://public.ornl.gov/ameriflux). Biomass measurements are generally taken every seven days, beginning a few weeks after planting and continuing through the harvest. We chose to calibrate against the Bondville AmeriFlux site because of the availability of the unique biomass data collected there. By performing the calibration against site data that include crop rotation, we hope to indirectly include the effects of crop rotation on GPP and NEE in the model.
The time-dependent observations are denoted by $y = \{\mathrm{GPP}, \mathrm{NEE}, \mathrm{ORGANC}, \mathrm{LEAFC}, \mathrm{STEMC}, \mathrm{LAI}\}$. Because of uncertainties in fertilization use and measured data, we focused on the peak observed values as well as the growth slope for GPP, NEE, LEAFC, and STEMC. To remove the atmospherically induced noise in the NEE and GPP measurements, we filtered the time series by applying a moving average operator with a width of 30 days. These operations are denoted by the map

$$H_y(\tilde{y}) = \{\max(|\tilde{y}|),\ \mathrm{slope}(\tilde{y})\}, \qquad Y = H_y(\tilde{y}),$$ (3)

where $\tilde{y}$ represents the filtered $y$ and the slope is calculated at the beginning of the plant emergence phase, resulting in one maximum and one slope per variable per year. The observed GPP and NEE slopes were computed as the slope between the 208th day and the 188th day for 2002 and between the 180th day and the 160th day for 2004. The observed LEAFC and STEMC slopes were computed based on observed values on 7/16-8/13 and 7/23-9/10 for 2002, and 6/8-7/27 and 6/8-8/10 for 2004, respectively. The slopes estimated from numerical simulations were computed as the variable slopes between the date when GPP reaches 0.3 and 20 days ahead of it.
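The filtering and reduction steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the moving average is taken as a centered window that is truncated at the ends of the record, days are identified with list positions, and the function names are hypothetical. The text does not specify these details.

```python
def moving_average(series, width=30):
    """Centered moving average; the window shrinks near the series ends."""
    half = width // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + width - half)
        window = series[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def observation_operator(series, day_start, day_end, width=30):
    """Return (peak of |filtered series|, slope between two days)."""
    smooth = moving_average(series, width)
    peak = max(abs(v) for v in smooth)
    slope = (smooth[day_end] - smooth[day_start]) / (day_end - day_start)
    return peak, slope


# Synthetic daily series: a linear ramp and a constant negative flux.
ramp = [2.0 * day for day in range(100)]
ramp_peak, ramp_slope = observation_operator(ramp, day_start=30, day_end=50)
flat_peak, flat_slope = observation_operator([-3.0] * 60, 10, 30)
```

Taking the maximum of the absolute value means that a flux which is predominantly negative during the growing season still yields a positive peak magnitude.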

2.3 Initial conditions and spinup
CLM requires a spinup to obtain balanced soil carbon and nitrogen pools, which are responsible for driving decomposition and turnover. CLM is spun up by using the method provided by Thornton and Rosenbloom (2005), with crops simulated as grass, such that final soil carbon pools are reflective of natural vegetation. After the initial spinup is complete, grid cells growing crops are converted from grass to represent the appropriate amount of land surface occupied by agriculture. The model is then run for an additional 200 yr to rebalance the soil pools. In this study we spun up the initial litter, carbon, and nitrogen pools by using the default parameter values.
The meteorological forcing data used for the spinup are from the Bondville, IL, flux tower site. The model is run in point mode, meaning that only one grid cell is simulated, at a resolution of 0.5° × 0.5°. Since we do not have the meteorological data necessary to cover the entire spinup period, we cycle continuously through the period of data from 1996 to 2007 available for this site.
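The cycling of the forcing record can be sketched with simple modular arithmetic. The helper below is hypothetical and only illustrates the idea; it does not reproduce how CLM itself selects forcing files.

```python
def forcing_year(spinup_year, first_year=1996, last_year=2007):
    """Map a zero-based spinup year onto the available forcing record."""
    record_length = last_year - first_year + 1
    return first_year + spinup_year % record_length


# Forcing years used for the first 14 years of a spinup.
years = [forcing_year(n) for n in range(14)]
```

With a 12-year record, spinup year 12 wraps around and reuses the 1996 forcing.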

3 Calibration strategy
We represent the CLM-Crop model by f(x; θ), where θ are the time-independent parameters that we wish to calibrate and x are the internal states of the model. We consider different sets of calibration parameters according to their perceived level of uncertainty and importance in the crop development processes. The first set consists of plant-specific physiological parameters: leafcn, fleafcn, fstemcn, organcn, frootcn, and livewdcn (see Table 1 and Sect. 2.1 for details).
The model calibration strategy aims to merge model predictions that depend on parameters θ with observational datasets. Here we denote the model output by F(θ) = H(f(x; θ)), where H is a function that maps the model output to the observation space Y, obtained in the same way as in the procedure described in Sect. 2.2.
We assume that the relationship between the observation data and the true process is of the type

$$y = F(\theta^*) + \varepsilon,$$ (4)

where $\theta^*$ are the perfectly calibrated parameters and $\varepsilon$ represents the observational errors. This holds under the assumption that the model is a perfect representation of reality (Kennedy and O'Hagan, 2001). The problem statement can be extended to account for imperfect models, but then the statistical description of $\varepsilon$ tends to become much more complicated. Therefore, for this study we start by considering a perfect model assumption. Following a Bayesian approach, we assume a prior distribution on the calibration parameters:

$$\pi(\theta) = \exp\left(-K_\theta - \frac{1}{2}(\theta - \bar{\theta})^T \Sigma_\theta^{-1} (\theta - \bar{\theta})\right),$$ (5)

where $\bar{\theta}$ are the default parameters, $K_\theta = \frac{1}{2}\log(\det(\Sigma_\theta)) + \frac{n_\theta}{2}\log(2\pi)$, $\Sigma_\theta$ is the prior covariance, and $n_\theta = \dim(\theta)$. We define the likelihood as

$$\pi(y|\theta) = \exp\left(-K_y - \frac{1}{2}(y - F(\theta))^T \Sigma_{\mathrm{obs}}^{-1} (y - F(\theta))\right),$$ (6)

where $\Sigma_{\mathrm{obs}} = \mathrm{Cov}(\varepsilon)$ and $K_y$ is defined similarly to $K_\theta$. The calibration results in the posterior distribution

$$\pi(\theta|y) \propto \pi(y|\theta)\,\pi(\theta).$$ (7)

We use the Metropolis-Hastings algorithm to estimate the posterior distribution (Chib and Greenberg, 1995). To accelerate and diagnose convergence, we implemented a parallel version of the algorithm that consists of running several Markov chains in parallel while adjusting a Gaussian proposal distribution according to their spread (Solonen et al., 2012; Craiu et al., 2009). This algorithm and the convergence diagnostics are briefly described in Appendix A.
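To make the sampling step concrete, the sketch below implements a plain single-chain random-walk Metropolis-Hastings sampler with a Gaussian prior and a Gaussian likelihood. It is not the parallel adaptive scheme of Appendix A: the covariances are assumed diagonal, the proposal width is fixed, and the forward model `toy_forward` is a hypothetical stand-in for the mapped CLM-Crop output F(θ).

```python
import math
import random


def log_gaussian(residual, variances):
    """Log density of independent Gaussians (diagonal covariance)."""
    k = (0.5 * sum(math.log(v) for v in variances)
         + 0.5 * len(variances) * math.log(2.0 * math.pi))
    return -k - 0.5 * sum(r * r / v for r, v in zip(residual, variances))


def log_posterior(theta, y, forward, theta_prior, prior_var, obs_var):
    """Unnormalized log posterior: log prior plus log likelihood."""
    prior = log_gaussian([t - m for t, m in zip(theta, theta_prior)], prior_var)
    like = log_gaussian([yi - fi for yi, fi in zip(y, forward(theta))], obs_var)
    return prior + like


def metropolis_hastings(log_post, theta0, step, n_samples, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    theta, lp = list(theta0), log_post(theta0)
    chain, accepted = [], 0
    for _ in range(n_samples):
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        lp_new = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
            accepted += 1
        chain.append(list(theta))
    return chain, accepted / n_samples


def toy_forward(theta):
    """Hypothetical two-output model standing in for F(theta)."""
    return [theta[0], 2.0 * theta[0]]


rng = random.Random(0)
target = lambda th: log_posterior(th, [2.0, 4.0], toy_forward,
                                  [1.0], [1.0], [0.01, 0.01])
chain, rate = metropolis_hastings(target, [1.0], 0.1, 5000, rng)
kept = chain[1000:]  # discard burn-in
post_mean = sum(s[0] for s in kept) / len(kept)
```

In this toy problem the observations are far more precise than the prior, so the chain moves from the prior mean of 1 toward a posterior concentrated near 2. Running several such chains and adapting the proposal from their spread, as described in the text, would replace the fixed `step`.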

4 Results
In this section we present our calibration results for the parameters described in Sect. 2.1 by using the observations detailed in Sect. 2.2. In this study we focus only on the CN parameters affecting the soybean crop and restrict our calibration to year 2004. With these calibrated parameters we perform a validation experiment by using the data from year 2002. Moreover, we perform a twin experiment that consists of generating artificial data by using some control parameter values, then perturbing those parameters and applying the calibration strategy to recover the control values.

4.1 Validation of the method
We begin with a twin experiment with the aim of validating the parallel MCMC strategy applied in this study. We generate artificial observations by using the default parameter values and then perturb the parameters by 30% using a normal distribution. We apply the calibration strategy using the perturbed parameters as initial guesses together with the artificial observations; our aim is to recover the default parameters. In Fig. 1a we show the box plot summary of the calibrated parameters. We note an almost perfect fit between the calibrated parameters and their default values, indicating that the method used in this study is appropriate.

4.2 Calibration using real data
In our next experiment we calibrate the six parameters listed in Table 1. The observational operator (Eq. 3) is defined by taking the annual maximum of the absolute value of LEAFC, LAI, ORGANC, STEMC, GPP, and NEE, and the slope of LEAFC, STEMC, GPP, and NEE, as described in Sect. 2.2. We applied the MCMC calibration strategy described in Appendix A.

4.3 Validation of real data results
To validate the generalization potential of our calibration, we perform a one-way validation. We use the plant parameters calibrated against the 2004 soybean data (Sect. 4.2) to predict the observables obtained in 2002. In Fig. 3 we plot the model time series together with the observations and the control output for 2002. Here we note good performance in the success metric established for this study. There is, however, a temporal shift in the 2002 time series that can be attributed to a mismatch in the planting dates. As noted in Sect. 2, the planting date in CLM-Crop is fixed based on actual planting data as reported by AmeriFlux and is therefore not subject to change based on seasonal conditions such as temperature and precipitation. Our analysis focuses on the slope of the growth and the peak of the GPP and carbon, which indeed show much improvement over the results obtained with the default parameter values. In Table 2 we show the sample correlation matrix of the posterior parameter distribution. Here we note that the correlations between parameters are relatively small except for the post-grain fill stem CN and organ CN ratios, which are weakly anticorrelated.
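A sample correlation matrix such as the one reported in Table 2 can be computed directly from the retained MCMC samples. The sketch below uses the standard Pearson estimator; the function name and the small sample sets are illustrative and are not the posterior samples of this study.

```python
import math


def correlation_matrix(samples):
    """Pearson sample correlation matrix of rows of parameter samples."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    # Unbiased sample covariance between every pair of parameters.
    cov = [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(d)]
            for i in range(d)]


# Two illustrative parameter pairs: perfectly correlated and anticorrelated.
corr_pos = correlation_matrix([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
corr_neg = correlation_matrix([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
```

Off-diagonal entries close to zero indicate that the data constrain the corresponding parameters almost independently, whereas a negative entry means that an increase in one parameter can be partly compensated by a decrease in the other.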

5 Discussion
In this paper, we sought to improve CLM-Crop model performance by calibrating a subset of model parameters governing the carbon and nitrogen allocation to the plant components. By using an MCMC approach, we were able to improve the model-simulated GPP, NEE, and carbon biomass of leaf, stem, and organ with the new parameter values. In addition, we demonstrated that the calibrated parameters are applicable across alternative years and not solely representative of one year.
This study does have a few limitations stemming from a lack of observation data. Currently our results are suitable at one site across multiple years; testing at multiple sites would give a better indication of how well the model can perform globally, or even across a region. However, the limited data over agricultural sites constrains our ability to determine parameter values that are relevant at a global scale. Also, our use of fixed planting dates does not allow the model to modify when planting occurs as a farmer would in situ. Thus, the model may plant earlier or later compared to observations, which, if significant, could influence the growth cycle and resulting carbon fluxes. In addition, CLM-Crop does not have crop rotation, which is common across agricultural landscapes, including in the observation dataset. Crop rotation can modify below-ground carbon and nitrogen cycling, which would have an impact on crop productivity through nutrient availability as well as on NEE. While we would like to include crop rotation, CLM does not currently have the capability to support this function. Therefore, we tried to include the effects indirectly by calibrating against data that include crop rotation. As more sophisticated crop representation is introduced into the model, we will revisit the calibration to improve model parameters. Moreover, we considered the initial litter, carbon, and nitrogen pools fixed by the values of the prior parameters because a direct spinup calculation would have made sampling prohibitively expensive. We will address this issue in a future study by including these pools in the calibration procedure.
Our approach has focused on one crop type, soybean, with the intent of determining the effectiveness of the MCMC method in performing parameter calibration. We consider the results promising and, as part of future work, hope to expand this research to additional years, crop types, and other parameters. Many other variables are of interest, including specific leaf area, fertilization rate, timing of the growth stages, respiration rates, and a few other parameters related to photosynthesis. As the model continues to evolve with the addition of new or improved processes, we also may need to revisit the parameter choices and evaluate their appropriateness. Moreover, a calibration procedure carried out for such complex models with relatively little data and a few calibration parameters has the potential to lead to overfitting. In order to assess this effect we performed a validation experiment, which provides good confidence in, albeit not proof of, a robust calibration of the parameters. Richer datasets will likely sharpen the results and tighten the confidence intervals.
The introduction of new datasets documenting agricultural productivity or carbon mass will also allow us to determine the applicability of our new parameter values across regions. In general, the calibration results depend on an accurate specification of the observational errors. In this study we did not have access to any information regarding the measurement process and,


Fig. 1. Calibrated C/N parameters for soybean in 2004 at Bondville, IL (40.01° N, 88.29° W) for (a) the twin experiment using artificial observations and (b) the real data calibration. The solid black line indicates the default values, and the thin red line indicates the median value for the parameter posterior distribution. The median value was used as the final calibrated parameter value.

Table 1. Parameters chosen for calibration.

Table 2. Posterior correlation of the parameters.