Evaluation of a terrestrial carbon cycle submodel in an Earth system model using networks of eddy covariance observations

Improvement of terrestrial submodels in Earth system models (ESMs) is important to reduce uncertainties in future projections of the global carbon cycle and climate. Since these submodels lack detailed validation, evaluation of terrestrial submodels using networks of field observations is necessary. The purpose of this study is to improve an ESM by refining a terrestrial submodel using eddy covariance observations. We evaluated the terrestrial submodel (MOSES2/TRIFFID) included in the UVic Earth System Climate Model (UVic-ESCM) and tested the effects of terrestrial submodel improvements on future projections of the carbon cycle and climate. First, we evaluated the terrestrial submodel in off-line mode at point scales using data from 48 eddy covariance observation sites, and improved it by adjusting model parameters and structures. The improved submodel showed a lower root mean square error and reproduced the seasonal carbon fluxes more closely. Second, using the UVic-ESCM with the improved terrestrial submodel, we confirmed model improvement at most observation sites. The terrestrial submodel refinement also affected future projections; the UVic-ESCM with the improved terrestrial submodel simulated 100 ppmv lower atmospheric CO2 concentration in 2100 compared with the default UVic-ESCM. Our study underscores the importance of refinement of terrestrial submodels in ESM simulations.


Introduction
Earth system models (ESMs), which simulate the coupled cycle of climate and carbon among the atmosphere, land, and ocean, are now widely used to project future changes in climate and carbon cycle due to anthropogenic CO2 emission (e.g. Cox et al., 2000; Lenton, 2000; Joos et al., 2001; Dufresne et al., 2002; Ichii et al., 2003; Kheshgi and Jain, 2003; Zeng et al., 2004; Friedlingstein et al., 2006). The inclusion of the coupled climate-carbon cycle causes a positive climate-carbon cycle feedback effect and amplifies future climate changes on both global and regional scales (e.g. Cox et al., 2000; Friedlingstein et al., 2006; Yoshikawa et al., 2008). These ESMs have also been used to simulate historical changes in the global carbon cycle (e.g. Kato et al., 2009), to analyse future CO2 emission targets in an effort to stabilize the atmospheric CO2 concentration (e.g. Matthews et al., 2005a; Miyama and Kawamiya, 2009; Zickfeld et al., 2009) and to determine the effectiveness of geoengineering schemes (Matthews and Caldeira, 2008).
These ESMs show large uncertainties among models (Friedlingstein et al., 2006) and need further improvement. Through a model intercomparison study, Friedlingstein et al. (2006) pointed out that the uncertainty of the carbon budget and its feedback effects is larger in the terrestrial biosphere than in the ocean. For example, they demonstrated that the projected terrestrial carbon budget differs among models even in sign. Therefore, further improvement of terrestrial submodels is required. One of the potential causes of the uncertainties is inaccurate parametrization of various subprocesses, so more evaluation and calibration of the models against observations are required.
Furthermore, many studies have analysed the effects of the uncertainties in the terrestrial submodels on future climate and carbon cycle projection within the ESM framework (e.g. Ichii et al., 2003; Jones et al., 2003; Thompson et al., 2004; Matthews et al., 2005a; Bala et al., 2006). Jones et al. (2003) analysed the effect of different temperature sensitivities of soil decomposition. Thompson et al. (2004), Matthews et al. (2005a), and Bala et al. (2006) analysed the impact of the setting of the photosynthesis model on future projections. O'Ishi and Abe-Ouchi (2009) tested the impact of biogeographical, biogeochemical and biogeophysical effects under quadrupled CO2 concentrations. Most of these analyses lack validation of the terrestrial submodels. As one exception, Kato et al. (2009) evaluated the seasonality and interannual variability of terrestrial carbon fluxes. They found that the model simulated reasonable seasonal and interannual variation of Net Ecosystem Productivity (NEP) at only four of fifteen validation sites. Therefore, we need to improve the terrestrial submodel in the ESM, and to analyse how site-level performance evaluations and the corresponding model improvements affect a fully coupled ESM.
Eddy covariance observation data have become widely available in recent years (e.g. Baldocchi et al., 2001; Baldocchi, 2008), and these data can be used to evaluate and refine ESMs. Thus, in this study, we evaluated and improved a terrestrial submodel included in an ESM using data from 48 eddy covariance observation sites. Then, we evaluated the effects of improvement of the terrestrial submodel on future projections of global environmental changes based on an ESM. Toward a systematic method of refining ESMs that is also applicable to other ESMs, we first conducted an off-line model experiment and improved the terrestrial submodel alone. Then, by connecting it back to the ESM, we assessed the effect of terrestrial submodel improvements on the ESM simulations at both point and global scales.

Model and data
Matthews et al., 2009; Zickfeld et al., 2009). The UVic-ESCM is an ESM of intermediate complexity that includes coupled processes of climate and carbon cycle among atmosphere, land and ocean. The model consists of a vertically integrated energy-moisture balance atmospheric model coupled to the Modular Ocean Model version 2 (MOM2) ocean general circulation model (Pacanowski, 1995), terrestrial and ocean carbon cycle models, and a dynamic-thermodynamic sea-ice model. The terrestrial submodel consists of a modified version of a simple land surface model, Met Office Surface Exchange Scheme version 2 (MOSES2; Essery and Clark, 2003), and a dynamic vegetation model, Top-down Representation of Interactive Foliage and Flora including Dynamics (TRIFFID; Cox, 2001) (Meissner et al., 2003). The horizontal resolution is 1.8° × 3.6°, and the ocean model has 19 vertical levels. The model has been widely used to project future global environmental changes due to anthropogenic CO2 emission (Matthews et al., 2005b) and to inversely calculate allowable anthropogenic CO2 emission (Matthews and Caldeira, 2008). Details of the terrestrial submodel are given in the next section.
2.1.2. Terrestrial submodel. A coupled model of MOSES2 and TRIFFID (MOSES2/TRIFFID) simulates carbon and water fluxes, such as photosynthesis and evapotranspiration, at an hourly time scale for vegetation and soil. It defines five plant functional types (PFTs): broadleaf tree, needleleaf tree, C3 grass, C4 grass and shrubs. Photosynthesis is calculated as a function of CO2, light, soil moisture, temperature and nutrients using a leaf-level model (Collatz et al., 1991, 1992) with coupled photosynthesis and stomatal conductance (Cox et al., 1999). Using the accumulated carbon fluxes passed from MOSES2, TRIFFID updates the vegetation and soil carbon at a monthly time scale. The vegetation dynamics module in TRIFFID also updates the fraction of each PFT based on the Lotka-Volterra competition equations. Carbon is passed to a single soil carbon pool through litterfall and vegetation mortality. Soil carbon decomposition is controlled by soil temperature, using a Q10 formulation, and by water stress.
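The Q10 temperature dependence of soil carbon decomposition mentioned above can be illustrated with a minimal sketch. The function name, the Q10 value, the reference temperature and the treatment of water stress as a simple multiplier between 0 and 1 are illustrative assumptions, not the settings used in MOSES2/TRIFFID.

```python
def q10_decomposition_rate(base_rate, soil_temp_c, q10=2.0,
                           ref_temp_c=25.0, moisture_factor=1.0):
    """Scale a reference decomposition rate by soil temperature and moisture.

    base_rate       -- rate at the reference temperature (any consistent unit)
    soil_temp_c     -- soil temperature in degrees Celsius
    q10             -- factor by which the rate changes per 10 K (assumed value)
    ref_temp_c      -- reference temperature in degrees Celsius (assumed value)
    moisture_factor -- hypothetical water stress multiplier in [0, 1]
    """
    if not 0.0 <= moisture_factor <= 1.0:
        raise ValueError("moisture_factor must lie between 0 and 1")
    # Q10 formulation: the rate is multiplied by q10 for every 10 K of warming.
    temperature_factor = q10 ** ((soil_temp_c - ref_temp_c) / 10.0)
    return base_rate * temperature_factor * moisture_factor
```

With these assumed defaults, a soil that is 10 K warmer than the reference temperature decomposes twice as fast, and one that is 10 K cooler decomposes half as fast.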
The MOSES2/TRIFFID model was simplified to suit the UVic-ESCM. The MOSES2 used here differs slightly from its original version (Meissner et al., 2003) in its algorithm and structure. The differences are that soil water is represented by a single box, rooting depth is constant for all vegetation types (default: 1 m), and canopy interception of rainfall and its evaporation are not considered. The TRIFFID model is the same as the original one (Cox, 2001) except for some slightly different parameter settings.
The model requires the following daily climate inputs: average temperature (Tave), precipitation (Prec), relative humidity (RH), incoming surface solar radiation (Srad), diurnal temperature range (DTR) and wind speed (Wind). Atmospheric CO2 concentration is also necessary to run the model. For the off-line simulation (see Section 3.1), we extracted the terrestrial submodel from the UVic-ESCM and ran it in off-line mode, forced by external climate data as model inputs. In the ESM simulation (see Section 3.2), these time-varying inputs are passed from the atmospheric submodel.

Site observation data
We used data from 48 eddy covariance observation sites across AmeriFlux, CarboEurope and AsiaFlux to evaluate the terrestrial submodel. We selected sites with few missing data in their time series. From meteorological observations at the eddy covariance observation sites, we also created the climate input data (Tave, Prec, RH, Srad, DTR and Wind) for the MOSES2/TRIFFID model. For the off-line simulation, we used observed climate data from each eddy covariance observation site and long-term climate reanalysis data from the NCEP/NCAR reanalysis (Kalnay et al., 1996), which were extracted at the corresponding grid cell. The NCEP/NCAR reanalysis data were used to extend the period of the observed climate data at each site because the observational periods were short. For Tave and Srad, NCEP/NCAR reanalysis data were corrected using observed daily data based on linear regression. For Prec, daily NCEP/NCAR precipitation data were corrected with a multiplier obtained from the observed monthly total precipitation, which preserves the frequency of rainy days in the NCEP/NCAR reanalysis data. For the other climate inputs, we used NCEP/NCAR reanalysis data directly.
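The precipitation correction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, one call handles one month, and a month with no reanalysis rainfall is returned unchanged because no multiplier can be defined for it.

```python
def scale_daily_precip(reanalysis_daily, observed_monthly_total):
    """Rescale one month of daily reanalysis precipitation to an observed total.

    Every day is multiplied by the same factor, so dry days stay dry and the
    frequency of rainy days in the reanalysis is preserved.
    """
    reanalysis_total = sum(reanalysis_daily)
    if reanalysis_total <= 0.0:
        # Assumption: without reanalysis rainfall there is nothing to rescale.
        return list(reanalysis_daily)
    multiplier = observed_monthly_total / reanalysis_total
    return [p * multiplier for p in reanalysis_daily]
```

For example, a reanalysis month totalling 10 mm that was observed to total 20 mm has every rainy day doubled, while the number of rainy days is unchanged.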

Experiments
To evaluate and improve the terrestrial submodel included in the UVic-ESCM, we conducted two experiments: an off-line experiment and an ESM experiment (Fig. 1). In the off-line experiment, we used the terrestrial submodel forced by observed climate inputs (off-line model run). We evaluated and refined the model at a point scale using eddy covariance observations. In this process, we first tested the default MOSES2/TRIFFID model with eddy covariance observations (default MOSES2/TRIFFID). Next, we refined the MOSES2/TRIFFID model using eddy covariance observations (improved MOSES2/TRIFFID). Then, in the ESM experiment (carbon cycle-climate coupled model run), we used the default and improved MOSES2/TRIFFID model as a terrestrial submodel in the UVic-ESCM. We tested the effects of the terrestrial submodel improvements on the ESM simulations.

Off-line experiments
We conducted off-line simulations using the terrestrial submodel extracted from the UVic-ESCM (Step 1 in Fig. 1). We ran the submodel with the default settings at each eddy covariance observation site using input climate data. Then, we tuned the submodel and its parameters to reproduce the observed evapotranspiration (ET), gross primary productivity (GPP), ecosystem respiration (RE) and NEP. Spin-up was conducted for 1000 yr using the whole 1948-2006 climate data, with the atmospheric CO2 concentration fixed at the 1948 level (310.3 ppmv). After that, we ran the model from 1948 to 2006 with a time-variant CO2 concentration (Etheridge et al., 1998; Keeling et al., 2009).
The terrestrial submodel was evaluated by comparing its outputs with observed ET, GPP, RE and NEP at a monthly time scale. To highlight the effect of the model improvement on outputs, we compared the results from both the default and improved models with observations. We then examined how effective the model improvements were and which processes are potentially problematic in the default model. First, we evaluated the modelled ET, GPP, RE and NEP across all sites by comparing them with observed monthly data. Second, to investigate in more detail, we evaluated the seasonal time series of ET, GPP, RE and NEP at each eddy covariance observation site.
To analyse the effect of the model refinement, we calculated the root mean square error (RMSE) between observed and modelled monthly variations in ET, GPP, RE and NEP from the default and improved models, respectively.
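The RMSE comparison described above can be written as a short sketch. The function names and the dictionary layout (site identifier mapped to RMSE) are illustrative assumptions, not the authors' code.

```python
import math


def rmse(observed, modelled):
    """Root mean square error between paired observed and modelled values."""
    if len(observed) != len(modelled) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - m) ** 2 for o, m in zip(observed, modelled)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def count_improved_sites(rmse_default, rmse_improved):
    """Count sites whose RMSE is lower with the improved model.

    Both arguments map a site identifier to the RMSE of one flux
    (for example monthly GPP) at that site.
    """
    return sum(1 for site in rmse_default
               if rmse_improved[site] < rmse_default[site])
```

Applying `rmse` to the monthly series of each flux at each site, once for the default and once for the improved model, gives the two dictionaries that `count_improved_sites` compares.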

ESM experiments
Using the UVic-ESCM with the default and improved terrestrial submodels, we conducted a two-step ESM experiment to analyse the effect of the terrestrial submodel improvement on the future projection of the global carbon cycle and climate. First, we evaluated the modelled carbon (GPP, RE and NEP) and water (ET) fluxes at eddy covariance observation sites (Step 2 in Fig. 1). We evaluated the monthly variations in carbon and water fluxes extracted from the corresponding grid cell, for the same eddy covariance observation sites as in the off-line experiment. Second, we analysed the effect of the terrestrial submodel refinement on the future projection of the global carbon cycle and climate changes (Step 3 in Fig. 1). By running the UVic-ESCM with the default and improved terrestrial submodels until 2100, we evaluated the differences in the global carbon budget [atmospheric CO2 concentration, terrestrial carbon fluxes and pools (GPP, NEP, biomass and soil carbon pools)] and temperature between these two ESM settings.
The model spin-up and run include three steps: uncoupled model spin-up, coupled model spin-up and coupled model run. First, with the CO2 concentration fixed at the 1850 level, uncoupled spin-up was conducted for 500 yr. Second, coupled spin-up was conducted for another 500 yr. Then, we ran the model from 1850 to 2100 using historical CO2 emission (Marland et al., 2005) and the Intergovernmental Panel on Climate Change (IPCC) Special Report on Emissions Scenarios (SRES) A2 emission scenario.

Tellus 62B (2010), 5

Fig. 1. Overview of the procedure of the ESM improvement in this study (panels: Step 1; Step 2; Step 3, Future Projection of Carbon Budget and Climate). In step 1, we extracted the terrestrial submodel (MOSES2/TRIFFID) from the UVic-ESCM, and we evaluated and refined it using eddy covariance observations. In step 2, we evaluated the effect of terrestrial submodel improvements on the ESM simulation at the eddy covariance observation site. In step 3, we evaluated the effect of terrestrial submodel improvements on the projection of the carbon cycle and climate based on the ESM simulation at a global scale.

Off-line experiments
In the default terrestrial submodel simulation, the model overall underestimated observed monthly ET and carbon fluxes (GPP, RE and NEP) (Fig. 2). Of note, there was no correlation between modelled and observed NEP; the terrestrial submodel failed to reproduce observed NEP.
To investigate this issue in more detail, we examined monthly variation in ET and carbon fluxes (GPP, RE and NEP) at four representative sites from different climatic zones (Fig. 3): Hyytiälä, Finland (FIHyy) as a high-latitude site; Howland Forest, US (USHo2) and Takayama, Japan (JPTak) as mid-latitude sites; and Tapajos KM67, Brazil (BRSa1) as a tropical site. We found that the simulated seasonal variation of water and carbon fluxes deviated largely from the observations at all sites. For ET, the model underestimated the seasonal peak at all sites (Figs 3a, e, i and m). In USHo2 and JPTak, ET was overestimated during the snow season in winter and spring (Figs 3e and i) due to overestimation of snow sublimation. For GPP and RE, the model underestimated the seasonal amplitude (i.e. the peak magnitudes in summer) at most sites. In USHo2 and JPTak, the simulated seasonal peak of GPP was substantially lower than the observed peak (Figs 3b, c, f, g, j, k, n and o). Nevertheless, simulated GPP was overestimated during spring and autumn (Figs 3f and j). In BRSa1, both GPP and RE were underestimated by half of the observed value, and simulated GPP declined during the dry season (from August to November at this site), which is inconsistent with observations (Figs 3n and o). The model also failed to reproduce the seasonal phase and amplitude of NEP (Figs 3d, h, l and p). The major cause is the substantial underestimation of simulated GPP compared with RE. In FIHyy, the seasonal variation pattern of NEP was comparatively well reproduced; however, the peaks were underestimated (Fig. 3d). In USHo2, JPTak and BRSa1, NEP seasonal variations were not reproduced, with negative land carbon uptake during summer (USHo2 and JPTak) and the dry season (BRSa1) (Figs 3h, l and p).
The default submodel simulation showed that the default terrestrial submodel needs substantial refinement to reproduce carbon and water fluxes more accurately. To improve the terrestrial submodel, we set the following guidelines to minimize the ambiguity and subjectivity of the model modification. First, we used the slopes of regression lines and determination coefficients between observed and simulated seasonal water and carbon cycles as criteria for model evaluation. We modified the terrestrial submodel to bring the slopes of regression lines and determination coefficients closer to 1. Second, only the minimum changes were applied to the model. Through a sensitivity analysis of various model parameters, a few influential parameters (shown below) were selected and tuned to fit the observed seasonal water and carbon cycles. Third, changes in model structure were limited to the snow submodel. Since the snow submodel is independent of other components and has few structural degrees of freedom, the objectivity of the submodel modification is mostly retained. These model modifications were conducted by hand, and the best parameters were determined through comparison with seasonal variations of the water and carbon budget. As a result, we applied the following four modifications: (1) the snow sublimation and melting model was modified to prevent anomalously large sublimation during the snow season, (2) the quantum efficiency of photosynthesis and the nitrogen concentration were increased to enhance the magnitude of GPP, (3) the temperature sensitivity of photosynthesis was changed to remove the overestimation of GPP during spring and autumn and (4) a deeper rooting depth (1.5 m) was set to prevent sudden GPP decline by water stress during the dry season in the tropics. Details of the model parameters are given in Table 2. Most of these modifications have been previously reported in other terrestrial ecosystem modelling studies. For example, Ichii et al. (2008) reported that the underestimation of snow cover in the Biome-BGC terrestrial ecosystem model (Thornton et al., 2002) causes biases in the seasonal water and carbon cycle. Zaehle et al. (2005) reported that the photosynthesis efficiency parameter is one of the most sensitive parameters in determining the magnitude of photosynthesis in the Lund-Potsdam-Jena Dynamic Global Vegetation Model (LPJ-DGVM). Ichii et al. (2007, 2009) reported that the setting of rooting depth in a terrestrial ecosystem model significantly affects seasonal patterns in modelled ET and GPP in a seasonally dry environment.
By applying these modifications, the terrestrial submodel was greatly improved (Fig. 2) compared with the results from the default model simulations. For example, the slopes of the regression lines became closer to 1, changing from 0.52 to 0.76 for ET, 0.36 to 0.64 for GPP, 0.48 to 0.71 for RE, and 0.02 to
Refinement of the terrestrial submodel also improved the simulation of the seasonal variation of water and carbon fluxes (Fig. 3). For ET, overestimation during the snow season and underestimation of seasonal amplitude were greatly reduced (Figs 3a, e, i and m). In USHo2 and JPTak, the anomalous ET overestimations during the snow cover season were removed by the model refinement (Figs 3e and i). In FIHyy and USHo2, seasonal variations of ET were more accurately simulated (Figs 3a and e). For GPP and RE, underestimation of the seasonal peak was also greatly reduced at most sites (Figs 3b, c, f, g, j, k, n and o). In FIHyy and USHo2, simulated seasonal variations of GPP and RE were closer to observed data overall (Figs 3b, c, f and g). In JPTak, underestimation of peak GPP in summer and overestimation of GPP in spring and autumn were reduced (Fig. 3j). In BRSa1, the magnitudes of GPP and RE became closer to the observed magnitudes, and the bias of rapid decline during the dry season was removed (Figs 3n and o). For NEP, the seasonal variation (e.g. seasonal amplitude and phase) improved only slightly (Figs 3d, h, l and p), probably because the model lacks a site history effect. At all sites, simulated NEP improved comparatively; however, there is still room for improvement.
After the refinement, the RMSE between observed and simulated monthly variation in ET, GPP, RE and NEP was reduced at most sites (Fig. 4). The number of sites where RMSE was reduced after the model improvement was 28, 33, 17 and 35 for ET, GPP, RE and NEP, respectively. For seasonal variation of ET, GPP and NEP, we found improvement at more than half of the sites; however, only about one-third of all sites were improved for RE. The poor improvement of RE is probably caused by the effect of land use and management change, which we did not include in this analysis.

ESM experiments
4.2.1. Point scale analysis.
The refinement of the terrestrial submodel also improved the ESM simulation at a point scale (Fig. 5). The ESM with the default terrestrial submodel mostly underestimated the monthly ET and carbon fluxes at all sites. Once we replaced the default terrestrial submodel with the improved one in the ESM, the simulation results of ET and carbon fluxes were greatly improved. The slopes of the regression lines increased toward 1 (0.58 to 0.94 for ET, 0.36 to 0.70 for GPP, 0.49 to 0.77 for RE and 0.05 to 0.26 for NEP), and the correlations became higher (R² = 0.50 to 0.70, 0.47 to 0.61, 0.60 to 0.61 and 0.01 to 0.22 for ET, GPP, RE and NEP, respectively). There is room for further improvement, especially for NEP. Potential causes are biases in the atmosphere submodel and the difference in spatial scale between an ESM grid cell and the footprint of the observations.
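The regression slopes and determination coefficients quoted above can be computed with an ordinary least-squares fit, sketched below. The function name is hypothetical, and regressing simulated values on observed values is an assumption made here so that a slope below 1 corresponds to the model underestimating the observed variation.

```python
def regression_slope_and_r2(observed, simulated):
    """Least-squares slope of simulated on observed values, and R squared."""
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    syy = sum((s - mean_sim) ** 2 for s in simulated)
    sxy = sum((o - mean_obs) * (s - mean_sim)
              for o, s in zip(observed, simulated))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, r2
```

A model that reproduces the observed seasonal pattern perfectly but with half the amplitude would give a slope of 0.5 and a determination coefficient of 1, which is why both criteria are examined together.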

Global scale carbon cycle and climate analysis.
Improvement of the terrestrial submodel also affected the simulated time variations of global climate and carbon cycle in the ESM. The atmospheric CO2 concentration simulated by the improved ESM (UVic-ESCM with the improved terrestrial submodel) was 100 ppmv lower in 2100 than that simulated by the default ESM (UVic-ESCM with the default terrestrial submodel) (Fig. 6a). The change in the carbon budget also influenced the simulated climate. Simulated global mean temperature rose more gradually than in the default model, and the difference was about 0.7 K in 2100 (Fig. 6b). As a result, the increase of the global mean temperature is estimated to be about 3 K relative to 1850 by 2100.
One of the potential causes of the difference in atmospheric CO2 and global temperature is increased land carbon uptake in the ESM with the improved terrestrial submodel. The improved ESM showed a steeper upward trend of GPP and RE than the default ESM (Figs 6c and d), which results in dramatic changes in NEP. Although the default ESM simulated a rapid NEP decrease after 2050, the improved one showed an increase (Fig. 6e). These changes in turn altered the terrestrial carbon reservoirs (i.e. vegetation and soil). The simulated increment in vegetation carbon was much the same in both the default and improved model simulations (Fig. 6f). In contrast, simulated soil carbon increased greatly by 2100 in the improved model (Fig. 6g), reaching twice the soil carbon estimated by the default model. Thus, increasing soil carbon is thought to contribute to the increase in NEP.

Discussion
This study could provide an important contribution to the ESM community toward the reduction of the uncertainties among models. One of the causes of the uncertainties in ESM comes from uncertainty in terrestrial submodels. This study demonstrated that (1) a terrestrial submodel included in an ESM needs refinements made by use of observation and (2) the submodel refinement has an impact on carbon cycle and climate simulation at both the point and global scales in the ESM. The potential contribution to the ESM community and remaining problems are described in this section.
Evaluation of the terrestrial submodel is a very important initial step to evaluate ESMs. First, we found biases of underestimation in terrestrial carbon cycle seasonality at multiple point scales. These biases potentially affect the online simulation and future projection of the carbon cycle and climate. Through confirmation of these biases and their causes, we found and fixed several potential problems in terrestrial submodels. However, most current ESMs are not evaluated well at the regional and global scales. These insufficient validations can cause large uncertainties in future projection.
Second, isolating the terrestrial submodel from an ESM is also effective for analysing the causes of the uncertainties inside ESMs. In this analysis, by extracting the terrestrial submodel, we found an effective framework for testing and refining the ESM. We conducted an assessment of the terrestrial submodel and replaced it with the improved one within the ESM. We found that the difference between the coupled model before and after terrestrial submodel refinements is mainly caused by the terrestrial submodel refinement. The remaining differences may be caused by the errors in climate data and/or the coupled effect of carbon cycle and climate.
Third, establishment of a systematic procedure that is applicable to other models is an effective way to improve ESMs. We tested the procedure to first evaluate the terrestrial submodel in an off-line mode and then evaluated an ESM in the coupled mode. The procedure itself is applicable to other ESMs, and useful for benchmarking and running ESMs. Furthermore, this work can be strengthened by including more systematic and objective model evaluation and refinement (Stockli et al., 2008;Randerson et al., 2009;Wang et al., 2009) using more eddy covariance observation sites and satellite-based products. More objective and systematic terrestrial model improvements will successfully improve the ecosystem models and ESMs.
Although this study presented an effective case study for ESMs and tested it within an ESM framework, several potential problems remain. First, the difference in spatial scale between ESMs and point observations of eddy covariance should be resolved in the future. The footprint of an eddy covariance observation (several km) is very different from the grid size of the ESM (in this study, 1.8° × 3.6°). This difference can affect the climate forcing passed to the terrestrial model. The land cover type simulated by the ESM may also differ from that at the observation site because of deficiencies in the atmospheric and terrestrial submodels; therefore, further improvement of these submodels is required. Moreover, the geographical distribution of eddy covariance observation sites was biased toward the northern hemisphere. Several studies have described spatial upscaling techniques using eddy covariance observations, gridded climate data and satellite-based data (Papale and Valentini, 2003; Yang et al., 2007; Xiao et al., 2008; Jung et al., 2009). Use of these data as validation data for terrestrial ecosystem models can potentially improve the modelling.
Second, carbon cycle modelling in tropical forests is still uncertain, and needs further assessment to represent observed seasonality. Several studies pointed out the importance of deep rooting depth (e.g. Ichii et al., 2007; Baker et al., 2008), leaf phenological states (e.g. Baker et al., 2008; Poulter et al., 2009), hydraulic redistribution (Lee et al., 2005; Baker et al., 2008), and moisture control on soil respiration (e.g. Hutyra et al., 2007; Baker et al., 2008). Study of these mechanisms is still at a very early stage, which prevents their incorporation into the terrestrial submodels in ESMs. To evaluate the effects on tropical forest vulnerability, such as forest dieback (e.g. Cox et al., 2000), we need to incorporate these mechanisms into the model.
Third, human effects on terrestrial ecosystems, such as disturbance and fire, were not included in the study. Several studies identified the effect of site history on the terrestrial carbon budget. These effects have already been included in the eddy covariance observations (Thornton et al., 2002;Law et al., 2003;Magnani et al., 2007). Lack of site history can also be a potential cause of poor correlation between observed and modelled RE and NEP. Therefore, future studies need to include site history.
Fourth, the atmosphere submodel also needs improvements. We demonstrated that the differences between off-line and online simulations are still large, although the model is improved in both off-line and online simulations. One of the potential causes of the differences is that the climate model is simply described by a two-dimensional energy balance model. However, several studies have found that the responses of climate model to simulate future changes are also different and climate model-dependent (Berthelot et al., 2005;Ito, 2005). Similar to this study, the atmosphere model should be also evaluated independently and coupled to ESMs.
Finally, more objective methods of model parameter calibration, such as optimization routines (Williams et al., 2009) and hierarchical analysis, are needed. In this study, we tuned parameters iteratively by hand, and biases still remained in the terrestrial submodel. These biases are partly due to insufficient model parameter tuning. Use of these more sophisticated methods could help to reduce the difference between model outputs and observations.

Conclusion
In this study, we evaluated and improved a terrestrial ecosystem submodel (MOSES2/TRIFFID) included in an ESM (UVic-ESCM) using eddy covariance observations. Then, we tested the effects of terrestrial submodel improvements on future projection of the global carbon cycle and climate based on the ESM. We found that the default terrestrial submodel in the ESM is immature. By adjusting parameters and model structures with eddy covariance observations as constraints, the submodel was greatly improved. The terrestrial submodel improvement also affected the coupled carbon cycle and climate simulation at both point and global scales. Therefore, to improve ESMs and reduce their uncertainties, terrestrial submodels should be evaluated and refined.

Acknowledgments
This study was financially supported by the A3 Foresight Program (CarboEastAsia: Capacity building among ChinaFlux, JapanFlux, and KoFlux to cope with climate change protocols by synthesizing measurement, theory, and modeling in quantifying and understanding of carbon fluxes and storages in East Asia) by JSPS, the National Natural Science Foundation of China (NSFC), and the Korea Science and Engineering Foundation (KOSEF) and Grant-in-Aid for Scientific Research (C) (ID: 50345865) from the Japan Society for the Promotion of Science (JSPS). The data were provided by the CarboEastAsia database. Special thanks to all scientists and supporting teams at the AmeriFlux, CarboEurope and AsiaFlux sites. We also acknowledge two anonymous reviewers for their valuable comments on this manuscript.