Constraining ecosystem carbon dynamics in a data-limited world : integrating ecological “ common sense ” in a model-data-fusion framework

Many of the key processes represented in global terrestrial carbon models remain largely unconstrained. For instance, plant allocation patterns and residence times of carbon pools are poorly known globally, except perhaps at a few intensively studied sites. As a consequence of data scarcity, carbon models tend to be underdetermined, and so can produce similar net fluxes with very different parameters and internal dynamics. To address these problems, we propose a series of ecological 5 and dynamic constraints (EDCs) on model parameters and initial conditions, as a means to constrain ecosystem variable inter-dependencies in the absence of local data. The EDCs consist of a range of conditions on (a) carbon pool turnover and allocation ratios, (b) steady state proximity, and (c) growth and decay of model carbon pools. We use a simple ecosystem carbon model in a model-data fusion framework to determine the added value of these constraints in a data-poor context. Based 10 only on leaf area index (LAI) time series and soil carbon data, we estimate net ecosystem exchange (NEE) for (a) 40 synthetic experiments and (b) three AmeriFlux tower sites. For the synthetic experiments, we show that EDCs lead to an an overall 34 % relative error reduction in model parameters, and a 65 % reduction in the 3 yr NEE 90 % confidence range. In the application at AmeriFlux sites all NEE estimates were made independently of NEE measurements. Compared to these ob15 servations, EDCs resulted in a 69–93 % reduction in 3 yr cumulative NEE median biases (−0.26 to +0.08kgCm−2), in comparison to standard 3 yr median NEE biases (−1.17 to −0.84kgCm−2). In light of these findings, we advocate the use of EDCs in future model-data fusion analyses of the terrestrial carbon cycle.


Introduction
Terrestrial ecosystem carbon exchange is a fundamental part of the global carbon cycle link to biosphere processes.Atmospheric CO 2 measurements indicate the presence of a global land C sink, i.e. uptake by the terrestrial biosphere exceeds losses.However, relative to all major terms in the global carbon budget, the global land sink exhibits both the largest inter-annual variability and the largest uncertainty (Le Quéré et al., 2013).The terrestrial carbon budget uncertainty stems largely from unknowns in the size, spatial distribution and temporal dynamics of the major terrestrial carbon pools.As a result, there is little agreement among modelled land sink projections for the 21st century (Todd-Brown et al., 2013;Friend et al., 2013), reflecting uncertainty in knowledge on the current state of the terrestrial C cycle and its dynamics.
In recent years a growing volume of data from flux towers, satellites and plant trait databases has been used to constrain some of the key components of the terrestrial carbon cycle (e.g.Baldocchi et al., 2001;Simard et al., 2011;Kattge et al., 2011).In particular, a range of ecosystem carbon models and data sets have been brought together in model-data fusion (MDF) frameworks to produce an enhanced analy-Published by Copernicus Publications on behalf of the European Geosciences Union.sis of ecosystem carbon cycling (e.g.Williams et al., 2005;Fox et al., 2009;Carvalhais et al., 2010;Luo et al., 2011;Ziehn et al., 2012;Smith et al., 2013).Where multiple data streams are available, MDF approaches can provide an extensive insight into carbon pool dynamics, turnover rates and carbon allocation fractions (Richardson et al., 2010;Keenan et al., 2013).However, even at research-intensive sites, MDF studies can produce a wide range of acceptable model parameter sets, due to underdetermination of the carbon budget with available data.Some of these optimized parameter sets, even though they generate realistic fluxes over short timescales, are associated with major changes to larger carbon pools (soil, wood) that are nonsensical (Fox et al., 2009).For regional-and global-scale model implementation, the lack of in situ measurements amplifies this problem, sometimes referred to as equifinality (Beven and Freer, 2001).Ultimately, we need to overcome data limitations and underdetermination by integrating models and ecosystem knowledge in a common framework.This framework must ensure ecologically realistic outcomes, while still encompassing (i.e.effectively quantifying) the uncertainty associated with parameter estimation given observation errors (Hill et al., 2012).
Although a range of process-based models have been used to represent the dynamics of the terrestrial carbon cycle and land-atmosphere CO 2 exchange (e.g.Sitch et al., 2008;Schwalm et al., 2010), there are advantages in using simpler models to estimate ecosystem carbon state variables.Firstly, there is a trade-off between model complexity, such as the number of model parameters, and a model's ability to reproduce observations (e.g.Akaike, 1974): therefore a low-complexity model is preferable when it can reproduce ecosystem observations with comparable skill.Secondly, complex models are often computationally expensive, and this is an inhibiting factor when using iterative methods (such as Monte Carlo approaches) to estimate model parameters and their uncertainty.Ideally, the key terms of ecosystem carbon dynamics can be constrained by combining ecosystem observations with a model of appropriate complexity in a computationally efficient MDF framework.
Previous MDF studies have invariably relied on net ecosystem exchange (NEE) measurements (real and synthetic), along with other site-level observations (Williams et al., 2009).In a global context, the FLUXNET flux-tower network (Baldocchi et al., 2001) consists of hundreds of flux tower sites where hectare-scale NEE measurements have been made over the past two decades.In addition to NEE, complementary site-level biometric data can help resolve model parameters and state variables in an MDF context (Richardson et al., 2010;Hill et al., 2012;Keenan et al., 2013), alleviating the problem of underdetermination.However, the terrestrial biosphere will inevitably remain poorly sampled by FLUXNET.Alternative estimates of NEE from atmospheric CO 2 measurements (e.g.Peters et al., 2010;Feng et al., 2011) are only produced at continental-scale resolutions.Therefore, given the limited span of the FLUXNET flux-tower network, are spatially resolved global carbon cycle analyses limited by the sparsity of eddy flux and biometric data?NEE, the difference between photosynthesis and ecosystem respiration, is a function of the dynamics of all carbon pools over a range of timescales.In the absence of NEE observations, model NEE estimates depend on a knowledge of carbon pool sizes and model parameter values.In reality, carbon pools and model parameters (especially those related to plant allocation fractions and pool turnover rates) are poorly constrained, and therefore NEE estimates are subject to a comparably large uncertainty.Nonetheless, fundamental knowledge on ecosystem behaviour can potentially be used to overcome the lack of location-specific data or parameter values.For example, while parameters related to phenology, C allocation and turnover may vary across multiple orders of magnitude (Kattge et al., 2011;Fox et al., 2009), these parameters are strongly correlated (e.g.Sloan et al., 2013), and the range of possible parameter configurations is therefore limited.Such examples include correlations between leaf lifespan and leaf mass per area (Wright et al., 2004), leaf area index and total foliar N (Williams and Rastetter, 1999), and between foliar and root biomass (Sloan et al., 2013).These correlations can confine parameter searches to a smaller hyper-volume.Equally, while ecosystems exhibit a large range of non-steady-state dynamic behaviours, strong inter-relationships are expected between inputs, outputs, carbon pool magnitudes and turnover rates (Luo and Weng, 2011).Richardson et al. (2010) introduced the concept of reality constraints (or internal model constraints) on carbon pool dynamics within a carbon cycle MDF analysis: such constraints on the model state can potentially be used to improve estimates of model parameters.Here we propose that a broad range of model parameter combinations can be discarded when phenology, carbon allocation, turnover rates and pool dynamics are considered ecologically "nonsensical".We seek to address the following question: can we improve ecosystem model parameter and NEE estimates by incorporating ecological "common sense" into carbon cycle MDF analyses?
In this paper we propose a series of ecological and dynamic constraints (EDCs) on model parameters: these include turnover and allocation parameter inter-relations, carbon pool dynamics and steady-state proximity conditions (Sect.2).We quantify the added value of imposing EDCs in synthetic and real-data MDF contexts using a simple ecosystem carbon model, by measuring bias and confidence interval reductions of carbon cycle analyses relative to independent data (Sect.3).Finally, we discuss the prospects and limitations of our approach, as well as the implications of a wider EDC implementation in terrestrial carbon cycle MDF methods (Sect.4).Hiederer and Köchy, 2012) in our MDF analyses.The choice of these two data sets serves as an analogue for the limited ecosystem carbon data sets available on a global scale.

DALEC2
DALEC has been extensively used in MDF frameworks (e.g.Williams et al., 2005;Quaife et al., 2008;Richardson et al., 2010, amongst others).In particular, a range of MDF approaches were used in the REFLEX project, where ecosystem observations were assimilated into DALEC to produce carbon state analyses (Fox et al., 2009).Here we use the DALEC2 ecosystem carbon balance model, which combines components of DALEC evergreen and DALEC deciduous (Williams et al., 2005;Fox et al., 2009) into a single model.Gross primary production (GPP) in DALEC2 is determined from the aggregated canopy model (Williams et al., 1997), and is allocated to the biomass pools (foliar, labile, wood, and fine roots) and to autotrophic respiration (R a ); degraded carbon from biomass pools goes to two dead organic matter pools with temperature-dependent losses (heterotrophic respiration, R h ).The net ecosystem exchange is summarized as NEE = R a + R h − GPP.The C flow in DALEC2 is determined as a function of 23 parameters (including six initial carbon pool states, Table 1).We henceforth refer to the 23 parameters required to initiate DALEC2 as a parameter vector x.DALEC2 C pools and fluxes are iteratively calculated at a daily time step: the DALEC2 model equations are fully described in Appendix A. We henceforth refer to the ensemble of all model state variables (such as daily NEE, GPP, respiration terms and carbon pool trajectories) as DALEC2(x).
Here we propose a sequence of ecological and dynamic constraints (EDCs) on DALEC2 parameters and pool dynamics.For any given DALEC2 parameter vector x, all EDCs presented in this section (henceforth EDC 1, EDC 2, etc.) are implemented.The probability of parameters (henceforth  1.

Turnover constraints
We impose the following constraints on the relative sizes of turnover rates: EDC 2: EDC 3: EDC 4: where T i are daily temperature values during an N-day time window (e.g. 3 years).These constraints ensure the turnover rate ratios are consistent with knowledge of the carbon pool relative residence times (e.g.Gaudinski et al., 2000;Norby et al., 2002;Trumbore, 2006).In particular, we expect a faster litter turnover in contrast to soil organic matter (SOM) turnover (EDC 1), a faster conversion rate of litter to SOM relative to SOM turnover (EDC 2), the annual leaf loss fraction is greater than the annual woody biomass loss fraction (EDC 3), and a faster fine root turnover in contrast to SOM turnover (EDC 4).

Root-Foliar C allocation constraints
Strong correlations are expected between foliar and fine root carbon pools (e.g.Mokany et al., 2006;Sloan et al., 2013).We constrain the C allocation and dynamics of the root and foliar pools: EDC 6 : where C fol and C roo are the mean foliar and fine root carbon pool sizes over the model run period.EDC 5 ensures that the GPP allocated fraction to C roo and C fol (directly or via the labile C pool) are within a factor of 5 of each other.EDC 6 ensures that the mean fine root and foliar pool sizes are within a factor of 5 of each other.

Carbon pool growth
While we expect pools to potentially grow through time, we assume no recent disturbance and therefore limit the relative growth rate of pools.We constrain pool growth as follows: EDC 7 : where C year=1 pool is the mean carbon pool size in year 1, and is the mean carbon pool size after n − 1 years.We choose a value of G max = 0.1, which is equivalent to a 10 % yearly growth rate (or doubling of carbon over 10 yr) as the maximum growth rate for each pool in EDC 7.This assumption is conservative, given data on global forest biomass growth rates (Baker et al., 2004;Luyssaert et al., 2008).

Carbon pool exponential decay trajectories
While carbon pools are expected to grow and contract through time, in the absence of major and recent disturbance events carbon pool trajectories are expected to exhibit gradual changes on inter-annual timescales (e.g.Bellamy et al., 2005).Under these circumstances, rapid exponential decay in modelled DALEC2 carbon pools can only occur as a result of an ecologically inconsistent x.We examine the system response within a 3-year period by imposing a constraint on exponential pool trajectories (Fig. 1): we numerically fit an exponential decay curve a + be ct to all carbon pools, where t is time in days, and a, b and c are the fitted exponential decay parameters.
DALEC2 pool trajectories are rejected if the half-life of carbon pool changes is less than 3 years, i.e.EDC 8 We fully describe the numerical derivation of c in Appendix B.

Steady state proximity
For ecosystems with no recent disturbance events, we propose that each pool is within an order of magnitude of its steady-state attractor.We use mean gross primary production (F gpp ) as a proxy for long-term GPP to estimate the steady-state attractors, C ∞ pool , of four carbon pools (SOM, litter, wood and root).The steady-state attractors for C som , C lit , C woo and C roo are analytically derived as follows: where T is the mean annual temperature ( • C).For each pool, we impose an order-of-magnitude constraint on the proximity of C ∞ pool from the initial C pool value: where C 0 pool is the initial C som , C lit , C woo and C roo value for EDCs 9, 10, 11 and 12 respectively.
The 12 presented EDCs are what we believe to be the most ecologically suitable constraints on DALEC2 parameters and state variables, and are based on broader ecological knowledge of carbon dynamics.We discuss the advantages and the limitations of the proposed EDCs in Sect. 4 of this paper.

Model-data fusion
Given LAI observations, soil organic carbon estimates, prior parameter ranges (Table 1) and EDCs (Sect.2.2), our aim for each experiment is to estimate the probability distribution of parameters x.We assume no prior knowledge, other than the parameter ranges shown in Table 1: we therefore prescribe a uniform (i.e.non-informative) prior probability distribution onto all parameters.Within a Bayesian framework (e.g.Hill et al., 2012;Ziehn et al., 2012), we combine the above-mentioned information to derive the posterior probability density function of x, P (x|O), where P (x|O) ∝ P (O|x) • P range (x) • P EDC (DALEC2(x)).( 14) P (O|x) is the probability of the observations given x, P range (x) = 1 if all parameters are within the ranges prescribed in Table 1 (otherwise P range (x) = 0), and P EDC (DALEC2(x)) = 1 if all EDCs are met (otherwise P EDC (DALEC2(x)) = 0).For N observations, we derive the observation probability given x, P (O|x), as follows: where O n is the nth observation, M n is the corresponding state variable, and σ 2 n is the nth error variance for each observation (e.g.Xu et al., 2006): here we assume no error covariance between observation errors.
We employ an adaptive Metropolis Hastings Markov Chain Monte Carlo (MHMCMC) approach to draw 5 × 10 6 samples from P (x|O).This approach has been widely used to estimate the probability density function of ecosystem model parameters (Xu et al., 2006;Hill et al., 2012;Ziehn et al., 2012;Caldararu et al., 2012;Smith et al., 2013;Keenan et al., 2013, amongst others) and is ideal to explore parameter space without a need to define normal prior distributions for each parameter (e.g.Richardson et al., 2010).We repeat the MHMCMC algorithm four times (i.e.four chains), to ensure convergence between P (x|O) distributions from each chain.
To minimize sample correlations we use 500 x samples from the latter half of the accepted parameter vectors.We describe the details of our MHMCMC approach in Appendix C.

Synthetic truth -DALEC2 analyses
To quantify our ability to estimate synthetic DALEC2 ecosystem states, we perform the MDF approach over a 3year period using LAI and SOM observations created from a synthetic DALEC2 truth, based on known DALEC2 parameters.Our choice of synthetic DALEC2 states represents globally spanning data sets of satellite LAI retrievals and soil carbon map data.Based on 40 DALEC2 parameter combinations, we create 40 synthetic data sets representing typical temperate forest carbon dynamics, with 3 years of semicontinuous LAI data and one simulated soil organic carbon estimate.We use the 3-year meteorology drivers (temperate climate) from the REFLEX synthetic experiments (Fox et al., 2009).
We select 40 synthetic parameter combinations by randomly sampling parameter vectors x within the DALEC2 parameter space (Table 1), where (i) P EDC (DALEC2(x)) = 1, and (ii) x values are relevant to temperate forest ecosystems (see Appendix D).We remove approximately 95 % of daily LAI points to create an 8-day resolution semi-continuous LAI time-series.We add noise to the remaining 3 yr synthetic DALEC2 LAI: each LAI value is multiplied by a random error factor of 2 N(0,1) , where N (0, 1) is a random number derived from a normal distribution with a mean of zero and a standard deviation of 1.For each synthetic soil carbon observation, we multiply C 0 som at t = 0 by a random error factor of 2 N(0,1) .We fully explain the derivation of the synthetic experiment parameter vectors, (henceforth s) in Appendix D.
We perform the MHMCMC and label the posterior parameter ensemble (4 × 500 × 40 x samples) as xs STA (standard synthetic MDF) and xs EDC (synthetic MDF with EDCs).We assign an uncertainty factor of 2 to all synthetic observations, hence O n and M n are log-transformed observations and σ n = log(2).For each posterior DALEC2 x, we determine the log-normalized parameter-space error (x) by comparing x with its corresponding synthetic truth vector s: where x(n) and s(n) represent the nth parameters of x and s, the N is the number of parameters in x, and x(n) min , x(n) max are the minimum and maximum parameter values (see Table 1).To assess the parameter estimation capability for each experiment, we derive the (x) for each parameter vector in (a) xs STA (b) xs EDC and (c) for uniformly random samples where P range (x) = 1 (henceforth xs RAN ).We refer to the ensemble of (x) values for xs STA , xs EDC and xs RAN as E(xs STA ), E(xs EDC ) and E(xs RAN ).We quantify the overall EDC associated error reduction (I EDC ) as follows: where E represents the median of E for each posterior parameter ensemble.This allows us to assess the relative improvement of xs EDC over xs STA parameter estimates against the xs RAN "zero-knowledge" case.In addition, we determine the I EDC for two parameter subgroups: (a) directly constrained parameters, and (b) indirectly constrained parameters.We assign c lf , c ronset , c rfall , d onset , d fall and C 0 som to parameter group A: these parameters can be directly inferred from the LAI and soil organic carbon observations.We assign the remaining parameters to parameter group B: these can only be inferred from the DALEC2 model structure and -potentially -EDCs.Finally we compare NEE from DALEC2(xs EDC ) and DALEC2(xs STA ) against the NEE synthetic "truths" -DALEC2(s).

AmeriFlux -DALEC2 analyses
For the flux-tower experiments, we constrain DALEC2 parameters using (a) MODIS derived Leaf Area Index (LAI), and (b) total soil carbon from the harmonized world soil database (HWSD Hiederer and Köchy, 2011).We perform daily resolution 3-year DALEC2 analyses for three forest categories: evergreen needleleaf (ENF), deciduous broadleaf (DBF) and mixed forest (MF).We chose one AmeriFlux site from each forest type.To establish a suitable site for our method we chose sites with NEE data spanning across 3 years between 2001 and 2010.Our selected sites for each forest type are Howland Forest (US-Ho1, evergreen needleleaf forest, 45.2041  Desai et al., 2005).We chose temperate sites with little expected water-stress, and with a ≤ 3 months of recorded below-freezing soil temperatures.These criteria reflect the current capabilities of DALEC2, as hydrological processes are not explicitly portrayed in the model.
For each AmeriFlux site, we extract the corresponding MODIS LAI retrievals from the MOD15A2 LAI 8day version 005 1 km resolution product (downloaded from the Land Processes Distributed Active Archive Centre http://lpdaac.usgs.gov/):we only keep maximum quality flag data.Standard deviations are provided for 1 km MODIS LAI retrievals, however these (a) do not reflect the magnitude variability in uncertainty, (b) often imply the existence of negative LAI observations (σ LAI > LAI) and (c) are occasionally missing.While various MODIS LAI evaluations have been performed (e.g.Sea et al., 2011;Serbin et al., 2013), large-scale spatiotemporal LAI retrieval errors remain poorly quantified.For the sake of simplicity, we assign a factor of 2 uncertainty (i.e.log(LAI) ± log(2)) for each MODIS LAI observation.To minimize spatial discrepancies between MODIS and AmeriFlux sites, each LAI observation is the arithmetic mean of all available LAI retrievals within a 9pixel 3 km × 3 km area (centred on each AmeriFlux site).Overall, we use 95, 120 and 119 LAI values at US-Syv, US-Ho1 and US-MMS (5th-95th percentile ranges for LAI values are 0.4-5.8,1.0-5.6 and 0.4-5.5 respectively).
For each site we extract total soil carbon density from the nearest Harmonized World Soil Database 30 arc seconds resolution total soil carbon content (approx. 1 km at equator; Hiederer and Köchy, 2011): the authors have performed multiple comparisons of the global HWSD against other products, however no pixel-scale uncertainties are provided.We chose to assign an uncertainty factor of 2 on each site-scale HWSD soil carbon estimate.The HWSD soil carbon values are 2.3 × 10 4 , 2.3 × 10 4 and 5.2 × 10 3 g C m −2 .
To limit our study to the use of globally spanning data sets, we extract DALEC2 drivers from 0.125 • × 0.125 • ERA interim meteorology (see Appendix A for details).The DALEC2 analyses for each site are therefore completely independent from all site-level measurements (we note, however, that extensive meteorological and biometric data are meticulously recorded across the AmeriFlux site network).Therefore, we produce a fully independent ecosystem carbon cycle analysis, which can be evaluated against measured NEE at each flux-tower site.
As done for the synthetic experiments, we perform the MHMCMC approach at each site -with and without EDCs -and label the posterior parameter ensembles (4 chains × 500 x samples) as xa STA (standard Ameri-Flux MDF) and xa EDC (AmeriFlux MDF + EDCs).We compare the DALEC2 NEE analyses, DALEC2(xa EDC ) and DALEC2(xa STA ) against NEE measurements at each Amer-iFlux site.

EDC sensitivity test
To determine the sensitivity of our results to EDCs 1-12, we repeat MDF estimates of xs EDC and xa EDC by imposing only one EDC at a time (henceforth xs EDC(n) and xa EDC(n) , where n is the nth EDC).For the synthetic experiments, we determine the relative contribution of the nth EDC by quantifying the overall EDC associated error reduction (I EDC(n) , see Eq. 17) for each estimate of xs EDC(n) .Given the large computational cost of estimating xs EDC(n) for each EDC (40 synthetic experiments × 12 EDCs × 4 chains), we limit our sensitivity analysis to I EDC estimates based on 4 (out of 40) synthetic experiments.
We compare 3 yr integrated DALEC2 NEE estimates and AmeriFlux NEE measurements at all three sites (AmeriFlux NEE measurement temporal gaps have been consistently excluded from DALEC2 3 yr NEE estimates).We determine the DALEC2 3 yr NEE 50% confidence range (50% CR: 25th-75th percentile interval) reduction as follows: where R NEE,EDC(n) and R NEE,STA are the 50% CR of DALEC2(xa EDC(n) ) and DALEC2(xa STA ) 3 yr NEE estimates.Similarly, we calculate the 3 yr NEE bias reduction (relative to AmeriFlux NEE measurements) as follows: where B NEE,EDC(n) and B NEE,STA are the median biases of DALEC2(xa EDC(n) ) and DALEC2(xa STA ) 3 yr NEE estimates.

Synthetic experiments
The inclusion of EDCs resulted in substantial error reductions in posterior DALEC2 parameter and state variable estimates.We found an overall reduction in the posterior MHMCMC EDC parameter vector errors E(xs EDC ), relative to both the standard MHMCMC errors E(xs STA ) and the randomly sampled parameter vector errors E(xs RAN ): we found an improvement of I EDC = 34 % associated with using EDCs (Fig. 2c).For the directly constrained parameters (parameter group A) we found similar distributions for both  2b).We found that EDCs 5 and 8 accounted for the largest error reduction in DALEC2 parameter estimates (I EDC(5,8) ≥ 3 %, Table 2), followed by EDCs 6, 10 and 12 (I EDC(6,10,12) = 2 %).EDC 7 led to an overall parameter error increase (I EDC(7) = −13 %).The remaining EDCs accounted for small or negative error reductions.

AmeriFlux results
The DALEC2(xa EDC ) analyses outperformed the standard DALEC2(xa STA ) analyses at the AmeriFlux tower sites.The inclusion of EDCs in DALEC2 analyses amounted to overall NEE bias reductions at all sites (US-Syv, US-Ho1, US-MMS, we henceforth present all site results in this order).The aggregated DALEC2(xa EDC ) median daily NEE biases (−0.02, 0.13, −0.03 g C m −2 d −1 ) are closer to the AmeriFlux measured NEE by roughly one order of magnitude in contrast to DALEC2(xa STA ) median NEE biases (−0.52, −0.86, −1.15 g C m −2 d −1 ).The aggregated daily DALEC2(xa EDC ) NEE 90 % confidence ranges at each site  Based on DALEC2(xa EDC(n) ) 3 yr NEE estimates, EDC 10 resulted in a ≥ 18% bias reduction and a ≥ 5% 50 % CR reduction at all three sites, relative to DALEC2(xa STA ) (Table 2).EDCs 2 and 8 resulted in a > 10% 3 yr NEE 50 % CR reduction and an increase in 3 yr NEE bias at all three sites (NEE bias reduction ≤ −22%).EDCs 7 and 9 resulted in a ≥ 50% 3 yr NEE bias reduction and an increase in 3 yr NEE 50 % CR at all three sites (NEE 50 % CR reduction ≤ −15%).

Discussion
With the use of a simple model and globally available data, i.e. leaf area dynamics and soil carbon observations, we have demonstrated that the EDC approach provides an improved ability to infer the magnitude of carbon fluxes, live carbon pools and model parameters, in comparison to a standard parameter optimization approach (STA).
For ecologically relevant synthetic truths, EDCs provide improved estimates of the DALEC2 parameters and state variables.The EDC approach resulted in (a) parameter estimation error reductions, (b) NEE bias and confidence range reductions, and (c) improved estimates of the live biomass C pools, in contrast to the STA parameter and flux and C pool estimates.While there is little difference between directly inferable (Group A) estimated parameter errors between the EDC and STA approach, using EDCs led to a marked reduction in estimated parameter error for indirectly inferable (Group B) parameters.The indirectly inferred parameters include allocation fractions, subsurface pools and turnover rates, which are typically difficult to observe at field sites and virtually impossible to observe remotely (i.e. at regional scales).
By comparing DALEC2 analyses against independent AmeriFlux NEE measurements over real ecosystems, we further validated the advantages of using EDCs.At each Ameri-Flux site, we found that EDCs led to an increased confidence and a largely reduced NEE bias; our DALEC2 model analyses suggests that the use of EDCs regionally and globally could significantly enhance our ability to estimate ecosystem state variables in the absence of direct observational constraints.In light of the large differences between Earth system models (Todd-Brown et al., 2013;Friend et al., 2013), we anticipate that EDCs may help constrain ecosystem carbon terms on global scales, where carbon pools and their residence times are typically difficult or impossible to measure.
Together, EDCs 1-12 lead to overall improvements in parameter estimates and AmeriFlux site NEE confidence range/bias (Table 2): however, with the exception of EDC 10, when EDCs were tested individually, they did not lead to comprehensive improvements.For example, EDC 8 alone (no rapid exponential pool decay) resulted in large Ameri-Flux site NEE confidence range reductions, as well as improved synthetic parameter estimates; however, EDC 8 resulted in higher AmeriFlux site NEE biases.Conversely, EDC 9 (steady-state proximity of the soil carbon pool) resulted in the largest AmeriFlux site bias reductions, while NEE confidence was lower.EDC 5 (comparable fine root and foliar/labile allocation) led to the largest parameter improvements; however, the associated changes in AmeriFlux site NEE estimates were relatively small.Our findings demonstrate that robust improvements in carbon cycling parameter and state variable estimates only arise when EDCs are used collectively.
Here we developed a group of EDCs suitable to ecosystems with no recent major disturbance.However, we note that our EDCs can be adapted for a wider range of ecosystem dynamics.For example, recently disturbed ecosystems may be (a) rapidly recovering and (b) growing towards a steady state where carbon pools are greater than one order of magnitude from the initial carbon pools.Therefore a subset of our EDCs (EDCs 7-12) can be adapted to better represent ecological "common sense" in recovering ecosystems.
Ultimately, EDCs can be adapted to best represent ecological knowledge in a variety of ecosystem carbon model MDF applications, where the ecosystem observations are insufficient to constrain all model state variables (e.g.Fox et al., 2009).For example, on regional and global spatial scales, there is often no explicit knowledge on various model parameter values and their associated uncertainty.In such cases, our EDC approach imposes inter-parameter constraints while simultaneously allowing a global parameter exploration across several orders of magnitude (see Table 1).Hence EDCs allow us to incorporate ecologically consistent relationships between parameters (i.e.allocation ratios, turnover ratios), without the need to constrain otherwise unknown parameter and state variables.Moreover, as an alternative to imposing plant-functional-type priors, which risk being subjective and over-rigid, ecosystem trait inter-relationships derived from plant trait data (e.g.Wright et al., 2004;Kattge et al., 2011) could be incorporated as additional EDCs.Given a quantitative knowledge of parameter inter-relationships, we also note that a prior parameter variance-covariance structurein addition to EDCs -can also be used as an alternative or complementary constraint on the model state and parameters.Finally, we note that our choice of EDCs is open to adaptation and adjustment: we maintained relatively broad constraints (e.g.EDC 6 permissible root:foliar C range > one order of magnitude), which can likely be refined through further study.
In this study we limited our observational constraints to globally spanning MODIS LAI retrievals and the HWSD soil map.Given these two data sets, we have demonstrated that EDCs lead to improved model parameter estimates and reduced NEE bias and confidence ranges.Nonetheless, based on the posterior NEE probability density function, we are unable to determine whether sites are net carbon sinks or sources on annual timescales.However, an increasing number of continental and global scale biospheric data sets are becoming available: these include a global canopy height map by Simard et al. (2011), pan-tropical biomass maps by Saatchi et al. (2011); Baccini et al. (2012) and a pan-boreal carbon density map by Thurner et al. (2013).These products can potentially be used in conjuncture with MODIS LAI, HWSD data and our EDC approach in a MDF framework to better constrain terrestrial carbon cycle dynamics.

Concluding Remarks
We have addressed the underdetermined nature of the carbon cycle problem by applying a group of widely applicable ecological and dynamic constraints (EDCs) on an ecosystem carbon model in a model-data fusion (MDF) framework.Particularly where extensive in situ measurements are not available, EDCs can be used to incorporate ecological knowledge, such as parameter inter-relationships and pool dynamics constraints, into ecosystem carbon model analyses.In a synthetic data experiment, we found improved estimates of DALEC2 model parameters, live carbon pools and net ecosystem exchange (NEE) when using EDCs in DALEC2 MDF analyses.By validating our DALEC2 MDF analyses against independent AmeriFlux NEE measurements, we found that EDCs led to a 69-93 % reduction in 3-year NEE biases.We incorporated 12 EDCs in DALEC2 analyses of temperate forest ecosystem carbon cycling: these EDCs can potentially be adapted for a range of models and biomes.Moreover, additional EDCs can be derived to incorporate parameter interrelationships derived from regional or global plant trait data sets into ecosystem carbon model analyses.Here we have shown that EDCs can be used to constrain the poorly resolved components of the carbon cycle: we therefore advocate the use of EDCs in future MDF analyses of the terrestrial carbon cycle.where ψ = √ 2ψ f c rfall (we note that Eq. ( A9) can be solved using a Lambert W function, where ψ = W (f (c lf )); however, Lambert W functions cannot be solved analytically).We created a look-up function for ψ by fitting a sixth-order polynomial between ψ and log(1 − c lf ); the full polynomial is included in the DALEC2 code.The onset is a special case of the fall formula: it was derived such that 99.99 % of C lab is transferred to C fol annually at t% 365.25 = d onset and 68 % of leaf onset occurs within c ronset day.The functions are advantageous in that (a) the daily turnover rates result in a continuous and specified loss of carbon throughout a known time period, and (b) the functions are cyclical and hence do not need to be reset, "switched on" or "switched off" throughout the model run period.We also note that while we treated d onset and d fall as constant parameters, the functions can easily accommodate temporally variable definitions for leaf onset and leaf fall.Total ecosystem respiration F t rec and the net ecosystem exchange F t nee fluxes are derived at each time step and are shown below: A schematic of the carbon fluxes in DALEC2 is shown in Fig. A1.The DALEC2 code is available upon request.
For AmeriFlux DALEC2 analyses we used daily meteorological drivers for DALEC2 from 0.125 • × 0.125 • ERAinterim re-analyses.For each site we obtained coordinates from ameriflux.ornl.gov.We downloaded 6 h temperature and 12 h downward surface solar radiation data for all site locations and years from apps.ecmwf.int/datasets.We averaged temperature and radiation from the four nearest 0.125 • × 0.125 • ERA-interim grid points.We obtained minimum and maximum temperatures from the 6 h ERA-interim temperature range.For M daily radiation values we used the sum of the two 12 h radiation re-analyses.

Appendix B: Exponential C pool decay
Exponentially decaying C t pool trajectories can be approximated as C exp = a + be ct , where a, b and c are constants and t is time in days.To implement EDC 8, we numerically estimate parameter c: to derive c for each carbon pool trajectory (C pool ) we derive (i) the gradient between yearly means for years 1 and 2: and (ii) the gradient between the yearly means with a 1-day offset:

(B2)
Parameter c can be expressed as In the case of a true exponential curve with a known value of c, the numeric derivation of c shown in Eqs.(B1)-(B3) is exact (i.e.within numerical precision of the true c).In cases where there is no exponential decay, c is either positive or complex.While C exp is an approximation of C t pool , in practice this approach is both computationally fast and effectively able to identify rapid exponential decay (c < log(2)  365.25×3 ) trajectories.10. 5  6 < LAI year=3 LAI year=1 < 6 5 .
For a given vector s, all conditions must be met when P TF (DALEC2(s)) = 1.The above-listed conditions ensure that the selected s vectors broadly reflect canopy dynamics (1-4), carbon pool sizes (5-6) mean photosynthetic uptake ( 7) and limited year-to-year canopy changes (8) associated with temperate forest ecosystems (e.g.Fox et al., 2009).We derive each parameter vector s by selecting a random parameter vector s 0 and incrementally adjusting it until P EDC (DALEC2(s)) P TF (DALEC2(s)) = 1.To represent a range of canopy dynamics, we also imposed either (a) a deciduous condition (where c rfall > 10 11 ) or (b) a mixed forest or evergreen condition (where c rfall < 10 11 ), with a 50-50 % probability for either constraint.
We simplistically simulate the 8-daily MODIS LAI data and soil carbon map HWSD products from DALEC2(s) LAI and C som : we multiplied each soil organic carbon "truth" at t = 0 (C 0 som ) by 2 N(0,1) , where N (0, 1) is a random number sampled from the normal distribution with mean = 0 and variance = 1.
For LAI synthetic observations, we only kept one in eight LAI values, and created correlated gaps in the remaining LAI data of random lengths until at least 50 % of the 8 daily data is removed.Overall, between 65 and 68 LAI observations are kept for each 3 yr synthetic experiment.Twenty-two parameter vectors are categorized as deciduous, and 18 as evergreen.Mean 3 yr F t gpp ranges from 2.04-8.79g C m −2 d −1 (median = 4.75g C m −2 d −1 ) and mean 3 yr F t nee ranges from −3.71 to 2.87g C m −2 d −1 (median = −0.72 g C m −2 d −1 ).

Figure 2 .Figure 3 .
Figure 2. Aggregated parameter estimates xs STA (standard sampling, blue) and xs EDC (EDC sampling, red) from deciduous evergreen synthetic LAI and soil organic carbon observations -these are compared against observation and EDC independent parameter samples xs RAN (light grey).Normalized parameter space error ( ) probability density functions for (a) Group A (directly inferable) parameters, (b) Group B (indirectly inferable) parameters and (c) all DALEC2 parameters.values for each parameter group were derived using Eq.(15).In panels (d) and (e) the probability density functions of live biomass (foliar, labile, wood and roots) and dead biomass (litter and soil carbon) biases against the synthetic truth parameters s are shown for xs RAN , xs STA and xs EDC parameter estimates.

Figure 4 .
Figure 4. DALEC2 daily NEE ensemble estimates at three Amer-iFlux sites: Sylvania Wilderness (US-Syv, mixed forest, top two rows), Howland Forest (US-Ho1, evergreen needleleaf, middle two rows), and Morgan Monroe State Forest (US-MMS, deciduous broadleaf, bottom two rows).For each site the DALEC2(xa EDC ) and the DALEC2(xa STA ) ensemble confidence intervals are denoted as EDC and STA, respectively.The DALEC2 analysesbased on MODIS LAI retrievals, HWSD soil organic carbon estimates and ERA interim meteorological drivers -are completely independent from all AmeriFlux site measurements.

Figure 5 .
Figure 5. Three-year mean DALEC2 cumulative NEE (kg C m −2 ) compared against cumulative measured NEE at three AmeriFlux sites: Sylvania Wilderness (US-Syv, mixed forest, left), Howland Forest (US-Ho1, evergreen needleleaf, middle) and Morgan Monroe State Forest (US-MMS, deciduous broadleaf, right).The standard analysis median and 50 % confidence ranges (CR) are shown in blue, and the corresponding analyses with EDCs are shown in red.AmeriFlux NEE measurements are denoted as a black line.The DALEC2 analyses -based on MODIS LAI retrievals, HWSD soil organic carbon estimates and ERA interim meteorological driversare completely independent of all AmeriFlux site measurements.

Figure A1 .
Figure A1.Schematic of the carbon fluxes in DALEC2.The green arrow indicates the gross primary production (GPP).Red arrows represent respiration fluxes: autotrophic respiration (R auto ) and heterotrophic respiration (R het ).Blue arrows represent C allocation to the labile (C lab ), foliar (C fol ), wood (C woo ) and fine root (C roo ) pools.Grey arrows represent the litterfall and decomposition fluxes to the litter (C lit ) and soil organic matter (C som ) pools.

Table 1 .
DALEC2 model parameters, descriptions, and minimum-maximum parameter values: the corresponding DALEC2 equations are fully described in Appendix A.

Table 2 .
Synthetic experiment parameter error reduction, and AmeriFlux experiment 3 yr NEE 50% CR and bias reduction for MDF estimates using individual EDCs, relative to the standard MDF estimates.
a The parameter error reduction metric, I EDC(n) , is described in Sect.2.4.b The derivations of 3 yr NEE 50 % CR and bias reductions are described in Sect.2.6.
f auto F t gpp + (θ lit C t lit + θ som C t som )e T t