Bayesian calibration of a soil organic carbon model using 1 14 C measurements of soil organic carbon and heterotrophic respiration as joint constraints

. Soils of temperate forests store signiﬁcant amounts of organic matter and are considered to be net sinks of atmospheric CO 2 . Soil organic carbon (SOC) turnover has been studied using the 1 14 C values of bulk SOC or different SOC fractions as observational constraints in SOC models. Further, the 1 14 C values of CO 2 that evolved during the incubation of soil and roots have been widely used together with 1 14 C of total soil respiration to partition soil respiration into heterotrophic respiration (HR) and rhizosphere respiration. However, these data have not been used as joint observational constraints to determine SOC turnover times. Thus, we focus on (1) how different combinations of observational constraints help to narrow estimates of turnover times and other parameters of a simple two-pool model, the Introductory Carbon Balance Model (ICBM); (2) whether relaxing the steady-state assumption in a multiple constraints approach allows the source/sink strength of the soil to be determined while estimating turnover times at the same

Abstract. Soils of temperate forests store significant amounts of organic matter and are considered to be net sinks of atmospheric CO 2 . Soil organic carbon (SOC) turnover has been studied using the 14 C values of bulk SOC or different SOC fractions as observational constraints in SOC models. Further, the 14 C values of CO 2 that evolved during the incubation of soil and roots have been widely used together with 14 C of total soil respiration to partition soil respiration into heterotrophic respiration (HR) and rhizosphere respiration. However, these data have not been used as joint observational constraints to determine SOC turnover times. Thus, we focus on (1) how different combinations of observational constraints help to narrow estimates of turnover times and other parameters of a simple two-pool model, the Introductory Carbon Balance Model (ICBM); (2) whether relaxing the steady-state assumption in a multiple constraints approach allows the source/sink strength of the soil to be determined while estimating turnover times at the same time. To this end ICBM was adapted to model SOC and SO 14 C in parallel with litterfall and the 14 C of litterfall as driving variables. The 14 C of the atmosphere with its prominent bomb peak was used as a proxy for the 14 C of litterfall. Data from three spruce-dominated temperate forests in Germany and the USA (Coulissenhieb II, Solling D0 and Howland Tower site) were used to estimate the parameters of ICBM via Bayesian calibration. Key findings are as follows: (1) the joint use of all four observational constraints (SOC stock and its 14 C, HR flux and its 14 C) helped to considerably narrow turnover times of the young pool (primarily by 14 C of HR) and the old pool (primarily by 14 C of SOC). Furthermore, the joint use of all observational constraints made it possible to constrain the humification factor in ICBM, which describes the fraction of the annual outflux from the young pool that enters the old pool. The Bayesian parameter estimation yielded the following turnover times (mean ± standard deviation) for SOC in the young pool: Coulissenhieb II 1.1 ± 0.5 years, Solling D0 5.7 ± 0.8 years and Howland Tower 0.8 ± 0.4 years. Turnover times for the old pool were 377 ± 61 years (Coulissenhieb II), 313 ± 66 years (Solling D0) and 184 ± 42 years (Howland Tower), respectively.
(2) At all three sites the multiple constraints approach was not able to determine if the soil has been losing or storing carbon. Nevertheless, the relaxed steady-state assumption hardly introduced any additional uncertainty for the other parameter estimates. Overall the results suggest that using 14 C data from more than one carbon pool or flux helps to better constrain SOC models.

Introduction
Soils store around 3000 Pg C of soil organic carbon (SOC) (Jobbágy and Jackson, 2000;Tarnocai et al., 2009). This means that soils contain roughly 4 times more carbon than the atmosphere, and 6 times more carbon than the vegetation (Prentice et al., 2001). About 100 Pg C each year are emitted to the atmosphere from soils (Bond-Lamberty and Thomson, 2010). A considerable part of this soil CO 2 efflux is the product of soil organic matter decomposition via soil organisms. Apart from the importance of soil organic carbon in the global terrestrial carbon cycle as the largest terrestrial carbon pool and as the source of one of the largest terrestrial carbon fluxes, soil organic matter turnover is a key factor for soil fertility and nutrient resupply. Jenny et al. (1949) were the first to study soil organic matter (SOM) turnover with a processbased model. In this seminal paper SOC was modeled as one homogenous pool which decomposes according to first-order kinetics, analogous to nuclear decay. Since then a multitude of different SOM models have been devised which vary in their degree of complexity (Manzoni and Porporato, 2009). In general, with the spread of personal computers and the rapid increase in computing capacity more and more multipool models have been developed.
For years, fractionation techniques were developed in the hope that organic matter could be physically and chemically separated into pools that could be related to conceptual models of carbon cycling. This strategy is often referred to as "measuring the modelable" (Elliott et al., 1996). Though this approach seems to be successful for specific models and fractionation procedures (e.g., Zimmermann et al., 2007), the premise that measured fractions should represent "unique and non-composite pools" (Smith et al., 2002) is still difficult to fulfill. On the other hand the notion of "modeling the measurable" (Elliott et al., 1996) has been put forward and may, for example, lead to the inclusion of microbial dynamics in SOC models (Scharnagl et al., 2010). Microbial biomass data from chloroform fumigation methods could then serve as an additional observational constraint. In fact, these two related strategies can lead to a useful coevolution and refinement of both experimental and modeling approaches, given technical and conceptual advances. However, an abundance of soil observations already exist that have to date not been adequately used to test carbon cycle models. The strategy we suggest here could be described as "considering the measured", meaning that one should check which variables have been measured at a certain site and compare it with modeled output variables. Using the model outputs together with inverse modeling, soil processes and model parameters can be studied. We propose to use SOC stocks and heterotrophic respiration fluxes in order to link observations of soil C pools and fluxes (Kuzyakov, 2011) with their respective 14 C values in order to constrain the parameters of a simple serial two-pool SOC turnover model -the Introductory Carbon Balance Model (ICBM; Andrén and Kätterer, 1997).
The 14 C content of bulk SOC or different SOC fractions has been successfully used as an observational constraint in SOC models to calculate turnover times of SOC (Trumbore, 1993;Gaudinski et al., 2000;Schulze et al., 2009). Although these authors demonstrated the potential of this approach, they were looking for one single best parameter set, rather than treating the effect of measurement uncertainty on parameter uncertainty in a formal way. Further, the 14 C value of CO 2 that evolved during the incubation of soil and roots has been widely used together with the 14 C content in total soil respiration to partition soil respiration (SR) into heterotrophic respiration (HR) and rhizosphere respiration (RR) (e.g., Gaudinski et al., 2000;Trumbore, 2006 -for an overview;Muhr et al., 2010). To our knowledge, these two approaches of using radiocarbon in soil science research have not been used as joint constraints for the estimation of decomposition rates and other parameters of SOC models. However, Schmidt et al. (2011) proposed that the 14 C content of respired CO 2 and leached dissolved organic carbon could be used as additional constraints in model-data comparisons. Wutzler and Reichstein (2007) have shown that the commonly used equilibrium or steady-state assumption of many SOC models may lead to biased estimates of SOC turnover times. For a soil with SOC stocks below equilibrium, a calibration of turnover times assuming SOC stocks at equilibrium would yield too-fast turnover time estimates. In their modeling study Wutzler and Reichstein (2007) proposed a transient correction for decay rates to account for possible disturbances in the past. In the model-data comparison framework we propose, we tackled this issue from a different perspective by introducing and calibrating parameters relaxing the steady-state assumption. A similar approach has been by taken by Carvalhais et al. (2010) who introduced steadystate relaxing parameters to allow for vegetation and soil carbon pools out of equilibrium into the ecosystem model CASA. Hence, we try to constrain the source/sink function of the soil by subjecting these additional parameters to the previously described observational constraints.
To properly quantify the effect of uncertainties in measurements on the uncertainty of parameter estimates, we performed a Bayesian calibration with a Monte Carlo Markov Chain (MCMC) algorithm and data from three sprucedominated sites in the US and Germany. More specifically we wanted to address the following questions: (i) How do combinations of different observational constraints -ranging from measurements of SOC stock, 14 C of SOC, and heterotrophic respiration to measurements of 14 C of heterotrophic respiration -influence the parameter and prediction uncertainties of ICBM?
(ii) How well can the net carbon balance be constrained with a multiple constraints approach by relaxing the steady-state assumption?
B. Ahrens et al.: Bayesian calibration of a SOC model using 14 C measurements of SOC and HR 2149 model had only one type of litterfall as input (Andrén and Kätterer, 1997), in I 14 CBM carbon enters the first SOC pool -the young pool Y -as aboveground and belowground litter input (iL and iR, upper half of Fig. 1). Carbon in the Y pool is decomposed according to first-order kinetics with the decomposition rate k Y . A part h of the outflow from Y is not directly mineralized to CO 2 , but transferred via humification (h) into the old pool O (Fig. 1). Mineralization of carbon in the old pool O also follows first-order kinetics with the decomposition rate k O : Andrén and Kätterer (1997) devised a parameter r (external response factor) was intended to comprise the influence of abiotic conditions on decomposition, like soil moisture and temperature, and equally affect the decomposition rates of Y and O. Since we are running the model at an annual time step, we assumed that we do not need to explicitly account for the influence of external factors like climatic and edaphic conditions on SOC decomposition. Hence, r is set to 1; this means that external effects are lumped into the other parameters, and should be reflected in the variation of the decomposition rates k Y and k O and the humification coefficient h across the different sites. Additionally, we introduce the parameters bias iL and bias iR to the modeling setup ( Fig. 1), which should account for a potential bias in litterfall measurements by assuming that the actual litterfall is a multiple of the observed litterfall (cf. Sect. 2.3.1). Potential bias may arise if only leaf litterfall is sampled or the location of litterfall traps is unrepresentative. Hence, bias iL and bias iR are two dimensionless parameters that express the ratio between the "real" and observed litterfall (Fig. 1). This technique of accounting for under-or overestimated carbon input fluxes has been successfully used in studies modeling the decay of organic matter in marine sediments. Here, sediment traps were suspected to underestimate the carbon flux to the sediment (Soetaert and Herman, 2009).
In order to adapt ICBM for radiocarbon data, we essentially replicated Eqs. (1) and (2) as an additional 14 C-module of ICBM. Only radioactive decay of 14 C had to be added as an additional process, with λ, the radioactive decay constant for 14 C, equaling 1 8267 yr −1 (Stuiver and Polach, 1977): We used the atmospheric 14 C record as a proxy for the 14 C input via root and leaf litter input. In Fig. 1  graph shows a part of this record from 1900 to 2011 covering the prominent "bomb peak" resulting from aboveground nuclear weapons testing during the late 1950s and early 1960s (Hua and Barbetti, 2004). Throughout this paper we use the definition of 14 C (Stuiver and Polach, 1977) as where 14 C C SN denotes the 14 C C -ratio of the sample, normalized for isotope fractionation, and A ABS the 14 C C -ratio of the standard. According to Karlen et al. (1968) and Stuiver (1980), A ABS = 1.176 · 10 −12 . This corresponds to the 95 % specific activity of NBS Oxalic Acid I (SRM 4990B), normalized to a δ 13 C VPDB of −19 ‰ and decay corrected to 1950. Based on the atmospheric 14 C record and the 14 C notation (Eq. 5), the 14 C input via iL and iR was calculated as:

2150
B. Ahrens et al.: Bayesian calibration of a SOC model using 14 C measurements of SOC and HR (L) and fine roots (R) and its addition to SOC as aboveground and belowground litter input. tlag L and tlag R are introduced as additional model parameters influencing the 14 C-module of ICBM (Fig. 1); priors for these parameters were defined based on measurements at the different sites (Sects. 2.2.4, 2.3.1). Throughout this work we tried to challenge the assumption that total SOC and different SOC pools are in steady state. Dropping the steady-state assumption leads to the problem of initializing the different conceptual SOC pools (Yeluripati et al., 2009). The most common way to deal with initialization problems of conceptual and nonmeasurable SOC pools is to perform spin-up runs of the model in an undisturbed environment. Then estimates about initial SOC pools are retrieved based on a reconstructed disturbance history (Falloon and Smith, 2000;Wutzler and Reichstein, 2007;Yeluripati et al., 2009). Due to its simplicity the steady-state equations for the ICBM can still be derived relatively easily: where Y SS and O SS describe steady state and initial pool sizes of Y and O. iL ini and iR ini denote the amount of aboveground and belowground litter input at the beginning of the simulation period. This amount of litter input is assumed to be representative of the period before the simulation begins. Similarly, we can derive steady-state pool sizes for 14 Y and 14 O: where 14 iL ini and 14 iR ini is the initial 14 C input via litter input according to Eqs. (6) and (7). Here, the assumption is that 14 CO ATM 2 (t) was more or less constant before 1950. Actually 14 CO ATM 2 (t) did vary prior to 1950 due to natural causes and the Suess effect; nevertheless 14 CO ATM 2 (start − tlag (L,R) ) was taken as the initial 14 C signature of litter input, where start denotes the starting year of simulations. We took the latest year we give in Table 1 under "Stand history" as the starting year for the simulations at the different sites.
Contrary to the approach taken by Yeluripati et al. (2009), preliminary modeling exercises showed that it is not feasible to simultaneously treat the initial model pools Y ini , 14 Y ini , O ini and 14 O ini as unknown parameters, because of the inherent link between Y ini and 14 Y ini , and O ini and 14 O ini via 14 C. There is no reason to assume a discrepancy in the behavior of C and 14 C prior to 1950; hence, we have to assume that deviations from steady state have the same direction for C and 14 C. Consequently, two additional parameters f Y and f O (Fig. 1) were introduced for the nonsteady-state version of I 14 CBM that allow for a relative deviation of initial values from the steady state of the respective pools:

Study sites
We used data from three spruce-dominated forest ecosystems in Germany and the USA (Table 1) to calibrate the parameters of I 14 CBM. The Howland Forest research site is a spruce-fir forest in east-central Maine, USA. The stand was selectively logged around 1900, but has remained undisturbed since then (Hollinger et al., 1999 (Fernandez et al., 1993;Gaudinski et al., 2001). Due to the hummocky topography, the organic layer varies considerably in thickness (Gaudinski, 2001). Oi, Oe and Oa horizons of varying thickness have been separated and could possibly be designated as a mor-like humus. The Coulissenhieb II site is a mature Norway spruce (Picea abies L.) stand in the Fichtelgebirge mountains in northeastern Bavaria, Germany. Schulze et al. (2009) report that according to the forest administration the area has been clear cut during the 16th and 18th century as timber supply for the local mining industry. In 1867 the stand was afforested with Norway spruce, so that the average stand  (Schulze et al., 2009). High base saturation in the Oa horizon (54 %) and lower base saturation of 12-16 % in the subsoil indicates past superficial forest liming (Hentschel et al., 2009). The Solling roof project is a 71-year-old (2004) Norway spruce (Picea abies L.) plantation at the Solling plateau in Lower Saxony, Germany. The Solling roof project consists of four different plots, of which three are covered by transparent roofs underneath the canopy. In this work only 14 C SOC and 14 C HR data from the ambient control plot without a roof was used. This plot is mostly referred to as Solling D0 (Bredemeier et al., 1998). Table 1 gives an overview of the most important characteristics of all three sites, such as soil type, humus form, mean annual temperature and precipitation.

Measurements of soil organic carbon stocks
The soil organic carbon stock on an area basis (kg C m −2 ) was calculated as where i denotes the individual horizons/layers and SOC content,i is a SOC content or mass fraction (kg C (kg dry soil) −1 ), BD i is a soil bulk density (kg dry soil m −3 ), depth i is the thickness of the sampled horizon/layer i, and CF i is the volume fraction of coarse fragments, namely stones and roots CF = stone volume+root volume soil volume . The correction for coarse fragments is necessary, as stones contain no organic carbon and (live) roots are generally not summarized under SOM (dead soil organic matter according to Rodeghiero et al., 2009).

Soil respiration measurements
Two types of soil respiration chambers of the class of closed chambers were used: closed dynamic chambers were used at Howland and Coulissenhieb II, whereas at Solling closed static chambers were used. Generally, in closed chambers the CO 2 flux is estimated by measuring the increase of CO 2 in the chamber's head space during a known period of time (Pumpanen et al., 2004(Pumpanen et al., , 2009). The soil CO 2 -C efflux can then be determined from the increase in the CO 2 concentration c t . In closed static chambers the CO 2 concentration increase c t is determined from air sampled with syringes, which are then analyzed for CO 2 with a CO 2 analyzer (Borken et al., 1999;Pumpanen et al., 2009). In closed dynamic systems, c t is determined with portable infrared gas analyzers with the air circulating between the chamber and the analyzer (Pumpanen et al., 2009).

Soil incubations
The 14 C signature of HR was determined by incubating root-free soil samples at a constant temperature for several days. CO 2 that evolved during the incubations is sampled and analyzed for 14 C. Two different sampling methods were applied. At Howland Forest, samples from each horizon were taken, transferred to 100 mL jars and incubated for 12 days. The amount of CO 2 evolved during the incubation was measured, and the collected CO 2 was analyzed for 14 C. 14 C for bulk heterotrophic respiration can then be calculated as described in Eq. (16) (Gaudinski, 2001). At Coulissenhieb II and Solling a different sampling approach was taken. Instead of incubating disturbed soil samples from individual horizons, complete soil cores were taken. Roots were either manually removed from the soil cores at the Coulissenhieb II site (Muhr et al., 2008 or left in the soil cores under the assumption that root fragments die after 10 days and are not able to respire anymore (Lemke, 2007). Hence, the 14 C signature of CO 2 that evolved during the incubation of soil cores represents the bulk 14 C of HR.

Measuring radiocarbon signatures
14 C values were determined with accelerator mass spectrometry (AMS). The radiocarbon signatures are reported in relation to an oxalic acid standard (0.95 times the specific activity of NBS Oxalic Acid I (SRM 4990B) normalized to a δ 13 C VPDB of −19 ‰) (Stuiver and Polach, 1977). The δ 13 C VPDB value of the samples was used to account for isotopic fractionation that occurred during sample formation (Stuiver and Polach, 1977). The preparation of AMS graphite targets followed procedures described in Xu et al. (2007). The final determination of the 14 C 12 C -ratio of AMS graphite targets from all three sites was performed at the Keck-CCAMS facility of University of California, Irvine, USA.

Calculation of 14 C signatures for bulk SOC stock
In order to calculate a bulk 14 C value for the whole soil profile, we used a SOC-stock-based weighting approach: where 14 C SOC,i is the 14 C value of the horizon i, SOC stock is the total SOC stock of the whole profile as defined in Eq. (14), and SOC stock,i is the SOC stock of one horizon.

Calculation of 14 C signatures of bulk heterotrophic respiration
Similar to Gaudinski (2001) a flux-weighted average was calculated as a bulk 14 C value of heterotrophic respiration from individual incubation samples, when incubations were conducted per horizon (Howland) and not on soil cores including several horizons (Coulissenhieb and Solling): where F (CO 2 ) i is the CO 2 produced in jar i, BD i is the bulk density of the soil horizon in jar i, depth i is the thickness of the soil horizon in jar i, and 14 C incubation,i is the 14 C of CO 2 evolved during incubation.

Partitioning of soil respiration
Soil respiration SR can be partitioned into heterotrophic respiration (HR) and rhizosphere respiration (RR) using a variety of approaches. They range from root-exclusion experiments, like trenching and tree girdling experiments, to isotopic approaches, like continuous or pulse labeling of plants in 14 CO 2 or 13 CO 2 atmosphere or using the bomb-14 C signal as a pulse label (Kuzyakov, 2006). At all three sites in this study, an isotopic approach using the bomb-14 C signal was applied. The measurement of the 14 C signature of total soil respiration ( 14 C SR ) and its components ( 14 C HR and 14 C RR ) allows SR to be partitioned into HR and RR using a linear two-source, single isotope mixing model (Phillips and Gregg, 2001). On short timescales the radioactive decay of 14 C can be neglected and the atmospheric 14 C signal can be used as a label that allows us to distinguish between plantderived CO 2 (RR) and SOM-derived CO 2 . Plant-derived CO 2 normally closely follows the 14 C signature of the atmosphere, whereas SOM-derived CO 2 greatly differs from the atmospheric 14 C signal due to longer residence times of C in SOM pools. The 14 C signature of plant-derived CO 2 can, however, differ from the current atmospheric signal if carbon from storage pools and not only recently assimilated carbon is metabolized (Czimczik et al., 2006;). 14 C SR is then a mixture of 14 C HR and 14 C RR that can be described by the following mass balance equations: 14 C SR · SR = 14 C HR · HR + 14 C RR ·RR.
Based on this equation we can calculate the proportion of heterotrophic respiration in total soil respiration (f HR ):

Data uncertainties
Uncertainties for SOC, HR and 14 C HR were taken directly from the original publications we refer to in Sect. 2.2.4 or were calculated from uncertainties reported therein using the basic rules for error propagation for sums and products (Taylor, 1997). Because the propagation of uncertainties for f HR and 14 C SOC,bulk could not be broken down into steps that use the basic rules for error propagation for sums and products, the general formula for propagation of errors had to be applied (Taylor, 1997;Phillips and Gregg, 2001) (Eq. A1). A detailed description of how the uncertainties for f HR (Eq. A5) and 14 C SOC,bulk (Eqs. A2-A4) have been calculated can be found in Appendix A1.

Howland Forest
The total soil organic carbon stock at the Howland Tower site was calculated with Eq. (14) based on carbon content measurements in 1997 from the soil pit (n = 1) reported in Gaudinski (2001) and on data of spatial heterogeneity, coarse fraction volume (CF) and bulk density (BD) from n = 24 quantitative soil pits reported by Fernandez et al. (1993).
Here, we excluded the measurements from the BC horizon, because for the second sampling of 14 C SOC in 2007 measurements were only performed up to the Bs horizon. Because the SOC stock (10 ± 2 kg C m −2 ; mean ± SE) we calculated is only based on one soil pit, its standard error is considerable larger than standard errors of the SOC stock in other studies for the same site (Richardson et al., 2010) that are based on several soil pits reported in Fernandez et al. (1993). Bulk values of 14 C SOC up to the Bs horizon (bottom depth 40 cm) were calculated with Eq. (15) using the horizon specific SOC stocks from 1997 and 14 C values from 1997 and 2007. The 2007 14 C values were weighted with the horizon specific SOC stocks from 1997. 14 C SOC values stem from only n = 1 soil pit. The horizon specific standard errors for SOC stocks are based on estimates from Fernandez et al. (1993) and the standard errors for the 14 C values were used to calculate a standard error for the stock-weighted average with Eqs (A2)-(A4). Bulk 14 C HR values (Eq. 16) were calculated from incubations of horizon specific soil samples that were performed in 1999 and 2010 (personal communication, Carlos Sierra). An f HR of 0.55 ± 0.13 (mean ± SE) was calculated only with data from 1997 (Eqs. 19 and A5), as 14 C SR values were not available for 2010. The atmospheric 14 C signal in 1997 was used as a proxy for 14 C RR measurements. This f HR value was used to calculate HR from an annual time series (1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)) of soil respiration measurements at the tower site from n = 8 collars (personal communication, Kathleen Savage). Standard errors for HR were calculated via error propagation using the standard errors of SR and f HR . Average annual leaf litter input at Howland Forest is about 0.155 kg C m −2 (personal communication, Kathleen Savage). As no data on belowground litter input for Howland were available, we simply assumed that belowground litter input would be in the same range as aboveground litter input (cf. aboveground and belowground litter input at Coulissenhieb II andSolling, andMcClaugherty et al., 1984 andPersson, 1978). Based on lag times for different types of aboveground litter (Gaudinski, 2001, p. 121) we calculated a lag time of 5 years between the photosynthetic fixation of 14 C and its addition to SOC as aboveground litterfall at Howland Forest. A root lag time of 10.5 years (Eq. 7) was calculated from the 14 C of roots < 0.5 mm and 0.5-1 mm (Gaudinski, 2001, p. 151).

Coulissenhieb II
The total soil organic carbon stock at the Coulissenhieb II site (15.1 ± 0.9 kg C m −2 , mean ± SE) is based on measurements of n = 9 soil pits (0.7 m × 0.7 m) including the organic horizons (Oi, Oe and Oa) and the mineral horizons (Ea, Bsh, Bs and Bv; bottom depth 52 cm) (Schulze et al., 2009). A bulk value of 14 C of SOC was calculated with Eq. (15) using horizon specific SOC stocks and 14 C values reported by Schulze et al. (2009). 14 C values of SOC were determined for n = 3 of the nine soil pits. The horizon specific standard errors for SOC stocks and 14 C values were used to calculate a standard error for the stock-weighted average with Eqs. (A2)-(A4). A 14 C HR signature for the year 2007 was calculated as the arithmetic mean of 14 C HR values obtained from six different incubations (Table 2 in Table 2 in . The incubations were performed with soil cores from a control plot from six different sampling dates in the period from 3 August 2006 to 16 October 2007 . The 14 C HR of each sampling date was based on n = 3 replicates. The standard error relating to the 14 C HR,2007 value was calculated via error propagation from the standard errors of the individual sampling dates. The calculated 14 C HR was assigned to the measurement year 2007. 14 C SR , 14 C HR and 14 C RR values of the individual sampling dates were used to calculate f HR and the related standard errors (Eq. 19). The arithmetic mean of the individual f HR values is 0.82 ± 0.06 (mean ± SE). This value was used to calculate HR for the years 2006-2008 with HR = f HR · SR. Standard errors for HR were calculated via error propagation using the standard errors of SR and f HR . Aboveground litter input (iL) data was only available from the adjacent Coulissenhieb I site with an average needle litter input of 0.103 ± 0.017 kg C m −2 yr −1 (mean ± SD) (Berg and Gerstberger, 2004 input from fine roots (iR) of 0.206 kg C m −2 yr −1 was obtained by summing up monthly estimates of fine-root mortality based on the sequential coring method that were reported in studies on the effect of drought and soil frost on the fine-root system (Gaul et al., 2008a, b). Schulze et al. (2009) reported an aboveground litter lag time of six years based on 14 C measurements of fresh spruce litter. We calculated a root-biomass-weighted lag time of eight years from fine-root biomass data (Gaul et al., 2008a) and 14 C measurements of live roots in different depths (Gaul et al., 2009).

Solling D0
The total soil organic carbon stock at Solling D0 was calculated as a combination of SOC stock measurements for the organic layer from an adjacent spruce forest in 1993 and SOC stock measurements for the mineral soil in 1997. In order to be able to combine these measurements, we assume that SOC stocks did not change drastically within four years. The individual SOC stock measurements are based on n = 61 replicates for the Oi + Oe, n = 40 replicates for the Oa, and n = 5 replicates for mineral soil horizons. The combined Oi + Oe horizon was split into Oi and Oe based on data in Lemke (2007). Standard errors of the individual horizons were used to calculate the standard error of SOC stock up to 30 cm. A bulk value of 14 C SOC (3 ± 13 ‰; mean ± SE; bottom depth 30 cm) was calculated with Eq. (15) using horizon-specific SOC stocks and 14 C values for the Solling D0 site that were collected during a Ph.D. thesis (Lemke, 2007). 14 C values of SOC were determined with n = 3 replicates. The horizon specific standard errors for SOC stocks and 14 C values were used to calculate a standard error for the stock-weighted average with Eqs. (A2)-(A4). In July 2004 an incubation experiment yielded a 14 C HR signature of 119.4 ± 1.2 ‰ (mean ± SE). Together with 14 C SR and 14 C RR signatures from Solling D0, the 14 C HR signature was used to calculate f HR (0.69 ± 0.03; mean ± SE) using Eqs. (18) and (A5). This value was used to calculate HR for 2004 with HR = f HR · SR. Standard errors for HR were calculated via error propagation using the standard errors of SR and f HR . As an annual estimate of aboveground litter input the annual average of foliage litter input (0.109 kg C m −2 yr −1 ) on the roof control plot (D2) was used. Fine-root biomass and necromass measurements in different depths from the roof control plot (D2) (Murach et al., 1993) were used to calculate fine-root mortality with the compartmental flow method (Murach et al., 2009). These data from 1992 give an estimate of fine-root mortality of 0.094 kg C m −2 yr −1 . For the Solling D0 site we used the same tlag L and tlag R as for Coulissenhieb II.

Bayesian calibration
Process-based models in geosciences tend to be overparameterized with regard to data availability (van Oijen et al., 2005). Hence, it does not make sense to apply parameter fine-tuning, i.e., looking for one best parameter set, but rather to show how well we can constrain the uncertainty about model parameters with the data at hand. The Bayesian approach is suited to deal with overparameterized models because we are able to include prior knowledge about model parameters θ by updating the prior distribution of parameters p(θ) with the data likelihood p(y | θ) to the posterior distribution of parameters p(θ | y) (Reichert and Omlin, 1997;Gelman et al., 2004): The posterior density p(θ | y) describes the probability of parameters given the model and the observations y. It is a combination of the prior probability of a parameter set θ and the likelihood that we observe the observations y given this parameter set θ (Gelman et al., 2004). Numerical algorithms like the class of MCMC algorithms are commonly used to generate a sample from the posterior p(θ | y) ( van Oijen et al., 2005). The essential property of all MCMC algorithms is that at each iteration the approximate distributions are improved, so that they eventually converge to the target distribution, the posterior p(θ | y). After ensuring convergence of the MCMC algorithm all drawn samples can be used to make inferences about θ by simple summary statistics (e.g., mean, standard deviation and percentiles) or histograms and kernel density estimates which provide insight into the distribution of p(θ | y).

Prior parameter distributions
Based on concluding remarks by Andrén and Kätterer (1997) a broad prior for the humification coefficient h was derived from the mass fraction remaining after a 5-10-year litterbag experiment. Berg (2000) reported a remaining mass fraction of 0.26 for Norway spruce litter in litterbag experiments. We used this value as the mode for a logit-normal distribution with the 99th percentile at 0.9 (Fig. 2a). Since the decomposition rates k Y and k O are theoretically bound at 0, a log-normal distribution was chosen for these two parameters, with modes at 1 yr −1 and 0.006 yr −1 (the latter is the default recommendation by Andrén and Kätterer, 1997). The 99th percentile for k Y was set to 7 yr −1 , and to 1 15 yr −1 for k O (Figs. 2b and c).
The lag time parameters, tlag L and tlag R , are also theoretically limited to positive values. We use the measured lag times, reported in Sect. 2.2.4 for the individual sites, as modes for log-normal priors, with their 99th percentile at the measured lag time +2 years (Figs. 2d and e). Berg and Gerstberger (2004) reported that the ratio of foliar litter input to total aboveground litter input is dependent on stand age: in a Scots pine chronosequence the relative size of the foliar litter fraction was 83 % in an 18-25-year-old stand, 68 % in a 55-61-year-old stand, and 58 % in a 120-126-year-old stand. This corresponds to a possible bias iL factor between 1.2 and 1.7 when only foliar litter input was measured. Hence, we set the mode of a log-normal prior for bias iL to 1 and the 99th percentile to 2. (Fig. 2f). The same prior was used for bias iR but with its 99th percentile at 3 because the amount of root litter input is probably less constrained (Fig. 2g). Truncated normal distributions with mode = 1 and the 99th percentile = 1.5 (truncation at 0) were used for the deviation of steady-state parameters, f Y and f O , so that a priori the highest probability was assigned to Y and O pools in steady state (Figs. 2h and i).

Joint constraints calibration experiment
Under the assumption that the measurement errors were normally distributed, we formulated the data-likelihood function for the individual observational constraints i as: where i is one of the different data streams (SOC, HR, 14 C SOC , 14 C HR ), t ∈ m yrs denotes the years in which measurements were made, σ i (t) the uncertainty associated with the measurement Obs i (t), and ICBM i (t) the model predicted value. σ i (t) was kept fixed at the SEs of the respective measurement under the assumption that these dominate over modeling uncertainty (Reinds et al., 2008;van Oijen et al., 2013). In order to study how combinations of different observational constraints influence the posterior parameter uncertainty, we devised a set of multiple constraints calibration experiments with four runs containing different combinations SOC + 14 C SOC + 14 C HR Run(+HR) SOC + 14 C SOC + 14 C HR + HR of observational constraints i ( Table 2). The multi-objective data likelihood is then simply defined as the product of the the individual p(y | θ) i in Run(XY ): We then used a variant of the standard Metropolis-Hastings algorithm, the delayed rejection and adaptive Metropolis (DRAM) algorithm (Haario et al., 2006), to sample from the posterior distribution p(θ | y). In the adaptive Metropolis part of this algorithm the generation of new proposal parameter sets θ is made more efficient by learning from the accepted parameter sets thus far (Haario et al., 2001). The delayed rejection part of DRAM improves the efficiency by scaling the proposal covariance matrix with a predefined factor if the proposed parameter set is rejected (Haario et al., 2006). We used the DRAM implementation of Soetaert and Petzoldt (2010) to perform the calibration within the statistical software R-2.15.1 (R Development Core Team, 2012). The MCMCs were started from five overdispersed starting parameter sets θ using the datalikelihood function as defined in Eq. (22) and the priors θ as defined in Sect. 2.3.1. These overdispersed starting points were retrieved by Latin hypercube sampling from the entire range of the prior distributions. In short, Latin hypercube sampling means that the prior parameter space is subdivided into equally sized segments and a set of starting parameters is constructed by randomly drawing one value for each parameter out of the segments. All parameters θ that appear in the conceptual overview (Fig. 1) except for λ, the radioactive decay constant, are calibrated.
We can be sure that the five chains have converged if, after thousands of iterations, the chains have forgotten about their initial values. We monitored convergence using the potential scale reduction factorR as defined in Gelman et al. (2004). For each site (Table 1) and each calibration setup (Table 2) we ran five chains in parallel, with 100 000 iterations each. We discarded the first 50000 iterations of each chain and checked if the within-and between-chain point scale reduction factorR < 1.025. Additionally, we visually checked for convergence with trace plots for the five chains and each parameter. Furthermore, we checked density plots for each parameter and chain to ensure that inferences from different chains would give the same results. The second halves of the chains were merged, thinned to a total sample size of 16 666 (every 15th sampled value was kept) and treated as a sample from p(θ | y).

Information content of different constraints
We used two measures to quantify the information gain in moving from the prior p(θ) to the posterior p(θ | y). We computed the relative reduction of the interquartile range between the prior of a certain parameter θ and its posterior: where IQR denotes the interquartile range. IQR was used to quantify the reduction in uncertainty for individual parameters. When we want to take a multidimensional look at combinations of parameters, IQR becomes an undefined quantity. In this case we used the Kullback-Leibler divergence, D KL , to quantify the information content of the different data streams. D KL is a dimensionless measure for the dissimilarity between two probability density functions (PDFs), e.g., the Kullback-Leibler divergence between the posterior p(θ | y) and the prior p(θ) is denoted as D KL (p(θ | y) p(θ )). Since an accurate estimation of D KL based on PDF estimates of p(θ | y) and p(θ) is not possible in higher dimensions (the number of elements in θ), we used a D KL estimator based on a k-nearest neighbor (k-NN) search (Boltz et al., 2009). This k-NN-based D KL estimate does not explicitly estimate the PDFs, but allows a direct esti-mate of D KL from samples of p(θ) and p(θ | y), as retrieved by Bayesian calibration (Boltz et al., 2009).

Results and discussion
The results of all three sites will be presented and discussed in a comparative fashion to highlight similarities and differences between the sites. The results of the calibration at Howland Forest will be used to highlight common characteristics in a more detailed fashion, while differences for the two other sites are pointed out.

Information content of different observational constraints
The degree to which the posterior parameter distributions are constrained compared to the prior parameter distribution depends on three factors: the observational constraints included in the calibration, the respective measurement uncertainties, and the parameter in question (Fig. 3).
Using only SOC as observational constraint (Run(SOC)) already narrows the posterior distribution of k O by 14, 43 and 54 % at Howland Tower, Coulissenhieb II and Solling D0 (Fig. 3). Also, the IQR of the humification coefficient is somewhat better constrained in Run(SOC) compared to the prior (Fig. 3), but the violin plots of h still cover the whole range of possible values (e.g., Fig. 4a at Howland Forest and Figs. A1a, A2a).
SOC together with 14 C SOC (Run(+ 14 C SOC )) considerably narrows the estimates for the humification factor h and the decomposition rate of the old pool k O . Compared to the prior the interquartile ranges of h and k O are reduced in Run(+ 14 C SOC ) by 74-84 % and 88-95 %, respectively (Fig. 3). The other parameters were not considerably constrained by the observational constraints SOC + 14 C SOC .
The inclusion of 14 C HR into the observational constraints (Run(+ 14 C HR )) markedly reduced the uncertainty of the decomposition rate of the young pool k Y compared to Run(+ 14 C SOC ) (Fig. 3). The change of the interquartile range, IQR, is between 20 % for Howland Forest and 96 % for Solling D0. These percentages reflect large differences in observational uncertainties among the studied sites. While the reported uncertainty of 14 C HR at Solling D0 was only 1.2 ‰, at Howland Forest the uncertainty in different years was 2 and 5 ‰.
When HR was included in the calibration, bias iL and bias iR were shifted towards higher values for Howland Forest and Coulissenhieb II (e.g., Figs. 4h and i for Howland Forest). Also, the IQR was markedly decreased by the inclusion of HR in the calibration (Fig. 3), especially for bias iR .
Including 14 C HR in the calibration lead to a slight shift to higher posterior tlag L and tlag R values compared to the prior (panels f and g in Figs. 4, A1 and A2). This means that increasing the lag times is a simple possibility of achieving a      correct 14 C HR signature by increasing the 14 C contents in litter inputs. This already points to a trade-off between estimating k Y and lag times. The slight shift to higher posterior tlag L and tlag R values was accompanied by a broader posterior compared to the prior in Run(+ 14 C HR ) (Fig. 3). The parameters f Y and f O , which were introduced to allow for a deviation from steady state, are hardly constrained compared to the prior in all runs. In general, only the parameters h, k Y and k O could be constrained well with the used observational constraints.

Correlations between parameters
As shown in Fig. 5 for the Howland Tower site, there are many strong correlations between the different combinations of posterior parameter distributions. Prominent correlations between parameters can be explained by comparing the direction of the correlation coefficient to the model structure. The highest positive correlation coefficients were observed between h and k O meaning that a higher value of h can be compensated by a faster decomposition rate of k O . This strong correlation emerges already in Run(SOC), but is per-sistent as more data streams are included . This is consistent with what we have to expect from the model structure: if more carbon from the young pool is transferred to the old pool, the turnover time must be lowered to get the same amount of carbon in the old pool.
When 14 C SOC is included in the calibration, another interesting correlation emerges: the f O parameter is positively correlated with the decomposition rate of the old pool (Fig. 5b). This is in line with considerations by Wutzler and Reichstein (2007) who found that for soils that have not reached (and are below) their equilibrium stock, model calibration to the current carbon stock overestimates the decomposition rate of the slowest pool. They propose a transient correction which prescribes a lower decomposition rate for the old pool. The correlation between f O and k O in runs with 14 C SOC confirms these considerations: if f O was actually below the steady state, but was set to 1, k O would be shifted to faster decomposition rates.
In Run(+ 14 C HR ) h and k Y become negatively correlated (Fig. 5c). This correlation between h and k Y means that the same 14 C HR value can be achieved by either increasing the  fraction of the decomposition flux (k Y · Y ) that is directly respired (i.e., a lower h) or by increasing the decomposition flux itself (i.e., a faster k Y ). Another prominent positive correlation emerges in Run(+ 14 C HR ) for k Y and tlag R (Fig. 5c) and can be interpreted as follows: the young pool can keep its high 14 C signatures by either increasing the decomposition rate k Y or with longer lag times of tlag R , resulting in higher 14 C signatures of the litter input. This correlation pattern trickles down from the decomposition rate k Y of the young pool towards the decomposition rate k O of the old pool via the humification flux h (Fig. 5c).
In Run(+HR) bias iL and bias iR become strongly negatively correlated; this means that in I 14 CBM the total amount of litter inputs have to be at a certain level in order to explain the observed heterotrophic respiration. Hence, we can change either aboveground litter inputs or root litter inputs to get the same amount of total litter inputs. Due to the fact that I 14 CBM models bulk SOC stocks and does not model a depth distribution of root litter inputs, it is not very relevant which kind of C inputs drive the model. Still, it was important to distinguish between aboveground and belowground litter input in order to allow different lag times to the atmospheric record for root litter input and leaf litter input.
Nevertheless, the overall strong correlations suggest that the parameter distributions are more strongly constrained than suggested by the marginal distributions (Fig. 5). This is exemplified by the strong correlations between h and k O : the kernel density estimates of the posterior parameter distributions of h and k O (e.g., in the diagonal of Fig. 5d) do not give any information on how likely it is that low values of h are observed together with very high decomposition rates k O . If we look at the bivariate probability density plot in the lower triangle of Fig. 5d, we get the answer: it is very unlikely. Hence, it is fruitful not only to consider the univariate posterior parameter distribution, but also to consider correlations between parameters in two-or higherdimensional space, which provide a further constraint for the possible model behavior. Braakhekke et al. (2013) conclude that the fact they observed strong correlations between parameters is an indication that the model is overparameterized with respect to the available data. Certainly, I 14 CBM is also overparameterized with regard to the available data at Coulissenhieb II, Solling D0 and Howland Tower. Strong correlations between model parameters are, however, not necessarily only a measure for the degree of overparameterization of a model: a comparison between Run(SOC) (Fig. 5a) and Run(+HR) (Fig. 5d) at the Howland Tower site shows that for Run(SOC) there are far fewer correlations between the posterior parameter samples than in Run(+HR). We can expect that strong correlations between parameters will always exist in modeling studies based on 14 C and C data, because modeling 14 C and C in parallel introduces parameters that govern several similar equations (e.g., k Y in Eqs. 1 and 3). Hence, strong correlations between parameters should not only be seen as an indication for over-parameterization, but also as a reflection of the model structure: if, for example, the young and the old pool were not linked via the humification flux, but received litter input independently of each other, the correlation between k Y and k O would be considerably reduced. In addition, in multipleconstraints calibration settings, correlations between parameters are also an indicator for the strength of trade-offs between different objectives/data streams (Figs. 5a-d).
The features described above for Howland Forest generally also hold true for the calibration runs at the two other sites, Coulissenhieb II and Solling D0 (not shown). The strength of correlations is obviously slightly different, while the direction and magnitude of correlations is the same for most of the parameter combinations.
For a similar purpose as for correlation matrices (Fig. 5) and the IQR, we can use the Kullback-Leibler divergence between the joint posterior distribution of several parameters and their prior to quantify how well the different data streams constrain SOC turnover overall. We present two settings here: the joint posterior of the parameters k Y , h and k O is compared with the joint prior of these parameters (Fig. 6) in which no correlations were present. The parameters k Y , h and k O govern the overall SOC turnover if we do not account for possible biases in the assumptions or measurements with parameters such as bias iL , f O or, for 14 C lag times, tlag L and tlag R . Further, we compared the joint posterior of all parameters with the respective joint prior to evaluate the overall constraint of different data streams on the presented SOC model.
The overall information gain for SOC turnover (joint posterior of k Y , h and k O ) was highest when including 14 C of SOC and HR in the calibration (Fig. 6). Including 14 C HR at Solling D0 led to a disproportionate information gain due to the reported low uncertainty of that data stream at this site (Fig. 6). The information gain for the joint posterior of all parameters was always highest when all data streams were included (Fig. 6). For Run(+HR), the Kullback-Leibler divergence did not indicate much information gain for constraining k Y , h and k O (Fig. 6) compared to Run(+ 14 C); the information gain for all model parameters (Fig. 6) when including HR in the calibration is, however, considerable. This underlines the fact that the HR data are more important for constraining the bias iL and bias iR parameters than for constraining the essential SOC turnover parameters, k Y , h and k O .

Relaxed steady-state assumption
Graphical inspection of the overall agreement between the model and the data showed that I 14 CBM was in general able to reproduce the data used for calibration (Fig. 7). This is valid for all sites and for the all constraints run, Run(+HR) (not shown for Coulissenhieb and Solling). This result can possibly be expected for most inverse modeling studies at other sites, as practically all SOM models are overparameterized considering the inherent scarcity of 14 C data. Some features, however, are notable: even with all observational constraints included, the joint use of SOC stock, HR, 14 C SOC and 14 C HR data did not make it possible to determine if any of the sites has been gaining or losing SOC (Fig. 7), because the marginal distributions of the parameters f Y and f O generally followed their prior distributions (Figs. 4,A1,A2). Nevertheless, at least some constraint for the f O parameter was gained through the correlation between f O and k O (Fig. 5b) which emerged when including 14 C SOC into the calibration. This shows that only the use of a multiple constraints approach (here mainly SOC + 14 C SOC ) made it possible to put this admittedly weak constraint on the source/sink strength of the investigated soils. Nevertheless, this correlation makes it difficult to simultaneously estimate decomposition rates (e.g., k O ) and the source/sink strength of a soil (e.g., f O ), especially for soils with only small deviations from a steady-state SOC stock. We could potentially resolve this trade-off by prescribing stronger priors for f O if we have strong indications for a major carbon loss in the past. Better yet, we could estimate a k O for a soil for which we can be reasonably sure that the SOC stocks are in equilibrium. This k O could then serve as a strong prior for a soil with fairly similar conditions, for which we want to estimate f O .
At the Howland Tower site, SOC stocks modeled in Run(+HR) do not differ much between the nonsteady state and the steady state case (Fig. 7). Not surprisingly, the effects of the parameters that allow for a deviation from steady state are seen more clearly in the time series of modeled HR (Fig. 7d). At all three sites, modeled HR of Run(+HR; nonsteady state) rapidly approaches the modeled HR of Run(+HR; steady state) (e.g., Fig. 7c). This is due to the fact that HR is dominated by CO 2 evolved from the young pool (Fig. 7f). As the young pool only has mean turnover times T Y of 0.8 (Howland Tower), 1.1 (Coulissenhieb II) and 5.7 years (Solling D0) in Run(+HR), steady state will be reached rather rapidly. Conversely, the young pool only accounts for less than 10 % of the total SOC stock at all sites (e.g., Howland Tower in Fig. 7f); thus, the steady state of modeled SOC stock could not be reached within the simulation period, as mean turnover times of the dominant old pool are 377 (Coulissenhieb), 313 (Solling) and 184 years (Howland Tower).
The modeled uncertainty of 14 C HR varies considerably throughout the time series: the uncertainty is low before the bomb peak and increases towards the bomb peak, drops again and is considerably reduced after the observation point ( Figs. 7g and h). The curve of the modeled 14 C HR values is beginning to level out, so that differences in 14 C of heterotrophic respiration between subsequent years will become increasingly difficult to detect. This is even more pronounced for the modeled bulk soil 14 C SOC signature, because bulk 14 C SOC has nearly reached a plateau phase, where values hardly change from year to year. One has to keep in mind, however, that this does not tell us anything about how the bomb peak propagates through the soil profile. Nevertheless, when looking at the 14 C signatures of the young and the old pool (Figs. 7i and j), it becomes obvious that the first peak of 14 C SOC stems from the peak of 14 C in the young pool. The beginning of a plateau phase for 14 C SOC can then be attributed to a mixture of the decreasing 14 C signature of the young pool and a still increasing 14 C signature of the old pool. One may hypothesize that parameters will be less well constrained in the nonsteady-state case when f Y and f O do not show a significant deviation from steady state, because f Y and f O introduce additional degrees of freedom that might not actually be needed. The marginal density plots in Fig. 8 have the advantage over the violin plots (e.g., Fig. 4) that the posterior probability density is not scaled to 1, so we can also use the maximum density as a measure for how well a parameter is constrained. The marginal density plots in Fig. 8 compare how well the model parameters are constrained in the nonsteady-state case and steady-state case. The maximum posterior density of k O is reduced at all sites. At Solling D0 k Y is also slightly less well constrained in Run(+HR; nonsteady state) than in Run(+HR; steadystate). Overall, the marginal density plots in Fig. 8 suggest that parameters k Y , k O and h are constrained well in the nonsteady-state as well as in steady-state version of Run(+HR) compared to the prior. Giardina et al. (2004) report that only around 10 % of soil respiration is derived from the decomposition of old soil organic carbon. Taking the proportion of heterotrophic respiration in total soil respiration, f HR , (Eq. 19) and the contribution of the old pool O to HR at our three sites into account, we have similar mean contributions of 7.3 % (Howland), 5.4 % (Coulissenhieb II) and 13 % (Solling D0) of old soil organic carbon to soil respiration. Because we used a bulk soil organic matter turnover model, the turnover times and the humification coefficient give a diagnostic rather than a mechanistic insight into how much carbon is cycling on the different timescales. The mean turnover times T Y of the young pool of 0.8 (Howland Tower), 1.1 (Coulissenhieb II) and 5.7 years (Solling D0) together with the humification coefficient h of 0.14 (Howland Tower), 0.07 (Coulissenhieb II) and 0.21 (Solling D0) indicate that most of the organic carbon in these soils is turned over within a relatively short period. For the estimation of k Y , one has to keep in mind that it vitally depends on the 14 C HR value (Fig. 4), and thus, by way of 14 iL(t) and 14 iR(t), also on the lag times tlag L and tlag R that we used. Although we do not look at actual root turnover estimates with the parameter tlag R , but merely at a realistic 14 C value of root litter input to the soil organic carbon pool, our tlag R values might be overestimated due to a bias towards larger roots when handpicking roots. Hence, fast-cycling roots with a smaller difference to 14 CO ATM 2 might be underrepresented . In turn, this bias for larger roots and a lower tlag R would result in longer T Y estimates.

Discussion of fitted turnover parameters
The mean turnover times T O of the old pool (184 years at Howland, 377 years at Coulissenhieb II, 313 years at Solling D0) point to the presence of a relatively persistent carbon pool that makes up more than 90 % of the soil organic carbon stock. This high contribution of slowly cycling organic carbon can be mainly attributed to the inclusion of 14 C SOC data in the calibration. Again, this shows the merits of including SOC stocks and heterotrophic respiration fluxes plus their respective 14 C isotopologues. Nevertheless, one has to consider that with a bulk SOC model we have to sum and weight SOC stocks and SO 14 C up to certain depth, so that, e.g., the Coulissenhieb site with a considered bottom depth of 52 cm has a much longer turnover time for the old pool than Solling D0, where we used a bottom depth of 30 cm. Here, vertically explicit SOC turnover and transport models (e.g., Kaneyuki and Kichiro, 1978;O'Brien and Stout, 1978;Elzein and Balesdent, 1995;Baisden et al., 2002;Braakhekke et al., 2011) might be helpful in order to resolve different bottom depths for sampling SOC and SO 14 C. Given the structure of these models, their turnover times, however, still give more diagnostic than mechanistic insight because they do not consider important processes such as sorptive stabilization, energy limitation or the recycling of SOM through microorganisms which are expected to contribute to radiocarbon ages of SOC of more than 1000 years in the deep soil (Conant et al., 2011;Schmidt et al., 2011).

Interpretation of litter input bias parameters
If the sites are in steady state, the bias parameters can be interpreted as a systematic deviation of HR and litter input because HR = iL + iR under steady state (Sanderman et al., 2003). Already by comparing the relation of HR, iL and iR to the calibrated bias iL and bias iR , we see that for Coulissenhieb II and Howland Tower these two parameters have to be higher than 1. The reasons for a bias at these two sites can be manifold: the belowground litter input at these sites might have been underestimated (sequential coring at Coulissenhieb II and the assumption at Howland that belowground litter input is in the same range as aboveground litter input), or there may be a significant contribution of subsoil SOC turnover to overall heterotrophic respiration. Further, our partitioning of soil respiration using the bomb-14 C signal might have overestimated the proportion of heterotrophic respiration in total soil respiration, f HR , because the incubations used to measure 14 C HR might not have been conducted under conditions that are representative of what is observed in the field over the course of a year.
Furthermore, one could also speculate about recent deviations from steady state for faster-cycling soil components (organic layer). The applied deviation from the steady-state parameter, f Y , only matters in the first years of the simulation period, but due to its fast decomposition rate, the Y pool approaches steady state rather rapidly. Hence, one could also interpret a bias parameter above 1 as disturbance of the Y pool leading to a loss of SOC in the young pool. Given the information we have about these two sites, this seems, however, quite unlikely. Nevertheless, at sites where measurements of aboveground litter input and heterotrophic respiration are available, one could use the steady-state relation iR = HR−iL as an additional criterion for assessing the reliability of different methods quantifying root turnover (Lukac, 2012).

Conclusions
1. The Bayesian parameter estimation was very instructive: violin plots of posterior parameter distributions were useful to quickly study the effect of different multiple constraint experiments. The correlation structure between different posterior parameter estimates provided useful insights on model behavior and additional constraints for the parameters.
2. The joint use of four observational constraints did not make it possible to determine whether any of the sites has been storing or losing carbon. Nevertheless, the joint calibration to SOC stocks and the 14 C of SOC stocks showed that there is a trade-off between estimating the source/sink strength of the investigated soils and the decomposition rate of the old pool. Since the introduction of the relaxed steady-state assumption did not cause a considerable amount of extra uncertainty, we can recommend the use of a relaxed steadystate assumption in order to identify possible deviations from steady state.
3. The relation of heterotrophic respiration to the sum of above-and belowground litter input is useful for evaluating the reliability of root turnover estimates.
4. The joint use of all four observational constraints -SOC stock, 14 C of SOC stock, heterotrophic respiration and 14 C of heterotrophic respiration -gives the tightest uncertainties ranges for the most essential model parameters of I 14 CBM: k Y , k O and h. k Y can be primarily constrained by 14 C of heterotrophic respiration, while k O can be constrained well with 14 C of SOC. The transfer coefficient between the young and the old pool, h, was best constrained by the joint use of all data streams.
5. The calibration of the I 14 CBM with the four observational constraints provided a good diagnostic for how much carbon is cycling on the different timescales. The fitted parameters show that in the three investigated soils more than 90 % of the soil organic stock resides in a relatively persistent carbon pool, while the fastcycling young pool contributes more than 80 % to the overall heterotrophic respiration.
6. Using different data streams of model output variables to constrain the parameters of conceptual model pools is a valuable strategy for parameter calibration besides "measuring the modelable", i.e., finding fractions that are relatable to conceptual model pools, or "modeling the measurable", i.e., introducing model pools that can be measured directly.

Supporting material
Detailed data sets from all three sites, the statistical analysis and model code (xlsx files and R scripts) are available upon request by email to Bernhard Ahrens (bahrens@bgcjena.mpg.de).

A1 Data uncertainties
According to Taylor (1997), for independent random errors the uncertainty δq (here, standard error) of any function q, with the variables x, . . . , z with their corresponding uncertainties (here, standard errors) δx, . . . , δz, can be calculated as: (1) The standard error of the SOC-stock-weighted 14 C value (Eq. 15) was therefore calculated as: δq( 14 C SOC,bulk ) =