Effects of model structural uncertainty on carbon cycle projections: biological nitrogen fixation as a case study

Uncertainties in terrestrial carbon (C) cycle projections increase uncertainty of potential climate feedbacks. Efforts to improve model performance often include increased representation of biogeochemical processes, such as coupled carbon–nitrogen (N) cycles. In doing so, models are becoming more complex, generating structural uncertainties in model form that reflect incomplete knowledge of how to represent underlying processes. Here, we explore structural uncertainties associated with biological nitrogen fixation (BNF) and quantify their effects on C cycle projections. We find that alternative plausible structures to represent BNF result in nearly equivalent terrestrial C fluxes and pools through the twentieth century, but the strength of the terrestrial C sink varies by nearly a third (50 Pg C) by the end of the twenty-first century under a business-as-usual climate change scenario representative concentration pathway 8.5. These results indicate that actual uncertainty in future C cycle projections may be larger than previously estimated, and this uncertainty will limit C cycle projections until model structures can be evaluated and refined.


Introduction
The global carbon (C) cycle provides a critical set of feedbacks that influences climate change in Earth system model (ESM) simulations of the twenty-first century. However, simulations of the terrestrial C cycle show considerable spread among models, and much of the uncertainty in C cycle feedbacks with climate change arises from terrestrial processes (Friedlingstein et al 2006, Arora et al 2013, Jones et al 2013, Friedlingstein et al 2014. Uncertainty in projections of global surface temperature change arising from C cycle feedbacks compares in magnitude to the uncertainty arising from physical climate processes (Huntingford et al 2009). Moreover, many ESMs poorly simulate key metrics of the present-day terrestrial C cycle such as vegetation and soil C, plant productivity, and C turnover rates, among others (Anav et al 2013, Piao et al 2013, Todd-Brown et al 2013, Carvalhais et al 2014. These uncertainties in the terrestrial C cycle present a critical challenge for the development of the next generation of ESMs, reflecting both an incomplete understanding of the underlying biological and ecological processes themselves, and how to represent them at global scales. Indeed, while the methodology used to derive the C cycle feedback parameters among models varies, and thus results are not directly comparable, the coupled C cycle-climate simulations reported in the IPCC fourth assessment report (Friedlingstein et al 2006, Denman et al 2007 show a similarly broad range in the carbon-concentration feedback and the carbon-climate feedback for land as those reported in the IPCC fifth assessment report ( Third, data assimilation provides a mathematical framework to constrain a particular model with observations (Smith et al 2013, Hararuk et al 2014. Finally, the mathematical properties of a model in terms of C pools, the partitioning of C input to those pools, and the transfers of C among pools (Xia et al 2013, Luo et al 2014 have been analyzed to assess model uncertainty. While these different approaches yield valuable insight into differences in model states, fluxes and responses to forcings, they often fail to provide insight into the underlying model structures that are collectively responsible for C cycle projections. Structural uncertainties reflect incomplete knowledge of how to represent processes in models. Structural uncertainty tends to increase with greater model complexity, which often accompanies process-level model development aimed at improving model performance. For example, one key component of ESMs has been the recent inclusion of a terrestrial nitrogen (N) cycle. Terrestrial nutrient availability, specifically nitrogen (N), strongly limits plant productivity and ecosystem C fluxes (Vitousek and Howarth 1991, Hungate et al 2003, Lebauer and Treseder 2008. As such, terrestrial C cycle responses to environmental change, like elevated CO 2 and/or climate change, may be strongly mediated by N availability ( Despite the complexities of simulating N biogeochemistry at the global scale, these models consistently demonstrate an attenuation of C-cycle response to environmental change when considering C-N dynamics, relative to C-only simulations. Preliminary efforts to evaluate models that simulate C-N interactions indicate that they partially capture ecosystem responses to elevated CO 2 (Zaehle et al 2014), but also illustrate that representing N inputs, transformations, and losses from terrestrial ecosystems introduces multiple degrees of freedom that increase model uncertainty (Thomas et al 2015). Here, we use one key process in the terrestrial N cycle-N inputs from biological nitrogen fixation (BNF)-to demonstrate the importance of evaluating model structural uncertainty.
Although global increases of N deposition from human activities like fertilizer application and fossil fuel combustion have increased global terrestrial N availability ( Thus, most C-N models use simple, modified-empirical relationships to generate spatial estimates of BNF based on evapotranspiration (ET) and/or net primary productivity (NPP) (table 1; Cleveland et al 1999). These phenomenological relationships are not derived from mechanistic understanding of BNF, but broadly capture biogeographical observations of higher rates of BNF in humid environments with (seasonally) high Table 1. Summary of the global land models that include coupled C-N biogeochemistry, the parameterizations each uses to calculate BNF, and relevant references justifying the chosen parameterization.

Model
BNF approach N Fixation reference solar radiation. Thus, ET and NPP are good bases to derive empirical BNF estimates that are consistent with the view that the energetic costs of 'fixing' atmospheric di-nitrogen (N 2 ) into a biologically usable form (NH 3 ) broadly limit rates of BNF (Gutschick 1981). Estimating BNF using relationships between ET and NPP produce similar estimates of preindustrial BNF inputs, but lead to differing predictions about the response of BNF to changing climate and CO 2 . Here, we compare the differences in N fixation inputs using these two commonly used approaches, and the associated effects on NPP and the global land C sink using the most recent version of the Community Land Model (CLM4.5bgc).

Methods
The CLM4 We conducted two sets of offline simulations with CLM4.5bgc that were identical apart from their representation of BNF. In the NPP driven case, we use the standard NPP-BNF relationship from Cleveland et al where annual NPP fluxes (g C m −2 y −1 ) are used to calculate instantaneous BNF rates (g N m −2 s −1 ). In the modified case we use the lower bound of ET-BNF relationship reported by Cleveland et al (1999).  (1) and (2)). We used 1900-1919 meteorology, 1850 [CO 2 ], N deposition, and land cover (see, Koven et al 2015) and an accelerated spin-up procedure (Koven et al 2013) to approximate steady-state pools and fluxes using the NPP driven configuration, which was followed by another 500-year standard spin-up phase for both NPP and ET driven cases. The initialized simulations were forced with CRU-NCEP re-analysis data over this historical period . The anomaly forcing provides a smooth transition between the observed historical period and the projected RCP8.5 CCSM4 projection. We quantified global differences in mean steady-state (1850-1859) C and N pools and fluxes between these two cases. We examined changes in projected C and N pools and fluxes through the historical period and RCP8.5.

Results
Initial estimates of global BNF were approximately 15% higher when simulated as a function of NPP than when simulated as a function of ET, totaling 90 and 77 Tg N y −1 , respectively (figures 1(a) and (b)). Both of these values are lower than the BNF estimates of Cleveland et al (1999), and at the upper end of the uncertainty estimates in Vitousek et al (2013). In the ET driven case, rates of BNF across much of the northern hemisphere were 25-35% lower than in the NPP driven case. By contrast, rates of BNF were higher in many arid ecosystems and savannas using the ET parameterization, and generally similar across tropical forests. By comparison, estimates of BNF simulated by CASA-CNP show much higher N inputs in the tropics and lower rates of BNF in extra-tropical regions compared to either ET or NPP parameterizations (figure 1(c)), resulting in larger total rates of BNF (142 Tg N y −1 ) (Wang et al 2010;see, Cleveland et al 2013). Few observational data points and high uncertainty (Cleveland et al 1999) preclude robust corroboration of these estimates, although we argue they all represent plausible approximations of preindustrial BNF rates.
Despite the differences in BNF using the two alternative parameterizations (figures 1(a) and (b)), we found negligible differences in steady-state C fluxes and pools between cases. Global estimates of plant productivity from NPP-and ET-driven cases totaled 46.1 and 44.8 Pg C y −1 , respectively. Nitrogen limitation in CLM occurs through the instantaneous downregulation of photosynthesis based on the availability and demand for N (Thornton et al 2007). Modifications to the leaf photosynthesis and canopy integration (Bonan et al 2011(Bonan et al , 2012 and soil N biogeochemistry (Koven et al 2013) have made the model less sensitive to N inputs, as extant soil N pools can largely meet plant demand (figure 2); although, specific aspects of the representation of N biogeochemistry in CLM warrant more focused attention (Thomas et al 2013a(Thomas et al , 2013b(Thomas et al , 2015. In our ET-driven case, estimates of NPP were <10% lower at high latitudes than in the NPP-driven case, but elsewhere they were very similar (figures 2(a) and (b)). Differences between initial total ecosystem C (the sum of all vegetation, litter, and soil C pools) from NPP and ET cases were more subtle, totaling 4500 and 4460 Pg C (0-3 m depth), respectively (2610 and 2570 Pg C (0-1 m depth)), and representing N fixation using the different approaches generated few obvious spatial differences in C pools (figures 2(c) and (d)). Given the steady-state similarities, we focus on the evolution of terrestrial C and N dynamics in transient simulations with changing climate and [CO 2 ] through 2100.
Representing BNF using the ET relationship (equation (2); green lines, figure 3) produces equivalent estimates of BNF, NPP, and total ecosystem C storage through the historical period . By contrast, in the NPP driven case (equation (1)), rates of BNF accelerated because of increases in NPP from CO 2 fertilization under the RCP8.5 scenario (black lines, figure 3), creating a positive feedback between BNF and NPP that resulted in sustained increases in NPP and large increases in terrestrial C storage through the 21st century. Notably, the trajectories of terrestrial C storage are virtually identical for both cases through the historical period, with initial terrestrial C losses driven by land use and land cover that recover by the end of the 20th century, with slight (<2 Pg C) difference in terrestrial C accumulated by 2005. We cannot find empirical support from elevated . Moreover, disturbance currently has no effect on BNF in CLM4.5, which is inconsistent with empirical work showing the highest rates of N fixation immediately following disturbance (Batterman et al 2013, Sullivan et al 2014). By contrast, in the ET driven case N demand outpaces BFN, which increasingly attenuates CO 2 fertilization effects in the modified case. Thus, reducing terrestrial C accumulation by nearly 50 Pg C (∼40%) by 2100, which would increase the atmospheric [CO 2 ] burden by approximately 25 ppm.

Discussion
Representing N fixation as a function of either NPP or ET produces comparable initial land C stocks and fluxes in CLM4.5bgc (figures 1 and 2), but generate significant differences in trends of BNF that have large effects on the global C cycle in transient simulations ( figure 3). These results illustrate one key model uncertainty that more broadly reflects the status of the theoretical understanding and numerical implementation of terrestrial C-N biogeochemistry in ESMs  (Wieder et al 2013(Wieder et al , 2015b, among others. These uncertainties broadly limit the ability to accurately simulate changes in the terrestrial C cycle, and by extension to project future climate. We contend that evaluating these structural uncertainties in ESMs can simultaneously improve the theoretical understanding of biogeochemical processes, inform prognostic climate models, and highlight critical observational needs in the most uncertain aspects of the C-N system. This argues for replacing the empirical approaches (such as those for BNF described here), with more a mechanistic representation of biogeochemical processes. Several approaches for BNF already exist (Gerber et al 2010, Wang et al 2010, Brzostek et al 2014discussed below), however additional efforts are needed to evaluate how any of these approaches may improve confidence in future model projections.
Poor understanding and representation of the factors that regulate rates of biogeochemical processes significantly impede the ability to improve C-nutrient dynamics in ESMs. Specifically, we lack both a detailed theoretical understanding and sufficient empirical data to validate models and inform likely responses of BNF to elevated [CO 2 ] and climate change across biomes. Current approaches that represent BNF rates as a function of NPP are contradictory (equation (1); table 1), especially when the purpose of C-N models is to explore how terrestrial nutrient limitation may mediate C cycle response. Moreover, data from field  (table 1), but should likely be revised.
We recommend alternative model structures be considered to describe rates of BNF-the largest source of N inputs to terrestrial ecosystems. In the short-term, revisions to BNF parameterizations could include empirical relationships with ET and/or assignment of biome-level rates (Cleveland et al 1999). Both approaches still have shortcomings, but they would still represent an improvement over model structures that directly contradict empirical results. Longer-term efforts should focus on exploring large-scale and mechanistic drivers of BNF and potential C-N interactions in response to environmental change. Alternative structures that represent competing hypotheses about the relative importance of different factors effecting BNF rates are already available (table 1, figure 1). For example, the structure of the GFDL- influence BNF. Given the complexity of representing the global N cycle in ESMs, new efforts to mechanistically simulate BNF will require a significant investment in model development and the simultaneous collection of appropriate observational datasets to parameterize and evaluate different model structures and assumptions. Such developments may introduce many more degrees of freedom and uncertainty to land models, but will simultaneously present opportunities to address more scientifically and societally relevant questions about coupled biogeochemical cycles.
The lack of real progress in representing N fixation in models is not surprising-reflecting the fact that actual rates of BNF in most terrestrial ecosystems are poorly understood or measured, and in some cases, completely unknown. Thus, lack of data availability will significantly hinder the evaluation of model developments advocated here. The empirical relationships that inform the BNF parameterizations used in the majority of land models were formed based on extremely limited data (Cleveland et al 1999, table 1), and subsequent progress to generate new estimates has been slow (Cleveland et al 2010, Reed et al 2011). The lack of a robust method for generating point measurements of symbiotic BNF remains a key limitation to generating ecosystem level estimates, although some promising new field sampling approaches may help overcome this issue (Sullivan et al 2014). Observations of free-living BNF rates are even more rare than data . Thus, as plant productivity increases in response to elevated CO 2 and climate change, BNF also increases. By contrast, when BNF was estimated from ET (green line, Cleveland et al 1999), terrestrial N inputs do not keep pace with N demand from elevated CO 2 . This attenuates the rate of NPP increase and reduces the size of the terrestrial C sink nearly 50 Pg by the end of this century. on symbiotic nitrogen fixation rates, and completely absent from some ecosystems (Reed et al 2011). Nitrogen inputs from symbiotic and free-living pathways likely vary over space and through time (Batterman et al 2013), and may be subject to different environmental controls. Collectively, these observations provide a strong theoretical justification for considering symbiotic and free-living BNF separately in land models (Wang et al 2010, Hayes et al 2011, Thomas et al 2013a. For example, in the terrestrial ecosystem model (TEM), Hayes et al (2011) add N from symbiotic BNF to vegetation N pools, while free-living inputs contributing to soil N pools. Similar approaches may be feasible in the near-term with CLM; although we stress that more attention needs to focus on evaluating the C-cycle implications of such structural changes in models.
Model response uncertainties extend beyond representations of N fixation and generate wide variation in C cycle projections both among and within models ( Existing mathematical techniques, generally known as model-data fusion, can help improve model predictions and reduce model response uncertainty by: (1) estimating model parameters that best fit observations, and quantifying their associated uncertainty; (2) improving the model state through data assimilation; and (3) identifying key data deficiencies and model development needs (Wang et al 2009, Williams et al 2009, Dietze et al 2014, Hararuk et al 2014, Luo et al 2014. Although these techniques provide robust ways to constrain model parameters for interpolation, they may not provide reliable insight into how biologically driven processes may respond to environmental change as they overlook the theoretical underpinnings and structural assumptions responsible for processlevel representation in particular models. Structural errors can more formally be identified-but not necessarily attributed-with recursive prediction error algorithms (Lin and Beck 2007), although to our knowledge similar approaches have not been applied to ESMs. Our results indicate that considerations of alternative model structures are critical to improving both the theoretical understanding of important biogeochemical processes (like BNF) and the accuracy of C cycle projections. As an increasing number of models represent C-N biogeochemistry, structural uncertainties associated with the representation of N inputs, transformations, uptake, and losses need to be evaluated.