The signature of internal variability in the terrestrial carbon cycle

Uncertainty in model initial states produces uncertainty in climate simulations because of unforced variability internal to the climate system. Climate scientists use initial-condition ensembles to separate the forced signal of climate change from the unforced internal variability. Our analysis of an 11-member initial-condition ensemble from the Community Earth System Model Version 2 that spans the period 1850–2014 shows that a similar ensemble approach is needed to robustly assess trends in the terrestrial carbon cycle. Uncertainty in model initialization gives rise to internal variability that masks trends in carbon fluxes, and also creates spurious unforced trends, during the period 1960–2014 across North America, meaning that a single model realization can diverge from the observational record or from other models simply because of random behavior. The forced response is, however, evident in the ensemble mean and emerges from the noise of unforced variability at decadal timescales. Our results suggest that trends in the observational record must be interpreted with caution because of multiple possible histories that would have been observed if the sequence of internal variability had unfolded differently. Furthermore, internal variability produces irreducible uncertainty in the carbon cycle, leading to ambiguity in the magnitude and sign of carbon cycle trends, especially at small spatial scales and short timescales. The small spread in initial land carbon pools at 1850 suggests that internal climate variability arising from atmospheric and oceanic initialization, not the biogeochemical initialization, is the predominant cause of carbon cycle variability among ensemble members. Initial-condition ensembles with other Earth system models are needed to develop a multi-model understanding of internal variability in the terrestrial carbon cycle.


Introduction
Human-caused climate change is the outcome of forced change from greenhouse gas emissions, aerosols, and other anthropogenic forcings superimposed upon unforced random variations internal to the climate system (termed internal or natural variability) (Collins et al 2013). Internal variability arises from the chaotic behavior of the atmosphere at timescales of several days to a few weeks (Lorenz 1963) and from longer term variations in the coupled atmosphere-ocean system such as the El Niño/Southern Oscillation, Pacific decadal variability, and Atlantic multi-decadal variability (Deser et al 2012a(Deser et al , 2014. It manifests as climate variability in the observational record at subseasonal-to-seasonal timescales, and also as interannual variability and decadal or longer fluctuations. Internal variability is seen in climate models in that imprecision in initial atmospheric and oceanic states produce different climate trajectories over the historical record and in future projections, each with its own random sequence of unforced variability and each an equally plausible realization of climate change. Research in recent years utilizing single-model large ensembles, with 20-100 ensemble members that differ only in initial conditions, shows that internal variability generates considerable ensemble spread in climate, particularly at regional spatial scales, and has produced new insights for interpreting the observational record, for model evaluation in comparison with observations and other models, and to separate the forced signal of climate change from unforced variability (Deser et al 2012b, Maher et al 2019. In particular, internal variability can mask anthropogenic influences on climate, and a single model realization may diverge from the observational record, or from other models, simply because of random behavior. Furthermore, internal variability represents irreducible uncertainty in climate simulations and limits the accuracy of climate predictions, especially at regional spatial scales and timescales up to a few decades (Hawkins et al 2016).
Use of initial-condition ensembles to investigate internal variability has gained considerable traction among climate scientists, seen, for example, in analysis of ensemble spread in temperature and precipitation trends (Deser et al 2012a, 2012b, Martel et al 2018, Swain et al 2018, Bengtsson and Hodges 2019, Dai and Bloecker 2019, Maher et al 2019. Internal variability is a large source of uncertainty in projections of future climate at regional and decadal scales (Hawkins and Sutton 2009, Lehner et al 2020. Moreover, the forced anthropogenic signal may not emerge from the noise within the climate system for several decades, as seen in the concept of time of emergence Sutton 2012, Lehner et al 2017). Internal variability affects not just physical climate; for example, initialcondition ensembles reveal variability in Arctic sea ice (Swart et al 2015, Kirchmeier-Young et al 2017, Duvivier et al 2020, Landrum and Holland 2020 and snowmelt runoff (Mankin et al 2015). Initialcondition ensembles have also been used to study unforced variability in ocean biogeochemistry and the time of emergence of the forced signal (Frölicher et al 2009, Rodgers et al 2015, Schlunegger et al 2019.
The internal variability of the terrestrial carbon cycle in Earth system models and the associated ensemble spread are less well studied. The divergent climate trajectories arising from uncertainty in atmospheric and oceanic initialization should manifest in unforced variability in carbon cycle trends. Uncertainty in the initialization of land biogeochemical pools may also contribute to spread among ensemble members. An analysis of a six-member initialcondition ensemble of the Community Climate System Model found differences among ensemble members in annual land carbon uptake, especially at small spatial scales, though the spread among members decreased when averaging over decadal or continental scales (Lombardozzi et al 2014). The study found that internal variability precludes detection of forced changes in the carbon cycle at timescales shorter than several decades in many regions of the world. Earth system models show considerable differences in the terrestrial carbon cycle (Arora et al 2020), but the extent to which the differences reflect model structural uncertainty or internal variability is unknown. Initial studies of uncertainty partitioning in terrestrial carbon cycle projections did not utilize large ensembles to estimate the contribution of internal variability (Hewitt et al 2016, Lovenduski andBonan 2017). Here, we analyze an initial-condition ensemble for the Community Earth System Model (CESM) over the historical era (Danabasoglu et al 2020). We focus on North America to illustrate uncertainty in the terrestrial carbon cycle arising from model initialization so as to complement previous analyses of internal variability in North American temperature and precipitation trends (Deser et al 2012b(Deser et al , 2014(Deser et al , 2016. This work contributes to growing literature that climate uncertainty is a key component of carbon cycle uncertainty (Wu et al 2017, Bonan et al 2019.

Methods
We analyzed an 11-member ensemble of simulations with the CESM Version 2 (CESM2) that spans the historical period 1850-2014 (Danabasoglu et al 2020).
Simulations have a nominal 1 • horizontal resolution with active atmosphere, ocean, sea ice, and land component models. The atmospheric model for these simulations is the Community Atmosphere Model Version 6, and the simulations are referred to as CESM2(CAM6) in Danabasoglu et al (2020). All simulations use the same historical forcings, and atmospheric CO 2 concentration is specified. Danabasoglu et al (2020) describe the simulations and the initialization of the component models. The 11 simulations differ only in initial conditions for 1850, which were obtained from different years of a 1200 year preindustrial control simulation (years 501, 601, 631, 661, 691, 721, 751, 781, 811, 841, and 871). Each year represents equally plausible model states for initialization. Danabasoglu et al (2020) provide further details on the spinup of the preindustrial control. Uncertainty in atmosphere, ocean, sea ice, and land initial states propagates throughout the system as a result of physical, chemical, and biological processes in the component models and interactions among component models.
The Community Land Model Version 5 (CLM5), the CESM2 component land model, simulates surface fluxes of energy, mass, and momentum for coupling with the atmosphere; the associated hydrologic cycle and runoff of freshwater to the ocean; and terrestrial biogeochemistry so that leaf area index, canopy height, and vegetation and soil carbon pools are prognostic . In addition to soil temperature, soil moisture, and snow cover, land biogeochemical states must be initialized for 1850. For much of North America, initial vegetation and soil carbon states in 1850 differ by less than ±5% of the 11member mean (figures S1 and S2 (available online at stacks.iop.org/ERL/16/034022/mmedia)). The largest percentage spread in vegetation carbon occurs in regions with herbaceous vegetation and small initial carbon pools. Analyses of individual grid cells and ecoregions of the US further show that vegetation carbon is in quasi-steady state and the initial conditions differ on the order of 100-200 g C m -2 (figure S3). Soil carbon has a small trend (relative to pool size) over the preindustrial control, but initial conditions for the ensemble members differ by only 100-200 g C m -2 (figure S4). Land surface variables related to the energy, water, and carbon cycles are improved in CLM5 compared with previous versions of the model , Danabasoglu et al 2020. We analyzed gross primary production (GPP) and net ecosystem production (NEP). NEP is the net carbon gain after subtracting ecosystem respiration from GPP.
We analyze the period 1960-2014, which overlaps with the Global Carbon Project analysis of carbon trends (Friedlingstein et al 2019). We refer to an individual simulation as an ensemble member and the mean of the 11 members as the ensemble mean. We analyzed the time series of each member and the ensemble mean to obtain the trend over the 55 year period from 1960 to 2014. The trend was assessed using a linear fit to the 1960-2014 time series, and statistical significance was determined by regression slopes with p < 0.05.
The National Ecological Observatory Network (NEON) characterizes the US in terms of 20 ecoclimatic regions across the continental US, Hawaii, and Puerto Rico. Each region has a core terrestrial site (table S1) and an associated geographic domain (figure S5). We mapped these ecoclimatic domains onto the 1 • CLM5 surface grid and analyzed trends for 18 ecoregions (excluding Atlantic Neotropical and Pacific Tropical) at the core terrestrial site (defined as the model grid cell with the corresponding latitude and longitude of the site) and for the domain average.
A standard definition for time of emergence is the year when the forced signal S(t) (i.e. the change from a reference period) exceeds the noise N by a specified amount (S(t)/N > 2), but methods differ in the temporal filtering of data to estimate the forced signal (e.g. running mean, polynomial fit, or linear trend) and how to calculate the noise (e.g. standard deviation for a reference period, standard deviation of residuals after extracting the forced signal, or ensemble standard deviation of trend estimates) (Hameau et

Results
The ensemble mean time series has a statistically significant increase in annual GPP throughout much of North America between 1960 and 2014 (figure 1). Annual GPP increases by 200-300 g C m -2 yr -1 or more over the 55 year period throughout portions of Alaska, Canada, and the eastern US. Portions of the central and western US have a smaller increase in GPP. The magnitude of the trend varies among ensemble members, most prominently in Alaska and western Canada with some ensemble members having a small GPP increase (e.g. members 5 and 7) and others having a large increase in the same region (e.g. members 4 and 10).
Analysis of GPP trends for ecoclimatic regions across the US, defined by the NEON core terrestrial sites (table S1) and geographic domains (figure S5), shows regional differences in ensemble variability. The model grid cell corresponding to the Northeast (D01) site has a statistically significant increase in annual GPP over the period 1960-2014 in all 11 ensemble members, ranging from 266 to 363 g C m -2 yr -1 over the 55 years (figure 2(a)). The Mid-Atlantic (D02) site has larger spread in GPP trends: the trend is not statistically significant in one ensemble member, and GPP increases by 136-451 g C m -2 yr -1 over the 55 year period in the other ten members (figure 2(b)). The Ozarks (D08) site has large variability among ensemble members: GPP is statistically unchanged in three members, decreases in one member (−111 g C m -2 yr -1 over the 55 years), and increases by 88-591 g C m -2 yr -1 over the 55 years in the other seven ensemble members (figure 2(c)). GPP increases in all ensemble members at the Taiga (D19) site, with a range of 159-303 g C m -2 yr -1 per 55 years ( figure 2(d)).
The Mid-Atlantic (D02) and Ozarks (D08) sites show that although some ensemble members may not have a statistically significant GPP trend, significance is evident in the ensemble mean time series. The Prairie Peninsula (D06), Central Plains (D10), and Southern Rockies & Colorado Plateau (D13) sites have a similar result (table S2). Conversely, the Northern Plains (D09) and Desert Southwest (D14) sites have a small number of ensemble members with a statistically significant increase in GPP, but not in the ensemble mean. Ensemble variability decreases when GPP is spatially averaged over the ecoclimatic domains, most noticeably for the Mid-Atlantic (D02) and Ozark (D08) domains (figures 2(e)-(h)). Domain-average trends are generally less variable than site trends (table S2). However, while all domains have a statistically significant increase in the ensemble mean GPP, some domains such as the Prairie Peninsula (D06) and Northern Plains (D09) have a large number of members in which GPP does not increase. Two domains where the ensemble variability is essentially unchanged with spatial scale are Tundra (D18) and Taiga (D19).
NEP has a statistically significant increase in the ensemble mean time series across Alaska, Canada, the eastern US, and portions of the Central Plains and Pacific Northwest, but similar statistical significance is mostly lacking in any ensemble member (figure 3). Only the Canadian Arctic, eastern Canada, and northeastern US have a statistically significant NEP increase in the individual ensemble members. Analysis of the ecoclimatic regions further highlights the difference between the ensemble mean and individual ensemble members ( figure S6, table S3). NEP increases significantly in all 11 ensemble members for the Northeast (D01) site and in seven members for the Ozarks (D08) site, but in only one member for the Mid-Atlantic (D02) site and none at the Taiga (D19) site. At each of these sites, the ensemble mean NEP does significantly increase. The Northeast (D01), Great Lakes (D05), and Pacific Northwest (D16) sites are one end of the uncertainty spectrum, with all members showing positive NEP trends. The Central Plains (D10) and Taiga (D19) sites are the opposite extreme: NEP has no statistically significant trend in any ensemble member, but a positive NEP trend in the ensemble mean. The spread among ensemble members generally decreases when NEP is spatially averaged over the domain, but, consistent with figure 3, there are many ecoregions where the domain-average NEP increases in the ensemble mean whereas the ensemble members have instances of non-significant trends. Decadal averaging generally decreases the variability among ensemble members (figure S7).
The time at which the forced signal in GPP emerges, based on trends over the 1960-2014 period, varies among ecoclimatic regions and with spatial scale. At the site scale, the increase in GPP emerges throughout the eastern US and in the Pacific Northwest, where the trend in GPP is large and/or variability is small ( figure 4(a), table S4). The Northeast (D01) site has the shortest time of emergence (1978; 18 years). The longest time of emergence (2008; 48 years) is at the Mid-Atlantic (D02) site, where the forced signal is large but variability is also large. Although the GPP trend at the Great Basin (D15) site is small, the signal emerges because of low variability. The signal emerges over a larger area of the US when GPP is spatially averaged over the domains, and in most domains more rapidly than at the site scale ( figure 4(b), table S4). In central and southwestern regions of the US, the increase in GPP does not emerge during the 1960-2014 period at either the site or domain scales. The increase in NEP seen at many sites and most domains (table S3)  Concomitant with carbon cycle variability in the 11-member ensemble is variability in climate trends, such as 2 m surface air temperature ( figure  S9). Although the ensemble mean time series has a statistically significant warming between 1960 and 2014 over all of North America, differences among ensemble members are evident. Over Alaska and northwest Canada, for example, some ensemble members have small to non-significant warming while other members have more pronounced warming. Temperature trends for NEON ecoclimatic regions further highlight variability among ensemble members (figure S10, table S5). At the sites in the Northeast (D01), Mid-Atlantic (D02), Ozarks (D08), and Taiga (D19), the 55 year temperature trend varies by 1 • C or more among ensemble members. Ensemble variability does not necessarily decrease when spatially averaged for the entire domain. Notably, the Tundra (D18) and Taiga (D19) have the largest ensemble spread at both the site and domain scales. Ensemble variability in annual GPP correlates with variability in temperature and terrestrial water storage (figure S11). In northern and eastern regions of North America, years that are anomalously warm also have anomalously high GPP, whereas warm years correlate with low GPP in central and southwestern regions. Central and southwestern regions have high GPP during wet years.

Discussion
Large initial-condition ensembles are necessary to separate the forced signal of climate change from unforced variability internal to the climate system (Deser et al 2012b, Maher et al 2019. Our analyses show that multimember initial-condition ensembles are also necessary when considering temporal changes in the terrestrial carbon cycle. Throughout North America, the time series of annual GPP and NEP over the period 1960-2014 for any one ensemble member can deviate from the ensemble mean. Ensemble variability is particularly evident at the site scale and is generally reduced for spatial averages across ecoclimatic domains. Decadal averages reduce ensemble variability compared to annual fluxes. Ensemble variability raises important considerations for the detection and attribution of forced changes in the carbon cycle; for comparisons of models to observations and to other models; and for our understanding of how to reduce uncertainty in Earth system model simulations of the carbon cycle. Many modeling studies (e.g. Sitch et al 2015, Fernández-Martínez et al 2019, Friedlingstein et al 2019, Haverd et al 2020 report that plant productivity and net carbon uptake are changing, possibly as a result of elevated CO 2 concentration, climate change, increased nitrogen deposition, disturbance legacy, or some combination of these factors. Observational analyses using eddy covariance flux towers with a small footprint or biogeochemical studies of small stands also suggest a changing terrestrial carbon cycle (e.g. Dragoni et al 2011, Pilegaard et al 2011, Froelich et al 2015, Baldocchi et al 2018, Finzi et al 2020. Our results show that modeled and observational trends in the carbon cycle must be interpreted with caution. Use of one ensemble member to discern trends in GPP at the site scale could, by chance, result in masked trends (i.e. a nonsignificant trend even though the ensemble mean trend is significant) (figure 2, table S2). The ensemble also reveals the possibility of spurious trends in which some members have a significant trend even though the ensemble mean trend is not significant. Even at the domain scale, there is still a possibility of masked GPP trends in some ecoclimatic regions if only one ensemble member is analyzed. The need to consider the ensemble mean rather than any one ensemble member is particularly apparent for NEP (figure 3). In many ecoclimatic regions, a statistically significant increase in NEP is found in the ensemble mean time series at both the site and domain scales but not in most, or sometimes in any, of the individual ensemble members (table S3). Averaging across ensemble members decreases the random internal variability such that the forced response is apparent. Analysis of observed carbon cycle trends, or of carbon cycle models forced with meteorological observations, is subject to similar uncertainty in that the observational record is but one realization of many possible alternatives in which the sequence of internal variability could have unfolded differently. The forced signal in GPP emerges after a few decades in eastern regions of the US, is not evident in central regions, and emerges in western regions mostly at the domain scale ( figure 4). The implication is that only decadal-scale changes in GPP can be attributed to anthropogenic forcings. Our findings pertain to the period 1960-2014, but a previous analysis also found multi-decadal time of emergence for forced changes in the terrestrial carbon cycle over the 21st century (Lombardozzi et al 2014). Trends in NEP do not emerge over the 55 year period in any ecoclimatic region at the site scale and in only two regions at the domain scale (figure S8). Ensemble variability decreases when analyzing decadal averages (figure S7), suggesting that NEP trends might be detectable when averaged over long timescales. Annual assessments of the carbon cycle such as the Global Carbon Project (Friedlingstein et al 2019) are more likely to express internal variability than forced change, but decadal assessments are likely more robust.
Another implication of our findings relates to ecological studies of carbon cycle trends. Ecologists inherently focus on site research to discern changes in the carbon cycle (e.g. Dragoni et al 2011, Pilegaard et al 2011, Froelich et al 2015, Finzi et al 2020. Because carbon cycle trends are more readily detected at the domain scale than the site scale, bridging the gap between site-scale, in situ measurements and the regional to continental scale is essential. Eddy covariance measurements provide insights to the environmental drivers of interannual variability in carbon fluxes (Baldocchi et al 2018(Baldocchi et al , 2020. Further research is needed to bridge the understanding of carbon cycle variability gained from eddy covariance studies with the notion of internal variability gained from Earth system models. Atmospheric internal variability means that when comparing Earth system models to temperature and precipitation records, a single model realization many not replicate the observations simply because of unforced variability. Likewise, the observational record is but one realization of many possible outcomes. Climate scientists have begun to consider internal variability as it pertains to observations, using statistical models to generate an ensemble of unforced climate histories so as to estimate the uncertainty in temperature and precipitation trends due to internal variability (McKinnon et al 2017, McKinnon andDeser 2018). A similar consideration pertains to the terrestrial carbon cycle. Our results suggest that the observational record has multiple histories that would have been realized if the sequence of internal variability had unfolded differently.
A further implication of our findings relates to uncertainty reduction. The ensemble spread produced by internal variability represents irreducible uncertainty in climate projections (Deser et al 2012b, Hawkins et al 2016. The uncertainty is an inherent property of the model and differs from model structural uncertainty, which can be reduced. Internal variability is a particularly large contribution to climate uncertainty at regional and smaller spatial scales and at decadal and shorter timescales (Hawkins and Sutton 2009, Lehner et al 2020. Ocean biogeochemistry has similarly large irreducible uncertainty at regional scales (Frölicher et al 2016. Our results show that the notion of irreducible uncertainty extends to the terrestrial carbon cycle and creates ambiguity in carbon cycle projections. This uncertainty is an inherent property of the model and will not be reduced with better process representation of the carbon cycle. The uncertainty decreases when averaging over larger spatial regions (figures 2 and S6) or decadal timescales (figure S7). Better understanding of the irreducible uncertainty of the terrestrial carbon cycle, and better communication of the uncertainty, is critical to informing carbon cycle research.
It is well understood that small differences in initial atmosphere or ocean states produce different simulated climate trajectories. Indeed, uncertainty in initial conditions is the rationale for using initialcondition ensembles to assess uncertainty in climate simulations (Deser et al 2012a, 2012b, Maher et al 2019. Our results show that the initial model states in 1850, and the resulting climate variability among ensemble members, affect the magnitude and sign of carbon cycle trends over the historical era. Ensemble variability in annual temperature and water storage on land strongly correlates with ensemble variability in annual GPP (figure S11). A previous study with a different Earth system model correlated variability in net primary production with anomalies in temperature and precipitation and found a similar qualitative pattern over North America (Lombardozzi et al 2014). The CLM5 used in the CESM2 simulations differs from previous versions of the model (CLM4, CLM4.5) in the sensitivity of the carbon cycle to meteorological forcings (Bonan et al 2019, and the model has low interannual variability in carbon fluxes compared with previous versions (Wozniak et al 2020). Our results are likely model dependent, meaning that other modeling centers need to consider initial-condition large ensembles of the terrestrial carbon cycle so as to develop a multi-model understanding of internal variability. Analyses of climate uncertainty also point to the need to consider internal variability in a multi-model context (Lehner et al 2020).
Uncertainty in the land biogeochemical states used to initialize the ensemble members in 1850 could also contribute to ensemble variability in carbon cycle trends. However, the initial vegetation and soil carbon pools used for each ensemble member differ by less than ±5% of the ensemble mean across much of North America (figures S1 and S2). If the ±5% spread in biogeochemical initial conditions contributes greatly to the ensemble variability, this is a level of uncertainty that would be difficult to reduce. It is possible, too, that different time-evolving biogeochemical states contribute to ensemble variability over the period 1960-2014, but vegetation and soil carbon pools still differ by only about ±5% across ensemble members in 1960 (figures S12 and S13). It is more likely that ensemble variability in climate contributes to the variability in carbon cycle trends and that biogeochemical states are at most a secondary source of uncertainty. Simulations that vary only atmosphere/ocean initial states or land biogeochemical initial states could help separate the two influences on carbon cycle variability, but care must be taken to prevent introduction of spurious carbon cycle trends.
Finally, further research is needed to determine how many ensemble members are needed to accurately sample internal variability. The 11-member ensemble is too small to provide reliable probability distributions, such as can be obtained with larger ensembles of up to 100 members . Nonetheless, we find more instances of masked trends at the site and domain scale for GPP (table S2) and NEP (table S3) than for temperature warming (table S5), suggesting that more ensemble members are needed to capture carbon cycle variability than temperature variability.

Data availability statement
The data that support the findings of this study are openly available at the following URLs: https://esgf-node.llnl.gov/search/cmip6/ and https://www.cesm.ucar.edu.