Tipping point in North American Arctic-Boreal carbon sink persists in new generation Earth system models despite reduced uncertainty

Estimating the impacts of climate change on the global carbon cycle relies on projections from Earth system models (ESMs). While ESMs currently project large warming in the high northern latitudes, the magnitude and sign of the future carbon balance of Arctic-Boreal ecosystems are highly uncertain. The new generation of increased complexity ESMs in the Intergovernmental Panel on Climate Change Sixth Assessment Report (IPCC AR6) is intended to improve future climate projections. Here, we benchmark the Coupled Model Intercomparison Project (CMIP) 5 and 6 (8 CMIP5 members and 12 CMIP6 members) with the International Land Model Benchmarking (ILAMB) tool over the region of NASA’s Arctic-Boreal vulnerability experiment (ABoVE) in North America. We show that the projected average net biome production (NBP) in 2100 from CMIP6 is higher than that from CMIP5 in the ABoVE domain, despite the model spread being slightly narrower. Overall, CMIP6 shows better agreement with contemporary observed carbon cycle variables (photosynthesis, respiration, biomass) than CMIP5, except for soil carbon and turnover time. Although both CMIP ensemble members project the ABoVE domain will remain a carbon sink by the end of the 21st century, the sink strength in CMIP6 increases with CO2 emissions. CMIP5 and CMIP6 ensembles indicate a tipping point defined here as a negative inflection point in the NBP curve by 2050–2080 independently of the shared socioeconomic pathway (SSP) for CMIP6 or representative concentration pathway (RCP) for CMIP5. The model ensembles therefore suggest that, if the carbon sink strength keeps declining throughout the 21st century, the Arctic-Boreal ecosystems in North America may become a carbon source over the next century.


Introduction
Global mean surface temperatures have increased dramatically since the mid-20th century, but have increased up to four times faster in the Arctic-Boreal region (Masson-Delmotte et al 2021, Rantanen et al 2022). This phenomenon is referred to as 'arctic amplification' (Scheffer et al 2012, Francis et al 2017. Although the exact mechanisms for the arctic amplification are debated, temperature and snow-sea ice-albedo feedbacks are keys to understanding this system, and changes in atmospheric and ocean energy transport may play an important role (Previdi et al 2021). While Arctic-Boreal ecosystem productivity may initially benefit from rising atmospheric CO 2 , higher temperatures, longer growing seasons, and faster nutrient cycling, these same systems may increase carbon emissions through permafrost thaw, plant (autotrophic) respiration and increased microbial (heterotrophic) respiration (Mack et al 2004, Natali et al 2012, Crowther et al 2015, Schuur et al 2015, Huntzinger et al 2020, Miner et al 2022.
Projecting the future trajectory of the Arctic-Boreal system presents a large challenge to Earth system models (ESMs) (Hinzman et al 2013) and requires critical cryosphere-specific processes to accurately model its physical, biogeochemical and ecosystem dynamics (including carbon) (Hawkins and Sutton 2009, Knutti and Sedláček 2013, Slater and Lawrence 2013, Koven et al 2015, Lawrence et al 2015, Schimel et al 2015, Ciais et al 2019, Braghiere et al 2021b. The Coupled Model Intercomparison Project Phase 6 (CMIP6; Eyring et al 2016) is the most recent ESM activity, and builds upon CMIP5 (Taylor et al 2012), interpreted in the IPCC Fifth Assessment Report (Intergovernmental Panel on Climate Change 2014). CMIP6 includes the latest generation of comprehensive ESMs, driven by historical greenhouse gas concentrations and climate forcing followed by different future greenhouse gas concentrations pathways according to the shared socioeconomic pathways (SSPs) scenarios (Meinshausen et al 2020, Tokarska et al 2020. The SSPs picture multiple baseline worlds considering underlying factors, such as population, technological, and economic growth, and how those could lead to different future scenarios and global change outcomes. That does not imply a larger uncertainty in climate change, it rather considers different economic and political choices.
While benchmarking and validation of ESMs has become increasingly common in recent years (Fisher JB et al 2018), it is still rare to comparatively evaluate the performance of a carbon cycle model once it is updated (Fer et al 2021). However, comparing models and observations is required for hypothesis testing and predictive skill evaluation (Fisher RA et al 2018). To this end, the International Land Model Benchmarking (ILAMB) project (Hoffman et al 2017, Collier et al 2018 provides the means to track and compare performance through a comprehensive skill score method and to incorporate multiple observational datasets of the same variable of interest to account for observational uncertainty. Moreover, greater agreement between historical runs and observations may indicate that model components can be updated to better capture inaccurate processes. This would increase confidence in future projections, even though forthcoming changes, such as photosynthesis acclimation or species composition shifts, may become progressively more important. Whether Arctic-Boreal ecosystems will evolve into significant carbon sinks (Keenan et al 2014, Zhu et al 2016, Berner et al 2020 It is also imperative to understand if ESMs accurately describe the major carbon cycle fluxes, storage terms, and processes in the present. The critical threshold at which a perturbation can qualitatively alter the system's state or development is referred to as a tipping point (Lenton et al 2008). An Arctic-Boreal carbon cycle tipping point would occur when the rate of release of previously frozen soil carbon to the atmosphere by ecosystem respiration and disturbances (DISTs), including wildfires, surpasses photosynthetic CO 2 uptake (Ahlström et al 2015). This permafrost carbon feedback (Schuur et al 2015, Miner et al 2022 has been identified as a critical Earth system tipping point (Lenton et al 2008, McKay et al 2022. For example, the rapid warming of the Arctic-Boreal zone has accelerated permafrost degradation (Mekonnen et al 2021) and is remaking the vast, conifer-dominated boreal forests, lowering species diversity, increasing ecosystem vulnerability to disease, decreasing vegetation reproduction rates, making fires more frequent and intense, and increasing mortality rates (Lenton 2012, Seidl et al 2017. In a stark example of this from the mid-1990s, summer warming in the absence of sustained increases in precipitation breached the tipping point in western central Eurasian boreal forests, sharply shifting ecosystems into a warmer and drier regime (Buermann et al 2014). Ensuring that ESMs reproduce emergent Arctic-Boreal ecosystem tipping points requires considerable certainty in both ecosystem stability and carbon cycle drivers. Unfortunately, both of these complex, interconnected systems are largely unconstrained within ESMs.
The goal to reduce carbon cycle uncertainty is non-trivial (Hausfather et al 2022) and it can be bounded by inherent model uncertainties encompassing parametric and structural uncertainty, as well as forcing data uncertainty (Lovenduski and Bonan 2017). Yet, interpreting model spread as predictive failure would not bring as much benefit to science as much as a comprehensive discussion of model uncertainty (Bonan et al 2019). To this end, we compare projections of the carbon cycle variables from CMIP5 and CMIP6 over Alaska and northwestern Canada, the domain of NASA's Arctic-Boreal vulnerability experiment (ABoVE; figure S1). We benchmark the historical runs with a suite of state-of-the-art Earth observations to test if the most recent models converge in their projections of the Arctic-Boreal carbon cycle.
The rationale behind benchmarking models in the historical period before evaluating their future projections is related to the hypothesis that a model (or group of models) that better represents the present has greater predictive skills, and therefore, should better represent the future. Although mechanistic processes represented in these models could potentially change in the future (i.e. acclimate), the predictive skill of two model groups can only be evaluated with historical datasets. We compare model performances and uncertainty, identify improvements and deterioration from older ESMs to newer ones, and analyze modeled carbon uptake growing curves to determine if these models project a potential Arctic-Boreal carbon balance tipping point.

Study domain and models
This study focuses on the ABoVE domain, including the Arctic and Boreal regions of Alaska, and the western provinces of Canada (Fisher JB et al 2018, Stofferahn et al 2019, Huntzinger et al 2020. We benchmark six ecosystem and carbon cycle variables: above-ground biomass, gross primary productivity (GPP), ecosystem respiration (RECO), leaf area index (LAI), net ecosystem exchange (NEE), and topsoil carbon. We incorporate 3 meteorological forcing datasets (i.e. surface air temperature, precipitation, and surface downward shortwave radiation), output from a total of 20 CMIP models, with 8 models participating in the CMIP5 and the remaining 12 latest model versions participating in the CMIP6 (table 1). These models were chosen based on the availability of all the required variables included in the analysis. We also evaluated monthly historical simulations (1850-2005 for CMIP5 and 1850-2015 for CMIP6) driven by observation-based forcing data, including greenhouse gas concentrations, gridded land-use data, volcanic aerosols, and other meteorological variables (Eyring et al 2016).
Previously, in preparation for CMIP5, the landuse harmonization v1 (LUH1) project provided harmonized land use data for the years 1500-2100 at 0.5 deg × 0.5 deg resolution (Hurtt et al 2011). These data served as required land use forcing for CMIP5 climate model experiments and have been used in a number of related studies to assess the effects of land use change on carbon cycle and climate. More recently, as part of CMIP6 (Eyring et al 2016), the international research community has developed the next generation of advanced ESMs able to estimate the combined effects of human activities (e.g. land use and fossil fuel emissions) on the carbon-climate system. The strategy described in the updated LUH2 builds on the approach for harmonizing land use patterns and transitions in CMIP5. The new version is completely updated with new inputs and includes higher spatial resolution (0.25 deg vs. 0.5 deg), increased detail (12 states vs. 5 and all associated transitions), added management layers, new future scenarios (8 vs. 4), and a longer time domain (850-2100 vs. 1500-2100)-in all more than a 50fold increase in data from its predecessor (Hurtt et al 2020). Despite these differences, carbon fluxes associated with DISTs are several orders of magnitude smaller than photosynthesis and respiratory terms (see figure S2 in supporting information).
CMIP models provide multiple simulations based on the different experimental configurations for ensemble member analyses to capture the climate system's natural variability. However, some of the participant members have only released their first realization, denominated as r1i1f1; therefore, we utilized the first realization only, following previous recommendations (Anav et al 2013, Park andJeong 2021).

Model benchmarking
We used the ILAMB v2.6 package for model bench-  . Results from the global models were masked out to focus benchmarking on the ABoVE domain.
The relationships between these variables and precipitation, temperature, and incident shortwave radiation were analyzed using data products from the Global Precipitation Climatology  We use the results for the ILAMB overall score for the absolute values (S overall ) defined as: where S bias is the spatially integrated bias score, S RMSE is the root-mean-squared error (RMSE) score doubly weighted to emphasize its importance, S phase is the phase shift score, and S dist is the spatial distribution score. For the whole set of equations of each term in equation (1) refer to Collier et al (2018). All the ILAMB results and plots are available in https:// braghiere.github.com (accessed on: April 05th 2022).

Benchmarking CMIP models
Modeling if an Arctic-Boreal carbon cycle tipping point will occur is rather complex and dependent on a number of feedback loop interactions, but results indicate newer ESMs are generally better at capturing the present-day carbon cycle picture over the Arctic-Boreal ecosystems than were previous model versions.
The exception to the general improvement in more recent ESMs is related to the representation of carbon in soils. Figure 1 shows the overall scores for ecosystem and carbon cycle variables for the CMIP5 and CMIP6 ensemble members. Results for individual models can be found in supporting information (figure S3). The ensemble member is per definition fluxes and states averaged across individual model members within a modeling group. These use multiple datasets included in the ILAMBv2.6. Triangles corresponding to different ensemble members (CMIP5 and CMIP6) and spatial domains (ABoVE and global) are represented by a schematic in blue. Comparisons between CMIPs globally versus the ABoVE domain is performed to determine if the expected improvements in newer model versions are comparable when only considering the Arctic-Boreal North America.
For aboveground biomass, the CMIP6 ensemble member presents a higher overall score than the CMIP5 ensemble member across all the evaluated data products for the ABoVE domain and the globe. However, the overall score for the ABoVE domain is always smaller than the global overall score for both ensemble members. This suggests that most processbased models included in this study better capture carbon allocation to living stock for other ecosystems and regions of the globe on average than to the Arctic-Boreal North America. The highest overall score between models and biomass datasets is given for Thurner, followed by GlobalCarbon, and USForest.
For GPP and RECO, the CMIP6 ensemble member also presents a higher overall score than the CMIP5 ensemble member across all the evaluated data products for the ABoVE domain and the globe, with the overall score for the ABoVE domain being higher than the global overall score for both ensemble members. The higher score is mainly due to larger bias and RMSE in more productive areas of the globe, including the tropics and temperate forests. For GPP, the overall score related to the FLUXCOM data product is systematically larger than the those associated with FLUXNET2015, while for RECO, CMIP6 over ABoVE presents better agreement for FLUXNET2015 than to FLUXCOM.
Spatiotemporal GPP biases between CMIP5 and CMIP6 with FLUXCOM for the period from 1980 to 2014 are shown in supporting information ( figure  S4). The positive bias in GPP presented in CMIP5 in the northeastern and east of the ABoVE domain is not observed in CMIP6, as well as a negative bias in GPP in the pacific. A slight (∼1 g m −2 d −1 ) positive bias in CMIP6 GPP remains in central parts of the ABoVE domain, as well as southern croplands in Canada. For GPP annual cycle, CMIP6 also presents higher agreement with FLUXCOM (bias = 0.56 g m −2 d −1 , , approximately 30% reduction in bias and RMSE, with especially accurate performance during spring. To assess the representation of mechanistic processes in the models, we also evaluate variableto-variable relationships of GPP with precipitation, surface downward shortwave radiation, and temperature. The response curves are then scored by computing a relative error based on the RMSE of reference datasets to the relationship diagnosed in models. Across all the evaluated relationships of GPP with meteorological variables, the scores of CMIP6 were higher than those of CMIP5 (refer to supporting information; figure S5).
For LAI, the CMIP6 ensemble members present a larger overall score than the CMIP5 ensemble members across both spatial domains and data products. Still, the CMIP6 ensemble member over the ABoVE domain presents a larger overall score than the CMIP6 ensemble member over the globe, which is not observed in comparison with AVHRR and AVH15C1.
Spatiotemporal LAI biases between CMIP5 and CMIP6 with MODIS for the period from 2000 to 2006 are shown in supporting information (figure S6). A strong positive bias (>2 m 2 m −2 ) in LAI presented in CMIP5 in most of the ABoVE domain including Alaska, the northeastern and eastern parts of the domain, as well as the boreal cordillera (higher elevation terrain of the Rocky Mountains and the Coast Mountains) are corrected in CMIP6, as well as a negative bias in LAI in the Pacific coast, boreal plain and taiga plain. A positive bias in CMIP6 LAI remains in high elevation areas, as well as southern croplands in Canada. For LAI annual cycle, CMIP6 also presents higher agreement with MODIS (bias = 0.22 m 2 m −2 , RMSE = 0.53 m 2 m −2 ) than CMIP5 (bias = 0.67 m 2 m −2 , RMSE = 0.91 m 2 m −2 ), approximately 40% reduction in bias and 70% reduction in RMSE, with high performance in the first half of the year and slight overestimation in the second half of the year. Across all the evaluated relationships of LAI with meteorological variables, the scores of CMIP6 were higher than those of CMIP5 (refer to supporting information; figure S7).
Finally, soil carbon overall scores highlight a downgrade in model performance from CMIP5 to CMIP6 for both spatial domains in comparison to the HWSD data product and inferred turnover rates from Koven et al (2017) over the globe. In comparison to NCSDV22 soil carbon data, CMIP6 presents a larger overall score than CMIP5 models for both spatial domains, but with significantly smaller overall scores over the ABoVE domain (at least 10%). This may suggest that despite the uncertainty associated with empirical datasets, inconclusive discrepancies remain in the consistency of soil carbon estimates with the observations between the two evaluated ESM generations. Significant feedback loops between soil carbon and environmental factors associated with a lack of improvement in soil carbon representation within updated ESMs may reduce our confidence in future projections of climate change.
In general, CMIP6 overall scores are larger than CMIP5 overall scores for the globe and the ABoVE domain except for one intercompared soil carbon product and turnover rate. It is rather difficult to represent soils within ESMs because they depend on multiple factors such as topography, parent material, environmental factors, and microbial communities, as well as other living beings. Divergence in ESMs arise from incomplete parametric and structural understanding of soil carbon decomposition, as well as uncertainties in model inputs and boundaries, such as climate and plant functional types. An analogous figure to figure 1 showing the climatic forcing metrics is shown in supporting information (figure S8).

The carbon cycle in the ABoVE domain
Attempts to characterize the global carbon cycle through analyzing the carbon budget lies in understanding fundamental differences between the largest atmosphere-biosphere carbon fluxes, i.e. GPP, net primary productivity (NPP), NEE, and net biome production (NBP). Although highly uncertain (Anav et al 2015, Braghiere et al 2019, Jian et al 2022, the global carbon cycle has been previously estimated with: (a) global GPP being about 120 PgC yr −1 , with almost half being lost to autotrophic respiration, resulting in (b) NPP of about 60 PgC yr −1 ; (c) NEE being the difference between the rate of production of living organic matter (NPP) and the decomposition rate of dead organic matter (heterotrophic respiration), with a negative flux indicating net land carbon uptake. Heterotrophic respiration includes all losses by animals and microbial carbon decomposition by microbes. Global NEE is estimated to be about 10 PgC yr −1 , with (d) NBP being the net production of organic matter in a region containing a range of ecosystems and including, in addition to heterotrophic respiration, other processes leading to loss of living and dead organic matter (DISTs such as harvest, forest clearance, fires, insect outbreaks, among others), with a positive flux indicating net land carbon uptake. Global NBP has been considered to be close to zero throughout most of the historical period, but it has been persistently increasing mainly due to the CO 2 fertilization effect and is assumed to be in the order of 1% of NPP and about 10% of NEE (Steffen et al 1998, IPCC 2007.
As previously estimated for the whole globe, we use the CMIP5 and CMIP6 ensemble members for the historical period (Hist., 1986(Hist., -2005 and future simulations (2081-2100) to estimate the large atmosphere-biosphere carbon fluxes and budget over the ABoVE domain (figure 2). We also add three observation-derived fluxes (GPP, RECO, and NEE) from the GBAF ( In a single model, the carbon fluxes are expected to add up, but not in the ensemble member intercomparison. These numbers indicate the individual means of each carbon flux independently calculated across all models included in the ensemble members. The standard deviation of the means is shown in brackets next to their respective values. The historical period CMIP5 ensemble member gives a GPP of 4.18 PgC yr −1 , which is revised downwards by the CMIP6 ensemble member to 3.28 PgC yr −1 for the same period, a reduction of 21.5% in GPP. Likewise, the historical period CMIP5 ensemble member gives a RECO of 3.55 PgC yr −1 , which is revised downwards by the CMIP6 ensemble member to 3.14 PgC yr −1 for the same period, a reduction of 11.5% in RECO. As a result, historical period CMIP5 ensemble member gives a NEE of −0.18 PgC yr −1 , which is revised upwards by the CMIP6 ensemble member to −0.12 PgC yr −1 for the same period, an increase of 33.3% in NEE. The overall reduction in fluxes is mostly due to the inclusion of the nitrogen cycle in CMIP6, lowering the absolute strength of the feedback parameters over land (Arora et al 2020). This suggests that if all next generations ESMs were to include the nutrient limitation (nitrogen and phosphorus) of photosynthesis and respiration, the spread across models will potentially be further reduced.
The carbon flux associated with DISTs was revised downwards for the historical period by the CMIP6 ensemble member (0.07 PgC yr −1 ) in relation to the DIST carbon flux estimated by the CMIP5 ensemble member (0.11 PgC yr −1 ), a total reduction of 57.1%. Adding all these fluxes together gives a budget NBP of 0.07 PgC yr −1 for both ensemble members, which defines the ABoVE domain as a carbon sink over the historical period.
For future scenarios, the CMIP5 ensemble member projects an increase in GPP of 74.7% (7.32 PgC yr −1 ), while the CMIP6 ensemble member projects an increase in GPP of 98.4% (6.51 PgC yr −1 ) with a much stronger CO 2 fertilization effect. Likewise, the CMIP5 ensemble member projects an increase in RECO of 72.1% (6.11 PgC yr −1 ), while the CMIP6 ensemble member projects an increase in RECO of 88.2% (5.91 PgC yr −1 ). Although the carbon emissions by DISTs is also projected to have a stronger increase according to the CMIP6 ensemble member (157%) versus the CMIP5 ensemble member (36.4%), the ABoVE domain is projected to be a carbon sink by the end of the 21st century with a total carbon uptake of 0.20 PgC yr −1 in the CMIP6 and 0.15 PgC yr −1 in the CMIP5 simulations.

Persistent tipping point in NBP over the ABoVE domain in new generation ESMs
Under high emissions scenarios, a relative strengthening of global land carbon-climate feedbacks leads the terrestrial biosphere to shift from a carbon sink to a carbon source at some point after 2100 in all of the CMIP5 ESMs (IPCC 2007, Tokarska et al 2016. Likewise, Koven et al (2022) evaluated land carbon fluxes globally from 5 CMIP6 members until 2300 and found that terrestrial ecosystems are projected to switch from being a net sink to either a neutral state or a net source of carbon depending on the model and the scenario used in the projections. Nevertheless, Koven et al (2022) highlight that land models qualitatively disagree in the spatial patterns, the timing, and the magnitudes of the carbon responses to climate change. Therefore, the diverse potential for global and regional carbon cycle dynamics to change sign under different scenarios highlights the continued need for improved comprehension of the major drivers of terrestrial carbon cycle dynamics. Although we have not directly evaluated climate data after 2100, we found that a tipping point on the NBP curve by 2050-2080 persists in North American Arctic-Boreal ecosystems too, despite minimized uncertainty in CMIP6 models (figure 3 box plot). However, the remaining large spread in ESM projections and the lack of model representation of fundamental mechanistic processes that may amplify or mitigate soil carbon losses on longer time scales (including microbial dynamics, permafrost, peatlands, and nutrients) lead to low confidence in the magnitude of global soil carbon losses with global warming (IPCC, 2007(IPCC, , 2021. Nevertheless, both CMIP ensemble members across different carbon emission scenarios project a tipping point between 2050 and 2080. The carbon sink strength over the ABoVE domain starts to decrease during these years, indicating that sometime in the next century the North American Arctic-Boreal ecosystems will likely become a source of carbon to the atmosphere. This will amplify the local and global impacts of climate change already attributed to a local physical component, the permafrost (Natali et al 2019).
The reason for this tipping point in NBP over the ABoVE domain is that, although all the components of NBP are projected to increase throughout the 21st century (figure S2), the growth rate sum of both respiration terms (0.020 PgC yr −2 and 0.015 PgC yr −2 for autotrophic and heterotrophic respiration, respectively) plus DISTs (9 × 10 −4 PgC yr −2 ) is at least 15% larger than the growth rate of photosynthesis (0.020 PgC yr −2 ) over the same period according to CMIP6. The rate of change related to RECO (and DISTs) is outpacing increases in GPP, which suggests that the thermal response on respiration from warming is greater than photosynthetic gains from CO 2 fertilization and longer, warmer growing seasons.
The tipping point in NBP identified in our analysis may be interpreted as simply due to the CO 2 emissions flattening out (representative concentration pathway (RCP 8.5)) or decreasing (SSP585) in future scenarios (Meinshausen et al 2020); however, this is proven not to be the case when evaluating individual carbon fluxes projections over the ABoVE domain ( figure S2). All carbon fluxes over the ABoVE domain keep going up until at least 2100, i.e.after the CO 2 emissions flatten out (RCP 8.5) or decrease (SSP585), which indicates that climatic factors and not CO 2 are driving the carbon fluxes up. The mean NBP for the ABoVE region over the 20th and 21st century as simulated by the CMIP5 and CMIP6 models is shown in figure 3. The simulated 20th century net carbon flux is steeper in the CMIP6 model mean because the absolute values of land carbonconcentration feedback are larger, as also shown in Arora et al (2020). In addition, more CMIP6 models include a representation of the nitrogen cycle (table 1), which worked to reduce model spread despite the additional added complexity (Arora et al 2020). While the interannual variations are larger in CMIP5 models, these models make use of RCPs to determine the amount of warming that could occur by the end of the 21st century, by setting pathways for greenhouse gas concentrations, the SSPs set the stage on which reductions in emissions will-or will not-be reached. The model spread relative to the model mean change for each scenario is smaller for CMIP6 (figure 3 box plots), implying that these models present higher convergence in their projections than CMIP5. This suggests that despite more model complexity, the spread across land models is reduced in more recent versions. Despite disagreements in total carbon sink strength among CMIP5 and CMIP6 ensemble members, both CMIP versions agree that some areas of the ABoVE domain will indeed become stronger carbon sinks in the future relative to historical values. This indicates that the CO 2 fertilization effect of photosynthesis is predicted to overcome climate change impacts on respiration and other loss fluxes. Model mean patterns of NBP change for a 20 year time period centered around 2090, relative to 1986-2005, are shown in figure 4.
Although some areas are similar in CMIP5 and CMIP6, CMIP6 NBP is projected to increase in the South and West of the Hudson Bay, and decrease in northern Alaska, as well as the northern areas of Yukon and the Northwest territories. This is associated with less resilient land cover types, such as grasslands and open shrublands.
Tundra ecosystems are less resilient than boreal forests because of their low biodiversity (only 3% of the world's flora), low biomass (short and sparse vegetation), and low NPP (long and cold winters). In addition, permafrost makes it very difficult for roots to penetrate the soil, as well as slowing the decay of organic matter, which decreases the availability of nutrients (Scheffer et al 2012, Dial et al 2022. A robust increase in NBP is projected over southern Alaska and British Columbia across both model generations. This increase indicates a positive feedback in the productivity of evergreen needleleaf and mixed forests. Stippling on the maps is used to show model agreement locally, highlighting areas where the standard deviation of the model mean is 80% or less of the standard deviation of the interannual variability.

Discussion
Modern model versions increase our confidence in prediction as more computational power, observations, and process understanding are available. CMIP6 predicts a stronger carbon sink in the ABoVE domain than previously predicted by CMIP5, and smaller spread across all scenarios (figure 3 box plots). Smaller model spread in land carbon fluxes over the ABoVE domain are primarily due to better representations of photosynthesis, respiration, and biomass, although clear improvements in soil carbon are still lacking (figure 1).
The Arctic-Boreal carbon cycle in CMIP6 represents more of the relevant processes with increased detail. For example, while GPP generally presents high ILAMB overall scores because the description of processes like photosynthesis generally relies on very well-established ecophysiology models (Farquhar et al 1980, Collatz et al 1992, other processes controlling the stocking of carbon in vegetation, like respiration, litterfall, and allocation lack a unifying or generally acceptable framework (Reichstein and Carvalhais 2019). Moreover, the way in which ESMs calculate permafrost-defined as ground where soil temperature remains at or below 0 • C continuously for at least 2 years (Black andMuller 1948, Guo andWang 2016)-is directly linked to the representation of snow and soil physics of each individual model, rather than a newly implemented modeling feature, such permafrost thawing heterogeneous dynamics and nutrient cycling, for example. Previous studies have evaluated permafrost from CMIP5 (Koven et al 2013, Slater et al 2017 and CMIP6 (Burke et al 2020) and found that the spread of simulated present-day permafrost area within the ensemble members is large and mainly caused by structural divergences in mechanistic processes representation of soils and snow within models.
Even though the model spread in CMIP5 and CMIP6 projections are similar or only slightly smaller in CMIP6, scientists have incorporated some of the missing processes (such as components of the nitrogen cycle, for example) into carbon cycle projections, so we are more confident that ESMs capture more of the relevant land surface processes. Nevertheless, ILAMB scores should be used with caution when demonstrating model improvement or ranking models in their performance (Collier et al 2018). Spatiotemporal GPP and LAI biases between CMIP5 and CMIP6 with FLUXCOM and MODIS indicate a general improvement of CMIP6 models over CMIP5 models. Higher relationship scores of GPP and LAI with precipitation, downward shortwave radiation, and temperature indicate not only a general improvement towards smaller biases with observations, but also a more accurate mechanist process representation as function of environmental variables in newer model versions.
Spatiotemporal GPP biases between CMIP5 and CMIP6 with FLUXCOM for the period from 1980 to 2014 are improved by approximately 30% in newer model versions, with especially accurate performance during spring ( figure S4). For LAI, the spatiotemporal biases between CMIP5 and CMIP6 with MODIS for the period from 2000 to 2006 are improved by approximately 40% in newer model versions ( figure S6). The climate sensitives of GPP and LAI are also improved in CMIP6 models as indicated by higher scores between reference datasets to their relationships to climatic variables diagnosed in the models (figures S5 and S7 in supporting information).
A persistent tipping point in the NBP curve by 2050-2080 is projected over the ABoVE domain despite minimized uncertainty in CMIP6 models (figure 3 box plot). The tipping point in the NBP growing curve over the ABoVE domain is not simply due to the CO 2 emissions flattening out (RCP 8.5) or decreasing (SSP585) in future scenarios, but rather a combination of different carbon fluxes with opposite signs growing at different rates, namely respiration terms plus DISTs being at least 15% larger than the growth rate of photosynthesis ( figure S2). This indicates that climate change and not the CO 2 fertilization effect is causing this tipping point in the carbon cycle over the ABoVE domain. The rate of change related to RECO (and DISTs) outpaces carbon gains from GPP, which suggests that the thermal and moisture responses on respiration is larger than the photosynthetic gains from the CO 2 fertilization effect, as well as longer and warmer growing seasons.
In general, CMIP6 has higher ILAMB overall scores compared with CMIP5, although soil carbon is an exception. Initial attempts to model the relationships between soil carbon and climate factors were not successful to predict the uncertain effects of climate and land use change (Post et al 1982). These first models described soil carbon as being mainly controlled by evapotranspiration and precipitation rates, as well as plant functional types. However, it is now understood that soil carbon depends on a number of other mechanistic relationships with soil type, microbial communities, as well as species composition. Taken together, soil carbon uncertainties need to be addressed in the next generation of ESMs with greater emphasis of the modeling community by basing more off mechanistic science of the relationships driving soil and rhizospheric processes (Pallandt et al 2022).
Soil carbon and soil carbon turnover time are an exception to the general model improvement from CMIP5 to CMIP6 and have a lower benchmarking score for CMIP6 than for CMIP5 (figure 1). Although it is usually the case to use ILAMB overall scores to backup conclusions about model improvement, a more detailed evaluation analysis of the component metrics that determine the overall scores is required (Collier et al 2018, Bonan et al 2019), especially because ILAMB analysis assigns relative weights to each dataset to qualitatively account for uncertainty in the datasets themselves and multiple statistical metrics (see equation (1)). Moreover, previous studies have shown that ESMs cannot reproduce soil carbon under current climate (Todd-Brown et al 2013, Luo et al 2015 or regionally, as well as global carbon soil datasets present large discrepancies (Tifafi et al 2018, Crowther et al 2019 making it very challenging to appropriately conduct a model benchmarking. Global overall scores for soil turnover time also indicate a degradation in model performance in CMIP6 over CMIP5 (figure S9).
Model discrepancies and lower overall scores in ILAMB highlight soil carbon assumptions that deserve further investigations, especially regarding how ecosystems are likely responding to climate change and shifts in resource availability (Giorgi 2006, Wieder et al 2019, Braghiere et al 2022. Future model development endeavors should carefully consider rhizospheric and soil processes, for example the inclusion of nutrient dynamics into ESMs could help constrain the CO 2 fertilization effect (Norby et al 2010, Reich and Hobbie 2013, Terrer et al 2018, Braghiere et al 2022.
Although previous studies have hypothesized that rising temperatures may increase soil carbon losses through decreased soil turnover times mainly due to increased heterotrophic respiration, especially in high latitudes (Davidson and Janssens 2006, Koven et al 2011, Bond-Lamberty et al 2018, other effects like aggregate formation and mineral-organic interactions could actually stabilize soil carbon, limiting the loss responses to increased temperature (Dungait et al 2012, Han Weng et al 2017. Decomposition model structures generally use first-order decay kinetics and lack to represent microbial traits, but they could be updated to include microbial priming effects, as well as mineral-soil organic carbon interactions (Todd-Brown et al 2013, Wan andCrowther 2022). In addition, physical processes like erosion, soil creep, as well as river transport of carbon and sediments are typically not represented in ESMs, but they can be critical for carbon cycling (Resplandy et al 2018), especially from permafrost thawing.
Often experimental manipulations can indicate unanticipated ecosystem responses from theoretical expectations or previous observations (Melillo et al 2017, but these experiments can be expensive and demanding, which limit their spatial and temporal coverage. In order to expand scientific possibilities, modern ESMs should be enabled to make direct use of new observations from spacecraft in orbit allowing an estimation of carbon cycle variables, linking eddy covariance sites with much higher spatial coverages Schneider 2019, Braghiere et al 2021a).
The challenge remains on how to link above canopy measurable signatures to underground rhizospheric mechanisms, which depends upon the development of new theory to address previously unobserved processes. Hyperspectral datasets seem highly promising in deriving a wide range of unique constraints on plant functional traits that can be linked to the rhizosphere (Singh et al 2015, Butler et al 2017, Sousa et al 2021. As a way forward, scientists should be able to bridge different process-based model scales and modeling parametric decisions (Shi et al 2018, Braghiere et al 2020a, Wang et al 2021, as well as applying machine learning tools to spatially and temporally limited data to improve understanding about complex processes in which current models have little to no mechanistic ways to represent (Bloom et al 2016, Braghiere et al 2020b.

Conclusion
The rapid warming of the Arctic-Boreal zone is expected to shift current states of the biosphere indefinitely, moving the Earth system into new, and possibly irreversible states (Lenton 2012, Seidl et al 2017. Future NBP projections indicate narrower model spread for CMIP6, while both CMIPs indicate a tipping point in the NBP growing curve by 2050-2080 suggesting that, if the carbon sink strength keeps declining throughout the 21st century, the Arctic-Boreal ecosystems in North America will highly likely become a carbon source in the next century. In general, simulated carbon cycle variables from CMIP6 better agree with observations than those from CMIP5 for the globe and the ABoVE domain with an exception for soil carbon and turnover rate (figure 1). While processes like photosynthesis rely on well-established ecophysiology models (Farquhar et al 1980, Collatz et al 1992 and present improvements in relationship to climatic variables, soil carbon lacks a generally acceptable description and processes understanding (Reichstein and Carvalhais 2019), which can be alleviated with more experimental manipulations, observations, and newer modeling paradigms.

Data availability statement
The data that support the findings of this study are openly available. The CMIP6 data simulations performed by various modeling groups are available from the CMIP6 archive (https://esgf-node. llnl.gov/search/cmip6). The ILAMB package can be downloaded from (https://github.com/rubisco-sfa/ ILAMB). All ILAMB plots are available from (https:// braghiere.github.io/).