Earth System Models Are Not Capturing Present‐Day Tropical Forest Carbon Dynamics

Tropical forests play a key role in absorbing carbon from the atmosphere into the land surface. Recent analyses of long‐term (1985–2014) forest inventory plots across the tropics show that structurally intact tropical forests are a large carbon sink, but that this sink has saturated and is projected to be in long‐term decline. Here we compare these results with estimates from the two latest generations of Earth System Models, Coupled Model Intercomparison Project 5 (CMIP5) (19 models) and CMIP6 (17 models). While CMIP5 and CMIP6 are of similar skill, they do not reproduce the observed 1985–2014 carbon dynamics. The “natural” pan‐tropical carbon sink from inventory data is 0.99 Pg C yr−1 (95% CI 0.7–1.3, n = 614) between 2000 and 2010, the best sampled decade, double the CMIP6 multimodel‐mean of 0.45 Pg C yr−1 (95% CI 0.35–0.55). The observed saturating and declining sink is not captured by the models, which show modest increases in sink strength. The future (2015–2040) “natural” pan‐tropical sink from a statistical model driven by extrapolating past trends of its putative environmental drivers decreases by 0.23 Pg C per decade (95% CI 0.09–0.39) until the 2030s, while the CMIP6 multimodel‐mean under the climate change scenario closest to the statistical model projects an increasing carbon sink (0.54 Pg C per decade; 95% CI 0.25–0.67). CMIP multimodel‐means reproduce the response of carbon gains from tree growth to environmental drivers, but the modeling of carbon losses from tree mortality does not correspond well to the inventory data. The model‐observation differences primarily result from the treatment of mortality in models.

size of net tropical land-use change emissions, ∼1 Pg C yr−1, because atmospheric CO2 measurements suggest that the net carbon balance of the tropics is close to zero, although net land use change emissions are highly uncertain in magnitude and trend (Friedlingstein et al., 2019). For example, the tropical forest sink in live biomass was estimated at 1.2 Pg C in the 1990s using inventory plots (Pan et al., 2011). Yet two sets of long-term observations suggest that this "natural" tropical land carbon sink has passed its peak in the late 1990s or early 2000s. These come from direct observations of 565 long-term forest inventory plots across Amazonia and Africa (Hubau et al., 2020), and long-term atmospheric CO2 records globally that suggest a more than proportionate increase in the Northern Hemisphere land sink in the 2000s and 2010s, implying a concomitant reduction in the tropical sink, given the overall increasing sink (Ciais et al., 2019). Both studies suggest that the sink in the tropics is declining by ∼0.2 Pg C per decade.
Repeated censuses of 565 inventory plots from intact tropical forests in Africa and Amazonia show that the saturation and decline in the natural tropical forest carbon sink has a geographical pattern (Hubau et al., 2020). Carbon uptake in intact Amazonian forests has been declining since the 1990s (Brienen et al., 2015). Their African counterparts, while retaining a constant uptake until very recently, also appear to have saturated, with recent signs of a decline in carbon sink strength since ∼2010 (Hubau et al., 2020). The declining sink trend is driven by the increase in carbon losses (from tree mortality) being greater than the increase in carbon gains (from tree growth and new tree recruitment). Analyzing the putative environmental drivers behind the changes in carbon gains and losses with linear mixed effects models, Hubau et al. (2020) find that the later saturation of the sink in African forests is likely due to a combination of factors, including less frequent severe droughts, a more drought-tolerant tree community, lower air temperatures, and slower turnover of trees compared to Amazonian forests, hence a much longer carbon residence time (mean time that fixed carbon stays in the live biomass pool), resulting in a slower feedback between rising carbon gains and rising carbon losses. The temporal trend toward saturation and decline of the natural tropical forest carbon sink suggests that the projected land carbon cycle feedback to climate change, where tropical land switches from a sink to a source, may already be underway.
However, assessing the future of the natural tropical forest carbon sink requires process-based models. Robust models that can simulate past patterns of carbon uptake and other properties of tropical forests enable projections of sink-to-source transitions that society can use in policy decisions. Earth System Models (ESMs) are a key tool used to project the evolution of the land carbon sink under different future scenarios. Hence comparisons of ESM results on the evolution of the natural tropical forest carbon sink with those from the inventory plot data are useful for evaluating model performance and improving the models in the future.
Here we compare the observations from the AfriTRON network of long-term forest plots in structurally intact tropical forest in Africa (Hubau et al., 2020; Lewis et al., 2009), the sister RAINFOR network of plots across Amazonia (Brienen et al., 2015; Malhi et al., 2002), and more limited data from SE Asia (Qie et al., 2017) with two generations of ESMs: CMIP5 (Taylor et al., 2012) and CMIP6 (Eyring et al., 2016). Specifically, we estimate carbon stocks and their change over 1985-2014, comparing like-for-like CMIP5 and CMIP6 results that have no or minimal land use change with observations from inventory plots. Furthermore, we compare the CMIP6 projections of the pan-tropical carbon sink from 2015 to 2040 with the projected sink trend from the observation-based statistical model by Hubau et al. (2020) over the same period.

Methods
We first compare the magnitude and trend of the observed tropical forest carbon sink in live biomass where there is no land use change with those from the CMIP6 historical simulations, matched to grid squares with zero or minimal land use change (Figure 1). The plot data span the early 1980s to 2014, so we estimate trends over 1985-2014 for 17 CMIP6 models. We also extract the same data from 19 models of the previous CMIP5 generation over their common time period (1985-2005). We then compare the carbon gains in the inventory plots (woody productivity from tree growth) with model equivalents (the woody fraction of net primary production (NPP)), and compare carbon losses in the plots (from tree mortality) with model equivalents (live biomass carbon leaving the live biomass carbon pool).
[Figure caption; title not recovered] 1985-2014. Time series and trend of net total biomass carbon change (Carbon net change) (a), Carbon gains from net primary production (NPP) (b), and Carbon losses from mortality (c) from CMIP6 multimodel-mean (black), 17 individual models (light gray), and observations for Africa (blue; Hubau et al., 2020) and Amazonia (orange; Brienen et al., 2015). Total biomass carbon is calculated by scaling published aboveground carbon (AGC) estimates to include belowground live biomass carbon (roots; Hubau et al., 2020). The model slope is based on the multimodel-mean. The slope values are for 1985-2014; slope values in brackets are for the best-sampled 2000-2010 period. A 5-year moving average is applied to model data to match the smoothing introduced by the sampling interval of the observations.
availability) and the plot carbon residence time (CRT, which measures how long fixed carbon remains in the system and hence reflects when past increases in carbon gains leave the system as elevated carbon losses). Hubau et al. (2020) then used these carbon gain and loss models to estimate past trajectories of carbon gains and losses and, with extrapolations of past MAT, MCWD, MAT-change, MCWD-change and CO2-change, to make a tentative estimate of the future size of the intact forest carbon sink. This showed that the modeled trajectories fit the past observed trends, suggesting that they reliably predict a post-2015 multi-decadal decline in the carbon sink to 2040 on both continents.

Observational Data
Observational time series of carbon gains, losses and the net sink for Amazonian forests are obtained from Brienen et al. (2015); those for African forests are obtained from Hubau et al. (2020). Environmental predictor variables and projected forest carbon gains, losses and sink time series for both continents are obtained from Hubau et al. (2020). For our pan-tropical estimates we also include inventory data from Southeast Asia (Qie et al., 2017). The plot data (n Amazonia = 321; n Africa = 244; n SE Asia = 49, together the n = 614 plots of the pan-tropical estimate) consist of repeated censuses of every tree ≥10 cm in diameter within a plot (typically 1 ha), recording each tree's diameter, identity, and estimated height. An allometric equation is used to estimate carbon stocks, carbon gains (from tree growth and newly recruited stems), and carbon losses (from tree mortality) for each plot over time. The observations have average census interval lengths of 4.4 years in Amazonia (Brienen et al., 2015) and 5.7 years in Africa (Hubau et al., 2020). The plots cover lowland (<1500 m above sea level) closed canopy (i.e., not woody savannah) old-growth mixed-age forests with at least 1000 mm annual precipitation. Following Hubau et al. (2020), total live biomass carbon values (stocks, net sink, gains, and losses) are the observed aboveground carbon (AGC) values scaled to include the carbon in coarse roots, by adding 37% of the AGC for Amazonian forests, 25% of the AGC for African forests, and 17% of the AGC for SE Asian forests to the AGC values.
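The root scaling described above can be illustrated with a minimal Python sketch. The function and dictionary names are hypothetical, and the example AGC value is illustrative rather than taken from the plot data; only the three root fractions come from the text.

```python
# Coarse-root carbon as a fraction of aboveground carbon (AGC), as quoted in the text.
ROOT_FRACTION = {"Amazonia": 0.37, "Africa": 0.25, "SE Asia": 0.17}


def total_biomass_carbon(agc, region):
    """Scale an AGC value (stock or flux) to total live biomass carbon.

    The same multiplier applies to stocks, gains, losses, and the net sink,
    because each is scaled by adding a fixed share of its AGC value.
    """
    return agc * (1.0 + ROOT_FRACTION[region])


if __name__ == "__main__":
    # Hypothetical AGC stock of 100 Mg C ha-1, for illustration only.
    for region in ROOT_FRACTION:
        print(region, round(total_biomass_carbon(100.0, region), 1))
```

Because the scaling is a single multiplier per region, it changes the magnitude of stocks and fluxes but not the sign or significance of any trend.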
KOCH ET AL.
[Figure caption; title not recovered] 1985-2005. Multimodel-mean (black line) and individual models (light gray lines, CMIP6 n = 17; CMIP5 n = 19) time series of the annual net carbon sink and multimodel-mean trend over the common time period 1985-2005, and total biomass carbon in Africa (orange) and Amazonia (blue). Total biomass carbon is calculated as in Figure 1. The model slope is based on the multimodel-mean. A 5-year moving average is applied to model data to match the smoothing introduced by the sampling interval of the observations.
Figure 3. Potential environmental drivers of total biomass carbon (Carbon) gains and losses in structurally intact African (blue) and Amazonian (orange) tropical forests in observations and Coupled Model Intercomparison Project 6 (CMIP6) models. Relationship and trends of observed (a) and modeled (b) Carbon gains and losses with atmospheric CO2 concentrations, mean annual temperature (MAT), and maximum climatological water deficit (MCWD). Each point represents an inventory plot (a) and a multimodel grid cell average (b). Varying transparency in observed plots represents total monitoring length and plot size as in Figure 2 in Hubau et al. (2020). Solid lines show significant trends, dashed lines nonsignificant trends (p > 0.05).

CMIP5 and CMIP6 Data
The CMIP5 and CMIP6 experiments are driven by CO2 and other greenhouse gas (GHG) concentrations as well as non-GHG forcings based on historical reconstructions until 2005 for CMIP5 and until 2014 for CMIP6, and with future Shared Socioeconomic Pathways (SSPs) from 2015 until 2040 for CMIP6 models, corresponding to differing levels of warming and other features (Eyring et al., 2016; Taylor et al., 2012). There are 19 models in the CMIP5 experiments, 17 models in the CMIP6 experiments, and 11 CMIP6 models that have SSP runs completed, listed in Table S1. All gridded data are aggregated to 2.5 × 2.5 degree resolution (via xESMF bilinear regridding, Zhuang, 2020) to facilitate a like-for-like intermodel comparison (e.g., Aloysius et al., 2016).
[Figure caption; title not recovered] Net total biomass carbon (Carbon) sink projections from observation-based model (blue/orange) and CMIP6 Shared Socioeconomic Pathways (SSP) (black/gray). Observation-based projection of net total biomass carbon sink in intact tropical African (blue) and Amazonian (orange) forests from Hubau et al. (2020); multimodel-mean (black) and individual CMIP6 model projections 2015-2040 (gray). Model slope is based on multimodel-mean. A 5-year running average is applied to model data.
Total live biomass carbon (Carbon) is assumed to equal cVeg (Stem+Root+Leaf carbon). Carbon gains (in Mg C ha−1 yr−1) are calculated from NPP outputs multiplied by an allocation fraction into wood (nppWood/npp) and the biomass-carbon factor (0.456) based on Martin et al. (2018), as in Hubau et al. (2020).
In the absence of model-specific allocation fractions we use the multimodel-mean allocation fraction derived from a subset of six CMIP6 models (0.49) and for CMIP5 a combination of published ratios from terrestrial ecosystem models (that are used in CMIP5 models) and three CMIP5 ESMs (0.46; Malhi et al., 2011; Negrón-Juárez et al., 2015; Table S3). Annual carbon losses from mortality (in Mg C ha−1 yr−1) are calculated as carbon gains minus annual net change in total biomass carbon, assuming that any carbon gain that leaves the biomass carbon reservoir is dead plant material. Annual net biomass carbon change (in Mg C ha−1 yr−1) is calculated as the year-to-year difference in total biomass carbon (in Mg C ha−1).
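As a minimal sketch of the bookkeeping described above (not the authors' code), the model-side gains, losses, and net change can be derived from annual cVeg and NPP series as follows. The function name is hypothetical, the inputs are assumed to be already converted to per-hectare annual units, and pairing each year's NPP with the stock change ending in that year is an assumption of this example.

```python
WOOD_ALLOCATION = 0.49  # CMIP6 multimodel-mean allocation fraction quoted in the text
CARBON_FACTOR = 0.456   # biomass-carbon factor quoted in the text


def carbon_budget(cveg, npp, allocation=WOOD_ALLOCATION, factor=CARBON_FACTOR):
    """Return (net_change, gains, losses), one value per year from the second year on.

    cveg: total biomass carbon per year (Mg C ha-1)
    npp:  NPP per year, assumed already converted to per-hectare annual units
    """
    if len(cveg) != len(npp):
        raise ValueError("cveg and npp must cover the same years")
    # Net change: year-to-year difference in total biomass carbon.
    net_change = [cveg[i] - cveg[i - 1] for i in range(1, len(cveg))]
    # Gains: NPP scaled by the wood allocation fraction and the carbon factor.
    gains = [n * allocation * factor for n in npp[1:]]
    # Losses: whatever part of the gains did not end up as a stock increase.
    losses = [g - d for g, d in zip(gains, net_change)]
    return net_change, gains, losses


if __name__ == "__main__":
    # Hypothetical three-year series, for illustration only.
    net, gains, losses = carbon_budget([200.0, 200.5, 200.8], [20.0, 20.0, 22.0])
    print(net, gains, losses)
```

Since losses are obtained as a residual of gains and stock change, any trend in modeled gains carries over directly into modeled losses, which matters when interpreting the driver relationships reported later.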
The mean annual carbon sink for 1985-2014 for CMIP6 models and for 1985-2004 for CMIP5 models is calculated from change in carbon stocks scaled by tropical forest area from Hubau et al. (2020), reported in Table S4. We applied a 5-year moving average to the model time series for a like-for-like comparison with the observations. Global gridded annual land use data are the same as used in CMIP5 (Hurtt et al., 2011) and CMIP6 (Hurtt et al., 2020). CMIP data are extracted for grid cell locations selected on the basis of their AGB and initial land use in 1985, and land use change over 1985-2014. The data are masked to the tropics (22.5°N-22.5°S), and only locations are selected with a multimodel total biomass carbon ≥150 Mg C ha−1, ≤2.5% grid cell area land use in 1985, and ≤1% grid cell area change in land use over 1985-2014 in CMIP6 for comparison with observations (see Figure S1 and Table S2). For the comparison between CMIP5 and CMIP6 we apply the same thresholds but adjusted to the common 1985-2005 period. We also removed any grid cell with increases in land use above the 1% threshold at any point in time between 1985 and 2014. The extraction masks broadly overlap with the sampling locations from the observations (Figure S1).
[Table; body not recovered] Table S7. Pan-Tropics (Including SE Asia), 1980s-2000s
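The grid cell selection rules above can be expressed as a simple filter. This is a sketch under stated assumptions: the `GridCell` container and the `is_intact` function are hypothetical names, land use is assumed to be given as a percentage of grid cell area for each year starting in 1985, and only the threshold values are taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GridCell:
    lat: float                     # grid cell centre latitude, degrees
    biomass_c: float               # multimodel total biomass carbon, Mg C ha-1
    land_use_by_year: List[float]  # land use, % of grid cell area, first entry = 1985


def is_intact(cell, min_biomass=150.0, max_lu_start=2.5,
              max_lu_change=1.0, max_abs_lat=22.5):
    """Apply the tropics, biomass, initial land use, and land use change thresholds."""
    if abs(cell.lat) > max_abs_lat:
        return False
    if cell.biomass_c < min_biomass:
        return False
    start = cell.land_use_by_year[0]
    if start > max_lu_start:
        return False
    # Reject the cell if land use exceeds the change threshold in ANY year,
    # not only in the final year of the period.
    return all(lu - start <= max_lu_change for lu in cell.land_use_by_year)


if __name__ == "__main__":
    # Hypothetical cells, for illustration only.
    cells = [
        GridCell(-3.0, 180.0, [0.5, 0.6, 0.9]),
        GridCell(1.0, 200.0, [0.5, 2.0, 0.8]),  # temporary land use spike
    ]
    print([is_intact(c) for c in cells])
```

Passing different threshold arguments (for example `min_biomass=180.0` or `max_lu_change=0.5`) mirrors the kind of sensitivity check the text reports in Figure S10.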
Trends in the carbon gains, losses, and net change time series from the observations are reproduced from Hubau et al. (2020) for African forests and Brienen et al. (2015) for Amazonian forests (1983-2011.5) with the R code provided in Hubau et al. (2020). For CMIP5 and CMIP6, multimodel-means were calculated and trends fitted using the lm function in R (R Core Team, 2019), following the methodology of Hubau et al. (2020). For each grid cell we examined atmospheric CO2 concentration, MAT, and drought intensity (i.e., maximum climatological water deficit, MCWD) as potential environmental drivers that, according to theory, may explain the long-term trends in carbon gains and carbon losses, consistent with Hubau et al. (2020). For the observations, CO2 was extracted from the Mauna Loa atmospheric CO2 record (Tans & Keeling, 2016), MAT from Climate Research Unit v.4.03 (Harris et al., 2020), and drought intensity from the Global Precipitation Climatology Centre (Ziese et al., 2014). Globally averaged annual atmospheric CO2 concentrations, used to force the CMIP5 simulations, are derived from the CMIP5 input data archive for the historical period (https://www.iiasa.ac.at/web-apps/tnt/RcpDb/). Global annual atmospheric CO2 concentrations for CMIP6 are obtained via the ESGF (https://esgf-node.llnl.gov/search/input4MIPs/). MAT is extracted from the CMIP5 and CMIP6 models for each grid cell. Consistent with Hubau et al. (2020), drought is quantified as Maximum Climatological Water Deficit (MCWD), a commonly used metric of dry season intensity for tropical forests (Aragão et al., 2007, 2014; Phillips et al., 2009). MCWD for each year was calculated from monthly CMIP precipitation data for each selected grid cell, following the same method as in Hubau et al. (2020). First, we calculated monthly CWD values for each subsequent series of 12 months (Aragão et al., 2014).
Monthly CWD estimation begins with the wettest month of the first year in the record (i.e., 1985) and is calculated as 100 mm per month evapotranspiration (ET) minus monthly precipitation (P). Then, CWD values for the subsequent 11 months were calculated recursively as $\mathrm{CWD}_i = \mathrm{ET} - P_i + \mathrm{CWD}_{i-1}$, where negative $\mathrm{CWD}_i$ values were set to zero (no drought conditions) (Aragão et al., 2014). This was repeated for each subsequent 12 months. The annual MCWD is then taken as the largest monthly CWD value within each 12-month series. Larger MCWD indicates more severe water deficits.
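The recursion above can be sketched in a few lines of Python. This is an illustration rather than the code used in the study: the function name is hypothetical, the precipitation series is assumed to start at the wettest month of the first year, and the accumulated deficit is assumed to restart from zero at the beginning of each 12-month series.

```python
ET = 100.0  # fixed evapotranspiration, mm per month, as stated in the text


def annual_mcwd(monthly_precip, et=ET):
    """Return one MCWD value (mm) per complete 12-month series.

    monthly_precip: monthly precipitation (mm), assumed to start at the
    wettest month of the first year in the record.
    """
    mcwd = []
    for start in range(0, len(monthly_precip) - 11, 12):
        cwd_prev = 0.0  # assumption: the deficit restarts with each series
        largest = 0.0
        for p in monthly_precip[start:start + 12]:
            # CWD_i = ET - P_i + CWD_(i-1), with negative values set to zero.
            cwd = max(0.0, et - p + cwd_prev)
            largest = max(largest, cwd)
            cwd_prev = cwd
        mcwd.append(largest)
    return mcwd


if __name__ == "__main__":
    # Hypothetical precipitation for one series, wettest month first.
    precip = [300, 250, 200, 120, 80, 40, 20, 60, 110, 180, 220, 260]
    print(annual_mcwd(precip))  # [200.0]
```

In this example the deficit accumulates over the five months with less than 100 mm of rain (20, 80, 160, 200 mm) and then declines as rainfall recovers, so the MCWD for the series is 200 mm.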

Future Carbon Trends
The observationally constrained future carbon trends use two statistical models based on data from 1983-2014: one to project carbon gains (from CO2-change, MAT, MAT-change, MCWD, and wood density) and a second to project carbon losses (from CO2-change, MAT-change, MCWD, and CRT). These are reproduced from Figure 3 in Hubau et al. (2020) with the data and R code provided.
For this CMIP study, only a subset of 11 models show future projected total biomass carbon gains, losses and net carbon change for all four available SSPs (SSP126, SSP245, SSP370, and SSP585) from 2015 until 2040 (Table S1). Global annual atmospheric CO 2 concentrations for the CMIP6-SSPs (2015-2040) are obtained via the ESGF (https://esgf-node.llnl.gov/search/input4MIPs/). For several of the previously selected grid cells land use is projected to increase substantially under all scenarios between 2015 and 2040 (e.g., up to 57% of the total grid cell area in African tropics in SSP585). Thus, we removed grid cells where land use change over 2015-2040 exceeds the 1% grid cell area threshold from the analysis. One exception is SSP585 where the 1% threshold would remove all grid cells in Africa. Therefore, a 3% threshold is applied for this scenario. Trends for the SSP scenarios are calculated using the lm function in R.
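The smoothing and trend fitting applied to the model time series can be sketched as follows. The study uses the lm function in R; this Python example substitutes a hand-written ordinary least squares fit that returns only the slope and intercept (no p-value), and it assumes a centred 5-year window, which the text does not specify. All names and the synthetic series are hypothetical.

```python
def moving_average(values, window=5):
    """Moving average over `window` points; returns len(values) - window + 1 values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def ols_trend(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Synthetic sink series rising by 0.002 Mg C ha-1 yr-1 per year, for illustration only.
    years = list(range(2015, 2041))
    sink = [0.30 + 0.002 * (y - 2015) for y in years]
    smoothed = moving_average(sink)
    centred_years = years[2:-2]  # centre of each 5-year window (assumption)
    slope, intercept = ols_trend(centred_years, smoothed)
    print(round(slope, 4))
```

A moving average shortens the series by four points and damps interannual variability, so trends fitted to smoothed and unsmoothed series have similar slopes but different residual scatter.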
These differences persist when considering the best sampled decade for the observations, 2000-2010, where CMIP6 models show a sink of 0.23 ± 0.16 and 0.34 ± 0.08 Mg C ha−1 yr−1 for Africa and Amazonia respectively, compared to observations of 0.88 ± 0.30 and 0.80 ± 0.17 Mg C ha−1 yr−1 for Africa and Amazonia respectively (Figure 2 and Table 1). Overall, the CMIP6 models underestimate the size of the natural carbon sink, although models INM-CM4-8 and EC-Earth3-Veg were closest in magnitude to the sink across both continents (Figure S3). The CMIP5 models similarly underestimate the observed size of the natural carbon sink: for the CMIP5 1985-2005 period, the observations give a sink of 0.77 ± 0.33 and 0.64 ± 0.21 Mg C ha−1 yr−1 in Africa and Amazonia respectively, whereas the modeled outputs of 0.25 ± 0.18 and 0.23 ± 0.09 Mg C ha−1 yr−1 are 68% and 64% lower.
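The 68% and 64% figures for CMIP5 follow from expressing the model shortfall as a share of the observed sink. A short check, using only the means quoted above (the function name is hypothetical):

```python
def percent_lower(observed, modelled):
    """Model shortfall as a percentage of the observed value."""
    return 100.0 * (observed - modelled) / observed


if __name__ == "__main__":
    print(round(percent_lower(0.77, 0.25)))  # Africa, 1985-2005: 68
    print(round(percent_lower(0.64, 0.23)))  # Amazonia, 1985-2005: 64
```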
The trend in the natural carbon sink in the observations over 1985-2015 has been broadly stable in Africa (0.005 Mg C ha−1 yr−1 per year) but sharply negative in Amazonia after 1990 (−0.012 Mg C ha−1 yr−1 per year). The CMIP6 multimodel-mean replicates the observed trend for Africa, with a very slightly positive slope (0.002 Mg C ha−1 yr−1 per year) and nonsignificant trend (p = 0.167). However, CMIP6 outputs do not replicate the results from Amazonia, as CMIP6 models show a stronger positive trend in the sink (0.008 Mg C ha−1 yr−1 per year) that is significant (p = 0.049). The two BCC models and CanESM5 are the only ESMs that reproduce the stability in the natural African carbon sink and a decline of the natural Amazonian sink (Figure S3).
The year-to-year variability in the sink from the observations is 0.06 Mg C ha−1 yr−1 in Africa and 0.08 Mg C ha−1 yr−1 in Amazonia. This is similar to that seen in some individual models (CESM2-WACCM, INM-CM5-0); however, others show much larger variability (Figure S3). One particular outlier is NorCPM1, which exhibits a strong mortality event in the mid-1990s that is not seen in the observations.
The carbon sink results are not driven by our restrictions on the inclusion of grid cells to limit the impacts of land use change in the CMIP6 model results. This is firstly because land use in our selected grid cells occupies a median of <0.1% (0%-2.2% range) of the total grid cell area in CMIP6 in 1985. The mean change in land use is <0.1% (range −0.3%-0.9%) of the total grid cell area for CMIP6 over 1985-2014. The analysis is insensitive to residual land use and is reproduced with a higher AGB threshold of 180 Mg C ha-1 and with a lower land use change threshold of 0.5% of the total grid cell area over 1985-2014 ( Figure S10).
Underlying the natural carbon sink are the trends in carbon gains and carbon losses. CMIP6 multimodel-mean carbon gains in both continents' tropical forests are slightly lower than the observations (Africa: 11%; Amazonia: 4%). However, the CMIP6-mean shows a clear increasing trend in carbon gains (both continents, p < 0.001), matching the observed positive trend in Africa and Amazonia (Africa, p = 0.037; Amazonia, p < 0.001; Figure 1). Interannual variability in carbon gains is similar between models and observations, with <0.01 Mg C ha−1 yr−1 on both continents. Overall, only a few individual models reproduce the magnitude, trend and interannual variability seen in the observations for Africa (BCC-CSM2-MR and INM-CM4-8), and none of the models reproduces those of the Amazonian carbon gains, suggesting that mismatches of opposing signs between individual models are dampened by the multimodel-mean (Figure S3).
CMIP6 multimodel-mean carbon losses are slightly overestimated for Africa and Amazonia as compared to observations (Africa: 16%; Amazonia: 12%; Figure 1). CMIP6 models show increases in losses in Africa (p < 0.001) but no trend in Amazonia (p = 0.308). As such, they do not capture the observed contrasting trends in carbon losses between African forests, which show no multidecadal trend (p = 0.403), and Amazonian forests, which show a monotonic increase (p = 0.002) (Figure 1). Multimodel-mean variability matches observations for both continents (both models and observations <0.01 Mg C ha−1 yr−1). Individual models reproduce either the magnitude, the variability, or the trend since the 2000s, but not all three (Figure S3).

Environmental Drivers of Long-Term Change
The influence of the putative environmental drivers, CO2, MAT and water availability (as maximum climatological water deficit, MCWD), on carbon gains and losses in the observations is compared to the CMIP6 (Figure 3) and CMIP5 model performance (Figure S7). As expected, CMIP6 carbon gains show a significant positive relationship with CO2 (p < 0.001), as do observations (p = 0.02), both with the same slope (Figure 3). A strong negative relationship of carbon gains and MAT is seen in the observations (p < 0.001), but the CMIP6 multimodel-mean, while also showing a negative relationship, is not statistically significant (p = 0.776) and shows a much weaker relationship than the observations (Figure 3). The negative relationship of carbon gains and MCWD drought is seen in the CMIP6 multimodel-mean (p < 0.001) and the observations (p = 0.003), with the models showing a slightly stronger relationship than the observations (Figure 3).
The picture is different for carbon losses: the CMIP6 multimodel-mean shows statistically significant relationships with CO2 (positive), MAT (negative), and MCWD (negative), following trends in carbon gains. Yet the observations show no significant relationship to any of these environmental drivers and no discernible trends. However, a more sophisticated multi-parameter model (Hubau et al., 2020; Table S8), which also includes two plot attributes relevant to the response to the same environmental changes, the plot mean wood density (which in old-growth forests correlates with belowground resource availability) and the plot carbon residence time (CRT, which measures how long fixed carbon remains in the system and hence reflects when past increases in carbon gains leave the system as elevated carbon losses), shows the same sign of trends as the CMIP6 multimodel-mean for CO2 and MCWD with carbon losses (MAT is not retained in the multi-parameter model, as a change in the MAT term is included). The same trends hold for CMIP5 models (Figure S7). Because this multi-parameter model includes wood density as a predictor variable in addition to CO2, MAT, and MCWD, and wood density is not available for the CMIP models, a strict like-for-like comparison of the model-observation results is not possible. The significant negative effects of the environmental drivers in CMIP6 on carbon losses are a consequence of the fact that carbon losses are calculated as a function of carbon gains. Hence, the models show similar trends in losses as they do in gains (Figure 3). Hubau et al. (2020) identify the CRT as critical for determining the length of time after which past increases in carbon gains result in increased carbon losses that then begin to reduce the carbon sink.
The CMIP6 multimodel-mean CRT in both Africa and Amazonia is lower than the observed CRT (45 years, 95% CI 44-47, and 65 years, 95% CI 64-66, for Africa and Amazonia respectively for CMIP6 vs. 86 years, 95% CI 83-89, and 80 years, 95% CI 79-83, for Africa and Amazonia respectively for the observations). The models also do not capture the lower CRT in Amazonia compared to Africa seen in the observations. This failure to capture the continental CRT differences is largely due to lower carbon gains relative to their total biomass carbon stock.
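To make the link between stocks, gains, and residence time concrete, the sketch below computes a residence time as the ratio of the carbon stock to the annual carbon gains. This stock-over-flux definition is an assumption of the example, as the section does not give the formula, and the function name and numbers are hypothetical rather than taken from the plots or models.

```python
def carbon_residence_time(stock, gains):
    """Residence time in years, assuming CRT = stock / gains.

    stock: total biomass carbon (Mg C ha-1)
    gains: carbon gains (Mg C ha-1 yr-1)
    """
    if gains <= 0:
        raise ValueError("gains must be positive")
    return stock / gains


if __name__ == "__main__":
    # Hypothetical values: the same stock with two different gain rates.
    print(carbon_residence_time(180.0, 4.0))   # 45.0 years
    print(carbon_residence_time(180.0, 2.25))  # 80.0 years
```

Under this definition, two forests with the same stock can differ widely in CRT if their gains differ, which is why the ratio of gains to stock matters for the timing of the feedback from rising gains to rising losses.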
Overall, the CMIP6 models show good alignment with the observations in terms of the impacts of CO2 and drought, but less so with temperature, and the models do not reproduce the CRT seen in tropical forests.

CMIP5 and CMIP6 Models Are of Similar Skill
Comparing the carbon dynamics for Africa, Amazonia, and the pan-tropics in CMIP6 and CMIP5 based on the common 1985-2005 period shows that the multimodel-means of both model generations produce a natural net carbon sink in live biomass of similar magnitude and trend for both continents (Figure 2). However, the CMIP5 multimodel-mean (0.15 ± 0.09 Mg C ha−1 yr−1) shows a smaller natural mean pan-tropical sink than the CMIP6 multimodel-mean (0.21 ± 0.09 Mg C ha−1 yr−1) over 1985-2005. The interannual variability of the natural pan-tropical net carbon sink is similar in the CMIP5 (0.02 Mg C ha−1 yr−1) and CMIP6 (0.01 Mg C ha−1 yr−1) multimodel-means. Both continents' carbon sinks remained stable in the CMIP5 multimodel-mean (Africa p = 0.147; Amazon p = 0.348), while in the CMIP6 multimodel-mean the natural African carbon sink increased (p < 0.001) and the natural Amazon sink remained stable (p = 0.589) (Figure 2). As in CMIP6, individual CMIP5 models are capable of capturing the observed divergence in the continents' carbon sinks (CMIP6: BCC models and CanESM5; CMIP5: CanESM2, both CESM models, HadGEM2-CC). However, unlike CMIP6, all CMIP5 models produce too high interannual variability and do not capture the observed variability in carbon sink strength in both continents (Figures S3 and S4).
For carbon gains and losses, CMIP5 models show some different patterns compared to CMIP6 models. CMIP5 models produce substantially (28%) higher multimodel-means in carbon gains and losses than CMIP6 models for African tropical forests, which does not match observations (Figures S5 and S6). This is caused by individual models producing too high NPP (e.g., both IPSL models and both MPI models report twice as high NPP as the CMIP6 mean and observations). The magnitude of carbon gains and losses in CMIP5 is similar to CMIP6 for Amazonia. Trends in carbon gains are similarly positive for all regions in both CMIP5 (Africa, p < 0.001; Amazonia, p = 0.007) and CMIP6 (both continents, p < 0.001), consistent with the observations. Likewise, trends in carbon losses are positive for all regions in both CMIP5 and CMIP6, except for Amazonia in CMIP5, which shows no trend (CMIP5: Africa, p = 0.004; Amazonia, p = 0.145; CMIP6: Africa, p = 0.002; Amazonia, p = 0.045). Individual model variability is similar in both CMIPs.
The impact of long-term drivers is comparable between CMIP5 and CMIP6, with carbon gains in both showing a significant positive relationship with CO2 (CMIP5, p = 0.004; CMIP6, p = 0.004) and a significant negative relationship with MCWD (both p < 0.001) (Figure S7). CMIP5 gains, however, show a significant negative relationship with MAT (p < 0.001) while CMIP6 gains do not (p = 0.442). CMIP5 gains agree with observations. For losses, CMIP5 and CMIP6 both show the same responses, with no significant relationship for CO2 (CMIP5, p = 0.252; CMIP6, p = 0.197), and significant negative relationships for MAT (CMIP5, p < 0.001; CMIP6, p = 0.017) and MCWD (both p < 0.001), because losses are calculated as a direct function of gains in the models. The change from a significant relationship of CMIP6 losses with CO2 for 1985-2014 to a nonsignificant relationship over 1985-2005 is caused by the spike in losses at around 355 ppm CO2, which has a stronger influence over the shorter period with fewer data points. This spike is caused by the anomalous mortality event in Amazonia in NorCPM1. Removing these outliers results in a significant relationship (p = 0.042). Neither CMIP5 nor CMIP6 captures the observed relationship between carbon losses and their long-term drivers shown here, but both agree with the trends of a multi-parameter model in Hubau et al. (2020; Table 2 therein) that, however, includes the key variable wood density, which is not available from models.
Overall, CMIP5 and CMIP6 models, on average, slightly underestimate carbon gains but overestimate carbon losses (with the exception of the overestimation of gains and losses in the African tropics in CMIP5), leading to an underestimate in the overall magnitude of the carbon sink in live biomass. They reproduce the positive trend of carbon gains, consistent with the observations, and the positive trend in carbon losses, but at levels far below the trend seen in the observations in Amazonia.

The Future Carbon Sink
Finally, we compare the CMIP6 model estimates of the carbon sink in live biomass for intact African and Amazonian forests under four future SSPs, from the smallest climate impacts scenario (SSP126) through a mid-scenario (SSP245) to a high scenario (SSP370) to the largest impact scenario (SSP585), with the published estimates of the future sink using extrapolations of environmental drivers and a statistical model, which is closest to SSP245 (Hubau et al., 2020), up to 2040. The CMIP6 models do not replicate the decline in the natural sink seen using the statistical model. Instead, the CMIP6 multimodel-mean shows an increasing net carbon sink under all scenarios (Table 2). Under no scenario does the multimodel-mean project a decline in the natural African net carbon sink, with even SSP585 showing an increase in the sink in the 2030s (p = 0.015). Likewise, the CMIP6 multimodel-mean projects no change in the natural Amazonian net carbon sink under any scenario, except for an increase in sink strength under SSP585 (p = 0.005). By contrast, observation-based model projections show a decline in the sink in African forests and a much stronger negative trend in Amazonia, where the carbon sink in live biomass reaches zero by 2035 (both p < 0.001; Figure 4).
Projected CMIP6 carbon gains and losses partially follow the observation-based statistical model, similarly to the modeled 1985-2014 results (Figures S8 and S9). Generally, the multimodel-mean matches the observation-based statistical model of carbon gains relatively well but underestimates their trend. The CMIP6 multimodel-mean overestimates gains for the African tropics in SSP245 and SSP370, while those for Amazonia are similar. The scenarios with the highest CO2 concentration (SSP370 and SSP585) also show the strongest increases in carbon gains in both continents (all SSPs on both continents p < 0.001, except SSP585, Amazonia, p = 0.005). The CMIP6 multimodel-mean mortality is higher in Africa than the observation-based statistical model but approximately matches the statistical model for Amazonia. The mortality response to the varying levels of climate change is diverse. Mortality increases in the African tropics under the strong warming scenarios SSP370 and SSP585 (both p < 0.001), as well as SSP126 (p = 0.007) but not SSP245 (p = 0.106). For Amazonia this pattern is reversed, with mortality increasing under the moderate warming scenarios (SSP126, p = 0.025; SSP245, p = 0.003) as well as SSP370 (p = 0.029) but not under SSP585 (p = 0.251). Overall, increases in mortality trends are lower than the increases in carbon gains under all scenarios, except for Africa under SSP370 and SSP585, resulting in an increasing carbon sink in live biomass in the model scenarios. This is unlike the observation-driven extrapolations, which show losses increasing faster than gains, resulting in a decline in the carbon sink strength over time.

Matching Observations and Models
The CMIP5 and CMIP6 models estimate lower carbon gains from woody productivity (fraction of NPP to wood and coarse roots) than seen in the forest inventory plot observations. At the same time, CMIP models overestimate carbon losses from mortality, and thus overall substantially underestimate the total natural carbon sink in live biomass in intact tropical forests over the past three decades. In terms of trends, the models show rising carbon gains as the plots do, but cannot replicate the divergent carbon loss trends in Africa and Amazonia over 1985-2014, and hence do not show the observed saturation of the carbon sink in live biomass. For the future, the CMIP6 models predict an increasing natural carbon sink in intact tropical forests, while a statistical model based on the observations suggests a continued strong decline in the Amazon sink, and a modest decline in the African sink. Yet a simple comparison of the CMIP models and forest plot observations is not straightforward, as sampling location, census interval length, methods for estimating forest biomass, and estimates of the likely environmental drivers differ between the CMIP models and the observations.
We assume that the inventory plots are an unbiased representation of each continent's tropical forests. In reality these plots have been haphazardly located based on the needs of individual studies in the past, and while usually placed randomly within a defined area, are not random or stratified across the continents (Brienen et al., 2015; Hubau et al., 2020; Lewis et al., 2009). The plots tend to be in clustered groups that are typically closer to access points (rivers, roads) than expected from random selection procedures. However, despite these limitations, analysis shows that the plots are well-distributed and broadly representative of the climate space of closed canopy tropical forests (Sullivan et al., 2020). A further complication is that to match these intact forest plots to outputs from the ESMs, we needed to restrict spatial locations to grid squares with low levels of land-use change between 1985 and 2014, hence there is some potential spatial mismatch between the model outputs and observations. This uncertainty does not appear to affect the results because the environmental conditions covered by the selected grid cells are in good agreement with the observations (Figure S2), and altering the land use change exclusion criteria for grid squares does not affect our results (Figure S10). Although we restrict ESM grid cells to those with little land use over 1985-2014, these grid cells still experience both some land use prior to 1985 (<2.5%) and some land use change over 1985-2014 (<1% of the grid cell area over 30 years). These appear to have only short-term impacts on the carbon sink strength, as using a lower 0.5% land use change per grid cell threshold confirms that our results are not impacted by land use (Figure S10). Yet a small confounding impact on the long-term trend cannot be excluded.
Likewise, a relaxing of our land use rules to include grid cells with initial (1985) land use in 5% of the grid cell area and 3% land use change over 1985-2005 shows that differences between Africa and Amazonia are not caused by our chosen grid cell thresholds. Therefore, overall, we do not expect the limitations in the location of the plots, nor the location of the low or zero deforestation grid cells in the ESMs to strongly impact our comparisons, as our analyses do not show this and there is no obvious mechanistic process to do so. However, such mismatches cannot be excluded as a potential cause of the differences amongst the model-observations comparisons.
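The grid cell selection described above can be illustrated with a minimal sketch. It assumes a hypothetical array of land-use fractions per grid cell and year, and it measures land use change as the net difference between the first and last year; both the array layout and that definition of change are assumptions made for illustration, not the procedure actually used in the analysis. The function name `select_intact_cells` is likewise hypothetical.

```python
import numpy as np

def select_intact_cells(land_use_frac, max_initial=0.025, max_change=0.01):
    """Boolean mask of grid cells treated as structurally intact.

    land_use_frac: array (n_years, n_lat, n_lon) with the fraction of each
    grid cell under land use; the first year is taken as 1985 (assumed layout).
    max_initial:   upper bound on land use in the first year (default 2.5%).
    max_change:    upper bound on net land use change over the period
                   (default 1%; net first-to-last change is an assumption).
    """
    initial = land_use_frac[0]
    change = np.abs(land_use_frac[-1] - land_use_frac[0])
    return (initial < max_initial) & (change < max_change)

# Synthetic 2 x 2 grid over 30 years (illustrative values only)
lu = np.zeros((30, 2, 2))
lu[:, 0, 1] = np.linspace(0.010, 0.016, 30)  # 1% initial, 0.6% change
lu[:, 1, 0] = np.linspace(0.000, 0.020, 30)  # 0% initial, 2% change
lu[:, 1, 1] = 0.04                           # 4% initial, no change

default_mask = select_intact_cells(lu)
strict_mask = select_intact_cells(lu, max_change=0.005)
relaxed_mask = select_intact_cells(lu, max_initial=0.05, max_change=0.03)
```

Re-running the downstream analysis with the stricter and the relaxed masks is one way to check, as in the sensitivity tests reported here, whether conclusions depend on the chosen thresholds.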
Not all inventory plots were monitored for the entire time between 1980 and 2015, further complicating comparisons with the ESMs. On both continents the number of plots monitored increased rapidly in the 1980s, reaching a maximum in the mid-2000s, before declining closer to the 2015 cut-off date of this study. By contrast, model outputs are reported annually from 1985-2014 for every grid cell we include. However, the results are similar if we compare the best sampled decade, 2000-2010, suggesting that this limitation in the temporal sampling of the plot data is not driving any of the model-observation differences we report (Figure 1, with 2000-2010 trends in brackets; Table 1 for decadal mean sink strength). In addition, the observations have differing census interval lengths, which could make comparisons with the modeled interannual variability difficult, although we account for this by applying a 5-year moving average to the model time series.
Differences between observations and ESMs could also be related to potential bias in biomass estimation. In the observational plot data, biomass is estimated using an allometric model with stem diameter, tree height and wood density as predictor variables. Given the difficulty of measuring the mass of tropical trees and the high species diversity in tropical forests, this is, by necessity, a single model applied to all tree species across the tropics. The allometric model used in Hubau et al. (2020) was parameterized using the largest available number of harvested trees (Chave et al., 2014). However, recent analyses have demonstrated that inconsistencies in allometric relations can introduce significant bias into stand-scale biomass predictions (Burt et al., 2020). Furthermore, allometry may alter with ontogeny, although there is insufficient data to quantify any size dependency of allometric relationships (Burt et al., 2020). By contrast, total vegetation biomass-carbon (cVeg) in ESMs is calculated very differently. NPP and biomass are typically based on the Farquhar photosynthesis model (Farquhar et al., 1980) and a stomatal conductance model that calculates water, energy, and CO2 fluxes (Bonan, 2008). These models are robust because they include mechanistic representations of processes (water, energy and nutrient balance; photosynthesis and plant growth; decomposition; disturbance) (Lawrence et al., 2011). Yet these models represent a simplification of real-world biogeochemical dynamics. For example, a key challenge is that most models use a single plant functional type to represent all broadleaf evergreen trees throughout the tropics (Townsend et al., 2008). Furthermore, estimates of tropical forest NPP and biomass are limited to relatively coarse resolutions, and single value estimates are used across large regions (Townsend et al., 2008).
Future improvements of NPP and biomass estimates will depend on the development and maintenance of a network of sites where forests are continuously monitored using multiple approaches (Chave et al., 2019; Cleveland et al., 2015). We conclude that, due to very different calculation methods, observation-based and ESM-based biomass estimates are difficult to compare and can produce very different results from these scaling methods alone. Indeed, Cleveland et al. (2015) directly compared NPP estimates from different approaches, including field-based methods scaled from plot-level measurements and biogeochemical models, showing that different methods can produce different NPP patterns, both in space and through time. Thus our model-observation comparisons of the absolute carbon sink values need to be interpreted with caution, but the model-observation differences in trends over time may be more robust, as the model and observation biomass estimation methods are time invariant.
The final difference between the observations and ESMs is that the observations are in response to the actual changes in the environment over the time these forests were monitored (estimated from CO2 and climate data derived from measurements), plus any lagged effects from prior environmental change. By contrast, the CMIP5 and CMIP6 carbon dynamics are driven by model-generated climate and idealized forcings. The forest-environment trends seen in the models may be impacted by these model-generated climates and idealized forcings in ways that differ from the real world over 1985-2014, which could affect the comparisons of the models and the observations. This is, however, not the case, as shown by the very similar response of observed carbon gains to CMIP6-forced CO2 and observed CO2 (Table S6). Overall, it is difficult to find evidence that the matching of grid cells in space and time with the forest plot observations, or the differences in the magnitude and rate of change in environmental drivers of changes in models or observations, are the cause of any of the plot-model differences we find. Major disparities remain that cannot be explained by sampling issues or differences in the environmental driver data.
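As an illustration of the smoothing step, the following minimal sketch applies a 5-year moving average to an annual series. The text does not state whether the window is centered or trailing, nor how the series ends are handled; here only full windows are kept, so the averaged values are the same under either alignment and only their time stamps would differ. The function name and the input values are hypothetical.

```python
import numpy as np

def moving_average(series, window=5):
    """Moving average over `window` consecutive years.

    Only complete windows are returned, so the output has
    len(series) - window + 1 values (end handling is an assumption).
    """
    series = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

# Synthetic annual values (illustrative only, not model output)
annual_sink = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
smoothed = moving_average(annual_sink)
```

With a 30-year annual series (1985-2014), this choice would leave 26 smoothed values.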

Models Underestimate NPP and Overestimate Losses
One key difference concerns modeled carbon gains (i.e., modeled woody NPP): the CMIP6 multimodel-mean is on average 11% (Africa) and 4% (Amazonia) lower than the mean from 565 forest locations across Africa and Amazonia. This is in agreement with the 7% lower pan-tropical CMIP5 NPP multimodel-mean in Negrón-Juárez et al. (2015). Some of the underestimated NPP would potentially be balanced by applying model-specific wood allocation instead of a multimodel-mean allocation. The effect, however, is small, accounting for only 0%-3% of the difference we see between modeled and observed carbon gains in Africa and Amazonia when comparing the models where the allocation factor is available with the same models adjusted by the multimodel-mean allocation factor.
Modeled gains are derived from NPP, which is calculated as a composite of different plant functional types. While this cannot be accounted for from the available model output, we assume that the overall impact is small, as we have chosen a total biomass carbon threshold that precludes most vegetation other than trees and we average over a number of grid cells. A subset of models (n = 11) for which output from separate carbon pools is available confirms this: we find that on average 99% of the total biomass carbon is allocated to woody biomass and root biomass (i.e., trees and shrubs and their roots) in our chosen grid cells (Table S5). Neither the allocation of woody NPP nor the spatial selection criteria explain the lower modeled carbon gains. Models generally have difficulties balancing GPP and respiration (Kim et al., 2018). Thus the vegetation models' underlying empirical response functions of photosynthesis to temperature, CO2, and water supply, as well as responses of respiration to temperature, and the rules to allocate fixed carbon to leaves, trunks and roots are a likely cause of the modest mismatch between models and observations. These empirical relationships are often based on experiments and data from temperate locations, with a surprising lack of data from the tropics (Mercado et al., 2018; Pugh et al., 2016; Terrer et al., 2019).
The ESMs also show higher modeled losses compared to observations (Africa, 14%; Amazonia, 8%). This mismatch can partly be explained by discrepancies in the calculation procedure of modeled losses. While observed losses are derived from direct measurements in the field (each tree that died is recorded and the diameter of the previous census is used to calculate carbon loss, including estimated growth prior to its death), we calculate modeled losses in ESMs indirectly from carbon gains and net biomass change as no direct mortality output is available from the CMIP archives. Models only report NPP and the total grid cell carbon, which are used to calculate carbon gains and annual net biomass carbon-change, respectively. Annual carbon losses from mortality are then simply calculated as gains minus annual net carbon-change.
Thus overall a modeled underestimate in the carbon gains leads to an overestimate in the modeled carbon losses, as seen when comparing both the gain and loss observations and modeled estimates.
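The residual calculation described above (losses as gains minus annual net biomass carbon change) can be sketched as follows. This is a minimal illustration under stated assumptions: the function and variable names are hypothetical, the carbon stock is assumed to be given at year boundaries, and the numbers are synthetic rather than taken from any CMIP model.

```python
import numpy as np

def annual_losses(gains, c_veg):
    """Carbon losses inferred as a residual.

    gains: woody carbon gains for each of n years (derived from NPP).
    c_veg: biomass carbon stock at the n + 1 year boundaries (assumed layout).
    Returns losses = gains - annual net change in the stock.
    """
    gains = np.asarray(gains, dtype=float)
    net_change = np.diff(np.asarray(c_veg, dtype=float))
    return gains - net_change

# Synthetic values in arbitrary carbon units (illustrative only)
gains = [2.5, 2.6, 2.7]
c_veg = [150.0, 150.5, 150.9, 151.0]
losses = annual_losses(gains, c_veg)
```

Because losses are obtained as a residual, any error in the gains or in the stock change carries directly into the inferred losses.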
Furthermore, because modeled losses are a direct function of modeled gains, modeled gains and losses exhibit very similar relationships with the environmental drivers (CO2, MAT, MCWD), in contrast with the observed gains and losses (Figure 3). This is consistent with how individual models treat mortality, often based on a fixed mortality rate, or age-induced mortality, or a leaf area to stem growth threshold induced mortality ("growth efficiency"), which all essentially link mortality to modeled carbon gains (McDowell et al., 2018).
As modeled losses are assumed to be a direct function of present carbon stocks and NPP, they do not take into account the potential for more complex feedback effects of past carbon dynamics. One of the conclusions in Hubau et al. (2020) is that observed losses appear to lag behind carbon gains, with lagged response times depending on the CRT of the forest. As such, owing to the typically longer CRT of African forests, the onset of increasing losses in Africa (around 2010) appears 10-15 years after the increase in Amazon losses began (around 1995). Because modeled losses do not respond to the CRT and hence longer lag times from past gains, they fail to capture the asynchronous increase in losses among the continents, and they fail to capture the magnitude of the increase. This is also reflected in the fact that models do not capture the lower CRT in Amazonia compared to Africa seen in the observations, and that overall CRT in the models is about one-fifth lower than in the observations. Improving mortality functions in CMIP models has been a longstanding issue (Friend et al., 2014; Galbraith et al., 2013; Koven et al., 2015; Negrón-Juárez et al., 2015), as has the importance of CRT, and both are crucial to improving model performance in the future.

Models Have Difficulty Balancing the Impacts of Environmental Drivers
Based on the relationship between carbon gains and losses and their environmental drivers, we can assess the models' ability to project a change in the natural carbon sink in live biomass following the change in the environmental drivers. The models do not capture the saturation and decline of the tropical forest carbon sink in live biomass seen in the observations. In the models, increases in carbon gains from elevated CO2 offset any increases in temperature- or drought-induced carbon loss. Similarly in the observations, consistent increases in photosynthesis from higher CO2 are partially offset by the negative consequences of droughts and higher air temperatures on the carbon balance of tropical forests, with Amazonia more affected due to more rapidly rising temperatures from a higher baseline temperature, more drought-susceptible forests, and stronger recent droughts than seen in Africa (see Ext. Data Figure 5 in Hubau et al., 2020). The trends of the environmental drivers are similar in both models and observations and so cannot explain the mismatch between observations and models: the prescribed CO2 concentrations in the models are increasing, and the modeled temperature increase is marginally higher in Amazonia (0.029°C year−1) compared to Africa (0.026°C year−1). The only difference is that modeled droughts are increasing in Amazonia (MCWD, 0.224 mm month−1 per year) as shown in the observations, but in Africa models show a wetting trend (−0.219 mm month−1 per year), whereas observations show a modest increase in droughts. Observed CRT, and its continental differences, however, are not captured in the models. Thus mortality only responds to recent carbon gains in the models, yet the observed decline in the tropical forest carbon sinks is largely driven by mortality increases from past gains.
The balancing effects of plant acclimation to rising CO2 and temperature still represent a key uncertainty in the carbon cycle feedback to future climate change (Booth et al., 2012; Hubau et al., 2020; Lombardozzi et al., 2015). The magnitude of plant response to CO2 is uncertain and depends on data from forest plots, ESMs, satellite observations, and limited experiments (Kolby Smith et al., 2016; Terrer et al., 2019). Another uncertainty is presented by the forest response to temperature and in particular its interactions with drought, which may be nonlinear, as recent data and modeling suggest (Sullivan et al., 2020).
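Driver trends of the kind quoted above (for example, °C per year for MAT) could be estimated with an ordinary least-squares slope over the annual series. The sketch below assumes that approach purely for illustration; the fitting method actually used is not stated here, and the function name and input series are hypothetical, with synthetic data constructed to have a known slope.

```python
import numpy as np

def linear_trend(years, values):
    """Least-squares slope of `values` against `years` (units per year).

    Years are centered before fitting to keep the fit well conditioned.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    slope, _intercept = np.polyfit(years - years.mean(), values, deg=1)
    return slope

# Synthetic series with known slopes (not CMIP output)
years = np.arange(1985, 2015)
mat = 25.0 + 0.029 * (years - 1985)    # warming of 0.029 degrees C per year
mcwd = 60.0 - 0.219 * (years - 1985)   # wetting trend in MCWD

mat_trend = linear_trend(years, mat)
mcwd_trend = linear_trend(years, mcwd)
```

Applying the same function to model and observation-based driver series would give trends that can be compared on a like-for-like basis.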

Need to Improve ESM Prediction Skill in Next-Generation Models
Recent advances in representing the land surface component of ESMs may have improved CMIP model performance in some parts of the modeling realm, such as the prognostic representation of phenology (Li et al., 2019) and biomass saturation under increasing NPP (Lawrence et al., 2018). However, here we show that these changes did not improve modeling of the natural intact tropical forest carbon sink in live biomass. Ultimately, two aspects determine how well ESMs reproduce the natural intact tropical forest carbon sink in live biomass: (i) the simulated climate, its seasonality and multiannual trends, and (ii) the simulated response of the vegetation to these environmental drivers. CMIP6 models generally agree with observations on MAT and its historical trends in the tropics (Fan et al., 2020). CMIP6 ESMs are able to reproduce total annual precipitation in the African tropics, but show a dry bias in Amazonia and a wet bias in SE Asia. Models reproduce the historical trends in precipitation, yet struggle with reproducing rainfall seasonality (Fiedler et al., 2020). Improving ESMs to better simulate tropical precipitation will improve future ESM vegetation modeling.
In ESMs, carbon gains under increasing CO2, mediated by CO2 and nutrient limitation, still offset any heating- and drought-induced tree mortality in the current set of CMIP models. Yet observations have shown that combined drought-heat stress increased mortality (Sullivan et al., 2020). The nondemographic vegetation models currently used still apply a size- or age-dependent mortality that does not reflect plant cohort responses to dry conditions, something the next generation of vegetation models (not yet implemented in CMIP6) aims to address. Indeed, the implementation of vegetation demographic models with a mechanistic approach to tree water and energy fluxes in ESMs has been shown to improve plant carbon responses to drought (Christoffersen et al., 2016; Feng et al., 2018). A more mechanistic approach would likely also lead to an improvement in CRT, which has been found to be the primary driver behind decreasing carbon stocks under combined low precipitation and high temperatures (Sullivan et al., 2020).
Improvements in ESMs' representation of tropical vegetation are crucial to adequately project the fate of the tropical carbon sink under future global warming. Critically, the inventory data suggest that a saturation in the carbon sink strength has already occurred, which is not simulated by the CMIP6 models. There is a need both to continue the forest plot observations to confirm the saturation of the tropical forest carbon sink, and to improve CMIP modeling efforts to understand this critical part of the global carbon cycle and better inform future climate policy. Integrated research incorporating new observations and model development will most quickly achieve these goals.

Conclusion
We compared recently reported observations of carbon dynamics of intact tropical forests with those estimated by ESMs. The current generation of ESMs (CMIP6) and their predecessors (CMIP5) are of equal skill, but they do not reproduce well the observed 1985-2014 continental differences of total carbon gains and losses. This resulted in a 66% smaller total natural pan-tropical sink in the newest generation state-of-the-art CMIP6 models (0.34 ± 1.15 Pg C yr−1), as compared to the observed pan-tropical sink from 1985-2014 (1.02 ± 0.40 Pg C yr−1). Furthermore, the models show a moderate increase in carbon uptake in both the African and Amazonian tropics over the 1985-2014 window, whereas the observations show a clear decline in the sink in Amazonia. Projections using the CMIP6 models show an increase in the sink to 2040 under all four diverse future scenarios analyzed. This contrasts starkly with the observed strong past decline in the Amazonian carbon sink and the statistically derived predicted future decline of the natural carbon sink on both continents.
Critically, these mismatches can be explained because models do not correctly balance CO2 fertilisation effects and respiration costs, while they also fail to capture trends in mortality, likely stemming from the difficulties of modeling the complex lag effects between past carbon gains and carbon losses. This makes predicting the future of the sink, and the critical time of transition from carbon sink to carbon source dynamics, extremely challenging. It also hampers ESM prediction skill and potentially underestimates important carbon cycle feedbacks under global heating. Well-integrated research programs are needed that combine continued widespread monitoring of tropical forests with experiments to test hypotheses, and that synthesize this information into next-generation models. Such multidisciplinary approaches are key to reducing the uncertainty of the fate of the tropical forest carbon sink, a key feedback in the Earth system.