Internal Variability Increased Arctic Amplification During 1980–2022

Since 1980, the Arctic surface has warmed four times faster than the global mean. Enhanced Arctic warming relative to the global average warming is referred to as Arctic Amplification (AA). While AA is a robust feature in climate change simulations, models rarely reproduce the observed magnitude of AA, leading to concerns that models may not accurately capture the response of the Arctic to greenhouse gas emissions. Here, we use CMIP6 data to train a machine learning algorithm to quantify the influence of internal variability in surface air temperature trends over both the Arctic and global domains. Application of this machine learning algorithm to observations reveals that internal variability increases the Arctic warming but slows global warming in recent decades, inflating AA since 1980 by 38% relative to the externally forced AA. Accounting for the role of internal variability reconciles the discrepancy between simulated and observed AA.

• Internally generated and externally forced temperature trends over the Arctic and globe can be partitioned using machine learning methods
• Internal variability has enhanced Arctic warming while damping global warming over 1980-2022
• Accounting for internal variability in observations reconciles discrepancies between simulated and observed Arctic Amplification

Supporting Information: Supporting Information may be found in the online version of this article.
impacts on albedo and surface heat fluxes (Serreze & Barry, 2011; IPCC Chapter 3; Feldl et al., 2020; Deng & Dai, 2022). Due to this coupling, the large internal variability in sea ice likely manifests as changes in Arctic surface temperature. Decadal atmospheric and oceanic internal variability may also contribute to recent Arctic warming (Kim & Kim, 2017; Proshutinsky et al., 2015). Internal variability has also been implicated in the recent slowdown of global warming in the early 21st century (Guan et al., 2015; Huber & Knutti, 2014; Kosaka & Xie, 2013; Zhang, 2016). However, it is still an open question whether the large differences in AA between model simulations and observations are mainly caused by climate model deficiencies, internal variability, or both (Chylek et al., 2022; Rantanen et al., 2022).
When comparing model simulations with observations, it is important to account for the effects of internal variability (Deser et al., 2020). In single-model large ensembles, the same model is run with small perturbations in the initial conditions, leading to a unique realization of internal variability in each ensemble member.
The externally forced signal can be estimated using the ensemble mean, and the internal variability associated with each ensemble member can be obtained as the deviation from this mean (Kay et al., 2015). However, this technique cannot be applied to observations because there is only one observational record. To disentangle the effects of external forcing and internal variability on observed changes in climate, previous work has used various spatiotemporal analysis methods (e.g., Deser et al., 2014; Deser et al., 2016; Gong et al., 2019; Guo et al., 2019; Po-Chedley et al., 2021; Po-Chedley et al., 2022; Räisänen, 2021; Smoliak et al., 2010; Smoliak et al., 2015; Wallace et al., 2012; Wills et al., 2020). Here, we build upon previous methods using a machine learning (ML) approach, which is trained to separate the contributions of external forcing and internal variability to surface warming using climate model large ensembles (see Table S1 in Supporting Information S1 for information about the model large ensembles). The model-trained ML algorithm is then applied to observations to estimate the relative influence of external forcing and internal variability on recent Arctic and global surface temperature changes. We find that internal variability enhanced Arctic warming but damped global warming, resulting in amplified AA in the observed record. We show that accounting for the effects of internal variability on Arctic and global surface warming reconciles differences between observed and model-simulated AA.
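The ensemble-mean decomposition described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name `split_forced_internal` and the example trend values are hypothetical.

```python
import numpy as np

def split_forced_internal(member_trends):
    """Split per-member trends of one large ensemble into forced and internal parts.

    The forced trend is estimated as the ensemble mean; the internal
    component of each member is its deviation from that mean.
    """
    trends = np.asarray(member_trends, dtype=float)
    forced = trends.mean()
    internal = trends - forced
    return forced, internal

# Hypothetical regional-mean trends (K/decade) from a three-member ensemble.
forced, internal = split_forced_internal([0.5, 0.7, 0.9])
```

By construction the internal components sum to zero across members, which is why the same split cannot be made from a single observational record.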

Data and Methods
The magnitude of AA depends on the southern boundary used to define the Arctic (Davy et al., 2018). In this study, AA is defined as the surface air temperature trend for the region poleward of 70°N divided by the global mean trend, and it is evaluated in observational data sets (Hersbach et al., 2020; Lenssen et al., 2019; Morice et al., 2021; Rohde & Hausfather, 2020; Zhang et al., 2019). Figure S1 in Supporting Information S1 indicates that warming is amplified north of 70°N in all four observational data sets.
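As a rough illustration of this ratio definition, the sketch below computes AA from a gridded trend map. It assumes a regular latitude-longitude grid and cosine-latitude area weights; the function names and the synthetic grid are hypothetical and are not taken from the study.

```python
import numpy as np

def area_mean(field, lats, lat_min=-90.0):
    """Cosine-latitude weighted mean of field[lat, lon] over lat >= lat_min."""
    field = np.asarray(field, dtype=float)
    lats = np.asarray(lats, dtype=float)
    mask = lats >= lat_min
    weights = np.cos(np.deg2rad(lats[mask]))
    zonal_mean = field[mask].mean(axis=1)  # regular grid: equal weight in longitude
    return float(np.sum(weights * zonal_mean) / np.sum(weights))

def arctic_amplification(trend_map, lats, arctic_boundary=70.0):
    """Arctic-mean trend divided by the global-mean trend."""
    return area_mean(trend_map, lats, arctic_boundary) / area_mean(trend_map, lats)

# Synthetic 2.5-degree grid (cell centres), used only for illustration.
lats = np.arange(-88.75, 90.0, 2.5)
uniform = np.ones((lats.size, 144))
amplified = np.where(lats[:, None] >= 70.0, 4.0, 1.0) * uniform
```

Because a linear trend commutes with spatial averaging, averaging the trend map is equivalent to taking the trend of the regional-mean time series.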
Following recent work showing that ML methods can effectively isolate internally generated and externally forced trends (Barnes et al., 2019; Connolly et al., 2023; Gordon & Barnes, 2022; Po-Chedley et al., 2022), we create ML algorithms to isolate these trend contributions in observed surface air temperature during the 43-year period from 1980 to 2022 over both the Arctic and globe. To do this, we create a training data set based on 10 CMIP6 models, each of which has at least 10 ensemble members (Table S1 in Supporting Information S1). In addition to the CESM2 large ensemble from the CMIP6 archive, we include the 50-member CESM2 large ensemble with updated biomass burning aerosol emissions that better represent the historical radiative forcings in the high latitude northern hemisphere (referred to here as CESM2_SBMB) (Fasullo et al., 2022; Rodgers et al., 2021). The target data in our training are the externally forced and internally generated surface air temperature trends averaged over a given region (either Arctic or globe), which are derived as the ensemble-mean trend and each member's deviation from that mean, respectively. These trends are calculated using 43-year periods separated by five years spanning 1900-2047 (i.e., 1900-1942, 1905-1947, …, 1980-2022, …, 2005-2047). CMIP6 and CESM2_SBMB historical runs end in 2014, so we extend these simulations using either SSP3-7.0 or SSP5-8.5 (SSP5-8.5 is used when both are available) (O'Neill et al., 2016) until 2047 for seven of the models in our training data. The remaining four models only have sufficient ensemble members until 2014, and thus only periods from 1900 to 2012 are used to train our ML algorithm for these models. The ML algorithm is trained using 10 models with large ensembles, with one model left out (see more details below). We test the results from 1980 to 2022 using the left-out model, which is one of the seven with extensions beyond 2014 (Table S1 in Supporting Information S1). The observationally derived AAs are compared with the seven large ensembles for 1980-2022, and with all other CMIP6 models with data available over 1980-2022 (OthersAllEM), even though none of them has enough ensemble members to properly derive the externally forced AA.
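The 43-year training windows listed above can be enumerated as follows. This is a small sketch with a hypothetical helper name, `trend_windows`; it only reproduces the window bookkeeping stated in the text.

```python
def trend_windows(first_start=1900, last_end=2047, length=43, step=5):
    """Return (start, end) years of all windows of `length` years, `step` years apart."""
    windows = []
    start = first_start
    while start + length - 1 <= last_end:
        windows.append((start, start + length - 1))
        start += step
    return windows

# Models extended with a scenario run through 2047.
extended = trend_windows()
# Models with sufficient ensemble members only through 2014.
historical_only = trend_windows(last_end=2014)
```

With these settings the extended models contribute 22 windows (1900-1942 through 2005-2047), and the historical-only models contribute 15 windows ending with 1970-2012, consistent with the "1900 to 2012" range quoted above.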
The predictor data (i.e., the input used to estimate the targets) are maps of surface air temperature (SAT) and sea level pressure (SLP) trends. While SAT trend patterns can capture signals related to both internally and externally generated trends (Po-Chedley et al., 2022), SLP trend patterns provide information on the dynamically induced variability, especially over the high latitudes (Deser et al., 2016; Wallace et al., 2012), thus complementing the information provided by SAT trend patterns. Our ML pipeline is thus designed to accept 43-year trend maps of both SAT and SLP and return the components of the trend averaged over the Arctic or globe due to internal variability and external forcing. All maps of SAT and SLP trends are regridded to a common 2.5° × 2.5° grid.
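One way to build such predictor maps is sketched below. The text does not state how the trends are estimated or how the two variables are arranged, so two assumptions are made here: the trend is an ordinary least-squares slope at each grid cell, and SAT and SLP are stacked as two channels. The function names are hypothetical.

```python
import numpy as np

def trend_map(annual_fields, years):
    """Least-squares linear trend at every grid cell, in units per decade.

    annual_fields has shape (time, lat, lon); years has shape (time,).
    """
    data = np.asarray(annual_fields, dtype=float)
    years = np.asarray(years, dtype=float)
    t = years - years.mean()
    anomalies = data - data.mean(axis=0)
    slope_per_year = np.tensordot(t, anomalies, axes=(0, 0)) / np.sum(t ** 2)
    return 10.0 * slope_per_year

def stack_predictors(sat_trend, slp_trend):
    """Stack SAT and SLP trend maps as channels (assumed layout: lat, lon, channel)."""
    return np.stack([sat_trend, slp_trend], axis=-1)

# Synthetic example: a uniform warming of 0.02 K/yr on a tiny 3 x 4 grid.
years = np.arange(1980, 2023)
sat = 0.02 * (years - 1980)[:, None, None] * np.ones((1, 3, 4))
slp = np.zeros((years.size, 3, 4))
predictors = stack_predictors(trend_map(sat, years), trend_map(slp, years))
```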
The ML algorithms are trained for the predictions of the Arctic and global cases separately. For the global case, input data are global trend maps of SLP and SAT. For the Arctic case, we only use trend maps poleward of 20°N (Smoliak et al., 2015; Wallace et al., 2012). Patterns of surface temperature changes can impact both regional and global scale warming and can provide information about the relative role of internal variability (Dong et al., 2019, 2020). Outside the tropics, SLP can be used as a proxy for the atmospheric circulation (Deser et al., 2014; Smoliak et al., 2010) and has been used to isolate dynamically induced changes in surface temperature in the northern hemisphere (Guan et al., 2015; Wallace et al., 2012). Further, using more than one geophysical variable may help in identifying signals of external forcing (Rader et al., 2022).
We use convolutional neural networks (CNNs) that are trained separately for the Arctic and global-mean temperature trends (see Text S1 in Supporting Information S1). We validate the skill of the CNN using leave-one-out cross validation, where the CNN is trained on data from all models except the model we test on (which is one of the seven models covering 1900-2047) (see Text S1 in Supporting Information S1). This prevents the CNN from learning model-specific biases. When applying the CNN to the out-of-sample large ensemble, we also apply it to observed SAT and SLP trend patterns to derive the externally and internally generated trends. SAT trends are those of the four observational data sets from 1980 to 2022. SLP trends are from the ERA5, MERRA-2, and JRA-55 reanalysis data sets over the same time period (see Figure S2 in Supporting Information S1 for a comparison of SLP trends between the reanalyses used and the 20th century reanalysis for 1980-2015, showing good agreement). Because we have four SAT data sets and three SLP data sets, in total we have 12 sets of SAT and SLP trend maps. For each of the seven models that we test on during the cross validation, we get estimates of internally generated and externally forced trends from each of the 12 observational SAT and SLP sets, providing 84 estimates of the internally generated and externally forced trends. The central value is then the mean over all 84 observational predictions, and the uncertainty is quantified by taking into account both observational and ML prediction uncertainties (Text S2 in Supporting Information S1).
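The bookkeeping behind the cross validation and the 84 observational estimates can be sketched as below. The model and SAT data set labels are placeholders, and the total of 11 large ensembles (seven extended plus four historical-only) is our reading of the text rather than a stated figure. Only the three SLP reanalysis names come from the text.

```python
from itertools import product

def leave_one_out_splits(all_models, testable_models):
    """Yield (held_out, training_models) pairs, one per model that can be tested."""
    for held_out in testable_models:
        training = [m for m in all_models if m != held_out]
        yield held_out, training

# Placeholder labels: 11 large ensembles in total (assumed), 7 of them extend to 2047.
all_models = [f"model_{i:02d}" for i in range(1, 12)]
testable_models = all_models[:7]

sat_datasets = [f"sat_obs_{i}" for i in range(1, 5)]  # four SAT records (placeholders)
slp_datasets = ["ERA5", "MERRA-2", "JRA-55"]          # three SLP reanalyses

splits = list(leave_one_out_splits(all_models, testable_models))
obs_pairs = list(product(sat_datasets, slp_datasets))
n_estimates = len(splits) * len(obs_pairs)
```

Each held-out model yields one trained CNN, and each CNN is applied to all 12 SAT-SLP pairs, giving 7 × 12 = 84 observational estimates.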

Arctic Amplification in Observations and CMIP6
Figure 1a shows the patterns of local amplification over the northern hemisphere high latitudes from the observational mean and multi-model mean (MMM). Observations show maximum amplification poleward of 70°N and that large extents of the Arctic Ocean have warmed at least four times as quickly as the global mean. Local amplification ratios exceed six in the Barents Sea, consistent with strong reductions in sea ice concentration in the same region (Isaksen et al., 2022; Parkinson, 2022; Screen & Simmonds, 2010). Although the MMM exhibits a similar pattern of local amplification, it substantially underestimates the magnitude as compared to observations (Rantanen et al., 2022; Ye & Messori, 2021).
Figure 1b shows the AA from the four observational data sets, the seven large ensembles, and OthersAllEM (see Section 2) for 1980-2022. While the forced component of AA ranges from 2.13 to 3.58 across models, individual ensemble members span a much larger range (Figure S3 in Supporting Information S1 shows each large ensemble's AA distribution individually). Since the forcing is the same for all members of each model large ensemble, the deviations from the forced AA for a given ensemble member are entirely due to internal variability (OthersAllEM is an exception for which the deviations could also be partly due to differences in forced trends). While the magnitude of AA varies across the observations, all show extreme AA compared to the distribution of model simulations (Figure 1b). All observational AA estimates sit outside the range of forced AA predicted by the large […] data sets, with a mean of 0.791 K/decade, which are all well within the range of Arctic warming predicted by models (Figure 2a). Thus, the observed Arctic warming is not as extreme as AA when compared to model simulations.
The global warming trend from the four observational data sets ranges from 0.183 to 0.193 K/decade with a mean of 0.189 K/decade, which is on the lower side of the simulated range of externally forced trends (Figure 2b). However, some models have ensemble members that simulate global warming trends below what is observed, suggesting that internal variability may damp the rate of global warming (Kosaka & Xie, 2013; Watanabe et al., 2014; Wu et al., 2019; Xie & Kosaka, 2017; Zhang, 2016). Next, we attempt to partition the observed Arctic and global warming trends into their externally and internally generated components.

Separating Internally Generated and Externally Forced Trends in Observations
The tests of the CNN algorithms on each of the seven models for 1980-2022 are shown as scatter points in Figure 3. They suggest that when presented with a set of SAT and SLP trend maps from a model ensemble not used during training, the CNN can reliably separate the internal and external contributions to the trends averaged over the Arctic and globe. This is despite the wide range of internally generated and externally forced trends simulated by models (see Figure 2). The skill of the CNN results from its ability to learn the patterns (in SAT and SLP) that correspond with the internally generated and externally forced trends in both the Arctic and global domains. The CNN also generalizes well to simulations with forced trends far from the MMM (e.g., red dots showing results for CanESM5 in Figures 3b and 3e). Although the CNN predicts the internal and external trends separately, their sum accurately reproduces the total trend (see Figures 3c and 3f). This conservation of the total trend is not explicitly targeted during training but arises from learning this closure in the training data.
Having shown that the CNNs can reliably predict the internal and external trends in models, we apply the CNNs to observations from 1980 to 2022 using the four SAT data sets and three SLP data sets. The mean results for each […] 3a). The CNN predicts that the externally generated Arctic surface temperature trend is 0.619 K/decade. This suggests that internal variability has accelerated the pace of Arctic warming by ∼23% relative to the forced trend. Using all ensembles from the seven models for all 43-year periods separated by 5-year increments over 1900-2047/2012, the 2σ spread of Arctic internal variability is ±0.324 K/decade. Many studies have shown that surface temperature trends in the Arctic are strongly coupled to sea ice trends, and that recent declines in sea ice cover have been enhanced by multidecadal variability (Deng & Dai, 2022; Ding et al., 2019; Kay et al., 2011; Screen & Simmonds, 2010; Serreze et al., 2009). Our results agree with previous studies showing that internal variability is an important contribution to recent trends in Arctic climate change (Bonan & Blanchard-Wrigglesworth, 2020; Chylek et al., 2022; Ding et al., 2019).
Application of the CNN to the global case suggests that internal variability dampens the observed temperature trend, which is also consistent with previous studies (Kosaka & Xie, 2013; Po-Chedley et al., 2022; Xie & Kosaka, 2017). All observational estimates show that internal variability reduces global surface warming over 1980-2022, with a central estimate of −0.024 K/decade (Figure 3d). The global CNN predicts the externally generated trend to be 0.207 K/decade. This suggests that internal variability has damped global warming by ∼12% relative to the forced trend since 1980. Although this internal contribution is substantial, it lies within the 2σ spread of internal variability from all large ensembles over the 1900-2047/2012 period, which is ±0.051 K/decade.
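The percentages quoted for the Arctic and global cases follow from the trend values reported in the text (internal contributions of +0.145 and −0.024 K/decade, forced trends of 0.619 and 0.207 K/decade). The short check below is a sketch of that arithmetic; the function name is hypothetical.

```python
def relative_contribution(internal_trend, forced_trend):
    """Internal contribution expressed as a fraction of the forced trend."""
    return internal_trend / forced_trend

# Values in K/decade, as quoted in the text.
arctic_fraction = relative_contribution(0.145, 0.619)
global_fraction = relative_contribution(-0.024, 0.207)

# Compare each internal contribution with the quoted 2-sigma model spread.
arctic_within_spread = abs(0.145) <= 0.324
global_within_spread = abs(-0.024) <= 0.051
```

Both internal contributions fall inside the 2σ spread simulated by the large ensembles, so neither is unusual on its own.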

Implications for Arctic Amplification and Discussions
Internal variability can impact AA through its effect on Arctic warming, global warming, or both. The ML algorithms applied here can partition the contributions of externally forced and internally generated trends over both the Arctic and the globe. Application of these algorithms to observations suggests that internal variability has enhanced Arctic surface warming (+0.145 K/decade) while simultaneously dampening global mean surface warming (−0.024 K/decade) over 1980-2022 (Figures 3a and 3d). Because AA is the surface temperature trend in the Arctic divided by the global mean trend, the opposing roles of internal variability in the Arctic and global averages inflate observed AA. Figure 4 is the same as Figure 1b but with AA estimates after we first subtract the contribution of internal variability derived from the ML algorithms from both the Arctic and global mean trends and then recalculate their ratio. This was done for both observations and each ensemble member of the seven large ensembles (Figure S6 in Supporting Information S1 shows each large ensemble's distribution of AA after removing internal variability individually). Upon removing the estimated effect of internal variability from the Arctic and global mean surface air temperature trends, AA from climate model simulations and observational data sets exhibits excellent agreement (cf. Figures 4 and 1b).
After subtracting the internally generated trend from the mean observational trend over the Arctic (0.791 K/decade) and globe (0.189 K/decade), the externally generated trend is estimated as 0.646 K/decade and 0.213 K/decade, respectively, meaning that the externally forced AA is 3.03. A similar result is obtained by using the externally forced Arctic and global warming trends directly estimated by the CNN, which are 0.619 K/decade and 0.207 K/decade (Figures 3b and 3e), and the resulting externally forced AA is 2.99. Our results suggest that internal variability plays a substantial role in inflating recent AA and increased the 1980-2022 AA by 38%. Key to this result is recognizing that internal variability has enhanced Arctic warming while simultaneously damping global warming. The vertical blue dashed lines in Figure 4 show the 2σ spread of externally forced AA based on observations (see Text S2 in Supporting Information S1). Figure 4 also shows that the estimate of observed, externally forced AA is still within the range of forced AA based on model simulations even when this uncertainty is included. Although here we present results using a definition of the Arctic as poleward of 70°N, repeating the analysis by defining the Arctic as poleward of 60°N produces similar results (see Figure S7 in Supporting Information S1). We also compare the results of the CNNs used in this study (Text S1 in Supporting Information S1) to Partial Least Squares (PLS) regression, a linear pattern matching algorithm that has been shown to have skill in a similar context (Po-Chedley et al., 2022). We chose to use CNNs because they better minimize the RMSE (Figure S8 in Supporting Information S1), but results are similar using either CNNs or PLS methods (see Figures S9 and S10 in Supporting Information S1). The mean AA ratio after removing internal variability contributions to observed trends based on PLS regression and the CNN is 2.98 and 3.03, respectively.
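The forced AA values and the 38% inflation can be reproduced from the numbers quoted in the text. The sketch below is only a worked check of that arithmetic, and the variable names are hypothetical.

```python
# Mean observed trends and CNN-estimated internal contributions (K/decade), from the text.
arctic_total, global_total = 0.791, 0.189
arctic_internal, global_internal = 0.145, -0.024

# Residual method: remove the internal contribution, then recompute the ratio.
arctic_forced = arctic_total - arctic_internal
global_forced = global_total - global_internal
forced_aa = arctic_forced / global_forced

# Direct method: ratio of the forced trends predicted by the CNN.
direct_forced_aa = 0.619 / 0.207

# Inflation of observed AA relative to the forced AA.
observed_aa = arctic_total / global_total
inflation = observed_aa / forced_aa - 1.0
```

The observed ratio is about 4.19 and the forced ratio about 3.03, so the increase is roughly 38%. The inflation is larger than the 23% Arctic enhancement alone because the damped global trend shrinks the denominator at the same time.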
Although we stress internal variability's role in inflating recent AA, these results do not discount the possible influence of forcing on the simulated-versus-observed differences in AA. Systematic biases in the forcing prescription can have a significant impact on simulated AA during 1980-2022. For example, changes in the amount of biomass burning prescribed in CESM2_SBMB compared to CESM2 lead to decreased surface warming in the Northern Hemisphere high latitudes and thus a smaller AA ratio in CESM2_SBMB (Figure 1b). Because AA is defined as the ratio of the total Arctic and global warming, a forcing bias in either region will impact the magnitude of AA even if internally generated trends match observations. Given that the externally forced and internally generated trends estimated from observations are within the bounds of the simulated externally forced and internally generated trends in the large ensembles, a pertinent question is why more models do not simulate the observed levels of AA.
Crucial to reproducing the observed AA is simulating internal variability that enhances Arctic warming while simultaneously dampening global warming. The fact that model simulations generally do not reproduce the observed levels of AA may suggest that while models during the 1980-2022 period can simulate the observed amplitude of internal variability in the Arctic and over the globe separately, they struggle to simulate the combined manifestation of internal variability that enhances Arctic warming while suppressing global warming (Rantanen et al., 2022; Rosenblum & Eisenman, 2017). While prior research links multidecadal Arctic and Pacific variability (Bonan & Blanchard-Wrigglesworth, 2020; Screen & Deser, 2019), the causal link between internal variability in these regions and its portrayal in climate models remains an ongoing research area (Baxter et al., 2019). Our results show that considering internal variability can reconcile the discrepancy between observed and simulated AA but also call for a better understanding of this unusual manifestation of internal variability.

Figure 2. Surface air temperature trends over 1980-2022 for the (a) Arctic and (b) global mean. The histograms show the distributions from model simulations, and the vertical lines represent the observations where gray shading shows their range. The black curve shows a normal distribution fitted to all the simulated temperature trends. The horizontal black line shows the range of externally forced trends with ticks showing individual models' forced trends. The trend values from individual observational data sets and forced trend values from individual models are provided in the legend.

Figure 1. (a) Local amplification (i.e., local surface air temperature trend divided by global mean temperature trend) over the northern high latitudes from the average of observational data sets and the multi-model mean during 1980-2022. The Arctic is the region poleward of 70°N (black circle), and the corresponding Arctic Amplification (AA) (i.e., the Arctic mean temperature trend divided by global mean trend) is provided at the top of each plot. (b) Comparisons of AA in observations and CMIP6 models. Observations are shown using vertical lines, and gray shading shows their range. Histograms show the relative frequency distribution of AA over 1980-2022 for each model, which is normalized by its number of ensemble members. The black curve shows a normal distribution fitted to all model AA values. The black horizontal line shows the range of forced AAs and the vertical tick marks represent the ensemble-mean AA for each model. The values of AA from each observation and forced AA from each model are provided in the legend.

Figure 3. (a)-(c) Arctic and (d)-(f) global surface air temperature trends predicted from the CNN (x-axis) versus corresponding actual trends (y-axis) over 1980-2022. The root mean squared error (RMSE) and correlation coefficient (r) are shown at the top of each plot. Panels (a) and (d) show results for internally generated trends, (b) and (e) show the externally forced trends, and (c) and (f) show the sum of the internally generated and externally forced trends. The vertical lines show the mean observational estimate for each temperature record, and the gray shading shows the ±2σ uncertainty of the mean prediction. The mean (x) and its standard deviation (σ) based on observations are provided in the bottom right of each plot. The black diagonal line in (a)-(f) is the 1:1 line.

Figure 4. Same as Figure 1(b) but showing the AA estimates after subtracting the contributions of internal variability, as derived from the ML algorithms, from Arctic and global warming. This was done for both observations and each ensemble member of the seven large ensembles. The vertical blue dashed lines show the 2σ range of the estimated forced AA based on observations.
This research was supported by the U.S. Department of Energy (DOE), Office of Science, Office of Biological and Environmental Research, Regional and Global Model Analysis (RGMA) program area, as part of the HiLAT-RASM project. This research was also supported by the NASA FINESST Grant 80NSSC22K1438 and NSF Grant AGS-2202812. Additional funding was provided by the Calvin Professorship in Atmospheric Sciences. S.P.-C was supported through the PCMDI Project, which is funded by the RGMA program area of the Office of Science at DOE. M. Wang is funded with support of the Arctic Research Program of the NOAA Global Ocean Monitoring and Observing (GOMO) office through the Cooperative Institute for Climate, Ocean, and Ecosystem Studies (CICOES) under NOAA Cooperative Agreement NA20OAR4320271, Contribution No 2023-1295, and Pacific Marine Environmental Laboratory Contribution No 5531. Research at Lawrence Livermore National Laboratory was performed under the auspices of U.S. DOE Contract DE-AC52-07NA27344. The Pacific Northwest National Laboratory (PNNL) is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RLO1830. We would like to acknowledge high-performance computing support from Cheyenne (https://doi.org/10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation, for the analyses presented in this study and for data management, storage, and preservation.