On the Estimation of Internal Climate Variability During the Preindustrial Past Millennium

We use an ensemble of simulations of a coupled model (NCAR Community Earth System Model) driven by natural radiative forcing estimates over the pre‐industrial past millennium to test the efficacy of methods designed to remove forced variability from proxy‐based climate reconstructions and estimate residual internal variability (e.g., a putative “Atlantic Multidecadal Oscillation”). Within the framework of these experiments, the forced component of surface temperature change can be estimated accurately from the ensemble mean, and the internal variability of each of the independent realizations can be accurately assessed by subtracting off that estimate. We show in this case, where the true internal variability is known, that regression‐based methods of removing the forced component from proxy reconstructions will, in the presence of uncertainties in the underlying natural radiative forcing, fail to yield accurate estimates thereof, incorrectly attributing unresolved forced features (and multidecadal spectral peaks associated with them) to internal variability.

Other studies, however, have argued for a weaker and more limited role for natural radiative forcing of AMO-like variability in past centuries.Knudsen et al. (2014) argued that radiative forcing plays a dominant role only after 1800 CE and an "ambiguous" role prior (AD 1400-1800) to that time.Wang et al. (2017) argued for a more consistent but still minor role, attributing only 30% of the AMO variance to the combined impact of solar and volcanic forcing and arguing for a multidecadal internal oscillation in the residual series after the estimated forced component is removed.
These latter studies used a regression of proxy-reconstructed AMO indices against volcanic and solar radiative forcing time series in an attempt to estimate and remove the forced signal (Knudsen et al., 2014;Wang et al., 2017).Such an approach, however, cannot account for the full structure in the temporal response to forcing, which consists, for example, of a decadal-timescale exponential recovery following impulsive volcanic forcing that is not linearly proportional to the raw forcing series.
It is nonetheless possible, at least in principle, to remove the contribution of forced variability from a time series containing both forced and unforced components, using a direct estimate of the response to the underlying forcing.That estimate can be based on an energy-balance model driven by estimated radiative forcings (Mann et al., 2014) or by averaging across an ensemble of forced coupled model simulations (Frankcombe et al., 2015(Frankcombe et al., , 2018;;Mann et al., 2014;Steinman et al., 2015).Steinman et al. (2015) used the latter approach in conjunction with the CMIP5 historical multimodel ensemble to estimate forced components of surface temperature change from the multimodel means over the Northern Hemisphere, North Pacific, and North Atlantic regions.They regressed these averages against the corresponding instrumental surface temperatures, allowing for a scaling factor different from unity to account for possible differences between real-world and modelled climate sensitivity.The re-scaled mean serves as an estimate of the forced component of observed surface temperature change.Estimates of internal variability, including an "AMO" index for the North Atlantic region, were determined by subtracting that estimate from the raw surface temperatures.
The Steinman et al. (2015) "regression and removal" method appears to perform well in tests with model simulations (Frankcombe et al., 2015(Frankcombe et al., , 2018) ) during the modern instrumental period where forcings are well known and observational data (i.e., historical temperature measurements) are relatively reliable.The method may not perform well, however, during the pre-historic era, where forcings are far more uncertain and where proxy data may yield a biased estimate of climate variability (e.g., Schurer et al., 2013).Moreover, the simple regression-based rescaling procedure used by Steinman et al. (2015) to account for the potential mismatch between modeled and real-world climate sensitivity during the modern era cannot account for errors in estimates of the relative magnitudes of different (e.g., solar and volcanic) radiative forcings or for example the relative magnitudes of different volcanic eruptions.When model sensitivity, forcing estimates, and surface temperatures themselves are all uncertain, the true forced variability is very difficult to estimate and is likely, in such procedures, to leak into the residual estimate of internal variability.
In this study, we assess the performance of the "regression and removal" procedure (as applied in the past to e.g., proxy reconstructions of AMO-like variability) using an ensemble of simulations of a coupled model (the National Center for Atmospheric Research [NCAR] Community Earth System Model [CESM]) driven by natural radiative forcing estimates over the pre-industrial past millennium, where forced and internal variability components are both known and the efficacy of the estimation procedures in question can be rigorously tested.
While many past studies have focused specifically on the putative AMO and North Atlantic sea surface temperatures, our analysis applies more generally to any efforts to empirically separate forced and internal components of climate variability.We focus on the simplest possible variable: global mean temperature, recognizing that the findings have greater generality.If the "regression and removal" approach fails here, it will necessarily fail at regional scales (e.g., the AMO).

Data and Methods
We make use of the CESM Last Millennium Ensemble (CESM-LME), which is a true ensemble, that is, a set of simulations using the same model and identical forcings, differing only in the realization of internal variability.Such an ensemble differs from pseudo-ensembles, such as the CMIP5 multimodel archive, wherein both model physics and forcings vary from one simulation to the next, complicating the interpretation of differences among simulations.Past work (Frankcombe et al., 2018) demonstrates that an ensemble with as few as 10 members is more than adequate to obtain an accurate estimate of ensemble mean and spread.
The CESM-LME includes 13 simulations of CESM1.1-CAM5(Otto-Bliesner et al., 2016) driven by the "GRA" volcanic and "VSK" solar forcing (as well as other minor forcings such as orbital changes and pre-industrial land use changes) from the CMIP5 Last Millennium experimental protocol (Schmidt et al., 2011).The average over the ensemble yields an accurate estimate, within this synthetic framework, of the forced-only component, since that component is identical in each of the 13 simulations, while the internal variability, which is independent among realizations, largely cancels in the average (there is a small residual contribution due to the finite size of the ensemble).
The resulting global average series can be compared (Figure 1) to two alternative estimates of forced-only global mean surface temperature from a simple zero-dimensional energy balance model (EBM).The EBM estimates the balance between incoming shortwave and outgoing longwave radiative forcing, the former of which is prescribed to vary as indicated by solar and volcanic forcing series.The model consists of a single tunable parameter, the equilibrium climate sensitivity, which is chosen to have a mid-range value of 3.0° C for CO 2 doubling (see Mann et al., 2014, for details).The EBM was driven by two different combinations of volcanic/solar forcing: the GRA/ VSK forcing combination used in the CESM ensemble described above and the alternative CEA/SBF series combination from the CMIP5 Last Millennium protocol (Schmidt et al., 2011).Notably, the secondary long-term forcings (land-use change and Earth orbital impacts) included in the CESM simulations introduce a small (∼−0.1 C) millennial-scale cooling trend that is not accounted for in the forced EBM experiments.This has no influence on the multidecadal timescales of interest in this study.
To simulate the impact of using an imperfect model estimate of forced response in the "regression and removal" procedure, we introduce an inconsistency that is representative of the errors that likely exist between the real-world and estimated radiative forcing changes over the past millennium.Specifically, we employ two plausible but different forcing histories in different components of the procedure, using as our model estimate of the forced response the EBM-simulated global mean temperature series (Figure 1) based on the CEA/SBF forcing combination, while employing as our surrogate proxy-reconstructed temperature series each of the 13 realizations of the CESM ensemble, which, as discussed earlier, were driven by the alternative GRA/VSK forcing combination.
For each of the CESM ensemble members, we can obtain a "true" estimate of the internal variability as the residual series that results from subtracting off the "true" forced component (the ensemble mean series).We can then test the "regression and removal" method for estimating the internal variability component by comparing the  Last Millennium Ensemble (1000-1835 CE), smoothed to emphasize multidecadal (>40 years) timescales (for purpose of display, the first 9 of the 13 total realizations are displayed).Shown are the global mean temperature series for each realization (red), the "pseudo" internal variability (thick green), defined as the difference between the global mean temperature series for the realization (red curve) and the (imperfect) estimate of the "forced" component as diagnosed by the "regression and removal" method, and finally the true internal variability (thick black) as described in the article.The CESM ensemble mean global mean series (blue), representing the forced variability, is shown in each panel for comparison.
resulting "pseudo internal variability" series against the "true" internal variability series.We use the multi-taper spectral analysis approach with robust estimation of the background red noise spectrum as described by Mann & Lees (1996) to examine the frequency domain characteristics of those series (see Supporting Information S1 for further details).

Results
We find that the "regression and removal" approach indeed fails to faithfully reproduce the actual internal variability component where it is known.In Figure 2, we show the results for the first nine realizations (the remaining four are shown in the Figure S1 in Supporting Information S1).We see the pseudo internal variability series are inflated in amplitude relative to the true internal variability series, and by contrast with the latter, which are uncorrelated from one realization to the next, they show much of the same structure in each realization-residual forced variability that the method failed to remove.Note, for example, the large amplitude cooling in each case associated with the very large 1258 CE eruption.
An examination of the corresponding power spectra is especially instructive (Figure 3; Figure S2 in Supporting Information S1, Table 1).The spectra of the "true" internal variability series show statistically significant peaks in the interannual (3-10 year period) El Nino/Southern Oscillation range but none in the multidecadal  (30-100 year period; 0.01 < f < 0.033 cycle/year) range, consistent with the recent findings of Mann et al. (2020Mann et al. ( , 2021)).By contrast, the pseudo internal variability series show considerably redder spectra, with 11 of the 13 pseudo internal variability series displaying at least one spectral peak in the multidecadal range that is significant at the p = 0.1 level (average of 1.7 spectral peaks per realization-see Table 1), and 7 of the 13 significant at the p = 0.05 level (average of 0.7 spectral peaks per realization).

Discussion and Conclusions
An analysis using the CESM Last Millennium ensemble and EBM-simulated global temperature series as a test bed reveals that (a) rigorous estimates of internal variability derived from series containing both externally-forced and internal variability do not show evidence of multidecadal internal oscillations, while (b) the "regression and removal" method of estimating internal variability is seen to produce spurious apparent multidecadal peaks that are an artifact of incompletely removed (primarily volcanic) externally forced variability.
Approaches that seek to estimate internal variability by removing a model-based estimate of forced variability from reconstructed surface temperatures are thus shown to produce spurious apparent multidecadal peaks that are an artifact of incompletely removed externally forced variability in a case in which we know a priori there are no significant multidecadal spectral peaks for the true internal variability series.This is not to say that there are not meaningful ways forward toward a better understanding of the relative role of forced and internal variability using climate model simulations and paleoclimate proxy data.Using the MTM-SVD multivariate signal detection approach, Mann et al. (2020Mann et al. ( , 2021) ) have exploited the simultaneous availability of both control and forced last millennium simulations of the same models to demonstrate the apparent dominant role of natural radiative forcing-and volcanic forcing in particular-in apparent multidecadal oscillatory AMO-like behavior in past centuries.A similar comparison with the results of a parallel analysis of actual real-world proxy data, making use of the greatly expanded paleoclimate proxy data now available, could provide further insights.
A possible extension of the current study would involve multivariate (e.g., MTM-SVD) analyses of the CESM-LME surface temperature fields both with and without the (known) forced component of variability removed.The Ensemble Mean 0 0.7

Figure 1 .
Figure 1.Comparison of simulated global mean temperature series over the common timeframe 1000-1835 CE.Shown are EBM-simulated global temperatures (black and red) as in Mann et al. (2021) using the two different combinations of volcanic (GRA vs. CEA) and solar (VSK vs. SBF) forcings used in the CMIP5 Last Millennium protocol and the CESM Last Millennium Ensemble mean global average series (blue).The series have been smoothed to emphasize >15-year timescales.Smoothing here and elsewhere is based on lowpass filtering with optimal boundary constraints as described byMann (2008).The EBMs have been rescaled to have the same variance as the CESM multi-model mean for the purpose of comparison since the regression-based analyses in this article are insensitive to rescaling by a constant.

Figure 2 .
Figure2.Time series for realizations of the CESM Last Millennium Ensemble (1000-1835 CE), smoothed to emphasize multidecadal (>40 years) timescales (for purpose of display, the first 9 of the 13 total realizations are displayed).Shown are the global mean temperature series for each realization (red), the "pseudo" internal variability (thick green), defined as the difference between the global mean temperature series for the realization (red curve) and the (imperfect) estimate of the "forced" component as diagnosed by the "regression and removal" method, and finally the true internal variability (thick black) as described in the article.The CESM ensemble mean global mean series (blue), representing the forced variability, is shown in each panel for comparison.

Figure 3 .
Figure 3. Multi-taper Spectra based on K = 3 data tapers (see Supporting Information S1 for details).Results are shown for the first nine realizations of the CESM Last Millennium Ensemble as shown in Figure 2. Shown are spectra for both the "true" internal variability series (black) and the "pseudo" internal variability series (green) as discussed in the text.Median and p = 0.1 and p = 0.05 statistical significance levels relative to a red noise null hypothesis (bottom, middle, and top red lines, respectively) are shown by the three smoothed red curves.Note that multidecadal (>30 years) timescales correspond to frequencies f < 0.033 cycle/year.