Improving the forecast quality of near-term climate projections by constraining internal variability based on decadal predictions and observations

Projections of near-term climate change in the next few decades are subject to substantial uncertainty from internal climate variability. Approaches to reduce this uncertainty by constraining the phasing of climate variability based on large ensembles of climate simulations have recently been developed. These approaches select those ensemble members that are in closer agreement with sea surface temperature patterns from either observations or initialized decadal predictions. Previous studies demonstrated the benefits of these constraints for projections up to 20 years into the future, but these studies applied the constraints to different ensembles of climate simulations, which prevents a consistent comparison of methods or identification of specific advantages of one method over another. Here we apply several methods to constrain internal variability phases, using either observations or decadal predictions as constraining reference, to an identical multi-model ensemble consisting of 311 simulations from 37 models from the Coupled Model Intercomparison Project phase 6 (CMIP6), and compare their forecast qualities. We show that constraining based on both observations and decadal predictions significantly enhances the skill of 10- and 20-year projections of near-surface temperatures in some regions, and that constraining based on decadal predictions leads to the largest added value in terms of probabilistic skill. We further explore the sensitivity to different implementations of the constraint that focus on the patterns of either internal variability alone or a combination of internal variability and long-term changes in response to forcing. Looking into the near-term future, all variations of the constraints suggest an accelerated warming of large parts of the Northern Hemisphere for the period 2020–2039, in comparison to the unconstrained CMIP6 ensemble.


Introduction
Climate change is undoubtedly one of the most pressing issues of our time, and every region of the world will experience the impacts of a warming climate in the coming decades (IPCC 2021). Besides implementing efficient mitigation measures, it is therefore important to understand and anticipate how exactly the climate is going to change in order to prepare and adapt. The climate in the next 10-20 years, which we refer to as near-term climate change in this study, is of particular interest: most adaptation actions (whether designed for shorter- or longer-term climate change) will be exposed to these climate conditions, and this is a relevant time scale for strategic decision making on climate adaptation (Attoh et al 2022).
Climate model simulations are widely used tools to predict or project future climate conditions. Climate projections simulate the climate response to prescribed forcing (e.g. from greenhouse gases or aerosols), following a selection of scenarios of future socio-economic, political and technological developments (O'Neill et al 2016). While the future emission pathway, and therefore the climate forcing, is the dominant source of uncertainty in long-term projections at the global scale (e.g. to the end of the 21st century), near-term regional climate changes are strongly affected by uncertainty related to internal climate variability (Hawkins and Sutton 2009, Lehner et al 2020).
Initialised interannual to decadal climate predictions aim to reduce this uncertainty from climate variability by initialising the climate model simulations with an estimate of the observed climate state. This initialisation is intended to align the phasing of simulated and observed modes of climate variability, and it also corrects errors in the model response to forcing by bringing the model state closer to the observed climate trajectory (Doblas-Reyes et al 2013, Meehl et al 2021). Such initialised climate predictions are nowadays routinely produced for interannual to decadal time scales (Kushnir et al 2019, Hermanson et al 2022), but they do not typically extend beyond ten forecast years. The main reason is the high computational cost of producing retrospective prediction ensembles with annual initialisation, which are needed to correct the inherent forecast drifts (a recent exception, providing retrospective predictions for 20 years after initialisation, is Düsterhus and Brune 2023).
Several methods have recently been developed that aim to carry some of the observed climate state or initialisation information beyond ten forecast years by constraining aspects of internal climate variability in large ensembles of transient climate projections, e.g. based on temporal or spatial analogues of ocean temperature variability (Befort et al 2020, Mahmood et al 2021, 2022). Other methods derive seasonal to decadal climate predictions by constraining large climate model ensembles based on model analogues that represent the state of climate variability (Ding et al 2018, Menary et al 2021, Rader and Barnes 2023). All these methods make use of existing climate simulations to derive the climate predictions, and therefore avoid the substantial additional computational cost of running initialised climate simulations beyond the decadal prediction horizon. These methods select those transient historical simulations or climate projections that are in closest agreement with initialised decadal predictions (Befort et al 2020, Mahmood et al 2021) or with the observed climate state (Hegerl et al 2021, Mahmood et al 2022, De Luca et al 2023) at the start of the constrained projections. In particular, the observations-based constraint, when applied to a large multi-model ensemble from the Coupled Model Intercomparison Project phase 6 [CMIP6 (Eyring et al 2016)], provided added skill over the large ensemble of unconstrained CMIP6 projections for 20 year predictions, and skill comparable to state-of-the-art decadal prediction systems for 10 year predictions (Mahmood et al 2022).
So far, these methods have all been applied to different climate model ensembles, usually those available at the time each constraining method was developed. Because the underlying data differ, it is difficult to make a meaningful comparison of the methods and of their specific advantages and disadvantages.
In this study we therefore compare, in a consistent way, methods that constrain decadal-scale variability in climate projections based on the agreement of spatial patterns of ocean temperature variability with decadal predictions (Mahmood et al 2021) or observations (Mahmood et al 2022), by applying them to an identical ensemble of CMIP6 simulations. This comparison allows us to identify the potential benefits of using initialised climate predictions in the constraint, versus simply using observations prior to the start of the projection, which then act as a form of 'initialisation' of the near-term projection. We further explore specific characteristics of these constraining methods, e.g. details of how exactly the similarity of the projections to the decadal prediction or observational reference is measured, which exploit internal variability or warming trend information in the constraint to different degrees.

Data and methods
We use transient historical climate simulations and climate projections from CMIP6, the latest phase of CMIP. A total of 311 members from 37 CMIP6 models were used (see supplementary table S1), comprising all simulations that provided the required data at the time of analysis. The historical simulations are forced by observed natural and anthropogenic forcings up to 2014, and the projections use forcing estimates based on shared socioeconomic pathway (SSP) scenarios from 2015 onwards. We use data from the SSP2-4.5 scenario from 2015 onwards (and up to 2039 for the near-term future projections), the same scenario that is used in the decadal predictions (Boer et al 2016). Note, however, that the differences between scenarios are small during the first decades, until 2040. We use monthly sea surface temperature (SST) data for the constraint, and analyse the (retrospective and future) climate predictions using monthly near-surface air temperature data.
We also use initialised decadal prediction experiments from CMIP6's Decadal Climate Prediction Project (DCPP) Component A (Boer et al 2016). These (retrospective) decadal predictions are initialised with best estimates of the observed climate state in each year from 1960 onwards and are then run for ten years, using the same forcing data as the historical simulations until 2014 and forcings from the SSP2-4.5 scenario afterwards. We use a total of 93 ensemble members from nine decadal prediction systems that provide data for the DCPP hindcast experiments. Not all of these systems also provide decadal forecasts initialised in recent years. Therefore, for the predictions of the period 2020-2039 we can use only 65 ensemble members from six decadal prediction systems (highlighted in supplementary table S1). The hindcast skill of constrained projections based on this smaller subset of DCPP simulations is, however, very similar to the skill obtained using all systems available for hindcasts (compare supplementary figure S14 to figure 1 and supplementary figure S15 to figure 2). This similarity is expected, as only the DCPP ensemble-mean anomaly patterns are used for the constraints.
Gridded observed SST data from the Extended Reconstructed Sea Surface Temperature version 5 [ERSSTv5; (Huang et al 2017)] are used when constraining based on observations, as in Mahmood et al (2022). All SST data sets used for constraining projections, including both the model simulations and the observations, were regridded to a common uniform 3° × 3° grid. As the reference to evaluate the retrospective constrained and unconstrained projections, as well as the decadal predictions, we use the gridded observational surface temperature data set HadCRUT4.6 (Morice et al 2012).
The methodology used for constraining the projections is similar to that of Mahmood et al (2021, 2022): the members are ranked according to the agreement of their global SST anomaly patterns with either observations or the DCPP ensemble mean, measured by area-weighted pattern correlations. The highest-ranking members are then used to make predictions for the next 10 or 20 years. Predictions for much longer time scales could in principle be made in the same way, since the projections provide data until the end of the 21st century. Following Mahmood et al (2022), we select the 30 top-ranking members in each year (that is, the selected members can differ from year to year), according to their pattern correlation with the observed or DCPP-predicted time-averaged SST anomalies.
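The ranking and selection step can be illustrated with a minimal Python sketch. It assumes NumPy, anomaly maps on a regular latitude-longitude grid with NaN over land, and cos(latitude) area weights; the function names `uncentered_pattern_corr` and `select_top_members` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def uncentered_pattern_corr(a, b, lat_deg):
    """Area-weighted uncentered pattern correlation of two 2-D anomaly maps.

    a, b    : arrays of shape (nlat, nlon); NaN marks missing (e.g. land) cells
    lat_deg : latitudes of the grid rows in degrees, shape (nlat,)
    """
    # cos(latitude) weights (assumed area weighting), broadcast along longitude
    w = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, None], a.shape)
    valid = np.isfinite(a) & np.isfinite(b)
    w, a, b = w[valid], a[valid], b[valid]
    # Anomalies are used as they are: no global mean is removed
    return np.sum(w * a * b) / np.sqrt(np.sum(w * a * a) * np.sum(w * b * b))


def select_top_members(member_maps, reference_map, lat_deg, n_select=30):
    """Rank members by pattern correlation with the reference map.

    Returns the indices of the n_select best-matching members (best first)
    and the correlation score of every member.
    """
    scores = np.array(
        [uncentered_pattern_corr(m, reference_map, lat_deg) for m in member_maps]
    )
    order = np.argsort(scores)[::-1]  # descending correlation
    return order[:n_select], scores
```

In the setting described above, `reference_map` would be the time-averaged observed or DCPP ensemble-mean SST anomaly for one start year, and the selection would be repeated independently for every start year.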
Note that the time windows over which the SST patterns are averaged for the constraints differ between the two methods (using observations or decadal predictions), because each method can only make use of information that is available at the time a prediction is made. For example, predictions starting in 1961 (that is, predictions of the 10 year period 1961-1970 or the 20 year period 1961-1980) would, when constraining based on 5 year average SST patterns, use observed SST data from 1956-1960, but DCPP-predicted SSTs for 1961-1965 from the decadal predictions initialised in 1960. We systematically test SST averaging periods ranging from 1 to 9 years for the constraints. These different temporal averages focus the constraining criteria on different (interannual to decadal) temporal modes of climate variability. In the following, we use the term OBS-constrained for the projections constrained against observed SST anomaly patterns, and DCPP-constrained for the projections constrained against the ensemble mean of the DCPP decadal predictions.
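The information-availability rule in the 1961 example can be written as a small helper. This is a sketch: the function name and the string labels `"OBS"` and `"DCPP"` are hypothetical, and it assumes that the DCPP hindcast used for a forecast starting in year s is the one initialised in year s-1, as in the example.

```python
def constraint_window(start_year, n_avg, reference):
    """Return (first_year, last_year) of the SST averaging window.

    start_year : first year of the forecast period (e.g. 1961 for 1961-1970)
    n_avg      : length of the SST averaging window in years (1-9 in the study)
    reference  : "OBS" or "DCPP" (hypothetical labels)
    """
    if n_avg < 1:
        raise ValueError("n_avg must be at least 1")
    if reference == "OBS":
        # Only observations from before the forecast start are available
        return start_year - n_avg, start_year - 1
    if reference == "DCPP":
        # First n_avg forecast years of the hindcast initialised in start_year - 1
        return start_year, start_year + n_avg - 1
    raise ValueError("reference must be 'OBS' or 'DCPP'")
```

For example, `constraint_window(1961, 5, "OBS")` gives `(1956, 1960)` and `constraint_window(1961, 5, "DCPP")` gives `(1961, 1965)`, matching the windows described in the text.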
While previous studies (Mahmood et al 2021, 2022, De Luca et al 2023) calculated the pattern correlations using a so-called 'uncentered' approach (meaning that the actual SST anomaly patterns are used), in this study we additionally test constraints based on pattern correlations using a 'centered' approach (meaning that the global-mean SST anomaly is subtracted from the SST anomaly at each grid cell).
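The difference between the two correlation variants can be sketched in Python. The sketch assumes NumPy, cos(latitude) area weights, and that the subtracted global mean is area-weighted (the weighting is an assumption); `pattern_corr` is a hypothetical name. The synthetic example adds the same uniform offset, mimicking a shared warming signal, to two otherwise independent random maps.

```python
import numpy as np


def pattern_corr(a, b, lat_deg, centered=False):
    """Area-weighted pattern correlation of two 2-D anomaly maps.

    centered=False : anomalies used as they are ('uncentered')
    centered=True  : area-weighted global mean removed from each map first
    """
    w = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, None], a.shape)
    valid = np.isfinite(a) & np.isfinite(b)
    w, a, b = w[valid], a[valid], b[valid]
    if centered:
        a = a - np.average(a, weights=w)
        b = b - np.average(b, weights=w)
    return np.sum(w * a * b) / np.sqrt(np.sum(w * a * a) * np.sum(w * b * b))


# Two independent anomaly patterns that share a uniform +2 K offset
rng = np.random.default_rng(42)
lat = np.arange(-88.5, 90.0, 3.0)  # 3 degree grid: 60 rows
a = 2.0 + rng.standard_normal((lat.size, 120))
b = 2.0 + rng.standard_normal((lat.size, 120))
r_unc = pattern_corr(a, b, lat)                 # dominated by the shared offset
r_cen = pattern_corr(a, b, lat, centered=True)  # close to zero
```

The uncentered value is high only because both maps share the offset; the centered variant removes that common signal and leaves the (here absent) agreement in spatial structure.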
Among the many choices involved in the ensemble member selection, there is also some sensitivity to the number of top-ranking members selected based on their SST anomaly pattern correlations. A previous study found the sensitivity of the results to selecting 10, 30 or 50 members to be relatively small (Mahmood et al 2022). To keep the focus on comparing the different approaches in an identical setting, we therefore do not consider this sensitivity in the present study.
For analysing the constrained projections, we interpolate the data from all model ensemble members to a common grid with a resolution of 5° × 5° (Goddard et al 2013, Mahmood et al 2022). The evaluations are based on the annual constraints applied in all years from 1961 to 2000 (these years refer to the first year of the 10 year or 20 year forecast periods being evaluated). The skill of the constrained ensembles is evaluated using a range of metrics. We calculate the anomaly correlation coefficient (ACC) to test the phase agreement between the climate model ensemble means (e.g. OBS-constrained, DCPP-constrained, unconstrained, and DCPP) and the observational data (Goddard et al 2013). Since the ACC of temperature predictions is strongly affected by the forced signal, the added value of the constrained ensembles over the full CMIP6 ensemble is estimated based on residual correlations, computed after removing an estimate of the forced signal from the observations and from the constrained ensembles (Smith et al 2019). To this end, we use the ensemble mean of the full, unconstrained CMIP6 ensemble as the best estimate of the forced signal. In some cases the residuals could be highly correlated but represent only a small fraction of the predictable signal (e.g. when the skill related to the forced response is large). We therefore also calculate the ratio of predictable signals by dividing the predicted signal from initialisation by the total predicted signal, following Smith et al (2019), and show these as supplementary figures to complement the residual correlation results. The statistical significance of the ACC and residual correlation is assessed using a two-tailed t-test. To account for temporal autocorrelation, we calculate the effective degrees of freedom following Guemas et al (2014).
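A minimal sketch of the ACC and residual correlation for a single grid cell, in Python with NumPy. It assumes that the forced signal is removed by ordinary least-squares regression onto the unconstrained CMIP6 ensemble mean, which is our reading of the Smith et al (2019) approach rather than a confirmed detail of this study; the function names are hypothetical and significance testing is omitted.

```python
import numpy as np


def acc(obs, pred):
    """Anomaly correlation coefficient between two time series."""
    return np.corrcoef(obs, pred)[0, 1]


def remove_forced(x, forced):
    """Residual of x after linear regression onto the forced-signal estimate."""
    xa = x - x.mean()
    fa = forced - forced.mean()
    beta = np.dot(xa, fa) / np.dot(fa, fa)  # least-squares regression slope
    return xa - beta * fa


def residual_correlation(obs, pred, forced):
    """Correlation of obs and prediction after removing the forced signal from both.

    obs    : observed series, e.g. 10 year means for start years 1961-2000
    pred   : constrained-ensemble mean for the same periods
    forced : unconstrained CMIP6 ensemble mean (forced-signal estimate)
    """
    return acc(remove_forced(obs, forced), remove_forced(pred, forced))
```

A constrained ensemble that shares the forced trend with the observations but also captures some of the residual variability obtains a positive residual correlation, whereas a prediction equal to the forced signal itself would have no residual left to correlate.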
We further calculate the ranked probability skill score [RPSS; (Wilks 2011)] to evaluate the skill of probabilistic predictions from the constrained ensembles, using probabilities from the unconstrained ensemble as a benchmark. The RPSS is calculated for three categories: above-average conditions (temperatures above the upper tercile), average conditions (temperatures between the lower and upper tercile thresholds), and below-average conditions (temperatures below the lower tercile). To assess the significance of the RPSS, a random walk test is applied following DelSole and Tippett (2016). This test is applied to the ranked probability score time series of the different hindcasts (e.g. constrained and unconstrained ensembles), and evaluates whether the number of times that the forecast is better or worse than the reference forecast is significant at the 95% confidence level.
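The tercile-based RPSS and the random walk test can be sketched as follows. This is an illustration under stated assumptions: the tercile thresholds are passed in (e.g. computed beforehand from the observed and modelled climatologies), the significance bound uses the normal approximation of a fair-coin random walk, ties are dropped, and all function names are hypothetical.

```python
import numpy as np


def tercile_probs(ens, lower, upper):
    """Probabilities of the below-, near- and above-normal categories.

    Estimated as the fraction of ensemble members in each tercile category.
    """
    ens = np.asarray(ens, dtype=float)
    p_below = np.mean(ens < lower)
    p_above = np.mean(ens > upper)
    return np.array([p_below, 1.0 - p_below - p_above, p_above])


def rps(probs, obs_category):
    """Ranked probability score of one forecast (0 = perfect).

    obs_category: 0 = below, 1 = near, 2 = above normal.
    """
    obs = np.zeros(len(probs))
    obs[obs_category] = 1.0
    return float(np.sum((np.cumsum(probs) - np.cumsum(obs)) ** 2))


def rpss(rps_forecast, rps_reference):
    """RPSS of a forecast relative to a reference (e.g. unconstrained CMIP6)."""
    return 1.0 - np.mean(rps_forecast) / np.mean(rps_reference)


def random_walk_test(rps_forecast, rps_reference, z=1.96):
    """Sign-count test on paired scores, in the spirit of DelSole and Tippett (2016).

    Each start date adds +1 if the forecast beats the reference (lower RPS)
    and -1 otherwise; under the null hypothesis the final sum stays within
    about +/- z * sqrt(n) (normal approximation, 95% for z = 1.96).
    """
    diff = np.asarray(rps_reference, dtype=float) - np.asarray(rps_forecast, dtype=float)
    steps = np.sign(diff[diff != 0])  # ties are dropped (an assumption)
    total = int(steps.sum())
    return bool(abs(total) > z * np.sqrt(steps.size)), total
```

A climatological forecast (one third per category) verifying in the upper tercile has an RPS of 5/9, which illustrates the penalty for spreading probability away from the observed category.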
Another important property of prediction ensembles is their reliability (Weisheimer and Palmer 2014, Verfaillie et al 2021). Here we evaluate the reliability of the different (constrained or unconstrained) model ensembles using the spread-over-error ratio (SOE), following Ho et al (2013). The SOE is the ratio of the ensemble spread to the forecast error, and should be close to one for a reliable ensemble. Values larger than 1 indicate underconfident (i.e. overdispersive) ensembles, and values smaller than 1 indicate overconfident (i.e. underdispersive) ensembles.
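A minimal sketch of an SOE calculation for one grid cell, assuming the spread is the square root of the time-mean ensemble variance and the error is the root-mean-square error of the ensemble mean; the exact formulation of Ho et al (2013) may differ in detail, and `spread_over_error` is a hypothetical name.

```python
import numpy as np


def spread_over_error(ens, obs):
    """Spread-over-error ratio of an ensemble hindcast.

    ens : array (n_times, n_members) of hindcast values
    obs : array (n_times,) of verifying observations
    Spread: square root of the time-mean ensemble variance.
    Error : root-mean-square error of the ensemble mean.
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    spread = np.sqrt(np.mean(np.var(ens, axis=1, ddof=1)))
    rmse = np.sqrt(np.mean((ens.mean(axis=1) - obs) ** 2))
    return spread / rmse
```

With a finite ensemble of m members, a perfectly reliable ensemble gives a value slightly below one under this definition, which is why a correction factor of sqrt((m + 1) / m) is sometimes applied; whether such a factor is used here is not stated in the text.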

Skill evaluation of the constrained projections
The regional and global skill of the constrained ensembles is sensitive to the temporal average over which the similarity of the spatial SST anomaly patterns is estimated (e.g. average SST anomaly patterns after initialisation of the decadal predictions, or before the poor man's 'initialisation' in the observational constraint), as these temporal averages may emphasise different parts of the spectrum of annual to decadal variability. We compare results for different constraining periods (1-9 years) to identify possible optima with regard to the global area of added skill relative to the unconstrained CMIP6 ensemble.
When constraining based on observations (OBS-constrained), approximately 10% of the global area shows significant added skill (as measured by the residual correlation) for the first 10 forecast years when estimating the SST pattern similarities based on 1, 2 or 3 year averages, similar to the DCPP decadal hindcasts (figure 1(a)). This area of added skill increases to up to 20% when using longer time averages. For longer forecast ranges (e.g. the average of forecast years 1-20), the global area with significant positive residual correlations ranges from 25% (when constraining based on 1 year average SST patterns) to above 30% (when averaging SST anomalies over 4 years and longer).
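The global area fractions quoted above can be computed from a map of grid cells with significant skill. The sketch below assumes cos(latitude) area weights on the regular 5° × 5° analysis grid and an optional mask of cells to include (e.g. cells with sufficient observational coverage); both the weighting and the masking are assumptions, and the function name is hypothetical.

```python
import numpy as np


def significant_area_fraction(significant, lat_deg, valid=None):
    """Area-weighted fraction of grid cells with significant skill.

    significant : boolean array (nlat, nlon), True where skill is significant
    lat_deg     : latitudes of the grid rows in degrees
    valid       : optional boolean mask of cells to include; all cells if None
    """
    significant = np.asarray(significant, dtype=bool)
    if valid is None:
        valid = np.ones(significant.shape, dtype=bool)
    w = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, None], significant.shape)
    return np.sum(w * (significant & valid)) / np.sum(w * valid)
```

Without the cos(latitude) weighting, high-latitude cells would be over-counted, because cells of equal angular size cover less area towards the poles.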
The global area with added skill in terms of RPSS also increases for constraining periods from 1 year up to 4 year averages, and then remains at approximately the same area fraction with significant skill (figure 1(b)). For forecast periods of 10 years, about 30%-35% of the global area reaches significant RPSS values against the unconstrained CMIP6 ensemble as reference (compared to less than 10% in the DCPP decadal hindcasts), and for forecast periods of 20 years about 45% of the global area reaches significant RPSS values.
When using initialised decadal predictions to constrain the climate projections (DCPP-constrained), the global area fractions with significant improvements based on residual correlations are in a similar range as for the observations-based constraints (figure 1(e)). An important difference is that the DCPP-constrained ensemble shows a global skill maximum for averaging times of 2-3 years, at which 10 year predictions have positive residual correlations in about 15% of the global area and 20 year predictions show added skill in more than 30% of the global area. These area fractions decline for longer averaging times, which indicates that the decadal predictions provide the largest added value (in particular in terms of residual correlation) in the first 2-3 forecast years (supplementary figure S16), and may point to issues that deteriorate the quality of the initialised predictions at longer forecast times.
Similar to the OBS-constrained ensemble, the global area with significant RPSS values increases when constraining based on longer temporal averages up to 5 years, and then approximately levels off as the averaging time is further increased up to 9 years (figure 1(f)).
For 10 year predictions, the global area with significant positive RPSS values (against the unconstrained CMIP6 ensemble as reference) exceeds 35% when constraining based on anomaly patterns averaged over 5 or more years. For 20 year predictions, the DCPP-based constraint leads to significant improvements compared to the full CMIP6 ensemble (as measured by RPSS) over more than 50% of the global area.
The constrained projections based on centered pattern correlations result in smaller areas with significant added skill (figures 1(c), (d), (g) and (h)). These areas reach slightly above 10% of the globe for residual correlations when constraining based on observations, for both 10 and 20 year predictions (figure 1(c)), which is of similar magnitude to the DCPP decadal hindcasts. When constraining with the DCPP ensemble mean, the areas with added skill (based on residual correlation) remain below 10% for 10 year predictions, but increase to up to 15% for 20 year predictions when using longer time averages to constrain (figure 1(g)). The RPSS values indicate significant added skill in less than 10% of the globe in most cases (figure 1(d)). Only when constraining based on the decadal predictions with longer SST averages of eight or nine years is significant RPSS reached in slightly larger areas, up to 13% for 10 year predictions and 18% for 20 year predictions (figure 1(h)).
Overall, this analysis illustrates that the optimal choice of the temporal average used to constrain SST anomaly patterns is sensitive to the skill metric of interest (e.g. residual correlation versus RPSS) and the forecast time (e.g. 10 or 20 years). When constraining against observations, both the residual correlation and RPSS indicate the largest global areas with (added) skill when averaging over 4-9 years (with only small differences among these constraining times). In contrast, when constraining against decadal predictions, residual correlations indicate an optimum for constraining times of 2-3 years (in the uncentered case), whereas RPSS values reach a maximum when averaging SST patterns over 5 or more years. In general, constraining based on 5 year averages seems to give reasonably good results in all cases considered here. When constraining based on decadal predictions, however, this averaging time scale represents a compromise between the slightly larger areas with significant residual correlations for shorter averaging periods and the largest areas with significant RPSS when averaging over 5 years or longer. The following discussion of spatial skill patterns therefore focuses on results based on 5 year averages to constrain global SST anomaly patterns; results for other constraining periods are provided in supplementary figures S1-S4.
The spatial patterns of skill (as described by the ACC) for predicting near-surface temperatures are largely similar between the two constraining methods when using the uncentered approach. For both 10 and 20 year predictions, most parts of the globe show high correlations, above 0.8 in large parts of the Atlantic and Indian Oceans and over continental regions in Eurasia, Africa, North and South America (figure 2). More moderate correlations (above 0.5) are found over parts of the Pacific Ocean and the Southern Ocean, and lower (non-significant or negative) correlations in the extratropical Southeast and Northeast Pacific and at some grid cells in the Southern Ocean.
These skill patterns are overall similar in the unconstrained ensemble (supplementary figure S5), and the high correlations are to a large extent a consequence of long-term climate warming in response to external forcing from anthropogenic greenhouse gases and aerosols (Smith et al 2019, Borchert et al 2021). The residual correlations indicate additional skill in predicting temperature variations on top of the forced warming in some regions. For the average of forecast years 1-10, positive residual correlations are primarily found over the North Atlantic (including parts of Western Europe for both constraining methods), the eastern and tropical Pacific, and some small areas in the Indian Ocean and Southern Ocean (and some neighbouring continental grid cells). These regions of added skill are also found in the initialised decadal hindcasts from DCPP. In addition, the OBS-constrained ensemble provides some added skill in mid- and high-latitude regions of Eurasia.
For 20 year predictions, added skill (as indicated by positive residual correlations) is found over similar but generally more extended areas than for 10 year predictions, most notably over the tropical Pacific but also over some land regions in the Americas, Africa, Australia, western Europe and eastern Asia. The OBS-constrained ensemble additionally shows added skill over larger areas of central Asia. These areas of positive residual correlations are consistent with positive values of the ratio of predictable signals (supplementary figure S6). The latter, however, indicates that the largest added value from the constraints (in particular based on the uncentered approach) is found over the Pacific, where the actual ACC skill is lower and the residual correlation therefore represents a larger fraction of the total skill.
The RPSS patterns indicate added skill from the constraints compared to the full (unconstrained) CMIP6 ensemble in mostly similar regions as the residual correlation, i.e. for 10 year projections primarily over the eastern tropical Pacific and the North Atlantic including western Europe (figure 3). Large areas with added skill (larger than for residual correlations) are found over the Indian Ocean and the surrounding land regions in southern and eastern Asia and Australia. As discussed above (figure 1), about half of the global surface shows significant skill improvements for 20 year projections in the constraints based on uncentered pattern correlations, as measured by RPSS. Positive RPSS values are found in large parts of the tropical and subtropical Pacific, western North America, the North Atlantic including western Europe, substantial parts of Africa, the Indian Ocean, Australia, and southern and eastern Asia. Constraining based on DCPP often leads to higher RPSS values than constraining based on observations.
Overall, constraining projections based on their SST anomaly patterns in the years before or after 'initialisation' (using either observations or decadal predictions) can significantly enhance the forecast quality of near-term climate projections for the next 1-2 decades. The values and spatial patterns of skill improvements are largely similar between the two (uncentered) constraining methods, in particular for correlation measures, although for RPSS we find larger values (and globally somewhat larger areas with significant values) when constraining based on decadal predictions. Using RPSS as the skill measure therefore indicates a small benefit of using decadal predictions, or at least initialised multi-year predictions for the next 2-5 years, as the target against which to constrain the projections. The finding that the constraints can in some regions lead to higher skill than the DCPP-A hindcasts that are used to constrain the projections is intriguing; we discuss this phenomenon further in section 4.
Constraining based on the centered pattern correlation approach also results in very high correlations in most parts of the globe (figure 4). This approach, however, also reveals some differences in the regions with added skill compared with the uncentered approach. With centered pattern correlations, we again find added skill in large parts of the North Atlantic Ocean, similar to the uncentered approach (figure 4). In the Pacific Ocean, however, the two approaches show distinct patterns of added skill. In particular, the OBS-constrained ensemble shows added skill in the western extratropical North Pacific, an area which did not show added skill in the uncentered approach. By contrast, little or no added skill is found in the tropical and eastern extratropical North Pacific, the areas where the uncentered approach showed the largest improvements. Similar areas of added skill are also identified for RPSS (figures 3(c), (d), (g) and (h)).
Maps of the SOE (figure 5) indicate that the unconstrained CMIP6 climate simulations are underconfident over large parts of the Northern Hemisphere (in particular over and around the North Atlantic) and in some areas of the Southern Hemisphere mid-latitudes (e.g. parts of Australia and the Indian Ocean), whereas in other parts of the Southern Hemisphere the CMIP6 ensemble is overconfident (e.g. at higher latitudes over the Southern Ocean). This pattern is very similar in the DCPP decadal hindcasts, although SOE values over parts of the North Atlantic are closer to 1 (i.e. the DCPP ensemble is less underconfident than the CMIP6 historical runs); only in the westernmost part, along the east coast of North America, is the DCPP ensemble even less reliable than the unconstrained CMIP6 historical runs (supplementary figure S7). The constrained projections show SOE values closer to one over much of the North Atlantic (and surrounding) region, indicating that the variability constraints can provide more reliable near-term climate information for the next 10-20 years. These improvements in reliability (in terms of less underconfident ensembles) are strongest in the constraints based on uncentered pattern correlations, in particular when constraining based on DCPP, where the areas with underconfident projections are largely confined to the eastern part of the North Atlantic and Europe (and even there, SOE deviations from one are smaller in large areas). The constraints based on centered pattern correlations also improve the reliability in parts of the North Atlantic (and surrounding continents), but less than the uncentered approach. The centered approach even leads to more underconfident near-term projections in some parts of the North Atlantic (the subpolar North Atlantic when constraining against past observations, and a region in the eastern mid-latitude North Atlantic when constraining against decadal predictions).

Selected ensemble members by the different constraining methods
We next scrutinise which specific ensemble members are selected by these constraining methods. As our full CMIP6 ensemble includes models with very different ensemble sizes (individual models contribute between 1 and 50 members to the total of 311 ensemble members, see table S1 in the supplementary information), we calculate the percentage of simulations from each model that are selected at each start date (figure 6). If each of the 311 members had equal probability of being selected, selecting 30 members in each year would select a given member in 9.6% of cases on average over all start dates. Indeed, on average over all annual start dates 1960-2020, most of the models fall into the 0%-20% bin, consistent with this theoretical expectation. A few models, however, are selected disproportionately often, in particular with the uncentered approach: members from the two versions of CanESM are selected on average in 21%-40% of cases when constraining against both observations and decadal predictions, UKESM1-0-LL is selected disproportionately often in the DCPP-constrained ensemble, and FIO-ESM-2-0 and CESM2-WACCM in the OBS-constrained ensemble (note also that a few models which provide only a single ensemble member are not selected at any start date).
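The per-model selection percentages can be computed from the yearly selections as sketched below. The input layout (a dictionary of selected member indices per start year and a list mapping each member index to its model name) and the function name are hypothetical; the expected value follows directly from selecting 30 of 311 members.

```python
from collections import Counter

import numpy as np


def model_selection_percentage(selected_by_year, member_model):
    """Mean percentage of each model's members selected per start date.

    selected_by_year : dict start_year -> list of selected member indices
    member_model     : list giving the model name of every member index
    """
    n_members = Counter(member_model)  # ensemble size of each model
    pct = {model: [] for model in n_members}
    for selected in selected_by_year.values():
        n_selected = Counter(member_model[i] for i in selected)
        for model, n in n_members.items():
            pct[model].append(100.0 * n_selected[model] / n)
    return {model: float(np.mean(v)) for model, v in pct.items()}


# Expected percentage if all 311 members were equally likely to be chosen
expected_pct = 100.0 * 30 / 311  # about 9.6
```

Normalising by each model's ensemble size prevents models with many members (up to 50) from appearing over-selected merely because they contribute more simulations.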
These uneven selection probabilities can be related to insufficient sampling of internal variability during the 61 start years. They can also be related to other effects that favour the selection of some models over others, such as a closer similarity between some models and the observed patterns of SST variability or warming. The similarity of SST anomaly patterns used to constrain the climate simulations has so far been computed with so-called 'uncentered' pattern correlations (e.g. Mahmood et al 2021, 2022). This means that the local anomalies at each grid cell are used as they are, without subtracting a global average. As a result, the global anomaly patterns are dominated by negative values (and have a negative global average) in the earlier years (see figure 7), and conversely by positive values in later years. Note that the anomalies have been calculated relative to the 1981-2010 climatological period. The shift from predominantly negative to predominantly positive anomalies therefore carries the signature of the global warming trend over the period used to constrain (or conceptually 'initialise') the projections, and favours the selection of models with more similar warming patterns. An alternative way to measure the similarity between projections and SST anomaly patterns from observations or decadal predictions is to calculate so-called 'centered' pattern correlations. In this case, the global average of the anomalies is subtracted from each grid value. This ensures that the global average of each anomaly map is zero. It thereby removes the signature of global warming and produces more balanced maps with positive and negative anomalies for all start dates (figure 8). Selecting based on centered pattern correlations also results in a more balanced selection across models and avoids the disproportionate selection of some models (supplementary figure S8). Overall, the constraining approach using centered pattern correlations focuses more on the signatures of internal variability modes. The constraint based on uncentered pattern correlations, in contrast, combines the effect of internal variability and the long-term warming in response to forcing.
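The difference between the uncentered and centered similarity measures can be made concrete with a small sketch. It assumes that both anomaly maps are already flattened to the same set of valid ocean grid points and that optional area weights (e.g. cos(latitude)) are supplied. Whether and how the original study applies area weighting is not stated here, and the function name `pattern_correlation` is hypothetical.

```python
import numpy as np


def pattern_correlation(a, b, weights=None, centered=True):
    """Weighted pattern correlation between two anomaly maps.

    centered=False: anomalies are used as they are, so a shared global-mean
    offset (e.g. a warming signal) contributes to the similarity.
    centered=True: the weighted global mean is removed from each map first,
    so only the spatial structure of the anomalies is compared.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    w = np.ones_like(a) if weights is None else np.asarray(weights, dtype=float).ravel()
    w = w / w.sum()
    if centered:
        a = a - np.sum(w * a)
        b = b - np.sum(w * b)
    return np.sum(w * a * b) / np.sqrt(np.sum(w * a**2) * np.sum(w * b**2))


# Two maps with opposite spatial structure but a common +3 K offset
a = np.array([1.0, -1.0, 1.0, -1.0]) + 3.0
b = np.array([-1.0, 1.0, -1.0, 1.0]) + 3.0
print(round(pattern_correlation(a, b, centered=False), 2))  # 0.8
print(round(pattern_correlation(a, b, centered=True), 2))   # -1.0
```

Because of the shared offset, the uncentered measure rates the two maps as similar (0.8) even though their spatial anomalies are exactly opposite. The centered measure reports the anti-correlation (-1.0).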
The uncentered pattern correlation values of the selected simulations are typically above 0.6 for the observations-based constraint in the early and late parts of the investigation period, and drop below 0.3 around the year 2000 (supplementary figure S9). For the DCPP-based constraint, the uncentered pattern correlation values even exceed 0.8 at the beginning and end of the investigation period, and also drop to around 0.3 in 1995. This drop occurs approximately in the middle of the 1981-2010 climatological period against which the anomalies are calculated. In the middle of the climatological period the average anomalies are close to zero. They are therefore less affected by the warming trend than anomalies outside or at the edges of the climatological period. Note that the timing of the drop is shifted by approximately 5 years (when constraining based on 5 year averages) between the observations-based and the DCPP-based constraint. This is because the observations-based constraint considers the SST anomalies prior to the constraining date, whereas the DCPP-based constraint uses anomalies from the initialized predictions, i.e. after the initialization date. The drop in pattern correlation points to a potential issue with the method. It suggests that the constraining criterion changes over the selection and evaluation period: the warming trend plays a larger role outside the climatological period, but a smaller role or none in its middle.
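The mechanism behind the drop in uncentered pattern correlations can be illustrated with synthetic data. The sketch assumes a hypothetical linear warming of 0.015 K per year, shared by the reference and a candidate member, plus independent random spatial noise that stands in for internal variability. None of these numbers come from the study. The sketch also encodes the 5-year constraining windows described in the caption of figure 7: the 5 years before the start date for the observations, and the first 5 forecast years for DCPP. The helper names are hypothetical.

```python
import numpy as np


def pattern_correlation(a, b, centered):
    """Unweighted pattern correlation, with or without removing the map mean."""
    if centered:
        a, b = a - a.mean(), b - b.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))


def constraint_window(start_year, reference, length=5):
    """First and last year of the SST averaging window for a given start date."""
    if reference == "obs":  # years before the constraining date
        return start_year - length, start_year - 1
    return start_year, start_year + length - 1  # first forecast years (DCPP)


years = np.arange(1950, 2025)
warming = 0.015 * (years - 1950)  # hypothetical global-mean warming (K)
in_clim = (years >= 1981) & (years <= 2010)
offset = warming - warming[in_clim].mean()  # global-mean anomaly vs 1981-2010

rng = np.random.default_rng(0)
n_points = 2000
results = {}
for year in (1960, 1995, 2020):
    shift = offset[years == year][0]
    ref = shift + 0.2 * rng.standard_normal(n_points)     # reference map
    member = shift + 0.2 * rng.standard_normal(n_points)  # unrelated variability
    results[year] = (pattern_correlation(ref, member, centered=False),
                     pattern_correlation(ref, member, centered=True))
    print(year, round(float(shift), 3), [round(float(r), 2) for r in results[year]])

print(constraint_window(1961, "obs"), constraint_window(1961, "dcpp"))
```

Far from the middle of the climatological period, the shared offset alone produces high uncentered correlations between otherwise unrelated maps. Near 1995 the offset vanishes and the uncentered correlation collapses towards the centered value.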
In contrast, when constraining based on centered pattern correlations, there is no such sharp drop in the correlation values (supplementary figure S10). The correlations are also generally lower: approximately 0.3-0.5 when constraining based on observations and approximately 0.4-0.6 when constraining based on decadal predictions. This suggests that the constraining criterion is more consistent in time for the centered approach.

Projections of the near-term climate during 2020-2039
Finally, we apply these different methods to provide constrained future projections for the 20 year period 2020-2039. The unconstrained CMIP6 ensemble and all constrained ensembles project substantial warming relative to the 1981-2010 climatology (figure 9 and supplementary figure S11), with the strongest warming in high northern latitudes. However, the exact magnitude of warming differs between the methods.
Compared to the unconstrained CMIP6 ensemble, all constraints result in amplified near-term warming over most global land regions (excluding Antarctica), and in particular over the Northern Hemisphere (figure 9). The constraints using uncentered pattern correlations are affected by a signature of the warming trend in addition to internal variability. They indicate amplified warming in almost all regions of the globe, with the largest differences relative to the unconstrained ensemble, exceeding 0.5 K, in high northern latitudes.
The effect of the constraint is smaller in the approaches that measure similarity with centered pattern correlations. Still, when constraining against the decadal predictions initialised in 2019, the projected 20 year period 2020-2039 is warmer than in the unconstrained CMIP6 ensemble in some parts of the globe, in particular in the Northern Hemisphere. The differences compared to the unconstrained CMIP6 ensemble are smallest for the approach using centered pattern correlations and observed SST anomalies. For this constraint, the differences are statistically significant (p ⩽ 0.05) only over the North Pacific and some small areas over the eastern North Atlantic around southeastern Europe. Even in this case, parts of the Northern Hemisphere tend to warm faster than in the full CMIP6 ensemble mean. This constraint, however, also predicts weaker warming in the subpolar North Atlantic, suggesting a tendency towards a cooler phase of the Atlantic Multidecadal Oscillation in the coming decades.
To better understand the differences between the projections using the different constraints, we examine the 20 year temperature trajectories over the North Atlantic Subpolar Gyre (SPG) region in the ensemble members selected by each constraint (supplementary figure S12). The SPG region shows some of the largest differences in near-future projections between the constraints. All constraints are highly skillful in this region, but the added skill in the SPG region is largest and most widespread for the observations-based constraint using centered pattern correlations (compare figures 2 and 4). For the near-future 2020-2039 projections, both DCPP-based constraints indicate stronger warming in this region than the unconstrained CMIP6 ensemble mean, by up to 0.5 K (figure 9). In contrast, the constraint based on observations and centered pattern correlations shows weaker warming than the unconstrained CMIP6 ensemble. The constraint based on observations and uncentered pattern correlations also indicates an area with relatively small and non-significant differences.
The DCPP-based constraints select a relatively large number of CanESM5 ensemble members (21 members for the uncentered approach and 16 members for the centered approach). As discussed earlier, CanESM5 is one of the CMIP6 models with the highest climate sensitivity, and all selected CanESM5 members project increasing temperatures over the SPG during the 2020-2039 period. It is also worth noting that the spread between the selected CanESM5 members is relatively small compared to the spread across the other selected members (supplementary figure S12). The observations-based constraints select fewer CanESM5 members (11 in the uncentered approach and 6 in the centered approach). Correspondingly, they select more simulations that project weaker warming or even cooling in the SPG over the next 20 years, notably 2 CESM2-WACCM members and some MIROC6 members.
These results demonstrate the importance of further scrutinizing the mechanisms behind the selections in the different constraining approaches. Regional skill should also be taken into account when interpreting these results. As discussed earlier, the observations-based constraint using centered pattern correlations shows the highest (added) skill in the SPG region, which is strongly affected by decadal to multi-decadal variability. In this context, the disproportionate selection of CanESM5 members in the uncentered and DCPP-based approaches may be unrealistic. Recall that in the uncentered cases the selection was based on global SST patterns that include a global warming signature. In this study we also did not apply regional SST constraints, which can lead to different results (Mahmood et al 2022, Cos et al 2024). Of the four approaches compared here, the observations-based constraint using centered pattern correlations may therefore be the most credible for the SPG region, since it shows the highest skill there and avoids the clustered selection of a strongly warming model. In other regions, however, some of the other constraints show higher skill. Hindcast skill does not necessarily translate into forecast skill, but it is a useful indicator of the general credibility of the constrained projections.

Summary, discussion and conclusions
We present a comprehensive evaluation of different approaches to constrain internal variability in large multi-model ensembles of climate projections. These approaches select the ensemble members that are in closest agreement with SST anomaly patterns from observations or decadal predictions at a given point in time at the beginning of the near-term projection period. They have previously been shown to improve the accuracy and probabilistic skill of seamless near-term climate change estimates (i.e. for the next 10-20 years). However, they were developed and demonstrated using different ensembles of climate simulations, which did not allow the two approaches to be compared. Here we apply both constraining approaches to an identical ensemble of 311 simulations from 37 CMIP6 models, which allows a consistent comparison and the identification of potential advantages of one method over the other. We find that both constraining methods improve the accuracy of both 10 and 20 year projections in large parts of the globe. On decadal prediction time scales, both methods also result in larger areas with added skill than the decadal (retrospective) prediction experiments contributed to CMIP6-DCPP. When skill (or added skill) is measured with correlations of the ensemble mean, neither method has a clear advantage when each uses its own optimised selection settings (e.g. temporal averaging of SST patterns). When skill is measured with the probabilistic RPSS, which takes the distribution of all ensemble members into account, there is an indication of higher skill and larger areas with added skill when initialised decadal predictions are used to constrain the projections.
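For readers less familiar with the probabilistic score, the following sketch computes a tercile-based RPSS with the unconstrained ensemble as reference, as in the skill evaluation summarised above. It is an illustration under assumptions. The tercile thresholds are taken as given, since how they are derived (e.g. from the observed or the hindcast climatology) is not specified here. The function names are hypothetical. This plain version also does not correct for the different ensemble sizes (30 versus 311 members), which a debiased ("fair") RPSS would do.

```python
import numpy as np


def category_probs(ensemble, thresholds):
    """Fraction of ensemble members falling into each category.

    thresholds: increasing category boundaries, e.g. the lower and upper
    tercile of a climatology, which give three categories.
    """
    categories = np.digitize(np.asarray(ensemble, dtype=float), thresholds)
    return np.bincount(categories, minlength=len(thresholds) + 1) / len(categories)


def rps(ensemble, obs, thresholds):
    """Ranked probability score for one forecast (0 is perfect)."""
    p = category_probs(ensemble, thresholds)
    o = np.zeros_like(p)
    o[np.digitize([obs], thresholds)[0]] = 1.0  # observed category
    return np.sum((np.cumsum(p) - np.cumsum(o)) ** 2)


def rpss(forecasts, references, observations, thresholds):
    """1 - mean RPS(forecast) / mean RPS(reference); > 0 indicates added skill."""
    rps_fc = np.mean([rps(f, o, thresholds) for f, o in zip(forecasts, observations)])
    rps_ref = np.mean([rps(r, o, thresholds) for r, o in zip(references, observations)])
    return 1.0 - rps_fc / rps_ref


terciles = [-0.5, 0.5]
print(round(rps([-1.0, 0.0, 1.0], 1.0, terciles), 3))  # 0.556
```

An RPSS of 1 corresponds to a perfect categorical forecast, and 0 means no improvement over the reference, here the unconstrained CMIP6 ensemble.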
These previous implementations of the constraints measured the similarity between individual ensemble members and the observational or decadal prediction reference with uncentered pattern correlations. This means that the similarity of SST anomaly patterns is measured using SST anomalies relative to a fixed climatology. These SST anomaly patterns are affected by the general warming trend over the past 60 years, which results in predominantly negative anomalies in the earlier period and predominantly positive anomalies in recent years. The constraint therefore also considers the signatures of the long-term warming trend. We find here that this approach selects some models disproportionately often when applied to a multi-model ensemble. These models include some that are characterized by a high climate sensitivity (Zelinka et al 2020).
The uncentered constraint considers the mean SST warming patterns in addition to the phasing of modes of variability. Some studies have developed constraints on equilibrium and transient climate sensitivity based on the observed warming rates in recent decades (Jiménez-de-la-Cuesta and Mauritsen 2019, Nijsse et al 2020, Tokarska et al 2020). However, Armour et al (2024) recently demonstrated that these climate sensitivity estimates might be biased low because current climate models are unable to reproduce the observed warming patterns. They also showed that the observed warming patterns since 1980 may have reduced the observed global warming. Our constraints are based on observed SST patterns, or for DCPP on patterns from predictions initialised with observations, and could therefore partly account for this effect. The spatial patterns of future warming, however, remain a major uncertainty. Our constraint does not resolve this uncertainty. It does ensure that the near-term projections start from the global SST pattern (the result of both warming and variability) that is closest to the observed climate.
We also test an alternative approach, in which we constrain based on centered pattern correlations, where the global mean anomaly is subtracted from the local anomaly values.In this case, the global average anomaly is zero in all start years, which leads to more balanced anomaly maps that do not show the signatures of a warming trend as in the case of the uncentered approach.This approach also results in a more even selection of models, which indicates that this alternative selection focuses more on variability alone without the warming effect.
The pattern correlation values on which the member selection is based indicate greater similarity (i.e. higher correlations) with the observations or decadal predictions in the uncentered approach, which might contribute to its higher skill in many regions. There appears, however, to be a temporal inconsistency in this approach: pattern correlation values are lower in the center of the climatological period, where global mean anomalies are close to zero. The centered approach shows temporally more consistent pattern correlations, but at lower values.
Overall, higher skill can be achieved with the uncentered approach, so this might be the method of choice if the goal is to achieve the highest possible skill. However, the approach based on centered pattern correlations can also have advantages when the focus is specifically on certain aspects of climate variability. For example, the skill for North Atlantic SPG SSTs (e.g. averaged over 45°N-60°N, 50°W-20°W) is higher with the centered constraint than with the uncentered one (not shown).
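A regional index such as the SPG average mentioned above can be computed as an area-weighted box mean. The sketch assumes a regular latitude-longitude grid with longitudes in the range -180 to 180, cos(latitude) weights, and NaN values over land. The box defaults are the example bounds quoted in the text (45°N-60°N, 50°W-20°W), and `box_mean` is a hypothetical name.

```python
import numpy as np


def box_mean(field, lat, lon, lat_bounds=(45.0, 60.0), lon_bounds=(-50.0, -20.0)):
    """Cos-latitude weighted mean of field[lat, lon] over a lat/lon box.

    Assumes a regular grid, longitudes in -180..180 and NaN over land;
    NaN grid cells are excluded from both the sum and the weights.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    lat_mask = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    lon_mask = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = np.asarray(field, dtype=float)[np.ix_(lat_mask, lon_mask)]
    weights = np.cos(np.deg2rad(lat[lat_mask]))[:, None] * np.ones(lon_mask.sum())[None, :]
    valid = ~np.isnan(sub)
    return np.sum(sub[valid] * weights[valid]) / np.sum(weights[valid])


lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(-179.5, 180.0, 1.0)
sst_anom = np.zeros((lat.size, lon.size))  # placeholder anomaly field
print(box_mean(sst_anom, lat, lon))  # 0.0
```

The cos(latitude) weights account for the smaller area of grid cells at higher latitudes, which matters even within a 15° latitude band.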
Using both methods in combination (i.e. constraining based on centered versus uncentered anomalies) could offer a pathway to disentangle the roles of internal variability and the response to forcing. This is based on the understanding that constraining based on centered anomalies focuses on signatures of internal variability. Constraining based on uncentered anomalies, in contrast, includes signatures of both the long-term changes in response to forcing and the alignment of internal variability. A difference between the two constraints could then be interpreted as a contribution from the forcing response. This should be further explored and exploited in future studies.
In some regions, the constrained ensemble achieves higher skill than the initialised decadal predictions used to constrain the transient simulations, which is an intriguing finding. We consider different hypotheses that could explain it. First, the constraint towards the decadal prediction ensemble mean may 'reinitialise' the ensemble members towards the predictable signal (represented by the ensemble mean) and remove the unpredictable noise. This argument is similar to the ensemble dispersion filtering presented by Kadow et al (2017), who re-initialise their decadal predictions towards the ensemble mean after lead year two. We tested this hypothesis by selecting, for each initialization, the 30 DCPP members that are closest to the DCPP ensemble mean (supplementary figure S13). However, this constrained DCPP ensemble does not show higher skill than the full DCPP ensemble (compare e.g. against figure 2(i)). This suggests that the ensemble dispersion filter hypothesis does not explain the high skill of the constrained projections relative to DCPP.
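The test of the ensemble dispersion filter hypothesis can be sketched as below. The text does not state which distance measure defines 'closest'. This sketch assumes an RMS distance between each member's anomaly map and the DCPP ensemble-mean map, and the function name is hypothetical. A pattern correlation, as used for the constraints, would be an equally plausible choice.

```python
import numpy as np


def closest_to_mean(members, n=30):
    """Indices of the n members closest (RMS distance) to the ensemble mean.

    members: array of shape (n_members, n_points), one flattened anomaly map
    per ensemble member of a single initialisation.
    """
    members = np.asarray(members, dtype=float)
    ens_mean = members.mean(axis=0)
    rms_distance = np.sqrt(np.mean((members - ens_mean) ** 2, axis=1))
    return np.argsort(rms_distance)[:n]


toy = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [5.0, 5.0]])
print(closest_to_mean(toy, n=2))  # [1 0]
```

In the toy example the outlier (member 3) pulls the ensemble mean towards itself, but it is still the most distant member, so it is excluded first.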
Second, skill in the initialised decadal predictions may be degraded by initialisation shocks and the related drift (Sanchez-Gomez et al 2016, Kröger et al 2018, Bilbao et al 2021), whereas the constrained transient runs do not suffer from these artefacts. This could also explain why, in some regions (e.g. parts of the Pacific), the projections constrained with observations show higher correlation skill than the projections constrained with decadal predictions. In the latter case, the DCPP ensemble mean used as constraining reference could still be affected by drift.
For decadal prediction horizons, we find that the constrained projections are competitive with the initialised decadal predictions, and in some regions even better. This does not necessarily hold for shorter forecast times. A previous study (Mahmood et al 2022, see their supplementary figure S7) showed that for inter-annual predictions (forecast year 1) the DCPP ensemble has substantially higher skill than the constrained projections. An interesting future evaluation could therefore compare the relative benefits of initialised predictions and constrained projections at shorter forecast times (e.g. inter-annual and multi-annual). For such a comparison, the constraint could itself be optimised for shorter forecast times, for example by using shorter constraining periods. This could identify the forecast horizon up to which the (computationally expensive) initialised predictions provide the most benefit. It may also suggest running the initialised predictions over shorter periods and deriving longer-term prediction information from the constrained projections. This would be consistent with the finding in this study that the areas with added skill (e.g. residual correlation) in the DCPP-constrained projections are largest for constraining periods of 2-3 years.
In conclusion, constraints that select the members of a large ensemble of transient climate simulations whose SST anomaly patterns are most similar to observations or initialized decadal predictions can significantly enhance the skill of near-term climate projections in some regions. The different methods and their specific implementations (e.g. using centered or uncentered anomalies) each have their own advantages, and the choice of a preferred method depends on the target region or research question. For regions with added skill, these constrained projections can provide improved estimates of near-term climate change. Near-term projections of enhanced forecast quality can support more targeted adaptation strategies and enhance resilience to the regional climate changes expected in the next few decades.

Figure 1. Fraction of global area where the added skill in near-surface temperature of the constrained ensembles is positive and statistically significant at the 95% confidence level. Added skill is measured as residual correlation (a), (c), (e), (g) and RPSS (b), (d), (f), (h), with the unconstrained CMIP6 ensemble as reference. In each panel the added skill is shown for constraints using SST anomaly patterns averaged over 1-9 years, and the area with added skill in the DCPP initialised decadal hindcasts is also shown for reference. Different colors represent different forecast periods (light pink: forecast years 1-10 average, blue: forecast years 1-20 average). The evaluations are based on the 40 constraints (or initializations in the case of the DCPP hindcasts) applied in all years from 1961 to 2000. These years refer to the first year of the 10-year or 20-year forecast periods being evaluated.

Figure 2. Anomaly correlation coefficient (ACC) between observed and predicted near-surface temperature anomalies for two forecast periods (averages of forecast years 1-10 and 1-20). The ACC is shown for the OBS-constrained ensemble (a), (b) and the DCPP-constrained ensemble (e), (f) using the uncentered pattern correlation approach, with SST anomaly patterns averaged over 5 years for the constraints. Residual correlations after removing the forced signal are shown for the constrained Best30 ensembles using (c), (d) the observational constraint and (g), (h) the DCPP-based constraint, and (i) for the DCPP ensemble mean. The forced signal is estimated from the ensemble mean of the 311 unconstrained members, following Smith et al 2019. Stippling indicates regions where the ACC or residual correlation is not statistically significant at the 95% confidence level (see Methods for details). The evaluations are based on the 40 constraints (or initializations in the case of the DCPP hindcasts) applied in all years from 1961 to 2000. These years refer to the first year of the 10 year or 20 year forecast periods being evaluated.
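The residual correlation used in figure 2 removes the forced signal before correlating forecasts and observations. Below is a minimal sketch of one common formulation. It assumes that the forced signal is regressed out of each time series by ordinary least squares (with intercept) before the residuals are correlated. Smith et al (2019) should be consulted for the exact procedure, and the function names are hypothetical.

```python
import numpy as np


def remove_signal(x, signal):
    """Residual of x after least-squares regression on signal (with intercept)."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.asarray(signal, dtype=float), np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef


def residual_correlation(obs, forecast, forced):
    """Correlation of obs and forecast after removing the forced signal from both."""
    r_obs = remove_signal(obs, forced)
    r_fc = remove_signal(forecast, forced)
    return np.corrcoef(r_obs, r_fc)[0, 1]


rng = np.random.default_rng(42)
forced = np.linspace(0.0, 1.0, 40)     # stand-in for the unconstrained ensemble mean
variability = rng.standard_normal(40)  # synthetic internal variability
obs = forced + variability
good_fc = 0.5 * forced + variability   # captures the variability, wrong trend amplitude
print(round(float(residual_correlation(obs, good_fc, forced)), 2))  # 1.0
```

Because the forced component is regressed out of both series, a forecast that captures the variability scores perfectly even though its trend amplitude differs from the observed one. A forecast that only reproduces the shared trend would score near zero.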

Figure 3. RPSS for the Best30 members constrained against observations using (a), (b) uncentered and (c), (d) centered pattern correlations, and against the DCPP decadal predictions using (e), (f) uncentered and (g), (h) centered pattern correlations; (i) shows the skill of the DCPP ensemble. RPSS was calculated with the unconstrained CMIP6 ensemble as reference. Stippling indicates regions where RPSS is not significant at the 95% confidence level based on a random walk test (see Methods). The evaluations are based on the 40 constraints (or initializations in the case of the DCPP hindcasts) applied in all years from 1961 to 2000. These years refer to the first year of the 10 or 20 year forecast periods being evaluated.

Figure 4. As figure 2, but using centered pattern correlations for the constraints.

Figure 6. Relative frequency of each model being selected into the OBS-constrained (blue) and DCPP-constrained (red) ensembles for each start date, i.e. the number of selected simulations divided by the number of ensemble members provided by each model. Start dates are shown along the x-axis; the year numbers correspond to the first forecast year of the constrained projections, starting in 1961. The number of members available from each model is shown in brackets after each model name. The average relative frequency over all start dates is shown in the rightmost column.

Figure 7. Five-year average SST anomaly patterns relative to the 1981-2010 climatological period, used to constrain the projections in 1961 (left), 1995 (middle), and 2020 (right). First row: observational reference; second row: ensemble member most similar to the observations; third row: DCPP ensemble mean as constraining reference; fourth row: ensemble member most similar to the DCPP reference. Note that the observational constraint uses the 5 years prior to the selection (i.e. 1956-1960 for the constrained projections 'initialised' in 1961). The DCPP-based constraint, in contrast, uses the first five forecast years of the decadal predictions (i.e. 1961-1965 for the constrained projections starting in 1961).

Figure 8. Same as figure 7, but showing centered anomalies, i.e. after subtracting the global average of the anomalies.

Figure 9. (a) Projection of near-surface temperature anomalies for the 20 year average of the years 2020-2039, relative to the 1981-2010 climatological period (unit: K), in the unconstrained CMIP6 ensemble. (b)-(e) Difference between the constrained projections and the unconstrained CMIP6 ensemble mean for the 20 year average 2020-2039 (unit: K): (b) OBS-constrained ensemble using uncentered pattern correlations, (c) OBS-constrained ensemble using centered pattern correlations, (d) DCPP-constrained ensemble using uncentered pattern correlations, (e) DCPP-constrained ensemble using centered pattern correlations. Stippling indicates where the differences are not statistically significant at the 95% level according to a two-sided t-test.