Extending CMIP5 projections of global mean temperature change and sea level rise due to thermal expansion using a physically-based emulator

We present a physically-based emulator approach to extending 21st century CMIP5 model simulations of global mean surface temperature (GMST) and global thermal expansion (TE) to 2300. A two-layer energy balance model that has been tuned to emulate the CO2 response of individual CMIP5 models is combined with model-specific radiative forcings to generate an emulated ensemble to 2300 for RCP2.6, RCP4.5 and RCP8.5. Errors in the emulated time series are quantified using a subset of CMIP5 models with data available to 2300 and factored into the ensemble uncertainty. The resulting projections show good agreement with 21st century ensemble projections reported in IPCC AR5 and also compare favourably with individual CMIP5 model simulations post-2100. There is a tendency for the two-layer model simulations to overestimate both GMST rise and TE under RCP2.6, which is suggestive of a systematic error in the applied radiative forcings. Overall, the framework shows promise as a basis for extending process-based projections of global sea level rise beyond the 21st century time horizon that typifies CMIP5 simulations. The results also serve to illustrate the differing responses of GMST and Earth’s energy imbalance (EEI) to reductions in greenhouse gas emissions. GMST responds relatively quickly to changes in emissions, leading to a negative trend post-2100 for RCP2.6, although temperature remains substantially elevated compared to present day at 2300. In contrast, EEI remains positive under all RCPs, and results in ongoing sea level rise from TE.


Introduction
Global mean surface temperature (GMST) and global mean sea level (GMSL) are fundamental aspects of the climate system, important both for ongoing monitoring of observed climate change and for assessments of the potential socioeconomic impacts of future change. While GMST responds relatively quickly to changes in greenhouse gas emissions (Collins et al 2013), the response of GMSL is characterised by much longer timescales and substantial long-term committed change (Church et al 2013, Clark et al 2016, Nauels et al 2017). It is partly this aspect of the sea level response that motivates scientists and stakeholders to think beyond the 2100 time horizon that typifies sea level projections rooted in climate model simulations (e.g. Church et al 2013, Slangen et al 2014, Cannaby et al 2016). The primary motivation of the present study is to work towards multi-century sea level projections that are traceable to the CMIP5 (Coupled Model Intercomparison Project Phase 5; Taylor et al 2012) climate model simulations that formed the basis of the 21st century process-based GMSL projections presented in IPCC AR5 (Church et al 2013). The AR5 GMSL projections used estimates of global thermal expansion (TE) taken directly from CMIP5 models, and the corresponding model simulations of GMST were used to derive GMSL contributions from glaciers and from the surface mass balance of the ice sheets.
Scenario-independent projections of the contributions from ice sheet dynamics for Greenland and Antarctica were developed from the existing literature, with a similar approach also used for projections of groundwater (Church et al 2013). Since the publication of IPCC AR5, there has been much discussion in the literature of the potential future contribution of ice sheet dynamics, particularly from Antarctica (e.g. Levermann et al 2014, Ritz et al 2015, Golledge et al 2015, DeConto and Pollard 2016). However, the present work focuses on extending CMIP5 projections of GMST change and TE, which are crucial ingredients for developing multi-century GMSL projections. Similar efforts to develop multi-century projections of GMST and GMSL, including its various components, have been carried out recently by Nauels et al (2017) using the MAGICC simple climate model. The low computational cost of MAGICC allows greater exploration of both climate change scenarios and the uncertainties associated with model parameter settings, including carbon cycle feedbacks. However, while Nauels et al (2017) calibrated their model against CMIP5 models, their MAGICC simulations are emissions-driven, whereas the process-based sea level projections presented in IPCC AR5 are concentrations-driven (Church et al 2013). Nicholls et al (2018) used a different simple climate model (WASP; Goodwin 2016, 2017) to develop projections of global sea level and ocean pH to 2300 under 1.5 °C and 2.0 °C stabilisation scenarios and RCP8.5. Like the present study, these simulations were concentrations-driven. However, the WASP-based studies used a 'history matching' approach (e.g. Williamson et al 2013), in which model simulations are retained if they fall within the range of CMIP5 historical simulations, rather than the emulation of specific CMIP5 models that is used here.
In the present work we make use of a simple climate model to emulate a small number of individual CMIP5 simulations in order to provide an ensemble of multi-century projections that are more directly traceable to the sea level projections presented in IPCC AR5, using concentrations-driven estimates of climate forcings that have been diagnosed individually for each CMIP5 model being emulated.
The paper outline is as follows. In section 2 we describe the data and methods, including a description of the two-layer energy-balance model emulator that forms the basis of our extended projections. In section 3 we compare the emulator results with a subset of CMIP5 models that have simulations available to 2300. We present projections of GMST and thermal expansion for the complete emulator ensemble in section 4. Finally, in section 5 we include a discussion and summary.

The two-layer energy-balance model (TLM)
The work presented here employs a two-layer energy-balance model (hereafter 'TLM', figure 1), which has proven to be a useful tool for understanding the response of complex climate models to climate forcings (Geoffroy et al 2013a, 2013b, Gregory et al 2015). The formulation is based on that described by Geoffroy et al (2013a) and does not include the efficacy term for deep ocean heat uptake (…, Geoffroy et al 2013b). The model consists of well-mixed upper and deep ocean layers, each of finite and fixed heat capacity (C, C_0), with temperatures expressed as anomalies (T', T'_0) relative to a pre-industrial equilibrium state. The temperature of the upper ocean layer is identified with the global mean surface air temperature. The upper ocean layer is subject to a prescribed radiative forcing (F), exchanges heat with the deep ocean layer and emits radiation back to space according to its temperature anomaly and the climate feedback parameter (λ). The heat exchange between the upper and deep ocean layers is determined by their temperature difference and an exchange coefficient (γ), which can be thought of as representing the strength of ocean vertical mixing. The TLM therefore has two prognostic variables (T', T'_0) and four free parameters (C, C_0, λ, γ). The Geoffroy et al (2013a) TLM parameters used in this study are summarised in table 1. The governing equations for the TLM are given below, where N is the net radiative imbalance at top-of-atmosphere, which for this system is equal to the rate of change of global ocean heat content:

$$C\,\frac{dT'}{dt} = F - \lambda T' - \gamma\,(T' - T'_0)$$

$$C_0\,\frac{dT'_0}{dt} = \gamma\,(T' - T'_0)$$

$$N = F - \lambda T' = C\,\frac{dT'}{dt} + C_0\,\frac{dT'_0}{dt}$$

We use the parameters chosen by Geoffroy et al (2013a, table 1) to fit the surface temperature response of individual CMIP5 models to an idealised abrupt 4 × CO2 forcing experiment. They showed that the tuned TLM was also able to accurately predict the surface temperature response of each CMIP5 model in the 1% per year CO2 experiment.
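The governing equations can be stepped forward in time with a simple explicit scheme. The sketch below is a minimal illustration, assuming forward-Euler integration with annual time steps and heat capacities in W yr m−2 K−1; the function name `run_tlm` and the parameter values in the usage line are hypothetical placeholders, not the Geoffroy et al (2013a) fits in table 1.

```python
def run_tlm(forcing, c, c0, lam, gamma, dt=1.0):
    """Integrate the two-layer energy-balance model with forward Euler.

    forcing : radiative forcing F (W m-2), one value per time step
    c, c0   : upper- and deep-layer heat capacities (W yr m-2 K-1)
    lam     : climate feedback parameter lambda (W m-2 K-1)
    gamma   : heat exchange coefficient (W m-2 K-1)
    dt      : time step (years)

    Returns lists of the upper-layer anomaly T' and deep-layer anomaly T'_0
    (both at the end of each step) and the TOA imbalance N (at the start of each step).
    """
    t, t0 = 0.0, 0.0  # start from the pre-industrial equilibrium
    upper, deep, imbalance = [], [], []
    for f in forcing:
        n = f - lam * t               # N = F - lambda * T'
        exchange = gamma * (t - t0)   # heat flux from the upper to the deep layer
        t += dt * (n - exchange) / c
        t0 += dt * exchange / c0
        upper.append(t)
        deep.append(t0)
        imbalance.append(n)
    return upper, deep, imbalance


# Usage: an abrupt, constant forcing (7.4 W m-2 is an illustrative value only).
upper, deep, imbalance = run_tlm([7.4] * 150, c=8.0, c0=100.0, lam=1.2, gamma=0.7)
```

With this explicit scheme the cumulative sum of N dt equals the total heat content C T' + C_0 T'_0 exactly, which mirrors the statement above that N is the rate of change of ocean heat content.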
The TLM simulations of total ocean heat uptake are converted into global thermal expansion using the coefficient of expansion, i.e. the sea level rise per unit heat uptake (e.g. Kuhlbrodt and Gregory 2012), estimated for each CMIP5 model by Lorbacher et al (2015). Of the 16 CMIP5 models presented in Geoffroy et al (2013a, 2013b), we ultimately make use of 14 (table 1). One model is eliminated because no expansion coefficient is available (GFDL-ESM2M) and another because it is an outlier in simulated thermal expansion (FGOALS-s2). Essentially, we use the largest possible TLM emulator ensemble, within the constraint of the parameter fits that are available. Of the 21 CMIP5 models used in the IPCC AR5 sea level projections ('AR5 ensemble'), eleven are common to the TLM emulator ensemble (table 2). The 21 CMIP5 models that constitute the AR5 ensemble were also selected on the basis of data availability, as discussed in Church et al (2013). In summary, there are no selection criteria for models to be included in the TLM and AR5 ensembles: all available models are used (noting that the AR5 ensemble projections were subject to the CMIP5 data availability at the time).

Figure 1. A physically-based emulator: the two-layer energy-balance model. The model consists of an upper ocean layer, which represents surface temperature and the atmosphere, and a deep ocean layer. F is the radiative forcing at top-of-atmosphere, λ is the climate feedback parameter and γ is the heat exchange coefficient. T' and T'_0 represent temperature perturbations from a pre-industrial equilibrium state. Prognostic variables are indicated in black and tuneable parameters in red.

Table 1. A summary of the TLM parameter fits for CMIP5 models reported in Geoffroy et al (2013a). These parameter settings constitute the 14-member TLM emulator ensemble used for the projections presented in figures 5 and 6.
In keeping with this philosophy, all CMIP5 model simulations available to 2300 are used in the assessment of the TLM ensemble and a subset of these are used to assess TLM model performance (where both TLM simulations and CMIP5 simulations are available, as indicated in table 2).
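The conversion of TLM heat uptake into thermal expansion described at the start of this section amounts to integrating N over time and over Earth's surface, then multiplying by the expansion coefficient. A minimal sketch, assuming N in W m−2 at annual steps; the function name is hypothetical and the coefficient of 0.12 m per YJ in the usage line is a placeholder, not one of the Lorbacher et al (2015) values.

```python
EARTH_SURFACE_AREA_M2 = 5.1e14          # approximate surface area of Earth
SECONDS_PER_YEAR = 365.25 * 24 * 3600
JOULES_PER_YJ = 1e24                    # 1 yottajoule


def thermal_expansion(imbalance, expansion_coefficient, dt_years=1.0):
    """Cumulative global thermal expansion (m) from a global-mean N series.

    imbalance             : N (W m-2) per time step; in the TLM all of it enters the ocean
    expansion_coefficient : sea level rise per unit heat uptake (m per YJ), model-specific
    """
    heat = 0.0
    expansion = []
    for n in imbalance:
        heat += n * EARTH_SURFACE_AREA_M2 * SECONDS_PER_YEAR * dt_years  # joules
        expansion.append(expansion_coefficient * heat / JOULES_PER_YJ)
    return expansion


# Usage: 1 W m-2 sustained for a century with a placeholder coefficient.
te = thermal_expansion([1.0] * 100, expansion_coefficient=0.12)
```

A century at 1 W m−2 corresponds to roughly 1.6 YJ of heat uptake, so with this placeholder coefficient the expansion is about 0.19 m.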

Estimates of radiative forcing (F)
We consider three climate change scenarios from the extended Representative Concentration Pathways framework (Meinshausen et al …): RCP2.6, RCP4.5 and RCP8.5. Our methods to convert these to radiative forcings are summarised in table 3. Unlike Nauels et al (2017), we determine the largest radiative forcings, those from CO2 and sulphate aerosols, specifically for each CMIP5 model. Following the approach of Stevens (2015), the magnitude of the aerosol response for each model is related to its present-day aerosol forcing, as estimated by Forster et al (2013) or by ourselves (for models not included in Forster et al 2013). Radiative forcing from land-use change is based on Annex II of the AR5 (IPCC 2013), adjusted from a 1750 baseline to the 1850 baseline used here, except that for HadGEM2-ES we use −0.4 W m−2 (diagnosed by Andrews et al 2017); the present-day land-use forcing persists to 2300. Historical changes in total solar irradiance (TSI), including both the 11 year solar cycle and longer-term changes, are specified following Solanki and Krivova (2003). For the future period, the mean TSI of the final solar cycle is applied as a time-invariant value.
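The non-CO2 methods in table 3 can be sketched as follows. The methane and nitrous oxide functions implement the simplified expressions tabulated by Ramaswamy et al (2001); the aerosol function implements the linear-log form of Stevens (2015), F = −(αE + β ln(1 + E/Q_n)), together with one hypothetical way of tying its magnitude to a model's present-day aerosol forcing (fix the α/β ratio, then solve for β). The baseline concentrations, Q_n and the α/β ratio are placeholder assumptions, not the values used in this study.

```python
import math

# Placeholder baseline concentrations (ppb); not the values used in the study.
CH4_BASE_PPB = 790.0
N2O_BASE_PPB = 275.0
NATURAL_SO2_TG = 76.0  # assumed natural SO2 emission rate Q_n (Tg/yr)


def _overlap(m_ppb, n_ppb):
    """CH4-N2O band-overlap term f(M, N), concentrations in ppb."""
    mn = m_ppb * n_ppb
    return 0.47 * math.log(1 + 2.01e-5 * mn ** 0.75 + 5.31e-15 * m_ppb * mn ** 1.52)


def ch4_forcing(m_ppb, m0=CH4_BASE_PPB, n0=N2O_BASE_PPB):
    """Methane forcing (W m-2) relative to the baseline."""
    return (0.036 * (math.sqrt(m_ppb) - math.sqrt(m0))
            - (_overlap(m_ppb, n0) - _overlap(m0, n0)))


def n2o_forcing(n_ppb, m0=CH4_BASE_PPB, n0=N2O_BASE_PPB):
    """Nitrous oxide forcing (W m-2) relative to the baseline."""
    return (0.12 * (math.sqrt(n_ppb) - math.sqrt(n0))
            - (_overlap(m0, n_ppb) - _overlap(m0, n0)))


def _aerosol_shape(so2_tg, alpha_over_beta):
    """Linear-log shape (alpha/beta) * E + ln(1 + E / Q_n)."""
    return alpha_over_beta * so2_tg + math.log(1 + so2_tg / NATURAL_SO2_TG)


def aerosol_forcing(so2_tg, present_so2_tg, present_forcing, alpha_over_beta=0.003):
    """Aerosol forcing (W m-2) equal to present_forcing at present-day emissions.

    The alpha/beta ratio is a hypothetical placeholder; beta is solved for so that
    the curve passes through the model's (negative) present-day aerosol forcing.
    """
    beta = -present_forcing / _aerosol_shape(present_so2_tg, alpha_over_beta)
    return -beta * _aerosol_shape(so2_tg, alpha_over_beta)
```

Scaling a fixed functional shape to each model's present-day value is one simple way of making the aerosol forcing model-specific while keeping the future time-evolution tied to the RCP emissions.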

CMIP5 climate model data
To evaluate the TLM ensemble, we use CMIP5 simulations of global mean surface air temperature ('tas') and global thermal expansion ('zostoga') for RCP2.6, RCP4.5 and RCP8.5 (table 2). Zostoga is an integral ocean quantity and can therefore be subject to climate model drift associated with insufficient model spin-up and/or deficiencies in the representation of the global energy budget (Hobbs et al 2016). Following the approach of recent studies of Earth's energy budget (… McNeall 2014, Hobbs et al 2016), we drift-correct the zostoga time series for the RCP experiments using a linear fit to the available pre-industrial control ('piControl') data.
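The linear drift correction can be sketched as an ordinary least-squares fit to the control run, subtracted from the experiment. This is a minimal illustration under the assumption that the experiment's times have already been mapped onto the piControl time axis (i.e. the branch point is aligned); the function names and synthetic data are hypothetical.

```python
def linear_fit(times, values):
    """Ordinary least-squares slope and intercept of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    slope = cov / var
    return slope, mean_v - slope * mean_t


def drift_correct(exp_times, exp_values, ctrl_times, ctrl_values):
    """Subtract the linear piControl drift from an experiment time series.

    Assumes exp_times are expressed on the piControl time axis.
    """
    slope, intercept = linear_fit(ctrl_times, ctrl_values)
    return [v - (slope * t + intercept) for t, v in zip(exp_times, exp_values)]


# Synthetic example: a 1 mm/yr drift plus a forced signal of 1 cm/yr.
ctrl_t = list(range(0, 500))
ctrl_z = [0.001 * t + 0.05 for t in ctrl_t]
exp_t = list(range(100, 200))
exp_z = [0.001 * t + 0.05 + 0.01 * (t - 100) for t in exp_t]
corrected = drift_correct(exp_t, exp_z, ctrl_t, ctrl_z)
```

Subtracting the full fitted line also removes a constant offset; this has no effect once anomalies are taken relative to a reference period such as 1986-2005.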
In addition to the individual CMIP5 model simulations described above, this study makes use of AR5 ensemble projections of GMST and TE for RCP2.6, RCP4.5 and RCP8.5. These data constitute a central estimate and 90% confidence interval computed for the 21 CMIP5 models that formed the basis of the 21st century sea level projections presented in IPCC AR5 (Church et al 2013). We refer to these data as the 'AR5 ensemble' (table 2). The source data files are freely available from www.climatechange2013.org/report/full-report/ (see the chapter 13 supplementary data files).

Emulator results
In this section we compare TLM emulated time series with those CMIP5 models that have data available to 2300 for the three RCP scenarios considered in this study. For each variable, we compute the TLM 'discrepancy' as the difference between the CMIP5 model and TLM time series. In general, the discrepancy varies by model and scenario. At each time step, we compute the standard deviation across all discrepancy time series in order to factor this into the uncertainty of our emulated ensemble projections, which are presented in section 4. The standard deviations of the emulated ensemble and the discrepancy are added in quadrature for each variable and RCP scenario, under the assumption that the two terms are uncorrelated. We note that the primary objective of our emulated ensemble is to represent the behaviour of the AR5 ensemble as a whole, since the ensemble statistics (5%-95% range and median) are the basis of the AR5 projections.
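The discrepancy spread can be sketched as follows: subtract each TLM series from its CMIP5 counterpart, then take the standard deviation across models at every time step. This assumes the sample standard deviation about the mean discrepancy (the text does not specify which estimator is used) and that all series share a common time axis; the dictionary layout and model names are hypothetical.

```python
import statistics


def discrepancy_sd(cmip5, tlm):
    """Per-time-step spread of CMIP5-minus-TLM discrepancies across models.

    cmip5, tlm : dicts mapping model name -> list of values on a common time axis
    Returns the sample standard deviation across models at each time step.
    """
    models = sorted(set(cmip5) & set(tlm))  # only models with both series
    disc = [[a - b for a, b in zip(cmip5[m], tlm[m])] for m in models]
    return [statistics.stdev([d[k] for d in disc]) for k in range(len(disc[0]))]


sd = discrepancy_sd({"A": [1.0, 2.0], "B": [0.0, 0.0]},
                    {"A": [0.0, 1.0], "B": [1.0, 1.0]})
```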
The TLM generally does a very good job of emulating the CMIP5 model GMST response under RCP8.5 (figures 3(a) and (d)). For this scenario, changes in GMST simulated by the CMIP5 models are dominated by the forced response. The discrepancy is less than 10% of the forced signal and varies substantially across models. There is no obvious systematic bias in the discrepancy, which falls fairly evenly either side of the zero line for the seven CMIP5 models with data available to 2300. The worst performance is for IPSL-CM5A-LR, where the discrepancy exceeds 1.5 °C after 2250. The discrepancy changes sign for some models, which leads to a local minimum in its standard deviation around 2150 and more rapid growth thereafter (figure 3(d), shaded region).
Since the TLM parameter tunings were carried out using 4 × CO2 simulations, it is perhaps not surprising that, of the RCPs presented here, the TLM performs least well for RCP2.6, a scenario with strong mitigation (figures 3(c) and (f)). For this scenario, the CMIP5 time series exhibit substantial departures from the forced response arising from internal variability on annual-to-multi-decadal timescales. There is an overall tendency for the TLM to overestimate the temperature change seen in the corresponding CMIP5 model simulations, with the TLM discrepancy exceeding 50% of the peak temperature change for some CMIP5 models (figure 3(f)). The standard deviation of the discrepancy term grows initially and then stabilises somewhat during the 22nd century.

Table 2. Summary of CMIP5 models and data used in this study. Xs in the RCP columns indicate CMIP5 models with data available to 2300 that are used to assess the TLM simulations and compute the model discrepancy (figures 3 and 4). Circles indicate individual CMIP5 model simulations that are used in comparisons with the AR5 and TLM ensembles (figures 5 and 6). An X in the final two columns indicates whether the CMIP5 model (or its tuning) is used in the TLM and AR5 ensembles. The shaded cells indicate those CMIP5 models that are common to both the TLM and AR5 ensembles.
For RCP4.5, the unforced GMST variations associated with internal variability become larger relative to the forced response in the CMIP5 time series (figures 3(b) and (e)). These variations are absent from the TLM emulated time series, since the simple climate model has no representation of internal variability. The magnitudes of the TLM discrepancy time series are similar to those for RCP8.5 and generally represent only a small fraction of the forced temperature change signal. Again, there are substantial differences in the performance of the TLM emulator for individual CMIP5 models. However, there is an overall tendency for the TLM to slightly overestimate the GMST anomaly, and this bias exceeds 0.5 °C for some models during the final century of the simulations. The standard deviation of the TLM discrepancy grows approximately linearly with time (figure 3(e), shaded region).
As with surface temperature, the TLM generally does a good job of emulating CMIP5 time series of thermal expansion for RCP8.5 (figures 4(a) and (d)). The discrepancy time series show little evidence of any systematic bias, and their standard deviation grows approximately linearly with time, approaching 0.2 m by 2300 (figure 4(d)). Although still a small fraction of the forced response, the discrepancy is larger in relative terms than for global surface temperature. Variations in thermal expansion arising from internal variability in the CMIP5 model simulations are much smaller than those seen for surface temperature.
Under RCP4.5 the TLM continues to do a reasonable job of emulating the individual CMIP5 models. Unlike surface temperature, there is little evidence of any systematic bias in the TLM emulated time series of thermal expansion (figure 4(e)). The standard deviation of the discrepancy time series grows linearly with time.
While surface temperature showed an increase in the size of the discrepancy relative to the forced signal for RCP2.6, this does not appear to be the case for thermal expansion (figures 4(c) and (f)). However, for this scenario there is some evidence of a systematic bias in the TLM time series, which tend to overestimate the forced signal for thermal expansion. The IPSL model shows particularly poor agreement between the TLM and CMIP5 time series, even though it was one of the models with better agreement under RCP8.5. The standard deviation of the discrepancy grows approximately linearly over time and reaches about 5 cm by 2300 (about 20% of the mean forced response).
In general, the TLM performs better for the higher emissions scenarios, where F is dominated by CO2 forcing (figures 3 and 4). The comparison clearly illustrates the larger internal variability for GMST than for TE in the CMIP5 model simulations, which is most apparent for RCP2.6. Despite the relatively small sample sizes available, the TLM appears to systematically overestimate: (i) surface temperature change under RCP4.5 and RCP2.6; and (ii) thermal expansion under RCP2.6. The fact that both quantities are overestimated for RCP2.6 suggests that the net radiative forcing applied to the TLM simulations may be overestimated for this scenario. We find no correlation between the time-averaged discrepancy values for GMST and TE in any of the RCP scenarios, suggesting no systematic errors in the TLM parameter fits to the CMIP5 models in general.
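The correlation test on time-averaged discrepancies can be sketched as a Pearson correlation, across models, between each model's mean GMST discrepancy and its mean TE discrepancy. This is an illustrative sketch; the function names and input layout are hypothetical, and with only a handful of models per scenario the correlation estimate is itself very uncertain.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def time_mean_correlation(gmst_disc, te_disc):
    """Correlate, across models, the time-averaged GMST and TE discrepancies.

    gmst_disc, te_disc : dicts mapping model name -> discrepancy time series
    """
    models = sorted(set(gmst_disc) & set(te_disc))
    gmst_means = [sum(gmst_disc[m]) / len(gmst_disc[m]) for m in models]
    te_means = [sum(te_disc[m]) / len(te_disc[m]) for m in models]
    return pearson(gmst_means, te_means)


r = time_mean_correlation({"A": [0.1, 0.3], "B": [0.5, 0.7], "C": [-0.2, 0.0]},
                          {"A": [0.02, 0.02], "B": [0.06, 0.06], "C": [-0.01, -0.01]})
```

A systematic error in a model's parameter fit (for example, too weak a feedback) would tend to push both discrepancies in the same direction, producing a positive correlation; its absence is what the text interprets.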

Emulator ensemble projections
In this section we present comparisons of projections of GMST change and TE among three data sources. The first is the ensemble projections of the 21 CMIP5 models used for the 21st century GMSL projections reported in IPCC AR5 (Church et al 2013). The second is the individual CMIP5 simulations, comprising all models with data available to 2300 for each RCP. The third is our TLM emulated ensemble projections to 2300. This ensemble is always based on the emulation of the same 14 CMIP5 models (table 1). The standard deviations of the ensemble spread and the TLM discrepancy (figures 3 and 4) are added in quadrature, under the assumption that these uncertainties are independent. All ensemble spreads are presented as 90% confidence intervals, assuming a normal distribution, following Church et al (2013).
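The combination of ensemble spread and discrepancy into a 90% interval can be sketched for a single time step as below. Under the normal assumption, the 5th to 95th percentile range is the central estimate ±1.645 standard deviations. The sketch assumes the central estimate is the ensemble mean (equal to the median for a normal distribution); the function name is hypothetical.

```python
import math
import statistics

Z_90 = 1.645  # half-width of a two-sided 90% normal interval, in standard deviations


def ensemble_interval(member_values, discrepancy_sd):
    """Central estimate and 90% interval at a single time step.

    member_values  : emulated values from each TLM ensemble member
    discrepancy_sd : standard deviation of the TLM discrepancy at this time step
    """
    centre = statistics.mean(member_values)
    spread_sd = statistics.stdev(member_values)
    # Independent uncertainties add in quadrature.
    total_sd = math.sqrt(spread_sd ** 2 + discrepancy_sd ** 2)
    return centre, centre - Z_90 * total_sd, centre + Z_90 * total_sd


centre, low, high = ensemble_interval([1.0, 2.0, 3.0], discrepancy_sd=0.75)
```

Here the member spread (σ = 1.0) and discrepancy (σ = 0.75) combine to σ = 1.25, so the interval half-width is about 2.06.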
The AR5 and TLM ensemble projections of GMST change over the 21st century show remarkably good agreement in both the central estimate and the 90% confidence interval. The central estimate of GMST change for the TLM is slightly higher than that of the AR5 ensemble for RCP2.6, but results are almost identical for the other two scenarios. The agreement between AR5 and TLM results for RCP2.6 is better than we might have expected from the analysis of TLM model discrepancy in the previous section (figure 3(f)). This suggests that, over the 21st century, the TLM has less overall bias relative to the CMIP5 model simulations than the model subset considered in the previous section would imply.
Beyond 2100, the TLM ensemble spread encapsulates most of the individual CMIP5 model time series available for the 22nd and 23rd centuries. However, the TLM ensemble appears to underestimate both the reduction of GMST under RCP2.6 and the deceleration of temperature rise under RCP4.5. The RCP8.5 scenario appears to show very good agreement in the time-evolution of GMST between the TLM ensemble and the individual CMIP5 simulations. At 2300, the central estimates (90% confidence intervals) of the TLM simulations of GMST change are …, … (1.5-3.9) and 8.6 °C (5.1-12) for RCP2.6, RCP4.5 and RCP8.5 respectively.

Table 3. Summary of the methods used to convert the RCP scenarios to radiative forcings. 'Model-specific' indicates whether the forcing is determined individually for each CMIP5 model.
… Forster et al (2013). Model-specific: yes.
Methane (CH4): simplified expressions of radiative forcing detailed in Ramaswamy et al (2001). Model-specific: no.
Nitrous oxide (N2O): simplified expressions of radiative forcing detailed in Ramaswamy et al (2001). Model-specific: no.
Aerosols: linear-log relationship between SO2 emissions and forcing, e.g. Stevens (2015). Model-specific: yes.

The AR5 and TLM ensemble projections of sea level change from TE over the 21st century also show very good agreement. The central estimates are very similar for all three scenarios, with slightly less TE in the TLM ensemble under RCP4.5 and RCP8.5. The widths of the 90% confidence intervals for the two ensembles are generally similar towards the end of the 21st century, but results vary somewhat by scenario.
All of the individual CMIP5 model time series available over the 22nd and 23rd centuries are encapsulated by the 90% confidence interval of the TLM ensemble. At first glance, the TLM appears to overestimate TE relative to the individual CMIP5 model simulations for RCP2.6. However, closer inspection reveals that the CMIP5 models that show the largest expansion over the 21st century are not available beyond 2100. The model discrepancy term plays a larger role in the overall TLM ensemble spread for TE than it does for GMST change, and after 2100 it becomes slightly larger than the inter-model spread in climate change response for all scenarios. At 2300, the central estimates (90% confidence intervals) of the TLM simulations of TE relative to the 1986-2005 baseline period are 0.26 m (0.14-0.38), 0.48 m (0.28-0.67) and 1.2 m (0.84-1.6) for RCP2.6, RCP4.5 and RCP8.5 respectively.
Overall, the TLM ensemble shows very good agreement with the AR5 ensemble, in terms of both the central estimates and uncertainty, and therefore a smooth transition from 21st century projections to longer time horizons. On longer timescales both the central estimates and 90% confidence intervals for both GMST change and TE compare favourably with the CMIP5 model simulations that are available to 2300. We note that there is some overestimation of GMST and TE for RCP2.6 (and to some extent RCP4.5 for GMST). In practice, this means that sea level projections based on the emulated time series presented here will tend to overestimate the total sea level rise compared to the CMIP5 models. Despite this shortcoming, the TLM simulations perform well and can provide a useful basis for extending CMIP5 simulations and exploring sea level projections on multi-century timescales.
The time-evolution of GMST and TE across the RCPs presented here serves as a useful illustration of how different aspects of the climate system respond to changes in GHG emissions. GMST responds relatively quickly to the reduction in emissions under RCP2.6 and RCP4.5 (figure 5). However, it is only under RCP2.6 that we see any reduction in GMST beyond 2100 (in either the TLM or the CMIP5 model simulations). In contrast, time series of TE show sea level rise from this component persisting to 2300 under all RCP scenarios (figure 6). This indicates that Earth's energy imbalance also remains positive for all RCP scenarios (figure S1, available at stacks.iop.org/ERL/13/084003/mmedia), and emphasises the need to monitor this aspect of ongoing climate change (von Schuckmann et al 2016).
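The contrast between GMST and Earth's energy imbalance can be reproduced qualitatively with the TLM itself. The sketch below drives the model with a hypothetical forcing pathway that rises, declines and then levels off, loosely mimicking the shape (not the values) of a strong-mitigation scenario; the parameters are illustrative placeholders rather than any table 1 fit. T' peaks and falls, while N = F − λT' stays positive after the peak because the deep layer, still far from equilibrium, continues to take up heat.

```python
def run_tlm(forcing, c, c0, lam, gamma, dt=1.0):
    """Forward-Euler two-layer model; returns T', T'_0 and N = F - lambda * T'."""
    t, t0 = 0.0, 0.0
    upper, deep, imbalance = [], [], []
    for f in forcing:
        n = f - lam * t
        exchange = gamma * (t - t0)
        t += dt * (n - exchange) / c
        t0 += dt * exchange / c0
        upper.append(t)
        deep.append(t0)
        imbalance.append(n)
    return upper, deep, imbalance


# Hypothetical forcing (W m-2): 100 yr ramp-up, 50 yr decline, then held constant.
forcing = ([3.0 * k / 100 for k in range(100)]
           + [3.0 - 0.02 * k for k in range(50)]
           + [2.0] * 150)
upper, deep, imbalance = run_tlm(forcing, c=8.0, c0=100.0, lam=1.2, gamma=0.7)
peak = upper.index(max(upper))
print(f"peak T' = {max(upper):.2f} K in year {peak}; final T' = {upper[-1]:.2f} K")
print(f"minimum N after the peak = {min(imbalance[peak:]):.2f} W m-2")
```

In this simple model the surface layer adjusts on a timescale of a few years, whereas the deep layer adjusts over centuries; as long as T'_0 lags behind, N remains positive and the ocean keeps expanding.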

Discussion and conclusions
We have presented a framework for extending the 21st century CMIP5 model projections of GMST and TE to 2300 as a stepping stone towards extending the process-based sea level projections presented in IPCC AR5 (Church et al 2013). Our method makes use of a simple TLM that has been tuned to emulate the behaviour of a subset of 14 CMIP5 models to arrive at an emulated ensemble. The TLM formulation uses time-invariant values for the climate feedback parameter (λ) and the coefficient of heat exchange between the ocean layers (γ). As discussed by Held et al (2010), this model can be modified to include an effectively time-varying value of λ by introducing an efficacy factor for deep ocean heat uptake (…, Geoffroy et al 2013b). However, Geoffroy et al (2013b) showed that the impact of including ocean heat uptake efficacy is small for the timescales considered here, so it is unlikely to have any substantive impact on our results.
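For reference, a commonly used way of writing the efficacy modification (e.g. Geoffroy et al 2013b; the symbol ε is introduced here for illustration) scales only the heat exchange term in the upper-layer equation, leaving the deep-layer equation unchanged:

$$C\,\frac{dT'}{dt} = F - \lambda T' - \varepsilon\gamma\,(T' - T'_0)$$

$$C_0\,\frac{dT'_0}{dt} = \gamma\,(T' - T'_0)$$

$$N = C\,\frac{dT'}{dt} + C_0\,\frac{dT'_0}{dt} = F - \lambda T' - (\varepsilon - 1)\,\gamma\,(T' - T'_0)$$

Writing N = F − λ_eff T' gives λ_eff = λ + (ε − 1)γ(T' − T'_0)/T', which changes over time as the temperature difference between the layers evolves. Setting ε = 1 recovers the formulation used in this study.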
Despite the simplicity of the TLM used here, compared to other efficient climate models (e.g. MAGICC and WASP, section 1), we are able to achieve a good emulation of GMST and TE compared to the CMIP5 ensemble over the 21st century and the individual CMIP5 model simulations available to 2300. We note that our emulated ensemble projections tend to slightly overestimate the magnitude of both TE and GMST rise for RCP2.6, which suggests that our estimates of net radiative forcing may be too high for this scenario. Projections of GMSL rise based on these simulations may also tend to be similarly overestimated.
The model simulations presented here provide a clear illustration of the differing responses of GMST and Earth's energy imbalance (for which TE is a proxy) to changes in GHG emissions. While GMST responds relatively quickly to changes in emissions, showing a negative trend post-2100 under RCP2.6, TE continues to rise under all scenarios. This result adds weight to recent calls to monitor Earth's energy imbalance as one of the primary metrics of ongoing climate change (von Schuckmann et al 2016). Even under RCP2.6, GMST remains substantially elevated at 2300, suggesting that it may take many more centuries before GMST returns to present-day levels, while TE is practically irreversible (Solomon et al 2009, Bouttes et al 2013).