Global temperature definition affects achievement of long-term climate goals

The Paris Agreement on climate change aims to limit ‘global average temperature’ rise to ‘well below 2 °C’ but reported temperature depends on choices about how to blend air and water temperature data, handle changes in sea ice and account for regions with missing data. Here we use CMIP5 climate model simulations to estimate how these choices affect reported warming and carbon budgets consistent with the Paris Agreement. By the 2090s, under a low-emissions scenario, modelled global near-surface air temperature rise is 15% higher (5%–95% range 6%–21%) than that estimated by an approach similar to the HadCRUT4 observational record. The difference reduces to 8% with global data coverage, or 4% with additional removal of a bias associated with changing sea-ice cover. Comparison of observational datasets with different data sources or infilling techniques supports our model results regarding incomplete coverage. From high-emission simulations, we find that a HadCRUT4-like definition means higher carbon budgets and later exceedance of temperature thresholds, relative to global near-surface air temperature. Warming of 2 °C is delayed by seven years on average, to 2048 (2035–2060), and the CO2 emissions budget for a >50% chance of <2 °C warming increases by 67 GtC (246 GtCO2).


Introduction
Reflecting the 90%-100% consensus among relevant research [1,2], the 5th Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR5) stated that 'warming of the climate system is unequivocal' and 'It is extremely [95%-100%] likely that human influence has been the dominant cause of the observed warming since the mid-20th century' [3]. Such scientific findings can inform policy responses in concert with other factors such as risk aversion, discounting of the future and assessments of the severity of future climate impacts. The Paris Agreement of the United Nations Framework Convention on Climate Change (UNFCCC) Article 2.1(a) expresses a long-term goal of: 'Holding the increase in the global average temperature to well below 2 °C above pre-industrial levels and pursuing efforts to limit the temperature increase to 1.5 °C above pre-industrial levels, recognizing that this would significantly reduce the risks and impacts of climate change'.
However, 'global average temperature' is not precisely defined, and achievement of the Agreement's goal may depend on which definition and which measurement techniques are used. A related concept is that of a carbon budget, the allowable cumulative carbon dioxide (CO2) emissions consistent with a specified level of peak warming with a particular probability [4][5][6].
The IPCC 5th Assessment Report (AR5) assessed carbon budgets for various levels of warming in billions of tonnes of carbon (GtC) or of carbon dioxide (GtCO2) based on projections of global near-surface air temperature change, which we refer to as 'global-tas', where tas means 'temperature, air, at surface', from complex Earth System Models (ESMs). In general, climate modelling studies use global-tas, whereas observational records typically combine non-global coverage of near-surface air temperature over land with sea-surface temperature (SST) over oceans. As stakeholders are likely to have diverse interpretations of what global average temperature refers to, here we provide carbon budgets for different definitions of global average temperature, including definitions consistent with current observational products.
Three main factors contribute to differences in 'global average temperature' change between global-tas and observational records. Firstly, there are regions with missing data that may not warm at the global-mean rate. For example, the Arctic is now rapidly becoming warmer and wetter [7], but much of it is commonly excluded due to lack of long-term data [8]. Secondly, under CO2-driven global warming, modelled near-surface air temperatures warm more than SSTs [9]. Finally, data providers must decide how to account for changes in sea ice. There may be a change from reporting estimated near-surface air temperatures to SSTs where ice has retreated. In the HadCRUT4 dataset [10] this approach probably results in an artificially low reported warming compared with the air warming due to features of the normalisation procedure.
We refer to issues related to missing data as being due to 'masking', and the other two factors together as 'blending', specifically 'air-sea blending' and 'sea-ice blending'.
One early study accounted for the masking and air-sea blending issues [11], and some studies have accounted for masking but this is not universal. Recently, it was shown that over 1861-1880 to 2000-2009, modelled global-tas increased 24% more than a HadCRUT4-like blended-masked estimate [12]. Current observed temperature records should therefore exceed 2 °C later than global-tas, implying a larger carbon budget if compliance were assessed using one of them. Here we extend this prior work by (i) reporting results to 2099, (ii) calculating carbon budgets using IPCC techniques, (iii) accounting for realistic potential future data coverage and (iv) applying blending and masking to a low-emission scenario. In particular, the addition of a low-emission scenario allows us to determine to what extent temperature definitions matter if policymakers choose to take strong mitigation action.
Future blending-masking biases may change relative to the past because of increased modern data coverage: indeed, the blending-masking bias under transient warming with 2000-2009 data coverage was estimated to be 15% instead of 24% [12]. Furthermore, with strong mitigation, sea-ice cover would be expected to stabilise before 2100, suppressing the future sea-ice blending bias [13]. In addition, the long-term warming pattern may differ from the historical pattern, leading to a different effect of coverage bias [14][15][16].

Methods
We consider two emission scenarios from the Coupled Model Intercomparison Project, phase 5 (CMIP5): the low emissions Representative Concentration Pathway 2.6 (RCP2.6 [17,18]) and the high emissions RCP8.5 [19]. Among CMIP5 scenarios, only RCP2.6 has a substantial probability of <2 °C warming so we use it as representative of a world of strong mitigation. This allows us to estimate shifts in the probability of compliance with Paris targets in such a world, and to determine whether the magnitude of blending and masking biases should change substantially in the future. Meanwhile, RCP8.5 is used to estimate carbon budgets in a manner that is comparable with a set reported by the IPCC 5th Assessment Report. Note that we report decadal temperature changes relative to 1861-1880 to include simulations beginning in 1861 and avoid major volcanic eruptions. Supplementary figures 1 and 2 available at stacks.iop.org/ERL/13/054004/mmedia further justify the choice of these reference periods.
We process CMIP5 simulations on a 1° × 1° lat-lon grid using the Cowtan et al (2015) [20] algorithm and assuming that 2005-2014 geographic data coverage is maintained in future. This is done by regridding the HadCRUT4 historical coverage up to December 2014 to 1° × 1° and extending this coverage to December 2099 in the following fashion. For each calendar month, coverage is allowed if data are reported at that location for that calendar month in more than 5 years from 2005-2014 inclusive. Mapping at 1° × 1° instead of 5° × 5° does not affect reported global temperature but keeps spatial information that may be useful in future. We area weight all reporting cells, whereas HadCRUT4 calculates hemispheres separately then averages those: this introduces a minor 1.9% difference in 1861-2016 warming (supplementary figure 3).
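The coverage rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (a calendar month at a location is kept if it reported in more than 5 of the 10 years 2005-2014), not the Cowtan et al (2015) code; the function name `future_coverage_mask`, the nested-list layout `reported[year][month][cell]` and the toy input are all assumptions made for the example.

```python
def future_coverage_mask(reported, min_years_exclusive=5):
    """Build a per-calendar-month coverage mask from yearly reporting flags.

    reported[y][m][c] is True if cell c reported data in calendar month m
    of year y (hypothetical layout; y spans the 10 years 2005-2014).
    Returns mask[m][c], True where the number of reporting years
    is strictly greater than min_years_exclusive.
    """
    n_years = len(reported)
    n_months = len(reported[0])
    n_cells = len(reported[0][0])
    mask = []
    for m in range(n_months):
        row = []
        for c in range(n_cells):
            count = sum(1 for y in range(n_years) if reported[y][m][c])
            row.append(count > min_years_exclusive)
        mask.append(row)
    return mask


# Toy input with three cells (illustrative values only):
#   cell 0 reports in every year and month,
#   cell 1 reports in exactly 5 of the 10 years (not "more than 5"),
#   cell 2 reports in 6 years, but only in January (month index 0).
reported = [
    [[True, y < 5, (y < 6 and m == 0)] for m in range(12)]
    for y in range(10)
]
mask = future_coverage_mask(reported)
print(mask[0])  # January
print(mask[1])  # February
```

The resulting monthly mask would then be repeated for every year to December 2099, in line with the assumption that recent coverage is maintained.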
We calculate four temperature series for each simulation, beginning with the widely used 'global-tas'. We then add the effect of SST blending by mixing air temperatures and SSTs before calculating the anomalies, which we call 'air-sea blended'. Next, we add the effect of sea-ice blending by calculating the anomalies in air and ocean temperatures separately before combining them, and call this 'fully blended'. Finally, we restrict coverage to follow the historical or assumed future HadCRUT4-like data availability and call this 'blended-masked'.
We select all CMIP5 simulations that have continuous historical and RCP2.6 or RCP8.5 runs from 1861-2099 inclusive and for which we could obtain the required output fields. These fields are Near-surface Air Temperature (short name 'tas'), Sea Surface Temperature (SST, 'tos'), Sea Ice Concentration ('sic') and Sea Area Fraction ('sftof', see the CMIP5 Standard Output description at http://cmip-pcmdi.llnl.gov/cmip5/data_description.html). Simulations are listed in supplementary tables 1 and 2 and model configurations can be found in table 9.A.1 of AR5 [21]. Each simulation was processed using the Cowtan et al (2015) code and our updated future coverage mask.
Blended temperature at the i,jth grid point, $T_{\mathrm{blend},i,j}$, is obtained using:
$$T_{\mathrm{blend},i,j} = w_{\mathrm{air},i,j}\,T_{\mathrm{air},i,j} + \left(1 - w_{\mathrm{air},i,j}\right) T_{\mathrm{ocean},i,j}$$
where $w_{\mathrm{air},i,j}$ is the fraction of the grid cell from which near-surface air temperatures are taken, $T_{\mathrm{air},i,j}$ refers to the local air temperature 'tas' and $T_{\mathrm{ocean},i,j}$ the local SST 'tos'. Each of these is converted into a temperature anomaly relative to the local baseline of the same type (i.e. air or water). After the local anomalies are calculated, the grid points are averaged with a spherical Earth area weighting. For global-tas, $w_{\mathrm{air}} = 1$ always, while for blended series $w_{\mathrm{air},i,j}$ is the fraction of land plus sea ice within the grid cell. For air-sea blended, a grid cell's $w_{\mathrm{air},i,j}$ is fixed based on the initial sea ice extent whereas for fully blended the sea-ice fraction changes depending on the monthly sea ice concentration.
Carbon threshold exceedance budgets (TEBs) are calculated as in the Technical Summary of IPCC AR5 [22]. Linear interpolation between decadal means is used to compute the diagnosed cumulative CO2 emissions since 1870 to the point that warming exceeds a given temperature threshold. Unlike in ref. [22], only complex ESMs are included in the analysis, with Earth system models of intermediate complexity (EMICs) excluded.
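As a minimal sketch of the blending and averaging steps described above, the Python below combines an air anomaly and an SST anomaly using the air fraction of a cell, then forms an area-weighted mean over a small grid. It is not the Cowtan et al (2015) implementation. The function names, the use of `None` to mark masked cells, and the choice of cos(latitude) at the cell centre as the area weight on a regular lat-lon grid are assumptions made for illustration.

```python
import math


def blend(t_air_anom, t_ocean_anom, w_air):
    """Blend air and ocean anomalies for one cell.

    w_air is the fraction of the cell taken from air temperature
    (1.0 for global-tas; land plus sea-ice fraction for blended series).
    """
    return w_air * t_air_anom + (1.0 - w_air) * t_ocean_anom


def area_weighted_mean(values, lats_deg):
    """Area-weighted mean of values[i][j] with cell-centre latitudes lats_deg[i].

    Cells set to None are treated as missing (masked) and skipped.
    Weights are cos(latitude), an approximation to relative cell area
    on a regular latitude-longitude grid.
    """
    num = 0.0
    den = 0.0
    for row, lat in zip(values, lats_deg):
        w = math.cos(math.radians(lat))
        for v in row:
            if v is not None:
                num += w * v
                den += w
    return num / den


# One cell that is 25% land/ice and 75% open ocean (illustrative numbers):
cell = blend(t_air_anom=2.0, t_ocean_anom=1.0, w_air=0.25)

# Two latitude bands at 0 and 60 degrees; the high-latitude band warms more
# but carries half the weight per cell.
full = area_weighted_mean([[1.0, 1.0], [4.0, 4.0]], [0.0, 60.0])

# Masking the high-latitude band removes its contribution entirely.
masked = area_weighted_mean([[1.0, 1.0], [None, None]], [0.0, 60.0])
print(cell, full, masked)
```

In this toy case the masked average is lower than the full average because the omitted cells warm faster than the rest, which is the sense of the masking bias discussed in the text.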
Reported percentiles correspond to percentiles of the distribution of ESM TEBs for that warming threshold. ESMs (models that can interactively diagnose compatible CO2 emissions with a prescribed concentration pathway) considered here are identified with an asterisk in supplementary table 2.
Figure 1(a) shows the CMIP5 historical-RCP2.6 and historical-RCP8.5 ensemble time series of global-tas. Figure 1(b) shows the blending-masking differences and figure 1(c) the decadal averages of these differences as a function of global-tas for historical-RCP2.6 and panels (d) and (e) the same for historical-RCP8.5. All results shown here use a single simulation from each model, labelled 'r1i1p1' in CMIP5 nomenclature. Results are not sensitive to including the full ensemble (supplementary table 3 and supplementary figure 4).

Effect of temperature definition under low emissions.
In RCP2.6 the air-sea blending bias stabilises in the last ∼70 years of the simulations while figure 1(c) shows that the sea-ice-blending and masking biases increase with global-tas throughout the series, but at a much slower rate than under RCP8.5. This suggests that the temperature stabilisation reduces sea-ice loss and its contribution to reported temperature bias. Similarly, the error bars in figure 1(b) show that uncertainty introduced by sea ice change is smaller under RCP2.6.
However, temperature bias still continues to grow with time in RCP2.6, and figure 2 demonstrates that the masking bias component is likely dominated by the warming at high northern latitudes, which tend to warm much more than the global average and are poorly sampled. Table 1 contains the ensemble median and 5%-95% range for RCP2.6 and RCP8.5 temperature changes over periods spanning the past (1861-1880), present (2007-2016) and future (2090-2099). Under RCP2.6, the percentage of simulations consistent with staying below 2 °C warming increases from 75% for air-sea blended (the same as for global-tas) to 90% for blended-masked. Percentage blended-masked bias is calculated separately for each simulation and the median and ranges of these percentages are reported: the 16% bias for 1861-2016 differs from the 24% reported previously [12] due to the changed time period, available RCP2.6 runs, and because this result does not use the same HadCRUT4 hemisphere weighting.
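The per-simulation bias statistic described above can be illustrated as follows. The sketch computes, for each simulation, the percentage by which global-tas warming exceeds the blended-masked warming, then summarises the ensemble by its median and 5%-95% range. The warming values are invented for illustration, and the linear-interpolation percentile convention is an assumption, since the text does not state which convention was used.

```python
import math
from statistics import median


def percent_bias(tas_change, blended_masked_change):
    """Percentage by which global-tas warming exceeds the blended-masked value."""
    return 100.0 * (tas_change / blended_masked_change - 1.0)


def percentile(values, p):
    """Percentile p (0-100) using linear interpolation between sorted values."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo = math.floor(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


# Hypothetical warming (K) in five simulations: global-tas and blended-masked.
tas = [1.15, 1.10, 1.20, 1.25, 1.05]
blended_masked = [1.00, 1.00, 1.00, 1.00, 1.00]

# Bias is computed separately for each simulation, then summarised.
biases = [percent_bias(t, b) for t, b in zip(tas, blended_masked)]
summary = (median(biases), percentile(biases, 5), percentile(biases, 95))
print("median %.1f%%, 5%%-95%% range %.1f%%-%.1f%%" % summary)
```

Computing the ratio within each simulation before taking percentiles, rather than taking the ratio of ensemble medians, keeps each model's own warming pattern and coverage effect paired together.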
The ensemble suggests a decrease in air-sea blending bias in future, with global-tas warming from 2007-2016 to 2090-2099 just 1.9% (0.5%-3.9%, all bracketed values 5%-95% ensemble range) greater than the air-sea blended value. In addition, improved geographical data coverage relative to most of the historical period reduces the masking bias, although the ice-blending issue remains at a similar magnitude. Overall, 21st century global-tas warming is 10.6% (1.2%-29.7%) greater than the blended-masked estimate. The full-period blending-masking bias from 1861-1880 is approximately 14.9% (5.7%-20.6%).
As masking contributes the most to our blending-masking biases, we assess whether our model-based estimates are realistic by considering observational data records that handle land data in different ways and have different masking biases. These datasets are HadCRUT4 [10], Cowtan & Way [8] and Berkeley Earth [23,24], all of which combine land air temperature data with the HadSST3 ocean product [25,26] and extend over our full period. HadCRUT4 is subject to the full blending-masking bias while Cowtan & Way follows the HadCRUT4 method except that missing regions are infilled by kriging, a statistical method that accounts for spatial covariance in the field and more heavily weights nearby data.
[...] warm at the global-average rate show 12%-18% more warming, the same order of magnitude as the masking biases inferred from CMIP5 simulations, although particularly poor historical coverage over the Antarctic and Southern Ocean means that infilling from neighbouring regions may be inadequate.
Figure 1. [...] ((b) and (d)) The difference for each scenario (as labelled) between the CMIP5 ensemble median blended-masked temperature change and the global tas-only, shown as blended-masked minus tas-only. Each line represents one extra blending or masking factor: blue is ocean-blend only, green is ocean-blend plus sea-ice blend, and red is both blends plus masking for data coverage. On the right of the figure, each point and bar represents the ensemble median and 5%-95% range of each difference for the final decade. ((c) and (e)) Decadal means of the differences from ((b) and (d)) plotted as a function of the global tas-only temperature change relative to 1861-1880 for the labelled scenario.
Table 1. Percentage increase in observed temperature change between selected periods when considering global-tas relative to the blended or blended-masked version. 'Fully blended' includes the sea-ice change effect in addition to air-water warming differences. CMIP5 ensemble median reported with 5%-95% range in brackets. Bottom row shows the number of simulations and percentage of ensemble that show <2 °C difference between 1861-1880 and 2090-2099.

Carbon budgets and temperature thresholds under higher emissions.
IPCC AR5 carbon budgets correspond to cumulative emissions compatible with thresholds of modelled global-tas warming. Carbon budgets for a 1.5 °C or 2 °C warming in any form of blended or blended-masked estimate will therefore be higher than the corresponding IPCC AR5 budget. Budgets given in table 2 correspond to cumulative CO2 emissions since 1870 until the point of exceeding 1.5 °C or 2 °C warming (a threshold exceedance budget or TEB [22,27]) under RCP8.5 (see Methods). The IPCC AR5 results are also included for comparison, and differ somewhat since they include EMIC runs and were reported to the nearest 50 GtCO2, or approximately 13.6 GtC.
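The threshold exceedance calculation described in the Methods (linear interpolation between decadal means up to the point that warming exceeds a threshold) can be sketched as below. This is an illustrative reading of that description rather than the IPCC or authors' code; the function name `interpolate_at_threshold` and all numbers in the example are hypothetical.

```python
def interpolate_at_threshold(warming, quantity, threshold):
    """Value of `quantity` at the first upward crossing of `threshold`.

    warming[k] and quantity[k] are decadal means at matching times, where
    quantity may be cumulative emissions (giving a TEB) or the decade
    midpoint year (giving an exceedance year). Linear interpolation is
    applied between the two decades that bracket the threshold.
    Returns None if the threshold is never exceeded.
    """
    if warming[0] >= threshold:
        return quantity[0]
    for k in range(1, len(warming)):
        t0, t1 = warming[k - 1], warming[k]
        if t0 < threshold <= t1:
            frac = (threshold - t0) / (t1 - t0)
            return quantity[k - 1] + frac * (quantity[k] - quantity[k - 1])
    return None


# Hypothetical decadal-mean series for one simulation.
years = [2015, 2025, 2035, 2045]       # decade midpoints
warming = [1.0, 1.4, 1.8, 2.2]         # K above 1861-1880
cumulative = [500.0, 600.0, 700.0, 800.0]  # GtC since 1870

teb_15 = interpolate_at_threshold(warming, cumulative, 1.5)
teb_20 = interpolate_at_threshold(warming, cumulative, 2.0)
year_20 = interpolate_at_threshold(warming, years, 2.0)
never = interpolate_at_threshold(warming, cumulative, 3.0)
print(teb_15, teb_20, year_20, never)
```

Applying the same function to a global-tas series and to a blended-masked series from the same simulation gives the difference in budget and in exceedance year attributable to the temperature definition.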
For the blended-masked timeseries the 1.5 °C and 2 °C thresholds are reached a median 7-8 years later than for global-tas under this high-emission scenario. This has implications for carbon budgets: the TEB for which 50% of the ESMs have warming below the threshold increases by 53 GtC (194 GtCO2) for 1.5 °C and by 67 GtC (246 GtCO2) for 2 °C.
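For readers converting between the two units used here, the following sketch reproduces the paired values quoted in the text. It assumes the commonly used mass ratio of 44/12 between CO2 and carbon; the text does not state which ratio was applied, and the function names are illustrative.

```python
# Assumed mass ratio of CO2 to C (44 g/mol over 12 g/mol).
C_TO_CO2 = 44.0 / 12.0


def gtc_to_gtco2(gtc):
    """Convert a mass of carbon (GtC) to the equivalent mass of CO2 (GtCO2)."""
    return gtc * C_TO_CO2


def gtco2_to_gtc(gtco2):
    """Convert a mass of CO2 (GtCO2) to the equivalent mass of carbon (GtC)."""
    return gtco2 / C_TO_CO2


print(round(gtc_to_gtco2(53)))     # 194
print(round(gtc_to_gtco2(67)))     # 246
print(round(gtco2_to_gtc(50), 1))  # 13.6
```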
The IPCC carbon budgets were reported relative to 1870, but policymakers require up-to-date guidance to inform discussions related to the Paris Agreement. We therefore also calculate the remaining post-2015 carbon budget based on the ESM ensemble after adjusting for observed warming through 2015 following the approach of Millar et al (2017) [28]. For example, given that HadCRUT4 shows approximately 0.9 °C human-induced warming to 2015, another 0.6 °C results in a total of 1.5 °C. In our ESM simulations, the remaining blended-masked carbon budget with a >66% chance of <0.6 °C warming post-2015 is 246 GtC. However, if Berkeley Earth were to be used, then historical human-induced warming is greater. It would also likely show greater future warming for a given quantity of CO2 emissions, as Berkeley Earth better approximates air-sea blended temperatures rather than the blended-masked approach of HadCRUT4. We estimate the remaining 1.5 °C budget at near 161 GtC in that case (see supplementary table 4 and related discussion).

Discussion
Here we have shown that achievement of the Paris Agreement's long-term goals could depend on the definition of 'global average temperature'. The scientific background to the Paris Agreement was informed directly by the Structured Expert Dialogue [29], which used blended datasets with a wide range of coverage to track warming to date. Our results indicate the potential impact of choosing different types of observational product in the future to measure global temperatures in the context of the Agreement. As it is unlikely that estimates of global mean air temperature, which inherently rely on climate models, will be used, we show how the use of 'blended' observational products would increase the policy-relevant carbon budgets for 2 °C relative to the global air-temperature budgets given by IPCC AR5 (see table 2).
A recent study estimated the post-2015 carbon budget with a >66% chance of achieving a 1.5 °C target at 204 GtC, rather than the 70 GtC implied by IPCC AR5, suggesting that the 1.5 °C target is 'not yet a geophysical impossibility', but likely requires 'strengthened pledges for 2030 followed by challengingly deep and rapid mitigation', i.e. cuts in net anthropogenic emissions (Millar et al 2017 [28]). The Millar et al value differs from IPCC AR5 budgets as it updated [...]
Table 2. Estimated carbon budgets expressed in GtC for various percentiles of the ESM distribution for 1.5 °C or 2 °C global warming thresholds and different definitions of 'global average temperature'. The median and 5%-95% ensemble range of exceedance years are also shown and correspond to the full set of RCP8.5 CMIP5 simulations and not just the ESM subset.
If an alternative observational dataset were used to monitor global temperature in the context of the Paris Agreement then estimates of human-induced warming and compatible carbon budgets would change. For example, the Berkeley Earth product uses infilling techniques with more data sources and a different sea-ice algorithm which should reduce differences with global-tas. It shows almost 20% more human-induced warming than HadCRUT4 through 2015, and hence would reduce post-2015 carbon budgets by around 80 GtC.
Biases associated with incomplete data coverage and the blending of air and water data both suppress reported warming relative to global near-surface air temperatures. Our analysis and results in table 1 indicate that these biases will tend to be smaller in future provided that the improved data coverage of recent decades is maintained. Furthermore, under a scenario of strong mitigation, the differences introduced by the retreat of sea-ice are smaller than under high emissions where sea ice retreat is more pronounced.

Conclusion
We have demonstrated here the importance of a clear understanding of different definitions of global mean temperature with regard to carbon budgets and achievement of long-term climate goals under the Paris Agreement. We propose that the definition of global mean temperature should be physically based, transparent and verifiable in order for stakeholders to have confidence in its value. For a timeseries to be truly 'global', it must account for the incomplete spatial coverage of direct observations, requiring techniques such as those used in Cowtan & Way or Berkeley Earth. It is key that policy-makers state unambiguously how they intend to measure global temperatures in the context of the Paris Agreement to enable the most useful mitigation advice to be provided by the scientific community. If pure observation-based timeseries are used then further efforts for data-recovery in data-sparse regions would help, as would more long-term stations at high latitudes. In addition, the sea-ice blending effect is a non-physical artefact of algorithm design and it should be possible to account for this in future datasets. However, the long-term air-sea blending effect is difficult to verify due to the lack of robust, homogenised and long-term collocated air-SST ocean data, and its lack of measurability may justify the definition of global-average temperature as being an air-sea blended value. Under this definition, potential blending biases are reduced to an equivalent of an apparent 2-3 year delay in exceeding temperature targets, instead of the 7-8 years for a fully blended-masked series.