Uncertainty in regional temperatures inferred from sparse global observations: Application to a probabilistic classification of El Niño

Instrumental records showing increases in surface temperature are some of the most robust and iconic evidence of climate change. But how much should we trust regional temperature estimates interpolated from sparse observations? Here we quantify the uncertainty in the instrumental record by applying multiresolution lattice kriging, a recently developed interpolation technique that leverages the multiple spatial scales of temperature anomalies. The probability of monthly anomalies across the globe is represented by an ensemble, based on HadCRUT4 and accounting for observational and coverage uncertainties. To demonstrate the potential of these new data, we investigate the area-averaged temperature anomalies over the Niño 3.4 region in the equatorial Pacific. Having developed a definition of the El Niño-Southern Oscillation (ENSO) able to cope with probability distribution functions, we classify the ENSO state for each year since 1851. We find that for many years it is ambiguous, from the Niño 3.4 region alone, whether there was an El Niño or not. These years are mainly before 1920, but also just after World War II.


Introduction
Long-term increases in temperatures across the globe are one of the key indicators of climate change. Measurements of air temperature from ground-based stations and sea surface temperature (SST) from ships and buoys in the oceans are combined to generate global surface temperature estimates. Initially, these raw data undergo a quality control process to remove random errors (i.e., observer or instrumental errors). A homogeneity assessment is then carried out to eliminate systematic biases (i.e., nonclimatic signals). Multiple independent data sets have been compiled that treat these errors in different ways. They are commonly known by their acronyms: MLOST by Smith et al. [2008], GISS by Hansen et al. [2010], JMA by Ishii et al. [2005], HadCRUT by Morice et al. [2012], and BEST by Rohde et al. [2013].
In addition to quality control and data homogenization, the deficient spatial coverage has been highlighted as an important issue. It can, for example, lead to an underestimation of the global mean temperature trend through biased sampling of the Arctic [Cowtan and Way, 2014], which is warming particularly quickly. Here we investigate and quantify the impact of coverage uncertainties using HadCRUT4 [Morice et al., 2012], which clearly flags locations where reliable observations are missing (section 2.1). In section 2.2, we apply a novel multiresolution interpolation technique [Nychka et al., 2015] to create a spatially complete ensemble of equally probable realizations of temperatures (section 3). The quantified uncertainties in the resulting ensemble are evaluated against published estimates at the global mean scale (section 3.1). As expected, the uncertainties arising from poorer coverage tend to be greater at the local scale. We illustrate this by looking at the El Niño-Southern Oscillation (ENSO) in section 4.2, using an enhanced definition of El Niño designed for probability density functions (section 4.1).

The HadCRUT4 Data Set
Uncertainties in HadCRUT4 arise from missing observations that deteriorate the computation of a station's monthly average and climatology, as well as from changes in station properties that are left uncorrected (equipment, location, measurement time, and urban heat effects). The spatial coverage of the data varies with the available observations. Even its most complete coverage presents gaps over Africa and the polar regions [Cowtan and Way, 2014].
HadCRUT4 flags any 5∘ latitude by 5∘ longitude grid box as missing data when there are no known observations for that month [Morice et al., 2012]. This is in contrast to other databases that use various methods of spatial interpolation to fill in missing grid boxes. MLOST uses linear spatial interpolation from nearby stations [Smith et al., 2008]. GISS incorporates a linear inverse distance weighting to use information from stations within 1200 km of the prediction location [Hansen et al., 2010]. Kriging can also be used to fill any data gaps based upon either covariances (JMA) [Ishii et al., 2005] or correlations (BEST) [Rohde et al., 2013]. While all these interpolated data sets provide error estimates, they do not benefit from the recent leaps forward made in approaches to quantify uncertainty in interpolations.
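A linear distance taper of the kind used by GISS can be sketched in a few lines. This is a minimal illustration of the weighting idea rather than the operational code; the 1200 km radius is the only detail taken from the text.

```python
import numpy as np

def idw_estimate(obs_values, distances_km, radius_km=1200.0):
    """Linear inverse-distance weighting: nearby stations get more
    weight, tapering linearly to zero at the search radius."""
    d = np.asarray(distances_km, dtype=float)
    v = np.asarray(obs_values, dtype=float)
    w = np.clip(1.0 - d / radius_km, 0.0, None)   # zero beyond the radius
    if w.sum() == 0.0:
        return float("nan")  # no station in range: grid box stays missing
    return float(np.sum(w * v) / np.sum(w))

# Two stations, at 0 km and 600 km from the prediction point:
estimate = idw_estimate([1.0, 3.0], [0.0, 600.0])
```

Note that a grid box with no station inside the radius is simply left missing, which is why such schemes still cannot fill the largest gaps.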

Statistical Approach
Geophysical variables typically exhibit heterogeneous variations at a range of scales [Berliner et al., 2012]. Standard covariance functions used in kriging can often oversimplify the complicated spatial dependence prevalent in variables such as temperature. Moreover, covariance functions with light tails (e.g., Matérn) struggle to capture variation among observations widely separated over the spatial domain [Kleiber and Porcu, 2015], as is the case for temperatures over the globe. Newly developed statistical techniques outperform covariance-based kriging in terms of accuracy and uncertainty quantification over the globe [Chang et al., 2015].
Multiresolution lattice kriging (MRLK) [Nychka et al., 2015] is designed to predict sparsely observed two-dimensional spatial fields. The approach's advantage comes from its use of multiple resolutions to represent spatial dependence at various levels (local and regional). Additionally, it uses sparse approximations of the covariance matrix that ensure computational efficiency. Recent revisions mean that the routine incorporates full spherical domains (https://cran.r-project.org/web/packages/LatticeKrig). At each resolution level, it organizes the nodes (centers) of compactly supported radial basis functions on a geodesic grid of points, using Wendland covariance functions and the great circle distance. The geodesic grid is used instead of a rectangular grid to avoid distortions at the poles. The centers of the basis functions are chosen by considering points on a regular icosahedron. The triangles are then successively subdivided, meaning that the number of basis functions roughly increases by a factor of 4 at each resolution level.
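The two geometric ingredients can be sketched as follows. The k = 2 Wendland form below is one common choice (the LatticeKrig package itself handles the lattice construction), and the 2000 km support radius in the example is purely illustrative.

```python
import numpy as np

def wendland_c2(d):
    """A common Wendland form (compactly supported, C^2 smooth):
    identically zero once the scaled distance d exceeds 1."""
    d = np.asarray(d, dtype=float)
    return np.where(d < 1.0, (1.0 - d)**6 * (35*d**2 + 18*d + 3) / 3.0, 0.0)

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    p1, l1, p2, l2 = np.radians([lat1, lon1, lat2, lon2])
    c = np.sin(p1)*np.sin(p2) + np.cos(p1)*np.cos(p2)*np.cos(l1 - l2)
    return radius_km * np.arccos(np.clip(c, -1.0, 1.0))

# Basis value at a point about 500 km from a node, with an
# illustrative 2000 km support radius set by the lattice spacing:
phi = wendland_c2(great_circle_km(0.0, 0.0, 0.0, 4.5) / 2000.0)
```

The compact support is what makes the covariance matrices sparse: each basis function contributes nothing beyond its support radius.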
We apply MRLK to the HadCRUT4 data set with lattices at three resolutions (Figure 1b), so that the number of basis functions (3366) is greater than the number of grid boxes (2592). This enables us to capture variations from global to regional scales [Katzfuss, 2017]. A common value (8.95) of the spatial autoregression weight (i.e., a measure of the strength of random variations across the field) was used at all levels. The proportions of global variance contained at each resolution (i.e., the distribution of the total variability) are (0.211, 0.6703, 0.1187)^T; that is, two thirds of the temperature variability occurs at scales of around 1000 km. These ratios, as well as the spatial autoregression weight, are determined using the least sparse temperature field (February 1988), since they model underlying physical characteristics common across fields. The spatial prediction also requires the estimation of covariance parameters [Nychka et al., 2015]. These are selected with a profile maximum likelihood method for each spatial field, since these parameters vary across months due to short- and long-term variability in spatial patterns of temperature. Note that we do not consider uncertainties in the model parameters; accounting for them could lead to less confidence in our temperature estimates. This approach allows the statistical properties to change across the grid, say, with latitude [Nychka et al., 2015].
As an illustration, we present results for May 1861 (Figure 1). This is the month in HadCRUT4 with the minimum spatial coverage, less than 10% (Figure 1a). The Wendland compactly supported covariance function is used to construct radial basis functions at multiple levels (Figure 1b). Using the multiresolution basis functions and the observed spatial points, the temperatures are extended to the whole globe (Figure 1c). The resulting pointwise prediction uncertainties can also be computed (Figure 1d). These uncertainties may be large (over 1∘C) far away from the observations but naturally reduce near them. One interesting feature of MRLK is that it "adjusts" individual observations to fit the wider spatial field, since it is not an interpolatory scheme (cf. Figures 1a and 1c). Figure 1e displays the distribution of the deviations of the predicted values from the observed values. These (standardized) deviations follow a standard normal distribution, demonstrating both the appropriateness of the prediction uncertainties (Figure 1d) and the ability of MRLK to make meaningful predictions where data are present.

An Ensemble Sampling Uncertainty in Surface Temperature
A 10,000-member ensemble is created to sample both the spatial and observational uncertainties. It is built from the 100 ensemble members of the sparse HadCRUT4 data set, which sample solely the observational uncertainties [Morice et al., 2012]. For each of these original sparse members, 100 samples are generated from MRLK's own prediction uncertainty. Rather than sampling each grid point's prediction independently, we instead draw realizations conditional on the fitted basis functions. This approach draws strength from the multiresolution aspect of the lattice kriging to effectively represent both large- and small-scale correlation structures in the climate system. Each month is treated independently, for consistency with Morice et al. [2012].
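Sampling conditional on the basis functions amounts to drawing the basis coefficients from their fitted distribution and projecting them onto the grid, so that every draw inherits the spatial correlation encoded in the basis. A minimal sketch with toy stand-ins for the kriging fit (`Phi`, `coef_mean`, and `coef_cov` are hypothetical placeholders, not outputs of the actual MRLK routine):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_fields(Phi, coef_mean, coef_cov, n_samples=100):
    """Draw spatially coherent field realizations by sampling the basis
    coefficients from their fitted distribution, rather than sampling
    each grid point independently. Phi maps coefficients to grid boxes."""
    L = np.linalg.cholesky(coef_cov)              # coef_cov = L @ L.T
    z = rng.standard_normal((len(coef_mean), n_samples))
    coefs = coef_mean[:, None] + L @ z            # coefficient draws
    return Phi @ coefs                            # one column per field

# Toy dimensions: 10 grid boxes, 4 basis functions.
Phi = rng.random((10, 4))
fields = sample_fields(Phi, np.zeros(4), np.eye(4), n_samples=100)
```

Because each column is a full field from a single coefficient draw, neighboring grid boxes sharing basis functions vary together, which is the property the ensemble relies on.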
Additionally, a further 100 members are created by sampling the uncertainty in the MRLK-interpolated median of the original sparse observations [Morice et al., 2012]. These can be compared to the initial 100 ensemble members (which only characterize observational uncertainties and retain spatial gaps) to demonstrate the relative impact of coverage and observational uncertainties.

Uncertainties in Global Mean Temperature
The importance of the sparsity of the instrumental observations has long been recognized, and the uncertainty caused by it has been estimated before [Morice et al., 2012]. The error arising in an estimate of a global (or regional) mean temperature anomaly can be estimated by subsampling a spatially complete data set (such as a reanalysis) at the actual, sparse coverage [Brohan et al., 2006]. Applying that approach to the HadCRUT4 data set, Morice et al. [2012] were able to estimate the relative importance of coverage uncertainty compared to observational uncertainty at the global and hemispheric level (Figure 2b). As a test of our methodology and MRLK prediction uncertainties, we evaluate those same relative contributions in the global mean temperatures of the new ensemble ( Figure 2a). For each year, the median member along with the 95% confidence interval from the two uncertainty estimation approaches is presented. The approach we have adopted (section 3) assumes that uncertainty within the individual observations is correctly sampled by Morice et al. [2012].
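The subsampling idea can be illustrated by comparing a cosine-latitude weighted mean of a complete field with the same mean restricted to observed grid boxes. The grid, field, and observation mask below are hypothetical stand-ins, not HadCRUT4 data.

```python
import numpy as np

def area_mean(field, lats, mask=None):
    """Cosine-latitude weighted mean of a (lat, lon) anomaly field.
    `mask` marks grid boxes with observations (True = observed)."""
    w = np.cos(np.radians(lats))[:, None] * np.ones_like(field)
    if mask is not None:
        w = np.where(mask, w, 0.0)    # drop unobserved boxes
    return float(np.sum(w * field) / np.sum(w))

# Hypothetical 5-degree grid: coverage error is the complete-field
# mean minus the mean restricted to the observed boxes.
lats = np.arange(-87.5, 90.0, 5.0)               # 36 grid-box centers
field = np.full((36, 72), 0.5)                   # a uniform anomaly
mask = np.zeros((36, 72), dtype=bool)
mask[10:26, :] = True                            # observe midlatitudes only
coverage_error = area_mean(field, lats) - area_mean(field, lats, mask)
```

For the uniform field above the two means agree, so the coverage error is zero; with a realistic, polar-amplified field the masked mean is biased, which is precisely the effect being quantified.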
It is evident that the uncertainty arising from the incomplete spatial coverage is substantially larger than that arising solely from the observational errors (Figure 2). This dominance is especially clear prior to the organized global monitoring and collection of weather data that followed the establishment of the World Meteorological Organization in the 1950s. Both methodological approaches result in similar estimates of the respective uncertainty contributions. Unsurprisingly, when the uncertainty itself is larger, the impact of the uncertainty quantification approach also becomes larger. The expanded uncertainty estimates in the early record from the new approach (Figure 2a) are perhaps understandable given that it cannot incorporate the physical covariance structures sampled in the reanalysis-based estimate [Brohan et al., 2006]. Nonetheless, we consider the comparison of the two uncertainty estimates to show that our approach is not unreasonable. In fact, it has the advantage of quantifying the uncertainty at more local scales, as will be demonstrated via application to a commonly analyzed region.

Niño 3.4 Temperature Anomalies
To demonstrate the benefits of the multiresolution lattice kriging ensemble, we investigate its impact on reconstructions of the Niño 3.4 anomalies. The El Niño-Southern Oscillation (ENSO) is the dominant mode of interannual variability in the global climate, with an El Niño being the warm phase of the oscillation [Trenberth, 1997]. Of its many definitions, the most common is based on sea surface temperature anomalies in the Niño 3.4 region of the equatorial Pacific (5∘S-5∘N, 120∘W-170∘W). Therefore, the reconstruction of spatial surface temperature over this region is of particular interest. Although sea level pressure-derived indices exist [Ropelewski and Jones, 1987], SST-based indices are still commonly used [e.g., Banholzer and Donner, 2014; Hartmann et al., 2013]. Before the state of ENSO within the ensemble can be identified, we must first revisit how an El Niño is defined.

Probabilistic El Niño Definition
The traditional definition of ENSO is deterministic, as it requires knowledge of whether the temperature at a time, T_i, exceeds a critical value, T_crit, often 0.5∘C [Trenberth, 1997], as used here. Anomalies greater than this threshold are termed "El Niños" and those less than −T_crit are "La Niñas." ENSO is considered to be in a "neutral" state otherwise. The temperature anomalies in the Niño 3.4 region are often computed with respect to a 30 year moving climatology (updated every 5 years) [Lindsey, 2013]. This traditional definition has not been designed to handle uncertain temperatures. We therefore introduce a more sophisticated definition that can cope with temperature probability density functions (PDFs).
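A moving climatology of this kind can be sketched as follows. The base-period alignment shown (a 30-year window shifting with each 5-year block) is one plausible reading of the convention, not a detail taken from the text.

```python
import numpy as np

def moving_climatology_anomaly(years, values):
    """Anomaly of each annual value relative to a 30-year climatology
    that is updated every 5 years (the base window shifts with each
    5-year block; this alignment is an assumption)."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for i, yr in enumerate(years):
        block = 5 * (yr // 5)               # start of this 5-year block
        lo, hi = block - 14, block + 15     # 30-year base period
        sel = (years >= lo) & (years <= hi)
        out[i] = values[i] - values[sel].mean()
    return out

# A steadily warming series: relative to the moving baseline,
# anomalies stay small instead of growing with the trend.
years = np.arange(1900, 2000)
anoms = moving_climatology_anomaly(years, 0.01 * (years - 1900))
```

The point of the moving baseline is visible here: a fixed climatology would turn the trend itself into ever-larger "anomalies," whereas the moving one isolates departures from the recent background state.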
Similar to the usual definition, our definition (equation (1)) is also based on a critical temperature, T_crit, but additionally on a critical probability, P_crit (taken as 0.66 here, following the "likely" criterion of Mastrandrea et al. [2010]). It is designed such that as the spread in each temperature observation (σ_i) collapses to 0, the traditional definition emerges.

ENSO state_i = El Niño, if P(T_i > T_crit) ≥ P_crit;
               La Niña, if P(T_i < −T_crit) ≥ P_crit;
               ambiguous, if σ_i > σ_clim;
               definite neutral, if P(|T_i| < T_crit) ≥ P_crit;
               residual neutral, otherwise. (1)

Examples of the classification in action are shown in Figure 3. We find it necessary to define additional ENSO states in order to account for the different reasons that lead to an inability to categorize an event as either El Niño (Figure 3a) or La Niña (Figure 3b). This can occur in several situations: either we are not sufficiently confident in the observation or the absolute temperature anomaly is not sufficiently large. We consciously split our neutral state into a "definite" classification (where we have strong knowledge that the temperature anomaly is less than T_crit, Figure 3c) and a "residual" classification (which often occurs during a transition into or out of an El Niño or La Niña, Figure 3e). We also introduce a new ENSO state that we term "ambiguous," to highlight years where one cannot feel confident enough to make a meaningful classification (Figure 3d). We consider that if the uncertainty in an individual temperature estimate is so large that it could simultaneously encompass both an El Niño and a La Niña, any resulting ENSO state should be treated with skepticism. We express this in equation (1) as the standard deviation of a Niño 3.4 SST anomaly's PDF being greater than the standard deviation of the index during the climatological period, σ_clim. In a climate modeling context, T_crit is often taken as 0.5σ, which would collapse our definition down further [Leloup et al., 2008]. It is worth stressing that the order in which these conditions are tested matters.
For example, there is a large standard deviation in the estimated temperature anomaly for 1940 (Figure 4), but the anomaly is so warm that a robust categorization can still be made. In the results below, we present solely an annual classification of ENSO states derived from the December-January-February (DJF) season.
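The classification logic described above can be sketched as follows, testing the conditions in the order discussed in the text. This is a sketch of the logic, not the paper's operational code; the PDF is represented here by ensemble samples.

```python
import numpy as np

T_CRIT = 0.5    # degrees C, as in the text
P_CRIT = 0.66   # "likely" threshold of Mastrandrea et al. [2010]

def classify_enso(samples, sigma_clim):
    """Classify one DJF Nino 3.4 anomaly PDF, represented by ensemble
    samples. Conditions are tested in order, so a confidently warm
    year is an El Nino even if its PDF is wide."""
    s = np.asarray(samples, dtype=float)
    if np.mean(s > T_CRIT) >= P_CRIT:
        return "El Nino"
    if np.mean(s < -T_CRIT) >= P_CRIT:
        return "La Nina"
    if s.std() > sigma_clim:                  # PDF too wide to trust
        return "ambiguous"
    if np.mean(np.abs(s) < T_CRIT) >= P_CRIT:
        return "neutral (definite)"
    return "neutral (residual)"

# A warm, confidently observed year is classed as an El Nino even
# though its PDF has nonzero spread (sigma_clim value is illustrative):
rng = np.random.default_rng(1)
state = classify_enso(rng.normal(1.2, 0.2, 10000), sigma_clim=0.6)
```

Swapping the order of the first and third tests would instead label the 1940-style case ambiguous, which is why the ordering of the conditions matters.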

Past El Niño Years
The results of applying the probabilistic ENSO definition to the new ensemble are shown in Figure 4. It is apparent that we are rather certain of both the Niño 3.4 SST anomaly and the ENSO classification over the past 60 years. These classifications are consistent with other records derived from observations. It is also clear that there is little knowledge of the ENSO state before 1880: nearly all years are classed as ambiguous (Figure 4). This accords well with Hansen et al. [2010] only starting their reconstruction from this date. Nonetheless, there are also several years since 1880 when the classification of ENSO is ambiguous. Apart from a pause in the 1940s (e.g., 1946, Figure 3d), our knowledge of whether a year saw an El Niño or not appears robust back to 1920. The different flavors of El Niño and their global impacts have become a burgeoning research topic of late [e.g., Banholzer and Donner, 2014]. The identification of a shift toward more central Pacific-type El Niños around the 1950s [Yu and Kim, 2013] seems potentially misplaced given the difficulty in detecting whether a year even had an El Niño at all (Figure 4).

Conclusion
Appropriate techniques to infer regional temperatures from the limited early observations are essential for investigating the warming since the preindustrial era. Here we applied a sophisticated approach not only to overcome the sparsity but also to explicitly calculate the uncertainty in the resulting fields. The HadCRUT4 global temperature data set [Morice et al., 2012] is used to demonstrate the utility of applying multiresolution lattice kriging [Nychka et al., 2015]. Following the ethos of Morice et al. [2012], we have created a large ensemble of equally plausible, spatially complete temperature fields to provide a quantified uncertainty for each location's past temperatures. The impact of these uncertainties was explored on a regional scale using temperatures in the Niño 3.4 region to describe past ENSO events. This necessitated a definition of El Niño that accounts for probabilistic temperature estimates. The uncertainty coming from a lack of nearby temperature observations can cause ambiguity in the classification of ENSO, especially prior to 1920.