The influence of non-stationary teleconnections on palaeoclimate reconstructions of ENSO variance using a pseudoproxy framework

Reconstructions of the El Niño–Southern Oscillation (ENSO) ideally require high-quality, annually resolved and long-running palaeoclimate proxy records in the eastern tropical Pacific Ocean, located in ENSO’s centre of action. However, to date, the palaeoclimate records that have been extracted in the region are short or temporally and spatially sporadic, limiting the information that can be provided by these reconstructions. Consequently, most ENSO reconstructions exploit the downstream influences of ENSO on remote locations, known as teleconnections, where longer records from palaeoclimate proxies exist. However, using teleconnections to reconstruct ENSO relies on the assumption that the relationship between ENSO and the remote location is stationary in time. Increasing evidence from observations and climate models suggests that some teleconnections are, in fact, non-stationary, potentially threatening the validity of those palaeoclimate reconstructions that exploit teleconnections. This study examines the implications of non-stationary teleconnections on modern multi-proxy reconstructions of ENSO variance. The sensitivity of the reconstructions to nonstationary teleconnections were tested using a suite of idealised pseudoproxy experiments that employed output from a fully coupled global climate model. Reconstructions of the variance in the Niño 3.4 index representing ENSO variability were generated using four different methods. Surface temperature data from the GFDL CM2.1 were used as pseudoproxies for these reconstruction methods. As well as sensitivity of the reconstruction to the method, the experiments tested the sensitivity of the reconstruction to the number of nonstationary pseudoproxies and the location of these proxies. We find that non-stationarities can act to degrade the skill of ENSO variance reconstructions. However, when global, randomly spaced networks (assuming a minimum of approximately 20 proxies) were employed, the resulting pseudoproxy ENSO reconstructions were not sensitive to nonstationary teleconnections. Neglecting proxies from ENSO’s centre of action still produced skilful reconstructions, but with a lower likelihood. Different reconstruction methods exhibited varying sensitivities to non-stationary pseudoproxies, which affected the robustness of the resulting reconstructions. The results suggest that caution should be taken when developing reconstructions using proxies from a single teleconnected region, or those that use less than 20 source proxies.


Introduction
Reconstructions of the Earth's climate prior to instrumental records are necessary for providing context for anthropogenic climate change, and to provide insight into climate variability on timescales longer than instrumental records allow.Climate proxies are biotic or chemical analogues that have a sensitivity to some aspect of the climate, for exampleoxygen isotope ratios in coral growth rings contain information on temperature and precipitation (Pfeiffer et al., 2004).However, high-quality proxies can be sparse and difficult to find (McGregor et al., 2010;Neukom and Gergis, 2012), lim-iting the amount of information that can be inferred about the climate.
One region where information from palaeoclimate proxies is limited is the central and eastern tropical Pacific Ocean.This area can be described as the centre of action of the El Niño-Southern Oscillation (ENSO), which is the most important regulator of interannual climate variability globally.ENSO involves changes in eastern equatorial Pacific sea surface temperature (SST) and an associated swing in precipitation and pressure anomalies across the tropical Pacific Ocean.While its most noticeable effects are in the tropical Pacific region, it also induces downstream effects, influencing climate variability in many parts of the world via teleconnections (e.g.Power et al., 1998;Brönnimann et al., 2006;Liu et al., 2013;Ding et al., 2014).
Due to the global reach of ENSO, understanding its behaviour is of great societal and economic importance (Solow et al., 1998;McPhaden et al., 2006).There are still uncertainties about past ENSO (Gergis and Fowler, 2009) and whether ENSO behaviour will change in response to future climate change (Collins et al., 2010;Vecchi and Wittenberg, 2010;Watanabe et al., 2012;Yeh et al., 2014).One reason for this is that the instrumental record is too short ( ∼ 150 years) to measure long-term changes in ENSO and its teleconnections (Wittenberg, 2009;Gergis et al., 2006, and references therein).Modelling suggests that five centuries of data may be required to understand the full range of natural ENSO variability (Wittenberg, 2009).Thus, climate proxy reconstructions of past fluctuations in ENSO are an essential tool in determining the full range of natural ENSO variability.
As previously described, the centre of action of ENSO is largely devoid of long, continuous, high-quality palaeoclimate proxy records (Wilson et al., 2010).Tropical corals are the dominant proxy type in this region, and are known to provide very skilful reconstructions of the surrounding SSTs and ENSO.However, the addition of non-climatic noise to these proxies also complicates the estimation of the significance of changing in past ENSO variability (Russon et al., 2014(Russon et al., , 2015)), as does their limited life span (i.e.records are on average about 50 years in length, with the longest records less than two centuries) (Cobb et al., 2013;Li et al., 2013;McGregor et al., 2013b;Neukom and Gergis, 2012).This has motivated the use of palaeoclimate proxies from single or multiple regions that are teleconnected with ENSO for the generation of reconstructions.For example, ENSO reconstructions have been developed using palaeoclimate proxies from the south-west US and northern Mexico (D'Arrigo et al., 2005) and northern New Zealand (Fowler, 2008), as well as using multiple proxies from locations in the tropical and subtropical Pacific outside ENSO's centre of action (Braganza et al., 2009;Cobb et al., 2013;Wilson et al., 2010).Multi-proxy reconstructions are generally considered to be more robust and more likely to contain a larger ratio of climate signal to local noise (Mann et al., 1998;Gergis and Fowler, 2009).
There are several issues when using teleconnected proxies for palaeoclimate reconstructions.Teleconnections may be non-linear in nature, for example, responding to El Niño events much more strongly than La Niña events (Hoerling et al., 1997).If this is not detected and accounted for in the reconstruction, ENSO variability and amplitude may be misrepresented (McGregor et al., 2013a).However, perhaps an equally important issue is the variability in the teleconnection itself.ENSO reconstructions exploiting teleconnected locations implicitly assume that the teleconnected relationship does not vary significantly in time -that it is stationary.However, it is often difficult or impossible to assess stationarity due to the brevity of the instrumental records (Gallant et al., 2013), causing many to skip this check altogether, noting it as an assumption.
However, significant changes in the relationship between ENSO and the climates of remote, teleconnected locations have been detected in models (Coats et al., 2013;Gallant et al., 2013), instrumental observations (López-Parages and Rodríguez-Fonseca, 2012; Gallant et al., 2013) and palaeoclimate data (Hendy et al., 2003;Rimbu et al., 2003;Timm, 2005).If these teleconnections were changed by some dynamical regime rather than through stochastic influence (e.g.random weather events), the relationship should not be considered as stationary.While these dynamical changes could be related to external climate forcing, such as with anthropogenic climate change (Müller and Roeckner, 2008;Herceg Bulić et al., 2011), there is evidence that they also change with internal climate forcing.For example, significant changes in teleconnections on near-centennial timescales are apparent in model simulations forced by internal dynamics alone (Gallant et al., 2013).
The changes to teleconnections via internal dynamics will result from either changes to ENSO itself or non-linear interactions with other regulators of climate variability.An example of the latter is the Southern Annular Mode, which is thought to affect the magnitude of South Pacific ENSO teleconnections (Fogt et al., 2011).The evidence suggests that this occurs on timescales around 30 years or longer.Using running correlations as a statistical descriptor of the relationship between ENSO and a remote climate variable, several studies highlighted that running correlations employing 11-25-year windows of data exhibit large, stochastic variability only (Gershunov et al., 2001;Sterl et al., 2007;van Oldenborgh and Burgers, 2005).However, a study using longer windows of data spanning 31-71 years (Gallant et al., 2013) found that stochastic processes could not explain the changes in observed and modelled running correlations in a significant number of locations in Australasia.Similar results are also found using model simulations (Coats et al., 2013;Gallant et al., 2013).Thus, there are numerous locations that display changes in ENSO's teleconnections that can be classified as "non-stationary" and thus are thought to be due to dynamical processes.This places increasing stress on the assumption that teleconnections are stationary.This raises the question as to whether non-stationarities have an appreciable influence on the robustness of past palaeoclimate reconstructions.
This study examines if and when non-stationary teleconnections degrade the skill of multi-proxy reconstructions of ENSO variance by employing a series of pseudoproxy experiments from a fully coupled global climate model (GCM).The robustness of ENSO variance changes (Russon et al., 2014(Russon et al., , 2015) ) is not examined in this paper.The experiments test how reconstruction skill varies with different proxy network locations and sizes.The sensitivity of the results to the reconstruction method is also tested.The model and the data used for these experiments are described in Sect. 2 and the methods are described in Sect.3. The experimental outcomes are presented in Sect.4, discussed in Sect. 5 and conclusions are provided in Sect.6.

Model data
This study uses 500 years of a pre-industrial control run of the Geophysical Fluid Dynamics Laboratory Coupled Model 2.1 (GFDL CM2.1) for all pseudoproxy experiments, which are described in detail in Sect.3. ENSO is represented using the Niño 3.4 index, calculated from the model as the area average of SST anomalies from the central Pacific region (5 • S-5 • N, 190-240 • E).In the GFDL CM2.1 simulations, the monthly variations in the Niño 3.4 index very closely correspond to the variations of the first empirical orthogonal function (EOF) of tropical Pacific SSTs, demonstrating that the Niño 3.4 index accurately represents ENSO variability in the model (Wittenberg et al., 2006).
Using climate data directly from GCMs is ideal for the evaluation of reconstruction methods (Zorita et al., 2003;Lee et al., 2008;von Storch et al., 2009) because models can provide the long time series necessary to robustly assess multidecadal to near-centennial-scale variability in teleconnections (Wittenberg, 2009).The ENSO indices can be calculated directly from the model, representing a "true" Niño 3.4 index for the reconstructed indices to be compared to.This allows the skill of reconstructions to be compared and their sensitivities to be studied.
The GFDL CM2.1 was selected due to its realistic representation of ENSO characteristics (Wittenberg, 2009, and references therein).The seasonal SST structure and ENSO evolution is well represented when compared to observations (Wittenberg et al., 2006;Joseph and Nigam, 2006), while also matching their power spectra (Wittenberg et al., 2006;Lin, 2007).The representation of the strength of local teleconnections in the model (Fig. 1b) shows that the regional responses of surface temperature (TS) and the Niño 3.4 index (shading) are quite similar to the observations (contours).Note that hereafter "TS" refers to SST temperatures over model ocean points and land surface temperatures over model land points.Hence, ENSO in GFDL CM2.1 is imposing downstream effects, i.e. teleconnections, that are broadly consistent with the observations, even if the strength of the connection is not as is observed (Wang et al., 2012).It has also been shown that the model teleconnections, represented by correlations in 31-year windows between grid points and the Niño 3.4 index generated from the model, do change over time and differ compared to correlations calculated over the entire period (Fig. 1a; Wittenberg, 2012).There is significant variation in teleconnection strength (i.e. the range of possible correlations) when using shorter windows of data compared to those of the entire data set.
It has been noted that the strengths and temporal and spatial structures of localised ENSO teleconnections can be poorly represented in GCMs (Joseph and Nigam, 2006;Rowell, 2013;Gallant et al., 2013).This is also seen in CM2.1, as there are teleconnections that are poorly represented at the local level, particularly on the "edges" of the main teleconnections regions (e.g. on the coast of Australia and North America).This is due to inaccuracies in the representation of the mean climate, annual cycle, ENSO, and the other modes of climate variability that are influenced by, or which influence, ENSO, such as the Southern Annual Mode (Delworth et al., 2006).While this limits the conclusions that can be drawn about real-world teleconnections, it still allows for an examination of reconstructions and the associated influence of the non-stationarity of teleconnections, internal to the GCM.
As ENSO events are generally synchronised to the seasonal cycle, the modelled TS was converted to July-June averages to capture ENSO event initiation and termination within 1 year (Rasmusson and Carpenter, 1982;Tziperman et al., 1997).This has the added benefit of reducing 500 years of monthly TS data (6000 values) to 499 annual values, matching the resolution of the majority of ENSO proxies.The 499-year mean was removed from the data set and the grid point time series were then linearly detrended by calculating the residuals from a line of best fit using linear regression in order to remove long-term trends such as model drift.This modified TS data set is used for all calculations and experiments in this study.Modelled precipitation, only briefly discussed in Sect.4, was subjected to the same processing prior to any calculations.The lines are the 1st, 5th, 50th, 95th, and 99th percentiles, with the lowest lines indicating the lowest percentiles (i.e. the bottom line is the 1st percentile).(b) The shading is the correlation between of the entire 499 years of TS at each grid point and the model-calculated Niño 3.4 index, both calculated from the GFDL CM2.1 data, also described in Sect. 2. The black contour lines are the correlation coefficients (spacing of 0.2) of observed surface land-sea temperatures to its corresponding Niño 3.4.Solid lines are positive values, while dashed lines are negative.These observations were calculated using the last 50 years of annual mean GISTEMP_ersst observational data (GISTEMP-Team, 2015).Data set is described by Hansen et al. (2010).

Methods
This section describes how the model data are used as a substitute for climate proxies and are selected for multi-proxy reconstructions.Non-stationarity in this paper is defined in Sect.3.2, and the palaeoproxy reconstruction methods tested will be described in Sect.3.3.

Pseudoproxy generation
The model TS and precipitation data were used to represent the climate proxies for all reconstructions.These data are commonly referred to as pseudoproxies and represent a "perfect" proxy, free of non-climatic noise (von Storch et al., 2009).The pseudoproxies are not degraded by adding noise (e.g. Lee et al., 2008), as the effects of noise on the reconstructions are not in the scope of this study.Pseudoproxies are randomly selected from a subset of the globe, determined by several conditions, depending on the experiment.The most basic condition, present in all experiments, is that the absolute correlation between the model grid point and the Niño 3.4 index is above 0.3 in the calibration window.This threshold is an arbitrary criterion that is simply there to ensure the pseudoproxies at least partially represent ENSO.The calibration window is the time period where relationships between the TS grid points and the model Niño 3.4 index are established.It is entrusted to the reconstruction methods to enhance the signal-to-noise ratio.
Networks of 3 to 70 pseudoproxies were used so that the effect of increasing network size could be examined.The same pseudoproxy was not used in the same network more than once, but could be used in multiple networks.To produce reconstructions of the model Niño 3.4 index variance, 1000 random networks were selected per network size, calibration window length, and calibration window position.The randomised selection process over a large number of grid points means that there is only a very small chance that a network would be replicated within 1000 iterations.
The correlation between ENSO and each grid point time series (i.e.Niño 3.4 and TS) over the whole time period (499 years) is assumed to represent the true teleconnection strength.In reality, however, information is limited to the observational record.As such, calibration can only occur during a relatively brief period, which we expect to result in reconstructions that are not as accurate as they potentially could be.To assess the effects of the use of different calibration windows, we carry out three versions of each experiment.
-The first version represents the scenario where all pseudoproxies with a good correlation, defined as |r| ≥ 0.3 over the whole time period (499 years long), can be used in the reconstructions (Fig. 1b).This can be conceptualised by using Fig. 1a, with this version corresponding to selecting the areas where |r| > 0.3 on the x axis (r is 499-year correlation).Information from the entire time series is available in this scenario, and can be thought of as using a calibration window 499 years long.
-The second version represents the realistic scenario, where calibration information is restricted to within a relatively small window and the long-term correlation is unknown, much like the effects of limited instrumental data in reality.This can be thought of selecting the areas where |r| > 0.3 on the y axis (r is correlation in the calibration window).This implies that there is a chance that the mean correlation over the whole time series is zero, or perhaps the opposite to the expected sign, and this is when non-stationarities are likely to be the largest problem for reconstructions.This would vary with calibration window, and is reflected in Fig. 2b, d and f, with the narrowing of the percentile lines as the length of the calibration window increases.
-The third version represents a combination of the first two versions, selecting the proxies with a good correlation in the calibration window but also over the whole time period (which would normally be unknown).This is equivalent to the case where a proxy is selected during a calibration period but also happens to have good correlations outside the window -the ideal proxy.This is represented by the overlapping areas of the first two versions in Figs.1a, and 2b, d and f for corresponding window lengths.This scenario uses a small calibration window like the second version of experiments but uses information from the 499 years of data as an additional more stringent pseudoproxy selection criterion.
The first and third versions of experiments produced substantially better reconstructions than the second version.This was ultimately because using much larger calibration windows and using information about the long-term strength of teleconnections results in more robust reconstructions.However, in reality, the generation of palaeoclimate reconstructions would apply an assumption equivalent to that of the second version of experiments, which limit the information on teleconnection strength to the calibration period only as they are constrained by the instrumental record.
As the second case is the most realistic case, we mainly focus on the second version of experiments for the remainder of the paper.For each grid box, the 499-year time series was split into ten calibration windows, of lengths 31, 61 and 91 years to match the running correlations performed previously.The mid-point of the calibration windows were spaced evenly in the 499-year data set, regardless of the amount of overlap or gap between them.Experiments were repeated for the different calibration window lengths and positions so that the sensitivity of reconstruction skill to calibration window characteristics could be examined.This resulted in 10 000 reconstructions for each calibration window length, for each experiment.The experiments based on pseudoproxy selection are described in Sect. 4.

Identifying non-stationarities
This study examines the conditions when non-stationary teleconnections impact the validity of palaeoclimate reconstructions.Therefore it is necessary to identify which grid points have non-stationary teleconnections, so that its impact on the reconstruction of ENSO can be assessed.The strength and variability in a location's relationship with ENSO was measured by calculating the running correlation between the grid point TS or precipitation time series and the modelled Niño 3.4 index.Running correlations used windows of 31, 61 or 91 years in order to examine multidecadal scale variations on a number of timescales.
This study uses the same definition of non-stationarity as described in detail in Gallant et al. (2013).Non-stationarity was tested against the null hypothesis that the running correlations from the GFDL CM2.1 were stationary.For this purpose, the running correlations computed from the GFDL CM2.1 were compared to the expected range of variation that the running correlations would exhibit if they were only influenced by random noise (e.g.weather events) at the grid point location.A Monte Carlo approach (similar to van Oldenborgh and Burgers, 2005;Sterl et al., 2007;Gallant et al., 2013) was used to generate stochastic simulations of TS and precipitation data at each grid point.The simulated data were constructed to have the same statistical attributes as the TS and precipitation data from the GFDL CM2.1 simulation.One thousand stochastic time series were computed for each grid point in order to determine this range, according to the following equation from Gallant et al. (2013).
(1) υ(t) is the stochastic TS or precipitation time series.The first two terms represent the stationary teleconnection strength, with a 0 and a 1 the regression coefficients between the grid point temperature or precipitation and the Niño 3.4 index c(t).The other terms represent the added noise.A red-noise process, η υ (t) + Bη υ (t − 1), was used and is weighted by the standard deviation σ υ of the local TS or precipitation time series, and the proportion of the regression's unexplained variance √ 1 − r 2 (where r is correlation of the local time series to the Niño 3.4 index).The red noise is generated by the sum of Gaussian noise (η υ ) and autocorrelation (B) of the TS or precipitation time series at a lag of 1 year.
A 95 % confidence interval was generated at each grid point from the stochastic simulations and was used to represent the range of running correlations possible, assuming a teleconnection was stationary.Thus, if a running correlation from the GFDL CM2.1 fell outside the range from the stochastic simulations, it was unlikely to have been influenced by stochastic processes alone.Hence, the teleconnection is defined as non-stationary.However, as a 95 % confidence interval was employed, and assuming independent and identically distributed data, such a test would falsely detect a non-stationarity in around 5 % of the time series.Thus, to decrease the likelihood of detecting false positives in the time series of running correlations, a grid point was defined as non-stationary only if the model running correlation time series fell outside the 95 % confidence interval more than 10 % of the time, which is double that expected by chance alone.As correlations are bounded, the running correlations were converted to Fisher Z scores using the following equation: where Z is the Fisher Z score and r is the running correlation values.Figure 2a, c and e show the number of non-stationary windows (outside the 95 % confidence interval) identified in the TS time series at each grid point for the different running correlation windows.Note that the points classified as non-stationary are denoted by the coloured areas in panels a, c, and e, while white areas indicate stationary teleconnections.There are more non-stationary grid points (N value on plot) with larger running correlation windows, suggesting that non-stochastic influences on teleconnections increase as timescales increase.Of further note is a large nonstationary area in the equatorial Pacific; given this is the area surrounding our ENSO index, it is debatable whether this should be considered as a non-stationarity.Rather, we expect the changing relationship in this surrounding region to be the result of complexities of ENSO that may not captured by the simple stochastic model of stationarity.For instance, ENSO displays (i) significant non-linearities in its magnitude (An and Jin, 2004) and duration (Okumura and Deser, 2010); (ii) differences in the evolution of events with La Niñas and most small to moderate El Niños having SSTAs (sea surface temperature anomalies) that propagate from east to west, while the SSTAs of large El Niño events propagate from west to east (McPhaden and Zhang, 2009;Santoso et al., 2013); and (iii) changes in its spatial structure (cen-tral Pacific-eastern Pacific events) which may be considered different flavours of events rather than non-stationary teleconnections of the event (Gallant et al., 2013;Sterl et al., 2007).

Reconstruction methods
This study examines the potential effects of non-stationarities on multi-proxy reconstructions of the running variance of the Niño 3.4 index (representing the variability in ENSO) using pseudoproxy data.All running variances were calculated using 30-year windows.Four simple, commonly used multiproxy reconstruction methods were selected.In some methods, such as composite plus scaling (CPS), there are variants to the technique designed to improve climate proxy reconstructions (Jones et al., 2009).However, the impact of nonstationarity on these will not be examined in this study.The reconstruction methods to be tested are as follows.

Median running variance (MRV) method
The MRV method was developed by McGregor et al. (2013a) to reconstruct the running variance of palaeo-ENSO from climate proxy data.It involves calculating the running variance of each of the normalised (zero mean and unit variance) proxy time series, and then calculating the median of these time series.The selected proxies have a demonstrated link to ENSO, identified by a correlation above the prescribed value, to ensure the resulting median time series contains information about ENSO variability.

Running variance of median (RVM) method
This method was also devised by McGregor et al. (2013a), as an alternate to the MRV for calculating ENSO running variance.Here, if the constituent pseudoproxy series is negatively correlated to Niño 3.4, it is flipped in sign before being used for calculations.Each of the proxy time series are normalised to zero mean and unit variance before the median of the group is calculated.This median time series is then normalised prior to calculating its running variance, which is the RVM reconstruction.Despite only differing in the order of operations with the MRV, this method was included in the study as it uses raw time series data, rather than preprocessed data as for the MRV method.

Composite plus scaling (CPS) method
CPS is a common method for reconstructing climate data from climate proxies (Esper et al., 2005;Hegerl et al., 2007;Mann et al., 2007, and references therein).In this study, the CPS described in Esper et al. (2005) and Hegerl et al. (2007) is employed.The proxy time series are normalised to zero mean and unit variance and are weighted by their correlation to Niño 3.4, before being summed to form a single time series.After normalising this single time series, running variance is taken to reconstruct ENSO variance, hereafter called "CPS_RV".

Empirical orthogonal function principal component (EPC) method
This method, described in detail in Braganza et al. (2009), is based on the ability of EOFs to extract the leading modes of variability from a data set (Xiao et al., 2014, and references therein).Like the MRV method, the proxy data must have established connections to ENSO to ensure that the common dominant signal is an ENSO signal.The leading EOF is then multiplied by the original pseudoproxies, and summed to produce a principal component (PC) time series that is a reconstruction of the ENSO index.The sign of the leading EOF is flipped, if necessary, to ensure that the resulting PC has a positive correlation with the modelled ENSO.Like the CPS method, the running variance of this normalised PC time series is calculated to produce a reconstruction of ENSO variance (hereafter named "EPC_RV").

Reconstruction performance
To measure the skill of the reconstructions, each is quantitatively compared to the running variance of the ENSO index in the model (calculated in Sect.2) by calculating Pearson correlation coefficients and root-mean-squared error (RMSE).Figure 3 shows that each of these four methods captures the running variance well when the entire data set is available (and with larger proxy networks).Therefore, these methods can be viewed as effective in performing climate reconstructions of ENSO variance.Using all data, the CPS_RV method performs significantly better than the other methods (to a 1 % level of the two-sample Kolmogorov-Smirnov test and Mann-Whitney U test), while the RVM is the worstperforming index.

Results
The results of the pseudoproxy experiments are presented in this section.Calibration windows of 31, 61 or 91 years are used to generate the reconstructions, and this window length also corresponds to that used for the running correlation.Only grid points with a good absolute correlation to ENSO (> 0.3) within the given calibration window were used as pseudoproxies.Here we examine the sensitivity of the reconstruction methods to non-stationarities, and the effect of proxy location on reconstruction skill.As stated previously, there will be a focus on the reconstructions produced using grid point TS as the pseudoproxies.

Proxy location effects
ENSO reconstructions are thought to be affected by the locations of the constituent proxies, with many scientists viewing proxies from within the tropical region with higher regard than those sourced elsewhere.These proxies are closest to the centre of action and thus expected to be more skilful.Here we examine the impact of tropical Pacific region proxies on reconstructions by comparing two experiments: RND glb_ts , which selects n pseudoproxies randomly from the global domain (see Supplement Fig. S1 for locations), and RND ntrop_ts , which has similar random selection but excludes the tropical region: 10 • S to 10 • N, 100 to 300 • E (RND ntrop_ts ).Note that both experiments do not discriminate between stationary and non-stationary locations in this section.
The reconstruction skill, which is represented by the correlation between the pseudoproxy reconstruction of the Niño 3.4 running variance and the model Niño 3.4 running variance, of both experiments is presented in Fig. 4. Here, network size n is varied from 3 to 70 (described in Sect.3.1) on the x axis of each panel, while rows represent the different-sized calibration windows and columns the different reconstruction methods (see Sect. 3.3).Looking at the percentile range (Fig. 4, shading) of the correlations between experiments reveals that the removal of tropical Pacific proxies clearly acts to decrease the skill of the resulting reconstructions.
These differences are most easily highlighted by arbitrarily defining skilful reconstructions by some threshold and calculating what proportion of experiment's reconstructions can be classified as skilful.Here we define skilful reconstructions as those that explain more than half the variance of the model ENSO variance (grey line at ∼ 0.7 correlation).The proportion of skill metrics for the global RND glb_ts and non-tropical RND ntrop_ts experiments, which are respectively plotted in each panel of Fig. 4 as blue and orange lines, can then be further simplified by focusing on the skill difference between experiments (Fig. 4, black line).The skill difference shows clear calibration window length and reconstruction method differences that will be discussed further in Sect.4.3, but on average, when tropical proxies are not used in reconstructions, the proportion of skilful reconstructions decreases by 14 %.However, even without the tropical proxies, the RND ntrop_ts experiment still produced quite high proportions of skilful reconstructions for larger network sizes (> 20 proxies, 77 %).This implies that although there is a reduction in skill with extra-tropical proxies, non-tropical reconstructions still have a high likelihood of producing skilful reconstructions.

Effect of non-stationarities
Here we examine the effect of non-stationarities on reconstructions of ENSO in order to understand how they may impact past reconstructions of ENSO variance.To this end, we compare the results of two experiments: (i) STAT ntrop_ts , which selects pseudoproxies from the same region as RND ntrop_ts but only includes pseudoproxies that are considered stationary (see definition in Sect.3.2), and (ii) NSTAT ntrop_ts , which selects from the same region but only the non-stationary pseudoproxies.Thus, here we effectively separate the pseudoproxies of the RND ntrop_ts experiment into stationary and non-stationary subgroups and generate reconstructions from each.
Figure 5 has the same panel layout as Fig. 4, with the green and pink representing stationary (STAT ntrop_ts ) and nonstationary (NSTAT ntrop_ts ) experiments respectively.Shading represents the percentile ranges of the reconstruction skill, thick lines indicate the proportions of skilful reconstructions and the thick black line is the difference between the stationary (STAT ntrop_ts ) and non-stationary (NSTAT ntrop_ts ) experiments.For all calibration window lengths (rows) and reconstruction methods (columns), the stationary experiment has greater skill than the non-stationary experiment, although there is reasonable variation between reconstruction methwww.clim-past.net/11/1733/2015/Clim.Past, 11, 1733Past, 11, -1749Past, 11, , 2015 ods and calibration window lengths (this will be discussed in later sections).In some cases, non-stationarities can reduce the proportion of skilful reconstructions by up to 60 % (panel b, black line, n > 60), but on average the proportion of skilful reconstructions is reduced by 30 %.Thus, these experiments suggest that extra-tropical non-stationarities act to reduce reconstruction skill.
It is interesting to note that when tropical region nonstationarities are included, they appear to improve reconstruction skill (Supplement Fig. S3).The majority of the pseudoproxies in the tropical region were found to be highly correlated with ENSO, as expected, and to demonstrate very little variation in their correlations to ENSO (not shown), usually less than ∼ 0.1 correlation.However, as seen in Fig. 2 many of these proxies are still classified as non-stationary, which may be due to non-linearities or variations in flavour of ENSO events.Thus, regardless of whether they are classified as non-stationary or not, the inclusion of these tropical pseudoproxies acts to improve the skill of the ENSO reconstructions.
The fact that we see a minimal effect of non-stationarities in the randomly selected experiments (see Sect. 4.1) may be because the likelihood of selecting non-stationarities is relatively low.For instance, Fig. 6 shows the proportions of non-stationary pseudoproxies in the reconstructions for the RND glb_ts experiment with a 31-year-long calibration window.It varies with different proxy network sizes, but as expected, the smaller groups have a greater chance of higher proportions of non-stationary proxies.With networks greater than 30, the most likely proportion is around 14 %, which is much more consistent than the smaller groups.Even with very small group sizes (n = 3), the chance that all stations are non-stationary is only 0.3 % (red line from Fig. 6).When only using extra-tropical locations (RND ntrop_ts ), the most likely proportion of non-stationary proxies is around 9 %, with an even lower chance of all constituent proxies being non-stationary.There is also a tendency for more nonstationarities to occur with the use of longer calibration windows (see Fig. 2a); consequently, the proportions of nonstationary proxies increase.For example, networks greater than 30 proxies can be up to 25 % non-stationary when using 91-year calibration windows (not shown).Regardless of the increases in non-stationarities with the use of longer calibration windows, these longer windows still produced more skilful reconstructions in the random selection experiments than those with shorter windows (RND glb_ts and RND ntrop_ts ; Fig. 4).Thus, although non-stationarities have the potential to influence the skill of ENSO reconstructions, this scenario appears unlikely if proxies are selected similar to a globally random manner.
However, if pseudoproxies are selected from regions that demonstrate co-variability in the running correlation between TS and Niño 3.4 SST anomalies, reconstruction skill is severely deteriorated.To this end, an EOF analysis was used to "organise" this co-variability, of which it is expected that non-stationarities are a major part.This is seen in the PNEOF1 experiment shown in Fig. 7.In this experiment the EOF was carried out on the running correlations between TS and Niño 3.4 SST anomalies at each grid point.Pseudoproxy networks were then selected only from those grid points that exhibited a strong relationship with the leading EOF (i.e. the absolute value of the EOF weighting > 0.01).The spatial map of this leading EOF is shown in panel e, for 31year window running correlations.The leading EOFs of the longer windows have very similar spatial patterns, with spatial correlations of 0.86 and 0.84 produced respectively, when comparing the 61 and 91-year window length EOF1 spatial patterns (not shown).The leading principal components for each window length are also similar (panel f).The resulting PNEOF1 experiment reconstructions display a large loss in skill when compared to the stationary pseudoproxies in the reconstructions (STAT ntrop_ts , dashed lines), with the former having very little likelihood of producing a skilful reconstruction (Fig. 7a-d).The proportion of non-stationary grid points used in the PNEOF1 reconstructions was small, ranging from 9 to 15 %.However, there was still a substantial loss of skill in these reconstructions even though the majority of grid points were classified as stationary by our statistical definition.This implies that a large and coherent change to the teleconnection exists in that region even if it considered mostly statistically stationary, and that was enough to degrade reconstruction skill.Thus, care should be taken to avoid the scenario where all constituent pseudoproxies used in a reconstruction lie in a region where there are large, co-herent variations in teleconnections, even if these variations are considered stationary.

Pseudoproxy network size and length
As discussed previously, the ENSO reconstruction skill is sensitive to the pseudoproxy network size and window length.This is clearly seen in Fig. 8, which displays the reconstruction skill of three different previously presented experiments (RND glb_ts , RND ntrop_ts , and NSTAT ntrop_ts ) as three different colours (see legend).Each panel shows the proportions of skilful reconstructions (thick lines) for different reconstruction methods (as titled), and different calibration window lengths (see inset legend).What is clear in all panels is that the reconstruction skill generally improves with increasing network size for all experiments, regardless of reconstruction method and calibration window length.This is also true when all pseudoproxies in a network are nonstationary (NSTAT ntrop_ts experiment), although the reconstruction skill generally improves at a slower rate (Fig. 8, red lines).This implies that larger pseudoproxy networks are less affected by non-stationarities, but this is also dependent on the calibration window length (discussed below) and the reconstruction method (discussed in Sect.4.4).In fact, when pseudoproxies are randomly selected (RND glb_ts and RND ntrop_ts ), using a minimum of 20 proxies gives a reasonable chance (77 % chance on average) that the resulting reconstruction will be skilful (Fig. 8).
The calibration window length also has an impact on reconstruction skill and sensitivity to non-stationarities (Fig. 9).For example, using small calibration windows (31 to 91 years) compared to the total number of model years available (499 years) leads to a relative decrease in skill, as indicated by the black 499-year reconstruction being higher in skill than the reconstructions using smaller windows.This decrease in skill is potentially due to some information loss in the relative data sets, and not necessarily due to nonstationarities.However, this reduction in skill at the median (thick line) is quite small (∼ 0.1 correlation) even at the smallest networks sizes and in the worst-performing reconstruction method.Thus, although there is a reduction in skill due to loss of information with smaller calibration window lengths, this is relatively small compared to the possible impacts of non-stationarities (see previous section).Figure 8 also shows that larger windows tend to improve skill, with the larger window lengths consistently having higher proportions of skilful reconstructions in the random selection experiments (RND glb_ts and RND ntrop_ts ).Larger windows also appear to generally improve reconstructions in the NSTAT ntrop_ts experiment.This increase in skill is not as great as removing non-stationarities from the reconstructions (Fig. 5) or changing the reconstruction method (following section).

Reconstruction method comparison
All reconstruction methods create skilful reconstructions given sufficiently large calibration windows and proxy network sizes in the random selection experiments RND glb_ts and RND ntrop_ts (see Figs. 8 and 9).The CPS_RV method performs almost as well as the MRV, although mainly with longer calibration windows and for the random selection experiments (RND glb_ts and RND ntrop_ts , Fig. 8).However, there is a clear distinction in the skill from the MRV method reconstructions compared to the other methods tested when considering the impact of non-stationarities and neglecting tropical pseudoproxies.For instance, when tropical pseudoproxies are not used in experiments, the MRV reconstructions are only marginally affected (Fig. 4c, g and k) implying that the method is not as dependent as other methods on the highly correlated tropical region.This is expected, as the EPC_RV and CPS_RV involve weighting regimes that would favour the highly correlated tropical pseudoproxies (see Sect. 3.3, and references therein).The MRV method has the highest proportion of skilful reconstructions at the lowest network sizes in all other experiments (Fig. 8c), with the clearest differences seen in the NSTAT ntrop_ts experiment (Figs. 5 and 8), while the percentile range of the MRV method also tends to be the smallest.Both of these indicate that the MRV method has the lowest sensitivity to nonstationarities.Further to this, in spite of the MRV method being negatively affected in the PNEOF1 experiment (Fig. 7, thick lines), and displaying some sensitivity to calibration window length (91-year windows perform better than shorter windows), it produces the highest proportion of skilful reconstructions and is thus still the most robust against nonstationarities.
It is worth noting that although the MRV method shows the most consistently high correlations to ENSO and appears to be the least sensitive to calibration window position (smallest percentile ranges, Supplement Fig. S5), it has the highest RMSE.It is well known that all reconstruction methods result in a loss in ENSO variance, and this is clearly shown in Fig. 10.In Fig. 10a-d, we can see that all reconstructions underestimate the model Niño 3.4 running variance (black line).However, this figure also shows that this variance loss is exaggerated with the MRV method (panels c, g), and this is also seen in Supplement Fig. S6, particularly at the larger network sizes.It is this variance loss that leads to the high RMSE of the MRV method.Other methods do not suffer as much from this variance loss as they are normalised after the reconstruction but prior to the calculation of the running variance (see Sect. 3.3).As the MRV utilises running variances from the beginning, it is unable to be normalised.Thus, while the MRV reproduces ENSO variance with the highest correlation skill, the MRV method requires rescaling to better match the magnitude of the variance changes.
In order to compensate for the variance loss of each reconstruction (Fig. 10a-d), we rescale each method's resulting running variance time series (Fig. 10e-h).Rescaling of the running variance time series was carried out using the average (calculated over 1000 reconstructions) regression between the reconstructions and the modelled Niño 3.4 running variance within the calibration window.When the MRV (Fig. 10c) is scaled (Fig. 10g), there is a jump in reconstruction variance (grey shading), such that the modelled Niño 3.4 index running variance is now encompassed by the grey shading.Using this simple scaling technique, we see a large reduction in the RMSE (see Supplement Fig. S7) -up to a 0.1 reduction in the median (Supplement Fig. S7, cyan lines) and no changes in the correlation (not shown).In fact, it is noteworthy that on average the scaled MRV has the smallest RMSE (significant to the 99 % level via a two-sample t test) of all reconstruction methods.

Precipitation pseudoproxies
Although not the focus of this paper, precipitation was also examined for all experiments.Precipitation-based reconstructions showed more variation in skill than TS and required larger network sizes for the same skill (see Supplement Fig. S2), but otherwise they had similar tendencies to temperature outlined above.However, there was one key dif- ference in precipitation -NSTAT glb_pr (red lines) produced less skilful reconstructions than RND glb_pr (grey lines) as we would expect.This is likely due to the absence of a large spatially coherent region of correlations in the tropical Pacific Ocean (compare tropical areas in Supplement Fig. S1b  and e).The CPS_RV method also generally outperforms the other methods, except for the non-stationary experiment NSTAT glb_pr , where the MRV appears to be superior to all methods.The RVM method appears to perform slightly better with precipitation than temperature (Supplement Fig. S2, mainly at longer calibration windows), which is consistent with the findings of McGregor et al. (2013a).

Discussion
Non-stationary relationships between the modelled Niño 3.4 index and regional temperature and precipitation were detected in the GFDL CM2.1 model.Our results demonstrate that non-stationarities between ENSO and regional climates can occur in many regions around the globe, which extends previous work of Gallant et al. (2013), who found significant non-stationary areas in the Australasian region in both modelling and observations.Like in Gallant et al. (2013), our work shows non-stationarities exist in climate models globally on timescales longer than approximately 30 years, demonstrating their occurrence at low frequencies.This is in contrast to van Oldenborgh and Burgers (2005) and Sterl et al. (2007), who examined non-stationarities at higher frequencies and found no detectable evidence for them in the observations using running correlation windows of around 20 years.The fact that these non-stationarities are found in a pre-industrial control simulation shows that this low-frequency variability can arise from unforced, internal climate variability, adding further evidence that this lowfrequency variability is an inherent part of the climate system.
Identifying what causes the occurrence of nonstationarities in ENSO teleconnections is not within the scope of this study.However, Wittenberg (2009) showed substantial changes to the behaviour of ENSO on similar timescales to those identified here in a 2000-year simulation using the GFDL CM2.1.Wittenberg (2009) discussed that such changes to ENSO behaviour could conceivably alter the teleconnections between ENSO and local climate and that these changes may not be represented in the historical record.Gallant et al. (2013) identified non-stationarities in three different GCMs.It is noted that while numerous models display non-stationarities, their regional existence may vary depending on the model used (Coats et al., 2013).We do not expect our evaluation of various different The first row of panels shows the unscaled reconstruction methods, while the second row has been scaled with a linear regression.Reconstruction methods are indicated in the title above the panel, and the scaled variants are named beginning with an "S".The year of the reconstruction is shown on the x axis of these panels.Running variances are calculated using 30-year moving windows.Some example reconstructions (red, yellow and blue lines) are shown, with the colours corresponding to a specific network of pseudoproxies.These reconstructions are from the RND glb_ts experiment, with a 91-year calibration window.
reconstruction methods performance in the presence of non-stationarities to be affected by model configuration; however, we intend to examine this in future research.
In this study, the pseudoproxy approach in the reality of the GFDL CM2.1 pre-industrial control simulations avoids the problems of non-climate-related noise that is inherent to real-world palaeoclimate proxies, allowing us to focus on the sensitivity of reconstructions to the occurrence of nonstationarities alone.However, in reality non-climate-related sources of noise in palaeoclimate proxies will confound, and likely degrade, reconstruction skill to a greater extent than examined here.Thus, our finding that a network size of > 20 will minimise the effects non-stationarities on reconstruction skill is likely an underestimate of minimum network size for a real-world reconstruction.The compounding effects of noise and non-stationarities on the reconstruction method, and hence a reconstruction, should be the focus of future research efforts in this area.
All reconstruction methods examined generate skilful reconstructions when utilising globally random source proxy selection, given sufficiently large calibration windows and proxy network sizes.Therefore, the results presented here highlight a case for considering the influence of nonstationarities on real-world reconstructions and their underlying methods, which generally employ small proxy networks.The influence of the choice of method on the reconstruction and its sensitivity to non-stationarities was stark.The non-stationarities and reconstruction method usually had a greater influence on reconstruction skill than the calibration window length.In the best-case scenario (i.e.long calibration window and large proxy network), the CPS_RV method had the greatest skill.In less-than-ideal conditions (e.g.small calibration windows or proxy networks), the MRV method clearly excelled, even managing to produce a high proportion of skilful reconstructions given only pseudoproxies considered non-stationary (Fig. 5).However, the unscaled MRV method showed poor RMSE performance, meaning that it can only be used to provide useful information on the relative changes in ENSO variance.We also note that the large difference between the MRV and RVM experiments (Figs. 3  and 9) is contradictory to the results in Fig. 4 of McGregor et al. (2013a).However, these differences were due to the 10year low-pass filter used in McGregor et al. (2013a), whereas in this study the data were unfiltered.Consequently, the RVM was found to be sensitive to the low-pass filtering, while the MRV was insensitive (results not shown).
For reconstructions of large-scale phenomena like ENSO, larger, more globally diverse networks will produce more informative reconstructions compared to those derived from smaller regions or single sites (Mann, 2002;Lee et al., 2008;von Storch et al., 2009;McGregor et al., 2013a).The experiments conducted here support this hypothesis, as the proportions of skilful reconstructions increase as the number of source proxies increases for almost all reconstruc-tion methods and calibration window lengths (Figs. 8 and  5).Our work further shows that large networks also reduce errors relating to non-stationarity of teleconnections, which further supports their employment (Fig. 5).However, this skill improvement is affected by the degree of non-stationarity and teleconnection co-variability present in the reconstructions, with non-stationary proxy networks (NSTAT ntrop_ts , Fig. 8, red lines) and "organised" teleconnection co-variability (PNEOF1, Fig. 7a-d) reducing the degree of improvement in skill with increasing network size.Thus, where increasing network size would usually improve the reconstruction, non-stationarities and spatial coherence in variations in teleconnection strength can substantially temper this improvement.In extreme cases, where proxies are selected from co-varying areas (PNEOF1, Fig. 7), reconstruction skill may show no improvement with larger proxy networks.This further stresses the importance of ensuring that all constituent proxies utilised in a reconstruction are not affected by co-varying teleconnections.This is more likely achieved in spatially diverse, large multi-proxy networks.
The results of this study further emphasise the need for more palaeoclimate proxies to be available for multi-proxy climate reconstructions.Given the skilful reconstructions in ENSO variance that can be produced by neglecting pseudoproxies from the centre of action as shown here, the utilisation of data solely from the eastern equatorial Pacific appears unnecessary.In fact, these results utilising globally random proxy selection support the development of palaeoclimate proxies from a wide range of global locations.Furthermore, developing an understanding of the teleconnections and their underlying mechanisms around the globe will assist with selection of palaeoclimate proxy locations that are unlikely to be affected by teleconnection co-variability.

Conclusions
We have demonstrated that non-stationarities in ENSO teleconnected proxies can significantly reduce reconstruction skill, and that this is dependent on proxy location, multiproxy network size, and reconstruction method.These results make the implicit assumption that the modelled covariability in the non-stationarities and relative proportions of non-stationary areas to stationary areas is realistic, which has not been explicitly tested here.Ultimately, our results show that non-stationarities are unlikely to significantly affect reconstruction skill for larger, globally selected, multiproxy networks (> 20 proxies).Non-stationarities will deteriorate reconstructions if the entire network exhibits nonstationarities, but this is highly unlikely (< 0.3 %) for large networks (> 20 proxies), which can be considered globally distributed.However, the results suggest caution when developing reconstructions using single site proxies or multiple proxies from the same teleconnected region, as there is a reasonable chance this would lead to an unskilful recon-struction if there are no other sources of information.Thus, using multiple teleconnected regions minimises any effects of non-stationarities for all methods tested.
Reconstruction methods that operate on the raw time series data (weighting the proxy time series directly) are most sensitive to non-stationarities (RVM, and CPS_RV methods), while the method utilising the running variance time series (MRV method) is the most robust against nonstationarities.However, these were the only methods tested, and there are many reconstruction methods in the literature (Jones et al., 2009;Wilson et al., 2010) that should be tested in future research.Neglecting proxies from ENSO's centre of action still allows for skilful reconstructions to be made, but their inclusion reduces the chance of producing particularly poor reconstructions even if non-stationarities are present.Further research would involve examining the organisation of non-stationarities and co-varying teleconnections in more detail, exploring the use of running variance on proxy time series as preprocessing, or evaluating how robust other reconstruction methods are against non-stationary teleconnections.
The Supplement related to this article is available online at doi:10.5194/cp-11-1733-2015-supplement.

Figure 1 .
Figure 1.(a) The percentiles of correlations found in 31-year segments between the model (see Sect. 2) surface temperature (TS) at each grid point and the model-calculated Niño 3.4 index (y axis), plotted against the corresponding correlations for the whole 499 years of data (x axis).The lines are the 1st, 5th, 50th, 95th, and 99th percentiles, with the lowest lines indicating the lowest percentiles (i.e. the bottom line is the 1st percentile).(b) The shading is the correlation between of the entire 499 years of TS at each grid point and the model-calculated Niño 3.4 index, both calculated from the GFDL CM2.1 data, also described in Sect. 2. The black contour lines are the correlation coefficients (spacing of 0.2) of observed surface land-sea temperatures to its corresponding Niño 3.4.Solid lines are positive values, while dashed lines are negative.These observations were calculated using the last 50 years of annual mean GISTEMP_ersst observational data (GISTEMP-Team, 2015).Data set is described byHansen et al. (2010).

Figure 2 .
Figure 2. Panels (a), (c) and (e) show the number of non-stationary windows (coloured shading) for each grid point over the entire data set for 31-, 61-and 91-year windows, respectively.This shading has been adjusted for the slightly different lengths of data available for the different calibration window length.The number of non-stationary grid points (using 499 years of data) for any window is shown in the bottom right corner of each panel as N. Panels (b), (d) and (f) show the percentiles of correlations between global TS and Niño 3.4 in 31-, 61-and 91-year windows respectively (y axis) versus the corresponding correlations for the whole 499 years of data (x axis).This plot is very similar to Fig. 1a, but with the underlying coloured shading representing the y axis positions of non-stationary windows in the plot (according to definition of non-stationarity; see Sect.3).A deeper red indicates a higher density of points, as many points can occupy the same correlation values.

Figure 3 .
Figure3.The 5th (lower dashed), 50th (thick) and 95th (upper dotted and dashed) percentiles of correlation coefficients calculated between the pseudo-reconstructions of running variance and ENSO running variance (y axis) plotted against the proxy network size (x axis).The percentiles are calculated across the 1000 iterations of randomly selected groups of source proxies.These reconstructions are from the first series of experiments (see Sect. 3), which utilize the entire 499 years of data.

Figure 4 .
Figure 4.A comparison of reconstruction skill of the global RND glb_ts (blue) and the non-tropical RND ntrop_ts (yellow) experiments.Correlation coefficients are calculated between the reconstructed running variance and ENSO running variance (y axis), plotted against the proxy network size (x axis).The coloured regions show the range of these coefficients, from the 5th to the 95th percentile, with overlapping regions shown by the yellow-green colouring.The thick blue and orange lines show the proportion of skilful reconstructions for the RND glb_ts and RND ntrop_ts experiments respectively.Skilful reconstructions are defined as explaining greater than 50 % of variance (grey line).The black line is the difference in skill between the RND glb_ts (blue line) and RND ntrop_ts (orange line) experiments.Each row corresponds to different calibration window lengths, titled on the y axis.Each column represents different reconstruction methods, titled at the top of the columns.

Figure 5 .
Figure 5.A comparison of reconstruction skill of the "stationary" STAT ntrop_ts (green) and non-stationary NSTAT ntrop_ts (pink) experiments.Correlation coefficients are calculated between the reconstructed running variance and ENSO running variance (y axis), plotted against the proxy network size (x axis).The coloured regions show the range, from the 5th to the 95th percentile, with overlapping regions shown by the brownish colouring.The thick green and red lines show the proportion of skilful reconstructions for the STAT ntrop_ts and NSTAT ntrop_ts experiments respectively.Skilful reconstructions are defined as explaining greater than 50 % of explained variance (grey line).The black line is the difference in skill between the STAT ntrop (green line) and NSTAT ntrop (red line) experiments.Each row corresponds to different calibration window lengths, titled on the y axis.Each column represents different reconstruction methods, titled at the top of the columns.

Figure 6 .
Figure6.This plot shows the percentage of TS-based reconstructions (y axis) with certain proportions of non-stationary proxies (x axis) for the RND glb_ts experiment.Each of the ten 31-year calibration windows has been included in these calculations, so that the proportions of non-stationarities for 10 000 reconstructions are shown (50 % being 5000 reconstructions).Different lines are for different proxy network sizes (see inset legend), and this determines what values of proportion can be taken; hence, larger groups have a wider range of possible non-stationarity proportions than smaller groups.The coloured circles of any proportions with 0 % of reconstructions have not been shown.

Figure 7 .
Figure 7. (a-d) The reconstructions from the PNEOF1 (solid) and STAT ntrop_ts (dashed) experiments using different reconstruction methods.The proportion of skilful reconstructions (out of the 10 000) is shown for calibration windows of 31-(blue), 61-(green) and 91-year length (red), plotted against proxy network size (x axis).Skilful is defined as the reconstruction explaining greater than 50 % of the variance of the model ENSO reconstruction.(e) The spatial map of the leading empirical orthogonal function (EOF) of running correlations calculated between TS at each grid point and ENSO (with a window length of 31 years).The spatial structure of this EOF is quantitatively similar to the first EOF with running correlation window lengths of 61 (spatial r = 0.86), and 91 (spatial r = 0.84) years.(f) The leading principal components of the leading EOF with running correlation window lengths of 31 (blue), 61 (green) and 91 (red) years.The year values correspond to the centre of the windows.(g) The variance explained by the first 10 EOFs, for the three different window sizes (see inset legend).

Figure 8 .
Figure 8. Reconstruction skill for different experiments (as indicated by colour) and reconstruction methods (each panel) using different calibration window lengths (see inset legend).The lines show the proportion of skilful reconstructions, with skilful being defined as explaining greater than 50 % of the explained variance of the Niño 3.4 running variance.

Figure 9 .
Figure 9.A comparison of all the RND glb_ts reconstructions using 499 years of data (black) and when using limited calibration windows of 31 (blue), 61 (green), and 91 years (red).The 5th (dashed), 50th (solid line) and 95th (dot-dashed) percentiles of correlation coefficients are shown for each of the window lengths and for reconstructions using the 499 years of data.Correlation coefficients are calculated between the reconstructed running variance and ENSO running variance (y axis), plotted against the proxy network size (x axis).Panels (a-d) show the comparison for the four reconstruction methods discussed in Sect.3.3.

Figure 10 .
Figure 10.The reconstructed Niño 3.4 running variance range of various methods (grey shading) plotted with the model Niño 3.4 running variance.The first row of panels shows the unscaled reconstruction methods, while the second row has been scaled with a linear regression.Reconstruction methods are indicated in the title above the panel, and the scaled variants are named beginning with an "S".The year of the reconstruction is shown on the x axis of these panels.Running variances are calculated using 30-year moving windows.Some example reconstructions (red, yellow and blue lines) are shown, with the colours corresponding to a specific network of pseudoproxies.These reconstructions are from the RND glb_ts experiment, with a 91-year calibration window.