Spatial and temporal agreement in climate model simulations of the Interdecadal Pacific Oscillation

Accelerated warming and hiatus periods in the long-term rise of Global Mean Surface Temperature (GMST) have, in recent decades, been associated with the Interdecadal Pacific Oscillation (IPO). Critically, decadal climate prediction relies on the skill of state-of-the-art climate models to reliably represent these low-frequency climate variations. We undertake a systematic evaluation of the simulation of the IPO in the suite of Coupled Model Intercomparison Project 5 (CMIP5) models. We track the IPO in pre-industrial (control) and all-forcings (historical) experiments using the IPO tripole index (TPI). The TPI is explicitly aligned with the observed spatial pattern of the IPO, and circumvents assumptions about the nature of global warming. We find that many models underestimate the ratio of decadal-to-total variance in sea surface temperatures (SSTs). However, the basin-wide spatial patterns of the positive and negative phases of the IPO are simulated reasonably well, with spatial pattern correlation coefficients between observations and models spanning the range 0.4–0.8. Deficiencies are mainly in the extratropical Pacific. Models that better capture the spatial pattern of the IPO also tend to more realistically simulate the ratio of decadal to total variance. Of the 13% of model centuries that have a fractional bias in the decadal-to-total TPI variance of 0.2 or less, 84% also have a spatial pattern correlation coefficient with the observed pattern exceeding 0.5. This result is highly consistent across both IPO positive and negative phases. This is evidence that the IPO is related to one or more inherent dynamical mechanisms of the climate system.


Introduction
The Interdecadal Pacific Oscillation (IPO) is a major expression of decadal to interdecadal variability centred in, but extending beyond, the Pacific (Power et al 1999). The temporal variability of the IPO is closely related to that of the Pacific Decadal Oscillation (PDO, Power et al 1999, Folland et al 2002). The PDO (Mantua et al 1997, Newman et al 2003) is defined by mainly extratropical North Pacific sea surface temperatures (SSTs). The IPO is associated with the evolution of the El Niño-Southern Oscillation (ENSO) and its impacts globally (Arblaster et al 2002, Meehl and Arblaster 2012, Newman et al 2016, Schneider and Cornuelle 2005). Folland et al (2002) provided evidence that the influence of the IPO on the South Pacific Convergence Zone can be statistically significantly distinguished from that of ENSO. A distinguishable impact of the IPO from ENSO is also seen on Australian rainfall (Power et al 1999), agriculture (McKeon et al 2004) and flood risk (Kiem et al 2003).
Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Changes in the phase of the IPO in the last century have been associated with changes in the rate of anthropogenic global warming (Dai et al 2015, Fyfe et al 2016, Kosaka and Xie 2013, Maher et al 2014). Since evidence has been presented that a causal dynamical relationship exists between the IPO and periods of acceleration and slowdown in Global Mean Surface Temperature (GMST) rise, it is critical that we understand and model the IPO accurately. Significant research questions include: Will the IPO shift to its positive phase in the near future, or has it recently shifted phase? Will such a shift result in a period of accelerated surface warming extending beyond the El Niño of 2015-2016? Will particular phases of the IPO change their frequency in the future? How does decadal variability influence the determination of the emergence of a climate signal attributable to anthropogenic influences (Hawkins et al 2014, King et al 2015, Muir et al 2013)? Critically, none of these questions can be adequately addressed in the absence of reliable climate model representations of the IPO.
Decadal climate prediction was a focus of the Coupled Model Intercomparison Project 5 (CMIP5) and the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC) (Kirtman et al 2013). In CMIP5, several experiments were designed to initialize climate models with the observed three-dimensional ocean state, in order to project the evolution of the climate 10 yr into the future. There is evidence that the two major transitions of the IPO since 1960 (from negative to positive in the 1970s, and from positive to negative in the late 1990s) can be simulated in initialized hindcasts Teng 2012, 2014). Additionally, it has been shown that climate models initialized in the mid-1990s could have predicted the late-1990s IPO transition. A decadal prediction initialized in 2013 indicates that an IPO transition to positive occurred in the 2014-2015 timeframe (Meehl et al 2016). Decadal prediction systems rely on the realistic simulation of the temporal and spatial characteristics of the climate system on multiannual to decadal timescales.
Despite these advances and the importance of skilfully modelling decadal variability, significantly less attention has been given to the assessment of model simulations of the IPO and decadal-timescale Pacific variability than to the simulation of ENSO and other modes of variability on interannual and sub-annual timescales. ENSO has been the subject of substantial modelling and skill-benchmarking efforts (Bellenger et al 2013, Guilyardi et al 2012, Brown et al 2014). The PDO and North Pacific decadal variability have also received significantly more direct attention than the Pacific-wide IPO. The PDO in the North Pacific has generally been found to be poorly modelled by the CMIP3 generation of climate models (Oshima and Tanimoto 2009, Stoner et al 2009). Models do not accurately capture North Pacific decadal SST and sea level pressure (SLP) modes or tropical to extratropical teleconnections (Furtado et al 2011). There has been a reported improvement from CMIP3 to CMIP5 in modelling the PDO and its teleconnection to North American rainfall (Polade et al 2013). PDO spatial and spectral patterns in the North Pacific in the Palaeoclimate Modelling Intercomparison Project (PMIP3) past-millennium forced simulations show some similarities to the observed patterns (Fleming and Anchukaitis 2016). However, Kociuba and Power (2015) concluded that the CMIP5 models underestimate the magnitude of tropical mean sea level pressure (MSLP) variability on interdecadal time-scales, and presented evidence that this characteristic is related to deficiencies in the autocorrelation properties of ENSO in the models. Power et al (2016) investigated rates of surface warming in observations and CMIP5 models and concluded that over-estimates of simulated multidecadal warming in some models reduce the confidence in their long-term projections.
Studies of future projections of the IPO/PDO have yielded contrasting results. For example, Lapp et al (2012) documented a projected weak trend towards more negative PDO phase occurrences in the 21st century CMIP3 simulations. However, Dong et al (2014) suggested that increasing greenhouse gases and aerosols favour the positive phase of the IPO/PDO. Fang et al (2014) documented other potential changes to the PDO and its mechanisms under greenhouse warming, reporting weaker, higher frequency variability of the PDO due to faster oceanic baroclinic Rossby waves. Overall, there remains a large degree of uncertainty with regard to the future of the IPO/PDO.
Previous studies have used a principal component analysis (PCA) methodology to identify the IPO/PDO, with the exception of Dong et al (2014) who used a North Pacific index to compare Pacific Decadal Variability (PDV) across model experiments. The IPO Tripole index (TPI) of Henley et al (2015) is used in this study, as it incorporates both the North and South Pacific, presents a systematic, reproducible and objective metric for the comparison of the IPO across observed and modelled SST datasets and does not require detrending of SST data nor any associated assumptions about the nature of long-term trends.
In addition to these differences from previous studies, there are important aspects of the temporal and spatial evolution of the Pacific-wide IPO that have not yet been comprehensively studied in climate models; for example, variance-based persistence metrics, autocorrelation, IPO event duration and the association between temporal metrics and spatial patterns.
Environ. Res. Lett. 12 (2017) 044011
This study evaluates the spatial and temporal representation of the IPO in CMIP5 models and investigates whether or not there is a relationship between the spatial and temporal characteristics of each model's representation of the IPO. Section 2 outlines the data and methods used in this study. Results are presented in section 3. A summary and discussion are given in section 4.

Data and Methods
This study uses observed SST data from HadISST2.1 from the UK Met Office on a 1° × 1° grid at monthly resolution, covering 1850-2010 inclusive (Kennedy et al 2016, Rayner et al 2017), truncated to the period 1911 onwards, yielding 100 yr of observed SST. The HadISST.2.1 data is constructed in a two-step process, based on satellite and in-situ data (Kennedy, in preparation 2016). Firstly, an iterative Bayesian PCA-based reconstruction method (Ilin and Kaplan 2009) is used to characterise large-scale features of the SST fields. Secondly, a local optimal interpolation method adds smaller-scale detail in well-observed regions (Karspeck et al 2011). Uncertainty is expressed by drawing representative samples from both interpolation steps and the bias adjustment scheme. In this study we use 10 realisations from this dataset.
The model data used here comprise preindustrial control (unforced) and historical (all-forcings, including volcanic eruptions, solar variability and time-varying anthropogenic emissions) experiments from 39 CMIP5 models (as listed in table S1 available at stacks.iop.org/ERL/12/044011/mmedia) at monthly resolution. Model preindustrial and historical runs vary in length from 100 to over 1000 yr, and some models have multiple realisations. Modelled and observed SST data (model variable 'ts') are re-gridded to a common 1.5° × 1.5° grid.
We use the IPO Tripole Index (TPI) of Henley et al (2015) to track the IPO. The index has native units of °C and has the advantage of being explicitly aligned with the observed spatial pattern of the IPO. The TPI is defined as the difference between SST anomalies (SSTA) averaged over the central equatorial Pacific (T2, 10°S-10°N, 170°E-90°W) and the average of the SSTA in the Northwest (T1, 25°N-45°N, 140°E-145°W) and Southwest Pacific (T3, 50°S-15°S, 150°E-160°W), as shown in equation (1).
TPI = T2 − (T1 + T3)/2 (1)
The observed time series and spatial SST correlation pattern are shown in figure 1. Alternative definitions of the IPO are not expected to yield significantly different results, given the strong similarities in the observational period between the TPI and PCA-based IPO indices (Henley et al 2015).
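As a minimal illustration of the TPI definition above, the sketch below combines three box-mean SST anomaly series into the index. The function name `tripole_index` and the toy anomaly values are hypothetical, and the sketch assumes the box averages T1, T2 and T3 have already been computed; it is not the processing code used in this study.

```python
def tripole_index(t1, t2, t3):
    """Return TPI = T2 - (T1 + T3) / 2 for each time step.

    t1, t2, t3 are equal-length sequences of box-mean SST anomalies (degC)
    for the Northwest, central equatorial and Southwest Pacific boxes.
    """
    if not (len(t1) == len(t2) == len(t3)):
        raise ValueError("box series must have equal length")
    return [b - (a + c) / 2.0 for a, b, c in zip(t1, t2, t3)]


# Toy anomalies, made up for illustration only (not observed values).
t1 = [-0.2, 0.1, 0.3]
t2 = [0.4, -0.1, -0.5]
t3 = [-0.4, 0.1, 0.1]
tpi = tripole_index(t1, t2, t3)
```

A warm equatorial box combined with cool off-equatorial boxes gives a positive TPI, and the reverse gives a negative TPI.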
Spatial composite patterns of the observed and modelled IPO are identified as the mean unfiltered SST during strong phases of the IPO (subsequently referred to as 'unfiltered'), as identified by periods of greater than or equal to one standard deviation above or below the mean of the low-pass filtered TPI (using a Chebyshev filter with a 13 yr cut-off period, subsequently referred to as 'filtered'). The durations of IPO phases are assessed using run-lengths above and below the long-term TPI observed or modelled mean. However, IPO durations of a given phase lasting less than 5 yr, resulting from brief incursions above or below the threshold, are omitted from the calculations of the mean run-length and other statistics in both observations and models. A caveat to be mentioned here is that the IPO SST patterns from the model control runs are entirely internally generated by construction since there is no variation in external forcings, while the IPO SST patterns from the 20th century all-forcings simulations, as well as the observations used to deduce an observed IPO pattern, include the effects of external forcings. Since the response of Pacific SSTs to some external forcings resembles the IPO, such patterns are not totally independent, though there is evidence that the IPO is the dominant pattern of decadal variability in the Pacific (Meehl et al 2009).
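The phase selection and run-length rules described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used in this study: the function names are hypothetical, the input is taken to be one TPI value per year, the population standard deviation is used, and a value exactly equal to the mean is counted as below the mean. The low-pass filtering step is not shown.

```python
from statistics import fmean, pstdev


def strong_phase_labels(filtered_tpi):
    """Label each year +1 (strong positive), -1 (strong negative) or 0.

    A year is 'strong' when the filtered TPI lies at least one standard
    deviation above or below its mean (population std is an assumption).
    """
    mean = fmean(filtered_tpi)
    sd = pstdev(filtered_tpi)
    labels = []
    for x in filtered_tpi:
        if x >= mean + sd:
            labels.append(1)
        elif x <= mean - sd:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def run_lengths(tpi, min_years=5):
    """Return (sign, duration) for runs above or below the series mean.

    Runs shorter than min_years are omitted, mirroring the 5 yr rule.
    """
    mean = fmean(tpi)
    runs = []
    current_sign, length = None, 0
    for x in tpi:
        sign = 1 if x > mean else -1
        if sign == current_sign:
            length += 1
        else:
            if current_sign is not None:
                runs.append((current_sign, length))
            current_sign, length = sign, 1
    if current_sign is not None:
        runs.append((current_sign, length))
    return [(s, n) for s, n in runs if n >= min_years]
```

A composite pattern would then be the mean of the unfiltered SST field over the years labelled +1 (or -1), and the mean run-length is the average of the retained durations.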
Boxplots of model statistics (figure 2) show the inter-centennial and inter-model variability, with statistics computed for each 100 yr block in each model. Some simulations extend beyond a multiple of 100 yr. In these cases, additional time is included in the previous sample if shorter than 50 yr, but taken as a new sample if longer than 50 yr. With model simulation durations between 100 and 1163 yr, the boxplots display between one and twelve samples per model. No detrending is applied to the observed or modelled TPI data, since the TPI circumvents the need to detrend SST data (Henley et al 2015). Boxplot boundaries are set at the 25th and 75th percentiles (p25 and p75). Outliers are shown as red crosses, and are defined as samples greater than p75 + 1.5 (p75 − p25) or less than p25 − 1.5 (p75 − p25). Whiskers are shown at the extents of the data not considered outliers.
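The block-splitting rule and the outlier definition above can be sketched as below. The function names are hypothetical, a remainder of exactly 50 yr is assumed to be merged into the previous block (the text does not specify this case), and the percentile interpolation method is also an assumption.

```python
from statistics import quantiles


def century_blocks(n_years, block=100, threshold=50):
    """Return (start, end) year offsets of each sample, end exclusive.

    A remainder longer than `threshold` becomes its own sample; a shorter
    remainder is appended to the previous block.
    """
    if n_years < block:
        raise ValueError("simulation is shorter than one block")
    n_full, remainder = divmod(n_years, block)
    bounds = [(i * block, (i + 1) * block) for i in range(n_full)]
    if remainder > threshold:
        bounds.append((n_full * block, n_years))
    elif remainder > 0:
        start, _ = bounds[-1]
        bounds[-1] = (start, n_years)
    return bounds


def outlier_fences(samples):
    """Return (lower, upper) fences: p25 - 1.5 IQR and p75 + 1.5 IQR."""
    p25, _, p75 = quantiles(samples, n=4, method="inclusive")
    iqr = p75 - p25
    return p25 - 1.5 * iqr, p75 + 1.5 * iqr
```

Under these assumptions, a 1163 yr simulation yields twelve samples (eleven full centuries plus a 63 yr remainder), consistent with the maximum of twelve samples per model noted above.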

Model Performance
Key characteristics of the temporal and spatial variability of the IPO are assessed using a set of metrics applied to the TPI time series in the observations and models. The metrics assess the variance and persistence characteristics in the models, as well as the spatial pattern.

Temporal metrics
Here we compare decadal and total TPI variance in the CMIP5 models to observations. For the total TPI variability in the preindustrial control runs, 60% (140 of 231) of the simulations show a higher total standard deviation than the observed TPI, whereas for the historical simulations 36% (62 of 174) of the simulations are higher than the mean observed value. Nevertheless, the CMIP5 model 100 yr replicates span a large range either side of the observed values (figure 2(a), unfiltered TPI standard deviation), so it is not possible to infer a systematic difference between control and historical runs. Additionally, the observations represent only one realisation of a process replicated a number of times in many of the models.
A clearer bias in the models is apparent with regard to the comparison of low frequency TPI variability, with the substantial majority of preindustrial and historical simulations underestimating the filtered standard deviation of the TPI (figure 2(b)). This bias extends to the under-estimation of the ratio of decadal to total standard deviation in the majority of models, which is around 0.4 in the observations (figure 2(c)). The best performing models in terms of this ratio (modelled inter-quartile range overlaps with observations) are: ACCESS1-0, ACCESS1-3, CMCC-CM, MPI-ESM-LR, MPI-ESM-MR, MPI-ESM-P and MRI-CGCM3. However, none of the models systematically capture all three variance statistics accurately (all results shown in figure S1).
The annual autocorrelation of the TPI during the peak ENSO season of Sep-Feb provides an additional measure of interannual persistence. Despite the clear variance bias across the models, such a distinct bias is not evident in the autocorrelation, with the models being centred on, and spanning a large range around, the observed lag-one autocorrelation (figure 2(d), and similarly for lag-two autocorrelation as shown in figure S2(e)). The number of IPO events per century and the mean IPO run-length (figures 2(f) and (g)) reveal a bias towards more IPO events per century and lower mean IPO run-length compared to the available observations. Observations depict around 6-7 IPO events per century, and a mean run-length of 14-18 yr. Power spectra of the North Pacific PDO in PMIP3 past millennium simulations (Fleming and Anchukaitis 2016) exhibit broadband peaks at low frequencies; however, few statistically significant spectral peaks appear above red noise levels.
The IPO behaviour is explored further in the supplementary section S1, where results are presented for each of the 3 poles of the TPI (Box 1: North Pacific, Box 2: Central and Eastern equatorial Pacific, Box 3: Southwest Pacific). The filtered and unfiltered variance results for Boxes 1 and 3 are qualitatively similar to the TPI results, with overestimation of the total variance and underestimation of the decadal variance. However, the models perform significantly better on these metrics in the equatorial Pacific (Box 2), with a much lower model spread and no distinct bias in model variance ratios. The mean IPO run-length and events per century are better modelled in Boxes 2 and 3 than in Box 1, where the run-length duration is underestimated. Note, however, that run-length statistics are highly sensitive to the threshold and few observed IPO phases are available (figure S6 shows timeseries and run-lengths of filtered anomalies in each Box).
This suggests the North Pacific is the potential source of the mismatch between observed and modelled persistence in the IPO, which is consistent with studies cited above that noted similar deficiencies for the PDO (defined for the North Pacific). However, the observed data are potentially less reliable in the Southern Hemisphere due to fewer observations prior to the satellite era (post-1979). We note the brief cool SSTA in Box 3 in the mid-1960s that cut short the long warm phase in Box 3 (IPO negative), reducing the observed mean run-length in that box, which would otherwise be around 20 yr.
The variance results in this section are consistent with the results of Kociuba and Power (2015), who examined the ability of models to simulate equatorial MSLP variability on interannual and interdecadal time-scales. They found that interannual atmospheric variability was too strong and decadal variability was too weak. They concluded that the deficiency on decadal time-scales primarily arose because the models tend to underestimate the equatorial MSLP lag-one autocorrelation, but overestimate the magnitude of the (negative) lag-two autocorrelation. They showed that such deficiencies combine to make it more difficult to sustain decadal and longer-term anomalies. This also helps to explain why the TPI run-lengths tend to be too short and why the models tend to overestimate the frequency of IPO events.
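The lag-one and lag-two autocorrelations discussed above can be estimated with a short function such as the one below. This is a sketch under assumptions: the function name is hypothetical, the input is taken to be one value per year (for example a Sep-Feb seasonal mean), and the common estimator normalised by the full-series sum of squares is used, which may differ from the estimator applied in this study.

```python
from statistics import fmean


def autocorrelation(series, lag=1):
    """Sample autocorrelation of an annual series at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = fmean(series)
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant")
    num = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    return num / denom


# A strictly alternating toy series: negative at lag one, positive at lag two.
toy = [1, -1, 1, -1, 1, -1]
r1 = autocorrelation(toy, lag=1)
r2 = autocorrelation(toy, lag=2)
```

A series with low lag-one autocorrelation and a strongly negative lag-two autocorrelation changes sign frequently, which is the behaviour that shortens simulated run-lengths.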
It is also interesting to note that the models tend to simulate variability in SST in the central-eastern equatorial Pacific reasonably well. However, the models exhibit similar deficiencies in both the off-equatorial TPI nodes and equatorial atmospheric variability (Kociuba and Power 2015). This indicates that equatorial atmospheric variability on interannual and decadal time-scales is at least partially driven by SST variability beyond the central-eastern equatorial Pacific, and also points towards the importance of tropical-extratropical atmosphere-ocean interactions, as emphasised by Newman et al (2016) and Farneti et al (2014).

Spatial metrics
Model performance is further investigated in this section by comparing the observed and modelled spatial patterns of the IPO in preindustrial control runs (similar results are obtained for the historical runs, not shown). The basin-scale aspects of the observed patterns of the IPO are captured well by the models. In particular, the models capture the strong extratropical SST amplitude variability in the North Pacific, and the broad ENSO-like cold-tongue region, extending into the Eastern North Pacific off the coast of North America (figures 3(a)-(d)). The multi-model ensemble (MME) composite pattern shows stronger SST anomalies near the Kuroshio extension region relative to the central North Pacific, which is not seen in the observations, and is similar to the PMIP3 simulated PDO patterns presented by Fleming and Anchukaitis (2016) (their figures 1 and 2). The mean MME pattern in the positive phase of the IPO exhibits equatorial SST anomalies extending further into the western Pacific than in the observed pattern, consistent with the well-known cold tongue model bias of ENSO. SST anomalies near the Maritime Continent and around northern and eastern Australia are accordingly of opposite sign to the observed pattern for positive IPO phases (figure 3(c)). The models better capture the SST pattern in this region during negative IPO phases than positive IPO phases (figure 3(c)). The South Pacific has generally weaker SST amplitude relative to the North Pacific in the MME pattern compared to the observed, particularly in positive IPO phases, for which the observations indicate similar intensities of SST anomalies in the North and South Pacific. There is a suggestion that the Pacific IPO pattern is related to anomalous SST patterns in the Indian and Atlantic Oceans in models; however, this is less consistent in observations (shown in the supplementary figure S7).
Our model patterns have broad agreement with the IPO patterns shown by Maher et al (2014) using a PC-based methodology, and the PDO patterns identified by Newman et al (2016). In addition, with regard to the observed patterns, we find weak sensitivity to changes in methodology (e.g. analysis period, baseline period, IPO threshold). A longer baseline period, as used in Henley et al (2015), results in a slightly stronger off-equatorial IPO positive pattern. However, the difference does not influence the results strongly. The pattern correlation coefficients between the models and observational IPO pattern across all modelled centuries are mostly in the range of 0.4-0.8 (figures 3(e) and (f)). The standard deviation of the patterns is a measure of the strength of the anomaly pattern in IPO phases. This is slightly underestimated by the models, and marginally closer to observations in IPO positive phases than in IPO negative phases. Particularly encouraging are the similarities between the modelled and observed patterns in regions distant from the nodes of the TPI, such as broad regions along the coast of North America and the Gulf of Alaska. The higher levels of spatial correlation between modelled and observed SST patterns (near 0.8) mainly result from greater similarity in the magnitude of SST anomalies in the equatorial region (figure S5). Models showing such high pattern correlations generally do not show appreciably better pattern magnitudes in the South Pacific.
Turning now to model biases, slightly cooler than observed anomalies are simulated across most of the Pacific basin during IPO positive phases, with the exception of the Sea of Japan, the south and eastern coasts of Australia and the Tasman Sea, where the SST model bias is very small and positive (figure 4(a)). During negative phases, biases are more equally of both signs but mainly quite small. The strongest warm biases are off the coast of Baja California, in the Kuroshio extension region and off the coast of southeast mainland Australia (figure 4(b)). The highest areas of inter-model difference in the bias (above 0.2°C) are quite localised, in the Bering Sea and in the Kuroshio extension region (figures 4(c) and (d)). Otherwise, inter-model differences are smaller and rather uniform across models elsewhere in the Pacific. The model biases are systematic in most regions during both IPO phases (figures 4(e) and (f)). That is, when a bias is apparent, a large proportion of the models display the bias. Although an overall cold bias is apparent in IPO positive phases (figure 4(a)), there are large proportions of the basin where the majority of models are biased warm (figure 4(e)), indicative of skewness in the distribution of model biases.
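The spatial pattern correlation quoted in this section can be illustrated with the sketch below, which computes a centred correlation between two composite patterns flattened to lists of grid-cell values. The function name is hypothetical, missing cells (for example land) are assumed to be marked as None, and the optional area weights are an assumption, since the weighting used in this study is not stated here.

```python
from math import sqrt


def pattern_correlation(obs, model, weights=None):
    """Centred (optionally weighted) correlation between two SST patterns.

    obs and model are equal-length lists of grid-cell values; cells where
    either value is None are skipped.
    """
    if len(obs) != len(model):
        raise ValueError("patterns must be on the same grid")
    if weights is None:
        weights = [1.0] * len(obs)
    cells = [(o, m, w) for o, m, w in zip(obs, model, weights)
             if o is not None and m is not None]
    wsum = sum(w for _, _, w in cells)
    o_mean = sum(w * o for o, _, w in cells) / wsum
    m_mean = sum(w * m for _, m, w in cells) / wsum
    cov = sum(w * (o - o_mean) * (m - m_mean) for o, m, w in cells)
    o_var = sum(w * (o - o_mean) ** 2 for o, _, w in cells)
    m_var = sum(w * (m - m_mean) ** 2 for _, m, w in cells)
    return cov / sqrt(o_var * m_var)
```

Because the correlation is insensitive to amplitude, the standard deviation of each pattern is reported separately in this section as a measure of pattern strength.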

Association between temporal and spatial skill
In the previous sections we examined the ability of the models to simulate temporal and spatial characteristics of the IPO separately. Here we determine if there is a link between the ability of models to simulate these temporal and spatial characteristics in preindustrial control runs. To do this we examine the relationship between the spatial pattern correlation for each century in each model and a measure of the ability of the models to capture the ratio of the decadal-to-total variance of the TPI (figure 5). The temporal measure is the absolute value of the difference between the modelled and observed ratio of decadal-to-total standard deviation in the TPI, expressed as a fraction of the observed value. For example, a value of 0.2 indicates a 20% error (either positive or negative) in the statistic relative to the observation. Figure 5 shows that models that exhibit higher skill (lower error) in the ratio of the decadal-to-total variance also have a strong tendency towards having a higher spatial pattern correlation with the IPO pattern in observations for both positive and negative IPO phases. For example, for positive IPO phases, of the 13% (49 of 390) of model centuries that have a fractional bias in the temporal statistic of 0.2 or less, 43 of 49 (88%) also have a pattern correlation over 0.5. The corresponding percentages for IPO negative phases are 13% and 84% respectively. This result is highly consistent across both IPO positive and negative phases, and most models have several of their preindustrial centuries represented (model results shown in table S2). The Spearman correlations between the spatial and temporal metrics are −0.38 and −0.35 for positive and negative IPO respectively (both p-values < 0.01). Models with two or more preindustrial centuries in the high skill region in either IPO phase are: ACCESS1-0, ACCESS1-3, CMCC-CM, CSIRO-Mk3-6-0, MPI-ESM-LR, MPI-ESM-MR, MPI-ESM-P, MRI-CGCM3 and NorESM1-M.
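The temporal measure and the selection of the higher skill subset described above can be sketched as follows. The function names, the century labels and the toy ratio and correlation values are hypothetical and serve only to illustrate the arithmetic; the observed ratio of 0.4 is the approximate value quoted in the temporal metrics section.

```python
def fractional_bias(model_ratio, obs_ratio):
    """Absolute error in the decadal-to-total ratio, relative to observed."""
    return abs(model_ratio - obs_ratio) / obs_ratio


def high_skill(centuries, obs_ratio, max_bias=0.2, min_corr=0.5):
    """Return labels of centuries with bias <= max_bias and r > min_corr.

    centuries is a list of (label, decadal_to_total_ratio, pattern_corr).
    """
    return [label for label, ratio, r in centuries
            if fractional_bias(ratio, obs_ratio) <= max_bias and r > min_corr]


# Toy values, made up for illustration only.
obs_ratio = 0.4
centuries = [
    ("model A, century 1", 0.36, 0.70),  # 10% error, high correlation
    ("model A, century 2", 0.25, 0.80),  # 37.5% error
    ("model B, century 1", 0.44, 0.45),  # 10% error, low correlation
]
selected = high_skill(centuries, obs_ratio)
```

Only the first toy century satisfies both criteria, which corresponds to a point in the high skill region of figure 5.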

Conclusion
This study investigates the spatial and temporal simulation of the IPO in CMIP5 pre-industrial control and historical all-forcings runs using the TPI, a box-based index for tracking the IPO. As a whole, the models underestimate decadal-scale TPI variance and the ratio of the decadal-to-total variance compared to observations. Though the models credibly represent the observed spatial pattern of Pacific SSTs associated with the IPO, there is an overall bias in the duration of simulated IPO phases, with most models underestimating the mean run-length of IPO positive and negative phases and thereby overestimating the number of IPO events per century.
These results are consistent with the results of Kociuba and Power (2015) who found that interannual MSLP variability was too high and decadal variability was too weak. The deficiency on decadal time-scales primarily arose because the models tend to underestimate persistence and overestimate oscillatory behaviour in ENSO.
Our results indicate that the temporal bias is also related to the deficiencies in the simulation of decadal to multi-decadal variance in SST in the extratropical Pacific. Given that the models characterise decadal variability in the equatorial Pacific better than the off-equatorial Pacific, and also given the links between the IPO, concurrent changes in upper ocean heat content and the strength of the shallow tropical meridional overturning cells (STCs) (Meehl et al 2011, Newman et al 2016, Roemmich et al 2015), the undersimulation of decadal variability in the extratropical Pacific could be due to a model bias in coupling the equatorial to the extratropical Pacific via the atmospheric bridge and the STCs. Temperature data in the intermediate depth ocean is short and sparse prior to the availability of sub-700 m depth ocean   Farneti et al (2014) relates cooler tropical SST anomalies to weakened subtropical winds, which in turn effects a suppression of the subtropical gyre. Reduced downwelling and equator-ward meridional heat transport via the subtropical cell (STC) leads to reduced equatorial upwelling. This provides a negative feedback mechanism, whereby cool equatorial SST anomalies are returned to the equator as warm anomalies on decadal timescales. A mechanism with tropical origins could result in quasi-symmetrical patterns across both hemispheres. However, a comprehensive mechanistic understanding of the IPO, its distinction from the PDO and the South Pacific Decadal Oscillation (SPDO, Chen and Wallace 2015), and its relation to interannual variability of ENSO remain unconfirmed by observational data and are the subjects of future research.
Given the uncertainty in the characteristic timescale and long-term stability of the IPO and PDO (Fleming and  We show that a subset of 13% of CMIP5 preindustrial control simulations capture the ratio of the decadal-to-total TPI variance to within 20% of the observational value and the majority of simulations in this subset also exhibit higher spatial skill. This association between spatial and temporal skill in the representation of the IPO in CMIP5 models has not been previously identified. The emergence of a higher skill subset of models that depict: (i) patterns of Pacific SST variability similar to the observed IPO; and (ii) temporal variance relationships similar to observed, suggests that the IPO is, or is at least closely related to, one or more inherent dynamical mechanisms of the climate system.
With the presence of model biases in pre-industrial control runs, uncertainty is introduced with regard to the future behaviour of the IPO obtained from forced model runs, such as a preference for a particular phase under greenhouse warming. The documentation here of model strengths and weaknesses related to simulation of the IPO presents an opportunity to make improvements in model simulations influencing decadal climate prediction. In particular, the mechanism that produces the IPO, which could involve tropical-midlatitude interactions in the Pacific and coupled ocean dynamical processes (Farneti et al 2014, Meehl and Hu 2006, Newman et al 2016) and dynamical connections between the Pacific and Atlantic Oceans (Taschetto et al 2015), needs to be better understood and modelled. Ongoing research into these mechanisms and monitoring of mid to deep ocean circulation are expected to improve our understanding of, and ability to track and predict, the IPO. Given the association between IPO phase and global surface temperature variations, improved prediction of the IPO could result in better-constrained decadal predictions of future global mean surface temperature.
Figure 5. Temporal bias and spatial skill in CMIP5 models in IPO phases (panels: positive IPO phases; negative IPO phases). Bias in temporal metric is the absolute error in the ratio of filtered to unfiltered TPI standard deviation (as in panel (c) in figure 2) relative to the observed IPO (Tstat_rel). Spatial skill is the spatial pattern correlation between observed and modelled IPO patterns (r_spatial); each point represents a 100 yr block of a pre-industrial control simulation from one CMIP5 model; higher skill models (Tstat_rel < 0.2 and r_spatial > 0.5) are located at bottom right (models reported in table S2).