On the predictability of SSTA indices from CMIP5 decadal experiments

Sea surface temperature anomaly climate indices in the tropical Pacific and Indian Oceans are statistically significant predictors of seasonal rainfall in the Indo-Pacific region. On this basis, this study evaluates the predictability of nine such indices, at interannual timescales, from the decadal hindcast experiments of four general circulation models. A Monte Carlo scheme is applied to define the periods of enhanced predictability for the indices. The effect of a recommended drift correction technique and the models’ capabilities in simulating two specific El Niño and La Niña events are also examined. The results indicate that the improvement from drift correction is noticeable primarily in the full-field initialized models. Models show skillful predictability at timescales up to maximum a year for most indices, with indices in the tropical Pacific and the Western Indian Ocean having longer predictability horizons than other indices. The multi model ensemble mean shows the highest predictability for the Indian Ocean West Pole Index at 25 months. Models simulate the observed peaks during the El Niño and La Niña events in the Niño 3.4 index with limited success beyond timescales of a year, as expected from the predictability horizons. However, our study of a small number of events and models shows full-field initialized models outperforming anomaly initialized ones in simulating these events at annual timescales.


Introduction
Reliable prediction of rainfall at multi-year to decadal timescales would be of great benefit to water managers and decision-makers, as it affects agricultural production, electricity generation, environmental management, fisheries and regional and national economic prosperity. For example, effective prediction of rainfall and streamflow serves as a basis for devising catchment management policies that help reduce the impacts of droughts, floods, and other hydroclimatic extremes.
Long-term predictability of rainfall is critically dependent on our ability to forecast the slowly-evolving ocean state, as sea surface temperature (SST) has a strong influence on terrestrial rainfall. For instance, many studies have linked Australian rainfall to sea surface temperature anomalies (SSTAs) in the Pacific and Indian Oceans (Nicholls 1989, Drosdowsky and Chambers 2001, Risbey et al 2009, Kirono et al 2010, Pui et al 2012. Of particular importance are: the El Niño Southern Oscillation (ENSO) (McBride and Nicholls 1983, Drosdowsky 1993a, 1993b, Power et al 2006, Wang and Hendon, 2007; the Indian Ocean Dipole, a dipole in SSTA between the central Indian Ocean and Indonesia (Ashok 2003, Verdon and Franks 2005, Ummenhofer et al 2009a; the El Niño Modoki (EMI), characterized by warm SSTA in central Pacific (C) flanked by colder SSTAs in the East (E) and West (W) (Ashok et al 2007, 2009, Cai and Cowan 2009, Taschetto et al 2009, average SSTA in the South Tasman Sea (Drosdowsky 1993a) and the Indonesian seas (Nicholls 1984). These climate modes can be quantified in terms of SSTA indices, and their connections to regional rainfall have long served as a basis for climate forecasts. Schepen et al (2012) found these climate indices to be useful predictors of Australian rainfall totals for the period 1950-2009. Ramsay et al (2008) reported large correlations between the number of seasonal tropical cyclones over Australia and the Niño 3.4 and Niño 4 regions while Pui et al (2011) discussed the relationship between annual maximum flood in Eastern Australia and the Interdecadal Pacific Oscillation. Various studies such as Chiew et al (1998) have shown drought and dry conditions in Australia to be linked to some of the above mentioned climate indices. As an instance of operational use, the Australian Bureau of Meteorology provides probabilistic rainfall forecasts using a statistical prediction system based on SSTA indices over the Pacific and Indian Ocean (Schepen et al 2014). Skilled prediction of such vital SSTA indices at timescales longer than seasonal would have significant social, economic and environmental implications principally from its role in predicting rainfall.
In order to assess the possibility of multi-year climate forecasts, a new set of experiments was set up by the World Climate Research Program part of Phase Five of the Coupled Model Intercomparison Project (CMIP5). These provide simulations of near-term climate (10-30 year hindcasts/predictions) (Meehl et al 2009). The experiments are initialized based on observed oceanic and, in some cases, atmospheric states, and account for changes in external forcings, including anthropogenic aerosols, stratospheric aerosols, greenhouse gases, and solar activity. Predictability on seasonal to decadal timescales is dependent on the reliable forecasting of low frequency variability associated with the state of the ocean (Hawkins and Sutton 2009). Multiple hindcast and forecast runs of the same model with slightly perturbed initial conditions, called 'ensembles,' are carried out (table 1) in order to isolate the predictable part of the climate response from unpredictable short-term climate variability that is unrelated to the models initial state or external forcing.
Climate models are imperfect representations of the real world with equilibrium states that are different to the observed. As with weather and seasonal forecasting, decadal prediction requires that the model is initialized using information based on the observed state of the ocean and atmosphere. As the model is initially perturbed from its equilibrium state, it will tend to revert back to its own equilibrium state over a period of time (Mehrotra et al 2014). This results in spurious trends in the model, referred to as 'climate drift'. The importance of drift is time dependent and will also depend on the variable being assessed (Sen Gupta et al 2012). For weather forecasting (typically ranging from 1 to 15 days), the difference between model and observed climatologies can be ignored. But at longer lead times (monthly, seasonal and decadal timescales), these systematic model errors cannot be ignored, and some form of drift correction must be applied (Magnusson et al 2012). The scale of climate drift can also be affected by the way the model is initialized. Two common approaches are used in CMIP5. In 'full-field initialization' the model's initial state is forced to be as close to the observed state as possible. If the observed state and model equilibrium are very different, the resulting drift is very pronounced. To reduce this problem, models are sometimes initialized by adding the observed anomaly (relative to the observed climatology) to the model climatology (known as 'anomaly initialization') (Smith et al 2007). In this case the initialized model will still retain the pre-existing model climatological biases. However while still present, the drift can be substantially reduced. As such, correcting for drift is an essential prerequisite for proper implementation of a climate model forecast.
Predictability of Indian and Pacific Ocean SSTs from CMIP5 decadal experiments is a relatively new area of research and lacks extensive literature. Lienert and Doblas-Reyes (2013) reported poor Pacific SST predictability on interannual timescales based on the UK Met Office Decadal Prediction System (DePreSys) decadal hindcasts. They reported the correlation skill of ENSO and ENSO-Modoki to be limited to two and one year, respectively. A higher skill for the Indian Ocean SSTs' decadal predictability was obtained by Guemas et al (2013), though they attributed this to the influence of external radiative forcing (i.e. anthropogenic greenhouse gas warming and volcanic eruptions) rather than to the ability of the initialized models to reproduce the observed low-frequency variability. Mehta et al (2013) reported statistically significant decadal prediction skill of global and basinaveraged SSTA for the period 1961-2010 using correlation analyses of CMIP5 decadal experiments using historical optical depths and observations. The feasibility of long term rainfall predictions from SST was examined by Khan et al (2015) and Meehl et al (2010). Based on decadal predictions of Pacific SSTs, Meehl et al (2010) found significant skill in predicting the distribution of precipitation over North America and Australia while Khan et al (2015) reported improvements in seasonal rainfall forecasts over Australia using a better multi-model based SSTA prediction.
For practical applications of the decadal experiments, like in long-term rainfall forecasting, it is important for potential users to understand the limits of this system, and in particular the lead time until which the models' predictions have useful skill. The objective of this paper is to analyze and evaluate the usefulness of CMIP5 decadal simulations in predicting interannual variations in important climate indices. Specifically, we assess the lead time up to which the CMIP5 models are able to predict, with some acceptable skill, the evolution of important Indo-Pacific SSTA indices commonly used as predictors of regional

Drift correction
We apply a drift correction to the indices calculated from the decadal experiments, as recommended for CMIP5 decadal predictions by Climate Variability and Predictability (CLIVAR) (WCRP 2011). This assumes that the drift in the monthly average prediction is just a function of the forecast lead time and is not affected by the model state or the external radiative forcing. For each model and index, bias for the 120 month period (obtained by comparing the model and the observed index values month wise), averaged across all ensembles and initializations, is assumed to represent the mean model drift. So the mean drift is different for each model and index and is subtracted from the index values obtained for individual decadal experiment of the model to give the drift corrected forecast. The uncorrected model forecasts are hereby referred to as raw forecasts.

Estimation of predictability horizon
The indices for each ensemble member and for the ensemble mean for each initialization time are compared with the concurrent observed indices in order to calculate the 120 month (decadal) time series of squared errors (for both the raw and drift-corrected forecast). An estimate of the lead time over which the indices are predictable is obtained by comparing the model errors with distributions of randomly generated errors. A Monte Carlo (MC) scheme is applied to generate twelve random error distributions, one for each calendar month. This is done since SST variance can vary seasonally and, as such, the distribution of errors changes by season. For each index and model, the squared errors for all the decades and ensemble members are averaged to produce the typical monthly evolution of model errors over a period of 120 months. Model errors for each lead time are compared against the appropriate random error distribution. The index is considered predictable at a certain lead time, if the error is smaller than 95% of the random error distribution, i.e., the predictable lead time is the month before the error is greater than 95% of the random error distribution ( figure 2(b)). For instance, HadCM3 (i2) has 10 ensemble members, and we consider forty one decades (yearly intervals starting from 1960 to 2000, as mentioned earlier). The model squared errors are averaged across all decades (41) and ensembles (10), resulting in a single 120 month error time series for each of the indices. The MC scheme used here calculates the average of 410 (41 × 10) 120 month random error time series and repeats it 10 000 times to generate a probability distribution that is used as the climatology of the errors. The model squared errors and the error climatology are then compared (using a 5% significance level) to obtain the period of useful prediction, the predictability horizon. Errors associated with a persistence forecast are also calculated. Persistence is defined as the error that results if the index forecast is set to the initial value (i.e. set at the time of forecast initialization) throughout the forecast period. If the model forecast skill cannot exceed the persistence skill, then it has no merit.

Predictability horizon and drift correction
The predictability horizons (lead time in months) for the indices from the drift corrected and the raw model outputs are shown in figure 2(a). Drift correction leads to clear improvements for the full-field initialized models but improvements are not so evident in case of anomaly-initialized models, except for EMI. The indices in the tropical Pacific and the Western Indian Ocean have higher predictability horizons than other indices. For the ENSO related indices and WPI, MIROC5 shows the highest predictability horizon but shows no significant predictability, even at short lead times, for the DMI, EPI, II and TSI indices. The highest predictability horizons are noted for EMI, with an average of 13 months (range 9-23 months), and WPI, with an average of 12 months (range 10-16 months). EMI is discussed in further detail in section 4.2. The MME shows the highest predictability for WPI at 25 months. The Indian Ocean DMI has the poorest predictability horizon with a maximum of only three months.
We also estimated prediction lead times using a correlation analysis between the monthly observed and ensemble mean indices at each lag time (results for only WPI shown in figure 2(c)), which shows a similar trend as earlier with Pacific indices and the WPI showing higher skills than other indices while DMI shows the least, suggesting that our results are robust to the metric used. The predictability horizons shown in figure 2(a) are also broadly consistent with persistence forecast (where the forecast at all lead times is simply the initial value of that metric; see section 2.4). For a prediction to be useful, the associated error from the model should be smaller than both the 95% random error and the persistence forecast error. As an example, in figure 2(b), it can be seen that the HadCM3i2 Niño 3 prediction errors are lower than the persistence error for almost forty five months and lower than the MC climatology error for eight months supporting the predictability horizon obtained for this index.
Based on our analysis of the predictability horizons, the comparison of squared errors with persistence errors and correlations from this subset of models, no clear conclusion can be made regarding which type of initialization performs better. Indeed, previous studies (Magnusson et al 2012, Hazeleger et al 2013, Smith et al 2013 have not been able to identify a preferred initialization method for improving forecast quality at interannual to decadal timescales. Results from the two models (HadCM3i2 and CanCM4i1) initialized one and two years before the 1997 and 1982 El Niño events are shown in figures 3(a), (b), (e) and (f). Results from the other two models are given in the supplementary material (figure S1). HadCM3i2 and CanCM4i1 show considerable skill in capturing both the El Niño events 9-12 months in advance, in figures 3(a) and (b). At the peak of the 1997 (1982) El Niño, 9 (7) out of 10 ensemble members of HadCM3i2 (A) and 10 (8) out of 10 ensemble members of CanCM4i1 (F), have positive Niño 3.4 anomalies. Comparing with a binomial distribution for 10 ensembles, these correspond to 99 (90) percentile for HadCM3i2 (A) and 100 (95) percentile for CanCM4i1 (F) during the peak of 1997 (1982) El Niño. There is some indication that some models may provide useful information on the phase of ENSO at longer lags although the fraction of ensemble members agreeing might not be significant. For example, El Niño 1997(1982 has 6 (7) out of 10 ensemble members of HadCM3i2 (A) and 4 (7) out of 10 ensemble members of CanCM4i1 (F) showing positive Niño 3.4 anomalies as shown in figures 3(e) and (f), which corresponds to 62 (83) percentile for HadCM3i2 (A) and 38 (83) percentiles.

El Niño and La Niña prediction
Similar analyses are repeated for the La Niña events of 1988/89 (initializations 1987and 1988) and 1973/74 (initializations 1972and 1973HadCM3i3 and CanCM4i1 in figures 3(c), (d), (g) and (h)); MIROC5 and HadCM3i2 in figure S2). When initialized 9-12 months before the events, as in figures 3(c) and (d), 9(10) out of 10 ensemble members of HadCM3i3 (F) and 10 (10) out of 10 ensemble members of CanCM4i1 (F), have negative Niño 3.4 anomalies, at the peak of the 1988 (1973) La Niña. These correspond to a significance level of more than 99 percentile for all the cases. For longer lead times, as shown in figures 3(g) and (h), 7 (4) out of 10 ensemble members of HadCM3i3(F) and 6 (6) out of 10 ensemble members of CanCM4i1(F) show negative Niño 3.4 anomalies. As in the case of El Niño prediction at such lead times, the fraction of ensemble members predicting the phase of the event correctly is still insignificant.
Despite most ensemble members predicting the phase of ENSO correctly, majority of them under-predict the magnitude (severity) of the El Niño and La Niña events, with the situation becoming worse at larger lead times. However, the two full-filled initialized models (CanCM4i1 and HadCM3i3) clearly outperform the anomaly initialized models (MIROC5 and HadCM3i2) in predicting these events. The MME shows poorer skills compared to the individual models in simulating these anomalies and thus are not shown. While the numbers of models/events are too small to reliably infer that full-field initialization produces consistently superior results to anomaly initialization for ENSO forecasts, our findings are in agreement with previous studies that have suggested that full-field initialized models are more skillful at seasonal timescales (Magnusson et al 2012, Meehl et al 2014.

Discussion
This study evaluates the predictive skill of multiple SSTA climate indices using the decadal prediction experiments of four CMIP5 models. All models showed the highest predictability for ENSO related metrics and the Western Indian Ocean SSTA. Similar results were also obtained from comparing the model squared errors for each of the indices with their respective persistence errors and from the correlation coefficients. Our results also show the initialized models to have varying degrees of skills in predicting El Niño and La Niña events at 1 or 2 year lead times. Some specific points from the analysis are discussed in greater detail below.

Drift correction
We have demonstrated that drift correction is critical prior to using decadal simulation outputs in the case of full-field initialization models. However, for the anomaly-initialized models there is little to no benefit gained from applying a drift correction (for the metrics investigated). Drift correction techniques are imperfect remedies for these model biases and are often considered as an additional source of uncertainty, as they might neglect significant statistical relationships between the variables (Ho et al 2012). More sophisticated techniques for drift correction may be able to extract greater predictability. Some examples of improved drift correction techniques could include post processing approaches like applying a time-dependent trend adjustment after the simple time-dependent bias adjustment (Kharin et al 2012) or forming a covariate model of drift correction using the sub-surface ocean heat content as a metric, but as mentioned earlier these need to be scrutinized properly before application.

SSTA predictability
Predictability on multi-year timescales for Pacific SSTs (and PDO) was reported by Chikamoto et al (2012) using the MIROC climate model. Mehta et al (2013) studied decadal predictability of SST from four Earth System Models and reported highest skill from MIROC5 and low and insignificant skills from HadCM3 and CCSM4 in the tropical Pacific. Chikamoto et al (2015) reported ENSO related prediction skill to decrease rapidly after lead times of one year from the MIROC5 model. Our results are consistent with these findings that MIROC5 has the highest predictability in the tropical Pacific. However, it shows limited skill in terms of simulating large climate anomalies like El Niños and La Niñas.
The significantly high predictability horizons noticed for EMI from CanCM4 (i2) and CCSM4 (i2) are attributed to issues from sampling variability resulting from the fact that these models have runs with start dates every 5 years and thus have only 9 set of decades. Also, given that EMI is a function of SSTA averaged across different parts of the Pacific Ocean, these sampling issues might get largely amplified. These sampling issues regarding initializations every year and initializations every five years requires thorough analysis and would be the subject of a different study.
4.3. Implications for regional rainfall Chiew et al (1998) and many others suggest significant link between spring and summer rainfall over Northeast Australia with ENSO and winter rainfall over Northwest Australia with indices in the Indian Ocean. To investigate the implication of index predictability on regional rainfall prediction, we calculate correlations between area-averaged rainfall and the corresponding drift-corrected SSTA index forecast at lead times of 10-12 (6-8) and 22-24 (18-20) months for the spring-summer (winter) rainfall values. The concurrent correlation between the observed index and rainfall is also shown for comparison. Rainfall data is from the Australian Bureau of Meteorology monthly gridded rainfall dataset (Jones et al 2009). Figure 4 shows the correlations between area averaged Oct-Nov-Dec (OND) rainfall over Queensland with the concurrent observed and drift-corrected Pacific SSTA indices, and Jun-Jul-Aug (JJA) rainfall over North Western Australia with the concurrent observed and drift-corrected Indian Ocean SSTA indices within lead times extending to a maximum of one and two years. Only models with annual initializations are considered so that there are enough samples to provide reliable correlation estimates. There are strong negative correlations between the observed Pacific indices and area-averaged OND rainfall over Queensland highlighting the relationship between ENSO and Northeast Australian spring-summer rainfall. All the models are able to reproduce this negative relationship, although the correlations become nonsignificant after a lead time of 10 months, which is consistent with the predictability horizons. For the Indian Ocean, there are strong observed positive correlations between area-averaged JJA rainfall over North Western Australia with EPI and II and negative correlations with DMI. The models do not consistently reproduce these relationships, suggesting that the Indian Ocean provides little useful predictable skill even at lead times less than a year.
Numerous studies have reported significant link between SSTA over the Western Indian Ocean and East African rainfall (Ummenhofer et al 2009b, Black 2005. Our study shows all models and the MME having high predictability horizons for WPI. This result along with the analysis for figure 4 show the potentiality of rainfall prediction using the drift corrected model forecasts of these SSTA indices at seasonal to annual timescales. However, this also presents the need of using improved methods of rainfall prediction for a practical application of the SSTA indices from the CMIP5 decadal experiments.

Conclusions
Forecasting rainfall at seasonal and longer timescales is of great societal importance. However, climate models have large biases in their representation of rainfall in part related to problems in the simulation of realistic teleconnections from climate drivers such as ENSO and regional rainfall. Thus instead of looking at simulated rainfall, we have examined predictions of SSTA indices which have significant relationships with regional rainfall. We find that while these models are unable to predict interannual variations beyond about a year, they have considerable skill at shorter timescales, similar to bespoke seasonal forecasting systems. There is a clear divide in the predictability of Pacific indices and WPI compared to indices in the Indian Ocean, which show little predictability. This is also evident in the strength of the relationships between rainfall in two regions of Australia with the observed and modeled Indian and Pacific Ocean indices. Based on only two events and two models for each initialization type, full-field initialized models seem to outperform the anomaly initialized ones in simulating ENSO related anomalies at annual timescales. The recommended drift correction has negligible effect on anomaly initialized models but is critical for full-field initialized models and improved drift correction techniques may provide further improvements. Future works will, therefore, focus on investigating such advanced drift correction techniques and estimating the skill of rainfall prediction at longer than seasonal timescales from the CMIP5 decadal experiments.