Skillful Seasonal Prediction of the Southern Annular Mode and Antarctic Ozone

Using a set of seasonal hindcast simulations produced by the Met Office Global Seasonal Forecast System, version 5 (GloSea5), significant predictability of the southern annular mode (SAM) is demonstrated during the austral spring. The correlation of the September–November mean SAM with observed values is 0.64, which is statistically significant at the 95% confidence level [confidence interval: (0.18, 0.92)], and is similar to that found recently for the North Atlantic Oscillation in the same system. Significant skill is also found in the prediction of the strength of the Antarctic stratospheric polar vortex at 1 month average lead times. Because of the observed strong correlation between interannual variability in the strength of the Antarctic stratospheric circulation and ozone concentrations, it is possible to make skillful predictions of Antarctic column ozone amounts. By studying the variation of forecast skill with time and height, it is shown that skillful predictions of the SAM are significantly influenced by stratospheric anomalies that descend with time and are coupled with the troposphere. This effect allows skillful statistical forecasts of the October mean SAM to be produced based only on midstratosphere anomalies on 1 August. Together, these results both demonstrate a significant advance in the skill of seasonal forecasts of the Southern Hemisphere and highlight the importance of accurate modeling and observation of the stratosphere in producing long-range forecasts.


Introduction
Accurate prediction of the atmospheric circulation several months in advance relies on the presence of low-frequency predictable signals in the climate system. It has now been demonstrated that the stratosphere is an important pathway for the communication of predictable tropical signals across the globe; in particular, El Niño-Southern Oscillation (ENSO) (Bell et al. 2009; Ineson and Scaife 2009; Hurwitz et al. 2011), the quasi-biennial oscillation (QBO) (Marshall and Scaife 2009; Garfinkel and Hartmann 2011), and the 11-yr solar cycle (Haigh 2003; Gray et al. 2013). These teleconnections allow for the possibility of significant predictability in regions remote from the direct effect of the signal. Despite this, many operational seasonal forecast models include only a poor representation of the stratosphere (Maycock et al. 2011), and it has been suggested that this contributes to their lack of seasonal forecast skill in the extratropics (Smith et al. 2012).
Furthermore, because stratospheric anomalies persist for longer than those in the troposphere and can influence surface weather patterns (e.g., Baldwin and Dunkerton 2001), the initial conditions of the stratosphere itself can act as a source of enhanced predictability (Charlton et al. 2003; Hardiman et al. 2011). The effect of the stratosphere on the troposphere is especially pronounced following a rapid midwinter breakdown of the strong westerly stratospheric polar vortex [known as a sudden stratospheric warming (SSW)], and past work has focused on the influence of these events on forecast skill (Kuroda 2008; Sigmond et al. 2013). However, SSWs are highly nonlinear events that are currently not predictable beyond about two weeks in advance (Marshall and Scaife 2010), limiting their usefulness in seasonal prediction. SSWs also occur almost exclusively in the Northern Hemisphere (NH), with only one event in the approximately 60-yr record having been observed in the Southern Hemisphere (SH), in September 2002 (Roscoe et al. 2005).
The rarity of SSWs in the SH is a result of less dynamical forcing from vertically propagating planetary waves in the SH relative to the NH stratosphere. This, in turn, comes about because of lesser SH orography and land-sea temperature contrasts, which can excite planetary waves. This reduced variability also means that anomalies in the Antarctic stratosphere persist for longer than those in the Arctic (Simpson et al. 2011). Hence, the SH stratospheric circulation may be predictable on longer time scales, and thus more useful for seasonal forecasts despite the lack of SSWs. Indeed, Thompson et al. (2005) and Son et al. (2013) have found that smaller-amplitude variations in the Antarctic stratospheric polar vortex are followed by coherent temperature and pressure anomalies at Earth's surface that resemble the southern annular mode (SAM) pattern. These observations led Roff et al. (2011) to find that improved forecasts of the SAM up to 30 days ahead may be achieved with a stratosphere-resolving model. The SAM is the dominant mode of variability of the extratropical Southern Hemisphere sea level pressure and affects the position of storm tracks, rainfall, surface air temperature, and ocean temperatures across the extratropics (e.g., Silvestri and Vera 2003; Reason and Rouault 2005; Hendon et al. 2007). As such, there are considerable societal benefits and interest in its prediction (Lim et al. 2013).
Another reason for interest in the prediction of the Antarctic stratosphere is the interannual variability in springtime ozone depletion, which can significantly affect the amount of harmful ultraviolet radiation reaching Earth's surface over the Southern Hemisphere. The magnitude of this interannual variability is a significant fraction of the magnitude of long-term depletion caused by emission of chlorofluorocarbons (CFCs) and other ozone-depleting substances. While ozone-depleted air is confined over the polar region by the stratospheric polar vortex during winter and spring (resulting in the ozone hole), this air is released to midlatitudes following the ultimate breakdown of the vortex (final warming) in late spring/early summer. The extent of the resulting summertime ozone depletion is largely determined by the total deficit in ozone over the Antarctic during spring (Bodeker et al. 2005). Salby et al. (2012) have shown that interannual variations in Antarctic ozone depletion are highly correlated with changes in planetary wave forcing of the stratosphere. They found that the anomalous vertical Eliassen-Palm (EP) flux (a measure of meridional eddy heat flux) at 70 hPa poleward of 40°S during August-September explains almost all the interannual variance of anomalous ozone depletion during September-November. Using this relationship, they postulate that accurate prediction of planetary wave forcing could allow skillful seasonal forecasts of ozone depletion.
The influence of planetary wave forcing on ozone depletion comes about through both chemical and dynamical mechanisms. Planetary wave breaking causes an increase of the strength of the stratospheric residual mean meridional circulation (Haynes et al. 1991), with a resultant increase in large-scale descent and adiabatic warming over the pole. This warming inhibits the formation of polar stratospheric clouds, which have a vital role in the activation of halogen species that cause the chemical depletion of ozone. The increased meridional circulation, as well as an enhancement of horizontal two-way mixing caused by planetary wave breaking, also causes an increase in the dynamical transport of tropical ozone-rich air to the polar regions, further increasing ozone concentrations. Breaking planetary waves can also modify the geometry of the stratospheric polar vortex, stripping away elements of ozone-depleted air (Waugh et al. 1994), or in the extreme case of the 2002 SSW causing the ozone hole to split in two (Charlton et al. 2005).
Here, we address directly the influence of the stratosphere on springtime Antarctic seasonal forecast skill using a set of historical hindcasts (or historical reforecasts) from a new operational system with a fully stratosphere-resolving general circulation model. We find significant skill in the prediction of the Antarctic stratospheric polar vortex up to four months in advance, including for the 2002 SSW. Using the observed relationship between column ozone quantities and the stratospheric circulation, we are then able to infer skillful predictions of springtime ozone depletion, confirming the hypothesis of Salby et al. (2012). This exceeds the lead time of other contemporary ozone forecasts, which are typically no more than two weeks. The forecast system also shows significant levels of skill in the prediction of the surface SAM at seasonal lead times. By studying the variation of hindcast skill with time and height, we demonstrate that this skill is significantly influenced by the descent of predictable stratospheric circulation anomalies.

Seasonal forecast system
The analysis in this paper is based on results from a set of hindcast predictions produced by the Met Office Global Seasonal Forecast System, version 5 (GloSea5) (MacLachlan et al. 2014). This system is based upon the coupled Hadley Centre Global Environmental Model, version 3 (HadGEM3) (Hewitt et al. 2011), with an atmospheric resolution of 0.83° longitude × 0.56° latitude, 85 quasi-horizontal atmospheric levels, and an upper boundary at 85 km. The ocean resolution is 0.25° in longitude and latitude, with 75 quasi-horizontal levels. A 15-member ensemble of hindcasts was run for each year in the period 1996-2009. The hindcast length is approximately four months from three separate start dates spaced two weeks apart and centered on 1 August (25 July, 1 August, and 9 August), with five members initialized on each start date. Members initialized on the same start date differ only by stochastic parameterization of model physics (Tennant et al. 2011).
Initial conditions for the atmosphere and land surface were taken from the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim Re-Analysis (ERA-Interim) (Dee et al. 2011), and initial ocean and sea ice concentrations from the GloSea5 ocean and sea ice analysis, based on the Forecasting Ocean Assimilation Model (FOAM) data assimilation system (Blockley et al. 2013). The ERA-Interim data are linearly interpolated onto model levels between the surface and 64.56 km (near 0.1 hPa), and the 64.56-km values are then replicated onto the four subsequent levels up to 85 km. FOAM data are on the same grid as the ocean model. Beyond initialization the model takes no further observational data, and contains no flux corrections or relaxations to climatology. The model lacks interactive chemistry, and ozone concentrations are fixed to observed climatological values averaged over 1994-2005, including a seasonal cycle (Cionni et al. 2011). Scaife et al. (2014) have shown that this seasonal forecast system produces highly skillful forecasts of the North Atlantic Oscillation (NAO) during the Northern Hemisphere winter. The combined effects of ENSO, QBO, and sea ice teleconnections, as well as the increased ocean resolution that has improved the representation of Northern Hemisphere blocking events (Scaife et al. 2011), contribute to this skill.
Hindcast accuracy is verified by comparison with ERA-Interim (Dee et al. 2011). The ERA-Interim dataset has been demonstrated to have a realistic representation of the stratospheric meridional circulation (Seviour et al. 2012; Monge-Sanz et al. 2013). It also assimilates observations of ozone concentrations, and this assimilation has been demonstrated to be in close agreement with independent satellite data (Dragani 2011).

a. Stratospheric polar vortex
The climatology of Antarctic stratospheric polar vortex winds in the GloSea5 hindcasts is compared to the ERA-Interim climatology in Fig. 1. The strength of the stratospheric polar vortex is measured by the zonal-mean zonal wind (U) at 60°S and 10 hPa, which is approximately the center of the mean position of the vortex in the midstratosphere. The composite for the GloSea5 hindcasts is formed from all the individual ensemble members over 1996-2009 (a total of 210), while that from ERA-Interim is a composite of all years from 1979 to 2010 (a total of 32 years). It can be seen that the mean of the GloSea5 hindcasts agrees very closely with ERA-Interim throughout the spring, with only a slight bias toward weaker winds in August and September. The interquartile and 95th percentile ranges of GloSea5 and ERA-Interim also agree well, although the ERA-Interim values are noisier, as would be expected from a sample consisting of fewer years.
The GloSea5 hindcast predictions of interannual variability of the Antarctic stratospheric polar vortex winds are shown in Fig. 2a. Anomalies are defined from the relevant climatology of either GloSea5 or ERA-Interim. For GloSea5, this climatology is calculated from the mean of each day across all ensemble members in all years, whereas for ERA-Interim the climatology is the mean of each day, smoothed with a 30-day running mean (in order to account for its increased noise resulting from the reduced sample size). Results are shown for September-November (SON) averages, corresponding to a 1 month average lead time. The correlation between the GloSea5 ensemble mean and ERA-Interim is 0.73, which is statistically significantly different from zero at the 99% confidence level, and has a 95% confidence interval of (0.37, 0.90). This correlation does not depend strongly on particular years; the correlation remains significant at the 95% level (r = 0.57) if the year 2002 (which has the greatest anomaly) is excluded. Significance is calculated using a two-tailed bootstrap test, whereby the percentile of the observed correlation is calculated from the distribution of correlations of a large number (~10 000) of pairs of time series formed by resampling with replacement from the original time series. These significance tests make fewer assumptions about the underlying structure of the data than parametric tests (Wilks 2006) and are used throughout this study.
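As an illustration, the bootstrap significance test described above can be sketched in a few lines of Python. This is a minimal sketch and not the code used for this study: the function names `pearson` and `bootstrap_p_value` are hypothetical, and it is assumed that the null distribution is built by resampling each time series independently with replacement, and that the two-tailed test compares absolute values of the correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # degenerate resample (all values equal): treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def bootstrap_p_value(x, y, n_resamples=10000, seed=0):
    """Return (observed r, two-tailed p-value).

    Assumption: each series is resampled independently with replacement,
    which destroys the year-to-year pairing and so approximates the
    distribution of correlations expected by chance.
    """
    rng = random.Random(seed)
    n = len(x)
    r_obs = pearson(x, y)
    exceed = 0
    for _ in range(n_resamples):
        xs = [x[rng.randrange(n)] for _ in range(n)]
        ys = [y[rng.randrange(n)] for _ in range(n)]
        if abs(pearson(xs, ys)) >= abs(r_obs):
            exceed += 1
    return r_obs, exceed / n_resamples
```

Calling `bootstrap_p_value(ensemble_mean, reanalysis)` on two 14-yr series of SON mean anomalies would return the correlation and the fraction of chance correlations at least as large in magnitude; a fraction below 0.01 corresponds to significance at the 99% level under the stated assumptions.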
The skill shown in Fig. 2a cannot be accounted for by persistence of initial anomalies. In fact, there is a negative correlation between U on 9 August, when the last ensemble member is initialized, and the SON mean (r = −0.54). Hence, a persistence forecast would be negatively correlated with observed values. This relationship may be consistent with ideas of a preconditioning of the polar vortex (e.g., McIntyre and Palmer 1983). The standard deviation of all GloSea5 ensemble members is 7.5 m s⁻¹ and that of ERA-Interim is 9.7 m s⁻¹, indicating that the GloSea5 ensemble spread may be too small. However, there are large uncertainties in these values due to the short hindcast period and the large 2002 anomaly. Following Charlton and Polvani (2007), SSWs are defined as a temporary reversal of U at 60°S and 10 hPa, occurring before the final transition to summer easterlies (final warming). Under this definition, one SSW event was simulated in the GloSea5 hindcasts, in 2002. A similar magnitude event (in terms of departure from climatology) occurred in a 1997 ensemble member, although U did not quite become easterly. Time series of stratospheric polar vortex winds for these two events are shown in Fig. 1a along with the observed 2002 SSW in Fig. 1b. It can also be seen in Fig. 2a that 2002 has the most anomalous stratospheric polar vortex in the GloSea5 hindcasts, with 14 of 15 ensemble members simulating negative anomalies, and the most negative ensemble mean. It is therefore possible that an increased likelihood of the 2002 event was to some degree detectable about two months in advance, although it has not been determined whether this predictability comes from a preconditioning of the vortex, as suggested by Scaife et al. (2005), or is the result of external forcing.
Both the SSW events simulated by GloSea5 were vortex displacement events, in contrast to the vortex splitting event that occurred in 2002 (Charlton et al. 2005). This is demonstrated in Fig. 3, which shows geopotential height in the midstratosphere at the date of minimum U at 60°S and 10 hPa, for the two simulated events in GloSea5 and the observed event in ERA-Interim. The distinction between splitting and displacement SSW events is important because it has been observed that tropospheric anomalies are greater following vortex splitting events, at least in the Northern Hemisphere (Nakagawa and Yamazaki 2006; Mitchell et al. 2013).
The timing of the final warming of the stratospheric polar vortex also has a significant effect on stratospheric temperature and ozone concentrations (Yamazaki 1987), as well as on the coupling of the stratosphere to the troposphere (Black and McDaniel 2007). The predictability of these events was investigated in GloSea5, but not found to be highly significant. This is probably because the mean timing of the final warming is toward the end of the 4-month hindcast simulation (around 20 November at 10 hPa), and the final warming does not occur before the end of the hindcast for some ensemble members, thereby introducing a bias in the mean.

b. Ozone depletion
GloSea5 does not include interactive ozone chemistry, so, in order to make ozone forecasts, concentrations must be inferred from other meteorological variables. Total ozone quantities over the Antarctic polar cap have been found to be highly correlated with vertical EP flux poleward of 40°S (Weber et al. 2011; Salby et al. 2012). EP flux diagnostics are not routinely produced directly by operational seasonal forecast systems and require high-frequency output at high spatial resolution to calculate. However, vertical EP flux dominates variability of the stratospheric polar vortex, so it may be possible to use the strength of the vortex to infer ozone quantities.
SON mean total column ozone quantities, averaged with area weighting over the polar cap (60°-90°S), are shown in Fig. 4a for ERA-Interim and the Total Ozone Mapping Spectrometer (TOMS) satellite instrument (Kroon et al. 2008). ERA-Interim data are highly correlated with TOMS, verifying the accuracy of ERA-Interim against direct satellite measurements (TOMS values are slightly higher than ERA-Interim; this is probably because TOMS cannot make observations during the polar night). The long-term trend in polar cap total column ozone is calculated by fitting a second-order polynomial to the data. This long-term trend is due to changes in concentrations of CFCs and other ozone-depleting substances, and is largely unrelated to dynamical variability. On the other hand, shorter-term interannual changes are strongly related to dynamical variability. In Fig. 4b anomalies of polar cap total column ozone from the long-term trend are plotted against anomalies of the SON mean U at 60°S and 10 hPa. It can be seen that these two quantities are highly correlated (r = −0.92), meaning that polar vortex variability explains approximately 85% of the variance of polar cap total column ozone anomalies. This strong correlation makes it possible to use GloSea5 forecasts of polar vortex winds to produce inferred predictions of polar cap total column ozone quantities. This is carried out by a leave-one-out cross-validation procedure (Wilks 2006); the linear regression of ERA-Interim ozone and U anomalies for all years 1979-2009 except the hindcast year is used to produce the hindcast for each ensemble member. Thus no information from the hindcast year enters the hindcast itself. Figure 2b shows the GloSea5 ozone hindcasts along with the assimilated values from ERA-Interim. The correlation between the GloSea5 ensemble mean and ERA-Interim is 0.73, which is statistically significant at the 99% level, and has a 95% confidence interval of (0.38, 0.91). Errors from the regression in Fig. 2b for the inferred ozone quantities for each ensemble member are small compared to the spread between ensemble members, and so are not plotted in this figure.
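The leave-one-out procedure used to infer ozone from the forecast vortex winds can be sketched as follows. This is a minimal illustration under stated assumptions and not the code used for this study: the function names `fit_line` and `loo_inferred_ozone` and the data layout (a dictionary mapping each hindcast year to a list of ensemble member wind anomalies) are hypothetical, and an ordinary least-squares fit is assumed for the linear regression.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def loo_inferred_ozone(years, u_obs, o3_obs, u_forecast):
    """Infer ozone anomalies from forecast wind anomalies.

    years, u_obs, o3_obs: observed (reanalysis) years and the matching
        wind and ozone anomalies, in the same order.
    u_forecast: dict mapping hindcast year -> list of member wind anomalies
        (hypothetical layout chosen for this sketch).
    For each hindcast year the regression is fitted on every observed year
    except that one, so the target year never informs its own hindcast.
    """
    inferred = {}
    for target, members in u_forecast.items():
        keep = [i for i, yr in enumerate(years) if yr != target]
        slope, intercept = fit_line([u_obs[i] for i in keep],
                                    [o3_obs[i] for i in keep])
        inferred[target] = [slope * u + intercept for u in members]
    return inferred
```

Because the regression is refitted for each target year, the slope and intercept vary slightly from year to year; with a correlation as strong as that in Fig. 4b this variation would be expected to be small.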

c. Southern annular mode
The SAM index in both GloSea5 and ERA-Interim is defined as the difference between the normalized anomalies of zonally averaged mean sea level pressure at 40° and 65°S (Gong and Wang 1999). These anomalies are calculated from the respective climatologies of GloSea5 and ERA-Interim. The ERA-Interim SAM index calculated in this way is also highly correlated with other measures of the SAM, such as the station-based index of Marshall (2003). The GloSea5 hindcast skill for the prediction of the seasonal (SON) mean SAM index is shown in Fig. 5. The correlation of the GloSea5 ensemble mean and ERA-Interim is 0.64, which is statistically significant at the 95% level, and has a 95% confidence interval of (0.18, 0.92), confirming skillful prediction of the SAM at 1-month average lead times. This is similar to the value for the December-February (DJF) NAO correlation skill of 0.62 found by Scaife et al. (2014) in the same seasonal forecast system. The 1-yr lag autocorrelation of the SON mean SAM is negative (r = −0.36), and accounting for this by sampling pairs of consecutive years in the bootstrap test leads to a narrower confidence interval than presented above. The variability of the SAM simulated by GloSea5 is broadly realistic with a standard deviation of all ensemble members of 0.98 compared to 0.90 in ERA-Interim over the same period.
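The index definition above translates directly into a short calculation. The sketch below is illustrative only: the function names `normalize` and `sam_index` are hypothetical, the inputs are assumed to be seasonal mean, zonally averaged pressures (one value per year) at each latitude, and the population standard deviation is assumed for the normalization.

```python
import statistics


def normalize(series):
    """Anomalies from the series mean, divided by the standard deviation."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)  # assumption: population standard deviation
    return [(v - mean) / sd for v in series]


def sam_index(mslp_40s, mslp_65s):
    """SAM index: normalized anomaly at 40S minus normalized anomaly at 65S.

    mslp_40s, mslp_65s: zonally averaged mean sea level pressure at each
    latitude, one value per year, in the same year order.
    """
    p40 = normalize(mslp_40s)
    p65 = normalize(mslp_65s)
    return [a - b for a, b in zip(p40, p65)]
```

A positive index therefore corresponds to anomalously high pressure at 40°S together with anomalously low pressure at 65°S. For hindcasts, the mean and standard deviation would be taken from the model climatology rather than from a single series, as stated in the text.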
The SAM is strongly related to surface temperatures over much of the SH extratropics. Figure 6a shows the correlation of the SON mean SAM from ERA-Interim over 1996-2009 with SON mean gridded station-based surface temperature data from the Hadley Centre Climate Research Unit temperature dataset, version 4 (HadCRUT4) (Morice et al. 2012). The HadCRUT4 dataset has been chosen to demonstrate the relationship between the SAM and surface temperature because of the scarcity of temperature observations in the Southern Hemisphere, meaning that reanalysis data are poorly constrained in many regions. The same relationship between surface temperatures and the SAM is shown for the GloSea5 ensemble mean in Fig. 6b. Many of the observed correlations are reproduced in the hindcasts, such as the opposite-signed correlations over eastern Antarctica and the Antarctic Peninsula/Patagonia, as well as between eastern Australia and New Zealand. These results are in agreement with Gillett et al. (2006), who analyzed the temperature patterns associated with the SAM over the longer observational record of 1957-2005.
The GloSea5 ensemble mean SON surface temperature correlation with HadCRUT4 is shown in Fig. 6c. Also highlighted (black circles) are the points with the strongest observed correlations with the SAM (|r| > 0.5). Regions of significant positive correlations are found over eastern Antarctica, Patagonia, New Zealand, and eastern Australia. These are regions that also have a strong correlation with the SAM, indicating that the significant surface temperature skill is related to skill in prediction of the SAM. On the other hand, there are also some significant negative correlations in subtropical regions, which may indicate a model bias in the temperature pattern associated with the SAM in these regions.

d. Stratosphere-troposphere coupling
It is now investigated whether the statistically significant skill in hindcasts of the stratospheric polar vortex affects that of the surface SAM. Forecast skill as a function of lead time and height is studied for polar cap (60°-90°S) mean geopotential height anomalies (Z′). Figure 7a shows the correlation of Z′ in ERA-Interim with the GloSea5 ensemble mean hindcast values. Values are smoothed with a 30-day running mean before correlations are calculated, and plotted such that values for 15 September represent the correlation of the ERA-Interim and GloSea5 ensemble mean September mean values (without this smoothing, there are noisier but still significant correlations in a similar pattern). Between 1 and 9 August the ensemble mean is taken as the average of the 10 initialized ensemble members, and the average of all 15 ensemble members is used after this date.
FIG. 5. SON mean southern annular mode (SAM) index in individual GloSea5 hindcast ensemble members (dots), ensemble mean (dashed green curve), and ERA-Interim (solid green curve). The SAM is calculated from mean sea level pressure data, and hindcasts initialized near 1 August. The correlation of the ensemble mean and ERA-Interim values is 0.64, which is statistically significant at the 95% level.
As would be expected from the initialization of GloSea5 from ERA-Interim data, correlations are high in both the troposphere and the stratosphere for the August mean because of predictability on weather time scales. However, tropospheric and lower-stratospheric skill rapidly decays and becomes statistically insignificant throughout September. In contrast, stratospheric correlations remain statistically significant throughout the hindcast simulation, and as high as 0.8 until mid-October (corresponding to a 2-month lead time).
Importantly, the region of high levels of stratospheric skill descends with time and is present at the tropopause at the same time as a reemergence of significant tropospheric skill in mid-October. This reemergence cannot be accounted for by the persistence of tropospheric anomalies, so must be the result of the effect of another predictable signal on the extratropical tropospheric circulation. An obvious candidate for such a signal is the polar stratosphere, since this remains predictable throughout the hindcast period. The reemergence of tropospheric skill also occurs at the same time as the strongest observed coupling between the stratosphere and troposphere found in other studies (e.g., Thompson et al. 2005; Simpson et al. 2011).
To determine the stratospheric influence on tropospheric skill, a simple statistical forecast model is formed, which has as its only input the initial conditions of the Antarctic stratosphere. A leave-one-out cross-validation procedure is employed; ERA-Interim values are used to calculate the linear regression of Z′ at 10 hPa on 1 August with Z′ at all other times and heights from 31 of the 32 years from 1979 to 2010. This regression is then used to produce a hindcast of the 32nd year based on its Z′ at 10 hPa on 1 August. The method ensures that no information from the hindcast year enters the model. The process is then repeated to make hindcasts of all 32 years. Figure 7b shows the correlation of 30-day running means of these statistical hindcasts with ERA-Interim values. As might be expected, skill is initially high in the midstratosphere but not the troposphere. As with the GloSea5 hindcasts, the region of high skill descends with time, and statistically significant correlations reemerge in the troposphere throughout October. This demonstrates that skillful forecasts of the Antarctic troposphere during October can be produced based only on knowledge of Z′ in the midstratosphere on 1 August. It also suggests that the reemergence of tropospheric skill in the GloSea5 hindcasts in October is likely to be caused by predictable stratospheric anomalies that descend with time.
However, it is also possible that a third factor influences both the 1 August stratosphere and the October and November troposphere. ENSO may be such a factor, since it has been shown to influence both the surface SAM (Lim et al. 2013) and the polar stratosphere (Hurwitz et al. 2011). The influence of ENSO is therefore assessed using the same leave-one-out cross-validation procedure, and shown in Fig. 7c. The input to the statistical model is the July mean Niño-4 index (sea surface temperatures averaged over 5°S-5°N, 160°E-150°W) from the Hadley Centre Global Sea Ice and Sea Surface Temperature dataset, version 1 (HadISST1) (Rayner et al. 2003). Similar results are obtained using the July mean Niño-3.4 index or Southern Oscillation index. The Niño-4 index-based statistical hindcasts show some significant tropospheric correlations around 1 September and in November but not during October. Hence, ENSO cannot account for the October reemergence of tropospheric skill in the GloSea5 hindcasts, at least in this statistical model.
Importantly, the longer 32-yr period of ERA-Interim [rather than the 14-yr (1996-2009) period of the GloSea5 hindcasts] is used for the statistical analysis presented in Figs. 7b and 7c. The correlations of both the 1 August Z′ at 10 hPa and the July mean Niño-4 index with the SON SAM are not statistically significantly different during 1996-2009 compared with 1979-2010. This was tested using a bootstrap test, which correlates subsets of 14 years from the (detrended) 32 years. Hence correlations found for the shorter period are deemed to be a marginal distribution of those over the longer period, so a more robust measure of sources of predictability can be obtained by studying the longer observational record.
Similar features are seen if the statistical hindcasts are repeated using the shorter period, although tropospheric skill from the polar vortex emerges later (in November), and that from Niño-4 earlier (in October; not shown). These statistical hindcasts also show lower skill than the GloSea5 hindcasts at almost all times in both the troposphere and stratosphere, which may indicate the importance of nonlinearities or the influence of other external factors that can be captured by the full dynamical model.

Discussion
We have demonstrated that Antarctic total column ozone amounts are predictable up to four months in advance during the austral spring, even with a model that lacks interactive chemistry. While using such a model has the advantage of being less computationally expensive than a chemistry-climate model, there are also some drawbacks. Primarily, the model will not be able to simulate zonal asymmetries in ozone concentrations and their influence on the stratospheric circulation or the feedback between ozone concentrations and stratospheric temperatures. Both these factors have been shown to be important in driving long-term trends in the SAM as a result of ozone depletion (Thompson and Solomon 2002; Crook et al. 2008; Waugh et al. 2009).
Perhaps more relevant for seasonal forecasts is the fact that we have not been able to determine whether the observed strong correlation between the stratospheric circulation and Antarctic ozone concentrations is dominated by a chemical or dynamical mechanism. If the relationship is dominated by a chemical mechanism, whereby enhanced descent over the pole inhibits the activation of ozone-depleting substances, we would expect the correlation to weaken as concentrations of these substances return to preindustrial levels. Accurate forecasts of ozone with models lacking interactive chemistry would then not be possible. On the other hand, if the mechanism is largely dynamical, whereby transport of ozone-rich air from the tropics is the important factor, we would not expect the relationship to change in time. Although a study to distinguish these mechanisms has been carried out for chemistry-climate models, it has not been possible to do so in observations. In either case, we do not expect the relationship to break down soon, as concentrations of ozone-depleting substances are not projected to return to 1980 levels until the late twenty-first century (WMO 2011).
The correlation skill of 0.64 [95% confidence interval: (0.18, 0.92)] for the SON mean SAM in the GloSea5 hindcasts is greater than but not inconsistent with that found by Lim et al. (2013). They report a correlation of 0.40 for the SON mean SAM from 1 August initialized forecasts over 1981-2010 using the Predictive Ocean and Atmosphere Model for Australia, version 2 (POAMA2). Over the comparable period of 1996-2009, they find a correlation of 0.54 (H. Hendon 2014, personal communication). Significantly, POAMA2 has only two model levels in the stratosphere, and so may be unable to simulate the stratosphere-troposphere coupling described here. Lim et al. (2013) attribute their results to the influence of ENSO through a tropospheric teleconnection. This is not inconsistent with our result shown in Fig. 7c, since we find significant tropospheric predictability from ENSO during November, the same time that Lim et al. (2013) find the strongest correlation between ENSO and the SAM. The lack of discrepancy between these two systems despite their different stratospheric resolutions may be a result of the ENSO-SAM connection being too weak in GloSea5, or simply that the relatively short hindcast period used here prevents a statistically significant difference being detected.
Despite this significant correlation skill in hindcasts of the SAM, it is clear from Fig. 6 that the standard deviation of the GloSea5 ensemble mean SAM is much less than that of observations. The signal-to-noise ratio (ratio of the standard deviation of the ensemble mean to that of all ensemble members) is just 0.4. For a "perfect" forecast system (one in which observations are indistinguishable from an ensemble member), the signal-to-noise ratio and correlation are directly related (Kumar 2009), so that the expected correlation would be just 0.3. The actual correlation is greater than this because the average correlation between ensemble members and observations is much greater than that between pairs of ensemble members. A similar but smaller difference is also found for the stratospheric polar vortex forecasts, and this is also observed by Scaife et al. (2014) for the NAO in the same system. These results mean that individual ensemble members have a smaller predictable signal than observations.
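The diagnostics discussed above can be sketched in Python as follows. This is only an illustration: the array layout (years by members), the function names, and the synthetic data are assumptions and are not taken from GloSea5 or from the analysis code used for this study. The signal-to-noise ratio follows the definition given in the text (standard deviation of the ensemble mean divided by that of all ensemble members).

```python
import numpy as np

def signal_to_noise(ens):
    """Ratio of the std of the ensemble mean to the std of all members.

    ens: array of shape (n_years, n_members) (assumed layout).
    """
    ens_mean = ens.mean(axis=1)
    return ens_mean.std() / ens.std()

def mean_member_obs_corr(ens, obs):
    """Average correlation between individual members and observations."""
    n_members = ens.shape[1]
    corrs = [np.corrcoef(ens[:, m], obs)[0, 1] for m in range(n_members)]
    return float(np.mean(corrs))

def mean_member_pair_corr(ens):
    """Average correlation over all distinct pairs of ensemble members."""
    c = np.corrcoef(ens.T)                 # (n_members, n_members)
    upper = np.triu_indices_from(c, k=1)   # each pair counted once
    return float(c[upper].mean())

# Synthetic data only: 14 years, 15 members, with a weak shared signal.
rng = np.random.default_rng(0)
signal = rng.standard_normal(14)
ens = 0.4 * signal[:, None] + rng.standard_normal((14, 15))
obs = signal + 0.5 * rng.standard_normal(14)

print("signal-to-noise:", signal_to_noise(ens))
print("<r_mo>:", mean_member_obs_corr(ens, obs))
print("<r_mm>:", mean_member_pair_corr(ens))
```

In this synthetic setup the observations are deliberately given a larger predictable component than the members, so the member-observation correlation tends to exceed the member-member correlation, which is the situation described in the text.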
Given this result, it might be expected that more skillful predictions could be obtained with a larger ensemble size. To illustrate the variation of hindcast skill with ensemble size we systematically sample smaller sets of forecasts from the full 15 members for each year, following the method of Scaife et al. (2014). This is repeated many times and an average value for a given sample size calculated. This variation of correlation skill with ensemble size for both the SON mean SAM and stratospheric polar vortex winds is shown in Fig. 8. These curves closely follow the theoretical relationship of Murphy (1990), which relies only on the mean correlation between pairs of ensemble members $\langle r_{mm} \rangle$ and the mean correlation between individual ensemble members and observations $\langle r_{mo} \rangle$, given by
$$ r = \frac{\langle r_{mo} \rangle \sqrt{n}}{\sqrt{1 + (n - 1)\langle r_{mm} \rangle}}, $$
where $r$ is the ensemble mean correlation, and $n$ is the ensemble size. These curves are shown in Fig. 8, along with their asymptote for an infinite sized ensemble. Although the stratospheric forecasts cannot be greatly improved with a larger ensemble size in the current system, greater correlation scores of the SAM could be achieved with an ensemble size near 30. Although the large uncertainty range does not allow a strong statement about potential predictability, the asymptote near 0.8 is similar to that found by Scaife et al. (2014) using a longer hindcast and greater ensemble size for the DJF NAO.
The dynamics of other seasons are different to those of the austral spring, so results presented here do not imply significant skill in prediction of the SAM at other times. For instance, Shaw et al. (2010) find that a favorable state for downward wave coupling between the stratosphere and troposphere is present only during September-December in the SH. Indeed, the 1-month lead time ensemble mean correlation of the DJF SAM with ERA-Interim is lower than that for SON at r = 0.39 [95% confidence interval: (0.15, 0.63)].
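The dependence of skill on ensemble size described above, both the theoretical relationship of Murphy (1990) and the subsampling of members, can be sketched in Python. This is a hedged illustration: the function names, the numerical inputs, and the synthetic data are hypothetical, and drawing an independent random subset of members for each year is an assumption about how the subsampling might be done, not a description of the procedure actually used. In the large-ensemble limit the Murphy (1990) expression tends to the mean member-observation correlation divided by the square root of the mean member-member correlation, which is what `murphy_asymptote` returns.

```python
import numpy as np

def murphy_correlation(r_mo, r_mm, n):
    """Ensemble mean correlation for ensemble size n (Murphy 1990 form).

    r_mo: mean correlation between members and observations.
    r_mm: mean correlation between pairs of members.
    """
    n = np.asarray(n, dtype=float)
    return r_mo * np.sqrt(n) / np.sqrt(1.0 + (n - 1.0) * r_mm)

def murphy_asymptote(r_mo, r_mm):
    """Limit of murphy_correlation as n tends to infinity."""
    return r_mo / np.sqrt(r_mm)

def subsampled_skill(ens, obs, size, n_samples=1000, seed=0):
    """Average correlation of a reduced-size ensemble mean with obs.

    ens: array (n_years, n_members); obs: array (n_years,).
    Assumption: a separate random subset of members is drawn for each
    year, without replacement, and this is repeated n_samples times.
    """
    rng = np.random.default_rng(seed)
    n_years, n_members = ens.shape
    corrs = np.empty(n_samples)
    for i in range(n_samples):
        idx = np.array([rng.choice(n_members, size=size, replace=False)
                        for _ in range(n_years)])
        sub_mean = np.take_along_axis(ens, idx, axis=1).mean(axis=1)
        corrs[i] = np.corrcoef(sub_mean, obs)[0, 1]
    return float(corrs.mean())

# Hypothetical inputs, chosen only to show the shape of the curve.
sizes = np.arange(1, 31)
curve = murphy_correlation(0.3, 0.1, sizes)
limit = murphy_asymptote(0.3, 0.1)

# Synthetic hindcast: 14 years, 15 members (not real model output).
rng = np.random.default_rng(0)
signal = rng.standard_normal(14)
ens = 0.4 * signal[:, None] + rng.standard_normal((14, 15))
obs = signal + 0.5 * rng.standard_normal(14)
empirical = [subsampled_skill(ens, obs, k, n_samples=200) for k in (1, 5, 15)]
```

Comparing `empirical` against `murphy_correlation` evaluated with the member-observation and member-member correlations of the same data is one way to check how closely a hindcast follows the theoretical curve.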

Conclusions
Using a set of seasonal hindcasts initialized at the start of the austral spring, we have demonstrated skillful prediction of the interannual variability of the Antarctic stratospheric polar vortex at seasonal lead times. This includes capturing an increased likelihood of the 2002 SSW, which is the most extreme year in the ensemble mean and has the only ensemble member in 14 years that simulates an SSW (although another is close to simulating an SSW in 1997). Because this variability is observed to be closely correlated with Antarctic column ozone amounts, we are able to perform skillful predictions of interannual variability in Antarctic ozone depletion.
We also find significant skill in hindcasts of the spring mean SAM index. By studying the variation of this skill with time and height, we suggest that this skill is influenced by stratospheric anomalies that descend with time and are coupled with the troposphere in October and November. In fact, the influence of the stratosphere is such that skillful statistical predictions of the October SAM can be made using only information from 1 August in the midstratosphere.
FIG. 8. GloSea5 ensemble mean correlation with ERA-Interim as a function of ensemble size for the SON mean U at 10 hPa and 60°S and SON mean SAM (thick lines). A theoretical estimate of the variation of correlation with ensemble size is shown in each case (thin solid lines), along with its asymptote for an infinite sized ensemble (dashed lines). Error bars represent the 95% uncertainty range for the correlation of the full 15-member ensemble, calculated using a bootstrap test.
Assuming that the 14-yr period studied here is representative of future years, these results suggest that it may now be possible to make skillful seasonal forecasts of interannual variations in springtime ozone depletion and large-scale weather patterns across the Southern Hemisphere.