The tropical influence on sub‐seasonal predictability of wintertime stratosphere and stratosphere–troposphere coupling

A unique set of relaxation experiments with a forecast model initialized during December–January 1999–2019 is used to explore tropical influence on the Northern Hemisphere polar stratosphere and stratosphere–troposphere coupling, to quantify predictability benefits due to the perfect knowledge of the tropical variability. On average, predictability of the polar stratosphere, as represented by the 50 hPa geopotential height anomalies north of 50°N (Z50), increases from 17 days in freely running (control) forecasts to 21 days in the tropical relaxation experiments. At sub‐seasonal time‐scales, a statistically significant improvement in weekly mean skill scores can be demonstrated in 14%–20% of individual forecast ensembles, mostly in cases when the skill of the corresponding control forecast is worse than average. In these forecasts, root‐mean‐square errors and forecast spread of Z50 during forecast weeks 3–5 are decreased by 10%–15%. Stratospheric improvements are detected during periods of both vortex strengthening and vortex weakening, including most major Sudden Stratospheric Warmings that occurred during the study period, via modulation of the upward wave activity fluxes. An active Madden–Julian Oscillation (MJO) is found in most of these events with MJO phase 5–7 preceding vortex weakening and MJO phase 3–4 preceding vortex strengthening. Forecasts with improved stratospheric circulation also have improved tropospheric circulation during and after the periods when improvements are detected in the stratosphere. We attribute these improvements to both stratosphere–troposphere coupling and tropospheric tropical teleconnections.

In this study we aim to better understand the tropical influence on the stratosphere and on stratosphere-troposphere coupling, and the implications of this influence for extratropical predictability. To achieve this goal, we compare extended-range historical forecasts by a freely running operational model (control forecasts) with forecasts by the same model in which tropical atmospheric evolution is relaxed toward reanalysis. One can expect that the relaxation experiments would better predict extratropical evolution due to active tropical-extratropical interactions. The difference between the relaxation and control forecasts provides a measure of the tropical influence on the extratropics (Jung et al., 2010). It should be borne in mind that tropical evolution and tropical-extratropical interactions are already represented in the operational model. For example, Vitart (2017) showed that the operational models have skill in predicting the MJO up to 4 weeks in advance, and Vitart (2014) showed that the extratropical forecasts made during active MJO are more skilful than those made during non-active MJO. Therefore, the difference between the control and the relaxation experiments will demonstrate the level of additional skill that can be achieved due to perfect knowledge of the tropical evolution in addition to the skill already present in the model, but will not necessarily reveal all the episodes of active tropical-extratropical interactions. We also evaluate how this additional stratospheric skill can improve the tropospheric forecasts.
The article is organized as follows. Section 2 presents the data. Section 3 starts with a discussion of the forecast skill obtained in the stratosphere due to the tropical relaxation and then discusses the associated changes in the tropospheric skill. Section 4 summarizes the results.

DATA AND MODEL EXPERIMENTS
The European Centre for Medium-range Weather Forecasts (ECMWF) model version CY47R1 is used to produce a set of extended-range forecasts. The horizontal resolution of the atmospheric model is Tco319 (about 32 km). The model has 137 levels in the vertical. The experiments follow the practices employed at ECMWF, whereby extended-range forecasts are produced twice a week and each forecast is accompanied by a set of hindcasts initialized on the same calendar day as the operational forecast but during the preceding 20 winters. Here we use the hindcast ensembles corresponding to nine forecasts initialized between 12 December 2019 and 9 January 2020. This amounts to 180 hindcast ensembles (nine initialization dates in each of 20 winters), referred to as the control experiment (CTRL), covering the period from December 1999 to January 2019 with initialization dates on 12 December, 16 December, 19 December, 23 December, 26 December, 30 December, 2 January, 6 January and 9 January of each year. Each hindcast ensemble consists of 11 members.
Four relaxation experiments are initialized on the same dates as the control experiments. The details of the experiments are summarized in Table 1. In the first one (TROP-S), temperature, winds and humidity in the tropics are relaxed toward corresponding fields from ECMWF's ERA-5 reanalysis (Hersbach et al., 2020) with a relaxation time of 2 hr. Two more experiments (TROP-W and TROP-WT) are designed to test the sensitivity of the simulations to the strength of the relaxation. In TROP-W, only temperature and winds are relaxed, with a substantially longer relaxation time of 12 hr. In TROP-WT, the relaxation time is 12 hr for temperature and 36 hr for winds, and the relaxation is only applied in the troposphere. Although we cannot attribute the difference between TROP-W and TROP-WT results to the lack of the stratospheric relaxation in TROP-WT because of the different relaxation time-scales, the comparison between these experiments does add to the evidence that a better representation of the tropics improves extratropical forecast skill. Additionally, to test the importance of the stratospheric knowledge for tropospheric predictability, another experiment (STRAT), in which the stratosphere is relaxed globally toward ERA-5, is performed. In this experiment the relaxation is done above 50 hPa, that is, only mid- to upper-stratospheric winds and temperatures are relaxed, to guarantee that the relaxation does not directly affect the tropospheric skill. In this respect, STRAT differs from some other stratospheric relaxation experiments in which a relaxation was applied at lower stratospheric altitudes down to 70 hPa (Hitchcock & Simpson, 2014; Kautz et al., 2020) or even to 150 hPa (Huang et al., 2022).
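The effect of the different relaxation times can be illustrated with a simple Newtonian relaxation term, dx/dt = -(x - x_ref)/tau. This is only a sketch under that assumption: the exact formulation used in the ECMWF model, including any vertical or latitudinal tapering, is not described here, and the helper name `relax` and the 6 hr interval are illustrative choices.

```python
import math

def relax(x, x_ref, tau_hr, dt_hr):
    """Exact solution of dx/dt = -(x - x_ref) / tau after dt hours (hypothetical helper)."""
    return x_ref + (x - x_ref) * math.exp(-dt_hr / tau_hr)

# Fraction of an initial unit departure from the reference that survives 6 hr
# for the relaxation times quoted in the text (2, 12 and 36 hr).
for name, tau in [("TROP-S, 2 hr", 2.0),
                  ("TROP-W, 12 hr", 12.0),
                  ("TROP-WT winds, 36 hr", 36.0)]:
    print(f"{name}: {relax(1.0, 0.0, tau, 6.0):.2f}")
```

Under this assumed form, a shorter relaxation time removes departures from the reanalysis faster, which is the sense in which TROP-S has the strongest relaxation.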
We diagnose stratospheric forecast skill by analysing geopotential height and zonal wind fields at the 50 hPa pressure level. Note that the 10 hPa fields, which are often used for stratospheric analysis, are not available for this study. Tropospheric skill enhancement due to the stratosphere-troposphere coupling is assessed at the 500 hPa pressure level. Tropospheric influence on the stratosphere is assessed by analysing the eddy heat flux at 100 hPa. The MJO index in the forecasts is calculated following Vitart (2017), which in turn is based on the method of Wheeler and Hendon (2004).
Forecast anomalies are calculated with respect to the lead-time dependent climatology, calculated for each initialization date and each experiment separately as an average over the 20-year hindcast sets initialized on the same calendar date. In the reanalysis, the anomalies are calculated with respect to the 1999-2019 daily mean seasonally varying climatologies.
The spatial anomaly correlation coefficient (ACC), the root-mean-square error (RMSE), and the ensemble spread (ES), calculated as the standard deviation across ensemble members, are the three diagnostics used for forecast skill assessment (e.g., Wilks, 2006). Following established practices, we refer to the deviation between a forecast ensemble mean and the reanalysis as forecast error, recognizing that the deviation can be caused either by model deficiencies or by the lack of predictability. Both daily and weekly mean skill scores are analysed. The weeks are defined starting from the first forecast day, that is, days 1-7 constitute week 1, days 8-14 constitute week 2 and so on.
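The lead-time dependent climatology can be sketched as follows. This is an illustration on synthetic data, not the processing code used in the study; the array layout, the 46-day forecast length and the choice to average over ensemble members as well as years are assumptions.

```python
import numpy as np

def lead_dependent_anomalies(fc):
    """Remove the lead-time dependent climatology from a hindcast set.

    fc has shape (year, member, lead, ...) and holds the hindcasts for ONE
    initialization date and ONE experiment. The climatology is the mean over
    years and members at each lead time (averaging over members is an assumption).
    """
    clim = fc.mean(axis=(0, 1), keepdims=True)
    return fc - clim

# Synthetic example: 20 years, 11 members, 46 lead days, with a model drift
rng = np.random.default_rng(0)
fc = rng.normal(size=(20, 11, 46)) + np.linspace(0.0, 5.0, 46)
anom = lead_dependent_anomalies(fc)
```

Because the climatology is a function of lead time, any systematic drift of the model away from its initial state is removed together with the seasonal cycle.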
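A minimal sketch of the three diagnostics and of the weekly averaging is given below. The cosine-latitude area weighting, the uncentred form of the ACC and all function names are assumptions made for illustration; the text does not specify these details.

```python
import numpy as np

def acc(f, o, w):
    """Weighted, uncentred spatial anomaly correlation of forecast f and reference o."""
    return np.sum(w * f * o) / np.sqrt(np.sum(w * f**2) * np.sum(w * o**2))

def rmse(f, o, w):
    """Weighted root-mean-square difference between f and o."""
    return np.sqrt(np.sum(w * (f - o) ** 2) / np.sum(w))

def spread(members, w):
    """Ensemble spread: area-mean standard deviation across members (axis 0)."""
    var = members.var(axis=0, ddof=1)
    return np.sqrt(np.sum(w * var) / np.sum(w))

def weekly_means(daily):
    """Average days 1-7, 8-14, ... along axis 0; an incomplete last week is dropped."""
    n_weeks = daily.shape[0] // 7
    return daily[: n_weeks * 7].reshape(n_weeks, 7, *daily.shape[1:]).mean(axis=1)

# Synthetic anomalies on a coarse grid north of 50N
lat = np.linspace(50.0, 90.0, 9)
w = np.cos(np.deg2rad(lat))[:, None] * np.ones((9, 12))
rng = np.random.default_rng(1)
obs = rng.normal(size=(9, 12))
members = obs + rng.normal(scale=0.5, size=(11, 9, 12))
ens_mean = members.mean(axis=0)
print(acc(ens_mean, obs, w), rmse(ens_mean, obs, w), spread(members, w))
```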

Improvements in the stratospheric forecast skill
Comparison of the daily forecast skill metrics for the geopotential height at 50 hPa (Z50) north of 50°N between the experiments is shown in Figure 1. While the weekly mean forecasts have more value at sub-seasonal scales, the use of the daily metrics allows comparison of the predictability limits defined using an ACC threshold value; hence it is shown here. In CTRL, the predictability, defined as the last day when ACC is above 0.6, extends to day 17, that is, to week 3; however, the skill of individual forecast ensembles at sub-seasonal time-scales varies substantially (Figure 1a). The three tropical relaxation experiments show ACC skill improvements with respect to CTRL at most lead times, with the largest improvement being in TROP-S, which has the strongest relaxation. In this experiment, the mean ACC is above 0.6 until day 21. In TROP-W and TROP-WT the predictability extends until day 19. Defining the predictability as the last day with the mean ACC > 0.5 also shows an improvement with respect to CTRL in all experiments: CTRL 20 days, TROP-S 28 days, TROP-W 24 days and TROP-WT 23 days.
The tropical relaxation experiments also show a reduction in both RMSE and ES (Figure 1b,c). In TROP-S the reduction in both metrics is about 6%-8% (∼20 m) during weeks 3-5. In TROP-W and TROP-WT, where the relaxation is weaker, the reduction is about 2%-5% (5-15 m).
In STRAT, where the relaxation is applied above the 50 hPa level, the averaged ACC is above 0.9 at all lags and RMSE and ES are reduced by 50%-60% during weeks 3-5. Note that because of a relatively weak relaxation, STRAT does not always closely follow ERA-5, and in some individual forecast ensembles the weekly mean ensemble mean ACC for Z50 drops below 0.8 during weeks 3-5 (37 cases or 20% of all cases) and even below 0.6 (3 cases or 1.6% of all cases) (not shown).
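The threshold-based predictability limit used above amounts to finding the last forecast day on which the mean ACC exceeds a chosen value. A small sketch, with a hypothetical function name and a synthetic ACC curve rather than the values behind Figure 1:

```python
def predictability_limit(acc_by_day, threshold=0.6):
    """Return the last forecast day (1-based) with ACC above the threshold, or 0 if none."""
    last = 0
    for day, value in enumerate(acc_by_day, start=1):
        if value > threshold:
            last = day
    return last

# Synthetic, smoothly decaying mean ACC over 46 forecast days
acc_curve = [0.97 ** day for day in range(1, 47)]
print(predictability_limit(acc_curve, 0.6), predictability_limit(acc_curve, 0.5))
```

Lowering the threshold from 0.6 to 0.5 can only extend the limit, which is why both definitions are reported side by side.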
Is skill improvement in the relaxation experiments statistically significant? To answer this question, we first compare the individual pairs of relaxation and control forecasts and count the number of cases when the skill of the ensemble mean forecasts in the relaxation experiments exceeds that of the respective CTRL ensemble. Under the null hypothesis that the skills of the forecasts are equal, the number of such cases should be close to 90 (50%). Figure 2a,b shows that for the weekly mean ensemble mean ACC and RMSE skill scores the fraction of such cases varies between 60% and 70% at most lags, peaking during weeks 3 and 4. This result is inconsistent with the null hypothesis according to a binomial test at p = 0.05, implying that the tropical relaxation significantly improves the stratospheric forecasts in the Northern Hemisphere even in the experiments with a weak nudging and a non-nudged tropical stratosphere. As expected, the skill of STRAT forecasts is larger than that of CTRL in nearly all cases (>97%), with a few exceptions that occurred because of a relatively weak relaxation in STRAT.
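The binomial test can be sketched with the standard library alone. The one-sided form below is an assumption, since the text does not state whether a one- or two-sided test was used, and the counts are derived from the quoted 60% and 70% fractions of 180 forecast pairs.

```python
from math import comb

def binom_p_one_sided(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k 'wins' under the null."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n = 180  # number of relaxation/control forecast pairs
for frac in (0.6, 0.7):
    k = round(frac * n)
    print(k, binom_p_one_sided(k, n))
```

With 180 pairs, the null expectation is 90 wins with a standard deviation of about 6.7, so fractions of 60% or more lie well outside the range expected by chance.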
We next address the question of when the tropical influence on the skill of the extratropical stratospheric forecasts is more detectable. To do this we compare the mean skill of individual forecast ensemble members, not the skill of the ensemble mean forecasts as was done in the previous test. Specifically, we first calculate the skill metric for each forecast ensemble member, both for CTRL and the relaxation experiments, and then average the metrics across forecast members to obtain a mean value for each forecast ensemble. The benefit of such a test is that it allows identification of cases when the effect of the relaxation is significant in comparison to the ensemble spread. In other words, we detect cases when the tropical signal is large in comparison with the unpredictable stratospheric variability. The fraction of relaxation forecasts in which the mean skill of the individual members significantly exceeds the mean skill of the members of the respective CTRL ensemble (referred to hereafter, for simplicity, as "improved forecasts") is shown in Figure 2c,d for the ACC and RMSE metrics. The fraction of such forecasts is relatively small, which indicates that, in most cases, the effect of the relaxation is small in comparison to the unpredictable variability. During weeks 1-2, the skill of CTRL forecasts is relatively large (ACC > 0.7), and the effect of the relaxation is difficult to detect. Thereafter, the number of improved forecasts increases in TROP-S from 14% (25 cases) during week 3 according to both ACC and RMSE skill scores to 21% (37 cases according to ACC and 38 cases according to RMSE) during week 5. Only 18 cases identified using the ACC metric during week 3 are also identified using the RMSE metric, and only 3 cases identified using both metrics during week 3 are also identified during week 4. This suggests that a detectable impact of the tropical relaxation is intermittent in time and, in most cases, cannot be detected for longer than a week. The number of improved forecasts in either TROP-W or TROP-WT is smaller than that in TROP-S at all lags. Similarly, the number of improved forecasts in TROP-WT is smaller than that in TROP-W, suggesting that the strength of the relaxation in the tropics has a significant influence on the tropical-extratropical interactions. We return to this point later. STRAT shows significant improvements in ∼90% of the forecasts already during week 1, and it shows improvements in nearly all forecasts from week 2 onwards (not shown).
Can the tropical relaxation worsen the stratospheric forecasts? To answer this question, we calculated the fraction of relaxation forecasts in which the mean skill of the individual members is significantly worse than that of the respective CTRL. According to both skill metrics, the fraction of such forecasts is about 3% or less in all experiments and at all lags except during week 6, when the fraction exceeds 5% (not shown). Note that during week 6 the skill of the forecasts is low in all experiments, and the forecasts are dominated by the unpredictable noise.

3.2 What are the cases when the stratospheric forecasts have been improved?
We next look in more detail at the cases when significant skill improvement due to tropical relaxation was detected (Figure 2c,d). For brevity we focus on the TROP-S experiment because it has the largest number of forecasts with detected influence of the tropical relaxation.
Weekly mean stratospheric forecast skill metrics for TROP-S and CTRL are shown in Figure 3. On average, the weekly mean TROP-S forecasts with either the ACC or the RMSE metric improved during week 3 have an ACC of 0.8 during week 3. This skill is larger than the mean skill across all TROP-S forecasts (0.72). At the same time, the skill of the corresponding CTRL forecasts is lower during week 3 (0.62) than the mean skill across all CTRL forecasts (0.66). Similarly, the mean skill of the TROP-S forecasts with detected improvements during week 4 exceeds the mean skill across all TROP-S forecasts, but the mean skill of the corresponding CTRL forecasts is worse than the mean skill across all CTRL forecasts according to both ACC and RMSE metrics (Figure 3a,b). Thus, our method mostly detects improvements due to relaxation when the CTRL forecasts perform worse than average. This result does not mean that forecasts with average or higher than average skill cannot be improved by an improved tropical representation. Instead, it suggests that improvements of skilful CTRL forecasts are more difficult to detect. The improved TROP-S forecasts also have a reduced spread, by about 5% during weeks 3-5, in comparison to the mean spread across all TROP-S forecasts. A reduced spread in these forecasts helps to detect improvements according to our method (Section 3.1).
We next look at the synoptic situations for the cases when tropical relaxation led to improved stratospheric forecasts. Figure 4 shows four examples of weekly mean 50 hPa geopotential height anomalies when both ACC and RMSE metrics were improved in the TROP-S forecasts with respect to those in CTRL. The cases clearly differ from each other. The first case, 20-26 January 2010, represents a situation when the polar vortex was shifted toward Eurasia before a major SSW. The second case corresponds to a split-type SSW during 6-12 January 2013. The third case shows a vortex strengthening over North America during 6-12 January 2000. The fourth case shows vortex strengthening over Greenland and northeastern Eurasia during 2-8 January 2007. In all these cases, CTRL strongly underestimates the magnitude of the anomalies seen in the reanalysis, but the forecast errors are largely alleviated in TROP-S. It appears that the tropics can affect the stratosphere in different ways, contributing either to weakening (Figure 4a-h) or strengthening (Figure 4i-p) of the stratospheric polar vortex.
In order to draw general conclusions about the influence of the tropical relaxation on the stratospheric skill, we use a composite analysis and form two groups of forecasts corresponding to strengthening and weakening polar vortex. Specifically, the groups are formed based on the difference in the observed stratospheric winds at 50 hPa and 60°N between a week when improvements in the Z50 forecast skill are detected (Figure 2c,d) and the preceding week. A criterion based on vortex tendency is chosen instead of a criterion based on the vortex strength, because the experiments start to diverge only during forecast week 2, while the vortex strength is affected by events occurring during several previous weeks. As a result, compositing forecasts based on the vortex strength smooths out the difference between the experiments.
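The grouping criterion can be written down compactly. The sketch below uses a hypothetical function name and synthetic weekly mean winds, and its treatment of a zero tendency as weakening is an arbitrary convention rather than something stated in the text.

```python
def classify_vortex_tendency(u50_weekly, week):
    """Label a case from the change in weekly mean U50 (m/s) between week-1 and week.

    u50_weekly[0] is week 1; week is the 1-based week with detected improvement.
    Returns "VS" (strengthening) or "VW" (weakening; also used for zero change).
    """
    if week < 2:
        raise ValueError("a preceding week is needed to compute the tendency")
    tendency = u50_weekly[week - 1] - u50_weekly[week - 2]
    return "VS" if tendency > 0 else "VW"

# Synthetic weekly mean zonal mean winds at 50 hPa, 60N
print(classify_vortex_tendency([28.0, 31.0, 22.0, 15.0], week=3))
print(classify_vortex_tendency([20.0, 18.0, 24.0, 27.0], week=3))
```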
Altogether, there are 33 individual forecast ensembles in TROP-S with either increased mean ACC of individual ensemble members or decreased mean RMSE, or both, in comparison to those in CTRL during week 3. Initialization dates for these forecasts are listed in Table S1 together with forecast skill scores. Of these cases, 19 correspond to vortex weakening (VW3 group) and 14 to vortex strengthening (VS3 group). The VW3 group (Figure 5a) demonstrates positive Z50 anomalies stretching between the Northern Pacific and the Northern Atlantic and negative anomalies over Eurasia and North America, resembling a vortex split. Since ERA-5 represents only a single realization, deviations from the forecast ensemble means are to be expected (Figure 5b-d). However, the fact that the forecast errors are reduced in TROP-S (Figure 5c) in comparison to CTRL (Figure 5d), especially in the Pacific sector, confirms the positive influence of the tropical relaxation. Note that in the composite mean, the forecast errors in TROP-S appear similar in magnitude to those in STRAT (Figure 5b) because individual forecast errors partly cancel each other. However, individual events show smaller errors in STRAT than in TROP-S due to the stratospheric relaxation (Figure 4), as expected. The average RMSE in the VW3 and VS3 forecasts during week 3 is 94 m in STRAT, 142 m in TROP-S and 190 m in CTRL. Similarly, the average ACC for these events decreases from 0.91 in STRAT to 0.81 in TROP-S and 0.62 in CTRL.
Improvements due to the tropical relaxation are also seen in the VS3 group (Figure 5f-h). In this composite, the strongest negative anomalies are over the Canadian archipelago and positive anomalies over northern Eurasia, implying a polar vortex shift toward North America. Note that most of the differences between the forecast and reanalysis are not significant at p = 0.05 due to the different structure of the forecast errors between individual events (Figure 4).
A closer look at these events suggests that a worse than average performance of CTRL forecasts is related to their failure to predict a change in the flow evolution beyond the synoptic time-scale. This is illustrated in Figure 6a,c, which shows the evolution of the daily mean zonal mean zonal wind anomalies at 50 hPa and 60°N (U50) for the composites shown in Figure 5. In the VW3 group (Figure 6a), the winds strengthen for the first 10 days and this strengthening is captured reasonably well by the forecast experiments. However, the change from strengthening to weakening around day 10 is not captured well by the composite means. TROP-S clearly performs better than CTRL but still underestimates the weakening seen in the reanalysis by about 2-4 m s⁻¹ during weeks 3 and 4. In the VS3 group, the forecasts also start to diverge from the reanalysis around day 10 when the initial weakening reverses. TROP-S shows reduced wind errors in comparison to CTRL but still underestimates the magnitude of the wind anomalies in the reanalysis. Thus, in both groups it is the shift in the circulation regime during week 2, which the tropical teleconnections help to predict, that leads to a better forecast skill in the polar stratosphere. The VW3 group (Figure 6a) includes forecasts initialized before most major sudden stratospheric warmings (SSW) observed during this period, including SSW 2004, 2006, 2009, 2010, 2013 and 2019 (Table S1). Note that all other major SSWs during this period, except the SSW in December 2001, occurred in February or March, that is, they are not covered by the forecasts considered in this study. Thus, our experiments suggest that the tropical teleconnections contribute to an SSW forcing or preconditioning of the stratosphere before SSW. For most of the events in this composite, except for the forecast initiated before SSW 2009, the evolution of the observed winds is mostly within the ensemble spread of the respective CTRL and TROP-S ensembles, suggesting that the underestimated wind anomaly is due to a low predictability at these lead times rather than due to a model error. For SSW 2009, whose exceptionally low predictability was reported earlier (Kim & Flatau, 2010; Taguchi, 2016; Karpechko, 2018), the observed wind weakening was outside the spread of the forecast ensembles. Note that a tropical origin of the SSW 2006 was earlier shown by Jung et al. (2010) in a similar relaxation experiment. Also, a tropical origin of the SSW 2009 was earlier suggested by Schneidereit et al. (2017). SSWs are typically associated with larger-than-average forecast errors in the stratosphere (Karpechko, 2018); thus, the positive effect of the relaxation during these periods is easier to detect.
The response of the stratospheric winds is related to a response of the eddy heat flux at 100 hPa (HF100), which is a measure of the tropospheric planetary wave activity propagating to the stratosphere (Karpetchko & Nikulin, 2004; Polvani & Waugh, 2004). Consequently, there is a strong correspondence between forecasted stratospheric winds and forecasted HF100 during the preceding several days (Taguchi, 2016; Karpechko et al., 2018). In VW3 the weakening is driven by an anomalous increase of HF100 during forecast weeks 2 and 3, with the peak value exceeding 10 K m s⁻¹ in the reanalysis (Figure 6b). Similarly, the strengthening in VS3 (Figure 6c) is linked to a negative HF100 anomaly (Figure 6d) reaching −8 K m s⁻¹ in ERA-5. In both cases, the TROP-S composites capture a larger fraction of the observed HF100 anomalies compared to CTRL, indicating that the improved stratospheric forecasts in TROP-S are related to improved forecasts of the planetary wave activity propagating from the troposphere. Like TROP-S, STRAT also shows a closer agreement with ERA-5 in terms of HF100 than CTRL. While there is no relaxation applied at 100 hPa in STRAT, we find that the mean ACC score of Z100 does not drop below 0.75, and the mean RMSE is reduced by 25%-35% with respect to that in CTRL (not shown). Thus, the relaxation applied at the higher levels strongly affects the flow at 100 hPa, explaining the improved HF100 forecasts in STRAT.
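For readers less familiar with the diagnostic, the zonal mean eddy heat flux is the zonal average of the product of the deviations of meridional wind and temperature from their zonal means. The sketch below evaluates it for an idealized wave; the latitude band over which HF100 is averaged in the study is not stated here, so only a single latitude circle is treated, and the function name is illustrative.

```python
import numpy as np

def eddy_heat_flux(v, t):
    """Zonal mean of v'T', with primes denoting deviations from the zonal mean.

    v (m/s) and t (K) have longitude as their last axis.
    """
    v_prime = v - v.mean(axis=-1, keepdims=True)
    t_prime = t - t.mean(axis=-1, keepdims=True)
    return (v_prime * t_prime).mean(axis=-1)

# Idealized wave-1 disturbance along one latitude circle
lon = np.linspace(0.0, 2.0 * np.pi, 144, endpoint=False)
v = 10.0 * np.cos(lon)               # meridional wind
t_in_phase = 210.0 + 2.0 * np.cos(lon)   # temperature in phase with v
t_quadrature = 210.0 + 2.0 * np.sin(lon) # temperature 90 degrees out of phase
print(eddy_heat_flux(v, t_in_phase), eddy_heat_flux(v, t_quadrature))
```

The flux is largest when the wind and temperature waves are in phase and vanishes when they are in quadrature, which is why it tracks the vertical propagation of planetary waves.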
A tight link between HF100 and U50 is also seen within individual ensembles. Figure 7 shows the HF100-U50 link for the four forecasts shown in Figure 4. Similar relationships between U50 and HF100 are seen in the other forecast ensembles (not shown). Defining the strength of the tropical influence on the extratropical fluxes as the absolute difference between the ensemble mean HF100 in TROP-S and CTRL, one can compare it with the forecast spread (one standard deviation) across the respective CTRL members, which is used here as a measure of the unpredictable extratropical variability. For the cases shown in Figure 7, the tropical influence varies between 56% (Figure 7a) and 104% (Figure 7c) of the spread. Looking at all VW3 and VS3 forecasts, the tropical influence amounts, on average, to 44% of the CTRL forecast spread, suggesting that the impact of the tropics on the wave activity flux can be significant in comparison to the unpredictable extratropical variability. Another point of interest seen in Figure 7 is that the slopes of the linear regressions of U50 on HF100 differ between the extreme cases by a factor of 1.7. For the cases when errors in HF100 can be considered as originating in the troposphere (e.g., Karpechko et al., 2018), such a difference may indicate that the stratosphere has different susceptibility to tropospheric forecast errors depending on the synoptic situation. Alternatively, for the cases when anomalous HF100 originates in the stratosphere (Birner & Albers, 2017), these differences may indicate nonlinearities in the stratospheric mean flow-wave interactions.
It is of interest to look at the spatial structure of the HF100 anomalies during vortex weakening and strengthening episodes (Figure 8). In both composites, the largest anomalies, either positive (Figure 8a) or negative (Figure 8e), are co-located with the climatological HF100 maximum in the North Pacific. In VS3, the secondary HF100 maximum over Scandinavia is also suppressed. In VW3, there is an HF100 enhancement over the Atlantic, which can be associated with an upper tropospheric ridging such as the one preceding the SSW 2013 (Attard et al., 2016). In all experiments, the largest forecast errors coincide with the largest anomalous fluxes. TROP-S partly alleviates the errors seen in CTRL in both groups, especially over the North Pacific. Again, there is large case-to-case variability in the spatial structures of the HF100 anomalies.
So far, the analysis has focused on the forecasts for which the improvements due to the tropical relaxation are detected during week 3. Figure 2c,d shows that the number of forecasts with detected improvements due to the tropical relaxation increases with lead time. This is likely a consequence of the fact that the skill of CTRL deteriorates with lead time, and it becomes easier to improve the skill. Forecasts with improved skill during weeks 4-5 show a better predicted magnitude of the anomalous eddy heat flux from the troposphere, leading to an improved zonal mean stratospheric circulation as in Figure 6 (not shown). On the other hand, there is an increased diversity of synoptic situations, which makes a composite analysis for these cases less efficient. A focused case-study looking into mechanisms contributing to tropical teleconnections in these cases is warranted; however, this is not done here.
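The measure of tropical influence defined above is a simple ratio, sketched here with the standard library. The member values are synthetic, the function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption.

```python
import statistics

def tropical_influence_ratio(hf_relax_members, hf_ctrl_members):
    """|ensemble mean HF100 difference| divided by the CTRL ensemble spread (1 std)."""
    diff = abs(statistics.fmean(hf_relax_members) - statistics.fmean(hf_ctrl_members))
    return diff / statistics.stdev(hf_ctrl_members)

# Synthetic HF100 values (K m/s) for the members of one forecast ensemble pair
ctrl = [1.0, 2.0, 3.0, 4.0, 5.0]
trop_s = [4.0, 5.0, 6.0]
print(f"{100 * tropical_influence_ratio(trop_s, ctrl):.0f}% of the CTRL spread")
```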

Improvements in the tropospheric forecast skill
Are the improved stratospheric forecasts analysed in the previous sections associated with improved tropospheric forecasts? The weekly mean ACC skill score for the 500 hPa geopotential height north of 50°N (Figure 9) shows that TROP-S and STRAT have skill comparable to that in CTRL during the first 2 weeks; however, starting from week 3 the positive effect of the relaxation starts to emerge. Overall, the stratospheric and tropical relaxations have comparable effects on the extratropical tropospheric skill. Charlton-Perez et al. (2021) estimated that a perfect knowledge of the stratospheric conditions would increase the correlation skill score of the North Atlantic Oscillation (NAO) for week 3 from 0.49 (a representative value in the present forecast systems) to 0.56, that is, an increase of 0.07. This increase cannot be directly compared with the ACC increase of 0.03 found for STRAT during week 3 because of a different metric; however, the result found here is broadly consistent with the theoretical estimates of Charlton-Perez et al. (2021).
The composite mean of the TROP-S forecasts with improved stratospheric skill during week 3 shows an enhanced skill in the troposphere during weeks 3 and 4, suggesting that the tropical teleconnections affect both the stratosphere and the troposphere. The skill in TROP-S is superior to that in STRAT during weeks 3 and 4, when an improved stratosphere is expected to improve the skill of the tropospheric forecasts via downward coupling. STRAT has better stratospheric forecasts than TROP-S (Figure 3); however, it does not have the tropical influence present in TROP-S. Therefore, our results indicate that a direct tropical influence via the tropospheric teleconnections (tropospheric pathway) plays an important role in the improved Z500 skill during weeks 3 and 4, together with the stratosphere-troposphere coupling (stratospheric pathway).
It is possible that there was no downward influence from the stratospheric anomalies during these events, either because of a lack of downward propagation following a stratospheric signal (Nakagawa & Yamazaki, 2006; Karpechko et al., 2017), or because of a lack of systematic zonal mean flow anomalies in the stratosphere. Figure 10 shows zonal mean geopotential height anomalies averaged over the polar cap for the VW3 and VS3 events separately. According to this diagnostic, which strongly correlates with a Northern Annular Mode (NAM) index and is traditionally used to detect stratosphere-troposphere coupling (Hall et al., 2021), a downward propagation of the circulation anomalies from the stratosphere to the troposphere was clearly present during VW3 events, but not during VS3 events. In the VW3 group the strongest anomalies appear in the troposphere during week 4; therefore, one can expect that the stratospheric weakening during week 3 has affected the tropospheric circulation during the following weeks. In the VS3 group, the lack of a downward propagation is consistent with a lack of pronounced stratospheric zonal mean anomalies in the composite mean. Note that in this group zonally asymmetric anomalies dominate in the composite mean (Figure 5e), which are not necessarily associated with downward propagation.
Focusing now on the VW3 group, Figure 11 shows the evolution of weekly mean composite mean Z500 anomalies. Starting from week 3, the zonally symmetric circulation is characterized by a negative NAM phase in the reanalysis and the forecasts. The differences between the forecasts appear in the magnitude and location of the anomalous centres. During week 3, an anomalous low over the Far East is predicted by all forecasts, but the anomalous high over the east Pacific and positive anomalies across the central Arctic are best predicted by TROP-S. During week 4, TROP-S also captures the magnitude and the location of the anomalies better than the other composites. Both STRAT and CTRL predict a stronger positive anomaly over the Arctic and stronger negative anomalies over the North Atlantic and Europe during week 4 than during week 3, which is consistent with the expected downward influence following the weakening of the stratospheric polar vortex (Figure 10). By week 5, the observed positive anomaly over the Arctic is not well predicted by TROP-S but it is predicted by STRAT. Also, a negative anomaly over the North Atlantic and western Europe strengthens in STRAT, suggesting that this is a part of the stratospheric response. At the same time, the response in TROP-S becomes more zonally asymmetric by week 5 and includes a wave train across North America and the Atlantic broadly resembling the one present in the reanalysis. It is worth noting that a significant positive anomaly over the eastern Arctic, resembling the one seen in the reanalysis and in STRAT, is also predicted by CTRL. Thus, although overall the skill of CTRL by week 5 degrades considerably (Figure 9), the freely running forecasts can capture a signal due to the stratosphere-troposphere coupling even at such long lead times.
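The polar cap diagnostic is an area-weighted average of geopotential height anomalies poleward of a chosen latitude. The sketch below assumes a 60°N boundary and cosine-latitude weights; neither is specified in the text, and the function name and grid are illustrative.

```python
import numpy as np

def polar_cap_mean(z, lat, lat_min=60.0):
    """cos(latitude)-weighted mean of z poleward of lat_min.

    z has shape (..., lat, lon); lat is in degrees north. The 60N default
    is an assumed cap boundary, not a value taken from the study.
    """
    mask = lat >= lat_min
    w = np.cos(np.deg2rad(lat[mask]))
    zonal_mean = z[..., mask, :].mean(axis=-1)
    return (zonal_mean * w).sum(axis=-1) / w.sum()

# Synthetic anomalies: (level, lat, lon) on a 2.5-degree grid
lat = np.arange(20.0, 90.1, 2.5)
rng = np.random.default_rng(2)
z_anom = rng.normal(scale=50.0, size=(5, lat.size, 144))
cap = polar_cap_mean(z_anom, lat)   # one value per level
```

Applied to each pressure level and forecast day, this yields the time-height sections commonly used to trace downward propagation of circulation anomalies.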

Tropical forcing due to MJO
Finally, we look at the tropical forcing affecting the extratropical circulation. We focus on the MJO, which is the leading mode of intraseasonal variability in the tropics, and whose influence on the polar stratosphere and the troposphere has been demonstrated in several papers (Garfinkel et al., 2012; Garfinkel & Schwartz, 2017; Vitart, 2017; Domeisen et al., 2020; Statnaia et al., 2020). In both VW3 and VS3 composites the observed MJO amplitude increases with time, reaching its maximum of ∼1.8 at day 15 and day 22 in VW3 and VS3 respectively. Thus, an active MJO is implicated in the forcing of the extratropics during these events. The observed evolution is well captured by TROP-S due to a strong relaxation in this experiment, although the discrepancy with the observations is more pronounced in VS3. TROP-W and TROP-WT experiments also capture an increase in the MJO amplitude; however, due to a weaker relaxation in these experiments, the increase is less pronounced. Thus, one can expect that forcing of the extratropics is less efficient in these experiments, which is consistent with a smaller increase in the forecast skill in these experiments (Figures 1 and 2). Note also that while the peak MJO amplitude in both composites is comparable, in VS3 it is reached only by day 22, while during the period preceding week 3 the amplitude did not exceed 1.6. Thus, one could expect a weaker forcing in the VS3 composite mean, contributing to a weaker extratropical response in comparison to that in VW3 (Figure 6). The evolution of the MJO amplitude in CTRL and STRAT, which have no relaxation in the tropics, is different. In these experiments, the discrepancy with the observations becomes apparent during the first 3-5 days, and the amplitude of the composite means decreases to below 1 at all lags after 10 days. As the ECMWF model skilfully predicts MJO on average beyond 2 weeks (Vitart, 2017), the failure to capture the observed increase in the MJO amplitude contributes to the below-average performance of the CTRL forecasts during these events (Figure 3).
Figure 12c shows the distribution of the observed MJO phases during days corresponding to days 7-14 of the VW3 and VS3 forecasts, that is, a week before the detected tropical influence in the stratosphere. Before the VW3 events, MJO is mostly in phases 5-7, meaning active convection over the Maritime Continent and the western Pacific. This is consistent with previous studies (e.g. Garfinkel et al., 2012) showing a weakening stratospheric polar vortex due to enhanced upward wave propagation following MJO over the western Pacific. Before the VS3 events, MJO is mostly in phases 3-4, which corresponds to active convection over the east Indian Ocean. While there are only a few studies linking MJO with a strengthening stratospheric polar vortex (e.g. Garfinkel et al., 2014), Z500 anomalies in the North Pacific associated with MJO phase 3 are nearly opposite to those associated with MJO phase 7 (Henderson et al., 2017). Since the North Pacific is the region associated with a stratospheric forcing by the anomalous upward wave activity propagation (Ineson & Scaife, 2009; Garfinkel et al., 2010), a decreased wave flux and a strengthened stratospheric polar vortex can be expected following MJO phase 3.
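The MJO amplitude and phase referred to above can be illustrated with a short sketch. It assumes that the MJO index is available as a pair of daily principal components in the style of the Real-time Multivariate MJO index (here called `rmm1` and `rmm2`), with the usual eight-phase partition of the (RMM1, RMM2) plane. The function names and the amplitude threshold of 1 for an active MJO are assumptions made for this example and are not taken from the analysis in this study.

```python
import numpy as np

def mjo_amplitude(rmm1, rmm2):
    """MJO amplitude as the distance from the origin in the (RMM1, RMM2) plane."""
    return np.hypot(rmm1, rmm2)

def mjo_phase(rmm1, rmm2):
    """MJO phase (1-8), counted anticlockwise in 45-degree sectors.

    Phase 1 starts at the negative RMM1 axis, so that phases 2-3 lie in
    the lower half of the diagram and phases 6-7 in the upper half.
    """
    angle = np.degrees(np.arctan2(rmm2, rmm1))  # in [-180, 180]
    return (np.floor((angle + 180.0) / 45.0).astype(int) % 8) + 1

def phase_counts(rmm1, rmm2, min_amplitude=1.0):
    """Number of active-MJO days in each of the eight phases.

    Days with amplitude below min_amplitude are treated as inactive
    and are not counted (the threshold is an assumption).
    """
    rmm1 = np.asarray(rmm1, dtype=float)
    rmm2 = np.asarray(rmm2, dtype=float)
    active = mjo_amplitude(rmm1, rmm2) >= min_amplitude
    phases = mjo_phase(rmm1[active], rmm2[active])
    # Index 0 of the bincount is unused because phases run from 1 to 8.
    return np.bincount(phases, minlength=9)[1:]
```

Applied to the index values for the days corresponding to forecast days 7-14 of a group of events, `phase_counts` would give a phase distribution of the kind shown in Figure 12c.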

DISCUSSION
Anomalous convection caused by the tropical sea-surface temperature anomalies induces a divergent flow in the middle and upper tropical troposphere, which acts as a source for Rossby waves propagating horizontally to the extratropics (Sardeshmukh & Hoskins, 1988; Scaife et al., 2017). Tracking the origin of the extratropical waves is not straightforward; therefore, the degree to which the tropical forcing affects the extratropical variability is not well understood. Our comparison of historical forecast ensembles initialized during December-January with tropical relaxation experiments shows that in ∼20% of the forecasts a significant tropical influence can be detected in the Northern Hemisphere stratospheric circulation, which manifests as an improvement in the forecast skill at lead times from 3 to 5 weeks. This estimate is a lower bound on the number of cases in which the tropical-extratropical interactions play an important role in forcing the extratropical variability, because cases in which the tropical forcing is already well represented in the freely running forecasts cannot be detected using our data. Providing a better estimate of the importance of the tropical-extratropical interactions requires experiments in which these interactions are deactivated by relaxing the tropics toward an inactive state; however, defining an inactive state for such experiments can be challenging. One should also keep in mind that tropical-extratropical teleconnections may not be properly represented in the model, which would also reduce the number of detected cases.
Our results indicate a tropical contribution to the stratospheric anomalies leading to the major SSWs of 2004, 2006, 2009, 2010, 2013 and 2019 that occurred in January or early February. A tropical forcing for SSW 2018 has also been suggested in other studies (Statnaia et al., 2020; Knight et al., 2021); however, events occurring in mid- or late February are not covered by the forecasts considered in this study. Improvements in the stratospheric variability associated with the tropical forcing can be attributed to better-captured upward planetary wave activity fluxes, with vortex weakening preceded by an increased flux and vortex strengthening preceded by a weakened flux. The magnitude of the tropical contribution to the upward wave activity fluxes can be as large as one standard deviation across the fluxes predicted by the freely running ensemble members, clearly indicating the importance of a proper representation of the tropical variability in the forecasts for predicting the stratospheric evolution.
We attribute at least part of the tropical influence detected in the Northern Hemisphere polar stratosphere to a correct representation of MJO in the relaxation experiments. While MJO is, on average, skilfully predicted by the ECMWF model beyond 2 weeks, the control forecasts improved in this study by the tropical relaxation are characterized by a poorly predicted MJO beyond 5 days. Most of the cases when the relaxation helped to predict a weakening of the polar stratospheric vortex were preceded by an active MJO in phase 5-7, which was not predicted by the control forecasts. Active MJO in phase 6-7 has been previously associated with forcing of SSWs (Garfinkel & Schwartz, 2017); thus, our results are consistent with the literature. We also find that active MJO over the Indian Ocean (phase 3) is associated with vortex strengthening, again consistent with some previous studies (Garfinkel et al., 2014). While Vitart (2017) found a zonally asymmetric influence of MJO phase 3 on the stratosphere in the reanalysis and S2S models, our results are not necessarily inconsistent with theirs because there is large variability across S2S models and large case-to-case variability as found here. Improved understanding of the tropical teleconnections associated with strengthening of the stratospheric polar vortex is thus required.
We show that improved predictions of the stratospheric circulation in the tropical relaxation experiments are followed by improved predictions in the tropospheric circulation, which might be expected to be a result of the downward influence on the troposphere.However, the corresponding stratospheric relaxation experiments, which have a more accurate representation of the stratospheric variability, do not show more-skilful tropospheric forecasts than those in the tropical relaxation experiments, suggesting that the stratosphere is not the only contributor to the tropospheric improvements in our cases.While we find improved geopotential anomalies in STRAT forecasts over the Arctic and the Euro-Atlantic sector, which is consistent with expected downward coupling, overall, the tropospheric teleconnections from the tropics, which are expected to be better represented in TROP-S, seem to play an equally important role during forecast weeks 3 and 4 in these cases.
A contribution from the stratosphere-troposphere coupling to the tropospheric forecast skill has previously been found for cases when the stratosphere is in an extreme state (Sigmond et al., 2013; Tripathi et al., 2015; Domeisen et al., 2020; Charlton-Perez et al., 2021; Statnaia et al., 2022). The experiments used in our study can help to better quantify when and by how much the tropospheric skill is improved due to the stratospheric improvements. In particular, Charlton-Perez et al. (2021) showed that the NAO correlation skill score during week 3 can be improved by 0.07 due to the perfect knowledge of the stratospheric conditions. Our results demonstrate a similar level of skill improvement, although a thorough comparison has not been made. Note that a significant stratospheric influence at sub-seasonal scales has earlier been found in case studies using a relaxation approach (Kautz et al., 2020; Huang et al., 2022); however, in these studies the relaxation was applied at lower levels than in the present experiments. Huang et al. (2022) demonstrated that the choice of the relaxation altitude strongly affects the degree to which the lower stratospheric evolution is captured, which plays an important role in the predictability of the tropospheric evolution.
In summary, we show that a systematic use of relaxation experiments for attribution of climate variability on sub-seasonal and seasonal time-scales, as is done here, provides valuable insights into factors contributing to atmospheric predictability. Repeating such experiments with other systems and adding counterfactual experiments (e.g. switching off stratospheric and tropical influences) would allow broader conclusions to be drawn about the relative contributions of stratospheric and tropospheric teleconnections to the tropospheric predictability, and would guide model developments.
-square error (RMSE), and (c) ensemble spread (ES) for daily mean ensemble mean forecasts of 50 hPa geopotential height anomalies over 50°-90°N for control experiment (CTRL) and the relaxation experiments. Dark shading marks the 25%-75% range across the skill of the individual CTRL ensembles; light shading marks the 5%-95% range.
(a,b) Number of cases (in %) when (a) the spatial anomaly correlation coefficient (ACC) skill score or (b) the root-mean-square error (RMSE) skill score of the ensemble mean forecasts in the relaxation experiments exceeds that of the corresponding control experiment (CTRL) ensemble mean forecast. The shaded area marks the number of cases consistent with the null hypothesis of equal skill at p = 0.05 according to a binomial test. (c,d) Number of cases when (c) the mean ACC skill score or (d) the mean RMSE skill score of the forecast ensemble members in the relaxation experiments is significantly larger (for ACC skill score) or significantly smaller (for RMSE skill score) than the mean skill of the corresponding CTRL forecast ensemble members according to a two-sided t-test at p = 0.05.
FIGURE 3 (a) Spatial anomaly correlation coefficient (ACC), (b) root-mean-square error (RMSE) and (c) ensemble spread (ES) skill scores for weekly mean ensemble mean forecasts of 50 hPa geopotential height anomalies over 50°-90°N for control experiment (CTRL) (black), TROP-S (purple) and STRAT (blue) experiments. Solid lines show mean skill scores for all forecasts. Dotted and dashed lines show mean skill scores for forecasts that have better skill scores in TROP-S either according to ACC or RMSE metrics during weeks 3 and 4 respectively (see text for details of the method).
e,i,m) Weekly mean 50 hPa geopotential height anomalies in difference between forecasts and the reanalysis for the periods when week 3 forecasts in the stratosphere show higher skill score for TROP-S than control experiment (CTRL). (a-d) 20-26 January 2010, forecasts initialized on 6 January 2010; (e-h) 6-12 January 2013, forecasts initialized on 23 December 2012; (i-l) 6-12 January 2000, forecasts initialized on 23 December 1999; (m-p) 26 December 2011-1 January 2012, forecasts initialized on 12 December 2011. Also shown are forecast anomaly correlation coefficient (ACC) and root-mean-square error (RMSE) (in metres) skill scores.
FIGURE 5 (a,e) Composites of weekly mean 50 hPa geopotential height anomalies in ERA-5 and (b-d, f-h) difference between forecasts and reanalysis for the periods when TROP-S week 3 forecasts in the stratosphere show higher skill score compared to control experiment (CTRL). Composites are shown for (a-d) vortex weakening (VW3) group and (e-h) vortex strengthening (VS3) group. Hatching shows areas where (a,e) the composite mean anomalies are significantly different from 0, or (b-d, f-h) forecast errors are significant according to a two-sided t-test at p = 0.05. Also shown are averaged anomaly correlation coefficient (ACC) and root-mean-square error (RMSE) (in metres) skill scores for each composite.
Scatterplots of zonal mean zonal wind anomalies at 50 hPa and 60°N (U50) at day 20 against eddy heat flux anomalies at 100 hPa and 45°-75°N (HF100) averaged over forecast days 10-20 across individual forecast ensemble members for the same cases as those shown in Figure 4. Large symbols show ensemble means. Error bars are the 95% confidence intervals for the ensemble means. Also shown are the slopes of the linear regressions of U50 on HF100 across combined control experiment (CTRL) and tropical (TROP-S) members and the correlation coefficients together with the 95% confidence intervals.
FIGURE 9 (a) Spatial anomaly correlation coefficient (ACC), (b) root-mean-square error (RMSE) and (c) ensemble spread (ES) skill scores for weekly mean ensemble mean forecasts of 500 hPa geopotential height anomalies over 50°-90°N for control experiment (CTRL) (black), tropical (TROP-S) (purple) and stratospheric (STRAT) (blue) experiments. Solid lines show mean skill scores for all forecasts. Dashed lines show mean skill scores for the forecasts that have better skill scores in TROP-S during week 3 according to either the ACC or the RMSE skill score.

FIGURE 10 Normalized polar cap (65°-90°N) geopotential height anomalies in ERA-5 for (a) vortex weakening (VW3) and (b) vortex strengthening (VS3) composites. Hatching indicates anomalies significantly different from 0 according to a two-sided t-test at p = 0.05. Week 3 is highlighted with magenta lines.
Composites of weekly mean 500 hPa geopotential height anomalies in (a,e,i) ERA-5 and (b-d, f-h, j-l) forecasts in the vortex weakening (VW3) composite. (a-d) week 3; (e-h) week 4; (i-l) week 5. Hatching shows areas where the composite mean anomalies are significantly different from 0 according to a two-sided t-test at p = 0.05. CTRL, control experiment; STRAT, stratospheric; TROP-S, tropical.
FIGURE 12 (a,b) Mean Madden-Julian Oscillation (MJO) amplitude for (a) vortex weakening (VW3) and (b) vortex strengthening (VS3) composites in BOM's analysis (black lines) and the forecasts. (c) Distribution of MJO phase in BOM's analysis for the days corresponding to days 7-14 of the VW3 and VS3 forecasts.