Subseasonal predictability of the North Atlantic Oscillation

Skillfully predicting the North Atlantic Oscillation (NAO), and the closely related northern annular mode (NAM), on ‘subseasonal’ (weeks to less than a season) timescales is a high priority for operational forecasting centers, because of the NAO’s association with high-impact weather events, particularly during winter. Unfortunately, the relatively fast, weather-related processes dominating total NAO variability are unpredictable beyond about two weeks. On longer timescales, the tropical troposphere and the stratosphere provide some predictability, but they contribute relatively little to total NAO variance. Moreover, subseasonal forecasts are only sporadically skillful, suggesting the practical need to identify the fewer potentially predictable events at the time of forecast. Here we construct an observationally based linear inverse model (LIM) that predicts when, and diagnoses why, subseasonal NAO forecasts will be most skillful. We use the LIM to identify those dynamical modes that, despite capturing only a fraction of overall NAO variability, are largely responsible for extended-range NAO skill. Predictable NAO events stem from the linear superposition of these modes, which represent joint tropical sea-surface temperature-lower stratosphere variability plus a single mode capturing downward propagation from the upper stratosphere. Our method has broad applicability because both the LIM and the state-of-the-art European Centre for Medium-Range Weather Forecasts Integrated Forecast System (IFS) have higher (and comparable) skill for the same set of predicted high skill forecast events, suggesting that the low-dimensional predictable subspace identified by the LIM is relevant to real-world subseasonal NAO predictions.


Introduction
The North Atlantic Oscillation (NAO) represents a north-south 'see-sawing' of the Atlantic jet stream, with the positive phase bringing increased precipitation and above average temperatures to northern Europe and decreased storminess to southern Europe and eastern North America, and vice versa for the negative phase (see Hurrell et al 2003, Kenyon and Hegerl 2008  govern atmospheric variability. However, on the intermediate subseasonal time scales lying within the 'weather-climate prediction gap' , where neither weather nor boundary forcing strongly determine predictability (e.g. Von Neumann 1960, Mariotti et al 2018, NAO forecast skill has remained stubbornly low on average. This has spurred efforts to identify 'forecasts of opportunity' where skill is expected to be relatively high (Lang et al 2020, Mariotti et al 2020 due to physical processes driving climate signals large enough to be predictable in the face of unpredictable weather noise (Albers and Newman 2019). For example, in the landmark paper by Baldwin and Dunkerton (2001) (hereafter, BD2001), stratospheric northern annular mode (NAM) anomalies were proposed as long-range predictors (or 'stratospheric harbingers') for anomalous tropospheric NAM/NAO events.
NAO variability during winter is driven by numerous physical mechanisms including remote tropical forcing by the Madden-Julian Oscillation (MJO) (Ferranti et al 2018, Tseng et al 2018, Mayer and Barnes 2020 and ENSO (Ayarzagüena et al 2018, King et al 2018, Nie et al 2020, sudden stratospheric warmings (SSW) (Sigmond et al 2013, Tripathi et al 2015, Domeisen et al 2020, and eddymean dynamics of the zonal index (Gerber andVallis 2007, Hitchcock andSimpson 2016). These mechanisms may yield similar NAO-like patterns, but each operates over different, overlapping timescale ranges. So, while the combined effects of these dynamical processes may result in above average NAO forecast skill, it is difficult to extract their individual contributions to subseasonal NAO predictions (Butler et al 2014, Polvani et al 2017, Domeisen et al 2019. Patterns of atmosphere-ocean variability arising from multiple physical processes unfolding across different timescale ranges may be a consequence of 'nonnormal' dynamical systems (Farrell 1988, Borges and Hartmann 1992, Farrell and Ioannou 1996, Newman et al 2003, Mitas and Robinson 2005, Coy and Reynolds 2014, Breeden et al 2020, Henderson et al 2020. These systems have dynamical 'modes' that may, at times, have a fairly similar spatial appearance, yet generally have fairly different temporal evolutions. In the most basic sense, this means that transient anomaly growth of a common geographic pattern (e.g. the NAO) can arise when several of the dynamical modes projecting on this pattern initially destructively interfere but eventually, as they evolve differently, constructively interfere.
Here we show that the nonnormal dynamics of observed NAO subseasonal variability can be well modeled by a type of empirical-dynamical model called a linear inverse model (LIM) Sardeshmukh 1995, Albers andNewman 2019). In particular, we show that the LIM predicts when subseasonal NAO forecasts will be most skillful. In addition, using a 'nonnormal filter' derived from the LIM, we find that while most NAO variability is associated with predominantly tropospheric dynamical modes whose predictability is limited to about two weeks, skillful subseasonal NAO forecasts are due to a handful of dynamical modes dominated by joint tropical sea-surface temperature-lower stratosphere variability, plus one mode representing downward propagation from the upper stratosphere.

The LIM and its nonnormal eigenspaces 2.1.1. Model description
In a LIM, the chaotic evolution of a 'coarse-grained' climate anomaly is approximated by the sum of slowly evolving, predictable linear dynamics and rapidly evolving, unpredictable white noise (Hasselmann 1976, Penland andMatrosova 1994; see also Penland 1996, Just et al 2001. The LIM is written as: where the matrix L is constructed from covariances of the anomalous climate state vector x and Bη is an observationally constrained white noise forcing vector, whose spatial structure matrix B is determined from a balance condition derived from (1) (Penland and Sardeshmukh 1995; see Albers and Newman 2019 and the supplement for details of our LIM's construction (available online at stacks.iop.org/ERL/16/044024/mmedia)). As a dynamical model, the LIM can be run either as a forecast model or as a climate model: To make predictions, the LIM's infinite-member ensemblemean forecast at lead τ isx (t + τ ) = exp [Lτ ] x (t) (Newman et al 2003); to generate lengthy climate simulations, equation (1) is integrated forwards in time as in Penland and Matrosova (1994) (see supplement for details of our stochastic integration). While in principle x represents the entire climate anomaly, in practice it contains a more limited set of variables; here, we use weekly averaged anomalous mean-sea level pressure (MSLP), tropospheric stream function and geopotential height, stratospheric stream function, tropical diabatic heating, and tropical SSTs, where the data is from the Japanese 55 year Reanalysis (Kobayashi et al 2015) for extended boreal winter (December-March, 1979-2017. Other variables and/or levels may then be obtained via multivariate linear regression onto x (see section 2.2).

LIM nonnormal filter
We used the LIM to construct a 'nonnormal filter' (Penland and Matrosova 2006) that can be applied to LIM output or JRA-55 observations projected onto the LIM state vector. The dynamics within our forecast operator L may be diagnosed through the differential evolution of its eigenmodes (Penland 1989, Penland and Matrosova 1994, Von Storch et al 1995; generally, these are nonnormal eigenmodes that are not orthogonal to each other (Strang 2006). The filter separates the small subset of eigenmodes with substantial stratospheric and tropical sea-surface temperature (SST) components, which we find are key to the predictable subseasonal NAO signal, from the many remaining eigenmodes representing tropo-spheric internal variability These two sets of eigenmodes, or 'subspaces' , are defined as follows (see supplement for details): (a) the 'internal' subspace has eigenmodes with e-folding (decay) times ranging continuously from 1 to 22 d, along with zero SST amplitude and relatively small stratospheric amplitudes; (b) the 'stratosphere-SST' subspace has far fewer eigenmodes, all with longer e-folding times (>32 d) and low frequencies, all with pronounced stratospheric amplitudes, and all but one with SST components. Note that, similar to earlier studies (Newman et al 2009, Henderson et al 2020, the internal subspace includes an MJO-like eigenmode. The complete set of eigenmodes spans x, so any anomaly (or composite of anomalies) can be represented as the sum over all eigenmodes. However, the nonnormality means that quadratic quantities such as power and anomaly correlation are not strictly additive; for example, total variance is generally less than the sum of the variances of each eigenmode (Farrell and Ioannou 1996).

Signal-to-noise calculation
High skill NAO forecasts are identified from the LIM's 'expected skill' , derived from the signal-to-noise ratio of each infinite-member ensemble-mean LIM forecast (Sardeshmukh et al 2000, Newman et al 2003, Albers and Newman 2019. The expected skill for lead τ at forecast initialization time t is where the squared forecast signal-to-noise ratio, S 2 (t; τ ) , is determined in the LIM from the statedependent forecast signalx (t + τ ) and the expected state-independent, forecast lead-dependent error variance (Newman et al 2003). Because equation (2) is defined for a 'perfect model' infinite member ensemble forecast (Sardeshmukh et al 2000), it provides a theoretical upper bound on mean predictability as a function of forecast lead time.

Observed and LIM NAM composites
In this paper, we compare composites over observed NAM events to the corresponding composites in the long LIM climate run. The NAM index is calculated individually at each pressure level as a polar cap average ( Observed NAM events are identified when the NAM index exceeds 1.57 standard deviations (STDs) at 10 hPa. This threshold roughly matches the event/year frequency (0.43 events per year) for negative NAM (weak vortex) events used in BD2001, which provided a reasonable balance between event sample size and event strength; this choice does not qualitatively impact our results (cf figure 1 to figures S2 and S3). For the LIM climate runs, we choose STD thresholds that closely matched the events/year frequency of the observed NAM composites; for 10 and 300 hPa levels, this equated to 1.4 and 1.6 STDs, respectively.
The LIM state vector, x, includes a limited set of variables on a limited set of pressure levels. Thus, to construct multilevel (1000-1 hPa) NAM composites, which require geopotential height on levels not explicitly included in x, we constructed a series of ridge regression models (Hastie et al 2015) that takes as input any time series of LIM model output (or JRA-55 observations projected onto the LIM state vector basis) and outputs the first 25 EOFs of geopotential height (explaining roughly 93% and 98% of the geopotential height variance for tropospheric and stratospheric levels, respectively) on a 30 • -90 • N five-degree grid on 16 pressure levels (units of hPa): (1, 5, 10, 30, 50, 70, 100:100:1000). The goodness of fit between the observed NAM time series and the regression model-based NAM time series is very good, with the correlation coefficients varying between 0.92 and 0.94, depending on the pressure level, and negligible mean-square error (see supplement S4 and figure S4 for details).
Significant differences between the observed NAM anomalies and those from the LIM climate run are calculated via 10 000-member Monte Carlo simulations (based on resampling of the LIM climate run where positive and negative NAM polarities are calculated separately). Differences are deemed to be significant if an observed NAM values lies outside the 99th percentile of the Monte Carlo distribution at each lag and pressure level; these values are denoted by light stippling in figure 1 (see supplement for details of the Monte Carlo simulations).

Results
BD2001 hypothesized that stratospheric NAM anomalies may lead to enhanced NAO skill. Thus, we begin by repeating the BD2001 'stratospheric-based' (defining onset at 10 hPa) NAM composite analysis using the JRA-55 dataset (figures 1(a) and (d), respectively). Applying the nonnormal filter, we find that nearly all of the observed NAM composite anomaly is contained within the stratosphere-SST subspace for both negative (cf figures 1(b) and (c)) and positive (cf figures 1(e) and (f)) events. There are two important exceptions: before onset (negative lags), there are notable upper stratospheric internal subspace anomalies consistent with the 'vortex preconditioning' often observed prior to SSWs (McIntyre 1982, Albers and Birner 2014), and after onset (positive lags), significant internal subspace anomalies exist within the troposphere and lower stratosphere, discussed below.
To explore the dynamics of downward propagating NAM anomalies, we constructed NAM composites from the sum of all negative and positive NAM events drawn from the LIM climate model run (3097 total events in roughly equal proportions, which can be summed because of the linearity of the LIM). These composites replicate the key features of observed NAM events (cf figures 2(a) and 1(a), (d)), including the differences between the stratosphere-SST (cf figures 2(b) and 1(b), (e)) and internal (cf figures 2(c) and 1(c), (f)) subspace composites, suggesting that these are not artifacts imposed by the filtering technique. The LIM composites are much smoother than the observed NAM composites, which exhibit the familiar 'dripping paint' features, mostly due to the much larger sample size the LIM composite represents; composites of smaller LIM subsamples also have 'drips' (not shown).
There are two key features of the observed composite that the LIM climate run does not reproduce (indicated by stippling in figure 1, based on Monte Carlo-based resampling of the LIM climate run, see supplement). First, the upper stratospheric onset for the observed negative NAM composite is more 'sudden' , with nearly vertical anomaly tilt and slightly more intense peak amplitude (cf figures 1(a) and 2(a)). This is also evident in the stratosphere-SST subspace alone (cf figures 1(b) and 2(b)). The LIM's inability to simulate these details could be due to nonlinear SSWs (Birner and Albers 2017) occurring during negative but not positive events (Gerber and Martineau 2018). Alternatively, such nonlinear behavior (i.e. asymmetry between negative and positive NAM events) could be consistent with a LIM whose noise amplitude matrix B includes a linear dependence upon the state (i.e. 'correlated additive-multiplicative noise' , Sardeshmukh and Penland 2015, Martinez-Villalobos et al 2019), yielding linear deterministic dynamics but non-Gaussian distributions. The second key difference is the internal subspace anomaly centered just above the tropopause for the first 10 d after lag 0 (figures 1(c) and (f)), which appears to strengthen the drips within the downward propagating stratosphere-SST subspace anomalies. Its location and timing suggest some combination of eddy feedbacks (Song and Robinson 2004, Hitchcock and Simpson 2014, 2016 and lower stratospheric isentropic mixing (de la Ca´mara et al 2018), both of which may be nonlinear even on the coarse-grained weekly time scale.
Within the stratosphere-SST subspace, there is one eigenmode of particular interest that we call the 'downward-propagating' NAM eigenmode. Unique among the subspace's eigenmodes, it has almost all of its standardized amplitude in the stratospheric portion of the LIM state vector (nearly 80%) and no amplitude in either tropical SSTs or diabatic heating (see figure S1 for additional details). Notably, the downward propagating anomaly seen in the stratosphere-SST subspace portion of the LIM composite (figure 2(b)) is almost entirely due to this single downward-propagating eigenmode alone (figures 3(a) and (b)).
Are tropical SSTs and heating therefore unimportant for surface-based NAM/NAO variability overall? Certainly not. To see this, we constructed a 'troposphere-based' NAM composite whose onset is located at 300 hPa (figures 2(d)-(f)) rather than at 10 hPa (figures 2(a)-(c)). The stratosphere-SST subspace still explains most of this composite, and all of it beyond +15 d (figures 2(e) and (f)). However, the downward-propagating stratospheric eigenmode is relatively weaker (figure 3(c)), while the remaining stratosphere-SST eigenmodes, predominantly tropospheric in character ( figure 3(d)), dominate the composite. Consequently, the stratosphere-based and troposphere-based NAM events have distinctly different surface patterns, with the former having a predominantly Atlantic signature (figure 4(a)) (Butler et al 2014, Hitchcock andSimpson 2014), and the latter having a more annular appearance ( figure 4(b)) (Thompson and Wallace 2000). These differences result from the downward-propagating eigenmode having near zero tropical characteristics (figure 4(c)) and therefore no Pacific basin response, while the troposphere-based NAM composite is dominated by stratosphere-SST eigenmodes with non-canonical ENSO-like (Capotondi et al 2015) SST anomalies (figure 4(d)) and a western tropical Pacific heating anomaly that decays prior to onset (not shown) (Newman and Sardeshmukh 2008).
To gauge the importance of stratosphere-SST subspace dynamics to overall variability, we compare the NAO time series to its internal and stratosphere-SST subspace components (figure 5). While the internal subspace NAO time series is well correlated with the total NAO (0.75, figure 5(a)), the stratosphere-SST subspace NAO time series is not (0.37, figure 5(b)); results are similar when the time series are smoothed with an additional lowpass temporal filter (not shown). Also, while the total and internal subspace time series have nearly identical power spectra, the stratosphere-SST subspace also has substantial power, especially at lower frequencies ( figure 5(c)). In fact, the sum of the subspace variances is 146% of the total variance. This is a measure of the importance of nonnormality to NAO amplification, since some of this internal subspace variability acts to cover up (destructively interfere with) slowly evolving stratosphere-SST subspace anomalies, with Figure 3. Contribution of the 'downward-propagating' NAM eigenmode and the remaining stratosphere-SST eigenmodes to the total stratosphere-SST subspace NAM composites. Stratosphere-based NAM composite contributions from the (a) 'downward-propagating' NAM eigenmode only and (b) the remaining stratosphere-SST subspace eigenmodes; note that these two panels sum to figure 2(b). Troposphere-based NAM composite contributions from the (c) 'downward-propagating' NAM eigenmode only and (d) the remaining stratosphere-SST subspace eigenmodes; note that these two panels sum to figure 2(f). As in figure 1, contour values larger than ±0.6 are purple/red, with white lines denoting values beginning at ±1 in intervals of 0.5; the thin vertical and horizontal lines denote lag-zero and the nominal tropopause, respectively. The 'downward-propagating' NAM eigenmode is denoted by the star markers in figures S1(a) and (b), while the remaining stratosphere-SST eigenmodes are the remaining eigenmodes within the blue shaded region of the total stratosphere-SST subspace in figure S1(b). Units in all panels are standard deviations. total anomaly growth occurring as the internal subspace anomalies subsequently decay or evolve away (Farrell and Ioannou 1996).
Finally, we quantify how the two subspaces contribute to enhanced subseasonal forecast skill by first identifying higher skill forecasts based on the LIM's expected skill (section 2.1.3, equation (2); see also figure S5), calculated separately for each forecast day and lead. For both the LIM and IFS (figure 6(a)), NAO forecast skill for leads of 3-4 weeks averages about 0.5-0.6 for the 15% of all hindcasts expected by the LIM to have highest skill, which is significantly greater than the skill of the remaining 85%. Next, we created two additional sets of LIM hindcasts where the forecast initial conditions were filtered to include either the internal (e.g. green line in figure 5(a)) or the stratosphere-SST (e.g. blue line in figure 5(b)) subspace components. While the internal subspace provides the greater contribution to high skill forecasts at short leads ( figure 6(b)), by leads of three weeks nearly all of the forecast skill is found within the stratosphere-SST subspace alone.

Discussion
We find that skillful subseasonal NAO forecasts arise due to a relatively small eigenmode subspace dominated by stratospheric and tropical SST-related  , overlaid with the same time series filtered to only include variance from the internal (green) or stratosphere-SST (blue) subspaces, respectively. (c) Power spectra of the NAO index for unfiltered JRA-55 data and for the same data filtered to include only variance determined from the internal (green) or stratosphere-SST (blue) subspaces; note that the truncated frequency/period that is plotted reflects the DJFM time window used for the calculation. Also, since the subspaces are not orthogonal, the variances are not additive (see section 2.1.2 for details). processes ( figure 6(b)). However, overall NAO variability is dominated by the internal subspace (figure 5), which is largely unpredictable beyond two weeks ( figure 6(b)). Identifying higher skill subseasonal forecasts, therefore, requires predicting when stratosphere-SST subspace anomalies are large relative to initial internal subspace 'noise' , which the LIM's signal-to-noise ratio is able to do (figure 6). Indeed, its identification of high skill NAO forecasts (skill >0.6 for 15% of all weeks 3 and 4 forecasts) notably improves upon existing methods of identifying conditional skill. For example, prior knowledge of the MJO can boost IFS NAO forecast skill above 0.5 only for forecast leads a little past two weeks (Ferranti et al 2018). Conditioning NAO forecasts on SSWs (Tripathi et al 2015) yields skill above 0.5 through forecast week 3, but only for a smaller fraction of all DJF forecasts (∼5%).
Even though the predictable portion of the subseasonal NAO lies primarily within the stratosphere-SST subspace, quantifying its potential predictability still requires understanding the dynamics of the internal subspace, whose interference with the stratosphere-SST signal is critical to predictable  (a)), but with skill from hindcasts given initial conditions filtered to only include the stratosphere-SST (blue, see figure 5(b)) or internal (green, see figure 5(a)) subspace portions of the LIM state vector; 95th percentile bootstrap confidence intervals are shown as transparent shading. NAO hindcast skill from the LIM and IFS is calculated by linear correlation between each model's predicted time series and the JRA-55 verification time series. Hindcasts include initialization dates on or after 1 December that also have verification dates on or before 15 March and are sampled for the common twice-weekly dates available for the IFS, covering the years 1997-2016. LIM hindcasts are ten-fold cross-validated (as in Albers and Newman 2019; see supplement for details). IFS hindcasts are from model version CY43R1/R3, operational in 2017, obtained from the Subseasonal-to-Seasonal Prediction Project database (Vitart et al 2017). IFS anomalies are computed by removing the lead dependent and model specific climatologies, which also serves as a mean bias correction. In identical fashion to the LIM hindcasts, the IFS hindcasts are computed as 7 d running mean anomalies and interpolated to an area-averaged five-degree grid. Confidence intervals for LIM and IFS hindcasts are calculated via bootstrapping (see supplement for details).
anomaly growth. Similarly, our focus on identifying the predictable NAO signal should not obscure the need to diagnose the unpredictable noise, which is also important to correctly estimating the signal-tonoise ratio (Scaife andSmith 2018, Strommen 2020) and which the internal subspace must dominate, given its importance to the overall variance. Note that our approach is potentially relevant to understanding the predictability of other modes of variability that may also result from nonnormal dynamics, including the Pacific-North America pattern (Henderson et al 2020) and the Pacific Decadal Oscillation (Newman et al 2016).
Within the stratosphere-SST subspace, we find two distinct types of dynamical modes: a single 'downward-propagating' stratospheric eigenmode, which has no tropical SST component and primarily an Atlantic surface signature, and a set of eigenmodes with joint stratospheric and SST components and joint Pacific-Atlantic surface signatures. It is notable that the downward-propagating eigenmode appears to account for nearly all of the NAM/NAO-related variability first identified by BD2001. The downwardpropagating eigenmode itself likely represents several systematically co-evolving physical processes. For example, its stratospheric portion is consistent with the polar night jet oscillation (PJO) (Kuroda andKodera 2004, Kohma et al 2010). However, the PJO is unlikely to extend deeply into the troposphere (Hitchcock and Shepherd 2013, Hitchcock et al 2013), suggesting that the tropospheric portion of this eigenmode is related to nonlinear eddy feedback processes, which both previous LIMs (Winkler et al 2001, Newman et al 2003 and comprehensive climate model studies (Hitchcock andSimpson 2014, 2016) have suggested may be effectively linear for suitable time-averaging windows. While disentangling the physical processes responsible for the other stratosphere-SST eigenmodes requires future study, these eigenmodes are relevant to the current debate regarding the relative importance of tropospheric versus stratospheric ENSO teleconnections Even if weekly averaged downward-propagating NAM events entirely resulted from effectively linear dynamics, the consequences of nonlinearity on shorter (e.g. daily) time scale NAM predictability could likely still be quite large. For example, like dynamic forecast models (Kim and Flatau 2010), the LIM is unable to predict most SSWs more than a week in advance. However, once the LIM is initialized with the SSW, its 4 week lead forecasts often show enhanced skill for many consecutive forecast cycles. That is, the fundamentally nonlinear 'sudden' initiation of SSWs may not be predictable on subseasonal forecast leads, but once the SSW has begun, the subsequent downward propagating NAM anomaly evolves with predictably linear dynamics.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: ftp: //ftp2.esrl.noaa.gov/Projects/LIM/Weekly/Albers NewmanERL2020/.