Predicting the MJO using interpretable machine-learning models

THE MADDEN-JULIAN OSCILLATION
The Madden-Julian oscillation (MJO) is one of the most enduring scientific mysteries in tropical meteorology, and a key source of sub-seasonal to seasonal predictability around the globe. MJO events are planetary-scale, coupled, coherent systems of convection (e.g. thunderstorms and rain) and circulation (e.g. zonal and meridional wind) in the tropics.

Key features of the MJO include:
Planetary spatial scale: MJO wind signals can extend up to 10,000 km, whereas convective signals (like the rain seen in Figure 1) are on the order of several thousand kilometers.
Intraseasonal timescale: The MJO oscillates with a variable period of 20-90 days.
Eastward propagation: The MJO propagates east with an average speed of 5 m/s.

Tracking the MJO
We use an MJO index called the Real-time Multivariate MJO index (RMM [2]) to track the strength and location of the MJO. RMM is formed using daily mean values of three variables:
outgoing longwave radiation (OLR: a proxy for convection)
zonal wind at 850 hPa (i.e. in the lower troposphere)
zonal wind at 200 hPa (i.e. in the upper troposphere)
To form RMM, these three variables are processed to highlight MJO signals in the tropics, then an empirical orthogonal function analysis is used to create two indices known as RMM1 and RMM2 (see [2] or reach out for details!).
RMM can be used to assess two aspects of the MJO:
MJO strength: The RMM amplitude, or distance from the origin to a point in RMM phase space, is a measure of MJO strength. Typically an amplitude of 1 is used as a threshold for an active MJO (see Fig. 2). Amplitude is measured as: amplitude = √(RMM1² + RMM2²).
MJO location (a.k.a. MJO phase): The phase of the MJO is an integer from 1 to 8, corresponding to octants in RMM phase space. The MJO phase tracks the physical location of the MJO (see Fig. 2).
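As a concrete sketch, the amplitude and phase can be computed from an (RMM1, RMM2) pair as follows. The octant numbering here follows the Wheeler-Hendon convention (phases 8 and 1 on the negative-RMM1 side of the diagram); `rmm_phase` is an illustrative helper, not code from [2]:

```python
import math

def rmm_amplitude(rmm1, rmm2):
    """MJO strength: distance from the origin in RMM phase space."""
    return math.hypot(rmm1, rmm2)

def rmm_phase(rmm1, rmm2):
    """MJO phase 1-8: the octant of RMM phase space containing the point.

    The angle is measured counterclockwise from the positive RMM1 axis;
    phase 1 begins at the negative RMM1 axis (180 degrees).
    """
    angle = math.degrees(math.atan2(rmm2, rmm1)) % 360.0
    return (int(angle // 45) + 4) % 8 + 1

# An active MJO over the Indian Ocean (phases 2-3) might look like:
amp = rmm_amplitude(-0.5, -1.2)   # ~1.3, above the active threshold of 1
```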

MODELING THE MJO WITH MACHINE LEARNING
We develop two types of machine-learning models to predict the strength and/or location of the MJO: one type using classification neural networks, and a second using regression neural networks. Both model types take as input maps of tropical variables, and output information on the strength and/or location of the MJO N days into the future (here N varies between 0 and 20 days). The first ML model we develop predicts the RMM phase (i.e. location) of the MJO. This "classification model" (Fig. 3) outputs the confidence that the MJO will be in each phase, the prediction being the phase with the highest confidence. The second ML model we develop predicts the strength and location of the MJO as quantified by the RMM index (see Section 1). This "regression model" outputs values of RMM1 and RMM2.
While the regression model outputs more information (we can infer both the strength and location of the MJO from RMM1 and RMM2), the classification model is more suitable for the interpretability methods in Section 4. Note that the classification model framework can be adapted to include information about the MJO amplitude (for example, by splitting each MJO phase into separate "strong" and "weak" bins). This will be explored as we continue our work. We show in Section 3 that performance is relatively comparable between models, to the extent they can be compared.

Scroll down to explore:
additional information about the input data
further technical model details

Additional Input Data Information
Table 1 provides more details regarding data sources, variables, and resolution (data documentation in [3]; Fig. 9).

Additional Model Information
Input data pre-processing: All input variables are processed the same way. At each grid point, the daily climatological mean, the first four harmonics of the seasonal cycle, and the mean of the previous 120 days are removed (analogous to RMM pre-processing). Unlike RMM, no latitudinal averaging is performed. The mean and standard deviation of the variable are then calculated over the tropics, and the data is normalized by subtracting the mean and dividing by the standard deviation.
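The pre-processing steps above can be sketched as follows. This is a minimal numpy illustration, assuming `x` is a [time, gridpoint] array and `doy` holds the day-of-year of each sample; the actual pipeline may differ in detail:

```python
import numpy as np

def remove_clim_and_harmonics(x, doy, n_harm=4):
    """Remove the time mean and the first n_harm harmonics of the seasonal
    cycle at each grid point via a least-squares fit (x: [time, gridpoint])."""
    t = 2.0 * np.pi * doy / 365.25
    cols = [np.ones_like(t)]
    for k in range(1, n_harm + 1):
        cols += [np.cos(k * t), np.sin(k * t)]
    A = np.stack(cols, axis=1)                    # design matrix (time, 2*n_harm+1)
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)  # one fit per grid point
    return x - A @ coef

def remove_previous_mean(x, window=120):
    """Subtract the mean of the previous `window` days at each time step."""
    out = x.astype(float).copy()
    for i in range(1, len(x)):
        out[i] = x[i] - x[max(0, i - window):i].mean(axis=0)
    return out

def standardize(x):
    """Normalize by the mean and standard deviation over all tropical points."""
    return (x - x.mean()) / x.std()
```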
Input data: Individual models are trained on input of daily maps of between 1 and 3 variables chosen from the above. The model is given daily maps on a particular day ("lead time 0"), and predicts either RMM1/2 (regression) or RMM phase 1-8 (classification) N days into the future ("lead time N"). Different models are trained for different lead times.

Model settings:
The model is a fully connected, feed-forward neural network. The input vector is typically of length 17 lat. pts. x 144 lon. pts. = 2,448 pts. per variable (except SST). The hidden layer has 8 nodes and uses a "ReLU" activation function, with ridge regularization to help prevent overfitting. The ridge parameter is manually tuned for each model. More complex architectures were considered, but a simple single hidden layer was found to perform as well as or better than larger/deeper networks. The loss function is mean square error (regression) or categorical cross entropy (classification).
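A minimal numpy sketch of the classification network's forward pass and loss under the settings above. The weight initialization and ridge value are illustrative, and the gradient-descent training loop is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: three input variable maps, 8 hidden nodes, 8 MJO phases
n_in, n_hidden, n_phase = 3 * 2448, 8, 8
W1 = rng.normal(0.0, 0.01, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.01, (n_hidden, n_phase)); b2 = np.zeros(n_phase)

def forward(x):
    """Single hidden layer (ReLU) followed by a softmax over the 8 phases."""
    h = np.maximum(0.0, x @ W1 + b1)
    z = h @ W2 + b2
    p = np.exp(z - z.max(axis=-1, keepdims=True))
    return p / p.sum(axis=-1, keepdims=True)   # confidence per phase

def loss(x, y_onehot, ridge=1e-4):
    """Categorical cross entropy plus an L2 ("ridge") penalty on the weights."""
    p = forward(x)
    ce = -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=-1))
    return ce + ridge * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
```

The prediction is then the phase with the highest softmax confidence, e.g. `forward(x).argmax(axis=-1) + 1`.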

MACHINE LEARNING MODEL PERFORMANCE
Scroll down to explore:
Regression & classification ML model performance
Sensitivity to variable choice and data amount
Figure 5 shows performance of a series of ML regression models compared to a "persistence" statistical baseline model (see below). We found it improved the ML model skill substantially to train models separately for two halves of the year, "winter" and "summer" (see Fig. 5 caption).
Key aspects of Figure 5: The ML regression model outperforms the statistical baseline model (except at very short lead times), with the winter model showing skill out to ~16-17 days.
Skill is lower in summer, at ~10 days.
The amplitude error indicates the ML model predicts weaker-than-observed MJO amplitudes, with the error growing worse at longer lead times.
State-of-the-art dynamical models (like those participating in the S2S Project [8], not shown) have MJO skill that varies from 10 to 38 days. The S2S Project models show an average of ~20 days in all seasons, with somewhat higher skill in winter ([8][9]).

Baseline "Persistence" Model
We compare the ML models to a "persistence" baseline model. First we calculate the average one-day change in RMM amplitude and RMM phase (an amplitude change of ~ -2e-5 and a phase angle change of ~0.1 radians) from the entire observed period.
At lead time 0, the persistence model is set equal to the observed RMM values. At later leads, the persistence baseline model advances the day-0 RMM values by adding the average amplitude and average phase angle change for each day.
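A sketch of this baseline. The drift constants are the illustrative averages quoted above; in practice they are estimated from the full observed record:

```python
import numpy as np

# Average daily drifts in RMM space (illustrative values quoted in the text)
D_AMP = -2e-5     # mean daily change in RMM amplitude
D_PHASE = 0.1     # mean daily change in phase angle (radians)

def persistence_forecast(rmm1_0, rmm2_0, lead_days):
    """Advance the day-0 RMM point by the mean daily amplitude and
    phase-angle changes (lead_days = 0 returns the observed values)."""
    amp = np.hypot(rmm1_0, rmm2_0) + D_AMP * lead_days
    ang = np.arctan2(rmm2_0, rmm1_0) + D_PHASE * lead_days
    return amp * np.cos(ang), amp * np.sin(ang)
```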

ML Classification Model Performance
Overall skill in this model is assessed via the model accuracy: the percentage of predictions where the MJO was placed in the correct phase.We also show the accuracy of the model when the MJO was placed either in the correct phase, or was off by 1 phase.
Figure 6 compares the baseline model, the ML classification model, and the ML regression model. Key aspects of Figure 6: The classification and regression models outperform the baseline model after ~5 days. The regression model slightly outperforms the classification model at most lead times.
The ML model accuracy is lower in summer, especially at longer leads where the decrease relative to winter is ~7-8%.
The ML models show substantially higher skill when the accuracy plus-or-minus one phase is considered. This indicates that most misses, especially at shorter leads, tend to be within one phase.
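The two accuracy measures can be written compactly as below (a small illustrative helper; note the phase distance is circular, so phases 8 and 1 are adjacent):

```python
def phase_distance(pred, obs):
    """Circular distance between MJO phases 1-8 (phases 8 and 1 are adjacent)."""
    d = abs(pred - obs) % 8
    return min(d, 8 - d)

def accuracy(predicted, observed, tolerance=0):
    """Fraction of predictions within `tolerance` phases of the observed phase.

    tolerance=0 is exact accuracy; tolerance=1 allows a one-phase miss.
    """
    hits = sum(phase_distance(p, o) <= tolerance
               for p, o in zip(predicted, observed))
    return hits / len(predicted)
```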

Initially Strong Versus Weak MJO Events
In Figure 7, we examine the classification model's skill in predicting strong vs. weak MJO events. Note here the same model is used for weak, strong, and all MJO events at a given lead time; the model is not trained separately each time. Key aspect of Figure 7: Strong MJO events are more accurately predicted than weak events out to lead times of 20 days.

Sensitivity of Results to Variable Choice
We examined many additional combinations of variables as input to our model, an aspect of this work that is ongoing.Below we show results from a variety of classification models trained with different variables.We focus on winter and lead times of 0, 5 and 10 days.
The rows show a series of 1-variable input models (top panels), 2-variable input models (central panels) and 3-variable input models (bottom panel).Shorthand in the legend is defined in the caption.
Key aspects of Figure 8: In general, adding more variables improves performance, up to 3 variables. However, a series of 4+ variable combinations (not shown) found no clear skill improvement relative to the 3-variable models.
1-variable models: Circulation-based models (e.g. those involving zonal wind) tend to perform best, especially at short leads. This is consistent with literature on RMM as a primarily circulation-driven index.
3-variable models: Skill at leads of 5+ days is more tightly clustered than for the 1- or 2-variable models, suggesting these models tend to find MJO signals more consistently. The best performing models are those including u850 and u200 plus one additional variable, such as OLR, q850, or SST.

Sensitivity of Results to Data Amount
Using ERA-20C data, we explored how the amount of training data impacts the results.
Figure 9 shows the skill of the winter regression model. The model here takes OLR, u850, and u200 as input. The RMM index is formed as in [2], but using ERA-20C OLR, u850, and u200 projected onto the observed EOFs. Shown is the BCC at leads 0-15 (different colors) as a function of input data amount in days. Key aspect of Figure 9: More input data improves model skill, but the improvement tends to plateau. For shorter leads the plateau begins earlier, but the amount of data encompassed by modern reanalyses appears sufficient to reach a relatively high level of skill at all leads.

NOVEL MACHINE LEARNING INTERPRETABILITY
Recent advances in interpretable artificial intelligence allow us to examine not only the skill of our ML models, but how model skill relates to the input data.

Layer-wise Relevance Propagation (LRP)
An ML interpretation technique recently shown to be useful in atmospheric science (including with regards to the MJO, e.g. [10][11]) is known as layer-wise relevance propagation (LRP; [12][13][14]). LRP allows us to visualize, for a particular prediction made by a neural network, which aspects of the input maps were most relevant in making that prediction, as illustrated in the schematic below.
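As a rough sketch of the idea, LRP redistributes a network's output backward through each layer, in proportion to each input's contribution to that layer. The snippet below applies the basic epsilon rule to a toy bias-free network; the cited implementations [12][13][14] handle biases and additional propagation rules:

```python
import numpy as np

def lrp_dense(a, W, R_out, eps=1e-6):
    """Propagate relevance back through one dense layer (epsilon rule):
    each input's share is proportional to its contribution z_ij = a_i * w_ij."""
    z = a[:, None] * W                            # contributions (n_in, n_out)
    denom = z.sum(axis=0)
    denom = denom + eps * np.sign(denom + 1e-12)  # stabilizer
    return (z * (R_out / denom)).sum(axis=1)

# Toy bias-free network: 6 inputs -> 4 ReLU hidden nodes -> 2 outputs
rng = np.random.default_rng(1)
x = rng.normal(size=6)
W1 = rng.normal(size=(6, 4))
W2 = rng.normal(size=(4, 2))
h = np.maximum(0.0, x @ W1)
out = h @ W2

# Walk relevance back from the output to the input "pixels"
R_hidden = lrp_dense(h, W2, out)
R_input = lrp_dense(x, W1, R_hidden)
```

By construction, the total relevance is approximately conserved at each layer, so `R_input.sum()` is close to `out.sum()`; large entries of `R_input` mark the inputs the prediction relied on most.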

LRP to Identify the MJO
Below we show plots similar to Figure 10, but for a lead time of 10 days. Key aspects of Figure 13: In phase 1, note that low-level winds and OLR on either side of the Maritime Continent (with opposite signs) are important for determining MJO phase 10 days later. In contrast to the lead-0 plot, low-level winds around Central America are not relevant to MJO prediction.
In phase 1, upper level winds in the northern Indian Ocean are most relevant, as well as winds off the west coast of South America that switch sign over the 10-day span.

ABSTRACT
The subseasonal Madden-Julian oscillation (MJO) is among the most important modes of tropical variability on the planet and is a dominant driver of subseasonal-to-seasonal (S2S) prediction globally. The past decade has seen substantial advances in MJO prediction using dynamical forecast models, which now routinely outperform traditional statistical MJO forecasts (e.g. multiple linear regression models). At the same time, an increasing body of literature has demonstrated that machine-learning methods represent a new frontier in Earth science, opening the door to more advanced statistical forecast models of the MJO.
In this study, we explore whether state-of-the-art machine-learning methods can be used to make real-time MJO forecasts that outperform traditional statistical models and do comparably well to dynamical models. In particular, we utilize neural networks trained on observational tropical fields to attempt to make skillful forecasts of MJO convection out to several weeks lead time. Through contrasting the machine-learning models' behavior with simpler statistical models and dynamical forecast models, we explore the advantages and disadvantages of statistical versus dynamical forecasts.
A novel aspect of our analysis is the use of cutting-edge techniques that allow us to visualize how our neural network models make their predictions. These techniques, such as layer-wise relevance propagation, can lead to new insights into regions of MJO predictability, allowing us to better interpret sources of MJO prediction skill within the machine-learning model. We further diagnose whether our machine-learning models contain well-known aspects of MJO prediction found in dynamical models, such as an increase in prediction skill during boreal winter or during certain phases of the stratospheric quasi-biennial oscillation.

Figure 2
Figure 2 shows an example of RMM phase space.

Figure 2.
Figure 2. RMM phase space, from [2]. The x- and y-axes are, respectively, RMM1 and RMM2; individual days are marked with a dot. Note the MJO phases and corresponding physical locations are labeled.

Figure 3.
Figure 3. Schematic of one of the two neural network setups used in this study: the "classification model". The first layer takes normalized input maps of tropical variables. The next "hidden" layer has 8 nodes and a "ReLU" activation function, with ridge regularization to prevent overfitting. The final layer returns the likelihood that the MJO is in a particular phase.

Figure 4.
Figure 4. Similar to Figure 3, but for the "regression model". The input and hidden layers are the same (though the ridge regularization is retuned). The final layer is different: here the outputs are real-number values of RMM1 and RMM2.

Figure 5.
Figure 5. Skill (left) and amplitude error (right) of the persistence baseline model and ML regression model. Separate ML models are trained for winter (October-March) and summer (April-September) at each lead time. The black dashed line is a BCC of 0.5, a common threshold for skill (e.g. [7]). The amplitude error is the forecast minus observed amplitude: negative values indicate a model is too weak.

Figure 6.
Figure 6. Similar to Figure 5, but showing accuracy for the baseline (line), ML regression (green/red dots), and ML classification (blue/orange dots) models. The left panel is the average accuracy across all eight phases. The right panel is the accuracy of all eight phases, allowing for a plus/minus one phase error. The black dashed line is what a random guess would return (e.g. 1/8 in the left panel and 3/8 in the right panel).

Figure 7.
Figure 7. Similar to Figure 6, but only for the ML classification and baseline models in winter. The red and blue curves/dots indicate, respectively, the model performance when the initial amplitude of the MJO is strong (i.e. greater than amplitude 1) or weak (i.e. less than amplitude 1). Black is all dates.

Figure 8.
Figure 8. Similar to Figure 6, but only for the ML classification models in winter. Here separate models are trained with different input variables at each lead time, including a range of 1-variable models (top row), 2-variable models (middle row), and 3-variable models (bottom row). Black bars in the middle & bottom rows indicate the spread of skill from the 1- or 2-variable models shown in the other plots. The legend indicates the variables in each model.

Figure 9.
Figure 9. The BCC of a winter ML regression model, trained on ERA-20C data (see Table 1) with OLR, u850, and u200 input. Lead time is indicated by the color of the dot. The x-axis is the amount of training data, in days. The training data span always ends on Dec. 31, 1999. We vary the start date with increasing data amount, from the latest start date of June 1, 1998 (274 days) to the earliest of June 1, 1901 (17,952 days); recall only winter days are selected. The vertical dashed line is approximately the number of days available from most modern reanalyses (e.g. beginning training on June 1, 1979).

Figure 10.
Figure 10. Input data (left) and relevance heat maps (right) composited from the winter classification model at lead 0. All correct predictions placing the MJO in Phase 1 from the training and validation data are used in the composite of each image. The LRP maps from all samples are normalized prior to the composite so that the maximum relevance is 1, and show where the model focuses in making its prediction.

Figure 10
Figure 10 shows composites of the input variables and the LRP relevance heatmaps (see next subsection) for the OLR-u850-u200 classification model when the output was a Phase 1 MJO. Brighter colors on the right indicate the regions the model focuses on in making its prediction.

Figure 11.
Figure 11. A schematic diagram of LRP, from [14]. The top row represents a neural network trained to take some input and predict an output. LRP then propagates the prediction backward through the network, assigning a relevance to each part of the input.

Figure 12.
Figure 12. As in Figure 10, but for Phase 6 predictions.

Figure 13.
Figure 13. Similar to Figures 10 and 12, but for lead times of 10 days. The top three rows are Phase 1, and the bottom three are Phase 6. The left-hand column shows the input composite (i.e. the input 10 days prior to the correct prediction of the MJO phase). The center column shows relevance plots, and the right-hand column shows the target composite (i.e. the composite image 10 days after the left-hand column).

Table 1 . List of datasets, variables, and resolutions for this study. For the SST fields, data is only over ocean points, and points over land are not included. The right-most column indicates the input data used for training and validating the model.
Further analysis comparing relevance maps for different target MJO states, for cases when the model is wrong, and as they relate to the model's confidence in its predictions will be explored in future work. Please feel free to reach out with more questions or comments!