Improving the prediction of the Madden-Julian Oscillation of the ECMWF model by post-processing

. The Madden-Julian Oscillation (MJO) is a major source of predictability on the sub-seasonal (10- to 90-days) time scale. An improved forecast of the MJO, may have important socioeconomic impacts due to the influence of MJO on both, tropical and extratropical weather extremes. Although in the last decades state-of-the-art climate models have proved their capability for forecasting the MJO exceeding the 5 weeks prediction skill, there is still room for improving the prediction. In this study we use Multiple Linear Regression (MLR) and a Machine Learning (ML) algorithm as post-processing methods 5 to improve the forecast of the model that currently holds the best MJO forecasting performance, the European Centre for Medium-Range Weather Forecast (ECMWF) model. We find that both MLR and ML improve the MJO prediction and that ML outperforms MLR. The largest improvement is in the prediction of the MJO geographical location and intensity.

the MJO prediction (Zhang et al., 2013;Jiang et al., 2020). In particular, an improvement of the prediction skill when MJO crosses the Maritime Continent (MC) barrier (Wu and Hsu, 2009;Kim et al., 2016;Barrett et al., 2021) will be of practical importance due to the influence of MJO on ENSO, as an improved MJO prediction may contribute to improving the prediction of ENSO.
Machine Learning (ML) algorithms are being extensively used in many fields, and they are gaining a foothold in weather and 25 climate forecasts (O'Gorman and Dwyer, 2018;Nooteboom et al., 2018;Dijkstra et al., 2019;Ham et al., 2019;Dasgupta et al., 2020;Tseng et al., 2020;Gagne II et al., 2020;) among many others. Although MJO predictions obtained using ML models do not outperform dynamical models Martin et al., 2021a), a hybrid approach, combining dynamical models and ML techniques, may improve the results. In this way, it is possible to use dynamical models that have been developed across decades, based on physical phenomena, in combination with data-driven ML techniques, an approach 30 that has shown its ability to reduce the gap between observations and dynamical models' forecasts (Rasp and Lerch, 2018;McGovern et al., 2019;Scheuerer et al., 2020;Haupt et al., 2021;Vannitsem et al., 2021).
Recently, it has been shown that bias correction in linear dynamics (Wu and Jin, 2021) and the use of Deep Learning (DL) (Kim et al., 2021) can improve the MJO prediction. Specifically, Kim et al. (2021) have shown that the performance of poor models becomes comparable to that of the best model after DL-correction. Here we deal with a related but different problem: can we 35 use a rather simple ML algorithm -a single-hidden layer Feed-forward Neural Network (FFNN)-to improve the forecast of the model that provides the best MJO prediction?
Currently, the best forecast dynamical model in terms of MJO prediction skill is the one developed by the European Centre for Medium-Range Weather Forecast (ECMWF) (Jiang et al., 2020). Therefore, in this study we attempt to improve ECMWF forecasts by using an ANN. We also analyze the performance of Multiple Linear Regression (MLR) as a baseline post-processing 40 tool.
To quantify the forecast skill we use four metrics, namely the bivariate correlation coefficient (COR), the bivariate rootmean-square error (RMSE), with threshold values COR=0.5 and RMSE=1.4, as well as the amplitude error and the phase error (Rashid et al., 2011).
We apply the post-processing ML and MLR techniques to the ensemble mean of ECMWF, and we show that ML outperforms 45 MLR, being able to correct the ECMWF MJO forecasts, and the improvement lasts for longer than four weeks. In particular, ML improves the prediction of the MJO over the MC and its amplitude, while the phase errors obtained with the two postprocessing techniques are similar. For this study, we use the Real-time Multivariate MJO (RMM) index (Wheeler and Hendon, 2004) as labels for the supervised learning method, which is used to characterize the MJO geographical position and intensity. The first two principal components of the combined empirical orthogonal functions (EOFs) of outgoing longwave radiation (OLR), zonal wind at 200 and 850 hPa averaged between 15 • N and 15 • S are labeled RMM1 and RMM2. With a polar transformation, it is possible to define the MJO phase and amplitude. The phase is divided into 8 classes, each corresponding to a different sector of the phase diagram defining 55 the observed MJO life cycle. The amplitude, describing the MJO intensity, when smaller than 1 defines a non-active MJO. The ERA5 RMM1 and RMM2 from 13th June 1999 to 29th June 2019 were downloaded from ecm (2021). This time window is selected to match the ECMWF reforecasts, presented in the previous section.

ECMWF RMM reforecasts
The samples used as input for the ANN and to assess the model performance, are ECMWF reforecasts with Cyrcle 46r1 freely 60 available from ecm (2021). This dataset is composed of 110 initial dates per year for 20 years, between the 13th June 1999 and the 29th June 2019. In total there are 2200 starting dates, from which a 46-lead-days prediction is available. The dataset provides the prediction of four variables: the first two principal components of the RMM index, and their polar transformation.
For each starting day and variable there are 12 time series of 46 points. One is the controlled forecast (cf) corresponding to a forecast without any perturbations, then there are 10 perturbed forecasts members (pf) which have slightly different initial 65 conditions from the cf to take into consideration errors in observations and the chaotic nature of weather. Finally there is the ensemble mean (em), which corresponds to the mean of the 11 members (cf + 10 pf). In this particular study, we made use solely of the em data

Prediction skill
To know how good a model is in predicting, we present here the metrics that will be used. For sake of comparison, we use the 70 same metrics adopted in Kim et al. (2018), which are adapted from Lin et al. (2008); Rashid et al. (2011), where they define the COR and RMSE as follows: where a 1 (t) and a 2 (t) correspond to the observed RMM1 and RMM2 at time t, while b 1 (t, τ ) and b 2 (t, τ ) will be the respective In this study we use COR=0.5 and RMSE=1.4 as prediction skill thresholds (Rashid et al., 2011). The RMM prediction skill is defined as the time in which the COR takes a value below 0.5 and RMSE gets above 1.4. For a given lead time, the COR and 80 RMSE are the average value up to that lead time.

Amplitude and phase error
To characterize the MJO it is convenient to perform a change of coordinates from cartesian to polar (RMM1, RMM2)→(A, φ).
The MJO amplitude A(t), describing its intensity, can be written as:

85
while the MJO phase φ(t), describing the geographical position of the enhanced rainfall region center, can be written as: By definition Rashid et al. (2011), the amplitude error for a given lead time E A (τ ) can be expressed as where N represents the number of predicted days, A obs (t) is the observations amplitude at time t and A pred (t, τ ) is the 90 predictions amplitude at time t with a lead time of τ days. The phase error E φ (τ ) is defined by where a 1 (t) and a 2 (t) correspond to the observed RMM1 and RMM2 at time t, while b 1 (t, τ ) and b 2 (t, τ ) correspond to the predicted RMM1 and RMM2 at time t with a lead time of τ days. These two metrics allow us to analyze in more detail the model performance to predict the MJO, in conjunction with COR and RMSE. We use an adaptive number of neurons depending on the number of days we want to forecast. The ECMWF reforecasts provide predictions up to a lead time of 46 days for both RMM1 and RMM2, and we build a different network for each lead time. This 105 means that the number of output neurons N out can fall between 2 and 92 because we use both RMM1 and RMM2.
After selecting the number of output neurons (which is even and in fact defines our lead time, τ = N out /2), we adapt the number of input N in and hidden neurons N h as follows. As input, the networks receive the ECMWF reforecasts, which also limit the number of input neurons N in in the range between 2 and 92. After training the networks with different N in , we found the best result is obtained with N in = N out + 6 with an upper limit of 92. This means that for all lead times τ > 44, 110 N in = N out = 92. For lead times larger than 30-35 days, the prediction skill of the models falls below the thresholds of 0.5 and 1.4 imposed by the COR and RMSE, respectively, and thus, the lead times τ > 44 are not important to predict. Using all 92 inputs, the prediction skill for short lead times slightly decreases. For simplicity, a fixed number of 92 inputs could also be used.
An interpretation of the reason behind this result is that to correct the prediction values for a given day, the future predicted values can help up to some extent. To correct the prediction of a given day, for each RMM we use the predicted values of up 115 to 3 days after that particular day. To avoid overfitting, we want the number of hidden neurons to be relatively small, for this reason after some tests, we select N h = N in /2. The training has been performed over 100 epochs which allows to not overfit the model. The model performance is tested using a walk-forward validation. The procedure is as follows. First, we train the network on an expanding train set, and then test its performance on a validation set that contains the N samples that follow the train set. In our case, we found that the optimal minimum number of samples for the train set, out of 2200 available, is 1700 (∼ 120 17 years). Then, the train set is extended by 100 samples (∼ 1 year) for each run, and validated on the subsequent 200 samples (∼ 2 years). This method of walk-forward validation ensures that no information coming from the future of the test set is used to train the model. Other methods to avoid overfitting could also be used, such as early stopping or drop-out.
MLR is the well-known least squares linear regression where the observed RMMs are a linear combination of the ECMWF predicted RMMs. To compute the MLR we use the Python library scikit-learn (Pedregosa et al., 2011). With MLR we 125 correct the RMMs separately, and apply the same walk-forward validation used for the ANNs.

Results
The first part of this section will be devoted to the results obtained for the MJO amplitude and phase. In the second part we present the prediction skill assessment using the COR 0.5 level, and RMSE 1.4 level as metrics, while in the last part of the section we show how the different forecast methods perform for different MJO initial phases.

130
The results are obtained training the ANN from 13th of June 1999 using a walk-forward validation, and averaging the error obtained by testing over different unseen time windows from 5th December 2014 to 29th June 2019. The size of the windows is defined by the selected number of initial days from which the ECMWF forecast starts. Due to the bi-weekly acquisition of ECMWF, this means that each window of 200 points corresponds to 2 years approximately. Each member of the ensemble over which the average is performed, corresponds to a test set used for the walk-forward validation. Different sizes of the test set 135 between 100 and 500 samples have been tested, leading to prediction skills that vary sensibly. For this reason, it is important to take into account that results may vary depending on the test set and its size, albeit preserving the same general result: the post-processing corrections improve the ECMWF forecasts.
In Fig. 2, we show the error on the MJO amplitude for events starting with an amplitude larger than 1. We can notice an underestimation of the amplitude as expected (Jiang et al., 2020). Nevertheless, the post-processed amplitudes are closer to the 140 observed ones, with respect to the raw ECMWF forecast. The maximum improvement occurs for a lead time of 28 days when the ECMWF-ANN model has a RMSE similar to the RMSE of the uncorrected ECMWF at a lead time of 20 days.
By the definition of the amplitude error, errors of opposite sign could potentially cancel out resulting in misleading conclusions.
For this reason in Fig. 2 we also provide the RMSE of the amplitude error, which shows a similar behavior as before. Both post-processing techniques improve the results, with the ANN bringing the highest benefits in terms of the magnitude of error 145 reduction, and the forecasting horizon of the improvement.
In Fig. 3, we present the MJO phase error. The post-processing techniques provide an improved prediction, during which all three models predict a negative phase. A positive phase error indicates a faster propagating MJO, while a negative error represents a slower propagation. The ECMWF forecast shows an overall slower propagation of the MJO with respect to the observations, and both post-processing corrections provides an increment of the MJO speed prediction. In particular, at the 18 150 days lead time we can notice an increment of the ECMWF phase error, which MLR and ML tend to correct. In Fig. 5, we display the comparison between the observations, the ECMWF forecast, and its corrections, in a Wheeler-Hendon phase diagram for two different starting dates of the same MJO event. The dots are marked every 7 days to identify the weeks.
In the left panel, the 3 weeks prediction starts on the 21st November 2018 and displays its progression from the Western Hemisphere over the Indian Ocean. It is possible to notice that both post-processing techniques display very similar prediction,   Here we presented an example of a strongly active MJO event, where the corrections clearly improve the ECMWF prediction and it is among the best found. All predictions from the 12th of December 2014 to the 18th of June 2019, can be found in Silini (2021b). Looking at these results it is possible to appreciate the general improvement provided by the post-processing corrections.

170
Finally we study the amplitude error, the phase error, the COR, and RMSE, as a function of the different initial phases of MJO. As displayed in Fig. 6, applying post-processing methods improves the amplitude error for all initial phases. The MLR provides an improvement with respect to the ECMWF model, but the ML correction leads to the lowest error. Concerning the initial phases, we find the lowest amplitude error when an MJO event starts over the MC, while the largest is found in phase 2, over the Indian Ocean. With the MJO propagating at an average speed of 5 ms −1 , events starting in phase 2 will cross the MC 175 in 2-3 weeks time (Kim et al., 2014). The phase error displays a large worsening of the MJO localization prediction, when the forecast starts between the MC and Western Pacific (phase 6-8). This observation is consistent with Fig. 5, where we noticed a drop in the accuracy of the MJO speed prediction over the Indian Ocean and MC. The COR finds its maximum when starting over the MC continent, consistently with the amplitude error. The ML correction has the highest COR except for phase 8, where MLR leads to the highest one. The RMSE is very consistent with the COR, in which we find the the minimum in phase 180 4, and the ML correction having the lowest error, except for phase 8. Overall, we can conclude that the ML post-processing is worth applying especially to reduce the error on the amplitude prediction, while MLR could be useful for a better prediction of the MJO location.

Discussion
This study confirms the potential of post-processing techniques to reduce the knowledge and bias gap between dynamical 185 models forecasts and observations, providing advancement in MJO prediction.
It is interesting to compare the results presented in Fig. 4 with those reported in Fig. 7 of the Supplementary Information in Kim et al. (2021), keeping in mind that Kim et al. show the mean BCOR for the 8 dynamical models considered. While it can be seen in Fig. 7 that for short lead times (up to 2 weeks) a clear improvement with DL-post-processing is obtained, the average BCOR for short lead times is quite low compared to the ECMWF prediction. We can also notice that the improvement obtained by (as it could be expected, due to the fact that ECMWF model provides the best MJO forecast), but our improvement lasts for longer lead times.
It is also interesting to compare the different post-processing approaches used. While we use a feedforward neural network (FFNN) architecture, Kim et al. (2021) Kim et al. (2021), in our case we are not trying to predict the future of a time series using its past, but we are trying to improve the predictions.
There are other differences in the architecture of the networks used: we found that to improve the prediction of the RMMs for a day t, the information in the past and future predictions can both help the correction, while in Kim et al. (2021), the future 200 model's predictions (which are available) are not used for the correction. Another difference is that while the algorithm used by Kim et al. (2021) performs an expansion of the system dimensionality (hidden nodes > input nodes), we found good results performing a compression (hidden nodes < input nodes).
Comparing our results to those of post-processing ensemble weather predictions on medium-range time-scales (Vannitsem et al., 2021), we find that the general magnitude of improvements over the predictions of the dynamical model is lower. This

205
indicates the increasingly difficult challenge to obtain accurate MJO predictions for longer lead times, likely due to a generally lower predictability and a lower level of useful information that can be learned from the raw ensemble predictions compared to those of many other weather variables on shorter time scales. That said, our results indicating the potential of modern DL methods to improve over classical statistical approaches are well in line with the findings for medium-range post-processing (Rasp and Lerch, 2018;Vannitsem et al., 2021;Haupt et al., 2021).

Conclusions
We employed a MLR and a ML algorithm to perform a post-processing correction of the prediction of the dynamical model that currently holds the highest MJO prediction skill (Jiang et al., 2020), developed by ECMWF.
The largest improvement is found in the MJO amplitude and phase individually, which decreases the underestimation of the amplitude, providing a more accurate predicted geographical location of the MJO. The amplitude and phase estimation are 215 improved for all lead times up to 5 weeks.
We obtained an improved prediction skill of about 1 day for a COR of 0.5.
Plotting the forecasts in a Wheeler-Hendon phase diagram we found an improvement predicting the MJO propagation, notably across the MC, which helps overcome the MC barrier.
Considering the results obtained for each initial MJO phase, we found that both post-processing tools improve the prediction, 220 with the ML correction being the best.
The ML technique provides an improvement over MLR for all initial phases except phase 8. In the case of phase forecast it might be also sufficient to use MLR instead of ML. This suggests a predominance of linear corrections to improve the MJO phase forecast.
Due to the influence of the initial phase, amplitude and season on the prediction skill, it would be ideal to train the ANNs for 225 the different initial conditions, but this is not possible because of the limited data that is available for the training.
As future work, it would be interesting to test a stochastic approach to post-processing (as in Rasp and Lerch (2018)), which would allow to obtain a probabilistic forecast. A promising path to further reduce the prediction error is to include other informative variables as inputs to the ANNs in conjunction with the ECMWF predictions. These additional variables should be carefully selected using methods of causal inference such as well-known Granger Causality (Granger, 1969), Transfer
Although the improvement provided by the MLR and ML techniques, a post-processing method will always strongly rely on the accuracy of the dynamical model's forecasts. For this reason, it is crucial to work on both dynamical models and machine learning methods to progress.