Forecasting the magnitude and onset of El Niño based on climate network

El Niño is probably the most influential climate phenomenon on interannual time scales. It affects the global climate system and is associated with natural disasters and serious consequences in many aspects of human life. However, forecasts of the onset, and in particular of the magnitude, of El Niño are still not accurate, at least not more than half a year in advance. Here, we introduce a new forecasting index based on network links representing the similarity of low-frequency temporal temperature anomaly variations between different sites in the Niño 3.4 region. We find that significant upward trends and peaks in this index forecast with high accuracy both the onset and the magnitude of El Niño approximately 1 year ahead. The forecasting procedure we developed improves in particular the prediction of the magnitude of El Niño and is validated on several datasets, up to more than a century long.


Introduction
El NiñoSouthern Oscillation (ENSO) is an inter-annual coupled ocean-atmosphere climate phenomenon [1][2][3]. El Niñois the warm phase of ENSO and is characterized by several degrees warming of the eastern equatorial Pacific ocean. It occurs every 3-5 years, and is regarded as the most significant climate phenomenon on decadal time scales. Among other factors, it affects the surface temperature, precipitation and mid-tropospheric atmospheric circulation over extended regions in America, Australia, Europe, India, and East Asia [4][5][6][7][8]. In particular, strong El Niñocan trigger a cascade events that can affect many aspects of human life [9][10][11].
As a result of the environmental, economic, and social impacts of El Niño, intensive efforts have been undertaken to understand and eventually forecast El Niño [12][13][14]. Extensive atmospheric and oceanic observations have been used to track variations in the ENSO cycle, and complex computer models have been developed to forecast El Niño [15][16][17][18][19][20][21]. Still, techniques for forecasting the onset, and in particular the magnitude, of El Niño with a relatively long lead time (more than half a year) are not fully satisfactory. We have recently experienced one of the strongest El Niño events since 1948, which started at the end of 2014 and ended in mid-2016 [22]. The onset of this event was predicted one year ahead using the network approach [23]. Here, we develop a climate network-based index that can forecast the onset of El Niño approximately 1 year ahead (similar to [23][24][25]). In particular, our approach also forecasts the magnitude of El Niño once it begins.

Methodology
Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
The Oceanic Niño Index (ONI) is a standard index that is used to identify El Niño [26]. It is the running 3-month mean sea surface temperature (SST) anomaly averaged over the Niño 3.4 region, computed relative to 30-year base periods that are updated every 5 years. When the ONI exceeds 0.5°C for at least five consecutive months, the corresponding year is considered to be an El Niño year. We use the ONI (whose record starts in 1950) to estimate the accuracy of our predictions for El Niño events that occurred after 1950.
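As an illustration of the ONI definition above, the following Python sketch computes a centered running 3-month mean from a series of monthly Niño 3.4 SST anomalies and flags runs in which it exceeds 0.5°C for at least five consecutive months. It assumes NumPy and anomalies already computed against the appropriate 30-year base period; the function names `oni` and `el_nino_episodes` are hypothetical, and this is not the operational code used to produce the official index.

```python
import numpy as np


def oni(sst_anomaly):
    """Centered running 3-month mean of monthly Nino 3.4 SST anomalies (degC).

    Entry k averages months k-1, k and k+1; the first and last entries are
    NaN because their 3-month window is incomplete.
    """
    a = np.asarray(sst_anomaly, dtype=float)
    out = np.full(a.shape, np.nan)
    out[1:-1] = (a[:-2] + a[1:-1] + a[2:]) / 3.0
    return out


def el_nino_episodes(oni_values, threshold=0.5, min_months=5):
    """Return (first, last) month indices of runs in which the ONI exceeds
    `threshold` for at least `min_months` consecutive months."""
    episodes, start = [], None
    for k, v in enumerate(oni_values):
        if v > threshold:          # NaN compares False and ends a run
            if start is None:
                start = k
        else:
            if start is not None and k - start >= min_months:
                episodes.append((start, k - 1))
            start = None
    if start is not None and len(oni_values) - start >= min_months:
        episodes.append((start, len(oni_values) - 1))
    return episodes


# Synthetic example: seven warm months embedded in neutral conditions.
anoms = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(el_nino_episodes(oni(anoms)))  # [(3, 9)]
```

The smoothing shortens the warm run at both ends (the ONI exceeds 0.5°C from month 3 to month 9 here), so the detected episode depends on the averaging window as well as on the threshold.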
We analyze the variability of the daily mean near-surface (1000 hPa) air temperature fields of the ERA-Interim reanalysis [27] and the NCEP Reanalysis I [28], as well as the AMIP SST boundary condition data (current version: PCMDI-AMIP 1.1.3) [29] and the extended reconstructed SST v5 (ERSST.v5) [30], in the Niño 3.4 region (i.e., 5°S-5°N, 120°W-170°W) using a climate network approach [31][32][33][34][35][36][37][38]. See table 1 (rows 1-5) for detailed information on the datasets. We find that the temporal variations of the temperature anomaly (defined below in (i)) at different sites of the Niño 3.4 region become less coherent (more disordered) well before the onset of El Niño. In particular, the magnitude of the event is approximately proportional to the maximal degree of disorder (defined below in (ii)) that the Niño 3.4 region reaches before the onset of El Niño. We therefore suggest a single index, the degree of disorder of the Niño 3.4 region, that can forecast both the onset and the magnitude of El Niño.
In the following, we first demonstrate the steps of the proposed forecasting method on 33 years (1984 to present) of the European Centre for Medium-Range Weather Forecasts Interim Reanalysis (ERA-Interim) [27]. We then examine the robustness and accuracy of the prediction method over longer periods using several other datasets (NCEP Reanalysis I [28], PCMDI-AMIP 1.1.3 [29] and ERSST.v5 [30]).
The daily mean near-surface (1000 hPa) air temperature fields of the ERA-Interim reanalysis have a spatial (zonal and meridional) resolution of 2.5°×2.5°, resulting in 105 grid points (5 latitudes × 21 longitudes) in the Niño 3.4 region. Different locations (grid points) in the Niño 3.4 region correspond to nodes in the local climate network, and the weights of the links are determined by the similarities (defined below in (ii)) of the temporal temperature anomaly variations between pairs of nodes [31,38]. The forecasting algorithm is as follows:
(i) At each node k of the network, we calculate the daily atmospheric temperature anomalies T_k(t) for each calendar day: the actual temperature value minus the climatological average, divided by the climatological standard deviation. For the calculation of the climatological average and standard deviation, only past data up to the prediction date are used. For simplicity, leap days are excluded. We use the first 5 years of data (1979-1983) to calculate the first average value and start the prediction in 1984.
(ii) To obtain the time evolution of the weights of the links between nodes i and j in the Niño 3.4 region, we follow [24,25,31] and compute, for each month t (evaluated on the first day of the month) in the considered time span between 1 January 1981 and 31 August 2017, the time-delayed cross-correlation function defined as
$$\begin{aligned}
C^{(t)}_{i,j}(-\tau) &= \frac{\langle T_i(t)\,T_j(t-\tau)\rangle - \langle T_i(t)\rangle\,\langle T_j(t-\tau)\rangle}{\sqrt{\left\langle \left(T_i(t)-\langle T_i(t)\rangle\right)^2\right\rangle}\,\sqrt{\left\langle \left(T_j(t-\tau)-\langle T_j(t-\tau)\rangle\right)^2\right\rangle}},\\
C^{(t)}_{i,j}(\tau) &= \frac{\langle T_i(t-\tau)\,T_j(t)\rangle - \langle T_i(t-\tau)\rangle\,\langle T_j(t)\rangle}{\sqrt{\left\langle \left(T_i(t-\tau)-\langle T_i(t-\tau)\rangle\right)^2\right\rangle}\,\sqrt{\left\langle \left(T_j(t)-\langle T_j(t)\rangle\right)^2\right\rangle}},
\end{aligned} \tag{1}$$
where the brackets denote an average over the past 365 days, according to
$$\langle f(t)\rangle = \frac{1}{365}\sum_{d=0}^{364} f(t-d). \tag{2}$$
For the daily datasets, we consider time lags of τ ∈ [0, 200] days, for which a reliable estimate of the background noise level can be guaranteed (the appropriate time lag is discussed in [39]). For the monthly datasets (PCMDI-AMIP 1.1.3 and ERSST.v5), the brackets denote an average over the past 12 months, according to
$$\langle f(t)\rangle = \frac{1}{12}\sum_{n=0}^{11} f(t-n), \tag{3}$$
and we consider time lags of τ ∈ [0, 6] months. The similarity between two nodes (the weight of the link) is determined by the value of the highest peak of the cross-correlation function, $C_{i,j}(t) = C^{(t)}_{i,j}(\theta) = \max_{\tau} C^{(t)}_{i,j}(\tau)$, where θ is the corresponding time lag at the peak. The degree of coherence/disorder of the Niño 3.4 region is quantified by the average value of all links at their peaks, i.e.
$$C(t) = \frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} C_{i,j}(t), \tag{4}$$
where N = 105 is the number of nodes in the Niño 3.4 region. Thus, higher values of C(t) indicate higher coherence in the Niño 3.4 region. We note that the strength of the link between nodes i and j is represented by the strength of the cross-correlation between the temperature records at the nodes, which is defined by [24,35]
$$W_{i,j}(t) = \frac{C^{(t)}_{i,j}(\theta) - E\left(C^{(t)}_{i,j}\right)}{\sqrt{E\left[\left(C^{(t)}_{i,j}\right)^2\right] - \left[E\left(C^{(t)}_{i,j}\right)\right]^2}}, \tag{5}$$
where E(g) denotes the average over the 401 shifted days (τ from −200 to 200 days), according to
$$E(g) = \frac{1}{401}\sum_{\tau=-200}^{200} g(\tau). \tag{6}$$
$W_{i,j}(t)$ is high when the peak at τ = θ is sharp and prominent, and it is low when the cross-correlation varies slowly with τ. In [24], Ludescher et al introduced a 12-month forecasting scheme based on the observation that the mean strength of the links connecting the 'El Niño basin' (equatorial Pacific corridor) with the surrounding sites tends to increase about one year before an El Niño event.
Table 1. Summary of the information (rows 1-5) and the forecasting power (rows 6-10) for the different datasets. 'Resolution of the data' refers to the spatial (zonal and meridional) resolution of the data. In row 'D', the symbol '*' indicates that there are not enough events in the dataset to compute the Kolmogorov-Smirnov statistic. In the lead-time columns, we show the mean value ± 1 standard deviation in units of months. For more details on these results, see figure S2 in the SI.
DATA: ERA-Interim [27] | NCEP Reanalysis I [28] | PCMDI-AMIP 1.1.3 [29] | ERSST.v5 [30] (first row: Type of data)
(iii) The forecasting index (FI) we propose here is based on the temporal evolution of C(t) (defined in (ii), equation (4)), which represents the interactions or similarity (coherence) between the different sites within the Niño 3.4 region. We define the FI as a function of months as follows,
$$FI(t) = -\left[\ln C(t) - \frac{1}{m-a+1}\sum_{k=a}^{m}\ln C(t-k)\right], \tag{7}$$
where a = 0 indicates that the average of ln C includes the current month, while m is the total number of months preceding t since January 1981. We use a minus sign on the right-hand side of equation (7) so that peaks in the FI correspond to peaks in the ONI, see figure 1. We use ln C(t) rather than C(t) itself so that small variations of C(t) become more pronounced and can be seen more clearly in figure 1. We start to evaluate FI(t) from January 1984. (For NCEP Reanalysis I, m equals the number of months before t since January 1950, and FI(t) starts from January 1953; for PCMDI-AMIP 1.1.3, m equals the number of months before t since January 1872, and FI(t) starts from January 1950; for ERSST.v5, m equals the number of months before t since January 1856, and FI(t) starts from January 1950.) Because of the minus sign, FI(t) increases (C(t) decreases) when the Niño 3.4 region is less coherent, i.e. more disordered. FI(t) is calculated for each month (red dotted line in figure 1), and one can easily see that FI(t) usually increases well before the onset of El Niño and decreases once El Niño begins. In other words, the temporal variations of the temperature anomaly at different sites of the Niño 3.4 region become less coherent (more disordered) prior to El Niño, and start to synchronize once El Niño begins. In particular, we find that the more disordered the Niño 3.4 region is before El Niño, the higher the magnitude of the approaching El Niño.
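Steps (i) and (ii) can be sketched in Python as follows. This is a minimal, unoptimized illustration, not the authors' implementation. It assumes NumPy, daily anomalies stored as an array of shape (time, nodes), a climatology built from strictly earlier years with population (ddof=0) moments, and a link weight taken as the maximum of the windowed cross-correlation over lags from −τ_max to τ_max. The function names `standardized_anomalies`, `lagged_xcorr` and `network_measures` are hypothetical.

```python
import numpy as np

# Nino 3.4 grid at 2.5 deg: 5 latitudes (5S..5N) x 21 longitudes (170W..120W).
LATS = np.arange(-5.0, 5.0 + 1e-9, 2.5)
LONS = np.arange(-170.0, -120.0 + 1e-9, 2.5)
N_NODES = LATS.size * LONS.size  # 105


def standardized_anomalies(temps, n_base_years):
    """Step (i): temps has shape (n_years, 365, n_nodes), leap days removed.

    For each calendar day and node, subtract the mean and divide by the
    standard deviation of all earlier years only (no look-ahead). The first
    n_base_years serve only as the initial climatology (NaN output).
    """
    temps = np.asarray(temps, dtype=float)
    out = np.full(temps.shape, np.nan)
    for y in range(n_base_years, temps.shape[0]):
        past = temps[:y]
        out[y] = (temps[y] - past.mean(axis=0)) / past.std(axis=0)
    return out


def _corr(a, b):
    """Pearson correlation with population moments, as in step (ii)."""
    a, b = a - a.mean(), b - b.mean()
    return (a * b).mean() / np.sqrt((a * a).mean() * (b * b).mean())


def lagged_xcorr(x, y, t, tau_max, window=365):
    """C^{(t)}_{x,y}(tau) for tau = -tau_max..tau_max, each average taken over
    the `window` samples ending at t (or t - tau).
    Requires t >= tau_max + window - 1 and t < len(x)."""
    def win(s, end):
        return s[end - window + 1 : end + 1]
    neg = [_corr(win(x, t), win(y, t - tau)) for tau in range(tau_max, 0, -1)]
    pos = [_corr(win(x, t - tau), win(y, t)) for tau in range(tau_max + 1)]
    return np.array(neg + pos)  # element tau_max is the zero lag


def network_measures(anoms, t, tau_max, window=365):
    """Return (C(t), mean W_ij(t)) for anomalies of shape (n_time, n_nodes)."""
    n = anoms.shape[1]
    peaks, strengths = [], []
    for i in range(n):
        for j in range(i + 1, n):
            c = lagged_xcorr(anoms[:, i], anoms[:, j], t, tau_max, window)
            peaks.append(c.max())                              # C_ij(t): highest peak
            strengths.append((c.max() - c.mean()) / c.std())   # W_ij(t): peak in std units
    return float(np.mean(peaks)), float(np.mean(strengths))


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = np.roll(x, 3)                            # y lags x by 3 samples
c = lagged_xcorr(x, y, t=900, tau_max=10)
print(np.argmax(c) - 10)                     # 3: the peak sits at a lag of +3
```

The double loop over 5,460 node pairs and 401 lags per month is slow; the sketch favors a direct transcription of the definitions over speed.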
A flow chart entitled 'Steps in calculating FI', which briefly describes the above algorithm, is provided in the supplementary material, available online at stacks.iop.org/NJP/20/043036/mmedia.
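Step (iii) can be sketched as follows: FI(t) is minus the difference between ln C(t) and the mean of ln C over all months from the start of the record up to month t − a. The sketch assumes NumPy and a monthly series of C(t) whose first element is the first month of the record (January 1981 for ERA-Interim), so that m equals the month index t. The function name `forecasting_index` is hypothetical; this illustrates the definition and is not the authors' code.

```python
import numpy as np


def forecasting_index(C, a=0):
    """FI(t) = -[ln C(t) - mean(ln C over months 0 .. t-a)].

    C is a monthly series whose index 0 is the first month of the record,
    so m (months preceding t since the start) equals t. With a = 0 the
    average includes the current month. Months with nothing to average
    (possible only for a > 0) are NaN.
    """
    lnC = np.log(np.asarray(C, dtype=float))
    fi = np.full(lnC.shape, np.nan)
    for t in range(lnC.size):
        past = lnC[: max(t - a + 1, 0)]  # indices 0 .. t-a
        if past.size:
            fi[t] = -(lnC[t] - past.mean())
    return fi


# A drop in coherence (smaller C) yields a positive FI: 0.75 * ln 2 here.
print(round(float(forecasting_index([0.8, 0.8, 0.8, 0.4])[-1]), 4))  # 0.5199
```

Because the running mean spans the whole record, the early months of FI are dominated by very few values; discarding them as a spin-up period mirrors starting the evaluation of FI(t) only from January 1984.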

The forecasting algorithm using index FI
Based on the above observation, we suggest the following algorithm to forecast simultaneously both the magnitude and the onset of El Niño using FI(t). For a demonstration, see the example shown in figure 1(b).
(i) To forecast the magnitude, as soon as the ONI rises above 0.5°C in a given month, we take the value of the highest peak of FI(t) since the end of the last El Niño ('Peak', as indicated by the red points in figure 1(a) and the red arrow in (b)) as the forecast magnitude, i.e. an estimate of the El Niño strength (observed magnitude). However, if the peak value is negative or there is no peak during this period, we use zero as the forecast magnitude and forecast a weak El Niño event (ONI < 1°C). Counting the results of all the datasets we used, such cases make up on average 13% of all El Niño events, and most of them (84% on average) are indeed weak. Note also that if the ONI rises above 0.5°C but does not stay above it for at least five months, no El Niño event occurs, and the value of the highest peak is then not a prediction of El Niño magnitude.
(ii) To forecast the onset, we track both FI(t) and the ONI, starting from the onset of the previous El Niño. If FI(t) increases continuously from a local minimum ('Valley', as indicated by the blue arrow in figure 1(b)) for at least two months (the time segment that yielded the best forecast), the time at which FI(t) exceeds 0 is considered a potential signal for the onset of either an El Niño or a La Niña event within approximately the next 18 months ('Forecast', as indicated by the green arrows in figure 1), provided that no El Niño or La Niña is ongoing at that time (i.e. −0.5°C < ONI < 0.5°C). Moreover, if a La Niña occurs within these 18 months, we forecast a new El Niño to occur within 18 months after the end of the La Niña (the first month with ONI > −0.5°C after the La Niña). Accordingly, a true positive prediction of El Niño is counted if, within 18 months after the potential signal, either an El Niño occurs ('normal', as indicated by the green arrows in figure 1) or a La Niña occurs that is followed by an El Niño within the next 18 months ('delayed', as indicated by the green arrows with stars on top in figure 1); otherwise, a false alarm is counted.
Figure 1 (caption excerpt). The blue dot marks the value used to predict the magnitude of El Niño. The purple star above the purple arrow indicates that we might be undergoing a new La Niña (gray shading; the ONI has already been below −0.5°C for the last two months), and therefore an El Niño is forecast to come within 18 months after the end of the suspected ongoing La Niña.
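The two rules can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: FI and ONI are monthly sequences indexed by month, a 'peak' is taken to be an interior local maximum, the rise 'from a local minimum' is measured by walking back from the zero crossing while FI keeps decreasing, and El Niño onsets and La Niña episodes are supplied as month indices (for example, derived from the ONI). All function names are hypothetical.

```python
def forecast_magnitude(fi, last_end, onset):
    """Rule (i): highest local maximum of FI in months last_end+1 .. onset.

    Returns 0.0 (a weak event is forecast) when there is no interior peak or
    the highest peak is negative.
    """
    seg = list(fi[last_end + 1 : onset + 1])
    peaks = [seg[k] for k in range(1, len(seg) - 1)
             if seg[k] > seg[k - 1] and seg[k] >= seg[k + 1]]
    return max(max(peaks, default=0.0), 0.0)


def onset_signals(fi, oni, min_rise=2):
    """Rule (ii): months t at which FI crosses above 0 after rising for at
    least `min_rise` consecutive months, while -0.5 < ONI(t) < 0.5."""
    signals = []
    for t in range(1, len(fi)):
        if not (fi[t - 1] <= 0 < fi[t]):
            continue
        rise, k = 0, t
        while k > 0 and fi[k] > fi[k - 1]:  # walk back to the last valley
            rise, k = rise + 1, k - 1
        if rise >= min_rise and -0.5 < oni[t] < 0.5:
            signals.append(t)
    return signals


def classify_signal(t, nino_onsets, nina_episodes, horizon=18):
    """Return 'normal', 'delayed' or 'false alarm' for a signal at month t.

    nino_onsets: onset months of El Nino; nina_episodes: (first, last) months
    of La Nina. A La Nina is taken to end at last + 1, the first month with
    ONI > -0.5.
    """
    if any(t < s <= t + horizon for s in nino_onsets):
        return "normal"
    for first, last in nina_episodes:
        end = last + 1
        if t < first <= t + horizon and any(end <= s <= end + horizon for s in nino_onsets):
            return "delayed"
    return "false alarm"


fi = [-1.0, -0.6, -0.2, 0.3, 0.6, 0.4, 0.1]
print(onset_signals(fi, oni=[0.0] * 7))                                # [3]
print(forecast_magnitude(fi, last_end=0, onset=6))                     # 0.6
print(classify_signal(3, nino_onsets=[30], nina_episodes=[(10, 20)]))  # delayed
```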
Next, we elaborate on the reasoning behind our approach. In figure 2(a), we plot the probability density functions (PDFs) of all links in the network windows at which 'Valley' (m=Valley, blue), 'Forecast' (m=Forecast, green), and 'Peak' (m=Peak, red) occur. We compare these PDFs with the PDF of random networks obtained by shuffling the order of the calendar days for each node within the Niño 3.4 region. We find the strongest correlations for the 'Valley' periods (the PDF is stretched toward higher values), weaker correlations for the 'Forecast' periods, and the weakest correlations for the 'Peak' periods (closest to the shuffled correlations). Thus, the Niño 3.4 network (region) becomes less coherent when progressing from the 'Valley' periods to the 'Peak' periods. Order is re-established towards the actual peak of El Niño. The evolution of the cross-correlation of a typical link (shown in figure 2(b)) before the onset of the 2014-2016 El Niño event is shown in figure 2(c). The three cross-correlation functions (blue, green, and red) correspond to the 'Valley', 'Forecast', and 'Peak' points marked by the blue, green, and red arrows in figure 1(b). Consistently, we find that the maximal value of the cross-correlation function, $C^{(t)}_{i,j}(\theta)$, decreases from the Valley to the Peak months, whereas the strength of the link, $W_{i,j}(t)$ [24], increases. This difference is probably due to the autocorrelation of the temperature anomaly variations in the Niño 3.4 region [39]; see figure S1.
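The shuffled-network comparison can be illustrated with a small Python sketch, assuming NumPy. `shuffled_surrogate` and `peak_corr` are hypothetical helpers, and `peak_corr` uses the full overlapping records instead of the 365-day windows of step (ii), a simplification made for brevity. Shuffling destroys both the cross-correlation between nodes and the autocorrelation within each node.

```python
import numpy as np


def shuffled_surrogate(anoms, rng):
    """Permute the time order independently at each node (columns of
    `anoms`). Each node keeps its values, but the timing shared between
    nodes, and hence any genuine cross-correlation, is destroyed."""
    out = np.empty_like(anoms)
    for k in range(anoms.shape[1]):
        out[:, k] = rng.permutation(anoms[:, k])
    return out


def peak_corr(x, y, tau_max):
    """Highest Pearson correlation over lags -tau_max..tau_max, computed on
    the overlapping parts of the two full records."""
    best = -np.inf
    for tau in range(-tau_max, tau_max + 1):
        a, b = (x[tau:], y[: len(y) - tau]) if tau >= 0 else (x[:tau], y[-tau:])
        best = max(best, np.corrcoef(a, b)[0, 1])
    return best


rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = x + 0.5 * rng.standard_normal(2000)  # a strongly linked pair of nodes
s = shuffled_surrogate(np.column_stack([x, y]), rng)
# About 0.9 for the original pair and close to 0 for the surrogate.
print(peak_corr(x, y, 5), peak_corr(s[:, 0], s[:, 1], 5))
```

Repeating this for every pair and collecting the peak values gives the reference PDF against which the 'Valley', 'Forecast' and 'Peak' link distributions can be compared.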

Forecasting the magnitude of El Niño
We now examine the accuracy and robustness of our forecast of the magnitude of El Niño events between 1950 and the present (the ONI record begins in 1950), using several datasets. For this purpose, we plot the predicted versus the observed magnitude of El Niño (scatter plot) and use the Pearson correlation coefficient, r, to quantify their correlation. We present such scatter plots in figure 3.
Next, we apply the Kolmogorov-Smirnov test to quantify the significance of the relationship between the predicted and observed magnitudes of El Niño; see figure 3 (insets). Each time, we randomly choose ten events and calculate the correlation coefficient between their predicted and observed magnitudes; we repeat this procedure 1 million times and obtain the PDF of r-values for each dataset (green in figure 3). For comparison, we also consider random cases as follows. Each time, we randomly choose ten predicted values and, independently, ten observed values and then perform a linear regression between them; here, too, we perform 1 million selections and obtain the PDF of r-values for each dataset (gray in figure 3). We then compare the PDF of the observed r-values with that of the random r-values using the Kolmogorov-Smirnov statistic D [40]. For each dataset used here, D is relatively large (D ≥ 0.37), indicating that the correlation between the predicted and observed El Niño magnitudes differs significantly from that obtained for randomly paired values.
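The resampling comparison can be sketched in Python, assuming NumPy. `r_distribution` and `ks_statistic` are hypothetical helper names, the magnitudes in the example are synthetic, and far fewer draws than 1 million are taken to keep it fast. The two-sample KS statistic is computed directly from the empirical distribution functions rather than with a statistics library.

```python
import numpy as np


def r_distribution(pred, obs, n_draws, k, rng, paired=True):
    """Pearson r of k randomly chosen events, repeated n_draws times.

    paired=True keeps each event's (predicted, observed) pair; paired=False
    draws predicted and observed values independently (the random case)."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    rs = np.empty(n_draws)
    for d in range(n_draws):
        i = rng.choice(pred.size, size=k, replace=False)
        j = i if paired else rng.choice(obs.size, size=k, replace=False)
        rs[d] = np.corrcoef(pred[i], obs[j])[0, 1]
    return rs


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    fa = np.searchsorted(a, grid, side="right") / a.size
    fb = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(fa - fb)))


rng = np.random.default_rng(2)
obs = rng.uniform(0.5, 2.5, size=20)       # synthetic "observed" magnitudes
pred = obs + 0.2 * rng.standard_normal(20)  # synthetic, well-correlated forecasts
real = r_distribution(pred, obs, 2000, 10, rng)
rand = r_distribution(pred, obs, 2000, 10, rng, paired=False)
print(ks_statistic(real, rand))  # close to 1 for such well-matched forecasts
```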
The results are summarized in table 1, in the rows headed 'El Niño magnitude'. We note, however, that the prediction of the magnitude of El Niño is made at the actual onset of El Niño, which on average occurs about half a year before the peak of El Niño. Previous studies have proposed various methods to forecast El Niño events. Some of these predict the onset of El Niño quite successfully, about one year in advance [24,25]. We compare our prediction method with the 12-month forecasting scheme based on the climate network approach [24] and with the predictions of state-of-the-art models, the COLA anomaly coupled model [41] and the Chen-Cane model [13]; for this purpose, we use receiver operating characteristic (ROC) analysis [42], see figure S3. The resulting hit rate of our approach is at least 0.81, the lowest (worst) hit rate, 0.81, being obtained for NCEP Reanalysis I. Meanwhile, the false alarm rate of our approach is at most 0.24, the highest (worst) false alarm rate, 0.24, being obtained for ERSST.v5. For a prediction lead time of 12 months, the hit rate is < 0.4 for the COLA model [41] and < 0.45 for the Chen-Cane model [13], with a false alarm ratio of ∼0.2. The hit rate of the network approach in [24] is 0.667 and its false alarm rate is 0.095.
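The comparison above relies on hit rates and false alarm rates. For reference, a minimal sketch of the standard 2×2 contingency-table definitions is given below; whether each cited study computes its scores exactly this way is an assumption, the function name `verification_scores` is hypothetical, and the counts in the example are invented for illustration. The sketch also shows that a false alarm rate (alarms among non-events) and a false alarm ratio (wrong alarms among all alarms) are different quantities.

```python
def verification_scores(hits, misses, false_alarms, correct_negatives):
    """Standard 2x2 contingency-table scores for yes/no event forecasts."""
    hit_rate = hits / (hits + misses)                       # events that were forecast
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)  # non-events with an alarm
    false_alarm_ratio = false_alarms / (hits + false_alarms)  # alarms that were wrong
    return hit_rate, false_alarm_rate, false_alarm_ratio


# Hypothetical counts, for illustration only:
# hit rate 0.8, false alarm rate 0.1, false alarm ratio 3/11.
print(verification_scores(hits=8, misses=2, false_alarms=3, correct_negatives=27))
```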
The prediction scheme proposed here improves the prediction of the onset of El Niño. An additional, and the most important, advantage of the proposed scheme is that it provides predictions of both the magnitude and the onset of El Niño based only on the temperature variability, and its coherence, in the Niño 3.4 region.

Summary
In summary, we have introduced a new FI, based on climate networks, that accurately and simultaneously forecasts both the onset and the magnitude of El Niño. The performance of the FI has been examined successfully on several datasets. Our forecasting algorithm is based on the finding that the similarity, or coherence, of the low-frequency temporal variability of the temperature anomaly between different sites (the link weights) in the Niño 3.4 region decreases well before El Niño and increases at the onset of El Niño. The magnitude of the predicted El Niño is positively related to the highest peak in the FI during the period between the end of the last El Niño and the onset of the new one. The results presented here indicate an important characteristic of the phase of the ENSO cycle, namely that a significant increase of disorder occurs in the Niño 3.4 region well before the onset of El Niño. The relationship between El Niño and the variation of the degree of disorder in the Niño 3.4 region may be further explained by defining an entropy based on the coherence of temperature variations at different sites of the Niño 3.4 region, which oscillates periodically with the ENSO cycle. There is surely room for further improvement of the forecasting algorithm proposed here, possibly in combination with other forecasting techniques and models.
Figure 3 (caption excerpt): ... [29] and (d) ERSST.v5 [30]. The red lines indicate the linear regression fits. The corresponding correlation coefficients (r-values), Kolmogorov-Smirnov statistics (D), and fitted-line equations are also shown.