Rainfall forecasting in arid regions using an ensemble of artificial neural networks

Rainfall prediction is one of the most difficult tasks in hydrology because rainfall events are highly stochastic. This research presents a comparative analysis of different models for predicting rainfall in an arid region. The forecasting models comprise the feed-forward, general regression, recurrent, cascade, and Elman neural networks. The performance of these models is assessed using three evaluation metrics, namely the correlation coefficient, the coefficient of efficiency, and Willmott's index of agreement. Furthermore, the statistical significance of the neural network models is evaluated using the Wilcoxon-Mann-Whitney test. Finally, the correspondence between the neural network model results and the observations is examined using the Taylor diagram. The findings reveal that the general regression neural network exhibits the best performance among the tested models on the Tropical Rainfall Measuring Mission dataset at Suez city in Egypt. The proposed model is intended to benefit the Egyptian water municipality in monthly rainfall forecasting for this arid region. Precise modeling of rainfall is vital for managing water resources in applications such as food production, water allocation, and drought management.


Introduction
In the twenty-first century, the effects of climate change have become more recognizable. Natural disasters, such as earthquakes, droughts, and hurricanes, have become more common in recent years [1]. Climate change is expected to worsen rainfall abnormalities and irregularities. The variability of rainfall events has directly contributed to an estimated loss of food production for around 81 million people every year [2]. Therefore, the precise modeling of short- and long-term rainfall is crucial to improving the efficiency of operations related to water resource systems. Flood systems, hydrologic models, reservoir operation, and other systems require short-term rainfall prediction [3]. Long-term rainfall forecasting, on the other hand, is useful for a variety of applications, including environmental protection, food production, drought management, and ideal reservoir operation [4]. Both forecasting horizons are therefore central to sound water resource management.

Feed-forward neural network
It comprises an input layer, hidden layer(s), and an output layer [16], each containing many neurons [6]. The FFNN has both advantages and limitations. A key advantage is the ability to train and adapt the network using historical data so that the forecasted values meet the target values: the model learns a relation between the inputs and output(s), and the developed model then predicts the outcome(s) from the data during the testing and validation processes. In terms of limitations, when the structure and network design are not well chosen, training is slow [18]. The backpropagation feed-forward algorithm is the most popular type of neural network in hydraulic engineering because of its applicability and simplicity. Mohanty et al. [19] used this technique for short-term groundwater level prediction, where it proved advantageous over the numerical model (MODFLOW).

General regression neural network
It performs predictions efficiently because the optimum regression surface is reached promptly and the estimation error is negligible [20]. The network consists of four layers: an input layer, a hidden (pattern) layer, a summation layer, and an output layer. The pattern layer receives the input data from the input layer and computes the Euclidean distances and activation values. The summation layer, attached to the pattern layer, passes two quantities to the output layer: the sum of the products of the training outputs and the activation values, and the sum of the activation values. The output layer simply divides these two components to yield the predicted value. Varanasi and Tripathi [21] applied this technique to forecasting wind power in India, and the results showed significant reliability.
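The layer-by-layer description above can be condensed into a few lines. The following is a minimal, univariate sketch in Python (the paper itself uses MATLAB); the function name `grnn_predict` and the Gaussian kernel form are illustrative assumptions, not the authors' code.

```python
import math

def grnn_predict(x, train_x, train_y, sigma=1.0):
    """Minimal GRNN sketch (hypothetical helper, not the paper's MATLAB code).

    Pattern layer: Gaussian activation from the squared distance to each
    training sample. Summation layer: sum of activations and sum of
    activation-weighted training outputs. Output layer: their ratio.
    """
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi in train_x]
    numerator = sum(w * yi for w, yi in zip(weights, train_y))
    denominator = sum(weights)
    return numerator / denominator

# Example: the prediction is a kernel-weighted average of training outputs.
train_x = [0.0, 1.0, 2.0]
train_y = [0.0, 10.0, 20.0]
pred = grnn_predict(1.0, train_x, train_y, sigma=0.5)
```

Note that the only free parameter is the spread `sigma`, which matches the model-development section, where the spread constant is the GRNN's single tuning parameter.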

Recurrent neural network
It feeds the output from the previous step into the current step, unlike conventional neural networks, which are characterized by independent inputs and outputs. It is therefore a powerful tool for modeling time series data [22]. The main advantages of the RNN can be summarized as follows: a) it can process inputs of any length, b) it stores information over time in its internal memory, and c) the model's size does not grow even if the input is enormous. On the other hand, the major limitations are: a) computation is slow because of the recurrent structure, b) training is challenging, c) processing long sequences with certain activation functions is difficult, and d) it is susceptible to exploding and vanishing gradients [23]. Athira et al. [24] demonstrated this method in the field of air quality for predicting PM10 concentrations.
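The feedback mechanism and the fixed model size for arbitrary-length inputs can be seen in a single scalar recurrence. This is a deliberately minimal sketch with illustrative weights, not the network configuration used in the study.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.3, b=0.0):
    """One step of a minimal (scalar) recurrent cell: the previous hidden
    state h_prev is fed back alongside the current input x_t, so the cell
    carries information forward in time. Weight values are illustrative."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# A sequence of any length is processed with the same fixed-size cell:
h = 0.0
for x_t in [1.0, 0.5, -0.2]:
    h = rnn_step(x_t, h)
```

Because `h` is repeatedly squashed through `tanh`, repeated multiplication by `w_h` during backpropagation is also what gives rise to the vanishing/exploding-gradient problem noted above.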

Cascade-forward back propagation neural network
Its architecture differs from that of the FFNN [25]. In the CBPN, the layers are linked with biases that control the accuracy of data transfer between layers, under the user-specified training and performance functions. The CBPN holds an advantage over the FFNN because its neurons participate in the computation and weight updating of all the following layers, whereas in the FFNN, neurons contribute only to the computation and weight updating of the next layer [13]. Narad and Chavan [25] successfully tested this technique in addressing authentication problems.

Elman back propagation neural network
It is a recurrent-based neural network with adaptive functionality for time series anomalies. The network architecture comprises an input layer, an output layer, a hidden layer, and a context layer. The network also supports several user-defined options, such as the training functions and the number of nodes in each layer. For illustration, the Elman network features feedback connections that copy the state of the hidden layer into the context layer. These connections account for both the current and previous states of the neurons driven by the signals from the input layer. In other words, the context layer, which records past information, connects the previous iterations with the current one.

Performance metrics
To make a rigorous comparison of the various neural network models, a variety of metrics could be used. Three performance metrics (i.e., CE, R, and WI) are adopted here to perform a fair evaluation of the chosen models. The following sub-sections provide a short overview of each metric.

Coefficient of efficiency
It quantifies the goodness of fit between the observed and predicted values, as in equation (1). It should be noted that a higher CE value means that the model is more efficient [27].

CE = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \quad (1)

where P_i represents the predictions, O_i the observations, n the number of observations, and \bar{O} the average observed value.
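As a sketch of equation (1), the coefficient of efficiency can be computed in a few lines; the function name below is illustrative (the study performs this in MATLAB).

```python
def coefficient_of_efficiency(obs, pred):
    """Nash-Sutcliffe coefficient of efficiency (equation 1): one minus the
    ratio of the sum of squared errors to the spread of the observations
    about their mean. Higher values indicate a more efficient model."""
    o_bar = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - sse / sst

obs = [1.0, 2.0, 3.0, 4.0]
ce_perfect = coefficient_of_efficiency(obs, obs)     # exact predictions -> 1.0
ce_mean = coefficient_of_efficiency(obs, [2.5] * 4)  # mean-only predictions -> 0.0
```

The two example calls show the metric's anchor points: a perfect model scores 1, while a model no better than predicting the observed mean scores 0.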

Pearson correlation coefficient
It is a metric for determining the strength of the linear relationship between the modeled and observed values, calculated as in equation (2). It lies in the range [-1, 1], with 1 and -1 denoting perfect positive and negative linear correlation, respectively, while a value of 0 means that the simulated and observed values do not correlate [11].

R = \frac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2 \sum_{i=1}^{n} (P_i - \bar{P})^2}} \quad (2)

where \bar{P} represents the average predicted value.
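Equation (2) can be sketched directly from its definition; again the function name is illustrative, not the study's code.

```python
import math

def pearson_r(obs, pred):
    """Pearson correlation coefficient (equation 2): the covariance of the
    observed and predicted values divided by the product of their
    deviations about the means; the result lies in [-1, 1]."""
    n = len(obs)
    o_bar = sum(obs) / n
    p_bar = sum(pred) / n
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - o_bar) ** 2 for o in obs))
    sp = math.sqrt(sum((p - p_bar) ** 2 for p in pred))
    return cov / (so * sp)

r_pos = pearson_r([1, 2, 3], [2, 4, 6])  # perfect positive linear relation
r_neg = pearson_r([1, 2, 3], [6, 4, 2])  # perfect negative linear relation
```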

Willmott's index of agreement
It is calculated by subtracting from one the ratio of the sum of squared errors to the potential error, as shown in equation (3). Its value ranges from 0 to 1, and a higher WI value indicates better agreement between the predicted and actual values, and vice versa [28].

WI = 1 - \frac{\sum_{i=1}^{n} (P_i - O_i)^2}{\sum_{i=1}^{n} (|P_i - \bar{O}| + |O_i - \bar{O}|)^2} \quad (3)
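A sketch of equation (3) follows the same pattern as the previous metrics; the function name is illustrative.

```python
def willmott_index(obs, pred):
    """Willmott's index of agreement (equation 3): one minus the ratio of
    the sum of squared errors to the potential error, where the potential
    error sums the squared combined deviations from the observed mean."""
    o_bar = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    pot = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / pot

obs = [1.0, 2.0, 3.0, 4.0]
wi_perfect = willmott_index(obs, obs)             # perfect agreement -> 1.0
wi_partial = willmott_index(obs, [1.5, 2, 3, 3.5])  # imperfect -> between 0 and 1
```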
The Wilcoxon-Mann-Whitney U-test is also used to determine the significance level of the neural network models' outcomes. It is a nonparametric test that compares two models without assuming that the values are normally distributed [29]. Finally, the correspondence between the neural network models and the observations is examined using the Taylor diagram [30]. This diagram summarizes multiple aspects at once, namely the correlation coefficient, the root-mean-square difference, and the standard deviations of the observed and predicted rainfall, and thus serves to confirm the accuracy of the performance metrics outlined in equations (1)-(3).
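The core of the U-test is a rank comparison that makes no normality assumption. The sketch below computes only the U statistic for two samples (ties counted as half); in practice one would use a statistical package such as `scipy.stats.mannwhitneyu`, which also returns the P-value compared against the 0.05 threshold.

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U statistic (sketch): count, over all pairs, how often a
    value from the first sample exceeds one from the second (ties count as
    0.5). Under the null hypothesis of identical distributions, U should be
    close to len(a) * len(b) / 2."""
    return sum(1.0 if x > y else (0.5 if x == y else 0.0)
               for x in a for y in b)

u_low = mann_whitney_u([1, 2, 3], [4, 5, 6])   # first sample entirely lower -> 0
u_high = mann_whitney_u([4, 5, 6], [1, 2, 3])  # first sample entirely higher -> 9
```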

Model development
The suggested rainfall forecasting model starts with a review of the literature on rainfall forecasting models, neural network models, and performance measures. The next steps comprise implementing the neural network models, predicting the total monthly rainfall, applying the evaluation metrics to the models' outputs, testing the statistical significance of the models, validating the results of the statistical metrics, and selecting the optimum rainfall forecasting model. Statistical significance tests are an efficient approach to testing the significance levels of the prediction models. In this regard, they are utilized as a stage complementary to the performance evaluation comparison for the sake of selecting the best-performing prediction model [31].
For the FFNN, RNN, CBPN, and EBP, several parameters (e.g., the number of hidden neurons and layers, and the learning algorithm) must be determined before the model is established. In this research, the number of hidden neurons and layers are assumed to be 10 and 2, respectively [32]. Moreover, the scaled conjugate gradient algorithm is employed because it provides the best and fastest training and testing efficiency [33,34]. For the general regression neural network, the only parameter to determine is the spread constant (sigma), which is set to 1 in this study. Finally, the neural network models are built using the MATLAB R2015a neural networks toolbox. After developing the neural network models, their predictive performances are compared using three evaluation metrics: CE, R, and WI. The Wilcoxon-Mann-Whitney U-test is also used to determine the significance level of the neural network models' outcomes. Besides, the consistency of the neural network models with the observations is examined using the Taylor diagram. Finally, the optimum rainfall forecasting model is selected.
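To make the assumed architecture concrete, the following Python sketch performs a forward pass through a network with two hidden layers of 10 tanh neurons and one linear output, matching the configuration stated above. The random weights are placeholders only; in the study the weights are fitted in MATLAB with the scaled conjugate gradient algorithm, a step not reproduced here.

```python
import math
import random

def ffnn_forward(x, weights, biases):
    """Forward pass through hidden layers with tanh activation followed by
    a single linear output neuron. `weights[k][j]` is the weight vector of
    neuron j in layer k; `biases[k][j]` is its bias."""
    a = x
    for layer_w, layer_b in zip(weights[:-1], biases[:-1]):
        a = [math.tanh(sum(w * v for w, v in zip(neuron, a)) + b)
             for neuron, b in zip(layer_w, layer_b)]
    out_w, out_b = weights[-1][0], biases[-1][0]
    return sum(w * v for w, v in zip(out_w, a)) + out_b

random.seed(0)
sizes = [1, 10, 10, 1]  # one input, two hidden layers of 10 neurons, one output
weights = [[[random.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(n)]
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [[0.0] * n for n in sizes[1:]]
y = ffnn_forward([0.3], weights, biases)
```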

Case study
The proposed neural network models are trained and tested using the Tropical Rainfall Measuring Mission (TRMM) dataset at Suez city in Egypt (Figure 1). The climate of this region is hot and dry [35]; it was therefore chosen as representative of arid regions in Egypt because of its low rainfall and high temperature. The latitude of the region lies between 29º55'36.48" N and 30º6'38.88" N, while the longitude lies between 32º25'58.80" E and 32º38'0.60" E. The measured daily rainfall data were downloaded from the NASA TRMM portal (https://giovanni.gsfc.nasa.gov/giovanni/), which provides a detailed dataset of rainfall distribution across continents. The total monthly rainfall data were obtained by summing the daily rainfall values. Monthly rainfall data covering nearly 20 years, from March 2000 to December 2019 (the end of mission recording), were used. The data were divided chronologically into 60% for the training phase (143 readings from March 2000 to January 2012) and 40% for the testing phase (95 readings from February 2012 to December 2019). The statistical parameters of the monthly rainfall in the studied region during the period 2000 to 2019 are presented in Table 1. It is clear that the observed monthly rainfall data do not follow the normal distribution and are right-skewed, given their positive skewness value.

Figure 1. Location of Suez city in Egypt.
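The two data-preparation steps described in this section, aggregating daily records into monthly totals and splitting the 238-month series 60/40, can be sketched as follows. The dictionary values and helper names are illustrative, not TRMM data.

```python
def monthly_totals(daily):
    """Sum daily rainfall records into monthly totals. `daily` maps
    (year, month, day) -> rainfall; the example values are illustrative."""
    monthly = {}
    for (year, month, _day), value in sorted(daily.items()):
        monthly[(year, month)] = monthly.get((year, month), 0.0) + value
    return monthly

def chronological_split(series, train_fraction=0.6):
    """Split an ordered monthly series into training and testing periods,
    as in the study's 60/40 division (143 training and 95 testing months)."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]

months = list(range(238))  # March 2000 .. December 2019 = 238 months
train, test = chronological_split(months)
```

Keeping the split chronological, rather than shuffled, preserves the temporal structure that the recurrent models rely on.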

Results and discussion
A comparison between the predicted and observed rainfall at Suez city in Egypt can be seen in Figure 2. As discussed previously, three different measures are used to evaluate the performance of the neural network models, namely CE, R, and WI, as summarized in Table 2. In general, the highest CE, R, and WI values indicate the best-performing model. In terms of CE, GNN attains a value of 0.84, whereas the FFNN, RNN, CBPN, and EBP models attain CE values of 0.67, 0.61, 0.73, and 0.65, respectively. The values of R for the FFNN, GNN, RNN, CBPN, and EBP models are 0.83, 0.92, 0.79, 0.87, and 0.82, respectively; GNN has the largest R-value (0.92), in contrast to the other models, whose correlation coefficients do not exceed 0.87. Similar to the R results, GNN performs best in terms of WI: the WI values for the FFNN, GNN, RNN, CBPN, and EBP models are 0.91, 0.95, 0.89, 0.93, and 0.90, respectively. The GNN model thus shows the same pattern of superiority over the other models. Taking into account the architecture of the addressed models, GNN can be identified as a reliable model for forecasting monthly rainfall in Suez, Egypt. CBPN is ranked second, and its performance on most of the applied metrics is very close to that of GNN. RNN is the worst-performing model because it has the lowest values on the applied metrics.

The Wilcoxon-Mann-Whitney U-test is performed to evaluate the significance level of the neural network models' outcomes, with the significance level set at 0.05. The test checks the null hypothesis that the ranks of the two models are identical; the alternative hypothesis implies that there is a difference in the ranks of the two models. If the P-value is less than the significance level, the null hypothesis is rejected in favor of the alternative; if the P-value is greater than the significance level, the null hypothesis cannot be rejected.
The results of the paired Wilcoxon-Mann-Whitney U-tests are depicted in Table 3. It is evident that the P-values of the pairs (FFNN, GNN), (FFNN, CBPN), and (GNN, EBP) are greater than 0.05, indicating a statistically insignificant difference between the models in each pair (i.e., the null hypothesis holds). On the other hand, the P-values of the remaining pairs, including (FFNN, RNN), are less than 0.05, indicating statistically significant differences.

Figure 3. Comparison of the prediction models and observations using the Taylor diagram.

Figure 3 shows the Taylor diagram, which is used to check the models' accuracy by relating the root-mean-square error, standard deviation, and correlation coefficient between the predicted and observed results. The diagram shows that the correlation coefficients of the prediction models lie in the range 0.7-0.95: the RNN shows the lowest value (0.79), while the GNN shows the highest (0.92). The root-mean-square error values of the prediction models lie in the range 5-15, with the GNN and RNN showing the lowest (8.00) and highest (12.41) values, respectively. In terms of standard deviation, the models' values lie between 15 and 20; the standard deviation of the GNN model is 15.84, whereas the standard deviations of the other models exceed 17.64. Hence, in terms of root-mean-square error, standard deviation, and correlation coefficient, the GNN model provides the most reliable predictions. These results coincide with those obtained using the performance metrics.
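The reason a single diagram can relate all three statistics is the law-of-cosines identity underlying the Taylor diagram: the centred root-mean-square difference satisfies rmsd² = σo² + σp² − 2·σo·σp·R. The following sketch (illustrative helper and data, not the study's values) computes the three quantities and checks the identity.

```python
import math

def taylor_stats(obs, pred):
    """Statistics summarized by a Taylor diagram: standard deviations of
    both series, their correlation, and the centred root-mean-square
    difference, which satisfies rmsd^2 = so^2 + sp^2 - 2*so*sp*r."""
    n = len(obs)
    o_bar = sum(obs) / n
    p_bar = sum(pred) / n
    so = math.sqrt(sum((o - o_bar) ** 2 for o in obs) / n)
    sp = math.sqrt(sum((p - p_bar) ** 2 for p in pred) / n)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred)) / n
    r = cov / (so * sp)
    rmsd = math.sqrt(sum(((p - p_bar) - (o - o_bar)) ** 2
                         for o, p in zip(obs, pred)) / n)
    return so, sp, r, rmsd

so, sp, r, rmsd = taylor_stats([1.0, 3.0, 2.0, 5.0], [1.5, 2.5, 2.0, 4.0])
# The identity ties the three axes of the diagram together:
identity_gap = abs(rmsd ** 2 - (so ** 2 + sp ** 2 - 2 * so * sp * r))
```

This is why a model plotted close to the observation point on the diagram, as the GNN is here, simultaneously has a high correlation, a matching standard deviation, and a low centred root-mean-square difference.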