Efficacy of ANFIS-GOA technique in flood prediction: a case study of Mahanadi river basin in India

Accurate flood prediction is of utmost importance for mitigating the catastrophes caused by flood events. Flooding leads to severe civic and financial damage, particularly in large river basins, and mainly affects the downstream regions of a river. Artificial Intelligence (AI) models have been used effectively to model numerous nonlinear relationships and are suitable for modelling complex hydrological systems. Therefore, the main purpose of this research is to propose an effective hybrid system that integrates an Adaptive Neuro-Fuzzy Inference System (ANFIS) model with two meta-heuristics, Grey Wolf Optimization (GWO) and the Grasshopper Optimization Algorithm (GOA), for flood prediction in the River Mahanadi, India. The robustness of the proposed meta-heuristics is assessed by comparison with a conventional ANFIS model over various input combinations, using 50 years of monthly historical flood discharge data. The potential of the AI models is evaluated against observed data in both training and validation sets using three statistical performance measures, namely root mean squared error (RMSE), mean squared error (MSE) and Willmott Index (WI). Results reveal that the robust ANFIS-GOA outperforms standalone AI techniques and produces superior flood forecasts for all input scenarios.


INTRODUCTION
Forecasting hydrological phenomena is of significant concern in the field of hydrology and is pivotal for appropriate water resources development and disaster management. Every year, substantial public and financial damages, as well as fatalities, are caused by dangerous storms worldwide, specifically in areas subject to monsoon weather and regions with slow growth of water conservancy schemes (Jiang et al. 2013; Wang et al. 2015; Yu et al. 2015). Flood prediction and forecasting are essential practices for controlling flood events across the globe (Young 2002; Campolo et al. 2003; Moore et al. 2005; Jiang et al. 2016; Rath et al. 2017; Panigrahi et al. 2018; Samantaray & Sahoo 2020). Obtaining reliable forecasts is a complex problem that has concerned researchers over past decades. Upstream conditions strongly influence flood flows in downstream zones; therefore, a flood forecasting model needs to be developed that can accurately capture the fundamental relationship between upstream and downstream conditions. Artificial neural networks (ANNs) are suitable models for this problem. Over past decades, ANN and Adaptive Neuro-Fuzzy Inference System (ANFIS) models have been used extensively in a variety of engineering applications involving hydrology, such as simulating the rainfall-runoff process (Wu & Chau 2011), modelling groundwater problems (Sahoo et al. 2005; Taormina et al. 2012), forecasting streamflow (El-Shafie et al. 2007; Shu & Ouarda 2008) and modelling water quality (Singh et al. 2009; Yan et al. 2010). In the last decade, numerous developments have been made to improve both the performance and the consistency of ANN tools. In recent times, attention has shifted from the applicability of ANN tools to refining their estimation capability and clarifying their inner workings (Maier & Dandy 2000; Sudheer & Jain 2004; Araghinejad 2013).
Chen et al. (2006) proposed the construction of a flood forecast model using ANFIS in the Choshui River, Taiwan, and compared its performance with a back-propagation neural network (BPNN). The results demonstrated that ANFIS was effective and reliable for constructing a flood forecasting model with better accuracy. Bisht & Jangid (2011) used ANFIS to develop river stage-discharge models at the Dhawalaishwaram barrage site in Andhra Pradesh, India. Based on the comparison of observed and estimated data, outcomes revealed that ANFIS performed better in predicting river flow discharge than customary models. Rezaeianzadeh et al. (2014) used ANFIS, ANN, multiple linear regression and multiple nonlinear regression to forecast the daily peak flow of the Khosrow Shirin catchment, located in the Fars region, Iran. The predictive capabilities of the models were evaluated, and ANFIS performed best in predicting daily flow discharge at the site with spatially distributed rainfall as input. Ghorbani et al. (2016) investigated the usability of two different ANNs, a multilayer perceptron (MLP) and a radial basis function network (RBFN), and compared them with a support vector machine (SVM) for predicting monthly streamflow in the Zarrinehrud River, Iran. Results indicated that the SVM model was more reliable and consistent than MLP and RBFN in river flow prediction. Zhou et al. (2019) proposed a recurrent ANFIS embedded with GA and a least square estimator, which helped in optimizing model parameters, to make multi-step-ahead flood forecasts for the Three Gorges Reservoir, China. The results demonstrated that the proposed model significantly improved the accuracy of flood forecasts.
Selection of model input (for example, pre-processing of data utilizing data-mining methods), model parameter optimization and post-processing of model output (for example, real time correction, ensemble forecasts) are key motivations and significant constituents in multi-step ahead hydrological forecasts. Several climatological parameters have a significant impact on the performance of models while dealing with multi-step ahead forecasts in application to real world problems. While making multi-step ahead flood forecasts, models with different problems (e.g. model instability/overfitting) will fail in tracing flow traces carefully, particularly during peak flows, because of an increase in forecast horizon. Therefore, an effective algorithm is necessary to determine an optimum network parameter setting for improving the reliability and stability of forecasting models.
Optimization algorithms such as differential evolution (DE), particle swarm optimization (PSO), genetic algorithm (GA) and GWO have been developed and integrated with data-driven models for forecasting various water resources and environmental problems (Senapati et al. 2007; Guo et al. 2014; Zhang et al. 2014; Prasad et al. 2017; Yaseen et al. 2017; Ewees & Elaziz 2018; Dehghani et al. 2019a, 2019b; Tikhamarine et al. 2020). Mirjalili et al. (2014) introduced the GWO algorithm to optimize an MLP network, showing superior performance compared to GA, PSO, Evolution Strategy (ES) and Ant Colony Optimization (ACO). Tikhamarine et al. (2020) proposed efficient hybrid models combining GWO with ANN, SVM and MLR to improve precision in forecasting monthly streamflow at the Aswan High Dam on the Nile River. The results revealed that the integrated techniques outperformed standard ANN, SVM and MLR techniques and made improved forecasts of monthly inflow throughout the training and testing periods. However, GWO has a few disadvantages, such as low solving accuracy, unsatisfactory local search ability and a slow convergence rate. More recently, a new nature-inspired meta-heuristic optimization algorithm called GOA was introduced by Saremi et al. (2017). GOA is based on the swarming behaviour of grasshoppers. In this study it is used to improve ANFIS model performance, a combination that has not previously been explored for flood prediction. The algorithm is classified as a multi-solution algorithm for optimization problems; it offers higher accuracy and avoids local optima. It has proved to be an effective algorithm for challenging problems with unknown search spaces (Saremi et al. 2017; Mirjalili et al. 2018).
There are several applications of GOA integrated with AI models in researches involving various fields of science and engineering: selecting harmonic elimination in low-frequency voltage source inverter (Steczek et al. 2020); approximate flyrock distance in mine blasting (Fattahi & Hasanipanah 2021); estimating the parameter of photovoltaic modules on the basis of single diode models (Montano et al. 2020); neural assessment of heating load (HL) of residential buildings (Moayedi et al. 2019); optimal deployment of wireless sensor networks (Deghbouch & Debbat 2021); prediction of pipe burst in urban water distribution systems (Alizadeh et al. 2019) and many more. GOA also has specific application in modelling various hydrological parameters such as evaluation of rainfall temporal variability (Farrokhi et al. 2020); monthly prediction of groundwater level (Seifi et al. 2020); forecasting short-term hydrological drought (Nabipour et al. 2020) and optimization of the non-linear muskingum flood routing model (Khalifeh et al. 2020).
The present research uses robust ANFIS-GWO and ANFIS-GOA models for flood prediction in the Mahanadi river basin, India, and the outcomes are assessed against conventional ANFIS and ANN models. A literature survey indicates that no research has been carried out on predicting flood events using the ANFIS-GOA technique. The novelty of this research is therefore the application of ANFIS-GOA to flood prediction. This study also presents a sensitivity analysis for three different artificial intelligence tools used to forecast monthly flood water levels. Special attention is paid to the optimization of model parameters. The research also emphasizes the input combinations used in the different scenarios, which have a strong impact on the desired model output.

STUDY AREA
The Mahanadi River (Figure 1) flows in central India, rising in the hills of the southeastern state of Chhattisgarh and flowing mainly through Odisha state. The Mahanadi has a total course of 858 km (494 km in Odisha) and an estimated drainage area of 141,600 km² (65,580 km² in Odisha), which is about 42% of Odisha state. It is located at approximately 20.11°N, 81.91°E. It is known for its devastating floods, which have caused much loss of life and property, as recorded in history. It is a significant river for the state of Odisha. It originates south of Sihawa town in the Dhamtari district of Chhattisgarh and finally discharges to the Bay of Bengal at False Point, Jagatsinghpur, Odisha. In the present research, the Jondhra and Kesinga gauge stations of the Mahanadi river basin are selected for predicting flood events.

METHODOLOGY

ANFIS
Fuzzy logic (FL), which is based on fuzzy set theory, applies where there is no sharp or obvious boundary. Unlike two-valued Boolean logic, FL is multi-valued and deals with degrees of membership and truth. It uses any real value between 0 (totally false) and 1 (totally true), known as the membership value (MV), and the function that corresponds to those values is called the membership function (MF) (Das et al. 2019). MFs can take the shape of a triangle, trapezoid, Gaussian or sigmoid and are chosen on the basis of usability. FL possesses logical operators such as AND, OR and NOT, each defined on the basis of MV theory. Fuzzy rules are a further significant element of FL; they relate fuzzy sets to one another. The IF-THEN rule states that if the rule holds and the antecedent is true, then the consequent is also true. Training ANFIS means determining the parameters in its architecture using an optimization function. In the training period, premise and consequent parameters are used, as shown in Figure 2. To obtain efficient outcomes with ANFIS, its network must be trained. As applications of ANFIS have grown, various training techniques have been recommended for achieving improved outcomes. Among the various fuzzy inference systems, Takagi-Sugeno (Takagi & Sugeno 1985) is one of the most commonly used. A typical rule set with two fuzzy 'IF-THEN' rules is expressed as:

$$\text{Rule 1: IF } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ THEN } f_1 = p_1 x + q_1 y + r_1$$

$$\text{Rule 2: IF } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ THEN } f_2 = p_2 x + q_2 y + r_2$$

where $A_1$, $A_2$ and $B_1$, $B_2$ are MVs of the input variables $x$ and $y$, respectively; $p_1$, $q_1$, $r_1$ and $p_2$, $q_2$, $r_2$ are parameters of the output functions $f_1$ and $f_2$. Neuro-fuzzy (NF) systems are an outcome of the combination of ANN and FL. ANFIS comprises a five-layer MLP in which the back-propagation (BP) algorithm is used to adjust the initially chosen MFs, whereas the least mean square algorithm determines the unknown coefficients of the linear output functions.

GWO
Mirjalili et al. (2014) proposed the GWO algorithm, a meta-heuristic optimization algorithm that mimics the social behaviour and hierarchy of grey wolves. In general, the wolf pack is divided into four categories: Alpha (α), Beta (β), Delta (δ) and Omega (ω). The Alpha (α) wolf is the most dominant wolf and is the leader of the pack. The level of dominance decreases from α to ω, as presented in Figure 3. GWO works by splitting the solution set of a given optimization problem into four groups. The α, β and δ wolves are the three best solutions, whereas the remaining solutions fall in the group of ω wolves. To implement this mechanism, the hierarchy is updated in each iteration on the basis of the three best solutions. A representation of the updated position is shown in Figure 4. The main steps in GWO are to search for, encircle, hunt and finally attack the prey. Before hunting, grey wolves encircle the prey. The encircling behaviour of grey wolves is represented by the following equation:

$$\vec{X}(t+1) = \vec{X}_P(t) - \vec{A} \cdot \vec{D}$$

where $\vec{X}(t+1)$ is the next position of any wolf, $\vec{X}_P(t)$ is the position vector of the prey, $t$ is the present iteration, $\vec{A}$ is a coefficient vector and $\vec{D}$ is the distance that separates the wolf and the prey. This distance is estimated using the following equation:

$$\vec{D} = \left| \vec{C} \cdot \vec{X}_P(t) - \vec{X}(t) \right|$$

where $\vec{C}$ is a coefficient vector and $\vec{X}(t)$ is the current position of the grey wolf.
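The preceding equations allow a solution to relocate around the prey in a two-dimensional search space. Nevertheless, this is insufficient for simulating the social intelligence of grey wolves. In simulating the hunt, the best solution, achieved by the α wolf, is assumed to be nearest to the position of the prey; however, the global optimum is not known. Hence it is assumed that the three best solutions have better knowledge of the prey's position, so the remaining wolves update their positions using the following equation:

$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}$$

where $\vec{X}_1$, $\vec{X}_2$ and $\vec{X}_3$ are calculated using the following equations:

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta$$

with $\vec{D}_\alpha = |\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}|$, $\vec{D}_\beta = |\vec{C}_2 \cdot \vec{X}_\beta - \vec{X}|$ and $\vec{D}_\delta = |\vec{C}_3 \cdot \vec{X}_\delta - \vec{X}|$. The encircling and attacking of prey continue repeatedly until an optimal solution is achieved or the maximum number of iterations is reached.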
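The position update described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code. The function name `gwo_minimize`, the population size, the iteration count and the bound clipping are illustrative assumptions, and the schedules $\vec{A} = 2a\vec{r}_1 - a$ and $\vec{C} = 2\vec{r}_2$, with $a$ decreasing linearly from 2 to 0, follow the standard GWO formulation of Mirjalili et al. (2014) rather than anything stated in this section.

```python
import numpy as np

def gwo_minimize(objective, lower, upper, n_wolves=20, n_iter=100, seed=0):
    """Minimal GWO sketch: minimise `objective` inside box bounds (all names hypothetical)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    best_pos, best_fit = None, np.inf
    for t in range(n_iter):
        fitness = np.array([objective(w) for w in wolves])
        order = np.argsort(fitness)
        # alpha, beta and delta are the three best solutions of this iteration
        alpha, beta, delta = (wolves[order[k]].copy() for k in range(3))
        if best_fit > fitness[order[0]]:
            best_fit, best_pos = float(fitness[order[0]]), alpha.copy()
        a = 2.0 * (1.0 - t / n_iter)  # assumed schedule: linear decrease from 2 to 0
        candidates = []
        for leader in (alpha, beta, delta):
            A = 2.0 * a * rng.random((n_wolves, dim)) - a  # coefficient vector A
            C = 2.0 * rng.random((n_wolves, dim))          # coefficient vector C
            D = np.abs(C * leader - wolves)                # distance to the leader
            candidates.append(leader - A * D)              # X1, X2, X3
        # every wolf moves to the mean of X1, X2 and X3, kept inside the bounds
        wolves = np.clip(sum(candidates) / 3.0, lower, upper)
    return best_pos, best_fit
```

For example, `gwo_minimize(lambda x: float(np.sum(x ** 2)), [-5, -5, -5], [5, 5, 5])` searches for the minimum of a simple sphere function; in an ANFIS-GWO setting the objective would instead be a training error of the ANFIS model.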

GOA
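GOA is a newly developed meta-heuristic optimization algorithm based on the swarming behaviour of grasshoppers (Saremi et al. 2017). The function describing swarming behaviour is defined as follows (Mafarja et al. 2019):

$$X_i = S_i + G_i + A_i$$

where $X_i$ is the position of the ith grasshopper, $S_i$ is social interaction, $G_i$ is gravity force and $A_i$ is wind advection. $S_i$ is expressed as:

$$S_i = \sum_{j=1,\, j \neq i}^{N} s(d_{ij})\, \hat{d}_{ij}$$

where $s$ designates social forces, $d_{ij}$ is the distance between the ith and jth grasshoppers and $\hat{d}_{ij} = (x_j - x_i)/d_{ij}$ is the corresponding unit vector.
Here, $s$ can be defined as:

$$s(r) = f e^{-r/l} - e^{-r}$$

where $f$ is the intensity of attraction and $l$ is the attractive length scale. $G_i$ is expressed as:

$$G_i = -g\, \hat{e}_g$$

where $\hat{e}_g$ is the unit vector and $g$ is the gravitational constant. $A_i$ is expressed as:

$$A_i = u\, \hat{e}_w$$

where $\hat{e}_w$ is the unit vector and $u$ is a constant drift. Lastly, a new location is computed using the following equation:

$$X_i = \sum_{j=1,\, j \neq i}^{N} s\left(|x_j - x_i|\right) \frac{x_j - x_i}{d_{ij}} - g\, \hat{e}_g + u\, \hat{e}_w$$

where $N$ is the number of grasshoppers. Moreover, an improved function is used for updating the grasshopper's location:

$$X_i^d = c \left( \sum_{j=1,\, j \neq i}^{N} c\, \frac{ub_d - lb_d}{2}\, s\left(|x_j^d - x_i^d|\right) \frac{x_j - x_i}{d_{ij}} \right) + \hat{T}_d \tag{17}$$

where $lb_d$ is the lower bound and $ub_d$ the upper bound in the dth dimension; $\hat{T}_d$ is the value of the dth dimension in the target; and $c$ is a decreasing coefficient that shrinks the comfort zone, repulsion zone and attraction zone. Figure 5 represents the flowchart of GOA. Equation (17) is computed on the basis of the location of the best grasshopper (target), the present location of the grasshopper and the locations of all other grasshoppers.
To balance exploitation and exploration, the parameter $c$ must be reduced in proportion to the number of iterations. This mechanism promotes exploitation as the iteration count increases. The coefficient $c$ is calculated with Equation (18):

$$c = c_{\max} - t\, \frac{c_{\max} - c_{\min}}{t_{\max}} \tag{18}$$

where $c_{\max}$ and $c_{\min}$ are the maximum and minimum values of $c$, $t$ is the present iteration and $t_{\max}$ is the maximum number of iterations.

Pseudo-code of GOA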
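As an illustration only, and not the authors' implementation, the update of Equation (17) and the schedule of Equation (18) can be sketched in Python as follows. The function names, population size, iteration count, bound clipping and the values `c_max=1.0` and `c_min=1e-5` are assumptions made for the sketch; `f=0.5` and `l=1.5` are the attraction intensity and attractive length scale reported for GOA in this study.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    """Social force s(r) = f*exp(-r/l) - exp(-r)."""
    return f * np.exp(-r / l) - np.exp(-r)

def goa_minimize(objective, lower, upper, n_agents=20, n_iter=50,
                 c_max=1.0, c_min=1e-5, seed=0):
    """Minimal GOA sketch (hypothetical helper): minimise `objective` inside box bounds."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_agents, dim))
    fit = np.array([objective(p) for p in pos])
    target, target_fit = pos[fit.argmin()].copy(), float(fit.min())
    history = [target_fit]  # best fitness found so far, per iteration
    for t in range(1, n_iter + 1):
        # Equation (18): c decreases linearly from c_max towards c_min
        c = c_max - t * (c_max - c_min) / n_iter
        new_pos = np.empty_like(pos)
        for i in range(n_agents):
            social = np.zeros(dim)
            for j in range(n_agents):
                if j == i:
                    continue
                diff = pos[j] - pos[i]
                dist = np.linalg.norm(diff) + 1e-12  # guard against division by zero
                # inner term of Equation (17), evaluated dimension by dimension
                social += c * (upper - lower) / 2.0 * s_func(np.abs(diff)) * diff / dist
            # Equation (17): scaled social term plus the target position
            new_pos[i] = np.clip(c * social + target, lower, upper)
        pos = new_pos
        fit = np.array([objective(p) for p in pos])
        if target_fit > fit.min():
            target, target_fit = pos[fit.argmin()].copy(), float(fit.min())
        history.append(target_fit)
    return target, target_fit, history
```

Because the target is replaced only when a better solution is found, the returned `history` is non-increasing, which is a quick sanity check when the sketch is adapted to another objective.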

Proposed hybrid methodology
In the present study, the GOA and GWO algorithms were applied to find optimum parameter values and to train ANFIS. To develop ANFIS-GOA and ANFIS-GWO, optimum GOA and GWO parameters were set on the basis of extensive parametric studies, and two codes were written in MATLAB. In ANFIS-GOA and ANFIS-GWO, GOA and GWO help the hybrid approaches capture a closer relationship between input and output. The resulting robust techniques can estimate more precise outcomes for nonlinear problems. The major task in a hybrid modelling approach is the appropriate selection of GOA and GWO parameters. To find the best values of these parameters, a trial and error approach was employed. GOA parameters are given in Table 1. Figures 4 and 6 show the training process of ANFIS by the GOA and GWO algorithms.
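To make the idea of training ANFIS with a meta-heuristic concrete, the sketch below shows one possible way to expose a first-order Takagi-Sugeno model as an objective function over a flat parameter vector, which any population-based minimizer could then search. It is a Python illustration under stated assumptions, not the authors' MATLAB code: Gaussian membership functions, one membership function per input per rule, product firing strengths, and the names `sugeno_predict`, `make_rmse_objective` and `n_params` are all hypothetical choices.

```python
import numpy as np

def n_params(n_inputs, n_rules):
    """Length of the flat parameter vector: centres, widths and linear consequents."""
    return n_rules * (3 * n_inputs + 1)

def sugeno_predict(params, X, n_rules):
    """First-order Takagi-Sugeno output for inputs X of shape (n_samples, n_inputs)."""
    params = np.asarray(params, dtype=float)
    X = np.asarray(X, dtype=float)
    n_inputs = X.shape[1]
    k = n_rules * n_inputs
    centers = params[:k].reshape(n_rules, n_inputs)               # premise parameters
    sigmas = np.abs(params[k:2 * k].reshape(n_rules, n_inputs)) + 1e-6
    conseq = params[2 * k:].reshape(n_rules, n_inputs + 1)        # consequent parameters
    # Layers 1-2: Gaussian memberships and product firing strength of each rule
    mu = np.exp(-0.5 * ((X[:, None, :] - centers) / sigmas) ** 2)
    w = mu.prod(axis=2)
    # Layer 3: normalised firing strengths
    w_norm = w / (w.sum(axis=1, keepdims=True) + 1e-12)
    # Layer 4: linear rule outputs, e.g. f_i = p_i*x + q_i*y + r_i for two inputs
    f = X @ conseq[:, :-1].T + conseq[:, -1]
    # Layer 5: weighted sum of rule outputs
    return (w_norm * f).sum(axis=1)

def make_rmse_objective(X, y, n_rules):
    """Return a function params -> training RMSE, suitable for a black-box optimiser."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    def objective(params):
        pred = sugeno_predict(params, X, n_rules)
        return float(np.sqrt(np.mean((y - pred) ** 2)))
    return objective
```

With this design the optimizer only sees a vector of length `n_params(n_inputs, n_rules)` and a scalar error, which matches the black-box use of GOA and GWO described in this study.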

Preparation of data set
Precipitation (P) data were obtained from CWC, Bhubaneswar, whereas temperature (T), solar radiation ($S_r$), humidity (H), evapotranspiration loss ($E_l$), absorption loss ($A_l$) and percolation loss ($P_l$) data were collected from IMD, Pune, for the period 1970-2019. Data from 1970 to 2004 were used for training, and data from 2005 to 2019 for testing. Daily data were converted to the monthly data required for training and testing the models. The following arrangements were applied as input:
Scenario 1: P, T, $S_r$, H
Scenario 2: P, T, $S_r$, H, $E_l$
Scenario 3: P, T, $S_r$, H, $E_l$, $A_l$
Scenario 4: P, T, $S_r$, H, $E_l$, $A_l$, $P_l$
Input and output were arranged so that all data fell within a fixed range prior to training. This process is called normalization, and normalized values fall between 0 and 1. The equation employed for normalizing the data is:

$$I_t = \frac{I - I_{\min}}{I_{\max} - I_{\min}}$$

where $I_t$ = converted data series, $I$ = actual input data set, $I_{\min}$ = minimum of actual input data set, and $I_{\max}$ = maximum of actual input data set.

Table 1 (recovered rows): GOA parameters
Parameter | Symbol | Value
Attraction intensity | f | 0.5
Attractive length scale | l | 1.5
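The scenario definitions and the min-max scaling can be sketched in Python as below. The dictionary `SCENARIOS`, the plain-text variable labels and the helper names are hypothetical. Computing $I_{\min}$ and $I_{\max}$ from the training period only and reusing them for the testing period is an assumption of this sketch; the text does not state which period the extremes were taken from.

```python
import numpy as np

# Input combinations for the four scenarios (labels are illustrative stand-ins
# for P, T, S_r, H, E_l, A_l and P_l)
SCENARIOS = {
    1: ["P", "T", "Sr", "H"],
    2: ["P", "T", "Sr", "H", "El"],
    3: ["P", "T", "Sr", "H", "El", "Al"],
    4: ["P", "T", "Sr", "H", "El", "Al", "Pl"],
}

def minmax_fit(train):
    """Column-wise minimum and maximum of the training data."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)

def minmax_apply(data, i_min, i_max):
    """I_t = (I - I_min) / (I_max - I_min), applied column by column."""
    return (np.asarray(data, dtype=float) - i_min) / (i_max - i_min)

# Tiny made-up example with two input columns
train = np.array([[2.0, 10.0], [4.0, 20.0], [6.0, 40.0]])
i_min, i_max = minmax_fit(train)
scaled = minmax_apply(train, i_min, i_max)
```

A column whose values are all identical would make the denominator zero, so such inputs need to be dropped or handled separately before scaling.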

Evaluating standards
Willmott's index (WI), MSE and RMSE are the measures used to identify the best model. For the best model in the study area, MSE and RMSE must be at a minimum while WI must be at a maximum.
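A minimal Python sketch of the three measures is given below. It assumes the standard textbook definitions of MSE and RMSE and the commonly used form of Willmott's index of agreement, which may differ in detail from the expressions used by the authors; the function names are illustrative.

```python
import numpy as np

def mse(obs, sim):
    """Mean squared error between observed and simulated series."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean((obs - sim) ** 2))

def rmse(obs, sim):
    """Root mean squared error."""
    return float(np.sqrt(mse(obs, sim)))

def willmott_index(obs, sim):
    """Willmott's index of agreement (assumed standard form); 1 means perfect agreement."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    obs_mean = obs.mean()
    num = np.sum((obs - sim) ** 2)
    den = np.sum((np.abs(sim - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return float(1.0 - num / den)

# Made-up example: every simulated value overshoots the observation by 1
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
print(mse(obs, sim), rmse(obs, sim), willmott_index(obs, sim))
```

Lower MSE and RMSE together with WI closer to 1 indicate a better fit, which is the selection rule stated above.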
Willmott's index indicates the proportion of variation in one variable that is explained by another variable (Willmott et al. 1985). In the definitions of these measures, $\bar{x}_{comp}$ is the mean of the simulated data and $\bar{x}_{obs}$ is the mean of the actual data.
The above results show that when evapotranspiration loss is added to the normal climatological indices (Scenario 2), accuracy improves relative to Scenario 1. Again, when absorption loss is considered (Scenario 3), the results are better than for Scenario 2. Overall, Scenario 4 performs better than the other three scenarios. Scenario 4 differs from Scenario 3 by the inclusion of percolation loss. Therefore, losses are key inputs for flood forecasting. Likewise, the new hybrid model (ANFIS-GOA) showed better performance than the ANFIS and ANFIS-GWO models and performed best among all models. It has a lower coefficient of variation than the other models and optimization algorithms. The major contribution of this research is the assessment of the flood prediction potential of hybrid models based on parametric effects owing to the regulatory weights and parameters formed in the training phase of the models.

Assessment of outcome for different models
Appraisals of the ANFIS, ANFIS-GWO and ANFIS-GOA models in the training and testing periods for the proposed gauge stations are presented in Figures 7 and 8. The highest values of WI for the ANFIS, ANFIS-GWO and ANFIS-GOA models were 0.96097, 0.98139 and 0.99158, respectively, for Jondhra station. Similarly, for Kesinga [...] peak of 5,568 m³/s for the Jondhra gauge site. For Kesinga, the projected peak floods are 5,036, 5,145 and 5,199 m³/s for ANFIS, ANFIS-GWO and ANFIS-GOA, respectively, against the observed peak of 5,308 m³/s, as presented in Figure 10. This indicates a significant impact on flood prediction and was found to be beneficial for flash flood regions with a predictive flood index. Figure 11 shows a box-plot of observed and simulated flood values from 1970 to 2019 at the Kesinga and Jondhra gauge stations. Comparison of the ANFIS, ANFIS-GWO and ANFIS-GOA techniques with observed values reveals that the ANFIS-GOA method can estimate high floods well. Moreover, the box area is smallest for ANFIS, which confirms the lower accuracy of ANFIS in comparison with the other models.
Histograms showing the ratio of simulated to observed flood values for the ANFIS, ANFIS-GWO and ANFIS-GOA models are presented to assess the frequency of data points in a number of selected error bins. Here, the total number of months is binned on the x-axis, and the probability of occurrence for each time series is examined. A close comparison of simulated and observed floods for ANFIS-GOA and the other models is shown in Figures 12 and 13. These figures represent the probability distribution of the data. Such plots are necessary for representing the probability of occurrence of a given flood value within a particular interval. On this measure of model precision, the probability distribution of flood values predicted by the ANFIS-GOA model was very close to that of the observed flood values for most intervals, as presented in Figures 12 and 13.

Comparison of model performance
MSE, RMSE and WI indicators are used to evaluate the performance of the ANFIS, ANFIS-GWO and ANFIS-GOA models for the two gauge stations. The performance indicators are listed in Table 2, illustrating the efficiency of each model. Evaluating floods is very important, and hence the methods applied in the present study are important for providing flood prediction information. Therefore, calculation of RMSE, WI and MSE values is vital for assessing flood predictions. It is apparent that the ANFIS-GOA model performed well compared to ANFIS and ANFIS-GWO for all four scenarios. Evaluation and assessment are conducted to study the performance of the models.
The obtained results clearly show the advantages of GOA in solving real-world problems with unknown search spaces. The success of GOA has several causes. In the initial steps of optimization, the exploration capability of GOA is high because of the high repulsion rate between grasshoppers. This helps GOA explore the search space broadly and discover its promising regions. Then, in the final steps of optimization, exploitation is high because of stronger attraction forces between grasshoppers. This behaviour produces a local search and improves the accuracy of the solutions found in the exploration stage. GOA efficiently balances exploitation and exploration, focusing first on avoiding local optima and then on convergence. The adaptive comfort zone coefficient is the reason behind this behaviour. The steady decrease of this coefficient, in proportion to the number of iterations, brings grasshoppers nearer to the target. Finally, the target chasing mechanism requires GOA to save the best solution attained so far as the target and to drive grasshoppers towards it, in the hope of improving its accuracy or finding a superior one in the search space. In view of the simulations, results, discussion and analysis of this research, we believe that GOA is capable of solving many optimization problems efficiently. It treats a given optimization problem as a black box and thus does not require gradient information about the search space. Hence, it can be applied to any optimization problem in various fields, subject to appropriate problem formulation. The above results indicate that the performance of ANFIS-GOA in terms of RMSE and WI is superior to that of hybrid ANN with KNN (Kan et al. 2020); ANFIS-GA, ANFIS-PSO, ANFIS-ACO (Azad et al. 2018); hybrid deep learning ConvLSTM (Moishin et al. 2021); FSF-ARIMA (Banihabib et al. 2020) and ANFIS (Rezaeianzadeh et al. 2014).

Sensitivity analysis
In this research, the sensitivity of flood prediction to precipitation (P), temperature (T), solar radiation ($S_r$), humidity (H), evapotranspiration loss ($E_l$), absorption loss ($A_l$) and percolation loss ($P_l$) is discussed for the different machine learning approaches. First, a model was developed using precipitation, temperature, solar radiation and humidity, and its efficacy was determined at the two proposed stations (Scenario 1). Second, evapotranspiration loss was added to the previous model, which gave better results than Scenario 1. Similarly, adding absorption loss ($A_l$) to Scenario 2 gave higher performance than Scenario 2 for all techniques. When percolation loss ($P_l$) was added as an input (Scenario 4) to the previous arrangement (Scenario 3), the models gave their best performance for all five machine learning approaches during both training and testing phases. Moreover, all three losses (evapotranspiration loss, absorption loss, percolation loss) have a notable effect on flood prediction for all five proposed machine learning algorithms. It was also found that the ANFIS-GOA model showed greater sensitivity in performance than the other proposed machine learning approaches. As the study area lies within a flood-prone region, the development of ANN models will aid in assessing flood discharge. These results point to the most appropriate methods for estimating floods at stations in any flood-prone region. However, combinations of techniques need to be examined further to improve conjoint modelling techniques in the future.

Limitations and future scope
A major disadvantage of applying hybrid machine learning algorithms is that the training time of the model increases after hybridization of machine learning and meta-heuristic algorithms, particularly while dealing with certain complex problems. Also, they are classifier specific methods and depend on a combination of different feature selection methods. Taking into consideration the advantages and disadvantages of various algorithms, conjoining search strategies of different algorithms for generating a novel algorithm is a burning research matter. The prediction performances of data-driven models are subject to quality and quantity of data. This study is conducted in a specific location (Jondhra and Kesinga gauge stations of Mahanadi river basin). The scope of the present study can be extended by applying ML models to various other geographical locations. Selection of best input combinations for a particular model can possibly vary with changes in default model operators. This assessment of selecting best input combinations using different approaches could be an interesting subject for future research. Moreover, in the direction of future research, it is significant to mention that not all rules in model architecture are vital; hence, it is essential to reduce the complexity of trained models by removing noncontributing rules leading to a decrease in the computational cost of the network. To improve proposed methods more state-of-the-art AI methods, for example the probabilistic and ensemble forecasting methods, could be combined with data-driven models for reducing uncertainties in multistep ahead flood forecast in future research.

CONCLUSION
This study investigates the potential of new hybrid models combining GOA and GWO algorithms with the ANFIS (ANFIS-GOA and ANFIS-GWO) model for prediction of flood events. To achieve this objective, Jondhra and Kesinga stations located on the Mahanadi River were chosen as the case study. GOA and GWO were developed to optimize the parameters of ANFIS and were then compared with simple ANN and ANFIS models. The