Real-Time and Intelligent Flood Forecasting Using UAV-Assisted Wireless Sensor Network

The Wireless Sensor Network (WSN) is a promising technology that could be used to monitor rivers' water levels for early-warning flood detection in the 5G context. However, during a flood, sensor nodes may be washed away or become faulty, which seriously affects network connectivity. To address this issue, Unmanned Aerial Vehicles (UAVs) could be integrated with the WSN as routers or data mules to provide reliable data collection and flood prediction. In light of this, we propose a fault-tolerant multi-level framework comprising a WSN and a UAV to monitor river levels. The framework provides seamless data collection by handling the disconnections caused by failed nodes during a flood. In addition, an algorithm that hybridizes the Group Method of Data Handling (GMDH) with Particle Swarm Optimization (PSO) is proposed to predict forthcoming floods in an intelligent collaborative environment. The proposed water-level prediction model is trained on a real dataset obtained from the Selangor River in Malaysia. The performance of the work is evaluated against other models, and numerical results are provided for several metrics: coefficient of determination (R²), correlation coefficient (R), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and BIAS.

multi-hop communication, a UAV is called to bridge the communication and send the data to the base station.
Wireless sensor nodes are deployed at the edge of the urban river to monitor water flow behavior during times of flood or prolonged rainfall, and UAVs are adapted for wireless data collection from the sensors. To use the UAV optimally and control its topology efficiently, the cloud divides the disaster area into several sub-regions, and the center of each sub-region is taken as a hovering point of the UAV. The sensor nodes are then grouped into these sub-regions according to the received signal strength (RSS) of the detected beacons. In each sub-region, packets are forwarded based on a random walk process to collect the data of the sensor nodes. If the packet returns to the starting node with an expected time of t, a failure is inferred; the UAV then functions as a relay and forwards the next packet to the cloud. The main contributions of this paper are threefold:
• We propose a framework for real-time data collection based on a multi-hop WSN and a UAV, in which the UAV acts as a router and relays the data packets of sensor nodes that fail to find any available node as the next hop.
• We integrate cloud and SDN to manage network connectivity across the data center and simplify the dynamic programming process. We divide the disaster area into several sub-regions, and the UAV uses the random walk model to collect the data of each sub-region, including node IDs and neighbor tables. The collected data are then forwarded to the SDN-empowered cloud for flood prediction.
• We propose a novel prediction model for predicting floods. Once river flow data are transmitted to the central prediction unit, the Group Method of Data Handling (GMDH) integrated with Particle Swarm Optimization (PSO) is used to forecast floods.
The rest of the paper is structured as follows. Section 2 discusses the related works on the topic. In Section 3 we provide a statement for the considered problem, whereas in Section 4 we outline a multi-level network model. Section 5 presents the prediction model, whereas Section 6 explains the results. Section 7 illustrates the discussion. Finally, conclusions and future directions of the paper are given in Section 8.

Related Work
Although several works integrated UAVs and WSNs, it should be stressed that none of them make use of UAVs to enable higher-resilience WSNs during flood prediction or make evaluations based on real data. Concerning quick learning for UAV navigation tasks, some previous works typically emphasize accurate methods for components such as perception and relative pose estimation [10] or trajectory optimization and control [11]. UAVs can support various wireless communication protocols. For example, UAVs can communicate with WSNs in a self-organized way by ZigBee modules [12,13] and have the ability to serve as relays to forward data to the cloud [14][15][16][17]. These models include Artificial Neural Networks (ANNs), Genetic Programming (GP), Adaptive Neuro Fuzzy Inference Systems (ANFIS), and Support Vector Machines (SVM) to evaluate the longitudinal dispersion constant [18]. Among these techniques is the GMDH that is a self-organizing method with non-linear network models. It uses a combination of a quadratic polynomial in a multi-layer procedure [19]. Many recent algorithms such as GMDH networks have been able to perform accurate predictions, especially the river water stage prediction. The GMDH networks were a quick learning machine planned by Ivakhnenko in the 1960s [20,21]. The GMDH networks provide effective and efficient technical performance in various engineering fields [21], but their training suffers from certain disadvantages such as local minimum and slow convergence. Therefore, selecting an applicable training model is one of the paramount steps within the development of a data-driven model. This study adopted the PSO technique [20] to train GMDH networks for river prediction models. The developed model is a hybrid method for one-day-ahead prediction of river water where a non-linear regression approach is adopted due to the complex process of river flow prediction in natural rivers. 
It is evaluated in simulated networks in Malaysia, where some other neural network-based models, including DE, GA, and ANN, are also tested for comparison. The effective forecasting technique for river water stages would minimize losses from flooding exploitation due to the prediction of what people close to the river need [22][23][24]. Some limitations of the GMDH technique include slow convergence in training, imprecision in parameter assessment, overfitting, the partition of information, and low accuracy. Therefore, a hybrid version of GMDH was planned to considerably boost its performance. Robinson and colleagues [25] presented a Multi-Objective GMDH (MOGMDH) algorithm within a consistency criterion that used three different selectors within the choice procedure. This significantly improved the performance of the GMDH algorithmic program. Hiassat et al. [26] proposed the Genetic Programming-GMDH algorithmic program, which applies genetic programming to discover the simplest functions that can map inputs to outputs for every layer of the GMDH algorithmic program, and they presented a model that achieves better results than the standard GMDH algorithm in time series predictions using financial and weather information. Genetic Algorithms (GAs) have recently attracted attention in feedforward self-organizing networks. In this study, neuron connections are controlled to adjacent layers [27]. The lack of effective training algorithms for training multi-layer perceptron is an important issue in GMDH networks. In recent years, some data-driven improvements to training algorithms such as Back Propagation (BP) [28], Levenberg-Marquardt procedure [29], and scaled conjugate gradient procedure [30] have been used to perform training tasks. Usually, gradient-based methods have some drawbacks, such as slow speed convergence during training and getting trapped in local minimums. So far, several prediction approaches have been proposed. 
However, none of these approaches has considered the effect of UAV-based data collection on river flow prediction together with the PSO algorithm for training the GMDH model. To establish the novelty of the proposed model, a comparison with the state of the art is provided in Tab. 1. The table lists the proposed models that use UAVs for data collection from sensor nodes.

Problem Statement
In WSN-based flood monitoring approaches, nodes might be destroyed or become faulty during a flood, which seriously affects network connectivity. To overcome this issue, UAVs could be deployed as routers or data mules to fill the communication gap caused by the inactive nodes. UAVs relay packets from the isolated nodes and enable continuous flood monitoring. In our UAV-assisted data collection mechanism, the WSN is modeled as an undirected graph as follows. Let $G = (V, E)$ be a simple, connected, and undirected graph, where $V$ and $E$ represent the vertex set and the edge set, respectively. In the WSN, the $n$ sensor nodes and the $m$ wireless communication links are modeled as vertices and edges, respectively. The set of vertices is $V = \{v_1, v_2, v_3, \ldots, v_n\}$ and the set of edges, which represents the wireless communication links, is $E = \{e_1, e_2, e_3, \ldots, e_m\}$. The degree of a vertex, $degree(v_i)$, is the number of valid neighbors of a sensor node, where valid neighbors are nodes with valid wireless communication capability. The value of $degree(v_i)$ may change during the flood prediction process because of destroyed nodes. Furthermore, we assume that each node keeps the information of its neighbors in a table that includes the connectivity status, the neighbor node IDs, and the received signal strength indicator (RSSI) between the nodes. Matrix $C$ shows the connectivity status between the nodes, with the entry for the pair $(v_i, v_j)$ indicating whether the edge $e(v_i, v_j)$ exists. According to the matrix, if $e(v_i, v_j)$ exists, the vertices $v_i$ and $v_j$ can communicate. Otherwise, if wireless communication between $v_i$ and $v_j$ is not possible, the UAV node is called to collect data from the node. To solve this problem, the cloud groups the sensor nodes into $N$ sub-regions using a number of beacons with known locations. Each sensor node records all the detected beacons and selects its sub-region based on the highest RSS of the beacons' signals.
Then, the random walk process is applied to propagate data on a connected graph with $n$ vertices and $m$ edges in the sub-regions. Given $K$ sensor nodes in a sub-region, the distance matrix is defined as $D$ and the location of the first UAV hovering point is expressed as $ml_f$; when the UAV moves to the $m$-th location, the distance matrix is defined as $ml_m$. With $m$ recorded locations, the collected data of the $K$ sensor nodes can be predicted through the proposed flood prediction model.
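The graph model and the random walk can be illustrated with a small sketch. It is only an illustration under stated assumptions: the 0/1 encoding of the connectivity matrix, the helper names (`connectivity_matrix`, `fail_nodes`, `random_walk`), and the rule that the UAV is requested when a node has no valid neighbor are choices made for this example and are not taken from the paper's implementation.

```python
import random

def connectivity_matrix(n, edges):
    """Symmetric 0/1 matrix (assumed encoding): c[i][j] = 1 when i and j share a link."""
    c = [[0] * n for _ in range(n)]
    for i, j in edges:
        c[i][j] = c[j][i] = 1
    return c

def fail_nodes(c, failed):
    """Return a copy of c in which every link of a failed node is removed."""
    size = len(c)
    return [[0 if (i in failed or j in failed) else c[i][j] for j in range(size)]
            for i in range(size)]

def degree(c, i):
    """Number of valid neighbors of node i."""
    return sum(c[i])

def random_walk(c, start, steps, rng):
    """Forward a packet hop by hop; report whether the walk got stuck."""
    path, node = [start], start
    for _ in range(steps):
        neighbors = [j for j, linked in enumerate(c[node]) if linked]
        if not neighbors:
            return path, True  # no valid next hop: the UAV would relay instead
        node = rng.choice(neighbors)
        path.append(node)
    return path, False

# Hypothetical line topology 0-1-2-3; node 1 is destroyed by the flood.
c = connectivity_matrix(4, [(0, 1), (1, 2), (2, 3)])
flooded = fail_nodes(c, {1})
path, needs_uav = random_walk(flooded, 0, 5, random.Random(0))
print(path, needs_uav)
```

With node 1 removed, node 0 has degree zero, so the walk stops at its starting node and flags that a UAV relay is needed.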

Proposed Multi-level Architecture
Here, the details of the suggested architecture are explained. The proposed network model is an adaptable and scalable model with multiple applications. The model was designed with three layers. In the cloud-SDN layer, a centralized SDN controller was defined as the main control entity and the central processing unit for action predictions. The SDN controller linked the ground WSN and UAV. The second layer included UAVs operated on-demand, with progressive sensors and communication. The third layer covered ground WSNs with scalar sensors such as rainfall sensors and water level sensors. Fig. 1 shows the network model and the key components of the cloud-SDN, UAVs, and sensors. The main components of the suggested framework are presented in detail.

Figure 1: Proposed multi-level architecture
A communication network is an important component of the flood control system. As advanced technologies and applications are integrated for smarter river control, a vast amount of data from different locations will be generated for analysis, updating, control, and real-time flood prediction. The scale of these networks therefore makes their management the main challenge. Moreover, heterogeneous devices and applications may prevent the equipment from exchanging information. Hence, it is vital to find a communication infrastructure that can control and manage all devices throughout the system under the real-time constraint. In this model, cloud computing-based SDN addresses these problems through the following advantages. Cloud technology offers high computing capacity to flood prediction utilities. Flexible per-flow routing is possible using SDN, and a flow can be defined across multiple network layers. A logically centralized controller can improve the service efficacy of flood prediction. Owing to the programmability of SDN, the network becomes more active and an appropriate radio access interface can be selected for data delivery. Last but not least, a quick-response cloud service is essential for river monitoring on the basis of the real-time road conditions. Generally, UAVs as aerial agents are active objects with behavior, state, and location, which are autonomous and mobile. They can move freely with state and code in execution without suspending services, provide better asynchronous interaction, reduce communication cost, and enhance flexibility. For greater geographical distances where ground nodes are infeasible, UAV-based systems can be integrated. UAVs collect data from the sensing targets and transmit the collected data to the ground control station or terrestrial user equipment.
Various reasons have been provided for the use of UAVs in the proposed network model. The main reason is that the employment of UAVs will lead to lower traffic over the wireless channel. Also, in comparison to traditional network forwarding, the reliability of the path will be significantly improved as the numbers of hops will be reduced where packets are diffused in the network over multiple hops. The direct communication, where the UAV collects data from each sensor node, is used for data acquisition.
The ground control station was configured for data analysis and to control management operations. Ground data was distributed between ground control stations and UAV communication nodes. Sensor nodes are flexible network elements that deliver (real-time) collected water level data to the central processing unit. However, considering the extremely large area and numerous working scenarios involved in flood control, it is impossible to manage floods without using UAVs as detection tools. These were generally controlled from the ground control station.

Prediction Model
In this section, the methodology for flood prediction using UAVs along with a PSO algorithm for training the GMDH model is described.

GMDH Approach
The GMDH method has several stages. The first stage partitions the data into training data and testing data. This division is based on consecutive heuristic selection points in the data set, and the partition is obtained by calculating the variance of the data from the mean value. Points with high variance are placed in the testing data set for model checking, outside of the data in the training set. In the second stage, input data for the input matrix are chosen in pairs and, for each pair, a quadratic polynomial is fitted to the corresponding output. Least-squares fitting [31][32][33] is used to set the polynomial coefficients. To verify each polynomial's suitability, its outputs are evaluated on the data points in the testing data. The Mean Squared Error (MSE) is mostly used to select suitable polynomials for the next layer. Finally, this process is repeated until the smallest MSE of a layer is higher than that of the previous layer. A suitable data model is obtained by tracing back the polynomial path with the smallest MSE in each layer. The GMDH method relies on self-organization for the assessment and estimation of models with uncertain variable relationships. GMDH networks use a regression based on the Ivakhnenko polynomial [34], given in Eq. (1), where $M$ is the number of input variables, $(x_1, x_2, x_3, \ldots, x_M)$ are the input variables, and $a_0, a_i, a_{ij}, a_{ijk}, \ldots$ are the coefficients. Generally, Eq. (1) is used in the quadratic form of two variables shown in Eq. (2). The configuration of the GMDH model employed in this study is presented in Fig. 2.
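For orientation, the Ivakhnenko (Kolmogorov-Gabor) polynomial and its two-variable quadratic reduction are usually written as below. This is the common textbook form, stated with the coefficient names $a_0, a_i, a_{ij}, a_{ijk}$ used in the text; the numbering of the coefficients in the quadratic form ($a_0$ to $a_5$) is an assumption and may differ from the paper's own Eq. (2).

```latex
% General Ivakhnenko polynomial over M input variables
y = a_0 + \sum_{i=1}^{M} a_i x_i
        + \sum_{i=1}^{M}\sum_{j=1}^{M} a_{ij}\, x_i x_j
        + \sum_{i=1}^{M}\sum_{j=1}^{M}\sum_{k=1}^{M} a_{ijk}\, x_i x_j x_k + \cdots

% Quadratic form for one pair of inputs (x_i, x_j); coefficient numbering assumed
y = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2
```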

The Proposed Hybrid GMDH-PSO Algorithm
The usual version of GMDH has some shortcomings that need to be addressed: (i) how to train two-layered high-precision networks; (ii) how to specify the best number of input variables; (iii) how to choose a polynomial order to form a vector solution in every node; and (iv) how to select input variables. This study focused on these issues using the proposed GMDH-PSO model.

Using PSO in the Training Process
It is apparent from previous sections that the GMDH method has some limitations in the training process. Hybridization of the PSO model with standard GMDH can solve this problem. In this application, a three-layered perceptron was chosen. PSO was used to train the GMDH network. Initially, the fitness function of every particle was determined. The error function at current particle positions was evaluated to determine the fitness value of every swarm particle. Also, fitness values were determined on the basis of the particle position vectors corresponding to the network weight matrix. In this hybrid technique, all training data was set to the GMDH network. Then, the weights of each data set were updated such that the size of the training set was equal to the number of updated weights. The vector of each particle was selected to show their error vector. This vector stored the minimum errors encountered by each particle due to their input patterns. This value shows the Mean Square Error (MSE) during training. The flowchart procedure for training a GMDH network using PSO is given in Fig. 3.
Weight training was used for the following reason: $W_1$ shows the weight matrix between the input layer and the hidden layer; $W_2$ denotes the weight connection matrix between the hidden layer and the output layer. The $i$-th particle of a PSO in multi-layer perceptron training is denoted as follows: For every particle, the former best fitness value was defined to present the position of the particle as follows: The best particle index among all the particles in the population is shown by $b$ and the best matrix is presented by: The velocity of particle $i$ is denoted by The formula for particle manipulation in each iteration is presented as follows: where $m$ and $n$ denote matrix rows and columns, respectively; $r$ and $s$ are positive constants; $t$ is the time step between observations and is commonly taken as unity; $\alpha$ and $\beta$ are random numbers from 0 to 1; $V$ and $W$ refer to the new values.
CMC, 2022, vol.70, no.1
Figure 3: Architecture of the proposed SDN-EC framework
where $j = 1, 2$; $m = 1, \ldots, M_j$; $n = 1, \ldots, N_j$; $M_j$ and $N_j$ are the row and column sizes of the matrices $W$, $P$, and $V$. Eq. (5) was utilized to compute the new velocity of a particle based on its previous velocity and the distance of its current position from its own best experience and the best experience of its group. Then, a new position according to the new velocity is determined using Eq. (5). Also, Eq. (6) was used to determine the fitness of the $i$-th particle in terms of the output mean squared error of the neural network as follows: In the above equation, the fitness value is $f$; the target output is $t_{kl}$; the number of output neurons is $O$; the predicted output according to $W_i$ is $p_{kl}$; the number of training samples is $S$.
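The update described above (new velocity from the previous velocity plus the distances to the personal best and the group best, then a new position, with MSE as fitness) matches the generic PSO scheme, which can be sketched as follows. This is not the paper's Eqs. (3) to (6): the flat weight vector, the toy linear model used as the "network", the default values of `r` and `s`, and the velocity clamp `v_max` are all assumptions made to keep the example short and numerically stable.

```python
import random

def mse(weights, samples):
    """Fitness: mean squared error of a toy linear model y = w0 + w1 * x."""
    w0, w1 = weights
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in samples) / len(samples)

def pso_train(samples, n_particles=20, iters=100, r=1.5, s=1.5, t=1.0,
              v_max=1.0, seed=0):
    """Generic PSO: r and s weight the pull toward personal and group bests."""
    rng = random.Random(seed)
    dim = 2
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_fit = [mse(p, samples) for p in pos]      # personal best fitness values
    b = min(range(n_particles), key=best_fit.__getitem__)  # index of group best
    history = [best_fit[b]]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                alpha, beta = rng.random(), rng.random()
                vel[i][d] += (r * alpha * (best_pos[i][d] - pos[i][d])
                              + s * beta * (best_pos[b][d] - pos[i][d]))
                # Clamp added for this sketch only, to avoid velocity blow-up.
                vel[i][d] = max(-v_max, min(v_max, vel[i][d]))
                pos[i][d] += vel[i][d] * t
            fit = mse(pos[i], samples)
            if fit < best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
                if fit < best_fit[b]:
                    b = i
        history.append(best_fit[b])
    return best_pos[b], history

# Hypothetical training set drawn from y = 2x + 1.
samples = [(x, 2 * x + 1) for x in range(5)]
weights, history = pso_train(samples)
print(weights, history[-1])
```

The group-best fitness recorded in `history` can only stay equal or decrease from one iteration to the next, which is the property the training loop relies on.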

Region and Data Description
Daily water level records for the Selangor River were obtained from the hydrographs published at http://infobanjir.water.gov.my. The Selangor River is the main river in Selangor, Malaysia. It runs from Kuala Kubu Bharu in the east and flows into the Straits of Malacca at Kuala Selangor in the west. The data presented through this website are suitable indicators of potential flooding or landslides, and this study used them with discretion. Online hydrograph data were extracted for three stations on the Selangor River: Selangor, Selayang, and Bernam. According to the existing hydrographs on 27 December 2018, the average water level measured by Station1 was about 48.72. These values were about 37, 44, and 21 for Station1, Station2, and Station3, respectively.

Data Normalization
Water levels at the Selangor River were predicted one and two days ahead based on the measured daily levels. Data normalization was done to avoid false patterns that can be created by inconsistencies. The dataset had some variations because the collection devices were located in different time zones and geographical locations. Data were normalized by dividing the total daily water level by the number of hours within that day. The normalized data series were computed as $\bar{D}_t = D_t / H_t$, where $D_t$ is the total daily water level, $\bar{D}_t$ is the normalized data, and $H_t$ is the number of hours in day $t$.
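The normalization step amounts to an element-wise division, as the short sketch below shows. The function name and the sample values are hypothetical and only illustrate the rule stated in the text.

```python
def normalize_daily_levels(daily_totals, hours_per_day):
    """Divide each day's total water level by the hours recorded that day."""
    if len(daily_totals) != len(hours_per_day):
        raise ValueError("one hour count is needed per daily total")
    return [total / hours for total, hours in zip(daily_totals, hours_per_day)]

# Hypothetical values: a full day and a day with only 12 recorded hours.
normalized = normalize_daily_levels([48.0, 24.0], [24, 12])
print(normalized)
```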

Construction of Polynomials by PSO
Particles were used as search agents in the PSO. The grouping of input variables from the previous layer was determined on the basis of the position of each particle, and these data were then moved to the next layer. Every particle contained three main parameters: $P_1$, $P_2$, and $P_3$. $P_1$ was defined as the polynomial order, which was created from the previous layers and generated randomly. This value can be either 2 or 3; for simplicity, this study took 2 in each layer. The number of input variables was generated randomly and was obtained from the previous layer. We defined $D$ and $r = 2$ as the width of the input dataset and the default lower bound, respectively. The number of input variables was $P_2 \in [1, r]$, where $r = \min(D, 5)$. The position of every particle, representing all candidates in the current layer of the network, was $P_3 = \{a \in \mathbb{Z}^{+} \mid 1 \le a \le D\}$, which is a sequence of integers. These three parameters were used to arrange the nodes that move to the next layer: $P_1$, $P_2$, and $P_3$ determined the polynomial order, the number of node groupings, and the whole sequence, respectively. Fig. 4 shows the procedure by which the three defined parameters form the polynomial. In our hybrid model, each particle carried its own set of these three parameters. The generated polynomials were employed as the objective function for PSO.

Framework of the GMDH-PSO
The GMDH-PSO framework comprises six main phases. First, the input variables of the system were determined. The primary population of PSO structures and the corresponding learning parameters $c_1$ and $c_2$ were created. The input variables of the model were defined as $x_i$ $(i = 1, 2, 3, \ldots, n)$ and were related to the output variable $y$. Then, the input data were normalized. In both experiments, the original data needed to be normalized to generate equivalent water level data. In the second phase, the training data for PSO and the testing data were formed. The input-output data set $(x_i, y_i) = (x_{1i}, x_{2i}, \ldots, x_{ni}, y_i)$; $i = 1, 2, 3, \ldots, n$ was divided into a training dataset and a testing dataset. The sizes of the training and testing datasets were represented by $n_{tr}$ and $n_{te}$, respectively, where $n = n_{tr} + n_{te}$. The training dataset was employed to construct the GMDH-PSO model, and the testing dataset was utilized to evaluate model quality. In the third phase, the primary information used to construct the GMDH-PSO structure was determined. Note that the previously mentioned process determined the model's structural optimization by PSO variation operators. In this context, we defined the maximum number of generations as the termination criterion to balance model accuracy and complexity. The maximum number of input variables was used for every node in each layer. Moreover, the value of the weighting factor was determined for the aggregate objective function. In the fourth phase, the Polynomial Neuron (PN) structure was determined using the PSO algorithm. The least-squares technique was used for parameter optimization through multiple-regression analysis and provided the formula to compute the coefficients. The objective function, which was the main instrument used to control evolutionary searches in the solution space, was defined based on the generated polynomial, where $a_1, a_2, \ldots, a_6$ are the constants assessed using the training dataset. The coefficients were computed with the least-squares formula $a = (x^{T} x)^{-1} x^{T} y$. In the fifth phase, if the current structure was the best, the model proceeded to phase 6; otherwise it returned to phase 3. This procedure was repeated for all nodes at all layers (from the input layer to the output layer). In the sixth phase, if an acceptable solution was obtained, the algorithm was stopped; otherwise the model returned to phase 2. The GMDH-PSO algorithm was carried out by consecutively repeating phases 2-6. When the termination condition was met, the solution vector with the optimum performance in the last population generation was selected and all remaining solution vectors were rejected. The pseudocode of GMDH-PSO is represented in Algorithm 1. Besides, Fig. 5 shows the GMDH-PSO model.

Algorithm 1: GMDH-PSO
5 ... (x_1i, x_2i, ..., x_ni, y_i)}; i = 1, 2, 3, ..., n_tr
6 Testing-dataset = {(x_j, y_j) = (x_1j, x_2j, ..., x_nj, y_j)}; j = 1, 2, 3, ..., n_te
7 n_tr = size of training dataset
8 n_te = size of testing dataset
9 n = n_tr + n_te
10 Define Particles = {par_1(P_11, P_12, P_13), ..., par_m(P_m1, P_m2, P_m3)}
11 Generate Polynomial by (PSO(Particles))
12 if the current structure is the best then
13     go to next step
14 else
15     go to Line 11
16 if solution is acceptable then
17     go to End
18 else
19     go to Line 6
20 End
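The least-squares formula $a = (x^{T} x)^{-1} x^{T} y$ can be made concrete for a single polynomial neuron with two inputs. The sketch below is a plain-Python illustration and not the authors' code: the ordering of the six terms in `features`, the helper names, and the use of Gaussian elimination on the normal equations are assumptions, and a singular design matrix is not handled.

```python
def features(xi, xj):
    """Six terms of a two-input quadratic neuron (term order is an assumption)."""
    return [1.0, xi, xj, xi * xj, xi * xi, xj * xj]

def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def fit_neuron(pairs, targets):
    """Coefficients from the normal equations (X^T X) a = X^T y."""
    rows = [features(xi, xj) for xi, xj in pairs]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y for r, y in zip(rows, targets)) for p in range(k)]
    return solve(xtx, xty)

def predict(coeffs, xi, xj):
    return sum(c * f for c, f in zip(coeffs, features(xi, xj)))

# Hypothetical noise-free data generated from known coefficients.
true_coeffs = [1.0, 2.0, -1.0, 0.5, 0.25, -0.5]
pairs = [(float(u), float(v)) for u in range(4) for v in range(4)]
targets = [predict(true_coeffs, u, v) for u, v in pairs]
coeffs = fit_neuron(pairs, targets)
print([round(c, 6) for c in coeffs])
```

In a full GMDH layer, one such fit would be run for every candidate pair of inputs, and the neurons with the lowest error on the testing data would be kept for the next layer.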

Results
The GMDH-PSO network was compared with earlier models such as DE [35], GA [36], and ANN [37], and the results are presented in this section. In these comparisons, the main indicators of prediction error were calculated for model evaluation [38]. In this regard, we utilized the raw river level data recorded over 24 hours at three different stations. This study used the correlation coefficient R, RMSE, and BIAS for accuracy evaluation in the training and testing stages, where $M$ refers to the total number of events. In total, the GMDH-PSO model shows slightly better performance than the GA model in terms of accuracy. The predicted and measured data of Station 1, Station 2, and Station 3 for the proposed models are shown, respectively, in Figs. 8-10 (Appendix A).
Here, we evaluate the performance of GMDH-PSO and GMDH-BP during the training and testing phases. The model evaluation statistics were MAPE (Mean Absolute Percentage Error), R, and RMSE:
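For readers who want to reproduce the indicators, the sketch below uses their common textbook definitions. The exact formulas used in the paper are not shown here, so these definitions are assumptions; in particular, BIAS is taken as the mean of predicted minus observed values, and the sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n

def bias(observed, predicted):
    """Mean signed error; positive values mean over-prediction (assumed convention)."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def pearson_r(observed, predicted):
    """Correlation coefficient R; R squared is the square of this value."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)

# Hypothetical water levels for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(observed, predicted), mape(observed, predicted),
      bias(observed, predicted), pearson_r(observed, predicted))
```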

Simulation Validation
In this section, we discuss the experiments conducted to demonstrate the accuracy of our proposed model and analyze the obtained results. The experiments used self-developed UAV-WSN modules, which were simulated with the OMNET++ tool. In our system, each sensor communicates directly with the UAVs to save energy and decrease end-to-end communication delays. This study assumed that active sensor nodes would communicate with the UAVs if they were within the range of the beacon signal. A sleeping sensor node did not communicate if the beacon signal was weaker than the threshold or was not available. During data collection, this study assumed that each active sensor could periodically transmit sensing data to the UAVs. Tab. 4 shows all the parameters used in our simulations and the two sets of evaluations. For comparison, experiments were carried out under different experimental conditions. In the first experiment, every sensor node always transmitted a packet between the client and the server. In the second experiment, the UAV carried out routing and packet switching between the source node and the destination node. In this experiment, if a sensor node failed, the preceding sensor nodes could not send their data to the sink node. By employing UAVs, data collection remained possible throughout the WSN, which sent data to the central processing unit for river prediction. In this context, performance was evaluated with two response variables: Round-Trip Time (RTT) delay and packet loss rate. RTT refers to how long it took for a packet to travel from the source to the destination and back. Packet loss rate refers to the ratio of packets lost in the test to the packets sent during transmission. Each experimental result is the average of 30 runs for each simulation scenario.
The 95% confidence interval (CI) was calculated for the collected performance metrics unless the intervals were negligibly small. The parameter values used in this study are shown in Tab. 4. These values were carefully selected to reflect realistic scenarios. Figs. 6 and 7 show the results of the two experiments: the experiment without UAVs and the experiment with UAVs. The two sets of experiments were simulated thirty-five times, and the Shapiro-Wilk normality test was used to test the normality of the experiment sets.
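The two response variables and the interval around their run averages can be computed as in the sketch below. It is an illustration under assumptions: the interval is a Student-t interval for the mean, the critical value 2.045 corresponds to 29 degrees of freedom (30 runs), and the sample values are hypothetical rather than taken from the simulations.

```python
import math
import statistics

T_CRIT_30_RUNS = 2.045  # two-sided 95% Student-t value for 29 degrees of freedom

def packet_loss_rate(sent, received):
    """Fraction of sent packets that never arrived."""
    if sent <= 0:
        raise ValueError("at least one packet must be sent")
    return (sent - received) / sent

def mean_ci95(samples, t_crit=T_CRIT_30_RUNS):
    """Mean and 95% confidence bounds for a list of per-run results."""
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - half_width, mean + half_width

# Hypothetical RTT values (ms) from 30 runs of one scenario.
rtt_runs = [9.0, 11.0] * 15
mean, low, high = mean_ci95(rtt_runs)
print(packet_loss_rate(200, 190), mean, low, high)
```

A different number of runs needs the matching t value, so `t_crit` is left as a parameter.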
The experimental results showed that UAVs can improve data collection and provide a reasonably good depiction of remotely sensed environments. Compared with the existing efforts [35][36][37][38], the main advantage of this study is the design of a UAV-WSN model for river flow prediction.

Conclusions and Future Directions
This study used UAV remote sensing for scenarios where a sensor node is unable to send data packets in multi-hop communications to provide robust WSNs. The usage of UAVs can improve the accuracy of water level predictions to prevent floods. Experiments tested data collection performance with and without UAVs for river monitoring. This study's UAV-WSN model proposed the hybridization of the PSO and GMDH models for water level predictions. To validate the precision of the developed GMDH-PSO model, its performance was compared to the DE, GA, and ANN models. The GMDH-PSO method outperformed the other models. The statistical indicators used for the performance evaluation of the proposed model indicated lower RMSE and higher R and BIAS compared to the GA and DE models for all nodes. Also, this study compared GMDH-PSO and GMDH-BP during the training and testing stages. The outcomes showed that MAPE was lower in the GMDH-PSO model. Results underlined the ability of GMDH-PSO to predict non-linear time series data. For future works, this study recommends the use of other techniques to predict river water levels such as reinforcement learning. In future research, to improve the computation services while reducing the latency, we plan to apply edge computation (EC). Additionally, we will consider forecast of different environmental phenomena, such as urban underground drainage or rainfall-flow.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.