Flood Forecasting of Malaysia Kelantan River using Support Vector Regression Technique

Rainstorms are believed to cause flood disasters in upstream catchments, with further consequences in downstream areas as river water levels rise. Forecasting flood water levels is a challenging and complex task because of the nonlinearities and dependencies involved. This study proposes a support vector machine regression model, regarded as a powerful machine learning-based technique, to forecast flood water levels in a downstream area for different lead times. As a case study, the Kelantan River in Malaysia was selected to validate the proposed model. Four water level stations in the upstream river basin were identified as input variables, and a river water level in the downstream area was selected as the output of the flood forecasting model. The model was compared with several benchmark models, including radial basis function (RBF) and nonlinear autoregressive with exogenous input (NARX) neural networks. The results demonstrated that, in terms of root mean square error (RMSE), the NARX model performed best among the compared models. However, support vector regression (SVR) demonstrated more consistent performance, indicated by the highest coefficient of determination for the twelve-hour-ahead forecast. The findings of this study indicate that SVR is more capable of addressing long-term flood forecasting problems.


Introduction
Research on the advancement of flood forecasting has been increasing because it contributes to disaster risk reduction, although flood forecasting remains a difficult, challenging, and complex application to model [1]. According to the Sendai Framework for Disaster Risk Reduction (SFDRR) 2015-2030, disaster risk reduction (DRR) is addressed in priorities three and four, stipulated as 'investing in disaster risk reduction for resilience' and 'enhancing disaster risk preparedness for effective response' respectively [2]. In connection with these priorities, flood modelling and forecasting are crucial for disaster risk management. In many regions of the world, flood forecasting is among the few feasible options for managing flood disasters.
To date, many flood forecasting models are data-specific and rely on various simplifying assumptions about their inputs [3]. Thus, to mimic the complex mathematical expression of physical processes and river behaviors, models based on specific techniques (empirical black-box, stochastic, and hybrid models) have been applied [4]. Physically and statistically based models have in turn promoted the use of advanced data-driven methods, such as machine learning techniques. The most well-known approaches to flood forecasting modelling include artificial neural networks (ANNs) [5][6][7], support vector machines (SVMs) [8,9], and adaptive neuro-fuzzy inference systems (ANFIS) [3,10], which have been employed effectively for both short-term and long-term flood forecasting.
ANN models provide considerable flexibility in solving nonlinear problems and have been applied successfully in various hydrological areas [11,12]. ANNs have been employed for flood forecasting because of their capability and computational efficiency. Although ANNs handle hydrological time series data more efficiently than physically based models, SVM has also been highly effective in improving flood forecasting because of its high accuracy and capability [13]. The higher accuracy of SVM compared with ANN indicates that it is an appropriate method for rapidly producing flood inundation forecasts and for early warning systems [14]. Furthermore, Wu [15] demonstrated the effectiveness of SVM for different flood forecasting lead times; the results show that the SVM model provides strong capability and satisfactory regression performance for forecasts one to three hours ahead.
For several decades, researchers have successfully used conventional SVM algorithms and other supervised learning algorithms, such as neural networks, for classification problems [16]. These learning approaches have also been applied to regression tasks, including function estimation by fitting a curve to a set of data points. The application of SVMs to the general problem of regression analysis is called Support Vector Regression (SVR). SVM has proven itself in hydrological modelling and its applications owing to its robustness. SVR has played an important role in numerous time series forecasting applications, including flood forecasting [17]. Khaled Boukharouba [18] employed SVR for flash flood forecasting in the absence of rainfall forecasts, based on the hierarchical flood events, and demonstrated that SVR performed efficiently for flash flood forecasting.
Although some attempts have been made to address time-series problems using the SVM approach, published research implementing SVM as a machine learning approach in hydrological engineering remains limited, especially for flood forecasting. This study evaluates the performance of SVM models against other models, such as ANNs and linear regression models, in predicting river water levels for flood forecasting. In addition, it extends the results of a previous study [19]. The study proposes multi-step-ahead data-driven models that simulate and predict river water levels from historical observed data using the SVM technique. Two further machine learning algorithms, namely radial basis function and nonlinear autoregressive exogenous neural networks, were also examined, and the three methods were compared.

Methodology and Study Area
The proposed method was evaluated on a case study in Malaysia, specifically the Kelantan River, as a representative flood forecasting point (FFP). The area was selected because of its proximity to a reservoir that frequently causes seasonal flood disasters in Malaysia. The state of Kelantan is situated in the northeast of Peninsular Malaysia, with Kota Bharu as its capital city. Kelantan borders the South China Sea to the northeast, Terengganu state to the east, Pahang and Perak to the south and west respectively, and Thailand to the north. The state has a total area of about 15,101 km² and had a population of approximately 1.76 million in 2015 [20].
The Kelantan river basin covers about 13,000 km², with tributaries including the Lebir, Galas, Pergau, and Nenggiri rivers [21]. The Kelantan River is approximately 105 km in length and is fed at Kuala Krai, in its central part, by the Lebir and Galas rivers, whose catchments comprise approximately 2,430 km² and 7,770 km² respectively [21]. Fig. 1 illustrates the river network of the Kelantan watershed, the major cities, and the water level stations. Measured from the head of its longest tributary, the main river is approximately 388 km long; it drains an area of about 13,000 km² and occupies more than 85% of Kelantan State [22].
The river water level data were retrieved from the Department of Irrigation and Drainage (DID) Malaysia at fifteen-minute intervals. The DID supervisory control and data acquisition system collected about three months of data, from October to December 2011. Only the one-month record for November is used as the dataset in this study, giving about 2880 records (30 days at 96 readings per day) for training and validation testing. As shown in Tab. 1, four river water level variables serve as input data for the SVR network, with one observed water level as the output target.

Support Vector Regression
The LIBSVM software package, developed by Chih-Chung Chang and Chih-Jen Lin [23], is used in this study, and a MATLAB® data normalization function was applied to normalize the inputs and targets. LIBSVM is a library for support vector machines (SVMs) that solves several types of SVM optimization problem, including C-support vector classification (C-SVC), ν-support vector classification (ν-SVC), one-class SVM for distribution estimation, ε-support vector regression (ε-SVR), and ν-support vector regression (ν-SVR). In this study, SVR is employed to model river water level for flood forecasting. In a prior study, SVR was successfully employed for flood forecasting in river basins in China by Bafitlhile and Zhijia Li [24], who compared the method with ANN models in simulating and forecasting stream flow. Their results indicated that SVR generally performs better than ANNs in stream flow forecasting for catchments.
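The normalization step can be illustrated with a minimal Python sketch. The text does not say which MATLAB function or target range was used, so the [-1, 1] range, the function names `fit_minmax`, `apply_minmax`, and `invert_minmax`, and the sample values are all assumptions made for illustration.

```python
import numpy as np

def fit_minmax(x, lo=-1.0, hi=1.0):
    """Learn per-column minima and maxima; the [-1, 1] range is an assumption."""
    x = np.asarray(x, dtype=float)
    return x.min(axis=0), x.max(axis=0), lo, hi

def apply_minmax(x, params):
    """Scale each column of x linearly into [lo, hi] using fitted parameters."""
    xmin, xmax, lo, hi = params
    span = np.where(xmax > xmin, xmax - xmin, 1.0)  # guard against constant columns
    return lo + (np.asarray(x, dtype=float) - xmin) * (hi - lo) / span

def invert_minmax(x_scaled, params):
    """Map scaled values back to the original units (e.g. metres)."""
    xmin, xmax, lo, hi = params
    span = np.where(xmax > xmin, xmax - xmin, 1.0)
    return xmin + (np.asarray(x_scaled, dtype=float) - lo) * span / (hi - lo)

# Made-up water levels for two stations, three time steps.
levels = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
params = fit_minmax(levels)
scaled = apply_minmax(levels, params)
restored = invert_minmax(scaled, params)
```

Fitting the scaling parameters on the training portion only, and reusing them for the test portion, would keep information from the test period out of the model.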
In examining the effectiveness of the proposed model, it is important to compare it with previous studies. Therefore, the case study in [19] was used to verify the models' performance. Two approaches, a radial basis function neural network (RBFNN) and a nonlinear autoregressive exogenous neural network (NARX), had been successfully implemented for twelve-hour-ahead flood forecasting, with the formulation described in [25]. The observed event-based water level data were divided into training and testing sets, with 80% of the available data allocated for training and the remaining 20% for testing.
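The data preparation described here can be sketched as follows. The text does not state how inputs are paired with the lead-time target or whether the 80/20 split is chronological, so both choices below are assumptions, as are the helper names `make_supervised` and `chronological_split` and the synthetic series standing in for the station records.

```python
import numpy as np

STEPS_PER_HOUR = 4  # fifteen-minute records

def make_supervised(upstream, downstream, lead_hours):
    """Pair upstream levels at time t with the downstream level at t + lead."""
    lead = lead_hours * STEPS_PER_HOUR
    return upstream[:-lead], downstream[lead:]

def chronological_split(X, y, train_fraction=0.8):
    """Keep the first 80% of samples for training and the rest for testing."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]

rng = np.random.default_rng(0)
n_records = 2880                                             # one month of data
upstream = rng.normal(size=(n_records, 4)).cumsum(axis=0)    # synthetic inputs
downstream = upstream.mean(axis=1)                           # synthetic target

X, y = make_supervised(upstream, downstream, lead_hours=12)
X_train, X_test, y_train, y_test = chronological_split(X, y)
print(X_train.shape, X_test.shape)
```

A chronological split is used in this sketch because shuffling a time series before splitting would let the model see samples recorded after the ones it is tested on.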
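According to Vapnik's theory [26,27], the SVM equations are given in Eq. (1)-(4). Given a set of data points $\{(x_i, d_i)\}_{i=1}^{n}$, the SVM regression function is expressed as in Eq. (1) and (2):

$$y = f(x) = w \cdot \varphi(x) + b \tag{1}$$

$$R_{SVM}(C) = \frac{1}{2}\|w\|^2 + C\,\frac{1}{n}\sum_{i=1}^{n} L_\varepsilon(d_i, y_i) \tag{2}$$

In which: $x_i$ is the input space vector and $d_i$ is the target value. Meanwhile, $\varphi(x)$ represents the high-dimensional feature space onto which the input $x$ is mapped; $b$ is a scalar; $w$ is a normal vector; and $C\,\frac{1}{n}\sum_{i=1}^{n} L_\varepsilon(d_i, y_i)$ represents the empirical error, where $L_\varepsilon(d, y) = \max(0, |d - y| - \varepsilon)$ is the ε-insensitive loss. The SVR problem is formulated as the following optimization problem:

$$\min_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) \tag{3}$$

Subject to:

$$d_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i, \qquad w \cdot \varphi(x_i) + b - d_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \qquad i = 1, \ldots, l \tag{4}$$

In which: the regularization term is $\frac{1}{2}\|w\|^2$, $\xi_i$ and $\xi_i^*$ are the slack (loss) terms related to the approximation accuracy at each training data point, $C$ is the error penalty factor, and $l$ is the size of the training data set. By solving Eq. (1) and (2), a generic function is obtained through Eq. (5):

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\,K(x, x_i) + b \tag{5}$$

In which: $n$ is here the number of support vectors, $x_i$ is a support vector, $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers, and $K(x, x_i)$ is a kernel function that maps the SVR input vector into a higher-dimensional feature space. In this study, the RBF kernel is employed because its efficiency has been proven in previous studies [28]. Based on the literature, the RBF kernel has good interpolation capabilities and is expressed mathematically in Eq. (6):

$$K(x_i, x_j) = \exp\left(-\gamma\,\|x_i - x_j\|^2\right) \tag{6}$$

In which: $x_i$ and $x_j$ are input space vectors (vectors computed from the training or testing data set). The choice of three parameters ($\gamma$, $\varepsilon$, and $C$) determines the predictive accuracy of the SVR with an RBF kernel. It has been demonstrated that RBF outperforms other kernel functions in the SVM model [29]. Thus, in this study, RBF is adopted as the kernel function.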
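To make the kernel expansion concrete, the following sketch evaluates an RBF kernel and a trained SVR predictor in their standard textbook forms, K(x_i, x_j) = exp(-γ‖x_i - x_j‖²) and f(x) = Σ(α_i - α_i*) K(x, x_i) + b. The support vectors, dual coefficients, bias, and γ below are made-up numbers chosen only for illustration; they are not values from this study.

```python
import math
import numpy as np

def rbf_kernel(xi, xj, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Kernel expansion: sum of (alpha_i - alpha_i*) * K(x, x_i), plus bias b."""
    total = 0.0
    for coef, sv in zip(dual_coefs, support_vectors):
        total += coef * rbf_kernel(x, sv, gamma)
    return total + b

def eps_insensitive_loss(d, y, eps):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(d - y) - eps)

# Hypothetical trained model with two support vectors.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]   # each entry is alpha_i - alpha_i*
b, gamma = 0.1, 0.5

prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, b, gamma)
print(round(prediction, 4))
```

Only the support vectors enter the sum, which is why a trained SVR can be evaluated without the full training set.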
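The effectiveness of the proposed models can be evaluated by comparing their root mean square error (RMSE) and coefficient of determination (R²) [30]. These are given in Eq. (7) and (8) respectively:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{f,i} - Q_{o,i}\right)^2} \tag{7}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Q_{o,i} - Q_{f,i}\right)^2}{\sum_{i=1}^{n}\left(Q_{o,i} - \bar{Q}_o\right)^2} \tag{8}$$

In which: $n$ is the number of data points, $Q_f$ is the forecasted value, $Q_o$ is the actual value, and $\bar{Q}_o$ is the average of the actual (observed) records.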
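The two evaluation measures can be computed as in the sketch below. It assumes the common definitions, namely RMSE as the square root of the mean squared forecast error and R² as one minus the ratio of residual to total sum of squares about the observed mean; the sample values are made up.

```python
import numpy as np

def rmse(observed, forecast):
    """Root mean square error between observed and forecasted values."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))

def r_squared(observed, forecast):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ss_res = np.sum((observed - forecast) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

observed = [2.0, 3.0, 4.0, 5.0]   # made-up water levels (m)
forecast = [2.5, 3.0, 3.5, 5.0]
print(rmse(observed, forecast), r_squared(observed, forecast))
```

RMSE is expressed in the units of the water level, whereas R² is dimensionless, so the two measures can rank models differently.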

Results and Discussion
The four input variables represent river water levels in the upstream and downstream areas, and one output variable represents the observed river water level in the downstream area, which serves as the flood forecasting point. The time series of the inputs and output are plotted in Fig. 2, indicating that the four water level inputs from upstream stations significantly affect the flood water level observed at the downstream station. Each river contributes to the water level at the output location following heavy rain at the observed time. Thus, flood disasters become inevitable once the river overflows.
This study constructed multi-step models to forecast river water level at different lead times. The trained SVR model is used to forecast the flood water level hydrograph hourly, from one to twelve hours ahead. The actual flood data and simulated flood water levels are summarized in Fig. 3, indicating that the predicted peak levels match the recorded peak levels for all flood events. The SVR forecast for the one-hour lead time is closest to the measured water level, while forecasts for the other lead times appear to lag one step behind. However, RMSE and R² indicate different performances among the simulated models. Both RMSE and R² are calculated to evaluate model performance, as illustrated in Fig. 4. From the one-hour to the twelve-hour lead time, the change in RMSE and R² is not very significant. However, the results indicate that the proposed method performs with sufficient reliability at the four-hour lead time, which gives the highest R² value and the lowest RMSE value obtained in this study. This finding arises because the water level at t - 4, that is, four hours before time t, has the most significant correlation with the forecasted water level. The twelve-hour-ahead forecast fits less well than the shorter lead times, indicating that a longer forecasting lead time may not reflect the expected predictions [15].
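The multi-step scheme can be sketched by training one SVR per lead time. The study itself used LIBSVM from MATLAB; the sketch below instead uses scikit-learn's `SVR` (which wraps LIBSVM) on synthetic data, so the data, the four-hour travel delay, and the resulting scores are illustrative assumptions only. The hyperparameters C = 1, γ = 6.9644, and ε = 0.01 are the values reported in this paper.

```python
import numpy as np
from sklearn.svm import SVR

STEPS_PER_HOUR = 4  # fifteen-minute records

rng = np.random.default_rng(42)
n = 800
t = np.arange(n)
# Synthetic stand-in for four upstream stations: a shared slow wave plus noise.
upstream = np.column_stack(
    [np.sin(2 * np.pi * t / 400) + 0.05 * rng.normal(size=n) for _ in range(4)]
)
# Synthetic downstream level that trails the upstream wave by four hours.
downstream = np.sin(2 * np.pi * (t - 4 * STEPS_PER_HOUR) / 400)

results = {}
for lead_hours in range(1, 13):
    lead = lead_hours * STEPS_PER_HOUR
    X, y = upstream[:-lead], downstream[lead:]      # inputs at t, target at t + lead
    cut = int(0.8 * len(X))                         # chronological 80/20 split
    model = SVR(kernel="rbf", C=1.0, gamma=6.9644, epsilon=0.01)
    model.fit(X[:cut], y[:cut])
    pred = model.predict(X[cut:])
    results[lead_hours] = float(np.sqrt(np.mean((pred - y[cut:]) ** 2)))

for lead_hours, score in results.items():
    print(f"{lead_hours:2d} h ahead: RMSE = {score:.3f}")
```

Training a separate model per lead time keeps each forecast a direct mapping from current inputs to a future level, rather than feeding one-step predictions back into the model.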
Additionally, this study employed the ε-SVR formulation from the LIBSVM package. The SVR was trained with an RBF kernel function, which transforms a nonlinear problem into a linear one by mapping the input data into a high-dimensional feature space. The performance of the SVR model is highly sensitive to its hyperparameter values, namely the cost constant C, the radius of the insensitive tube ε, and the kernel parameter γ of the RBF function. After several configurations, the search range of C was set as $2^{-5}, 2^{-4}, \ldots, 2^{10}$, and that of γ as $2^{-5}, 2^{-4}, \ldots, 2^{5}$. Further, ε-SVR was tuned according to [31] to obtain the best C and the best γ. Following some exploration, the best values were set as 1, 6.9644, and 0.01 for C, γ, and ε respectively.
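A grid search over the stated ranges of C and γ could look like the sketch below. It uses scikit-learn's `GridSearchCV` with `TimeSeriesSplit` on synthetic data, which is an assumption; the tuning procedure of [31] used in the study is not reproduced here, and only the coarse power-of-two grid is searched.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 4))                 # synthetic scaled inputs
y = np.sin(X.sum(axis=1)) + 0.02 * rng.normal(size=200)  # synthetic target

param_grid = {
    "C": [2.0 ** k for k in range(-5, 11)],      # 2^-5, 2^-4, ..., 2^10
    "gamma": [2.0 ** k for k in range(-5, 6)],   # 2^-5, 2^-4, ..., 2^5
}

search = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.01),             # epsilon fixed at the reported value
    param_grid,
    scoring="neg_mean_squared_error",
    cv=TimeSeriesSplit(n_splits=3),              # folds respect time order
)
search.fit(X, y)

best_C = search.best_params_["C"]
best_gamma = search.best_params_["gamma"]
print(f"best C = {best_C}, best gamma = {best_gamma}")
```

The grid contains 16 values of C and 11 values of γ, so 176 combinations are evaluated per fold; a finer search around the best coarse cell would be needed to reach non-power-of-two values.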
In evaluating the effectiveness of the SVR model, it is necessary to compare it with the previous study [19], in which the FFP and the observed data are the same. The SVR model for the twelve-hour-ahead forecast was compared with the models presented there. The twelve-hour lead time was selected because it provides sufficient time to take measures against flood disasters. It was reported that the NARX neural network outperformed the RBF neural network in twelve-hour-ahead forecasting of floods from river water level. Fig. 5 illustrates that the studied models track the actual flood values, indicating that all of them are proficient in following and fitting the observed flood data. To examine model performance in more detail, RMSE and R² were calculated.
The overall comparison of model performance is summarized in Tab. 2. In terms of RMSE, the NARXNN still outperformed the other two models. However, the SVR model achieved the highest R² value; therefore, the proposed SVR model has great potential for long lead-time flood forecasting [32].

Conclusions
This study set out to assess the feasibility of the support vector machine algorithm for time-series forecasting problems. SVM regression was used to model river water level for flood forecasting. The experiment was conducted using river water level data measured on the Kelantan River, Malaysia. A comparison of three methods, namely SVR, RBF, and NARX neural networks, is described in this study. The study found that SVR could forecast river water level one to twelve hours ahead. Although SVR is presented outperforms in This study examined three essential machine learning methods to achieve river water level forecasting for flood disasters. These findings contribute to current work on intelligent frameworks for building a committee machine with an intelligent system (CMIS), which is under development by the present authors. Combining these individual learning machines could improve the generalization and robustness of the flood forecasting technique. In future research, CMIS could also serve as a promising optimization tool for hydrological time-series forecasting in the context of advanced computational methods. In addition, correlation analysis between the lagged input variables and the forecasted data could be explored further.