Application of Support Vector Machine for River Flow Estimation

In recent years, intelligent methods have been increasingly applied to forecasting hydrologic processes. In this research, the monthly discharge of the Kakareza River, located in Lorestan Province in western Iran, was forecasted at the Dehno station using support vector machine (SVM) and genetic programming methods. Several input combinations over the period 1979-2015 were evaluated as input data for estimating the monthly discharge. The correlation coefficient, root mean square error, and Nash-Sutcliffe coefficient were used to evaluate and compare the performance of the methods. The results showed that the combined input structures, used with the intelligent methods surveyed, gave an acceptable estimate of the discharge of the Kakareza River. In addition, comparison between the models shows that the support vector machine performs better than the other model in inflow estimation. In terms of accuracy, the support vector machine gave the highest correlation coefficient (0.970), the lowest root mean square error (0.08 m³/s), and a Nash-Sutcliffe coefficient of 0.94. In summary, the support vector machine method has a better capability to estimate the minimum, maximum, and other flow values.

Keywords: Genetic Programming, Estimate, Kakareza River, Support Vector Machine


INTRODUCTION
Nowadays, one of the most important issues in managing floods and preventing the economic and physical damage they cause is the correct prediction of river flows. Accurate estimates of inflow to reservoirs can play an important role in the planning and management of water resources. However, the many factors and effects that influence this phenomenon make its analysis difficult. Statistical and regression models are the most commonly used analytical techniques, but because they frequently rely on a linear description of these phenomena, their results carry errors and they cannot model the temporal changes of the phenomenon with acceptable accuracy. It therefore seems imperative to choose a model that can use the effective factors to estimate the inflow acceptably. Recently, artificial intelligence (AI) techniques have been applied to estimate or predict discharge (Kisi and Cobaner 2009). These AI techniques are simple and robust and can handle complex non-linear processes with ease. The literature shows that AI techniques such as gene expression programming (GEP) and support vector machines (SVM) have been used to predict discharge (Wang et al. 2008). Because they are fully non-parametric, AI techniques have the major advantage that they do not require an a priori concept of the relations between the input variables and the output data (Bhagwat and Maity 2012). A classical feature of AI models is that they are able to analyze the stochasticity, dynamicity, patterns, and attributes in the input variables used to simulate the target data, and so they are considered more feasible than other methods of estimating discharge data (e.g. experimental approaches and physically based models).

Asefa et al. (2005) presented an appropriate SVM-based method for seasonal and hourly flow discharge; using the snow water equivalent and the flow volumes of previous periods, they forecasted flow volume at six-month and 24-hour time scales, and the model gave satisfactory results. Jayawardena et al. (2005) used genetic programming to model the rainfall-runoff process with daily data in two fairly large basins in China, and the GP results showed good agreement with the observed data. Lin et al. (2006) presented the support vector machine (SVM) as a promising method for hydrological prediction; by comparing its performance with that of ARMA and ANN models, they demonstrated that SVM is a very promising candidate for the prediction of long-term discharges. Guven (2009) used genetic programming and an artificial neural network to forecast the daily discharge of the Shevell River in America and showed that both methods gave acceptable results, but that GP had relatively higher precision than the artificial neural network. Moharrampour et al. (2012) used the support vector machine to forecast daily river flow and compared the model results with observed daily values; the results showed that the support vector machine performed well in estimating the daily discharge.

Overall, in light of the research reviewed above, the Kakareza River is one of the most important rivers in Lorestan Province and the most important source of water supply for its neighboring areas. Over the past decades, the flow rate of the river in the basin has decreased, which can be explained by lower river basin fluxes and surface flows. River discharge modeling and management measures to improve its water quality are therefore essential. Accordingly, the aim of this study was to estimate the discharge of the Kakareza River using a support vector machine, which is based on the inductive principle of structural risk minimization. In simulation, supervised learning with radial basis functions estimates the parameter faster and with less error than other kernel functions (Vapnik, 1995; Vapnik, 1998).

Case study and data used

The study area is the Kakareza River in Lorestan Province, Iran. This river is one of the permanent rivers of the province and originates in the southeastern mountains of Aleshtar and Biranshahr (Dehno). After passing through the suburbs of Aleshtar, the river is known as Kakareza. The river lies between "15 ° 48 ° 49 ° longitude to the" 22 ° 32 to "52 ° 33 degrees latitude, and it flows across the east of Khorramabad (the capital city of Lorestan Province). This river is one of the initial branches of the Karkhe River in the Zagros Mountains and has an average altitude of 1550 meters above sea level. The Kakareza River basin covers about 1148 square kilometers, and the river is 85 km long. The Kakareza River joins the Kashkan, Cimmeria, and Karkhe rivers along its course and eventually flows into the Persian Gulf. The geographical location of the study area is shown in Figure 1. In this study, the available monthly runoff data of the Horod station (Kakareza) from 1979 to 2015, obtained from Lorestan Regional Water, were used. One of the most important steps in modeling is selecting the right combination of input variables. The structure of the input combinations is shown in Table 2.

Table 2. The structure of input combinations

In this table, Q(t-4), Q(t-3), Q(t-2), and Q(t-1) are the discharges at times t-4, t-3, t-2, and t-1, respectively, used as inputs, and Q(t) is the discharge at time t, considered as the output. Given the significant cross-correlation between the input and output data, different combinations of input parameters, shown in Table 3, were used to obtain an optimal model for estimating the inflow to the Kakareza River. To estimate the discharge of the Kakareza River with gene expression programming and the support vector machine, catchment hydrometric data from 432 registered records during the study period were used, of which 345 records were used for training and the remaining 87 for verification.

Table 3. Correlation between input and output parameters
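The construction of lagged inputs and the training/verification split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the series is a synthetic placeholder rather than the Kakareza record, and the four nested lag sets are an assumed reading of the input combinations, whose exact structure is given only in Table 2.

```python
def make_lagged_dataset(series, lags):
    """Return (inputs, targets): each input row holds Q(t-lag) for every
    requested lag, and the matching target is Q(t)."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets


def chronological_split(inputs, targets, n_train):
    """Use the first n_train samples for training, the rest for verification."""
    train = (inputs[:n_train], targets[:n_train])
    verify = (inputs[n_train:], targets[n_train:])
    return train, verify


# Synthetic placeholder: 432 monthly values (NOT the observed discharge).
discharge = [10.0 + (month % 12) for month in range(432)]

# Assumed nested combinations: Q(t-1); Q(t-1), Q(t-2); and so on up to Q(t-4).
combinations = {1: [1], 2: [1, 2], 3: [1, 2, 3], 4: [1, 2, 3, 4]}

inputs, targets = make_lagged_dataset(discharge, combinations[4])
(train_x, train_y), (verify_x, verify_y) = chronological_split(inputs, targets, 345)
```

Note that lagging removes the first few records (four for the largest combination), so the verification set in this sketch is slightly smaller than the 87 records reported in the text.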

The gene expression programming method was presented by Ferreira in 1999 (Ferreira, 2001). This method is a combination of genetic algorithms (GA) and genetic programming (GP): simple linear chromosomes of fixed length, similar to those used in genetic algorithms, are combined with branched structures of different sizes and shapes, similar to the parse trees of genetic programming. Because all branched structures of different shapes and sizes are encoded in linear chromosomes of fixed length, the phenotype and genotype are separated from each other, and the system can use all the evolutionary advantages that result from this separation. Although the phenotype in GEP consists of the same branched structures used in GP, the branched structures inferred by GEP (also called expression trees) are the expression of entirely independent genomes. In short, improvements take place in the linear structure, which is then expressed as a tree structure, so only the modified genome is passed to the next generation and there is no need for heavy structures to reproduce and mutate.
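The separation of genotype (a fixed-length linear chromosome) and phenotype (an expression tree) can be illustrated with a small decoder. The sketch below assumes a breadth-first reading of the gene, a function set restricted to the four arithmetic operators, and single-letter terminals; the names `decode` and `evaluate` and the protected division are illustrative choices, not details taken from the study.

```python
import operator

# Function set limited to the four basic arithmetic operators.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    # Protected division (an assumption): return 1.0 when dividing by zero.
    "/": lambda a, b: a / b if b != 0 else 1.0,
}


def decode(gene):
    """Read a linear gene breadth-first into a nested [symbol, children] tree."""
    root = [gene[0], []]
    queue = [root]
    position = 1
    while queue:
        symbol, children = queue.pop(0)
        if symbol in FUNCTIONS:
            for _ in range(2):  # every operator here takes two arguments
                child = [gene[position], []]
                position += 1
                children.append(child)
                queue.append(child)
    return root


def evaluate(node, values):
    """Evaluate an expression tree for the given terminal values."""
    symbol, children = node
    if symbol not in FUNCTIONS:
        return values[symbol]
    left, right = (evaluate(child, values) for child in children)
    return FUNCTIONS[symbol](left, right)


# The gene encodes (a * b) + c; the trailing "ab" is never read (non-coding).
gene = "+*cabab"
tree = decode(gene)
result = evaluate(tree, {"a": 2.0, "b": 3.0, "c": 4.0})
```

Because the unused tail stays in the chromosome, a mutation in the head can change the tree's shape while the chromosome length stays fixed, which is the property the paragraph above describes.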

The support vector machine is an efficient learning system based on optimization theory that uses the inductive principle of structural risk minimization and yields a globally optimal solution (Vapnik, 1998). In the SVM regression model, a function related to the dependent variable Y, which is itself a function of several independent variables X, is estimated (Xu et al., 2007). As in other regression problems, the relationship between the dependent and independent variables is assumed to be determined by an algebraic function f(x) plus some allowable error. If W is the coefficient vector, b is the constant of the regression function, and ∅ is the kernel function, then the goal is to find a functional form for f(x). This is achieved by training the SVM model on a collection of samples (the training set). To calculate W and b, the error function in ε-SVM must be optimized subject to its constraints.

In the SVM formulation, C is a positive integer that acts as the penalty factor applied when an error occurs, ∅ is the kernel function, N is the number of samples, and ε i and ε i * are the slack variables. Finally, the SVM function can be rewritten in the form given by Shin et al. (2005).
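For reference, the standard ε-SVR formulation, written with the symbols defined above, is commonly stated as follows. This is a sketch of the textbook form and not necessarily the authors' exact equations: the tube half-width \epsilon (no subscript) and the dual coefficients \bar{\alpha}_i are notation assumed here, and the slack variables keep the names \varepsilon_i and \varepsilon_i^{*} used in the text.

```latex
\begin{aligned}
f(x) &= W^{T}\,\varnothing(x) + b \\[4pt]
\min_{W,\,b,\,\varepsilon_i,\,\varepsilon_i^{*}} \;
  &\tfrac{1}{2}\,W^{T}W + C\sum_{i=1}^{N}\left(\varepsilon_i + \varepsilon_i^{*}\right) \\[4pt]
\text{subject to} \;
  &W^{T}\varnothing(x_i) + b - y_i \le \epsilon + \varepsilon_i, \\
  &y_i - W^{T}\varnothing(x_i) - b \le \epsilon + \varepsilon_i^{*}, \\
  &\varepsilon_i \ge 0,\quad \varepsilon_i^{*} \ge 0,\quad i = 1,\dots,N \\[4pt]
f(x) &= \sum_{i=1}^{N} \bar{\alpha}_i\,\varnothing(x_i)^{T}\varnothing(x) + b
\end{aligned}
```

The last line is the usual rewritten form: the inner product \varnothing(x_i)^{T}\varnothing(x) is what a kernel such as the radial basis function evaluates, so the mapping \varnothing never has to be computed explicitly.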

The general purpose of intelligent models is to express relations between variables whose complexity is difficult to capture because of the high uncertainty inherent in natural processes. Daily stream flow is one of the most important hydrological parameters and is of great importance in subsequent steps. To reduce the error and to estimate the daily flow rate with high accuracy using the fewest input parameters, this method has been used, which provides better performance than approximate methods. The aim of this study is to capture this natural complexity among hydrological parameters and to provide a model for future prediction; because daily discharge is more important than other parameters, it is selected as the target variable.

The results of Gene Expression Programming

Gene expression programming was considered for estimating the inflow to the Kakareza River because it can select the variables in the model, remove variables with less impact, and provide an explicit relationship. All four inputs were incorporated to determine the significant variables, and two function sets were reviewed: the four basic operators (F1) and the default arithmetic operators (F2). The choice of this type of operator was based on previous studies. The model that includes the four main mathematical operators, with a simple mathematical relationship, was the most accurate in estimating the inflow to the Kakareza River. The scatter plots of gene expression programming for the verification stage are shown in Fig. (2-b). These results are consistent with the research of Kisi and Shiri (2012). The equation obtained from gene expression programming results from a random combination of the set of terminals and functions. Therefore, if the relationship between inputs and outputs is linear but operators such as sin and cos are included in the function set, gene expression programming uses the selected operators to extract the relationship, which reduces the accuracy of the model. In this study, to increase the precision of the model, operators such as sin and cos were left out, and for accuracy and simplicity the model derived from the four basic mathematical operations was proposed to estimate the inflow.

To estimate the inflow to the Kakareza River with the SVM model, several types of kernel function can be examined; the linear, polynomial, and radial basis function kernels, which are the types commonly used in hydrology, were selected. The results of the models studied are given in Table 5. According to this table, combined model number 4 with the radial basis function kernel has the highest correlation coefficient (R = 0.97), the lowest root mean square error (RMSE = 0.08 m³/s), and NS = 0.94 in the verification stage, making it the best solution among the models. Fig. 3 shows the best model for the verification data.
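The three kernel types compared above can be written compactly as follows. This is a minimal sketch using common textbook definitions; the parameter names and default values (`gamma`, `degree`, `coef0`) are assumptions for illustration, since the hyperparameters used in the study are not reported in the text.

```python
import math


def linear_kernel(x, z):
    """Inner product of two input vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, gamma=1.0, coef0=1.0):
    """(gamma * <x, z> + coef0) raised to the given degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * ||x - z||^2); equals 1 when the two vectors coincide."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Two hypothetical lagged-discharge vectors [Q(t-1), Q(t-2)].
x = [1.0, 2.0]
z = [2.0, 0.0]

k_linear = linear_kernel(x, z)
k_poly = polynomial_kernel(x, z)
k_rbf = rbf_kernel(x, z, gamma=0.5)
```

The radial basis function depends only on the distance between two input vectors, so its response is local: training months with similar lagged discharges influence a prediction most.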

As shown in Fig. (3-b), the support vector machine performs well in estimating the discharge of the Kakareza River even when only one input parameter is used. Thus, where statistical data are deficient, this network would still give acceptable performance in flow forecasting with a minimum of input parameters, such as the flow rate of the previous day. Fig. 3 shows the computed and observed values over time; the model estimated most values with acceptable accuracy, so that the estimates are close to their actual values. The results are consistent with the research of Buyukyildiz and Kumcu (2017) and Nourani et al. (2015). This can be explained by the fact that the support vector machine is based on the inductive principle of structural risk minimization. Therefore, in simulation, supervised learning with radial basis functions predicts the parameter faster and with less error than other kernel functions, which is an advantage of radial basis functions.

Selecting the optimal solution for each model and comparing them showed that the methods can simulate the inflow to the Kakareza River with good accuracy. As Table 6 shows, among the models used, the support vector machine performed best. Finally, the difference between the observed inflow values and those computed by the optimal models was calculated as a percentage of the mean observed value (the error value) and plotted against the recorded data (Fig. 5). As this figure shows, most errors of the models fall within the ±5 percent band; the highest error rates of the gene expression programming and support vector machine models are 6.61 and 3.10 percent of the mean observed value, respectively. Of these models (GEP and SVM), the SVM model has the lowest error value.

Overall, regarding estimation accuracy and reliability, the correlations between the observed and computed values for the support vector machine and gene expression programming models are 0.970 and 0.880, respectively. In addition, significance tests of the estimated and observed values at the 5% and 10% probability levels showed that the SVM model has a significant correlation at both levels.
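The evaluation criteria used throughout (correlation coefficient, root mean square error, Nash-Sutcliffe coefficient) and the error expressed as a percentage of the mean observed value can be computed as in the sketch below. The function names are hypothetical, the sample values are invented for illustration, and the sign convention of the percentage error (observed minus computed) is an assumption.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(observed, computed):
    """Pearson correlation coefficient R."""
    mo, mc = mean(observed), mean(computed)
    cov = sum((o - mo) * (c - mc) for o, c in zip(observed, computed))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_c = sum((c - mc) ** 2 for c in computed)
    return cov / math.sqrt(var_o * var_c)


def rmse(observed, computed):
    """Root mean square error, in the units of the data."""
    return math.sqrt(mean([(o - c) ** 2 for o, c in zip(observed, computed)]))


def nash_sutcliffe(observed, computed):
    """1 minus the ratio of squared errors to the variance of observations."""
    mo = mean(observed)
    residual = sum((o - c) ** 2 for o, c in zip(observed, computed))
    spread = sum((o - mo) ** 2 for o in observed)
    return 1.0 - residual / spread


def percent_error_of_mean(observed, computed):
    """Each error as a percentage of the mean observed value."""
    mo = mean(observed)
    return [100.0 * (o - c) / mo for o, c in zip(observed, computed)]


# Invented illustration values (NOT results from the study).
observed = [1.0, 2.0, 3.0, 4.0]
computed = [1.5, 2.5, 3.5, 4.5]

r = correlation(observed, computed)
error = rmse(observed, computed)
ns = nash_sutcliffe(observed, computed)
percent_errors = percent_error_of_mean(observed, computed)
```

In this invented case the computed series is a constant offset of the observed one, so R stays at 1 while RMSE and the Nash-Sutcliffe coefficient both register the bias; this is why the three criteria are reported together.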

In this research, we evaluated the performance of several models for simulating the discharge of the Kakareza River in Lorestan Province using the monthly discharge data of the river. The models used were gene expression programming and the support vector machine. Observed inflow values were compared with the inflows estimated by these models (GEP and SVM). The results are summarized as follows:

The authors are very grateful to the Regional Water Company of Lorestan Province, Iran, for its participation in gathering the data required to perform this work.